Our papers

S. Cebrat, M.R. Dudek, 1996, Symmetry in chromosome fractal organization and DNA domain structure. Proceedings of the 8th Joint EPS-APS Int. Conference on Physics Computing '96, eds. P. Borchards, M. Bubak, A. Maksymowicz (Academic Computer Center, CYFRONET-KRAKÓW), 371 - 374.

Abstract. We have shown that coding sequences in the DNA molecule are highly correlated and organized in a self-similar domain structure in which the nucleotide triplet-antitriplet mirror symmetry in the strand is preserved. The tendency to reach the symmetry forces the specific organization of the DNA molecule and generates long-range power-like correlations in the distribution of purines and pyrimidines.



S. Cebrat, M.R. Dudek, A. Rogowska, 1997, Asymmetry in nucleotide composition of sense and antisense strands as a parameter for discriminating open reading frames as protein coding sequences. J. Appl. Genetics, 38(1), 1 - 9.

Abstract. Coding properties of yeast chromosomes were analyzed and a strong asymmetry was found in the nucleotide composition of sense and antisense strands. This property generates two very simple parameters, [A]/[T] and [G]/[C] of the sense strand, which could be used for discrimination of open reading frames as coding sequences with a very high, statistically described level of significance. The paper contains a description of the method of the ellipse of concentration in the two-parameter space, which can enclose coding sequences while leaving a large fraction of noncoding sequences outside the ellipse.
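
The two parameters named in this abstract can be illustrated with a minimal sketch. It only computes [A]/[T] and [G]/[C] for one strand and for its reverse complement; the function names are hypothetical, and the ellipse of concentration used in the paper is not reproduced here.

```python
from collections import Counter

# Complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def strand_ratios(seq):
    """Return the pair ([A]/[T], [G]/[C]) for the given strand.

    Hypothetical helper: the paper's exact normalization is not
    stated in the abstract, so plain count ratios are assumed.
    """
    counts = Counter(seq.upper())
    if counts["T"] == 0 or counts["C"] == 0:
        raise ValueError("ratio undefined: sequence contains no T or no C")
    return counts["A"] / counts["T"], counts["G"] / counts["C"]


if __name__ == "__main__":
    sense = "AAATGGC"
    print(strand_ratios(sense))
    print(strand_ratios(reverse_complement(sense)))
```

By construction the ratios of the antisense strand are the reciprocals of those of the sense strand, so an asymmetric strand moves away from the point (1, 1) in opposite directions depending on which strand is read.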



S. Cebrat, M.R. Dudek, P. Mackiewicz, M. Kowalczuk, M. Fita, 1997, Asymmetry of coding versus non-coding strand in coding sequences of different genomes. Microbial & Comparative Genomics. 2/4, 259 - 268.

Abstract. We have used the asymmetry between the coding and noncoding strands in different codon positions of coding sequences of DNA as a parameter to evaluate the coding probability for open reading frames (ORFs). The method enables an approximation of the total number of coding ORFs in the set of analyzed sequences as well as an estimation of the coding probability for the ORFs. The asymmetry observed in the nucleotide composition of codons in coding sequences has been used successfully for analysis of the genomes completed at the time of this analysis.



S. Cebrat, M.R. Dudek, P. Mackiewicz, 1997, Is there any mystery of ORPHANs? J. Appl. Genet., 38(4), 365 - 372.

Abstract. We have analyzed the coding capacity of ORFs longer than 100 codons found in the yeast genome. Comparing the parameters describing the DNA asymmetry in the set of known genes and the set of all ORFs >100 codons, we have found that there are about 4700 coding ORFs in the yeast genome. Since recognizable functions have already been found for more than 2300 ORFs and homology to known genes has been identified for about 2000 ORFs, only about 400 ORFs can be considered orphans - ORFs without any known function or homology. This finding means that there is no mystery of orphans - a paradox showing that the fraction of orphans has been growing with the growing number of genes with known functions in the yeast genome.



S. Cebrat, J. Kąkol, 1997, The effect of social alliances on wolf population on their survival under hunting, Int. J. Mod. Physics C, 8(2), 417 - 426.

Abstract. We have introduced a modified Verhulst factor to simulate the dynamics of a wolf population. The new factor enlarges the capacity of the environment for organisms living in organized groups. Under this factor, social behavior allows populations to reach a larger size in the same ecological niche. The other effect of the introduced factor is that additional non-selective killing factors limit the population size not only directly but also by shrinking the effective capacity of the ecological niche.



S. Cebrat, P. Mackiewicz, M.R. Dudek, 1998, The role of the genetic code in generating new coding sequences inside existing genes, Biosystems, 45/2, 165 - 176.

Abstract. The genetic code has a very interesting property - it generates an open reading frame (ORF) inside a coding sequence, in a specific phase of the antisense strand, with much higher probability than in random DNA sequences. Furthermore, these antisense ORFs (A-ORFs) possess the same features as real genes - the asymmetry in the nucleotide composition in the first and the second positions in codons. About two thirds of the 2997 overlapping ORFs in the yeast genome possess this feature. Thus, the question arises: has this feature of the genetic code been exploited in the evolution of genes? We have searched the FASTA data bases for homologies with the antisense translation products of a specific class of genes and we have found some sequences with relatively high homology. Many of them have scores which could be randomly found in the searched data bases with a probability lower than 10^-6. We conclude that some genes could arise by positioning a copy of the original gene under a promoter in the opposite direction in such a way that both the original gene and its copy initially use the same nucleotides in the third, degenerate positions in codons.
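
As a rough illustration of what looking for ORFs "in a specific phase of the antisense strand" involves, the sketch below measures the longest stop-free run of codons in each of the six phases of a sequence. It assumes the standard stop codons (TAA, TAG, TGA) and ignores start codons; the function names are hypothetical and this is not the procedure used in the paper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code assumed
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_open_run(seq, frame):
    """Length, in codons, of the longest stop-free run in one frame."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0  # a stop codon closes the current open stretch
        else:
            run += 1
            best = max(best, run)
    return best


def six_phase_open_runs(seq):
    """Map (strand, frame) to the longest stop-free run in that phase."""
    strands = {"sense": seq.upper(), "antisense": reverse_complement(seq)}
    result = {}
    for name, strand in strands.items():
        for frame in range(3):
            result[(name, frame)] = longest_open_run(strand, frame)
    return result


if __name__ == "__main__":
    for phase, length in six_phase_open_runs("ATGAAATAAGGG").items():
        print(phase, length)
```

Comparing such run lengths between real coding sequences and shuffled sequences of the same composition would be one way to see the effect described in the abstract, although the abstract itself does not specify the statistics used.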



S. Cebrat, M.R. Dudek, P. Mackiewicz, 1998, Sequence asymmetry as a parameter indicating coding sequence in Saccharomyces cerevisiae genome. Theory in Biosciences, 117, 78 - 89.

Abstract. We have compared the asymmetry in purine and pyrimidine occurrence in different codon positions of coding, presumably coding and noncoding sequences of the whole genome of Saccharomyces cerevisiae. We have shown that there is a very strong asymmetry in sense versus antisense strand in nucleotide occurrence in the first and second positions in codons. Since the observed asymmetry results from the specific composition of the first two codon positions, the parameter is not correlated with the Codon Adaptation Index (CAI), and this property could be used as an independent parameter discriminating Open Reading Frames (ORFs) as coding sequences. We have also estimated the number of presumably coding ORFs in the Saccharomyces cerevisiae genome as 4718 (without interrupted genes). This approximation has been done for all ORFs longer than 100 codons identified in the yeast genome. The same method of approximation performed for ORFs published by the SGD program (after the selection made before publication of the data base) gave a total of 4691 coding ORFs. That means: (a) the previously suggested number of coding ORFs is overestimated; (b) some ORFs discarded by the first selection could be coding (if we assume that there is any significant difference between the two results cited above); (c) the method of estimation is, at least roughly, correct, since it eliminates more than 2700 noncoding ORFs from our database and about 1400 ORFs from the published SGD set, leaving a discrepancy of only 27 ORFs and resulting in almost the same number of coding ORFs.



S. Cebrat, M.R. Dudek, 1998, The effect of DNA phase structure on DNA walks. The European Physical Journal B, 3, 271 - 276.

Abstract. We have performed several kinds of DNA walks, which are often the first steps for further analysis of DNA structure and long-range correlations. The DNA walks analyzing the frequency of G+C versus A+T cannot indicate the coding strand, while purine versus pyrimidine DNA walks or two-dimensional (A-T, G-C) DNA walks can in some instances indicate the coding strand but cannot resolve the coding frame. The modified two-dimensional (A-T, G-C) DNA walks respecting the three-nucleotide codon structure show very high correlation in the nucleotide composition of DNA coding sequences. They can distinguish between coding and non-coding sequences and indicate the strand and the phase in which DNA is coding.
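
A two-dimensional (A-T, G-C) DNA walk of the kind mentioned here can be sketched as follows. The step convention (A and T move the walker in opposite directions along one axis, G and C along the other) and the reading of "respecting the three-nucleotide codon structure" as one separate walk per codon position are assumptions made for illustration; the function names are hypothetical.

```python
# Assumed step convention: x axis counts A minus T, y axis counts G minus C.
STEPS = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq):
    """Return the list of (x, y) positions visited, starting at (0, 0)."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # unknown symbols do not move the walker
        x += dx
        y += dy
        path.append((x, y))
    return path


def codon_position_walks(seq, frame=0):
    """Return three walks, one for each codon position in the given frame."""
    in_frame = seq[frame:]
    return [dna_walk(in_frame[pos::3]) for pos in range(3)]


if __name__ == "__main__":
    walks = codon_position_walks("ATGATGATG")
    for pos, walk in enumerate(walks, start=1):
        print("codon position", pos, "ends at", walk[-1])
```

In a sequence with position-specific composition the three walks drift in different directions, whereas reading the same sequence in a shifted frame permutes them; this is the intuition behind using such walks to indicate the coding phase.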



S. Cebrat, 1998, Penna Model From the Perspective of One Geneticist, Physica A, 258, 493-498.

Abstract. The Penna model of ageing predicts many phenomena in population dynamics. Since the model assumes that all genes in genomes are switched on chronologically and that there are no structural differences between male and female genomes, it cannot explain genetic death before birth or differences in the mortality rates of men and women. I suggest adding the set of housekeeping genes, which are switched on during embryo development, to the "death genes" of the Penna model. Taking into account the large fraction of genes located on the X chromosome whose deleterious mutations exert a dominant effect on the male phenotype and a recessive effect on the female phenotype would make it possible to avoid introducing somatic mutations as a cause of the higher mortality of men. The modeling of linkage disequilibrium and its implications for eugenics have also been suggested.



J. Schneider, S. Cebrat, D. Stauffer, 1998, Why do women live longer than men? A Monte Carlo simulation of Penna models with X and Y chromosomes. Int. J. Mod. Physics C, 9, 721 - 725.

Abstract. There are about 300 different theories of ageing, but only a few of them try to explain the fact that male mortality is higher than female mortality. Some of these theories are very interesting: men die earlier than women on average because they produce sperm or because men drink more vodka; others blame the stress caused by speeches at mostly male conferences for the higher male mortality. All these explanations ignore the fact that the mortality of male babies is already higher than that of female babies. This work, which is based on a hypothesis by Cebrat, is able to explain why male mortality is higher without suggesting fewer somatic mutations, fewer dominant mutations, or higher resistance against mutations in women.



S. Cebrat, M.R. Dudek, A. Gierlik, M. Kowalczuk, P. Mackiewicz, 1999, Effect of replication on the third base of codons. Physica A, 265/1-2, 78 - 84.

Abstract. We have analyzed the third position in codons and have observed strong long-range correlations along the DNA sequence. We have shown that the correlations are caused mostly by asymmetric replication. In the analysis, we have used a DNA walk (spider analysis) in the two-dimensional space [A-T, G-C]. The particular case of the Escherichia coli sequence has been studied in detail.



A. Gierlik, P. Mackiewicz, M. Kowalczuk, M.R. Dudek, S. Cebrat, 1999, Some hints on Open Reading Frame statistics - how ORF length depends on selection. Int. J. Modern Phys. C, in press.

Abstract. Coding sequences of DNA generate Open Reading Frames (ORFs) inside them with much higher frequency than random DNA sequences do, especially in the antisense strand. This is a specific feature of the genetic code. Since coding sequences are selected for their length, the generated ORFs are indirect results of this selection and their length is also influenced by selection. That is why ORFs found in any genome, even those much longer than ORFs generated spontaneously in random DNA sequences, should be considered as two different sets of ORFs: the first coding for proteins, the second generated by the coding ORFs. Even intergenic sequences differ in their capacity to generate ORFs from random DNA sequences of the same nucleotide composition, which seems to be one indication that intergenic sequences were generated from coding sequences by recombinational mechanisms.



P. Mackiewicz, A. Gierlik, M. Kowalczuk, M.R. Dudek, S. Cebrat, 1999, How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Research, 9(5), 409-416.

Abstract. We have performed detrended DNA walks on whole prokaryotic genomes, on noncoding sequences and, separately, on each position in codons of coding sequences. Our method of DNA walks enables us to distinguish the mutational pressure associated with replication from the mutational pressure associated with transcription and with other mechanisms introducing asymmetry into prokaryotic chromosomes. In many prokaryotic genomes, each component of mutational pressure affects coding sequences not only in silent positions but also in positions where changes cause amino acid substitutions in coded proteins. Asymmetry in the silent positions of codons differentiates the rate of translation of mRNA produced from leading and lagging strands. Asymmetry in amino acid composition of proteins resulting from replication-associated mutational pressure also corresponds to leading and lagging roles of DNA strands, while asymmetry connected with transcription and coding function corresponds to the distance of genes from origin or terminus of chromosome replication.



M. Kowalczuk, P. Mackiewicz, A. Gierlik, M.R. Dudek, S. Cebrat, 1999, Total Number of Coding Open Reading Frames in the Yeast Genome, YEAST, 15, 1031-1034.

Abstract. At the end of 1996 we approximated the total number of protein coding ORFs in the Saccharomyces cerevisiae genome, based on their properties, at 4700 - 4800. The number is much smaller than the 5800 which is widely accepted. According to our calculations, there remain about 200 - 300 orphans - ORFs without known function or homology to already discovered genes, which is only about 5% of the total number of genes. Our results would be questionable if the analysed set of known genes was not a statistically representative sample of the whole set of protein coding genes in the S. cerevisiae genome. Therefore we repeated our estimation with the recently updated data bases. In the course of the last 18 months, previously unknown functions of about 500 genes were found. We used them to check our method, former results, and conclusions. Our previous estimation of the total number of coding ORFs was confirmed.



P. Mackiewicz, A. Gierlik, M. Kowalczuk, M.R. Dudek, S. Cebrat, 1999, Asymmetry of nucleotide composition of prokaryotic chromosomes, J. Appl. Genet., 40(1), 1-14.

Abstract. We have analysed the causes of asymmetry in nucleotide composition of DNA complementary strands of prokaryotic chromosomes. Analysing DNA walks we have separated the effect of replication-associated processes from the effect introduced by transcription and coding functions. The asymmetry introduced by replication switches its polarity at the origin and at the terminus of replication, which is observed in both noncoding and coding sequences and varies with respect to positions in codons. Coding functions introduce very strong trends into protein coding ORFs, which are specific for each nucleotide position in the codon. Using detrended DNA walks we have eliminated the effect of coding density and we were able to distinguish between mutational pressure associated with replication and compositional bias for genes proximal and distal to the origin of replication.



P. Mackiewicz, M. Kowalczuk, A. Gierlik, M. R. Dudek, S. Cebrat, 1999, Origin and properties of noncoding ORFs in the yeast genome, Nucleic Acids Res., 27(17), 3503-3509.

Abstract. In a recent paper we have estimated the total number of protein coding Open Reading Frames (ORFs) in the Saccharomyces cerevisiae genome, based on their properties, at about 4800. This number is much smaller than the 5800 - 6000 which is widely accepted. In this paper we analyse differences between the set of ORFs with known phenotypes annotated in the Munich Information Centre for Protein Sequences (MIPS) data base and ORFs for which the probability of coding, counted by us, is very low. We have found that many of the latter ORFs have properties of anti-sense sequences of coding ORFs, which suggests that they could have been generated by duplication of coding sequences. Since coding sequences generate ORFs inside themselves, with especially high frequency in the antisense, we have looked for homology between known proteins and hypothetical polypeptides generated by ORFs under consideration in all the six phases. For many ORFs we have found paralogues and orthologues in phases different than the phase which had been assumed in MIPS data base as coding.



S. Cebrat, M.R. Dudek, 1999, Periodyczność w zdolnościach kodujących DNA [Periodicity in the coding capacity of DNA], Symetrie w Naukach Przyrodniczych, in press.

Abstract. The yeast genome is composed of 16 chromosomes, each being a single DNA molecule at least hundreds of thousands of nucleotides long. Stanley et al. have shown that there are long-range correlations in noncoding sequences of DNA. That means that there are some rules in the distribution of nucleotides along the DNA molecule, at least in noncoding regions. We have found that the distribution of whole coding regions along the DNA is not random. It seems that the rules responsible for the nonrandom distribution of genes are the result of the physical properties of the molecule and the necessity of keeping it in a stable state. Probably the huge molecule is stable when it possesses the character of a dispersed palindrome. We have shown that the properties of a dispersed palindrome can be found in the whole genome, in single chromosomes or even in very small fractions of a chromosome. This symmetry is broken inside the coding sequences, but these sequences are distributed along the DNA in such a way that they compensate each other over very short distances. We have also found that there are very strong rules in the distribution of nucleotides inside coding regions. In particular, the distribution of nucleotides in the first position of codons is highly correlated with the distribution of nucleotides in the second position of codons. Since there is no such correlation outside coding regions, this property could be used to search for genes. We discuss the impact of the symmetry found and of the long-range correlations in DNA sequences on genome evolution and on our understanding of such phenomena as Haldane's dilemma.



S. Cebrat, P. Mackiewicz, M.R. Dudek, 1999, Kodujące palindromy [Coding palindromes], Symetrie w Naukach Przyrodniczych, in press.

Abstract. The genetic code has a very interesting property - it generates an open reading frame (ORF) inside a coding sequence, in a specific phase of the antisense strand, with much higher probability than in random DNA sequences. Furthermore, these antisense ORFs possess the same features as real genes - the asymmetry in the nucleotide composition at the first and the second positions in codons. About two thirds of the 2997 overlapping ORFs in the yeast genome possess this feature. Thus, the question arises: has this feature of the genetic code been exploited in the evolution of genes? We have searched the FASTA data bases for homologies with the antisense translation products of a specific class of genes and we have found some sequences with relatively high homology. Many of them have scores which could be randomly found in the searched data bases with a probability lower than 10^-6. We conclude that some genes could arise by positioning a copy of the original gene under a promoter in the opposite direction in such a way that both the original gene and its copy initially use the same nucleotides in the third, degenerate positions in codons.



S. Cebrat, M.R. Dudek, 1995, Coding rhythm of DNA strands. preprint, Instytut Fizyki Teoretycznej, Uniwersytet Wroclawski, 893/95

Abstract. We have analysed yeast chromosome II and shown that the noncoding nucleotide sequences of the DNA molecule are synchronized both with the appearance of the coding regions in the six phases of DNA and with their size. This property generates the long-range power-law correlations in DNA, reflecting the hierarchical harmonic structure of the DNA molecule.



M.R. Dudek, S. Cebrat, 1995, Stochastic DNA in the presence of the coding bias. preprint, Instytut Fizyki Teoretycznej, Uniwersytet Wroclawski, 897/95

Abstract. We have shown that the long-range power-law correlations in DNA coding function are connected with the appearance of purine-rich sequences in the sense strand of the DNA molecule. Superimposing the 1/f noise rules of a natural chromosome on a random DNA sequence generates long-range correlations in it and introduces into the stochastic sequence some characters of natural chromosomes.



S. Cebrat, M.R. Dudek, 1996, Symmetry in chromosome fractal organization and DNA domain structure. preprint, Instytut Fizyki Teoretycznej, Uniwersytet Wroclawski, 904/96

Abstract. We have shown that coding sequences in the DNA molecule are highly correlated and organised in a self-similar domain structure in which the nucleotide triplet-antitriplet mirror symmetry in the strand is preserved. The tendency to reach the symmetry forces the specific organisation of the DNA molecule and generates long-range power-like correlations in the distribution of purines and pyrimidines.