S. Cebrat, M.R. Dudek, 1996, Symmetry in chromosome fractal organization and DNA domain structure. Proceedings of the 8th Joint EPS-APS Int. Conference on Physics Computing '96, eds. P. Borchards, M. Bubak, A. Maksymowicz (Academic Computer Center, CYFRONET-KRAKÓW), 371-374.
Abstract. We have shown that coding sequences in the DNA molecule
are highly correlated and organized in a self-similar domain structure in
which the nucleotide triplet-antitriplet mirror symmetry in the strand
is preserved. The tendency to reach the symmetry forces the specific organization
of the DNA molecule and generates long-range power-like correlations in the
distribution of purines and pyrimidines.
Abstract. Coding properties of yeast chromosomes were analyzed
and a strong asymmetry was found in the nucleotide composition of the sense and
antisense strands. This property yields two very simple parameters,
[A]/[T] and [G]/[C] of the sense strand, which can be used to discriminate
open reading frames as coding sequences with a very high, statistically
described level of significance. The paper describes the
method of the ellipse of concentration in the two-parameter space, which
encloses coding sequences while leaving a large fraction of noncoding sequences
outside the ellipse.
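
The two parameters and the ellipse test can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes the ellipse of concentration is the covariance ellipse fitted to the ([A]/[T], [G]/[C]) points of known coding ORFs, and that an ORF is inside when its squared Mahalanobis distance from the mean is below a chosen threshold. The function names and the threshold are hypothetical.

```python
def strand_ratios(seq):
    """Return ([A]/[T], [G]/[C]) for a sense-strand sequence."""
    s = seq.upper()
    a, t, g, c = (s.count(b) for b in "ATGC")
    if t == 0 or c == 0:
        raise ValueError("ratio undefined: sequence has no T or no C")
    return a / t, g / c


def fit_ellipse(points):
    """Mean and sample covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def inside_ellipse(point, mean, cov, threshold):
    """True if the squared Mahalanobis distance of `point` is <= threshold.

    Assumption: for roughly bivariate-normal data, threshold = 5.991
    (chi-square, 2 degrees of freedom) encloses about 95% of the points.
    """
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 <= threshold
```

In use, `fit_ellipse` would be given the ratio pairs of a reference set of known genes, and each candidate ORF would then be classified with `inside_ellipse(strand_ratios(orf), mean, cov, 5.991)`.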
Abstract. We have used the asymmetry between the coding and noncoding
strands in different codon positions of coding sequences of DNA as a parameter
to evaluate the coding probability for open reading frames (ORFs). The
method enables an approximation of the total number of coding ORFs in the
set of analyzed sequences as well as an estimation of the coding probability
for the ORFs. The asymmetry observed in the nucleotide composition of codons
in coding sequences has been used successfully for analysis of the genomes
completed at the time of this analysis.
Abstract. We have analyzed the coding capacity of ORFs longer
than 100 codons found in the yeast genome. Comparing the parameters describing
the DNA asymmetry in the set of known genes and in the set of all ORFs >100
codons, we have found that there are about 4700 coding ORFs in the yeast
genome. Since recognizable functions have already been found for more than
2300 ORFs, and homology to known genes has been identified for about 2000 ORFs,
only about 400 ORFs can be considered orphans, i.e. ORFs without
any known function or homology. This finding means that there is no mystery
of orphans, the paradox in which the fraction of orphans has been growing
with the growing number of genes with known functions in the yeast genome.
Abstract. We have introduced a modified Verhulst factor to
simulate the dynamics of a wolf population. The new factor enlarges the
capacity of the environment for organisms living in organized groups. Under
this factor, social behavior allows populations to reach a larger
size in the same ecological niche. The other effect of the introduced factor
is that additional non-selective killing factors limit the population size
not only directly but also by shrinking the effective ecological niche
capacity.
Abstract. The genetic code has a very interesting property:
it generates an open reading frame (ORF) inside a coding sequence, in a
specific phase of the antisense strand, with much higher probability than in
random DNA sequences. Furthermore, these antisense ORFs (A-ORFs) possess
the same feature as real genes: the asymmetry in the nucleotide composition
at the first and the second positions in codons. About two thirds of the
2997 overlapping ORFs in the yeast genome possess this feature. Thus, the
question arises: has this feature of the genetic code been exploited in
the evolution of genes? We have searched the FASTA data bases for homologies
with the antisense translation products of a specific class of genes and
we have found some sequences with relatively high homology. Many of them
have scores which could be randomly found in the searched data bases with
a probability lower than 10^-6. We conclude that some genes
could arise by positioning a copy of the original gene under a promoter
in the opposite direction in such a way that both the original gene and
its copy initially use the same nucleotides in the third, degenerate
positions in codons.
Abstract. We have compared the symmetry of purine and pyrimidine
occurrence in different codon positions of coding, presumably coding and
noncoding sequences of the whole genome of Saccharomyces cerevisiae.
We have shown that there is a very strong asymmetry between the sense and antisense
strands in nucleotide occurrence in the first and second positions in codons.
Since the observed asymmetry results from the specific composition of the
first two codon positions, the parameter is not correlated with the Codon Adaptation
Index (CAI), and this property could be used as an independent parameter
discriminating Open Reading Frames (ORFs) as coding sequences. We have
also estimated the number of presumably coding ORFs in the Saccharomyces
cerevisiae genome as 4718 (without interrupted genes). This approximation
has been done for all ORFs longer than 100 codons identified in the yeast
genome. The same method of approximation applied to the ORFs published by
the SGD program (after the selection made before publication of the data base)
gave a total of 4691 coding ORFs. That means: (a) the previously
suggested number of coding ORFs is overestimated; (b) some ORFs discarded
by the first selection could be coding (if we assume that there is any
significant difference between the two results cited above); (c) the method
of estimation is, at least roughly, correct, since it eliminates more than
2700 noncoding ORFs from our database and about 1400 ORFs from the published
SGD set, leaving a discrepancy of only 27 ORFs and resulting in almost the same
number of coding ORFs.
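
The kind of per-position measurement described above can be illustrated with a short sketch. The abstract does not give the exact parameter, so the purine fraction at each codon position is used here only as an assumed stand-in; the function names are hypothetical.

```python
PURINES = set("AG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def purine_fraction_by_position(orf):
    """Fraction of purines (A, G) at codon positions 1, 2 and 3 of an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = orf[pos:usable:3]
        fractions.append(sum(b in PURINES for b in bases) / len(bases))
    return fractions


def antisense(orf):
    """Reverse complement, i.e. the antisense strand read 5' to 3'."""
    return orf.upper().translate(COMPLEMENT)[::-1]
```

Comparing `purine_fraction_by_position(orf)` with `purine_fraction_by_position(antisense(orf))` gives the sense versus antisense contrast for each codon position.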
Abstract. We have performed several kinds of DNA walks, which
are often the first step in further analysis of DNA structure and long-range
correlations. The DNA walks analyzing the frequency of G+C versus
A+T cannot indicate the coding strand, while purine versus pyrimidine
DNA walks or two-dimensional (A-T, G-C) DNA walks can in some instances
indicate the coding strand but cannot resolve the coding frame. The
modified two-dimensional (A-T, G-C) DNA walks respecting the three-nucleotide
codon structure show very high correlation in the nucleotide composition of
DNA coding sequences. They can distinguish between coding and non-coding
sequences and indicate the strand and the phase in which DNA is coding.
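
A two-dimensional (A-T, G-C) walk and its codon-respecting variant might look like the sketch below. The step convention (A and T along one axis, G and C along the other, one unit per nucleotide) and the function names are assumptions made for illustration; the abstract does not fix them.

```python
# Step vectors in the (A-T, G-C) plane: A and T move along x, G and C along y.
STEPS = {"A": (1, 0), "T": (-1, 0), "G": (0, 1), "C": (0, -1)}


def dna_walk(seq):
    """Return the list of (x, y) positions visited by a two-dimensional walk."""
    x = y = 0
    path = [(0, 0)]
    for base in seq.upper():
        dx, dy = STEPS.get(base, (0, 0))  # ambiguous bases (e.g. N) do not move
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def codon_position_walks(seq, phase=0):
    """Three separate walks, one per codon position, for a given reading phase."""
    s = seq[phase:]
    return [dna_walk(s[pos::3]) for pos in range(3)]
```

Running `codon_position_walks` for the three phases of each strand gives six sets of walks; a sequence read in its coding phase is expected to show distinct trends for the three codon positions, which is what lets this variant resolve the frame where the plain walk cannot.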
Abstract. The Penna model of ageing predicts many phenomena in population
dynamics. Since the model assumes that all genes in genomes are switched
on chronologically and that there are no structural differences between
male and female genomes, it cannot explain genetic death before birth or
differences in the mortality rates of men and women. I suggest adding a set
of housekeeping genes, which are switched on during embryo development,
to the "death genes" of the Penna model. Taking into account the large fraction
of genes located on the X chromosome whose deleterious mutations exert a dominant
effect on the male phenotype and a recessive one on the female phenotype would
make it possible to avoid introducing somatic mutations as a cause of the higher
mortality of men. The modeling of linkage disequilibrium and its implications
for eugenics are also suggested.
Abstract. There are about 300 different theories of ageing, but
only a few of them try to explain the fact that male mortality is higher
than female mortality. Some of these theories are very interesting: men die
earlier than women on average because they produce sperm or because men
drink more vodka; others blame the stress caused by speeches at mostly
male conferences for the higher male mortality. All these explanations
ignore the fact that the mortality of male babies is already higher than
that of female babies. This work, which is based on a hypothesis by Cebrat,
is able to explain why male mortality is higher without assuming fewer
somatic mutations, fewer dominant mutations, or higher resistance against
mutations in women.
Abstract. We have analyzed the third position in codons and have
observed strong long-range correlations along the DNA sequence. We have shown
that the correlations are caused mostly by asymmetric replication. In the
analysis, we have used a DNA walk (spider analysis) in the two-dimensional
space [A-T, G-C]. The particular case of the Escherichia coli sequence
has been studied in detail.
Abstract. Coding sequences of DNA generate Open Reading Frames
(ORFs) inside them with much higher frequency than random DNA sequences
do, especially in the antisense strand. This is a specific feature of the genetic
code. Since coding sequences are selected for their length, the generated
ORFs are an indirect result of this selection and their length is also influenced
by selection. That is why the ORFs found in any genome, even those much longer than
ORFs spontaneously generated in random DNA sequences, should be considered as
two different sets of ORFs: the first one coding for proteins, the second
one generated by the coding ORFs. Even intergenic sequences possess a different
capacity for generating ORFs than a random DNA sequence of the same nucleotide
composition, which seems to be one of the premises for the view that intergenic sequences
were generated from coding sequences by recombinational mechanisms.
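
Counting ORFs in all six phases, the basic operation behind these comparisons, can be sketched as below. The sketch assumes an ORF runs from an ATG to the first in-frame stop codon of the standard genetic code and that length is counted in codons excluding the stop; the abstract does not state its ORF definition, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_phase(seq, phase, min_codons):
    """Yield (start, end) of ATG...stop ORFs in one phase; end is exclusive
    and includes the stop codon."""
    start = None
    for i in range(phase, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def six_phase_orfs(seq, min_codons=100):
    """Map (strand, phase) to the ORFs found there. Coordinates on the "-"
    strand refer to the reverse-complemented sequence."""
    seq = seq.upper()
    strands = (("+", seq), ("-", reverse_complement(seq)))
    return {
        (name, phase): list(orfs_in_phase(s, phase, min_codons))
        for name, s in strands
        for phase in range(3)
    }
```

Applying `six_phase_orfs` to a coding sequence and to a shuffled sequence of the same nucleotide composition is one way to compare how often each generates ORFs in the other phases.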
Abstract. We have performed detrended DNA walks on whole prokaryotic
genomes, on noncoding sequences and, separately, on each position in codons
of coding sequences. Our method of DNA walks enables us to distinguish
the mutational pressure associated with replication from the mutational
pressure associated with transcription and with other mechanisms introducing
asymmetry into prokaryotic chromosomes. In many prokaryotic genomes, each
component of mutational pressure affects coding sequences not only in silent
positions but also in positions where changes cause amino acid substitutions
in coded proteins. Asymmetry in the silent positions of codons differentiates
the rate of translation of mRNA produced from leading and lagging strands.
Asymmetry in amino acid composition of proteins resulting from replication-associated
mutational pressure also corresponds to leading and lagging roles of DNA
strands, while asymmetry connected with transcription and coding function
corresponds to the distance of genes from origin or terminus of chromosome
replication.
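
A detrended walk can be sketched as follows. The abstract does not define the detrending step, so this example assumes it means subtracting the straight line that joins the first and last points of a one-dimensional walk; the function names are hypothetical.

```python
def cumulative_walk(seq, plus, minus):
    """One-dimensional walk: +1 for `plus`, -1 for `minus`, 0 for other bases."""
    y = 0
    walk = [0]
    for base in seq.upper():
        if base == plus:
            y += 1
        elif base == minus:
            y -= 1
        walk.append(y)
    return walk


def detrend(walk):
    """Subtract the straight line joining the first and last points of the walk."""
    n = len(walk) - 1
    if n == 0:
        return [0.0]
    slope = (walk[-1] - walk[0]) / n
    return [w - walk[0] - slope * i for i, w in enumerate(walk)]
```

For example, `detrend(cumulative_walk(genome, "G", "C"))` removes the overall G-over-C drift, so that what remains shows where the local trend changes along the sequence. To treat codon positions separately, the same two functions can be applied to every third nucleotide of the concatenated coding sequences.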
Abstract. At the end of 1996 we estimated the total number
of protein coding ORFs in the Saccharomyces cerevisiae genome, based
on their properties, at 4700-4800. The number is much smaller than the
5800 which is widely accepted. According to our calculations, there remain
about 200-300 orphans, i.e. ORFs without known function or homology to already
discovered genes, which is only about 5% of the total number of genes.
Our results would be questionable if the analysed set of known genes were
not a statistically representative sample of the whole set of protein coding
genes in the S. cerevisiae genome. Therefore we repeated our estimation
with the recently updated data bases. In the course of the last 18 months,
previously unknown functions of about 500 genes were found. We used them
to check our method, former results, and conclusions. Our previous estimation
of the total number of coding ORFs was confirmed.
Abstract. We have analysed the causes of asymmetry in nucleotide
composition of DNA complementary strands of prokaryotic chromosomes. Analysing
DNA walks we have separated the effect of replication-associated processes
from the effect introduced by transcription and coding functions. The asymmetry
introduced by replication switches its polarity at the origin and at the
terminus of replication, which is observed in both noncoding and coding
sequences and varies with respect to positions in codons. Coding functions
introduce very strong trends into protein coding ORFs, which are specific
for each nucleotide position in the codon. Using detrended DNA walks we
have eliminated the effect of coding density and we were able to distinguish
between mutational pressure associated with replication and compositional
bias for genes proximal and distal to the origin of replication.
Abstract. In a recent paper we have estimated the total number
of protein coding Open Reading Frames (ORFs) in the Saccharomyces cerevisiae
genome, based on their properties, at about 4800. This number is much smaller
than the 5800 - 6000 which is widely accepted. In this paper we analyse
differences between the set of ORFs with known phenotypes annotated in
the Munich Information Centre for Protein Sequences (MIPS) data base and
ORFs for which the probability of coding, as calculated by us, is very low. We
have found that many of the latter ORFs have properties of antisense sequences
of coding ORFs, which suggests that they could have been generated by duplication
of coding sequences. Since coding sequences generate ORFs inside themselves,
with especially high frequency in the antisense strand, we have looked for homology
between known proteins and the hypothetical polypeptides generated by the ORFs
under consideration in all six phases. For many ORFs we have found
paralogues and orthologues in phases different from the phase which had
been assumed to be coding in the MIPS data base.
Abstract. The yeast genome is composed of 16 chromosomes, each
being a single DNA molecule at least hundreds of thousands of nucleotides
long. Stanley et al. have shown that there are long-range correlations
in noncoding sequences of DNA. That means that there are some rules in the
distribution of nucleotides along the DNA molecule, at least in noncoding
regions. We have found that the distribution of whole coding regions along
the DNA is not random. It seems that the rules responsible for the nonrandom
distribution of genes are the result of the physical properties of the molecule
and the necessity of keeping it in a stable state. Probably the huge
molecule is stable when it has the character of a dispersed palindrome.
We have shown that the properties of a dispersed palindrome can be found
in the whole genome, in single chromosomes or even in very small fractions
of a chromosome. This symmetry is broken inside coding sequences, but
these sequences are distributed along the DNA in such a way that they compensate
each other over very short distances. We have also found that there are very
strong rules in the distribution of nucleotides inside coding regions. In particular,
the distribution of nucleotides in the first position of codons is highly
correlated with the distribution of nucleotides in the second position
of codons. Since there is no such correlation outside coding regions,
this property could be used to look for genes. We discuss the impact
of the symmetry found and of the long-range correlations in DNA sequences
on genome evolution and on our understanding of such phenomena as Haldane's
dilemma.
Abstract. We have analysed yeast chromosome II and shown
that the noncoding nucleotide sequences of the DNA molecule are synchronized
both with the appearance of the coding regions in the six phases of DNA
and with their size. This property generates the long-range power-law correlations
in DNA, reflecting the hierarchical harmonic structure of the DNA molecule.
Abstract. We have shown that the long-range power-law correlations
in DNA coding function are connected with the appearance of purine-rich
sequences in the sense strand of the DNA molecule. Superimposing the 1/f noise
rules of a natural chromosome on a random DNA sequence generates long-range
correlations in it and introduces into the stochastic sequence some characters
of natural chromosomes.