S. Cebrat, M. R. Dudek, P. Mackiewicz, M. Kowalczuk, M. Fita, 1997, Asymmetry of coding versus non-coding strand in coding sequences of different genomes. Microbial & Comparative Genomics. 2(4), 259 - 268. (abstract)
S. Cebrat, M. Dudek, P. Mackiewicz, 1997, Is there any mystery of ORPHANs? Journal of Applied Genetics, 38(4), 365 - 372. (abstract)
S. Cebrat, P. Mackiewicz, M. R. Dudek, 1998, The role of the genetic code in generating new coding sequences inside existing genes. Biosystems, 45(2), 165 - 176. (abstract)
S. Cebrat, M. R. Dudek, P. Mackiewicz, 1998, Sequence asymmetry as a parameter indicating coding sequence in Saccharomyces cerevisiae genome. Theory in Biosciences, 117, 78 - 89. (abstract)
S. Cebrat, M. Dudek, 1998, The effect of DNA phase structure on DNA walks. The European Physical Journal B, 3, 271 - 276. (abstract)
S. Cebrat, M. R. Dudek, A. Gierlik, M. Kowalczuk, P. Mackiewicz, 1999, Effect of replication on the third base of codons. Physica A, 265(1-2), 78 - 84. (abstract)
P. Mackiewicz, A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat, 1999, How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Research, 9(5), 409 - 416. (abstract)
M. Kowalczuk, P. Mackiewicz, A. Gierlik, M. R. Dudek, S. Cebrat, 1999, Total Number of Coding Open Reading Frames in the Yeast Genome. YEAST, in press. (abstract)
P. Mackiewicz, A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat, 1999, Asymmetry of nucleotide composition of prokaryotic chromosomes. Journal of Applied Genetics, 40(1), 1 - 14. (abstract)
Graphic representation of coding DNA sequences - a spider
To represent a coding DNA sequence graphically in two-dimensional space, we analyzed the displacement of a DNA walker that visits each codon position separately. For the DNA walk we used a modified version of the method of Berthelsen, Glazier and Skolnick (1992). For each sequence we performed three DNA walks, one for each nucleotide position in triplets (Fig. 1).
Fig. 1. Three DNA walks done independently for each nucleotide position
in triplets. Each DNA walk represents "history" of nucleotide composition
of the first, the second or the third position of codons along the DNA
sequence. The three walks together have been called a spider and
a single walk has been called a spider leg. It is possible to extract
some numerical information from these plots:
- the slope (measured in degrees, it is equal to arctan[(G-C)/(A-T)]), and
- the length of the vector determined by the origin and the end
of the spider leg (it is equal to sqrt[(G-C)^2 + (A-T)^2]),
where G, C, A and T are the numbers of the respective nucleotides
at the given codon position.
The first walker starts from the first nucleotide position
of the first codon and jumps every third nucleotide until the end of the
examined sequence has been reached. Similarly, the second and the third
walkers start from the second and third nucleotide positions of the first
codon, respectively. Every jump of a walker is associated with a unit shift
in the two-dimensional space depending on the type of nucleotide visited.
The shifts are: (0,1) for G, (1,0) for A, (0,-1) for C and (-1,0) for T.
Hence, each DNA walk represents "history" of nucleotide composition of
the first, the second or the third position of codons along the DNA sequence.
The three walks together have been called a spider and a single
walk has been called a spider leg.
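The walk described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function names (`spider_leg`, `spider`, `leg_length`) are hypothetical, and the handling of symbols other than A, C, G and T (they leave the walker in place) is an assumption not stated in the text.

```python
import math

# Unit shifts taken from the text: G -> (0,1), A -> (1,0), C -> (0,-1), T -> (-1,0)
SHIFTS = {"G": (0, 1), "A": (1, 0), "C": (0, -1), "T": (-1, 0)}


def spider_leg(seq, position):
    """Walk over every third nucleotide, starting at codon position 0, 1 or 2.

    Returns the list of (x, y) points visited, beginning at the origin.
    """
    x, y = 0, 0
    path = [(x, y)]
    for base in seq.upper()[position::3]:
        # Assumption: symbols other than A, C, G, T do not move the walker.
        dx, dy = SHIFTS.get(base, (0, 0))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def spider(seq):
    """Return the three legs (one per codon position) of a coding sequence."""
    return [spider_leg(seq, position) for position in range(3)]


def leg_length(path):
    """Length of the vector from the origin to the end of a spider leg."""
    x, y = path[-1]
    return math.hypot(x, y)


legs = spider("ATGGCCTAA")
for number, leg in enumerate(legs, start=1):
    print(number, leg[-1], leg_length(leg))
```

The end point of each leg is (A-T, G-C) for that codon position, so the slope and the vector length can be read directly from the last point of the path.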
Fig. 2a shows an example of a spider representing a typical
gene of the yeast genome, SWP73 (YNR023w), a component of SWI/SNF complex
activating transcription. In Fig. 2b a spider representing an intergenic
sequence 921 triplets long is presented.
Fig. 2. Three DNA walks for a) a typical gene of the yeast genome,
SWP73 (YNR023w); b) an intergenic sequence 921 triplets long.
Fig. 3. The genomic spider made for Saccharomyces
It is possible to make a spider not
only for a particular ORF or gene but also for all ORFs (genes) found in the
analyzed genome. To do that, all ORFs of the genome are spliced in tandem,
stop to start. Such spiders are called genomic spiders. Genomic
spiders show graphically the trends in the nucleotide composition of particular
positions in codons. The genomic spider made for spliced yeast genes
is presented in Fig. 3.
It is also possible to make spiders for all ORFs coded by the leading strand or the lagging strand in bacterial genomes. Comparing these spiders, one can easily notice differences in nucleotide composition of genes coded by the two strands. Other genomic spiders and their analyses are shown in the section: Sense-antisense DNA strand asymmetry and Bacterial chromosome asymmetry.
Distribution of ORFs in a torus projection
Spiders depict the nucleotide composition of the three positions in codons, but it is also possible to extract only some numerical information from these plots and use it to characterize whole sets of ORFs. For each ORF we measured (in degrees) the slopes of the vectors determined by the origins and the ends of the spider legs (Fig. 1). The slope is equal to arctan[(G-C)/(A-T)] for a given position in codons. We assumed that the slopes have positive values in the first two quadrants of the plot and negative values in the third and fourth quadrants. This enabled us to construct a plot where each ORF is represented by a point whose co-ordinates are: (x) - the slope of the first leg, and (y) - the slope of the second leg. It is also possible to use the slope of the third leg as one of the two co-ordinates or as the third co-ordinate in three-dimensional space. The distribution of intergenic sequences, all ORFs longer than 100 codons and genes from the yeast genome is presented in Fig. 4.
Fig. 4. Distribution of sequences from the Saccharomyces cerevisiae genome on the torus projection for a) intergenic sequences; b) all ORFs longer than 100 codons; c) genes.
Note that the surfaces of these plots are finite projections of toruses (Fig. 5).
Fig. 5. Distribution of all ORFs longer than 100 codons from the Saccharomyces cerevisiae genome on the torus.
Distributions of different sets of ORFs for other genomes
are presented in the section: Sense-antisense DNA
strand asymmetry.
The distribution of ORFs on the torus projection was the basis
for our method of approximating the total number of protein-coding ORFs
in the yeast genome. See section: Total
number of coding ORFs in the yeast genome.
DNA walks show bacterial chromosome asymmetry
To show DNA compositional bias, different DNA walks and their transformations were
done. Detailed descriptions of DNA walks, their possible interpretation
and nomenclature follow Cebrat and Dudek (1998). To show local
trends independent of coding functions, we performed "detrended DNA walks"
(DDW), in which we eliminated the strong trends resulting from the base composition
of coding ORFs (Cebrat et al., 1997; Cebrat and Dudek, 1998), which mask
the asymmetry of strands introduced by mutational pressure.
To eliminate these "coding trends" we computed for a given ORF the value:
J = N - (F x L), where:
J - is the value of the walker jump for the ORF,
N - is the number of occurrences of the given nucleotide (A, T, G or C) in the
analyzed positions of the ORF,
F - is the frequency of the given nucleotide at the examined
positions in the whole set of analyzed ORFs,
L - is the length of the given ORF in codons.
When intergenic
sequences were analyzed, F was the frequency of the nucleotide in the whole
set of intergenic sequences and L was the length of the visited sequence
in nucleotides.
We applied an analogous
procedure to the analysis of the distribution of codons and amino acids on
the chromosome. In this case, N in the above equation is replaced by the number of the
analyzed codons or coded amino acid residues in a given
ORF, and F by the frequency of the given codon or amino acid in the set
of analyzed ORFs.
We used the J values to make
detrended DNA walks: walking along the chromosome, the walker accumulated
these values.
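The calculation of J and its accumulation along the chromosome can be illustrated as follows. This is a sketch under stated assumptions: ORFs are given as plain strings in chromosome order, each with a length that is a multiple of three, and the function name `detrended_walk` is hypothetical.

```python
from itertools import accumulate


def detrended_walk(orfs, nucleotide, position):
    """Cumulative detrended walk for one nucleotide at one codon position.

    orfs: list of ORF sequences in the order they appear on the chromosome
    (assumed non-empty, each with a length that is a multiple of three).
    """
    analyzed = [orf.upper()[position::3] for orf in orfs]
    counts = [bases.count(nucleotide) for bases in analyzed]  # N for each ORF
    lengths = [len(bases) for bases in analyzed]              # L for each ORF, in codons
    frequency = sum(counts) / sum(lengths)                    # F over the whole set of ORFs
    jumps = [n - frequency * l for n, l in zip(counts, lengths)]  # J = N - (F x L)
    return list(accumulate(jumps))


walk = detrended_walk(["GGGAAA", "AAAAAA"], "G", 0)
print(walk)
```

Since F is computed from the same set of ORFs, the jumps sum to zero and the walk returns to its starting level at the end of the set; only the local departures from the average composition remain visible.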
The idea of
eliminating these trends is shown in Fig. 6, using the Treponema
pallidum genome as an example. In Fig. 6a the "direct" method of sliding
windows was used. Numbers on the y-axis show the number of G in consecutive
600-nucleotide-long sequences. In Fig. 6b numbers on the y-axis indicate differences
between the mean value of G content (red line in Fig. 6a) and the value
found for a given window. In Fig. 6c the values shown in Fig. 6b for the
consecutive windows were accumulated.
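The three steps of Fig. 6 can be sketched as below. The text does not state the sign of the difference in Fig. 6b; the sketch uses the found value minus the mean, by analogy with J = N - (F x L), and this choice, the use of non-overlapping windows and the function names are assumptions.

```python
from itertools import accumulate


def window_counts(seq, nucleotide="G", window=600):
    """Count a nucleotide in consecutive, non-overlapping windows (as in Fig. 6a)."""
    seq = seq.upper()
    return [seq[start:start + window].count(nucleotide)
            for start in range(0, len(seq) - window + 1, window)]


def cumulative_deviation(counts):
    """Deviations from the mean (as in Fig. 6b) and their running sum (as in Fig. 6c)."""
    mean = sum(counts) / len(counts)
    # Assumption: deviation is taken as found value minus mean.
    deviations = [count - mean for count in counts]
    return deviations, list(accumulate(deviations))


# Toy sequence with a window of 4 nucleotides instead of 600.
counts = window_counts("GGGGAAAAGGAA", "G", window=4)
deviations, cumulative = cumulative_deviation(counts)
print(counts, deviations, cumulative)
```

With the opposite sign convention the cumulative plot is simply mirrored about the x-axis, so the positions of its extrema do not change.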
Fig. 6. Elimination of the trend for guanine
for the
Treponema pallidum genome; a) sliding windows, b) deviations
from mean value,
c) cumulative plot.
In Fig. 7 the asymmetry of the Treponema pallidum genome is presented. For other examples of chromosome asymmetry, see sections: Bacterial chromosome asymmetry and Asymmetry of chromosomes in their coding properties.
Subtraction and addition of DNA walks
Fig. 7. Subtraction and addition of DNA walks made for ORFs longer than 150 codons of the Treponema pallidum genome.
The idea of transformations of DNA
walks (subtraction and addition) is shown in Fig. 7.
When the walks for the Crick strand were subtracted from the walks for the Watson strand, the value of the walker jump for each ORF lying on the Crick strand was multiplied by (-1). When the walks for the two strands were added, the walker visited non-overlapping ORFs of both strands as they appeared on the chromosome, scanned them in the proper reading frame and moved according to the result of scanning. When asymmetry is introduced by replication-associated mechanisms into ORFs located on different strands in the same region of the chromosome, the values of asymmetry should have opposite signs, so if we add the values for the two strands they will compensate each other and no effect of leading/lagging asymmetry will be observed. Subtraction of these values, in contrast, will accumulate the effect of asymmetry introduced by replication-associated mechanisms into the leading and lagging strands. At the terminus of replication the trends should reverse, because at this point the strands change their roles from leading to lagging and vice versa. On the other hand, transcription or "coding trends" should introduce a bias independently of the leading or lagging role of the strand. Thus, addition will accumulate these trends, while after subtraction they should diminish or disappear. Note that additions and subtractions are done on detrended DNA walks.
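The two transformations can be illustrated with a short sketch. It assumes that the detrended jump J has already been computed for every ORF, and it represents each ORF as a hypothetical `(start, strand, jump)` tuple with the strand marked "W" (Watson) or "C" (Crick); neither the data layout nor the function name comes from the original work.

```python
from itertools import accumulate


def combine_walks(orfs, mode):
    """Add or subtract the Watson and Crick walks.

    orfs: iterable of (start, strand, jump) tuples, strand being "W" or "C".
    mode: "add" keeps every jump as it is; "subtract" multiplies the jumps
    of Crick-strand ORFs by -1.
    """
    crick_sign = {"add": 1, "subtract": -1}[mode]
    ordered = sorted(orfs, key=lambda orf: orf[0])  # order of appearance on the chromosome
    jumps = [jump if strand == "W" else crick_sign * jump
             for _, strand, jump in ordered]
    return list(accumulate(jumps))


# Toy data: Watson ORFs carry positive jumps, the Crick ORF a negative one,
# as expected for a replication-associated bias in one region of the chromosome.
orfs = [(100, "W", 2.0), (300, "C", -1.5), (200, "W", 1.0)]
print(combine_walks(orfs, "add"))
print(combine_walks(orfs, "subtract"))
```

In the toy data the jumps of opposite sign partly cancel in the added walk and reinforce each other in the subtracted walk, which is the behaviour described above for replication-associated asymmetry.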
Fig. 8. DNA walks made for ORFs longer
There are two significantly different classes
of DNA walks analyzing coding sequences. The DNA walks of the first class
are performed on the scale of the chromosome (Fig. 8a). In these walks, numbers
on the x-axis represent the real co-ordinates of ORFs on the chromosome. Note
that detrended walks done on the scale of the chromosome lose their information
on the asymmetry in the total length of ORFs on leading and lagging strands
(coding density). That is why addition of these walks done for ORFs of
the Watson and Crick strands eliminates the effect of replication-associated
mutational pressure and does not depend on differences in coding density
of leading versus lagging strands.
It is also possible to make a DNA walk on spliced ORFs (Fig. 8b). All ORFs lying only on one strand (Watson or Crick) in their proper order are spliced together stop to start. In this analysis the x-axis is scaled in the numbers representing co-ordinates of the spliced sequence, not the chromosome. In case of a bacterial chromosome, when ORFs of one strand only are spliced (Watson strand for example), the walk can show the asymmetry in coding density between leading and lagging strands. The shifts of the extrema in Fig. 8b are the measure of the coding density differences between leading and lagging strands (see also section: Bacterial chromosome asymmetry).
Such transformations of DNA walks make it possible to show asymmetry in bacterial chromosomes and asymmetry of chromosomes in their coding properties. They also make it possible to distinguish between the mutational pressure associated with replication and the mutational pressure associated with transcription and/or with other mechanisms introducing asymmetry into prokaryotic chromosomes.