S. Cebrat, M. R. Dudek, A. Gierlik, M. Kowalczuk, P. Mackiewicz, 1999, Effect of replication on the third base of codons. Physica A, 265(1-2), 78-84.
P. Mackiewicz, A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat, 1999, How does replication-associated mutational pressure influence amino acid composition of proteins? Genome Research, 9(5), 409-416. www.genome.org
P. Mackiewicz, A. Gierlik, M. Kowalczuk, M. R. Dudek, S. Cebrat, 1999, Asymmetry of nucleotide composition of prokaryotic chromosomes. Journal of Applied Genetics, 40(1), 1-14.
Traditional DNA walks supply a lot of information on the chemical structure of DNA. Nevertheless, DNA walks can easily be modified in a way that enables logical analyses of DNA.
The most important approach is the analysis of the coding capacity of DNA and of the coding properties of genomes. We have already shown how DNA walks can depict differences in coding density between the leading and lagging DNA strands (see DNA walks). However, our walker can recognise any feature of the analysed sequence, not only nucleotides. It can recognise individual codons, or classes of codons grouped according to their nucleotide composition or coding sense. Thus, we can obtain information on many specific properties of genomes.
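To illustrate a walker that recognises codons rather than single nucleotides, here is a minimal Python sketch. The names `codon_walk` and `gc3_step` are hypothetical, and the "up for G/C at the third position" rule is only an example classifier, not one of the rules used in our analyses; any function mapping a codon to +1, -1 or 0 can be plugged in.

```python
from typing import Callable, List


def codon_walk(seq: str, step: Callable[[str], int], frame: int = 0) -> List[int]:
    """Cumulative walk over the codons of `seq` read in the given frame.

    `step` maps a codon (e.g. "ATG") to the walker's move: +1 (up),
    -1 (down) or 0 (stay). An incomplete trailing codon is ignored.
    """
    position = 0
    walk = [0]
    for i in range(frame, len(seq) - 2, 3):
        position += step(seq[i:i + 3].upper())
        walk.append(position)
    return walk


def gc3_step(codon: str) -> int:
    # Hypothetical example rule: up for codons ending in G or C, down otherwise.
    return 1 if codon[2] in "GC" else -1


print(codon_walk("ATGGCCAAATTTGGG", gc3_step))  # prints [0, 1, 2, 1, 0, 1]
```

Grouping codons by nucleotide composition or by coding sense only changes the `step` function; the walking itself stays the same.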
In the section DNA walks, we have shown that in many bacterial genomes there is a specific asymmetry in nucleotide composition between the leading and lagging strands. Many mechanisms introduce, or may introduce, compositional asymmetry into a DNA molecule. A random DNA sequence should not exhibit any statistically significant compositional bias between the two complementary strands. Nevertheless, some processes do not treat the two strands of a natural DNA molecule equally.
One of these processes is replication. The main cause of the unequal fidelity of leading- and lagging-strand replication is still not clear. It is controversial whether replication of only one or of both strands is discontinuous (Okazaki et al., 1968; Kornberg and Baker, 1992; Wang and Chen, 1992, 1994). Nevertheless, the topology of the replication fork itself requires the involvement of different enzymatic mechanisms in the replication of each DNA strand (Kunkel, 1992; Waga and Stillman, 1994). Besides the above-mentioned mechanisms, differences in the processivity of leading- and lagging-strand replication may be responsible for the differential accuracy of DNA replication of these two strands (Fijalkowska et al., 1998). Thus, both strands are exposed to different mutational pressures, and as a result a compositional bias has been found between them (Lobry, 1996a, 1996b; Blattner et al., 1997; Mrazek and Karlin, 1998; Grigoriev, 1998; Freeman et al., 1998; McLean et al., 1998).
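The statement that a random sequence should show no significant strand bias can be checked with a simple skew statistic. The sketch below uses the GC skew, (G - C)/(G + C), together with a normal-approximation z-score; this particular test and the function name `gc_skew_z` are our illustrative choices, not the procedure of the cited papers.

```python
import math
import random


def gc_skew_z(seq: str) -> tuple[float, float]:
    """Return the GC skew (G - C) / (G + C) of one strand and a z-score.

    Under the null hypothesis of no strand bias, each G/C position is G or C
    with probability 1/2, so G - C has standard deviation sqrt(G + C)
    (normal approximation to the binomial). Expects an uppercase sequence.
    """
    g = seq.count("G")
    c = seq.count("C")
    n = g + c
    if n == 0:
        return 0.0, 0.0
    return (g - c) / n, (g - c) / math.sqrt(n)


random.seed(1)
random_seq = "".join(random.choice("ACGT") for _ in range(100_000))
skew, z = gc_skew_z(random_seq)
print(f"skew={skew:.4f}, z={z:.2f}")  # |z| is typically below about 2 for a random sequence
```

Because G on one strand pairs with C on the other, a skew on one strand is exactly mirrored on the complementary strand, so a single strand suffices for the test.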
The asymmetry in the nucleotide composition of DNA raises a question: are there any differences in the amino acid composition of proteins coded by genes located on the leading and lagging strands?
It is possible to obtain a kind of "degenerate" information about the influence of replication-associated mutational pressure on amino acid composition. Some substitutions in the third codon positions, e.g. almost all transitions, are silent, but others are not and belong to the class of missense mutations. If we assume that most of the accumulated mutations are in the four-fold degenerate codons, where every mutation in the third position is silent, we should find differences in the accumulation of mutations in codons where transversions in the third position are missense (two-fold degenerate codons). To check this, we have performed separate walks on the third positions of two-fold and four-fold degenerate codons. Both classes of codons accumulate mutations, and some of these mutations (transversions in two-fold degenerate codons) are of the missense class.

In Fig. 1 we present the subtraction of DNA walks (see DNA walks) for many bacterial genomes. In these walks the walker moved up when the analysed nucleotide in the third codon position was a purine and down when it was a pyrimidine. Each plot shows walks done separately on two-fold degenerate codons (blue lines) and four-fold degenerate codons (red lines). For each genome, the two plots were normalised so that the shapes of the curves for both kinds of codons can be compared. We can observe two different relations. In the Chlamydia trachomatis, Escherichia coli and Haemophilus influenzae genomes, the accumulation of transversions in two-fold degenerate codons is almost exactly the same as in four-fold degenerate codons. On the other hand, in the Borrelia burgdorferi genome the number of substitutions accumulated in the two-fold degenerate codons is four times lower than the number of mutations accumulated in the four-fold degenerate codons.
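The split into two-fold and four-fold degenerate codons, and the purine/pyrimidine walk on their third positions, can be sketched in Python from the standard genetic code. Here "two-fold" is defined operationally (the third-position transition is silent, both transversions change the encoded amino acid or give a stop), which yields 24 two-fold and 32 four-fold sense codons. The function names and the normalisation (dividing by the number of codons visited) are hypothetical; the text does not specify how the plots were normalised.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}  # purine<->purine, pyrimidine<->pyrimidine


def degeneracy_class(codon: str) -> str:
    """Classify an uppercase ACGT codon by the effect of third-position substitutions."""
    aa = CODE[codon]
    if aa == "*":
        return "stop"
    stem, third = codon[:2], codon[2]
    if all(CODE[stem + b] == aa for b in BASES):
        return "four-fold"  # every third-position substitution is silent
    transition_silent = CODE[stem + TRANSITION[third]] == aa
    transversions = [b for b in BASES if b not in (third, TRANSITION[third])]
    if transition_silent and all(CODE[stem + b] != aa for b in transversions):
        return "two-fold"  # transition silent, both transversions change the amino acid (or give a stop)
    return "other"  # e.g. Ile (three-fold), Met and Trp (non-degenerate)


def third_position_walk(codons, wanted: str) -> list[float]:
    """Purine (+1) / pyrimidine (-1) walk on third positions of one codon class.

    The walk is divided by the number of codons visited (hypothetical
    normalisation) so that walks on different classes can be overlaid.
    """
    pos, walk = 0, [0]
    for codon in codons:
        if degeneracy_class(codon) == wanted:
            pos += 1 if codon[2] in "AG" else -1
            walk.append(pos)
    n = max(len(walk) - 1, 1)
    return [w / n for w in walk]
```

For example, `third_position_walk(codons, "two-fold")` and `third_position_walk(codons, "four-fold")` on the same codon list give the two curves to be compared.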
Note: the plots begin at the origin of replication (also for the linear B. burgdorferi genome).
Fig. 1. Detrended DNA walks (subtraction of walks on C strands from walks on W strands) on two-fold degenerate (blue) and four-fold degenerate (red) codons. Walkers move up when the visited nucleotide at the third position of the codon is a purine and down when it is a pyrimidine. Note: in two-fold degenerate codons all transversions are missense mutations. Numbers on the x-axis represent positions on the chromosome in bp.
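The subtraction (and addition) of walks done on genes of the W and C strands can be sketched as follows. This is a minimal Python illustration under stated assumptions: genes are given as hypothetical `(start, strand, cds)` tuples, each codon contributes its step at its chromosomal position, positions are rotated so that the walk starts at the origin of replication (the modulo is meant for circular chromosomes), and the walk is accumulated in bins of `bin_size` bp. All function names are illustrative.

```python
from itertools import accumulate


def codon_events(genes, step):
    """Yield (strand, chromosome position, step) for every codon of every gene.

    genes: iterable of (start, strand, cds), where start is the leftmost
    0-based coordinate on the W strand and strand is "W" or "C". For "C"
    genes the cds is given 5'->3' (already reverse-complemented), so its
    codons are laid out from right to left on the chromosome.
    """
    for start, strand, cds in genes:
        for k in range(len(cds) // 3):
            codon = cds[3 * k:3 * k + 3]
            if strand == "W":
                pos = start + 3 * k
            else:
                pos = start + len(cds) - 3 * (k + 1)
            yield strand, pos, step(codon)


def binned_walk(events, genome_length, origin, bin_size):
    """Cumulative walk along the chromosome, starting at the origin."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    bins = [0] * n_bins
    for pos, s in events:
        rel = (pos - origin) % genome_length  # rotate so the walk starts at ori
        bins[rel // bin_size] += s
    return list(accumulate(bins))


def detrended_and_added(genes, step, genome_length, origin, bin_size=1000):
    w_events, c_events = [], []
    for strand, pos, s in codon_events(genes, step):
        (w_events if strand == "W" else c_events).append((pos, s))
    w = binned_walk(w_events, genome_length, origin, bin_size)
    c = binned_walk(c_events, genome_length, origin, bin_size)
    difference = [a - b for a, b in zip(w, c)]  # W - C: emphasises leading/lagging effects
    total = [a + b for a, b in zip(w, c)]  # W + C: leading/lagging effects largely cancel
    return difference, total


def purine_step(codon: str) -> int:
    # Fig. 1 rule: up for a purine at the third codon position, down for a pyrimidine.
    return 1 if codon[2] in "AG" else -1
```

To restrict the walk to one codon class, the step function can simply return 0 for codons outside that class.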
Since even in the third positions a transversion can change the encoded amino acid, we have performed walks on amino acids coded by ORFs lying on the two DNA strands, and we have subtracted and added the resulting walks to separate the effect of replication-associated mutational pressure from the effect of transcription and/or other effects. Fig. 2 shows the effect of replication on the amino acid composition of proteins coded by genes lying on the leading and lagging strands of many bacterial genomes. Analysing the subtracted walks, we have found amino acids that prevail on the leading or on the lagging strand in different genomes. In the genomes of E. coli, B. subtilis, T. pallidum, B. burgdorferi and C. trachomatis, Gly, Val and Asp were relatively more frequently coded on the leading strand, while Ile, Thr and His were more frequently coded on the lagging strand. Nevertheless, eubacterial genomes differ significantly in the prevalence of specific amino acids on the leading or lagging strands. These results show that the previously reported skew in the prevalence of some codons in genes transcribed in the direction of replication (Fraser et al., 1998) is connected with replication-associated mutational pressure.
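A walk on one amino acid can be sketched by translating ORFs with the standard genetic code and moving the walker at each codon. The centring used here (up by 1 - f when the codon encodes the amino acid, down by f otherwise, with f a background frequency) is a hypothetical choice that keeps the walk flat when composition matches the background; the text does not state the exact step rule. Function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(cds: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def aa_walk(cds_list, aa: str, background: float) -> list[float]:
    """Walk on one amino acid (one-letter code) over consecutive genes.

    The walker moves up by (1 - f) at each codon encoding `aa` and down by f
    otherwise, where f is the background frequency of `aa`; a gene set whose
    composition matches the background therefore gives a flat walk.
    """
    pos, walk = 0.0, [0.0]
    for cds in cds_list:
        for residue in translate(cds.upper()):
            pos += (1 - background) if residue == aa else -background
            walk.append(pos)
    return walk
```

Walks computed in this way for genes of the W and C strands can then be subtracted and added, as for the nucleotide walks in Fig. 1.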
[Fig. 2: one panel per amino acid: Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val]
Fig. 2. The effect of subtraction of walks on amino acids for eight prokaryotic genomes. Numbers on the y-axis indicate the relative cumulative abundance of the amino acids. Numbers on the x-axis represent positions on the chromosome in triplets.
In all the examined genomes, no significant effects on protein composition other than those connected with the leading/lagging role of DNA strands have been observed. However, in large genomes (E. coli and B. subtilis), addition of DNA walks done for ORFs from the W and C strands differentiates regions proximal and distal to the origin of replication of the chromosome (Fig. 3). Note that replication-associated effects divide chromosomes into two replichores, left and right, with extrema in the centre of the plots. The other effects that we have observed are connected with the proximal/distal parts of chromosomes, with extrema near the middle of the replichores. The trends at the left and right ends of the plot (Fig. 3) are the same and opposite to the trends in the central part of the plot. The central part of the plot corresponds to the region close to the terminus of replication (from both sides), and both ends of the plot correspond to regions close to the origin of replication (from both sides).
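The division into replichores, and into origin-proximal and origin-distal regions, can be made explicit with a coordinate transformation. A minimal sketch, assuming a circular chromosome with known origin and terminus positions; the function names, the leading-strand convention as coded (genes co-oriented with fork movement count as leading) and the 0.5 cutoff between proximal and distal (chosen only because the observed extrema lie near the middle of the replichores) are illustrative assumptions.

```python
def replichore_position(pos: int, genome_length: int, origin: int, terminus: int):
    """Return (replichore, fractional distance from the origin: 0 at ori, 1 at ter).

    Assumes a circular chromosome; the right replichore runs from the origin
    to the terminus in the direction of increasing coordinates.
    """
    right_len = (terminus - origin) % genome_length
    rel = (pos - origin) % genome_length
    if rel <= right_len:
        return "right", rel / right_len
    return "left", (genome_length - rel) / (genome_length - right_len)


def is_leading(strand: str, replichore: str) -> bool:
    # Assumed convention: a gene lies on the leading strand when it is co-oriented
    # with fork movement, i.e. W-strand genes in the right replichore and
    # C-strand genes in the left replichore.
    return (strand == "W") == (replichore == "right")


def origin_region(pos: int, genome_length: int, origin: int, terminus: int, cutoff: float = 0.5):
    """Label a position as proximal or distal to the origin (cutoff is hypothetical)."""
    arm, d = replichore_position(pos, genome_length, origin, terminus)
    return arm, "proximal" if d < cutoff else "distal"
```

With this labelling, subtraction of W and C walks follows the leading/lagging switch at the origin and terminus, whereas the proximal/distal effects seen after addition switch near the middle of each replichore.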
Fig. 3. Addition of walks on the B. subtilis genome done for codons coding for twelve amino acids with significant proximal/distal trends. Numbers on the x-axis represent positions on the chromosome in bp.