Cracking the Second Genetic Code: Sequence Patterns in Noncoding DNA
Jeff Elhai
Center for the Study of Biological Complexity
Forty years ago, a series of experiments culminated in the elucidation of the genetic code, a means of
translating the information within the DNA of genes into the information within protein. We now know
that only 3% of human DNA is within genes. How do we understand the remaining 97%? The situation is
somewhat more comfortable with bacterial genomes, where 70 to 80% of the DNA encodes protein, but
the puzzle remains as to the function of the noncoding DNA.
All possible tetrameric, pentameric, and hexameric sequences were counted within the genome of
Nostoc PCC 7120 and compared with statistical expectation (Fig. 1). Some were profoundly
underrepresented, and most of these turned out to be palindromes, double-stranded DNA sequences
that are the same read left-to-right on one strand as right-to-left on the other (e.g. GGATCC
paired with CCTAGG on the complementary strand). A large fraction of highly underrepresented
hexamers are recognition sites for restriction enzymes encoded by Nostoc PCC 7120 or other
Nostoc-like strains. Since restriction enzymes are used by bacteria as a defense mechanism against
foreign DNA, this result suggests that these cyanobacteria have freely exchanged DNA in recent
evolutionary time and calls into question the traditional definition of species as it applies to
cyanobacteria.
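The palindrome definition above can be checked mechanically: a sequence is palindromic in this sense when it equals its own reverse complement. The following is a minimal sketch, not the authors' code; the function names are hypothetical and only the four unambiguous bases A, C, G, T are assumed.

```python
from itertools import product

# Watson-Crick pairing used to build the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if the sequence reads the same on both strands."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


def palindromic_kmers(k: int) -> list[str]:
    """List every palindromic sequence of length k over A, C, G, T."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in kmers if is_palindrome(kmer)]


if __name__ == "__main__":
    print(is_palindrome("GGATCC"))       # the example from the text
    print(len(palindromic_kmers(6)))     # number of palindromic hexamers
```

Note that no odd-length sequence can pass this test, because the middle base would have to be its own complement; among the three lengths counted, only tetramers and hexamers can be palindromic.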
A time-honored method of breaking codes is to catalog those symbols that appear much less frequently
than expected and also those that appear much more frequently than expected. My colleagues and I have
pursued this strategy, focusing on the genomes of cyanobacteria of the genus Nostoc, which are
among the largest bacterial genomes known.
[Figure 1 plot. x-axis: Bias (Rho), 0 to 1.75; y-axis: Frequency of occurrence. Series: palindromic
hexamers and nonpalindromic hexamers. Regions labeled Underrepresented and Overrepresented.]
Figure 1. Distribution of hexameric DNA sequences in the genome of Nostoc PCC 7120.
Nonpalindromic (red, dashed) and palindromic (green, solid) sequences were evaluated as to their
frequency relative to the frequencies of all included sequences, measured by the statistic rho.
Thick lines represent sequences that are recognized by restriction enzymes found in some strain of
Nostoc-like cyanobacteria.
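The counting and comparison with expectation can be sketched as follows. The text does not define rho, so this example assumes the simplest possible stand-in: the observed count of a k-mer divided by the count expected from single-base frequencies alone. The function names are hypothetical, only one strand is scanned, and the actual statistic used for Figure 1 may well differ.

```python
from collections import Counter
from itertools import product
from math import prod


def count_kmers(genome: str, k: int) -> Counter:
    """Count every overlapping window of length k on the given strand."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def rho_like_bias(genome: str, k: int) -> dict[str, float]:
    """Observed/expected ratio per k-mer (ASSUMED stand-in for rho).

    Expected counts come from the product of single-base frequencies,
    an assumption of this sketch rather than the source's definition.
    """
    genome = genome.upper()
    n = len(genome)
    base_freq = {b: genome.count(b) / n for b in "ACGT"}
    observed = count_kmers(genome, k)
    windows = n - k + 1
    bias = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        expected = windows * prod(base_freq[b] for b in kmer)
        if expected > 0:
            # Values below 1 are underrepresented, above 1 overrepresented.
            bias[kmer] = observed[kmer] / expected
    return bias


if __name__ == "__main__":
    toy_genome = "ACGT" * 50  # toy input, not real Nostoc sequence
    bias = rho_like_bias(toy_genome, 2)
    for kmer in sorted(bias, key=bias.get)[:3]:
        print(kmer, round(bias[kmer], 2))
```

A more careful null model would correct for the frequencies of shorter subsequences (for example with a Markov model), which matters when judging hexamers in a genome with skewed dinucleotide content.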
Sequences that appear more frequently than statistically expected have also pointed to significant
biological functions. The genome of Nostoc punctiforme was found to possess more than a thousand
instances of tandem heptameric repeats and hundreds of dispersed repeats, typically 24 bp in length.
These 24-bp repeats were usually flanked by tandem heptameric repeats. We have found many instances
where the 24-bp repeats have evidently served as sites for intragenomic recombination (Fig. 2). Such
rampant recombination would be expected to randomize gene order within the genome, and we have
indeed observed this by comparing gene order between the genomes of different Nostoc strains. Repeated
sequences are therefore likely to play an important role in the evolution of the genome.
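A search for tandem heptameric repeats can be sketched with a regular expression. This example uses the consensus yAAAATy given in the caption of Figure 2 and assumes that y follows the IUPAC convention (pyrimidine, C or T). The function names are hypothetical, and the test string is sequence A from Figure 2 rather than a whole genome.

```python
import re

# ASSUMPTION: "y" in yAAAATy is the IUPAC code for a pyrimidine (C or T).
HEPTAMER = "[CT]AAAAT[CT]"

# Sequence A as printed in Figure 2.
SEQ_A = "TTAATCTAAAATCCAAAATGGTTCGACTGAGCGAAGCCGAAGTCCAAAATCCAAAATTAGGGGA"


def find_heptamers(seq: str) -> list[tuple[int, str]]:
    """Return (start, match) for every copy of the motif, overlaps included."""
    pattern = re.compile(f"(?=({HEPTAMER}))")  # lookahead permits overlaps
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


def find_tandem_runs(seq: str, min_copies: int = 2) -> list[tuple[int, str]]:
    """Return (start, run) for back-to-back runs of at least min_copies."""
    pattern = re.compile(f"(?:{HEPTAMER}){{{min_copies},}}")
    return [(m.start(), m.group(0)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    for start, motif in find_heptamers(SEQ_A):
        print(start, motif)
    for start, run in find_tandem_runs(SEQ_A):
        print("tandem run at", start, run)
```

An exact-match pattern misses degenerate copies, such as a heptamer that ends in a purine, so a real survey would probably allow a mismatch or two per copy.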
heptamer repeats |← 24-bp repeat →| heptamer repeats
A  ...TTAATCTAAAATCCAAAATGGTTCGACTGAGCGAAGCCGAAGTCCAAAATCCAAAATTAGGGGA...
B  ...ATGCTGTGCGATGTCTACGATGGTCACTGAGCGCAGCCGTACCACTTCGTGGAAGCAAGCTACG...
C  ...TCAATCTAAAATCCAAAATTGTTCGACTGAGCGTAGCCGTACCCCTCCGGGGAAGCAAGCTACG...
Figure 2. Recombination between dispersed instances of a 24-bp repeat in the genome of Nostoc punctiforme.
Three highly separated regions of the genome are shown. Yellow and gray highlight a sequence found in the genome
approximately 200 times. Magenta and purple highlight a 7-bp tandem repeat (yAAAATy) that frequently flanks the
24-bp repeat. Dark cyan highlights a nonrepetitive sequence sometimes found flanking the 24-bp repeat.
Recombination between sequence A and B can explain the structure of sequence C.
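The claim that recombination between A and B can explain C can be illustrated with a simple crossover scan: for each possible breakpoint, score how well C matches A to the left and B to the right. This is a minimal sketch under a strong assumption, namely that the three excerpts are aligned position by position exactly as printed, with no gaps. The function names are hypothetical, and the scan is an illustration rather than the method used in the study.

```python
# Sequences as printed in Figure 2 (flanking "..." omitted).
SEQ_A = "TTAATCTAAAATCCAAAATGGTTCGACTGAGCGAAGCCGAAGTCCAAAATCCAAAATTAGGGGA"
SEQ_B = "ATGCTGTGCGATGTCTACGATGGTCACTGAGCGCAGCCGTACCACTTCGTGGAAGCAAGCTACG"
SEQ_C = "TCAATCTAAAATCCAAAATTGTTCGACTGAGCGTAGCCGTACCCCTCCGGGGAAGCAAGCTACG"


def matches(x: str, y: str) -> int:
    """Number of positions at which two aligned strings agree."""
    return sum(a == b for a, b in zip(x, y))


def best_crossover(left_parent: str, right_parent: str, child: str) -> tuple[int, int]:
    """Find the breakpoint k that best explains child as left[:k] + right[k:].

    ASSUMPTION: all three strings are the same length and already aligned.
    Returns (k, score); ties resolve to the smallest k.
    """
    scored = [
        (matches(child[:k], left_parent[:k]) + matches(child[k:], right_parent[k:]), k)
        for k in range(len(child) + 1)
    ]
    score, k = max(scored, key=lambda item: item[0])
    return k, score


if __name__ == "__main__":
    k, score = best_crossover(SEQ_A, SEQ_B, SEQ_C)
    print("best breakpoint:", k, "matching positions:", score, "of", len(SEQ_C))
    print("C vs A alone:", matches(SEQ_C, SEQ_A))
    print("C vs B alone:", matches(SEQ_C, SEQ_B))
```

If the chimera hypothesis holds, the best crossover score should clearly exceed the score of C against either parent alone, and the breakpoint should fall within the shared 24-bp repeat, where A and B are too similar to pin it down more precisely.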