Research on Mitochondrial Genomes
Lectures for 4Y03
Paul Higgs, Dept. of Physics, McMaster University, Hamilton, Ontario.
Supported by Canada Research Chairs and BBSRC.

1. Building a database for mitochondrial genomes.
2. Large scale: gene order evolution.
3. Medium scale: sequence evolution. Molecular phylogenetics.
4. Small scale: mutation and selection. Variation in base and amino acid frequencies. Codon usage.
5. Genetic code evolution.

People:
1. Wenli Jia, Bin Tang, Daniel Jameson
2. Howsun Jow, Magnus Rattray, Cendrine Hudelot, Vivek Gowri-Shankar, Xiaoguang Yang
3. Wei Xu, Daniel Jameson
4. Daniel Urbina, Wenli Jia
5. Supratim Sengupta

Mitochondria are organelles inside eukaryotic cells. They are the site of oxidative phosphorylation and ATP synthesis. They also contain their own genome, which is distinct from the DNA in the nucleus. Typical animal mitochondrial genomes are short and circular (~16,000 bases). They usually contain genes for 2 rRNAs, 22 tRNAs and 13 proteins.

An example of a GenBank file: the complete mitochondrial genome of the alligator.

  LOCUS       NC_001922   16646 bp   DNA   circular VRT 20-SEP-2002
  DEFINITION  Alligator mississippiensis mitochondrion, complete genome.
  ACCESSION   NC_001922
  VERSION     NC_001922.1  GI:5835540
  KEYWORDS    .
  SOURCE      mitochondrion Alligator mississippiensis (American alligator)
    ORGANISM  Alligator mississippiensis
              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
              Archosauria; Crocodylidae; Alligatorinae; Alligator.
  REFERENCE   1  (bases 1 to 16646)
    AUTHORS   Janke,A. and Arnason,U.
    TITLE     The complete mitochondrial genome of Alligator mississippiensis and
              the separation between recent archosauria (birds and crocodiles)
    JOURNAL   Mol. Biol. Evol. 14 (12), 1266-1272 (1997)
    MEDLINE   98066357
    PUBMED    9402737
  FEATURES             Location/Qualifiers
       source          1..16646
                       /organism="Alligator mississippiensis"
                       /organelle="mitochondrion"
                       /mol_type="genomic DNA"
                       /db_xref="taxon:8496"
                       /tissue_type="liver"
                       /dev_stage="adult"
       rRNA            1..976
                       /product="12S ribosomal RNA"
       tRNA            977..1044
                       /product="tRNA-Val"
                       /anticodon=(pos:1009..1011,aa:Val)
       rRNA            1046..2635
                       /product="16S ribosomal RNA"
       tRNA            2636..2710
                       /product="tRNA-Leu"
                       /note="codons recognized: UUR"
                       /anticodon=(pos:2672..2674,aa:Leu)
       gene            2711..3676
                       /gene="ND1"
       CDS             2711..3676
                       /gene="ND1"
  ORIGIN
          1 caacagactt agtcctggtc
  (Sequence listing continues to base 1080 on the slide.)

OGRe (Organellar Genome Retrieval) is a relational database.
OGRe is available at http://ogre.mcmaster.ca.
- It holds more than 800 complete animal mitochondrial genomes.
- It provides an efficient means of storing and retrieving information.
- It uses PostgreSQL.
- Its schema defines the relationships between the different types of information:

  fileindex: genome_code, offst, filename
  citations: genome_code, medline_code
  species: species_code, classification, group_name, latin_name, common_name
  classification: group_name, parent, alternative_parent, has_children
  genome: genome_code, species_code, genome_type, ncbiac, description, ncbidate, accession_date, lastmodified, lastmodifiedby, notes, genome_length, genetic_code, a_content, c_content, g_content, t_content, final, gene_order, gene_order_notrna, rna_a_content, rna_c_content, rna_g_content, rna_t_content
  codon_usage: genome_code, codon, strand, usage
  feature: feature_id, genome_code, type, feature_name, notes, description, alignment_file
  feature_location: feature_id, start, stop, strand
  trna: feature_id, amino_acid, anticodon, codon
  feature_descriptions: feature_name, description

The OGRe front page is at http://ogre.mcmaster.ca. Sequence information for OGRe is taken from GenBank, and we aim to keep up to date with publicly available animal mitochondrial genomes. Species may be selected individually from an alphabetical list, or taxa may be selected from a hierarchy. In the example shown, the arthropods have been expanded and the myriapods and crustaceans have been selected.

Large Scale: Evolution of Gene Order in Whole Genomes
On the OGRe web site, any two selected species can be compared visually. Colour indicates conserved blocks of genes. The alligator and bird genomes differ by the interchange of two tRNA genes (red and yellow) and by the translocation of the two genes in the blue block.

Genome reshuffling mechanisms:
- Inversions: A B C D → A -C -B D
- Translocations: A (B C) D → A D B C
- Duplications and deletions: A B C D → A B C B C D → A C B D (the block B C is duplicated, then the first B and the second C are deleted)

(Figures: an example of an inversion; an example of a translocation.)
The T and -P genes are duplicated in Cordylus warreni.
If the first T and the second -P were deleted, the relative positions of T and -P would change.

Sometimes things go crazy. Drosophila and Thrips are both insects, yet there are 30 breakpoints between them for only 37 genes. In other words, their gene orders have almost nothing in common.

OGRe stores gene orders as strings, which allows them to be searched and compared. 231 unique gene orders have been found in 858 species. The standard vertebrate order is shared by 398 species, including humans. Many other species have unique gene orders. Some lineages conserve gene order over hundreds of millions of years, while others are scrambled within a few million.

Still to do (new project):
- estimate the relative rates of different rearrangement processes
- predict the most likely ancestral gene orders
- use gene order evidence in phylogenetics

Medium Scale: Sequence Alignments and Phylogenetics
Part of a sequence alignment of the mitochondrial small subunit rRNA (full gene length ~950) for 11 primate species, with mouse as the outgroup. The numbers at the right give the number of ungapped bases in each row.

  Mouse       CUCACCAUCUCUUGCUAAUUCAGCCUAUAUACCGCCAUCUUCAGCAAACCCUAAAAAGG-UAUUAAAGUAAGCAAAAGA  78
  Lemur       CUCACCACUUCUUGCUAAUUCAACUUAUAUACCGCCAUCCCCAGCAAACCCUAUUAAGGCCC-CAAAGUAAGCAAAAAC  78
  Tarsier     CUUACCACCUCUUGCUAAUUCAGUCUAUAUACCGCCAUCUUCAGCAAACCCUAAUAAAGGUUUUAAAGUAAGCACAAGU  79
  SakiMonkey  CUUACCACCUCUUGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCUA-UAAUGACAGUAAAGUAAGCACAAGU  76
  Marmoset    CUCACCACGUCUAGCC-AU-CAGCCUGUAUACCGCCAUCUUCAGCAAACUCCU-UAAUGAUUGUAAAGUAAGCAGAAGU  76
  Baboon      CCCACCCUCUCUUGCU----UAGUCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACGAAGUGAGCGCAAAU  75
  Gibbon      CUCACCAUCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACAAAGGCUAUAAAGUAAGCACAAAC  75
  Orangutan   CUCACCACCCCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCCACGAAGUAAGCGCAAAC  75
  Gorilla     CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGACGAAGGCCACAAAGUAAGCACAAGU  75
  PygmyChimp  CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU  75
  Chimp       CUCACCGCCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGUUACAAAGUAAGCGCAAGU  75
  Human       CUCACCACCUCUUGCU----CAGCCUAUAUACCGCCAUCUUCAGCAAACCCUGAUGAAGGCUACAAAGUAAGCGCAAGU  75

(Figure: phylogeny of mammals with complete mitochondrial genomes, Hudelot et al. 2003.) Two models were used simultaneously, one for single sites and one for pairs of sites. This gives a total of 3571 sites = 1637 single sites + 967 pairs (1637 + 2 × 967 = 3571). The tree shows the Afrotheria / Laurasiatheria grouping and some striking examples of convergent evolution.

Arthropod phylogenetics is very difficult because rates of evolution vary strongly between species.
(Figure: tRNA tree, with branch lengths optimized on a fixed consensus topology; scale bar 0.1.) Long-branch species are problematic if the tree is not fixed.
(Figure: protein tree, with branch lengths optimized on the same fixed consensus topology; scale bar 0.1.) The same species are on long branches in the protein tree as in the RNA tree.
Insect images courtesy of University of Nebraska, Dept. of Entomology, http://entomology.unl.edu/images/.

Relative rate test for sequence evolution (Templeton)
Take three aligned sequences, with 0 known to be the outgroup, and test whether the rates of evolution in branch 1 and branch 2 are equal. Calculate:
m1 = number of sites where 0 and 2 are the same and 1 is different.
m2 = number of sites where 0 and 1 are the same and 2 is different.
$\chi_m^2 = \frac{(m_1 - m_2)^2}{m_1 + m_2}$
If the rates are equal, this statistic should follow a chi-squared distribution with one degree of freedom. Many pairs of related species turn out to have different rates in their mitochondrial sequences.

Gene order sometimes gives evidence of phylogenetic relationships. The gene order of the ancestral arthropod is thought to be the same as that of the horseshoe crab Limulus (image courtesy of Marine Biology Lab, Woods Hole, www.mbl.edu/animals/Limulus). The same translocation of tRNA-Leu is found in insects and crustaceans, but not in myriapods and chelicerates.
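The counting step of the relative rate test can be sketched in Python. This is a minimal illustration, not the code used in the original study. The function names are hypothetical, and alignment columns that contain a gap are skipped, which is an assumption. The p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds for one degree of freedom.

```python
"""Templeton-style relative rate test on three aligned sequences (sketch)."""
from math import erfc, sqrt


def relative_rate_counts(seq0: str, seq1: str, seq2: str, gap: str = "-") -> tuple[int, int]:
    """Return (m1, m2) for outgroup seq0 and ingroup sequences seq1, seq2.

    m1: sites where 0 and 2 agree and 1 differs (a change on branch 1)
    m2: sites where 0 and 1 agree and 2 differs (a change on branch 2)
    Columns containing a gap in any sequence are skipped (an assumption).
    """
    if not (len(seq0) == len(seq1) == len(seq2)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for b0, b1, b2 in zip(seq0.upper(), seq1.upper(), seq2.upper()):
        if gap in (b0, b1, b2):
            continue
        if b0 == b2 and b1 != b0:
            m1 += 1
        elif b0 == b1 and b2 != b0:
            m2 += 1
    return m1, m2


def chi2_one_df(k1: int, k2: int) -> tuple[float, float]:
    """Chi-squared statistic (k1 - k2)^2 / (k1 + k2) and its 1-df p-value."""
    if k1 + k2 == 0:
        return 0.0, 1.0
    chi2 = (k1 - k2) ** 2 / (k1 + k2)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

For example, `chi2_one_df(*relative_rate_counts(mouse, human, chimp))` could be applied to rows of the primate alignment above, with mouse as the outgroup. Sites where all three sequences differ, or where only the outgroup differs, carry no information about the two ingroup branches and are ignored.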
This shared tRNA-Leu translocation is a strong argument for the group Pancrustacea (= insects plus crustaceans).

Species ranked according to breakpoint distance from the ancestor. Columns give breakpoints, inversions and duplications/deletions relative to the ancestral order, followed by tRNA and protein sequence distances.

  Species                       Breakpoints  Inversions  Dup/Del  tRNA   Protein
  Very high
  Tigriopus japonicus                35          32         0     2.15    1.34
  Heterodoxus macropus               35          32         0     1.39    1.83
  Thrips imaginis                    32          29         1     1.34    1.32
  High
  Pollicipes polymerus               22          16         2     0.69    0.59
  Cherax destructor                  20          16         0     0.54    0.57
  Tetraclita japonica                20          16         0     0.66    0.57
  Argulus americanus                 20          18         0     0.72    1.12
  Speleonectes tulumensis            19          16         1     0.83    0.93
  Apis mellifera                     19          16         0     0.84    1.50
  Hutchinsoniella macracantha        18          16         0     0.86    0.87
  Pagurus longicarpus                18          12         0     0.65    0.45
  Vargula hilgendorfii               17          15         0     0.79    1.41
  Lepidopsocid RS-2001               17          16         0     0.60    0.59
  Habronattus oregonensis            16          14         0     1.48    1.09
  Ornithoctonus huwena               15          13         0     1.95    1.23
  Scutigera coleoptrata              15          15         0     0.48    0.44
  Melipona bicolor                   14           8         2     0.93    1.66
  Varroa destructor                  14          12         0     0.83    1.09
  Armillifer armillatus              13          12         0     0.85    1.73
  Moderate
  Narceus annularus                   9           9         0     0.63    0.58
  Thyropygus sp.                      9           9         0     0.49    0.46
  Aleurodicus dugesii                 8           5         1     1.04    1.54
  Anopheles gambiae                   8           6         0     0.41    0.47
  Tetrodontophora bielanensis         8           6         0     0.77    0.70
  Artemia franciscana                 7           5         0     0.63    0.64
  Rhipicephalus sanguineus            7           6         0     0.82    0.96
  Amblyomma triguttatum               7           6         0     0.88    1.00
  Haemaphysalis flava                 7           6         0     0.82    0.96
  Locusta migratoria                  6           5         0     0.38    0.52
  Bombyx mori                         6           5         0     0.51    0.54
  Portunus trituberculatus            6           5         0     0.51    0.44
  Ostrinia furnacalis                 6           5         0     0.49    0.48
  Tribolium castaneum                 6           5         0     0.55    0.53
  Antheraea pernyi                    6           5         0     0.50    0.54
  Low
  Chrysomya putoria                   4           2         1     0.36    0.42
  Tricholepidion gertschi             3           2         0     0.44    0.39
  Daphnia pulex                       3           2         0     0.62    0.51
  Pyrocoelia rufa                     3           2         0     0.52    0.77
  Drosophila melanogaster             3           2         0     0.37    0.42
  Panulirus japonicus                 3           2         0     0.58    0.53
  Triatoma dimidiata                  3           2         0     0.59    0.50
  Lithobius forficatus                3           3         0     1.13    0.61
  Philaenus spumarius                 3           2         0     0.69    0.58
  Gomphiocephalus hodgsoni            3           2         0     0.69    0.62
  Penaeus monodon                     3           2         0     0.34    0.32
  Crioceris duodecimpunctata          3           2         0     0.55    0.58
  Triops cancriformis                 3           2         0     0.42    0.40
  Limulus polyphemus                  0           0         0     0.36    0.40
  Ixodes persulcatus                  0           0         0     0.72    0.82
  Ixodes holocyclus                   0           0         0     0.76    0.83
  Ixodes hexagonus                    0           0         0     0.74    0.90
  Carios capensis                     0           0         0     0.70    0.79
  Ornithodoros porcinus               0           0         0     0.67    0.86
  Heptathela hangzhouensis            0           0         0     0.76    0.87
  Ornithodoros moubata                0           0         0     0.68    0.88

(The accompanying plots report correlation coefficients R = 0.99, R = 0.53, R = 0.59 and R = 0.69.)
Highly rearranged genomes have highly divergent sequences. Rates of sequence evolution and rates of genome rearrangement are correlated, and both are very non-clocklike.

  Breakpoint category   tRNA distance (min / mean / max)   Protein distance (min / mean / max)
  Very high             1.33 / 1.62 / 2.14                 1.32 / 1.50 / 1.83
  High                  0.48 / 0.86 / 1.94                 0.44 / 0.99 / 1.73
  Moderate              0.38 / 0.63 / 1.04                 0.43 / 0.69 / 1.54
  Low                   0.34 / 0.60 / 1.13                 0.32 / 0.62 / 0.90
  tRNA only, high       0.66 / 1.01 / 1.94                 0.57 / 1.15 / 1.73
  tRNA only, mod/low    0.34 / 0.60 / 1.13                 0.32 / 0.63 / 1.54

In many species only the tRNAs have changed position. Species with highly reshuffled tRNAs have high rates of sequence evolution in both their tRNAs and their proteins.

Relative rate of genome rearrangement (Xu et al. 2006)
Take three gene orders, with 0 known to be the outgroup, and test whether the rates of rearrangement in branch 1 and branch 2 are equal. Calculate:
n1 = number of gene couples in 0 and 2 but not in 1, i.e. new breakpoints in 1.
n2 = number of gene couples in 0 and 1 but not in 2, i.e. new breakpoints in 2.
$\chi_n^2 = \frac{(n_1 - n_2)^2}{n_1 + n_2}$
If the rates are equal, this statistic should follow a chi-squared distribution with one degree of freedom.
We took pairs of species with a significant difference in rearrangement rates (large $\chi_n^2$) and showed that these pairs also differ significantly in substitution rates (large $\chi_m^2$).

Good Guys and Bad Guys: gene order is sometimes a strong phylogenetic marker, but the Bad Guys cause problems in gene order analysis as well as in sequence phylogenetics.
Open questions: Why does the evolutionary rate speed up in these isolated groups of species? Why do tRNA genes move more frequently than other genes? What are the relative rates of inversion and translocation?
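Gene couples, breakpoint distances and the rearrangement version of the relative rate test can be sketched in Python. The assumptions here are not taken from the original analysis:
- genomes are circular and each gene occurs once;
- genes on the opposite strand carry a "-" prefix, as in the inversion example A -C -B D;
- the couple (x, y) is treated as identical to (-y, -x), i.e. the same adjacency read from the other strand.

All function names are hypothetical.

```python
"""Gene couples, breakpoints and a rearrangement relative rate test (sketch)."""
from math import erfc, sqrt


def flip(gene: str) -> str:
    """Same gene read from the other strand: 'B' <-> '-B'."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def gene_couples(order: list[str]) -> set[tuple[str, str]]:
    """Adjacent gene pairs of a circular genome in strand-independent form.

    Reading (x, y) on one strand is the same as (-y, -x) on the other, so
    each couple is stored as the smaller of the two tuples.
    """
    n = len(order)
    couples = set()
    for i in range(n):
        x, y = order[i], order[(i + 1) % n]  # wrap around: genome is circular
        couples.add(min((x, y), (flip(y), flip(x))))
    return couples


def breakpoints(order_a: list[str], order_b: list[str]) -> int:
    """Number of gene couples of order_a that are absent from order_b."""
    return len(gene_couples(order_a) - gene_couples(order_b))


def rearrangement_rate_test(order0, order1, order2):
    """Return (n1, n2, chi2, p); order0 is the outgroup."""
    c0, c1, c2 = gene_couples(order0), gene_couples(order1), gene_couples(order2)
    n1 = len((c0 & c2) - c1)  # couples in 0 and 2 but not 1: new breakpoints in 1
    n2 = len((c0 & c1) - c2)  # couples in 0 and 1 but not 2: new breakpoints in 2
    if n1 + n2 == 0:
        return n1, n2, 0.0, 1.0
    chi2 = (n1 - n2) ** 2 / (n1 + n2)
    return n1, n2, chi2, erfc(sqrt(chi2 / 2))  # 1-df chi-squared tail
```

With this representation, the inversion A B C D → A -C -B D creates two breakpoints. An inversion of B C on branch 1 alone, in a six-gene genome, gives n1 = 2, n2 = 0 and χ² = 2 (p ≈ 0.16), which is not significant. Duplicated genes, as in Cordylus warreni, would need extra handling that this sketch does not attempt.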
Credits: Daniel Jameson / Bin Tang: database design and management. Daniel Urbina: base and amino acid frequencies. Wei Xu: gene order analysis and arthropod phylogenies.

Small Scale Evolution: Variation in Frequencies of Bases and Amino Acids
The two strands of DNA are complementary, e.g.
  GGCAAAAT
  CCGTTTTA
The frequency of A on one strand equals the frequency of T on the other. Likewise, the frequency of C on one strand equals the frequency of G on the other.
If the two strands are subject to the same mutational processes, then the frequency of any base should be (statistically) the same on both strands. This means that A = T and C = G on any one strand. In that case, base frequencies can be described by a single variable: the G+C content.
BUT mitochondrial genomes have an asymmetrical replication process, so the two strands are not equivalent. The base frequencies on the two strands are not equal, and on any one strand the frequencies of the four bases may vary independently.

Mitochondrial genome replication (figure from Faith & Pollock (2003) Genetics).
Genes ranked in order of increasing time spent single-stranded:
COI < COII < ATP8 < ATP6 < COIII < ND3 < ND4L < ND4 < ND1 < ND5 < ND2 < Cytb
ND6 is on the other strand.

The genetic code maps the 64 DNA codons to the 20 amino acids and to stop signals.
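To make this mapping concrete, the Python sketch below encodes the vertebrate mitochondrial code as a 64-character string in TCAG order, with the first position varying slowest. It then lists the four-codon families whose third position is synonymous and tallies third-position bases at these fourfold degenerate (FFD) sites. The function names are hypothetical, and the input is assumed to be an in-frame coding sequence read from one strand.

```python
"""Vertebrate mitochondrial code and fourfold degenerate (FFD) sites (sketch)."""
from collections import Counter
from itertools import product

BASES = "TCAG"
# 64 codons in TCAG order (first position varies slowest); '*' = stop.
# Differs from the standard code at TGA (W), ATA (M) and AGA/AGG (stop).
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO)}


def fourfold_families(code=CODE):
    """Map each first-two-base prefix whose 4 codons share one amino acid."""
    families = {}
    for b1, b2 in product(BASES, repeat=2):
        aas = {code[b1 + b2 + b3] for b3 in BASES}
        if len(aas) == 1 and "*" not in aas:
            families[b1 + b2] = aas.pop()
    return families


def ffd_base_counts(cds, code=CODE):
    """Tally third-position bases at FFD sites of an in-frame coding sequence."""
    families = fourfold_families(code)
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabet
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon[:2] in families and codon[2] in BASES:
            counts[codon[2]] += 1
    return counts
```

Under this code there are eight fourfold families: TCN Ser, CTN Leu, CCN Pro, CGN Arg, ACN Thr, GTN Val, GCN Ala and GGN Gly. Base counts at these sites are the kind of quantity that the FFD base frequency plots summarize.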
Genetic code table (this version applies to vertebrate mitochondria):

  First          Second position                                   Third
  position   T           C           A             G              position
  T          TTT F       TCT S       TAT Y         TGT C           T
             TTC F       TCC S       TAC Y         TGC C           C
             TTA L       TCA S       TAA Stop      TGA W           A
             TTG L       TCG S       TAG Stop      TGG W           G
  C          CTT L       CCT P       CAT H         CGT R           T
             CTC L       CCC P       CAC H         CGC R           C
             CTA L       CCA P       CAA Q         CGA R           A
             CTG L       CCG P       CAG Q         CGG R           G
  A          ATT I       ACT T       AAT N         AGT S           T
             ATC I       ACC T       AAC N         AGC S           C
             ATA M       ACA T       AAA K         AGA Stop        A
             ATG M       ACG T       AAG K         AGG Stop        G
  G          GTT V       GCT A       GAT D         GGT G           T
             GTC V       GCC A       GAC D         GGC G           C
             GTA V       GCA A       GAA E         GGA G           A
             GTG V       GCG A       GAG E         GGG G           G

In 4-codon families the third position is synonymous. These third positions are the fourfold degenerate (FFD) sites.
(Figure: base frequencies at FFD sites in each gene, averaged over mammals.)
Deamination causes C to U and A to G changes on the heavy strand.
Base frequencies at FFD sites are controlled by mutation. Base frequencies at 1st and 2nd positions are influenced by both mutation and selection.
Model fitting (data from fish): assume a fraction of fixed sites and a fraction of neutral sites. The fit shows that selection is weaker at the 1st position than at the 2nd.
Mutation pressure is strong enough to change amino acid frequencies.
(Figure: slopes of amino acid frequency versus base frequency, arranged on the code table; black = fish, white = mammals.) The slopes show how each amino acid responds to mutational pressure. Amino acids in the first two columns of the code have larger slopes.

Physical properties of amino acids (first rows of the table):

  Amino acid   Vol.   Bulk.   Polarity   pI      Hyd.1   Hyd.2   Surface area   Fract. area
  Ala A         67    11.50     0.00     6.00     1.8     1.6        113           0.74
  Arg R        148    14.28    52.00    10.76    -4.5   -12.3        241           0.64
  Asn N         96    12.28     3.38     5.41    -3.5    -4.8        158           0.63
  Asp D         91    11.68    49.70     2.77    -3.5    -9.2        151           0.62
  Cys C         86    13.46     1.48     5.05     2.5     2.0        140           0.91
  Gln Q        114    14.45     3.53     5.65    -3.5    -4.1        189           0.62
  Glu E        109    13.57    49.90     3.22    -3.5    -8.2        183           0.62
  Gly G         48     3.40     0.00     5.97    -0.4     1.0         85           0.72
  His H        118    13.69    51.60     7.59    -3.2    -3.0        194           0.78

Each amino acid is a point in this 8-dimensional space, and d_ij is the Euclidean distance between amino acids i and j.
Principal component analysis projects the 8-d space onto the two "most important" dimensions, which separate big from small and hydrophobic from hydrophilic amino acids.
Responsiveness measures how much an amino acid frequency varies in response to mutational pressure. It is the root mean square of the 8 slopes for each amino acid (4 bases × 2 data sets).
Proximity measures how similar the neighbouring amino acids in the genetic code are. It is the mean of 1/d over the accessible amino acids. For Thr, for example:
$\mathrm{Prox}(T) = \frac{1}{24}\left(\frac{2}{d_{TI}} + \frac{2}{d_{TM}} + \frac{6}{d_{TS}} + \frac{4}{d_{TP}} + \frac{4}{d_{TA}} + \frac{2}{d_{TN}} + \frac{2}{d_{TK}} + 2 \times 0\right)$
The four Thr codons (ACN) have 24 non-synonymous single-base neighbours. Two of these are stop codons (AGA, AGG), and they contribute 0.
Responsiveness and proximity are highly correlated: R = 0.87 ($p < 10^{-6}$). An amino acid frequency responds to mutational pressure more easily if there are neighbouring amino acids with similar physical properties.
Urbina et al. (2006) J. Mol. Evol.
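The neighbour counts in Prox(T) can be checked with a short Python sketch. It enumerates all single-base changes from the codons of an amino acid under the vertebrate mitochondrial code. It also builds a Euclidean distance from the nine rows of the property table above. Two things here are assumptions made for illustration: the function names are hypothetical, and each property is standardized to zero mean and unit variance over only these nine amino acids. The original analysis used all 20 amino acids and may have scaled the properties differently.

```python
"""Code neighbours, 8-d property distances and the proximity measure (sketch)."""
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

BASES = "TCAG"
# Vertebrate mitochondrial code, 64 codons in TCAG order; '*' = stop.
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO)}

# Nine rows of the property table: Vol., Bulk., Polarity, pI,
# Hyd.1, Hyd.2, Surface area, Fract. area.
PROPS = {
    "A": (67, 11.50, 0.00, 6.00, 1.8, 1.6, 113, 0.74),
    "R": (148, 14.28, 52.00, 10.76, -4.5, -12.3, 241, 0.64),
    "N": (96, 12.28, 3.38, 5.41, -3.5, -4.8, 158, 0.63),
    "D": (91, 11.68, 49.70, 2.77, -3.5, -9.2, 151, 0.62),
    "C": (86, 13.46, 1.48, 5.05, 2.5, 2.0, 140, 0.91),
    "Q": (114, 14.45, 3.53, 5.65, -3.5, -4.1, 189, 0.62),
    "E": (109, 13.57, 49.90, 3.22, -3.5, -8.2, 183, 0.62),
    "G": (48, 3.40, 0.00, 5.97, -0.4, 1.0, 85, 0.72),
    "H": (118, 13.69, 51.60, 7.59, -3.2, -3.0, 194, 0.78),
}


def neighbour_counts(aa, code=CODE):
    """Count non-synonymous single-base neighbours of every codon for `aa`."""
    counts = Counter()
    for codon, a in code.items():
        if a != aa:
            continue
        for pos in range(3):
            for b in BASES:
                if b != codon[pos]:
                    other = code[codon[:pos] + b + codon[pos + 1:]]
                    if other != aa:
                        counts[other] += 1  # '*' counts stop-codon neighbours
    return counts


def standardized_distance(props=PROPS):
    """Return d(i, j): Euclidean distance after z-scoring each property."""
    cols = list(zip(*props.values()))
    mus, sds = [mean(c) for c in cols], [pstdev(c) for c in cols]
    z = {aa: [(x - m) / s for x, m, s in zip(v, mus, sds)] for aa, v in props.items()}
    return lambda i, j: sqrt(sum((a - b) ** 2 for a, b in zip(z[i], z[j])))


def proximity(aa, dist, code=CODE):
    """Mean of 1/d over all non-synonymous neighbours; stops contribute 0."""
    counts = neighbour_counts(aa, code)
    total = sum(counts.values())
    return sum(n / dist(aa, b) for b, n in counts.items() if b != "*") / total
```

`neighbour_counts("T")` reproduces the weights in the Prox(T) formula: 2 I, 2 M, 6 S, 4 P, 4 A, 2 N, 2 K and 2 stops, 24 in total. Computing a real Prox(T) would need distances to all 20 amino acids, which the nine tabulated rows do not provide.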
Codon usage in Homo sapiens (+ strand, 3624 codons):

  UUU F  69   UCU S  29   UAU Y  35   UGU C   5
  UUC F 139   UCC S  99   UAC Y  89   UGC C  17
  UUA L  65   UCA S  81   UAA *   4   UGA W  90
  UUG L  11   UCG S   7   UAG *   3   UGG W   9

  CUU L  65   CCU P  37   CAU H  18   CGU R   6
  CUC L 167   CCC P 119   CAC H  79   CGC R  26
  CUA L 276   CCA P  52   CAA Q  82   CGA R  28
  CUG L  42   CCG P   7   CAG Q   8   CGG R   0

  AUU I 112   ACU T  50   AAU N  29   AGU S  11
  AUC I 196   ACC T 155   AAC N 131   AGC S  37
  AUA M 165   ACA T 132   AAA K  84   AGA *   1
  AUG M  32   ACG T  10   AAG K   9   AGG *   0

  GUU V  22   GCU A  39   GAU D  12   GGU G  16
  GUC V  45   GCC A 123   GAC D  51   GGC G  87
  GUA V  61   GCA A  79   GAA E  63   GGA G  61
  GUG V   8   GCG A   5   GAG E  15   GGG G  19

Frequency ratios:
$r(X_2Y_3) = \frac{p(X_2Y_3)}{q(X_2)\,q(Y_3)}$
Here p(X2Y3) is the frequency of base X at codon position 2 followed by base Y at position 3, and q(X2) and q(Y3) are the single-base frequencies at those positions.
In mitochondria, codon bias appears to be a dinucleotide mutational effect rather than an effect of translational selection.

Fish, positions 2-3:
  UU 1.250   CU 0.939   GU 0.605
  UC 0.756   CC 1.205   GC 0.878
  UA 1.030   CA 0.938   GA 1.145
  UG 1.274   CG 0.554   GG 1.891

Mammals, positions 2-3:
  UU 0.939   CU 1.101   GU 0.763
  UC 0.743   CC 1.163   GC 1.005
  UA 1.136   CA 0.906   GA 1.027
  UG 1.433   CG 0.552   GG 1.654

Fish, positions 3-1:
  UU 0.933   CU 1.162   AU 0.907   GU 0.911
  UC 0.918   CC 1.371   AC 0.739   GC 0.839
  UA 1.096   CA 0.849   AA 1.135   GA 0.758
  UG 1.049   CG 0.609   AG 1.228   GG 1.499

Possible explanations: a CpG effect (an increased rate of C to U mutations in CG dinucleotides, which predicts high UG and CA), or DNA-binding proteins.
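The frequency ratio can be sketched in Python, assuming codon counts such as the Homo sapiens table above are supplied as a dictionary. In this sketch, p, q(X2) and q(Y3) are all estimated from the same set of codons. The text does not state which codons entered the published tables, so the optional `codons` filter is a hypothetical parameter.

```python
"""Frequency ratios r(X2Y3) = p(X2Y3) / (q(X2) q(Y3)) from codon counts (sketch)."""
from collections import Counter


def frequency_ratios(codon_counts, codons=None):
    """Return {XY: r} for bases X at codon position 2 and Y at position 3.

    codon_counts maps codons (e.g. "GCU") to counts.  `codons` is a
    hypothetical optional filter, e.g. the fourfold families only.
    """
    pair = Counter()
    for codon, n in codon_counts.items():
        if codons is None or codon in codons:
            pair[codon[1] + codon[2]] += n
    total = sum(pair.values())
    q2, q3 = Counter(), Counter()  # single-base frequencies at positions 2, 3
    for xy, n in pair.items():
        if n:  # skip zero counts so absent bases do not give q = 0
            q2[xy[0]] += n / total
            q3[xy[1]] += n / total
    return {x + y: (pair[x + y] / total) / (q2[x] * q3[y]) for x in q2 for y in q3}
```

If X and Y were independent, every ratio would be 1. Values below 1 (e.g. CG) mean the dinucleotide is under-represented. Restricting the filter to the eight fourfold families leaves only U, C or G at position 2. That is consistent with the 12-entry layout of the positions 2-3 tables.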
Mammals, positions 3-1:
  UU 0.855   CU 1.082   AU 0.996   GU 1.115
  UC 0.994   CC 1.363   AC 0.797   GC 0.873
  UA 1.206   CA 0.945   AA 0.974   GA 0.776
  UG 0.856   CG 0.546   AG 1.293   GG 1.369

Changes in tRNA content of genomes from bacteria to mitochondria (numbers of tRNA genes with each anticodon):

                                 Ala           Asp   Gln       Pro           Ser-UCN
  Anticodon                      GGC UGC CGC   GUC   UUG CUG   GGG UGG CGG   GGA UGA CGA   Total
  Epsilon Proteobacteria
  Campylobacter jejuni            1   3   0     2     1   0     0   1   0     1   1   0      42
  Helicobacter pylori J99         1   1   0     1     1   0     1   1   0     1   1   0      36
  Gamma Proteobacteria
  Pseudomonas aeruginosa          2   4   0     4     1   0     1   1   1     1   1   1      63
  Vibrio parahaemolyticus         1   4   0     6     6   0     0   3   0     1   4   0     126
  Haemophilus influenzae          1   3   0     3     2   0     0   2   0     1   2   0      56
  Buchnera aphidicola             1   1   0     1     1   0     0   1   0     1   1   0      32#
  Blochmannia floridanus          0   1   0     1     1   0     0   1   0     0   1   1      36#
  Wigglesworthia glossinidia      1   1   0     1     1   0     0   1   0     1   1   0      34#
  Escherichia coli K12            2   3   0     3     2   2     1   1   1     2   1   1      86
  Alpha Proteobacteria
  Agrobacterium tumefaciens       1   4   0     2     1   1     1   1   1     1   1   1      53
  Sinorhizobium meliloti          1   3   1     2     1   1     1   1   1     1   1   1      51
  Rickettsia prowazekii           1   1   0     1     1   0     0   1   0     1   1   0      33#
  Wolbachia (D. mel)              0   1   0     1     1   0     0   1   0     1   1   0      34#
  Caulobacter crescentus          1   2   1     2     1   0     1   1   1     1   1   1      51
  Mitochondria
  Reclinomonas americana          0   1   0     1     1   0     0   1   0     0   1   0      26
  Homo sapiens                    0   1   0     1     1   0     0   1   0     0   1   0      22

# denotes an intracellular parasite or endosymbiont.
In human mitochondria only one type of tRNA remains for each codon family. Leu and Ser each still need two tRNAs, because each has two separate codon families. This gives 22 tRNAs in total. Bacteria with small genomes also have reduced numbers of tRNAs.

Evolution of the Genetic Code: Before and After the LUCA
1. The genetic code evolved to its canonical form before the Last Universal Common Ancestor of Archaea, Bacteria and Eukaryotes, more than 3 billion years ago. It appears to be highly optimized. How did it get to be this way?
2. Numerous small changes have been made to the canonical code since then. What is the mechanism of codon reassignment?

Codon Reassignment: the genetic code is variable in mitochondria (and also in some other types of genome). Examples marked on the standard code table:
- UGA: Stop → Trp
- AUA: Ile → Met
- CUN: Leu → Thr
- AGR: Arg → Ser, and then → Stop or Gly
- CGN: Arg → unassigned
- etc.
But how can this happen? A change in the code should be disadvantageous.

Example 1: AUA was reassigned from Ile to Met during the early evolution of the mitochondrial genome.

Before:
  Codon   Amino acid   Anticodon
  AUU     Ile          GAU
  AUC     Ile          GAU
  AUA     Ile          k2CAU
  AUG     Met          CAU
Notes: G in the wobble position of tRNA-Ile can pair with U and C in the third codon position. Bacteria and some protist mitochondria also have a second tRNA-Ile with a modified base (k2CAU) that translates AUA only. The tRNA-Met translates AUG only.

After:
  Codon   Amino acid   Anticodon
  AUU     Ile          GAU
  AUC     Ile          GAU
  AUA     Met          UAU or f5CAU
  AUG     Met          UAU or f5CAU
Notes: In animal mitochondria the k2CAU tRNA has been deleted. The tRNA-Met has gained function through a mutation (UAU) or a base modification (f5CAU).

Example 2: UGA was reassigned from Stop to Trp many times (12 times in mitochondria).

Before:
  Codon   Meaning   Anticodon
  UGA     Stop      (none)
  UGG     Trp       CCA
Notes: the release factor (RF) recognizes the UGA codon. The normal tRNA-Trp translates only UGG codons.

After:
  Codon   Meaning   Anticodon
  UGA     Trp       UCA
  UGG     Trp       UCA
Notes: In animal mitochondria (and elsewhere) tRNA-Trp has gained function via mutation or base modification, so that it translates both UGG and UGA.

The GAIN-LOSS framework (Sengupta & Higgs, Genetics 2005):
LOSS = deletion or loss of function of a tRNA or RF.
GAIN = gain of a new tRNA, or gain of function of an existing one.

  Initial code (no problem)
    → GAIN → ambiguous codon (selective disadvantage) → LOSS → new code
    → LOSS → unassigned codon (selective disadvantage) → GAIN → new code
  New code: selective disadvantage, because codons are used in the wrong places
    → mutations in coding sequences → new code with codons used in the right places (no problem)

Note: the strength of the selective disadvantage depends on the number of times the codon is used. There is no disadvantage if the codon disappears.
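The total of 22 tRNAs in human mitochondria can be reproduced from the vertebrate mitochondrial code with a simple counting rule, sketched below in Python under these assumptions:
- a four-codon box encoding one amino acid is read by a single tRNA;
- otherwise, each two-codon half-box (third position U/C or A/G) is read by a single tRNA;
- stop codons need no tRNA.

The function name is hypothetical. The rule is a simplification and does not handle half-boxes that mix two meanings, as AUA/AUG and UGA/UGG do in the standard code.

```python
"""Minimum tRNA set under a 'one tRNA per codon family' rule (sketch)."""
from itertools import product

BASES = "TCAG"
# Vertebrate mitochondrial code, 64 codons in TCAG order; '*' = stop.
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO)}


def trna_families(code=CODE):
    """Return (codon family, amino acid) pairs, one per tRNA assumed needed."""
    families = []
    for b1, b2 in product(BASES, repeat=2):
        box = [code[b1 + b2 + b3] for b3 in BASES]  # third position T, C, A, G
        if len(set(box)) == 1 and box[0] != "*":
            families.append((b1 + b2 + "N", box[0]))  # one tRNA for all four
            continue
        # Otherwise split into pyrimidine (Y = T/C) and purine (R = A/G) halves.
        for half, label in ((box[:2], "Y"), (box[2:], "R")):
            if len(set(half)) == 1 and half[0] != "*":
                families.append((b1 + b2 + label, half[0]))
    return families
```

Under this code the rule gives 8 four-codon families and 14 two-codon families, 22 in total. The count reflects the reassignments: AGR became stop and needs no tRNA, UGR is a single Trp family, and AUR is a single Met family. Leu (TTR, CTN) and Ser (TCN, AGY) are the only amino acids that need two tRNAs.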
Four possible mechanisms of codon reassignment:
1. Codon Disappearance (CD): the codon disappears, and the order of the gain and the loss is irrelevant. In the other three mechanisms the codon does not disappear.
2. Ambiguous Intermediate (AI): the gain happens before the loss. There is a period when the gain is fixed in the population and translation is ambiguous.
3. Unassigned Codon (UC): the loss happens before the gain. There is a period when the loss is fixed in the population and the codon is unassigned.
4. Compensatory Change: the gain and the loss are fixed in the population simultaneously, although they do not arise at the same time. There is no intermediate period between the old and the new codes (cf. the theory of compensatory substitutions in RNA helices).
Sengupta & Higgs (2005) showed that all four mechanisms work in a population genetics simulation.

Summary of codon reassignments in mitochondria:

  Codon reassignment           No. of   Explained by GC→AU       Change in      Is mispairing               Mechanism
                               times    mutation pressure?       no. of tRNAs   important?
  UAG: Stop → Leu              2        G→A at 3rd pos.          +1             No                          CD
  UAG: Stop → Ala              1        G→A at 3rd pos.          +1             No                          CD
  UGA: Stop → Trp              12       G→A at 2nd pos.           0             Possibly; C·A at 3rd pos.   CD
  CUN: Leu → Thr               1        C→U at 1st pos.           0             No                          CD
  CGN: Arg → Unassigned        5        C→A at 1st pos.          -1             No                          CD
  AUA: Ile → Met or Unass.     3 / 5    No                       -1             Yes; G·A at 3rd pos.        UC
  AAA: Lys → Asn               2        No                        0             Yes; G·A at 3rd pos.        AI
  AAA: Lys → Unassigned        1        No                        0             Possibly; G·A at 3rd pos.   UC or AI
  AGR: Arg → Ser               1        No                       -1             Yes; G·A at 3rd pos.        UC
  AGR: Ser → Stop              1        No                        0             No                          AI(b)
  AGR: Ser → Gly               1        No                       +1             No                          AI(b)
  UUA: Leu → Stop              1        No                        0             No                          UC or AI
  UCA: Ser → Stop              1        No                        0             No                          UC or AI

The CD mechanism explains the disappearance of stop codons, because stop codons are rare to begin with. There are only a few examples of CD for sense codons. UC and AI are the important mechanisms for sense codons.
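The four definitions amount to a decision rule based on whether the codon disappears and on when the gain and the loss are fixed. The Python sketch below simply restates that rule. The function and its inputs are hypothetical and are not taken from Sengupta & Higgs (2005). The inputs are fixation times, for example from a population simulation, with None meaning "not yet fixed".

```python
"""Classify a codon reassignment under the GAIN-LOSS framework (sketch)."""
from typing import Optional


def classify_mechanism(codon_disappeared: bool,
                       gain_fixed_at: Optional[float],
                       loss_fixed_at: Optional[float]) -> str:
    """Name the mechanism from the order in which GAIN and LOSS were fixed."""
    if gain_fixed_at is None or loss_fixed_at is None:
        return "incomplete"  # the reassignment has not finished yet
    if codon_disappeared:
        return "Codon Disappearance"  # order of gain and loss is irrelevant
    if gain_fixed_at < loss_fixed_at:
        return "Ambiguous Intermediate"  # period of ambiguous translation
    if loss_fixed_at < gain_fixed_at:
        return "Unassigned Codon"  # period when the codon is unassigned
    return "Compensatory Change"  # fixed together, no intermediate code
```

In a real simulation, the hard part is deciding when an event counts as "fixed" and whether a codon has "disappeared" (for example, its frequency dropping to zero in the population). This sketch leaves both decisions to the caller.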
Three examples in yeasts (mutation pressure GC → AU):
- CUN is rare (replaced by UUR): CUN is reassigned from Leu to Thr.
- CGN is rare (replaced by AGR): CGN Arg codons become unassigned.
- AUA and AUU are common and AUC is rare. Nevertheless, AUA is reassigned to Met: the codon does not disappear.
(Figure: code table with these reassignments marked.)

Leu and Arg codons in yeasts: codon disappearance causes reassignments.

  Species   Leu CUN   Leu UUR   Arg CGN   Arg AGR
  S            53       192         7        33
  Y            44       618         0**      75
  C             3       279        12        29
  C           132       397        47        26
  C            66       547        39        45
  P            25       714        18        67
  K             0       286         0**      48
  C            11*      294         1**      45
  S            33*      333         7        49
  S            19*      274         0**      40
  S            22*      300         0**      46

  * CUN = Thr: an unusual tRNA-Thr is present instead of tRNA-Leu.
  ** CGN = unassigned: tRNA-Arg is deleted.

AUA: Ile → Met in yeasts.

  Codon   Amino acid   Anticodon
  AUU     Ile          GAU
  AUC     Ile          GAU
  AUA     Ile          k2CAU
  AUG     Met          CAU

  Species   AUU   AUC   AUA   AUG   AUA is   tRNA
  J         133    40    32    48   Ile      k2CAU
  O         161    34     0    57   Absent   none
  P         113    39    49    51   Ile      k2CAU
  C         119    81   229   100   Ile      k2CAU
  C         303    32   193   117   Ile      k2CAU
  P         274    18   562   105   Ile      k2CAU
  K         213    16     7    63   ?        none
  C         207    21    16    73   Met      C*AU
  S         239    31    60    73   Met      C*AU
  S         203     7   101    56   Met      C*AU
  S         218    11    95    70   Met      C*AU

Reassignments in Metazoa
(Figure: phylogeny of Porifera, Cnidaria, Arthropoda, Nematoda, Lophotrochozoa, Platyhelminthes, Echinodermata, Hemichordata, Urochordata, Cephalochordata and Craniata. The events marked on its branches are: loss of tRNA-Ile(CAU) but AUA remains Ile; loss of tRNA-Arg(UCU) and AGR: Arg → Ser; loss of many tRNAs plus import from the cytoplasm; AUA: Ile → Met; AGR: Ser → Stop; AGR: Ser → Gly; AAA: Lys → Asn; AAA: Lys → unassigned.)

AGR in Metazoa: one loss of tRNA-Arg led to several different responses.

  Codon   Amino acid   Anticodon
  AGU     Ser          GCU
  AGC     Ser          GCU
  AGA     Arg          UCU
  AGG     Arg          UCU

After the loss, AGR can become (i) Ser or unassigned (e.g. arthropods), (ii) Stop (e.g. vertebrates), or (iii) Gly (e.g. urochordates).

Evolution of the canonical code: before the LUCA
The canonical code seems to be optimized to reduce the effects of translational and mutational errors: neighbouring codons code for similar amino acids.
(Figure: Woese's polar requirement scale, running from about 5 to 13: C; L, I, F, W, M, Y, V; P, T; A, S, G; H, Q; R; N, K; E, D.) The difference between two amino acids can be measured by how far apart they are on this scale.
Let g(a, b) be a cost function for replacing amino acid a by amino acid b, e.g. their difference in polar requirement. Then
$E = \frac{\sum_{i \neq j} r_{ij}\, g(a_i, a_j)}{\sum_{i \neq j} r_{ij}}$
where r_ij is the rate of mistaking codon i for codon j (1 for single-position mistakes, 0 otherwise). E measures the error associated with a code.
Random codes are generated by permuting the 20 amino acids in the code table. E is smaller for the canonical code than for almost all random codes. The fraction of better random codes is $f \sim 10^{-6}$, i.e. about one in a million (Freeland and Hurst). (Figure: distribution p(E) for random codes, with E_real for the canonical code in the far lower tail.)

Modified codes show that the canonical code could have changed as it evolved, so it is not completely a frozen accident.
Organisms with different codes may have competed, so that natural selection acted on the code.
Early codes had fewer than 20 amino acids (?), and complexity increased gradually. A larger repertoire of amino acids gives more protein functions.
What was the order of addition? Astrobiology asks which amino acids were common on the early Earth.

Prebiotic synthesis of amino acids. Amino acids are found in:
• Meteorites
• Atmospheric chemistry experiments (Miller-Urey)
• Hydrothermal synthesis
• Icy dust grains in space
Rank the amino acids in order of decreasing frequency in 12 observations, then derive the mean ranking:
G A D E V S I L P T (found non-biologically: early amino acids)
K R H F Q N Y W C M (not found non-biologically: late amino acids)
Which amino acids are early and which are late is determined by thermodynamics.

(Figure: code table marking the positions of early and late amino acids.) What does this mean?
Maybe only the 2nd position was relevant initially. Late amino acids then took over codons that had previously been assigned to amino acids with similar properties.
Other points:
- The column structure suggests that translational errors were more important than mutational errors (tRNA structure / RNA world).
- Precursor-product pairs tend to be neighbours, but there are doubts about the statistical significance. Maybe late amino acids took over codons previously assigned to their biochemical precursors.
- Direct chemical interactions between RNA motifs and amino acids ("stereochemical theory"). In vitro selection experiments suggest that aptamer binding sites preferentially contain codon and anticodon sequences.
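The error measure E and the comparison with random codes can be sketched in Python under several assumptions:
- g is the squared difference in polar requirement (the text says only "e.g. difference");
- r_ij = 1 for every single-position change;
- pairs involving stop codons are ignored;
- random codes permute the 20 amino acids among the codon blocks of the standard code, keeping stop codons fixed.

The polar requirement values are as commonly tabulated for Woese's scale and should be checked against a primary source. Because this unweighted version does not use the weighting scheme behind the f ~ 10^-6 figure, it need not reproduce that number. Function names are hypothetical.

```python
"""Error measure E of a genetic code and comparison with random codes (sketch)."""
import random
from itertools import product

BASES = "TCAG"
# Standard code, 64 codons in TCAG order (first position slowest); '*' = stop.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

# Woese polar requirement, as commonly tabulated (assumed values).
POLAR_REQ = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8, "Q": 8.6, "E": 12.5,
    "G": 7.9, "H": 8.4, "I": 4.9, "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0,
    "P": 6.6, "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def error_measure(code: str, prop=POLAR_REQ) -> float:
    """E = sum r_ij g(a_i, a_j) / sum r_ij, r_ij = 1 for single-base changes."""
    aa = dict(zip(CODONS, code))
    num = den = 0.0
    for c in CODONS:
        if aa[c] == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == c[pos]:
                    continue
                d = aa[c[:pos] + b + c[pos + 1:]]
                if d == "*":
                    continue  # stop codons excluded (an assumption)
                num += (prop[aa[c]] - prop[d]) ** 2  # g = squared difference
                den += 1
    return num / den


def random_code(code: str, rng: random.Random) -> str:
    """Permute the 20 amino acids among their codon blocks; stops stay fixed."""
    aas = sorted(set(code) - {"*"})
    shuffled = aas[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(aas, shuffled))
    mapping["*"] = "*"
    return "".join(mapping[a] for a in code)


def fraction_better(code: str, n: int = 10_000, seed: int = 1) -> float:
    """Fraction f of random codes with a smaller E than `code`."""
    rng = random.Random(seed)
    e_real = error_measure(code)
    return sum(error_measure(random_code(code, rng)) < e_real for _ in range(n)) / n
```

Permuting whole amino acid blocks, rather than individual codons, preserves the block structure of the code. Only the question of which amino acid sits in which block is randomized, which matches "permuting the 20 amino acids in the code table".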