* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download long - David Pollock
Designer baby wikipedia , lookup
Deoxyribozyme wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
History of genetic engineering wikipedia , lookup
Transposable element wikipedia , lookup
Population genetics wikipedia , lookup
Protein moonlighting wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Pathogenomics wikipedia , lookup
Genome editing wikipedia , lookup
Microsatellite wikipedia , lookup
Human genome wikipedia , lookup
Adaptive evolution in the human genome wikipedia , lookup
Point mutation wikipedia , lookup
Metagenomics wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Genome evolution wikipedia , lookup
Koinophilia wikipedia , lookup
Non-coding DNA wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Evolution of Proteins and Genomes Biochemistry and Molecular Genetics Computational Bioscience Program Consortium for Comparative Genomics University of Colorado School of Medicine [email protected] www.EvolutionaryGenomics.com Evolution of Proteins Jason de Koning Description Focus on protein structure, sequence, and functional evolution Subjects structural comparison and prediction, biochemical adaptation, evolution of protein complexes, probabilistic methods for detecting patterns of sequence evolution, effects of population structure on protein evolution, lattice and other computational models of protein evolution, protein folding and energetics, mutagenesis experiments, directed evolution, coevolutionary interactions within and between proteins, and detection of adaptation, diversifying selection and functional divergence. Reconstruction of Ancestral Function How do You Understand a New Protein? Structural and Functional Studies Experimental (NMR, X-tallography…) Computational (structure prediction…) Comparative Sequence Analysis Looking at sets of sequences A common but wrong assumption: sequences are a random sample from the set of all possible sequences Mouse: Rat: Baboon: Chimp: …TLSPGLKIVSNPL… …TLTPGLKLVSDTL… …TVSPGLRIVSDGV… …TISPGLVIVSENL... Conserved proline Variable “High entropy” Comparative Sequence Analysis Looking at sets of sequences In reality, proteins are related by evolutionary process Confounding Effect of Evolution …TLSKRNPL… SF PT …TLFKRNPL… …TLSKRNTL… …TLSKRNT… …TLFKRNP… …TLSKRNT… …TLFKRNP… …TLFKRNP… …TLSKRNT… Confounding Effect of Evolution …TLSKRNPL… SF PT …TLFKRNPL… …TLSKRNTL… …TLSKRNT… …TLFKRNP… …TLFKRNP… …TLSKRNT… …TLFKRNP… …TLSKRNT… Everytime there is an F, there is a P! Everytime there is an S, there is a T! Ways to Deal with This… Most common: Ignorance is Bliss Some: Try to estimate the extent of the confounding (Mirny, Atchley) Remove the confounding (Maxygen) Include evolution explicitly in the model (Goldstein, Pollock, Goldman, Thorne, …) Fitness Selective Pressure Folding Mouse: Rat: Baboon: Chimp: Stability Function Selection Stochastic Realizations A B C …TLSPGLKIVSNPL… …TLTPGLKLVSDTL… …TVSPGLRIVSDGV… …TISPGLVIVSENL... Understanding Selective Pressure Folding Mouse: Rat: Baboon: Chimp: Stability Function Data Model A B C …TLSPGLKIVSNPL… …TLTPGLKLVSDTL… …TVSPGLRIVSDGV… …TISPGLVIVSENL... Purines Pyrimidines DNA What does DNA do? Replication Translation Folding mRNA DNA Protein Protein Function Mutations result in genetic variation Selective Pressure Genetic changes …UGUACAAAG… Substitution Insertion Deletion …UGUAUAAAG… …UGUAAAAG… …UGUUACAAAG… Substitutions Can Be: Purines: Transitions A G Transversions Pyrimidines: C T Substitutions in coding regions can be: Cys Arg Lys UGU/AGA/AAG Silent Nonsense Missense UGU/CGA/AAG Cys Arg Lys UGU/GGA/AAG Cys Gly Lys First position: 4% of all changes silent Second position: no changes silent Third position: 70% of all changes silent (wobble position) UGU/UGA/AAG Cys STOP Lys Homologous crossover Uneven crossover leading to gene deletion and duplication Gene conversion Fate of a duplicated gene Keep on doing whatever it originally was doing Lose ability to do anything (become a pseudogene) Learn to do something new (neofunctionalization) Split old functions among new genes (subfunctionalization) Homologies Gene duplication a Hemoglobin b Hemoglobin Speciation Mouse a Hb Rat a Hb Paralogs Mouse b Hb Orthologs Rat b Hb Initial Population Mistakes are Made Elimination Polymorphism Fixation Selection Differences in fitness (capacity for fertile offspring) 1 gene 2 alleles (variations), A and B 3 genotypes (diploid organism): AA, AB, BB Genotype Fitness ωAA = 1 (wild type) ωAB = 1 + SAB ωBB = 1 + SBB AA AB BB S > 0 advantageous S < 0 unfavorable S ~ 0 neutral Evolution of Gene Frequencies q = frequency of B p = (1-q) = frequency of A , , population: differential equation for p, q q(next generation) = q(this generation) + pq[psAB + q(sBB-sAB)] p2 + 2pq(sAB+1) + q2(sBB+1) Fixation of an Advantageous Recessive Allele (s=0.01) Frequency of B 1 0.8 Genotype AA AB BB 0.6 0.4 Fitness Value 1.0 1.0 (recessive) 1.01 0.2 0 0 1000 2000 3000 Generation 4000 5000 Equilibration of an Overdominant Allele 1 Frequency of B 0.8 0.6 Genotype 0.4 Fitness Value AA AB BB 0.2 1.0 1.02 1.01 0 0 200 400 600 Generation 800 1000 Probability of fixation = 1-e-2s 1-e-2Ns 1 N = 10 10-02 N = 100 Fixation probability 10-04 10-06 = 2s (large, positive S, large N) N = 1000 = 1/(2N) when |s| < 1/(2N) 10-08 10-10 N = 10,000 10-12 10-14 -0.01 0 0.01 Selective advantage (s) 0.02 Real phylogenetic trees The Rate of Evolution Depends on Constraints Human vs. Rodent Comparison Highest substitution rates pseudogenes introns 3’ flanking (not transcribed to mature mRNA) 4-fold degenerate sites Intermediate substitution rates 5’ flanking (contains promoter) 3’, 5’ untranslated (transcribed to mRNA) 2-fold degenerate sites Lowest substitution rates Nondegenerate sites Selection of Species for DNA comparisons Human versus Chimpanzee Mouse Opossum Pufferfish Size (Gbp) 3.0 2.5 4.2 0.4 Time since divergence ~5 MYA ~ 65 MYA ~150 MYA ~450 MYA Sequence conservation (in coding regions) >99% ~80% ~70-75% ~65% Aids identification of… Recently changed sequences and genomic rearrangements Both Both Primarily coding and coding and coding non-coding non-coding sequences sequences sequences UCSC Genome Browser 39 Comparative analysis of multi-species sequences from targeted genomic regions Nature, 2003 40 Comparative Genomics in the CFTR Region Near CFTR 1.8 Mb of human Ch7, Sequenced for 12 ssp. How does a region change over evolutionary time? How much does it change? What types of changes are more/less common? Do some lineages have more of certain changes than others? How much comparative genomic data do we need??? Sequence Conservation 42 Looking backward from the human genome How much is still there after 450my (Fugu) 43 Transposable Elements Gone Wild! Transposable Elements Gone Wild! 45 Transposable Elements Gone Wild! 46 Transposable Elements Gone Wild! 47 Transposable Elements Gone Wild! BovB CR1 48 Nucleotide Changes Big insertions/deletions are more common than nucleotide changes! In primates, large indels are the principal mechanism accounting for observed sequence differences 49 Identifying Functionally Important Regions How many comparative genomes do we need? Can’t we just use the mouse? Using 12 species, 561 Multi-Species Conserved Sequences (MCSs) were found False Pos. True Pos. False Neg. How can be found using just the Mouse genome (rather than all 12) Multi-Species Conserved Sequences 950 of the 1,194 MCSs are neither exonic nor lie less than 1-kb upstream of transcribed sequence. Meaning they are otherwise hard to predict (Evolutionary Distance) Strong argument for comparative genomics: Need many species, and distant species – like cat, dog, fish - to ID conserved possibly-functional regions in humans! 51 Interpreting Evolutionary Changes Requires a Model …IGTLS… …IGRLS... In evolution: what is the rate R(T R) at which Ts become Rs? e.g. 0.00005 / my 20 x 20 Substitution Matrix