EVOLUTION: PRINCIPLES AND QUANTIFICATION

Human nuclear genome
Only 3% coding DNA

The evolutionary forces:
• Natural selection (Darwin, Bernardi)
• Neutral theory: genetic drift (Kimura)
• Small population sizes
• Mutation
  – Mutationalist theory (Sueoka)
  – Thermodynamic pressure theory (Zimic & Arévalo)
• Gene flow (migration), horizontal transfer

MUTATION
THE ULTIMATE SOURCE OF NEW GENETIC VARIATION.
Mutation rates are in the general range of:
• approx. 10^-7 to 10^-8 per nucleotide per generation
• approx. 10^-5 per gene per generation
• approx. 10^-3 per generation at microsatellites

BZM210: E. Willassen
Genomes, genes and molecular evolution
Interesting links:
http://www.no.embnet.org/
http://www.ncbi.nlm.nih.gov/index.html

Transitions - transversions
[Diagram: transitions are changes within the purines (A <-> G) or within the pyrimidines (C <-> T); transversions are changes between a purine and a pyrimidine.]
Expected: TS / TV = 4 / 8 = 0.5
TS / TV ratios, mtDNA and globins: 9.0 and 0.66

GENE FLOW
SPREAD OF VARIATION OVER SPACE BY MOVEMENT AND/OR INTERMARRIAGE AMONG PEOPLE ('ADMIXTURE')
• INTRODUCES NEW VARIATION INTO A POPULATION
• REDUCES VARIATION BETWEEN POPULATIONS

ADMIXTURE
IF ALLELES FROM TWO 'PARENTAL' POPULATIONS (1 and 2) MIX IN PROPORTION m FROM POPULATION 1, THE ALLELE FREQUENCY IN THE ADMIXED POPULATION (a) WILL BE
p_a = m p_1 + (1 - m) p_2

ADMIXTURE IS COMMON IN THE US AND REFLECTS POPULATION HISTORY
Autosomal vs. Y-specific vs. mtDNA
[Figure: % Native American genetic contribution (0% to 100%) in the Hispanic population of San Luis Valley, Colorado, shown for autosomal (global), Y-chromosome and mtDNA markers.]

GENETIC DRIFT
ALLELE FREQUENCY CHANGE DUE TO CHANCE FACTORS IN SEGREGATION, SURVIVAL & REPRODUCTION IN FINITE POPULATIONS.
GENETIC DRIFT IS INVERSELY RELATED TO POPULATION SIZE AND POSITIVELY RELATED TO TIME.
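The admixture relation p_a = m p_1 + (1 - m) p_2 can be tried out numerically. The sketch below is only an illustration: the function names and the allele frequencies are hypothetical, not taken from the slides. It computes the admixed frequency and then solves the same relation for m, which is only possible when the two parental frequencies differ.

```python
def admixed_frequency(m, p1, p2):
    """Allele frequency in the admixed population: p_a = m*p1 + (1 - m)*p2."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 and 1")
    return m * p1 + (1.0 - m) * p2


def admixture_proportion(pa, p1, p2):
    """Solve p_a = m*p1 + (1 - m)*p2 for m (undefined when p1 == p2)."""
    if p1 == p2:
        raise ValueError("parental frequencies must differ to estimate m")
    return (pa - p2) / (p1 - p2)


# Hypothetical frequencies, chosen only for illustration.
pa = admixed_frequency(m=0.25, p1=0.8, p2=0.2)
m_back = admixture_proportion(pa, p1=0.8, p2=0.2)
print(f"p_a = {pa:.3f}, recovered m = {m_back:.3f}")
```

An allele that differs strongly in frequency between the parental populations gives a large denominator p_1 - p_2, so the estimate of m is less sensitive to sampling error in p_a.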
PROBABILITY OF ULTIMATE FIXATION OF AN ALLELE IS ITS CURRENT FREQUENCY
APPLIES TO WHOLE SPECIES, BECAUSE TIME IS LONG

UP OR OUT IN SMALL POPULATIONS
Initial p = 0.5, N = 25, 80 generations
About ½ get fixed

CHANGE IS SLOWER IN BIG POPULATIONS
Initial p = 0.5, N = 300, 100 generations
Change is slower in larger populations

Drift reduces variation within populations due to fixation and loss of neutral alleles.
Drift increases variation between populations because different alleles are fixed in each population.

DO HUMAN RACES EXIST?

FOUNDER EFFECT
DRIFT EFFECT ON ALLELE FREQUENCIES WHEN A POPULATION IS FOUNDED BY A SMALL NUMBER OF PEOPLE FROM A LARGER POPULATION
CAN RAISE THE FREQUENCY OF A DISEASE ALLELE BY CHANCE, e.g., RELIGIOUS ISOLATES LIKE THE AMISH

Variation, measured by heterozygosity, is reduced by genetic drift due to allele loss.
Humans: 10,000 years
Effect of drift over time, random mating, no mutation:
H_t = H_0 (1 - 1/(2N))^t

Mean times to fixation and loss for selectively neutral alleles
• Loss occurs more rapidly than fixation
• Common alleles are generally old alleles
• Geographically widespread alleles are usually old

GENE 'TREES'
What happens to a DNA sequence over time ... and why?
(THINK OF THE DICE EXPERIMENT)
Darwin's Tree of Life. What about a gene?

SPECIATION
A reduction in gene flow between populations, accompanied by divergent selection and/or genetic drift, can lead to speciation.
Evolutionary history includes the transformation and divergence of lineages.
Phylogenetic evolution or anagenesis

Looking forward in time, a sequence will diverge among descendant copies, because of the accumulation of mutations.
Looking backward in time, present-day sequences coalesce to a common ancestor in the past.
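The drift statements above (fixation or loss in small populations, slower change in large ones, and the decay H_t = H_0 (1 - 1/(2N))^t) can be explored with a small simulation. This is a minimal sketch under stated assumptions: a diploid population with 2N gene copies, binomial resampling each generation (the Wright-Fisher model, which the slides do not name), and an arbitrary fixed random seed. Function names are hypothetical.

```python
import random


def expected_heterozygosity(h0, n, t):
    """H_t = H_0 * (1 - 1/(2N))**t for a diploid population of size N."""
    return h0 * (1.0 - 1.0 / (2.0 * n)) ** t


def wright_fisher(p0, n, generations, rng):
    """Allele-frequency trajectory under drift alone.

    Each generation, 2N gene copies are drawn at random from the
    previous generation's allele frequency (assumed model).
    """
    copies = 2 * n
    p = p0
    trajectory = [p]
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


rng = random.Random(1)  # arbitrary seed, for repeatability only

# Settings taken from the slides: p = 0.5 with N = 25 (80 generations)
# and N = 300 (100 generations).
small_ends = [wright_fisher(0.5, 25, 80, rng)[-1] for _ in range(100)]
large_ends = [wright_fisher(0.5, 300, 100, rng)[-1] for _ in range(10)]

resolved = sum(f in (0.0, 1.0) for f in small_ends) / len(small_ends)
print("fraction of N=25 runs fixed or lost:", resolved)
print("N=300 end frequencies:", [round(f, 2) for f in large_ends])
print("expected H after 80 gen, N=25: ", expected_heterozygosity(0.5, 25, 80))
print("expected H after 100 gen, N=300:", expected_heterozygosity(0.5, 300, 100))
```

The simulated fractions vary from run to run and with the seed; only the heterozygosity formula is deterministic.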
DEMOGRAPHIC HISTORY OF GENE COPIES
[Diagram: six sampled gene copies (1 to 6) at NOW are traced back through times T5 to T1 toward THEN; lineages join at coalescent events (common ancestry).]

MUTATION HISTORY OF ALLELES
[Diagram: starting from allele A1 at THEN, mutations arise at times T1 to T5; the alleles sampled at NOW are A2, A3, A1, A1, A4 and A5.]

The Coalescent process
Mutations arise hierarchically over time, generating a phylogeny of cladistic (tree-like, branching) DNA sequence relationships.

WHAT CAN WE SAY ABOUT THE RELATIONSHIPS AMONG A SET OF DNA SEQUENCES SAMPLED TODAY?
Current sample of DNA sequences: ACTAA, AATGA, CGAAA, CGAAG, AGTAG

DNA sequences have a common ancestor and their variation reflects their descent history
[Diagram: tree over the current sample, marking the MRCA of all samples and the MRCA of three of the samples.]

Ancestral sequences can sometimes be inferred
Inferred ancestral sequences: AAAAA (common ancestor, or coalescent), AGAAA, AATAA, CGAAA
Current sample of DNA sequences: ACTAA, AATGA, CGAAA, CGAAG, AGTAG

Population history is reflected in the pattern of sequence variation, and in the geographic location where DNA sequence haplotypes are found.
• Similar sequences are found geographically near each other.
• Common alleles are usually old.
• Ancient alleles are found globally.
• Global alleles were present at our species' origin.
• New sequences are geographically localized.
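A first step toward answering what can be said about the five sampled sequences is to count the sites at which each pair differs. The sketch below uses the sequences from the slides; the helper name `hamming` and the use of raw site counts (rather than any model-based distance) are choices made here for illustration.

```python
from itertools import combinations

# Current sample of DNA sequences, as listed in the slides.
samples = ["ACTAA", "AATGA", "CGAAA", "CGAAG", "AGTAG"]


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# All pairwise site differences among the sampled sequences.
pairwise = {(a, b): hamming(a, b) for a, b in combinations(samples, 2)}

for (a, b), d in sorted(pairwise.items(), key=lambda item: item[1]):
    print(a, b, d)

# The pair with the fewest differences is a candidate for the most
# recent coalescence in the sample.
closest = min(pairwise, key=pairwise.get)
```

Pairs with few differences are expected to share a more recent common ancestor than pairs with many, which is the intuition behind grouping sequences into a tree.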
The Neutral Theory
• The great majority of mutations that are fixed are effectively neutral with respect to fitness, and are fixed by genetic drift
• Polymorphism within populations is transient and due to the presence of selectively neutral alleles on their way to fixation or loss

The Neutral Theory
• Adaptive evolution is due to natural selection
• Advantageous mutations are rare
• Most genetic variation at the molecular level is not selected within a population
• Most genetic substitutions at the molecular level are not due to selection

Functional Constraint
• Definition: an amino acid in a protein cannot be changed without giving rise to a deleterious mutation
  – either not at all
  – or only to an amino acid of the same type

Functional Constraint
• vertebrates
• fibrinopeptides
• hemoglobin
• cytochromes
• rates depend on functional constraints
[Figure: divergence of fibrinopeptide, hemoglobin and cytochrome c; x-axis: millions of years since divergence.]

Functional Constraint
• Mitochondrial gene in mammals
• uniform rate
• rate difference between silent and amino-acid replacement mutations
[Figure: curves labeled "silent" and "replacement".]

Molecular Clock: observations
• hemoglobin in vertebrates
• plot amino acid differences against divergence time
• good linear approximation

Molecular Clock: observations
hemoglobin: about constant rate over time

Molecular Clock
• What use is the molecular clock?
• date divergence in phylogeny
• as a first approximation

Rates of Nucleotide Substitution: Insulin
• Rate variation among different regions of a gene
• Insulin: the excised C-peptide evolves faster
  Functional: rate 0.13
  Excised: rate 0.97

Rates of Nucleotide Substitution
Rate: the number of substitutions K between two homologous sequences divided by twice the time of divergence t
[Diagram: an ancestral sequence splits into Sequence 1 and Sequence 2, each lineage evolving for time t.]

Rates of Nucleotide Substitution
• Number of substitutions
  1 lineage: K = rt
  2 lineages from split: K = 2rt

Molecular Clock
• the molecular clock is used to put a time to phylogenies
• construct the phylogeny first by a clock-independent method
• base the clock on well established partial phylogenies
• run rate tests on the reference set and subsets
• estimate times on the total database

Orthologous genes or not?
Well matching sequences may not be directly homologous
• orthologous: same gene copy
• paralogous: duplicate gene copy
• xenologous: introgressed gene copy (hybridization, virus), i.e. horizontal transfer

'Multiple hits' and 'saturation'
[Figure: base pair differences (transversions, v (3rd)) plotted against F84 distance, representing time. Inset: one site changing through successive states over time (G>T, T>A, A>T), marked A to E.]
Reversal to a previous state may be detected as homoplasy.
True phylogenetic signal would be masked with time and give false synapomorphies.
Signal depends on mutation rates, r.
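The rate definition above (K = 2rt for two lineages, so r = K / (2t)) and the effect of multiple hits can both be put into a few lines. The sketch below is an illustration under assumptions: the numeric values are hypothetical, and the multiple-hit correction shown is the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3), chosen here for its simplicity; it is not the F84 distance used in the figure.

```python
import math


def substitution_rate(k, t):
    """r = K / (2t): substitutions per site per unit time, two lineages."""
    if t <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t)


def expected_substitutions(r, t, lineages=2):
    """K = r*t for one lineage, K = 2*r*t for two lineages since a split."""
    return lineages * r * t


def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites for multiple hits.

    Jukes-Cantor model (an assumption of this sketch):
    d = -(3/4) * ln(1 - 4p/3), defined for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical numbers: K = 0.2 substitutions per site, 100 million years.
r = substitution_rate(0.2, 100e6)
print("rate per site per year:", r)

# Saturation: the corrected distance grows faster than the observed one.
for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor_distance(p), 3))
```

As the observed proportion of differences approaches its ceiling, the corrected distance increases steeply, which is one way to see why saturated sites carry little usable signal.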
Adjusted sequence change
[Figure: base pair differences (transversions, v (3rd)) plotted against F84 distance, representing time, with a 'correction factor' indicated.]
Different models have been made with the intention to correct for multiple hits by converting observed distances between sequences to actual (expected) distances (under the particular model).

We can use genetic differences among populations or species to reconstruct evolutionary history.
Inferring a likely evolutionary history from genetic differences

Divergence can be used for grouping
[Tree: Human, Horse, Cow, Kangaroo, Newt, Carp; scale bar 0.1]

Amino acid sequences of hemoglobin alpha chains
No. of Taxa: 6
Gaps/Missing data: Complete Deletion
Distance method: Amino: Poisson correction
No. of Sites: 140
d: Estimate

               1     2     3     4     5     6
[1] Human      -
[2] Horse      0.13  -
[3] Cow        0.13  0.13  -
[4] Kangaroo   0.21  0.23  0.20  -
[5] Newt       0.57  0.64  0.60  0.64  -
[6] Carp       0.66  0.65  0.62  0.71  0.75  -

An example of phylogeny reconstruction from genetic differences by UPGMA

Molecular clocks: the longer the time, the more genetic divergence

Sequence divergence and time
Let K be the distance between two sequences.
The rate of amino acid substitution, r, can be estimated if we know the time of divergence, T.
The rate is: r = K / (2T) (because 2 sequences are diverging)
[Diagram: two diverging lineages, Human and Horse.]
If we know r, the time of divergence T can be estimated as: T = K / (2r)

Dating a branch split
• an approximate T2 is known from the historical record, a vicariance event, or the fossil record
• compare sequences A, B, C pairwise and compute the number of substitutions per site, K
• T1 = (K_AC + K_BC) T2 / (2 K_AB)
[Diagram: tree in which A and B split at T2 and C splits from their common ancestor at T1.]
This procedure assumes a constant substitution rate in all branches.
Also, mutations (divergence points) may be older than the speciation events being dated.

Molecular clock
Zuckerkandl & Pauling (1965): the rate of amino acid change appears constant through time
Kimura (1968, 1983):
• if sequence divergence between humans and horses is scaled for time using fossils
• and the estimated
evolutionary rate, r, is applied to all known protein-coding loci
• one amino acid substitution has been fixed every second year on average
Interpretation: this is too much for selection to have been influential during the evolution of the vertebrates

The fate of mutations
• mutations can be neutral
• mutations can be advantageous and subject to positive selection
• mutations can be disadvantageous and subject to purifying selection
• mutations can be driven by thermodynamic pressure
Selection can be detected by testing sequences against the predictions of neutral theory (for instance synonymous vs non-synonymous codons).

Evolutionary constraints on DNAs
[Figure: entropy (0 to 0.7) by base position, shown alongside an rRNA secondary structure with loop and helix regions labeled.]
Constraints are associated with functionality, for instance the need for rRNA to base pair and form helices in a secondary molecular structure.

Transcription and translation
Translation requires available tRNA with appropriate anticodons to match each codon on the mRNA.

DNA coding
Codes in organelle genomes differ slightly from the standard code.

Codon usage
Codon bias: all codons are not equally frequent.
Codon redundancy: synonymous (silent) substitutions give the same amino acids.
Synonymous substitutions do not affect the translation product and thus should be neutral in expressed genes.

Anopheles gambiae
Amino acid  Codon  Fraction
Gly         GGG    0.14
Gly         GGA    0.56
Gly         GGT    0.27
Gly         GGC    0.03
Glu         GAG    0.02
Glu         GAA    0.98
Asp         GAT    0.95
Asp         GAC    0.05
Val         GTG    0.02
Val         GTA    0.50
Val         GTT    0.45
Val         GTC    0.02
Ala         GCG    0.00
Ala         GCA    0.28
Ala         GCT    0.64
Ala         GCC    0.08

However, availability of specific tRNAs may make some codons more 'fit'.

5.
Anomalous DNA composition
If synonymous codons are neutral, they are expected to occur in equal frequency.
Expect 50/50 frequency for the two phenylalanine codons.
Codon biases are found in all known prokaryotes.
[Figure: codon frequencies in E. coli]

Translational efficiency depends on tRNA availability.
Some tRNAs may pair with different codons due to:
• 'wobbles' on the anticodon
• modified nucleotides on the anticodon (possibility of G-U pairing; inosine (G') pairs with A, C and U)
For instance: codons xxC and xxU can be read by the same anticodon, xxG.
Consequently some genomes do well with a reduced number of tRNA types in the genome: 22 in vertebrate mitochondrial DNA (mtDNA).

[Figure: leucine codons in two organisms, comparing tRNA availability with codon usage in highly expressed and lowly expressed genes.]

Factor analysis of codon usage of B. subtilis genes reveals three classes of genes
• Class 2 (5%): genes that are highly expressed under exponential growth conditions
• Class 1 comprises the majority of the B. subtilis genes (82%)
• Class 3 (13%): genes that were apparently horizontally transferred. Because some of the genes in this group showed clear relationships with bacteriophage genes, the hypothesis has been proposed that all these genes were alien and have been acquired horizontally from various sources.
Kunst, F et al. Nature (1997) 390 249-256

Why do horizontally transferred genes use the genetic code differently?
Moszer I. Current Opinion in Microbiology 1999, 2:524–528

Bacterial species display a wide degree of variation in their overall G+C content.
Rocha EP. Trends Genet 2002 Jun;18(6):291-4
However, most genes have roughly the same GC content within a genome.

[Figure: Distribution of A + T-rich islands along the chromosome of B. subtilis. Location of genes from class 3 according to codon usage analysis is indicated by dots at the bottom of the graph. Known prophages (PBSX, SPβ and skin) are indicated by their names, and prophage-like elements are numbered from 1 to 7.]
Kunst, F et al. Nature (1997) 390 249-256

Synonymous substitutions are not necessarily neutral

Lowly expressed genes                          Highly expressed genes
weak selection for translational efficiency    strong selection for translational efficiency
more tRNAs used                                fewer tRNAs used
weak codon bias                                strong codon bias
high rates of silent (neutral) mutations       low rates of silent mutations (purifying selection)

i.e. synonymous mutations are not necessarily neutral!

Code Table: Standard
Method: Nei-Gojobori (1986)
S = No. of synonymous sites
N = No. of nonsynonymous sites

           No. of sites     Redundancy for codon position
Codon      S      N         1st  2nd  3rd
TTT (F)    0.333  2.667     0    0    2
TTC (F)    0.333  2.667     0    0    2
TCT (S)    1.000  2.000     0    0    4
TCC (S)    1.000  2.000     0    0    4
TCA (S)    1.000  2.000     0    0    4
TCG (S)    1.000  2.000     0    0    4
TAA (*)    0.000  3.000     0    0    0
TAG (*)    0.000  3.000     0    0    0
TGA (*)    0.000  3.000     0    0    0
TGT (C)    0.500  2.500     0    0    2
TGC (C)    0.500  2.500     0    0    2
TGG (W)    0.000  3.000     0    0    0
CTT (L)    1.000  2.000     0    0    4
CTC (L)    1.000  2.000     0    0    4
CTA (L)    1.333  1.667     2    0    4
TTA (L)    0.667  2.333     2    0    2
TTG (L)    0.667  2.333     2    0    2

Redundancy and rates on codon positions
With codon redundancy we would expect fewer selective constraints on 3rd codon positions.
1st and 2nd positions should be under stronger selective pressure.
Consequently, evolution rates on 3rd codon positions are usually found to be higher than on 1st and 2nd positions.

The problem: different molecules can yield different trees AND may still be telling the truth
Even the sacred of sacreds of phylogenetic taxonomy can be violated:
[Diagram: gene trees A, B and C over Archaea and Bacteria.]
Kingdoms are not monophyletic in gene trees B and C.

The solution: Horizontal Gene Transfer (HGT)
HGT possesses two ingredients sure to cause a controversy
1. Challenges the traditional tree-based view of evolution
2.
Is difficult to prove unambiguously

"Infectious heredity"
The significance of horizontal transfer was first recognized in the 1950s: resistance to multiple antibiotics could be transferred simultaneously from Shigella to Escherichia coli.

Xenologs arise by horizontal transfer
[Diagram: an ancestral gene traced through time across organisms; duplication gives paralogs, speciation gives orthologs, and horizontal transfer gives xenologs.]
• Xenologs: homologs related by horizontal transfer
• Orthologs: homologs related by speciation
• Paralogs: homologs related by duplication

Mechanisms of horizontal transfer (also referred to as lateral transfer)
1) Transformation: prokaryotes can take up free DNA from their surroundings.
2) Conjugation (bacterial sex): an organism builds a tube-like structure known as the pilus, joins it to its "mate", and transfers a plasmid through the tube. E. coli has been shown to conjugate with cyanobacteria, AND EVEN with S. cerevisiae!
3) Transduction: genes can be moved from one prokaryote species to another via viruses.

Horizontally transferred genes retain the sequence characteristics of the donor genome
Base composition differences are mostly due to the third position of codons.
Lawrence and Ochman. J Mol Evol (1997) 44:383–397

4. Conservation of gene order
Gene order is not generally conserved in microbial genomes.
[Figure: gene order compared among E. coli, B. subtilis and V. cholerae.]
• The presence of three or more genes in the same order in distant genomes is extremely unlikely unless these genes form an operon.
• Each operon typically emerges only once during evolution and is maintained by selection ever after.
• Therefore, when an operon is present in only a few distantly related genomes, horizontal gene transfer seems to be the most likely scenario.
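The remark that base composition differences sit mostly at the third position of codons suggests a simple check on a coding sequence: compare its overall G+C fraction with the G+C fraction at third codon positions. The sketch below is an illustration only. The two short sequences are hypothetical; they were constructed to encode the same peptide with different synonymous codons, and the function names are not from any library.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return gc_fraction(cds[2::3])


# Hypothetical coding sequences for the same peptide (Met-Ala-Ala-Leu-Gly),
# differing only in their choice of synonymous codons.
gc_rich = "ATGGCCGCGCTGGGCTAA"
at_rich = "ATGGCAGCTTTAGGATAA"

for name, cds in (("GC-rich", gc_rich), ("AT-rich", at_rich)):
    print(name, round(gc_fraction(cds), 3), round(gc3_fraction(cds), 3))
```

In this toy pair the third-position values differ far more than the overall values, which is the pattern that makes third-position composition a useful signal when looking for genes that do not match the rest of a genome.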