Download The divergence of duplicate genes in Arabidopsis
The dynamics of nuclear gene order in the eukaryotes
Genome archaeology in the angiosperms
Todd Vision
Department of Biology
University of North Carolina at Chapel Hill

Comparative maps
• Spaghetti diagram: Livingstone et al. (1999) Genetics 152:1183
• Crop circle: Gale & Devos (1998) PNAS 95:1972

Arabidopsis as a hub for plant comparative maps

Genome sizes in angiosperms
[Figure: bar chart of genome size in megabases (axis 0 to 1000) for 11 species; bar values 145, 262, 367, 367, 372, 415, 439, 473, 560, 622, 907]
Data from Arumuganathan & Earle (1991) Plant Mol Biol Rep 9:208-218

Tomato-Arabidopsis synteny
Bancroft (2001) TIG 17, 89, after Ku et al. (2000) PNAS 97, 9121

Outline
• Ancient genome duplication
  – How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication in genome evolution

Rice-Arabidopsis synteny
Mayer et al. (2001) Genome Res. 11, 1167

Paleotetraploidy?
The Arabidopsis Genome Initiative (2000) Nature 408:796

Genomic dot-plot
Rows: genes on chromosome copy 1. Columns: genes on chromosome copy 2.

gene  1  2  3  4  5  6  7  8
1     1  0  0  0  1  0  0  0
2     0  1  0  0  0  1  0  0
3     0  0  1  0  0  0  1  0
4     0  0  0  1  0  0  0  1
5     1  0  0  0  1  0  0  0
6     0  1  0  0  0  1  0  0
7     0  0  1  0  0  0  1  0
8     0  0  0  1  0  0  0  1

Duplication vs. multiplication
Multiple duplications generate abundant overlaps among homeologous regions

Segmental paralogy in Arabidopsis
Vision et al. (2000) Science 290:2114-7

Many duplicated segments but few duplication events
[Figure: histogram of frequency of blocks (0 to 12) against amino acid substitution (0 to 0.9), with age classes labeled A to F]

Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144

[Figure: angiosperm phylogeny with rice, Arabidopsis, and tomato]
• Block 37: after the Asterid-Rosid split
• Block 57: before the monocot-dicot divergence
Angiosperm Phylogeny Website. Version 2, August 2001. http://www.mobot.org/MOBOT/research/APweb/
Raes, Vandepoele, Saeys, Simillion, Van de Peer (2003) J. Struct. Func. Genomics 3, 117-129

Divergence of homeologs
• Homeologs from age class C and older share less than a third of their genes
  – Gene loss
  – Or subsequent gene movement?
• There is no evidence for uneven proportions of duplicated genes between homeologs

Redundant gene function: SHATTERPROOF
Martin Yanofsky

Implications for comparative maps
• Networks of synteny
• Goodbye to pairwise comparisons

Outline
• Ancient genome duplication
  – How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication in genome evolution

Ghosts and Muggles
Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627

Interspecies comparison can reveal Ghosts

Things needful
• Identification of highly diverged Muggles
• A systematic way to identify Ghosts
• Centralization of mapped and sequenced DNA markers from multiple species

FISH (Fast Identification of Segmental Homology)
• Identifies candidate segmental homologies
  – Dynamic programming
• Statistically evaluates candidates
  – Null model of transpositional duplication
• No permutations required
• Approaches limits to sensitivity

FISH under null model

k   observed number   standard error   upper bound   lower bound
2   45.8              0.06             47.6          40.1
3   2.28              0.02             2.39          1.78
4   0.113             0.003            0.120         0.079
5   0.006             0.001            0.006         0.004
6   0.0003            0.0002           0.0003        0.0002

eAssembler
• Reconstructs ancestral gene order by joining duplicated blocks with overlapping gene content
• Uses ‘breakpoint median’ as objective function
• Similar to algorithms used in sequence assembly

Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144

PHYTOME: integrating plant genome maps, sequences and phylogenies

From www.plantgdb.org

Outline
• Ancient genome duplication
  – How can we reconstruct genomic history?
• Computational challenges
• Role of different classes of gene duplication in genome evolution

Gene duplications in a chromosomal context
• Turnover within gene families can be high
  – Rate of duplication = 0.002 per gene per MY
  – Half-life = 23 MY
• Three modes of duplication
  – Tandem
  – Transpositional
  – Segmental
• How does the mode of origin affect the molecular and functional divergence of duplicate genes?

Gene family turnover
Lynch and Conery (2000) Science 290, 1151

Importance of tandem and transpositional duplications
• ~10% of genes are in tandem arrays
• 85% of dispersed duplications are not in blocks
• Duplicates on the same chromosome are 20% more common than expected by chance
• Duplicates on the same chromosome are 86% as distant as would be expected by chance

Aux/IAA and ARF sister families
• Importance in Arabidopsis

Diversification of the Aux/IAA gene family
David Remington and Jason Reed

Diversification of ARF gene family

Chromosome 2-4 complex: 242 duplicated gene pairs
[Figure: dot plot of chromosome 2 (5.6 Mb; axis 1200 to 2800) against chromosome 4 (4.6 Mb; axis 2600 to 4200), with blocks labeled 45, 49, 52, 54, 56]

Substitutions in coding sequences
• Silent substitutions (Ks) only alter the codon, not the resulting amino acid
• Replacement substitutions (Ka) alter the amino acid
• Ks and Ka are standardized by the numbers of synonymous and nonsynonymous sites, respectively

Ratio of Ka to Ks
• Ka/Ks < 1: selective constraint
• Ka/Ks = 1: pure neutrality
• Ka/Ks > 1: positive selection

How have these ancient segmental duplicates diverged?
1. What is the variation in Ka and Ks among simultaneously duplicated pairs?
2. Do the Ka/Ks ratios suggest positive selection?
3. Do the members of each duplicated pair evolve at the same rate?
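
The Ka/Ks interpretation above can be sketched in a few lines of Python. This is only an illustration of the three-way reading of the ratio, not the analysis behind the slides: the `GenePair` class, the `selection_regime` function, the `tolerance` band around 1, and the example values are all hypothetical, and a real analysis would test the ratio statistically rather than apply a fixed cutoff.

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    """One duplicated gene pair with its substitution estimates (hypothetical container)."""
    name: str
    ka: float  # replacement substitutions per nonsynonymous site
    ks: float  # silent substitutions per synonymous site


def selection_regime(pair: GenePair, tolerance: float = 0.05) -> str:
    """Label a pair by its Ka/Ks ratio.

    `tolerance` is an assumed band around 1; the slides give no cutoff.
    """
    if pair.ks <= 0:
        return "undefined"  # the ratio cannot be formed without silent changes
    ratio = pair.ka / pair.ks
    if ratio < 1 - tolerance:
        return "selective constraint"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "near neutrality"


# Made-up values for illustration only.
pairs = [
    GenePair("pair_a", ka=0.14, ks=1.00),
    GenePair("pair_b", ka=0.50, ks=0.50),
    GenePair("pair_c", ka=0.90, ks=0.45),
]
labels = {p.name: selection_regime(p) for p in pairs}
for name, label in labels.items():
    print(name, label)
```

A fixed band is the simplest possible choice; it ignores the sampling error in Ka and Ks, which is large when Ks approaches saturation.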
[Figure: histogram of Ka (0 to 1) by frequency (0 to 70); coefficient of variation = 0.67]
[Figure: histogram of Ks (0 to 5) by frequency (0 to 120); coefficient of variation = 0.53]

Relationship between Ka and Ks
[Figure: scatter plot of Ka (0 to 1) against Ks (0 to 5) with the line Ka/Ks = 1; r² = 0.558, p < 0.001]

Relative rate test
[Diagram: three-taxon tree with outgroup O and duplicates A and B; branch lengths d1, d2, d3]
Compare the fit of a model in which d2 = d3 with one in which they are allowed to vary

Relative rate tests
• 105 gene pairs could be evaluated against an outgroup
• >30 showed significantly unequal rates of evolution
• No evident chromosomal or regional biases

Distance measure   Significant pairs
protein            15
Ka                 29
Ks                 9

Are paralogs different than orthologs?
• Homologous genes are either
  – Paralogs that diverged through duplication
  – Orthologs that diverged through speciation
• Paralogs must coexist in the same genome
  – Do they diverge differently as a result?
• Comparison to 212 Arabidopsis-Brassica orthologs by Tiffin and Hahn (2002) JME 54, 746
  – For all pairs, Ka/Ks < 1
  – Ka/Ks unimodal around 0.14 (as opposed to 0.20)
  – CV(Ks)/CV(Ka) is approximately 2

Conclusions
• A network of synteny due to duplication and gene loss makes deep comparative mapping difficult
• But phylogenetically-informed methods should allow us to go much deeper than at present
• Only by going deep will we be able to understand the varied roles of different kinds of duplication events in the diversification of gene families

Acknowledgements
• Arabidopsis genome evolution
  – Daniel Brown
  – Steven Tanksley
• Comparative mapping
  – Peter Calabrese
  – Sugata Chakravarty
  – Luke Huan
• Evolution of duplicated genes
  – Liqing Zhang
  – Brandon Gaut
  – David Remington
  – Jason Reed
• Support
  – USDA
  – NSF

Conservation of gene orientation
• parallel
• convergent
• divergent

Formulating the problem in terms of graph traversal
• Nodes are matches
• Edges are unidirectional
• Edges have associated distances
The putative duplicated blocks consist of the paths through the graph that traverse edges with short distances

Statistical framework
• Null model of duplications
  – Single-gene duplication/random transposition
  – Leads to uniformly distributed dots
• Null distribution for
  – The edge distance between nearest neighbors
  – The number of serially connected short edges
• Observed edge distances and path lengths analytically compared to null expectation
• Can be approximated by a permutation test

Only a fraction of the genes are (still?) duplicated
• Chr2 segment: 1183 genes
• Chr4 segment: 1168 genes
• 326 duplicates (~28%)
• 271 (83%) pairwise duplications

Tandem substitutions
• Correlation between Ka and Ks disappears when tandem substitutions are excluded
• Could be due to
  – doublet mutations
  – compensatory substitutions

[Figure: four relative-rate trees with branch lengths and p-values (p<0.0001, p<0.05, p<0.0001, p<0.01). Panels: 49.5 calmodulin-binding protein; 49.62 beta-expansin; 49.63 NADH-ubiquinone oxidoreductase; 56.1 unknown transmembrane. Loci shown: AT4g31000, At2g18750, AT4g28250, At2g20750, At2g20800, AT4g30430, AT4g28220, At2g23810. Outgroups: tobacco 1698547, rice 8118436, potato 5734586, Hemerocallis 3551953]
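
The graph-traversal formulation in the slides (matches as nodes, unidirectional edges, short edge distances) can be illustrated with a small dynamic-programming sketch. This is not the FISH implementation and it omits the statistical evaluation entirely. The function name `find_blocks`, the `max_gap` and `min_len` parameters, and the use of a per-axis gap as the edge distance are assumptions made for illustration, and only the parallel orientation is handled.

```python
from typing import List, Tuple

Dot = Tuple[int, int]  # (gene position on copy 1, gene position on copy 2)


def find_blocks(dots: List[Dot], max_gap: int = 2, min_len: int = 3) -> List[List[Dot]]:
    """Chain dots into candidate blocks by following short, forward-only edges."""
    pts = sorted(set(dots))  # sorting by x guarantees predecessors come first
    n = len(pts)
    best_len = [1] * n   # longest chain ending at each dot
    prev = [-1] * n      # predecessor on that chain, -1 if none
    for j in range(n):
        xj, yj = pts[j]
        for i in range(j):
            xi, yi = pts[i]
            dx, dy = xj - xi, yj - yi
            # An edge exists only if both coordinates advance by a short step.
            if 0 < dx <= max_gap and 0 < dy <= max_gap:
                if best_len[i] + 1 > best_len[j]:
                    best_len[j] = best_len[i] + 1
                    prev[j] = i
    used = [False] * n
    blocks = []
    # Trace back from the longest chains first so that dots are not reused.
    for end in sorted(range(n), key=lambda k: -best_len[k]):
        chain = []
        k = end
        while k != -1 and not used[k]:
            used[k] = True
            chain.append(pts[k])
            k = prev[k]
        if len(chain) >= min_len:
            blocks.append(chain[::-1])
    return blocks


# Off-diagonal matches from the 8-gene dot-plot example, plus one stray dot.
dots = [(1, 5), (2, 6), (3, 7), (4, 8), (7, 2)]
blocks = find_blocks(dots)
print(blocks)
```

On this toy input the four collinear matches form one block and the stray dot is dropped for falling below `min_len`. Deciding whether such a block is longer than expected by chance is the job of the null model described in the slides, which this sketch does not attempt.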