Putting gene family evolution in its chromosomal context
Todd Vision
Department of Biology
University of North Carolina at Chapel Hill

Outline
Gene order rearrangement in plants
• Chromosomal perspective
• Gene family perspective
Gene duplication and functional divergence
• Segmental duplications as a tool

Chromosomal perspective
Biological importance
• Clustering of gene function
• Clustering of transcriptional activity
Applied importance
• Conservation of gene order (synteny)
Devos and Gale 2000 Plant Cell 12, 637

Arabidopsis as a hub for plant comparative maps
[Figure: genome sizes (megabases) of selected angiosperms; Arumuganathan and Earle 1991 Plant Mol Biol Rep 9, 208]

Arabidopsis paleopolyploidy
The Arabidopsis Genome Initiative 2000 Nature 408, 796

Non-overlapping syntenies
[Figure: dot plot of syntenic segments between regions of chromosome 2 (5.6 Mb) and chromosome 4 (4.6 Mb)]
Blanc et al. 2003 Genome Res. 13, 137; Blanc and Wolfe 2004 Plant Cell 16, 1667

Tomato-Arabidopsis synteny
Bancroft 2001 TIG 17, 89, after Ku et al. 2000 PNAS 97, 9121

Rice-Arabidopsis microsynteny
Mayer et al. 2001 Genome Res. 11, 1167

Hidden syntenies
Simillion et al. 2002 PNAS 99, 13627

Interspecies comparison can reveal hidden syntenies
Vandepoele et al. 2002 TIG 18, 606; Simillion et al. 2004 Genome Res. 14, 1095

From descriptive to predictive
Can we predict the gene content of homologous segments when markers are sparse?
Utility for QTL mapping
• Prioritize candidate genes in a QTL region from a non-sequenced genome
• Provide markers for fine-mapping

Hidden Markov Models (HMM)
[Figure: two hidden states, 1 and 2, with transition probabilities t1,1, t1,2, t2,2 and t2,end, and emission probabilities p1(a), p1(b), p2(a), p2(b)]
Observed states: a -> b -> a
Hidden states: 1 -> 1 -> 2 -> end
Probability: $P = p_1(a)\, t_{1,1}\, p_1(b)\, t_{1,2}\, p_2(a)\, t_{2,\mathrm{end}}$

A gene content HMM
Observed states
• A homologous gene is either observed or not
Hidden states
• Presence or absence of the gene within a segment
Emission probabilities
• A gene will be unobserved if it is not present
• A gene may be unobserved even if it is present
• Dependent on the density of the gene map
Transition probabilities
• Reflect conservation of gene content along the branches of a phylogeny

Transition probabilities and the segment phylogeny
[Figure: transitions between gene presence (P) and absence (A) along a branch under three models]
• Loss (L): P -> A with probability a (P -> P with 1 - a); A -> A with probability 1
• Loss-Gain (LG): as in L, plus A -> P with probability b (A -> A with 1 - b)
• Multiple Loss-Gain (MLG): the loss probability a_i varies among branches (figure legend: 1 = speciation, 2 = duplication); gain probability b

Estimating model parameters
Segment phylogeny
• Each set of homologous genes is missing from some segments
• Estimate an “averaged” distance matrix
• Build tree with neighbor-joining and midpoint rooting
HMM parameter estimation
• Loss rate(s)
• Gain rate
• Number of genes present at the root

Do parameter estimates converge?
LG model, n = 100 genes, no missing data, a1 = 0.1, a2 = 0.3, 1000 replicates
Initial a   â1      SE      â2      SE
0.05        0.106   0.006   0.294   0.018
0.3         0.106   0.006   0.294   0.018

Accuracy of hidden state assignments
[Figure: estimated vs. true probability of gene presence for the L, LG and MLG models; 5-segment phylogeny, a1 = 0.1, a2 = 0.3, b = 0.1, 24% gain]

A large multiplicon
12 segments from rice and Arabidopsis, 56 sets of homologous genes
Vandepoele et al. 2003 Plant Cell 15, 2192

Self-validation test
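The two-state example on the Hidden Markov Models slide multiplies one emission probability and one transition probability per step of the hidden path. A minimal Python sketch of that calculation follows; `path_probability` is a hypothetical helper and the parameter values are made up purely so the numbers are concrete, not taken from the talk or from any gene content model.

```python
def path_probability(observed, hidden, emit, trans):
    """Joint probability P(observed, hidden) for a path that finishes in 'end'.

    emit[state][symbol] plays the role of p_state(symbol);
    trans[(from_state, to_state)] plays the role of t_{from,to}.
    """
    if len(observed) != len(hidden):
        raise ValueError("observed and hidden sequences must have equal length")
    prob = 1.0
    for i, (symbol, state) in enumerate(zip(observed, hidden)):
        prob *= emit[state][symbol]  # emit the observed symbol from this state
        next_state = hidden[i + 1] if i + 1 < len(hidden) else "end"
        prob *= trans[(state, next_state)]  # move to the next state, or to end
    return prob


# Hypothetical parameters, chosen only for illustration.
emit = {1: {"a": 0.7, "b": 0.3}, 2: {"a": 0.4, "b": 0.6}}
trans = {(1, 1): 0.8, (1, 2): 0.2, (2, 2): 0.9, (2, "end"): 0.1}

# Observed a -> b -> a, hidden 1 -> 1 -> 2 -> end:
# p1(a) * t1,1 * p1(b) * t1,2 * p2(a) * t2,end
p = path_probability(["a", "b", "a"], [1, 1, 2], emit, trans)
print(round(p, 6))  # prints 0.001344
```

In a full HMM one would sum such terms over all hidden paths (the forward algorithm); the single-path product shown here is only the building block the slide illustrates.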
Probability of gene presence (8 longest segments)
Segment   True    Estimate   Diff
1         0.251   0.173      +0.078
2         0.225   0.166      +0.059
3         0.262   0.171      +0.091
4         0.149   0.175      -0.026
5         0.268   0.171      +0.097
6         0.233   0.167      +0.066
7         0.226   0.170      +0.056
8         0.148   0.168      -0.020
Branch lengths scaled so that the longest branch is 1.0
Estimate of a = 0.7

Summary: gene content HMM
Multispecies comparative maps
• Becoming more common
• Most species only partially characterized
• Usefulness also compromised by sparse synteny
Probabilistic models will allow us to move
• from simple descriptions of the extent of synteny
• to predictive tools that can guide further experiments

Gene family perspective
Modes of duplication
• Tandem (T)
• Dispersed (D)
• Segmental (S)

A tale of two sisters: the ARF and the Aux/IAA gene families
Modulate whole-plant response to auxin
Interact via dimerization
• ARFs are transcription factors
• Aux/IAAs bind and repress ARFs in the absence of auxin

Diversification of ARFs
Remington et al. 2004 Plant Physiol. 135, 1738

The chromosomal context
Remington et al. 2004 Plant Physiol. 135, 1738

Diversification of the Aux/IAAs
Remington et al. 2004 Plant Physiol. 135, 1738

Why the different patterns of diversification?
12% (ARF) vs 40% (Aux/IAA) segmental duplications
Presumably reflects differential retention
Possible explanations
• Dosage requirements
• Coevolution with other interacting genes
• Regional transcriptional regulation

How typical is the Aux/IAA family?
Gene family                          Genes   S events
Proteasome alpha & beta subunits     23      9
Ser/Thr phosphatase                  26      10
Ras-related GTP-binding              72      19
Auxin-independent growth promoter    33      8
Major intrinsic protein              38      10
Calmodulin                           79      20
Phosphatidylcholine transferase      30      8
Cation/hydrogen exchanger            28      8
Cannon et al. 2004 BMC Plant Biology 4, 10

Segmental duplication of pathways?
Blanc and Wolfe 2004 Plant Cell 16, 1679
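One way to ask how typical the Aux/IAA family is, using the Cannon et al. table of gene families, is to divide S events by family size. The sketch below does that arithmetic; treating S events / genes as an approximate share of segmentally duplicated members is an assumption made here for illustration, and the talk does not say that the 12% and 40% figures were computed this way.

```python
# (Genes, S events) per family, copied from the Cannon et al. 2004 table.
families = {
    "Proteasome alpha & beta subunits": (23, 9),
    "Ser/Thr phosphatase": (26, 10),
    "Ras-related GTP-binding": (72, 19),
    "Auxin-independent growth promoter": (33, 8),
    "Major intrinsic protein": (38, 10),
    "Calmodulin": (79, 20),
    "Phosphatidylcholine transferase": (30, 8),
    "Cation/hydrogen exchanger": (28, 8),
}

# Assumption: each segmental (S) event adds one gene, so S events / genes
# approximates the fraction of family members arising by segmental duplication.
ratios = {name: s / n for name, (n, s) in families.items()}

# Print families from the highest to the lowest ratio.
for name, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:36s} {r:6.1%}")
```

Under that assumption, the proteasome and Ser/Thr phosphatase families (about 39%) come closest to the 40% quoted for Aux/IAA, while the remaining families fall between about 24% and 29%.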
Summary: gene family perspective
Chromosomal context can matter
Gene families differ in their patterns of duplicate gene proliferation
• Presumably due to differential retention
Polyploidy
• Qualitatively differs from other gene duplication modes
• Divergence of whole pathways possible

Functional divergence and chromosomal context
Do patterns of divergence (i.e., spatiotemporal expression) differ among T, D, and S duplicates?

Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003)
Approximately 50% of pairs diverge very rapidly
Proportion of divergent pairs increases with synonymous substitutions (Ks)
Less so with replacement changes (Ka)
• Plateaus at Ka ~0.3 in human
In humans, distantly related pairs with conserved expression tend to be either ubiquitous or very tissue-specific

Digital expression profiling
Massively Parallel Signature Sequencing (MPSS)
• Count occurrences of 17-20 bp mRNA signatures
• Cloning and sequencing are done on microbeads
• Similar to Serial Analysis of Gene Expression (SAGE)
“Bar-code” counting reduces concerns about
• cross-hybridization
• probe affinity
• background hybridization
Which enables
• Accurate counts of low-expression genes
• Distinguishing the expression profiles of duplicate genes

MPSS technology
Clone 3’ ends of transcripts to microbeads
Sort by FACS and deposit in a channeled monolayer
Sequence 17-20 bp from the 5’ end by hybridization
Brenner et al. 2000 PNAS 97:1665

MPSS data
Signature            Frequency
GATCAATCGGACTTGTC    2
GATCGTGCATCAGCAGT    53
GATCCGATACAGCTTTG    212
GATCTATGGGTATAGTC    349
GATCCATCGTTTGGTGC    417
GATCCCAGCAAGATAAC    561
GATCCTCCGTCTTCACA    672
GATCACTTCTCTCATTA    702
GATCTACCAGAACTCGG    814
. . .
GATCGGACCGATCGACT    2,935
Total number of tags: >1,000,000

Classifying signatures
[Figure: example signatures mapped to annotated genes, with callouts]
• Duplicated: expression may be from another site in the genome
• Potential alternative splicing or nested gene
• Anti-sense transcript or nested gene?
• Potential alternative termination
• Typical signatures
• Potential anti-sense transcript
• Potential un-annotated ORF
Triangles refer to colors used on our web page:
• Class 1: in an exon, same strand as ORF
• Class 2: within 500 bp after the stop codon, same strand as ORF
• Class 3: anti-sense of ORF (like Class 1, but on the opposite strand)
• Class 4: in genome but NOT class 1, 2, 3, 5 or 6
• Class 5: entirely within an intron, same strand
• Class 6: entirely within an intron, anti-sense
• Grey: potential signature NOT expressed
• Class 0: signatures found in the expression libraries but not in the genome

Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware
Library   Signatures sequenced   Distinct signatures
Root      3,645,414              48,102
Shoot     2,885,229              53,396
Flower    1,791,460              37,754
Callus    1,963,474              40,903
Silique   2,018,785              38,503
TOTAL     12,304,362             133,377

http://www.dbi.udel.edu/mpss
Query by
• Sequence
• Arabidopsis gene identifier
• Chromosomal position
• BAC clone ID
• MPSS signature
• Library comparison
Site includes
• Library and tissue information
• FAQs and help pages

Genome-wide MPSS profile in Arabidopsis
[Figure: MPSS signatures along chromosomes I-V]
Of the 29,084 gene models, 17,849 match unambiguous, expressed class 1 and/or 2 signatures.

Dataset of duplicate pairs
Arabidopsis gene families of size 2, classified as
• Dispersed (280)
• Segmental (149)
• Tandem (63)
For each pair
• Measured similarity/distance in expression profile
• Estimated silent (Ks) and replacement (Ka) changes

Expression distance
[Figure: expression distance between the two copies of a pair, shown on axes for library 1, library 2 and library 3]

Major findings
Many pairs are divergent in sequence but not in expression, and vice versa
Pairs have atypically high expression
• Especially slowly evolving pairs
Divergence increases with Ka
• Particularly among S duplicates!
Divergence tends to be highly asymmetric

Expression level >5 ppm in x libraries
Libraries (x)   Genes in pairs   All genes
0               153 (15.5%)      4160 (23.3%)
1               124 (12.6%)      2643 (14.8%)
2               73 (7.4%)        1727 (9.6%)
3               93 (9.5%)        1777 (10.0%)
4               109 (11.1%)      1930 (10.8%)
5               432 (43.9%)      5612 (31.4%)
$d_N = 0.48 + 0.37\,K_a$, $p < 0.0001$

Asymmetric divergence
Type of pair                 A            B             C            D
Young dispersed (Ks ≤ 0.5)   14 (15.7%)   61 (68.5%)    8 (9.0%)     6 (6.7%)
Tandem (Ks ≤ 0.5)            8 (14.3%)    29 (51.8%)    10 (17.9%)   9 (16.1%)
Old dispersed (Ks > 0.5)     35 (18.3%)   111 (58.1%)   24 (12.6%)   21 (11.0%)
Segmental (all)              31 (20.8%)   104 (69.8%)   7 (4.7%)     7 (4.7%)
A: Each copy has higher expression in at least one library
B: One copy has higher expression in all libraries that differ, and at least two libraries differ
C: Copies differ in expression in only one library
D: Copies do not differ in expression in any library

Why put gene family evolution into a chromosomal context?
We can begin to understand and utilize patterns of evolution in gene order
We can gain insight into the function and evolution of gene families that is not apparent from beanbag genomics

Thanks to: Zongli Xu, David Remington, Jason Reed, Tom Guilfoyle, Blake Meyers, NSF
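The four asymmetric-divergence categories (A-D) are defined only in terms of whether the two copies "differ" in each library. The sketch below encodes those definitions in Python. The per-library test, a two-fold ratio on ppm values with a pseudocount of 1 ppm, is a hypothetical stand-in, since the talk does not state the criterion it used; `direction` and `asymmetry_class` are illustrative names.

```python
from typing import Sequence


def direction(x: float, y: float, min_fold: float = 2.0, pseudo: float = 1.0) -> int:
    """Hypothetical per-library test on ppm values.

    Returns +1 if copy x is higher, -1 if copy y is higher, 0 if they do not differ.
    The pseudocount avoids division by zero for an unexpressed copy.
    """
    ratio = (x + pseudo) / (y + pseudo)
    if ratio >= min_fold:
        return 1
    if ratio <= 1.0 / min_fold:
        return -1
    return 0


def asymmetry_class(copy1: Sequence[float], copy2: Sequence[float],
                    min_fold: float = 2.0) -> str:
    """Assign a duplicate pair to category A, B, C or D from per-library ppm values."""
    if len(copy1) != len(copy2):
        raise ValueError("both copies need one value per library")
    dirs = [direction(x, y, min_fold) for x, y in zip(copy1, copy2)]
    differing = [d for d in dirs if d != 0]
    if not differing:
        return "D"  # no library differs
    if len(differing) == 1:
        return "C"  # exactly one library differs
    if 1 in differing and -1 in differing:
        return "A"  # each copy is higher in at least one library
    return "B"      # one copy higher in every differing library (two or more)


# Hypothetical ppm values for five libraries (root, shoot, flower, callus, silique).
print(asymmetry_class([40, 12, 3, 8, 20], [5, 30, 3, 8, 20]))   # A
print(asymmetry_class([40, 30, 3, 8, 20], [5, 6, 3, 8, 20]))    # B
print(asymmetry_class([40, 12, 3, 8, 20], [5, 12, 3, 8, 20]))   # C
print(asymmetry_class([40, 12, 3, 8, 20], [38, 12, 3, 8, 20]))  # D
```

The order of the checks matters: a pair in which each copy is higher somewhere necessarily differs in at least two libraries, so C is tested before A and B.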