RISE AND FALL OF GENE FAMILIES Dynamics of Their Expansion
Evolution of Plant Stress Responsiveness: Genome-wide and Gene Family Level Analysis
Shin-Han Shiu
Department of Plant Biology
KBS, 1/18, 2008

Outline
  Major interests and why
  Gene families and stress responsiveness: the interplay between gene family expansion, duplication mechanism, and the elusive selection pressure
  The Receptor Kinase family as an example: one of the biggest plant gene families, and its involvement in plant biotic interactions
  If there is enough time, the short story on plant pseudogenes: when can you call a gene a pseudogene?

Major interests
  Molecular evolutionary patterns
  Source of selection pressure: abiotic and biotic stress conditions
  Target of selection: duplicate genes
  Genetic basis of adaptation

Where do all these duplicates come from?
  Whole genome duplication
  Tandem duplication
  Segmental duplication
  Replicative transposition

Plasticity of plant gene contents
  3 whole genome duplications in the Arabidopsis thaliana lineage over the past ~150 million years
  Expected gene content: 15,000* -> 30,000 -> 60,000 -> 120,000
  Observed Arabidopsis gene content: 21,000**
  More "recent" retentions in plants
  *: Number of orthologous groups in shared families between Arabidopsis and rice.
  **: Number of genes in shared families.
  Shiu et al. (2006) PNAS

Plant Gene Family Evolution: Major questions
  What is the rate of gene gains in plants?
  Do certain types of genes have a higher gain rate?
  What is the influence of duplication mechanisms?
  Finally, how do genes that are responsive to stresses behave?
    AtGenExpress microarray dataset
    22 stress conditions

Measuring Lineage-specific Gain
  Orthologous group and lineage-specific gain
  Reconcile species and gene trees

Retention rate along the A. thaliana lineage
  Diminishing rate of retention over time
  [Figure: species tree with tips M, R, P, A and branches labeled 1, 2, 3]

  Branch   Retained (R)   Rate (R/My)
  1        1521-1576      3.0-3.2
  2        734-1479       9.2-18.5
  3        5774-6995      48.1-58.3

Expansion at the gene family level
  Lineage-specific gains per family in one plant lineage are moderately correlated with gains in the other lineage.
  E.g. a family with 3 OGs
  [Figure: example gene tree showing Moss and At gene counts per OG]
  [Figure: scatter plot of P. patens lineage-specific gains (y) against A. thaliana lineage-specific gains (x) per family; fit y = 0.32x, r^2 = 0.33; labeled families: LRR, Protein kinase, Kinesin, ABC trans, AP2, Mito_carr, UDPGT, PPR, P450, NB-ARC, C1]
  Rensing et al. (2008) Science

Expansion at the orthologous group level
  [Figure: log(freq) against log(OG size)]

Two major patterns in OG expansion
  Convergent expansion
  Single lineage expansion
  [Figure: three heatmaps comparing Arabidopsis thaliana (x axis, 0 to >6) with Moss, Rice and Poplar (y axis, 0 to >6); color shows enrichment, $\log_2(N_{Obs}/N_{Exp})$, on a scale from −0.7 to 0.7]

Expansion patterns and duplication mechanisms
  Comparison of ratios between tandem and non-tandem genes
  E.g. for A-M orthology:

  Expansion pattern   Tandem   Non-tandem   Ratio
  Convergent          756      4500         0.17
  Single-lineage      848      2918         0.30

  Ratio of tandem to non-tandem genes, by OG type and method for defining OGs:

  OG type   Method       Convergent            Single-lineage      P value
  A-M       Similarity   0.17 (756/4500)    <  0.30 (848/2918)     2.2×10^-23
  A-M       Tree         0.16 (831/5297)    <  0.40 (1443/3566)    9.4×10^-88
  A-R       Similarity   0.31 (959/3115)    <  0.47 (644/1375)     3.1×10^-12
  A-R       Tree         0.27 (844/3073)    <  0.50 (1631/3294)    2.3×10^-33
  A-P       Similarity   0.29 (1141/3944)   <  0.60 (741/1234)     7.2×10^-38
  A-P       Tree         0.26 (1014/3930)   <  0.64 (1578/2452)    1.0×10^-83

Summary I
  Duplicate gene turnover: even though some duplicates are retained for millions of years, the majority of them will be lost over a time scale of hundreds of millions of years.
  The degree of lineage-specific expansion is similar at the family level, but with substantial variation
  Expansion patterns fall into two major categories
    Convergent expansion
    Single lineage expansion
  Orthologous groups with single lineage expansion tend to be enriched in tandemly repeated genes

What's so special about tandem genes?
  Duplication rate (events per unit time)
    Whole genome duplication: 1 event / ~50 million years
    Tandem duplication: multiple events / generation
  Rate of recombination
    Recombination rate: pathogen attack > control. Lucht et al., 2002. Nature.
    Recombination rate: tandem > non-tandem. Zhang & Gaut, 2003. Genome Res.

Gene family expansion and functional bias
  Question: What types of genes tend to experience expansion? What is the influence of duplication mechanism?
  Classification of genes
    In OGs without expansion
    In OGs with expansion
  Gene Ontology, a controlled vocabulary describing:
    Gene functions, e.g. protein kinase: attaches phosphates onto itself or other proteins, serving as a molecular switch.
    Biological processes involved, e.g. serine/threonine phosphorylation: the process of attaching phosphate onto the amino acids Ser or Thr.
    Location within the cell, e.g. plasma membrane

Functional bias of gene retention
  Stress response categories are over-represented in the vascular plant lineage

Cellular component categories: T vs. NT
  Tandem: extracellular region, cell surface, endomembrane
  Non-tandem: cytosol, cytoskeleton, nucleus

Biological process categories: T vs. NT
  Tandem: kinases, glucosinolate transferase, toxin responses
  Non-tandem: regulation & hormone metabolism
  [Figure: GO biological process categories compared between tandem and non-tandem genes, spanning metabolism, regulation, transport, localization, cell communication, signal transduction, and responses to stimuli, stress, hormones, toxins and other organisms]

Biological process categories: T vs. NT (contd.)
  Tandem: response to stimuli, various transport functions
  Non-tandem: cell-cell communication and hormone response
  [Figure: GO biological process categories compared between tandem and non-tandem genes, continued]

Stress responsiveness
  Expression data set: Arabidopsis thaliana under 22 abiotic and biotic stress conditions
  Definition of stress responsiveness, for a given gene:
    E_T: expression level under stress condition
    E_c: expression level under mock treatment control
    If E_T >> E_c: significant UP regulation
    If E_T << E_c: significant DOWN regulation
  Question: do stress responsive genes tend to be those that are gained throughout plant evolution?
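
The definition of stress responsiveness above can be sketched in a few lines of Python. This is only an illustration, not the analysis used in the talk: the slides do not say how "significant" was decided, so the log2 fold-change cutoff (`min_log2_fold`) and the function name `classify_response` are assumptions made for this example. A real analysis would more likely rely on a statistical test across replicates.

```python
import math


def classify_response(e_t, e_c, min_log2_fold=1.0):
    """Label a gene 'UP', 'DOWN' or 'NONE' for one stress condition.

    e_t: expression level under the stress condition
    e_c: expression level under the mock treatment control
    min_log2_fold: hypothetical cutoff standing in for "significant";
                   the source does not specify the actual criterion.
    """
    if e_t <= 0 or e_c <= 0:
        raise ValueError("expression levels must be positive")
    log2_fold = math.log2(e_t / e_c)
    if log2_fold >= min_log2_fold:
        return "UP"      # treatment expression well above control
    if log2_fold <= -min_log2_fold:
        return "DOWN"    # treatment expression well below control
    return "NONE"        # no call under this cutoff


# Made-up expression values for three genes under one condition.
examples = {"geneA": (400.0, 100.0), "geneB": (100.0, 400.0), "geneC": (120.0, 100.0)}
calls = {gene: classify_response(e_t, e_c) for gene, (e_t, e_c) in examples.items()}
print(calls)
```

With a call like this per gene and per condition, the question on the slide becomes a counting problem: compare the fraction of "UP" or "DOWN" genes inside expanded OGs with the fraction outside them.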
Expansion of responsive genes and conditions
  Genes in expanded OGs tend to be enriched in stress responsive genes
  [Table: up- and down-regulation under each stress condition for A-M, A-R and A-P OGs, each with an Exp and a T/N statistical test.
   Abiotic stress conditions: UV-B, Wounding, Cold 4C, Heat, Drought, Salt, Osmotic.
   Biotic stress conditions: AvrRpm1, DC3000, Flg22, GST-NPP1, HrcC-, HrpZ, P. infestans, Psph.
   +: significant at the 5% level]

Stress responsiveness and duplication mechanisms
  Enrichment of tandemly over non-tandemly expanded genes under biotic conditions
  [Table: same conditions and columns as on the previous slide.
   +: significant at the 5% level; T: tandem >> non-tandem; N: non-tandem >> tandem]

Tandem genes tend to be "bioticly" responsive
  This does not mean biotic responsive genes tend to be tandem
  Among GO molecular function categories that are enriched in genes responding to biotic stresses:
  [Figure: heatmap of GO molecular function categories (DNA binding, nucleic ac, transcription factor, transcription regulator, binding, ion binding, metal ion binding, transition metal ion binding, carbohydrate binding, oxidoreductase, transferase, glycosy, UDP-glycosy, kinase_activity) across the biotic conditions AvrRpm1, DC3000, Flg22, GST-NPP1, HrcC-, HrpZ, P. infestans and Psph; color separates Tandem >> non-tandem from Non-tandem >> tandem]

Summary II
  Over the course of plant evolution, retention rate: stress response genes >> genome average
    True for genes up-regulated in both biotic and abiotic stress conditions
  Influence of duplication mechanism, particularly for biotic stress conditions, retention rate: tandem >> non-tandem
  However, genes responsive to biotic stimuli are not necessarily tandem
    Depends on their location in the signaling network
    e.g. plant receptor kinases: biotic -> tandem
    e.g. transcription factors -> non-tandem, presumably WGD

Receptor Kinase
  Arabidopsis Transmembrane Kinase 1
  Shiu & Bleecker (2001) Science's STKE

Functional bias: the Receptor-Like Kinase family
  Shiu & Bleecker (2001) PNAS

The Kinase superfamily
  Family size differences imply differential expansion
  Kinase: >1000 in A. thaliana, >1600 in Oryza sativa
  RLK/Pelle: ~600 in At, ~1200 in Os
  Animal homologs: Drosophila: Pelle; mammalian: IRAKs
  Shiu et al. (2004) Plant Cell

Receptor kinase configuration
  [Figure: domain configurations: RLK (ECD + Kinase), RLCK (Kinase), Other Kinases]
  [Table: counts of RLK, RLCK and other kinases in Arabidopsis thaliana (A), Populus trichocarpa (P), Oryza sativa (O), Physcomitrella patens (M), Chlamydomonas reinhardtii and Ostreococcus tauri. Values in source order: 148 388 462 187 453 1003 159 376 911 73 256 356 2 424 93]
  Innovation: LysM, GDPD, Thaumatin, CHASE, DUF26, LRR, GH18

Functional bias: motivated by RLK studies
  Shiu et al., 2004 Plant Cell

Stress responsiveness of RLKs
  RLKs are more responsive to stress than genome average
  [Table: up- and down-regulation of RLKs, with RLK and T/N statistical tests, under abiotic stress conditions (UV-B, Wounding, Drought, Cold 4C, Heat, Salt, Osmotic) and biotic stress conditions (Flg22, GST-NPP1, HrcC-, P. infestans, Psph, HrpZ, AvrRpm1, DC3000); cells are marked O, U, T, N or na]

Stress responsiveness of RLKs
  Tandem RLKs are more responsive to biotic stress than non-tandem RLKs
  [Table: same conditions and columns as on the previous slide]

Stress responsiveness and tandem RLKs
  Responsiveness (R) of an RLK subfamily
  For subfamilies with ≥ 10 genes
  i: subfamily
  j: condition
  UP: # of up-regulated genes
  DN: # of down-regulated genes
  $R_{ij} = UP_j / N_i$ or $R_{ij} = DN_j / N_i$

The "RLK swarm" model
  In the context of biotic stress signaling networks
  [Figure: network diagram with regions labeled T > NT, NT > T, NT > T, T > NT]

Summary III
  Innovation in the RLK/Pelle family
    Most RK configurations were established > 700 million years ago.
    Plenty of evidence of domain shuffling, but the rate is not high.
    Shuffled domains suggest involvement in biotic stress perception.
  History of expansion
    4 major turnover patterns
    Substantially more recent gains in poplar and rice
    Mostly involved subfamilies with lots of tandem repeats
  Stress responsiveness
    RLK > genome average
    Tandem > non-tandem
    Biotic > abiotic
    Stress responsive genes are not necessarily tandem

Plant pseudogenes
  Pseudogenes are genomic DNA sequences similar to normal genes but nonfunctional
  For protein coding genes, non-functional to many means:
    They have frameshift mutations or premature stop codons
    They are not transcribed into mRNA
    They exhibit signatures of neutral selection

Pseudogene numbers and family size
  Gene family size is generally correlated with the number of pseudogenes in the family in question.
  [Figure: scatter plot of number of pseudogenes (y, 0 to 120) against domain family size (x, 0 to 800); labeled families: Ank, Pkinase, Pkinase_tyr, PPR, NB-ARC, LRR_1, P450, Myb_DNA_binding, zf-C3H4, LRRNT-2, RRM1, F-box]

  Family size (S)   Slope    Spearman's rank (ρ)   p-value
  Overall           0.1247   0.5484                <2.2e-16
  S < 10            0.0967   0.3000                <2.2e-16
  10 ≤ S < 25       0.1259   0.2291                3.94e-5
  25 ≤ S < 50       0.1950   0.2307                0.0209
  50 ≤ S < 100      0.1650   0.3317                0.0152
  S > 100           0.1177   0.6042                3.20e-4

Selection pressure on pseudogenes
  Pseudogenes still show signatures of purifying selection

Determine pseudogene expression
  Tiling microarray
    Covers the whole genome, regardless of the annotation
    Can distinguish sense and antisense transcripts
  [Figure: transcript array compared with tiling array over exon, UTR, intron, cis-regulatory elements, novel genes, MAR (matrix attachment regions)]

Selection pressure on pseudogenes
  Pseudogenes still show signatures of purifying selection
  [Figure: panels A. Arabidopsis and B. Rice]

Summary IV
  Relationships between gene family sizes and the numbers of pseudogenes
    Positively correlated
    Larger gene families tend to lose genes more frequently than smaller families
  Pseudogenes still show signatures of purifying selection
    Most likely because the pseudogenization events occurred relatively recently
  Pseudogenes are still expressed
    Significantly higher than intron antisense expression
    In rice, pseudogene expression is even as high as that among presumably functional genes

Acknowledgement
  Lab members: Melissa Lehti-Shiu, Gaurav Moghe, Cheng Zou
  Past member: Kosuke Hanada, RIKEN
  Collaborators: Jeff Conner; Gregg Howe, PRL; Rong Jin, CSE; Doug Schemske; Mike Thomashow, PRL
  Funding:

Takk!
http://blog.riflegear.com/