Alternative splicing: A playground of evolution
Mikhail Gelfand
Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS, Moscow, Russia
October 2006

% of alternatively spliced human and mouse genes by year of publication
[Figure legend: Human (genome / random sample); Human (individual chromosomes); Mouse (genome / random sample); All genes; Only multiexon genes; Genes with high EST coverage]

Plan
• Evolution of alternative exon-intron structure
  – mammals: human, mouse, dog
  – dipteran insects: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
• Evolutionary rate in constitutive and alternative regions
  – human / mouse
  – D. melanogaster / D. pseudoobscura
  – human-chimpanzee / human SNPs

Elementary alternatives
• Cassette exon
• Alternative donor site
• Alternative acceptor site
• Retained intron

Alternative exon-intron structure in the human, mouse and dog genomes
• EDAS: a database of human alternative splicing (human genome + GenBank + EST data from RefSeq)
  – consider cassette exons and alternative splicing sites
  – functionality: potentially translated vs. NMD-inducing elementary alternatives
• Human-mouse-dog triples of orthologous genes
• We follow the fate of human alternative sites and exons in the mouse and dog genomes
• Each human AS isoform is splice-aligned to the mouse and dog genomes.
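The four elementary alternatives listed above can be told apart from exon coordinates alone. The following is a minimal sketch of that idea, not the EDAS procedure: the function name `classify_exon`, the half-open plus-strand coordinates, and the labels "shared exon" and "other" are all assumptions made for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # assumed convention: half-open [start, end), plus strand


def classify_exon(exon: Exon, reference: List[Exon]) -> str:
    """Classify one exon of an alternative isoform against a reference isoform.

    `reference` must be sorted by start coordinate and non-overlapping.
    On the plus strand the donor site is the exon's end and the
    acceptor site is the exon's start.
    """
    start, end = exon
    overlapping = [r for r in reference if r[0] < end and start < r[1]]

    if not overlapping:
        # No overlap: a cassette exon must lie inside the span of the
        # reference isoform, i.e. within one of its introns.
        inside = bool(reference) and reference[0][1] <= start and end <= reference[-1][0]
        return "cassette exon" if inside else "other"

    if len(overlapping) >= 2:
        # The exon covers at least one reference intron completely.
        if start == overlapping[0][0] and end == overlapping[-1][1]:
            return "retained intron"
        return "other"

    r_start, r_end = overlapping[0]
    if (start, end) == (r_start, r_end):
        return "shared exon"
    if start == r_start:
        return "alternative donor site"      # same acceptor, different donor
    if end == r_end:
        return "alternative acceptor site"   # same donor, different acceptor
    return "other"


if __name__ == "__main__":
    reference_isoform = [(100, 200), (300, 400), (500, 600)]
    for candidate in [(230, 260), (300, 420), (280, 400), (100, 400)]:
        print(candidate, "->", classify_exon(candidate, reference_isoform))
```

For a minus-strand gene the donor and acceptor labels would swap, so a real pipeline would need strand information in addition to the coordinates.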
Definition of conservation
  – conservation of the corresponding region (a homologous exon is actually present in the considered genome);
  – conservation of splicing sites (GT and AG)

Caveats
• we consider only the possibility of AS in mouse and dog: we do not require actual existence of the corresponding isoforms in known transcriptomes
• we do not consider situations where an alternative human exon (or site) is constitutive in mouse or dog
• of course, functionality assignments (translated / NMD-inducing) are not very reliable

[Figure panels: Translated cassette exons; NMD-inducing cassette exons. Label: constitutive]

Observations
• Predominantly included exons are highly conserved irrespective of function
• Predominantly skipped translated exons are more conserved than NMD-inducing ones
• Numerous lineage-specific losses
  – more in mouse than in dog
• Still, ~40% of skipped (<1% inclusion) exons are conserved in at least one lineage

Alternative donor and acceptor sites: same trends
• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)

Alternative exon-intron structure in fruit flies and the malarial mosquito
• Same procedure (AS data from FlyBase)
  – cassette exons, splicing sites
  – also mutually exclusive exons, retained introns
• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes
• Technically more difficult:
  – incomplete genomes
  – the quality of alignment with the Anopheles genome is lower
  – frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Conservation of coding segments
                                        constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura      97%                     75-80%
D. melanogaster – Anopheles gambiae     77%                     ~45%

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes
[Figure: vertical axis 0% to 100%; categories: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved]
• retained introns are the least conserved (are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes
[Figure: vertical axis 0% to 100%; categories: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon. Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved]
• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved

CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles
[Figure: exon-intron structures in Dme, Dps and in Aga]

CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles
[Figure: exon-intron structures in Dme, Dps and in Aga]

Evolutionary rate in constitutive and alternative regions
• Human and mouse orthologous genes
• Estimation of the dn/ds ratio: higher fraction of non-synonymous (changing amino acid) substitutions => weaker stabilizing (or stronger positive) selection

Concatenates of constitutive and alternative regions in all genes: different evolutionary rates
[Figure: two bar charts, dN/dS (axis 0.00 to 0.30) and amino-acid identity (axis 0.7 to 0.9). Columns (left-to-right) – (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end. Data labels in the source, not assignable to columns from the extracted text: dN/dS 0.301, 0.199, 0.176, 0.187; amino-acid identity 0.886, 0.874, 0.878, 0.807]
• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Lower amino-acid identity in alternative regions

Individual genes: the ratio of non-synonymous to synonymous substitution rates, dn/ds, tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)
[Figure: scatter plot; axes A (alternative) and C (constitutive), logarithmic scale with tick labels from 0.001 to 10]

Non-symmetrical histogram of dn/ds(const) – dn/ds(alt)
[Figure: histogram; horizontal axis dn/ds(const) – dn/ds(alt) from –1 to 1 in steps of 0.1; vertical axis number of genes, logarithmic scale up to 1000]
Black: shadow of the left half. In a larger fraction of genes dn/ds(const) < dn/ds(alt), especially for larger values.

The same effect is seen in N-terminal, internal, and C-terminal parts
[Figure: three histograms of the same kind for N-terminal (AN – C), internal (AI – C) and C-terminal (AC – C) alternative regions]

Drosophilas: less selection in alternative regions?
[Figure legend: More mutations in alt. regions; Similar level of mutations; More mutations in const. regions]
• In a majority of genes, both synonymous and non-synonymous mutation rates are higher in alternative regions than in constitutive regions

Different behavior of N-terminal, internal and C-terminal alternatives
• N-terminal alternatives: most genes have a higher synonymous substitution rate in alternative regions; most genes have higher stabilizing selection in alternative regions
• Internal alternatives: intermediate situation
• C-terminal alternatives: more non-synonymous substitutions and fewer synonymous substitutions => lower stabilizing selection in alternative regions

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
• Human and chimpanzee genome mismatches vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher’s exact test for significance

          Pn/Ps (SNPs)   Dn/Ds (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37%
• Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Dn/Ds) ~ 25% of positions
• Similar results for all highly covered genes or all conserved exons

An attempt at integration
• AS is often genome-specific
• young AS isoforms are often minor and tissue-specific
• … but still functional
  – although unique isoforms may result from aberrant splicing
• AS regions show evidence for decreased negative selection
  – excess non-synonymous codon substitutions
• AS regions show evidence for positive selection
  – excess non-synonymous SNPs
• AS tends to shuffle domains and target functional sites in proteins
• Thus AS may serve as a testing ground for new functions without sacrificing old ones

What next?
• Multiple genomes
  – many Drosophila spp.
  – ENCODE data for many mammals
• Estimate not only the rate of loss, but also the rate of gain (as opposed to aberrant splicing)
• Control for:
  – functionality: translated / NMD-inducing
  – exon inclusion (or site choice) level: major / minor isoform
  – tissue specificity pattern (?)
  – type of alternative: N-terminal / internal / C-terminal
• Evolution of regulation of AS
• Splicing errors and mutations: retained introns, skipped exons, cryptic sites

Acknowledgements
• Discussions
  – Vsevolod Makeev (GosNIIGenetika)
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Dmitry Petrov (Stanford)
  – Dmitry Frishman (GSF, TUM)
  – Shamil Sunyaev (Harvard University Medical School)
• Data
  – King Jordan (NCBI)
• Support
  – Howard Hughes Medical Institute
  – INTAS
  – Russian Academy of Sciences (program “Molecular and Cellular Biology”)
  – Russian Fund of Basic Research

Authors
• Andrei Mironov (Moscow State University)
• Ramil Nurtdinov (Moscow State University) – human/mouse/dog
• Dmitry Malko (GosNIIGenetika) – drosophila/mosquito
• Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks
• Vasily Ramensky (Institute of Molecular Biology) – SNPs
• Irena Artamonova (GSF/MIPS) – human/mouse, plots
• Alexei Neverov (GosNIIGenetika) – functionality of isoforms
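
Worked example: the α estimate on the McDonald-Kreitman slide can be checked directly from the Pn/Ps and Dn/Ds ratios quoted there. The sketch below is only an arithmetic illustration; the function name `mk_alpha` and the dictionary layout are assumptions, and the underlying SNP and mismatch counts (needed for Fisher's exact test) are not given in the slides, so significance is not recomputed here.

```python
def mk_alpha(pn_ps: float, dn_ds: float) -> float:
    """Fraction of non-synonymous divergence attributed to positive selection.

    alpha = 1 - (Pn/Ps) / (Dn/Ds), where Pn/Ps is the polymorphism ratio
    (human SNPs) and Dn/Ds is the divergence ratio (human vs. chimpanzee).
    """
    if dn_ds <= 0:
        raise ValueError("Dn/Ds must be positive")
    return 1.0 - pn_ps / dn_ds


# Ratios as quoted on the slide: (Pn/Ps, Dn/Ds) per class of region.
SLIDE_RATIOS = {
    "constitutive": (0.72, 0.62),
    "major isoform": (0.78, 0.65),
    "minor isoform": (1.41, 1.89),
}

if __name__ == "__main__":
    for region, (pn_ps, dn_ds) in SLIDE_RATIOS.items():
        print(f"{region:14s} alpha = {mk_alpha(pn_ps, dn_ds):+.2f}")
```

Only the minor isoform alternative regions give a positive α (about 0.25, matching the ~25% on the slide); constitutive and major isoform regions give negative values, which is the pattern expected when Pn/Ps exceeds Dn/Ds.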