Alternative splicing: A playground of evolution
Mikhail Gelfand
Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS, Moscow, Russia

[Figure: % of alternatively spliced human and mouse genes by year of publication. Legend: Human (genome / random sample); Human (individual chromosomes); Mouse (genome / random sample); All genes; Only multiexon genes; Genes with high EST coverage]

Plan
• Evolution of alternative exon-intron structure
  – mammals: human, mouse, dog
  – dipteran insects: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
• Evolutionary rate in constitutive and alternative regions
  – human / mouse
  – D. melanogaster / D. pseudoobscura
  – human-chimpanzee / human SNPs
• Functional consequences of alternative splicing: what does it do to proteins?

Alternative exon-intron structure in fruit flies and the malarial mosquito
• Same procedure (AS data from FlyBase)
  – cassette exons, splicing sites
  – also mutually exclusive exons, retained introns
• Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes
• Technically more difficult:
  – incomplete genomes
  – the quality of alignment with the Anopheles genome is lower
  – frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Conservation of coding segments

                                        constitutive segments   alternative segments
  D. melanogaster – D. pseudoobscura    97%                     75-80%
  D. melanogaster – Anopheles gambiae   77%                     ~45%

Observations
• Alternative splicing is less conserved than constitutive splicing
• D. melanogaster – D. pseudoobscura
  – retained introns are the least conserved (are all of them really functional?)
  – mutually exclusive exons are as conserved as constitutive exons
• D. melanogaster – Anopheles gambiae
  – mutually exclusive exons are conserved exactly (no intron insertions; would they disrupt regulation?)
  – cassette exons are the least conserved

[Figure: two bar charts of conservation (0-100%) by segment type: constant exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
• Human and chimpanzee genome mismatches vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher's exact test for significance

            Pn/Ps (SNPs)   Dn/Ds (genomes)   diff.    Signif.
  Const.    0.72           0.62              -0.10    0
  Major     0.78           0.65              -0.13    0.5%
  Minor     1.41           1.89              +0.48    0.1%

Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37%
• Positive selection (as opposed to weaker stabilizing selection): α = 1 – (Pn/Ps) / (Dn/Ds) ~ 25% of positions
• Similar results for all highly covered genes or all conserved exons

Alternative splicing avoids disrupting domains (and non-domain units)
[Figure a): stacked bars, Expected vs. Observed. Categories: no annotated unit affected; domains completely; domains partially; non-domain functional units completely; non-domain functional units partially]
• Data: SwissProt, PROFAM, PROSITE
• Control: fix the domain structure; randomly place alternative regions

Positive selection towards domain shuffling (not simply avoidance of disrupting domains by occurring between domains)
[Figure b): Expected vs. Observed. Categories: no annotated units affected; non-domain units completely; domains completely]

Short (<50 aa) alternative splicing events within domains target protein functional sites
[Figure c): Expected vs. Observed. Categories: FT positions affected / unaffected; Prosite patterns affected / unaffected]

An attempt at integration
• AS is often genome-specific
  – alternative exons and sites are less conserved (more often lost or gained) than constitutive ones
• … but still functional
  – even NMD-inducing isoforms are conserved in at least one lineage
  – … especially those supported by multiple ESTs
• AS regions show evidence for decreased negative (stabilizing) selection
  – excess non-synonymous codon substitutions
• AS regions show evidence for positive (diversifying) selection
  – excess non-synonymous SNPs
• AS tends to shuffle domains and target functional sites in proteins
• Thus AS may serve as a testing ground for new functions without sacrificing old ones

Acknowledgements
• Authors
  – Andrei Mironov (Moscow State University)
  – Ramil Nurtdinov (Moscow State University): human/mouse/dog
  – Dmitry Malko (GosNIIGenetika): drosophila/mosquito
  – Ekaterina Ermakova (Moscow State University, IITP): Kn/Ks
  – Vasily Ramensky (Institute of Molecular Biology): SNPs
  – Irena Artamonova (GSF/MIPS): human/mouse, plots
  – Alexei Neverov (GosNIIGenetika): functionality of isoforms
• Discussions
  – Vsevolod Makeev (GosNIIGenetika)
  – Eugene Koonin (NCBI)
  – Igor Rogozin (NCBI)
  – Dmitry Petrov (Stanford)
  – Dmitry Frishman (GSF, TUM)
  – Shamil Sunyaev (Harvard University Medical School)
• Data
  – King Jordan (NCBI)
• Support
  – Howard Hughes Medical Institute
  – INTAS
  – Russian Academy of Sciences (program “Molecular and Cellular Biology”)
  – Russian Fund of Basic Research
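
Worked example: the McDonald-Kreitman estimate

The α value quoted on the McDonald-Kreitman slide can be checked directly from the Pn/Ps and Dn/Ds ratios reported there. The sketch below is only an illustration, not the authors' pipeline: the function names `alpha_from_ratios` and `fisher_exact_two_sided` are hypothetical, the ratios are copied from the slide, and the Fisher's exact test is a generic textbook two-sided version for a 2x2 table of counts (the slides do not give the underlying counts, so none are supplied here).

```python
from math import comb

# Ratios copied from the slide: (Pn/Ps from human SNPs, Dn/Ds from human-chimpanzee mismatches).
RATIOS = {
    "constitutive": (0.72, 0.62),
    "major": (0.78, 0.65),
    "minor": (1.41, 1.89),
}


def alpha_from_ratios(pn_ps: float, dn_ds: float) -> float:
    """Return alpha = 1 - (Pn/Ps) / (Dn/Ds)."""
    if dn_ds <= 0:
        raise ValueError("Dn/Ds must be positive")
    return 1.0 - pn_ps / dn_ds


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    For an MK-style test the table would be [[Dn, Ds], [Pn, Ps]] (counts).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tail = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, tail)


if __name__ == "__main__":
    for region, (pn_ps, dn_ds) in RATIOS.items():
        print(f"{region}: alpha = {alpha_from_ratios(pn_ps, dn_ds):.2f}")
```

With the slide's values for minor-isoform regions, 1 – 1.41/1.89 gives α of about 0.25, matching the ~25% stated above. For constitutive and major-isoform regions the same formula gives negative values, which is usually read as an excess of non-synonymous polymorphism rather than as evidence of positive selection.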