Alternative splicing:
A playground of evolution
Mikhail Gelfand
Research and Training Center for Bioinformatics
Institute for Information Transmission Problems RAS,
Moscow, Russia
October 2006
% of alternatively spliced human and mouse genes
by year of publication
[Figure legend. Data: human (genome / random sample); human (individual chromosomes); mouse (genome / random sample). Gene sets: all genes; only multiexon genes; genes with high EST coverage]
Plan
• Evolution of alternative exon-intron
structure
– mammals: human, mouse, dog
– dipteran insects: Drosophila melanogaster,
D. pseudoobscura, Anopheles gambiae
• Evolutionary rate in constitutive and
alternative regions
– human / mouse
– D. melanogaster / D. pseudoobscura
– human-chimpanzee / human SNPs
Elementary alternatives
• Cassette exon
• Alternative donor site
• Alternative acceptor site
• Retained intron
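The four elementary alternatives can be illustrated with a minimal sketch that compares the exons of one isoform against another. This is not the procedure used in the talk: the function name, the coordinate convention (inclusive, + strand) and the overlap rules are assumptions made only for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, + strand (assumed convention)


def classify_alternatives(iso_a: List[Exon], iso_b: List[Exon]) -> List[Tuple[str, Exon]]:
    """Label each exon of iso_a that differs from iso_b (hypothetical helper).

    The comparison is one-directional: only exons of iso_a are labelled.
    """
    events = []
    known = set(iso_b)
    for start, end in iso_a:
        if (start, end) in known:
            continue  # identical exon, nothing alternative
        overlapping = [(s, e) for s, e in iso_b if s <= end and start <= e]
        if not overlapping:
            # Exon missing from iso_b and flanked by iso_b exons on both sides
            upstream = any(e < start for _, e in iso_b)
            downstream = any(s > end for s, _ in iso_b)
            if upstream and downstream:
                events.append(("cassette exon", (start, end)))
        elif len(overlapping) >= 2:
            # One exon of iso_a spans several iso_b exons and the intron(s) between them
            events.append(("retained intron", (start, end)))
        else:
            s, e = overlapping[0]
            if s == start and e != end:
                # Same acceptor (exon start), different donor (exon end)
                events.append(("alternative donor site", (start, end)))
            elif e == end and s != start:
                # Same donor (exon end), different acceptor (exon start)
                events.append(("alternative acceptor site", (start, end)))
    return events


reference = [(100, 200), (500, 600)]
print(classify_alternatives([(100, 200), (300, 400), (500, 600)], reference))
print(classify_alternatives([(100, 250), (500, 600)], reference))
```

Events that change both ends of an exon are left unlabelled here; a real pipeline would need a rule for them.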
Alternative exon-intron structure in the
human, mouse and dog genomes
• EDAS: a database of human alternative splicing (human
genome + GenBank + EST data from RefSeq)
– consider cassette exons and alternative splicing sites
– functionality: potentially translated vs. NMD-inducing elementary
alternatives
• Human-mouse-dog triples of orthologous genes
• We follow the fate of human alternative sites and exons in the
mouse and dog genomes
• Each human AS isoform is spliced-aligned to the mouse and
dog genomes. Definition of conservation:
– conservation of the corresponding region (homologous exon is actually
present in the considered genome);
– conservation of splicing sites (GT and AG)
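The two-part definition of conservation above can be sketched as a simple check. It assumes the spliced alignment has already produced the coordinates of the homologous exon in the target genome (or nothing, if the region is absent); the function names and the 0-based, end-exclusive coordinate convention are illustrative assumptions, not part of EDAS.

```python
from typing import Optional, Tuple


def splice_sites_conserved(genome: str, exon_start: int, exon_end: int) -> bool:
    """Check for AG just upstream and GT just downstream of an exon.

    Coordinates are 0-based, end-exclusive, on the + strand (assumed convention).
    """
    if exon_start < 2 or exon_end + 2 > len(genome):
        return False  # not enough flanking sequence to inspect the sites
    acceptor = genome[exon_start - 2:exon_start].upper()  # last 2 nt of upstream intron
    donor = genome[exon_end:exon_end + 2].upper()          # first 2 nt of downstream intron
    return acceptor == "AG" and donor == "GT"


def exon_conserved(genome: str, aligned: Optional[Tuple[int, int]]) -> bool:
    """Both conditions: the homologous region exists and its splice sites are intact."""
    if aligned is None:  # the aligner found no homologous region
        return False
    return splice_sites_conserved(genome, *aligned)


target = "ccccAG" + "ATGGCC" + "GTaaaa"  # toy sequence: intron, exon, intron
print(exon_conserved(target, (6, 12)))
print(exon_conserved(target.replace("GTaaaa", "GCaaaa"), (6, 12)))
```

As the caveats in the talk note, a check of this kind only establishes that splicing is possible, not that the isoform exists in the other species.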
Caveats
• we consider only the possibility of AS in mouse and
dog: we do not require the actual existence of the
corresponding isoforms in known transcriptomes
• we do not consider situations where an alternative
human exon (or site) is constitutive in mouse or
dog
• of course, functionality assignments (translated /
NMD-inducing) are not very reliable
[Figure panels: translated cassette exons; NMD-inducing cassette exons; constitutive exons shown for comparison]
Observations
• Predominantly included exons are
highly conserved irrespective of
function
• Predominantly skipped translated
exons are more conserved than
NMD-inducing ones
• Numerous lineage-specific losses
– more in mouse than in dog
• Still, ~40% of skipped (<1%
inclusion) exons are conserved in
at least one lineage
Alternative donor and acceptor sites: same trends
• Higher conservation of ~uniformly used sites
• Internal sites are more conserved than external ones (as expected)
Alternative exon-intron structure in
fruit flies and the malarial mosquito
• Same procedure (AS data from FlyBase)
– cassette exons, splicing sites
– also mutually exclusive exons, retained introns
• Follow the fate of D. melanogaster exons in the D.
pseudoobscura and Anopheles genomes
• Technically more difficult:
– incomplete genomes
– the quality of alignment with the Anopheles genome is lower
– frequent intron insertion/loss (~4.7 introns per gene in
Drosophila vs. ~3.5 introns per gene in Anopheles)
Conservation of coding segments
                                      constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura    97%                     75-80%
D. melanogaster – Anopheles gambiae   77%                     ~45%
Conservation of D. melanogaster elementary
alternatives in D. pseudoobscura genes
[Figure: stacked bars, 0–100%, one bar each for constant (constitutive) exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]
Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved
• retained introns are the least conserved (are all of them really functional?)
• mutually exclusive exons are as conserved as constitutive exons
Conservation of D. melanogaster elementary
alternatives in Anopheles gambiae genes
[Figure: stacked bars, 0–100%, one bar each for constant (constitutive) exon, donor site, acceptor site, retained intron, cassette exon, exclusive exon]
Legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved
• ~30% joined, ~10% divided exons (fewer introns in Aga)
• mutually exclusive exons are conserved exactly
• cassette exons are the least conserved
CG1517: cassette exon in Drosophila,
alternative acceptor site in Anopheles
[Figure: exon-intron structure in Dme and Dps vs. Aga]
CG31536: cassette exon in Drosophila,
shorter cassette exon and alternative donor site
in Anopheles
[Figure: exon-intron structure in Dme and Dps vs. Aga]
Evolutionary rate in constitutive
and alternative regions
• Human and mouse orthologous genes
• Estimation of the dn/ds ratio:
higher fraction of non-synonymous
(changing amino acid) substitutions
=> weaker stabilizing (or stronger positive) selection
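A rough sketch of how a dn/ds ratio can be obtained once substitutions and sites have been counted. The slides do not say which estimator was used; the version below assumes the counts come from an upstream codon-counting step and applies the standard Jukes-Cantor correction to each class. The function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """dn/ds from pre-computed counts (assumed to be supplied by the caller)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous substitutions: ratio undefined")
    return dn / ds


# Hypothetical counts for one region, not data from the talk
ratio = dn_ds(nonsyn_diffs=10, nonsyn_sites=700, syn_diffs=30, syn_sites=300)
print(f"dn/ds = {ratio:.3f}")
```

A ratio well below 1 is read as stabilizing selection; values closer to 1, or above it, as weaker stabilizing or positive selection, which is the comparison made between constitutive and alternative regions.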
Concatenates of constitutive and alternative
regions in all genes: different evolutionary rates
[Figure: two bar charts, dN/dS and amino-acid identity. Columns (left to right): (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end]
Values shown on the bars (column assignment lost in extraction):
dN/dS: 0.301, 0.199, 0.176, 0.187
Amino-acid identity: 0.886, 0.874, 0.878, 0.807
• Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
• Lower amino acid identity in alternative regions
Individual genes: the ratio of non-synonymous to
synonymous substitution rates, dn/ds, tends to be larger
in alternative regions (vertical axis)
than in constitutive regions (horizontal axis)
[Figure: scatter plot of dn/ds in alternative (A) vs. constitutive (C) regions, logarithmic axes from 0.001 to 10]
Non-symmetrical histogram of
dn/ds(const)–dn/ds(alt)
[Figure: histograms of the number of genes (log scale, 1 to 1000) by dn/ds(const)–dn/ds(alt), binned from –1 to 1; one panel for all alternative regions and one each for N-terminal, internal and C-terminal parts]
Black: shadow of the left half.
In a larger fraction of genes dn/ds(const)<dn/ds(alt),
especially for larger values
The same effect is seen in N-terminal, internal and C-terminal parts
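One simple way to quantify the asymmetry of the dn/ds(const)–dn/ds(alt) histogram is a two-sided sign test on the per-gene differences. This is an illustrative assumption, not a test reported in the talk; the function name and the example values are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def sign_test_asymmetry(diffs: Iterable[float]) -> Tuple[int, int, float]:
    """Two-sided sign test on per-gene values of dn/ds(const) - dn/ds(alt).

    Returns (genes with a negative difference, genes with a positive one, p-value).
    Zero differences are ignored, as is usual for a sign test.
    """
    values = list(diffs)
    neg = sum(1 for d in values if d < 0)
    pos = sum(1 for d in values if d > 0)
    n = neg + pos
    if n == 0:
        raise ValueError("no non-zero differences")
    k = min(neg, pos)
    # Probability of a split at least this uneven under a fair coin, one tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return neg, pos, min(1.0, 2 * tail)


# Hypothetical differences for ten genes, not data from the talk
example = [-0.2] * 8 + [0.1] * 2
print(sign_test_asymmetry(example))
```

A sign test uses only the direction of each difference, so it does not capture the observation that the excess grows for larger differences; a signed-rank test would be one way to take magnitudes into account.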
Drosophila: weaker selection in alternative regions?
[Figure: genes grouped as: more mutations in alt. regions; similar level of mutations; more mutations in const. regions]
In a majority of genes, both synonymous and non-synonymous mutation rates are higher in
alternative regions than in constitutive regions
Different behavior of
N-terminal, internal and C-terminal alternatives
N-terminal alternatives: most genes have a higher synonymous substitution rate in alt. regions;
most genes have stronger stabilizing selection in alt. regions
Internal alternatives: intermediate situation
C-terminal alternatives: more non-synonymous substitutions and fewer
synonymous substitutions => weaker stabilizing selection in alternative regions
The McDonald-Kreitman test: evidence for positive
selection in (minor isoform) alternative regions
• Human and chimpanzee genome mismatches vs. human SNPs
• Exons conserved in mouse and/or dog
• Genes with at least 60 ESTs (median number)
• Fisher's exact test for significance
          Pn/Ps (SNPs)   Dn/Ds (genomes)   diff.    Signif.
Const.    0.72           0.62              –0.10    0
Major     0.78           0.65              –0.13    0.5%
Minor     1.41           1.89              +0.48    0.1%
Minor isoform alternative regions:
• More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
• More non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37%
• Positive selection (as opposed to weaker stabilizing selection):
α = 1 – (Pn/Ps) / (Dn/Ds) ≈ 25% of positions
• Similar results for all highly covered genes or all conserved exons
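The arithmetic behind the α estimate can be checked with a short sketch. The first function applies α = 1 – (Pn/Ps) / (Dn/Ds) to the ratios reported for minor-isoform regions; the second is a plain two-sided Fisher's exact test on a 2×2 table of counts. The slides give only ratios and percentages, so the counts in the Fisher example are hypothetical, as are the function names.

```python
from math import comb


def mk_alpha(pn_ps: float, dn_ds: float) -> float:
    """Fraction of non-synonymous divergence attributed to positive selection."""
    return 1 - pn_ps / dn_ds


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Ratios from the table for minor-isoform alternative regions
print(f"alpha = {mk_alpha(pn_ps=1.41, dn_ds=1.89):.3f}")

# Hypothetical counts (rows: divergence, polymorphism; columns: non-syn., syn.)
print(fisher_exact_two_sided(60, 40, 30, 50))
```

With the minor-isoform ratios the estimate comes out near 0.25, consistent with the roughly 25% quoted on the slide.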
An attempt at integration
• AS is often genome-specific
• young AS isoforms are often minor and tissue-specific
• … but still functional
– although unique isoforms may result from aberrant splicing
• AS regions show evidence for decreased negative
selection
– excess non-synonymous codon substitutions
• AS regions show evidence for positive selection
– excess non-synonymous SNPs
• AS tends to shuffle domains and target functional sites in
proteins
• Thus AS may serve as a testing ground for
new functions without sacrificing old ones
What next?
• Multiple genomes
– many Drosophila spp.
– ENCODE data for many mammals
• Estimate not only the rate of loss, but also the rate of gain
(as opposed to aberrant splicing)
• Control for:
– functionality: translated / NMD-inducing
– exon inclusion (or site choice) level: major / minor isoform
– tissue specificity pattern (?)
– type of alternative: N-terminal / internal / C-terminal
• Evolution of regulation of AS
• Splicing errors and mutations:
retained introns, skipped exons, cryptic sites
Acknowledgements
• Discussions
– Vsevolod Makeev (GosNIIGenetika)
– Eugene Koonin (NCBI)
– Igor Rogozin (NCBI)
– Dmitry Petrov (Stanford)
– Dmitry Frishman (GSF, TUM)
– Shamil Sunyaev (Harvard University Medical School)
• Data
– King Jordan (NCBI)
• Support
– Howard Hughes Medical Institute
– INTAS
– Russian Academy of Sciences
(program “Molecular and Cellular Biology”)
– Russian Fund of Basic Research
Authors
• Andrei Mironov (Moscow State University)
• Ramil Nurtdinov (Moscow State University)
– human/mouse/dog
• Dmitry Malko (GosNIIGenetika)
– drosophila/mosquito
• Ekaterina Ermakova (Moscow State University, IITP)
– Kn/Ks
• Vasily Ramensky (Institute of Molecular Biology)
– SNPs
• Irena Artamonova (GSF/MIPS)
– human/mouse, plots
• Alexei Neverov (GosNIIGenetika)
– functionality of isoforms