
Trans-splicing in Trypanosoma brucei: results from genome-wide experiments
Shai Carmi
Bar-Ilan University
Department of physics and the faculty of life sciences
February 2010
mRNA processing in T. brucei


Almost all genes have no promoters.
Gene expression is regulated by controlling splicing (?), mRNA stability, and translation.
[Figure: a polycistronic transcript (Gene1, Gene2, Gene3, Gene4) is processed by SL trans-splicing and polyadenylation into mature, polyadenylated (AAAA) transcripts, which are then translated.]
Itai Dov Tkacz
Splicing overview
SL: Spliced Leader RNA
See also: Liang et al., Euk. Cell (2003).
cis-splicing machinery and consensus
[Figure: mammalian cis-splicing machinery, showing the snRNPs and the 3’ splice-site region (label: 10-12 nts).]
Yeast conserved branch site: TACTAAC
Splicing regulation
SR proteins create ’bridges’ to stabilize the spliceosome.
[Figure labels: hnRNP, splicing enhancer, splicing silencer.]
In trypanosomes:
• U2F65 and U2F35 exist and do not interact.
• U2F65 interacts with SF1.
• Interacting SR proteins were identified.
• hnRNP proteins exist.
Open questions

3’ splice site recognition and selection.

Spatial organization of splicing factors:
protein-protein and protein-RNA interactions.

Splicing efficiency and gene expression regulation.

Detailed molecular mechanism of trans-splicing and spliceosome assembly, structure of the 5’ splice site, SL RNA biogenesis, and coupling to polyadenylation: not in this talk.
Past studies of splicing regulation

Clayton et al., Mol. Biochem. Parasit. (2005):
Calculated the statistical properties of the splice sites based on a couple of hundred ESTs.

Clayton et al., Mol. Cell. Biol. (1994); Ullu et al., Mol. Cell. Biol. (1998); Cross et al., Mol. Cell. Biol. (2005):
Used reporter gene systems with the splice sites of model genes (tubulin, actin, procyclin) to study the effect of splice site composition on splicing efficiency.
Limited applicability.
[Figure: reporter construct with the elements promoter, intron, 3’ splice-site (AG), 5’UTR, and reporter gene; the splice-site region is taken from an endogenous gene and mutated.]
Major known facts

Poly-adenylation is coupled to downstream trans-splicing.
[Figure: two reporter genes separated by an intergenic region; labels in order: reporter gene, 3’UTR, polyA-site, intergenic region, 3’ splice-site, 5’UTR, reporter gene.]
A hierarchy of trans-splicing and polyA signals exists.
Specific sequences in the 5’UTR (exon) are required for splicing.
Optimal PPT should be 25 nts long, U dominated but interspersed
with Cs, and have no two consecutive purines.
Optimal PPT-AG spacer should be 20-25 nts long, have U at
position -3 and never AC at [-3,-4].
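The rules listed above can be turned into a simple checklist. The following is a minimal sketch, not the method used in the cited studies. It assumes that "U dominated" means more than half of the bases are U, that position -3 is the base immediately upstream of the AG, and that "AC at [-3,-4]" means the spacer ends with the dinucleotide AC. The function names `ppt_checks` and `spacer_checks` are hypothetical.

```python
PURINES = set("AG")


def ppt_checks(ppt):
    """Check a polypyrimidine tract against the rules stated on the slide.

    Assumption: 'U dominated' is read as a U fraction above 0.5.
    """
    ppt = ppt.upper().replace("T", "U")
    u_fraction = ppt.count("U") / len(ppt) if ppt else 0.0
    consecutive_purines = any(
        a in PURINES and b in PURINES for a, b in zip(ppt, ppt[1:])
    )
    return {
        "length_is_25": len(ppt) == 25,
        "u_dominated": u_fraction > 0.5,
        "interspersed_with_c": "C" in ppt,
        "no_consecutive_purines": not consecutive_purines,
    }


def spacer_checks(spacer):
    """Check the PPT-AG spacer (5'->3', the AG itself excluded).

    Assumption: the last base of the spacer is position -3, and the
    forbidden 'AC' is the dinucleotide occupying positions -4 and -3.
    """
    s = spacer.upper().replace("T", "U")
    return {
        "length_20_to_25": 20 <= len(s) <= 25,
        "u_at_minus_3": s.endswith("U"),
        "no_ac_at_minus_4_minus_3": not s.endswith("AC"),
    }


# Usage with made-up sequences:
print(ppt_checks("UUUCU" * 5))
print(spacer_checks("GAUCU" * 4))
```

A dictionary of named checks is used here rather than a single score, because the slide gives qualitative rules and no weights for combining them.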
Research strategy: outline

Sequence all messenger RNAs to map transcript boundaries.

Silence splicing factors and measure the effect on each transcript.

Examine the splice site regions of regulated genes to infer possible roles for splicing factors and mechanisms of splicing regulation.
Methods: deep sequencing
[Figure from the Illumina guide.]
Deep sequencing of T. brucei mRNA


Experiment performed at Ullu and Tschudi’s lab, Yale University.
Library preparation:
SL-enriched library:
1. Total RNA
2. Terminator exonuclease treatment
3. First strand cDNA synthesis with random hexamer primers
4. Second strand cDNA synthesis with SL primer
Poly(A)-enriched library:
1. Poly(A)+ RNA selection
2. First strand cDNA synthesis with random hexamer or oligo(dT) primers
3. Second strand cDNA synthesis with RNaseH-derived RNA primers
Both libraries:
1. cDNA fragmentation and size selection
2. Addition of adapters and amplification
3. Illumina sequencing
15 million useful reads!
Ullu’s lab results






532 transcripts with misannotated start codon.
805 annotated genes not producing a transcript.
442 genes with alternative transcripts in their UTRs.
1,114 new transcripts, conserved coding and non-coding.
Trans-splicing and polyadenylation of snoRNA clusters.
The experimental method can be slightly modified to discover pol-II transcription initiation sites. These sites were found at strand-switch regions, in proximity to tRNA genes, and within transcription units.
Digital gene expression.
[Figure: histogram of number of genes versus relative abundance (log scale, 1 to 100000), annotated with bins of 0-1, 1-10, 10-100, and >100 mRNA molecules per cell and the label "75% of genes".]
Examples of reannotated features
Chr VIII
Correctly annotated gene cluster.
Blue- number of reads from SL-enriched library.
Red- number of reads from polyA-enriched library.
Chr X
A novel transcript.
Chr VII
A misannotated start codon.
Blue lines at the bottom denote SL reads.
Chr XI
An ORF which is part of a larger transcript.
Chr VII
A short transcript at the 3’end of a gene.
Red lines at the bottom denote polyA reads.
Examples were experimentally verified for all cases.
Statistics of UTR lengths
5’
median- 388
3’
median- 91
UTR length distribution is
approximately log-normal.
Splice-site composition
Non-AG splice-sites are due to sequencing errors and strain differences.
No G allowed at the -3 position.
PPT: maximum at about -25; distance from AG varies (unique to trypanosomes).
No signal observed in the exon.
Splice-site composition
[Figure: pyrimidine content around the AG and into the exon.]
Sites closer to the PPT are stronger.
PPT distributed along tens of nucleotides.
Purines favored in the exon.
Splice-site composition
AC is not preferred at positions [-3,-4] of the 3’ splice-site: splice sites with AC are less abundant.
Splicing heterogeneity
Uncertainty of splice-site usage.
$H = -\sum_i p_i \ln p_i$
[Figure: average distance (nts, log scale) of all weak splice sites from the strongest splice site, plotted against uncertainty.]

6967 genes: one major site
978 genes: two major sites
21 genes: three major sites
This is not alternative splicing in the usual sense: it leads to the same protein.
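The uncertainty H of splice-site usage has the form of a Shannon entropy. As a minimal sketch, assuming that p_i is the fraction of a gene's SL reads that map to splice site i (the slide does not spell out how p_i is estimated), it could be computed as follows. The function name `usage_entropy` and the read counts are hypothetical.

```python
import math


def usage_entropy(read_counts):
    """Shannon entropy (natural log) of splice-site usage for one gene.

    read_counts: number of reads supporting each splice site of the gene.
    Assumption: p_i is estimated as count_i / total count.
    """
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    h = 0.0
    for count in read_counts:
        if count > 0:  # sites with no reads contribute nothing
            p = count / total
            h -= p * math.log(p)
    return h


# Usage with made-up counts:
print(usage_entropy([100]))         # a single site: no uncertainty
print(usage_entropy([50, 50]))      # two equally used sites: ln 2
print(usage_entropy([90, 5, 5]))    # one major site plus two weak sites
```

A gene with one dominant site has H close to zero, while a gene with two equally used sites has H = ln 2, which matches the grouping into genes with one, two, or three major sites.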
Splicing heterogeneity illustrated
• Each row corresponds to one gene.
• Each site is denoted with a bar.
• Sites are centered around the strongest site.
• Bar color is according to relative usage.
[Color scale: relative usage of trans-splice sites.]
Downstream sites are more popular.
Some sites are found in frame.
[Figure: distribution of trans-splice sites by nt position relative to the START codon (ATG), from -300 to 300.]
Predicting splicing heterogeneity






What determines if a gene will be differentially spliced?
Look at 100 nts upstream and downstream of the strongest site.
Rank all potential splice sites: TAG: 3; AAG and CAG: 2; GAG: 1.
Heterogeneity rank of a gene = sum of ranks of all other AG dinucleotides / rank of strongest site.
Average heterogeneity rank is about 10 for high-uncertainty genes, but only about 7 for low-uncertainty genes (P = 10^-20).
Signatures do not look meaningful, but analysis shows that longer 5’UTRs, shorter PPTs, and longer PPT-AG distance also contribute significantly to heterogeneity.
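The heterogeneity rank defined above can be sketched in a few lines. This is an illustration under stated assumptions, not the original analysis code: each AG is ranked by the base immediately upstream of it (T: 3, A and C: 2, G: 1, reading the slide as giving AAG the same rank as CAG), and the window is 100 nts on each side of the strongest site. The names `ag_site_ranks` and `heterogeneity_rank` and the example sequence are hypothetical.

```python
# Rank of an AG site by the base at position -3 (assumed reading of the slide).
RANK = {"T": 3, "U": 3, "A": 2, "C": 2, "G": 1}


def ag_site_ranks(seq):
    """Map the index of the A of every AG dinucleotide to its rank."""
    seq = seq.upper()
    ranks = {}
    for i in range(1, len(seq) - 1):
        if seq[i:i + 2] == "AG" and seq[i - 1] in RANK:
            ranks[i] = RANK[seq[i - 1]]
    return ranks


def heterogeneity_rank(seq, strongest, flank=100):
    """Sum of ranks of all other AGs within the window / rank of strongest site.

    strongest: index of the A of the AG used by the strongest splice site.
    """
    ranks = ag_site_ranks(seq)
    if strongest not in ranks:
        raise ValueError("strongest site is not a ranked AG in seq")
    lo, hi = strongest - flank, strongest + flank
    others = sum(
        r for i, r in ranks.items() if i != strongest and lo <= i <= hi
    )
    return others / ranks[strongest]


# Usage with a made-up sequence: TAG at index 3, GAG at 8, AAG at 13.
example = "CCTAGCCGAGCCAAGCC"
print(ag_site_ranks(example))
print(heterogeneity_rank(example, strongest=3))
```

A gene with many well-ranked competing AGs near a weakly ranked main site gets a high value, which is the situation the slide associates with high uncertainty.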
What is heterogeneity good for?




Unclear at the moment. Such heterogeneity is not found in other organisms.
In cis-splicing, exon boundaries must be conserved to maintain an intact coding sequence. In trans-splicing, such evolutionary pressure does not exist.
However, trans-splicing heterogeneity was not observed in C. elegans.
It may reflect another level of complexity in gene expression regulation, as the degree of heterogeneity varies significantly throughout the genome.
Explaining abundance

A-rich exons are more abundant.
Splice-site ambiguity is anti-correlated with abundance.
Other correlations:
Genes with longer PPT and shorter 5’UTR are more abundant.
A possible model for splicing factors
organization?


U2F65 does not bind U2F35, so AG can be far from PPT.
Variable distance between AG and PPT allows regulation of the splicing efficiency by differential binding.
[Figure: model of the intergenic region with BP, PPT, AG, a competitor splice-site (AG), and an AC-rich 5’UTR. Labels: "Optimal"; distances 10-30, 0-80, 25, 25.]
Silencing methods: RNAi
Stem-loop construct
T7-opposing construct
Wang et al., JBC (2000).
Inducible by tetracycline.
Gene is silenced after 3 days.
Silencing methods: microarrays

Microarrays are chips on which thousands of DNA oligos are printed in an array. Each oligo represents a fragment of one gene.

Expression profiles of entire genomes are obtained in a single experiment.
[Image credit: Wikipedia]
Genome-wide observations
red-up, green-down.



Hundreds of genes are upregulated: an unprecedented phenomenon.
U2F65 and SF1 physically interact and thus have similar patterns.
Vazquez et al., Mol. Biochem
Parasitol. 164, 137 (2009).
Genome-wide correlations
Spearman correlation coefficient
          Prp43     SmD1      U2F35     Prp31     U2F65     SF1       U1        PTB1      PTB2      Tsr1      Tsr1IP    hnRNP_FH  Prp19
Prp43     1         0.278349  -0.13685  0.294357  0.240051  0.342593  0.149605  0.125257  0.130586  -0.02391  0.221945  0.204737  0.10404
SmD1      0.278349  1         0.044152  0.383218  0.333834  0.315953  -0.01695  0.230517  0.163068  0.041223  0.28852   0.494197  0.068624
U2F35     -0.13685  0.044152  1         -0.3023   0.435671  0.190754  0.378621  0.010175  0.264658  0.375165  0.500294  0.255059  0.088768
Prp31     0.294357  0.383218  -0.3023   1         0.217689  0.248819  0.017184  0.179219  -0.13272  0.024078  0.106424  -0.11128  0.126101
U2F65     0.240051  0.333834  0.435671  0.217689  1         0.698639  0.428154  0.071559  0.290715  0.394992  0.742415  0.366936  0.169848
SF1       0.342593  0.315953  0.190754  0.248819  0.698639  1         0.261155  0.175059  0.276896  0.056967  0.682194  0.344552  0.199872
U1        0.149605  -0.01695  0.378621  0.017184  0.428154  0.261155  1         0.007941  0.189195  0.312908  0.38916   0.174986  0.078526
PTB1      0.125257  0.230517  0.010175  0.179219  0.071559  0.175059  0.007941  1         0.254598  -0.11872  0.169165  0.024827  0.21833
PTB2      0.130586  0.163068  0.264658  -0.13272  0.290715  0.276896  0.189195  0.254598  1         0.178874  0.345913  0.37377   0.178053
Tsr1      -0.02391  0.041223  0.375165  0.024078  0.394992  0.056967  0.312908  -0.11872  0.178874  1         0.30302   0.147911  0.07821
Tsr1IP    0.221945  0.28852   0.500294  0.106424  0.742415  0.682194  0.38916   0.169165  0.345913  0.30302   1         0.348961  0.231646
hnRNP_FH  0.204737  0.494197  0.255059  -0.11128  0.366936  0.344552  0.174986  0.024827  0.37377   0.147911  0.348961  1         0.052611
Prp19     0.10404   0.068624  0.088768  0.126101  0.169848  0.199872  0.078526  0.21833   0.178053  0.07821   0.231646  0.052611  1


Potential protein-protein interactions should be biochemically verified.
Interactions may be indirect.
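For reference, a Spearman correlation coefficient is the Pearson correlation of the ranks of two profiles. The sketch below is a plain-Python illustration, assuming that each silenced factor is represented by a vector of per-gene fold changes (the slide does not state exactly which values were correlated). The function names and the example numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Usage with made-up log fold changes for five genes under two knockdowns:
factor_a = [1.2, -0.4, 0.3, 2.1, -1.0]
factor_b = [0.9, -0.1, 0.2, 1.5, -0.7]
print(spearman(factor_a, factor_b))
```

Rank correlation is a natural choice for noisy fold changes because it depends only on the ordering of genes, not on the magnitude of the changes.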
Processes affected by splicing defects
Upregulated: mostly ribosomal and translation-related proteins, peptidases, and chaperones.
10 candidates verified experimentally by RT-PCR.
Downregulated: mostly metabolic enzymes and transporters.
Downregulated genes


The sequence at the splice site of the genes most impacted by
silencing may indicate the role of the splicing factor.
Look at PPT length and distance to 3’ splice-site.
Genes with shorter PPT require SF1
P-value=0.001

Genes with longer PPT-AG distance require PTB1
P-value=0.004
Most results are negative (discuss reason later).
Sequence motifs

Using the DRIM tool of Yael Mandel-Gutfreund’s lab.
U2F65, Up: 5'UTR: AGGGT, TACAT, CCCCA; 3'UTR: TTAAG, GAAAA
SF1, Up: 5'UTR: TTGCT, CAACC, GGCAG, TAAGT, CTTTT, ACATA; 3'UTR: TAAGG, AAAAC, AGAGA, GGGGT, ACTCA, CTACC
U2F65, Down: 5'UTR: ACTTC, ATAAA; 3'UTR: TTTAG, AAGCG, TCAAT, GGTAA
SF1, Down: 5'UTR: None; 3'UTR: TGTCA, AATTT, GCGGG, CAAAA, TTAGT
U2F65, Both: 5'UTR: ACTCT; 3'UTR: AAGGG
SF1, Both: 5'UTR: None; 3'UTR: GCGGG, TAAGG
Hard to assess the significance of the motifs.
hnRNPF/H binding sites.
Surprisingly, no pyrimidine-rich motifs were identified.
Other tools are not suited for RNA motifs, or are intended for the human genome, and thus perform poorly.
Should check which elements are conserved.
Mechanisms of regulation


RNA level regulation can be mediated via two mechanisms:
1. mRNA stability.

The 3’UTR carries a specific sequence that causes stabilization or destabilization under given experimental conditions (silencing).
Demonstrated experimentally for a few upregulated genes.
Binding can be directly to the silenced splicing factor (U2F65, SF1, …). Splicing factors have been shown to bind mature mRNA in human cells (Carmo-Fonseca et al., 2006).
Alternatively, binding can be to some other factor which is affected by the silencing (secondary effect).
Binding can induce both up- and down-regulation of different genes, depending on the context (e.g., competing with stabilizing/destabilizing proteins).
Regulation might not be due to binding but to secondary structure.

2. Splicing defects.

The absence of a splicing factor might cause downregulation of genes for which it is required for splicing.
Such genes may have certain properties such as a weak splice site, long PPT-AG distance, short PPT, competition with other AGs, etc.






Discussion (problems)




Computational approaches are limited by low reproducibility of the
microarrays, noisy fold changes, and the very small number of
genes affected by more than one factor.
Genes with splicing defects are masked by many more genes which
are regulated by mRNA stability. It is unclear at the moment if there
is a significant number of genes regulated by splicing.
mRNA stability can be mediated by more than one factor (primary
and secondary effects).
Thus, a clean set of genes which undergo the same regulation is
hard to obtain.
Discussion (future plans)







Computational:
Deep-sequencing of Leishmania at Ullu’s lab may provide
information about conserved regulatory elements.
Secondary structure of 3’UTR will be explored.
Experimental:
Reporter gene system with the intergenic region of a model gene.
CLIP-seq (in vivo cross-linking and immunoprecipitation followed by deep sequencing) should yield RNA binding sites.
Examine splicing defects (accumulation of SL-RNA or Y-structure) of individual genes or genome-wide (co-silencing of the exosome).
Thank you for your attention!