Trans-splicing in Trypanosoma brucei: results from genome-wide experiments
Shai Carmi
Bar-Ilan University, Department of Physics and the Faculty of Life Sciences
February 2010

mRNA processing in T. brucei
• Almost all genes have no promoters.
• Gene expression is regulated by controlling splicing (?), mRNA stability, and translation.
[Figure: a polycistronic transcript (Gene1, Gene2, Gene3, Gene4) is processed by SL trans-splicing and polyadenylation into mature, polyadenylated transcripts, which are then translated.]
Itai Dov Tkacz

Splicing overview
SL: Spliced Leader RNA.
See also: Liang et al., Euk. Cell (2003).

cis-splicing machinery and consensus
[Figure: mammalian snRNPs; 3' splice-site; 10-12 nts; yeast conserved branch site: TACTAAC.]

Splicing regulation
[Figure: SR proteins create 'bridges' to stabilize the spliceosome; hnRNP; splicing enhancer; splicing silencer.]
In trypanosomes:
• U2F65 and U2F35 exist and do not interact.
• U2F65 interacts with SF1.
• Interacting SR proteins were identified.
• hnRNP proteins exist.

Open questions
• 3' splice-site recognition and selection.
• Spatial organization of splicing factors: protein-protein and protein-RNA interactions.
• Splicing efficiency and gene expression regulation.
Not in this talk: the detailed molecular mechanism of trans-splicing and spliceosome assembly, the structure of the 5' splice site, SL RNA biogenesis, and coupling to poly-adenylation.

Past studies of splicing regulation
• Clayton et al., Mol. Biochem. Parasit. (2005): calculated the statistical properties of the splice sites based on a couple of hundred ESTs.
• Clayton et al., Mol. Cell. Biol. (1994); Ullu et al., Mol. Cell. Biol. (1998); Cross et al., Mol. Cell. Biol. (2005): used reporter gene systems with the splice sites of model genes (tubulin, actin, procyclin) to study the effect of splice-site composition on splicing efficiency. Limited applicability.
[Figure: reporter construct with promoter, intron, 3' splice-site (AG), 5'UTR, and reporter gene. Labels: "taken from endogenous gene and mutated", "reporter gene".]

Major known facts
• Poly-adenylation is coupled to downstream trans-splicing.
[Figure: reporter gene, 3'UTR, polyA-site, intergenic region, 3' splice-site, 5'UTR, reporter gene.]
• A hierarchy of trans-splicing and polyA signals exists.
• Specific sequences in the 5'UTR (exon) are required for splicing.
• The optimal polypyrimidine tract (PPT) should be 25 nts long, U dominated but interspersed with Cs, and have no two consecutive purines.
• The optimal PPT-AG spacer should be 20-25 nts long, have U at position -3, and never AC at [-3,-4].

Research strategy: outline
• Sequence all messenger RNAs to map transcript boundaries.
• Silence splicing factors and measure the effect on each transcript.
• Examine the splice-site regions of regulated genes to infer possible roles for splicing factors and mechanisms of splicing regulation.

Methods: deep sequencing
[Figure: from the Illumina guide.]

Deep sequencing of T. brucei mRNA
Experiment performed at Ullu and Tschudi's lab, Yale University.
Library preparation, starting from total RNA:
• SL-enriched library: Terminator exonuclease treatment; first-strand cDNA synthesis with random hexamer primers; second-strand cDNA synthesis with SL primer.
• Poly(A)-enriched library: poly(A)+ RNA selection; first-strand cDNA synthesis with random hexamer or oligo(dT) primers; second-strand cDNA synthesis with RNaseH-derived RNA primers.
• Both libraries: cDNA fragmentation and size selection; addition of adapters and amplification; Illumina sequencing.
15 million useful reads!

Ullu's lab results
• 532 transcripts with a misannotated start codon.
• 805 annotated genes not producing a transcript.
• 442 genes with an alternative transcript in their UTRs.
• 1,114 new transcripts, conserved coding and non-coding.
• Trans-splicing and polyadenylation of snoRNA clusters.
• The experimental method can be slightly modified to discover pol-II transcription initiation sites. These sites were found at strand-switch regions, in proximity to tRNA genes, and within transcription units.

Digital gene expression
[Figure: mRNA molecules per cell, in bins 0-1, 1-10, 10-100, >100; label: "75% of genes".]
[Figure: histogram of number of genes versus relative abundance.]

Examples of reannotated features
• Chr VIII: a correctly annotated gene cluster. Blue: number of reads from the SL-enriched library. Red: number of reads from the polyA-enriched library.
• Chr X: a novel transcript.
• Chr VII: a misannotated start codon. Blue lines at the bottom denote SL reads.
• Chr XI: an ORF which is part of a larger transcript.
• Chr VII: a short transcript at the 3' end of a gene. Red lines at the bottom denote polyA reads.
Examples were experimentally verified for all cases.

Statistics of UTR lengths
• 5' median: 388.
• 3' median: 91.
• The UTR length distribution is approximately log-normal.

Splice-site composition
• Non-AG splice sites are due to sequencing errors and strain differences.
• No G allowed at the -3 position.
• PPT: maximum at about -25; the distance from the AG varies, which is unique to trypanosomes.
• No signal observed in the exon.

Splice-site composition: pyrimidine content
[Figure labels: AG, exon.]
• Sites closer to the PPT are stronger.
• PPT disturbed along tens of nucleotides.
• Purines favored in the exon.

Splice-site composition
• AC is not preferred at positions [-3,-4] of the 3' splice site: splice sites with AC are less abundant.

Splicing heterogeneity
Uncertainty of splice-site usage: $H = -\sum_i p_i \ln p_i$, where $p_i$ is the relative usage of splice site $i$ of a gene.
[Figure: average distance (nts, log scale) of all weak splice sites from the strongest splice site, versus uncertainty.]
• 6967 genes: one major site.
• 978 genes: two major sites.
• 21 genes: three major sites.
Not alternative splicing in the regular sense: it leads to the same protein.

Splicing heterogeneity illustrated
• Each row corresponds to one gene.
• Each site is denoted with a bar.
• Sites are centered around the strongest site.
• Bar color is according to relative usage.
[Figure: relative usage of trans-splice sites versus nt position relative to the START codon (ATG).]
• Downstream sites are more popular.
• Some sites are found in frame.

Predicting splicing heterogeneity
What determines if a gene will be differentially spliced?
• Look at 100 nts upstream and downstream of the strongest site.
• Rank all potential splice sites: TAG-3, AAG, CAG-2, GAG-1.
• Heterogeneity rank of a gene = sum of ranks of all other AG dinucleotides / rank of strongest site.
• The average heterogeneity rank is about 10 for high-uncertainty genes, but only about 7 for low-uncertainty genes (P = 10^-20).
• Signatures do not look meaningful, but analysis shows that longer 5'UTRs, shorter PPTs, and longer PPT-AG distances also contribute significantly to heterogeneity.

What is heterogeneity good for?
• Unclear at the moment.
• Such heterogeneity is not found in other organisms.
• In cis-splicing, exon boundaries must be conserved to maintain an intact coding sequence. In trans-splicing, such evolutionary pressure does not exist. However, trans-splicing heterogeneity was not observed in C. elegans.
• It may reflect another level of complexity in gene expression regulation, as the degree of heterogeneity varies significantly throughout the genome.

Explaining abundance
• A-rich exons are more abundant.
• Splice-site ambiguity is anti-correlated with abundance.
• Other correlations: genes with a longer PPT and a shorter 5'UTR are more abundant.

A possible model for splicing factors organization?
• U2F65 does not bind U2F35, so the AG can be far from the PPT.
• Variable distance between the AG and the PPT allows regulation of splicing efficiency by differential binding.
[Figure: intergenic region with a competitor splice-site (AG), BP, PPT, AG, and an AC-rich 5'UTR. Optimal values shown: 10-30, 0-80, 25, 25.]

Silencing methods: RNAi
• Stem-loop construct.
• T7-opposing construct.
Wang et al., JBC (2000).
• Inducible by tetracycline.
• Gene is silenced after 3 days.

Silencing methods: microarrays
• Microarrays are chips on which thousands of DNA oligos are printed in an array.
• Each oligo represents a fragment of one gene.
• Expression profiles of entire genomes are obtained in a single experiment.
(Source: Wikipedia.)

Genome-wide observations
Red: up; green: down.
• Hundreds of genes are upregulated: an unprecedented phenomenon.
• U2F65 and SF1 physically interact and thus have a similar pattern. Vazquez et al., Mol. Biochem. Parasitol. 164, 137 (2009).

Genome-wide correlations
Spearman correlation coefficient:

          Prp43     SmD1      U2F35     Prp31     U2F65     SF1       U1        PTB1      PTB2      Tsr1      Tsr1IP    hnRNP_FH  Prp19
Prp43     1         0.278349  -0.13685  0.294357  0.240051  0.342593  0.149605  0.125257  0.130586  -0.02391  0.221945  0.204737  0.10404
SmD1      0.278349  1         0.044152  0.383218  0.333834  0.315953  -0.01695  0.230517  0.163068  0.041223  0.28852   0.494197  0.068624
U2F35     -0.13685  0.044152  1         -0.3023   0.435671  0.190754  0.378621  0.010175  0.264658  0.375165  0.500294  0.255059  0.088768
Prp31     0.294357  0.383218  -0.3023   1         0.217689  0.248819  0.017184  0.179219  -0.13272  0.024078  0.106424  -0.11128  0.126101
U2F65     0.240051  0.333834  0.435671  0.217689  1         0.698639  0.428154  0.071559  0.290715  0.394992  0.742415  0.366936  0.169848
SF1       0.342593  0.315953  0.190754  0.248819  0.698639  1         0.261155  0.175059  0.276896  0.056967  0.682194  0.344552  0.199872
U1        0.149605  -0.01695  0.378621  0.017184  0.428154  0.261155  1         0.007941  0.189195  0.312908  0.38916   0.174986  0.078526
PTB1      0.125257  0.230517  0.010175  0.179219  0.071559  0.175059  0.007941  1         0.254598  -0.11872  0.169165  0.024827  0.21833
PTB2      0.130586  0.163068  0.264658  -0.13272  0.290715  0.276896  0.189195  0.254598  1         0.178874  0.345913  0.37377   0.178053
Tsr1      -0.02391  0.041223  0.375165  0.024078  0.394992  0.056967  0.312908  -0.11872  0.178874  1         0.30302   0.147911  0.07821
Tsr1IP    0.221945  0.28852   0.500294  0.106424  0.742415  0.682194  0.38916   0.169165  0.345913  0.30302   1         0.348961  0.231646
hnRNP_FH  0.204737  0.494197  0.255059  -0.11128  0.366936  0.344552  0.174986  0.024827  0.37377   0.147911  0.348961  1         0.052611
Prp19     0.10404   0.068624  0.088768  0.126101  0.169848  0.199872  0.078526  0.21833   0.178053  0.07821   0.231646  0.052611  1

• Potential protein-protein interactions should be biochemically verified.
• Interactions may be indirect.

Processes affected by splicing defects
• Upregulated: mostly ribosomal and translation-involved proteins, peptidases, and chaperones.
  10 candidates verified experimentally by RT-PCR.
• Downregulated: mostly metabolic enzymes and transporters.

Downregulated genes
• The sequence at the splice site of the genes most impacted by silencing may indicate the role of the splicing factor.
• Look at PPT length and distance to the 3' splice site.
• Genes with a shorter PPT require SF1 (P-value = 0.001).
• Genes with a longer PPT-AG distance require PTB1 (P-value = 0.004).
• Most results are negative (the reason is discussed later).

Sequence motifs
Using the DRIM tool of Yael Mandel-Gutfreund's lab.

Factor  Regulation  5'UTR motifs                         3'UTR motifs
U2F65   Up          AGGGT TACAT CCCCA                    TTAAG GAAAA
SF1     Up          TTGCT CAACC GGCAG TAAGT CTTTT ACATA  TAAGG AAAAC AGAGA GGGGT ACTCA CTACC
U2F65   Down        ACTTC ATAAA                          TTTAG AAGCG TCAAT GGTAA
SF1     Down        None                                 TGTCA AATTT GCGGG CAAAA TTAGT
U2F65   Both        ACTCT                                AAGGG
SF1     Both        None                                 GCGGG TAAGG

• Hard to assess the significance of the motifs.
• hnRNPF/H binding sites.
• Surprisingly, no pyrimidine-rich motifs were identified.
• Other tools are not suited for RNA motifs or are intended for the human genome and thus perform poorly.
• Should look at which elements are conserved.

Mechanisms of regulation
RNA-level regulation can be mediated via two mechanisms:
1. mRNA stability.
   • The 3'UTR carries a specific sequence that causes stabilization or destabilization under given experimental conditions (silencing). Demonstrated experimentally for a few upregulated genes.
   • Binding can be directly to the silenced splicing factor (U2F65, SF1, …). Splicing factors have been shown to bind mature mRNA in human cells (Carmo-Fonseca et al., 2006).
   • Alternatively, binding can be to some other factor which is affected by the silencing (secondary effect).
   • Binding can induce both up- and down-regulation of different genes, depending on the context (e.g., competing with stabilizing/destabilizing proteins).
   • Regulation might not be due to binding but to secondary structure.
2. Splicing defects.
   • The absence of a splicing factor might cause downregulation of genes that require it for splicing.
   • Such genes may have certain properties, such as a weak splice site, a long PPT-AG distance, a short PPT, or competition with other AGs.

Discussion (problems)
• Computational approaches are limited by the low reproducibility of the microarrays, noisy fold changes, and the very small number of genes affected by more than one factor.
• Genes with splicing defects are masked by many more genes which are regulated by mRNA stability. It is unclear at the moment whether a significant number of genes are regulated by splicing.
• mRNA stability can be mediated by more than one factor (primary and secondary effects). Thus, a clean set of genes which undergo the same regulation is hard to obtain.

Discussion (future plans)
Computational:
• Deep sequencing of Leishmania at Ullu's lab may provide information about conserved regulatory elements.
• The secondary structure of the 3'UTR will be explored.
Experimental:
• Reporter gene system with the intergenic region of a model gene.
• CLIP-seq (in vivo cross-linking and immunoprecipitation followed by deep sequencing) should yield RNA binding sites.
• Examine splicing defects (accumulation of SL RNA or Y-structure) of individual genes or genome-wide (co-silencing of the exosome).

Thank you for your attention!
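
Backup: splicing heterogeneity measures, sketched in code
The two quantities from the splicing heterogeneity slides, the uncertainty of splice-site usage (H) and the heterogeneity rank of a gene, can be illustrated with a minimal Python sketch. This is not the analysis code behind the talk. The function names, the input conventions (read counts per site; a 0-based index of the A in the strongest AG), and the choice to measure the 100-nt window from that A are assumptions made for illustration. The slide gives no rank for AAG, so the caller must supply `aag_rank` explicitly.

```python
import math


def usage_entropy(read_counts):
    """Uncertainty H = -sum_i p_i ln p_i over the splice sites of one gene.

    read_counts: number of reads supporting each site (assumed input format).
    """
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    entropy = 0.0
    for count in read_counts:
        if count > 0:  # sites with zero reads contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def heterogeneity_rank(seq, strongest, aag_rank, window=100):
    """Sum of ranks of all other AGs in the window / rank of the strongest site.

    seq:       genomic sequence around the gene's splice sites.
    strongest: 0-based index of the 'A' of the strongest AG (assumed convention).
    aag_rank:  rank for AAG; not given on the slide, so it has no default.
    window:    nts searched upstream and downstream of the strongest site.
    """
    seq = seq.upper().replace("U", "T")
    # Ranks from the slide: TAG-3, CAG-2, GAG-1; AAG is caller-supplied.
    rank_by_upstream_base = {"T": 3, "C": 2, "G": 1, "A": aag_rank}

    def site_rank(a_index):
        # Rank of the NAG whose 'A' sits at a_index; None if it is not an AG
        # or the upstream base is missing or ambiguous.
        if a_index < 1 or seq[a_index:a_index + 2] != "AG":
            return None
        return rank_by_upstream_base.get(seq[a_index - 1])

    strongest_rank = site_rank(strongest)
    if strongest_rank is None:
        raise ValueError("strongest must point at the 'A' of an NAG site")

    first = max(1, strongest - window)
    last = min(len(seq) - 2, strongest + window)
    other_ranks = 0
    for i in range(first, last + 1):
        if i == strongest:
            continue
        rank = site_rank(i)
        if rank is not None:
            other_ranks += rank
    return other_ranks / strongest_rank
```

For example, `usage_entropy([5, 5])` gives ln 2 for a gene whose reads split evenly between two sites, and `usage_entropy([10])` gives 0 for a gene with a single site. A higher `heterogeneity_rank` means more, or better-ranked, competing AG dinucleotides near the strongest site.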