RNA-RNA interaction
A biological crash course and introduction to prediction methods

Part I – Biological crash course
- Bacteria
  - Plasmid copy control
  - Post-segregational killing systems
  - trans-encoded chromosomal RNAs
- RNA interference (gene silencing)
- Translation regulation
  - C. elegans developmental regulation
  - miRNA-miRNA interactions
- Human telomerase

DNA vs. RNA
         Bases      #Strands   Structure
  DNA    A,C,G,T    2          Double helix
  RNA    A,C,G,U    1 or 2     Stem-loop, pseudoknots, etc.

Gene expression
- Central dogma of molecular biology
- Translation: mRNA -> protein via triplet code
- What happens if mRNA is destroyed or otherwise can’t be translated?

Bacteria backgrounder
- Single-celled organisms
- Prokaryotes = no nucleus
- Multi-cistronic transcripts -> multiple genes transcribed at one time, often with overlapping reading frames

Bacterial genetic information
- Bacterial chromosome (1)
  - Genome of organism
  - Required for life
- Plasmids (2)
  - Circular DNA molecules
  - Double-stranded
  - Independently self-replicating
  - Not required for life, often confer selective advantage such as antibiotic resistance

Plasmid replication
- (1), (2) – Genes encoded on plasmid
- (3) – Origin of Replication (ORI)

Plasmid copy control
- Recall independent self-replication
- Copy number fluctuations are unavoidable
  - Too many -> “runaway”, host dies
  - Too few -> increased risk of plasmid loss
- Problem: How to control copy count?
- Solution: negative feedback loop mediated by RNA-RNA interaction

R1 copy control
Genes:
- oriR1 – origin of replication
- repA – large amounts of its protein product are required for replication initiation
- tap – translation of its protein product is required for translation of the repA protein
- copA – product is an antisense RNA
- copB – product is a repressor protein (not covered here)

R1 copy control (2)
- copA – RNA with a stem-loop structure
- copT – target segment of the repA/tap mRNA; also forms a stem-loop structure
- Single loop-loop interaction

R1 copy control (3)
[Figure]

R1 copy control (4)
- copA RNA is unstable; it degrades
- If not enough plasmids are producing copA antisense RNA (copy number is too low), more repA protein can be produced
- Therefore the plasmid can replicate

Post-segregational killing systems
- Plasmid self-preservation mechanism
- A bacterial host that loses the plasmid dies
- The R1 plasmid hok/sok system is the prototype
- All such systems work similarly

R1 hok/sok system
- hok/sok locus encodes:
  - hok protein – “host killing”
  - Overlapping reading frame – mok – “modulator of killing”
  - sok RNA – “suppressor of killer”
- mok must be translated for hok to be expressed
- mok cannot be translated if sok is present

R1 hok/sok system (2)
- hok mRNA is extremely compact
  - Many stem-loop structures
  - Flush 5’ – 3’ pairing
  - Highly stable -> long half-life
  - Translationally inert
- mok segment is both:
  - Translationally active
  - Able to bind sok inhibitor RNA

R1 hok/sok system (3)
- sok RNA is highly unstable
- Bacteria with R1 produce lots of sok
  - sok binds mok, hok is not translated
- Bacteria which lose R1 have:
  - Lots of stable hok mRNA
  - Quickly degrading sok RNA (low stability)
  - No new sok RNA being produced
  - hok is translated -> bacterium dies

Bacterial chromosomes
- Plasmid antisense RNAs are generally cis-encoded
  - Implies complete Watson-Crick complementarity
- Bacterial chromosomes contain trans-encoded antisense RNAs
  - Not necessarily complete complementarity
  - Often stress-related control systems

oxyS/fhlA in E. coli
- oxyS – RNA transcript induced by stress
- fhlA – transcriptional activator; its mRNA is the target
- oxyS/fhlA complex binds via two loop-loop interactions

RNA interference (RNAi)
- a.k.a. post-transcriptional gene silencing
- Double-stranded RNAs (dsRNAs) are introduced into the cell
  - Complementary to the mRNA for a gene
  - Directly introduced in a wet lab, or
  - Produced by the cell itself

RNA interference (2)
- dsRNAs are cleaved into 21-23 nt segments (“small interfering RNAs”, or siRNAs) by an enzyme called Dicer

RNA interference (3)
- siRNAs are incorporated into the RNA-induced silencing complex (RISC)

RNA interference (4)
- Guided by base complementarity of the siRNA, the RISC targets mRNA for degradation

RNA interference – why?
- Studying gene function
  - Knock out or inhibit a gene’s normal function
  - Can the organism survive?
  - What phenotypic changes are observed?
- Therapeutic suppression
  - E.g. cancer treatment

micro RNA (miRNA)
- Gene expression regulation
- Created by a process similar to that for siRNA
- Generally prevents binding of the ribosome

Ex: C. elegans development
- lin-4 and let-7 antisense RNAs
- Regulate larval development in C. elegans
- [Figure: one of the two binding sites for the lin-41 and let-7 interaction]

Human telomerase
- Telomerase = ribonucleoprotein complex
  - Ribonucleo = contains ribonucleic acid (RNA)
  - Protein = contains a protein
- Responsible for maintaining telomere length in eukaryotic chromosomes
- Main components:
  - Telomerase reverse transcriptase
  - Human telomerase RNA (hTR)

Human telomerase (2)
- Reverse transcriptase
  - Transcribes RNA to DNA (rather than the usual DNA to RNA)
- Telomeres – repeated regions at the ends of eukaryotic chromosomes
- hTR is the template for the repeated region

Human telomerase (3)
- hTR 11-nt templating region consists of:
  - Repeat template: CUAACCC
  - Alignment domain: UAAC
- Positions telomerase on the DNA strand
- Provides template for repeat region

Human telomerase (4)
- Loop-loop interaction
  - Sometimes referred to as “kissing loops”
- Recall that all of the RNA-RNA interactions discussed so far (except RNAi) involve loop-loop interaction
- Predicting miRNA transcripts and targets involves loop structure prediction

References
- Couzin, J. (2002) “Breakthrough of the year – Small RNAs make big splash.” Science 298(5602):2296-2297.
- Lai, E.C., Wiel, C., and Rubin, G.M. (2004) “Complementary miRNA pairs suggest a regulatory role for miRNA:miRNA duplexes.” RNA 10(2):171-175.
- Moss, E.G. (2001) “RNA interference – It’s a small RNA world.” Current Biology 11(19):R772-R775.
- Sharp, P.A. (2001) “RNA interference – 2001.” Genes and Development 15(5):485-490.
- Shi, Y. (2003) “Mammalian RNAi for the masses.” TRENDS in Genetics 19(1):9-12.

References (2)
- Ueda, C.T., and Roberts, R.W. (2004) “Analysis of a long-range interaction between conserved domains of human telomerase RNA.” RNA 10(1):139-147.
- Wagner, E.G.H. and Flärdh, K. (2002) “Antisense RNAs everywhere?” TRENDS in Genetics 18(5):223-226.
- Wagner, E.G.H., Altuvia, S., and Romby, P. (2002) “Antisense RNAs in bacteria and their genetic elements.” Advances in Genetics 45:361-398.
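
Worked example for the hTR templating region in the human telomerase slides above: a minimal sketch of what "hTR is the template for the repeated region" means at the sequence level. It assumes the two segments listed on the slide are joined 5' to 3' in the order given (CUAACCC followed by UAAC); the helper name `reverse_transcribe` is hypothetical and introduced only for illustration. The human telomeric repeat TTAGGG is general background knowledge, not something stated on the slides.

```python
# Base-pairing rules for copying an RNA template into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template: str) -> str:
    """Return the DNA strand (written 5'->3') copied from an RNA template (5'->3').

    The new strand is antiparallel to the template, so the template is
    read in reverse while each base is complemented.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))


REPEAT_TEMPLATE = "CUAACCC"   # from the slide
ALIGNMENT_DOMAIN = "UAAC"     # from the slide

# Assumption: the 11-nt region is the two segments concatenated 5'->3'.
template = REPEAT_TEMPLATE + ALIGNMENT_DOMAIN

dna = reverse_transcribe(template)
print(dna)              # GTTAGGGTTAG
print("TTAGGG" in dna)  # True
```

Under that assumption, the copied DNA contains one full TTAGGG repeat, which is consistent with the slide's statement that the region both aligns telomerase on the DNA strand and templates the repeat.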
Part II – Prediction
- Identifying effective siRNAs
  - Neural network approach
- Identifying targets
  - Mammalian miRNA target prediction

Prediction of siRNAs
- The sequence properties that make an antisense RNA an effective gene inhibitor are not well understood
- Most computational models consider only:
  - RNA structure prediction
  - Motif searches

Neural net approach
- Training set: 490 known siRNA molecules
- Input parameters:
  - Base composition
  - mRNA:siRNA binding energy properties
  - 3’ and 5’ binding energy
  - Structure of siRNA (hairpin energy and quality)
- Target function: efficacy

Neural net approach (2)
[Figure]

Neural net results
- 14 inputs, 11 hidden units, 1 output
- Success rate of 92%
- Average prediction of 12 effective siRNAs per 1000 base pairs
  - Stringent (high specificity)
- Good for designing siRNAs for RNAi

Prediction of miRNA targets
- Mammals/vertebrates
  - Lots of known miRNAs
  - Mostly unknown target genes
- Initial method outline
  - Look at conserved miRNAs
  - Look for conserved target sites

micro RNAs in animals
- 0.5-1.0% of predicted genes encode miRNA
  - One of the more abundant regulatory classes
- Tissue-specific or developmental stage-specific expression
- High evolutionary conservation

micro RNAs in plants
- Finding targets in plants is relatively easy
  - Look for mRNA transcripts with near-perfect complementarity to known miRNAs
  - Signal-to-noise ratio exceeds 10:1 for Arabidopsis (model plant organism)
- Naïve approach in C. elegans and D. melanogaster?
  - No more hits than expected by random chance!
- So what can we use?
- Pairing to nucleotides 2-8 at the 5’ end of the miRNA

Target recognition
- Target regions enriched for genes involved in transcriptional regulation

Goals for algorithm
- Predict 100s of miRNA targets
- Estimate false-positive rates
- Provide computational and experimental evidence of authenticity
- Identify common functionality classes other than transcriptional regulator genes

TargetScan
- Algorithm developed by Lewis et al 2003
- Input:
  - miRNA that is known to be conserved across multiple organisms
  - Orthologous 3’ UTR sequences
  - Cut-off values for two parameters
  - Value for one free parameter
- Output:
  - Ranked list of candidate target genes

TargetScan (1)
- Search UTRs in one organism
  - Bases 2-8 from miRNA = “miRNA seed”
  - Perfect Watson-Crick complementarity
  - No wobble pairs (G-U)
  - 7-nt matches = “seed matches”

TargetScan (2)
- Extend seed matches
  - Allow G-U (wobble) pairs
  - Both directions
  - Stop at mismatches

TargetScan (3)
- Optimize base pairing
  - Remaining 3’ region of miRNA
  - 35 bases of UTR 5’ to each seed match
  - RNAfold program (Hofacker et al 1994)

TargetScan (4)
- Folding free energy (G) assigned to each putative miRNA:target interaction
  - Ignores initiation free energy
  - RNAeval (Hofacker et al 1994)

TargetScan (5)
- Z score for each UTR (no match -> Z = 1.0)

  $$Z = \sum_{k=1}^{n} e^{-G_k / T}$$

  - $n$ = number of seed matches in the UTR (may be more than one)
  - $G_k$ = free energy of the miRNA:target site interaction of the $k$th seed match
  - $T$ = parameter influencing the relative weighting of UTRs with few high-affinity target sites against UTRs with lots of low-affinity target sites (experimentally determined)

TargetScan (6)
- Order UTRs by Z score
- Assign rank to each UTR
- Repeat this process for each of the other organisms with UTR datasets

TargetScan (7)
- UTR $i$ is a predicted target if, for all organisms, $Z_i \ge Z_C$ and $R_i \le R_C$, where $Z_C$ and $R_C$ are the cut-off values for Z score and rank

Datasets
- nrMamm (mammalian – 79 sequences)
  - Homologs in human, mouse, and pufferfish
  - Identical between human and mouse, not necessarily pufferfish (fugu)
- nrVert (vertebrate – 55 sequences)
  - Identical between human, mouse, and fugu
- Non-redundant: if multiple miRNAs had the same seed, one representative was chosen

Sample program flow
[Figure]

Results for nrMamm
- nrMamm searched against human, mouse, and rat orthologous 3’ UTRs
- 451 miRNA:target interactions predicted for 400 unique genes
- Average 5.7 targets per miRNA
- Signal:noise ratio of 3.2:1

Results for nrVert
- Additional search against fugu UTRs
- Signal:noise ratio improves to 4.6:1
- Relaxed cut-off values
  - 115 predicted miRNA:target interactions for 107 unique genes
  - 2.1 putative targets per miRNA

Signal:noise ratio calculation
- Signal = number of predicted targets from nrMamm dataset
- Noise = number of predicted targets from randomly shuffled miRNAs
- Shuffled control sequences are screened to ensure preservation of relevant features: don’t underestimate the noise!

Screening control sequences
Features to consider:
- Expected frequency of seed matches
- Expected frequency of matching to 3’ end of miRNA (after seed extension)
- Observed count of seed matches in UTR datasets
- Predicted free energies for seed:match interactions

Signal:noise results
[Figure: bar chart of signal:noise results]
- Filled bars are for authentic miRNAs
- Open bars show the mean and standard deviation for shuffled sequences
- nrMamm set used for the first two, nrVert used for the set including fugu

Biological relevance
- Hypothesis: 5’ conservation of miRNAs is important for mRNA target recognition
  - Highest signal:noise ratio observed when the seed is positioned close to the 5’ end
- Hypothesis: highly conserved miRNAs are more involved in regulation
  - High degree of conservation -> more predicted targets
  - Membership in large miRNA family -> more predicted targets

Experimental verification
- 15 predicted target sites chosen
  - All with known biological function
  - Representative of the entire list of candidates
- 11 target sites confirmed
  - Expression of upstream ORF influenced
- 27% false positives: close correspondence to the predicted 30% false positives

References
- Chalk, A.M. and Sonnhammer, E.L.L. (2002) “Computational antisense oligo prediction with a neural network model.” Bioinformatics 18(12):1567-1575.
- Hofacker, I.L., Fontana, W., Stadler, P.F., Bonhoeffer, S., Tacker, M., and Schuster, P. (1994) “Fast folding and comparison of RNA secondary structures.” Monatshefte für Chemie 125:167-188.
- Lewis, B.P., Shih, I., Jones-Rhoades, M.W., and Bartel, D.P. (2003) “Prediction of mammalian microRNA targets.” Cell 115(7):787-798.
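
Sketch of the TargetScan scoring steps described in the slides above: a minimal Python illustration of the seed search in step (1), the Z score in step (5), and the cut-off test in step (7). It is not the authors' implementation. The function names, the toy miRNA and UTR sequences, the free-energy values, and the value of T are all hypothetical; in the method as described, the free energies come from RNAfold and RNAeval after seed extension and base-pairing optimization, which this sketch does not attempt.

```python
import math

# Watson-Crick pairs only; G-U wobble pairs are deliberately excluded,
# matching the seed-match rule in TargetScan step (1).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def mirna_seed(mirna: str) -> str:
    """Bases 2-8 of the miRNA (1-based, counted from the 5' end)."""
    return mirna[1:8]


def seed_match_site(mirna: str) -> str:
    """UTR sequence (5'->3') that pairs perfectly with the miRNA seed."""
    return "".join(COMPLEMENT[b] for b in reversed(mirna_seed(mirna)))


def find_seed_matches(utr: str, mirna: str) -> list[int]:
    """0-based start positions of every 7-nt seed match in the UTR."""
    site = seed_match_site(mirna)
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def z_score(free_energies: list[float], t: float) -> float:
    """Z = sum over seed matches of exp(-G_k / T); Z = 1.0 with no match."""
    if not free_energies:
        return 1.0
    return sum(math.exp(-g / t) for g in free_energies)


def is_predicted_target(z: float, rank: int, z_cut: float, rank_cut: int) -> bool:
    """Cut-off test from step (7), applied to one organism."""
    return z >= z_cut and rank <= rank_cut


# Toy data (hypothetical, not real miRNA or UTR sequences).
mirna = "AUGCCGUAAGCUUGGACUAACG"
utr = "AAAUACGGCAGGGCCCUACGGCAUUU"

positions = find_seed_matches(utr, mirna)
print(mirna_seed(mirna), seed_match_site(mirna), positions)

# Placeholder free energies, one per seed match; T is an assumed value.
energies = [-20.0, -15.0]
z = z_score(energies, t=10.0)
print(round(z, 3))
```

Because the free energies are negative for favorable interactions, each term exp(-G_k / T) grows with binding strength, and a larger T flattens the difference between one strong site and several weak ones. To apply step (7) across organisms, `is_predicted_target` would need to return true for the UTR's ortholog in every organism searched.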