Review: RECOMB Satellite Workshop on Regulatory Genomics (Held March 26-27, 2004)

Workshop Themes/Trends
• More comprehensive evaluations of motif-detection algorithms
• Making more effective use of comparative mapping/evolution data
• Models that explain rather than just describe
• Moving from binding motifs to entire regulatory modules
• Methods are simple, not sophisticated

Outline
• Jim Kadonaga, University of California, San Diego
  The MTE, a New Core Promoter Element for Transcription by RNA Polymerase II
• Rotem Sorek, Compugen and Tel Aviv University
  The "promoters" of splicing: Intronic sequences that regulate alternative splicing
• Yitzhak Pilpel, Weizmann Institute
  Revealing the architecture of genetic backup circuits through inspection of transcription regulatory networks
• Ron Shamir, Tel Aviv University
  Revealing selection patterns in the evolution of yeast transcription regulation
• Michael Eisen, Lawrence Berkeley National Lab
  Evolutionary Signatures of Regulatory Sequences

A New Core Promoter Element for Transcription by RNA Polymerase II (Jim Kadonaga)
The majority of transcription activity is regulated by sequence-specific DNA-binding factors, which are thus the focus of the bulk of current research on regulation. However, the ultimate target of all of this action is the core promoter, which also plays a part in regulation.

Core promoter
• Encompasses the transcription start site (TSS)
• Directs RNA polymerase II
• Most well-known component is the TATA box
Only about 30-40% of promoters contain a TATA box! What’s going on the rest of the time?

Finding Novel Promoter Elements
• Experimentally investigated binding in those promoters with no TATA box
  – found novel promoter element DPE
• Large-scale motif detection on 2000 core promoters in Drosophila (Ohler et al., 2002)
  – Plotted distance of top 10 motifs to TSS
    • four motifs had a clear peak: TATA, Inr, DPE and ...
    • a novel promoter element, MTE

The Core Promoter Gets a New Look
MTE: Motif Ten Promoter Element (Kadonaga, PowerPoint slides)

DPE and MTE: Two Newly Identified Promoter Elements
• Conserved from Drosophila to human (unknown whether they occur in yeast)
• Very sensitive to spacing relative to the Inr motif
  – TSS was determined experimentally (positions reported in papers are not reliable)
  – a single insertion/deletion between motifs causes a 7-fold reduction in transcription
• Inr and DPE (or MTE) are bound cooperatively by TFIID
  – first step in transcription initiation

TATA gets top billing but...
• In Drosophila (out of 205 core promoters)
  – TATA and DPE: 14%
  – TATA only: 29%
  – DPE only: 26%
  – Neither: 31%
• TATA, DPE, and MTE can all
  – independently support transcription
  – compensate for a mutation in one of the others

And finally... regulation.
• NC2 was previously known to repress TATA-dependent transcription; unexpectedly found to activate DPE-dependent transcription
• Studied 18 enhancers and estimate that about 25% exhibit some specificity for DPE or TATA
• Similar work in progress for MTE

The “Promoters” of Splicing (Rotem Sorek)
In general it is not known how alternative splicing (AS) is regulated
• A few known splicing regulatory proteins
  – like TFs they are sequence-specific, but they bind to RNA, not DNA
  – binding motif (usually 4-10 nt) can be located in exon or intron
  – can act as enhancers or silencers
• Evidence for combinatorial regulation

The typical “motif in a haystack”
• Most work on finding splicing factor motifs focuses on exons
  – short enough that mutation studies are feasible
• Introns are too long and require a computational approach
• Compiled training dataset
  – 250 AS exons, alternatively spliced in both mouse and human
  – large set of constitutively spliced (CS) exons, conserved across human/mouse
Sorek and Ast, Genome Research 2003

Their Primary Finding: there tends to be significantly more conservation in introns surrounding AS exons than CS exons
On average about 100 bases on either side of each AS exon are conserved, compared to around 7 bases
for constitutively spliced exons
What’s the explanation?
  – multiple binding motifs?
  – helping to determine secondary structure in RNA, which helps lead to correct splicing?

Predicting Alternative Splicing
• Additional predictive features
  – Higher conservation around exon
  – Higher conservation of exon itself (motifs?)
  – Shorter exons
  – Exon length is a multiple of 3
• Method: somehow chose one threshold for each feature?
• Performance: scanned human genome, predicted 1000 AS exons (incl. training data?)
  – 70% had EST evidence of AS vs. 6-7% baseline
  – Lab test showed that 7/15 (randomly?) selected from the remaining 30% are AS in at least one of 15 tissues
• Significance: estimate that “splicing promoters” cover 3x10^6 bp

Genetic Backup Circuits (Kafri and Pilpel)
• Fact: single gene knockouts often have little or no phenotypic effect
  – 10% lethal in worm
  – 27% lethal in yeast
• Question: Can we better understand the mechanisms of genetic backup?
• Task: Predict whether a knockout will be lethal or not

Duplicates Suggest Redundancy
• Genes with duplicates are less likely to be essential
• But clearly this doesn’t tell the whole story
  – genes whose knockout is lethal can have duplicates
  – nonessential genes often have no duplicate
(Gu, Z. et al., Nature 2003)

Function of Duplicate Matters
• Compute dispensability of yeast genes
  – growth rate after knockout compared to mean growth rate, averaged over many conditions
• Compared GO functional annotations of highly similar genes.
  Found higher dispensability when
  – higher functional similarity (Resnik information content)
  – little functional similarity but high sequence similarity (BLAST E-values)

Similarity of Expression
• 40 time series, 500 timepoints
• In each condition, calculated the correlation of the expression profiles of each pair of paralogous genes
• Average correlation suggests
  – backup is best provided by genes which do not share expression patterns
[Figure: dispensability vs. mean expression correlation]

How can we explain this unexpected result?
Classify pairs into:
• negative correlation:
  – never similarly expressed
• positive correlation:
  – always similarly expressed
• no correlation:
  – never similarly expressed or
  – similarly expressed in certain conditions
[Figure: dispensability vs. mean expression correlation]

Variability of Expression
• Use the standard deviation to quantify the consistency of correlation across conditions
[Figure: dispensability vs. standard deviation of expression correlation]

Goldilocks and the three little paralogs
[Figure: standard deviation of expression correlation vs. mean expression correlation (strongly negative, little correlation, strongly positive)]
• Never same expression: too diverged
• Always same expression: too similar
• Correlated in only a subset of conditions: just right
Optimal backup requires the “ability to switch between similar and dissimilar expression in a condition dependent manner”

Predictions about the Past... Hypothesized Duplication Mechanism
1. duplication occurs
2. leads to unstable redundancy
3. quickly followed by either
   – mutation and loss of one of the duplicates
   – subfunctionalization leading to stable redundancy

Hypothesize two distinct types of subfunctionalization
1. mutation of coding region leading to functional divergence
2. mutation of control region leading to divergence of expression

Need for Regulatory Flexibility
• This second type of subfunctionalization would entail a quite significant regulatory challenge if the paralogs are to provide backup for one another
  – Upon mutation of B, A must be turned on in the conditions that would normally require B
• Postulate that
  – this regulatory challenge is met when a gene has a significant amount of regulatory diversity (i.e. different TF motifs)
  – backup asymmetry arises when one of the genes has few motifs (Kellis suggests otherwise?)

Experiments, but no hard numbers
• Claim that the capacity of genes to respond at the transcriptional level when their counterpart is deleted is central to their ability to provide backup
  – Most paralogs are downregulated when the other gene is knocked out (cross-hybridization?)
    • lower stdev -> downregulation
• Claim that asymmetry of backup capability can be predicted based on the number of transcription factor binding sites
  – The gene that has the larger number of motifs is the one that is capable of providing backup to the other
  – Genes with few motifs are “parasites”: they can’t provide backup
• Claim an improved ability to predict the effect of double knockouts

A Question
• They claim that only when the genes diverge in function will they be maintained in evolution.
• But if the duplicated pair can compensate for each other’s function, then won’t there be little selection pressure to maintain both copies?
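
As a minimal sketch of the expression statistics behind the "Goldilocks" picture in the Kafri and Pilpel talk: for one pair of paralogs, compute the correlation of their expression profiles in each condition, then summarize by the mean and standard deviation across conditions. The function names, the input layout, and the two cutoffs (`mean_cut`, `sd_cut`) are assumptions for illustration and are not values from the talk.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_profile(gene_a, gene_b):
    """Mean and standard deviation of per-condition correlations.

    gene_a and gene_b are lists with one time series per condition,
    in the same condition order (a hypothetical input layout).
    """
    rs = [pearson(a, b) for a, b in zip(gene_a, gene_b)]
    return mean(rs), pstdev(rs)


def classify(mean_r, sd_r, mean_cut=0.5, sd_cut=0.5):
    """Map (mean, stdev) onto the three labels from the slide.

    The cutoffs are hypothetical. Low-variance pairs that are not
    strongly positive are all lumped into "too diverged" here.
    """
    if sd_r >= sd_cut:
        return "just right"      # correlated in only a subset of conditions
    if mean_r >= mean_cut:
        return "too similar"     # always the same expression
    return "too diverged"        # never the same expression


# Two conditions: co-expressed in the first, anti-correlated in the second.
a = [[1, 2, 3], [1, 2, 3]]
b = [[2, 4, 6], [3, 2, 1]]
label = classify(*correlation_profile(a, b))
```

With real data each inner list would be one of the 40 time series mentioned above; the toy input only shows the shape of the calculation.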
From General Conservation to Specific Motifs (Sorek talk)
• Searched conserved intronic regions for overrepresented hexamers
  – literature search for the most significant hexamer shows that it is mentioned as an AS motif in six papers
• Next steps:
  – identify the consensus sequences of additional motifs
  – learn tissue/developmental specificity for each motif

Revealing Selection Patterns in the Evolution of Yeast Transcription Regulation (Amos Tanay, Irit Gat-Viks and Ron Shamir)
• Identifying TF binding sites is hard
• Even harder to predict more complex interactions
  – rarely a binary switch
  – not a linear relation between affinity and activation
  – different binding affinities can lead to different results (e.g. p53 can lead to apoptosis or rescue)
Conservation indicates functionality
Evolution dynamics disclose details of functionality

An Analogy: Imagine we didn’t know the genetic code, but just the length of the codons
We know that synonymous substitutions are more common in coding regions than nonsynonymous substitutions
1. build a network where each 3-letter nt string is represented by one node
2. put an edge between nodes, where the thickness of the edge represents the frequency of mutations in aligned coding regions of related organisms
3. see strongly connected components comprised of nodes which all code for the same amino acid

A “Simple” Approach
• Chose to use the four recent genomes of “simple” yeasts (promoter regions are relatively short)
• Identified 4000 promoters and aligned them using ClustalW
• Use a simple window-scanning method to identify all “motifs” of size 8
• Simple parsimony method to infer ancestral sequences at each node in the phylogeny

A Simple Approach (2)
• Calculate background substitution rate
  – 16-parameter background model for each branch in the phylogeny
• For each motif, compute 8 tables of site-specific substitution rates (motif m, site i, substitution a → b)
  – simply count observed substitutions at each site, summed over all branches of the tree and all instances of the motif
  – normalized substitution rate: log of the ratio of observed substitutions over expected substitutions,
    $s[m, i, a, b] = \log \frac{\mathrm{count}[m, i, a, b]}{E(\mathrm{count}[m, i, a, b])}$

Building a “Selection Network”
• Each node represents an 8-mer “motif”
• Connect all motifs that are 1 substitution apart
  – if substitution rate is positive, dark edge
  – if substitution rate is negative, light edge
  – if not enough data, very thin edge
images taken from: http://www.cs.tau.ac.il/~amos/promoter_evo/
• Did some larger-scale evaluations based on ChIP and gene expression data
• Also some anecdotal results
[Figure: Matrix of Substitutions from the Motif Consensus]

Evolutionary Signatures of Regulatory Sequences (Michael Eisen)
• Examples of “Evolutionary Signatures”
  – coding sequence: conserved, conserved, variable (period-3 pattern over codon positions)
  – structural RNA: nucleotides that base-pair are coevolving

What are the evolutionary constraints imposed on sequences by TF binding?
• Aligned 4 yeast species
  – for each base in the genome, estimate the evolutionary rate (very noisy estimates)
Analyze the pattern of rate variation across the entire binding site
Moses et al., Evol Biol 2003

Position-specific Rate Variation
• The pattern of rate variation across the entire binding site for a particular TF
  – within one genome
  – across genomes
Highly Correlated
• Clearly due to structural constraints
  – protein contacts
  – even when we know there’s no contact, there are DNA bending issues...

These “signatures” are missing from current motif-prediction programs
• Although this isn’t a particularly surprising result, many predicted motifs (e.g. from MEME etc.) do not display this TFBS “signature”
  – could use it as a filter, or incorporate it more directly (they’re working on this currently?)
• Different families of TFs have different “signatures”
  – Eisen thinks the community is still underutilizing this information

Make better use of comparative data by using an explicit evolutionary model
• Is there likely to have been a TFBS in the ancestor?
  – build a PSSM representing the chemical contribution of each base to the binding specificity
  – use the Halpern and Bruno model to predict how the TFBS will evolve given a proposal + selection model
Moses et al., Evol Biol 2003

Larger Cis-Regulatory Sequences
• Known binding patterns in Drosophila have low information content
  – find a sequence match for each TFBS before almost every gene in the genome
• Build a statistical model to identify significant clusters of binding sites in windows of arbitrary size
  – improved detection of cis-regulatory modules
  – experimental results still show many false positives
• Use comparative data to discriminate real clusters from false ones

How to use comparative data
• Conservation in Drosophila pseudoobscura isn’t a good indicator of functionality
  – all real and fake clusters have very high overall sequence conservation, including their flanking regions (a surprise)
• However...
  – the actual binding sites are often not conserved
  – even one or two mutations can destroy a binding site
Conservation of binding site density is a useful indicator of function

An Impassioned Speech on the Evolution of the Scientific Journal
• “If you publish [your work] in a journal like Science which fewer and fewer people in the world have access to you run a really big risk of being the next Mendel and that your work will languish in obscurity”
• Don’t publish in a journal that “takes your writing, your ideas, thoughts and paper and claims ownership of them and then only doles them out to a relatively narrow bunch of people who have enough money to pay for them..solely to promote the financial health of the journal...”
• Don’t be “like Microsoft”... publish in Public Library of Science or another freely available journal

For More Information
• Most of the talks I picked were invited talks
• For the workshop there is often only an abstract
• Video feed is available online: http://www.calit2.net/multimedia/recomb2004videos.html
• Many have papers that have just come out or are about to come out with additional details... check the authors’ webpages

Evolution and Larger Cis-Regulatory Sequences
• what are enhancers? whole regions of binding sites?
• how are Drosophila enhancers organized?
• only 5 binding sites whose specificities are well characterized from experimental studies
  – low information content
  – find them all over the genome
• Clusters of binding sites -> surrogate for regulatory function
• Shown previously that if you look for clusters of these sites
  – all identified regions overlap known enhancers
  – don’t find anything else
  – then I don’t understand the next study with 39 clusters
• Found 39 clusters
  – 9 overlap known enhancers
  – 28 tested experimentally
    • 6 clearly regulating a nearby gene
    • 3 perhaps showing some regulatory role
    • remainder don’t appear to be real (but could have the wrong promoter? look back at the Kadonaga talk)
• What’s the difference between real and fake?
  – use comparative mapping
• Used two flies (which ones?)
  – distant enough, based on coding region conservation, that we expect to see conservation only of functionally conserved regions
  – not the case
  – all real and fake clusters have very high overall sequence conservation, including their flanking regions (why?)
• However,
  – binding sites not conserved
  – one or two mutations are enough to destroy a binding site
  – measure conservation of binding site density
  – show graph (37:18)
  – summary (39:21)
• In more distantly related species
  – alignment more of an issue
  – binding sites will move around more
  – been shown that there is huge binding site turnover
  – will have 2 separate ways to make the same enhancer
  – no sequence identity, but in experimental studies they can replace each other?
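
The idea of scoring conservation of binding-site density, rather than conservation of the sequence itself, can be sketched roughly as follows. This is a simplification under stated assumptions: binding sites are matched as exact strings (the talk works with low-information binding specificities, not exact words), and the function names, the example motif, and the min/max ratio used as a score are all hypothetical.

```python
def count_sites(seq, motifs):
    """Count (possibly overlapping) exact matches of any motif in seq."""
    seq = seq.upper()
    total = 0
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            total += 1
            start = seq.find(motif, start + 1)
    return total


def site_density(seq, motifs):
    """Binding sites per base; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return count_sites(seq, motifs) / len(seq)


def density_ratio(region_a, region_b, motifs):
    """Hypothetical score in [0, 1]: 1.0 means equal site density.

    The regions are orthologous candidate clusters from two species.
    Site positions are ignored, so sites may have moved around.
    """
    da = site_density(region_a, motifs)
    db = site_density(region_b, motifs)
    if max(da, db) == 0:
        return 0.0
    return min(da, db) / max(da, db)


motifs = ["TTAA"]                  # placeholder motif, not from the talk
species_1 = "TTAACCTTAAGG"         # two sites
species_2 = "GGTTAAGCTTAA"         # two sites at different positions
species_3 = "GGGGCCCCGGGG"         # no sites

same_density = density_ratio(species_1, species_2, motifs)
lost_sites = density_ratio(species_1, species_3, motifs)
```

Because the score ignores where the sites fall, a region whose sites have turned over but whose density is maintained still scores high, which is the property the notes above describe.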
