Translating the Cell’s “Instruction Manual”
A Biophysicist’s Approach to Understanding Gene Regulation
Rachel Patton McCord
Bulyk Lab, Harvard University Biophysics Program
3/20/08

“Knobloch lives?”
  What are characteristics of “life”?
  Response to environment
  Take in nutrients and produce waste
  Reproduction
  ….

Biological Signal Processing
  [Figure labels: oxygen, ethanol, inputs, outputs, protein, transcription factor, mRNA, nucleus]

Regulation of Gene Expression
  Transcription factor (TF) recognizes DNA bases (ACGT)
  Promotes gene expression: transcription of mRNA
  [Figure labels: RNA polymerase, sequence-specific TFs, RNA (output)]

Organisms
  Ideal: understand gene regulation in humans
  Problems: large genome size, diverse cell types, likely complicated gene regulation “rules”
  Begin with a model system: the single-celled organism Saccharomyces cerevisiae (yeast)
  [Figure label: a few hundred bp]

Goals:
  Find DNA sequences bound by TFs
  Predict how TFs function in the cell
  Look for biophysical links between TF structure and function
  Use quantitative approaches to maintain a physically realistic view of biology.

TF-DNA Sequence Recognition
  Protein Binding Microarray (PBM) Technology
  [Figure labels: dsDNA, TF, fluorophore-labeled antibody, microarray slide, laser (488 nm), detector]
  Mukherjee, Berger, et al., Nature Genetics (2004), 36:1331-1339.

Universal Array Design
  Interested in sequences of 8-10 bases
  $4^{10}$ ≈ 1,000,000 total 10-mers
  $4^{10}$ / 27 ≈ 40,000 total spots
  Each spot (5’ to 3’): 36 nt variable sequence followed by 24 nt fixed sequence
    CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG
  Overlapping windows along the variable sequence:
    CTATCTACACA
    TATCTACACAC
    ATCTACACACA
    TCTACACACAA
  27 10-mers per spot
  Berger, Philippakis et al., Nature Biotechnology (2006), 24:1429-1435.
  Philippakis, Qureshi et al., RECOMB (2007).
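The spot-count arithmetic on the Universal Array Design slide can be checked with a short sketch. It assumes ideal packing (every 10-mer appears exactly once across the array and reverse complements are not merged), which is a simplification; the function and variable names are illustrative and not from the talk.

```python
# Sketch: how many 10-mers fit in one 36-nt variable region, and how many
# spots are needed to cover all 4**10 10-mers under ideal packing (assumption).

K = 10            # k-mer length of interest
VARIABLE_NT = 36  # variable region per spot, as stated on the slide

# Example probe copied from the slide: 36 nt variable + 24 nt fixed.
PROBE = "CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG"


def kmers(seq, k):
    """Return every overlapping window of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


variable_region = PROBE[:VARIABLE_NT]
kmers_per_spot = len(kmers(variable_region, K))   # 36 - 10 + 1
total_kmers = 4 ** K                              # all possible 10-mers
min_spots = -(-total_kmers // kmers_per_spot)     # ceiling division

print(kmers_per_spot, total_kmers, min_spots)
```

The result is a lower bound of a little under 40,000 spots, consistent with the rounded figure on the slide.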
Universal Array Design (continued)
  Use an idea from cryptography: a “de Bruijn” sequence contains all sequence variants of length k in the shortest sequence possible
  All possible 3-mers:
    AAA ACA AGA ATA CAA CCA CGA CTA GAA GCA GGA GTA TAA TCA TGA TTA
    AAC ACC AGC ATC CAC CCC CGC CTC GAC GCC GGC GTC TAC TCC TGC TTC
    AAG ACG AGG ATG CAG CCG CGG CTG GAG GCG GGG GTG TAG TCG TGG TTG
    AAT ACT AGT ATT CAT CCT CGT CTT GAT GCT GGT GTT TAT TCT TGT TTT
  de Bruijn sequence: length = $4^3$ = 64 bp
  Each spot: test sequence (36 bp) plus fixed sequence (24 bp)
  Sequences shown on the slide:
    TCGATTGCGTGACAGGGTAGTCCGGGTTCTTTGCGCTCACTATAC
    TCGATTGCGTGACAGGGTAAAACAAGACCCTGACCATGGCAGTGT
  Anthony Philippakis, Mike Berger

Deriving Binding Strength at each Sequence
  Every 8-mer is represented 16 times
  Take the median over the intensities of all spots containing this 8-mer
  Example: CATGGAAA
    CCGTCAGCAGTCATGGAAAGCTGGTAGAAGTTCTGGGTCTGTGTTCCGTTGTCCGTGCTG
    TTATACCATGGAAAGACAAACGTAGCATGTTGGAGTGTCTGTGTTCCGTTGTCCGTGCTG
    CCATGGAAATGTGTCCCTAAGGGTGGTAACAAAATAGTCTGTGTTCCGTTGTCCGTGCTG
    CACTACGCAAGTGCGGTGCATGGAAAGGGTTCTGGAGTCTGTGTTCCGTTGTCCGTGCTG
    ATCTCATGGAAAAGACTCATAACGATCAACAGTCGGGTCTGTGTTCCGTTGTCCGTGCTG
    ACAACAGAGCACCGATGGCATGGAAACTTGCGTAGAGTCTGTGTTCCGTTGTCCGTGCTG
    GTGGAGAAAGGGGTCAAACATGGAAACGCATCGACAGTCTGTGTTCCGTTGTCCGTGCTG
    GCCCGGGATCCCATCCATGGAAAATGTCGCTTACATGTCTGTGTTCCGTTGTCCGTGCTG
    CAGAAGTGTCCTACGTAACATCCACATGGAAAGTACGTCTGTGTTCCGTTGTCCGTGCTG
    GTTGCATACACGCATGGAAATAACAATCGAACTCCAGTCTGTGTTCCGTTGTCCGTGCTG
    TCATGTGCTGGGCTTGATTCAGCATGGAAAACCAGTGTCTGTGTTCCGTTGTCCGTGCTG
    TATTCTTCTCTTCATGGAAACAGTAAAAAATCGGACGTCTGTGTTCCGTTGTCCGTGCTG
    CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG
    CCTGGGGACATGGAAAAATGAAGTCACCCATGGTGCGTCTGTGTTCCGTTGTCCGTGCTG
    ATCATCCTTACATTACATGGAAATCGTGTGCCAATAGTCTGTGTTCCGTTGTCCGTGCTG
    AAGGCCCATGGAAACCACGTCATATTCACAACTAACGTCTGTGTTCCGTTGTCCGTGCTG
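The median-over-spots step described above can be sketched as follows. This is a minimal illustration, not the lab's pipeline: the spot sequences and intensities are made-up toy values, and treating an 8-mer and its reverse complement as the same site (reasonable for double-stranded DNA) is an assumption of this sketch.

```python
from collections import defaultdict
from statistics import median

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """One shared key for a k-mer and its reverse complement (assumption)."""
    return min(kmer, reverse_complement(kmer))


def median_signal_per_kmer(spots, k=8):
    """Map each canonical k-mer to the median intensity of the spots containing it.

    spots: iterable of (variable_sequence, intensity) pairs.
    """
    intensities = defaultdict(list)
    for seq, intensity in spots:
        # A set ensures each spot contributes at most once per k-mer.
        present = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}
        for kmer in present:
            intensities[kmer].append(intensity)
    return {kmer: median(vals) for kmer, vals in intensities.items()}


# Hypothetical toy spots; the third carries CATGGAAA as its reverse complement.
toy_spots = [
    ("AACATGGAAATT", 100.0),
    ("GGCATGGAAACC", 300.0),
    ("TTTTCCATGAAA", 200.0),
]
medians = median_signal_per_kmer(toy_spots)
print(medians[canonical("CATGGAAA")])
```

Using the median rather than the mean keeps one unusually bright or dim spot from dominating the value assigned to an 8-mer.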
Deriving Binding Strength at each Sequence
  8-mer      Rev. Comp.   Median Signal
  GTCACGTG   CACGTGAC     108178
  GCACGTGC   GCACGTGC     95854
  CACGTGCC   GGCACGTG     89203
  GCACGTGA   TCACGTGC     74295
  TCACGTGA   TCACGTGA     69377
  ACACGTGA   TCACGTGT     68733
  ATCACGTG   CACGTGAT     58874
  CACGTGTA   TACACGTG     58656
  CCACGTGA   TCACGTGG     47900
  ACACGTGG   CCACGTGT     47240
  CACGTGAG   CTCACGTG     42887
  AGCACGTG   CACGTGCT     41755
  ACACGTGC   GCACGTGT     36764
  CACGTGTC   GACACGTG     36463
  ACCACGTG   CACGTGGT     36380
  CACGTGCG   CGCACGTG     35515
  CACGTGCA   TGCACGTG     32370
  AACACGTG   CACGTGTT     28948
  CCACGTGC   GCACGTGG     22983
  CACGTGGC   GCCACGTG     19315
  ...        ...          ...

Affinity vs. PBM Signal (Cbf1)
  [Figure axes: $\log(K_D^{-1})$ vs. $\log(\text{PBM median signal})$]
  $[\mathrm{TF}] + [\mathrm{DNA}] \rightleftharpoons [\mathrm{TF\text{-}DNA}]$, with association rate $k_a$ and dissociation rate $k_d$
  Maerkl and Quake. Science (2007); 315:233-237.

Goals:
  Find DNA sequences bound by TFs (PBMs)
  Predict how TFs function in the cell
  Look for biophysical links between TF structure and function
  Use quantitative approaches to maintain a physically realistic view of biology.

Predicting TF Cellular Functions
  Use known/measurable inputs and outputs:
  [Figure labels: gene expression, heat shock, gene deletion, mRNA]

Gene Expression Data
  1327 publicly available microarray datasets
  [Figure labels: Condition 1, Condition 2, mRNA]

Predicting Cellular Functions of Components
  Basic model/assumptions:
  TF binding near genes causes change in expression
  Similar TF binding probability + similar expression = active regulation
  [Figure labels: PBM data (TF1), expression data (Gene 1 to Gene 5)]

Physically Realistic Binding Probability
  Simple (and often used) view:
  Promoter region is BOUND: gene is ON
    [Figure: Cbf1 on the promoter sequence, followed by the gene]
    GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
    CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
  Promoter region is NOT BOUND: gene is OFF
    GGCACGTGGCTGCATGAGCGGAGGCTCGCGGGAAAATACAACAGTCACCCACGTG
    CCGTGCACCGACGTACTCGCCTCCGTGCGCCCTTTTATGTTGTCAGTGGGTGCAC

Physically Realistic Binding Probability
  Physical reality: energy landscape of potential TF binding
    [Figure: Cbf1 on the promoter sequence, followed by the gene]
    GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
    CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
  TF occupancy probability =
  Integration of binding potential across the sequence near the gene
  Dictates the likelihood of recruiting RNA polymerase and thus the level of mRNA transcription

Physically Realistic Binding Probability
  Physical reality: energy landscape of potential binding
  Sum median intensity data across all possible 8-mers in the sequence near the gene
    [Figure: Cbf1 on the promoter sequence, followed by the gene; two sites labeled Intensity = 117651 and Intensity = 215352]
    GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
    CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC

Goals of New Analysis Method
  Combine binding probability with expression data to predict TF function and condition-specific binding site usage
  [Figure labels: PBM data, target genes 1 to 6, Conditions A to D, gene expression, TF function]
  Consider all data rather than drawing arbitrary cutoffs
  Low affinity binding as well as minor expression changes may be biologically relevant
  Tanay, 2006; Foat et al., 2006
  [Figure label: Binding probability ?]

CRACR
  “Combination Rank-order Analysis of Condition-specific Regulation”

Basics of CRACR Approach
  Order genes by expression in the condition of interest
  Assign ranks based on PBM-derived binding probability for the TF
  [Figure: eleven genes (YER130C, YAR029W, YGR087C, YAR014C, YAR003W, YAL003C, YAR018W, YAR044W, YGR088W, YGR043C, YPL054W) arranged from most induced to most repressed, each labeled with its TF binding rank (1 to 11)]

Basics of Analysis Approach
  Select: similarly expressed foreground genes and a background set
  [Figure: the same eleven genes with their PBM p-value ranks, split into foreground and background, from most induced to most repressed]
  Slide a window along the ordered expression
  Calculate an area statistic for enrichment of PBM targets within each window vs. background
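The binding score and ranking steps described above (sum the median 8-mer intensities across the sequence near each gene, then rank genes by that score) can be sketched as below. The three 8-mer signals are taken from the median signal table in this talk; the promoter sequences and gene names are hypothetical, and scoring unmeasured 8-mers as zero is an assumption of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Shared key for a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def binding_score(promoter, kmer_signal, k=8):
    """Sum the median PBM signal over every overlapping k-mer in the promoter.

    8-mers missing from kmer_signal contribute 0.0 (assumption for this toy).
    """
    return sum(
        kmer_signal.get(canonical(promoter[i:i + k]), 0.0)
        for i in range(len(promoter) - k + 1)
    )


def rank_genes(promoters, kmer_signal):
    """Rank genes by binding score; rank 1 is the strongest predicted binding."""
    scores = {gene: binding_score(seq, kmer_signal) for gene, seq in promoters.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return scores, {gene: rank for rank, gene in enumerate(ordered, start=1)}


# Three rows from the median signal table, keyed by canonical 8-mer.
kmer_signal = {
    canonical("GTCACGTG"): 108178.0,
    canonical("GCACGTGC"): 95854.0,
    canonical("CACGTGCC"): 89203.0,
}

# Hypothetical toy promoters (real promoters span a few hundred bp).
promoters = {
    "geneA": "AAGTCACGTGAA",
    "geneB": "TTGGCACGTGCC",
    "geneC": "AAAAAAAAAAAA",
}

scores, ranks = rank_genes(promoters, kmer_signal)
print(scores, ranks)
```

Because every overlapping 8-mer contributes, weak sites add to the score instead of being discarded by a cutoff, which is the point the "consider all data" slide makes.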
  Area statistic:
    $\text{area} = \frac{1}{B + F}\left[\frac{\rho_B}{B} - \frac{\rho_F}{F}\right]$
    $\rho$ = rank sum, $F$ = foreground, $B$ = background
  [Figure: genes ordered from most induced to most repressed, each labeled with its PBM p-value rank]

Predicting TF Function
  Plot the area statistic (ranges from -0.5 to 0.5) at each window
  Determine condition significance by a permutation test-derived threshold (gray line: p < 0.001)
  Glucose added: Mig1 targets repressed
  [Figure: area statistic plotted along genes ordered from induced to repressed; labels: glucose, Mig1, metabolism switch enzyme, mRNA]
  [Color scale: expression fold change >8.0, 5.0, 3.4, 2.3, 1.5, 0, -1.5, -2.3, -3.4, -5, <-8]
  Determine which individual genes are repressed by Mig1
  [Figure labels: group of genes repressed by Mig1, including YHR005C, YER130C, YBL054W]

Prediction of General TF Function
  Find all (of 1327) expression conditions where a TF is predicted to be active
  Look for enrichment of general biological functions in this set
  Selected Mcm1 significant conditions
  Conditions for which there is significant enrichment of PBM targets | Effect
  Cell cycle:
    Expression in response to Clb2p (set 1, 40 min) | induced
    Expression during the cell cycle (alpha factor arrest and release)(16) | induced
    Expression during the cell cycle (cdc15 arrest and release)(8) | induced
    Expression during the cell cycle (cdc28)(7) | induced
    Expression in response to 50 nM alpha-factor: 120 min | induced
    Expression in ckb2 deletion mutant | induced
    Expression in dig1, dig2 deletion mutant | induced
    Expression in swi6 (haploid) deletion mutant | induced
    Expression in tec1 (haploid) deletion mutant | induced
    Expression in yel044w deletion mutant | induced
    Expression in sir2 deletion mutant | repressed
    Expression in snf2 mutant cells in minimal medium | repressed
    Expression in response to 50 nM alpha-factor in bni1 mutant: 60 min | repressed
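The area statistic from the "Basics of Analysis Approach" slide, area = (ρ_B/B − ρ_F/F)/(B + F), and the sliding window over expression-ordered genes can be sketched as follows. Several choices here are assumptions not stated in the talk: ranks are computed within the combined foreground plus background set with rank 1 for the highest binding score, ties are broken by input order, and the background is taken to be every gene outside the window.

```python
def area_statistic(fg_scores, bg_scores):
    """area = (rho_B / B - rho_F / F) / (B + F), with rho the rank sums.

    Ranks run 1..(F + B) over the combined set, rank 1 = highest binding score.
    Ties keep input order (foreground first), a simplification.
    """
    labeled = [(s, True) for s in fg_scores] + [(s, False) for s in bg_scores]
    labeled.sort(key=lambda pair: -pair[0])  # stable sort, descending score
    rho_f = sum(rank for rank, (_, is_fg) in enumerate(labeled, start=1) if is_fg)
    rho_b = sum(rank for rank, (_, is_fg) in enumerate(labeled, start=1) if not is_fg)
    n_f, n_b = len(fg_scores), len(bg_scores)
    return (rho_b / n_b - rho_f / n_f) / (n_b + n_f)


def sliding_area(scores_by_expression, window):
    """Area statistic for each window along genes ordered by expression.

    scores_by_expression: binding scores listed from most induced to most
    repressed. Genes outside the window form the background (assumption).
    """
    n = len(scores_by_expression)
    areas = []
    for start in range(n - window + 1):
        fg = scores_by_expression[start:start + window]
        bg = scores_by_expression[:start] + scores_by_expression[start + window:]
        areas.append(area_statistic(fg, bg))
    return areas


# Hypothetical binding scores for five genes, most induced first.
toy_scores = [9.0, 8.0, 3.0, 2.0, 1.0]
areas = sliding_area(toy_scores, window=2)
print(areas)
```

In this toy, the strongest predicted targets sit among the most induced genes, so the first window reaches the maximum of 0.5 and the last window the minimum of -0.5, matching the range quoted on the slide. A permutation threshold such as the p < 0.001 line could be estimated by shuffling the scores and recomputing the window areas.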
Prediction of General TF Function
  Selected Mcm1 significant conditions
  Prediction: Mcm1 involved in cell cycle and mating
  [Figure labels: alpha factor, “alpha” cell, “a” cell]

Prediction of TF Function
  After PBM experiments, CRACR has been used to predict functions of 90 yeast TFs (paper in process)

Binding Site Affinity Effects
  [Figure: Gene 1, Gene 2 and Gene 3 with high-, medium- and low-affinity TF sites, shown at low, medium and high TF concentration; axis label: binding affinity]
  $[\mathrm{TF}] + [\mathrm{DNA}] \rightleftharpoons [\mathrm{TF\text{-}DNA}]$, with association rate $k_a$ and dissociation rate $k_d$

Demonstrating Effects of Binding Site Affinity
  Low- vs. high-affinity binding sites may have different biological functions
  Experimentally validated
  [Figure axis label: occupancy units]
  Expression after oxidative stress vs.
  Rap1 binding affinity
  [Figure: occupancy units vs. time after diamide treatment (0, 20, 30 min) for two predicted conditional targets, ALD4 and MCR1; asterisks mark significance]
  [Scale: highest binding affinity to lowest binding affinity]

Goals:
  Find DNA sequences bound by TFs (PBMs)
  Predict how TFs function in the cell (CRACR)
  Look for biophysical links between TF structure and function
  Use quantitative approaches to maintain a physically realistic view of biology.

Reasons for Different Functions: TF structure?
  Goal: consider biophysical TF structure instead of a cartoon “TF blob”
  [Figure labels: cyc8, tup1, Mig1]

TF Structure and Function
  Are certain TFs structurally suited for certain types of biological processes?
  Case study:
    CST6 (bZIP)
      Lower information content motif
      Regulatory hub; many target genes
      Cell fate, cell cycle
    GAL4 (Zn2Cys6)
      Higher information content motif
      More specific, fewer target genes
      Metabolism of specific nutrients

Future Directions
  Completion of functional predictions and study of yeast gene regulation
  Toward a predictive model in humans
  Experiments for understanding gene regulation rules

Acknowledgements
  Martha Bulyk
  Mike Berger
  Anthony Philippakis
  Cong Zhu
  Kelsey Byers
  Trevor Siggers
  Vicky Zhou
  Cherelle Walls
  Jason Warner
  Jaime Chapoy
  Other Bulyk Lab members
  NSF graduate research fellowship
  NIH/NHGRI R01
  GO CATS!!

Advantages and Challenges of Interdisciplinary Work
  Insight gained by quantitative reasoning in biology and by combining different perspectives
  “Physicists and mathematicians choose projects in biology that are fun, but not necessarily important”
  Important not to get caught up in what “counts” as “true biology” or “true physics”