Translating the Cell’s
“Instruction Manual”
A Biophysicist’s Approach to
Understanding Gene Regulation
Rachel Patton McCord
Bulyk Lab
Harvard University Biophysics Program
3/20/08

“Knobloch lives?”

What are characteristics of “life”?




Response to environment
Take in nutrients and produce waste
Reproduction
….
Biological Signal Processing
[Figure labels: oxygen, ethanol]
Biological Signal Processing
[Figure labels: inputs, outputs, transcription factor, nucleus, mRNA, protein]
Regulation of Gene Expression


Transcription Factor (TF) recognizes DNA bases (ACGT)
Promotes gene expression: transcription of mRNA
[Figure labels: RNA polymerase, sequence-specific TFs, RNA (output)]
Organisms

Ideal: understand gene regulation in humans

Problems: large genome size, diverse cell types, likely complicated gene regulation “rules”

Begin with a single-celled model organism: Saccharomyces cerevisiae (yeast)
A few hundred bp
Goals:

Find DNA sequences bound by TFs

Predict how TFs function in the cell

Look for biophysical links between
TF structure and function

Use quantitative approaches to maintain a physically realistic view
of biology.
TF-DNA Sequence Recognition
Protein Binding Microarray (PBM) Technology
[Figure: TF bound to dsDNA on a microarray slide, detected with a fluorophore-labeled antibody]
[Figure: laser (488 nm) and detector]
Mukherjee, Berger, et al., Nature Genetics (2004), 36:1331-1339.
Universal Array Design

Interested in sequences of 8-10 bases
4^10 ≈ 1,000,000 total 10-mers
4^10 / 27 ≈ 40,000 total spots
36 nt variable sequence + 24 nt fixed sequence
5’ CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG 3’
27 10-mers per spot
CTATCTACACA
TATCTACACAC
ATCTACACACA
TCTACACACAA
Berger, Philippakis et al., Nature Biotechnology (2006), 24:1429-1435.
Philippakis, Qureshi et al., RECOMB (2007).
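The spot-count arithmetic on this slide can be checked with a short sketch. The variable names are illustrative, and the spot count assumes the ideal case in which no 10-mer is repeated anywhere on the array.

```python
# Sketch of the array-size arithmetic; all names here are illustrative.
K = 10                 # length of the sequence variants of interest
VARIABLE_LENGTH = 36   # nt of variable sequence per spot (from the slide)

total_kmers = 4 ** K                        # all possible 10-mers over A, C, G, T
kmers_per_spot = VARIABLE_LENGTH - K + 1    # overlapping 10-mer windows in 36 nt
min_spots = total_kmers / kmers_per_spot    # spots needed if no 10-mer repeats

print(total_kmers)        # 1048576
print(kmers_per_spot)     # 27
print(round(min_spots))   # 38836
```

Rounded, these are the slide's "≈ 1,000,000" 10-mers and "≈ 40,000" spots.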
Universal Array Design
Use an idea from cryptography:
“de Bruijn” sequence contains all sequence variants of
length k in the shortest sequence possible
All possible 3-mers
AAA ACA AGA ATA CAA CCA CGA CTA GAA GCA GGA GTA TAA TCA TGA TTA
AAC ACC AGC ATC CAC CCC CGC CTC GAC GCC GGC GTC TAC TCC TGC TTC
AAG ACG AGG ATG CAG CCG CGG CTG GAG GCG GGG GTG TAG TCG TGG TTG
AAT ACT AGT ATT CAT CCT CGT CTT GAT GCT GGT GTT TAT TCT TGT TTT
de Bruijn sequence: length = 4^3 = 64 bp
Each probe: test sequence (36 bp) + fixed sequence (24 bp)
Anthony Philippakis, Mike Berger
TCGATTGCGTGACAGGGTAGTCCGGGTTCTTTGCGCTCACTATAC

TCGATTGCGTGACAGGGTAAAACAAGACCCTGACCATGGCAGTGT
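As a minimal sketch of the de Bruijn idea, the code below builds a cyclic sequence containing every 3-mer exactly once. It uses the textbook Lyndon-word construction, which is an assumption for illustration and not necessarily how the array sequences were generated; the function name `de_bruijn` is also illustrative.

```python
def de_bruijn(alphabet, k):
    """Return a cyclic de Bruijn sequence of order k over the given alphabet."""
    n = len(alphabet)
    a = [0] * (n * k)   # working buffer of symbol indices
    seq = []

    def db(t, p):
        if t > k:
            # A Lyndon word whose length divides k has been completed.
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


k = 3
cyclic = de_bruijn("ACGT", k)
# To read the cyclic sequence as a linear one, repeat its first k-1 bases at the end.
linear = cyclic + cyclic[:k - 1]
kmers = {linear[i:i + k] for i in range(len(cyclic))}

print(len(cyclic))   # 64
print(len(kmers))    # 64
```

The cyclic length is 4^3 = 64, matching the slide; the linear form needs k-1 extra bases so that the windows wrapping around the end are also covered.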

Deriving Binding Strength at each
Sequence


Every 8mer is represented 16 times
Take median over intensities of all spots containing this 8mer
Example: CATGGAAA
CCGTCAGCAGTCATGGAAAGCTGGTAGAAGTTCTGGGTCTGTGTTCCGTTGTCCGTGCTG
TTATACCATGGAAAGACAAACGTAGCATGTTGGAGTGTCTGTGTTCCGTTGTCCGTGCTG
CCATGGAAATGTGTCCCTAAGGGTGGTAACAAAATAGTCTGTGTTCCGTTGTCCGTGCTG
CACTACGCAAGTGCGGTGCATGGAAAGGGTTCTGGAGTCTGTGTTCCGTTGTCCGTGCTG
ATCTCATGGAAAAGACTCATAACGATCAACAGTCGGGTCTGTGTTCCGTTGTCCGTGCTG
ACAACAGAGCACCGATGGCATGGAAACTTGCGTAGAGTCTGTGTTCCGTTGTCCGTGCTG
GTGGAGAAAGGGGTCAAACATGGAAACGCATCGACAGTCTGTGTTCCGTTGTCCGTGCTG
GCCCGGGATCCCATCCATGGAAAATGTCGCTTACATGTCTGTGTTCCGTTGTCCGTGCTG
CAGAAGTGTCCTACGTAACATCCACATGGAAAGTACGTCTGTGTTCCGTTGTCCGTGCTG
GTTGCATACACGCATGGAAATAACAATCGAACTCCAGTCTGTGTTCCGTTGTCCGTGCTG
TCATGTGCTGGGCTTGATTCAGCATGGAAAACCAGTGTCTGTGTTCCGTTGTCCGTGCTG
TATTCTTCTCTTCATGGAAACAGTAAAAAATCGGACGTCTGTGTTCCGTTGTCCGTGCTG
CTATCTACACACAACTATGCGGTCGCCATGGAAATGGTCTGTGTTCCGTTGTCCGTGCTG
CCTGGGGACATGGAAAAATGAAGTCACCCATGGTGCGTCTGTGTTCCGTTGTCCGTGCTG
ATCATCCTTACATTACATGGAAATCGTGTGCCAATAGTCTGTGTTCCGTTGTCCGTGCTG
AAGGCCCATGGAAACCACGTCATATTCACAACTAACGTCTGTGTTCCGTTGTCCGTGCTG
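The median step can be sketched as follows. The spot sequences and intensities below are made-up toy values, and treating an 8-mer and its reverse complement as the same site is an assumption of this sketch, not something the slide states.

```python
from collections import defaultdict
from statistics import median

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    # Assumption: an 8-mer and its reverse complement are the same dsDNA site.
    return min(kmer, reverse_complement(kmer))


def median_signal_per_kmer(spots, k=8):
    """spots: iterable of (sequence, intensity); returns {canonical k-mer: median}."""
    per_kmer = defaultdict(list)
    for seq, intensity in spots:
        # Count each spot once per k-mer, even if the k-mer occurs twice in it.
        seen = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}
        for kmer in seen:
            per_kmer[kmer].append(intensity)
    return {kmer: median(values) for kmer, values in per_kmer.items()}


# Hypothetical toy spots (not real PBM data).
toy_spots = [
    ("TTCATGGAAAGC", 100.0),
    ("ACATGGAAATG", 300.0),
    ("CCATGGAAAT", 500.0),
    ("GGGGCCCCAATT", 50.0),
]
signal = median_signal_per_kmer(toy_spots)
print(signal[canonical("CATGGAAA")])   # 300.0
```

Using the median rather than the mean keeps a single unusually bright or dim spot from dominating the estimate for an 8-mer.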
Deriving Binding Strength at each
Sequence
8-mer      Rev. Comp.  Median Signal
GTCACGTG   CACGTGAC    108178
GCACGTGC   GCACGTGC    95854
CACGTGCC   GGCACGTG    89203
GCACGTGA   TCACGTGC    74295
TCACGTGA   TCACGTGA    69377
ACACGTGA   TCACGTGT    68733
ATCACGTG   CACGTGAT    58874
CACGTGTA   TACACGTG    58656
CCACGTGA   TCACGTGG    47900
ACACGTGG   CCACGTGT    47240
CACGTGAG   CTCACGTG    42887
AGCACGTG   CACGTGCT    41755
ACACGTGC   GCACGTGT    36764
CACGTGTC   GACACGTG    36463
ACCACGTG   CACGTGGT    36380
CACGTGCG   CGCACGTG    35515
CACGTGCA   TGCACGTG    32370
AACACGTG   CACGTGTT    28948
CCACGTGC   GCACGTGG    22983
CACGTGGC   GCCACGTG    19315
...        ...         ...
Affinity vs. PBM Signal (Cbf1)
[TF] + [DNA] ⇌ [TF-DNA] (rate constants ka, kd)
[Plot axes: log(KD^-1) and log(PBM median signal); one point per 8-mer]
Maerkl and Quake. Science (2007); 315:233-237.
Goals:

Find DNA sequences bound by TFs

PBMs

Predict how TFs function in the cell

Look for biophysical links between
TF structure and function

Use quantitative approaches to maintain a physically realistic view
of biology.
Predicting TF Cellular Functions

Use known/measurable inputs and outputs:
Gene expression
Heat shock
Gene Deletion
mRNA
Gene Expression Data

1327 Publicly Available Microarray Datasets
Condition 1
Condition 2
mRNA
Predicting Cellular Functions of
Components

Basic model/assumptions


TF binding near genes causes change in expression
Similar TF binding probability + similar expression = active regulation
[Figure: PBM data for TF1 shown alongside expression data for Genes 1-5]
Physically Realistic Binding
Probability

Simple (and often used) view:
Promoter region is BOUND:
Gene is ON
Cbf1
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
Gene
Promoter region is NOT BOUND:
Gene is OFF
GGCACGTGGCTGCATGAGCGGAGGCTCGCGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCCGTGCGCCCTTTTATGTTGTCAGTGGGTGCAC
Gene
Physically Realistic Binding
Probability

Physical reality:

Energy landscape of potential TF binding
Cbf1
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC

TF occupancy probability = Integration of binding potential across sequence near gene
→ Dictates likelihood of recruiting RNA polymerase and thus level of mRNA transcription
Gene
Physically Realistic Binding
Probability

Physical reality:

Energy landscape of potential binding
Cbf1
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC

Gene
Sum median intensity data across all possible 8-mers in
sequence near gene
Intensity = 117651
Intensity = 215352
GGCACGTGGCTGCATGAGCGGAGTCACGTGGGAAAATACAACAGTCACCCACGTG
CCGTGCACCGACGTACTCGCCTCAGTGCACCCTTTTATGTTGTCAGTGGGTGCAC
Gene
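A minimal sketch of the summation step: slide an 8-nt window along the sequence near a gene and add up the PBM median intensity of each 8-mer. The two table entries are taken from the median-signal table earlier in the talk; the short test sequence, the function name, and the choice to score unlisted 8-mers as 0 are assumptions made only for this illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def promoter_binding_score(promoter, kmer_signal, k=8, default=0.0):
    """Sum PBM median intensities over every k-mer window in the promoter."""
    total = 0.0
    for i in range(len(promoter) - k + 1):
        kmer = promoter[i:i + k]
        # A k-mer and its reverse complement share one signal value.
        signal = kmer_signal.get(
            kmer, kmer_signal.get(reverse_complement(kmer), default)
        )
        total += signal
    return total


# Two entries from the Cbf1 table; every other 8-mer defaults to 0 in this toy.
signal_table = {"GTCACGTG": 108178, "CCACGTGA": 47900}

# Hypothetical short stretch of promoter containing one strong site.
score = promoter_binding_score("AGTCACGTGGG", signal_table)
print(score)   # 156078.0
```

Because every window contributes, weak sites add to the score instead of being discarded by a bound/not-bound cutoff.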
Goals of New Analysis Method

Combine binding probability with expression data
to predict TF function and condition specific
binding site usage
[Figure labels: target gene with PBM data (1-6); gene expression in Conditions A-D; TF function]
Goals of New Analysis Method
Consider all data rather than drawing
arbitrary cutoffs

Low affinity binding as well as minor
expression changes may be biologically
relevant

Tanay, 2006; Foat et al., 2006
[Figure: binding probability, annotated “?”]
CRACR
“Combination Rank-order Analysis of
Condition-specific Regulation”
Basics of CRACR Approach


Order genes by expression in condition of interest
Assign ranks based on PBM-derived binding probability for TF
[Figure: 11 example genes (YER130C, YAR029W, YGR087C, YAR014C, YAR003W, YAL003C, YAR018W, YAR044W, YGR088W, YGR043C, YPL054W) listed from most induced to most repressed, each labeled with its TF binding rank (1-11)]
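The two steps above can be sketched in a few lines. The gene names come from the slide, but the expression changes and binding scores are made-up numbers, and the function names are illustrative.

```python
def binding_ranks(binding_score):
    """Rank genes by binding score; rank 1 is the strongest predicted binding."""
    ordered = sorted(binding_score, key=binding_score.get, reverse=True)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def ranks_in_expression_order(expression_change, binding_score):
    """List (gene, binding rank) from most induced to most repressed."""
    ranks = binding_ranks(binding_score)
    genes = sorted(expression_change, key=expression_change.get, reverse=True)
    return [(gene, ranks[gene]) for gene in genes]


# Hypothetical values for three genes (not real measurements).
expression_change = {"YER130C": 2.1, "YAR029W": 0.4, "YPL054W": -1.8}
binding_score = {"YER130C": 50.0, "YAR029W": 900.0, "YPL054W": 120.0}

ordered = ranks_in_expression_order(expression_change, binding_score)
print(ordered)   # [('YER130C', 3), ('YAR029W', 1), ('YPL054W', 2)]
```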
Basics of Analysis Approach

Select:
similarly expressed foreground genes
background set
[Figure: the expression-ordered gene list with PBM p-value ranks, split into foreground and background genes]
Basics of Analysis Approach

Slide window along ordered expression
Calculate an area statistic for enrichment of PBM targets within each window vs. background

area = \frac{1}{B + F} \left[ \frac{\rho_B}{B} - \frac{\rho_F}{F} \right]

ρ = rank sum
F = foreground, B = background (set sizes when used alone, set labels as subscripts on ρ)
[Figure: the expression-ordered gene list, most induced to most repressed, with PBM p-value ranks and a sliding foreground window]
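A minimal sketch of the sliding-window calculation, assuming the area statistic is (1 / (B + F)) * (ρB / B - ρF / F) with rank 1 as the strongest binder, and assuming the background is every gene outside the current window. The slide does not specify how the background set is chosen, so that choice is illustrative.

```python
def area_statistic(foreground_ranks, background_ranks):
    """Area statistic from rank sums; positive when the foreground holds top ranks."""
    f, b = len(foreground_ranks), len(background_ranks)
    rho_f, rho_b = sum(foreground_ranks), sum(background_ranks)
    return (rho_b / b - rho_f / f) / (b + f)


def sliding_window_areas(ranks_in_expression_order, window):
    """Area statistic for each window along the expression-ordered gene list."""
    ranks = list(ranks_in_expression_order)
    areas = []
    for start in range(len(ranks) - window + 1):
        foreground = ranks[start:start + window]
        # Assumption: background = all genes outside the window.
        background = ranks[:start] + ranks[start + window:]
        areas.append(area_statistic(foreground, background))
    return areas


# Toy case: the best binders are also the most induced genes.
areas = sliding_window_areas([1, 2, 3, 4, 5, 6], window=3)
print(areas[0], areas[-1])   # 0.5 -0.5
```

The toy case reaches both extremes of the statistic: +0.5 when the window holds only the top-ranked binders and -0.5 when it holds only the bottom-ranked ones.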
Predicting TF Function


Plot area statistic (ranges -0.5 to 0.5) at each window
Determine condition significance by permutation test-derived threshold (gray line: p < 0.001)
Glucose added: Mig1 targets repressed
[Figure: Mig1 area statistic across genes ordered from induced to repressed; expression fold change scale from >8.0 to <-8; cartoon labels: glucose, Mig1, metabolism switch enzyme, mRNA]
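One way a permutation-derived threshold could be computed is sketched below. The slide does not describe the permutation scheme, so shuffling the binding ranks across genes and taking the largest absolute window statistic per shuffle are assumptions, as are the function names and parameter defaults.

```python
import random


def area_statistic(foreground_ranks, background_ranks):
    f, b = len(foreground_ranks), len(background_ranks)
    return (sum(background_ranks) / b - sum(foreground_ranks) / f) / (b + f)


def max_abs_window_area(ranks, window):
    """Largest |area statistic| over all windows of the ordered rank list."""
    best = 0.0
    for start in range(len(ranks) - window + 1):
        foreground = ranks[start:start + window]
        background = ranks[:start] + ranks[start + window:]
        best = max(best, abs(area_statistic(foreground, background)))
    return best


def permutation_threshold(ranks, window, n_perm=2000, alpha=0.001, seed=0):
    """Estimate the |area| value exceeded by chance in a fraction alpha of shuffles."""
    rng = random.Random(seed)
    shuffled = list(ranks)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # break any link between binding and expression
        null.append(max_abs_window_area(shuffled, window))
    null.sort()
    index = min(len(null) - 1, int((1 - alpha) * len(null)))
    return null[index]


# Toy example: 20 genes whose binding ranks follow expression order exactly.
ranks = list(range(1, 21))
threshold = permutation_threshold(ranks, window=5)
observed = max_abs_window_area(ranks, window=5)
print(observed >= threshold)   # True
```

A condition would then be called significant when its observed statistic crosses the threshold, which is the role of the gray line in the plot.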
Predicting TF Function

Determine which individual genes are repressed by Mig1
Group of genes repressed by Mig1
Glucose added: Mig1 targets repressed
[Figure: Mig1 area statistic across genes ordered from induced to repressed; Mig1 shown at YHR005C, YER130C, and YBL054W; expression fold change scale from >8.0 to <-8]
Prediction of General TF Function


Find all (of 1327) expression conditions where a TF is predicted to be active
Look for enrichment of general biological functions in this set
Selected Mcm1 significant conditions
Conditions for which there is significant enrichment of PBM targets:    Effect
Cell Cycle: Expression in response to Clb2p (set 1, 40 min)             induced
Expression during the cell cycle (alpha factor arrest and release)(16)  induced
Expression during the cell cycle (cdc15 arrest and release)(8)          induced
Expression during the cell cycle (cdc28)(7)                             induced
Expression in response to 50 nM alpha-factor: 120 min                   induced
Expression in ckb2 deletion mutant                                      induced
Expression in dig1, dig2 deletion mutant                                induced
Expression in swi6 (haploid) deletion mutant                            induced
Expression in tec1 (haploid) deletion mutant                            induced
Expression in yel044w deletion mutant                                   induced
Expression in sir2 deletion mutant                                      repressed
Expression in snf2 mutant cells in minimal medium                       repressed
Expression in response to 50 nM alpha-factor in bni1 mutant: 60 min     repressed
Prediction of General TF Function
Find all (of 1327) expression conditions where a TF is predicted to be active
Look for enrichment of general biological functions in this set
Selected Mcm1 significant conditions

Prediction: Mcm1 involved in cell cycle and mating
[Figure labels: alpha factor, “alpha” cell, “a” cell]
Prediction of TF function

After PBM experiments, CRACR has been
used to predict functions of 90 yeast TFs
(paper in process)
Binding Site Affinity Effects
[Figure: TF bound at high, medium, and low affinity sites (Genes 1-3) as TF concentration goes from low to medium to high; axis: binding affinity]
[TF] + [DNA] ⇌ [TF-DNA] (rate constants ka, kd)
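The concentration dependence can be illustrated with the standard equilibrium expression for a single site, occupancy = [TF] / ([TF] + KD) with KD = kd / ka. This is a textbook simplification rather than a model stated in the talk, and the concentrations and KD values below are arbitrary illustrative numbers in the same (unspecified) units.

```python
def fractional_occupancy(tf_concentration, k_dissociation):
    """Equilibrium occupancy of one site: [TF] / ([TF] + KD), with KD = kd / ka."""
    return tf_concentration / (tf_concentration + k_dissociation)


# Hypothetical KD values: a smaller KD means higher affinity.
sites = {"high affinity": 1.0, "medium affinity": 10.0, "low affinity": 100.0}

for tf in (1.0, 10.0, 100.0):   # low, medium, high TF concentration
    row = {name: round(fractional_occupancy(tf, kd), 2) for name, kd in sites.items()}
    print(tf, row)
```

At low TF concentration only the high affinity site is appreciably occupied; as the concentration rises, the medium and then the low affinity sites fill in.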
Demonstrating Effects of
Binding site affinity

Low vs. high affinity binding sites may have different
biological functions
Experimentally Validated
Expression after oxidative stress vs. Rap1 binding affinity (highest binding affinity to lowest binding affinity)
[Figure: ALD4, predicted conditional target: occupancy units (0 to 20) vs. time after diamide treatment (0, 20, 30 min); significance marks ***, **]
[Figure: MCR1, predicted conditional target: occupancy units (0 to 10) vs. time after diamide treatment (0, 20, 30 min); significance marks *, ***]
Goals:

Find DNA sequences bound by TFs: PBMs

Predict how TFs function in the cell: CRACR

Look for biophysical links between
TF structure and function

Use quantitative approaches to maintain a physically realistic view
of biology.
Reasons for Different
Functions: TF structure?

Goal: Consider biophysical TF structure instead of cartoon “TF blob”
[Figure labels: Mig1, cyc8, tup1]
TF Structure and Function

Are certain TFs structurally suited for
certain types of biological processes?

Case Study:
CST6 (bZIP)                          GAL4 (Zn2Cys6)
Lower Information Content Motif      Higher Information Content Motif
Regulatory hub; many target genes    More specific, fewer target genes
cell fate, cell cycle                metabolism of specific nutrients
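For readers unfamiliar with motif information content, a minimal sketch of the usual calculation is given here: each motif position contributes 2 bits minus its Shannon entropy, assuming a uniform background over A, C, G, T. The two example motifs are invented for illustration and are not the CST6 or GAL4 motifs.

```python
import math


def information_content(columns):
    """Total information content in bits, assuming a uniform ACGT background."""
    total = 0.0
    for column in columns:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy   # 2 bits is the maximum for a 4-letter alphabet
    return total


# Hypothetical motifs, each given as per-position base probabilities.
specific_motif = [{"A": 1.0}, {"C": 1.0}, {"G": 1.0}]
degenerate_motif = [
    {"A": 1.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"G": 0.5, "T": 0.5},
]

print(information_content(specific_motif))     # 6.0
print(information_content(degenerate_motif))   # 3.0
```

A lower information content motif matches more sequences by chance, which fits the slide's pairing of lower information content with many target genes.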
Future Directions


Completion of functional predictions and study of yeast gene regulation
Toward predictive model in humans

Experiments for understanding gene regulation rules
Acknowledgements
Martha Bulyk
Mike Berger
Anthony Philippakis
Cong Zhu
Kelsey Byers
Trevor Siggers
Vicky Zhou
Cherelle Walls
Jason Warner
Jaime Chapoy
Other Bulyk Lab Members
NSF graduate research fellowship
NIH/NHGRI R01
GO CATS!!
Advantages and Challenges of
Interdisciplinary Work



Insight gained by quantitative reasoning in biology, combining different perspectives
“Physicists and mathematicians choose projects in biology that are fun, but not necessarily important”
Important not to get caught up in what “counts” as “true biology” or “true physics”