Download My Slides - people.vcu.edu

Epigenetic Analysis
BIOS 691-803
Statistics for Systems Biology
Spring 2008
Kinds of Questions
• Where are the epigenetic modifications?
• How do they co-vary?
• How do epigenetic changes affect
expression of genes?
Covariation of Epigenetic Measures
• Motivating questions
– How are epigenetic modifications related?
– What are the major determinants of
epigenetic state?
• Statistical techniques
– Covariance calculation
– Principal component analysis
– Linear models
Location and Covariance
• Question: do epigenetic modifiers act on
specific targets or do they act on whole
regions of DNA?
• Direct experimental evidence contradictory
• Statistics may help:
– Covariation patterns may be evidence
CalcA in NCI60
• Calcitonin A gene
• Two CpG clusters
plus 3 odd CpG’s
• High correlation
within clusters
CDH1 in NCI 60
Covariation in Methylation of 7 Genes
• Individual genes have multiple CpG sites
• Most variation: overall methylation
Correlation Map of 108
CpG sites in 6 genes across
5 ECOG pilot samples
Red = 1
White = 0
Blue < 0
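A correlation map like the one described here can be sketched as a matrix of pairwise Pearson correlations between CpG sites. This is a minimal illustration only: the function names and the methylation values are hypothetical, not data from the NCI-60 or ECOG samples.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(sites):
    """sites: one methylation vector per CpG site (one value per sample)."""
    return [[pearson(a, b) for b in sites] for a in sites]

# Hypothetical methylation fractions: 3 CpG sites across 5 samples.
sites = [
    [0.10, 0.40, 0.35, 0.80, 0.90],  # site 1
    [0.15, 0.45, 0.30, 0.85, 0.95],  # site 2, same cluster as site 1
    [0.70, 0.20, 0.60, 0.30, 0.10],  # site 3, a different pattern
]
corr = correlation_matrix(sites)
for row in corr:
    print(" ".join(f"{v:6.2f}" for v in row))
```

Entries near 1 would be drawn red, entries near 0 white, and negative entries blue.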
Epigenomic Analysis
Methylation and Expression
• Single gene (E-cadherin) results suggest
overall methylation correlated with
expression
Methylation and Expression
• HELP assay gives genome-wide sampling
of methylation sites at 15K genes
• If we select genes with S/N > 2 in both
measures, then the methylation-expression
correlations for the associated genes are bimodal
What Causes Methylation?
• NCI-60 derived from various tissues
• Tissue characteristic profile + specific
history of cells
• Fit linear model to each methylation site
– 9 tissues for 60 observations
• 51 error df
• Overall 41% of variance attributable to
tissue
• What causes the remainder of methylation
differences?
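The per-site linear model with tissue as the only factor amounts to a one-way analysis of variance. The sketch below computes the fraction of variance attributable to the grouping and the error degrees of freedom (observations minus groups, so 60 - 9 = 51 here). The function name and the example values are hypothetical.

```python
from collections import defaultdict

def fraction_variance_explained(values, groups):
    """One-way linear model: share of total variance explained by group means.

    Returns (R^2, error degrees of freedom)."""
    grand = sum(values) / len(values)
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_resid = sum(
        (v - sum(vs) / len(vs)) ** 2 for vs in by_group.values() for v in vs
    )
    error_df = len(values) - len(by_group)
    return 1 - ss_resid / ss_total, error_df

# Hypothetical methylation values at one site: 6 cell lines from 3 tissues.
values = [0.1, 0.3, 0.5, 0.7, 0.8, 1.0]
groups = ["a", "a", "b", "b", "c", "c"]
r2, error_df = fraction_variance_explained(values, groups)
print(round(r2, 3), error_df)
```

Averaging the per-site R^2 over all sites gives an overall figure of the kind quoted on the slide.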
PCA for Cell-specific Factors
[Figure: scree plot of the principal component variances]
• Residual variance has one strong PC
• Remainder are ‘noise’
• 1st PC is almost constant
– Reflects overall level of methylation
– Is this an artifact or is it real?
– Significantly correlated with expression of
DNMT1 & DNMT3A
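A first principal component that is "almost constant" has loadings of similar size and the same sign at every site. The sketch below finds the leading component of a covariance matrix by power iteration, which is a stand-in for a full PCA routine and not necessarily the method used for the slides. The residual values are hypothetical.

```python
from math import sqrt

def covariance_matrix(rows):
    """rows: samples x sites. Returns the sites x sites sample covariance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def first_pc(cov, iters=500):
    """Leading eigenvalue and unit eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigval, v

# Hypothetical residuals (5 cell lines x 4 sites) after removing tissue effects:
# each cell line carries an overall shift plus small site-specific noise.
residuals = [
    [0.9, 1.1, 1.0, 0.8],
    [-1.0, -0.9, -1.1, -1.2],
    [0.4, 0.6, 0.5, 0.7],
    [-0.5, -0.4, -0.6, -0.3],
    [0.1, 0.0, 0.2, -0.1],
]
cov = covariance_matrix(residuals)
eigval, loadings = first_pc(cov)
share = eigval / sum(cov[i][i] for i in range(len(cov)))
print([round(x, 2) for x in loadings], round(share, 2))
```

The per-sample score on this component could then be correlated with expression of candidate genes such as DNMT1.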
Relations Between Epigenetic
Measures - III
Stem Cells & Cancer
Issue: Cancer Stem Cells?
• Hypothesis: cancers arise from stem cells rather
than differentiated epithelial cells
• How would you tell the difference between
partially differentiated stem cells and dedifferentiated epithelial cells?
• Proposal: compare characteristic epigenetic
modifications of stem cells with cancers
• Epigenetic modifications are distinct
– PRC2 (stem cells) vs methylation (cancer)
Statistical Methodology
• Test of association 2 x 2 table
                PRC2    not
  Methylated      34     43
  Not             97      3
• Fisher Exact p ~ 10^-5
• Alternatives
– T-test (predictor: PRC2)
– Linear model (predictor: methylation: T – N )
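The test of association on a 2 x 2 table can be sketched with a two-sided Fisher exact test built from the hypergeometric distribution. The counts are taken from the transcript as they appear, and the cell layout (rows: methylated or not; columns: PRC2 target or not) is an assumption, so the p-value printed here need not match the figure quoted on the slide. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )

# Counts as transcribed; the assignment of counts to cells is an assumption.
p = fisher_exact_two_sided(34, 43, 97, 3)
odds_ratio = (34 * 3) / (43 * 97)
print(p, odds_ratio)
```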
PRC2 – Methylation Association
Are CIMP’s Stem Cell Clones?
• Distinctive PRC2 sites appear
preferentially methylated in CIMP tumors
Correlations between epigenetic
and expression measures – I
Copy Number and Expression
• Large sections of DNA containing many
genes are often copied or deleted
• We think most control elements are copied
or deleted also
• If there are more (or fewer) copies of a gene, then
ceteris paribus there should be more
(or fewer) copies of its RNA
Integrative Studies of
CGH & Gene Expression
• Expect to see strong correlation between
copy number and expression in data
• Previous studies report weak effects
– Average correlations range from 0.04 to 0.27
• NCI 60 study average correlation 0.16
Why Not?
• H1: there really isn’t much effect – biology
– Somehow the cells are compensating
– In any case there shouldn’t be any effect on
non-expressed genes
• H2: we may not be able to measure the
effect that is there – technical error
– Probes may be insensitive/cross-hybridizing
– Signal/noise too low even when probes are
sensitive
Eliminating Uninformative Genes
• Genes which are silenced will not show
effect of copy number variation
– Mean signal a rough proxy
– Keep genes with mean signal above 6.3
• Only genes with significant copy number
variation (above measurement noise) will
show effect
– Select genes with SD of copy number > 0.5
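The two filters can be sketched as a single selection step. This assumes the expression threshold is meant to retain genes whose mean signal exceeds 6.3 (dropping silenced genes), that signals are on a log scale, and that the sample SD is the intended measure of copy number variation. The gene names and values are hypothetical.

```python
from statistics import mean, stdev

def select_informative(genes, min_mean_expr=6.3, min_cn_sd=0.5):
    """Keep genes that are expressed and vary in copy number.

    genes maps a gene name to (expression values, copy number values)."""
    return [
        name
        for name, (expr, cn) in genes.items()
        if mean(expr) > min_mean_expr and stdev(cn) > min_cn_sd
    ]

# Hypothetical measurements across 4 cell lines.
genes = {
    "geneA": ([8.1, 7.9, 8.4, 7.6], [-0.8, 0.1, 0.9, 0.0]),   # kept
    "geneB": ([4.0, 4.2, 3.9, 4.1], [-0.9, 0.2, 1.0, 0.1]),   # low signal
    "geneC": ([9.0, 9.2, 8.8, 9.1], [0.0, 0.1, -0.1, 0.05]),  # flat copy number
}
selected = select_informative(genes)
print(selected)
```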
Correlations of Selected Measures
Black: All correlations
Red: Reliably measured correlations
Estimating True Correlations
• If measurement noise of SD ~ 0.3
degrades expression measures, then
correlations of measures will be mostly
closer to 0 than true correlations of variables
• Given a correlation and measured
standard deviations, what are most likely
true standard deviations and true
correlation?
MLE of Noisy Correlations
• Noise can be estimated from replicates
• If N is large, can estimate $\hat{\rho} = r / \sqrt{(1 - e_1^2/s_1^2)(1 - e_2^2/s_2^2)}$
(r: measured correlation; e_i: noise SD; s_i: measured SD of variable i)
• SD of originals can be estimated by ML
• Given s and e, the MLE of correlation can be
inferred
• For NCI 60 median MLE correlation ~ 0.65
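For large N, the estimate can be sketched as the classical correction for attenuation: divide the measured correlation by the square root of the product of the two reliabilities, where each reliability is 1 - e^2/s^2 (e: noise SD, s: measured SD). Reading the slide's formula this way is an assumption, as are the function name and the numbers below. This is not the full maximum likelihood fit.

```python
from math import sqrt

def disattenuated_correlation(r, e1, s1, e2, s2):
    """Large-sample estimate of the true correlation from a noisy one.

    r       measured correlation
    e1, e2  noise SDs of the two measures (e.g. from replicates)
    s1, s2  measured SDs of the two measures (signal plus noise)"""
    rel1 = 1 - e1 ** 2 / s1 ** 2
    rel2 = 1 - e2 ** 2 / s2 ** 2
    if rel1 <= 0 or rel2 <= 0:
        raise ValueError("noise SD must be smaller than the measured SD")
    return r / sqrt(rel1 * rel2)

# Hypothetical values: noise SD 0.3 against a measured SD of 0.5 in both measures.
rho_hat = disattenuated_correlation(0.3, 0.3, 0.5, 0.3, 0.5)
print(round(rho_hat, 5))
```

With small samples this estimate can exceed 1 in absolute value, which is one reason to prefer a likelihood-based estimate.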
Correlations between epigenetic
and expression measures – II
Chromatin and Expression
Do Epigenetic Marks Regulate
Transcription?
• Several studies find only weak
evidence by correlation analysis
• Same technical issue: S/N ratio
• Questions
– Does methylation shut down most genes?
– Which histone marks indicate active
transcription?
Methylation and Expression
• HELP assay gives genome-wide sampling
of methylation sites at 15K genes
• Select genes with S/N > 2 in both
measures
• Correlations with gene expression values
are bi-modal
Interpretation of Meth-Expr Corrs
• MLE of negative mode ~ -0.8
• ~ 2/3 of genes under that hump
• Unclear whether positive hump is real or
an artifact of small sample size
• Possible explanations:
– True induction by methylation
• Methylation of insulator
– Irrelevant CpG site
Acetylation and Expression
• Histones often acetylated during
expression
• Histone 3 lysine 9 (H3K9) acetylation
measured
• Measures corrupted by noise
– Blue: S/N > 2.5
– Red: S/N > 2
– Black: S/N > 1.5
Biological Prediction
• H3K9 acetylation → gene expression
• Is this real?
– Experimental test: find genes with high
acetylation variance, and little expression
variance by microarray
• Results (7 genes)
• Confirm hypothesis
• Implies:
– Expression arrays are not sensitive