Epigenetic Analysis
BIOS 691-803 Statistics for Systems Biology
Spring 2008

Kinds of Questions
• Where are the epigenetic modifications?
• How do they co-vary?
• How do epigenetic changes affect gene expression?

Covariation of Epigenetic Measures
• Motivating questions
  – How are epigenetic modifications related?
  – What are the major determinants of epigenetic state?
• Statistical techniques
  – Covariance calculation
  – Principal component analysis
  – Linear models

Location and Covariance
• Question: do epigenetic modifiers act on specific targets, or on whole regions of DNA?
• Direct experimental evidence is contradictory
• Statistics may help:
  – Covariation patterns may provide evidence

CalcA in NCI-60
• Calcitonin A gene
• Two CpG clusters plus 3 isolated CpGs
• High correlation within clusters

CDH1 in NCI-60

Covariation in Methylation of 7 Genes
• Individual genes have multiple CpG sites
• Most of the variation is in overall methylation

[Figure: Correlation map of 108 CpG sites in 6 genes across 5 ECOG pilot samples. Red = 1, white = 0, blue < 0]

Epigenomic Analysis

Methylation and Expression
• Single-gene (E-cadherin) results suggest that overall methylation is correlated with expression

Methylation and Expression
• HELP assay gives genome-wide sampling of methylation sites at 15K genes
• If we select genes with S/N > 2 in both measures, the correlations between methylation sites and expression of their associated genes are bimodal

What Causes Methylation?
• NCI-60 lines are derived from various tissues
• Methylation reflects a tissue-characteristic profile plus the specific history of the cells
• Fit a linear model to each methylation site
  – 9 tissues for 60 observations
  – 51 error df
• Overall, 41% of variance is attributable to tissue
• What causes the remainder of the methylation differences?

PCA for Cell-specific Factors
[Figure: variances of the principal components of the residuals]
• Residual variance has one strong PC
• The remainder are 'noise'
• 1st PC is almost constant
  – Reflects overall level of methylation
  – Is this an artifact or is it real?
  – Significantly correlated with expression of DNMT1 & DNMT3A

Relations Between Epigenetic Measures - III
Stem Cells & Cancer

Issue: Cancer Stem Cells?
• Hypothesis: cancers arise from stem cells rather than from differentiated epithelial cells
• How would you tell partially differentiated stem cells apart from dedifferentiated epithelial cells?
• Proposal: compare the characteristic epigenetic modifications of stem cells with those of cancers
• The epigenetic modifications are distinct
  – PRC2 (stem cells) vs methylation (cancer)

Statistical Methodology
• Test of association: 2 x 2 table

                    PRC2   Not PRC2
  Methylated          34         43
  Not methylated      97          3

• Fisher exact p ~ 10^-5
• Alternatives
  – t-test (predictor: PRC2)
  – Linear model (predictor: methylation, T – N)

PRC2 – Methylation Association

Are CIMP Tumors Stem Cell Clones?
• Distinctive PRC2 sites appear preferentially methylated in CIMP tumors

Correlations between epigenetic and expression measures – I
Copy Number and Expression

Copy Number and Expression
• Large sections of DNA containing many genes are often copied or deleted
• We think most control elements are copied or deleted along with them
• If there are more (or fewer) copies of a gene then, ceteris paribus, there should be more (fewer) copies of its RNA

Integrative Studies of CGH & Gene Expression
• Expect to see a strong correlation between copy number and expression in the data
• Previous studies report weak effects
  – Average correlations from 0.04 to 0.27
• NCI-60 study average correlation 0.16

Why Not?
• H1: there really isn't much effect (biology)
  – Somehow the cells are compensating
  – In any case, there shouldn't be any effect on non-expressed genes
• H2: we may not be able to measure the effect that is there (technical error)
  – Probes may be insensitive or cross-hybridizing
  – Signal/noise is too low even when probes are sensitive

Eliminating Uninformative Genes
• Genes that are silenced will not show the effect of copy number variation
  – Mean signal is a rough proxy for expression
  – Keep only genes with mean signal above 6.3
• Only genes with significant copy number variation (above measurement noise) will show the effect
  – Select genes with SD of copy number > 0.5

Correlations of Selected Measures
[Figure: black = all correlations; red = reliably measured correlations]

Estimating True Correlations
• If measurement noise with SD ~ 0.3 degrades the expression measures, then the correlations of the measures will mostly be closer to 0 than the true correlations of the underlying variables
• Given an observed correlation and the measured standard deviations, what are the most likely true standard deviations and true correlation?

MLE of Noisy Correlations
• Noise can be estimated from replicates
• If N is large, the true correlation can be estimated as
  $\hat{\rho} = \dfrac{r}{\sqrt{(1 - e_1^2/s_1^2)(1 - e_2^2/s_2^2)}}$
  where r is the observed correlation, s_1 and s_2 are the SDs of the measured values, and e_1 and e_2 are the noise SDs
• SDs of the original (noise-free) variables can be estimated by ML
• Given s and e, the MLE of the correlation can be inferred
• For NCI-60, the median MLE correlation is ~ 0.65

Correlations between epigenetic and expression measures – II
Chromatin and Expression

Do Epigenetic Marks Regulate Transcription?
• Several studies find only weak evidence from correlation analysis
• Same technical issue: S/N ratio
• Questions
  – Does methylation shut down most genes?
  – Which histone marks indicate active transcription?
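
Referring back to the "What Causes Methylation?" and "PCA for Cell-specific Factors" slides: a minimal pure-Python sketch of fitting a tissue-only linear model to one methylation site (share of variance explained, error df, residuals) and then extracting the leading principal component of the residuals by power iteration. The data layout (one list of values per CpG site, string tissue labels) and the function names are hypothetical; the slides do not say how the fit or the PCA was actually computed.

```python
# Sketch under assumed data layout (hypothetical, not the course's code):
# each CpG site is a list of methylation values, one per cell line,
# and tissue labels are strings in the same order.
from collections import defaultdict
import math


def tissue_fit(values, tissues):
    """One-way linear model y = tissue mean + error for a single site.

    Returns (fraction of variance explained by tissue, error df, residuals).
    """
    groups = defaultdict(list)
    for y, t in zip(values, tissues):
        groups[t].append(y)
    means = {t: sum(ys) / len(ys) for t, ys in groups.items()}
    grand = sum(values) / len(values)
    residuals = [y - means[t] for y, t in zip(values, tissues)]
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_error = sum(r * r for r in residuals)
    r2 = 1.0 - ss_error / ss_total if ss_total > 0 else 0.0
    error_df = len(values) - len(groups)  # e.g. 60 cell lines - 9 tissues = 51
    return r2, error_df, residuals


def first_pc(residual_rows, iters=200):
    """Leading PC of the site-by-site covariance of residuals (power iteration).

    residual_rows: one residual vector per site, all of equal length (>= 2).
    Returns (unit loading vector over sites, share of total residual variance).
    Starting from all-ones suits a PC whose loadings share one sign, such as
    an overall-methylation component.
    """
    n_sites, n_obs = len(residual_rows), len(residual_rows[0])
    centered = []
    for row in residual_rows:
        m = sum(row) / n_obs
        centered.append([x - m for x in row])
    cov = [[sum(a * b for a, b in zip(ri, rj)) / (n_obs - 1) for rj in centered]
           for ri in centered]
    v = [1.0] * n_sites
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v is the eigenvalue for the converged unit vector v
    eigval = sum(v[i] * cov[i][j] * v[j]
                 for i in range(n_sites) for j in range(n_sites))
    total = sum(cov[i][i] for i in range(n_sites))
    return v, eigval / total
```

In the NCI-60 setting, `tissue_fit` would be run once per site with 60 values and 9 tissue labels (hence 51 error df), and `first_pc` on the collected residuals; a near-constant loading vector is what an overall-methylation component would look like. The slides do not say how the per-site results were pooled into the overall 41%.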
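
The Fisher exact test on the PRC2-by-methylation 2 x 2 table (Statistical Methodology slide) reduces to a hypergeometric calculation. This sketch implements the usual two-sided rule (sum the probabilities of all tables with the same margins that are no more likely than the observed one), which is also what `scipy.stats.fisher_exact` computes by default; the function name and the tie tolerance are illustrative choices. The counts are entered in the row/column order as transcribed from the slide.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left count is hypergeometric; the p-value
    sums the probabilities of every table no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # P(top-left cell == x | fixed margins)
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # small relative tolerance so tables tied with the observed one are counted
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1))
                        if p <= p_obs * (1 + 1e-7)))


# Rows: methylated / not methylated; columns: PRC2 / not PRC2 (as transcribed)
p_prc2 = fisher_exact_two_sided(34, 43, 97, 3)
```

Conditioning on the margins is what makes the test exact for small cells such as the count of 3 here, where a chi-square approximation would be shaky.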
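
For the copy-number slides ("Eliminating Uninformative Genes", "MLE of Noisy Correlations"), and the same S/N issue raised for chromatin marks, a minimal sketch: a gene filter using the slide thresholds (mean expression signal above 6.3 as a proxy for not being silenced, copy-number SD above 0.5) and the large-N attenuation correction, which divides the observed correlation r by √((1 − e₁²/s₁²)(1 − e₂²/s₂²)), with s the SD of the measured values and e the noise SD from replicates. This is the plug-in formula only, not the full maximum-likelihood fit the slides describe; the function names and the clipping to [−1, 1] are assumptions.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sd(x):
    """Sample standard deviation (n - 1 denominator)."""
    n = len(x)
    m = sum(x) / n
    return math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))


def informative(expr, copy, min_mean=6.3, min_cn_sd=0.5):
    """Keep a gene if it looks expressed and its copy number actually varies."""
    return sum(expr) / len(expr) > min_mean and sd(copy) > min_cn_sd


def disattenuate(r, s1, e1, s2, e2):
    """Correct an observed correlation r for measurement noise (large-N form).

    s1, s2: SDs of the measured values; e1, e2: noise SDs (e.g. from replicates).
    """
    reliability = (1 - e1 ** 2 / s1 ** 2) * (1 - e2 ** 2 / s2 ** 2)
    if reliability <= 0:
        raise ValueError("noise SD must be smaller than the measured SD")
    # clip: sampling error can push the corrected value outside [-1, 1]
    return max(-1.0, min(1.0, r / math.sqrt(reliability)))
```

As a worked case, with noise SD 0.6 against a measured SD of 1.0 for both variables, an observed r of 0.16 corrects to 0.16 / 0.64 = 0.25; how much the correction inflates r depends entirely on the noise-to-signal ratios.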

Methylation and Expression
• HELP assay gives genome-wide sampling of methylation sites at 15K genes
• Select genes with S/N > 2 in both measures
• Correlations with gene expression values are bimodal

Interpretation of Meth-Expr Corrs
• MLE of the negative mode ~ -0.8
• ~2/3 of genes fall under that hump
• Unclear whether the positive hump is real or an artifact of small sample size
• Possible explanations:
  – True induction by methylation
    • Methylation of an insulator
  – Irrelevant CpG site

Acetylation and Expression
• Histones are often acetylated during expression
• Histone 3 lysine 9 (H3K9) acetylation was measured
• Measures are corrupted by noise
  – Blue: S/N > 2.5
  – Red: S/N > 2
  – Black: S/N > 1.5

Biological Prediction
• H3K9 acetylation → gene expression
• Is this real?
  – Experimental test: find genes with high acetylation variance and little expression variance by microarray
• Results (7 genes) confirm the hypothesis
• Implies:
  – Expression arrays are not sensitive enough
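
For the S/N-filtered methylation-expression and H3K9 acetylation-expression slides, a minimal sketch of the filtering step and of inspecting the correlation distribution for modes. The slides do not define S/N; here it is assumed to be a measure's SD across samples divided by its replicate noise SD. The tuple layout, bin width, and function names are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def signal_to_noise(values, noise_sd):
    """Assumed S/N: SD across samples divided by the replicate noise SD."""
    n = len(values)
    m = sum(values) / n
    spread = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return spread / noise_sd


def filtered_correlations(genes, threshold):
    """Correlations for genes whose S/N exceeds `threshold` in both measures.

    genes: iterable of (mark_values, mark_noise_sd, expr_values, expr_noise_sd),
    where the mark is, e.g., methylation or H3K9 acetylation.
    """
    return [pearson(mark, expr)
            for mark, e_mark, expr, e_expr in genes
            if signal_to_noise(mark, e_mark) > threshold
            and signal_to_noise(expr, e_expr) > threshold]


def histogram(corrs, width=0.25):
    """Count correlations per bin (keyed by the bin's lower edge)."""
    return Counter(math.floor(r / width) * width for r in corrs)


def fraction_negative(corrs):
    """Share of retained genes with a negative correlation (rough hump size)."""
    return sum(r < 0 for r in corrs) / len(corrs) if corrs else float("nan")
```

Calling `filtered_correlations` with thresholds 2.5, 2.0 and 1.5 in turn mirrors the blue, red and black curves on the acetylation slide; lowering the threshold admits noisier genes, whose correlations are pulled toward 0.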