Chromatin state dynamics in nine human cell types elucidate regulators and disease-associated SNPs
Jason Ernst
Joint work with Pouya Kheradpour, Luke Ward, Brad Bernstein and Manolis Kellis

Goal: interpreting disease-associated variants using epigenomics
[Figure: allele pair CATGACTG / CATGCCTG; panels: Epigenomics, Disease variants]
• GWAS implicate hundreds of non-coding loci in disease
• Challenges towards interpreting disease variants:
– Find ‘true’ causative SNP among many in Linkage Disequilibrium
– Determine type of function: especially outside protein-coding
– Reveal relevant cell type of activity
– Link to upstream regulators and downstream target genes
Epigenomics tools to address these challenges

From chromatin states to disease
• Chromatin State Introduction
• Chromatin State Dynamics across Cell Types
• Reveal enhancer networks: TF → enhancer → target
• Use these to study disease-associated variants

Challenge of data integration in many marks/cells
[Figure: construct antibodies, pull down chromatin, ChIP-seq tracks]
Epigenomic information retains genome ‘state’ in differentiation and development
Two types: DNA methylation,
Histone marks
[Figure labels: Histone tail modifications; DNA packaged into chromatin around histone proteins]
• Dozens of chromatin tracks
• Understand their function
• Reveal their combinations
• Annotate systematically
• Common chromatin states
• Explicitly model combinations
• Unsupervised approach, probabilistic model

From ‘chromatin marks’ to ‘chromatin states’
[Figure: state classes: Promoter states; Transcribed states; Active Intergenic; Repressed]
• Learn de novo significant combinations of chromatin marks
• Reveal functional elements, even without looking at sequence
• Use for genome annotation
• Use for studying regulation dynamics in different cell types
Ernst and Kellis, Nat Biotech 2010

ENCODE: Study nine marks in nine human cell lines
9 human cell types x 9 marks = 81 chromatin tracks (2^81 combinations)
Cell types:
  HUVEC     Umbilical vein endothelial
  NHEK      Keratinocytes
  GM12878   Lymphoblastoid
  K562      Myelogenous leukemia
  HepG2     Liver carcinoma
  NHLF      Normal human lung fibroblast
  HMEC      Mammary epithelial cell
  HSMM      Skeletal muscle myoblasts
  H1        Embryonic
Marks: H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H4K20me1, H3K36me3, CTCF (+WCE, +RNA)
Brad Bernstein Chromatin Group
Ernst et al, Nature 2011

Chromatin states dynamics across nine cell types
• Single annotation track for each cell type
• Summarize cell-type activity at a glance
• Can study 9-cell activity pattern across

Introducing multi-cell activity profiles
[Figure: activity profiles across HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1. Panels: Gene expression; Chromatin States; Active TF motif enrichment; TF regulator expression; Dip-aligned motif biases]
Legend: ON / OFF; Active enhancer / Repressed; Motif enrichment / Motif depletion; TF On / TF Off;
Motif aligned / Flat profile

Linking Distal Regulatory Elements to Genes
Which gene(s) is this active enhancer in HMEC likely regulating?
[Figure: HMEC chromatin state shown with per-cell-type values of IRF6 expression, C1orf107 expression and H3K27ac signal]
Compute correlations between gene expression levels and enhancer-associated histone modification signals
[Figure: the same profiles shown next to random gene expression and random H3K27ac signal; real vs. randomized]
Compare correlations between enhancer and gene expression between real and randomized data
Combine intensity signal from all marks: train logistic regression classifier to discriminate real from random correlations, conditioned on state, TSS distance, cell type

Enhancer-gene links supported by eQTL-gene links
[Figure: eQTL study; sequence variant at distal position (15kb) compared with expression level of gene across individuals]
  Individual   Genotype   Expression level of gene
  Indiv. 1     A          -0.5
  Indiv. 2     A          -1.5
  Indiv. 3     A          -1.8
  Indiv. 4     C           3.1
  Indiv. 5     A           1.1
  Indiv. 6     A          -1.8
  Indiv. 7     A          -1.4
  Indiv. 8     C           3.2
  Indiv. 9     C           4.4
  …            …          …
Validation rationale:
• Expression Quantitative Trait Loci (eQTLs) provide independent SNP-to-gene links
• Do they agree with activity-based links?
Example: Lymphoblastoid (GM) cells study
• Expression/genotype across 60 individuals (Montgomery et al, Nature 2010)
• 120 eQTLs are eligible for enhancer-gene linking based on our datasets
• 51 actually linked (43%) using predictions: 4-fold enrichment (10% expected by chance)
• Independent validation of links.
• Relevance to disease datasets.

Coordinated activity reveals activators/repressors
[Figure: Enhancer activity; Gene activity; Predicted regulators; Activity signatures for each TF]
• Enhancer networks: Regulator → enhancer → target gene
• Ex1: Oct4 predicted activator of embryonic stem (ES) cells
• Ex2: Gfi1 repressor of K562/GM cells

Causal motifs supported by dips & enhancer assays
• Dip evidence of TF binding (nucleosome displacement)
• Enhancer activity halved by single-motif disruption
Motifs bound by TF, contribute to enhancers

Revisiting disease-associated variants
[Figure: GWAS studies shown: (Ganesh et al, Nat Genet 2009) (Teslovich et al, Nature 2010) (Stahl et al, Nat Genet 2010) (Liu et al, Nat Genet 2010) (Han et al, Nat Genet 2009) (Kathiresan et al, Nat Genet 2008) (Kamatani et al, Nat Genet 2009) (Soranzo et al, Nat Genet 2009) (Houlston et al, Nat Genet 2008) (Newton-Cheh et al, Nat Genet 2009); labelled SNP: rs9271100]
• Disease-associated SNPs enriched for enhancers in relevant cell type

Ex1: Systemic lupus erythematosus SNP: Ets-1 motif
• SNP in lymphoblastoid GM enhancer state
• Disrupts Ets-1 motif instance, predicted GM regulator
Model: Disease SNP abolishes GM enhancer

Ets-1 is a predicted activator of GM enhancers
[Figure: Enhancer activity; Activity signatures for each TF]
• Ets expression, Ets-1 motif enrichment in enhancers
Model: Ets-1
disruption would abolish enhancer state

Ex2: Erythrocyte phenotype study SNP: Gfi-1 motif
K562: erythroleukaemia cell type
• Disease SNP creates motif instance for Gfi-1 repressor
• Gfi-1 predicted repressor for K562-specific enhancers
Creation of repressive motif abolishes K562 enhancer

Gfi-1 is a predicted repressor of non-K562 enhancers
[Figure: Enhancer activity; Activity signatures for each TF]
• Gfi expression, Gfi-1 motif depletion in enhancers
• Prediction: Gfi-1 large-scale repression of non-K562
Motif created → Gfi-1 recruited → enhancer repressed

SNPs from GWAS Enrich for Cell Type Specific Strong Enhancer Chromatin States in Biologically Relevant Cell Types
  Title | Author/Journal | Total #SNPs | # SNPs in Strong enhancers | Cell Type | Fold | FDR
  Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. | Ganesh et al, Nat Genet 2009 | 35 | 9 | K562 | 17 | 0.02
  Biological, clinical and population relevance of 95 loci for blood lipids | Teslovich et al, Nature 2010 | 101 | 13 | HepG2 | 11 | 0.02
  Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci | Stahl et al, Nat Genet 2010 | 29 | 7 | GM12878 | 15 | 0.03
  Genome-wide meta-analyses identify three loci associated with primary biliary cirrhosis | Liu et al, Nat Genet 2010 | 6 | 4 | GM12878 | 41 | 0.03
  Genome-wide association study in a Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus. | Han et al, Nat Genet 2009 | 18 | 6 | GM12878 | 21 | 0.03
  Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. | Kathiresan et al, Nat Genet 2008 | 18 | 5 | HepG2 | 24 | 0.03
  Genome-wide association study of hematological and biochemical traits in a Japanese population | Kamatani et al, Nat Genet 2009 | 39 | 7 | K562 | 12 | 0.03
  A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. | Soranzo et al, Nat Genet 2009 | 28 | 6 | K562 | 15 | 0.03
  Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. | Houlston et al, Nat Genet 2008 | 4 | 3 | HepG2 | 66 | 0.03
  Genome-wide association study identifies eight loci associated with blood pressure. | Newton-Cheh et al, Nat Genet 2009 | 9 | 4 | K562 | 30 | 0.04

Chromatin state dynamics: Contributions summary
• Chromatin states capture mark combinations
– Reveal promoter/enhancer/insulator/transcribed regions
• Chromatin states capture chromatin dynamics
– Single annotation track for each cell type
– One 15-state track per cell type instead of 2^9 combinations
• Activity profiles capture correlated changes
– Gene expression vs. chromatin: Enhancer → Gene links
– Motifs vs. TF expr vs. chromatin: Activators/Repressors
• Regulatory predictions validated: eQTLs/dips/luciferase
– eQTLs: links. Dips: binding. Luciferase assays: motif role
• Interpret disease-associated variants
– Intergenic SNPs enriched for cell-type specific enhancers
– Mechanistic predictions reveal potential drug targets

Ever-expanding dimensions of epigenomics
[Figure: additional dimensions: Environment, Genotype, Disease, Gender, Stage, Age, Chromatin marks, Cell types; thousands of whole-genome datasets]
• Today: Cell-type and chromatin-mark dimensions
• Next: Personal epigenomes: genotype/phenotype
• Complete matrix of conditions, individuals, alleles

Collaborators and Acknowledgements
Broad Institute/MGH Pathology/HHMI:
• Tarjei Mikkelsen
• Noam Shoresh
• Charles B. Epstein
• Xiaolan Zhang
• Li Wang
• Robyn Issner
• Michael Coyne
• Manching Ku
• Timothy Durham
• Bradley E. Bernstein
MIT compbio group:
• Pouya Kheradpour
• Lucas Ward
• Manolis Kellis
ENCODE consortium
Funding
• NHGRI, NIH, NSF, HHMI, Sloan Foundation
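
Worked check of the eQTL validation numbers quoted earlier in the talk (51 of 120 eligible eQTLs linked, 10% expected by chance). This is a minimal sketch and not code from the talk: it assumes fold enrichment means the observed linked fraction divided by the fraction expected by chance, and the function name fold_enrichment is hypothetical.

```python
def fold_enrichment(n_hits: int, n_total: int, expected_fraction: float) -> float:
    """Observed hit fraction divided by the fraction expected by chance.

    Assumed definition for illustration; the talk does not spell out its formula.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 < expected_fraction <= 1:
        raise ValueError("expected_fraction must be in (0, 1]")
    return (n_hits / n_total) / expected_fraction


# Numbers taken from the eQTL validation slide (GM12878 lymphoblastoid cells).
n_linked = 51            # eQTLs whose SNP-gene pair is also a predicted enhancer-gene link
n_eligible = 120         # eQTLs eligible for enhancer-gene linking
chance_fraction = 0.10   # fraction expected to be linked by chance

observed_fraction = n_linked / n_eligible
fold = fold_enrichment(n_linked, n_eligible, chance_fraction)

print(f"observed fraction linked: {observed_fraction:.3f}")
print(f"fold enrichment over chance: {fold:.2f}")
```

Under this definition the observed fraction is 0.425, which the slide rounds to 43%, and the ratio to the 10% chance rate is about 4.25, which the slide reports as 4-fold.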