Disease epigenomics: Interpreting non-coding variants using chromatin and activity signatures
Jason Ernst
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory

Challenge: interpreting disease-associated variants
[Figure: a disease-associated variant (SNP/CNV/…), shown as CATGACTG vs. CATGCCTG, surrounded by sources of evidence: gene annotation (coding, 5’/3’UTR, RNAs), non-coding annotation, chromatin signatures, activator/repressor signatures, roles in gene/chromatin regulation, evolutionary signatures, signatures of selection (sp/pop), other evidence of function]
• GWAS, case-control, … reveal disease-associated variants
  – Molecular mechanism, cell-type specificity, drug targets
• Challenges towards interpreting disease variants
  – Find ‘true’ causative SNP among many candidates in LD
  – Use ‘causal’ variant: predict function, pathway, drug targets
  – Non-coding variant: type of function, cell type of activity
  – Regulatory variant: upstream regulators, downstream targets
• This talk: genomics tools for addressing these challenges

The good news: ever-expanding dimensions
[Figure: each point represents a genome-wide dataset, plotted along chromatin-mark and cell-type axes. Additional dimensions: environment, genotype, disease, gender, stage, age]
• Now: Cell-type and chromatin-mark dimensions
• Next: References for each background
• All clearly needed, and increasingly available

Challenge of data integration in many marks/cells
• Difficulty of interpreting an increasing number of tracks
• Challenge: simplify
  – Learn combinations
  – Interpret function
  – Prioritize marks
  – Study dynamics

Epigenomic information retains genome ‘state’ in differentiation and development
[Figure: DNA packaged into chromatin around histone proteins; genome-wide modification maps. Two types: DNA methylation and histone marks. Hundreds of histone tail modifications already known]
• Epigenetic modifications
• DNA/histone/nucleosome
• Encode epigenetic state
• Histone code hypothesis
• Distinct function for distinct combinations of marks?
• Hundreds of histone marks
• Astronomical number of histone mark combinations
• How do we find biologically relevant ones?
• Unsupervised approach
• Probabilistic model
• Explicit combinatorics

Genomic tools for disease SNP interpretation
• Chromatin states: regulatory region annotation
  – Combinatorial patterns of marks define chromatin states
  – Distinct classes of prom/enh/transcr/repres’d/repetitive
  – Reveal new genes, lincRNAs, enhancers, GWAS/SNP
• Activity signatures: linking enhancer networks
  – Correlated changes in expression, chromatin, motifs
  – Link TFs to enhancers and enhancers to targets
  – Predict causal cell-type specific activators/repressors
• Interpreting disease variants
  – Predicting SNP chromatin states and cell-type specificity
  – Specific mechanistic predictions for disease SNPs
  – Measuring selective pressures within human populations

ChromHMM: learning ‘hidden’ chromatin states
[Figure: DNA divided into 200bp intervals spanning an enhancer, a transcription start site, and a transcribed region. One row shows the observed chromatin marks (K4me1, K27ac, K4me3, K36me3), called based on a Poisson distribution; another shows the most likely hidden state (numbered 1 to 6) for each interval. A side panel lists the high-probability chromatin marks in each state, with emission probabilities of about 0.7 to 0.9]
• All probabilities are learned de novo from chromatin data alone (Baum-Welch, a.k.a. EM)
• Each state: vector of emissions, vector of transitions
Ernst and Kellis, Nature Biotech 2010

Chromatin states for genome annotation
[Figure: state groups: promoter states, transcribed states, active intergenic, repressed]
• Learn de novo significant combinations of chromatin marks
• Reveal functional elements, even without looking at sequence
• Use for genome annotation
• Use for studying regulation dynamics in different cell types

Emerging large-scale genomic/epigenomic datasets
• Multiple cell types
• Diverse experiments
• Developmental time-course
• Reference Epigenome Mapping Centers
• Used to study many disease epigenomes

ENCODE Chromatin Group (PI: Bernstein)
9 human cell types x 9 chromatin marks + WCE
Cell types:
• HUVEC: umbilical vein endothelial
• NHEK: keratinocytes
• GM12878: lymphoblastoid
• K562: myelogenous leukemia
• HepG2: liver carcinoma
• NHLF: normal human lung fibroblast
• HMEC: mammary epithelial cell
• HSMM: skeletal muscle myoblasts
• H1: embryonic
Chromatin marks:
• H3K4me1
• H3K4me2
• H3K4me3
• H3K27ac
• H3K9ac
• H3K27me3
• H4K20me1
• H3K36me3
• CTCF
Plus: WCE, RNA

15-state model learned jointly
[Figure: state classes (promoter, enhancer, insulator, transcribed, repressed, repetitive) learned across HUVEC, NHEK, … H1]
• Cell type concatenation approach
  – Ensures common emission parameters
  – Verified with independent learning

Chromatin states capture coordinated mark changes
• State definitions are cell-type invariant
  – Same combinations consistently found
• State locations are cell-type specific
  – Can study pair-wise or multi-way changes

Chromatin states correlation with gene expression
[Figure: chromatin states from -50kb to +50kb around the TSS, for genes ordered from lower to higher expression]

Pair-wise changes reveal cell-type specific functions
• Gene functional enrichments match cell function
• Distinguish On, Off, and Poised promoter states

Introducing multi-cell activity profiles
[Figure: profiles across HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, and H1 for five data types. Gene expression: on/off. Chromatin states: active enhancer/repressed. Active TF motif enrichment: motif enrichment/motif depletion. TF regulator expression: TF on/TF off. Dip-aligned motif biases: motif aligned/flat profile]

Enhancer vs. promoter dynamics
• Promoters typically active in many cells
• Enhancers exquisitely cell-type specific

Linking candidate enhancers to correlated target genes
[Figure: candidate enhancer 10kb from TM4SF1]
Search for coherent changes between:
• gene expression
• chromatin marks at distant loci (10kb)
Combine two vectors:
1. Expression vector for each gene
2. Vector of mark intensities at the distal locus (combine marks based on enhancer emissions)
3. High correlation: enhancer/target link

Predictive power of distal enhancer regions
[Figure: mark intensity correlation with expression for individual regions, sorted by rank; series: 10kb upstream, 100kb upstream, 10kb/100kb controls]
• At least 100 regions with >80% correlation

Coordinated activity reveals enhancer links
[Figure: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
• Distal enhancers are hard to integrate in regulatory models
• Linked to target genes based on coordinated activity
• Linked to upstream regulators using TF expression and motifs

Nucleosome positioning footprints support transcription factor cell-type predictions
[Figure: tag enrichment for H3K27ac]

Enhancer annotation revisits disease SNPs
• Previously unlinked phenotypes enriched for cell-type specific enhancers
Application 1: Pinpoint disease SNPs in enhancers
• Much smaller fraction of genome considered
• Strong enhancers 1.9%, weak 2.8%, promoter 1.4%
Application 2: Make much more precise predictions
Use:
• Cell-type specificity of chromatin states
• Predicted activators/repressors of these states
• Predicted motif instances across the genome

Ex1: Systemic lupus erythematosus intergenic SNP
• SNP in lymphoblastoid GM-specific enhancer state
• Disrupts Ets-1 motif instance, predicted GM regulator
Model: Disease SNP abolishes GM-specific enhancer

Ets-1 is a predicted activator of GM/HUVEC enhancers
[Figure: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
• Enhancer class specific to GM and HUVEC cell types
• Ets expression coincides with Ets-1 motif enrichment in enhancers
Model: Ets-1 disruption would abolish enhancer state

Ex2: Erythrocyte phenotype study intronic SNP
K562: erythroleukaemia cell type
• Disease SNP creates motif instance for Gfi-1 repressor
• Gfi-1 predicted repressor for K562-specific enhancers
Model: Creation of repressive motif abolishes K562 enhancer

Gfi-1 is a predicted repressor of non-K562 enhancers
[Figure: enhancer activity, gene activity, predicted regulators, activity signatures for each TF]
• Gfi expression coincides with Gfi-1 motif depletion in enhancers
• Prediction: Gfi-1 large-scale repression of non-K562
Model: Motif created; Gfi-1 recruited; enhancer repressed

More generally: eQTLs in specific chromatin states
Dixon 2007: All eQTLs, lymphoblasts, 400 ind.
Schadt 2008: Trans eQTLs, liver cells, 427 ind.
• Nucleotide-resolution genome-wide expression predictors
• Strong enrichment for promoter and enhancer states
• Trans-eQTLs select for cell-type specific enhancers
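
The enrichment statements in the talk (disease SNPs and eQTLs concentrating in enhancer and promoter states that together cover only a few percent of the genome) can be made concrete with a small calculation. The sketch below is not taken from the talk. It assumes a simple binomial null model in which each SNP lands in a state with probability equal to that state's genome coverage. The coverage fractions are the ones quoted on the slide; the SNP counts and the function names `fold_enrichment` and `binomial_tail` are hypothetical and used for illustration only.

```python
from math import comb

# Genome coverage per state, as quoted on the slide (fractions of the genome).
GENOME_FRACTION = {
    "strong_enhancer": 0.019,
    "weak_enhancer": 0.028,
    "promoter": 0.014,
}


def fold_enrichment(hits, total, genome_fraction):
    """Observed fraction of SNPs in a state divided by the chance expectation."""
    return (hits / total) / genome_fraction


def binomial_tail(hits, total, p):
    """P(X >= hits) for X ~ Binomial(total, p).

    Under the assumed null, this is the probability of seeing at least
    `hits` SNPs in a state that covers a fraction `p` of the genome.
    """
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(hits, total + 1)
    )


# Hypothetical counts, for illustration only: 12 of 200 SNPs in strong enhancers.
hits, total = 12, 200
p = GENOME_FRACTION["strong_enhancer"]

fold = fold_enrichment(hits, total, p)
p_value = binomial_tail(hits, total, p)
print(f"fold enrichment = {fold:.2f}, binomial p-value = {p_value:.2e}")
```

A real analysis would also need to account for linkage disequilibrium between SNPs and for the uneven genomic distribution of genotyped variants, both of which the binomial null ignores.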