Download Genetics Journal Club - Perelman School of Medicine at the

Genetics Journal Club Sylvia 03/26/15 GWAS SNP to target gene(s) association  eQTL  ASE  Chromosome Conformation Interactions (4C), using the Fto (purple) or Irx3 (blue) promoter as a viewpoint. The locus is displayed around the inner circle, the interaction are displayed as lines (darker lines symbolizing greater significance), and interactions above background are shown on the outer circles. (Figure 1a, taken from Smemo et al). Transcriptional Regulation of Gene Expression Chromosome Conformation Capture Capture-C “one-versus-some” 2002 “one-versus-all” 2006 “many-versus-many” 2006 2014 Hi-C “all-versus-all” 2009 What? Hi-C CTCF previously published histone mods and MethylC public DNAse public WGS RNA-Seq Why? Describe higher order chromatin organization during lineage specification TeSR™-E8™ H1 human embryonic stem cell line The trophoblast “Among the most studied of all hESC lines, H1 was derived by James Thomson, director of regenerative biology at the Morgridge Institute for Research and professor of anatomy at UWMadison, during his 1998 breakthrough discovery of these unique and promising cells. H1 is the first of the formerly approved pre-2001 Bush era lines to meet the new NIH guidelines for stem cell research.” forms the outer layer of the blastocyst, which provide nutrients to the embryo and develop into a large part of the placenta blastocyst Mesoderm and Mesenchymal Stem Cells NPCs and Neurons (Ectoderm) IMR90: human fetal lung fibroblast cell line H1 differentiation TB: 5d E8 minus FGF2 with 50ng/ml BMP4 ME: 2d E8 with 5ng/ml BMP4 and 25ng/ml Activin A MSC: 6d M-SFEM containing 50% StemLine™ II serum-free HSC expansion medium (HSFEM; Sigma), 50% ESFM, GlutaMAX™ (1/100 dilution), Ex-Cyte® supplement (1/2000 dilution), 100 μM MTG, and 10 ng/ml FGF2. NPC: 7d E8 minus FGF2, minus TGFβ1, with only 5 μg/ml insulin, with 10 μM SB431542 and 100ng/ml Noggin. Neurons: additional 25d DMEM/F12 medium supplemented with 1x N2, 1x B27, 64 μg/ml vitamin C, 14ng/ml sodium selenite and 5ng/ml FGF2 Jargon Topological associated Domains (TADs) Mb-scale compartments of interphase chromosomes either open, gene rich, highly transcribed and interactive (A compartment) or closed, gene poor, less transcriptionally active (B compartment) TADs based on Directionality Index and Hidden Markov Model Directionality Index “We noted that the regions at the periphery of the topological domains are highly biased in their interaction frequencies… To determine the directional bias at any given bin in the genome, we developed a Directionality Index (DI) to quantify the degree of upstream or downstream bias of a given bin. The directionality index is calculated in equation 1, where A is the number of reads that map from a given 40kb bin to the upstream 2Mb, B is the number of reads that map from the same 40kb bin to the downstream 2Mb, and E, the expected number of reads under the null hypothesis, is equal to (A + B)/2. Figure 1 | Dynamic reorganization of chromatin structure during differentiation of human ES cells. 36% A/B compartment change in at least one lineage First principal component (PC1) values blue: A compartment yellow: B compartment A/B compartments within TADs Hi-C interaction heat maps many A/B transitions are lineage restricted expansion of repressive heterochromatin during differentiation blue: A compartment yellow: B compartment b) K-means clustering (k=20) of PC1 values for 40-kb regions of the genome that change A/B compartment status in at least one lineage. c) K-means clustering of PC1 values surrounding TAD boundaries (b=TAD boundary) A to B change correlates with decreased expression B to A change with increased expression maybe only subset of genes affected subtle effects! all genes in compartment d) Distribution of fold-change in gene expression for genes that change compartment status or that remain the same (‘stable’) upon differentiation e) Genome browser for two genes of which one (OTX2) shows concordance between expression and PC1 values, whereas a second (TMEM260) does not. Figure 2 | Domain-wide alterations in chromatin interaction frequency and chromatin state. positioning of TADs stable a) Chromatin interaction heat maps in H1 lineages and IMR90 fibroblasts. Also shown are domain calls in ES cells and the directionality index (DI) in each lineage. domain-wide increase or decrease in interactions b) Changes in interaction frequency between ES and MS cells. Regions with higher interaction frequency in ES cells are shown in blue, while regions with higher interaction frequency in MS cells are shown in yellow. TADs having a concerted increase or decrease in intra-domain interaction frequency are labelled yellow or blue, respectively, with the fraction of the domain showing increased or decreased interaction frequency listed. Domains that do not show a concerted change are shown in grey. interaction frequency changes correlate with histone mark changes LOI: loss of interaction GOI: gain of interaction c) Boxplots of Pearson correlations coefficients between interaction frequency changes and chromatin mark changes across TADs for each chromosome (n=23). d) Classification accuracy of the Random Forest model in predicting whether a bin increases or decreases in interaction frequency (n=768,793), tested on 10 randomly selected subsets of Hi-C data. (actual data (blue), circularized permutation (green) and a random permutation (yellow)) e) Ranked chromatin features shown according to importance in classification as boxplots of the mean decrease in Gini index from 10 randomly selected data subsets. Whiskers correspond to the highest and lowest points within 1.53 the interquartile range. The vector of histone modification values was calculated as follows. For each 40-kb interacting bin, the enrichment of a given chromatin mark in the two 40-kb bins that compose the interaction was averaged. The average enrichment was then multiplied by a weight proportional to the genomic distance between the two 40-kb bins. This weight was based on the global average of Hi-C interaction frequencies from six lineages analyzed between loci separated by a given genomic distance. The two vectors were used to calculate a Pearson correlation in each chromosome, which reflects how change in domain-wide Random Forrest model to better understand which histone mark is most predictive for changes in interaction frequency Figure 3| Haplotype-resolved chromatin organization in H1 lineages. Phasing = Haplotype Inference from population genotype data (HapMap, 1000 Genomes) haplotype-specific reanalysis of genome-wide data p1, p2: parental alleles Genotypes Public WGS data Hg18 (Novoalign) Picard tools, GATK, Unified Genotyper Haplotypes From Hi-C data HaploSeq method using HapCUT  chromosome-span haplotypes including 93.5% of all H1 heterozygous variants A/B compartment patterns highly similar (autosomes) c) Genome browser image of PC1 values along chromosome 2 for the p1 and p2 allele. d) Allele specific compartment A/B patterns and mRNA-seq surrounding the imprinted ZDBF2 gene. only 0.6-2.3% of genome e) Boxplots of the difference between alleles of PC1 values. Regions with imprinted genes and allelic genes have more variable PC1 values f) Similar to e, but for regions with differential allelic chromatin activity (the number of allelic biased variants per 200-kb bin). Regions in the top 0.1% of differential allelic activities (orange) show greater differences in PC1 values compared other regions Imprinted genes: list of known imprinted genes downloaded from www.geneimprint.com Figure 4 | Allelic biases in gene expression in H1 lineages. both lineage-specific and constitutive genes mostly not on/off events a) Proportion of genes with detectable allelic expression with statistically significant allelic bias. b) Density plot of the absolute value of the fold change in expression (log2) between alleles. ? c) Heat map showing k-means (k=20) clustering of the allelic expression ratios (log2) at genes with constitutively testable expression (a minimum of 10 reads in each lineage). d) Genome browser image of variable allelic expression of the PARP9 gene. Only in rare cases do genes switch expression from one allele to the other between cell types. Imprinted genes are overrepresented within ASE genes but imprinting not MAIN mechanism of ASE imprinted genes often occur in clusters the majority of allele-biased gene expression is not clustered in the genome e) Fraction of imprinted genes among allele-biased genes and other genes. (P=4.4x10-5, Fisher’s exact test). f) Fraction of allele-biased genes that are known imprinted genes. . allele-specific chromatin mark SNPs closer to ASE genes > role of cis-regulatory elements in ASE ASE strongly correlated with allele-specific chromatin marks at promoters ? 29% majority of ASE genes show allele-specific chromatin marks in promoter g) Cumulative density plot of distances from variants to the nearest allele-specific gene. Allele specific variants are defined using histone acetylation, H3K9me3, H3K27me3, DHS and H3K4me3 h) Number of allele-biased genes showing consistent allele specific chromatin states in their promoter regions. Active variants are defined by H3K4me3, DHS or histone acetylation. Inactive promoter variants are defined by DNA methylation and H3K9me3/27me3. i) Genome browser image of mRNA-seq and chromatin features surrounding the TDG gene. Figure 5 | Allele biases at enhancers in H1 lineages. allelic enhancers closer to ASE genes mostly long –range enhancers high correlation between allelic enhancers and enrichment a) Enrichment of acetylation (top row), DHS (middle) and DNA methylation (bottom) at enhancers defined as allelic by acetylation (left column), DHS (middle), or DNA methylation (right). The active allele is in blue, inactive allele in red. b) The distance between allelic genes and enhancers as defined by allelic histone acetylation (purple) compared with randomly selected enhancers (grey). c) Number of allele specific genes linked to concordantly biased allele specific enhancers. Genes linked by ‘long-range enhancers’ are defined using Hi-C interaction frequencies, whereas ‘short-range enhancers’ are defined as any enhancer less than 20 kb from a genes transcription start site. correlation strongest for strong Hi-C interactions 4/6 allelic enhancers interact with ASE gene promoter d) Boxplots of the Pearson correlation coefficients between allelic gene-enhancer pairs defined by acetylation. Gene-enhancer pairs are grouped into strongly interacting (top 30%), weakly interacting (bottom 30%), and intermediately interacting pairs (others) based on Hi-C interaction frequency (P values using Welch’s t-test). e) Normalized 4C-seq interaction frequencies near the HAPLN1 gene. The 4C-seq bait region is in an allele-biased enhancer near the 3’ end of the EDIL3 gene. Specific interactions called by the LOWESS regression model are shown in black as ‘bait interacting regions’ (BIRs). Are there allele-specific interactions of allelic enhancers and target genes (Hi-C data)? Supplementary Figure 9a correlation trend between allelic enhancer and expression or 4C interaction frequency f) Allele-biased expression of the two alleles of the HAPLN1 gene, histone acetylation levels at the nearby interacting allele-biased enhancer and allele resolved 4C-seq data Summary Extensive A/B compartment switching during differentiation 36% of genome in at least one lineage correlate with gene expression level changes Domain-level changes in interaction intensity correlate with changes in chromatin marks predictable (H3K4me1 most informative) Allele-specific chromatin organization autosomal A/B compartments mostly stable between alleles allelic difference at imprinted genes and regions with allelic chromatin activity Allelic imbalance between different lineages both lineage-specific and constitutive genes imprinted genes overrepresented ASE genes and allelic promoters/enhancers strong correlation between ASE genes and allelic chromatin marks at promoters evidence for correlation between allelic enhancers and ASE Discussion move from Hi-C in different lineages (fg1&2) to ASE and chromosome marks (fig3&4) back to Hi-C Analysis of allele-specific chromosome interactions problematic because of Hi-C resolution problem Figure 2b Are domain-wide changes in interaction intensities associated with lineage specific genes? Both compartment subset gene changes and whole domain changes?

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Genetics Journal Club - Perelman School of Medicine at the