* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Lluís Millán Ariño GENOMIC DISTRIBUTION AND FUNCTIONAL SPECIFICITY OF
Primary transcript wikipedia , lookup
Transposable element wikipedia , lookup
Epigenetics in stem-cell differentiation wikipedia , lookup
Pathogenomics wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Essential gene wikipedia , lookup
Epigenetics in learning and memory wikipedia , lookup
Public health genomics wikipedia , lookup
Gene desert wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Gene therapy of the human retina wikipedia , lookup
Oncogenomics wikipedia , lookup
X-inactivation wikipedia , lookup
Cancer epigenetics wikipedia , lookup
History of genetic engineering wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Microevolution wikipedia , lookup
Genome evolution wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Gene expression programming wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Designer baby wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Genome (book) wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Minimal genome wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Genomic imprinting wikipedia , lookup
Ridge (biology) wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
GENOMIC DISTRIBUTION AND FUNCTIONAL SPECIFICITY OF HUMAN HISTONE H1 SUBTYPES Lluís Millán Ariño DOCTORAL THESIS · 2013 Thesis supervisor: Dr. Albert Jordan Vallès Molecular Genomics Department Institut de Biologia Molecular de Barcelona (IBMB) - CSIC DEPARTMENT OF EXPERIMENTAL AND HEALTH SCIENCES – UPF RESULTS RESULTS - CHAPTER I: GENOMIC DISTRIBUTION OF HUMAN HISTONE H1 SUBTYPES Results The main goal of this thesis project was to determine whether the genomic distribution of human histone H1 is different among variants. To achieve it, we decided to use chromatin immunoprecipitation combined with semiquantitative PCR (ChIP-qPCR), promoter array hybridization (ChIP-on-chip) and massive sequencing (ChIP-seq). By the combining these complementary strategies we expected to go in depth in the understanding of the heterogeneity of the histone H1 family. 1. DEVELOPMENT AND CHARACTERIZATION OF STABLE HA-TAGGED T47D-MTVL CELL LINES Because a limited number of H1-variant specific ChIP-grade antibodies exist (only H1.2 and H1X in our hands), we developed T47D-MTVL-derived cell lines stably expressing hemagglutinin (HA)-tagged versions of each of the five somatic H1 variants expressed in most cell types (H1.0, H1.2, H1.3, H1.4 and H1.5) (see Materials and Methods). These cell lines proliferated similarly to parental cells (Figure R.1A) and no significant differences in cell cycle profile were observed compared to T47D original cell line (Figure R.1B). Moreover, HA-tagged H1 variants (H1-HA) were expressed at levels below or similar to their correspondent endogenous histone, and comparable between different H1 variant-HA-expressing cell lines (Figure R.1C). Finally, these exogenous H1 forms were appropriately incorporated into chromatin (Figure R.1D). A B C D 75 Results Figure R.1. Expression of HA-tagged somatic histone H1 variants in breast cancer cells. (A) Proliferation assay to measure the effect of exogenous H1 variant expression. H1-HA expressing cell lines (GFP positive) were mixed 1:1 with parental T47D cells (GFP negative). Cells were split at the indicated times and the percentage of GFP-positive cells was measured by FACS. Data is expressed as percentage of variation of the proportion GFP-positive versus GFP-negative cells along time respect to the initial seeding proportion. (B) Cell cycle profile after propidium iodide (PI) staining. Data is represented as percentage of cells in G1, S and G2/M cell cycle phases. (C and D) H1 extract (C) or chromatin (D) was prepared from T47D-derived cells stably expressing HA-tagged H1 variants, wild-type or a K26A mutant of H1.4, and loaded into a 10% SDS-PAGE. Western blot hybridization was performed with H1 variantspecific antibodies (C), or with anti-HA antibody (D). 2. SPECIFICITY OF THE H1 AND HA ANTIBODIES USED IN ChIP EXPERIMENTS During our chromatin immunoprecipitation (ChIP) assays, we used a specific HA antibody to immunoprecipitate the exogenous H1 variants (H1-HA), and also two commercial antibodies specifically recognizing H1.2 and H1X subtypes. In other to test the specificity of the HA antibody, we performed ChIP in HA-expressing cell lines and cells infected with the empty vector, not containing HA (Figure R.2). Immunoprecipitation with the HA antibody was only achieved in cells expressing H1.2-HA or H1.4-HA, but not in mock cells, were immunoprecipitated material was comparable with IPed material using an unrelated immunoglobulin (IgG). This assays proved the specificity of the HA antibody used in further ChIP experiments for exogenously expressed H1-HA variants. Figure R.2. Specificity of the anti-HA antibody. ChIP was performed in cells expressing H1-HA or mock infected with the empty retroviral expression vector, with anti-HA antibody or unrelated immunoglobulins (IgG), and the abundance of IPed material was quantified by qPCR with ß-actin promoter oligonucleotides. In addition, the ChIP specificity for endogenous H1.2 and H1X antibodies was also confirmed in H1.2 and H1X inducible knock-down cells (Figure R.3). ChIP experiments were performed after 6 days of doxycycline treatment cells, when H1 depletion is achieved, as shown by western blot hybridization. H1.2 and H1X immunoprecipitated material for the corresponding H1.2 and H1X depleted cell lines was much lower than in control non-treated cells for all the genomic 76 Results loci interrogated, although signal was not completely abrogated. This could be due to incomplete knock-down of the H1 variants. A B Figure R.3. Specificity of H1.2 and H1X antibodies in ChIP experiments determined in H1 variant-specific knockdown cells. T47D-derived cells stably harboring an inducible system for shRNA expression against H1.2 (A) or H1X (B) expression were treated with doxycycline for 6 days or left untreated. Then, ChIP was performed with H1 variant-specific antibodies against H1.2 and H1X (or control IgG), and the IPed material was quantified by qPCR with oligonucleotides for the indicated promoters (-10 kb distal promoter or TSS), and corrected by input DNA amplification. Western blot hybridizations showing the rate of H1 depletion upon doxycycline treatment are shown. 3. STUDYING H1 VARIANT DISTRIBUTION BY ChIP-qPCR 3.1. All H1 variants are non-specifically present at gene promoters and are depleted from transcription start sites (TSS) in active genes In our first attempt to elucidate H1 variant distribution in the genome, we tested occupancy in specific loci corresponding to diverse genomic features. Therefore, in ChIP-qPCR experiments using the anti-HA antibody in H1-HA expressing cells, H1-associated chromatin included gene promoters, coding regions and repetitive DNA, irrespective of which H1-HA variant was immunoprecipitated (Figure R.4). The abundance of the different H1 variants in those regions was comparable, although few differences among them were observed, e.g. H1.3 was reduced at alphoid repeats while H1.4 and H1.5 were relatively enriched. Moreover, promoter regions seemed to have reduced H1 levels compared to coding regions or repetitive heterochromatic regions. 77 Results Figure R.4. HA-tagged H1 variants are associated with gene promoters, coding regions and repetitive DNA. ChIP was performed in cells expressing H1-HA with anti-HA antibody and the abundance of IPed material was quantified by qPCR with oligonucleotides for the indicated promoters, coding or heterochromatic regions, and corrected by input DNA amplification with the same primer pair. Later, the specificity of H1 variants distribution was investigated in more detail at gene promoters previously shown to contain H1 at distal regions located 10kbp upstream of their transcription start site (TSS), and a depletion of H1 at the TSS (“H1 valley”) [170]. In those selected gene promoters, all H1 variants were detected at all distal promoter regions tested, in equivalent proportions, and a similar H1 depletion was observed at the TSS of all genes for all H1 variants (Figure R.5A). The distribution of all H1 variants in those selected promoters was also comparable to an H1.4 mutant (K26A) at a residue targeted by acetyl and methyl transferases and reported to be involved in recruiting chromatin proteins (Figure R.5A) [149, 150, 164, 165]. Focusing on endogenous histones, a local depletion of H1 at TSS was also observed by immunoprecipitating H1s with specific H1.2 and H1X antibodies (Figure R.5B). Interestingly, the TSS-associated H1 valley was not observed at genes inactive in these cells, i.e. OCT4 and NANOG (Figure R.5C). Thus, we confirmed in our breast cancer cell line the H1 depletion near the transcriptional start site (TSS) of active genes, but not in repressed genes. Moreover, this “H1 valley” seems not to be generally specific for any of the H1 subtypes tested, including a mutant form of H1.4. 78 Results A B C Figure R.5. All H1 variants are present at gene promoters and depleted from transcription start sites. (A, C) ChIP experiments were performed in T47Dderived cells stably expressing HA-tagged H1 variants, wild-type or a K26A mutant of H1.4, with anti-HA antibody, and the abundance of IPed material was quantified by qPCR with oligonucleotides for the indicated promoters (-10 kb distal promoter or TSS), and corrected by input DNA amplification with the same primer pair. (B) ChIP experiments were performed in parental T47D cells with H1 variantspecific antibodies against H1.2 and H1X and the IPed material was quantified as in (A). To further confirm that the H1 valley was dependent on gene expression, we performed several assays relating the transcriptional status of the genes and the presence H1 depletion at the TSS. Thus, the H1.0-HA and H1.2 valley was evident at genes being expressed, depicted by mRNA accumulation measured by RT-qPCR, and by representation of gene expression microarray data. Moreover, the presence of H1 valley positively correlated with H3K4me3 enrichment at TSS compared to a -10kbp upstream region, a modification associated with active transcription. Finally, an open chromatin state at TSS measured by formaldehydeassisted isolation of regulatory elements (FAIRE)-qPCR, and nucleosome depletion at the TSS (H3 ChIP), was observed in genes presenting a clear H1 valley, but not in genes whose 79 Results promoter was fully occupied with H1 (Figure R.6). In conclusion, an H1 valley was found in the promoter of transcribing genes, but not in repressed genes. Figure R.6. An H1 valley at TSS is found at genes being expressed that show FAIREmeasured open chromatin and increased H3K4me3 at TSS. Several genes representing different levels of expression according to microarrays data were chosen to analyze mRNA abundance by RT-qPCR, chromatin accessibility at TSS by FAIRE-qPCR, and distribution of H1, H3 and H3K4me3 by ChIP at distal (upstream) promoter compared to TSS. ND, not determined. RT-PCR values for each gene were normalized to GAPDH expression and genomic DNA amplification with the same set of primers. ChIP and FAIRE PCRs were normalized to input DNA. 3.2.H1 variants are further depleted from TSS upon induced gene activation Another approach to relate H1 promoter abundance and gene expression is to study the H1 content in inducible promoters under stimulating conditions. To do so, we treated T47D cells either with a progesterone analog (R5020) or mitogenic drugs (PMA). Thus, H1 variant depletion at TSS was evident at the nucleosome B (nucB) of the MMTV inducible promoter after progesterone treatment at different time points (Figure R.7A). Moreover, treating cells 80 Results with PMA after serum starvation increased the total H1, H1.2 and H1X depletion at TSS around 2-fold or more in inducible Jun and Fos promoters, while in active and repressed nonresponsive control genes (PSMB4 and OCT4) the extend of H1 valley did not change (Figure R.7B and C). A B C 81 Results Figure R.7. H1 depletion at TSS of inducible promoters. (A) H1 depletion at TSS of hormone-responsive promoters upon stimulation with ligand. T47D-derived cells harboring an MMTV-luciferase construct and expressing different H1-HAs were treated with a progestin analog (R5020 10nM) for the indicated time and ChIP was performed with anti-HA antibody. The abundance of IPed material was quantified by qPCR with specific oligonucleotides for the MMTV promoter (nucleosome B). (B and C) The H1 valley at TSS of JUN and FOS genes is preformed and increases upon mitogenic stimulation. T47D cells were treated with PMA 100nM for 60 min or left untreated, and ChIP was performed with H1, H1.2, H1X and H3 antibodies. The abundance of IPed material was quantified by qPCR with oligonucleotides for the indicated promoters (-10 kb distal promoter or TSS), and corrected by input DNA amplification (B). Jun and Fos were responsive to PMA as shown in the RT-qPCR experiment (right panels). PSMB4 and OCT4 are non-responsive control genes, active and repressed, respectively, in T47D cells. The table below (C) shows the H1 valley ratio (calculated as distal/TSS) at 0 and 60 min, or the relative H1 valley ratio at 60 min compared to 0 min. Interestingly, these immediate-early response genes presented an H1 and H3 valley already preformed at non-inducing conditions that became deeper upon stimulation. This suggests that their promoter is kept in an “open” state to allow rapid response after stimulation by preserving the nucleosome free regions (NFR) at the TSS, but also by maintaining the promoter free of H1. 4. ChIP-ChIP EXPERIMENTS ON PROMOTER ARRAYS After validating our ChIP method in studying H1 variant localization and, in order to explore the genome-wide distribution of the different H1 variants along gene promoters, we hybridized ChIP material, obtained with variant-specific antibodies or corresponding to HAtagged H1 variants-associated chromatin, on a Nimblegen promoter tiling array (Nimblegen HG18 RefSeq Promoter 3x720K) containing probes for 30,893 transcripts (-3,200 to +800bp to the TSS) arising from 22,542 human promoters. 4.1. Extended depletion of H1 at promoters is dependent on the transcriptional status of the gene and shows differences between variants For each variant, the average log2 ratio of probe intensity for all transcripts was represented regarding the relative distance to TSS, and an H1 valley close to TSS was apparent in all cases. Interestingly, in the two H1.2 samples (endogenous H1.2 and H1.2-HA), the valley was more pronounced and slightly shifted towards the TSS, compared to the rest of H1 variants (endogenous H1X and H1.0/3/4/5-HA) (Figure R.8). 82 Results Figure R.8. Extension of H1 depletion at promoters. Average log2 enrichment ratio of ChIP-chip probe intensity for all transcripts was represented regarding the relative distance to TSS for each variant. Thereafter, this ChIP-chip data was combined with gene expression data for ca. 20,000 of the transcripts, obtained with the parental cell line growing in the same conditions in a human expression array from Agilent (Human v1 Sureprint G3 Human GE 8x60k Microarray). Thus, we worked for further analysis with the 20,338 promoters whose expression data was included in this Agilent expression array, obtained by overlapping transcript IDs from the ChIP-chip promoter array with genes in the expression microarray (Figure R.9). A B Figure R.9. Global gene expression profile in T47D cells. (A) Overlap between unique transcript IDs at the Nimblegen promoter array and Agilent expression microarray. (B) Gene expression profile in T47D cells of 62,976 transcripts, from highest to lowest expression, obtained by hybridization with an Agilent microarray. Next, using these expression data, heat maps representing H1 binding intensity around promoter regions were constructed for each variant, ranking promoters from highest to lowest gene expression (Figure R.10A). An H1 valley was clearly seen for at least the top 50-60% highly expressed transcripts in all variants. Noteworthy, for both H1.2 samples the valley extended towards the lowest expressed genes. Then, the total number of considered transcripts was 83 Results divided into 10 groups, from high to low expression, and average log2 ratio of ChIP-chip probe intensity was represented regarding the relative distance to the TSS for each expression group and each variant (Figure R.10B). These graphs confirmed that H1 depletion at promoters is dependent on the transcriptional status of the gene, as H1 valley was progressively weakening as genes became less expressed. Interestingly, the H1 valley around the TSS was deeper and wider for H1.2 than for the other variants, irrespective whether endogenous or HA-tagged histone was measured, and repressed genes contained less H1.2 compared with other subtypes. These observations point to a specific behavior of H1.2 variant at promoter regions. A B Figure R.10. The extension of H1 depletion at promoters is transcription status-dependent and variant-specific. (A) Heat maps of ChIP-chip probe intensity around TSS (-3,200 to +800bp) for 20,338 transcripts from which the expression rate was determined. Genes are ordered from highest to lowest gene expression. (B) Average log2 ratio of ChIP-chip probe intensity represented regarding the relative distance to the TSS for all transcripts classified according to expression in ten groups containing a same number of transcripts, from highest (EG1) to lowest (EG10) expression. Representative ChIP-chip experiments are shown. 84 Results In general, it is worth noting that H1 depletion extended in some degree at least 1kbp upstream the TSS of active genes for all H1 variants, wider than the predicted extent of the reported nucleosome free region (NFR) that precedes the TSS. In order to confirm this result, ChIP-chip for the core histone H3, used to monitor nucleosome occupancy, was also performed and showed that H3 was also depleted at active genes but more locally than H1 (Figure R.10A and B). Moreover, H3 and all H1s, except H1.2, presented a marked enrichment peak immediately downstream of TSS that may correspond to a positioned nucleosome as previously reported [98, 100]. Additionally, ChIP-qPCR on selected promoters confirmed some of these observations, i.e. some repressed promoters presented high H1.0 content around TSS, but low H1.2 (Figure R.11). Figure R.11. ChIP-qPCR validation of ChIPchip observations. Some repressed genes show an H1.2 valley at the TSS. ChIPed material from H1.0-HA cells or with the specific H1.2 antibody was quantified by qPCR with oligonucleotides for the indicated promoter regions and corrected by input DNA amplification. Selected genes belong to the indicated expression profile percentiles in Figure R.10B. 4.2. An H1 valley at non-protein coding promoter transcripts is seen for H1.2 In addition to protein-coding genes, the promoter array contained 1,145 non-coding promoter transcripts (NRs) including structural RNAs and transcribed pseudogenes, that overall presented a low expression rate compared to the total transcriptome. An H1 valley at TSS was only apparent at the ChIP-chip heat maps for endogenous and HA-tagged H1.2, in agreement with the previous observation that an H1.2 valley occurs even at lowly expressed promoters (Figure R.12). 85 Results A B Figure R.12. Non protein-coding transcripts show an H1.2 valley at TSS. (A) Heat maps of ChIP-chip probe intensity around TSS (-3200 to +800 bp) for 1,145 non protein-coding transcripts (NRs) from which the expression rate was determined (B). NRs are ordered from highest to lowest gene expression. (B) Expression levels of NRs are shown as a boxplot compared to total transcriptome included in the Agilent expression microarray. 4.3. H1.2 abundance at distal promoter is a mark of transcriptional inactivity Noteworthy, previous heat map and expression-dependent plot analysis indicated that H1.2 abundance at distal promoter regions is inversely proportional to gene expression, being more abundant at repressed promoters (Figure R.10A and B). This was also observed in some extent for the other H1 variants and H3, with the exception of the top and bottom ca. 10% expressed genes that showed an inverse correlation. In agreement with this, when gene promoters were ranked from lowest to highest H1 enrichment at the distal promoter region (-3,200 to -2,000 bp from TSS), a negative correlation with gene expression was clearly seen mainly for H1.2 (Figure R.13A). At the same time, we compared the expression rate of highly (top 10%) or lowly (bottom 10%) enriched promoters for each variant with total transcriptome included in the expression array (Figure R.13B). In agreement with previous analysis, genes with the highest distal promoter H1.2 content (top 10%) mainly fell within the lowest expressed genes, whereas genes with the lowest H1.2 content (bottom 10%) fell within the highly expressed genes. This was partially true also for H1X but less evident for the H1-HAs. In conclusion, H1.2 abundance at distal promoter regions is inversely related with the expression of the associated gene, and this is not so true for other variants. 86 Results B A Figure R.13. H1.2 abundance at distal promoter regions negatively correlates with gene expression. (A) Heat maps of gene expression data for 20,338 transcripts ordered from lowest to highest H1 content at distal promoter regions (-3200 to -2000 bp relative to TSS), for each of the H1 variants indicated. (B) Expression levels of genes presenting the highest or lowest H1 variant content at distal promoter are shown as a boxplot, and compared with total transcriptome. 4.4. H1.2 abundance at distal promoter negatively correlates with the presence of other H1 variants With the aim to further characterize the relation of H1 variant abundance in promoter regions, we generated heat maps for all H1 variants ranking genes from low to high H1.2 content at the distal promoter. Interestingly, H1.2 abundance at distal promoter regions inversely correlated with H3, H1X and H1-HA abundance, except H1.2-HA that showed an intermediate pattern (Figure R.14A). This inverse relationship was also observed by performing correlation analysis comparing mean probe intensity at distal promoters for H1.2 with other variants (Figure R.14B). H1.2 negatively correlated with H1.0-HA, H1X and H3, and presented almost no correlation with H1.2-HA, as expected from Figure R.14A. On the other hand, H1.0-HA and H3 presented a strong positive correlation, suggesting that they mainly bind the same promoters. Globally, this indicates that there is a preferential binding of H1.2 for some promoters (mostly repressed genes, according to previous data in Figure R.13) compared to the rest of variants, and vice versa, many promoters are devoid of H1.2 but contain other H1 variants. 87 Results A B Figure R.14. H1.2 abundance at distal promoter regions negatively correlates with abundance of other variants. (A) Heat maps of H1 ChIP-chip probe intensity around TSS (-3,200 to +800bp) for 20,338 transcripts from which the expression rate was determined. Genes are ordered from lowest to highest H1.2 content at distal promoter regions. Genes with the highest or lowest distal H1.2 content are indicated. These genes for each H1 variant (2050 genes in each group, 10% of the total) were used to determine the number of coinciding genes as shown in Figure R.15 and R.16. (B) Correlation scatter plots between abundance of different H1 variants at distal promoter regions. X and Y axis represent mean probe intensity at distal promoter regions (-3200 to -2000 bp relative to TSS), for the indicated H1 variants. R: Pearson’s correlation coefficient. Taking previous observations into account, we further deciphered if differentially H1 variant content at distal promoter was related with the regulation of distinct biological processes. For this purpose we compared gene ontology (GO) analysis of endogenous H1.2 and H1X variants enriched (top 10%) or deprived (bottom 10%) promoters, as these endogenous H1 variants presented an opposite distribution at distal promoter regions (Figure R.14). Analysis using DAVID software depicted that different biological processes were related with differential H1 variant abundance at distal promoter regions in T47D cells. For example, promoters with the lowest content of H1X included genes involved in chromatin organization. Promoters with the lowest H1.2 content included cell-cell signaling or regionalization. Genes with the highest H1X content at promoter included pattern formation, and genes with high H1.2 included repressed genes involved in sensory perception (Table R.1). This observation suggests that different 88 Results biological processes are regulated by different H1 variants, supporting the vision of H1 variants regulating different subsets of genes, and, hence, different functions. Table R.1. Gene ontology of genes presenting the highest (top 10%) or lowest (bottom 10%) H1.2 or H1X content at distal promoter (-3200 to -2000 bp relative to TSS) according to ChIP-chip data shown in Figure R.13. P-value (adjusted for multiple testing by Benjamini method) and false discovery rate are shown. Afterwards, in order to corroborate the preferential binding of H1.2 for certain promoters compared with other variants, Venn diagrams of top 10% genes having high or low H1.2 at distal promoter and high or low H1X were performed to identify genes presenting high2/lowX and low2/highX (Figure R.15A and B). Those genes were supposed to present high levels of H1.2 and low levels of H1X at their distal promoter, and vice versa. The most abundant coincidence was between low2/highX promoters (553 genes), that mainly correspond to expressed genes compared with the total transcriptome in the expression array (Figure R.15C). 89 Results On the other hand, genes containing high H1.2 were mainly repressed, and genes depleted in both H1.2 and H1X were the most active ones. A B C Figure R.15. Coincidence between genes presenting the highest or lowest H1.2 or H1X content at distal promoter. (A) Heat maps of H1 ChIP-chip probe intensity around TSS. Genes are ordered from lowest to highest H1.2 (left) or H1X (right) content at distal promoter regions. Genes with the highest or lowest distal H1 content are indicated. These genes (2050 genes in each group, 10% of the total) were used to determine the number of coinciding genes as shown in Venn diagrams (B). A streaking coincidence exists between genes presenting few H1.2 but high H1X. (C) Expression levels of coinciding genes in the four comparisons depicted in (B). Genes were classified in five expression groups (EG) similar to Figure R.10B, and the percentage of coinciding genes belonging to each of these groups was determined. (Right panel) Expression levels of coinciding genes are also shown as a boxplot. Similarly, we also compared H1.2 with H1.0-HA abundance that, as H1X, also presented an opposed binding at the distal promoter region in relation with H1.2. Venn diagram comparison of top 10% genes having high or low H1.2 versus H1.0-HA showed that the most abundant coincidences were between low2/high0, with 716 genes, and between high2/low0, with 276 genes (Figure R.16). Again, H1.2-enriched genes were mainly repressed. 90 Results A B C Figure R.16. Coincidence between genes presenting the highest or lowest H1.2 or H1.0-HA content at distal promoter. See Figure R.15 legend. Altogether, our data indicates that promoters having few H1.2 are loaded with high amounts of other variants, not only with exogenously expressed H1.0-HA, but also with endogenous H1X. Expression analysis of such groups of genes denoted that genes with few H1 variants at distal promoter are highly expressed, and vice versa, but H1.2 content is the strongest predictor of gene expression (Figures 15C and 16C). Finally, we focused in differential promoter binding of endogenous H1.2 versus H1X (Figure R.15) in order to experimentally confirm that some promoters have preferential binding for particular variants. Thus, representative genes of the intersections for low2/highX (TMEM204 and TUBGCP5) and high2/lowX (COL4A3 and CUGBP2) were randomly selected and interrogated by ChIP-qPCR (Figure R.17A). When we represented the differential ratio between 91 Results H1.2 and H1X (H1.2/H1X) abundance at those selected promoters, both TMEM204 and TUBGCP5 presented a lower relative H1.2/H1X ratio compared with COL4A3 and CUGBP2, as expected, indicating that H1.2 and H1X are differentially enriched within these two groups of genes. So, these results, experimentally confirmed that some genes are enriched in H1.2 or H1X at distal promoters regions. The universality of the relative H1.2/H1X abundance at representative genes was also tested in two additional cell lines by ChIP-qPCR (Figure R.17A and B). HeLa cells showed similar results than T47D, i.e. COL4A3 and CUGBP2 genes presented a higher H1.2/H1X ratio than TMEM204 and TUBGCP5. Interestingly, ratios were even higher than in T47D, reflecting a higher relative abundance of H1.2 compared with H1X in HeLa cells (Figure R.17C and D). On the other hand, MCF7 cells presented similar H1.2/H1X ratios in all four genes, due to the increased H1X signal in COL4A3 and CUGBP2 genes (Figure R.17A, B and C). Note that H1X content at chromatin in MCF7 cells is higher than in T47D cells, which in turn presents more H1X than HeLa cells (Figure R.17C). This result denotes that relative abundance between variants at promoters is not fully conserved between cell types, probably due to differences in their relative content within the nucleus. A C B D 92 Results Figure R.17. The ratio between H1.2 and H1X abundance at selected genes is conserved among T47D and HeLa cells, but not in MCF7, in relation to the cell abundance of each variant. (A) ChIP-qPCR confirmed that some genes are enriched in H1.2 or H1X at distal promoter. The differential ratio between H1.2 and H1X abundance at selected genes observed in T47D cells is conserved in HeLa cells but not in MCF7. TMEM204 and TUBGCP5 genes were randomly chosen among the group of genes presenting low H1.2 and high H1X (553 genes), and COL4A3 and CUGBP2 genes among the genes presenting high H1.2 and low H1X (189 genes) (see Figure R.15). After ChIP-qPCR of H1.2 and H1X abundance at distal promoter regions of these genes in T47D, HeLa and MCF7 cells, the relative ratio H1.2/H1X was calculated. (B) ChIP-qPCR of H1.2 and H1X abundance at TMEM204, TUBGCP5, COL4A3 and CUGBP2 distal promoter regions in T47D, HeLa and MCF7 cells. IPed material was corrected by input DNA amplification. (C) Abundance of H1 variants in T47D, HeLa and MCF7 chromatin determined by immunoblot with specific antibodies. (D) Expression of H1 variants in T47D, HeLa and MCF7 cells determined by RT-qPCR. cDNA levels for the indicated H1 variants were corrected by GAPDH expression and amplification of genomic DNA with the same PCR primers. In relation with this, ChIP-chip experiments for endogenous H1.2 and H1X were further extended to HeLa cells, confirming that these two variants do not totally coexist at the same promoters (Figure R.18). Therefore, when H1.2 and H1X promoter occupancy of both T47D and HeLa cells was represented ranking genes from lowest to highest H1.2 abundance at distal promoter region in T47D cells, H1.2 distribution in HeLa cells resembled H1.2 distribution in T47D, but H1X presented a different distribution in both cell types compared with H1.2. Figure R.18. Comparison of H1.2 and H1X promoter content among T47D and HeLa cell lines. Heat maps of H1.2 and H1X ChIP-chip probe intensity around TSS (-3200 to +800 bp) for 20,338 transcripts. Shown promoters in all heat maps are ordered from lowest to highest H1.2 content at distal promoter regions in T47D cells. 93 Results 4.5. H1.2 abundance correlates with clusters of differential gene expression along chromosomes Next, we represented heat maps of H1.2 abundance at the promoter ordering genes according to their position along several human chromosomes (Figure R.19A). Interestingly, several domains of H1.2 abundance became apparent along chromosomes, correlating with clusters of differential gene expression. Thus, clustered genes presenting high amounts of H1.2 along the promoter, including the TSS, are transcriptionally repressed, while genomic domains with few H1.2 are active. This supports the observation that the amount of H1.2 at distal promoter is related with the transcriptional status of the associated gene, as active and inactive gens cluster together in broad genomic regions depleted or enriched in H1.2, respectively. Noteworthy, after calculating a gene-richness coefficient for each individual chromosome (Figure R.19B), chromosomes 17 and 19, the highest gene-rich chromosomes, showed overall high gene expression and low H1.2 content at promoters. On the other side, the gene-poorest chromosome 13 presented low gene expression and high H1.2 content (Figure R.19A and C). These observations are in agreement with previously reported associations of active and inactive gene-rich domains in the nuclear space, as well as with the distinct radial organization of chromosomes in the nucleus regarding gene density and transcriptional activity [104, 250]. Moreover, they favor again a strong relation between H1.2 content and gene expression. Additionally, the observed clustered distribution of H1 was well conserved within variants in different cell lines, but differed among H1 variants (Figure R.19D). H1.2 and H1X content in chromosome 1 of HeLa cells resembled distribution of their counterparts in T47D cells. However, interestingly, H1X or H1.0-HA abundance was not clustered matching gene expression in both T47D and HeLa cells, like H1.2 did. In summary, H1.2 content at promoters is the best H1 reporter of gene expression. Moreover, its organization in chromosomal clusters correlating with differential gene expression, points to a role of this variant in the chromatin organization within the nucleus in breast cancer cells. 94 Results A B C D 95 Results Figure R.19. H1 variant content at gene promoters along human chromosomes. (A) Heat maps of H1.2 ChIP-chip probe intensity around TSS (-3,200 to +800bp) for genes ordered according to their position along several human chromosomes. Gene expression levels for each gene in T47D cells are represented in the left in two different ways (as a heat map and as graphical representation of log 2 ratios). A gene-richness coefficient (GRC) for each chromosome is indicated. The centromere location is marked with a triangle. A region of interest is marked with an asterisk, viewed in the UCSC genome browser in Figure R.23. (B) Gene-richness of human chromosomes. The relative size (base pairs) of each chromosome compared to the total human genome is represented as percentage, as well as the relative content of coding genes compared to the total number of genes in the genome. The generichness coefficient for each chromosome, calculated as the ratio between the percentage of genes present in each chromosome and the percentage of base pairs of each chromosome to the total human genome, is shown in the graph below. (C) Expression levels of genes present in several chromosomes. Boxplots show the distribution of expression (log2 ratio) in T47D cells for the genes present in each indicated chromosome. (D) Heat maps of H1.2 and H1X ChIP-chip probe intensity around TSS (-3.2 to +0.8 kbp) for genes ordered according to their position along chromosome 1 in T47D and HeLa cells. Heat map of H1.0-HA in T47D is also included. 5. GENES SPECIFICALLY DEREGULATED BY KNOCK-DOWN OF A PARTICULAR H1 VARIANTS ARE NOT ENRICHED IN SUCH VARIANT AT THE PROMOTER We have previously shown that inducible knock-downs (KD) of individual H1 variants deregulate a reduced subset of genes (≤2%), specific for each variant, and including up- or down-regulated genes in similar proportions [97]. One hypothesis would be that these subsets contain genes specifically targeted or with prevalence for some of the H1 variants. If a promoter was heavily or uniquely loaded by a particular variant and H1 was playing a role in its repression or activation, its depletion could cause gene deregulation. We explored the H1 variants occupancy at promoters specifically deregulated by the correspondent variant KD and we found no differences on their abundance at distal promoter regions compared to total genes (Figure R.20C). For example, the H1.2 content at distal promoter of genes up- or downregulated by inducible H1.2 KD was similar to the H1.2 content distribution of total genes, just slightly lower for the up-regulated genes, contrary to what the hypothesis predicted. Results were similar within all H1 KD cell lines. On the other hand, genes deregulated in H1 KD cells were mostly within the top 50% basally expressed genes, especially true for the down- genes (Figure R.20A and B). So, low H1.2 content at distal promoter would be expected for H1.2 according to data in Figure R.10B and R.13, and lower at down- than up-regulated genes. Strikingly, we observed that H1.2 content at down-regulated genes was higher than what was predicted, meaning that perhaps in those genes H1.2 plays an important role, and genes became down-regulated upon depletion from the promoter. 96 Results A B C Figure R.20. Gene expression response to specific H1 variant knock-down is not related with the abundance of such variant at the promoter. (A) Gene expression profile in T47D cells harboring inducible shRNAs against the indicated H1 variant, in the presence or absence of Doxycicline as inducer (6 days treatment at 2.5 mg/ml), represented from highest to lowest basal (-Dox) expression. Data was obtained by hybridization with an Illumina microarray and reported elsewhere [97]. (B) Basal gene expression of the subsets of genes up- or down-regulated in the different H1 KD cells, compared to expression of total transcriptome, represented as boxplots. (C) Boxplots of H1.0-HA, H1.2 and H1X abundance (ChIP-chip probe intensity ) at distal promoter regions (-3200 to -2000 bp relative to TSS) for the genes up- or down-regulated upon H1.0, H1.2 or H1X knock-down, respectively, compared to the H1 abundance of total genes. 6. ChIP-seq EXPERIMENTS TO FURTHER STUDY H1 VARIANT DISTRIBUTION IN THE GENOME To further determine whether the genomic distribution of H1 variants is heterogeneous, we extended our study to high-throughput sequencing analysis in order to get the complete genomic map for several H1 variants in T47D cells. For this purpose, we combined ChIP of endogenous H1.2, H1X, and H3, and of HA-tagged H1.0, H1.2, and H1.4 with high-resolution sequencing (ChIP-seq) up to 50 million reads per sample (Table R.2). Table R.2. Summary of samples analyzed by ChIP-seq in three independent experiments (r1, r2, r3; replica 1, 2 and 3). Read length, number of total reads obtained, and number of mappable reads to the human genome version hg18, as well as mapped rate, is shown. 97 Results 6.1. Genome-wide H1 variant distribution around gene bodies To confirm the results obtained by ChIP-chip, we focused first on the input-subtracted normalized average ChIP signal obtained around coding regions of genes, grouped according to basal expression as before (Figure R.21). Again, the H1 valley at the TSS depended on expression rates and differences were seen between H1 variants. Mainly, H1.2 was less abundant at TSS of non-expressed genes compared to the other subtypes that showed higher levels towards nucleosome +1. Note that both H1.2-HA and endogenous H1.2 behaved similarly, and different than the other variants. Transcription termination sites (TTS) also showed differences between variants, being depleted of H1 subtypes, except for H1.2. Interestingly, the H1 content of gene bodies increased towards the end and was also depending on gene expression rates. Therefore, while H3 levels were uniform along gene bodies, H1 variants such as H1.2 were reduced at the 5’ moiety of highly active genes (Figure Gene expression R.21). Figure R.21. H1 distribution in gene bodies. Average, input-subtracted ChIP-seq signal of H1 variants around gene bodies flanked by transcription start site (TSS) and transcription termination site (TTS), grouped according to basal expression (10% of total genes in each group). EG1 represents top expressed genes, and EG10 genes with the lowest expression. Average for all genes is shown in black. Genic regions are represented as a 3 kb-long meta-gene surrounded by 1 kb region upstream TSS and 1 kb downstream TTS. 98 Results In addition, like in the case of ChIP-chip analysis, the H1 depletion in the promoter of active genes was wider than the predicted nucleosome free region (NFR). Indeed, H3 depletion, reporting nucleosome occupancy, was more restricted to the TSS than H1 depletion, which expanded up to 1kbp upstream the TSS, confirming again the broader H1 depletion in active promoters compared to the NFR. 6.2. H1 variants are differentially depleted from regulatory regions and enriched at CpG sites ChIP-seq experiments enabled us to explore H1 variant distribution out of promoter context. Thereby, in addition to the local displacement of H1 from active promoters, H1 variants were also depleted from other regulatory regions along the genome, i.e. CCCTC-binding factor (CTCF) binding sites corresponding to insulators, and p300 binding sites associated with enhancers, but, interestingly, not that much from DNase hypersensitivity sites and FAIRE regions, representing open chromatin regions (Figure R.22A and D). Moreover, we also studied H1 variant distribution at regions enriched for several histone modifications. Thus, when the input-subtracted coverage of H1 variants across the peaks of selected core histone modifications was calculated, depletion of H1.0 and H1.2, and in some extent H1.4 but not H1X, was associated with positive histone marks linked to strong enhancers such as H3K4me1, H3K4me2 and H3K27ac (Figure R.22B). H1 abundance at H3K4me3 or H3K9ac sites, mainly enriched at TSS of active promoters, was different between variants, reflecting H1.2 depletion at TSS of most genes but local enrichment of the other variants immediately after TSS. On the other hand, no significant enrichment of H1 was found at negative histone marks such as H3K9me3 or H3K27me3. Worth noting, H1.2 abundance was lower at active marks than at marks related with repression and chromatin compaction, in agreement with the observed correlation between H1.2 content and gene repression. Note that data for histone PTMs belongs to HeLa cells, so better correlations would be expected if using T47D data, which is not available. Next, we investigated whether H1 variants coincided with CpG regions across the genome. As seen in Figure R.22C and D, H1.0, H1X and H1.4 were clearly overrepresented at CpG regions, compared with H1.2. Because CpGs are mostly localized at gene promoters, this coincidence may reflect the overall higher abundance of those variants compared to H1.2 around TSS, considering the lowly expressed genes. Alternatively, it may not be discarded a major 99 Results relationship between H1.0 (and other variants different than H1.2) and CpG or DNA methylation. Further analyses on this issue are presented below. A B C D Figure R.22. H1 is depleted from regulatory regions but present at CpG sites in a variant-specific manner. (A) ChIPseq signal of H1 variants at regions enriched for the indicated genomic features shown as a boxplot: DNase hypersensitivity sites (data from T47D cells), FAIRE regions (HeLa data), CTCF and p300 binding sites (T47D data). (B) ChIP-seq signal of H1 variants at regions enriched for the indicated histone marks (data from HeLa cells) shown as a boxplot. (C) ChIP-seq signal of the indicated H1 variants at CpG islands (as defined in UCSC database) shown as a boxplot. (D) Average input-subtracted ChIP-seq signal of H1 variants around the center of genomic CTCF and p300 binding sites (data from T47D data) and CpG islands (as defined in UCSC database). 100