Supplementary Methods and Tables

Supplementary Methods
ChIP-chip data normalization and analysis
qChIP validation of ChIP-chip results
qRT-PCR validation of expression profiling
Sequence analysis of AML1-ETO binding regions

Supplementary Tables
Table S1. AML1-ETO target genes identified by ChIP-chip on the Agilent Human Proximal Promoter 244k Array set.
Table S2a. Functions significantly overrepresented among the 1168 AML1-ETO putative target genes (Benjamini-Hochberg (B-H) p-value < 0.05).
Table S2b. Pathways significantly overrepresented among the 1168 AML1-ETO putative target genes (B-H p-value < 0.05).
Table S3. Genes found to be co-occupied by AML1-ETO and HDAC1 and to show H4 deacetylation in HSPC-AE.
Table S4a. Functions significantly overrepresented among the 103 AML1-ETO/HDAC1 putative target genes.
Table S4b. Pathways overrepresented among the 103 AML1-ETO/HDAC1 putative target genes.
Table S5. Genes with simultaneous H3K9me3 and AML1-ETO occupancy.
Table S6a. Functions significantly overrepresented among the 264 AML1-ETO/H3K9me3 putative target genes.
Table S6b. Pathways significantly overrepresented among the 264 AML1-ETO/H3K9me3 putative target genes (B-H p-value < 0.05).
Table S7. Unsupervised sequence analysis of AML1-ETO data by the PSCAN algorithm (JASPAR database).
Table S8. Unsupervised sequence analysis of AML1-ETO data by the PSCAN algorithm (TRANSFAC database).
Table S9. AML1 and SP1 TFBS observed in the different epigenetically modified AML1-ETO gene sets.
Table S10a. Functions significantly overrepresented among the 625 AML1-ETO target genes in which an SP1 motif was identified (B-H p-value < 0.05).
Table S10b. Pathways significantly overrepresented among the 625 AML1-ETO target genes in which an SP1 motif was identified (B-H p-value < 0.05).
ChIP-chip data normalization and analysis

Using the Agilent ChIP Analytics 1.3 software package (www.chem.agilent.com), we calculated, for each probe, the log ratio of the intensity in the IP-enriched channel to the intensity in the genomic DNA channel, and used the neighborhood error model to calculate confidence values for each spot on each array. This error model converts the intensity information from both channels into an X score that depends on both the absolute intensities and the background noise in each channel. X scores are assumed to be normally distributed, which allows a p value to be calculated for the enrichment ratio observed at each feature. p values were also calculated with a second model, which assumes that, for any range of signal intensities, IP:WCE (whole-cell extract) log ratios below 0.7 represent noise, since immunoprecipitation should enrich only specific signals.

Probes were marked as potentially bound if the p value of the average X score was less than 0.001. In addition, each probe set had to pass one of two filters: either two of the three probes in the set each had a single-probe p value < 0.005, or the center probe had a single-probe p value < 0.001 and one of the flanking probes had a single-probe p value < 0.1. Probe sets that met these criteria were collapsed into a single bound region when their center probes lay within 1000 bp of each other. These criteria were used to identify bound probes in the ChIP-enriched DNA obtained with each antibody in either HSPC-AE or HSPC control cells. For each ChIP-chip experiment (HA-AML1-ETO, HDAC1, H4Ac, or H3K9me3), probes common to HSPC-AE and HSPC control cells were removed. Each probe was annotated to the nearest transcript or gene ID using simple proximity heuristics.
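The probe-set selection rules above can be sketched in Python as follows. This is a minimal illustration, not the ChIP Analytics implementation: the `ProbeSet` record and its field names are hypothetical, nearby probe sets are assumed to be chained into regions (single linkage on center-probe positions), and the X-to-p conversion simply applies the stated normality assumption as a one-sided, upper-tail test.

```python
import math
from dataclasses import dataclass


@dataclass
class ProbeSet:
    # Hypothetical record for one three-probe set; field names are illustrative.
    chrom: str
    center_pos: int   # genomic coordinate of the center probe (bp)
    p_avg_x: float    # p value of the average X score of the probe set
    p_left: float     # single-probe p values: flanking, center, flanking
    p_center: float
    p_right: float


def x_to_p(x: float) -> float:
    """Upper-tail p value for an X score assumed to be standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def passes_filters(ps: ProbeSet) -> bool:
    """Average-X cutoff plus at least one of the two single-probe filters."""
    if ps.p_avg_x >= 0.001:
        return False
    singles = (ps.p_left, ps.p_center, ps.p_right)
    # Filter 1: at least two of the three probes have p < 0.005.
    two_of_three = sum(p < 0.005 for p in singles) >= 2
    # Filter 2: center probe p < 0.001 and at least one flanking probe p < 0.1.
    center_rule = ps.p_center < 0.001 and min(ps.p_left, ps.p_right) < 0.1
    return two_of_three or center_rule


def bound_regions(probe_sets, max_gap: int = 1000):
    """Collapse passing probe sets whose center probes lie within max_gap bp.

    Returns (chrom, first_center, last_center) tuples. Consecutive probe sets
    are chained, so a region may span more than max_gap overall (assumption).
    """
    passing = sorted(
        (ps for ps in probe_sets if passes_filters(ps)),
        key=lambda ps: (ps.chrom, ps.center_pos),
    )
    regions = []
    for ps in passing:
        last = regions[-1] if regions else None
        if last and last[0] == ps.chrom and ps.center_pos - last[2] <= max_gap:
            last[2] = ps.center_pos  # extend the current region
        else:
            regions.append([ps.chrom, ps.center_pos, ps.center_pos])
    return [tuple(r) for r in regions]
```

With these rules, two passing probe sets whose center probes are 800 bp apart fall into one region, while a third passing set 3.2 kb further on starts a new one. Probes shared by HSPC-AE and control cells could then be removed with a set difference on probe identifiers.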
The relationship between each probe and the nearby gene or transcript was defined (assuming a 10 kb window) as “promoter” (the probe lies upstream of the TSS), “inside” (the probe lies within the gene), or “divergent promoter” (the probe lies upstream of two genes transcribed in opposite directions). The filtered HSPC-AE gene lists derived from each experiment were then intersected to find the overlapping genes.

qChIP validation of ChIP-chip results

The oligonucleotides listed below were used to validate 7 AML1-ETO/HDAC1 binding regions and 5 AML1-ETO/H3K9me3 binding regions, as described in Materials and Methods. A baseline for AML1-ETO enrichment was calculated by qChIP at 3 negative promoters (ELA2, DLEU2, IRF2) taken from Gardini et al. [1]. The known AML1-ETO target gene CDKN2A was used as a positive control for the AML1-ETO, HDAC1 and H4Ac qChIPs. The heterochromatic SATa region was used as a positive control for the H3K9me3 and SUV39H1 qChIPs. Further information about the primers is available from SABiosciences (www.sabiosciences.com).

Gene symbol   Chromosome RefSeq   Primer reference    Assay position (from TSS)
MAPK1         NC_000022.9         GPH022094(+)01A      681
CTCF          NC_000016.8         GPH005067(+)01A      -18
MLLT3         NC_000009.10        GPH026139(+)01A      836
RPS19         NC_000019.8         GPH006636(+)01A     -399
AML1          NC_000021.7         GPH021919(-)01A     -743
SIRT1         NC_000010.9         GPH001660(+)01A     -393
YES1          NC_000018.8         GPH019642(+)01A     -387
CDKN2A        NC_000009.10        GPH026158(-)01A     -446
ELA2          NC_000019.8         GPH006142(+)01A      601
DLEU2         NC_000013.9         GPH017330(-)1A      -801
IRF2          NC_000004.10        GPH023603(-)1A      -515
SATa          NC_000001.10        GPH110005C(+)1A     -508
GSK3          NC_000019.8         GPH020330(-)1A      -420
OCLN          NC_000005.8         GPH010205(-)1A      -348
SOX18         NC_000020.10        GPH1022261(-)1A     -568

qRT-PCR validation of expression profiling

Reactions were performed as described in Materials and Methods. Each sample was run in triplicate. The mean value of the replicates for each sample was calculated using the qBase software package [2].
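The replicate averaging described above, followed by the efficiency-corrected relative quantification on which the qBase framework [2] is built, can be sketched as below. This is an illustration under assumptions, not the qBase code: an amplification efficiency of 2 (100%) is assumed, relative quantities are taken against the mean Cq across samples, and the sample names, reference gene and Cq values used in any example are hypothetical toy inputs.

```python
from statistics import geometric_mean, mean  # geometric_mean requires Python 3.8+
from typing import Dict, List


def mean_of_replicates(cqs: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the technical replicates (e.g. triplicate Cq values) of each sample."""
    return {sample: mean(values) for sample, values in cqs.items()}


def relative_quantities(mean_cqs: Dict[str, float], efficiency: float = 2.0) -> Dict[str, float]:
    """RQ = E ** (mean Cq across samples - sample Cq); E = 2 assumes 100% efficiency."""
    overall = mean(mean_cqs.values())
    return {sample: efficiency ** (overall - cq) for sample, cq in mean_cqs.items()}


def normalized_quantities(
    target_rq: Dict[str, float], reference_rqs: List[Dict[str, float]]
) -> Dict[str, float]:
    """NRQ = target RQ / geometric mean of the reference-gene RQs, per sample."""
    return {
        sample: rq / geometric_mean([ref[sample] for ref in reference_rqs])
        for sample, rq in target_rq.items()
    }
```

For example, a target whose mean Cq is 24 in one sample and 26 in another, with a hypothetical reference gene at Cq 20 in both, gives a fourfold difference in normalized quantity at 100% efficiency.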
Assay ID        Gene symbol   Target exons   Context sequence (the nt sequence surrounding the probe)   Dye
Hs00231079_m1   AML1          5              GTCGACTCTCAACGGCACCCGACCT                                  FAM
Hs00923894_m1   CDKN2A        2              GGTCCCTCAGACATCCCCGATTGAA                                  FAM
Hs00736972_m1   YES1          1              TCCTGCTGGTTTAACAGGTGGTGTT                                  FAM
Hs00180312_m1   MLLT3         9              ACCAACAACAACCAGATTCTTGAAG                                  FAM
Hs00902008_m1   CTCF          11             ACAGAACCAGCCAACAGCTATCATT                                  FAM
Hs01046830_m1   MAPK1         6              GCTGACTCCAAAGCTCTGGACTTAT                                  FAM
Hs01009006_m1   SIRT1         8              AGATTAGTAGGCGGCTTGATGGTAA                                  FAM
Hs00357218_g1   RPS19         1              CGGAGGCCGCACGATGCCTGGAGTT                                  FAM

Sequence analysis of AML1-ETO-binding regions

Sequence analysis of the DNA regions bound by transcription factors can be performed with bioinformatics approaches that yield different kinds of information. Supervised approaches search a group of sequences for predefined matrices and highlight significant enrichments, using a random set of sequences as a control. Their main limitations are that they explore only a defined set of matrices and that they require prior processing of the raw data to identify the DNA sequences bound by a transcription factor. Unsupervised prediction methods avoid both limitations, since they do not rely on predetermined matrices and do not require an arbitrary definition of specifically bound regions. DNA sequence determinants associated with AML1-ETO binding on the promoter array were therefore analyzed using both supervised and unsupervised prediction methods.

First, the gene data sets identified in the promoter-bias approach were analyzed with the oPOSSUM motif search system [3] to identify AML1 TFBS. For this analysis, the region from -5 kb to +5 kb around the TSS was scanned using the vertebrate-specific PSSMs in the JASPAR database [4], with parameters set so that a predicted binding site was retained only if it had a PSSM score of at least 85% and a human-mouse conservation of at least 70%.
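Thresholds such as the 85% PSSM score above are commonly expressed as a relative score: the raw log-odds score of a site rescaled between the lowest and highest scores the matrix can produce. The sketch below illustrates that idea on both strands; it is not oPOSSUM's code. The pseudocount, the uniform background, and the toy consensus matrix (loosely modeled on the RUNX/AML1 core site TGTGGT, not the actual JASPAR profile) are assumptions, and the human-mouse conservation filter is not modeled.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def counts_to_pssm(counts, pseudocount=0.25, background=0.25):
    """Convert a count matrix (rows A, C, G, T; one column per position) to log2-odds."""
    pssm = []
    for j in range(len(counts[0])):
        total = sum(row[j] for row in counts) + 4 * pseudocount
        pssm.append({
            base: math.log2((counts[i][j] + pseudocount) / total / background)
            for i, base in enumerate(BASES)
        })
    return pssm


def scan(sequence, pssm, min_relative=0.85):
    """Return (start, strand, relative score) for windows scoring >= min_relative."""
    best = sum(max(col.values()) for col in pssm)
    worst = sum(min(col.values()) for col in pssm)
    width = len(pssm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        # Score the forward window and its reverse complement.
        for strand, site in (("+", window), ("-", window.translate(COMPLEMENT)[::-1])):
            score = sum(col[base] for col, base in zip(pssm, site))
            relative = (score - worst) / (best - worst)
            if relative >= min_relative:
                hits.append((start, strand, relative))
    return hits
```

With this toy matrix a single mismatch already lowers the relative score to about 0.83, so only exact consensus matches pass the 85% cutoff; real JASPAR matrices are less sharply peaked.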
Second, to identify other TFBS targeted by complexes that contain the AML1-ETO fusion protein together with other DNA-binding factors, two unsupervised prediction methods were used: the STAMP [5] and PSCAN [6] algorithms. For the multiple alignment motif study with STAMP, the nucleotide sequences were extracted using the University of California Santa Cruz (UCSC) genome browser (human database release hg18, March 2006). The Pearson correlation coefficient was used as the column comparison metric, ungapped Smith-Waterman as the alignment method, iterative refinement as the multiple alignment strategy, and UPGMA as the tree-building algorithm on untrimmed PSAMs. Each input motif was compared with JASPAR [4] and TRANSFAC [7] using STAMP (Sandelin and Wasserman test) to identify potential similarity with other known TFBS. The resulting list (data not shown) was crossed with the list of known AML1- or ETO-interacting proteins downloaded from the STRING [8] and BioGRID [9] databases to select the putative AML1-ETO-complexed transcription factors (data not shown). TFBS information was also extracted with the PSCAN [6] web tool, analyzing the region 1 kb upstream of the TSS of each gene. Two lists of known TFBS were obtained, one using the JASPAR database (Table S7) and the other using TRANSFAC (Table S8).

References

1. Gardini, A., Cesaroni, M., Luzi, L., Okumura, A. J., Biggs, J. R., Minardi, S. P., Venturini, E., Zhang, D.-E., Pelicci, P. G. & Alcalay, M. AML1-ETO oncoprotein is directed to AML1 binding regions and co-localizes with AML1 and HEB on its targets. PLoS Genet 4, 1-12 (2008).
2. Hellemans, J., Mortier, G., De Paepe, A., Speleman, F. & Vandesompele, J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol 8, R19, doi:10.1186/gb-2007-8-2-r19 (2007).
3. Ho Sui, S. J., Mortimer, J. R., Arenillas, D. J., Brumm, J., Walsh, C. J., Kennedy, B. P. & Wasserman, W. W. oPOSSUM: identification of over-represented transcription factor binding sites in coexpressed genes. Nucleic Acids Res 33, 3154-3164 (2005).
4. Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W. & Lenhard, B. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32, D91-D94, doi:10.1093/nar/gkh012 (2004).
5. Mahony, S. & Benos, P. V. STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Res 35, W253-W258 (2007).
6. Zambelli, F., Pesole, G. & Pavesi, G. Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes. Nucleic Acids Res 37, W247-W252, doi:10.1093/nar/gkp464 (2009).
7. Wingender, E., Dietze, P., Karas, H. & Knüppel, R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res 24, 238-241 (1996).
8. Jensen, L. J., Kuhn, M., Stark, M., Chaffron, S., Creevey, C., Muller, J., Doerks, T., Julien, P., Roth, A., Simonovic, M., Bork, P. & von Mering, C. STRING 8: a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37, D412-D416 (2009).
9. Stark, C., Breitkreutz, B.-J., Reguly, T., Boucher, L., Breitkreutz, A. & Tyers, M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34, D535-D539 (2006).