Supplementary Methods and Tables
Supplementary Methods

ChIP-chip data normalization and analysis

qChIP validation of ChIP-chip results

qRT-PCR validation of expression profiling

Sequence analysis of AML1-ETO binding regions
Supplementary Tables

Table S1. AML1-ETO target genes identified by ChIP-chip on the
Agilent Human Proximal Promoter 244k Array set.

Table S2a. Functions significantly overrepresented on the 1168 AML1-ETO putative target genes (B-H p-value <0.05).

Table S2b. Pathways significantly overrepresented on the 1168 AML1-ETO putative target genes (B-H p-value <0.05).

Table S3. Genes found to be AML1-ETO and HDAC1 co-occupied and
to present H4 deacetylation on HSPC-AE.

Table S4a. Functions significantly overrepresented on the 103 AML1-ETO/HDAC1 putative target genes.

Table S4b. Pathways overrepresented on the 103 AML1-ETO/HDAC1
putative target genes.

Table S5. Genes with simultaneous presence of H3K9me3 and AML1-ETO occupancy.

Table S6a. Functions significantly overrepresented on the 264 AML1-ETO/H3K9me3 putative target genes.

Table S6b. Pathways significantly overrepresented on the 264 AML1-ETO/H3K9me3 putative target genes (B-H p-value <0.05).

Table S7: Unsupervised sequence analysis of AML1-ETO data by
PSCAN algorithm.

Table S8: Unsupervised sequence analysis of AML1-ETO data by
PSCAN algorithm.

Table S9: AML1 and SP1 TFBS observed on the different AML1-ETO
epigenetically modified gene sets.

Table S10a: Functions significantly overrepresented on the 625 AML1-ETO target genes in which an SP1 motif has been identified (B-H p-value
<0.05).

Table S10b: Pathways significantly overrepresented on the 625 AML1-ETO target genes in which an SP1 motif has been identified (B-H p-value
<0.05).
ChIP-chip data normalization and analysis
Using the Agilent ChIP Analytics 1.3 software package (www.chem.agilent.com), we calculated the log of the ratio of intensity in the
IP-enriched channel to intensity in the genomic DNA channel for each probe
and used the neighborhood error model to calculate confidence values for
each spot on each array. This error model converts the intensity information in
both channels to an X score which is dependent on both the absolute value of
intensities and background noise in each channel. The X scores are assumed
to be normally distributed, thus enabling a p value to be calculated for the
enrichment ratio observed at each feature. p values were also calculated
based on a second model, assuming that, for any range of signal intensities,
IP:WCE log ratios below 0.7 represent noise, as the immunoprecipitation
should only result in enrichment of specific signals.
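The per-probe calculation described above can be sketched as follows. This is a minimal illustration, not the ChIP Analytics implementation: the exact form of the X score, the base-10 logarithm, the noise fraction `f`, and all function names are assumptions made for the example. Only the overall flow (log ratio, a noise-scaled X score, a p value from a normal distribution) comes from the text.

```python
import math


def log_ratio(ip: float, wce: float) -> float:
    """Log (base 10, an assumption) of IP-enriched over genomic DNA intensity."""
    return math.log10(ip / wce)


def x_score(ip: float, wce: float, sigma_ip: float, sigma_wce: float,
            f: float = 0.1) -> float:
    """Hypothetical X score: the channel difference divided by a noise term
    that combines per-channel background noise (sigma_*) with an
    intensity-dependent part (f times the absolute intensities)."""
    noise = math.sqrt(sigma_ip ** 2 + sigma_wce ** 2 + f ** 2 * (ip ** 2 + wce ** 2))
    return (ip - wce) / noise


def p_value(x: float) -> float:
    """One-sided upper-tail p value, taking X as standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2))
```

For a probe with intensities 1000 (IP) and 100 (genomic DNA), `p_value(x_score(1000.0, 100.0, 20.0, 20.0))` would give the enrichment p value under these assumptions.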
Probes were marked as potentially bound if the p value of average X
was less than 0.001. Furthermore, probe sets were required to pass one of
two additional filters: two of the three probes in a probe set must each have
single-probe p values < 0.005 or the center probe in the probe set must have
a single-probe p value < 0.001 and one of the flanking probes must have a
single-probe p value < 0.1. Individual probe sets that fulfilled these criteria and
were spaced closely together were collapsed into bound regions if the center
probes of the probe sets were within 1000 bp of each other.
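The filtering and collapsing rules above can be written down compactly. The sketch below uses the thresholds stated in the text; the `ProbeSet` record, its field names, and the restriction to a single chromosome are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class ProbeSet(NamedTuple):
    center_pos: int   # position of the center probe (bp), one chromosome assumed
    p_avg_x: float    # p value of the average X score
    p_left: float     # single-probe p value, left flanking probe
    p_center: float   # single-probe p value, center probe
    p_right: float    # single-probe p value, right flanking probe


def is_bound(ps: ProbeSet) -> bool:
    """Apply the average-X cutoff plus one of the two additional filters."""
    if not ps.p_avg_x < 0.001:
        return False
    singles = (ps.p_left, ps.p_center, ps.p_right)
    two_of_three = sum(p < 0.005 for p in singles) >= 2
    center_and_flank = ps.p_center < 0.001 and min(ps.p_left, ps.p_right) < 0.1
    return two_of_three or center_and_flank


def collapse(probe_sets: List[ProbeSet], max_gap: int = 1000) -> List[Tuple[int, int]]:
    """Merge bound probe sets whose center probes lie within max_gap bp."""
    regions: List[List[int]] = []
    for pos in sorted(ps.center_pos for ps in probe_sets if is_bound(ps)):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current bound region
        else:
            regions.append([pos, pos])    # start a new bound region
    return [(start, end) for start, end in regions]
```

Each returned tuple gives the first and last center-probe position of one bound region.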
This algorithm selection criterion identified bound probes for ChIP-enriched DNA with each antibody on either HSPC-AE or HSPC control cells.
For each ChIP-chip experiment (HA-AML1-ETO, HDAC1, H4Ac, or
H3K9me3), probes common to HSPC-AE and HSPC control were removed.
Each probe was annotated to the nearest transcript or gene ID using simple
proximity heuristics. The relationship between each probe and the nearby
gene or transcript identified is defined (assuming a 10kb window) as:
“promoter” (when the probe is upstream of the TSS), “inside” (when the probe is
inside the gene), or as “divergent promoter” (when the probe is upstream of
two genes that are transcribed in opposite directions). The HSPC-AE filtered
gene lists derived from each experiment were crossed to find the overlapping
genes.
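As a rough sketch of the control filtering, annotation, and list crossing described above, the following assumes probes and genes are simple in-memory records on one chromosome. The `Gene` record, the function names, and the rule that "inside" takes precedence over "promoter" are assumptions for illustration; the text does not specify how ties were resolved.

```python
from typing import Iterable, List, NamedTuple, Optional, Set

WINDOW = 10_000  # bp, the window stated in the text


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def is_upstream(probe: int, gene: Gene) -> bool:
    """True if the probe lies within WINDOW bp upstream of the gene's TSS."""
    if gene.strand == "+":                         # TSS is gene.start
        return gene.start - WINDOW <= probe < gene.start
    return gene.end < probe <= gene.end + WINDOW   # TSS is gene.end


def annotate(probe: int, genes: List[Gene]) -> Optional[str]:
    """Classify a probe as inside, promoter, or divergent promoter."""
    if any(g.start <= probe <= g.end for g in genes):
        return "inside"
    strands = {g.strand for g in genes if is_upstream(probe, g)}
    if len(strands) == 2:                          # upstream of genes on both strands
        return "divergent promoter"
    return "promoter" if strands else None


def specific_probes(ae: Set[str], control: Set[str]) -> Set[str]:
    """Remove probes that are also bound in the HSPC control."""
    return ae - control


def overlapping_genes(*gene_lists: Iterable[str]) -> Set[str]:
    """Cross the gene lists from several experiments."""
    return set.intersection(*(set(g) for g in gene_lists))
```

For example, `overlapping_genes(aml1_eto_genes, hdac1_genes)` would return the genes present in both lists, where the two argument names stand for hypothetical per-experiment gene lists.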
qChIP validation of ChIP-chip results
The following oligonucleotides were used to validate 7 AML1-ETO/HDAC1 binding regions and 5 AML1-ETO/H3K9me3 binding regions, as described in
Materials and Methods. A baseline for AML1-ETO enrichment was calculated
by qChIP in 3 negative promoters (ELA2, DLEU2, IRF2) taken from reference [1]. The
AML1-ETO known target gene CDKN2A was used as positive control for
AML1-ETO, HDAC1 and H4Ac qChIPs. The heterochromatic SATa region
was used as positive control for H3K9me3 and SUV39H1 qChIPs. Further
information about the primers can be found at SABiosciences
(www.sabiosciences.com).
Gene symbol  Chromosome RefSeq  Primer reference  Assay position (from TSS)
MAPK1        NC_000022.9        GPH022094(+)01A   681
CTCF         NC_000016.8        GPH005067(+)01A   –18
MLLT3        NC_000009.10       GPH026139(+)01A   836
RPS19        NC_000019.8        GPH006636(+)01A   –399
AML1         NC_000021.7        GPH021919(–)01A   –743
SIRT1        NC_000010.9        GPH001660(+)01A   –393
YES1         NC_000018.8        GPH019642(+)01A   –387
CDKN2A       NC_000009.10       GPH026158(–)01A   –446
ELA2         NC_000019.8        GPH006142(+)01A   601
DLEU2        NC_000013.9        GPH017330(–)1A    –801
IRF2         NC_000004.10       GPH023603(–)1A    –515
SATa         NC_000001.10       GPH110005C(+)1A   -508
GSK3         NC_000019.8        GPH020330(-)1A    -420
OCLN         NC_000005.8        GPH010205(-)1A    -348
SOX18        NC_000020.10       GPH1022261(-)1A   -568
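The comparison of a qChIP signal against the negative-promoter baseline described above can be illustrated with a common percent-of-input calculation. This is a sketch under stated assumptions, not the procedure from Materials and Methods: the percent-of-input formula, the assumed PCR efficiency of 2 per cycle, the use of the mean of the negative promoters as the baseline, and the function names are all choices made for the example.

```python
import math
from statistics import mean


def percent_input(ct_ip: float, ct_input: float, input_fraction: float = 0.01) -> float:
    """Percent-of-input signal, assuming a doubling of product per cycle."""
    # Shift the input Ct to what 100% of the chromatin would have given.
    ct_input_full = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_full - ct_ip)


def fold_over_baseline(target: float, negatives: list) -> float:
    """Signal at a target region relative to the mean of negative promoters."""
    return target / mean(negatives)
```

Under these assumptions, the three negative promoters (ELA2, DLEU2, IRF2) would supply the `negatives` list, and each validated region would be reported as a fold change over that baseline.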
qRT-PCR validation of expression profiling
Reactions were performed as described in Materials and Methods.
Each sample was run in triplicate. The mean value of replicates for each
sample was calculated using the qBase software package [2].
Assay ID       Gene symbol  Target exons  Context sequence           Dye
Hs00231079_m1  AML1         5             GTCGACTCTCAACGGCACCCGACCT  FAM
Hs00923894_m1  CDKN2A       2             GGTCCCTCAGACATCCCCGATTGAA  FAM
Hs00736972_m1  YES1         1             TCCTGCTGGTTTAACAGGTGGTGTT  FAM
Hs00180312_m1  MLLT3        9             ACCAACAACAACCAGATTCTTGAAG  FAM
Hs00902008_m1  CTCF         11            ACAGAACCAGCCAACAGCTATCATT  FAM
Hs01046830_m1  MAPK1        6             GCTGACTCCAAAGCTCTGGACTTAT  FAM
Hs01009006_m1  SIRT1        8             AGATTAGTAGGCGGCTTGATGGTAA  FAM
Hs00357218_g1  RPS19        1             CGGAGGCCGCACGATGCCTGGAGTT  FAM
Context sequence: the nt seq surrounding the probe.
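The replicate averaging mentioned for the qRT-PCR reactions amounts to a per-sample mean over the triplicate. The sketch below is a plain-Python illustration, not the qBase software; the function name and the `max_sd` quality cutoff are hypothetical.

```python
from statistics import mean, stdev


def summarize_triplicate(cts, max_sd=0.5):
    """Return (mean Ct, sample SD, ok flag) for one sample's replicates.

    max_sd is a hypothetical quality cutoff, not a value from the text.
    """
    m, sd = mean(cts), stdev(cts)
    return m, sd, sd <= max_sd
```

A call such as `summarize_triplicate([24.0, 24.5, 25.0])` returns the mean Ct together with the replicate spread, which makes outlying wells easy to spot before relative quantification.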
Sequence analysis of AML1-ETO binding regions
Sequence analysis of the DNA regions bound by transcription factors
can be performed through bioinformatics approaches that yield different kinds
of information. Supervised approaches search for the presence of defined
matrices within a group of sequences and highlight significant enrichments
using a random set of sequences as a control. The main limitations of these
methods are that they only explore a defined set of matrices and they require
previous manipulations of raw data to identify DNA sequences bound by a
transcription factor. Unsupervised prediction methods are extremely powerful,
since they do not rely on predetermined matrices and do not require an
arbitrary definition of specifically bound regions. Thus, DNA sequence
determinants associated with AML1-ETO binding on the promoter array were
analyzed using both supervised and unsupervised prediction methods.
First, gene data sets identified in the promoter-bias approach were
analyzed through the oPOSSUM motif search system to identify AML1
TFBS [3]. For this analysis, an area of –5 kb to +5 kb around the TSS was
scanned using the vertebrate-specific PSSMs in the JASPAR database [4],
setting the parameters so that for a predicted binding site to be considered as
such it had to have a PSSM score of >85% and a minimum human-mouse
conservation of 70%.
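To make the PSSM score cutoff concrete, the sketch below scans a sequence with a position-specific scoring matrix and keeps windows whose relative score reaches 85%, treating that value as a minimum (an assumption). The count matrix is a made-up toy, not a JASPAR profile, and the pseudocount, background frequency, and function names are likewise assumptions. The human-mouse conservation filter is not modeled here.

```python
import math

# Hypothetical 4-position count matrix with consensus TGAT; NOT a JASPAR profile.
COUNTS = {
    "A": [0, 0, 9, 1],
    "C": [1, 0, 0, 0],
    "G": [0, 10, 1, 0],
    "T": [9, 0, 0, 9],
}


def to_pssm(counts, pseudo=0.5, background=0.25):
    """Convert counts to log-odds scores, one dict per motif position."""
    width = len(counts["A"])
    pssm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudo
        pssm.append({b: math.log2((counts[b][i] + pseudo) / total / background)
                     for b in "ACGT"})
    return pssm


def relative_scores(seq, pssm):
    """Yield (offset, relative score in [0, 1]) for every window of seq."""
    lowest = sum(min(col.values()) for col in pssm)
    highest = sum(max(col.values()) for col in pssm)
    width = len(pssm)
    for i in range(len(seq) - width + 1):
        raw = sum(pssm[j][seq[i + j]] for j in range(width))
        yield i, (raw - lowest) / (highest - lowest)


def predicted_sites(seq, pssm, threshold=0.85):
    """Keep windows whose relative score is at least the threshold."""
    return [(i, s) for i, s in relative_scores(seq, pssm) if s >= threshold]
```

The input sequence is expected to contain only uppercase A, C, G, and T; handling of ambiguous bases and of the reverse strand is left out of this sketch.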
Second, to identify other TFBS targeted by complexes that include the
AML1-ETO fusion protein and other DNA-binding factors, two unsupervised
prediction methods were used, namely, the STAMP [5] and PSCAN [6] algorithms.
For the multiple alignment motif study using the STAMP algorithm, the
nucleotide sequences were extracted using the University of California Santa
Cruz genome browser (UCSC, human database release Hg18, March 2006).
The Pearson correlation coefficient was used as a column comparison metric,
ungapped Smith-Waterman as the alignment method, iterative refinement as
a multiple alignment strategy, and UPGMA as a tree-building algorithm on
untrimmed PSAMs. Each input motif was compared with JASPAR [4] and
TRANSFAC [7] using the STAMP algorithm (Sandelin and Wasserman test) to
recognize potential similarity with other known TFBS. The list obtained (data
not shown) was crossed with the list of known AML1 or ETO interacting
proteins downloaded from the STRING database [8] and BioGRID database [9] to
select the putative AML1-ETO-complexed transcription factors (data not
shown).
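The column comparison metric named above, the Pearson correlation coefficient, can be illustrated on two motifs given as lists of base-frequency columns (A, C, G, T). The alignment step below is a simplified ungapped sliding comparison written for this example, not STAMP's Smith-Waterman implementation; the function names and the `min_overlap` parameter are assumptions.

```python
import math


def pearson(col_a, col_b):
    """Pearson correlation between two 4-element base-frequency columns."""
    mean_a, mean_b = sum(col_a) / 4, sum(col_b) / 4
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:      # a uniform column carries no signal
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_ungapped_alignment(m1, m2, min_overlap=2):
    """Slide m2 along m1 without gaps; return (best summed score, offset)."""
    best_score, best_offset = -math.inf, 0
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        score = sum(pearson(m1[i + offset], col)
                    for i, col in enumerate(m2)
                    if 0 <= i + offset < len(m1))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset
```

The offset is the position in `m1` at which the first column of `m2` is placed; summing column correlations over the overlap gives a simple similarity score between the two motifs.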
The TFBS information was also extracted using the PSCAN [6] web tool,
analyzing an area of 1 kb upstream of the TSS of each gene. Two lists of known
TFBS were obtained, one using the JASPAR database (Table S7) and the
other using TRANSFAC (Table S8).
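The 1 kb upstream region and an over-representation test of the kind used for such promoter sets can be sketched as follows. The coordinate convention (0-based, half-open intervals), the z-test formulation, and the function names are assumptions made for illustration; this is not a description of the PSCAN web tool's internals.

```python
import math


def upstream_window(tss: int, strand: str, length: int = 1000):
    """Half-open [start, end) interval covering `length` bp upstream of the TSS."""
    if strand == "+":
        return max(0, tss - length), tss
    return tss + 1, tss + 1 + length   # upstream lies at higher coordinates


def z_test(sample_scores, background_mean: float, background_sd: float):
    """Compare the mean motif score of a promoter set with a background.

    Returns (z, one-sided p value) under a normal approximation.
    """
    n = len(sample_scores)
    sample_mean = sum(sample_scores) / n
    z = (sample_mean - background_mean) / (background_sd / math.sqrt(n))
    return z, 0.5 * math.erfc(z / math.sqrt(2))
```

Here `sample_scores` would hold one best-match motif score per promoter in the gene set, and the background mean and standard deviation would come from all promoters scored with the same matrix.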
References
1. Gardini A., C. M., Luzi L., Okumura A.J., Biggs J.R., Minardi S.P., Venturini
E., Zhang D-E., Pelicci P.G. & Alcalay M. AML1-ETO oncoprotein is directed
to AML1 binding regions and co-localizes with AML1 and HEB on its targets.
PLOS Genet 4, 1-12 (2008).
2. Hellemans, J., Mortier, G., De Paepe, A., Speleman, F. & Vandesompele, J.
qBase relative quantification framework and software for management and
automated analysis of real-time quantitative PCR data. Genome Biol 8, R19,
doi:10.1186/gb-2007-8-2-r19 (2007).
3. Shannan J. Ho Sui, H. R. M., David J. Arenillas, Jochen Brumm, Christopher
J. Walsh, Brian P. Kennedy and Wyeth W. Wasserman. oPOSSUM:
identification of over-represented transcription factor binding sites in
co-expressed genes. Nucleic Acids Res 33, 3154-3164 (2005).
4. Sandelin, A., Alkema, W., Engstrom, P., Wasserman, W. W. & Lenhard, B.
JASPAR: an open-access database for eukaryotic transcription factor binding
profiles. Nucleic Acids Res 32, D91-94, doi:10.1093/nar/gkh012 (2004).
5. Benos, S. M. a. P. V. STAMP: a web tool for exploring DNA-binding motif
similarities. Nucleic Acids Res 35, W253-W258 (2007).
6. Zambelli, F., Pesole, G. & Pavesi, G. Pscan: finding over-represented
transcription factor binding site motifs in sequences from co-regulated or
co-expressed genes. Nucleic Acids Res 37, W247-252, doi:10.1093/nar/gkp464
(2009).
7. E. Wingender, P. D., H. Karas and R. Knüppel. TRANSFAC: a database on
transcription factors and their DNA binding sites. Nucleic Acids Res 24,
238-241 (1996).
8. Lars J. Jensen, M. K., Manuel Stark, Samuel Chaffron, Chris Creevey, Jean
Muller, T. D., Philippe Julien, Alexander Roth, Milan Simonovic, P. B. &
C. v. Mering. STRING 8 - a global view on proteins and their functional
interactions in 630 organisms. Nucleic Acids Res 37, D412-D416 (2009).
9. Chris Stark, B.-J. B., Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz and
Mike Tyers. BioGRID: a general repository for interaction datasets. Nucleic
Acids Res 34, D535-D539 (2006).