Supplementary information, Celton et al.
Contents
Supplementary Figure 1. Point mutations and insertion of GATA2 locus in CG-SH
Supplementary Figure 2. Correlation between the mutational status of DNMT3A and the allele-specific
expression of GATA2.
Supplementary Figure 3. Validation of allele usage in AML patients expressing DNMT3A R882.
Supplementary Figure 4. DNA methylation differences from TCGA data.
Supplementary Figure 5. Allele usage of H19 in DNMT3A R882 mutants and linked to
hypermethylation.
Supplementary Figure 6. Positive and negative gene correlation network diagram for GATA2.
Supplementary Figure 7. Transcription factor binding site enrichment by community.
Supplemental References
Supplementary Table 1. Clinical characteristics of CG-SH cell line and AML patients.
Supplementary Table 2. Normal cord blood CD34+ cells characteristics.
Supplementary Table 3. Occurrence of allele-specific expression in non-NK-AML patients.
Supplementary Table 4. Over-represented TFBS motifs in GATA2-related gene network.
Supplementary Table 5. Known coding variants in GATA2 and their predicted effects.
Supplementary Figure 1 | Point mutations and insertion of GATA2 locus in CG-SH.
a, SNPs and DNA rearrangements in both alleles of GATA2 in the CG-SH cell line are shown.
Genomic co-ordinates (hg19) for the translational start (ATG), SNPs and the start and end of
the duplicated region are shown above wildtype allele 1 (silent). Red triangles represent
non-reference SNP alleles while white triangles match reference bases at these positions. The
expressed allele 2 also has a novel point mutation at position 128,200,682 (indicated) and a
duplication and reinsertion of part of the last intron and exon, marked by a red box at the
insertion point 128,199,926.
b, Validation of allele-specific expression within GATA2 in CG-SH. Chromatograms from the
direct sequencing of cDNA and gDNA fragments containing the two heterozygous SNPs, one novel
and one known (chr3: 128,200,682 and 128,199,662), confirm the allele-specific expression of
these variants (arrows).
Supplementary Figure 2 | Correlation between the mutational status of DNMT3A and
the allelic bias in expression of GATA2. Screenshots taken from the Integrative Genomics
Viewer (IGV) illustrate the read coverage of gDNA and cDNA at heterozygous SNP positions
from 4 NK-AML patients with wild type DNMT3A and 4 with the DNMT3A R882 mutation. A
decrease, but not complete loss, of allele-specific expression of GATA2 can be seen in
patients with the mutated form of DNMT3A (right column).
Supplementary Figure 3 | Validation of allele usage in AML patients expressing DNMT3A
R882. Chromatograms from the direct sequencing of gDNA and cDNA fragments containing
heterozygous SNPs (arrows) from 2 NK-AML patients with wild type DNMT3A and 2 with
the DNMT3A R882 mutation are shown. The results confirm the RNA-seq data showing
allele-specific expression of SNPs in DNMT3A wild type patients and a decrease of allele-specific
expression in DNMT3A R882 patients.
Supplementary Figure 4 | TCGA DNA methylation data for DNMT3A mutants. Data from
TCGA DNA methylation arrays (Illumina Infinium HumanMethylation450 BeadChip)
across GALNT7, H19 and ESRP2 are shown. Red lines represent data from 21 patients with
wildtype DNMT3A and red lines show data from 7 patients with DNMT3A R882 mutations. Each
point represents the average signal across patients at one probe, and “*” marks positions of probes with
a significant difference (p-value < 0.001) between DNMT3A wildtype and mutant patients. Exons for each
gene are shown as green rectangles and black boxes represent CpG islands defined in the
UCSC human genome browser (hg19). ESRP2 was identified previously as having significant
methylation differences in patients with DNMT3A mutations compared to wildtype (ref. 1).
Supplementary Figure 5 | Allele usage of H19 in DNMT3A R882 mutants and linked to
hypermethylation.
a, Annotated position plot of allele usage for known SNP positions in H19 is shown for either
the reference allele (green box) or alternate allele (red box) for the AML cell line CG-SH or adult
NK-AML patients expressing either the wild type (top lines) or mutated (bottom lines) form of
DNMT3A. The height of boxes is representative of the number of reads mapping and read totals
are shown for each allele when both have  15% of the total. SNPs are identified by the position
in H19 and by the dbSNP rsID number.
b, Chromatograms from the direct sequencing of cDNA fragments of the H19 gene containing 2
heterozygous SNPs (arrows) from CG-SH cells treated without (left) and with DAC (0.1 µM for 3
days) (right). The second allele of H19 is expressed after hypomethylation treatment.
Supplementary Figure 6a | Positive and negative GATA2 co-expression network.
The positive GATA2 expression network was built from the 30 genes most correlated with GATA2
(indicated by a *). Some genes of potential biological relevance to AML are indicated in
boxes (e.g. WT1, MSI2 and DNMT3B). Colouring is based on the sub-network discovery
algorithm of Blondel et al. (ref. 2) as implemented in Gephi software (see Material and methods).
Supplementary Figure 6b | Negative GATA2 co-expression network.
A network of the 30 genes whose expression was most anti-correlated with GATA2 (indicated
by * within the node and a black line connecting to GATA2) is shown. For both networks, the
thickness of the lines connecting nodes is proportional to the absolute value of the correlations
(and anti-correlations with GATA2). Colouring of node groups was performed as for figure 6a.
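The gene selection described for both networks can be sketched as follows. This is a minimal illustration only: it assumes Pearson correlation on per-sample expression values, which the legend does not state, and the function names, gene labels and toy values are hypothetical. The community colouring step (Blondel et al., as run in Gephi) is not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)


def rank_partners(expression, target="GATA2", top_n=30):
    """Return (most correlated, most anti-correlated) genes as (gene, r) lists."""
    target_profile = expression[target]
    scored = [(gene, pearson(target_profile, profile))
              for gene, profile in expression.items() if gene != target]
    scored.sort(key=lambda item: item[1], reverse=True)
    positive = [item for item in scored[:top_n] if item[1] > 0]
    # Walk the tail backwards so the strongest anti-correlation comes first.
    negative = [item for item in reversed(scored[-top_n:]) if item[1] < 0]
    return positive, negative


def edge_width(r, scale=5.0):
    """Line thickness proportional to |r| (the scale factor is an assumption)."""
    return abs(r) * scale


# Hypothetical toy data: one expression value per sample for each gene.
toy = {
    "GATA2": [1.0, 2.0, 3.0, 4.0],
    "GENE_A": [2.0, 4.0, 6.0, 8.0],  # rises with GATA2
    "GENE_B": [4.0, 3.0, 2.0, 1.0],  # falls as GATA2 rises
    "GENE_C": [1.0, 3.0, 2.0, 4.0],  # partially correlated
}
positive, negative = rank_partners(toy, top_n=2)
```

With real data, `expression` would hold one profile per gene across the patient samples and `top_n` would stay at 30, as in the legend.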
Supplementary Figure 7 | Transcription factor binding site enrichment by community.
Gene sets were generated from the sub-networks identified within the positively correlated
GATA2 network (using the algorithm of Blondel et al., ref. 2).
RefSeq identifiers for these gene sets were used as input to the Pscan software (ref. 3) to identify
over-represented human transcription factor binding sites (TFBS) in co-regulated genes. The
region selected for each gene was -1000 to +0 and JASPAR was used as the source of
transcription factor binding matrices. For each of the 7 sub-networks identified, enriched binding
sites for transcription factors with a p-value of less than 0.05 are shown.
Supplemental References
1. Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, Larson DE, et al. DNMT3A
mutations in acute myeloid leukemia. New England Journal of Medicine 2010; 363(25):
2424-2433.
2. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in
large networks. Journal of Statistical Mechanics: Theory and Experiment 2008; 2008(10):
P10008.
3. Zambelli F, Pesole G, Pavesi G. Pscan: finding over-represented transcription factor
binding site motifs in sequences from co-regulated or co-expressed genes. Nucleic Acids
Research 2009; 37(suppl 2): W247-W252.
2. Supplementary Tables
Supplementary Table 1 | Clinical characteristics of CG-SH cell line and AML patients.
Data include the FAB (French-American-British) classification of AML and the mutational status of
the nucleophosmin gene (NPM1), internal tandem duplications of the FLT3 gene (FLT3-ITD) and
point mutations in DNMT3A (R882H). The average maximum allele bias seen at heterozygous sites
within GATA2 is noted for each sample. Among AML patients, 36 are informative (with
heterozygous SNPs, high GATA2 expression or exon coverage above the coverage thresholds
used) and are highlighted in dark.
Supplementary Table 2 | Normal cord blood CD34+ cells characteristics. Details of normal
cord blood cells collected for RNA-seq are shown. Sex and volume of blood obtained from each
donor are indicated along with total mononuclear cell (MNC) counts. Percentages of CD34+
cells pre and post FACS sorting are indicated.
Supplementary Table 3 | Occurrence of allele-specific expression in non-NK-AML
patients. Non-NK-AML patient data is shown for 15 samples including information about the
karyotype, presence of an R882 mutation in DNMT3A and an assessment of the presence of
allele-specific expression of GATA2 in each sample. Samples were considered to have allele-specific expression of GATA2 (“1” in the ASE column) if a known SNP (present in dbSNP135)
that was heterozygous by exome sequencing (minimum coverage of 10 reads with >3 reads for
each allele) showed 80% usage of a single allele. Patient samples where GATA2 gene
expression was too low to assess allelic usage or where only homozygous SNPs were present
were reported as non-informative (NI) in the table.
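The calling rule described above can be sketched for a single SNP as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the comparator on the 80% threshold is taken here as "at least 80%", the cut-off for "expression too low" is not given in the legend and is stood in for by zero RNA-seq reads, and the function and parameter names are hypothetical. Membership of the SNP in dbSNP135 is assumed to have been checked beforehand.

```python
def call_gata2_ase(exome_counts, rna_counts,
                   min_exome_coverage=10, min_reads_per_allele=3,
                   ase_fraction=0.80):
    """Classify one SNP as 1 (ASE), 0 (no ASE) or "NI" (non-informative).

    exome_counts and rna_counts are (reference_reads, alternate_reads) tuples.
    """
    ref_dna, alt_dna = exome_counts
    # Heterozygous by exome sequencing: minimum coverage of 10 reads
    # with more than 3 reads for each allele.
    heterozygous = (ref_dna + alt_dna >= min_exome_coverage
                    and ref_dna > min_reads_per_allele
                    and alt_dna > min_reads_per_allele)

    ref_rna, alt_rna = rna_counts
    total_rna = ref_rna + alt_rna
    # Assumption: zero RNA-seq reads stands in for "expression too low".
    if not heterozygous or total_rna == 0:
        return "NI"

    # Assumption: "at least 80%" usage of a single allele counts as ASE.
    major_fraction = max(ref_rna, alt_rna) / total_rna
    return 1 if major_fraction >= ase_fraction else 0


# Hypothetical read counts for four SNPs.
biased = call_gata2_ase((12, 10), (95, 5))        # 95% from one allele
balanced = call_gata2_ase((12, 10), (60, 40))     # both alleles used
homozygous = call_gata2_ase((20, 2), (95, 5))     # fails the exome criteria
unexpressed = call_gata2_ase((12, 10), (0, 0))    # no RNA-seq reads
```

A sample with several heterozygous SNPs would need a rule for combining per-SNP calls; the legend does not specify one, so none is shown.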
Supplementary Table 4 | Over-represented TFBS motifs in GATA2-related gene network.
The proximal promoter regions of all genes positively (in red) and negatively (in green) correlated
to GATA2 or included in the global network were scanned directly for predicted transcription
factor binding sites (TFBS) using matrices from the JASPAR database. The median and average
expression values (RPKM) of all over-represented transcription factors in normal karyotype AML
patients are indicated. Those transcription factors that are considered non-expressed (RPKM
<1) in normal karyotype AML are shown in grey boxes.
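The expression summary reported in this table can be sketched as below. The legend does not say whether the RPKM < 1 cut-off is applied to the median or to the average, so this sketch assumes the median; the function name, factor labels and values are hypothetical.

```python
from statistics import mean, median


def summarize_tf_expression(rpkm_by_tf, expressed_threshold=1.0):
    """Return median, mean and an expressed flag for each transcription factor."""
    summary = {}
    for tf, values in rpkm_by_tf.items():
        med = median(values)
        summary[tf] = {
            "median": med,
            "mean": mean(values),
            # Assumption: a factor is non-expressed when its median RPKM is < 1.
            "expressed": med >= expressed_threshold,
        }
    return summary


# Hypothetical RPKM values across three patients.
toy_rpkm = {
    "TF_A": [0.25, 0.5, 0.75],
    "TF_B": [2.0, 4.0, 9.0],
}
tf_summary = summarize_tf_expression(toy_rpkm)
```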
Supplementary Table 5 | Known coding variants in GATA2 and their predicted effects.
Impact of mutations in GATA2 predicted by the SIFT (Sorting Intolerant From Tolerant) and
PolyPhen (Polymorphism Phenotyping) algorithms. For each SNP, the genomic position (hg19)
is shown along with the dbSNP reference SNP identifier. Minor allele frequency and validation
status reported in dbSNP, where available, are also shown. The SNPs predicted as probably
damaging by the PolyPhen software or deleterious by the SIFT software are marked in pink and
those predicted as possibly damaging by the PolyPhen software are marked in light pink.