Supplementary information, Celton et al.

Contents

Supplementary Figure 1. Point mutations and insertion of GATA2 locus in CG-SH.
Supplementary Figure 2. Correlation between the mutational status of DNMT3A and the allele-specific expression of GATA2.
Supplementary Figure 3. Validation of allele usage in AML patients expressing DNMT3A R882.
Supplementary Figure 4. DNA methylation differences from TCGA data.
Supplementary Figure 5. Allele usage of H19 in DNMT3A R882 mutants and linked to hypermethylation.
Supplementary Figure 6. Positive and negative gene correlation network diagram for GATA2.
Supplementary Figure 7. Transcription factor binding site enrichment by community.
Supplemental References
Supplementary Table 1. Clinical characteristics of CG-SH cell line and AML patients.
Supplementary Table 2. Normal cord blood CD34+ cells characteristics.
Supplementary Table 3. Occurrence of allele-specific expression in non-NK-AML patients.
Supplementary Table 4. Over-represented TFBS motifs in GATA2-related gene network.
Supplementary Table 5. Known coding variants in GATA2 and their predicted effects.

Supplementary Figure 1 | Point mutations and insertion of GATA2 locus in CG-SH.
a, SNPs and DNA rearrangements in both alleles of GATA2 in the CG-SH cell line are shown. Genomic coordinates (hg19) for the translational start (ATG), the SNPs, and the start and end of the duplicated region are shown above wild-type allele 1 (silent). Red triangles represent non-reference SNP alleles, while white triangles match reference bases at these positions. The expressed allele 2 also has a novel point mutation at position 128,200,682 (indicated) and a duplication and reinsertion of part of the last intron and exon, marked by a red box at the insertion point 128,199,926.
b, Validation of allele-specific expression within GATA2 in CG-SH.
Chromatograms from the direct sequencing of cDNA and gDNA fragments containing the two heterozygous novel and known SNPs (chr3: 128,200,682 and 128,199,662) confirm the allele-specific expression of these variants (arrows).

Supplementary Figure 2 | Correlation between the mutational status of DNMT3A and the allelic bias in expression of GATA2.
Screenshots taken from the Integrative Genomics Viewer (IGV) illustrate the read coverage of gDNA and cDNA at heterozygous SNP positions from 4 NK-AML patients with wild-type DNMT3A and 4 with the DNMT3A R882 mutation. A decrease, but not complete loss, of allele-specific expression of GATA2 can be seen in patients with the mutated form of DNMT3A (right column).

Supplementary Figure 3 | Validation of allele usage in AML patients expressing DNMT3A R882.
Chromatograms from the direct sequencing of gDNA and cDNA fragments containing heterozygous SNPs (arrows) from 2 NK-AML patients with wild-type DNMT3A and 2 with the DNMT3A R882 mutation are shown. The results confirm the RNA-seq data, showing allele-specific expression of SNPs in DNMT3A wild-type patients and a decrease of allele-specific expression in DNMT3A R882 patients.

Supplementary Figure 4 | TCGA DNA methylation data for DNMT3A mutants.
DNA methylation status across GALNT7, H19 and ESRP2, taken from TCGA DNA methylation arrays (Illumina Infinium HumanMethylation450 BeadChip), is shown. Red lines represent data from 21 patients with wild-type DNMT3A and red lines show data from 7 patients with DNMT3A R882 mutations. Each point represents the average signal for patients at 1 probe, and “*” marks the positions of probes with a significant difference (p-value < 0.001) between DNMT3A wild-type and mutant patients. Exons for each gene are shown as green rectangles, and black boxes represent CpG islands defined in the UCSC human genome browser (hg19). ESRP2 was identified previously as having significant methylation differences in patients with DNMT3A mutations compared to wild type¹.

Supplementary Figure 5 | Allele usage of H19 in DNMT3A R882 mutants and linked to hypermethylation.
a, Annotated position plot of allele usage for known SNP positions in H19, shown for either the reference allele (green box) or the alternate allele (red box), for the AML cell line CG-SH and for adult NK-AML patients expressing either the wild-type (top lines) or mutated (bottom lines) form of DNMT3A. The height of the boxes is representative of the number of reads mapping, and read totals are shown for each allele when both have 15% of the total. SNPs are identified by their position in H19 and by the dbSNP rsID number.
b, Chromatograms from the direct sequencing of cDNA fragments of the H19 gene containing 2 heterozygous SNPs (arrows) from CG-SH cells treated without (left) and with DAC (0.1 µM for 3 days) (right). The second allele of H19 is expressed after hypomethylation treatment.

Supplementary Figure 6a | Positive and negative GATA2 co-expression network.
Positive GATA2 expression network built from the 30 genes most correlated with GATA2 (indicated by a *). Some genes of potential biological relevance to AML are indicated in boxes (e.g. WT1, MSI2 and DNMT3B). Colouring is based on the sub-network discovery algorithm of Blondel et al.² as implemented in the Gephi software (see Material and methods).

Supplementary Figure 6b | Negative GATA2 co-expression network.
A network of the 30 genes whose expression was most anti-correlated with GATA2 (indicated by * within the node and a black line connecting to GATA2) is shown. For both networks, the thickness of the lines connecting nodes is proportional to the absolute value of the correlations (and anti-correlations with GATA2). Colouring of node groups was performed as for Supplementary Figure 6a.

Supplementary Figure 7 | Transcription factor binding site enrichment by community.
Sets of genes from within the positively correlated GATA2 network that were identified as belonging to sub-networks (using the algorithm of Blondel et al., 2008) were generated. RefSeq identifiers for these gene sets were used as input to the Pscan software³ to identify over-represented human transcription factor binding sites (TFBS) in co-regulated genes. The region selected for each gene was -1000 to +0, and JASPAR was used as the source of transcription factor binding matrices. For each of the 7 sub-networks identified, enriched binding sites for transcription factors with a p-value of less than 0.05 are shown.

Supplemental References

1. Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, Larson DE, et al. DNMT3A mutations in acute myeloid leukemia. New England Journal of Medicine 2010; 363(25): 2424-2433.
2. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008; 2008(10): P10008.
3. Zambelli F, Pesole G, Pavesi G. Pscan: finding over-represented transcription factor binding site motifs in sequences from co-regulated or co-expressed genes. Nucleic Acids Research 2009; 37(suppl 2): W247-W252.

Supplementary Tables

Supplementary Table 1 | Clinical characteristics of CG-SH cell line and AML patients.
Data include the FAB (French-American-British) classification of AML and the mutational status of the nucleophosmin gene (NPM1), internal tandem duplications of the FLT3 gene (FLT3-ITD) and point mutations in DNMT3A (R882H). The average maximum allele bias seen at heterozygous sites within GATA2 is noted for each sample. Among AML patients, 36 are informative (with heterozygous SNPs, high GATA2 expression or exon coverage above the coverage thresholds used) and are highlighted in dark.

Supplementary Table 2 | Normal cord blood CD34+ cells characteristics.
Details of normal cord blood cells collected for RNA-seq are shown.
Sex and volume of blood obtained from each donor are indicated along with total mononuclear cell (MNC) counts. Percentages of CD34+ cells pre- and post-FACS sorting are indicated.

Supplementary Table 3 | Occurrence of allele-specific expression in non-NK-AML patients.
Non-NK-AML patient data are shown for 15 samples, including information about the karyotype, the presence of an R882 mutation in DNMT3A and an assessment of the presence of allele-specific expression of GATA2 in each sample. Samples were considered to have allele-specific expression of GATA2 (“1” in the ASE column) if a known SNP (present in dbSNP135) that was heterozygous by exome sequencing (minimum coverage of 10 reads with >3 reads for each allele) showed 80% usage of a single allele. Patient samples where GATA2 gene expression was too low to assess allelic usage, or where only homozygous SNPs were present, were reported as non-informative (NI) in the table.

Supplementary Table 4 | Over-represented TFBS motifs in GATA2-related gene network.
The proximal promoter regions of all genes positively (in red) or negatively (in green) correlated with GATA2, or included in the global network, were scanned directly for predicted transcription factor binding sites (TFBS) using matrices from the JASPAR database. The median and average expression values (RPKM) of all over-represented transcription factors in normal karyotype AML patients are indicated. Transcription factors that are considered non-expressed (RPKM <1) in normal karyotype AML are shown in grey boxes.

Supplementary Table 5 | Known coding variants in GATA2 and their predicted effects.
Impact of mutations in GATA2 as predicted by the SIFT (Sorting Intolerant From Tolerant) and PolyPhen (Polymorphism Phenotyping) algorithms. For each SNP, the genomic position (hg19) is shown along with the dbSNP reference SNP identifier. Minor allele frequency and validation status reported in dbSNP, where available, are also shown.
The SNPs predicted as probably damaging by the PolyPhen software or deleterious by the SIFT software are marked in pink and those predicted as possibly damaging by the PolyPhen software are marked in light pink.
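
The allele-specific expression call described in the legend of Supplementary Table 3 can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The coverage and per-allele thresholds are taken from the legend. The names `SnpCounts`, `is_heterozygous` and `call_ase` are hypothetical, as is the `MIN_RNA_READS` cut-off, since the legend does not state how low GATA2 expression must be for a sample to be non-informative. The legend gives "80% usage" without a comparison operator, so the sketch assumes "at least 80%".

```python
from dataclasses import dataclass
from typing import Iterable, Union

# Thresholds stated in the Supplementary Table 3 legend.
MIN_EXOME_COVERAGE = 10    # minimum exome reads covering the SNP
MIN_READS_PER_ALLELE = 3   # each allele needs MORE than this many exome reads
ASE_FRACTION = 0.80        # single-allele usage; ">=" is an assumption

# Hypothetical cut-off, NOT given in the legend: minimum RNA-seq reads at a
# SNP for allelic usage to be assessable at all.
MIN_RNA_READS = 10


@dataclass
class SnpCounts:
    """Read counts at one known (dbSNP) SNP position within GATA2."""
    exome_ref: int
    exome_alt: int
    rna_ref: int
    rna_alt: int


def is_heterozygous(snp: SnpCounts) -> bool:
    """Heterozygous by exome sequencing, using the legend's thresholds."""
    total = snp.exome_ref + snp.exome_alt
    return (total >= MIN_EXOME_COVERAGE
            and min(snp.exome_ref, snp.exome_alt) > MIN_READS_PER_ALLELE)


def call_ase(snps: Iterable[SnpCounts]) -> Union[int, str]:
    """Return 1 (ASE), 0 (no ASE) or "NI" (non-informative) for one sample."""
    informative = [
        s for s in snps
        if is_heterozygous(s) and (s.rna_ref + s.rna_alt) >= MIN_RNA_READS
    ]
    if not informative:
        # Only homozygous SNPs, or expression too low to assess.
        return "NI"
    for s in informative:
        major = max(s.rna_ref, s.rna_alt)
        if major / (s.rna_ref + s.rna_alt) >= ASE_FRACTION:
            return 1
    return 0
```

For example, `call_ase([SnpCounts(12, 9, 45, 5)])` returns `1`, because the SNP is heterozygous in the exome data and 45 of 50 RNA-seq reads come from one allele. A sample with only homozygous SNPs returns `"NI"`.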