Supplementary Methods

COLO-829 sequence analysis

COLO-829 mutation and call details used to identify putative cis-regulatory somatic mutations were obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (1, 2). For mutation validation, 75 bp paired-end Illumina raw sequencing reads for the COLO-829 malignant melanoma cell-line were obtained with permission from the EGA (https://www.ebi.ac.uk/ega/). Reads were trimmed using Trimmomatic (3) (reads kept if quality score >20) and aligned against the hg19 human reference genome using Burrows-Wheeler Aligner (BWA) (version 0.7.5) (4). Files were sorted using Novosort (version 1.03.01) (www.novocraft.com) and indexed with SAMtools (5). RNA-sequencing data were obtained from the Cancer Cell Line Encyclopedia (CCLE) (6).

Cis-regulatory region annotation

DNase I hypersensitivity sites (DHS): DNase-seq

DNase-sequencing (DNase-seq) data for COLO-829 were obtained as fastq files from the ENCODE database (7) (GEO: GSM1008571). Alignment was performed with BWA (version 0.7.5) (4) using default parameters against the hg19 human reference genome. SAMtools (5) was used to convert the files to the BAM file format, as well as for sorting and indexing. Peaks were called using the findPeaks tool within the HOMER suite (8), with the style option set to ‘dnase’.

Histone modifications: ChIP-seq

ChIP-seq datasets for the H3K4me3 histone modification were obtained for normal penis foreskin melanocyte primary cells as BED files from the Gene Expression Omnibus (GEO) database (9) (GEO: GSE16368). Peaks were called using the findPeaks tool within the HOMER suite (8), with the style option set to ‘histone’. HCT116 SP1 ChIP-seq peak data were viewed in the UCSC browser (replicate 2; UCSC Accession: wgEncodeEH003221) and used in construction of Figure 4a.
COLO-829 mutation and gene annotation

MutSigCV analysis

MutSigCV data (10) were plotted in Figure 2 as the frequency of each given measure of expression, replication time and non-coding mutation count. The relevant values corresponding to each of the COLO-829 promoter mutations were also plotted.

Identification of putative promoter regions, mutations and associated genes

Putative cis-regulatory COLO-829 mutations were identified by overlapping the mutation coordinates, using BEDtools (11), with the COLO-829 DNase-seq regions and the relevant histone mark ChIP-seq data from normal melanocytes (Figure 1a). Putative promoter regions were identified as those DHS sites overlapping H3K4me3 ChIP-seq peaks (12) and lying within +/- 1 kb of the transcription start site (TSS) of any gene. TSS coordinates were obtained from RefSeq (RefFlat gene list) (13). Putative promoter mutations were identified as those falling into these promoter regions. For putative promoter mutations, RefSeq (13) was used as described above to identify the nearest gene(s) within +/- 1 kb of each mutation. For potential bi-directional promoters, the two associated genes were assessed independently.

Validation of putative promoter region and mutation annotation

Figure S1a displays the distances of COLO-829 DHS peaks that overlap an H3K4me3 region from the nearest TSS. TSS designations were obtained from RefSeq annotations (13). COLO-829 mRNA expression data used in Figure S1b were obtained from the CCLE (6). Groupings were made by matching genes with a TSS annotation from RefSeq (13) with genes analysed by CCLE. ‘Genes with promoter DHS’ are those that have a COLO-829 DHS and a melanocyte H3K4me3 peak within +/- 1 kb of the TSS. ‘Genes without promoter DHS’ are all remaining genes that do not fulfil those criteria. Fantom5 (14) TSS annotations used in Figure S1c were obtained from Fantom5’s TSS_human.bed file.
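The putative promoter definition given above (a DHS peak that overlaps an H3K4me3 peak and lies within +/- 1 kb of a TSS, with promoter mutations being those that fall inside such a peak) can be illustrated with a minimal Python sketch. This is not the BEDtools workflow used in the study. The function names, the toy coordinates and the choice to treat "within +/- 1 kb" as any overlap with the TSS window are assumptions made purely for illustration.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open as in BED
Site = Tuple[str, int]           # (chromosome, 0-based position)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def putative_promoter_regions(dhs: List[Interval], h3k4me3: List[Interval],
                              tss: List[Site], window: int = 1000) -> List[Interval]:
    """Keep DHS peaks that overlap an H3K4me3 peak and a +/- window around any TSS.

    Assumption: any overlap with the TSS window counts as 'within +/- 1 kb'.
    """
    regions = []
    for peak in dhs:
        has_mark = any(overlaps(peak, h) for h in h3k4me3)
        near_tss = any(
            overlaps(peak, (chrom, max(0, pos - window), pos + window + 1))
            for chrom, pos in tss
        )
        if has_mark and near_tss:
            regions.append(peak)
    return regions


def promoter_mutations(mutations: List[Site], regions: List[Interval]) -> List[Site]:
    """Return the mutations whose position falls inside any putative promoter region."""
    return [
        (chrom, pos) for chrom, pos in mutations
        if any(overlaps((chrom, pos, pos + 1), r) for r in regions)
    ]


# Hypothetical toy data, not taken from the study.
dhs_peaks = [("chr1", 900, 1050), ("chr1", 5000, 5150), ("chr2", 100, 250)]
h3k4me3_peaks = [("chr1", 800, 1500), ("chr2", 0, 400)]
tss_sites = [("chr1", 1200)]
mutation_sites = [("chr1", 1000), ("chr1", 5100), ("chr2", 150)]

regions = putative_promoter_regions(dhs_peaks, h3k4me3_peaks, tss_sites)
hits = promoter_mutations(mutation_sites, regions)
print(regions, hits)
```

The nested loops are quadratic and only suitable for small examples; a genome-scale analysis would use a sorted-interval tool such as BEDtools, as the study did.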
COLO-829 putative promoter mutations were identified using the methodology described in Figure 1a. Figure S1d depicts a bootstrap analysis of randomly selected mutations falling into COLO-829 putative promoter regions. The COLO-829 cell-line had 32,901 mutations, of which putative promoter mutations were identified as described in Figure 1a. A bootstrap analysis was conducted using 1,000 iterations of 32,901 randomly selected mutations, and for each iteration the number of putative promoter mutations identified using the same methodology was recorded.

Mutation annotation by bioinformatic analysis

The 46-way phastCons score (15) from mammals was used for conservation analysis of each mutated base, the region +/- 7 bp from the mutation, and the 150 bp DHS region within which the mutation is located (Table 1, Figure 4b). Transcription factor motifs created or removed by each mutation (Table 1, Figure 4a) were identified using OncoCis (16), which utilises transcription factor motifs from the JASPAR database (17).

Mutation scoring

The scores for the likely functional impact of each non-coding mutation in COLO-829 (Table 1) were obtained from the webservers of RegulomeDB (18), FATHMM-MKL (19) and FunSeq2 (20) using default parameters. A receiver operating characteristic (ROC) curve was constructed using these scores and the findings of reporter assays to illustrate the performance of each score in accurately determining the functional impact of the non-coding mutations in this particular dataset (Figure S2).

Pathway analysis

QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN, USA, www.qiagen.com/ingenuity) was used to identify the pathway(s) in which genes of interest lie.

Analysis of mutations using DNase-seq and ChIP-seq data from unmatched cells

Mutations identified from melanocyte and COLO-829 DNase-seq data, shown in Figure S3a, were found as described in Figure 1a.
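The ROC analysis described under "Mutation scoring" compares a predicted impact score with a binary outcome from the reporter assays. A minimal sketch of how such a curve and its area can be computed is given below. The scores and labels are hypothetical, the function names are illustrative, and the sketch assumes that a higher score indicates a more likely functional mutation; a score for which lower values indicate stronger evidence would need to be negated or re-ranked first.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) points, one per distinct threshold.

    labels: 1 if the reporter assay showed altered activity, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank from highest to lowest score and sweep the threshold downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores are processed together so they yield a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores and reporter-assay outcomes, not data from the study.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(example_scores, example_labels)
area = auc(curve)
print(curve, round(area, 3))
```

With so few validated mutations, the resulting curve is coarse, which is consistent with the text's caveat that the comparison applies to this particular dataset.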
Expression in melanoma cell-lines (used in Figure S3b) is displayed as a box and whisker plot showing expression levels from 61 melanoma cell-lines obtained from the CCLE (6). Mutation identification for Figure S3d was performed using the methodology outlined in Figure 1a, but using DNase-seq and ChIP-seq data from the cell-line listed in each case. DNase-seq and H3K4me3 ChIP-seq datasets were obtained from the ENCODE database (7) and aligned as described above: A549 (GEO: GSM736506 and GEO: GSM945244), GM12878 (GEO: GSM736496 and GEO: GSM945188), HepG2 (GEO: GSM736639 and GEO: GSM945182) and HCT116 (GEO: GSM736493 and GEO: GSM945304). Mutations shown in Figure S3e were found by only including mutations that were within a 150 bp DHS peak (using DNase-seq data from the cell-line listed in each case) which did not lie within 500 bp (+/- 175 bp) of a COLO-829 DHS region. H3K4me3 regions used were from the cell-line listed in each case.

Reporter assays

Reporter constructs

Genomic DNA (gDNA) was isolated from COLO-829 melanoma and HCC1143 breast cancer cell-lines using proteinase K digestion and phenol-chloroform extraction (21). The promoter region cloned for each reporter construct (Table S3) was selected to ensure that the region was outside of any coding exon (with the exception of STK19 and DOM3Z promoter constructs), with preference given to the most conserved section of the surrounding DHS region. Primers were designed to isolate the selected regions (Table S6) and to add appropriate restriction enzyme sites (Table S3). For mutations annotated as homozygous within the COSMIC database (1, 2), the wild-type plasmid construct for each region was obtained from HCC1143 breast cancer gDNA, with all other plasmid constructs obtained from COLO-829 melanoma gDNA (Table S3). Regions were cloned upstream of the firefly luciferase gene of the promoter-less vector pGL2 Basic (Promega Corporation, WI, USA).
Mutations were validated and constructs verified via Sanger sequencing performed by the Ramaciotti Centre for Genomics (UNSW Australia). Some plasmids contained SNPs, noted in Table S5.

Cell culture conditions

COLO-829 cells were cultured in RPMI medium (Life Technologies, VIC, Australia) supplemented with 10% fetal bovine serum (FBS), penicillin/streptomycin and glutamax. HCC1143 cells were cultured in RPMI medium (Life Technologies, VIC, Australia) supplemented with 15% FBS, glutamine, glutamax, penicillin/streptomycin, sodium pyruvate and Hepes buffer. Normal melanocytes were grown in Medium 254 (melanocyte medium; Life Technologies, VIC, Australia) supplemented with 1% Human Melanocyte Growth Supplement (Life Technologies, VIC, Australia).

Reporter assays with Renilla normalisation

Plasmids were transfected into COLO-829 cells, which had been obtained from the Peter MacCallum Cancer Centre (22) prior to plasmid preparation. COLO-829 authentication was performed by validating the presence of COLO-829 mutations as reported in the COSMIC database (1, 2) against those identified from promoter amplification and sequencing. For reporter assays with Renilla normalisation, COLO-829 cells were seeded at a density of 8.9 × 10^3 cells/well in an opaque 96-well plate. After approximately 24 hours, cells were transfected using Lipofectamine 2000 (Life Technologies, VIC, Australia) with 73 ng of wt-luc (wild-type luciferase) or mut-luc (mutant luciferase) promoter construct and 8 ng of pRL-TK (Promega Corporation, WI, USA) to express Renilla luciferase. Approximately 48 hours later, cells were lysed and assayed using the Dual-Glo Luciferase Reporter Assay System (Promega Corporation, WI, USA) according to the manufacturer’s instructions. Experiments were performed in quadruplicate for each plasmid construct, along with positive and negative controls. To control for transfection efficiency, relative luciferase activity was calculated as the ratio of firefly to Renilla luciferase activity.
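The normalisation described above reduces to a per-well ratio. The short sketch below illustrates it with hypothetical readings for one wild-type and one mutant construct, each measured in quadruplicate. Averaging the replicate wells and reporting a mutant-to-wild-type fold change are assumptions made for illustration; the text specifies only the firefly-to-Renilla ratio.

```python
from statistics import mean
from typing import List, Tuple


def relative_luciferase(firefly: float, renilla: float) -> float:
    """Firefly signal normalised to the co-transfected Renilla signal for one well."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def construct_activity(wells: List[Tuple[float, float]]) -> float:
    """Mean relative activity over replicate wells given as (firefly, Renilla) pairs."""
    return mean(relative_luciferase(f, r) for f, r in wells)


# Hypothetical quadruplicate readings (arbitrary luminescence units).
wt_wells = [(1200.0, 400.0), (1500.0, 500.0), (900.0, 300.0), (1800.0, 600.0)]
mut_wells = [(2400.0, 400.0), (3000.0, 500.0), (1800.0, 300.0), (3600.0, 600.0)]

wt_activity = construct_activity(wt_wells)
mut_activity = construct_activity(mut_wells)
fold_change = mut_activity / wt_activity
print(wt_activity, mut_activity, fold_change)
```

Dividing by the Renilla signal removes well-to-well differences in transfection efficiency, so wells with very different raw firefly readings can still give the same relative activity, as in the toy data here.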
All reporter constructs were tested in at least two independent experiments. A mutant promoter region was determined to have different activity from wild-type in cases in which the mutant produced altered promoter activity in the same direction over three or more experiments, and was statistically significant at least once (p<0.05, using unpaired t-tests). A promoter region was determined to have no activity in a given experiment when the raw luciferase reading was less than twice that of the average luciferase reading from the promoter-less vector pGL2 Basic (Promega Corporation, WI, USA).

Recurrence of COLO-829 promoter mutations among cutaneous melanoma samples

Thirty-four whole-genome sequenced melanoma tumour and matched normal samples were obtained from the Cancer Genomics Hub (CGHub) using GeneTorrent (version 3.8.6) (23). Mutation calls were made directly from the downloaded BAM files using the Strelka pipeline (24) for each tumour/normal sample pair. All variants that passed the default mutation calling threshold were used for further analysis. Cis-regulatory mutations were identified in these samples in the manner described above for COLO-829 (per Figure 1a). A separate analysis performed using melanocyte DNase-seq peaks showed similar trends (data not shown) to those identified in this research, suggesting that COLO-829 DNase-seq data are representative and suitable to use for this analysis. 432 TCGA whole-exome sequenced cutaneous melanoma samples were obtained from the Cancer Genomics Hub (CGHub) (23). SAMtools mpileup (5) was used to identify base calls where the available sequence data covered the sites at which COLO-829 mutations were identified, allowing a determination of recurrence in a larger sample size (Table S4).

NDUFB9 mutation analysis with specific wild-type and mutant groupings

Gene expression comparisons (Figure 4c) were performed using RNA-sequencing data available from TCGA.
Wild-type and mutant groupings were determined by actual mutation calls from whole-exome sequenced data made using SAMtools mpileup (5). Wild-type samples were determined as those with C base calls and no T base calls at chr8:125,551,344. Mutant samples were determined as those with >5 T base calls at chr8:125,551,344, with all mutant samples found to be heterozygous. Co-occurrence (Figure 4d) was analysed among samples with and without the NDUFB9 promoter mutation, using samples from TCGA. The dataset used was that generated by the Baylor College of Medicine. Samples were deemed mutated if they had at least one non-silent protein-coding mutation in a gene from a list of commonly mutated melanoma genes, as used in previous research (25). Significant associations (p<0.05) were calculated by chi-square, with a two-tailed Fisher’s exact test, using counts of mutations in each designated gene for NDUFB9 wild-type and mutant groupings. The survival curve in Figure S4c was plotted using TCGA cutaneous melanoma samples, segregated as described above. Survival is plotted for cohorts (n=34) with high (top 10%) and low (bottom 10%) NDUFB9 expression, with significance calculated using a log-rank (Mantel-Cox) test. Plots were determined using clinical and RNA-sequencing data for cutaneous melanomas from the TCGA.

Supplementary References

1. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355-8.
2. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945-50. doi: 10.1093/nar/gkq929.
3. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114-20.
4. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754-60.
5. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078-9.
6. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603-7.
7. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306:636-40.
8. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell. 2010;38:576-89.
9. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207-10.
10. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214-8.
11. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841-2.
12. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823-37.
13. Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:19.
14. The FANTOM Consortium and the RIKEN PMI and CLST. A promoter-level mammalian expression atlas. Nature. 2014;507:462-70.
15. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034-50.
16. Perera D, Chacon D, Thoms J, Poulos RC, Shlien A, Beck D, et al. OncoCis: annotation of cis-regulatory mutations in cancer. Genome Biol. 2014;15:485.
17. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142-7.
18. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790-7.
19. Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, et al. An Integrative Approach to Predicting the Functional Effects of Non-Coding and Coding Sequence Variation. Bioinformatics. 2015.
20. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, et al. FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 2014;15:480.
21. Sambrook J, Russell DW. Purification of nucleic acids by extraction with phenol:chloroform. CSH Protocols. 2006;2006.
22. Parmenter TJ, Kleinschmidt M, Kinross KM, Bond ST, Li J, Kaadige MR, et al. Response of BRAF-mutant melanoma to BRAF inhibition is mediated by a network of transcriptional regulators of glycolysis. Cancer Discovery. 2014;4:423-33.
23. Wilks C, Cline MS, Weiler E, Diekhans M, Craft B, Martin C, et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database: the journal of biological databases and curation. 2014;2014.
24. Saunders CT, Wong WS, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811-7.
25. Hodis E, Watson IR, Kryukov GV, Arold ST, Imielinski M, Theurillat JP, et al. A landscape of driver mutations in melanoma. Cell. 2012;150:251-63.

Supplementary Figure and Table Legends

Figure S1. Validation of COLO-829 putative promoter region and mutation designation. (a) Distances of COLO-829 DNase I hypersensitive (DHS) peaks that overlap an H3K4me3 region, from the nearest transcription start site (TSS).
A distance of zero from the TSS indicates that the COLO-829 DHS directly overlaps the TSS. (b) COLO-829 gene expression in genes with and without a DNase I hypersensitive (DHS) peak that overlaps a melanocyte H3K4me3 peak. **** denotes p<0.0001. (c) Numbers of COLO-829 putative promoter mutations identified using RefSeq and Fantom5 annotations for TSS. COLO-829 putative promoter mutations were identified using the methodology described in Figure 1a. The mutation only identified by RefSeq TSS data was at chr6:27,777,830, while the mutation only identified by Fantom5 TSS data was at chr21:35,267,251. (d) Bootstrap analysis of randomly selected mutations falling into COLO-829 putative promoter regions. The plot shows how often (as a percentage of 1,000 iterations) each number of mutations was found to fulfil the criteria (median=41). The grey bar at n=31 indicates the actual number of mutations found in the COLO-829 cell-line.

Figure S2. Receiver operating characteristic (ROC) curve for the RegulomeDB, FATHMM-MKL and FunSeq2 scores attributed for a non-coding mutation’s likely functional impact.

Figure S3. Analysis of putative promoter mutations in COLO-829 located using matched and unmatched DNase-seq and ChIP-seq data. (a) Numbers of candidate putative promoter mutations located using melanocyte and COLO-829 DNase-seq data. (b) Expression in melanoma cell-lines of genes containing COLO-829 putative promoter mutations identified only from melanocyte DNase-seq data. The expression level in COLO-829 is indicated by a red dot. (c) LARP4 putative promoter mutation in COLO-829, together with peaks from COLO-829 and melanocyte DNase-seq data. Location of the LARP4 putative promoter mutation (chr12:50,794,576 G>A), along with its position relative to COLO-829 and melanocyte DNase I hypersensitive (DHS) peaks. Melanocyte H3K4me3 ChIP-seq peaks and PhastCons conservation within the region are also shown.
(d) Identification of COLO-829 putative promoter mutations using unmatched DNase-seq and ChIP-seq data. COLO-829 mutations (listed in the left-most column) marked in blue were found using DNase-seq and H3K4me3 ChIP-seq data from the cell-line at the head of the column, using the methodology described in Figure 1a. COLO-829 mutations marked in grey were not identified in the data from the cell-line at the head of the column. (e) COLO-829 putative promoter mutations identified using unmatched DNase-seq and ChIP-seq data, excluding those found using COLO-829 DNase-seq data (listed in Table 1). COLO-829 mutations (listed in the left-most column) marked in red were found using DNase-seq and H3K4me3 ChIP-seq data from the cell-line at the head of the column, using the methodology described in Figure 1a. COLO-829 mutations marked in grey were not identified in the data from the cell-line at the head of the column.

Figure S4. NDUFB9 expression and survival in TCGA cutaneous melanoma sample cohort. NF1 mutants are those that contain non-silent protein-coding mutations per TCGA data. (a) Comparison of NDUFB9 gene expression in NF1 wild-type and mutants for samples with NDUFB9 promoter mutation. * denotes significance of p<0.05 by unpaired t-test. (b) Comparison of NDUFB9 gene expression in NF1 wild-type and mutants for samples without NDUFB9 promoter mutation. n.s. denotes no significant difference by unpaired t-test. (c) Survival curve for TCGA cutaneous melanoma samples with high (top 10%) and low (bottom 10%) NDUFB9 gene expression. Survival is plotted for cohorts of n=34, with no significant difference found by log-rank (Mantel-Cox) test.

Table S1. Mutation coordinates and associated genes for potential putative promoter mutations identified in COLO-829.

Table S2. Number of promoter and total mutations in each of 34 whole-genome sequenced cutaneous melanoma samples available from TCGA.

Table S3. Details of polymerase chain reaction (PCR) amplification and genomic DNA used for reporter constructs.

Table S4. Base calls at each mutation site for the four COLO-829 promoter mutations with changes observed in mutant promoter activity from wild-type by reporter assays.

Table S5. Single nucleotide polymorphisms (SNPs) present within reporter constructs.

Table S6. Primer sequences used for each candidate gene in polymerase chain reaction (PCR) and quantitative polymerase chain reaction (qPCR) experiments.