Supplementary Methods
COLO-829 sequence analysis
COLO-829 mutation and call details used to identify putative cis-regulatory somatic mutations were obtained from the Catalogue of Somatic Mutations in Cancer (COSMIC) database (1, 2). For mutation validation, 75 bp paired-end Illumina raw sequencing reads for the COLO-829 malignant melanoma cell-line were obtained with permission from the European Genome-phenome Archive (EGA) (https://www.ebi.ac.uk/ega/). Reads were trimmed using Trimmomatic (3), retaining reads with a quality score >20, and aligned against the hg19 human reference genome using the Burrows-Wheeler Aligner (BWA) (version 0.7.5) (4). Files were sorted using Novosort (version 1.03.01) (www.novocraft.com) and indexed with SAMtools (5). RNA-sequencing data were obtained from the Cancer Cell Line Encyclopedia (CCLE) (6).
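The criterion "quality score >20" can be read in several ways (per-base sliding-window trimming, leading/trailing trimming, or average read quality). As an illustration only, the sketch below assumes the simplest reading, keeping reads whose mean Phred quality (Phred+33 encoding) exceeds 20. It is not Trimmomatic's implementation and does not reproduce the exact settings used; the function names are hypothetical.

```python
def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line: str, offset: int = 33) -> float:
    """Mean Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0


def filter_fastq_records(lines: list[str], min_mean_quality: float = 20.0) -> list[tuple[str, str, str]]:
    """Keep (header, sequence, quality) records whose mean quality is strictly above the threshold.

    Assumption: well-formed four-line FASTQ records (header, sequence, '+', quality)
    and a mean-quality interpretation of the trimming criterion.
    """
    kept = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if mean_quality(qual) > min_mean_quality:
            kept.append((header, seq, qual))
    return kept


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2, so only the first read is kept.
    records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
    print(filter_fastq_records(records))
```

A mean-based rule is the easiest to state; a sliding-window rule would instead trim low-quality read ends before any whole-read decision is made.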
Cis-regulatory region annotation
DNase I hypersensitivity sites (DHS): DNase-seq
DNase-sequencing (DNase-seq) data for COLO-829 were obtained as FASTQ files from the ENCODE database (7) (GEO: GSM1008571). Alignment was performed with BWA (version 0.7.5) (4) using default parameters against the hg19 human reference genome. SAMtools (5) was used to convert the files to BAM format and to sort and index them. Peaks were called using the findPeaks tool within the HOMER suite (8), with the style option set to ‘dnase’.
Histone modifications: ChIP-seq
ChIP-seq datasets for the H3K4me3 histone modification in normal penis foreskin melanocyte primary cells were obtained as BED files from the Gene Expression Omnibus (GEO) database (9) (GEO: GSE16368). Peaks were called using the findPeaks tool within the HOMER suite (8), with the style option set to ‘histone’.
HCT116 SP1 ChIP-seq peak data (replicate 2; UCSC Accession: wgEncodeEH003221) were viewed in the UCSC Genome Browser and used in the construction of Figure 4a.
COLO-829 mutation and gene annotation
MutSigCV analysis
MutSigCV data (10) were plotted in Figure 2 as the frequency distribution of each measure: expression, replication time and non-coding mutation count. The values corresponding to each COLO-829 promoter mutation were also plotted.
Identification of putative promoter regions, mutations and associated genes
Putative cis-regulatory COLO-829 mutations were identified by using BEDtools (11) to overlap the mutation coordinates with the COLO-829 DNase-seq regions and the relevant histone mark ChIP-seq data from normal melanocytes (Figure 1a). Putative promoter regions were defined as DHS sites that overlap H3K4me3 ChIP-seq peaks (12) and lie within +/- 1 kb of the transcription start site (TSS) of any gene. TSS coordinates were obtained from RefSeq (RefFlat gene list) (13). Putative promoter mutations were defined as those falling within these promoter regions.
For putative promoter mutations, RefSeq (13) was used as described above to identify the nearest gene(s) within +/- 1 kb of each mutation. For potential bidirectional promoters, the two associated genes were assessed independently.
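The overlap logic above can be sketched in plain Python. This is a minimal illustration, not the BEDtools commands used: it assumes half-open, 0-based intervals as in BED files, assumes a DHS counts as promoter-proximal when any part of it lies within +/- 1 kb of a TSS, and uses hypothetical data structures (Interval tuples and a gene-to-TSS dictionary with one TSS per gene).

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def promoter_regions(dhs: list[Interval], h3k4me3: list[Interval],
                     tss: dict[str, tuple[str, int]], window: int = 1000) -> list[Interval]:
    """DHS peaks that overlap an H3K4me3 peak and touch the +/- window bp zone of any TSS.

    Assumption: one TSS per gene, stored as gene -> (chrom, 0-based position).
    """
    tss_windows = [Interval(chrom, max(0, pos - window), pos + window + 1)
                   for chrom, pos in tss.values()]
    return [d for d in dhs
            if any(overlaps(d, h) for h in h3k4me3)
            and any(overlaps(d, w) for w in tss_windows)]


def promoter_mutations(mutations: list[tuple[str, int]], regions: list[Interval]) -> list[tuple[str, int]]:
    """Mutations (chrom, 0-based position) falling inside any putative promoter region."""
    return [(chrom, pos) for chrom, pos in mutations
            if any(overlaps(Interval(chrom, pos, pos + 1), r) for r in regions)]


def genes_near(mutation: tuple[str, int], tss: dict[str, tuple[str, int]], window: int = 1000) -> list[str]:
    """All genes whose TSS lies within +/- window bp of the mutation.

    Both genes of a bidirectional promoter are returned, so each can be assessed separately.
    """
    chrom, pos = mutation
    return sorted(g for g, (c, t) in tss.items() if c == chrom and abs(t - pos) <= window)
```

For example, with a toy DHS at chr1:900-1050 overlapping an H3K4me3 peak and lying 450 bp from a TSS at position 1500, a mutation at position 1000 is reported as a putative promoter mutation and assigned to that gene.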
Validation of putative promoter region and mutation annotation
Figure S1a displays the distances of COLO-829 DHS peaks that overlap an H3K4me3 region
from the nearest TSS. TSS designations were obtained from RefSeq annotations (13).
COLO-829 mRNA expression data used in Figure S1b were obtained from the CCLE (6). Groupings were made by matching genes that have a RefSeq TSS annotation (13) with genes analysed by the CCLE. ‘Genes with promoter DHS’ are those that have a COLO-829 DHS and a melanocyte H3K4me3 peak within +/- 1 kb of the TSS. ‘Genes without promoter DHS’ are all remaining genes that do not fulfil these criteria.
Fantom5 (14) TSS annotations used in Figure S1c were obtained from Fantom5’s
TSS_human.bed file. COLO-829 putative promoter mutations were identified using the
methodology described in Figure 1a.
Figure S1d depicts a bootstrap analysis of randomly selected mutations falling into COLO-829 putative promoter regions. The COLO-829 cell-line has 32,901 mutations, from which putative promoter mutations were identified as described in Figure 1a. The bootstrap analysis comprised 1,000 iterations, each of 32,901 randomly selected mutations; in each iteration, the number of putative promoter mutations identified using the methodology described above was recorded.
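The bootstrap can be sketched as follows. The source does not specify how the random mutations were drawn, so this sketch assumes positions are sampled uniformly across a set of chromosome lengths and counts how many land in (non-overlapping) putative promoter regions in each iteration. Chromosome lengths, regions and the seed in the example are hypothetical toy values, not the COLO-829 data.

```python
import random
import statistics
from bisect import bisect_right


def build_index(regions: dict[str, list[tuple[int, int]]]) -> dict[str, tuple[list[int], list[int]]]:
    """Sort non-overlapping half-open regions per chromosome for binary search."""
    index = {}
    for chrom, ivs in regions.items():
        ivs = sorted(ivs)
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def in_regions(index, chrom: str, pos: int) -> bool:
    """True if pos falls inside one of the (non-overlapping) regions on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def bootstrap_counts(chrom_lengths: dict[str, int], regions, n_mutations: int,
                     iterations: int = 1000, seed: int = 0) -> list[int]:
    """Number of random positions falling in the regions, for each iteration.

    Assumption: chromosomes are chosen in proportion to length and positions uniformly within them.
    """
    rng = random.Random(seed)
    index = build_index(regions)
    chroms = list(chrom_lengths)
    weights = [chrom_lengths[c] for c in chroms]
    counts = []
    for _ in range(iterations):
        hits = 0
        for chrom in rng.choices(chroms, weights=weights, k=n_mutations):
            hits += in_regions(index, chrom, rng.randrange(chrom_lengths[chrom]))
        counts.append(hits)
    return counts


if __name__ == "__main__":
    lengths = {"chrA": 1_000_000, "chrB": 500_000}  # hypothetical toy genome
    toy_regions = {"chrA": [(1000, 11000)], "chrB": [(0, 5000)]}
    counts = bootstrap_counts(lengths, toy_regions, n_mutations=1000, iterations=200, seed=1)
    print("median promoter hits:", statistics.median(counts))
```

The observed count (31 in COLO-829) would then be compared against the distribution of simulated counts, as in Figure S1d.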
Mutation annotation by bioinformatic analysis
The mammalian 46-way phastCons score (15) was used for conservation analysis of each mutated base, of the region +/- 7 bp from the mutation, and of the 150 bp DHS region within which the mutation is located (Table 1, Figure 4b). Transcription factor motifs created or removed by each mutation (Table 1, Figure 4a) were identified using OncoCis (16), which uses transcription factor motifs from the JASPAR database (17).
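Summarising phastCons over the three windows amounts to reading or averaging per-base scores. The sketch below assumes the scores for one chromosome have already been loaded into a mapping from 0-based position to score, that unscored positions are skipped, and that a window is summarised by its mean; these choices and the function names are assumptions rather than the exact procedure used.

```python
from typing import Optional


def mean_phastcons(scores: dict[int, float], start: int, end: int) -> Optional[float]:
    """Mean phastCons score over the half-open interval [start, end); None if no base is scored."""
    values = [scores[p] for p in range(start, end) if p in scores]
    return sum(values) / len(values) if values else None


def conservation_summary(scores: dict[int, float], mut_pos: int,
                         dhs_start: int, dhs_end: int, flank: int = 7) -> dict[str, Optional[float]]:
    """Conservation at the mutated base, across +/- flank bp around it, and across its DHS peak.

    Assumption: windows are summarised by the mean of the scored bases they contain.
    """
    return {
        "base": scores.get(mut_pos),
        "flank": mean_phastcons(scores, mut_pos - flank, mut_pos + flank + 1),  # 15 bp for flank=7
        "dhs": mean_phastcons(scores, dhs_start, dhs_end),
    }
```

With flank=7 the local window spans 15 bases centred on the mutation, matching the +/- 7 bp region described above.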
Mutation scoring
Scores for the likely functional impact of each non-coding mutation in COLO-829 (Table 1) were obtained from the RegulomeDB (18), FATHMM-MKL (19) and FunSeq2 (20) web servers using default parameters. A receiver operating characteristic (ROC) curve was constructed from these scores and the reporter assay results to illustrate how accurately each score predicted the functional impact of the non-coding mutations in this dataset (Figure S2).
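A ROC curve for a single score can be computed from the reporter assay outcomes as below. This is a generic sketch, not the code used to produce Figure S2: it assumes each mutation has a numeric score where larger values mean "more likely functional" (RegulomeDB's categorical ranks, where lower ranks indicate stronger evidence, would need converting first) and a binary label of 1 when the reporter assay showed altered activity. The area under the curve (AUC) is computed with the trapezoidal rule.

```python
def roc_curve(scores: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """(false positive rate, true positive rate) points, sweeping the threshold from high to low.

    Tied scores are treated as a single threshold step.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item sharing this score value.
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under a ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means the score ranks every assay-confirmed mutation above every unconfirmed one, while 0.5 corresponds to random ranking.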
Pathway analysis
QIAGEN’s Ingenuity Pathway Analysis (IPA, QIAGEN, USA, www.qiagen.com/ingenuity) was used to identify the pathway(s) in which the genes of interest lie.
Analysis of mutations using DNase-seq and ChIP-seq data from unmatched cells
Mutations identified using melanocyte and COLO-829 DNase-seq data (Figure S3a) were found as described in Figure 1a. Expression in melanoma cell-lines (Figure S3b) is displayed as a box-and-whisker plot of expression levels in 61 melanoma cell-lines obtained from the CCLE (6).
Mutation identification for Figure S3d was performed using the methodology outlined in Figure 1a, but with DNase-seq and ChIP-seq data from the cell-line listed in each case. DNase-seq and H3K4me3 ChIP-seq datasets were obtained from the ENCODE database (7) and aligned as described above: A549 (GEO: GSM736506 and GEO: GSM945244), GM12878 (GEO: GSM736496 and GEO: GSM945188), HepG2 (GEO: GSM736639 and GEO: GSM945182) and HCT116 (GEO: GSM736493 and GEO: GSM945304). Mutations shown in Figure S3e were identified by including only mutations within a 150 bp DHS peak (from the DNase-seq data of the cell-line listed in each case) that did not lie within 500 bp (+/- 175 bp) of a COLO-829 DHS region. H3K4me3 regions were also taken from the cell-line listed in each case.
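The exclusion step can be sketched as an interval test. The description leaves some room for interpretation, so this sketch assumes that each 150 bp DHS peak from the unmatched cell-line is extended by 175 bp on each side (giving a 500 bp window), and that a mutation is kept only if it lies within that peak and no COLO-829 DHS region overlaps the extended window. The Peak type and function names are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def _overlap(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unmatched_only(mutations: list[tuple[str, int]], other_dhs: list[Peak],
                   colo829_dhs: list[Peak], flank: int = 175) -> list[tuple[str, int]]:
    """Mutations inside an unmatched cell-line DHS peak whose flanked window avoids all COLO-829 DHS.

    Assumption: the 150 bp peak +/- flank bp defines the 500 bp exclusion window.
    """
    kept = []
    for chrom, pos in mutations:
        site = Peak(chrom, pos, pos + 1)
        for peak in other_dhs:
            if not _overlap(site, peak):
                continue
            window = Peak(chrom, max(0, peak.start - flank), peak.end + flank)
            if not any(_overlap(window, c) for c in colo829_dhs):
                kept.append((chrom, pos))
                break
    return kept
```

Under this reading, a mutation in an unmatched-cell DHS peak is retained only when COLO-829 shows no accessible chromatin anywhere in the surrounding 500 bp.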
Reporter assays
Reporter constructs
Genomic DNA (gDNA) was isolated from COLO-829 melanoma and HCC1143 breast cancer cell-lines using proteinase K digestion and phenol-chloroform extraction (21). The promoter region cloned for each reporter construct (Table S3) was selected to lie outside any coding exon (except for the STK19 and DOM3Z promoter constructs), with preference given to the most conserved section of the surrounding DHS region. Primers were designed to amplify the selected regions (Table S6) and to add appropriate restriction enzyme sites (Table S3). For mutations annotated as homozygous in the COSMIC database (1, 2), the wild-type plasmid construct for each region was obtained from HCC1143 breast cancer gDNA; all other plasmid constructs were obtained from COLO-829 melanoma gDNA (Table S3). Regions were cloned upstream of the firefly luciferase gene in the promoter-less vector pGL2 Basic (Promega Corporation, WI, USA). Mutations were validated and constructs verified by Sanger sequencing performed by the Ramaciotti Centre for Genomics (UNSW Australia). Some plasmids contained SNPs; these are listed in Table S5.
Cell culture conditions
COLO-829 cells were cultured in RPMI medium (Life Technologies, VIC, Australia) supplemented with 10% fetal bovine serum (FBS), penicillin/streptomycin and GlutaMAX. HCC1143 cells were cultured in RPMI medium (Life Technologies, VIC, Australia) supplemented with 15% FBS, glutamine, GlutaMAX, penicillin/streptomycin, sodium pyruvate and HEPES buffer. Normal melanocytes were grown in Medium 254 (melanocyte medium; Life Technologies, VIC, Australia) supplemented with 1% Human Melanocyte Growth Supplement (Life Technologies, VIC, Australia).
Reporter assays with Renilla normalisation
Plasmids were transfected into COLO-829 cells obtained from the Peter MacCallum Cancer Centre (22) prior to plasmid preparation. COLO-829 cells were authenticated by confirming that the mutations identified by promoter amplification and sequencing matched those reported for COLO-829 in the COSMIC database (1, 2). For reporter assays with Renilla normalisation, COLO-829 cells were seeded at a density of 8.9 × 10³ cells/well in an opaque 96-well plate. After approximately 24 hours, cells were transfected using Lipofectamine 2000 (Life Technologies, VIC, Australia) with 73 ng of wt-luc (wild-type luciferase) or mut-luc (mutant luciferase) promoter construct and 8 ng of pRL-TK (Promega Corporation, WI, USA) to express Renilla luciferase. Approximately 48 hours later, cells were lysed and assayed using the Dual-Glo Luciferase Reporter Assay System (Promega Corporation, WI, USA) according to the manufacturer’s instructions. Experiments were performed in quadruplicate for each plasmid construct, along with positive and negative controls. To control for transfection efficiency, relative luciferase activity was calculated as the ratio of firefly to Renilla luciferase activity. All reporter constructs were tested in at least two independent experiments. A mutant promoter region was considered to differ in activity from wild-type when the mutant altered promoter activity in the same direction in three or more experiments and the difference was statistically significant (p<0.05, unpaired t-test) in at least one of them. A promoter region was considered to have no activity in a given experiment when its raw luciferase reading was less than twice the average luciferase reading from the promoter-less vector pGL2 Basic (Promega Corporation, WI, USA).
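The normalisation, activity threshold and calling rule can be expressed in a few functions. This is a sketch under stated assumptions: replicate readings are plain lists; "raw luciferase reading" is taken as the mean of a construct's replicate wells; the unpaired t-test is Student's pooled-variance version, with each experiment's p-value supplied externally (for example by a statistics package) rather than computed here; and "same direction over three or more experiments" is read as at least three experiments agreeing in direction, one of them significant. Function names are hypothetical.

```python
import math
import statistics


def relative_activity(firefly: list[float], renilla: list[float]) -> list[float]:
    """Firefly/Renilla ratio for each well, controlling for transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]


def has_activity(raw_firefly: list[float], empty_vector: list[float]) -> bool:
    """Active if the mean raw reading is at least twice the mean pGL2 Basic reading (assumed per-construct mean)."""
    return statistics.mean(raw_firefly) >= 2 * statistics.mean(empty_vector)


def pooled_t(a: list[float], b: list[float]) -> float:
    """Student's two-sample t statistic with pooled variance (unpaired, equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def mutant_differs(experiments: list[tuple[float, float, float]], alpha: float = 0.05) -> bool:
    """Apply the call rule to per-experiment (wild-type mean, mutant mean, p-value) tuples.

    Assumed reading: >= 3 experiments shift activity in the same direction and
    at least one of those reaches p < alpha.
    """
    up = [(wt, mut, p) for wt, mut, p in experiments if mut > wt]
    down = [(wt, mut, p) for wt, mut, p in experiments if mut < wt]
    for group in (up, down):
        if len(group) >= 3 and any(p < alpha for _, _, p in group):
            return True
    return False
```

Separating the per-experiment statistics from the cross-experiment call makes the two levels of the decision rule explicit.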
Recurrence of COLO-829 promoter mutations among cutaneous melanoma samples
Thirty-four whole-genome sequenced melanoma tumour samples and their matched normal samples were obtained from the Cancer Genomics Hub (CGHub) using GeneTorrent (version 3.8.6) (23). Mutations were called directly from the downloaded BAM files using the Strelka pipeline (24) for each tumour/normal sample pair. All variants passing the default mutation-calling threshold were used for further analysis. Cis-regulatory mutations were identified in these samples as described above for COLO-829 (Figure 1a). A separate analysis using melanocyte DNase-seq peaks showed similar trends (data not shown), suggesting that the COLO-829 DNase-seq data are representative and suitable for this analysis.
In addition, 432 TCGA whole-exome sequenced cutaneous melanoma samples were obtained from CGHub (23). SAMtools mpileup (5) was used to identify base calls at the sites of COLO-829 mutations wherever sequence coverage was available, allowing recurrence to be determined in a larger sample (Table S4).
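Counting base calls at a site from `samtools mpileup` text output requires decoding its read-base column. The sketch below follows the pileup conventions: '.' and ',' mark reference matches, upper/lower-case letters mark mismatches on the forward/reverse strand, '^' plus a mapping-quality character marks a read start, '$' a read end, '+n'/'-n' are followed by n inserted/deleted bases, and '*' marks a deleted base. The function name and the choice to merge strands are assumptions.

```python
from collections import Counter


def count_pileup_bases(ref_base: str, bases: str) -> Counter:
    """Count A/C/G/T/N calls in one samtools mpileup read-base column (strands merged)."""
    counts: Counter = Counter()
    ref = ref_base.upper()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":           # start of read: skip '^' and the following mapping-quality character
            i += 2
            continue
        if ch in "+-":          # indel: '+'/'-', then a length, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if ch in ".,":          # match to the reference on either strand
            counts[ref] += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
        # '$' (read end), '*'/'#' (deletion placeholders) and '<'/'>' (reference skips) are not base calls
        i += 1
    return counts
```

For a site whose reference base is C, `count_pileup_bases("C", ".,Tt^].$+2AG,*")` gives four C calls and two T calls; the inserted "AG" bases are not counted at the site.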
NDUFB9 mutation analysis with specific wild-type and mutant groupings
Gene expression comparisons (Figure 4c) were performed using RNA-sequencing data available from TCGA. Wild-type and mutant groupings were determined from mutation calls made on whole-exome sequencing data using SAMtools mpileup (5). Wild-type samples were defined as those with C base calls and no T base calls at chr8:125,551,344. Mutant samples were defined as those with >5 T base calls at chr8:125,551,344; all mutant samples were heterozygous.
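The grouping rule for chr8:125,551,344 can be written as a small classifier over per-sample counts of C and T calls (for example, counts taken from an mpileup read-base column). Samples meeting neither definition, such as those with one to five T calls or no coverage, are left unassigned here; that handling is an assumption, since the text does not state how such samples were treated.

```python
from typing import Optional


def ndufb9_group(c_calls: int, t_calls: int, min_t_calls: int = 5) -> Optional[str]:
    """Assign a sample to 'wild-type' or 'mutant' from C and T call counts at the site.

    wild-type: at least one C call and no T calls; mutant: more than min_t_calls T calls.
    Anything else (e.g. 1-5 T calls, or no coverage) returns None (assumed to be excluded).
    """
    if t_calls == 0 and c_calls > 0:
        return "wild-type"
    if t_calls > min_t_calls:
        return "mutant"
    return None
```

Leaving low-evidence samples unassigned keeps the two comparison groups clean at the cost of a slightly smaller sample.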
Co-occurrence (Figure 4d) was analysed in TCGA samples with and without the NDUFB9 promoter mutation, using the dataset generated by the Baylor College of Medicine. Samples were considered mutated if they had at least one non-silent protein-coding mutation in a gene from a list of commonly mutated melanoma genes used in previous research (25). Significant associations (p<0.05) were assessed by chi-square analysis with a two-tailed Fisher’s exact test, using counts of mutations in each designated gene for the NDUFB9 wild-type and mutant groupings.
The survival curve in Figure S4c was plotted using TCGA cutaneous melanoma samples, grouped as described above. Survival is plotted for cohorts (n=34) with high (top 10%) and low (bottom 10%) NDUFB9 expression, with significance calculated using a log-rank (Mantel-Cox) test. Plots were generated using clinical and RNA-sequencing data for cutaneous melanomas from TCGA.
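A log-rank (Mantel-Cox) comparison of the high- and low-expression cohorts can be sketched as follows. The sketch assumes each sample is given as (expression, survival time, event), with event 1 for death and 0 for censoring; it selects the top and bottom 10% by a simple sort and converts the 1-degree-of-freedom chi-square statistic to a p-value using the identity p = erfc(sqrt(χ²/2)). It illustrates the test and is not the analysis code; the cohort-splitting helper is hypothetical.

```python
import math


def split_by_expression(samples: list[tuple[float, float, int]], fraction: float = 0.10):
    """Return (high, low) cohorts of (time, event) pairs: top and bottom `fraction` by expression."""
    ranked = sorted(samples, key=lambda s: s[0])
    k = max(1, int(len(ranked) * fraction))
    low = [(t, e) for _, t, e in ranked[:k]]
    high = [(t, e) for _, t, e in ranked[-k:]]
    return high, low


def logrank_test(group1: list[tuple[float, int]], group2: list[tuple[float, int]]) -> tuple[float, float]:
    """Log-rank (Mantel-Cox) chi-square statistic (1 df) and p-value for two groups of (time, event)."""
    data = [(t, e, 1) for t, e in group1] + [(t, e, 2) for t, e in group2]
    event_times = sorted({t for t, e, _ in data if e})
    observed1 = expected1 = variance = 0.0
    for time in event_times:
        at_risk = [(t, e, g) for t, e, g in data if t >= time]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for t, e, _ in at_risk if t == time and e)
        d1 = sum(1 for t, e, g in at_risk if t == time and e and g == 1)
        observed1 += d1
        expected1 += d * n1 / n  # expected deaths in group 1 under equal hazards
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed1 - expected1) ** 2 / variance
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

The statistic compares observed deaths in one cohort with those expected if both cohorts shared the same hazard at every event time.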
Supplementary References
1. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004;91:355-8.
2. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945-50. doi: 10.1093/nar/gkq929.
3. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114-20.
4. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754-60.
5. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078-9.
6. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603-7.
7. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306:636-40.
8. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38:576-89.
9. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207-10.
10. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214-8.
11. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841-2.
12. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823-37.
13. Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:19.
14. The FANTOM Consortium and the RIKEN PMI and CLST. A promoter-level mammalian expression atlas. Nature. 2014;507:462-70.
15. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034-50.
16. Perera D, Chacon D, Thoms J, Poulos RC, Shlien A, Beck D, et al. OncoCis: annotation of cis-regulatory mutations in cancer. Genome Biol. 2014;15:485.
17. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, et al. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142-7.
18. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790-7.
19. Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, et al. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics. 2015.
20. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, et al. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 2014;15:480.
21. Sambrook J, Russell DW. Purification of nucleic acids by extraction with phenol:chloroform. CSH Protoc. 2006;2006.
22. Parmenter TJ, Kleinschmidt M, Kinross KM, Bond ST, Li J, Kaadige MR, et al. Response of BRAF-mutant melanoma to BRAF inhibition is mediated by a network of transcriptional regulators of glycolysis. Cancer Discov. 2014;4:423-33.
23. Wilks C, Cline MS, Weiler E, Diekhans M, Craft B, Martin C, et al. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data. Database (Oxford). 2014;2014.
24. Saunders CT, Wong WS, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811-7.
25. Hodis E, Watson IR, Kryukov GV, Arold ST, Imielinski M, Theurillat JP, et al. A landscape of driver mutations in melanoma. Cell. 2012;150:251-63.
Supplementary Figure and Table Legends
Figure S1. Validation of COLO-829 putative promoter region and mutation
designation.
(a) Distances from COLO-829 DNase I hypersensitive (DHS) peaks that overlap an H3K4me3 region to the nearest transcription start site (TSS). A distance of zero indicates that the COLO-829 DHS directly overlaps the TSS. (b) COLO-829 expression of genes with and without a DHS peak that overlaps a melanocyte H3K4me3 peak. **** denotes p<0.0001. (c) Numbers of COLO-829 putative promoter mutations identified using RefSeq and Fantom5 TSS annotations. COLO-829 putative promoter mutations were identified using the methodology described in Figure 1a. The mutation identified only with RefSeq TSS data was at chr6:27,777,830, while the mutation identified only with Fantom5 TSS data was at chr21:35,267,251. (d) Bootstrap analysis of randomly selected mutations falling into COLO-829 putative promoter regions. The panel shows the frequency (percentage of 1,000 iterations) with which each number of mutations fulfilling the criteria was observed (median=41). The grey bar at n=31 indicates the actual number of putative promoter mutations found in the COLO-829 cell-line.
Figure S2. Receiver operating characteristic (ROC) curve for the RegulomeDB, FATHMM-MKL and FunSeq2 scores of the likely functional impact of non-coding mutations.
Figure S3. Analysis of putative promoter mutations in COLO-829 located using
matched and unmatched DNase-seq and ChIP-seq data.
(a) Numbers of candidate putative promoter mutations located using melanocyte and COLO-829 DNase-seq data. (b) Expression in melanoma cell-lines of genes containing COLO-829 putative promoter mutations identified only from melanocyte DNase-seq data. The expression level in COLO-829 is indicated by a red dot. (c) LARP4 putative promoter mutation in COLO-829, together with peaks from COLO-829 and melanocyte DNase-seq data. The location of the LARP4 putative promoter mutation (chr12:50,794,576 G>A) is shown relative to COLO-829 and melanocyte DNase I hypersensitive (DHS) peaks. Melanocyte H3K4me3 ChIP-seq peaks and phastCons conservation within the region are also shown.
shown. (d) Identification of COLO-829 putative promoter mutations using unmatched
DNase-seq and ChIP-seq data. COLO-829 mutations (listed in the left-most column) marked
in blue were found using DNase-seq and H3K4me3 ChIP-seq data from the cell-line at the
head of the column, using the methodology described in Figure 1a. COLO-829 mutations
marked in grey were not identified in the data from the cell-line at the head of the column. (e)
COLO-829 putative promoter mutations identified using unmatched DNase-seq and ChIP-seq
data, excluding those found using COLO-829 DNase-seq data (listed in Table 1). COLO-829
mutations (listed in the left-most column) marked in red were found using DNase-seq and
H3K4me3 ChIP-seq data from the cell-line at the head of the column, using the methodology
described in Figure 1a. COLO-829 mutations marked in grey were not identified in the data
from the cell-line at the head of the column.
Figure S4. NDUFB9 expression and survival in TCGA cutaneous melanoma sample
cohort.
NF1 mutants are samples that contain non-silent protein-coding mutations in NF1 according to TCGA data. (a) Comparison of NDUFB9 gene expression in NF1 wild-type and mutant samples with the NDUFB9 promoter mutation. * denotes p<0.05 by unpaired t-test. (b) Comparison of NDUFB9 gene expression in NF1 wild-type and mutant samples without the NDUFB9 promoter mutation. n.s. denotes no significant difference by unpaired t-test. (c) Survival curve for TCGA cutaneous melanoma samples with high (top 10%) and low (bottom 10%) NDUFB9 gene expression. Survival is plotted for cohorts of n=34, with no significant difference found by log-rank (Mantel-Cox) test.
Table S1. Mutation coordinates and associated genes for potential putative promoter
mutations identified in COLO-829.
Table S2. Number of promoter and total mutations in each of 34 whole-genome
sequenced cutaneous melanoma samples available from TCGA.
Table S3. Details of polymerase chain reaction (PCR) amplification and genomic DNA
used for reporter constructs.
Table S4. Base calls at each mutation site for the four COLO-829 promoter mutations
with changes observed in mutant promoter activity from wild-type by reporter assays.
Table S5. Single nucleotide polymorphisms (SNPs) present within reporter constructs.
Table S6. Primer sequences used for each candidate gene in polymerase chain reaction
(PCR) and quantitative polymerase chain reaction (qPCR) experiments.