Hartmaier et al., Supplementary Methods
Novel insights into tumorigenesis by genomic profiling of a large set of adult
solid tumors
Ryan Hartmaier1,#,*, Lee A. Albacker1,#, Juliann Chmielecki1,#, Mark Bailey1, Jie He1,
Michael E. Goldberg1, Shakti Ramkissoon1, James Suh1, Julia Elvin1, Samuel Chiacchia1,
Garrett M. Frampton1, Jeffrey S. Ross1,2, Vincent A. Miller1, Philip J. Stephens1, Doron Lipson1,*
1Foundation Medicine, Cambridge MA
2Albany Medical College, Albany NY
#These authors contributed equally to this work.
*To whom correspondence should be addressed: [email protected]
or [email protected]
Conflicts of interest: All authors are employees of and equity holders in Foundation
Medicine, Cambridge MA.
Supplementary Methods
Processing variant calls
Resultant sequences were analyzed for base substitutions, insertions/deletions (indels),
copy number alterations (focal amplifications and homozygous deletions) and select
gene fusions, as previously described (1). Because tumor samples were sequenced without
a matched normal sample, additional custom filtering was applied to highlight
cancer-relevant alterations and reduce noise from benign germline events
(Supplementary Fig S2). In brief, frequent germline variants from the 1000 Genomes
Project (dbSNP142) were removed, and confirmed somatic alterations deposited in the
Catalogue of Somatic Mutations in Cancer (COSMIC v62) were highlighted as biologically
significant (2). The allele frequency cutoff for short variants was set at 1% for variants
present in COSMIC and 5% for variants absent from COSMIC. Germline variants with two or
more counts in the ExAC database (~0.0003% population frequency,
http://exac.broadinstitute.org/) were removed, with the exception of known cancer
driver germline events (e.g., documented hereditary deleterious mutations in BRCA1/2
and TP53). Additionally, recurrent variants of unknown significance (VUS) that were
predicted to be germline were removed using an internally developed algorithm that
assesses germline status based on allele frequency and tumor purity/ploidy across all
relevant samples (Sun et al., in review 2016). All truncations and deletions in known
tumor suppressor genes were also called as significant. To maximize mutation-detection
accuracy (sensitivity and specificity) in impure clinical specimens, the test was previously
optimized and validated to detect base substitutions at a ≥5% mutant allele frequency
(MAF), indels with a ≥10% MAF, and copy number alterations at >20% tumor fraction
with high accuracy (1).
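The short-variant filters above can be sketched as a single predicate. This is a minimal illustration, not the authors' pipeline: the `ShortVariant` record and its field names are hypothetical, the allele frequency cutoff is assumed to be an inclusive minimum, and any variant with two or more ExAC counts is assumed to be treated as germline. The dbSNP filter, the VUS germline-prediction algorithm (Sun et al.) and the tumor-suppressor truncation rule are not modeled.

```python
from dataclasses import dataclass


# Hypothetical record type; field names are illustrative, not from the source.
@dataclass
class ShortVariant:
    gene: str
    allele_frequency: float      # mutant allele frequency, 0..1
    in_cosmic: bool              # present in COSMIC (v62 in the source)
    exac_count: int              # number of observations in ExAC
    known_germline_driver: bool  # e.g. documented hereditary BRCA1/2 or TP53 mutation


COSMIC_AF_CUTOFF = 0.01      # 1% for variants present in COSMIC
NON_COSMIC_AF_CUTOFF = 0.05  # 5% for variants absent from COSMIC
EXAC_MAX_COUNT = 1           # assumption: two or more ExAC counts marks a germline variant


def passes_filters(v: ShortVariant) -> bool:
    """Return True if the variant survives the allele frequency and ExAC filters."""
    # Assumption: the cutoff is an inclusive minimum allele frequency.
    cutoff = COSMIC_AF_CUTOFF if v.in_cosmic else NON_COSMIC_AF_CUTOFF
    if v.allele_frequency < cutoff:
        return False
    # Recurrent germline variants are dropped unless they are known cancer drivers.
    if v.exac_count > EXAC_MAX_COUNT and not v.known_germline_driver:
        return False
    return True
```

For example, a COSMIC-listed variant at 2% allele frequency passes, while the same variant absent from COSMIC is dropped by the 5% cutoff.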
Analysis of TCGA data
For copy number we used ‘CopyNumber_Gistic2.Level_4’ data. Gene-wise calls
were obtained from the ‘all_thresholded.by_genes.txt’ file. Only Gistic2 calls of -2
(deletion) and +2 (amplification) were considered. Data for NKX2-1 were not available in
the Gistic2 data files, so data for SFTA3 (immediately adjacent to NKX2-1) were used instead.
For mutations we used ‘MutSigNozzleReport2CV.Level_4’ data. Specific mutations were
obtained from the ‘final_analysis_set.maf’ file. Only TCGA samples with both Gistic2 and
MutSig data were considered. FMI disease classification terms were grouped to match
TCGA diseases so that alteration rates could be compared between the two datasets.
Because the genes in the FMI dataset are a subset of those in TCGA, the 15 most
frequently altered genes in each FMI disease group were extracted from the TCGA data,
and their alteration rates were compared with those of the equivalent TCGA disease
group using Fisher’s exact test.
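The copy number selection step (keep only Gistic2 calls of +2 and -2, with SFTA3 standing in for NKX2-1) might look like the sketch below. It assumes that `all_thresholded.by_genes.txt` is tab-separated, with "Gene Symbol", "Locus ID" and "Cytoband" annotation columns followed by one column per sample; check that layout against the actual files before relying on it. The function name `high_level_calls` is hypothetical.

```python
import csv
import io

# Assumed layout: tab-separated annotation columns followed by one column per sample.
ANNOTATION_COLUMNS = {"Gene Symbol", "Locus ID", "Cytoband"}
# NKX2-1 is absent from the Gistic2 files, so the adjacent SFTA3 row stands in for it.
PROXY_GENES = {"NKX2-1": "SFTA3"}


def high_level_calls(table_text: str, genes: list[str]) -> dict[str, dict[str, str]]:
    """Return {gene: {sample: "AMP" or "DEL"}}, keeping only +2 and -2 calls."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = {row["Gene Symbol"]: row for row in reader}
    calls: dict[str, dict[str, str]] = {}
    for gene in genes:
        row = rows.get(PROXY_GENES.get(gene, gene))
        if row is None:
            continue  # gene not present in the Gistic2 table
        gene_calls = {}
        for sample, value in row.items():
            if sample in ANNOTATION_COLUMNS:
                continue
            value = (value or "").strip()
            if value == "2":
                gene_calls[sample] = "AMP"  # +2: amplification
            elif value == "-2":
                gene_calls[sample] = "DEL"  # -2: deletion
        calls[gene] = gene_calls
    return calls
```

Calls of -1, 0 and +1 are silently dropped, mirroring the restriction to high-level events described above.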
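Each FMI versus TCGA comparison reduces to a 2×2 table (dataset × altered/not altered). As a dependency-free sketch, the two-sided Fisher's exact test can be computed directly from hypergeometric probabilities; SciPy's `scipy.stats.fisher_exact` offers the same test and is the more practical choice in a real analysis. The function name and the tie tolerance are our own choices.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two datasets (e.g. FMI, TCGA); columns are altered / not altered.
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # A small relative tolerance keeps tables that tie with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```

A call would take the form `fisher_exact_two_sided(altered_fmi, n_fmi - altered_fmi, altered_tcga, n_tcga - altered_tcga)`, repeated for each of the 15 genes in each disease group.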
Mutation Hotspot Caller
We performed a hotspot analysis of single-nucleotide changes, including missense and
nonsense mutations but excluding non-stop and frameshift mutations, to determine
whether an amino acid position in a given gene is a mutation hotspot. We used the
strategy of Chang et al. for determining the position-specific mutability of a residue
(equations 2–4 in Chang et al.), with two methodological differences (3). Briefly, for each
codon, we considered the nucleotides present in that codon, the number of possible
amino acid changes, and the frequency of the six types of mononucleotide substitutions
in that disease; this is the first difference, as Chang et al. use trinucleotide context.
The frequency of mononucleotide substitutions was calculated by totaling the number of
non-silent mutations of each type in the baited region for each disease and normalizing
the totals to sum to one. The second difference was to eliminate heuristics in the
calculation of the background mutation rate by grouping mutations according to amino
acid position and converting the data into a histogram of the number of amino acid
positions carrying a given number of mutations. We then fit a Poisson distribution to
this histogram as our estimate of the background mutation rate. With this approach,
high-incidence mutational hotspots have little influence on the background estimate
because they are relatively rare, whereas amino acid positions with few mutations,
which are relatively common, dominate it. One-sided p-values were calculated as the
position-dependent Poisson probability of observing at least as many mutations as
occurred at a given amino acid position. Every gene was analyzed at the individual-disease,
disease-group, and pan-cancer levels. Family-wise error rate (FWER) correction was
performed with the Bonferroni method, multiplying each p-value by the number of tests
(number of amino acids × number of diseases ≈ 9 × 10^6). This algorithm is similar to
other published methods in that it controls for DNA sequence and mutation frequencies,
but it uses a Poisson distribution to estimate the background mutation rate (3).
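To make the mononucleotide approach concrete, the sketch below collapses base changes onto the six strand-symmetric substitution classes (C>A, C>G, C>T, T>A, T>C, T>G), normalizes their counts to sum to one, and scores a codon by summing the class frequencies of every single-base change that alters the encoded amino acid. The scoring rule is a simplified, hypothetical stand-in that only illustrates how codon sequence, possible amino acid changes and the disease spectrum combine; it is not a reproduction of equations 2–4 of Chang et al.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a base change onto one of six pyrimidine-referenced classes."""
    if ref in "AG":  # purine reference: express the change on the other strand
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def substitution_spectrum(changes: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of each class among the given non-silent SNVs (sums to one)."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in changes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def codon_mutability(codon: str, spectrum: dict[str, float]) -> float:
    """Hypothetical score: spectrum-weighted count of amino-acid-changing SNVs."""
    original = CODON_TABLE[codon]
    if original == "*":
        return 0.0  # changes at stop codons would be non-stop mutations, excluded
    score = 0.0
    for pos, ref in enumerate(codon):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = codon[:pos] + alt + codon[pos + 1:]
            if CODON_TABLE[mutant] != original:  # missense or nonsense
                score += spectrum.get(substitution_class(ref, alt), 0.0)
    return score
```

With a uniform spectrum (1/6 per class), the alanine codon GCT scores about 1.0: six of its nine possible changes alter the amino acid, while all three third-position changes are silent.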
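The background-rate estimate fits a Poisson distribution to the histogram of mutations per amino acid position, not to the raw counts. The source does not state the fitting procedure, so the sketch below assumes a least-squares fit of the Poisson probability mass function to the normalized histogram, found by a simple grid search. We chose this because it matches the stated rationale: a hotspot occupies a single bin holding a tiny fraction of positions, so it barely changes the squared error, whereas a maximum-likelihood fit to the raw counts (their mean) would be pulled upward by hotspots.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def fit_background_rate(counts_per_position: list[int], n_grid: int = 5000) -> float:
    """Least-squares fit of a Poisson pmf to the histogram of mutations per position.

    counts_per_position holds one mutation count per amino acid position, zeros included.
    """
    n_positions = len(counts_per_position)
    histogram = Counter(counts_per_position)  # mutation count -> number of positions
    max_k = max(histogram)
    observed = [histogram.get(k, 0) / n_positions for k in range(max_k + 1)]
    upper = max(1.0, float(max_k))
    best_lam, best_err = upper, math.inf
    for i in range(1, n_grid + 1):  # simple grid search over (0, upper]
        lam = upper * i / n_grid
        err = sum((obs - poisson_pmf(k, lam)) ** 2 for k, obs in enumerate(observed))
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

On 999 positions drawn to match a Poisson(0.5) background plus one hotspot with 40 mutations, this fit stays near 0.5, while the plain mean of the counts rises to 0.538.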
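The one-sided test and the Bonferroni step can be sketched as follows. The function takes the expected count at a position as its Poisson rate; how the background rate is scaled by position-specific mutability to obtain that expected count is left as an assumption. The tail is summed directly when it is small, because computing 1 − P(X < k) underflows to zero for the very small p-values that matter here. As a worked example, with ≈ 9 × 10^6 tests a raw p-value must fall below 0.05 / (9 × 10^6) ≈ 5.6 × 10^-9 to stay below a family-wise threshold of 0.05.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Large tail: 1 - P(X < k) does not lose precision here.
        term, cdf = math.exp(-lam), 0.0
        for j in range(k):
            cdf += term
            term *= lam / (j + 1)
        return max(0.0, 1.0 - cdf)
    # Small tail: sum P(X = j) for j >= k directly to avoid cancellation.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, j = 0.0, k
    while True:
        total += term
        j += 1
        term *= lam / j  # terms shrink because j > lam
        if term <= total * 1e-16:
            break
    return min(1.0, total)


def bonferroni(p: float, n_tests: int) -> float:
    """Family-wise correction: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)
```

A position would then be scored as, for instance, `bonferroni(poisson_upper_tail(observed, expected), 9_000_000)`, where `observed` and `expected` are hypothetical per-position values.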
References
1. Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, et al.
Development and validation of a clinical cancer genomic profiling test based on
massively parallel DNA sequencing. Nat Biotechnol [Internet]. Nature Publishing
Group; 2013 [cited 2015 Aug 8];31:1023–31. Available from:
http://www.nature.com/nbt/journal/v31/n11/full/nbt.2696.html#affil-auth
2. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining
complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.
Nucleic Acids Res [Internet]. 2011 [cited 2016 Jun 24];39:D945–50. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/20952405
3. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al. Identifying
recurrent mutations in cancer reveals widespread lineage diversity and
mutational specificity. Nat Biotechnol [Internet]. 2016 [cited 2016 Nov 21];34:155–63.
Available from: http://www.ncbi.nlm.nih.gov/pubmed/26619011