Hartmaier et al., Supplementary Methods

Novel insights into tumorigenesis by genomic profiling of a large set of adult solid tumors

Ryan Hartmaier1,#,*, Lee A. Albacker1,#, Juliann Chmielecki1,#, Mark Bailey1, Jie He1, Michael E. Goldberg1, Shakti Ramkissoon1, James Suh1, Julia Elvin1, Samuel Chiacchia1, Garrett M. Frampton1, Jeffrey S. Ross1,2, Vincent A. Miller1, Philip J. Stephens1, Doron Lipson1,*

1Foundation Medicine, Cambridge MA
2Albany Medical College, Albany NY
#These authors contributed equally to this work.
*To whom correspondence should be addressed: [email protected] or [email protected]

Conflicts of interest: All authors are employees of and equity holders in Foundation Medicine, Cambridge MA.

Supplementary Methods

Processing variant calls

Resultant sequences were analyzed for base substitutions, insertions/deletions (indels), copy number alterations (focal amplifications and homozygous deletions) and select gene fusions, as previously described (1). Since tumor samples were sequenced without a matched normal sample, additional custom filtering was applied to highlight cancer-relevant alterations and reduce noise from benign germline events (Supplementary Fig S2). In brief, frequent germline variants from the 1000 Genomes Project (dbSNP142) were removed, and known confirmed somatic alterations deposited in the Catalogue of Somatic Mutations in Cancer (COSMIC v62) were highlighted as biologically significant (2). The allele frequency cutoff was set at 1% for short variants present in COSMIC and at 5% for short variants not present in COSMIC. Germline variants with two or more counts in the ExAC database (~0.0003% population frequency, http://exac.broadinstitute.org/) were removed, with the exception of known cancer driver germline events (e.g. documented hereditary BRCA1/2 and TP53 deleterious mutations).
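The filtering rules above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Variant` record, its field names, the order in which the rules are applied, and the reading of the allele frequency cutoff as a minimum variant allele frequency in the tumor sample are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record; field names are illustrative only."""
    gene: str
    protein_change: str
    allele_frequency: float      # assumed: fraction of reads supporting the variant
    in_cosmic: bool              # present in COSMIC
    in_1000g_common: bool        # frequent germline variant in 1000 Genomes
    exac_count: int              # number of counts in ExAC
    known_germline_driver: bool = False  # e.g. hereditary BRCA1/2 or TP53 event


COSMIC_MIN_AF = 0.01      # 1% cutoff for short variants present in COSMIC
NON_COSMIC_MIN_AF = 0.05  # 5% cutoff for short variants not present in COSMIC
EXAC_REMOVAL_COUNT = 2    # two or more ExAC counts => treated as germline


def classify(v: Variant) -> str:
    """Return 'removed', 'significant', or 'retained' (assumed rule order)."""
    if v.in_1000g_common:
        return "removed"
    if v.exac_count >= EXAC_REMOVAL_COUNT and not v.known_germline_driver:
        return "removed"
    min_af = COSMIC_MIN_AF if v.in_cosmic else NON_COSMIC_MIN_AF
    if v.allele_frequency < min_af:
        return "removed"
    return "significant" if v.in_cosmic else "retained"


examples = [
    Variant("KRAS", "G12D", 0.02, True, False, 0),
    Variant("GENE_X", "A100T", 0.02, False, False, 0),
    Variant("GENE_Y", "R50Q", 0.40, False, False, 2),
    Variant("BRCA1", "hereditary truncation", 0.50, False, False, 3, True),
]
labels = [classify(v) for v in examples]
```

In this sketch the same 2% allele frequency passes for a COSMIC variant but not for a non-COSMIC one, and the ExAC rule is bypassed only for events flagged as known germline drivers.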
Additionally, recurrent variants of unknown significance (VUS) that were predicted to be germline were removed using an internally developed algorithm that assesses germline status based on allele frequency and tumor purity/ploidy in all relevant samples (Sun et al., in review 2016). All truncations and deletions in known tumor suppressor genes were also called as significant. To maximize mutation-detection accuracy (sensitivity and specificity) in impure clinical specimens, the test was previously optimized and validated to detect base substitutions at a ≥5% mutant allele frequency (MAF), indels at a ≥10% MAF, and copy number alterations at >20% tumor fraction with high accuracy (1).

Analysis of TCGA data

For copy number we used ‘CopyNumber_Gistic2.Level_4’ data. Gene-wise calls were obtained from the ‘all_thresholded.by_genes.txt’ file. Only Gistic2 calls of -2 (deletion) and +2 (amplification) were considered. Data for NKX2-1 were not available in the Gistic2 data files, so data from SFTA3 (immediately adjacent to NKX2-1) were used. For mutations we used ‘MutSigNozzleReport2CV.Level_4’ data. Specific mutations were obtained from the ‘final_analysis_set.maf’ file. Only TCGA samples with both Gistic2 and MutSig data were considered. FMI disease classification terms were grouped similarly to TCGA diseases to compare the rate of alterations between the two datasets. Since the genes in the FMI dataset are a subset of those in TCGA, the top 15 most frequently altered genes from each disease group in the FMI dataset were extracted from the TCGA data and compared to the equivalent TCGA disease group via Fisher’s exact test.

Mutation Hotspot Caller

We performed a hotspot analysis of single nucleotide changes, including missense and nonsense mutations but excluding non-stop and frameshift mutations, to determine whether an amino acid position in a given gene is a mutation hotspot. We utilized the strategy of Chang et al.
for determining the position-specific mutability of a residue (equations 2-4 in Chang et al.) with two methodological differences (3). Briefly, for each codon, we considered the nucleotides present in that codon, the number of possible amino acid changes, and the frequency of the six types of mononucleotide substitutions in that disease. This is the first difference: Chang et al. use trinucleotide context. The frequency of mononucleotide substitutions was calculated by totaling the number of non-silent mutations in the baited region for each disease and normalizing to one.

The second difference was to eliminate heuristics in the calculation of the background mutation rate by grouping mutations according to amino acid position and transforming those data into a histogram of the number of amino acids with a given number of mutations. We then fit a Poisson distribution to the histogram as our estimate of the background mutation rate. This makes high-incidence mutational hotspots inconsequential to the estimate of the background rate, since they are relatively rare, and gives more weight to amino acids with few mutations, since they are relatively common. One-sided p-values were calculated as the position-dependent Poisson probability of observing at least as many mutations as occurred at a given amino acid position. Every gene was analyzed at the specific disease, group, and pan-cancer level. Family-wise error rate (FWER) correction was performed using the Bonferroni method of multiplying the p-value by the number of tests (number of amino acids * number of diseases = ~9 x 10^6). This algorithm is similar to other published methods in that it controls for DNA sequence and mutation frequencies, but it uses a Poisson distribution to estimate the background mutation rate (3).

References

1. Frampton GM, Fichtenholtz A, Otto GA, Wang K, Downing SR, He J, et al.
Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat Biotechnol [Internet]. Nature Publishing Group; 2013 [cited 2015 Aug 8];31:1023–31. Available from: http://www.nature.com/nbt/journal/v31/n11/full/nbt.2696.html#affil-auth

2. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res [Internet]. 2011 [cited 2016 Jun 24];39:D945-50. Available from: http://www.ncbi.nlm.nih.gov/pubmed/20952405

3. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol [Internet]. 2016 [cited 2016 Nov 21];34:155–63. Available from: http://www.ncbi.nlm.nih.gov/pubmed/26619011
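The background-rate fit and Bonferroni correction described under Mutation Hotspot Caller can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study. The text does not say how the Poisson distribution was fit, so the sketch assumes a least-squares fit of the Poisson probability mass function to the normalized histogram over a simple grid of rates. It also assumes that positions with zero mutations are included in the histogram. The per-position `mutability` factor is a hypothetical stand-in for equations 2-4 of Chang et al., which are not reproduced here.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability mass P(X = k), computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_tail(k: int, lam: float) -> float:
    """One-sided p-value P(X >= k), summed upward to avoid cancellation."""
    if k <= 0:
        return 1.0
    total, i, term = 0.0, k, poisson_pmf(k, lam)
    while term > 0.0 and (i <= lam or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i  # pmf(i) = pmf(i - 1) * lam / i
    return min(total, 1.0)


def fit_background_rate(counts, grid_step=0.01, max_rate=20.0):
    """Assumed fitting method: least squares between the normalized
    histogram of per-position counts and a Poisson pmf, via grid search."""
    n = len(counts)
    hist = Counter(counts)
    ks = range(max(counts) + 1)
    observed = [hist.get(k, 0) / n for k in ks]
    best_lam, best_err = None, float("inf")
    for step in range(1, int(max_rate / grid_step) + 1):
        lam = step * grid_step
        err = sum((o - poisson_pmf(k, lam)) ** 2 for k, o in zip(ks, observed))
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam


def hotspot_p_values(counts, n_tests, mutability=None):
    """Return the fitted rate and Bonferroni-adjusted p-values per position."""
    lam = fit_background_rate(counts)
    if mutability is None:
        mutability = [1.0] * len(counts)  # hypothetical: uniform mutability
    adjusted = [
        min(1.0, poisson_tail(c, lam * m) * n_tests)
        for c, m in zip(counts, mutability)
    ]
    return lam, adjusted


# Toy data: 1,000 amino acid positions, one of which carries 40 mutations.
counts = [0] * 600 + [1] * 300 + [2] * 80 + [3] * 19 + [40]
lam, adjusted = hotspot_p_values(counts, n_tests=len(counts))
```

In the toy data the single 40-mutation position occupies one histogram bin out of 41, so it has little influence on the fitted rate, which lands below the plain mean of the counts (0.557). That position remains significant after correction, while positions with one mutation do not.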