Download Supplementary Methods - Clinical Cancer Research

Supplementary Methods

Patient material - discovery set

Included patients were operated on at the Skåne University Hospital in Lund, Sweden. No patient included in the study received neoadjuvant therapy prior to surgery. For patients in the discovery cohort, smoking history was obtained from the patients’ medical records and categorized into three groups: current, former, or never smoker. Follow-up data were obtained from the Swedish Cause of Death Register. For all cases, all relevant pathological slides were reviewed to re-evaluate and update the histological diagnoses and stages in adherence with recent international criteria and guidelines (1-4). The study was approved by the Regional Ethical Review Board in Lund, Sweden (Registration no. 2004/762 and 2008/702). Written informed consent was obtained from all patients diagnosed after 2004. For patients diagnosed earlier than 2004, study inclusion was approved by the Regional Ethical Review Board in Lund, Sweden, provided that the patients (or their family members/survivors) had not stated otherwise when they were informed about the study in 2006.

EGFR and KRAS mutation analyses in the discovery cohort

EGFR and KRAS mutations were analyzed using the Therascreen® EGFR or KRAS RGQ PCR Kit (Qiagen, Hilden, Germany) according to the manufacturer's protocol.

Validation tumor cohorts

In addition to the discovery cohort, we analyzed 444 NSCLC tumors from Sandoval et al. (5), 373 adenocarcinomas from The Cancer Genome Atlas consortium (6), and 69 NSCLC cell lines from Walter et al. (7) (GSE36216), all profiled on the same Illumina 450K methylation platform. These cohorts were processed similarly to the discovery cohort unless stated otherwise below. Data for the TCGA cohort were accessed September 11, 2013.

Global methylation analysis

DNA and total RNA were extracted from a single tissue piece using the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer’s instructions.
500 ng of DNA was subjected to bisulfite conversion using the EZ-96 DNA Methylation Kit (Zymo Research), with a modification to the manufacturer’s instructions: 16 cycles of 95°C for 30 s followed by 50°C for 1 hour, according to recommendations from Illumina (8). Samples were processed in two 96-well plates balanced for sample histology, stage, sex, and smoking status. The entire amount of bisulfite-converted DNA was subjected to the “Illumina Infinium HD Methylation Assay” and hybridized to the Human Methylation 450K v1.0 BeadChip according to the manufacturer’s instructions (Illumina) at SCIBLU Genomics, Lund University, Sweden.

Peak normalization of Infinium I and II data

Prior to correction of the Infinium I and II probe intensity bias, CpGs with detection p-value < 0.01 were set as NA (missing value). Adjustment of the bias between Infinium I and II CpG probes was performed by a peak normalization algorithm. Briefly, for each sample we performed a peak-based correction of the Infinium I and II chemical assays similar to Dedeurwaerder et al. (9). For both assays we smoothed the beta-values (Epanechnikov smoothing kernel) to estimate the unmethylated and methylated peaks, respectively; the unmethylated peak was then moved to 0 and the methylated peak to 1 using linear scaling, with the beta-values in between stretched accordingly. Beta-values below 0 were set back to 0 and values above 1 were set to 1. After correction, CpGs located on sex chromosomes were removed.

Bisulfite plate adjustment of methylation data

To remove any bias due to the processing of samples in different 96-well plates in the bisulfite conversion step, we normalized beta-values for plate association. The experimental design included balancing the two 96-well plates used in the bisulfite conversion and subsequent labeling for tumor histology, stage, patient smoking status, and patient sex. Bisulfite plate 1 was selected as the arbitrary reference.
Mean beta-values for each probe on plate 2 were set to the mean of the corresponding probes on plate 1. Probes on plate 2 were then adjusted correspondingly to fit the new mean, and trimmed so that no probes extended outside [0,1] in beta-value. Principal component analysis was performed to verify that no technical artifacts caused systematic bias in the final data (10).

Generation of copy number estimates from Illumina methylation beadchips

In the discovery cohort, log2 copy number estimates for CpG probes were generated from the unmethylated and methylated signals obtained from GenomeStudio (Illumina) for each tumor sample by: 1) quantile normalization of Cy3 and Cy5 Infinium I probes, 2) calculation of a summarized total intensity for each probe, and 3) division of the tumor’s total intensity by the corresponding average total intensity from the 12 normal tissues for each probe. Genomic profiles were partitioned using GLAD (11) and centralized in a manner similar to that described (12, 13). Calls of copy number gain and loss were made against fixed log2ratio thresholds of ±0.05. The fraction of the genome altered by copy number alterations (CN-FGA) was defined as described (14). Genomic profiles were screened for amplifications and verified, when possible, against matching cases analyzed on BAC aCGH from GSE29066 (13) to assure correctness.

Copy number estimates for the Sandoval et al. cohort were generated as above with one exception. Instead of dividing each tumor’s total intensity by the corresponding average total intensity from the 12 normal tissues of the discovery cohort for each probe (point 3 above), the total intensity was divided by the average intensity of all Sandoval et al. tumor cases. This was done because a large difference in intensity was observed between the Sandoval et al. cases and the matched normal samples from the discovery cohort, and because no normal samples from the Sandoval et al. study were publicly available.
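As an illustration only, points 2 and 3 of the procedure above and the threshold-based calling can be sketched in Python as follows. The function names and any example intensities are hypothetical and not taken from the study. Quantile normalization (point 1), GLAD segmentation, and centralization are omitted, and calls are made per probe rather than on segmented profiles, which is a simplification. Signals are assumed to be strictly positive, and the altered fraction uses the probe-count definition given under "Copy number analysis".

```python
import math


def log2_copy_number(tumor_meth, tumor_unmeth, reference_totals):
    """Per-probe log2 ratio of a tumor's total intensity to a reference.

    tumor_meth, tumor_unmeth: methylated and unmethylated signals per probe.
    reference_totals: per-probe average total intensity of the reference set
    (hypothetical input; e.g. normal tissues, or all tumors of a cohort).
    """
    ratios = []
    for meth, unmeth, ref in zip(tumor_meth, tumor_unmeth, reference_totals):
        total = meth + unmeth                   # point 2: summarized total intensity
        ratios.append(math.log2(total / ref))   # point 3: divide by reference, then log2
    return ratios


def call_copy_number(log2_ratios, threshold):
    """Label each value 'gain', 'loss', or 'neutral' against +/- threshold."""
    calls = []
    for ratio in log2_ratios:
        if ratio > threshold:
            calls.append("gain")
        elif ratio < -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


def fraction_altered(calls):
    """Number of probes called gain or loss divided by the number of probes."""
    return sum(call != "neutral" for call in calls) / len(calls)
```

Passing the per-probe average of the normal tissues as `reference_totals` corresponds to the discovery cohort, whereas passing the per-probe average over all tumor cases corresponds to the exception described for the Sandoval et al. cohort.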
For the Sandoval et al. cohort, calls of copy number gain and loss were made against fixed log2ratio thresholds of ±0.1. The fraction of the genome altered by copy number alterations (CN-FGA) was defined as described (14).

Identification of CpG probes with aberrant methylation compared to normal lung tissue

To identify CpG probes with aberrant methylation in tumors compared to normal lung tissue, we first identified 218821 CpGs that were either methylated (beta-values >0.9) or unmethylated (beta-values <0.1) across all 12 normal lung tissues included in the discovery cohort. From this CpG set we selected 4136 probes that displayed a difference in methylation in ≥13 tumors compared to the normal tissues: either beta-values <0.5 in ≥13 tumors compared to >0.9 in normals (referred to as hypomethylated in tumors), or beta-values >0.5 in ≥13 tumors compared to <0.1 in normals (referred to as hypermethylated in tumors). In addition, we used different cut-offs, ranging from 5 to 20 tumors, to assess the robustness of the bootstrap analysis and centroid prediction. Notably, the 12 matched normal samples comprised a mix of males (n=3) and females (n=9), never-smokers (n=6) and smokers (n=6), with a spread in patient age (range 57-82 years). All matched normal specimens came from patients with adenocarcinoma. In subsequent analyses, a CpG is termed promoter-related if it has an Illumina annotation of TSS1500 or TSS200 and a CpG island annotation. CpGs in repetitive elements were identified through the “repeats_rmsk_hg19” table from the UCSC Genome Browser.

Bootstrap clustering of genome-wide methylation data

Class analysis was performed using bootstrap clustering as described (15, 16) with 2000 permutations, Euclidean distance, and Ward linkage. Briefly, for each bootstrap the hierarchical cluster dendrogram is cut into the specified number of groups and the assignment of samples to these groups is recorded.
Then, for each pair of samples, the frequency with which the two samples fall into the same group is calculated. The co-clustering frequency matrix is then reordered by hierarchical clustering to identify the methylation subtypes, i.e. subsets of samples that repeatedly cluster together. The clustering was performed using different numbers of clusters and different sets of CpGs to investigate robustness. The final solution was based on 4136 CpGs and a five-group cluster solution. DNA methylation centroids representing the bootstrap subgroups were created from the average beta-value for each CpG probe in the respective subgroup. Samples in independent cohorts were assigned to the centroid with the smallest Euclidean distance for matching CpGs, representing a single-sample predictor.

Copy number analysis

The amount of copy number alterations, CN-FGA, was calculated as the number of CpGs/probes with copy number gain or loss divided by the total number of CpGs/probes for the platform (SNP6 or BeadChip). CAAI scores were calculated for each tumor as described by Russnes et al. (12). A case was classified as CAAI positive if one or more chromosome arms were affected by complex alterations with a CAAI score >1 for samples in the discovery cohort, or >2 in the TCGA cohort. The cut-off differs between the cohorts because the copy number data were generated on different platforms (Affymetrix SNP6 for TCGA, 450K methylation beadchips for the discovery cohort). The platforms respond differently to copy number change (platform-related amplitude characteristics), which results in large systematic differences in the amplitude of copy number change between the cohorts (higher for SNP6, lower for 450K). The amplitude of copy number change is an important variable in the CAAI calculation.

Global gene expression analysis

Total RNA was obtained from the same tumor piece used for DNA extraction.
Total RNA from 117 tumors in the discovery cohort was labeled in a 96-well format using the Total Prep-96 RNA amplification kit, hybridized to Illumina Human HT-12 V4 microarrays, and scanned according to the manufacturer’s instructions (Illumina). Gene expression data were quantile normalized and mean-centered for each probe across all samples. Probe sets without signal intensity above the median of the negative control intensity signals in at least 80% of samples were excluded from analysis. TCGA adenocarcinoma expression data were obtained as level 3 RNASeq V2 data (6) and processed as described (17). Classification into adenocarcinoma and SqCC molecular subtypes (18, 19), and calculation of a CIN70 proliferation metagene (20) and a terminal respiratory unit (TRU) metagene (21), were performed as described (22). Consensus clustering of adenocarcinomas in the discovery cohort was performed as recently described (17) using ConsensusClusterPlus (23), after filtering out probe sets with a log2ratio standard deviation <0.5 across all tumors and probe sets without a single gene annotation.

Correlated gene expression modules representing different tumor-associated processes were derived as originally described by Fredlund et al. (24) in GSE29016 (25). Briefly, in the normalized expression data we first removed probes without a gene symbol or with a LOCXXX gene symbol, and then probes with a log2ratio standard deviation <0.8 across the entire sample set. For the remaining probes, the Pearson correlation between each pair of probes was calculated. Only probes with a positive correlation >0.8 to at least four other probes were kept and entered into Cytoscape (www.cytoscape.org) for gene network analysis as described in Fredlund et al. (24). Six networks of genes were identified and labeled according to the results of gene ontology analysis of the participating genes (see Table S1).
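The probe filtering that precedes the network analysis can be sketched as below. This is a minimal illustration under stated assumptions, not the code used in the study: the function names and the dictionary input are hypothetical, the sample standard deviation (n-1 denominator) is assumed, the removal of probes without a gene symbol is taken as already done, and the Cytoscape step is not covered.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def select_network_probes(expr, min_sd=0.8, min_r=0.8, min_partners=4):
    """Return probes correlated (> min_r) with at least min_partners other probes.

    expr: dict mapping probe id -> list of log2ratio values across samples
    (hypothetical input format).
    """
    # Variance filter: drop probes with standard deviation below min_sd.
    variable = {p: v for p, v in expr.items() if stdev(v) >= min_sd}
    ids = sorted(variable)
    partners = {p: 0 for p in ids}
    # Count, for every probe, how many other probes it correlates with.
    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if pearson(variable[p], variable[q]) > min_r:
                partners[p] += 1
                partners[q] += 1
    return [p for p in ids if partners[p] >= min_partners]
```

The pairwise loop is quadratic in the number of probes, which is acceptable for a sketch; a matrix-based correlation would be preferable for array-scale data.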
For the identified gene networks, metagene scores were calculated in the gene expression data from the discovery and TCGA cohorts similarly to the CIN70 score above.

Differential gene expression between epitypes

In the discovery cohort, genes differentially expressed between epitypes (using only adenocarcinomas with matched gene expression data) were detected using the Kruskal-Wallis test with false discovery rate adjustment. Prior to the Kruskal-Wallis test, probes with a log2ratio standard deviation <0.5 across all tumors were removed. Only gene expression probes with a false discovery rate <0.05 were kept (n=1824). In the TCGA RNAseq data, genes with a log2ratio standard deviation <1 across all tumors were removed, and genes differentially expressed between epitypes were detected as in the discovery cohort (n=5726 genes).

Functional classification

Gene Ontology enrichment analysis was performed using the DAVID Functional Annotation Tool (26) with the default human population background and a Bonferroni-adjusted p-value <0.05 as the significance threshold. For gene expression analyses, genes from the discovery cohort were matched with Illumina identifiers in DAVID, while for the TCGA cohort Entrez gene IDs were used to map differentially expressed genes. For methylation data, CpG annotation data from Illumina were used to map CpGs with aberrant methylation to genes.

DNA methylation and gene expression correlation analysis

Correlation of DNA methylation and gene expression data was performed for the largest histological group in the discovery cohort, adenocarcinoma (n=77 samples profiled by gene expression microarrays), using Spearman correlation. Prior to analysis, gene expression probes with a log2ratio standard deviation <0.5 across all samples were removed, together with CpG probes with a beta-value standard deviation <0.1 across all samples. If multiple gene expression probes existed for one gene, the probe with the highest standard deviation was chosen to represent the gene.
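The expression-side filtering just described (variance filter, then one representative probe per gene) can be sketched as follows. The function name, the dictionary inputs, and the probe-to-gene mapping are hypothetical and serve only as an illustration; the sample standard deviation is assumed.

```python
from statistics import stdev


def representative_probes(expr, probe_to_gene, min_sd=0.5):
    """Pick, for each gene, the expression probe with the highest standard deviation.

    expr: dict mapping probe id -> list of log2ratio values across samples.
    probe_to_gene: dict mapping probe id -> gene symbol (hypothetical annotation).
    Probes with a standard deviation below min_sd, or without a gene, are skipped.
    """
    best = {}  # gene -> (standard deviation, probe id)
    for probe, values in expr.items():
        sd = stdev(values)
        if sd < min_sd:
            continue
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        if gene not in best or sd > best[gene][0]:
            best[gene] = (sd, probe)
    return {gene: probe for gene, (sd, probe) in best.items()}
```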
CpGs were matched to gene expression data based on gene annotation, allowing multiple CpGs to be associated with one expression value. This resulted in two matched matrices: one methylation matrix (m CpGs x n samples) and one gene expression matrix (m genes x n samples). A false discovery rate for the correlations was calculated through permutation of sample labels (n=1000 permutations). CpG-to-gene expression correlations with a false discovery rate <0.05 were kept.

Mutation analyses in TCGA samples

The MAF file of somatic mutations used for TCGA samples was accessed September 11, 2013.

Calculation of transversion frequencies

Frequencies of the different mutation transversions, such as C>A, were estimated using all mutations with a valid transversion included in the MAF file.

Calculation of total number of mutations

The total number of mutations for a given sample was calculated as the sum of all mutations in the MAF file for that sample.

MutSigCV analysis

MutSigCV (27) analysis was performed on the MAF file using default settings. Prior to analysis, we updated the gene.covariates.txt file with new values for the expression of each gene. The new value for each gene was set as the mean of log2(expression count+1) across all cases (row means). 174 genes showed a q-value <0.05 and were used in further analyses.

Permutation analysis to find mutated genes associated with groups

To investigate the association of mutations in the 174 significant genes from MutSigCV with epitypes, we used a permutation-based approach to estimate a false discovery rate, as outlined below.

1. All cases with at least 1 mutation in the MAF file were selected. For these cases the total number of non-silent mutations per sample was calculated for the 174 genes.
2. For each of the 174 genes we calculated a Fisher’s exact p-value from the contingency table in question (analytical p-value).
3. For each of the 174 genes we created a permuted distribution of p-values by first randomizing mutations across samples.
Specifically, we used probability weights equal to the vector of the total number of non-silent mutations per sample (see 1 above) to mimic the fact that different samples (and groups) have different mutation rates from the start. We performed 10000 randomizations per gene, calculating a Fisher’s exact p-value for each. This created a distribution of 10000 permuted p-values for each gene.
4. All permuted p-values for the 174 genes (174*10000 values) were collected into a single distribution.
5. For each of the 174 genes we determined the number of ‘expected’ genes from the permuted distribution for the specific analytical p-value.
6. A false discovery rate was determined as the number of expected / observed genes at a given p-value.
7. Only genes with ≤1 expected gene from permutation analysis were analyzed further.

References

1. Travis WD, Brambilla E, Muller-Hermelink HK, Harris CC (Eds.). World Health Organization Classification of Tumours. Pathology and Genetics of Tumours of the Lung, Pleura, Thymus and Heart. Lyon: IARC Press; 2004.
2. Travis WD, Brambilla E, Noguchi M, Nicholson AG, Geisinger K, Yatabe Y, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society: international multidisciplinary classification of lung adenocarcinoma: executive summary. Proc Am Thorac Soc 2011;8: 381-5.
3. Goldstraw P. International Association for the Study of Lung Cancer (IASLC). Staging manual in thoracic oncology. Orange Park: Editorial RxPress; 2009.
4. Sobin L, Gospodarowicz M, Wittekind C. International Union Against Cancer (UICC). TNM classification of malignant tumours. 7th ed. Chichester: Wiley-Blackwell; 2009.
5. Sandoval J, Mendez-Gonzalez J, Nadal E, Chen G, Carmona FJ, Sayols S, et al. A Prognostic DNA Methylation Signature for Stage I Non-Small-Cell Lung Cancer. J Clin Oncol 2013.
6. The Cancer Genome Atlas. Available from: http://cancergenome.nih.gov/
7. Walter K, Holcomb T, Januario T, Du P, Evangelista M, Kartha N, et al.
DNA methylation profiling defines clinically relevant biological subsets of non-small cell lung cancer. Clin Cancer Res 2012;18: 2360-73.
8. Illumina. Available from: http://www.illumina.com
9. Dedeurwaerder S, Defrance M, Calonne E, Denis H, Sotiriou C, Fuks F. Evaluation of the Infinium Methylation 450K technology. Epigenomics 2011;3: 771-84.
10. Lauss M, Visne I, Kriegner A, Ringner M, Jonsson G, Hoglund M. Monitoring of technical variation in quantitative high-throughput datasets. Cancer Inform 2013;12: 193-201.
11. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004;20: 3413-22.
12. Russnes HG, Vollan HK, Lingjaerde OC, Krasnitz A, Lundin P, Naume B, et al. Genomic architecture characterizes tumor progression paths and fate in breast cancer patients. Sci Transl Med 2010;2: 38ra47.
13. Staaf J, Isaksson S, Karlsson A, Jonsson M, Johansson L, Jonsson P, et al. Landscape of somatic allelic imbalances and copy number alterations in human lung carcinoma. International Journal of Cancer 2012;1: 2020-31.
14. Staaf J, Jonsson G, Ringner M, Baldetorp B, Borg A. Landscape of somatic allelic imbalances and copy number alterations in HER2-amplified breast cancer. Breast Cancer Res 2011;13: R129.
15. Lindgren D, Frigyesi A, Gudjonsson S, Sjodahl G, Hallden C, Chebil G, et al. Combined gene expression and genomic profiling define two intrinsic molecular subtypes of urothelial carcinoma and gene signatures for molecular grading and outcome. Cancer Research 2010;70: 3463-72.
16. Lauss M, Aine M, Sjodahl G, Veerla S, Patschan O, Gudjonsson S, et al. DNA methylation analyses of urothelial carcinoma reveal distinct epigenetic subtypes and an association between gene copy number and methylation status. Epigenetics 2012;7: 858-67.
17. Karlsson A, Ringner M, Lauss M, Botling J, Micke P, Planck M, et al.
Genomic and transcriptional alterations in lung adenocarcinoma in relation to smoking history. Clin Cancer Res 2014;DOI:10.1158/1078-0432.CCR-14-0246.
18. Wilkerson MD, Yin X, Walter V, Zhao N, Cabanski CR, Hayward MC, et al. Differential pathogenesis of lung adenocarcinoma subtypes involving sequence mutations, copy number, chromosomal instability, and methylation. PLoS ONE 2012;7: e36530.
19. Wilkerson MD, Yin X, Hoadley KA, Liu Y, Hayward MC, Cabanski CR, et al. Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types. Clin Cancer Res 2010;16: 4864-75.
20. Carter SL, Eklund AC, Kohane IS, Harris LN, Szallasi Z. A signature of chromosomal instability inferred from gene expression profiles predicts clinical outcome in multiple human cancers. Nature Genetics 2006;38: 1043-8.
21. Takeuchi T, Tomida S, Yatabe Y, Kosaka T, Osada H, Yanagisawa K, et al. Expression profile-defined classification of lung adenocarcinoma shows close relationship with underlying major genetic changes and clinicopathologic behaviors. J Clin Oncol 2006;24: 1679-88.
22. Planck M, Edlund K, Botling J, Micke P, Isaksson S, Staaf J. Genomic and Transcriptional Alterations in Lung Adenocarcinoma in Relation to EGFR and KRAS Mutation Status. PLoS ONE 2013;8: e78614.
23. Wilkerson MD, Hayes DN. ConsensusClusterPlus: a class discovery tool with confidence assessments and item tracking. Bioinformatics 2010;26: 1572-3.
24. Fredlund E, Staaf J, Rantala JK, Kallioniemi O, Borg A, Ringner M. The gene expression landscape of breast cancer is shaped by tumor protein p53 status and epithelial-mesenchymal transition. Breast Cancer Res 2012;14: R113.
25. Staaf J, Jonsson G, Jonsson M, Karlsson A, Isaksson S, Salomonsson A, et al. Relation between smoking history and gene expression profiles in lung adenocarcinomas. BMC Med Genomics 2012;5: 22.
26. Huang da W, Sherman BT, Lempicki RA.
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4: 44-57.
27. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 2013;499: 214-8.
