SUPPLEMENTARY INFORMATION

METHODS

Tumor sample collection and cell culture
Resected brain tumor specimens were collected at Henry Ford Hospital (Detroit, MI) with written informed consent from patients, under a protocol approved by the Henry Ford Hospital Institutional Review Board, and graded pathologically according to the WHO criteria. A portion of each tumor specimen was snap frozen and stored in liquid nitrogen. An adjacent portion was used for cell culture. Tumors were dissociated enzymatically and neurospheres enriched in cancer stem-like cells (CSC) were cultured, as described in detail 1,2. Neurosphere cultures were serially passaged in vitro. No mycoplasma contamination was identified in the subset of samples tested. Cells with passages between 7 and 18 were used for mouse implants and molecular analysis, except for those designated “high passage”, where passage 40 was used.

Patient derived xenografts (PDX)
Orthotopic xenografts: Following IACUC guidelines in an institutionally approved animal use protocol, GBM neurosphere cell suspensions were implanted into 8-week-old female nude mice (NCRNU, Taconic Farms) as described 3. A minimum of 8 mice were implanted with each neurosphere line. Animals were anesthetized with a mixture of ketamine and xylazine. Dissociated neurosphere cells (3x10^5) were injected using a Hamilton syringe at a defined intracranial location: AP +1.0, ML +2.5, DV -3.0. Animals were monitored daily by an observer blinded to the group allocation and sacrificed upon first signs of neurological deficit or weight loss greater than 20%. Brains were harvested and placed in a coronal matrix for 2 mm sections, with the first cut across the implant site. Brain sections were alternately frozen in dry ice and embedded in OCT for storage at -80 °C, or formalin fixed and paraffin embedded (FFPE).
Subcutaneous xenografts: Dissociated neurosphere cells (1x10^6) were injected in the flank of nude mice. Animals were sacrificed and tumors excised when the tumor diameter reached 10 mm.
Drug Treatment: HF3077 PDXs were treated with capmatinib (purchased from Matrix Scientific Products, Columbia, SC). Capmatinib suspensions in 0.5% methylcellulose/0.1% Tween 80 were prepared every week and administered by oral gavage using a 20 G x 1.5” gavage needle (Cadence) at a dose of 30 mg/kg once a day (5 days/week) until the end of the study. Control animals received vehicle-only mock gavage. Forty-five days after implant, animals were randomized to control or treatment groups. Each mouse was followed until death with no censoring, and mean survival differences were estimated with 95% confidence intervals based on a t-distribution. With a sample size of 9 mice per group, a two-sided 95% confidence interval for the difference in mean survival would extend 0.92 SD from the observed difference in mean survival, assuming the CI is based on the large-sample z statistic. Equivalently, we expected 80% power to detect a difference in mean survival of 1.4 SD, for the common standard deviation, when n=9 animals per group and alpha=0.05. Animals were monitored daily and sacrificed upon first signs of neurological deficit or weight loss greater than 20%. Kaplan-Meier survival curves were compared by log-rank test.
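The sample-size reasoning above can be checked with a short calculation. The following is a minimal sketch using only the large-sample z approximation named in the text; the function names are illustrative and not part of the original analysis. Under this approximation the detectable difference comes out at about 1.32 SD, somewhat below the 1.4 SD quoted, which would be consistent with a small-sample t-based calculation (this is an assumption, not stated in the source).

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width_sd(n_per_group, alpha=0.05):
    """Half-width of a two-sided CI for a difference in means, in SD units (z-based)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference of two means with equal n and common SD.
    return z * sqrt(2 / n_per_group)


def detectable_difference_sd(n_per_group, alpha=0.05, power=0.80):
    """Smallest difference in means (SD units) detectable at the given power (z-based)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sqrt(2 / n_per_group)


print(round(ci_half_width_sd(9), 2))          # 0.92
print(round(detectable_difference_sd(9), 2))  # 1.32
```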
To evaluate brain penetrance of capmatinib, 2 h after administration of the last 30 mg/kg capmatinib dose, blood samples were drawn, animals were sacrificed, brains were harvested, and 2 mm coronal sections were frozen in OCT. Tumor tissue was dissected from the frozen blocks. Capmatinib concentration in homogenized tumor tissue and plasma was quantified by LC-MS/MS for 3 treated animals and one control.

Xenograft tumor macrodissection of frozen tissue
Brain samples of 3 randomly selected animals per xenograft line were used. Frozen 2 mm coronal sections were transferred to a cryostat (Cryotome E, Thermo Electron Corporation) set to -16 °C. Six µm sections were cut and stained with hematoxylin to locate the tumor. Tumor tissue was excised from the frozen block with a scalpel into a pre-chilled microtube and stored at -80 °C.

Nucleic Acids isolation
Genomic DNA was isolated from frozen tumor samples, macrodissected xenograft tumor (3 biological replicates), and neurosphere cultures using the QIAamp DNA Mini Kit (Qiagen #51304), with on-column RNase A digestion, following the manufacturer's instructions. DNA was isolated from blood using the QIAamp DNA Blood kit (Qiagen).
Total RNA was extracted from frozen tumor samples, macrodissected xenograft tumor (3 biological replicates), and neurosphere cultures using mirVana (Ambion #AM1560), followed by DNase treatment using DNA-free (Ambion #AM1906).

Fluorescence in situ hybridization (FISH)
FISH on matching tumor samples/neurospheres/PDX: FISH probes were prepared from purified BAC clones (BACPAC Resource Center, https://bacpacresources.org). Probes were labeled with Orange-dUTP or with Green-dUTP (Abbott Molecular Inc., Abbott Park, IL) by nick translation.
Locus                        Human BAC clones
12q13.3-q14.1 (CDK4 gene)    RP11-181L23, RP11-571M6, RP11-277A02
7p11.2 (EGFR gene)           RP11-708P5, CTD-2026N22, RP11-148P17
8q24.21 (MYC gene)           CTD-3056O22
7q31 (MET gene)              RP11-95I20, RP11-564A14, RP11-39K12
4q12 (PDGFRA gene)           RP11-58C6 and RP11-977G3
4q11 (Ch. 4 control)         RP11-365H22
7q11.22 (Ch. 7 control)      RP11-747K2, RP11-668K3
8q11.21 (Ch. 8 control)      CH17-311E13 and CH17-425G9

Metaphase slides were prepared from neurosphere cell cultures that were harvested and fixed in methanol:acetic acid (3:1), according to standard cytogenetic procedures. Tumor touch preparations were prepared by imprinting thawed tumor tissue onto positively charged glass slides, fixing them in methanol:acetic acid (3:1) for 30 min, and then air-drying. Frozen tumor and macrodissected xenograft tumor samples were prepared as described 4. The FISH probes were denatured at 75 °C for 5 min and held at 37 °C for 10-30 min until 10 µl of probe was applied to each sample slide. Slides were coverslipped and hybridized overnight at 37 °C in the ThermoBrite hybridization system (Abbott Molecular Inc.). The posthybridization wash was with 2X SSC/0.2% TWEEN 20 at 73 °C for 3 min followed by a brief water rinse. Slides were air-dried and then counterstained with VECTASHIELD mounting medium with 4',6-diamidino-2-phenylindole (DAPI) (Vector Laboratories Inc., Burlingame, CA).
Image acquisition was performed at 1000x system magnification with a COOL-1300 SpectraCube camera (Applied Spectral Imaging-ASI, Vista, CA) mounted on an Olympus BX43 microscope. Images were analyzed using FISHView v7 software (ASI), and 100-200 interphase nuclei were scored for each sample in addition to analysis of 50-100 metaphase spreads for each cell line.
FISH on paired primary/recurrent FFPE gliomas: The fluorescence in situ assay was performed using RPS6/Con 9, CDK4/Con 12, EGFR/Con 7, MYC/Con 8, PDGFRA/Con 4, C-Met/Con 7, and TERT/Con 5 FISH probes from Empire Genomics (Buffalo, NY). The slides were hybridized with the FISH probes according to the manufacturer's instructions with slight modifications. The slides were then examined under a fluorescence microscope (Nikon 80i) equipped with multiple filters, and signals were manually counted in 50 cells for each slide.

Immunohistochemistry
Sections of formalin-fixed, paraffin-embedded human glioma surgical samples, tumor xenografts, or multicellular spheroids were deparaffinized with xylene and rehydrated through graded alcohol into phosphate buffered saline. Antigens were unmasked by 10 min incubation in boiling citrate buffer, and sections were stained with anti-Met rabbit monoclonal antibody (D1C2) (Cell Signaling #8198) or anti-phospho-Met (Tyr1234/1235) rabbit monoclonal antibody (D26) (Cell Signaling #3077), visualized with Betazoid DAB (Biocare BDB2004), and counterstained with Envision Flex Hematoxylin (Dako K8008). Images were captured using an Eclipse E800M microscope equipped with a Nikon DS-Fi2 color digital camera (Nikon).

Reverse Transcription and PCR
cDNA was prepared from 1 µg of DNase I-treated total RNA isolated from tumor, neurosphere, and xenografts using Superscript III Reverse Transcriptase and oligo dT (Thermo Fisher Scientific). cDNA was used as a template for PCR in an iCycler instrument (BioRad), using Platinum Taq DNA Polymerase (Thermo Fisher Scientific) and the following oligos:
Human MET: exon 2 forward (M2F): 5′ AGCAATGGGGAGTGTAAAGAGG and exon 8 reverse (M8R): 5′ GTAAGTAAAGTGCCACCAGCC
Human CAPZA2 exon 1 forward (C1F): 5′ GTAAGTAAAGTGCCACCAGCC
Human EGFR: forward 5′ GCAGCGATGCGACCCTCCGGG and reverse 5′ CTATTCCGTTACACACTTTGCGG
Human β-actin: forward 5′ CCGACAGGATGCAGAAGGAG and reverse 5′ CATCTGCTGGAAGGTGGACA

LC-MS/MS Quantitation of Capmatinib and Crizotinib in Mouse Plasma and Tumor
For mouse plasma sample analysis, 25 µL of each sample was precipitated with 200 µL of acetonitrile. This suspension was vortexed for 30 min and centrifuged at 4k rpm for 15 min, after which 100 µL of the extract was aliquoted and mixed with 200 µL of acetonitrile/water (1/2, v/v) prior to LC-MS/MS analysis. The extracted plasma samples were analyzed on a Waters Acquity UPLC system coupled with a Waters Xevo TQ-S triple quadrupole mass spectrometer operated in positive mode. The capillary voltage was set to 0.5 kV and the collision energy to 32 eV. Capmatinib (purchased from Matrix Scientific Products, Columbia, SC) and crizotinib (purchased from LC Laboratories, Woburn, MA) were separated using a Waters Acquity UPLC BEH C18 column (1.7 µm, 2.1 x 30 mm) and detected by multiple reaction monitoring transitions, m/z 413.04>354.07 for capmatinib and m/z 450.04>260.18 for crizotinib. Mobile phase A was 0.1% acetic acid/water and B was 0.1% acetic acid/acetonitrile. The LC gradient was 10% B (0-0.3 min), 10-95% B (0.3-1.3 min), 95% B (1.3-1.7 min), 10% B (1.7-2.0 min), and the flow rate was 0.5 mL/min. The column temperature was 40 °C. The injection volume was 2 µL. Under these conditions, the retention time was 0.85 min for capmatinib and 0.74 min for crizotinib. The method was validated with an analytical range of 1-1000 ng of capmatinib and crizotinib in untreated CD-1 mouse plasma.
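For readers who want to reproduce the gradient program, the segments listed above can be written as a small lookup. This is an illustrative sketch only: it assumes linear ramps within each segment and returns the programmed mobile-phase composition at the pump, ignoring system dwell volume, so the value at a given retention time is not the composition actually reaching the column at elution. The names `GRADIENT` and `percent_b` are hypothetical.

```python
# Gradient program from the text: (start_min, end_min, %B at start, %B at end).
GRADIENT = [
    (0.0, 0.3, 10.0, 10.0),
    (0.3, 1.3, 10.0, 95.0),
    (1.3, 1.7, 95.0, 95.0),
    (1.7, 2.0, 10.0, 10.0),
]


def percent_b(t_min):
    """Programmed %B at time t_min, assuming a linear change within each segment."""
    for start, end, b_start, b_end in GRADIENT:
        if start <= t_min <= end:
            frac = (t_min - start) / (end - start)
            return b_start + frac * (b_end - b_start)
    raise ValueError("time outside the 0-2.0 min program")


# Programmed composition at the two reported retention times.
for name, rt in [("crizotinib", 0.74), ("capmatinib", 0.85)]:
    print(name, round(percent_b(rt), 2))
```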
Mouse tumor tissue samples were homogenized in methanol:water (80:20, v/v) to a concentration of 100 mg (tissue)/mL. The homogenates were vortexed for 10 min and centrifuged at 15k rpm for 5 min, then 100 µL of the supernatant was transferred into an HPLC vial for LC-MS/MS analysis. The tissue homogenates were analyzed using the same method as described above. The method was validated with an analytical range of 1-1000 ng/mL of capmatinib and crizotinib in untreated mouse tumor tissue homogenates.

Whole Exome Sequencing
Library Construction and Sequencing
The sequencing libraries were prepared using the KAPA library prep protocol (catalog number KK8234, KAPA Biosystems, Wilmington, MA). The exomes were captured using the SureSelect XT Human All Exon V5 kit (Agilent Technologies, Santa Clara, CA). Samples were then sequenced 2x100 bp to about 340x depth on the Illumina HiSeq 2000.

BAM File Generation
The raw output (BCL) files of the Illumina sequencer were converted to FASTQ files using Illumina's offline basecalling software CASAVA version 1.8.2. The FASTQ files were then aligned to the reference genome (hg19 for human) using BWA version 0.7.0 5 for DNA samples, with parameters suitable for the aligner. The aligned BAM files were subjected to duplicate marking, realignment, and recalibration using Picard version 1.112 (http://picard.sourceforge.net) and GATK version 1.5 6, when applicable, before any downstream analysis was conducted.

Whole Genome Low Pass Sequencing
Library Construction and Sequencing
The Illumina-compatible libraries were prepared using the KAPA DNA Library preparation kit (Catalog No. KK8232) as per the manufacturer’s protocol. In brief, DNA was fragmented to a median size of 200 bp by sonication. Fragmented DNA ends were polished and 5′-phosphorylated. After addition of 3′-A to the ends, indexed Y-adapters were ligated and the samples were PCR amplified. The resulting DNA libraries were quantified and validated by qPCR, and sequenced on Illumina’s HiSeq 2000 in a paired-end read format for 76 cycles. The resulting BCL files containing the sequence data were converted into “.fastq.gz” files and individual sample libraries were demultiplexed using CASAVA version 1.8.2 with no mismatches.

RNA Sequencing
Library Construction and Sequencing
The Illumina-compatible libraries were prepared using Illumina’s TruSeq RNA Sample Prep kit v2, as per the manufacturer’s protocol. In brief, poly-A RNA was enriched using oligo-dT beads. Enriched poly-A RNA was fragmented to a median size of 150 bp using chemical fragmentation and converted into double-stranded cDNA. Ends of the double-stranded cDNA were polished, 5′-phosphorylated, and 3′-A tailed for ligation of the Y-shaped indexed adapters. Adapter-ligated DNA fragments were PCR amplified, quantified and validated by qPCR, and sequenced on Illumina’s HiSeq 2000 in a paired-end read format for 76 cycles. The resulting BCL files containing the sequence data were converted into “.fastq.gz” files and individual sample libraries were demultiplexed using CASAVA version 1.8.2 with no mismatches.

BAM File Generation
RNA sequencing BAM files were generated and analyzed using the Pipeline for RNAseq Data Analysis (PRADA) (http://sourceforge.net/projects/prada/) 7. In brief, PRADA uses Burrows-Wheeler alignment, Samtools, and the Genome Analysis Toolkit to align RNAseq reads to a reference database composed of whole genome sequences (hg19) and transcriptome sequences (Ensembl64). Details of the PRADA pipeline are described in its manuscript.

Targeted Resequencing
Library Construction and Sequencing
The Illumina-compatible libraries were prepared using the KAPA DNA Library preparation kit (Catalog No. KK8232) as per the manufacturer’s protocol. In brief, DNA was fragmented to a median size of 200 bp by sonication. Fragmented DNA ends were polished and 5′-phosphorylated. After addition of 3′-A to the ends, indexed Y-adapters were ligated and the samples were PCR amplified. The resulting DNA libraries were enriched for targeted regions using NimbleGen SeqCap EZ Choice Library 4 RXN (Catalog No. 06740251001) and NimbleGen SeqCap EZ Reagent Kit Plus v2 (Catalog No. 06953247001) as per the manufacturer’s protocol. The enriched libraries were quantified and validated by qPCR, and sequenced on Illumina’s HiSeq 2000 in a paired-end read format for 76 cycles. The resulting BCL files containing the sequence data were converted into “.fastq.gz” files and individual sample libraries were demultiplexed using CASAVA version 1.8.2 with no mismatches.

BAM File Generation
Sequencing FASTQ files were aligned to the reference genome (hg19 for human) and processed to BAM files by the same pipeline as in whole exome sequencing.

Pacific Biosciences (PacBio) Long Read Sequencing
Library Construction and Sequencing
The DNA libraries were prepared following the Pacific Biosciences “20 kb Template Preparation Using BluePippin Size-Selection System” protocol. No DNA shearing was performed since the samples were already fragmented. The DNA was size selected on a BluePippin system (Sage Science Inc., Beverly, MA, USA) using a cutoff range of 7 kb to 50 kb. The DNA damage repair, end repair, and SMRTbell ligation steps were performed as described in the template preparation protocol with the SMRTbell Template Prep Kit 1.0 reagents (Pacific Biosciences, Menlo Park, CA, USA). The sequencing primer annealing and the P6 polymerase binding reactions were prepared according to the BindingCalculator (Pacific Biosciences BindingCalculator-master_v2.3.1.1). The libraries were sequenced on a PacBio RSII instrument at loading concentrations (on-plate) of 80 pM, 90 pM, and 100 pM using the MagBead OneCellPerWell v1 collection protocol, DNA sequencing kit 4.0, SMRT cells v3, and 4-hour movies.

Filtering the sequencing reads
Reads and subreads were filtered based on their length and quality values, using smrtpipe.py from the SMRT-Analysis package.

Structural Variation Analysis
Canu (version 1.2) 8 was used for assembling the filtered PacBio sequence subreads with the parameters suggested for low coverage data. The assembled contigs were aligned by nucmer (version 3.23) 9 to the human genome reference (hg19), and the contigs having sequence fragments aligned to the MET-CAPZA2 region of chromosome 7 were selected for structural variation analysis. For the selected contigs, we performed a blastn search 10 against the mouse genome using the sequence fragments aligned to the MET-CAPZA2 region of hg19 to confirm that they originated from human (Supplementary Table 3). Sequence fragments shared by two contigs were identified by pairwise alignment of the contigs using nucmer. Two contigs were considered to be connected only if they shared a sequence fragment that was at least 5,000 bp long with a minimum of 99% identity. The high-confidence shared sequence fragments were used for connecting the contigs into a circular form in HF3035. In HF3077, only two contigs (tig01141776 and tig01141835) were aligned to the MET-CAPZA2 region of chromosome 7, and the two contigs shared a 621 bp long sequence with 95.6% identity between the 3′ end of tig01141776 and the 5′ end of tig01141835.

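The contig connection rule described in the Structural Variation Analysis section can be expressed as a simple predicate. The sketch below is illustrative only (the function name and argument layout are assumptions, not part of the original pipeline); it shows that the HF3077 overlap reported in the text falls below both thresholds.

```python
def contigs_connected(shared_length_bp, identity_percent,
                      min_length_bp=5000, min_identity_percent=99.0):
    """Apply the stated connection rule to one shared sequence fragment."""
    return (shared_length_bp >= min_length_bp
            and identity_percent >= min_identity_percent)


# HF3077 overlap between tig01141776 and tig01141835, as reported in the text.
print(contigs_connected(621, 95.6))    # False
# A fragment exactly at both thresholds satisfies the rule.
print(contigs_connected(5000, 99.0))   # True
```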
Gene Fusion and Gene Expression Analysis
To detect transcript fusions, PRADA aligned RNAseq reads to a reference database composed of whole genome sequences (hg19) and transcriptome sequences (Ensembl64). Two lines of evidence were required for identification of a gene fusion: 1) a minimum of two discordant read pairs mapping to a candidate gene pair; 2) a minimum of one junction-spanning read mapping to a junction that connected exons between the candidate gene pair, with its pair mate mapping to either of the two genes. Several filters were applied to remove false positives and artifacts, of which the most prominent is based on significant sequence similarity between the two fusion genes (using BLASTN, Expect value = 0.01). Gene expression was measured as ‘reads per kilobase per million’ (RPKM) to normalize for gene length and library size. Specific details of the PRADA pipeline are described elsewhere 7.

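The two evidence requirements and the RPKM normalization in the Gene Fusion and Gene Expression Analysis section can be illustrated as follows. This is a minimal sketch, not PRADA code: the function names are hypothetical, the read counts in the example are invented for illustration, and RPKM is computed with its standard definition (reads per kilobase of gene length per million mapped reads).

```python
def passes_fusion_evidence(discordant_pairs, junction_reads,
                           min_discordant=2, min_junction=1):
    """Both lines of evidence must be met for a candidate gene pair."""
    return discordant_pairs >= min_discordant and junction_reads >= min_junction


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Invented counts, for illustration only.
print(passes_fusion_evidence(discordant_pairs=3, junction_reads=1))  # True
print(passes_fusion_evidence(discordant_pairs=5, junction_reads=0))  # False
print(rpkm(500, 2000, 25_000_000))                                   # 10.0
```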
Structural Variant Detection
To detect structural variants, we applied SpeedSeq 11 (with default parameters) to whole genome sequencing data from both tumor and matching normal samples. We filtered somatic variants by requiring at least 4 supporting reads in a tumor and no supporting reads in its matching normal.

EGFR Intragenic Rearrangement
General User dEfined Supervised Search for intragenic fusion (GUESS-if), a module of PRADA, was also used to search for EGFR intragenic rearrangements, as previously described 12. In brief, using the same rationale as in PRADA gene fusion identification, GUESS-if looked for reads spanning abnormal junctions that were not present in known transcripts. To assure high accuracy, we required at least 10 reads spanning the exon 1-8 junction of EGFR.

Validation of Somatic Single Nucleotide Variants
To validate our somatic single nucleotide mutation calls, we performed targeted resequencing at high coverage (>1,400x). We selected 792 unique bases that had been found to be mutated in tumor, neurosphere, or xenografts, but not in all of them. These sites corresponded to 1368 sSNVs. In total, 1287 of 1368 mutations called from the exome sequencing data were detected in the high coverage data, resulting in a true positive validation rate of 94%. Evidence for recovered somatic mutation was observed in 1001 of 2646 wild type nucleotides. The variant allelic fractions (VAFs), i.e. the number of reads harboring the variant allele divided by all reads covering that base, of exome and validation sequencing were highly correlated (Pearson correlation = 0.92).

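The validation rate and the VAF definition given in the Validation of Somatic Single Nucleotide Variants section reduce to two ratios. The sketch below recomputes the reported rate from the stated counts; the function name and the read counts in the VAF example are illustrative assumptions.

```python
def variant_allele_fraction(variant_reads, total_reads):
    """VAF: reads harboring the variant allele divided by all reads covering the base."""
    if total_reads == 0:
        raise ValueError("no reads cover this base")
    return variant_reads / total_reads


# Counts stated in the text.
validated, called = 1287, 1368
validation_rate = validated / called
print(f"{validation_rate:.0%}")          # 94%

# Invented read counts, for illustration only.
print(variant_allele_fraction(30, 120))  # 0.25
```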
Somatic Single Nucleotide Variant Calling
Somatic single nucleotide variants (sSNVs) from tumor and patient-matched normal samples were detected using the MuTect algorithm (version 1.14) with default parameters 13. The search for somatic small insertions/deletions (indels) was performed using Pindel 14 with tumor and patient-matched normal samples. All sSNVs and small indels were annotated by ANNOVAR (version 2012-10-23) 15. Only exonic or splicing sSNVs were selected for analysis. Mutation counts for individual samples are available in Supplementary Table 4.

Inference of Cellular Frequency and Mutational Clusters
We defined the cellular frequency of a mutation as the fraction of cells harboring the mutation. Estimation of cellular frequency was performed using PyClone version 0.12.7 16. For each set of patient, neurosphere, and xenograft samples, PyClone was run on the somatic mutations whose sites were covered in all the samples, using multi-sample joint analysis mode with the PyClone beta binomial density and parental copy number priors. Allelic copy numbers were estimated by applying Sequenza 17 to exome sequencing data. Default options for PyClone were used. To avoid potential artifacts from sequencing coverage, we limited the analysis to mutations at sites covered at 50X or more in all samples from the same patient. PyClone inferred clusters of mutations whose cellular frequencies co-vary across samples. Our analysis was limited to mutation clusters with at least two mutations.

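The two selection steps around the PyClone run (the 50X coverage requirement across all samples from a patient, and the restriction to clusters with at least two mutations) might be sketched as below. This is not PyClone itself and does not use its API; the data structures, names, and values are hypothetical and serve only to illustrate the filtering logic.

```python
from collections import Counter


def covered_in_all_samples(depth_by_sample, min_depth=50):
    """True if the site reaches min_depth in every sample from the patient."""
    return all(depth >= min_depth for depth in depth_by_sample.values())


def clusters_to_keep(cluster_by_mutation, min_mutations=2):
    """Cluster ids that contain at least min_mutations mutations."""
    sizes = Counter(cluster_by_mutation.values())
    return {cluster for cluster, size in sizes.items() if size >= min_mutations}


# Invented depths and cluster assignments, for illustration only.
depths = {
    "mutA": {"tumor": 120, "neurosphere": 80, "xenograft": 64},
    "mutB": {"tumor": 95, "neurosphere": 41, "xenograft": 70},
}
eligible = [m for m, d in depths.items() if covered_in_all_samples(d)]
print(eligible)                                               # ['mutA']
print(clusters_to_keep({"mutA": 0, "mutC": 0, "mutD": 1}))    # {0}
```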
Removing Putative Mouse Reads in Short Read Sequencing Data
Sequencing reads derived from xenograft samples are a mixture of reads from human and mouse. We utilized Xenome 18 to select sequencing reads arising from human. The selected human reads were then aligned to the human genome using the same pipeline as for patient and neurosphere samples.

Identification of Copy Numbers from Low Pass Sequence Data
We used NBICseq version 0.5.2 19 with a bin size of 1,000 bp and a BIC penalty of 3 to estimate somatic copy number alterations in low pass sequencing data from tumor and patient-matched normal samples.

Detecting TERT Promoter Mutations
We evaluated whole genome low pass sequencing and whole exome sequencing for the presence of TERT mutations in a supervised way using GATK pileup. We required a minimum of 2 variant alleles (combined from WGS and WES) for detection of TERT promoter mutations.
Variant change    Variant site         Patients
C228T             5:1295228-1295228    7
C250T             5:1295250-1295250    5

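The detection threshold for TERT promoter mutations combines evidence from the two assays. A minimal sketch follows, assuming that "variant alleles" refers to variant-supporting reads counted at one of the sites listed above; the dictionary and function names are hypothetical.

```python
# Promoter sites from the text: name -> (chromosome, position).
TERT_PROMOTER_SITES = {"C228T": ("5", 1295228), "C250T": ("5", 1295250)}


def tert_mutation_detected(wgs_variant_reads, wes_variant_reads,
                           min_variant_reads=2):
    """Sum variant-supporting reads from WGS and WES and apply the threshold."""
    return wgs_variant_reads + wes_variant_reads >= min_variant_reads


print(tert_mutation_detected(1, 1))  # True
print(tert_mutation_detected(1, 0))  # False
```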
Detecting ATRX Indels
Indels were called using Pindel (Version 0.2.4t) with the default parameters, except that the maximum allowed mismatch rate was set to 0.1 14. Somatic indels were further filtered to require a minimum of 5 supporting tumor reads.

Analysis of B-allele-frequency segments
B-allele-frequency segments were inferred by applying Sequenza (Version 2.1.1) 17 to whole exome sequencing data with the default parameters. Analysis of B-allele fractions using whole genome sequencing in our sample cohort revealed loss of heterozygosity (LOH) of chromosome 10 in two cases with diploid chromosome 10, suggesting these cases had first lost a single copy of the chromosome, which was subsequently duplicated (Supplementary Fig. 1). We evaluated chromosome 10 LOH using Affymetrix SNP6 profiles from 320 IDH-wildtype TCGA glioblastoma 12, and found that 27 of 52 tumors with diploid chromosome 10 similarly showed LOH, underscoring the importance of aberrations in chromosome 10 in gliomagenesis and evolution (Supplementary Fig. 6).

Data used for longitudinal analysis in glioma patient tumors
Segmented copy number profiles for thirteen TCGA GBM patients and fourteen TCGA LGG patients were obtained from the TCGA portal https://tcga-data.nci.nih.gov/tcga/. Copy number profiles for ten patients from MD Anderson Cancer Center (MDACC) and fourteen patients from either Samsung Medical Center (SMC) or Seoul National University Hospital (SNUH) were previously processed 20,21. Additional copy number data for seven patients from MD Anderson were generated by applying NBICseq version 0.5.2 19 to low pass whole genome sequencing. For fusion detection and structural variant calling, the same pipelines as described in the corresponding method subsections were applied to unaligned RNA sequencing files and whole genome sequencing BAM files from TCGA GBM, TCGA LGG, and MD Anderson patients. Sequencing data for the TCGA cohort were downloaded from CGHub. Fusion calls for Samsung Medical Center cohort patients were previously processed 21. Shown below is a summary table of data used in the analysis.

Table: The number of patients used in the longitudinal analysis
Cohort      CNV    RNASEQ    WGS
MDACC       17     9         7
SMC/SNUH    14     0         0
TCGA GBM    13     6         10
TCGA LGG    14     14        13
Total       58     29        30
Note: Patients do not necessarily have both RNAseq and WGS.

Predicting double minute candidates
After visualizing segmented copy numbers in the Integrative Genomics Viewer (IGV) 22, we manually scrutinized potential extrachromosomal DNA candidate regions by searching for complex patterns of copy number amplifications. In cases where structural variations and gene fusions were available, we projected those variation breakpoints onto the copy number IGV view plots to obtain additional evidence for the presence of potential extrachromosomal DNAs.

Statistical Analysis
We conducted all computations with R 3.0.13 and used standard statistical tests as appropriate.

More explanation on CAPZA2-MET fusion transcripts
Chimeric RNA fusions have been previously reported in GBM 23-25 and may be therapeutically targetable, in particular when involving receptor tyrosine kinases 26,27. We performed RNA sequencing and detected fusion transcripts in all samples except for a single neurosphere line (HF3203) with disqualifying quality control values 7. From this unbiased screen, multiple fusions joining the CAPZA2 coding start with the 5′ UTR of MET were identified in the primary tumors of HF3035, HF3077 and HF3055 (Fig. 3b). Additional CAPZA2-MET variants resulted in an in-frame transcript consisting of CAPZA2 exon 1 and MET starting from exon 3 (HF3035, HF3077) and exon 6 (HF3035). The CAPZA2-MET fusions were associated with outlier gene expression of MET, while CAPZA2 expression was comparable between samples with and without CAPZA2-MET fusions (Supplementary Fig. 7a). The presence of multiple parallel fusion transcripts suggested complex chromosomal rearrangements, which were associated with focal amplification of a 200 kb area on 7q31 (Fig. 3b). Amplification of the 7q31 genomic area carrying the adjacent CAPZA2 and MET genes has been previously reported in glioma 28. To assess the frequency of MET-activating somatic alterations in glioblastoma, we analyzed the DNA copy number profiles of 486 TCGA IDH wildtype glioblastoma samples. A focal amplification of the MET locus ranging in size from 150 kb to 5.1 Mb, which was associated with a highly significant increase in expression relative to samples with broad 7q amplification or diploid MET copy number, was identified in ten cases (2.1%) (Supplementary Fig. 7b). RNA sequencing data was available for one of the ten TCGA cases and no fusions involving MET were detected in those samples. CAPZA2-MET fusions have been infrequently reported in other cancers 29,30. Clinical response of a glioblastoma carrying MET amplification to the MET and ALK inhibiting agent crizotinib has been recorded 31.
In spite of convincing evidence supporting fusion events in the GBM samples from HF3035, HF3055 and HF3077, no sequencing reads manifesting the presence of CAPZA2-MET fusion transcripts or the focal 7q31 genomic amplification were identified in the HF3055 and HF3077 neurospheres, and only weak support was found in the HF3035 neurosphere. However, identical CAPZA2-MET fusions and 7q31 DNA amplifications resurfaced at high frequency in all xenografts derived from the HF3035 and HF3077 neurospheres, with identical breakpoints (Supplementary Fig. 3b). None of the HF3055 xenografts carried CAPZA2-MET fusions or 7q31 amplification, in line with the absence of focal 7q31 amplification in the primary HF3055 tumor. To exclude the possibility that the CAPZA2-MET fusion events were artifacts resulting from sequencing, we validated the event in all samples from HF3035 using RT-PCR, which confirmed both wildtype MET and CAPZA2-MET mRNA in the tumor and PDX but not the neurosphere (Supplementary Fig. 3a). MET protein was abundantly present in the HF3035 and HF3077 tumors as measured using immunohistochemistry, undetectable in the neurospheres, and re-expressed in the PDX (Supplementary Fig. 3c).

SUPPLEMENTARY REFERENCES
1. Hasselbach, L.A. et al. Optimization of High Grade Glioma Cell Culture from Surgical Specimens for Use in Clinically Relevant Animal Models and 3D Immunochemistry. J Vis Exp 83, e51088 (2014).
2. deCarvalho, A.C. et al. Gliosarcoma stem cells undergo glial and mesenchymal differentiation in vivo. Stem Cells 28, 181-90 (2010).
3. Berezovsky, A.D. et al. Sox2 promotes malignancy in glioblastoma by regulating plasticity and astrocytic differentiation. Neoplasia 16, 193-206 e25 (2014).
4. Graveel, C. et al. Activating Met mutations produce unique tumor profiles in mice with selective duplication of the mutant allele. Proc Natl Acad Sci U S A 101, 17198-203 (2004).
5. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).
6. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297-303 (2010).
7. Torres-Garcia, W. et al. PRADA: pipeline for RNA sequencing data analysis. Bioinformatics 30, 2224-6 (2014).
8. Berlin, K. et al. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol 33, 623-30 (2015).
9. Delcher, A.L. et al. Alignment of whole genomes. Nucleic Acids Res 27, 2369-76 (1999).
10. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J Mol Biol 215, 403-10 (1990).
11. Chiang, C. et al. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods 12, 966-8 (2015).
12. Brennan, C.W. et al. The somatic genomic landscape of glioblastoma. Cell 155, 462-77 (2013).
13. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 31, 213-9 (2013).
14. Ye, K., Schulz, M.H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865-71 (2009).
15. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164 (2010).
16. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396-8 (2014).
17. Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol 26, 64-70 (2015).
18. Conway, T. et al. Xenome--a tool for classifying reads from xenograft samples. Bioinformatics 28, i172-8 (2012).
19. Xi, R. et al. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci U S A 108, E1128-36 (2011).
20. Kim, H. et al. Whole-genome and multisector exome sequencing of primary and post-treatment glioblastoma reveals patterns of tumor evolution. Genome Res 25, 316-27 (2015).
21. Kim, J. et al. Spatiotemporal Evolution of the Primary Glioblastoma Genome. Cancer Cell 28, 318-28 (2015).
22. Robinson, J.T. et al. Integrative genomics viewer. Nat Biotechnol 29, 24-6 (2011).
23. Singh, D. et al. Transforming fusions of FGFR and TACC genes in human glioblastoma. Science 337, 1231-5 (2012).
24. Zheng, S. et al. A survey of intragenic breakpoints in glioblastoma identifies a distinct subset associated with poor survival. Genes Dev 27, 1462-72 (2013).
25. Bao, Z.S. et al. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas. Genome Res 24, 1765-73 (2014).
26. Yoshihara, K. et al. The landscape and therapeutic relevance of cancer-associated transcript fusions. Oncogene (2014).
27. Mertens, F., Johansson, B., Fioretos, T. & Mitelman, F. The emerging complexity of gene fusions in cancer. Nat Rev Cancer 15, 371-81 (2015).
28. Mueller, H.W. et al. Identification of an amplified gene cluster in glioma including two novel amplified genes isolated by exon trapping. Hum Genet 101, 190-7 (1997).
29. Kim, H.P. et al. Novel fusion transcripts in human gastric cancer revealed by transcriptome analysis. Oncogene 33, 5434-41 (2014).
30. Yoshihara, K. et al. The landscape and therapeutic relevance of cancer-associated transcript fusions. Oncogene 34, 4845-54 (2015).
31. Chi, A.S. et al. Rapid radiographic and clinical improvement after treatment of a MET-amplified recurrent glioblastoma with a mesenchymal-epithelial transition inhibitor. J Clin Oncol 30, e30-3 (2012).