“Cancer Genomics”
Richard K. Wilson, Ph.D.
Washington University
School of Medicine
© R.K.Wilson 2007
Cancer Genomics
[Diagram labels: Human Genome v1.0; ancillary genomes (mouse, chimp, etc.); next-generation sequencing technology; discovery; technology; software tools; infrastructure; cancer; other diseases]
PCR-based re-sequencing
• List of candidate genes
• Large collection of patient samples
EGFR mutations in NSCLC
[Figure: EGFR protein schematic. Labeled regions: EGF ligand binding, TM, tyrosine kinase, autophos (Y residues, including Y869). Labeled positions and motifs: 718, GXGXXG, 745, LREA, 776, 835, DFG, 858, 947, 964.]
Most TKI responders have EGFR mutations:
Study 1: 8/9 (89%) vs. 0/7 controls
Study 2: 5/5 (100%) vs. 0/4 controls
Study 3: 19/24 (79%) vs. 0/20 controls
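The three comparisons above can be checked with a little arithmetic. The sketch below recomputes each percentage and adds a one-sided Fisher exact test; the test is not on the slide and is only an assumption about how the responder/control difference might be quantified. The names `studies`, `percent`, and `fisher_one_sided` are illustrative.

```python
from math import comb

# (mutated responders, total responders, mutated controls, total controls)
studies = {
    "Study 1": (8, 9, 0, 7),
    "Study 2": (5, 5, 0, 4),
    "Study 3": (19, 24, 0, 20),
}

def percent(mutated, total):
    """Percentage of samples carrying a mutation, rounded to a whole number."""
    return round(100 * mutated / total)

def fisher_one_sided(a, n1, c, n2):
    """Probability of seeing at least `a` mutated samples among the n1
    responders if the a + c mutated samples were spread at random over
    all n1 + n2 samples (hypergeometric null)."""
    k = a + c            # total mutated samples
    n = n1 + n2          # total samples
    upper = min(k, n1)   # largest possible count among responders
    tail = sum(comb(k, x) * comb(n - k, n1 - x) for x in range(a, upper + 1))
    return tail / comb(n, n1)

for name, (a, n1, c, n2) in studies.items():
    p = fisher_one_sided(a, n1, c, n2)
    print(f"{name}: {a}/{n1} ({percent(a, n1)}%) vs. {c}/{n2} controls, one-sided p = {p:.2g}")
```

With no mutations among controls, each p-value reduces to a single hypergeometric term, so small studies such as Study 2 still give modest significance despite a 100% responder rate.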
Tumor Sequencing Project
• ~600 genes of interest
• ~200 lung adenocarcinoma samples
• Sequencing Centers: BCM-HGSC, BI, WUGSC
• Cancer Centers: MSKCC, DFCI, SCC, MDA
TSP Target List
• Too expensive to sequence the whole genome; therefore, focus on “druggable” targets.
• For lung adenocarcinoma TSP: ~600 genes (exons only)
– Receptor tyrosine kinases (e.g. EGFR)
– Selected serine-threonine kinases
– Known oncogenes
– Known tumor suppressor genes
– EGFR pathway genes
– DNA repair genes
– Etc.
SNP Arrays
DNA Chips/SNP Arrays
Lung Adeno Genomic Events: SNP Array Analysis
Weir et al. Nature (2007)
Lung Adenocarcinoma Amplifications
Weir et al. Nature (2007)
Mutations in lung adenocarcinoma
[Bar chart: # of mutations per gene, y-axis 0 to 70. Genes on the x-axis, in extracted order: KRAS, E2F4, TP53, GNAS, STK11, EGFR, LRRK2, CDKN2A, EPHA3, NF1, SCARF2, PTPRD, LMTK2, TYK2, RIN1, ROR2, MKNK2, ERBB4, LRP1B, NTRK1, MYO3B, PIK3CG, LZTR1, JAG2, CDC2L2, EPHA5, CDH11, PAK3, SLC38A3, PIK3C3, INSRR, NTRK3, ATM, PRKCG, BAGE4, KDR, PTEN, NRAS, ZMYND10, PDGFRA, INHBA, PFTK1, TP73L, FLT4, LTK, DOCK3, NTRK2, EPHB6, IRAK2, ITK, EPHB1, APC, EPHA7, BAGE3, MST1, LMTK3, PAK7, GATA1, TFDP1, PRKACB, TSHR, MINK1, FGFR4, RB1, FGFR1]
KRAS and TP53 Are Mutated in About 1/3 of Tumor Samples
• Indels have not been included in the analysis
Mutations in TP53, ERBB3, and AKT3 appear to correlate with tumor grade
[Figure: mutation plot with three groups labeled N=24, N=85, and N=71; axis label: Mutation]
Correlations between mutations and clinical features
• Mutations in PDGFRA, PTEN, NTRK1 and PRKDC show positive correlation with tumor stage.
• Mutations in LRP1B, PRKDC, TP53, and APC correlate with the solid tumor histological subtype of lung adenocarcinoma.
• Mutations in EGFR and MYO3B correlate strongly with never-smokers, and mutations in KRAS and LRP1B with smokers.
EGFR mutations in glioblastoma
Screen of kinase domains in glioblastoma: no recurrent mutations
But …
[Figure: EGFR exon and domain map (EC domains I-IV, TM, JM, KD/KINASE) with exon numbers. Mutations shown: D46N, G63R, R108K, T263P, A289V/D/T, R324L, E330K, P596L, G598V, L861Q. Color key: red=somatic, blue=germline, black=unknown.]
EGFRvIII (del AA 30-297)
119 Lung Tumors: no EC mutations
270 HapMap Normals: no EC mutations
18/132 glioblastoma (13.6%); + 1 KD
1/8 glioblastoma cell lines (12.5%)
0/11 lower grade gliomas
151 Total samples
Genomic Studies of Cancer
• Hypothesis-driven (biased):
- Gene sets with related functions: “kinome”, “phosphatome”
- Genes mutated in other cancers
- Closely related genes
- Investigator-driven ideas
• Data-driven (unbiased):
- Use genomic platforms to identify loci with recurrent somatic alterations
- Array-based RNA profiling
- Array CGH
- Array-based SNP genotyping
Acute myelogenous leukemia
• Project initiated in 2002.
• Primary tumors, matched normal tissue (i.e., germline variants vs. somatic mutations)
• “Discovery set” (46 tumors) + “Validation set” (94 tumors)
• Initial target list: 450 genes
• Orthogonal technologies (CGH arrays, expression profiling, etc.) for genome characterization and to detect additional sequencing targets.
Acute myelogenous leukemia
- FLT3: 29%
- NPM1: 25%
- NRAS: 9.6%
- PTPN11: 4%
- RUNX1: 4%
- GCSFR: 4%
- Others: 2-3%
Is there a better approach?
• What are we missing outside of the exons?
• PCR-based re-sequencing:
- Relatively expensive
- Diploid (at best) & low coverage
Solexa/Illumina 1G Analyzer
Illumina flow cell
• Acts as the microfluidic conduit for cluster generation and sequencing reagents.
• 8-lane flow cell configuration.
• Separate libraries can be sequenced in each lane, or the same library in all.
• ~60M clusters are sequenced per flow cell.
Next Generation Sequencing Technologies
Genome size: 3000 Mb

                  3730          454 FLX        Solexa
Req'd coverage    6             12             20
bp/read           600           250            32
Reads/run         96            400,000        28,000,000
bp/run            57,600        100,000,000    896,000,000
#/runs req'd      312,500       360            67
Cost per run      $48           $6,800         $9,300
Total cost        $15,000,000   $2,448,000     $622,768
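The figures in the technology comparison above follow from three inputs per platform: required coverage, read length, and reads per run. The sketch below recomputes bp/run, the number of runs, and the total cost from those inputs; the dictionary layout and the function name `run_estimate` are illustrative, while the numbers are taken from the slide.

```python
GENOME_SIZE_BP = 3_000_000_000  # 3000 Mb, as stated on the slide

# platform: (required coverage, bp per read, reads per run, cost per run in USD)
platforms = {
    "3730": (6, 600, 96, 48),
    "454 FLX": (12, 250, 400_000, 6_800),
    "Solexa": (20, 32, 28_000_000, 9_300),
}

def run_estimate(coverage, bp_per_read, reads_per_run, cost_per_run):
    """Return (bp per run, runs required, total cost) for one platform."""
    bp_per_run = bp_per_read * reads_per_run
    runs = GENOME_SIZE_BP * coverage / bp_per_run  # kept fractional
    total_cost = runs * cost_per_run
    return bp_per_run, runs, total_cost

for name, spec in platforms.items():
    bp_per_run, runs, total = run_estimate(*spec)
    print(f"{name:8s} bp/run={bp_per_run:>13,}  runs={runs:>9,.0f}  total=${total:,.0f}")
```

Keeping the run count fractional reproduces the slide's Solexa total of $622,768 (about 66.96 runs at $9,300); billing 67 whole runs would instead give $623,100.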
AML: Whole Genome Sequencing
Data types:
• Whole genome sequence (tumor genome): Solexa
• FL cDNA normalized library: Solexa + 454
• Whole genome sequence (epidermal genome): Solexa
Analysis plans:
• Compare sequence to previously identified mutations.
• Compare increasing coverage levels to heterozygous SNPs from Affy/Illumina arrays for coverage evaluation.
• Devise strategic approaches to find novel variants; validate and characterize.
“933124”
• 57 y/o Caucasian female
• De novo M1 AML
• 100% blasts in initial BM sample
• Relapsed and died at 11 months
• Normal cytogenetics
• No LOH on Affy 500K SNP array
• Informed consent for whole genome sequencing
AML: Whole Genome Sequencing
As of 1/28/08:
• 75 Solexa runs completed (32 bp reads)
• 62 billion bp (~22X haploid coverage)
• 2,123,143 sequence variants detected (Q30)
• 492,569 (23.2%) are previously undiscovered SNPs
• 46,320 heterozygous (informative) SNPs from Affy and Illumina SNP arrays.
• 77% of informative SNPs with both WT and variant alleles were detected in the genome sequence.
• 97.4% of informative SNPs of either allele were detected in the genome sequence.
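The counts above can be cross-checked against each other. The sketch below recomputes the novel-SNP percentage, the number of variants already known, and the haploid coverage. The 3000 Mb genome size is borrowed from the technology comparison slide and is an assumption here; all variable names are illustrative.

```python
total_bp = 62e9              # "62 billion bp"
variants_q30 = 2_123_143     # Q30 sequence variants
novel_snps = 492_569         # previously undiscovered SNPs
genome_size_bp = 3_000e6     # assumption: 3000 Mb haploid genome

novel_fraction = novel_snps / variants_q30
known_variants = variants_q30 - novel_snps

# Haploid coverage is total sequenced bases divided by haploid genome size.
haploid_coverage = total_bp / genome_size_bp

# Genome size that would make 62 billion bp equal exactly 22X.
implied_genome_bp = total_bp / 22

print(f"novel fraction: {100 * novel_fraction:.1f}%")
print(f"previously known variants: {known_variants:,}")
print(f"coverage at 3000 Mb: {haploid_coverage:.1f}X")
print(f"genome size implied by 22X: {implied_genome_bp / 1e6:,.0f} Mb")
```

The known-variant count matches the dbSNP figure of 1,630,574 on the following slide. At 3000 Mb the coverage works out to about 20.7X, so the slide's ~22X presumably rests on a smaller effective genome size (roughly 2,800 Mb); that reading is an inference, not something the slide states.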
AML: Whole Genome Sequencing
“933124” genome sequence: 2,123,143 variants
• dbSNP: 1,630,574
• Intergenic: 145,092
• Genic: 334,477
• Other: 329,322
• Splice_site: 99
• Coding: 5,056
• Synonymous: 1,222
• Missense: 3,402
• Nonsense: 320
• Nonstop: 9
*Only reporting Q30 variants
*Genic region = gene boundary +/- 50kb
AML: Transcriptome Sequencing
Various cDNA library construction procedures & normalization schemes
454 cDNA sequencing:
Number of mapped cDNA reads: 306,267
Solexa cDNA sequencing:
Number of mapped reads: 47,153,784
AML: Transcriptome Sequencing
Expressed genes: variant:germline frequencies
– MYCBP2       1188:345
– HSP90B1      694:1347
– BCCIP        391:394
– NCOR1        256:268
– CHFR         230:52
– DNAJ         218:0
– PTPN11       198:1
– NUMA1        157:2
– CASPASE 7    145:147
– HOX C6       118:2
– PLEKHC1      112:14
– NTRK3        112:10
– CDC2         96:82
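One way to read the variant:germline pairs above is as read counts supporting each allele, which is an assumption since the slide does not define the units. Under that reading, the sketch below converts each pair into a variant allele fraction and ranks the genes; `read_counts` and `variant_fraction` are illustrative names.

```python
# gene: (reads supporting the variant allele, reads supporting the germline allele)
read_counts = {
    "MYCBP2": (1188, 345),
    "HSP90B1": (694, 1347),
    "BCCIP": (391, 394),
    "NCOR1": (256, 268),
    "CHFR": (230, 52),
    "DNAJ": (218, 0),
    "PTPN11": (198, 1),
    "NUMA1": (157, 2),
    "CASPASE 7": (145, 147),
    "HOX C6": (118, 2),
    "PLEKHC1": (112, 14),
    "NTRK3": (112, 10),
    "CDC2": (96, 82),
}

def variant_fraction(variant, germline):
    """Share of reads carrying the variant allele (NaN if there are no reads)."""
    total = variant + germline
    return variant / total if total else float("nan")

# Rank genes from the most to the least variant-dominated expression.
ranked = sorted(read_counts, key=lambda g: variant_fraction(*read_counts[g]), reverse=True)

for gene in ranked:
    v, g = read_counts[gene]
    print(f"{gene:10s} {v:>5}:{g:<5} variant fraction = {variant_fraction(v, g):.3f}")
```

Fractions near 0.5 (for example BCCIP or CASPASE 7) are what balanced expression of both alleles would look like, while values near 1.0 (DNAJ, PTPN11) indicate that almost all transcripts carry the variant.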
V194M (C to T) in FLT3
[Figure: two sequence views, each labeled CT: the cDNA sequence and the tumor genome sequence]
AML: Whole Genome Sequencing
• Currently using SXOligoSearchG (Synamatix) to detect small (1-2 bp) indels.
• Evaluating software tools for detection of larger indels.
AML: Current status
• Diploid coverage was obtained for 77% of an AML M1 tumor genome with 22x haploid coverage.
• 2.1M sequence variants found (similar to other whole genomes already ‘finished’).
• ~495,000 novel variants: SNPs vs. somatic mutations
• 10x coverage of epidermis (“normal”) genome just completed; may identify >90% of variants as rare SNPs.
• Remaining 50,000 variants are being prioritized by detection in cDNA: should be <1,000
• Very rare somatic mutations in cDNA thus far (only 2 validated).
• No mutator (“driver”) phenotype is readily apparent for this AML case; “passenger” mutations appear to be rare.
• We continue to sift through the data…
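The filtering described above (drop known SNPs, drop variants also seen in the normal genome, then prioritize what is detected in cDNA) can be sketched as set operations. The variant records below are made up for illustration, and the tuple layout and variable names are assumptions rather than the pipeline actually used.

```python
# Each variant is (chromosome, position, alternate base); all values are hypothetical.
tumor_variants = {
    ("chr1", 1000, "T"),
    ("chr2", 2500, "A"),
    ("chr5", 777, "G"),
    ("chr13", 4242, "T"),
    ("chr17", 900, "C"),
}
dbsnp = {("chr1", 1000, "T"), ("chr17", 900, "C")}          # known SNPs
normal_variants = {("chr2", 2500, "A")}                     # seen in matched normal
cdna_variants = {("chr13", 4242, "T"), ("chr1", 1000, "T")}  # seen in expressed sequence

novel = tumor_variants - dbsnp                  # not previously catalogued
candidate_somatic = novel - normal_variants     # absent from the normal genome
prioritized = candidate_somatic & cdna_variants  # also detected in cDNA

# Rough size of the remaining pool if >90% of ~495,000 novel variants
# turn out to be rare SNPs (integer arithmetic, 10% remaining).
remaining_estimate = 495_000 * 10 // 100

print(sorted(novel))
print(sorted(candidate_somatic))
print(sorted(prioritized))
print(remaining_estimate)
```

The 10% estimate gives 49,500, in line with the "remaining 50,000 variants" quoted on the slide.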
Cancer Genomics
• Exon-targeted sequencing (TSP, glioblastoma) is revealing useful & interesting findings, but is expensive & slow!
• Next Gen sequencing is here and will have a substantial near-term impact on the study of cancer genomes!
• Ancillary genome-based technologies (expression profiling, SNP arrays, cDNA sequencing) are crucial for understanding the target genome before considering WGS.
• The dream is not hype: a comprehensive understanding of the “cancer genome” is probable, and will change the way that you diagnose & treat your patients.
Acknowledgments
• WU Genome Sequencing Center
Elaine Mardis, Li Ding, Dave Dooling, Tracy Miner, Mike McLellan, Ginger Fewell, Jim Eldred, Asif Chinwalla, Yumi Kasai, Lucinda Fulton, Vince Magrini, Matt Hickenbotham, Lisa Cook, Michael Wendl, Michael Province
• WU Siteman Cancer Center
Tim Ley, Mark Watson, Matt Walter, Rhonda Ries, Jackie Payton, John DiPersio, Dan Link, Michael Tomasson, Tim Graubert, Sharon Heath
• TSP/TCGA Colleagues
Baylor HGSC, Broad Institute, many others…
• Funding sources
NHGRI (Wilson), NCI (Ley), Alvin J. Siteman (AML WGS)
genome.wustl.edu