“Cancer Genomics”
Richard K. Wilson, Ph.D.
Washington University School of Medicine
R.K.Wilson 2007

Cancer Genomics
[Figure: overview diagram; labels: Human Genome v1.0, ancillary genomes (mouse, chimp, etc.), discovery, technology, next-generation sequencing technology, software tools, infrastructure, cancer, other diseases]

PCR-based re-sequencing
• List of candidate genes
• Large collection of patient samples

EGFR mutations in NSCLC
[Figure: EGFR protein schematic; labels: EGF ligand binding, TM, tyrosine kinase (GXGXXG and DFG motifs), LREA, autophosphorylation tyrosines; residue positions marked from 718 to 964]
Most TKI responders have EGFR mutations:
• Study 1: 8/9 (89%) vs. 0/7 controls
• Study 2: 5/5 (100%) vs. 0/4 controls
• Study 3: 19/24 (79%) vs. 0/20 controls

Tumor Sequencing Project (TSP)
• ~600 genes of interest
• ~200 lung adenocarcinoma samples
• Sequencing centers: BCM-HGSC, BI, WUGSC
• Cancer centers: MSKCC, DFCI, SCC, MDA

TSP Target List
• Too expensive to sequence the whole genome; therefore, focus on “druggable” targets.
• For lung adenocarcinoma TSP: ~600 genes (exons only)
  – Receptor tyrosine kinases (e.g., EGFR)
  – Selected serine-threonine kinases
  – Known oncogenes
  – Known tumor suppressor genes
  – EGFR pathway genes
  – DNA repair genes
  – Etc.

SNP Arrays
[Figures: SNP arrays; DNA chips/SNP arrays]

Lung Adeno Genomic Events
[Figures: SNP array analysis of genomic events in lung adenocarcinoma; Weir et al. Nature (2007)]

Lung Adenocarcinoma Amplifications
[Figure: Weir et al. Nature (2007)]

KRAS and TP53 Are Mutated in About 1/3 of Tumor Samples
[Figure: Mutations in lung adenocarcinoma; y-axis: # of mutations (0–70); x-axis: 65 genes, from KRAS, E2F4, and TP53 through RB1 and FGFR1]
• Indels have not been included in the analysis.

Mutations in TP53, ERBB3, and AKT3 appear to correlate with tumor grade
[Figure: mutations by tumor grade; group sizes N=24, N=85, N=71]

Correlations between mutations and clinical features
• Mutations in PDGFRA, PTEN, NTRK1, and PRKDC show a positive correlation with tumor stage.
• Mutations in LRP1B, PRKDC, TP53, and APC correlate with the solid histological subtype of lung adenocarcinoma.
• Mutations in EGFR and MYO3B correlate strongly with never-smokers; mutations in KRAS and LRP1B correlate with smokers.
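The EGFR responder percentages on the NSCLC slide are simple ratios. A minimal Python sketch, for illustration only, recomputes them from the slide's counts. The pooled line is an assumption of this sketch, not something the slide does: it simply sums counts across the three studies, whereas a real meta-analysis would stratify or weight by study. `STUDIES`, `percent`, and `pooled` are hypothetical names.

```python
# Counts transcribed from the "EGFR mutations in NSCLC" slide:
# (mutated responders, total responders, mutated controls, total controls)
STUDIES = {
    "Study 1": (8, 9, 0, 7),
    "Study 2": (5, 5, 0, 4),
    "Study 3": (19, 24, 0, 20),
}


def percent(k: int, n: int) -> float:
    """Return k out of n as a percentage."""
    return 100.0 * k / n


def pooled(studies: dict) -> tuple:
    """Naively pool counts by summing each column across studies.

    Illustrative assumption only: a real meta-analysis would stratify or
    weight by study instead of summing raw counts.
    """
    return tuple(sum(column) for column in zip(*studies.values()))


if __name__ == "__main__":
    for name, (rm, rt, cm, ct) in STUDIES.items():
        print(f"{name}: {rm}/{rt} ({percent(rm, rt):.0f}%) vs. {cm}/{ct} controls")
    rm, rt, cm, ct = pooled(STUDIES)
    print(f"Pooled: {rm}/{rt} ({percent(rm, rt):.0f}%) vs. {cm}/{ct} controls")
```

Run as a script, it prints 89%, 100%, and 79% for the three studies, matching the slide, and 32/38 (84%) vs. 0/31 controls for the naive pool.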
EGFR mutations in glioblastoma
• Screen of kinase domains in glioblastoma: no recurrent mutations
• But…
[Figure: EGFR exon/domain map (extracellular (EC) domains I–IV, TM, JM, kinase domain (KD)) showing EGFRvIII (del AA 30–297) and mutations D46N, G63R, R108K, T263P, A289V/D/T, R324L, E330K, P596L, G598V, and L861Q (kinase); red = somatic, blue = germline, black = unknown]
• 119 lung tumors: no EC mutations
• 270 HapMap normals: no EC mutations
• 18/132 glioblastomas (13.6%); +1 KD
• 1/8 glioblastoma cell lines (12.5%)
• 0/11 lower-grade gliomas
• 151 total samples

Genomic Studies of Cancer
• Hypothesis-driven (biased):
  – Gene sets with related functions: “kinome”, “phosphatome”
  – Genes mutated in other cancers
  – Closely related genes
  – Investigator-driven ideas
• Data-driven (unbiased):
  – Use genomic platforms to identify loci with recurrent somatic alterations
  – Array-based RNA profiling
  – Array CGH
  – Array-based SNP genotyping

Acute myelogenous leukemia
• Project initiated in 2002.
• Primary tumors with matched normal tissue (to distinguish germline variants from somatic mutations)
• “Discovery set” (46 tumors) + “Validation set” (94 tumors)
• Initial target list: 450 genes
• Orthogonal technologies (CGH arrays, expression profiling, etc.) for genome characterization and to detect additional sequencing targets.

Acute myelogenous leukemia: mutation frequencies
• FLT3: 29%
• NPM1: 25%
• NRAS: 9.6%
• PTPN11: 4%
• RUNX1: 4%
• GCSFR: 4%
• Others: 2–3%

Is there a better approach?
• What are we missing outside of the exons?
• PCR-based re-sequencing:
  – Relatively expensive
  – Diploid (at best) & low coverage

Solexa/Illumina 1G Analyzer
Illumina flow cell
• Acts as the microfluidic conduit for cluster generation and sequencing reagents.
• 8-lane flow cell configuration.
• Separate libraries can be sequenced in each lane, or the same library in all.
• ~60M clusters are sequenced per flow cell.
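The AML design pairs each primary tumor with matched normal tissue so that germline variants can be separated from somatic mutations. A minimal Python sketch of that comparison as set operations, assuming variant calls have already been normalized to (chromosome, position, ref, alt) records; `Variant` and `classify_variants` are hypothetical names, and the example calls are invented for illustration, not study data.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal representation of a single-nucleotide variant call."""
    chrom: str
    pos: int
    ref: str
    alt: str


def classify_variants(tumor: set[Variant], normal: set[Variant]) -> dict[str, set[Variant]]:
    """Split tumor calls by whether they also occur in the matched normal.

    Calls present in both samples are treated as germline; calls present only
    in the tumor are candidate somatic mutations that still need validation.
    """
    return {
        "germline": tumor & normal,
        "somatic_candidates": tumor - normal,
    }


if __name__ == "__main__":
    # Invented calls for illustration only; not data from the AML study.
    tumor_calls = {
        Variant("chr1", 1000, "C", "T"),
        Variant("chr2", 2000, "G", "A"),
        Variant("chr3", 3000, "A", "G"),
    }
    normal_calls = {
        Variant("chr1", 1000, "C", "T"),
        Variant("chr3", 3000, "A", "G"),
    }
    for label, calls in classify_variants(tumor_calls, normal_calls).items():
        print(label, sorted(calls))
```

In practice, a variant missing from the normal call set may simply reflect low coverage there, so tumor-only calls are candidates for validation rather than confirmed somatic mutations.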
Next Generation Sequencing Technologies
Genome size: 3000 Mb
• 3730: 600 bp/read × 96 reads/run = 57,600 bp/run; 6× coverage requires 312,500 runs at $48/run, for a total cost of $15,000,000.
• 454 FLX: 250 bp/read × 400,000 reads/run = 100,000,000 bp/run; 12× coverage requires 360 runs at $6,800/run, for a total cost of $2,448,000.
• Solexa: 32 bp/read × 28,000,000 reads/run = 896,000,000 bp/run; 20× coverage requires 67 runs at $9,300/run, for a total cost of $622,768.

AML: Whole Genome Sequencing
Data types:
• Whole genome sequence (tumor genome): Solexa
• FL cDNA normalized library: Solexa + 454
• Whole genome sequence (epidermal genome): Solexa
Analysis plans:
• Compare sequence to previously identified mutations.
• Compare increasing coverage levels to heterozygous SNPs from Affy/Illumina arrays for coverage evaluation.
• Devise strategic approaches to find novel variants; validate and characterize them.

“933124”
• 57 y/o Caucasian female
• De novo M1 AML
• 100% blasts in initial BM sample
• Relapsed and died at 11 months
• Normal cytogenetics
• No LOH on Affy 500K SNP array
• Informed consent for whole genome sequencing

AML: Whole Genome Sequencing (as of 1/28/08)
• 75 Solexa runs completed (32 bp reads)
• 62 billion bp (~22X haploid coverage)
• 2,123,143 sequence variants detected (Q30)
• 492,569 (23.2%) are previously undiscovered SNPs
• 46,320 heterozygous (informative) SNPs from Affy and Illumina SNP arrays
• Both the WT and the variant allele were detected in the genome sequence for 77% of informative SNPs.
• At least one allele (WT or variant) was detected in the genome sequence for 97.4% of informative SNPs.
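The sequencing cost comparison follows from three multiplications: bp/run = bp/read × reads/run, runs required = genome size × required coverage ÷ bp/run, and total cost = runs required × cost per run. A minimal Python sketch reproducing those figures, with hypothetical names (`Platform`, `runs_required`, `total_cost`). One detail it makes visible: the Solexa total of $622,768 corresponds to the unrounded 66.96 runs; 67 whole runs at $9,300 would come to $623,100.

```python
from dataclasses import dataclass

GENOME_SIZE_BP = 3_000_000_000  # 3000 Mb, as on the slide


@dataclass(frozen=True)
class Platform:
    name: str
    bp_per_read: int
    reads_per_run: int
    coverage: int        # required fold coverage
    cost_per_run: float  # US dollars

    @property
    def bp_per_run(self) -> int:
        return self.bp_per_read * self.reads_per_run

    def runs_required(self, genome_bp: int = GENOME_SIZE_BP) -> float:
        # Unrounded number of runs needed to reach the target coverage.
        return genome_bp * self.coverage / self.bp_per_run

    def total_cost(self, genome_bp: int = GENOME_SIZE_BP) -> float:
        return self.runs_required(genome_bp) * self.cost_per_run


# Parameters transcribed from the slide.
PLATFORMS = [
    Platform("3730", 600, 96, 6, 48),
    Platform("454 FLX", 250, 400_000, 12, 6_800),
    Platform("Solexa", 32, 28_000_000, 20, 9_300),
]

if __name__ == "__main__":
    for p in PLATFORMS:
        print(f"{p.name}: {p.bp_per_run:,} bp/run, "
              f"{p.runs_required():,.2f} runs, ${p.total_cost():,.0f}")
```

Keeping `runs_required` unrounded matches the slide's totals; a planner who needs whole runs would round up with `math.ceil` before multiplying by the per-run cost.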
AML: Whole Genome Sequencing
“933124” genome sequence: 2,123,143 variants
• dbSNP: 1,630,574
• Intergenic: 145,092
• Genic: 334,477
  – Splice_site: 99
  – Coding: 5,056 (Synonymous 1,222; Missense 3,402; Nonsense 320; Nonstop 9)
  – Other: 329,322
*Only reporting Q30 variants
*Genic region = gene boundary +/- 50 kb

AML: Transcriptome Sequencing
• Various cDNA library construction procedures & normalization schemes
• 454 cDNA sequencing: 306,267 mapped cDNA reads
• Solexa cDNA sequencing: 47,153,784 mapped reads

AML: Transcriptome Sequencing
Expressed genes: variant:germline frequencies
  – MYCBP2 1188:345
  – HSP90B1 694:1347
  – BCCIP 391:394
  – NCOR1 256:268
  – CHFR 230:52
  – DNAJ 218:0
  – PTPN11 198:1
  – NUMA1 157:2
  – CASPASE 7 145:147
  – HOX C6 118:2
  – PLEKHC1 112:14
  – NTRK3 112:10
  – CDC2 96:82

V194M (C to T) in FLT3
[Figure: C/T at the variant position in both the cDNA sequence and the tumor genome sequence]

AML: Whole Genome Sequencing
• Currently using SXOligoSearchG (Synamatix) to detect small (1–2 bp) indels.
• Evaluating software tools for detection of larger indels.

AML: Current status
• Diploid coverage was obtained for 77% of an AML M1 tumor genome with 22x haploid coverage.
• 2.1M sequence variants found (similar to other whole genomes already ‘finished’).
• ~495,000 novel variants: SNPs vs. somatic mutations
• 10x coverage of the epidermis (“normal”) genome just completed; may identify >90% of variants as rare SNPs.
• Remaining 50,000 variants are being prioritized by detection in cDNA: should be <1,000.
• Somatic mutations in cDNA have been very rare thus far (only 2 validated).
• No mutator (“driver”) phenotype is readily apparent for this AML case; “passenger” mutations appear to be rare.
• We continue to sift through the data…

Cancer Genomics
• Exon-targeted sequencing (TSP, glioblastoma) is revealing useful & interesting findings, but it is expensive & slow!
• Next Gen sequencing is here and will have a substantial near-term impact on the study of cancer genomes!
• Ancillary genome-based technologies (expression profiling, SNP arrays, cDNA sequencing) are crucial for understanding the target genome before considering WGS.
• The dream is not hype: a comprehensive understanding of the “cancer genome” is probable, and will change the way that you diagnose & treat your patients.

Acknowledgments
• WU Genome Sequencing Center: Elaine Mardis, Li Ding, Dave Dooling, Tracy Miner, Mike McLellan, Ginger Fewell, Jim Eldred, Asif Chinwalla, Yumi Kasai, Lucinda Fulton, Vince Magrini, Matt Hickenbotham, Lisa Cook, Michael Wendl, Michael Province
• WU Siteman Cancer Center: Tim Ley, Mark Watson, Matt Walter, Rhonda Ries, Jackie Payton, John DiPersio, Dan Link, Michael Tomasson, Tim Graubert, Sharon Heath
• TSP/TCGA Colleagues: Baylor HGSC, Broad Institute, many others…
• Funding sources: NHGRI (Wilson), NCI (Ley), Alvin J. Siteman (AML WGS)
genome.wustl.edu