Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Cancer Genomics Lecture Outline • How do you do whole genome sequencing? – with massively parallel sequencing – (slides courtesy of Jason Lieb) • What has this revealed about cancer genomes? – How many mutations of what type? • What are the ethical clinical implications about whole genome sequencing of patients? – what do you tell a patient about their genome? Lessons from Cancer Genomics •Sequence lots of cancer genomes. •Why do this? •It gives a more precise definition of cancer in general and of YOUR cancer in particular •The potential for personalized or “precision” medicine •Tumor heterogeneity: •Results from tumor evolution; genetic changes over time •Causes resistance to chemotherapy via natural selection •this is why you relapse Tumor heterogeneity makes cancer treatment more difficult The Downside of Diversity Science 29 March 2013: vol. 339 no. 6127 1543-1545 Genetic heterogeneity among the cells of an individual tumor always exists and can impact the response to therapeutics. Published by AAAS Fig. 6 Four types of genetic heterogeneity in tumors, illustrated by a primary tumor in the pancreas and its metastatic lesions in the liver. B Vogelstein et al. Science 2013;339:1546-1558 Cancer Genomics Lecture Outline • How do you do whole genome sequencing? – with massively parallel sequencing • What has this revealed about cancer genomes? – How many mutations of what type? • What are the ethical clinical implications about whole genome sequencing of patients? – what do you tell a patient about their genome? Why Sequence Whole Genomes? • To speed characterization of genes mapped by linkage • To obtain a "parts list" for what makes up an organism. All of the instructions are there, and the book is in front of us. Now, we just need to figure out what it all means. • To discover what sets of genes make organisms (and each of us) similar to and different from one another. • To understand our evolutionary heritage. Our genomes are a reflection of our recent and ancient origins. How Are Genomes Sequenced? In the 1980's, all of the tools were in place for genome sequencing to begin • Vectors for making genomic libraries • PCR for amplifying genes • DNA sequencing machines But sequencing was too slow and too expensive to go for whole genomes Di-deoxy nucleotides are used for "Sanger" sequencing 5’ 4’ 1’ 3’ 2’, 3’ dideoxy nucleotide triphosphate 2’ How it works: Dideoxy nucleotides terminate the DNA chain at positions according to DNA sequence What happens if we put ddATP into our reaction? P32 label on primer 20 nt 19 nt ddA ddA ddA 10 nt Sanger sequencing • • • • • Single primer PCR + chain terminating nucleotides Fredrick Sanger (1975) 1980 Nobel prize 1 molecule at a time First genome sequence (1977) • ΦX174 phage • 5,386 bp • ~1000 bp per run • Currently $5 / sequence (2 bp / cent) Modern Sanger sequencing uses fluorescent dyes instead of radioactivity and capillary tubes instead of gels Sanger Sequencing Output A strategy called Shotgun sequencing, coupled with advances in sequencing technology, robotics, and computers, made "Genomics" possible 1. Blow genome to bits 2. Sequence the little bits 3. Put all the little bits back together in the right order based on their sequence 4. Assemble longer and longer pieces until the genome is complete. There are always some gaps after shotgun sequencing Paired-end reads can identify clones that span sequence gaps Note that some gaps may never be closed. For example, it is difficult to assemble repetitive DNA sequence Using these strategies, between 1995 and 2003, the genome sequence of over 240 organisms had been determined 1977: First viral sequence ΦX174 (5.3 kb) 1995: First complete prokaryotic genome (H. influenzae, 1.8 Mb) 1996: First eukaryote, S. cerevisiae, 12 Mb 1998: First animal, C. elegans, 100 Mb 2001: Drosophila, 180 Mb (first genome by shotgun) 2001:“Draft” human sequence, 3 Gb. Book (2008): “Large Sequencing Centers can now assemble a mammalian genome in 1-2 years”. 2011: 30X genome in 10 days for ~$10,000-20,000 2012: 30X genome in 3 days for ~$3,000 The Sequence Explosion Nature, Feb. 2010 Next-generation Sequencing (Deep Sequencing) Massively parallel sequencing of short DNA fragments • It will not replace Sanger sequencing for small sequencing projects. • It has changed the approach for large scale sequencing projects and genome research. Time frames and cost for genome sequencing (Sanger): 1997 Yeast genome 13Mb – ~10 years; cost $30M 2003 Human genome ~3Gb – 15 years (1988-2003); $2.7 billion 1991 dollars – – Genome sequencing (20 x coverage) in September 2008 (2 Illumina GAII): Yeast genome 13Mb – 8 genomes per 6 day run; cost $1200/genome Human genome ~3Gb – 10-15 weeks; cost ~$150k – – Genome sequencing (20 x coverage) in March 2009 (3 Illumina systems GAII): Human genome ~3Gb: 7 weeks; cost ~$75k over 10,000 times cheaper – – Genome sequencing (20 x coverage) in March 2011 (1 Illumina Hi-seq): Human genome ~3Gb: 10 days; cost ~$15k – » Over 50,000 times cheaper, 500 times faster » Jan 2012 update: $3-5k, 2x faster Genome sequencing (20 x coverage) in 2015 (1 Illumina Hi-seq 2500): – Human genome for only $1000 (60 billion bases)!! “Some Next Gen” sequencing platforms Roche Genome Sequencer FLX System (454) Illumina Genome Analyzer (GAII and HiSeq) aka “Solexa” ABI SOLiD System HeliScope Single Molecule Sequencer Shendure J. & Hanlee J., Nature biotechnology 2008 Illumina Sequencing 3’ 5’ DNA (0.1-1.0 ug) A G C T G C T A C G A T A C C C G A T C G A T A T C G A T G C T Sample preparation Single Cluster molecule growtharray 5’ Sequencing 1 2 3 4 5 6 7 8 9 T G C T A C G A T … Image acquisition Base calling Barcodes Current status • 2 flow cells per machine • 8 lanes per flow cell • 200-300 million molecules sequenced per lane • 50 – 250 bp single or paired-end reads • $800-$2000 per lane The High-Throughput Sequencing Facility's • ~250,000 bp / cent sequencing bank is one of the largest in the country with 10 Illumina HiSeqs, a PacBio RS2, and several Ion Torrents and Protons. Cancer Genomics Lecture Outline • How do you do whole genome sequencing? – with massively parallel sequencing • What has this revealed about cancer genomes? – How many mutations of what type? • What are the ethical clinical implications about whole genome sequencing of patients? – what do you tell a patient about their genome? Some thoughts from sequencing cancer genomes • Gatekeeper or driver mutations • Passenger mutations • No metastatic genes Fig. 1 Number of somatic mutations in representative human cancers, detected by genome-wide sequencing studies. Most human cancers are caused by two to eight sequential alterations that develop over the course of 20 to 30 years. Each of these alterations directly or indirectly increases the ratio of cell birth to cell death; that is, each alteration causes a selective growth advantage to the cell in which it resides. Published by AAAS B Vogelstein et al. Science 2013;339:1546-1558 There are two types of cancer “driver” genes (e.g. oncogenes and tumor suppressors) Mut-driver genes: The evidence to date suggests that there are ~140 genes whose intragenic mutations contribute to cancer. Epi-driver genes: Genes that are altered by epigenetic mechanisms and cause a selective growth advantage. The definitive identification of epi-driver genes has been challenging. Published by AAAS B Vogelstein et al. Science 2013;339:1546-1558 Fig. 5 Number and distribution of driver gene mutations in five tumor types. B Vogelstein et al. Science 2013;339:1546-1558 Published by AAAS Fig. 3 Total alterations affecting protein-coding genes in selected tumors. B Vogelstein et al. Science 2013;339:1546-1558 Published by AAAS Fig. 4 Distribution of mutations in two oncogenes (PIK3CA and IDH1) and two tumor suppressor genes (RB1 and VHL). B Vogelstein et al. Science 2013;339:1546-1558 Published by AAAS The Hallmarks of Cancer Hanahan and Weinberg, Cell 144:646 (2011) Fig. 7 Cancer cell signaling pathways and the cellular processes they regulate. The known driver genes function through a dozen signaling pathways that regulate three core cellular processes: cell fate determination, cell survival, and genome maintenance. Published by AAAS B Vogelstein et al. Science 2013;339:1546-1558 Fig. 8 Signal transduction pathways affected by mutations in human cancer. Every individual tumor, even of the same histopathologic subtype as another tumor, is distinct with respect to its genetic alterations, but the pathways affected in different tumors are similar. Published by AAAS B Vogelstein et al. Science 2013;339:1546-1558 RESEARCH IN “MODEL” ORGANISMS! “The vast majority of our knowledge of the function of driver genes has been derived from the study of the pathways through which their homologs work in nonhuman organisms.” We believe that greater knowledge of these pathways and the ways in which they function is the most pressing need in basic cancer research. B Vogelstein et al. Science 2013;339:1546-1558 Cancer Genomics Lecture Outline • How do you do whole genome sequencing? – with massively parallel sequencing • What has this revealed about cancer genomes? – How many mutations of what type? • What are the ethical clinical implications about whole genome sequencing of patients? – what do you tell a patient about their genome? Some Definitions Regarding WGS • Actionable Item – when you can actually do something for the patient • Clinical Validity – the data supporting a genotype/phenotype relationship is strong • Clinical Utility – you can address (e.g. treat or advise) a clinically valid genotype • The “incidentalome” – the vast majority of a patients genetic variants will be unrelated to the presenting symptoms PGx: pharmacogenetics Berg et al. Genetics IN Medicine • Volume 13, Number 6, June 2011 Computationally binning human genetic variants We categorized 2,016 genes linked with Mendelian diseases into “bins” based on clinical utility and validity, and used a computational algorithm to analyze 80 whole-genome sequences in order to explore the use of such an approach in a simulated real world setting. Berg et al. Volume 15 | Number 1 | January 2013 | Genetics in medicine