Download Cancer Genomics

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

The Cancer Genome Atlas wikipedia , lookup

Transcript
Cancer Genomics Lecture Outline
• How do you do whole genome sequencing?
– with massively parallel sequencing
– (slides courtesy of Jason Lieb)
• What has this revealed about cancer genomes?
– How many mutations of what type?
• What are the ethical clinical implications about
whole genome sequencing of patients?
– what do you tell a patient about their genome?
Lessons from Cancer Genomics
•Sequence lots of cancer genomes.
•Why do this?
•It gives a more precise definition of cancer in general and of YOUR cancer in
particular
•The potential for personalized or “precision” medicine
•Tumor heterogeneity:
•Results from tumor evolution; genetic changes over time
•Causes resistance to chemotherapy via natural selection
•this is why you relapse
Tumor heterogeneity makes cancer
treatment more difficult
The Downside of
Diversity Science 29
March 2013: vol. 339
no. 6127 1543-1545
Genetic heterogeneity among the cells of an individual tumor
always exists and can impact the response to therapeutics.
Published by AAAS
Fig. 6 Four types of genetic heterogeneity in tumors, illustrated by a primary
tumor in the pancreas and its metastatic lesions in the liver.
B Vogelstein et al. Science 2013;339:1546-1558
Cancer Genomics Lecture Outline
• How do you do whole genome sequencing?
– with massively parallel sequencing
• What has this revealed about cancer genomes?
– How many mutations of what type?
• What are the ethical clinical implications about
whole genome sequencing of patients?
– what do you tell a patient about their genome?
Why Sequence Whole Genomes?
• To speed characterization of
genes mapped by linkage
• To obtain a "parts list" for
what makes up an organism.
All of the instructions are there, and
the book is in front of us. Now, we
just need to figure out what it all
means.
• To discover what sets of
genes make organisms (and
each of us) similar to and
different from one another.
• To understand our
evolutionary heritage. Our
genomes are a reflection of our
recent and ancient origins.
How Are Genomes Sequenced?
In the 1980's, all of the tools were in place
for genome sequencing to begin
• Vectors for making genomic libraries
• PCR for amplifying genes
• DNA sequencing machines
But sequencing was too slow and too
expensive to go for whole genomes
Di-deoxy nucleotides are used for "Sanger" sequencing
5’
4’
1’
3’
2’, 3’ dideoxy nucleotide triphosphate
2’
How it works: Dideoxy nucleotides terminate the DNA chain
at positions according to DNA sequence
What happens if we put ddATP into our reaction?
P32 label
on
primer
20 nt
19 nt
ddA
ddA
ddA
10 nt
Sanger sequencing
•
•
•
•
•
Single primer PCR + chain terminating nucleotides
Fredrick Sanger (1975)
1980 Nobel prize
1 molecule at a time
First genome sequence (1977)
• ΦX174 phage
• 5,386 bp
• ~1000 bp per run
• Currently $5 / sequence (2 bp / cent)
Modern Sanger sequencing uses fluorescent dyes instead
of radioactivity and capillary tubes instead of gels
Sanger Sequencing Output
A strategy called Shotgun sequencing, coupled with
advances in sequencing technology, robotics, and
computers, made "Genomics" possible
1. Blow genome to bits
2. Sequence the little bits
3. Put all the little bits back
together in the right order
based on their sequence
4. Assemble longer and
longer pieces until the
genome is complete.
There are always some gaps after
shotgun sequencing
Paired-end reads can identify clones that span
sequence gaps
Note that some gaps may never be closed. For example, it
is difficult to assemble repetitive DNA sequence
Using these strategies, between 1995 and
2003, the genome sequence of over 240
organisms had been determined
1977: First viral sequence ΦX174 (5.3 kb)
1995: First complete prokaryotic genome (H. influenzae, 1.8 Mb)
1996: First eukaryote, S. cerevisiae, 12 Mb
1998: First animal, C. elegans, 100 Mb
2001: Drosophila, 180 Mb (first genome by shotgun)
2001:“Draft” human sequence, 3 Gb.
Book (2008): “Large Sequencing Centers can now assemble a
mammalian genome in 1-2 years”.
2011: 30X genome in 10 days for ~$10,000-20,000
2012: 30X genome in 3 days for ~$3,000
The Sequence Explosion
Nature, Feb. 2010
Next-generation Sequencing
(Deep Sequencing)
Massively parallel sequencing of short DNA fragments
• It will not replace Sanger sequencing for small sequencing projects.
• It has changed the approach for large scale sequencing projects and genome
research.
Time frames and cost for genome sequencing (Sanger):
1997 Yeast genome 13Mb – ~10 years; cost $30M
2003 Human genome ~3Gb – 15 years (1988-2003); $2.7 billion 1991 dollars
–
–
Genome sequencing (20 x coverage) in September 2008 (2 Illumina GAII):
Yeast genome 13Mb – 8 genomes per 6 day run; cost $1200/genome
Human genome ~3Gb – 10-15 weeks; cost ~$150k
–
–
Genome sequencing (20 x coverage) in March 2009 (3 Illumina systems GAII):
Human genome ~3Gb: 7 weeks; cost ~$75k
over 10,000 times cheaper
–
–
Genome sequencing (20 x coverage) in March 2011 (1 Illumina Hi-seq):
Human genome ~3Gb: 10 days; cost ~$15k
–
»
Over 50,000 times cheaper, 500 times faster
»
Jan 2012 update: $3-5k, 2x faster
Genome sequencing (20 x coverage) in 2015 (1 Illumina Hi-seq 2500):
–
Human genome for only $1000 (60 billion bases)!!
“Some Next Gen” sequencing platforms
Roche Genome
Sequencer FLX System
(454)
Illumina Genome Analyzer
(GAII and HiSeq)
aka “Solexa”
ABI SOLiD System
HeliScope Single
Molecule Sequencer
Shendure J. & Hanlee J., Nature biotechnology 2008
Illumina Sequencing
3’ 5’
DNA
(0.1-1.0 ug)
A
G
C
T
G
C
T
A
C
G
A
T
A
C
C
C
G
A
T
C
G
A
T
A
T
C
G
A
T
G
C
T
Sample
preparation
Single
Cluster
molecule
growtharray
5’
Sequencing
1
2
3
4
5
6
7
8
9
T G C T A C G A T …
Image acquisition
Base calling
Barcodes
Current status
• 2 flow cells per machine
• 8 lanes per flow cell
• 200-300 million molecules sequenced per
lane
• 50 – 250 bp single or paired-end reads
• $800-$2000 per lane
The High-Throughput Sequencing Facility's
• ~250,000 bp / cent
sequencing bank is one of the largest in the
country with 10 Illumina HiSeqs, a PacBio RS2,
and several Ion Torrents and Protons.
Cancer Genomics Lecture Outline
• How do you do whole genome sequencing?
– with massively parallel sequencing
• What has this revealed about cancer genomes?
– How many mutations of what type?
• What are the ethical clinical implications about
whole genome sequencing of patients?
– what do you tell a patient about their genome?
Some thoughts from sequencing
cancer genomes
• Gatekeeper or driver mutations
• Passenger mutations
• No metastatic genes
Fig. 1 Number of somatic mutations in
representative human cancers, detected
by genome-wide sequencing studies.
Most human cancers are
caused by two to eight
sequential alterations
that develop over the
course of 20 to 30 years.
Each of these alterations directly or
indirectly increases the ratio of cell birth
to cell death; that is, each alteration
causes a selective growth advantage to
the cell in which it resides.
Published by AAAS
B Vogelstein et al. Science 2013;339:1546-1558
There are two types of cancer “driver” genes
(e.g. oncogenes and tumor suppressors)
Mut-driver genes:
The evidence to date suggests that there are ~140 genes whose
intragenic mutations contribute to cancer.
Epi-driver genes:
Genes that are altered by epigenetic mechanisms and cause a
selective growth advantage.
The definitive identification of epi-driver genes has been
challenging.
Published by AAAS
B Vogelstein et al. Science 2013;339:1546-1558
Fig. 5 Number and distribution of driver gene mutations in five tumor types.
B Vogelstein et al. Science 2013;339:1546-1558
Published by AAAS
Fig. 3 Total alterations affecting protein-coding genes in selected tumors.
B Vogelstein et al. Science 2013;339:1546-1558
Published by AAAS
Fig. 4 Distribution of mutations in two oncogenes (PIK3CA and IDH1) and two tumor
suppressor genes (RB1 and VHL).
B Vogelstein et al. Science 2013;339:1546-1558
Published by AAAS
The Hallmarks of Cancer
Hanahan and Weinberg, Cell 144:646 (2011)
Fig. 7 Cancer cell signaling pathways and the cellular processes they regulate.
The known driver genes
function through a
dozen signaling
pathways that regulate
three core cellular
processes: cell fate
determination, cell
survival, and genome
maintenance.
Published by AAAS
B Vogelstein et al. Science 2013;339:1546-1558
Fig. 8 Signal transduction pathways affected by mutations in human cancer.
Every individual tumor,
even of the same
histopathologic subtype
as another tumor, is
distinct with respect to
its genetic alterations,
but the pathways
affected in different
tumors are similar.
Published by AAAS
B Vogelstein et al. Science 2013;339:1546-1558
RESEARCH IN
“MODEL”
ORGANISMS!
“The vast majority of our knowledge of the
function of driver genes has been derived from
the study of the pathways through which their
homologs work in nonhuman organisms.”
We believe that greater knowledge of these pathways and the ways in
which they function is the most pressing need in basic cancer research.
B Vogelstein et al. Science 2013;339:1546-1558
Cancer Genomics Lecture Outline
• How do you do whole genome sequencing?
– with massively parallel sequencing
• What has this revealed about cancer genomes?
– How many mutations of what type?
• What are the ethical clinical implications about
whole genome sequencing of patients?
– what do you tell a patient about their genome?
Some Definitions Regarding WGS
• Actionable Item
– when you can actually do something for the patient
• Clinical Validity
– the data supporting a genotype/phenotype
relationship is strong
• Clinical Utility
– you can address (e.g. treat or advise) a clinically valid
genotype
• The “incidentalome”
– the vast majority of a patients genetic variants will be
unrelated to the presenting symptoms
PGx: pharmacogenetics
Berg et al. Genetics IN Medicine • Volume 13, Number 6, June 2011
Computationally binning human genetic variants
We categorized 2,016 genes linked with Mendelian diseases into “bins” based on
clinical utility and validity, and used a computational algorithm to analyze 80
whole-genome sequences in order to explore the use of such an approach in a
simulated real world setting.
Berg et al. Volume 15 | Number 1 | January 2013 | Genetics in medicine