Detecting Chromosomal Rearrangements
Genomic DNA Variation
Computer-Aided Discovery Methods
Baylor College of Medicine course 311-405
Term 3, 2008/2009
Lecture on Wednesday, January 28th, 2009
Aleksandar Milosavljevic, PhD
http://www.brl.bcm.tmc.edu
Entering Segment 2
Segment 1 (3 weeks): Cancer Lectures (1,2,3)
Lab: Genboree, Ruby
Segment 2 (4 weeks): Bringing it together: Lecture+Lab
Segment 3: Review lectures
Background reading
A broad-brush survey of trends:
CREATIVITY SUPPORT TOOLS
Accelerating Discovery and Innovation
Ben Shneiderman
A bit of history and pointers to philosophy
( Karl Popper, C.S. Peirce ):
THINKING WITH MACHINES:
Intelligence Augmentation, Evolutionary Epistemology, and
Semiotic
Peter Skagestad
Cancer Genome Variation: Methods
Recent landmark studies ( not covered this year ):
The Cancer Genome Atlas Research Network. Comprehensive genomic characterization
defines human glioblastoma genes and core pathways. Nature. 2008 Sep 4. [Epub
ahead of print].
Parsons DW et al. An integrated genomic analysis of human glioblastoma multiforme.
Science. 2008 Sep 26;321(5897):1807-12. Epub 2008 Sep 4.
Jones S et al. Core signaling pathways in human pancreatic cancers revealed by global
genomic analyses. Science. 2008 Sep 26;321(5897):1801-6. Epub 2008 Sep 4.
Cancer Genome Variation: Methods
Lab focus ( Friday )
Lecture focus today
Chromosome Aberrations: References
1 of 2
Background (optional)
[Balmain 2001]
Balmain, A., Cancer genetics:
from Boveri and Mendel to microarrays. Nat Rev Cancer,
2001. 1(1): p. 77-82.
[Albertson et al. 2003]
Albertson, D.G., et al.,
Chromosome aberrations in solid tumors. Nat Genet,
2003. 34(4): p. 369-76.
[Rabbitts et al. 2003]
Rabbitts, T.H. and M.R. Stocks,
Chromosomal translocation products engender new
intracellular therapeutic technologies. Nat Med, 2003.
9(4): p. 383-6.
Chromosome Aberrations: References
2 of 2
Breast cancer – copy number variation, array CGH and gene
expression
[Chin K. et al. 2006]
Chin K et al. Genomic and transcriptional
aberrations linked to breast cancer pathophysiologies, Cancer
Cell 10:529-541 2006
Prostate Cancer – aberrant fusions – via gene expression
[Tomlins et al. 2005]
Tomlins SA et al., Recurrent fusion of
TMPRSS2 and ETS transcription factor genes in prostate
cancer. Science, 2005. 310(5748): p. 644-8.
Breast cancer – direct detection of aberrant fusions by end-sequence profiling
[Hampton OA et al] Hampton, OA et al, A sequence-level map of
chromosomal breakpoints in the MCF-7 breast cancer cell line
yields insights into the evolution of a cancer genome.
Genome Research. 2008 Dec 9. [Epub ahead of print]
Boveri, one century ago …
Multiple cell poles cause unequal
segregation of chromosomes.
a | Fertilization of sea-urchin eggs by
two sperm results in multiple cell poles.
b | Chromosomes are aberrantly
segregated
[Balmain 2001]
Chromosomal aberrations
[Albertson et al.]
Cancer Genome Variation: Methods
(Array) Comparative Genome Hybridization
(array CGH)
Chin K. et al., Genomic and transcriptional
aberrations linked to breast cancer
pathophysiologies, Cancer Cell 10:529-541 2006.
• 100+ aggressively treated early stage breast tumors
1989-1997, before ERBB2 antagonist Trastuzumab
(Herceptin) was approved for treating ERBB2+ breast
cancer
ERBB2 heuristic (“paradigm”)
formulated in last sentence of Chin K. et al.
“Taking ERBB2 as the paradigm
(recurrently amplified, overexpressed,
associated with outcome and with
demonstrated functional importance in
cancer) suggests FGFR1, TACC1,
ADAM9, IKBKB, PNMT, and GRB7 as
high-priority therapeutic targets in these
regions of amplification.”
“Taking ERBB2 as the paradigm
(recurrently amplified, overexpressed…
Array CGH (~3K BAC array)
Gene expression (Affymetrix U133A array)
“Taking ERBB2 as the paradigm (recurrently amplified…
“Taking ERBB2 as the paradigm (… associated with outcome…)
Mapping rearrangements ( aberrant
fusions ) using paired ends
Deletions, amplifications induce aberrant
fusions
…but…
Some aberrant fusion-producing
rearrangements ( reciprocal
translocations, inversions ) may not affect
copy number
Two significant types of aberrant fusions
• aberrantly amplified expression
• aberrant activation of signaling protein
[Rabbitts et al.]
BCR-ABL fusion in Chronic Myeloid Leukaemia: four
decades from lesion discovery to Imatinib (Gleevec)
1960: Philadelphia chromosome discovered
1973: Chromosome translocation t(9;22) identified
1983: Activated oncogene ABL identified
2001: Drug inhibiting BCR-ABL fusion identified
Fourfold significance of recurrent
chromosomal aberrations
Prognostic Marker
Drug target
Pointing to biological pathway
Early diagnostic marker
Two case studies of fusion discovery
Case Study: Prostate Cancer
Overexpression → recurrent chromosomal aberration
[Tomlins et al. 2005] Tomlins, S.A., et al., Recurrent fusion of
TMPRSS2 and ETS transcription factor genes in prostate
cancer. Science, 2005. 310(5748): p. 644-8.
Case Study: Breast cancer
Direct discovery of submicroscopic chromosomal
aberrations
[Hampton OA et al] Hampton, OA et al. A sequence-level map
of chromosomal breakpoints in the MCF-7 breast cancer
cell line yields insights into the evolution of a cancer
genome. Genome Research. 2008 Dec 9. [Epub ahead of
print]
Case Study: Prostate Cancer
Recurrent ( > 50% cases) chromosomal aberrations discovered in
leukaemias, lymphomas, and sarcomas
Carcinomas more complex:
-- more rearrangements
-- submicroscopic structure
Gene overexpression → recurrent chromosomal aberration
present in > 50% prostate carcinomas
[Tomlins et al. 2005] Tomlins, S.A., et al., Recurrent fusion of
TMPRSS2 and ETS transcription factor genes in prostate
cancer. Science, 2005. 310(5748): p. 644-8.
Cancer Outlier Profile Analysis (COPA) using
Oncomine database reveals overexpression of
ETV1 and ERG
[Tomlins et al.]
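The outlier idea behind COPA can be sketched roughly as follows: each gene is median-centered, scaled by its median absolute deviation (MAD), and scored by an upper percentile of the transformed values, so a gene that is strongly overexpressed in only a subset of samples scores high. This is a minimal illustration, not the Oncomine implementation; the function name `copa_scores`, the gene labels, the toy values, and the choice of the 90th percentile are all assumptions.

```python
import math
from statistics import median

def copa_scores(expression, percentile=90):
    """Score each gene by an upper percentile of its median/MAD-scaled values.

    expression: dict mapping a gene label to a list of values, one per sample.
    """
    scores = {}
    for gene, values in expression.items():
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # flat profile: no scale to measure outliers against
        transformed = sorted((v - med) / mad for v in values)
        # nearest-rank percentile of the transformed values
        rank = max(1, math.ceil(percentile / 100 * len(transformed)))
        scores[gene] = transformed[rank - 1]
    return scores

# Hypothetical toy data: geneA is high in only two of ten samples.
toy_expression = {
    "geneA": [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 8.0, 9.0],
    "geneB": [2.0, 2.2, 1.8, 2.1, 1.9, 2.0, 2.3, 1.7, 2.4, 1.6],
}
scores = copa_scores(toy_expression)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Median and MAD are used instead of mean and standard deviation so that the outlier samples themselves do not inflate the scale they are measured against.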
Frequent gene amplifications and losses in
receptor tyrosine kinase-mediated signaling
ETV1
ERG
Recurrent TMPRSS2:ETV1 and
TMPRSS2:ERG fusions
revealed by the study of rearrangements
involving ETV1 and ERG
Expression of TMPRSS2 is regulated by androgen
[Tomlins et al.]
Exclusivity of rearrangement:
either ETV1 or ERG
[Tomlins et al.]
TMPRSS2 translocation associated with:
• Aggressive disease
Cancer Res 66:8347-51, 2006
• Reduced disease free survival
Cancer Biol Ther 6, 2007
• Higher rate of prostate cancer specific death
TMPRSS2:ERG gene fusion associated with lethal prostate cancer in a
watchful waiting cohort. Oncogene, 2007
Direct discovery of submicroscopic
chromosomal aberrations by end-sequence profiling
Detecting breakpoints / fusions by
end-sequence profiling of genomic DNA
fragments
Cancer chromosome
Paired-end shotgun
sequencing
Human Chr 20
Human Chr 3
Spectral Karyotyping (SKY) of
MCF-7 breast cancer cell line
Davidson et al (2000) Br J Cancer 83, 1309-17
• Near triploid
• Translocations involve all chromosomes except 4
Current model for origin of
rearrangements in breast cancer:
Breakage-Fusion-Bridge (BFB)
cycles initiated by “sticky” telomere
ends
Figure 10.14a The Biology of Cancer (© Garland Science 2007)
Figure 10.14b The Biology of Cancer (© Garland Science 2007)
Figure 10.14c The Biology of Cancer (© Garland Science 2007)
Genome instability occurs during transition
from hyperplasia to carcinoma in situ
End-sequence profiling of cancer
First genome-wide End-Sequence Profile of cancer:
MCF-7 breast cancer cell line (Volik et al, 2003 & 2006)
~20,000 BAC ends sequenced by Sanger method
~1X genome coverage
Left Tag
MCF-7 BAC (~150Kb)
Right Tag
chromosome 17
chromosome 20
Whole-genome BAC-end sequencing of MCF-7 (Volik
et al. 2006):
1) ~20,000 MCF-7 BACs end-sequenced
2) end-sequences mapped onto reference genome
Intrachromosomally
rearranged BACs
Interchromosomally
rearranged BACs
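Once both end-sequences of a clone are mapped, the classification step might look like the sketch below. The record layout (`EndHit`), the 250 Kbp span limit, and the requirement that the two ends face each other are assumptions made for illustration; they are not the criteria used by Volik et al.

```python
from typing import NamedTuple

class EndHit(NamedTuple):
    """Mapped position of one clone end on the reference genome (hypothetical layout)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"

def classify_clone(left: EndHit, right: EndHit, max_span: int = 250_000) -> str:
    """Label a clone from the reference positions of its two end-sequences."""
    if left.chrom != right.chrom:
        return "interchromosomal"
    first, second = sorted((left, right), key=lambda hit: hit.pos)
    span = second.pos - first.pos
    # In an unrearranged clone the two ends point toward each other.
    convergent = first.strand == "+" and second.strand == "-"
    if span > max_span or not convergent:
        return "intrachromosomal"
    return "concordant"

print(classify_clone(EndHit("chr17", 1_000_000, "+"), EndHit("chr20", 5_000_000, "-")))
print(classify_clone(EndHit("chr3", 1_000_000, "+"), EndHit("chr3", 1_150_000, "-")))
```

This sketch ignores clones whose ends map unusually close together, which a fuller analysis would also examine.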
Rearrangement-spanning MCF-7 BACs
[Figure labels: Chr 1, 2, 3, 4, 5, 6, …]
~ 600 BACs contain rearrangements
(Volik et al. 2006)
~ 2.5 % of the human genome
Down to the basepair level:
Sequencing of Rearrangement-spanning MCF-7 BACs
569 non-redundant rearranged BACs
Volik et al, 2003 & 2006
[Diagram: six 96-BAC pools (Pools 1 to 6), six fosmid libraries (Libraries A to F), and three 454 PyroSeq runs (Runs 1 to 3)]
8-10K fosmid clones selected from each library for end sequencing (Sanger)
Hampton OA et al.
Bridging (FES) and Outlining (454 PyroSeq)
BAC (134Kb)
Fosmids (40Kb)
PyroSeq
chromosome 20
PyroSeq
chromosome 3
PyroSeq
chromosome 17
PCR Validation Pipeline
and Genboree integration
Hampton OA et al.
157 PCR-confirmed somatic
breakpoint junctions
Hampton OA et al.
Genomic Aberrations in MCF-7
157 rearrangements
• detected in BACs
• PCR-validated on gDNA
83 Intrachromosomal
74 Interchromosomal
[Figure labels: chromosomes 1, 3, 17, 20]
Hampton OA et al.
A majority of dispersed breakpoints
fall within LCRs
Hampton OA et al.
Detection of Fusion Transcripts
RT-PCR to validate expression of predicted fusion transcripts
[Figure: genomic fusion of ATXN7 and RAD51C and the predicted fusion transcript (labels: promoter, Exon 6, Exon 7, Exon 13); RT-PCR panels for the fusion, RAD51C, and ATXN7, each with lanes MCF7, 10A, N]
Hampton OA et al.
Expression of predicted fusion transcripts
Validation by siRNA knockdown
Hampton OA et al.
Biological validation:
siRNA knock-down of SULF2 in 3 cell lines
[Figure panels: anchorage-independent growth, growth, survival]
Hampton OA et al.
Expression of predicted fusion transcripts
Hampton OA et al.
Two Mechanisms for Double-Strand Break Repair
NAHR:
Non-Allelic Homologous
Recombination
NHEJ:
Non-Homologous End-Joining
NAHR: Non-Allelic Homologous Recombination
Roles of RAD51 and RAD51C in HR
[Figure labels: RAD51, RAD51C]
Figure 12.32 The Biology of Cancer (© Garland Science 2007)
RAD51C is under-expressed in 51 out of 53 breast
cancer cell lines relative to normal breast tissue
[Figure: bar chart of RAD51C expression level (y-axis, roughly 5 to 10) by cell line (x-axis); MCF-7 and Normal Breast are labeled.
Cell lines: 600MPE, AU565, MCF7, MDAMB134, BT20, BT474, MDAMB157, MDAMB175, BT483, BT549, MDAMB231, MDAMB361, CAMA1, DU4475, MDAMB415, MDAMB435, HBL100, HCC38, MDAMB436, MDAMB453, HCC70, HCC202, HCC1007, MDAMB468, SKBR3, SUM44PE, HCC1008, HCC1143, SUM52PE, SUM149PT, HCC1187, HCC1428, SUM159PT, SUM185PE, HCC1500, HCC1569, SUM190PT, SUM225CWN, HCC1599, HCC1937, SUM1315, T47D, HCC1954, HCC2157, UACC812, ZR751, HCC2185, HCC3153, ZR7530, ZR75B, HS578T, LY2, MCF10A, MCF12A]
RAD51C under-expression is cancer specific,
not tissue specific
Does the RAD51C / ATXN7 fusion
• interfere with resolution of Holliday
junctions or
• otherwise affect HR
in a dominant negative fashion?
NHEJ:
Non-Homologous
End-Joining
Figure 12.33 The Biology of Cancer (© Garland Science 2007)
Pending publication in PNAS ?
Figure 12.34a The Biology of Cancer (© Garland Science 2007)
Back to technology:
ramping up
Coverage is proportional to insert size
[Figure: long inserts and short inserts, each tiling the genome at 2X]
coverage = L * N / G
L = insert size
N = number of inserts
G = genome size
Probability of breakpoint detection = 1 - e^(-coverage)
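A quick numerical check of the coverage and detection-probability formulas, as a minimal sketch. The function names are hypothetical, and the 3 Gbp genome size is a round-number assumption; the ~20,000 BACs of ~150 Kbp are the MCF-7 figures quoted earlier, which should give about 1X coverage.

```python
import math

GENOME_SIZE = 3_000_000_000  # assumed round figure for the human genome, in bp

def clone_coverage(insert_size, num_inserts, genome_size=GENOME_SIZE):
    """coverage = L * N / G"""
    return insert_size * num_inserts / genome_size

def detection_probability(coverage):
    """Probability that at least one insert spans a given breakpoint."""
    return 1.0 - math.exp(-coverage)

def inserts_needed(target_probability, insert_size, genome_size=GENOME_SIZE):
    """Smallest N whose coverage reaches the target detection probability."""
    required_coverage = -math.log(1.0 - target_probability)
    return math.ceil(required_coverage * genome_size / insert_size)

bac_coverage = clone_coverage(150_000, 20_000)
print(f"BAC coverage: {bac_coverage:.2f}X")
print(f"Detection probability: {detection_probability(bac_coverage):.3f}")
print(f"40 Kbp inserts needed for 95% detection: {inserts_needed(0.95, 40_000)}")
```

At 1X coverage only about 63% of breakpoints are expected to be spanned by at least one insert, which is one way to see why ramping up the number of inserts matters.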
Massively parallel paired-end sequencing
fragment size | platform | run ~ cost unit
200bp | Illumina | > 50M reads per run ( 8 lanes )
3 Kbp | Illumina, SOLiD | > 50M reads per run
20 Kbp | Roche-454 | < 1M reads per run
40 Kbp | diTag Method | > 50M reads (54bp diTags) per run
Illumina
[Figure: error rate (%) versus sequencing cycle (0 to 70) for the left tag and right tag; series: 75 fragment, 75 paired end, 35 fragment, 45 paired end]
BLAST hits
using diTag as query
Platform-independent
end-sequencing
Paired-end Method X, Paired-end Method Y, Paired-end Method Z
Vendor X, Vendor Y, Vendor Z
Modular paired-end method
$1M genome
$100 genome
Effective coverage is reduced when
cell population is heterogeneous
Effective coverage
= Coverage * Fraction of tumor cells
with rearrangement
80% tumor cells, 20% non-tumor cells: Effective coverage = Coverage * 80%
20% tumor cells, 80% non-tumor cells: Effective coverage = Coverage * 20%
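To see how much tumor purity matters, the two cases above can be run through the detection-probability formula. This is a minimal sketch: the function name and the nominal 2X coverage are assumptions chosen for illustration.

```python
import math

def effective_coverage(coverage, tumor_fraction):
    """Scale coverage by the fraction of cells that carry the rearrangement."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    return coverage * tumor_fraction

nominal_coverage = 2.0  # assumed nominal coverage, for illustration only
results = {}
for tumor_fraction in (0.8, 0.2):
    eff = effective_coverage(nominal_coverage, tumor_fraction)
    results[tumor_fraction] = 1.0 - math.exp(-eff)
    print(f"{tumor_fraction:.0%} tumor cells: effective coverage {eff:.1f}X, "
          f"detection probability {results[tumor_fraction]:.2f}")
```

Under these assumptions the detection probability falls from roughly 0.80 to roughly 0.33 when the tumor fraction drops from 80% to 20%.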
From the perspective of an LCR breakpoint
insert size is effectively reduced by LCR size
[Figure: inserts spanning a breakpoint within an LCR, showing the “wiggle room”]
Probability of breakpoint detection = 1 - e^(-effective coverage)
effective coverage = W * N / G
W = insert size – LCR size (“wiggle room”)
N = number of inserts
G = genome size
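The effect of the “wiggle room” can be sketched numerically. The function name, the 10 Kbp LCR, the 150,000 fosmid-sized inserts, and the 3 Gbp genome are all assumed values for illustration.

```python
import math

def lcr_detection_probability(insert_size, lcr_size, num_inserts, genome_size):
    """Detection probability for a breakpoint that lies within an LCR."""
    wiggle_room = max(insert_size - lcr_size, 0)  # W cannot be negative
    eff_coverage = wiggle_room * num_inserts / genome_size
    return 1.0 - math.exp(-eff_coverage)

GENOME = 3_000_000_000  # assumed round figure, in bp
no_lcr = lcr_detection_probability(40_000, 0, 150_000, GENOME)
with_lcr = lcr_detection_probability(40_000, 10_000, 150_000, GENOME)
too_long = lcr_detection_probability(40_000, 50_000, 150_000, GENOME)
print(f"no LCR: {no_lcr:.3f}, 10 Kbp LCR: {with_lcr:.3f}, 50 Kbp LCR: {too_long:.3f}")
```

When the LCR is at least as long as the insert, W is zero and the breakpoint cannot be detected at any depth, which is an argument for including longer inserts.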
Breakpoints detected by 54bp diTag sequencing
~ 0.5 Mbp deletion
Roche-454 and Illumina diTag mappings
are consistent with fosmid insert size
Roche-454
Illumina
Genboree pipelines for genome
mapping: paired-end and array CGH
Laboratory exercise this week: array CGH
Analysis of array CGH data from a set of tumor
samples using Genboree
– Upload array CGH data
– Perform segmentation (invoke Bioconductor
tool)
– Subtract polymorphisms (databases, current
literature)
– Identify recurrent amplifications or deletions
– Study correlation with gene expression
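The last two steps of the exercise can be pictured with a small stand-alone sketch. It is not the Genboree pipeline or the Bioconductor tool: the function names, region labels, log2-ratio thresholds of ±0.3, the 50% recurrence cutoff, and the toy values are all assumptions for illustration.

```python
import math

def recurrent_regions(segment_means, gain=0.3, loss=-0.3, min_fraction=0.5):
    """Call regions gained or lost in at least min_fraction of samples.

    segment_means: dict mapping a region label to segmented log2 ratios,
    one value per tumor sample.
    """
    calls = {}
    for region, ratios in segment_means.items():
        n = len(ratios)
        if sum(r >= gain for r in ratios) / n >= min_fraction:
            calls[region] = "amplification"
        elif sum(r <= loss for r in ratios) / n >= min_fraction:
            calls[region] = "deletion"
    return calls

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical segmented log2 ratios for four tumor samples.
toy_segments = {
    "regionA": [0.8, 0.6, 0.1, 0.9],
    "regionB": [-0.5, -0.6, 0.0, -0.4],
    "regionC": [0.0, 0.1, -0.1, 0.5],
}
# Hypothetical expression of a gene located in regionA, same sample order.
toy_expression = [2.0, 1.5, 0.2, 2.2]

calls = recurrent_regions(toy_segments)
r = pearson(toy_segments["regionA"], toy_expression)
print(calls, round(r, 3))
```

A gene whose expression tracks copy number in a recurrently amplified region is the kind of candidate the ERBB2 paradigm points to.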