Genomic DNA Variation
Computer-Aided Discovery Methods
Baylor College of Medicine, course 311-405, Term 3, 2008/2009
Lecture on Wednesday, January 28th, 2009
Aleksandar Milosavljevic, PhD
http://www.brl.bcm.tmc.edu

Entering Segment 2
Segment 1 (3 weeks): Cancer Lectures (1,2,3); Lab: Genboree, Ruby
Segment 2 (4 weeks): Bringing it together: Lecture + Lab
Segment 3: Review lectures

Background reading
A broad-brush survey of trends:
CREATIVITY SUPPORT TOOLS: Accelerating Discovery and Innovation. Ben Shneiderman
A bit of history and pointers to philosophy (Karl Popper, C.S. Peirce):
THINKING WITH MACHINES: Intelligence Augmentation, Evolutionary Epistemology, and Semiotic. Peter Skagestad

Cancer Genome Variation: Methods
Recent landmark studies (not covered this year):
• The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008 Sep 4. [Epub ahead of print].
• Parsons DW et al. An integrated genomic analysis of human glioblastoma multiforme. Science. 2008 Sep 26;321(5897):1807-12. Epub 2008 Sep 4.
• Jones S et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science. 2008 Sep 26;321(5897):1801-6. Epub 2008 Sep 4.

Cancer Genome Variation: Methods
Lab focus (Friday)
Lecture focus today

Chromosome Aberrations: References 1 of 2
Background (optional)
[Balmain 2001] Balmain, A., Cancer genetics: from Boveri and Mendel to microarrays. Nat Rev Cancer, 2001. 1(1): p. 77-82.
[Albertson et al. 2003] Albertson, D.G., et al., Chromosome aberrations in solid tumors. Nat Genet, 2003. 34(4): p. 369-76.
[Rabbitts et al. 2003] Rabbitts, T.H. and M.R. Stocks, Chromosomal translocation products engender new intracellular therapeutic technologies. Nat Med, 2003. 9(4): p. 383-6.

Chromosome Aberrations: References 2 of 2
Breast cancer: copy number variation, array CGH and gene expression
[Chin K. et al. 2006] Chin K et al.
Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell 10:529-541 2006

Prostate cancer: aberrant fusions, via gene expression
[Tomlins et al. 2005] Tomlins SA et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.

Breast cancer: direct detection of aberrant fusions by end-sequence profiling
[Hampton OA et al] Hampton, OA et al, A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome. Genome Research. 2008 Dec 9. [Epub ahead of print]

Boveri, one century ago …
Multiple cell poles cause unequal segregation of chromosomes.
a | Fertilization of sea-urchin eggs by two sperm results in multiple cell poles.
b | Chromosomes are aberrantly segregated.
[Balmain 2001]

Chromosomal aberrations
[Albertson et al.]

Cancer Genome Variation: Methods
(Array) Comparative Genome Hybridization (array CGH)
Chin K. et al., Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, Cancer Cell 10:529-541 2006.
• 100+ aggressively treated early stage breast tumors, 1989-1997, before the ERBB2 antagonist Trastuzumab (Herceptin) was approved for treating ERBB2+ breast cancer

ERBB2 heuristic (“paradigm”) formulated in the last sentence of Chin K. et al.:
“Taking ERBB2 as the paradigm (recurrently amplified, overexpressed, associated with outcome and with demonstrated functional importance in cancer) suggests FGFR1, TACC1, ADAM9, IKBKB, PNMT, and GRB7 as high-priority therapeutic targets in these regions of amplification.”

“Taking ERBB2 as the paradigm (recurrently amplified, overexpressed…”
• Array CGH (~3K BAC array)
• Gene expression (Affymetrix U133A array)

“Taking ERBB2 as the paradigm (recurrently amplified…”

“Taking ERBB2 as the paradigm (… associated with outcome…)”

Mapping rearrangements (aberrant fusions) using paired ends
Deletions, amplifications induce aberrant fusions
…but…
Some aberrant fusion-producing rearrangements (reciprocal translocations, inversions) may not affect copy number

Two significant types of aberrant fusions
• aberrantly amplified expression
• aberrant activation of signaling protein
[Rabbitts et al.]

BCR-ABL fusion in Chronic Myeloid Leukaemia: four decades from lesion discovery to Imatinib (Gleevec)
1960: Philadelphia chromosome discovered
1973: Chromosome translocation t(9;22) identified
1983: Activated oncogene ABL identified
2001: Drug inhibiting BCR-ABL fusion identified

Fourfold significance of recurrent chromosomal aberrations
• Prognostic marker
• Drug target
• Pointing to biological pathway
• Early diagnostic marker

Two case studies of fusion discovery

Case Study: Prostate Cancer
Overexpression -> recurrent chromosomal aberration
[Tomlins et al. 2005] Tomlins, S.A., et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.

Case Study: Breast cancer
Direct discovery of submicroscopic chromosomal aberrations
[Hampton OA et al] Hampton, OA et al. A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome. Genome Research. 2008 Dec 9.
[Epub ahead of print]

Case Study: Prostate Cancer
Recurrent (> 50% of cases) chromosomal aberrations discovered in leukaemias, lymphomas, and sarcomas
Carcinomas more complex:
-- more rearrangements
-- submicroscopic structure
Gene overexpression -> recurrent chromosomal aberration present in > 50% of prostate carcinomas
[Tomlins et al. 2005] Tomlins, S.A., et al., Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 2005. 310(5748): p. 644-8.

Cancer Outlier Profile Analysis (COPA) using Oncomine database reveals overexpression of ETV1 and ERG
[Tomlins et al.]

Frequent gene amplifications and losses in receptor tyrosine kinase-mediated signaling
[Figure labels: ETV1, ERG]

Recurrent TMPRSS2:ETV1 and TMPRSS2:ERG fusions revealed by the study of rearrangements involving ETV1 and ERG
Expression of TMPRSS2 is regulated by androgen
[Tomlins et al.]

Exclusivity of rearrangement: either ETV1 or ERG
[Tomlins et al.]

TMPRSS2 translocation associated with:
• Aggressive disease. Cancer Res 66:8347-51, 2006
• Reduced disease-free survival. Cancer Biol Ther 6, 2007
• Higher rate of prostate cancer-specific death. TMPRSS2:ERG gene fusion associated with lethal prostate cancer in a watchful waiting cohort.
Oncogene, 2007

Direct discovery of submicroscopic chromosomal aberrations by end-sequence profiling

Detecting breakpoints / fusions by end-sequence profiling of genomic DNA fragments
[Figure labels: Cancer chromosome; Paired-end shotgun sequencing; Human Chr 20; Human Chr 3]

Spectral Karyotyping (SKY) of MCF-7 breast cancer cell line
Davidson et al (2000) Br J Cancer 83, 1309-17
• Near triploid
• Translocations involve all chromosomes except 4

Current model for origin of rearrangements in breast cancer:
Breakage-Fusion-Bridge (BFB) cycles initiated by “sticky” telomere ends
Figures 10.14a, 10.14b, 10.14c The Biology of Cancer (© Garland Science 2007)
Genome instability occurs during transition from hyperplasia to carcinoma in situ

End-sequence profiling of cancer
First genome-wide End-Sequence Profile of cancer: MCF-7 breast cancer cell line (Volik et al, 2003 & 2006)
• ~20,000 BAC ends sequenced by Sanger method
• ~1X genome coverage
[Figure labels: Left Tag; MCF-7 BAC (~150Kb); Right Tag; chromosome 17; chromosome 20]

Whole-genome BAC-end sequencing of MCF-7 (Volik et al. 2006):
1) ~20,000 MCF-7 BACs end-sequenced
2) end-sequences mapped onto reference genome
• Intrachromosomally rearranged BACs
• Interchromosomally rearranged BACs

Rearrangement-spanning MCF-7 BACs
[Figure axis: Chr 1 2 3 4 5 6 ……..]
~ 600 BACs contain rearrangements (Volik et al. 2006)
~ 2.5 % of the human genome

Down to the basepair level: Sequencing of Rearrangement-spanning MCF-7 BACs
569 non-redundant rearranged BACs (Volik et al, 2003 & 2006)
454 PyroSeq Run 1, Run 2, Run 3
96-BAC Pool 1: Fosmid Library A
96-BAC Pool 2: Fosmid Library B
96-BAC Pool 3: Fosmid Library C
96-BAC Pool 4: Fosmid Library D
96-BAC Pool 5: Fosmid Library E
96-BAC Pool 6: Fosmid Library F
Hampton OA et al.
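
The sorting of mapped BAC end pairs into concordant, intrachromosomally rearranged, and interchromosomally rearranged clones can be sketched in a few lines. This is only an illustrative sketch, not the pipeline used by Volik et al. or Hampton et al.: the `EndHit` record, the `classify_clone` function, the orientation convention, and the 50-250 Kb span window around the ~150 Kb BAC size are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndHit:
    """Hypothetical record for one mapped end sequence (left or right tag)."""
    chrom: str   # reference chromosome the end maps to
    pos: int     # mapped position on that chromosome
    strand: str  # '+' or '-'


def classify_clone(left, right, min_span=50_000, max_span=250_000):
    """Classify a clone from the reference mappings of its two end sequences.

    min_span and max_span are assumed thresholds bracketing a ~150 Kb BAC.
    """
    # Ends on different chromosomes: the clone spans an interchromosomal fusion.
    if left.chrom != right.chrom:
        return "interchromosomal"

    first, second = sorted((left, right), key=lambda hit: hit.pos)
    span = second.pos - first.pos

    # Assumed convention: concordant ends face each other ('+' then '-').
    if first.strand != "+" or second.strand != "-":
        return "intrachromosomal"

    # Ends too close or too far apart for one clone: deletion, insertion, etc.
    if not (min_span <= span <= max_span):
        return "intrachromosomal"

    return "concordant"
```

A clone whose tags map to chromosome 17 and chromosome 20, as in the MCF-7 example above, would be returned as "interchromosomal"; real pipelines also have to handle repeats and ambiguous mappings, which this sketch ignores.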
8-10K Fosmid clones selected from each library for end sequencing (Sanger)

Bridging (FES) and Outlining (454 PyroSeq)
[Figure labels: BAC (134Kb); Fosmids (40Kb); PyroSeq chromosome 20; PyroSeq chromosome 3; PyroSeq chromosome 17]

PCR Validation Pipeline and Genboree integration
Hampton OA et al.

157 PCR-confirmed somatic breakpoint junctions
Hampton OA et al.

Genomic Aberrations in MCF-7
157 rearrangements
• detected in BACs
• PCR-validated on gDNA
83 Intrachromosomal
74 Interchromosomal
[Figure labels: chromosomes 1, 3, 17, 20]
Hampton OA et al.

A majority of dispersed breakpoints fall within LCRs
Hampton OA et al.

Detection of Fusion Transcripts
RT-PCR to validate expression of predicted fusion transcripts
[Figure: genomic fusion and predicted transcript involving ATXN7 and RAD51C (labels: promoter, Exon 7, Exon 6, Exon 13); RT-PCR lanes MCF7, 10A, N for Fusion, RAD51C, and ATXN7]
Hampton OA et al.

Expression of predicted fusion transcripts
Validation by siRNA knockdown
Hampton OA et al.

Biological validation: siRNA knock-down of SULF2 in 3 cell lines
• anchorage-independent growth
• growth
• survival
Hampton OA et al.
Two Mechanisms for Double-Strand Break Repair
• NAHR: Non-Allelic Homologous Recombination
• NHEJ: Non-Homologous End-Joining

NAHR: Non-Allelic Homologous Recombination
Roles of RAD51 and RAD51C in HR
Figure 12.32 The Biology of Cancer (© Garland Science 2007)

RAD51C is under-expressed in 51 out of 53 breast cancer cell lines relative to normal breast tissue
[Figure: expression level (axis 5 to 10) by cell line, with MCF-7 and Normal Breast marked; row 2: MCF10A, MCF12A]
RAD51C under-expression is cancer specific, not tissue specific

Does the RAD51C / ATXN7 fusion
• interfere with resolution of Holliday junctions, or
• otherwise affect HR in a dominant negative fashion?

NHEJ: Non-Homologous End-Joining
Figure 12.33 The Biology of Cancer (© Garland Science 2007)
Pending publication in PNAS ?
Figure 12.34a The Biology of Cancer (© Garland Science 2007)

Back to technology: ramping up

Coverage is proportional to insert size
[Figure: long inserts and short inserts, labelled 2X]
coverage = L * N / G
  L = insert size
  N = number of inserts
  G = genome size

Probability of breakpoint detection = 1 - e^(-coverage)

Massively parallel paired-end sequencing
Fragment size   Method            Run (~ cost unit)
200 bp          Illumina          > 50M reads per run (8 lanes)
3 Kbp           Illumina, SOLiD   > 50M reads per run
20 Kbp          Roche-454         < 1M reads per run
40 Kbp          diTag             > 50M reads (54bp diTags) per run

[Figure: Illumina error rate (%) by cycle (0 to 70); series: 75 fragment, 75 paired end, 35 fragment, 45 paired end]
[Figure: Left Tag and Right Tag; BLAST hits using diTag as query]

Platform-independent end-sequencing
Paired-end Method X (Vendor X), Paired-end Method Y (Vendor Y), Paired-end Method Z (Vendor Z)
Modular paired-end method
$1M genome; $100 genome

Effective coverage is reduced when cell population is heterogeneous
Effective coverage = Coverage * Fraction of tumor cells with rearrangement
• 80% tumor cells, 20% non-tumor cells: Effective coverage = Coverage * 80%
• 20% tumor cells, 80% non-tumor cells: Effective coverage = Coverage * 20%

From the perspective of an LCR breakpoint, insert size is effectively reduced by LCR size
[Figure labels: inserts; LCR; breakpoint; “wiggle room”]
Probability of breakpoint detection = 1 - e^(-effective coverage)
effective coverage = W * N / G
  W = insert size - LCR size (“wiggle room”)
  N = number of inserts
  G = genome size

Breakpoints detected by 54bp diTag sequencing
~ 0.5 Mbp deletion

Roche-454 and Illumina diTag mappings are consistent with fosmid insert size
[Figure panels: Roche-454; Illumina]

Genboree pipelines for genome mapping: paired-end and array CGH

Laboratory exercise this week: array CGH
Analysis of array CGH data from a set of tumor samples using Genboree
- Upload array CGH data
- Perform segmentation (invoke Bioconductor tool)
- Subtract polymorphisms (databases, current literature)
- Identify recurrent amplifications or deletions
- Study correlation with gene expression
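
The coverage and breakpoint-detection formulas from the "Back to technology" slides are easy to try numerically. The sketch below simply transcribes them; the function names are hypothetical, and combining the LCR "wiggle room" and the tumor-cell fraction in a single `effective_coverage` function is an assumption made for convenience, since the slides present the two corrections separately.

```python
import math


def coverage(insert_size, n_inserts, genome_size):
    """coverage = L * N / G, as defined on the slides."""
    return insert_size * n_inserts / genome_size


def detection_probability(cov):
    """Probability of breakpoint detection = 1 - e^(-coverage)."""
    return 1.0 - math.exp(-cov)


def effective_coverage(insert_size, n_inserts, genome_size,
                       lcr_size=0, tumor_fraction=1.0):
    """Coverage corrected for LCR size and for sample heterogeneity.

    W = insert size - LCR size ("wiggle room"); the result is then scaled by
    the fraction of cells carrying the rearrangement. Applying both
    corrections together is an assumption of this sketch.
    """
    wiggle_room = max(insert_size - lcr_size, 0)
    return wiggle_room * n_inserts / genome_size * tumor_fraction
```

For example, ~20,000 BAC inserts of ~150 Kb against an assumed genome size of 3 Gbp give a coverage of 1.0, so a given breakpoint is expected to be spanned with probability 1 - 1/e, about 0.63. Lowering the tumor fraction or enlarging the LCR reduces that probability accordingly.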