CSCI2950-C Lecture 11
Cancer Genomics: Duplications
October 23, 2008
http://cs.brown.edu/courses/csci2950-c/

Outline
• Cancer Genomes
  1. Comparative Genomic Hybridization
• Cancer Progression Models

DNA Microarrays

Measuring Mutations in Cancer

Comparative Genomic Hybridization (CGH)

CGH Analysis (1)
• Divide genome into segments of equal copy number
[Figure: Log2(R/G) plotted against genomic position, vertical axis from -0.5 to 0.5, with Deletion and Amplification segments marked.]

HMM Model for CGH Data
Fridlyand et al. (2004)
[Figure: HMM with states S1, S2, S3, S4]

A model for CGH data
• K states, one per copy-number level
• Emissions: Gaussians, with parameters ($\mu_i$, $\sigma_i$) for state $S_i$, i = 1, …, 4
• S1, S2: Homozygous Deletion (copy = 0) and Heterozygous Deletion (copy = 1)
• S3: Normal (copy = 2)
• S4: Duplication (copy > 2)
[Figure: copy number plotted against genome coordinate]

CGH Segmentation: Model Selection
How many copy-number states K?
Larger K:
1. Better fit to observed data
2. More parameters to estimate
Avoid overfitting by model selection.
• Let $\theta = (A, B, \pi)$ be the parameters of the HMM.
• Try different k = 1, …, $K_{max}$.
• Compute $L(\theta \mid O)$ by dynamic programming (forward-backward algorithm).
• Calculate: $\psi(k) = -\log L(\theta \mid O) + q_k D(N)/N$
  N = number of probes (data points)
  $q_k$ = number of parameters
  D(N) = 2 (AIC) or D(N) = log(N) (BIC)
• Choose $K = \arg\min_k \psi(k)$

Problems with HMM model
The length of the sequence emitted from a fixed state is geometrically distributed:
$P(j\, j \cdots j) = P(s_{t+1} = j \mid s_t = j)^n$ for a run with n self-transitions in state j, where $s_t$ denotes the hidden state at position t.
For CGH this means that
1) the length of aberrant intervals, and
2) the separation between two intervals of the same copy number
will be geometrically distributed.

CGH Segmentation: Transitions
Let $l_X$ = length of the sequence emitted in state X, where X has self-transition probability p.
• $P[l_X = 1] = 1-p$
• $P[l_X = 2] = p(1-p)$
• …
• $P[l_X = k] = p^{k-1}(1-p)$
• $E[l_X] = 1/(1-p)$
• Geometric distribution, with mean 1/(1-p)
[Figure: two-state chain; X stays in X with probability p and moves to Y with probability 1-p; Y stays in Y with probability q and moves to X with probability 1-q.]

CGH Analysis (2)
Chromosome 3 of 26 lung tumor samples on a mid-density cDNA array. A common deletion is located in 3p21 and a common amplification in 3q.
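The geometric run-length claim from the transitions slide can be checked numerically. The sketch below is illustrative only: it takes $P[l_X = 1] = 1-p$ and $P[l_X = 2] = p(1-p)$ as given and assumes the general term is $p^{k-1}(1-p)$, that is, k - 1 self-transitions followed by one exit. The function names and the value p = 0.9 are hypothetical, not taken from the lecture.

```python
def dwell_pmf(p: float, k: int) -> float:
    """Probability that a run in state X has length exactly k.

    Assumes a run of length k means k - 1 self-transitions (each with
    probability p) followed by one exit (probability 1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if k < 1:
        raise ValueError("run length k must be at least 1")
    return p ** (k - 1) * (1.0 - p)


def truncated_mean(p: float, k_max: int) -> float:
    """Approximate E[l_X] by summing k * P[l_X = k] for k = 1..k_max."""
    return sum(k * dwell_pmf(p, k) for k in range(1, k_max + 1))


def closed_form_mean(p: float) -> float:
    """Closed-form mean of the geometric run length, 1 / (1 - p)."""
    return 1.0 / (1.0 - p)


p_stay = 0.9  # hypothetical self-transition probability
# The truncated sum should agree closely with the closed form.
print(truncated_mean(p_stay, 2000), closed_form_mean(p_stay))
```

Because the run-length distribution is fixed by the single parameter p, the model cannot represent, for example, aberrant intervals that are typically long but rarely very short; this is the limitation the slide points to.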
• Identify aberrations common to multiple samples
[Figure: chromosome 3 profiles, one row per sample (26 samples, 2001T-1 through 2099T-1); horizontal axis from 0 to 180. Ben-Dor et al.]

Results
[Figure: Intervals]

Stacks and Footprints

Results (Diskin et al.)
[Figure: Frequency]

Results (Diskin et al.)
[Figure: Stacks]

Cancer Genomes
[Figure panels: Leukemia, Breast]

Cancer: Mutation and Selection
Clonal theory of cancer: Nowell (Science 1976)

“Comparative Genomics” of Cancer
[Figure: Human genome → (mutation, selection) → Tumor genome, Tumor genome 2, Tumor genome 3, Tumor genome 4]
1) Identify recurrent aberrations
   • Mitelman Database, >40,000 aberrations
2) Reconstruct temporal sequence of aberrations
   • Linear model: Colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q
   • Tree model: (Desper et al. 1999)
3) Find age of tumor, time of clonal expansion

Observing Cancer Progression
• Obtaining longitudinal (time-course) data is difficult.
  [Figure: a single tumor observed at times t1, t2, t3, t4]
• Latitudinal data (multiple patients) are readily available.
  [Figure: Human genome → (mutation, selection) → Tumor genome, Tumor genome 2, Tumor genome 3, Tumor genome 4]

Multiple Mutations
• 4-step model for colorectal cancer, Vogelstein et al. (1988), New Eng. J. Med.: -5q → 12p* → -17p → -18q
• Inferred from latitudinal data in 172 tumor samples.

Oncogenetic Tree models (Desper et al. JCB 1999, 2000)
• Given: measurements of chromosome gain/loss events in multiple tumor samples (CGH)
  Example samples: {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}
• Compute: rooted tree that best explains the temporal sequence of events.

Oncogenetic Tree models (Desper et al.
JCB 1999, 2000)
• Given: measurements of chromosome gain/loss events in multiple tumor samples
  Example samples: {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}
• L = set of chromosome alterations observed in all samples
• Tumor samples give a probability distribution on $2^L$

Oncogenetic Tree
T = (V, E, r, p, L), a rooted tree
• V = vertices
• E = edges
• L = set of events (leaves)
• r = root
• p: E → (0,1], probability distribution
T gives a probability distribution on $2^L$
[Figure: tree with edges e0, e1, e2, e3, e4]

Results
• CGH of 117 cases of kidney cancer

Extensions
• Oncogenetic trees based on branching (Desper et al., JCB 1999)
• Maximum Likelihood Estimation (von Heydebreck et al., 2004)
• Mutagenic trees: mixtures of trees (Beerenwinkel et al., JCB 2005)

Heterogeneity within a tumor
• The final tumor is a clonal expansion of a single cell lineage.
• Can we date the time of clonal expansion?
Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal tumor histories, PNAS 2000.

Estimating time of clonal expansion
• Microsatellite loci (MS), CA dinucleotides.
• In tumors with loss of mismatch repair (e.g. colorectal), MS change size.
• For each MS locus i, measure the mean $m_i$ and variance $s_i^2$ of size.
• $S^2_{allele}$ = average of $s_1^2, …, s_L^2$
• $S^2_{loci}$ = variance of $m_1, …, m_L$

Time to clonal expansion?

Simulation Estimates of Tumor Age
[Figure: timeline divided into intervals Y1 and Y2]
• Y1 = time to clonal expansion
• Tumor age = Y1 + Y2
• Branching process simulation. Each cell in the population gives birth to 0, 1 or 2 daughter cells, with a ±1 change in MS size (coalescent: forward, backward, forward simulation).
• Posterior estimate of Y1, Y2 by running simulations and accepting runs whose simulated values of $S^2_{allele}$, $S^2_{loci}$ are close to the observed values.

Results
• 15 patients, 25 MS loci
• Estimate time since clonal expansion from observed $S^2_{allele}$, $S^2_{loci}$.

Cancer: Mutation and Selection
Clonal theory of cancer: Nowell (Science 1976)

Sources
• Fridlyand, et al.
Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 2004.
• Desper, et al. Distance-Based Reconstruction of Tree Models for Oncogenesis. Journal of Computational Biology, 2000.
• Diskin, et al. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Research, 2006.
• Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal tumor histories. PNAS, 2000.
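
Returning to the oncogenetic tree T = (V, E, r, p, L) defined earlier, the statement that T gives a probability distribution on $2^L$ can be made concrete with a small enumeration. This is a minimal sketch under stated assumptions, not the estimation procedure of Desper et al.: it assumes each edge e occurs independently with probability p(e), and that an event is observed exactly when its leaf is reachable from the root through occurring edges. The tree shape, the edge probabilities, and the function name `leaf_set_distribution` are hypothetical.

```python
from itertools import product


def leaf_set_distribution(edges, root):
    """Return {frozenset of observed leaf events: probability}.

    edges: list of (parent, child, prob) triples describing a rooted tree.
    Assumption: edges occur independently; a leaf event is observed when
    every edge on the path from the root to that leaf occurs.
    """
    internal = {parent for parent, _, _ in edges}  # vertices with children
    dist = {}
    # Enumerate every on/off pattern of the edges (2^|E| patterns).
    for pattern in product([False, True], repeat=len(edges)):
        weight = 1.0
        active = {}  # parent -> children reached by occurring edges
        for (parent, child, prob), on in zip(edges, pattern):
            weight *= prob if on else 1.0 - prob
            if on:
                active.setdefault(parent, []).append(child)
        # Walk from the root along occurring edges and collect leaves.
        observed = set()
        stack = [root]
        while stack:
            vertex = stack.pop()
            for child in active.get(vertex, []):
                if child in internal:
                    stack.append(child)
                else:
                    observed.add(child)
        key = frozenset(observed)
        dist[key] = dist.get(key, 0.0) + weight
    return dist


# Hypothetical tree: r -> u -> {-8p, +1q} and r -> +Xq.
example_edges = [
    ("r", "u", 0.8),
    ("u", "-8p", 0.5),
    ("u", "+1q", 0.25),
    ("r", "+Xq", 0.4),
]
example_dist = leaf_set_distribution(example_edges, "r")
for events, prob in sorted(example_dist.items(), key=lambda kv: -kv[1]):
    print(sorted(events), round(prob, 4))
```

Exhaustive enumeration is only practical for small trees, since the number of edge patterns grows as 2^|E|; it is used here because it keeps the sketch exact and deterministic.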