CSCI2950-C
Lecture 11
Cancer Genomics: Duplications
October 23, 2008
http://cs.brown.edu/courses/csci2950-c/
Outline
• Cancer Genomes
1. Comparative Genomic Hybridization
• Cancer Progression Models
DNA Microarrays
Measuring Mutations in Cancer
Comparative Genomic Hybridization (CGH)
CGH Analysis (1)
• Divide genome into segments of equal copy number
[Figure: Log2(R/G) plotted against genomic position, with Deletion and Amplification segments marked]
HMM Model for CGH data
Fridlyand et al. (2004)
[Figure: HMM with states S1, S2, S3, S4]
A model for CGH data
K states ↔ copy numbers
Emissions: Gaussians, with parameters ($\mu_1, \sigma_1$), ($\mu_2, \sigma_2$), ($\mu_3, \sigma_3$), ($\mu_4, \sigma_4$)
• S1, S2: Heterozygous Deletion (copy = 1), Homozygous Deletion (copy = 0)
• S3: Normal (copy = 2)
• S4: Duplication (copy > 2)
[Figure: copy number plotted against genome coordinate]
CGH Segmentation: Model Selection
How many copy number states K?
Larger K:
1. Better fit to observed data
2. More parameters to estimate
Avoid overfitting by model selection.
Let $\lambda = (A, B, \pi)$ be parameters for HMM.
Try different $k = 1, \dots, K_{max}$
Compute $L(\lambda \mid O)$ by dynamic programming (forward-backward algorithm)
Calculate:
$\psi(k) = -\log L(\lambda \mid O) + q_k D(N)/N$
N = number of probes (data points)
$q_k$ = number of parameters
D(N) = 2 (AIC) or D(N) = log(N) (BIC)
Choose $K = \arg\min_k \psi(k)$
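The selection step can be sketched as follows, assuming the log-likelihood for each candidate k has already been computed. The function name and the dictionary inputs are hypothetical; the score follows the penalized form given on the slide (negative log-likelihood plus number of parameters times D(N)/N).

```python
import math

def select_num_states(loglik_by_k, n_params_by_k, n_probes, criterion="BIC"):
    """Pick the number of states k that minimizes the penalized score.

    loglik_by_k   : dict k -> log L(lambda | O) for the fitted k-state model
    n_params_by_k : dict k -> number of free parameters q_k
    n_probes      : N, the number of data points
    criterion     : "AIC" (D(N) = 2) or "BIC" (D(N) = log N)
    """
    if criterion == "AIC":
        d = 2.0
    elif criterion == "BIC":
        d = math.log(n_probes)
    else:
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    scores = {
        k: -loglik_by_k[k] + n_params_by_k[k] * d / n_probes
        for k in loglik_by_k
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

The inputs would come from fitting one HMM per candidate k; the parameter counts depend on how the transition matrix and emissions are parameterized.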
Problems with HMM model
Length of sequence emitted from a fixed state is geometrically distributed.
$P(\pi_{t+1} = j, \dots, \pi_{t+n} = j \mid \pi_t = j) = P(\pi_{t+1} = j \mid \pi_t = j)^n$
For CGH this means that
1) the length of aberrant intervals and
2) the separation between two intervals of the same copy number
will be geometrically distributed.
CGH Segmentation: Transitions
Let $l_X$ = length of sequence in state X.
• $P[l_X = 1] = 1-p$
• $P[l_X = 2] = p(1-p)$
• …
• $P[l_X = k] = p^{k-1}(1-p)$
• $E[l_X] = 1/(1-p)$
• Geometric distribution, with mean $1/(1-p)$
[Figure: two-state chain with states X and Y; transition labels p, 1-p, q, 1-q]
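A small numerical check of the run-length distribution, assuming a self-transition probability p for state X so that a run of length k has probability p^(k-1)(1-p). The function names are illustrative.

```python
def run_length_pmf(k, p):
    """P[l_X = k]: stay in X for k-1 further steps, then leave."""
    return p ** (k - 1) * (1 - p)

def expected_run_length(p, k_max=10_000):
    """Truncated sum of k * P[l_X = k]; approaches 1 / (1 - p) as k_max grows."""
    return sum(k * run_length_pmf(k, p) for k in range(1, k_max + 1))
```

For p = 0.9 the truncated sum is numerically indistinguishable from the closed-form mean 1/(1-p) = 10, so long aberrant intervals require a self-transition probability very close to 1.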
CGH Analysis (2)
• Identify aberrations common to multiple samples
[Figure: Chromosome 3 of 26 lung tumor samples on mid-density cDNA array. Common deletion located in 3p21 and common amplification in 3q. Rows: samples 2001T-1 through 2099T-1; horizontal axis: 0 to 180.]
Ben-Dor et al. Results
Intervals
Stacks and Footprints
Results (Diskin et al.)
Frequency
Results (Diskin et al.)
Stacks
Cancer Genomes
Leukemia
Breast
Cancer: Mutation and Selection
Clonal theory of cancer: Nowell (Science 1976)
“Comparative Genomics” of Cancer
[Figure: Human genome → (mutation, selection) → Tumor genome, Tumor genome 2, Tumor genome 3, Tumor genome 4]
1) Identify recurrent aberrations
• Mitelman Database, >40,000 aberrations
2) Reconstruct temporal sequence of aberrations
• Linear model: Colorectal cancer (Vogelstein, 1988):
-5q → 12p* → -17p → -18q
• Tree model: (Desper et al. 1999)
3) Find age of tumor, time of clonal expansion
Observing Cancer Progression
• Obtaining longitudinal (time-course) data is difficult.
[Figure: timeline t1, t2, t3, t4]
• Latitudinal data (multiple patients) are readily available.
[Figure: Human genome → (mutation, selection) → Tumor genome, Tumor genome 2, Tumor genome 3, Tumor genome 4]
Multiple Mutations
• 4-step model for colorectal cancer, Vogelstein, et al. (1988) New Engl. J. Med.
-5q → 12p* → -17p → -18q
• Inferred from latitudinal data in 172 tumor samples.
Oncogenetic Tree models
(Desper et al. JCB 1999, 2000)
• Given: measurements of chromosome
gain/loss events in multiple tumor samples
(CGH)
• Compute: rooted tree that best explains
temporal sequence of events.
{+1q}, {-8p}, {+Xq},
{+Xq, -8p}, {-8p, +1q}
Oncogenetic Tree models
(Desper et al. JCB 1999, 2000)
• Given: measurements of chromosome
gain/loss events in multiple tumor samples
{+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}
L = set of chromosome alterations observed in
all samples
Tumor samples give a probability distribution on $2^L$
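As a sketch of what "probability distribution on subsets of L" means in practice, the empirical distribution can be tabulated by counting how often each event set is observed. The function name is an assumption; the example sets are the ones listed on the slide.

```python
from collections import Counter

def empirical_distribution(samples):
    """Map each observed event set (a subset of L) to its relative frequency."""
    counts = Counter(frozenset(s) for s in samples)
    total = sum(counts.values())
    return {events: c / total for events, c in counts.items()}

# The five tumor samples from the slide, one set of events per sample.
samples = [{"+1q"}, {"-8p"}, {"+Xq"}, {"+Xq", "-8p"}, {"-8p", "+1q"}]
dist = empirical_distribution(samples)
```

Subsets of L that never appear in the samples simply receive probability zero in this table.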
Oncogenetic Tree
T = (V, E, r, p, L) rooted tree
• V = vertices
• E = edges
• L = set of events (leaves)
• r = root
• $p: E \to (0,1]$ probability distribution
T gives a probability distribution on $2^L$
[Figure: tree with edges e0, e1, e2, e3, e4]
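One way to see how a tree induces a distribution on event sets is to enumerate edge subsets for a small tree. This sketch assumes the usual semantics in which each edge is present independently with probability p(e) and an event occurs only if every edge on its path from the root is present. For simplicity it treats every non-root vertex as an event, and the example tree and its probabilities are made up for illustration.

```python
from itertools import product

def reachable(v, parent, present):
    """True if every edge on the path from the root down to v is present."""
    while v in parent:
        if not present[v]:
            return False
        v = parent[v]
    return True

def tree_distribution(parent, prob):
    """Distribution on event sets induced by a rooted tree.

    parent : dict child -> parent vertex (the root has no entry)
    prob   : dict child -> probability of the edge entering that child
    """
    nodes = list(parent)
    dist = {}
    for kept in product([False, True], repeat=len(nodes)):
        present = dict(zip(nodes, kept))
        weight = 1.0
        for v in nodes:
            weight *= prob[v] if present[v] else 1.0 - prob[v]
        observed = frozenset(v for v in nodes if reachable(v, parent, present))
        dist[observed] = dist.get(observed, 0.0) + weight
    return dist

# Hypothetical tree: root -> -8p -> +1q, and root -> +Xq.
parent = {"-8p": "root", "+1q": "-8p", "+Xq": "root"}
prob = {"-8p": 0.5, "+1q": 0.4, "+Xq": 0.2}
dist = tree_distribution(parent, prob)
```

In this toy tree the set {+1q} alone gets no probability, because +1q can only occur after -8p. Enumeration is exponential in the number of edges, so it is only practical for very small trees.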
Results
• CGH of 117 cases of kidney cancer
Extensions
• Oncogenetic trees based on branching
(Desper et al., JCB 1999)
• Maximum Likelihood Estimation (von Heydebreck et al., 2004)
• Mutagenetic trees: mixtures of trees (Beerenwinkel et al., JCB 2005)
Heterogeneity within a tumor
• Final tumor is a clonal expansion of a single cell lineage.
• Can we date the time of clonal expansion?
Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal
tumor histories, PNAS 2000.
Estimating time of clonal expansion
• Microsatellite loci (MS), CA dinucleotides.
• In tumors with loss of mismatch repair (e.g.
colorectal), MS change size.
Estimating time of clonal expansion
• For each MS locus, measure mean $m_i$ and variance $s_i^2$ of size.
• $S^2_{allele}$ = average of $s_1^2, \dots, s_L^2$
• $S^2_{loci}$ = variance of $m_1, \dots, m_L$
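The two summary statistics can be sketched as below, assuming each locus is given as a list of measured allele sizes. The function name is hypothetical, and the use of population variance (rather than sample variance) is an assumption, since the slide does not say which is meant.

```python
import statistics

def ms_summary(sizes_by_locus):
    """Return (S2_allele, S2_loci) for a list of per-locus size measurements.

    S2_allele: average over loci of the within-locus variance of size.
    S2_loci  : variance across loci of the per-locus mean size.
    Population variance is used throughout (an assumption).
    """
    means = [statistics.fmean(sizes) for sizes in sizes_by_locus]
    variances = [statistics.pvariance(sizes) for sizes in sizes_by_locus]
    s2_allele = statistics.fmean(variances)
    s2_loci = statistics.pvariance(means)
    return s2_allele, s2_loci
```

For example, `ms_summary([[10, 12], [20, 20], [15, 17]])` has per-locus means 11, 20, 16 and per-locus variances 1, 0, 1.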
Time to clonal expansion?
Simulation Estimates of Tumor Age
[Figure: timeline showing Y1 and Y2]
• Y1 = time to clonal expansion
• Tumor age = Y1 + Y2
• Branching process simulation. Each cell in population gives birth to 0, 1 or 2 daughter cells with ±1 change in MS size (coalescent: forward, backward, forward simulation)
• Posterior estimate of Y1, Y2 by running simulations, accepting runs with simulated values of $S^2_{allele}$, $S^2_{loci}$ close to observed.
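The accept/reject idea can be illustrated with a deliberately simplified toy, which is not the simulation used by Tsao et al. Here a stand-in simulator lets each locus mean drift by ±1 per division for y1 divisions, only one statistic (the variance across loci) is matched, and the prior on y1 is taken to be uniform. All names, the simulator, and the prior are assumptions.

```python
import random
import statistics

def simulate_s2_loci(y1, n_loci, rng):
    """Toy stand-in simulator: variance across loci after y1 steps of +-1 drift."""
    means = []
    for _ in range(n_loci):
        m = 0
        for _ in range(y1):
            m += rng.choice((-1, 1))
        means.append(m)
    return statistics.pvariance(means)

def abc_rejection(observed, n_loci, y_max, n_runs, tol, seed=0):
    """Keep the sampled y1 values whose simulated statistic is within tol of observed."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_runs):
        y1 = rng.randint(1, y_max)  # uniform prior on y1 (assumption)
        if abs(simulate_s2_loci(y1, n_loci, rng) - observed) <= tol:
            accepted.append(y1)
    return accepted
```

The accepted values form an approximate posterior sample for y1; a tighter tolerance gives a better approximation at the cost of fewer accepted runs.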
Results
• 15 patients, 25 MS loci
• Estimate time since clonal expansion from observed $S^2_{allele}$, $S^2_{loci}$.
Cancer: Mutation and Selection
Clonal theory of cancer: Nowell (Science 1976)
Sources
• Fridlyand, et al. Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 2004.
• Desper, et al. Distance-Based Reconstruction of Tree Models for Oncogenesis. Journal of Computational Biology, 2000.
• Diskin, et al. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Research, 2006.
• Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal tumor histories. PNAS, 2000.