Mechanistic Models of Cancer Progression in
the Space of Pathways
Elena Edelman
Computational Biology and Bioinformatics Program
Institute of Genome Policy and Science
Duke University
Outline
I. Biological Background
– Problems with single gene analysis
– Advantages of pathway analysis
II. Gene Sets
– How they are derived
– Importance of understanding context
III. Modeling Cancer Progression
– Overview of multitask model
– Prostate cancer example
– Melanoma example
Mechanistic Models of Cancer Progression, Elena Edelman presenting
Disadvantages of single gene based methods
• Hundreds of differentially expressed genes
• Subtle signals
• Lack of consensus
Solutions
• Hundreds of differentially expressed genes – group together in a small number of pathways
• Subtle signals – brought to attention when seen as a group
• Lack of consensus – consensus in processes/pathways, not single genes
Disadvantage of single gene methods
13,023 genes
↓
1,149 mutated genes
↓
189 candidate cancer genes
↓
Each sample of a given tumor type had no more than six mutated CAN genes in common
(Sjoblom 2006)
Importance of pathway analysis
• Deregulation of specific processes is necessary for tumor formation. Each process has many potential member genes.
• Alteration of any of a number of different genes can produce the same phenotypic result.
Rb pathway
• Several cancer genes control transitions from the resting state (G0 or G1) to the replicating phase (S) of the cell cycle.
• Diverse protein products:
– cdk4 (kinase), oncogene
– cyclin D1 (activates cdk4), oncogene
– Rb (transcription factor), tumor suppressor gene (TSG)
– p16 (inhibits cdk4), TSG
P53 TSG
• P53 is a transcription factor that inhibits cell growth and stimulates cell death.
• A point mutation inactivates its capacity to bind specifically to its recognition sequence.
• Other ways to achieve the same effect:
– Amplification of MDM2
– Infection with DNA tumor viruses whose products bind to p53 and functionally inactivate it
Pathway Analysis
• Identify gene sets whose expression patterns characterize specific genetic or molecular perturbations.
• Early pathway analysis: Apply methods such as t-tests to determine differentially expressed genes between two classes. Use a database such as Gene Ontology to relate individual genes in terms of general cellular function.
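As an illustration of the early, single-gene approach described above, the following sketch computes a t-statistic per gene between two classes with NumPy. The function name, the toy matrix, and the choice of Welch's unequal-variance form are assumptions made for this example; the slides do not say which t-test variant was used.

```python
import numpy as np

def welch_t_per_gene(expr, labels):
    """Welch t-statistic for each gene (row) between class 1 and class 0.

    expr   : array of shape (genes, samples)
    labels : array of 0/1 class labels, one per sample
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    a = expr[:, labels == 1]
    b = expr[:, labels == 0]
    mean_diff = a.mean(axis=1) - b.mean(axis=1)
    # Unbiased per-class variances (ddof=1), each scaled by its class size
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1] +
                 b.var(axis=1, ddof=1) / b.shape[1])
    return mean_diff / se

# Toy data (hypothetical): 3 genes, 3 samples per class
expr = np.array([
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],   # higher in class 1
    [2.0, 2.1, 1.9, 2.0, 1.9, 2.1],   # unchanged
    [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],   # lower in class 1
])
labels = np.array([0, 0, 0, 1, 1, 1])
t = welch_t_per_gene(expr, labels)
ranking = np.argsort(-t)  # genes ordered from most up- to most down-regulated
```

A ranking of this kind is also the starting point for the enrichment methods: the per-gene statistic orders the genes, and gene sets are then scored against that order.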
Pathway Analysis
• Next step in pathway analysis: Gene Set Enrichment Analysis (GSEA) & Analysis of Sample Set Enrichment Scores (ASSESS)
– Start with biological information: gene sets
– Score enrichment of gene sets in an expression profile with samples from two classes
– GSEA outputs enrichment scores for each gene set in each phenotype
– ASSESS outputs enrichment scores for each gene set in each individual sample
Enrichment Analysis
Given a ranked gene list and a gene set of interest, find genes in the set that are “enriched” at the top or bottom of the list.
[Figure: phenotype classes and a ranked gene list, with gene sets G1, G2, and G3 marked along the list; running enrichment score (RES) plots for GS 16 (chr1p13), GS 171 (chr3q21), and GS 1 (xinact.u133a.grp)]
How could we conclude that G1 is enriched but G2 and G3 are not?
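One way to make the question above concrete is a running enrichment score. The sketch below uses the unweighted, Kolmogorov-Smirnov style walk: step up at each gene that belongs to the set, step down otherwise, and report the largest deviation from zero. The function name and toy gene labels are hypothetical, and the sketch assumes the set contains at least one, but not all, of the ranked genes.

```python
import numpy as np

def running_enrichment(ranked_genes, gene_set):
    """Unweighted running enrichment score over a ranked gene list.

    Returns the running sum at every position and the enrichment score,
    defined here as the running-sum value with the largest magnitude.
    """
    in_set = np.array([g in gene_set for g in ranked_genes])
    n_hit = in_set.sum()
    n_miss = len(ranked_genes) - n_hit
    # +1/n_hit at set members, -1/n_miss elsewhere, so the walk ends at zero
    steps = np.where(in_set, 1.0 / n_hit, -1.0 / n_miss)
    res = np.cumsum(steps)
    es = res[np.argmax(np.abs(res))]
    return res, es

ranked = [f"g{i}" for i in range(1, 11)]  # g1 is the top-ranked gene
set_top = {"g1", "g2", "g3"}       # clustered at the top of the list
set_spread = {"g2", "g5", "g9"}    # scattered through the list

_, es_top = running_enrichment(ranked, set_top)
_, es_spread = running_enrichment(ranked, set_spread)
```

The clustered set produces a much larger score than the scattered one. GSEA additionally judges significance against a permutation null, which this sketch omits.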
Outline
I. Biological Background
– Problems with single gene analysis
– Advantages of pathway analysis
II. Gene Sets
– How they are derived
– Importance of understanding context
III. Modeling Cancer Progression
– Overview of multitask model
– Prostate cancer example
– Melanoma example
Gene Sets
• Defined functionally or structurally
• Defined by experimental methods or through literature
– Experimental: knockouts, infections
– Literature: biochemical experiments, reported in databases such as BioCarta and GenMAPP
GSEA of male vs. female in lymphoblastoid cells
GENE SET | SOURCE | ES | NES | NOM p-val | FDR q-val
Enriched in Males
s1:chrY | Genome | 0.778 | 2.465 | < 0.001 | < 0.001
s1:chrYp11 | Genome | 0.759 | 2.181 | < 0.001 | < 0.001
s1:chrYq11 | Genome | 0.886 | 2.175 | < 0.001 | < 0.001
s1:Testis expressed genes | Experimental GNF | 0.656 | 2.018 | < 0.001 | 0.009
Enriched in Females
s2:Genes that escape X inactivation | Disteche et al, Willard et al | -0.800 | -2.295 | < 0.001 | < 0.001
s2:Female reproductive tissue expressed genes | Experimental GNF | -0.485 | -1.892 | 0.013 | 0.045
ASSESS of male vs. female in lymphoblastoid cells
[Figure: ASSESS enrichment scores, with axes labeled GENE SETS and SAMPLES]
Gene Set Accuracy
• Analyses will depend on the accuracy of gene sets. We ask:
– What is the accuracy of gene sets annotated according to known perturbations?
– How do gene sets defined by experimental studies compare with those defined by expert knowledge?
Hypoxia Gene Set
• Hypoxia: The cellular response to low oxygen conditions. Includes new blood vessel formation.
• Seven gene sets describing the cellular response to hypoxia:
Gene Set | Source
Hypoxia Down | Manalo et al
Hypoxia Up | Manalo et al
Hypoxia Fibro Up | Kim et al
Hypoxia Reg Up | Leonard et al
Hypoxia Review | Harris
VEGF Pathway | BioCarta
HIF Pathway | BioCarta
Hypoxia gene set accuracy
• Expression data set with 6 hypoxic and 6 normal cells (Mense 2006)
• GSEA applied with a database of 508 gene sets
Rank | Gene Set | NES | P-val
Enriched in Hypoxic Cells
3 | Hypoxia Up | -1.96 | 0.008
4 | Hypoxia Review | -1.95 | 0
6 | Hypoxia Fibro Up | -1.84 | 0.004
9 | Hypoxia Reg Up | -1.73 | 0.02
10 | HIF Pathway | -1.73 | 0.02
53 | VEGF Pathway | -1.39 | 0.055
Enriched in Normal Cells
17 | Hypoxia Down | 1.48 | 0.167
RAS
• 3 Ras gene sets: K-Ras, H-Ras, and the Ras pathway from BioCarta.
• K-RAS and H-RAS are experimentally defined and context specific.
• BioCarta's Ras gene set is the most general, consisting of genes thought to biochemically interact with RAS and proteins associated with RAS.
RAS gene set accuracy
• Gene expression profile of 31 cells with tumors caused by K-RAS mutation and 19 normal cells.
• H-RAS does not capture K-RAS specificity.
• BioCarta's RAS gene set is appropriate to use regardless of the specific RAS mutation.
Gene Set | NES | Pval
Enriched in Tumor
RAS Up BioCarta | 1.51 | 0
SRC Down | 1.41 | 0.09
MYC Up | 1.25 | 0.15
SRC Up | 1.25 | 0.15
HRAS Up | 1.12 | 0.26
E2F3 Up | 1.12 | 0.25
BCAT Up | 0.81 | 0.74
Enriched in Normal
RAS Down BioCarta | -1.51 | 0.12
E2F3 Down | -1.29 | 0.10
HRAS Down | -1.18 | 0.19
BCAT Down | -1.14 | 0.29
MYC Down | -0.99 | 0.55
RAS gene set accuracy
• Gene expression profile of 45 adenocarcinoma and 48 squamous lung cancer samples.
• Data set indirectly involves RAS perturbations.
• Enrichment scores from ASSESS were used to predict phenotype. Class prediction accuracy for the three sets:
– 69.9% for the H-RAS pathway gene set
– 75.3% for the K-RAS pathway gene set
– 79.6% for the BioCarta RAS pathway gene set
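The slides do not state which classifier or validation scheme produced the accuracies above. As a sketch of how per-sample enrichment scores can be turned into a class prediction accuracy, the following assumes a nearest-centroid rule evaluated by leave-one-out cross-validation; the function name and all score values are hypothetical.

```python
import numpy as np

def loo_nearest_centroid_accuracy(scores, labels):
    """Leave-one-out accuracy of a nearest-centroid rule.

    scores : array (samples, features) of per-sample enrichment scores
    labels : array of class labels, one per sample
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    correct = 0
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i          # hold out sample i
        centroids = np.array([scores[keep & (labels == c)].mean(axis=0)
                              for c in classes])
        dists = np.linalg.norm(centroids - scores[i], axis=1)
        correct += int(classes[np.argmin(dists)] == labels[i])
    return correct / len(labels)

# Toy per-sample scores for a single gene set (hypothetical values)
scores = np.array([[0.9], [0.8], [0.7], [0.1], [-0.2], [-0.1], [0.6], [0.0]])
labels = np.array(["adeno", "adeno", "adeno", "squamous",
                   "squamous", "squamous", "squamous", "adeno"])
acc = loo_nearest_centroid_accuracy(scores, labels)
```

Running the same procedure once per gene set gives one accuracy per set, which is the kind of comparison the three percentages summarize.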
Outline
I. Biological Background
– Problems with single gene analysis
– Advantages of pathway analysis
II. Gene Sets
– How they are derived
– Importance of understanding context
III. Modeling Cancer Progression
– Overview of multitask model
– Prostate cancer example
– Melanoma example
Dynamics of Cancer Progression
• Long lists of genes implicated in various stages of cancer exist for many different cancer types. We want to learn how these genes interact via signaling pathways and functional relationships.
• The next step is a mechanistic understanding of cancer progression at the pathway level.
• There are only a few types of cancer for which we know which pathways acquire the mutations that initiate tumorigenesis.
– Eye: RB1
• Are other types of cancer initiated by one or several pathways becoming altered?
• The alteration of one gene hardly ever suffices to give rise to full-blown cancer.
– Oncogenes, tumor suppressor genes (TSGs), and stability genes drive tumor progression.
– Mammalian cells have multiple safeguards. Several genes must be defective for invasive cancer to develop.
Objectives
• Identify pathways most relevant throughout progression and pathways most relevant to individual transitions.
• Build pathway networks: Estimate the interdependence of pathways relevant to each step of tumor progression.
• Refine relevant pathways and infer a gene network for those relevant gene sets.
Hierarchical Modeling
• Tumor progression
– FIXED EFFECTS: Stage in cancer progression. Individuals will show similar pathway deregulation as cancer progresses, depending on whether they have benign, primary, or metastatic lesions.
– RANDOM EFFECTS: Within a stage, individuals will differ based on how they specifically developed the disease.
Regularized Multitask Learning (RML)
• Current analyses of genomic data evaluate each stage in progression independently, missing relationships between the data.
• Integration of the data over all stages will provide a more complete picture of the processes underlying tumorigenesis.
• RML learns a problem together with other related problems at the same time. Learning the problems in parallel, with a shared representation, can help each problem be learned better.
• Problems: Which pathways are relevant to transition 1? Transition 2? Which pathways are relevant throughout progression?
Stratifying Data
• States: normal (n), early (e), metastatic (m).
• Data: Gene expression for g genes in s samples. Stratify data into T datasets, one for each step in progression.
T=2:
D1: samples in states n and e
D2: samples in states e and m
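The stratification above can be sketched in a few lines of NumPy. The function name, the stage codes, and the toy matrix are assumptions for illustration; the labels follow the convention used later in the talk (-1 for the less serious state, +1 for the more serious one).

```python
import numpy as np

def stratify(expr, stages, order=("n", "e", "m")):
    """Split a genes x samples matrix into one dataset per transition.

    Returns a list of (sub_matrix, y) pairs, where y is -1 for the less
    serious stage of the transition and +1 for the more serious one.
    """
    expr = np.asarray(expr)
    stages = np.asarray(stages)
    tasks = []
    for lo, hi in zip(order[:-1], order[1:]):
        cols = np.isin(stages, [lo, hi])          # samples in either stage
        y = np.where(stages[cols] == hi, 1, -1)
        tasks.append((expr[:, cols], y))
    return tasks

# Toy example (hypothetical): 2 genes, 6 samples
expr = np.arange(12).reshape(2, 6)
stages = np.array(["n", "n", "e", "e", "m", "m"])
(d1, y1), (d2, y2) = stratify(expr, stages)
```

Note that the early-stage samples appear in both datasets: as the more serious class in D1 and as the less serious class in D2.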
Modeling tumor progression
• Model summary: Find the relevant pathways in the overall progression {n→e→m} and the relevant pathways at different stages, {n→e} and {e→m}.
• The task t corresponds to progression from a less serious to a more serious state: t=1: {n→e}, t=2: {e→m}
Transformation
• Transformation: Gene expression data is transformed using ASSESS
D: genes x samples → S: gene sets x samples
D1 (states n, e) → S1 (states n, e)
D2 (states e, m) → S2 (states e, m)
[Figure: each genes x samples matrix D (genes 1 to 20,000) is mapped to a gene sets x samples matrix S]
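To show the shape of this transformation, the sketch below maps a genes x samples matrix to a gene sets x samples matrix. It is not ASSESS: as a deliberately simplified stand-in, each gene is z-scored across samples and a set's score in a sample is the mean z-score of its member genes. The function name, gene names, and set names are hypothetical, and the sketch assumes no gene is constant across samples.

```python
import numpy as np

def gene_set_scores(expr, gene_names, gene_sets):
    """Map a genes x samples matrix to a gene sets x samples matrix.

    Simplified stand-in for ASSESS: mean per-gene z-score of the set's
    members in each sample.
    """
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    index = {g: i for i, g in enumerate(gene_names)}
    names = sorted(gene_sets)
    rows = [z[[index[g] for g in gene_sets[s] if g in index]].mean(axis=0)
            for s in names]
    return names, np.vstack(rows)

# Toy data (hypothetical): 4 genes, 4 samples, 2 gene sets
gene_names = ["A", "B", "C", "D"]
D = np.array([
    [1.0, 1.0, 3.0, 3.0],
    [2.0, 2.0, 4.0, 4.0],
    [5.0, 5.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
])
gene_sets = {"up": ["A", "B"], "down": ["C"]}
names, S = gene_set_scores(D, gene_names, gene_sets)
```

Whatever scoring method is used, the key property is the same: every downstream model sees one row per gene set instead of one row per gene.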
Multitask SVM
• Support vector machines (SVMs): a regularization method
– Input regression data
– Estimate a regression function f, a summary statistic of Y|X
• Multitask SVM
– Builds classification models jointly over all data sets, Y|S1,S2
– Provides a baseline model for gene sets relevant to predicting phenotype in both data sets, Y|S1,S2
– Provides gene sets relevant to only one data set, Y|S1 and Y|S2
– These regressions provide data set dependent corrections to the baseline model
The Model
• Input: x = S1, S2
• Class labels: y = {-1, 1}, where -1 = less serious and 1 = more serious.
• Build two regression models, $f_{t_1}(x)$ and $f_{t_2}(x)$, for transition 1 data and transition 2 data.
– $b(x)$ = baseline term over all tasks and $r_t(x)$ = task-specific corrections

$$f_{t_1}(x) = b(x) + r_{t_1}(x) + \varepsilon$$
$$f_{t_2}(x) = b(x) + r_{t_2}(x) + \varepsilon$$

• Discriminant functions:

$$f_{t_1}(x) = w_0 \cdot x + v_{t_1} \cdot x + b$$
$$f_{t_2}(x) = w_0 \cdot x + v_{t_2} \cdot x + b$$

– $w_0$ is a vector of baseline weights for the gene sets
– $v_{t_1}$ is the vector of correction terms for transition 1
– $v_{t_2}$ is the vector of correction terms for transition 2
– $b$ is a scalar offset
The Model
• Parameters are estimated by solving a minimization problem:
[Equation not preserved in this transcript]
where V(f(x_it), y_it) is a loss function. If tasks are thought to be highly related, set the λ2/λ1 ratio to be large.
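The objective on this slide did not survive in the transcript. One form consistent with the surrounding description is given below, writing the slide's loss as $V$ to keep it distinct from the corrections $v_t$. The assignment of the two penalties is an assumption: $\lambda_1$ is taken to weight the baseline vector $w_0$ and $\lambda_2$ the task-specific corrections $v_t$, chosen so that a large $\lambda_2/\lambda_1$ ties the tasks together as the slide states.

$$
\min_{w_0,\, v_t,\, b} \;\; \sum_{t=1}^{T} \sum_{i=1}^{n_t} V\big(f_t(x_{it}),\, y_{it}\big) \;+\; \lambda_1 \lVert w_0 \rVert^2 \;+\; \lambda_2 \sum_{t=1}^{T} \lVert v_t \rVert^2,
\qquad f_t(x) = w_0 \cdot x + v_t \cdot x + b
$$

Under this form, increasing $\lambda_2/\lambda_1$ shrinks every $v_t$ toward zero, so all tasks end up sharing essentially the same weights $w_0$; decreasing it lets each transition fit its own model.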
Model Interpretation
• Interpretation:
– w_j0: weight of the j-th gene set in the baseline model. Gene sets for which |w_j0| is largest are relevant in {n→e→m}.
– v_jt: weight of the j-th gene set in state progression t. Gene sets for which |v_j1| is large are relevant in {n→e}, and gene sets for which |v_j2| is large are relevant in {e→m}.
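The interpretation above can be tried on a toy problem. The sketch below is not the solver used in the talk: it assumes a hinge loss averaged within each task, squared-norm penalties with `lam1` on the baseline weights and `lam2` on the corrections (the same assumed roles as λ1 and λ2), and plain subgradient descent. The function name, penalty values, and data are hypothetical.

```python
import numpy as np

def fit_multitask_hinge(tasks, lam1=5.0, lam2=5.0, lr=0.05, n_iter=300):
    """Subgradient descent for f_t(x) = (w0 + v_t) . x + b with hinge loss.

    tasks : list of (X, y); X has shape (samples, gene sets), y in {-1, +1}
    lam1  : penalty on the shared baseline weights w0    (assumed role)
    lam2  : penalty on the task-specific corrections v_t (assumed role)
    """
    n_feat = tasks[0][0].shape[1]
    w0 = np.zeros(n_feat)
    v = np.zeros((len(tasks), n_feat))
    b = 0.0
    for _ in range(n_iter):
        g_w0 = 2 * lam1 * w0            # gradient of the penalty terms
        g_v = 2 * lam2 * v
        g_b = 0.0
        for t, (X, y) in enumerate(tasks):
            margins = y * (X @ (w0 + v[t]) + b)
            active = margins < 1        # samples inside or beyond the margin
            # Subgradient of the mean hinge loss for task t
            g = -(y[active][:, None] * X[active]).sum(axis=0) / len(y)
            g_w0 += g
            g_v[t] += g
            g_b += -y[active].sum() / len(y)
        w0 -= lr * g_w0
        v -= lr * g_v
        b -= lr * g_b
    return w0, v, b

# Toy scores (hypothetical): gene set 0 separates both transitions,
# gene set 1 only transition 1, gene set 2 only transition 2.
S1 = np.array([[1, 1.5, 0.1], [1, 1.5, -0.1], [-1, -1.5, 0.1], [-1, -1.5, -0.1]])
S2 = np.array([[1, 0.1, 1.5], [1, -0.1, 1.5], [-1, 0.1, -1.5], [-1, -0.1, -1.5]])
y1 = np.array([1, 1, -1, -1])
y2 = np.array([1, 1, -1, -1])
w0, v, b = fit_multitask_hinge([(S1, y1), (S2, y2)])

baseline_rank = np.argsort(-np.abs(w0))      # gene sets relevant throughout
transition1_rank = np.argsort(-np.abs(v[0])) # gene sets specific to task 1
transition2_rank = np.argsort(-np.abs(v[1])) # gene sets specific to task 2
```

On this toy input the largest baseline weight falls on the shared gene set, while the largest corrections fall on the gene set specific to each transition, which is the pattern the interpretation relies on.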
Prostate Cancer
• Gene expression profile of 22 benign epithelium samples (b), 32 primary prostate cancer samples (p), and 17 metastatic prostate cancer samples (m) (Tomlins, 2007).
• Progression {b→p→m}
[Figure: panels w0, v1, v2]
Results
• Categorized results by “Hallmarks of Cancer” (Hanahan, 2000):
– Self-sufficiency in growth signals
– Insensitivity to anti-growth signals
– Evasion of apoptosis
– Limitless replicative potential
– Angiogenesis
– Invasion and metastasis
Results
•
Self sufficiency in growth signals
– Cell cycle gene sets
– ErbB4, EGF, Sprouty, ERK
Results
•
Evidence for insensitivity to anti-growth signals:
– PTEN down-regulation
– PTDINS up-regulation
•
Evasion of apoptosis:
– IGF1R up-regulation
– ROS down-regulation
•
Energy production
– Glycolysis gene set up-regulation
– ATP synthesis gene set up-regulation
– Oxidative phosphorylation up-regulation
Novel Findings
• Took previous analysis a step further by discovering the specific pathways implicated in tumorigenesis.
– Previous work (Tomlins 2007) identified single genes that were relevant in progression and grouped them together to form important concepts.
• Currently little is known about ErbB4 deregulation in prostate cancer (PCA)
– EGF receptors have been implicated in several tumor types: stomach, brain, breast.
– ErbB2/HER2 has been shown to be overexpressed in prostate cancer.
Objective 2: Pathway dependency structure
• Infer a pathway interaction network for each stage of progression using learning gradients and inverse regression.
• Provide knowledge on how certain pathways relate, interact, and influence one another with respect to phenotype.
Objective 2
• Standard regression methods show which gene sets are correlated with class labels but do not provide information on the co-variation of gene sets correlated with class labels.
• Estimate the covariance of the inverse regression, C = cov(X|Y)
– Input: matrix of enrichment scores (X) and class labels (Y)
– Output: covariance matrix C = cov(X|Y)
• The i-th diagonal element measures the relevance of the i-th gene set with respect to change in label.
• The ij-th off-diagonal element measures the dependence between gene sets i and j.
• Relationships will be visualized in graphical models.
Objective 2
• Analysis can identify pathways that are closely associated throughout progression:
– IGF1R and ERK are linked through their association with RAS. ERK ranks 9th out of 522 gene sets based on its covariance with the IGF1R pathway.
– PTDINS ranks 15th based on its covariance with the PTEN gene set.
– IGF1R ranks 32nd based on its covariance with PTDINS.
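Rankings of this kind can be read off a covariance matrix over gene sets. The sketch below assumes that "covariance of the inverse regression" means the covariance of the class-conditional means, Cov(E[X | Y]), estimated from class averages; the talk's own estimate comes from learning gradients, which is not reproduced here. Function names and the toy scores are hypothetical.

```python
import numpy as np

def inverse_regression_cov(X, y):
    """Covariance of class-conditional means, Cov(E[X | Y]).

    X : array (samples, gene sets) of enrichment scores
    y : array of class labels, one per sample
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand_mean = X.mean(axis=0)
    C = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        members = X[y == c]
        d = members.mean(axis=0) - grand_mean
        # Weight each class by its share of the samples
        C += (len(members) / len(X)) * np.outer(d, d)
    return C

def rank_partners(C, j):
    """Other gene sets ordered by |C[j, k]|, strongest first."""
    order = np.argsort(-np.abs(C[j]))
    return [int(k) for k in order if k != j]

# Toy enrichment scores (hypothetical): 4 samples, 3 gene sets
X = np.array([
    [-1.0, -0.8,  0.1],
    [-1.0, -0.8, -0.1],
    [ 1.0,  0.8,  0.1],
    [ 1.0,  0.8, -0.1],
])
y = np.array([-1, -1, 1, 1])
C = inverse_regression_cov(X, y)
partners_of_0 = rank_partners(C, 0)
```

The diagonal of `C` plays the role of per-gene-set relevance, and a row of `C` sorted by magnitude gives the "ranks k-th based on covariance with" statements.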
Objective 2
• A: Dependency structure of the 10 gene sets most relevant in the benign to prostate cancer transition
• B: Extended dependency structure
Objective 3: Refinement
• Available gene sets are not always in the right context for a specific data set.
• The refinement procedure adapts the gene set to the context of the data set. It shows which genes depend on each other and whether there is substructure in the gene set.
• Cluster genes in the gene set based on their covariance, C = cov(X|Y):
– X = gene expression values of genes in the gene set
– Y = class labels
• A gene network modeling the interdependence of the genes in the refined gene set is inferred.
Gene Set Refinement
• The genes of BioCarta's ERK pathway
• Refine the pathway to those genes most relevant for this data set.
• A and B differ in threshold values
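The role of the threshold can be illustrated with a small sketch. It assumes refinement by simple thresholding of the absolute covariance between genes, which stands in for the clustering step and is not necessarily the procedure behind panels A and B. The function name, gene labels, and covariance values are hypothetical.

```python
import numpy as np

def refine_gene_set(C, genes, threshold):
    """Keep genes linked to at least one other gene by |C[i, j]| >= threshold.

    Returns the retained genes and the edges of the implied gene network.
    """
    C = np.asarray(C, dtype=float)
    edges = [(genes[i], genes[j])
             for i in range(len(genes))
             for j in range(i + 1, len(genes))
             if abs(C[i, j]) >= threshold]
    kept = sorted({g for edge in edges for g in edge})
    return kept, edges

# Hypothetical covariance among four genes of one gene set
genes = ["geneA", "geneB", "geneC", "geneD"]
C = np.array([
    [1.0, 0.8, 0.5, 0.1],
    [0.8, 1.0, 0.4, 0.0],
    [0.5, 0.4, 1.0, 0.1],
    [0.1, 0.0, 0.1, 1.0],
])
kept_loose, edges_loose = refine_gene_set(C, genes, threshold=0.4)
kept_strict, edges_strict = refine_gene_set(C, genes, threshold=0.7)
```

A looser threshold keeps more genes and more edges; a stricter one leaves only the most tightly coupled core, which is one way two refinements of the same pathway can differ.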
Melanoma Progression
• Gene expression profile of 4 normal skin samples (n), 4 primary melanoma samples (p), and 4 metastatic melanoma samples (m) (Smith, 2005).
• Progression {n→p→m}
[Figure: panels w0, v1, v2]
Melanoma Results
•
Self-sufficiency of growth
– AKT up-regulation throughout progression
– PTDINS up-regulation throughout progression
•
Escape from apoptosis
– IGF1R up-regulation in the late transition
– p53 down-regulation throughout progression
•
Limitless replicative potential
– HTERT up-regulation in the early transition
•
Angiogenesis
– HIF up-regulation throughout progression
– Angiogenesis gene set up-regulation in the early transition
•
Invasion and Metastasis
– CDC42RAC up-regulation throughout progression
– MTA3 down-regulation in the early transition
Validation
• Gene expression profile of 9 samples of benign nevi, 6 samples of primary melanoma, and 19 samples of metastatic melanoma (Haqq 2005)
[Figure: panels w0, v1, v2]
• Both analyses found:
– p53 gene set down-regulation
– D4-GDI pathway over-expression
– HTERT gene set over-expression
– CDC42RAC pathway over-expression
Pathway Dependencies
• A: Dependency structure of the top 10 gene sets most relevant in the normal skin to primary melanoma transition
• B: Extended dependency structure
Sterol Biosynthesis
• The sterol biosynthesis gene set is highly connected.
• Tumor cells often have sterol synthesis deficiencies.
• One component of the sterol biosynthesis pathway is the mevalonate pathway.
• Many tumor cells cannot synthesize mevalonate, so they obtain it from the host.
Pathway Dependencies
• Interdependence with the sterol biosynthesis gene set, ranked out of 523 gene sets:
– Fatty acid synthesis ranks 14th
– Cyanoamino acid metabolism ranks 19th
– Gamma hexachlorocyclohexane ranks 3rd
• All are closely tied to the inability of a tumor to synthesize certain metabolites and its increasing need for these metabolites as it grows and develops.
Colon Cancer Example
• Multitask learning can be applied to data sets with more than 3 classes (2 tasks).
• Colon cancer gene expression profile: 32 normal, 32 adenoma, 35 stage 1 carcinoma, 82 stage 2 carcinoma, 70 stage 3 carcinoma, and 43 stage 4 carcinoma (Vogelstein, 1990).
Future
• Expand analyses to datasets with more than 3 classes
– Prostate cancer: benign, PIN, PCA low, PCA high, metastatic
– Colon cancer: normal, adenoma, carcinoma stages 1-4
• Gene set expansion
– After refining the gene sets, find genes outside of the set with strong dependencies on the core genes in the gene set
Acknowledgements
• Sayan Mukherjee
• Phillip Febbo
• Joe Nevins
• Ashley Chi
• Justin Guinney