Mechanistic Models of Cancer Progression in the Space of Pathways
Elena Edelman
Computational Biology and Bioinformatics Program
Institute of Genome Policy and Science
Duke University

Outline
I. Biological Background
   – Problems with single gene analysis
   – Advantages of pathway analysis
II. Gene Sets
   – How they are derived
   – Importance of understanding context
III. Modeling Cancer Progression
   – Overview of multitask model
   – Prostate cancer example
   – Melanoma example

Mechanistic Models of Cancer Progression, Elena Edelman presenting

Disadvantages of single gene based methods
• Hundreds of differentially expressed genes
• Subtle signals
• Lack of consensus

Solutions
• Hundreds of differentially expressed genes – group together in a small number of pathways
• Subtle signals – brought to attention when seen as a group
• Lack of consensus – consensus in processes/pathways, not single genes

Disadvantage of single gene methods
13,023 genes
↓
1,149 mutated genes
↓
189 candidate cancer (CAN) genes
↓
Each sample of a given tumor type had no more than six mutated CAN genes in common
(Sjoblom 2006)

Importance of pathway analysis
• Deregulation of specific processes is necessary for tumor formation. Each process has many potential member genes.
• Alteration of a number of different genes will provide the same phenotypic result.

Rb pathway
• Several cancer genes control transitions from the resting state (G0 or G1) to the replicating phase (S) of the cell cycle.
• Diverse protein products:
  – cdk4 (kinase), oncogene
  – cyclin D1 (activates cdk4), oncogene
  – Rb (transcription factor), TSG
  – p16 (inhibits cdk4), TSG

P53 TSG
• P53 is a transcription factor that inhibits cell growth and stimulates cell death.
• Point mutation inactivates its capacity to bind specifically to its recognition sequence.
• Other ways to achieve the same effect:
  – Amplification of MDM2
  – Infection with DNA tumor viruses whose products bind to p53 and functionally inactivate it.

Pathway Analysis
• Identify gene sets whose expression patterns characterize specific genetic or molecular perturbations.
• Early pathway analysis: Apply methods such as t-tests to determine differentially expressed genes between two classes. Use a database such as Gene Ontology to relate individual genes in terms of general cellular function.

Pathway Analysis
• Next step in pathway analysis: Gene Set Enrichment Analysis (GSEA) & Analysis of Sample Set Enrichment Score (ASSESS)
  – Start with biological information: Gene sets
  – Score enrichment of gene sets in an expression profile with samples from two classes
  – GSEA outputs enrichment scores for each gene set in each phenotype
  – ASSESS outputs enrichment scores for each gene set in each individual

Enrichment Analysis
Given a ranked gene list and a gene set of interest, find genes in the set that are “enriched” at the top or bottom of the list.
[Figure: phenotype classes, ranked gene list, gene sets G1–G3, and running enrichment score (RES) plots for GS 1 (xinact.u133a.grp), GS 16 (chr1p13), and GS 171 (chr3q21).]
How could we conclude that G1 is enriched but G2 and G3 are not?

Outline
I. Biological Background
   – Problems with single gene analysis
   – Advantages of pathway analysis
II. Gene Sets
   – How they are derived
   – Importance of understanding context
III. Modeling Cancer Progression
   – Overview of multitask model
   – Prostate cancer example
   – Melanoma example

Gene Sets
• Defined functionally or structurally
• Defined by experimental methods or through literature.
  – Experimental: Knockouts, infections
  – Literature: Biochemical experiments, reported in databases such as BioCarta and GenMAPP

GSEA of male vs. female in lymphoblastoid cells

GENE SET                                        SOURCE                          ES      NES     NOM p-v   FDR q-v
Enriched in Males
s1:chrY                                         Genome                          0.778   2.465   < 0.001   < 0.001
s1:chrYp11                                      Genome                          0.759   2.181   < 0.001   < 0.001
s1:chrYq11                                      Genome                          0.886   2.175   < 0.001   < 0.001
s1:Testis expressed genes                       Experimental GNF                0.656   2.018   < 0.001   0.009
Enriched in Females
s2:Genes that escape X inactivation             Disteche et al, Willard et al   -0.800  -2.295  < 0.001   < 0.001
s2:Female reproductive tissue expressed genes   Experimental GNF                -0.485  -1.892  0.013     0.045

ASSESS of male vs. female in lymphoblastoid cells
[Figure: ASSESS enrichment scores; axes: gene sets by samples.]

Gene Set Accuracy
• Analyses will depend on accuracy of gene sets. We ask:
  – What is the accuracy of gene sets annotated according to known perturbations?
  – How do gene sets defined by experimental studies vs. expert knowledge compare?

Hypoxia Gene Set
• Hypoxia: The cellular response to low oxygen conditions.
  Includes new blood vessel formation.
• Seven hypoxia gene sets describing the cellular response to hypoxia:

Gene Set           Source
Hypoxia Down       Manalo et al
Hypoxia Up         Manalo et al
Hypoxia Fibro Up   Kim et al
Hypoxia Reg Up     Leonard et al
Hypoxia Review     Harris
VEGF Pathway       BioCarta
HIF Pathway        BioCarta

Hypoxia gene set accuracy
• Expression data set with 6 hypoxic and 6 normal cells (Mense 2006)
• GSEA applied with a database of 508 gene sets.

Rank   Gene Set           NES     P-val
Enriched in Hypoxic Cells
3      Hypoxia Up         -1.96   0.008
4      Hypoxia Review     -1.95   0
6      Hypoxia Fibro Up   -1.84   0.004
9      Hypoxia Reg Up     -1.73   0.02
10     HIF Pathway        -1.73   0.02
53     VEGF Pathway       -1.39   0.055
Enriched in Normal Cells
17     Hypoxia Down       1.48    0.167

RAS
• 3 Ras gene sets: K-Ras, H-Ras, and the Ras pathway from BioCarta.
• K-RAS and H-RAS are experimentally defined and context specific.
• BioCarta's Ras gene set is the most general, consisting of genes thought to biochemically interact with RAS and proteins associated with RAS.

RAS gene set accuracy
• Gene expression profile of 31 cells with tumors caused by K-RAS mutation and 19 normal cells.
• H-RAS does not capture K-RAS specificity.
• BioCarta's RAS gene set is appropriate to use regardless of the specific RAS mutation.

Gene Set            NES     Pval
Enriched in Tumor
RAS Up BioCarta     1.51    0
SRC Down            1.41    0.09
MYC Up              1.25    0.15
SRC Up              1.25    0.15
HRAS Up             1.12    0.26
E2F3 Up             1.12    0.25
BCAT Up             0.81    0.74
Enriched in Normal
RAS Down BioCarta   -1.51   0.12
E2F3 Down           -1.29   0.10
HRAS Down           -1.18   0.19
BCAT Down           -1.14   0.29
MYC Down            -0.99   0.55

RAS gene set accuracy
• Gene expression profile of 45 adenocarcinomas and 48 squamous lung cancer samples.
• Data set indirectly involves RAS perturbations.
• Enrichment scores from ASSESS were used to predict phenotype. Class prediction accuracy for the three sets:
  – 69.9% for the H-RAS pathway gene set
  – 75.3% for the K-RAS pathway gene set
  – 79.6% for the BioCarta RAS pathway gene set

Outline
I. Biological Background
   – Problems with single gene analysis
   – Advantages of pathway analysis
II. Gene Sets
   – How they are derived
   – Importance of understanding context
III. Modeling Cancer Progression
   – Overview of multitask model
   – Prostate cancer example
   – Melanoma example

Dynamics of Cancer Progression
• Long lists of genes implicated in various stages of cancer exist for many different cancer types. We want to learn about the interaction of these genes via signaling pathways and functional relationships.
• The next step is a mechanistic understanding of cancer progression at the pathway level.
• There are only a few types of cancer where we know which pathways acquire mutations that initiate tumorigenesis.
  – Eye: RB1
• Are other types of cancer initiated by one or several pathways becoming altered?
• The alteration of one gene hardly ever suffices to give rise to full-blown cancer.
  – Oncogenes, tumor suppressor genes (TSGs), and stability genes drive tumor progression.
  – Mammalian cells have multiple safeguards. Several genes must be defective for invasive cancer to develop.

Objectives
• Identify pathways most relevant throughout progression and pathways most relevant to individual transitions.
• Build pathway networks: Estimate the interdependence of pathways relevant to each step of tumor progression.
• Refine relevant pathways and infer a gene network for those relevant gene sets.
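
The enrichment scores that GSEA and ASSESS assign to gene sets, used throughout the slides above, are built on a running sum over a ranked gene list. The slides do not spell out the formula, so the following is only a minimal sketch that assumes the weighted Kolmogorov-Smirnov-style statistic commonly used for GSEA. The function name `running_enrichment`, the exponent `p`, and the toy gene names are hypothetical, chosen for illustration.

```python
def running_enrichment(ranked_genes, scores, gene_set, p=1.0):
    """Return (ES, running_sum) for one gene set over a ranked gene list.

    ranked_genes: gene names sorted by their association with the phenotype.
    scores: the ranking statistic for each gene, in the same order.
    Assumption: hits step up by |score|**p (normalized), misses step down
    uniformly, and ES is the running-sum value with the largest magnitude.
    """
    members = set(gene_set)
    hits = [g in members for g in ranked_genes]
    n_miss = hits.count(False)
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_weight == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, total = [], 0.0
    for s, h in zip(scores, hits):
        # Step up on a gene in the set, step down on a gene outside it.
        total += abs(s) ** p / hit_weight if h else -1.0 / n_miss
        running.append(total)

    es = max(running, key=abs)  # signed maximum deviation from zero
    return es, running


# Toy ranked list: g1 is most associated with class A, g6 with class B.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
stats = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top, curve_top = running_enrichment(genes, stats, {"g1", "g2"})
es_bottom, curve_bottom = running_enrichment(genes, stats, {"g5", "g6"})
```

A gene set concentrated at the top of the list yields a large positive score and one concentrated at the bottom a large negative score, which is the intuition behind the G1 versus G2/G3 question posed earlier.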

Hierarchical Modeling
• Tumor progression
  – FIXED EFFECTS: Stage in cancer progression. Individuals will show similar pathway deregulation as cancer progresses depending on whether they have benign, primary or metastatic lesions.
  – RANDOM EFFECTS: Within a stage, individuals will have differences based on how they specifically developed the disease.

Regularized Multitask Learning (RML)
• Current analyses of genomic data evaluate each stage in progression independently, missing relationships between the data.
• Integration of the data over all stages will provide a more complete picture of the processes underlying tumorigenesis.
• RML learns a problem together with other related problems at the same time. Learning the problems in parallel can help each problem be better learned by using a shared representation.
• Problems: Which pathways are relevant to transition 1? Transition 2? Which pathways are relevant throughout progression?

Stratifying Data
• States: normal (n), early (e), metastatic (m).
• Data: Gene expression for g genes in s samples. Stratify data into T datasets, one for each step in progression.
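
The stratification step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the presenter's code: the helper `stratify_by_transition` and the toy data are hypothetical, and the -1/+1 coding follows the convention the model slides use for less serious versus more serious states.

```python
def stratify_by_transition(samples, stages, order):
    """Split samples into one dataset per adjacent pair of stages.

    samples: list of per-sample feature vectors.
    stages: stage label of each sample, parallel to `samples`.
    order: stages from least to most serious, e.g. ["n", "e", "m"].
    Returns T = len(order) - 1 tasks; within each task the less serious
    stage is labeled -1 and the more serious stage +1.
    """
    tasks = []
    for less, more in zip(order, order[1:]):
        keep = [(x, s) for x, s in zip(samples, stages) if s in (less, more)]
        tasks.append({
            "transition": (less, more),
            "X": [x for x, _ in keep],
            "y": [1 if s == more else -1 for _, s in keep],
        })
    return tasks


# Toy example: five samples with one feature each.
toy_samples = [[0.1], [0.2], [0.3], [0.4], [0.5]]
toy_stages = ["n", "e", "m", "n", "e"]
tasks = stratify_by_transition(toy_samples, toy_stages, ["n", "e", "m"])
```

Samples from the intermediate stage appear in both datasets, once as the more serious class and once as the less serious class, which is what ties the two tasks together.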
[Figure: with T = 2, the n, e, and m samples are split into D1 (n and e samples) and D2 (e and m samples).]

Modeling tumor progression
• Model summary: Find relevant pathways in the overall progression {n→e→m} and the relevant pathways at different stages {n→e} and {e→m}.
• The task t corresponds to progression from less serious to more serious states: t=1: {n→e}, t=2: {e→m}

Transformation
• Transformation: Gene expression data is transformed using ASSESS
  – D: genes x samples → S: gene sets x samples
  – D1 (n, e) → S1 and D2 (e, m) → S2
[Figure: matrices D1 and D2 (genes 1 to 20,000 by samples) mapped to S1 and S2 (gene sets by samples).]

Multitask SVM
• Support vector machines (SVMs) - regularization method
  – Input regression data
  – Estimate a regression function f - a summary statistic of Y|X.
• Multitask SVM
  – Builds classification models jointly over all data sets, Y|S1, S2.
  – Provides a baseline model for gene sets relevant to predicting phenotype in both data sets, Y|S1,S2
  – Provides gene sets relevant to only one data set, Y|S1 and Y|S2
  – These regressions provide data set dependent corrections to the baseline model.

The Model
• Input: x = S1, S2; class labels y = {-1, 1}, where -1 = less serious, 1 = more serious.
• Build two regression models f_{t1}(x) and f_{t2}(x), for transition 1 data and transition 2 data.
  – b(x) = baseline term over all tasks and r_t(x) = task specific corrections

  $$f_{t_1}(x) = b(x) + r_{t_1}(x), \qquad f_{t_2}(x) = b(x) + r_{t_2}(x)$$

• Discriminant functions:

  $$f_{t_1}(x) = w_0 \cdot x + v_{t_1} \cdot x + b, \qquad f_{t_2}(x) = w_0 \cdot x + v_{t_2} \cdot x + b$$

  – w_0 is a vector of baseline weights for the gene sets
  – v_{t1} is the vector of correction terms for transition 1
  – v_{t2} is the vector of correction terms for transition 2
  – b is a scalar offset

The Model
• Parameters are estimated by solving a minimization problem [objective function not recovered from the slide], where v(f(x_it), y_it) is a loss function.
  If tasks are thought to be highly related, set the λ2/λ1 ratio to be large.

Model Interpretation
• Interpretation:
  – w_j0: weight of the jth gene set in the baseline model. Gene sets for which |w_j0| is largest are relevant in {n→e→m}.
  – v_jt: weight of the jth gene set in state progression t. Gene sets for which |v_j1| is large are relevant in {n→e} and gene sets for which |v_j2| is large are relevant in {e→m}.

Prostate Cancer
• Gene expression profile of 22 benign epithelium samples (b), 32 primary prostate cancer samples (p), and 17 metastatic prostate cancer samples (m). Tomlins, 2007
• Progression {b→p→m}
[Figure: panels for w0, v1, and v2.]

Results
• Categorized results by “Hallmarks of Cancer” – Hanahan, 2000
  – Self sufficiency in growth signals
  – Insensitivity to anti-growth signals
  – Evasion of apoptosis
  – Limitless replicative potential
  – Angiogenesis
  – Invasion and metastasis

Results
• Self sufficiency in growth signals
  – Cell cycle gene sets
  – ErbB4, EGF, Sprouty, ERK

Results
• Evidence for insensitivity to anti-growth signals:
  – PTEN down-regulation
  – PTDINS up-regulation
• Evasion of apoptosis:
  – IGF1R up-regulation
  – ROS down-regulation
• Energy production
  – Glycolysis gene set up-regulation
  – ATP synthesis gene set up-regulation
  – Oxidative phosphorylation up-regulation

Novel Findings
• Took previous analysis a step further by discovering the specific pathways implicated in tumorigenesis.
  – Previous work identified single genes which were relevant in progression and grouped them together to form important concepts.
    (Tomlins 2007)
• Currently little is known about ErbB4 deregulation in PCA
  – EGF receptors have been implicated in several tumor types: stomach, brain, breast.
  – ErbB2/HER2 has been shown to be overexpressed in prostate cancer

Objective 2: Pathway dependency structure
• Infer a pathway interaction network for each stage of progression using learning gradients and inverse regression.
• Provide knowledge on how certain pathways relate, interact, and influence one another with respect to phenotype.

Objective 2
• Standard regression methods show which gene sets are correlated with class labels but do not provide information on the co-variation of gene sets correlated with class labels.
• Estimate covariance of inverse regression C=cov(X|Y)
  – Input: matrix of enrichment scores (X) and class labels (Y)
  – Output: covariance matrix C=cov(X|Y)
• Diagonal elements measure the relevance of the i-th gene set with respect to change in label.
• The ij-th off-diagonal element measures the dependence between gene sets i and j.
• Relationships will be visualized in graphical models.

Objective 2
• Analysis can identify pathways that are closely associated throughout progression:
  – IGF1R and ERK are linked through their association with RAS. ERK ranks 9th out of 522 gene sets based on the covariance with the IGF1R pathway.
  – PTDINS ranks 15th based on the covariance with the PTEN gene set
  – IGF1R ranks 32nd based on the covariance with PTDINS

Objective 2
[Figure: two panels, A and B.]
• A: Dependency structure of the 10 gene sets most relevant in the benign to prostate cancer transition
• B: Extended dependency structure

Objective 3: Refinement
• Gene sets available are not always in the right context for a specific data set.
• The refinement procedure adapts the gene set to the context of the data set. It shows which genes are dependent on each other and whether there is substructure in the gene set.
• Cluster genes in the gene set based on their covariance: C=cov(X|Y)
  – X = gene expression values of genes in the gene set
  – Y = class labels
• A gene network modeling the interdependence of the genes in the refined gene set is inferred.

Gene Set Refinement
[Figure: two panels, A and B.]
• The genes of BioCarta's ERK pathway
• Refine the pathway to those genes most relevant for this data set.
• A and B differ in threshold values

Melanoma Progression
• Gene expression profile of 4 normal skin samples (n), 4 primary melanoma samples (p), and 4 metastatic melanoma samples (m). Smith, 2005.
• Progression {n→p→m}
[Figure: panels for w0, v1, and v2.]

Melanoma Results
• Self-sufficiency of growth
  – AKT up-regulation throughout progression
  – PTDINS up-regulation throughout progression
• Escape from apoptosis
  – IGF1R up-regulation in the late transition
  – p53 down-regulation throughout progression
• Limitless replicative potential
  – HTERT up-regulation in the early transition
• Angiogenesis
  – HIF up-regulation throughout progression
  – Angiogenesis gene set up-regulation in the early transition
• Invasion and Metastasis
  – CDC42RAC up-regulation throughout progression
  – MTA3 down-regulation in the early transition

Validation
• Gene expression profile of 9 samples of benign nevi, 6 samples of primary melanoma, and 19 samples of metastatic melanoma (Haqq 2005)
[Figure: panels for w0, v1, and v2.]
• Both analyses found:
  – p53 gene set down-regulation
  – D4-GDI pathway over-expression
  – HTERT gene set over-expression
  – CDC42RAC pathway over-expression

Pathway Dependencies
[Figure: two panels, A and B.]
• A: Dependency structure of the top 10 gene sets most relevant in the normal skin to primary melanoma transition
• B: Extended dependency structure

Sterol Biosynthesis
• Sterol biosynthesis gene set is highly connected
• Tumor cells often have sterol synthesis deficiencies
• One component of the sterol biosynthesis pathway is the mevalonate pathway.
• Many tumor cells cannot synthesize mevalonate, so they obtain it from the host

Pathway Dependencies
• Interdependence with the sterol biosynthesis gene set, out of 523 gene sets:
  – Fatty acid synthesis ranks 14th
  – Cyanoamino acid metabolism ranks 19th
  – Gamma hexachlorocyclohexane ranks 3rd
• All are closely tied to the inability of a tumor to synthesize certain metabolites and its increasing need for these metabolites as it grows and develops.

Colon Cancer Example
• Multitask learning can be applied to data sets with more than 3 classes (2 tasks).
• Colon cancer gene expression profile: 32 normal, 32 adenoma, 35 stage 1 carcinoma, 82 stage 2 carcinoma, 70 stage 3 carcinoma, and 43 stage 4 carcinoma. Vogelstein, 1990

Future
• Expand analyses to datasets with more than 3 classes
  – Prostate cancer: benign, PIN, PCA low, PCA high, metastatic
  – Colon cancer: normal, adenoma, carcinomas stage 1-4
• Gene set expansion
  – After refining the gene sets, find genes outside of the set with strong dependencies to the core genes in the gene set

Acknowledgements
• Sayan Mukherjee
• Phillip Febbo
• Joe Nevins
• Ashley Chi
• Justin Guinney