Emergent Biology Through Integration and Mining of Microarray Datasets
Lance D. Miller
GIS Microarray & Expression Genomics

FOCUS: Mining of expression data to understand the molecular composition of human cancers and to define components of the tumor molecular profile with mechanistic and clinical importance.

2001, PNAS
Molecular classes are predictive of outcome
Figure panels: overall survival; relapse-free survival

70-gene prognosis classifier for predicting risk of distant metastasis within 5 years
van't Veer et al.; Sotiriou et al.

Though each tumor is molecularly unique, there exist common transcriptional cassettes that underlie biological and clinical properties of tumors and that may be of diagnostic, prognostic and therapeutic significance.

GOAL: Mining of expression data to understand the molecular composition of human cancers and to define components of the tumor molecular profile with mechanistic and clinical importance.

The GIS Perpetual Array Platform

Integration of Independent Datasets
Perou et al., 1999; Sorlie et al., 2001; West et al., 2001

Meta-Analysis of Breast Cancer Datasets (Adaikalavan Ramasamy et al.)

Dataset | Source | Sample size | Array format
1. Miller-Liu | unpublished | 61 tumors: 39 ER+, 22 ER- | 19K spotted oligo
2. Sotiriou-Liu | submitted: PNAS | 99 tumors: 34 ER+, 65 ER- | 7.6K spotted cDNA
3. Gruvberger-Meltzer | Cancer Research | 47 tumors: 23 ER+, 24 ER- | 6.7K spotted cDNA
4. Sorlie-Borresen-Dale | PNAS | 74 tumors: 56 ER+, 18 ER- | 8.1K spotted cDNA
5. van't Veer-Friend | Nature | 98 tumors: 59 ER+, 39 ER- | 25K spotted oligo
6. West-Nevins | PNAS | 49 tumors: 25 ER+, 24 ER- | 7.1K Affymetrix
Total: 428 tumors, ~73,500 probes

META MADB: The Construct

Building the Matrix
1. Extract and format the data
2. Link sample/probe info via unique keys
3. Log transform and normalize
4. Filter genes and arrays
5. Apply statistical tests

Creating a Universe
1. Apply UniGene ID as unifying key
2. Remove gene redundancy
3. Extract p values, d values, z-scores
4. Set p value threshold
5. Merge datasets

META MADB: d values (difference of average expression)

        | ER+                     | ER-
        | T1 T2 T3 T4 T5 … Tn     | T1 T2 T3 T4 T5 … Tn
gene1:  | e1 e2 e3 e4 e5 … en     | e1 e2 e3 e4 e5 … en

d = average e [ER+] / average e [ER-]

Identifying Grade-Specific Genes in Hepatocellular Carcinoma
HCC progression:
- Pre-neoplastic lesions: adenomatous hyperplasia, ordinary (OAH) and atypical (AAH)
- HCC: Grade 1, 2, 3 (G1, G2, G3)
• Sample: 10 cases of each class
• Sample collection: HBV(+)
• Array: Human 19K oligonucleotide array
• Analysis: 50 arrays

Breast Cancer Grade-Associated Genes as Predictors of HCC Grade?
Figure panels: BC; HCC

Estrogen Responsive Genes in vitro (Chin-Yo Lin)
Evidence columns in the source: 2T47D, MCF-7, ZR75-1, SAGE, ERE (-2). Marks are listed in source order.

UG Description | Fold Change | Marks
Interleukin 6 signal transducer (gp130, oncostatin M receptor) | 2.5 | ++ +
Insulin-like growth factor binding protein 4 | 2.1 | + + + +
Seven in absentia homolog 2 (Drosophila) | 1.7 | + +
Matrix metalloproteinase 7 (matrilysin, uterine) | -1.7 | ++ +
Stanniocalcin 2 | 5.0 | ++ + + ++
Nuclear receptor interacting protein 1/RIP140 | 1.6 | + + +
GREB1 protein | 3.1 | +
Serum-inducible kinase | -2.0 | + +
Amphiregulin | 3.9 | ++ +
CD7 antigen (p41) | -2.5 | + +
Duodenal cytochrome | -2.1 | + +
Thrombospondin 1 | 2.4 | + +
Putative transmembrane protein | -3.8 | + + +++
Stromal cell-derived factor 1 | 3.8 | ++ ++
Retinoblastoma binding protein 8 | 2.2 | ++ + + ++
Janus kinase 1 (a protein tyrosine kinase) | 4.9 | ++ ++
Protein kinase H11 | 1.5 |
Olfactomedin 1 | 3.0 | ++
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 10 (RNA helicase) | 2.3 | + +
Hypothetical protein similar to mouse Dnajl1 | 2.5 | + +++
Putative protein kinase | 1.7 2.5 | +
UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 1 | 3.7 | + + ++
Hypothetical protein FLJ14299/Similar to nocA zinc-finger protein | 2.5 | ++
Immunoglobulin superfamily, member 4 | 2.2 | + ++
Cyclin G2 | -2.6 | ++ +
Sialyltransferase 1 (beta-galactoside alpha-2,6-sialyltransferase) | -2.0 | +
Chitobiase, di-N-acetyl- | 1.9 | ++
Arachidonate 12-lipoxygenase, 12R type | -4.0 | ++ +
Purinergic receptor (family A group 5) | -2.3 | +
G protein-coupled receptor kinase 7/Binds ERbeta | -1.8 | + +

Estrogen-Responsive in vitro and ER Status-Associated in vivo (p<0.001)
Figure labels: E2; E2 + ICI; E2 + CHX; 1 2 3 4 5 6

UG Description | Fold Change | Marks
Interleukin 6 signal transducer (gp130, oncostatin M receptor) | 2.5 | ++ +
Insulin-like growth factor binding protein 4 | 2.1 | + + + +
Seven in absentia homolog 2 (Drosophila) | 1.7 | + +
Matrix metalloproteinase 7 (matrilysin, uterine) | -1.7 | ++ +
Stanniocalcin 2 | 5.0 | ++ + + ++
Nuclear receptor interacting protein 1/RIP140 | 1.6 | + + +
GREB1 protein | 3.1 | +
Serum-inducible kinase | -2.0 | + +
Amphiregulin | 3.9 | ++ +
CD7 antigen (p41) | -2.5 | + +

Identifying Cancer-Linked Genes in Epithelial Adenocarcinomas
Datasets: 3 gastric, 3 prostate, 2 liver, 1 lung
Selection at p<0.001
242 genes distinguish tumor from normal at p<0.001 in at least 3 of the 4 tumor types.

Summary
An Integrated Database for Pan-Cancer Meta-Analysis of Gene Expression Data
Database components: internal and external datasets derived from:
- tumor studies (clinical samples)
- in vitro pathway studies (e.g., time course)
- SAGE data
- mouse studies (in vitro/in vivo)

Future Directions
• Derive expression signatures for all major factors known or suspected to have prognostic value
• Determine the reliability of expression signatures in outcome prediction
• Expand the integrated database for pan-cancer meta-analysis
• Integrate expression profiling into clinical decision making

Acknowledgements
GIS: Adai Ramasamy, Liza Vergara, Phil Long, Chin-Yo Lin, Benjamin Mow
Catholic University of Korea: Suk-Woo Nam, Jung Yong Lee
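
Illustrative sketch of the META MADB steps

The "Creating a Universe" steps and the d-value definition from the META MADB slides can be illustrated with a small Python sketch. This is not the GIS implementation; it is a minimal illustration under stated assumptions. The tuple layout `(unigene_id, d, p)`, the function names, the dataset names "A" and "B", and the IDs "Hs.1" and "Hs.2" are all hypothetical. The rule used here to remove gene redundancy (average the d values and keep the smallest p value among probes sharing a UniGene ID) is an assumption, since the slides do not say how redundancy was resolved. The p < 0.001 default mirrors the threshold quoted on the slides.

```python
from collections import defaultdict
from statistics import mean


def d_value(er_pos, er_neg):
    """d = average e [ER+] / average e [ER-], as written on the slide."""
    return mean(er_pos) / mean(er_neg)


def collapse_by_unigene(probes):
    """Remove gene redundancy within one dataset.

    `probes` is a list of (unigene_id, d, p) tuples (hypothetical layout).
    Assumption: probes sharing a UniGene ID are collapsed by averaging
    their d values and keeping the smallest p value.
    """
    grouped = defaultdict(list)
    for uid, d, p in probes:
        grouped[uid].append((d, p))
    return {
        uid: (mean(d for d, _ in vals), min(p for _, p in vals))
        for uid, vals in grouped.items()
    }


def merge_datasets(datasets, p_threshold=0.001):
    """Merge per-dataset results, using the UniGene ID as the unifying key.

    Only genes with p below `p_threshold` in a given dataset are kept.
    Returns {unigene_id: {dataset_name: d}}.
    """
    merged = defaultdict(dict)
    for name, probes in datasets.items():
        for uid, (d, p) in collapse_by_unigene(probes).items():
            if p < p_threshold:
                merged[uid][name] = d
    return dict(merged)


# Toy data only; these values do not come from the talk.
toy = {
    "A": [("Hs.1", 2.0, 0.0001), ("Hs.1", 4.0, 0.01), ("Hs.2", 1.5, 0.2)],
    "B": [("Hs.1", 2.5, 0.0005)],
}
universe = merge_datasets(toy)
```

With the toy input, "Hs.2" is dropped because its p value exceeds the threshold, while "Hs.1" is retained with one d value per dataset. If expression values are log-transformed first (step 3 of "Building the Matrix"), the ratio in `d_value` would typically be replaced by a difference of means, which matches the slide's description of d as a "difference of average expression".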