Emergent Biology Through Integration and Mining
Of Microarray Datasets
Lance D. Miller
GIS Microarray & Expression Genomics
FOCUS:
Mining of expression data to understand
the molecular composition of human
cancers and to define components
of the tumor molecular profile
with mechanistic and
clinical importance.
2001, PNAS
Molecular classes are predictive of outcome
[Figure panels: overall survival; relapse-free survival]
70-gene prognosis classifier for predicting risk
of distant metastasis within 5 years
van’t Veer et al.
Sotiriou et al.
Though each tumor is molecularly unique,
there exist common transcriptional cassettes
that underlie the biological and clinical properties
of tumors and that may be of diagnostic,
prognostic and therapeutic significance.
The GIS Perpetual Array Platform
Integration of Independent Datasets
Perou et al., 1999
Sorlie et al., 2001
West et al., 2001
Meta-Analysis of Breast Cancer Datasets
(Adaikalavan Ramasamy et al.)
dataset | source | sample size | array format
1. Miller-Liu | unpublished | 61 tumors: 39 ER+, 22 ER- | 19K spotted oligo
2. Sotiriou-Liu | submitted: PNAS | 99 tumors: 34 ER+, 65 ER- | 7.6K spotted cDNA
3. Gruvberger-Meltzer | Cancer Research | 47 tumors: 23 ER+, 24 ER- | 6.7K spotted cDNA
4. Sorlie-Borresen-Dale | PNAS | 74 tumors: 56 ER+, 18 ER- | 8.1K spotted cDNA
5. van’t Veer-Friend | Nature | 98 tumors: 59 ER+, 39 ER- | 25K spotted oligo
6. West-Nevins | PNAS | 49 tumors: 25 ER+, 24 ER- | 7.1K Affymetrix
total: 428 tumors, ~73,500 probes
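A quick tally checks the totals row and also gives the pooled ER+/ER- split, which the slide does not state. This is a minimal sketch: the `Dataset` class and `summarize` function are hypothetical names, the rows are transcribed from the table, and the probe counts are the nominal array sizes, so the total counts probes across arrays before any mapping to genes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    er_pos: int   # number of ER+ tumors
    er_neg: int   # number of ER- tumors
    probes: int   # nominal array size (19K -> 19000, etc.)

    @property
    def tumors(self) -> int:
        return self.er_pos + self.er_neg


# Rows transcribed from the dataset table (array sizes as rounded on the slide).
DATASETS = [
    Dataset("Miller-Liu", 39, 22, 19_000),
    Dataset("Sotiriou-Liu", 34, 65, 7_600),
    Dataset("Gruvberger-Meltzer", 23, 24, 6_700),
    Dataset("Sorlie-Borresen-Dale", 56, 18, 8_100),
    Dataset("van't Veer-Friend", 59, 39, 25_000),
    Dataset("West-Nevins", 25, 24, 7_100),
]


def summarize(datasets):
    """Return (tumors, ER+ tumors, ER- tumors, probes) summed over all datasets."""
    return (
        sum(d.tumors for d in datasets),
        sum(d.er_pos for d in datasets),
        sum(d.er_neg for d in datasets),
        sum(d.probes for d in datasets),
    )


print(summarize(DATASETS))  # (428, 236, 192, 73500)
```

The pooled set is slightly ER+-heavy (236 vs 192), although individual datasets such as Sotiriou-Liu lean ER-.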
META MADB: The Construct
Building the Matrix
1. Extract and Format the Data
2. Link sample/probe info via unique keys
3. Log Transform and Normalize
4. Filter Genes and Arrays
5. Apply Statistical Tests
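Steps 3 to 5 can be sketched for a single dataset held as `{probe: {sample: intensity}}`. This is an illustration under stated assumptions, not the META MADB code: median-centering each array stands in for "normalize", a missing-value cutoff stands in for "filter", and Welch's t statistic stands in for "apply statistical tests" (turning t into a p value would additionally need the t distribution or a permutation test). All function names, the 20% missing-value cutoff, and the example intensities are hypothetical.

```python
import math
from statistics import mean, median, variance


def log2_transform(matrix):
    """Log2 of every positive value; missing (None) or non-positive values become None."""
    return {
        probe: {s: math.log2(v) if v is not None and v > 0 else None for s, v in row.items()}
        for probe, row in matrix.items()
    }


def median_center_arrays(matrix):
    """Normalize by subtracting each array's (column's) median from its values."""
    samples = {s for row in matrix.values() for s in row}
    medians = {}
    for s in samples:
        vals = [row[s] for row in matrix.values() if row.get(s) is not None]
        medians[s] = median(vals) if vals else 0.0
    return {
        probe: {s: v - medians[s] if v is not None else None for s, v in row.items()}
        for probe, row in matrix.items()
    }


def filter_probes(matrix, max_missing=0.2):
    """Drop probes (genes) whose fraction of missing values exceeds max_missing."""
    return {
        probe: row
        for probe, row in matrix.items()
        if row and sum(v is None for v in row.values()) / len(row) <= max_missing
    }


def filter_arrays(matrix, max_missing=0.2):
    """Drop arrays (samples) whose fraction of missing values exceeds max_missing."""
    samples = {s for row in matrix.values() for s in row}
    bad = {
        s for s in samples
        if sum(row.get(s) is None for row in matrix.values()) / len(matrix) > max_missing
    }
    return {p: {s: v for s, v in row.items() if s not in bad} for p, row in matrix.items()}


def welch_t(a, b):
    """Welch's two-sample t statistic (nan if both groups have zero variance)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else float("nan")


def er_t_stats(matrix, er_status):
    """Per-probe t statistic, ER+ versus ER- samples, skipping missing values."""
    stats = {}
    for probe, row in matrix.items():
        pos = [v for s, v in row.items() if v is not None and er_status[s] == "+"]
        neg = [v for s, v in row.items() if v is not None and er_status[s] == "-"]
        if len(pos) >= 2 and len(neg) >= 2:
            stats[probe] = welch_t(pos, neg)
    return stats


raw = {
    "p1": {"T1": 800.0, "T2": 900.0, "T3": 100.0, "T4": 120.0},
    "p2": {"T1": 50.0, "T2": None, "T3": 60.0, "T4": 55.0},
}
er = {"T1": "+", "T2": "+", "T3": "-", "T4": "-"}
matrix = filter_arrays(filter_probes(median_center_arrays(log2_transform(raw))))
print(er_t_stats(matrix, er))
```

Probes are filtered before arrays here so that one sparse probe (`p2`) does not cause a whole array to be discarded; the opposite order is an equally defensible choice.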
Creating a Universe
1. Apply UniGene ID as Unifying Key
2. Remove Gene Redundancy
3. Extract p values, d values, z-scores
4. Set p value threshold
5. Merge Datasets
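The "Creating a Universe" steps amount to re-keying each dataset from its own probe IDs to UniGene IDs and then joining on that key. The sketch below assumes two rules the slide does not spell out: redundant probes for one UniGene cluster are collapsed by keeping the probe with the smallest p value, and a gene enters the merged universe if it passes the threshold in at least one dataset. `ProbeStat`, `collapse_to_unigene`, `merge_datasets` and the `Hs.*` IDs in the example are hypothetical; z-scores are left out for brevity.

```python
from typing import NamedTuple


class ProbeStat(NamedTuple):
    unigene: str  # UniGene cluster ID ("" if the probe has none)
    p: float      # p value from the per-dataset test
    d: float      # d value from the per-dataset comparison


def collapse_to_unigene(probe_stats):
    """Keep one record per UniGene ID: the probe with the smallest p (an assumed rule)."""
    best = {}
    for stat in probe_stats.values():
        if not stat.unigene:
            continue  # unmapped probes cannot be linked across datasets
        current = best.get(stat.unigene)
        if current is None or stat.p < current.p:
            best[stat.unigene] = stat
    return best


def merge_datasets(datasets, p_threshold=0.001):
    """Merge per-dataset tables on UniGene ID.

    datasets: {dataset_name: {probe_id: ProbeStat}}
    Returns {unigene: {dataset_name: (p, d)}} for genes with p < p_threshold
    in at least one dataset.
    """
    universe = {}
    for name, probe_stats in datasets.items():
        for ug, stat in collapse_to_unigene(probe_stats).items():
            universe.setdefault(ug, {})[name] = (stat.p, stat.d)
    return {
        ug: per_dataset
        for ug, per_dataset in universe.items()
        if any(p < p_threshold for p, _ in per_dataset.values())
    }


example = {
    "A": {"a1": ProbeStat("Hs.1", 0.0005, 2.1), "a2": ProbeStat("Hs.1", 0.01, 1.5),
          "a3": ProbeStat("Hs.2", 0.2, 1.1)},
    "B": {"b1": ProbeStat("Hs.1", 0.03, 1.8), "b2": ProbeStat("", 0.0001, 3.0)},
}
print(merge_datasets(example))
```

Keeping every dataset's (p, d) pair for a gene that passes anywhere preserves the information needed for later cross-dataset comparison, rather than discarding non-significant measurements.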
META MADB
d values (difference of average expression)
ER+: T1 T2 T3 T4 T5 … Tn | ER-: T1 T2 T3 T4 T5 … Tn
gene1: e1 e2 e3 e4 e5 … en | e1 e2 e3 e4 e5 … en
$$ d = \frac{\overline{e}_{\mathrm{ER+}}}{\overline{e}_{\mathrm{ER-}}} $$
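Computing d for each gene row of the matrix is a one-liner per gene. This sketch follows the slide's formula literally (mean expression over the ER+ tumors divided by mean expression over the ER- tumors). Because the caption describes d as a *difference* of averages and the matrix is log-transformed in step 3, a hypothetical `log_scale` flag returns the difference of the means instead; which form META MADB actually stores is an assumption here. The function names and example values are illustrative only.

```python
from statistics import mean


def d_value(expr, er_status, log_scale=False):
    """d for one gene: mean ER+ expression over mean ER- expression.

    With log_scale=True the means are subtracted instead, the usual
    counterpart of a ratio when values are already log-transformed.
    """
    pos = [e for sample, e in expr.items() if er_status[sample] == "+"]
    neg = [e for sample, e in expr.items() if er_status[sample] == "-"]
    return mean(pos) - mean(neg) if log_scale else mean(pos) / mean(neg)


def d_values(matrix, er_status, log_scale=False):
    """d for every gene in {gene: {sample: expression}}."""
    return {gene: d_value(row, er_status, log_scale) for gene, row in matrix.items()}


er = {"T1": "+", "T2": "+", "T3": "-", "T4": "-"}
matrix = {"gene1": {"T1": 4.0, "T2": 6.0, "T3": 2.0, "T4": 3.0}}
print(d_values(matrix, er))                  # {'gene1': 2.0}
print(d_values(matrix, er, log_scale=True))  # {'gene1': 2.5}
```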
Identifying Grade-Specific Genes
in Hepatocellular Carcinoma
Pre-neoplastic lesions: adenomatous hyperplasia, ordinary (OAH) and atypical (AAH)
HCC progression: HCC grade 1, 2, 3 (G1, G2, G3)
• Sample: 10 cases of each class
• Sample collection: HBV(+)
• Array: Human 19K oligonucleotide array
• Analysis: 50 arrays
Breast Cancer Grade-Associated Genes as
Predictors of HCC Grade?
[Figure panels: BC, HCC]
Estrogen-Responsive Genes in vitro (Chin-Yo Lin)
UG Description | Fold Change | T47D / MCF-7 / ZR75-1 / SAGE / ERE (-2)
Interleukin 6 signal transducer (gp130, oncostatin M receptor) | 2.5 | ++ +
Insulin-like growth factor binding protein 4 | 2.1 | + + + +
Seven in absentia homolog 2 (Drosophila) | 1.7 | + +
Matrix metalloproteinase 7 (matrilysin, uterine) | -1.7 | ++ +
Stanniocalcin 2 | 5.0 | ++ + + ++
Nuclear receptor interacting protein 1/RIP140 | 1.6 | + + +
GREB1 protein | 3.1 | +
Serum-inducible kinase | -2.0 | + +
Amphiregulin | 3.9 | ++ +
CD7 antigen (p41) | -2.5 | + +
Duodenal cytochrome | -2.1 | + +
Thrombospondin 1 | 2.4 | + +
Putative transmembrane protein | -3.8 | + + +++
Stromal cell-derived factor 1 | 3.8 | ++ ++
Retinoblastoma binding protein 8 | 2.2 | ++ + + ++
Janus kinase 1 (a protein tyrosine kinase) | 4.9 | ++ ++
Protein kinase H11 | 1.5 |
Olfactomedin 1 | 3.0 | ++
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 10 (RNA helicase) | 2.3 | + +
Hypothetical protein similar to mouse Dnajl1 | 2.5 | + +++
Putative protein kinase | 1.7 2.5 | +
UDP-Gal:betaGlcNAc beta 1,4- galactosyltransferase, polypeptide 1 | 3.7 | + + ++
Hypothetical protein FLJ14299/Similar to nocA zinc-finger protein | 2.5 | ++
Immunoglobulin superfamily, member 4 | 2.2 | + ++
Cyclin G2 | -2.6 | ++ +
Sialyltransferase 1 (beta-galactoside alpha-2,6-sialyltransferase) | -2.0 | +
Chitobiase, di-N-acetyl- | 1.9 | ++
Arachidonate 12-lipoxygenase, 12R type | -4.0 | ++ +
Purinergic receptor (family A group 5) | -2.3 | +
G protein-coupled receptor kinase 7/Binds ERbeta | -1.8 | + +
Estrogen-Responsive in vitro and ER Status-Associated in vivo
(p<0.001)
UG Description | Fold Change | T47D / MCF-7 / ZR75-1 / SAGE / ERE (-2)
Interleukin 6 signal transducer (gp130, oncostatin M receptor) | 2.5 | ++ +
Insulin-like growth factor binding protein 4 | 2.1 | + + + +
Seven in absentia homolog 2 (Drosophila) | 1.7 | + +
Matrix metalloproteinase 7 (matrilysin, uterine) | -1.7 | ++ +
Stanniocalcin 2 | 5.0 | ++ + + ++
Nuclear receptor interacting protein 1/RIP140 | 1.6 | + + +
GREB1 protein | 3.1 | +
Serum-inducible kinase | -2.0 | + +
Amphiregulin | 3.9 | ++ +
CD7 antigen (p41) | -2.5 | + +
[Figure panels: E2; E2 + ICI; E2 + CHX (lanes 1-6)]
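The selection on this slide is an intersection: genes that respond to estrogen in the cell-line experiments and whose expression also differs between ER+ and ER- tumors at p<0.001. A minimal sketch, assuming each side has already been reduced to a per-gene table keyed by a shared identifier (for example the UniGene key used in META MADB). The function name and every key and value in the example are hypothetical, not data from the study.

```python
def er_linked_responsive_genes(in_vitro_fold, in_vivo_p, p_threshold=0.001):
    """Genes that are estrogen-responsive in vitro and ER-status-associated in vivo.

    in_vitro_fold: {gene: signed fold change in the in vitro estrogen experiments}
    in_vivo_p:     {gene: p value for the ER+ versus ER- tumor comparison}
    """
    shared = in_vitro_fold.keys() & in_vivo_p.keys()
    return sorted(g for g in shared if in_vivo_p[g] < p_threshold)


# Hypothetical keys and values, for illustration only.
in_vitro = {"geneA": 5.0, "geneB": -2.5, "geneC": 1.6}
in_vivo = {"geneA": 1e-6, "geneB": 0.0004, "geneC": 0.02, "geneD": 1e-9}
print(er_linked_responsive_genes(in_vitro, in_vivo))  # ['geneA', 'geneB']
```

A gene strongly associated with ER status in tumors but absent from the in vitro list (`geneD` above) is excluded, which is what restricts the result to candidates with experimental support for estrogen regulation.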
Identifying Cancer-Linked Genes
in Epithelial Adenocarcinomas
Datasets: 3 gastric, 3 prostate, 2 liver, 1 lung
selection at p<0.001
242 Genes that Distinguish Tumor from Normal
at p<0.001 in at least 3 of the 4 Tumor Types
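The "at least 3 of the 4 tumor types" rule is a vote count across tumor types. This sketch assumes that each tumor type has already been reduced to one tumor-versus-normal p value per gene; how the three gastric, three prostate and two liver datasets were combined within a type is not stated on the slide. The function name and example values are hypothetical.

```python
from collections import Counter


def genes_in_k_types(p_by_type, p_threshold=0.001, min_types=3):
    """Genes with p < p_threshold in at least min_types tumor types.

    p_by_type: {tumor_type: {gene: p value for tumor versus normal}}
    """
    hits = Counter()
    for p_table in p_by_type.values():
        for gene, p in p_table.items():
            if p < p_threshold:
                hits[gene] += 1
    return sorted(gene for gene, n in hits.items() if n >= min_types)


p_by_type = {
    "gastric": {"g1": 1e-4, "g2": 1e-4, "g3": 0.5},
    "prostate": {"g1": 1e-5, "g2": 0.01},
    "liver": {"g1": 1e-4, "g3": 1e-6},
    "lung": {"g2": 1e-4, "g3": 1e-4},
}
print(genes_in_k_types(p_by_type))  # ['g1']
```

Requiring agreement across several tumor types trades sensitivity for robustness: a gene that is highly significant in only one or two types is not counted.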
Summary
An Integrated Database for Pan-Cancer
Meta-Analysis of Gene Expression Data
database components:
internal and external datasets derived from:
- tumor studies (clinical samples)
- in vitro, pathway studies (e.g., timecourse)
- SAGE data
- mouse studies (in vitro/in vivo)
Future Directions
• Derive expression signatures for all major factors known or suspected to have prognostic value
• Determine the reliability of expression signatures in outcome prediction
• Expand the integrated database for pan-cancer meta-analysis
• Integrate expression profiling into clinical decision making
Acknowledgements
GIS
Adai Ramasamy
Liza Vergara
Phil Long
Chin-Yo Lin
Benjamin Mow
Catholic University of Korea
Suk-Woo Nam
Jung Yong Lee