Download The Connectivity Map: using gene-expression signatures

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oncogenomics wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Tay–Sachs disease wikipedia , lookup

Microevolution wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Gene therapy wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Gene expression programming wikipedia , lookup

Fetal origins hypothesis wikipedia , lookup

Gene therapy of the human retina wikipedia , lookup

NEDD9 wikipedia , lookup

Mir-92 microRNA precursor family wikipedia , lookup

Designer baby wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Genome (book) wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Gene expression profiling wikipedia , lookup

RNA-Seq wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Public health genomics wikipedia , lookup

Transcript
Integrating Multiple Publically Available Gene Expression Datasets to Predict
Therapeutic Options across the Disease Nosology
Marina Sirota, Annie P. Chiang, Joel Dudley and Atul J. Butte
Results
Introduction
Methods
The goal of translational bioinformatics is to enable the transformation of increasingly
voluminous genomic and biological data into diagnostics and therapeutics for the clinician.
Microarray technology allows us to analyze expression of thousands of genes in a single
experiment quickly and efficiently. Traditionally, comparative microarray analysis has been
used in order to pinpoint genetic abnormalities in a disease of interest. By examining genes
that are upregulated and downregulated in a disease state as opposed to a normal state,
we can create a genetic profile of a disease. In addition, microarrays have been used to
monitor changes in gene expression in response to drug treatments. Combining results of
disease and drug related microarray experiments enables the discovery of possible
functional connections between drugs, genes and diseases through common gene
expression changes.
Data
The Gene Expression Omnibus (GEO) is a publically available gene expression and
molecular abundance repository. It is an online resource for gene expression data
browsing, query and retrieval. The database contains roughly 200,000 microarray
experiments derived from over 100 organisms, addressing a wide range of biological
issues. For the purposes of this project we are primarily interested in the GEO microarray
datasets which allow for a comparative analysis between diseased and normal individuals.
In our analysis we combine data from nearly 70 disease microarray datasets obtained from
GEO and gene expression data from human cell lines treated with roughly 160 drugs or
small molecules. The drug related data was generated by comparing treated and untreated
cancerous cell lines including MCF7 breast cancer cell line, PC3 prostate cancer cell line,
HL60 leukemia and SKMEL5 melanoma cell lines (Lamb et al., 2006).
In a recent study, Lamb et al. present us with a collection of genome-wide transcriptional
expression data from cultured human cells treated with bioactive small molecules. The
study consists of 453 experiments with different dosages of 164 compound perturbagens
and corresponding vehicle controls. The selected compounds include several FDA
approved drugs as well as some nondrug bioactive compounds chosen to represent a
broad range of effects. The authors create 11 disease signatures manually by examining
the relevant literature and study associations between drugs, molecular compounds such
as HDAC inhibitors and disease states such as diet-induced obesity and Alzheimer’s
disease. (Lamb et al., 2006)
Generating Disease Signatures
We start by choosing a single most representative disease dataset with a corresponding
control in GEO. We mine GEO for disease related experiments by making use of
annotations relating GEO experiments with PUBMED identifiers representing the
publication in which each experiment was published (Butte et al., 2006). We require that
each of the datasets contains a disease and control experiment. We carry out Significance
Analysis of Microarrays (SAM) (Tusher et al., 2001) on every control-disease pairing to
generate a list of upregulated and downregulated genes for each disease state. Using a
0.05 significance cutoff on the q-values from the SAM analysis we generate a signature
profile of significantly up and down regulated genes for each disease of interest.
Drug Expression Data
Disease Signatures
GeneID
Affy Probes
Experiments
In this work, we recreate and extend the drug-disease “connectivity map” using publically
available disease related gene expression data obtained from the Gene Expression
Omnibus (GEO). We automate the process of creating disease signatures using publically
available data. We extend the original set of 11 signatures to examine nearly 70 diseases
and predict possible therapeutics based on the drug-disease connectivity scores. We
validate our findings using the known drug disease associations from the Micromedex
database.
DOWN
GeneID
Affy Probe Gene ID
Affy Probe
1026
2517
4501
10202
2745
4501
1191
1191
7846
2745
8339
10202
8339
1191
202284_s_at
202838_at
204326_x_at
206463_s_at
206662_at
208581_x_at
208791_at
208792_s_at
209118_s_at
209276_s_at
210387_at
214079_at
215779_s_at
222043_at
201305_x_at
201306_s_at
202589_at
202613_at
208974_x_at
208975_s_at
213507_s_at
213573_at
213803_at
217027_x_at
217684_at
UP
Up Down
The Connectivity Map Concept. Gene-expression profiles derived from the treatment of cultured human cells with a
large number of perturbagens populate a reference database. Gene-expression signatures represent any induced or
organic cell state of interest (left). Pattern-matching algorithms score each reference profile for the direction and
strength of enrichment with the query signature (center). Perturbagens are ranked by this ‘‘connectivity score’’; those at
the top (‘‘positive’’) and bottom (‘‘negative’’) are functionally connected with the query state (right) through the
transitory feature of common gene-expression changes. (Lamb et al., 2006)
UP
Results Summary. A heatmap visualizing connectivity scores of 164 drugs and compounds for each of the 66 diseases.
Red indicates a score of -1 suggesting the drug is a possible treatment for the disease. Green indicates a score of +1
suggesting a possible adverse reaction or cause for the disease.
1026
2517
4501
10202
2745
1191
7846
8339
10541
10541
7298
1503
3837
3837
3837
3837
3837
3837
7298
DOWN
10541
7298
1503
3837
Data Processing and Analysis. In order to be able to analyze data across multiple experiments from different platforms
we need to standardize the gene identifiers from chip probe ids to NCBI GeneID (Chen et al., 2007).
Computing Enrichment Scores
For each treatment-disease pair we compute an enrichment score for the probe sets
representing the up or down regulated signature genes separately using a rank-based
Kolmogorov Smirnov statistic (Lamb et al., 2006). The scores from up and down regulated
genes are combined into a single connectivity score for each drug disease combination.
Validation. We validate our findings by querying Micromedex for known drug disease associations, namely FDA approved
treatments and known adverse effects. Above is the result for breast cancer. The green circles indicate FDA approved
treatments for breast cancer and red circles indicate the drugs that have been recorded as having an adverse effect in
patients with breast cancer. The treatment vs. adverse effect distributions are significantly different (p-value = 0.008).
References
1. Butte AJ, Chen R. Finding Disease-Related Genomic Experiments Within an International
Repository: First Steps in Translational Bioinformatics. AMIA Annu Symp Proc. 2006;
2006: 106–110.
2. Chen R, Butte AJ. AILUN: reannotating gene expression data automatically. Nature
Methods 2007; 4:879.
3. Lamb J, Crawford ED, Peck D, et al. The Connectivity Map: using gene-expression
signatures to connect small molecules, genes, and disease. Science 2006; 313:1929-35.
4. Tusher V, et al. Significance Analysis of Microarrays Applied to the Ionizing Radiation
Response. Proceedings of the National Academy of Science 2001; 98:5116-5121.