Knowledge Integration for Gene Target Selection
Graciela Gonzalez, PhD
Juan C. Uribe
Contact: [email protected]
GeneRanker in a Nutshell
• Integration of knowledge from
– biomedical literature,
– curated PPI databases, and
– protein network topology
• Seeks to prioritize lists of genes by their association with specific diseases and phenotypes [1].
• Such associations may or may not have been published (thus, this is not simply text mining).
[1] Gonzalez G, Uribe JC, Tari L, Brophy C, Baral C. Mining Gene-Disease relationships from Biomedical Literature: Incorporating Interactions, Connectivity, Confidence, and Context Measures. Pacific Symposium on Biocomputing; 2007; Maui, Hawaii.
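The slide does not say how evidence from the literature and curated PPI databases is combined. As one hedged illustration, assuming each source assigns a confidence in [0, 1] to an interaction, a noisy-OR merge gives a single confidence per protein pair. The function name, source labels, and confidence values below are hypothetical; this is not presented as GeneRanker's actual scheme.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Combine per-source PPI confidences into one confidence per pair.

    `sources` maps a source name to a list of (gene_a, gene_b, confidence)
    tuples, confidence in [0, 1]. Edges are treated as undirected and merged
    with a noisy-OR: 1 - prod(1 - c_i) over all sources reporting the pair.
    """
    # miss[pair] = probability that every piece of evidence for the pair is wrong
    miss = defaultdict(lambda: 1.0)
    for edges in sources.values():
        for a, b, conf in edges:
            pair = tuple(sorted((a, b)))  # undirected: (A, B) == (B, A)
            miss[pair] *= 1.0 - conf
    return {pair: 1.0 - m for pair, m in miss.items()}


if __name__ == "__main__":
    # Hypothetical confidences: curated edges are assumed more reliable.
    sources = {
        "literature": [("TP53", "MDM2", 0.6), ("EGFR", "PTEN", 0.5)],
        "curated_ppi": [("MDM2", "TP53", 0.9)],
    }
    for pair, conf in sorted(merge_interactions(sources).items()):
        print(pair, round(conf, 3))
```

Noisy-OR treats sources as independent witnesses, so a pair reported by both a text-mined source and a curated database ends up more confident than either alone.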
GeneRanker Interface
1. The user types a disease or biological process to be searched.
2. Genes associated with the disease are extracted from the literature.
3. Protein-protein interactions involving those genes are then pulled from the literature and curated sources.
4. The protein network is built and each gene is ranked.
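The four steps above can be sketched end to end on toy data. This is a minimal illustration under loud assumptions: the abstracts, gene lexicon, and interaction set are hypothetical stand-ins for PubMed and curated PPI sources, gene mentions are found by naive token matching, and plain degree centrality substitutes for GeneRanker's actual scoring.

```python
from collections import Counter

# Hypothetical toy inputs standing in for PubMed abstracts and a PPI source.
ABSTRACTS = [
    "EGFR amplification is frequent in glioma.",
    "PTEN loss and EGFR signaling in glioma progression.",
    "TP53 mutations in breast cancer.",
]
KNOWN_GENES = {"EGFR", "PTEN", "TP53", "MDM2", "PIK3CA"}
PPIS = {("EGFR", "PIK3CA"), ("PTEN", "PIK3CA"), ("TP53", "MDM2"),
        ("EGFR", "PTEN"), ("EGFR", "MDM2")}


def genes_for_disease(disease, abstracts, lexicon):
    """Step 2: count genes mentioned in abstracts that also mention the disease."""
    hits = Counter()
    for text in abstracts:
        if disease.lower() in text.lower():
            tokens = {t.strip(".,;()") for t in text.split()}
            hits.update(tokens & lexicon)  # each gene counted once per abstract
    return hits


def interactions_for(genes, ppis):
    """Step 3: keep PPIs that involve at least one seed gene."""
    return {(a, b) for a, b in ppis if a in genes or b in genes}


def rank_genes(seeds, edges):
    """Step 4: build the network and rank nodes by degree (a stand-in score)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    for gene in seeds:  # seeds without interactions still appear, with degree 0
        degree.setdefault(gene, 0)
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    seeds = genes_for_disease("glioma", ABSTRACTS, KNOWN_GENES)  # step 1: the query
    ranking = rank_genes(seeds, interactions_for(seeds, PPIS))
    print(ranking)  # prints [('EGFR', 3), ('PIK3CA', 2), ('PTEN', 2), ('MDM2', 1)]
```

In this toy run PIK3CA never co-occurs with glioma in the abstracts yet ranks second through its interactions, which mirrors the point that a gene's association need not have been published.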
GeneRanker Interface
Collaboration: application of GeneRanker to a biological context, with Dr. Michael Berens, Director of the Brain Tumor Unit at the Translational Genomics Research Institute (TGen).
GeneRanker is available as an online
application at http://www.generanker.org.
• Each gene is scored and can be annotated with its supporting evidence (co-occurrence count and statistical representation).
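The slide does not define the "statistical representation" shown for each gene. One common choice, assumed here, is an over-representation test: given co-occurrence counts over a corpus of abstracts, a one-sided hypergeometric p-value measures how surprising the observed overlap is. The function name and the counts in the example are hypothetical.

```python
from math import comb


def cooccurrence_pvalue(total, with_gene, with_disease, both):
    """One-sided hypergeometric p-value P(X >= both).

    total        -- abstracts in the corpus
    with_gene    -- abstracts mentioning the gene
    with_disease -- abstracts mentioning the disease
    both         -- abstracts mentioning both (observed co-occurrence count)
    """
    upper = min(with_gene, with_disease)
    # Ways to draw the disease abstracts so that exactly x mention the gene,
    # summed over x >= both; comb() returns 0 for impossible draws.
    tail = sum(
        comb(with_gene, x) * comb(total - with_gene, with_disease - x)
        for x in range(both, upper + 1)
    )
    return tail / comb(total, with_disease)


if __name__ == "__main__":
    # Hypothetical counts for one gene-disease pair (about 6 expected by chance).
    p = cooccurrence_pvalue(total=10_000, with_gene=200, with_disease=300, both=30)
    print(f"co-occurrences: 30, p = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids underflow in the intermediate terms; for corpora the size of PubMed a log-space or library implementation would be faster.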
Evaluation of GeneRanker
[Figure: Mining genes related to glioma: precision by method. Bars (0% to 100%) for five methods: ranked list (top 50), ranked list (top 100), ranked list (top 200), gene-disease search, and random list. Each bar shows the share of genes that are Related (>10 articles), Possibly Related (1 to 10 articles), or show no evidence of relation or are not genes.]
The contextual (PubMed search-based) approach shows a >20% jump in precision over NLP-based extraction.
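Precision here is the fraction of top-ranked genes judged related to the disease. A minimal sketch of precision at a cutoff k, assuming a set of genes judged relevant (for example, the Related and Possibly Related categories) is available; the function name and gene lists are hypothetical.

```python
def precision_at_k(ranked_genes, relevant, k):
    """Fraction of the top-k ranked genes that appear in `relevant`."""
    top = ranked_genes[:k]
    if not top:
        return 0.0
    hits = sum(1 for gene in top if gene in relevant)
    # Divide by len(top), not k, so a list shorter than k is not penalized.
    return hits / len(top)


if __name__ == "__main__":
    # Hypothetical ranking and relevance judgments.
    ranked = ["EGFR", "PTEN", "GAPDH", "TP53"]
    relevant = {"EGFR", "PTEN", "TP53"}
    for k in (2, 4):
        print(k, precision_at_k(ranked, relevant, k))  # 2 1.0, then 4 0.75
```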
Synthetic network results show AUC > 0.984
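The slide does not describe the synthetic-network protocol. As a sketch of the metric alone, assuming each gene in a synthetic network has a score and a label marking whether it was planted as disease-associated, AUC equals the probability that a randomly chosen positive outscores a randomly chosen negative (the Mann-Whitney formulation, with ties counted as one half). The function name, scores, and labels are hypothetical.

```python
def auc(scores, labels):
    """AUC as P(score of random positive > score of random negative); ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical ranking scores; 1 marks a planted disease gene.
    print(auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic and meant only to make the definition explicit; a sort-based rank computation gives the same value in O(n log n).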
Empirical validation against a glioma dataset shows consistent results (118 vs. 22 differentially expressed probes for genes from the top vs. the bottom of the list).
Complementary Work
• CBioC: www.cbioc.org shows PPIs,
gene-disease, and gene-bioprocess
associations extracted from abstracts
• BANNER: banner.sourceforge.net (presenting a poster on this one). An open-source named entity recognizer, available now.
• Gene normalization: a similar open-source system, soon to be available.