UTHSC BINF
April 24, 2008

Literature Mining Tools for Analysis of Genomic Data
Ramin Homayouni, Ph.D.
Associate Professor of Biology
Director of Bioinformatics

Gene Expression Profiling
Now What?
Alizadeh, et al., (2000) Nature 403:503.

Useful Links for Functional Analysis
Databases:
– GO: http://www.geneontology.org/
– MeSH: http://www.nlm.nih.gov/mesh/meshhome.html
– MEDLINE: http://www.ncbi.nlm.nih.gov/entrez/
– GEO: http://www.ncbi.nlm.nih.gov/projects/geo/
Programs:
– GOTM (GO): http://genereg.ornl.gov/gotm/
– PubGene (MEDLINE): http://www.pubgene.org/
– Chilibot (MEDLINE): http://www.chilibot.net/
– Arrowsmith (MEDLINE): http://arrowsmith.psych.uic.edu/
– PubMatrix (MEDLINE): http://pubmatrix.grc.nia.nih.gov/
– TXTGate (MEDLINE): http://www.esat.kuleuven.ac.be/txtgate/
– iHOP (MEDLINE): http://www.ihop-net.org/UniPub/iHOP/
– STRING (MEDLINE): http://string.embl.de/

Gene Ontology Consortium
http://www.geneontology.org/
• A controlled vocabulary applied to genes in a variety of organisms; updated every 30 minutes!
• Established in 1998 as a collaboration between:
– FlyBase (Drosophila)
– Saccharomyces Genome Database (SGD)
– Mouse Genome Database (MGD)
• Three main classifications:
– Molecular Function (7385 terms)
– Biological Process (8822 terms)
– Cellular Component (1430 terms)

GO Tree Machine (GOTM) from WebGestalt
Bing Zhang & Jay Snoddy, Vanderbilt University
Zhang et al., BMC Bioinformatics. 2004 Feb 18;5(1):16.
http://genereg.ornl.gov/gotm/

GO Tree Machine Demo
GOTM: http://bioinfo.vanderbilt.edu/webgestalt/

GO Tree Machine -- Example

Problems with Gene Ontology, or any other manual indexing approach
[Figure panel (C); gene labels: EGFR, ERBB2, TRP53, TGFB1]
• The vocabulary is general
• Not comprehensive, therefore biased toward well-studied genes
• Human error: ~66% consistency between professional indexers!
GO Classification
[Figure: GO classification of eight genes (EGFR, ERBB2, TRP53, TGFB1, RELN, DAB1, VLDLR, LRP8) against GO terms including cell growth maintenance, cell proliferation, cell cycle, development, morphogenesis, organogenesis, protein phosphorylation, cell communication, enzyme-linked signal transduction, protein nuclear import, protein transport, cell growth, and stress response]
• The vocabulary is limited (figure labels: DAB1, RELN, LRP8, VLDLR)

Products of the National Library of Medicine (NLM) & National Center for Biotechnology Information (NCBI)
Databases
– GenBank, UniGene, LocusLink (Gene)
– MEDLINE
– OMIM
Services
– HealthSTAR
– Health Services Research Projects in Progress
– HSTAT
Vocabulary
– Medical Subject Headings (MeSH)
– NLM Classification
– Unified Medical Language System (UMLS)

MEDLINE
• MEDLINE is the premier bibliographic database for biomedicine, supported by the National Library of Medicine
• MEDLINE contains approximately 18 million references, most of which have abstracts
• MEDLINE covers over 4800 journals in over 30 languages
• MEDLINE citations date back to 1966
• Free abstracts!!
Defining Functional Relationships between Genes
[Diagram: genes A, B, C]
Direct Relationship
Gene relationships already known (e.g., A-B or B-C)
• Term co-occurrence
• Gene symbol: PubGene (Jenssen et al., Nature Genetics 2001 28:21)
• Gene names (synonyms and aliases)
– biochemical
Indirect Relationship
Gene relationships unknown (e.g., A-C)

Reelin Signaling Pathway
[Diagram labels: Reelin, APP, Amyloid plaques, ApoE, VLDLR, ApoER2, fyn, Dab1, p35, Cdk5, pTau]

Gene Document Test Set
Categories: Reeler, Alzheimer Disease, Miscellaneous
Genes: Reln, Dab1, VLDLR, Lrp8, APP, Aplp2, Aplp1, Psen1, Psen2, Lrp1, Mapt, Apoe, A2m, Apbb1, Apba1, Cdk5, Cdk5r, Cdk5r2, Trp53, Fos, Nras, Rasa1, Rab1, Src, Notch1, Dll1, Jag1, Robo1, Ptch, Smo

PubGene Query: Dab1
http://www.pubgene.org/
Jenssen et al., Nat Genet. 2001 May;28(1):21-8.
Mouse (number of co-occurrences): Reln 7; Cdk5r 6; Cdk5 5; Gli2 3; Src 3; Dab2 2; Fyn 2; Sam68 1; Cdkn1a 1; Tbr1 1; Gli 1; Scr 1; Shh 1; cdf 1; Ash 1; Dlgh4 1; p80 1; Lck 1; Emx1 1; Pcdh18 1; Agrn 1; Arg2 1
Human (number of co-occurrences): DAB2 3; GAD1 3; RELN 3; GSN 2; TNFSF5 2; HLA-DQA1 1; BAT2 1; GAD2 1
PubMed Query: Dab1 AND Reln = 16
PubMed Query: Dab1 AND reelin = 152 !

iHOP Query: Dab1
http://www.ihop-net.org/
iHOP Query: Dab1; Sentence Structure
iHOP Query: Dab1; Network building

PubMatrix Demo

iHOP (Information Hyperlink over Proteins)
http://www.ihop-net.org/UniPub/iHOP/

Chilibot
http://www.chilibot.net/
• Extracts term-term relationships from MEDLINE abstracts.
• Differentiates interactive (e.g., stimulation or inhibition) from noninteractive (e.g., homology, coexistence, etc.) relationships.
• Color-codes gene expression values when data are provided.
• Automatically suggests new hypotheses based on the literature.
Chen and Sharp (2004) BMC Bioinformatics 5(1):147.
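
The term co-occurrence idea behind the direct-relationship tools above can be illustrated with a short Python sketch. It is not PubGene's or Chilibot's actual implementation: the function name `cooccurrence_counts`, the tokenizer, and the three toy abstracts are all made up for illustration. It simply counts how many abstracts mention each pair of gene symbols.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, gene_symbols):
    """Count, for every pair of gene symbols, the abstracts mentioning both.

    Hypothetical helper for illustration only. Matching is case-insensitive
    and exact on the symbol, so synonyms and hyphenated symbols are missed.
    """
    by_lower = {s.lower(): s for s in gene_symbols}
    pair_counts = Counter()
    for text in abstracts:
        # Crude tokenizer (an assumption): runs of letters and digits.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        found = sorted(by_lower[t] for t in tokens if t in by_lower)
        # Each abstract contributes at most one count per gene pair.
        for pair in combinations(found, 2):
            pair_counts[pair] += 1
    return pair_counts


# Toy abstracts, invented for this example (not real MEDLINE records).
toy_abstracts = [
    "Dab1 is phosphorylated in response to Reln.",
    "Reln binds VLDLR and signals through Dab1.",
    "Cdk5 phosphorylates Dab1 independently of reelin.",
]
toy_symbols = ["Dab1", "Reln", "VLDLR", "Cdk5"]

counts = cooccurrence_counts(toy_abstracts, toy_symbols)
for (gene_a, gene_b), n in counts.most_common():
    print(gene_a, gene_b, n)
```

The third toy abstract says "reelin" rather than "Reln", so it adds nothing to the Dab1-Reln count. This is the same symbol-versus-name gap that the two PubMed queries above show (16 hits versus 152).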
Chilibot Demo
http://www.chilibot.net/

STRING at EMBL

PubMatrix Demo

STRING
http://string.embl.de/

Vector Space Model: Latent Semantic Indexing
[Diagram: matrix with rows W1, W2, W3, ..., Wx and columns G1, G2, ..., Gx, with entries aij; a query vector with components w1, w2, w3]
aij = lij × gi

50-Gene Document Collection
[Figure: genes grouped under Development, Alzheimer, and Cancer; counts shown in the figure: 5, 3, 11, 16, 15]

Hierarchical Tree
[Figure: cluster labels Development, Cancer, Development, Alzheimer]

Unrooted Tree (Graph)

Semantic Gene Organizer© User Interface

GeneIndexer Software
www.computablegenomix.com
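
The vector space model slide gives only the weighting formula aij = lij × gi. The following Python sketch shows one way such a weighted word-by-gene matrix and a query ranking might be built. Several things here are assumptions, not taken from the slides: lij is read as a local log weight and gi as a global entropy weight (log-entropy weighting), the function names are hypothetical, and the four gene "documents" are invented. The sketch also stops at cosine ranking in the full word space; latent semantic indexing would additionally reduce the matrix to a low rank with a singular value decomposition, which is omitted here.

```python
import math
from collections import Counter


def build_weighted_matrix(gene_docs):
    """Return (weights, global_w); weights[gene][word] = l_ij * g_i.

    Assumed weighting (not stated in the slides):
      l_ij = log2(1 + f_ij)                   local weight of word i in gene j
      g_i  = 1 + sum_j p_ij*log2(p_ij)/log2(n)  global entropy weight of word i
    """
    genes = sorted(gene_docs)
    counts = {g: Counter(gene_docs[g].lower().split()) for g in genes}
    words = sorted({w for c in counts.values() for w in c})
    n = len(genes)
    log_n = math.log2(n) if n > 1 else 1.0
    global_w = {}
    for w in words:
        total = sum(counts[g][w] for g in genes)
        entropy = 0.0
        for g in genes:
            f = counts[g][w]
            if f:
                p = f / total
                entropy += p * math.log2(p)
        global_w[w] = 1.0 + entropy / log_n
    weights = {
        g: {w: math.log2(1 + f) * global_w[w] for w, f in counts[g].items()}
        for g in genes
    }
    return weights, global_w


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_genes(query, weights, global_w):
    """Rank gene documents by cosine similarity to a keyword query."""
    q_counts = Counter(query.lower().split())
    q = {w: math.log2(1 + f) * global_w[w]
         for w, f in q_counts.items() if w in global_w}
    return sorted(((cosine(q, vec), g) for g, vec in weights.items()),
                  reverse=True)


# Invented toy "gene documents"; real ones would be built from abstracts.
toy_docs = {
    "Reln": "reelin signaling cortical neuron migration",
    "Dab1": "reelin signaling adaptor neuron migration",
    "App": "amyloid plaques alzheimer neuron",
    "Trp53": "tumor suppressor cancer apoptosis",
}
weights, global_w = build_weighted_matrix(toy_docs)
for score, gene in rank_genes("reelin migration", weights, global_w):
    print(gene, round(score, 3))
```

Without the rank reduction, a gene document scores above zero only if it shares a word with the query, so App and Trp53 score zero here. Recovering indirect relationships of the A-C kind is what the omitted decomposition step is meant to add.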