UTHSC BINF
April 24, 2008
Literature Mining Tools for
Analysis of Genomic Data
Ramin Homayouni, Ph.D.
Associate Professor of Biology
Director of Bioinformatics
Gene Expression Profiling
Now What?
Alizadeh, et al., (2000) Nature 403:503.
Useful Links for Functional Analysis
Databases:
– GO: http://www.geneontology.org/
– MeSH: http://www.nlm.nih.gov/mesh/meshhome.html
– MEDLINE: http://www.ncbi.nlm.nih.gov/entrez/
– GEO: http://www.ncbi.nlm.nih.gov/projects/geo/
Programs:
– GOTM (GO): http://genereg.ornl.gov/gotm/
– PubGene (MEDLINE): http://www.pubgene.org/
– Chilibot (MEDLINE): http://www.chilibot.net/
– Arrowsmith (MEDLINE): http://arrowsmith.psych.uic.edu/
– PubMatrix (MEDLINE): http://pubmatrix.grc.nia.nih.gov/
– TXTGate (MEDLINE): http://www.esat.kuleuven.ac.be/txtgate/
– iHOP (MEDLINE): http://www.ihop-net.org/UniPub/iHOP/
– STRING (MEDLINE): http://string.embl.de/
Gene Ontology Consortium
http://www.geneontology.org/
• A controlled vocabulary applied to genes in a variety of organisms; updated every 30 minutes!
• Established in 1998 as a collaboration among:
– FlyBase (Drosophila)
– Saccharomyces Genome Database (SGD)
– Mouse Genome Database (MGD)
• Three main classifications:
– Molecular Function (7385 terms)
– Biological Process (8822 terms)
– Cellular Component (1430 terms)
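GO terms are arranged hierarchically, and under GO's "true path rule" a gene annotated to a specific term is implicitly annotated to all of that term's more general ancestors. Category counts in a GO tree depend on this propagation. The minimal Python sketch below shows it; the term hierarchy is a hypothetical, simplified fragment (real GO terms carry GO:nnnnnnn identifiers, several relationship types, and often multiple parents), and GENE_A is a placeholder gene, not a real annotation.

```python
# Hypothetical, simplified fragment of a GO-like hierarchy (child -> parents).
PARENTS: dict[str, list[str]] = {
    "protein phosphorylation": ["protein metabolism"],
    "protein metabolism": ["biological process"],
    "cell proliferation": ["cell growth and maintenance"],
    "cell growth and maintenance": ["biological process"],
    "biological process": [],
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(annotations: dict[str, set[str]],
              parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Add all ancestor terms to each gene's direct annotations (true path rule)."""
    full: dict[str, set[str]] = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for t in terms:
            expanded |= ancestors(t, parents)
        full[gene] = expanded
    return full


# GENE_A is a placeholder gene with one hypothetical direct annotation.
print(sorted(propagate({"GENE_A": {"protein phosphorylation"}}, PARENTS)["GENE_A"]))
# ['biological process', 'protein metabolism', 'protein phosphorylation']
```

Because the walk follows every parent link, the same code works unchanged when a term has more than one parent, as real GO terms often do.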
GO Tree Machine (GOTM) from WebGestalt
Bing Zhang & Jay Snoddy, Vanderbilt University
Zhang et al., BMC Bioinformatics. 2004 Feb 18;5(1):16.
http://genereg.ornl.gov/gotm/
GO Tree Machine Demo
GOTM: http://bioinfo.vanderbilt.edu/webgestalt/
GO Tree Machine -- Example
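GOTM compares a user's gene list with a reference gene set to flag GO categories that are overrepresented. The slide does not show which statistic is used; a common choice for this kind of enrichment analysis is the one-sided hypergeometric test, sketched below in Python. The function and argument names are illustrative assumptions, not GOTM's interface.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the reference set
    K: reference genes annotated to the GO category
    n: genes in the user's list (drawn from the reference set)
    k: list genes annotated to the category
    """
    # Sum the probabilities of seeing k, k+1, ..., min(n, K) category genes;
    # comb() returns 0 for impossible draws, so no extra bounds are needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


print(hypergeom_enrichment_p(3, 3, 3, 10))  # 1/120, about 0.0083
```

For example, if all 3 genes in a list fall into a category that has 3 members in a 10-gene reference set, the p-value is 1/120. When many categories are tested at once, the p-values should also be corrected for multiple testing.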
Problems with Gene Ontology, or any other manual indexing approach
(Example genes: EGFR, ERBB2, TRP53, TGFB1)
• The vocabulary is general
• Not comprehensive, and therefore biased toward well-studied genes
• Human error: only ~66% consistency between professional indexers!
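The slide does not say how the ~66% figure was measured. One common measure of inter-indexer consistency is Hooper's measure: the number of terms both indexers assigned, divided by the number of distinct terms assigned by either. The Python sketch below computes it for two hypothetical term sets; treating two empty indexings as fully consistent is a convention chosen here.

```python
def hooper_consistency(terms_a: set[str], terms_b: set[str]) -> float:
    """Hooper's consistency: shared terms / distinct terms assigned by either indexer."""
    common = len(terms_a & terms_b)
    either = len(terms_a) + len(terms_b) - common
    # Convention chosen here: two empty indexings count as fully consistent.
    return common / either if either else 1.0


# Hypothetical term assignments by two indexers for the same gene.
indexer_1 = {"cell proliferation", "signal transduction", "apoptosis"}
indexer_2 = {"cell proliferation", "signal transduction", "cell cycle"}
print(hooper_consistency(indexer_1, indexer_2))  # 0.5
```

Two indexers who agree on 2 of 4 distinct terms score 0.5, well below the ~66% cited on the slide.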
GO Classification
Cell Process
Cell growth maintenance
Cell proliferation
Cell cycle
Binding
Protein Metabolism
Development
Morphogenesis
Organogenesis
Protein phosphorylation
Cell communication
Cell signaling receptor
Enzyme-linked signal transduction
Protein nuclear import
Protein transport
Cell growth
Stress response
[Figure: matrix of the GO classes above (rows) against EGFR, ERBB2, TRP53, TGFB1, RELN, DAB1, VLDLR, and LRP8 (columns)]
• The vocabulary is limited (e.g., DAB1, RELN, LRP8, VLDLR)
Products of the National Library of Medicine (NLM) & National Center for Biotechnology Information (NCBI)
• Databases
– GenBank, UniGene, LocusLink (Gene)
– MEDLINE
– OMIM
• Services
– HealthSTAR
– Health Services Research Projects in Progress
– HSTAT
• Vocabulary
– Medical Subject Headings (MeSH)
– NLM Classification
– Unified Medical Language System (UMLS)
MEDLINE
• MEDLINE is the premier bibliographic database for biomedicine, supported by the National Library of Medicine
• MEDLINE contains approximately 18 million references, most of which have abstracts
• MEDLINE covers over 4800 journals in over 30 languages
• MEDLINE citations date back to 1966
• Free abstracts!
Defining Functional Relationships
between Genes
[Diagram: genes A, B, and C]
• Direct relationship: gene relationships already known (e.g., A-B or B-C)
– Term co-occurrence
– Gene symbol: PubGene (Jenssen et al., Nature Genetics 2001 28:21)
– Gene names (synonyms and aliases) – biochemical
• Indirect relationship: gene relationships unknown (e.g., A-C)
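Direct relationships can be scored by counting co-occurrence, in the spirit of PubGene, and indirect ones can be proposed with the ABC idea behind Arrowsmith: if A co-occurs with B and B with C, but A never with C, then A-C is a candidate relationship. The Python sketch below assumes each abstract has already been reduced to a set of recognized gene symbols (in practice the hard part, given synonyms and aliases); the function names and the toy abstracts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts: list[set[str]]) -> Counter:
    """Number of abstracts mentioning each unordered gene pair (direct evidence)."""
    counts: Counter = Counter()
    for genes in abstracts:
        for a, b in combinations(sorted(genes), 2):
            counts[(a, b)] += 1
    return counts


def neighbors(gene: str, counts: Counter, min_count: int = 1) -> set[str]:
    """Genes that co-occur with `gene` in at least `min_count` abstracts."""
    out: set[str] = set()
    for (a, b), c in counts.items():
        if c >= min_count:
            if a == gene:
                out.add(b)
            elif b == gene:
                out.add(a)
    return out


def indirect_candidates(gene_a: str, counts: Counter, min_count: int = 1) -> set[str]:
    """ABC-style candidates: genes C linked to A only through some intermediate B."""
    direct = neighbors(gene_a, counts, min_count)
    reached: set[str] = set()
    for b in direct:
        reached |= neighbors(b, counts, min_count)
    return reached - direct - {gene_a}


# Toy input: each set is the gene symbols found in one hypothetical abstract.
abstracts = [{"A", "B"}, {"A", "B"}, {"B", "C"}]
counts = cooccurrence_counts(abstracts)
print(counts)                            # A-B co-occur in 2 abstracts, B-C in 1
print(indirect_candidates("A", counts))  # {'C'}
```

Raising `min_count` trades recall for precision: pairs seen in only one abstract are often incidental mentions.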
Reelin Signaling Pathway
[Figure: pathway diagram showing Reelin, APP, amyloid plaques, ApoE, VLDLR, ApoER2, Fyn, Dab1, p35, Cdk5, and pTau]
Gene Document Test Set
Categories: Reeler, Alzheimer Disease, Miscellaneous
Genes: Reln, Dab1, VLDLR, Lrp8, APP, Aplp2, Aplp1, Psen1, Psen2, Lrp1, Mapt, Apoe, A2m, Apbb1, Apba1, Cdk5, Cdk5r, Cdk5r2, Trp53, Fos, Nras, Rasa1, Rab1, Src, Notch1, Dll1, Jag1, Robo1, Ptch, Smo
PubGene Query: Dab1
http://www.pubgene.org/
Jenssen et al., Nat Genet. 2001 May;28(1):21-8.
Mouse
Reln 7 times
Cdk5r 6 times
Cdk5 5 times
Gli2 3 times
Src 3 times
Dab2 2 times
Fyn 2 times
Sam68 1 time
Cdkn1a 1 time
Tbr1 1 time
Gli 1 time
Scr 1 time
Shh 1 time
cdf 1 time
Ash 1 time
Dlgh4 1 time
p80 1 time
Lck 1 time
Emx1 1 time
Pcdh18 1 time
Agrn 1 time
Arg2 1 time
Human
DAB2 3 times
GAD1 3 times
RELN 3 times
GSN 2 times
TNFSF5 2 times
HLA-DQA1 1 time
BAT2 1 time
GAD2 1 time
PubMed Query: Dab1 AND Reln = 16
PubMed Query: Dab1 AND reelin = 152!
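The gap between the two counts suggests that searching by gene symbol alone misses papers that use the full name. The Python sketch below builds an OR-expanded PubMed query and reads the hit count through NCBI's E-utilities esearch service (JSON output, count only). The synonym lists are illustrative assumptions, and live counts today will differ from the 2008 figures.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def or_block(synonyms: list[str]) -> str:
    """Join alternative names for one gene into a parenthesized OR clause."""
    return "(" + " OR ".join(synonyms) + ")"


def build_esearch_url(synonyms_a: list[str], synonyms_b: list[str]) -> str:
    """PubMed esearch URL that asks only for the number of matching records."""
    term = f"{or_block(synonyms_a)} AND {or_block(synonyms_b)}"
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(payload: str) -> int:
    """Extract the hit count from an esearch JSON response."""
    return int(json.loads(payload)["esearchresult"]["count"])


def fetch_count(url: str) -> int:
    """Perform the live request (requires network access)."""
    with urlopen(url) as resp:
        return parse_count(resp.read().decode("utf-8"))


# Hypothetical synonym lists; not called here to avoid a network request:
# fetch_count(build_esearch_url(["Dab1"], ["Reln", "reelin"]))
```

Calling `fetch_count` performs the live query. NCBI asks heavy users to add `tool` and `email` parameters and to stay within its request-rate limits.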
iHOP Query: Dab1
http://www.ihop-net.org/
iHOP Query: Dab1; Sentence Structure
iHOP Query: Dab1; Network building
PubMatrix Demo
iHOP (Information Hyperlink over Proteins)
http://www.ihop-net.org/UniPub/iHOP/
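The iHOP slides show a gene network built from sentences that mention genes together. A minimal Python sketch of that idea follows: genes co-mentioned in a sentence become neighbors, and a breadth-first search expands the network outward from a query gene. The input is a hypothetical toy list, not iHOP output, and the function names are illustrative.

```python
from collections import deque


def build_network(sentences: list[set[str]]) -> dict[str, set[str]]:
    """Undirected adjacency: genes mentioned in the same sentence become neighbors."""
    adj: dict[str, set[str]] = {}
    for genes in sentences:
        for g in genes:
            adj.setdefault(g, set()).update(genes - {g})
    return adj


def expand(seed: str, adj: dict[str, set[str]], depth: int) -> dict[str, int]:
    """Breadth-first expansion: each reachable gene mapped to its hop distance."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        g = queue.popleft()
        if dist[g] == depth:
            continue  # do not expand beyond the requested depth
        for nb in sorted(adj.get(g, set())):
            if nb not in dist:
                dist[nb] = dist[g] + 1
                queue.append(nb)
    return dist


# Toy sentence-level co-mentions (hypothetical, not iHOP output).
sentences = [{"Dab1", "Reln"}, {"Reln", "Vldlr"}, {"Vldlr", "Lrp8"}]
print(expand("Dab1", build_network(sentences), depth=2))
# {'Dab1': 0, 'Reln': 1, 'Vldlr': 2}
```

Limiting `depth` keeps the network readable; each extra hop typically adds many weakly related genes.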
Chilibot http://www.chilibot.net/
• Extracts term-term relationships from MEDLINE abstracts.
• Distinguishes interactive relationships (e.g., stimulation or inhibition) from non-interactive ones (e.g., homology or coexistence).
• Color-codes gene expression values when data are provided.
• Automatically suggests new hypotheses based on the literature.
Chen and Sharp (2004) BMC Bioinformatics 5(1):147.
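One way to approximate the interactive/non-interactive distinction described above is to work sentence by sentence and label each co-mention by the verbs it contains. The Python sketch below does that with a deliberately simple keyword heuristic; it is an illustrative assumption, not Chilibot's algorithm (which relies on more elaborate linguistic rules), and the keyword lists, labels, and toy text are hypothetical.

```python
import re

# Hypothetical keyword lists; a real system would use richer linguistic rules.
STIMULATORY = {"activates", "induces", "stimulates", "increases", "enhances"}
INHIBITORY = {"inhibits", "suppresses", "blocks", "decreases", "represses"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_pair(text: str, term_a: str, term_b: str) -> list[str]:
    """Label every sentence that mentions both terms."""
    labels = []
    a, b = term_a.lower(), term_b.lower()
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z0-9-]+", sentence.lower()))
        if a in words and b in words:
            if words & STIMULATORY:
                labels.append("stimulation")
            elif words & INHIBITORY:
                labels.append("inhibition")
            else:
                labels.append("non-interactive")  # co-mentioned, no interaction verb
    return labels


toy = ("GeneX activates GeneY. GeneY is homologous to GeneZ. "
       "GeneX inhibits GeneZ. GeneX and GeneY are both expressed in cortex.")
print(classify_pair(toy, "GeneX", "GeneY"))  # ['stimulation', 'non-interactive']
```

Keyword matching ignores negation and sentence structure ("does not activate" would still count as stimulation), which is why practical tools parse sentences rather than scan for words.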
Chilibot Demo
http://www.chilibot.net/
STRING at EMBL
PubMatrix Demo
STRING: http://string.embl.de/
Vector Space Model:
Latent Semantic Indexing
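Term-by-gene matrix A: rows are words W1, W2, W3, ..., Wx; columns are gene documents G1, G2, ..., Gx; a query is a vector of word weights w1, w2, w3, ...

$$
A = \left[a_{ij}\right], \qquad a_{ij} = l_{ij}\, g_i, \qquad q = (w_1, w_2, w_3, \ldots)^{\mathsf{T}}
$$

where $l_{ij}$ is the local weight of word $W_i$ in gene document $G_j$ and $g_i$ is the global weight of word $W_i$.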
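The slide gives the weighting as a_ij = l_ij g_i without naming the local and global functions. A frequently used pair in LSI work is log-entropy weighting, l_ij = log2(1 + f_ij) and g_i = 1 + Σ_j p_ij log2 p_ij / log2 n, with p_ij = f_ij / Σ_j f_ij and n the number of gene documents; treat that choice as an assumption here. The Python sketch applies it and ranks genes against a query by cosine similarity in the full weighted space. LSI proper would first replace A by a truncated singular value decomposition and compare vectors in that reduced space, a step this sketch omits.

```python
import math


def log_entropy(freq: list[list[float]]) -> list[list[float]]:
    """Return A with a_ij = l_ij * g_i (log-entropy weighting, assumed here).

    freq[i][j] is the raw count of word W_i in gene document G_j (needs >= 2 genes).
    """
    n_docs = len(freq[0])
    weighted = []
    for row in freq:
        gf = sum(row)  # global frequency of word W_i
        entropy = sum((f / gf) * math.log2(f / gf) for f in row if f > 0)
        g = 1.0 + entropy / math.log2(n_docs)  # 1 = concentrated, 0 = spread evenly
        weighted.append([math.log2(1 + f) * g for f in row])
    return weighted


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_genes(a: list[list[float]], query: list[float], gene_names: list[str]):
    """Rank gene columns of A by cosine similarity to a query vector of word weights."""
    columns = list(zip(*a))
    scores = [(name, cosine(list(col), query)) for name, col in zip(gene_names, columns)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy counts: rows W1..W3, columns G1..G3.
freq = [[2, 0, 0],
        [1, 1, 1],
        [0, 3, 0]]
A = log_entropy(freq)
print(rank_genes(A, [1.0, 0.0, 0.0], ["G1", "G2", "G3"])[0][0])  # G1
```

In the toy matrix, W2 appears equally in every gene document, so its global weight is (essentially) zero and it no longer influences similarity; W1 and W3, each confined to one gene, keep a global weight of 1.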
50-Gene Document Collection
[Figure: diagram of the collection by category (Development, Alzheimer, Cancer); region counts 5, 3, 11, 16, and 15]
[Figure: Hierarchical Tree and Unrooted Tree (Graph) of the gene collection; clusters labeled Development, Cancer, and Alzheimer]
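The trees group genes whose literature profiles are similar. The slides do not state the clustering method; the Python sketch below assumes average-linkage agglomerative clustering on cosine similarity, where each gene is a vector such as a column of the weighted (or LSI-reduced) term-by-gene matrix. The gene labels and vectors are toy values.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def average_linkage(vectors: dict[str, list[float]]):
    """Agglomerative clustering; returns merges as (cluster_1, cluster_2, similarity)."""
    clusters = [(name,) for name in vectors]

    def similarity(c1, c2):
        # Average linkage: mean pairwise cosine similarity between the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = max(candidates, key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], similarity(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy two-dimensional gene vectors (hypothetical).
toy = {"G1": [1.0, 0.0], "G2": [0.9, 0.1], "G3": [0.0, 1.0], "G4": [0.2, 0.9]}
for left, right, sim in average_linkage(toy):
    print(left, "+", right, f"{sim:.3f}")
```

The merge history is exactly what a hierarchical tree draws: each merge is an internal node, and its similarity sets the node's height.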
Semantic Gene Organizer©
User Interface
GeneIndexer Software
www.computablegenomix.com