Ontology, RDF, SW for
Chemical Structures
T N Bhat & J. Barkley
NIST
[email protected]
Query tool
Use Case
Publications
Major Features, Goal – to Reduce
User Frustration
We have established a use case at the HCLS
Website - Chemical taxonomies
Combining of Rule-based terms with Vocabulary-based terms to define elements of RDF
Organization of the elements of RDF into
predictable ontology using concepts from use
cases
Developing tools and techniques to present the
information using familiar database environments
– Allows easier portability and implementation of the
information by the community
Illustrating the concept using high-profile data
such as AIDS inhibitors and Protein Data Bank
contents
Combining of Rule-based with Vocabulary-based elements to define RDF
Chemical structures are definable by atomic
connectivity – thus structures are suitable for
identification using graph theory – InChI
– Suitable for machine reasoning
Graphs are hard for humans to digest – therefore
the proposal is to combine InChI with familiar
vocabularies such as Ala, Phenyl, Adenine
– Also include synonyms in the vocabulary for greater
coverage among diverse users
– Vocabularies make it easier for humans to recognize the
information
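The pairing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `ChemicalTerm` and `Vocabulary` names are hypothetical, and the InChI shown is assumed to be the non-stereo string for alanine.

```python
from dataclasses import dataclass, field


@dataclass
class ChemicalTerm:
    """One structure: a rule-based identifier plus human-friendly names."""
    inchi: str                      # rule-based, machine-oriented identifier
    preferred_name: str             # vocabulary-based, human-oriented name
    synonyms: list = field(default_factory=list)


class Vocabulary:
    """Index that resolves any known name or synonym to its structure."""

    def __init__(self):
        self._by_name = {}

    def add(self, term):
        for name in [term.preferred_name, *term.synonyms]:
            # Case-insensitive keys so 'ala', 'Ala', and 'ALA' all resolve.
            self._by_name[name.lower()] = term

    def lookup(self, name):
        # Returns None when the name is not in the vocabulary.
        return self._by_name.get(name.lower())


# Illustrative entry (assumption: non-stereo InChI for alanine).
alanine = ChemicalTerm(
    inchi="InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)",
    preferred_name="Ala",
    synonyms=["Alanine", "2-aminopropanoic acid"],
)

vocab = Vocabulary()
vocab.add(alanine)
```

A human types a familiar name such as `vocab.lookup("alanine")`, while a program reasons over the `inchi` field of the returned record; the synonyms widen coverage without changing the identifier.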
InChI – a Scalable URI
InChI is generated by software
that decodes the chemical
connectivity information into
layers such as chirality, ring
structure, and atom type, and then recodes them to form a text string
InChI is a naming standard for
chemicals recommended by IUPAC
InChI – a rule-based URI
InChI
– _1_2FC10H11NO2_2Fc1110_2812_2913-9-5-7-3-1-2-48_287_296-9_2Fh1-4_2C9H_2C56H2_2C_28H2_2C11_2C12_29
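The string above appears to use an underscore followed by two hex digits in place of characters that are awkward in a URI (for example `_2F` for `/` and `_2C` for `,`). That scheme is inferred from the string itself, not stated in the slides. A minimal Python sketch under that assumption, with hypothetical function names and ASCII input assumed:

```python
import re

# Assumed escape format: '_' followed by exactly two uppercase hex digits.
_ESCAPE = re.compile(r"_([0-9A-F]{2})")


def decode_uri_safe(text):
    """Turn each '_XX' hex escape back into the character it stands for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def encode_uri_safe(text):
    """Escape every character except ASCII letters, digits, and '-' as '_XX'."""
    return "".join(
        ch if (ch.isascii() and ch.isalnum()) or ch == "-" else f"_{ord(ch):02X}"
        for ch in text
    )
```

Because the underscore itself is escaped on encoding, decoding an encoded string recovers the original layered identifier, so the same InChI can serve both as a readable name and as a URI component.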
Vocabulary-based Definitions
For decades scientists have been developing names to identify
structures and their images
– Simple names
His
Ala
DNA
ATP
– Semi-rule-based IUPAC names
2-amino-3-methylpentanamide
4-amino-3-hydroxy-6-methylheptanoic_acid
1-[(Benzenesulfonyl-methyl-amino)-phenyl-butyl]-piperidin-4-yl}propyl-carbamic acid, naphthalen-1-ylmethyl ester
Names facilitate text-based queries of desired components
Names when used together with InChI provide a smoother
integration of machine and human needs
Use-Case for SW; Treatment for
AIDS is a work in progress
Treatments for AIDS are of two types
– Prevention – the most effective
– Containment
Drugs to contain and reduce the viral load
The majority of the drugs (~17) target either HIV
protease or reverse transcriptase (RT)
Complete suppression of either of these viral
enzymes could cure AIDS
But drug resistance means that only partial
suppression of the enzymes is achieved
All the drug design efforts for AIDS are
based on structures
Data needed for drug design is scattered
over many Web resources and users often
wade through the data manually
Therefore AIDS drug design is an ideal
target for Semantic Web and novel
database-related technologies
SW connection between NIST and NIAID
AIDS database
Choose the problem
that matters
Website
Annotation Technique/Developing
Structural Ontology
Define compounds using chemical
features of interest to use cases
– Fragment, subgroup, class
[Figure labels: 000503, 000505, 1A8K, 030798]
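The fragment, subgroup, and class annotation can be pictured as RDF statements about each compound. The sketch below is illustrative only: the `http://example.org/chem/` namespace, the predicate names, the compound identifier, and the feature labels are all hypothetical and do not come from the slides.

```python
BASE = "http://example.org/chem/"  # hypothetical namespace, not from the slides


def annotate(compound_id, fragments=(), subgroups=(), classes=()):
    """Return (subject, predicate, object) triples for one compound."""
    levels = {"hasFragment": fragments, "hasSubgroup": subgroups, "inClass": classes}
    return [
        (BASE + "compound/" + compound_id, BASE + predicate, BASE + "feature/" + value)
        for predicate, values in levels.items()
        for value in values
    ]


def to_ntriples(triples):
    """Serialize triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


triples = annotate(
    "example-1",                      # hypothetical compound identifier
    fragments=["phenyl"],
    subgroups=["sulfonamide"],        # illustrative labels only
    classes=["protease-inhibitor"],
)
print(to_ntriples(triples))
```

Keeping the three annotation levels as separate predicates lets a tool ask for all compounds in a class, or all compounds sharing a fragment, without parsing the structure itself.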
Modeling with Protégé – Suitable
for Text-based Ontology
Web tools
Structures are different from text-based
information
– Structures are not amenable to text-based
query/rendering techniques
– The majority of structure users have never heard of
(nor want to hear about!) SPARQL, the query
language for RDF
– The commonly preferred/expected way to query is
by ‘click’
Semantic Web for Structures needs new
Web tools that allow navigation by clicking
on structural features
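Navigation by clicking can be thought of as narrowing a result set one structural feature at a time, with the query built behind the scenes. The following Python sketch shows that idea under stated assumptions; it is not how Chem-BLAST is implemented, and the compound identifiers and feature annotations are hypothetical.

```python
def matching_compounds(index, selected):
    """Compounds that carry every feature the user has clicked so far."""
    return {cid for cid, feats in index.items() if selected <= feats}


def next_choices(index, selected):
    """Features that would narrow the current result set further if clicked."""
    hits = matching_compounds(index, selected)
    offered = set()
    for cid in hits:
        offered |= index[cid] - selected
    # Drop features shared by every hit: clicking them would change nothing.
    return {f for f in offered if any(f not in index[cid] for cid in hits)}


# Hypothetical compound -> feature annotations, for illustration only.
index = {
    "cmpd-A": {"phenyl", "piperidine"},
    "cmpd-B": {"phenyl", "naphthalene"},
    "cmpd-C": {"adenine"},
}
```

Each click adds one feature to `selected`; the page then shows `matching_compounds(index, selected)` and offers `next_choices(index, selected)` as the next clickable features, so the user never writes a query.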
Chem-BLAST for Structural Semantic Web
http://bioinfo.nist.gov/SemanticWeb_pr3d/chemblast.do
Prasanna et al. PROTEINS 60, 1-4 (2005).
Prasanna et al. PROTEINS 63(4), 907-917(2006).
Download publications
Future Plans
Extend the work to chemical structures from Protein Data
Bank
If interest exists, hold a workshop at NIST; proposed dates: last two weeks of March 2008
– Workshop will be held in conjunction with the NIST-wide
Ontology Week
Possible collaboration with IUPAC (International Union of
Pure and Applied Chemistry) and ChEBI
– Contact: Colin Batchelor [email protected]
RSC Publishing,
Royal Society of Chemistry
Community participation is essential for further
development
Contact [email protected] 301 975 5448 (US)