Transparent access to multiple
bioinformatics information
sources (TAMBIS)
Goble, C.A. et al. (2001)
IBM Systems Journal
40(2), 532-551
Genome Analysis
Paper Presentation
March 24, 2005
Presentation Overview
Why the need to integrate
Definitions (“MW”s)
Biologists’ burden
What is TAMBIS
The TaO
Brains of TAMBIS
What makes TAMBIS “service-oriented”?
GRAIL
TAMBIS Architecture
What can you do at TAMBIS?
Related Work
More current work
Ongoing challenges to integration
Why the need to Integrate?

The Molecular Biology Database Collection has 500+
resources




719 in 2005 NAR DB issue
Adding ~150 in the past two years
Independent development and differing scopes lead to
heterogeneous formats, interfaces, inputs, and outputs
Most popular resources :




DNA and Protein sequences (GenBank, Swiss-Prot)
Genome data (ACeDB)
Protein structure and motifs (PDB, PROSITE)
Similarity searching (BLAST)
Definitions (MW*)




Extensional coverage :
number of entries /
instances covered by the source
Intensional coverage :
number of information fields /
meta-data in each source
Description Logic :
A family of knowledge representation languages which can
be used to represent the terminological knowledge of an
application domain in a structured and formally well-understood way.
CPL (Collection Programming Language) :
A functional multidatabase language; models complex data
types such as lists, sets, and variants with drivers (wrappers)
that execute requests over data sources
* MW = “misunderstood word” (from a Montessori class)
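As a rough illustration of the two coverage notions defined above, the sketch below counts entries (extensional coverage) and distinct information fields (intensional coverage) for a toy source. The records and field names are invented for illustration and do not come from any real database.

```python
# Toy source: each record is one entry; its keys are its information fields.
# All records and field names here are invented for illustration.
toy_source = [
    {"accession": "X0001", "name": "example protein A", "organism": "human"},
    {"accession": "X0002", "name": "example protein B", "organism": "rat",
     "function": "receptor"},
]


def extensional_coverage(source):
    """Number of entries (instances) the source holds."""
    return len(source)


def intensional_coverage(source):
    """Number of distinct information fields used across the source."""
    fields = set()
    for record in source:
        fields.update(record)  # adds the record's keys
    return len(fields)


print(extensional_coverage(toy_source))
print(intensional_coverage(toy_source))
```

Two sources can have the same extensional coverage and still differ widely in intensional coverage, which is one reason integration across them is not trivial.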
Definitions (MW*)



Terminology server :
Encapsulates the reasoning services associated with the
Description Logic, supporting concept reasoning, role
sanctioning, thesaurus, extrinsics services
Sanctioning :
Capability of inferring more (biological) concepts by way of
compositional constraints encompassed in the ontology
Ontology :
An explicit formal specification of how to represent the
objects, concepts, and other entities that are assumed to
exist in some area of interest and the relationships that hold
among them.
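Sanctioning, as defined above, can be pictured as a table of permitted compositions. The sketch below is a simplified, single-layer stand-in (GRAIL's real mechanism is multilayered), and the sanction table itself is an assumption made up for illustration, using concept and role names that appear elsewhere in these slides.

```python
# Simplified, single-layer sketch of sanctioning: a composition
# "<subject> which <role> <filler>" is allowed only if a sanction permits it.
# The table below is invented for illustration, not taken from the TaO.
SANCTIONS = {
    ("Protein", "hasFunction", "Receptor"),
    ("Protein", "hasName", "ProteinName"),
    ("Motif", "isComponentOf", "Protein"),
}


def is_sanctioned(subject, role, filler):
    """True if the ontology permits combining subject, role, and filler."""
    return (subject, role, filler) in SANCTIONS


def sanctioned_extensions(subject):
    """Roles and fillers that may be used to specialise `subject`."""
    return sorted((r, f) for s, r, f in SANCTIONS if s == subject)


print(is_sanctioned("Protein", "hasFunction", "Receptor"))
print(is_sanctioned("Motif", "hasFunction", "Receptor"))
print(sanctioned_extensions("Protein"))
```

A query interface can use a lookup like `sanctioned_extensions` to offer only the compositions that make biological sense for the concept being edited.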
Biologists’ burden





Construct a view of the meta-data
Resolve structural and semantic differences in the
information
Locate and communicate with the sources
Interoperate between resources
Transformation process
…. “fragile” process…. undoubtedly specialized
TAMBIS

A prototype mediation system





Designed to lessen the burden as described previously
Service-oriented
Based on an extensive source-independent global ontology of
molecular biology and bioinformatics
Represented in a Description Logic
Managed by a terminology server

A mixed top-down and bottom-up iterative methodology

Providing a single access point for biological information
sources around the world
Emphasis of TAMBIS



High transparency
Read-only access
Retrieval-oriented architecture



Efficiency and correctness
Heterogeneity management
Visual query interface
Features of TAMBIS


Very rich domain ontology (1,800 biological
concepts)
Web-based…


Query formation
Ontology browsing

Query translation and planning process

More than GO, more than SRS
The TaO


Aim is to capture biological and bioinformatics
knowledge in a logical conceptual framework
Constraints… or features…




Only biologically sensible concepts classify correctly
Can encompass different user views
Makes biological concepts and their relationships
computationally accessible
Could have used another ontology but this one was
developed concurrently for TAMBIS
The TaO
Current state of TaO

Big Model
Covers proteins, nucleic acids, their components,
function, location, publishing

Baby model (Baby TaO)
Covers only the protein subset of the big model
Used for the “fully functional version” of TAMBIS

Reconciled model
Merged version of the big and baby TAMBIS
ontologies
Brains of TAMBIS

… Query translation and planning process

“A concept formed as a query is resolved when its extension
is retrieved”
Sample query,
Protein which hasFunction Receptor

Takes a query phrased in terms of the conceptual layer and
converts it into an executable plan in terms of the classes
and methods of the physical layer.
Plans an efficient way of executing a query,
i.e., evaluates the alternative paths
The various resources do not need to provide query language
interfaces
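The translation and planning step described above can be sketched roughly as follows, for a single step of the form "concept which role filler". This is a minimal illustration, not the TAMBIS planner: the mapping table, source names, method names, and cost figures are all hypothetical.

```python
# Hypothetical mapping from conceptual-layer terms to physical-layer calls.
# Each entry lists alternative ways to answer the same conceptual step,
# with an invented cost estimate; none of these names come from TAMBIS.
MAPPING = {
    ("Protein", "hasFunction"): [
        {"source": "source_a", "method": "proteins_by_function", "cost": 5},
        {"source": "source_b", "method": "scan_all_and_filter", "cost": 50},
    ],
}


def plan_query(concept, role, filler):
    """Pick the cheapest alternative for one 'concept which role filler' step."""
    alternatives = MAPPING.get((concept, role))
    if not alternatives:
        raise LookupError(f"no source can answer {concept} which {role}")
    best = min(alternatives, key=lambda alt: alt["cost"])
    return {"source": best["source"], "method": best["method"], "argument": filler}


# The sample query from the slide: Protein which hasFunction Receptor
plan = plan_query("Protein", "hasFunction", "Receptor")
print(plan)
```

Because the plan is expressed as method calls on sources, the sources themselves never need to understand a query language.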


(Definitions revisited: relationship, concept)
What makes TAMBIS “service-oriented”?

Reasoning services for description logics







Subsumption
Classification
Satisfiability
Retrieval
Sanctioned term construction
Querying
Terminology Services
(Definitions revisited: subsumption, sanction)
[Figure: an example of subsumption]
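To make subsumption concrete, here is a minimal structural check for concepts of the form "base which role filler". It is a simplification under stated assumptions: a real Description Logic reasoner classifies concepts from their definitions, whereas this sketch relies on a small hand-written hierarchy (the `PARENT` table is invented for illustration).

```python
# Told hierarchy of primitive concepts (child -> parent); invented for illustration.
PARENT = {"Enzyme": "Protein", "Protein": "Biomolecule"}


def primitive_subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def subsumes(general, specific):
    """Structural check for concepts written as (base, {role: filler})."""
    g_base, g_roles = general
    s_base, s_roles = specific
    if not primitive_subsumes(g_base, s_base):
        return False
    # Every restriction on the general concept must be met by the specific one.
    return all(
        role in s_roles and primitive_subsumes(filler, s_roles[role])
        for role, filler in g_roles.items()
    )


protein = ("Protein", {})
receptor_protein = ("Protein", {"hasFunction": "Receptor"})
print(subsumes(protein, receptor_protein))
print(subsumes(receptor_protein, protein))
```

Here "Protein" subsumes "Protein which hasFunction Receptor" but not the other way round, since the more specific concept carries a restriction the general one lacks.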
GRAIL




A concept modelling language
A Description Logic in the KL-ONE family….
In this case, used to describe biological concepts
Two major services provided :


Supporting transitive roles, role hierarchies, a
powerful set of concept assertion axioms
Novel multilayered sanctioning mechanism
TAMBIS Architecture

Three layers (“models”)




Physical
Conceptual
Mapping
Five components





Ontology of biological terms (A)
Knowledge-driven query formulation interface (B)
Sources and Services Model linking the biological ontology with the source schemas (C)
Query transformation rewriting process (D)
Wrapper service dealing with external sources (E)
Query translation
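The role of the wrapper service (E) can be pictured with a small sketch: every source is reached through the same call interface, so the rest of the system does not see source-specific formats. The interface, class names, and toy data below are assumptions for illustration only and do not reflect the actual TAMBIS or CPL wrapper API.

```python
from typing import Protocol


class Wrapper(Protocol):
    """Hypothetical uniform interface that every source wrapper offers."""

    def call(self, method: str, argument: str) -> list: ...


class InMemoryWrapper:
    """Stand-in for a wrapper around an external source, backed by toy data."""

    def __init__(self, table):
        self._table = table

    def call(self, method, argument):
        return list(self._table.get((method, argument), []))


def execute(plan, wrappers):
    """Run each plan step against the wrapper named in that step, in order."""
    results = []
    for step in plan:
        wrapper = wrappers[step["source"]]
        results.append(wrapper.call(step["method"], step["argument"]))
    return results


# Toy setup: one hypothetical source answering one hypothetical method.
wrappers = {
    "source_a": InMemoryWrapper(
        {("proteins_by_function", "Receptor"): ["toy_protein_1", "toy_protein_2"]}
    )
}
plan = [{"source": "source_a", "method": "proteins_by_function", "argument": "Receptor"}]
print(execute(plan, wrappers))
```

Swapping a source then means writing a new wrapper class, while the plan format and the executor stay unchanged.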
What can you do at TAMBIS?





Browse the ontology
Build a query with a visual interface and reference
to an ontology
Give values to concepts (for a query)
Identify desired concepts as results
Bookmark your queries
Ontology browser
Specific questions for TAMBIS



Find human homologues of yeast receptor proteins
Find rat proteins that have a domain with a seven-propeller domain architecture
Find the binding sites of human enzymes with zinc
cofactors
…. How many sources are involved per question?
…. How difficult to find these answers without
integration?.... For someone unfamiliar with the
resources?
TAMBIS Overview
Natural language :
Select motifs for antigenic human proteins that participate in apoptosis
and are homologous to the lymphocyte associated receptor of death
(also known as lard).
TAMBIS Translation :
Select patterns in the proteins that invoke an immunological response
and participate in programmed cell death that are similar in their
sequence of amino acids to the protein that is associated with triggering
cell death in the white cells of the immune system.
Concept expression in GRAIL :
Motif which
<isComponentOf (Protein which
<hasOrganismClassification Species
functionsInProcess Apoptosis
hasFunction Antigen
isHomologousTo (Protein which
<hasName ProteinName>)>)>
(Species given value “human” and ProteinName given value “lard”)
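One way to see the nesting in the concept expression above is to hold it as a nested data structure and render it back to text. The sketch below is an illustration only: the tuple representation and the `render` helper are assumptions, not part of GRAIL or TAMBIS, and role names are written with a lowercase first letter for consistency.

```python
def render(concept):
    """Render (base, [(role, filler), ...]) as 'Base which <role filler ...>'."""
    if isinstance(concept, str):
        return concept
    base, restrictions = concept
    if not restrictions:
        return base
    parts = []
    for role, filler in restrictions:
        text = render(filler)
        if not isinstance(filler, str):
            text = f"({text})"  # parenthesise nested concept expressions
        parts.append(f"{role} {text}")
    return f"{base} which <{' '.join(parts)}>"


# The query from the slide, written as nested (base, restrictions) tuples.
query = ("Motif", [
    ("isComponentOf", ("Protein", [
        ("hasOrganismClassification", "Species"),
        ("functionsInProcess", "Apoptosis"),
        ("hasFunction", "Antigen"),
        ("isHomologousTo", ("Protein", [("hasName", "ProteinName")])),
    ])),
])
print(render(query))
```

Values such as "human" for Species and "lard" for ProteinName would be bound separately, as the slide notes.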
Related Work

Closest work : Object-Protocol Model (OPM)


SRS, Entrez, BioNavigator



Do not handle queries as complex
TAMBIS is query-based; these are click-based
BioKleisli, DiscoveryLink


No source transparency
Middleware solutions, TAMBIS sits on top of this
Carnot

General rather than detailed ontology
More current work

DAML + OIL (new DL for TAMBIS)

DARPA Agent Markup Language: provides a rich set of
constructs to create ontologies and to mark up information so
that it is machine-readable

CPL/BioKleisli (wrapper language) replaced by DiscoveryHub
(commercial)

GO: more complete and more widely used
Protégé OWL



Ontology editor for the Semantic Web
BioMOBY, BioConductor

Complementary systems
Ongoing challenges to integration

Evaluation



Technical efficiency
User usability
Changing underlying resources



Resources disappear
Changes in popularity
MAINTENANCE
…. Widespread acceptance and use?
References





Goble, C.A. et al. (2001) “Transparent access to multiple
bioinformatics information sources.” IBM Systems Journal.
40(2), 532-551.
Baker, P.G. et al. (1999) “An ontology for bioinformatics
applications.” Bioinformatics. 15(6), 510-520.
Ontology definition : dli.grainger.uiuc.edu/glossary.htm
Description Logic definition :
www.absoluteastronomy.com/encyclopedia/D/De/Description_Logic.htm
TAMBIS website :
http://imgproj.cs.man.ac.uk/tambis/