Semantic Mediation and
Scientific Workflows
Bertram Ludäscher
Data and Knowledge Systems
San Diego Supercomputer Center
University of California, San Diego
Some BIRNing Data Integration Questions
(BIRN: Biomedical Informatics Research Network)
• Data Integration Approaches:
– Let’s just share data, e.g., link everything from a web page!
– ... or better put everything into a relational or XML database
– ... and do remote access using the Grid
– ... or just use Web services!
• Nice try. But:
– “Find the files where the amygdala was segmented.”
– “Which other structures were segmented in the same files?”
– “Did the volume of any of those structures differ much from normal?”
– “What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?”
SEEK Kansas 11/02
XML-Based (or Relational) vs. Semantic Mediation
• XML-based (or relational) mediation:
– Integrated-DTD, defined by XML-QL(Src1-DTD, ...)
– Structural constraints (DTDs): parent, child, sibling, ..., e.g., A = (B*|C),D
– No domain constraints
– Layers: raw data, XML elements, XML models
• Semantic mediation:
– CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
– CM-QL ~ {F-Logic, DAML+OIL, …}
– Integrated-CM, defined by CM-QL(Src1-CM, ...)
– Logical domain constraints (IF ... THEN ... rules): classes, relations, is-a, has-a, ...
– “Glue Maps” = Domain & Process Maps (ontologies)
– Layers: raw data, (XML) objects, conceptual models
Making the SM System “Understand” Your Data:
Source Contextualization via Ontology Refinement
In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map ...
=> sources can register new concepts at the mediator ...
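The two registration steps can be sketched roughly as follows. This is only an illustrative toy, not the mediator's actual interface: the `DomainMap` class, its method names, the concept names, and the source and item identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DomainMap:
    """Toy domain map: concepts linked by is-a edges, with data hung off concepts."""
    parents: dict = field(default_factory=dict)     # concept -> parent concept (or None)
    registered: dict = field(default_factory=dict)  # concept -> list of (source, item)

    def add_concept(self, concept, parent=None):
        """Add a concept, optionally as a refinement (is-a child) of a known one."""
        if parent is not None and parent not in self.parents:
            raise KeyError(f"unknown parent concept: {parent}")
        self.parents[concept] = parent

    def register_data(self, source, concept, item):
        """Hang a source's data item off an existing concept."""
        if concept not in self.parents:
            raise KeyError(f"unknown concept: {concept}")
        self.registered.setdefault(concept, []).append((source, item))

    def ancestors(self, concept):
        """Yield the concept itself, then each of its is-a ancestors."""
        while concept is not None:
            yield concept
            concept = self.parents[concept]

    def data_under(self, concept):
        """Return items registered at the concept or at any concept refining it."""
        return [
            entry
            for c, entries in self.registered.items()
            if concept in self.ancestors(c)
            for entry in entries
        ]


dm = DomainMap()
dm.add_concept("neuron")
dm.add_concept("spiny_neuron", parent="neuron")
# A source refines the map with a concept the mediator did not have yet (hypothetical).
dm.add_concept("medium_spiny_neuron", parent="spiny_neuron")
dm.register_data("SRC1", "medium_spiny_neuron", "image_17")
dm.register_data("SRC2", "spiny_neuron", "image_03")
```

With this toy map, a question asked at the level of `spiny_neuron` also reaches the data that SRC1 registered under its own, more specific concept.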
Query Processing Demo
Mediator View Definition
    DERIVE
      protein_distribution(Protein, Organism, Brain_region,
                           Feature_name, Anatom, Value)
    WHERE
      I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
          anatomical_structures ->>
              {AS:anatomical_structure[name->Anatom]}],    % from PROLAB
      NAE:neuro_anatomic_entity[name->Anatom;              % from ANATOM
          located_in->>{Brain_region}],
      AS..segments..features[name->Feature_name;
          value->Value].
• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)
Contextualization
• CON(Result) wrt. ANATOM
• Query results in context
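To see what the view computes, its join can be mimicked over toy data in Python. This is a rough sketch, not how the mediator evaluates F-logic: the nested-dictionary layout is an assumption, and every data value below is a made-up placeholder, not data from PROLAB or ANATOM.

```python
# Hypothetical toy facts standing in for the two sources.
prolab_images = [
    {
        "proteins": ["protein_A"],
        "organism": "rat",
        "anatomical_structures": [
            {
                "name": "structure_X",
                "segments": [
                    {"features": [{"name": "feature_1", "value": 0.5}]},
                ],
            },
        ],
    },
]
anatom_entities = [
    {"name": "structure_X", "located_in": ["region_R"]},
]


def protein_distribution(images, entities):
    """Yield (Protein, Organism, Brain_region, Feature_name, Anatom, Value) tuples."""
    for image in images:                                  # I:protein_label_image
        for structure in image["anatomical_structures"]:  # AS:anatomical_structure
            anatom = structure["name"]
            for entity in entities:                       # NAE:neuro_anatomic_entity
                if entity["name"] != anatom:              # join on the shared name Anatom
                    continue
                for region in entity["located_in"]:
                    for segment in structure["segments"]:  # AS..segments..features
                        for feature in segment["features"]:
                            for protein in image["proteins"]:
                                yield (protein, image["organism"], region,
                                       feature["name"], anatom, feature["value"])


rows = list(protein_distribution(prolab_images, anatom_entities))
```

The only link between the two sources is the shared variable `Anatom`: a structure name from the image source must match an entity name in the anatomy source before a brain region can be attached to a measured feature.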
A Scientific Workflow: Promoter Identification
(Matthew Coleman, LLNL, 2002)
• Tools: clusfavor, blast (human, other species), GRAIL, TRANSFAC, CLUSTAL
• Data items:
– gi#’s from clusfavor, cDNA gi#, gene name
– Genomic gi#, chr #, gene location
– GC Island location, exon/intron location, repeats location, promoter location
– Validates polII promoter location
– TAF’s: location on genomic gi#’s, probabilities of match, probabilities of random match
– Consensus sequences
• Data Consolidation: promoter location, shared TAF’s across cluster, common consensus sequence
• Questions:
– Are chr#’s in common?
– Are chr# locations in common?
– Are there conserved upstream sequences?
– Are gene locations conserved across species?
– RNA POLII promoter?
– GpC Island present?
– Are there common TAF’s across genomic gi#?
– Are there other common genes?
SDM Demo & Architecture
Translation Approach:
Abstract Workflow (AWF) => Executable Workflow (EWF)
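One way to picture the translation step is as binding each abstract step to a concrete tool and then ordering the steps by their dependencies. The sketch below is a minimal illustration under that assumption, not the SDM architecture itself: the step names, the `translate` function, and the choice of which tool serves which step are hypothetical (only the tool names appear on the slides).

```python
from graphlib import TopologicalSorter

# Abstract workflow: step -> set of steps it depends on (hypothetical step names).
abstract_workflow = {
    "find_genomic_sequence": set(),
    "predict_promoter": {"find_genomic_sequence"},
    "find_binding_sites": {"find_genomic_sequence"},
    "consolidate": {"predict_promoter", "find_binding_sites"},
}

# Assumed binding of abstract steps to concrete tools.
bindings = {
    "find_genomic_sequence": "blast",
    "predict_promoter": "GRAIL",
    "find_binding_sites": "TRANSFAC",
    "consolidate": "data_consolidation",
}


def translate(awf, tool_bindings):
    """Return an executable plan: (step, tool) pairs in dependency order."""
    missing = [step for step in awf if step not in tool_bindings]
    if missing:
        raise ValueError(f"no executable binding for: {missing}")
    # static_order() yields every step after all of the steps it depends on.
    order = TopologicalSorter(awf).static_order()
    return [(step, tool_bindings[step]) for step in order]


executable_workflow = translate(abstract_workflow, bindings)
```

Keeping the bindings separate from the dependency graph means the same abstract workflow could be re-targeted to different tools without touching its structure.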