A.Konagaya, ASTRENA-APBioNet Joint Meeting at 22nd APAN Singapore 18 July 2006
Bioinformatics Ontology for
Automatic Workflow Generation
on Web/Grid Services
Konagaya Akihiko
Project Director
Advanced Genome Information Technology
Research Group
RIKEN Genomic Sciences Center
Contents
• Role of Ontology
• Web Services for Bioinformatics
• Automatic Workflow Generation
• Lessons from our First Experience
Role of Ontology
Tacit and Explicit Knowledge
We should start from the fact that
'we can know more than we can tell'.
Michael Polanyi, “The Tacit Dimension” 1967
Michael Polanyi (1891-1976)
Rainbow Color
How many colors can you see in a rainbow?
Ontology for Rainbow Colors
All the colors you can see with your own eyes!
From 360 nm~400 nm to 760 nm~830 nm
Color    RGB Value
Purple   #800080
Indigo   #000080
Blue     #0000FF
Green    #008000
Yellow   #FFFF00
Orange   #FF8000
Red      #FF0000
Which are Purple?
#800050
#800060
#800070
#700080
#800080
#600080
#500080
Representation by Elements and Constructor
[Figure: the candidate colors #800050, #800060, #800070, #800080, #700080, #600080, and #500080 plotted against a Red Element and a Blue Element, with regions labeled Purple]
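One possible reading of this slide, sketched in Python: a color is described by its elements (the red and blue parts of the RGB value), and a constructor combines them into a class such as Purple. The function names and the tolerance threshold are illustrative assumptions and do not come from the talk.

```python
def elements(hex_color):
    """Split an RGB value such as '#800080' into (red, green, blue) integers."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def is_purple(hex_color, tolerance=0x30):
    """Hypothetical constructor rule: Purple = Red element + Blue element.

    Assumption: a color counts as purple when it has no green, has both a
    red and a blue element, and the two differ by at most `tolerance`.
    """
    red, green, blue = elements(hex_color)
    return green == 0 and red > 0 and blue > 0 and abs(red - blue) <= tolerance


# The seven candidates from the "Which are Purple?" slide.
candidates = ["#800050", "#800060", "#800070", "#800080",
              "#700080", "#600080", "#500080"]
purple = [c for c in candidates if is_purple(c)]
```

With a different tolerance the answer changes, which is the point of the slide: the boundary of "Purple" is a modeling decision that the ontology has to make explicit.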
Web Services for Bioinformatics
Advantages of Web Services
• Freedom from maintaining biological databases and tools
• Scalability of computational resources
• High-level application programming interface
[Figure: Tasks A, B, C, and D connect an Input to an Output; the computing behind each task and the databases DB X, DB Y, and DB Z are provided as Web Services]
Very Simple Workflow
Sequence
-> BLAST Search (UniProt) -> Hit table
-> GetEntry (UniProt) -> Sequences
-> CLUSTAL W -> Multiple Alignment
-> Tree View -> Phylogenetic tree
Manual Workflow on Web Apps
Web Service Programming
#!/usr/bin/perl
use strict;
use warnings;
use SOAP::Lite;    # SOAP API

# Specify the WSDL of the GetEntry service
my $service = SOAP::Lite->service('http://xml.nig.ac.jp/wsdl/GetEntry.wsdl');

# Call the web service with a DDBJ accession number
my $result = $service->getXML_DDBJEntry("AB000003");

# Print the result
print $result;

Source: http://www.xml.nig.ac.jp/perl.txt
Why don’t we use workflow tools?
http://www.cyclonic.org/Taverna_and_myGrid.ppt
Need for a Tool that Automatically Generates Workflows
from a Very High-Level Specification
Specification:
  apply Blastp to UniProt
  GetEntry from UniProt
  apply CLUSTALW
  apply TreeView
-> Automatic Generation (?) ->
Workflow for Bioinformatics Web Services
Automatic Generation of
Bioinformatics Workflow
Task as Atomic Component of Workflow
Input Data Specification     sample: {aa_sequence,fasta}
Application                  sample: {blastp DAD}
Output Data Specification    sample: {ddbjentry,flatfile}
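As a rough illustration of the three parts of a task, the record below holds an input specification, an application, and an output specification. The class name, the `can_chain` helper, and the `to_fasta` task are hypothetical and serve only to show how two tasks could be connected; the first task uses the sample values from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One workflow step: input specification, application, output specification."""
    input_spec: tuple    # e.g. ("aa_sequence", "fasta")
    application: tuple   # e.g. ("blastp", "DAD")
    output_spec: tuple   # e.g. ("ddbjentry", "flatfile")


def can_chain(first, second):
    """Two tasks connect when the first's output spec equals the second's input spec."""
    return first.output_spec == second.input_spec


# Sample values taken from the slide.
blastp_dad = Task(("aa_sequence", "fasta"), ("blastp", "DAD"), ("ddbjentry", "flatfile"))

# Hypothetical conversion task, invented only to demonstrate chaining.
to_fasta = Task(("ddbjentry", "flatfile"), ("convert",), ("aa_sequence", "fasta"))
```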
Workflow as a Sequence of Tasks
[Figure: a chain of tasks leading from Input to Output]
Automatic Generation of Workflow from Given Input and Output Data Specification and Tasks
[Figure: Tasks A, B, C, and D as candidate steps between Input and Output]
• Path Finding using Meta Information
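Path finding over tasks can be pictured as a graph search in which data types are nodes and tasks are edges. The sketch below uses a plain breadth-first search; the task table and its type names are hypothetical stand-ins loosely modeled on the simple workflow shown earlier, not the contents of the actual task database.

```python
from collections import deque

# Hypothetical task table: task name -> (input type, output type).
TASKS = {
    "blastp": ("aa_sequence", "hittable"),
    "extract_ids": ("hittable", "idlist"),
    "getentry": ("idlist", "sequences"),
    "clustalw": ("sequences", "multiplealignment"),
    "treeview": ("multiplealignment", "tree"),
}


def find_workflow(source, target, tasks=TASKS):
    """Breadth-first search from the input type to the output type.

    Returns a shortest list of task names, or None when no chain exists.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        data_type, path = queue.popleft()
        if data_type == target:
            return path
        for name, (needs, gives) in tasks.items():
            if needs == data_type and gives not in seen:
                seen.add(gives)
                queue.append((gives, path + [name]))
    return None
```

Given only the input and output types, `find_workflow("aa_sequence", "multiplealignment")` recovers the intermediate steps without the user naming them.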
Meta Information to Specify the Functionality of a Task
Meta Data for Database                     samples: {uniprot}, {nt}
Meta Data for Input                        samples: {na_sequence,fasta}, {aa_sequence,fasta}
Meta Information for Command and Options   samples: {blastn}, {getentry}
Meta Data for Output                       samples: {ddbjentry,flatfile}, {aablastentry,hittable}
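A minimal sketch of how such meta information could be used to select tasks from a partial specification. The three entries, the dictionary keys, and the `nablastentry` output type are assumptions made for illustration; the remaining sample values follow the slide.

```python
# Hypothetical task entries; each carries the four kinds of meta information.
TASK_DB = [
    {"command": "blastn", "database": "nt",
     "input": ("na_sequence", "fasta"), "output": ("nablastentry", "hittable")},
    {"command": "blastp", "database": "uniprot",
     "input": ("aa_sequence", "fasta"), "output": ("aablastentry", "hittable")},
    {"command": "blastp", "database": "dad",
     "input": ("aa_sequence", "fasta"), "output": ("aablastentry", "hittable")},
]


def matching_tasks(spec, task_db=TASK_DB):
    """Return the tasks whose meta information agrees with every key in spec.

    Keys that are missing from spec act as wildcards.
    """
    return [task for task in task_db
            if all(task.get(key) == value for key, value in spec.items())]


full = matching_tasks({"command": "blastp", "database": "uniprot"})
partial = matching_tasks({"command": "blastp"})
```

Here `full` holds one task and `partial` holds two: leaving the database unspecified widens the set of candidate tasks.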
Task Hierarchy (is_a)
Abstract Task (I -> O)
  Homology search: S -> V
    BLAST: S -> V
    FASTA: S -> V
    SSEARCH: S -> V
Concrete Task (I -> O)
  blastn: N -> id
  blastx: N -> id
  blastp: A -> id
  fasta: N -> id
  rfasta: A -> id
  ...
Legend
  I: Input Type
  O: Output Type
  S: Sequence or Sequence Name
  V: Various Type
  N: Nucleotide Sequence
  A: Amino acid Sequence
  id: Accession ID
  E: Database Entry
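The is_a hierarchy lets an abstract request such as "BLAST" be narrowed to concrete tasks once the input type is known. The sketch below assumes that blastn, blastx, and blastp sit under BLAST and that fasta and rfasta sit under FASTA; the original figure layout is lost, so that placement is a guess, and the function name is hypothetical.

```python
# Parent links (child is_a parent). The placement of the concrete tasks is assumed.
IS_A = {
    "BLAST": "Homology search",
    "FASTA": "Homology search",
    "SSEARCH": "Homology search",
    "blastn": "BLAST",
    "blastx": "BLAST",
    "blastp": "BLAST",
    "fasta": "FASTA",
    "rfasta": "FASTA",
}

# Input and output types of the concrete tasks (N, A, and id as in the legend).
CONCRETE = {
    "blastn": ("N", "id"),
    "blastx": ("N", "id"),
    "blastp": ("A", "id"),
    "fasta": ("N", "id"),
    "rfasta": ("A", "id"),
}


def concrete_tasks(abstract, input_type=None):
    """List concrete tasks below an abstract task, optionally filtered by input type."""
    result = []
    for name, (needs, _gives) in CONCRETE.items():
        node = name
        while node in IS_A:              # walk up the is_a chain
            node = IS_A[node]
            if node == abstract:
                if input_type is None or needs == input_type:
                    result.append(name)
                break
    return result
```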
Task Hierarchy (has_a)
Well-known/user-defined task
  Genome Annotation: S -> E, composed of [glimmer2,blastn,getEntry]
    glimmer2: S -> N
    blastn: N -> id
    getEntry: id -> E
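The has_a relation can be checked mechanically: the parts of a composite task must fit end to end, and the composite inherits the first input type and the last output type. The following sketch encodes the Genome Annotation example under that reading; the function name and data layout are illustrative assumptions.

```python
# Type signatures read from the slide: S sequence, N nucleotide sequence,
# id accession ID, E database entry.
SIGNATURES = {
    "glimmer2": ("S", "N"),
    "blastn": ("N", "id"),
    "getEntry": ("id", "E"),
}

COMPOSITE = {"Genome Annotation": ["glimmer2", "blastn", "getEntry"]}


def composite_signature(name):
    """Derive the (input, output) types of a composite task from its parts.

    Raises ValueError if two adjacent parts do not fit together.
    """
    parts = COMPOSITE[name]
    for left, right in zip(parts, parts[1:]):
        if SIGNATURES[left][1] != SIGNATURES[right][0]:
            raise ValueError(f"{left} output does not match {right} input")
    return SIGNATURES[parts[0]][0], SIGNATURES[parts[-1]][1]
```

For the example on the slide this yields `("S", "E")`, so the composite can be treated as a single task from a sequence to a database entry.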
Prototype for ‘Proof of Concept’
• Language: tuProlog
– Java to Prolog
– Prolog to Java
• Web Service Interface through Java API
• Task Database
– Prolog Clause Database
• Optimal Path Finding
– Bidirectional Breadth-First Search Algorithm
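The prototype itself is written in tuProlog; the Python sketch below only illustrates the general idea of bidirectional breadth-first search, which grows one frontier from the input type and one from the output type and stops when they meet. The task table and type names are hypothetical.

```python
# Hypothetical task table: task name -> (input type, output type).
TASKS = {
    "blastp": ("aa_sequence", "hittable"),
    "extract_ids": ("hittable", "idlist"),
    "getentry": ("idlist", "sequences"),
    "clustalw": ("sequences", "multiplealignment"),
    "treeview": ("multiplealignment", "tree"),
}


def find_workflow_bidirectional(source, target, tasks=TASKS):
    """Alternate full BFS levels from both ends until the two searches meet."""
    if source == target:
        return []
    forward = {source: []}     # data type -> tasks leading from source to it
    backward = {target: []}    # data type -> tasks leading from it to target
    frontier_f, frontier_b = [source], [target]
    expand_forward = True
    while frontier_f and frontier_b:
        next_level = []
        if expand_forward:
            for data_type in frontier_f:
                for name, (needs, gives) in tasks.items():
                    if needs == data_type and gives not in forward:
                        forward[gives] = forward[data_type] + [name]
                        if gives in backward:      # the two searches met
                            return forward[gives] + backward[gives]
                        next_level.append(gives)
            frontier_f = next_level
        else:
            for data_type in frontier_b:
                for name, (needs, gives) in tasks.items():
                    if gives == data_type and needs not in backward:
                        backward[needs] = [name] + backward[data_type]
                        if needs in forward:       # the two searches met
                            return forward[needs] + backward[needs]
                        next_level.append(needs)
            frontier_b = next_level
        expand_forward = not expand_forward
    return None
```

Searching from both ends keeps each frontier shallow, which matters when many tasks accept the same data type.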
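System Overview
[Figure: system architecture. The User supplies a Specification and User Data through the UI and receives a Workflow (User Workflow) and its Execution Result. The Workflow System combines a Prolog Engine (tuProlog), a Web Service Library (Java), and a Workflow Library (Prolog). The Knowledge Base holds the Task Database (1697) and the Web Service Information (1596). Servers: DDBJ, SPBIO.]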
Screen Snapshot
(Workflow Generation Phase)
Screen Snapshot
(Workflow Execution Phase)
Phylogenetic Tree Obtained by a Generated Workflow
Applied to a Human Insulin Sequence
P61982[Mus musculus]
P61981[Homo sapiens]
Q5RC20[Pongo pygmaeus]
P61983[Rattus norvegicus]
Q5F3W6[Gallus gallus]
P68252[Bos taurus]
Q6PCG0[Xenopus laevis]
Q6NRY9[Xenopus laevis]
Q04917[Homo sapiens]
P68509[Bos taurus]
P68511[Rattus norvegicus]
P68510[Mus musculus]
Q6UFZ2[Oncorhynchus mykiss]
Q6PC29[Brachydanio rerio]
Q6UFZ3[Oncorhynchus mykiss]
Lessons from our First Experience
Task Database (prototype)
Web Service Call
  DDBJ Blast            453
  DDBJ SRS              638
  DDBJ GetEntry          38
  DDBJ ClustalW          62
  SPBIO Blast           405
Format Transformation    56
Data Selection           45
In Total               1697
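The counts can be cross-checked against the two figures shown in the System Overview (1697 task entries, 1596 Web Service entries). The short calculation below uses the numbers from the slide; the variable names are illustrative.

```python
# Task counts as listed on the slide.
web_service_calls = {
    "DDBJ Blast": 453,
    "DDBJ SRS": 638,
    "DDBJ GetEntry": 38,
    "DDBJ ClustalW": 62,
    "SPBIO Blast": 405,
}
other_tasks = {"Format Transformation": 56, "Data Selection": 45}

web_service_total = sum(web_service_calls.values())          # 1596
task_total = web_service_total + sum(other_tasks.values())   # 1697
```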
Test Set of Specification
No  Input Format  Input Type  Output Format  Output Type          Workflow Applications
1   fasta         aasequence  gde            aamultiplealignment  blastp uniprot, filter num25, getfasta_swissentry, multiplealignment
2   -             -           -              -                    blastp uniprot, filter num25, getfasta_swissentry, multiplealignment
3   fasta         aasequence  gde            aamultiplealignment  alignmentsearch, filter, getentry, multiplealignment
4   fasta         aasequence  -              -                    filter, getentry, multiplealignment
Differences of Generated Workflow
No.  Meta Data                                  Solution Num  time(ms)
1    Input, Database, Output, Full Cmds         8             11266
2    No input, Database, No output, Full Cmds   41            46704
3    Input, No DB, Output, Partial Cmds         over 100      249906
4    Input, No DB, No output, Partial Cmds      over 100      25297

First Match WebServiceCall
No. 1
ID     Description
1052   alignment by blastp from UNIPROT
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
10005  idlist[25] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession
129    multi fasta format from fasta list
4018   multiplealignment by clustalw with blosum
No. 2
ID     Description
1052   alignment by blastp from UNIPROT
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
1005   idlist[25] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession
129    multi fasta format from fasta list
4001   multiplealignment by clustalw with blosum
No. 3
ID     Description
1043   alignment by blastp from DAD
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
10001  idlist[25] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession
129    multi fasta format from fasta list
4018   multiplealignment by clustalw with blosum
No. 4
ID     Description
1043   alignment by blastp from DAD
102    identify data format and content.
104    extract SeqIdentifier from ddbj BLAST result record
109    extract Swiss-Prot ACNumber from SequenceIdentifier
10001  idlist[5] from idlist[??]
3107   Get SWISSPROT entry of FASTA Format by Accession
129    multi fasta format from fasta list
4001   multiplealignment by clustalw with blosum
Number.
Number.
Number.
Number.
X?
X?
Why Failed?
Amino Acid Sequence (Input) -> blastp on UNIPROT -> HitTable (Output)
Amino Acid Sequence (Input) -> blastp on DAD -> HitTable (Output)
Lack of Interoperability Between the Web Services
Very Similar but not the Same Format
Blastp for UniProt:
sp|Q8HXV2|INS_PONPY Insulin precursor [Contains: Insulin B chain... 171 4e-43
Blastp for DAD:
L15440-1|AAA59179.1| 107|Homo sapiens insulin protein. 177 1e-43
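The two hit lines look alike, yet the identifier sits in a different layout in each, so a step written for one service cannot read the output of the other. The sketch below shows one way a format-aware extractor might handle both. The field interpretation (a UniProt line starts with `sp|ACCESSION|NAME`, a DAD line starts with a DNA entry ID followed by a protein ID) is an assumption based only on the two sample lines, and the function name is hypothetical.

```python
import re

# Leading parts of the two sample hit lines from the slide.
UNIPROT_HIT = "sp|Q8HXV2|INS_PONPY Insulin precursor"
DAD_HIT = "L15440-1|AAA59179.1| 107|Homo sapiens insulin protein."


def extract_accession(hit_line):
    """Return (source, accession) for either hit-line layout.

    Assumed layouts: 'sp|ACCESSION|NAME ...' for UniProt and
    'DNAENTRY-N|PROTEINID| ...' for DAD.
    """
    fields = hit_line.split("|")
    if fields[0] == "sp":
        return "uniprot", fields[1]
    if re.fullmatch(r"[A-Z]+\d+-\d+", fields[0]):
        return "dad", fields[1]
    raise ValueError("unrecognized hit line: " + hit_line)
```

A shared ontology for such identifiers would let the workflow generator insert this kind of conversion step automatically instead of failing.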
Conclusion
• Web Services have great potential for sharing Bioinformatics Data and Tools all over the world
• Automatic Workflow Generation Tools are needed to make full use of Web Services
• Bioinformatics Ontology is a key to establishing Interoperability among Bioinformatics Web Services
Acknowledgement
• Daisuke Shinbara, Tokyo Institute of Technology (Hitachi, Ltd.)
• Sumi Yoshikawa, RIKEN GSC, TITECH
References
Akihiko Konagaya: “Bioinformatics Ontology: Towards the Automatics Generation
of Bioinformatics Workflow for Web Services,” in Proc. of Distributed Applications,
Web Services, Tools and GRID Infrastructures for Bioinformatics (NETTAB2006),
S. Margherita di Pula, Italy (http://www.nettab.org/2006/), pp.75-82 (2006)
Akihiko Konagaya: “OBIGrid: Towards the 'Ba' for Sharing Resources, Services
and Knowledge for Bioinformatics”, in Proc. of Fourth International Workshop on
Biomedical Computations on the Grid (BioGrid), Singapore (CCGRID 2006), 37
(2006)
Thank You for Listening