On-the-fly Link Generation for Workflows in Biology
Yeondae Kwon (1), Yasumasa Shigemoto (1), Yoshikazu Kuwana (1), Hideaki Sugawara (1)
[email protected], [email protected], [email protected], [email protected]
(1) National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan
Keywords: Web service, SOAP, REST, workflow, workflow navigation system
1 Introduction
Many biological data resources, such as databases and analytical tools, are accessible through the
Internet. However, it is laborious, and sometimes impossible, to write a computer program that finds a
useful data source, sends a proper query, and processes the output. This difficulty is a serious obstacle
to the integration of distributed heterogeneous data sources. To address it, the DNA Data Bank of Japan
(DDBJ) provides Web-based systems for biological analysis: Web APIs for biology (WABI,
http://www.xml.nig.ac.jp/), workflows, and a workflow navigation system (http://cyclamen.ddbj.nig.ac.jp/)
that guides users by presenting the next possible tasks in a Web browser. WABI also provides
wiki-style Web pages, called the Cookbook, for sharing know-how on using WABI services, such as “How can
we obtain a BLAST result in XML format?”
2 DDBJ Web APIs for Biology
DDBJ currently provides 133 Web APIs (methods) from 22 services, such as keyword search, data retrieval,
and homology search, with both SOAP and REST interfaces (Table 1) [1, 2]. These methods can be used as
building blocks for developing customized workflows. WABI also lets users retrieve the results of
time-consuming methods asynchronously.
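The asynchronous retrieval mentioned above typically follows a submit, poll, fetch pattern. The sketch below illustrates that pattern only; it does not call WABI. `FakeAsyncService`, its method names, and `run_async` are hypothetical stand-ins invented for this example, and a real client would replace them with the SOAP or REST calls documented by the service.

```python
class FakeAsyncService:
    """In-memory stand-in for a remote service with asynchronous methods.

    Hypothetical: the class and its method names are assumptions made for
    illustration and are not part of the WABI interface.
    """

    def __init__(self, polls_until_done=3):
        self.polls_until_done = polls_until_done
        self._jobs = {}

    def submit(self, query):
        # Register the job and return an identifier immediately.
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = {"query": query, "polls": 0}
        return job_id

    def is_finished(self, job_id):
        # Pretend the job completes after a fixed number of status checks.
        job = self._jobs[job_id]
        job["polls"] += 1
        return job["polls"] >= self.polls_until_done

    def fetch_result(self, job_id):
        return f"result for {self._jobs[job_id]['query']}"


def run_async(service, query, max_polls=10, wait=lambda: None):
    """Submit a query, poll until it finishes, then fetch the result."""
    job_id = service.submit(query)
    for _ in range(max_polls):
        if service.is_finished(job_id):
            return service.fetch_result(job_id)
        wait()  # e.g. time.sleep(5) against a real server
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")
```

Passing the waiting behaviour in as `wait` keeps the sketch fast to run locally; against a real server it would be a sleep with a sensible interval.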
Table 1: Provided Web APIs

Service name (number of Web APIs)            Service description
-------------------------------------------  ------------------------------------------------------
DDBJ(7), ARSA(4), GetEntry(44)               Keyword search and data retrieval against 20 public
                                             databases.
Blast(6), ClustalW(4), Mafft(4), Fasta(5),   Analysis functions such as homology search and
VecScreen(4)                                 multiple alignments.
Gib(11), Gtop(3), GTPS(8), GIBV(8),          DDBJ original database system (microbial/virus genome,
GIBEnv(1), GIBIS(1), SPS(2)                  insertion sequence, environmental sequence,
                                             re-evaluation of ORF in genome, protein structure).
TxSearch(5), RefSeq(1), GO(3), Ensembl(4),   Useful databases developed by other institutes.
OMIM(2), NCBIGenomeAnnotation(4)
3 DDBJ Workflows and UML Notations
A workflow is a series of tasks. DDBJ currently provides 8 predefined workflows so that typical analysis
procedures can be carried out without any programming (Table 2). Each workflow is constructed by
combining several Web APIs. The semantics of each workflow are defined using Unified Modeling Language
(UML) notations so that users can understand its function unambiguously. Figure 1 shows the UML
activity diagram of the homology workflow as an example.
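The idea of a workflow as a series of tasks, each consuming the output of the previous one, can be sketched as follows. This is a minimal illustration under stated assumptions: the three task functions are hypothetical local stand-ins that return made-up placeholder values, not real WABI methods, and the predefined DDBJ workflows may be implemented quite differently.

```python
def run_workflow(tasks, initial_input):
    """Run tasks in order, feeding each task's output to the next one.

    Returns the final output and a trace of (task name, output) pairs.
    """
    data = initial_input
    trace = []
    for task in tasks:
        data = task(data)
        trace.append((task.__name__, data))
    return data, trace


# Hypothetical stand-in tasks; a real workflow would call Web APIs here.
def keyword_search(keyword):
    return [f"{keyword}_hit{i}" for i in (1, 2)]


def retrieve_entries(identifiers):
    return {identifier: f"SEQ({identifier})" for identifier in identifiers}


def count_entries(entries):
    return len(entries)


result, trace = run_workflow(
    [keyword_search, retrieve_entries, count_entries], "kinase"
)
```

Keeping the trace makes each intermediate result inspectable, which is useful when one step of a longer workflow returns something unexpected.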
Table 2: Provided Workflows

Workflow                         Description
-------------------------------  ------------------------------------------------------------------
BLAST workflow                   Run multiple BLAST against DDBJ, UniProtKB/Swiss-Prot, and PDB.
Blast-ClustalW workflow          Run blastn and compare alignment regions of highly similar sequences.
Homology workflow                Search other species which have genes similar to human genes.
Human chromosome gene workflow   Show the number of genes on each chromosome.
Nucleotide frequency workflow    Report the pattern of nucleotide frequency distribution.
OMIM workflow                    Compare the similarities of human disease genes among eukaryotes.
SNP workflow                     Extract the relation between a human gene and SNP.
Splicing workflow                Compare the similarities between splicing structure and homology.
Figure 1: UML notation of the homology workflow
3 DDBJ Workflow Navigation System
The DDBJ workflow navigation system aims to help biologists who do not program perform analysis tasks.
It presents the next applicable services in a Web browser according to the output of the previously
selected service. This eliminates the need for any programming: users only need to select the service
they would like to execute from the list of executable services (Figure 2). The list of services is
generated from dictionaries and meta-information of services (Figure 3).
Figure 3: Workflow navigation system
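One way to picture how a list of next applicable services could be derived from meta-information is to match the output type of the previous service against the input types of the others. The sketch below assumes exactly that and nothing more: the service names, the type labels, and the synonym dictionary are all hypothetical, and the source does not describe the actual contents of the DDBJ dictionaries or metadata.

```python
# Hypothetical meta-information: each service declares one input and one
# output data type. These names are illustrative, not real WABI services.
SERVICE_META = {
    "keyword_search": {"input": "keyword", "output": "ids"},
    "entry_retrieval": {"input": "accession_list", "output": "fasta"},
    "homology_search": {"input": "sequence", "output": "hit_list"},
    "multiple_alignment": {"input": "sequence", "output": "alignment"},
}

# Hypothetical dictionary mapping synonymous type labels to a canonical one.
TYPE_DICTIONARY = {"fasta": "sequence", "ids": "accession_list"}


def canonical(type_name):
    """Resolve a type label to its canonical form via the dictionary."""
    return TYPE_DICTIONARY.get(type_name, type_name)


def next_services(previous, meta=SERVICE_META):
    """List services whose input type matches the previous service's output."""
    produced = canonical(meta[previous]["output"])
    return sorted(
        name
        for name, info in meta.items()
        if canonical(info["input"]) == produced
    )
```

With this toy metadata, selecting `entry_retrieval` would offer `homology_search` and `multiple_alignment` as next steps, because both accept the canonical `sequence` type.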
4 Discussions
We plan to verify case studies of the Semantic Web in biology and to investigate the possibility of
semantic Web APIs for biology.
Figure 2: Example of workflow navigation system
References
[1] Sugawara, H. and Miyazaki, S., Biological SOAP servers and Web services provided by the public
sequence data bank, Nucleic Acids Res., 31(13):3836-3839, 2003.
[2] Kwon, Y., Shigemoto, Y., Kuwana, Y., and Sugawara, H., Web API for biology with a workflow
navigation system, Nucleic Acids Res., 37:W11-W16, 2009.