Overall workflow description
This workflow performs a BLAST search, then compares the result with a previous BLAST result using the specified filters.
This workflow takes in Entrez gene ids and adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database, which returns the ids of the relevant pathways.
Services descriptions
7 blast_ddbj (searchSimple): Executes BLAST with specified program, database and query [local_aligning]
inputs:
program: Specify blast type used: blastn, blastp, blastx, tblastn or tblastx
database: Specify database: e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix
query: nucleotide or protein sequence [biological_sequence]
Output
Result: result of blast execution [BLAST_report]
Inputs:
1 program: blast type: blastn, blastp, blastx, tblastn or tblastx
2 database: e.g. SWISS, NCBI, EMBL, DDBJ
3 query : nucleotide or protein sequence
8 blastfilecomparer: Compares a new BLAST output to an older BLAST output to identify new hits [filtering]
Inputs
blastResult_direct_data: BLAST result file [BLAST_report].
Use either this parameter or the blastResult_url parameter as input, but not both together.
blastResult_url: URL of the BLAST result [BLAST_report].
oldRefFile_direct_data: old BLAST result file [BLAST_report].
Use either this parameter or oldRefFile_url as input, but not both together.
oldRefFile_url: URL of the old BLAST result
species: filter the result by species name
chromo: filter the result by chromosome number
advanced: words to look for in the FASTA definition line
Output
report: Return a filtered blast result.
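The comparison step can be pictured with a minimal Python sketch. It assumes both BLAST reports have already been parsed into hit records. The `Hit` fields and the matching rules are hypothetical and are not taken from the blastfilecomparer service itself. Those rules are a case-insensitive species substring match, an exact chromosome match, and every `advanced` word present in the definition line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit (hypothetical fields for illustration)."""
    gi: str
    species: str
    chromosome: str
    definition: str  # FASTA definition line


def new_hits(new, old, species=None, chromo=None, advanced=None):
    """Return GI numbers of hits in `new` that do not occur in `old`,
    optionally filtered by species, chromosome and definition-line words."""
    old_gis = {hit.gi for hit in old}
    words = advanced.lower().split() if advanced else []
    result = []
    for hit in new:
        if hit.gi in old_gis:
            continue  # already reported by the older BLAST run
        if species and species.lower() not in hit.species.lower():
            continue
        if chromo and hit.chromosome != chromo:
            continue
        if not all(w in hit.definition.lower() for w in words):
            continue
        result.append(hit.gi)
    return result
```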
4 OldBlastResult: blast result
6 species_filter: species name
5 chromosome_filter: chromosome number
Outputs:
9 blast_output: result of blast execution
10 Compared_output: return a list of GI numbers
6 Split_gene_ids: beanshell script to extract the KEGG id from the record returned by the “Kegg_gene_ids_all_species” operation
Input
input: result returned by Kegg_gene_ids_all_species operation
Output
output: Return gene KEGG id [KEGG_genes_id]
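Below is a minimal sketch of what Split_gene_ids does, written in Python rather than BeanShell. It assumes the bconv record holds one mapping per line, with tab-separated fields for the external id, the KEGG id and an optional relation type (for example `ncbi-geneid:3113<TAB>hsa:3113<TAB>equivalent`). That layout is an assumption, so check it against a real response.

```python
def split_gene_ids(bconv_record: str) -> list[str]:
    """Extract KEGG gene ids (second tab-separated field) from a bconv-style record.

    Assumed line layout: '<external id>\t<KEGG id>[\t<relation type>]'.
    """
    kegg_ids = []
    for line in bconv_record.splitlines():
        fields = line.strip().split("\t")
        if len(fields) >= 2:  # skip blank or malformed lines
            kegg_ids.append(fields[1])
    return kegg_ids
```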
Inputs
1 gene: Entrez gene id
2 Gi_numbers: Entrez gene id
Outputs
7 Lister: Lists each element of a given file so that it can be used by subsequent operations
Input
File: file containing the elements to be listed
Output
listerReturn: return each element of the file to be used by subsequent operations
4 Kegg_strings: KEGG gene id
11 merged_kegg_pathways: KEGG pathways
Services descriptions
5 Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]
Input
string: External ID; in this workflow, an Entrez gene id [Entrez_Gene_ID]
Output
return: KEGG gene ID [KEGG_gene_id]
Add_ncbi_to_string: beanshell script to add “ncbi-geneid:” to Entrez gene ids.
Input
input: Entrez gene id [Entrez_Gene_ID]
Output
output: Return the Entrez gene id prefixed with “ncbi-geneid:”
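The prefixing step is simple enough to show directly. This Python sketch mirrors what the Add_ncbi_to_string BeanShell script is described as doing. Three behaviours are assumptions added for robustness: trimming whitespace, skipping empty entries, and leaving ids that are already prefixed untouched.

```python
PREFIX = "ncbi-geneid:"


def add_ncbi_prefix(entrez_ids: list[str]) -> list[str]:
    """Prepend 'ncbi-geneid:' to each Entrez gene id."""
    prefixed = []
    for gene_id in entrez_ids:
        gene_id = gene_id.strip()
        if not gene_id:
            continue  # ignore empty entries
        if not gene_id.startswith(PREFIX):
            gene_id = PREFIX + gene_id
        prefixed.append(gene_id)
    return prefixed
```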
8 Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]
Input
genes_id_list: List of KEGG genes id [KEGG_genes_id]
Output
return: Return a list of pathway_id of specified KEGG gene ids [KEGG_record_id]
9 & 10 merge_pathways & mergePathways2 (Merge string list to string): concatenate a list of strings
Inputs
stringlist: list of strings to concatenate
separator: separator to use between strings
Output
concatenated: Return concatenated string
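Here is a Python sketch of the merge operation. It assumes a plain join with the separator placed between items and no trailing separator. The pathway ids in the test are illustrative.

```python
def merge_string_list(stringlist: list[str], separator: str = "\n") -> str:
    """Concatenate `stringlist`, placing `separator` between consecutive items."""
    return separator.join(stringlist)
```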
Services descriptions
3 blastsimplifier: Simplifies BLAST output by specifying which elements (seq_id, gi, acc, desc, score, bits, per, p, exp) are displayed in the BLAST result output [filtering].
Inputs
new_direct_data: BLAST report file [BLAST_report]. Mutually exclusive with the new_url parameter.
new_url: URL of the BLAST report file [BLAST_report].
The following parameters are optional. To select one of them, pass the parameter's own name as its value. For example, to display GI numbers, pass "gi" to the parameter gi.
seq_id: sequence identifier
gi: GI number
acc: accession number
desc: description
score: score value
bits: bit score
per: percentage identity
p: p-value
exp: E-value
Output
report: return a simplified BLAST report
Overall Workflow Description
This workflow simplifies a BLAST text file into identifiers, descriptions and values (P, E-values).
To extract the relevant ids etc., pass the relevant string into the corresponding port. For example, the default port used is gi, which has been passed "gi". For any other port, simply pass in a string identical to the port name, e.g. seq_id, p, per.
Inputs
1 blast_file: blast result
Outputs
Simplified_output: list of GI numbers
Services descriptions
Overall workflow description
This workflow extracts gene information and the relevant SwissProt ids given Ensembl gene ids.
Inputs
1 genes_in_region: List of Ensembl gene ids.
Outputs
8 gene_info: return gene information
9 swiss_ids: return SwissProt ids
Services descriptions
Overall workflow description
This workflow takes the list of GI numbers from a given BLAST report and retrieves the corresponding GO ids.
Inputs:
1 blast_report: blast result
2 gi_number: the string "gi", used to retrieve GI numbers
3 regex: regex value to use for the split_by_regex operation
5 blastsimplifier: Simplifies BLAST output by specifying which elements (seq_id, gi, acc, desc, score, bits, per, p, exp) are displayed in the BLAST result output [filtering].
Input:
new_direct_data: BLAST report file [BLAST_report].
Parameter mutually exclusive with the “new_url” parameter
new_url: URL of the BLAST report file [BLAST_report]
To choose one of the following inputs, pass the name of the input as the parameter value.
For example, to display GI numbers, pass gi as the value for the parameter gi.
seq_id: sequence identifier
gi: For GI number
acc: For accession number
desc: for descriptions
score: for score value
bits: for bit score
per: for percentage of identity.
p: for p-value
exp: for E-value
Output:
report: a brief summary of the result
output: list of the specified elements; here, a list of GI numbers.
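To illustrate what selecting "gi" yields, the following Python sketch pulls GI numbers out of BLAST text output. It assumes the report uses NCBI's legacy `gi|<number>|...` identifier style. blastsimplifier's own parsing is not documented here, so treat this as an approximation.

```python
import re

# Assumption: hit identifiers look like "gi|<number>|<db>|<accession>|".
GI_PATTERN = re.compile(r"\bgi\|(\d+)\|")


def extract_gi_numbers(blast_text: str) -> list[str]:
    """Return the unique GI numbers in a BLAST text report, in order of first appearance."""
    seen = []
    for gi in GI_PATTERN.findall(blast_text):
        if gi not in seen:
            seen.append(gi)
    return seen
```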
6 split_by_regex (Split string into string list by regular expression): split a given string with
a specified regular expression (regex)
Input:
String: string to split
Regex: regular expression
Output:
split: return split string
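Here is a Python sketch of the split step using the standard `re` module. Dropping empty pieces, such as the one produced by a trailing newline, is an assumption of this sketch and not documented behaviour of the Taverna operation.

```python
import re


def split_by_regex(string: str, regex: str) -> list[str]:
    """Split `string` on every match of `regex`, dropping empty pieces."""
    return [piece for piece in re.split(regex, string) if piece]
```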
7 Merge_string_list_to_string: Merge a list of strings
Input:
stringlist: list of strings to merge
seperator: separator used when merging the list of strings
Output:
concatenated: Return concatenated string
8 GOIDFromGiList: retrieves an array of GO ids for a specified array of GIs [retrieving]
Input:
giList: list of GI number [genbank_GI]
Output:
result: list of GO id [Gene_Ontology_term_id]
4 seperator: separator to use between strings
Outputs:
9 Gi_numbers: list of GI numbers
10 GO_id: list of GO id
Services descriptions
Overall workflow description
This workflow retrieves an EMBL sequence in FASTA format, then runs a BLAST search on it.
Inputs
1 emblid_default: embl sequence identifier
2 Blast_db: blast database. e.g. SWISS
3 Blast_program: blast program. e.g. blastn
Outputs
6 Fasta_output: nucleotide sequence in fasta format
7 Blast_result_ddbj: Blast result
4 Blastx_ddbj (searchSimple): Execute BLAST with specified program, database and query [local_aligning].
inputs
program: Blast type used: blastn, blastp, blastx, tblastn or tblastx
database: blast database: e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix
query: Nucleotide or protein sequence in fasta format or without format [biological_sequence]
Output
Result: Return the result of blast execution [BLAST_report]
5 getFASTA: Get DDBJ entry of FASTA Format by Accession Number [Retrieving]
Input
accession: embl/DDBJ/NCBI accession number [DDBJ_accession] [EMBL_accession] [genebank_gene_accession]
Output
Result: Return a nucleotide sequence in fasta format [nucleotide_sequence].
3 Split_by_regex (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
Input
String: string to split
Regex: regular expression
Output
split: return split string
2 regex: Regex value to use for the “split_by_regex” operation
5 options: option value used to extract a piece of data from the “parse_ddbj_gene_info” output file, e.g. swiss
2 gi_option: here we want to retrieve only the GI numbers from the BLAST output.
4 getGeneInfo: retrieves gene information given an Ensembl gene id [retrieving]
Input
geneId: Ensembl gene id [ensembl_record_id]
Output
Result: Return gene info of specified Ensembl gene id [Ensembl_record]
6 Parse_ddbj_gene_info: extracts information from the output of the DDBJ (DNA Data Bank of Japan) getGeneInfo processor [retrieving]
Input
file_direct_data: ‘getGeneInfo’ output result [Ensembl_record]
option: used to extract a piece of data from output file. e.g. swiss
Output
Output: return the extracted piece of data
7 parse_swiss: Beanshell script to extract only the SwissProt ids from the “parse_ddbj_gene_info” output record.
Input
input: parse_ddbj_gene_info output record with ‘swiss’ as option.
Output
output: Return swissprot ids [SWISS-PROT_accession]
Services descriptions
Overall workflow description
This workflow builds a subgraph of the Gene Ontology around a given GO term id, showing the context of the supplied term or terms
Inputs
1 termID: GO term id. e.g. GO:0007601
7 childColour: colour used to mark children
9 ancestorColour: colour used to mark ancestors
5 & 6 getChildren & getImmediateChildren (getChildren): Retrieves the IDs of all immediate children of a specified GO ID [Retrieving].
Input
geneOntologyID: GO ID whose children should be returned [Gene_Ontology_term_id].
Output
getChildrenReturn: Return the IDs of all immediate children of the specified term [Gene_Ontology_term_id].
Outputs
2 getParents: Retrieves the IDs of all immediate parent terms of specified GO ID [retrieving]
Input
geneOntologyID: GO ID whose parent terms should be returned [Gene_Ontology_term_id]
Output
getParentsReturn: Return the IDs of all immediate parent terms of the specified term [Gene_Ontology_term_id]
3 getAncestry (getAncestors): Retrieves the IDs of all ancestors of specified GO ID [retrieving].
Input
geneOntologyID: GO ID whose ancestors should be returned [Gene_Ontology_term_id]
Output
getAncestorsReturn: Return the IDs of all ancestors of the specified term [Gene_Ontology_term_id].
4 Create (createSession): Takes no arguments; creates a new GoViz session on the server and returns a session identifier that can be used in subsequent operations.
Output
createSessionReturn: Return a session identifier that can be used in subsequent operations.
12 colourInputTerm: specifies the colour of the given terms.
17 graphical: Return a sub graph of the Gene Ontology given a GO id.
8 & 10 addImmediateChildren & add (addTerm): Add a GO term to the visualisation, updating the state of the named session.
Input
SessionID: Session ID returned by the createSession operation.
geneOntologyID: GO ID of the term to add [Gene_Ontology_term_id]
Services descriptions
Overall workflow description
This workflow retrieves the protein sequence, pathways, GO diagram, MEDLINE info, BLAST result and EC numbers of a given probe set id.
Inputs
1 ProbSetid: probe set id
13 database: blast database
14 program: blast program used
Outputs
28 swissprot: protein sequence
29 interproIds: InterPro ids
30 goDiagram: GO diagram
31 pathways: pathway diagram
32 ecNumbers: enzyme EC number
33 embl: nucleotide sequence in EMBL format
34 meltTemp: nucleotide sequence melting temperature
35 medline: medline info
36 Blast_result: Blast result
37 medlineIds: medline id
Services descriptions
2 getMolFuncGoIds: Retrieves GO id of specified probe set id [retrieving]
Input
probSetid: probe set id [probe_id]
Output
getGeneOntologyMolecularFunctionReturn: Return a GO id [Gene_Ontology_term_id]
3 getEC: Retrieves enzyme EC number of specified probe set id [retrieving]
Input
probeSetId: probe set id [probe_id]
Output
getECReturn: Return EC number [EC_number]
4 getEmblid: Retrieves EMBL id of specified probe set id [retrieving]
Input
probeSetId: probe set id [probe_id]
Output
Return: Return EMBL id [EMBL_accession]
11 & 13 & 14 markAncestors & colourChildren & colourInputTerm (markTerm): Adds a specific colour parameter to the supplied term in the Gene Ontology.
Inputs
SessionID: Session ID returned by the createSession operation.
geneOntologyID: GO ID of the term to colour [Gene_Ontology_term_id]
colour: any colour that is valid within the DOT file format. For the list of colours, see appendix.
15 getresults (getDot): Retrieves the DOT text specifying the subgraph of the Gene Ontology that contains all the terms that have been added to the session [retrieving]
Input
sessionID: Session ID returned by the createSession operation.
Output
Return the DOT text specifying the subgraph of the Gene Ontology.
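A GoViz session (createSession, then addTerm and markTerm calls, then getDot, then destroySession) ultimately produces DOT text. The following minimal Python sketch shows what such DOT output could look like. It assumes the subgraph is given as (child, parent) edges and the colours as a dict. The graph name, node attributes and edge direction are assumptions, not GoViz's actual output, and the example edge in the test is illustrative.

```python
def build_go_dot(edges, colours):
    """Build DOT text for a GO subgraph.

    edges: iterable of (child_id, parent_id) pairs.
    colours: dict mapping GO id -> DOT colour name for highlighted terms.
    """
    edges = list(edges)  # iterated twice below
    lines = ["digraph GO {"]
    terms = sorted({term for edge in edges for term in edge} | set(colours))
    for term in terms:
        # Filled nodes for coloured terms; plain nodes otherwise.
        attrs = f' [style=filled, fillcolor="{colours[term]}"]' if term in colours else ""
        lines.append(f'  "{term}"{attrs};')
    for child, parent in edges:
        lines.append(f'  "{child}" -> "{parent}";')
    lines.append("}")
    return "\n".join(lines)
```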
16 Finish (destroySession): Removes a session from the server, identified by the session ID returned by the createSession operation.
Input
SessionID: Session ID returned by the createSession operation.
5 cleanECnumbers: beanshell script to extract the EC number from the “getEC” service output
Input
output of the “getEC” service execution
Output
ecNumber: Return EC number [EC_number]
6 cleanGoIds: beanshell script to extract GO id from GO record returned by “getMolFuncGoIds”.
Input
GO records from “getMolFuncGoIds” execution
Output
goIds: Return GO id [Gene_Ontology_term_id]
7 createVizSession (createSession): Create a new GoViz (Gene Ontology Visualisation Service) session on the server
Output
returns a session identifier that can be used in subsequent operations
8 getSwissProtId: get a swissprot id of specified probe set id [retrieving]
Input
probeSetId: probe set id [probe_id]
Output
getSwissProtIdReturn: Return swissprot id [SWISS-PROT_accession]
9 addTermToViz (addTerm): Add a GO term to the visualisation, updating the state of the named session.
Inputs
sessionID: session identifier created by “createVizSession” web service
geneOntologyID: GO id [Gene_Ontology_term_id]
Overall workflow Description
This workflow performs a sequence similarity search using the BLAST algorithm through the
DDBJ (DNA Data Bank of Japan) web service
Inputs:
1 program: blast type: blastn, blastp, blastx, tblastn or tblastx
Services descriptions
10 getPathwaysByECNumbers: get pathways by enzyme EC number [retrieving]
Input
enzyme_id_list: list of enzyme EC number [EC_number]
Output
Return: return pathway ids [KEGG_record_id]
11 getMedlineIds (ebi_srslinks): For cross-referencing between databanks [retrieving].
In this workflow, it retrieves MEDLINE ids given an EMBL id.
Inputs
databank: database name of the record to be linked from.
fieldname: the databank can be queried by a number of fields (e.g. acc, All text)
searchterm: search term, multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
xrefDatabank: the databank to be linked to. See appendix for the list of databanks
Outputs
report: summary of the result
result: Result of ebi_srslinks execution. This case: medline ids [MEDLINE_reference_id]
15 removePrefix: beanshell script to remove prefix “MEDLINE:” from “getMedlineIds” output.
Inputs
str: string containing the prefix to be removed.
prefix: prefix to remove
Output
id: Return medline id [MEDLINE_reference_id]
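Here is a Python sketch of removePrefix. It assumes the prefix is removed only when it appears at the very start of the whitespace-trimmed string. The ids in the test are made up.

```python
def remove_prefix(s: str, prefix: str = "MEDLINE:") -> str:
    """Strip `prefix` from the start of `s` if present; otherwise return `s` trimmed."""
    s = s.strip()
    return s[len(prefix):] if s.startswith(prefix) else s
```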
22 cleanInterProIds: beanshell script to extract InterPro ids from the InterPro record returned by the “getInterProIds” service.
Input
inputStr: InterPro record returned by the “getInterProIds” service
Output
InterProIds: Return interPro ids [InterPro_accession]
16 ebi_embl: retrieves embl records given search term(s) [retrieving]
Inputs
Fieldname: the databank can be queried by a number of fields (see appendix)
Searchterm: search term; multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
Outputs
report: summary of the result
result: Return Embl record [embl_record]
23 destroyVizSession (destroySession): Removes a session from the server, identified by the session ID returned by the createVizSession operation.
Input
sessionID: session ID returned by createVizSession operation.
26 Ebi_medline2007: retrieves medline record given a search term [retrieving]
Inputs
Fieldname: the databank can be queried by a number of fields (see appendix)
Searchterm: search term; multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
Outputs
result: return medline record [MEDLINE_citation]
Output:
5 text_blast_out: result of blast execution
Service description
4 searchSimple: Executes BLAST with specified program, database and query [local_aligning]
Inputs
program: Specify blast type used: blastn, blastp, blastx, tblastn or tblastx
database: Specify database: e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix
query: nucleotide or protein sequence [biological_sequence]
output
Result: result of blast execution [BLAST_report]
20 splitString (Split string into string list by regular expression): split a record by a given regular expression
Inputs
string: string to be split
regex: regular expression used to split a given string
Output
split: return split string
21 Ebi_uniprot: retrieves Uniprot records given search term(s) [retrieving]
Inputs
Fieldname: the databank can be queried by a number of fields (see appendix)
searchterm: search term, multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
Outputs
result: Return uniprot record [Uniprot_record]
25 calcMeltTemp: Calculates RNA/DNA melting temperature [calculating]
Inputs
sequence_usa: the Uniform Sequence Address. Mutually exclusive with sequence_direct_data
sequence_direct_data: Nucleotide or protein sequence in specified format [Biological_sequence]
sformat: optional parameter. Sequence format (see appendix for all possible formats)
sbegin: optional parameter. the first position to be used in the sequence.
send: optional parameter specify the last position to be used in the sequence
sprotein: optional parameter. Is sequence protein?
snucleotide: optional parameter. Is sequence nucleotide?
sreverse: optional parameter. Use reverse sequence
slower: optional parameter. Use lower case
supper: optional parameter. Use upper case
windowsize: optional parameter. Specify window size (see appendix)
shiftincrement: optional parameter. specify Shift Increment (see appendix)
dnaconc: optional parameter. specify DNA concentration (nM)
saltconc: optional parameter. specify salt concentration (mM)
graph_format: optional parameter. Format of the graphical output (png, postscript, colourps, hpgl)
rna: optional parameter. Use RNA data values
product: optional parameter. Prompt for product values
formamide: optional parameter. specify percentage of formamide
mismatch: optional parameter. specify percent mismatch
prodlen: optional parameter. specify product length
thermo: optional parameter. Thermodynamic calculations
temperature: optional parameter specify temperature in Celsius
plot: optional parameter. produce a plot
mintemp: optional parameter. minimum temperature
Outputs
Outfile: Return DNA/RNA melting temperature.
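For a rough sanity check of what a melting temperature depends on, the following Python sketch applies the basic GC-content formula commonly quoted for oligonucleotides longer than about 13 nt: Tm = 64.9 + 41 * (nG + nC - 16.4) / N. This is a swapped-in approximation and not the calculation calcMeltTemp performs. That service's parameter list (windowsize, dnaconc, saltconc, formamide, thermo) suggests a more detailed thermodynamic model. The sketch also ignores salt, DNA concentration and formamide.

```python
def basic_tm(seq: str) -> float:
    """Approximate DNA melting temperature in degrees Celsius using the basic
    GC-content formula Tm = 64.9 + 41 * (nG + nC - 16.4) / N."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    gc = s.count("G") + s.count("C")
    return 64.9 + 41 * (gc - 16.4) / n
```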
3 query_seq: nucleotide or protein sequence
19 getInterProIds: get interPro records of specified probe set Id [retrieving]
Input
probeSetId: probe set Id [probe_id]
Output
getInterProReturn: return interPro record [InterPro_record]
12 getFASTA: Get DDBJ entry of FASTA Format by Accession Number [Retrieving]
Input
accession: embl/DDBJ/NCBI accession number [DDBJ_accession] [EMBL_accession] [genebank_gene_accession]
Output
Result: Return a nucleotide sequence in fasta format [nucleotide_sequence]
17 mark_pathway_by_objects: Mark given objects on a given pathway map [displaying].
Inputs
pathway_id: pathway id [KEGG_record_id]
object_id_list: list of EC numbers without the prefix “EC”
Output
return: Return the URL of the generated pathway map [KEGG_record]
2 database: e.g. SWISS, NCBI, EMBL, DDBJ
18 getDotFromViz (getDot): Returns the DOT text specifying the subgraph of the GO that contains all the terms that have been added to this session using “addTermToViz” calls, plus all the ancestors of those terms.
input
sessionID: session identifier created by “createVizSession” web service
24 getPathwayDiagrams (Get image from URL): retrieves the image at the given URL
Input
URL: URL of an image or diagram
Output
image: Return the image corresponding to a given URL.
Services descriptions
6 hsapiens_gene_ensembl: This biomart processor has been configured to retrieve Ensembl
human gene ids and associated GO terms given chromosome number, start and
end position [retrieving]
Inputs
chromosome_name_filter: chromosome number
end_filter: end position to use for the query
start_filter: start position to use for the query
Outputs
go_description: return GO term description
go: return GO term id [Gene_Ontology_term_id]
ensembl_gene_id: return ensembl gene id [ensembl_record_id]
7 genesLocations: retrieves the location of a gene on a genome using its identifier [retrieving]
Inputs
genesIds: gene identifier, e.g. BRCA2, ENSG00000128573 [Ensembl_record_id]
species: species name. e.g. homo sapiens
format: format of the gene id list. e.g. plain
Output
genesLocationsReturn: Return the location of genes on a given chromosome.
Overall workflow description
This workflow first retrieves Ensembl gene ids and associated GO terms given a chromosome number and start and end positions, then displays the genes on a karyotype.
DDBJ_blastx (searchSimple): Execute BLAST with specified program, database and query [local_aligning].
Inputs
program: Blast type used: blastn, blastp, blastx, tblastn or tblastx
database: blast database: eg. SWISS, NCBI, EMBL, DDBJ. For all possible databases see appendix
query: Nucleotide or protein sequence in fasta format [biological_sequence]
Output
Result: Return blast report [BLAST_report]
Inputs
1 chromosome: chromosome number, e.g. 12
2 end: chromosome end position to be used
3 start: chromosome start position to be used
4 species: species name. e.g. homo sapiens
5 plain_format: format type. e.g. plain
Outputs
12 Image: karyotype image
13 GO_description: GO term description
14 GO_id: GO term ids
15 ens_gene_id: Ensembl gene ids.
8 split_pos (Split string into string list by regular expression): split a given string with a
specified regular expression (regex)
Input
String: string to split
Regex: regular expression (here:“\n”)
Output
split: return split string
9 getKaryoviewImage: Returns a representation of the karyotype of the given species, with the features you want to locate on it [displaying]
Inputs
position: position of gene on the chromosome
species: species name. e.g. homo sapiens
chromosome: chromosome number
Outputs
getKaryoviewImageReturn: return the URL and html file of karyotype
10 getImage: Beanshell script, extracts the URL of the karyotype
Input
tabResult: result of the “getKaryoviewImage” service
Output
url: return the URL of the karyotype.
11 Get_image_from_URL: retrieves the image given the URL
input
url: URL of the image
Output
Image: Return the image at the specified URL
Services descriptions
3 getP53MutationIdsByExon: Get TP53 gene mutation ids by exon from IARC TP53 Database catalogue [retrieving]
Input
libs: Specifies the name (constant) of the TP53 somatic mutation database that must be queried, e.g. tp53_iarc
exon: Exon number in the p53 gene
Output
result: Return exon ids including catalogues' names. e.g. TP53_IARC:9339
Overall workflow description
Takes a GenBank identifier (a gi number), retrieves the corresponding sequence, runs a BLAST search against Arabidopsis proteins and returns the AGI (Arabidopsis Genome Initiative) locus code for the best hit.
Inputs
1 string_constant: Biomoby namespace
Services descriptions
3 Object: BioMoby object
Inputs
namespace: BioMoby name space
id: NCBI_Acc, NCBI_gi, PIR, SwissProt, Embl, or PDB identifier
article_name: BioMoby article name
Output
mobydata: return BioMoby data
9 getP53MutationssByIds: Get TP53 gene mutations by ids from TP53 IARC database [retrieving]
Input
id: Exon id without catalogues' names. e.g. 9339
Output
result: Return TP53 somatic mutation description.
4 MOBYSHoundGetGenBankWhateverSequence: Consumes an NCBI_Acc, NCBI_gi, PIR, SwissProt, Embl or PDB identifier and returns the equivalent GenBank record as a DNA, RNA or AminoAcid sequence object, as appropriate.
Input
object (identifier): output of Biomoby “Object” service.
Output
GenericSequence(file): Returns the sequence associated with a given identifier [biological_sequence]
9 AGI: AGI locus code
10 identity: sequence identity between the query sequence and the best hit
8 Filter_list_of_strings_extracting_match_to_a_regex: extracts the matches of a given regex from a list of strings.
Inputs
stringlist: list of strings to extract from
regex: regular expression to extract.
Output
filteredlist: return extracted string
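Here is a Python sketch of the filtering step. It assumes that strings without a match are dropped and that only the first match in each string is kept. The AGI-style pattern in the test is illustrative.

```python
import re


def filter_matches(stringlist: list[str], regex: str) -> list[str]:
    """Return the first match of `regex` in each string, skipping strings with no match."""
    pattern = re.compile(regex)
    filtered = []
    for s in stringlist:
        m = pattern.search(s)
        if m:
            filtered.append(m.group(0))
    return filtered
```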
Overall workflow description
This workflow takes an exon and the TP53 somatic mutation database as input and retrieves the full TP53 somatic mutation description(s). It first retrieves the unique TP53 somatic mutation database IDs associated with the input, then uses these IDs to retrieve the full TP53 somatic mutation descriptions.
5 MIPSBlastBetterE13: executes BLAST against MAtDB Arabidopsis protein-coding genes with an E-value cut-off of E=1e-13
Input
GenericSequence(QuerySequence): biological sequence [biological_sequence]
Output
WU_BLAST_Text(BlastReport): Return a blast report [BLAST_report]
11 gi: GI number
5 Split_string_into_string_list_by_regular_expression: split string with specified regular expression
Inputs
strings: string to be split
regex: regular expression to use
Output
split: Return split string
2 gi: GI number
Outputs
8 acc: sequences accession numbers
6 Extract_accession: beanshell script to extract accession number from a sequence file
Input
in: sequence file [biological_sequence]
Output
accs: Return accession numbers
Inputs
Tp53_somatic_mutations_database: TP53 somatic mutation database
exon: exon in the p53 gene, ranging from 5 to 11
regex_entry_list_separator: regex separator used to move TP53 somatic mutation IDs from a text string to a list of strings.
6 regex_id_separator: This regular expression specifies the format of a TP53 somatic mutation id.
7 Extract_best_hit: beanshell script to extract AGI locus code for best hit
Input
in: blast report [BLAST_report]
Output
agi: Return AGI locus code for best hit
id: Return sequence identity between query sequence and best hit.
id_position: specifies that the mutation code is the second part of the ID, as split by the regular expression given in the 'regex_id_separator' string.
Outputs
11 ids: Return TP53 exon ids.
10 mutations: Return TP53 somatic mutation description
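The ID handling described by regex_entry_list_separator, regex_id_separator and id_position can be sketched in Python. The default separators are assumptions based on the example id TP53_IARC:9339: whitespace between entries and a colon inside each entry. The second id in the test is hypothetical.

```python
import re


def split_mutation_ids(text: str, entry_separator: str = r"\s+",
                       id_separator: str = r":", id_position: int = 2) -> list[str]:
    """Turn a text block of catalogue-qualified ids (e.g. 'TP53_IARC:9339')
    into bare ids by splitting each entry and keeping part `id_position` (1-based)."""
    ids = []
    for entry in re.split(entry_separator, text.strip()):
        parts = re.split(id_separator, entry)
        if len(parts) >= id_position:  # skip entries without enough parts
            ids.append(parts[id_position - 1])
    return ids
```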
Overall workflow description
This workflow retrieves and displays gene positions on a chromosome using Ensembl Karyoview.
Inputs
1 ids: list of gene ids, e.g. BRCA2, ENSG00000128573
2 species: species name. e.g. homo sapiens
Services descriptions
5 Split_ids (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
Input:
String: string to split
Regex: regular expression (here:“\n”)
Output:
split: return split string
6 genesLocations: retrieves the location of a gene on a genome using its identifier [retrieving]
Inputs
genesIds: gene identifier, e.g. ENSG00000128573 [Ensembl_record_id]
species: species name. e.g. homo sapiens
format: format of the gene id list. e.g. plain
Output
genesLocationsReturn: Return the location of genes on the chromosome.
3 chromosome: chromosome number.
4 plain_format: format of the gene id list
Outputs
12 HTML_file: HTML file of the URL containing the image
7 split_positions (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
Input:
String: string to split
Regex: regular expression (here:“\n”)
Output:
split: return split string
13 image: image of the gene positions on the chromosome
14 Position: position of gene on the chromosome
8 getKaryoviewImage: Returns a representation of the karyotype of a species, with the features we want to locate on it [displaying]
Inputs
position: position of genes on the chromosome
species: species name. e.g. homo sapiens
chromosome: chromosome number
Outputs
getKaryoviewImageReturn: return the URL and html file of karyotype
Overall workflow description
This workflow marks and retrieves a pathway diagram given a KEGG pathway id.
It also retrieves gene information associated with the pathway id.
Services descriptions
4 GetImage (Get image from URL): retrieve the image associated to a given URL
Input
url: URL to retrieve the image from.
Output
image: return the image.
Input
1 pathwayId: KEGG pathway id. e.g. path:eco00020
Outputs:
6 Image: Pathway image
5 kegg_getEntries (bget): Retrieves KEGG database entries specified by a list of
entry_id. [retrieving]
Input
Kids: KEGG database entry id. e.g. eco00020 [KEGG_genes_id]
Output
return: Return KEGG gene information [KEGG_record]
7 Gene_info: gene information
Services descriptions
2 kegg_getGenesByPathway (get_genes_by_pathway): Search all genes on a specified pathway [searching]
Input
pathway_id: KEGG pathway id. e.g. path:bsu00010 [KEGG_record_id]
Output
return: Returns all gene_id of the specified pathway [KEGG_record_id]
3 mark_pathway_by_genes: Mark given genes on a given pathway map and return the URL of the generated
image.
Inputs
map_id: KEGG pathway id [KEGG_record_id]
oids: KEGG gene id, e.g. eco00020 [KEGG_genes_id]
Outputs
return: Returns URL of the generated image.
Services descriptions
9 getImageURL: Beanshell script, extracts the URL of the karyotype
Input
tabResult: result of the “getKaryoviewImage” service
Output
url: return the URL of the karyotype.
11 getHTMLPage: Beanshell script, extracts the html page of the karyotype
Input
tabResult: result of “getKaryoviewImage” service
Output
HTMLPage: Return the HTML page of the karyotype.
10 Get_image_from_URL (Get image from URL): retrieves the image given the URL
input
url: URL of the image
Output
Image: Return the image at the specified URL
Services descriptions
6 getgenesbyspecies: Retrieves a list of Ensembl genes for a given species, chromosome and
region [retrieving]
Inputs
database: name of the Ensembl database to retrieve the genes from.
chromosome: chromosome number, e.g. 12
start: start position of the region in the chromosome.
end: end position of the region in the chromosome.
Output
output: Returns a list of the Ensembl gene ids in the specified region of the given chromosome
[ensembl_record_id]
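Conceptually, getgenesbyspecies selects the genes whose coordinates fall in the requested region. A minimal sketch of that selection, assuming an in-memory list of genes instead of an Ensembl database, 1-based inclusive coordinates, and that a gene counts as being in the region if it overlaps it at all (the service's exact rule is not stated here). The `Gene` type and any gene ids used with it are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: str
    start: int  # 1-based, inclusive (assumed)
    end: int    # 1-based, inclusive (assumed)


def genes_in_region(
    genes: List[Gene], chromosome: str, start: int, end: int
) -> List[str]:
    """Return the ids of genes on `chromosome` that overlap [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    hits = []
    for gene in genes:
        # Two closed intervals overlap when each one starts before the other ends.
        if gene.chromosome == chromosome and gene.start <= end and start <= gene.end:
            hits.append(gene.gene_id)
    return hits
```

With the example region used in this workflow (chromosome 12, positions 100 to 5000000), a gene spanning 4999990 to 5000500 would be reported because it overlaps the end of the region; a stricter containment rule would exclude it.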
5 getcurrentdatabase: Retrieves the current database used by Ensembl for a given species
[retrieving]
Input
species: species name, e.g. homo_sapiens
Output
output: Returns the current Ensembl database for the given species
Overall workflow description
This workflow retrieves from Ensembl the list of genes in a given region of a chromosome for a
given species, together with the current Ensembl database used for that species.
Inputs
4 Chromosome: chromosome number, e.g. 12
3 Start: start position of the region in the chromosome, e.g. 100
2 end: end position of the region, e.g. 5000000
1 species: species name, e.g. homo_sapiens
Outputs
7 genes_in_region: Returns a list of the Ensembl gene ids in the region
8 current_database: Returns the current Ensembl database used
Services descriptions
2 genscan: Determines the most likely gene structure of a given genomic DNA sequence [predicting]
Inputs
sequence_direct_data: genomic DNA sequence in fasta format [DNA_sequence]
sequence_url: URL of the genomic DNA sequence in fasta format.
These two input parameters are mutually exclusive.
Output
output: Returns a gene prediction report [gene_prediction_report]
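Both genscan and genscansplitter accept the sequence either directly or by URL, but not both. A small validation helper can enforce that rule before a service is called. This is a sketch; the function name is hypothetical, and treating an empty string as "not supplied" is an assumption.

```python
from typing import Optional, Tuple


def resolve_sequence_input(
    direct_data: Optional[str] = None, url: Optional[str] = None
) -> Tuple[str, str]:
    """Enforce that exactly one of the two mutually exclusive inputs is set.

    Returns ("direct", data) or ("url", url). Empty strings count as unset.
    """
    given = [value for value in (direct_data, url) if value]
    if len(given) != 1:
        raise ValueError("supply exactly one of sequence_direct_data or sequence_url")
    if direct_data:
        return "direct", direct_data
    return "url", url
```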
3 genscansplitter: Runs genscan (gene prediction) on the given input sequence [predicting]
Inputs
Scanrecord_direct_data: genomic DNA sequence in fasta format [DNA_sequence]
Scanrecord_url: URL of the genomic DNA sequence in fasta format
These two input parameters are mutually exclusive.
Outputs
Peptide: Returns the protein sequence of the predicted gene [protein_sequence]
Contig: Returns the predicted gene sequence [DNA_sequence]
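Several services here expect sequences in fasta format: one or more records, each a `>` header line followed by sequence lines. A minimal FASTA reader, sketched under the assumption that the record identifier is the first word of the header (a common convention, not something these services specify):

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is taken as the first word after '>'; a later record with
    the same identifier overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records
```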
6 Search simple: Executes BLAST with the specified program, database and query [local_aligning]
Inputs
program: BLAST program: blastn, blastp, blastx, tblastn or tblastx
database: BLAST database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see the appendix.
query: nucleotide or protein sequence in fasta format [biological_sequence]
Output
Result: Returns the result of the BLAST execution [BLAST_report]
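The five BLAST programs differ in the molecule types of the query and the database, so a mismatched program and query is a common source of empty or meaningless results. The sketch below records those types and checks a FASTA query against the chosen program. The letter-based `guess_molecule` heuristic is an assumption of this sketch and can misclassify short or unusual protein sequences (for example, one made only of A, C, G and T).

```python
# Molecule type of the query and of the database for each BLAST program.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),      # query translated
    "tblastn": ("protein", "nucleotide"),     # database translated
    "tblastx": ("nucleotide", "nucleotide"),  # both translated
}

NUCLEOTIDE_LETTERS = set("ACGTUN")


def sequence_letters(fasta: str) -> str:
    """Concatenate the sequence lines of a FASTA string, skipping headers."""
    return "".join(
        line.strip() for line in fasta.splitlines() if not line.startswith(">")
    )


def guess_molecule(sequence: str) -> str:
    """Crude heuristic: only A, C, G, T, U, N (plus gaps/stops) means nucleotide."""
    letters = set(sequence.upper()) - set("-*")
    return "nucleotide" if letters <= NUCLEOTIDE_LETTERS else "protein"


def query_matches_program(program: str, fasta_query: str) -> bool:
    """Check that the query's molecule type is what the program expects."""
    if program not in BLAST_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    expected_query, _database_type = BLAST_PROGRAMS[program]
    return guess_molecule(sequence_letters(fasta_query)) == expected_query
```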
Overall workflow description
This workflow first scans a DNA sequence to predict genes. Using the predicted gene, it then
performs a BLAST search and searches the predicted gene for PROSITE motifs.
Inputs
1 dna: DNA sequence
4 Database: BLAST database, e.g. SWISS, NCBI, EMBL, DDBJ
5 program: BLAST program: blastn, blastp, blastx, tblastn or tblastx
Outputs
8 blast_out: BLAST result report
9 prosite_matches: result of the PROSITE motif search
10 Peptides: protein sequence translated from the predicted gene
11 cds: coding sequence of the predicted gene
12 genscan_report: gene prediction report from genscan
7 patmatmotifs: Searches a PROSITE motif database with a given protein sequence [searching]
Inputs
sequence_direct_data: protein sequence in fasta format [protein_sequence]
full: Boolean; if true, provides full documentation for matching patterns
prune: Boolean; if true, ignores simple patterns
Output
outfile: Returns the PROSITE motifs found in the given protein [Prosite_record]
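patmatmotifs compares the protein against PROSITE patterns. The general idea can be sketched by translating a PROSITE pattern into a regular expression, shown here with the N-glycosylation site pattern N-{P}-[ST]-{P} (PROSITE PS00001). This illustrates the pattern syntax only; it is not how patmatmotifs is implemented, and it ignores the full and prune options.

```python
import re
from typing import List, Tuple

# Element body plus an optional repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a Python regex.

    Handles x, [..], {..}, (n) and (n,m) repeats, and the < and > anchors on
    the first and last elements; '>' inside brackets (e.g. [G>]) is not handled.
    """
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        # '[ST]' and single residue letters are already valid regex syntax.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + core + suffix)
    return "".join(regex_parts)


def find_motif(pattern: str, protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, overlaps included."""
    # A zero-width lookahead lets matches that share residues all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]
```

For example, `find_motif("N-{P}-[ST]-{P}.", "MNVSAKNPTG")` reports the site NVSA at position 2 but not the N at position 7, because the residue after it is P.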
Overall workflow description
This workflow aligns the given sequences and displays the alignment
with colouring and boxing.
Input
1 seqs: nucleotide or protein sequences in fasta format
Outputs
5 alignment: Returns the sequence alignment produced by the analyseSimple operation
7 single_list: Returns the sequence alignment produced by the “emma” operation
6 pretty_alignment: Returns the alignment with colouring and boxing.
Services descriptions
2 emma: Multiple alignment program, an interface to the ClustalW program [aligning]
Input
sequence_direct_data: nucleotide or protein sequence [biological_sequence]
Output
outseq: Returns the aligned sequences [multiple_sequence_alignment_report]
3 analyseSimple: Executes ClustalW on the given sequences [aligning]
Input
query: nucleotide or protein sequences [biological_sequence]
Output
result: Returns the aligned sequences [multiple_sequence_alignment_report]
4 prettyplot: Displays aligned sequences, with colouring and boxing [displaying]
Input
sequence_direct_data: file containing a sequence alignment
[multiple_sequence_alignment_report] [pairwise_sequence_alignment_report]
Output
Graphics_in_PNG: Returns a plot of the aligned sequences.
Overall workflow description
This workflow fetches sequences using the seqret tool. The sequences are then aligned with emma
and, in parallel, scanned for predicted transmembrane regions. The alignment is plotted to a set
of PNG images and is also used to build a profile with the prophecy and prophet tools.
Inputs
1 Sequenceid: sequence identifiers
3 msFormat: sequence format
5 prophecyType: profile type used by prophecy
6 prophecyName: single word used as the sequence name
7 transeqSequenceID: nucleotide sequence id
8 sbegin: start position of the translation process
9 send: end position of the translation process
Outputs
17 prophetOutput: Returns the aligned sequences
16 outputPlot: Returns the alignment with colouring and boxing
18 tmapPlot: Displays membrane-spanning regions
Services descriptions
2 seqret1 (seqret): Reads and returns sequences [retrieving]
Input
Sequence_usa: identifier or GI number of the input sequence
Output
outseq: Returns the sequence [biological_sequence]
4 emma: Multiple alignment program, an interface to the ClustalW program [aligning]
Input
sequence_direct_data: nucleotide or protein sequence [biological_sequence]
Output
outseq: Returns the aligned sequences [multiple_sequence_alignment_report]
13 formatSequences (seqret): Reads and returns sequences [retrieving]
Input
Sequence_direct_data: nucleotide or protein sequence [biological_sequence]
osformat: output sequence format. For possible values, see the appendix.
Output
outseq: Returns the sequence in the specified format [biological_sequence]
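For the common case of FASTA output, the formatting step amounts to writing each record as a header line followed by fixed-width sequence lines. A minimal sketch; the `to_fasta` name and the 60-character default width are choices of this sketch, not seqret's behaviour, and other osformat values are not covered.

```python
from typing import Dict


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Write {identifier: sequence} as FASTA text, wrapping sequence lines."""
    if width <= 0:
        raise ValueError("width must be positive")
    lines = []
    for name, sequence in records.items():
        lines.append(">" + name)
        # Slice the sequence into consecutive chunks of at most `width` letters.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n" if lines else ""
```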
12 plot (prettyplot): Displays aligned sequences, with colouring and boxing [displaying]
Input
sequence_direct_data: File containing a sequence alignment
[multiple_sequence_alignment_report] [pairwise_sequence_alignment_report]
Output
Graphics_in_PNG: result of prettyplot execution
Services descriptions
11 Prophecy: Creates matrices/profiles from multiple alignments
Inputs
sequence_direct_data: alignment report file [multiple_sequence_alignment_report]
type: profile type; the allowed values are F (frequency), G (Gribskov) and H (Henikoff)
name: Single word without spaces to identify the sequence
Output
outfile: Returns the matrix or profile
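For type F, the profile is a frequency matrix. Its core idea, counting residues column by column across the alignment, can be sketched as follows. Gaps ('-' or '.') are skipped by assumption, and prophecy's actual output file format and the Gribskov and Henikoff weighting schemes are not reproduced.

```python
from collections import Counter
from typing import Dict, List

GAP_CHARACTERS = "-."


def frequency_matrix(alignment: List[str]) -> List[Dict[str, int]]:
    """Count the residues in each column of a gapped multiple alignment."""
    if not alignment:
        raise ValueError("the alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for column in range(length):
        residues = (row[column] for row in alignment)
        # Gap characters carry no residue, so they are not counted.
        counts = Counter(r for r in residues if r not in GAP_CHARACTERS)
        matrix.append(dict(counts))
    return matrix
```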
10 transeq: Translates nucleic acid sequences into protein [translating]
Input
sequence_usa: nucleotide sequence id [EMBL_id]
sbegin: start position to be used in the sequence
send: end position to be used in the sequence
Output
outseq: Returns the protein sequence [protein_sequence]
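The translation step can be sketched with the standard genetic code. Assumptions of this sketch: sbegin and send are 1-based inclusive positions, only the first reading frame is translated, stop codons are written as `*`, codons containing unknown bases become `X`, and any trailing partial codon is dropped. transeq itself offers more options (other frames and genetic code tables) that are not modelled here.

```python
from itertools import product
from typing import Optional

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, sbegin: int = 1, send: Optional[int] = None) -> str:
    """Translate dna[sbegin..send] (1-based, inclusive) in frame 1.

    Stops are written as '*', codons with unknown bases as 'X', and a
    trailing partial codon is dropped.
    """
    if sbegin < 1 or (send is not None and send < sbegin):
        raise ValueError("need 1 <= sbegin <= send")
    # Treat RNA input like DNA, then cut out the requested region.
    region = dna.upper().replace("U", "T")[sbegin - 1:send]
    return "".join(
        CODON_TABLE.get(region[i:i + 3], "X")
        for i in range(0, len(region) - 2, 3)
    )
```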
14 tmap: Displays membrane spanning regions [displaying]
Input
sequence_direct_data: sequence in specified format [biological_sequence]
Output
graphics_in_PNG: Returns a graph of the result
15 Prophet: Performs gapped alignment of sequences to profiles [gapped_aligning]
Input
sequence_direct_data: sequence data [biological_sequence]
infile_direct_data: profile or weight matrix file
Output
outseq: Returns a gapped alignment report [multiple_sequence_alignment_report]