Overall workflow description
This workflow takes in Entrez gene ids, then adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database and the ids of its relevant pathways are returned.

Overall workflow description
This workflow performs a BLAST search, then compares the result to a previous BLAST result based on a specified filter.

Inputs:
  1 program: blast type: blastn, blastp, blastx, tblastn or tblastx
  2 database: e.g. SWISS, NCBI, EMBL, DDBJ
  3 query: nucleotide or protein sequence

Services descriptions
7 blast_ddbj (searchSimple): Executes BLAST with specified program, database and query [local_aligning]
  Inputs:
    program: Specify blast type used: blastn, blastp, blastx, tblastn or tblastx
    database: Specify database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix.
    query: nucleotide or protein sequence [biological_sequence]
  Output:
    Result: result of blast execution [BLAST_report]
8 blastfilecomparer: Compares a new BLAST output to an older BLAST output to identify new hits [filtering]
  Inputs:
    blastResult_direct_data: blast result file [BLAST_report]. Use either this parameter or the blastResult_url parameter as input, but not both together.
    blastResult_url: url of the blast result [BLAST_report].
    oldRefFile_direct_data: old blast result file [BLAST_report]. Use either this parameter or oldblastResult_url as input, but not both together.
    oldRefFile_url: url of the old blast result
    species: filter the result by species name
    chromo: filter the result by chromosome number
    advanced: words are looked for in the FASTA definition line
  Output:
    report: Return a filtered blast result.
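The prefixing step of the Entrez-to-KEGG workflow described above (adding "ncbi-geneid:" to the start of each Entrez gene id) can be sketched in a few lines. This is a minimal Python illustration, not the Beanshell script used in the workflow; the function name add_ncbi_prefix and the choice to skip blank entries are assumptions made for this example.

```python
def add_ncbi_prefix(entrez_gene_ids, prefix="ncbi-geneid:"):
    """Prepend the KEGG cross-reference prefix to each Entrez gene id.

    Hypothetical helper: whitespace is stripped and blank entries are
    skipped, which the original script may or may not do.
    """
    return [prefix + gene_id.strip()
            for gene_id in entrez_gene_ids
            if gene_id.strip()]


if __name__ == "__main__":
    # Example Entrez gene ids, supplied as strings.
    print(add_ncbi_prefix(["672", "7157"]))
```

The prefixed ids are what the bconv-style conversion service would then receive as its external-ID input.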
Inputs (continued):
  4 OldBlastResult: blast result
  5 chromosome_filter: chromosome number
  6 species_filter: species name
Outputs:
  9 blast_output: result of blast execution
  10 Compared_output: return a list of GI numbers

Inputs
  1 gene: Entrez gene id
  2 Gi_numbers: Entrez gene id
Outputs
  4 Kegg_strings: KEGG gene id
  11 merged_kegg_pathways: KEGG pathways

Services descriptions
Add_ncbi_to_string: beanshell script to add "ncbi-geneid:" to Entrez gene ids.
  Input
    input: Entrez gene id [Entrez_Gene_ID]
  Output
    output: Return KEGG gene id [KEGG_genes_id]
5 Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]
  Input
    string: External ID. In this workflow, Entrez gene id [Entrez_Gene_ID]
  Output
    return: KEGG gene ID [KEGG_gene_id]
6 Split_gene_ids: beanshell script to extract the KEGG id from the record returned by the "Kegg_gene_ids_all_species" operation
  Input
    input: result returned by the Kegg_gene_ids_all_species operation
  Output
    output: Return gene KEGG id [KEGG_genes_id]
7 Lister: Lists each element of a given file so that it can be used by subsequent operations
  Input
    File: file containing the elements to be listed
  Output
    listerReturn: return each element of the file to be used by subsequent operations
8 Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]
  Input
    genes_id_list: List of KEGG gene ids [KEGG_genes_id]
  Output
    return: Return a list of pathway_id of the specified KEGG gene ids [KEGG_record_id]
9 & 10 merge_pathways & mergePathways2 (Merge string list to string): concatenate a list of strings
  Inputs
    stringlist: list of strings to concatenate
    separator: separator to use between strings
  Output
    concatenated: Return concatenated string

Services descriptions
3 blastsimplifier: Simplifies BLAST output by specifying elements (seq_id, gi, acc, desc, Score, bits, per, p, exp) to be displayed in the blast result output. [filtering]
  Inputs
    new_direct_data: blast report file [BLAST_report].
    Mutually exclusive with the new_url parameter.
    new_url: url of the blast report file [BLAST_report].
    The following parameters are optional. To select one of them, pass the name of the input as the input parameter. For example, to display GI numbers, pass gi to the parameter gi.
    seq_id: sequence identifier
    gi: GI number
    acc: accession number
    desc: descriptions
    score: score value
    bits: bits score
    per: percentage of identity
    p: p-value
    exp: E-value
  Output
    report: return a simplified blast report

Overall workflow description
This workflow simplifies a BLAST text file into identifiers, descriptions and values (P, E-values). In order to extract the relevant ids etc., you need to pass the relevant string into the corresponding port, e.g. the default port being used is gi. This has been passed "gi". For any other port, simply pass in the string that is the SAME as the port name, e.g. seq_id, p, per etc.
Inputs
  1 blast_file: blast result
Outputs
  Simplified_output: list of GI numbers

Overall workflow description
This workflow extracts gene information and the relevant swissprot ids given Ensembl gene ids.
Inputs
  1 genes_in_region: List of Ensembl gene ids.
Outputs
  8 gene_info: return gene information
  9 swiss_ids: return swissprot ids

Overall workflow description
This workflow takes the list of GI numbers of a given blast report and retrieves the corresponding GO ids.
Inputs:
  1 blast_report: blast result
  2 gi_number: gi to retrieve GI numbers
  3 regex: regex value to use for the split_by_regex operation

Service descriptions
5 blastsimplifier: Simplifies BLAST output by specifying elements (seq_id, gi, acc, desc, Score, bits, per, p, exp) to be displayed in the blast result output. [filtering]
  Input:
    new_direct_data: blast report file [BLAST_report].
    Parameter mutually exclusive with the "new_url" parameter.
    new_url: url of the blast report file [BLAST_report]
    To choose one of the following inputs, pass the name of the input as the parameter value. For example, to display GI numbers, pass gi as the value for the parameter gi.
    seq_id: sequence identifier
    gi: GI number
    acc: accession number
    desc: descriptions
    score: score value
    bits: bits score
    per: percentage of identity
    p: p-value
    exp: E-value
  Output:
    report: a brief summary of the result
    output: list of the specified element. Here, list of GI numbers.
6 split_by_regex (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
  Input:
    String: string to split
    Regex: regular expression
  Output:
    split: return split string
7 Merge_string_list_to_string: Merge a list of strings
  Input:
    stringlist: string list to merge
    seperator: separator used for merging the list of strings
  Output:
    concatenated: Return concatenated string
8 GOIDFromGiList: retrieves an array of GO ids for a specified array of GIs [retrieving]
  Input:
    giList: list of GI numbers [genbank_GI]
  Output:
    result: list of GO ids [Gene_Ontology_term_id]
Inputs (continued):
  4 seperator: separator to use between strings
Outputs:
  9 Gi_numbers: list of GI numbers
  10 GO_id: list of GO ids

Overall workflow description
This workflow retrieves an EMBL sequence in fasta format, then performs a blast operation.
Inputs
  1 emblid_default: embl sequence identifier
  2 Blast_db: blast database. e.g. SWISS
  3 Blast_program: blast program. e.g. blastn
Outputs
  6 Fasta_output: nucleotide sequence in fasta format
  7 Blast_result_ddbj: Blast result

Services descriptions
4 Blastx_ddbj (searchSimple): Execute BLAST with specified program, database and query [local_aligning].
  Inputs
    program: Blast type used: blastn, blastp, blastx, tblastn or tblastx
    database: blast database, e.g. SWISS, NCBI, EMBL, DDBJ.
    For all possible databases see appendix.
    query: Nucleotide or protein sequence in fasta format or without format [biological_sequence]
  Output
    Result: Return the result of blast execution [BLAST_report]
5 getFASTA: Get DDBJ entry of FASTA Format by Accession Number [Retrieving]
  Input
    accession: embl/DDBJ/NCBI accession number [DDBJ_accession] [EMBL_accession] [genebank_gene_accession]
  Output
    Result: Return a nucleotide sequence in fasta format [nucleotide_sequence].

Inputs (continued)
  2 gi_option: here we want to retrieve only the gi number from the blast output.
  2 regex: Regex value to use for the "split_by_regex" operation
  5 options: option value used to extract a piece of data from the "parse_ddbj_gene_info" output file. e.g. swiss

Services descriptions
3 Split_by_regex (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
  Input
    String: string to split
    Regex: regular expression
  Output
    split: return split string
4 getGeneInfo: retrieves gene information given an Ensembl gene id [retrieving]
  Input
    geneId: Ensembl gene id [ensembl_record_id]
  Output
    Result: Return gene info of the specified Ensembl gene id [Ensembl_record]
6 Parse_ddbj_gene_info: extracts information from the DDBJ (DNA Data Bank of Japan) getGeneInfo processor [retrieving]
  Input
    file_direct_data: 'getGeneInfo' output result [Ensembl_record]
    option: used to extract a piece of data from the output file. e.g. swiss
  Output
    Output: return the extracted piece of data
7 parse_swiss: Beanshell script to extract only the swissprot id from the "parse_ddbj_gene_info" output record.
  Input
    input: parse_ddbj_gene_info output record with 'swiss' as option.
  Output
    output: Return swissprot ids [SWISS-PROT_accession]

Overall workflow description
This workflow builds up a sub graph of the Gene Ontology given a GO term id, to show the context for a supplied term or terms.
Inputs
  1 termID: GO term id. e.g. GO:0007601
  7 childColour: colour to use to specify children
  9 ancestorColour: colour to use to specify ancestors
  12 colourInputTerm: specify the colour of given terms.
Outputs
  17 graphical: Return a sub graph of the Gene Ontology given a GO id.

Services descriptions
2 getParents: Retrieves the IDs of all immediate parent terms of a specified GO ID [retrieving]
  Input
    geneOntologyID: GO ID of which the Parent terms should be returned [Gene_Ontology_term_id]
  Output
    getParentsReturn: Return the IDs of all immediate parent terms of the specified term [Gene_Ontology_term_id].
3 getAncestry (getAncestors): Retrieves the IDs of all ancestors of a specified GO ID [retrieving].
  Input
    geneOntologyID: GO ID of which the Parent terms should be returned [Gene_Ontology_term_id]
  Output
    getAncestorsReturn: Return the IDs of all ancestors of the specified term [Gene_Ontology_term_id].
4 Create (createSession): Takes no arguments, creates a new GoViz session on the server and returns a session identifier that can be used in subsequent operations.
  Output
    createSessionReturn: Return a session identifier that can be used in subsequent operations.
5 & 6 getChildren & getImmediateChildren (getChildren): Retrieves the IDs of all immediate children of a specified GO ID [Retrieving].
  Input
    geneOntologyID: GO ID of which the Children should be returned [Gene_Ontology_term_id].
  Output
    getChildrenReturn: Return the IDs of all immediate children of the specified term [Gene_Ontology_term_id].
8 & 10 addImmediateChildren & add (addTerm): Add a GO term to the visualisation, updating the state of the named session.
  Input
    SessionID: Session ID returned by the createSession operation.
    geneOntologyID: GO ID of which the Parent terms should be returned [Gene_Ontology_term_id]
11 & 13 & 14 markAncestors & colourChildren & colourInputTerm (markTerm): Adds a specific colour parameter to the supplied term in the Gene Ontology.
  Inputs
    SessionID: Session ID returned by the createSession operation.
    geneOntologyID: GO ID of which the Parent terms should be returned [Gene_Ontology_term_id]
    colour: The colours can be anything that is a valid colour within the dot file format. For the list of colours see appendix.
15 getresults (getDot): Retrieves the DOT text specifying the sub graph of the Gene Ontology that contains all the terms that have been added to the session. [retrieving]
  Input
    sessionID: Session ID returned by the createSession operation.
  Output
    Return the DOT text specifying the subgraph of the Gene Ontology.
16 Finish (destroySession): Removes a session from the server, identified by the session ID returned by the createSession operation.
  Input
    SessionID: Session ID returned by the createSession operation.

Overall workflow description
This workflow retrieves the protein sequence, pathways, GO diagram, medline info, blast result, and EC numbers of a given probe set id.
Inputs
  1 ProbSetid: probe set id
  13 database: blast database
  14 program: blast program used
Outputs
  28 swissprot: protein sequence
  29 interproIds: InterPro ids
  30 goDiagram: GO diagram
  31 pathways: pathway diagram
  32 ecNumbers: enzyme EC number
  33 embl: nucleotide sequence in EMBL format
  34 meltTemp: nucleotide sequence melting temperature
  35 medline: medline info
  36 Blast_result: Blast result
  37 medlineIds: medline id

Services descriptions
2 getMolFuncGoIds: Retrieves the GO id of a specified probe set id [retrieving]
  Input
    probSetid: probe set id [probe_id]
  Output
    getGeneOntologyMolecularFunctionReturn: Return a GO id [Gene_Ontology_term_id]
3 getEC: Retrieves the enzyme EC number of a specified probe set id [retrieving]
  Input
    probeSetId: probe set id [probe_id]
  Output
    getECReturn: Return EC number [EC_number]
4 getEmblid: Retrieves the EMBL id of a specified probe set id [retrieving]
  Input
    probeSetId: probe set id [probe_id]
  Output
    Return: Return EMBL id [EMBL_accession]
5 cleanECnumbers: beanshell script to extract the EC number from the "getEC" service output
  Input
    "getEC" service output
  Output
    ecNumber: Return EC number [EC_number]
6 cleanGoIds: beanshell script to extract the GO id from the GO record returned by "getMolFuncGoIds".
  Input
    GO records from "getMolFuncGoIds" execution
  Output
    goIds: Return GO id [Gene_Ontology_term_id]
7 createVizSession (createSession): Create a new GoViz (Gene Ontology Visualisation Service) session on the server
  Output
    returns a session identifier that can be used in subsequent operations
8 getSwissProtId: get the swissprot id of a specified probe set id [retrieving]
  Input
    probeSetId: probe set id [probe_id]
  Output
    getSwissProtIdReturn: Return swissprot id [SWISS-PROT_accession]
9 addTermToViz (addTerm): Add a GO term to the visualisation, updating the state of the named session.
  Inputs
    sessionID: session identifier created by the "createVizSession" web service
    geneOntologyID: GO id [Gene_Ontology_term_id]

Overall workflow description
This workflow performs a sequence similarity search using the BLAST algorithm through the DDBJ (DNA Data Bank of Japan) web service.
Inputs:
  1 program: blast type: blastn, blastp, blastx, tblastn or tblastx

Services descriptions
10 getPathwaysByECNumbers: get pathways by enzyme EC number [retrieving]
  Input
    enzyme_id_list: list of enzyme EC numbers [EC_number]
  Output
    Return: return pathway ids [KEGG_record_id]
11 getMedlineIds (ebi_srslinks): For cross-referencing between databanks [retrieving]. In this workflow, retrieves medline ids given an EMBL id.
  Inputs
    databank: database name of the record to be linked from.
    fieldname: the databank can be queried according to a number of fields (acc, All text)
    searchterm: search term; multiple search terms can be separated using '&', '|' or '!'
    xrefDatabank: the databank to be linked to. See appendix for the list of databanks.
  Outputs
    report: summary of the result
    result: Result of ebi_srslinks execution. In this case: medline ids [MEDLINE_reference_id]

Inputs (continued):
  2 database: e.g. SWISS, NCBI, EMBL, DDBJ
  3 query_seq: nucleotide or protein sequence
Output:
  5 text_blast_out: result of blast execution
Service description
4 searchSimple: Executes BLAST with specified program, database and query [local_aligning]
  Inputs
    program: Specify blast type used: blastn, blastp, blastx, tblastn or tblastx
    database: Specify database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases see appendix.
    query: nucleotide or protein sequence [biological_sequence]
  Output
    Result: result of blast execution [BLAST_report]

Services descriptions
12 getFASTA: Get DDBJ entry of FASTA Format by Accession Number [Retrieving]
  Input
    accession: embl/DDBJ/NCBI accession number [DDBJ_accession] [EMBL_accession] [genebank_gene_accession]
  Output
    Result: Return a nucleotide sequence in fasta format [nucleotide_sequence]
15 removePrefix: beanshell script to remove the prefix "MEDLINE:" from the "getMedlineIds" output.
  Inputs
    str: string containing the prefix to be removed.
    prefix: prefix to remove
  Output
    id: Return medline id [MEDLINE_reference_id]
16 ebi_embl: retrieves embl records given search term(s) [retrieving]
  Inputs
    Fieldname: the databank can be queried according to a number of fields (see appendix)
    Searchterm: search term; multiple search terms can be separated using '&', '|' or '!'
  Outputs
    report: summary of the result
    result: Return Embl record [embl_record]
17 mark_pathway_by_objects: Mark given objects on a given pathway map [displaying].
  Inputs
    pathway_id: pathway id [KEGG_record_id]
    object_id_list: list of EC numbers (without the prefix "EC")
  Output
    return: Return the URL of the generated pathway map [KEGG_record]
18 getDotFromViz (getDot): Return the DOT text specifying the subgraph of the GO that contains all the terms that have been added to this session using "addTermToViz" calls, plus all the ancestors of such terms.
  Input
    sessionID: session identifier created by the "createVizSession" web service
19 getInterProIds: get the interPro records of a specified probe set id [retrieving]
  Input
    probeSetId: probe set id [probe_id]
  Output
    getInterProReturn: return interPro record [InterPro_record]
20 splitString (Split string into string list by regular expression): split a record by a given regular expression
  Inputs
    string: string to be split
    regex: regular expression used to split a given string
  Output
    split: return split string
21 Ebi_uniprot: retrieves Uniprot records given search term(s) [retrieving]
  Inputs
    Fieldname: the databank can be queried according to a number of fields (see appendix)
    searchterm: search term; multiple search terms can be separated using '&', '|' or '!'
  Outputs
    result: Return uniprot record [Uniprot_record]
22 cleanInterProIds: beanshell script to extract the interPro id from the interPro record returned by the "getInterProIds" service.
  Input
    inputStr: InterPro record returned by the "getInterProIds" service
  Output
    InterProIds: Return interPro ids [InterPro_accession]
23 destroyVizSession (destroySession): Remove a session from the server, identified by the session ID returned by the createVizSession operation.
  Input
    sessionID: session ID returned by the createVizSession operation.
25 calcMeltTemp: Calculates RNA/DNA melting temperature [calculating]
  Inputs
    sequence_usa: the Uniform Sequence Address. Mutually exclusive with sequence_direct_data
    sequence_direct_data: Nucleotide or protein sequence in specified format [Biological_sequence]
    sformat: optional parameter. Sequence format (see appendix for all possible formats)
    sbegin: optional parameter. The first position to be used in the sequence.
    send: optional parameter. The last position to be used in the sequence.
    sprotein: optional parameter. Is sequence protein?
    snucleotide: optional parameter. Is sequence nucleotide?
    sreverse: optional parameter. Use reverse sequence
    slower: optional parameter. Use lower case
    supper: optional parameter. Use upper case
    windowsize: optional parameter. Specify window size (see appendix)
    shiftincrement: optional parameter. Specify shift increment (see appendix)
    dnaconc: optional parameter. Specify DNA concentration (nM)
    saltconc: optional parameter. Specify salt concentration (mM)
    graph_format: optional parameter. Format of the graphical output (png, postscript, colourps, hpgl)
    rna: optional parameter. Use RNA data values
    product: optional parameter. Prompt for product values
    formamide: optional parameter. Specify percentage of formamide
    mismatch: optional parameter. Specify percent mismatch
    prodLen: optional parameter. Specify product length
    thermo: optional parameter. Thermodynamic calculations
    temperature: optional parameter. Specify temperature in Celsius
    plot: optional parameter. Produce a plot
    mintemp: optional parameter. Minimum temperature
  Outputs
    Outfile: Return DNA/RNA melting temperature.
26 Ebi_medline2007: retrieves a medline record given a search term [retrieving]
  Inputs
    Fieldname: the databank can be queried according to a number of fields (see appendix)
    Searchterm: search term; multiple search terms can be separated using '&', '|' or '!'
  Outputs
    result: return medline record [MEDLINE_citation]
24 getPathwayDiagrams (Get image from URL): retrieves an image given the URL
  Input
    URL: URL of an image or diagram
  Output
    image: Return the image corresponding to a given URL.
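The cleanECnumbers step feeds mark_pathway_by_objects, which expects EC numbers without the "EC" prefix. The sketch below shows one way such a clean-up could look in Python. It is not the workflow's Beanshell script: the function name, the regular expression, and the sample input format (the exact layout of the getEC output is not given here) are all assumptions made for illustration.

```python
import re

# Assumed shape of an EC number: four dot-separated fields, where the
# trailing fields may be "-" for partially classified enzymes.
EC_PATTERN = re.compile(r"\b\d+(?:\.(?:\d+|-)){3}")


def clean_ec_numbers(raw):
    """Return the bare EC numbers found in raw, in order, without duplicates.

    Hypothetical helper: any "EC" or "EC:" prefix is dropped simply
    because it is not part of the matched pattern.
    """
    return list(dict.fromkeys(EC_PATTERN.findall(raw)))


if __name__ == "__main__":
    # Made-up record text; the real service output may be laid out differently.
    print(clean_ec_numbers("EC 1.1.1.1 /// EC:2.7.11.-"))
```

The resulting list has the shape expected for object_id_list, i.e. EC numbers with no "EC" prefix.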
Overall workflow description
This workflow first retrieves Ensembl gene ids and associated GO terms given a chromosome start and end position, then displays the genes on a karyotype.
Inputs
  1 chromosome: chromosome number. e.g. 12
  2 end: chromosome end position to be used
  3 start: chromosome start position to be used
  4 species: species name. e.g. homo sapiens
  5 plain_format: format type. e.g. plain
Outputs
  12 Image: karyotype image
  13 GO_description: GO term description
  14 GO_id: GO term ids
  15 ens_gene_id: Ensembl gene ids.

Services descriptions
DDBJ_blastx (searchSimple): Execute BLAST with specified program, database and query [local_aligning].
  Inputs
    program: Blast type used: blastn, blastp, blastx, tblastn or tblastx
    database: blast database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases see appendix.
    query: Nucleotide or protein sequence in fasta format [biological_sequence]
  Output
    Result: Return blast report [BLAST_report]
6 hsapiens_gene_ensembl: This biomart processor has been configured to retrieve Ensembl human gene ids and associated GO terms given chromosome number, start and end position [retrieving]
  Inputs
    chromosome_name_filter: chromosome number
    end_filter: end position to use for the query
    start_filter: start position to use for the query
  Outputs
    go_description: return GO term description
    go: return GO term id [Gene_Ontology_term_id]
    ensembl_gene_id: return ensembl gene id [ensembl_record_id]
7 genesLocations: retrieves the location of a gene on a genome using its identifier [retrieving]
  Inputs
    genesIds: gene identifier. e.g. BRCA2, ENSG00000128573 [Ensembl_record_id]
    species: species name. e.g. homo sapiens
    format: format of the gene id list. e.g. plain
  Output
    genesLocationsReturn: Return the location of genes on a given chromosome.
8 split_pos (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
  Input
    String: string to split
    Regex: regular expression (here: "\n")
  Output
    split: return split string
9 getKaryoviewImage: Returns a representation of the karyotype of a given species with the features you want to locate on it [displaying]
  Inputs
    position: position of gene on the chromosome
    species: species name. e.g. homo sapiens
    chromosome: chromosome number
  Outputs
    getKaryoviewImageReturn: return the URL and html file of the karyotype
10 getImage: Beanshell script, extracts the URL of the karyotype
  Input
    tabResult: result of the "getKaryoviewImage" service
  Output
    url: return the URL of the karyotype.
11 Get_image_from_URL: retrieves the image given the URL
  Input
    url: URL of the image
  Output
    Image: Return the image of the specified URL

Overall workflow description
Takes a GenBank identifier (a gi number), gets the corresponding sequence, runs a BLAST against Arabidopsis proteins and returns the AGI (Arabidopsis Genome Initiative) locus code for the best hit.
Inputs
  1 string_constant: Biomoby namespace
Services descriptions
3 Object: BioMoby object
  Inputs
    namespace: BioMoby name space
    id: NCBI_Acc, NCBI_gi, PIR, SwissProt, Embl, or PDB identifier
    article_name: BioMoby article name
  Output
    mobydata: return BioMoby data

Services descriptions
3 getP53MutationIdsByExon: Get TP53 gene mutation ids by exon from the IARC TP53 Database catalogue [retrieving]
  Input
    libs: Specifies the name (constant) of the TP53 somatic mutation database that must be queried. e.g. tp53_iarc
    exon: Exon number in the p53 gene
  Output
    result: Return exon ids including catalogues' names. e.g. TP53_IARC:9339
9 getP53MutationssByIds: Get TP53 gene mutations by ids from the TP53 IARC database [retrieving]
  Input
    id: Exon id without catalogues' names. e.g. 9339
  Output
    result: Return TP53 somatic mutation description.
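The two TP53 services above differ in the id form they use: getP53MutationIdsByExon returns ids that include the catalogue name (e.g. TP53_IARC:9339), while getP53MutationssByIds expects the id without it (e.g. 9339). A minimal Python sketch of the conversion between the two follows. The function names are hypothetical, and the newline separator between entries is an assumption; the workflow takes its separators from regex input parameters whose values are not given here.

```python
import re


def split_entries(raw, separator_regex=r"\n"):
    """Split a text block of ids into a list, dropping empty entries.

    The separator is passed in as a regular expression; "\\n" is only
    an assumed default for this example.
    """
    return [entry.strip()
            for entry in re.split(separator_regex, raw)
            if entry.strip()]


def strip_catalogue(entry_id):
    """Return the part after the first ':' (e.g. 'TP53_IARC:9339' -> '9339').

    Ids without a ':' are returned unchanged.
    """
    _catalogue, sep, code = entry_id.partition(":")
    return code if sep else entry_id


if __name__ == "__main__":
    raw_ids = "TP53_IARC:9339\nTP53_IARC:9340\n"  # made-up second id
    print([strip_catalogue(e) for e in split_entries(raw_ids)])
```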
4 MOBYSHoundGetGenBankWhateverSequence: Consumes an NCBI_Acc, NCBI_gi, PIR, SwissProt, Embl, or PDB identifier and returns the equivalent genbank record as a DNA, RNA or AminoAcid sequence object, as appropriate.
  Input
    object (identifier): output of the Biomoby "Object" service.
  Output
    GenericSequence(file): Returns the sequence associated with a given identifier [biological_sequence]
5 MIPSBlastBetterE13: executes blast against MAtDB Arabidopsis protein coding genes with a cut-off E-value of E=1e-13
  Input
    GenericSequence(QuerySequence): biological sequence [biological_sequence]
  Output
    WU_BLAST_Text(BlastReport): Return a blast report [BLAST_report]
6 Extract_accession: beanshell script to extract the accession number from a sequence file
  Input
    in: sequence file [biological_sequence]
  Output
    accs: Return accession numbers
Inputs (continued)
  2 gi: GI number
Outputs
  8 acc: sequence accession numbers
  9 AGI: AGI locus code
  10 identity: identical sequence
  11 gi: GI number

Overall workflow description
This workflow takes the exon and the TP53 somatic mutation database as input and retrieves the full TP53 somatic mutation description(s) by first retrieving the TP53 somatic mutation database unique IDs associated with the input, and then using the IDs to retrieve the full TP53 somatic mutation descriptions.
Services descriptions
5 Split_string_into_string_list_by_regular_expression: split a string with a specified regular expression
  Inputs
    strings: string to be split
    regex: regular expression to use
  Output
    split: Return split string
8 Filter_list_of_strings_extracting_match_to_a_regex: extract a given regex from a specified string.
  Inputs
    stringlist: string to extract from
    regex: regular expression to extract.
  Output
    filteredlist: return extracted string
Inputs
  1 Tp53_somatic_mutations_database: TP53 somatic mutation database
  2 exon: Exons in the p53 gene. Range between 5-11
  regex_entry_list_separator: used as a regex separator string to move TP53 somatic mutation IDs from a text string to a list of strings.
  6 regex_id_separator: This regular expression specifies the format of a TP53 somatic mutation id.
  7 id_position: specifies that the mutation code is the second part of the ID (regular expression specified by the 'regex_id_separator' string).
Outputs
  10 mutations: Return TP53 somatic mutation description
  11 ids: Return TP53 exon ids.

Services descriptions
7 Extract_best_hit: beanshell script to extract the AGI locus code for the best hit
  Input
    in: blast report [BLAST_report]
  Output
    agi: Return AGI locus code for best hit
    id: Return sequence identity between query sequence and best hit.

Overall workflow description
This workflow retrieves and displays gene positions on a chromosome using Ensembl Karyoview.
Services descriptions
5 Split_ids (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
  Input:
    String: string to split
    Regex: regular expression (here: "\n")
  Output:
    split: return split string
6 genesLocations: retrieves the location of a gene on a genome using its identifier [retrieving]
  Inputs
    genesIds: gene identifier. e.g. ENSG00000128573 [Ensembl_record_id]
    species: species name. e.g. homo sapiens
    format: format of the gene id list. e.g. plain
  Output
    genesLocationsReturn: Return the location of genes on the chromosome.
Inputs
  1 ids: list of gene ids. e.g. BRCA2, ENSG00000128573
  2 species: species name. e.g. homo sapiens
  3 chromosome: chromosome number.
  4 plain_format: format of the gene id list
Outputs
  12 HTML_file: HTML file of the URL containing the image
  13 image: image of the gene positions on the chromosome
  14 Position: position of gene on the chromosome
Services descriptions
7 split_positions (Split string into string list by regular expression): split a given string with a specified regular expression (regex)
  Input:
    String: string to split
    Regex: regular expression (here: "\n")
  Output:
    split: return split string
8 getKaryoviewImage: Returns a representation of the karyotype of a species with the features we want to locate on it [displaying]
  Inputs
    position: position of genes on the chromosome
    species: species name. e.g. homo sapiens
    chromosome: chromosome number
  Outputs
    getKaryoviewImageReturn: return the URL and html file of the karyotype

Overall workflow description
This workflow marks and retrieves a pathway diagram given a KEGG pathway id. It also retrieves gene information associated with the pathway id.
Input
  1 pathwayId: KEGG pathway id. e.g. path:eco00020
Outputs:
  6 Image: Pathway image
  7 Gene_info: gene information
Services descriptions
2 kegg_getGenesByPathway (get_genes_by_pathway): Search all genes on a specified pathway [searching]
  Input
    pathway_id: KEGG pathway id. e.g. path:bsu00010 [KEGG_record_id]
  Output
    return: Returns all gene_id of the specified pathway [KEGG_record_id]
4 GetImage (Get image from URL): retrieve the image associated with a given URL
  Input
    url: URL to retrieve the image from.
  Output
    image: return the image.
5 kegg_getEntries (bget): Retrieves KEGG database entries specified by a list of entry_id. [retrieving]
  Input
    Kids: KEGG database entry id. e.g. eco00020 [KEGG_genes_id]
  Output
    return: Return KEGG gene information [KEGG_record]
3 mark_pathway_by_genes: Mark given genes on a given pathway map and return the URL of the generated image.
  Inputs
    map_id: KEGG pathway id [KEGG_record_id]
    oids: KEGG gene id. e.g. eco00020 [KEGG_genes_id]
  Outputs
    return: Returns URL of the generated image.
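The KEGG pathway workflow above chains four services. The sketch below shows one plausible wiring in Python, with each service passed in as a plain callable so that no real web-service API is assumed. The function name, the parameter names, and in particular the choice to feed the gene ids (rather than the pathway id) to the bget-style service are assumptions; the text does not spell out the exact links between the steps.

```python
def pathway_workflow(pathway_id, get_genes_by_pathway, mark_pathway_by_genes,
                     get_image, bget):
    """Run the four steps in order and return (image, gene_info).

    Every service argument is a caller-supplied callable standing in
    for the corresponding workflow service (hypothetical signatures).
    """
    gene_ids = get_genes_by_pathway(pathway_id)            # step 2
    image_url = mark_pathway_by_genes(pathway_id, gene_ids)  # step 3
    image = get_image(image_url)                           # step 4
    gene_info = bget(gene_ids)                             # step 5 (assumed input)
    return image, gene_info


if __name__ == "__main__":
    # Stub services with made-up return values, for illustration only.
    image, info = pathway_workflow(
        "path:eco00020",
        get_genes_by_pathway=lambda pid: ["eco:b0001"],
        mark_pathway_by_genes=lambda pid, genes: "http://example.org/map.gif",
        get_image=lambda url: b"img:" + url.encode(),
        bget=lambda genes: "info for " + ",".join(genes),
    )
    print(image, info)
```

Passing the services in as arguments keeps the sketch independent of any particular client library and makes the data flow between the steps easy to test with stubs.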
Services descriptions
9 getImageURL: Beanshell script, extracts the URL of the karyotype
  Input
    tabResult: result of the "getKaryoviewImage" service
  Output
    url: return the URL of the karyotype.
10 Get_image_from_URL (Get image from URL): retrieves the image given the URL
  Input
    url: URL of the image
  Output
    Image: Return the image of the specified URL
11 getHTMLPage: Beanshell script, extracts the html page of the karyotype
  Input
    tabResult: result of the "getKaryoviewImage" service
  Output
    HTMLPage: Return the HTML page of the karyotype.

Overall workflow description
This workflow retrieves a list of genes and the current databases used from ENSEMBL for a given species, chromosome and positions.
Inputs
  1 species: species name. e.g. homo_sapiens
  2 end: end position. e.g. 5000000
  3 Start: start of the region in the chromosome. e.g. 100
  4 Chromosome: chromosome number. e.g. 12
Outputs
  7 genes_in_region: Return a list of ENSEMBL genes
  8 current_database: Return current database used
Services descriptions
5 getcurrentdatabase: Retrieves the current databases used by ENSEMBL for a given species [retrieving]
  Input
    species: species name e.g. homo_sapiens
  Output
    output: Return the current database from ENSEMBL
6 getgenesbyspecies: Retrieves a list of Ensembl genes for a given species, chromosome and position [retrieving]
  Inputs
    database: name of the Ensembl database to retrieve the genes from.
    chromosome: chromosome number. e.g. 12
    start: start position of the region in the chromosome.
    end: end position.
  Output
    output: return a list of Ensembl gene ids of the specified region of a given chromosome [ensembl_record_id]

Services descriptions
2 genscan: determines the most likely gene structure given a genomic DNA [predicting]
  Inputs
    sequence_direct_data: genomic DNA sequence in fasta format [DNA_sequence]
    sequence_url: URL of the genomic DNA sequence in fasta format.
These 2 input parameters are mutually exclusive Output output: Return a gene prediction report [gene_prediction_report] 7 3 genscansplitter: Run genscan (for gene prediction) on the given sequence input [predicting] 8 2 3 1 Inputs: Scanrecord_direct_data: genomic DNA sequence in fasta format [DNA_sequence] Scanrecord_url: URL of the genomic DNA sequence in fasta format These 2 input parameters are mutually exclusive Outputs: Peptide: Return the predicted protein sequence of the predicted gene [protein_sequence] Contig: Return the predicted gene sequence [DNA_sequence] 9 10 11 6 Search simple: Execute BLAST with specified program, database and query [local_aligning]. 12 inputs program: Blast type used: blastn, blastp, blastx, tblastn or tblastx database: blast database: eg. SWISS, NCBI, EMBL, DDBJ . For all possible databases see appendix query: Nucleotide or protein sequence in fasta format [biological_sequence] Output Result: Return the result of blast execution [BLAST_report] Overall workflow description This workflow first scans a DNA sequence for gene prediction. Then using the predicted gene, it performs a blast operation and finds motifs within the predicted gene. Inputs: 1 dna: DNA sequence 4 Database: blast database. e.g. e.g. SWISS, NCBI, EMBL, DDBJ 5 program: blast type: blastn, blastp, blastx, tblastn or tblastx Outputs 8 blast_out: blast result report 9 prosite_matches: result of PROSITE motif search 10 Peptides: translated gene 11 cds: coding sequence of the predicted gene. 12 genscan_report: sequence of predicted gene. patmatmotifs: Search a PROSITE motif database with a given protein sequence [searching] 7 Inputs sequence_direct_data: protein sequence in fasta format [protein_sequence] full: Boolean. Provide full documentation for matching patterns prune: Boolean. 
Ignore simple patterns Output outfile: return possible PROSITE motifs found in the given protein [ Prosite_record] 5 4 6 7 Overall workflow description This workflow aligns given sequences and displays aligned sequences, with colouring and boxing. Input 1 seqs: nucleotide or protein sequence in fasta format Outputs 5 alignment: return sequence alignment result using analyzeSimple operation 7 single_list: return sequence alignment result using “emma” operation 6 pretty_alignment: Return alignment result with colouring and boxing. Services descriptions 2 emma: Multiple alignment program - interface to ClustalW program [aligning] Input sequence_direct_data: nucleotide or protein sequence [biological_sequence] Output outseq: Return aligned sequence [multiple_sequence_alignment_report] 3 analyseSimple: Execute ClustalW specified with multi sequences [aligning]. Input query: nucleotide or protein sequence [biological_sequence] Output result: Return aligned sequences [multiple_sequence_alignment_report] 4 prettyplot: Displays aligned sequences, with colouring and boxing [displaying] Input sequence_direct_data: File containing a sequence alignment [multiple_sequence_alignment_report] [pairwise_sequence_alignment_report] Output Graphics_in_PNG: Return a plot of aligned sequences. 1 2 7 8 9 4 5 12 10 6 3 11 13 14 15 17 18 16 Overall workflow description This workflow fetches sequences using the seqret tool, the sequences are then subjected to a multiple alignment using emma and simultaneously scanned for predicted transmembrane regions. This alignment is then plotted to a set of PNG images and also used to build a profile using the prophecy and prophet tools. 
Inputs
1 Sequenceid: sequence identifiers
3 msFormat: sequence format
5 prophecyType: prophecy type
6 prophecyName: single word for the sequence name
7 transeqSequenceID: nucleotide sequence id
8 sbegin: start position of the translation process
9 send: end position of the translation process
Outputs
16 outputPlot: returns the alignment result with colouring and boxing
17 prophetOutput: returns the aligned sequences
18 tmapPlot: displays membrane spanning regions

Services descriptions
2 seqret1 (seqret): Reads and returns sequences [retrieving]
    Input
        Sequence_usa: identifier or GI number of the input sequence
    Output
        outseq: returns the sequence [biological_sequence]
4 emma: Multiple alignment program, interface to the ClustalW program [aligning]
    Input
        sequence_direct_data: nucleotide or protein sequence [biological_sequence]
    Output
        outseq: returns the aligned sequence [multiple_sequence_alignment_report]
10 transeq: Translates nucleic acid sequences into protein [translating]
    Inputs
        sequence_usa: nucleotide sequence id [EMBL_id]
        sbegin: start position to be used in the sequence
        send: end position to be used in the sequence
    Output
        outseq: returns the protein sequence [protein_sequence]
11 Prophecy: Creates matrices/profiles from multiple alignments
    Inputs
        sequence_direct_data: alignment report file [multiple_sequence_alignment_report]
        type: the allowed values for this parameter are F, G and H
        name: single word without spaces to identify the sequence
    Output
        outfile: returns the matrix profile
12 plot (prettyplot): Displays aligned sequences, with colouring and boxing [displaying]
    Input
        sequence_direct_data: file containing a sequence alignment [multiple_sequence_alignment_report] [pairwise_sequence_alignment_report]
    Output
        Graphics_in_PNG: result of the prettyplot execution
13 formatSequences (seqret): Reads and returns sequences [retrieving]
    Inputs
        Sequence_direct_data: nucleotide or protein sequence [biological_sequence]
        osformat: output sequence format. For possible values see appendix
    Output
        outseq: returns the sequence in the specified format [biological_sequence]
14 tmap: Displays membrane spanning regions [displaying]
    Input
        sequence_direct_data: sequence in the specified format [biological_sequence]
    Output
        graphics_in_PNG: displays a graph of the result
15 Prophet: Returns a gapped alignment for profiles [gapped_aligning]
    Inputs
        sequence_direct_data: sequence data [biological_sequence]
        infile_direct_data: profile or weight matrix file
    Output
        outseq: returns the gapped alignment report [multiple_sequence_alignment_report]
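Some inputs of this workflow are constrained by the service descriptions: the Prophecy type must be one of F, G or H, and the Prophecy name must be a single word without spaces. A small pre-flight check can catch bad values before any service is called. The sketch below is illustrative only; the function name is hypothetical, and the rule that sbegin must not exceed send is an assumption rather than something the descriptions state.

```python
from typing import List

# Values listed for the Prophecy "type" input in the service description.
ALLOWED_PROPHECY_TYPES = {"F", "G", "H"}


def check_profile_inputs(
    prophecy_type: str, prophecy_name: str, sbegin: int, send: int
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if prophecy_type not in ALLOWED_PROPHECY_TYPES:
        problems.append("prophecyType must be one of F, G or H")
    # The name must be a single word without spaces.
    if not prophecy_name or any(ch.isspace() for ch in prophecy_name):
        problems.append("prophecyName must be a single word without spaces")
    # Assumption: the translated region runs from sbegin up to send.
    if sbegin > send:
        problems.append("sbegin must not be greater than send")
    return problems


if __name__ == "__main__":
    for issue in check_profile_inputs("X", "my profile", 10, 5):
        print(issue)
```

Returning the full list of problems, rather than raising on the first one, lets a caller report every invalid input in a single pass.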