CoBaltDB User Guide v1.1
© B@SIC, UMR 6026 – December 2009
1. Introduction
1.1. Objectives
CoBaltDB, the Complete Bacterial and Archaeal Orfeomes Subcellular Localization Database, is
a client-server application that presents the subcellular localization of all proteins of all
sequenced prokaryotic genomes, as predicted by numerous bioinformatic localization tools. A
list of all the tools used in CoBaltDB is given at the end of this guide.
1.2. Graphical User Interface (GUI)
Not only does CoBaltDB supply the localization predictions given by the selected tools, but it
also seeks to facilitate the work of the biologist or bioanalyst by providing a number of
features, such as:
- Giving the tools' predictions of the subcellular localization of a list of proteins
  identified by their locus tags, or of all proteins of a genome identified by its name;
- Giving the tools' predictions of the presence (or truth) of localization features
  (Tat or Sec signal peptides, lipoproteins, transmembrane domains, beta-barrels) for all
  considered proteins;
- Providing the annotation information for all considered proteins, with links to the
  corresponding NCBI and KEGG web sites;
- Sorting the proteins according to the presence (or truth) of their localization features;
- Providing the raw data of the tools used within CoBaltDB for the considered proteins;
- Supplying a synopsis recapitulating all localization-related information;
- Providing user-friendly graphs showing the positions of the signal peptides and
  transmembrane domains predicted by all considered localization tools;
- Allowing the user to submit the protein sequence to about 50 other localization tools;
- Allowing the user to save the tables and synopses to xls and pdf files, respectively.
2. Installation & Launch of the Application
CoBaltDB is a client-server application. The server is hosted at the Biogenouest
Bioinformatics Platform and keeps all the needed pre-computed data, while the CoBaltDB client
(GUI) is a Java application that communicates with the server via web services. The CoBaltDB
client must be downloaded to your computer from the following web site:
http://www.umr6026.univ-rennes1.fr/english/home/research/basic/software/cobalten
To run, CoBaltDB needs Java JRE 5 (or a more recent version). If it is not already installed
on your machine, it can be downloaded from the following address:
http://java.sun.com/javase/downloads/index_jdk5.jsp
Once CoBaltDB has been downloaded, extract the CoBaltDB.zip or CoBaltDB.tar.gz archive by
double-clicking on it or, under Linux, by typing:
tar -xzvf CoBaltDB.tar.gz
A CoBaltDB/ directory should appear. To launch CoBaltDB, whatever the platform, first go to
the CoBaltDB/ directory, then:
- On Windows, double-click on the file StartCoBaltDB.bat;
- On Mac OS X, double-click on the file StartCoBaltDB.command;
- On Linux, double-click on the file StartCoBaltDB.sh or, in a terminal window, type:
./StartCoBaltDB.sh
Depending on the requests submitted, the CoBaltDB client may require large amounts of
memory. By default, the launch files above reserve up to 1 GB for the Java heap (-Xmx1g
option). If your computer has less than 1 GB of available RAM, replace the -Xmx1g option in
the corresponding launch file with -Xmx128m, -Xmx256m or -Xmx512m, according to the RAM
available on your system.
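The memory advice above amounts to a simple mapping from available RAM to an -Xmx value. The Python sketch below illustrates it under stated assumptions: the 512 MB and 256 MB thresholds are our own choice (the guide only lists the candidate values), and the launcher line is hypothetical, since the actual content of the StartCoBaltDB files is not shown here.

```python
import re

def choose_xmx(available_mb: int) -> str:
    """Pick a maximum Java heap option from the available RAM (in MB).

    Assumption: the 512/256 MB thresholds are illustrative; the guide
    only lists -Xmx1g, -Xmx512m, -Xmx256m and -Xmx128m as candidates.
    """
    if available_mb >= 1024:
        return "-Xmx1g"  # default used by the launch files
    if available_mb >= 512:
        return "-Xmx512m"
    if available_mb >= 256:
        return "-Xmx256m"
    return "-Xmx128m"

def patch_launcher(script_text: str, available_mb: int) -> str:
    # Replace every -Xmx option (e.g. "-Xmx1g") with the value chosen above.
    return re.sub(r"-Xmx\d+[kKmMgG]?", choose_xmx(available_mb), script_text)

# Hypothetical launcher line: the real StartCoBaltDB files may differ.
line = "java -Xmx1g -jar CoBaltDB.jar"
print(patch_launcher(line, 700))  # java -Xmx512m -jar CoBaltDB.jar
```

Editing the launch file by hand is of course equivalent; the sketch only makes the selection rule explicit.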
CoBaltDB then launches; after a few seconds, the window shown in Figure 1 should appear.
Figure 1. The Input tab at the beginning.
3. Description of the Application
3.1. The Input Tab
The Input tab, shown in Figure 1, allows the biologist to submit two kinds of requests to
CoBaltDB:
a) What are the localization predictions for all proteins of a given genome?
b) What are the localization predictions for a list of proteins defined by their locus tags?
Depending on the question, the bioanalyst selects the appropriate radio button.
3.1.1. Requesting all proteins of a genome
The genome of interest must first be selected. The biologist may use the editable text field
to enter part of the genome name, or simply browse through the genome names, which are listed
in alphabetical order.
Once the genome is selected, the biologist submits the request to CoBaltDB by clicking on the
corresponding button. The CoBaltDB server then receives and reads the request, recognizes the
name of the pre-computed genome, and returns the requested data to the client, namely the
localization information of all the genes belonging to the selected genome.
Once the data have been returned by the server, the CoBaltDB client window switches to a
table showing the results of all feature boxes and associated databases for all genes
belonging to the selected genome (Figure 2).
Figure 2. The Feature Boxes and Localization Cards Table.
In addition to the Input tab, new tabs are now included in the CoBaltDB client window:
- A Specialized Tools (Feature Predictions) tab showing all proteins together with their
corresponding annotation information, the results of the feature box tools and
databases, and links to their synopses;
- A Meta Tools (Localization Predictions) tab showing the localization predictions for
each protein from all retained global tools and global databases;
- An Additional Tools (Posts) tab enabling the submission of sequences to about 50 further
localization tools.
These tabs are described in detail in Sections 3.2 to 3.4.
3.1.2. Requesting a list of proteins designated by their locus tags
The alternative input consists in requesting a list of proteins. This is done by selecting the
appropriate radio button, which makes a new panel appear on the Input tab.
A list of locus tags may be constructed from this panel: the biologist enters the first locus
tag in the text field, and then clicks on the button that adds this locus tag to the list shown
just below. This step may be repeated several times, finally yielding the desired list of locus
tags. Alternatively, the list of locus tags may be loaded, by clicking on the corresponding
button, from a text file containing one locus tag per line. Selecting a particular locus tag in
the list and then clicking on the removal button removes it from the list. The list is given the
name entered in the designation text field. Finally, clicking on the submission button sends
the request to the server, and the localization feature panel is displayed, showing the
localization features of the proteins corresponding to the requested genes.
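Since the list can be loaded from a text file with one locus tag per line, it can help to prepare that file programmatically. The Python sketch below is a hypothetical helper, not part of CoBaltDB: it assumes that blank lines and duplicate tags are best removed before loading (the guide does not say how CoBaltDB itself treats them), and the `XYZ_...` tags are made-up examples.

```python
from pathlib import Path

def read_locus_tags(path: str) -> list[str]:
    """Read a locus-tag list file (one locus tag per line).

    Surrounding whitespace is stripped, blank lines are skipped and
    duplicates are dropped while keeping the original order.
    """
    seen = set()
    tags = []
    for line in Path(path).read_text().splitlines():
        tag = line.strip()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags

def write_locus_tags(path: str, tags: list[str]) -> None:
    # Write one locus tag per line, the layout described in this guide.
    Path(path).write_text("\n".join(tags) + "\n")

# Hypothetical usage: clean an existing list before loading it in CoBaltDB.
# write_locus_tags("clean_tags.txt", read_locus_tags("raw_tags.txt"))
```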
3.2. The Specialized Tools (Feature Predictions) Tab
3.2.1. The Main Panel
For each gene of the replicon or genome, or for each gene whose locus tag belongs to the
uploaded list, this panel presents on a single line some annotation and localization
information: the locus tag of the gene, its protein identifier (id.), gene name and description
(as present in the annotation), replicon name, feature boxes and the predictions of the
localization databases.
Selecting a line (corresponding to a single gene) and clicking on the corresponding buttons
opens the default browser at the NCBI and KEGG information web pages for that gene,
respectively.
There are five feature boxes: the Lipo, Tat, Sec, Helix (transmembrane) and Barrel
(outer membrane) boxes. Each box gathers the results of the tools integrated in CoBaltDB
that predict the corresponding feature: for instance, the Tat box gathers the tools predicting
the presence or absence of Tat signal peptides in each protein.
For each box and each protein, the percentage of tools predicting the presence (or truth) of
the considered feature can be displayed by clicking on the protein line. This percentage is
also conveyed by the shade of cobalt blue used to colour the corresponding cell.
Clicking on one of these cells displays the actual results of all tools belonging to the
considered box for the corresponding protein, as further described in Section 3.2.2.
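The percentage shown for a feature box is the fraction of the box's tools that predict the feature. The Python sketch below makes that arithmetic concrete under two assumptions: only tools that returned a result are counted, and the number of cobalt blue shades (here 5) and the binning into shades are hypothetical, since the guide does not specify how CoBaltDB maps percentages to colours. The tool names are placeholders.

```python
def feature_percentage(predictions: dict[str, bool]) -> float:
    """Percentage of tools in a feature box predicting the feature.

    `predictions` maps each tool that returned a result to True
    (feature predicted) or False (feature not predicted).
    """
    if not predictions:
        return 0.0
    positives = sum(1 for predicted in predictions.values() if predicted)
    return 100.0 * positives / len(predictions)

def shade_index(percentage: float, n_shades: int = 5) -> int:
    # Hypothetical binning: 0 = lightest shade, n_shades - 1 = darkest.
    return min(int(percentage / 100.0 * n_shades), n_shades - 1)

# Placeholder tool names for a Tat box where 3 of 4 tools predict a Tat peptide.
tat_box = {"tool_A": True, "tool_B": True, "tool_C": False, "tool_D": True}
pct = feature_percentage(tat_box)
print(pct, shade_index(pct))  # 75.0 3
```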
Clicking on the header of any column sorts the whole table according to the alphanumerical
order of the information contained in that column; clicking again on the same header sorts
the table in the reverse order. This capability is particularly useful for searching for
certain kinds of proteins: for instance, a biologist who would like to find all transmembrane
proteins with Sec signal peptides just needs to click (once or twice) on the Helix header and
then on the Sec header, so that the proteins having both features are grouped together in the
table. Note that the order in which the columns are sorted affects the final ordering of the
proteins.
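Why the click order matters can be illustrated with a stable sort, where each click re-sorts the table and rows with equal values keep their previous relative order. This Python sketch assumes the table behaves like such a stable sort (last clicked column becomes the primary key) and that the desired direction is descending; the rows and percentages are invented for illustration.

```python
# Each row: a made-up locus tag plus the percentages of the Helix and Sec boxes.
rows = [
    {"locus": "g1", "Helix": 0, "Sec": 100},
    {"locus": "g2", "Helix": 100, "Sec": 100},
    {"locus": "g3", "Helix": 100, "Sec": 0},
    {"locus": "g4", "Helix": 0, "Sec": 0},
]

def click_header(table, column, descending=True):
    # sorted() is stable (also with reverse=True): rows with equal values
    # in `column` keep the order produced by the previous click.
    return sorted(table, key=lambda row: row[column], reverse=descending)

helix_then_sec = click_header(click_header(rows, "Helix"), "Sec")
sec_then_helix = click_header(click_header(rows, "Sec"), "Helix")
print([r["locus"] for r in helix_then_sec])  # ['g2', 'g1', 'g3', 'g4']
print([r["locus"] for r in sec_then_helix])  # ['g2', 'g3', 'g1', 'g4']
```

In both cases the protein with both features (g2) comes first, but the remaining rows are ordered differently depending on which header was clicked last.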
The interface provides other controls for the biologist: the Replicon combo-box restricts
the view to the proteins of the selected replicon. The table can be saved as an xls file by
clicking on the corresponding button and specifying the directory and file name. The table may
also be searched (by locus tag, protein id, annotation gene name or description, etc.) by
entering the desired expression in the Search field and then clicking on the Search button to
search from the beginning, or on the Next button to look for the next occurrence.
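The Search/Next behaviour can be pictured as a scan over the table rows that restarts either at the top (Search) or just after the last match (Next). The Python sketch below is an illustrative model only: case-insensitive substring matching over all fields, and the absence of wrap-around, are assumptions rather than documented CoBaltDB behaviour, and the table content is invented.

```python
def search(rows, term, start=0):
    """Return the index of the first row at or after `start` containing `term`.

    A row matches if any of its fields contains `term` (case-insensitive,
    an assumption); None is returned when there is no further occurrence.
    """
    needle = term.lower()
    for i in range(start, len(rows)):
        if any(needle in str(value).lower() for value in rows[i].values()):
            return i
    return None

# Invented rows standing in for the Specialized Tools table.
table = [
    {"locus": "XYZ_0001", "description": "hypothetical protein"},
    {"locus": "XYZ_0002", "description": "histidine kinase"},
    {"locus": "XYZ_0003", "description": "sensor kinase"},
]
first = search(table, "kinase")              # "Search": from the beginning
second = search(table, "kinase", first + 1)  # "Next": after the last match
print(first, second)  # 1 2
```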
3.2.2. The Tools Raw Data Window
This window appears whenever a cobalt blue shaded cell, corresponding to a particular protein
and feature box (Lipo, Tat, Sec, Helix or Barrel), is clicked. The window recalls the
information relative to the gene (its genome, replicon and locus tag) and to the feature box
(its name). It also displays a separate tab for each localization tool of the feature box that
actually gave a result for that protein. In every tab (i.e. for every tool), all raw data
specific to the considered tool, as recorded during the pre-computing process, are displayed
in a table showing the name and value of each recorded property.
In this way, biologists and bioanalysts can understand how the percentage value (and hence
the cobalt blue shade of the selected cell) was obtained. They may also retrieve the
information they are used to seeing when running their favourite tools, which may help them
interpret the results and form hypotheses about the actual features or localization of the
considered protein.
3.2.3. The Synopsis
CoBaltDB provides a synopsis summarizing the results returned by the localization tools for
every particular protein. Its upper panel presents the details of the protein (locus tag,
protein id, gene name, position on the genome, organism, replicon name, annotation description
and sequence), while its lower panel displays more precise localization-oriented information. A
Save to pdf button allows saving the whole window as a pdf file.
The synopsis gives all the details retained within CoBaltDB: whether the protein is a
lipoprotein or has a signal peptide, together with the consensus position of that signal
peptide; possible transmembrane domains and their consensus positions; and the predictions of
the global tools and databases regarding a precise localization. The raw data of the tools are
not given here. The synopsis has been designed to fit onto a single A4 sheet.
The figure below shows the information given by the synopsis.
3.3. The Meta Tools (Localization Predictions) Tab
The third tab within CoBaltDB displays, for all considered proteins, the localization
predictions of the considered global tools, or meta-tools (so called because they integrate
the results of other tools), and of the global databases, which directly propose one or more
predictions for the subcellular localization.
Each localization prediction is displayed in a distinct colour. As in the previous tab, this
table may be searched and saved in xls format.
3.4. The Additional Tools Submission Window
Finally, the fourth and last tab within CoBaltDB enables the user to submit any particular
gene to about 50 additional localization tools, i.e. tools other than those whose results are
displayed or used within CoBaltDB. These tools are organized in feature panels. The gene must
first be selected in one of the other tabs by clicking on the corresponding line of the table.
The selected gene is specified by its locus tag and amino-acid sequence. Below this
information, the localization tools that have not been used to construct the CoBaltDB
database are listed as check-boxes, organized in several panels gathering, respectively,
additional lipoprotein tools, signal peptide prediction tools, transmembrane prediction tools,
beta-barrel tools and, finally, global localization prediction meta-tools. Checking the desired
tools and then clicking on the Launch button opens, for each selected web tool, the
corresponding web site in the default browser, with the sequence and, where relevant, the Gram
type filled in at the appropriate place. Only a few web tools, marked with an asterisk, do not
have the sequence and Gram information filled in. The biologist then only needs to press the
submit button on each web site to actually launch the web tool and eventually collect the
results.
4. Contact
We hope you will find CoBaltDB useful and this guide helpful.
If you have any questions or suggestions, feel free to contact us at:
stephane.avner@univ-rennes1.fr
Thank you,
The B@SIC team.