Gene converter - Bioinformatics Platform

TABLE OF CONTENTS
1. Nomenclature systems
2. CBS: Gene converter (input)
3. CBS: Gene converter (output)
4. CBS: Customize options
SUMMARY
CBS is a very helpful tool for characterizing the binding sites of certain TFs in a
regulatory sequence. However, other applications often use a different nomenclature for
the genes involved in a study. This CBS tool converts gene identifiers from one
nomenclature to another, which makes it easier to exchange information between
bioinformatics tools.
DIFFICULTY
Low
TIME
30 minutes
1. NOMENCLATURE SYSTEMS
Genes usually receive their name in the publication in which they were first
described. In addition, a gene can be referred to under several nomenclatures, such as
gene symbols or the accession identifiers assigned by the databases that store genomic
information (FlyBase, RefSeq, etc.).
In general, we consider a gene symbol to be the official abbreviation of a full gene name.
Thus, the absent, small, or homeotic discs 2 gene can be referred to as ash2. In
publications, the full gene name is used the first time the gene is mentioned, while the
short form is used in subsequent mentions. In bioinformatics analyses, the gene
symbol is more useful because it makes gene lists easy to compare.
FlyBase is the reference repository for Drosophila. Its gene accession identifiers start
with the FBgn prefix (which stands for FlyBase gene number). Thus, our ash2 gene
corresponds to the FBgn0000139 accession identifier.
Both gene symbols and FlyBase numbers refer to a gene together with all its transcripts
derived from alternative splicing, under a single identifier.
RefSeq is another major hub of genomic annotations. RefSeq identifiers are associated
with individual transcripts, so each alternative form of a gene receives a unique
identifier. RefSeq codes use the NM_ prefix for mRNAs, NR_ for non-coding RNAs and NP_
for proteins (among others).
As the ash2 gene has up to three alternative forms, each one receives its own
identifier: NM_170159, NM_170160 and NM_176558.
Finally, the NCBI website offers its own system of gene nomenclature, which covers
multiple species across the phylogenetic tree. Here, each gene is defined by a single
number that groups all its alternative transcripts in a single entry. The RefSeq
identifiers can also be used here to access a particular gene.
The ash2 gene corresponds to the code 42936 under the ENTREZ query system.
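The four nomenclatures above can be told apart by the shape of the identifier. The following is a minimal sketch of that idea. The patterns are assumptions inferred from the examples in this section (with an optional version suffix for RefSeq), not the rules that CBS itself applies, and `guess_nomenclature` is a hypothetical helper name.

```python
import re

# Patterns inferred from the examples in this tutorial. They are illustrative
# assumptions, not the checks performed by CBS.
PATTERNS = [
    ("FlyBase", re.compile(r"FBgn\d+")),
    ("RefSeq mRNA", re.compile(r"NM_\d+(\.\d+)?")),
    ("RefSeq non-coding RNA", re.compile(r"NR_\d+(\.\d+)?")),
    ("RefSeq protein", re.compile(r"NP_\d+(\.\d+)?")),
    ("Entrez Gene", re.compile(r"\d+")),
]


def guess_nomenclature(identifier):
    """Return a label for the nomenclature an identifier appears to follow."""
    identifier = identifier.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    # Anything that matches no accession pattern is treated as a gene symbol.
    return "gene symbol"


for name in ["ash2", "FBgn0000139", "NM_170159", "42936"]:
    print(name, "->", guess_nomenclature(name))
```

A check of this kind can catch a list that mixes nomenclatures before it is submitted for conversion.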
2. CBS: GENE CONVERTER (INPUT)
This is the main web form that CBS users must fill in to convert a list of gene names
into another nomenclature. It consists of a text area to submit the genes (or to
upload a file of genes), and a few options to customize the output.
Once the web page is loaded, a dummy list of genes is shown to demonstrate how
this CBS tool works.
The gene box must contain each gene name in the nomenclature that the user specifies
below. Each gene must appear on its own line. Empty lines are ignored.
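The input rule just described (one gene per line, empty lines ignored) can be sketched as follows. `parse_gene_box` is a hypothetical helper, `geneA` and `geneB` are placeholder names, and trimming surrounding whitespace is an assumption; the tutorial does not say how CBS treats it.

```python
def parse_gene_box(text):
    """Split submitted text into gene names, one per line, skipping empty lines."""
    genes = []
    for line in text.splitlines():
        name = line.strip()  # assumption: surrounding whitespace is not significant
        if name:
            genes.append(name)
    return genes


# "geneA" and "geneB" are placeholders, not real gene symbols.
example = "ash2\n\ngeneA\n   \ngeneB\n"
print(parse_gene_box(example))  # ['ash2', 'geneA', 'geneB']
```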
The source nomenclature and the target nomenclature of the conversion must also be
selected in this form. Conversions are pairwise, except when the Full description
option is requested.
In this example, the gene names follow the gene symbol nomenclature. Let us convert
them into RefSeq accession codes.
Other options, explained later in this tutorial, let users define the format of
the conversion and affect the length of the output: whether the result has two columns
or a single column, and whether all alternative transcripts are shown in the conversion
(only for RefSeq operations).
Throughout this procedure, the user can consult a short CBS guide in the help box
attached at the bottom of each web page. This guide contains additional descriptions,
help and comments about input options, output formats and possible errors.
3. CBS: GENE CONVERTER (OUTPUT)
This is the output when the default example is executed. Results are shown as an
interactive web table, with a link below the table to save them as a flat file.
The conversion table usually contains links to the original entry associated with each
Drosophila gene. For example, from CBS we can access the NCBI Entrez entry for a RefSeq
identifier.
Bioinformatics developers can use the identifier on each platform to build a link
directly from their own documents to the main entry.
Thus, http://www.ncbi.nlm.nih.gov/gene?term=NM_170159 is the link in NCBI
ENTREZ (RefSeq) to the ash2 gene.
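A link of this form can be built programmatically from an identifier. This is a minimal sketch that reproduces the URL pattern quoted above; `ncbi_gene_link` is a hypothetical helper name, and only the NCBI pattern given in this tutorial is covered.

```python
from urllib.parse import urlencode

# Base address taken from the example link in this tutorial.
NCBI_GENE_URL = "http://www.ncbi.nlm.nih.gov/gene"


def ncbi_gene_link(identifier):
    """Build the NCBI link for an identifier, using the 'term' query parameter."""
    return NCBI_GENE_URL + "?" + urlencode({"term": identifier})


print(ncbi_gene_link("NM_170159"))
# http://www.ncbi.nlm.nih.gov/gene?term=NM_170159
```

Using `urlencode` rather than plain string concatenation keeps the link valid if an identifier ever contains characters that need escaping.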
For further computational analysis, the lists of genes can be saved as flat files.
Apart from the pairwise conversions, you can choose the Full description option to
obtain all identifiers for a given gene.
Every converted column contains a web link to the original entry of the gene.
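One way to picture the Full description output is as a single record per gene that holds every identifier, indexed so that any of them leads back to the same record. The sketch below uses the ash2 identifiers given in section 1. The `GeneRecord` class and `build_index` function are hypothetical and say nothing about how CBS stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """All identifiers of one gene (hypothetical structure for illustration)."""
    symbol: str
    flybase: str
    entrez: str
    refseq: list = field(default_factory=list)


# Identifiers for ash2 as listed in section 1 of this tutorial.
ASH2 = GeneRecord(
    symbol="ash2",
    flybase="FBgn0000139",
    entrez="42936",
    refseq=["NM_170159", "NM_170160", "NM_176558"],
)


def build_index(records):
    """Map every known identifier of every gene to its record."""
    index = {}
    for record in records:
        for key in [record.symbol, record.flybase, record.entrez, *record.refseq]:
            index[key] = record
    return index


index = build_index([ASH2])
print(index["NM_170160"].flybase)  # FBgn0000139
print(index["ash2"].refseq)        # ['NM_170159', 'NM_170160', 'NM_176558']
```

With such an index, a pairwise conversion is a lookup followed by reading one field of the record.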
4. CBS: CUSTOMIZE OPTIONS
CBS offers two customization options to build more flexible conversion procedures.
First, it is possible to show only the resulting identifier in the output, which is
useful for submitting this information to other external applications.
Second, the user can set the number of alternative RefSeq transcripts that must be
shown. Combining both options makes it possible to avoid biases in the results of
other programs due to the number of isoforms.
When we choose to display only one form per gene while preserving the two-column
format, each gene appears on a single line next to one RefSeq identifier.
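The interaction between the two options can be sketched as below. This is only an illustration: `format_conversion` and its parameters are hypothetical, the tab separator is an assumption, and so is the choice of the first listed transcript when a single form per gene is requested, since the tutorial does not state which one CBS keeps.

```python
def format_conversion(mapping, two_columns=True, all_transcripts=True):
    """Render a gene -> RefSeq mapping as output lines.

    two_columns: show "gene<TAB>identifier" instead of the identifier alone.
    all_transcripts: show every transcript, or only one form per gene
    (assumption: the first one listed).
    """
    lines = []
    for gene, targets in mapping.items():
        shown = targets if all_transcripts else targets[:1]
        for target in shown:
            lines.append(f"{gene}\t{target}" if two_columns else target)
    return lines


# RefSeq identifiers for ash2 as listed in section 1.
mapping = {"ash2": ["NM_170159", "NM_170160", "NM_176558"]}

# Single column, all transcripts: ready to paste into another application.
print(format_conversion(mapping, two_columns=False))
# ['NM_170159', 'NM_170160', 'NM_176558']

# Two columns, one form per gene.
print(format_conversion(mapping, all_transcripts=False))
# ['ash2\tNM_170159']
```

Limiting the output to one form per gene means that a gene with many isoforms contributes a single line, the same as a gene with one isoform.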
THE END