Gene converter - Bioinformatics Platform
TABLE OF CONTENTS

1. Nomenclature systems
2. CBS: Gene converter (input)
3. CBS: Gene converter (output)
4. CBS: Customize options

SUMMARY

CBS is a helpful tool for characterizing the binding sites of certain transcription factors (TFs) in a regulatory sequence. However, other applications often use a different nomenclature for the genes involved in a study. The CBS gene converter translates gene identifiers from one nomenclature to another, which eases the exchange of information between bioinformatics tools.

DIFFICULTY: Low
TIME: 30 minutes

1. NOMENCLATURE SYSTEMS

Genes usually receive their name in the publication in which they were first described. In addition, several nomenclatures are available to refer to a gene, such as gene symbols or the accession identifiers assigned by the databases that store genomic information (FlyBase, RefSeq, etc.).

In general, a gene symbol is the official abbreviation of a full gene name. Thus, the absent, small, or homeotic discs 2 gene can be referred to as ash2. In publications, the full gene name is used the first time the gene is mentioned, and the short form is used in subsequent mentions. In bioinformatics analyses, the gene symbol is more useful because it makes gene lists easy to compare.

FlyBase is the reference repository for Drosophila. Its gene accession identifiers start with the FBgn prefix (FlyBase gene number). Thus, our ash2 gene corresponds to the accession identifier FBgn0000139. Both the gene symbol and the FlyBase number refer to a gene together with all its transcripts derived from alternative splicing, under a single identifier.

RefSeq is another major hub of genomic annotations. RefSeq identifiers are associated with individual transcripts, so each alternative form of a gene receives a unique identifier. RefSeq codes use the NM_ prefix for mRNAs, NR_ for non-coding RNAs and NP_ for proteins (among others).
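
As an illustration of the prefixes described above, the following Python sketch guesses the nomenclature of an identifier from its prefix. The function name classify_identifier, the labels and the regular expressions are assumptions made for this example and are not part of CBS. The optional version suffix (as in NM_170159.1) is also an assumption about how the identifiers may be written. Anything that matches no known prefix is simply reported as a possible gene symbol.

```python
import re

# Prefix patterns based on the identifiers quoted in this tutorial.
# These are illustrative assumptions, not a formal specification.
PATTERNS = [
    ("FlyBase gene", re.compile(r"FBgn\d+")),
    ("RefSeq mRNA", re.compile(r"NM_\d+(\.\d+)?")),
    ("RefSeq non-coding RNA", re.compile(r"NR_\d+(\.\d+)?")),
    ("RefSeq protein", re.compile(r"NP_\d+(\.\d+)?")),
]


def classify_identifier(identifier: str) -> str:
    """Return a label for the nomenclature that the identifier appears to follow."""
    cleaned = identifier.strip()
    for label, pattern in PATTERNS:
        # fullmatch requires the whole string to fit the pattern
        if pattern.fullmatch(cleaned):
            return label
    # No known prefix: it may be a gene symbol such as ash2
    return "gene symbol or other"


if __name__ == "__main__":
    for name in ["ash2", "FBgn0000139", "NM_170159"]:
        print(name, "->", classify_identifier(name))
```

A check of this kind can be useful before submitting a list to a converter, because a list that mixes nomenclatures cannot be handled by a single pairwise conversion.
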
Because up to three alternative forms are annotated for the ash2 gene, each one receives its own identifier: NM_170159, NM_170160 and NM_176558.

Finally, the NCBI website offers its own system of gene nomenclature, which operates across multiple species in the phylogenetic tree. Here, each gene is defined by a single number that groups all its alternative transcripts in a single entry. RefSeq identifiers can also be used here to access a particular gene. The ash2 gene corresponds to the code 42936 in the ENTREZ query system.

2. CBS: GENE CONVERTER (INPUT)

This is the main web form that CBS users must fill in to convert a list of gene names into another nomenclature. It consists of a text area to submit the genes (or to upload a file of genes) and a few options to customize the output. When the web page is loaded, a dummy list of genes is shown to demonstrate the functionality of this CBS tool.

The gene box must contain each gene name in the nomenclature that the user selects below. Each gene must appear on its own line. Empty lines are not processed.

The conversion from the input format to the output format must be indicated here (conversions are pairwise, except when the Full description option is requested). In this example, the gene names follow the gene symbol nomenclature. Let us convert them into RefSeq accession codes.

Other options, explained later in this tutorial, define the format of the conversion and affect the output length: whether the output has two columns or a single column, and whether all alternative transcripts are shown (RefSeq conversions only).

Throughout this procedure, the user can consult a short CBS guide in the help box attached at the end of each web page. This guide contains additional descriptions, help and comments about input options, output formats and possible errors.

3. CBS: GENE CONVERTER (OUTPUT)

This is the output when the default example is executed.
Results are shown as an interactive web table, with a link below it to save them as a flat file.

The conversion table usually contains links to the original entry associated with each Drosophila gene. Here, from CBS we reach the NCBI Entrez entry for this RefSeq identifier.

Bioinformatics developers can use the identifier on each platform to build a link directly from a document to the main entry. Thus, http://www.ncbi.nlm.nih.gov/gene?term=NM_170159 is the link in NCBI ENTREZ (RefSeq) to the ash2 gene.

For further computational analysis, the lists of genes can be saved as flat files.

Apart from the pairwise conversions, you can choose the Full description option to obtain all identifiers for a given gene. All the converted columns contain a web link to the original entry of the gene.

4. CBS: CUSTOMIZE OPTIONS

CBS offers two customization options to build more flexible conversion procedures. First, it is possible to show only the resulting identifier in the output (this is useful for submitting the result to external applications).

Second, the user can set the number of alternative RefSeq transcripts to show. Combined with the first option, this avoids biases in the results of other programs caused by the number of isoforms.

It is also possible to display only one form per gene while preserving the two-column format.

THE END