Title: The EMBL Nucleotide Sequence Database (EMBL-Bank): Submission methods to the repository, data curation and innovative solutions to the handling of large datasets.
Authors: G. Mukherjee, R. Akhtar, R. J. Vaughan, G. Cochrane and R. Apweiler
Abstract: The EMBL Nucleotide Sequence Database (EMBL-Bank) represents Europe's leading publicly available repository of nucleotide sequence data. Sequence data can be submitted to the database in a number of ways. The tools offered to the submitter depend firstly on the size of the dataset, in terms of both the number of individual sequences and the size of each sequence, and secondly on the biocomputational expertise of the submitter or institution concerned. Webin is an online submission tool consisting of a set of web pages with JavaScript checking. It offers the submitter extensive online help, together with standard submission examples, so that appropriate annotation can be provided for the submitted sequences. The submitter may also upload the annotation as an EMBL-formatted feature table exported from third-party software such as Artemis. Following submission, the data are reviewed by a curator within two working days, and either accession numbers are assigned or queries are raised regarding the annotation. An important function of the curation review is to add value to the annotation whenever it is considered appropriate. Bulk submissions are those in which more than 24 related sequences are submitted. In these cases submitters are asked to provide a representative sequence which, following curator review, results in individually designed templates that the submitter completes with the appropriate annotation. A relatively recent innovation is that, instead of completing individual templates, the submitter can upload the data, with the relevant annotation and sequences, as a single FASTA-formatted file.
This method also has the advantage that the submitter can submit very large datasets, with the appropriate annotation, in a widely used format.
The demonstration will cover: small-scale submission of annotated sequence; submission of large numbers of related entries (e.g. 16S rRNA gene, EST); submission of complete genomes, importing pre-prepared annotation from third-party tools; submission of multiple annotated sequences in alignment format (large numbers of entries with complex annotation, e.g. multi-exon genes, complete viral genomes); and submission of alignment data.
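
As an illustration of the single-file route, the following minimal Python sketch reads a multi-record FASTA text and checks whether the number of records exceeds the bulk-submission threshold of 24 stated in the abstract. It is not the Webin implementation: the function names `parse_fasta` and `is_bulk_submission`, the constant `BULK_THRESHOLD`, and the layout of the header lines are assumptions made for this example, since the abstract does not specify how annotation is encoded in the uploaded file.

```python
from typing import List, Tuple

# From the abstract: a bulk submission has more than 24 related sequences.
BULK_THRESHOLD = 24


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse multi-record FASTA text into (header, sequence) pairs.

    Hypothetical helper for illustration; header content is kept verbatim
    because the annotation layout expected by the database is not given.
    """
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_bulk_submission(records: List[Tuple[str, str]]) -> bool:
    """Return True when the record count exceeds the bulk threshold."""
    return len(records) > BULK_THRESHOLD


if __name__ == "__main__":
    # Made-up example records, not real database entries.
    example = (
        ">seq1 hypothetical 16S rRNA gene\nACGT\nTTGA\n"
        ">seq2 hypothetical 16S rRNA gene\nGGCC\n"
    )
    recs = parse_fasta(example)
    print(len(recs), "records; bulk submission:", is_bulk_submission(recs))
```

A submitter-side check of this kind could be run before upload to decide between the small-scale route and the bulk route; the curator review described in the abstract would still apply in either case.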