Title: The EMBL Nucleotide Sequence Database (EMBL-Bank): Submission methods to the
repository, data curation and innovative solutions to the handling of large datasets.
Authors: G. Mukherjee, R. Akhtar, R. J. Vaughan, G. Cochrane and R. Apweiler
Abstract:
The EMBL Nucleotide Sequence Database (EMBL-Bank) represents Europe's leading publicly
available repository of nucleotide sequence data. Sequence data can be submitted to the database in
a number of ways. The tools offered to the submitter depend firstly on the size of the dataset, in
terms of both the number of individual sequences and the size of each sequence, and secondly on the
biocomputational expertise of the submitter or institution concerned.
Webin is an online submission tool consisting of a set of web pages with JavaScript checking.
The submitter is offered extensive online help and standard submission examples
to help provide appropriate annotation for the sequences they submit. The submitter may also upload
the annotation as an EMBL-formatted feature table exported from third-party software such as Artemis.
Following submission, the data is reviewed by a curator within two working days, and either
accession numbers are assigned or queries are raised regarding the annotation. An important
function of the curation review is to add value to the annotation wherever this is considered
appropriate.
Bulk submissions are those where more than 24 related sequences are submitted. In these cases
submitters are requested to provide a representative sequence which, following curator review,
results in individually designed templates that the submitter completes with the appropriate annotation.
A relatively recent innovation is that, instead of completing individual templates, the submitter can
upload their data, with the relevant annotation and sequences, as a single FASTA-formatted file. This
method also has the advantage that the submitter can submit very large datasets in a widely used
format with the appropriate annotation.
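As an illustration only, a submitter might check a FASTA-formatted file before upload along the following lines. This is a minimal sketch, not part of Webin: the function names `parse_fasta` and `is_bulk` are hypothetical, and the abstract does not specify how annotation is encoded in the header line, so the header is returned here as an uninterpreted string. The only figure taken from the abstract is the threshold of more than 24 related sequences for a bulk submission.

```python
from typing import Iterator, List, Sequence, Tuple

# From the abstract: more than 24 related sequences counts as a bulk submission.
BULK_THRESHOLD = 24


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text.

    The header is everything after '>' on the header line. How annotation
    is laid out inside the header is an assumption left to the caller.
    """
    header = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


def is_bulk(records: Sequence[Tuple[str, str]]) -> bool:
    """Return True when the record count exceeds the bulk threshold."""
    return len(records) > BULK_THRESHOLD
```

Calling `list(parse_fasta(open(path).read()))` and passing the result to `is_bulk` would indicate whether a file falls under the bulk route described above. Whether the sequences are actually related remains a judgement for the submitter and curator.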
The demonstration will cover:
• Small-scale submission of annotated sequence.
• Submission of large numbers of related entries (e.g. 16S rRNA gene, EST).
• Submission of complete genomes, importing pre-prepared annotation from third-party tools.
• Submission of multiple annotated sequences in alignment format (large numbers of entries
with complex annotation, e.g. multi-exon genes, complete viral genomes).
• Submission of alignment data.