GenBank Accession Number Reference Sheet:
The International Nucleotide Sequence Database Collaboration (INSDC) consists of the DNA
Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank
at NCBI. As part of the Collaboration, all three organizations accept new sequence submissions
and share sequence data among the three databases. To facilitate the exchange of data, each
member of the collaboration is assigned certain accession prefixes. In addition to the accession
number, GenBank records also have a GI number. The GI number is simply a series of digits
assigned consecutively to sequences submitted to NCBI.
Format of GenBank accession numbers:
Type         Format
Nucleotide   1 letter + 5 digits, or 2 letters + 6 digits
Protein      3 letters + 5 digits
WGS          4 letters + 2 digits for WGS assembly version + 6-8 digits
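The formats listed above can be checked mechanically. The following is a minimal sketch, assuming the three formats are the only ones of interest; the names `ACCESSION_PATTERNS` and `classify_accession` are illustrative and not part of any NCBI tool.

```python
import re

# Patterns transcribed from the format table; names are hypothetical.
ACCESSION_PATTERNS = [
    # 4 letters + 2 digits (assembly version) + 6-8 digits
    ("WGS", re.compile(r"^[A-Z]{4}\d{2}\d{6,8}$")),
    # 3 letters + 5 digits
    ("Protein", re.compile(r"^[A-Z]{3}\d{5}$")),
    # 1 letter + 5 digits, or 2 letters + 6 digits
    ("Nucleotide", re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")),
]


def classify_accession(accession: str) -> str:
    """Return the format type of an accession, or "Unknown"."""
    # Drop any ".version" suffix and normalise case before matching.
    base = accession.strip().upper().split(".", 1)[0]
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(base):
            return label
    return "Unknown"
```

For example, `classify_accession("U12345")` returns `"Nucleotide"`. Anything that does not fit one of the three formats, including RefSeq accessions, comes back as `"Unknown"`.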
Primary GenBank accession number prefixes:
Prefixes        Data Source
AE, CP, CY      Genome projects (nucleotide)
U, AF, AY, DQ   Direct submissions (nucleotide)
AAAA-AZZZ       Whole genome shotgun sequences (nucleotide)
AAA-AZZ         Protein ID
EAA-EZZ         WGS protein ID
O, P, Q         Swiss-Prot (protein)
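As an illustration, the prefix list above can be turned into a small lookup. This is a sketch that covers only the prefixes shown in this sheet, not the complete INSDC assignment; `EXACT_PREFIXES` and `lookup_data_source` are hypothetical names.

```python
import re
from typing import Optional

# Only the prefixes listed in this sheet; the full prefix list is much longer.
EXACT_PREFIXES = {
    "AE": "Genome projects (nucleotide)",
    "CP": "Genome projects (nucleotide)",
    "CY": "Genome projects (nucleotide)",
    "U": "Direct submissions (nucleotide)",
    "AF": "Direct submissions (nucleotide)",
    "AY": "Direct submissions (nucleotide)",
    "DQ": "Direct submissions (nucleotide)",
    "O": "Swiss-Prot (protein)",
    "P": "Swiss-Prot (protein)",
    "Q": "Swiss-Prot (protein)",
}


def lookup_data_source(accession: str) -> Optional[str]:
    """Map an accession to its data source, or None if the prefix is not listed."""
    # The prefix is the leading run of letters.
    match = re.match(r"[A-Z]+", accession.strip().upper())
    if match is None:
        return None
    prefix = match.group(0)
    if prefix in EXACT_PREFIXES:
        return EXACT_PREFIXES[prefix]
    # Ranged prefixes: AAAA-AZZZ, AAA-AZZ, EAA-EZZ.
    if len(prefix) == 4 and prefix.startswith("A"):
        return "Whole genome shotgun sequences (nucleotide)"
    if len(prefix) == 3 and prefix.startswith("A"):
        return "Protein ID"
    if len(prefix) == 3 and prefix.startswith("E"):
        return "WGS protein ID"
    return None
```

Returning `None` for unlisted prefixes, rather than guessing, keeps the sketch honest about how partial this table is.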
Version number suffix:
GenBank sequence identifiers consist of the record's accession number followed by a dot
and a version number (i.e. accession.version). The version number is incremented whenever the
sequence data in the record changes.
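Splitting an identifier into its accession and version parts is straightforward. The sketch below assumes the version, when present, is a plain non-negative integer; `split_accession_version` is an illustrative name.

```python
from typing import Optional, Tuple


def split_accession_version(identifier: str) -> Tuple[str, Optional[int]]:
    """Split "accession.version" into (accession, version).

    The version is None when the identifier carries no suffix.
    """
    accession, dot, version = identifier.strip().partition(".")
    if not dot:
        return accession, None
    if not version.isdecimal():
        raise ValueError(f"invalid version suffix in {identifier!r}")
    return accession, int(version)
```

For example, `split_accession_version("AF123456.2")` returns `("AF123456", 2)`, while a bare accession returns `None` as its version.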
RefSeq Accession Format:
RefSeq accession numbers do not follow the standards set by INSDC. They have a distinct format of
2 letters + underscore + 6 digits (e.g. NM_012345). RefSeq records can be either curated (manually
reviewed by NCBI staff or collaborators) or automated (records not individually reviewed).
Prefixes   Molecule   Method
NC, NG     Genomic    Curated
NM         mRNA       Curated
NR         RNA        Curated
NP         Protein    Curated
NT, NW     Genomic    Automated
XM         mRNA       Automated
XR         RNA        Automated
XP         Protein    Automated
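The RefSeq prefix table maps directly onto a dictionary. The following sketch looks up the molecule type and review method from the two-letter prefix; it does not validate the digits after the underscore, and `REFSEQ_PREFIXES` and `describe_refseq` are hypothetical names.

```python
from typing import Optional, Tuple

# (molecule, method) for each RefSeq prefix listed in this sheet.
REFSEQ_PREFIXES = {
    "NC": ("Genomic", "Curated"),
    "NG": ("Genomic", "Curated"),
    "NM": ("mRNA", "Curated"),
    "NR": ("RNA", "Curated"),
    "NP": ("Protein", "Curated"),
    "NT": ("Genomic", "Automated"),
    "NW": ("Genomic", "Automated"),
    "XM": ("mRNA", "Automated"),
    "XR": ("RNA", "Automated"),
    "XP": ("Protein", "Automated"),
}


def describe_refseq(accession: str) -> Optional[Tuple[str, str]]:
    """Return (molecule, method) for a RefSeq accession, else None."""
    # RefSeq accessions separate prefix and digits with an underscore.
    prefix, underscore, _rest = accession.strip().upper().partition("_")
    if not underscore:
        return None
    return REFSEQ_PREFIXES.get(prefix)
```

For example, `describe_refseq("NM_012345")` returns `("mRNA", "Curated")`, and an INSDC-style accession with no underscore returns `None`.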
The complete list of accession number prefixes is available at http://www.ncbi.nlm.nih.gov/Sequin/acc.html.