Online (DNA and amino acid)
Sequence Information
Lecture 9
Annotation of genes
Basic bioinformatics Databases
NCBI home page
Query and return results
DNA sequence results page
Protein sequence results page
Bioinformatics Databases
• Biological data generated by various labs is
submitted to and stored in specific databases.
• The data can be:
– Nucleotide: DNA and mRNA (cDNA)
– Protein sequences
• The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: EMBL Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain related sequences, such as:
– Expressed sequence tags (ESTs): small (800 bp) fragments of
mRNA that can be used to see which genes are expressed…
Protein Databases
• The main protein database is:
• UniProt, which contains data from three
related databases:
– Swiss-Prot (most up-to-date information)
– TrEMBL (translations of coding sequences)
– PIR database [Protein Information Resource]
• Both the nucleotide and protein databases
contain much more detail than just sequences.
This additional data is referred to as gene
annotation data.
The Annotation of genes
• Once the gene sequences have been determined,
the data must be annotated. This basic
annotation includes (Klug 2010):
– Identification of regulatory regions
– Identification of coding sequences (CDS) and the exons/introns (if
the sequence is eukaryotic)…
– The amino acid sequence for the gene
– Other organisms where the DNA sequence/AA
sequence is found
– Journals/references showing where the data came from
– Links to other databases that contain information
about the gene
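In GenBank, EMBL and DDBJ flat-file records, the coding sequence is annotated with a location string such as join(a..b,c..d), where each span is an exon. The following is a minimal sketch of how the exon/intron structure could be read from such a string. The coordinates are made up for illustration and are not taken from the BTEB record, and the function names are hypothetical.

```python
import re

def parse_location(location):
    """Return (start, end) pairs, 1-based inclusive, from a simple location string.

    Only plain spans such as "5..20" or "join(5..20,40..65)" are supported.
    complement(), partial markers (< >) and remote references are ignored here.
    """
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return [(int(start), int(end)) for start, end in spans]

def introns_between(exons):
    """Introns are the gaps between consecutive exons (1-based inclusive)."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

# Hypothetical CDS location with three exons (illustrative coordinates only).
exons = parse_location("join(101..250,400..519,800..949)")
introns = introns_between(exons)
cds_length = sum(end - start + 1 for start, end in exons)

print("Exons:  ", exons)
print("Introns:", introns)
print("CDS length (bp):", cds_length)
```

A CDS length that is a multiple of three is a quick sanity check that the exons join into a whole number of codons.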
Global Sequence
Bioinformatics Database
• To facilitate finding annotated data about genes
and proteins, there are a number of
sites with specific search engines:
– EMBL has the EBI search page (previously the SRS engine)
– The SIB ExPASy search engine (this is more focused on
protein-related information)
• Consider the following query:
– What are the DNA and amino acid sequences for the
following gene: Human BTEB
– Type the following into the search text box:
– Human[organism] AND BTEB[title]
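The same fielded query can also be sent to NCBI programmatically through the E-utilities ESearch service. The sketch below only builds the query term and the request URL; it does not contact NCBI. The helper names build_term and esearch_url are hypothetical, and "nuccore" is assumed here as the E-utilities name for the Nucleotide database.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(**fields):
    """Combine field=value pairs into an Entrez term, e.g. Human[organism] AND BTEB[title]."""
    return " AND ".join(f"{value}[{field}]" for field, value in fields.items())

def esearch_url(term, db="nuccore", retmax=20):
    """Return the ESearch URL for a term (no network request is made here)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

term = build_term(organism="Human", title="BTEB")
url = esearch_url(term)

print(term)
print(url)
```

Opening the printed URL in a browser returns an XML list of matching record identifiers, which correspond to the results shown on the Entrez search page.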
NCBI Entrez search page
BTEB NCBI Nucleotide Record
Coding section of gene
The exon/intron structure is also available in graphic form
Further information
• In the right-hand column you will find links to
online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences
contained in the database.
• Information on the amino acid sequence
obtained from the CDS of the gene is also given. The text box
also provides a link to information on the
protein in the UniProt database.
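The amino acid sequence shown in a record is the translation of its CDS. As a rough illustration of that step, the sketch below translates a short made-up DNA string with the standard genetic code. The input is not the BTEB sequence, and records that use a different translation table are not covered.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a coding sequence codon by codon; '*' marks a stop codon.

    Unknown codons (e.g. containing N) become 'X'; trailing bases that do not
    fill a whole codon are dropped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

# Made-up example sequence: ATG GCC AAA TGA
print(translate("ATGGCCAAATGA"))
```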
An EMBL nucleotide record
• Annotated data can also be found in the EMBL database.
• BTEB EMBL record: shows the main record.
• Clicking on the “text” link in the top right-hand
corner will give the essential features of
the gene: BTEB-EMBL-EBI_text_record.
• An ExPASy database search gives the following
information for this gene: type BTEB and then
BTEB and Human
The BTEB Protein record
A link to a graphic representation of the protein and the relevant
annotated data can be found at: BTEB Human Protein
Other databases
• The nucleotide (GenBank and EMBL) and
protein (UniProt) databases contain the “raw data” and
are referred to as “primary databases”.
– More specific databases derive data from these
and are referred to as secondary databases;
examples include protein family and sequence
similarity databases such as PROSITE and PRINTS
– There are databases which contain information
about specific organisms such as E. coli; these can be found using the
Genomes OnLine Database (GOLD)
Other databases
– Databases for specific types of sequences, such as
those associated with promoters and other regulatory
elements; dbEST; homologous structure alignment databases
– Structural databases from the Protein Data Bank (PDB)
– Online Mendelian Inheritance in Man (OMIM), which
contains information on human genes and genetic disorders
• The January issue of the journal Nucleic Acids Research
provides an up-to-date analysis of current
online bioinformatics databases: Nucleic Acids
Research database issue
Other important information sources
• PubMed: literature search covering journal articles,
conference proceedings, books etc.
Search under many fields: keyword, author…
Returns: journal articles/abstracts
Two types: general/review.
BTEB PubMed search found at:
• The user can register an NCBI account to manage
their activity and store the findings of gene
searches, PubMed searches… This information
can be downloaded, emailed…
BTEB pubmed search result
• The EMBL-EBI record: BTEB_”text”_record.
• The NCBI: BTEB NCBI Nucleotide Record
• The DDBJ: BTEB flatfile Record
• Exercise: write a brief report comparing and
contrasting the core elements of both records;
refer to pages 8-16 in Bioinformatics: A Practical
Guide to the Analysis of Genes and Proteins, 3rd
edition. The book can be found in the library.
• Search for the DNA sequence of the following gene:
– Human Leukocyte Elastase gene, linear DNA [hint:
it should be 5292 bp long].
– Retrieve the record, then download and save the
FASTA file.
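Once the FASTA file has been saved, its sequence length can be checked against the hint. A FASTA file has a header line starting with ">" followed by one or more lines of sequence. The sketch below parses FASTA text held in a string; the two-line example record is made up, and read_fasta is a hypothetical helper name.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}

# Made-up record for illustration; a downloaded file would be read with
# open(path).read() instead, where path is wherever the file was saved.
example = """>example_record hypothetical sequence
ATGGCCATTG
TAATGGGCCG
"""

for name, sequence in read_fasta(example).items():
    print(name, len(sequence), "bp")
```

For the exercise record, the printed length should match the 5292 bp given in the hint.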