Online (DNA and Amino Acid)
Sequence Information
Lecture 9
Annotation of genes
Basic bioinformatics Databases
NCBI home page
Query and return results
DNA sequence results page
Protein sequence results page
Bioinformatics Databases
• Biological data generated by various labs is
submitted to and stored in specific databases.
• The data can be:
– Nucleotide: DNA and mRNA (cDNA)
– Protein sequences
• The main nucleotide sequence databases are:
– United States: GenBank (NCBI)
– Europe: EMBL Nucleotide Sequence Database (EMBL)
– Japan: DNA Data Bank of Japan (DDBJ)
• These databases also contain related sequences:
– Expressed sequence tags (ESTs): short (about 800 bp) stretches of
mRNA that can be used to see which genes are expressed…
Protein Databases
• The main protein database is:
• UniProt, which contains data from three
related databases:
– Swiss-Prot (most up-to-date information)
– TrEMBL (translations of coding sequences)
– PIR [Protein Information Resource]
• Both the nucleotide and protein databases
contain much more detail than just sequences.
This additional data is referred to as gene
annotation data.
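TrEMBL entries are described above as translations of coding sequences. A minimal sketch of that translation step is given here, assuming the standard genetic code and a coding sequence that starts in frame; the function name translate and the example sequence are illustrative only and are not taken from any database.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Codons containing ambiguous bases are reported as "X".
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base coding sequence: ATG GCC AAG TAA
print(translate("ATGGCCAAGTAA"))  # prints MAK
```

Real database translations also record which genetic code table applies to the organism, so this sketch should be read as the simplest case only.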
The Annotation of Genes
• Once the gene sequences have been determined,
the data must be annotated. This basic
annotation includes: (Klug 2010)
– Identifying regulatory regions
– Identifying coding sequences (CDS) and the exons/introns (if
the sequence is eukaryotic)….
– The amino acid sequence for the gene
– Other organisms in which the DNA sequence/AA
sequence is found
– Journals/references showing where the data came from
– Links to other databases that contain information
about the gene
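In GenBank-style records the coding sequence annotation is written as a location string such as join(101..250,400..519), where each range is an exon segment. The sketch below shows how the exon and intron coordinates could be recovered from such a string. The coordinates are hypothetical, and the helper names are illustrative; only simple forward-strand locations are handled (no complement() and no partial-end markers).

```python
import re


def parse_cds_location(location: str) -> list[tuple[int, int]]:
    """Return the (start, end) exon segments, 1-based and inclusive."""
    return [(int(start), int(end))
            for start, end in re.findall(r"(\d+)\.\.(\d+)", location)]


def introns_between(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the gaps between consecutive exon segments."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


# Hypothetical CDS location, not taken from a real record.
location = "join(101..250,400..519,900..1010)"
exons = parse_cds_location(location)
introns = introns_between(exons)
coding_length = sum(end - start + 1 for start, end in exons)

print(exons)          # [(101, 250), (400, 519), (900, 1010)]
print(introns)        # [(251, 399), (520, 899)]
print(coding_length)  # 381
```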
Global Sequence
Bioinformatics Database
• To facilitate finding annotated data about genes
and proteins, a number of sites provide
specific search engines:
– EMBL has the EBI search page (previously the SRS engine)
– The SIB ExPASy search engine (this is more focused on
protein-related information)
• Consider the following query:
– What is the DNA and amino acid sequence for the
following gene: Human BTEB
– Type the following into the search text box:
– Human[Organism] AND BTEB[Title]
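The same fielded query can also be sent programmatically through the NCBI E-utilities esearch endpoint. The sketch below only builds the request URL and makes no network call; the helper names and the choice of retmax=20 are assumptions made for illustration, and NCBI's usage guidelines should be checked before sending automated requests.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_term(organism: str, title_word: str) -> str:
    """Combine an organism filter and a title word with the AND operator."""
    return f"{organism}[Organism] AND {title_word}[Title]"


def build_esearch_url(term: str, db: str = "nucleotide", retmax: int = 20) -> str:
    """Return an esearch URL for the given database and search term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_entrez_term("Human", "BTEB")
url = build_esearch_url(term)
print(term)
print(url)
```

Changing db to "protein" or "pubmed" would direct the same kind of query at the protein or literature databases.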
NCBI Entrez search page
BTEB NCBI Nucleotide Record
Coding section of gene
The exon/intron structure is also available in graphic form
Further information
• In the right-hand column you will find links to
online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences
contained in the database.
• Information on the amino acid sequence
obtained from the CDS of the gene is also given. The text box
also provides a link to information on the
protein in the UniProt database.
An EMBL nucleotide record
• Annotated data can also be found in the EMBL database.
• BTEB EMBL record: shows the main record.
• Clicking on the “text” link at the top right-hand
corner will give the essential features of
the gene: BTEB-EMBL-EBI_text_record.
• An ExPASy database search gives the following
information for this gene: type BTEB, and then
BTEB and Human.
The BTEB Protein record
A link to a graphic representation of the protein and the relevant
annotated data can be found at: BTEB Human Protein
Other databases
• The nucleotide (GenBank and EMBL) and
protein (UniProt) databases contain the “raw data” and
are referred to as “primary databases”.
– More specific databases derive data from these
and are referred to as secondary databases;
examples include protein family and sequence
similarity databases such as PROSITE and PRINTS
– There are databases which contain information
about specific organisms such as E. coli, e.g. the
Genomes OnLine Database (GOLD)
Other databases
– Databases for specific types of sequences, such as
those associated with promoters and other regulatory
elements; dbEST; homologous structure alignment
– Structural databases such as the Protein Data Bank (PDB)
– Online Mendelian Inheritance in Man (OMIM), which
contains information on human genes and genetic disorders
• The January issue of the journal Nucleic Acids Research
provides an up-to-date analysis of current
online bioinformatics databases: Nucleic Acids
Research database issue
Other important information sources
• PubMed: literature search covering journal articles/
conference proceedings/books etc.
Search under many fields: keyword, author….
Returns: journal articles/abstracts
Two types: general/review.
BTEB PubMed search found at:
• The user can register an NCBI account to manage
their activity and store the results of gene
searches, PubMed searches…. This information
can be downloaded, emailed….
BTEB PubMed search result
• The EMBL-EBI record: BTEB_“text”_record.
• The NCBI record: BTEB NCBI Nucleotide Record
• The DDBJ record: BTEB flatfile Record
• Exercise: write a brief report comparing and
contrasting the core elements of these records;
refer to pages 8-16 in Bioinformatics: A Practical
Guide to the Analysis of Genes and Proteins, 3rd
edition. The book can be found in the library.
• Search for the following gene (DNA):
– Human Leukocyte Elastase gene, linear DNA [hint:
it should be 5292 bp long].
– Retrieve the record, then download and save the
FASTA file.
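Once the FASTA file has been saved, the length hint can be checked with a few lines of code. This is a minimal sketch of a FASTA reader, assuming a plain-text file with ">" header lines; the example record below is a made-up placeholder, not the elastase sequence, and the function names are illustrative.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def has_expected_length(sequence: str, expected_bp: int) -> bool:
    """Return True if the sequence has the expected number of bases."""
    return len(sequence) == expected_bp


# Hypothetical two-line record used only to exercise the parser.
example = ">example_record hypothetical sequence\nATGCATGC\nGGCC\n"
records = read_fasta(example)
sequence = records["example_record hypothetical sequence"]
print(len(sequence))                      # prints 12
print(has_expected_length(sequence, 12))  # prints True
```

For the exercise, the text of the downloaded file would be passed to read_fasta and the single sequence checked with has_expected_length(sequence, 5292).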