BIOINFORMATICS COURSE
Academic year 2015-2016
NUCLEOTIDE and PROTEIN databases
NCBI/GenBank (USA), 1983
EMBL/ENA (Europe), 1982
NIG/DDBJ (Japan), 1984
The International Nucleotide Sequence Database Collaboration (INSDC, 1988)
NCBI: GenBank
NIG: DDBJ
EBI: ENA
GenBank (NCBI)
http://www.ncbi.nlm.nih.gov/genbank/
GenBank is a comprehensive public database of nucleotide sequences
and supporting bibliographic and biological annotations.
GenBank data are available at no cost through FTP or through
a wide range of web-based retrieval and analysis tools.
It is built primarily from sequence data submitted by authors
(e.g. via BankIt) and from bulk submissions of high-throughput data from
sequencing centres (e.g. via Sequin).
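As one illustration of programmatic retrieval, the sketch below builds a request for a GenBank flat-file record. It assumes the NCBI E-utilities `efetch` service, which the slides do not mention; the function names `genbank_url` and `fetch_genbank` are hypothetical helpers introduced only for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities efetch endpoint (not named in the slides).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_url(accession: str) -> str:
    """Build an efetch URL asking for a nucleotide record in GenBank flat-file format."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_genbank(accession: str) -> str:
    """Download the record as plain text (requires network access)."""
    with urlopen(genbank_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_genbank("M60495")` should return the same flat file that the web interface displays, provided the service is reachable.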
Each GenBank record, consisting of both a sequence and
its annotations, is assigned a unique identifier called an
accession number
The accession number remains constant over the lifetime of
the record, even when there is a change to the sequence or to
the annotation.
ACCESSION Number: AF000001
Changes to the sequence data itself are tracked by an
integer extension of the accession number
ACCESSION: AF000001
VERSION: AF000001.1
In addition, each version of the DNA sequence is also
assigned a unique NCBI identifier called a
GI number
ACCESSION AF000001
VERSION AF000001.1 GI: 987654321
When a change is made to a sequence in a GenBank
record, a new GI number is issued to the updated
sequence and the version extension of the
Accession.version identifier is incremented.
Before the change: VERSION AF000001.1 GI: 987654321
After the change: VERSION AF000001.2 GI: 998594213
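The Accession.version rule can be sketched in a few lines. This is a minimal illustration, not NCBI code; the helper names `split_accession_version` and `is_sequence_update` are hypothetical.

```python
from typing import Tuple


def split_accession_version(identifier: str) -> Tuple[str, int]:
    """Split 'AF000001.2' into the stable accession and the integer version."""
    accession, _, version = identifier.partition(".")
    if not accession or not version.isdigit():
        raise ValueError(f"not an Accession.version identifier: {identifier!r}")
    return accession, int(version)


def is_sequence_update(old: str, new: str) -> bool:
    """True when both identifiers share an accession and the version has increased."""
    old_accession, old_version = split_accession_version(old)
    new_accession, new_version = split_accession_version(new)
    return old_accession == new_accession and new_version > old_version
```

For the example on the slide, `is_sequence_update("AF000001.1", "AF000001.2")` is true: the accession is unchanged and only the version extension moves forward.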
How to search GenBank?
Entrez Nucleotide
http://www.ncbi.nlm.nih.gov/nucleotide/
How are Nucleotide records organized?
What kinds of annotation are associated with a nucleotide sequence?
What kind of information can I retrieve?
TRY NOW:
Find the record corresponding to
Accession Number: M60495
Search for M60495 at EBI: can you find the same information?
Search for M60495 at NIG: can you find the same information?
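To see how a nucleotide record is organized, it can help to read the header of a GenBank flat file field by field. The sketch below is a simplified reader, not a full parser: it assumes keywords start in the first column and that indented lines continue the previous field (so sub-keywords such as ORGANISM are folded into their parent). The sample text reuses the placeholder identifiers from the slides and is not a real record.

```python
# Illustrative text only: identifiers are the placeholders used in the slides.
SAMPLE = """\
DEFINITION  Hypothetical placeholder record used only
            for illustration.
ACCESSION   AF000001
VERSION     AF000001.1  GI:987654321
"""


def parse_header(text: str) -> dict:
    """Collect top-level header fields that appear before FEATURES/ORIGIN."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line.startswith(("FEATURES", "ORIGIN")):
            break  # the annotation table and the sequence are not handled here
        if not line.strip() or line[:1].isspace():
            # Indented line: treat it as a continuation of the current field.
            if key and line.strip():
                fields[key] += " " + line.strip()
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


header = parse_header(SAMPLE)
accession_version, gi = header["VERSION"].split()
```

Running the same function on the M60495 flat file from NCBI, EBI or NIG is one way to compare what each site reports, bearing in mind that EBI uses its own flat-file layout.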
Does the M60495 sequence correspond to the complete
sequence of the human profilaggrin mRNA?
If not, how can you find the complete sequence of the
human profilaggrin mRNA?
TRY NOW:
Search Nucleotide for profilaggrin as a text query
(Use Boolean operators or Advanced search if necessary)
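A Boolean text search can also be written out explicitly. The sketch below combines field-tagged clauses with AND and wraps them in an E-utilities `esearch` request; the endpoint and the `[Title]` and `[Organism]` field tags are assumptions not shown on the slides, and `entrez_term` and `esearch_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint (not named in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def entrez_term(*clauses: str, operator: str = "AND") -> str:
    """Join field-tagged clauses with an uppercase Boolean operator."""
    return f" {operator} ".join(clauses)


def esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Build the search URL; the response lists the identifiers of matching records."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term, 'retmax': retmax})}"


term = entrez_term("profilaggrin[Title]", '"Homo sapiens"[Organism]')
url = esearch_url(term)
```

The same `term` string can be pasted into the Nucleotide search box, which makes it easy to check the query interactively before scripting it.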
The human profilaggrin (FLG) mRNA
The RefSeq database
Database of Expressed Sequence Tags (dbEST)
Sequence Read Archive (SRA)
What about PROTEIN sequences?
Amino acid sequences at NCBI
The majority of amino acid sequences arise
from the translation of nucleotide sequences.
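Conceptual translation of a coding sequence can be sketched as follows. This minimal example assumes the standard genetic code, reads from the first base, and stops at the first stop codon; real pipelines also handle alternative codes, partial codons and ambiguity symbols.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence (DNA or RNA) up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For instance, `translate("ATGGCCTAA")` gives `"MA"`: ATG codes for methionine, GCC for alanine, and TAA is a stop codon.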
Protein records at NCBI
RefSeq Protein records at NCBI
The first amino acid sequence database was
developed by Margaret O. Dayhoff.
Margaret Belle Dayhoff
(1925 – 1983)
was an American physical chemist and
a pioneer in the field of bioinformatics
From this archive grew the
Protein Information Resource (PIR)
at the National Biomedical Research Foundation
of the Georgetown University Medical Center
in Washington DC, USA
UniProt is a single worldwide database
of protein sequence and function,
unifying the PIR-PSD, Swiss-Prot, and TrEMBL databases
(Created in 2002 thanks to an NIH grant)
The Universal Protein Resource
The mission of UniProt is:
to provide the scientific community with a
comprehensive, high-quality and freely
accessible resource of
protein sequence and functional information.
The UniProt databases
UniProt Knowledgebase
UniProtKB
UniProt Reference Clusters
UniRef
UniProt Archive
UniParc
UniProt Knowledgebase
UniProtKB
The UniProt Knowledgebase, the centrepiece of the UniProt Consortium’s activities, is
an expertly and richly curated protein database, consisting of two sections called
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL.
UniProtKB/Swiss-Prot
This is a high-quality, manually
annotated (reviewed) and
non-redundant protein sequence
database, which brings
together experimental results
and computed features.
UniProtKB/TrEMBL
This section contains high-quality,
computationally analysed
(unreviewed) records: all
protein sequences
(redundant) from TrEMBL.
UniProt Reference Clusters
UniRef
Three UniRef databases merge sequences automatically
across species on the basis of sequence identity.
UniRef100:
combines identical sequences and subfragments
with 11 or more residues (from any organism)
into a single UniRef entry
UniRef90:
is built by clustering UniRef100 sequences such
that each cluster is composed of sequences that
have at least 90% identity (40% reduction)
UniRef50:
is built by clustering UniRef90 sequences such
that each cluster is composed of sequences that
have at least 50% identity (65% reduction)
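The idea of clustering by an identity threshold can be illustrated with a toy example. This is not the UniRef procedure, which relies on sequence alignment and dedicated clustering software; the sketch assumes pre-aligned sequences of equal length and compares each one, position by position, with the first member (the representative) of every existing cluster. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy comparison needs sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first cluster whose representative is similar enough."""
    clusters = []  # each cluster is a list; its first element is the representative
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            representative = cluster[0]
            if (len(representative) == len(seq)
                    and percent_identity(representative, seq) >= threshold):
                cluster.append(seq)
                break
        else:
            clusters.append([seq])  # no cluster matched: start a new one
    return clusters


# Made-up 10-residue peptides, used only to show the effect of the threshold.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYIGGGG"]
at_90 = greedy_cluster(toy, 90)
at_50 = greedy_cluster(toy, 50)
```

With these toy peptides the 90% threshold yields two clusters and the 50% threshold a single one, which mirrors why UniRef50 is smaller than UniRef90.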
http://www.uniprot.org (2014!!)
Search for: filaggrin
UniProt records
2013!!!
UniProt records (II)
How to search UniProt
2013!!
Search UniProt for:
RRM (> 113K records)
RRM in human (> 1K records)
RRM (as domain) in human (241 records)
Have a look at Q9NW13,
corresponding to RBM28
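The same searches can be expressed as URLs. The sketch below assumes the current UniProt REST service at `rest.uniprot.org` and its `organism_id` query field, neither of which appears on the slides (the screenshots predate that interface); `search_url` and `entry_url` are hypothetical helper names. The domain-restricted search is left out because its query field is not given in the source.

```python
from urllib.parse import urlencode

# Assumption: base URL of the current UniProtKB REST service.
UNIPROT = "https://rest.uniprot.org/uniprotkb"


def search_url(query: str, fmt: str = "tsv", size: int = 25) -> str:
    """Build a UniProtKB text-search URL returning `size` hits in the given format."""
    return f"{UNIPROT}/search?{urlencode({'query': query, 'format': fmt, 'size': size})}"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """Build the URL of a single UniProtKB entry in the given format."""
    return f"{UNIPROT}/{accession}.{fmt}"


rrm_human = search_url("RRM AND organism_id:9606")  # 9606 = Homo sapiens taxon ID
rbm28 = entry_url("Q9NW13")
```

Record counts returned today will differ from the figures on the slides, since the database has grown since the screenshots were taken.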
PROSITE
InterPro
Protein resources at NCBI
NCBI
Structure db
Benson DA et al. (2011). GenBank. Nucleic Acids Res. 39:D32-7.
NCBI: ENTREZ SEQUENCE
http://www.ncbi.nlm.nih.gov/books/NBK44864/