Archives and Information Retrieval
CSC 487/687 Computing for Bioinformatics
Introduction
 Learning objectives:
   Understand the general arrangement of biological data in the public databases.
   Learn the information retrieval skills that will allow you to make effective use of the databases.
   Become familiar with basic operations.
   Learn how to retrieve information on a particular subject in the literature.
Primary public domain bioinformatics servers
 Public domain bioinformatics facilities:
   National Center for Biotechnology Information (NCBI), United States: databases, analysis tools
   European Bioinformatics Institute (EBI), United Kingdom: databases, analysis tools
   GenomeNet (KEGG & DDBJ), Japan: databases, analysis tools
The Archives
 Massive biological experimental data
 These biological information databases can be classified into two types:
   The first level databases: hold the raw data obtained directly from experiments ("simple")
   The second level databases: data from the first level, further reorganized based on .. in order to achieve some specific goals
The Archives
 Some examples:
   The first level databases
     Nucleic acid sequence databases: GenBank, EMBL Data Library, DNA Database of Japan (DDBJ)
     Protein sequence databases: SWISS-PROT, PIR
     Protein structure database: PDB
   The second level databases
     GDB
     TRANSFAC
     SCOP
Nucleic acid sequence databases
 International DNA Sequence Database Collaboration
   NCBI (GenBank) – USA (1982)
   EMBL (Data Library) – Europe (1982)
   DDBJ (DNA Data Bank) – Japan (1988)
NCBI
 Established in the USA in 1988 as a national resource for molecular biology information
 Creates public databases
 Conducts research in computational biology
 Develops software tools for analyzing genome data
 Disseminates biomedical information
Nucleic acid sequence databases
 GenBank entries contain:
   The nucleic acid sequence and the corresponding protein sequence
   Literature references
   Biological annotation
 A new release is made every two months
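To make the three parts of a GenBank entry concrete, here is a minimal sketch that splits a flat-file record into its top-level sections. It assumes the usual flat-file layout, which the slides do not describe: top-level keywords such as LOCUS, DEFINITION, REFERENCE, FEATURES and ORIGIN start in the first column, continuation lines are indented, and the record ends with `//`. The function names are illustrative only.

```python
from collections import defaultdict


def split_genbank_record(text: str) -> dict:
    """Group the lines of one flat-file record by top-level keyword.

    Returns a dict mapping each keyword to a list of blocks, because some
    keywords (for example REFERENCE) can occur more than once.
    """
    sections = defaultdict(list)
    current = None
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if line and not line[0].isspace():  # a new top-level keyword
            keyword, _, rest = line.partition(" ")
            current = [rest.strip()]
            sections[keyword].append(current)
        elif current is not None and line.strip():
            current.append(line.strip())    # indented continuation line
    return {key: ["\n".join(block) for block in blocks]
            for key, blocks in sections.items()}


def extract_sequence(origin_block: str) -> str:
    """Drop the position numbers and spaces from an ORIGIN block."""
    return "".join(ch for ch in origin_block if ch.isalpha())
```

In this layout the sequence lives under ORIGIN, the literature under REFERENCE, and the biological annotation under FEATURES, which corresponds to the three kinds of content listed above.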
GenBank information retrieval system
NCBI ENTREZ
 A platform that provides access to and links to databases with biological information
 Databases reached through ENTREZ:
   PubMed
   MedLine
   GenBank
   Protein databases
   Genomes
   PopSet
   Taxonomy
   OMIM
NCBI ENTREZ
 MedLine: Literature database
 OMIM: Database of human genes and genetic disorders
 GenBank: Database of all publicly available DNA sequences
 Protein databases: Database of amino acid sequences from SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq
 Genomes: Database of genomes from organisms and viruses
 PopSet: Database of DNA sequences that have been collected to analyze the evolutionary relatedness of a population
 Taxonomy: Database of names of organisms with sequences in GenBank or Prot
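Entrez can also be queried programmatically. The sketch below only builds query URLs and performs no network access. It assumes NCBI's E-utilities web interface (`esearch.fcgi` to search, `efetch.fcgi` to download records), which is not mentioned in the slides, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (an assumption, not from the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build a search URL: which records in `db` match `term`?"""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_url(db: str, ids: list, rettype: str, retmode: str = "text") -> str:
    """Build a download URL for the records with the given identifiers."""
    query = urlencode({
        "db": db,
        "id": ",".join(ids),
        "rettype": rettype,
        "retmode": retmode,
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# A literature search and a sequence download, as URLs only.
print(esearch_url("pubmed", "hemoglobin", retmax=5))
print(efetch_url("nucleotide", ["U49845"], rettype="gb"))
```

The `db` argument selects one of the Entrez databases listed above (for example `pubmed`, `nucleotide`, or `protein`), so the same two calls cover both literature and sequence retrieval.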
PubMed Central (PMC)
 The U.S. National Library of Medicine's digital archive of life sciences journal literature
 Access to the full text of articles in PMC is free, except where a journal requires a subscription for access to recent articles
OMIM: Online Mendelian Inheritance in Man
 A catalog of human genes linked to diseases
 Started by Victor A. McKusick at Johns Hopkins University
 A good place to start when you want to research a certain disease or biological molecule
 This database is cross-referenced to PubMed and other NCBI-based databases
How to submit sequence data to GenBank
 BankIt, a web-based submission interface
   http://www.ncbi.nlm.nih.gov/BankIt
 Sequin, a stand-alone submission program
   http://www.ncbi.nlm.nih.gov/Sequin
In-class exercise
Protein databases
 The Protein Information Resource (PIR) was established in 1984 by the National Biomedical Research Foundation (NBRF).
 The PIR Protein Sequence Database evolved from the original NBRF Protein Sequence Database, which was developed over 20 years.
 PIR-International is a collaboration between NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID).
 Together, the partners collect and publish what is now the oldest and largest database of biomolecular sequence, source, literature, and feature information.
PIR
 PIR-International Protein Sequence Database: an annotated,
non-redundant and cross-referenced database of protein
sequences.
 PIR Alignment Database, PIR-ALN: contains sequence
alignments of superfamilies, families and homology domains
produced from information in the Protein Sequence Database.
 FAMBASE Family Database: a searchable database
containing a single representative sequence from each protein
family.
 RESID Database of Amino Acid Modifications: based on
feature information in the Protein Sequence Database.
PIR
 http://www-nbrf.georgetown.edu/pir/
SWISS-PROT
 http://www.ebi.ac.uk/swissprot/
 A well-annotated protein sequence database established in 1986.
 It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
 A curated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases.
Note: UniProtKB/TrEMBL and UniProtKB/Swiss-Prot have been incorporated into UniProt (the Universal Protein Resource), a one-stop shop allowing easy access to all publicly available information about protein sequences.
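One practical consequence of the merge into UniProt is that a sequence downloaded in FASTA format says which section it came from. The sketch below assumes the UniProt FASTA header convention `>db|accession|entry_name description`, where `db` is `sp` for Swiss-Prot and `tr` for TrEMBL. This convention is not covered in the slides, and the function name is illustrative.

```python
# Maps the database tag in a UniProt FASTA header to the UniProtKB section.
SECTIONS = {"sp": "UniProtKB/Swiss-Prot", "tr": "UniProtKB/TrEMBL"}


def parse_uniprot_header(header: str) -> dict:
    """Split a UniProt-style FASTA header line into its fields."""
    header = header.lstrip(">").strip()
    identifiers, _, description = header.partition(" ")
    db, accession, entry_name = identifiers.split("|")
    return {
        "section": SECTIONS.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


info = parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"
)
print(info["section"], info["accession"], info["entry_name"])
```

Checking the section field is a quick way to tell a manually curated Swiss-Prot entry from a computationally annotated TrEMBL one.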
PROSITE
 http://ca.expasy.org/prosite/
 A method of determining the function of uncharacterized proteins translated from genomic or cDNA sequences.
 A database of biologically significant sites and patterns, formulated in such a way that, with appropriate computational tools, it can rapidly and reliably identify to which known family of proteins (if any) a new sequence belongs.
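As an illustration of the "appropriate computational tools" mentioned above, a PROSITE pattern can be translated into a regular expression and scanned against a sequence. This is a minimal sketch, not the PROSITE scanning software: it assumes the standard pattern syntax (elements separated by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends) and ignores rarer forms such as `>` inside brackets. The function names are made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":                   # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except these
        # "[ABC]" and single letters are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return (start index, matched residues) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# N-glycosylation site pattern; the test sequence is a made-up example.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_sites("N-{P}-[ST]-{P}", "MKNGSAL"))
```

Note that `re.finditer` reports only non-overlapping matches, so overlapping sites would need a lookahead-based scan.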
PDB
 http://www.rcsb.org/pdb/
 The single international repository for public data on the 3-dimensional structures of biological macromolecules
 Was established at Brookhaven National Laboratory in the United States
 The contents are primarily experimental data derived from X-ray crystallography and NMR experiments
 RasMol can display the 3D structure of a biological macromolecule from a PDB file
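Viewers such as RasMol work from the atomic coordinates stored in a PDB file. The sketch below reads those coordinates from the fixed-width ATOM and HETATM records. The column positions follow the published PDB file format and are an assumption here, since the slides do not describe the file layout. The names `Atom`, `parse_atoms` and `centroid` are illustrative.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract atoms from the ATOM/HETATM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, REMARK, TER and other record types
        atoms.append(Atom(
            serial=int(line[6:11]),       # columns 7-11
            name=line[12:16].strip(),     # columns 13-16
            res_name=line[17:20].strip(), # columns 18-20
            chain=line[21],               # column 22
            res_seq=int(line[22:26]),     # columns 23-26
            x=float(line[30:38]),         # columns 31-38
            y=float(line[38:46]),         # columns 39-46
            z=float(line[46:54]),         # columns 47-54
        ))
    return atoms


def centroid(atoms: List[Atom]) -> Tuple[float, float, float]:
    """Geometric center of a non-empty list of atoms."""
    n = len(atoms)
    return (sum(a.x for a in atoms) / n,
            sum(a.y for a in atoms) / n,
            sum(a.z for a in atoms) / n)
```

Because the format is column-based, the fields are sliced by position instead of being split on whitespace, which would fail when adjacent fields run together.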