An Introduction to Bioinformatics
Molecular Biology Databases
AIMS
To introduce the major databases
- nucleotide
- protein
To explain how to search the appropriate databases
To explain how to retrieve information from databases
OBJECTIVES
Choose appropriate databases for information retrieval
Use Boolean operators to search databases
Retrieve nucleotide and protein sequence files
Introduction
• Hundreds!
• Databases of databases!
• Acronym rich!
• Subcomponents
- organisms
- structure
- metabolism…
• Searched
- text, sequences
Historically
• 1960s
• Margaret Dayhoff - Protein Sequences
(Eck, R. V., and M. O. Dayhoff. 1966. Atlas of Protein Sequence and Structure 1966. National Biomedical Research Foundation, Silver Spring, Maryland.)
• 1980s - explosion in DNA sequences
• EMBL (European Molecular Biology Laboratory)
• NIH (National Institutes of Health) GenBank
• DDBJ (DNA Data Bank of Japan)
• 1988
• agreed on international collaboration
Primary Databases
• Experimentally determined nucleotide sequence
• Inferred protein sequence
– EMBL, GenBank, DDBJ (nucleotides)
– GenPept (proteins)
– PIR Protein Identification Resource (proteins)
– SWISS-PROT (proteins)
• Which to choose?
Composite Databases
SWISS-PROT + PIR + GenPept +
SWISS-PROT, SWISSNEW, TrEMBL, TrEMBLnew, GenBank, PIR, WormPep and PDB
Secondary Databases
• Analytical results of primary databases
• Searching for related patterns
– Prosite
– Pfam
More on these later
Sub-Databases
• EST - Expressed Sequence Tags
• STS - Sequence Tagged Sites
• SNP - Single Nucleotide Polymorphisms
• OMIM - Online Mendelian Inheritance in Man
Searching and Retrieval
• Entrez
- National Center for Biotechnology Information
• SRS
- European Bioinformatics Institute
• DBGET
- Japan’s GenomeNet.
Capable of retrieving specific nucleotide or protein sequences.
Provide links to additional related information.
Entrez
Entrez Tutorial
Q: Are there any genes that code for penicillin binding in the Mycobacterium genome?
• Search for penicillin-binding genes
• Search for Mycobacterium tuberculosis
• Combine the searches: #1 AND #2
• Scan the output
Example of a text-based search to identify genes that have already been annotated.
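The tutorial steps (two searches, then combined with AND) can also be written as a single query string. The sketch below is only illustrative and is not part of the original tutorial: it builds, but does not send, a request URL for NCBI's E-utilities ESearch service. The helper names `combine_and` and `esearch_url`, the choice of the `gene` database, the `[Organism]` field tag, and the `retmax` value are all assumptions made for this example.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (text search returning record IDs).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine_and(*terms):
    """Join search terms with AND, wrapping each term in parentheses.

    Hypothetical helper: mirrors the '#1 AND #2' step of the tutorial.
    """
    return " AND ".join(f"({term})" for term in terms)


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an ESearch request URL.

    Hypothetical helper; db, term and retmax are ESearch parameters.
    """
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Search 1: penicillin-binding; search 2: the organism (assumed field tag).
query = combine_and("penicillin-binding", "Mycobacterium tuberculosis[Organism]")
url = esearch_url("gene", query)

print(query)
print(url)
```

Opening the printed URL in a browser should return the matching record identifiers, which corresponds to the "scan the output" step.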
SRS guide
Searching the Databases
• Subject
• Accession Numbers, e.g. AF208262
• Author
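Retrieval by accession number can be sketched in the same spirit. The example below only builds a request URL for NCBI's E-utilities EFetch service and does not contact the server. It assumes that AF208262 is a nucleotide record (hence the `nuccore` database) and that FASTA text is the desired format; the helper name `efetch_fasta_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities EFetch endpoint (retrieves full records by identifier).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build (but do not send) an EFetch URL asking for a FASTA record.

    Hypothetical helper; db='nuccore' is an assumption about the record type.
    """
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


fasta_url = efetch_fasta_url("AF208262")
print(fasta_url)

# To actually download the sequence (requires network access):
# from urllib.request import urlopen
# with urlopen(fasta_url) as response:
#     print(response.read().decode())
```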
Boolean Operators
AND will locate all records containing both words, e.g. human AND protease
OR will locate all records containing either word, not necessarily both, e.g. human OR protease
NOT will locate records containing one word but not the other, e.g. human NOT protease
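The three operators behave like set operations on the records that match each word. The sketch below illustrates this with three made-up records; it is a toy model of the idea, not how any of the database search engines is implemented, and the `records` data and `matching` helper are invented for the example.

```python
# Toy "database": record ID -> description (invented for illustration).
records = {
    1: "human protease inhibitor",
    2: "mouse protease",
    3: "human kinase",
}


def matching(word):
    """Return the set of record IDs whose description contains the word."""
    return {rid for rid, text in records.items() if word in text.lower().split()}


human = matching("human")        # records 1 and 3
protease = matching("protease")  # records 1 and 2

both = human & protease          # human AND protease -> set intersection
either = human | protease        # human OR protease  -> set union
only_human = human - protease    # human NOT protease -> set difference

print(sorted(both))        # [1]
print(sorted(either))      # [1, 2, 3]
print(sorted(only_human))  # [3]
```

AND narrows a search, OR broadens it, and NOT excludes records, which is why combining two separate searches with AND returns fewer hits than either search alone.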