Finding Protein and Molecular Structures
Part of the Jmol Training Guide from the MSOE Center for BioMolecular Modeling
Interactive version available at http://cbm.msoe.edu/teachingResources/jmol/jmolTraining/structures.html
Introduction
In order to view a protein or molecule using Jmol, or any molecular visualization program, you need to have a
3-dimensional structure file. These files contain the (X, Y, Z) coordinates for the atoms that make up a
structure, along with information about each atom.
These files can vary dramatically in both size and internal format, depending on how large the structure is and
how the structure file was created. The most common molecular structure file formats that you will be using
with Jmol are Protein Databank (.pdb) files and MDL Molfile (.mol) files.
Types of Structure Files
Protein Databank (.pdb) Files
The Protein Databank (.pdb) file format is curated and annotated by the RCSB Protein Databank (www.pdb.org).
The RCSB PDB is an international database that contains archived information about the 3D shapes of
proteins, nucleic acids, and complex assemblies, helping students and researchers understand all aspects of
microbiology. The RCSB Protein Databank has also created tools and resources for research and education in
molecular biology, structural biology, computational biology, and beyond.
The RCSB Protein Databank is the primary source for large protein structure files and will be discussed in more
detail below.
MDL Molfile (.mol) Files
The MDL Molfile (.mol) file format was originally created by MDL Information Systems and is one of the
formats covered by the Chemical MIME Project led by Henry Rzepa. It is similar to .pdb files in that it contains
the 3-dimensional locations of atoms in a molecular structure. However, unlike .pdb files, .mol files are often
used for smaller structures such as ligands, drugs and sugars.
There are a large number of .mol file sources including ChemSpider, DrugBank and the NIH Cactus Server.
Many chemical drawing programs such as ChemDraw and ChemDoodle export .mol files for viewing created
structures in 3-dimensional visualization programs.
Inside a Structure File
Once a structure has been determined, each atom in the structure is assigned an (X, Y, Z) coordinate to mark
its location in 3-dimensional space. Additional information complements these basic coordinates, including the
type of atom at each location and the chain and residue the atom is part of. Some structure files contain
additional information such as resolution data, temperature factors, electrostatic potential data and more.
The image to the right shows a short bit of code from inside of a structure file.
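As a rough illustration of how these fields are laid out, the Python sketch below reads a single coordinate record from a .pdb file. It assumes the standard fixed-column layout of ATOM and HETATM records. The sample line is made up for illustration and is not taken from a real entry, and the names Atom and parse_atom_line are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int         # atom serial number
    name: str           # atom name, e.g. "CA"
    res_name: str       # residue name, e.g. "ASP"
    chain_id: str       # chain identifier, e.g. "A"
    res_seq: int        # residue sequence number
    x: float            # coordinates in angstroms
    y: float
    z: float
    temp_factor: float  # temperature factor (B-factor)
    element: str        # element symbol


# Illustrative record only; the values are invented for this example.
SAMPLE_LINE = (
    "ATOM      1  N   ASP A   3      -4.522  18.306  17.409  1.00 52.38           N"
)


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record using column slices."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


atom = parse_atom_line(SAMPLE_LINE)
print(atom.res_name, atom.chain_id, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters here because neighboring fields in a .pdb record are allowed to touch with no space between them.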
For more information on structure files and how they are determined, visit these RCSB Protein Databank
resources:

Understanding PDB Data
http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/intro.html

Methods for Determining Atomic Structures
http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/methods.html
The RCSB Protein Databank
The RCSB Protein Databank (http://www.pdb.org) is the largest worldwide repository for the processing and
distribution of .pdb file structure data of large molecules of proteins and nucleic acids.
There are now well over 100,000 structure files available on the www.pdb.org website!
Finding Structures on the Protein Databank
Each structure hosted on the Protein Databank has a unique four character long alpha-numeric identifier,
referred to as the structure's PDB ID.
Often more than one .pdb file will exist for a specific type of protein. For example, there are hundreds of .pdb
file entries for the relatively common protein hemoglobin. It is often a good idea to use the specific
information listed below to help determine whether you have found the best possible file.

Who are the authors of the PDB file?

In which journal was the primary citation published?

On what date was the file deposited into the PDB?

How many chains are in this file?

Are there any hetero groups (non-protein components such as ligands or sugars) within this PDB file? If so, which ones?

From what source was this molecule isolated?
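Two of the questions above (how many chains, and which hetero groups) can also be checked directly against a downloaded file. The Python sketch below is one possible way to do that, assuming the standard .pdb column layout in which the residue name occupies columns 18-20 and the chain identifier column 22. The four sample records are invented for illustration, and summarize_chains_and_hets is a hypothetical helper name.

```python
# Illustrative records only; they do not come from a real PDB entry.
SAMPLE_LINES = [
    "ATOM      1  CA  VAL A   1      10.000  11.000  12.000  1.00 20.00           C",
    "ATOM      2  CA  VAL B   1      20.000  21.000  22.000  1.00 20.00           C",
    "HETATM    3  C1  NAG A 201      30.000  31.000  32.000  1.00 30.00           C",
    "HETATM    4  O   HOH A 301      40.000  41.000  42.000  1.00 35.00           O",
]


def summarize_chains_and_hets(pdb_lines):
    """Return (sorted chain IDs, sorted hetero group names) found in the records."""
    chains = set()
    het_groups = set()
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "ATOM":
            chains.add(line[21])                 # chain identifier, column 22
        elif record == "HETATM":
            het_groups.add(line[17:20].strip())  # residue name, columns 18-20
    return sorted(chains), sorted(het_groups)


chains, het_groups = summarize_chains_and_hets(SAMPLE_LINES)
print("Chains:", chains)
print("Hetero groups:", het_groups)
```

Note that water molecules (HOH) are stored as HETATM records too, so they show up in the hetero group list alongside ligands and sugars.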
The Structure Summary Page
When you click on a specific PDB ID, you will initially see the Structure Summary page for the structure. This
page includes a variety of useful information about the structure.

Structure Preview Image - Provides a quick overview of what the molecule or protein looks like.

Structure ID Number - This 4 letter/number ID is a unique identifier that is assigned to the crystal data
file upon deposition into the database.

Source of the Molecule - The species from which the molecule was isolated, such as human, bacterium,
virus, mouse, etc.

Title - Title of the .pdb file

Authors - These are the researchers who were involved with the crystallization of the molecule. The
senior author or principal investigator is usually the last author in science publications.

Primary Citation - The journal article that accompanies the .pdb file. This is usually an excellent
research resource for understanding the function of the molecule.

Molecular Description – The abstract associated with the primary citation.

Chemical Component - This will tell you the number of chains within the molecule and the chain
identity. For example, in the hemoglobin file 1a3n.pdb, the chains A and C are the alpha-globin
molecules and chains B and D are the beta-globin molecules.
This section also tells you if there are any hetero groups that were crystallized with the
molecule. Not all .pdb files will have this section.
o The 2-3 letter identifiers used to designate the chemical components listed in the file
are recognized by Jmol and can be used to select these molecules with the Jmol Console.
o For example, if this section stated that there was NAG (N-acetyl-glucosamine) contained within
the molecule, Jmol would recognize “NAG” and you could therefore “select NAG” to select
the atoms within that chemical component of the PDB file.

Method of Structure Determination - The method that was used to obtain the structural data (NMR,
X-ray diffraction).

Resolution - A measure of the level of detail in the data, reported in ångströms; the smaller the number, the better the data.
The View in 3D Window
The View in 3D Window will also let you preview the structure using a web-embedded online Jmol. To view this
preview, simply click the "View in 3D: JSmol" button that is located directly below the molecule image on each
Structure Summary Page.
The Sequence Page
Just above the .pdb file Title should be a series of tabs, the fourth of which is the Sequence tab. This section of
the .pdb file page provides specific sequence information as well as secondary structure information about the
molecule. You can identify the alpha helices or beta sheets as well as the amino/carboxyl termini, which are
the first and last amino acids of the protein.
The Two Ways to Obtain a .pdb Structure
One of the key features of the Protein Data Bank is the ability to search the database for files. You can search
for a unique structure if you know its PDB ID, or by using key words and authors. To submit a search query,
enter these terms in the search box located near the top center of every www.pdb.org page. After you have
entered the search terms in the field, hit enter or click on the "Go" button to the right of the search field.
There are two ways to obtain a .pdb file:
1. Download the File from the RCSB Protein Databank website.
a. Go to the website http://www.pdb.org
b. In the top right corner of the website is a search bar similar to the image below. Type in the four-character
PDB ID (in this case, "1qys") and click the "Search" button.
c. This should bring you to the page for "1qys.pdb – Top 7". Just below the search box on the right should
be a list of four options. Click "Download Files" and you will see an expanded menu similar to the
image shown below.
d. Click "PDB Format" to begin the download of the .pdb file containing the coordinates for Top 7. This
file, named "1qys.pdb", can be saved to the location of your choosing on your computer.
Note that it is a good idea to create a new folder for each molecule you work on to organize all of your
.pdb files, images, and other related work.
2. Dynamically Load the File from the RCSB Protein Databank Server.
As long as you have an Internet connection, Jmol allows you to dynamically connect to the RCSB Protein
Databank and load a structure without downloading it permanently to your computer. You will, however,
need to know the four character alpha-numeric PDB ID for the structure you are looking for.
To load the structure file 1qys.pdb:
load =1qys
Note that you do not need to add the file extension (.pdb) when entering this command; just the four
character alpha-numeric PDB ID is needed. You do, however, need to include the equal sign "=" with no spaces
between it and the name of the .pdb file. This equal sign tells Jmol that you want to access the RCSB Protein
Databank servers to find the structure, rather than finding a file locally on your computer.
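Outside of Jmol, the same file can be requested by PDB ID from a script. The Python sketch below only builds the download address and does not contact the server. It assumes the files.rcsb.org download URL pattern, which is not given in this guide and may change, and pdb_download_url is a hypothetical helper name.

```python
import re

# Per this guide, a PDB ID is four alpha-numeric characters.
PDB_ID_PATTERN = re.compile(r"[0-9A-Za-z]{4}")


def pdb_download_url(pdb_id: str) -> str:
    """Build a download URL for a PDB ID (assumed files.rcsb.org pattern)."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


print(pdb_download_url("1qys"))
```

As with the Jmol command, only the four-character ID is passed in; the sketch rejects input such as "1qys.pdb" rather than guessing what was meant.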
Additional Resources from the RCSB Protein Databank
The RCSB Protein Databank has several regularly updated features as well as some interesting interviews and
newsletters that may be useful for any Jmol designer.

The Molecule of the Month by David S. Goodsell provides an introduction to the structure and
function of a molecule, a discussion of its relevance to health and disease, interactive views, discussion
topics, and links to related entries. This monthly feature has been around for a while, so the collection
of proteins covered is quite extensive! This is also an excellent source for good .pdb file suggestions.
http://www.rcsb.org/pdb/motm.do

The PDB Newsletter is a quarterly publication that highlights new features and programs supported by
the RCSB Protein Databank.
http://www.rcsb.org/pdb/static.do?p=general_information/news_publications/newsletters/newsletter.html

PDB-101 is an excellent source for various educational resources produced by the RCSB Protein
Databank, including animations, videos, posters, and other useful teaching tools.
http://www.rcsb.org/pdb/101/structural_view_of_biology.do
The NIH Cactus Databank
The NIH (National Institutes of Health) Cactus (CADD Group Chemoinformatics Tools and User Services)
Database is a public website with several powerful chemoinformatics tools that can provide structures, data,
and tools to help explore molecular structures. Most of the tools on the NIH Cactus Database focus on small
molecules and use the (.mol) file format.

You can access the NIH CACTUS home page at
http://cactus.nci.nih.gov/index.html

You can search for MDL Molfile (.mol) structures at
http://cactus.nci.nih.gov/ncidb2.2/

You can draw custom chemical structures and export them as MDL Molfile (.mol) structures at
http://cactus.nci.nih.gov/cgi-bin/lookup/search
Dynamically Connecting to the NIH Cactus Server
Like .pdb files, small molecule structures from the NIH Cactus Server can be loaded into Jmol dynamically
without downloading them permanently to your computer. As long as you have an Internet connection, you can
load a specific small molecule directly from Jmol.
To load the small molecule aspirin:
load $aspirin
Note that you need to include the dollar sign "$" with no spaces between it and the name of the small
molecule. This dollar sign tells Jmol that you want to access the NIH Cactus servers to find the structure, rather
than finding a file locally on your computer.
SMILES Sequences
While almost every molecular structure you can think of will be identifiable by name when loading a structure
dynamically from the NIH Cactus database, you may occasionally come across a structure that the database
does not know. For these situations, we suggest you try to find a SMILES (Simplified Molecular Input Line
Entry System) sequence.
SMILES Sequences are a line notation for molecules that includes the connectivity between the specific atoms in
a structure but does not include 2D or 3D coordinates. Atoms are represented by their element symbols (C, N, O,
P, Cl, Br, etc.). The equals sign "=" represents double bonds and the pound sign "#" represents triple bonds.
Branching is indicated by parentheses "()" and ring closures are indicated by matching pairs of digits. A few
examples are shown below.

Aspirin - O=C(Oc1ccccc1C(=O)O)C

Glucose - OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O

Dopamine - c1cc(c(cc1CCN)O)O
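To make the notation rules concrete, the Python sketch below scans a SMILES sequence and counts the explicit double bonds, triple bonds, branches, and ring closures it contains. It is a simplified illustration and not a full SMILES parser: it assumes single-digit ring-closure labels, skips everything inside square brackets, and does not count the implied double bonds of aromatic (lowercase) atoms. The function name summarize_smiles is hypothetical.

```python
def summarize_smiles(smiles: str) -> dict:
    """Count explicit bond symbols, branches, and ring closures in a SMILES string."""
    depth = 0            # current nesting depth of "(" branches
    in_bracket = False   # True while inside a [...] atom such as [C@@H]
    open_rings = set()   # ring-closure digits seen once and not yet closed
    counts = {"double_bonds": 0, "triple_bonds": 0, "branches": 0, "rings": 0}

    for ch in smiles:
        if in_bracket:
            if ch == "]":
                in_bracket = False
            continue  # digits and symbols inside brackets are not bonds or rings
        if ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
            counts["branches"] += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without a matching opening one")
        elif ch == "=":
            counts["double_bonds"] += 1
        elif ch == "#":
            counts["triple_bonds"] += 1
        elif ch.isdigit():
            if ch in open_rings:      # second occurrence closes the ring
                open_rings.remove(ch)
                counts["rings"] += 1
            else:                     # first occurrence opens it
                open_rings.add(ch)

    if depth != 0 or in_bracket or open_rings:
        raise ValueError("unbalanced parentheses, brackets, or ring-closure digits")
    return counts


print(summarize_smiles("O=C(Oc1ccccc1C(=O)O)C"))  # aspirin
```

For the aspirin sequence, this reports two explicit double bonds, no triple bonds, two branches, and one ring, matching the two carbonyl groups and the single benzene ring in the molecule.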
Jmol can use a SMILES sequence and connect to the NIH Cactus database to turn it into a 3-dimensional
structure.
To load the SMILES sequence for glucose:
load $OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O
Note that, as when loading a small molecule by name, you need to include the dollar sign "$" with no spaces
between it and the SMILES sequence. This dollar sign tells Jmol that you want to access the NIH Cactus servers to convert
the structure from a SMILES sequence to a 3-dimensional structure.
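Both kinds of lookup described above, by name and by SMILES sequence, can also be written as a web address for use outside of Jmol. The Python sketch below builds such an address without contacting the server. The URL pattern is an assumption based on the public Cactus structure lookup service and is not given in this guide, and cactus_structure_url is a hypothetical helper name. The main point is that SMILES characters such as "#", "[" and "@" must be percent-encoded before they are placed in a URL.

```python
from urllib.parse import quote

# Assumed base address of the Cactus structure lookup service.
CACTUS_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cactus_structure_url(identifier: str) -> str:
    """Build a 3D structure-file URL for a chemical name or SMILES sequence."""
    # safe="" also encodes "/" so the identifier stays a single path segment.
    encoded = quote(identifier, safe="")
    return f"{CACTUS_BASE}/{encoded}/file?format=sdf&get3d=true"


print(cactus_structure_url("aspirin"))
print(cactus_structure_url("OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O"))
```

Without encoding, a "#" in a SMILES sequence would be read as the start of a URL fragment and everything after it would be dropped from the request.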
SMILES Sequences can be found from a variety of online drug and small molecule databases, including the
following websites.

Wikipedia includes a SMILES sequence along the right hand column for almost all small
molecule entries.
https://www.wikipedia.org/

DrugBank has a huge variety of resources for drugs of all kinds, including SMILES sequences for each
entry.
http://www.drugbank.ca/

ChemSpider is a free chemical structure database providing fast text and structure search access to
over 34 million structures from hundreds of data sources.
http://www.chemspider.com/