Bioinformatics Exercises
Over the last two decades, information has been gaining increasing importance in both
teaching and learning biochemistry. The most obvious case is the sequencing of the
human genome and many other complete genomes. In 1990, the determination of the
sequence of a protein was often the topic of a full publication in a peer-reviewed journal
such as Science, Nature, or The Journal of Biological Chemistry. Now entire genomes
are the topic of individual research papers. The term "bioinformatics" is a catch-all
phrase which generally refers to the use of computers and computer science approaches to
the study of biological systems. The main chapters where this information is discussed
in the text are chapters 3 (Nucleotides, Nucleic Acids and Genetic Information), 5
(Proteins: Primary Structure), 6 (Proteins: Three-Dimensional Structure), 12 (Enzyme
Kinetics, Inhibition and Regulation) and 13 (Introduction to Metabolism). Here we
provide exercises appropriate to these chapters aimed at introducing the techniques of
bioinformatics that involve the use of computers, Internet-accessible databases and the
tools that have been developed to “mine” those databases.
General principles
1. Open ended questions. The exercises may include some questions that have definite
answers, but in many cases there will also be questions which may be answered in a
number of ways, depending on the approach you take or the topic you select.
2. Stable Internet Resources. As much as possible, the exercises will be based on well
established, stable web sites. If it is necessary to use less reliable sites and/or
resources, attempts have been made to provide multiple sites that perform similar
functions.
3. Here are the stable online resources that will be used most frequently:
a. Genbank (http://www.ncbi.nlm.nih.gov/)
b. Protein Data Bank (http://www.rcsb.org)
c. Expasy Proteomics Server (http://us.expasy.org/)
d. European Bioinformatics Institute (http://www.ebi.ac.uk/)
e. Pfam (http://www.sanger.ac.uk/Software/Pfam/)
f. SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
g. CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
h. PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi)
i. PubMed Central (http://www.pubmedcentral.nih.gov/)
4. Answer key. Where a definite answer is known, it will be provided in an answer key.
For more open-ended questions, a typical correct answer will be presented.
5. Historical perspective. If historical resources are available online (including
PubMed), there may be questions designed to help students identify some of the
historical roots of biochemistry and molecular biology.
Project 3: Visualizing Three-Dimensional Protein Structures Using the Molecular
Visualization Programs Jmol and PyMOL
There are a number of good free visualization tools available on the Internet. Each has
strengths and weaknesses. You will have two options: Jmol and PyMOL. Jmol is written in
Java, has a nice user interface and uses a command set that will be quite familiar to
users of RasMol or Chime. PyMOL is written in Python; the standard PyMOL user
interface can be quite challenging to use, but plug-ins are available to increase ease of use. Many users consider PyMOL’s graphics capabilities compelling and worth the
challenge of using the program interface.
Project 3 Exercises with Jmol
Jmol can be used in two different formats – as an applet built into web pages, or as a
standalone application. We will be using it as a standalone application. Jmol is a Java-based
application and therefore requires that you have a Java Virtual Machine (JVM)
installed on your computer. It is frequently installed on computers before purchase, but
you can also find it at http://www.java.com/en/download/index.jsp.
Downloading and installing Jmol. The Jmol wiki
(http://wiki.jmol.org/index.php/Main_Page) has a terrific instruction page about the
Jmol application (http://wiki.jmol.org/index.php/Jmol_Application).
Windows. These steps should work on most computers. If you have difficulty, please go
to the Jmol wiki and search for more instructions there.
1. Download the latest stable release (not a pre-release) from
http://sourceforge.net/projects/jmol/files/ in a zip format. Zip is a compressed
file format that can be opened by the operating system in Windows XP or Windows
Vista.
2. Create a folder for Jmol. Suggestion: c:\Program Files\Jmol.
3. View the compressed zip file from Windows Explorer. Extract only the jmol.jar file
to the c:\Program Files\Jmol folder.
4. With the c:\Program Files\Jmol folder open, right click on the icon for Jmol.jar and
select “Create shortcut”. Drag the shortcut to your desktop or your taskbar. You
will now have access to Jmol from your desktop.
Macintosh. These instructions are taken directly from the Jmol wiki
(http://wiki.jmol.org/index.php/Jmol_Application#Installing_Jmol_Application).
1. Download the Jmol package (either .zip or tar.gz format) and extract/uncompress
only the Jmol.jar file to the folder of your choice.
2. Simply double click on the Jmol.jar file to open Jmol.
As you go through the exercises below, you are encouraged to return to the Jmol wiki
(http://wiki.jmol.org/index.php/Main_Page) for instructions and links to useful
information about using Jmol.
1. Obtaining Structural Information. Please review the materials in your textbook
about secondary structure of proteins. Secondary structures include alpha helices,
beta sheets and beta turns in proteins. Many programs have been written that will
predict secondary structures that will be found in a protein, based only on the
primary sequence. Let's start again with rabbit muscle triose phosphate
isomerase. Here is the primary sequence:
>gi|136066|sp|P00939|TPIS_RABIT Triosephosphate isomerase (TIM)
(Triose-phosphate isomerase)
APSRKFFVGGNWKMNGRKKNLGELITTLNAAKVPADTEVVCAPPTAYIDFARQKLDPKIAVAAQNCYKV
TNGAFTGEISPGMIKDCGATWVVLGHSERRHVFGESDELIGQKVAHALSEGLGVIACIGEKLDEREAGI
TEKVVFEQTKVIADNVKDWSKVVLAYEPVWAIGTGKTATPQQAQEVHEKLRGWLKSNVSDAVAQSTRII
YGGSVTGATCKELASQPDVDGFLVGGASLKPEFVDIINAKQ
a. There are a number of web servers that will predict secondary structure
based on the primary sequence of a protein. Here is a list (in case one or
more is not working on a given day). If all fail because their web addresses
have changed, a Google search for “protein secondary structure prediction”
should be successful.
i. PredictProtein (http://www.predictprotein.org/). To start, you will
need to create an account on this site. You can actually request this
site to predict secondary structure from 7 different web servers
online. If this site is available, it will enable you to complete this
assignment by clicking on 2 or more of the optional services. Please
note that results may take one or two days.
ii. JPred (http://www.compbio.dundee.ac.uk/www-jpred/). Click on the
advanced link to the right of the sequence box. If you use the JPred
server, be certain to check the box labeled “Skip searching PDB before
prediction”.
Submit the rabbit muscle triose phosphate isomerase sequence to
these two servers. Compare the results you receive from the different
servers. Can you identify segments where the predictions are not
consistent between servers?
b. The structure of rabbit muscle triose phosphate isomerase has been
determined by X-ray crystallography. Please go to the Protein Data Bank
web server (http://www.rcsb.org/pdb/home/home.do) and search for 1R2R
(that is the PDB ID for this protein). To do so, go to the blue band at the
top of the page and select “PDB ID or Text”, enter 1R2R in the box, and click
on “Search”. The page that comes up contains several tabs: Summary,
Sequence, Derived Data, Seq. Similarity, 3D Similarity, Literature, Biol. &
Chem., Methods, Geometry, and Links. The page normally opens to the
Summary tab. Click on the Sequence tab. The results shown here for the
secondary structure are from an analysis of the actual 3D structure (not a
prediction), which has been calculated according to an implementation of the
method of Kabsch and Sander (1983) Biopolymers 22, 2577-2637. The
assignments are: H=helix; B=residue in isolated beta bridge; E=extended
beta strand; G=3-10 helix; I=pi helix; T=hydrogen bonded turn; S=bend.
Compare your predicted results with the results presented on the PDB site.
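The comparison in parts a and b can also be done with a short script once both results are written as one code per residue. The following is a minimal sketch, not part of the original exercise: it assumes a three-state reduction in which H, G and I count as helix, E and B count as strand, and everything else counts as coil, and the two example strings are made up rather than real output for 1R2R.

```python
# Three-state reduction of the PDB/DSSP codes listed above (H, B, E, G, I, T, S).
# Which codes count as helix or strand is a convention assumed for this sketch.
TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(codes):
    """Map per-residue codes to H (helix), E (strand) or C (everything else)."""
    return "".join(TO_THREE_STATE.get(c.upper(), "C") for c in codes)


def compare_assignments(first, second):
    """Return (fraction agreeing, [(start, end), ...]) for two equal-length strings.

    Segment positions are 1-based and inclusive.
    """
    if len(first) != len(second) or not first:
        raise ValueError("both strings must be non-empty and the same length")
    a, b = to_three_state(first), to_three_state(second)
    segments, start = [], None
    for pos, (x, y) in enumerate(zip(a, b), start=1):
        if x != y and start is None:
            start = pos                        # a mismatch segment opens here
        elif x == y and start is not None:
            segments.append((start, pos - 1))  # segment closed at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(a)))       # mismatch ran to the last residue
    agreement = sum(x == y for x, y in zip(a, b)) / len(a)
    return agreement, segments


if __name__ == "__main__":
    # Made-up 16-residue strings, not real output for 1R2R.
    predicted = "CCHHHHHHCCEEEECC"
    observed = "SSHHHHGGTTEEEEBS"
    print(compare_assignments(predicted, observed))
```

The same function works for comparing two prediction servers with each other; the segments it returns are the stretches worth inspecting by eye.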
c. As a first attempt at molecular visualization, please return to the Summary
tab and follow the links on the PDB site for "Download File." You can
download the file in a number of formats. It is best to download the file in
“PDB file (text)” format for use with Jmol. Save the structure file as
1R2R.pdb on your computer (suggested folder: My Documents/PDB Files).
Open the Jmol program. Then use the drop down menu: File..open to open
1R2R.pdb. You will initially see a cartoon model which represents helices as
magenta corkscrews, sheets as yellow arrows and waters as small red
spheres. To rotate the image, hold down the (left) mouse button while
dragging the mouse over the image. You can control the view in Jmol in
three different ways: dropdown menus, right-click menus and scripting.
Perform the following steps to clean up the image a bit using the dropdown
and right-click menus (for a one-button mouse, use Ctrl-click).
i. Dropdown: Display..Select..Water
ii. Dropdown: Display..Atom..None
iii. Dropdown: Display..Select..Hetero
Now you should be able to see the alpha helix and beta sheet
structures in rabbit muscle triose phosphate isomerase without the red
water spheres. Take some time to experiment with the other dropdown menu options on Jmol.
In addition to dropdown and right-click menus, Jmol also has a Script
Console window that enables you to select specific atoms or parts of a
structure (amino acid residues for example), then change the way they
appear. To open the Jmol Script Console, select File..Console.. from
the Jmol Dropdown menu. Then enter these commands at the $
prompt.
iv. select hetero and not water (selects non-protein parts of the structure
excluding water)
v. spacefill (a van der Waals radius representation)
vi. color CPK (standard chemistry color scheme)
vii. select protein
viii. cartoon off
ix. wireframe 30
x. spacefill 100 (These combined commands yield a ball-and-stick
structure of the protein.)
xi. zoom 200 (This gives a 2X expansion of the view. You can also zoom
in on the structure in the viewing window by holding down the Shift
key on your keyboard while using a left-mouse click-and-drag from the
top to the bottom of the window. Experiment with this.)
xii. Now convert the protein back to a cartoon with the following four
commands
1. select protein
2. wireframe off
3. spacefill off
4. cartoon
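If you expect to repeat steps iv to xii, the commands can be saved in a plain text file and replayed instead of being retyped. The sketch below is one way to do that and is not part of the original exercise: the file names are hypothetical, and it assumes the saved file is then run from the Script Console with Jmol's script command (for example, script ball_and_stick.spt).

```python
from pathlib import Path

# Commands copied from steps iv-xi above, in the order they are entered.
BALL_AND_STICK = [
    "select hetero and not water",
    "spacefill",
    "color CPK",
    "select protein",
    "cartoon off",
    "wireframe 30",
    "spacefill 100",
    "zoom 200",
]

# The four commands from step xii that restore the cartoon view.
BACK_TO_CARTOON = ["select protein", "wireframe off", "spacefill off", "cartoon"]


def write_jmol_script(path, commands):
    """Write one Jmol command per line and return the path that was written."""
    path = Path(path)
    path.write_text("\n".join(commands) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # File names are hypothetical; any location Jmol can read will do.
    print(write_jmol_script("ball_and_stick.spt", BALL_AND_STICK))
    print(write_jmol_script("back_to_cartoon.spt", BACK_TO_CARTOON))
```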
2. Exploring the Protein Data Bank. In the first problem, we visited the Protein
Data Bank (PDB). We will explore that site in more detail now. If you encounter
difficulties at any point in this exercise, you may be able to find your way using
the Search box on the main site page or the Help files (on the left side of the
page).
The PDB is a repository of macromolecular structures. Perhaps the most
important skill for a PDB site user is the ability to find the structures they are
seeking. On the home page (http://www.rcsb.org/pdb/home/home.do), the Help
menu on the left side of the page includes Video Tutorials. These Flash animations
will instruct you on navigating the site, searching for proteins and using the tools
and viewers on the site.
Structures in the PDB are assigned PDB IDs - 4-character alphanumeric codes that
uniquely identify each structure. So for example 4HHB is a hemoglobin structure
and 8GCH is a chymotrypsin structure. If you know the PDB ID, then you can use
that to search the PDB. You may ask - why would I know that code unless I was
the crystallographer who determined that structure? Most scientists who
determine macromolecular structures are highly motivated to publish their findings
in journals such as Science, Nature, Journal of Biological Chemistry, Journal of
Molecular Biology and Protein Science. These journals have an agreement with
the PDB that requires authors to submit their structures to the PDB before they
will publish the article in their journal. Also, the figures in the text showing
structures of proteins and nucleic acids list the corresponding PDB ID. For our first
PDB search, we're going to find a PDB ID in a journal article, then find that
structure on the PDB site.
Go to the Journal of Biological Chemistry web site (http://www.jbc.org) and search
for this paper using the QUICK SEARCH menu near the top of the page:
Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram,
Padmanabhan Balaram, and Mathur R. N. Murthy.
Structure of Plasmodium falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å Resolution. J. Biol. Chem. 2003 volume 278,
pages 52461-52470.
Download the article (Full text or PDF – it’s free). Go to the footnotes section and
find the four character PDB ID code. Then go to the Protein Data Bank main page.
Type the PDB ID in the search box and click the Search button. You should be
taken to the Structure Summary page for this enzyme. The Structure Summary
page contains links to many related resources. Try to do each of the following:
a. Download the PDB (structure) file for this protein to your computer.
Remember where you put it (suggested folder: My Documents/PDB Files;
suggested name: 1o5x.pdb). In problem 3, you're going to study this
structure using Jmol.
b. Download the protein sequence in FASTA format – click on Download Files
on the right hand side of the page and select FASTA sequence. Suggested
file name: 1o5x_FASTA.txt.
c. Find the still images of this protein on the 1o5x Summary page in the
Biological Assembly box. Click on the link to More Images…. To save an
image on the page that appears, just right click on it (Ctrl-click for a one-button mouse) and select the option that lets you save the file (In Internet
Explorer, the command is "Save Picture As.."; in Firefox and Safari, the
command is "Save Image As..").
d. Return to the "Summary" page for 1o5x. Click on the "Links" tab. Follow the
links for 1o5x to the sites at PDBSum and the IMB Jena Image Library.
Collect still images from each of these sites. Make sure you keep a record of
where you found each image.
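Once you know a PDB ID, step a can also be done from a script. The sketch below is an illustration rather than part of the exercise: it checks that an ID is four alphanumeric characters, as described above, and builds a download address. The URL template is an assumption about the RCSB file download service and should be checked against the PDB site if it fails.

```python
import re
import urllib.request

# Four alphanumeric characters, as described in the text above.
PDB_ID = re.compile(r"[0-9A-Za-z]{4}")

# Assumed URL template for the RCSB file download service.
DOWNLOAD_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id):
    """Validate a PDB ID and return the (assumed) download URL for its PDB file."""
    pdb_id = pdb_id.strip()
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character alphanumeric PDB ID: {pdb_id!r}")
    return DOWNLOAD_TEMPLATE.format(pdb_id=pdb_id.upper())


def download_pdb(pdb_id, dest):
    """Fetch the PDB file over the network and save it to dest."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), dest)
    return dest


if __name__ == "__main__":
    print(pdb_download_url("1o5x"))
    # download_pdb("1o5x", "1o5x.pdb")  # uncomment to fetch (needs network access)
```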
3. Examining Protein Structures. In Problem 2, you should have saved the PDB
file for 1o5x, entitled "Plasmodium Falciparum TIM Complexed To 2-Phosphoglycerate." We're going to use Jmol to explore this structure. We'll be
particularly interested in identifying secondary structures and looking at the active
site.
a. You're going to expand on the Question 1c exercise. Open Jmol on your
computer. If you have not installed it already, please see the opening
paragraph for the exercises in this chapter.
b. Open the file 1o5x.pdb. When you first open it, you will see a cartoon
representation of the structure with the waters shown as small red spheres.
Now it's time to explore the drop-down menus in Jmol. There are 7 dropdown menus in Jmol: File, Edit, Display, View, Tools, Macro, and Help.
Spend a few minutes trying each command in each of the menus. Here are
a few that are very helpful:
i. File..Export..Export Image enables you to export an image you have
created as an image in jpg format.
ii. Edit..Copy Image copies the image to memory. You can then paste the
same image into a word processor or presentation file.
iii. Display..Zoom allows you to enlarge or shrink your structure.
iv. Display..Axes brings the x, y and z axes into Jmol
v. View..Front brings the structure around to its original orientation. The
remaining options show the structure from different angles.
vi. HELP..User Guide. The User Guide includes instructions on a number of
features of Jmol. Explore the section called “Rasmol/Chime
commands” that contains many commands you can use in the Script
window of Jmol.
c. When you open Jmol, a structure-viewing window appears on your
computer. To move beyond the drop-down menus, open the Script Console
again (File..Console). You can use this window to send very specific
commands to the structure viewing window. We'll use this to select the
small molecules that are bound to triose phosphate isomerase in this
structure: 3-hydroxypyruvic acid and 2-phosphoglyceric acid, but first you'll
need to learn a little bit about viewing a PDB file.
Go to the Structure Explorer page for 1o5x
(http://www.rcsb.org/pdb/home/home.do/explore/explore.do?structureId=1O5X).
Click on Display File..PDB File on the upper right side of the page.
This will bring up the PDB file, which contains a lot of information. We'll only
look at a few items.
Each line in a PDB file is called a "record" and the first 6 characters on that
line tell what kind of "record" it is. In your browser, search for SEQRES.
According to the PDB Format Description, "SEQRES records contain the
amino acid or nucleic acid sequence of residues in each chain of the
macromolecule that was studied." So you can see the sequence of your
protein there. For 1o5x, the SEQRES section looks like this:
SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 A  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 A  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 A  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 A  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 A  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 A  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 A  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 A  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 A  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 A  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 A  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 A  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 A  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 A  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 A  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 A  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 A  248  MET
SEQRES   1 B  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 B  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 B  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 B  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 B  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 B  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 B  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 B  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 B  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 B  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 B  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 B  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 B  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 B  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 B  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 B  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 B  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 B  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 B  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 B  248  MET
Each line contains 13 amino acid residues, using the 3-letter abbreviations
for the amino acids. So residue #27 in chain A is PHE (phenylalanine). The
12th character in each record is a chain identifier. If a protein contains more
than one polypeptide chain, the chains will be identified with a letter (in this
case, there are two chains - A and B).
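The SEQRES layout just described is regular enough to read with a few lines of code. The sketch below is an illustration, not part of the exercise: it takes the chain identifier from the 12th character of each record and assumes the residue names begin at column 20, which is where they sit in the fixed-column PDB format. The three sample records are the first three SEQRES lines for chain A of 1o5x.

```python
from collections import defaultdict


def read_seqres(lines):
    """Return {chain ID: [three-letter residue names]} from SEQRES records."""
    chains = defaultdict(list)
    for line in lines:
        if line[:6] == "SEQRES":                # first 6 characters name the record
            chain_id = line[11]                 # 12th character of the record
            chains[chain_id].extend(line[19:70].split())  # names start at column 20
    return dict(chains)


def residue_at(chains, chain_id, number):
    """Look up a residue by its 1-based position in the SEQRES sequence."""
    return chains[chain_id][number - 1]


if __name__ == "__main__":
    sample = [
        "SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS",
        "SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER",
        "SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL",
    ]
    chains = read_seqres(sample)
    print(len(chains["A"]), residue_at(chains, "A", 27))  # prints "39 PHE"
```

To run it on the whole file, pass the open file object instead of the sample list, for example read_seqres(open("1o5x.pdb")). Note that the position counted here is the position in the SEQRES listing, which need not match the residue numbers used in the coordinate records of every structure.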
Anything in a PDB file that is not either protein or nucleic acid is considered
a heterogeneous atom and is referred to with the prefix "het". So HETNAM
is the label for a record that contains the name of a non-protein, non-nucleic
acid group. Search the HTML version of your structure file for "HETNAM".
What are the hetero groups in this structure? Now, let’s begin in Jmol by
cleaning up the cartoon view of the protein in 1o5x:
i. Dropdown: Display..Select..Water
ii. Dropdown: Display..Atom..None
We shall also display the heterogeneous groups in 1o5x. Go to the
Jmol Script Console (File..Console) and enter the command:
iii. select hetero and not water
This command selects all the heterogeneous atoms excluding water.
To show the heterogeneous molecules, enter these two commands
consecutively:
iv. spacefill on
v. color CPK
This creates a spacefilling representation of the three heterogeneous
molecules and colors them according to the Corey-Pauling-Koltun scheme (CPK;
carbon is gray, oxygen is red, phosphorus is orange). Note that the
phosphite ion and the 3-hydroxypyruvate molecule are in contact.
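The HETNAM search in part c can be scripted in the same way as the SEQRES lookup. The sketch below is an illustration only: it assumes the hetero group ID sits in columns 12-14 and the name text in columns 16-70 of each HETNAM record, and that a long name continues on a following record with the same ID. The sample lines are written for this example (the XYZ entry is invented) and are not copied from the 1o5x file.

```python
def read_hetnam(lines):
    """Return {het ID: name} from HETNAM records, joining continuation lines."""
    names = {}
    for line in lines:
        if line[:6] == "HETNAM":
            het_id = line[11:14].strip()   # columns 12-14 (assumed layout)
            text = line[15:70].strip()     # columns 16-70 (assumed layout)
            if het_id in names:
                names[het_id] = names[het_id] + " " + text  # continuation record
            else:
                names[het_id] = text
    return names


if __name__ == "__main__":
    # Illustrative records only; XYZ is a made-up group with a two-line name.
    sample = [
        "HETNAM     3PY 3-HYDROXYPYRUVIC ACID",
        "HETNAM     XYZ A MADE-UP LIGAND NAME THAT RUNS ONTO",
        "HETNAM   2 XYZ A SECOND LINE",
    ]
    for het_id, name in read_hetnam(sample).items():
        print(het_id, "=", name)
```

Joining continuation text with a single space is a simplification chosen for readability; compare the result with the file itself if a name looks wrong.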
d. As the last part of this exercise, we're going to show some of the active site
residues, based on Figure 4 of the primary citation for this structure
(Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram,
Padmanabhan Balaram, and Mathur R. N. Murthy. Structure of Plasmodium
falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å Resolution. J. Biol. Chem., Vol. 278, Issue 52, 52461-52470, December
26, 2003). Figure 4a shows three residues that interact with the 2-phosphoglycerate: glutamate 165, lysine 12 and histidine 95. Use this
command to select these residues:
select lys12,his95,glu165
Then use the right-click menu in the structure viewer window to show them
in a ball-and-stick format
right-click..Style..Scheme..Ball and Stick
Finally change them to CPK coloring so you can distinguish the atoms on the
structure
right-click..Color..Atoms..By Scheme..Element (CPK)
To get a better look at this interaction, you can zoom in on the structure using
the Shift key together with the mouse. To zoom in, drag the mouse from top to bottom in
the structure window while holding down the Shift key. To move the image
side-to-side or up-and-down, hold down the Ctrl key, then right-click and drag
the image where you want it to go (for a one-button
mouse, hold down the Shift key, double-click in the Jmol window, keep the
mouse button depressed after the 2nd click, and drag the image to the
desired position). Using a combination of Shift-mouse and Ctrl-right-click,
you can get a closeup view of the binding site for 2-phosphoglycerate.
Advanced: You may have noticed that there are actually two representations
of glu-165 in one of the active sites in 1o5x in the closeup view. Why are
there two glu-165’s in this structure?
To complete this exercise, identify and display additional residues that
interact with the 2-phosphoglycerate in 1o5x. A couple of hints: Use Figure 3
and Figure 4 from the primary citation. Also, for some reason, Jmol won't
select 3-hydroxypyruvate using "select 3PY", but you can select it using
"select 3300", which is a second way 3PY is identified in 1o5x.
4. Protein Families. Now we will explore protein families. The goal of this exercise
is to identify a protein that shares structural homology with triose phosphate
isomerase (as seen in PDB ID 1o5x), but catalyzes a different reaction. We will
start with two resources - CATH and SCOP.
a. CATH. You can start at the CATH homepage (http://www.cathdb.info/).
Enter 1o5x in the Search box at the top of the page. The results will be
presented in a page with 6 tabs.
i. The first tab is the search tab. You can return here if you wish to
perform another search or to search using a FASTA sequence.
ii. The Results Summary tab provides links to each of the four levels of
the CATH hierarchy for this structure. These four levels of hierarchy
are also represented in the four remaining tabs.
iii. The Cathnodes tab gives you access to the classification lineage for
triose phosphate isomerase, along with a list of related structures. The
tabs on this page provide links to members of the same homologous
superfamily, along with alignments and structural neighbors. What is
the Cathnodes class for 1o5x?
iv. The CATH Domain tab describes structural neighbors of the query
protein. This is a good place to find enzymes from the same
superfamily that catalyze different reactions.
v. The CATH Chains tab simply lists the chains from the query structure
along with linked keywords.
vi. The CATH Pdbs tab lists the PDB ID, along with linked keywords.
Click on the Cathnodes class link for 1o5x. What is its Homologous
Superfamily and classification lineage?
Scroll down the same page that gave you the Homologous Superfamily
and Classification Lineage to find the list of Non-Redundant
Representatives. Explore the page to find 5 enzymes in this
superfamily that catalyze reactions different from the reaction
catalyzed by triose phosphate isomerase. Links to Representative
Domains will lead you to lists of candidate proteins. The first four
characters of the match are the PDB id for the structure (e.g.,
1wa3D00 points to PDB ID 1wa3). Compare the function of these
proteins using the links on the CATH site. Return to the Cathnodes
page (which contained the list of Non-Redundant Representatives) and
follow the links for Alignments.
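When you collect candidate proteins from the Representative Domains lists, it helps to split each domain name into its parts. The sketch below is an illustration and not part of the CATH site: the text above only states that the first four characters are the PDB ID, and the reading of the remaining characters as a chain letter followed by a two-digit domain number is an assumption about the CATH naming scheme.

```python
from typing import NamedTuple


class CathDomain(NamedTuple):
    pdb_id: str   # first four characters, e.g. "1wa3"
    chain: str    # fifth character (assumed to be the chain letter)
    domain: int   # last two characters (assumed to be the domain number)


def parse_cath_domain(name):
    """Split a 7-character CATH domain name such as '1wa3D00' into its parts."""
    name = name.strip()
    if len(name) != 7 or not name[5:].isdigit():
        raise ValueError(f"unexpected CATH domain name: {name!r}")
    return CathDomain(pdb_id=name[:4].lower(), chain=name[4], domain=int(name[5:]))


if __name__ == "__main__":
    print(parse_cath_domain("1wa3D00"))
```

Collecting the pdb_id values into a set gives a list of structures without duplicates, which is convenient for the comparison with SCOP in part b.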
b. SCOP. Go to the SCOP homepage (http://scop.mrc-lmb.cam.ac.uk/scop/).
Click the link for “Keyword search of SCOP entries”. Enter “triosephosphate”
as one word without the quote marks to search for triose phosphate
isomerase.
SCOP provides a Lineage for each protein that is classified. Follow the
lineage links at the Fold level to identify 5 proteins that are related to triose
phosphate isomerase. Are any of these proteins also in your list from your
search of the CATH database?
c. The final part of this exercise is to identify other resources that help you find
proteins related to triose phosphate isomerase from Plasmodium falciparum.
You are encouraged to follow other links from the PDB External Links page
for 1o5x, but you may also be able to find other resources by searching the
Internet using the PDB ID codes. List and summarize three other resources
that you find.
Project 3 Exercises with PyMOL
Introduction to PyMOL
PyMOL is a molecular modeling program that generates high-quality molecular images
and animations for presentations and publications. PyMOL's command line interface
allows users to input commands which directly alter the appearance of their molecule in
the viewing window. While PyMOL is highly capable on its own, it also supports a series
of plug-ins that enhance the user experience with PyMOL. Included in this list is
ConSCRIPT, a plug-in developed by students at the Rochester Institute of Technology,
that enables users who are familiar with RasMol or Jmol to use those commands to
generate images in PyMOL.
PyMOL can be downloaded in a “for educational use only” form from the PyMOL website
found at http://pymol.org/educational/. Before you can download PyMOL, you will need to
register with Schrödinger, who will then send you a confirming email and provide you
with a login and password that will allow you to download PyMOL. From that page, you
simply download and install PyMOL for Windows, Macintosh or Linux to the default
location on your system.
Note to Macintosh users: To use plug-ins for PyMOL on the Mac, it needs to operate in
hybrid X11 mode. To enable this, you simply change the name of the application from
MacPyMOL to PyMOLX11Hybrid.
Installing ConSCRIPT. The following documentation is taken directly from the ConSCRIPT
readme file.
1. For all operating systems: Download ConSCRIPT. ConSCRIPT can be downloaded
in a compressed file format from
http://sourceforge.net/projects/sbevsl/files in the ConSCRIPT folder.
It is important to note that clicking on the prominent green download arrow on
this page (or the main SBEVSL page on SourceForge.net) may lead you to another
file that is part of the SBEVSL project. You will need to open the ConSCRIPT folder
on http://sourceforge.net/projects/sbevsl/files by clicking on the green triangle to
the left of the ConSCRIPT folder. Similarly, you should open the latest ConSCRIPT
folder (currently ConSCRIPT-2.1) and download the most recent .tar.gz or .zip file.
You can choose to download either a .tar.gz compressed file (ConSCRIPT-2.1rc1.tar.gz as of August 3, 2010) or a zip compressed file (ConSCRIPT-2.1rc1.zip as of August 3, 2010). The way you expand the file depends on your
operating system.
2. Unix or Linux
a. Expand the compressed file. In unix or linux systems, or using MINGW under
Windows you may unpack the tarball with
gunzip < ConSCRIPT-2.1rc1.tar.gz | tar xvf -
b. Install ConSCRIPT. The simplest way to install ConSCRIPT is by using the
Plugin->Manage Plugins->Install... menu
to install CONSCRIPT.py
To use that menu item, you need the same administrative rights as were
needed to install PyMOL, itself. Under Unix or Linux, you may need to have
root access (e.g. via sudo).
Alternatively, you may copy ConSCRIPT.py into the correct place in the
directory tree used by PyMOL for plugins. For example, for the default
install of PyMOL 1.2 under Ubuntu Linux, the ConSCRIPT.py file should be
copied to
/usr/lib/pymodules/python2.6/pmg_tk/startup
If you have installed PyMOL to another location on your system, you will
need to find the portion of the pymol installation tree that contains
modules/pmg_tk/startup
Once you have completed the installation, launch PyMOL and look for
ConSCRIPT in the Plugin menu. When all files are in place, the ConScript
script interface can be run by choosing ConSCRIPT from the PyMOL plug-in
menu. Alternatively, prefixing RasMOL commands with one of
"R,r,VSL,vsl,RASMOL or rasmol" typed directly into the PyMOL command line
will convert the RasMOL command to PyMOL.
3. Windows
a. Expand the compressed file.
Windows systems may have a native application that will expand your
.tar.gz or .zip files; if not you may wish to consider using Stuffit Expander or
WinZip.
b. Install ConSCRIPT
The simplest way to install ConSCRIPT is by using the
Plugin->Manage Plugins->Install... menu
to install CONSCRIPT.py
To use that menu item, you need the same administrative rights as were
needed to install PyMOL, itself. It is probably simplest if you install as an
administrator or ask the administrator for your system to install it for you.
Alternatively, you may copy ConSCRIPT.py into the correct place in the
directory tree used by PyMOL for plugins. For example, for the default
install of PyMOL 1.3 under Windows, the ConSCRIPT.py file should be copied
to
c:\Program Files\PyMOL\PyMOL\modules\pmg_tk\startup
If you have installed PyMOL to another location on your system, you will
need to find the portion of the PyMOL installation tree that contains
modules/pmg_tk/startup
Once you have completed the installation, launch PyMOL and look for
ConSCRIPT in the Plugin menu. When all files are in place, the ConScript
script interface can be run by choosing ConSCRIPT from the PyMOL plug-in
menu. Alternatively, prefixing RasMOL commands with one of
"R,r,VSL,vsl,RASMOL or rasmol" typed directly into the PyMOL command line
will convert the RasMOL command to PyMOL.
4. Macintosh
a. Rename your PyMOL application
Under Mac OS X, you need to rename the MacPyMOL application to
MacPyMOLX11Hybrid in order to get access to plugins.
b. Expand the compressed file.
Macintosh systems may have a native application that will expand your
.tar.gz or .zip files; if not, you may wish to consider using StuffIt Expander.
c. Install ConSCRIPT.
The simplest way to install ConSCRIPT is by using the
Plugin->Manage Plugins->Install... menu
to install ConSCRIPT.py
Alternatively, you may copy ConSCRIPT.py into the correct place in the
directory tree used by PyMOL for plugins. This can be a bit tricky on the
Mac, so follow these instructions closely.
i. Open Finder->Applications.
ii. If you have a three-button mouse, right-click on MacPyMOLX11Hybrid
and select Show Package Contents. This will take you to the directory
tree for PyMOL. If you don’t have a three-button mouse, Ctrl-click
should bring up the option to Show Package Contents.
iii. Traverse the tree to pymol/modules/pmg_tk/startup.
iv. Copy ConSCRIPT.py to the startup folder.
Once you have completed the installation, launch PyMOL and look for
ConSCRIPT in the Plugin menu. When all files are in place, you can run the
ConSCRIPT script interface by choosing ConSCRIPT from that menu.
Alternatively, type a RasMol command directly into the PyMOL command line,
prefixed with one of R, r, VSL, vsl, RASMOL or rasmol, and ConSCRIPT will
convert it to the corresponding PyMOL command.
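If PyMOL is not installed in the default location, the manual-copy route on any platform comes down to finding the folder whose path ends in modules/pmg_tk/startup. The short Python sketch below walks a directory tree and reports every folder that matches. It is only an illustration: the function name find_startup_dirs is made up for this example and is not part of PyMOL or ConSCRIPT, and you would need to supply the folder where your own PyMOL installation lives.

```python
import os


def find_startup_dirs(root):
    """Return every directory under root whose path ends in modules/pmg_tk/startup."""
    wanted = ("modules", "pmg_tk", "startup")
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Compare only the last three path components with the wanted tail.
        parts = tuple(os.path.normpath(dirpath).split(os.sep))
        if parts[-3:] == wanted:
            hits.append(dirpath)
    return hits


# Example call (hypothetical path; replace it with your own install folder):
# print(find_startup_dirs("/usr/lib"))
```

Any folder the function reports is a candidate destination for ConSCRIPT.py; check that it belongs to the PyMOL installation you actually launch.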
Let’s start with a brief introduction to the PyMOL interface. When you open PyMOL, two
windows appear; one is labeled "PyMOL Tcl/Tk GUI" (or “The PyMOL Molecular Graphics
System”) and the other is labeled "PyMOL Viewer" (or “MacPyMOL”). The PyMOL Tcl/Tk
GUI is the external graphical user interface that serves as a command line to interact
with PyMOL as well as to provide you with output from the program. The PyMOL Viewer
contains the internal GUI and the viewing window. The internal GUI serves as a list of
objects within PyMOL while the viewing window is where your molecule will be displayed.
Here are a few simple PyMOL commands that will help you as you go through these
exercises.
1. select selection-name, selection-expression: The select command is very useful
when you want to modify a subset of atoms. The selection-name is the name you
want to assign to the selection so that you can refer to that selection in the future
when you want to make modifications. The selection-expression is a reference to
the residues in the molecule that you want to select. Each residue can be identified
with either a unique name or number that is found in the ATOM (for amino acids)
and HET (for hetero atom) records of the PDB file. You can also find the residue
name/number by clicking the residue in the PyMOL viewing window and looking at
the output in the command prompt of the PyMOL Tcl/Tk GUI window. The selection
expression uses the 'resi' selector to reference residues by number and the
'resn' selector to refer to residues by name. The two can be combined with
'and' to pinpoint a single residue. For example, let's say that a critical
residue in the active site of a molecule is Histidine 95. There are a few ways
to select this residue:
a. select activeSite, resi 95 and resn his
b. select activeSite, resi 95
c. We cannot simply type select activeSite, resn his because that will select all
the histidine residues in the structure.
2. show/hide representation, selection-name: The show and hide commands can be
used to alter the way the molecule appears in the viewing window. Users can
choose from cartoon, spheres, sticks, surface, and others to view their molecule.
So let's say we want to make a previously selected active site appear as spheres:
a. show spheres, activeSite
b. Alternatively the spheres can be hidden by typing:
hide spheres, activeSite
3. color color-name, selection: The color command can be used to alter the color of a
selection. Let's change the color of our active site to red.
a. color red, activeSite
4. Keep in mind that if you have ConSCRIPT installed with PyMOL, you can also use
RasMol or Jmol commands, which ConSCRIPT will then translate into commands
that are executable by PyMOL. To activate the ConSCRIPT translator, simply
type the RasMol command directly into the PyMOL command line, preceded by
one of R, r, VSL, vsl, RASMOL or rasmol.
For more information on selecting residues and PyMOL commands see the PyMOL
community wiki (http://pymolwiki.org/index.php/Main_Page).
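To see why "resi 95 and resn his" picks out one residue while "resn his" alone picks out every histidine, it can help to mimic the selection logic in plain Python. The sketch below is not how PyMOL implements selections; the residue table is made up for illustration (apart from His 95 from the example above) and the select function is a hypothetical stand-in for the PyMOL command.

```python
# Toy residue table: (residue name, residue number). The entries are
# invented for illustration, apart from His 95 from the example above.
residues = [
    ("LYS", 12),
    ("HIS", 26),
    ("HIS", 95),
    ("GLU", 165),
]


def select(residues, resi=None, resn=None):
    """Keep the residues that match every criterion that was supplied."""
    chosen = []
    for name, number in residues:
        if resi is not None and number != resi:
            continue
        if resn is not None and name.upper() != resn.upper():
            continue
        chosen.append((name, number))
    return chosen


print(select(residues, resi=95, resn="his"))  # [('HIS', 95)]
print(select(residues, resi=95))              # [('HIS', 95)]
print(select(residues, resn="his"))           # [('HIS', 26), ('HIS', 95)]
```

The last call returns two residues, which is the reason option c above is not a safe way to select a single active-site histidine.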
1. Obtaining Structural Information. Please review the materials in your textbook
about secondary structure of proteins. Secondary structures include alpha helices,
beta sheets and beta turns in proteins. Many programs have been written that will
predict secondary structures that will be found in a protein, based only on the
primary sequence. Let's start again with rabbit muscle triose phosphate
isomerase. Here is the primary sequence:
>gi|136066|sp|P00939|TPIS_RABIT Triosephosphate isomerase (TIM)
(Triose-phosphate isomerase)
APSRKFFVGGNWKMNGRKKNLGELITTLNAAKVPADTEVVCAPPTAYIDFARQKLDPKIAVAAQNCYKV
TNGAFTGEISPGMIKDCGATWVVLGHSERRHVFGESDELIGQKVAHALSEGLGVIACIGEKLDEREAGI
TEKVVFEQTKVIADNVKDWSKVVLAYEPVWAIGTGKTATPQQAQEVHEKLRGWLKSNVSDAVAQSTRII
YGGSVTGATCKELASQPDVDGFLVGGASLKPEFVDIINAKQ
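The sequence above is in FASTA format: a header line that starts with ">" followed by the residues in one-letter code, broken over several lines. If you want to handle such a record in a script (for example, to join it into one unbroken string before pasting it into a server), a minimal parser might look like the sketch below. The function name read_fasta and the toy record are assumptions made for this example; the toy record holds only the first 20 residues of the sequence above.

```python
def read_fasta(text):
    """Parse FASTA text into a dict that maps each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(chunks) for h, chunks in records.items()}


# Toy record: only the first 20 residues of the TIM sequence, on two lines.
example = """>toy_record
APSRKFFVGG
NWKMNGRKKN
"""

for header, seq in read_fasta(example).items():
    print(header, len(seq), seq)
```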
a. There are a number of web servers that will predict secondary structure
based on the primary sequence of a protein. Here is a list (in case one or
more is not working on a given day). If all fail because their web addresses
have changed, a Google search for “protein secondary structure prediction”
should be successful.
i. PredictProtein (http://www.predictprotein.org/). To start, you will
need to create an account on this site. You can actually request this
site to predict secondary structure from 7 different web servers on
line. If this site is available, it will enable you to complete this
assignment by clicking on 2 or more of the optional services. Please
note that results may take one or two days.
ii. JPred (http://www.compbio.dundee.ac.uk/www-jpred/). Click on the
advanced link to the right of the sequence box. If you use the JPred
server, be certain to check the box labeled “Skip searching PDB before
prediction”.
Submit the rabbit muscle triose phosphate isomerase sequence to
these two servers. Compare the results you receive from the different
servers. Can you identify segments where the predictions are not
consistent between servers?
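One way to answer that question systematically is to line up the two predictions residue by residue and list the stretches where they differ. The sketch below assumes you have already reduced each server's output to a string with one letter per residue (here H = helix, E = strand, C = coil); the servers' real output formats vary, and the two strings shown are made up for illustration.

```python
def disagreements(pred_a, pred_b):
    """Return 1-based, inclusive (start, end) ranges where two
    equal-length prediction strings differ."""
    if len(pred_a) != len(pred_b):
        raise ValueError("predictions must cover the same residues")
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(pred_a, pred_b), start=1):
        if a != b and start is None:
            start = i                       # a run of differences begins
        elif a == b and start is not None:
            segments.append((start, i - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(pred_a)))
    return segments


# Made-up predictions for a 20-residue stretch.
server_1 = "CCEEEEECCCHHHHHHHHCC"
server_2 = "CCEEEECCCCHHHHHHCCCC"
print(disagreements(server_1, server_2))  # [(7, 7), (17, 18)]
```

Disagreements tend to cluster at the ends of helices and strands, so the ranges reported here are a good place to start your comparison.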
b. The structure of rabbit muscle triose phosphate isomerase has been
determined by X-ray crystallography. Please go to the Protein Data Bank
web server (http://www.rcsb.org/pdb/home/home.do) and search for 1R2R
(that is the PDB ID for this protein). To do so, go to the blue band at the
top of the page and select “PDB ID or Text”, enter 1R2R in the box, and click
on “Search”. The page that comes up contains several tabs: Summary,
Sequence, Derived Data, Seq. Similarity, 3D Similarity, Literature, Biol. &
Chem., Methods, Geometry, and Links. The page normally opens to the
Summary tab. Click on the Sequence tab. The secondary structure shown here
comes from an analysis of the actual 3D structure (not a prediction),
calculated with an implementation of the method of Kabsch and Sander (1983)
Biopolymers 22, 2577-2637. The assignments are: H=helix; B=residue in
isolated beta bridge; E=extended beta strand; G=3-10 helix; I=pi helix;
T=hydrogen bonded turn; S=bend.
Compare your predicted results with the results presented on the PDB site.
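The PDB page uses the seven Kabsch and Sander letters (plus blanks), while many prediction servers report only helix, strand and coil. To compare the two you first need to put them on the same footing. The sketch below uses one common grouping (H, G, I to helix; E, B to strand; everything else to coil). That grouping is an assumption, since other reductions are in use, and the example strings are made up rather than taken from 1R2R.

```python
# One common reduction of the Kabsch-Sander letters to three states.
# This grouping is an assumption; other groupings are also in use.
TO_THREE_STATE = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and isolated bridges
    "T": "C", "S": "C",             # turns and bends count as coil
}


def reduce_states(assignment):
    """Map an assignment string to H/E/C; blanks and unlisted symbols become C."""
    return "".join(TO_THREE_STATE.get(ch, "C") for ch in assignment.upper())


def percent_agreement(predicted, observed):
    """Percentage of positions at which two equal-length strings match."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Made-up example strings, not taken from 1R2R.
observed = reduce_states("  EEEETTGGGHHHHS")
predicted = "CCEEEECCCCHHHHHC"
print(observed)                                # CCEEEECCHHHHHHHC
print(percent_agreement(predicted, observed))  # 87.5
```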
c. As a first attempt at molecular visualization, please return to the Summary
tab and follow the links on the PDB site for "Download File." You can
download the file in a number of formats. It is best to download the file in
“PDB file (text)” format for use with PyMOL. Save the structure file as
1R2R.pdb on your computer (suggested folder: My Documents/PDB Files).
Open the PyMOL program. Then use File..Open from the drop-down menus
to open 1R2R.pdb. As an alternative, you can also load the file directly into
PyMOL from the PDB web site if you have an active Internet connection:
i. Simply open the PDB Loader Service from the PyMOL Plugin drop-down
menu.
ii. Enter 1r2r in the box and hit return. This will load the PDB file directly
in your Viewer window.
iii. As an alternative, you can use the File..Open dropdown menu in
PyMOL and open the file you have saved to your hard drive.
d. You will initially see a stick model that represents all the bonds in the
structure as simple lines. Let’s run a few commands to make a simpler
image. Remember that you are using a ConSCRIPT enabled version of
PyMOL, so you simply need to preface your RasMol or Jmol command with
an r to get PyMOL to execute the command. You can enter the script at the
PyMOL> prompt at the bottom of either PyMOL window.
i. r restrict protein
ii. r select protein
iii. r wireframe off
iv. r cartoon
v. r color structure
What color are the alpha helices?
What color are the beta strands?
Explore the PDB file to find the abbreviations for the three ligands:
Mg2+, dimethyl sulfoxide, and 2-amino-2-hydroxymethyl-propane-1,3-diol.
To rotate the image, hold down the (left) mouse button while dragging
the mouse over the image. Now enter these commands to create
some more changes in the appearance of triose phosphate isomerase:
vi. r select hetero and not water (selects non-protein parts of the
structure excluding water)
vii. r spacefill (a van der Waals radius representation)
viii. r color CPK (standard chemistry color scheme)
ix. r select protein
x. r cartoon off
xi. r wireframe 30
xii. r spacefill 100 (These combined commands yield a ball-and-stick
structure of the protein.)
xiii. r zoom 200 (This gives a 2X expansion of the view. You can also zoom
in on the structure in the viewing window by holding down the Shift
key on your keyboard while using a left-mouse click-and-drag from the
top to the bottom of the window. Experiment with this.)
xiv. Now convert the protein back to a cartoon with the following four
commands:
1. r select protein
2. r wireframe off
3. r spacefill off
4. r cartoon
2. Exploring the Protein Data Bank. In the previous problem, we visited the Protein
Data Bank (PDB). We will explore that site in more detail now. If you encounter
difficulties at any point in this exercise, you may be able to find your way using the
Search box on the main site page or the Help files (on the left side of the page).
The PDB is a repository of macromolecular structures. Perhaps the most important
skill for a PDB site user is the ability to find the structures they are seeking. On the
home page (http://www.rcsb.org/pdb/home/home.do), the Help menu on the left
side of the page includes Video Tutorials. These Flash animations will instruct you on
navigating the site, searching for proteins and using the tools and viewers on the site.
Structures in the PDB are assigned PDB IDs: four-character alphanumeric codes that
uniquely identify each structure. For example, 4HHB is a hemoglobin structure and
8GCH is a chymotrypsin structure. If you know the PDB ID, then you can use that to
search the PDB. You may ask: why would I know that code unless I was the
crystallographer who determined that structure? Most scientists who determine
macromolecular structures are highly motivated to publish their findings in journals
such as Science, Nature, Journal of Biological Chemistry, Journal of Molecular Biology
and Protein Science. These journals have an agreement with the PDB that requires
authors to submit their structures to the PDB before they will publish the article in
their journal. Also, the figures in the textbook showing structures of proteins and
nucleic acids list the corresponding PDB ID. For our first PDB search, we're going to
find a PDB ID in a journal article, then find that structure on the PDB site.
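When you are scanning a footnote for a PDB ID, it helps to know what shape to look for. The sketch below checks whether a string has the shape of the IDs quoted above: four characters, starting with a digit, followed by three letters or digits. Treat that pattern as an assumption drawn from examples such as 4HHB and 8GCH; a string can pass this test without any such entry existing in the PDB.

```python
import re

# Assumed shape of a four-character PDB ID: one digit, then three
# letters or digits (as in 4HHB, 8GCH and 1R2R).
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text):
    """Return True if text has the assumed four-character PDB ID shape."""
    return PDB_ID.fullmatch(text.strip()) is not None


for candidate in ["4HHB", "8GCH", "1r2r", "hemoglobin", "12345"]:
    print(candidate, looks_like_pdb_id(candidate))
```

The first three candidates pass and the last two fail. Note that a four-digit year would also pass, so a check like this narrows down candidates but does not replace reading the footnote.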
Go to the Journal of Biological Chemistry web site (http://www.jbc.org) and search
for this paper using the QUICK SEARCH menu near the top of the page:
Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram, Padmanabhan
Balaram, and Mathur R. N. Murthy.
Structure of Plasmodium falciparum Triose-phosphate Isomerase-2-Phosphoglycerate
Complex at 1.1-Å Resolution. J. Biol. Chem. 2003 volume 278, pages 52461-52470.
Download the article (Full text or PDF – it’s free). Go to the footnotes section and find
the four character PDB ID code. Then go to the Protein Data Bank main page. Type
the PDB ID in the search box and click the Search button. You should be taken to the
Structure Summary page for this enzyme. The Structure Summary page contains
links to many related resources. Try to do each of the following:
a. Download the PDB (structure) file for this protein to your computer. Remember
where you put it (suggested folder: My Documents/PDB Files; suggested name:
1o5x.pdb). In problem 3, you're going to study this structure using PyMOL.
b. Download the protein sequence in FASTA format.
c. Find the still images of this protein on the 1o5x Summary page in the Biological
Assembly box. Click on the link to More Images…. To save an image on the page
that appears, just right-click on it (Ctrl-click for a one-button mouse) and select
the option that lets you save the file (in Internet Explorer, the command is "Save
Picture As.."; in Firefox and Safari, the command is "Save Image As..").
d. Return to the "Summary" page for 1o5x. Click on the "Links" tab. Follow the links for
1o5x to the sites at PDBSum and the IMB Jena Image Library. Collect still images
from each of these sites. Make sure you keep a record of where you found each
image.
3. Examining Protein Structures. In Problem 2, you should have saved the PDB file
for 1o5x, entitled "Plasmodium Falciparum TIM Complexed To 2-Phosphoglycerate."
We're going to use PyMOL to explore this structure. We'll be particularly interested in
identifying secondary structures and looking at the active site.
a. You're going to expand on the question 1c exercise. Open PyMOL on your
computer. If you have not installed it already, please see the opening paragraph
for the exercises in this chapter.
b. Open the file 1o5x.pdb using the File..Open drop-down menu. When you first
open it, you will see a wireframe representation of the structure with the waters
shown as small red dots. Use these commands to alter the appearance of the
structure:
i. r restrict protein
ii. r select protein
iii. r wireframe off
iv. r cartoon
v. r color structure
c. When you open PyMOL, a structure viewing window appears on your computer. As
you have seen, you can use RasMol/Jmol scripts to control the appearance of the
structure from the command line in PyMOL. We'll use this to select the small
molecules that are bound to triose phosphate isomerase in this structure:
3-hydroxypyruvic acid and 2-phosphoglyceric acid. First, though, you'll need to
learn a little bit about viewing a PDB file.
Go to the Structure Explorer page for 1o5x
(http://www.rcsb.org/pdb/home/home.do/explore/explore.do?structureId=1O5X).
Click on Display File..PDB File on the upper right side of the page. This will bring
up the PDB file, which contains a lot of information. We'll only look at a few items.
Each line in a PDB file is called a "record" and the first 6 characters on that line tell
what kind of "record" it is. In your browser, search for SEQRES. According to the
PDB Format Description, "SEQRES records contain the amino acid or nucleic acid
sequence of residues in each chain of the macromolecule that was studied." So
you can see the sequence of your protein there. For 1o5x, the SEQRES section
looks like this:
SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 A  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 A  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 A  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 A  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 A  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 A  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 A  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 A  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 A  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 A  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 A  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 A  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 A  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 A  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 A  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 A  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 A  248  MET
SEQRES   1 B  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 B  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 B  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 B  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 B  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 B  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 B  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 B  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 B  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 B  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 B  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 B  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 B  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 B  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 B  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 B  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 B  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 B  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 B  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 B  248  MET
Each line contains up to 13 amino acid residues, using the 3-letter abbreviations
for the amino acids. Lines 1 and 2 of chain A hold residues 1-26, so residue #27
in chain A is the first residue on line 3: PHE (phenylalanine). The 12th
character in each record is a chain identifier. If a protein contains more than one
polypeptide chain, the chains will be identified with a letter (in this case, there are
two chains - A and B).
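The same lookup can be done in a few lines of Python. The sketch below uses the first three chain A SEQRES records from the listing above and relies on the fixed column positions just described: the first 6 characters name the record type, the 12th character is the chain identifier, and (an assumption based on the standard PDB layout) the residue names begin at column 20. The function name read_seqres is made up for this example.

```python
# First three SEQRES records of chain A, laid out in fixed PDB columns.
PDB_TEXT = """\
SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
"""


def read_seqres(text):
    """Collect the residue names of each chain from SEQRES records."""
    chains = {}
    for line in text.splitlines():
        if line[:6] != "SEQRES":      # first 6 characters name the record type
            continue
        chain_id = line[11]           # 12th character (index 11) is the chain
        residues = line[19:].split()  # residue names start at column 20
        chains.setdefault(chain_id, []).extend(residues)
    return chains


chains = read_seqres(PDB_TEXT)
print(chains["A"][27 - 1])  # residue #27 of chain A: PHE
```

Run against the complete file instead of this excerpt, the same function would also collect chain B, since each record carries its own chain identifier.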
Anything in a PDB file that is not either protein or nucleic acid is considered a
heterogeneous atom and is referred to with the prefix "het". So HETNAM is the
label for a record that contains the name of a non-protein, non-nucleic acid group.
Search the HTML version of your structure file for "HETNAM". What are the hetero
groups in this structure?
Now, let’s continue exploring 1o5x in PyMOL using ConSCRIPT and the command
line in PyMOL.
i. r select hetero and not water
This command selects all the heterogeneous atoms excluding water. To show
the heterogeneous molecules, enter these two commands consecutively:
ii. r spacefill
iii. r color cpk
This creates a spacefilling representation of the three heterogeneous molecules
and colors them according to the Corey-Pauling-Koltun (CPK) scheme: carbon is gray,
oxygen is red and phosphorus is orange. Note that the phosphite ion and the
3-hydroxypyruvate molecule are in contact.
d. As the last part of this exercise, we're going to show some of the active site
residues, based on Figure 4 of the primary citation for this structure
(Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram,
Padmanabhan Balaram, and Mathur R. N. Murthy. Structure of Plasmodium
falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å
Resolution. J. Biol. Chem., Vol. 278, Issue 52, 52461-52470, December 26,
2003). Figure 4a shows three residues that interact with the 2-phosphoglycerate:
glutamate 165, lysine 12 and histidine 95. Use these commands to select the
residues and display them:
i. r select 12,95,165
ii. r spacefill 100
iii. r wireframe 30
This gives the side chains a ball-and-stick appearance.
Finally, change them to CPK coloring so you can distinguish the atoms on the
structure:
iv. r color cpk
To get a better look at this interaction, you can zoom in on a structure by
pressing the right mouse button and moving the mouse from top to bottom of
the viewing window. To move the image side-to-side or up-and-down, use the
middle button on your mouse (on a Macintosh) and drag the image where you
want to go (you may need to experiment with your mouse/operating system to
find the right combination to move the image around the screen). Using this
approach, you can get a closeup view of the binding site for 2-phosphoglycerate.
Advanced: You may have noticed that there are actually two representations of
glu-165 in the closeup view of one of the active sites on 1o5x. Why are there
two glu-165’s in this structure?
To complete this exercise, identify and display additional residues that interact
with the 2-phosphoglycerate in 1o5x. A couple of hints: Use Figure 3 and Figure
4 from the primary citation. Also, for some reason, Jmol won't select
2-phosphoglycerate using "select 2PG", but you can select it using "select
4400", which is a second way 2PG is identified in 1o5x.
4. Protein Families. Now we will explore protein families. The goal of this exercise is
to identify a protein that shares structural homology with triose phosphate isomerase
(as seen in PDB ID 1o5x), but catalyzes a different reaction. We will start with two
resources - CATH and SCOP.
a. CATH. You can start at the CATH homepage (http://www.cathdb.info/).
Enter 1o5x in the Search box at the top of the page. The results will be
presented in a page with 6 tabs.
i. The first tab is the search tab. You can return here if you wish to
perform another search or to search using a FASTA sequence.
ii. The Results Summary tab provides links to each of the four levels of
the CATH hierarchy for this structure. These four levels of hierarchy
are also represented in the four remaining tabs.
iii. The Cathnodes tab gives you access to the classification lineage for
triose phosphate isomerase, along with a list of related structures. The
tabs on this page provide links to members of the same homologous
superfamily, along with alignments and structural neighbors. What is
the Cathnodes class for 1o5x?
iv. The CATH Domain tab describes structural neighbors of the query
protein. This is a good place to find enzymes from the same
superfamily that catalyze different reactions.
v. The CATH Chains tab simply lists the chains from the query structure
along with linked keywords.
vi. The CATH Pdbs tab lists the PDB ID, along with linked keywords.
Click on the Cathnodes class link for 1o5x. What is its Homologous
Superfamily and classification lineage?
Scroll down the same page that gave you the Homologous Superfamily
and Classification Lineage to find the list of Non-Redundant
Representatives. Explore the page to find 5 enzymes in this
superfamily that catalyze reactions different from the reaction
catalyzed by triose phosphate isomerase. Links to Representative
Domains will lead you to lists of candidate proteins. The first four
characters of the match are the PDB ID for the structure (e.g.,
1wa3D00 points to PDB ID 1wa3). Compare the function of these
proteins using the links on the CATH site. Return to the Cathnodes
page (which contained the list of Non-Redundant Representatives) and
follow the links for Alignments.
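If you collect many matches from the Representative Domains lists, a small helper can pull the PDB ID out of each one for you. The sketch below takes the first four characters as the PDB ID, as described above. Reading the fifth character as the chain and the last two as a domain number is an assumption about the CATH naming scheme, and the function name split_cath_domain is made up for this example.

```python
def split_cath_domain(domain_id):
    """Split a 7-character CATH domain name such as '1wa3D00'."""
    if len(domain_id) != 7:
        raise ValueError("expected 7 characters, e.g. '1wa3D00'")
    return {
        "pdb_id": domain_id[:4],   # first four characters: the PDB ID
        "chain": domain_id[4],     # assumed: chain identifier
        "domain": domain_id[5:],   # assumed: two-digit domain number
    }


print(split_cath_domain("1wa3D00"))
# {'pdb_id': '1wa3', 'chain': 'D', 'domain': '00'}
```

The pdb_id value is what you would type into the PDB search box to look up the structure and its function.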
b. SCOP. Go to the SCOP homepage (http://scop.mrc-lmb.cam.ac.uk/scop/).
Click the link for “Keyword search of SCOP entries”. Enter “triosephosphate”
as one word without the quote marks to search for triose phosphate
isomerase.
SCOP provides a Lineage for each protein that is classified. Follow the
lineage links at the Fold level to identify 5 proteins that are related to triose
phosphate isomerase. Are any of these proteins also in your list from your
search of the CATH database?
c. The final part of this exercise is to identify other resources that help you find
proteins related to triose phosphate isomerase from Plasmodium falciparum.
You are encouraged to follow other links from the PDB External Links page
for 1o5x, but you may also be able to find other resources by searching the
Internet using the PDB ID codes. List and summarize three other resources
that you find.