Bioinformatics Exercises

Over the last two decades, information has been gaining increasing importance in both teaching and learning biochemistry. The most obvious case is the sequencing of the human genome and many other complete genomes. In 1990, the determination of the sequence of a protein was often the topic of a full publication in a peer-reviewed journal such as Science, Nature, or The Journal of Biological Chemistry. Now entire genomes are the topic of individual research papers. The term "bioinformatics" is a catch-all phrase which generally refers to the use of computers and computer science approaches in the study of biological systems. The main chapters where this information is discussed in the text are chapters 3 (Nucleotides, Nucleic Acids and Genetic Information), 5 (Proteins: Primary Structure), 6 (Proteins: Three-Dimensional Structure), 12 (Enzyme Kinetics, Inhibition and Regulation) and 13 (Introduction to Metabolism). Here we provide exercises appropriate to these chapters aimed at introducing the techniques of bioinformatics that involve the use of computers, Internet-accessible databases and the tools that have been developed to “mine” those databases.

General principles

1. Open-ended questions. The exercises may include some questions that have definite answers, but in many cases there will also be questions which may be answered in a number of ways, depending on the approach you take or the topic you select.
2. Stable Internet Resources. As much as possible, the exercises will be based on well established, stable web sites. If it is necessary to use less reliable sites and/or resources, attempts have been made to provide multiple sites that perform similar functions.
3. Here are the stable online resources that will be used most frequently:
   a. GenBank (http://www.ncbi.nlm.nih.gov/)
   b. Protein Data Bank (http://www.rcsb.org)
   c. Expasy Proteomics Server (http://us.expasy.org/)
   d. European Bioinformatics Institute (http://www.ebi.ac.uk/)
   e. Pfam (http://www.sanger.ac.uk/Software/Pfam/)
   f. SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
   g. CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
   h. PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi)
   i. PubMed Central (http://www.pubmedcentral.nih.gov/)
4. Answer key. Where a definite answer is known, it will be provided in an answer key. For more open-ended questions, a typical correct answer will be presented.
5. Historical perspective. If historical resources are available online (including PubMed), there may be questions designed to help students identify some of the historical roots of biochemistry and molecular biology.

Project 3: Visualizing Three-Dimensional Protein Structures Using the Molecular Visualization Programs Jmol and PyMOL

There are a number of good free visualization tools available on the Internet. Each has strengths and weaknesses. You will have two options: Jmol and PyMOL. Jmol is written in Java, has a nice user interface and uses a command set that will be quite familiar to users of RasMol or Chime. PyMOL is written in Python; the standard PyMOL user interface can be quite challenging to use, but plug-ins are available to increase ease of use. Many users consider PyMOL’s graphics capabilities compelling and worth the challenge of using the program interface.

Project 3 Exercises with Jmol

Jmol can be used in two different formats: as an applet built into web pages, or as a standalone application. We will be using it as a standalone application. Jmol is a Java-based application and therefore requires that you have a Java Virtual Machine (JVM) installed on your computer. It is frequently installed on computers before purchase, but you can also find it at http://www.java.com/en/download/index.jsp.

Downloading and installing Jmol.
The Jmol wiki (http://wiki.jmol.org/index.php/Main_Page) has a terrific instruction page about the Jmol application (http://wiki.jmol.org/index.php/Jmol_Application).

Windows. These steps should work on most computers. If you have difficulty, please go to the Jmol wiki and search for more instructions there.

1. Download the latest stable release (not a pre-release) from http://sourceforge.net/projects/jmol/files/ in a zip format. Zip is a compressed file format that can be opened by the operating system in Windows XP or Windows Vista.
2. Create a folder for Jmol. Suggestion: c:\Program Files\Jmol.
3. View the compressed zip file from Windows Explorer. Extract only the Jmol.jar file to the c:\Program Files\Jmol folder.
4. With the c:\Program Files\Jmol folder open, right click on the icon for Jmol.jar and select “Create shortcut”. Drag the shortcut to your desktop or your taskbar. You will now have access to Jmol from your desktop.

Macintosh. These instructions are taken directly from the Jmol wiki (http://wiki.jmol.org/index.php/Jmol_Application#Installing_Jmol_Application).

1. Download the Jmol package (either .zip or .tar.gz format) and extract/uncompress only the Jmol.jar file to the folder of your choice.
2. Simply double click on the Jmol.jar file to open Jmol.

As you go through the exercises below, you are encouraged to return to the Jmol wiki (http://wiki.jmol.org/index.php/Main_Page) for instructions and links to useful information about using Jmol.

1. Obtaining Structural Information. Please review the materials in your textbook about secondary structure of proteins. Secondary structures include alpha helices, beta sheets and beta turns in proteins. Many programs have been written that will predict secondary structures that will be found in a protein, based only on the primary sequence. Let's start again with rabbit muscle triose phosphate isomerase.
Here is the primary sequence:

>gi|136066|sp|P00939|TPIS_RABIT Triosephosphate isomerase (TIM) (Triose-phosphate isomerase)
APSRKFFVGGNWKMNGRKKNLGELITTLNAAKVPADTEVVCAPPTAYIDFARQKLDPKIAVAAQNCYKV
TNGAFTGEISPGMIKDCGATWVVLGHSERRHVFGESDELIGQKVAHALSEGLGVIACIGEKLDEREAGI
TEKVVFEQTKVIADNVKDWSKVVLAYEPVWAIGTGKTATPQQAQEVHEKLRGWLKSNVSDAVAQSTRII
YGGSVTGATCKELASQPDVDGFLVGGASLKPEFVDIINAKQ

a. There are a number of web servers that will predict secondary structure based on the primary sequence of a protein. Here is a list (in case one or more is not working on a given day). If all fail because their web addresses have changed, a Google search for “protein secondary structure prediction” should be successful.
   i. PredictProtein (http://www.predictprotein.org/). To start, you will need to create an account on this site. You can actually request this site to predict secondary structure from 7 different web servers online. If this site is available, it will enable you to complete this assignment by clicking on 2 or more of the optional services. Please note that results may take one or two days.
   ii. JPred (http://www.compbio.dundee.ac.uk/www-jpred/). Click on the advanced link to the right of the sequence box. If you use the JPred server, be certain to check the box labeled “Skip searching PDB before prediction”.
   Submit the rabbit muscle triose phosphate isomerase sequence to these two servers. Compare the results you receive from the different servers. Can you identify segments where the predictions are not consistent between servers?
b. The structure of rabbit muscle triose phosphate isomerase has been determined by X-ray crystallography. Please go to the Protein Data Bank web server (http://www.rcsb.org/pdb/home/home.do) and search for 1R2R (that is the PDB ID for this protein). To do so, go to the blue band at the top of the page and select “PDB ID or Text”, enter 1R2R in the box, and click on “Search”.
The page that comes up contains several tabs: Summary, Sequence, Derived Data, Seq. Similarity, 3D Similarity, Literature, Biol. & Chem., Methods, Geometry, and Links. The page normally opens to the Summary tab. Click on the Sequence tab. The results shown here for the secondary structure are from an analysis of the actual 3D structure (not a prediction), which has been calculated according to an implementation of the method of Kabsch and Sander (1983) Biopolymers 22, 2577-2637. The assignments are: H=helix; B=residue in isolated beta bridge; E=extended beta strand; G=3-10 helix; I=pi helix; T=hydrogen bonded turn; S=bend. Compare your predicted results with the results presented on the PDB site.

c. As a first attempt at molecular visualization, please return to the Summary tab and follow the links on the PDB site for "Download File." You can download the file in a number of formats. It is best to download the file in “PDB file (text)” format for use with Jmol. Save the structure file as 1R2R.pdb on your computer (suggested folder: My Documents/PDB Files). Open the Jmol program. Then use the dropdown menu File..Open to open 1R2R.pdb. You will initially see a cartoon model which represents helices as magenta corkscrews, sheets as yellow arrows and waters as small red spheres. To rotate the image, hold down the (left) mouse button while dragging the mouse over the image. You can control the view in Jmol in three different ways: dropdown menus, right-click menus and scripting. Perform the following steps to clean up the image a bit using the dropdown and right-click menus (for a one-button mouse, use Ctrl-click).
   i. Dropdown: Display..Select..Water
   ii. Dropdown: Display..Atom..None
   iii. Dropdown: Display..Select..Hetero
   Now you should be able to see the alpha helix and beta sheet structures in rabbit muscle triose phosphate isomerase without the red water spheres. Take some time to experiment with the other dropdown menu options on Jmol.
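The comparisons asked for in parts a and b (between two prediction servers, and between a prediction and the Kabsch and Sander assignment shown on the PDB site) can also be done programmatically once the results are pasted in as strings of one-letter codes. The following is a minimal Python sketch under stated assumptions: the function names are invented for illustration, the reduction of the eight assignment codes to three states (helix, strand, coil) is one common convention rather than the only one, and the two example strings are made up, not real server output.

```python
from itertools import groupby

# Reduction of the eight Kabsch-Sander codes to three states (H, E, C).
# This particular mapping is an assumption; other reductions are in use.
TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(assignment):
    """Map each code to H (helix) or E (strand); anything else becomes C (coil)."""
    return "".join(TO_THREE_STATE.get(code, "C") for code in assignment.upper())


def disagreements(first, second):
    """Return 1-based, inclusive (start, end) ranges where the two strings differ."""
    if len(first) != len(second):
        raise ValueError("both assignments must cover the same residues")
    differs = [a != b for a, b in zip(first, second)]
    segments = []
    position = 1
    for flag, run in groupby(differs):
        length = len(list(run))
        if flag:
            segments.append((position, position + length - 1))
        position += length
    return segments


def percent_agreement(first, second):
    """Percentage of residues given the same state in both strings."""
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


if __name__ == "__main__":
    # Hypothetical 16-residue outputs, not real server results.
    server_one = "CCEEEECCHHHHHHCC"
    server_two = "CCEEECCCHHHHHCCC"
    print(disagreements(server_one, server_two))
    print(percent_agreement(server_one, server_two))
```

To compare a prediction with the assignment from the PDB Sequence tab, pass the PDB string through to_three_state first so that both strings use the same three-letter alphabet.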
In addition to dropdown and right-click menus, Jmol also has a Script Console window that enables you to select specific atoms or parts of a structure (amino acid residues, for example), then change the way they appear. To open the Jmol Script Console, select File..Console.. from the Jmol dropdown menu. Then enter these commands at the $ prompt.
   iv. select hetero and not water (selects non-protein parts of the structure excluding water)
   v. spacefill (a van der Waals radius representation)
   vi. color CPK (standard chemistry color scheme)
   vii. select protein
   viii. cartoon off
   ix. wireframe 30
   x. spacefill 100 (Commands vii through x combined yield a ball-and-stick structure of the protein.)
   xi. zoom 200 (This gives a 2X expansion of the view. You can also zoom in on the structure in the viewing window by holding down the Shift key on your keyboard while using a left-mouse click-and-drag from the top to the bottom of the window. Experiment with this.)
   xii. Now convert the protein back to a cartoon with the following four commands:
      1. select protein
      2. wireframe off
      3. spacefill off
      4. cartoon

2. Exploring the Protein Data Bank. In the first problem, we visited the Protein Data Bank (PDB). We will explore that site in more detail now. If you encounter difficulties at any point in this exercise, you may be able to find your way using the Search box on the main site page or the Help files (on the left side of the page). The PDB is a repository of macromolecular structures. Perhaps the most important skill for a PDB site user is the ability to find the structures they are seeking. On the home page (http://www.rcsb.org/pdb/home/home.do), the Help menu on the left side of the page includes Video Tutorials. These Flash animations will instruct you on navigating the site, searching for proteins and using the tools and viewers on the site. Structures in the PDB are assigned PDB IDs, four-character alphanumeric codes that uniquely identify each structure.
So, for example, 4HHB is a hemoglobin structure and 8GCH is a chymotrypsin structure. If you know the PDB ID, then you can use that to search the PDB. You may ask: why would I know that code unless I was the crystallographer who determined that structure? Most scientists who determine macromolecular structures are highly motivated to publish their findings in journals such as Science, Nature, Journal of Biological Chemistry, Journal of Molecular Biology and Protein Science. These journals have an agreement with the PDB that requires authors to submit their structures to the PDB before they will publish the article in their journal. Also, the figures in the text showing structures of proteins and nucleic acids list the corresponding PDB ID.

For our first PDB search, we're going to find a PDB ID in a journal article, then find that structure on the PDB site. Go to the Journal of Biological Chemistry web site (http://www.jbc.org) and search for this paper using the QUICK SEARCH menu near the top of the page:

Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram, Padmanabhan Balaram, and Mathur R. N. Murthy. Structure of Plasmodium falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å Resolution. J. Biol. Chem. 2003 volume 278, pages 52461-52470.

Download the article (Full text or PDF; it’s free). Go to the footnotes section and find the four character PDB ID code. Then go to the Protein Data Bank main page. Type the PDB ID in the search box and click the Search button. You should be taken to the Structure Summary page for this enzyme. The Structure Summary page contains links to many related resources. Try to do each of the following:

a. Download the PDB (structure) file for this protein to your computer. Remember where you put it (suggested folder: My Documents/PDB Files; suggested name: 1o5x.pdb). In problem 3, you're going to study this structure using Jmol.
b. Download the protein sequence in FASTA format: click on Download Files on the right hand side of the page and select FASTA sequence. Suggested file name: 1o5x_FASTA.txt.
c. Find the still images of this protein on the 1o5x Summary page in the Biological Assembly box. Click on the link to More Images…. To save an image on the page that appears, just right click on it (Ctrl-click for a one-button mouse) and select the option that lets you save the file (in Internet Explorer, the command is "Save Picture As.."; in Firefox and Safari, the command is "Save Image As..").
d. Return to the "Summary" page for 1o5x. Click on the "Links" tab. Follow the links for 1o5x to the sites at PDBSum and the IMB Jena Image Library. Collect still images from each of these sites. Make sure you keep a record of where you found each image.

3. Examining Protein Structures. In Problem 2, you should have saved the PDB file for 1o5x, entitled "Plasmodium Falciparum TIM Complexed To 2-Phosphoglycerate." We're going to use Jmol to explore this structure. We'll be particularly interested in identifying secondary structures and looking at the active site.

a. You're going to expand on the Question 1c exercise. Open Jmol on your computer. If you have not installed it already, please see the opening paragraph for the exercises in this chapter.
b. Open the file 1o5x.pdb. When you first open it, you will see a cartoon representation of the structure with the waters shown as small red spheres. Now it's time to explore the dropdown menus in Jmol. There are 7 dropdown menus in Jmol: File, Edit, Display, View, Tools, Macro, and Help. Spend a few minutes trying each command in each of the menus. Here are a few that are very helpful:
   i. File..Export..Export Image enables you to export an image you have created as an image in jpg format.
   ii. Edit..Copy Image copies the image to memory. You can then paste the same image into a word processor or presentation file.
   iii. Display..Zoom allows you to enlarge or shrink your structure.
   iv. Display..Axes brings the x, y and z axes into Jmol.
   v. View..Front brings the structure around to its original orientation. The remaining options show the structure from different angles.
   vi. Help..User Guide. The User Guide includes instructions on a number of features of Jmol. Explore the section called “Rasmol/Chime commands” that contains many commands you can use in the Script window of Jmol.
c. When you open Jmol, a structure-viewing window appears on your computer. To move beyond the dropdown menus, open the Script Console again (File..Console). You can use this window to send very specific commands to the structure viewing window. We'll use this to select the small molecules that are bound to triose phosphate isomerase in this structure: 3-hydroxypyruvic acid and 2-phosphoglyceric acid, but first you'll need to learn a little bit about viewing a PDB file. Go to the Structure Explorer page for 1o5x (http://www.rcsb.org/pdb/home/home.do/explore/explore.do?structureId=1O5X). Click on Display File..PDB File on the upper right side of the page. This will bring up the PDB file, which contains a lot of information. We'll only look at a few items. Each line in a PDB file is called a "record" and the first 6 characters on that line tell what kind of "record" it is. In your browser, search for SEQRES. According to the PDB Format Description, "SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied." So you can see the sequence of your protein there.
For 1o5x, the SEQRES section looks like this:

SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 A  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 A  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 A  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 A  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 A  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 A  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 A  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 A  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 A  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 A  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 A  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 A  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 A  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 A  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 A  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 A  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 A  248  MET
SEQRES   1 B  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 B  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 B  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 B  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 B  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 B  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 B  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 B  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 B  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 B  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 B  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 B  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 B  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 B  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 B  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 B  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 B  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 B  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 B  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 B  248  MET

Each line contains up to 13 amino acid residues, using the 3-letter abbreviations for the amino acids. So residue #27 in chain A is PHE (phenylalanine). The 12th character in each record is a chain identifier. If a protein contains more than one polypeptide chain, the chains will be identified with a letter (in this case, there are two chains, A and B). Anything in a PDB file that is not either protein or nucleic acid is considered a heterogeneous atom and is referred to with the prefix "het". So HETNAM is the label for a record that contains the name of a non-protein, non-nucleic acid group. Search the HTML version of your structure file for "HETNAM". What are the hetero groups in this structure?

Now, let’s begin in Jmol by cleaning up the cartoon view of the protein in 1o5x:
   i. Dropdown: Display..Select..Water
   ii. Dropdown: Display..Atom..None
We shall also display the heterogeneous groups in 1o5x. Go to the Jmol Script Console (File..Console) and enter the command:
   iii. select hetero and not water
This command selects all the heterogeneous atoms excluding water. To show the heterogeneous molecules, enter these two commands consecutively:
   iv. spacefill on
   v. color CPK
This creates a spacefilling representation of the three heterogeneous molecules and colors them according to the Corey-Pauling-Koltun (CPK) scheme (carbon is gray, oxygen is red, phosphorus is orange). Note that the phosphite ion and the 3-hydroxypyruvate molecule are in contact.

d. As the last part of this exercise, we're going to show some of the active site residues, based on Figure 4 of the primary citation for this structure (Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram, Padmanabhan Balaram, and Mathur R. N. Murthy. Structure of Plasmodium falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å Resolution. J. Biol. Chem., Vol. 278, Issue 52, 52461-52470, December 26, 2003). Figure 4a shows three residues that interact with the 2-phosphoglycerate: glutamate 165, lysine 12 and histidine 95. Use this command to select these residues:

   select lys12,his95,glu165

Then use the right-click menu in the structure viewer window to show them in a ball-and-stick format:

   right-click..Style..Scheme..Ball and Stick

Finally change them to CPK coloring so you can distinguish the atoms on the structure:

   right-click..Color..Atoms..By Scheme..Element (CPK)

To get a better look at this interaction, you can zoom in on the structure using a combination of the Shift key and the mouse. To zoom in, drag the mouse from top to bottom in the structure window while holding down the Shift key. To move the image side-to-side or up-and-down, hold down the Ctrl key, then right-click and drag the image where you want it to go (for a one-button mouse, hold down the Shift key, double-click in the Jmol window, keep the mouse button depressed after the 2nd click, and drag the image to the desired position). Using a combination of Shift-mouse and Ctrl-right-click, you can get a closeup view of the binding site for 2-phosphoglycerate.
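The record layout described in part c (record type in the first 6 characters, chain identifier in the 12th character, residues listed by 3-letter code) lends itself to a short script. The following Python sketch is illustrative only: the function name parse_pdb_records is invented here, the residue and name start columns are assumptions based on the fixed-column PDB layout, the SEQRES lines in the sample are the first records from the 1o5x listing above, and the HETNAM line is a made-up example built from the names used in the text (3PY, 3-hydroxypyruvic acid) rather than copied from the actual file.

```python
from collections import defaultdict


def parse_pdb_records(text):
    """Collect residues per chain from SEQRES and group names from HETNAM records."""
    chains = defaultdict(list)
    het_names = {}
    for line in text.splitlines():
        record = line[0:6].strip()          # characters 1-6: record type
        if record == "SEQRES":
            chain_id = line[11]             # character 12: chain identifier
            chains[chain_id].extend(line[19:].split())
        elif record == "HETNAM":
            het_id = line[11:14].strip()    # characters 12-14: hetero group ID
            name = line[15:].strip()
            # A long name may continue on further HETNAM records for the same ID.
            het_names[het_id] = (het_names.get(het_id, "") + " " + name).strip()
    return dict(chains), het_names


# Sample input: SEQRES lines from the 1o5x listing; the HETNAM line is illustrative.
SAMPLE = """\
SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   1 B  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
HETNAM     3PY 3-HYDROXYPYRUVIC ACID
"""

if __name__ == "__main__":
    chains, het_names = parse_pdb_records(SAMPLE)
    # Residue numbers count from 1, Python lists from 0, so residue 27 is index 26.
    print("Residue 27 of chain A:", chains["A"][26])
    print("Hetero groups:", het_names)
```

To run it on the file saved in Problem 2, read 1o5x.pdb into a string and pass it to the function. This counts positions in the SEQRES listing, which need not match the residue numbers used in the coordinate (ATOM) records of every PDB file.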
Advanced: You may have noticed that there are actually two representations of glu-165 in one of the active sites in 1o5x in the closeup view. Why are there two glu-165’s in this structure?

To complete this exercise, identify and display additional residues that interact with the 2-phosphoglycerate in 1o5x. A couple of hints: Use Figure 3 and Figure 4 from the primary citation. Also, for some reason, Jmol won't select 3-hydroxypyruvate using "select 3PY", but you can select it using "select 3300", which is a second way 3PY is identified in 1o5x.

4. Protein Families. Now we will explore protein families. The goal of this exercise is to identify a protein that shares structural homology with triose phosphate isomerase (as seen in PDB ID 1o5x), but catalyzes a different reaction. We will start with two resources, CATH and SCOP.

a. CATH. You can start at the CATH homepage (http://www.cathdb.info/). Enter 1o5x in the Search box at the top of the page. The results will be presented in a page with 6 tabs.
   i. The first tab is the search tab. You can return here if you wish to perform another search or to search using a FASTA sequence.
   ii. The Results Summary tab provides links to each of the four levels of the CATH hierarchy for this structure. These four levels of hierarchy are also represented in the four remaining tabs.
   iii. The Cathnodes tab gives you access to the classification lineage for triose phosphate isomerase, along with a list of related structures. The tabs on this page provide links to members of the same homologous superfamily, along with alignments and structural neighbors. What is the Cathnodes class for 1o5x?
   iv. The CATH Domain tab describes structural neighbors of the query protein. This is a good place to find enzymes from the same superfamily that catalyze different reactions.
   v. The CATH Chains tab simply lists the chains from the query structure along with linked keywords.
   vi. The CATH Pdbs tab lists the PDB ID, along with linked keywords.
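The four levels of the CATH hierarchy are Class, Architecture, Topology and Homologous superfamily, and a CATH node is written as up to four dot-separated numbers, one per level. The Python sketch below shows how such an identifier, and a CATH domain name such as 1wa3D00, can be taken apart when you collect candidate proteins. The function and class names are invented for illustration, the node "1.2.3.4" is an arbitrary placeholder and not the classification of 1o5x, and the reading of a domain name as PDB ID plus chain letter plus two-digit domain number is an assumption about the CATH naming convention.

```python
import re
from typing import NamedTuple

CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_lineage(node_id):
    """Split a dotted CATH node into (level name, node prefix) pairs."""
    parts = node_id.split(".")
    if not 1 <= len(parts) <= 4 or not all(part.isdigit() for part in parts):
        raise ValueError("expected one to four dot-separated numbers")
    return [(CATH_LEVELS[i], ".".join(parts[: i + 1])) for i in range(len(parts))]


class CathDomain(NamedTuple):
    pdb_id: str   # first four characters: the PDB ID
    chain: str    # fifth character: assumed to be the chain identifier
    domain: int   # last two digits: assumed to be the domain number


def parse_domain(name):
    """Parse a seven-character CATH domain name such as '1wa3D00'."""
    match = re.fullmatch(r"([0-9][A-Za-z0-9]{3})([A-Za-z0-9])([0-9]{2})", name)
    if match is None:
        raise ValueError("not a recognizable CATH domain name: " + name)
    pdb_id, chain, domain = match.groups()
    return CathDomain(pdb_id.lower(), chain, int(domain))


if __name__ == "__main__":
    for level, node in cath_lineage("1.2.3.4"):   # placeholder node
        print(level, node)
    print(parse_domain("1wa3D00"))
```

Keeping the PDB ID separate from the chain and domain number makes it easy to look each candidate up on the PDB site, as in Problem 2.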
Click on the Cathnodes class link for 1o5x. What is its Homologous Superfamily and classification lineage? Scroll down the same page that gave you the Homologous Superfamily and Classification Lineage to find the list of Non-Redundant Representatives. Explore the page to find 5 enzymes in this superfamily that catalyze reactions different from the reaction catalyzed by triose phosphate isomerase. Links to Representative Domains will lead you to lists of candidate proteins. The first four characters of the match are the PDB ID for the structure (e.g., 1wa3D00 points to PDB ID 1wa3). Compare the function of these proteins using the links on the CATH site. Return to the Cathnodes page (which contained the list of Non-Redundant Representatives) and follow the links for Alignments.

b. SCOP. Go to the SCOP homepage (http://scop.mrc-lmb.cam.ac.uk/scop/). Click the link for “Keyword search of SCOP entries”. Enter “triosephosphate” as one word without the quote marks to search for triose phosphate isomerase. SCOP provides a Lineage for each protein that is classified. Follow the lineage links at the Fold level to identify 5 proteins that are related to triose phosphate isomerase. Are any of these proteins also in your list from your search of the CATH database?

c. The final part of this exercise is to identify other resources that help you find proteins related to triose phosphate isomerase from Plasmodium falciparum. You are encouraged to follow other links from the PDB External Links page for 1o5x, but you may also be able to find other resources by searching the Internet using the PDB ID codes. List and summarize three other resources that you find.

Project 3 Exercises with PyMOL

Introduction to PyMOL

PyMOL is a molecular modeling program that generates high-quality molecular images and animations for presentations and publications. PyMOL's command line interface allows users to input commands which directly alter the appearance of their molecule in the viewing window.
While PyMOL is highly capable on its own, it also supports a series of plug-ins that enhance the user experience with PyMOL. Included in this list is ConSCRIPT, a plug-in developed by students at the Rochester Institute of Technology, that enables users who are familiar with RasMol or Jmol to use those commands to generate images in PyMOL.

PyMOL can be downloaded in a “for educational use only” form from the PyMOL website found at http://pymol.org/educational/. Before you can download PyMOL, you will need to register with Schrödinger, who will then send you a confirming email and provide you with a login and password that will allow you to download PyMOL. From that page, you simply download and install PyMOL for Windows, Macintosh or Linux to the default location on your system.

Note to Macintosh users: To use plug-ins for PyMOL on the Mac, it needs to operate in hybrid X11 mode. To enable this, you simply change the name of the application from MacPyMOL to PyMOLX11Hybrid.

Installing ConSCRIPT. The following documentation is taken directly from the ConSCRIPT readme file.

1. For all operating systems: Download ConSCRIPT. ConSCRIPT can be downloaded in a compressed file format from http://sourceforge.net/projects/sbevsl/files in the ConSCRIPT folder. It is important to note that clicking on the prominent green download arrow on this page (or the main SBEVSL page on SourceForge.net) may lead you to another file that is part of the SBEVSL project. You will need to open the ConSCRIPT folder on http://sourceforge.net/projects/sbevsl/files by clicking on the green triangle to the left of the ConSCRIPT folder. Similarly, you should open the latest ConSCRIPT folder (currently ConSCRIPT-2.1) and download the most recent .tar.gz or .zip file. You can choose to download either a .tar.gz compressed file (ConSCRIPT-2.1rc1.tar.gz as of August 3, 2010) or a zip compressed file (ConSCRIPT-2.1rc1.zip as of August 3, 2010). The way you expand the file depends on your operating system.
2. Unix or Linux
   a. Expand the compressed file. On Unix or Linux systems, or using MINGW under Windows, you may unpack the tarball with

      gunzip < ConSCRIPT-2.1rc1.tar.gz | tar xvf -

   b. Install ConSCRIPT. The simplest way to install ConSCRIPT is by using the Plugin->Manage Plugins->Install... menu to install ConSCRIPT.py. To use that menu item, you need the same administrative rights as were needed to install PyMOL itself. Under Unix or Linux, you may need to have root access (e.g. via sudo). Alternatively, you may copy ConSCRIPT.py into the correct place in the directory tree used by PyMOL for plugins. For example, for the default install of PyMOL 1.2 under Ubuntu Linux, the ConSCRIPT.py file should be copied to

      /usr/lib/pymodules/python2.6/pmg_tk/startup

   If you have installed PyMOL to another location on your system, you will need to find the portion of the PyMOL installation tree that contains modules/pmg_tk/startup.

   Once you have completed the installation, launch PyMOL and look for ConSCRIPT in the Plugin menu. When all files are in place, the ConSCRIPT script interface can be run by choosing ConSCRIPT from the PyMOL plug-in menu. Alternatively, prefixing RasMol commands with one of "R,r,VSL,vsl,RASMOL or rasmol" typed directly into the PyMOL command line will convert the RasMol command to PyMOL.

3. Windows
   a. Expand the compressed file. Windows systems may have a native application that will expand your .tar.gz or .zip files; if not you may wish to consider using Stuffit Expander or WinZip.
   b. Install ConSCRIPT. The simplest way to install ConSCRIPT is by using the Plugin->Manage Plugins->Install... menu to install ConSCRIPT.py. To use that menu item, you need the same administrative rights as were needed to install PyMOL itself. It is probably simplest if you install as an administrator or ask the administrator for your system to install it for you. Alternatively, you may copy ConSCRIPT.py into the correct place in the directory tree used by PyMOL for plugins.
For example, for the default install of PyMOL 1.3 under Windows, the ConSCRIPT.py file should be copied to

      c:\Program Files\PyMOL\PyMOL\modules\pmg_tk\startup

   If you have installed PyMOL to another location on your system, you will need to find the portion of the PyMOL installation tree that contains modules/pmg_tk/startup.

   Once you have completed the installation, launch PyMOL and look for ConSCRIPT in the Plugin menu. When all files are in place, the ConSCRIPT script interface can be run by choosing ConSCRIPT from the PyMOL plug-in menu. Alternatively, prefixing RasMol commands with one of "R,r,VSL,vsl,RASMOL or rasmol" typed directly into the PyMOL command line will convert the RasMol command to PyMOL.

4. Macintosh
   a. Rename your PyMOL application. Under Macintosh OS X you need to rename the MacPymol application to MacPymolX11Hybrid, in order to get access to plugins.
   b. Expand the compressed file. Macintosh systems may have a native application that will expand your .tar.gz or .zip files; if not you may wish to consider using Stuffit Expander.
   c. Install ConSCRIPT. Then, the simplest way to install ConSCRIPT is by using the Plugin->Manage Plugins->Install... menu to install ConSCRIPT.py. Alternatively, you may copy ConSCRIPT.py into the correct place in the directory tree used by PyMOL for plugins. This can be a bit tricky on the Mac, so follow these instructions closely.
      i. Open Finder->Applications.
      ii. If you have a three button mouse, right click on MacPyMOLX11Hybrid and select Show Package Contents. This will take you to the directory tree for PyMOL. If you don’t have a three button mouse, Ctrl-click should bring up the option to Show Package Contents.
      iii. Traverse the tree to pymol/modules/pmg_tk/startup.
      iv. Copy ConSCRIPT.py to the startup folder.

   Once you have completed the installation, launch PyMOL and look for ConSCRIPT in the Plugin menu.
When all files are in place, the ConSCRIPT script interface can be run by choosing ConSCRIPT from the PyMOL Plugin menu. Alternatively, prefixing a RasMol command with one of R, r, VSL, vsl, RASMOL or rasmol, typed directly into the PyMOL command line, will convert the RasMol command to PyMOL.

Let’s start with a brief introduction to the PyMOL interface. When you open PyMOL, two windows appear; one is labeled "PyMOL Tcl/Tk GUI" (or “The PyMOL Molecular Graphics System”) and the other is labeled "PyMOL Viewer" (or “MacPyMOL”). The PyMOL Tcl/Tk GUI is the external graphical user interface; it provides a command line for interacting with PyMOL and displays output from the program. The PyMOL Viewer contains the internal GUI and the viewing window. The internal GUI lists the objects within PyMOL, while the viewing window is where your molecule will be displayed.

Here are a few simple PyMOL commands that will help you as you go through these exercises.
1. select selection-name, selection-expression: The select command is very useful when you want to modify a subset of atoms. The selection-name is the name you assign to the selection so that you can refer to it later when you want to make modifications. The selection-expression identifies the residues in the molecule that you want to select. Each residue can be identified by a name or a number found in the ATOM (for amino acids) and HETATM (for hetero atoms) records of the PDB file. You can also find the residue name and number by clicking the residue in the PyMOL viewing window and looking at the output in the command prompt of the PyMOL Tcl/Tk GUI window. The selection expression uses the 'resi' keyword to reference residues by number and the 'resn' keyword to reference residues by name. They can be combined to specify a particular residue. For example, let's say that a critical residue in the active site of a molecule is Histidine 95. There are a few ways to select this residue:
a. select activeSite, resi 95 and resn his
b. select activeSite, resi 95
c. We cannot simply type select activeSite, resn his because that will select all the histidine residues in the structure.
2. show/hide representation, selection-name: The show and hide commands alter the way the molecule appears in the viewing window. Users can choose from cartoon, spheres, sticks, surface, and other representations. So let's say we want to make a previously selected active site appear as spheres:
a. show spheres, activeSite
b. Alternatively, the spheres can be hidden by typing: hide spheres, activeSite
3. color color-name, selection: The color command alters the color of a selection. Let's change the color of our active site to red.
a. color red, activeSite
4. Keep in mind that if you have ConSCRIPT installed with PyMOL, you can also use RasMol or Jmol commands, which ConSCRIPT will then translate into commands that are executable by PyMOL. To activate the ConSCRIPT translator, simply enter a RasMol command preceded by one of R, r, VSL, vsl, RASMOL or rasmol directly into the PyMOL command line.
For more information on selecting residues and PyMOL commands see the PyMOL community wiki (http://pymolwiki.org/index.php/Main_Page).

1. Obtaining Structural Information. Please review the materials in your textbook about secondary structure of proteins. Secondary structures include alpha helices, beta sheets and beta turns in proteins. Many programs have been written that predict the secondary structures that will be found in a protein, based only on the primary sequence. Let's start again with rabbit muscle triose phosphate isomerase.
Here is the primary sequence:
>gi|136066|sp|P00939|TPIS_RABIT Triosephosphate isomerase (TIM) (Triose-phosphate isomerase)
APSRKFFVGGNWKMNGRKKNLGELITTLNAAKVPADTEVVCAPPTAYIDFARQKLDPKIAVAAQNCYKV
TNGAFTGEISPGMIKDCGATWVVLGHSERRHVFGESDELIGQKVAHALSEGLGVIACIGEKLDEREAGI
TEKVVFEQTKVIADNVKDWSKVVLAYEPVWAIGTGKTATPQQAQEVHEKLRGWLKSNVSDAVAQSTRII
YGGSVTGATCKELASQPDVDGFLVGGASLKPEFVDIINAKQ
a. There are a number of web servers that will predict secondary structure based on the primary sequence of a protein. Here is a list (in case one or more is not working on a given day). If all fail because their web addresses have changed, a Google search for “protein secondary structure prediction” should be successful.
i. PredictProtein (http://www.predictprotein.org/). To start, you will need to create an account on this site. You can ask this site to predict secondary structure using 7 different online web servers. If this site is available, it will enable you to complete this assignment by clicking on 2 or more of the optional services. Please note that results may take one or two days.
ii. JPred (http://www.compbio.dundee.ac.uk/www-jpred/). Click on the advanced link to the right of the sequence box. If you use the JPred server, be certain to check the box labeled “Skip searching PDB before prediction”.
Submit the rabbit muscle triose phosphate isomerase sequence to these two servers. Compare the results you receive from the different servers. Can you identify segments where the predictions are not consistent between servers?
b. The structure of rabbit muscle triose phosphate isomerase has been determined by X-ray crystallography. Please go to the Protein Data Bank web server (http://www.rcsb.org/pdb/home/home.do) and search for 1R2R (that is the PDB ID for this protein). To do so, go to the blue band at the top of the page and select “PDB ID or Text”, enter 1R2R in the box, and click on “Search”.
The page that comes up contains several tabs: Summary, Sequence, Derived Data, Seq. Similarity, 3D Similarity, Literature, Biol. & Chem., Methods, Geometry, and Links. The page normally opens to the Summary tab. Click on the Sequence tab. The secondary structure shown here comes from an analysis of the actual 3D structure (not a prediction), calculated according to an implementation of the method of Kabsch and Sander (1983) Biopolymers 22, 2577-2637. The assignments are: H=helix; B=residue in isolated beta bridge; E=extended beta strand; G=3-10 helix; I=pi helix; T=hydrogen bonded turn; S=bend. Compare your predicted results with the results presented on the PDB site.
c. As a first attempt at molecular visualization, please return to the Summary tab and follow the links on the PDB site for "Download File." You can download the file in a number of formats. It is best to download the file in “PDB file (text)” format for use with PyMOL. Save the structure file as 1R2R.pdb on your computer (suggested folder: My Documents/PDB Files). Open the PyMOL program. Then use File..Open from the drop-down menus to open 1R2R.pdb. As an alternative, you can also load the file directly into PyMOL from the PDB web site if you have an active Internet connection:
i. Simply open the PDB Loader Service from the PyMOL Plugin drop-down menu.
ii. Enter 1r2r in the box and hit return. This will load the PDB file directly in your Viewer window.
iii. As an alternative, you can use the File..Open drop-down menu in PyMOL and open the file you have saved to your hard drive.
d. You will initially see a wireframe model that represents all the bonds in the structure as simple lines. Let’s run a few commands to make a simpler image. Remember that you are using a ConSCRIPT-enabled version of PyMOL, so you simply need to preface your RasMol or Jmol command with an r to get PyMOL to execute the command.
You can enter the script at the PyMOL> prompt at the bottom of either PyMOL window.
i. r restrict protein
ii. r select protein
iii. r wireframe off
iv. r cartoon
v. r color structure
What color are the alpha helices? What color are the beta strands? Explore the PDB file to find the abbreviations for the three ligands: Mg2+, dimethyl sulfoxide, and 2-amino-2-hydroxymethyl-propane-1,3-diol. To rotate the image, hold down the (left) mouse button while dragging the mouse over the image. Now enter these commands to create some more changes in the appearance of triose phosphate isomerase:
vi. r select hetero and not water (selects non-protein parts of the structure excluding water)
vii. r spacefill (a van der Waals radius representation)
viii. r color CPK (standard chemistry color scheme)
ix. r select protein
x. r cartoon off
xi. r wireframe 30
xii. r spacefill 100 (These combined commands yield a ball-and-stick structure of the protein.)
xiii. r zoom 200 (This gives a 2X expansion of the view. You can also zoom in on the structure in the viewing window by holding down the Shift key on your keyboard while using a left-mouse click-and-drag from the top to the bottom of the window. Experiment with this.)
xiv. Now convert the protein back to a cartoon with the following four commands:
1. r select protein
2. r wireframe off
3. r spacefill off
4. r cartoon

2. Exploring the Protein Data Bank. In the previous problem, we visited the Protein Data Bank (PDB). We will explore that site in more detail now. If you encounter difficulties at any point in this exercise, you may be able to find your way using the Search box on the main site page or the Help files (on the left side of the page). The PDB is a repository of macromolecular structures. Perhaps the most important skill for a PDB site user is the ability to find the structures they are seeking. On the home page (http://www.rcsb.org/pdb/home/home.do), the Help menu on the left side of the page includes Video Tutorials.
These Flash animations will instruct you on navigating the site, searching for proteins and using the tools and viewers on the site. Structures in the PDB are assigned PDB IDs: 4-character alphanumeric codes that uniquely identify each structure. For example, 4HHB is a hemoglobin structure and 8GCH is a chymotrypsin structure. If you know the PDB ID, then you can use that to search the PDB. You may ask: why would I know that code unless I was the crystallographer who determined that structure? Most scientists who determine macromolecular structures are highly motivated to publish their findings in journals such as Science, Nature, Journal of Biological Chemistry, Journal of Molecular Biology and Protein Science. These journals have an agreement with the PDB that requires authors to submit their structures to the PDB before the journal will publish the article. Also, the figures in the textbook showing structures of proteins and nucleic acids list the corresponding PDB ID.
For our first PDB search, we're going to find a PDB ID in a journal article, then find that structure on the PDB site. Go to the Journal of Biological Chemistry web site (http://www.jbc.org) and search for this paper using the QUICK SEARCH menu near the top of the page:
Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram, Padmanabhan Balaram, and Mathur R. N. Murthy. Structure of Plasmodium falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å Resolution. J. Biol. Chem. 2003 volume 278, pages 52461-52470.
Download the article (Full text or PDF; it’s free). Go to the footnotes section and find the four-character PDB ID code. Then go to the Protein Data Bank main page. Type the PDB ID in the search box and click the Search button. You should be taken to the Structure Summary page for this enzyme. The Structure Summary page contains links to many related resources. Try to do each of the following:
a. Download the PDB (structure) file for this protein to your computer. Remember where you put it (suggested folder: My Documents/PDB Files; suggested name: 1o5x.pdb). In problem 3, you're going to study this structure using PyMOL.
b. Download the protein sequence in FASTA format.
c. Find the still images of this protein on the 1o5x Summary page in the Biological Assembly box. Click on the link to More Images…. To save an image on the page that appears, just right-click on it (Ctrl-click for a one-button mouse) and select the option that lets you save the file (in Internet Explorer, the command is "Save Picture As.."; in Firefox and Safari, the command is "Save Image As..").
d. Return to the "Summary" page for 1o5x. Click on the "Links" tab. Follow the links for 1o5x to the sites at PDBSum and the IMB Jena Image Library. Collect still images from each of these sites. Make sure you keep a record of where you found each image.

3. Examining Protein Structures. In Problem 2, you should have saved the PDB file for 1o5x, entitled "Plasmodium Falciparum TIM Complexed To 2-Phosphoglycerate." We're going to use PyMOL to explore this structure. We'll be particularly interested in identifying secondary structures and looking at the active site.
a. You're going to expand on the exercise in question 1c. Open PyMOL on your computer. If you have not installed it already, please see the opening paragraph for the exercises in this chapter.
b. Open the file 1o5x.pdb using the File..Open drop-down menu. When you first open it, you will see a wireframe representation of the structure with the waters shown as small red dots. Use these commands to alter the appearance of the structure:
i. r restrict protein
ii. r select protein
iii. r wireframe off
iv. r cartoon
v. r color structure
c. When you open PyMOL, a structure viewing window appears on your computer. As you have seen, you can use RasMol/Jmol scripts to control the appearance of the structure from the command line in PyMOL.
We'll use this to select the small molecules that are bound to triose phosphate isomerase in this structure: 3-hydroxypyruvic acid and 2-phosphoglyceric acid, but first you'll need to learn a little bit about viewing a PDB file. Go to the Structure Explorer page for 1o5x (http://www.rcsb.org/pdb/home/home.do/explore/explore.do?structureId=1O5X). Click on Display File..PDB File on the upper right side of the page. This will bring up the PDB file, which contains a lot of information. We'll only look at a few items. Each line in a PDB file is called a "record" and the first 6 characters on that line tell what kind of "record" it is. In your browser, search for SEQRES. According to the PDB Format Description, "SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied." So you can see the sequence of your protein there. For 1o5x, the SEQRES section looks like this:
SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 A  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 A  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 A  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 A  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 A  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 A  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 A  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 A  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 A  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 A  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 A  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 A  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 A  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 A  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 A  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 A  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 A  248  MET
SEQRES   1 B  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 B  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 B  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 B  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 B  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN
SEQRES   6 B  248  VAL SER LYS PHE GLY ASN GLY SER TYR THR GLY GLU VAL
SEQRES   7 B  248  SER ALA GLU ILE ALA LYS ASP LEU ASN ILE GLU TYR VAL
SEQRES   8 B  248  ILE ILE GLY HIS PHE GLU ARG ARG LYS TYR PHE HIS GLU
SEQRES   9 B  248  THR ASP GLU ASP VAL ARG GLU LYS LEU GLN ALA SER LEU
SEQRES  10 B  248  LYS ASN ASN LEU LYS ALA VAL VAL CYS PHE GLY GLU SER
SEQRES  11 B  248  LEU GLU GLN ARG GLU GLN ASN LYS THR ILE GLU VAL ILE
SEQRES  12 B  248  THR LYS GLN VAL LYS ALA PHE VAL ASP LEU ILE ASP ASN
SEQRES  13 B  248  PHE ASP ASN VAL ILE LEU VAL TYR GLU PRO LEU TRP ALA
SEQRES  14 B  248  ILE GLY THR GLY LYS THR ALA THR PRO GLU GLN ALA GLN
SEQRES  15 B  248  LEU VAL HIS LYS GLU ILE ARG LYS ILE VAL LYS ASP THR
SEQRES  16 B  248  CYS GLY GLU LYS GLN ALA ASN GLN ILE ARG ILE LEU TYR
SEQRES  17 B  248  GLY GLY SER VAL ASN THR GLU ASN CYS SER SER LEU ILE
SEQRES  18 B  248  GLN GLN GLU ASP ILE ASP GLY PHE LEU VAL GLY ASN ALA
SEQRES  19 B  248  SER LEU LYS GLU SER PHE VAL ASP ILE ILE LYS SER ALA
SEQRES  20 B  248  MET
Each full line contains 13 amino acid residues, using the 3-letter abbreviations for the amino acids. So residue #27 in chain A is PHE (phenylalanine). The 12th character in each record is a chain identifier. If a protein contains more than one polypeptide chain, the chains will be identified with a letter (in this case, there are two chains - A and B).
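The SEQRES layout described above (record name in the first 6 characters, chain identifier in the 12th character, residue names from column 20 onward) can be read with a few lines of Python. This is only a minimal sketch: the function name seqres_to_sequences and the small example list are illustrative assumptions, not part of PyMOL or the PDB site, and the three example records are the first three chain A lines shown above.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    Hypothetical helper for illustration; unknown residue names become 'X'.
    """
    chains = {}
    for line in pdb_lines:
        # The first 6 characters name the record type.
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]            # 12th character: chain identifier
        residues = line[19:].split()   # residue names start at column 20
        chains.setdefault(chain_id, []).extend(residues)
    return {
        chain: "".join(THREE_TO_ONE.get(name, "X") for name in names)
        for chain, names in chains.items()
    }


# First three SEQRES records of chain A from 1o5x, as listed above.
example = [
    "SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS",
    "SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER",
    "SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL",
]

sequences = seqres_to_sequences(example)
print(sequences["A"])        # one-letter sequence of the first 39 residues
print(sequences["A"][26])    # residue #27 (Python indexes from 0)
```

To run it on the whole file, you could pass the lines of your saved 1o5x.pdb (for example, open("1o5x.pdb").readlines()) instead of the example list; the file path is whatever you chose when saving the structure.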
Anything in a PDB file that is not either protein or nucleic acid is considered a heterogeneous atom and is referred to with the prefix "het". So HETNAM is the label for a record that contains the name of a non-protein, non-nucleic acid group. Search the HTML version of your structure file for "HETNAM". What are the hetero groups in this structure? Now, let’s continue exploring 1o5x in PyMOL using ConSCRIPT and the command line in PyMOL.
i. r select hetero and not water
This command selects all the heterogeneous atoms excluding water. To show the heterogeneous molecules, enter these two commands consecutively:
ii. r spacefill
iii. r color cpk
This creates a spacefilling representation of the three heterogeneous molecules and colors them according to the Corey-Pauling-Koltun scheme (CPK; carbon is gray, oxygen is red, phosphorus is orange). Note that the phosphite ion and the 3-hydroxypyruvate molecule are in contact.
d. As the last part of this exercise, we're going to show some of the active site residues, based on Figure 4 of the primary citation for this structure (Sampathkumar Parthasarathy, Kandiah Eaazhisai, Hemalatha Balaram, Padmanabhan Balaram, and Mathur R. N. Murthy. Structure of Plasmodium falciparum Triose-phosphate Isomerase-2-Phosphoglycerate Complex at 1.1-Å Resolution. J. Biol. Chem., Vol. 278, Issue 52, 52461-52470, December 26, 2003). Figure 4a shows three residues that interact with the 2-phosphoglycerate: glutamate 165, lysine 12 and histidine 95. Use these commands to select and display these residues:
i. r select 12,95,165
ii. r spacefill 100
iii. r wireframe 30
This gives the side chains a ball-and-stick appearance. Finally, change them to CPK coloring so you can distinguish the atoms on the structure:
iv. r color cpk
To get a better look at this interaction, you can zoom in on a structure by pressing the right mouse button and moving the mouse from top to bottom of the viewing window.
To move the image side-to-side or up-and-down, use the middle button on your mouse (on a Macintosh) and drag the image where you want to go (you may need to experiment with your mouse/operating system to find the right combination to move the image around the screen). Using this approach, you can get a closeup view of the binding site for 2-phosphoglycerate.
Advanced: You may have noticed that there are actually two representations of glu-165 in the closeup view of one of the active sites on 1o5x. Why are there two glu-165’s in this structure?
To complete this exercise, identify and display additional residues that interact with the 2-phosphoglycerate in 1o5x. A couple of hints: Use Figure 3 and Figure 4 from the primary citation. Also, for some reason, Jmol won't select 2-phosphoglycerate using "select 2PG", but you can select it using "select 4400", which is a second way 2PG is identified in 1o5x.

4. Protein Families. Now we will explore protein families. The goal of this exercise is to identify a protein that shares structural homology with triose phosphate isomerase (as seen in PDB ID 1o5x), but catalyzes a different reaction. We will start with two resources - CATH and SCOP.
a. CATH. You can start at the CATH homepage (http://www.cathdb.info/). Enter 1o5x in the Search box at the top of the page. The results will be presented in a page with 6 tabs.
i. The first tab is the search tab. You can return here if you wish to perform another search or to search using a FASTA sequence.
ii. The Results Summary tab provides links to each of the four levels of the CATH hierarchy for this structure. These four levels of hierarchy are also represented in the four remaining tabs.
iii. The Cathnodes tab gives you access to the classification lineage for triose phosphate isomerase, along with a list of related structures. The tabs on this page provide links to members of the same homologous superfamily, along with alignments and structural neighbors.
What is the Cathnodes class for 1o5x?
iv. The CATH Domain tab describes structural neighbors of the query protein. This is a good place to find enzymes from the same superfamily that catalyze different reactions.
v. The CATH Chains tab simply lists the chains from the query structure along with linked keywords.
vi. The CATH Pdbs tab lists the PDB ID, along with linked keywords.
Click on the Cathnodes class link for 1o5x. What is its Homologous Superfamily and classification lineage? Scroll down the same page that gave you the Homologous Superfamily and Classification Lineage to find the list of Non-Redundant Representatives. Explore the page to find 5 enzymes in this superfamily that catalyze reactions different from the reaction catalyzed by triose phosphate isomerase. Links to Representative Domains will lead you to lists of candidate proteins. The first four characters of the match are the PDB ID for the structure (e.g., 1wa3D00 points to PDB ID 1wa3). Compare the function of these proteins using the links on the CATH site. Return to the Cathnodes page (which contained the list of Non-Redundant Representatives) and follow the links for Alignments.
b. SCOP. Go to the SCOP homepage (http://scop.mrc-lmb.cam.ac.uk/scop/). Click the link for “Keyword search of SCOP entries”. Enter “triosephosphate” as one word, without the quote marks, to search for triose phosphate isomerase. SCOP provides a Lineage for each protein that is classified. Follow the lineage links at the Fold level to identify 5 proteins that are related to triose phosphate isomerase. Are any of these proteins also in your list from your search of the CATH database?
c. The final part of this exercise is to identify other resources that help you find proteins related to triose phosphate isomerase from Plasmodium falciparum. You are encouraged to follow other links from the PDB External Links page for 1o5x, but you may also be able to find other resources by searching the Internet using the PDB ID codes.
List and summarize three other resources that you find.
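
For part 4a, where the first four characters of a CATH match give the PDB ID (e.g., 1wa3D00 points to PDB ID 1wa3), a short Python sketch can turn a list of matches into a list of PDB IDs to look up. The function names are hypothetical, and apart from 1wa3D00 the domain identifiers in the example are made up for illustration in the same style; they are not results copied from the CATH site.

```python
def pdb_id_from_cath_domain(domain_id):
    """Return the PDB ID encoded in the first four characters of a CATH match."""
    domain_id = domain_id.strip()
    if len(domain_id) < 4:
        raise ValueError("expected at least four characters: %r" % domain_id)
    return domain_id[:4].lower()


def unique_pdb_ids(domain_ids):
    """Collect PDB IDs from CATH matches, keeping first-seen order, no repeats."""
    seen = []
    for domain_id in domain_ids:
        pdb_id = pdb_id_from_cath_domain(domain_id)
        if pdb_id not in seen:
            seen.append(pdb_id)
    return seen


# 1wa3D00 is the example from the exercise text; the two 1o5x entries are
# illustrative placeholders written in the same style.
matches = ["1wa3D00", "1o5xA00", "1o5xB00"]
print(unique_pdb_ids(matches))
```

Keeping the matches in first-seen order makes it easy to record which CATH listing each PDB ID came from as you compile your five enzymes.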