
Workshop in Computational Structural Biology (81813), Spring 2017

Exercise 4: Side-Chain Modeling

Contents
    Representation of side chains as rotamers
    Side-chain modeling
        Rosetta fixbb
        SCWRL
    Quality assessment - different measures of success prediction

Representation of side chains as rotamers

In class, we saw how a small number of rotamers can be used to represent the most common side-chain conformations in proteins. However, not all residues in crystal structures are "rotameric": some adopt non-rotameric conformations.

Question 1: Inspect position I13 of ubiquitin and explain why this side-chain conformation does not appear in any rotamer library.

Side-chain modeling

In this section you will predict the rotamers in the structure of ubiquitin from a backbone-only model using Rosetta and SCWRL, two of the leading tools for side-chain modeling.

Rosetta fixbb

This application allows you to remodel a protein's side chains (rotamers, as well as identities) while keeping the backbone fixed. For this purpose, we will give Rosetta the modeling instructions via a resfile.

Resfiles tell the Rosetta protocol you are running what to do at each position: a residue can be changed completely (i.e. mutated), remodeled to a better rotamer, or left untouched. Resfiles are text files that contain a header, which sets the default behavior for any residue without specific instructions, followed by the file body, which holds residue-specific commands. Common resfile commands are:

• NATRO - freeze this side chain; keep both the amino acid type and the rotamer fixed (native rotamer).
• NATAA - keep the amino acid type, allow the rotamer to change (native amino acid).
• ALLAA - allow a change to any amino acid (all amino acids).
• PIKAA - mutate to one of a specific list of amino acid types (pick amino acid).
• NOTAA - exclude mutations to specific amino acid types (not amino acid).
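To make the header/body layout concrete, here is a minimal Python sketch that assembles resfile text from a list of default commands and a table of per-residue commands. The helper name build_resfile and the positions used (5, 10 and 12 in chain A) are hypothetical and chosen only for illustration; this is not the content of the course's example resfile.

```python
def build_resfile(default_commands, per_residue):
    """Return resfile text: header defaults, a 'start' line, then per-residue lines.

    default_commands: list of header commands, e.g. ["NATAA"]
    per_residue: dict mapping (residue_number, chain) -> command string
    """
    lines = list(default_commands)   # header: default behavior for all residues
    lines.append("start")            # separates the header from the body
    for (resnum, chain), command in sorted(per_residue.items()):
        lines.append(f"{resnum} {chain} {command}")
    return "\n".join(lines) + "\n"


# Hypothetical positions, used only to show each command's syntax.
example = build_resfile(
    ["NATAA"],
    {
        (5, "A"): "NATRO",        # keep the input rotamer at residue 5
        (10, "A"): "PIKAA AVL",   # allow only Ala, Val or Leu at residue 10
        (12, "A"): "NOTAA C",     # allow anything except Cys at residue 12
    },
)
print(example)
```

Printing the result shows the header command first, then the start line, then one line per residue in the form "number chain command".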
Inspect the example resfile $CSBW_HOME/resources/ex4/example_resfile.txt. The header holds global commands; after the start line, specific residues can be addressed. The NATAA statement keeps all the amino acid types (unless stated otherwise later). For residue 5 in chain A, however, no rotamer search will be performed: the NATRO command uses the native rotamer rather than searching for a (perhaps) better one.

We can add rotamers to the standard rotamer library used for each residue. The EX 1 and EX 2 statements add extra χ1 and χ2 rotamers, respectively, for buried residues only. One can also control the number of rotamers added, or the number of neighbors a residue must have in order to be considered "buried". Another option that has proven very useful is use_input_sc, which adds the starting state of each position to its rotamer library. More on the resfile format can be found in the Rosetta manual.

Create a resfile that instructs Rosetta to repack all the residues of a structure. This can be done in any text editor, using NATAA on all amino acids. Make sure to add the EX 1 EX 2 statement. For convenience, call the file repack.res.

Next, acquire a PDB of ubiquitin stripped of its side chains (i.e. backbone only):

$> prody select "backbone and chain A" 1ubi --output "1ubiA_bb"

Run Rosetta to model its side chains back:

$> fixbb.linuxgccrelease \
       -database $ROSETTA_DB \
       -s 1ubiA_bb.pdb \
       -resfile repack.res \
       -nstruct 15 \
       -scorefile scoreFixedBB.sc \
       > fixbb.log

Output: 15 structures, called 1ubiA_bb_0001.pdb to 1ubiA_bb_0015.pdb.

Question 2: Inspect the structures in PyMOL. Where do the side chains differ? (Provide a general answer. No need to refer to specific positions.)

Question 3: Look at scoreFixedBB.sc. What is the range of energy scores?

Now let's compare the structures to the native PDB:

Question 4:
a. Find a position where all decoys have the same rotamer as the native, and a position where all decoys have the same rotamer, but it is different from the native rotamer.
b. Repeat your run, but this time include the native side-chain conformation (i.e. add the -use_input_sc flag to the command line and take 1ubiA.pdb as your starting structure). To be able to generate new output, make sure to use the -out:prefix flag as you did in exercise 3.
c. Can you identify regions where you improved your prediction? Use prody align to measure the side-chain RMSd of the lowest-energy decoy from each of the two runs to the native.

$> prody align -s "sc and noh" 1ubiA.pdb <name_of_structure_to_compare_to.pdb>

Question 5: What is the RMSd value you got?

SCWRL

SCWRL models the side chains of a structure by breaking the problem into smaller pieces. Side chains are represented as vertices in an undirected graph, in which every two interacting residues are connected by an edge. The graph can be partitioned into connected subgraphs with no edges between them. These can be broken down further into biconnected components, which are graphs that cannot be disconnected by the removal of a single vertex. The combinatorial problem is thereby reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation.

Run SCWRL (-i gives the input file, -o the output file):

$> Scwrl4 \
       -i 1ubiA_bb.pdb \
       -o 1ubi_scwrl.pdb \
       > 1ubi.scwrl.log

Question 6: Find the RMSd between the SCWRL-generated side chains and the native structure. What is its value?

Question 7: Fit the native structure, the lowest-energy structure of the naive Rosetta run (the first one) and the SCWRL-generated structure on top of one another. What are the differences between the structures? Are these differences located at the core or at the surface? Are they pinpointed to specific residues? Are they important?
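The graph decomposition described for SCWRL can be illustrated with a toy example. The Python sketch below is a simplification, not SCWRL's implementation: it splits the interaction graph only into connected components (it does not go on to biconnected components), minimizes each component by brute force, and sums the results. The residue names, rotamer counts and energy values are all made up for illustration.

```python
from itertools import product


def connected_components(nodes, edges):
    """Group residues so that no interaction edge crosses between groups."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:                      # depth-first traversal
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


def minimize_component(component, self_energy, pair_energy):
    """Brute-force the lowest-energy rotamer assignment within one component."""
    best_energy, best_choice = float("inf"), None
    rotamer_ranges = [range(len(self_energy[res])) for res in component]
    for choice in product(*rotamer_ranges):
        assignment = dict(zip(component, choice))
        energy = sum(self_energy[res][rot] for res, rot in assignment.items())
        for (a, b), table in pair_energy.items():
            if a in assignment and b in assignment:
                energy += table[assignment[a]][assignment[b]]
        if energy < best_energy:
            best_energy, best_choice = energy, assignment
    return best_energy, best_choice


# Made-up toy system: four residues with two rotamers each.
self_energy = {"A": [0.0, 1.0], "B": [0.5, 0.0], "C": [0.2, 0.1], "D": [0.0, 0.3]}
pair_energy = {
    ("A", "B"): [[2.0, 0.0], [0.0, 2.0]],   # A interacts only with B
    ("C", "D"): [[0.0, 1.0], [1.0, 0.0]],   # C interacts only with D
}

components = connected_components(list(self_energy), list(pair_energy))
results = [minimize_component(c, self_energy, pair_energy) for c in components]
total_energy = sum(energy for energy, _ in results)
best_rotamers = {res: rot for _, choice in results for res, rot in choice.items()}
print(components, best_rotamers, total_energy)
```

Because the two components share no edge, searching them separately costs 4 + 4 evaluations instead of the 16 needed for the full system, and the combined result is still the global minimum. The saving grows rapidly with the number of residues, which is the point of the decomposition.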
Quality assessment - different measures of success prediction

The accepted method for assessing the success of a side-chain modeling program is to compare the χ1 (and sometimes also χ2) values of the proposed side chains with those of the native side chains. By convention, a position is considered successfully predicted if the deviation is at most 40 degrees. We will now use this measure to assess the success of side-chain prediction for another structure, solved at a resolution of 0.66 Å.

First, let's get a clean copy of this structure, stripped of its side chains:

$> prody fetch 1us0
$> prody select "protein and chain A" 1us0.pdb --output "1us0A"
$> prody select "backbone and chain A" 1us0.pdb --output "1us0A_bb"

Next, run the fixbb command to generate 15 models of the structure with its predicted side chains. For this, run the same command as before, but provide 1us0A_bb as the input and give the score file and log file new names.

Now run these commands to find the percentage of recapitulated residues in each model:

$> python calc_percent_recapitulation.py 1us0A.pdb 1us0A_bb_00*pdb > recapitulation.txt
$> grep "recapitulated" recapitulation.txt

Question 8: What is the range of values you get? Do you think this indicates success in side-chain recapitulation?

Next, we'll inspect the structure with the highest percentage of recapitulated side chains. Open the structure in PyMOL (if several structures have the same percentage, choose one of them). To find the positions that were not well modeled according to this criterion, run the following command in your terminal:

$> grep deviation recapitulation.txt | grep <my_model_name>

Question 9: Select several positions from this list and compare them to several positions that were recapitulated. From visual inspection, would you say that all positions that were not recapitulated are poorly modeled? And that all positions that were recapitulated are modeled well?
Attach a session highlighting the positions you've inspected.

Because this structure was solved at such high resolution, the experimentalists were able to assign two different side-chain conformations to some flexible residues. (Note that when Rosetta read in the structure, it kept only the first of these conformations.) Open 1us0.pdb in PyMOL and inspect position K100.

Question 10:
a. Measure the χ1 of the two K100 conformations. What is the delta value you get, and how does it compare to the threshold value (40 degrees)?
b. How similar do you think the two conformations are, and is this similarity reflected in the value you got?
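The χ1 criterion used in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of calc_percent_recapitulation.py: the function names are hypothetical, and it assumes χ1 is the dihedral defined by the N, CA, CB and gamma atoms (CG for most residues; some residues name the gamma atom differently, e.g. CG1 in Ile and Val, OG in Ser, OG1 in Thr, SG in Cys). Note that angle differences must be wrapped around 360 degrees, so that 170 and -170 degrees count as 20 degrees apart.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees) around the p1-p2 bond, e.g. N, CA, CB, CG for chi1."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def angular_deviation(angle_a, angle_b):
    """Smallest absolute difference between two angles, in the range 0 to 180."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)


def is_recapitulated(chi_model, chi_native, threshold=40.0):
    """True if the model chi angle is within the threshold of the native one."""
    return angular_deviation(chi_model, chi_native) <= threshold


# Made-up coordinates and angles, for illustration only.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(angular_deviation(170.0, -170.0), is_recapitulated(170.0, -170.0))
print(angular_deviation(-60.0, 60.0), is_recapitulated(-60.0, 60.0))
```

The same angular_deviation helper can be applied to the two χ1 values measured for K100 in Question 10 to compare their difference with the 40-degree threshold.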
