Workshop in Computational Structural Biology (81813), Spring 2017
Exercise 4: Side-Chain Modeling
Contents
Representation of side chains as rotamers
Side-chain modeling
Rosetta fixbb
SCWRL
Quality assessment - different measures of success prediction
Representation of side chains as rotamers
In class, we saw how a small number of rotamers can be used to represent the most common side chain
conformations in proteins. However, not all residues in crystal structures are "rotameric" - some adopt
non-rotameric conformations.
Question 1: Inspect position I13 of ubiquitin and explain why this side chain conformation does not
appear in any rotamer library.
Side-chain modeling
In this section you will predict the rotamers in the structure of ubiquitin from a backbone-only model using
Rosetta and SCWRL, two of the leading tools for side-chain modeling.
Rosetta fixbb
This application lets you remodel a protein's side chains (rotamers, as well as identities) while
keeping the backbone fixed. For this purpose, we will give Rosetta the modeling instructions via a
resfile. Resfiles tell the Rosetta protocol you are running which tasks to perform at each position: an
amino acid can be changed completely (i.e. mutated), remodeled to a better rotamer, or left untouched.
Resfiles are text files that contain a header, which sets the default behavior for any residue without
specific instructions, followed by the file body, with residue-specific commands. Common resfile commands are:
• NATRO - freeze this side-chain; keep both aa type and rotamer fixed (natural rotamer).
• NATAA - keep the amino acid type, allow change of rotamer (natural amino acid).
• ALLAA - allow change to all possible amino acids (all amino acids).
• PIKAA - mutate to one of a specific list of aa types (pick amino acid).
• NOTAA - excludes mutations to specific aa types (not amino acid).
Inspect the example resfile $CSBW_HOME/resources/ex4/example_resfile.txt. The header
allows global commands, while after the start line, specified residues can be addressed. The NATAA
statement will keep all the amino acid types (unless stated otherwise later). However, for residue 5 in
chain A, no search for rotamers will be performed - the NATRO command will use the native rotamer,
rather than search for a (perhaps) better one.
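The layout described above can be sketched in a few lines of Python. This is only an illustration of the header / start line / body structure. The helper name build_resfile is hypothetical, and the text it produces is a guess at what such a file looks like, not the actual content of example_resfile.txt.

```python
def build_resfile(header_commands, residue_commands):
    """Return resfile text: header lines, a 'start' line, then per-residue lines.

    header_commands: list of default commands, e.g. ["NATAA"]
    residue_commands: dict mapping (residue_number, chain) to a command string
    """
    lines = list(header_commands)
    lines.append("start")
    for (resnum, chain), command in sorted(residue_commands.items()):
        lines.append(f"{resnum} {chain} {command}")
    return "\n".join(lines) + "\n"


# Header: keep every amino acid type, but allow its rotamer to change.
# Body: residue 5 of chain A keeps its input rotamer (no rotamer search).
text = build_resfile(["NATAA"], {(5, "A"): "NATRO"})
print(text)
```

Any residue not listed in the body falls back to the header default, which is why a single NATAA line is enough to cover the rest of the chain.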
We can add additional rotamers to the standard rotamer library used for each residue. The EX 1 and
EX 2 statements add extra χ1 and χ2 rotamers, respectively, for buried residues only, and one can
also control the number of rotamers added or the number of neighbors a residue must have in order to be
considered "buried". Another option that has proven very useful is use_input_sc, which means
that each position will also have its starting state added to the rotamer library. See the Rosetta manual
for more on the resfile format.
Create a resfile that instructs Rosetta to repack all the residues of a structure. This can be done in any
text editor, using NATAA on all amino acids. Make sure to add the EX 1 EX 2 statement. For convenience,
call the file repack.res.
Next, acquire a PDB of ubiquitin stripped of its side-chains (i.e. backbone only):
$> prody select "backbone and chain A" 1ubi --output "1ubiA_bb"
Run Rosetta to model back its side chains:
$> fixbb.linuxgccrelease \
-database $ROSETTA_DB \
-s 1ubiA_bb.pdb \
-resfile repack.res \
-nstruct 15 \
-scorefile scoreFixedBB.sc \
> fixbb.log
Output: 15 structures, called 1ubiA_bb_0001.pdb to 1ubiA_bb_0015.pdb.
Question 2: Inspect the structures in PyMOL. Where do the side chains differ? (Provide a general
answer. No need to refer to specific positions).
Question 3: Look at scoreFixedBB.sc. What is the range of energy scores?
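If you prefer not to read the score file by eye, the range can be extracted with a short script. The sketch below assumes the usual Rosetta score-file layout, in which data lines start with "SCORE:" and the first such line holds the column names, including total_score. The function name score_range and the sample values are made up for illustration.

```python
def score_range(lines, column="total_score"):
    """Return (min, max) of one column from Rosetta-style score-file lines."""
    header = None
    values = []
    for line in lines:
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other non-score lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line names the columns
            continue
        values.append(float(fields[header.index(column)]))
    if not values:
        raise ValueError("no score lines found")
    return min(values), max(values)


# Made-up sample in the assumed layout (not real output of this exercise).
sample = [
    "SEQUENCE:",
    "SCORE: total_score fa_atr description",
    "SCORE: -101.5 -300.2 demo_0001",
    "SCORE: -99.0 -298.7 demo_0002",
    "SCORE: -103.25 -301.0 demo_0003",
]
low, high = score_range(sample)
print(low, high)
```

To apply it to your own run, pass an open file handle instead of the sample list, for example with open("scoreFixedBB.sc") as handle: score_range(handle).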
Now let's compare the structures to the native PDB:
Question 4: a. Find a position where all decoys have the same rotamer as the native, and a position
where all decoys have the same rotamer, but it is different from the native rotamer. b. Repeat your run,
but this time include the native side chain conformation (i.e. add the -use_input_sc flag to the
command-line and take 1ubiA.pdb as your starting structure). To be able to generate new output, make
sure to use the -out:prefix flag as you did in exercise 3. c. Can you identify regions where you
improved your prediction?
Use prody align to measure the side-chain RMSd of the lowest energy decoy from both runs to the
native.
$> prody align -s "sc and noh" 1ubiA.pdb <name_of_structure_to_compare_to.pdb>
Question 5: What is the RMSd value you got?
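For reference, the quantity being reported is the root-mean-square deviation over matched atoms. The sketch below computes it for two equally ordered coordinate lists and performs no superposition. It assumes both structures are already in the same frame, which is reasonable here because the backbone is kept fixed. It is not ProDy's implementation, and the coordinates are toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each list holds (x, y, z) tuples in Angstrom; no superposition is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom of the model is displaced by 1 Angstrom along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(native, model))
```

Because the squared distances are averaged before taking the root, a few badly placed side chains can dominate the value even when most positions are modeled well.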
SCWRL
SCWRL models the side chains of a structure by breaking the problem into smaller pieces. Side chains are
represented as vertices in an undirected graph, where every pair of interacting residues is connected by an
edge. The graph can be partitioned into connected subgraphs with no edges between them. These can be
further broken into biconnected components, which are graphs that cannot be disconnected by the removal of a
single vertex. The combinatorial problem is reduced to finding the minimum energy of these small
biconnected components and combining the results to identify the global minimum energy conformation.
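To make the decomposition concrete, here is a generic textbook way to split an undirected graph into biconnected components (a Tarjan-style depth-first search with an edge stack). It is a sketch of the graph idea only, not SCWRL's actual code, and the residue labels and contacts in the example are invented.

```python
def biconnected_components(graph):
    """Split an undirected graph into biconnected components.

    graph: dict mapping each vertex to the set of its neighbours.
    Returns a list of vertex sets; isolated vertices form their own set.
    """
    index = {}        # discovery order of each vertex
    low = {}          # lowest discovery index reachable from the subtree
    edge_stack = []   # edges seen so far that are not yet in a component
    components = []

    def visit(vertex, parent):
        index[vertex] = low[vertex] = len(index)
        for neighbour in graph[vertex]:
            if neighbour == parent:
                continue
            if neighbour not in index:
                edge_stack.append((vertex, neighbour))
                visit(neighbour, vertex)
                low[vertex] = min(low[vertex], low[neighbour])
                if low[neighbour] >= index[vertex]:
                    # The subtree cannot reach above 'vertex': close a component.
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.update(edge)
                        if edge == (vertex, neighbour):
                            break
                    components.append(component)
            elif index[neighbour] < index[vertex]:
                # Back edge to an ancestor.
                edge_stack.append((vertex, neighbour))
                low[vertex] = min(low[vertex], index[neighbour])

    for vertex in graph:
        if vertex not in index:
            visit(vertex, None)
            if not graph[vertex]:
                components.append({vertex})  # side chain with no interactions
    return components


# Invented contact graph: two triangles joined by the edge C-D, plus a loner G.
contacts = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
    "G": set(),
}
for component in biconnected_components(contacts):
    print(sorted(component))
```

In this toy graph C and D are the vertices shared between components. Fixing the rotamer of such a shared residue decouples the components on either side of it, which is what allows each small component to be minimized separately.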
Run SCWRL:
$> Scwrl4 \
-i 1ubiA_bb.pdb \
-o 1ubi_scwrl.pdb \
> 1ubi.scwrl.log
Here -i gives the input file and -o the output file.
Question 6: Find the RMSd between the SCWRL-generated side chains and the native structure. What is
its value?
Question 7: Fit the native structure, the lowest-energy structure of the naive Rosetta run (the first one)
and the SCWRL-generated structure on top of one another. What are the differences between the
structures? Are these differences located at the core or at the surface? Are they pinpointed to specific
residues? Are they important?
Quality assessment - different measures of success prediction
The accepted method for assessing the success of a side-chain modeling program is to compare the χ1
(and sometimes also χ2) values of the predicted side chains with those of the native side chains. By
convention, a position whose angle deviates by up to 40 degrees is considered successfully predicted.
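The criterion can be written down in a few lines. The sketch below is not the course script calc_percent_recapitulation.py (whose implementation is not shown here). It only illustrates the 40-degree rule, taking care that angles are periodic, so that 175 and -178 degrees are 7 degrees apart rather than 353. The function names and the χ1 values are made up.

```python
def angular_deviation(angle_a, angle_b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)


def percent_recapitulated(native_chi, model_chi, threshold=40.0):
    """Percentage of positions whose chi angle is within the threshold."""
    if len(native_chi) != len(model_chi) or not native_chi:
        raise ValueError("angle lists must be non-empty and equal in length")
    hits = sum(
        1
        for native_angle, model_angle in zip(native_chi, model_chi)
        if angular_deviation(native_angle, model_angle) <= threshold
    )
    return 100.0 * hits / len(native_chi)


# Made-up chi1 values (degrees) for four positions.
native_chi1 = [-60.0, 175.0, 65.0, -170.0]
model_chi1 = [-70.0, -178.0, 180.0, 60.0]
print(percent_recapitulated(native_chi1, model_chi1))
```

Note that the measure is binary per position: a deviation of 41 degrees and one of 180 degrees both count as failures.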
We will now use this measure to assess side-chain prediction for another structure, solved
at a resolution of 0.66 Å. First, let's get a clean copy of this structure, stripped of its side chains:
$> prody fetch 1us0
$> prody select "protein and chain A" 1us0.pdb --output "1us0A"
$> prody select "backbone and chain A" 1us0.pdb --output "1us0A_bb"
Next, run the fixbb command to generate 15 models of the structure with its predicted side chains. For
this, run the same command as before, only provide 1us0A_bb as the input, and give the score file and
log file new names.
Now run these commands to find the percentage of recapitulated residues in each model:
$> python calc_percent_recapitulation.py 1us0A.pdb 1us0A_bb_00*pdb > recapitulation.txt
$> grep "recapitulated" recapitulation.txt
Question 8: What is the range of values you get? Do you think this indicates success in side chain
recapitulation?
Next, we'll inspect the structure with the highest percent of recapitulated side chains. Open the structure in
PyMOL (if several structures had the same percent, choose one of them). In order to find the positions
that were not well modeled according to this criterion, in your terminal, run the following command:
$> grep deviation recapitulation.txt | grep <my_model_name>
Question 9: Select several positions from this list and compare them to several positions that were
recapitulated. From visual inspection, would you say all positions that were not recapitulated are not well
modeled? And positions that were recapitulated were modeled well? Attach a session highlighting the
positions you've inspected.
Because this structure was solved at such high resolution, the experimentalists were able to assign
two different side chain conformations to some flexible residues (note that when Rosetta reads in the
structure, it keeps only the first of these conformations). Open 1us0.pdb in
PyMOL and inspect position K100.
Question 10: a. Measure the χ1 of the two K100 conformations. What is the delta value you get and how
does it compare to the threshold value (40 degrees)? b. How similar do you think the two conformations
are and is this similarity reflected in the value you got?
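If you want to check a χ1 value numerically rather than with PyMOL's measurement tools, a dihedral angle can be computed directly from four atom positions. For lysine, χ1 is defined by the atoms N, CA, CB and CG. The sketch below uses the standard projection formula. The function name dihedral is a placeholder, and the coordinates are toy points, not taken from 1us0.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (e.g. N, CA, CB, CG)."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


# Toy geometry: the fourth point is rotated a quarter turn about the z axis.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Applying the function to the N, CA, CB and CG coordinates of each K100 conformation gives two χ1 values whose difference, wrapped into the 0 to 180 degree range, can be compared with the threshold.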