CSCI 474
Lab 4a : inferring the effects of mutations
Spring 2017
Computer Science
In this lab you will explore an online tool, SDM, for inferring the
effects of a single amino acid mutation on the structure and stability
of a protein. You’ll rely on tools that you’ve used in the past such as
NCBI BLAST and Clustal Omega, and you’ll use an on-line database of
wet lab experimental results about mutations to validate the
prediction made by SDM.
Compose answers to the bold questions throughout this lab and submit your write-up via Canvas.
I. Scenario
Your student, who has been collecting organisms from a local pond, has brought you two FASTA
sequences. One sequence, prot_WT_FASTA, is what he claims is the WT (Wild Type, non-mutated) form
of a protein that 99% of the organisms in his sample possess.
The other, prot_Mutant_FASTA, is the sequence of a protein that he claims is found in 1% of the
organisms in the samples that he has collected. This is the mutant protein.
Your task is to determine how the two protein sequences differ, and how that difference between the
WT and mutant affects the stability of the mutant protein relative to the wild type version.
I. Sequence data
Both sequences, prot_WT_FASTA.txt and prot_Mutant_FASTA.txt, are provided to you on the course
website.
1. Download these two sequences and save them to a local directory.
Q1 : At what position (which amino acid number) in the polypeptide chain do the two protein
sequences differ? You can perform a visual inspection, or – better yet – use Clustal Omega to perform
a multiple sequence alignment.
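If you prefer to check the comparison programmatically, the following is a minimal Python sketch. It assumes the mutation is a single substitution, so both sequences have the same length and no alignment is needed; if the lengths differ, use Clustal Omega instead. The function names are hypothetical and are not part of any course-provided code.

```python
from pathlib import Path


def read_fasta_sequence(text: str) -> str:
    """Return the residues of the first record in FASTA-formatted text."""
    residues = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; keep only the first
            seen_header = True
            continue
        residues.append(line)
    return "".join(residues).upper()


def differing_positions(wt: str, mutant: str) -> list:
    """List (1-based position, WT residue, mutant residue) for each mismatch.

    Assumes substitutions only, so both sequences must have equal length.
    """
    if len(wt) != len(mutant):
        raise ValueError("lengths differ; align the sequences first")
    return [
        (position, wt_res, mut_res)
        for position, (wt_res, mut_res) in enumerate(zip(wt, mutant), start=1)
        if wt_res != mut_res
    ]


def compare_fasta_files(wt_path: str, mutant_path: str) -> list:
    """Read two FASTA files from disk and report where they differ."""
    wt = read_fasta_sequence(Path(wt_path).read_text())
    mutant = read_fasta_sequence(Path(mutant_path).read_text())
    return differing_positions(wt, mutant)
```

Calling compare_fasta_files("prot_WT_FASTA.txt", "prot_Mutant_FASTA.txt") from the directory where you saved the files returns one tuple per differing position.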
Q2 : At the amino acid location where the WT sequence differs from the Mutant sequence, what is the
amino acid in the WT? Give the full name. How many atoms does that amino acid have?
Q3 : At the amino acid where the WT sequence differs from the Mutant sequence, what is the amino
acid in the mutant? Give the full name. How many atoms does that amino acid have?
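Atom counts for Q2 and Q3 can be tallied from a molecular formula. The sketch below is only an illustration: it counts every atom (hydrogens included) in the formula of the free amino acid, and glycine and alanine are used purely as examples. Whether you should count the free amino acid or the residue as it appears inside a polypeptide chain (which has lost one water molecule, i.e. three atoms) is a convention to confirm for this lab.

```python
import re


def count_atoms(formula: str) -> int:
    """Total atom count in a simple molecular formula such as 'C2H5NO2'.

    Handles element symbols followed by an optional count; parentheses
    and charges are not supported.
    """
    total = 0
    for _element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += int(count) if count else 1
    return total


# Formulas of two free (unbound) amino acids, used here only as examples.
FREE_AMINO_ACID_FORMULAS = {
    "glycine": "C2H5NO2",
    "alanine": "C3H7NO2",
}

for name, formula in FREE_AMINO_ACID_FORMULAS.items():
    print(f"{name}: {count_atoms(formula)} atoms")
```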
WWU CSCI 474, Lab 4a : effects of mutations
Filip Jagodzinski
Q4 : Based on the different atom counts in the WT and Mutant amino acids that you identified in Q2
and Q3, formulate a hypothesis about the effect that the mutation (from WT to Mutant) has on the
structure of the protein. A few sentences should suffice and your answer does not need to be
technical. However, do not say “The mutation will disrupt the protein” or “The mutation will affect the
protein.” Instead, say HOW the mutation might affect the protein.
II. Structure Data
Several tools are available, including some that rely on Machine Learning (ML) techniques, that
(try to) infer the effect of a mutation using only the sequence of amino acids in a protein. Such tools
have varying degrees of accuracy because the effect of a mutation depends on how the
amino acid substitution/deletion/insertion affects the 3D structure of the protein, and not just the
sequence.
To best assess how the mutation in your WT sequence affects the protein, you need to compare
the 3D structures of the WT protein and the mutant. Unfortunately, your student only provided you with
FASTA sequences, and using X-ray crystallography to determine and then analyze both structures could
be a year-long process. However, what if the WT sequence already exists in GenBank and/or the PDB?
2. Perform an NCBI BLAST using the WT sequence as the query sequence. Because you are dealing with
proteins instead of DNA sequences, use protein blast. Run BLAST with all default parameters.
Q5 : What is the Max Score and Total Score (from the Descriptions panel) of the top-most BLAST
match for the WT sequence? What is the Score (bits) and Identities value (from the Alignments
panel)? What is the superfamily of your sequence according to the BLAST results?
You should notice that BLAST found a “synthetic coding” sequence with which the WT
sequence aligns best. Your answers to Q5 should be for the synthetic match.
However, we want to use a non-synthetic sequence.
3. Inspect the Alignments section for the top-most match found by BLAST (the synthetic coding result).
Click on See 13 more title(s) to see identical matches.
Q6 : What is the Protein Data Bank structure ID (4 alphanumeric characters) of the first non-synthetic
match to your query when you expand See 13 more title(s)?
You now know the protein type (superfamily), and the entry in the PDB that matches best with your WT
sequence.
III. Site Directed Mutator (SDM)
SDM is a mutation analysis tool developed by the Crystallography and Bioinformatics Group at
Cambridge. The web-server is at the following URL : http://mordred.bioc.cam.ac.uk/~sdm/sdm.php
SDM relies on several metrics and scores (propensity tables, matrices, etc.) to infer the effect of a
mutation. At the core of SDM’s algorithm is the use of free energy of unfolding (ΔG) measurements
derived from experiments performed on physical proteins. Such experiments compare the energy that is
needed to unfold a WT protein to the energy that is needed to unfold a mutant protein. That difference
(ΔΔG) is a real, physical measure (obtained in the wet lab) of the effect of a mutation. Although the
details of ΔΔG and the Schellman equation are beyond the scope of this course, the key facts to remember are
the following:
• A negative ΔΔG value implies that the Mutant is LESS stable than the WT
• A positive ΔΔG value implies that the Mutant is MORE stable than the WT
If you are interested, you can read more about the technical details of SDM by following the Theory link in
the upper-left-hand corner of the SDM website.
4. Click on the SDM link in the upper-left-hand corner of the SDM website.
5. Use the input fields in Predicting the effect of mutations on protein stability to upload the PDB file
with the ID that you identified in Q6. You’ll need to retrieve that structure file from the PDB.
6. In order to run SDM, you must specify which amino acid substitution/mutation you want to
investigate. Into the box labeled 1 letter code of PDB chain, input the chain letter to specify the chain of
the protein structure where you want to explore the effect of a mutation.
7. Use your answer to Q3 to provide the 1 letter code of the mutant residue.
8. In the box labeled Residue position, input the amino acid position (your answer for Q1).
9. Click on the SUBMIT to SDM button.
Give the web server a few moments to perform the analysis. Once the results page appears, look over
the text on the left-hand side.
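Before filling in the SDM form in steps 5 through 8, it can help to check which chain letters the structure file contains and which residue sits at the position you plan to mutate. The following Python sketch reads the fixed-column ATOM records of a PDB-format file. It rests on two assumptions: the download address follows the files.rcsb.org/download/ID.pdb pattern used by the RCSB site, and the residue numbering in the PDB file matches the numbering of your FASTA sequence, which is not guaranteed. All function names are hypothetical.

```python
def pdb_download_url(pdb_id: str) -> str:
    """Build the (assumed) RCSB download address for a 4-character PDB ID."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def chain_ids(pdb_text: str) -> list:
    """Return the chain letters found in ATOM records, in order of appearance."""
    chains = []
    for line in pdb_text.splitlines():
        # In PDB format the chain identifier is column 22 (index 21).
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
    return chains


def residue_name(pdb_text: str, chain: str, position: int):
    """Return the 3-letter residue name at a position on a chain, or None."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
            # Residue number: columns 23-26; residue name: columns 18-20.
            if int(line[22:26]) == position:
                return line[17:20].strip()
    return None
```

If residue_name returns something other than the WT amino acid from Q2, the numbering of the structure file probably differs from that of your sequence, and the position entered into SDM needs to be adjusted accordingly.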
Q7 : What is the pseudo DELTA DELTA G value (the predicted effect of the mutation) of the mutation?
Q8 : Based on the predicted DELTA DELTA G value, performing the mutation will have what sort of
effect on the protein structure?
IV. Validation, ProTherm
Being a good scientist, you want to validate the output of SDM. Luckily, you are aware of a database that
catalogues ΔΔG values for mutations that have been performed in physical proteins.
10. Proceed to the ProTherm database at the following URL: http://www.abren.net/protherm/
11. Click on the Advanced Search button to proceed to the search parameters page.
Your goal is to search the database for experimental results that have been performed on the WT
protein. If you are in luck, somebody in the past has created a mutant structure that is identical to the one that
you have been given by your student, and a (real) ΔΔG value for that mutation has been deposited into
the ProTherm database.
12. In the search input page that appears, input into PDB Code the 4-character alphanumeric ID of the protein that
you identified with BLAST that is the best non-synthetic match for the WT that you’ve been given.
13. Into the ddG/ddG H2O input boxes, specify ddG -10 to 10. This will search the ProTherm database
for entries that have non-null ddG values.
14. ProTherm contains much more information than you need. Set the search parameters to the
following
15. Initiate the ProTherm search by clicking on the Start button.
16. Locate in the search results the mutation that is identical to the one in the Mutant
sequence given to you by your student.
Q9 : According to ProTherm, what is the ddG value for the mutation performed in the physical protein
that is identical to the mutation in your mutant sequence?
Q10 : Based on your answers to Q9 and Q8, how accurate is SDM?
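One simple way to frame your answer to Q10 is to check whether the predicted and experimental values agree in sign (both stabilizing or both destabilizing) and how far apart they are. The Python sketch below does this. It assumes both values are in the same units and follow the same sign convention, which you should confirm on the SDM and ProTherm pages before comparing them. The function name and the numbers in the usage line are made up for illustration and are not results for this lab.

```python
def compare_ddg(predicted: float, experimental: float) -> dict:
    """Compare a predicted ΔΔG with an experimental one.

    Assumes both values use the same units and the same sign convention.
    """
    def sign(value: float) -> int:
        return (value > 0) - (value < 0)

    return {
        "sign_agrees": sign(predicted) == sign(experimental),
        "absolute_error": abs(predicted - experimental),
    }


# Illustrative numbers only.
print(compare_ddg(-1.5, -2.0))
```

A single mutation says little about overall accuracy, so treat the comparison as one data point and not as a general verdict on SDM.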
V. Submission and Rubric
Submit your answers to the 10 questions to Canvas.
Component of Lab              Points
Answers to questions 1-10     20
Total                         20 points