CSCI 474 Lab 4a : inferring the effects of mutations
Spring 2017
Computer Science

In this lab you will explore an online tool, SDM, for inferring the effects of a single amino acid mutation on the structure and stability of a protein. You’ll rely on tools that you’ve used in the past, such as NCBI BLAST and Clustal Omega, and you’ll use an online database of wet lab experimental results about mutations to validate the prediction made by SDM. Compose answers to the questions (Q1 through Q10) that appear throughout this lab and submit your write-up via Canvas.

I. Scenario

A student of yours who has been collecting organisms from a local pond has brought you two FASTA sequences. One sequence, prot_WT_FASTA, is what he claims is the WT (Wild Type, non-mutated) form of a protein that 99% of the organisms in his sample possess. The other, prot_Mutant_FASTA, is the sequence of a protein that he claims is found in 1% of the organisms in the samples that he has collected; this is the mutant protein. Your task is to determine how the two protein sequences differ, and how that difference affects the stability of the mutant protein relative to the wild type version.

I. Sequence data

Both sequences, prot_WT_FASTA.txt and prot_Mutant_FASTA.txt, are provided to you on the course website.

1. Download these two sequences and save them to a local directory.

Q1 : At what position (which amino acid number) in the polypeptide chain do the two protein sequences differ? You can perform a visual inspection or, better yet, use Clustal Omega to perform a multiple sequence alignment.

Q2 : At the amino acid location where the WT sequence differs from the Mutant sequence, what is the amino acid in the WT? Give the full name. How many atoms does that amino acid have?

Q3 : At the amino acid location where the WT sequence differs from the Mutant sequence, what is the amino acid in the Mutant? Give the full name. How many atoms does that amino acid have?
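The comparison asked for in Q1 can also be scripted as a cross-check on the visual inspection or the Clustal Omega alignment. The following is a minimal sketch and not part of the original lab: it assumes each file holds a single FASTA record and that the two sequences have the same length (a substitution, not an insertion or deletion). The function names read_fasta_sequence, find_differences, and compare_files are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def read_fasta_sequence(text: str) -> str:
    """Return the residues of the first record in FASTA-formatted text."""
    residues = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # ignore any records after the first one
            seen_header = True
            continue
        residues.append(line)
    return "".join(residues).upper()


def find_differences(wt: str, mutant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, WT residue, mutant residue) for each mismatch."""
    if len(wt) != len(mutant):
        # Assumption: a point substitution. Otherwise a real alignment is needed.
        raise ValueError("sequences differ in length; align them first")
    return [
        (position, wt_res, mut_res)
        for position, (wt_res, mut_res) in enumerate(zip(wt, mutant), start=1)
        if wt_res != mut_res
    ]


def compare_files(wt_path: str, mutant_path: str) -> List[Tuple[int, str, str]]:
    """Read two single-record FASTA files and report where they differ."""
    wt = read_fasta_sequence(Path(wt_path).read_text())
    mutant = read_fasta_sequence(Path(mutant_path).read_text())
    return find_differences(wt, mutant)
```

Calling compare_files("prot_WT_FASTA.txt", "prot_Mutant_FASTA.txt") from the directory where the files were saved should return a list of mismatches with one-letter residue codes. If the function raises a ValueError, the sequences are not the same length and a proper alignment is the better tool.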
WWU CSCI 474, Lab 4a : effects of mutations
Filip Jagodzinski

Q4 : Based on the different atom counts in the WT and Mutant amino acids that you identified in Q2 and Q3, formulate a hypothesis about the effect that the mutation (from WT to Mutant) has on the structure of the protein. A few sentences should suffice, and your answer does not need to be technical. However, do not say “The mutation will disrupt the protein” or “The mutation will affect the protein.” Instead, say HOW the mutation might affect the protein.

II. Structure Data

Several tools are available, including some that rely on Machine Learning (ML) techniques, that (try to) infer the effect of a mutation using only the sequence of amino acids in a protein. Such tools have varying degrees of accuracy, because the effect of a mutation depends on how the amino acid substitution/deletion/insertion affects the 3D structure of the protein, and not just the sequence.

To best assess how the mutation in your WT sequence affects the protein, you need to compare the 3D structures of the WT protein and the mutant. Unfortunately, your student only provided you with FASTA sequences, and using X-ray crystallography to solve and then analyze both structures could be a year-long process. However, what if the WT sequence already exists in GenBank and/or the PDB?

2. Perform an NCBI BLAST search using the WT sequence as the query sequence. Because you are dealing with proteins instead of DNA sequences, use protein BLAST (blastp). Run BLAST with all default parameters.

Q5 : What are the Max Score and Total Score (from the Descriptions panel) of the top-most BLAST match for the WT sequence? What are the Score (bits) and Identities values (from the Alignments panel)? What is the superfamily of your sequence according to the BLAST results?

You should notice that the best match BLAST found for the WT sequence is a “synthetic coding” sequence. Your answers to Q5 should be for the synthetic match.
However, we want to use a non-synthetic sequence.

3. Inspect the Alignments section for the top-most match found by BLAST (the synthetic coding result). Click on See 13 more title(s) to see identical matches.

Q6 : What is the Protein Data Bank structure ID (4 alphanumeric characters) of the first non-synthetic match to your query when you expand See 13 more title(s)?

You now know the protein type (superfamily) and the entry in the PDB that best matches your WT sequence.

III. Site Directed Mutator (SDM)

SDM is a mutation analysis tool developed by the Crystallography and Bioinformatics Group at Cambridge. The web server is at the following URL : http://mordred.bioc.cam.ac.uk/~sdm/sdm.php

SDM relies on several metrics and scores (propensity tables, matrices, etc.) to infer the effect of a mutation. At the core of SDM’s algorithm is the use of measurements of the change in free energy of unfolding (ΔΔG) derived from experiments performed on physical proteins. Such experiments compare the energy that is needed to unfold a WT protein to the energy that is needed to unfold a mutant protein. That difference is a real, physical measure (obtained in the wet lab) of the effect of a mutation. Although the details of ΔΔG and the Schellman equation are beyond the scope of this course, the key facts to remember are the following:

A negative ΔΔG value implies that the Mutant is LESS stable than the WT
A positive ΔΔG value implies that the Mutant is MORE stable than the WT

If interested, you can read more about the technical details of SDM by following the Theory link in the upper left-hand corner of the SDM website.

4. Click on the SDM link in the upper left-hand corner of the SDM website.
5. Use the input fields in Predicting the effect of mutations on protein stability to upload the PDB file with the ID that you identified in Q6. You’ll need to retrieve that structure file from the PDB.
6.
In order to run SDM, you must specify which amino acid substitution/mutation you want to investigate. In the box labeled 1 letter code of PDB chain, input the chain letter to specify the chain of the protein structure where you want to explore the effect of a mutation.
7. Use your answer to Q3 to provide the 1 letter code of the mutant residue.
8. In the box labeled Residue position, input the amino acid position (your answer for Q1).
9. Click on the SUBMIT to SDM button.

Give the web server a few moments to perform the analysis. Once the results page appears, look over the text on the left-hand side.

Q7 : What is the pseudo ΔΔG value (the predicted effect of the mutation) of the mutation?

Q8 : Based on the predicted ΔΔG value, what sort of effect will the mutation have on the protein structure?

IV. Validation, ProTherm

Being a good scientist, you want to validate the output of SDM. Luckily, you are aware of a database that catalogues ΔΔG values for mutations that have been performed in physical proteins.

10. Proceed to the ProTherm database at the following URL: http://www.abren.net/protherm/
11. Click on the Advanced Search button to proceed to the search parameters page.

Your goal is to search the database for experimental results that have been obtained for the WT protein. If you are in luck, somebody in the past has created a mutant structure that is identical to the one that you have been given by your student, and a (real) ΔΔG value for that mutation has been deposited into the ProTherm database.

12. In the search input page that appears, enter into PDB Code the 4-character alphanumeric ID of the protein that you identified with BLAST as the best non-synthetic match for the WT that you’ve been given.
13. In the ddG/ddG H2O input boxes, specify a ddG range of -10 to 10. This will search the ProTherm database for entries that have non-null ddG values.
14.
ProTherm contains much more information than you need. Set the search parameters to the following:
15. Initiate the ProTherm search by clicking on the Start button.
16. Locate in the search results the mutation that is identical to the one in the Mutant sequence given to you by your student.

Q9 : According to ProTherm, what is the ddG value for the mutation performed in the physical protein that is identical to the mutation in your mutant sequence?

Q10 : Based on your answers to Q9 and Q8, how accurate is SDM?

V. Submission and Rubric

Submit your answers to the 10 questions to Canvas.

Component of Lab              Points
Answers to questions 1-10     20
Total                         20 points
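
For Q10, the sign convention from Section III (a negative value means the Mutant is less stable than the WT, a positive value means it is more stable) can be turned into a small agreement check between the SDM prediction and the ProTherm measurement. The following is a minimal sketch and not part of the original lab: the function names are hypothetical, and it assumes both values are expressed in the same units.

```python
def stability_effect(ddg: float) -> str:
    """Classify a ddG value using the sign convention stated in this lab."""
    if ddg < 0:
        return "less stable than WT"
    if ddg > 0:
        return "more stable than WT"
    return "no change in stability"


def compare_prediction(predicted_ddg: float, experimental_ddg: float) -> dict:
    """Summarize how well a predicted ddG agrees with a measured ddG."""
    predicted_effect = stability_effect(predicted_ddg)
    experimental_effect = stability_effect(experimental_ddg)
    return {
        "predicted_effect": predicted_effect,
        "experimental_effect": experimental_effect,
        # True when both values point in the same direction (or are both zero).
        "same_direction": predicted_effect == experimental_effect,
        # Magnitude of the disagreement, in the shared units of the inputs.
        "absolute_error": abs(predicted_ddg - experimental_ddg),
    }
```

Pass your answer to Q7 as predicted_ddg and your answer to Q9 as experimental_ddg. Agreement in direction and a small absolute error are two separate things, and a discussion of accuracy can reasonably address both.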