BCH441H – Bioinformatics
Exam solutions 2002, Part C (Boris Steipe)

1. Amino acids. Of all data abstractions in bioinformatics, the one-letter amino acid code is the most important one. The sketch below shows the bonding topologies of amino acids in a polypeptide. Only bonds between non-hydrogen atoms are shown, and single and double bonds are not distinguished. The amino and carboxy termini are identified.

(3) Write the sequence of this polypeptide into your exam booklet in one-letter code. Where the sidechains are ambiguous, write all possible one-letter codes for the residue in square brackets. Annotate residues that are > 80 % charged at physiological pH with a "+" or "-". Example: AB+CD[EFG]HIJ[KL]M-N-OPQ

[Figure: bonding-topology sketch of the polypeptide; the termini are labelled O– and NH3+.]

G [TV] A [D-NL] [CS] I M [E-Q] K+ R+ P H+ F Y W

(Grading – Correct: 3 points; 1 mistake or omission: 2 points; 2 mistakes or omissions: 1 point; 3 or more: 0 points. Writing H as an uncharged residue: 1/2 point deducted.)

2. Stereo vision. Proteins are three-dimensional structures, and stereo images are an essential aid to understanding the spatial relationships of their components. The stereo figure below shows a trace of connected Cα atoms of a protein domain (the VH domain of the anti-fluorescein antibody 4-4-20, 4FAB.PDB) and a wireframe representation of all its tryptophan sidechains.

(3) Which tryptophan is a conserved element of the hydrophobic core of this domain?

Tryptophan B. (Residues A, D and C are obviously solvent accessible, on the surface of the protein.)

(Grading – Right: 3 points; Wrong: 0 points)

C: Concepts. Read the following abstract:

Structure of TCTP reveals unexpected relationship with guanine nucleotide-free chaperones
Paul Thaw, Nicola J. Baxter, Andrea M. Hounslow, Clive Price, Jonathan P. Waltho and C.
Jeremy Craven: Nature Struct Biol 8: 701–704 (2001)

The translationally controlled tumor-associated proteins (TCTPs) are a highly conserved and abundantly expressed family of eukaryotic proteins that are implicated in both cell growth and the human acute allergic response but whose intracellular biochemical function has remained elusive. We report here the solution structure of the TCTP from Schizosaccharomyces pombe, which, on the basis of sequence homology, defines the fold of the entire family. We show that TCTPs form a structural superfamily with the Mss4/Dss4 family of proteins, which bind to the GDP/GTP free form of Rab proteins (members of the Ras superfamily) and have been termed guanine nucleotide-free chaperones (GFCs). Mss4 also acts as a relatively inefficient guanine nucleotide exchange factor (GEF). We further show that the Rab protein binding site on Mss4 coincides with the region of highest sequence conservation in the TCTP family. This is the first link to any other family of proteins that has been established for the TCTP family and suggests the presence of a GFC/GEF at extremely high abundance in eukaryotic cells.

This abstract reports several pieces of data and mentions several pieces of prior information.

(12) 3. Summarize the essential steps of how these entities were related to each other in this study. You may use any representation that is reasonable, such as pseudocode, a flowchart, or another type of sketch. Note that you are not required to understand the biochemical processes that are described here, nor are you required to comment on the cell-biological implications. Hint: one of the key steps has been underlined by me - you must understand how such a conclusion can be drawn in the situation that is described. You are to summarize the flow of data: the entities that are being referred to, and the experimental and computational procedures.
(Grading – Marks were given for the presence of at least the following entities and the correct procedures relating them. One mark is given for each correct entity or procedural relationship, extra marks for insightful comments, maximum 12 marks. Marks were deducted for answers that contained glaring errors.)

Entities reported and implied in the abstract:
• TCTP Family (multiple alignment)
• S. pombe TCTP NMR structure (solution structure)
• Mss4 structure
• S. pombe TCTP / Mss4 structural alignment
• Annotation (biochemical ?) of Mss4 Rab protein binding site
• Cluster of conserved positions in TCTP family

Other entities:
• Sequence database
• Structure database

Procedures:
• Sequence database search / significance
• Multiple sequence alignment and definition of conserved residues
• Structure database search / significance
• Visualization / Mapping of information on structure or other method of demonstrating coincidence of Rab binding site and conserved positions

A listing of the above entities was not sufficient; correct answers had to assemble a process from these elements.

Example process: This shows one possible way to sketch the process using the entities above (informal SADT, one of my personal favorites). Many other possibilities exist. This question is marked on structuring and logic, not on form.

[Figure: informal SADT diagram. Data flow as recoverable from the diagram labels, with databases and example tools in parentheses:]
• S. pombe TCTP protein → NMR structure determination → S. pombe TCTP NMR structure
• S. pombe TCTP NMR structure → Structure database search / significance (PDB; VAST, DALI, CE ...) → Mss4 structure
• S. pombe TCTP NMR structure, Mss4 structure → Structural superposition (LOCK ...) → TCTP / Mss4 structural alignment
• S. pombe TCTP sequence → Sequence database search / significance (Genbank; PSI-BLAST, FASTA ...) → TCTP Family
• TCTP Family → Multiple sequence alignment and definition of conserved residues (CLUSTAL W ...) → Cluster of conserved positions in TCTP family
• TCTP / Mss4 structural alignment, Annotation (biochemical ?) of Mss4 Rab protein binding site, Cluster of conserved positions in TCTP family → Visualization / Mapping on structure (Rasmol, O, MolMol ...) → Sites coincide! → Interpretation, publication, party

(12) 4. Describe the two most important implicit assumptions – in your opinion – that are being made in the above process. State each assumption, the conditions for its validity, and its meaning for the interpretation of the results.

Assumptions need to be made about many facts in this process; assumptions in general might be categorized into: correctness, completeness, significance and relevance.

Correctness and completeness are obvious. Usually the impact of rare errors and omissions across database searches or high-throughput projects is compensated by the correct data. Thus these assumptions are less important here.

Whether a result is significant should not be assumed but tested. Usually this involves contrasting an observation with a random model and asking how far the observation deviates from one that could be expected as a chance occurrence.

Assumptions about relevance are problem-domain specific. They are usually the ones that you need to be most worried about, because they can't be removed simply by a clear, mechanistic procedure. You need to understand the question to determine whether an answer is relevant to it.

(Grading – 6 marks for each of the two assumptions: two marks if the description of an assumption that is actually made in the abstract is correct and complete; one mark if it is one of the important assumptions (the top five of the examples below); one mark for stating the validity; one mark for explaining how this could be tested; one mark for discussing the consequences when the assumption does not hold. Extra marks for insightful comments were possible, maximum 6 marks. If your assumption is not used at all in the abstract but otherwise well explained, you can get a maximum of two marks.
Followup errors were not multiply penalized if the reasoning was otherwise correct. E.g. if you had assumed that a "solution structure" – an experimentally determined NMR structure, the term is used in contrast to a crystal structure – is a Swiss-Model structure, I deducted one point, but marked the rest of the answer as if it had been a homology model after all.)

Here are five key assumptions, listed in order of decreasing importance; all of these examples would be full-mark answers.

1. S. pombe TCTP and Mss4 are homologues, even if they don't have significant sequence similarity, since they have similar structures.
Insignificant sequence similarity seems to be the case for S. pombe TCTP and Mss4; otherwise Mss4 would have been reported to be a member of the TCTP family. Since homology is a reasonable explanation for striking structural similarity in many instances of distantly related proteins, and these share structure, function, active sites, even catalytic mechanism, this empirical fact can generate useful hypotheses about how the function of one protein might be inferred from its relatedness to another. The assumption is difficult to test; testing it involves comparing how many aspects other than structure and sequence are shared in structurally equivalent positions. In the absence of homology, structural similarity may be due to chance (need to test significance) or due to functional requirements. Functional requirements could be tested if a mechanistic model for the function makes predictions for requiring specific residues. If the two structurally similar proteins are not related through common ancestry, the coincidence of a functional site in one protein (cluster of conserved residues in TCTP) and the other (reported Rab binding site in Mss4) would be meaningless with respect to a possible similar function.

2. Homologous proteins have similar structure.
This appears to be always true, even though it is an empirical observation. The assumption cannot be tested, except by determining the structures of both homologues, but it has never been found to be contradicted. If this assumption were invalid, the S. pombe TCTP structure might not be a valid model for proteins within the TCTP family; residues that are aligned in the family's multiple sequence alignment might be in dissimilar environments regardless of the alignment and have significantly dissimilar roles for fold and function.

3. The structural alignment of S. pombe TCTP and Mss4 is significant.
This has to be tested against a random chance model; usually some Z-score criterion is applied. The alignment will be more significant the lower the RMSD and the longer the stretch of structurally alignable residues. If the alignment is due to random chance, all conclusions about the implications of the alignment are meaningless.

4. The cluster of conserved residues in S. pombe TCTP is a functional site.
This assumption could be tested biochemically, by mutagenesis, but unfortunately only once the function is known. It is a weak assumption, because residues might be conserved for structural / folding / stability reasons and not for functional reasons. If the residues are conserved for structural reasons, then the coincidence with an annotated Rab binding site in Mss4 would be meaningless.

5. The annotation of the Mss4 functional site is correct.
It may be difficult to pinpoint a biochemically determined binding site to a specific set of amino acids. For example, loss of binding after mutagenesis can also be due to partial denaturation of the protein. Several orthogonal biochemical experiments (or better: the determination of the structure of the complex) may be required to test the validity of this assumption.
If the annotation is wrong, the coincidence of a functional site in one protein (cluster of conserved residues in TCTP) and the other (reported Rab binding site in Mss4) would be meaningless with respect to a possible similar function.

Here are three less important assumptions; these would be five-point answers.

6. The structural alignment is correct / relevant.
"Correct" in this sense means that the mathematically optimal alignment which the algorithm calculates actually creates pairwise associations between those residues that have similar function. This is impossible to test in the absence of additional information. If the structural alignment aligns non-related residues, the coincidence between a functional site in one protein (cluster of conserved residues in TCTP) and the other protein (reported Rab binding site in Mss4) would be an artefact of the alignment.

7. The sequences in the TCTP family are homologous.
Presumably they have been identified as sequences that are highly similar, more similar than could be reasonably expected from random chance. This "reasonable expectation" can be quantified as an expectation value, if a statistical model is available. If the sequences are similar but in fact not homologous, all conclusions with respect to similar function, similar structure, similar active sites etc. lose their basis.

8. The TCTP family sequence alignment is correct.
In this sense "correct" means that residues are aligned that are in fact equivalent in terms of their position in the ancestral sequence, or their function in the protein. The validity of this assumption cannot be tested, but in those cases in which structural alignments can be made, one can at least demonstrate that the residues are in spatially equivalent positions. If the alignment is wrong (i.e. aligning similar residues from different locations of two structures), the conservation patterns may be meaningless.
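Assumptions 7 and 8 both feed into the "definition of conserved residues" step, so it may help to see how a conservation pattern could be derived from an alignment. The following is a minimal sketch, not the method used in the paper: it scores each alignment column by Shannon entropy and calls low-entropy columns conserved. The function names, the 0.5-bit threshold and the toy alignment are all illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column, ignoring gaps ('-')."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return math.inf  # an all-gap column is never counted as conserved
    total = len(residues)
    counts = Counter(residues)
    # Sum of p * log2(1/p); equals 0.0 when a single residue fills the column.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conserved_columns(alignment: list[str], max_entropy: float = 0.5) -> list[int]:
    """Return 0-based indices of columns whose entropy is at most max_entropy.

    The threshold is an arbitrary illustrative choice (assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = ("".join(seq[i] for seq in alignment) for i in range(length))
    return [i for i, col in enumerate(columns) if column_entropy(col) <= max_entropy]


# Hypothetical toy alignment, not real TCTP sequences.
toy_alignment = ["MKVLA", "MKILA", "MRVLA", "MKVLG"]
print(conserved_columns(toy_alignment))  # [0, 3]
```

If the alignment itself is wrong (assumption 8), the columns mix non-equivalent residues and the entropy values computed this way carry no meaning, whatever threshold is chosen.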
Other answers were possible.
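The significance test named in assumption 3, contrasting an observation with a random model, can be illustrated with a Z-score. This is a minimal sketch under stated assumptions: the function name and all scores are hypothetical, and structure comparison programs such as DALI, VAST or CE compute their own statistics, which this code does not reproduce.

```python
import statistics


def z_score(observed: float, background: list[float]) -> float:
    """Number of standard deviations by which `observed` exceeds the background mean.

    `background` holds scores obtained under a random model, for example
    alignment scores against unrelated or shuffled structures (assumption).
    """
    if len(background) < 2:
        raise ValueError("need at least two background scores")
    mean = statistics.mean(background)
    sd = statistics.stdev(background)  # sample standard deviation
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (observed - mean) / sd


# Hypothetical numbers, for illustration only.
background_scores = [10.0, 12.0, 11.0, 9.0, 13.0]
print(round(z_score(19.0, background_scores), 2))  # 5.06
```

A high Z-score only says the observation is unlikely under the chosen random model; whether that model is appropriate for the question is a relevance assumption and has to be judged separately.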