BCH441H – Bioinformatics
Exam solutions 2002, Part C (Boris Steipe)
1. Amino acids. Of all data abstractions in bioinformatics, the one-letter amino acid code is
the most important one. The sketch below shows the bonding topologies of amino acids in
a polypeptide. Only bonds between non-hydrogen atoms are shown and single and double
bonds are not distinguished. Amino- and carboxy terminus are identified.
(3)
Write the sequence of this polypeptide into your exam booklet in one-letter code.
Where the sidechains are ambiguous, write all possible one letter codes for the residue
in square brackets. Annotate residues that are > 80 % charged at physiological pH
with a "+" or "-".
Example: AB+CD[EFG]HIJ[KL]M-N-OPQ
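[Figure: bonding-topology sketch of the polypeptide, annotated with the solution. Labels recovered from the figure: NH3+ (amino terminus), O– (carboxy terminus), and the residue annotations G, [TV], A, [D-NL], [CS], I, M, [E-Q], K+, R+, P, H+, F, Y, W.]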
(Grading – Correct: 3 points; 1 mistake or omission: 2 points; 2 mistakes or omissions: 1
point; 3 or more: 0 points. Writing H as an uncharged residue: 1/2 point deducted.)
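
The answer notation used in the example above (one-letter codes, square brackets for ambiguous sidechains, a trailing "+" or "-" for charged residues) can be read mechanically. The following is a minimal sketch in Python, not part of the exam; the function name parse_answer and the returned data layout are assumptions made for illustration.

```python
def parse_answer(notation):
    """Split the answer notation into positions.

    Each position is a list of (one_letter_code, charge) alternatives,
    where charge is "+", "-" or "" (not annotated as charged).
    """
    positions = []
    group = None  # alternatives collected inside [...]; None outside brackets
    for ch in notation:
        if ch.isspace():
            continue
        if ch == "[":
            if group is not None:
                raise ValueError("nested '[' is not allowed")
            group = []
        elif ch == "]":
            if not group:
                raise ValueError("unmatched ']' or empty group")
            positions.append(group)
            group = None
        elif ch in "+-":
            # A charge mark annotates the residue written just before it.
            if group is not None:
                target = group
            elif positions and len(positions[-1]) == 1:
                target = positions[-1]
            else:
                target = None
            if not target or target[-1][1]:
                raise ValueError(f"misplaced charge mark {ch!r}")
            target[-1] = (target[-1][0], ch)
        elif ch.isalpha() and ch.isupper():
            residue = (ch, "")
            if group is not None:
                group.append(residue)
            else:
                positions.append([residue])
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if group is not None:
        raise ValueError("unclosed '['")
    return positions


# The example string from the question text.
for number, alternatives in enumerate(parse_answer("AB+CD[EFG]HIJ[KL]M-N-OPQ"), 1):
    print(number, alternatives)
```

A charge mark inside brackets applies only to the alternative it follows, so a group such as [D-NL] reads as "D (negatively charged), N or L".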
2. Stereo vision. Proteins are three-dimensional structures and stereo-images are an
essential aid to understanding the spatial relationships of their components. The stereo figure
below shows a trace of connected Cα atoms of a protein domain (the VH domain of the
anti-Fluorescein antibody 4-4-20, 4FAB.PDB) and a wireframe representation of all its
tryptophan sidechains.
(3)
Which tryptophan is a conserved element of the hydrophobic core of this domain?
Tryptophan B
(Residues A, D and C are obviously solvent accessible, on the surface of the protein)
(Grading – Right: 3 points; Wrong: 0 points)
C: Concepts. Read the following abstract:
Structure of TCTP reveals unexpected relationship with guanine nucleotide-free chaperones
Paul Thaw, Nicola J. Baxter, Andrea M. Hounslow, Clive Price, Jonathan P. Waltho and
C. Jeremy Craven: Nature Struct Biol 8: 701–704 (2001)
The translationally controlled tumor-associated proteins (TCTPs) are a highly
conserved and abundantly expressed family of eukaryotic proteins that are
implicated in both cell growth and the human acute allergic response but whose
intracellular biochemical function has remained elusive. We report here the
solution structure of the TCTP from Schizosaccharomyces pombe, which, on the
basis of sequence homology, defines the fold of the entire family. We show that
TCTPs form a structural superfamily with the Mss4/Dss4 family of proteins,
which bind to the GDP/GTP free form of Rab proteins (members of the Ras
superfamily) and have been termed guanine nucleotide-free chaperones (GFCs).
Mss4 also acts as a relatively inefficient guanine nucleotide exchange factor
(GEF). We further show that the Rab protein binding site on Mss4 coincides with
the region of highest sequence conservation in the TCTP family. This is the first
link to any other family of proteins that has been established for the TCTP family
and suggests the presence of a GFC/GEF at extremely high abundance in
eukaryotic cells.
This abstract reports several pieces of data and mentions several pieces of prior
information.
(12)
3. Summarize the essential steps of how these entities were related to each other in
this study. You may use any representation that is reasonable such as pseudocode, a
flowchart, or other type of sketch.
Note that you are not required to understand the biochemical processes that are described
here, nor are you required to comment on the cell-biological implications. Hint: one of the
key steps has been underlined by me - you must understand how such a conclusion can be
drawn in the situation that is described. You are to summarize the flow of data: the entities
that are being referred to, and the experimental and computational procedures.
(Grading – Marks were given for the presence of at least the following entities and the
correct procedures relating them. One mark is given for each correct entity or procedural
relationship, extra marks for insightful comments, maximum 12 marks. Marks were
deducted for answers containing glaring errors.)
Entities reported and implied in the abstract:
• TCTP Family (multiple alignment)
• S. pombe TCTP NMR structure (solution structure)
• Mss4 structure
• S. pombe TCTP / Mss4 structural alignment
• Annotation (biochemical ?) of Mss4 Rab protein binding site
• Cluster of conserved positions in TCTP family
Other entities
• Sequence database
• Structure database
Procedures
• Sequence database search / significance
• Multiple sequence alignment and definition of conserved residues
• Structure database search / significance
• Visualization / Mapping of information on structure or other method of demonstrating
  coincidence of Rab binding site and conserved positions
A listing of the above entities was not sufficient; correct answers had to assemble a process
from these elements.
Example process: This shows one possible way to sketch the process using the entities
above (informal SADT, one of my personal favorites). Many other possibilities exist. This
question is marked on structuring and logic, not on form.
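[Figure: informal SADT diagram. The boxes and labels recovered from it are arranged here by the data flow given in the entity and procedure lists above, written as: input -> activity (resource; tools) -> output.]
S. pombe TCTP protein -> NMR structure determination -> S. pombe TCTP NMR structure
S. pombe TCTP NMR structure -> Structure database search / significance (PDB; VAST, DALI, CE ...) -> Mss4 structure
S. pombe TCTP NMR structure, Mss4 structure -> Structural superposition (LOCK ...) -> TCTP / Mss4 structural alignment
S. pombe TCTP sequence -> Sequence database search / significance (Genbank; PSI-BLAST, FASTA ...) -> TCTP Family
TCTP Family -> Multiple sequence alignment and definition of conserved residues (CLUSTAL W ...) -> Cluster of conserved positions in TCTP family
TCTP / Mss4 structural alignment, Annotation (biochemical ?) of Mss4 Rab protein binding site, Cluster of conserved positions in TCTP family -> Visualization / Mapping on structure (Rasmol, O, MolMol ...) -> Sites coincide !
Sites coincide ! -> Interpretation, publication, party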
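
The last computational steps of this process (defining conserved positions from a multiple alignment, then carrying them through a structural alignment to compare them with an annotated binding site) can be sketched in a few lines of Python. This is an illustration only and not the authors' method: the sequences, residue numbers, the 80 % conservation threshold and both function names are hypothetical, and the toy alignment has no gaps, so alignment columns and sequence positions coincide.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.8):
    """Return 0-based columns whose most common residue reaches `threshold`.

    Columns in which a gap ("-") is the most common symbol are skipped.
    """
    n_seqs = len(alignment)
    columns = []
    for i, column in enumerate(zip(*alignment)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / n_seqs >= threshold:
            columns.append(i)
    return columns


def map_through_alignment(positions, equivalences):
    """Translate positions of protein A into positions of protein B.

    `equivalences` is a structural alignment given as {position_in_A: position_in_B};
    positions of A without a structural equivalent are dropped.
    """
    return {equivalences[p] for p in positions if p in equivalences}


# Hypothetical toy data, not taken from TCTP or Mss4.
family = ["MKLDEAG", "MKIDEAG", "MRLDESG", "MKLDEAG"]
structural_equivalences = {0: 10, 3: 14, 4: 15, 5: 16, 6: 18}
binding_site_b = {14, 15, 18, 21}

conserved = conserved_columns(family)
mapped = map_through_alignment(conserved, structural_equivalences)
overlap = mapped & binding_site_b
print("conserved columns in A:", conserved)
print("conserved positions that map into the binding site of B:", sorted(overlap))
```

Whether such an overlap means anything depends on the assumptions discussed in question 4: the code only demonstrates the bookkeeping.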
(12)
4. Describe the two most important implicit assumptions – in your opinion – that are
being made in the above process. State each assumption, the conditions for its validity,
and its meaning for the interpretation of the results.
Assumptions need to be made about many facts in this process; assumptions in
general might be categorized into: correctness, completeness, significance and
relevance.
Correctness and completeness are obvious. Usually the impact of rare errors and
omissions across database searches or high-throughput projects is compensated by
the correct data. Thus these assumptions are less important here.
Whether a result is significant should not be assumed but tested. Usually this
involves contrasting an observation with a random model and asking how far the
observation deviates from one that could be expected as a chance occurrence.
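
As one concrete instance of contrasting an observation with a random model, the coincidence of two sets of positions can be scored with a hypergeometric tail probability. This is a minimal Python sketch under a deliberately simple assumption that is not part of the exam: conserved positions are treated as if scattered independently over all aligned positions, which ignores any spatial clustering. The function name and the numbers are hypothetical.

```python
from math import comb


def overlap_p_value(n_positions, n_site, n_conserved, n_overlap):
    """P(X >= n_overlap) for a hypergeometric X.

    Chance that at least `n_overlap` of `n_conserved` randomly placed positions
    fall into a site of `n_site` positions, out of `n_positions` in total.
    """
    total = comb(n_positions, n_conserved)
    upper = min(n_site, n_conserved)
    tail = sum(
        comb(n_site, k) * comb(n_positions - n_site, n_conserved - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 100 aligned positions, a 10-residue binding site,
# 12 conserved positions, 6 of which lie in the site.
print(overlap_p_value(100, 10, 12, 6))
```

A small value says only that the overlap is unlikely under this particular random model; it does not address whether the model is relevant to the question.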
Assumptions about relevance are problem-domain specific. They are usually the
ones that you need to be most worried about, because they can't be removed simply
by a clear, mechanistic procedure. You need to understand the question to
determine whether an answer is relevant to it.
(Grading – 6 marks for each of the two assumptions: two marks if the description of
an assumption that is actually made in the abstract is correct and complete, one
mark if it is one of the important (top five of the examples below) assumptions, one
mark for stating the validity, one mark for explaining how this could be tested, one
mark for discussing the consequences when the assumption does not hold. Extra
marks for insightful comments were possible, maximum 6 marks. If your assumption
is not used at all in the abstract but otherwise well explained, you can get a
maximum of two marks. Follow-up errors were not penalized repeatedly if the
reasoning was otherwise correct. E.g. if you had assumed that a "solution
structure" – an experimentally determined NMR structure; the term is used in
contrast to a crystal structure – is a Swiss-Model structure, I deducted one point,
but marked the rest of the answer as if it had been a homology model after all.)
Here are five key assumptions, listed in order of decreasing importance; all of these
examples would be full mark answers.
1. S. pombe TCTP and MSS4 are homologues, even if they don't have significant sequence
similarity, since they have similar structures. Insignificant sequence similarity seems to be
the case for S. pombe TCTP and MSS4; otherwise MSS4 would have been reported to be
a member of the TCTP family. Since homology is a reasonable explanation for striking
structural similarity in many instances of distantly related proteins, and these share
structure, function, active sites, even catalytic mechanism, this empirical fact can generate
useful hypotheses about how function of one protein might be inferred from the relatedness
to another. The assumption is difficult to test and involves comparing how many aspects
other than structure and sequence are shared in structurally equivalent positions. In the
absence of homology, structural similarity may be due to chance (need to test significance)
or due to functional requirements. Functional requirements could be tested if a mechanistic
model for the function makes predictions for requiring specific residues. If the two
structurally similar proteins are not related through common ancestry, the coincidence of a
functional site in one protein (cluster of conserved residues in TCTP) and the other
(reported RAB binding site in MSS4) would be meaningless with respect to a possible
similar function.
2. Homologous proteins have similar structure. This appears to be always true, even though
it is an empirical observation. The assumption cannot be tested - except by a structure
determination of both homologues - but has never been found to be contradicted. If this
assumption were invalid, the S. pombe TCTP structure might not be a valid model for
proteins within the TCTP family; residues that are aligned in the family's multiple sequence
alignment might be in dissimilar environments regardless of the alignment and have
significantly dissimilar roles for fold and function.
3. The structural alignment of S. pombe TCTP and MSS4 is significant. This has to be
tested against a random chance model; usually some Z-score criterion is applied. The
alignment will be more significant the lower the RMSD and the longer the stretch of
structurally alignable residues. If the alignment is due to random chance, all conclusions
about the implications of the alignment are meaningless.
4. The cluster of conserved residues in S. pombe TCTP is a functional site. This assumption
could be tested biochemically, by mutagenesis, but unfortunately only once the function is
known. It is a weak assumption, because residues might be conserved for structural /
folding / stability reasons and not for functional reasons. If the residues are conserved for
structural reasons, then the coincidence with an annotated Rab binding site in MSS4 would
be meaningless.
5. The annotation of the MSS4 functional site is correct. It may be difficult to pinpoint a
biochemically determined binding site to a specific set of amino acids. For example, loss of
binding after mutagenesis can also be due to partial denaturation of the protein. Several
orthogonal biochemical experiments (or better: the determination of the structure of the
complex) may be required to test the validity of this assumption. If the annotation is wrong,
the coincidence of a functional site in one protein (cluster of conserved residues in TCTP)
and the other (reported RAB binding site in MSS4) would be meaningless with respect to a
possible similar function.
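
The Z-score criterion mentioned under assumption 3 expresses how far an observed alignment score lies from the scores obtained against a background that represents chance. The following Python sketch shows only the arithmetic; the function name, the background scores and the observed score are hypothetical, and how a particular structure comparison program builds its background is not specified in the text above.

```python
from statistics import mean, stdev


def z_score(observed, background):
    """Standard deviations by which `observed` exceeds the background mean."""
    return (observed - mean(background)) / stdev(background)


# Hypothetical similarity scores of the query against unrelated structures.
background_scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
observed_score = 11.0

print(f"Z = {z_score(observed_score, background_scores):.2f}")
```

The sample standard deviation is used here because the background is treated as a sample of chance scores; the cutoff above which a Z-score counts as significant has to be taken from the method that produced the scores.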
Here are three less important assumptions; these would be five point answers.
6. The structural alignment is correct / relevant. "Correct" in this sense means that the
mathematically optimal alignment which the algorithm calculates actually creates pairwise
associations between those residues that have similar function. This is impossible to test in
the absence of additional information. If the structural alignment aligns non-related
residues, the coincidence between a functional site in one protein (cluster of conserved
residues in TCTP) and the other protein (reported RAB binding site in MSS4) would be an
artefact of the alignment.
7. The sequences in the TCTP family are homologous. Presumably they have been
identified as sequences that are highly similar, more similar than could be reasonably
expected from random chance. This "reasonable expectation" can be quantified as an
expectation value, if a statistical model is available. If the sequences are similar but in fact
not homologous, all conclusions with respect to similar function, similar structure, similar
active sites etc. lose their basis.
8. The TCTP family sequence alignment is correct. In this sense "correct" means that
residues are aligned that are in fact equivalent in terms of their position in the ancestral
sequence, or their function in the protein. The validity of this assumption cannot be tested –
but in those cases in which structural alignments can be made, one can at least
demonstrate that the residues are in spatially equivalent positions. If the alignment is wrong
(i.e. aligning similar residues from different locations of two structures), the conservation
patterns may be meaningless.
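
One statistical model that yields the expectation value mentioned under assumption 7 is the Karlin-Altschul model for local alignment scores, E = K m n e^(-lambda S), as used by BLAST. The text above does not say which model was applied, so the Python sketch below is an assumption made for illustration; the function name, the lengths and the values of lambda and K are illustrative only, since these parameters depend on the scoring system.

```python
from math import exp


def expect_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation value E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * exp(-lam * raw_score)


# Illustrative parameters and lengths only (assumed, not from the exam).
LAMBDA, K = 0.267, 0.041
for score in (40, 80, 120):
    e = expect_value(score, query_len=170, db_len=1_000_000, lam=LAMBDA, k=K)
    print(f"raw score {score:>3}: E = {e:.2e}")
```

E is the number of alignments with at least this score expected by chance in a search of this size, so it grows with the database and shrinks exponentially with the score.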
Other answers were possible.