PROTEINS: Structure, Function, and Genetics 42:148–163 (2001)
Conserved Key Amino Acid Positions (CKAAPs) Derived
From the Analysis of Common Substructures in Proteins
Boojala V.B. Reddy,1 Wilfred W. Li,1 Ilya N. Shindyalov,1 and Philip E. Bourne1,2,3*
1 San Diego Supercomputer Center, University of California, San Diego, La Jolla, California
2 Department of Pharmacology, University of California, San Diego, La Jolla, California
3 The Burnham Institute, La Jolla, California
ABSTRACT
An all-against-all protein structure comparison using the Combinatorial Extension
(CE) algorithm applied to a representative set of
PDB structures revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html).
These substructures represent commonly identified
folds, domains, or components thereof. Most of the
subsequences forming these similar substructures
have no significant sequence similarity. We present
a method to identify conserved amino acid positions
and residue-dependent property clusters within
these subsequences starting with structure alignments. Each of the subsequences is aligned to its
homologues in SWALL, a nonredundant protein
sequence database. The most similar sequences are
purged into a common frequency matrix, and
weighted homologues of each one of the subsequences are used in scoring for conserved key amino
acid positions (CKAAPs). We have set the top 20% of
the high-scoring positions in each substructure to
be CKAAPs. It is hypothesized that CKAAPs may be
responsible for the common folding patterns in
either a local or global view of the protein-folding
pathway. Where a significant number of structures
exist, CKAAPs have also been identified in structure
alignments of complete polypeptide chains from the
same protein family or superfamily. Evidence to
support the presence of CKAAPs comes from other
computational approaches and experimental studies of mutation and protein-folding experiments,
notably the Paracelsus challenge. Finally, the structural environment of CKAAPs versus non-CKAAPs
is examined for solvent accessibility, hydrogen bonding, and secondary structure. The identification of
CKAAPs has important implications for protein engineering, fold recognition, modeling, and structure
prediction studies and is dependent on the availability of structures and an accurate structure alignment methodology. Proteins 2001;42:148–163.
© 2000 Wiley-Liss, Inc.
Key words: protein structure comparison; sequence
homology; conserved key; amino acid
positions; protein folding; protein structure prediction; protein engineering
INTRODUCTION
It was observed long ago that the three-dimensional
structural constraints and functional selection of proteins
in nature lead to the retention of significant sequence
homology between proteins of similar fold and function.
This observation has been the basis for successful use of
comparative (homology) modeling procedures in which
structures of homologues are used to model a query
sequence.1,2 Such modeled structures are more reliable as
the homology between the sequence of the template structure and the target sequence increases over 40%.3,4 Conversely, as sequence and structural data have increased
rapidly, we observe proteins with significant similarity in
their structure and possibly function but no measurable
similarity from their sequences alone. Many families of
protein structures, classified by CATH,5 SCOP,6 or
HOMSTRAD,7 contain one or more member structures
that have no significant sequence similarity (<25% sequence identity) but have a similar overall structure or at
least a fold belonging to the corresponding family or
superfamily. These observations have driven the attention
of scientists interested in sequence analysis to strive for
new methods that could help identify remotely related
proteins from the sequence information alone, for the ratio
of available sequence to structure information will remain
large.8,9 It is also a challenging task for scientists interested in protein structure analysis to explain the rationale
for similar folds from apparently dissimilar amino acid
sequences. Such explanations will provide new insights
into protein folding and protein engineering.
There are two general models that attempt to explain
the overall three-dimensional conformation of a protein
from its amino acid sequence10,11: (i) a centralized (local)
model, in which fold specificity is coded in just a few
critical residues (10–20%) of the sequence and (ii) a
distributed (global) model, in which the fold is formed by
interactions involving the entire sequence.
Grant sponsor: National Biomedical Computation Resource: Grant
number: NIH P41 RR08605-06; Grant sponsor: National Science
Foundation: Grant number: DBI 9808706.
*Correspondence to: Philip E. Bourne, San Diego Supercomputer
Center, University of California, San Diego, 9500 Gilman Drive, La
Jolla, CA 92093-0505. E-mail: [email protected]
Received 13 March 2000; Accepted 7 September 2000
Published online 00 Month 2000
The global model is supported by a number of mutation studies in
which most single-residue mutations were found to
have no measurable effect on protein function and
presumably structure.12,13 This view is also supported by
structural studies. Russell and Barton14 report that for
many proteins with similar three-dimensional structures,
the proportion of complementary changes is near to that
expected by chance, suggesting that many similar proteins
with similar three-dimensional structures have fundamentally different stabilizing interactions. Furthermore, an
analysis of the most commonly occurring immunoglobulin
core sequences indicated that the high degree of structural
flexibility outside the common core and the variability of
side-chain packing inside the core did not support the
notion of a common protein-folding pathway.15 Likewise,
Wood and Pearson10 argue that statistically defined Z-values for sequence similarity and structural similarity
are related linearly at all levels of sequence identity,
supporting the idea of a global folding pathway.
Conversely, a folding pathway governed by local interactions is also supported by many convincing examples in
the literature, for example, the mutations leading to
protein misfolding that are associated with certain diseases.16 Studies of sequence versus structural similarity17–20 show most protein pairs with >30% identical
residues have similar structures. However, many proteins
have a similar fold with sequence identity <25%. It could
therefore be argued that only a small subset of residues
is important in defining the fold. Hence, a number of
investigators have recently concentrated on the idea of a
minimal set of conserved key residues in proteins serving
as nucleation centers in the folding pathway and in
stabilization of the protein structure.21–23 The Gaussian
network model24 identified residues with the highest
frequency fluctuations near the native state as kinetically
hot residues. It was shown that these residues are correlated with the most conserved amino acids in proteins. The
formation of the transition state in folding kinetics is
observed to be due to a sufficient minimal number of
specific contacts as observed in the native structures.21,25
Such a minimal number of contact residues are said to be
residues of the fold nucleus that are position-specific and
conserved throughout the family and superfamily of protein structures. Mirny et al.26 showed that a
minimal set of key residues is needed for faster folding of
proteins on a physiological timescale. Rost27 concurs by
proposing that only 3–4% of residues are “anchor residues,” which are implicated to be significant during evolution in relating proteins of dissimilar functions with
similar folds. Dosztányi et al.28 reported a minimal set of
conserved residues as stabilization centers in protein
structures. Lichtarge et al.29 presented an evolutionary
trace method to identify functionally important residues
through analysis of closely related sequences at various
levels of sequence identities. Recently Michnick and Shakhnovich30 presented a method to predict conserved residues for a given protein structure through simulated
sequences and implicated such residues to be guiding and
deciding the folding path. Another report from the same
group31 further discussed the universally conserved positions in the five most commonly occurring protein folds
including the immunoglobulin (Ig) fold. In the case of the
most commonly occurring immunoglobulin fold Clarke et
al.32 showed that the structurally more conserved amino
acids among the proteins of similar folds from superfamilies are indeed used for guiding the folding process.
The work presented here does not attempt to favor
either a local or global view of folding. Rather it attempts
to recognize residues whose property conservation may be
significant. The percentage of those residues that are
indeed significant is unknown. We have arbitrarily chosen
a 20% cutoff in the residues reported. It may be that a far
greater percentage is needed in some cases, supporting a
global view of folding, and less in others, supporting a local
view. The viewpoint of this article is simply one of using
the increasing amount of structural data possessing the
same fold, aligning those structures and interpreting the
sequence alignments associated with those structure alignments in an effort to recognize a common fingerprint.
Consider this viewpoint with reference to the Paracelsus
Challenge.33
It has been shown by three independent groups34 –36
that by mutating <50% of the residues, an all-α protein
structure can be converted to a very different α-β fold and
vice versa. Key questions then become, what is the minimum number of residues that need to be mutated to effect
the change to a stable protein of a different conformation?
What if I mutate more residues, can I achieve a more
stable protein? Answers to these questions are going to
depend very much on the size and characteristics of the
starting protein. The presence of CKAAPs, for which the
significance is unknown, and the unknown outcome on
stability of making a specific mutation, do not answer
these questions. However, CKAAPs may provide some
insight into which residues to mutate, and which to leave unchanged, in
addressing the Challenge. With this background, consider
how CKAAPs are derived. An all-against-all protein structure comparison study using the Combinatorial Extension
(CE) algorithm developed in our laboratory37 revealed 150
clusters of common substructures in proteins.38 Approximately 40% of all structures found in the PDB contain one
or more of these substructures. Substructures are formed
by a near-continuous subsequence of >60 amino acids in
length. Each of the substructures is formed by five
or more dissimilar subsequences and hence presents an
excellent data source for inferring sequence-structure relationships. Here we present a strategy to identify residue
positions conserved, at least for property, among these
naturally occurring structural homologues with dissimilar
sequences.
CKAAPs are determined from three-dimensional structure alignments by a combined normalized occurrence
score based on absolute amino acid conservation combined
with property-based conservation. Structure alignments
for each substructure data set are expanded by including
sequences for which no structures are available and which
have >40% sequence identity to a substructure. We propose that CKAAPs provide insights into function when a
clear evolutionary relationship between the sequences
being compared exists and insights into what residues are
most important in defining a particular protein fold.
Evidence for these insights comes from parallel computational studies and experimental evidence from the literature. Here we highlight evidence from the extracellular
matrix protein tenascin-C (TN), the calcium-binding regulatory protein Troponin C (TnC), and others. Finally, we
present an analysis of the structural environment of
CKAAPs versus all other residue positions to ascertain
whether they represent a unique fingerprint and discuss
their usefulness in sequence-based protein structure prediction, fold recognition, de novo design and modeling, and
engineering of protein structures to achieve a desired
function.
Fig. 1. Sequence space scanning procedure to identify CKAAPs
using structural homologues identified by CE.
MATERIALS AND METHODS
Previously an all-against-all three-dimensional
protein structure comparison was performed by using the
CE algorithm37 to identify representative protein structures38 in the Protein Data Bank (PDB). Each of the
representative structures is grouped with its represented
structures based on the following criteria: a Root Mean
Square Deviation (RMSD) cutoff of 4.0 Å among superposed Cα positions of aligned amino acid residues; a length
difference between two sequences of <10%; and a Z-score37
> 4.0. Based on these collective criteria, which in no way
are biased by sequence similarity, there are 2,016 representative polypeptide chains in the PDB (release 1998).
Subsequent comparison of this structurally nonredundant
set of complete polypeptide chains revealed a set of 75
recurring substructures.38 These substructures take no
account of protein domains, yet in virtually all instances
fall within domains defined by CATH.5 From the original
CE alignment for each substructure, we then excluded
substructures whose subsequences had >25% sequence
identity with one another. Using these alignments and
the naturally occurring sequences in SWALL (a nonredundant sequence database derived from SwissProt +
Trembl + TremblNew) at the European Bioinformatics
Institute (EBI), we developed a procedure to identify
CKAAPs. This procedure is described below.
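The three grouping criteria quoted above can be written as a single predicate. The following Python sketch is illustrative only and is not the CE implementation: the function name `is_represented`, the choice to measure the length difference relative to the longer chain, and the treatment of the 4.0 Å cutoff as inclusive are assumptions that the text does not specify.

```python
def is_represented(rmsd, len_a, len_b, z_score,
                   max_rmsd=4.0, max_len_diff=0.10, min_z=4.0):
    """Return True if a structure pair meets the three grouping criteria.

    rmsd     : RMSD (in Angstrom) over superposed C-alpha positions
    len_a/b  : lengths of the two sequences
    z_score  : CE Z-score of the alignment
    """
    # Assumption: the length difference is taken relative to the longer chain.
    len_diff = abs(len_a - len_b) / max(len_a, len_b)
    # Assumption: the RMSD cutoff is inclusive (<= 4.0 Angstrom).
    return rmsd <= max_rmsd and len_diff < max_len_diff and z_score > min_z


# Hypothetical pairs, for illustration only.
print(is_represented(2.5, 100, 95, 5.1))   # all three criteria met
print(is_represented(2.5, 100, 80, 5.1))   # length difference is 20%
```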
Sequence-Based Analysis of Common Substructures
The analysis for a given common substructure consists
of the following steps, with the initial steps depicted in
Figure 1.
1. Each of the subsequences $S_i$ of a given substructure i
is submitted to a FASTA3 search against the SWALL
nonredundant sequence database using a Blosum50
weight matrix, a gap penalty of −12, and a gap-extension penalty of −2
to obtain homologous subsequences $S_{ij}$.
2. The set of subsequences, $S_{ij}$, with >40% sequence
identity is sorted, and 100% identical subsequences
are removed by keeping only one such subsequence.
Subsequences with >90% sequence identity, $S''_{ik}$, are
grouped and separate position specific frequency tables,
${}^{ik}f''_{lm}$, are calculated for each such group, k. Here l is the
alignment position of the subsequence and m is the type
of amino acid. Each group k is given the weight of one
sequence equivalent. This avoids overweighting
close homologues. The remaining subsequences having
a sequence identity between 40% and 90% are grouped
together into $S'_{ij}$, and the residue occurrence value,
${}^{i}N'_{lm}$, is computed.
3. The frequency of occurrence (${}^{i}f_{lm}$) of amino acids is
computed for each position l in each substructure i by
converting the residue occurrence into a frequency and
adding the group specific frequency values as depicted
in Eq. 1.
$${}^{i}f_{lm} = \frac{{}^{i}N'_{lm} + \sum_{k} {}^{ik}f''_{lm}}{n' + n''} \tag{1}$$
where $n'$ is the subsequence count in $S'_{ij}$ and $n''$ is the
number of k groups.
4. A weighted occurrence score, given as a position specific
amino acid matrix, $N_{lm}$, is computed for all subsequences, $S_i$, of a given substructure (Eq. 2).
$$N_{lm} = \frac{\sum_{i} \left({}^{i}f_{lm} \times {}^{i}w\right)}{n} \times 1000 \tag{2}$$
where ${}^{i}w = 0.2$ if the $S'_{ij}$ count is ≤ 5 and ${}^{i}w = 1.0$ otherwise,
and n is the total number of subsequences in $S_i$.
5. An average occurrence value for each amino acid ($N_m$) is
obtained and the sum of RMSDs of amino acid occurrence ($R_a$) at every sequence position is computed as
given below.
$$R_a = \sqrt{\sum_{m} \left(N_{lm} - N_m\right)^2 / 20} \tag{3}$$
6. The amino acids are divided into 12 property groups as
classified by Taylor39 and Zvelebil et al.40 Most amino
acids are included in more than one group because they
satisfy the property pertaining to multiple groups. An
average value of occurrence of residues belonging to
each property group (Table I) is calculated and the
RMSD for the 12 property group occurrences, $R_g$, is
computed. The first 20% of residue positions which have
the highest ($R_a + R_g$) values are marked as conserved
key amino acid positions (CKAAPs).
7. The weighted log odds values ($H_{lm}$) for each amino acid
at every position are calculated (Eq. 4).
$$H_{lm} = \left(\lvert N_{lm} - N_l \rvert / 10\right) \log\left(N_{lm}/N_m\right) \tag{4}$$
TABLE I. Property Groups of Amino Acids Used to Identify CKAAPs39,40
S. No   Property group         Amino acids
1       Hydrophobic group 1    AVLIFYWTCMHKG
2       Hydrophobic group 2    AVLIFYW
3       Polar group            YWSTNQDEHKR
4       Small residue group    AVSTCNDGP
5       Tiny                   AGS
6       Aliphatic              ILV
7       Aromatic               FYWH
8       Positive (basic)       HKR
9       Negative (acidic)      ED
10      Charged                HKRED
11      Conformational         PG
12      Neutral polar          STCMNQ
Identification of CKAAPs for Complete Proteins
The procedure has been extended to identify CKAAPs
for complete polypeptide chains, rather than substructures. This is possible in cases in which several structures
exist for proteins in the same family and their structures
can be aligned. Two such families, the chymotrypsin
inhibitor and the ubiquitin family of proteins, are discussed in this study. On occasion the Z-score, RMSD, and
the sequence identity thresholds have been adjusted to
provide a statistically significant number of structural
homologues (5 or more members). If required, the Z-score
was lowered from 4.0 to 3.7, and the sequence identity
threshold was increased from 25% to 35% or the RMSD
between Cα coordinates was raised from 4.0 to 5.0 Å. We
then follow steps 2–7 outlined above to identify CKAAPs in
a given protein family. Undoubtedly, these adjustments
affect the significance of CKAAPs, but as shown subsequently, empirically the determinations still appear useful.
Stereo Diagram Generation and Molecular Visualization
Stereo diagrams were generated by using WebLab Pro
from Molecular Simulation Inc. (MSI, San Diego, CA).
When the software rendering the secondary structures of
molecules did not agree with those in the CATH classification (http://www.biochem.ucl.ac.uk/bsm/cath/), the latter
was applied where indicated. Further molecular visualization was conducted by using Insight II 98.0 also from MSI.
Analysis of the Structural Environment of CKAAPs
All the CKAAPs in the gallery of common substructures38 were analyzed to compare their structural environment
to that of amino acids in a random data set. Specifically, a
data set of 306 nonhomologous (≤25% sequence identity),
best-resolved (≤2.0 Å resolution) X-ray structures41 was
used to calculate the propensities of residues for a given
structural environment, and these were compared to the environment
of amino acids in CKAAPs. The parameters used to define
the structural environment of amino acid residues were
secondary structure type, packing density, hydrogen bonding, and solvent accessibility. The values were computed
by using the methods described below.
Secondary structure
The secondary structure definition of Kabsch and
Sander,42 as summarized by Smith43 in his SSTRUC
program, was used to define the secondary structure type
taken by the residue in the wild type proteins, and
classified as either helices, strands, or random coils.
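A reduction to three states of this kind can be sketched as follows. The mapping is an assumption: it uses the one-letter codes of the Kabsch and Sander definition (H, G, I for helices; E, B for strands; everything else as coil), which is one common convention, and the exact assignment made by SSTRUC in this study is not stated in the text. The function name `three_state` is hypothetical.

```python
# Assumed reduction of Kabsch-Sander one-letter codes to three classes.
HELIX_CODES = set("HGI")   # alpha, 3-10 and pi helices
STRAND_CODES = set("EB")   # extended strand and isolated bridge


def three_state(assignment):
    """Map a per-residue secondary structure string to H (helix),
    E (strand) or C (coil); any other code, including blanks, becomes C."""
    return "".join(
        "H" if code in HELIX_CODES else "E" if code in STRAND_CODES else "C"
        for code in assignment
    )


# Hypothetical assignment string, for illustration only.
print(three_state("HHGGEEB TS-"))
```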
Packing density (Ooi number)
Contact numbers with other residues within an 8 Å and
a 14 Å radius were computed by using the method of
Nishikawa and Ooi.44 Because the longest distance from
Cα(i) to Cα(i+1) is approximately 4 Å, the nearest neighbor
residues on either side of the dipeptide were omitted when
counting. Ooi numbers calculated at both an 8 Å and a 14 Å
radius were combined and used as a single structure
environment parameter.
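A contact count of this kind can be sketched in a few lines of Python. This is a minimal illustration rather than the Nishikawa and Ooi program: the function name `ooi_numbers` is hypothetical, the omission of neighbors is read here as skipping residues i−1 and i+1, the radii are treated as inclusive, and the two counts are returned separately because the text does not say how they were combined.

```python
import math


def ooi_numbers(ca_coords, radii=(8.0, 14.0), skip=1):
    """Count C-alpha atoms within each radius of every residue.

    ca_coords : list of (x, y, z) C-alpha coordinates in chain order
    skip      : sequence neighbors within this separation are not counted
                (assumption: skip=1 omits residues i-1 and i+1)
    Returns one tuple of counts per residue, one count per radius.
    """
    counts = []
    for i, ci in enumerate(ca_coords):
        per_radius = []
        for radius in radii:
            n = sum(
                1
                for j, cj in enumerate(ca_coords)
                if abs(i - j) > skip and math.dist(ci, cj) <= radius
            )
            per_radius.append(n)
        counts.append(tuple(per_radius))
    return counts


# Hypothetical fully extended chain with 3.8 Angstrom C-alpha spacing.
chain = [(3.8 * k, 0.0, 0.0) for k in range(6)]
print(ooi_numbers(chain)[0])   # counts for the first residue
```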
Hydrogen bond formation
Hydrogen bond formation was defined by using a donor-acceptor distance of ≤3.5 Å.45 Angular criteria were not
considered because side-chain atoms are not equally well
positioned by crystallography and not all hydrogen atom
positions are fixed by the positions of the heavier atoms.
Hydrogen bonding was examined from a side chain at
position i to the residues other than those at positions
i−1, i, and i+1. The average number of hydrogen bonds
(dipole interactions) that could be formed by the residue in
a given protein structure was computed.
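The distance-only criterion can be illustrated with a short Python sketch. It is not the program used in this study: the `Atom` record, the donor/acceptor role labels, and the function name `side_chain_hbonds` are hypothetical, and the atoms are assumed to have been labeled beforehand.

```python
import math
from collections import namedtuple

# Hypothetical atom record; role is "donor", "acceptor" or "both".
Atom = namedtuple("Atom", "res_index is_side_chain role xyz")


def _can_pair(a, b):
    """True if one atom can donate and the other can accept."""
    donors, acceptors = ("donor", "both"), ("acceptor", "both")
    return (a.role in donors and b.role in acceptors) or \
           (a.role in acceptors and b.role in donors)


def side_chain_hbonds(atoms, res_index, cutoff=3.5):
    """Count possible hydrogen bonds from the side chain of residue
    res_index to any residue other than i-1, i and i+1, using only a
    donor-acceptor distance cutoff (no angular criteria)."""
    count = 0
    for a in atoms:
        if a.res_index != res_index or not a.is_side_chain:
            continue
        for b in atoms:
            if abs(b.res_index - res_index) <= 1:
                continue  # skip residues i-1, i, i+1
            if _can_pair(a, b) and math.dist(a.xyz, b.xyz) <= cutoff:
                count += 1
    return count


# Hypothetical coordinates, for illustration only.
atoms = [
    Atom(5, True, "both", (0.0, 0.0, 0.0)),       # side-chain hydroxyl oxygen
    Atom(9, False, "acceptor", (2.9, 0.0, 0.0)),  # within cutoff, counted
    Atom(6, False, "acceptor", (2.8, 0.0, 0.0)),  # neighbor i+1, ignored
    Atom(12, False, "donor", (5.0, 0.0, 0.0)),    # too far
]
print(side_chain_hbonds(atoms, 5))
```

Averaging this count over all occurrences of a residue type in a structure gives a per-residue mean comparable to the one described above.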
Solvent accessibility
The solvent accessible contact area of amino acids was
calculated by using the method of Lee and Richards,46 as
coded by Sali and Blundell47 in their PSA program, with a
probe radius of 1.4 Å. The percentage of accessible contact
area of the residue side chain, main chain, polar side
chain, non-polar side chain, and total atoms were used.
RESULTS AND DISCUSSION
The classification of protein structures has received
considerable attention over the years,5,6,48 particularly as
more structures have been experimentally determined.
Consider a bottom-up view. Amino acids show different
propensities to adopt a particular conformation based on
their side chain properties and the influence from near
neighbor residue interactions. The most predominant secondary structures are α-helices and β-sheets connected
through different types of turns, bends, and loops. Different classes of supersecondary structures are formed on the
basis of the properties of residues on the surface of these
predominantly occurring rigid secondary structural elements. Compact substructures in proteins evolve as a
result of the hierarchical packing of secondary structural
elements (SSE),49,50 which fold into energetically stable
structural domains. Many related proteins have similar
combinations of SSEs, domains, or substructures as a
result of geometrically and energetically favorable packing
architectures.51–53 Here we refer to a substructure as a
part of a domain classified in the CATH database.5 Acceptability of mutation(s) at each position in the natural
sequence depends on the local environment of amino
acid(s)54 and the kind of interactions each amino acid
undergoes in the folding process. Most of the single residue
mutations (substitution, insertion, and deletion) bring
about a small conformational change in the near vicinity of
the residue, effecting a drift in atomic positions within an
8–10 Å radius.55–57 However, some residues may effect a
greater change than others, to the point where a single
mutation would bring about a drastic change in the folding
process. The resulting protein structure might be unstable
and prone to premature degradation.58 Such substitution
mutations may also lead to the evolution of new folds in
proteins.59,60 The sequences of naturally occurring structural homologs for each such fold have undergone many
evolutionary changes. However, the need to maintain
structural integrity to permit biological function necessitates the absolute conservation of residues or at least the
conservation of property. The key question then becomes
whether we can identify the relative importance of these
residues.
Conserved Key Amino Acid Positions (CKAAPs) in
Common Substructures
The CE algorithm developed in our laboratory37 allowed
us to provide a different view of protein fold space.38
Among the 2,016 nonredundant representative polypeptide chains identified, the commonly occurring substructures are formed by mostly dissimilar sequences and
represented in a gallery of substructures (http://cl.sdsc.edu/ce.html). Each substructure is formed by a significant
number of apparently functionally unrelated proteins. The
availability of these structural alignments (manual validation of
the structure alignments is critical) makes it possible to examine the conservation of
amino acids (Ra) and amino acid properties (Rg), which
may be playing a key role in substructure formation. The
sum value (Ra + Rg), calculated from amino acids and
property group occurrence scores, provides an indication
as to whether particular positions are preferred by either a
few specific amino acids, or property group, or both.
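As a rough illustration of how the (Ra + Rg) ranking could be computed, the following Python sketch follows Eq. 3 and step 6 of Materials and Methods, with the property groups taken from Table I. It is not the code used in this study. The function names, the reading of Rg as an RMSD over per-group mean occurrences, the rounding applied to the 20% cutoff, and the toy occurrence matrix are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Property groups as listed in Table I.
PROPERTY_GROUPS = {
    "hydrophobic 1": "AVLIFYWTCMHKG",
    "hydrophobic 2": "AVLIFYW",
    "polar": "YWSTNQDEHKR",
    "small": "AVSTCNDGP",
    "tiny": "AGS",
    "aliphatic": "ILV",
    "aromatic": "FYWH",
    "positive": "HKR",
    "negative": "ED",
    "charged": "HKRED",
    "conformational": "PG",
    "neutral polar": "STCMNQ",
}


def rmsd_from_column_means(rows):
    """For each row, the RMS deviation of its entries from the column means."""
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(row[c] for row in rows) / n_rows for c in range(n_cols)]
    return [
        math.sqrt(sum((row[c] - means[c]) ** 2 for c in range(n_cols)) / n_cols)
        for row in rows
    ]


def ckaap_positions(occurrence, fraction=0.2):
    """occurrence[l][m] is the score N_lm for alignment position l and
    amino acid AMINO_ACIDS[m]. Returns (selected positions, Ra + Rg)."""
    ra = rmsd_from_column_means(occurrence)        # Eq. 3 (divisor 20)
    # Assumption: a group's occurrence is the mean N_lm over its members.
    group_rows = [
        [sum(row[AMINO_ACIDS.index(a)] for a in members) / len(members)
         for members in PROPERTY_GROUPS.values()]
        for row in occurrence
    ]
    rg = rmsd_from_column_means(group_rows)        # divisor 12
    total = [a + g for a, g in zip(ra, rg)]
    # Assumption: the 20% cutoff is rounded to the nearest whole position.
    n_keep = max(1, round(fraction * len(occurrence)))
    ranked = sorted(range(len(occurrence)), key=lambda l: total[l], reverse=True)
    return sorted(ranked[:n_keep]), total


# Toy matrix: ten positions, two of them fully conserved (hypothetical data).
toy = [[50.0] * 20 for _ in range(10)]
toy[3] = [1000.0 if a == "L" else 0.0 for a in AMINO_ACIDS]
toy[7] = [1000.0 if a == "G" else 0.0 for a in AMINO_ACIDS]
positions, scores = ckaap_positions(toy)
print(positions)   # the two conserved positions, 3 and 7, rank highest
```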
Arbitrarily, the first 20% of residues with the highest
sum values are referred to as CKAAPs and considered to
have the most significant contribution to the three-dimensional structure. CKAAPs have automatically been
determined for all the common substructures in the gallery as defined in the methods section. CKAAPs for all the
common substructures are available via the World Wide
Web at http://ckaaps.sdsc.edu. Figure 2 shows stereo diagrams
of four representative substructures for all α, all β,
and α-β-α type proteins, with their associated CKAAPs
highlighted. DC02 (DCxx denotes the substructure, where
a lower value of xx indicates a more frequently occurring
substructure in the PDB) is found in the nitrate/nitrite
response regulator; DC07, in the mannose permease; and
DC30, in the vascular cell adhesion molecule. The molecular chaperone DnaK is a heat-shock protein family member, important in protein folding, interaction, and translocation. DC57 represents an α-helical segment in DnaK
that is involved in DnaK-substrate complex stabilization.
It comes as no surprise, based on many studies of the
hydrophobic cores of proteins, that most CKAAPs are
found in the hydrophobic core of the molecule. However, a
number of CKAAPs are found in loop regions, exposed to
solvent and available for interaction with solvent and/or
other cellular components. The remainder of this article is
devoted to a detailed analysis of the CKAAPs in a few
substructures, for example, the Ig fold and the EF-hand
motif containing TnC, together with an analysis of the
overall environment for the most predominant CKAAPs
versus other residues.
The immunoglobulin fold (DC01)
The most common substructure (DC01) is the immunoglobulin (Ig) fold, with 105 aligned members with no
discernable sequence similarity. Sixteen such substructures are superimposed in Figure 3 to represent the level
of structural diversity and conservation within the Ig
superfamily. The Ig superfamily includes immunoglobulins, cell adhesion molecules, extracellular matrix proteins, bacterial and viral proteins, and the NF-κB p65
subunit and PD-1, which are involved in gene regulation
and cell death.61 The identities of a subset of DC01
composed of sequences with <25% sequence identity are
shown in Table II. The many functions represented by
these proteins attest to the diversity present in this
superfamily of proteins, yet the Ig fold is highly conserved
(Fig. 3). The Ig fold is the most commonly occurring
β-sandwich and is defined by six core strands in two sheets
(A, B, E, sheet I; G, F, C, sheet II). The Ig fold family (IgFF)
is further classified into a number of different subtypes
based on sequence or structural conservation.62 The extracellular matrix protein tenascin-C (TNfn3) consists of 15
repeats of the fibronectin type 3 (Fn3) subtype. The crystal
structure of the third Fn3 repeat (1TEN) is used as a
reference structure for the substructures defined as
DC01.38
Fig. 2. Stereo diagrams of common substructures with C-α of CKAAPs highlighted as solid spheres and associated side chains labeled and
rendered as sticks. Details of CKAAPs for all substructures with their associated DCxx identifiers are available via the Web at http://ckaaps.sdsc.edu.
a: 1RNL, a nitrate/nitrite response regulator. This three-layer α-β-α sandwich fold is observed in 43 dissimilar subsequences (DC02).
b: 1PDO, a mannose permease. This α-β-α sandwich is observed in 30 dissimilar subsequences (DC07).
c: 1VSC:A, a vascular cell adhesion molecule. This β sandwich is found in 16 dissimilar subsequences (DC30) and includes a disulphide bridge between residues C71 and C23.
d: 1DKX:A, a substrate binding domain of DnaK. This mainly α substructure is found in 11 dissimilar subsequences (DC57).
In this study, we identified 16 CKAAPs (see Fig. 4a for
the 3-D structure, Table II for aligned subsequences and
key positions, and Table III for the log odds matrix). Most
of the key residues contribute to the stabilization of the
β-sandwich via hydrophobic side chains packed between
the sheets. The tendency for conserved residues to occur in
secondary structures, α-helices or β-strands, has been
observed in many studies; however, the conservation of
residues or amino acid property group in the turn and loop
region is less documented. The loop regions have in
general been observed to be less conserved and serve as
ligands or ligand binding sites. In TNfn3, the RGD and
IDG tripeptide binding sites for integrin are located on the
F-G and B-C loop, respectively. In our study, we did not
identify any positions as CKAAPs on those two loops,
suggesting that these positions are not key across the
complete superfamily. Rather, we identified conserved
positions on the E-F loop (Leu863) and A-B loop (Thr817).
Closer examinations of these positions in the loop region
show that Leu863 and Thr817 have a hydrogen bond
interaction that may be very important for the turn (Fig.
4a). Together with the hydrogen bond interactions between Ala819 and Ile860 (both residues identified as
CKAAPs by our work and by Halaby and Mornon61)
providing stability to the β-sheet, the Leu863-Thr817
hydrogen bond interaction could further contribute to the
conformational stability of this region. In Table III, the
position marked as j (Thr817) has several other possible
amino acids, Gly with a score of 31, Asp 5, Asn 4, and Thr 0.
These residues are found in the small residue group (see
Table III). Thus, across substructures, Gly has the highest
chance of being observed, suggesting the need for a small
residue suitable for a tight turn.
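Scores of the kind quoted above come from the weighted log odds of Eq. 4. The following Python sketch shows one way that formula could be evaluated; it is an illustration under stated assumptions, not the code behind Table III. Here N_l is read as the mean occurrence over amino acids at position l, N_m as the mean occurrence of amino acid m over positions, and the logarithm base (not given in the text) is left as a parameter that defaults to the natural logarithm. The function name `weighted_log_odds` is hypothetical.

```python
import math


def weighted_log_odds(occurrence, log=math.log):
    """Evaluate H_lm = (|N_lm - N_l| / 10) * log(N_lm / N_m) for every
    position l and amino acid m of an occurrence matrix N_lm.

    Entries for which the logarithm is undefined (zero occurrence) are
    returned as None.
    """
    n_pos, n_aa = len(occurrence), len(occurrence[0])
    # Assumption: N_l is the mean over amino acids at position l.
    pos_mean = [sum(row) / n_aa for row in occurrence]
    # N_m: average occurrence of amino acid m over all positions.
    aa_mean = [sum(row[m] for row in occurrence) / n_pos for m in range(n_aa)]
    table = []
    for l, row in enumerate(occurrence):
        scores = []
        for m, n_lm in enumerate(row):
            if n_lm <= 0 or aa_mean[m] <= 0:
                scores.append(None)
            else:
                scores.append(abs(n_lm - pos_mean[l]) / 10 * log(n_lm / aa_mean[m]))
        table.append(scores)
    return table


# Two positions and two residue types (hypothetical values).
h = weighted_log_odds([[900.0, 100.0], [100.0, 900.0]])
print(h[0])   # enriched residue scores positive, depleted residue negative
```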
There are two recent reports related to the work presented here.31,61 These articles describe conserved residues in commonly occurring folds in proteins using different methods from that reported here. The report by Halaby
et al.62 discussed only the conserved residues in the Ig fold
formed by functionally varied protein sequences. Mirny
and Shakhnovich31 discussed the five most commonly
occurring folds, including the Ig fold. They identify key
residues in the fold nucleus by correlating the low-residue
entropy in the homologous sequence space with solvent
exposure. That is, residue positions with low entropy and
more buried in the structure of all families are said to be
universally conserved residues important to the fold
nucleus. Results presented here are compared with the
results from these approaches. The same core hydrophobic
residues identified by Mirny and Shakhnovich31 (Ala819,
Ile821, Trp823, Leu835, Val871, and Leu873) are all
154
B.V.B. REDDY ET AL.
Fig. 3. a: Stereo view of dissimilar subsequences that form the highly conserved immunoglobulin (Ig) fold (DC01). Each subsequence has a unique
color. b: Rotated 90° about the vertical axis. The superimposed substructure is approximately 86 amino acids in length. Shown are 16 representatives
with an RMSD between Cα atoms of ≤3.0 Å and a Z-score > 4.0. Sequence identity between any two sequences is <25%. The PDB ID:chain ID for
each substructure is 1TEN:_, 1AHW:C, 3HHR:C:1, 1ILL:R(193), 1JRH:I, 1ITE:B, 2HFT:_, 1ITE:C, 1EBP:A, 1BP3:B:1, 1BJ8:_, 1CFB:_, 1TTF:_,
1DAN:T, 1B4R:_ and 1CTO:_.
identified as CKAAPs by this study. The topohydrophobic
residues identified by Halaby et al.62 are also identified
(Ile821, Trp823, Leu835, Tyr837, Ile860, Tyr869, Val871,
and Leu873). Residues underlined were identified by all
three studies.
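The overlap among the three studies can be recomputed from the residue lists quoted above. The following Python sketch simply intersects the two published lists; the variable and function names are illustrative, and it relies on the statement in the text that every residue in both lists was also identified as a CKAAP.

```python
# Residues of 1TEN reported by each study, as listed in the text above.
mirny_shakhnovich = {"Ala819", "Ile821", "Trp823", "Leu835", "Val871", "Leu873"}
halaby = {"Ile821", "Trp823", "Leu835", "Tyr837", "Ile860", "Tyr869",
          "Val871", "Leu873"}

# The text states that all residues in both lists are also CKAAPs,
# so the CKAAP set contains at least their union (assumption made explicit here).
ckaaps_subset = mirny_shakhnovich | halaby


def residue_number(name: str) -> int:
    """Extract the numeric part of a label such as 'Ile821'."""
    return int(name[3:])


# Residues reported by all three studies, sorted by sequence position.
common_to_all = sorted(mirny_shakhnovich & halaby & ckaaps_subset,
                       key=residue_number)
print(common_to_all)
```

Under these inputs the intersection contains five residues (Ile821, Trp823, Leu835, Val871, and Leu873), while Ala819 is shared only with Mirny and Shakhnovich and Tyr837, Ile860, and Tyr869 only with Halaby et al.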
The Troponin C EF-hand calcium-binding
substructure
In the case of DC01 (TNfn3), we observed an apparent
separation of structural conservation and functional conservation, where residues known to be important in one
family did not show conservation across the superfamily.
DC05, on the other hand, reveals a case of conservation of
structural as well as functional residues (Fig. 4b). In this
substructure alignment, the chicken troponin C (TnC)
(Glu41→Ala) mutant (1SMG)63 is used as the reference
structure. Troponin C and troponin I (TnI) are very
important in skeletal muscle contraction and relaxation,
which are regulated by calcium.63 TnC has two low-affinity regulatory Ca2+ binding sites located in the N-terminal domain,
each consisting of an EF-hand motif.63–65 Calcium binding
to the EF-hand induces a conformational change in TnC,
exposing the core hydrophobic residues that form the
proposed TnI binding site.66
In a study by Strynadka et al.67 of turkey TnC (5TNC),
the EF-hand residues important for calcium binding and conformational change are examined via
hydrogen bond interactions. We have identified 18 CKAAPs
in TnC; underlined residues were found both in our study and by
Strynadka et al.67 Asp30, Asp32, and Glu41(Ala) belong to the
first EF-hand, of which Gly34, Asp36, and Ser38 are not
identified, but Phe26, Gly35, Ile37, and Leu42 are identi-
TABLE II. Subsequences With <25% Sequence Identity That Represent the DC01 Substructure†
------------r-------m-kjc-l--o--------b-h--------------------n-a--e---p-f-i-d-----------q---g
1TEN:_ DAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFT
1WIT:_ KILTASRKIKIKAGFTHNLEVDFIGADPTATWTVG----DSGAALADAKSSTTSIFFPSAKRADSGNYKLKVKNELGED-EAIFEVI
2VAA:B KTPQIQVYSRHPKPNILNCYVTQFHPPHIEIQMLKNG----KKIPEMSDMSYILAHTEFTPTETDTYACRVKH---DSMEPKTVYWD
1WHP:_ ---VTFTVEGSNEKHLAVLVKYEGDT--MAEVELREHGSD-EWVAMTKG-EGGVWTFDSEEPLQGPFNFRFLTEK-GMKNVFDDV-
1IGJ:A VMTQTPLSLPVSLGDQASISCRSSQSLVYLNWYLQKAGQS-PKLLIYKVGTDFTLKISRVEAEDLGIYFCSQTTHVPPTFGGGTKLE
1SOX:A PVQS-AVTQPRVPELTVKGYAWSGGGREVVRVDVSLDGGR-TWKVARLMGDALWELTVPVEATELEIVCKAVDS--SYNVQAWHRVR
1IGY:B ---LQESGAELARPSVKMSCKASGYTTYTIHWIKQRPGQGL-EWIGYINPSTANIHLSSLTSDDSAVYYCVRE--GEVPYWGQGTTV
1AIF:H ---KLQESGGGLVSMKLSCVASGFTFNNYWMSWVRQSPKGLEWVAEIRLDDSRLYLQMNSLRATGIYYCVLRPLFYYVDYWGQGTSV
1BHG:A YIDDITVTTSVEQSGLVNYQISVKGNLFKLEVRLLDAE--NKVVANGTG--TQGQLKVPGVSLYLYSLEVQLTAQPVSDFYTLPVGI
1VSC:A FKIETTPRYLAQIGSVSLTC-STTGCESPFFSWRT--QIDSPLNKVTNEGTTSTLTMNPVSFGNEHSYLCTATC-ESRKLEKGIQVE
1SEB:E VPPEVTVLTNSPPNVLICFIDKF--TPPVVNVTWLRN--GKPVTVSETVFLFRKFHYLPFLPSDVYDCRVEHW---GLDEPLLKHWE
1EUT:_ ICAP-FTIPDVALVTVPVAVTNQSGIVPKPSLQLD-ASPDWQVQGSVEPLMQAKGQVTITVPGRYRVGATLRT---SAGNASTTFTV
1WIQ:B VFGLTANSDHLLQGQSLTLTLESPPGSSPSVQCRSPR----GKNIQGGK----TLSVSQLELQDSGTWTCTVLQNQKKV-EFKIDIV
1HWG:B DPPIALNWTLLNHADIQVRWEAPRNMVLEYELQYKEVNETKWKMMDPIL--TTSVPVYSLKVDKEYEVRVRSKQSGNYGEFSVLYVT
1ILM:G WAPENLTLHKLSESQLELNWNNRFLLEHLVQYRTDWDHS-WTEQSVDYR---HKFSLPSVDGQKRYTFRVRSRFAQHWSEWSPIHWG
1AKJ:D -SQFRVSPRTWNLGTVELKCQVLLSNPSGCSWLFQPRGAAASPTFLLYLSDTFVLTLSDFRRENEGYYFCSALSNSIMYFSHFVPVF
2HFT:_ VAAYNLTWKSTN-FKTILEWEPKPVNQ-VYTVQISTKSGD-WKSKCFYTT-DTECDLTDEDVKQTYLARVFSYPXXXEPLYENSEFT
4KBP:B APQQVHITQGDLVGRAMIISWVTMDEPGSSAVRYWSEKNG-RKRIAKGKMSIHHTTIRKLKYNTKYYYEVGLR----NTTRRFSFIT
1IGE:A ---VSAYLSRPSPPTITCLVVDLAPSKGTVNLTWSRASG-KPVNTRKEEKQTVTSTLPVGTRGETYQCRVTH-----PHRALMRSTT
1BGM:L QISDFHVATRFFSRAVLEAEVQMCGEYLRVTVSLWQG--ETQVASGTAPFGGVTLRLNVENPKLLYRAVVELHTDGTLIEAEACDVG
1CID:_ ------ITAYKSEGSAEFSFPLNLGEESLQGELRWKAEPSSQSWITFSLKLPLTLQIPQVSLQFAGSGNLTLTLD-RGILYQEVNLV
2ISD:A ------------LRVRIISGQQLPNSIVDPKVIVEIHGVGTGSRQTAVITNPRWDMEFEFEVTALVRFMVEDYDSSSNDFIGQSTIP
1AO2:N PMVERQDTDSCLVYGGQQMILTGQNFTESKVVFTEKTTDQQIWEMEATVDKLFVEIPEKHIRTPVKVNFYVIN-GKRKRSQPQHFTY
1BEC:_ AVTQSPRNKVAVTGGKVTLSCQQTNNHNNMYWYRQ-DTGHGLRLIHYSYQEQFSLILELATPSQTSVYFCASGGGAEQFFGPGTRLT
1AH1:_ ---VAQPAVVLASGIASFVCEYASPGKEVRVTVLRQADSQVTEVCAATYMMQVNLTIQGLRAMTGLYICKVELMYPYYLGIGNGTQI
1DAN:U GQPTIQSFE-QVGTKVNVTVEDERTFDLIYTLYYWXXXXSG-KKTAKTN--TNEFLID-VDKGENYCFSVQAVIPNRKSTDSVECM
2RAM:A LKICRVNRNSGSCLGDEIFLLCDKVQKEDIEVYFTG---PGWEARGSFSQADVFRTPPPSLQAPVRVSMQLRRPSDRELSEPMEFQY
2PCY:_ SLAFVPSEFSISPGEKIVFKNNA---GFPHNIVFDSIPSGVDASSMLLNAKGETFEVA--LSNKGEYSFYCSP----HAGMVGKVTV
1GOG:_ -PKITRTSTQSVKVGGRITISTDS---SISKASLIRYGDQRRIPLTLTNNGSYSFQVPSDSLPGYWMLFVMNS---AGVPSVASTIR
1EBP:A DAPVGLVARLAG--HVVLRWLPPPETHIRYEVDVSAGNGAGSVQRVEILEGRTECVLSNLRGRTRYTFAVRARMGGFWSAWSPVSLL
1CTO:_ ---PMLQALDIGPGCLWLSWKPWKYMEQECELRYQPQLKGANWTLVFHLPSSKQFELCGLHQAPVYTLQMRCIRPGFWSPWPGLQLR
1NCG:_ -----IPPINLPENSELVRIRSGRDLSLRYSVTGPGA-DQPPTGIINPI--SGQLSVTKPLDRARFHLRAHAVDINGNQNPIDIVIN
1FNF:_ PPPTDLRFTNIGPDTMRVTWAPPPIDLTNFLVRYSPVKNEEDVAELSISPSDNAVVLTNLLPGTEYVVSVSSVYEQHESTPLRGRQK
1IAM:_ WTPERVELAPLPSLTLRCQVEGGAPRAQLTVVLLRGE----KELKREPAVGEAEVTTTVLHHGAQFSCRTELDLRPQELFENTSPYQ
1GGT:A DMDFEVEN--AVLGKDFKLSITFRNTITAYLSANITFYGVPKAEKKETFDVEAVLIQAGQLLEQASLHFFVTARIRDVLAKQKSTVL
1TTF:_ DVPRDLEVVAATPTSLLISWDAPAVTVRYYRITYGETGGNSPVQEFTVPGSKSTATISGLKPGVDYTITVYAVTGASSKPISINYRT
1TLK:_ KPYFTKTILDMDVAARFDCKVEG---YPDPEVMWFKDD---NPVKIDYDEEGNCSLTISEVCGDAKYTCKAVN---SLGEATCTAEL
1ILN:B KPFENLRLMAPETHRCNISWEISQAYFERHLFEARTLSPGHTWEEAPLTLKQEWICLETLTPDTQYEFQVRVKPLQTWSPWSQPLAF
1TIT:_ IEVEKPLYGVEVFVTAHFEIELS--EPDVHGQWKLKGQ---PLTAIIEDGKKHILILHNCQLGMTGEVSFQA-----ANAKSAANLK
1ZXQ:_ PPRQVILTLQPTLSFTIECRVPTVEPLDLTLFLFRG----NETLHYETFGKATATFNSTADRGHRNFSCLAVLDLMNIFHKHSAPKM
1HNG:A MVSKPMIYWECSNATLTCEVLEG---TDVELKLYQG----KEHLRSLR---QKTMSYQWTN-LRAPFKCKAVN---RVSQESEMEVV
†The PDB code is followed by the sequence represented by the single letter amino acid code. The conserved key amino acids are shown
in the first row, where “a” is most conserved and “r” is the least conserved.
1TEN_: Tenascin (Third Fibronectin Type III Repeat); 1WIT_: Twitchin 18Th Igsf Module; 2VAAB: Mhc Class I H-2Kb Heavy Chain;
1WHP_: Allergen Phl P 2; 1IGJA: Fab (Igg2A) Fragment (26–10) Complex With Digoxin; 1SOXA: Sulfite Oxidase; Oxidoreductase; 1IGYB:
Igg1 Intact Antibody Mab61.1.3; 1AIFH: Anti-Idiotypic Fab 409.5.3 (Igg2A); 1BHGA: Beta-Glucuronidase (Glycosidase); 1VSCA: Vascular
Cell Adhesion Molecule-1; 1SEBE: Hla Class II Histocompatibility Antigen; 1EUT_: Sialidase; Neuraminidase; Hydrolase; 1WIQB: T-Cell
Surface Glycoprotein Cd4; 1HWGB: Growth Hormone; 1ILMG: Interleukin-2 (Model); 1AKJD: Mhc Class I Histocompatibility Antigen;
2HFT_: Human Tissue Coagulation Factor; 4KBPB: Purple Acid Phosphatase; 1IGEA: Fc Fragment (Ige’Cl); 1BGML: Beta-Galactosidase
(O-Glycosyl); 1CID_: T-Cell Surface Glycoprotein CD4; 2ISDA: Phosphoinositide-Specific Phospholipase C, Isozyme 1; 1AO2N: Nfat-DNA
Binding Domain; 1BEC_: 14.3D T Cell Antigen Receptor; 1AH1_: Ctla-4 N-Terminal Immunoglobulin V-Like; 1DANU: Blood Coagulation
Factor Viia; 2RAMA: Transcription Factor Nf-Kb P65; 2PCY_: Plastocyanin; 1GOG_: Galactose Oxidase (Oxidoreductase); 1EBPA:
EPO-receptor (Cytokine Receptor/Peptide); 1CTO_: Granulocyte Colony-Stimulating Factor—Receptor; 1NCG_: Neural Cadherin
Domain 1; 1FNF_: Fibronectin; 1IAM_: Intercellular Adhesion Molecule-1; 1GGTA: Coagulation Factor Xiii (A-Subunit Zymogen);
1TTF_: Fibronectin (Tenth Type III Module); 1TLK_: Telokin; 1ILNB: Interleukin-2 Complex (Cytokine—Model B); 1TIT_: Titin Ig
Repeat-27 (Connectin-I27); 1ZXQ_: Intercellular Adhesion Molecule-2; 1HNGA: Cell Adhesion Molecule Cd2; 1ASOA: Ascorbate
Oxidase (Oxidoreductase); 1ITEC: Interleukin-4 Receptor; 1SVB_: Tick-Borne Encephalitis Virus Glycoprotein; 1ALS_: Fceri (Ige)
(Subunit, Extracellular Region); 1CFB_: Drosophila Neuroglian (Fibronectin Type III Repeats); 1KOA_: Twitchin (Kinase Fragment);
1AAC_: Amicyanin (Electron transport); 1ILNG: Interleukin-2 Complex (Cytokine); 1NEU_: Myelin PO Protein.
fied as part of the EF-hand. Asp66, Asp68, Ser70, Asp74,
and Glu77 belong to the second EF-hand, of which Thr72 is
not identified, but Ile73, Phe75, and Phe78 are. Clearly,
Phe26, Ile37, Leu42, Ile73, Phe75, and Phe78 are involved
in the formation of the hydrophobic core and may not
directly participate in calcium binding. Therefore, in this
Fig. 4. Stereo views of tenascin (TNfn3) and troponin C (TnC). CKAAPs are rendered as labeled sticks, and their van der Waals surfaces are
rendered according to secondary structures: red, helices; blue, strands; and gray, loops. The EF-hand of troponin C is colored yellow. a: Extracellular
matrix protein tenascin (1TEN). From the reader’s view the right sheet comprises strands ordered A, B, and E away from the reader and the left sheet
strands ordered G, F, C, and C′ (an additional strand) away from the reader. b: Calcium-regulated muscle protein troponin C (1SMG), the E41→A
mutant; NMR structure model 1 is used.
case, our method not only identified the residues contributing to the hydrophobic core formation but also the functionally conserved residues needed for calcium binding.
CKAAPs Versus Nucleation-Stabilization Centers
Predicted by Using Other Methods
The CE structure alignment procedure allows us to
obtain structurally homologous protein sequences for complete polypeptide chains by setting higher thresholds on
RMSD and lower thresholds on Z-scores. This differs from
the substructures, which have a higher level of structural
homology but only over a fragment of the complete polypeptide chain. The result is longer aligned sequences but
possibly less accurate alignments. The question then
becomes whether useful CKAAPs can still be derived from
these alignments. To address this question, we
obtained the maximum available number of structural
neighbors (4 or more) with low sequence identities to
chymotrypsin inhibitor, CheY signal transduction protein,
cytochrome c, and the ubiquitin family of proteins
(Table IV). We then followed the same procedure
used for the common substructures to identify
CKAAPs for each family of proteins. We have compared
the kinetically hot residues24 and folding nucleus residues
TABLE III. Weighted Log Odd Values (Hlm) for CKAAPs†
Pos   A   V   L   I   F   Y   W   S   T   C   M   N   Q   D   E   H   K   R   G   P  Ra  Rg
a    -1   1  13  26   0   0  -5  -9  -1  -3   6   0   0  -4  -3  -1  -1   0 -19   0  55  39
b    -5  36   0   4   0   0  14  -1   0  -3   0  -2   0  -3   0  -1   0 -12 -16 -16  68  22
c   -10   2  15   9   1  -5  -5  -3   0  11   0   0  -4   0  -2  -4   0  -3 -19 -16  51  31
d     0  29   0  -2   0   0  -1  -3  -3  40   0 -11   0 -12  -8  -2   0   0  -5 -13  66  12
e     1   4  22   0   3 -10  -5  -2   0  -1   0  -1  -1   0   0   0  -6  -2  -3   0  54  20
f     0  -3  -1  -4   0  56  -1  -3  -1  -1  -1   0  -1  -5  -1   0   0  -3   6   0  62   8
d    -1   3   8   4   0  -5  11  -4   0  -3   0  -4   0  -5  -2  -1  -1   0   0  -3  43  28
e     0   0   3   1   7  20  10  -3  -9  -1   0  -3  -1 -10 -18   0   0   2  -2 -16  44  26
f     1   0   0   2   9  31   0  -1  -3   4  -2  -1  -4  -7 -11   0   0  -4  -1 -16  47  22
g     0   7   5   3   0 -10  -1   0   0  -3   0   0   0  -4   1   0   0   0 -19  -7  43  27
h     2   5  11   0  -2  -6   0  -2   0   0   7  -3  -2   0  -2  -4   0   0   0 -10  46  23
i    -1   5  -1   1   0   0  43   0   0  26  -1  -6  -1 -12  -6   0   0  -2  -2 -16  52  16
j     0  -6  -5  -4   0  -5  -5   0   0  -3  -2   4  -1   5   0   0  -1   0  31   0  54  12
k     0   0  14   2   5   5   0  -1  -1   8   1  -5   0  -1  -2  -4  -2  -1  -8 -16  44  20
l     2  -3   0  -3   0 -10   0   3   0  -1   0   0   0   0   0  -1   0   0   3   5  33  28
m     1   0 -25  -3  -4   0   0  -4   7  -3   0   0   0  19   0  -1   0  -1   0   0  43  17
n     0  -5   0  -1   0  -1  -5  10   0  -1   1   0   0  -5  -2   0   0   0   3   1  38  21
o     0   0  11   0   0  -2   0  -3   0  -3   0  -5   4  -3  -2   0   0  -1  -8   3  42  16
†Values are given in decreasing order of their (Ra + Rg) values.
in the ubiquitin family of proteins reported by Michnick
and Shakhnovich30 to the CKAAPs and observed that at
least 75% of the residues are common. This indicates a
high level of consistency, obtained with a method that is
computationally tractable and can be applied across a wide
range of superfamilies for which multiple structures can
be aligned. We have compared CKAAPs with conservatism-of-conservatism (CoC) residues31 for the most commonly
occurring folds identified by both methods (Fig. 5). We
predicted all the CoC residues as CKAAPs, plus some
additional residues within our arbitrary 20% cutoff. Most of our
additionally predicted amino acid positions are in the
terminal regions of rigid secondary structural elements.
Mirny and Shakhnovich31 identify the positions that are conserved
and unexposed to the solvent, whereas we take the first
20% of the conserved positions to be important for the fold.
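The 20% cutoff can be illustrated with a short Python sketch. It ranks aligned positions by a single conservation score and keeps the top fifth. The function name, the example scores, and the choice to round the count upward are assumptions made for illustration only; the conservation measures actually used in this work are not reproduced here.

```python
def select_ckaaps(scores, percent=20):
    """Return the indices of the top `percent` of positions, ranked by score.

    `scores` holds one conservation score per aligned position (higher means
    more conserved). Rounding the count upward is an assumption of this sketch.
    """
    n_keep = (len(scores) * percent + 99) // 100  # integer ceiling
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical conservation scores for a 10-position alignment.
example_scores = [12, 94, 30, 8, 90, 15, 22, 5, 41, 3]
print(select_ckaaps(example_scores))
```

With this rounding rule an alignment of 87 positions yields 18 selected positions, which happens to match the 18 letters (a to r) shown for DC01 in Table II; a different rounding rule would give 17.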
Similar CKAAPs in Related Substructures—the
Continuity of Fold Space
The substructures obtained through structural alignment using CE show some degree of overlap.38 The question then becomes whether the CKAAPs identified in two overlapping
substructures are the same. We have compared the CKAAPs
for TNfn3 (1TEN) and the vascular cell adhesion molecule
(VCAM, 1VSC) from substructures DC01 and DC30, respectively (see Shindyalov and Bourne38 for details). Both are
derived from the Ig fold, illustrating the continuity of
fold space. CKAAPs of VCAM identified by aligning
against TNfn3 are marked in Table V(a).
When VCAM is used as the reference structure in DC30,
the key positions identified are slightly different (Table
V(b)). However, 10 of the residues identified are identical.
In addition, when the results for VCAM are compared with
those identified by Halaby et al.,62 all but two of the topohydrophobic
positions (underlined residues in Table V(b))
are identified. In particular, both
cysteines that are important in disulfide bridge formation
in VCAM are identified as CKAAPs in the reference
structure and in analogous structures.
Mutations of CKAAPs and the Effects on Structure,
Function, and Stability of Proteins
We have searched the protein mutation database
(PMD)68 and the relevant literature to find reports on
proteins in which the residues we predict as CKAAPs are
mutated and any observed effects on structure, function,
and stability of the protein. We discuss here two examples
of such mutation studies.
Arc repressor protein
Arc repressor of bacteriophage P22 has 53 residues and
exists as a homodimer with a simple architecture consisting of a β-sheet and four α-helices. Alanine-scanning
substitution mutations were made, and the effects on
folding, stability, and function of this protein have been
reported.33 Using CE we identified 15 structural homologues in the PDB aligned to residues 7–46 of the Arc
repressor protein. The CKAAPs for this protein are found
to be G30, V25, S32, E36, R31, R40, S35, V22, and S44 in
the order of their conservation. The alanine-scanning
mutations for this protein were done one residue at a time.
Of the nine CKAAPs mutated, seven alanine substitution
mutants (G30A, V22A, S32A, E36A, R31A, R40A, and
S44A) were found to decrease the equilibrium stability and
cause significant increases in the rate of unfolding.69 In
other words, compared with other mutants, CKAAP mutants exhibit more severe perturbations in protein stability. The implication is that CKAAPs indeed play a significant role in structural stability and fold architecture.
B1 Domain of streptococcal IgG-binding protein G
versus Rop protein
Dalal et al.34 converted an α/β-sheet protein, the
B1 domain of Streptococcal IgG-binding protein G (1PGA),
TABLE IV. Determination of CKAAPs in Complete Polypeptide Chains†
A. Ubiquitin family CKAAPs
          --fca---------j-d--------o--------ml--n---i------b--------------kgh-e----
1TBE:A  1 MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLR
1RAX:A 29 CIIRVSLDVNMYKSILVTSQDKAPAVIRKAMDKHNLEPEDYELLQIKLKIPENANVFYAMNSANYDFVLKKRTF
1RRB:_ 20 -TIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLARLDWNTDAA--SLIG-EELQVDFLK
1A5R:_ 24 IKLKVIGQDSSEIHFKVKMTTHLKKLKESYCQRQGVPMNSLRFLFEGQRIADNHTPKELGMEEEDVIEVYQE-
          --*-*---------*-*--------*---*------------*-----------------------*-*-----
B. CKAAPs compared with kinetically hot residues
†(A) Ubiquitin and its structural homologues (<25% sequence identity). The RMSD, sequence identity, length of alignment, and Z-score values for
each structural alignment are given. For the other two sequences of the ubiquitin-like superfamily only CKAAPs and nucleation residues
identified by other methods are given. (B) CKAAPs are compared with kinetically hot residues in proteins of the chymotrypsin inhibitor family of
protein structures. Letters above CKAAPs represent the descending order of (Ra + Rg) values (a highest). The conserved amino acids with
potential nucleation sites identified by other methods are marked with a * in the last row of each sequence alignment. ^ represents an important
site peculiar to the specific structure.
to a homodimeric helix-turn-helix Rop-like protein
(1ROP) through substitution mutation of <50% of the
amino acids (PGA-m). The B1 domain is 56 amino acids
in length and the helix-turn-helix motif of Rop is
confined to the first 56 N-terminal amino acids. We have
taken the 26 nonhomologous subsequences from the CE
alignments of 1ROP, identified CKAAPs, and calculated the frequency-based log-odds table. Similarly, we
obtained CKAAPs and the log-odds matrix for the 11
nonhomologous CE alignments of 1PGA. Of the 28
substitution mutations in PGA-m, 12 are observed to be
CKAAPs of either 1PGA or 1ROP (Table VI(a)). Eight
mutations correspond to the CKAAPs of 1ROP and 7
mutations correspond to the CKAAPs of 1PGA. There
are four common CKAAPs for 1PGA and 1ROP corresponding to three mutation sites. For the CKAAP positions of PGA-m, we computed the sum
score of log-odd values with both the 1PGA-based and the 1ROP-based log-odds matrices, compared these sums with those of the
corresponding native proteins, and observed an interesting relationship. The sum score of log-odd values of
PGA-m converges significantly toward a Rop-like structure (Table VI(b)). The implication is that the mutations
made by Dalal et al.34 disturbed a significant number of
CKAAPs in such a way that the mutant protein is now
less inclined to form an α/β-structure and more inclined
to form a helix-turn-helix structure. Such a result points
toward the potential usefulness of CKAAPs in protein
design. In fact, using CKAAPs we suggest a minimum
set of 12 substitution mutations to 1PGA to engineer a
structural change from the 1PGA α/β structure to the
1ROP-like helix-turn-helix motif (Table VI(b)). Amino
acid residues for substitution are selected such that the
sum score of log-odd values of amino acids for the
CKAAPs of 1ROP is maximized and that of the CKAAPs of 1PGA is minimized. This was observed to be
consistent with similar experiments performed by other
groups.70
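The sum score used in this comparison can be sketched in a few lines of Python. The function below adds up, over the CKAAP positions of a fold, the log-odds value of whichever residue a sequence carries at each position. The two small matrices and the five-residue sequences are hypothetical and are not the values behind Table VI; scoring a residue that is absent from a matrix entry as 0 is also an assumption of this sketch.

```python
def sum_log_odds(sequence, log_odds):
    """Sum the log-odds values of the residues found at each CKAAP position.

    `log_odds` maps a 0-based sequence position to a dict of
    amino acid -> log-odds value. Residues missing from a dict score 0
    (an assumption of this sketch).
    """
    return sum(table.get(sequence[pos], 0) for pos, table in log_odds.items())


# Hypothetical two-position matrices for two folds (illustrative values only).
fold_a = {1: {"L": 10, "G": -8}, 3: {"V": 6, "K": -4}}
fold_b = {1: {"G": 9, "L": -5}, 3: {"K": 7, "V": -3}}

native = "ALAVA"  # carries the residues preferred by fold A
mutant = "AGAKA"  # substituted at both CKAAP positions

for name, seq in (("native", native), ("mutant", mutant)):
    print(name, sum_log_odds(seq, fold_a), sum_log_odds(seq, fold_b))
```

In this toy case the substitutions lower the fold A score and raise the fold B score, which is the same kind of shift reported for PGA-m relative to 1PGA.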
Fig. 5. CKAAPs compared with key residues identified by other methods. a: Conservatism-of-conservatism (CoC) amino acids20 and those
identified by Clarke et al.32 1TEN, tenascin, representative of Ig fold (DC01). The color coding of Cα atoms is as follows: purple, CKAAPs; green, CoCs;
red, Clarke et al.; pink, CKAAPs + Clarke et al.; blue, CoC + Clarke et al.; black, identified by all three methods. b: Conservatism-of-conservatism (CoC)
amino acids.20 2ACY, acylphosphatase, representative of α/β plaited structures (not in subdomain gallery). The color coding of Cα atoms is as follows:
yellow, CKAAPs + CoCs; purple, CKAAPs only.
Structural Environment of CKAAPs
To further evaluate the structural environment of CKAAPs,
we compared various environment-dependent parameters
for the amino acids present in CKAAPs in all substructures against all amino acids. The hydrophobic amino
acids L, V, and I have a higher relative occurrence
compared with their composition in a nonredundant data
set (Fig. 6). The charged and polar amino acids K, R, D, E,
N, P, Q, S, and T show a considerably lower frequency of
occurrence in CKAAPs. This finding supports the well-known
observation that amino acids in the hydrophobic
core play a key role in the structural integrity of a protein.
The proportion of amino acids present in CKAAPs is
higher in terminal regions of the rigid secondary structure
elements, α-helices and β-strands (not shown), and in turns
and loop regions of the protein structures (Fig. 7a). The
proportion of these residues is also
higher for two and three hydrogen-bonding interactions
per residue (Fig. 7b). Thus, the charged groups of CKAAPs
are better neutralized by dipole interactions. The Ooi
TABLE V. CE Alignments Showing CKAAPs for the Substructure Represented by 1VSC Using (a) 1TEN as the
Reference Structure and (b) 1VSC as the Reference Structure (Underlined)†
(a)
1TEN  DAPSQIEVKDVTDTTALITWFKPLAEIDGIELTYGIKDVPGDRTTIDLTEDENQYSIGNLKPDTEYEVSLISRRGDMSSNPAKETFT
1VSC  FKIETTPRYLAQIGSVSLTC-STTGCESPFFSWRT--QIDSPLNKVTNEGTTSTLTMNPVSFGNEHSYLCTATC-ESRKLEKGIQVE
      -----*-------*-***-*--*--------*-*--------------------*-*--*---*-*-*-*-----------*---*-
(b)
1VSC     ETTPESRYLAQIGDSVSLTCSTTGCESPFFSWRTQIDSPLNGKVTNEGTTSTLTMNPVSFGNEHSYLCTATCESRKLEKGIQVEIYS
         -------*-*--*--*-*-*---*-------*--*-----------------*-*--*---*-*-*-*--------------*-*--
1VSC  FKIETTP--RYLAQIG-SVSLTCSTTGCESPFFSWRTQIDSPLN-KVTNEGTTSTLTMNPVSFGNEHSYLCTATCESRKLEKGIQVE
†CKAAPs (bold characters in the original) are marked with * in the row beneath the sequences in (a) and beneath the reference sequence in (b). For comparison in (b), 1VSC is copied from (a) except corresponding deletions are
represented by hyphens. The underlined residues are identified by Halaby et al.62
TABLE VI. Comparison of CKAAPs With the Substitution Mutations Made to 1PGA to Meet Paracelsus
Challenge†
(a)
1PGA CKAAPs              MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE
                         ------*--------*-*--***--**--*---*--------*--*----------
PGA-m of Dalal et al.33  MTKKAILALNTAKFLRTQAAVLAAKLEKLGAQEANDNAVDLEDTADDLYKTLLVLA
1ROP CKAAPs              GTKQEKTALNMARFIRSQTLTLLEKLNELGADEQADICESLHDHADELYRSCLARF
                         ----*---*-------------**-*---*----------**-**---*--*----
PGA-M Sequence           MTYKLIANIKTLKGENTTEAVDIATIDKVGKQYTNDNGVDIASTYKDATKTFTVTE
(b)
Log-odds matrix \ Sequence   1PGA   1ROP  PGA-m  PGA-M
ΣHlm of ROP-CKAAPs            -18     77     67     97
ΣHlm of PGA-CKAAPs            197     -2     87   -123
†(a) Comparison of substitution mutations to the 1PGA sequence (PGA-m) by Dalal et al.34 and the CKAAPs (bold in the original; marked with * beneath the sequence) identified for the
1PGA and 1ROP sequences. The suggested 12 substitution mutations of the 1PGA sequence to convert it to a 1ROP-like structure are
shown in the last row (PGA-M). (b) Sum score of log-odd values for the amino acids at CKAAPs based on 1PGA and 1ROP structural
homologues with <25% sequence identity. The mutations made by Dalal et al.33 (PGA-m) increased the sum score of ROP-CKAAPs
residues and decreased the sum score of PGA-CKAAPs. We suggested a sequence, PGA-M, with only 12 substitution mutations at the
CKAAPs by optimizing the sum score more toward a ROP-like structure.
values (Fig. 7c) indicate that CKAAP residues are significantly more buried than residues overall, in keeping with
their presence in the hydrophobic core. Finally, the solvent-accessible contact area of CKAAPs does not show much
difference compared with amino acids in a random data set
(Fig. 7d).
In summary, the structural environments of CKAAPs
show no change in the normal pattern of solvent accessibility; however, CKAAPs are predominant in the terminal
regions of rigid secondary structural elements. The Ooi
number shows that CKAAPs are mostly surrounded by
other amino acids and that charged groups on the amino
acids are better neutralized by hydrogen bonding interactions.
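As an illustration of the burial measure discussed above, the following Python sketch computes an Ooi-type number for each residue as the count of other Cα atoms within 8 Å. The function name and the toy coordinates are hypothetical, and excluding the residue itself and treating the cutoff as inclusive are assumptions of this sketch rather than details taken from this work.

```python
import math


def ooi_numbers(ca_coords, radius=8.0):
    """Count, for each residue, the other C-alpha atoms within `radius` angstroms."""
    counts = []
    for i, a in enumerate(ca_coords):
        neighbors = sum(
            1
            for j, b in enumerate(ca_coords)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(neighbors)
    return counts


# Hypothetical C-alpha coordinates (angstroms): six atoms on a line, 3.8 apart.
coords = [(3.8 * k, 0.0, 0.0) for k in range(6)]
print(ooi_numbers(coords))
```

For real structures the coordinates would come from the Cα records of a PDB entry; residues in the core then receive larger counts than residues on the surface.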
Usefulness of CKAAPs for Protein Engineering and
Fold Recognition
The recognition of CKAAPs has implications in protein structure prediction and in the design of new
protein sequences to achieve a desired folded architecture. The amino acids present as CKAAPs are mostly
important for the integrity and stabilization of the
common substructure, thus allowing specific mutations
at other locations with minimum distortion to the
overall protein structure. Such mutations could be used
to either engineer altered functions or to design a new
function. CKAAPs could also be used to engineer stability into less stable proteins by appropriately substituting
nonoptimal amino acids at CKAAPs. These conserved
amino acids may also be useful in fold recognition and
structure prediction studies.
CONCLUSIONS
The CE structure comparison algorithm identifies
similar substructures formed by dissimilar subsequences.
We have presented a sequence space scanning procedure
to identify conserved key amino acid positions (CKAAPs)
in these commonly occurring protein substructures. We
propose that CKAAPs are important for structural integrity and for nucleation and stabilization of proteins.
Tertiary structure formation from primary amino acid
sequence can be explained by two different models. (i) A
global model in which a fold is formed by interactions
that involve the entire sequence. The global model is
supported by mutation studies showing that, in some proteins, mutations
at any position in the sequence have no measurable
impact on the fold. (ii) A local model in
which fold specificity is coded only within a few critical
residues (10–20%) of the sequence. The local model is
suggested by studies of sequence versus structure similarity showing that naturally occurring protein sequences with 25–90% sequence identity have no
significant change in fold, yet below 25% radical changes
in fold can occur. This observation has been the basis for
the successful use of homology-based protein modeling when
the structure being modeled has significant sequence
identity to an existing structure. Our analysis does not
necessarily contradict either of these models but bridges
Fig. 6. Histogram showing the composition of amino acids in CKAAPs
relative to their composition in a random (natural) set of nonhomologous
protein structures.
them as we propose a hierarchy of position-specific
residues important for a given fold. In other words,
many of the residues can have an impact on folding, but
some clearly have a greater impact than others. This
study clearly lacks a good statistical treatment and
leads to many questions. For example, what is the
minimum number of aligned structures, and over what
alignment length, needed to provide useful CKAAPs? Answers to
these questions are part of an ongoing study. It can be
stated at this point that the iterative random removal of
20% of the structures forming a substructure will lead to
10% of the 20% of CKAAPs reappearing at the 95%
confidence level. Clearly, the success of this approach to
assigning residues most likely to impact the folding of a
protein depends on the accuracy of the structure alignments from which the sequence alignments are derived.
Structure alignment is an ongoing area of study in our
laboratory, including multiple structure alignments.71
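The random-removal test mentioned above can be sketched as a resampling loop in Python. Because the conservation measures of this study are not reproduced here, the sketch substitutes plain Shannon entropy per alignment column; that substitution, the function names, the number of trials, and the toy alignment are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def top_conserved(alignment, percent=20):
    """Indices of the lowest-entropy `percent` of columns (count rounded up)."""
    n_cols = len(alignment[0])
    n_keep = (n_cols * percent + 99) // 100
    ranked = sorted(
        range(n_cols),
        key=lambda i: column_entropy([seq[i] for seq in alignment]),
    )
    return set(ranked[:n_keep])


def reappearance_rate(alignment, trials=200, drop_percent=20, seed=0):
    """Mean fraction of full-set positions recovered after random removal."""
    rng = random.Random(seed)
    reference = top_conserved(alignment)
    n_keep = len(alignment) - (len(alignment) * drop_percent) // 100
    fractions = []
    for _ in range(trials):
        subset = rng.sample(alignment, n_keep)  # drop about 20% of sequences
        recovered = reference & top_conserved(subset)
        fractions.append(len(recovered) / len(reference))
    return sum(fractions) / trials


# Hypothetical gap-free alignment of five short subsequences.
toy = ["ACDEFGHIKL", "ACQEFTHIRL", "ACDEWGHVKL", "SCNEFGHIKM", "ACDEFGYIKL"]
rate = reappearance_rate(toy)
print(round(rate, 2))
```

A rate close to 1 would indicate that the selected positions are insensitive to which structures happen to be in the set; a confidence level such as the one quoted above would require a proper statistical treatment on top of this loop.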
The prediction of CKAAPs is more reliable the greater
the number of dissimilar subsequences that form similar
substructures. Because the predictions are based on available sequence and structure space, more sequence and
structure data should provide more reliable prediction of
CKAAPs in the future. CKAAPs for substructures are
already available on the Web at http://ckaaps.sdsc.edu. A
Fig. 7. Composition of amino acids in CKAAPs compared with a random representative set of nonhomologous
protein structures. On the Y-axis is the percentage of amino acids and on the X-axis: a: secondary structural regions
(H helix, E strand, and C coil); b: hydrogen bonding interactions; c: Ooi number in an 8 Å radius around the amino
acid; and d: solvent accessible contact area as a percentage of residue accessibility.
database of CKAAPs72 that can be queried by PDB id is
available from the same URL.
REFERENCES
1. Hilbert M, Bohm G, Jaenicke R. Structural relationships of
homologous proteins as a fundamental principle in homology
modeling. Proteins 1993;17:138 –151.
2. Moult J, Hubbard T, Fidelis K, Pedersen JT. Critical assessment
of methods of protein structure prediction (CASP): round III.
Proteins 1999;37(S3):2– 6.
3. Srinivasan N, Blundell TL. An evaluation of the performance of an
automated procedure for comparative modeling of protein tertiary
structure. Protein Eng 1993;6:501–512.
4. Sanchez R, Sali A. Evaluation of comparative protein structure
modeling by MODELLER-3. Proteins 1997;Suppl.1:50 –58.
5. Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB,
Thornton JM. CATH—a hierarchic classification of protein domain structures. Structure 1997;5:1093 –1108.
6. Hubbard TJ, Ailey B, Brenner SE, Murzin AG, Chothia C. SCOP:
a structural classification of protein’s database. Nucleic Acids Res
1999;27:254 –256.
7. Mizuguchi K, Deane CM, Blundell TL, Overington JP. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci 1998;7:2469 –2471.
8. Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389 –3402.
9. Geetha V, Francesco VD, Garnier J, Munson PJ. Comparing
protein sequence-based and predicted secondary structure-based
methods for identification of remote homologs. Protein Eng 1999;
12:527–534.
10. Wood TC, Pearson WR. Evolution of protein sequences and
structures J Mol Biol 1999;291:977–995.
11. Lattman EE, Rose GD. Protein folding—what’s the question? Proc
Natl Acad Sci USA 1993;90:439 – 441.
12. Bowie JU, Reidhaar-Olson JF, Lim WA, Sauer RT. Deciphering
the message in protein sequences: tolerance to amino acid substitutions. Science 1990;247:1306 –1310.
13. Mathews BW. Genetic and structural analysis of the protein
stability problem. Biochemistry 1987;26:6885– 6888.
14. Russell RB, Barton GJ. Structural features can be unconserved in
proteins with similar folds an analysis of side-chain to side-chain
contacts secondary structure and accessibility. J Mol Biol 1994;244:
332–350.
15. Bork P, Holm L, Sander C. The immunoglobulin fold: structural
classification, sequence pattern and common core. J Mol Biol
1994;242:309 –320.
16. Thomas PJ, Qu BH, Pedersen PL. Defective protein folding as a
basis of human disease. Trends Biochem Sci 1995;20:456 – 459.
17. Doolittle RF. Similar amino acid sequences: chance or common
ancestry? Science 1981;214:149 –59.
18. Chothia C, Lesk AM. The relation between the divergence of
sequence and structure in proteins. EMBO J 1986;5:823– 826.
19. Chothia C, Lesk AM. The evolution of protein structures. Cold
Spring Harb Symp Quant Biol 1987;52:399 – 405.
20. Rost B. Twilight zone of protein sequence alignments. Protein Eng
1999;12:85–94.
21. Shakhnovich EI, Abkevich VI, Ptitsyn O. Conserved residues and
the mechanism of protein folding. Nature 1996;379:96 –98.
22. Ptitsyn OB. Protein folding and protein evolution: common folding
nucleus in different subfamilies of c-type cytochromes? J Mol Biol
1998;278:655– 666.
23. Ptitsyn OB, Ting KH. Non-functional conserved residues in
globins and their possible role as a folding nucleus. J Mol Biol
1999;291:671– 682.
24. Demirel MC, Atilgan AR, Jernigan RL, Erman B, Bahar I.
Identification of kinetically hot residues in proteins. Protein Sci
1998;7:2522–2532.
25. Shakhnovich EI. Folding by association. Nat Struct Biol 1999;6:99 –
102.
26. Mirny LA, Abkevich VI, Shakhnovich EI. How evolution makes
proteins fold quickly. Proc Natl Acad Sci USA 1998;28:4976 –
4981.
27. Rost B. Protein structures sustain evolutionary drift. Fold Design
1997;2:519 –524.
28. Dosztányi Z, Fiser A, Simon I. Stabilization centers in proteins:
identification, characterization and predictions. J Mol Biol 1997;
272:597–612.
29. Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method
defines binding surfaces common to protein families. J Mol Biol
1996;257:342–358.
30. Michnick SW, Shakhnovich E. A strategy for detecting the
conservation of folding-nucleus residues in proteins superfamilies.
Fold Design 1998;3:239–251.
31. Mirny LA, Shakhnovich EI. Universally conserved positions in
protein folds: reading evolutionary signals about stability, folding,
kinetics and function. J Mol Biol 1999;291:177–196.
32. Clarke J, Cota E, Fowler SB, Hamill SJ. Folding studies of
immunoglobulin-like beta-sandwich proteins suggest that they
share a common folding pathway. Struct Fold Design 1999;7:1145–1153.
33. Rose GD. Protein folding and the Paracelsus challenge. Nat Struct
Biol 1997;4:512–514.
34. Dalal S, Balasubramanian S, Regan L. Protein alchemy: changing
beta-sheet into alpha-helix. Nat Struct Biol 1997;4:538–452.
35. Jones DT, Moody CM, Uppenbrink J, et al. Towards meeting the
Paracelsus challenge: the design, synthesis, and characterization
of paracelsin-43, an alpha-helical protein with over 50% sequence
identity to an all-beta protein. Proteins 1996;24:502–513.
36. Yuan SM, Clarke ND. A hybrid sequence approach to the
Paracelsus challenge. Proteins 1998;30:136–143.
37. Shindyalov IN, Bourne PE. Protein structure alignment by
incremental combinatorial extension (CE) of the optimal path. Protein
Eng 1998;11:739–747.
38. Shindyalov IN, Bourne PE. An alternative view of protein fold
space. Proteins 2000;38:247–260.
39. Taylor WR. Classification of amino acid conservation. J Theor Biol
1986;119:205–218.
40. Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJ. Prediction of
protein secondary structure and active sites using the alignment
of homologous sequence. J Mol Biol 1987;195:957–961.
41. Hobohm U, Sander C. Enlarged representative set of protein
structures. Protein Sci 1994;3:522–524.
42. Kabsch W, Sander C. Dictionary of protein secondary structure:
pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers 1983;22:2577–2637.
43. Smith D. SSTRUC: a program to calculate a secondary structural
summary. Department of Crystallography, Birkbeck College,
University of London, 1989.
44. Nishikawa K, Ooi T. Prediction of the surface-interior diagram of
globular proteins by an empirical method. Int J Pept Protein Res
1980;16:19–32.
45. Baker EN, Hubbard RE. Hydrogen bonding in globular proteins.
Prog Biophys Mol Biol 1984;44:97–179.
46. Lee B, Richards FM. The interpretation of protein structures:
estimation of static accessibility. J Mol Biol 1971;55:379–400.
47. Sali A, Blundell TL. Definition of general topological equivalence
in protein structures: a procedure involving comparison of
properties and relationships through simulated annealing and dynamic
programming. J Mol Biol 1990;212:403–428.
48. Richardson JS. The anatomy and taxonomy of protein structure.
Adv Prot Chem 1981;34:167–339.
49. Efimov AV. Structural trees for protein superfamilies. Proteins
1997;28:241–260.
50. Efimov AV. A structural tree for proteins containing S-like
beta-sheets. FEBS Lett 1998;437:246–250.
51. Reddy BVB, Blundell TL. Packing of secondary structural
elements in proteins: analysis and prediction of inter-helix distances.
J Mol Biol 1993;233:464–479.
52. Reddy BVB, Nagarajaram HA, Blundell TL. Analysis of
interactive packing of secondary structural elements in alpha/beta units
in proteins. Protein Sci 1999;8:573–586.
53. Nagarajaram HA, Reddy BVB, Blundell TL. Analysis and
prediction of inter-strand packing distances between beta-sheets of
globular proteins. Protein Eng 1999;12:1055–1062.
54. Reddy BVB, Datta S, Thiwari S. Use of propensity of amino acids
to the local structural environments to understand effect of
substitution mutations on protein stability. Protein Eng 1998;11:
1137–1145.
55. Lesk AM, Chothia C. The response of protein structures to amino
acid sequence changes. Phil Trans R Soc Lond [Biol] 1986;317:345–356.
56. Shortle D. Mutational studies of protein structures and their
stabilities. Q Rev Biophys 1992;25:205–250.
57. Shortle D, Sondek J. The emerging role of insertions and deletions
in protein engineering. Curr Opin Biotechnol 1995;6:387–393.
58. Dice JF. Molecular determinants of protein half-lives in eukaryotic cells. FASEB J 1987;1:349–357.
59. Murzin AG. How far divergent evolution goes in proteins. Curr
Opin Struct Biol 1998;8:380–387.
60. Cordes MHJ, Walsh NP, Knight JM, Sauer RT. Evolution of a
protein fold in vitro. Science 1999;284:325–327.
61. Halaby DM, Mornon JP. The immunoglobulin superfamily: an
insight on its tissular, species, and functional diversity. J Mol Evol
1998;46:389–400.
62. Halaby DM, Poupon A, Mornon JP. The immunoglobulin fold
family: sequence analysis and 3D structure comparisons. Protein
Eng 1999;12:563–571.
63. Gagne SM, Li MX, Sykes BD. Mechanism of direct coupling
between binding and induced structural change in regulatory
calcium binding proteins. Biochemistry 1997;36:4386–4392.
64. Ingraham RH, Swenson CA. Binary interactions of troponin
subunits. J Biol Chem 1984;259:9544–9548.
65. Kretsinger RH, Nockolds CE. Carp muscle calcium-binding
protein II. Structure determination and general description. J Biol
Chem 1973;248:3313–3326.
66. Farah CS, Reinach FC. The troponin complex and regulation of
muscle contraction. FASEB J 1995;9:755–767.
67. Strynadka NC, Cherney M, Sielecki AR, Li MX, Smillie LB, James
MN. Structural details of a calcium-induced molecular switch:
x-ray crystallographic analysis of the calcium-saturated
N-terminal domain of troponin C at 1.75 A resolution. J Mol Biol
1997;273:238–255.
68. Kawabata T, Ota M, Nishikawa K. The protein mutant database.
Nucleic Acids Res 1999;27:355–357.
69. Sauer RT, Milla ME, Waldburger CD, Brown BM, Schildbach JF.
Sequence determinants of folding and stability for the P22 Arc
repressor. FASEB J 1996;10:42–48.
70. Dalal S, Balasubramanian S, Regan L. Transmuting alpha helices
and beta sheets. Fold Design 1997;2:R71–79.
71. Guda C, Scheeff ED, Bourne PE, Shindyalov IN. A new algorithm
for alignment of multiple protein structures using Monte Carlo
optimization. Pacific Symposium on Biocomputing 2001. In press.
72. Li W, Reddy BVB, Shindyalov IN, Bourne PE. CKAAPs DB: A
conserved key amino acid positions database. Nucleic Acids
Research 2001. In press.