38
THE SIGNIFICANCE AND IMPACTS
OF PROTEIN DISORDER AND
CONFORMATIONAL VARIANTS
Jenny Gu and Vincent Hilser
INTRODUCTION
Protein disorder is a topic worth attention from the structural bioinformatics community
largely for the technical challenges it presents to the field, but also for its biological and
functional implications. The success of structural genomic efforts using X-ray crystallography depends on overcoming several potential bottlenecks (Chapter 40), one of which is the
formation of protein crystals that can be obstructed by the presence of highly flexible and
disordered regions. Although such regions limit the number of structures that can be obtained, and thus
the coverage of protein space, our current general understanding of disordered regions is a result of structural bioinformatics efforts that were able to extract and
analyze patterns associated with them. The resulting disorder predictors have proven
useful in advancing our understanding of disordered regions and have the potential to
improve the success rate of structural genomics efforts, particularly those focused on
eukaryotic proteins (Oldfield et al., 2005b).
The importance of resolving differences observed in conformational variants within
protein families and understanding their impacts is also a rising issue. Most structural
genomics efforts aim to solve a representative structure for each protein family to maximize
the coverage of protein space with particular focus on identifying new protein folds.
However, it is equally important to understand structural changes that result from sequence
differences introduced by a few single point mutations, insertions, and/or deletions, since these
can have a large functional impact. Furthermore, it is often overlooked that the structural information recorded in the
Protein Data Bank (PDB) is a macroscopic view of a collection of
microscopic ensembles that give rise to the observed protein structure. In other words, the
Structural Bioinformatics, Second Edition Edited by Jenny Gu and Philip E. Bourne
Copyright 2009 John Wiley & Sons, Inc.
observed protein structure is not the only conformation adopted by the protein. In fact, most
observed biological phenomena are a macroscopic consequence of the collective microscopic states. Understanding the differences in the microscopic states and how the changes
impact the macroscopic event is currently addressed in several ways that will be discussed.
By exploiting the technical weakness in structural data, researchers have been able to
gain insight into the potential biological significance of these otherwise poorly characterized
disordered regions (Ringe and Petsko, 1986). Recognition of the importance of protein
disorder in biological function came as early as the late 1970s, when disordered
regions were seen to recur within particular features of enzymes such as the zymogens of
pancreatic serine proteases and tyrosyl-tRNA synthetases (Blow, 1977). In light of these
investigations, the hypothesis presented at the time was that the reactivity and specificity are
associated with more rigid structures while disordered regions may be involved with control
of the function. Since then, many functional roles of disordered regions including regulatory
control have been implicated through experimental investigation of these regions, statistical
mechanics, and structural bioinformatics approaches.
While the topics of protein disorder and conformational variations are intrinsically
related to protein flexibility, they warrant a separate chapter from ‘‘Protein Motion:
Simulation’’ (Chapter 37) largely because they deal with time frames and complexity beyond
what is captured by protein dynamic modeling approaches (Figure 38.1). Molecular
dynamics simulations have been used, with limitations, to study conformational disorder and variants of
proteins (Torda and Scheek, 1990; Kuriyan et al., 1991; Fuentes et al., 2005).
Longer molecular dynamics simulations are reserved for smaller proteins; for larger proteins, simulations are
restricted to short time frames on the order of nanoseconds. As such, the
conformational changes observed in these simulations are also limited. The topics of
disorder and conformational variations discussed here extend beyond what can be offered by
molecular dynamics simulations, although various strategies such as the use of Monte Carlo
sampling (Lindorff-Larsen et al., 2004) and averaging over a few samples of generated
conformers while using experimental constraints (Kemmink et al., 1993; Bonvin and
Brunger, 1995) have been used to address this issue. Coarse-grained dynamic modeling
addresses molecular motion beyond the time frame limitations of classical molecular
dynamics. However, a systematic comparison between disordered regions and the
large-amplitude fluctuating regions modeled with these rigid-body-based approaches has yet to be
conducted.
Figure 38.1. Range of protein dynamics and structural observation. Protein flexibility lies on a
spectrum where the fluctuations occur at a range of different time scales. Ordered structures can be
visualized with simulated motion limited to the nanosecond range. Beyond these limits, protein
dynamics is perceived as protein disorder and lacking stable structures.
In this chapter, we discuss briefly the experimental methods used to study disordered
regions and highlight the computational resources that have largely fueled the advancement
of this field, by providing many of the current generalized observations. The biological
importance of protein disorder and conformational variations as they exist in microscopic
ensembles will also be examined in more detail. We attempt to create an introductory chapter
to the subject and apologize if not all research efforts are represented in this otherwise rapidly
growing field.
PROTEIN DISORDER: UNDERSTANDING THE REALM OF ‘‘INVISIBLE’’
Defining Protein Disorder
Before proceeding, we must first make clear that the field currently lacks a unifying definition
when discussing protein flexibility, disorder, and intrinsically unstructured proteins. These
terms are often used interchangeably largely due to the qualitative nature of the definition and
can leave readers with some confusion if the slight distinctions are not clarified. Other
disorder-related terms that have been coined in the field are intrinsic coils, random coils,
unfolded proteins, molten globules, and premolten globules as examples to define protein
states that are not natively folded. These terms often refer to the global state of the
protein rather than to specific regions within the protein structure that are disordered. Without
setting a standard nomenclature for the field, we define
‘‘disorder’’ in this chapter as regions in the protein structure where the equilibrium position
of the backbone, along with the dihedral angles, has no specific value and varies significantly
over time.
When evaluating and using disorder predictors, it is also important to have a clarified
view of how these regions were defined in the training of disorder predictors and other efforts
to understand these regions. Some sequence-based disorder predictors, such as PONDR
(Romero et al., 1997) and DISOPRED (Jones and Ward, 2003), were trained on disorder
defined as missing regions in the X-ray crystallographic structures. This definition is also
used to benchmark the performance of disorder predictors by evaluators in CASP experiments (Chapter 28). However, other predictors such as GlobPlot (Linding et al., 2003b) and
DisEMBL (Linding et al., 2003a) were trained on a definition of disorder based on a temperature factor
(B-value) threshold in X-ray crystal structures. Finally, other subtle
differences should be considered for predictors such as RONN (Yang et al., 2005) and
Wiggle (Gu, Gribskov, and Bourne, 2006). RONN additionally uses curated
information from homologous proteins to make predictions regarding disordered regions,
and Wiggle was trained on a data set where flexible regions are defined using dynamic
modeling techniques. These subtle distinctions should be noted when considering which
predictor would best serve the scientific question at hand.
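The two structure-based definitions described above can give different labels for the same protein. The following is a minimal sketch, not the procedure of any named predictor: the function names, the 1-based residue numbering, and the B-value threshold passed in by the caller are all illustrative assumptions.

```python
def label_missing(seq_length, observed_positions):
    """Label a residue as disordered (True) when it has no coordinates.

    seq_length: length of the full deposited sequence.
    observed_positions: 1-based positions that have resolved coordinates.
    """
    observed = set(observed_positions)
    return [pos not in observed for pos in range(1, seq_length + 1)]


def label_high_b(b_factors, threshold):
    """Label a resolved residue as disordered (True) when its B-value
    exceeds a caller-supplied threshold (the cutoff is an assumption)."""
    return [b > threshold for b in b_factors]


# Hypothetical 6-residue protein in which only residues 2-4 were resolved.
missing_labels = label_missing(6, [2, 3, 4])
# Hypothetical B-values for the three resolved residues.
b_labels = label_high_b([20.0, 35.0, 80.0], threshold=60.0)
```

Under the missing-residue definition, residues 1, 5, and 6 are disordered; under the B-value definition, only the third resolved residue is flagged. A predictor trained on one kind of label will therefore not necessarily reproduce the other.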
Prevalence of Disordered Protein Regions
Flexible and disordered regions present two challenges to our understanding of protein
structures. Aside from being unable to resolve atomic coordinates for these regions to
understand the structure, the regions also interfere with the formation of protein crystals
needed to collect X-ray diffraction data. Disordered regions are often addressed by removing
them from proteins targeted for structure determination. These disordered regions can also
be detected using nuclear magnetic resonance (NMR—Chapter 5), but the structure of these
regions cannot be easily determined due to the increased conformational space sampled by
the disordered regions. An analysis of a nonredundant subset of the PDB shows that only 7% of
the complete sequences, as deposited in the Swiss-Prot Database, contained no disordered
regions (Le Gall et al., 2007). Sequences in which >95% of the protein is structurally
resolved comprise about 25% of the data set, a surprisingly small fraction that illustrates
the prevalence of disordered regions within protein structures.
The presence of disordered regions is not a technical artifact and several different
techniques have been employed to study this phenomenon. Early studies used spectroscopic
techniques such as infrared circular dichroism (CD), Fourier transform infrared (FTIR),
electron paramagnetic resonance (EPR), and optical rotatory dispersion (ORD) to detect native
and nonnative structures that may form within the disordered regions. More recently, NMR
and small-angle X-ray scattering (SAXS) have been used to provide quantitative data about
disordered and denatured proteins (Kern, Eisenmesser, and Wolf-Watz, 2005; Mittag and
Forman-Kay, 2007; Sasakawa et al., 2007; Tsutakawa et al., 2007). These experimental
approaches can provide quantitative data that can be incorporated into the calculation of the
observed conformational ensembles in solution to determine the structural information
about denatured, unfolded, and intrinsically disordered proteins. Hydrogen–deuterium (H/
D) exchange mass spectrometry (Chapter 7) has also been used to study dynamic processes
such as the role of transient structural disorder as a facilitator of protein–ligand binding (Xiao
and Kaltashov, 2005). These experiments have detected structural formations within these
disordered regions, and these structures have been associated with functional implications.
With the development of sequence-based predictors, the prevalence of disordered
regions in organisms has been investigated across the three kingdoms of life (Oldfield
et al., 2005a; Ward et al., 2004). The frequency of native disorder was calculated for several
representative genomes and found to be higher in eukaryotic proteins (33.0%)
than in archaeal (2.0%) and eubacterial (4.2%) proteins (Ward
et al., 2004). The analysis showed that proteins containing disorder are often located in the
cell nucleus and are functionally associated with the regulation of transcription and cell signaling. In a
separate study, an increase in intrinsic disorder content has been observed in regulatory cell
signaling, cytoskeletal, and human cancer-associated proteins (Iakoucheva et al., 2002).
Disordered regions are currently being curated into a database, DisProt (Sickmeier
et al., 2007), which contains 472 proteins and 1121 disordered regions as reported for
release 3.6 (June 29, 2007).
Computational Approaches to Understanding Protein Disorder
The computational tools that have been developed to predict regions of protein flexibility and
disorder range from simple sequence complexity profiles to machine
learning schemes such as neural networks and support vector machines
(SVMs) (Figure 38.2). The successful development of these tools is attributed to the fact that
protein disorder has detectable sequence signatures. The training sets used to
construct these predictors most often consist of residues reported missing in X-ray crystallographic
structures, but reported temperature factors (B-factors) and NMR-characterized disordered
regions have also been used. First we will discuss algorithms that do not use structural
information to identify and understand disordered regions. This is achieved by either
examination of the sequence space only or focusing on residues in which the structure
cannot be resolved. Then we will follow with alternative strategies that use temperature
factors in X-ray structures, the incorporation of homology information, and coarse-grained
dynamic modeling to guide the training and disorder definition process. This short overview
of disorder predictors will reflect the ongoing research efforts and common strategies
employed to develop sequence-based identification of disordered regions.
Figure 38.2. General strategies to predict the disorder sequence space. Schema of various
strategies used to identify and understand the sequence space of disordered regions. The differences stem largely from how the disordered regions were defined and the underlying infrastructure
for analysis and prediction tool development. Within all of the sequence space, a subset of sequence
space will be associated with regions with low complexity, detected disordered, or those transitioning between an ordered and a disordered state. Overlaps can occur between the subsets.
SEG is a successful algorithm that identifies unstructured regions by examining the
variation in sequence complexity within the sequence database (Wootton and
Federhen, 1996). For a window length L and an N-residue alphabet, the compositional
complexity for a given residue is
$$K_1 = \frac{1}{L} \log_N W,$$
where W is the multinomial coefficient $L! / \prod_{i=1}^{N} n_i!$ and $n_i$ is the number of occurrences of residue type i in the window. Alternative formulations that resemble
Shannon’s entropy to measure sequence complexity have also been used. After identifying low sequence
complexity regions, the second stage of this algorithm constructs an optimal subsequence to
evaluate the probability of occurrence of the observed pattern that is calculated as
$$P_0 = \frac{1}{N^L} W F,$$
where F is the combinatorial expression $N! / \prod_{k=0}^{L} r_k!$ that yields the frequency of observing
the sequence composition with this complexity and $r_k$ is the number of residue types that occur
exactly k times in the window. This probability of occurrence has been
precomputed into a table that serves efficiently as an index to identify these regions. Thus,
SEG not only identifies low complexity sequence regions but also determines whether these
identified regions are significant rather than a random occurrence. The success of this
approach hinges on the assumption that disordered regions have low sequence complexities.
However, many disordered regions are not detected by SEG and therefore suggest that
features other than sequence complexity are involved.
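The two SEG quantities defined above can be evaluated directly for a single window. The sketch below is an illustration of the formulas only, not the SEG program itself: the function names and the default window length are assumptions, and the windows are assumed to contain only letters of the N-residue alphabet.

```python
from collections import Counter
from math import factorial, log, prod


def multinomial_w(window):
    """W = L! / prod(n_i!), with n_i the count of each residue type."""
    counts = Counter(window)
    return factorial(len(window)) // prod(factorial(n) for n in counts.values())


def seg_k1(window, alphabet_size=20):
    """Compositional complexity K1 = (1/L) * log_N(W)."""
    return log(multinomial_w(window), alphabet_size) / len(window)


def seg_p0(window, alphabet_size=20):
    """Probability P0 = W * F / N**L of the window's complexity state.

    r[k] is the number of residue types that occur exactly k times;
    r[0] counts the residue types absent from the window.
    """
    counts = Counter(window)
    r = Counter(counts.values())
    r[0] = alphabet_size - len(counts)
    f = factorial(alphabet_size) // prod(factorial(v) for v in r.values())
    return multinomial_w(window) * f / alphabet_size ** len(window)


def k1_profile(sequence, window_length=12, alphabet_size=20):
    """K1 for every full window along a sequence (window length is illustrative)."""
    last_start = len(sequence) - window_length
    return [seg_k1(sequence[i:i + window_length], alphabet_size)
            for i in range(last_start + 1)]
```

A homopolymeric window gives W = 1 and therefore K1 = 0, the lowest possible complexity. For two-residue windows over a 20-letter alphabet, the two possible complexity states ("same letter twice" and "two different letters") have probabilities 0.05 and 0.95, which sum to one as expected.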
To detect other disordered regions, PONDR, the first disorder predictor to use a
design of two feedforward neural networks, makes predictions from several attributes such
as the fractional composition and hydropathy of the 20 amino acids (Romero, Obradovic,
and Dunker, 1997; Romero et al., 1997). Unlike earlier flexibility predictors, this predictor was
trained on a data set of eight- and seven-residue-long disordered regions defined by X-ray
and NMR experiments, respectively. The regions were defined as either (1) having no
resolvable atomic coordinates, and therefore declared missing in the PDB files of X-ray
structures, or (2) having been extensively characterized as disordered by NMR techniques.
Predictions were made using raw input features extracted from the sequence and smoothed
with a second predictor. Since its initial development, PONDR has become available as a series of
eight different predictors that identify different ‘‘flavors’’ of disorder (Vucetic et al., 2003),
indicating that disordered features have distinctive sequence characteristics within each
subclass. The success of this initial development of disorder predictors on such a small
training data set is surprising and may shed light on sequence properties that are
discussed further in the subsequent section.
DISOPRED (Jones and Ward, 2003) is another predictor that is also based on the use of a
neural network but uses sequence profiles generated by PSI-BLAST (Altschul et al., 1997) as
input features to make predictions with a postfilter that takes into account the confidence of
secondary structure predictions. Predictions are often based on the physicochemical
properties of the amino acids, but DISOPRED instead uses amino acid identity,
composition, and evolutionary conservation. Thus, the physicochemical properties are not
explicitly captured in the input features, although they may be implicitly represented. The
inclusion of input features that represent evolutionary conservation was inspired by the improvement of
secondary structure predictors (Chapter 29) with this information. The use
of evolutionary information helps capture conserved features, or the lack thereof, between
protein homologues. DISOPRED reports an accuracy of 90%, but accuracy can be a misleading
measure of success, especially when the data set is unbalanced in
class frequencies. In this case, the data set contains a much greater number of ordered
residues than disordered ones, an important consideration when evaluating any
predictor. Another measure of predictor performance is Matthews’
correlation coefficient (MCC), which DISOPRED reports to be 0.34, suggesting an overprediction of disordered regions.
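The difference between accuracy and MCC on an unbalanced data set can be seen with a small worked example. The counts below are invented for illustration and are not DISOPRED results; the function names are likewise assumptions.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews' correlation coefficient; defined here as 0.0 when a
    row or column of the confusion matrix is empty."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical data set: 100 residues, of which 10 are disordered.
# Predictor A calls every residue ordered.
acc_a = accuracy(tp=0, tn=90, fp=0, fn=10)
mcc_a = mcc(tp=0, tn=90, fp=0, fn=10)

# Predictor B finds half of the disordered residues, with 5 false positives.
acc_b = accuracy(tp=5, tn=85, fp=5, fn=5)
mcc_b = mcc(tp=5, tn=85, fp=5, fn=5)
```

Both hypothetical predictors reach 90% accuracy, yet predictor A never identifies a disordered residue and has an MCC of 0, whereas predictor B has an MCC of about 0.44. Accuracy alone cannot separate the two when ordered residues dominate the data set.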
RONN is another neural network algorithm that incorporates evolutionary information
to improve disorder prediction. Instead of using multiple sequence alignments and sequence
similarity directly, this algorithm compares the sequence to homologous proteins with
characterized and annotated disordered features (Yang et al., 2005). The alignment scores of
the sequence to a database of known order and disorder segments are used as the input
features for prediction. This strategy reduced the
number of misclassified residues in both the ordered and the disordered
structural classes.
More recently, POODLE-L (Hirose et al., 2007) implemented a two-layered support
vector machine that uses physicochemical properties as input features and reports an
improved performance with an MCC value of 0.658. The source of improvement is difficult
to ascertain, although it may be safe to speculate that either the use of the SVM for extraction
or a more focused training set is the underlying source. While successful discrimination does
depend on the correct selection of input features that properly represent critical properties of
disordered regions, the physicochemical properties used in this predictor have also been used
by other predictors. Thus, it is unlikely that this would be the source of significant
improvements.
Strategies that use simpler algorithms than machine learning approaches have also been used to identify these regions efficiently. GlobPlot2 (Linding
et al., 2003b) uses the propensity of each amino acid to be in either an ordered or a disordered
structure, thus creating a disorder propensity index. Different propensity indexes and scales
were calculated to accommodate the different definitions of disorder in the field, and predictions
are made accordingly. IUPred (Dosztanyi et al., 2005a; Dosztanyi et al., 2005b)
uses a low-resolution energetic force field based on the pair-wise residue interactions
observed in structures. The total pair-wise interaction energy is estimated from a quadratic form
in the amino acid composition of the protein.
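To make the phrase "quadratic form in the amino acid composition" concrete, the sketch below evaluates the sum over i and j of f_i * P_ij * f_j for composition fractions f and an interaction matrix P. It is only an illustration of the mathematical form: the two-letter alphabet, the matrix values, and the function names are invented, and the actual IUPred energy matrix and normalization are not reproduced here.

```python
def composition(sequence, alphabet):
    """Fraction of each alphabet letter in the sequence."""
    return [sequence.count(letter) / len(sequence) for letter in alphabet]


def quadratic_energy(fractions, interaction_matrix):
    """Quadratic form: sum over i, j of f_i * P_ij * f_j."""
    size = len(fractions)
    return sum(fractions[i] * interaction_matrix[i][j] * fractions[j]
               for i in range(size) for j in range(size))


# Hypothetical two-letter alphabet: H (hydrophobic) and P (polar).
TOY_ALPHABET = "HP"
# Hypothetical interaction energies; negative values are favorable contacts.
TOY_MATRIX = [[-1.0, 0.0],
              [0.0, 0.5]]

mixed = quadratic_energy(composition("HHPP", TOY_ALPHABET), TOY_MATRIX)
polar = quadratic_energy(composition("PPPP", TOY_ALPHABET), TOY_MATRIX)
```

With these toy values the mixed sequence scores -0.125 and the all-polar sequence scores 0.5, so the all-polar composition is estimated to form less favorable contacts. The design point is that the estimate depends only on composition, so no structure is needed to compute it.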
Specialized disorder predictors that identify particular features of
disordered regions have also been developed. The PONDR series of predictors
achieves this by constructing predictors that identify pattern subsets based on features such as the
length of disordered regions. Wiggle is another specialized predictor that identifies flexible
regions having functional importance. Functional flexibility was defined as regions in
the protein where (1) the fluctuating motions exceed the mean fluctuation by more than one
standard deviation and (2) these fluctuations are involved in correlated motion. The use of
this definition successfully identified regions such as recognition loops, catalytic loops, and
hinges in a training data set where protein motion was obtained using a coarse-grained dynamic
modeling technique.
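The first criterion in this definition, fluctuations that exceed the mean by more than one standard deviation, can be sketched as follows. The per-residue fluctuation values are assumed to come from some dynamic model, the function name is hypothetical, and the second criterion (involvement in correlated motion) is not implemented here.

```python
from statistics import mean, pstdev


def flexible_segments(fluctuations, n_sd=1.0):
    """Return (start, end) index pairs, 0-based and inclusive, for runs of
    residues whose fluctuation exceeds mean + n_sd * standard deviation."""
    cutoff = mean(fluctuations) + n_sd * pstdev(fluctuations)
    segments = []
    start = None
    for index, value in enumerate(fluctuations):
        if value > cutoff:
            if start is None:
                start = index          # open a new flexible run
        elif start is not None:
            segments.append((start, index - 1))  # close the current run
            start = None
    if start is not None:
        segments.append((start, len(fluctuations) - 1))
    return segments


# Hypothetical fluctuation profile with one mobile loop at positions 3-4.
loop_example = flexible_segments([1, 1, 1, 5, 5, 1, 1, 1, 1, 1])
```

For this profile the mean is 1.8 and the population standard deviation is 1.6, so the cutoff is 3.4 and only the two loop residues are reported. Using the population rather than the sample standard deviation is a choice made for the sketch.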
Finally, various consensus and integrated strategies that incorporate different predictors
to improve disorder prediction have been developed. Consensus strategies have improved
structure prediction methods in recent years and are therefore likely to do the same for
disorder prediction. This approach often requires interpretation of results by the user to
decide between prediction results from different methods since the integration of methods
does not include an automated decision-making feature. Two such examples of integrated
servers that are available to the community are PrDOS (Ishida and Kinoshita, 2007) and
iPDA (Su, Chen, and Hsu, 2007).
Sequence Basis for the Biophysical Property of Ordered and Disordered
Regions
Decoding the sequence space is at the heart of many fields, and understanding how the
biophysical properties of proteins are encoded in the sequence is imperative to making
inferences about the protein structural fold and function. For example, the amino acid
hydrophobicities determined from several biophysical and theoretical experiments have
been used to make predictions for higher order protein features such as secondary structure
(Palliser and Parry, 2001). Unfortunately, advances in the field are still needed before an
accurate sequence-based biophysical description of proteins is available to the community.
Recurring amino acid biases and sequence patterns found within particular features of
proteins are often examined in the hope of gleaning some insight into a biophysical explanation. In
regard to protein disorder and flexibility, variances in protein sequences have been examined
and regions of low sequence complexity have been found to preclude structure
formation (Wootton and Federhen, 1993; Wootton and Federhen, 1996). Glutamine-rich,
glycine-rich, and arginine-rich sequences are often a part of this class of low sequence
complexity regions, occasionally with a periodic arrangement of repeating units. Regions
enriched in proline, glutamic acid, serine, and threonine (PEST) are also associated with
protein disorder.
The arrival of disorder predictors has allowed researchers to gain more insight into
sequence bias and patterns associated with ordered and disordered structures through both
the training process and subsequent analysis of the sequence space identified with these
algorithms. Initially, there were some concerns that the disorder predictors were making
predictions based on low complexity features similar to those identified by SEG. Instead, it
has been demonstrated that amino acid composition differed between low complexity,
disordered, and ordered regions (Romero et al., 2001). Disordered regions have been found
to contain higher levels of R, K, E, P, and S amino acids with lower levels of C, W, Y, I, and V
compared to ordered regions. Based on this analysis of change in amino acid frequency
between ordered and disordered structures, the residues can be ranked from disorder
promoting to order promoting as follows: K, E, D, P, N, S, Q, G, R, T, A, M, H, L, V, Y,
I, F, C, W. Such ranking suggests that the amount of flexibility, and hence disorder, can be
tuned depending on which residues are used and in what order.
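As a rough illustration of how such a ranking could be turned into a number, the sketch below maps the ranking quoted above onto a linear scale from +1 (most disorder promoting, K) to -1 (most order promoting, W) and averages it over a sequence. The linear spacing, the function names, and the decision to skip unrecognized letters are all assumptions; this is not a published predictor.

```python
# Ranking from disorder promoting to order promoting, as quoted in the text.
ROMERO_RANKING = "KEDPNSQGRTAMHLVYIFCW"


def rank_score(residue):
    """Linear score: +1.0 for the first residue in the ranking, -1.0 for the last."""
    position = ROMERO_RANKING.index(residue)
    return 1.0 - 2.0 * position / (len(ROMERO_RANKING) - 1)


def mean_disorder_tendency(sequence):
    """Average rank score over the recognized residues of a sequence."""
    scores = [rank_score(res) for res in sequence.upper() if res in ROMERO_RANKING]
    if not scores:
        raise ValueError("sequence contains no standard amino acids")
    return sum(scores) / len(scores)


# Hypothetical peptides: one rich in charged residues, one rich in aromatics.
charged = mean_disorder_tendency("KKEEDDKE")
aromatic = mean_disorder_tendency("WWFFCCIY")
```

The charged peptide scores well above zero and the aromatic peptide well below it. Because the score is a plain average, it reflects composition only; it says nothing about the effect of residue order that the text also mentions.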
Sequences that adopt both an ordered and a disordered structure depending on the
observed conformational state have been investigated to understand how such a balance
could be achieved (Zhang et al., 2007). These regions are coined as having ‘‘dual
personality’’ and have been collected from proteins with multiple X-ray structures in
different conformations. Regions that are invisible in one conformer but resolved in another
were defined as having ambivalence for either ordered or disordered regions. Residues were
clustered into three major groups based on their relative abundance for disordered, ordered,
or ambivalent regions. The first group contains hydrophilic and small amino acids (K, E, S,
G, and A) that are largely associated with disordered regions. The second group consists
mostly of hydrophobic residues (M, H, Y, I, F, C, and W) that are found in abundance within
ordered regions. Finally, the third group consists of mostly hydrophilic amino acids (D, T, Q,
N, P, and R) that are found in fairly equal propensities for ordered and disordered regions.
These clusters are in some agreement with those identified by Romero et al.,
although there are differences. Wiggle (Gu, Gribskov, and Bourne, 2006) shows a different
scenario of amino acid preferences for these regions ranked in the following decreasing
order: E, K, Q, R, D, P, N, S, G, A, L, T, W, H, M, Y, F, C, V, I. This ranking shows correlation
with the consensus hydrophobicity index (Palliser and Parry, 2001). Although some general
trends can be observed between the different analyses, the lack of agreement suggests that
more work is needed to understand how the biophysical property of protein disorder is
encoded in the sequence.
Investigation of higher order sequence associations with disordered regions has
identified at least two nonrandomly recurring patterns based on amino acid identity
or physicochemical properties (Lise and Jones, 2005). The analysis was conducted on
segments up to eight residues and repeated observations of proline-rich or charged segments
in disordered regions were identified. Although rather restrictive parameters were used in
this analysis, this examination shows that patterns associated with disordered regions are
much simpler compared to those found in ordered regions and rarely contained two different
amino acids. These patterns are not inclusive of all the possible patterns that may be found in
disordered regions.
Despite the low complexity of these local sequence patterns, the nonrandom occurrence
of these patterns reveals that biophysical properties of disordered structures are dependent on
both the sequence composition and order. Furthermore, different types of disordered regions
have been identified with a dependency on segment length (Vucetic et al., 2003). Subclasses
of disordered regions have been identified with different functional associations. More
recently, the thermodynamic features of unfolded regions were surveyed using a
structure-based thermodynamic model (Wang et al., 2008). The results show that, unlike
natively folded proteins, the thermodynamics of unfolded regions is dominated by local
sequence contribution and is sensitive to the composition and order of the sequence. The
local dependence of the disorder thermodynamics also provided some insight regarding why
certain biophysical properties of the natively folded state may be retained.
In the process of understanding the sequence basis for protein disorder, it is also
important to understand the contributions of evolutionary pressures that select for the
disordered state. Although evolutionary conservation has been included as one of the
input features to help disorder predictors discriminate between the different sequence
types, strict conservation may not necessarily be a critical feature of disordered regions.
Disordered regions have been cited to evolve rapidly (Brown et al., 2002) and observed to
have increased alternative splice sites that generate more functional diversity in multicellular organisms (Romero et al., 2006). Evidence that disordered regions can be
identified with a reduced set of the amino acid alphabet further supports the notion of
weak evolutionary selection (Weathers et al., 2004). Consequently, the robustness to
substitutions simplifies the combinatorial possibilities of amino acid patterns; thus, it may
be possible to decode a direct linkage between sequence and the biophysical properties of
disordered regions.
Biological Consequences
In addition to paving the way to a better understanding of the sequence space, computational analysis with disorder predictors
has produced many of the hypotheses that generalize the biological and functional significance of disordered regions in proteins.
The survey of disordered regions across the different genomes mentioned earlier is one example of how
disorder predictors have been applied. At least 28 functional roles of disordered proteins have
been identified and can be grouped into four main functional classes: (1) molecular
recognition, (2) molecular assembly, (3) protein modification, and (4) entropic chain
activities (Dunker et al., 2002; Radivojac et al., 2007). These functional classes largely
appear to reflect intermolecular function linked to regulatory processes, but the biophysical
properties of disordered regions are also important for protein folding, allostery, and
catalytic processes that are not necessarily captured by this functional classification system
presented by Dunker and colleagues. Patterns detected within disordered regions have also
been used to functionally classify proteins, particularly in cases where there are no structural
homologues (Lobley et al., 2007).
First, disordered structures provide an advantage in molecular recognition by promoting
promiscuous, transient binding for several targets and therefore help to increase the
complexities of protein interaction networks (Tompa, Szasz, and Buday, 2005). The
resulting promiscuous bindings allow these proteins to have multiple cellular roles (Sandhu
and Dash, 2006). Binding mechanisms used by disordered regions rely more on hydrophobic–hydrophobic interactions with more intermolecular contacts that cover a much
larger surface area of the target protein compared to ordered binding sites (Meszaros
et al., 2007; Vacic et al., 2007). This is often achieved through a continuous segment that
sometimes contains preformed structural elements where the native structure is fully
induced after binding to the substrate (Fuxreiter et al., 2004). Investigation of underlying
linear motifs important for recognition suggests that order favoring sequences are grafted
between disordered regions that serve as a carrier (Fuxreiter, Tompa, and Simon, 2007).
Disordered regions have been identified to be important for complex formation for larger
complexes such as the viral capsid, bacterial flagellar system, cytoskeleton, ribosome, and
clathrin coat (Namba, 2001; Dafforn and Smith, 2004; Ward et al., 2004). In an analysis of
E. coli and S. cerevisiae proteins, a significant correlation was observed between predicted
structural disorder and the number of proteins assembled into a complex (Hegyi, Schad, and
Tompa, 2007). The larger complexes show a higher average disorder content with longer
predicted segments. These results are in agreement with the idea that disordered regions are
involved with protein binding and molecular recognition that could be one contributing
mechanism to complex formation. Alternatively, disordered regions in complex formation
may simply serve as linkers between well-formed domains. The hypotheses presented here
have been based on bioinformatics analysis and need to be further investigated experimentally.
Other disordered regions have been cited to act as entropic chains comprising a
functional class that includes linkers, bristles, springs, and clocks (Dunker et al., 2002).
This functional class is relatively less studied and appears to play a role in introducing a
level of organization in time and space. For example, linkers serve to link domains while
bristles help keep molecules apart through molecular exclusion. Springs are segments
that have restoring forces to favor a randomized fold and become restricted when
stretched, as observed in titin molecules found in muscle fibers (Labeit and Kolmerer, 1995; Kellermayer et al., 2000) and in elastin (Pometun, Chekmenev, and Wittebort, 2004). The increased flexibility of disordered regions has been hypothesized to be exploited as a "random generator" that affects the timing of
biological processes, such as determining the closure of a voltage-gated channel (Wissmann
et al., 1999; Wissmann et al., 2003).
The aforementioned four categories are not exclusive of each other nor are they
comprehensive of all possible functions associated with disordered regions that are still
yet to be fully understood. For example, a highly disordered loop in cochaperonin GroES is
important for binding to GroEL and has also been suggested to facilitate the cycles observed
for chaperonin-mediated protein folding through modulation of binding affinity without
affecting specificity (Landry et al., 1996). This example serves to suggest that a certain level
of flexibility necessary for function is being evolutionarily selected and conserved.
Retaining this balance of flexibility is also evident in the presence of disordered regions
that are important for catalysis and allostery. Changes in local segmental flexibility were
studied in the catalytic subunit of cAMP-dependent protein kinase using site-directed
labeling and fluorescence spectroscopy (Li et al., 2002). The backbone located around the
B-helix was found to have reduced flexibility only when the substrate and pseudosubstrate
are bound to the catalytic domain. This stage of the catalytic cycle coincided with the
phosphoryl transfer transition suggesting that internal disorder is important for this catalytic
step. In another example using single-molecule enzymatic assays, DNA was hydrolyzed by
lambda exonuclease with contributions from sequence-dependent factors and disorder
arising from conformational changes (van Oijen et al., 2003).
Although popular competing allosteric models are based on changes observed in rigid
structure bodies, alternative views propose that proteins can be regulated through changes in
protein dynamics. The Cooper–Dryden model is a mathematical formulation that shows
protein allostery can be achieved in the absence of structural change (Cooper and
Dryden, 1984). The dimeric CAP that binds to cAMP is an example of the Cooper–Dryden
model where changes were observed in the dynamics of the system but not the structure
(Popovych et al., 2006). In another example, dynamics is an integral part of the allosteric
response initiated by a ligand-induced disorder to order transition in the adenylate binding
loop of the biotin repressor, a transcription regulatory protein (Naganathan and Beckett, 2007). Changes in internal fluctuation between different stages of the activation cycle for
cyclin-dependent kinase 2 have been identified to be associated with functionally important
regions for regulation and catalytic activity with possible detection of entropy compensation
mechanisms being utilized (Gu and Bourne, 2007). The advantages of coupled disordered
regions for allosteric control have been demonstrated through statistical mechanics (Hilser
and Thompson, 2007).
Disease Impacts
Disordered regions in proteins have been implicated in several diseases such as neurodegenerative diseases, cardiovascular diseases, and cancer. The proteins involved contain
disordered regions, which makes structural studies with X-ray crystallography and NMR difficult. NACP, for example, is a natively unfolded protein, also implicated in learning, that seeds the polymerization of amyloid proteins leading to Alzheimer's disease (Weinreb
et al., 1996). It is suggested that the disordered regions allow for promiscuous binding and help
potentiate protein–protein interaction that leads to the formation of these insoluble fibrils.
Likewise, the tau protein found in Alzheimer’s tangles is characterized to have intrinsically
disordered regions and leads to the formation of amyloid fibrils connected with disease
progression (Skrabana, Sevcik, and Novak, 2006; Skrabana et al., 2006).
A subset of eukaryotic proteins related to cardiovascular disease (CVD) was examined
and found to be enriched in disorder content (Cheng et al., 2006a). The analysis was
conducted with PONDR disorder predictions, cumulative distribution function analysis, and
charge–hydropathy plot analysis. Predictions of α-helical molecular recognition features
suggest that they are highly abundant in these proteins. The percentage of CVD-related proteins containing more than 30
residues predicted to be disordered was 57 ± 4%, compared to 47 ± 4% of eukaryotic
proteins in Swiss-Prot. The role of disorder in cardiovascular diseases needs to be further
validated experimentally, but the finding does not come as a surprise since disordered regions
are found to be associated with 66 ± 6% of signaling molecules, proteins that often have
regulatory roles. Diseases are often a result of regulated processes gone awry.
Similarly, 79 ± 5% of human cancer-associated proteins have been found to contain
regions of disorder that are at least 30 consecutive residues in length (Iakoucheva
et al., 2002). One example of an oncoprotein is HPV16 E7, an extended dimer
with a stable and cooperative fold that nevertheless displays properties of natively unfolded proteins
(Garcia-Alai, Alonso, and de Prat-Gay, 2007). The region of disorder is located at the N-terminal region of the E7 domain, which contains two important sites for regulation: (1) the
retinoblastoma tumor suppressor binding site for molecular recognition and (2) a casein
kinase II phosphorylation site that induces stabilization upon phosphorylation. The structural plasticity of this region has allowed for adaptation to binding of a variety of protein
targets and regulation of protein turnover. HPV16 is one of the human papillomavirus strains
associated at high frequency with cervical cancer.
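The "at least 30 consecutive residues" criterion used in these surveys can be illustrated with a minimal sketch. It assumes that a disorder predictor has already produced one boolean call per residue; the function name and the example calls are hypothetical and do not reproduce the pipeline of any cited study.

```python
from itertools import groupby


def long_disordered_regions(is_disordered, min_length=30):
    """Return (start, end) pairs (0-based, end exclusive) for every run of
    consecutive disordered residues that is at least `min_length` long."""
    regions = []
    position = 0
    for flag, run in groupby(is_disordered):
        run_length = len(list(run))
        if flag and run_length >= min_length:
            regions.append((position, position + run_length))
        position += run_length
    return regions


# Hypothetical per-residue calls: 10 ordered, 35 disordered, 5 ordered, 12 disordered.
calls = [False] * 10 + [True] * 35 + [False] * 5 + [True] * 12
print(long_disordered_regions(calls))
```

Only the 35-residue run passes the threshold here; the 12-residue run is ignored, which is why the choice of cutoff strongly affects the reported percentages.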
Case Study: Disorder in the Glucocorticoid Receptor
We present the structural anatomy of a transcription factor in more detail as an example to
show how disordered regions may play a functional role (Figure 38.3). Glucocorticoid
receptor (GR) is a steroid binding nuclear receptor with well-defined domain boundaries
(McEwan et al., 2007). The receptor is composed of three domains with structures available
Figure 38.3. Anatomy of the glucocorticoid receptor. The structure of nearly half of the glucocorticoid receptor cannot be resolved due to intrinsic disorder in the N-terminal domain (yellow)
that contains the transactivating motif AF1 (green). Low-resolution structural information shows a
composition of α-helices in the AF1 core region (187–244). Residues 399–419 are found to contain
the PEST motif of proline, glutamic acid, serine, and threonine that is associated with highly
disordered regions. High-resolution structures are available for the DNA (blue) and steroid (red)
binding domains connected by a hinge (orange).
for the DNA and ligand binding domains located at the C-terminal end in the bound form.
The N-terminal domain (NTD), on the other hand, is highly disordered and no high-resolution structural data are available to study this region that contains the transactivating
AF1 domain (residues 77–262) involved in protein–protein interactions and regulation of
transcriptional activity (Lavery and McEwan, 2005). However, significant structural data
have been obtained using alternative methods such as biochemical analysis, circular
dichroism, NMR, fluorescence, and Fourier transform infrared spectroscopy. Predictions
for the structural content of this region have also been made using secondary structure
prediction algorithms. These data collectively show that GR-NTD potentially consists of a
mixture of α-helix, β-strand, and coil conformations. The disordered state of this region is
hypothesized to provide a mechanism for allosteric control that allows for the adoption of
different conformers that subsequently create different binding interfaces to interact with a
multitude of targets. This feature may be particularly important for the AF1 region that is
found to be 27% α-helical and 39% disordered in GR.
The AF1 region may be an example of molecular recognition elements important for
protein–protein interactions that use disordered regions as carriers, as mentioned earlier.
Through mutagenic studies, the induced formation of α-helical structures in this region
has been correlated with the transactivation potential of GR (Dahlman-Wright et al., 1995;
Dahlman-Wright and McEwan, 1996). This example demonstrates how regulation of
transcriptional activity is achieved through modulating the order–disorder transition state
that can be induced through a variety of factors such as DNA binding events (Lefstin and
Yamamoto, 1998; Kumar et al., 1999) and even the presence of structure inducing osmolytes
(Baskakov et al., 1999; Kumar et al., 2007). This strategy may be commonly used by all
transcription factors as suggested by an analysis with PONDR that shows a relatively
increased disorder content in transcription factors compared to other subsets of the
eukaryotic proteins. Furthermore, the transcription activation regions are identified to have
higher disorder content compared to the DNA binding region for the majority of the
transcription factors (Liu et al., 2006).
PROTEIN CONFORMATIONAL VARIANTS AND ENSEMBLES
A discussion about protein disorder is really a discussion of protein conformational variants
and the resulting ensembles that are the underlying basis for all biological phenomena and
observations measured experimentally. The concept of an ensemble emphasizes that a protein can explore multiple alternative conformations rather than adopting a single
static structure, a point we wish to stress in this section. Consideration of
alternative protein conformations expands not only the structural space, but also the
functional space that can be regulated simply through partial unfolding that is observed
as local protein disorder.
Structural variations are often appreciated when differences are observed between
homologous proteins, but structural variations can also be observed for a protein at a single
equilibrium state or between two states such as a ligand-bound and an unbound conformation
(Figure 38.4). In the field of structural biology and structural bioinformatics, it is convenient
Figure 38.4. Conformational variations in calmodulin. Comparison of (a) one conformational
state of yeast calmodulin and (b) 31 states aligned at the N-terminal domain in the absence of
calcium ions (PDBID: 1LKJ). (c) The bovine calmodulin adopts a dumbbell-like shape with calcium
binding, which is different from the more globular structure found in yeast. (d) Calmodulin bound
to a substrate (white spheres).
to view high-resolution structural data as a single molecule, but we must remind ourselves
that this interpretation is not the complete view. X-ray crystallographic studies are a
collective contribution of all the protein molecules at equilibrium state found in the crystal
lattice. Thus, the X-ray structure would represent the dominant conformation in the
ensemble with regions of high temperature factors indicating a higher conformational
variability. NMR, on the other hand, provides multiple solutions for conformations found in
solution, thus instilling a greater appreciation in the structure interpreter for conformational
variations. Protein dynamics and disorder leading to conformational variation are observed in
NMR experiments as resonance overlap and peak broadening from conformational averaging and contributions from intermediate time scale dynamics.
As an example of a protein that exists in many different conformational variations, we
use calmodulin to illustrate the point (Figure 38.4). This regulator responds to calcium ions
and exists in three main conformational states: (1) the apo-structure, (2) bound to calcium
ions, and (3) bound to the target substrate. The apo-form of calmodulin has a structure of
two globular domains connected by a hinge as observed in an NMR structure of calmodulin
from Saccharomyces cerevisiae (Figure 38.4a). Variations within a single state can be
immediately observed in the apo-structure where alternative conformational states are
aligned based on the N-terminal domain (Figure 38.4b). The C-terminal globular domain
can exist in a different conformation relative to the N-terminal domain. Variations between
homologues are observed in the calcium-bound bovine calmodulin with a helical linker
region between the two domains whereas the yeast calmodulin adopts a more globular
structure (Figure 38.4c). Finally, significant structural rearrangement is observed with
binding to substrate. The multitude of structural conformations adopted by calmodulin
represents some of the challenges that face the structural bioinformatics field.
A biophysical explanation for protein disorder can be described by the underlying
presence of conformational variations (Figure 38.5). An important concept that must be
delivered here is that most experimental measurements of proteins are not single-molecule
studies and therefore are collective contributions of all protein molecules in the solution.
Thus, the observed measurement can be written as the summed contribution of each
conformational state in the solution:
$$\langle \mathrm{Obs} \rangle = \sum_{i} P_i \, \mathrm{Obs}_i .$$
With this in mind, disorder in an X-ray structure, for example, arises when there are many
conformational variations that do not give rise to a single converged structure that is viewed
as "ordered." Sometimes highly ordered regions can be mistaken for disordered
structure, particularly if large domain motions are involved such as those observed in
calmodulin (Figure 38.4b).
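As a minimal sketch of this summed contribution, the following snippet computes a population-weighted observable for a small ensemble. The three states, their populations, and their signal values are hypothetical and chosen only for illustration.

```python
def ensemble_average(probabilities, observables):
    """Population-weighted average of an observable over conformational states."""
    if len(probabilities) != len(observables):
        raise ValueError("one observable value is required per state")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * obs for p, obs in zip(probabilities, observables))


# Hypothetical ensemble: native, partially unfolded, and fully unfolded states,
# each with an assumed normalized spectroscopic signal.
populations = [0.7, 0.2, 0.1]
signals = [1.0, 0.5, 0.0]
print(ensemble_average(populations, signals))
```

The measured value lies between the signals of the individual states, so an intermediate reading cannot by itself distinguish a single partially ordered conformation from a mixture of ordered and disordered ones.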
The contribution of different states to the observation can be explained by one of the two
models that represent the ratio of states differently (Figure 38.5). The first model assumes a
discrete two-state conformation while the second allows for additional conformational states
to be present. We illustrate the difference between the two models by applying them
to the unfolding of proteins (Figure 38.5a). In the first model, only the
native (order) and denatured (disorder) states of the protein are allowed to exist in solution.
The observed destabilization of proteins with increasing denaturant is then a result of the
changing ratio between these two states in the solution (Figure 38.5b). The probability of
observing an ordered structure decreases as the probability of observing a disordered
structure increases. In the second model, intermediate states containing partially
Figure 38.5. Ensemble-based description of protein disorder. Biological observations are the sum
contribution of the different states in solution. The unfolding process of proteins, for example, can be
described with one of the two models. Model 1 assumes two discrete states in solution in the native
(order) conformation or denatured (disorder) conformation. The denaturation process is the changing ratio of these two states from one end of the spectrum to the other. Model 2 allows for other intermediate
conformations to contribute to the observation. Figure also appears in Color Figure section.
unfolded conformers are allowed. Thus, the probability of observing each intermediate state
as well as the native and denatured states contributes to the observation.
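The two models can be compared with a small numerical sketch. It assumes, purely for illustration, that the free energy of each non-native state relative to the native state decreases linearly with denaturant concentration; this linear dependence and all numerical values are assumptions and are not taken from the chapter. Populations follow from Boltzmann weighting of the relative free energies.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def state_populations(relative_free_energies, temperature=298.15):
    """Populations of the native state (reference, free energy 0) followed by
    each additional state, given free energies relative to native in kcal/mol."""
    weights = [1.0] + [math.exp(-dg / (R * temperature))
                       for dg in relative_free_energies]
    partition = sum(weights)
    return [w / partition for w in weights]


def free_energy(dg_water, m_value, denaturant):
    """Assumed linear dependence of a state's free energy on denaturant (M)."""
    return dg_water - m_value * denaturant


for denaturant in (0.0, 2.0, 4.0, 6.0):
    # Model 1: native and denatured states only.
    two_state = state_populations([free_energy(8.0, 2.0, denaturant)])
    # Model 2: native, one partially unfolded intermediate, and denatured.
    three_state = state_populations([
        free_energy(3.0, 0.8, denaturant),
        free_energy(8.0, 2.0, denaturant),
    ])
    print(denaturant,
          [round(p, 3) for p in two_state],
          [round(p, 3) for p in three_state])
```

In the second model the intermediate takes up part of the population at intermediate denaturant concentrations, so the same observed signal would be attributed to different mixtures of states under the two descriptions.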
The importance of an ensemble-based interpretation of the native state can be
demonstrated through the use of COREX, a statistical thermodynamic model that uses
free energy values that have been structurally parameterized (Hilser and Freire, 1996; Hilser
et al., 2006). This experimentally validated model allows us to calculate the heat capacity
($\Delta C_p$), enthalpy ($\Delta H$), and entropy ($\Delta S$) differences between the partially unfolded states and
the native state (reported in kcal/K/mol). More importantly, the derivation allows for the
interpretation of residue stability and contribution to the energetics of the ensemble that have
provided insight into possible mechanisms for cooperative (Hilser et al., 1998) and allosteric
(Hilser and Thompson, 2007) processes.
Briefly, the relative Gibbs free energy of each possible conformational state adopted by
the protein ($\Delta G_i$) is expressed in terms of the standard thermodynamic equation:
$$\Delta G_i = \Delta H_i - T\,\Delta S_i .$$
COREX obtains the relative Gibbs free energy using (1) a high-resolution structure
and (2) a statistical thermodynamic model where the variables have been parameterized
based on changes in the accessible surface area ($\Delta\mathrm{ASA}$) between the native and the partially
unfolded state. The enthalpic contribution to the state can be written as the sum of enthalpic
contributions from apolar ($\Delta H_{ap}$) and polar residues ($\Delta H_{pol}$):
$$\Delta H = \Delta H_{ap} + \Delta H_{pol} .$$
The enthalpy change is related to $\Delta\mathrm{ASA}$ (Å$^2$) in the following way and is parameterized
at a reference temperature of 60 °C, which is the median unfolding temperature for the data
set of model proteins used:
$$\Delta H(60) = a_H(60)\,\Delta\mathrm{ASA}_{ap} + b_H(60)\,\Delta\mathrm{ASA}_{pol},$$
$$a_H(60) = -8.44, \qquad b_H(60) = 31.4 .$$
The entropy of the system is the sum of contributions from solvent ($\Delta S_{solv}$) and
conformational ($\Delta S_{conf}$) entropies:
$$\Delta S = \Delta S_{solv} + \Delta S_{conf} .$$
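The enthalpy parameterization at 60 °C can be sketched as a short function. The coefficient values are those quoted in the text, with the apolar coefficient taken as negative (−8.44) and units assumed to be cal/(mol·Å²); both points are assumptions of this sketch, as are the function name and the example surface areas. It is not the COREX implementation.

```python
A_H_60 = -8.44  # apolar coefficient at 60 °C; negative sign is an assumption
B_H_60 = 31.4   # polar coefficient at 60 °C


def enthalpy_at_60C(delta_asa_apolar, delta_asa_polar):
    """Enthalpy difference at 60 °C from changes in apolar and polar
    accessible surface area (Å^2); units assumed to be cal/mol."""
    return A_H_60 * delta_asa_apolar + B_H_60 * delta_asa_polar


# Hypothetical partially unfolded state exposing 1000 Å^2 of apolar
# and 600 Å^2 of polar surface relative to the native state.
print(enthalpy_at_60C(1000.0, 600.0))
```

With these assumed signs, exposure of apolar surface lowers the enthalpy difference while exposure of polar surface raises it, so the net value depends on the composition of the newly exposed surface.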
The solvent entropy can be calculated with the knowledge of the heat capacity of the
protein as derived:
$$\Delta S_{solv} = \Delta S_{solv,ap} + \Delta S_{solv,pol},$$
$$\Delta S_{solv} = \Delta C_{p,ap} \ln(T/T^{*}_{S,ap}) + \Delta C_{p,pol} \ln(T/T^{*}_{S,pol}),$$
where $T^{*}_{S,ap} = 385.15$ and $T^{*}_{S,pol} = 335.15$ are the reference temperatures at which the
hydration entropy is equal to zero (Baldwin, 1986; Murphy and Freire, 1992; D'Aquino
et al., 1996). The heat capacity is found to scale to $\Delta\mathrm{ASA}$ for temperatures up to 80 °C as
follows:
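$$\Delta C_p = \Delta C_{p,ap} + \Delta C_{p,pol},$$
$$\Delta C_p = a_c(T)\,\Delta\mathrm{ASA}_{ap} + b_c(T)\,\Delta\mathrm{ASA}_{pol},$$
$$a_c(T) = 0.45 + 2.63\times10^{-4}\,(T-25) - 4.2\times10^{-5}\,(T-25)^2,$$
$$b_c(T) = -0.26 + 2.85\times10^{-4}\,(T-25) + 4.31\times10^{-5}\,(T-25)^2 .$$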
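The heat capacity and solvent entropy terms can be combined in a short sketch. It assumes that T in the heat capacity polynomials is in °C, that the reference temperatures 385.15 and 335.15 are in kelvin, that the subtraction signs lost from the printed polynomials are as written in the code comments, and that the results are in cal/(K·mol). Function names and example areas are hypothetical, and this is not the COREX implementation.

```python
import math

T_S_APOLAR = 385.15  # reference temperature for apolar hydration entropy (K, assumed)
T_S_POLAR = 335.15   # reference temperature for polar hydration entropy (K, assumed)


def heat_capacity_terms(delta_asa_apolar, delta_asa_polar, temp_celsius):
    """Return (apolar, polar) heat capacity differences from surface area
    changes in Å^2, using the temperature-dependent coefficients a_c and b_c."""
    dt = temp_celsius - 25.0
    a_c = 0.45 + 2.63e-4 * dt - 4.2e-5 * dt ** 2    # assumed sign pattern
    b_c = -0.26 + 2.85e-4 * dt + 4.31e-5 * dt ** 2  # assumed sign pattern
    return a_c * delta_asa_apolar, b_c * delta_asa_polar


def solvent_entropy(delta_asa_apolar, delta_asa_polar, temp_celsius):
    """Solvent entropy difference as the sum of apolar and polar terms."""
    cp_apolar, cp_polar = heat_capacity_terms(
        delta_asa_apolar, delta_asa_polar, temp_celsius)
    temp_kelvin = temp_celsius + 273.15
    return (cp_apolar * math.log(temp_kelvin / T_S_APOLAR)
            + cp_polar * math.log(temp_kelvin / T_S_POLAR))


# Hypothetical state exposing 1000 Å^2 apolar and 600 Å^2 polar surface at 25 °C.
print(heat_capacity_terms(1000.0, 600.0, 25.0))
print(solvent_entropy(1000.0, 600.0, 25.0))
```

The scaling with surface area is stated to hold only up to 80 °C, so values computed at higher temperatures should be treated with caution.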
Finally, to complete the calculation of $\Delta S$, the conformational entropy is calculated as
follows:
$$\Delta S_{conf} = \sum \Delta S_{bu\text{-}ex} + \sum \Delta S_{ex\text{-}un} + \sum \Delta S_{bb} .$$
The three contributions to conformational entropy are (1) $\sum \Delta S_{bu\text{-}ex}$: buried residues
that become exposed with partial unfolding; (2) $\sum \Delta S_{ex\text{-}un}$: exposed residues in
the unfolded state; and (3) $\sum \Delta S_{bb}$: backbone entropy changes for residues that become
unfolded. The entropy contributions of each amino acid have been determined and these
values are used in the calculation (Lee et al., 1994; D'Aquino et al., 1996).
The relative Gibbs free energy of each state is calculated with these parameterized
thermodynamic variables and will be important in determining the probability of observing
such a conformational state in the ensemble. Under equilibrium conditions, statistical
mechanics states that the probability of any given conformational state $i$ ($P_i$) is given by the
equation
$$P_i = \frac{\exp(-\Delta G_i/RT)}{Q},$$
where the statistical weights, also known as the Boltzmann exponents ($\exp(-\Delta G_i/RT)$), are
defined by $\Delta G_i$ relative to the gas constant $R$ and temperature $T$. $Q$ is the conformational
partition function defined as the sum of the statistical weights of all the states accessible to the
protein:
$$Q = \sum_{i=0}^{N} \exp(-\Delta G_i/RT) .$$
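A minimal sketch of this calculation follows. The free energies are hypothetical values in kcal/mol, with the native state taken as the reference (free energy 0), and the statistical weights use the negative exponent of the standard Boltzmann form.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def state_probabilities(delta_g, temperature=298.15):
    """Return (probabilities, Q) for states with relative free energies
    `delta_g` (kcal/mol). Q is the conformational partition function."""
    weights = [math.exp(-dg / (R * temperature)) for dg in delta_g]
    q = sum(weights)
    return [w / q for w in weights], q


# Hypothetical ensemble: native state (0.0) and three partially unfolded states.
probabilities, partition = state_probabilities([0.0, 1.5, 3.0, 6.0])
print([round(p, 4) for p in probabilities], round(partition, 4))
```

Because the weights depend exponentially on the free energy, states only a few kcal/mol above the native state already contribute very little to the ensemble at room temperature.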
These probabilities reflect preferences for the protein to adopt a partially unfolded
conformational state and can be extended to calculate the free energy contributions of each
residue to the ensemble. Using the probability-weighted conformations in the generated
ensemble, the stability of residue $j$ can be calculated as the ratio of the summed probabilities of the states in which the residue is folded to those in which it is unfolded:
$$k_{f,j} = \frac{\sum P_{f,j}}{\sum P_{nf,j}},$$
where $\sum P_{f,j}$ and $\sum P_{nf,j}$ are the summed probabilities of all the states in which residue $j$
is either folded or unfolded, respectively.
The free energy contribution of each residue to the ensemble can then be calculated:
$$\Delta G_{f,j} = -RT \ln k_{f,j} .$$
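The residue-level quantities can be sketched for a toy ensemble. Each state is represented by a hypothetical boolean mask marking which residues are folded, and state probabilities are assumed to be already known. The sign convention, free energy equal to −RT ln k, is an assumption of this sketch; the code is illustrative and is not the COREX implementation.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def residue_stabilities(state_probabilities, folded_masks, temperature=298.15):
    """Return a list of (k_f, delta_g_f) per residue.

    k_f is the summed probability of states in which the residue is folded
    divided by the summed probability of states in which it is unfolded."""
    results = []
    for j in range(len(folded_masks[0])):
        p_folded = sum(p for p, mask in zip(state_probabilities, folded_masks)
                       if mask[j])
        p_unfolded = sum(p for p, mask in zip(state_probabilities, folded_masks)
                         if not mask[j])
        if p_folded == 0.0 or p_unfolded == 0.0:
            raise ValueError("residue must be folded and unfolded in some state")
        k_f = p_folded / p_unfolded
        results.append((k_f, -R * temperature * math.log(k_f)))
    return results


# Hypothetical four-residue protein with three states:
# fully folded, residues 2-3 unfolded, and fully unfolded.
probs = [0.90, 0.08, 0.02]
masks = [
    [True, True, True, True],
    [True, True, False, False],
    [False, False, False, False],
]
for index, (k_f, dg) in enumerate(residue_stabilities(probs, masks)):
    print(index, round(k_f, 2), round(dg, 3))
```

Residues 0 and 1 remain folded in the partially unfolded state and therefore have higher stability constants than residues 2 and 3, which is the kind of residue-level contrast the formalism is designed to expose.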
The importance of this derived formalism is that it can be extended to study changes in
energetic contributions at the residue level and provide insights into functional processes
such as cooperativity (Liu, Whitten, and Hilser, 2006; Liu, Whitten, and Hilser, 2007; Pan,
Lee, and Hilser, 2000) and allostery (Hilser and Thompson, 2007) by interpreting proteins as
an ensemble of multiple conformations. Recent systems in which cooperativity has been
identified and studied with COREX are dihydrofolate reductase and eglin c. The studies
defined structural–thermodynamic linkages based on correlations in stability changes
between residues in the ensemble as captured by $k_{f,j}$. By examining these correlations,
the model helps to define a mechanism for site–site communication, particularly between
ligand binding sites and distantly located regions. The analyses suggest an alternative view to
energetic coupling between residues when a clear, connected pathway of intramolecular
interactions between them cannot be identified. The results also further emphasize the
importance of entropic contributions, which are often neglected.
While it is important to produce the correct high-resolution structure using fold
recognition, homology modeling, and ab initio structure prediction approaches (Chapters
29–32), it is equally important to construct other physically and chemically valid
conformational states that can be sampled by the protein. Generating these structural variants
that collectively produce a protein ensemble can be achieved with a variety of models, some
more restrictive than others. Restrictive models are those that assume disordered or partially
unfolded regions of the protein to adopt only coil structures (Bernado et al., 2005; Jha
et al., 2005). A less restrictive model such as TraDES (Feldman and Hogue, 2000) is an
unbiased conformational sampling method that generates plausible random structures
allowing for both native and nonnative contacts. Other conformer generating methods
including Rosetta (Simons et al., 1997) and CNS (Brunger et al., 1998) can also be used to
predict structures in these disordered and highly flexible regions. The relative probabilities of
the generated conformational variants that potentially populate the ensemble can then be
calculated with experimental constraints using ENSEMBLE (Choy and Forman-Kay, 2001;
Marsh et al., 2007). The population weight assignment is achieved with a pseudoenergy
minimization process and a Monte Carlo algorithm. With these strategies, we can
begin to interpret the functional consequences arising from this variety of
conformational states.
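The idea of assigning population weights by Monte Carlo minimization of a pseudoenergy can be illustrated with a toy example. This sketch is not the ENSEMBLE program or its algorithm: the pseudoenergy (a squared deviation between population-weighted and measured observables), the move set, the Metropolis temperature, and all data are assumptions chosen for illustration.

```python
import math
import random


def pseudoenergy(weights, predicted, measured):
    """Squared deviation between weighted-average and measured observables."""
    total = 0.0
    for k, target in enumerate(measured):
        average = sum(w * conformer[k] for w, conformer in zip(weights, predicted))
        total += (average - target) ** 2
    return total


def fit_weights(predicted, measured, steps=5000, step_size=0.05,
                temperature=1e-4, seed=0):
    """Metropolis Monte Carlo search for non-negative weights summing to 1."""
    rng = random.Random(seed)
    n = len(predicted)
    weights = [1.0 / n] * n
    energy = pseudoenergy(weights, predicted, measured)
    best_weights, best_energy = list(weights), energy
    for _ in range(steps):
        trial = list(weights)
        i = rng.randrange(n)
        trial[i] = max(0.0, trial[i] + rng.uniform(-step_size, step_size))
        norm = sum(trial)
        if norm == 0.0:
            continue
        trial = [w / norm for w in trial]
        trial_energy = pseudoenergy(trial, predicted, measured)
        # Accept downhill moves always and uphill moves with Boltzmann probability.
        if (trial_energy <= energy
                or rng.random() < math.exp((energy - trial_energy) / temperature)):
            weights, energy = trial, trial_energy
            if energy < best_energy:
                best_weights, best_energy = list(weights), energy
    return best_weights, best_energy


# Hypothetical data: three conformers, two observables predicted for each,
# and the corresponding ensemble-averaged measurements.
predicted = [[1.0, 0.2], [0.5, 0.6], [0.0, 0.9]]
measured = [0.6, 0.5]
weights, energy = fit_weights(predicted, measured)
print([round(w, 3) for w in weights], energy)
```

With few observables and many conformers, many different weight sets can fit the data equally well, so fitted populations of this kind are best interpreted as one consistent solution rather than a unique answer.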
FUTURE DIRECTIONS
Aside from the growing amount of literature on this topic, community recognition of the
importance of understanding disordered regions is signified by the inclusion of disorder
predictor evaluation in CASP (Chapter 28). The first evaluation of disorder predictors appeared in
CASP5 (Melamud and Moult, 2003) in 2002 with results showing successful detection for
over half of the disordered residues in the blind set with a low rate of overprediction.
However, proper evaluation of these predictors remains a challenge that still needs to be
refined. This is to be expected due to both the varied definition and the existence of these
different types of disordered regions. Furthermore, as noted by the assessors, the data set
used for evaluation is skewed toward short disordered regions identified by missing residues
in X-ray crystallographic structures. As such, caution should be taken when interpreting the
reported performance. In spite of these weaknesses, the evaluation process is
a necessity because the predictors serve many useful purposes. The most recent benchmarking effort, conducted at CASP7 in 2006, showed that in spite of the many new generations
of disorder predictors, significant improvements in performance have not been observed
and variations are seen in their sensitivity and specificity for detecting these regions (Bordoli,
Kiefer, and Schwede, 2007). Improvements in disorder predictors cannot be made without a
systematic study of these regions and several experimental strategies using techniques that
combine heat and acid treatment with mass spectrometry and/or 2D electrophoresis have
been proposed to tackle this issue (Csizmok et al., 2007).
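For reference, sensitivity and specificity of per-residue disorder calls can be computed as in the following sketch. The definitions are the standard ones (true positive rate and true negative rate); the function name and the example labels are hypothetical and do not reproduce the CASP assessment procedure.

```python
def sensitivity_specificity(predicted, observed):
    """Return (sensitivity, specificity) for per-residue disorder calls,
    where truthy values mark disordered residues."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both disordered and ordered residues are required")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ten-residue example: four residues are disordered experimentally.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(sensitivity_specificity(predicted, observed))
```

Because ordered residues usually far outnumber disordered ones in crystallographic test sets, a predictor can reach high specificity while still missing many disordered residues, which is one reason both measures are reported.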
The applications of disorder predictors are not limited to target protein identification
and elimination for structural genomic efforts. New applications include improved functional categorization of newly identified proteins (Lobley et al., 2007) and a potential role in
improved drug design (Cheng et al., 2006b). The power of leveraging what we know about
disordered regions will prove itself to be immensely valuable for the majority of the proteins
that do not adopt a native fold. Currently, the function of about 35% of proteins cannot be
categorized using homology-based assignment, leaving researchers with a large set of
‘‘hypothetical protein’’ drug targets with unknown function (Ofran et al., 2005). A
systematic characterization of protein disorder can be achieved by combining the developments in improved computational and experimental analysis (Bracken et al., 2004).
Finally, the importance of understanding subtle differences in conformational variations, due to effects such as mutational events, has always been recognized by the structural
bioinformatics field. New measures to better understand these variations are indicated by the
goals presented to the structure prediction community proposed at the conclusion of CASP6
(Moult et al., 2005). The four challenges to overcome are to (1) model the structure of
single-residue mutants, (2) model the structural changes associated with specificity
changes within protein families, (3) improve refinement methods to produce a 0.5 Å
root-mean-square-deviation (RMSD) improvement in the Cα accuracy of models, and
(4) devise a scoring function that will reliably pick the most accurate model of the possible
candidate structures for new fold predictions. As the community addresses these challenges,
an ensemble view of conformational variations should be kept in mind to understand
functional consequences as well.
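To make the refinement target concrete, the sketch below computes the Cα RMSD between a model and a reference structure. It assumes the two coordinate sets are already optimally superimposed and list the same atoms in the same order; the superposition step is omitted and the coordinates are hypothetical.

```python
import math


def ca_rmsd(model, reference):
    """Root-mean-square deviation (in the units of the input, e.g. Å) between
    two equally long lists of (x, y, z) C-alpha coordinates."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for m, r in zip(model, reference)
                  for a, b in zip(m, r))
    return math.sqrt(squared / len(model))


# Hypothetical three-residue fragment; the model is displaced by 0.5 Å along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (4.3, 0.0, 0.0), (8.1, 0.0, 0.0)]
print(round(ca_rmsd(model, reference), 3))
```

A single RMSD value summarizes agreement with one reference conformation only; for a protein that populates several conformations, comparison against each member of the ensemble is more informative.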
WEB RESOURCES
Resource | References | URL
DisProt: the database of disordered proteins | Sickmeier et al. (2007) | http://www.disprot.org/
DisProt: list of disorder predictors | Not published, a part of DisProt | http://www.ist.temple.edu/disprot/predictors.php
DISOPRED | Jones and Ward (2003) | http://bioinf.cs.ucl.ac.uk/disopred/
VLXT (PONDR) | Romero et al. (1997) | http://www.pondr.com
GlobPlot | Linding et al. (2003b) | http://globplot.embl.de
DisEMBL | Linding et al. (2003a) | http://dis.embl.de
PrDOS | Ishida and Kinoshita (2007) | http://prdos.hgc.jp/cgi-bin/top.cgi
RONN | Yang et al. (2005) | http://www.strubi.ox.ac.uk/RONN
Wiggle | Gu et al. (2006) | http://wiggle.sdsc.edu
PROFbval | Schlessinger et al. (2006) | http://cubic.bioc.columbia.edu/services/profbval/
iPDA: integrated protein disorder analyzer | Su et al. (2007) | http://biominer.bime.ntu.edu.tw/ipda/
ENSEMBLE | Choy and Forman-Kay (2001) and Marsh et al. (2007) | http://pound.med.utoronto.ca/forman/ensemble/ensemble.html
CNS | Brunger et al. (1998) | http://helix.nih.gov/apps/structbio/cns.html
ROSETTA | Simons et al. (1997) | http://www.rosettacommons.org/
COREX/BEST server | Vertrees et al. (2005) | http://www.best.utmb.edu/BEST/
REFERENCES
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997): Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res
25:3389–3402.
Baldwin RL (1986): Temperature dependence of the hydrophobic interaction in protein folding. Proc
Natl Acad Sci USA 83:8069–8072.
Baskakov IV, Kumar R, Srinivasan G, Ji YS, Bolen DW, Thompson EB (1999): Trimethylamine
N-oxide-induced cooperative folding of an intrinsically unfolded transcription-activating fragment
of human glucocorticoid receptor. J Biol Chem 274:10693–10696.
Bernado P, Blanchard L, Timmins P, Marion D, Ruigrok RW, Blackledge M (2005): A structural model
for unfolded proteins from residual dipolar couplings and small-angle X-ray scattering. Proc Natl
Acad Sci USA 102:17002–17007.
Blow DM (1977): Flexibility and rigidity in protein crystals. Ciba Found Symp 55–61.
Bonvin AM, Brunger AT (1995): Conformational variability of solution nuclear magnetic resonance
structures. J Mol Biol 250:80–93.
Bordoli L, Kiefer F, Schwede T (2007): Assessment of disorder predictions in CASP7. Proteins
69:129–136.
Bracken C, Iakoucheva LM, Romero PR, Dunker AK (2004): Combining prediction, computation and
experiment for the characterization of protein disorder. Curr Opin Struct Biol 14:570–576.
Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK
(2002): Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol
55:104–110.
Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski
J, Nilges M, Pannu NS, et al. (1998): Crystallography & NMR system: a new software suite for
macromolecular structure determination. Acta Crystallogr D 54:905–921.
Cheng Y, LeGall T, Oldfield CJ, Dunker AK, Uversky VN (2006a): Abundance of intrinsic disorder in
protein associated with cardiovascular disease. Biochemistry 45:10448–10460.
Cheng Y, LeGall T, Oldfield CJ, Mueller JP, Van YY, Romero P, Cortese MS, Uversky VN, Dunker AK
(2006b): Rational drug design via intrinsically disordered protein. Trends Biotechnol 24:435–442.
Choy WY, Forman-Kay JD (2001): Calculation of ensembles of structures representing the unfolded
state of an SH3 domain. J Mol Biol 308:1011–1032.
Cooper A, Dryden DT (1984): Allostery without conformational change. A plausible model. Eur
Biophys J 11:103–109.
Csizmok V, Dosztanyi Z, Simon I, Tompa P (2007): Towards proteomic approaches for the identification of structural disorder. Curr Protein Pept Sci 8:173–179.
Dafforn TR, Smith CJ (2004): Natively unfolded domains in endocytosis: hooks, lines and linkers.
EMBO Rep 5:1046–1052.
Dahlman-Wright K, Baumann H, McEwan IJ, Almlof T, Wright AP, Gustafsson JA, Hard T (1995):
Structural characterization of a minimal functional transactivation domain from the human
glucocorticoid receptor. Proc Natl Acad Sci USA 92:1699–1703.
Dahlman-Wright K, McEwan IJ (1996): Structural studies of mutant glucocorticoid receptor
transactivation domains establish a link between transactivation activity in vivo and alpha-helix-forming potential in vitro. Biochemistry 35:1323–1327.
D’Aquino JA, Gomez J, Hilser VJ, Lee KH, Amzel LM, Freire E (1996): The magnitude of the
backbone conformational entropy change in protein folding. Proteins 25:143–156.
Dosztanyi Z, Csizmok V, Tompa P, Simon I (2005a): IUPred: web server for the prediction of
intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics
21:3433–3434.
Dosztanyi Z, Csizmok V, Tompa P, Simon I (2005b): The pairwise energy content estimated from
amino acid composition discriminates between folded and intrinsically unstructured proteins.
J Mol Biol 347:827–839.
Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z (2002): Intrinsic disorder and
protein function. Biochemistry 41:6573–6582.
Feldman HJ, Hogue CW (2000): A fast method to sample real protein conformational space. Proteins
39:112–131.
Fuentes G, Nederveen AJ, Kaptein R, Boelens R, Bonvin AM (2005): Describing partially unfolded
states of proteins from sparse NMR data. J Biomol NMR 33:175–186.
Fuxreiter M, Simon I, Friedrich P, Tompa P (2004): Preformed structural elements feature in partner
recognition by intrinsically unstructured proteins. J Mol Biol 338:1015–1026.
Fuxreiter M, Tompa P, Simon I (2007): Local structural disorder imparts plasticity on linear motifs.
Bioinformatics 23:950–956.
Garcia-Alai MM, Alonso LG, de Prat-Gay G (2007): The N-terminal module of HPV16 E7 is an
intrinsically disordered domain that confers conformational and recognition plasticity to the
oncoprotein. Biochemistry 46:10405–10412.
Gu J, Gribskov M, Bourne PE (2006): Wiggle-predicting functionally flexible regions from primary
sequence. PLoS Comput Biol 2:e90.
Gu J, Bourne PE (2007): Identifying allosteric fluctuation transitions between different protein
conformational states as applied to cyclin dependent kinase 2. BMC Bioinform 8:45.
Hegyi H, Schad E, Tompa P (2007): Structural disorder promotes assembly of protein complexes.
BMC Struct Biol 7:65.
Hilser VJ, Freire E (1996): Structure-based calculation of the equilibrium folding pathway of proteins.
Correlation with hydrogen exchange protection factors. J Mol Biol 262:756–772.
Hilser VJ, Dowdy D, Oas TG, Freire E (1998): The structural distribution of cooperative interactions in
proteins: analysis of the native state ensemble. Proc Natl Acad Sci USA 95:9903–9908.
Hilser VJ, Garcia-Moreno EB, Oas TG, Kapp G, Whitten ST (2006): A statistical thermodynamic
model of the protein ensemble. Chem Rev 106:1545–1558.
Hilser VJ, Thompson EB (2007): Intrinsic disorder as a mechanism to optimize allosteric coupling in
proteins. Proc Natl Acad Sci USA 104:8311–8315.
Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T (2007): POODLE-L: a two-level SVM prediction
system for reliably predicting long disordered regions. Bioinformatics 23:2046–2053.
Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK (2002): Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 323:573–584.
Ishida T, Kinoshita K (2007): PrDOS: prediction of disordered protein regions from amino acid
sequence. Nucleic Acids Res 35:W460–464.
Jha AK, Colubri A, Freed KF, Sosnick TR (2005): Statistical coil model of the unfolded state:
resolving the reconciliation problem. Proc Natl Acad Sci USA 102:13099–13104.
Jones DT, Ward JJ (2003): Prediction of disordered regions in proteins from position specific score
matrices. Proteins 53(Suppl. 6):573–578.
Kellermayer MS, Smith SB, Granzier HL, Bustamante C (1997): Folding–unfolding transitions in
single titin molecules characterized with laser tweezers. Science 276:1112–1116.
Kemmink J, van Mierlo CP, Scheek RM, Creighton TE (1993): Local structure due to an aromatic-amide interaction observed by 1H-nuclear magnetic resonance spectroscopy in peptides related to
the N terminus of bovine pancreatic trypsin inhibitor. J Mol Biol 230:312–322.
Kern D, Eisenmesser EZ, Wolf-Watz M (2005): Enzyme dynamics during catalysis measured by NMR
spectroscopy. Methods Enzymol 394:507–524.
Kumar R, Baskakov IV, Srinivasan G, Bolen DW, Lee JC, Thompson EB (1999): Interdomain
signaling in a two-domain fragment of the human glucocorticoid receptor. J Biol Chem
274:24737–24741.
Kumar R, Serrette JM, Khan SH, Miller AL, Thompson EB (2007): Effects of different osmolytes on
the induced folding of the N-terminal activation domain (AF1) of the glucocorticoid receptor. Arch
Biochem Biophys 465:452–460.
Kuriyan J, Osapay K, Burley SK, Brunger AT, Hendrickson WA, Karplus M (1991): Exploration of
disorder in protein structures by X-ray restrained molecular dynamics. Proteins 10:340–358.
Labeit S, Kolmerer B (1995): Titins: giant proteins in charge of muscle ultrastructure and elasticity.
Science 270:293–296.
Landry SJ, Taher A, Georgopoulos C, van der Vies SM (1996): Interplay of structure and disorder in
cochaperonin mobile loops. Proc Natl Acad Sci USA 93:11622–11627.
Lavery DN, McEwan IJ (2005): Structure and function of steroid receptor AF1 transactivation
domains: induction of active conformations. Biochem J 391:449–464.
Lee KH, Xie D, Freire E, Amzel LM (1994): Estimation of changes in side chain configurational
entropy in binding and folding: general methods and application to helix formation. Proteins
20:68–84.
Lefstin JA, Yamamoto KR (1998): Allosteric effects of DNA on transcriptional regulators. Nature
392:885–888.
Le Gall T, Romero PR, Cortese MS, Uversky VN, Dunker AK (2007): Intrinsic disorder in the Protein
Data Bank. J Biomol Struct Dyn 24:325–342.
Li F, Gangal M, Juliano C, Gorfain E, Taylor SS, Johnson DA (2002): Evidence for an internal entropy
contribution to phosphoryl transfer: a study of domain closure, backbone flexibility, and the
catalytic cycle of cAMP-dependent protein kinase. J Mol Biol 315:459–469.
Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB (2003a): Protein disorder prediction:
implications for structural proteomics. Structure 11:1453–1459.
Linding R, Russell RB, Neduva V, Gibson TJ (2003b): GlobPlot: exploring protein sequences for
globularity and disorder. Nucleic Acids Res 31:3701–3708.
Lindorff-Larsen K, Kristjansdottir S, Teilum K, Fieber W, Dobson CM, Poulsen FM, Vendruscolo M
(2004): Determination of an ensemble of structures representing the denatured state of the bovine
acyl-coenzyme A binding protein. J Am Chem Soc 126:3291–3299.
Lise S, Jones DT (2005): Sequence patterns associated with disordered regions in proteins. Proteins
58:144–150.
Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK (2006): Intrinsic disorder in
transcription factors. Biochemistry 45:6873–6888.
Liu T, Whitten ST, Hilser VJ (2006): Ensemble-based signatures of energy propagation in proteins: a
new view of an old phenomenon. Proteins 62:728–738.
Liu T, Whitten ST, Hilser VJ (2007): Functional residues serve a dominant role in mediating the
cooperativity of the protein ensemble. Proc Natl Acad Sci USA 104:4347–4352.
Lobley A, Swindells MB, Orengo CA, Jones DT (2007): Inferring function using patterns of native
disorder in proteins. PLoS Comput Biol 3:e162.
Marsh JA, Neale C, Jack FE, Choy WY, Lee AY, Crowhurst KA, Forman-Kay JD (2007): Improved
structural characterizations of the drkN SH3 domain unfolded state suggest a compact ensemble
with native-like and non-native structure. J Mol Biol 367:1494–1510.
McEwan IJ, Lavery D, Fischer K, Watt K (2007): Natural disordered sequences in the amino terminal
domain of nuclear receptors: lessons from the androgen and glucocorticoid receptors. Nucl
Receptor Signal 5:e001.
Melamud E, Moult J (2003): Evaluation of disorder predictions in CASP5. Proteins 53:561–565.
Meszaros B, Tompa P, Simon I, Dosztanyi Z (2007): Molecular principles of the interactions of
disordered proteins. J Mol Biol 372:549–561.
Mittag T, Forman-Kay JD (2007): Atomic-level characterization of disordered protein ensembles.
Curr Opin Struct Biol 17:3–14.
Moult J, Fidelis K, Rost B, Hubbard T, Tramontano A (2005): Critical assessment of methods of
protein structure prediction (CASP)—round 6. Proteins 61(Suppl. 7):3–7.
Murphy KP, Freire E (1992): Thermodynamics of structural stability and cooperative folding behavior
in proteins. Adv Protein Chem 43:313–361.
Naganathan S, Beckett D (2007): Nucleation of an allosteric response via ligand-induced loop folding.
J Mol Biol 373:96–111.
Namba K (2001): Roles of partly unfolded conformations in macromolecular self-assembly. Genes
Cells 6:1–12.
Ofran Y, Punta M, Schneider R, Rost B (2005): Beyond annotation transfer by homology: novel
protein-function prediction methods to assist drug discovery. Drug Discov Today 10:1475–1482.
Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK (2005a): Comparing and
combining predictors of mostly disordered proteins. Biochemistry 44:1989–2000.
Oldfield CJ, Ulrich EL, Cheng Y, Dunker AK, Markley JL (2005b): Addressing the intrinsic disorder
bottleneck in structural proteomics. Proteins 59:444–453.
Palliser CC, Parry DA (2001): Quantitative comparison of the ability of hydropathy scales to recognize
surface beta-strands in proteins. Proteins 42:243–255.
Pan H, Lee JC, Hilser VJ (2000): Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc Natl Acad Sci USA 97:12020–12025.
Pometun MS, Chekmenev EY, Wittebort RJ (2004): Quantitative observation of backbone disorder in
native elastin. J Biol Chem 279:7982–7987.
Popovych N, Sun S, Ebright RH, Kalodimos CG (2006): Dynamically driven protein allostery. Nat
Struct Mol Biol 13:831–838.
Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK (2007): Intrinsic
disorder and functional proteomics. Biophys J 92:1439–1456.
Ringe D, Petsko GA (1986): Study of protein dynamics by X-ray diffraction. Methods Enzymol
131:389–433.
Romero P, Obradovic Z, Dunker K (1997): Sequence data analysis for long disordered regions
prediction in the calcineurin family. Genome Inform Ser Workshop Genome Inform, Vol. 8,
pp 110–124.
Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK (1997): Identifying disordered
regions in proteins from amino acid sequences. Proceedings of the IEEE. International Conference
on Neural Networks, Vol. 1, pp 90–95.
Romero P, Obradovic Z, Li XH, Garner EC, Brown CJ, Dunker AK (2001): Sequence complexity of
disordered protein. Proteins 42:38–48.
Romero PR, Zaidi S, Fang YY, Uversky VN, Radivojac P, Oldfield CJ, Cortese MS, Sickmeier M,
LeGall T, Obradovic Z, Dunker AK (2006): Alternative splicing in concert with protein intrinsic
disorder enables increased functional diversity in multicellular organisms. Proc Natl Acad Sci USA
103:8390–8395.
Sandhu KS, Dash D (2006): Conformational flexibility may explain multiple cellular roles of PEST
motifs. Proteins 63:727–732.
Sasakawa H, Sakata E, Yamaguchi Y, Masuda M, Mori T, Kurimoto E, Iguchi T, Hisanaga SI, Iwatsubo
T, Hasegawa M, Kato K (2007): Ultra-high field NMR studies of antibody binding and site-specific
phosphorylation of alpha-synuclein. Biochem Biophys Res Commun 363:795–799.
Schlessinger A, Yachdav G, Rost B (2006): PROFbval: predict flexible and rigid residues in proteins.
Bioinformatics 22:891–893.
Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J,
Uversky VN, et al. (2007): DisProt: the database of disordered proteins. Nucleic Acids Res 35:
D786–793.
Simons KT, Kooperberg C, Huang E, Baker D (1997): Assembly of protein tertiary structures from
fragments with similar local sequences using simulated annealing and Bayesian scoring functions.
J Mol Biol 268:209–225.
Skrabana R, Sevcik J, Novak M (2006): Intrinsically disordered proteins in the neurodegenerative
processes: formation of tau protein paired helical filaments and their analysis. Cell Mol Neurobiol
26:1085–1097.
Skrabana R, Skrabanova-Khuebachova M, Kontsek P, Novak M (2006): Alzheimer’s-disease-associated conformation of intrinsically disordered tau protein studied by intrinsically disordered protein
liquid-phase competitive enzyme-linked immunosorbent assay. Anal Biochem 359:230–237.
Su CT, Chen CY, Hsu CM (2007): iPDA: integrated protein disorder analyzer. Nucleic Acids Res 35:
W465–472.
Tompa P, Szasz C, Buday L (2005): Structural disorder throws new light on moonlighting. Trends
Biochem Sci 30:484–489.
Torda AE, Scheek RM, van Gunsteren WF (1990): Time-averaged nuclear Overhauser effect distance
restraints applied to tendamistat. J Mol Biol 214:223–235.
Tsutakawa SE, Hura GL, Frankel KA, Cooper PK, Tainer JA (2007): Structural analysis of flexible
proteins in solution by small angle X-ray scattering combined with crystallography. J Struct Biol
158:214–223.
Vacic V, Oldfield CJ, Mohan A, Radivojac P, Cortese MS, Uversky VN, Dunker AK (2007):
Characterization of molecular recognition features, MoRFs, and their binding partners. J Proteome
Res 6:2351–2366.
van Oijen AM, Blainey PC, Crampton DJ, Richardson CC, Ellenberger T, Xie XS (2003): Single-molecule kinetics of lambda exonuclease reveal base dependence and dynamic disorder. Science
301:1235–1238.
Vertrees J, Barritt P, Whitten S, Hilser VJ (2005): COREX/BEST server: a web browser-based
program that calculates regional stability variations within protein structures. Bioinformatics
21:3318–3319.
Vucetic S, Brown CJ, Dunker AK, Obradovic Z (2003): Flavors of protein disorder. Proteins
52:573–584.
Wang S, Gu J, Larson SA, Whitten ST, Hilser VJ (2008): Probing the denatured ensemble for fold
specifying thermodynamic information. Submitted.
Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (2004): Prediction and functional analysis of
native disorder in proteins from the three kingdoms of life. J Mol Biol 337:635–645.
Weathers EA, Paulaitis ME, Woolf TB, Hoh JH (2004): Reduced amino acid alphabet is sufficient to
accurately recognize intrinsically disordered protein. FEBS Lett 576:348–352.
Weinreb PH, Zhen W, Poon AW, Conway KA, Lansbury PT Jr (1996): NACP, a protein implicated in
Alzheimer’s disease and learning, is natively unfolded. Biochemistry 35:13709–13715.
Wissmann R, Baukrowitz T, Kalbacher H, Kalbitzer HR, Ruppersberg JP, Pongs O, Antz C, Fakler B
(1999): NMR structure and functional characteristics of the hydrophilic N terminus of the
potassium channel beta-subunit Kvbeta1.1. J Biol Chem 274:35521–35525.
Wissmann R, Bildl W, Oliver D, Beyermann M, Kalbitzer HR, Bentrop D, Fakler B (2003): Solution
structure and function of the “tandem inactivation domain” of the neuronal A-type potassium
channel Kv1.4. J Biol Chem 278:16142–16150.
Wootton JC, Federhen S (1993): Statistics of local complexity in amino-acid-sequences and sequence
databases. Comp Chem 17:149–163.
Wootton JC, Federhen S (1996): Analysis of compositionally biased regions in sequence databases.
Comp Methods Macromol Sequence Anal 266:554–571.
Xiao H, Kaltashov IA (2005): Transient structural disorder as a facilitator of protein–ligand binding:
native H/D exchange-mass spectrometry study of cellular retinoic acid binding protein I. J Am Soc
Mass Spectrom 16:869–879.
Yang ZR, Thomson R, McNeil P, Esnouf RM (2005): RONN: the bio-basis function neural network
technique applied to the detection of natively disordered regions in proteins. Bioinformatics.
Zhang Y, Stec B, Godzik A (2007): Between order and disorder in protein structures: analysis of “dual
personality” fragments in proteins. Structure 15:1141–1147.