38 THE SIGNIFICANCE AND IMPACTS OF PROTEIN DISORDER AND CONFORMATIONAL VARIANTS

Jenny Gu and Vincent Hilser

INTRODUCTION

Protein disorder is a topic worth attention from the structural bioinformatics community largely for the technical challenges it presents to the field, but also for its biological and functional implications. The success of structural genomic efforts using X-ray crystallography depends on overcoming several potential bottlenecks (Chapter 40), one of which is the formation of protein crystals, which can be obstructed by the presence of highly flexible and disordered regions. Although these regions limit the number of structures that can be obtained, and thus the coverage of protein space, our current generalized understanding of them is a result of structural bioinformatics efforts that were able to extract and analyze the patterns associated with these regions. The resulting disorder predictors have proven useful in advancing our understanding of disordered regions, with the potential to improve the success rate of structural genomics efforts, particularly those focused on eukaryotic proteins (Oldfield et al., 2005b).

The importance of resolving differences observed in conformational variants within protein families and understanding their impacts is also a growing concern. Most structural genomics efforts aim to solve a representative structure for each protein family to maximize the coverage of protein space, with particular focus on identifying new protein folds. However, it is equally important to understand the structural changes that result from sequence differences introduced by a few single point mutations, insertions, and/or deletions, since these can have a large functional impact. Furthermore, it is often overlooked that the structural information recorded in the Protein Data Bank (PDB) is a macroscopic view of a collection of microscopic ensembles that give rise to the observed protein structure.
In other words, the observed protein structure is not the only conformation adopted by the protein. In fact, most observed biological phenomena are a macroscopic consequence of the collective microscopic states. Understanding the differences in the microscopic states and how the changes impact the macroscopic event is currently addressed in several ways that will be discussed. By exploiting the technical weakness in structural data, researchers have been able to gain insight into the potential biological significance of these otherwise poorly characterized disordered regions (Ringe and Petsko, 1986). Recognition of the importance of protein disorder in biological function came as early as the late 1970s, when disordered regions were seen to recur within particular features of enzymes such as the zymogens of pancreatic serine proteases and tyrosyl-tRNA synthetases (Blow, 1977). In light of these investigations, the hypothesis presented at the time was that reactivity and specificity are associated with more rigid structures, while disordered regions may be involved in control of the function. Since then, many functional roles of disordered regions, including regulatory control, have been implicated through experimental investigation of these regions, statistical mechanics, and structural bioinformatics approaches.

While the topics of protein disorder and conformational variations are intrinsically related to protein flexibility, they warrant a separate chapter from "Protein Motion: Simulation" (Chapter 37), largely because they deal with time frames and complexity beyond what is captured by protein dynamic modeling approaches (Figure 38.1).

Structural Bioinformatics, Second Edition Edited by Jenny Gu and Philip E. Bourne Copyright 2009 John Wiley & Sons, Inc.
Molecular dynamics simulations have been used to study conformational disorder and variants of proteins, with limitations (Torda and Scheek, 1990; Kuriyan et al., 1991; Fuentes et al., 2005). Longer molecular dynamics simulations are reserved for smaller proteins; for larger proteins they are restricted to short time frames on the order of nanoseconds. As such, the conformational changes observed with these simulations are also limited. The topics of disorder and conformational variation discussed here extend beyond what can be offered by molecular dynamics simulations, although various strategies, such as the use of Monte Carlo sampling (Lindorff-Larsen et al., 2004) and averaging over a few samples of generated conformers while using experimental constraints (Kemmink et al., 1993; Bonvin and Brunger, 1995), have been used to address this issue. Coarse-grained dynamic modeling addresses molecular motion beyond the time frame limitations of classical molecular dynamics. However, a systematic comparison between disordered regions and the large-amplitude fluctuating regions modeled with these rigid-body-based approaches still needs to be conducted.

Figure 38.1. Range of protein dynamics and structural observation. Protein flexibility lies on a spectrum where the fluctuations occur at a range of different time scales. Ordered structures can be visualized with simulated motion limited to the nanosecond range. Beyond these limits, protein dynamics is perceived as protein disorder and lacking stable structures.

In this chapter, we briefly discuss the experimental methods used to study disordered regions and highlight the computational resources that have largely fueled the advancement of this field by providing many of the current generalized observations.
The biological importance of protein disorder and conformational variations as they exist in microscopic ensembles will also be examined in more detail. We attempt to create an introductory chapter to the subject and apologize if not all research efforts are represented in this otherwise rapidly growing field.

PROTEIN DISORDER: UNDERSTANDING THE REALM OF "INVISIBLE"

Defining Protein Disorder

Before proceeding, we must first make clear that the field currently lacks a unifying definition when discussing protein flexibility, disorder, and intrinsically unstructured proteins. These terms are often used interchangeably, largely due to the qualitative nature of the definitions, and can leave readers with some confusion if the slight distinctions are not clarified. Other disorder-related terms that have been coined in the field to define protein states that are not natively folded include intrinsic coils, random coils, unfolded proteins, molten globules, and premolten globules. These terms often refer to the global state of the protein rather than to specific disordered regions within the protein structure. Without setting the standard nomenclature for the field, we will clarify by defining the usage of "disorder" in this chapter as regions in the protein structure where the equilibrium position of the backbone, along with the dihedral angles, has no specific value and varies significantly over time. When evaluating and using disorder predictors, it is also important to have a clear view of how these regions were defined in the training of disorder predictors and in other efforts to understand these regions. Some sequence-based disorder predictors, such as PONDR (Romero et al., 1997) and DISOPRED (Jones and Ward, 2003), were trained on disorder defined as missing regions in X-ray crystallographic structures. This definition is also used by evaluators in CASP experiments (Chapter 28) to benchmark the performance of disorder predictors.
However, other predictors such as GlobPlot (Linding et al., 2003b) and DisEMBL (Linding et al., 2003a) are trained on a definition that uses a temperature factor (B-value) threshold to define disorder in X-ray crystal structures. Finally, other subtle differences between disorder predictors should be considered, as in RONN (Yang et al., 2005) and Wiggle (Gu, Gribskov, and Bourne, 2006). RONN additionally uses curated information from homologous proteins to make predictions regarding disordered regions, and Wiggle was trained on a data set where flexible regions are defined using dynamic modeling techniques. These subtle distinctions should be noted when considering which predictor would best serve the scientific question at hand.

Prevalence of Disordered Protein Regions

Flexible and disordered regions present two challenges to our understanding of protein structures. Aside from the fact that atomic coordinates cannot be resolved for these regions, the regions also interfere with the formation of the protein crystals needed to collect X-ray diffraction data. Disordered regions are often addressed by removing them from proteins targeted for structure determination. These disordered regions can also be detected using nuclear magnetic resonance (NMR; Chapter 5), but the structure of these regions cannot be easily determined due to the increased conformational space that they sample. An analysis of a nonredundant subset of the PDB shows that 7% of the complete sequences, as deposited in the Swiss-Prot Database, contained no disordered regions (Le Gall et al., 2007). Sequences for which >95% of the protein is structurally resolved comprise about 25% of the data set, a surprisingly small share that illustrates the prevalence of disordered regions within protein structures.
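The "missing residue" definition of disorder and the resolved-fraction statistic quoted above can be made concrete with a small sketch. The Python below is illustrative only: the function name `missing_segments`, the 1-based residue numbering, and the toy input are assumptions introduced here, not part of any published predictor or of the analysis by Le Gall et al. Given the length of the deposited sequence and the residue positions that have coordinates, it lists the unresolved segments and the fraction of the sequence that is resolved.

```python
def missing_segments(sequence_length, observed_positions):
    """Return (segments, fraction_resolved) using 1-based residue numbering.

    segments is a list of inclusive (start, end) tuples covering every
    position of the full sequence that has no atomic coordinates.
    """
    observed = set(observed_positions)
    segments = []
    start = None
    for pos in range(1, sequence_length + 1):
        if pos not in observed:
            if start is None:
                start = pos          # open a new unresolved segment
        elif start is not None:
            segments.append((start, pos - 1))  # close the current segment
            start = None
    if start is not None:            # segment running to the C-terminus
        segments.append((start, sequence_length))
    n_resolved = sum(1 for pos in range(1, sequence_length + 1) if pos in observed)
    return segments, n_resolved / sequence_length


# Toy example (hypothetical data): a 20-residue chain whose first three
# residues and a loop at positions 11-14 have no coordinates.
observed = [p for p in range(1, 21) if p > 3 and not 11 <= p <= 14]
segments, fraction = missing_segments(20, observed)
print(segments, fraction)
```

A B-factor-based definition would replace the membership test with a comparison against a chosen threshold; the appropriate threshold is predictor specific and is not fixed by this sketch.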
The presence of disordered regions is not a technical artifact, and several different techniques have been employed to study this phenomenon. Early studies used spectroscopic techniques such as infrared circular dichroism (CD), Fourier transform infrared (FTIR), electron paramagnetic resonance (EPR), and optical rotatory dispersion (ORD) to detect native and nonnative structures that may form within the disordered regions. More recently, NMR and small-angle X-ray scattering (SAXS) have been used to provide quantitative data about disordered and denatured proteins (Kern, Eisenmesser, and Wolf-Watz, 2005; Mittag and Forman-Kay, 2007; Sasakawa et al., 2007; Tsutakawa et al., 2007). These data can be incorporated into the calculation of the conformational ensembles observed in solution to obtain structural information about denatured, unfolded, and intrinsically disordered proteins. Hydrogen–deuterium (H/D) exchange mass spectrometry (Chapter 7) has also been used to study dynamic processes such as the role of transient structural disorder as a facilitator of protein–ligand binding (Xiao and Kaltashov, 2005). These experiments have detected structure formation within disordered regions, and these structures have been associated with functional implications.

With the development of sequence-based predictors, the prevalence of disordered regions in organisms has been investigated across the three kingdoms of life (Oldfield et al., 2005a; Ward et al., 2004). The frequency of native disorder was calculated for several representative genomes and found to be higher in eukaryotic proteins (33.0%) than in archaean (2.0%) and eubacterial (4.2%) proteins (Ward et al., 2004). The analysis showed that proteins containing disorder are often located in the cell nucleus, with functional associations to the regulation of transcription and cell signaling.
In a separate study, an increase in intrinsic disorder content has been observed in regulatory cell signaling, cytoskeletal, and human cancer-associated proteins (Iakoucheva et al., 2002). Disordered regions are currently being curated into a database, DisProt (Sickmeier et al., 2007), which contains 472 proteins and 1121 disordered regions as reported for release 3.6 (June 29, 2007).

Computational Approaches to Understanding Protein Disorder

The computational tools that have been developed to predict regions of protein flexibility and disorder range from the use of simple sequence complexity profiles to complex machine learning schemes such as neural networks and support vector machines (SVMs) (Figure 38.2). The successful development of these tools is attributed to the fact that sequence signatures of protein disorder are present. Training sets for these predictors are most often built from reported missing residues in X-ray crystallographic structures, but reported temperature factors (B-factors) and NMR-characterized disordered regions have also been used. First we will discuss algorithms that do not use structural information to identify and understand disordered regions. This is achieved either by examining the sequence space only or by focusing on residues for which the structure cannot be resolved. Then we will follow with alternative strategies that use temperature factors in X-ray structures, the incorporation of homology information, and coarse-grained dynamic modeling to guide the training and disorder definition process. This short overview of disorder predictors will reflect the ongoing research efforts and common strategies employed to develop sequence-based identification of disordered regions.

Figure 38.2. General strategies to predict the disorder sequence space. Schema of various strategies used to identify and understand the sequence space of disordered regions. The differences stem largely from how the disordered regions were defined and the underlying infrastructure for analysis and prediction tool development. Within all of the sequence space, a subset of sequence space will be associated with regions with low complexity, detected disorder, or those transitioning between an ordered and a disordered state. Overlaps can occur between the subsets.

SEG is a successful algorithm that identifies unstructured regions by examining the variation in sequence complexity within the sequence database (Wootton and Federhen, 1996). For a window length $L$ and an $N$-residue alphabet, the compositional complexity for a given residue is

$$K_1 = \frac{1}{L} \log_N W,$$

where $W$ is the multinomial coefficient $L! / \prod_{i=1}^{N} n_i!$ and $n_i$ is the number of occurrences of residue type $i$ in the window. Alternative formulations that resemble Shannon's entropy have also been used to measure sequence complexity. After identifying low sequence complexity regions, the second stage of this algorithm constructs an optimal subsequence to evaluate the probability of occurrence of the observed pattern, which is calculated as

$$P_0 = \frac{1}{N^L} W F,$$

where $F$ is the combinatorial expression $N! / \prod_{k=0}^{L} r_k!$ that yields the frequency of observing the sequence composition with this complexity, and $r_k$ is the number of residue types that occur exactly $k$ times in the window. This probability of occurrence has been precomputed into a table that serves efficiently as an index to identify these regions. Thus, SEG not only identifies low complexity sequence regions but also determines whether these identified regions are significant rather than a random occurrence. The success of this approach hinges on the assumption that disordered regions have low sequence complexities.
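The two SEG quantities defined above can be evaluated directly for a single window. The following Python is a minimal sketch of those formulas only, not the SEG program itself (it omits the window scanning, the trigger and extension thresholds, and the precomputed lookup table); the function names and the default 20-letter alphabet size are assumptions made for illustration.

```python
from collections import Counter
from math import factorial, log


def multinomial(counts):
    """W = L! / prod(n_i!) for the residue counts n_i of one window."""
    w = factorial(sum(counts))
    for n in counts:
        w //= factorial(n)  # exact: each partial quotient is an integer
    return w


def complexity_k1(window, alphabet_size=20):
    """K1 = (1/L) * log_N(W) for one sequence window."""
    counts = Counter(window).values()
    return log(multinomial(counts), alphabet_size) / len(window)


def probability_p0(window, alphabet_size=20):
    """P0 = W * F / N**L, with F = N! / prod(r_k!)."""
    counts = Counter(window)
    w = multinomial(counts.values())
    # r_k: number of residue types occurring exactly k times in the window;
    # residue types absent from the window contribute to r_0.
    r = Counter(counts.values())
    r[0] = alphabet_size - len(counts)
    f = factorial(alphabet_size)
    for r_k in r.values():
        f //= factorial(r_k)
    return w * f / alphabet_size ** len(window)


# A homopolymer window has the lowest possible complexity (K1 = 0).
print(complexity_k1("AAAAAAAAAAAA"), probability_p0("AAAAAAAAAAAA"))
print(complexity_k1("ACDEFGHIKLMN"), probability_p0("ACDEFGHIKLMN"))
```

For the homopolymer window, W = 1 and F = 20, so P0 = 20 / 20^12: exactly the 20 possible homopolymers among all 12-residue sequences, which is a quick sanity check on the reading of r_k used here.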
However, many disordered regions are not detected by SEG, which suggests that features other than sequence complexity are involved. PONDR, the first disorder predictor designed to detect these other regions, uses two feedforward neural networks to make predictions from several attributes, such as the fractional composition and hydropathy of the 20 amino acids (Romero, Obradovic, and Dunker, 1997; Romero et al., 1997). Unlike early flexibility predictors, this predictor was trained on a data set of eight- and seven-residue-long disordered regions defined by X-ray and NMR experiments, respectively. The regions were defined as either (1) having no resolvable atomic coordinates and therefore declared as missing in the PDB files of X-ray structures or (2) being extensively characterized as disordered with the use of NMR techniques. Predictions were made using raw input features extracted from the sequence and smoothed with a second predictor. Since the initial development, PONDR has become available as a series of eight different predictors that identify different "flavors" of disorder (Vucetic et al., 2003), indicating that disordered regions have distinctive sequence characteristics within each subclass. The success of this initial development of disorder predictors on such a small training data set is surprising and may be illuminative of sequence properties to be further discussed in the subsequent section.

DISOPRED (Jones and Ward, 2003) is another predictor based on a neural network, but it uses sequence profiles generated by PSI-BLAST (Altschul et al., 1997) as input features, with a postfilter that takes into account the confidence of secondary structure predictions. Predictions are often made based on physicochemical properties of the amino acids, but DISOPRED instead uses the amino acid identity, composition, and evolutionary conservation.
Thus, the physicochemical properties are not explicitly captured in the input features, although they may be implicitly represented. The inclusion of input features that represent evolutionary conservation was inspired by the improvement of secondary structure predictors (Chapter 29) using this information. The use of evolutionary information helps capture conserved features, or the lack thereof, between protein homologues. DISOPRED reports an accuracy of 90%, but the use of accuracy to measure success can sometimes be misleading, especially when the data set is unbalanced in class frequencies. In this case, the data set contains a much greater number of ordered residues than disordered examples, an important consideration when evaluating any predictor. Another measure of predictor performance is Matthews' correlation coefficient (MCC), which DISOPRED reports to be 0.34, suggesting an overprediction of disordered regions.

RONN is another neural network algorithm that incorporates evolutionary information to improve disorder prediction. Instead of using multiple sequence alignments and sequence similarity directly, this algorithm compares the sequence to homologous proteins with characterized and annotated disordered features (Yang et al., 2005). The alignment scores of the sequence against a database of known ordered and disordered segments are used as the input features for prediction. This interesting strategy resulted in improvements that reduced the number of incorrect classifications of residues in either the ordered or the disordered structural class.

More recently, POODLE-L (Hirose et al., 2007) implemented a two-layered support vector machine that uses physicochemical properties as input features and reports an improved performance with an MCC value of 0.658. The source of improvement is difficult to ascertain, although it may be safe to speculate that either the use of the SVM for extraction or a more focused training set is the underlying source.
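The caution about accuracy on unbalanced data, and the reason MCC values such as those quoted above are reported alongside it, can be seen with a small numerical sketch. The counts below are hypothetical (1000 residues, of which 100 are disordered) and are not taken from any of the cited benchmarks; the function names are likewise illustrative.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews' correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column sum is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical predictor A: calls every residue ordered
# ("positive" here means disordered).
all_ordered = dict(tp=0, tn=900, fp=0, fn=100)
# Hypothetical predictor B: finds half of the disordered residues,
# at the cost of 50 false positives.
partial = dict(tp=50, tn=850, fp=50, fn=50)

for name, counts in (("A", all_ordered), ("B", partial)):
    print(name, accuracy(**counts), round(matthews_cc(**counts), 3))
```

Both hypothetical predictors reach 90% accuracy, yet only the second carries any information about disorder; MCC separates them because it uses all four cells of the confusion matrix.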
While successful discrimination does depend on the correct selection of input features that properly represent critical properties of disordered regions, the physicochemical properties used in this predictor have also been used by other predictors. Thus, it is unlikely that this would be the source of significant improvements.

Strategies that use relatively simpler algorithms than machine learning approaches have also been used to efficiently identify these regions. GlobPlot2 (Linding et al., 2003b) uses the propensities of amino acids to be in either an ordered or a disordered structure, thus creating a disorder propensity index. Different propensity indexes and scales were calculated to accommodate the different definitions of disorder in the field, and predictions are made accordingly. IUPred (Dosztanyi et al., 2005a; Dosztanyi et al., 2005b) uses a low-resolution energetic force field based on the pair-wise interacting residues observed in structures. The total pair-wise interaction is estimated from a quadratic form in the amino acid composition of the protein.

Specialized disorder predictors that identify particular features of disordered regions have also been developed. The PONDR series of predictors has effectively achieved this by constructing predictors that identify pattern subsets based on features such as the length of disordered regions. Wiggle is another specialized predictor that identifies flexible regions having functional importance. Functional flexibility was defined as regions in the protein where (1) the fluctuating motions exceed the mean fluctuation by more than one standard deviation and (2) these fluctuations are involved in correlated motion.
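Criterion (1) of this definition is a simple threshold on per-residue fluctuations, sketched below in Python. This is an illustration under stated assumptions rather than the Wiggle training procedure: the fluctuation values are invented, the population standard deviation is an arbitrary choice, and criterion (2), the correlated-motion test, is not implemented.

```python
from statistics import mean, pstdev


def high_fluctuation_residues(fluctuations):
    """Return 0-based indices whose fluctuation exceeds mean + 1 SD.

    Implements only criterion (1); a full treatment would also require the
    flagged residues to take part in correlated motion (criterion 2).
    """
    threshold = mean(fluctuations) + pstdev(fluctuations)
    return [i for i, value in enumerate(fluctuations) if value > threshold]


# Hypothetical per-residue fluctuation amplitudes (arbitrary units), e.g.
# as might come from a coarse-grained dynamic model of an 8-residue segment.
fluctuations = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
print(high_fluctuation_residues(fluctuations))
```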
The use of this definition successfully identified regions such as recognition loops, catalytic loops, and hinges in a training data set where protein motion was obtained using a coarse-grained dynamic modeling technique.

Finally, various consensus and integrated strategies that incorporate different predictors to improve disorder prediction have been developed. Consensus strategies have improved structure prediction methods in recent years, and the same is likely to be the case for disorder prediction. This approach often requires the user to interpret and decide between the prediction results from different methods, since the integration of methods does not include an automated decision-making feature. Two examples of integrated servers that are available to the community are PrDOS (Ishida and Kinoshita, 2007) and iPDA (Su, Chen, and Hsu, 2007).

Sequence Basis for the Biophysical Property of Ordered and Disordered Regions

Decoding the sequence space is at the heart of many fields, and understanding how the biophysical properties of proteins are encoded in the sequence is imperative to making inferences about the protein structural fold and function. For example, the amino acid hydrophobicities determined from several biophysical and theoretical experiments have been used to make predictions for higher order protein features such as secondary structure (Palliser and Parry, 2001). Unfortunately, advances in the field are still needed before an accurate sequence-based biophysical description of proteins is available to the community. Recurring amino acid biases and sequence patterns found within particular features of proteins are often examined in the hope of gleaning some insight into a biophysical explanation. With regard to protein disorder and flexibility, variances in protein sequences have been examined, and regions of low sequence complexity have been found to preclude structure formation (Wootton and Federhen, 1993; Wootton and Federhen, 1996).
Glutamine-rich, glycine-rich, and arginine-rich sequences are often a part of this class of low sequence complexity regions, occasionally with a periodic pattern of repeating units. Regions enriched in proline, glutamic acid, serine, and threonine (PEST) are also associated with protein disorder.

The arrival of disorder predictors has allowed researchers to gain more insight into the sequence biases and patterns associated with ordered and disordered structures, through both the training process and subsequent analysis of the sequence space identified with these algorithms. Initially, there were some concerns that the disorder predictors were making predictions based on low complexity features similar to those identified by SEG. Instead, it has been demonstrated that amino acid composition differs between low complexity, disordered, and ordered regions (Romero et al., 2001). Disordered regions have been found to contain higher levels of R, K, E, P, and S and lower levels of C, W, Y, I, and V compared to ordered regions. Based on this analysis of the change in amino acid frequency between ordered and disordered structures, the residues can be ranked from disorder promoting to order promoting as follows: K, E, D, P, N, S, Q, G, R, T, A, M, H, L, V, Y, I, F, C, W. Such a ranking suggests that the amount of flexibility, and hence disorder, can be tuned depending on which residues are used and in what order.

Sequences that adopt both an ordered and a disordered structure depending on the observed conformational state have been investigated to understand how such a balance could be achieved (Zhang et al., 2007). These regions are described as having a "dual personality" and have been collected from proteins with multiple X-ray structures in different conformations.
Regions that are invisible in one conformer but resolved in another were defined as being ambivalent between order and disorder. Residues were clustered into three major groups based on their relative abundance in disordered, ordered, or ambivalent regions. The first group contains hydrophilic and small amino acids (K, E, S, G, and A) that are largely associated with disordered regions. The second group consists mostly of hydrophobic residues (M, H, Y, I, F, C, and W) that are found in abundance within ordered regions. Finally, the third group consists of mostly hydrophilic amino acids (D, T, Q, N, P, and R) that have fairly equal propensities for ordered and disordered regions. These clusters are in some agreement with those identified by Romero et al., although there are differences. Wiggle (Gu, Gribskov, and Bourne, 2006) shows a different scenario of amino acid preferences for these regions, ranked in the following decreasing order: E, K, Q, R, D, P, N, S, G, A, L, T, W, H, M, Y, F, C, V, I. This ranking shows correlation with the consensus hydrophobicity index (Palliser and Parry, 2001). Although some general trends can be observed between the different analyses, the lack of agreement suggests that more work is needed to understand how the biophysical property of protein disorder is encoded in the sequence.

Investigation of higher order sequence associations with disordered regions has identified at least two kinds of nonrandomly recurring patterns, based on amino acid identity or physicochemical properties (Lise and Jones, 2005). The analysis was conducted on segments of up to eight residues, and repeated observations of proline-rich or charged segments in disordered regions were identified. Although rather restrictive parameters were used in this analysis, this examination shows that patterns associated with disordered regions are much simpler than those found in ordered regions and rarely contain two different amino acids.
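Returning to the two residue rankings quoted above (the disorder-to-order ranking from Romero et al. and the Wiggle ranking), one simple way to examine where they agree and differ is a rank comparison. The sketch below is an illustration added here, not an analysis from either study: the use of Spearman's rank correlation and the helper names are assumptions, and because the two rankings come from different definitions of disorder and flexibility, the result is only a rough descriptive summary.

```python
# Rankings as quoted in the text, first letter = most disorder/flexibility promoting.
ROMERO_RANKING = "KEDPNSQGRTAMHLVYIFCW"
WIGGLE_RANKING = "EKQRDPNSGALTWHMYFCVI"


def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    if sorted(order_a) != sorted(order_b) or len(set(order_a)) != len(order_a):
        raise ValueError("both orderings must contain the same unique items")
    rank_b = {item: rank for rank, item in enumerate(order_b)}
    n = len(order_a)
    d_squared = sum((rank - rank_b[item]) ** 2 for rank, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def rank_shifts(order_a, order_b):
    """List (item, shift) pairs sorted by the size of the rank change from a to b."""
    rank_b = {item: rank for rank, item in enumerate(order_b)}
    shifts = [(item, rank_b[item] - rank) for rank, item in enumerate(order_a)]
    return sorted(shifts, key=lambda pair: -abs(pair[1]))


print(spearman_rho(ROMERO_RANKING, WIGGLE_RANKING))
print(rank_shifts(ROMERO_RANKING, WIGGLE_RANKING)[:5])
```

The per-residue shifts are usually more informative than the single coefficient, since they show which amino acids are placed differently by the two definitions.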
These patterns are not inclusive of all the possible patterns that may be found in disordered regions. Despite the low complexity of these local sequence patterns, their nonrandom occurrence reveals that the biophysical properties of disordered structures depend on both the sequence composition and order. Furthermore, different types of disordered regions have been identified with a dependency on segment length (Vucetic et al., 2003). Subclasses of disordered regions have been identified with different functional associations. More recently, the thermodynamic features of unfolded regions were surveyed using a structure-based thermodynamic model (Wang et al., 2008). The results show that, unlike natively folded proteins, the thermodynamics of unfolded regions is dominated by local sequence contributions and is sensitive to the composition and order of the sequence. The local dependence of the disorder thermodynamics also provided some insight regarding why certain biophysical properties of the natively folded state may be retained.

In the process of understanding the sequence basis for protein disorder, it is also important to understand the contributions of evolutionary pressures that select for the disordered state. Although evolutionary conservation has been included as one of the input features to help disorder predictors discriminate between the different sequence types, strict conservation may not necessarily be a critical feature of disordered regions. Disordered regions have been reported to evolve rapidly (Brown et al., 2002) and have been observed to have more alternative splice sites, which generate more functional diversity in multicellular organisms (Romero et al., 2006).
Evidence that disordered regions can be identified with a reduced set of the amino acid alphabet further supports the notion of weak evolutionary selection (Weathers et al., 2004). Consequently, the robustness to substitutions simplifies the combinatorial possibilities of amino acid patterns; thus, it may be possible to decode a direct linkage between sequence and the biophysical properties of disordered regions.

Biological Consequences

In addition to paving the way to a better understanding of the sequence space, computational analysis with disorder predictors is the source of many hypotheses that generalize the biological and functional significance of disordered regions in proteins. The detection of the prevalence of disordered regions across different genomes, mentioned earlier, is one example of how disorder predictors have been applied. At least 28 functional roles of disordered proteins have been identified and can be grouped into four main functional classes: (1) molecular recognition, (2) molecular assembly, (3) protein modification, and (4) entropic chain activities (Dunker et al., 2002; Radivojac et al., 2007). These functional classes largely appear to reflect intermolecular function linked to regulatory processes, but the biophysical properties of disordered regions are also important for protein folding, allostery, and catalytic processes that are not necessarily captured by the functional classification system presented by Dunker and colleagues. Patterns detected within disordered regions have also been used to functionally classify proteins, particularly in cases where there are no structural homologues (Lobley et al., 2007).

First, disordered structures provide an advantage in molecular recognition by promoting promiscuous, transient binding to several targets and therefore help to increase the complexity of protein interaction networks (Tompa, Szasz, and Buday, 2005).
The resulting promiscuous binding allows these proteins to have multiple cellular roles (Sandhu and Dash, 2006). Binding mechanisms used by disordered regions rely more on hydrophobic–hydrophobic interactions, with more intermolecular contacts that cover a much larger surface area of the target protein compared to ordered binding sites (Meszaros et al., 2007; Vacic et al., 2007). This is often achieved through a continuous segment that sometimes contains preformed structural elements, where the native structure is fully induced after binding to the substrate (Fuxreiter et al., 2004). Investigation of the underlying linear motifs important for recognition suggests that order-favoring sequences are grafted between disordered regions that serve as a carrier (Fuxreiter, Tompa, and Simon, 2007).

Disordered regions have been identified as important for the formation of larger complexes such as the viral capsid, bacterial flagellar system, cytoskeleton, ribosome, and clathrin coat (Namba, 2001; Dafforn and Smith, 2004; Ward et al., 2004). A significant correlation was observed between predicted structural disorder and the number of proteins assembled into complexes in an analysis of E. coli and S. cerevisiae proteins (Hegyi, Schad, and Tompa, 2007). The larger complexes show a higher average disorder content with longer predicted segments. These results are in agreement with the idea that disordered regions are involved in protein binding and molecular recognition, which could be one contributing mechanism to complex formation. Alternatively, disordered regions in complex formation may simply serve as linkers between well-formed domains. The hypotheses presented here have been based on bioinformatics analysis and need to be further investigated experimentally.
Other disordered regions have been cited to act as entropic chains, a functional class that includes linkers, bristles, springs, and clocks (Dunker et al., 2002). This functional class is less well studied and appears to play a role in introducing a level of organization in time and space. For example, linkers serve to link domains, while bristles help keep molecules apart through molecular exclusion. Springs are segments whose restoring forces favor a randomized fold and that become restricted when stretched, as observed in titin molecules found in muscle fibers (Labeit and Kolmerer, 1995; Kellermayer et al., 2000) and in elastin (Pometun, Chekmenev, and Wittebort, 2004). The increased flexibility of disordered regions has been hypothesized to be exploited as a “random generator” that affects the timing of biological processes, such as determining the closure of a voltage-gated channel (Wissmann et al., 1999; Wissmann et al., 2003).

The aforementioned four categories are neither mutually exclusive nor comprehensive of all possible functions associated with disordered regions, which are yet to be fully understood. For example, a highly disordered loop in the cochaperonin GroES is important for binding to GroEL and has also been suggested to facilitate the cycles observed in chaperonin-mediated protein folding through modulation of binding affinity without affecting specificity (Landry et al., 1996). This example suggests that a certain level of flexibility necessary for function is being evolutionarily selected and conserved.

Retaining this balance of flexibility is also evident in the presence of disordered regions that are important for catalysis and allostery. Changes in local segmental flexibility were studied in the catalytic subunit of cAMP-dependent protein kinase using site-directed labeling and fluorescence spectroscopy (Li et al., 2002).
The backbone located around the B-helix was found to have reduced flexibility only when the substrate and pseudosubstrate are bound to the catalytic domain. This stage of the catalytic cycle coincided with the phosphoryl transfer transition, suggesting that internal disorder is important for this catalytic step. In another example using single-molecule enzymatic assays, DNA was hydrolyzed by lambda exonuclease with contributions from sequence-dependent factors and disorder arising from conformational changes (van Oijen et al., 2003).

Although popular competing allosteric models are based on changes observed in rigid structural bodies, alternative views propose that proteins can be regulated through changes in protein dynamics. The Cooper–Dryden model is a mathematical formulation showing that protein allostery can be achieved in the absence of structural change (Cooper and Dryden, 1984). The dimeric CAP that binds to cAMP is an example of the Cooper–Dryden model in which changes were observed in the dynamics of the system but not in the structure (Popovych et al., 2006). In another example, dynamics is an integral part of the allosteric response initiated by a ligand-induced disorder-to-order transition in the adenylate binding loop of the biotin repressor, a transcription regulatory protein (Naganathan and Beckett, 2007). Changes in internal fluctuation between different stages of the activation cycle of cyclin-dependent kinase 2 have been associated with regions that are functionally important for regulation and catalytic activity, with possible detection of entropy compensation mechanisms (Gu and Bourne, 2007). The advantages of coupled disordered regions for allosteric control have been demonstrated through statistical mechanics (Hilser and Thompson, 2007).
Disease Impacts

Disordered regions in proteins have been implicated in several diseases, including neurodegenerative diseases, cardiovascular diseases, and cancer. The disordered regions in these pathogenic culprits make structural studies with X-ray crystallography and NMR difficult. NACP, for example, is a natively unfolded protein that seeds the polymerization of amyloid proteins leading to Alzheimer’s disease and impacts learning (Weinreb et al., 1996). It is suggested that the disordered regions allow for promiscuous binding and help potentiate the protein–protein interactions that lead to the formation of these insoluble fibrils. Likewise, the tau protein found in Alzheimer’s tangles has intrinsically disordered regions and forms amyloid fibrils connected with disease progression (Skrabana, Sevcik, and Novak, 2006; Skrabana et al., 2006).

A subset of eukaryotic proteins related to cardiovascular disease (CVD) was examined and found to be enriched in disorder content (Cheng et al., 2006a). The analysis was conducted with PONDR disorder predictions, cumulative distribution function analysis, and charge–hydropathy plot analysis. Predictions for α-helical molecular recognition features suggest that they are highly abundant in these proteins. The percentage of CVD-related proteins containing >30 residues predicted to be disordered was 57 ± 4%, compared to 47 ± 4% of eukaryotic proteins in Swiss-Prot. The role of disorder in cardiovascular diseases needs further experimental validation, but the finding is not surprising since disordered regions are associated with 66 ± 6% of signaling molecules, proteins that often have regulatory roles. Diseases are often a result of regulated processes gone awry. Similarly, 79 ± 5% of human cancer-associated proteins have been found to contain regions of disorder that are at least 30 consecutive residues in length (Iakoucheva et al., 2002).
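The percentages quoted above rest on a simple criterion: a protein counts as disordered if it contains a sufficiently long run of consecutive residues predicted to be disordered. A minimal sketch of that count is given below. The per-residue score format, the 0.5 cutoff, and the function names are assumptions made for illustration; they are not taken from PONDR or from the cited studies.

```python
from typing import Sequence


def longest_disordered_run(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Length of the longest stretch of consecutive residues whose
    predicted disorder score is at or above `threshold` (assumed cutoff)."""
    longest = 0
    current = 0
    for score in scores:
        if score >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def fraction_with_long_disorder(
    proteins: Sequence[Sequence[float]],
    min_length: int = 30,
    threshold: float = 0.5,
) -> float:
    """Fraction of proteins containing at least one predicted disordered
    region of `min_length` or more consecutive residues."""
    if not proteins:
        return 0.0
    hits = sum(
        1 for scores in proteins
        if longest_disordered_run(scores, threshold) >= min_length
    )
    return hits / len(proteins)
```

Applied to a set of per-residue predictions, `fraction_with_long_disorder` returns the kind of proportion reported above; the uncertainty attached to the published values would have to be estimated separately and is not computed here.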
One example of an oncoprotein is HPV16 E7, an extended dimer with a stable and cooperative fold that nevertheless displays properties of natively unfolded proteins (Garcia-Alai, Alonso, and de Prat-Gay, 2007). The region of disorder is located in the N-terminal region of the E7 domain, which contains two important sites for regulation: (1) the retinoblastoma tumor suppressor binding site for molecular recognition and (2) a casein kinase II phosphorylation site that induces stabilization upon phosphorylation. The structural plasticity of this region has allowed adaptation to the binding of a variety of protein targets and regulation of protein turnover. HPV16 is one of the human papillomavirus strains frequently associated with cervical cancer.

Case Study: Disorder in the Glucocorticoid Receptor

We present the structural anatomy of a transcription factor in more detail as an example of how disordered regions may play a functional role (Figure 38.3).

Figure 38.3. Anatomy of the glucocorticoid receptor. The structure of nearly half of the glucocorticoid receptor cannot be resolved due to intrinsic disorder in the N-terminal domain (yellow) that contains the transactivating motif AF1 (green). Low-resolution structural information shows a composition of α-helices in the AF1 core region (187–244). Residues 399–419 are found to contain the PEST motif of proline, glutamic acid, serine, and threonine that is associated with highly disordered regions. High-resolution structures are available for the DNA (blue) and steroid (red) binding domains connected by a hinge (orange).

Glucocorticoid receptor (GR) is a steroid binding nuclear receptor with well-defined domain boundaries (McEwan et al., 2007). The receptor is composed of three domains, with structures available for the DNA and ligand binding domains located at the C-terminal end in the bound form.
The N-terminal domain (NTD), on the other hand, is highly disordered, and no high-resolution structural data are available for this region, which contains the transactivating AF1 domain (residues 77–262) involved in protein–protein interactions and regulation of transcriptional activity (Lavery and McEwan, 2005). However, significant structural data have been obtained using alternative methods such as biochemical analysis, circular dichroism, NMR, fluorescence, and Fourier transform infrared spectroscopy. Predictions for the structural content of this region have also been made using secondary structure prediction algorithms. These data collectively show that GR-NTD potentially consists of a mixture of α-helix, β-strand, and coil conformations.

The disordered state of this region is hypothesized to provide a mechanism for allosteric control that allows for the adoption of different conformers, which subsequently create different binding interfaces to interact with a multitude of targets. This feature may be particularly important for the AF1 region, which is found to be 27% α-helical and 39% disordered in GR. The AF1 region may be an example of molecular recognition elements important for protein–protein interactions that use disordered regions as shuttles, as mentioned earlier. Through mutagenic studies, the induced formation of α-helical structures in this region has been correlated with the transactivation potential of GR (Dahlman-Wright et al., 1995; Dahlman-Wright and McEwan, 1996). This example demonstrates how regulation of transcriptional activity is achieved through modulating the order–disorder transition state, which can be induced by a variety of factors such as DNA binding events (Lefstin and Yamamoto, 1998; Kumar et al., 1999) and even the presence of structure-inducing osmolytes (Baskakov et al., 1999; Kumar et al., 2007).
This strategy may be commonly used by all transcription factors, as suggested by an analysis with PONDR that shows a relatively increased disorder content in transcription factors compared to other subsets of the eukaryotic proteins. Furthermore, the transcription activation regions are identified to have higher disorder content than the DNA binding region for the majority of the transcription factors (Liu et al., 2006).

PROTEIN CONFORMATIONAL VARIANTS AND ENSEMBLES

A discussion about protein disorder is really a discussion of protein conformational variants and the resulting ensembles that are the underlying basis for all biological phenomena and observations measured experimentally. The concept of an ensemble highlights the multiple possibilities that proteins can explore in alternative conformations rather than a single static structure, an important concept we wish to emphasize in this section. Consideration of alternative protein conformations expands not only the structural space but also the functional space, which can be regulated simply through partial unfolding that is observed as local protein disorder. Structural variations are often appreciated when differences are observed between homologous proteins, but structural variations can also be observed for a protein at a single equilibrium state or between two states such as a ligand-bound and an unbound conformation (Figure 38.4).

Figure 38.4. Conformational variations in calmodulin. Comparison of (a) one conformational state of yeast calmodulin and (b) 31 states aligned at the N-terminal domain in the absence of calcium ions (PDBID: 1LKJ). (c) The bovine calmodulin adopts a dumbbell-like shape with calcium binding, which is different from the more globular structure found in yeast. (d) Calmodulin bound to a substrate (white spheres).

In the field of structural biology and structural bioinformatics, it is convenient to view high-resolution structural data as a single molecule, but we must remind ourselves that this interpretation is not the complete view. X-ray crystallographic data reflect the collective contribution of all the protein molecules at equilibrium in the crystal lattice. Thus, the X-ray structure represents the dominant conformation in the ensemble, with regions of high temperature factors indicating higher conformational variability. NMR, on the other hand, provides multiple solutions for conformations found in solution, thus instilling in the structure interpreter a greater appreciation for conformational variations. Protein dynamics and disorder leading to conformational variation are observed in NMR experiments as resonance overlap and peak broadening from conformational averaging and contributions from intermediate time scale dynamics.

As an example of a protein that exists in many different conformational variations, we use calmodulin to illustrate the point (Figure 38.4). This regulator responds to calcium ions and exists in three main conformational states: (1) the apo-structure, (2) bound to calcium ions, and (3) bound to the target substrate. The apo-form of calmodulin has a structure of two globular domains connected by a hinge, as observed in an NMR structure of calmodulin from Saccharomyces cerevisiae (Figure 38.4a). Variations within a single state can be immediately observed in the apo-structure when alternative conformational states are aligned on the N-terminal domain (Figure 38.4b). The C-terminal globular domain can exist in a different conformation relative to the N-terminal domain. Variations between homologues are observed in the calcium-bound bovine calmodulin, which has a helical linker region between the two domains, whereas the yeast calmodulin adopts a more globular structure (Figure 38.4c).
Finally, significant structural rearrangement is observed upon binding to substrate. The multitude of structural conformations adopted by calmodulin represents some of the challenges that face the structural bioinformatics field.

A biophysical explanation for protein disorder can be given in terms of the underlying presence of conformational variations (Figure 38.5). An important concept here is that most experimental measurements of proteins are not single-molecule studies and therefore are collective contributions of all protein molecules in the solution. Thus, the observed measurement can be written as the summed, probability-weighted contribution of each conformational state in the solution:

\[ \langle \mathrm{Obs} \rangle = \sum_i P_i \cdot \mathrm{Obs}_i , \]

where $P_i$ is the probability of conformational state $i$ and $\mathrm{Obs}_i$ is the value of the observable in that state. With this in mind, disorder in an X-ray structure, for example, arises when there are many conformational variations that do not give rise to a single converged structure that is viewed as “ordered.” Sometimes highly ordered regions can be mistaken for a disordered structure, particularly if large domain motions are involved such as those observed in calmodulin (Figure 38.4b).

The contribution of different states to the observation can be explained by one of two models that represent the ratio of states differently (Figure 38.5). The first model assumes a discrete two-state conformation, while the second allows additional conformational states to be present. We illustrate the difference between the two models by applying them to the unfolding process of proteins (Figure 38.5a). In the first model, only the native (order) and denatured (disorder) states of the protein are allowed to exist in solution. The observed destabilization of proteins with increasing denaturant is then a result of the changing ratio between these two states in the solution (Figure 38.5b). The probability of observing an ordered structure decreases as the probability of observing a disordered structure increases.
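The ensemble average above, and the two-state picture of Model 1, can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the expression used to obtain two-state populations from an unfolding free energy (K = exp(-ΔG/RT), with the native population equal to 1/(1 + K)) is a standard textbook relation assumed here rather than one stated in this section.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ensemble_average(probabilities, observables):
    """Probability-weighted sum of a per-state observable.

    probabilities[i] is the population of state i (must sum to 1),
    observables[i] is the value the measurement would take for state i.
    """
    if len(probabilities) != len(observables):
        raise ValueError("need one observable per state")
    if not math.isclose(sum(probabilities), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(p * o for p, o in zip(probabilities, observables))


def two_state_probabilities(delta_g_unfold, temperature_k=298.15):
    """Native and denatured populations for a two-state model (Model 1).

    delta_g_unfold is the unfolding free energy in kcal/mol (assumed input);
    K = [denatured]/[native] = exp(-delta_g_unfold / RT).
    """
    k_unfold = math.exp(-delta_g_unfold / (R_KCAL * temperature_k))
    p_native = 1.0 / (1.0 + k_unfold)
    return p_native, 1.0 - p_native
```

For instance, a signal normalized to 1 for the native state and 0 for the denatured state gives `ensemble_average(two_state_probabilities(dg), [1.0, 0.0])` as the observed value. Model 2 uses the same `ensemble_average` call with additional entries for the intermediate states.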
In the second model, intermediate states containing partially unfolded conformers are allowed. Thus, the probability of observing each intermediate state as well as the native and denatured states contributes to the observation.

Figure 38.5. Ensemble-based description of protein disorder. Biological observations are the sum contribution of the different states in solution. The unfolding process of proteins, for example, can be described with one of the two models. Model 1 assumes two discrete states in solution in the native (order) conformation or denatured (disorder) conformation. The denaturation process is the changing ratio of these two states from one spectrum to the other. Model 2 allows for other intermediate conformations to contribute to the observation. Figure also appears in Color Figure section.

The importance of an ensemble-based interpretation of the native state can be demonstrated through the use of COREX, a statistical thermodynamic model that uses free energy values that have been structurally parameterized (Hilser and Freire, 1996; Hilser et al., 2006). This experimentally validated model allows us to calculate the heat capacity ($\Delta C_p$), enthalpy ($\Delta H$), and entropy ($\Delta S$) differences between the partially unfolded states and the native state (reported in kcal/K/mol). More importantly, the derivation allows for the interpretation of residue stability and contribution to the energetics of the ensemble, which has provided insight into possible mechanisms for cooperative (Hilser et al., 1998) and allosteric (Hilser and Thompson, 2007) processes.
Briefly, the relative Gibbs free energy of each possible conformational state adopted by the protein ($\Delta G_i$) is expressed in terms of the standard thermodynamic equation:

\[ \Delta G_i = \Delta H_i - T \Delta S_i . \]

COREX obtains the relative Gibbs free energy using (1) a high-resolution structure and (2) a statistical thermodynamic model where the variables have been parameterized based on changes in the accessible surface area ($\Delta ASA$) between the native and the partially unfolded state. The enthalpic contribution to the state can be written as the sum of enthalpic contributions from apolar ($\Delta H_{ap}$) and polar ($\Delta H_{pol}$) residues:

\[ \Delta H = \Delta H_{ap} + \Delta H_{pol} . \]

The enthalpy change is related to $\Delta ASA$ (Å²) in the following way and is parameterized at a reference temperature of 60 °C, which is the median unfolding temperature for the data set of model proteins used:

\[ \Delta H(60) = a_H(60)\,\Delta ASA_{ap} + b_H(60)\,\Delta ASA_{pol} , \]
\[ a_H(60) = -8.44 , \qquad b_H(60) = 31.4 . \]

The entropy of the system is the sum of contributions from solvent ($\Delta S_{solv}$) and conformational ($\Delta S_{conf}$) entropies:

\[ \Delta S = \Delta S_{solv} + \Delta S_{conf} . \]

The solvent entropy can be calculated with the knowledge of the heat capacity of the protein as derived:

\[ \Delta S_{solv} = \Delta S_{solv,ap} + \Delta S_{solv,pol} , \]
\[ \Delta S_{solv} = \Delta C_{p,ap} \ln\!\left(T / T^{*}_{S,ap}\right) + \Delta C_{p,pol} \ln\!\left(T / T^{*}_{S,pol}\right) , \]

where $T^{*}_{S,ap} = 385.15$ and $T^{*}_{S,pol} = 335.15$ are the reference temperatures at which the hydration entropy is equal to zero (Baldwin, 1986; Murphy and Freire, 1992; D’Aquino et al., 1996).
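The enthalpy and solvent-entropy terms above translate directly into code. The sketch below is not the COREX implementation; the function names are hypothetical. Two assumptions are made and should be checked against the COREX literature before use: the apolar enthalpy coefficient is taken as negative (-8.44), and temperatures in the logarithmic terms are absolute temperatures in kelvin, matching the reference temperatures 385.15 and 335.15. Units of the coefficients are not stated in this section, so the results carry whatever units the inputs imply.

```python
import math

A_H_60 = -8.44    # apolar enthalpy coefficient at 60 degrees C (sign assumed)
B_H_60 = 31.4     # polar enthalpy coefficient at 60 degrees C
T_S_AP = 385.15   # K, apolar hydration entropy is zero at this temperature
T_S_POL = 335.15  # K, polar hydration entropy is zero at this temperature


def enthalpy_at_60(dasa_ap, dasa_pol):
    """Delta H at the 60 degrees C reference, from apolar and polar
    changes in accessible surface area (square angstroms)."""
    return A_H_60 * dasa_ap + B_H_60 * dasa_pol


def solvent_entropy(dcp_ap, dcp_pol, temperature_k):
    """Solvent entropy from the apolar and polar heat capacity changes,
    evaluated at an absolute temperature in kelvin."""
    return (dcp_ap * math.log(temperature_k / T_S_AP)
            + dcp_pol * math.log(temperature_k / T_S_POL))


def gibbs_free_energy(delta_h, delta_s, temperature_k):
    """Delta G = Delta H - T * Delta S for one conformational state."""
    return delta_h - temperature_k * delta_s
```

Because the enthalpy is parameterized at 60 °C, using it at another temperature would require a heat-capacity correction that is not spelled out in this section and is therefore left out of the sketch.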
The heat capacity is found to scale with $\Delta ASA$ for temperatures up to 80 °C as follows:

\[ \Delta C_p = \Delta C_{p,ap} + \Delta C_{p,pol} , \]
\[ \Delta C_p = a_c(T)\,\Delta ASA_{ap} + b_c(T)\,\Delta ASA_{pol} , \]
\[ a_c(T) = 0.45 + 2.63 \times 10^{-4}\,(T - 25) - 4.2 \times 10^{-5}\,(T - 25)^2 , \]
\[ b_c(T) = -0.26 + 2.85 \times 10^{-4}\,(T - 25) + 4.31 \times 10^{-5}\,(T - 25)^2 . \]

Finally, to complete the calculation of $\Delta S$, the conformational entropy is calculated as follows:

\[ \Delta S_{conf} = \sum \Delta S_{\text{bu-ex}} + \sum \Delta S_{\text{ex-un}} + \sum \Delta S_{bb} . \]

The three contributions to conformational entropy are (1) $\sum \Delta S_{\text{bu-ex}}$: buried residues that become exposed with partial unfolding; (2) $\sum \Delta S_{\text{ex-un}}$: exposed residues in the unfolded state; and (3) $\sum \Delta S_{bb}$: backbone entropy changes for residues that become unfolded. The entropy contributions of each amino acid have been determined and these values are used in the calculation (Lee et al., 1994; D’Aquino et al., 1996).

The relative Gibbs free energy of each state is calculated with these parameterized thermodynamic variables and will be important in determining the probability of observing such a conformational state in the ensemble. Under equilibrium conditions, statistical mechanics states that the probability of any given conformational state $i$ ($P_i$) is given by the equation

\[ P_i = \frac{\exp(-\Delta G_i / RT)}{Q} , \]

where the statistical weights, also known as the Boltzmann exponents, $\exp(-\Delta G_i / RT)$, are defined by $\Delta G_i$ relative to the gas constant $R$ and temperature $T$. $Q$ is the conformational partition function defined as the sum of the statistical weights of all the states accessible to the protein:

\[ Q = \sum_{i=0}^{N} \exp(-\Delta G_i / RT) . \]

These probabilities reflect preferences for the protein to adopt a partially unfolded conformational state and can be extended to calculate the free energy contributions of each residue to the ensemble.
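As a worked illustration of the heat capacity scaling and of the Boltzmann weighting described above, the following sketch evaluates ΔCp from surface area changes and converts a list of state free energies into probabilities. It is a minimal sketch rather than COREX itself: the function names are hypothetical, free energies are assumed to be in kcal/mol, the temperature in the heat capacity coefficients is assumed to be in degrees Celsius, and the leading coefficient of the polar term is assumed to be negative (-0.26).

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def heat_capacity(dasa_ap, dasa_pol, t_celsius):
    """Delta Cp from apolar and polar Delta ASA (square angstroms),
    with temperature-dependent coefficients a_c(T) and b_c(T)."""
    dt = t_celsius - 25.0
    a_c = 0.45 + 2.63e-4 * dt - 4.2e-5 * dt ** 2
    b_c = -0.26 + 2.85e-4 * dt + 4.31e-5 * dt ** 2
    return a_c * dasa_ap + b_c * dasa_pol


def state_probabilities(delta_g, temperature_k=298.15):
    """Boltzmann probabilities P_i = exp(-dG_i/RT) / Q for a list of
    relative free energies dG_i (kcal/mol, assumed units).

    The minimum free energy is subtracted before exponentiating; this
    constant cancels in the ratio and only guards against overflow.
    """
    rt = R_KCAL * temperature_k
    g_min = min(delta_g)
    weights = [math.exp(-(g - g_min) / rt) for g in delta_g]
    q = sum(weights)  # partition function (up to the cancelled constant)
    return [w / q for w in weights]
```

With the native state assigned a relative free energy of zero, `state_probabilities([0.0, 1.5, 3.0])` returns populations that decrease with increasing free energy and sum to one.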
Using the probability-weighted conformations in the generated ensemble, the stability of residue $j$ can be calculated as the ratio of the summed probabilities of the states in which the residue is folded to those in which it is unfolded:

\[ \kappa_{f,j} = \frac{\sum P_{f,j}}{\sum P_{nf,j}} , \]

where $\sum P_{f,j}$ and $\sum P_{nf,j}$ are the summed probabilities of all the states in which the residue is either folded or unfolded, respectively. The free energy contribution of each residue to the ensemble can then be calculated:

\[ \Delta G_{f,j} = -RT \ln \kappa_{f,j} . \]

The importance of this derived formalism is that it can be extended to study changes in energetic contributions at the residue level and provide insights into functional processes such as cooperativity (Liu, Whitten, and Hilser, 2006; Liu, Whitten, and Hilser, 2007; Pan, Lee, and Hilser, 2000) and allostery (Hilser and Thompson, 2007) by interpreting proteins as an ensemble of multiple conformations. Recent systems in which cooperativity has been identified and studied with COREX are dihydrofolate reductase and eglin C. The studies defined structural–thermodynamic linkages based on correlations in stability changes between residues in the ensemble as captured by $\kappa_{f,j}$. By examining these correlations, the model helps to define a mechanism for site–site communication, particularly between ligand binding sites and distantly located regions. The analyses suggest an alternative view of energetic coupling between residues when a clear, connected pathway of intramolecular interactions between them cannot be identified. The results also further emphasize the importance of entropic contributions, which are often neglected.

While it is important to produce the correct high-resolution structure using fold recognition, homology modeling, and ab initio structure prediction approaches (Chapters 29–32), it is equally important to construct other physically and chemically valid conformational states that can be sampled by the protein.
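The residue stability constant and its free energy can be computed from an ensemble once each state has a probability and a record of which residues are folded in it. The sketch below assumes a simple representation (one boolean mask per state) and hypothetical function names; it illustrates the two equations above and is not the COREX code.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def residue_stability(state_probs, folded_masks, temperature_k=298.15):
    """Per-residue stability constants and free energies.

    state_probs[i]  : probability of state i
    folded_masks[i] : sequence of booleans, True where residue j is
                      folded in state i (assumed representation)
    Returns a list of (kappa_f_j, delta_g_f_j) tuples, one per residue,
    with delta_g_f_j = -RT ln(kappa_f_j) in kcal/mol.
    """
    if len(state_probs) != len(folded_masks):
        raise ValueError("need one mask per state")
    n_residues = len(folded_masks[0])
    rt = R_KCAL * temperature_k
    results = []
    for j in range(n_residues):
        p_folded = sum(p for p, m in zip(state_probs, folded_masks) if m[j])
        p_unfolded = sum(p for p, m in zip(state_probs, folded_masks) if not m[j])
        if p_unfolded == 0.0:
            # Residue folded in every populated state: unbounded stability.
            results.append((math.inf, -math.inf))
        elif p_folded == 0.0:
            # Residue unfolded in every populated state.
            results.append((0.0, math.inf))
        else:
            kappa = p_folded / p_unfolded
            results.append((kappa, -rt * math.log(kappa)))
    return results
```

Comparing the returned values between two ensembles, for example with and without a ligand, is one way to look for the correlated stability changes discussed above.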
Generating these structural variants that collectively produce a protein ensemble can be achieved with a variety of models, some more restrictive than others. Restrictive models are those that assume disordered or partially unfolded regions of the protein to adopt only coil structures (Bernado et al., 2005; Jha et al., 2005). A less restrictive model such as TraDES (Feldman and Hogue, 2000) is an unbiased conformational sampling method that generates plausible random structures allowing for both native and nonnative contacts. Other conformer generating methods, including Rosetta (Simons et al., 1997) and CNS (Brunger et al., 1998), can also be used to predict structures in these disordered and highly flexible regions. The relative probabilities of the generated conformational variants that potentially populate the ensemble can then be calculated with experimental constraints using ENSEMBLE (Choy and Forman-Kay, 2001; Marsh et al., 2007). The population weight assignment is achieved with a pseudoenergy minimization process and a Monte Carlo algorithm. With these strategies we can begin to interpret the functional consequences arising from this variety of conformational states.

FUTURE DIRECTIONS

Aside from the growing amount of literature on this topic, community recognition of the importance of understanding disordered regions is signified by the inclusion of disorder predictor evaluation in CASP (Chapter 28). The first evaluation of disorder predictors appeared in CASP5 (Melamud and Moult, 2003) in 2002, with results showing successful detection of over half of the disordered residues in the blind set with a low rate of overprediction. However, proper evaluation of these predictors remains a challenge that still needs to be refined. This is to be expected due to both the varied definition and the existence of these different types of disordered regions.
Furthermore, as noted by the assessors, the data set used for evaluation is skewed toward short disordered regions identified by missing residues in X-ray crystallographic structures. As such, caution should be taken when interpreting the performance of these predictors. In spite of these weaknesses, the evaluation process is a necessity because the predictors serve many useful purposes. The most recent benchmarking effort, conducted at CASP7 in 2006, showed that in spite of the many new generations of disorder predictors, significant improvements in performance have not been observed, and variations are seen in their sensitivity and specificity for detecting these regions (Bordoli, Kiefer, and Schwede, 2007). Improvements in disorder predictors cannot be made without a systematic study of these regions, and several experimental strategies using techniques that combine heat and acid treatment with mass spectrometry and/or 2D electrophoresis have been proposed to tackle this issue (Csizmok et al., 2007).

The applications of disorder predictors are not limited to target protein identification and elimination for structural genomic efforts. New applications include improved functional categorization of newly identified proteins (Lobley et al., 2007) and a potential role in improved drug design (Cheng et al., 2006b). Leveraging what we know about disordered regions will prove immensely valuable for the majority of the proteins that do not adopt a native fold. Currently, the function of about 35% of proteins cannot be categorized using homology-based assignment, leaving researchers with a large set of “hypothetical protein” drug targets with unknown function (Ofran et al., 2005). A systematic characterization of protein disorder can be achieved by combining the developments in improved computational and experimental analysis (Bracken et al., 2004).
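The sensitivity and specificity used to compare disorder predictors can be computed per residue from a prediction and a reference annotation. The sketch below treats "disordered" as the positive class, which is an assumption made here for illustration; the exact scoring rules used by the CASP assessors are not described in this section, and the function name is hypothetical.

```python
import math


def sensitivity_specificity(predicted, observed):
    """Per-residue sensitivity and specificity of a disorder prediction.

    predicted[i], observed[i] are booleans, True meaning 'disordered'
    (for example, a residue missing from the X-ray structure).
    Returns (sensitivity, specificity); a value is NaN when the
    corresponding class is absent from the reference annotation.
    """
    if len(predicted) != len(observed):
        raise ValueError("prediction and annotation differ in length")
    tp = fp = tn = fn = 0
    for pred, obs in zip(predicted, observed):
        if obs and pred:
            tp += 1      # disordered residue correctly predicted
        elif obs and not pred:
            fn += 1      # disordered residue missed
        elif not obs and pred:
            fp += 1      # ordered residue overpredicted as disordered
        else:
            tn += 1      # ordered residue correctly predicted
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity
```

Because reference sets skewed toward short disordered regions contain far more ordered than disordered residues, reporting both numbers together is more informative than a single accuracy figure.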
Finally, the importance of understanding subtle differences in conformational variations, due to effects such as mutational events, has always been recognized by the structural bioinformatics field. New measures to better understand these variations are indicated by the goals presented to the structure prediction community at the conclusion of CASP6 (Moult et al., 2005). The four challenges to overcome are to (1) model the structure of single-residue mutants, (2) model the structural changes associated with specificity changes within protein families, (3) improve refinement methods to produce a 0.5 Å root-mean-square deviation (RMSD) improvement in the Cα accuracy of models, and (4) devise a scoring function that will reliably pick the most accurate model of the possible candidate structures for new fold predictions. As the community addresses these challenges, an ensemble view of conformational variations should be kept in mind to understand functional consequences as well.

WEB RESOURCES

Resource | References | URL
DisProt: the database of disordered proteins | Sickmeier et al. (2007) | http://www.disprot.org/
DisProt: list of disorder predictors | Not published, a part of DisProt | http://www.ist.temple.edu/disprot/predictors.php
DISOPRED | Jones and Ward (2003) | http://bioinf.cs.ucl.ac.uk/disopred/
VLXT (PONDR) | Romero et al. (1997) | http://www.pondr.com
GlobPlot | Linding et al. (2003b) | http://globplot.embl.de
DisEMBL | Linding et al. (2003a) | http://dis.embl.de
PrDOS | Ishida and Kinoshita (2007) | http://prdos.hgc.jp/cgi-bin/top.cgi
RONN | Yang et al. (2005) | http://www.strubi.ox.ac.uk/RONN
Wiggle | Gu et al. (2006) | http://wiggle.sdsc.edu
PROFbval | Schlessinger et al. (2006) | http://cubic.bioc.columbia.edu/services/profbval/
iPDA: integrated protein disorder analyzer | Su et al. (2007) | http://biominer.bime.ntu.edu.tw/ipda/
ENSEMBLE | Choy and Forman-Kay (2001) and Marsh et al. (2007) | http://pound.med.utoronto.ca/forman/ensemble/ensemble.html
CNS | Brunger et al. (1998) | http://helix.nih.gov/apps/structbio/cns.html
ROSETTA | Simons et al. (1997) | http://www.rosettacommons.org/
COREX/BEST server | Vertrees et al. (2005) | http://www.best.utmb.edu/BEST/

REFERENCES

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997): Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.
Baldwin RL (1986): Temperature dependence of the hydrophobic interaction in protein folding. Proc Natl Acad Sci USA 83:8069–8072.
Baskakov IV, Kumar R, Srinivasan G, Ji YS, Bolen DW, Thompson EB (1999): Trimethylamine N-oxide-induced cooperative folding of an intrinsically unfolded transcription-activating fragment of human glucocorticoid receptor. J Biol Chem 274:10693–10696.
Bernado P, Blanchard L, Timmins P, Marion D, Ruigrok RW, Blackledge M (2005): A structural model for unfolded proteins from residual dipolar couplings and small-angle X-ray scattering. Proc Natl Acad Sci USA 102:17002–17007.
Blow DM (1977): Flexibility and rigidity in protein crystals. Ciba Found Symp 55–61.
Bonvin AM, Brunger AT (1995): Conformational variability of solution nuclear magnetic resonance structures. J Mol Biol 250:80–93.
Bordoli L, Kiefer F, Schwede T (2007): Assessment of disorder predictions in CASP7. Proteins 69:129–136.
Bracken C, Iakoucheva LM, Romero PR, Dunker AK (2004): Combining prediction, computation and experiment for the characterization of protein disorder. Curr Opin Struct Biol 14:570–576.
Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK (2002): Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol 55:104–110.
Brunger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, et al. (1998): Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr D 54:905–921.
Cheng Y, LeGall T, Oldfield CJ, Dunker AK, Uversky VN (2006a): Abundance of intrinsic disorder in protein associated with cardiovascular disease. Biochemistry 45:10448–10460.
Cheng Y, LeGall T, Oldfield CJ, Mueller JP, Van YY, Romero P, Cortese MS, Uversky VN, Dunker AK (2006b): Rational drug design via intrinsically disordered protein. Trends Biotechnol 24:435–442.
Choy WY, Forman-Kay JD (2001): Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J Mol Biol 308:1011–1032.
Cooper A, Dryden DT (1984): Allostery without conformational change. A plausible model. Eur Biophys J 11:103–109.
Csizmok V, Dosztanyi Z, Simon I, Tompa P (2007): Towards proteomic approaches for the identification of structural disorder. Curr Protein Pept Sci 8:173–179.
Dafforn TR, Smith CJ (2004): Natively unfolded domains in endocytosis: hooks, lines and linkers. EMBO Rep 5:1046–1052.
Dahlman-Wright K, Baumann H, McEwan IJ, Almlof T, Wright AP, Gustafsson JA, Hard T (1995): Structural characterization of a minimal functional transactivation domain from the human glucocorticoid receptor. Proc Natl Acad Sci USA 92:1699–1703.
Dahlman-Wright K, McEwan IJ (1996): Structural studies of mutant glucocorticoid receptor transactivation domains establish a link between transactivation activity in vivo and alpha-helix-forming potential in vitro. Biochemistry 35:1323–1327.
D’Aquino JA, Gomez J, Hilser VJ, Lee KH, Amzel LM, Freire E (1996): The magnitude of the backbone conformational entropy change in protein folding. Proteins 25:143–156.
Dosztanyi Z, Csizmok V, Tompa P, Simon I (2005a): IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 21:3433–3434.
Dosztanyi Z, Csizmok V, Tompa P, Simon I (2005b): The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol 347:827–839.
Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM, Obradovic Z (2002): Intrinsic disorder and protein function. Biochemistry 41:6573–6582.
Feldman HJ, Hogue CW (2000): A fast method to sample real protein conformational space. Proteins 39:112–131.
Fuentes G, Nederveen AJ, Kaptein R, Boelens R, Bonvin AM (2005): Describing partially unfolded states of proteins from sparse NMR data. J Biomol NMR 33:175–186.
Fuxreiter M, Simon I, Friedrich P, Tompa P (2004): Preformed structural elements feature in partner recognition by intrinsically unstructured proteins. J Mol Biol 338:1015–1026.
Fuxreiter M, Tompa P, Simon I (2007): Local structural disorder imparts plasticity on linear motifs. Bioinformatics 23:950–956.
Garcia-Alai MM, Alonso LG, de Prat-Gay G (2007): The N-terminal module of HPV16 E7 is an intrinsically disordered domain that confers conformational and recognition plasticity to the oncoprotein. Biochemistry 46:10405–10412.
Gu J, Gribskov M, Bourne PE (2006): Wiggle-predicting functionally flexible regions from primary sequence. PLoS Comput Biol 2:e90.
Gu J, Bourne PE (2007): Identifying allosteric fluctuation transitions between different protein conformational states as applied to cyclin dependent kinase 2. BMC Bioinform 8:45.
Hegyi H, Schad E, Tompa P (2007): Structural disorder promotes assembly of protein complexes. BMC Struct Biol 7:65.
Hilser VJ, Freire E (1996): Structure-based calculation of the equilibrium folding pathway of proteins. Correlation with hydrogen exchange protection factors. J Mol Biol 262:756–772.
Hilser VJ, Dowdy D, Oas TG, Freire E (1998): The structural distribution of cooperative interactions in proteins: analysis of the native state ensemble. Proc Natl Acad Sci USA 95:9903–9908.
Hilser VJ, Garcia-Moreno EB, Oas TG, Kapp G, Whitten ST (2006): A statistical thermodynamic model of the protein ensemble. Chem Rev 106:1545–1558.
Hilser VJ, Thompson EB (2007): Intrinsic disorder as a mechanism to optimize allosteric coupling in proteins. Proc Natl Acad Sci USA 104:8311–8315.
Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T (2007): POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 23:2046–2053.
Iakoucheva LM, Brown CJ, Lawson JD, Obradovic Z, Dunker AK (2002): Intrinsic disorder in cell-signaling and cancer-associated proteins. J Mol Biol 323:573–584.
Ishida T, Kinoshita K (2007): PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res 35:W460–464.
Jha AK, Colubri A, Freed KF, Sosnick TR (2005): Statistical coil model of the unfolded state: resolving the reconciliation problem. Proc Natl Acad Sci USA 102:13099–13104.
Jones DT, Ward JJ (2003): Prediction of disordered regions in proteins from position specific score matrices. Proteins 53(Suppl. 6):573–578.
Kellermayer MS, Smith SB, Granzier HL, Bustamante C (1997): Folding–unfolding transitions in single titin molecules characterized with laser tweezers. Science 276:1112–1116.
Kemmink J, van Mierlo CP, Scheek RM, Creighton TE (1993): Local structure due to an aromatic-amide interaction observed by 1H-nuclear magnetic resonance spectroscopy in peptides related to the N terminus of bovine pancreatic trypsin inhibitor. J Mol Biol 230:312–322.
Kern D, Eisenmesser EZ, Wolf-Watz M (2005): Enzyme dynamics during catalysis measured by NMR spectroscopy. Methods Enzymol 394:507–524.
Kumar R, Baskakov IV, Srinivasan G, Bolen DW, Lee JC, Thompson EB (1999): Interdomain signaling in a two-domain fragment of the human glucocorticoid receptor. J Biol Chem 274:24737–24741.
Kumar R, Serrette JM, Khan SH, Miller AL, Thompson EB (2007): Effects of different osmolytes on the induced folding of the N-terminal activation domain (AF1) of the glucocorticoid receptor. Arch Biochem Biophys 465:452–460.
Kuriyan J, Osapay K, Burley SK, Brunger AT, Hendrickson WA, Karplus M (1991): Exploration of disorder in protein structures by X-ray restrained molecular dynamics. Proteins 10:340–358.
Labeit S, Kolmerer B (1995): Titins: giant proteins in charge of muscle ultrastructure and elasticity. Science 270:293–296.
Landry SJ, Taher A, Georgopoulos C, van der Vies SM (1996): Interplay of structure and disorder in cochaperonin mobile loops. Proc Natl Acad Sci USA 93:11622–11627.
Lavery DN, McEwan IJ (2005): Structure and function of steroid receptor AF1 transactivation domains: induction of active conformations. Biochem J 391:449–464.
Lee KH, Xie D, Freire E, Amzel LM (1994): Estimation of changes in side chain configurational entropy in binding and folding: general methods and application to helix formation. Proteins 20:68–84.
Lefstin JA, Yamamoto KR (1998): Allosteric effects of DNA on transcriptional regulators. Nature 392:885–888.
Le Gall T, Romero PR, Cortese MS, Uversky VN, Dunker AK (2007): Intrinsic disorder in the Protein Data Bank. J Biomol Struct Dyn 24:325–342.
Li F, Gangal M, Juliano C, Gorfain E, Taylor SS, Johnson DA (2002): Evidence for an internal entropy contribution to phosphoryl transfer: a study of domain closure, backbone flexibility, and the catalytic cycle of cAMP-dependent protein kinase. J Mol Biol 315:459–469.
Linding R, Jensen LJ, Diella F, Bork P, Gibson TJ, Russell RB (2003a): Protein disorder prediction: implications for structural proteomics. Structure 11:1453–1459.
Linding R, Russell RB, Neduva V, Gibson TJ (2003b): GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 31:3701–3708.
Lindorff-Larsen K, Kristjansdottir S, Teilum K, Fieber W, Dobson CM, Poulsen FM, Vendruscolo M (2004): Determination of an ensemble of structures representing the denatured state of the bovine acyl-coenzyme A binding protein. J Am Chem Soc 126:3291–3299.
Lise S, Jones DT (2005): Sequence patterns associated with disordered regions in proteins. Proteins 58:144–150.
Liu J, Perumal NB, Oldfield CJ, Su EW, Uversky VN, Dunker AK (2006): Intrinsic disorder in transcription factors. Biochemistry 45:6873–6888.
Liu T, Whitten ST, Hilser VJ (2006): Ensemble-based signatures of energy propagation in proteins: a new view of an old phenomenon. Proteins 62:728–738.
Liu T, Whitten ST, Hilser VJ (2007): Functional residues serve a dominant role in mediating the cooperativity of the protein ensemble. Proc Natl Acad Sci USA 104:4347–4352.
Lobley A, Swindells MB, Orengo CA, Jones DT (2007): Inferring function using patterns of native disorder in proteins. PLoS Comput Biol 3:e162.
Marsh JA, Neale C, Jack FE, Choy WY, Lee AY, Crowhurst KA, Forman-Kay JD (2007): Improved structural characterizations of the drkN SH3 domain unfolded state suggest a compact ensemble with native-like and non-native structure. J Mol Biol 367:1494–1510.
McEwan IJ, Lavery D, Fischer K, Watt K (2007): Natural disordered sequences in the amino terminal domain of nuclear receptors: lessons from the androgen and glucocorticoid receptors. Nucl Receptor Signal 5:e001.
Melamud E, Moult J (2003): Evaluation of disorder predictions in CASP5. Proteins 53:561–565.
Meszaros B, Tompa P, Simon I, Dosztanyi Z (2007): Molecular principles of the interactions of disordered proteins. J Mol Biol 372:549–561.
Mittag T, Forman-Kay JD (2007): Atomic-level characterization of disordered protein ensembles. Curr Opin Struct Biol 17:3–14.
Moult J, Fidelis K, Rost B, Hubbard T, Tramontano A (2005): Critical assessment of methods of protein structure prediction (CASP), round 6. Proteins 61(Suppl. 7):3–7.
Murphy KP, Freire E (1992): Thermodynamics of structural stability and cooperative folding behavior in proteins. Adv Protein Chem 43:313–361.
Naganathan S, Beckett D (2007): Nucleation of an allosteric response via ligand-induced loop folding. J Mol Biol 373:96–111.
Namba K (2001): Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells 6:1–12.
Ofran Y, Punta M, Schneider R, Rost B (2005): Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discov Today 10:1475–1482.
Oldfield CJ, Cheng Y, Cortese MS, Brown CJ, Uversky VN, Dunker AK (2005a): Comparing and combining predictors of mostly disordered proteins. Biochemistry 44:1989–2000.
Oldfield CJ, Ulrich EL, Cheng Y, Dunker AK, Markley JL (2005b): Addressing the intrinsic disorder bottleneck in structural proteomics. Proteins 59:444–453.
Palliser CC, Parry DA (2001): Quantitative comparison of the ability of hydropathy scales to recognize surface beta-strands in proteins. Proteins 42:243–255.
Pan H, Lee JC, Hilser VJ (2000): Binding sites in Escherichia coli dihydrofolate reductase communicate by modulating the conformational ensemble. Proc Natl Acad Sci USA 97:12020–12025.
Pometun MS, Chekmenev EY, Wittebort RJ (2004): Quantitative observation of backbone disorder in native elastin. J Biol Chem 279:7982–7987.
Popovych N, Sun S, Ebright RH, Kalodimos CG (2006): Dynamically driven protein allostery. Nat Struct Mol Biol 13:831–838.
Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK (2007): Intrinsic disorder and functional proteomics. Biophys J 92:1439–1456.
Ringe D, Petsko GA (1986): Study of protein dynamics by X-ray diffraction. Methods Enzymol 131:389–433.
Romero P, Obradovic Z, Dunker K (1997): Sequence data analysis for long disordered regions prediction in the calcineurin family. Genome Inform Ser Workshop Genome Inform, Vol. 8, pp 110–124.
Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK (1997): Identifying disordered regions in proteins from amino acid sequences. Proceedings of the IEEE International Conference on Neural Networks, Vol. 1, pp 90–95.
Romero P, Obradovic Z, Li XH, Garner EC, Brown CJ, Dunker AK (2001): Sequence complexity of disordered protein. Proteins 42:38–48.
Romero PR, Zaidi S, Fang YY, Uversky VN, Radivojac P, Oldfield CJ, Cortese MS, Sickmeier M, LeGall T, Obradovic Z, Dunker AK (2006): Alternative splicing in concert with protein intrinsic disorder enables increased functional diversity in multicellular organisms. Proc Natl Acad Sci USA 103:8390–8395.
Sandhu KS, Dash D (2006): Conformational flexibility may explain multiple cellular roles of PEST motifs. Proteins 63:727–732.
Sasakawa H, Sakata E, Yamaguchi Y, Masuda M, Mori T, Kurimoto E, Iguchi T, Hisanaga SI, Iwatsubo T, Hasegawa M, Kato K (2007): Ultra-high field NMR studies of antibody binding and site-specific phosphorylation of alpha-synuclein. Biochem Biophys Res Commun 363:795–799.
Schlessinger A, Yachdav G, Rost B (2006): PROFbval: predict flexible and rigid residues in proteins. Bioinformatics 22:891–893.
Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN, et al. (2007): DisProt: the database of disordered proteins. Nucleic Acids Res 35:D786–793.
Simons KT, Kooperberg C, Huang E, Baker D (1997): Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol 268:209–225.
Skrabana R, Sevcik J, Novak M (2006): Intrinsically disordered proteins in the neurodegenerative processes: formation of tau protein paired helical filaments and their analysis. Cell Mol Neurobiol 26:1085–1097.
Skrabana R, Skrabanova-Khuebachova M, Kontsek P, Novak M (2006): Alzheimer’s-disease-associated conformation of intrinsically disordered tau protein studied by intrinsically disordered protein liquid-phase competitive enzyme-linked immunosorbent assay. Anal Biochem 359:230–237.
Su CT, Chen CY, Hsu CM (2007): iPDA: integrated protein disorder analyzer. Nucleic Acids Res 35:W465–472.
Tompa P, Szasz C, Buday L (2005): Structural disorder throws new light on moonlighting. Trends Biochem Sci 30:484–489.
Torda AE, Scheek RM, van Gunsteren WF (1990): Time-averaged nuclear Overhauser effect distance restraints applied to tendamistat. J Mol Biol 214:223–235.
Tsutakawa SE, Hura GL, Frankel KA, Cooper PK, Tainer JA (2007): Structural analysis of flexible proteins in solution by small angle X-ray scattering combined with crystallography. J Struct Biol 158:214–223.
Vacic V, Oldfield CJ, Mohan A, Radivojac P, Cortese MS, Uversky VN, Dunker AK (2007): Characterization of molecular recognition features, MoRFs, and their binding partners. J Proteome Res 6:2351–2366.
van Oijen AM, Blainey PC, Crampton DJ, Richardson CC, Ellenberger T, Xie XS (2003): Single-molecule kinetics of lambda exonuclease reveal base dependence and dynamic disorder. Science 301:1235–1238.
Vertrees J, Barritt P, Whitten S, Hilser VJ (2005): COREX/BEST server: a web browser-based program that calculates regional stability variations within protein structures. Bioinformatics 21:3318–3319.
Vucetic S, Brown CJ, Dunker AK, Obradovic Z (2003): Flavors of protein disorder. Proteins 52:573–584.
Wang S, Gu J, Larson SA, Whitten ST, Hilser VJ (2008): Probing the denatured ensemble for fold specifying thermodynamic information. Submitted.
Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT (2004): Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. J Mol Biol 337:635–645.
Weathers EA, Paulaitis ME, Woolf TB, Hoh JH (2004): Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein. FEBS Lett 576:348–352.
Weinreb PH, Zhen W, Poon AW, Conway KA, Lansbury PT Jr (1996): NACP, a protein implicated in Alzheimer’s disease and learning, is natively unfolded. Biochemistry 35:13709–13715.
Wissmann R, Baukrowitz T, Kalbacher H, Kalbitzer HR, Ruppersberg JP, Pongs O, Antz C, Fakler B (1999): NMR structure and functional characteristics of the hydrophilic N terminus of the potassium channel beta-subunit Kvbeta1.1. J Biol Chem 274:35521–35525.
Wissmann R, Bildl W, Oliver D, Beyermann M, Kalbitzer HR, Bentrop D, Fakler B (2003): Solution structure and function of the “tandem inactivation domain” of the neuronal A-type potassium channel Kv1.4. J Biol Chem 278:16142–16150.
Wootton JC, Federhen S (1993): Statistics of local complexity in amino-acid-sequences and sequence databases. Comp Chem 17:149–163.
Wootton JC, Federhen S (1996): Analysis of compositionally biased regions in sequence databases. Comp Methods Macromol Sequence Anal 266:554–571.
Xiao H, Kaltashov IA (2005): Transient structural disorder as a facilitator of protein–ligand binding: native H/D exchange-mass spectrometry study of cellular retinoic acid binding protein I. J Am Soc Mass Spectrom 16:869–879.
Yang ZR, Thomson R, McNeil P, Esnouf RM (2005): RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics.
Zhang Y, Stec B, Godzik A (2007): Between order and disorder in protein structures: analysis of “dual personality” fragments in proteins. Structure 15:1141–1147.