* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download bbr052online 329..336 - Oxford Academic
Paracrine signalling wikipedia , lookup
Biosynthesis wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup
Gene expression wikipedia , lookup
Expression vector wikipedia , lookup
Magnesium transporter wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Genetic code wikipedia , lookup
Point mutation wikipedia , lookup
Biochemistry wikipedia , lookup
Structural alignment wikipedia , lookup
Metalloprotein wikipedia , lookup
Interactome wikipedia , lookup
Protein purification wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Western blot wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
Protein–protein interaction wikipedia , lookup
B RIEFINGS IN BIOINF ORMATICS . VOL 13. NO 3. 329^336 Advance Access published on 19 September 2011 doi:10.1093/bib/bbr052 A practical guide for the computational selection of residues to be experimentally characterized in protein families Alfonso Ben|¤ tez-Pa¤ez, Sonia Ca¤rdenas-Brito and Andre¤s J. Gutie¤rrez Submitted: 23rd May 2011; Received (in revised form) : 2nd August 2011 Abstract In recent years, numerous biocomputational tools have been designed to extract functional and evolutionary information from multiple sequence alignments (MSAs) of proteins and genes. Most biologists working actively on the characterization of proteins from a single or family perspective use the MSA analysis to retrieve valuable information about amino acid conservation and the functional role of residues in query protein(s). In MSAs, adjustment of alignment parameters is a key point to improve the quality of MSA output. However, this issue is frequently underestimated and/or misunderstood by scientists and there is no in-depth knowledge available in this field. This brief review focuses on biocomputational approaches complementary to MSA to help distinguish functional residues in protein families. These additional analyses involve issues ranging from phylogenetic to statistical, which address the detection of amino acids pivotal for protein function at any level. In recent years, a large number of tools has been designed for this very purpose. Using some of these relevant, useful tools, we have designed a practical pipeline to perform in silico studies with a view to improving the characterization of family proteins and their functional residues. This review-guide aims to present biologists a set of specially designed tools to study proteins. These tools are user-friendly as they use web servers or easy-to-handle applications. Such criteria are essential for this review as most of the biologists (experimentalists) working in this field are unfamiliar with these biocomputational analysis approaches. Keywords: functional divergence; residue selection; multiple sequence alignment; mutagenesis; protein families; evolutionary information INTRODUCTION Combining computational and experimental approaches are the current trend in scientific research to solve central cell biology questions. This synergic effort permits a more rational design of scientific studies in favour of bench routines. The best biocomputational contributions to experimentalists are based on the development of tools for the selection and prioritization of the genes, proteins and residues to be examined. Consequently, a selection of candidates (i.e. genes, proteins, residues, etc.) based on rigorous analyses supported by biological and statistical evidence helps save the time and funds invested by wet laboratories. Protein characterization Corresponding author. Alfonso Benı́tez-Páez, Bioinformatic Analysis Group – GABi, Centro de Investigación y Desarrollo en Biotecnologı́a, Bogotá DC, Colombia. Tel/fax: þ34 63355672, E-mail: [email protected] Alfonso Ben|¤ tez-Pa¤ez, PhD in Molecular Biology. Head of the Bioinformatic Analysis Group in Bogotá DC, Colombia; Associate Researcher in the Molecular Genetics Laboratory (CIPF) in Valencia, Spain; research line: evolution and characterization of tRNA-modifying enzymes. Sonia Ca¤rdenas-Brito, MSc in Biology Science. Researcher in the Bioinformatic Analysis Group in Bogotá DC, Colombia; research line: protein sequence. Andre¤s J. Gutie¤rrez, MSc in Biology Science. Researcher in the Bioinformatic Analysis Group in Bogotá DC, Colombia; research line: molecular evolution of proteins, genome structure. ß The Author 2011. Published by Oxford University Press. For Permissions, please email: [email protected] 330 Ben|¤ tez-Pa¤ez et al. frequently involves dissection of residue(s) accomplishing a specific molecular task. In recent years, this field has been supported by computational studies addressing the identification of pivotal residues for a given protein function. Expert scientists in the study and characterization of a specific family of proteins normally obtain in-depth knowledge (after years of experimental research) of the particular and constrained set of residues, which helps classify a protein into a functional family (i.e. catalytic triads, dinucleotide binding motif, G-motifs of GTPases, DEAD-boxes, etc.). Thus, it is quite normal to find studies that examine the function of homologue residues in the different and/or novel members of the given family being examined (i.e. NTPases, oxidoreductases, methyltransferases, helicases, proteases, etc.). Consequently, many studies contain redundant data about equivalent and conserved residues, which carry out the same task in homologue proteins. Moreover, there are several studies that consist in testing the residues that play new molecular roles, such as protein–protein interaction, molecular switches, substrate specificity, etc. In all these studies, the main approach used by experimentalists for residue selection is multiple sequence alignment (MSA). However, the limited knowledge that scientists have about MSA parameter adjustment (i.e. substitution matrix, word, window, algorithm, etc.) can produce poor quality alignments, which do not prove very useful for residue selection purposes. Besides, the use of a low number of sequences in MSAs is also a frequent error, especially when a large family is being studied. Accordingly, residue conservation in a specific position of MSA will probably be true for that small set of analysed proteins, but not for a larger set. In order to overcome such failures and to present further useful computational tools for the selection of undisclosed residues for functional characterizations in any family of proteins, some important considerations in MSA analyses are explained, while other alternative MSA-based methods for the selection of functional residues in a protein are described below. conservation in a protein family, at sequence level, and to reflect the most important residues, as well as the probably indispensable amino acids for the correct function of that family of proteins. However, MSA can change according to the number of sequences, the number of organisms (and variability), the amino acid substitution matrix, and the algorithm being used. Thus, it is important to take into account certain considerations to use an appropriate MSA method. CHOOSING THE RIGHT MSA ALGORITHM Among the different algorithms designed for MSA, accuracy, scalability and computational cost must also be considered. Nowadays, the best method or algorithm to build an MSA must be capable of: (i) processing alignments based on dozens or hundreds of protein sequences; and (ii) being flexible in comparing non-homologue proteins sharing a functional domain. Since ClustalW was published in 1994 [1], it has rapidly become the most widely used MSA algorithm. However, some improvements have been made since then, and more accurate and faster methods have been developed. Some of the current, improved methods include MAFFT [2], PROBCONS [3], T-COFFEE [4] and MUSCLE [5, 6]. In particular, we should use highly accurate MSA methods (i.e. PROBCONS or T-COFFEE) given their implications for subsequent analyses. However, if computational cost is a disadvantage, MUSCLE offers good performance and global balance in accuracy, scalability and computational cost terms [7]. Like most MSA tools, it can be executed locally by downloading a simple file (for the Linux or Windows systems), and by calling the MSA algorithm using a basic command line; alternatively, we can upload an input text file of sequences on the MUSCLE web server at EBI (http://www.ebi.ac. uk/Tools/msa/muscle/), where we can use other reliable tools such as MAFFT, T-COFFEE or KALIGN to perform MSA. Both options retrieve an MSA output file in the FASTA format, which is readable by any MSA viewer. MSA AS CORNERSTONE Most scientific knowledge on the biological understanding of living organisms is based on the comparison of sequences, structures, pathways, reactions, metabolites, etc., to infer functionality by homology. MSA is the simplest tool to detect residue SEQUENCE SELECTION FOR MSA The characterization of the functional residues of a protein is frequently based on a poor MSA analysis with 3–5 homologue sequences (maximum 10) to Residue selection in proteins detect residue constraints. To further exploit MSA and the data acquired from it, we should build an MSA from a representative sample of the entire set of organisms in which the query protein is present. Thus, if we are interested in studying a bacterial family of proteins, we should have at least one representative protein sequence of each bacterial group in accordance with conventional taxonomy (i.e. Alfaproteobacteria, Betaproteobacteria, Delta/ Epsilon, Gammaproteobactaria, Firmicutes, Cyanobacteria, Chloroflexi, etc.). To compile a useful file of sequences for MSA, non-redundant searching in major protein databases must be performed: UNIPROT [8] or GENBANK [9]; both are accessible using the BLAST tool [10] from the NCBI site [http://blast.ncbi.nlm.nih.gov/Blast.cgi]. In addition, we have to select a large number of sequences, preferably more than twenty-five (whenever possible), with the highest possible variability. For this object, the BLAST interface offers an advanced option to limit searches by taxonomy, which includes or excludes groups, families, genera or species. In this way, we should select a representative sequence per genus, or even per family, when organisms are highly related. If there is a low number of sequences to be analysed in the MSA, we could add related sequences by detecting distant homologues following the PSI-BLAST approach [11], construct MSAs with the PROMALS algorithm (http://prodata.swmed.edu/promals/promals.php) [12, 13], or modify the BLAST parameters by employing more permissive BLOSUM matrices [14], such as BLOSUM-45 instead of BLOSUM-62; the latter is used by default in BLAST searches. Alternatively, we can find additional members of the query family by searching in the Pfam database [15], which contains the largest collection of protein families, domains and motifs clustered by sequence conservation and amino acid profiles based on Hidden Markov Model (HMM). If we are interested in studying the residues of a functional domain in the query protein, we can include non-homologue proteins, but those that contain equivalent functional domains in their architecture. Thus, we can apply further phylogenetic and statistical analyses to the MSA, such as those described in the following sections. A quantitative evaluation of sequence selection and variability in MSA can be obtained from the DIVERGE analysis (see below). 331 DETECTING FUNCTIONAL CONSTRAINTS IN MSA By employing user-friendly MSA viewers such as JALVIEW [16, 17] or SEAVIEW [18, 19], we can explore the MSA text file to search the highly conserved positions along the alignment. These conserved positions can be highlighted by colouring according to different criteria, such as percentage of identity. The residues showing whole conservation in the expanded set of sequences used for MSA will provide more information than those from an MSA based on a low number of sequences. In addition to residue conservation, the biochemical nature of residues can also be constrained. Consequently, different positively- (K, R) or negatively (D, E) charged residues will be present in a specific position within the MSA. Conservation is not so easy to observe in a polar or hydrophobic position. If we wish to draw a residue map for each position easy for our eyes and brains to understand, an amino acid profile based on the HMM can be built from MSA [20], interpreted by the HMM-Logo server [21] on its web site or locally by downloading the HMMVE viewer [22]. Good views of amino acid preferential usage per position in MSA can help us understand and recognize most of the relevant residues and those selected by evolution forces. Complementary studies must be carried out if one of the query proteins has a three-dimensional X-ray or NMR structure; otherwise, we could follow several strategies to obtain a protein structure model in which we can localize the selected residues [23]. Thus, each conserved residue/position in the MSA can be localized on the respective protein structure, and some predictions of their respective roles may be stated for subsequent experimental corroboration. These predictions can focus on the residues adjacent to the active site of the query proteins, or even on the protein surface where they are possibly involved in protein–protein interaction (PPI) or in nucleic acid binding. BEYOND THE MSA ANALYSIS Further analyses can address the detection of those residues predicted to be involved in a specific task. One of these methods is the In Silico Two Hybrid method (I2H), which predicts PPIs based on the amino acid-correlated changes/mutations between interacting proteins [24]. Moreover, other methods have been developed to detect altered functional 332 Ben|¤ tez-Pa¤ez et al. Figure 1: General pipeline for the selection of residue candidates for experimentally functional characterization. The steps and paths in the pipeline are outlined. There are steps that are connected by grey arrows in which, in some cases, the required computational tools are cited. The blue arrows in the boxes show three major computational methods to select residues (Functional Constraints, Functional Divergence and PCA), and show the localization of the hypothetical candidate residues to be analysed in each case. constraints, which directly compare different clusters (or subfamilies) of a protein family (or superfamily). In this brief guide, we focus on the last methodology because its permits MSAs to be explored in depth and to extract the evolutionary information deriving from speciation/specialization after gene duplication. The following methods have been illustrated in Figure 1 to provide a better understanding. WHAT DOES THE FUNCTIONAL DIVERGENCE ANALYSIS CONFER? The residue constraints extracted from an MSA can be altered if we add other functional-related proteins to the original MSA. These new proteins incorporated into this MSA must essentially be the paralogues that emerge after gene duplication during evolution. They are probably specialized in other molecular functions, but retain both high sequence identity and similarity to the original query protein(s) because of their closely related function. At this point, we have two different clusters belonging to a common evolution-related family, or superfamily, of proteins; a direct comparison made among all the sequences present in both clusters will provide valuable ways to detect those residues of functional relevance which are difficult to disclose by direct MSA observation in viewers, but are easier to detect through phylogenetic and statistical analyses [25–27]. The DIVERGE tool [28] (web site: http://www.xungulab.com) offers a complete interface to manage phylogeny, MSA and protein structure data. Its aim is to study type-I (or the site-specific rate shift of amino acids) and type-II Residue selection in proteins (or the radical shift of the amino acid property) functional divergence [25, 27]. This methodology has provided good results in the different protein families to which it has been applied [29–31]. Although DIVERGE does not define a specific function for the residues selected in the query protein, it finds the residues that most probably act as functional determinants, which are ranked according to statistical evaluation. We can modulate such statistical significance by increasing the cut-off value (posterior probability score 0.85) to rescue residues under a strong functional divergence, together with a function assignment, in accordance with our expertise and structural data. The results obtained with this approach will largely depend on the MSA analysis where the selection of sequences and its variability are critical. For the quantitative proposes of this step, DIVERGE retrieves the coefficient of functional divergence (y) after each pair-wise cluster comparison. As a result, a y value higher than zero suggests that a significantly altered functional constraint has occurred between a pair of clusters. Then if y does not differ from zero, this means that the divergence in MSA and in each cluster of proteins does not suffice to detect functional constraints. Therefore, we should attempt to make comparisons between different clusters or acquire new sequences to be studied. DIVERGE results are also influenced by phylogenetic tree construction methods. In that way, a reliable and accurate phylogenetic tree have to be constructed prior to this analysis. Given that tree construction is essential for any DIVERGE analysis, we should follow a PROTTEST approach [32] to determine the best evolutionary model for the set of query proteins. This web server (http:// darwin.uvigo.es/software/prottest.html) only requires an MSA file to be used to test more than 60 different evolutionary models in the protein family. The selection of a specific model explaining molecular variation in this set of proteins will be based on ‘goodness-and-fit’ measures such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), which will represent the uncertainty of all the models tested and the relevance of different model parameters (invariable sites: þI; amino acid categories: þG; amino acid frequencies: þF) in the evolution of query proteins. After testing all the different evolutionary models, PROTTEST retrieves a rank of models according to the AIC or BIC values. Consequently, the 333 evolutionary model with the lowest AIC or BIC value will be selected to draw the evolutionary tree of proteins. Despite having obtained a reliable tree from the PROTTEST for DIVERGE analysis, the DIVERGE suite offers a recommended option to enhance protein clustering when users are not familiar with tree presentation or if users are using an input tree that contains unsolved relationships. In such cases, the re-rooting of an input tree is a convenient way to select gene clusters before proceeding to the functional divergence analysis. By using the Neighbor-Joining option by selecting the ‘NJ Tree-Making’ button, the software will employ the Neighbor-Joining algorithm to quickly generate a tree based on the desired distance measure (i.e. p-Distance, Poisson or Kimura). Among the different distance calculation options, we should preferably employ those using corrected distances for multiple substitutions per site, such as the Poisson or Kimura algorithms. Once the tree has been re-rooted by DIVERGE, we can cluster all the proteins in well-defined groups, preferably with more than four sequences per cluster/group (i.e. bacteria versus mammals; or g-proteobacteria versus a-proteobacteria for the proteins restricted to bacteria). At least two clusters are required to perform this analysis. However, if multiple clusters are selected, a pair-wise comparison should always be performed. Then we can compute types I and II functional divergence to extract the residues that are more likely to be under functional divergence. In addition to the cluster comparison, a more specific intra-cluster analysis to detect functional divergence can be achieved according to phylogenetic distribution (see Supplementary Figure S1; where functional aspects of the characterised MnmC family of proteins [see ref 47–52] are outlined). Thus, a residue showing gain-of-function in some species within a particular analysed cluster can be easily observed by residue fixation in these sequences, but not in others belonging to the same protein cluster. This additional approach requires the extensive usage of highly related sequences instead of the diversity required for the above-mentioned cluster comparison. Furthermore, a cut-off value (or posterior probability) to select residues under functional divergence must be relaxed because the estimated variability of an amino acid per site is lower at this level of comparison. 334 Ben|¤ tez-Pa¤ez et al. COMPLEMENTARY ANALYSES Despite having focused on a well-known method to detect the cluster-dependent amino acid conservation based on phylogenetic and statistical approaches, other alternative methods have been widely studied and successfully tested to functionally predict important residues. These methods include multivariate and related approaches that help predict pivotal residues when the functional and phylogenetic relationship is ambiguous [33–37]. Some methods explore the sequence information content of MSA to create a vector transformation where residues are evaluated according to their two- or three-dimensional clustering [33, 35, 37], while others use robust information from three-dimensional structures to predict the functional interfaces and residues involved in ligand-binding sites, protein–protein or protein–nucleic acid interactions, or ligand specificity [34, 36]. Notwithstanding, the gap between the available number of sequences and the three-dimensional structures of proteins can limit the use of the latter method type. For the purpose of performing an additional analysis that is complementary to the functional divergence method extensively described as the scope of this guide, we can use the TreeDet web server (http://treedetv2.bioinfo.cnio.es/treedet/index. html) to predict the functional sites in a family of proteins [38]. TreeDet integrates related methods using a principal component analysis (PCA) to study the ‘Sequence Space’ of an input MSA. It also predicts functional sites after their representation in a multidimensional plot and statistical evaluation. The TreeDet authors encourage us to submit MSA from MUSCLE, T-COFFEE and PROBCONS algorithms, all of which are briefly described in earlier sections of this guide. EXPERIMENTAL TESTING OF SELECTED RESIDUES After completing the various steps stated throughout this guide for residue selection as regards functional characterization in a family of proteins, it is important to filter the set of selected residues for experimental testing. Accordingly, we should reduce the number of possible protein mutants to be studied by the directed mutagenesis approach. Then, we should focus on those residues with greater statistical and biological significance, otherwise the design and management of several mutants can result in a tedious and expensive experimental task that is time-consuming, and can even lead to ambiguous results, which are difficult to explain by the multiple mutants and functionalities, explored. It is also necessary to consider appropriate amino acid changes. Traditionally, alanine scanning has been the initial approach followed to study the relevance of a specific residue; nonetheless, the function of a mutant protein will notably fluctuate in accordance with the amino acid selected to replace the targeted one. Consequently, electrostatic charge changes will help better elucidate the role of a positively- or negatively charged residue [39]. Likewise, polarity changes shed light on the role played by hydrophobic residues [40]. PPIs, or pocket binding sites, could also be studied by applying the aforementioned criteria. Nevertheless, steric hindrance by replacing small side-chain amino acids with those that have bulky side-chains can impair PPIs with single amino acid substitutions [41, 42]. For this case, we may consider that the protein interactions involving other larger molecules such as DNA or RNA could be supported by several interaction sites. Therefore, multiple mutants will be necessary to notably impair interaction [43], even when considering steric hindrance in all the mutated positions. EVALUATION OF SELECTED MUTATIONS At this point, we refine a list of residues that are probably involved in functions such as catalysis, cofactor binding, proton donation and in the mediation of the PPIs obtained by applying MSA, functional divergence, phylogenetic and/or PCA-related methods. Moreover, another analysis can be done to describe the thermodynamic implications of the desired mutational changes on protein structure stability before proceeding to mutagenesis and experimental studies. The CUPSAT server [44] (http://cupsat.tu-bs.de) is a tool that predicts changes in protein stability upon point mutations. The CUPSAT evaluation of protein mutants requires the input files from PDB, as well as custom-developed protein structures. Its prediction model uses: (i) protein–environment potentials to predict protein stability; (ii) atom potentials; using the 40 amino acid atom classes from the Melo-Feytmans model [45, 46]; (iii) the torsion angles ’ and c; and (iv) the Gaussian apodization Residue selection in proteins function to adjust torsion angle perturbation in mutants. As a result, we can obtain a friendship list of the most probable and favourable amino acid changes given in ‘overall stability’, ‘torsion’, and ‘protein thermodynamics’ terms. Consequently, at the end stage of this analysis, we will have obtained strong computational evidence for both residues and probable mutations to support an experimental and practical approach to study the functional relevance of the different residues in any family of proteins. References SUPPLEMENTARY DATA 4. Supplementary data are available online at http:// bib.oxfordjournals.org/. Key points The considerations, issues, tools and suggestions reviewed in this guide have been addressed to improve the functional characterization of residues in any family of proteins under study by experimentalists. The general and central considerations of the MSA analysis have been reviewed for the purpose of constructing better and more informative alignments of protein families; simultaneously, different tools have been exposed according to user requirements. The different approaches presented herein help to shed light on the extraction of the evolutionary and functional information derived from sequence comparisons. A methodical usage of these approaches will permit a more rational design of experimental studies, which will be supported by stronger biological and statistical evidence. The aim of this review is to also promote the development of reliable, useful and combined computational bench routines given that the design of studies with more biological and statistical evidence prior to experimental testing could save time and funds, thus enabling more efficient research. Finally, the aim of the step-by-step design of this workflow is to encourage experimental biologists to learn and to acquire in-depth knowledge on biocomputacional analyses, tools and methods which could permit flexible analyses according to user demands and the complexity of the query proteins without potential black box results assumed by the users. Acknowledgements The authors thank the Centro de Investigación y Desarrollo en Biotecnologı́a for its support in the implementation, edition and publication of this guide; they also thank M.E. Armengod, I. Moukadiri and M.J. Garzón for inviting us to collaborate in their MnmC project. Finally, they particularly thank the editors and peer reviewers of this guide for their constructive criticisms that have helped improve the quality of this review. 1. 2. 3. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. FUNDING This study was supported by Centro de Investigación y Desarrollo en Biotecnologı́a - CIDBIO in Bogotá DC, Colombia. 20. 335 Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673–80. Katoh K, Misawa K, Kuma K, et al. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002;30: 3059–66. Do CB, Mahabhashyam MS, Brudno M, et al. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Res 2005;15:330–40. Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000;302:205–17. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 2004;5:113. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004; 32:1792–7. Edgar RC, Batzoglou S. Multiple sequence alignment. Curr Opin Struct Biol 2006;16:368–73. Apweiler R, Martin MJ, O’Donovan C, etal. The Universal Protein Resource (UniProt). Nucleic Acids Res 2010;38: D142–8. Benson DA, Karsch-Mizrachi I, Lipman DJ, et al. GenBank. Nucleic Acids Res 2010;38:D46–51. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol 1990;215:403–10. Altschul SF, Madden TL, Schaffer AA, etal. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402. Pei J, Grishin NV. PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 2007;23:802–08. Pei J, Kim BH, Tang M, et al. PROMALS web server for accurate multiple protein sequence alignments. Nucleic Acids Res 2007;35:W649–52. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 1992;89: 10915–9. Finn RD, Mistry J, Tate J, et al. The Pfam protein families database. Nucleic Acids Res 2010;38:D211–22. Clamp M, Cuff J, Searle SM, et al. The Jalview Java alignment editor. Bioinformatics 2004;20:426–7. Waterhouse AM, Procter JB, Martin DM, et al. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics 2009;25:1189–91. Galtier N, Gouy M, Gautier C. SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci 1996;12: 543–8. Gouy M, Guindon S, Gascuel O. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010; 27:221–4. Eddy SR. Hidden Markov models. Curr Opin Struct Biol 1996;6:361–5. 336 Ben|¤ tez-Pa¤ez et al. 21. Schuster-Bockler B, Schultz J, Rahmann S. HMM Logos for visualization of protein families. BMC Bioinformatics 2004;5:7. 22. Dai J, Cheng J. HMMEditor: a visual editing tool for profile hidden Markov model. BMC Genomics 2008;9(Suppl 1):S8. 23. Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struct Biol 2009;19:145–55. 24. Pazos F, Valencia A. In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins 2002;47:219–27. 25. Gu X. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol 1999;16:1664–74. 26. Gu X. Maximum-likelihood approach for gene family evolution under functional divergence. Mol Biol Evol 2001;18: 453–64. 27. Gu X. A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences. Mol Biol Evol 2006;23:1937–45. 28. Gu X, Vander Velden K. DIVERGE: phylogeny-based analysis for functional-structural divergence of a protein family. Bioinformatics 2002;18:500–1. 29. Benitez-Paez A, Cardenas-Brito S. Dissection of functional residues in receptor activity-modifying proteins through phylogenetic and statistical analyses. Evol Bioinform Online 2008;4:153–69. 30. Zheng Y, Xu D, Gu X. Functional divergence after gene duplication and sequence-structure relationship: a case study of G-protein alpha subunits. J Exp Zool B Mol Dev Evol 2007; 308:85–96. 31. Zhou H, Gu J, Lamont SJ, et al. Evolutionary analysis for functional divergence of the toll-like receptor gene family and altered functional constraints. J Mol Evol 2007;65: 119–123. 32. Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 2005; 21:2104–5. 33. Casari G, Sander C, Valencia A. A method to predict functional residues in proteins. Nat Struct Biol 1995;2:171–8. 34. Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 1996;257:342–58. 35. Pazos F, Rausell A, Valencia A. Phylogeny-independent detection of functional residues. Bioinformatics 2006;22: 1440–8. 36. Rausell A, Juan D, Pazos F, et al. Protein interactions and ligand binding: from protein subfamilies to functional specificity. Proc Natl Acad Sci USA 2010;107:1995–2000. 37. Wallace IM, Higgins DG. Supervised multivariate analysis of sequence groups to identify specificity determining residues. BMC Bioinformatics 2007;8:135. 38. Carro A, Tress M, de Juan D, et al. TreeDet: a web server to explore sequence space. Nucleic Acids Res 2006;34:W110–5. 39. Fornek JL, Gillim-Ross L, Santos C, et al. A singleamino-acid substitution in a polymerase protein of an H5N1 influenza virus is associated with systemic infection and impaired T-cell activation in mice. J Virol 2009;83: 11102–15. 40. Knez J, Bilan PT, Capone JP. A single amino acid substitution in herpes simplex virus type 1 VP16 inhibits binding to the virion host shutoff protein and is incompatible with virus growth. J Virol 2003;77:2892–902. 41. Banerjee M, Balaram H, Balaram P. Structural effects of a dimer interface mutation on catalytic activity of triosephosphate isomerase. The role of conserved residues and complementary mutations. FEBS J 2009;276:4169–83. 42. Mallam AL, Jackson SE. The dimerization of an alpha/ beta-knotted protein is essential for structure and function. Structure 2007;15:111–22. 43. Osawa T, Ito K, Inanaga H, et al. Conserved cysteine residues of GidA are essential for biogenesis of 5-carboxymethylaminomethyluridine at tRNA anticodon. Structure 2009;17:713–24. 44. Parthiban V, Gromiha MM, Schomburg D. CUPSAT: prediction of protein stability upon point mutations. Nucleic Acids Res 2006;34:W239–42. 45. Melo F, Feytmans E. Novel knowledge-based mean force potential at atomic level. J Mol Biol 1997;267:207–22. 46. Melo F, Feytmans E. Assessing protein structures with a non-local atomic interaction energy. J Mol Biol 1998;277: 1141–52. 47. Bujnicki JM, Oudjama Y, Roovers M, etal. Identification of a bifunctional enzyme MnmC involved in the biosynthesis of a hypermodified uridine in the wobble position of tRNA. RNA 2004;10:1236–42. 48. Moukadiri I, Garzón MJ, Benitez-Paez A, et al. Biochemical and functional characterization of domains MnmC1 and MnmC2 of the E. coli MnmC bifunctional enzyme. 23rd tRNA Workshop, Aveiro, Portugal, 2010. 49. Moukadiri I, Prado S, Piera J, etal. Evolutionarily conserved proteins MnmE and GidA catalyze the formation of two methyluridine derivatives at tRNA wobble positions. Nucleic Acids Res 2009;37:7177–93. 50. Pearson D, Carell T. Assay of both activities of the bifunctional tRNA-modifying enzyme MnmC reveals a kinetic basis for selective full modification of cmnm5s2U to mnm5s2U. Nucleic Acids Res 2011;39:4818–26. 51. Roovers M, Oudjama Y, Kaminska KH, et al. Sequence-structure-function analysis of the bifunctional enzyme MnmC that catalyses the last two steps in the biosynthesis of hypermodified nucleoside mnm5s2U in tRNA. Proteins 2008;71:2076–85. 52. Murakami Y, Jones S. SHARP2: protein–protein interaction predictions using patch analysis. Bioinformatics 2006; 22:1794–5.