Saravanan, J Appl Bioinform Comput Biol 2015, 4:3
http://dx.doi.org/10.4172/2329-9533.1000120

Journal of Applied Bioinformatics & Computational Biology
Research Article, a SciTechnol journal

Sequence Analysis of Holins by Reduced Amino Acid Alphabet Model and Permutation Approaches

Konda Mani Saravanan*

Abstract

Objective: Holins are small proteins which perform many important functions in the cytoplasmic membrane of the cell. No crystal structure of a holin has been reported in the Protein Data Bank, and hence computational sequence analysis is the only alternative for understanding the structure and functional consequences of these proteins. In the present work, we applied several computational procedures to explore the amino acid residues that are important for the functioning of holins on membranes.

Methods: To explore the role of amino acid residues in holins, we used a reduced amino acid alphabet model that reduces the twenty amino acids to fifteen. Transmembrane regions in holin sequences were extracted and subjected to multiple sequence alignment to bring out the role of conserved amino acid residues. Further, the transmembrane regions in holins were permutated to the different possible positions, keeping the loops static, to understand the role of transmembrane and non-transmembrane regions.

Results: We found that the reduced amino acid alphabet model is successful when no relationship can be established between proteins belonging to similar families. Also, the important physico-chemical properties conserved in the non-redundant holin sequences are explored in detail by computing correlation coefficients. Permutation of transmembrane regions in holins, followed by database search, showed that the holin sequence composition and arrangement are unique to performing its specific function.
Conclusion: The analysis presented in this paper reveals the vital role of each and every amino acid residue in the holin, and this may help to accurately model the structure in order to understand the sequence-structure-function relationship of holins on the membrane.

Keywords: Holins; Sequence alignment; Physico-chemical properties; Permutation experiments; Consensus sequence

*Corresponding author: Konda Mani Saravanan, Centre of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai 625 021, Tamilnadu, India, E-mail: [email protected]

Received: October 15, 2015 Accepted: November 15, 2015 Published: November 22, 2015

Introduction

Holins are small membrane proteins responsible for disrupting the cytoplasmic membrane of bacteria to release endolysins, which hydrolyze the cell wall and induce cell death [1]. The holin genes are encoded in the genomes of bacteriophages, mainly to control the phage infection cycle. These genes play two important roles: one is to release the endolysin and the other is to determine the timing of the end of the infection cycle [2,3]. More than a hundred families of holin functional genes have been characterized by defining about thirty orthologous groups. Due to the high sequence divergence of holins, they are usually grouped into two classes based on common structural motifs. Class I holins have three predicted transmembrane regions, whereas class II holins are shorter, with two transmembrane regions [4]. The amino terminus of class I holins lies in the periplasm and the carboxy terminus in the cytoplasm, whereas class II holins have both their amino and carboxy termini in the cytoplasm [5]. A careful look at the literature shows that the crucial function of holins at the structural and mechanistic levels has been investigated very little.
In other words, the nature of the lethal lesion caused by holins in the process of lysis is still not clear [6]. As mentioned above, the function of holins in the phage vegetative cycle was not investigated in depth until phage systems dominated molecular biology. In order to study the structure-function relationship of holins, several characteristic sequence properties have been used, such as the location of the genes, the presence of two adjacent hydrophobic transmembrane regions, a dual start motif and a highly charged hydrophilic C-terminal domain [7-9]. Proteins that are exceptions to the above sequence characteristics may still act as putative holins [10-14].

Holins exist in many protein families that are unrelated in terms of sequence and membrane topology, suggesting that the holins have evolved independently. In other words, holins are identified as domains (transmembrane domains) in a variety of protein families and, furthermore, this would imply a single lineage of evolution from a common ancestor (phage) that has been horizontally transferred [14].

Computational analysis of a set of similarly folded proteins with distinct amino acid sequences can help in identifying residues and regions of polypeptide chains that are likely to be important for protein folding and function [15,16]. The amino acid residues in a protein contribute to different extents to coding a particular fold that performs its function [17]. The physico-chemical and conformational properties of amino acid residues at the core of transmembrane, terminal and loop regions are important for structure and function [18-20]. In order to study non-related sequences adopting similar 3D folds, a reduced amino acid alphabet approach is used [21]. A reduced amino acid alphabet model is obtained by grouping the twenty amino acids into a smaller number of representative residues with similar features [22].
In the present work, we have considered a non-redundant dataset of holins and carried out a detailed sequence analysis to uncover the conserved and contrasting properties of amino acid residues in the transmembrane and non-transmembrane regions.

All articles published in Journal of Applied Bioinformatics & Computational Biology are the property of SciTechnol, and are protected by copyright laws. Copyright © 2015, SciTechnol, All Rights Reserved.
Citation: Saravanan KM (2015) Sequence Analysis of Holins by Reduced Amino Acid Alphabet Model and Permutation Approaches. J Appl Bioinform Comput Biol 4:3. doi:http://dx.doi.org/10.4172/2329-9533.1000120

Material and Methods

Dataset and sequence alignment

We have considered 48 non-redundant sequences of holins in bacteriophages. They share less than 40% sequence identity with each other in the dataset. We considered the two major classes of holins, which contain either two or three transmembrane regions. Using the SOSUI web server [23], we extracted the sequences of the transmembrane regions. We used the reduced amino acid alphabet model proposed by Beckstette et al. [22], in which the twenty amino acid residues are reduced to fifteen. Figure 1, reproduced from Beckstette et al. [22], shows the reduced amino acid alphabet model adopted in this work. We used the multiple sequence alignment program 'Multialin' to align the transmembrane regions [24].

Since our research group is working experimentally (by X-ray crystallography) to solve the structure of a holin with three transmembrane segments, we considered a holin with three transmembrane regions and permutated the first, second and third regions to different positions (giving six sequences in total). We built an HMM profile using the HMMER tool [25] for a three-transmembrane holin sequence and searched it against the Pfam [26] and COG genome databases [27].
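The permutation step described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the author's procedure: the function name `permute_tm_regions`, the toy sequence and the transmembrane spans are all hypothetical, and in practice the spans would come from a predictor such as SOSUI. The loops stay in place while the transmembrane segments are reordered, and each variant is labelled by the order of the segments (for example 213).

```python
from itertools import permutations


def permute_tm_regions(sequence, tm_spans):
    """Return {order_label: sequence} for every ordering of the TM segments.

    tm_spans is a sorted list of non-overlapping (start, end) pairs,
    0-based and half-open. Everything outside the spans is treated as
    loop or terminal sequence and is kept in its original position.
    """
    # Split the sequence into alternating loop and TM pieces.
    loops, tms, cursor = [], [], 0
    for start, end in tm_spans:
        loops.append(sequence[cursor:start])
        tms.append(sequence[start:end])
        cursor = end
    loops.append(sequence[cursor:])  # C-terminal tail

    variants = {}
    for order in permutations(range(len(tms))):
        parts = []
        for slot, tm_index in enumerate(order):
            parts.append(loops[slot])    # loop stays where it was
            parts.append(tms[tm_index])  # TM segment moved into this slot
        parts.append(loops[-1])
        label = "".join(str(i + 1) for i in order)  # e.g. "213"
        variants[label] = "".join(parts)
    return variants


# Hypothetical toy input: lowercase = loops, uppercase = three TM segments.
toy = "mk" + "AAAA" + "gs" + "CCCC" + "nd" + "LLLL" + "ke"
spans = [(2, 6), (8, 12), (14, 18)]
variants = permute_tm_regions(toy, spans)
```

With three transmembrane segments the function yields six variants, one of which (123) is the unchanged reference sequence.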
Using the six permutated holin sequences, we derived a consensus sequence based on multiple sequence alignment and used it in further computations to find whether the transmembrane or the non-transmembrane regions of holins play the vital role in folding and in the formation of the hole-like structure.

Computation of amino acid property correlation coefficients

We made use of forty-eight diverse physical, chemical, energetic and conformational properties derived from the folded native conformations of proteins, as given by Gromiha et al. [28]. The list of forty-eight physico-chemical properties used in the present work is shown in Table 1. We computed the cross-correlation coefficient by substituting, in place of the amino acid sequences of the target and the template, the sequences of numerical values that represent any one of these properties. Averaging correlation coefficients over a set of properties has been found to improve the signal-to-noise ratio, so average cross-correlation coefficients were also computed.

A quantitative expression of the homology between two amino acid sequences X and Y is obtained by computing the cross-correlation coefficient described below. The coefficient C(j) at the jth residue of the sequence Y is obtained by comparing a segment N residues long, which starts at the uth residue and ends at the (u+N-1)th residue of the sequence X, with the segment of the sequence Y from the jth residue to the (j+N-1)th residue:

$$C(j) = \frac{\sum_{i=1}^{N} \left(X(u+i-1) - \langle X \rangle\right)\left(Y(j+i-1) - \langle Y \rangle\right)}{\left[\left\{\sum_{i=1}^{N} \left(X(u+i-1) - \langle X \rangle\right)^{2}\right\}\left\{\sum_{i=1}^{N} \left(Y(j+i-1) - \langle Y \rangle\right)^{2}\right\}\right]^{1/2}}$$

where

$$\langle X \rangle = \frac{1}{N}\sum_{i=1}^{N} X(u+i-1), \qquad \langle Y \rangle = \frac{1}{N}\sum_{i=1}^{N} Y(j+i-1)$$

Here X(u+i-1) is the index value of the amino acid at position (u+i-1) in X, and Y(j+i-1) is the index value of the amino acid at position (j+i-1) in Y. The percentage of occurrences of a correlation coefficient greater than 0.5 is also computed for each property.
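As an illustration of the coefficient defined above, the following Python sketch computes C(j) for one window and the percentage of coefficients above 0.5. It is not the program used in this work. The property scale `TOY_SCALE` holds made-up numbers chosen only to make the example runnable (it is not one of the 48 properties), the function names are hypothetical, and returning 0.0 for a window with zero variance is an assumption of this sketch.

```python
import math

# Hypothetical toy scale: illustrative numbers only, not a published property.
TOY_SCALE = {"A": 1.0, "L": 3.0, "G": 0.0, "S": -1.0, "D": -3.0}


def encode(sequence, scale):
    """Replace each residue by its numerical property value."""
    return [scale[residue] for residue in sequence]


def cross_correlation(x_values, y_values, u, j, n):
    """C(j) for a window of n residues starting at position u in X and
    position j in Y (positions are 1-based, as in the text)."""
    xs = x_values[u - 1:u - 1 + n]
    ys = y_values[j - 1:j - 1 + n]
    if len(xs) < n or len(ys) < n:
        raise ValueError("window runs past the end of a sequence")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in xs) * sum((b - mean_y) ** 2 for b in ys)
    )
    # Assumption of this sketch: a flat window gives no correlation signal.
    return numerator / denominator if denominator else 0.0


def percent_above(coefficients, threshold=0.5):
    """Percentage of coefficients strictly greater than the threshold."""
    hits = sum(1 for c in coefficients if c > threshold)
    return 100.0 * hits / len(coefficients)


x = encode("ALGSD", TOY_SCALE)
same = cross_correlation(x, encode("ALGSD", TOY_SCALE), u=1, j=1, n=5)
reversed_y = cross_correlation(x, encode("DSGLA", TOY_SCALE), u=1, j=1, n=5)
```

Averaging over several properties, as described above, amounts to calling `cross_correlation` once per property scale and taking the mean of the results.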
The whole computation process has been carried out and automated using an in-house PERL program.

Volume 4 • Issue 3 • 1000120

Figure 1: Reduced amino acid alphabet model (figure reproduced from Beckstette et al. 2006).

Table 1: Forty-eight diverse physico-chemical properties.

S.No  Physico-chemical property of amino acid residues
1     Compressibility
2     Thermodynamic transfer hydrophobicity
3     Surrounding hydrophobicity
4     Polarity
5     Isoelectric point
6     Equilibrium constant with reference to ionization property of COOH group
7     Molecular weight
8     Bulkiness
9     Chromatographic index
10    Refractive index
11    Normalized consensus hydrophobicity
12    Short and medium range non-bonded energy
13    Long-range non-bonded energy
14    Total non-bonded energy
15    Alpha helical tendency
16    Beta structure tendency
17    Turn forming tendency
18    Coil forming tendency
19    Helical contact area
20    Mean RMS fluctuational displacement
21    Buriedness
22    Solvent accessible reduction ratio
23    Average number of surrounding residues
24    Power to be at the N-terminal of alpha helix
25    Power to be at the C-terminal of alpha helix
26    Power to be at the middle of alpha helix
27    Partial specific volume
28    Average medium range contacts
29    Average number of long-range contacts
30    Combined surrounding hydrophobicity
31    Solvent accessible surface area, denatured state
32    Solvent accessible surface area, native state
33    Solvent accessible surface area, unfolding
34    Gibbs free energy change of hydration for unfolding
35    Gibbs free energy change of hydration, denatured state
36    Gibbs free energy change of hydration, native state
37    Unfolding enthalpy change of hydration
38    Unfolding entropy change of hydration
39    Unfolding hydration heat capacity change
40    Unfolding Gibbs free energy
41    Unfolding enthalpy
42    Unfolding entropy changes
43    Unfolding Gibbs free energy change
44    Unfolding enthalpy change
45    Unfolding entropy changes
46    Volume
47    Shape
48    Flexibility

Results

Sequence properties of Holins

On aligning the transmembrane regions of holins against the Protein Data Bank, we noted the absence of similar sequences in the database (average sequence identity is 35%). We observed conservation of the amino acid residue D in the first and second transmembrane regions and of the amino acid S in the third transmembrane segment. Conserved motifs are observed in the first transmembrane region (LXXL) and in the third transmembrane region (EXXS), as shown in Figure 2 (a, b, c).

We then shuffled the three transmembrane regions in the sequence, keeping the loops constant, to generate the six permutated sequences 123, 213, 312, 231, 321 and 132. The numbers indicate the order of the transmembrane regions. Figure 3 shows the alignment of the six permutated sequences and their consensus. On searching the reference protein (123) against the translated genome databases and the Protein Data Bank, we found some hypothetical proteins as hits in the translated genomes, whereas there were no hits in the Protein Data Bank. It should be noted that there are no hits in the PDB even for the reference protein (permutated). In the case of the permutated sequence (213), an uncharacterized protein has a significant alignment at the first transmembrane region, with a sequence identity of 20.1%. For the permutated sequence (312), the second transmembrane region aligns with Phage holin4 and the third transmembrane region aligns with Flavi_NS4A (Flavivirus non-structural protein NS4A). On searching in translated genomes, an ABC-type metal ion transport system permease component aligns with the holin with 24% sequence identity. In the case of the permutated sequence (231), most of the regions in this sequence align with uncharacterized proteins. The first transmembrane region aligns with AhpA, an uncharacterized membrane protein affecting hemolysin expression. For the permutated sequence (321), the first transmembrane region aligns with a small hydrophobic integral membrane protein. The first and second transmembrane regions align with ProW, the permease component of an ABC-type proline/glycine betaine transport system.
The permutated sequence (132) aligns with the Macoilin transmembrane protein, and most of its regions align with uncharacterized proteins. However, the sequence identities (<24%) and e-values obtained for the hits are weak, and the alignments are so short that homology cannot be inferred. The observed sequence identity has no biological context.

Figure 2: Multiple sequence alignment of transmembrane regions.

Figure 3: Multiple sequence alignment of permutated holins and its consensus sequence.

Correlation coefficient of 48 amino acid properties between targets and templates

The pairwise alignment of holins against the PDB is given in the Supplementary Material. The results suggest that the relationship between the diverse properties of amino acid residues in holins and in their templates is crucial when selecting a template or fold from a fold library for a holin with very low sequence identity.

Figure 4: Correlation coefficient of 48 amino acid properties.

The average correlation coefficients between holins and their templates for each of the 48 properties are shown in Figure 4. The properties polarity, bulkiness, chromatographic index, total non-bonded energy, mean RMS fluctuational displacement, buriedness, average number of surrounding residues, solvent accessible surface area of the native and unfolding states, unfolding hydration heat capacity change and unfolding entropy changes have an average correlation coefficient greater than 0.5.
Properties such as the equilibrium constant with reference to ionization property, consensus hydrophobicity, alpha helix forming tendency, power to be at the middle of alpha helix, average number of medium range contacts, Gibbs free energy change of hydration for unfolding and for the denatured state, and shape have a poor correlation coefficient of less than 0.4. The figure clearly shows only marginally increased values for those >0.5, whereas the ones <0.5 go as low as 0.2. A statistical analysis of the data shows that about 0.68 correlation coefficients deviate from 0.5 for the other amino acids for the holins. Currently, the data suggest that the values are not significantly above 0.5, indicating that there is no statistically significant correlation between the 48 holin amino acids and their templates.

Conclusion

Our analysis yields several interesting observations. Using the reduced amino acid alphabet model, we found two common motifs, one in the first transmembrane region (LXXL) and one in the third transmembrane region (EXXS). Another interesting observation is the construction of a consensus holin from the alignment of the permutated sequences. A database search with the consensus holin sequence shows weak homology to transporting proteins. Most of the regions of the permutated sequences align with uncharacterized or hypothetical proteins. Through this paper, we show that there are no holins with permuted arrangements and that the sequences of holins are unique compared with those of other proteins.

Acknowledgements

The author thanks the Department of Biotechnology for providing computational facilities in the form of the Bioinformatics Centre at the Madurai Kamaraj University, Madurai, India. He also thanks the University Grants Commission for the award of the Dr D.S. Kothari Post Doctoral Fellowship [grant number F.13-932/2013(BSR)] and Dr. S. Krishnaswamy, retired senior professor at Madurai Kamaraj University, under whom the author is working.

References

1. Young R (2014) Phage lysis: three steps, three choices, one outcome. J Microbiol 52: 243-258.
2. Reddy B, Saier MH Jr (2013) Topological and phylogenetic analyses of bacterial holin families and superfamilies. Biochim Biophys Acta 1828: 2654-2671.
3. Young R (1992) Bacteriophage lysis: mechanism and regulation. Microbiol Rev 56: 430-481.
4. Wang IN, Smith DL, Young R (2000) Holins: the protein clocks of bacteriophage infections. Annu Rev Microbiol 54: 799-825.
5. Ramanculov E, Young R (2001) Genetic analysis of the T4 holin: timing and topology. Gene 265: 25-36.
6. Young R (2002) Bacteriophage holins: Deadly diversity. J Mol Microbiol Biotechnol 4: 21-36.
7. Loessner MJ, Wendlinger G, Scherer S (1995) Heterogeneous endolysins in Listeria monocytogenes bacteriophages: a new class of enzymes and evidence for conserved holin genes within the siphoviral cassettes. Mol Microbiol 16: 1231-1241.
8. Young R, Blasi U (1995) Holins: form and function in bacteriophage lysis. FEMS Microbiol Rev 17: 191-205.
9. Blasi U, Young R (1996) Two beginnings for a single purpose: the dual start holins in the regulation of phage lysis. Mol Microbiol 21: 675-682.
10. Loessner MJ, Gaeng S, Wendlinger G, Maier SK, Scherer S (1998) The two component lysis system of Staphylococcus aureus bacteriophage Twort: a large TTG-start holin and an associated amidase endolysin. FEMS Microbiol Lett 162: 265-274.
11. White R, Tran TAT, Dankenbring CA, Deaton J, Young R (2010) The N-terminal transmembrane domain of λ S is required for holin but not antiholin function. J Bacteriol 192: 725-733.
12. Park T, Struck DK, Deaton JF, Young R (2006) Topological dynamics of holins in programmed bacterial lysis. Proc Natl Acad Sci USA 103: 19713-19718.
13. Savva CG, Dewey JS, Deaton J, White RL, Struck DK, et al. (2008) The holin of bacteriophage lambda forms rings with large diameter. Mol Microbiol 69: 784-793.
14. Srividhya KV, Alaguraj V, Poornima G, Kumar D, Singh GP, et al. (2007) Identification of prophages in bacterial genomes by dinucleotide relative abundance difference. PLoS One 2: e1193.
15. Saravanan KM, Selvaraj S (2009) Analysis and visualization of long-range contact networks in homologous families of proteins. The Open Struct Biol 3: 104-125.
16. Saravanan KM, Balasubramanian H, Nallusamy S, Samuel S (2010) Sequence and structural analysis of two designed proteins with 88% identity adopting different folds. Protein Eng Des Sel 23: 911-918.
17. Saravanan KM, Selvaraj S (2012) Search for identical octapeptides in unrelated proteins: Structural plasticity revisited. Peptide Sci 98: 11-26.
18. Saravanan KM, Krishnaswamy S (2015) Analysis of dihedral angle preferences of alanine and glycine residues in alpha and beta transmembrane regions. J Biomol Struct Dyn 33: 534-551.
19. Rishyakulya MC, Saravanan KM (2015) Computational structural analysis of C-terminal residues in proteins containing transmembrane regions. Int J Comp Biol 4: 44-54.
20. Saravanan KM, Selvaraj S (2015) Better theoretical models and protein design experiments can help to understand protein folding. J Nat Sci Biol Med 6: 202-204.
21. Etchebest C, Benros C, Bornot A, Camproux AC, De Brevern AG (2007) A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur Biophys J 36: 1059-1069.
22. Beckstette M, Homann R, Giegerich R, Kurtz S (2006) Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics 7: 389.
23. Hirokawa T, Boon-Chieng S, Mitaku S (1998) SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics 14: 378-379.
24. Corpet F (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 16: 10881-10890.
25. Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39: W29.
26. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, et al. (2004) The Pfam protein families database. Nucleic Acids Res 32: D138-D141.
27. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4: 41.
28. Gromiha MM, Oobatake M, Sarai A (1999) Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophys Chem 82: 51-67.

Author Affiliation

Centre of Excellence in Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai 625 021, Tamilnadu, India
