Saravanan, J Appl Bioinform Comput Biol 2015, 4:3
http://dx.doi.org/10.4172/2329-9533.1000120
Journal of Applied Bioinformatics & Computational Biology
Research Article
a SciTechnol journal
Sequence Analysis of Holins by Reduced Amino Acid Alphabet Model and Permutation Approaches
Konda Mani Saravanan*
Abstract
Objective: Holins are small proteins that perform many important functions in the cytoplasmic membrane of the cell. No crystal structure of a holin has been reported in the Protein Data Bank, so computational sequence analysis is currently the only alternative for understanding the structural and functional consequences of these proteins. In the present work, we applied several computational procedures to explore the amino acid residues responsible for the function of holins in membranes.
Methods
To explore the role of amino acid residues in holins, we used a reduced amino acid alphabet model that reduces the twenty amino acids to fifteen. Transmembrane regions in holin sequences were extracted and subjected to multiple sequence alignment to bring out the role of conserved amino acid residues. Further, the transmembrane regions in holins were permuted to different possible positions, keeping the loops static, to understand the roles of transmembrane and non-transmembrane regions.
Results
We found that the reduced amino acid alphabet model is successful when no relationship can be established between proteins belonging to similar families. The important physico-chemical properties conserved in the non-redundant holin sequences were also explored in detail by computing correlation coefficients. Permutation of transmembrane regions in holins and database searches showed that the sequence composition and arrangement of holins are unique to their specific function.
Conclusion
The analysis presented in this paper reveals the vital role of each amino acid residue in holins, which may help to model their structure accurately and to understand the sequence-structure-function relationship of holins in the membrane.
Keywords
Holins; Sequence alignment; Physico-chemical properties; Permutation experiments; Consensus sequence
*Corresponding author: Konda Mani Saravanan, Centre of Excellence in
Bioinformatics, School of Biotechnology, Madurai Kamaraj University, Madurai
625 021, Tamilnadu, India, E-mail: [email protected]
Received: October 15, 2015 Accepted: November 15, 2015 Published: November 22, 2015
Introduction
Holins are small membrane proteins responsible for disrupting the cytoplasmic membrane of bacteria to release endolysins, which hydrolyze the cell wall and induce cell death [1]. Holin genes are encoded in the genomes of bacteriophages, where they mainly control the phage infection cycle. These genes play two important roles: releasing the endolysin and determining the timing of the end of the infection cycle [2,3]. More than a hundred families of functional holin genes have been characterized and grouped into about thirty orthologous groups. Because of their high sequence divergence, holins are usually grouped into two classes based on common structural motifs. Class I holins have three predicted transmembrane regions, whereas class II holins are shorter, with two transmembrane regions [4]. In class I holins the amino terminus lies in the periplasm and the carboxy terminus in the cytoplasm, whereas class II holins have both their amino and carboxy termini in the cytoplasm [5].
A careful reading of the literature shows that holin function has been investigated very little at the structural and mechanistic levels. In particular, the nature of the lethal lesion caused by holins during lysis is still not clear [6]. As noted above, the function of holins in the phage vegetative cycle was not investigated in depth until phage systems dominated molecular biology. To study the structure-function relationship of holins, several characteristic sequence features have been used, such as gene location, the presence of two adjacent hydrophobic transmembrane regions, a dual start motif and a highly charged hydrophilic C-terminal domain [7-9]. However, some proteins that lack these characteristics may still act as putative holins [10-14].
Holins occur in many protein families that are unrelated in sequence and membrane topology, suggesting that holins have evolved independently. Alternatively, holins have been identified as (transmembrane) domains in a variety of protein families, which would instead imply a single evolutionary lineage from a common (phage) ancestor that has been horizontally transferred [14]. Computational analysis of a set of similarly folded proteins with distinct amino acid sequences can help identify residues and regions of polypeptide chains that are likely to be important for protein folding and function [15,16]. The amino acid residues in a protein contribute to different extents to encoding the particular fold that performs its function [17]. The physico-chemical and conformational properties of amino acid residues in the core of transmembrane regions, at the termini and in loop regions are important for structure and function [18-20]. To study unrelated sequences that adopt similar 3D folds, a reduced amino acid alphabet approach is used [21]. A reduced amino acid alphabet is obtained by grouping the twenty amino acids into a smaller number of representative residues with similar features [22]. In the present work, we considered a non-redundant dataset of holins and carried out a detailed sequence analysis to uncover the conserved and contrasting properties of amino acid residues in the transmembrane and non-transmembrane regions.
All articles published in Journal of Applied Bioinformatics & Computational Biology are the property of SciTechnol, and is
protected by copyright laws. Copyright © 2015, SciTechnol, All Rights Reserved.
Citation: Saravanan KM (2015) Sequence Analysis of Holins by Reduced Amino Acid Alphabet Model and Permutation Approaches. J Appl Bioinform Comput
Biol 4:3.
doi:http://dx.doi.org/10.4172/2329-9533.1000120
Materials and Methods
Dataset and sequence alignment
We considered 48 non-redundant holin sequences from bacteriophages that share less than 40% sequence identity with each other. The dataset covers the two major classes of holins: those with two transmembrane regions and those with three. Transmembrane regions were extracted using the SOSUI web server [23]. We used the reduced amino acid alphabet model proposed by Beckstette et al. [22], which reduces the twenty amino acid residues to fifteen; Figure 1, reproduced from Beckstette et al. [22], shows the reduced amino acid alphabet model adopted in this work. The transmembrane regions were aligned with the multiple sequence alignment program MultAlin [24]. Because our research group is working experimentally (by X-ray crystallography) to solve the structure of a holin with three transmembrane segments, we considered a holin with three transmembrane regions and permuted the first, second and third regions to different positions, generating six sequences. We built an HMM profile with the HMMER tool [25] for a three-transmembrane holin sequence and searched it against the Pfam [26] and COG genome [27] databases. Using the six permuted holin sequences, we derived a consensus sequence from their multiple sequence alignment and used it in further computations to determine whether the transmembrane or the non-transmembrane regions of holins play a vital role in folding and in forming the hole-like structure.
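The permutation step and a consensus derivation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's pipeline. The toy sequence and its transmembrane boundaries are hypothetical; in the study they came from SOSUI predictions. All function names are also hypothetical. The consensus here is a simple strict-majority rule applied to sequences that are already aligned. It stands in for the MultAlin-based consensus and is not MultAlin's own algorithm. Because the toy transmembrane segments have equal length, the six permuted sequences are aligned without gaps. With unequal segments they would first need a multiple sequence alignment.

```python
from collections import Counter
from itertools import permutations


def split_segments(seq, tm_spans):
    """Split seq at 0-based, half-open TM spans into loops and TM segments.

    Returns (loops, tms) with len(loops) == len(tms) + 1.
    """
    loops, tms, prev = [], [], 0
    for start, end in tm_spans:
        loops.append(seq[prev:start])
        tms.append(seq[start:end])
        prev = end
    loops.append(seq[prev:])
    return loops, tms


def permute_tm_regions(seq, tm_spans):
    """Every ordering of the TM segments, with the loops kept in place.

    Keys such as '213' give the order of the original TM regions.
    """
    loops, tms = split_segments(seq, tm_spans)
    variants = {}
    for order in permutations(range(len(tms))):
        parts = [loops[0]]
        for slot, k in enumerate(order):
            parts += [tms[k], loops[slot + 1]]
        variants["".join(str(k + 1) for k in order)] = "".join(parts)
    return variants


def majority_consensus(aligned, undecided="."):
    """Residue shared by more than half of the sequences in each column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must already be aligned to equal length")
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if 2 * count > len(column) else undecided)
    return "".join(out)


if __name__ == "__main__":
    toy = "MK" "LIVLAL" "GDP" "FAVWLG" "NGS" "ILLAVS" "RKE"  # hypothetical holin
    spans = [(2, 8), (11, 17), (20, 26)]                       # hypothetical TM1-TM3
    variants = permute_tm_regions(toy, spans)
    for label, s in variants.items():
        print(label, s)
    print("consensus", majority_consensus(list(variants.values())))
```

In the toy run, loop residues survive in the consensus because they never move. Transmembrane positions survive only where different segments happen to share a residue. This mirrors the question the permutation experiment asks about the relative roles of transmembrane and non-transmembrane regions.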
Computation of correlation coefficients of amino acid properties
We used a diverse set of forty-eight physical, chemical, energetic and conformational amino acid properties derived from the folded native conformations of proteins, as given by Gromiha et al. [28]. The forty-eight properties used in the present work are listed in Table 1.
We computed cross-correlation coefficients by replacing each residue of the target and template sequences with its numerical value for one of these properties. Averaging correlation coefficients over a set of properties has been found to improve the signal-to-noise ratio, so average cross-correlation coefficients were also computed.
A quantitative measure of homology between two amino acid sequences X and Y is obtained by computing the cross-correlation coefficient described below. The coefficient C(j) at the jth residue of sequence Y is obtained by comparing a segment of N residues in sequence X, starting at the uth residue and ending at the (u+N−1)th residue, with the segment of sequence Y from the jth residue to the (j+N−1)th residue.
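$$C(j) = \frac{\sum_{i=1}^{N}\left(X(u+i-1)-\langle X\rangle\right)\left(Y(j+i-1)-\langle Y\rangle\right)}{\left[\left\{\sum_{i=1}^{N}\left(X(u+i-1)-\langle X\rangle\right)^{2}\right\}\left\{\sum_{i=1}^{N}\left(Y(j+i-1)-\langle Y\rangle\right)^{2}\right\}\right]^{1/2}}$$
where
$$\langle X\rangle = \frac{1}{N}\sum_{i=1}^{N} X(u+i-1), \qquad \langle Y\rangle = \frac{1}{N}\sum_{i=1}^{N} Y(j+i-1)$$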
Here X(u + i − 1) is the index (property) value of the amino acid at position (u + i − 1) in X, and Y(j + i − 1) is the value at position (j + i − 1) in Y. The percentage of occurrences of correlation coefficients greater than 0.5 was also computed for each property. The whole computation was automated using an in-house Perl program.
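The coefficient C(j) defined above can be sketched in Python, with u and j 1-based as in the equation. The author's in-house Perl program is not available, so this is an independent sketch under several assumptions. Kyte-Doolittle hydropathy stands in for one of the 48 Gromiha et al. scales, which are not reproduced here. The helper names are hypothetical. The "percentage above 0.5" is taken over all window positions j, which is only one possible reading of the text. A window with constant values is treated as uncorrelated.

```python
import math

# Kyte-Doolittle hydropathy: a familiar stand-in for ONE of the 48
# property scales of Gromiha et al., which are not reproduced here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq, scale):
    """Replace each residue by its numerical property value."""
    return [scale[aa] for aa in seq.upper()]


def cross_correlation(x, y, u, j, n):
    """C(j): window X(u..u+n-1) against Y(j..j+n-1); u and j are 1-based."""
    xs = x[u - 1:u - 1 + n]
    ys = y[j - 1:j - 1 + n]
    if len(xs) != n or len(ys) != n:
        raise ValueError("window runs past the end of a sequence")
    mean_x = sum(xs) / n  # <X>
    mean_y = sum(ys) / n  # <Y>
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in xs)
                    * sum((b - mean_y) ** 2 for b in ys))
    # Assumption: a window with constant values is treated as uncorrelated.
    return num / den if den > 0 else 0.0


def scan(x, y, u, n):
    """C(j) for every start position j in Y that fits a full window."""
    return [cross_correlation(x, y, u, j, n) for j in range(1, len(y) - n + 2)]


def percent_above(values, cutoff=0.5):
    """Percentage of coefficients strictly greater than the cutoff."""
    return 100.0 * sum(v > cutoff for v in values) / len(values)


def average_over_properties(seq_x, seq_y, scales, u, j, n):
    """Mean C(j) over several property scales (signal averaging)."""
    total = sum(cross_correlation(encode(seq_x, s), encode(seq_y, s), u, j, n)
                for s in scales)
    return total / len(scales)


if __name__ == "__main__":
    target, template = "LIVLALGDPFAVW", "MLIVLALGEPF"  # hypothetical fragments
    values = scan(encode(target, KYTE_DOOLITTLE),
                  encode(template, KYTE_DOOLITTLE), u=1, n=5)
    print([round(v, 2) for v in values], percent_above(values))
```

Because C(j) is a Pearson coefficient over a sliding window, it depends only on the shape of the property profile. It does not depend on the absolute scale or offset of the property values. This is why numerically very different properties can be compared and averaged.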
Volume 4 • Issue 3 • 1000120
Figure 1: Reduced amino acid alphabet model (figure reproduced from
Beckstette et al. 2006).
Table 1: Forty-eight diverse physico-chemical properties of amino acid residues.
S.No  Physico-chemical property
1     Compressibility
2     Thermodynamic transfer hydrophobicity
3     Surrounding hydrophobicity
4     Polarity
5     Isoelectric point
6     Equilibrium constant with reference to ionization property of COOH group
7     Molecular weight
8     Bulkiness
9     Chromatographic index
10    Refractive index
11    Normalized consensus hydrophobicity
12    Short- and medium-range non-bonded energy
13    Long-range non-bonded energy
14    Total non-bonded energy
15    Alpha helical tendency
16    Beta structure tendency
17    Turn forming tendency
18    Coil forming tendency
19    Helical contact area
20    Mean RMS fluctuational displacement
21    Buriedness
22    Solvent accessible reduction ratio
23    Average number of surrounding residues
24    Power to be at the N-terminal
25    Power to be at the C-terminal
26    Power to be at the middle of alpha helix
27    Partial specific volume
28    Average medium-range contacts
29    Average number of long-range contacts
30    Combined surrounding hydrophobicity
31    Solvent accessible surface area (denatured state)
32    Solvent accessible surface area (native state)
33    Solvent accessible surface area (unfolding)
34    Gibbs free energy change of hydration for unfolding
35    Gibbs free energy change of hydration (denatured state)
36    Gibbs free energy change of hydration (native state)
37    Unfolding enthalpy change of hydration
38    Unfolding entropy change of hydration
39    Unfolding hydration heat capacity change
40    Unfolding Gibbs free energy
41    Unfolding enthalpy
42    Unfolding entropy changes
43    Unfolding Gibbs free energy change
44    Unfolding enthalpy change
45    Unfolding entropy changes
46    Volume
47    Shape
48    Flexibility
Results
Sequence properties of Holins
By aligning the transmembrane regions of holins against the Protein Data Bank, we noted the absence of similar sequences in the database (average sequence identity 35%). We observed conservation of aspartate (D) in the first and second transmembrane regions and of serine (S) in the third transmembrane segment. Conserved motifs are observed in the first transmembrane region (LXXL) and in the third transmembrane region (EXXS), as shown in Figure 2 (a, b, c). We then shuffled the three transmembrane regions in the sequences, keeping the loops constant, to generate six permuted sequences: 123, 213, 312, 231, 321 and 132, where the digits indicate the order of the transmembrane regions. Figure 3 shows the alignment of the six permuted sequences and their consensus. Searching with the reference protein (123) against the translated genome databases and the Protein Data Bank, we found some hypothetical proteins as hits in the genome databases but no hits in the Protein Data Bank. It should be noted that there are no PDB hits even for the permuted versions of the reference protein.
In the case of permuted sequence (213), an uncharacterized protein aligns significantly with the first transmembrane region at 20.1% sequence identity. For permuted sequence (312), the second transmembrane region aligns with Phage holin4 and the third TM aligns with Flavi_NS4A (Flavivirus non-structural protein NS4A). While searching in translated genomes, the permease component of an ABC-type metal ion transport system aligns with holin at 24% sequence identity. In the case of permuted sequence (231), most of the regions align with uncharacterized proteins; the first transmembrane region aligns with AhpA, an uncharacterized membrane protein affecting hemolysin expression. For permuted sequence (321), the first transmembrane region aligns with a small hydrophobic integral membrane protein, and the first and second transmembrane regions align with ProW, the permease component of an ABC-type proline/glycine betaine transport system. Permuted sequence (132) aligns with the Macoilin transmembrane protein, and most of its regions align with uncharacterized proteins. However, these hits are weak in both sequence identity (<24%) and e-value, and the alignments are too short to infer homology; the observed sequence identities therefore have no biological significance.
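The conserved motifs reported above (LXXL in the first transmembrane region and EXXS in the third, with X standing for any residue) can be located in individual sequences with a regular-expression scan. In the study, the motifs were read from the multiple sequence alignment in Figure 2. This Python sketch only scans single sequences. The fragments and function names are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'LXXL' (X = any residue) into a regex."""
    return "".join("." if c == "X" else re.escape(c) for c in motif.upper())


def find_motif(seq, motif):
    """1-based start positions of every match, overlapping matches included."""
    # A zero-width lookahead lets consecutive matches share residues.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    tm1 = "LAALAAL"     # hypothetical TM1 fragment
    tm3 = "GEAVSTEKLS"  # hypothetical TM3 fragment
    print(find_motif(tm1, "LXXL"), find_motif(tm3, "EXXS"))
```

The lookahead matters for repeats such as LXXLXXL, where a plain `finditer` would skip the second, overlapping occurrence.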
Correlation coefficients of 48 amino acid properties between targets and templates
The pairwise alignments of holins against the PDB are given in the Supplementary Material. The results suggest that the relationship between the diverse properties of amino acid residues in holins and in their templates is crucial when selecting a template or fold from a fold library for a holin with very low sequence identity. The average correlation coefficients between holins and their templates for each of the 48 properties are shown in Figure 4. Properties such as polarity, bulkiness, chromatographic index, total non-bonded energy, mean RMS fluctuational displacement, buriedness, average number of surrounding residues, solvent accessible surface area of the native and unfolding states, unfolding hydration heat capacity change and unfolding entropy changes have average correlation coefficients greater than 0.5. Properties such as the equilibrium constant with reference to ionization property, consensus hydrophobicity, alpha helix forming tendency, power to be at the middle of alpha helix, average number of medium-range contacts, Gibbs free energy change of hydration for unfolding and for the denatured state, and shape have poor correlation coefficients, less than 0.4. The figure shows that the values above 0.5 exceed it only marginally, whereas those below 0.5 go as low as 0.2. A statistical analysis of the data shows that about 0.68 correlation coefficients deviate from 0.5 for the other amino acids of the holins. Taken together, the data suggest that the values are not significantly above 0.5, indicating that there is no statistically significant correlation between the 48 amino acid properties of holins and those of their templates.
Figure 2: Multiple sequence alignment of transmembrane regions.
Figure 3: Multiple sequence alignment of permuted holins and their consensus sequence.
Figure 4: Correlation coefficients of 48 amino acid properties.
Conclusion
Our analysis yields several interesting observations. Using the reduced amino acid alphabet model, we found two common motifs: LXXL in the first transmembrane region and EXXS in the third transmembrane region. Another interesting observation is the construction of a consensus holin from the alignment of permuted sequences. A database search with the consensus holin sequence shows weak homology to transport proteins. Most of the regions of the permuted sequences align with uncharacterized or hypothetical proteins. Overall, the results indicate that no holins with permuted arrangements are present in the databases searched, and that holin sequences are unique compared with those of other proteins.
Acknowledgements
The author thanks the Department of Biotechnology for providing computational facilities in the form of the Bioinformatics Centre at Madurai Kamaraj University, Madurai, India. He also thanks the University Grants Commission for the award of a Dr D.S. Kothari Post Doctoral Fellowship [grant number F.13-932/2013(BSR)], and Dr. S. Krishnaswamy, retired senior professor at Madurai Kamaraj University, under whom the author is working.
References
1. Young R (2014) Phage lysis: three steps, three choices, one outcome. J
Microbiol 52: 243-258.
2. Reddy B, Saier MH Jr (2013) Topological and phylogenetic analyses of
bacterial holin families and superfamilies. Biochim Biophys Acta 1828: 2654-2671.
3. Young R (1992) Bacteriophage lysis: mechanism and regulation. Microbiol
Rev 56: 430-481.
4. Wang IN, Smith DL, Young R (2000) Holins: the protein clocks of
bacteriophage infections. Annu Rev Microbiol 54: 799-825.
5. Ramanculov E, Young R (2001) Genetic analysis of the T4 holin: timing and
topology. Gene 265: 25-36.
6. Young R (2002) Bacteriophage holins: Deadly diversity. J Mol Microbiol
Biotechnol 4: 21-36.
7. Loessner MJ, Wendlinger G, Scherer S (1995) Heterogeneous endolysins
in Listeria monocytogenes bacteriophages: a new class of enzymes and
evidence for conserved holin genes within the siphoviral cassettes. Mol
Microbiol 16: 1231-1241.
8. Young R, Blasi U (1995) Holins: form and function in bacteriophage lysis.
FEMS Microbiol Rev 17: 191-205.
9. Blasi U, Young R (1996) Two beginnings for a single purpose: the dual start
holins in the regulation of phage lysis. Mol Microbiol 21: 675-682.
10. Loessner MJ, Gaeng S, Wendlinger G, Maier SK, Scherer S (1998) The two
component lysis system of Staphylococcus aureus bacteriophage Twort: a
large TTG-start holin and an associated amidase endolysin. FEMS Microbiol
Lett 162: 265-274.
11. White R, Tran TAT, Dankenbring CA, Deaton J, Young R (2010) The
N-terminal transmembrane domain of λ S is required for holin but not antiholin
function. J Bacteriol 192: 725-733.
12. Park T, Struck DK, Deaton JF, Young R (2006) Topological dynamics of
holins in programmed bacterial lysis. Proc Natl Acad Sci USA 103: 19713-19718.
13. Savva CG, Dewey JS, Deaton J, White RL, Struck DK, et al. (2008) The holin
of bacteriophage lambda forms rings with large diameter. Mol Microbiol 69:
784-793.
14. Srividhya KV, Alaguraj V, Poornima G, Kumar D, Singh GP, et al. (2007)
Identification of prophages in bacterial genomes by dinucleotide relative
abundance difference. PLoS One 2: e1193.
15. Saravanan KM, Selvaraj S (2009) Analysis and visualization of long-range
contact networks in homologous families of proteins. The Open Struct Biol
3: 104-125.
16. Saravanan KM, Balasubramanian H, Nallusamy S, Samuel S (2010)
Sequence and structural analysis of two designed proteins with 88% identity
adopting different folds. Protein Eng Des Sel 23: 911-918.
17. Saravanan KM, Selvaraj S (2012) Search for identical octapeptides in
unrelated proteins: Structural plasticity revisited. Peptide Sci 98: 11-26.
18. Saravanan KM, Krishnaswamy S (2015) Analysis of dihedral angle
preferences of alanine and glycine residues in alpha and beta transmembrane
regions. J Biomol Struct Dyn 33: 534-551.
19. Rishyakulya MC, Saravanan KM (2015) Computational structural analysis
of C-terminal residues in proteins containing transmembrane regions. Int J
Comp Biol 4: 44-54.
20. Saravanan KM, Selvaraj S (2015) Better theoretical models and protein
design experiments can help to understand protein folding. J Nat Sci Biol
Med 6: 202-204.
21. Etchebest C, Benros C, Bornot A, Camproux AC, De Brevern AG (2007)
A reduced amino acid alphabet for understanding and designing protein
adaptation to mutation. Eur Biophys J 36: 1059-1069.
22. Beckstette M, Homann R, Giegerich R, Kurtz S (2006) Fast index based
algorithms and software for matching position specific scoring matrices. BMC
Bioinformatics 7: 389.
23. Hirokawa T, Boon-Chieng S, Mitaku S (1998) SOSUI: classification and
secondary structure prediction system for membrane proteins. Bioinformatics
14: 378-379.
24. Corpet F (1988) Multiple sequence alignment with hierarchical clustering.
Nucleic Acids Res 16: 10881-10890.
25. Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive
sequence similarity searching. Nucleic Acids Res 39: W29.
26. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, et al. (2004) The Pfam
protein families database. Nucleic Acids Res 32: D138-D141.
27. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, et al.
(2003) The COG database: an updated version includes eukaryotes. BMC
Bioinformatics 4: 41.
28. Gromiha MM, Oobatake M, Sarai A (1999) Important amino acid properties for
enhanced thermostability from mesophilic to thermophilic proteins. Biophys
Chem 82: 51-67.
Author Affiliation
Centre of Excellence in Bioinformatics, School of Biotechnology, Madurai
Kamaraj University, Madurai 625 021, Tamilnadu, India