Indian Journal of Biotechnology
Vol 11, April 2012, pp 224-234
SHORT COMMUNICATIONS
Homology modeling and enzyme function
prediction in uncharacterized proteins of
Salmonella typhi—An in silico approach
D G Gore1*, M K Rathod2, V Soni3 and M M Rai2
1Sai Bioinfosys-Bioinformatics Research Centre, Raghuji Nagar, Nagpur 440 023, India
2Centre for Sericulture and Biological Pest Management Research, RTM Nagpur University, Nagpur 440 001, India
3St. Wilferd College, Jaipur 320 020, India
Received 13 September 2010; revised 11 August 2011;
accepted 15 October 2011
Salmonella typhi, a known human pathogen showing multiple drug resistance, causes the majority of endemic cases in developing nations. The S. typhi genome contains 1220 ORFs annotated as hypothetical proteins. These hypothetical proteins were searched for enzyme-coding potential using web tools such as CDD-BLAST, InterProScan, Pfam and COGs. The study sorted out 213 proteins as enzyme coding, and tertiary structures for these proteins were predicted by homology modeling. Structures were modeled for 89 of these functional proteins; the structure-function relationships deciphered in this way could help in a detailed understanding of the regulatory network of S. typhi and in establishing new functions for uncharacterized regions.
Keywords: CDD-BLAST, COGs, function prediction, homology
modeling, hypothetical proteins, InterProScan, Pfam,
Salmonella typhi
*Author for correspondence:
Tel: +91-712-2703977
E-mail: [email protected]

Humans are the only natural host of Salmonella typhi, which shows limited pathogenicity to other mammals. Studies based on isoenzymes have shown that isolates of S. typhi around the world are highly related1. Resistance to common antibiotics, including the fluoroquinolones, the most effective drugs for typhoid fever, has been reported in S. typhi2. Moreover, S. typhi has recently been reported to show multiple drug resistance (MDR), and S. typhi CT18 is considered an example of an emerging MDR microorganism3. Now that the genome of S. typhi has been sequenced, a better understanding of its pathogenicity and resistance is expected. The S. typhi genome comprises 4809037 bases in the main circular chromosome, along with 218160 and 106516 bases in the pHCM1 and pHCM2 plasmids, respectively3. About 1220 ORFs coding for hypothetical proteins are present on the circular chromosome. These regions make up about 25% of the coding capacity of S. typhi and have not yet been analyzed. The probable functions of these hypothetical proteins can be predicted by comparative functional genomics against biological databases, using bioinformatics web tools that screen the input sequences for conserved domains. Homology information from conserved domains makes it possible to classify hypothetical proteins into particular families4-6. This allows the filtering of those hypothetical proteins whose roles in the life cycle could be ascertained on a priority basis by cloning and expression studies7.
Primary information on the available protein sequences of S. typhi was gathered from the KEGG website (www.genome.jp/kegg/). The S. typhi hypothetical proteins were screened for the presence of conserved domain(s) using the following 4 web tools:
Conserved Domain BLAST—The CDD database of 27036 PSSMs was used to search for conserved regions with an E-value cut-off of 0.01, a value chosen to retain only close family members, and with the 'low complexity filter' switched "ON"; this filter masks compositionally biased regions of the query so that hits that do not reflect evolutionary relationships are excluded from the analysis.
InterProScan—The member databases BlastProDom, FPrintScan, HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, ScanRegExp, PatternScan, SuperFamily, SignalPHMM, TMHMM, HMMPanther and Gene3D were used in the InterProScan functional search.
Pfam—A merged global and local search strategy was used, with the Automatic Domain Decomposition Algorithm (ADDA) and an E-value cut-off of 1.0, because at this value the results were similar to those of the other programs used.
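The two E-value cut-offs above (0.01 for CDD-BLAST, 1.0 for Pfam) act as simple thresholds on each reported domain hit. The following minimal Python sketch shows that filtering step, assuming the hits have already been parsed out of the tools' result pages into records; the `DomainHit` class, the `filter_hits` function and the example E-values are hypothetical illustrations, not part of either tool's interface and not results from this study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DomainHit:
    """A parsed conserved-domain hit (hypothetical record layout)."""
    protein: str
    domain: str
    evalue: float


# Cut-offs stated in the text
CDD_EVALUE_CUTOFF = 0.01
PFAM_EVALUE_CUTOFF = 1.0


def filter_hits(hits: Iterable[DomainHit], cutoff: float) -> List[DomainHit]:
    """Keep hits whose E-value is at or below the cut-off, best (lowest) first."""
    kept = [h for h in hits if h.evalue <= cutoff]
    return sorted(kept, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Illustrative values only; not results from this study
    hits = [
        DomainHit("query1", "domain_A", 2e-12),
        DomainHit("query1", "domain_B", 0.3),
        DomainHit("query1", "domain_C", 4.5),
    ]
    print([h.domain for h in filter_hits(hits, CDD_EVALUE_CUTOFF)])   # ['domain_A']
    print([h.domain for h in filter_hits(hits, PFAM_EVALUE_CUTOFF)])  # ['domain_A', 'domain_B']
```

The looser Pfam cut-off admits weaker hits, which is why a hit rejected at 0.01 can still be reported by Pfam.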
COGs—The "clades" value (BeTs) was set to 3 clades. The clades parameter changes the stringency of the search by insisting that any COG to which the query protein is assigned be composed of at least the indicated number of clades. The value 3 was chosen because it is the number used to define a minimal COG.
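The clade rule amounts to a minimum count of distinct clades among a COG's members. A minimal Python sketch of that check, assuming each candidate COG is available with the clade labels of its member proteins; the function name, COG identifiers and clade labels are hypothetical placeholders, not output from the COG web tool.

```python
from typing import Dict, Iterable, List

MIN_CLADES = 3  # value used in the text; also the size that defines a minimal COG


def assignable_cogs(candidates: Dict[str, Iterable[str]],
                    min_clades: int = MIN_CLADES) -> List[str]:
    """Return the candidate COGs whose members span at least `min_clades` clades.

    `candidates` maps a COG identifier to the clade labels of its member proteins.
    Duplicate labels count once, since the rule is about distinct clades.
    """
    return sorted(cog for cog, clades in candidates.items()
                  if len(set(clades)) >= min_clades)


# Hypothetical example: COG identifiers and clade labels are placeholders
candidates = {
    "COG_X": ["clade1", "clade2", "clade3", "clade3"],
    "COG_Y": ["clade1", "clade1", "clade2"],
}
print(assignable_cogs(candidates))  # ['COG_X']
```

Raising `min_clades` makes the assignment stricter; with the value 3, a COG represented in only two clades is not accepted.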
The results obtained from the protein functionality analysis were reported as a confidence level in per cent for assigning a function to each hypothetical protein. Confidence limits of 100, 75, 50, 25 and 0% were set according to the following rules4-6,8:
1. If all 4 tools indicate the same enzymatic domain with similar function, irrespective of the score from each tool, the confidence level is 100%.
2. If 3 of the tools indicate the same enzymatic domain with similar function, irrespective of the score from each tool, the confidence level is 75%.
3. If 2 of the tools indicate the same enzymatic domain with similar function, irrespective of the score from each tool, the confidence level is 50%.
4. If only 1 tool indicates the enzymatic domain while the others give different results, irrespective of the score from each tool, the confidence level is 25%.
5. If no tool indicates any enzymatic domain, the confidence level is 0%.
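Rules 1 to 5 reduce to counting how many of the 4 tools agree on one enzymatic function and multiplying by 25%. A minimal Python sketch, assuming each tool's result has already been normalized to a common enzyme-family label (in the study, deciding that two results have "similar function" was a matter of judgment, not string matching) and assuming that, when tools split into groups, the largest agreeing group sets the level; the function name and labels are hypothetical.

```python
from collections import Counter
from typing import Mapping, Optional


def confidence_level(predictions: Mapping[str, Optional[str]]) -> int:
    """Confidence (%) for one protein under rules 1-5.

    `predictions` maps each tool (CDD-BLAST, InterProScan, Pfam, COGs) to a
    normalized enzyme-family label, or None if that tool reported no
    enzymatic domain.
    """
    labels = [label for label in predictions.values() if label is not None]
    if not labels:
        return 0  # rule 5: no tool indicates an enzymatic domain
    # Size of the largest group of tools agreeing on one enzymatic function
    agreeing = Counter(labels).most_common(1)[0][1]
    return 25 * agreeing  # 4 -> 100, 3 -> 75, 2 -> 50, 1 -> 25 (rules 1-4)


# Hypothetical labels for illustration
example = {"CDD-BLAST": "family_A", "InterProScan": "family_A",
           "Pfam": "family_A", "COGs": None}
print(confidence_level(example))  # 75
```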
The tertiary structures of S. typhi hypothetical proteins were modeled using PS square [(PS)2], an automated homology modeling server. The method uses an effective consensus strategy that combines PSI-BLAST9,10, IMPALA10 and T-Coffee11 in both template selection and target-template alignment. The final 3-dimensional structure was built using the modeling package MODELLER12-14, which is available along with the server. The (PS)2 server15 is available at http://www.ps2.life.nctu.edu.tw/. The predicted structures obtained from PS square were saved in .PDB format (available on request by e-mail to the corresponding author).

Of the 1220 hypothetical proteins analyzed for enzymatic function, the study sorted out 213 hypothetical proteins with probable enzyme activity, based on conserved domains found in their primary sequences when aligned with known enzyme families using the 4 web tools. The functionality search by the 4 tools gave variable results, and according to the conditions set for the confidence limits, these 213 proteins were categorized into particular % confidence levels (Table 1).

Table 1—Percentage classification of S. typhi enzymatic hypothetical proteins
Confidence level (%) | No. of proteins
100 | 31
75 | 30
50 | 23
25 | 57
0 | 72
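As a consistency check, the five categories in Table 1 add up to the 213 proteins stated in the text. The short Python sketch below tallies them and counts the proteins at or above a chosen confidence level; the `summarize` helper is hypothetical, and the 50% threshold is an illustrative choice rather than one used in the study.

```python
from typing import Dict

# Counts transcribed from Table 1: confidence level (%) -> number of proteins
TABLE1: Dict[int, int] = {100: 31, 75: 30, 50: 23, 25: 57, 0: 72}


def summarize(counts: Dict[int, int], min_level: int = 50) -> Dict[str, float]:
    """Total proteins, and how many (and what share) reach `min_level` % or more."""
    total = sum(counts.values())
    at_or_above = sum(n for level, n in counts.items() if level >= min_level)
    return {"total": total,
            "at_or_above": at_or_above,
            "share_pct": round(100 * at_or_above / total, 1)}


print(summarize(TABLE1))
# {'total': 213, 'at_or_above': 84, 'share_pct': 39.4}
```

So 84 of the 213 proteins (about 39%) were supported by at least 2 of the 4 tools.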
The particular enzyme functions linked with all 213 hypothetical proteins are shown in Table 2 (available on request by e-mail to the corresponding author). The enzyme data highlight the domain information predicted from sequence homology with known protein families using the 4 web tools. The 213 hypothetical proteins with enzymatic conserved domains were used for protein structure prediction. The server predicted structures for only 89 proteins; the remaining 124 proteins were rejected because no suitable template was found for structure building. Structures for these 89 hypothetical proteins were built only when the aligned template belonged to the same enzyme family as that predicted by the 4 web tools (Table 2). The study filtered out 213 S. typhi hypothetical proteins with probable enzymatic function, as reported earlier for B. anthracis, S. flexneri and H. influenzae4-6. The methodology, based on the programs used in the study, proved useful in predicting enzyme function and modeling tertiary structure. These enzyme-domain-containing proteins could be taken up for cloning and expression to decipher their functions and, in turn, establish facts about these mysterious hypothetical proteins. The importance of bioinformatics in establishing sequence-specific functional relationships was once again realized. The web tools used, CDD-BLAST, InterProScan, Pfam and COGs, together with PS square, enabled us to explore the functionality of uncharacterized sequences; along with their predicted structures, these proteins could be further linked with the life cycle of S. typhi. The predicted functional hypothetical loci could be linked with the established metabolic network and may help in understanding these hypothetical regions in detail.
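The template-acceptance rule described above (a model is built only when the aligned template belongs to the enzyme family predicted by the 4 web tools) can be written as a simple partition of the protein list. A minimal Python sketch, assuming a predicted family and a best-template family are available per protein; it collapses the two rejection reasons (no usable template, or a template from a different family) into one branch. The function name and protein IDs are hypothetical; in the study this split gave 89 modeled and 124 rejected proteins.

```python
from typing import Dict, List, Optional, Tuple


def select_for_modeling(
    predicted_family: Dict[str, str],
    template_family: Dict[str, Optional[str]],
) -> Tuple[List[str], List[str]]:
    """Split proteins into (modeled, rejected) under the template-family rule.

    predicted_family: protein ID -> enzyme family assigned by the four web tools.
    template_family:  protein ID -> enzyme family of the best aligned template,
                      or None / missing when no usable template was found.
    """
    modeled: List[str] = []
    rejected: List[str] = []
    for protein, family in predicted_family.items():
        if template_family.get(protein) == family:
            modeled.append(protein)  # template agrees with the predicted family
        else:
            rejected.append(protein)  # no template, or a different family
    return modeled, rejected


# Hypothetical IDs and families for illustration
print(select_for_modeling({"p1": "family_A", "p2": "family_B", "p3": "family_C"},
                          {"p1": "family_A", "p2": "family_D"}))
# (['p1'], ['p2', 'p3'])
```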
Table 2—Conserved domain data for hypothetical protein and template for structure prediction
KEGG No.
CDD-BLAST
InterProScan
Pfam
COGs
%
Template
STY0033
DsbA_Com1_like
Protein-disulfide
isomerase
Arylsulfatase A
1eejA
Sulfatase super family
DSBA-like
thioredoxin
No
75
STY0099
50
1aukA
STY0165
PulE-GspE
GSPII_E
ATPases
25
1p9rA
STY0197
Polysacc_deac_1
DSBA
oxidoreductase
Alkalinephosphatase-like
Type II secretion
system protein E
Polysacc. deacetylase
Polysacc. deacetylase
100
2c1iA
STY0260
Glyoxalase
Glyoxalase/
dioxygenase
75
2p25A
STY0279
Exo_endo_phos
Uncharacterized BCR
75
2j63A
STY0283
AdoMet_MTases
Methyltransferases
100
3ccfB
STY0311
Glucosaminidase
Endonuclease/
exonuclease/
phosph.
Methyltransferase
type 11
Acetylglucosamidase
Glyoxalase/
Dioxygenase superfamily
Endonuclease/
Exonuclease/
phosph.
Methyltransferase
Xylanase/chitin
deacetylase
Lactoylglutathione
lyase
No related COG
75
no
STY0313
Glucosaminidase
Acetylglucosamidase
Sulfate permease
75
no
STY0317
No
No
No related COG
25
no
STY0338
Polysacc_deac_1
2c1iA
CpxP super family
25
no
STY0353
Xc-1258_like
UPF0012
Xylanase/chitin
deacetylase
Restriction
endonuclease S
Amidohydrolase
100
STY0352
Polysaccharide
deacetylase
Unintegrated
50
2e11A
STY0356
YafJ
1te5A
YkuD
Glyoxalase
25
75
1y7mA
2r6uD
STY0449
STY0478
PRK11295
No
50
25
2qgpB
no
STY0496
STY0499
4HBT
HAD_like
Thioesterase
Cof protein
100
75
1njkA
1rkqA
STY0511
EAL
100
2r6oA
STY0523
STY0541
Transposase
YbaK_deacylase
Diguanylate phosphodiesterase
Transposase
CHP00011
Glutamine
amidotransferase
Uncharacterized BCR
Lactoylglutathione
lyase
No related COG
Sf. II DNA and RNA
helicases
Thioesterase
Hydrolases of the
HAD
EAL domain
75
STY0357
STY0447
Glutamine
amidotransferase
YkuD
Glyoxalase//
dioxygenase
HNH endonuclease
No
No related COG
Uncharacterized ACR
75
50
no
2dxaA
STY0547
Membrane protease
100
3bk6A
STY0586
Band_7_stomatin_like
DUF457 super family
no
PaaI_thioesterase
75
no
STY0646
GlyDH-like1
Metal-dependent
hydrolases
Uncharacterized
protein
Glycerol
dehydrogenase
50
STY0643
100
1ta9B
STY0648
ParBc
ParB-like nuclease
Transposase
YbaK/prolyl-tRNA
synthetases
SPFH domain/Band
7 family
Metal-dependent
hydrolase (DUF457)
Thioesterase superfamily
Iron-containing
alcohol dehydrogenase
ParB-like nuclease
75
1vz0B
STY0649
PAPS_reductase
PAPS reductase
PAPS reductase
Band 7 protein
DUF457
P.acid degradationrelated protein
Alcohol
dehydrogenase
Acetylglucosaminidase
Acetylglucosaminidase
L,D-transpeptidase
catalytic domain
Polysaccharide
deacetylase
RNA polymerase
Rpb5
Carbon-nitrogen
hydrolase
Glutamine
amidotransferases
L,D-transpeptidase
Glyoxalase//
Dioxygenase
HNH endonuclease
No
Thioesterase
Haloacid dehalogenase-like hydrolase
EAL domain
Transcriptional
regulators
PAPS reductase
100
2oq2C
STY0658
Nitrate_red_del
TorD-like chaperone
1n1cA
SPOUT
methyltransferase
Alpha/beta hydrolase
75
1ns5B
STY0734
SPOUT_MTase super
family
Esterase_lipase
Anaerobic
dehydrogenases
Uncharacterized ACR
50
STY0692
Nitrate reductase
delta subunit
SPOUT
methyltransferase
Alpha/beta hydrolase
75
3bf7A
STY0752
AHS1 super family
100
2phcB
STY0753
AHS2 super family
100
no
STY0767
Glyco_tranf_GTA
Hydrolases or
acyltransferases
Allophanate hydrolase subunit 1
Allophanate
hydrolase subunit 2
Glycosyltransferases
100
2ffuA
STY0772
Gneg_AbrB_dup
super family
YvcK_like
Ammonia
monooxygenase
Uncharacterized ACR
100
no
25
2ppvA
NO related COG
75
2o3hA
100
3b5qB
75
75
1zatA
2hf2B
75
2hf2B
100
2hunA
25
2o5vA
75
1yacA
50
50
2bibA
no
STY0848
Exo_endo_phos super
family
Allophanate
hydrolase subunit 1
Allophanate
hydrolase subunit 2
Glycosyl transferase,
family 2
ammonia
monooxygenase
2-Phospho-L-lactate
transferase
Exonuclease/phosphatase
STY0875
Sulfatase
Sulfatase
STY0878
STY0881
YkuD
HAD_like
YkuD
Cof protein
STY0900
HAD_like
Cof protein
STY0929
NADB_Rossmann
STY0935
TOPRIM_OLD
NAD-dep. epimerase/dehydratase
DUF2813
L,D-transpeptidase
Haloacid dehalogenase-like hydrolase
Haloacid dehalogenase-like hydrolase
NAD dep. epimerase/dehydratase
(DUF2813)
STY0948
YcaC_related
Isochorismatase
Isochorismatase
STY0984
STY0991
Beta-lactamase
Aminoglycoside
phosphotransferase
Twin-arginine translocation pathway
Beta-lactamase
Competence protein
Phosphotransferase
DUF882
Uncharacterized BCR
25
1lbuA
Metallo-betalactamase
Lon protease (S16)
2gcuA
100
1z0wA
STY1103
AdoMet_Mtases
100
1wxxA
STY1129
TLP_HIUase
100
2gpzA
STY1143
Lactamase_B super
family
Zn-dep. hydrolases,
glyoxylases
ATP-dependent
protease
SAM-dependent methyltransferases
Transthyretin-like
protein
Beta-lactamase superfamily III
75
STY1089
Lactamase_B
PKc_like super family
Peptidase_M15_3
super family
Lactamase_B super
family
Lon_C super family
Metal-dependent
hydrolase
Uncharacterized BCR
Hydrolases of the
HAD
Hydrolases of the
HAD
Nucleoside-pp-sugar
epimerases
ATP-dependent endonuclease
Amidases related to
nicotinamidase
Metal-binding protein
No related COG
75
1y44A
STY1174
STY1185
Nitrate_red_del super
family
PLDc
STY1193
RHOD_YceA
STY0835
STY0998
STY0999
Peptidase S16, Lon
protease
P. synthase/ar. transglycosylase
Transthyretin/hydrox
yisourate hydrolase
Unintegrated
TorD-like chaperone
Phospholipase
D/Transphosphatidylase
Rhodanese-like
Allophanate
hydrolase subunit 1
Allophanate
hydrolase subunit 2
Glycosyl transferase
family 2
Ammonia
monooxygenase
UPF0052
Exonuclease/phosphatase
family
Sulfatase
SAM dependent methyltransferase
HIUase/Transthyretin
family
Metallo-betalactamase superfamily
Nitrate reductase
delta subunit
Phospholipase D
Active site motif
Rhodanese-like
domain
Anaerobic
dehydrogenases
Cardiolipin synthases
50
1s9uA
100
2ze9A
Sulfurtransferases
100
2eg4A
STY1296
Pat_NTE
75
1oxwC
no
Catalase, manganese
100
2v8tA
STY1321
Ferritin_like
Manganese containing catalase
Domain of unknown
function (DUF892)
Fe-S-cluster oxidoreductase
Mn-containing
catalase
No related COG
100
STY1320
UPF0153 super
family
Mn_catalase
Patatin-like
phospholipase
(UPF0153)
Alpha-beta hydrolase
STY1318
Lysophospholipase
patatin
UPF0153
50
2gs4A
STY1329
Metal-dependent
phosphoesterases 100
EAL domain
100
1m65A
STY1354
POLIIIAc super
family
EAL super family
100
2r6oA
STY1360
STY1426
No
GFA super family
ATPase
No related COG
25
75
no
1x6mA
STY1433
STY1598
AdoMet_MTases
super family
M20_dimer super
family
PRK10281 super
family
No
STY1604
rve super family
STY1609
LT_GEWL
STY1615
COG4373 super
family
YtcJ_like
STY1452
STY1484
STY1650
STY1757
A4_betagalactosidase
PaaI_thioesterase
STY1766
EAL super family
STY1787
APH_ChoK_like
STY1790
NO
STY1818
STY1831
Arsenite_oxidase
P-loop NTPase super
family
YeaK
STY1741
STY1835
STY1846
STY1869
STY1889
STY1925
STY1942
Transgly_assoc super
family
Sialidase super family
DUF847
Transgly_assoc super
family
FAA_hydrolase super
family
Ferritin/
ribonucleotide
reductase
Polymerase/histidinol
phosphatase
Diguanylate phosphodiesterase
No
Formaldehydeactivating, GFA
SAM dependent
methyltransferase
Peptidase M42
PHP domain
EAL domain
No
Formaldehydeactivating enzyme
Methyltransferase
domain
M42 glutamyl
aminopeptidase
Phenazine biosynthesis-like protein
Nucleotide pyrophosphohydrolase
Integrase core domain
SAM-dependent
methyltransferases
Cellulase M and
related proteins
Epimerase,
PhzC/PhzF homolog
NO related COG
100
2avnA
75
1y0yA
75
1qyaB
25
no
Predicted transposase
75
1bcoA
Transglycosylase
SLT domain
Terminase-like family
Soluble lytic murein
transglycosylase
No related COG
75
1qsaA
50
2o0jA
Metal-dependent
hydrolase
DUF1355
Amidohydrolase
family
DUF1355
Metal-dependent
hydrolase
No related COG
100
2g3fA
25
2gk3E
Phenylacetic acid
degradation
Diguanylate
phosphodiesterase
Protein kinase-like
domain
DUF457, transmembrane
Nitroreductase-like
PrkA serine kinase
Thioesterase superfamily
EAL domain
Uncharacterized
protein
EAL domain
75
no
100
2r6oA
Fructosamine kinase
Fructosamine-3kinase
Metal-dependent
hydrolases
Nitroreductase
Putative Ser protein
kinase
Uncharacterized ACR
75
no
75
no
75
75
3bm1A
1g8pA
75
1vjfA
No related COG
75
no
No related COG
No related COG
50
75
1so7A
2ikbA
Predicted membrane
proteins
2-Keto-4-pentenoate
hydratase
75
1ciiA
100
1nr9A
Phenazine
biosynthesis
No
Integrase, catalytic
core
Lytic transglycosylase-like, catalytic
Terminase-like
Aminoacyl-tRNA
synthetase
Transglycosylaseassociated protein
Neuraminidase
DUF847
Transglycosylaseassociated protein
Fumarylacetoacetase
Metal-dependent
hydrolase (DUF457)
Nitroreductase family
PrkA AAA domain
YbaK/prolyl-tRNA
synthetases
Transglycosylase
associated protein
No
Predicted lysozyme
(DUF847)
Transglycosylase
associated protein
Fumarylacetoacetate
(FAA) hydrolase
STY1950
2gelA
75
1j7hA
75
1nqzA
EAL
EAL domain
Metal-dependent
proteases
Translation initiation
inhibitor
NTP pyrophosphohydrolases
EAL domain
75
STY1957
100
2r6oA
STY1958
TerC family
Hemolysins
100
2o3gA
STY2005
CBS_pair_CorC_HlyC
GGDEF
GGDEF domain
GGDEF domain
100
3breB
STY2083
Nitrilase super family
Carbon-nitrogen
hydrolase
Predicted
amidohydrolase
100
1f89A
STY2098
Opacity-associated
protein A
Isochorismatase
2gu1A
75
1j2rA
STY2113
AdoMet_Mtases
Methyltransferase
100
1im8A
STY2114
AdoMet_MTases
100
3ccfB
STY2120
Glyco_hydro_88
super family
GATase1_DJ-1
tRNA (cmo5U34)methyltransferase
tRNA (mo5U34)methyltransferase
Six-hairpin
glycosidase-like
ThiJ/PfpI
Metalloendopeptidases
Amidases related to
nicotinamidase
SAM-dependent
methyltransferases
SAM-dependent methyltransferases
No related COG
75
STY2110
Peptidase_M23 super
family
Cysteine_hydrolases
Peptidase M22,
glycoprotease
Endoribonuclease LPSP
NUDIX hydrolase,
NudL, conserved site
Diguanylate
phosphodiesterase,
predicted
Cystathionine betasynthase, core
Diguanylate cyclase,
predicted
Nitrilase/cyanide
hydratase and Apolipoprotein
N-acyltransferase
Peptidoglycanbinding Lysin group
Isochorismatase-like
Glycoprotease family
STY1955
COG1214 super
family
YjgF_YER057c_UK114
CoAse
50
2ahfA
100
2ab0A
NLP/P60
NlpC/P60 family
75
2evrA
Dextransucrase
DSRB
Diguanylate cyclase,
predicted
Metal-dependent
phosphohydrolase
Unintegrated
Dextransucrase
DSRB
GGDEF domain
Intracellular
protease/amidase
Cell wall-associated
hydrolases
No related COG
75
no
GGDEF domain
100
1w25A
Predicted HD superfamily hydrolase
No related COG
100
3b57A
25
no
Uncharacterized BCR
75
1y7mA
Uncharacterized ACR
75
3ci3A
No related COG
75
2hfsA
Diphosphate-sugar
epimerases
Hemolysins
75
1r6dA
75
2plsA
GGDEF domain
100
no
No related COG
Phospholipid
phosphatase
EAL domain
25
100
no
1up8A
100
2r6oA
STY1952
STY2140
STY2191
NLPC_P60 super
family
DSRB super family
STY2194
GGDEF
STY2201
HDc super family
STY2202
Peptidase_S10 super
family
YkuD super family
STY2149
STY2218
YkuD domain
STY2263
Cob_adeno_trans
super family
PduX
Adenosylcobalamin
biosynthesis
GHMP kinase
STY2279
WcaG
STY2332
CBS_pair_CorC_HlyC_assoc
GGDEF
Epimerase/dehydratase
Cystathionine
beta-synthase
PAS, Diguanylate
cyclase
Unintegrated
Phosphatidic acid
phosphatase
Diguanylate phosphodiesterase
STY2255
STY2336
STY2350
STY2449
STY2451
No
PAP2_like super
family
EAL
Endoribonuclease
L-PSP
NUDIX domain
Protein of unknown
function (DUF1698)
Glycosyl Hydrolase
Family 88
DJ-1/PfpI family
HD domain
Protein of unknown
function (DUF1469)
L,D-transpeptidase
catalytic domain
Cobalamin adenosyltransferase
GHMP kinases N
terminal domain
GDP-mannose dehydrogenase
Integral membrane
protein TerC
MASE1
Peptidase S24-like
PAP2 superfamily
EAL domain
STY2471
PagL super family
75
2ervA
NTP pyrophosphohydrolases
No related COG
100
1ppvB
75
2iw0A
STY2543
Nudix_Hydrolase
super family
Polysacc_deac_1
super family
NAT_SF super family
Lipid A 3-Odeacylase (PagL)
NUDIX domain
No related COG
STY2525
1xebA
HDc super family
100
2parB
STY2576
Nudix_Hydrolase_38
100
2fkbC
STY2580
100
2c29F
50
2odoA
STY2608
DUF1731 super family
PLPDE_III_AR_like_1
Abi super family
Predicted acyltransferases
Predicted hydrolases
of HD
NTP pyrophosphohydrolases
Epimerases
(SulA family)
No related COG
75
STY2562
membrane protease
100
no
STY2651
EAL
EAL domain
100
2r6oA
STY2671
No
No
No related COG
25
no
STY2676
PRK10318 super
family
Dyp_perox super
family
ADPRase_NUDT5
Lipid A 3-Odeacylase-related
NUDIX hydrolase
domain
Polysaccharide deacetylase
GCN5-related Nacetyltransferase
Metal-dependent
phosphohydrolase
NUDIX hydrolase
domain
NAD-dep. epimerase/dehydratase
Alanine racemase,
N-terminal
Abortive infection
protein
Diguanylate cyclase,
predicted
Aldehyde dehydrogenase
Unintegrated
Putative papain-like
cysteine peptidase
Dyp-type peroxidase
family
NUDIX domain
No related COG
25
no
Predicted iron-dep.
peroxidase
NTP pyrophosphohydrolases
Beta-lactamase class
C
Arsenate reductase
100
2iizA
100
1viuB
100
2ffyA
75
1rw1A
n-Acetyltransferase
100
2ae6A
Metalloprotease
100
3c37A
Glycerate kinase
100
1to6A
Zn-dependent protease
EAL domain
100
3c37A
100
2r6oA
Hydrolases
100
2hdwA
Phosphoserine phosphatase
SAM-dep. Methyltransferases
Acyl-CoA synthetase
(NDP )
Uncharacterized ACR
50
1l7mB
100
2b3tA
75
no
75
1rw0A
75
2ghsA
STY2530
STY2588
STY2683
Dyp-type peroxidase
STY2720
Beta-lactamase super
family
ArsC_Yffb
STY2723
DUF699 super family
STY2724
Zn_peptidase
NUDIX hydrolase
domain
Beta-lactamaserelated
Conserved hypothetical protein
GCN5-related Nacetyltransferase
Zinc metallopeptidase
STY2730
Gly_kinase
Glycerate kinase
STY2735
Peptidase_M48
Peptidase M48
STY2744
EAL
STY2793
STY2835
Esterase_lipase super
family
HAD_like super family
AdoMet_Mtases
STY2844
NAT_SF super family
STY2850
Cu-oxidase_4 super
family
Diguanylate phosphodiesterase
Peptidase S9, prolyl
oligopeptidase
HAD-s.f. hydrolase,
subfamily IF, YfhB
DNA methylase, N-6
adenine-specific
GCN5-related Nacetyltransferase
Polyphenol
oxidoreductase
STY2855
SGL
STY2714
STY2716
STY2815
Senescence marker
prt-30 (SMP-30)
Polysaccharide deacetylase
Acetyltransferase
(GNAT) family
HD domain
NUDIX domain
NAD dep. epimerase/dehydratase
Alanine racemase,
N-terminal domain
CAAX amino terminal protease
MASE1
Beta-lactamase
ArsC family
Domain of unknown
function (DUF1726)
Neutral zinc
metallopeptidase
Glycerate kinase
family
Peptidase family M48
MASE1
Prolyl oligopeptidase
family
No
Methyltransferase
small domain
Acetyltransferase
(GNAT) family
Multi-copper
polyphenol
oxidoreductase
SMP30/Gluconolaconase/
LRE-like region
Gluconolactonase
STY2859
GGDEF
GGDEF domain
STY2866
CBS_pair_CorC_HlyC_assoc
Polyketide_cyc2 super family
No
Diguanylate cyclase,
predicted
Cystathionine betasynthase
Basic-leucine zipper
(bZIP)
Unintegrated
STY2873
STY2880
Putative esterase
STY2907
Esterase_lipase super
family
No
STY2918
RHOD super family
Rhodanese-like
STY2928
CMD super family
STY3027
No
STY3039
STY3040
NADB_Rossmann
super family
PRK09989
Carboxy
decarboxylase
Acyl-CoA
N-acyltransferase
NAD-dep. epimerase/dehydratase
Xylose isomerase
STY3047
STY3071
UbiD super family
HDc super family
STY3078
Lactamase_B super
family
STY3080
Radical_SAM super
family
No
Radical SAM
Gly_kinase super
family
DUF3412 super
family
NADB_Rossmann
super fam.
Peptidase_M48 super
family
PLPDE_III_Yggs_like
Polysacc_deac_1
super family
HIT_like super family
Glycerate kinase
STY2893
STY3092
STY3097
STY3108
STY3212
STY3237
STY3253
STY3302
STY3341
STY3345
B12-binding_like
super family
No
STY3358
ABM super family
STY3366
GSP_synth super
family
STY3342
Unintegrated
Carboxylyase-related
Metal-dependent
phosphohydrolase
Beta-lactamase-like
No
Hypothetical protein
CHP00730
Monooxygenase,
FAD-binding
Peptidase M48
Alanine racemase, Nterminal
Polysaccharide deacetylase
Histidine triad (HIT)
protein
Elongator protein
3/MiaB/NifB
No
Carbamoyl phosphate
synthetase
Glutathionylspermidine synthase
GGDEF domain
100
3breB
DUF21
CBS domains
75
2o1rA
Polyketide cyclase /
dehydrase
Ubiquitinolcytochrome C reductase
Putative esterase
Oligoketide cyclase
75
1t17A
No related COG
25
no
Hydrolase of the
alpha/beta
No related COG
100
2gzsA
25
no
Rhodanese-rel.
sulfurtransferases
Uncharacterized ACR
100
1yt8A
75
2gmyA
Histone
acetyltransferase
Nucleoside-pp-sugar
epimerases
Hydroxypyruvate
isomerase
Carboxylase
Predicted helicases
75
2i79A
75
2hrzA
75
1k77A
75
75
2idbA
1gm5A
Beta-lactamase superfamily II
100
2p4zB
Organic radical
activating enzymes
No related COG
100
2z2uA
25
no
Glycerate kinase
100
1to6A
Nucleotide-binding
protein
FADdep.oxidoreductases
Zn-dependent
protease
Predicted enzyme
25
2pmbA
100
2qa2A
100
3c37A
50
1w8gA
Predicted chitin
deacetylase
HIT family
hydrolases
Fe-S oxidoreductases
family 2
No related COG
100
1z7aA
100
1y23B
50
2qgqA
25
no
Uncharacterized ACR
25
1tuvA
Glutathionylspermidine synthase
100
2vobA
Transmembrane
exosortase
Rhodanese-like
domain
Carboxydecarboxylase family
Acetyltransferase
(GNAT) family
NAD dep. epimerase/dehydratase
Xylose isomerase-like
TIM barrel
carboxy-lyase
DEAD/DEAH box
helicase
Metallo-betalactamase superfamily
Radical SAM superfamily
Glucodextranase,
domain B
Glycerate kinase family
Possible lysine decarboxylase
FAD binding domain
Peptidase family M48
Alanine racemase,
N-terminal domain
Polysaccharide deacetylase
HIT domain
Radical SAM Nterminal
4-Alpha-Lfucosyltransferase
Antibiotic biosynthesis monooxygenase
Glutathionylspermidine synthase
STY3367
Extradiol ringcleavage dioxygenase
Adenylate cyclase
aromatic ring-opening
dioxygenase
CYTH domain
Uncharacterized ACR
75
2pw6A
Uncharacterized ACR
75
3bhdA
STY3400
45_DOPA_Dioxygenase
CYTHlike_Pase_CHAD
AdoMet_Mtases
DNA methylase,
2pjdA
DUF45 super family
DUF45
25
no
STY3413
GST_C_ECM4_like
1eemA
75
2hp0A
STY3446
SDH_alpha super
family
TP_methylase
Glutathione
S-transferase
Serine dehydratase
100
STY3418
Methyltransferases
100
1pjqB
STY3448
UPF0102
Glutathione
S-transferase
Serine dehydrataselike
Tetrapyrrole methylase
UPF0102
16S RNA G1207
methylase RsmC
Metal-dependent
hydrolase
Glutathione
S-transferase
Uncharacterized ACR
100
STY3401
Methyltransferase
small domain
DUF45
25
no
STY3451
NADB
2a35A
GATase1_PfpI_like
Semialdehyde dehydrogenase
DJ-1/PfpI family
50
STY3452
Semialdehyde dehydrogenase
ThiJ/PfpI
100
1oi4A
STY3458
Peptidase U32
Peptidase family U32
100
1i4nA
Luciferase-like
Luciferase-like
monooxygenase
75
1lucA
1x6vB
100
1oltA
TIM-barrel dehydrogenases
GGDEF domain
100
1vhnA
GGDEF
100
1w25A
STY3603
PaaI_thioesterase
2cy9B
COG5283
50
no
STY3755
Sulfatase
Sulfatase
100
3b5qB
STY3765
MPP_UshA_N_like
75
2z1aA
STY3803
ADP_ribosyl_GH
super family
ADPribosylglycohydrolase
100
1t5jA
STY3805
No related COG
25
no
Rhamnose mutarotase
Aldose 1-epimerase
Heptaprenyl diphosphate synthase
(DUF718)
Aldose 1-epimerase
Uncharacterized ACR
Galactose mutarotase
75
100
1x8dA
1snzA
STY3866
PRK09669 super
family
DUF718 super family
Aldose_epim super
family
Sulfatase super family
Metallophosphoesterase
ADPribosylation/Crystallin J1
Unintegrated
Uncharacterized protein
Chr. segregation ATPases
Metal-dependent
hydrolase
Phosphodiesterase
75
STY3700
Phenylacetic acid
degradation
Phage tail tape
P-loop ATPase
protein family
Radical SAM superfamily
Dihydrouridine synthase (Dus)
Bacterial signalling
prt.
Thioesterase superfamily
Phage-related minor
tail protein
Sulfatase
100
STY3568
ATPase, P-loopcontaining
Hypothetical protein
CHP01212
tRNA-dihydrouridine
synthase
Diguanylate cyclase
Predicted P-loopcontaining kinase
Fe-S oxidoreductases
STY3564
Peptidase_U32 super
family
Flavin_utilizing_monoxygenases
ATP_bind_2 super
family
Radical_SAM super
family
DUS_like_FMN
Endonuclease /
resolvase
Nucleoside-PP-sugar
epimerases
Intracellular protease/amidase
Collagenase and
related proteases
Reductase & flavindep.oxidoreductases
Sulfatase
Sulfatase
100
1aukA
STY3870
HAD_like
Cof protein
75
1nf2A
STY3933
FMN_red super family
NADPH-dependent
FMN reductase
Haloacid dehalogenase-like hydrolase
NADPH-dependent
FMN reductase
Metal-dependent
hydrolase
Hydrolases of the
HAD
Predicted flavoprotein
100
1rttA
STY3381
STY3459
STY3502
STY3508
STY3831
STY3858
Tetrapyrrole
Methylases
UPF0102
Calcineurin-like
phosphoesterase
ADPribosylglycohydrolase
STY3980
No
50
2osxA
PTS_IIA_fru
Phosphotransferase
75
1xizB
STY4020
Transposase_31 super
family
YcaC_related
Cellulase (glycosyl
hydrolase family 5)
Sugar phosphotransferase system
Putative transposase
No related COG
STY3998
Glycoside hydrolase,
family 5
Phosphotransferase
system
Transposase (putative)
Isochorismatase-like
No related COG
75
no
Isochorismatase family
Polysaccharide pyruvyl transferase
O-Antigen ligase
Amidases related to
nicotinamidase
No related COG
100
1yacA
75
1vgvA
No related COG
75
no
Uncharacterized BCR
75
2nlyA
metalloendopeptidases
No related COG
100
2gu1A
25
2idoD
Uncharacterized BCR
75
1fp3A
Uncharacterized FlgJrelated protein
75
no
Acetyltransferases
75
2j8mA
WD40-like repeat
family
Predicted multitransmembrane pro.
No related COG
No related COG
75
1kv9A
75
no
25
25
2oo3A
no
Histone acetyltransferase HPA2
No related COG
100
2pdoD
25
2gk3E
No related COG
75
no
STY4025
STY4075
STY4082
STY4089
STY4090
STY4108
PS_pyruv_trans super
family
Wzy_C super family
Polysacc_deac_2
super family
Peptidase_M23 super
family
No
Polysaccharide pyruvyl transferase
O-antigen ligaserelated
DUF610, YibQ
Peptidase M23B
DUF1680 super
family
Glucosaminidase
super family
Six-hairpin glycosidase
Beta-Nacetylglucosamidase
STY4159
NAT_SF super family
STY4165
Arylsulfotrans
GCN5-related
N-acetyltransferase
Arylsulfotransferase
STY4195
Ribonuclease_BN
super family
DUF519 super family
No
Ribonuclease BNrelated
DNA methylase
No
DUF3749 super
family
A4_betagalactosidase
Transposase_31 super
family
GCN5-related
N-acetyltransferase
DUF1355
DNA polymerase III,
theta subunit
Glycosyl hydrolase
(DUF1680)
Beta-Nacetylglucosaminidase
Acetyltransferase
(GNAT) family
Arylsulfotransferase
(ASST)
Ribonuclease BN-like
family
(DUF519)
TnsA endonuclease
N terminal
Acetyltransferase,
GNAT family
(DUF1355)
Transposase
(putative), YhgA-like
Putative transposase,
YhgA-like
STY4117
STY4135
STY4206
STY4216
STY4247
STY4263
STY4288
No
Divergent polysaccharide deacetylase
Peptidase family M23

References
1 Reeves M W, Evins G M, Heiba A A, Plikaytis B D & Farmer J J, Clonal nature of Salmonella typhi and its genetic relatedness to other salmonellae as shown by multilocus enzyme electrophoresis, and proposal of S. bongori comb. nov., J Clin Microbiol, 27 (1989) 313-320.
2 Parry C, Wain J, Chinh N T, Vinh H & Farrar J J, Quinolone-resistant Salmonella typhi in Vietnam, Lancet, 351 (1998) 1289.
3 Parkhill J, Dougan G, James K D, Thomson N R, Pickard D et al, Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18, Nature (Lond), 413 (2001) 848-852.
4 Gore D, In silico prediction of structure and enzymatic activity for hypothetical proteins of Shigella flexneri, Biofrontiers, 1 (2009) 1-10.
5 Gore D & Raut A, Computational function and structural annotations for hypothetical proteins of Bacillus anthracis, Biofrontiers, 1 (2009) 27-36.
6 Dogra P & Gore D, Prediction of enzymatic function and structure of Haemophilus influenzae hypothetical proteins—An in silico approach, Int J Soft Comput Bioinform, 1 (2010) 67-77.
7 Piatek A S, Telenti A, Murry M R, El-Hajj H, Jacobs Jr W R et al, Genotypic analysis of Mycobacterium tuberculosis in two distinct populations using molecular beacons: Implications for rapid susceptibility testing, Antimicrob Agents Chemother, 44 (2000) 103-110.
8 Anandakumar S & Shanmughavel P, Computational annotation for hypothetical proteins of Mycobacterium tuberculosis, J Comput Sci Syst Biol, 1 (2008) 50-62.
9 Altschul S F, Madden T L, Schäffer A A, Zhang J, Zhang Z et al, Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res, 25 (1997) 3389-3402.
10 Schäffer A A, Aravind L, Madden T L, Shavirin S, Spouge J L et al, Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements, Nucleic Acids Res, 29 (2001) 2994-3005.
11 Notredame C, Higgins D G & Heringa J, T-Coffee: A novel method for fast and accurate multiple sequence alignments, J Mol Biol, 302 (2000) 205-217.
12 Marti-Renom M A, Stuart A, Fiser A, Sanchez R, Melo F et al, Comparative protein structure modeling of genes and genomes, Annu Rev Biophys Biomol Struct, 29 (2000) 291-325.
13 Fiser A, Do R K & Sali A, Modeling of loops in protein structures, Protein Sci, 9 (2000) 1753-1773.
14 Sali A & Blundell T L, Comparative protein modeling by satisfaction of spatial restraints, J Mol Biol, 234 (1993) 779-815.
15 Chen C C, Hwang J K & Yang J-M, (PS)2: Protein structure prediction server, Nucleic Acids Res, 34 (2006) W152-W157.
16 Marchler-Bauer A, Anderson J B, Derbyshire M K, DeWeese-Scott C, Gonzales N R et al, CDD: A conserved domain database for interactive domain family analysis, Nucleic Acids Res, 35 (Database issue) (2007) D237-D240.
17 Zdobnov E M & Apweiler R, InterProScan—An integration platform for the signature-recognition methods in InterPro, Bioinformatics, 17 (2001) 847-848.
18 Baker W, van den Broek A, Camon E, Hingamp P, Sterk P et al, The EMBL nucleotide sequence database, Nucleic Acids Res, 28 (2000) 19-23.
19 Tatusov R L, Galperin M Y, Natale D A & Koonin E V, The COG database: A tool for genome-scale analysis of protein functions and evolution, Nucleic Acids Res, 28 (2000) 33-36.