Indian Journal of Biotechnology
Vol 11, April 2012, pp 224-234

Homology modeling and enzyme function prediction in uncharacterized proteins of Salmonella typhi: An in silico approach

D G Gore¹*, M K Rathod², V Soni³ and M M Rai²

¹Sai Bioinfosys-Bioinformatics Research Centre, Raghuji Nagar, Nagpur 440 023, India
²Centre for Sericulture and Biological Pest Management Research, RTM Nagpur University, Nagpur 440 001, India
³St. Wilferd College, Jaipur 320 020, India

Received 13 September 2010; revised 11 August 2011; accepted 15 October 2011

*Author for correspondence: Tel: +91-712-2703977; E-mail: [email protected]

Salmonella typhi, a known human pathogen showing multiple drug resistance, causes the majority of endemic cases in developing nations. The S. typhi genome is annotated with 1220 ORFs coding for hypothetical proteins. These hypothetical proteins were screened for probable enzymatic function using the web tools CDD-BLAST, InterProScan, Pfam and COGs. The study identified 213 proteins as probable enzymes, and tertiary structures were predicted for them by homology modeling. Structures were modeled for 89 of these proteins. The resulting structure-function relationships could help in understanding the regulatory network of S. typhi in detail and in establishing new functions in uncharacterized regions.

Keywords: CDD-BLAST, COGs, function prediction, homology modeling, hypothetical proteins, InterProScan, Pfam, Salmonella typhi

Humans are the only natural host of Salmonella typhi, which shows limited pathogenicity to other mammals. Studies based on isoenzymes have shown that isolates of S. typhi around the world are highly related¹. Resistance to common antibiotics such as the fluoroquinolones, the most effective drugs for typhoid fever, has been reported in S. typhi². Moreover, multiple drug resistance (MDR) has recently been reported in S. typhi, and S. typhi CT18 is considered an example of an emerging MDR microorganism³. Since the genome of S. typhi has been sequenced, a better understanding of its pathogenicity and resistance is now expected.

The S. typhi genome comprises 4809037 bases in the main circular chromosome, along with 218160 and 106516 bases in the plasmids pHCM1 and pHCM2, respectively³. About 1220 ORFs on the circular chromosome code for hypothetical proteins. These regions account for about 25% of the coding capacity of S. typhi and have not yet been analyzed. The probable function of these hypothetical proteins can be predicted by comparative functional genomics against biological databases, using bioinformatics web tools that can screen the input for conserved domains. Based on the homology information of a conserved domain, hypothetical proteins can be classified into a particular family⁴⁻⁶. This allows filtering of those hypothetical proteins whose roles in the life cycle could be ascertained on a priority basis by cloning and expression studies⁷.

The protein sequences of S. typhi were obtained from the KEGG website (www.genome.jp/kegg/). The S. typhi hypothetical proteins were screened for the presence of conserved domain(s) using the following 4 web tools:

Conserved Domain BLAST: The CDD database of 27036 PSSMs was searched for conserved regions with the E-value parameter set to 0.01, a value chosen to retrieve only very close family members, and with the 'low complexity filter' switched ON, which removes from the analysis those sequences that show no evolutionary relationship.

InterProScan: The InterProScan functionality search used the databases BlastProDom, FPrintScan, HMMPIR, HMMPfam, HMMSmart, HMMTigr, ProfileScan, ScanRegExp, PatternScan, SuperFamily, SignalPHMM, TMHMM, HMMPanther and Gene3D.
Pfam: Both global and local (merged) search strategies were used, with the Automatic Domain Decomposition Algorithm (ADDA) and the E-value set to 1.0, since with this value the results were similar to those of the other programs used.

COGs: The "clades value" parameter was set as BeTs to 3 clades. This parameter changes the stringency of the search by requiring that any COG to which the query protein is assigned be composed of at least the indicated number of clades. The value was set to 3, the number used to define the minimal COG.

The results of the protein functionality analysis were reported as a confidence level, in per cent, for the function assigned to each hypothetical protein. The confidence level was set to 100, 75, 50, 25 or 0% according to the following rules:

1 If all 4 tools indicate the same enzymatic domain with a similar function, irrespective of the score from each tool, the confidence level is 100%.
2 If 3 tools indicate the same enzymatic domain with a similar function, irrespective of the score from each tool, the confidence level is 75%.
3 If 2 tools indicate the same enzymatic domain with a similar function, irrespective of the score from each tool, the confidence level is 50%.
4 If at least 1 tool indicates an enzymatic domain with a similar function, irrespective of the score from each tool, while the others differ, the confidence level is 25%.
5 If no tool indicates any enzymatic domain, the confidence level is 0%⁴⁻⁶,⁸.

The tertiary structures of the S. typhi hypothetical proteins were modeled using PS square, (PS)², an automated homology modeling server. The method uses an effective consensus strategy, combining PSI-BLAST⁹,¹⁰, IMPALA¹⁰ and T-Coffee¹¹ in both template selection and target-template alignment.
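The five confidence rules listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function name `confidence_level`, the dictionary input, and the use of case-insensitive label equality to stand in for "same enzymatic domain with a similar function" are all assumptions made for the example. In the study that judgment was made from the tool outputs themselves.

```python
from collections import Counter

# The four web tools used for domain screening.
TOOLS = ("CDD-BLAST", "InterProScan", "Pfam", "COGs")


def confidence_level(calls):
    """Return the confidence level (in per cent) for one hypothetical protein.

    `calls` maps a tool name to the enzyme family that tool reported, or to
    None when the tool found no enzymatic domain. Matching families by
    normalized label is an assumption of this sketch.
    """
    # Keep only the tools that reported an enzymatic domain.
    families = [calls.get(tool) for tool in TOOLS]
    families = [f.strip().lower() for f in families if f]
    if not families:
        return 0  # rule 5: no tool indicates an enzymatic domain
    # Size of the largest group of tools that agree on one family.
    agreeing = Counter(families).most_common(1)[0][1]
    # Rules 1 to 4: 4 tools -> 100, 3 -> 75, 2 -> 50, 1 -> 25.
    return 25 * agreeing


example = {
    "CDD-BLAST": "Sulfatase",
    "InterProScan": "sulfatase",
    "Pfam": "Sulfatase",
    "COGs": None,
}
print(confidence_level(example))
```

Scores from the individual tools are deliberately ignored here, since the rules assign the level from agreement alone.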
The final 3-dimensional structure was built using the modeling package MODELLER, which is available along with the server¹²⁻¹⁴. The server can be reached at http://www.ps2.life.nctu.edu.tw/ (ref. 15). The predicted structures obtained from (PS)² were saved in .PDB format (available by e-mail on request to the corresponding author).

Of the 1220 hypothetical proteins analyzed for enzymatic function, the study identified 213 with probable enzyme activity, based on conserved domains found in their primary sequences when aligned with known enzyme families using the 4 web tools. The 4 tools gave variable results and, according to the conditions set for the confidence limits, these 213 proteins were categorized by % confidence (Table 1).

Table 1: Percentage classification of S. typhi enzymatic hypothetical proteins

Percentage of similarity   100%   75%   50%   25%   0%
No. of proteins            31     30    23    57    72

Specific enzyme functions were assigned to all 213 hypothetical proteins, as shown in Table 2 (available by e-mail on request to the corresponding author). The enzyme data highlight the domain information, which was predicted from sequence homology with known protein families using the 4 web tools.

The 213 hypothetical proteins with enzymatic conserved domains were used for protein structure prediction. The server predicted structures for only 89 proteins; the remaining 124 proteins were rejected because no suitable template was found for structure building. A structure was accepted for these 89 hypothetical proteins only when the aligned template belonged to the same enzyme family as that predicted by the 4 web tools (Table 2).

The study thus filtered out 213 S. typhi hypothetical proteins with probable enzymatic function, as reported earlier for B. anthracis, S. flexneri and H. influenzae⁴⁻⁶.
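As a quick consistency check on the reported counts, the following Python sketch recomputes the totals and shares from the numbers quoted in the text and in Table 1. The variable names are hypothetical, and the derived percentages are simple arithmetic on the reported figures rather than values stated by the authors.

```python
# Counts as reported in Table 1 (confidence level in % -> number of proteins).
table1 = {100: 31, 75: 30, 50: 23, 25: 57, 0: 72}

screened = 1220   # hypothetical proteins analyzed
enzymatic = 213   # proteins assigned a probable enzyme function
modeled = 89      # proteins for which the server returned a structure
rejected = 124    # proteins without a suitable template

# The five confidence classes should add up to the 213 enzymatic proteins.
total = sum(table1.values())

# Share of each confidence class among the enzymatic proteins, in per cent.
shares = {level: round(100 * n / total, 1) for level, n in table1.items()}

# Fraction of screened proteins flagged as enzymatic, and of those modeled.
fraction_enzymatic = round(100 * enzymatic / screened, 1)
fraction_modeled = round(100 * modeled / enzymatic, 1)

print(total, modeled + rejected)
print(shares)
print(fraction_enzymatic, fraction_modeled)
```

The class counts sum to 213, and the modeled and rejected proteins together also account for all 213, so the figures in the text and in Table 1 agree with each other.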
The methodology was useful for predicting enzyme function and modeling tertiary structure with the bio-programs used in the study. The proteins containing enzyme domains could be taken forward for cloning and expression to decipher their functions, which in turn would help establish the facts about these mysterious hypothetical proteins. The importance of bioinformatics in establishing sequence-specific functional relationships was once again realized. The web tools CDD-BLAST, InterProScan, Pfam and COGs, along with (PS)², enabled us to explore functionality in the uncharacterized sequences. Together with the predicted structures, these proteins could be studied further to link them with the life cycle of S. typhi. The predicted functional hypothetical loci could be linked with the established metabolic network and may help in understanding these hypothetical regions in detail.

Table 2: Conserved domain data for hypothetical proteins and templates for structure prediction

KEGG No. CDD-BLAST InterProScan Pfam COGs % Template
STY0033 DsbA_Com1_like Protein-disulfide isomerase Arylsulfatase A 1eejA Sulfatase super family DSBA-like thioredoxin No 75 STY0099 50 1aukA STY0165 PulE-GspE GSPII_E ATPases 25 1p9rA STY0197 Polysacc_deac_1 DSBA oxidoreductase Alkalinephosphatase-like Type II secretion system protein E Polysacc. deacetylase Polysacc. deacetylase 100 2c1iA STY0260 Glyoxalase Glyoxalase/ dioxygenase 75 2p25A STY0279 Exo_endo_phos Uncharacterized BCR 75 2j63A STY0283 AdoMet_MTases Methyltransferases 100 3ccfB STY0311 Glucosaminidase Endonuclease/ exonuclease/ phosph. Methyltransferase type 11 Acetylglucosamidase Glyoxalase/ Dioxygenase superfamily Endonuclease/ Exonuclease/ phosph.
Methyltransferase Xylanase/chitin deacetylase Lactoylglutathione lyase No related COG 75 no STY0313 Glucosaminidase Acetylglucosamidase Sulfate permease 75 no STY0317 No No No related COG 25 no STY0338 Polysacc_deac_1 2c1iA CpxP super family 25 no STY0353 Xc-1258_like UPF0012 Xylanase/chitin deacetylase Restriction endonuclease S Amidohydrolase 100 STY0352 Polysaccharide deacetylase Unintegrated 50 2e11A STY0356 YafJ 1te5A YkuD Glyoxalase 25 75 1y7mA 2r6uD STY0449 STY0478 PRK11295 No 50 25 2qgpB no STY0496 STY0499 4HBT HAD_like Thioesterase Cof protein 100 75 1njkA 1rkqA STY0511 EAL 100 2r6oA STY0523 STY0541 Transposase YbaK_deacylase Diguanylate phosphodiesterase Transposase CHP00011 Glutamine amidotransferase Uncharacterized BCR Lactoylglutathione lyase No related COG Sf. II DNA and RNA helicases Thioesterase Hydrolases of the HAD EAL domain 75 STY0357 STY0447 Glutamine amidotransferase YkuD Glyoxalase// dioxygenase HNH endonuclease No No related COG Uncharacterized ACR 75 50 no 2dxaA STY0547 Membrane protease 100 3bk6A STY0586 Band_7_stomatin_lik e DUF457 super family no PaaI_thioesterase 75 no STY0646 GlyDH-like1 Metal-dependent hydrolases Uncharacterized protein Glycerol dehydrogenase 50 STY0643 100 1ta9B STY0648 ParBc ParB-like nuclease Transposase YbaK/prolyl-tRNA synthetases SPFH domain/Band 7 family Metal-dependent hydrolase (DUF457) Thioesterase superfamily Iron-containing alcohol dehydrogenase ParB-like nuclease 75 1vz0B STY0649 PAPS_reductase PAPS reductase PAPS reductase Band 7 protein DUF457 P.acid degradationrelated protein Alcohol dehydrogenase Acetylglucosaminidase Acetylglucosaminidase L,D-transpeptidase catalytic domain Polysaccharide deacetylase RNA polymerase Rpb5 Carbon-nitrogen hydrolase Glutamine amidotransferases L,D-transpeptidase Glyoxalase// Dioxygenase HNH endonuclease No Thioesterase Haloacid dehalogenase-like hydrolase EAL domain Transcriptional regulators PAPS reductase 100 2oq2C Contd. 
SHORT COMMUNICATIONS 227 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. CDD-BLAST InterProScan Pfam COGs % Template STY0658 Nitrate_red_del TorD-like chaperone 1n1cA SPOUT methyltransferase Alpha/beta hydrolase 75 1ns5B STY0734 SPOUT_MTase super family Esterase_lipase Anaerobic dehydrogenases Uncharacterized ACR 50 STY0692 Nitrate reductase delta subunit SPOUT methyltransferase Alpha/beta hydrolase 75 3bf7A STY0752 AHS1 super family 100 2phcB STY0753 AHS2 super family 100 no STY0767 Glyco_tranf_GTA Hydrolases or acyltransferases Allophanate hydrolase subunit 1 Allophanate hydrolase subunit 2 Glycosyltransferases 100 2ffuA STY0772 Gneg_AbrB_dup super family YvcK_like Aammonia monooxygenase Uncharacterized ACR 100 no 25 2ppvA NO related COG 75 2o3hA 100 3b5qB 75 75 1zatA 2hf2B 75 2hf2B 100 2hunA 25 2o5vA 75 1yacA 50 50 2bibA no STY0848 Exo_endo_phos super family Allophanate hydrolase subunit 1 Allophanate hydrolase subunit 2 Glycosyl transferase, family 2 ammonia monooxygenase 2-Phospho-L-lactate transferase Exonuclease/phosphatase STY0875 Sulfatase Sulfatase STY0878 STY0881 YkuD HAD_like YkuD Cof protein STY0900 HAD_like Cof protein STY0929 NADB_Rossmann STY0935 TOPRIM_OLD NAD-dep. epimerase/dehydratase DUF2813 L,D-transpeptidase Haloacid dehalogenase-like hydrolase Haloacid dehalogenase-like hydrolase NAD dep. epimerase/dehydratase (DUF2813) STY0948 YcaC_related Isochorismatase Isochorismatase STY0984 STY0991 Beta-lactamase Aminoglycoside phosphotransferase Twin-arginine translocation pathway Beta-lactamase Competence protein Phosphotransferase DUF882 Uncharacterized BCR 25 1lbuA Metallo-betalactamase Lon protease (S16) 2gcuA 100 1z0wA STY1103 AdoMet_Mtases 100 1wxxA STY1129 TLP_HIUase 100 2gpzA STY1143 Lactamase_B super family Zn-dep. 
hydrolases, glyoxylases ATP-dependent protease SAM-dependent methyltransferases Transthyretin-like protein Beta-lactamase superfamily III 75 STY1089 Lactamase_B PKc_like super family Peptidase_M15_3 super family Lactamase_B super family Lon_C super family Metal-dependent hydrolase Uncharacterized BCR Hydrolases of the HAD Hydrolases of the HAD Nucleoside-pp-sugar epimerases ATP-dependent endonuclease Amidases related to nicotinamidase Metal-binding protein No related COG 75 1y44A STY1174 STY1185 Nitrate_red_del super family PLDc STY1193 RHOD_YceA STY0835 STY0998 STY0999 Peptidase S16, Lon protease P. synthase/ar. transglycosylase Transthyretin/hydrox yisourate hydrolase Unintegrated TorD-like chaperone Phospholipase D/Transphosphatidylase Rhodanese-like Allophanate hydrolase subunit 1 Allophanate hydrolase subunit 2 Glycosyl transferase family 2 Ammonia monooxygenase UPF0052 Exonuclease/phosphatase family Sulfatase SAM dependent methyltransferase HIUase/Transthyretin family Metallo-betalactamase superfamily Nitrate reductase delta subunit Phospholipase D Active site motif Rhodanese-like domain Anaerobic dehydrogenases Cardiolipin synthases 50 1s9uA 100 2ze9A Sulfurtransferases 100 2eg4A Contd. INDIAN J BIOTECHNOL, APRIL 2012 228 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. 
CDD-BLAST InterProScan Pfam COGs % Template STY1296 Pat_NTE 75 1oxwC no Catalase, manganese 100 2v8tA STY1321 Ferritin_like Manganese containing catalase Domain of unknown function (DUF892) Fe-S-cluster oxidoreductase Mn-containing catalase No related COG 100 STY1320 UPF0153 super family Mn_catalase Patatin-like phospholipase (UPF0153) Alpha-beta hydrolase STY1318 Lysophospholipase patatin UPF0153 50 2gs4A STY1329 Metal-dependent phosphoesterases 100 EAL domain 100 1m65A STY1354 POLIIIAc super family EAL super family 100 2r6oA STY1360 STY1426 No GFA super family ATPase No related COG 25 75 no 1x6mA STY1433 STY1598 AdoMet_MTases super family M20_dimer super family PRK10281 super family No STY1604 rve super family STY1609 LT_GEWL STY1615 COG4373 super family YtcJ_like STY1452 STY1484 STY1650 STY1757 A4_betagalactosidase PaaI_thioesterase STY1766 EAL super family STY1787 APH_ChoK_like STY1790 NO STY1818 STY1831 Arsenite_oxidase P-loop NTPase super family YeaK STY1741 STY1835 STY1846 STY1869 STY1889 STY1925 STY1942 Transgly_assoc super family Sialidase super family DUF847 Transgly_assoc super family FAA_hydrolase super family Ferritin/ ribonucleotide reductase Polymerase/histidinol phosphatase Diguanylate phosphodiesterase No Formaldehydeactivating, GFA SAM dependent methyltransferase Peptidase M42 PHP domain EAL domain No Formaldehydeactivating enzyme Methyltransferase domain M42 glutamyl aminopeptidase Phenazine biosynthesis-like protein Nucleotide pyrophosphohydrolase Integrase core domain SAM-dependent methyltransferases Cellulase M and related proteins Epimerase, PhzC/PhzF homolog NO related COG 100 2avnA 75 1y0yA 75 1qyaB 25 no Predicted transposase 75 1bcoA Transglycosylase SLT domain Terminase-like family Soluble lytic murein transglycosylase No related COG 75 1qsaA 50 2o0jA Metal-dependent hydrolase DUF1355 Amidohydrolase family DUF1355 Metal-dependent hydrolase No related COG 100 2g3fA 25 2gk3E Phenylacetic acid degradation Diguanylate phosphodiesterase 
Protein kinase-like domain DUF457, transmembrane Nitroreductase-like PrkA serine kinase Thioesterase superfamily EAL domain Uncharacterized protein EAL domain 75 no 100 2r6oA Fructosamine kinase Fructosamine-3kinase Metal-dependent hydrolases Nitroreductase Putative Ser protein kinase Uncharacterized ACR 75 no 75 no 75 75 3bm1A 1g8pA 75 1vjfA No related COG 75 no No related COG No related COG 50 75 1so7A 2ikbA Predicted membrane proteins 2-Keto-4-pentenoate hydratase 75 1ciiA 100 1nr9A Phenazine biosynthesis No Integrase, catalytic core Lytic transglycosylase-like, catalytic Terminase-like Aminoacyl-tRNA synthetase Transglycosylaseassociated protein Neuraminidase DUF847 Transglycosylaseassociated protein Fumarylacetoacetase Metal-dependent hydrolase (DUF457) Nitroreductase family PrkA AAA domain YbaK/prolyl-tRNA synthetases Transglycosylase associated protein No Predicted lysozyme (DUF847) Transglycosylase associated protein Fumarylacetoacetate (FAA) hydrolase Contd. SHORT COMMUNICATIONS 229 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. 
CDD-BLAST InterProScan Pfam COGs % Template STY1950 2gelA 75 1j7hA 75 1nqzA EAL EAL domain Metal-dependent proteases Translation initiation inhibitor NTP pyrophosphohydrolases EAL domain 75 STY1957 100 2r6oA STY1958 TerC family Hemolysins 100 2o3gA STY2005 CBS_pair_CorC_Hly C GGDEF GGDEF domain GGDEF domain 100 3breB STY2083 Nitrilase super family Carbon-nitrogen hydrolase Predicted amidohydrolase 100 1f89A STY2098 Opacity-associated protein A Isochorismatase 2gu1A 75 1j2rA STY2113 AdoMet_Mtases Methyltransferase 100 1im8A STY2114 AdoMet_MTases 100 3ccfB STY2120 Glyco_hydro_88 super family GATase1_DJ-1 tRNA (cmo5U34)methyltransferase tRNA (mo5U34)methyltransferase Six-hairpin glycosidase-like ThiJ/PfpI Metalloendopeptidases Amidases related to nicotinamidase SAM-dependent methyltransferases SAM-dependent methyltransferases No related COG 75 STY2110 Peptidase_M23 super family Cysteine_hydrolases Peptidase M22, glycoprotease Endoribonuclease LPSP NUDIX hydrolase, NudL, conserved site Diguanylate phosphodiesterase, predicted Cystathionine betasynthase, core Diguanylate cyclase, predicted Nitrilase/cyanide hydratase and Apolipoprotein N-acyltransferase Peptidoglycanbinding Lysin group Isochorismatase-like Glycoprotease family STY1955 COG1214 super family YjgF_YER057c_UK1 14 CoAse 50 2ahfA 100 2ab0A NLP/P60 NlpC/P60 family 75 2evrA Dextransucrase DSRB Diguanylate cyclase, predicted Metal-dependent phosphohydrolase Unintegrated Dextransucrase DSRB GGDEF domain Intracellular protease/amidase Cell wall-associated hydrolases No related COG 75 no GGDEF domain 100 1w25A Predicted HD superfamily hydrolase No related COG 100 3b57A 25 no Uncharacterized BCR 75 1y7mA Uncharacterized ACR 75 3ci3A No related COG 75 2hfsA Diphosphate-sugar epimerases Hemolysins 75 1r6dA 75 2plsA GGDEF domain 100 no No related COG Phospholipid phosphatase EAL domain 25 100 no 1up8A 100 2r6oA STY1952 STY2140 STY2191 NLPC_P60 super family DSRB super family STY2194 GGDEF STY2201 HDc super family STY2202 
Peptidase_S10 super family YkuD super family STY2149 STY2218 YkuD domain STY2263 Cob_adeno_trans super family PduX Adenosylcobalamin biosynthesis GHMP kinase STY2279 WcaG STY2332 CBS_pair_CorC_Hly C_assoc GGDEF Epimerase/dehydratase Cystathionine beta-synthase PAS, Diguanylate cyclase Unintegrated Phosphatidic acid phosphatase Diguanylate phosphodiesterase STY2255 STY2336 STY2350 STY2449 STY2451 No PAP2_like super family EAL Endoribonuclease L-PSP NUDIX domain Protein of unknown function (DUF1698) Glycosyl Hydrolase Family 88 DJ-1/PfpI family HD domain Protein of unknown function (DUF1469) L,D-transpeptidase catalytic domain Cobalamin adenosyltransferase GHMP kinases N terminal domain GDP-mannose dehydrogenase Integral membrane protein TerC MASE1 Peptidase S24-like PAP2 superfamily EAL domain Contd. INDIAN J BIOTECHNOL, APRIL 2012 230 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. CDD-BLAST InterProScan Pfam COGs % Template STY2471 PagL super family 75 2ervA NTP pyrophosphohydrolases No related COG 100 1ppvB 75 2iw0A STY2543 Nudix_Hydrolase super family Polysacc_deac_1 super family NAT_SF super family Lipid A 3-Odeacylase (PagL) NUDIX domain No related COG STY2525 1xebA HDc super family 100 2parB STY2576 Nudix_Hydrolase_38 100 2fkbC STY2580 100 2c29F 50 2odoA STY2608 DUF1731 super family PLPDE_III_AR_like_ 1 Abi super family Predicted acyltransferases Predicted hydrolases of HD NTP pyrophosphohydrolases Epimerases (SulA family) No related COG 75 STY2562 membrane protease 100 no STY2651 EAL EAL domain 100 2r6oA STY2671 No No No related COG 25 no STY2676 PRK10318 super family Dyp_perox super family ADPRase_NUDT5 Lipid A 3-Odeacylase-related NUDIX hydrolase domain Polysaccharide deacetylase GCN5-related Nacetyltransferase Metal-dependent phosphohydrolase NUDIX hydrolase domain NAD-dep. 
epimerase/dehydratase Alanine racemase, N-terminal Abortive infection protein Diguanylate cyclase, predicted Aldehyde dehydrogenase Unintegrated Putative papain-like cysteine peptidase Dyp-type peroxidase family NUDIX domain No related COG 25 no Predicted iron-dep. peroxidase NTP pyrophosphohydrolases Beta-lactamase class C Arsenate reductase 100 2iizA 100 1viuB 100 2ffyA 75 1rw1A n-Acetyltransferase 100 2ae6A Metalloprotease 100 3c37A Glycerate kinase 100 1to6A Zn-dependent protease EAL domain 100 3c37A 100 2r6oA Hydrolases 100 2hdwA Phosphoserine phosphatase SAM-dep. Methyltransferases Acyl-CoA synthetase (NDP ) Uncharacterized ACR 50 1l7mB 100 2b3tA 75 no 75 1rw0A 75 2ghsA STY2530 STY2588 STY2683 Dyp-type peroxidase STY2720 Beta-lactamase super family ArsC_Yffb STY2723 DUF699 super family STY2724 Zn_peptidase NUDIX hydrolase domain Beta-lactamaserelated Conserved hypothetical protein GCN5-related Nacetyltransferase Zinc metallopeptidase STY2730 Gly_kinase Glycerate kinase STY2735 Peptidase_M48 Peptidase M48 STY2744 EAL STY2793 STY2835 Esterase_lipase super family HAD_like super family AdoMet_Mtases STY2844 NAT_SF super family STY2850 Cu-oxidase_4 super family Diguanylate phosphodiesterase Peptidase S9, prolyl oligopeptidase HAD-s.f. hydrolase, subfamily IF, YfhB DNA methylase, N-6 adenine-specific GCN5-related Nacetyltransferase Polyphenol oxidoreductase STY2855 SGL STY2714 STY2716 STY2815 Senescence marker prt-30 (SMP-30) Polysaccharide deacetylase Acetyltransferase (GNAT) family HD domain NUDIX domain NAD dep. epimerase/dehydratase Alanine racemase, N-terminal domain CAAX amino terminal protease MASE1 Beta-lactamase ArsC family Domain of unknown function (DUF1726) Neutral zinc metallopeptidase Glycerate kinase family Peptidase family M48 MASE1 Prolyl oligopeptidase family No Methyltransferase small domain Acetyltransferase (GNAT) family Multi-copper polyphenol oxidoreductase SMP30/Gluconolaconase/ LRE-like region Gluconolactonase Contd. 
SHORT COMMUNICATIONS 231 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. CDD-BLAST InterProScan Pfam COGs STY2859 GGDEF GGDEF domain STY2866 CBS_pair_CorC_Hly C_assoc Polyketide_cyc2 super family No Diguanylate cyclase, predicted Cystathionine betasynthase Basic-leucine zipper (bZIP) Unintegrated STY2873 STY2880 Putative esterase STY2907 Esterase_lipase super family No STY2918 RHOD super family Rhodanese-like STY2928 CMD super family STY3027 No STY3039 STY3040 NADB_Rossmann super family PRK09989 Carboxy decarboxylase Acyl-CoA N-acyltransferase NAD-dep. epimerase/dehydratase Xylose isomerase STY3047 STY3071 UbiD super family HDc super family STY3078 Lactamase_B super family STY3080 Radical_SAM super family No Radical SAM Gly_kinase super family DUF3412 super family NADB_Rossmann super fam. Peptidase_M48 super family PLPDE_III_Yggs_lik e Polysacc_deac_1 super family HIT_like super family Glycerate kinase STY2893 STY3092 STY3097 STY3108 STY3212 STY3237 STY3253 STY3302 STY3341 STY3345 B12-binding_like super family No STY3358 ABM super family STY3366 GSP_synth super family STY3342 Unintegrated Carboxylyase-related Metal-dependent phosphohydrolase Beta-lactamase-like No Hypothetical protein CHP00730 Monooxygenase, FAD-binding Peptidase M48 Alanine racemase, Nterminal Polysaccharide deacetylase Histidine triad (HIT) protein Elongator protein 3/MiaB/NifB No Carbamoyl phosphate synthetase Glutathionylspermidine synthase % Template GGDEF domain 100 3breB DUF21 CBS domains 75 2o1rA Polyketide cyclase / dehydrase Ubiquitinolcytochrome C reductase Putative esterase Oligoketide cyclase 75 1t17A No related COG 25 no Hydrolase of the alpha/beta No related COG 100 2gzsA 25 no Rhodanese-rel. 
sulfurtransferases Uncharacterized ACR 100 1yt8A 75 2gmyA Histone acetyltransferase Nucleoside-pp-sugar epimerases Hydroxypyruvate isomerase Carboxylase Predicted helicases 75 2i79A 75 2hrzA 75 1k77A 75 75 2idbA 1gm5A Beta-lactamase superfamily II 100 2p4zB Organic radical activating enzymes No related COG 100 2z2uA 25 no Glycerate kinase 100 1to6A Nucleotide-binding protein FADdep.oxidoreductases Zn-dependent protease Predicted enzyme 25 2pmbA 100 2qa2A 100 3c37A 50 1w8gA Predicted chitin deacetylase HIT family hydrolases Fe-S oxidoreductases family 2 No related COG 100 1z7aA 100 1y23B 50 2qgqA 25 no Uncharacterized ACR 25 1tuvA Glutathionylspermidine synthase 100 2vobA Transmembrane exosortase Rhodanese-like domain Carboxydecarboxylase family Acetyltransferase (GNAT) family NAD dep. epimerase/dehydratase Xylose isomerase-like TIM barrel carboxy-lyase DEAD/DEAH box helicase Metallo-betalactamase superfamily Radical SAM superfamily Glucodextranase, domain B Glycerate kinase family Possible lysine decarboxylase FAD binding domain Peptidase family M48 Alanine racemase, N-terminal domain Polysaccharide deacetylase HIT domain Radical SAM Nterminal 4-Alpha-Lfucosyltransferase Antibiotic biosynthesis monooxygenase Glutathionylspermidine synthase Contd. INDIAN J BIOTECHNOL, APRIL 2012 232 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. 
CDD-BLAST InterProScan Pfam COGs % Template STY3367 Extradiol ringcleavage dioxygenase Adenylate cyclase aromatic ring-opening dioxygenase CYTH domain Uncharacterized ACR 75 2pw6A Uncharacterized ACR 75 3bhdA STY3400 45_DOPA_Dioxygen ase CYTHlike_Pase_CHAD AdoMet_Mtases DNA methylase, 2pjdA DUF45 super family DUF45 25 no STY3413 GST_C_ECM4_like 1eemA 75 2hp0A STY3446 SDH_alpha super family TP_methylase Glutathione S-transferase Serine dehydratase 100 STY3418 Methyltransferases 100 1pjqB STY3448 UPF0102 Glutathione S-transferase Serine dehydrataselike Tetrapyrrole methylase UPF0102 16S RNA G1207 methylase RsmC Metal-dependent hydrolase Glutathione S-transferase Uncharacterized ACR 100 STY3401 Methyltransferase small domain DUF45 25 no STY3451 NADB 2a35A GATase1_PfpI_like Semialdehyde dehydrogenase DJ-1/PfpI family 50 STY3452 Semialdehyde dehydrogenase ThiJ/PfpI 100 1oi4A STY3458 Peptidase U32 Peptidase family U32 100 1i4nA Luciferase-like Luciferase-like monooxygenase 75 1lucA 1x6vB 100 1oltA TIM-barrel dehydrogenases GGDEF domain 100 1vhnA GGDEF 100 1w25A STY3603 PaaI_thioesterase 2cy9B COG5283 50 no STY3755 Sulfatase Sulfatase 100 3b5qB STY3765 MPP_UshA_N_like 75 2z1aA STY3803 ADP_ribosyl_GH super family ADPribosylglycohydrolase 100 1t5jA STY3805 No related COG 25 no Rhamnose mutarotase Aldose 1-epimerase Heptaprenyl diphosphate synthase (DUF718) Aldose 1-epimerase Uncharacterized ACR Galactose mutarotase 75 100 1x8dA 1snzA STY3866 PRK09669 super family DUF718 super family Aldose_epim super family Sulfatase super family Metallophosphoesterase ADPribosylation/Crystallin J1 Unintegrated Uncharacterized protein Chr. segregation ATPases Metal-dependent hydrolase Phosphodiesterase 75 STY3700 Phenylacetic acid degradation Phage tail tape P-loop ATPase protein family Radical SAM superfamily Dihydrouridine synthase (Dus) Bacterial signalling prt. 
Thioesterase superfamily Phage-related minor tail protein Sulfatase 100 STY3568 ATPase, P-loopcontaining Hypothetical protein CHP01212 tRNA-dihydrouridine synthase Diguanylate cyclase Predicted P-loopcontaining kinase Fe-S oxidoreductases STY3564 Peptidase_U32 super family Flavin_utilizing_monoxy genases ATP_bind_2 super family Radical_SAM super family DUS_like_FMN Endonuclease / resolvase Nucleoside-PP-sugar epimerases Intracellular protease/amidase Collagenase and related proteases Reductase & flavindep.oxidoreductases Sulfatase Sulfatase 100 1aukA STY3870 HAD_like Cof protein 75 1nf2A STY3933 FMN_red super family NADPH-dependent FMN reductase Haloacid dehalogenase-like hydrolase NADPH-dependent FMN reductase Metal-dependent hydrolase Hydrolases of the HAD Predicted flavoprotein 100 1rttA STY3381 STY3459 STY3502 STY3508 STY3831 STY3858 Tetrapyrrole Methylases UPF0102 Calcineurin-like phosphoesterase ADPribosylglycohydrolase Contd. SHORT COMMUNICATIONS 233 Table 2—Conseved domain data for hypothetical protein and template for structure prediction—Contd. KEGG No. CDD-BLAST InterProScan Pfam COGs % Template STY3980 No 50 2osxA PTS_IIA_fru Phosphotransferase 75 1xizB STY4020 Transposase_31 super family YcaC_related Cellulase (glycosyl hydrolase family 5) Sugar phosphotransferase system Putative transposase No related COG STY3998 Glycoside hydrolase, family 5 Phosphotransferase system Transposase (putative) Isochorismatase-like No related COG 75 no Isochorismatase family Polysaccharide pyruvyl transferase O-Antigen ligase Amidases related to nicotinamidase No related COG 100 1yacA 75 1vgvA No related COG 75 no Uncharacterized BCR 75 2nlyA metalloendopeptidases No related COG 100 2gu1A 25 2idoD Uncharacterized BCR 75 1fp3A Uncharacterized FlgJrelated protein 75 no Acetyltransferases 75 2j8mA WD40-like repeat family Predicted multitransmembrane pro. 
No related COG No related COG 75 1kv9A 75 no 25 25 2oo3A no Histone acetyltransferase HPA2 No related COG 100 2pdoD 25 2gk3E No related COG 75 no STY4025 STY4075 STY4082 STY4089 STY4090 STY4108 PS_pyruv_trans super family Wzy_C super family Polysacc_deac_2 super family Peptidase_M23 super family No Polysaccharide pyruvyl transferase O-antigen ligaserelated DUF610, YibQ Peptidase M23B DUF1680 super family Glucosaminidase super family Six-hairpin glycosidase Beta-Nacetylglucosamidase STY4159 NAT_SF super family STY4165 Arylsulfotrans GCN5-related N-acetyltransferase Arylsulfotransferase STY4195 Ribonuclease_BN super family DUF519 super family No Ribonuclease BNrelated DNA methylase No DUF3749 super family A4_betagalactosidase Transposase_31 super family GCN5-related N-acetyltransferase DUF1355 DNA polymerase III, theta subunit Glycosyl hydrolase (DUF1680) Beta-Nacetylglucosaminidase Acetyltransferase (GNAT) family Arylsulfotransferase (ASST) Ribonuclease BN-like family (DUF519) TnsA endonuclease N terminal Acetyltransferase, GNAT family (DUF1355) Transposase (putative), YhgA-like Putative transposase, YhgA-like STY4117 STY4135 STY4206 STY4216 STY4247 STY4263 STY4288 No Divergent polysaccharide deacetylase Peptidase family M23

References

1 Reeves M W, Evins G M, Heiba A A, Plikaytis B D & Farmer J J, Clonal nature of Salmonella typhi and its genetic relatedness to other salmonellae as shown by multilocus enzyme electrophoresis, and proposal of S. bongori comb. nov., J Clin Microbiol, 27 (1989) 313-320.
2 Parry C, Wain J, Chinh N T, Vinh H & Farrar J J, Quinolone-resistant Salmonella typhi in Vietnam, Lancet, 351 (1998) 1289.
3 Parkhill J, Dougan G, James K D, Thomson N R, Pickard D et al, Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18, Nature (Lond), 413 (2001) 848-52.
4 Gore D, In silico prediction of structure and enzymatic activity for hypothetical proteins of Shigella flexneri, Biofrontiers, 1 (2009) 1-10.
5 Gore D & Raut A, Computational function and structural annotations for hypothetical proteins of Bacillus anthracis, Biofrontiers, 1 (2009) 27-36.
6 Dogra P & Gore D, Prediction of enzymatic function and structure of Haemophilus influenzae hypothetical proteins: An in silico approach, Int J Soft Comput Bioinform, 1 (2010) 67-77.
7 Piatek A S, Telenti A, Murry M R, El-Hajj H, Jacobs Jr W R et al, Genotypic analysis of Mycobacterium tuberculosis in two distinct populations using molecular beacons: Implications for rapid susceptibility testing, Antimicrob Agents Chemother, 44 (2000) 103-110.
8 Anandakumar S & Shanmughavel P, Computational annotation for hypothetical proteins of Mycobacterium tuberculosis, J Comput Sci Syst Biol, 1 (2008) 50-62.
9 Altschul S F, Madden T L, Schäffer A A, Zhang J, Zhang Z et al, Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res, 25 (1997) 3389-3402.
10 Schäffer A A, Aravind L, Madden T L, Shavirin S, Spouge J L et al, Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements, Nucleic Acids Res, 29 (2001) 2994-3005.
11 Notredame C, Higgins D G & Heringa J, T-Coffee: A novel method for fast and accurate multiple sequence alignments, J Mol Biol, 302 (2000) 205-217.
12 Marti-Renom M A, Stuart A, Fiser A, Sanchez R, Melo F et al, Comparative protein structure modeling of genes and genomes, Annu Rev Biophys Biomol Struct, 29 (2000) 291-325.
13 Fiser A, Do R K & Sali A, Modeling of loops in protein structures, Protein Sci, 9 (2000) 1753-1773.
14 Sali A & Blundell T L, Comparative protein modeling by satisfaction of spatial restraints, J Mol Biol, 234 (1993) 779-815.
15 Chen C C, Hwang J K & Yang J-M, (PS)2: Protein structure prediction server, Nucleic Acids Res, 34 (2006) W152-W157.
16 Marchler-Bauer A, Anderson J B, Derbyshire M K, DeWeese-Scott C, Gonzales N R et al, CDD: A conserved domain database for interactive domain family analysis, Nucleic Acids Res, 35 (Database issue) (2007) D237-D240.
17 Zdobnov E M & Apweiler R, InterProScan: An integration platform for the signature-recognition methods in InterPro, Bioinformatics, 17 (2001) 847-848.
18 Baker W, van den Broek A, Camon E, Hingamp P, Sterk P et al, The EMBL nucleotide sequence database, Nucleic Acids Res, 28 (2000) 19-23.
19 Tatusov R L, Galperin M Y, Natale D A & Koonin E V, The COG database: A tool for genome-scale analysis of protein functions and evolution, Nucleic Acids Res, 28 (2000) 33-36.