{PDOC00000}
{BEGIN}
**********************************
*** PROSITE documentation file ***
**********************************

Release 20.58 of 15-Dec-2009.

PROSITE is developed by the Swiss Institute of Bioinformatics (SIB) under
the responsibility of Amos Bairoch and Nicolas Hulo.

This release was prepared by: Nicolas Hulo, Virginie Bulliard, Petra
Langendijk-Genevaux and Christian Sigrist with the help of Edouard de Castro,
Lorenzo Cerutti, Corinne Lachaize and Amos Bairoch.

See: http://www.expasy.org/prosite/
Email: [email protected]

Acknowledgements:

 - To all those mentioned in this document who have reviewed the entry(ies)
   for which they are listed as experts. With specific thanks to Rein
   Aasland, Mark Boguski, Peer Bork, Josh Cherry, Andre Chollet, Frank
   Kolakowski, David Landsman, Bernard Henrissat, Eugene Koonin, Steve
   Henikoff, Manuel Peitsch and Jonathan Reizer.
 - Jim Apostolopoulos is the author of the PDOC00699 entry.
 - Brigitte Boeckmann is the author of the PDOC00691, PDOC00703, PDOC00829,
   PDOC00796, PDOC00798, PDOC00799, PDOC00906, PDOC00907, PDOC00908,
   PDOC00912, PDOC00913, PDOC00924, PDOC00928, PDOC00929, PDOC00955,
   PDOC00961, PDOC00966, PDOC00988 and PDOC50020 entries.
 - Jean-Louis Boulay is the author of the PDOC01051, PDOC01050, PDOC01052,
   PDOC01053 and PDOC01054 entries.
 - Ryszard Brzezinski is the author of the PDOC60000 entry.
 - Elisabeth Coudert is the author of the PDOC00373 entry.
 - Kirill Degtyarenko is the author of the PDOC60001 entry.
 - Christian Doerig is the author of the PDOC01049 entry.
 - Kay Hofmann is the author of the PDOC50003, PDOC50006, PDOC50007 and
   PDOC50017 entries.
 - Chantal Hulo is the author of the PDOC00987 entry.
 - Karine Michoud is the author of the PDOC01044 and PDOC01042 entries.
 - Yuri Panchin is the author of the PDOC51013 entry.
 - S. Ramakumar is the author of the PDOC51052, PDOC60004, PDOC60010,
   PDOC60011, PDOC60015, PDOC60016, PDOC60018, PDOC60020, PDOC60021,
   PDOC60022, PDOC60023, PDOC60024, PDOC60025, PDOC60026, PDOC60027,
   PDOC60028, PDOC60029 and PDOC60030 entries.
 - Keith Robison is the author of the PDOC00830 and PDOC00861 entries.

----------------------------------------------------------------------
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by nonprofit
institutions as long as its content is in no way modified. Usage by and for
commercial entities requires a license agreement. For information about the
licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
----------------------------------------------------------------------

{END}
{PDOC00001}
{PS00001; ASN_GLYCOSYLATION}
{BEGIN}
************************
* N-glycosylation site *
************************

It has been known for a long time [1] that potential N-glycosylation sites
are specific to the consensus sequence Asn-Xaa-Ser/Thr. It must be noted that
the presence of the consensus tripeptide is not sufficient to conclude that
an asparagine residue is glycosylated, because the folding of the protein
plays an important role in the regulation of N-glycosylation [2].
It has been shown [3] that the presence of proline between Asn and Ser/Thr
will inhibit N-glycosylation; this has been confirmed by a statistical
analysis of glycosylation sites [4], which also shows that about 50% of the
sites that have a proline C-terminal to Ser/Thr are not glycosylated.

It must also be noted that there are a few reported cases of glycosylation
sites with the pattern Asn-Xaa-Cys; an experimentally demonstrated occurrence
of such a non-standard site is found in the plasma protein C [5].

-Consensus pattern: N-{P}-[ST]-{P}
                    [N is the glycosylation site]
-Last update: May 1991 / Text revised.

[ 1] Marshall R.D.
     "Glycoproteins."
     Annu. Rev. Biochem. 41:673-702(1972).
     PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325
[ 2] Pless D.D., Lennarz W.J.
     "Enzymatic conversion of proteins to glycoproteins."
     Proc. Natl. Acad. Sci. U.S.A. 74:134-138(1977).
     PubMed=264667
[ 3] Bause E.
     "Structural requirements of N-glycosylation of proteins. Studies with
     proline peptides as conformational probes."
     Biochem. J. 209:331-336(1983).
     PubMed=6847620
[ 4] Gavel Y., von Heijne G.
     "Sequence differences between glycosylated and non-glycosylated
     Asn-X-Thr/Ser acceptor sites: implications for protein engineering."
     Protein Eng. 3:433-442(1990).
     PubMed=2349213
[ 5] Miletich J.P., Broze G.J. Jr.
     "Beta protein C is not glycosylated at asparagine 329. The rate of
     translation may influence the frequency of usage at
     asparagine-X-cysteine sites."
     J. Biol. Chem. 265:11397-11404(1990).
     PubMed=1694179

{END}
{PDOC00004}
{PS00004; CAMP_PHOSPHO_SITE}
{BEGIN}
****************************************************************
* cAMP- and cGMP-dependent protein kinase phosphorylation site *
****************************************************************

There have been a number of studies on the specificity of cAMP- and
cGMP-dependent protein kinases [1,2,3]. Both types of kinases appear to share
a preference for the phosphorylation of serine or threonine residues found
close to at least two consecutive N-terminal basic residues. It is important
to note that there are quite a number of exceptions to this rule.

-Consensus pattern: [RK](2)-x-[ST]
                    [S or T is the phosphorylation site]
-Last update: June 1988 / First entry.

[ 1] Feramisco J.R., Glass D.B., Krebs E.G.
     J. Biol. Chem. 255:4240-4245(1980).
[ 2] Glass D.B., Smith S.B.
     "Phosphorylation by cyclic GMP-dependent protein kinase of a synthetic
     peptide corresponding to the autophosphorylation site in the enzyme."
     J. Biol. Chem. 258:14797-14803(1983).
     PubMed=6317673
[ 3] Glass D.B., el-Maghrabi M.R., Pilkis S.J.
     "Synthetic peptides corresponding to the site phosphorylated in
     6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase as substrates of
     cyclic nucleotide-dependent protein kinases."
     J. Biol. Chem. 261:2987-2993(1986).
     PubMed=3005275

{END}
{PDOC00005}
{PS00005; PKC_PHOSPHO_SITE}
{BEGIN}
*****************************************
* Protein kinase C phosphorylation site *
*****************************************

In vivo, protein kinase C exhibits a preference for the phosphorylation of
serine or threonine residues found close to a C-terminal basic residue [1,2].
The presence of additional basic residues at the N- or C-terminal of the
target amino acid enhances the Vmax and Km of the phosphorylation reaction.

-Consensus pattern: [ST]-x-[RK]
                    [S or T is the phosphorylation site]
-Last update: June 1988 / First entry.

[ 1] Woodgett J.R., Gould K.L., Hunter T.
     Eur. J. Biochem. 161:177-184(1986).
[ 2] Kishimoto A., Nishiyama K., Nakanishi H., Uratsuji Y., Nomura H.,
     Takeyama Y., Nishizuka Y.
     "Studies on the phosphorylation of myelin basic protein by protein
     kinase C and adenosine 3':5'-monophosphate-dependent protein kinase."
     J. Biol. Chem. 260:12492-12499(1985).
     PubMed=2413024

{END}
{PDOC00006}
{PS00006; CK2_PHOSPHO_SITE}
{BEGIN}
*****************************************
* Casein kinase II phosphorylation site *
*****************************************

Casein kinase II (CK-2) is a protein serine/threonine kinase whose activity
is independent of cyclic nucleotides and calcium. CK-2 phosphorylates many
different proteins.
The substrate specificity [1] of this enzyme can be summarized as follows:

 (1) Under comparable conditions Ser is favored over Thr.
 (2) An acidic residue (either Asp or Glu) must be present three residues
     from the C-terminal of the phosphate acceptor site.
 (3) Additional acidic residues in positions +1, +2, +4, and +5 increase the
     phosphorylation rate. Most physiological substrates have at least one
     acidic residue in these positions.
 (4) Asp is preferred to Glu as the provider of acidic determinants.
 (5) A basic residue at the N-terminal of the acceptor site decreases the
     phosphorylation rate, while an acidic one will increase it.

-Consensus pattern: [ST]-x(2)-[DE]
                    [S or T is the phosphorylation site]
-Note: This pattern is found in most of the known physiological substrates.
-Last update: May 1991 / Text revised.

[ 1] Pinna L.A.
     "Casein kinase 2: an 'eminence grise' in cellular regulation?"
     Biochim. Biophys. Acta 1054:267-284(1990).
     PubMed=2207178

{END}
{PDOC00007}
{PS00007; TYR_PHOSPHO_SITE}
{BEGIN}
****************************************
* Tyrosine kinase phosphorylation site *
****************************************

Substrates of tyrosine protein kinases are generally characterized by a
lysine or an arginine seven residues to the N-terminal side of the
phosphorylated tyrosine. An acidic residue (Asp or Glu) is often found at
either three or four residues to the N-terminal side of the tyrosine [1,2,3].
There are a number of exceptions to this rule such as the tyrosine
phosphorylation sites of enolase and lipocortin II.

-Consensus pattern: [RK]-x(2)-[DE]-x(3)-Y
                    or [RK]-x(3)-[DE]-x(2)-Y
                    [Y is the phosphorylation site]
-Last update: June 1988 / First entry.

[ 1] Patschinsky T., Hunter T., Esch F.S., Cooper J.A., Sefton B.M.
     "Analysis of the sequence of amino acids surrounding sites of tyrosine
     phosphorylation."
     Proc. Natl. Acad. Sci. U.S.A. 79:973-977(1982).
     PubMed=6280176
[ 2] Hunter T.
     "Synthetic peptide substrates for a tyrosine protein kinase."
     J. Biol. Chem. 257:4843-4848(1982).
     PubMed=6279650
[ 3] Cooper J.A., Esch F.S., Taylor S.S., Hunter T.
     "Phosphorylation sites in enolase and lactate dehydrogenase utilized by
     tyrosine protein kinases in vivo and in vitro."
     J. Biol. Chem. 259:7835-7841(1984).
     PubMed=6330085

{END}
{PDOC00008}
{PS00008; MYRISTYL}
{BEGIN}
*************************
* N-myristoylation site *
*************************

An appreciable number of eukaryotic proteins are acylated by the covalent
addition of myristate (a C14-saturated fatty acid) to their N-terminal
residue via an amide linkage [1,2]. The sequence specificity of the enzyme
responsible for this modification, myristoyl CoA:protein N-myristoyl
transferase (NMT), has been derived from the sequence of known
N-myristoylated proteins and from studies using synthetic peptides. It seems
to be the following:

 - The N-terminal residue must be glycine.
 - In position 2, uncharged residues are allowed. Charged residues, proline
   and large hydrophobic residues are not allowed.
 - In positions 3 and 4, most, if not all, residues are allowed.
 - In position 5, small uncharged residues are allowed (Ala, Ser, Thr, Cys,
   Asn and Gly). Serine is favored.
 - In position 6, proline is not allowed.

-Consensus pattern: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
                    [G is the N-myristoylation site]
-Note: We deliberately include as potential myristoylated glycine residues
 those which are internal to a sequence. It could well be that the sequence
 under study represents a viral polyprotein precursor and that subsequent
 proteolytic processing could expose an internal glycine as the N-terminal
 of a mature protein.
-Last update: October 1989 / Pattern and text revised.

[ 1] Towler D.A., Gordon J.I., Adams S.P., Glaser L.
     "The biology and enzymology of eukaryotic protein acylation."
     Annu. Rev. Biochem. 57:69-99(1988).
     PubMed=3052287; DOI=10.1146/annurev.bi.57.070188.000441
[ 2] Grand R.J.A.
     "Acylation of viral and eukaryotic proteins."
     Biochem. J. 258:625-638(1989).
     PubMed=2658970

{END}
{PDOC00009}
{PS00009; AMIDATION}
{BEGIN}
******************
* Amidation site *
******************

The precursor of hormones and other active peptides which are C-terminally
amidated is always directly followed [1,2] by a glycine residue which
provides the amide group, and most often by at least two consecutive basic
residues (Arg or Lys) which generally function as an active peptide precursor
cleavage site. Although all amino acids can be amidated, neutral hydrophobic
residues such as Val or Phe are good substrates, while charged residues such
as Asp or Arg are much less reactive. C-terminal amidation has not yet been
shown to occur in unicellular organisms or in plants.

-Consensus pattern: x-G-[RK]-[RK]
                    [x is the amidation site]
-Last update: June 1988 / First entry.

[ 1] Kreil G.
     "Occurrence, detection, and biosynthesis of carboxy-terminal amides."
     Methods Enzymol. 106:218-223(1984).
     PubMed=6548541
[ 2] Bradbury A.F., Smyth D.G.
     "Biosynthesis of the C-terminal amide in peptide hormones."
     Biosci. Rep. 7:907-916(1987).
     PubMed=3331120

{END}
{PDOC00010}
{PS00010; ASX_HYDROXYL}
{BEGIN}
***************************************************
* Aspartic acid and asparagine hydroxylation site *
***************************************************

Post-translational hydroxylation of aspartic acid or asparagine [1] to form
erythro-beta-hydroxyaspartic acid or erythro-beta-hydroxyasparagine has been
identified in a number of proteins with domains homologous to epidermal
growth factor (EGF). Examples of such proteins are the blood coagulation
protein factors VII, IX and X, proteins C, S, and Z, the LDL receptor,
thrombomodulin, etc. Based on sequence comparisons of the EGF-homology region
that contains hydroxylated Asp or Asn, a consensus sequence has been
identified that seems to be required by the hydroxylase(s).

-Consensus pattern: C-x-[DN]-x(4)-[FY]-x-C-x-C
                    [D or N is the hydroxylation site]
-Note: This consensus pattern is located in the N-terminal of EGF-like
 domains, while our EGF-like cysteine pattern signature (see the relevant
 entry <PDOC00021>) is located in the C-terminal.
-Last update: January 1989 / First entry.

[ 1] Stenflo J., Ohlin A.-K., Owen W.G., Schneider W.J.
     "beta-Hydroxyaspartic acid or beta-hydroxyasparagine in bovine low
     density lipoprotein receptor and in bovine thrombomodulin."
     J. Biol. Chem. 263:21-24(1988).
     PubMed=2826439

{END}
{PDOC00011}
{PS00011; GLA_1}
{PS50998; GLA_2}
{BEGIN}
**********************************************************************
* Gamma-carboxyglutamic acid-rich (Gla) domain signature and profile *
**********************************************************************

The vitamin K-dependent blood coagulation factor IX as well as several
extracellular regulatory proteins require vitamin K for the
post-translational synthesis of gamma-carboxyglutamic acid, an amino acid
clustered in the N-terminal Gla domain of these proteins [1,2]. The Gla
domain is a membrane binding motif which, in the presence of calcium ions,
interacts with phospholipid membranes that include phosphatidylserine.

The 3D structure of the Gla domain has been solved (see for example
<PDB:1CFH>) [3,4]. Calcium ions induce conformational changes in the Gla
domain and are necessary for the Gla domain to fold properly. A common
structural feature of functional Gla domains is the clustering of N-terminal
hydrophobic residues into a hydrophobic patch that mediates interaction with
the cell surface membrane [4].

Proteins known to contain a Gla domain are listed below:

 - A number of plasma proteins involved in blood coagulation. These proteins
   are prothrombin, coagulation factors VII, IX and X, proteins C, S, and Z.
 - Two proteins that occur in calcified tissues: osteocalcin (also known as
   bone-Gla protein, BGP), and matrix Gla-protein (MGP).
 - Proline-rich Gla proteins 1 and 2 [5].
 - Cone snail venom peptides: conantokin-G and -T, and conotoxin GS [6].

The pattern we developed starts with the conserved Gla-x(3)-Gla-x-Cys motif
found in the middle of the domain, which seems to be important for substrate
recognition by the carboxylase [7], and ends with the last conserved position
of the domain (an aromatic residue). We also developed a profile that covers
the whole Gla domain.
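The conserved core motif described above can be checked with an ordinary
regular expression. The Python sketch below is not part of PROSITE. It
assumes that Gla positions appear as E (Glu) in the unmodified sequence,
writes the motif in PROSITE pattern syntax as E-x(3)-E-x-C, and uses two
hypothetical helpers, prosite_to_regex and find_sites, that cover only the
basic syntax elements (x, [..], {..}, repeat counts and the < and > anchors).
The test peptide is made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate basic PROSITE pattern syntax into a Python regex string.

    Hypothetical helper for illustration, not an official PROSITE tool.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # '<' anchors the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # '>' anchors the C-terminus
            suffix, element = "$", element[:-1]
        # Split "core(n)" or "core(n,m)" into the core and its repeat count.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                    # x: any residue
            core = "."
        elif core.startswith("{"):         # {ABC}: any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[int]:
    """Return 1-based start positions of all matches, overlaps included."""
    # A lookahead lets finditer report overlapping occurrences.
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]


GLA_CORE = "E-x(3)-E-x-C"                   # Gla-x(3)-Gla-x-Cys, Gla written as E
print(prosite_to_regex(GLA_CORE))           # E.{3}E.C
print(find_sites(GLA_CORE, "AAEAAKEACAA"))  # made-up peptide
```

A hit only flags a candidate: as with the other short patterns in this file,
a match does not by itself show that the residues are modified.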
-Consensus pattern: E-x(2)-[ERK]-E-x-C-x(6)-[EDR]-x(10,11)-[FYA]-[YW]
                    [The 2 E's are the carboxylation site]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 1.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Note: All glutamic residues present in the domain are potential
 carboxylation sites; in coagulation proteins, all are modified to Gla,
 while in BGP and MGP some are not.
-Expert(s) to contact by email:
           Price P.A.; [email protected]
-Last update: June 2004 / Pattern and text revised; profile added.

[ 1] Friedman P.A., Przysiecki C.T.
     "Vitamin K-dependent carboxylation."
     Int. J. Biochem. 19:1-7(1987).
     PubMed=3106112
[ 2] Vermeer C.
     "Gamma-carboxyglutamate-containing proteins and the vitamin K-dependent
     carboxylase."
     Biochem. J. 266:625-636(1990).
     PubMed=2183788
[ 3] Freedman S.J., Furie B.C., Furie B., Baleja J.D.
     "Structure of the metal-free gamma-carboxyglutamic acid-rich membrane
     binding region of factor IX by two-dimensional NMR spectroscopy."
     J. Biol. Chem. 270:7980-7987(1995).
     PubMed=7713897
[ 4] Freedman S.J., Blostein M.D., Baleja J.D., Jacobs M., Furie B.C.,
     Furie B.
     "Identification of the phospholipid binding site in the vitamin
     K-dependent blood coagulation protein factor IX."
     J. Biol. Chem. 271:16227-16236(1996).
     PubMed=8663165
[ 5] Kulman J.D., Harris J.E., Haldeman B.A., Davie E.W.
     "Primary structure and tissue distribution of two novel proline-rich
     gamma-carboxyglutamic acid proteins."
     Proc. Natl. Acad. Sci. U.S.A. 94:9058-9062(1997).
     PubMed=9256434
[ 6] Haack J.A., Rivier J.E., Parks T.N., Mena E.E., Cruz L.J.,
     Olivera B.M.
     "Conantokin-T. A gamma-carboxyglutamate containing peptide with
     N-methyl-d-aspartate antagonist activity."
     J. Biol. Chem. 265:6025-6029(1990).
     PubMed=2180939
[ 7] Price P.A., Fraser J.D., Metz-Virca G.
     "Molecular cloning of matrix Gla protein: implications for substrate
     recognition by the vitamin K-dependent gamma-carboxylase."
     Proc. Natl. Acad. Sci. U.S.A. 84:8335-8339(1987).
     PubMed=3317405

{END}
{PDOC00012}
{PS00012; PHOSPHOPANTETHEINE}
{PS50075; ACP_DOMAIN}
{BEGIN}
**************************************
* Phosphopantetheine attachment site *
**************************************

Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of
acyl carrier proteins (ACP) in some multienzyme complexes where it serves as
a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these
proteins [2]. ACP proteins or domains have been found in various enzyme
systems which are listed below (references are only provided for recently
determined sequences).

 - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain
   fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
   chloroplast FAS are composed of eight separate subunits which correspond
   to the different enzymatic activities; ACP is one of these polypeptides.
   Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the
   ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS
   consists of a single multifunctional enzyme; the ACP domain is located
   between the beta-ketoacyl reductase domain and the C-terminal
   thioesterase domain [3].
 - Polyketide antibiotics synthase enzyme systems. Polyketides are secondary
   metabolites produced from simple fatty acids by microorganisms and
   plants. ACP is one of the polypeptidic components involved in the
   biosynthesis of Streptomyces polyketide antibiotics actinorhodin,
   curamycin, granaticin, monensin, oxytetracycline and tetracenomycin C.
 - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM,
   which respectively contain three, five and one ACP domains.
 - The multifunctional 6-methylsalicylic acid synthase (MSAS) from
   Penicillium patulum. This is a multifunctional enzyme involved in the
   biosynthesis of a polyketide antibiotic and which contains an ACP domain
   in the C-terminal extremity.
 - Multifunctional mycocerosic acid synthase (gene mas) from Mycobacterium
   bovis.
 - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme
   catalyzes the first step in the biosynthesis of the cyclic antibiotic
   gramicidin S.
 - Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction
   carried out by tycA is identical to that catalyzed by grsA.
 - Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme
   is a multifunctional protein that activates and polymerizes proline,
   valine, ornithine and leucine. GrsB contains four ACP domains.
 - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
   erythraea, which are involved in the biosynthesis of the polyketide
   antibiotic erythromycin. Each of these proteins contains two ACP domains.
 - Conidial green pigment synthase from Aspergillus nidulans.
 - ACV synthetase from various fungi. This enzyme catalyzes the first step
   in the biosynthesis of penicillin and cephalosporin. It contains three
   ACP domains.
 - Enterobactin synthetase component F (gene entF) from Escherichia coli.
   This enzyme is involved in the ATP-dependent activation of serine during
   enterobactin (enterochelin) biosynthesis.
 - Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from
   Bacillus subtilis.
   Subunits 1 and 2 contain three related domains while subunit 3 only
   contains a single domain.
 - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme
   synthesizes HC-toxin, a cyclic tetrapeptide. HTS1 contains four ACP
   domains.
 - Fungal mitochondrial ACP, which is part of the respiratory chain NADH
   dehydrogenase (complex I).
 - Rhizobium nodulation protein nodF, which probably acts as an ACP in the
   synthesis of the nodulation Nod factor fatty acyl chain.

The sequence around the phosphopantetheine attachment site is conserved in
all these proteins and can be used as a signature pattern. A profile was also
developed that spans the complete ACP-like domain.

-Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-
                    [LIVMST]-{PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-
                    [LIVMWSTA]-[LIVGSTACR]-{LPIY}-{VY}-[LIVMFA]
                    [S is the pantetheine attachment site]
-Sequences known to belong to this class detected by the pattern: ALL,
 except C.paradoxa ACP.
-Other sequence(s) detected in Swiss-Prot: 115.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Last update: December 2004 / Pattern and text revised.

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter,
     Berlin New-York (1988).
[ 2] Pugh E.L., Wakil S.J.
     J. Biol. Chem. 240:4727-4733(1965).
[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S.
     "Structural organization of the multifunctional animal fatty-acid
     synthase."
     Eur. J. Biochem. 198:571-579(1991).
     PubMed=2050137

{END}
{PDOC00013}
{PS51257; PROKAR_LIPOPROTEIN}
{BEGIN}
******************************************************************
* Prokaryotic membrane lipoprotein lipid attachment site profile *
******************************************************************

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal
peptide, which is cleaved by a specific lipoprotein signal peptidase (signal
peptidase II). The peptidase recognizes a conserved sequence and cuts
upstream of a cysteine residue to which a glyceride-fatty acid lipid is
attached [1].

Some of the proteins known to undergo such processing currently include (for
recent listings see [1,2,3]):

 - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
 - Escherichia coli lipoprotein-28 (gene nlpA).
 - Escherichia coli lipoprotein-34 (gene nlpB).
 - Escherichia coli lipoprotein nlpC.
 - Escherichia coli lipoprotein nlpD.
 - Escherichia coli osmotically inducible lipoprotein B (gene osmB).
 - Escherichia coli osmotically inducible lipoprotein E (gene osmE).
 - Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
 - Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
 - Escherichia coli copper homeostasis protein cutF (or nlpE).
 - Escherichia coli plasmids traT proteins.
 - Escherichia coli Col plasmids lysis proteins.
 - A number of Bacillus beta-lactamases.
 - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
 - Borrelia burgdorferi outer surface proteins A and B (genes ospA and
   ospB).
 - Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene
   vmp7).
 - Chlamydia trachomatis outer membrane protein 3 (gene omp3).
 - Fibrobacter succinogenes endoglucanase cel-3.
 - Haemophilus influenzae proteins Pal and Pcp.
 - Klebsiella pullulanase (gene pulA).
 - Klebsiella pullulanase secretion protein pulS.
 - Mycoplasma hyorhinis protein p37.
 - Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
 - Neisseria outer membrane protein H.8.
 - Pseudomonas aeruginosa lipopeptide (gene lppL).
 - Pseudomonas solanacearum endoglucanase egl.
 - Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
 - Rickettsia 17 Kd antigen.
 - Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
 - Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
 - Treponema pallidum 34 Kd antigen.
 - Treponema pallidum membrane protein A (gene tmpA).
 - Vibrio harveyi chitobiase (gene chb).
 - Yersinia virulence plasmid protein yscJ.
 - Halocyanin from Natronobacterium pharaonis [4], a membrane-associated
   copper-binding protein. (This is the first archaebacterial protein known
   to be modified in such a fashion.)

From the precursor sequences of all these proteins, we derived a profile that
starts at the beginning of the sequence and ends after the
post-translationally modified cysteine.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: some 100 prokaryotic proteins.
 Some of them are not membrane lipoproteins, but at least half of them could
 be.

-Note: This profile replaces an obsolete rule. All the information in the
 rule has been encoded in the profile format.
-Last update: October 2006 / Text revised; profiles added; rule deleted.

[ 1] Hayashi S., Wu H.C.
     "Lipoproteins in bacteria."
     J. Bioenerg. Biomembr. 22:451-471(1990).
     PubMed=2202727
[ 2] Klein P., Somorjai R.L., Lau P.C.K.
     "Distinctive properties of signal sequences from bacterial
     lipoproteins."
     Protein Eng. 2:15-20(1988).
     PubMed=3253732
[ 3] von Heijne G.
     Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D.,
     Engelhard M.
     "The primary structure of halocyanin, an archaeal blue copper protein,
     predicts a lipid anchor for membrane fixation."
     J. Biol. Chem. 269:14939-14945(1994).
     PubMed=8195126

{END}
{PDOC00014}
{PS00014; ER_TARGET}
{BEGIN}
********************************************
* Endoplasmic reticulum targeting sequence *
********************************************

Proteins that permanently reside in the lumen of the endoplasmic reticulum
(ER) seem to be distinguished from newly synthesized secretory proteins by
the presence of the C-terminal sequence Lys-Asp-Glu-Leu (KDEL) [1,2]. While
KDEL is the preferred signal in many species, variants of that signal are
used by different species. This situation is described in the following
table.

 Signal  Species
 ----------------------------------------------------------------
 KDEL    Vertebrates, Drosophila, Caenorhabditis elegans, plants
 HDEL    Saccharomyces cerevisiae, Kluyveromyces lactis, plants
 DDEL    Kluyveromyces lactis
 ADEL    Schizosaccharomyces pombe (fission yeast)
 SDEL    Plasmodium falciparum

The signal is usually very strictly conserved in major ER proteins but some
minor ER proteins have divergent sequences (probably because efficient
retention of these proteins is not crucial to the cell).

Proteins bearing the KDEL-type signal are not simply held in the ER, but are
selectively retrieved from a post-ER compartment by a receptor and returned
to their normal location.
The currently known ER luminal proteins are listed below.

 - Protein disulfide-isomerase (PDI) (also known as the beta-subunit of
   prolyl 4-hydroxylase, as a component of oligosaccharyl transferase, as
   glutathione-insulin transhydrogenase and as a thyroid hormone binding
   protein).
 - ERp60, ERp72, and P5, three minor isoforms of PDI.
 - Trypanosoma brucei bloodstream-specific protein 2, a probable PDI.
 - hsp70 related protein GRP78 (also known as the immunoglobulin heavy
   chain binding protein (BiP), and as KAR2, in fungi).
 - hsp90 related protein 'endoplasmin' (also known as GRP94, Erp99 or
   Hsp108).
 - Calreticulin, a calcium-binding protein (also known as calregulin,
   CRP55, or HACBP).
 - ERC-55, a calcium-binding protein.
 - Reticulocalbin, a calcium-binding protein.
 - Hsp47, a heat-shock protein that binds strongly to collagen and could
   act as a chaperone in the collagen biosynthetic pathway.
 - A receptor for a plant hormone, auxin.
 - Thiol proteases from rice bean (SH-EP) and kidney bean (EP-C1).
 - Esterases from mammalian liver and from nematodes.
 - Alpha-2-macroglobulin receptor-associated protein (RAP).
 - Yeast peptidyl-prolyl cis-trans isomerase D (CYPD).
 - Yeast protein KRE5, a protein required for (1->6)-beta-D-glucan
   synthesis.
 - Yeast protein SEC20, required for the transport of proteins from the
   endoplasmic reticulum to the Golgi apparatus.
 - Yeast protein SCJ1, involved in protein sorting.

-Consensus pattern: [KRHQSA]-[DENQ]-E-L>
-Sequences known to belong to this class detected by the pattern: ALL,
 except for liver esterases which have H-[TVI]-E-L.
-Other sequence(s) detected in Swiss-Prot: 24 proteins which are clearly
 not located in the ER (because they are of bacterial or viral origin,
 for example) and a protein which can be considered as a valid
 candidate: human 80KH protein.
-Last update: November 1997 / Text revised.

[ 1] Munro S., Pelham H.R.B.
     "A C-terminal signal prevents secretion of luminal ER proteins."
     Cell 48:899-907(1987).
     PubMed=3545499
[ 2] Pelham H.R.B.
     "The retention signal for soluble proteins of the endoplasmic
     reticulum."
     Trends Biochem. Sci. 15:483-486(1990).
     PubMed=2077689

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00015}
{PS50079; NLS_BP}
{BEGIN}
*************************************************
* Bipartite nuclear localization signal profile *
*************************************************

The uptake of protein by the nucleus is extremely selective and nuclear
proteins must therefore contain within their final structure a signal
that specifies selective accumulation in the nucleus [1,2]. Studies on
some nuclear proteins, such as the large T antigen of SV40, have
indicated which part of the sequence is required for nuclear
translocation. The known nuclear targeting sequences are generally
basic, but there seems to be no clear common denominator between all the
known sequences. Although some consensus sequence patterns have been
proposed (see for example [3]), the current best strategy to detect a
nuclear targeting sequence is based [4] on the following definition of
what is called a 'bipartite nuclear localization signal':

 (1) Two adjacent basic amino acids (Arg or Lys).
 (2) A spacer region of any 10 residues.
 (3) At least three basic residues (Arg or Lys) in the five positions
     after the spacer region.

The profile we developed covers the entire bipartite nuclear localization
signal.

-Sequences known to belong to this class detected by the profile: 56% of
 known nuclear proteins according to [4].
-Other sequence(s) detected in Swiss-Prot: about 4.2% of non-nuclear
 proteins according to [4].

-Note: This profile replaces an obsolete rule. All the information in the
 rule has been encoded in the profile format.
-Last update: October 2006 / Text revised; profiles added; rule deleted.

[ 1] Dingwall C., Laskey R.A.
     "Protein import into the cell nucleus."
     Annu. Rev. Cell Biol. 2:367-390(1986).
     PubMed=3548772; DOI=10.1146/annurev.cb.02.110186.002055
[ 2] Garcia-Bustos J.F., Heitman J., Hall M.N.
     Biochim. Biophys. Acta 1071:83-101(1991).
[ 3] Gomez-Marquez J., Segade F.
     FEBS Lett. 226:217-219(1988).
[ 4] Dingwall C., Laskey R.A.
     "Nuclear targeting sequences -- a consensus?"
     Trends Biochem. Sci. 16:478-481(1991).
     PubMed=1664152

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00016}
{PS00016; RGD}
{BEGIN}
****************************
* Cell attachment sequence *
****************************

The sequence Arg-Gly-Asp, found in fibronectin, is crucial for its
interaction with its cell surface receptor, an integrin [1,2]. What has
been called the 'RGD' tripeptide is also found in the sequences of a
number of other proteins, where it has been shown to play a role in cell
adhesion.
These proteins are: some forms of collagens, fibrinogen, vitronectin,
von Willebrand factor (VWF), snake disintegrins, and slime mold
discoidins. The 'RGD' tripeptide is also found in other proteins where
it may also, but not always, serve the same purpose.

-Consensus pattern: R-G-D
-Last update: December 1991 / Text revised.

[ 1] Ruoslahti E., Pierschbacher M.D.
     "Arg-Gly-Asp: a versatile cell recognition signal."
     Cell 44:517-518(1986).
     PubMed=2418980
[ 2] d'Souza S.E., Ginsberg M.H., Plow E.F.
     Trends Biochem. Sci. 16:246-250(1991).

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00017}
{PS00017; ATP_GTP_A}
{BEGIN}
*****************************************
* ATP/GTP-binding site motif A (P-loop) *
*****************************************

From sequence comparisons and crystallographic data analysis it has been
shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind
ATP or GTP share a number of more or less conserved sequence motifs. The
best conserved of these motifs is a glycine-rich region, which typically
forms a flexible loop between a beta-strand and an alpha-helix. This
loop interacts with one of the phosphate groups of the nucleotide. This
sequence motif is generally referred to as the 'A' consensus sequence
[1] or the 'P-loop' [5].

There are numerous ATP- or GTP-binding proteins in which the P-loop is
found.
We list below a number of protein families for which the relevance of
the presence of such a motif has been noted:

 - ATP synthase alpha and beta subunits (see <PDOC00137>).
 - Myosin heavy chains.
 - Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
 - Dynamins and dynamin-like proteins (see <PDOC00362>).
 - Guanylate kinase (see <PDOC00670>).
 - Thymidine kinase (see <PDOC00524>).
 - Thymidylate kinase (see <PDOC01034>).
 - Shikimate kinase (see <PDOC00868>).
 - Nitrogenase iron protein family (nifH/chlL) (see <PDOC00580>).
 - ATP-binding proteins involved in 'active transport' (ABC transporters)
   [7] (see <PDOC00185>).
 - DNA and RNA helicases [8,9,10].
 - GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
 - Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4,
   etc.).
 - Nuclear protein ran (see <PDOC00859>).
 - ADP-ribosylation factors family (see <PDOC00781>).
 - Bacterial dnaA protein (see <PDOC00771>).
 - Bacterial recA protein (see <PDOC00131>).
 - Bacterial recF protein (see <PDOC00539>).
 - Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0,
   etc.).
 - DNA mismatch repair proteins mutS family (see <PDOC00388>).
 - Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A
number of proteins escape detection because the structure of their
ATP-binding site is completely different from that of the P-loop.
Examples of such proteins are the E1-E2 ATPases or the glycolytic
kinases. In other ATP- or GTP-binding proteins the flexible loop exists
in a slightly different form; this is the case for tubulins or protein
kinases. A special mention must be reserved for adenylate kinase, in
which there is a single deviation from the P-loop pattern: in the last
position Gly is found instead of Ser or Thr.

-Consensus pattern: [AG]-x(4)-G-K-[ST]
-Sequences known to belong to this class detected by the pattern: a
 majority.
-Other sequence(s) detected in Swiss-Prot: in addition to the proteins
 listed above, the 'A' motif is also found in a number of other
 proteins. Most of these proteins probably bind a nucleotide, but others
 are definitely not ATP- or GTP-binding (as for example chymotrypsin, or
 human ferritin light chain).

-Expert(s) to contact by email:
           Koonin E.V.; [email protected]

-Last update: July 1999 / Text revised.

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J.
     "Distantly related sequences in the alpha- and beta-subunits of ATP
     synthase, myosin, kinases and other ATP-requiring enzymes and a
     common nucleotide binding fold."
     EMBO J. 1:945-951(1982).
     PubMed=6329717
[ 2] Moller W., Amons R.
     "Phosphate-binding sequences in nucleotide-binding proteins."
     FEBS Lett. 186:1-7(1985).
     PubMed=2989003
[ 3] Fry D.C., Kuby S.A., Mildvan A.S.
     "ATP-binding site of adenylate kinase: mechanistic implications of
     its homology with ras-encoded p21, F1-ATPase, and other
     nucleotide-binding proteins."
     Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
     PubMed=2869483
[ 4] Dever T.E., Glynias M.J., Merrick W.C.
     "GTP-binding domain: three consensus sequence elements with distinct
     spacing."
     Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
     PubMed=3104905
[ 5] Saraste M., Sibbald P.R., Wittinghofer A.
     "The P-loop -- a common motif in ATP- and GTP-binding proteins."
     Trends Biochem. Sci. 15:430-434(1990).
     PubMed=2126155
[ 6] Koonin E.V.
     "A superfamily of ATPases with diverse functions containing either
     classical or deviant ATP-binding motif."
     J. Mol. Biol. 229:1165-1174(1993).
     PubMed=8445645
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R.,
     Gallagher M.P.
     "Binding protein-dependent transport systems."
     J. Bioenerg. Biomembr. 22:571-592(1990).
     PubMed=2229036
[ 8] Hodgman T.C.
     "A new superfamily of replicative proteins."
     Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
     PubMed=3362205; DOI=10.1038/333022b0
[ 9] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J.,
     Nishi K., Schnier J., Slonimski P.P.
     "Birth of the D-E-A-D box."
     Nature 337:121-122(1989).
     PubMed=2563148; DOI=10.1038/337121a0
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
     Nucleic Acids Res. 17:4713-4730(1989).

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00018}
{PS00018; EF_HAND_1}
{PS50222; EF_HAND_2}
{BEGIN}
********************************************************
* EF-hand calcium-binding domain signature and profile *
********************************************************

Many calcium-binding proteins belong to the same evolutionary family and
share a type of calcium-binding domain known as the EF-hand [1 to 5].
This type of domain consists of a twelve residue loop flanked on both
sides by a twelve residue alpha-helical domain (see <PDB:1CLL>). In an
EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal
configuration. The six residues involved in the binding are in positions
1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and
-Z. The invariant Glu or Asp at position 12 provides two oxygens for
liganding Ca (bidentate ligand).

The basic structural/functional unit of EF-hand proteins is usually a
pair of EF-hand motifs that together form a stable four-helix bundle
domain. The pairing of EF-hands enables cooperativity in the binding of
Ca2+ ions.
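The loop numbering described above can be made concrete with a small
Python sketch. It maps the six coordinating positions (1, 3, 5, 7, 9 and
12) of a twelve-residue loop to their labels and checks that position 12
holds Glu or Asp. The function names and the example loop string are
assumptions chosen for illustration; they are not taken from this entry,
and the sketch does not implement the PROSITE pattern or profile.

```python
# 1-based loop positions and their coordination labels, as given in the text.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(loop):
    """Map each coordination label to the residue at its loop position."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to have 12 residues")
    return {label: loop[pos - 1] for pos, label in LIGAND_POSITIONS.items()}


def has_bidentate_ligand(loop):
    """True if position 12 (-Z) is Glu or Asp, the bidentate ligand."""
    return ef_hand_ligands(loop)["-Z"] in ("D", "E")


if __name__ == "__main__":
    example_loop = "DKDGDGTITTKE"  # illustrative 12-residue loop (assumption)
    print(ef_hand_ligands(example_loop))
    print(has_bidentate_ligand(example_loop))
```

Keeping the position-to-label table as data rather than as hard-coded
indices makes it easy to compare against the numbering used in the text.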
We list below the proteins which are known to contain EF-hand regions.
For each type of protein we have indicated in parentheses the total
number of EF-hand regions known or supposed to exist. This number does
not include regions which clearly have lost their calcium-binding
properties, or the atypical low-affinity site (which spans thirteen
residues) found in the S-100/ICaBP family of proteins [6].

 - Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
 - Alpha actinin (Ca=2).
 - Calbindin (Ca=4).
 - Calcineurin B subunit (protein phosphatase 2B regulatory subunit)
   (Ca=4).
 - Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
 - Calcium-binding protein from Schistosoma mansoni (Ca=2?).
 - Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena
   thermophila (Ca=4?).
 - Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
 - Calcium vector protein from amphioxus (Ca=2).
 - Calcyphosin (thyroid protein p24) (Ca=4?).
 - Calmodulin (Ca=4, except in yeast where Ca=3).
 - Calpain small and large chains (Ca=2).
 - Calretinin (Ca=6).
 - Calcyclin (prolactin receptor associated protein) (Ca=2).
 - Caltractin (centrin) (Ca=2 or 4).
 - Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
 - Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
 - FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from
   mammals (Ca=1).
 - Fimbrin (plastin) (Ca=2).
 - Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1
   or 2).
 - Guanylate cyclase activating protein (GCAP) (Ca=3).
 - Inositol phospholipid-specific phospholipase C isozymes gamma-1 and
   delta-1 (Ca=2) [10].
 - Intestinal calcium-binding protein (ICaBPs) (Ca=2).
 - MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
 - Myosin regulatory light chains (Ca=1).
 - Oncomodulin (Ca=2).
 - Osteonectin (basement membrane protein BM-40) (SPARC) and proteins
   that contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1)
   (see the entry <PDOC00535>) (Ca=1).
 - Parvalbumins alpha and beta (Ca=2).
 - Placental calcium-binding protein (18a2) (nerve growth factor induced
   protein 42a) (p9k) (Ca=2).
 - Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
 - Reticulocalbin (Ca=4).
 - S-100 protein, alpha and beta chains (Ca=2).
 - Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
 - Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
 - Serine/threonine specific protein phosphatase rdgc (EC 3.1.3.16) from
   Drosophila (Ca=2).
 - Sorcin V19 from hamster (Ca=2).
 - Spectrin alpha chain (Ca=2).
 - Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
 - Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3),
   from arthropods and molluscs (Ca=2).

There have been a number of attempts [7,8] to develop patterns that pick
up EF-hand regions, but these studies were made a few years ago when not
so many different families of calcium-binding proteins were known. We
therefore developed a new pattern which takes into account all published
sequences. This pattern includes the complete EF-hand loop as well as
the first residue which follows the loop and which seems to always be
hydrophobic. We also developed a profile that covers the loop and the
two alpha helices.

-Consensus pattern: D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-
                    [DENQSTAGC]-x(2)-[DE]-[LIVMFYW]
-Sequences known to belong to this class detected by the pattern: ALL,
 except for a few sequences.
-Other sequence(s) detected in Swiss-Prot: proteins which are probably
 not calcium-binding and a few proteins for which we have reason to
 believe that they bind calcium: a number of endoglucanases and a
 xylanase from the cellulosome complex of Clostridium [9].

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Note: Positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.
-Note: The 6th residue in an EF-hand loop is, in most cases, a Gly, but
 the number of exceptions to this 'rule' has gradually increased and we
 felt that the pattern should include all the different residues which
 have been shown to exist in this position in functional Ca-binding
 sites.
-Note: The pattern will, in some cases, miss one of the EF-hand regions
 in some proteins with multiple EF-hand domains.

-Expert(s) to contact by email:
           Cox J.A.; [email protected]
           Kretsinger R.H.; [email protected]

-Last update: April 2006 / Pattern revised.

[ 1] Kawasaki H., Kretsinger R.H.
     "Calcium-binding proteins 1: EF-hands."
     Protein Prof. 2:305-490(1995).
     PubMed=7553064
[ 2] Kretsinger R.H.
     "Calcium coordination and the calmodulin fold: divergent versus
     convergent evolution."
     Cold Spring Harb. Symp. Quant. Biol. 52:499-510(1987).
     PubMed=3454274
[ 3] Moncrief N.D., Kretsinger R.H., Goodman M.
     "Evolution of EF-hand calcium-modulated proteins. I. Relationships
     based on amino acid sequences."
     J. Mol. Evol. 30:522-562(1990).
     PubMed=2115931
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H.
     "Evolution of EF-hand calcium-modulated proteins. II. Domains of
     several subfamilies have diverse evolutionary histories."
     J. Mol. Evol. 34:416-448(1992).
     PubMed=1602495
[ 5] Heizmann C.W., Hunziker W.
     "Intracellular calcium-binding proteins: more sites than insights."
     Trends Biochem. Sci. 16:98-103(1991).
     PubMed=2058003
[ 6] Kligman D., Hilt D.C.
     "The S100 protein family."
     Trends Biochem. Sci. 13:437-443(1988).
     PubMed=3075365
[ 7] Strynadka N.C.J., James M.N.
     "Crystal structures of the helix-loop-helix calcium-binding
     proteins."
     Annu. Rev. Biochem. 58:951-998(1989).
     PubMed=2673026; DOI=10.1146/annurev.bi.58.070189.004511
[ 8] Haiech J., Sallantin J.
     "Computer search of calcium binding sites in a gene data bank: use
     of learning techniques to build an expert system."
     Biochimie 67:555-560(1985).
     PubMed=3839696
[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A.,
     Wood T.M., Bairoch A.
     "Calcium-binding affinity and calcium-enhanced activity of
     Clostridium thermocellum endoglucanase D."
     Biochem. J. 265:261-265(1990).
     PubMed=2302168
[10] Bairoch A., Cox J.A.
     "EF-hand motifs in inositol phospholipid-specific phospholipase C."
     FEBS Lett. 269:454-456(1990).
     PubMed=2401372

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00019}
{PS00019; ACTININ_1}
{PS00020; ACTININ_2}
{BEGIN}
************************************************
* Actinin-type actin-binding domain signatures *
************************************************

Alpha-actinin is an F-actin cross-linking protein which is thought to
anchor actin to a variety of intracellular structures [1]. The
actin-binding domain of alpha-actinin seems to reside in the first 250
residues of the protein. A similar actin-binding domain has been found
in the N-terminal region of many different actin-binding proteins [2,3]:

 - In the beta chain of spectrin (or fodrin).
 - In dystrophin, the protein defective in Duchenne muscular dystrophy
   (DMD) and which may play a role in anchoring the cytoskeleton to the
   plasma membrane.
 - In the slime mold gelation factor (or ABP-120).
 - In actin-binding protein ABP-280 (or filamin), a protein that links
   actin filaments to membrane glycoproteins.
 - In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs
   from the above proteins in that it contains two tandem copies of the
   actin-binding domain and that these copies are located in the
   C-terminal part of the protein.
We selected two conserved regions as signature patterns for this type of
domain. The first of these regions is located at the beginning of the
domain, while the second one is located in the central section and has
been shown to be essential for the binding of actin.

-Consensus pattern: [EQ]-{LNYH}-x-[ATV]-[FY]-{LDAM}-{T}-W-{PG}-N
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 32.

-Consensus pattern: [LIVM]-x-[SGNL]-[LIVMN]-[DAGHENRS]-[SAGPNVT]-x-
                    [DNEAG]-[LIVM]-x-[DEAGQ]-x(4)-[LIVM]-x-[LM]-[SAG]-
                    [LIVM]-[LIVMT]-[WS]-x(0,1)-[LIVM](2)
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: April 2006 / Patterns revised.

[ 1] Schleicher M., Andre E., Hartmann H., Noegel A.A.
     "Actin-binding proteins are conserved from slime molds to man."
     Dev. Genet. 9:521-530(1988).
     PubMed=3243032
[ 2] Matsudaira P.
     "Modular organization of actin crosslinking proteins."
     Trends Biochem. Sci. 16:87-92(1991).
     PubMed=2058002
[ 3] Dubreuil R.R.
     "Structure and evolution of the actin crosslinking proteins."
     BioEssays 13:219-226(1991).
     PubMed=1892474

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00020}
{PS00021; KRINGLE_1}
{PS50070; KRINGLE_2}
{BEGIN}
****************************************
* Kringle domain signature and profile *
****************************************

Kringles [1,2,3] are triple-looped, disulfide cross-linked domains found
in a varying number of copies, in some serine proteases and plasma
proteins. The kringle domain has been found in the following proteins:

 - Apolipoprotein A (38 copies).
 - Blood coagulation factor XII (Hageman factor) (1 copy).
 - Hepatocyte growth factor (HGF) (4 copies).
 - Hepatocyte growth factor like protein (4 copies) [4].
 - Hepatocyte growth factor activator (1 copy) [5].
 - Plasminogen (5 copies).
 - Thrombin (2 copies).
 - Tissue plasminogen activator (TPA) (2 copies).
 - Urokinase-type plasminogen activator (1 copy).

The schematic representation of the structure of a typical kringle
domain is shown below:

 +---------------------------------------+
 |                                       |
xCxxxxxxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxCxxxCx
             |          |     |      |
             +----------|-----+      |
                        +------------+

'C': conserved cysteine involved in a disulfide bond.

Kringle domains are thought to play a role in binding mediators, such as
membranes, other proteins or phospholipids, and in the regulation of
proteolytic activity.

As a signature pattern for this type of domain, we selected a conserved
sequence that contains two of the cysteines involved in disulfide bonds.

-Consensus pattern: [FY]-C-[RH]-[NS]-x(7,8)-[WY]-C
                    [The 2 C's are involved in a disulfide bond]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 5.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Expert(s) to contact by email:
           Ikeo K.; [email protected]

-Last update: May 2004 / Text revised.

[ 1] Castellino F.J., Beals J.M.
     "The genetic relationships between the kringle domains of human
     plasminogen, prothrombin, tissue plasminogen activator, urokinase,
     and coagulation factor XII."
     J. Mol. Evol. 26:358-369(1987).
     PubMed=3131537
[ 2] Patthy L.
     "Evolution of the proteases of blood coagulation and fibrinolysis by
     assembly from modules."
     Cell 41:657-663(1985).
     PubMed=3891096
[ 3] Ikeo K., Takahashi K., Gojobori T.
     "Evolutionary origin of numerous kringles in human and simian
     apolipoprotein(a)."
     FEBS Lett. 287:146-148(1991).
     PubMed=1879523
[ 4] Friezner Degen S.J., Stuart L.A., Han S., Jamison C.S.
     Biochemistry 30:9781-9791(1991).
[ 5] Miyazawa K., Shimomura T., Kitamura A., Kondo J., Morimoto Y.,
     Kitamura N.
     "Molecular cloning and sequence analysis of the cDNA for a human
     serine protease responsible for activation of hepatocyte growth
     factor. Structural similarity of the protease precursor to blood
     coagulation factor XII."
     J. Biol. Chem. 268:10024-10028(1993).
     PubMed=7683665

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00021}
{PS00022; EGF_1}
{PS01186; EGF_2}
{PS50026; EGF_3}
{BEGIN}
******************************************
* EGF-like domain signatures and profile *
******************************************

A sequence of about thirty to forty amino-acid residues found in the
sequence of epidermal growth factor (EGF) has been shown [1 to 6] to be
present, in a more or less conserved form, in a large number of other,
mostly animal proteins.
EGF is a polypeptide of about 50 amino acids with three internal
disulfide bridges. It first binds with high affinity to specific
cell-surface receptors and then induces their dimerization, which is
essential for activating the tyrosine kinase in the receptor cytoplasmic
domain, initiating a signal transduction that results in DNA synthesis
and cell proliferation.

A common feature of all EGF-like domains is that they are found in the
extracellular domain of membrane-bound proteins or in proteins known to
be secreted (exception: prostaglandin G/H synthase). The EGF-like domain
includes six cysteine residues which have been shown to be involved in
disulfide bonds. The structure of several EGF-like domains has been
solved. The fold consists of a two-stranded beta-sheet followed by a
loop to a C-terminal short two-stranded sheet (see <PDB:1EGF>).

Subdomains between the conserved cysteines strongly vary in length as
shown in the following schematic representation of the EGF-like domain:

               +-------------------+        +-------------------------+
               |                   |        |                         |
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
     |                   |         ************************************
     +-------------------+

'C': conserved cysteine involved in a disulfide bond.
'G': often conserved glycine
'a': often conserved aromatic amino acid
'*': position of both patterns.
'x': any residue

Some proteins known to contain one or more copies of an EGF-like domain
are listed below.

 - Adipocyte differentiation inhibitor (gene PREF-1) from mouse (6
   copies).
 - Agrin, a basal lamina protein that causes the aggregation of
   acetylcholine receptors on cultured muscle fibers (4 copies).
 - Amphiregulin, a growth factor (1 copy).
 - Betacellulin, a growth factor (1 copy).
 - Blastula proteins BP10 and Span from sea urchin which are thought to
   be involved in pattern formation (1 copy).
 - BM86, a glycoprotein antigen of cattle tick (7 copies).
 - Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage
   and bone formation and which expresses metalloendopeptidase activity
   (1-2 copies). Homologous proteins are found in sea urchin - suBMP (1
   copy) - and in Drosophila - the dorsal-ventral patterning protein
   tolloid (2 copies).
 - Caenorhabditis elegans developmental proteins lin-12 (13 copies) and
   glp-1 (10 copies).
 - Caenorhabditis elegans apx-1 protein, a patterning protein (4.5
   copies).
 - Calcium-dependent serine proteinase (CASP) which degrades the
   extracellular matrix proteins type I and IV collagen and fibronectin
   (1 copy).
 - Cartilage matrix protein CMP (1 copy).
 - Cartilage oligomeric matrix protein COMP (4 copies).
 - Cell surface antigen 114/A10 (3 copies).
 - Cell surface glycoprotein complex transmembrane subunit ASGP-2 from
   rat (2 copies).
 - Coagulation associated proteins C, Z (2 copies) and S (4 copies).
 - Coagulation factors VII, IX, X and XII (2 copies).
 - Complement C1r components (1 copy).
 - Complement C1s components (1 copy).
 - Complement-activating component of Ra-reactive factor (RARF) (1 copy).
 - Complement components C6, C7, C8 alpha and beta chains, and C9 (1
   copy).
 - Crumbs, an epithelial development protein from Drosophila (29 copies).
 - Epidermal growth factor precursor (7-9 copies).
 - Exogastrula-inducing peptides A, C, D and X from sea urchin (1 copy).
 - Fat protein, a Drosophila cadherin-related tumor suppressor (5
   copies).
 - Fetal antigen 1, a probable neuroendocrine differentiation protein,
   which is derived from the delta-like protein (DLK) (6 copies).
 - Fibrillin 1 (47 copies) and fibrillin 2 (14 copies).
 - Fibropellins IA (21 copies), IB (13 copies), IC (8 copies), II (4
   copies) and III (8 copies) from the apical lamina - a component of the
   extracellular matrix - of sea urchin.
 - Fibulin-1 and -2, two extracellular matrix proteins (9-11 copies).
 - Giant-lens protein (protein Argos), which regulates cell determination
   and axon guidance in the Drosophila eye (1 copy).
- Growth factor-related proteins from various poxviruses (1 copy). - Gurken protein, a Drosophila developmental protein (1 copy). - Heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor alpha (TGF-alpha), growth factors Lin-3 and Spitz (1 copy); the precursors are membrane proteins, the mature form is located extracellularly. - Hepatocyte growth factor (HGF) activator (EC 3.4.21.-) (2 copies). - LDL and VLDL receptors, which bind and transport low-density lipoproteins and very low-density lipoproteins (3 copies). - LDL receptor-related protein (LRP), which may act as a receptor for endocytosis of extracellular ligands (22 copies). - Leucocyte antigen CD97 (3 copies), cell surface glycoprotein EMR1 (6 copies) and cell surface glycoprotein F4/80 (7 copies). - Limulus clotting factor C, which is involved in hemostasis and host defense mechanisms in Japanese horseshoe crab (1 copy). - Meprin A alpha subunit, a mammalian membrane-bound endopeptidase (1 copy). - Milk fat globule-EGF factor 8 (MFG-E8) from mouse (2 copies). - Neuregulin GGF-I and GGF-II, two human glial growth factors (1 copy). - Neurexins from mammals (3 copies). - Neurogenic proteins Notch, Xotch and the human homolog Tan-1 (36 copies), Delta (9 copies) and the similar differentiation proteins Lag-2 from Caenorhabditis elegans (2 copies), Serrate (14 copies) and Slit (7 copies) from Drosophila. - Nidogen (also called entactin), a basement membrane protein from chordates (2-6 copies). - Ookinete surface proteins (24 Kd, 25 Kd, 28 Kd) from Plasmodium (4 copies). - Pancreatic secretory granule membrane major glycoprotein GP2 (1 copy). - Perforin, which lyses non-specifically a variety of target cells (1 copy). - Proteoglycans aggrecan (1 copy), versican (2 copies), perlecan (at least 2 copies), brevican (1 copy) and chondroitin sulfate proteoglycan (gene PG-M) (2 copies). - Prostaglandin G/H synthase 1 and 2 (EC 1.14.99.1) (1 copy), which are found in the endoplasmic reticulum.
- Reelin, an extracellular matrix protein that plays a role in layering of neurons in the cerebral cortex and cerebellum of mammals (8 copies). - S1-5, a human extracellular protein whose ultimate activity is probably modulated by the environment (5 copies). - Schwannoma-derived growth factor (SDGF), an autocrine growth factor as well as a mitogen for different target cells (1 copy). - Selectins. Cell adhesion proteins such as ELAM-1 (E-selectin), GMP-140 (P-selectin), or the lymph-node homing receptor (L-selectin) (1 copy). - Serine/threonine-protein kinase homolog (gene Pro25) from Arabidopsis thaliana, which may be involved in assembly or regulation of light-harvesting chlorophyll A/B protein (2 copies). - Sperm-egg fusion proteins PH-30 alpha and beta from guinea pig (1 copy). - Stromal cell derived protein-1 (SCP-1) from mouse (6 copies). - TDGF-1, human teratocarcinoma-derived growth factor 1 (1 copy). - Tenascin (or neuronectin), an extracellular matrix protein from mammals (14.5 copies), chicken (TEN-A) (13.5 copies) and the related proteins human tenascin-X (18 copies) and tenascin-like proteins TEN-A and TEN-M from Drosophila (8 copies). - Thrombomodulin (fetomodulin), which together with thrombin activates protein C (6 copies). - Thrombospondin 1, 2 (3 copies), 3 and 4 (4 copies), adhesive glycoproteins that mediate cell-to-cell and cell-to-matrix interactions. - Thyroid peroxidase 1 and 2 (EC 1.11.1.8) from human (1 copy). - Transforming growth factor beta-1 binding protein (TGF-B1-BP) (16 or 18 copies). - Tyrosine-protein kinase receptors Tek and Tie (EC 2.7.1.112) (3 copies). - Urokinase-type plasminogen activator (EC 3.4.21.73) (UPA) and tissue plasminogen activator (EC 3.4.21.68) (TPA) (1 copy). - Uromodulin (Tamm-Horsfall urinary glycoprotein) (THP) (3 copies). - Vitamin K-dependent anticoagulants protein C (2 copies) and protein S (4 copies) and the similar protein Z, a single-chain plasma glycoprotein of unknown function (2 copies).
- 63 Kd sperm flagellar membrane protein from sea urchin (3 copies). - 93 Kd protein (gene nel) from chicken (5 copies). - Hypothetical 337.6 Kd protein T20G5.3 from Caenorhabditis elegans (44 copies). The region between the 5th and 6th cysteine contains two conserved glycines of which at least one is present in most EGF-like domains. We created two patterns for this domain, each including one of these C-terminal conserved glycine residues. The profile we developed covers the whole domain. -Consensus pattern: C-x-C-x(2)-{V}-x(2)-G-{C}-x-C [The 3 C's are involved in disulfide bonds] -Sequences known to belong to this class detected by the pattern: ALL, but not those that have very long or very short regions between the last 3 conserved cysteines of their EGF-like domain(s). -Other sequence(s) detected in Swiss-Prot: 87 proteins, of which 27 can be considered as possible candidates. -Consensus pattern: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C [The 3 C's are involved in disulfide bonds] -Sequences known to belong to this class detected by the pattern: ALL, but not those that have very long or very short regions between the last 3 conserved cysteines of their EGF-like domain(s). -Other sequence(s) detected in Swiss-Prot: 83 proteins, of which 49 can be considered as possible candidates. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Note: The beta chain of the integrin family of proteins contains 2 cysteine-rich repeats which were said to be dissimilar to the EGF pattern [7]. -Note: Laminin EGF-like repeats (see <PDOC00961>) are longer than the average EGF module and contain a further disulfide bond C-terminal of the EGF-like region. Perlecan and agrin contain both EGF-like domains and laminin-type EGF-like domains. -Note: The patterns do not detect all of the repeats of proteins with multiple EGF-like repeats.
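-Note: As an illustration only (this sketch is not part of PROSITE), the two EGF-like consensus patterns given above can be tried out by translating them into regular expressions. The Python sketch below assumes the usual pattern conventions: 'x' for any residue, '[...]' for allowed residues, '{...}' for excluded residues, and '(n)' or '(n,m)' for repeats. The function name prosite_to_regex and the test peptide TOY are hypothetical and were made up for this example; TOY is not a real protein sequence.

```python
import re

# One pattern element: optional '<' anchor, a residue / 'x' / [set] / {excluded set},
# an optional repeat count '(n)' or '(n,m)', and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


# The two consensus patterns quoted in this entry.
EGF_1 = "C-x-C-x(2)-{V}-x(2)-G-{C}-x-C"
EGF_2 = "C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C"

# Hypothetical toy peptide, invented for the example.
TOY = "MKCQCPPGYSGKRCAA"

for name, pat in (("EGF_1", EGF_1), ("EGF_2", EGF_2)):
    regex = prosite_to_regex(pat)
    hit = re.search(regex, TOY)
    print(name, regex, hit.span() if hit else None)
```

For a protein with several EGF-like repeats, re.finditer could be used in place of re.search to list successive non-overlapping hits; a repeat whose spacing falls outside the pattern would still be missed.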
-Note: See <PDOC00913> for an entry describing specifically the subset of EGF-like domains that bind calcium. -Last update: April 2006 / Pattern revised. [ 1] Davis C.G. "The many faces of epidermal growth factor repeats." New Biol. 2:410-419(1990). PubMed=2288911 [ 2] Blomquist M.C., Hunt L.T., Barker W.C. "Vaccinia virus 19-kilodalton protein: relationship to several mammalian proteins, including two growth factors." Proc. Natl. Acad. Sci. U.S.A. 81:7363-7367(1984). PubMed=6334307 [ 3] Barker W.C., Johnson G.C., Hunt L.T., George D.G. Protein Nucl. Acid Enz. 29:54-68(1986). [ 4] Doolittle R.F., Feng D.F., Johnson M.S. "Computer-based characterization of epidermal growth factor precursor." Nature 307:558-560(1984). PubMed=6607417 [ 5] Appella E., Weber I.T., Blasi F. "Structure and function of epidermal growth factor-like regions in proteins." FEBS Lett. 231:1-4(1988). PubMed=3282918 [ 6] Campbell I.D., Bork P. Curr. Opin. Struct. Biol. 3:385-392(1993). [ 7] Tamkun J.W., DeSimone D.W., Fonda D., Patel R.S., Buck C., Horwitz A.F., Hynes R.O. "Structure of integrin, a glycoprotein involved in the transmembrane linkage between fibronectin and actin." Cell 46:271-282(1986). PubMed=3487386 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+ {END} {PDOC00022} {PS00023; FN2_1} {PS51092; FN2_2} {BEGIN} ********************************************************************* * Fibronectin type-II collagen-binding domain signature and profile * ********************************************************************* Fibronectin is a plasma protein that binds cell surfaces and various compounds including collagen, fibrin, heparin, DNA, and actin. The major part of the sequence of fibronectin consists of the repetition of three types of domains, which are called type I, II, and III [1]. The type II domain (FN2) is approximately 40 residues long, contains four conserved cysteines involved in disulfide bonds and is part of the collagen-binding region of fibronectin [2]. In fibronectin the minimal collagen-binding region is formed by one FN1 and two FN2 domains. This suggests that the collagen-binding site spans multiple modules. A schematic representation of the position of the invariant residues and the topology of the disulfide bonds in the FN2 domain is shown below.

                    +----------------------+
                    |                      |
    xxCxxPFx#xxxxxxxCxxxxxxxxWCxxxxx#xxx#x#Cxx
      |                       |
      +-----------------------+

'C': conserved cysteine involved in a disulfide bond.
'#': large hydrophobic residue.

The 3D-structure of the FN2 domain has been determined (see <PDB:2FN2>) [3]. The structure consists of two double-stranded anti-parallel beta-sheets, oriented approximately perpendicular to each other, and two irregular loops, one separating the two beta-sheets and the other between the two strands of the second beta-sheet. The minimal collagen-binding region (FN1-FN2-FN2) adopts a hairpin structure where the conserved aromatic residues of FN2 form a hydrophobic pocket which is thought to provide a binding site for nonpolar residues in collagen [4]. Some proteins that contain an FN2 domain are listed below: - Blood coagulation factor XII (Hageman factor) (1 copy).
- Bovine seminal plasma proteins PDC-109 (BSP-A1/A2) and BSP-A3 [5] (twice). - Cation-independent mannose-6-phosphate receptor (which is also the insulin-like growth factor II receptor) [6] (1 copy). - Mannose receptor of macrophages [7] (1 copy). - 180 Kd secretory phospholipase A2 receptor (1 copy) [8]. - DEC-205 receptor (1 copy) [9]. - 72 Kd and 92 Kd type IV collagenases (EC 3.4.24.24) (MMP-2 and MMP-9) [10] (3 copies). Both metalloproteinases are strongly expressed in malignant tumors and have been implicated in metastasis. They both degrade collagen IV, thus facilitating penetration of the basement membranes by tumor cells. - Hepatocyte growth factor activator [11] (1 copy). Our consensus pattern spans the domain between the first and the last conserved cysteine. We also developed a profile that covers the whole domain. -Consensus pattern: C-x(2)-P-F-x-[FYWIV]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-[FYWI]-C [The 4 C's are involved in disulfide bonds] -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Last update: March 2005 / Text revised; profile added. [ 1] Skorstengaard K., Jensen M.S., Sahl P., Petersen T.E., Magnusson S. "Complete primary structure of bovine plasma fibronectin." Eur. J. Biochem. 161:441-453(1986). PubMed=3780752 [ 2] Forastieri H., Ingham K.C. "Interaction of gelatin with a fluorescein-labeled 42-kDa chymotryptic fragment of fibronectin." J. Biol. Chem. 260:10546-10550(1985). PubMed=3928622 [ 3] Pickford A.R., Potts J.R., Bright J.R., Phan I., Campbell I.D. "Solution structure of a type 2 module from fibronectin: implications for the structure and function of the gelatin-binding domain." Structure 5:359-370(1997). PubMed=9083105 [ 4] Pickford A.R., Smith S.P., Staunton D., Boyd J., Campbell I.D.
"The hairpin structure of the (6)F1(1)F2(2)F2 fragment from human fibronectin enhances gelatin binding." EMBO J. 20:1519-1529(2001). PubMed=11285216; DOI=10.1093/emboj/20.7.1519 [ 5] Seidah N.G., Manjunath P., Rochemont J., Sairam M.R., Chretien M. "Complete amino acid sequence of BSP-A3 from bovine seminal plasma. Homology to PDC-109 and to the collagen-binding domain of fibronectin." Biochem. J. 243:195-203(1987). PubMed=3606570 [ 6] Kornfeld S. "Structure and function of the mannose 6-phosphate/insulinlike growth factor II receptors." Annu. Rev. Biochem. 61:307-330(1992). PubMed=1323236; DOI=10.1146/annurev.bi.61.070192.001515 [ 7] Taylor M.E., Conary J.T., Lennartz M.R., Stahl P.D., Drickamer K. "Primary structure of the mannose receptor contains multiple motifs resembling carbohydrate-recognition domains." J. Biol. Chem. 265:12156-12162(1990). PubMed=2373685 [ 8] Lambeau G., Ancian P., Barhanin J., Lazdunski M. "Cloning and expression of a membrane receptor for secretory phospholipases A2." J. Biol. Chem. 269:1575-1578(1994). PubMed=8294398 [ 9] Jiang W., Swiggard W.J., Heufler C., Peng M., Mirza A., Steinman R.M., Nussenzweig M.C. "The receptor DEC-205 expressed by dendritic cells and thymic epithelial cells is involved in antigen processing." Nature 375:151-155(1995). PubMed=7753172; DOI=10.1038/375151a0 [10] Collier I.E., Wilhelm S.M., Eisen A.Z., Marmer B.L., Grant G.A., Seltzer J.L., Kronberger A., He C., Bauer E.A., Goldberg G.I. J. Biol. Chem. 263:6579-6587(1988). [11] Miyazawa K., Shimomura T., Kitamura A., Kondo J., Morimoto Y., Kitamura N. J. Biol. Chem. 268:10024-10028(1993). +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. 
For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00023} {PS00024; HEMOPEXIN} {BEGIN} ****************************** * Hemopexin domain signature * ****************************** Hemopexin is a serum glycoprotein that binds heme and transports it to the liver for breakdown and iron recovery, after which the free hemopexin returns to the circulation. Structurally, hemopexin consists of two similar halves of approximately two hundred amino acid residues connected by a histidine-rich hinge region. Each half is itself formed by the repetition of a basic unit of some 35 to 45 residues. Hemopexin-like domains have been found [1,2] in two other types of proteins: - In vitronectin, a cell adhesion and spreading factor found in plasma and tissues. Vitronectin, like hemopexin, has two hemopexin-like domains. - In most members of the matrix metalloproteinases family (matrixins) (see <PDOC00129>): MMP-1, MMP-2, MMP-3, MMP-8, MMP-9, MMP-10, MMP-11, MMP-12, MMP-13, MMP-14, MMP-15, MMP-16, MMP-17, MMP-18, MMP-19, MMP-20, MMP-24, and MMP-25. These zinc endoproteases have a single hemopexin-like domain in their C-terminal section. It is suggested that the hemopexin domain facilitates binding to a variety of molecules and proteins. The signature pattern for this type of domain has been derived from the best conserved region which is located at the beginning of the second repeat. -Consensus pattern: [LIFAT]-{IL}-x(2)-W-x(2,3)-[PE]-x-{VF}-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY] -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: 11. -Last update: April 2006 / Pattern revised. [ 1] Hunt L.T., Barker W.C., Chen H.R. Protein Seq. Data Anal. 1:21-26(1987). [ 2] Stanley K.K. "Homology with hemopexin suggests a possible scavenging function for S-protein/vitronectin."
FEBS Lett. 199:249-253(1986). PubMed=2422056 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00024} {PS00025; P_TREFOIL_1} {PS51448; P_TREFOIL_2} {BEGIN} *************************************************** * P-type ('Trefoil') domain signature and profile * *************************************************** A cysteine-rich domain of approximately forty five amino-acid residues has been found in some extracellular eukaryotic proteins [1,2,3,4,5]. This domain is known as either the 'P', 'trefoil' or 'TFF' domain. It contains six cysteines that are linked by three disulfide bonds in a 1-5, 2-4, and 3-6 configuration. This leads to a characteristic three leafed structure ('trefoil'). The P-type domain is clearly composed of three looplike regions. The central core of the domain consists of a short two-stranded antiparallel beta-sheet, which is capped by an irregular loop and forms a central hairpin (loop 3). The beta-sheet is preceded by a short alpha-helix, with majority of the remainder of the domain contained in two loops, which lie on either side of the central hairpin (see <PDB:1E9T>) [6]. Proteins known to contain this domain are: - Protein pS2 (TFF1), a protein secreted by the stomach mucosa, whose gene is induced by estrogen. The exact function of pS2 is not known. It is a protein of about 65 residues and it contains a copy of the 'P' domain. 
- Spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion. SP could be a growth factor. It contains two tandem copies of the 'P' domain. - Intestinal trefoil factor (ITF) (TFF3), an intestinal protein of about 60 residues which may have a role in promoting cell migration. It contains a copy of the 'P' domain. - Xenopus stomach proteins xP1 (one 'P' domain) and xP4 (four 'P' domains). - Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1 (FIM-C.1). These proteins could be involved in defense against microbial infections by protecting the epithelia from the external environment. They are large proteins (400 residues for A.1; more than 660 residues for C.1 whose sequence is only partially known) that contain multiple copies of the 'P' domain interspersed with tandem repeats of threonine-rich, O-glycosylated regions. - Xenopus skin protein xp2 (or APEG), a protein that contains two 'P' domains and which exists in two alternatively spliced forms that differ by the inclusion of an N-terminal region of 320 residues that consists of 33 tandem repeats of a G-[GE]-[AP](2,4)-A-E motif. - Zona pellucida sperm-binding protein B (ZP-B) (also known as ZP-X in rabbit and ZP-3 alpha in pig). This protein is a receptor-like glycoprotein whose extracellular region contains a 'P' domain followed by a ZP domain (see <PDOC00577>). - Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10), a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose (see <PDOC00120>). - Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase), a vertebrate extracellular glycosidase (see <PDOC00120>). Structurally the P-type domain can be represented as shown below.

      +-------------------------+
      |         +--------------+|
      |         |              ||
    xxCxxxxxx+xxCG#xxxxxxxCxxxxCC#xxxxxxxxWC#xxxxxxxx
             *************|*******         |
                          |                |
                          +----------------+

'C': conserved cysteine involved in a disulfide bond.
'#': large hydrophobic residue.
'+': positively charged residue.
'*': position of the pattern.

-Consensus pattern: [KRH]-x(2)-C-x-[FYPSTV]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH] [The 4 C's are involved in disulfide bonds] -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Expert(s) to contact by email: Hoffmann W.; [email protected] -Last update: May 2009 / Text revised; profile added. [ 1] Hoffmann W., Hauser F. "The P-domain or trefoil motif: a role in renewal and pathology of mucous epithelia?" Trends Biochem. Sci. 18:239-243(1993). PubMed=8267796 [ 2] Otto B., Wright N. "Trefoil peptides. Coming up clover." Curr. Biol. 4:835-838(1994). PubMed=7820556 [ 3] Bork P. "A trefoil domain in the major rabbit zona pellucida protein." Protein Sci. 2:669-670(1993). PubMed=8518738 [ 4] Wright N.A., Hoffmann W., Otto W.R., Rio M.-C., Thim L. "Rolling in the clover: trefoil factor family (TFF)-domain peptides, cell migration and cancer." FEBS Lett. 408:121-123(1997). PubMed=9187350 [ 5] Sommer P., Blin N., Goett P. "Tracing the evolutionary origin of the TFF-domain, an ancient motif at mucous surfaces." Gene 236:133-136(1999). PubMed=10433974 [ 6] Lemercinier X., Muskett F.W., Cheeseman B., McIntosh P.B., Thim L., Carr M.D. "High-resolution solution structure of human intestinal trefoil factor and functional insights from detailed structural comparisons with the other members of the trefoil family of mammalian cell motility factors." Biochemistry 40:9552-9559(2001). PubMed=11583154 +-----------------------------------------------------------------------+ PROSITE is copyright.
It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00025} {PS00026; CHIT_BIND_I_1} {PS50941; CHIT_BIND_I_2} {BEGIN} ****************************************************** * Chitin-binding type-1 domain signature and profile * ****************************************************** Many plants respond to pathogenic attack by producing defense proteins that are capable of reversible binding to chitin, an N-acetylglucosamine polysaccharide present in the cell wall of fungi and the exoskeleton of insects. Most of these chitin-binding proteins include a common structural motif of 30 to 43 residues organized around a conserved four-disulfide core, known as the chitin-binding domain type-1 [1]. The topological arrangement of the four disulfide bonds is shown in the following figure:

                    +-------------+
               +----|------+      |
               |    |      |      |
    xxCgxxxxxxxCxxxxCCsxxgxCgxxxxxCxxxCxxxxC
      |        ******|*************   |    |
      |              |                +----+
      +--------------+

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.

The structures of several chitin-binding type-1 domains have been solved (see for example <PDB:1HEV>) [2]. The chitin-binding site is localized in a beta-hairpin loop formed by the second disulfide bridge. Conserved serine and aromatic residues associated with the hairpin-loop are essential for the chitin-binding activity [3]. The chitin-binding domain type-1 displays some structural similarities with the chitin-binding domain type-2 (see <PDOC50940>). Some of the proteins containing a chitin-binding domain type-1 are listed below: - A number of non-leguminous plant lectins.
The best characterized of these lectins are the three highly homologous wheat germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin which structurally consists of a fourfold repetition of the 43 amino acid domain. The same type of structure is found in a barley root-specific lectin as well as a rice lectin. - Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>). Endochitinases are enzymes that catalyze the hydrolysis of the beta-1,4 linkages of N-acetylglucosamine polymers of chitin. Plant chitinases function as a defense against chitin-containing fungal pathogens. Class IA chitinases generally contain one copy of the chitin-binding domain at their N-terminal extremity. An exception is agglutinin/chitinase [4] from the stinging nettle Urtica dioica which contains two copies of the domain. - Hevein, a wound-induced protein found in the latex of rubber trees. - Win1 and win2, two wound-induced proteins from potato. - Kluyveromyces lactis killer toxin alpha subunit [5]. The toxin encoded by the linear plasmid pGKL1 is composed of three subunits: alpha, beta, and gamma. The gamma subunit harbors toxin activity and inhibits growth of sensitive yeast strains in the G1 phase of the cell cycle; the alpha subunit, which is proteolytically processed from a larger precursor that also contains the beta subunit, is a chitinase (see <PDOC00839>). The profile we developed covers the whole domain. -Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(3,4)-[FYW]-C [The 5 C's are involved in disulfide bonds] -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Note: Hevein is a strong allergen which is implicated in the allergy to natural rubber latex (NRL).
NRL allergy can be associated with hypersensitivity to some plant-derived foods (latex-fruit syndrome). An increasing number of plant sources, such as avocado, banana, chestnut, kiwi, peach, tomato, potato and bell pepper, have been associated with this syndrome. Several papers [6,7] have shown that allergen cross-reactivity is due to IgE antibodies that recognize structurally similar epitopes on different proteins that are closely related. One of these families is the class I chitinases, plant defence proteins containing a type-1 chitin-binding domain. -Last update: December 2004 / Pattern and text revised. [ 1] Wright H.T., Sandrasegaram G., Wright C.S. "Evolution of a family of N-acetylglucosamine binding proteins containing the disulfide-rich domain of wheat germ agglutinin." J. Mol. Evol. 33:283-294(1991). PubMed=1757999 [ 2] Andersen N.H., Cao B., Rodriguez-Romero A., Arreguin B. "Hevein: NMR assignment and assessment of solution-state folding for the agglutinin-toxin motif." Biochemistry 32:1407-1422(1993). PubMed=8431421 [ 3] Asensio J.L., Canada F.J., Siebert H.C., Laynez J., Poveda A., Nieto P.M., Soedjanaamadja U.M., Gabius H.J., Jimenez-Barbero J. "Structural basis for chitin recognition by defense proteins: GlcNAc residues are bound in a multivalent fashion by extended binding sites in hevein domains." Chem. Biol. 7:529-543(2000). PubMed=10903932 [ 4] Lerner D.R., Raikhel N.V. "The gene for stinging nettle lectin (Urtica dioica agglutinin) encodes both a lectin and a chitinase." J. Biol. Chem. 267:11085-11091(1992). PubMed=1375935 [ 5] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. "Kluyveromyces lactis toxin has an essential chitinase activity." Eur. J. Biochem. 199:483-488(1991). PubMed=2070799 [ 6] Sowka S., Hsieh L.S., Krebitz M., Akasawa A., Martin B.M., Starrett D., Peterbauer C.K., Scheiner O., Breiteneder H.
"Identification and cloning of prs a 1, a 32-kDa endochitinase and major allergen of avocado, and its expression in the yeast Pichia pastoris." J. Biol. Chem. 273:28091-28097(1998). PubMed=9774427 [ 7] Wagner S., Breiteneder H. "The latex-fruit syndrome." Biochem. Soc. Trans. 30:935-940(2002). PubMed=12440950; +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00026} {PS51390; WAP} {BEGIN} ************************************************* * WAP-type 'four-disulfide core' domain profile * ************************************************* The 'four-disulfide core' or WAP domain comprises 8 cysteine residues involved in disulfide bonds in a conserved arrangement [1]. One or more of these domains occur in whey acidic protein (WAP), antileukoproteinase, elastase-inhibitor proteins and other structurally related proteins which are listed below. - Whey acidic protein (WAP). WAP is a major component of milk whey whose function might be that of a protease inhibitor. WAP consists of two 'four-disulfide core' domains in most mammals. - Antileukoproteinase 1 (HUSI), a mucous fluid serine proteinase inhibitor. HUSI consists of two 'four-disulfide core' domains. - Elafin, an elastase-specific inhibitor from human skin [2,3]. - Sodium/potassium ATPase inhibitors SPAI-1, -2, and -3 from pig [4]. - Chelonianin, a protease inhibitor from the eggs of red sea turtle. 
This inhibitor consists of two domains: an N-terminal domain which inhibits trypsin and belongs to the BPTI/Kunitz family of inhibitors, and a C-terminal domain which inhibits subtilisin and is a 'four-disulfide core domain'. - Extracellular peptidase inhibitor (WDNM1 protein), involved in the metastatic potential of adenocarcinomas in rats. - Caltrin-like protein 2 from guinea pig, which inhibits calcium transport into spermatozoa. - Kallmann syndrome protein (Anosmin-1 or KALIG-1) [5,6]. This secreted protein may be an adhesion-like molecule with anti-protease activity. It contains a 'four-disulfide core domain' in its N-terminal part. - Whey acidic protein (WAP) from the tammar wallaby, which consists of three 'four-disulfide core' domains [7]. - Waprins from snake venom, such as omwaprin from Oxyuranus microlepidotus [8] which has antibacterial activity against Gram-positive bacteria. The following schematic representation shows the position of the conserved cysteines that form the 'four-disulfide core' WAP domain (see <PDB:2REL>).

                      +---------------------+
                      |    +-----------+    |
                      |    |           |    |
    xxxxxxxCPxxxxxxxxxCxxxxCxxxxxCxxxxxCCxxxCxxxCxxxx
           |                     |      |       |
           |                     +------|-------+
           |                            |
           +----------------------------+
    <------------------50-residues------------------>

'C': conserved cysteine involved in a disulfide bond.

We developed a profile that covers the whole structure of the WAP-type 'four-disulfide core' domain. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Expert(s) to contact by email: Claverie J.-M.; [email protected] -Last update: July 2008 / Pattern removed, profile added and text revised. [ 1] Hennighausen L.G., Sippel A.E. "Mouse whey acidic protein is a novel member of the family of 'four-disulfide core' proteins." Nucleic Acids Res. 10:2677-2684(1982). PubMed=6896234 [ 2] Wiedow O., Schroeder J.-M., Gregory H., Young J.A., Christophers E. "Elafin: an elastase-specific inhibitor of human skin.
Purification, characterization, and complete amino acid sequence." J. Biol. Chem. 265:14791-14795(1990). PubMed=2394696 [ 3] Francart C., Dauchez M., Alix A.J., Lippens G. "Solution structure of R-elafin, a specific inhibitor of elastase." J. Mol. Biol. 268:666-677(1997). PubMed=9171290; DOI=10.1006/jmbi.1997.0983 [ 4] Araki K., Kuwada M., Ito O., Kuroki J., Tachibana S. "Four disulfide bonds' allocation of Na+, K(+)-ATPase inhibitor (SPAI)." Biochem. Biophys. Res. Commun. 172:42-46(1990). PubMed=2171523 [ 5] Legouis R., Hardelin J.-P., Levilliers J., Claverie J.-M., Compain S., Wunderle V., Millasseau P., Le Paslier D., Cohen D., Caterina D. Bougueleret L., Delemarre-Van de Waal H., Lutfalla G., Weissenbach J., Petit C. "The candidate gene for the X-linked Kallmann syndrome encodes a protein related to adhesion molecules." Cell 67:423-435(1991). PubMed=1913827 [ 6] Hu Y., Sun Z., Eaton J.T., Bouloux P.M., Perkins S.J. "Extended and flexible domain solution structure of the extracellular matrix protein anosmin-1 by X-ray scattering, analytical ultracentrifugation and constrained modelling." J. Mol. Biol. 350:553-570(2005). PubMed=15949815; DOI=10.1016/j.jmb.2005.04.031 [ 7] Simpson K.J., Ranganathan S., Fisher J.A., Janssens P.A., Shaw D.C., Nicholas K.R. "The gene for a novel member of the whey acidic protein family encodes three four-disulfide core domains and is asynchronously expressed during lactation." J. Biol. Chem. 275:23074-23081(2000). PubMed=10801834; DOI=10.1074/jbc.M002161200 [ 8] Nair D.G., Fry B.G., Alewood P., Kumar P.P., Kini R.M. "Antimicrobial activity of omwaprin, a new member of the waprin family of snake venom proteins." Biochem. J. 402:93-104(2007). PubMed=17044815; DOI=10.1042/BJ20060318 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). 
There are no restrictions on its use by non-profit institutions as long as
its content is in no way modified. Usage by and for commercial entities
requires a license agreement. For information about the licensing scheme
send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00027}
{PS00027; HOMEOBOX_1}
{PS50071; HOMEOBOX_2}
{BEGIN}
*******************************************
* 'Homeobox' domain signature and profile *
*******************************************

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first
identified in a number of Drosophila homeotic and segmentation proteins.
It has since been found to be extremely well conserved in many other
animals, including vertebrates. This domain binds DNA through a
helix-turn-helix type of structure. Some of the proteins which contain a
homeobox domain play an important role in development. Most of these
proteins are known to be sequence-specific DNA-binding transcription
factors. The homeobox domain has also been found to be very similar to a
region of the yeast mating type proteins. These are sequence-specific
DNA-binding proteins that act as master switches in yeast differentiation
by controlling gene expression in a cell type-specific fashion.

A schematic representation of the homeobox domain is shown below. The
helix-turn-helix region is shown by the symbols 'H' (for helix), and 't'
(for turn).

    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx
    |        |         |         |         |         |         |
    1        10        20        30        40        50        60

The pattern we developed to detect homeobox sequences is 24 residues long
and spans positions 34 to 57 of the homeobox domain.

-Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-{Y}-x(2)-
                    {L}-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-
                    [NDQTAH]-x(5)-[RKNAIMW]
-Sequences known to belong to this class detected by the pattern: ALL,
 except for 10 sequences.
-Other sequence(s) detected in Swiss-Prot: 9.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Note: Proteins which contain a homeobox domain can be classified, on the
 basis of their sequence characteristics, into various subfamilies. We
 have developed specific patterns for conserved elements of the
 antennapedia, engrailed and paired families.
-Expert(s) to contact by email:
           Buerglin T.R.; [email protected]
-Last update: April 2006 / Pattern revised.

[ 1] Gehring W.J.
     (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10,
     Oxford University Press, Oxford, (1994).
[ 2] Buerglin T.R.
     (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72,
     Oxford University Press, Oxford, (1994).
[ 3] Gehring W.J.
     Trends Biochem. Sci. 17:277-280(1992).
[ 4] Gehring W.J., Hiromi Y.
     "Homeotic genes and the homeobox."
     Annu. Rev. Genet. 20:147-173(1986).
     PubMed=2880555; DOI=10.1146/annurev.ge.20.120186.001051
[ 5] Schofield P.N.
     Trends Neurosci. 10:3-6(1987).
[E1] http://www.biosci.ki.se/groups/tbu/homeo.html
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00028}
{PS00028; ZINC_FINGER_C2H2_1}
{PS50157; ZINC_FINGER_C2H2_2}
{BEGIN}
******************************************************
* Zinc finger C2H2-type domain signature and profile *
******************************************************

'Zinc finger' domains [1-5] are nucleic acid-binding protein structures
first identified in the Xenopus transcription factor TFIIIA. These domains
have since been found in numerous nucleic acid-binding proteins. A zinc
finger domain is composed of 25 to 30 amino-acid residues. There are two
cysteine or histidine residues at both extremities of the domain, which
are involved in the tetrahedral coordination of a zinc atom. It has been
proposed that such a domain interacts with about five nucleotides. A
schematic representation of a zinc finger domain is shown below:

    [Schematic: residues (x) with two Cys (C) and two His (H) bonded to a
    central zinc atom (Zn).]

Many classes of zinc fingers are characterized according to the number and
positions of the histidine and cysteine residues involved in the zinc atom
coordination. In the first class to be characterized, called C2H2, the
first pair of zinc coordinating residues are cysteines, while the second
pair are histidines. A number of experimental reports have demonstrated
the zinc-dependent DNA or RNA binding property of some members of this
class.

Some of the proteins known to include C2H2-type zinc fingers are listed
below. We have indicated, between brackets, the number of zinc finger
regions found in each of these proteins; a '+' symbol indicates that only
partial sequence data is available and that additional finger domains may
be present.

 - Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1
   (2), MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1
   (1), STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).
 - Emericella nidulans: brlA (2), creA (2).
 - Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot
   (5), Glass (5), Hunchback (6), Kruppel (5), Kruppel-H (4+), Odd-skipped
   (4), Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity
   locus beta (6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw)
   (12), Suppressor of variegation suvar(3)7 (5), Teashirt (3) and
   Tramtrack (2).
 - Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9),
   Xfin (37 !!), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to
   11+), Oocyte XlcOF2 to XlcOF22 (from 7 to 12).
 - Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like
   transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3
   (3) and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor
   protein (4), EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3),
   EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+),
   HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1 (9+), KR2 (9), KR3 (15+), KR4
   (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX (13), ZfY (13), Zfp-35
   (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1 (13), ZNF43 (22),
   ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).

In addition to the conserved zinc ligand residues it has been shown [6]
that a number of other positions are also important for the structural
integrity of the C2H2 zinc fingers. The best conserved position is found
four residues after the second cysteine; it is generally an aromatic or
aliphatic residue.

A profile was also developed that spans the whole domain.

-Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
                    [The 2 C's and the 2 H's are zinc ligands]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 42.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: 2.

-Note: In proteins that include many copies of the C2H2 zinc finger
 domain, incomplete or degenerate copies of the domain are frequently
 found.
 The former are generally found at the extremity of the zinc finger
 region(s); the latter have typically lost one or more of the
 zinc-coordinating residues or are interrupted by insertions or deletions.
 Our pattern does not detect any of these finger domains.
-Expert(s) to contact by email:
           Becker K.G.; [email protected]
-Last update: May 2004 / Text revised.

[ 1] Klug A., Rhodes D.
     Trends Biochem. Sci. 12:464-469(1987).
[ 2] Evans R.M., Hollenberg S.M.
     "Zinc fingers: gilt by association."
     Cell 52:1-3(1988).
     PubMed=3125980
[ 3] Payre F., Vincent A.
     "Finger proteins and DNA-specific recognition: distinct patterns of
     conserved amino acids suggest different evolutionary modes."
     FEBS Lett. 234:245-250(1988).
     PubMed=3292287
[ 4] Miller J., McLachlan A.D., Klug A.
     "Repetitive zinc-binding domains in the protein transcription factor
     IIIA from Xenopus oocytes."
     EMBO J. 4:1609-1614(1985).
     PubMed=4040853
[ 5] Berg J.M.
     "Proposed structure for the zinc-binding domains from transcription
     factor IIIA and related proteins."
     Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).
     PubMed=3124104
[ 6] Rosenfeld R., Margalit H.
     "Zinc fingers: conserved properties that can distinguish between
     spurious and actual DNA-binding motifs."
     J. Biomol. Struct. Dyn. 11:557-570(1993).
     PubMed=8129873
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00029}
{PS00029; LEUCINE_ZIPPER}
{BEGIN}
**************************
* Leucine zipper pattern *
**************************

A structure, referred to as the 'leucine zipper' [1,2], has been proposed
to explain how some eukaryotic gene regulatory proteins work. The leucine
zipper consists of a periodic repetition of leucine residues at every
seventh position over a distance covering eight helical turns. The
segments containing these periodic arrays of leucine residues seem to
exist in an alpha-helical conformation. The leucine side chains extending
from one alpha-helix interact with those from a similar alpha helix of a
second polypeptide, facilitating dimerization; the structure formed by
cooperation of these two regions forms a coiled coil [3]. The leucine
zipper pattern is present in many gene regulatory proteins, such as:

 - The CCAAT-box and enhancer binding protein (C/EBP).
 - The cAMP response element (CRE) binding proteins (CREB, CRE-BP1, ATFs).
 - The Jun/AP1 family of transcription factors.
 - The yeast general control protein GCN4.
 - The fos oncogene, and the fos-related proteins fra-1 and fos B.
 - The C-myc, L-myc and N-myc oncogenes.
 - The octamer-binding transcription factor 2 (Oct-2/OTF-2).

-Consensus pattern: L-x(6)-L-x(6)-L-x(6)-L
-Sequences known to belong to this class detected by the pattern: All
 those mentioned in the original paper, with the exception of L-myc which
 has a Met instead of the second Leu.
-Other sequence(s) detected in Swiss-Prot: some 600 other sequences from
 every category of protein families.

-Note: As this is far from being a specific pattern you should be cautious
 in citing the presence of such a pattern in a protein if it has not been
 shown to be a nuclear DNA-binding protein.
-Last update: December 1992 / Text revised.

[ 1] Landschulz W.H., Johnson P.F., McKnight S.L.
     "The leucine zipper: a hypothetical structure common to a new class
     of DNA binding proteins."
     Science 240:1759-1764(1988).
     PubMed=3289117
[ 2] Busch S.J., Sassone-Corsi P.
     "Dimers, leucine zippers and DNA-binding domains."
     Trends Genet. 6:36-40(1990).
     PubMed=2186528
[ 3] O'Shea E.K., Rutkowski R., Kim P.S.
     Science 243:538-542(1989).
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00030}
{PS50102; RRM}
{BEGIN}
********************************************
* Eukaryotic RNA recognition motif profile *
********************************************

Many eukaryotic proteins that are known or supposed to bind
single-stranded RNA contain one or more copies of a putative RNA-binding
domain of about 90 amino acids [1,2]. This domain is known as the RNA
recognition motif (RRM). This region has been found in the following
proteins:

 ** Heterogeneous nuclear ribonucleoproteins **

 - hnRNP A1 (helix destabilizing protein) (twice).
 - hnRNP A2/B1 (twice).
 - hnRNP C (C1/C2) (once).
 - hnRNP E (UP2) (at least once).
 - hnRNP G (once).

 ** Small nuclear ribonucleoproteins **

 - U1 snRNP 70 Kd (once).
 - U1 snRNP A (once).
 - U2 snRNP B'' (once).

 ** Pre-RNA and mRNA associated proteins **

 - Protein synthesis initiation factor 4B (eIF-4B) [3], a protein
   essential for the binding of mRNA to ribosomes (once).
 - Nucleolin (4 times).
 - Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
 - Yeast protein NSR1 (twice).
   NSR1 is involved in pre-rRNA processing; it specifically binds nuclear
   localization sequences.
 - Poly(A) binding protein (PABP) (4 times).

 ** Others **

 - Drosophila sex determination protein Sex-lethal (Sxl) (twice).
 - Drosophila sex determination protein Transformer-2 (Tra-2) (once).
 - Drosophila 'elav' protein (3 times), which is probably involved in the
   RNA metabolism of neurons.
 - Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which
   is highly similar to elav and which may play a role in neuron-specific
   RNA processing.
 - Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox
   protein that may also bind to specific mRNAs.
 - La antigen (once), a protein which may play a role in transcription by
   RNA polymerase III.
 - The 60 Kd Ro protein (once), a putative RNP complex protein.
 - A maize protein induced by abscisic acid in response to water stress,
   which seems to be an RNA-binding protein.
 - Three tobacco proteins, located in the chloroplast [6], which may be
   involved in splicing and/or processing of chloroplast RNAs (twice).
 - X16 [7], a mammalian protein which may be involved in RNA processing in
   relation with cellular proliferation and/or maturation.
 - Insulin-induced growth response protein Cl-4 from rat (twice).
 - Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic
   activity against cytotoxic lymphocyte target cells and may be involved
   in apoptosis.
 - Yeast RNA15 protein, which plays a role in mRNA stability and/or
   poly-(A) tail length [9].

Inside the RRM there are two regions which are highly conserved. The first
one is a hydrophobic segment of six residues (which is called the RNP-2
motif), the second one is an octapeptide motif (which is called RNP-1 or
RNPCS). The position of both motifs in the domain is shown in the
following schematic representation:

xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########xxxxxxxxxxxxxxxxxxxxxxxxx
       RNP-2                               RNP-1

We have developed a profile that spans the RRM domain.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: August 2004 / Text revised; pattern deleted.

[ 1] Bandziulis R.J., Swanson M.S., Dreyfuss G.
     "RNA-binding proteins as developmental regulators."
     Genes Dev. 3:431-437(1989).
     PubMed=2470643
[ 2] Dreyfuss G., Swanson M.S., Pinol-Roma S.
     "Heterogeneous nuclear ribonucleoprotein particles and the pathway of
     mRNA formation."
     Trends Biochem. Sci. 13:86-91(1988).
     PubMed=3072706
[ 3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J.
     "Cloning and expression of eukaryotic initiation factor 4B cDNA:
     sequence determination identifies a common RNA recognition motif."
     EMBO J. 9:2783-2790(1990).
     PubMed=2390971
[ 4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J.,
     Posner J.B., Furneaux H.M.
     "HuD, a paraneoplastic encephalomyelitis antigen, contains
     RNA-binding domains and is homologous to Elav and Sex-lethal."
     Cell 67:325-333(1991).
     PubMed=1655278
[ 5] Rebagliati M.
     "An RNA recognition motif in the bicoid protein."
     Cell 58:231-232(1989).
     PubMed=2752425
[ 6] Li Y.Q., Sugiura M.
     "Three distinct ribonucleoproteins from tobacco chloroplasts: each
     contains a unique amino terminal acidic domain and two
     ribonucleoprotein consensus motifs."
     EMBO J. 9:3059-3066(1990).
     PubMed=1698606
[ 7] Ayane M., Preuss U., Koehler G., Nielsen P.J.
     "A differentially expressed murine RNA encoding a protein with
     similarities to two types of nucleic acid binding motifs."
     Nucleic Acids Res. 19:1273-1278(1991).
     PubMed=2030943
[ 8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F.,
     Anderson P.
     "Identification and functional characterization of a TIA-1-related
     nucleolysin."
     Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992).
     PubMed=1326761
[ 9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F.
     Mol. Cell. Biol. 11:3075-3087(1991).
+-----------------------------------------------------------------------+
PROSITE is copyright.
It is produced by the Swiss Institute of Bioinformatics (SIB). There are
no restrictions on its use by non-profit institutions as long as its
content is in no way modified. Usage by and for commercial entities
requires a license agreement. For information about the licensing scheme
send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00031}
{PS00031; NUCLEAR_REC_DBD_1}
{PS51030; NUCLEAR_REC_DBD_2}
{BEGIN}
**********************************************************************
* Nuclear hormone receptors DNA-binding domain signature and profile *
**********************************************************************

Nuclear hormone receptors are ligand-activated transcription factors that
regulate gene expression by interacting with specific DNA sequences
upstream of their target genes. In vertebrates, these proteins regulate
diverse biological processes such as pattern formation, cellular
differentiation and homeostasis [1 to 6].

Classical nuclear hormone receptors contain two conserved regions, the
hormone binding domain and a DNA-binding domain (DBD) that is composed of
two C4-type zinc fingers. The DBD is responsible for targeting the
receptors to their hormone response elements (HRE). It binds as a dimer
with each monomer recognizing a six base pair sequence of DNA. The vast
majority of targets contain the same 5'-AGGTCA-3' consensus sequence [7].
In some cases a less conserved C-terminal extension of the core DBD
confers the DNA selectivity [8].

The two zinc fingers fold to form a single structural domain (see
<PDB:1HCQ>) [9,10]. The structure consists of two helices perpendicular to
each other. A zinc ion, coordinated by four conserved cysteines, holds the
base of a loop at the N terminus of each helix. The helix of each monomer
makes sequence-specific contacts in the major groove of the DNA.
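
As a rough illustration of the half-site recognition described above, the
following Python sketch scans a DNA string for exact copies of the
5'-AGGTCA-3' consensus on either strand. Only the consensus sequence comes
from the text; the function names, the exact-match criterion and the toy
sequence are assumptions made for illustration, and real response elements
tolerate variation that this sketch ignores.

```python
# Consensus half-site quoted in the entry; everything else is illustrative.
HALF_SITE = "AGGTCA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_half_sites(dna: str, site: str = HALF_SITE):
    """Return sorted (0-based start, strand) pairs for exact matches.

    A "-" hit means the site is read on the complementary strand; the
    reported start is still given in forward-strand coordinates.
    """
    dna = dna.upper()
    hits = []
    for strand, motif in (("+", site), ("-", reverse_complement(site))):
        start = dna.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits)


# Made-up sequence: one half-site on each strand, three bases apart.
toy_dna = "TTAGGTCACAGTGACCTAA"
print(find_half_sites(toy_dna))  # [(2, '+'), (11, '-')]
```

Two hits on opposite strands, as in the toy sequence, are the kind of
arrangement a dimer with one monomer per half-site could occupy; the
spacing and orientation used here are arbitrary.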

Proteins known to contain a nuclear hormone receptor DNA-binding domain
are listed below:

 - Androgen receptor (AR).
 - Estrogen receptor (ER).
 - Glucocorticoid receptor (GR).
 - Mineralocorticoid receptor (MR).
 - Progesterone receptor (PR).
 - Retinoic acid receptors (RARs and RXRs).
 - Thyroid hormone receptors (TR) alpha and beta.
 - The avian erythroblastosis virus oncogene v-erbA, derived from a
   cellular thyroid hormone receptor.
 - Vitamin D3 receptor (VDR).
 - Insect ecdysone receptor (EcR).
 - COUP transcription factor (also known as ear-3), and its Drosophila
   homolog seven-up (svp).
 - Hepatocyte nuclear factor 4 (HNF-4), which binds to DNA sites required
   for the transcription of the genes for alpha-1-antitrypsin,
   apolipoprotein CIII and transthyretin.
 - Ad4BP, a protein that binds to the Ad4 site found in the promoter
   region of steroidogenic P450 genes.
 - Apolipoprotein AI regulatory protein-1 (ARP-1), required for the
   transcription of apolipoprotein AI.
 - Peroxisome proliferator activated receptors (PPAR), transcription
   factors specifically activated by peroxisome proliferators. They
   control the peroxisomal beta-oxidation pathway of fatty acids by
   activating the gene for acyl-CoA oxidase.
 - Drosophila protein knirps (kni), a zygotic gap protein required for
   abdominal segmentation of the Drosophila embryo.
 - Drosophila protein ultraspiracle (usp) (or chorion factor 1), which
   binds to the promoter region of s15 chorion gene.
 - Human estrogen receptor related genes 1 and 2 (err1 and err2).
 - Human erbA related gene 2 (ear-2).
 - Mammalian NGFI-B (NAK1, nur/77, N10).
 - Mammalian NOT/nurR1/RNR-1.
 - Drosophila protein embryonic gonad (egon).
 - Drosophila knirps-related protein (knrl).
 - Drosophila protein tailless (tll).
 - Drosophila 20-oh-ecdysone regulated protein E75.
 - Insect Hr3.
 - Insect Hr38.
 - Caenorhabditis elegans cnr-8, cnr-14, and odr-7.
 - Caenorhabditis elegans hypothetical proteins B0280.8, EO2H1.7 and
   K06A1.4.

As a signature pattern for this family of proteins, we took the most
conserved residues, the first 27, of the DNA-binding domain. We also
developed a profile that spans the whole domain.

-Consensus pattern: C-x(2)-C-x(1,2)-[DENAVSPHKQT]-x(5,6)-[HNY]-[FY]-x(4)-
                    C-x(2)-C-x(2)-F(2)-x-R
                    [The 4 C's are zinc ligands]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: April 2006 / Pattern revised.

[ 1] Gronemeyer H., Laudet V.
     Protein Prof. 2:1173-1308(1995).
[ 2] Evans R.M.
     "The steroid and thyroid hormone receptor superfamily."
     Science 240:889-895(1988).
     PubMed=3283939
[ 3] Gehring U.
     Trends Biochem. Sci. 12:399-402(1987).
[ 4] Beato M.
     "Gene regulation by steroid hormones."
     Cell 56:335-344(1989).
     PubMed=2644044
[ 5] Segraves W.A.
     "Something old, some things new: the steroid receptor superfamily in
     Drosophila."
     Cell 67:225-228(1991).
     PubMed=1913821
[ 6] Laudet V., Haenni C., Coll J., Catzeflis F., Stehelin D.
     "Evolution of the nuclear receptor gene superfamily."
     EMBO J. 11:1003-1013(1992).
     PubMed=1312460
[ 7] Stunnenberg H.G.
     "Mechanisms of transactivation by retinoic acid receptors."
     BioEssays 15:309-315(1993).
     PubMed=8393666
[ 8] Zhao Q., Khorasanizadeh S., Miyoshi Y., Lazar M.A., Rastinejad F.
     "Structural elements of an orphan nuclear receptor-DNA complex."
     Mol. Cell 1:849-861(1998).
     PubMed=9660968
[ 9] Schwabe J.W.R., Neuhaus D., Rhodes D.
     "Solution structure of the DNA-binding domain of the oestrogen
     receptor."
     Nature 348:458-461(1990).
     PubMed=2247153; DOI=10.1038/348458a0
[10] Schwabe J.W.R., Chapman L., Finch J.T., Rhodes D.
     Cell 75:567-578(1993).
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB).
There are no restrictions on its use by non-profit institutions as long as
its content is in no way modified. Usage by and for commercial entities
requires a license agreement. For information about the licensing scheme
send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00032}
{PS00032; ANTENNAPEDIA}
{BEGIN}
**************************************************
* 'Homeobox' antennapedia-type protein signature *
**************************************************

The homeotic Hox proteins are sequence-specific transcription factors.
They are part of a developmental regulatory system that provides cells
with specific positional identities on the anterior-posterior (A-P) axis
[1]. The Hox proteins contain a 'homeobox' domain. In Drosophila and other
insects, there are eight different Hox genes that are encoded in two gene
complexes, ANT-C and BX-C. In vertebrates there are 38 genes organized in
four complexes. In six of the eight Drosophila Hox genes the homeobox
domain is highly similar and a conserved hexapeptide is found five to
sixteen amino acids upstream of the homeobox domain. The six Drosophila
proteins that belong to this group are antennapedia (Antp), abdominal-A
(abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and
ultrabithorax (ubx) and are collectively known as the 'antennapedia'
subfamily.

In vertebrates the corresponding Hox genes are known [2] as Hox-A2, A3,
A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8,
Hox-D1, D3, D4 and D8.

Caenorhabditis elegans lin-39 and mab-5 are also members of the
'antennapedia' subfamily.

As a signature pattern for this subfamily of homeobox proteins, we have
used the conserved hexapeptide.

-Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]
-Sequences known to belong to this class detected by the pattern: ALL,
 except for 6 sequences.
-Other sequence(s) detected in Swiss-Prot: 3.

-Note: Arg and Lys are most frequently found in the last position of the
 hexapeptide; other amino acids are found in only a few cases.
-Last update: June 1994 / Text revised.

[ 1] McGinnis W., Krumlauf R.
     "Homeobox genes and axial patterning."
     Cell 68:283-302(1992).
     PubMed=1346368
[ 2] Scott M.P.
     "Vertebrate homeobox gene nomenclature."
     Cell 71:551-553(1992).
     PubMed=1358459
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00033}
{PS00033; ENGRAILED}
{BEGIN}
***********************************************
* 'Homeobox' engrailed-type protein signature *
***********************************************

Most proteins which contain a 'homeobox' domain can be classified [1,2],
on the basis of their sequence characteristics, in three subfamilies:
engrailed, antennapedia and paired. Proteins currently known to belong to
the engrailed subfamily are:

 - Drosophila segmentation polarity protein engrailed (en), which
   specifies the body segmentation pattern and is required for the
   development of the central nervous system.
 - Drosophila invected protein (inv).
 - Silk moth proteins engrailed and invected, which may be involved in the
   compartmentalization of the silk gland.
 - Honeybee E30 and E60.
 - Grasshopper (Schistocerca americana) G-En.
 - Mammalian and bird En-1 and En-2.
 - Zebrafish Eng-1, -2 and -3.
 - Sea urchin (Tripneustes gratilla) SU-HB-en.
 - Leech (Helobdella triserialis) Ht-En.
 - Caenorhabditis elegans ceh-16.
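
The consensus patterns quoted in these entries use a compact notation:
elements are separated by '-', 'x' stands for any residue, square brackets
list the residues allowed at a position, curly brackets list the residues
excluded, and a number or range in parentheses gives a repeat count. A
minimal Python sketch of how such a pattern could be translated into a
regular expression is shown below. The helper name prosite_to_regex and
the toy peptide are hypothetical, and the sketch deliberately leaves out
the N- and C-terminal anchors ('<' and '>'); it is not the scanning code
used to build PROSITE.

```python
import re

# One pattern element: x, a single residue, [allowed] or {excluded},
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE consensus pattern into Python regex syntax."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


# Patterns copied from the entries above.
ANTENNAPEDIA = prosite_to_regex("[LIVMFE]-[FY]-P-W-M-[KRQTA]")
LEUCINE_ZIPPER = prosite_to_regex("L-x(6)-L-x(6)-L-x(6)-L")

toy_peptide = "GSLYPWMRKT"  # made-up sequence, for illustration only
hit = re.search(ANTENNAPEDIA, toy_peptide)
print(ANTENNAPEDIA, hit.start(), hit.group())
# [LIVMFE][FY]PWM[KRQTA] 2 LYPWMR
```

Note that re.search and re.finditer report non-overlapping matches only,
so a scanner that needs every overlapping occurrence would have to restart
the search one position after each hit.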

Engrailed homeobox proteins are characterized by the presence of a
conserved region of some 20 amino-acid residues located at the C-terminal
side of the 'homeobox' domain. As a signature pattern for this subfamily
of proteins, we have used a stretch of eight perfectly conserved residues
in this region.

-Consensus pattern: L-M-A-[EQ]-G-L-Y-N
-Sequences known to belong to this class detected by the pattern: ALL,
 except for ceh-16.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: July 1999 / Pattern and text revised.

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III
     "The structure and function of the homeodomain."
     Biochim. Biophys. Acta 989:25-48(1989).
     PubMed=2568852
[ 2] Gehring W.J.
     "Homeo boxes in the study of development."
     Science 236:1245-1252(1987).
     PubMed=2884726
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or see:
http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00034}
{PS00034; PAIRED_1}
{PS51057; PAIRED_2}
{BEGIN}
***************************************
* Paired domain signature and profile *
***************************************

The paired domain is a ~126 amino acid DNA-binding domain, which is found
in eukaryotic transcription regulatory proteins involved in embryogenesis.
The domain was originally described as the 'paired box' in the Drosophila
protein paired (prd) [1,2]. The paired domain is generally located in the
N-terminal part. An octapeptide [3] and/or a homeodomain (see <PDOC00027>)
can occur C-terminal to the paired domain, as well as a Pro-Ser-Thr-rich
C-terminus.
Paired domain proteins can function as transcription repressors or
activators. The paired domain contains three subdomains, which show
functional differences in DNA-binding.

The crystal structures of prd and Pax proteins show that the DNA-bound
paired domain is bipartite, consisting of an N-terminal subdomain (PAI or
NTD) and a C-terminal subdomain (RED or CTD), connected by a linker (see
<PDB:1K78>). PAI and RED each form a three-helical fold, with the most
C-terminal helices comprising a helix-turn-helix (HTH) motif that binds
the DNA major groove. In addition, the PAI subdomain encompasses an
N-terminal beta-turn and beta-hairpin, also named 'wing', participating in
DNA-binding. The linker can bind into the DNA minor groove. Different Pax
proteins and their alternatively spliced isoforms use different
(sub)domains for DNA-binding to mediate the specificity of sequence
recognition [4,5].

Some proteins known to contain a paired domain:

 - Drosophila paired (prd), a segmentation pair-rule class protein.
 - Drosophila gooseberry proximal (gsb-p) and gooseberry distal (gsb-d),
   segmentation polarity class proteins.
 - Drosophila Pox-meso and Pox-neuro proteins.

The Pax proteins:

 - Mammalian protein Pax1, which may play a role in the formation of
   segmented structures in the embryo. In mouse, mutations in Pax1 produce
   the undulated phenotype, characterized by vertebral malformations along
   the entire rostro-caudal axis.
 - Mammalian protein Pax2, a probable transcription factor that may have a
   role in kidney cell differentiation.
 - Mammalian protein Pax3. Pax3 is expressed during early neurogenesis. In
   Man, defects in Pax3 are the cause of Waardenburg's syndrome (WS), an
   autosomal dominant combination of deafness and pigmentary disturbance.
 - Mammalian protein Pax5, also known as B-cell specific transcription
   factor (BSAP). Pax5 is involved in the regulation of the CD19 gene.
   It plays an important role in B-cell differentiation as well as neural
   development and spermatogenesis.
 - Mammalian protein Pax6 (oculorhombin). Pax6 is a transcription factor
   with important functions in eye and nasal development. In Man, defects
   in Pax6 are the cause of aniridia type II (AN2), an autosomal dominant
   disorder characterized by complete or partial absence of the iris.
 - Mammalian protein Pax8, required in thyroid development.
 - Mammalian protein Pax9. In Man, defects in Pax9 cause oligodontia.
 - Zebrafish proteins Pax[Zf-a] and Pax[Zf-b].

We use the region spanning positions 34 to 50 of the paired domain as a
signature pattern. This conserved region spans the DNA-binding HTH located
in the N-terminal subdomain. We also developed a profile that covers the
entire paired domain, including the PAI and RED subdomains, and which
allows a more sensitive detection.

-Consensus pattern: R-P-C-x(11)-C-V-S
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: January 2005 / Text revised; profile added.

[ 1] Bopp D., Burri M., Baumgartner S., Frigerio G., Noll M.
     "Conservation of a large protein domain in the segmentation gene
     paired and in functionally related genes of Drosophila."
     Cell 47:1033-1040(1986).
     PubMed=2877747
[ 2] Baumgartner S., Bopp D., Burri M., Noll M.
     "Structure of two genes at the gooseberry locus related to the paired
     gene and their spatial expression during Drosophila embryogenesis."
     Genes Dev. 1:1247-1267(1987).
     PubMed=3123319
[ 3] Eberhard D., Jimenez G., Heavey B., Busslinger M.
     "Transcriptional repression by Pax5 (BSAP) through interaction with
     corepressors of the Groucho family."
     EMBO J. 19:2292-2303(2000).
     PubMed=10811620; DOI=10.1093/emboj/19.10.2292
[ 4] Underhill D.A.
     "Genetic and biochemical diversity in the Pax gene family."
     Biochem.
Cell Biol. 78:629-638(2000). PubMed=11103953 [ 5] Apuzzo S., Abdelhakim A., Fortin A.S., Gros P. "Cross-talk between the paired domain and the homeodomain of Pax3: DNA binding by each domain causes a structural change in the other domain, supporting interdependence for DNA Binding." J. Biol. Chem. 279:33601-33612(2004). PubMed=15148315; DOI=10.1074/jbc.M402949200 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00035} {PS00035; POU_1} {PS00465; POU_2} {PS51179; POU_3} {BEGIN} ***************************************************** * POU-specific (POUs) domain signatures and profile * ***************************************************** The POU (pronounced 'pow') domain [1 to 7] is a highly charged 155-162 amino acid region of sequence similarity which has been identified in the three mammalian transcription factors Pit-1, Oct-1, and Oct-2 and in the product of the nematode gene unc-86. The POU domain is a bipartite DNA binding protein module that binds selectively to the DNA octamer motif ATGCAAAT and a subset of derivatives. It consists of two subdomains, a C-terminal homeodomain (POUh) (see <PDOC00027>) and an N-terminal 75- to 82-residue POU-specific (POUs) region separated by a short non-conserved linker. The POU-specific region or 'box' can be subdivided further into two highly conserved regions, A and B, separated by a less highly conserved segment. 
The POUs domain is always found in association with a POUh domain, and both are required for high affinity and sequence-specific DNA binding. The POUs domain consists of four alpha helices packed to enclose an extensive hydrophobic core (see <PDB:1POU>). The POUs domain contains an unusual HTH structure, which differs from the canonical HTH motif in the length of the first alpha helix and the turn. The region of hypervariability located between subdomains A and B lies within the sequence corresponding to the C-terminal end of helix 2 and the linker between helices 2 and 3. In the model of the POUs-DNA complex, the C-terminus of helix 2 and the turn of the HTH motif project away from the DNA such that sequence variability in this region can be accommodated without adversely affecting DNA binding [8]. Some proteins currently known to contain a POUs domain are listed below: - Oct-1 (or OTF-1, NF-A1) (gene POU2F1), a transcription factor for small nuclear RNA and histone H2B genes. - Oct-2 (or OTF-2, NF-A2) (gene POU2F2), a transcription factor that specifically binds to the immunoglobulin promoters octamer motif and activates these genes. - Oct-3 (or Oct-4, NF-A3) (gene POU5F1), a transcription factor that also binds to the octamer motif. - Oct-6 (or OTF-6, SCIP) (gene POU3F1), an octamer-binding transcription factor thought to be involved in early embryogenesis and neurogenesis. - Oct-7 (or N-Oct 3, OTF-7, Brn-2) (gene POU3F2), a nervous-system specific octamer-binding transcription factor. - Oct-11 (or OTF-11) (gene POU2F3), an octamer-binding transcription factor. - Pit-1 (or GHF-1) (gene POU1F1), a transcription factor that activates growth hormone and prolactin genes. - Brn-1 (or OTF-8) (gene POU3F3). - Brn-3A (or RDC-1) (gene POU4F1), a probable transcription factor that may play a role in neuronal tissue differentiation. 
- Brn-3B (gene POU4F2), a probable transcription factor that may play a role in determining or maintaining the identities of a small subset of visual system neurons. - Brn-3C (gene POU4F3). - Brn-4 (or OTF-9) (gene POU3F4), a probable transcription factor which exerts its primary action widely during early neural development and in a very limited set of neurons in the mature brain. - Mpou (or Brn-5, Emb) (gene POU6F1), a transcription factor that binds preferentially to a variant of the octamer motif. - Skn, that activates cytokeratin 10 (k10) gene expression. - Sprm-1, a transcription factor that binds preferentially to the octamer motif and that may exert a regulatory function in meiotic events that are required for terminal differentiation of male germ cells. - Unc-86, a Caenorhabditis elegans transcription factor involved in cell lineage and differentiation. - Cf1-a, a Drosophila neuron-specific transcription factor necessary for the expression of the dopa decarboxylase gene (dcc). - I-POU, a Drosophila protein that forms a stable heterodimeric complex with Cf1-a and inhibits its action. - Drosophila protein nubbin/twain (PDM-1 or DPou-19). - Drosophila protein didymous (PDM-2 or DPou-28) that may play multiple roles during development. - Bombyx mori silk gland factor 3 (SGF-3). - Xenopus proteins Pou1, Pou2, and Pou3. - Zebrafish proteins Pou1, Pou2, Pou[C], ZP-12, ZP-23, ZP-47 and ZP-50. - Caenorhabditis elegans protein ceh-6. - Caenorhabditis elegans protein ceh-18. We have derived two signature patterns for the 'POU' domain. The first one spans positions 15 to 27 of the domain, the second positions 42 to 55. We have also developed a profile which covers the entire POUs domain. -Consensus pattern: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. 
-Consensus pattern: S-Q-[STK]-[TA]-I-[SC]-R-[FH]-[ET]-x-[LSQ]-x(0,1)-[LIR]-[ST] -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Last update: January 2006 / Text revised; profile added. [ 1] Robertson M. "Homoeo boxes, POU proteins and the limits to promiscuity." Nature 336:522-524(1988). PubMed=2904652; DOI=10.1038/336522a0 [ 2] Sturm R.A., Herr W. "The POU domain is a bipartite DNA-binding structure." Nature 336:601-604(1988). PubMed=2904656; DOI=10.1038/336601a0 [ 3] Herr W., Sturm R.A., Clerc R.G., Corcoran L.M., Baltimore D., Sharp P.A., Ingraham H.A., Rosenfeld M.G., Finney M., Ruvkun G., Horvitz H.R. "The POU domain: a large conserved region in the mammalian pit-1, oct-1, oct-2, and Caenorhabditis elegans unc-86 gene products." Genes Dev. 2:1513-1516(1988). PubMed=3215510 [ 4] Levine M., Hoey T. "Homeobox proteins as sequence-specific transcription factors." Cell 55:537-540(1988). PubMed=2902929 [ 5] Rosenfeld M.G. "POU-domain transcription factors: pou-er-ful developmental regulators." Genes Dev. 5:897-907(1991). PubMed=2044958 [ 6] Schoeler H.R. Trends Genet. 7:323-329(1991). [ 7] Verrijzer C.P., Van der Vliet P.C. "POU domain transcription factors." Biochim. Biophys. Acta 1173:1-21(1993). PubMed=8485147 [ 8] Assa-Munt N., Mortishire-Smith R.J., Aurora R., Herr W., Wright P.E. "The solution structure of the Oct-1 POU-specific domain reveals a striking similarity to the bacteriophage lambda repressor DNA-binding domain." Cell 73:193-205(1993). PubMed=8462099 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. 
Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00036} {PS00036; BZIP_BASIC} {PS50217; BZIP} {BEGIN} ************************************************************ * Basic-leucine zipper (bZIP) domain signature and profile * ************************************************************ The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a basic region mediating sequence-specific DNA-binding followed by a leucine zipper (see <PDOC00029>) required for dimerization. bZIP domains usually bind a palindromic 6 nucleotide site, but the specificity can be altered by interaction with accessory factors [3]. Several structures of bZIP domains have been solved (see for example <PDB:1AN2>) [4]. The basic region and the leucine zipper form a contiguous alpha helix where the four hydrophobic residues of the leucine zipper are oriented on one side. This conformation allows dimerization in parallel and it bends the helices so that the newly functional dimer forms a flexible fork where the basic domains, at the N-terminal open end, can then interact with DNA. The two leucine zippers are therefore oriented perpendicular to the DNA [4,5]. This family is quite large and we only list here some representative members. - Transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun. - Jun-B and jun-D, probable transcription factors which are highly similar to jun/AP-1. - The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun. - The fos-related proteins fra-1, and fos B. 
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1. - Maize Opaque 2, a trans-acting transcriptional activator involved in the regulation of the production of zein proteins during endosperm. - Arabidopsis G-box binding factors GBF1 to GBF4, Parsley CPRF-1 to CPRF-3, Tobacco TAF-1 and wheat EMBP-1. All these proteins bind the G-box promoter elements of many plant genes. - Drosophila protein Giant, which represses the expression of both the kruppel and knirps segmentation gap genes. - Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to fat body-specific enhancers of alcohol dehydrogenase and yolk protein genes. - Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis. - Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo. - Yeast GCN4 transcription factor, a component of the general control system that regulates the expression of amino acid-synthesizing enzymes in response to amino acid starvation, and the related Neurospora crassa cpc-1 protein. - Neurospora crassa cys-3 which turns on the expression of structural genes which encode sulfur-catabolic enzymes. - Yeast MET28, a transcriptional activator of sulfur amino acids metabolism. - Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen detoxification enzymes. - Epstein-Barr virus trans-activator protein BZLF1. The pattern we developed is directed against the basic region. We also developed a profile that covers the whole domain. -Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-{VL}-x-[SAQ](2)-{L}-[RKTAENQ]-x-R-{S}-[RK] -Sequences known to belong to this class detected by the pattern: the large majority. -Other sequence(s) detected in Swiss-Prot: 18. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. 
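PROSITE consensus patterns such as the bZIP basic-region pattern map closely onto regular expressions: 'x' is any residue, [..] lists allowed residues, {..} lists forbidden residues, and (n) or (n,m) gives a repeat count. The following is a minimal sketch of that translation, not the scanner PROSITE itself uses. The function name prosite_to_regex and the test peptide are hypothetical, the pattern is written with a hyphen between every element, and the N- and C-terminal anchors '<' and '>' are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated PROSITE pattern into a Python regex.

    Illustrative helper (not part of PROSITE). Supports x, [..], {..},
    single residues and repeat counts; '<' and '>' anchors are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split each element into its core and an optional (n) or (n,m) count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"empty element in pattern: {pattern!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # Single residues and [..] classes are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


BZIP_BASIC = "[KR]-x(1,3)-[RKSAQ]-N-{VL}-x-[SAQ](2)-{L}-[RKTAENQ]-x-R-{S}-[RK]"
bzip_regex = prosite_to_regex(BZIP_BASIC)

# Hypothetical peptide constructed to satisfy the pattern (not a database entry).
peptide = "MKRARNTEAARRSRARKLQ"
hit = re.search(bzip_regex, peptide)
```

Here hit.group(0) is the matched stretch of the peptide. A plain regex search reports non-overlapping matches only, so a production scanner would need extra handling for overlapping hits.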
-Last update: April 2006 / Pattern revised. [ 1] Hurst H.C. Protein Prof. 2:105-168(1995). [ 2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994). [ 3] Baranger A.M. "Accessory factor-bZIP-DNA interactions." Curr. Opin. Chem. Biol. 2:18-23(1998). PubMed=9667910 [ 4] Ferre-D'amare A.R., Prendergast G.C., Ziff E.B., Burley S.K. Nature 363:38-45(1993). [ 5] Ellenberger T.E., Brandl C.J., Struhl K., Harrison S.C. "The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted alpha helices: crystal structure of the protein-DNA complex." Cell 71:1223-1237(1992). PubMed=1473154 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00037} {PS50090; MYB_LIKE} {PS51294; HTH_MYB} {BEGIN} ******************************************** * Myb-type HTH DNA-binding domain profiles * ******************************************** The myb family can be classified into three groups: the myb-type HTH domain, which binds DNA, the SANT domain, which is a protein-protein interaction module (see <PDOC51293>) and the myb-like domain that can be involved in either of these functions. The myb-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of ~55 amino acids, typically occurring in a tandem repeat in eukaryotic transcription factors. The domain is named after the retroviral oncogene v-myb, and its cellular counterpart c-myb, which encode nuclear DNA-binding proteins that specifically recognize the sequence YAAC(G/T)G [1,2]. 
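The recognition sequence YAAC(G/T)G is written in degenerate nucleotide notation: Y stands for C or T under the IUPAC codes, and (G/T) offers a choice of two bases. As a minimal sketch, such a motif can be expanded into a regular expression and used to locate candidate sites on one strand. The helper names motif_to_regex and find_sites and the example DNA string are hypothetical, and the reverse strand is deliberately not scanned.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Expand a degenerate DNA motif such as 'YAAC(G/T)G' into a regex."""
    # Rewrite alternatives like '(G/T)' as the character class '[GT]'.
    motif = re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    out = []
    in_class = False
    for ch in motif:
        if ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif in_class:
            out.append(ch)          # bases inside a class are kept verbatim
        else:
            out.append(IUPAC[ch])   # expand a single IUPAC code
    return "".join(out)


def find_sites(motif: str, dna: str):
    """Return (start, matched_text) for forward-strand matches of the motif."""
    regex = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group(0)) for m in regex.finditer(dna.upper())]


myb_regex = motif_to_regex("YAAC(G/T)G")
# Made-up DNA fragment used only to exercise the helper.
sites = find_sites("YAAC(G/T)G", "GGCAACGGTTTAACTGAA")
```

Start positions are 0-based. Because the motif is not palindromic, a complete search would also scan the reverse complement of the sequence.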
Myb proteins contain three tandem repeats of 51 to 53 amino acids, termed R1, R2 and R3. This repeat region is involved in DNA-binding and R2 and R3 bind directly to the DNA major groove. The major part of the first repeat is missing in retroviral v-Myb sequences and in plant myb-related (R2R3) proteins [3]. A single myb-type HTH DNA-binding domain occurs in TRF1 and TRF2. The 3D-structure of the myb-type HTH domain forms three alpha-helices (see <PDB:1H88; C>) [4]. The second and third helices connected via a turn comprise the helix-turn-helix motif. Helix 3 is termed the recognition helix as it binds the DNA major groove, like in other HTHs. Some proteins known to contain a myb-type HTH domain: - Fruit fly myb protein [2]. - Vertebrate myb-like proteins A-myb and B-myb. - Maize anthocyanin regulatory C1 protein, a trans-acting factor which controls the expression of genes involved in anthocyanin biosynthesis. - Maize P protein [5], a trans-acting factor which regulates the biosynthetic pathway of a flavonoid-derived pigment in certain floral tissues. - Arabidopsis thaliana protein GL1/GLABROUS1 [6], required for the initiation of differentiation of leaf hair cells (trichomes). - Maize and barley myb-related proteins Zm1, Zm38 and Hv1, Hv33 [7]. - Yeast BAS1 [8], a transcriptional activator for the HIS4 gene. - Yeast REB1 [9], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as upstream of many genes transcribed by RNA polymerase II. - Fission yeast cdc5, a possible transcription factor whose activity is required for cell cycle progression and growth during G2. - Fission yeast myb1, which regulates telomere length and function. - Baker's yeast pre-mRNA-splicing factor CEF1. - Vertebrate telomeric repeat-binding factors 1 and 2 (TRF1/2), which bind to telomeric DNA and are involved in telomere length regulation. 
We have developed a profile, which has been manually adapted to specifically detect the DNA-binding myb-type HTH domain. A second general profile was developed for detection of the myb-like domain with a high sensitivity. A third profile was developed for the SANT domain (see <PDOC51293>). -Sequences known to belong to this class detected by the first profile: ALL. -Other sequence(s) detected in Swiss-Prot: 2. -Sequences known to belong to this class detected by the second profile: ALL, except 25. -Other sequence(s) detected in Swiss-Prot: 2. -Note: The profiles are in competition with one another and with the profile of the SANT domain (see <PDOC51293>). -Last update: February 2007 / Profile and text revised; profile added; patterns removed. [ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. "Viral myb oncogene encodes a sequence-specific DNA-binding activity." Nature 335:835-837(1988). PubMed=3185713; DOI=10.1038/335835a0 [ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. "Drosophila and vertebrate myb proteins share two conserved regions, one of which functions as a DNA-binding domain." EMBO J. 6:3085-3090(1987). PubMed=3121304 [ 3] Stracke R., Werber M., Weisshaar B. "The R2R3-MYB gene family in Arabidopsis thaliana." Curr. Opin. Plant. Biol. 4:447-456(2001). PubMed=11597504 [ 4] Tahirov T.H., Sato K., Ichikawa-Iwata E., Sasaki M., Inoue-Bungo T., Shiina M., Kimura K., Takata S., Fujikawa A., Morii H., Kumasaka T., Yamamoto M., Ishii S., Ogata K. "Mechanism of c-Myb-C/EBP beta cooperation from separated sites on a promoter." Cell 108:57-70(2002). PubMed=11792321 [ 5] Grotewold E., Athma P., Peterson T. "Alternatively spliced products of the maize P gene encode proteins with homology to the DNA-binding domain of myb-like transcription factors." Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991). PubMed=2052542 [ 6] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. 
"A myb gene required for leaf trichome differentiation in Arabidopsis is expressed in stipules." Cell 67:483-493(1991). PubMed=1934056 [ 7] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. "Multiple genes are transcribed in Hordeum vulgare and Zea mays that carry the DNA binding domain of the myb oncoproteins." Mol. Gen. Genet. 216:183-187(1989). PubMed=2664447 [ 8] Tice-Baldwin K., Fink G.R., Arndt K.T. "BAS1 has a Myb motif and activates HIS4 transcription only in combination with BAS2." Science 246:931-935(1989). PubMed=2683089 [ 9] Ju Q.D., Morrow B.E., Warner J.R. "REB1, a yeast DNA-binding protein with many targets, is essential for growth and bears some resemblance to the oncogene myb." Mol. Cell. Biol. 10:5226-5234(1990). PubMed=2204808 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00038} {PS50888; HLH} {BEGIN} *********************************************** * Myc-type, 'helix-loop-helix' domain profile * *********************************************** A number of eukaryotic proteins, which probably are sequence specific DNAbinding proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [1] that this domain is formed of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH) domain mediates protein dimerization and has been found in the proteins listed below [2,3]. 
Most of these proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A (ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of, but necessary for DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking the basic domain (Emc, Id) function as negative regulators since they form heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW. - The myc family of cellular oncogenes [4], which is currently known to contain four members: c-myc, N-myc, L-myc, and B-myc. The myc genes are thought to play a role in cellular differentiation and proliferation. - Proteins involved in myogenesis (the induction of muscle cells). In mammals MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin), in birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans CeMyoD, and in Drosophila nautilus (nau). - Vertebrate proteins that bind specific DNA sequences ('E boxes') in various immunoglobulin chains enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1), ITF-2 (tcf4), TFE3, and TFEB. - Vertebrate neurogenic differentiation factor 1 that acts as differentiation factor during neurogenesis. - Vertebrate MAX protein, a transcription regulator that forms a sequence-specific DNA-binding protein complex with myc or mad. - Vertebrate Max Interacting Protein 1 (MXI1 protein) which acts as a transcriptional repressor and may antagonize myc transcriptional activity by competing for max. 
- Proteins of the bHLH/PAS superfamily which are transcriptional activators. In mammals, AH receptor nuclear translocator (ARNT), single-minded homologs (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor (AHR), neuronal pas domain proteins (NPAS1 and NPAS2), endothelial pas domain protein 1 (EPAS1), mouse ARNT2, and human BMAL1. In Drosophila, single-minded (SIM), AH receptor nuclear translocator (ARNT), trachealess protein (TRH), and similar protein (SIMA). - Mammalian transcription factors HES, which repress transcription by acting on two types of DNA sequences, the E box and the N box. - Mammalian MAD protein (max dimerizer) which acts as transcriptional repressor and may antagonize myc transcriptional activity by competing for max. - Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind to a symmetrical DNA sequence that is found in a variety of viral and cellular promoters. - Human lyl-1 protein, which is involved, by chromosomal translocation, in T-cell leukemia. - Human transcription factor AP-4. - Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E box-dependent transcription in collaboration with E47. - Mammalian stem cell protein (SCL) (also known as tal1), a protein which may play an important role in hemopoietic differentiation. SCL is involved, by chromosomal translocation, in stem-cell leukemia. - Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins lack a basic DNA-binding domain but are able to form heterodimers with other HLH proteins, thereby inhibiting binding to DNA. - Drosophila extra-macrochaetae (emc) protein, which participates in sensory organ patterning by antagonizing the neurogenic activity of the achaete-scute complex. Emc is the homolog of mammalian Id proteins. 
- Human Sterol Regulatory Element Binding Protein 1 (SREBP1), a transcriptional activator that binds to the sterol regulatory element 1 (SRE-1) found in the flanking region of the LDLR gene and in other genes. - Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 (asense). The AS-C proteins are involved in the determination of the neuronal precursors in the peripheral nervous system and the central nervous system. - Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2 proteins. - Drosophila atonal protein (ato) which is involved in neurogenesis. - Drosophila daughterless (da) protein, which is essential for neurogenesis and sex-determination. - Drosophila deadpan (dpn), a hairy-like protein involved in the functional differentiation of neurons. - Drosophila delilah (dei) protein, which plays an important role in the differentiation of epidermal cells into muscle. - Drosophila hairy (h) protein, a transcriptional repressor which regulates the embryonic segmentation and adult bristle patterning. - Drosophila enhancer of split proteins E(spl), that are hairy-like proteins active during neurogenesis. They also act as transcriptional repressors. - Drosophila twist (twi) protein, which is involved in the establishment of germ layers in embryos. - Maize anthocyanin regulatory proteins R-S and LC. - Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved in chromosomal segregation. It binds to a highly conserved DNA sequence, found in centromeres and in several promoters. - Yeast INO2 and INO4 proteins. - Yeast phosphate system positive regulatory protein PHO4 which interacts with the upstream activating sequence of several acid phosphatase genes. - Yeast serine-rich protein TYE7 that is required for ty-mediated ADH2 expression. - Neurospora crassa nuc-1, a protein that activates the transcription of structural genes for phosphorus acquisition. 
- Fission yeast protein esc1 which is involved in the sexual differentiation process. The schematic representation of the helix-loop-helix domain is shown here: xxxxxxxxxxxxxxxxxxxxxxxx--------------------xxxxxxxxxxxxxxxxxxxxxxx Amphipathic helix 1 Loop Amphipathic helix 2 The profile we developed covers the helix-loop-helix dimerization domain and the basic region. -Sequences known to belong to this class detected by the profile: ALL. -Other sequence(s) detected in Swiss-Prot: NONE. -Last update: August 2003 / Pattern removed. [ 1] Murre C., McCaw P.S., Baltimore D. "A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD, and myc proteins." Cell 56:777-783(1989). PubMed=2493990 [ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991). [ 3] Kato G.J., Dang C.V. "Function of the c-Myc oncoprotein." FASEB J. 6:3065-3072(1992). PubMed=1521738 [ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. "CeMyoD accumulation defines the body wall muscle cell fate during C. elegans embryogenesis." Cell 63:907-919(1990). PubMed=2175254 [ 5] Riechmann V., van Cruechten I., Sablitzky F. "The expression pattern of Id4, a novel dominant negative helix-loop-helix protein, is distinct from Id1, Id2 and Id3." Nucleic Acids Res. 22:749-755(1994). PubMed=8139914 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. 
+-----------------------------------------------------------------------+ {END} {PDOC00039} {PS00039; DEAD_ATP_HELICASE} {PS00690; DEAH_ATP_HELICASE} {BEGIN} ***************************************************************** * DEAD and DEAH box families ATP-dependent helicases signatures * ***************************************************************** A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to this family are: - Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase. - PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process. - Pl10, a mouse protein expressed specifically during spermatogenesis. - An3, a Xenopus putative RNA helicase, closely related to Pl10. - SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to Pl10. - Caenorhabditis elegans helicase glh-1. - MSS116, a yeast protein required for mitochondrial splicing. - SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA. - p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division. - Rm62 (p62), a Drosophila putative RNA helicase related to p68. - DBP2, a yeast protein related to p68. - DHH1, a yeast protein. - DRS1, a yeast protein involved in ribosome assembly. - MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid. - ROK1, a yeast protein. - ste13, a fission yeast protein. - Vasa, a Drosophila protein important for oocyte formation and specification of embryonic posterior structures. - Me31B, a Drosophila maternally expressed protein of unknown function. 
- dbpA, an Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can suppress a mutation in the rpsB gene for ribosomal protein S2. - rhlB, an Escherichia coli putative RNA helicase. - rhlE, an Escherichia coli putative RNA helicase. - srmB, an Escherichia coli protein that shows RNA-dependent ATPase activity. It probably interacts with 23S ribosomal RNA. - Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2. - Yeast hypothetical protein YHR065c. - Yeast hypothetical protein YHR169w. - Fission yeast hypothetical protein SpAC31A2.07c. - Bacillus subtilis hypothetical protein yxiN. All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of these motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1]. Proteins currently known to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in various ATP-requiring steps of the pre-mRNA splicing process. - Fission yeast prh1, which may be involved in pre-mRNA splicing. - Male-less (mle), a Drosophila protein required in males, for dosage compensation of X chromosome linked genes. - RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents. Fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are the homologs of RAD3. - Yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M. - Yeast TPS1. - Yeast hypothetical protein YKL078w. - Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2. 
- Poxviruses' early transcription factor 70 Kd subunit which acts with RNA polymerase to initiate transcription from early gene promoters. - I8, a putative vaccinia virus helicase. - hrpA, an Escherichia coli putative RNA helicase. We have developed signature patterns for both subfamilies. -Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN] -Sequences known to belong to this class detected by the pattern: ALL, except for YHR169w. -Other sequence(s) detected in Swiss-Prot: 14. -Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR] -Sequences known to belong to this class detected by the pattern: ALL, except for hrpA. -Other sequence(s) detected in Swiss-Prot: 6. -Note: Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop) (see the relevant entry <PDOC00017>). -Expert(s) to contact by email: Linder P.; [email protected] -Last update: July 1999 / Text revised. [ 1] Schmid S.R., Linder P. "D-E-A-D protein family of putative RNA helicases." Mol. Microbiol. 6:283-291(1992). PubMed=1552844 [ 2] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. "Birth of the D-E-A-D box." Nature 337:121-122(1989). PubMed=2563148; DOI=10.1038/337121a0 [ 3] Wassarman D.A., Steitz J.A. "RNA splicing. Alive with DEAD proteins." Nature 349:463-464(1991). PubMed=1825133; DOI=10.1038/349463a0 [ 4] Hodgman T.C. "A new superfamily of replicative proteins." Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata). PubMed=3362205; DOI=10.1038/333022b0 [ 5] Harosh I., Deschavanne P. "The RAD3 gene is a member of the DEAH family RNA helicase-like protein." Nucleic Acids Res. 19:6331-6331(1991). PubMed=1956796 [ 6] Koonin E.V., Senkevich T.G. "Vaccinia virus encodes four putative DNA and/or RNA helicases distantly related to each other." J. Gen. Virol. 73:989-993(1992). 
PubMed=1321883
[E1] http://medweb2.unige.ch/~linder/RNA_helicases.html

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00040}
{PS00041; HTH_ARAC_FAMILY_1}
{PS01124; HTH_ARAC_FAMILY_2}
{BEGIN}
********************************************************************
* Bacterial regulatory proteins, araC family signature and profile *
********************************************************************

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified into subfamilies on the basis of sequence similarities. One of these subfamilies groups together the following proteins [1,2,3]:

- aarP, a transcriptional activator of the 2'-N-acetyltransferase gene in Providencia stuartii.
- ada, an Escherichia coli and Salmonella typhimurium bifunctional protein that repairs alkylated guanine in DNA by transferring the alkyl group at the O(6) position to a cysteine residue in the enzyme. The methylated protein acts as a positive regulator of its own synthesis and of the alkA, alkB and aidB genes.
- adaA, a Bacillus subtilis bifunctional protein that acts both as a transcriptional activator of the ada operon and as a methylphosphotriester-DNA alkyltransferase.
- adiY, an Escherichia coli protein of unknown function.
- aggR, the transcriptional activator of aggregative adherence fimbria I expression in enteroaggregative Escherichia coli.
- appY, a protein which acts as a transcriptional activator of acid phosphatase and other proteins during the deceleration phase of growth and acts as a repressor for other proteins that are synthesized in exponential growth or in the stationary phase. - araC, the arabinose operon regulatory protein, which activates the transcription of the araBAD genes. - cafR, the Yersinia pestis F1 operon positive regulatory protein. - celD, the Escherichia coli cel operon repressor. - cfaD, a protein which is required for the expression of the CFA/I adhesin of enterotoxigenic Escherichia coli. - csvR, a transcriptional activator of fimbrial genes in enterotoxigenic Escherichia coli. - envY, the porin thermoregulatory protein, which is involved in the control of the temperature-dependent expression of several Escherichia coli envelope proteins such as ompF, ompC, and lamB. - exsA, an activator of exoenzyme S synthesis in Pseudomonas aeruginosa. - fapR, the positive activator for the expression of the 987P operon coding for the fimbrial protein in enterotoxigenic Escherichia coli. - hrpB, a positive regulator of pathogenicity genes in Burkholderia solanacearum. - invF, the Salmonella typhimurium invasion operon regulator. - marA, which may be a transcriptional activator of genes involved in the multiple antibiotic resistance (mar) phenotype. - melR, the melibiose operon regulatory protein, which activates the transcription of the melAB genes. - mixE, a Shigella flexneri protein necessary for secretion of ipa invasins. - mmsR, the transcriptional activator for the mmsAB operon in Pseudomonas aeruginosa. - msmR, the multiple sugar metabolism operon transcriptional activator in Streptococcus mutans. - pchR, a Pseudomonas aeruginosa activator for pyochelin and ferripyochelin receptor. - perA, a transcriptional activator of the eaeA gene for intimin in enteropathogenic Escherichia coli. - pocR, a Salmonella typhimurium regulator of the cobalamin biosynthesis operon. 
- pqrA, from Proteus vulgaris.
- rafR, the regulator of the raffinose operon in Pediococcus pentosaceus.
- ramA, from Klebsiella pneumoniae.
- rhaR, the Escherichia coli and Salmonella typhimurium L-rhamnose operon transcriptional activator.
- rhaS, an Escherichia coli and Salmonella typhimurium positive activator of genes required for rhamnose utilization.
- rns, a protein which is required for the expression of the cs1 and cs2 adhesins of enterotoxigenic Escherichia coli.
- rob, a protein which binds to the right arm of the replication origin oriC of the Escherichia coli chromosome.
- soxS, a protein that, with the soxR protein, controls a superoxide response regulon in Escherichia coli.
- tetD, a protein from transposon TN10.
- tcpN or toxT, the Vibrio cholerae transcriptional activator of the tcp operon involved in pilus biosynthesis and transport.
- thcR, a probable regulator of the thc operon for the degradation of the thiocarbamate herbicide EPTC in Rhodococcus sp. strain NI86/21.
- ureR, the transcriptional activator of the plasmid-encoded urease operon in Enterobacteriaceae.
- virF and lcrF, the Yersinia virulence regulon transcriptional activator.
- virF, the Shigella transcriptional factor of invasion related antigens ipaBCD.
- xylR, the Escherichia coli xylose operon regulator.
- xylS, the transcriptional activator of the Pseudomonas putida TOL plasmid (pWWO, pWW53 and pDK1) meta operon (xylDLEGF genes).
- yfeG, an Escherichia coli hypothetical protein.
- yhiW, an Escherichia coli hypothetical protein.
- yhiX, an Escherichia coli hypothetical protein.
- yidL, an Escherichia coli hypothetical protein.
- yijO, an Escherichia coli hypothetical protein.
- yuxC, a Bacillus subtilis hypothetical protein.
- yzbC, a Bacillus subtilis hypothetical protein.

Except for celD, all of these proteins seem to be positive transcriptional factors. Their sizes range from 107 (soxS) to 529 (yzbC) residues.
The helix-turn-helix motif is located in the third quarter of most of the sequences; the N-terminal and central regions of these proteins are presumed to interact with effector molecules and may be involved in dimerization. The minimal DNA binding domain, which spans roughly 100 residues and comprises the HTH motif, contains another region with similarity to the classical HTH domain. However, it contains an insertion of one residue in the turn-region.

A signature pattern was derived from the region that follows the first HTH domain and that includes the totality of the putative second HTH domain. A more sensitive detection of members of the araC family is available through the use of a profile which spans the minimal DNA-binding region of 100 residues.

-Consensus pattern: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-x-{PLH}-[LIVMSTA]-[GSTACIL]-{GPK}-{F}-x-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-[FYIVA]-{FYWHCM}-{PGVI}-x(2)-[GSADENQKR]-x-[NSTAPKL]-[PARL]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 50.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Expert(s) to contact by email: Ramos J.L.; [email protected]
                                Gallegos M.-T.; [email protected]

-Last update: April 2006 / Pattern revised.

[ 1] Gallegos M.-T., Michan C., Ramos J.L. "The XylS/AraC family of regulators." Nucleic Acids Res. 21:807-810(1993). PubMed=8451183
[ 2] Henikoff S., Wallace J.C., Brown J.P. "Finding protein similarities with nucleotide sequence databases." Methods Enzymol. 183:111-132(1990). PubMed=2314271
[ 3] Gallegos M.T., Schleif R., Bairoch A., Hofmann K., Ramos J.L. "Arac/XylS family of transcriptional regulators." Microbiol. Mol. Biol. Rev. 61:393-410(1997). PubMed=9409145

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB).
There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00041} {PS00042; HTH_CRP_1} {PS51063; HTH_CRP_2} {BEGIN} ********************************************* * Crp-type HTH domain signature and profile * ********************************************* The crp-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH) domain of about 70-75 amino acids present in transcription regulators of the crp-fnr family, involved in the control of virulence factors, enzymes of aromatic ring degradation, nitrogen fixation, photosynthesis, and various types of respiration. The crp-fnr family is named after the first members identified in E.coli: the well characterized cyclic AMP receptor protein CRP or CAP (catabolite activator protein) and the fumarate and nitrate reductase regulator Fnr. crp-type HTH domain proteins occur in most bacteria and in chloroplasts of red algae. The DNA-binding HTH domain is located in the C-terminal part; the N-terminal part of the proteins of the crp-fnr family contains a nucleotide-binding domain (see <PDOC00691>) and a dimerization/linker helix occurs in between. The crp-fnr regulators predominantly act as transcription activators, but can also be important repressors, and respond to diverse intracellular and exogenous signals, such as cAMP, anoxia, redox state, oxidative and nitrosative stress, carbon monoxide, nitric oxide or temperature [1,2]. The structure of the crp-type DNA-binding domain (see <PDB:1LB2>) shows that the helices (H) forming the helix-turn-helix motif (H2-H3) are flanked by two beta-hairpin (B) wings, in the topology H1-B1-B2-H2-H3-B3-B4. 
Helix 3 is termed the recognition helix, as in most wHTHs it binds the DNA major groove [3,4,5].

Some proteins known to contain a Crp-type HTH domain:

- Escherichia coli crp (also known as cAMP receptor), a protein that complexes with cAMP and regulates the transcription of several catabolite-sensitive operons.
- Escherichia coli fnr, a protein that activates genes for proteins involved in a variety of anaerobic electron transport systems.
- Rhizobium leguminosarum fnrN, a transcription regulator of nitrogen fixation.
- Rhodobacter sphaeroides fnrL, a transcription activator of genes for heme biosynthesis, bacteriochlorophyll synthesis and the light-harvesting complex LHII.
- Rhizobiaceae fixK, a protein that regulates nitrogen fixation genes, both positively and negatively.
- Lactobacillus casei fnr-like protein flp, a putative regulatory protein linked to the trpDCFBA operon.
- Cyanobacteria ntcA, a regulator of the expression of genes subject to nitrogen control.
- Xanthomonas campestris clp, a protein involved in the regulation of phytopathogenicity. Clp controls the production of extracellular enzymes, xanthan gum and pigment, either positively or negatively.

The 'helix-turn-helix' DNA-binding motif of these proteins is located in the C-terminal part of the sequence. The pattern we use to detect these proteins starts two residues before the HTH motif and ends two residues before the end of helix 3. We also developed a profile that covers the entire wHTH, including helix 1 and strand 4, and which allows a more sensitive detection.

-Consensus pattern: [LIVM]-[STAG]-[RHNWM]-x(2)-[LIM]-[GA]-x-[LIVMFYAS]-[LIVSC]-[GA]-x-[STACN]-x(2)-[MST]-x(1,2)-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 1.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Last update: April 2006 / Pattern revised.
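As an illustration of how a consensus pattern such as the one above can be applied to a sequence, the following minimal Python sketch translates the PROSITE notation into a regular expression. It is not the scanning software used by PROSITE. It assumes the pattern is written with a hyphen between every element, and it handles only the elements that occur in this entry: single residues, x, [..] sets, {..} exclusions and (n) or (n,m) repeats. The names prosite_to_regex, CRP_PATTERN and TOY_SEQUENCE are hypothetical, and the toy sequence is invented to satisfy the pattern; it is not a real protein.

```python
import re

# One pattern element: x, a single residue, a [..] set or a {..} exclusion,
# optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a hyphen-separated PROSITE pattern into a Python regex."""
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.match(element.strip())
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            token = core                      # single residue or [..] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        pieces.append(token)
    return "".join(pieces)


# Consensus pattern of this entry, with a hyphen between every element.
CRP_PATTERN = (
    "[LIVM]-[STAG]-[RHNWM]-x(2)-[LIM]-[GA]-x-[LIVMFYAS]-[LIVSC]-[GA]-x-"
    "[STACN]-x(2)-[MST]-x(1,2)-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF]"
)
CRP_REGEX = prosite_to_regex(CRP_PATTERN)

# Invented sequence built to satisfy the pattern (illustration only).
TOY_SEQUENCE = "LSRAALGALLGASAAMAGRALAAL"

hit = re.search(CRP_REGEX, TOY_SEQUENCE)
print(CRP_REGEX)
print(hit.group() if hit else "no match")
```

A pattern hit of this kind only says that the residues fit the consensus; the profile described above remains the more sensitive detection method.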
[ 1] Irvine A.S., Guest J.R. "Lactobacillus casei contains a member of the CRP-FNR family." Nucleic Acids Res. 21:753-753(1993). PubMed=8441692 [ 2] Koerner H., Sofia H.J., Zumft W.G. FEMS Microbiol. Rev. 27:559-592(2003). [ 3] Busby S., Ebright R.H. "Transcription activation by catabolite activator protein (CAP)." J. Mol. Biol. 293:199-213(1999). PubMed=10550204; DOI=10.1006/jmbi.1999.3161 [ 4] Lanzilotta W.N., Schuller D.J., Thorsteinsson M.V., Kerby R.L., Roberts G.P., Poulos T.L. "Structure of the CO sensing transcription activator CooA." Nat. Struct. Biol. 7:876-880(2000). PubMed=11017196; DOI=10.1038/82820 [ 5] Huffman J.L., Brennan R.G. "Prokaryotic transcription regulators: more than just the helix-turn-helix motif." Curr. Opin. Struct. Biol. 12:98-106(2002). PubMed=11839496 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00042} {PS50949; HTH_GNTR} {BEGIN} ******************************** * GntR-type HTH domain profile * ******************************** The gntR-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH) domain of about 60-70 residues present in transcriptional regulators of the gntR family. This family of bacterial regulators is named after Bacillus subtilis gntR, a repressor of the gluconate operon [1,2]. Six subfamilies have been described for the gntR family: fadR, hutC, plmA, mocR, ytrA, and araR, which regulate various biological processes and important bacterial metabolic pathways. 
The DNA-binding gntR-type HTH domain occurs usually in the N-terminal part. The C-terminal part can contain a subfamily-specific effector-binding domain and/or an oligomerization domain. The fadR-like regulators, representing the largest subfamily, are involved in the regulation of oxidized substrates related to metabolic pathways or metabolism of amino acids. HutC-like proteins are involved in conjugative plasmid transfer in several Streptomyces species. PlmA is a cyanobacterial regulator of plasmid maintenance. The mocR subfamily encompasses proteins homologous to class I aminotransferase proteins, which bind pyridoxal phosphate as a cofactor. Most of the ytrA-like proteins take part in operons involved in ATP-binding cassette (ABC) transport systems. AraR is an autoregulatory protein with a C-terminal domain that binds a carbohydrate effector, similar to that present in regulators of the lacI/galR family (see <PDOC00366>) [3,4].

The crystal structures of fadR show that the N-terminal, DNA binding domain contains a small beta-sheet (B) core and three alpha-helices (H) with a topology H1-B1-H2-H3-B2-B3 (see <PDB:1H9T>). Helices 2 and 3, connected via a tight turn, comprise the helix-turn-helix motif. The antiparallel beta-strands 2 and 3 together with B1 form a small beta-sheet, which is called the wing. Helix 3 is termed the recognition helix as in most wHTHs it binds the DNA major groove. Here, only the N-terminal tip of the recognition helix makes specific DNA-contacts and the wing makes unusual sequence-specific contacts to the minor groove. Like other HTH proteins, most gntR-type regulators bind as homodimers to 2-fold symmetric DNA sequences in which each monomer recognizes half of the site [5,6].

Some proteins known to contain a gntR-type HTH domain:

- Bacillus subtilis gntR, a repressor of the gnt operon, which is responsible for gluconate metabolism. In the absence of gluconate, gntR binds to the promoter of the operon.
  The expression of the operon is induced in the presence of gluconate.
- Escherichia coli fadR, a transcriptional regulator of fatty acid metabolism. In the absence of the acyl-CoA effector, fadR binds specific operator sites, represses the expression of genes involved in fatty acid degradation and import, and activates biosynthetic genes. Binding of acyl-CoA induces conformational changes that abolish DNA binding, which derepresses the catabolic genes and deactivates the anabolic genes.
- Escherichia coli phdR, a transcriptional repressor of the pyruvate dehydrogenase complex.
- Klebsiella aerogenes and Pseudomonas putida hutC, a transcriptional repressor of the histidine utilization (hut) operon.
- Streptomyces lividans korA, a regulator that controls plasmid transfer.
- Rhizobium meliloti mocR, a probable regulator of rhizopine catabolism.
- Bacillus subtilis ytrA, a repressor of the acetoin utilization gene cluster.
- Anabaena sp. strain PCC 7120 plmA, a regulator involved in plasmid maintenance [4].
- Bacillus subtilis araR, a transcriptional repressor of the arabinose operon.

The profile we developed covers the entire gntR-type HTH domain, from the well-conserved part of helix 1 to the end of the wing.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Expert(s) to contact by email: Rigali S.; [email protected]

-Last update: February 2004 / Text revised.

[ 1] Buck D., Guest J.R. "Overexpression and site-directed mutagenesis of the succinyl-CoA synthetase of Escherichia coli and nucleotide sequence of a gene (g30) that is adjacent to the suc operon." Biochem. J. 260:737-747(1989). PubMed=2548486
[ 2] Haydon D.J., Guest J.R. "A new family of bacterial regulatory proteins." FEMS Microbiol. Lett. 63:291-295(1991). PubMed=2060763
[ 3] Rigali S., Derouaux A., Giannotta F., Dusart J.
"Subdivision of the helix-turn-helix GntR family of bacterial regulators in the FadR, HutC, MocR, and YtrA subfamilies." J. Biol. Chem. 277:12507-12515(2002). PubMed=11756427; DOI=10.1074/jbc.M110968200 [ 4] Lee M.H., Scherer M., Rigali S., Golden J.W. "PlmA, a new member of the GntR family, has plasmid maintenance functions in Anabaena sp. strain PCC 7120." J. Bacteriol. 185:4315-4325(2003). PubMed=12867439 [ 5] Van Aalten D.M.F., DiRusso C.C., Knudsen J. EMBO J. 20:2041-2050(2001). [ 6] Xu Y., Heath R.J., Li Z., Rock C.O., White S.W. "The FadR.DNA complex. Transcriptional control of fatty acid metabolism in Escherichia coli." J. Biol. Chem. 276:17373-17379(2001). PubMed=11279025; DOI=10.1074/jbc.M100195200 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00043} {PS50931; HTH_LYSR} {BEGIN} ******************************** * LysR-type HTH domain profile * ******************************** The lysR-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH) domain of about 60 residues present in lysR-type transcriptional regulators (LTTR), one of the most common regulator families in prokaryotes. The family is named after the Escherichia coli regulator lysR [1]. LysR proteins are present in diverse bacterial genera, archaea and algal chloroplasts. All LTTRs contain the DNA-binding lysR-type HTH domain, usually in the N-terminal part. Most LTTRs require a small compound that acts as co-inducer. 
The C-terminal part of lysR proteins can contain a regulatory domain with two subdomains involved in (1) co-inducer recognition/response and (2) DNA binding and response. LTTRs activate the transcription of operons and regulons involved in very diverse functions, such as amino acid biosynthesis, CO2 fixation, antibiotic resistance, regulation of virulence factors, nodulation for nitrogen fixing bacteria, oxidative stress response or aromatic compounds catabolism. Most LTTRs act as a transcriptional activator of the target genes and also as a repressor of their own expression. Typical LTTRs bind to a sequence of about 50-60 bp, which contains two distinct sites, (1) a recognition-binding site (RBS) centered near -65 of the target transcription start site and with an inverted repeat motif including the T-N(11)-A motif and (2) an activation-binding site (ABS) which overlaps the -35 region of the transcription start site of the regulated gene. LysR proteins are mainly cytoplasmic, but some seem membrane-bound [2].

The crystal structure of the lysR DNA-binding domain of CbnR shows three alpha helices and two anti-parallel beta strands (see <PDB:1IXC>), with the helix-turn-helix motif comprising the second and third helices and the strands being called the wing. Most LTTRs are likely tetramers [3].

Some proteins known to contain a lysR domain:

- Proteus vulgaris blaA, a transcriptional regulator of beta-lactamase.
- Pseudomonas putida catR, a regulator of catechol catabolism for benzoate degradation.
- Escherichia coli cynR, a regulator for detoxification of cyanate.
- Klebsiella aerogenes cysB, a regulator of cysteine biosynthesis.
- Vibrio cholerae irgB, an iron-dependent regulator of virulence factors.
- Escherichia coli lysR, a transcriptional regulator of lysine biosynthesis.
- Escherichia coli nhaR, a regulator of a sodium/proton (Na+/H+) antiporter.
- Rhizobium meliloti nodD and syrM, regulators of nodulation genes involved in nitrogen fixation symbiosis.
- Salmonella typhimurium oxyR, a regulator of intracellular hydrogen peroxide and oxidative stress response.
- Ralstonia solanacearum phcA, a regulator of virulence factors.

The profile we developed covers the entire lysR-type HTH domain.

-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.

-Expert(s) to contact by email: Schell M.; [email protected]

-Last update: October 2003 / Pattern removed, profile added and text revised.

[ 1] Henikoff S., Haughn G.W., Calvo J.M., Wallace J.C. "A large family of bacterial activator proteins." Proc. Natl. Acad. Sci. U.S.A. 85:6602-6606(1988). PubMed=3413113
[ 2] Schell M.A. "Molecular biology of the LysR family of transcriptional regulators." Annu. Rev. Microbiol. 47:597-626(1993). PubMed=8257110; DOI=10.1146/annurev.mi.47.100193.003121
[ 3] Muraoka S., Okumura R., Ogawa N., Nonaka T., Miyashita K., Senda T. "Crystal structure of a full-length LysR-type transcriptional regulator, CbnR: unusual combination of two subunit forms and molecular bases for causing and changing DNA bend." J. Mol. Biol. 328:555-566(2003). PubMed=12706716

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00044}
{PS00045; HISTONE_LIKE}
{BEGIN}
*********************************************************
* Bacterial histone-like DNA-binding proteins signature *
*********************************************************

Bacteria synthesize a set of small, usually basic proteins of about 90 residues that bind DNA and are known as histone-like proteins [1,2]. The exact function of these proteins is not yet clear but they are capable of wrapping DNA and stabilizing it from denaturation under extreme environmental conditions. The sequence of a number of different types of these proteins is known:

- The HU proteins, which, in Escherichia coli, are a dimer of closely related alpha and beta chains and, in other bacteria, can be a dimer of identical chains. HU-type proteins have been found in a variety of eubacteria, cyanobacteria and archaebacteria, and are also encoded in the chloroplast genome of some algae [3].
- The integration host factor (IHF), a dimer of closely related chains which seem to function in genetic recombination as well as in translational and transcriptional control [4] in enterobacteria.
- The bacteriophage SPO1 transcription factor 1 (TF1) which selectively binds to and inhibits the transcription of hydroxymethyluracil-containing DNA, such as SPO1 DNA, by RNA polymerase in vitro.
- The African Swine fever virus protein A104R (or LMW5-AR) [5].

As a signature pattern for this family of proteins, we use a twenty residue sequence which includes three perfectly conserved positions. According to the tertiary structure of one of these proteins [6], this pattern spans exactly the first half of the flexible DNA-binding arm.

-Consensus pattern: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x(1,2)-[GA]-x-[KN]-P-x-[TN]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: December 2004 / Pattern and text revised. [ 1] Drlica K., Rouviere-Yaniv J. "Histonelike proteins of bacteria." Microbiol. Rev. 51:301-319(1987). PubMed=3118156 [ 2] Pettijohn D.E. "Histone-like proteins and bacterial chromosome structure." J. Biol. Chem. 263:12793-12796(1988). PubMed=3047111 [ 3] Wang S.L., Liu X.-Q. "The plastid genome of Cryptomonas phi encodes an hsp70-like protein, a histone-like protein, and an acyl carrier protein." Proc. Natl. Acad. Sci. U.S.A. 88:10783-10787(1991). PubMed=1961745 [ 4] Friedman D.I. "Integration host factor: a protein for all reasons." Cell 55:545-554(1988). PubMed=2972385 [ 5] Neilan J.G., Lu Z., Kutish G.F., Sussman M.D., Roberts P.C., Yozawa T., Rock D.L. "An African swine fever virus gene with similarity to bacterial DNA binding proteins, bacterial integration host factors, and the Bacillus phage SPO1 transcription factor, TF1." Nucleic Acids Res. 21:1496-1496(1993). PubMed=8464748 [ 6] Tanaka I., Appelt K., Dijk J., White S.W., Wilson K.S. "3-A resolution structure of a protein with histone-like properties in prokaryotes." Nature 310:376-381(1984). PubMed=6540370 +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00045} {PS00046; HISTONE_H2A} {BEGIN} ************************* * Histone H2A signature * ************************* Histone H2A is one of the four histones, along with H2B, H3 and H4, which forms the eukaryotic nucleosome core. 
Using alignments of histone H2A sequences [1,2,E1] we selected, as a signature pattern, a conserved region in the N-terminal part of H2A. This region is conserved both in classical S-phase regulated H2A's and in variant histone H2A's which are synthesized throughout the cell cycle.

-Consensus pattern: [AC]-G-L-x-F-P-V
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 2.

-Last update: November 1995 / Pattern and text revised.

[ 1] Wells D.E., Brown D. "Histone and histone gene compilation and alignment update." Nucleic Acids Res. 19:2173-2188(1991). PubMed=2041803
[ 2] Thatcher T.H., Gorovsky M.A. "Phylogenetic analysis of the core histones H2A, H2B, H3, and H4." Nucleic Acids Res. 22:174-179(1994). PubMed=8121801
[E1] http://research.nhgri.nih.gov/histones/

+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00046}
{PS00047; HISTONE_H4}
{BEGIN}
************************
* Histone H4 signature *
************************

Histone H4 is one of the four histones, along with H2A, H2B and H3, which forms the eukaryotic nucleosome core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost invariant in more than 2 billion years of evolution [1,E1]. The region we use as a signature pattern is a pentapeptide found in positions 14 to 18 of all H4 sequences.
It contains a lysine residue which is often acetylated [2] and a histidine residue which is implicated in DNA-binding [3]. -Consensus pattern: G-A-K-R-H -Sequences known to belong to this class detected by the pattern: ALL. -Other sequence(s) detected in Swiss-Prot: 3. -Last update: November 1995 / Text revised. [ 1] Thatcher T.H., Gorovsky M.A. "Phylogenetic analysis of the core histones H2A, H2B, H3, and H4." Nucleic Acids Res. 22:174-179(1994). PubMed=8121801 [ 2] Doenecke D., Gallwitz D. "Acetylation of histones in nucleosomes." Mol. Cell. Biochem. 44:113-128(1982). PubMed=6808351 [ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D. "A highly basic histone H4 domain bound to the sharply bent region of nucleosomal DNA." Nature 331:365-367(1988). PubMed=3340182; DOI=10.1038/331365a0 [E1] http://research.nhgri.nih.gov/histones/ +-----------------------------------------------------------------------+ PROSITE is copyright. It is produced by the Swiss Institute of Bioinformatics (SIB). There are no restrictions on its use by non-profit institutions as long as its content is in no way modified. Usage by and for commercial entities requires a license agreement. For information about the licensing scheme send an email to [email protected] or see: http://www.expasy.org/prosite/prosite_license.htm. +-----------------------------------------------------------------------+ {END} {PDOC00047} {PS00048; PROTAMINE_P1} {BEGIN} ************************** * Protamine P1 signature * ************************** Protamines are small, highly basic proteins, that substitute for histones in sperm chromatin during the haploid phase of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to