Download procite - UWI St. Augustine

{PDOC00000}
{BEGIN}
**********************************
*** PROSITE documentation file ***
**********************************
Release 20.58 of 15-Dec-2009.
PROSITE is developed by the Swiss Institute of Bioinformatics (SIB) under
the responsibility of Amos Bairoch and Nicolas Hulo.
This release was prepared by: Nicolas Hulo, Virginie Bulliard, Petra
Langendijk-Genevaux and Christian Sigrist with the help of Edouard
de Castro, Lorenzo Cerutti, Corinne Lachaize and Amos Bairoch.
See: http://www.expasy.org/prosite/
Email: [email protected]
Acknowledgements:
- To all those mentioned in this document who have reviewed the entry(ies)
for which they are listed as experts. With specific thanks to Rein Aasland,
Mark Boguski, Peer Bork, Josh Cherry, Andre Chollet, Frank Kolakowski,
David Landsman, Bernard Henrissat, Eugene Koonin, Steve Henikoff, Manuel
Peitsch and Jonathan Reizer.
- Jim Apostolopoulos is the author of the PDOC00699 entry.
- Brigitte Boeckmann is the author of the PDOC00691, PDOC00703, PDOC00829,
PDOC00796, PDOC00798, PDOC00799, PDOC00906, PDOC00907, PDOC00908,
PDOC00912, PDOC00913, PDOC00924, PDOC00928, PDOC00929, PDOC00955,
PDOC00961, PDOC00966, PDOC00988 and PDOC50020 entries.
- Jean-Louis Boulay is the author of the PDOC01051, PDOC01050, PDOC01052,
PDOC01053 and PDOC01054 entries.
- Ryszard Brzezinski is the author of the PDOC60000 entry.
- Elisabeth Coudert is the author of the PDOC00373 entry.
- Kirill Degtyarenko is the author of the PDOC60001 entry.
- Christian Doerig is the author of the PDOC01049 entry.
- Kay Hofmann is the author of the PDOC50003, PDOC50006, PDOC50007 and
PDOC50017 entries.
- Chantal Hulo is the author of the PDOC00987 entry.
- Karine Michoud is the author of the PDOC01044 and PDOC01042 entries.
- Yuri Panchin is the author of the PDOC51013 entry.
- S. Ramakumar is the author of the PDOC51052, PDOC60004, PDOC60010,
PDOC60011, PDOC60015, PDOC60016, PDOC60018, PDOC60020, PDOC60021,
PDOC60022, PDOC60023, PDOC60024, PDOC60025, PDOC60026, PDOC60027,
PDOC60028, PDOC60029 and PDOC60030 entries.
- Keith Robison is the author of the PDOC00830 and PDOC00861 entries.
----------------------------------------------------------------------
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or
see: http://www.expasy.org/prosite/prosite_license.htm.
----------------------------------------------------------------------
{END}
{PDOC00001}
{PS00001; ASN_GLYCOSYLATION}
{BEGIN}
************************
* N-glycosylation site *
************************
It has been known for a long time [1] that potential N-glycosylation sites are
specific to the consensus sequence Asn-Xaa-Ser/Thr. It must be noted that the
presence of the consensus tripeptide is not sufficient to conclude that an
asparagine residue is glycosylated, due to the fact that the folding of the
protein plays an important role in the regulation of N-glycosylation [2]. It
has been shown [3] that the presence of proline between Asn and Ser/Thr will
inhibit N-glycosylation; this has been confirmed by a recent [4] statistical
analysis of glycosylation sites, which also shows that about 50% of the sites
that have a proline C-terminal to Ser/Thr are not glycosylated.

It must also be noted that there are a few reported cases of glycosylation
sites with the pattern Asn-Xaa-Cys; an experimentally demonstrated occurrence
of such a non-standard site is found in the plasma protein C [5].
-Consensus pattern: N-{P}-[ST]-{P}
[N is the glycosylation site]
-Last update: May 1991 / Text revised.
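As an illustration, the consensus pattern above can be written as a regular expression and scanned along a sequence. This is a minimal sketch, not part of PROSITE: the function name and the test sequences are hypothetical, and a match only marks a potential site, since the text above stresses that the tripeptide alone does not prove glycosylation.

```python
import re

# PROSITE N-{P}-[ST]-{P} written as a regular expression. Wrapping it in a
# lookahead lets overlapping candidate sites be reported as well.
NGLYC = re.compile(r"(?=N[^P][ST][^P])")

def n_glycosylation_sites(seq):
    """Return 1-based positions of Asn residues that match the consensus."""
    return [m.start() + 1 for m in NGLYC.finditer(seq.upper())]

# Hypothetical sequences, for illustration only.
print(n_glycosylation_sites("ANGSANPTANGTP"))  # Asn followed by Pro is skipped
print(n_glycosylation_sites("NNSSA"))          # two overlapping candidates
```

The lookahead matters for runs such as NNSS, where a plain non-overlapping scan would report only the first asparagine.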
[ 1] Marshall R.D.
"Glycoproteins."
Annu. Rev. Biochem. 41:673-702(1972).
PubMed=4563441; DOI=10.1146/annurev.bi.41.070172.003325
[ 2] Pless D.D., Lennarz W.J.
"Enzymatic conversion of proteins to glycoproteins."
Proc. Natl. Acad. Sci. U.S.A. 74:134-138(1977).
PubMed=264667
[ 3] Bause E.
"Structural requirements of N-glycosylation of proteins. Studies with
proline peptides as conformational probes."
Biochem. J. 209:331-336(1983).
PubMed=6847620
[ 4] Gavel Y., von Heijne G.
"Sequence differences between glycosylated and non-glycosylated
Asn-X-Thr/Ser acceptor sites: implications for protein engineering."
Protein Eng. 3:433-442(1990).
PubMed=2349213
[ 5] Miletich J.P., Broze G.J. Jr.
"Beta protein C is not glycosylated at asparagine 329. The rate of
translation may influence the frequency of usage at
asparagine-X-cysteine sites."
J. Biol. Chem. 265:11397-11404(1990).
PubMed=1694179
{END}
{PDOC00004}
{PS00004; CAMP_PHOSPHO_SITE}
{BEGIN}
****************************************************************
* cAMP- and cGMP-dependent protein kinase phosphorylation site *
****************************************************************
There have been a number of studies on the specificity of cAMP- and
cGMP-dependent protein kinases [1,2,3]. Both types of kinases appear to share
a preference for the phosphorylation of serine or threonine residues found
close to at least two consecutive N-terminal basic residues. It is important
to note that there are quite a number of exceptions to this rule.
-Consensus pattern: [RK](2)-x-[ST]
[S or T is the phosphorylation site]
-Last update: June 1988 / First entry.
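A short sketch of how this pattern could be scanned for, reporting the Ser/Thr acceptor rather than the start of the match. The function name and example sequence are hypothetical, and hits are only candidates, given the exceptions noted above.

```python
import re

# [RK](2)-x-[ST]: two basic residues, any residue, then the Ser/Thr acceptor.
# The acceptor is captured so that its own position can be reported.
PKA = re.compile(r"(?=[RK]{2}.([ST]))")

def pka_sites(seq):
    """Return (1-based position, residue) for each candidate phosphoacceptor."""
    return [(m.start(1) + 1, m.group(1)) for m in PKA.finditer(seq.upper())]

# Hypothetical sequence, for illustration only.
print(pka_sites("AKRASLRRGTA"))
```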
[ 1] Feramisco J.R., Glass D.B., Krebs E.G.
J. Biol. Chem. 255:4240-4245(1980).
[ 2] Glass D.B., Smith S.B.
"Phosphorylation by cyclic GMP-dependent protein kinase of a synthetic
peptide corresponding to the autophosphorylation site in the enzyme."
J. Biol. Chem. 258:14797-14803(1983).
PubMed=6317673
[ 3] Glass D.B., el-Maghrabi M.R., Pilkis S.J.
"Synthetic peptides corresponding to the site phosphorylated in
6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase as substrates of
cyclic nucleotide-dependent protein kinases."
J. Biol. Chem. 261:2987-2993(1986).
PubMed=3005275
{END}
{PDOC00005}
{PS00005; PKC_PHOSPHO_SITE}
{BEGIN}
*****************************************
* Protein kinase C phosphorylation site *
*****************************************
In vivo, protein kinase C exhibits a preference for the phosphorylation of
serine or threonine residues found close to a C-terminal basic residue [1,2].
The presence of additional basic residues at the N- or C-terminal of the
target amino acid enhances the Vmax and Km of the phosphorylation reaction.
-Consensus pattern: [ST]-x-[RK]
[S or T is the phosphorylation site]
-Last update: June 1988 / First entry.
[ 1] Woodgett J.R., Gould K.L., Hunter T.
Eur. J. Biochem. 161:177-184(1986).
[ 2] Kishimoto A., Nishiyama K., Nakanishi H., Uratsuji Y., Nomura H.,
Takeyama Y., Nishizuka Y.
"Studies on the phosphorylation of myelin basic protein by protein
kinase C and adenosine 3':5'-monophosphate-dependent protein kinase."
J. Biol. Chem. 260:12492-12499(1985).
PubMed=2413024
{END}
{PDOC00006}
{PS00006; CK2_PHOSPHO_SITE}
{BEGIN}
*****************************************
* Casein kinase II phosphorylation site *
*****************************************
Casein kinase II (CK-2) is a protein serine/threonine kinase whose activity is
independent of cyclic nucleotides and calcium. CK-2 phosphorylates many
different proteins. The substrate specificity [1] of this enzyme can be
summarized as follows:
(1) Under comparable conditions Ser is favored over Thr.
(2) An acidic residue (either Asp or Glu) must be present three residues from
    the C-terminal of the phosphate acceptor site.
(3) Additional acidic residues in positions +1, +2, +4, and +5 increase the
    phosphorylation rate. Most physiological substrates have at least one
    acidic residue in these positions.
(4) Asp is preferred to Glu as the provider of acidic determinants.
(5) A basic residue at the N-terminal of the acceptor site decreases the
    phosphorylation rate, while an acidic one will increase it.
-Consensus pattern: [ST]-x(2)-[DE]
[S or T is the phosphorylation site]
-Note: This pattern is found in most of the known physiological substrates.
-Last update: May 1991 / Text revised.
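The pattern encodes only rule (2). As a rough sketch, a scan can also count the additional acidic residues at +1, +2, +4 and +5 named in rule (3). The function name, the returned tuple layout and the example sequences are hypothetical, and the count is a simple tally, not a rate prediction.

```python
import re

# [ST]-x(2)-[DE]: acceptor followed, three residues later, by Asp or Glu.
CK2 = re.compile(r"(?=[ST].{2}[DE])")
EXTRA_OFFSETS = (1, 2, 4, 5)  # positions named in rule (3)

def ck2_sites(seq):
    """Return (1-based position, residue, count of extra acidic residues)."""
    seq = seq.upper()
    hits = []
    for m in CK2.finditer(seq):
        i = m.start()
        extra = sum(
            1 for k in EXTRA_OFFSETS
            if i + k < len(seq) and seq[i + k] in "DE"
        )
        hits.append((i + 1, seq[i], extra))
    return hits

# Hypothetical sequences, for illustration only.
print(ck2_sites("ASDEEDEA"))
print(ck2_sites("GTAAEGG"))
```

A site with a higher tally satisfies more of the conditions listed above, but the tally does not weigh Asp against Glu (rule 4) or look at the N-terminal side (rule 5).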
[ 1] Pinna L.A.
"Casein kinase 2: an 'eminence grise' in cellular regulation?"
Biochim. Biophys. Acta 1054:267-284(1990).
PubMed=2207178
{END}
{PDOC00007}
{PS00007; TYR_PHOSPHO_SITE}
{BEGIN}
****************************************
* Tyrosine kinase phosphorylation site *
****************************************
Substrates of tyrosine protein kinases are generally characterized by a lysine
or an arginine seven residues to the N-terminal side of the phosphorylated
tyrosine. An acidic residue (Asp or Glu) is often found at either three or
four residues to the N-terminal side of the tyrosine [1,2,3]. There are a
number of exceptions to this rule such as the tyrosine phosphorylation sites
of enolase and lipocortin II.
-Consensus pattern: [RK]-x(2)-[DE]-x(3)-Y
or [RK]-x(3)-[DE]-x(2)-Y
[Y is the phosphorylation site]
-Last update: June 1988 / First entry.
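Both alternatives span eight residues with the tyrosine last, so they can be folded into one regular expression. The sketch below is illustrative only; the function name and test sequences are hypothetical.

```python
import re

# [RK]-x(2)-[DE]-x(3)-Y or [RK]-x(3)-[DE]-x(2)-Y as a single alternation.
TYR = re.compile(r"(?=[RK](?:.{2}[DE].{3}|.{3}[DE].{2})Y)")

def tyr_kinase_sites(seq):
    """Return 1-based positions of candidate phosphorylated tyrosines."""
    # The tyrosine sits seven residues after the basic residue that starts
    # the match, so its 1-based position is the 0-based start plus 8.
    return [m.start() + 8 for m in TYR.finditer(seq.upper())]

# Hypothetical sequences, for illustration only.
print(tyr_kinase_sites("RAADAAAYG"))  # acidic residue four before Tyr
print(tyr_kinase_sites("KAAAEAAY"))   # acidic residue three before Tyr
```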
[ 1] Patschinsky T., Hunter T., Esch F.S., Cooper J.A., Sefton B.M.
"Analysis of the sequence of amino acids surrounding sites of tyrosine
phosphorylation."
Proc. Natl. Acad. Sci. U.S.A. 79:973-977(1982).
PubMed=6280176
[ 2] Hunter T.
"Synthetic peptide substrates for a tyrosine protein kinase."
J. Biol. Chem. 257:4843-4848(1982).
PubMed=6279650
[ 3] Cooper J.A., Esch F.S., Taylor S.S., Hunter T.
"Phosphorylation sites in enolase and lactate dehydrogenase utilized
by tyrosine protein kinases in vivo and in vitro."
J. Biol. Chem. 259:7835-7841(1984).
PubMed=6330085
{END}
{PDOC00008}
{PS00008; MYRISTYL}
{BEGIN}
*************************
* N-myristoylation site *
*************************
An appreciable number of eukaryotic proteins are acylated by the covalent
addition of myristate (a C14-saturated fatty acid) to their N-terminal residue
via an amide linkage [1,2]. The sequence specificity of the enzyme responsible
for this modification, myristoyl CoA:protein N-myristoyl transferase (NMT),
has been derived from the sequence of known N-myristoylated proteins and from
studies using synthetic peptides. It seems to be the following:
- The N-terminal residue must be glycine.
- In position 2, uncharged residues are allowed. Charged residues, proline
  and large hydrophobic residues are not allowed.
- In positions 3 and 4, most, if not all, residues are allowed.
- In position 5, small uncharged residues are allowed (Ala, Ser, Thr, Cys,
  Asn and Gly). Serine is favored.
- In position 6, proline is not allowed.
-Consensus pattern: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
[G is the N-myristoylation site]
-Note: We deliberately include as potential myristoylated glycine residues,
those which are internal to a sequence. It could well be that the sequence
under study represents a viral polyprotein precursor and that subsequent
proteolytic processing could expose an internal glycine as the N-terminal of
a mature protein.
-Last update: October 1989 / Pattern and text revised.
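A minimal sketch of a scan for this pattern follows. The `internal` switch is a hypothetical convenience that mirrors the note above: by default internal glycines are reported, and switching it off keeps only a match at the first residue of the sequence supplied. The function name and test sequences are illustrative.

```python
import re

# G-{EDRKHPFYW}-x(2)-[STAGCN]-{P} written as a regular expression.
MYR = re.compile(r"(?=G[^EDRKHPFYW].{2}[STAGCN][^P])")

def myristoylation_sites(seq, internal=True):
    """Return 1-based positions of candidate myristoylated glycines."""
    positions = [m.start() + 1 for m in MYR.finditer(seq.upper())]
    if not internal:
        # Keep only a glycine that is already the N-terminal residue.
        positions = [p for p in positions if p == 1]
    return positions

# Hypothetical sequence, for illustration only.
seq = "GQAASKAAGNLLSA"
print(myristoylation_sites(seq))
print(myristoylation_sites(seq, internal=False))
```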
[ 1] Towler D.A., Gordon J.I., Adams S.P., Glaser L.
"The biology and enzymology of eukaryotic protein acylation."
Annu. Rev. Biochem. 57:69-99(1988).
PubMed=3052287; DOI=10.1146/annurev.bi.57.070188.000441
[ 2] Grand R.J.A.
"Acylation of viral and eukaryotic proteins."
Biochem. J. 258:625-638(1989).
PubMed=2658970
{END}
{PDOC00009}
{PS00009; AMIDATION}
{BEGIN}
******************
* Amidation site *
******************
The precursor of hormones and other active peptides which are C-terminally
amidated is always directly followed [1,2] by a glycine residue which provides
the amide group, and most often by at least two consecutive basic residues
(Arg or Lys) which generally function as an active peptide precursor cleavage
site. Although all amino acids can be amidated, neutral hydrophobic residues
such as Val or Phe are good substrates, while charged residues such as Asp or
Arg are much less reactive. C-terminal amidation has not yet been shown to
occur in unicellular organisms or in plants.
-Consensus pattern: x-G-[RK]-[RK]
[x is the amidation site]
-Last update: June 1988 / First entry.
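The sketch below scans for the pattern and reports the residue that would carry the amide. The third field flags Asp and Arg, the two residues the text names as much less reactive; treating exactly those two as the flag set is an assumption made for illustration, as are the function name and test sequences.

```python
import re

# x-G-[RK]-[RK]: the residue before Gly is the candidate amidation site.
AMID = re.compile(r"(?=(.)G[RK][RK])")

def amidation_sites(seq):
    """Return (1-based position, residue, named_as_less_reactive)."""
    return [
        (m.start(1) + 1, m.group(1), m.group(1) in "DR")
        for m in AMID.finditer(seq.upper())
    ]

# Hypothetical sequences, for illustration only.
print(amidation_sites("YGGFMGKRA"))
print(amidation_sites("AADGRKA"))
```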
[ 1] Kreil G.
"Occurrence, detection, and biosynthesis of carboxy-terminal amides."
Methods Enzymol. 106:218-223(1984).
PubMed=6548541
[ 2] Bradbury A.F., Smyth D.G.
"Biosynthesis of the C-terminal amide in peptide hormones."
Biosci. Rep. 7:907-916(1987).
PubMed=3331120
{END}
{PDOC00010}
{PS00010; ASX_HYDROXYL}
{BEGIN}
***************************************************
* Aspartic acid and asparagine hydroxylation site *
***************************************************
Post-translational hydroxylation of aspartic acid or asparagine [1] to form
erythro-beta-hydroxyaspartic acid or erythro-beta-hydroxyasparagine has been
identified in a number of proteins with domains homologous to epidermal growth
factor (EGF). Examples of such proteins are the blood coagulation protein
factors VII, IX and X, proteins C, S, and Z, the LDL receptor, thrombomodulin,
etc. Based on sequence comparisons of the EGF-homology region that contains
hydroxylated Asp or Asn, a consensus sequence has been identified that seems
to be required by the hydroxylase(s).
-Consensus pattern: C-x-[DN]-x(4)-[FY]-x-C-x-C
[D or N is the hydroxylation site]
-Note: This consensus pattern is located in the N-terminal of EGF-like
domains, while our EGF-like cysteine pattern signature (see the relevant
entry <PDOC00021>) is located in the C-terminal.
-Last update: January 1989 / First entry.
[ 1] Stenflo J., Ohlin A.-K., Owen W.G., Schneider W.J.
"beta-Hydroxyaspartic acid or beta-hydroxyasparagine in bovine low
density lipoprotein receptor and in bovine thrombomodulin."
J. Biol. Chem. 263:21-24(1988).
PubMed=2826439
{END}
{PDOC00011}
{PS00011; GLA_1}
{PS50998; GLA_2}
{BEGIN}
**********************************************************************
* Gamma-carboxyglutamic acid-rich (Gla) domain signature and profile *
**********************************************************************
The vitamin K-dependent blood coagulation factor IX as well as several
extracellular regulatory proteins require vitamin K for the post-translational
synthesis of gamma-carboxyglutamic acid, an amino acid clustered in the
N-terminal Gla domain of these proteins [1,2]. The Gla domain is a membrane
binding motif which, in the presence of calcium ions, interacts with
phospholipid membranes that include phosphatidylserine.

The 3D structure of the Gla domain has been solved (see for example
<PDB:1CFH>) [3,4]. Calcium ions induce conformational changes in the Gla
domain and are necessary for the Gla domain to fold properly. A common
structural feature of functional Gla domains is the clustering of N-terminal
hydrophobic residues into a hydrophobic patch that mediates interaction with
the cell surface membrane [4].
Proteins known to contain a Gla domain are listed below:
- A number of plasma proteins involved in blood coagulation. These proteins
  are prothrombin, coagulation factors VII, IX and X, proteins C, S, and Z.
- Two proteins that occur in calcified tissues: osteocalcin (also known as
  bone-Gla protein, BGP), and matrix Gla-protein (MGP).
- Proline-rich Gla proteins 1 and 2 [5].
- Cone snail venom peptides: conantokin-G and -T, and conotoxin GS [6].
The pattern we developed starts with the conserved Gla-x(3)-Gla-x-Cys motif
found in the middle of the domain, which seems to be important for substrate
recognition by the carboxylase [7], and ends with the last conserved position
of the domain (an aromatic residue). We also developed a profile that covers
the whole Gla domain.
-Consensus pattern: E-x(2)-[ERK]-E-x-C-x(6)-[EDR]-x(10,11)-[FYA]-[YW]
[The 2 E's are the carboxylation site]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 1.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Note: All glutamic residues present in the domain are potential carboxylation
sites; in coagulation proteins, all are modified to Gla, while in BGP and MGP
some are not.
-Expert(s) to contact by email:
Price P.A.; [email protected]
-Last update: June 2004 / Pattern and text revised; profile added.
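The x(10,11) element makes this a variable-length pattern. A small sketch of a scan that reports the two glutamates of the Gla-x(3)-Gla-x-Cys motif and the end of the match is given below; the function name, tuple layout and synthetic test sequences are hypothetical, and only the pattern (not the profile) is covered.

```python
import re

# E-x(2)-[ERK]-E-x-C-x(6)-[EDR]-x(10,11)-[FYA]-[YW]
GLA = re.compile(r"E.{2}[ERK]E.C.{6}[EDR].{10,11}[FYA][YW]")

def gla_signatures(seq):
    """Return (first Glu, second Glu, last residue) as 1-based positions."""
    hits = []
    for m in GLA.finditer(seq.upper()):
        first = m.start() + 1
        # The second fixed Glu lies four residues after the first one.
        hits.append((first, first + 4, m.end()))
    return hits

# Synthetic sequences built to fit the pattern, for illustration only.
short_gap = "GG" + "EAAREAC" + "A" * 6 + "E" + "A" * 10 + "FW"
long_gap = "EAAREAC" + "A" * 6 + "D" + "G" * 11 + "YW"
print(gla_signatures(short_gap))
print(gla_signatures(long_gap))
```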
[ 1] Friedman P.A., Przysiecki C.T.
"Vitamin K-dependent carboxylation."
Int. J. Biochem. 19:1-7(1987).
PubMed=3106112
[ 2] Vermeer C.
"Gamma-carboxyglutamate-containing proteins and the vitamin
K-dependent carboxylase."
Biochem. J. 266:625-636(1990).
PubMed=2183788
[ 3] Freedman S.J., Furie B.C., Furie B., Baleja J.D.
"Structure of the metal-free gamma-carboxyglutamic acid-rich membrane
binding region of factor IX by two-dimensional NMR spectroscopy."
J. Biol. Chem. 270:7980-7987(1995).
PubMed=7713897
[ 4] Freedman S.J., Blostein M.D., Baleja J.D., Jacobs M., Furie B.C., Furie B.
"Identification of the phospholipid binding site in the vitamin
K-dependent blood coagulation protein factor IX."
J. Biol. Chem. 271:16227-16236(1996).
PubMed=8663165
[ 5] Kulman J.D., Harris J.E., Haldeman B.A., Davie E.W.
"Primary structure and tissue distribution of two novel proline-rich
gamma-carboxyglutamic acid proteins."
Proc. Natl. Acad. Sci. U.S.A. 94:9058-9062(1997).
PubMed=9256434
[ 6] Haack J.A., Rivier J.E., Parks T.N., Mena E.E., Cruz L.J., Olivera B.M.
"Conantokin-T. A gamma-carboxyglutamate containing peptide with
N-methyl-d-aspartate antagonist activity."
J. Biol. Chem. 265:6025-6029(1990).
PubMed=2180939
[ 7] Price P.A., Fraser J.D., Metz-Virca G.
"Molecular cloning of matrix Gla protein: implications for substrate
recognition by the vitamin K-dependent gamma-carboxylase."
Proc. Natl. Acad. Sci. U.S.A. 84:8335-8339(1987).
PubMed=3317405
{END}
{PDOC00012}
{PS00012; PHOSPHOPANTETHEINE}
{PS50075; ACP_DOMAIN}
{BEGIN}
**************************************
* Phosphopantetheine attachment site *
**************************************
Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of
acyl carrier proteins (ACP) in some multienzyme complexes where it serves as
a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these
proteins [2]. ACP proteins or domains have been found in various enzyme
systems which are listed below (references are only provided for recently
determined sequences).
- Fatty acid synthetase (FAS), which catalyzes the formation of long-chain
  fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
  chloroplast FAS are composed of eight separate subunits which correspond to
  the different enzymatic activities; ACP is one of these polypeptides.
  Fungal FAS consists of two multifunctional proteins, FAS1 and FAS2; the ACP
  domain is located in the N-terminal section of FAS2. Vertebrate FAS
  consists of a single multifunctional enzyme; the ACP domain is located
  between the beta-ketoacyl reductase domain and the C-terminal thioesterase
  domain [3].
- Polyketide antibiotics synthase enzyme systems. Polyketides are secondary
  metabolites produced from simple fatty acids, by microorganisms and plants.
  ACP is one of the polypeptidic components involved in the biosynthesis of
  Streptomyces polyketide antibiotics actinorhodin, curamycin, granaticin,
  monensin, oxytetracycline and tetracenomycin C.
- Bacillus subtilis putative polyketide synthases pksK, pksL and pksM which
  respectively contain three, five and one ACP domains.
- The multifunctional 6-methylsalicylic acid synthase (MSAS) from Penicillium
  patulum. This is a multifunctional enzyme involved in the biosynthesis of a
  polyketide antibiotic and which contains an ACP domain in the C-terminal
  extremity.
- Multifunctional mycocerosic acid synthase (gene mas) from Mycobacterium
  bovis.
- Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme
  catalyzes the first step in the biosynthesis of the cyclic antibiotic
  gramicidin S.
- Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction
  carried out by tycA is identical to that catalyzed by grsA.
- Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme
  is a multifunctional protein that activates and polymerizes proline,
  valine, ornithine and leucine. GrsB contains four ACP domains.
- Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora erythraea,
  which are involved in the biosynthesis of the polyketide antibiotic
  erythromycin. Each of these proteins contains two ACP domains.
- Conidial green pigment synthase from Aspergillus nidulans.
- ACV synthetase from various fungi. This enzyme catalyzes the first step in
  the biosynthesis of penicillin and cephalosporin. It contains three ACP
  domains.
- Enterobactin synthetase component F (gene entF) from Escherichia coli. This
  enzyme is involved in the ATP-dependent activation of serine during
  enterobactin (enterochelin) biosynthesis.
- Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from
  Bacillus subtilis. Subunits 1 and 2 contain three related domains while
  subunit 3 only contains a single domain.
- HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme
  synthesizes HC-toxin, a cyclic tetrapeptide. HTS1 contains four ACP
  domains.
- Fungal mitochondrial ACP, which is part of the respiratory chain NADH
  dehydrogenase (complex I).
- Rhizobium nodulation protein nodF, which probably acts as an ACP in the
  synthesis of the nodulation Nod factor fatty acyl chain.
The sequence around the phosphopantetheine attachment site is conserved in all
these proteins and can be used as a signature pattern. A profile was also
developed that spans the complete ACP-like domain.
-Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-
[LIVMST]-{PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-
[LIVMWSTA]-[LIVGSTACR]-{LPIY}-{VY}-[LIVMFA]
[S is the pantetheine attachment site]
-Sequences known to belong to this class detected by the pattern: ALL, except
C.paradoxa ACP.
-Other sequence(s) detected in Swiss-Prot: 115.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: December 2004 / Pattern and text revised.
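A pattern of this length is easier to convert mechanically than by hand. The sketch below translates the subset of PROSITE pattern syntax used in these entries (x, single residues, [allowed] and {forbidden} sets, and (n) or (n,m) repeats) into a Python regular expression. All names are hypothetical, the pattern string is written with a hyphen between every element as the syntax requires, and the test fragment is a synthetic example checked by hand against each position.

```python
import re

# One PROSITE element: x, a residue, [allowed] or {forbidden}, optionally
# followed by a repeat such as (2) or (10,11).
ELEMENT = re.compile(r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")

def prosite_to_regex(pattern):
    """Translate a hyphen-separated PROSITE pattern into a regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

ACP_PATTERN = (
    "[DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-"
    "[LIVMST]-{PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-"
    "[LIVMWSTA]-[LIVGSTACR]-{LPIY}-{VY}-[LIVMFA]"
)
ACP_REGEX = re.compile(prosite_to_regex(ACP_PATTERN))

def pantetheine_sites(seq):
    """Return 1-based positions of the candidate attachment serine."""
    # The serine is the sixth element of the pattern, hence the offset.
    return [m.start() + 6 for m in ACP_REGEX.finditer(seq.upper())]

# Synthetic fragment, for illustration only.
print(pantetheine_sites("AADLGADSLDTVELVMALGG"))
```

The converter deliberately rejects anything outside this subset instead of guessing, so other PROSITE syntax elements would need to be added explicitly.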
[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter,
Berlin New-York (1988).
[ 2] Pugh E.L., Wakil S.J.
J. Biol. Chem. 240:4727-4733(1965).
[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S.
"Structural organization of the multifunctional animal fatty-acid synthase."
Eur. J. Biochem. 198:571-579(1991).
PubMed=2050137
{END}
{PDOC00013}
{PS51257; PROKAR_LIPOPROTEIN}
{BEGIN}
******************************************************************
* Prokaryotic membrane lipoprotein lipid attachment site profile *
******************************************************************
In prokaryotes, membrane lipoproteins are synthesized with a precursor signal
peptide, which is cleaved by a specific lipoprotein signal peptidase (signal
peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1].
Some of the proteins known to undergo such processing currently include (for
recent listings see [1,2,3]):
- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene
cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidium 34 Kd antigen.
- Treponema pallidium membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natrobacterium pharaonis [4], a membrane associated
copperbinding protein. This is the first archaebacterial protein known
to be
modified in such a fashion).
From the precursor sequences of all these proteins, we derived a profile that
starts at the beginning of the sequence and ends after the post-translationally
modified cysteine.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: some 100 prokaryotic proteins. Some
 of them are not membrane lipoproteins, but at least half of them could be.
-Note: This profile replaces an obsolete rule. All the information in the rule
 has been encoded in the profile format.
-Last update: October 2006 / Text revised; profiles added; rule deleted.
[ 1] Hayashi S., Wu H.C.
"Lipoproteins in bacteria."
J. Bioenerg. Biomembr. 22:451-471(1990).
PubMed=2202727
[ 2] Klein P., Somorjai R.L., Lau P.C.K.
"Distinctive properties of signal sequences from bacterial lipoproteins."
Protein Eng. 2:15-20(1988).
PubMed=3253732
[ 3] von Heijne G.
Protein Eng. 2:531-534(1989).
[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D.,
Engelhard M.
"The primary structure of halocyanin, an archaeal blue copper protein,
predicts a lipid anchor for membrane fixation."
J. Biol. Chem. 269:14939-14945(1994).
PubMed=8195126
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or
see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00014}
{PS00014; ER_TARGET}
{BEGIN}
********************************************
* Endoplasmic reticulum targeting sequence *
********************************************
Proteins that permanently reside in the lumen of the endoplasmic reticulum
(ER) seem to be distinguished from newly synthesized secretory proteins by the
presence of the C-terminal sequence Lys-Asp-Glu-Leu (KDEL) [1,2]. While KDEL
is the preferred signal in many species, variants of that signal are used by
different species. This situation is described in the following table.
 Signal   Species
 ----------------------------------------------------------------
 KDEL     Vertebrates, Drosophila, Caenorhabditis elegans, plants
 HDEL     Saccharomyces cerevisiae, Kluyveromyces lactis, plants
 DDEL     Kluyveromyces lactis
 ADEL     Schizosaccharomyces pombe (fission yeast)
 SDEL     Plasmodium falciparum
The signal is usually very strictly conserved in major ER proteins but some
minor ER proteins have divergent sequences (probably because efficient
retention of these proteins is not crucial to the cell).
Proteins bearing the KDEL-type signal are not simply held in the ER, but are
selectively retrieved from a post-ER compartment by a receptor and returned to
their normal location.
The currently known ER luminal proteins are listed below.
- Protein disulfide-isomerase (PDI) (also known as the beta-subunit of prolyl
  4-hydroxylase, as a component of oligosaccharyl transferase, as
  glutathione-insulin transhydrogenase and as a thyroid hormone binding
  protein).
- ERp60, ERp72, and P5, three minor isoforms of PDI.
- Trypanosoma brucei bloodstream-specific protein 2, a probable PDI.
- hsp70 related protein GRP78 (also known as the immunoglobulin heavy chain
  binding protein (BiP), and as KAR2, in fungi).
- hsp90 related protein 'endoplasmin' (also known as GRP94, Erp99 or Hsp108).
- Calreticulin, a calcium-binding protein (also known as calregulin, CRP55,
  or HACBP).
- ERC-55, a calcium-binding protein.
- Reticulocalbin, a calcium-binding protein.
- Hsp47, a heat-shock protein that binds strongly to collagen and could act
  as a chaperone in the collagen biosynthetic pathway.
- A receptor for a plant hormone, auxin.
- Thiol proteases from rice bean (SH-EP) and kidney bean (EP-C1).
- Esterases from mammalian liver and from nematodes.
- Alpha-2-macroglobulin receptor-associated protein (RAP).
- Yeast peptidyl-prolyl cis-trans isomerase D (CYPD).
- Yeast protein KRE5, a protein required for (1->6)-beta-D-glucan synthesis.
- Yeast protein SEC20, required for the transport of proteins from the
  endoplasmic reticulum to the Golgi apparatus.
- Yeast protein SCJ1, involved in protein sorting.
-Consensus pattern: [KRHQSA]-[DENQ]-E-L>
-Sequences known to belong to this class detected by the pattern: ALL, except
 for liver esterases which have H-[TVI]-E-L.
-Other sequence(s) detected in Swiss-Prot: 24 proteins which are clearly not
 located in the ER (because they are of bacterial or viral origin, for
 example) and a protein which can be considered as a valid candidate: human
 80KH protein.
-Last update: November 1997 / Text revised.
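The consensus pattern of this entry can be checked with a regular expression.
The following is a minimal sketch, assuming sequences are plain one-letter
amino-acid strings; the function name has_er_signal and the example sequences
are illustrative and are not part of PROSITE.

```python
import re

# PROSITE pattern [KRHQSA]-[DENQ]-E-L> : the trailing '>' anchors the match
# at the C-terminus, which is expressed here with the end-of-string anchor \Z.
ER_TARGET = re.compile(r"[KRHQSA][DENQ]EL\Z")


def has_er_signal(sequence: str) -> bool:
    """Return True if the last four residues match the ER_TARGET pattern."""
    return ER_TARGET.search(sequence.strip().upper()) is not None


if __name__ == "__main__":
    # Made-up sequences that only differ in their C-terminal tetrapeptide.
    for seq in ("MAAAKDEL", "MAAAHDEL", "MKDELAAA", "MAAAHTEL"):
        print(seq, has_er_signal(seq))
```

A positive result only says that the C-terminus fits the pattern; as noted in
this entry, proteins of bacterial or viral origin can match without being ER
residents, and the liver esterase variant H-[TVI]-E-L is not covered.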
[ 1] Munro S., Pelham H.R.B.
"A C-terminal signal prevents secretion of luminal ER proteins."
Cell 48:899-907(1987).
PubMed=3545499
[ 2] Pelham H.R.B.
"The retention signal for soluble proteins of the endoplasmic reticulum."
Trends Biochem. Sci. 15:483-486(1990).
PubMed=2077689
{END}
{PDOC00015}
{PS50079; NLS_BP}
{BEGIN}
*************************************************
* Bipartite nuclear localization signal profile *
*************************************************
The uptake of protein by the nucleus is extremely selective and nuclear
proteins must therefore contain within their final structure a signal that
specifies selective accumulation in the nucleus [1,2]. Studies on some nuclear
proteins, such as the large T antigen of SV40, have indicated which part of
the sequence is required for nuclear translocation. The known nuclear
targeting sequences are generally basic, but there seems to be no clear
common denominator between all the known sequences. Although some consensus
sequence patterns have been proposed (see for example [3]), the current best
strategy to detect a nuclear targeting sequence is based [4] on the following
definition of what is called a 'bipartite nuclear localization signal':
(1) Two adjacent basic amino acids (Arg or Lys).
(2) A spacer region of any 10 residues.
(3) At least three basic residues (Arg or Lys) in the five positions
after the spacer region.
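The three-part definition above can be sketched as a simple window scan. This
is a minimal illustration of the rule as stated, not the PROSITE profile
itself; the function name find_bipartite_nls, its parameters and the test
sequence are assumptions made for the example.

```python
BASIC = frozenset("KR")  # basic residues considered by the rule: Lys and Arg


def find_bipartite_nls(sequence, spacer=10, tail=5, min_basic=3):
    """Return 0-based start positions of windows that satisfy the rule:
    two adjacent basic residues, a spacer of `spacer` residues of any kind,
    then at least `min_basic` basic residues in the next `tail` positions."""
    seq = sequence.upper()
    window = 2 + spacer + tail
    hits = []
    for i in range(len(seq) - window + 1):
        head = seq[i:i + 2]
        tail_region = seq[i + 2 + spacer:i + window]
        head_ok = all(residue in BASIC for residue in head)
        tail_ok = sum(residue in BASIC for residue in tail_region) >= min_basic
        if head_ok and tail_ok:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Made-up sequence: 'KR', a 10-residue spacer, then 'KKAKL' (3 basic of 5).
    print(find_bipartite_nls("MS" + "KR" + "PAATAAAGQA" + "KKAKL" + "DE"))
```

Because the rule is this permissive, a hit should be read as a candidate
only; the detection figures quoted in this entry show that non-nuclear
proteins also match.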
The profile we developed covers the entire bipartite nuclear localization
signal.
-Sequences known to belong to this class detected by the profile: 56% of known
 nuclear proteins according to [4].
-Other sequence(s) detected in Swiss-Prot: about 4.2% of non-nuclear proteins
 according to [4].
-Note: This profile replaces an obsolete rule. All the information in the rule
 has been encoded in the profile format.
-Last update: October 2006 / Text revised; profiles added; rule deleted.
[ 1] Dingwall C., Laskey R.A.
"Protein import into the cell nucleus."
Annu. Rev. Cell Biol. 2:367-390(1986).
PubMed=3548772; DOI=10.1146/annurev.cb.02.110186.002055
[ 2] Garcia-Bustos J.F., Heitman J., Hall M.N.
Biochim. Biophys. Acta 1071:83-101(1991).
[ 3] Gomez-Marquez J., Segade F.
FEBS Lett. 226:217-219(1988).
[ 4] Dingwall C., Laskey R.A.
"Nuclear targeting sequences -- a consensus?"
Trends Biochem. Sci. 16:478-481(1991).
PubMed=1664152
{END}
{PDOC00016}
{PS00016; RGD}
{BEGIN}
****************************
* Cell attachment sequence *
****************************
The sequence Arg-Gly-Asp, found in fibronectin, is crucial for its interaction
with its cell surface receptor, an integrin [1,2]. What has been called the
'RGD' tripeptide is also found in the sequences of a number of other proteins,
where it has been shown to play a role in cell adhesion. These proteins are:
some forms of collagens, fibrinogen, vitronectin, von Willebrand factor (VWF),
snake disintegrins, and slime mold discoidins. The 'RGD' tripeptide is also
found in other proteins where it may also, but not always, serve the same
purpose.
-Consensus pattern: R-G-D
-Last update: December 1991 / Text revised.
[ 1] Ruoslahti E., Pierschbacher M.D.
"Arg-Gly-Asp: a versatile cell recognition signal."
Cell 44:517-518(1986).
PubMed=2418980
[ 2] d'Souza S.E., Ginsberg M.H., Plow E.F.
Trends Biochem. Sci. 16:246-250(1991).
{END}
{PDOC00017}
{PS00017; ATP_GTP_A}
{BEGIN}
*****************************************
* ATP/GTP-binding site motif A (P-loop) *
*****************************************
From sequence comparisons and crystallographic data analysis it has been shown
[1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP
share a number of more or less conserved sequence motifs. The best conserved
of these motifs is a glycine-rich region, which typically forms a flexible
loop between a beta-strand and an alpha-helix. This loop interacts with one of
the phosphate groups of the nucleotide. This sequence motif is generally
referred to as the 'A' consensus sequence [1] or the 'P-loop' [5].
There are numerous ATP- or GTP-binding proteins in which the P-loop is found.
We list below a number of protein families for which the relevance of the
presence of such a motif has been noted:
- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>).
- Dynamins and dynamin-like proteins (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- Thymidylate kinase (see <PDOC01034>).
- Shikimate kinase (see <PDOC00868>).
- Nitrogenase iron protein family (nifH/chlL) (see <PDOC00580>).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7]
  (see <PDOC00185>).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran (see <PDOC00859>).
- ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00539>).
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).
Not all ATP- or GTP-binding proteins are picked up by this motif. A number of
proteins escape detection because the structure of their ATP-binding site is
completely different from that of the P-loop. Examples of such proteins are
the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding
proteins the flexible loop exists in a slightly different form; this is the
case for tubulins or protein kinases. A special mention must be reserved for
adenylate kinase, in which there is a single deviation from the P-loop
pattern: in the last position Gly is found instead of Ser or Thr.
-Consensus pattern: [AG]-x(4)-G-K-[ST]
-Sequences known to belong to this class detected by the pattern: a majority.
-Other sequence(s) detected in Swiss-Prot: in addition to the proteins listed
 above, the 'A' motif is also found in a number of other proteins. Most of
 these proteins probably bind a nucleotide, but others are definitely not
 ATP- or GTP-binding (as for example chymotrypsin, or human ferritin light
 chain).
-Expert(s) to contact by email:
Koonin E.V.; [email protected]
-Last update: July 1999 / Text revised.
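As an illustration of how the P-loop pattern behaves, the following sketch
translates [AG]-x(4)-G-K-[ST] into a regular expression and reports each
match with its 1-based position. The function name find_p_loops is
hypothetical, and the two input strings are short illustrative stretches: one
with Ser in the last position, one with Gly there (the deviation described
for adenylate kinase).

```python
import re

# [AG]-x(4)-G-K-[ST]: 'x(4)' means any four residues, written '.{4}' here.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(sequence):
    """Return (1-based start, matched residues) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(sequence.upper())]


if __name__ == "__main__":
    print(find_p_loops("MTEYKLVVVGAGGVGKSALT"))  # Ser after G-K: reported
    print(find_p_loops("GGPGSGKG"))              # Gly after G-K: not reported
```

The pattern is short, so a match on its own is weak evidence of nucleotide
binding, as the remarks on chymotrypsin and ferritin light chain in this
entry make clear.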
[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J.
"Distantly related sequences in the alpha- and beta-subunits of ATP
synthase, myosin, kinases and other ATP-requiring enzymes and a common
nucleotide binding fold."
EMBO J. 1:945-951(1982).
PubMed=6329717
[ 2] Moller W., Amons R.
"Phosphate-binding sequences in nucleotide-binding proteins."
FEBS Lett. 186:1-7(1985).
PubMed=2989003
[ 3] Fry D.C., Kuby S.A., Mildvan A.S.
"ATP-binding site of adenylate kinase: mechanistic implications of its
homology with ras-encoded p21, F1-ATPase, and other nucleotide-binding
proteins."
Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
PubMed=2869483
[ 4] Dever T.E., Glynias M.J., Merrick W.C.
"GTP-binding domain: three consensus sequence elements with distinct
spacing."
Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
PubMed=3104905
[ 5] Saraste M., Sibbald P.R., Wittinghofer A.
"The P-loop -- a common motif in ATP- and GTP-binding proteins."
Trends Biochem. Sci. 15:430-434(1990).
PubMed=2126155
[ 6] Koonin E.V.
"A superfamily of ATPases with diverse functions containing either
classical or deviant ATP-binding motif."
J. Mol. Biol. 229:1165-1174(1993).
PubMed=8445645
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R.,
Gallagher M.P.
"Binding protein-dependent transport systems."
J. Bioenerg. Biomembr. 22:571-592(1990).
PubMed=2229036
[ 8] Hodgman T.C.
"A new superfamily of replicative proteins."
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
PubMed=3362205; DOI=10.1038/333022b0
[ 9] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
"Birth of the D-E-A-D box."
Nature 337:121-122(1989).
PubMed=2563148; DOI=10.1038/337121a0
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M.
Nucleic Acids Res. 17:4713-4730(1989).
{END}
{PDOC00018}
{PS00018; EF_HAND_1}
{PS50222; EF_HAND_2}
{BEGIN}
********************************************************
* EF-hand calcium-binding domain signature and profile *
********************************************************
Many calcium-binding proteins belong to the same evolutionary family and share
a type of calcium-binding domain known as the EF-hand [1 to 5]. This type of
domain consists of a twelve residue loop flanked on both sides by a twelve
residue alpha-helical domain (see <PDB:1CLL>). In an EF-hand loop the calcium
ion is coordinated in a pentagonal bipyramidal configuration. The six residues
involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues
are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12
provides two oxygens for liganding Ca (bidentate ligand). The basic
structural/functional unit of EF-hand proteins is usually a pair of EF-hand
motifs that together form a stable four-helix bundle domain. The pairing of
EF-hands enables cooperativity in the binding of Ca2+ ions.
We list below the proteins which are known to contain EF-hand regions. For
each type of protein we have indicated in parentheses the total number of
EF-hand regions known or supposed to exist. This number does not include
regions which clearly have lost their calcium-binding properties, or the
atypical low-affinity site (which spans thirteen residues) found in the S-100/
ICaBP family of proteins [6].
- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3).
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).
- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).
- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).
- Calcium-binding proteins TCBP-23 and TCBP-25 from Tetrahymena thermophila
  (Ca=4?).
- Calcium-dependent protein kinases (CDPK) from plants (Ca=4).
- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).
- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).
- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals
  (Ca=1).
- Fimbrin (plastin) (Ca=2).
- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).
- Guanylate cyclase activating protein (GCAP) (Ca=3).
- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1
  (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).
- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).
- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).
- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that
  contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1) (see the
  entry <PDOC00535>) (Ca=1).
- Parvalbumins alpha and beta (Ca=2).
- Placental calcium-binding protein (18a2) (nerve growth factor induced
  protein 42a) (p9k) (Ca=2).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).
- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).
- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).
- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), Lps-1 (Ca=8).
- Serine/threonine specific protein phosphatase rdgc (EC 3.1.3.16) from
  Drosophila (Ca=2).
- Sorcin V19 from hamster (Ca=2).
- Spectrin alpha chain (Ca=2).
- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4).
- Troponins C; from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from
  arthropods and molluscs (Ca=2).
There have been a number of attempts [7,8] to develop patterns that pick up
EF-hand regions, but these studies were made a few years ago when not so many
different families of calcium-binding proteins were known. We therefore
developed a new pattern which takes into account all published sequences. This
pattern includes the complete EF-hand loop as well as the first residue which
follows the loop and which seems to always be hydrophobic. We also developed a
profile that covers the loop and the two alpha helices.
-Consensus pattern: D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-
                    [DENQSTAGC]-x(2)-[DE]-[LIVMFYW]
-Sequences known to belong to this class detected by the pattern: ALL, except
 for a few sequences.
-Other sequence(s) detected in Swiss-Prot: proteins which are probably not
 calcium-binding and a few proteins for which we have reason to believe that
 they bind calcium: a number of endoglucanases and a xylanase from the
 cellulosome complex of Clostridium [9].
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Note: Positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.
-Note: The 6th residue in an EF-hand loop is, in most cases, a Gly, but the
 number of exceptions to this 'rule' has gradually increased and we felt that
 the pattern should include all the different residues which have been shown
 to exist in this position in functional Ca-binding sites.
-Note: The pattern will, in some cases, miss one of the EF-hand regions in
 some proteins with multiple EF-hand domains.
-Expert(s) to contact by email:
Cox J.A.; [email protected]
Kretsinger R.H.; [email protected]
-Last update: April 2006 / Pattern revised.
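To make the position numbering concrete, the sketch below hand-translates the
EF-hand consensus pattern into a regular expression and, for each match,
reports the residues at loop positions 1, 3, 5, 7, 9 and 12 under the labels
X, Y, Z, -Y, -X and -Z. The function name ef_hand_ligands is hypothetical and
the input is a short illustrative loop-like string; the PROSITE profile is
not reproduced here.

```python
import re

# Hand translation of the pattern: {..} becomes a negated class [^..] and
# x(2) becomes .{2}. The 13th element is the residue that follows the loop.
EF_HAND_1 = re.compile(
    r"D[^W][DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC].{2}[DE][LIVMFYW]"
)

# 1-based loop positions of the six liganding residues and their labels.
LIGAND_LABELS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(sequence):
    """Return (1-based start, {label: residue}) for each pattern match."""
    results = []
    for m in EF_HAND_1.finditer(sequence.upper()):
        loop = m.group()[:12]  # the twelve-residue loop itself
        ligands = {label: loop[pos - 1] for pos, label in LIGAND_LABELS.items()}
        results.append((m.start() + 1, ligands))
    return results


if __name__ == "__main__":
    print(ef_hand_ligands("FDKDGDGTITTKELG"))
```

Replacing the residue at position 12 with anything other than Asp or Glu
makes the match disappear, which mirrors the statement that this position is
invariant.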
[ 1] Kawasaki H., Kretsinger R.H.
"Calcium-binding proteins 1: EF-hands."
Protein Prof. 2:305-490(1995).
PubMed=7553064
[ 2] Kretsinger R.H.
"Calcium coordination and the calmodulin fold: divergent versus
convergent evolution."
Cold Spring Harb. Symp. Quant. Biol. 52:499-510(1987).
PubMed=3454274
[ 3] Moncrief N.D., Kretsinger R.H., Goodman M.
"Evolution of EF-hand calcium-modulated proteins. I. Relationships
based on amino acid sequences."
J. Mol. Evol. 30:522-562(1990).
PubMed=2115931
[ 4] Nakayama S., Moncrief N.D., Kretsinger R.H.
"Evolution of EF-hand calcium-modulated proteins. II. Domains of
several subfamilies have diverse evolutionary histories."
J. Mol. Evol. 34:416-448(1992).
PubMed=1602495
[ 5] Heizmann C.W., Hunziker W.
"Intracellular calcium-binding proteins: more sites than insights."
Trends Biochem. Sci. 16:98-103(1991).
PubMed=2058003
[ 6] Kligman D., Hilt D.C.
"The S100 protein family."
Trends Biochem. Sci. 13:437-443(1988).
PubMed=3075365
[ 7] Strynadka N.C.J., James M.N.
"Crystal structures of the helix-loop-helix calcium-binding proteins."
Annu. Rev. Biochem. 58:951-998(1989).
PubMed=2673026; DOI=10.1146/annurev.bi.58.070189.004511
[ 8] Haiech J., Sallantin J.
"Computer search of calcium binding sites in a gene data bank: use of
learning techniques to build an expert system."
Biochimie 67:555-560(1985).
PubMed=3839696
[ 9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M.,
Bairoch A.
"Calcium-binding affinity and calcium-enhanced activity of Clostridium
thermocellum endoglucanase D."
Biochem. J. 265:261-265(1990).
PubMed=2302168
[10] Bairoch A., Cox J.A.
"EF-hand motifs in inositol phospholipid-specific phospholipase C."
FEBS Lett. 269:454-456(1990).
PubMed=2401372
{END}
{PDOC00019}
{PS00019; ACTININ_1}
{PS00020; ACTININ_2}
{BEGIN}
************************************************
* Actinin-type actin-binding domain signatures *
************************************************
Alpha-actinin is an F-actin cross-linking protein which is thought to anchor
actin to a variety of intracellular structures [1]. The actin-binding domain
of alpha-actinin seems to reside in the first 250 residues of the protein. A
similar actin-binding domain has been found in the N-terminal region of many
different actin-binding proteins [2,3]:
- In the beta chain of spectrin (or fodrin).
- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD)
  and which may play a role in anchoring the cytoskeleton to the plasma
  membrane.
- In the slime mold gelation factor (or ABP-120).
- In actin-binding protein ABP-280 (or filamin), a protein that links actin
  filaments to membrane glycoproteins.
- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from
  the above proteins in that it contains two tandem copies of the
  actin-binding domain and that these copies are located in the C-terminal
  part of the protein.
We selected two conserved regions as signature patterns for this type of
domain. The first of these regions is located at the beginning of the domain,
while the second one is located in the central section and has been shown to
be essential for the binding of actin.
-Consensus pattern: [EQ]-{LNYH}-x-[ATV]-[FY]-{LDAM}-{T}-W-{PG}-N
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 32.
-Consensus pattern: [LIVM]-x-[SGNL]-[LIVMN]-[DAGHENRS]-[SAGPNVT]-x-[DNEAG]-
                    [LIVM]-x-[DEAGQ]-x(4)-[LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-
                    [WS]-x(0,1)-[LIVM](2)
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: April 2006 / Patterns revised.
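Longer patterns such as the two above are easier to handle with a small
translator than by hand. The following is a minimal sketch of a
pattern-to-regular-expression converter covering only the syntax elements
that appear in these entries ([..], {..}, x, repeat counts, and the < and >
anchors). The function name prosite_to_regex is an assumption, and the test
string is a made-up fragment built to satisfy the first pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into Python regex syntax."""
    parts = []
    # Remove whitespace so a pattern wrapped over several lines can be passed.
    compact = "".join(pattern.split()).rstrip(".")
    for element in compact.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (4) or (0,1).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    actinin_1 = "[EQ]-{LNYH}-x-[ATV]-[FY]-{LDAM}-{T}-W-{PG}-N"
    regex = prosite_to_regex(actinin_1)
    print(regex)
    print(bool(re.search(regex, "AAQRKTFTAWCNSH")))
```

The converter does no validation; malformed input (for example an empty
element between two hyphens) raises an error rather than being repaired.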
[ 1] Schleicher M., Andre E., Hartmann H., Noegel A.A.
"Actin-binding proteins are conserved from slime molds to man."
Dev. Genet. 9:521-530(1988).
PubMed=3243032
[ 2] Matsudaira P.
"Modular organization of actin crosslinking proteins."
Trends Biochem. Sci. 16:87-92(1991).
PubMed=2058002
[ 3] Dubreuil R.R.
"Structure and evolution of the actin crosslinking proteins."
BioEssays 13:219-226(1991).
PubMed=1892474
{END}
{PDOC00020}
{PS00021; KRINGLE_1}
{PS50070; KRINGLE_2}
{BEGIN}
****************************************
* Kringle domain signature and profile *
****************************************
Kringles [1,2,3] are triple-looped, disulfide cross-linked domains found in a
varying number of copies, in some serine proteases and plasma proteins. The
kringle domain has been found in the following proteins:
- Apolipoprotein A (38 copies).
- Blood coagulation factor XII (Hageman factor) (1 copy).
- Hepatocyte growth factor (HGF) (4 copies).
- Hepatocyte growth factor like protein (4 copies) [4].
- Hepatocyte growth factor activator [1] (once) [5].
- Plasminogen (5 copies).
- Thrombin (2 copies).
- Tissue plasminogen activator (TPA) (2 copies).
- Urokinase-type plasminogen activator (1 copy).
The schematic representation of the structure of a typical kringle domain is
shown below:
  +---------------------------------------+
  |                                       |
 xCxxxxxxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxCxxxCx
              |          |     |      |
              +----------|-----+      |
                         +------------+
'C': conserved cysteine involved in a disulfide bond.
Kringle domains are thought to play a role in binding mediators, such as
membranes, other proteins or phospholipids, and in the regulation of
proteolytic activity. As a signature pattern for this type of domain, we
selected a conserved sequence that contains two of the cysteines involved in
disulfide bonds.
-Consensus pattern: [FY]-C-[RH]-[NS]-x(7,8)-[WY]-C
 [The 2 C's are involved in a disulfide bond]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 5.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Expert(s) to contact by email:
Ikeo K.; [email protected]
-Last update: May 2004 / Text revised.
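The kringle signature contains a variable-length gap, x(7,8), which maps
directly onto a bounded repeat in a regular expression. The sketch below
reports the 1-based positions of the two pattern cysteines for each match.
The function name kringle_cysteines is hypothetical, and the input fragments
are illustrative strings chosen to exercise both gap lengths.

```python
import re

# [FY]-C-[RH]-[NS]-x(7,8)-[WY]-C: x(7,8) means seven or eight residues.
KRINGLE_1 = re.compile(r"[FY]C[RH][NS].{7,8}[WY]C")


def kringle_cysteines(sequence):
    """Return (first Cys, second Cys) as 1-based positions for each match."""
    hits = []
    for m in KRINGLE_1.finditer(sequence.upper()):
        first_cys = m.start() + 2   # the C directly after [FY]
        second_cys = m.end()        # the C that closes the pattern
        hits.append((first_cys, second_cys))
    return hits


if __name__ == "__main__":
    print(kringle_cysteines("AAYCRNPDRELRPWCFT"))  # gap of seven residues
    print(kringle_cysteines("YCRNPDNDPQGPWC"))     # gap of eight residues
```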
[ 1] Castellino F.J., Beals J.M.
"The genetic relationships between the kringle domains of human
plasminogen, prothrombin, tissue plasminogen activator, urokinase, and
coagulation factor XII."
J. Mol. Evol. 26:358-369(1987).
PubMed=3131537
[ 2] Patthy L.
"Evolution of the proteases of blood coagulation and fibrinolysis by
assembly from modules."
Cell 41:657-663(1985).
PubMed=3891096
[ 3] Ikeo K., Takahashi K., Gojobori T.
"Evolutionary origin of numerous kringles in human and simian
apolipoprotein(a)."
FEBS Lett. 287:146-148(1991).
PubMed=1879523
[ 4] Friezner Degen S.J., Stuart L.A., Han S., Jamison C.S.
Biochemistry 30:9781-9791(1991).
[ 5] Miyazawa K., Shimomura T., Kitamura A., Kondo J., Morimoto Y.,
Kitamura N.
"Molecular cloning and sequence analysis of the cDNA for a human
serine protease responsible for activation of hepatocyte growth factor.
Structural similarity of the protease precursor to blood coagulation
factor XII."
J. Biol. Chem. 268:10024-10028(1993).
PubMed=7683665
{END}
{PDOC00021}
{PS00022; EGF_1}
{PS01186; EGF_2}
{PS50026; EGF_3}
{BEGIN}
******************************************
* EGF-like domain signatures and profile *
******************************************
A sequence of about thirty to forty amino-acid residues found in the
sequence of epidermal growth factor (EGF) has been shown [1 to 6] to be
present, in a more or less conserved form, in a large number of other, mostly
animal proteins. EGF is a polypeptide of about 50 amino acids with three
internal disulfide bridges. It first binds with high affinity to specific
cell-surface receptors and then induces their dimerization, which is essential
for activating the tyrosine kinase in the receptor cytoplasmic domain,
initiating a signal transduction that results in DNA synthesis and cell
proliferation.
A common feature of all EGF-like domains is that they are found in the
extracellular domain of membrane-bound proteins or in proteins known to be
secreted (exception: prostaglandin G/H synthase). The EGF-like domain includes
six cysteine residues which have been shown to be involved in disulfide bonds.
The structure of several EGF-like domains has been solved. The fold consists
of a two-stranded beta-sheet followed by a loop to a C-terminal short
two-stranded sheet (see <PDB:1EGF>). Subdomains between the conserved
cysteines strongly vary in length as shown in the following schematic
representation of the EGF-like domain:
               +-------------------+        +-------------------------+
               |                   |        |                         |
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
     |                   |         ************************************
     +-------------------+
'C': conserved cysteine involved in a disulfide bond.
'G': often conserved glycine
'a': often conserved aromatic amino acid
'*': position of both patterns.
'x': any residue
Some proteins known to contain one or more copies of an EGF-like domain are
listed below.
- Adipocyte differentiation inhibitor (gene PREF-1) from mouse (6 copies).
- Agrin, a basal lamina protein that causes the aggregation of acetylcholine
  receptors on cultured muscle fibers (4 copies).
- Amphiregulin, a growth factor (1 copy).
- Betacellulin, a growth factor (1 copy).
- Blastula proteins BP10 and Span from sea urchin which are thought to be
  involved in pattern formation (1 copy).
- BM86, a glycoprotein antigen of cattle tick (7 copies).
- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and
  bone formation and which expresses metalloendopeptidase activity (1-2
  copies). Homologous proteins are found in sea urchin - suBMP (1 copy) - and
  in Drosophila - the dorsal-ventral patterning protein tolloid (2 copies).
- Caenorhabditis elegans developmental proteins lin-12 (13 copies) and glp-1
  (10 copies).
- Caenorhabditis elegans apx-1 protein, a patterning protein (4.5 copies).
- Calcium-dependent serine proteinase (CASP) which degrades the extracellular
  matrix proteins type I and IV collagen and fibronectin (1 copy).
- Cartilage matrix protein CMP (1 copy).
- Cartilage oligomeric matrix protein COMP (4 copies).
- Cell surface antigen 114/A10 (3 copies).
- Cell surface glycoprotein complex transmembrane subunit ASGP-2 from rat (2
  copies).
- Coagulation associated proteins C, Z (2 copies) and S (4 copies).
- Coagulation factors VII, IX, X and XII (2 copies).
- Complement C1r components (1 copy).
- Complement C1s components (1 copy).
- Complement-activating component of Ra-reactive factor (RARF) (1 copy).
- Complement components C6, C7, C8 alpha and beta chains, and C9 (1 copy).
- Crumbs, an epithelial development protein from Drosophila (29 copies).
- Epidermal growth factor precursor (7-9 copies).
- Exogastrula-inducing peptides A, C, D and X from sea urchin (1 copy).
- Fat protein, a Drosophila cadherin-related tumor suppressor (5 copies).
- Fetal antigen 1, a probable neuroendocrine differentiation protein, which
  is derived from the delta-like protein (DLK) (6 copies).
- Fibrillin 1 (47 copies) and fibrillin 2 (14 copies).
- Fibropellins IA (21 copies), IB (13 copies), IC (8 copies), II (4 copies)
  and III (8 copies) from the apical lamina - a component of the
  extracellular matrix - of sea urchin.
- Fibulin-1 and -2, two extracellular matrix proteins (9-11 copies).
- Giant-lens protein (protein Argos), which regulates cell determination and
  axon guidance in the Drosophila eye (1 copy).
- Growth factor-related proteins from various poxviruses (1 copy).
- Gurken protein, a Drosophila developmental protein (1 copy).
- Heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor
  alpha (TGF-alpha), growth factors Lin-3 and Spitz (1 copy); the precursors
  are membrane proteins, the mature form is extracellular.
- Hepatocyte growth factor (HGF) activator (EC 3.4.21.-) (2 copies).
- LDL and VLDL receptors, which bind and transport low-density lipoproteins
  and very low-density lipoproteins (3 copies).
- LDL receptor-related protein (LRP), which may act as a receptor for
  endocytosis of extracellular ligands (22 copies).
- Leucocyte antigen CD97 (3 copies), cell surface glycoprotein EMR1 (6
  copies) and cell surface glycoprotein F4/80 (7 copies).
- Limulus clotting factor C, which is involved in hemostasis and host defense
  mechanisms in Japanese horseshoe crab (1 copy).
- Meprin A alpha subunit, a mammalian membrane-bound endopeptidase (1 copy).
- Milk fat globule-EGF factor 8 (MFG-E8) from mouse (2 copies).
- Neuregulin GGF-I and GGF-II, two human glial growth factors (1 copy).
- Neurexins from mammals (3 copies).
- Neurogenic proteins Notch, Xotch and the human homolog Tan-1 (36 copies),
  Delta (9 copies) and the similar differentiation proteins Lag-2 from
  Caenorhabditis elegans (2 copies), Serrate (14 copies) and Slit (7 copies)
  from Drosophila.
- Nidogen (also called entactin), a basement membrane protein from chordates
  (2-6 copies).
- Ookinete surface proteins (24 Kd, 25 Kd, 28 Kd) from Plasmodium (4 copies).
- Pancreatic secretory granule membrane major glycoprotein GP2 (1 copy).
- Perforin, which lyses non-specifically a variety of target cells (1 copy).
- Proteoglycans aggrecan (1 copy), versican (2 copies), perlecan (at least 2
  copies), brevican (1 copy) and chondroitin sulfate proteoglycan (gene PG-M)
  (2 copies).
- Prostaglandin G/H synthase 1 and 2 (EC 1.14.99.1) (1 copy), which is found
  in the endoplasmic reticulum.
- Reelin, an extracellular matrix protein that plays a role in layering of
  neurons in the cerebral cortex and cerebellum of mammals (8 copies).
- S1-5, a human extracellular protein whose ultimate activity is probably
  modulated by the environment (5 copies).
- Schwannoma-derived growth factor (SDGF), an autocrine growth factor as well
  as a mitogen for different target cells (1 copy).
- Selectins. Cell adhesion proteins such as ELAM-1 (E-selectin), GMP-140
  (P-selectin), or the lymph-node homing receptor (L-selectin) (1 copy).
- Serine/threonine-protein kinase homolog (gene Pro25) from Arabidopsis
  thaliana, which may be involved in assembly or regulation of
  light-harvesting chlorophyll A/B protein (2 copies).
- Sperm-egg fusion proteins PH-30 alpha and beta from guinea pig (1 copy).
- Stromal cell derived protein-1 (SCP-1) from mouse (6 copies).
- TDGF-1, human teratocarcinoma-derived growth factor 1 (1 copy).
- Tenascin (or neuronectin), an extracellular matrix protein from mammals
  (14.5 copies), chicken (TEN-A) (13.5 copies) and the related proteins human
  tenascin-X (18 copies) and tenascin-like proteins TEN-A and TEN-M from
  Drosophila (8 copies).
- Thrombomodulin (fetomodulin), which together with thrombin activates
  protein C (6 copies).
- Thrombospondin 1, 2 (3 copies), 3 and 4 (4 copies), adhesive glycoproteins
  that mediate cell-to-cell and cell-to-matrix interactions.
- Thyroid peroxidase 1 and 2 (EC 1.11.1.8) from human (1 copy).
- Transforming growth factor beta-1 binding protein (TGF-B1-BP) (16 or 18
  copies).
- Tyrosine-protein kinase receptors Tek and Tie (EC 2.7.1.112) (3 copies).
- Urokinase-type plasminogen activator (EC 3.4.21.73) (UPA) and tissue
  plasminogen activator (EC 3.4.21.68) (TPA) (1 copy).
- Uromodulin (Tamm-Horsfall urinary glycoprotein) (THP) (3 copies).
- Vitamin K-dependent anticoagulants protein C (2 copies) and protein S (4
  copies) and the similar protein Z, a single-chain plasma glycoprotein of
  unknown function (2 copies).
- 63 Kd sperm flagellar membrane protein from sea urchin (3 copies).
- 93 Kd protein (gene nel) from chicken (5 copies).
- Hypothetical 337.6 Kd protein T20G5.3 from Caenorhabditis elegans (44
  copies).
The region between the 5th and 6th cysteine contains two conserved glycines,
of which at least one is present in most EGF-like domains. We created two
patterns for this domain, each including one of these C-terminal conserved
glycine residues. The profile we developed covers the whole domain.
-Consensus pattern: C-x-C-x(2)-{V}-x(2)-G-{C}-x-C
 [The 3 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL, except
 those that have very long or very short regions between the last 3 conserved
 cysteines of their EGF-like domain(s).
-Other sequence(s) detected in Swiss-Prot: 87 proteins, of which 27 can be
 considered as possible candidates.
-Consensus pattern: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C
 [The 3 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL, except
 those that have very long or very short regions between the last 3 conserved
 cysteines of their EGF-like domain(s).
-Other sequence(s) detected in Swiss-Prot: 83 proteins, of which 49 can be
 considered as possible candidates.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Note: The beta chain of the integrin family of proteins contains 2
 cysteine-rich repeats which were said to be dissimilar to the EGF pattern [7].
-Note: Laminin EGF-like repeats (see <PDOC00961>) are longer than the average
 EGF module and contain a further disulfide bond C-terminal of the EGF-like
 region. Perlecan and agrin contain both EGF-like domains and laminin-type
 EGF-like domains.
-Note: The patterns do not detect all of the repeats of proteins with multiple
 EGF-like repeats.
-Note: See <PDOC00913> for an entry describing specifically the subset of
 EGF-like domains that bind calcium.
-Last update: April 2006 / Pattern revised.
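As an illustration of how the pattern syntax used above can be read, the
following Python sketch translates the two consensus patterns into regular
expressions. It is a minimal sketch and not part of PROSITE: the function name
prosite_to_regex, the variable names and the toy sequence are assumptions made
for this example, and only the syntax elements that occur in these two patterns
(single residues, 'x', [allowed] and {forbidden} residue sets, repeat counts)
are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Sketch only: pattern anchors ('<', '>') and other syntax are not handled.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split an element into its core and an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = match.groups()
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("[") and core.endswith("]"):
            piece = core                     # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        elif len(core) == 1 and core.isupper():
            piece = core                     # one required residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return "".join(pieces)


# The two consensus patterns quoted in this entry (variable names are
# illustrative only).
FIRST_PATTERN = "C-x-C-x(2)-{V}-x(2)-G-{C}-x-C"
SECOND_PATTERN = "C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C"

# Made-up toy string built to satisfy the first pattern; not a real protein.
TOY_SEQUENCE = "AAACACAAAAAGAACAA"

for label, pattern in (("first", FIRST_PATTERN), ("second", SECOND_PATTERN)):
    regex = prosite_to_regex(pattern)
    hit = re.search(regex, TOY_SEQUENCE)
    print(label, regex, hit.span() if hit else None)
```

A hit of such a regular expression only marks a candidate region. As stated in
the notes above, the patterns do not detect every EGF-like repeat, and both
also match a number of unrelated Swiss-Prot sequences.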
[ 1] Davis C.G.
"The many faces of epidermal growth factor repeats."
New Biol. 2:410-419(1990).
PubMed=2288911
[ 2] Blomquist M.C., Hunt L.T., Barker W.C.
"Vaccinia virus 19-kilodalton protein: relationship to several
mammalian proteins, including two growth factors."
Proc. Natl. Acad. Sci. U.S.A. 81:7363-7367(1984).
PubMed=6334307
[ 3] Barker W.C., Johnson G.C., Hunt L.T., George D.G.
Protein Nucl. Acid Enz. 29:54-68(1986).
[ 4] Doolittle R.F., Feng D.F., Johnson M.S.
"Computer-based characterization of epidermal growth factor
precursor."
Nature 307:558-560(1984).
PubMed=6607417
[ 5] Appella E., Weber I.T., Blasi F.
"Structure and function of epidermal growth factor-like regions in
proteins."
FEBS Lett. 231:1-4(1988).
PubMed=3282918
[ 6] Campbell I.D., Bork P.
Curr. Opin. Struct. Biol. 3:385-392(1993).
[ 7] Tamkun J.W., DeSimone D.W., Fonda D., Patel R.S., Buck C.,
Horwitz A.F., Hynes R.O.
"Structure of integrin, a glycoprotein involved in the transmembrane
linkage between fibronectin and actin."
Cell 46:271-282(1986).
PubMed=3487386
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or
see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00022}
{PS00023; FN2_1}
{PS51092; FN2_2}
{BEGIN}
*********************************************************************
* Fibronectin type-II collagen-binding domain signature and profile *
*********************************************************************
Fibronectin is a plasma protein that binds cell surfaces and various compounds
including collagen, fibrin, heparin, DNA, and actin. The major part of the
sequence of fibronectin consists of the repetition of three types of domains,
which are called type I, II, and III [1]. The type II domain (FN2) is
approximately 40 residues long, contains four conserved cysteines involved in
disulfide bonds and is part of the collagen-binding region of fibronectin [2].
In fibronectin the minimal collagen-binding region is formed by one FN1 and
two FN2 domains. This suggests that the collagen-binding site spans multiple
modules.
A schematic representation of the position of the invariant residues and the
topology of the disulfide bonds in the FN2 domain is shown below.
                +----------------------+
                |                      |
xxCxxPFx#xxxxxxxCxxxxxxxxWCxxxxx#xxx#x#Cxx
  |                       |
  +-----------------------+
'C': conserved cysteine involved in a disulfide bond.
'#': large hydrophobic residue.
The 3D-structure of the FN2 domain has been determined (see <PDB:2FN2>) [3].
The structure consists of two double-stranded anti-parallel beta-sheets,
oriented approximately perpendicular to each other, and two irregular loops,
one separating the two beta-sheets and the other between the two strands of
the second beta-sheet. The minimal collagen-binding region (FN1-FN2-FN2)
adopts a hairpin structure where the conserved aromatic residues of FN2 form a
hydrophobic pocket which is thought to provide a binding site for non-polar
residues in collagen [4].
Some proteins that contain an FN2 domain are listed below:
- Blood coagulation factor XII (Hageman factor) (1 copy).
- Bovine seminal plasma proteins PDC-109 (BSP-A1/A2) and BSP-A3 [5] (twice).
- Cation-independent mannose-6-phosphate receptor (which is also the
  insulin-like growth factor II receptor) [6] (1 copy).
- Mannose receptor of macrophages [7] (1 copy).
- 180 Kd secretory phospholipase A2 receptor (1 copy) [8].
- DEC-205 receptor (1 copy) [9].
- 72 Kd and 92 Kd type IV collagenases (EC 3.4.24.24) (MMP-2 and MMP-9) [10]
  (3 copies). Both metalloproteinases are strongly expressed in malignant
  tumors and have been implicated in metastasis. They both degrade
  collagen-IV, thus facilitating penetration of the basement membranes by
  tumor cells.
- Hepatocyte growth factor activator [11] (1 copy).
Our consensus pattern spans the domain between the first and the last
conserved cysteine. We also developed a profile that covers the whole domain.
-Consensus pattern: C-x(2)-P-F-x-[FYWIV]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-
                    x(3,5)-[FYW]-x-[FYWI]-C
 [The 4 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: March 2005 / Text revised; profile added.
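The length spanned by a pattern follows directly from its elements: each
position counts once, x(n) counts n times and x(n,m) between n and m times.
The Python sketch below applies this rule to the FN2 consensus pattern, written
here with every element separated by a hyphen. The function name
pattern_length_range is an assumption made for this example; it is not a
PROSITE tool.

```python
import re


def pattern_length_range(pattern):
    """Return the (shortest, longest) number of residues a pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        # A trailing "(n)" or "(n,m)" is a repeat count; otherwise one residue.
        repeat = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if repeat:
            low = int(repeat.group(1))
            high = int(repeat.group(2)) if repeat.group(2) else low
        else:
            low = high = 1
        shortest += low
        longest += high
    return shortest, longest


FN2_PATTERN = (
    "C-x(2)-P-F-x-[FYWIV]-x(7)-C-x(8,10)-W-C-x(4)-"
    "[DNSR]-[FYW]-x(3,5)-[FYW]-x-[FYWI]-C"
)

print(pattern_length_range(FN2_PATTERN))
```

For this pattern the computed range is 38 to 42 residues, in line with the
domain length of approximately 40 residues given above.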
[ 1] Skorstengaard K., Jensen M.S., Sahl P., Petersen T.E., Magnusson S.
"Complete primary structure of bovine plasma fibronectin."
Eur. J. Biochem. 161:441-453(1986).
PubMed=3780752
[ 2] Forastieri H., Ingham K.C.
"Interaction of gelatin with a fluorescein-labeled 42-kDa
chymotryptic
fragment of fibronectin."
J. Biol. Chem. 260:10546-10550(1985).
PubMed=3928622
[ 3] Pickford A.R., Potts J.R., Bright J.R., Phan I., Campbell I.D.
"Solution structure of a type 2 module from fibronectin:
implications
for the structure and function of the gelatin-binding domain."
Structure 5:359-370(1997).
PubMed=9083105
[ 4] Pickford A.R., Smith S.P., Staunton D., Boyd J., Campbell I.D.
"The hairpin structure of the (6)F1(1)F2(2)F2 fragment from human
fibronectin enhances gelatin binding."
EMBO J. 20:1519-1529(2001).
PubMed=11285216; DOI=10.1093/emboj/20.7.1519
[ 5] Seidah N.G., Manjunath P., Rochemont J., Sairam M.R., Chretien M.
"Complete amino acid sequence of BSP-A3 from bovine seminal plasma.
Homology to PDC-109 and to the collagen-binding domain of
fibronectin."
Biochem. J. 243:195-203(1987).
PubMed=3606570
[ 6] Kornfeld S.
"Structure and function of the mannose 6-phosphate/insulinlike
growth
factor II receptors."
Annu. Rev. Biochem. 61:307-330(1992).
PubMed=1323236; DOI=10.1146/annurev.bi.61.070192.001515
[ 7] Taylor M.E., Conary J.T., Lennartz M.R., Stahl P.D., Drickamer K.
"Primary structure of the mannose receptor contains multiple motifs
resembling carbohydrate-recognition domains."
J. Biol. Chem. 265:12156-12162(1990).
PubMed=2373685
[ 8] Lambeau G., Ancian P., Barhanin J., Lazdunski M.
"Cloning and expression of a membrane receptor for secretory
phospholipases A2."
J. Biol. Chem. 269:1575-1578(1994).
PubMed=8294398
[ 9] Jiang W., Swiggard W.J., Heufler C., Peng M., Mirza A., Steinman R.M.,
Nussenzweig M.C.
"The receptor DEC-205 expressed by dendritic cells and thymic
epithelial cells is involved in antigen processing."
Nature 375:151-155(1995).
PubMed=7753172; DOI=10.1038/375151a0
[10] Collier I.E., Wilhelm S.M., Eisen A.Z., Marmer B.L., Grant G.A.,
Seltzer J.L., Kronberger A., He C., Bauer E.A., Goldberg G.I.
J. Biol. Chem. 263:6579-6587(1988).
[11] Miyazawa K., Shimomura T., Kitamura A., Kondo J., Morimoto Y.,
Kitamura N.
J. Biol. Chem. 268:10024-10028(1993).
{END}
{PDOC00023}
{PS00024; HEMOPEXIN}
{BEGIN}
******************************
* Hemopexin domain signature *
******************************
Hemopexin is a serum glycoprotein that binds heme and transports it to the
liver for breakdown and iron recovery, after which the free hemopexin returns
to the circulation. Structurally hemopexin consists of two similar halves of
approximately two hundred amino acid residues connected by a histidine-rich
hinge region. Each half is itself formed by the repetition of a basic unit of
some 35 to 45 residues. Hemopexin-like domains have been found [1,2] in two
other types of proteins:
- In vitronectin, a cell adhesion and spreading factor found in plasma and
  tissues. Vitronectin, like hemopexin, has two hemopexin-like domains.
- In most members of the matrix metalloproteinases family (matrixins) (see
  <PDOC00129>): MMP-1, MMP-2, MMP-3, MMP-8, MMP-9, MMP-10, MMP-11, MMP-12,
  MMP-13, MMP-14, MMP-15, MMP-16, MMP-17, MMP-18, MMP-19, MMP-20, MMP-24,
  and MMP-25. These zinc endoproteases have a single hemopexin-like domain in
  their C-terminal section.
It is suggested that the hemopexin domain facilitates binding to a variety of
molecules and proteins. The signature pattern for this type of domain has been
derived from the best conserved region which is located at the beginning of
the second repeat.
-Consensus pattern: [LIFAT]-{IL}-x(2)-W-x(2,3)-[PE]-x-{VF}-[LIVMFY]-[DENQS]-
                    [STA]-[AV]-[LIVMFY]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 11.
-Last update: April 2006 / Pattern revised.
[ 1] Hunt L.T., Barker W.C., Chen H.R.
Protein Seq. Data Anal. 1:21-26(1987).
[ 2] Stanley K.K.
"Homology with hemopexin suggests a possible scavenging function for
S-protein/vitronectin."
FEBS Lett. 199:249-253(1986).
PubMed=2422056
{END}
{PDOC00024}
{PS00025; P_TREFOIL_1}
{PS51448; P_TREFOIL_2}
{BEGIN}
***************************************************
* P-type ('Trefoil') domain signature and profile *
***************************************************
A cysteine-rich domain of approximately forty-five amino-acid residues has
been found in some extracellular eukaryotic proteins [1,2,3,4,5]. This domain
is known as either the 'P', 'trefoil' or 'TFF' domain. It contains six
cysteines that are linked by three disulfide bonds in a 1-5, 2-4, and 3-6
configuration. This leads to a characteristic three-leafed structure
('trefoil'). The P-type domain is clearly composed of three loop-like regions.
The central core of the domain consists of a short two-stranded antiparallel
beta-sheet, which is capped by an irregular loop and forms a central hairpin
(loop 3). The beta-sheet is preceded by a short alpha-helix, with the majority
of the remainder of the domain contained in two loops, which lie on either
side of the central hairpin (see <PDB:1E9T>) [6].
Proteins known to contain this domain are:
- Protein pS2 (TFF1), a protein secreted by the stomach mucosa, whose gene is
  induced by estrogen. The exact function of pS2 is not known. It is a
  protein of about 65 residues and it contains a copy of the 'P' domain.
- Spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that
  inhibits gastrointestinal motility and gastric acid secretion. SP could be
  a growth factor. It contains two tandem copies of the 'P' domain.
- Intestinal trefoil factor (ITF) (TFF3), an intestinal protein of about 60
  residues which may have a role in promoting cell migration. It contains a
  copy of the 'P' domain.
- Xenopus stomach proteins xP1 (one 'P' domain) and xP4 (four 'P' domains).
- Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1
  (FIM-C.1). These proteins could be involved in defense against microbial
  infections by protecting the epithelia from the external environment. They
  are large proteins (400 residues for A.1; more than 660 residues for C.1,
  whose sequence is only partially known) that contain multiple copies of the
  'P' domain interspersed with tandem repeats of threonine-rich,
  O-glycosylated regions.
- Xenopus skin protein xp2 (or APEG), a protein that contains two 'P' domains
  and which exists in two alternatively spliced forms that differ by the
  inclusion of an N-terminal region of 320 residues that consists of 33 tandem
  repeats of a G-[GE]-[AP](2,4)-A-E motif.
- Zona pellucida sperm-binding protein B (ZP-B) (also known as ZP-X in rabbit
  and ZP-3 alpha in pig). This protein is a receptor-like glycoprotein whose
  extracellular region contains a 'P' domain followed by a ZP domain (see
  <PDOC00577>).
- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10), a vertebrate
  membrane-bound, multifunctional enzyme complex which hydrolyzes sucrose,
  maltose and isomaltose (see <PDOC00120>).
- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase), a vertebrate
  extracellular glycosidase (see <PDOC00120>).
Structurally the P-type domain can be represented as shown below.
  +-------------------------+
  |         +--------------+|
  |         |              ||
xxCxxxxxx+xxCG#xxxxxxxCxxxxCC#xxxxxxxxWC#xxxxxxxx
         *************|*******         |
                      |                |
                      +----------------+

'C': conserved cysteine involved in a disulfide bond.
'#': large hydrophobic residue.
'+': positively charged residue.
'*': position of the pattern.
-Consensus pattern: [KRH]-x(2)-C-x-[FYPSTV]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH]
[The 4 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Expert(s) to contact by email:
Hoffmann W.; [email protected]
-Last update: May 2009 / Text revised; profile added.
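The 1-5, 2-4, 3-6 disulfide configuration described above can be mapped onto
positions in the schematic representation of the domain. The Python sketch
below does this by locating the six 'C' characters and pairing them by their
ordinal numbers. The function and variable names are assumptions made for this
example, and the positions refer to the schematic string only, not to residue
numbers in any particular protein.

```python
# Schematic of the P-type domain as given in this entry.
SCHEMATIC = "xxCxxxxxx+xxCG#xxxxxxxCxxxxCC#xxxxxxxxWC#xxxxxxxx"

# Disulfide configuration stated in the text: 1-5, 2-4 and 3-6.
CONNECTIVITY = [(1, 5), (2, 4), (3, 6)]


def disulfide_pairs(schematic, connectivity):
    """Return 1-based schematic positions of each pair of bonded cysteines."""
    cysteines = [i + 1 for i, residue in enumerate(schematic) if residue == "C"]
    return [(cysteines[a - 1], cysteines[b - 1]) for a, b in connectivity]


for first, second in disulfide_pairs(SCHEMATIC, CONNECTIVITY):
    print(f"position {first} bonded to position {second}")
```

The same pairing can be applied to a real sequence once the positions of its
six conserved cysteines are known.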
[ 1] Hoffmann W., Hauser F.
"The P-domain or trefoil motif: a role in renewal and pathology of
mucous epithelia?"
Trends Biochem. Sci. 18:239-243(1993).
PubMed=8267796
[ 2] Otto B., Wright N.
"Trefoil peptides. Coming up clover."
Curr. Biol. 4:835-838(1994).
PubMed=7820556
[ 3] Bork P.
"A trefoil domain in the major rabbit zona pellucida protein."
Protein Sci. 2:669-670(1993).
PubMed=8518738
[ 4] Wright N.A., Hoffmann W., Otto W.R., Rio M.-C., Thim L.
"Rolling in the clover: trefoil factor family (TFF)-domain peptides,
cell migration and cancer."
FEBS Lett. 408:121-123(1997).
PubMed=9187350
[ 5] Sommer P., Blin N., Goett P.
"Tracing the evolutionary origin of the TFF-domain, an ancient motif
at mucous surfaces."
Gene 236:133-136(1999).
PubMed=10433974
[ 6] Lemercinier X., Muskett F.W., Cheeseman B., McIntosh P.B., Thim L.,
Carr M.D.
"High-resolution solution structure of human intestinal trefoil
factor
and functional insights from detailed structural comparisons with
the
other members of the trefoil family of mammalian cell motility
factors."
Biochemistry 40:9552-9559(2001).
PubMed=11583154
{END}
{PDOC00025}
{PS00026; CHIT_BIND_I_1}
{PS50941; CHIT_BIND_I_2}
{BEGIN}
******************************************************
* Chitin-binding type-1 domain signature and profile *
******************************************************
Many plants respond to pathogenic attack by producing defense proteins that
are capable of reversible binding to chitin, an N-acetylglucosamine
polysaccharide present in the cell wall of fungi and the exoskeleton of
insects. Most of these chitin-binding proteins include a common structural
motif of 30 to 43 residues organized around a conserved four-disulfide core,
known as the chitin-binding domain type-1 [1]. The topological arrangement of
the four disulfide bonds is shown in the following figure:
                +-------------+
           +----|------+      |
           |    |      |      |
xxCgxxxxxxxCxxxxCCsxxgxCgxxxxxCxxxCxxxxC
  |        ******|*************   |    |
  |              |                +----+
  +--------------+
'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
The structures of several chitin-binding type-1 domains have been solved (see,
for example, <PDB:1HEV>) [2]. The chitin-binding site is localized in a
beta-hairpin loop formed by the second disulfide bridge. Conserved serine and
aromatic residues associated with the hairpin loop are essential for the
chitin-binding activity [3]. The chitin-binding domain type-1 displays some
structural similarities with the chitin-binding domain type-2 (see
<PDOC50940>).
Some of the proteins containing a chitin-binding domain type-1 are listed
below:
- A number of non-leguminous plant lectins. The best characterized of these
  lectins are the three highly homologous wheat germ agglutinins (WGA-1, 2
  and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding
  lectin which structurally consists of a fourfold repetition of the 43 amino
  acid domain. The same type of structure is found in a barley root-specific
  lectin as well as a rice lectin.
- Plant endochitinases (EC 3.2.1.14) from class IA (see <PDOC00620>).
  Endochitinases are enzymes that catalyze the hydrolysis of the beta-1,4
  linkages of N-acetylglucosamine polymers of chitin. Plant chitinases
  function as a defense against chitin-containing fungal pathogens. Class IA
  chitinases generally contain one copy of the chitin-binding domain at their
  N-terminal extremity. An exception is agglutinin/chitinase [4] from the
  stinging nettle Urtica dioica which contains two copies of the domain.
- Hevein, a wound-induced protein found in the latex of rubber trees.
- Win1 and win2, two wound-induced proteins from potato.
- Kluyveromyces lactis killer toxin alpha subunit [5]. The toxin encoded by
  the linear plasmid pGKL1 is composed of three subunits: alpha, beta, and
  gamma. The gamma subunit harbors toxin activity and inhibits growth of
  sensitive yeast strains in the G1 phase of the cell cycle; the alpha
  subunit, which is proteolytically processed from a larger precursor that
  also contains the beta subunit, is a chitinase (see <PDOC00839>).
The profile we developed covers the whole domain.
-Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(3,4)-[FYW]-C
[The 5 C's are involved in disulfide bonds]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Note: Hevein is a strong allergen which is implicated in the allergy to
 natural rubber latex (NRL). NRL allergy can be associated with
 hypersensitivity to some plant-derived foods (latex–fruit syndrome). An
 increasing number of plant sources, such as avocado, banana, chestnut, kiwi,
 peach, tomato, potato and bell pepper, have been associated with this
 syndrome. Several papers [6,7] have shown that allergen cross-reactivity is
 due to IgE antibodies that recognize structurally similar epitopes on
 different proteins that are closely related. One of these families is the
 plant defence class I chitinases, which contain a type-1 chitin-binding
 domain.
-Last update: December 2004 / Pattern and text revised.
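Several of the proteins listed above carry more than one copy of the domain.
A rough way to look for multiple copies is to scan a sequence for every
non-overlapping hit of the consensus pattern, as in the Python sketch below.
The regular expression is a hand translation of the pattern given above; the
function name and the toy sequence are assumptions made for this example, and
the toy sequence is a made-up string, not a real protein.

```python
import re

# Hand translation of C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(3,4)-[FYW]-C
# into regular-expression syntax.
CHITIN_BINDING_PATTERN = re.compile(r"C.{4,5}CCS.{2}G.CG.{3,4}[FYW]C")


def pattern_hits(sequence):
    """Return 1-based (start, end) positions of non-overlapping pattern hits."""
    return [
        (match.start() + 1, match.end())
        for match in CHITIN_BINDING_PATTERN.finditer(sequence)
    ]


# Toy input: two copies of a string built to satisfy the pattern, joined by
# arbitrary linker residues.
UNIT = "CAAAACCSAAGACGAAAYC"
TOY_SEQUENCE = "GT" + UNIT + "PSPS" + UNIT + "KK"

hits = pattern_hits(TOY_SEQUENCE)
print(len(hits), "hit(s):", hits)
```

The number of hits should be read as a lower bound on the number of domain
copies, since a divergent copy may not satisfy the pattern.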
[ 1] Wright H.T., Sandrasegaram G., Wright C.S.
"Evolution of a family of N-acetylglucosamine binding proteins
containing the disulfide-rich domain of wheat germ agglutinin."
J. Mol. Evol. 33:283-294(1991).
PubMed=1757999
[ 2] Andersen N.H., Cao B., Rodriguez-Romero A., Arreguin B.
"Hevein: NMR assignment and assessment of solution-state folding for
the agglutinin-toxin motif."
Biochemistry 32:1407-1422(1993).
PubMed=8431421
[ 3] Asensio J.L., Canada F.J., Siebert H.C., Laynez J., Poveda A.,
Nieto P.M., Soedjanaamadja U.M., Gabius H.J., Jimenez-Barbero J.
"Structural basis for chitin recognition by defense proteins: GlcNAc
residues are bound in a multivalent fashion by extended binding
sites
in hevein domains."
Chem. Biol. 7:529-543(2000).
PubMed=10903932
[ 4] Lerner D.R., Raikhel N.V.
"The gene for stinging nettle lectin (Urtica dioica agglutinin)
encodes both a lectin and a chitinase."
J. Biol. Chem. 267:11085-11091(1992).
PubMed=1375935
[ 5] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R.
"Kluyveromyces lactis toxin has an essential chitinase activity."
Eur. J. Biochem. 199:483-488(1991).
PubMed=2070799
[ 6] Sowka S., Hsieh L.S., Krebitz M., Akasawa A., Martin B.M.,
Starrett D., Peterbauer C.K., Scheiner O., Breiteneder H.
"Identification and cloning of prs a 1, a 32-kDa endochitinase and
major allergen of avocado, and its expression in the yeast Pichia
pastoris."
J. Biol. Chem. 273:28091-28097(1998).
PubMed=9774427
[ 7] Wagner S., Breiteneder H.
"The latex-fruit syndrome."
Biochem. Soc. Trans. 30:935-940(2002).
PubMed=12440950
{END}
{PDOC00026}
{PS51390; WAP}
{BEGIN}
*************************************************
* WAP-type 'four-disulfide core' domain profile *
*************************************************
The 'four-disulfide core' or WAP domain comprises 8 cysteine residues involved
in disulfide bonds in a conserved arrangement [1]. One or more of these
domains occur in whey acidic protein (WAP), antileukoproteinase,
elastase-inhibitor proteins and other structurally related proteins which are
listed below.
- Whey acidic protein (WAP). WAP is a major component of milk whey whose
  function might be that of a protease inhibitor. WAP consists of two
  'four-disulfide core' domains in most mammals.
- Antileukoproteinase 1 (HUSI), a mucous fluid serine proteinase inhibitor.
  HUSI consists of two 'four-disulfide core' domains.
- Elafin, an elastase-specific inhibitor from human skin [2,3].
- Sodium/potassium ATPase inhibitors SPAI-1, -2, and -3 from pig [4].
- Chelonianin, a protease inhibitor from the eggs of red sea turtle. This
  inhibitor consists of two domains: an N-terminal domain which inhibits
  trypsin and belongs to the BPTI/Kunitz family of inhibitors, and a
  C-terminal domain which inhibits subtilisin and is a 'four-disulfide core
  domain'.
- Extracellular peptidase inhibitor (WDNM1 protein), involved in the
  metastatic potential of adenocarcinomas in rats.
- Caltrin-like protein 2 from guinea pig, which inhibits calcium transport
  into spermatozoa.
- Kallmann syndrome protein (Anosmin-1 or KALIG-1) [5,6]. This secreted
  protein may be an adhesion-like molecule with anti-protease activity. It
  contains a 'four-disulfide core domain' in its N-terminal part.
- Whey acidic protein (WAP) from the tammar wallaby, which consists of three
  'four-disulfide core' domains [7].
- Waprins from snake venom, such as omwaprin from Oxyuranus microlepidotus
  [8] which has antibacterial activity against Gram-positive bacteria.
The following schematic representation shows the position of the conserved
cysteines that form the 'four-disulfide core' WAP domain (see <PDB:2REL>).

                  +---------------------+
                  |    +-----------+    |
                  |    |           |    |
xxxxxxxCPxxxxxxxxxCxxxxCxxxxxCxxxxxCCxxxCxxxCxxxx
       |                     |      |       |
       |                     +--------------+
       |                            |
       +----------------------------+
<------------------50-residues------------------>

'C': conserved cysteine involved in a disulfide bond.
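The connectivity drawn in the schematic can also be written down explicitly.
The following Python sketch locates the eight conserved cysteines in the
schematic string and pairs them as C1-C6, C2-C7, C3-C5 and C4-C8. This pairing
is our reading of the connector lines in the schematic (an assumption that
should be checked against the structure <PDB:2REL>), and the variable names
are purely illustrative.

```python
# Schematic of the WAP domain, copied from the text above.
SCHEMATIC = "xxxxxxxCPxxxxxxxxxCxxxxCxxxxxCxxxxxCCxxxCxxxCxxxx"

# Assumed pairing of the cysteines (1-based ranks), read from the connectors.
PAIRING = [(1, 6), (2, 7), (3, 5), (4, 8)]

# 1-based positions of the conserved cysteines within the schematic.
cys_positions = [i + 1 for i, residue in enumerate(SCHEMATIC) if residue == "C"]

# Translate cysteine ranks into schematic positions.
bonds = [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in PAIRING]

for (a, b), (start, end) in zip(PAIRING, bonds):
    print(f"C{a}-C{b}: schematic positions {start} and {end}")
```

Each cysteine takes part in exactly one bond, which is what gives the domain
its 'four-disulfide core' name.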
We developed a profile that covers the whole structure of the WAP-type
'four-disulfide core' domain.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Expert(s) to contact by email:
Claverie J.-M.; [email protected]
-Last update: July 2008 / Pattern removed, profile added and text revised.
[ 1] Hennighausen L.G., Sippel A.E.
"Mouse whey acidic protein is a novel member of the family of
'four-disulfide core' proteins."
Nucleic Acids Res. 10:2677-2684(1982).
PubMed=6896234
[ 2] Wiedow O., Schroeder J.-M., Gregory H., Young J.A., Christophers E.
"Elafin: an elastase-specific inhibitor of human skin. Purification,
characterization, and complete amino acid sequence."
J. Biol. Chem. 265:14791-14795(1990).
PubMed=2394696
[ 3] Francart C., Dauchez M., Alix A.J., Lippens G.
"Solution structure of R-elafin, a specific inhibitor of elastase."
J. Mol. Biol. 268:666-677(1997).
PubMed=9171290; DOI=10.1006/jmbi.1997.0983
[ 4] Araki K., Kuwada M., Ito O., Kuroki J., Tachibana S.
"Four disulfide bonds' allocation of Na+, K(+)-ATPase inhibitor (SPAI)."
Biochem. Biophys. Res. Commun. 172:42-46(1990).
PubMed=2171523
[ 5] Legouis R., Hardelin J.-P., Levilliers J., Claverie J.-M., Compain S.,
Wunderle V., Millasseau P., Le Paslier D., Cohen D., Caterina D.,
Bougueleret L., Delemarre-Van de Waal H., Lutfalla G., Weissenbach J.,
Petit C.
"The candidate gene for the X-linked Kallmann syndrome encodes a
protein related to adhesion molecules."
Cell 67:423-435(1991).
PubMed=1913827
[ 6] Hu Y., Sun Z., Eaton J.T., Bouloux P.M., Perkins S.J.
"Extended and flexible domain solution structure of the extracellular
matrix protein anosmin-1 by X-ray scattering, analytical
ultracentrifugation and constrained modelling."
J. Mol. Biol. 350:553-570(2005).
PubMed=15949815; DOI=10.1016/j.jmb.2005.04.031
[ 7] Simpson K.J., Ranganathan S., Fisher J.A., Janssens P.A., Shaw D.C.,
Nicholas K.R.
"The gene for a novel member of the whey acidic protein family encodes
three four-disulfide core domains and is asynchronously expressed
during lactation."
J. Biol. Chem. 275:23074-23081(2000).
PubMed=10801834; DOI=10.1074/jbc.M002161200
[ 8] Nair D.G., Fry B.G., Alewood P., Kumar P.P., Kini R.M.
"Antimicrobial activity of omwaprin, a new member of the waprin family
of snake venom proteins."
Biochem. J. 402:93-104(2007).
PubMed=17044815; DOI=10.1042/BJ20060318
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or
see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00027}
{PS00027; HOMEOBOX_1}
{PS50071; HOMEOBOX_2}
{BEGIN}
*******************************************
* 'Homeobox' domain signature and profile *
*******************************************
The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first
identified in a number of Drosophila homeotic and segmentation proteins. It
has since been found to be extremely well conserved in many other animals,
including vertebrates. This domain binds DNA through a helix-turn-helix type
of structure. Some of the proteins which contain a homeobox domain play an
important role in development. Most of these proteins are known to be
sequence specific DNA-binding transcription factors. The homeobox domain has
also been found to be very similar to a region of the yeast mating type
proteins. These are sequence-specific DNA-binding proteins that act as master
switches in yeast differentiation by controlling gene expression in a cell
type-specific fashion.
A schematic representation of the homeobox domain is shown below. The
helix-turn-helix region is shown by the symbols 'H' (for helix), and 't' (for
turn).
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx
|        |         |         |         |         |         |
1        10        20        30        40        50        60
The pattern we developed to detect homeobox sequences is 24 residues long and
spans positions 34 to 57 of the homeobox domain.
-Consensus pattern: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-{Y}-x(2)-{L}-
                    [LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-
                    [RKNAIMW]
-Sequences known to belong to this class detected by the pattern: ALL, except
 for 10 sequences.
-Other sequence(s) detected in Swiss-Prot: 9.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Note: Proteins which contain a homeobox domain can be classified, on the
 basis of their sequence characteristics, into various subfamilies. We have
 developed specific patterns for conserved elements of the antennapedia,
 engrailed and paired families.
-Expert(s) to contact by email:
Buerglin T.R.; [email protected]
-Last update: April 2006 / Pattern revised.
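As an illustration of how the consensus pattern above can be applied, the
following Python sketch translates PROSITE pattern syntax into a regular
expression and scans a test sequence. The function name prosite_to_regex and
its translation rules are our own assumptions and cover only the syntax
elements used in this entry (residues, x, [..], {..} and repeat counts); this
is not the ScanProsite implementation. The pattern is written with a hyphen
between all elements. The 60-residue test sequence is believed to correspond
to the Drosophila Antennapedia homeodomain and should be treated as
illustrative only.

```python
import re

# One pattern element: a residue, 'x', an allowed set [..] or a forbidden
# set {..}, optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a hyphen-separated PROSITE pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # any residue except these
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # repeat count
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


HOMEOBOX_1 = ("[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-{Y}-x(2)-{L}-"
              "[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-"
              "[RKNAIMW]")

# Illustrative 60-residue homeodomain sequence, written in blocks of ten.
TEST_DOMAIN = ("RKRGRQTYTR" "YQTLELEKEF" "HFNRYLTRRR"
               "RIEIAHALCL" "TERQIKIWFQ" "NRRMKWKKEN")

regex = re.compile(prosite_to_regex(HOMEOBOX_1))
match = regex.search(TEST_DOMAIN)
if match:
    # Report 1-based, inclusive coordinates.
    print(match.start() + 1, match.end(), match.group())
```

With this test sequence the match covers residues 34 to 57, in agreement with
the span quoted for the pattern.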
[ 1] Gehring W.J.
(In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10,
Oxford University Press, Oxford, (1994).
[ 2] Buerglin T.R.
(In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72,
Oxford University Press, Oxford, (1994).
[ 3] Gehring W.J.
Trends Biochem. Sci. 17:277-280(1992).
[ 4] Gehring W.J., Hiromi Y.
"Homeotic genes and the homeobox."
Annu. Rev. Genet. 20:147-173(1986).
PubMed=2880555; DOI=10.1146/annurev.ge.20.120186.001051
[ 5] Schofield P.N.
Trends Neurosci. 10:3-6(1987).
[E1] http://www.biosci.ki.se/groups/tbu/homeo.html
{END}
{PDOC00028}
{PS00028; ZINC_FINGER_C2H2_1}
{PS50157; ZINC_FINGER_C2H2_2}
{BEGIN}
******************************************************
* Zinc finger C2H2-type domain signature and profile *
******************************************************
'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first
identified in the Xenopus transcription factor TFIIIA. These domains have
since been found in numerous nucleic acid-binding proteins. A zinc finger
domain is composed of 25 to 30 amino-acid residues. There are two cysteine or
histidine residues at both extremities of the domain, which are involved in
the tetrahedral coordination of a zinc atom. It has been proposed that such a
domain interacts with about five nucleotides. A schematic representation of a
zinc finger domain is shown below:
                            x  x
                           x    x
                          x      x
                         x        x
                         x        x
                         x        x
                          C      H
                        x  \    /  x
                        x    Zn    x
                         x  /  \  x
                          C      H
                x x x x x          x x x x x
Many classes of zinc fingers are characterized according to the number and
positions of the histidine and cysteine residues involved in the zinc atom
coordination. In the first class to be characterized, called C2H2, the first
pair of zinc coordinating residues are cysteines, while the second pair are
histidines. A number of experimental reports have demonstrated the
zinc-dependent DNA or RNA binding property of some members of this class.
Some of the proteins known to include C2H2-type zinc fingers are listed below.
We have indicated, between brackets, the number of zinc finger regions found
in each of these proteins; a '+' symbol indicates that only partial sequence
data is available and that additional finger domains may be present.
- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2),
  MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1),
  STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).
- Emericella nidulans: brlA (2), creA (2).
- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5),
  Glass (5), Hunchback (6), Kruppel (5), Kruppel-H (4+), Odd-skipped (4),
  Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity locus beta
  (6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor
  of variegation suvar(3)7 (5), Teashirt (3) and Tramtrack (2).
- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin
  (37 !!), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to 11+), Oocyte
  XlcOF2 to XlcOF22 (from 7 to 12).
- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like
  transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3)
  and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4),
  EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1
  (10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1
  (9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX
  (13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1
  (13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).
In addition to the conserved zinc ligand residues it has been shown [6] that a
number of other positions are also important for the structural integrity of
the C2H2 zinc fingers. The best conserved position is found four residues
after the second cysteine; it is generally an aromatic or aliphatic residue. A
profile was also developed that spans the whole domain.
-Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
[The 2 C's and the 2 H's are zinc ligands]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 42.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: 2.
-Note: In proteins that include many copies of the C2H2 zinc finger domain,
 incomplete or degenerate copies of the domain are frequently found. The
 former are generally found at the extremity of the zinc finger region(s); the
 latter have typically lost one or more of the zinc-coordinating residues or
 are interrupted by insertions or deletions. Our pattern does not detect any
 of these finger domains.
-Expert(s) to contact by email:
Becker K.G.; [email protected]
-Last update: May 2004 / Text revised.
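The behaviour described in the note above can be reproduced with a short
Python sketch. The regular expression below is a hand translation of the
consensus pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; the finger,
linker and degenerate sequences are synthetic examples made up for this
illustration, and the function name find_c2h2 is hypothetical.

```python
import re

# Hand translation of C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
# The lookahead wrapper allows overlapping candidates to be reported too.
C2H2 = re.compile(r"(?=(C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H))")


def find_c2h2(sequence):
    """Return (1-based start, matched text) for every pattern hit."""
    return [(m.start() + 1, m.group(1)) for m in C2H2.finditer(sequence)]


FINGER = "CPECGKSFSQKSNLQKHQRTH"       # synthetic complete finger
DEGENERATE = "CPECGKSFSQKSNLQKHQRTN"   # same finger, last His replaced by Asn
LINKER = "TGEKPYK"                     # synthetic linker without Cys or His

protein = LINKER + FINGER + LINKER + FINGER + LINKER + DEGENERATE
hits = find_c2h2(protein)

for start, text in hits:
    print(start, text)
print("fingers detected:", len(hits))
```

Only the two complete copies are reported; the degenerate copy, which has lost
one of its zinc-coordinating residues, is not detected by the pattern.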
[ 1] Klug A., Rhodes D.
Trends Biochem. Sci. 12:464-469(1987).
[ 2] Evans R.M., Hollenberg S.M.
"Zinc fingers: gilt by association."
Cell 52:1-3(1988).
PubMed=3125980
[ 3] Payre F., Vincent A.
"Finger proteins and DNA-specific recognition: distinct patterns of
conserved amino acids suggest different evolutionary modes."
FEBS Lett. 234:245-250(1988).
PubMed=3292287
[ 4] Miller J., McLachlan A.D., Klug A.
"Repetitive zinc-binding domains in the protein transcription factor
IIIA from Xenopus oocytes."
EMBO J. 4:1609-1614(1985).
PubMed=4040853
[ 5] Berg J.M.
"Proposed structure for the zinc-binding domains from transcription
factor IIIA and related proteins."
Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).
PubMed=3124104
[ 6] Rosenfeld R., Margalit H.
"Zinc fingers: conserved properties that can distinguish between
spurious and actual DNA-binding motifs."
J. Biomol. Struct. Dyn. 11:557-570(1993).
PubMed=8129873
{END}
{PDOC00029}
{PS00029; LEUCINE_ZIPPER}
{BEGIN}
**************************
* Leucine zipper pattern *
**************************
A structure, referred to as the 'leucine zipper' [1,2], has been proposed to
explain how some eukaryotic gene regulatory proteins work. The leucine zipper
consists of a periodic repetition of leucine residues at every seventh
position over a distance covering eight helical turns. The segments containing
these periodic arrays of leucine residues seem to exist in an alpha-helical
conformation. The leucine side chains extending from one alpha-helix interact
with those from a similar alpha helix of a second polypeptide, facilitating
dimerization; the structure formed by cooperation of these two regions forms a
coiled coil [3]. The leucine zipper pattern is present in many gene regulatory
proteins, such as:
- The CCAAT-box and enhancer binding protein (C/EBP).
- The cAMP response element (CRE) binding proteins (CREB, CRE-BP1, ATFs).
- The Jun/AP1 family of transcription factors.
- The yeast general control protein GCN4.
- The fos oncogene, and the fos-related proteins fra-1 and fos B.
- The C-myc, L-myc and N-myc oncogenes.
- The octamer-binding transcription factor 2 (Oct-2/OTF-2).
-Consensus pattern: L-x(6)-L-x(6)-L-x(6)-L
-Sequences known to belong to this class detected by the pattern: All those
 mentioned in the original paper, with the exception of L-myc which has a Met
 instead of the second Leu.
-Other sequence(s) detected in Swiss-Prot: some 600 other sequences from every
 category of protein families.
-Note: As this is far from being a specific pattern you should be cautious in
 citing the presence of such pattern in a protein if it has not been shown to
 be a nuclear DNA-binding protein.
-Last update: December 1992 / Text revised.
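The low specificity mentioned in the note above can be made concrete with a
small Python sketch. It scans a sequence for L-x(6)-L-x(6)-L-x(6)-L and
estimates how often the pattern would arise by chance. The test sequences are
synthetic, the function names are hypothetical, and the chance estimate rests
on two stated assumptions: residues are independent, and the leucine
frequency (0.10 here) is only a round illustrative value.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L; the lookahead lets overlapping hits be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def zipper_hits(sequence):
    """Return the 1-based start positions of all pattern hits."""
    return [m.start() + 1 for m in LEUCINE_ZIPPER.finditer(sequence)]


def chance_hit_probability(leucine_fraction):
    """Probability that a given position starts a hit in a random sequence,
    assuming independent residues: four fixed positions must all be Leu."""
    return leucine_fraction ** 4


# Synthetic sequence with five leucines spaced seven residues apart.
five_leucines = "LAAAAAA" * 4 + "L"

# Same spacing, but with Met in place of the second Leu (as noted for L-myc).
met_variant = "LAAAAAA" + "MAAAAAA" + "LAAAAAA" + "L"

print(zipper_hits(five_leucines))
print(zipper_hits(met_variant))
print(chance_hit_probability(0.10))
```

Under these assumptions roughly one position in ten thousand starts a hit by
chance, so hits in large databases are expected even among unrelated proteins.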
[ 1] Landschulz W.H., Johnson P.F., McKnight S.L.
"The leucine zipper: a hypothetical structure common to a new class of
DNA binding proteins."
Science 240:1759-1764(1988).
PubMed=3289117
[ 2] Busch S.J., Sassone-Corsi P.
"Dimers, leucine zippers and DNA-binding domains."
Trends Genet. 6:36-40(1990).
PubMed=2186528
[ 3] O'Shea E.K., Rutkowski R., Kim P.S.
Science 243:538-542(1989).
{END}
{PDOC00030}
{PS50102; RRM}
{BEGIN}
********************************************
* Eukaryotic RNA recognition motif profile *
********************************************
Many eukaryotic proteins that are known or supposed to bind single-stranded
RNA contain one or more copies of a putative RNA-binding domain of about 90
amino acids [1,2]. This domain is known as the RNA recognition motif (RRM).
This region has been found in the following proteins:
** Heterogeneous nuclear ribonucleoproteins **
- hnRNP A1 (helix destabilizing protein) (twice).
- hnRNP A2/B1 (twice).
- hnRNP C (C1/C2) (once).
- hnRNP E (UP2) (at least once).
- hnRNP G (once).
** Small nuclear ribonucleoproteins **
- U1 snRNP 70 Kd (once).
- U1 snRNP A (once).
- U2 snRNP B'' (once).
** Pre-RNA and mRNA associated proteins **
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential
  for the binding of mRNA to ribosomes (once).
- Nucleolin (4 times).
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it
  specifically binds nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).
** Others **
- Drosophila sex determination protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2 (Tra-2) (once).
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA
  metabolism of neurons.
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is
  highly similar to elav and which may play a role in neuron-specific RNA
  processing.
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein
  that may also bind to specific mRNAs.
- La antigen (once), a protein which may play a role in the transcription of
  RNA polymerase III.
- The 60 Kd Ro protein (once), a putative RNP complex protein.
- A maize protein induced by abscisic acid in response to water stress, which
  seems to be an RNA-binding protein.
- Three tobacco proteins, located in the chloroplast [6], which may be
  involved in splicing and/or processing of chloroplast RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in
  relation with cellular proliferation and/or maturation.
- Insulin-induced growth response protein Cl-4 from rat (twice).
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic
  activity against cytotoxic lymphocyte target cells and may be involved in
  apoptosis.
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly-(A)
  tail length [9].
Inside the RRM there are two regions which are highly conserved. The first one
is a hydrophobic segment of six residues (which is called the RNP-2 motif),
the second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The
position of both motifs in the domain is shown in the following schematic
representation:
xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########xxxxxxxxxxxxxxxxxxxxxxxxx
       RNP-2                              RNP-1
We have developed a profile that spans the RRM domain.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: August 2004 / Text revised; pattern deleted.
[ 1] Bandziulis R.J., Swanson M.S., Dreyfuss G.
"RNA-binding proteins as developmental regulators."
Genes Dev. 3:431-437(1989).
PubMed=2470643
[ 2] Dreyfuss G., Swanson M.S., Pinol-Roma S.
"Heterogeneous nuclear ribonucleoprotein particles and the pathway of
mRNA formation."
Trends Biochem. Sci. 13:86-91(1988).
PubMed=3072706
[ 3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J.
"Cloning and expression of eukaryotic initiation factor 4B cDNA:
sequence determination identifies a common RNA recognition motif."
EMBO J. 9:2783-2790(1990).
PubMed=2390971
[ 4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J.,
Posner J.B., Furneaux H.M.
"HuD, a paraneoplastic encephalomyelitis antigen, contains RNA-binding
domains and is homologous to Elav and Sex-lethal."
Cell 67:325-333(1991).
PubMed=1655278
[ 5] Rebagliati M.
"An RNA recognition motif in the bicoid protein."
Cell 58:231-232(1989).
PubMed=2752425
[ 6] Li Y.Q., Sugiura M.
"Three distinct ribonucleoproteins from tobacco chloroplasts: each
contains a unique amino terminal acidic domain and two
ribonucleoprotein consensus motifs."
EMBO J. 9:3059-3066(1990).
PubMed=1698606
[ 7] Ayane M., Preuss U., Koehler G., Nielsen P.J.
"A differentially expressed murine RNA encoding a protein with
similarities to two types of nucleic acid binding motifs."
Nucleic Acids Res. 19:1273-1278(1991).
PubMed=2030943
[ 8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P.
"Identification and functional characterization of a TIA-1-related
nucleolysin."
Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992).
PubMed=1326761
[ 9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F.
Mol. Cell. Biol. 11:3075-3087(1991).
{END}
{PDOC00031}
{PS00031; NUCLEAR_REC_DBD_1}
{PS51030; NUCLEAR_REC_DBD_2}
{BEGIN}
**********************************************************************
* Nuclear hormone receptors DNA-binding domain signature and profile *
**********************************************************************
Nuclear hormone receptors are ligand-activated transcription factors that
regulate gene expression by interacting with specific DNA sequences upstream
of their target genes. In vertebrates, these proteins regulate diverse
biological processes such as pattern formation, cellular differentiation and
homeostasis [1 to 6].
Classical nuclear hormone receptors contain two conserved regions, the hormone
binding domain and a DNA-binding domain (DBD) that is composed of two C4-type
zinc fingers. The DBD is responsible for targeting the receptors to their
hormone response elements (HRE). It binds as a dimer with each monomer
recognizing a six base pair sequence of DNA. The vast majority of targets
contain the same 5'-AGGTCA-3' consensus sequence [7]. In some cases a less
conserved C-terminal extension of the core DBD confers the DNA selectivity
[8].
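Because each monomer recognizes one six base pair half-site, candidate
response elements can be located by searching both DNA strands for the
5'-AGGTCA-3' consensus. The following Python sketch does this for exact
matches only; the function names and the example sequence are synthetic
illustrations, and real response elements may deviate from the consensus.

```python
HALF_SITE = "AGGTCA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_half_sites(dna):
    """Return (1-based start, strand) for each exact AGGTCA half-site.

    '+' means the site reads 5'-AGGTCA-3' on the given strand, '-' means it
    reads 5'-AGGTCA-3' on the opposite strand.
    """
    dna = dna.upper()
    minus_site = reverse_complement(HALF_SITE)   # TGACCT
    size = len(HALF_SITE)
    hits = []
    for i in range(len(dna) - size + 1):
        window = dna[i:i + size]
        if window == HALF_SITE:
            hits.append((i + 1, "+"))
        if window == minus_site:
            hits.append((i + 1, "-"))
    return hits


# Synthetic example: two half-sites on opposite strands, 3 bp apart.
example = "ccAGGTCAcagTGACCTgg"
sites = find_half_sites(example)
print(sites)
```

Two half-sites found close together, as in this example, are what one would
expect for a site bound by a receptor dimer.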
The two zinc fingers fold to form a single structural domain (see <PDB:1HCQ>)
[9,10]. The structure consists of two helices perpendicular to each other. A
zinc ion, coordinated by four conserved cysteines, holds the base of a loop at
the N terminus of each helix. The helix of each monomer makes sequence
specific contacts in the major groove of the DNA.
Proteins known to contain a nuclear hormone receptor DNA-binding domain are
listed below:
- Androgen receptor (AR).
- Estrogen receptor (ER).
- Glucocorticoid receptor (GR).
- Mineralocorticoid receptor (MR).
- Progesterone receptor (PR).
- Retinoic acid receptors (RARs and RXRs).
- Thyroid hormone receptors (TR) alpha and beta.
- The avian erythroblastosis virus oncogene v-erbA, derived from a cellular
  thyroid hormone receptor.
- Vitamin D3 receptor (VDR).
- Insects ecdysone receptor (EcR).
- COUP transcription factor (also known as ear-3), and its Drosophila
  homolog seven-up (svp).
- Hepatocyte nuclear factor 4 (HNF-4), which binds to DNA sites required for
  the transcription of the genes for alpha-1-antitrypsin, apolipoprotein CIII
  and transthyretin.
- Ad4BP, a protein that binds to the Ad4 site found in the promoter region of
  steroidogenic P450 genes.
- Apolipoprotein AI regulatory protein-1 (ARP-1), required for the
  transcription of apolipoprotein AI.
- Peroxisome proliferator activated receptors (PPAR), transcription factors
  specifically activated by peroxisome proliferators. They control the
  peroxisomal beta-oxidation pathway of fatty acids by activating the gene
  for acyl-CoA oxidase.
- Drosophila protein knirps (kni), a zygotic gap protein required for
  abdominal segmentation of the Drosophila embryo.
- Drosophila protein ultraspiracle (usp) (or chorion factor 1), which binds
  to the promoter region of s15 chorion gene.
- Human estrogen receptor related genes 1 and 2 (err1 and err2).
- Human erbA related gene 2 (ear-2).
- Mammalian NGFI-B (NAK1, nur/77, N10).
- Mammalian NOT/nurR1/RNR-1.
- Drosophila protein embryonic gonad (egon).
- Drosophila knirps-related protein (knrl).
- Drosophila protein tailless (tll).
- Drosophila 20-oh-ecdysone regulated protein E75.
- Insects Hr3.
- Insects Hr38.
- Caenorhabditis elegans cnr-8, cnr-14, and odr-7.
- Caenorhabditis elegans hypothetical proteins B0280.8, EO2H1.7 and K06A1.4.
As a signature pattern for this family of proteins, we took the most conserved
residues, the first 27, of the DNA-binding domain. We also developed a profile
that spans the whole domain.
-Consensus pattern: C-x(2)-C-x(1,2)-[DENAVSPHKQT]-x(5,6)-[HNY]-[FY]-x(4)-C-
                    x(2)-C-x(2)-F(2)-x-R
                    [The 4 C's are zinc ligands]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: April 2006 / Pattern revised.
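The statement that the signature covers the first 27 residues of the
DNA-binding domain can be checked against the consensus pattern itself. The
following Python sketch sums the repeat counts of a PROSITE pattern to obtain
the shortest and longest stretch it can match. The function name
pattern_length_range is hypothetical, the parser handles only the syntax
elements used in this entry, and the pattern is written with a hyphen between
all elements.

```python
import re

# One pattern element: a residue, 'x', an allowed set [..] or a forbidden
# set {..}, optionally followed by a repeat count (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length_range(pattern):
    """Return (shortest, longest) number of residues the pattern can match."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        _, low, high = m.groups()
        low = int(low) if low else 1        # no count means exactly one
        high = int(high) if high else low   # (n) means exactly n
        shortest += low
        longest += high
    return shortest, longest


NUCLEAR_REC_DBD_1 = ("C-x(2)-C-x(1,2)-[DENAVSPHKQT]-x(5,6)-[HNY]-[FY]-x(4)-C-"
                     "x(2)-C-x(2)-F(2)-x-R")

length_range = pattern_length_range(NUCLEAR_REC_DBD_1)
print(length_range)
```

The two variable-length gaps, x(1,2) and x(5,6), account for the difference
between the shortest and the longest possible match.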
[ 1] Gronemeyer H., Laudet V.
Protein Prof. 2:1173-1308(1995).
[ 2] Evans R.M.
"The steroid and thyroid hormone receptor superfamily."
Science 240:889-895(1988).
PubMed=3283939
[ 3] Gehring U.
Trends Biochem. Sci. 12:399-402(1987).
[ 4] Beato M.
"Gene regulation by steroid hormones."
Cell 56:335-344(1989).
PubMed=2644044
[ 5] Segraves W.A.
"Something old, some things new: the steroid receptor superfamily in
Drosophila."
Cell 67:225-228(1991).
PubMed=1913821
[ 6] Laudet V., Haenni C., Coll J., Catzeflis F., Stehelin D.
"Evolution of the nuclear receptor gene superfamily."
EMBO J. 11:1003-1013(1992).
PubMed=1312460
[ 7] Stunnenberg H.G.
"Mechanisms of transactivation by retinoic acid receptors."
BioEssays 15:309-315(1993).
PubMed=8393666
[ 8] Zhao Q., Khorasanizadeh S., Miyoshi Y., Lazar M.A., Rastinejad F.
"Structural elements of an orphan nuclear receptor-DNA complex."
Mol. Cell 1:849-861(1998).
PubMed=9660968
[ 9] Schwabe J.W.R., Neuhaus D., Rhodes D.
"Solution structure of the DNA-binding domain of the oestrogen
receptor."
Nature 348:458-461(1990).
PubMed=2247153; DOI=10.1038/348458a0
[10] Schwabe J.W.R., Chapman L., Finch J.T., Rhodes D.
Cell 75:567-578(1993).
{END}
{PDOC00032}
{PS00032; ANTENNAPEDIA}
{BEGIN}
**************************************************
* 'Homeobox' antennapedia-type protein signature *
**************************************************
The homeotic Hox proteins are sequence-specific transcription factors. They
are part of a developmental regulatory system that provides cells with
specific positional identities on the anterior-posterior (A-P) axis [1]. The
Hox proteins contain a 'homeobox' domain. In Drosophila and other insects,
there are eight different Hox genes that are encoded in two gene complexes,
ANT-C and BX-C. In vertebrates there are 38 genes organized in four complexes.
In six of the eight Drosophila Hox genes the homeobox domain is highly similar
and a conserved hexapeptide is found five to sixteen amino acids upstream of
the homeobox domain. The six Drosophila proteins that belong to this group are
antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb),
sex combs reduced (scr) and ultrabithorax (ubx) and are collectively known as
the 'antennapedia' subfamily.
In vertebrates the corresponding Hox genes are known [2] as Hox-A2, A3, A4,
A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1,
D3, D4 and D8.
Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia'
subfamily.
As a signature pattern for this subfamily of homeobox proteins, we have used
the conserved hexapeptide.
-Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]
-Sequences known to belong to this class detected by the pattern: ALL, except
 for 6 sequences.
-Other sequence(s) detected in Swiss-Prot: 3.
-Note: Arg and Lys are most frequently found in the last position of the
 hexapeptide; other amino acids are found in only a few cases.
-Last update: June 1994 / Text revised.
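The hexapeptide pattern and its spacing relative to the homeobox domain (five
to sixteen residues upstream, as stated above) can be combined in a simple
check. The following Python sketch assumes that the start of the homeobox
domain is already known, for instance from a profile match; the function name
hexapeptide_gaps and the test sequence are hypothetical and synthetic.

```python
import re

# Hand translation of [LIVMFE]-[FY]-P-W-M-[KRQTA].
HEXAPEPTIDE = re.compile(r"[LIVMFE][FY]PWM[KRQTA]")


def hexapeptide_gaps(sequence, homeobox_start):
    """Return (1-based hexapeptide start, gap) for hexapeptides found
    upstream of the homeobox domain.

    homeobox_start is the 1-based position of the first homeobox residue
    (assumed to be known). The gap is the number of residues between the
    end of the hexapeptide and the start of the homeobox domain.
    """
    upstream = sequence[:homeobox_start - 1]
    return [(m.start() + 1, len(upstream) - m.end())
            for m in HEXAPEPTIDE.finditer(upstream)]


# Synthetic protein: 3 residues, hexapeptide, 8-residue spacer, then a
# placeholder of 60 residues standing in for the homeobox domain.
example = "MSS" + "LYPWMR" + "S" * 8 + "K" * 60
results = hexapeptide_gaps(example, homeobox_start=18)

for start, gap in results:
    print(start, gap, 5 <= gap <= 16)
```

A hexapeptide match whose gap falls outside the range of 5 to 16 residues
would deserve closer inspection before the protein is assigned to the
subfamily.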
[ 1] McGinnis W., Krumlauf R.
"Homeobox genes and axial patterning."
Cell 68:283-302(1992).
PubMed=1346368
[ 2] Scott M.P.
"Vertebrate homeobox gene nomenclature."
Cell 71:551-553(1992).
PubMed=1358459
{END}
{PDOC00033}
{PS00033; ENGRAILED}
{BEGIN}
***********************************************
* 'Homeobox' engrailed-type protein signature *
***********************************************
Most proteins which contain a 'homeobox' domain can be classified [1,2], on
the basis of their sequence characteristics, in three subfamilies: engrailed,
antennapedia and paired. Proteins currently known to belong to the engrailed
subfamily are:
- Drosophila segmentation polarity protein engrailed (en) which specifies the
  body segmentation pattern and is required for the development of the
  central nervous system.
- Drosophila invected protein (inv).
- Silk moth proteins engrailed and invected, which may be involved in the
  compartmentalization of the silk gland.
- Honeybee E30 and E60.
- Grasshopper (Schistocerca americana) G-En.
- Mammalian and birds En-1 and En-2.
- Zebrafish Eng-1, -2 and -3.
- Sea urchin (Tripneustes gratilla) SU-HB-en.
- Leech (Helobdella triserialis) Ht-En.
- Caenorhabditis elegans ceh-16.
Engrailed homeobox proteins are characterized by the presence of a conserved
region of some 20 amino-acid residues located at the C-terminal of the
'homeobox' domain. As a signature pattern for this subfamily of proteins, we
have used a stretch of eight perfectly conserved residues in this region.
-Consensus pattern: L-M-A-[EQ]-G-L-Y-N
-Sequences known to belong to this class detected by the pattern: ALL, except
 for ceh-16.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: July 1999 / Pattern and text revised.
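Because the consensus pattern above contains no variable-length gaps and only
one ambiguous position, the peptides it matches can be listed exhaustively.
The following Python sketch expands such a pattern into all matching
peptides. The function name expand_pattern is hypothetical, and the function
deliberately supports only literal residues and [..] sets.

```python
from itertools import product


def expand_pattern(pattern):
    """List every peptide matched by a PROSITE pattern that consists only of
    literal residues and [..] sets (no x, no {..}, no repeat counts)."""
    choices = []
    for element in pattern.split("-"):
        if element.startswith("[") and element.endswith("]"):
            choices.append(element[1:-1])          # any one of these residues
        elif len(element) == 1 and element.isalpha() and element.isupper():
            choices.append(element)                # exactly this residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return ["".join(residues) for residues in product(*choices)]


ENGRAILED = "L-M-A-[EQ]-G-L-Y-N"
peptides = expand_pattern(ENGRAILED)
print(peptides)
```

For a pattern this short, a plain substring search for each listed peptide is
equivalent to a regular-expression scan.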
[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III
"The structure and function of the homeodomain."
Biochim. Biophys. Acta 989:25-48(1989).
PubMed=2568852
[ 2] Gehring W.J.
"Homeo boxes in the study of development."
Science 236:1245-1252(1987).
PubMed=2884726
{END}
{PDOC00034}
{PS00034; PAIRED_1}
{PS51057; PAIRED_2}
{BEGIN}
***************************************
* Paired domain signature and profile *
***************************************
The paired domain is a ~126 amino acid DNA-binding domain, which is found in
eukaryotic transcription regulatory proteins involved in embryogenesis. The
domain was originally described as the 'paired box' in the Drosophila protein
paired (prd) [1,2]. The paired domain is generally located in the N-terminal
part. An octapeptide [3] and/or a homeodomain (see <PDOC00027>) can occur
C-terminal to the paired domain, as well as a Pro-Ser-Thr-rich C-terminus.
Paired domain proteins can function as transcription repressors or activators.
The paired domain contains three subdomains, which show functional differences
in DNA-binding.
The crystal structures of prd and Pax proteins show that the DNA-bound paired
domain is bipartite, consisting of an N-terminal subdomain (PAI or NTD) and a
C-terminal subdomain (RED or CTD), connected by a linker (see <PDB:1K78>). PAI
and RED each form a three-helical fold, with the most C-terminal helices
comprising a helix-turn-helix (HTH) motif that binds the DNA major groove. In
addition, the PAI subdomain encompasses an N-terminal beta-turn and
beta-hairpin, also named 'wing', participating in DNA-binding. The linker can
bind into the DNA minor groove. Different Pax proteins and their alternatively
spliced isoforms use different (sub)domains for DNA-binding to mediate the
specificity of sequence recognition [4,5].
Some proteins known to contain a paired domain:
- Drosophila paired (prd), a segmentation pair-rule class protein.
- Drosophila gooseberry proximal (gsb-p) and gooseberry distal
(gsb-d),
segmentation polarity class proteins.
- Drosophila Pox-meso and Pox-neuro proteins.
The Pax proteins:
- Mammalian protein Pax1, which may play a role in the formation of
segmented
structures in the embryo. In mouse, mutations in Pax1 produce the
undulated
phenotype, characterized
by vertebral malformations along the
entire
rostro-caudal axis.
- Mammalian protein Pax2, a probable transcription factor that may
have a
role in kidney cell differentiation.
- Mammalian protein Pax3. Pax3 is expressed during early
neurogenesis. In
Man, defects in Pax3 are the cause of Waardenburg's syndrome
(WS), an
autosomal dominant combination of deafness and pigmentary disturbance.
- Mammalian protein Pax5, also known as B-cell specific transcription
factor
(BSAP). Pax5 is involved in the regulation of the CD19 gene. It
plays an
important role in B-cell differentiation as well as neural
development and
spermatogenesis.
- Mammalian protein Pax6 (oculorhombin). Pax6 is a transcription factor
with
important functions in eye and nasal development. In Man, defects in
Pax6
are the cause of aniridia type II (AN2), an autosomal dominant
disorder
characterized by complete or partial absence of the iris.
- Mammalian protein Pax8, required in thyroid development.
- Mammalian protein Pax9. In man, defects in Pax9 cause oligodontia.
- Zebrafish proteins Pax[Zf-a] and Pax[Zf-b].
We use the region spanning positions 34 to 50 of the paired domain
as a
signature pattern. This conserved region spans the DNA-binding HTH
located in
the N-terminal subdomain. We also developed a profile that covers the
entire
paired domain, including the PAI and RED subdomains and which allows a
more
sensitive detection.
-Consensus pattern: R-P-C-x(11)-C-V-S
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: January 2005 / Text revised; profile added.
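As an illustration of how the signature above can be applied, the following
minimal Python sketch translates the consensus pattern R-P-C-x(11)-C-V-S by
hand into a regular expression. The function name and the test sequence are
hypothetical (the sequence is synthetic, not a real Pax protein); only the
pattern itself comes from this entry.

```python
import re

# Hand translation of the PROSITE pattern R-P-C-x(11)-C-V-S:
# literal residues stay as they are, x(11) becomes "any 11 residues".
PAIRED_1 = re.compile(r"RPC.{11}CVS")

def find_paired_signature(sequence):
    """Return (start, end, match) tuples, 1-based inclusive, for each hit."""
    return [(m.start() + 1, m.end(), m.group())
            for m in PAIRED_1.finditer(sequence.upper())]

# Synthetic sequence, made up for illustration only.
demo = "MGQGG" + "RPC" + "A" * 11 + "CVS" + "KLMN"
hits = find_paired_signature(demo)
print(hits)
```

Every match is 17 residues long, which agrees with the span of positions 34
to 50 quoted in the text (50 - 34 + 1 = 17).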
[ 1] Bopp D., Burri M., Baumgartner S., Frigerio G., Noll M.
"Conservation of a large protein domain in the segmentation gene
paired and in functionally related genes of Drosophila."
Cell 47:1033-1040(1986).
PubMed=2877747
[ 2] Baumgartner S., Bopp D., Burri M., Noll M.
"Structure of two genes at the gooseberry locus related to the paired
gene and their spatial expression during Drosophila embryogenesis."
Genes Dev. 1:1247-1267(1987).
PubMed=3123319
[ 3] Eberhard D., Jimenez G., Heavey B., Busslinger M.
"Transcriptional repression by Pax5 (BSAP) through interaction with
corepressors of the Groucho family."
EMBO J. 19:2292-2303(2000).
PubMed=10811620; DOI=10.1093/emboj/19.10.2292
[ 4] Underhill D.A.
"Genetic and biochemical diversity in the Pax gene family."
Biochem. Cell Biol. 78:629-638(2000).
PubMed=11103953
[ 5] Apuzzo S., Abdelhakim A., Fortin A.S., Gros P.
"Cross-talk between the paired domain and the homeodomain of Pax3: DNA
binding by each domain causes a structural change in the other domain,
supporting interdependence for DNA Binding."
J. Biol. Chem. 279:33601-33612(2004).
PubMed=15148315; DOI=10.1074/jbc.M402949200
{END}
{PDOC00035}
{PS00035; POU_1}
{PS00465; POU_2}
{PS51179; POU_3}
{BEGIN}
*****************************************************
* POU-specific (POUs) domain signatures and profile *
*****************************************************
The POU (pronounced 'pow') domain [1 to 7] is a highly charged 155-162 amino
acid region of sequence similarity which has been identified in the three
mammalian transcription factors Pit-1, Oct-1, and Oct-2 and in the product of
the nematode gene unc-86. The POU domain is a bipartite DNA binding protein
module that binds selectively to the DNA octamer motif ATGCAAAT and a subset
of derivatives. It consists of two subdomains, a C-terminal homeodomain (POUh)
(see <PDOC00027>) and an N-terminal 75- to 82-residue POU-specific (POUs)
region separated by a short non-conserved linker. The POU-specific region or
'box' can be subdivided further into two highly conserved regions, A and B,
separated by a less highly conserved segment. The POUs domain is always found
in association with a POUh domain, and both are required for high affinity and
sequence-specific DNA binding.

The POUs domain consists of four alpha helices packed to enclose an extensive
hydrophobic core (see <PDB:1POU>). The POUs domain contains an unusual HTH
structure, which differs from the canonical HTH motif in the length of the
first alpha helix and the turn. The region of hypervariability located between
subdomains A and B lies within the sequence corresponding to the C-terminal
end of helix 2 and the linker between helices 2 and 3. In the model of the
POUs-DNA complex, the C-terminus of helix 2 and the turn of the HTH motif
project away from the DNA such that sequence variability in this region can be
accommodated without adversely affecting DNA binding [8].

Some proteins currently known to contain a POUs domain are listed below:

- Oct-1 (or OTF-1, NF-A1) (gene POU2F1), a transcription factor for small
  nuclear RNA and histone H2B genes.
- Oct-2 (or OTF-2, NF-A2) (gene POU2F2), a transcription factor that
  specifically binds to the immunoglobulin promoters octamer motif and
  activates these genes.
- Oct-3 (or Oct-4, NF-A3) (gene POU5F1), a transcription factor that also
  binds to the octamer motif.
- Oct-6 (or OTF-6, SCIP) (gene POU3F1), an octamer-binding transcription
  factor thought to be involved in early embryogenesis and neurogenesis.
- Oct-7 (or N-Oct 3, OTF-7, Brn-2) (gene POU3F2), a nervous-system specific
  octamer-binding transcription factor.
- Oct-11 (or OTF-11) (gene POU2F3), an octamer-binding transcription factor.
- Pit-1 (or GHF-1) (gene POU1F1), a transcription factor that activates
  growth hormone and prolactin genes.
- Brn-1 (or OTF-8) (gene POU3F3).
- Brn-3A (or RDC-1) (gene POU4F1), a probable transcription factor that may
  play a role in neuronal tissue differentiation.
- Brn-3B (gene POU4F2), a probable transcription factor that may play a role
  in determining or maintaining the identities of a small subset of visual
  system neurons.
- Brn-3C (gene POU4F3).
- Brn-4 (or OTF-9) (gene POU3F4), a probable transcription factor which
  exerts its primary action widely during early neural development and in a
  very limited set of neurons in the mature brain.
- Mpou (or Brn-5, Emb) (gene POU6F1), a transcription factor that binds
  preferentially to a variant of the octamer motif.
- Skn, which activates cytokeratin 10 (k10) gene expression.
- Sprm-1, a transcription factor that binds preferentially to the octamer
  motif and that may exert a regulatory function in meiotic events that are
  required for terminal differentiation of male germ cells.
- Unc-86, a Caenorhabditis elegans transcription factor involved in cell
  lineage and differentiation.
- Cf1-a, a Drosophila neuron-specific transcription factor necessary for the
  expression of the dopa decarboxylase gene (dcc).
- I-POU, a Drosophila protein that forms a stable heterodimeric complex with
  Cf1-a and inhibits its action.
- Drosophila protein nubbin/twain (PDM-1 or DPou-19).
- Drosophila protein didymous (PDM-2 or DPou-28) that may play multiple roles
  during development.
- Bombyx mori silk gland factor 3 (SGF-3).
- Xenopus proteins Pou1, Pou2, and Pou3.
- Zebrafish proteins Pou1, Pou2, Pou[C], ZP-12, ZP-23, ZP-47 and ZP-50.
- Caenorhabditis elegans protein ceh-6.
- Caenorhabditis elegans protein ceh-18.

We have derived two signature patterns for the 'POU' domain. The first one
spans positions 15 to 27 of the domain, the second positions 42 to 55. We have
also developed a profile which covers the entire POUs domain.
-Consensus pattern: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Consensus pattern: S-Q-[STK]-[TA]-I-[SC]-R-[FH]-[ET]-x-[LSQ]-x(0,1)-[LIR]-[ST]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: January 2006 / Text revised; profile added.
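The two POU patterns can be checked against the position ranges quoted in the
text by counting how many residues each pattern can span. The sketch below is
only an illustration: the helper name `pattern_span` is hypothetical, and the
second pattern is written with every '-' separator explicit, which is an
assumption about the intended reading of the consensus line above.

```python
import re

# One pattern element: x, a residue, [allowed] or {forbidden}, optionally
# followed by a repeat count (n) or a range (m,n).
ELEMENT = re.compile(
    r"^(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?$")

def pattern_span(pattern):
    """Return (min_len, max_len) for a '-'-separated PROSITE pattern."""
    lo = hi = 0
    for element in pattern.split("-"):
        m = ELEMENT.match(element)
        if m is None:
            raise ValueError(f"unrecognised element: {element!r}")
        low = int(m.group(2)) if m.group(2) else 1    # no count means 1
        high = int(m.group(3)) if m.group(3) else low  # no range means fixed
        lo += low
        hi += high
    return lo, hi

POU_1 = "[RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G"
POU_2 = "S-Q-[STK]-[TA]-I-[SC]-R-[FH]-[ET]-x-[LSQ]-x(0,1)-[LIR]-[ST]"

print(pattern_span(POU_1))  # positions 15 to 27 cover 13 residues
print(pattern_span(POU_2))  # positions 42 to 55 cover 14 residues
```

The first pattern has a fixed length of 13 and the second spans 13 or 14
residues, so both fit the ranges given for the signatures.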
[ 1] Robertson M.
"Homoeo boxes, POU proteins and the limits to promiscuity."
Nature 336:522-524(1988).
PubMed=2904652; DOI=10.1038/336522a0
[ 2] Sturm R.A., Herr W.
"The POU domain is a bipartite DNA-binding structure."
Nature 336:601-604(1988).
PubMed=2904656; DOI=10.1038/336601a0
[ 3] Herr W., Sturm R.A., Clerc R.G., Corcoran L.M., Baltimore D.,
Sharp P.A., Ingraham H.A., Rosenfeld M.G., Finney M., Ruvkun G.,
Horvitz H.R.
"The POU domain: a large conserved region in the mammalian pit-1,
oct-1, oct-2, and Caenorhabditis elegans unc-86 gene products."
Genes Dev. 2:1513-1516(1988).
PubMed=3215510
[ 4] Levine M., Hoey T.
"Homeobox proteins as sequence-specific transcription factors."
Cell 55:537-540(1988).
PubMed=2902929
[ 5] Rosenfeld M.G.
"POU-domain transcription factors: pou-er-ful developmental
regulators."
Genes Dev. 5:897-907(1991).
PubMed=2044958
[ 6] Schoeler H.R.
Trends Genet. 7:323-329(1991).
[ 7] Verrijzer C.P., Van der Vliet P.C.
"POU domain transcription factors."
Biochim. Biophys. Acta 1173:1-21(1993).
PubMed=8485147
[ 8] Assa-Munt N., Mortishire-Smith R.J., Aurora R., Herr W., Wright P.E.
"The solution structure of the Oct-1 POU-specific domain reveals a
striking similarity to the bacteriophage lambda repressor DNA-binding
domain."
Cell 73:193-205(1993).
PubMed=8462099
{END}
{PDOC00036}
{PS00036; BZIP_BASIC}
{PS50217; BZIP}
{BEGIN}
************************************************************
* Basic-leucine zipper (bZIP) domain signature and profile *
************************************************************
The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors
groups together proteins that contain a basic region mediating
sequence-specific DNA-binding followed by a leucine zipper (see <PDOC00029>)
required for dimerization. bZIP domains usually bind a palindromic
6-nucleotide site, but the specificity can be altered by interaction with
accessory factors [3].

Several structures of bZIP have been solved (see for example <PDB:1AN2>) [4].
The basic region and the leucine zipper form a contiguous alpha helix where
the four hydrophobic residues of the leucine zipper are oriented on one side.
This conformation allows dimerization in parallel and it bends the helices so
that the newly functional dimer forms a flexible fork where the basic domains,
at the N-terminal open end, can then interact with DNA. The two leucine
zippers are therefore oriented perpendicular to the DNA [4,5].

This family is quite large and we only list here some representative members.

- Transcription factor AP-1, which binds selectively to enhancer elements in
  the cis control regions of SV40 and metallothionein IIA. AP-1, also known
  as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17)
  oncogene v-jun.
- Jun-B and jun-D, probable transcription factors which are highly similar
  to jun/AP-1.
- The fos protein, a proto-oncogene that forms a non-covalent dimer with
  c-jun.
- The fos-related proteins fra-1 and fos B.
- Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1,
  ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1.
- Maize Opaque 2, a trans-acting transcriptional activator involved in the
  regulation of the production of zein proteins during endosperm development.
- Arabidopsis G-box binding factors GBF1 to GBF4, Parsley CPRF-1 to CPRF-3,
  Tobacco TAF-1 and wheat EMBP-1. All these proteins bind the G-box promoter
  elements of many plant genes.
- Drosophila protein Giant, which represses the expression of both the
  kruppel and knirps segmentation gap genes.
- Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that
  binds to fat body-specific enhancers of alcohol dehydrogenase and yolk
  protein genes.
- Drosophila segmentation protein cap'n'collar (gene cnc), which is involved
  in head morphogenesis.
- Caenorhabditis elegans skn-1, a developmental protein involved in the fate
  of ventral blastomeres in the early embryo.
- Yeast GCN4 transcription factor, a component of the general control system
  that regulates the expression of amino acid-synthesizing enzymes in
  response to amino acid starvation, and the related Neurospora crassa cpc-1
  protein.
- Neurospora crassa cys-3, which turns on the expression of structural genes
  which encode sulfur-catabolic enzymes.
- Yeast MET28, a transcriptional activator of sulfur amino acid metabolism.
- Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some
  oxygen detoxification enzymes.
- Epstein-Barr virus trans-activator protein BZLF1.
The pattern we developed is directed against the basic region. We also
developed a profile that covers the whole domain.
-Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-{VL}-x-[SAQ](2)-{L}-[RKTAENQ]-x-R-{S}-[RK]
-Sequences known to belong to this class detected by the pattern: the large
 majority.
-Other sequence(s) detected in Swiss-Prot: 18.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: April 2006 / Pattern revised.
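The bZIP basic-region pattern uses most of the PROSITE pattern notation: 'x'
for any residue, square brackets for allowed residues, curly braces for
forbidden residues, and counts or ranges in parentheses. The following sketch
shows one possible translation of that notation into a Python regular
expression. The function name `prosite_to_regex` and the test peptides are
hypothetical, and the pattern is written with every '-' separator explicit,
which is an assumption about the intended reading of the consensus line.

```python
import re

ELEMENT = r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"

def prosite_to_regex(pattern):
    """Translate a '-'-separated PROSITE pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(ELEMENT, element)
        if m is None:
            raise ValueError(f"unrecognised element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        if low is not None:                  # (n) or (m,n) repeat
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

BZIP_BASIC = ("[KR]-x(1,3)-[RKSAQ]-N-{VL}-x-[SAQ](2)-{L}-"
              "[RKTAENQ]-x-R-{S}-[RK]")
regex = prosite_to_regex(BZIP_BASIC)
print(regex)

# Synthetic peptides, made up for illustration only.
print(bool(re.fullmatch(regex, "KARNAAAAARARAK")))  # fits the pattern
print(bool(re.fullmatch(regex, "KARNVAAAARARAK")))  # V is excluded by {VL}
```

A pattern of this kind is permissive by design, which is consistent with the
statistics above reporting additional Swiss-Prot hits for the pattern but none
for the profile.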
[ 1] Hurst H.C.
Protein Prof. 2:105-168(1995).
[ 2] Ellenberger T.
Curr. Opin. Struct. Biol. 4:12-21(1994).
[ 3] Baranger A.M.
"Accessory factor-bZIP-DNA interactions."
Curr. Opin. Chem. Biol. 2:18-23(1998).
PubMed=9667910
[ 4] Ferre-D'amare A.R., Prendergast G.C., Ziff E.B., Burley S.K.
Nature 363:38-45(1993).
[ 5] Ellenberger T.E., Brandl C.J., Struhl K., Harrison S.C.
"The GCN4 basic region leucine zipper binds DNA as a dimer of
uninterrupted alpha helices: crystal structure of the protein-DNA
complex."
Cell 71:1223-1237(1992).
PubMed=1473154
{END}
{PDOC00037}
{PS50090; MYB_LIKE}
{PS51294; HTH_MYB}
{BEGIN}
********************************************
* Myb-type HTH DNA-binding domain profiles *
********************************************
The myb family can be classified into three groups: the myb-type HTH domain,
which binds DNA, the SANT domain, which is a protein-protein interaction
module (see <PDOC51293>) and the myb-like domain that can be involved in
either of these functions.

The myb-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of ~55
amino acids, typically occurring in a tandem repeat in eukaryotic
transcription factors. The domain is named after the retroviral oncogene
v-myb, and its cellular counterpart c-myb, which encode nuclear DNA-binding
proteins that specifically recognize the sequence YAAC(G/T)G [1,2]. Myb
proteins contain three tandem repeats of 51 to 53 amino acids, termed R1, R2
and R3. This repeat region is involved in DNA-binding and R2 and R3 bind
directly to the DNA major groove. The major part of the first repeat is
missing in retroviral v-Myb sequences and in plant myb-related (R2R3) proteins
[3]. A single myb-type HTH DNA-binding domain occurs in TRF1 and TRF2.
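The recognition sequence YAAC(G/T)G quoted above can be searched for in a DNA
sequence once the ambiguity symbols are expanded (Y stands for C or T, and
(G/T) for G or T). The sketch below is a minimal illustration that scans both
strands. The function names and the demo sequence are hypothetical, and a
sequence match alone says nothing about actual binding.

```python
import re

# YAAC(G/T)G with Y = C or T; the lookahead keeps overlapping hits.
MYB_SITE = re.compile(r"(?=([CT]AAC[GT]G))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def find_myb_sites(dna):
    """Return (strand, 1-based start on the forward strand, site) tuples."""
    dna = dna.upper()
    hits = [("+", m.start() + 1, m.group(1)) for m in MYB_SITE.finditer(dna)]
    for m in MYB_SITE.finditer(reverse_complement(dna)):
        # Map the reverse-strand hit back to forward-strand coordinates.
        start = len(dna) - (m.start() + 6) + 1
        hits.append(("-", start, m.group(1)))
    return hits

demo = "GGTAACGGTTCCGTTGAA"  # synthetic sequence, for illustration only
sites = find_myb_sites(demo)
print(sites)
```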
The 3D-structure of the myb-type HTH domain forms three alpha-helices (see
<PDB:1H88; C>) [4]. The second and third helices connected via a turn comprise
the helix-turn-helix motif. Helix 3 is termed the recognition helix as it
binds the DNA major groove, as in other HTHs.

Some proteins known to contain a myb-type HTH domain:

- Fruit fly myb protein [2].
- Vertebrate myb-like proteins A-myb and B-myb.
- Maize anthocyanin regulatory C1 protein, a trans-acting factor which
  controls the expression of genes involved in anthocyanin biosynthesis.
- Maize P protein [5], a trans-acting factor which regulates the biosynthetic
  pathway of a flavonoid-derived pigment in certain floral tissues.
- Arabidopsis thaliana protein GL1/GLABROUS1 [6], required for the initiation
  of differentiation of leaf hair cells (trichomes).
- Maize and barley myb-related proteins Zm1, Zm38 and Hv1, Hv33 [7].
- Yeast BAS1 [8], a transcriptional activator for the HIS4 gene.
- Yeast REB1 [9], which recognizes sites within both the enhancer and the
  promoter of rRNA transcription, as well as upstream of many genes
  transcribed by RNA polymerase II.
- Fission yeast cdc5, a possible transcription factor whose activity is
  required for cell cycle progression and growth during G2.
- Fission yeast myb1, which regulates telomere length and function.
- Baker's yeast pre-mRNA-splicing factor CEF1.
- Vertebrate telomeric repeat-binding factors 1 and 2 (TRF1/2), which bind to
  telomeric DNA and are involved in telomere length regulation.

We have developed a profile, which has been manually adapted to specifically
detect the DNA-binding myb-type HTH domain. A second general profile was
developed for detection of the myb-like domain with a high sensitivity. A
third profile was developed for the SANT domain (see <PDOC51293>).
-Sequences known to belong to this class detected by the first profile: ALL.
-Other sequence(s) detected in Swiss-Prot: 2.
-Sequences known to belong to this class detected by the second profile: ALL,
 except 25.
-Other sequence(s) detected in Swiss-Prot: 2.
-Note: The profiles are in competition with one another and with the profile
 of the SANT domain (see <PDOC51293>).
-Last update: February 2007 / Profile and text revised; profile added;
 patterns removed.
[ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H.
"Viral myb oncogene encodes a sequence-specific DNA-binding
activity."
Nature 335:835-837(1988).
PubMed=3185713; DOI=10.1038/335835a0
[ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H.
"Drosophila and vertebrate myb proteins share two conserved regions,
one of which functions as a DNA-binding domain."
EMBO J. 6:3085-3090(1987).
PubMed=3121304
[ 3] Stracke R., Werber M., Weisshaar B.
"The R2R3-MYB gene family in Arabidopsis thaliana."
Curr. Opin. Plant. Biol. 4:447-456(2001).
PubMed=11597504
[ 4] Tahirov T.H., Sato K., Ichikawa-Iwata E., Sasaki M., Inoue-Bungo T.,
Shiina M., Kimura K., Takata S., Fujikawa A., Morii H., Kumasaka T.,
Yamamoto M., Ishii S., Ogata K.
"Mechanism of c-Myb-C/EBP beta cooperation from separated sites on a
promoter."
Cell 108:57-70(2002).
PubMed=11792321
[ 5] Grotewold E., Athma P., Peterson T.
"Alternatively spliced products of the maize P gene encode proteins
with homology to the DNA-binding domain of myb-like transcription
factors."
Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).
PubMed=2052542
[ 6] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D.
"A myb gene required for leaf trichome differentiation in Arabidopsis
is expressed in stipules."
Cell 67:483-493(1991).
PubMed=1934056
[ 7] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H.,
Salamini F., Rohde W.
"Multiple genes are transcribed in Hordeum vulgare and Zea mays that
carry the DNA binding domain of the myb oncoproteins."
Mol. Gen. Genet. 216:183-187(1989).
PubMed=2664447
[ 8] Tice-Baldwin K., Fink G.R., Arndt K.T.
"BAS1 has a Myb motif and activates HIS4 transcription only in
combination with BAS2."
Science 246:931-935(1989).
PubMed=2683089
[ 9] Ju Q.D., Morrow B.E., Warner J.R.
"REB1, a yeast DNA-binding protein with many targets, is essential for
growth and bears some resemblance to the oncogene myb."
Mol. Cell. Biol. 10:5226-5234(1990).
PubMed=2204808
{END}
{PDOC00038}
{PS50888; HLH}
{BEGIN}
***********************************************
* Myc-type, 'helix-loop-helix' domain profile *
***********************************************
A number of eukaryotic proteins, which probably are sequence-specific
DNA-binding proteins that act as transcription factors, share a conserved
domain of 40 to 50 amino acid residues. It has been proposed [1] that this
domain is formed of two amphipathic helices joined by a variable length linker
region that could form a loop. This 'helix-loop-helix' (HLH) domain mediates
protein dimerization and has been found in the proteins listed below [2,3].
Most of these proteins have an extra basic region of about 15 amino acid
residues that is adjacent to the HLH domain and specifically binds to DNA.
They are referred to as basic helix-loop-helix proteins (bHLH), and are
classified in two groups: class A (ubiquitous) and class B (tissue-specific).
Members of the bHLH family bind variations on the core sequence 'CANNTG', also
referred to as the E-box motif. The homo- or heterodimerization mediated by
the HLH domain is independent of, but necessary for DNA binding, as two basic
regions are required for DNA binding activity. The HLH proteins lacking the
basic domain (Emc, Id) function as negative regulators since they form
heterodimers, but fail to bind DNA. The hairy-related proteins (hairy, E(spl),
deadpan) also repress transcription although they can bind DNA. The proteins
of this subfamily act together with co-repressor proteins, like groucho,
through their C-terminal motif WRPW.
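To make the 'variations on the core sequence CANNTG' concrete, the short
sketch below expands the two N positions into all possible hexamers and picks
out those that are identical to their own reverse complement. It is only an
illustration of the notation: the variable and function names are
hypothetical, and no claim is made about which variant a given bHLH protein
prefers.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

# Expand the two N positions of CANNTG into all 16 concrete hexamers.
E_BOXES = ["CA" + a + b + "TG" for a, b in product("ACGT", repeat=2)]

# A site is palindromic when it equals its own reverse complement,
# i.e. both DNA strands read the same in the 5' to 3' direction.
PALINDROMIC = [site for site in E_BOXES if site == reverse_complement(site)]

print(len(E_BOXES))
print(PALINDROMIC)
```

Four of the sixteen variants are palindromic, namely those whose central
dinucleotide is AT, CG, GC or TA.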
- The myc family of cellular oncogenes [4], which is currently known to
  contain four members: c-myc, N-myc, L-myc, and B-myc. The myc genes are
  thought to play a role in cellular differentiation and proliferation.
- Proteins involved in myogenesis (the induction of muscle cells). In mammals
  MyoD1 (Myf-3), myogenin (Myf-4), Myf-5, and Myf-6 (Mrf4 or herculin), in
  birds CMD1 (QMF-1), in Xenopus MyoD and MF25, in Caenorhabditis elegans
  CeMyoD, and in Drosophila nautilus (nau).
- Vertebrate proteins that bind specific DNA sequences ('E boxes') in various
  immunoglobulin chains enhancers: E2A or ITF-1 (E12/pan-2 and E47/pan-1),
  ITF-2 (tcf4), TFE3, and TFEB.
- Vertebrate neurogenic differentiation factor 1 that acts as differentiation
  factor during neurogenesis.
- Vertebrate MAX protein, a transcription regulator that forms a
  sequence-specific DNA-binding protein complex with myc or mad.
- Vertebrate Max Interacting Protein 1 (MXI1 protein) which acts as a
  transcriptional repressor and may antagonize myc transcriptional activity
  by competing for max.
- Proteins of the bHLH/PAS superfamily which are transcriptional activators.
  In mammals, AH receptor nuclear translocator (ARNT), single-minded homologs
  (SIM1 and SIM2), hypoxia-inducible factor 1 alpha (HIF1A), AH receptor
  (AHR), neuronal pas domain proteins (NPAS1 and NPAS2), endothelial pas
  domain protein 1 (EPAS1), mouse ARNT2, and human BMAL1. In Drosophila,
  single-minded (SIM), AH receptor nuclear translocator (ARNT), trachealess
  protein (TRH), and similar protein (SIMA).
- Mammalian transcription factors HES, which repress transcription by acting
  on two types of DNA sequences, the E box and the N box.
- Mammalian MAD protein (max dimerizer) which acts as transcriptional
  repressor and may antagonize myc transcriptional activity by competing for
  max.
- Mammalian Upstream Stimulatory Factor 1 and 2 (USF1 and USF2), which bind
  to a symmetrical DNA sequence that is found in a variety of viral and
  cellular promoters.
- Human lyl-1 protein, which is involved, by chromosomal translocation, in
  T-cell leukemia.
- Human transcription factor AP-4.
- Mouse helix-loop-helix proteins MATH-1 and MATH-2 which activate E
  box-dependent transcription in collaboration with E47.
- Mammalian stem cell protein (SCL) (also known as tal1), a protein which may
  play an important role in hemopoietic differentiation. SCL is involved, by
  chromosomal translocation, in stem-cell leukemia.
- Mammalian proteins Id1 to Id4 [5]. Id (inhibitor of DNA binding) proteins
  lack a basic DNA-binding domain but are able to form heterodimers with
  other HLH proteins, thereby inhibiting binding to DNA.
- Drosophila extra-macrochaetae (emc) protein, which participates in sensory
  organ patterning by antagonizing the neurogenic activity of the
  achaete-scute complex. Emc is the homolog of mammalian Id proteins.
- Human Sterol Regulatory Element Binding Protein 1 (SREBP1), a
  transcriptional activator that binds to the sterol regulatory element 1
  (SRE-1) found in the flanking region of the LDLR gene and in other genes.
- Drosophila achaete-scute (AS-C) complex proteins T3 (l'sc), T4 (scute),
  T5 (achaete) and T8 (asense). The AS-C proteins are involved in the
  determination of the neuronal precursors in the peripheral nervous system
  and the central nervous system.
- Mammalian homologs of achaete-scute proteins, the MASH-1 and MASH-2
  proteins.
- Drosophila atonal protein (ato) which is involved in neurogenesis.
- Drosophila daughterless (da) protein, which is essential for neurogenesis
  and sex-determination.
- Drosophila deadpan (dpn), a hairy-like protein involved in the functional
  differentiation of neurons.
- Drosophila delilah (dei) protein, which plays an important role in the
  differentiation of epidermal cells into muscle.
- Drosophila hairy (h) protein, a transcriptional repressor which regulates
  the embryonic segmentation and adult bristle patterning.
- Drosophila enhancer of split proteins E(spl), which are hairy-like proteins
  active during neurogenesis and also act as transcriptional repressors.
- Drosophila twist (twi) protein, which is involved in the establishment of
  germ layers in embryos.
- Maize anthocyanin regulatory proteins R-S and LC.
- Yeast centromere-binding protein 1 (CPF1 or CBF1). This protein is involved
  in chromosomal segregation. It binds to a highly conserved DNA sequence,
  found in centromeres and in several promoters.
- Yeast INO2 and INO4 proteins.
- Yeast phosphate system positive regulatory protein PHO4 which interacts
  with the upstream activating sequence of several acid phosphatase genes.
- Yeast serine-rich protein TYE7 that is required for ty-mediated ADH2
  expression.
- Neurospora crassa nuc-1, a protein that activates the transcription of
  structural genes for phosphorus acquisition.
- Fission yeast protein esc1 which is involved in the sexual differentiation
  process.

The schematic representation of the helix-loop-helix domain is shown here:

xxxxxxxxxxxxxxxxxxxxxxxx--------------------xxxxxxxxxxxxxxxxxxxxxxx
  Amphipathic helix 1           Loop          Amphipathic helix 2

The profile we developed covers the helix-loop-helix dimerization domain and
the basic region.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: August 2003 / Pattern removed.
[ 1] Murre C., McCaw P.S., Baltimore D.
"A new DNA binding and dimerization motif in immunoglobulin enhancer
binding, daughterless, MyoD, and myc proteins."
Cell 56:777-783(1989).
PubMed=2493990
[ 2] Garrel J., Campuzano S.
BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V.
"Function of the c-Myc oncoprotein."
FASEB J. 6:3065-3072(1992).
PubMed=1521738
[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H.
"CeMyoD accumulation defines the body wall muscle cell fate during C.
elegans embryogenesis."
Cell 63:907-919(1990).
PubMed=2175254
[ 5] Riechmann V., van Cruechten I., Sablitzky F.
"The expression pattern of Id4, a novel dominant negative
helix-loop-helix protein, is distinct from Id1, Id2 and Id3."
Nucleic Acids Res. 22:749-755(1994).
PubMed=8139914
{END}
{PDOC00039}
{PS00039; DEAD_ATP_HELICASE}
{PS00690; DEAH_ATP_HELICASE}
{BEGIN}
*****************************************************************
* DEAD and DEAH box families ATP-dependent helicases signatures *
*****************************************************************
A number of eukaryotic and prokaryotic proteins have been characterized
[1,2,
3] on the basis of their structural similarity. They all seem to be
involved
in ATP-dependent, nucleic-acid unwinding. Proteins currently known to
belong
to this family are:
- Initiation factor eIF-4A. Found in eukaryotes, this protein is a
subunit of
a high molecular weight complex involved in 5'cap recognition
and the
binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.
- PRP5 and PRP28. These yeast proteins are involved in various ATPrequiring
steps of the pre-mRNA splicing process.
- Pl10, a mouse protein expressed specifically during spermatogenesis.
- An3, a Xenopus putative RNA helicase, closely related to Pl10.
- SPP81/DED1 and DBP1, two yeast proteins probably involved in
pre-mRNA
splicing and related to Pl10.
- Caenorhabditis elegans helicase glh-1.
- MSS116, a yeast protein required for mitochondrial splicing.
- SPB4, a yeast protein involved in the maturation of 25S ribosomal RNA.
- p68, a human nuclear antigen. p68 has ATPase and DNA-helicase
activities in
vitro. It is involved in cell growth and division.
- Rm62 (p62), a Drosophila putative RNA helicase related to p68.
- DBP2, a yeast protein related to p68.
- DHH1, a yeast protein.
- DRS1, a yeast protein involved in ribosome assembly.
- MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid.
- ROK1, a yeast protein.
- ste13, a fission yeast protein.
- Vasa, a Drosophila protein important for oocyte formation and
  specification of embryonic posterior structures.
- Me31B, a Drosophila maternally expressed protein of unknown function.
- dbpA, an Escherichia coli putative RNA helicase.
- deaD, an Escherichia coli putative RNA helicase which can suppress a
  mutation in the rpsB gene for ribosomal protein S2.
- rhlB, an Escherichia coli putative RNA helicase.
- rhlE, an Escherichia coli putative RNA helicase.
- srmB, an Escherichia coli protein that shows RNA-dependent ATPase
  activity. It probably interacts with 23S ribosomal RNA.
- Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and
  ZK686.2.
- Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w.
- Fission yeast hypothetical protein SpAC31A2.07c.
- Bacillus subtilis hypothetical protein yxiN.
All these proteins share a number of conserved sequence motifs. Some of them
are specific to this family while others are shared by other ATP-binding
proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One
of these motifs, called the 'D-E-A-D-box', represents a special version of
the B motif of ATP-binding proteins.
Some other proteins belong to a subfamily whose members have His instead of
the second Asp and are thus said to be 'D-E-A-H-box' proteins [3,5,6,E1].
Proteins currently known to belong to this subfamily are:
- PRP2, PRP16, PRP22 and PRP43. These yeast proteins are all involved in
  various ATP-requiring steps of the pre-mRNA splicing process.
- Fission yeast prh1, which may be involved in pre-mRNA splicing.
- Male-less (mle), a Drosophila protein required in males for dosage
  compensation of X chromosome linked genes.
- RAD3 from yeast. RAD3 is a DNA helicase involved in excision repair of DNA
  damaged by UV light, bulky adducts or cross-linking agents. Fission yeast
  rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are
  the homologs of RAD3.
- Yeast CHL1 (or CTF1), which is important for chromosome transmission and
  normal cell cycle progression in G(2)/M.
- Yeast TPS1.
- Yeast hypothetical protein YKL078w.
- Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2.
- Poxviruses' early transcription factor 70 Kd subunit which acts with RNA
  polymerase to initiate transcription from early gene promoters.
- I8, a putative vaccinia virus helicase.
- hrpA, an Escherichia coli putative RNA helicase.
We have developed signature patterns for both subfamilies.
-Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
-Sequences known to belong to this class detected by the pattern: ALL,
 except for YHR169w.
-Other sequence(s) detected in Swiss-Prot: 14.
-Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]
-Sequences known to belong to this class detected by the pattern: ALL,
 except for hrpA.
-Other sequence(s) detected in Swiss-Prot: 6.
-Note: Proteins belonging to this family also contain a copy of the
 ATP/GTP-binding motif 'A' (P-loop) (see the relevant entry <PDOC00017>).
-Expert(s) to contact by email:
Linder P.; [email protected]
-Last update: July 1999 / Text revised.
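The two consensus patterns above use the PROSITE pattern notation: residues are separated by '-', [..] lists allowed residues, {..} lists forbidden residues, x stands for any residue, and (n) or (n,m) gives a repeat count. As an illustration only, the following Python sketch translates such a pattern into a regular expression. The function name prosite_to_regex and the two test sequences are hypothetical and invented for this example (they are not Swiss-Prot entries); the '<' and '>' anchors of the full notation are not handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles literal residues, [..] sets, {..} exclusions, x wildcards and
    (n) / (n,m) repeat counts. Terminal anchors ('<', '>') are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element such as "[LIVMF](2)" into its core and repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.groups()
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # forbidden residues
        else:
            token = core                       # literal residue or [..] set
        if lo:
            token += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(token)
    return "".join(parts)

DEAD_PATTERN = "[LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]"
DEAH_PATTERN = "[GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]"

dead_regex = re.compile(prosite_to_regex(DEAD_PATTERN))
deah_regex = re.compile(prosite_to_regex(DEAH_PATTERN))

# Invented toy sequences, used only to exercise the two expressions.
toy_dead = "MKTVLDEADEMLSRG"
toy_deah = "TTGKVIIDEAHERSL"

print(bool(dead_regex.search(toy_dead)), bool(deah_regex.search(toy_deah)))
```

A production scan would normally rely on a dedicated tool such as ScanProsite rather than a hand-written converter; the sketch is only meant to make the notation concrete.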
[ 1] Schmid S.R., Linder P.
"D-E-A-D protein family of putative RNA helicases."
Mol. Microbiol. 6:283-291(1992).
PubMed=1552844
[ 2] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J., Nishi K.,
Schnier J., Slonimski P.P.
"Birth of the D-E-A-D box."
Nature 337:121-122(1989).
PubMed=2563148; DOI=10.1038/337121a0
[ 3] Wassarman D.A., Steitz J.A.
"RNA splicing. Alive with DEAD proteins."
Nature 349:463-464(1991).
PubMed=1825133; DOI=10.1038/349463a0
[ 4] Hodgman T.C.
"A new superfamily of replicative proteins."
Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
PubMed=3362205; DOI=10.1038/333022b0
[ 5] Harosh I., Deschavanne P.
"The RAD3 gene is a member of the DEAH family RNA helicase-like protein."
Nucleic Acids Res. 19:6331-6331(1991).
PubMed=1956796
[ 6] Koonin E.V., Senkevich T.G.
"Vaccinia virus encodes four putative DNA and/or RNA helicases
distantly related to each other."
J. Gen. Virol. 73:989-993(1992).
PubMed=1321883
[E1] http://medweb2.unige.ch/~linder/RNA_helicases.html
{END}
{PDOC00040}
{PS00041; HTH_ARAC_FAMILY_1}
{PS01124; HTH_ARAC_FAMILY_2}
{BEGIN}
********************************************************************
* Bacterial regulatory proteins, araC family signature and profile *
********************************************************************
The many bacterial transcription regulation proteins which bind DNA through
a 'helix-turn-helix' motif can be classified into subfamilies on the basis
of sequence similarities. One of these subfamilies groups together the
following proteins [1,2,3]:
- aarP, a transcriptional activator of the 2'-N-acetyltransferase gene in
  Providencia stuartii.
- ada, an Escherichia coli and Salmonella typhimurium bifunctional protein
  that repairs alkylated guanine in DNA by transferring the alkyl group at
  the O(6) position to a cysteine residue in the enzyme. The methylated
  protein acts as a positive regulator of its own synthesis and of the alkA,
  alkB and aidB genes.
- adaA, a Bacillus subtilis bifunctional protein that acts both as a
  transcriptional activator of the ada operon and as a
  methylphosphotriester-DNA alkyltransferase.
- adiY, an Escherichia coli protein of unknown function.
- aggR, the transcriptional activator of aggregative adherence fimbria I
  expression in enteroaggregative Escherichia coli.
- appY, a protein which acts as a transcriptional activator of acid
  phosphatase and other proteins during the deceleration phase of growth and
  acts as a repressor for other proteins that are synthesized in exponential
  growth or in the stationary phase.
- araC, the arabinose operon regulatory protein, which activates the
  transcription of the araBAD genes.
- cafR, the Yersinia pestis F1 operon positive regulatory protein.
- celD, the Escherichia coli cel operon repressor.
- cfaD, a protein which is required for the expression of the CFA/I adhesin
  of enterotoxigenic Escherichia coli.
- csvR, a transcriptional activator of fimbrial genes in enterotoxigenic
  Escherichia coli.
- envY, the porin thermoregulatory protein, which is involved in the control
  of the temperature-dependent expression of several Escherichia coli
  envelope proteins such as ompF, ompC, and lamB.
- exsA, an activator of exoenzyme S synthesis in Pseudomonas aeruginosa.
- fapR, the positive activator for the expression of the 987P operon coding
  for the fimbrial protein in enterotoxigenic Escherichia coli.
- hrpB, a positive regulator of pathogenicity genes in Burkholderia
  solanacearum.
- invF, the Salmonella typhimurium invasion operon regulator.
- marA, which may be a transcriptional activator of genes involved in the
  multiple antibiotic resistance (mar) phenotype.
- melR, the melibiose operon regulatory protein, which activates the
  transcription of the melAB genes.
- mixE, a Shigella flexneri protein necessary for secretion of ipa invasins.
- mmsR, the transcriptional activator for the mmsAB operon in Pseudomonas
  aeruginosa.
- msmR, the multiple sugar metabolism operon transcriptional activator in
  Streptococcus mutans.
- pchR, a Pseudomonas aeruginosa activator for pyochelin and ferripyochelin
  receptor.
- perA, a transcriptional activator of the eaeA gene for intimin in
  enteropathogenic Escherichia coli.
- pocR, a Salmonella typhimurium regulator of the cobalamin biosynthesis
  operon.
- pqrA, from Proteus vulgaris.
- rafR, the regulator of the raffinose operon in Pediococcus pentosaceus.
- ramA, from Klebsiella pneumoniae.
- rhaR, the Escherichia coli and Salmonella typhimurium L-rhamnose operon
  transcriptional activator.
- rhaS, an Escherichia coli and Salmonella typhimurium positive activator of
  genes required for rhamnose utilization.
- rns, a protein which is required for the expression of the cs1 and cs2
  adhesins of enterotoxigenic Escherichia coli.
- rob, a protein which binds to the right arm of the replication origin oriC
  of the Escherichia coli chromosome.
- soxS, a protein that, with the soxR protein, controls a superoxide
  response regulon in Escherichia coli.
- tetD, a protein from transposon Tn10.
- tcpN or toxT, the Vibrio cholerae transcriptional activator of the tcp
  operon involved in pilus biosynthesis and transport.
- thcR, a probable regulator of the thc operon for the degradation of the
  thiocarbamate herbicide EPTC in Rhodococcus sp. strain NI86/21.
- ureR, the transcriptional activator of the plasmid-encoded urease operon
  in Enterobacteriaceae.
- virF and lcrF, the Yersinia virulence regulon transcriptional activator.
- virF, the Shigella transcriptional factor of invasion related antigens
  ipaBCD.
- xylR, the Escherichia coli xylose operon regulator.
- xylS, the transcriptional activator of the Pseudomonas putida TOL plasmid
  (pWWO, pWW53 and pDK1) meta operon (xylDLEGF genes).
- yfeG, an Escherichia coli hypothetical protein.
- yhiW, an Escherichia coli hypothetical protein.
- yhiX, an Escherichia coli hypothetical protein.
- yidL, an Escherichia coli hypothetical protein.
- yijO, an Escherichia coli hypothetical protein.
- yuxC, a Bacillus subtilis hypothetical protein.
- yzbC, a Bacillus subtilis hypothetical protein.
Except for celD, all of these proteins seem to be positive transcriptional
factors. Their sizes range from 107 (soxS) to 529 (yzbC) residues.
The helix-turn-helix motif is located in the third quarter of most of the
sequences; the N-terminal and central regions of these proteins are presumed
to interact with effector molecules and may be involved in dimerization. The
minimal DNA binding domain, which spans roughly 100 residues and comprises
the HTH motif, contains another region with similarity to a classical HTH
domain. However, it contains an insertion of one residue in the turn-region.
A signature pattern was derived from the region that follows the first HTH
domain and that includes the totality of the putative second HTH domain. A
more sensitive detection of members of the araC family is available through
the use of a profile which spans the minimal DNA-binding region of 100
residues.
-Consensus pattern: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-x-{PLH}-[LIVMSTA]-[GSTACIL]-{GPK}-{F}-x-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-[FYIVA]-{FYWHCM}-{PGVI}-x(2)-[GSADENQKR]-x-[NSTAPKL]-[PARL]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 50.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Expert(s) to contact by email:
Ramos J.L.; [email protected]
Gallegos M.-T.; [email protected]
-Last update: April 2006 / Pattern revised.
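Because the araC consensus pattern contains two variable-length gaps, x(4,9) and x(4,5), the segment it matches does not have a fixed length. As a rough illustration, the Python sketch below sums the per-element repeat counts to get the shortest and longest possible match. The function name pattern_length_bounds is hypothetical, and the pattern string assumes the '-' separators that the PROSITE notation places between all elements.

```python
import re

def pattern_length_bounds(pattern: str) -> tuple:
    """Return (min_len, max_len) of the segment a PROSITE pattern can match."""
    lo_total = hi_total = 0
    for element in pattern.rstrip(".").split("-"):
        # A trailing (n) or (n,m) is a repeat count; otherwise the element
        # covers exactly one residue.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        else:
            lo = hi = 1
        lo_total += lo
        hi_total += hi
    return lo_total, hi_total

ARAC_PATTERN = (
    "[KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-"
    "[LIVMF]-x-{PLH}-[LIVMSTA]-[GSTACIL]-{GPK}-{F}-x-[GANQRF]-"
    "[LIVMFY]-x(4,5)-[LFY]-x(3)-[FYIVA]-{FYWHCM}-{PGVI}-x(2)-"
    "[GSADENQKR]-x-[NSTAPKL]-[PARL]"
)

print(pattern_length_bounds(ARAC_PATTERN))
```

The resulting bounds are well below the roughly 100 residues of the minimal DNA-binding region, which is consistent with the pattern covering only part of that region while the profile spans all of it.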
[ 1] Gallegos M.-T., Michan C., Ramos J.L.
"The XylS/AraC family of regulators."
Nucleic Acids Res. 21:807-810(1993).
PubMed=8451183
[ 2] Henikoff S., Wallace J.C., Brown J.P.
"Finding protein similarities with nucleotide sequence databases."
Methods Enzymol. 183:111-132(1990).
PubMed=2314271
[ 3] Gallegos M.T., Schleif R., Bairoch A., Hofmann K., Ramos J.L.
"Arac/XylS family of transcriptional regulators."
Microbiol. Mol. Biol. Rev. 61:393-410(1997).
PubMed=9409145
{END}
{PDOC00041}
{PS00042; HTH_CRP_1}
{PS51063; HTH_CRP_2}
{BEGIN}
*********************************************
* Crp-type HTH domain signature and profile *
*********************************************
The crp-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH)
domain of about 70-75 amino acids present in transcription regulators of the
crp-fnr family, involved in the control of virulence factors, enzymes of
aromatic ring degradation, nitrogen fixation, photosynthesis, and various
types of respiration. The crp-fnr family is named after the first members
identified in E. coli: the well characterized cyclic AMP receptor protein
CRP or CAP (catabolite activator protein) and the fumarate and nitrate
reductase regulator Fnr. crp-type HTH domain proteins occur in most bacteria
and in chloroplasts of red algae. The DNA-binding HTH domain is located in
the C-terminal part; the N-terminal part of the proteins of the crp-fnr
family contains a nucleotide-binding domain (see <PDOC00691>) and a
dimerization/linker helix occurs in between. The crp-fnr regulators
predominantly act as transcription activators, but can also be important
repressors, and respond to diverse intracellular and exogenous signals, such
as cAMP, anoxia, redox state, oxidative and nitrosative stress, carbon
monoxide, nitric oxide or temperature [1,2].
The structure of the crp-type DNA-binding domain (see <PDB:1LB2>) shows that
the helices (H) forming the helix-turn-helix motif (H2-H3) are flanked by
two beta-hairpin (B) wings, in the topology H1-B1-B2-H2-H3-B3-B4. Helix 3 is
termed the recognition helix, as in most wHTHs it binds the DNA major groove
[3,4,5].
Some proteins known to contain a Crp-type HTH domain:
- Escherichia coli crp (also known as cAMP receptor), a protein that
  complexes with cAMP and regulates the transcription of several
  catabolite-sensitive operons.
- Escherichia coli fnr, a protein that activates genes for proteins involved
  in a variety of anaerobic electron transport systems.
- Rhizobium leguminosarum fnrN, a transcription regulator of nitrogen
  fixation.
- Rhodobacter sphaeroides fnrL, a transcription activator of genes for heme
  biosynthesis, bacteriochlorophyll synthesis and the light-harvesting
  complex LHII.
- Rhizobiaceae fixK, a protein that regulates nitrogen fixation genes, both
  positively and negatively.
- Lactobacillus casei fnr-like protein flp, a putative regulatory protein
  linked to the trpDCFBA operon.
- Cyanobacteria ntcA, a regulator of the expression of genes subject to
  nitrogen control.
- Xanthomonas campestris clp, a protein involved in the regulation of
  phytopathogenicity. Clp controls the production of extracellular enzymes,
  xanthan gum and pigment, either positively or negatively.
The 'helix-turn-helix' DNA-binding motif of these proteins is located in the
C-terminal part of the sequence. The pattern we use to detect these proteins
starts two residues before the HTH motif and ends two residues before the
end of helix 3. We also developed a profile that covers the entire wHTH,
including helix 1 and strand 4, and which allows a more sensitive detection.
-Consensus pattern: [LIVM]-[STAG]-[RHNWM]-x(2)-[LIM]-[GA]-x-[LIVMFYAS]-[LIVSC]-[GA]-x-[STACN]-x(2)-[MST]-x(1,2)-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 1.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: April 2006 / Pattern revised.
[ 1] Irvine A.S., Guest J.R.
"Lactobacillus casei contains a member of the CRP-FNR family."
Nucleic Acids Res. 21:753-753(1993).
PubMed=8441692
[ 2] Koerner H., Sofia H.J., Zumft W.G.
FEMS Microbiol. Rev. 27:559-592(2003).
[ 3] Busby S., Ebright R.H.
"Transcription activation by catabolite activator protein (CAP)."
J. Mol. Biol. 293:199-213(1999).
PubMed=10550204; DOI=10.1006/jmbi.1999.3161
[ 4] Lanzilotta W.N., Schuller D.J., Thorsteinsson M.V., Kerby R.L.,
Roberts G.P., Poulos T.L.
"Structure of the CO sensing transcription activator CooA."
Nat. Struct. Biol. 7:876-880(2000).
PubMed=11017196; DOI=10.1038/82820
[ 5] Huffman J.L., Brennan R.G.
"Prokaryotic transcription regulators: more than just the
helix-turn-helix motif."
Curr. Opin. Struct. Biol. 12:98-106(2002).
PubMed=11839496
{END}
{PDOC00042}
{PS50949; HTH_GNTR}
{BEGIN}
********************************
* GntR-type HTH domain profile *
********************************
The gntR-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH)
domain of about 60-70 residues present in transcriptional regulators of the
gntR family. This family of bacterial regulators is named after Bacillus
subtilis gntR, a repressor of the gluconate operon [1,2]. Six subfamilies
have been described for the gntR family: fadR, hutC, plmA, mocR, ytrA, and
araR, which regulate various biological processes and important bacterial
metabolic pathways.
The DNA-binding gntR-type HTH domain occurs usually in the N-terminal part.
The C-terminal part can contain a subfamily-specific effector-binding domain
and/or an oligomerization domain. The fadR-like regulators, representing the
largest subfamily, are involved in the regulation of oxidized substrates
related to metabolic pathways or metabolism of amino acids. HutC-like
proteins are involved in conjugative plasmid transfer in several
Streptomyces species. PlmA is a cyanobacterial regulator of plasmid
maintenance. The mocR subfamily encompasses proteins homologous to class I
aminotransferase proteins, which bind pyridoxal phosphate as a cofactor.
Most of the ytrA-like proteins take part in operons involved in ATP-binding
cassette (ABC) transport systems. AraR is an autoregulatory protein with a
C-terminal domain that binds a carbohydrate effector, similar to that
present in regulators of the lacI/galR family (see <PDOC00366>) [3,4].
The crystal structures of fadR show that the N-terminal, DNA binding domain
contains a small beta-sheet (B) core and three alpha-helices (H) with a
topology H1-B1-H2-H3-B2-B3 (see <PDB:1H9T>). Helices 2 and 3, connected via
a tight turn, comprise the helix-turn-helix motif. The antiparallel
beta-strands 2 and 3 together with B1 form a small beta-sheet, which is
called the wing. Helix 3 is termed the recognition helix as in most wHTHs it
binds the DNA major groove. Here, only the N-terminal tip of the recognition
helix makes specific DNA-contacts and the wing makes unusual
sequence-specific contacts to the minor groove. Like other HTH proteins,
most gntR-type regulators bind as homodimers to 2-fold symmetric DNA
sequences in which each monomer recognizes half of the site [5,6].
Some proteins known to contain a gntR-type HTH domain:
- Bacillus subtilis gntR, a repressor of the gnt operon, which is
  responsible for gluconate metabolism. In the absence of gluconate, gntR
  binds to the promoter of the operon. The expression of the operon is
  induced in the presence of gluconate.
- Escherichia coli fadR, a transcriptional regulator of fatty acid
  metabolism. In the absence of the acyl-CoA effector, fadR binds specific
  operator sites, represses the expression of genes involved in fatty acid
  degradation and import, and activates biosynthetic genes. Binding of
  acyl-CoA induces conformational changes that abolish DNA binding, which
  derepresses the catabolic genes and deactivates the anabolic genes.
- Escherichia coli phdR, a transcriptional repressor of the pyruvate
  dehydrogenase complex.
- Klebsiella aerogenes and Pseudomonas putida hutC, a transcriptional
  repressor of the histidine utilization (hut) operon.
- Streptomyces lividans korA, a regulator that controls plasmid transfer.
- Rhizobium meliloti mocR, a probable regulator of rhizopine catabolism.
- Bacillus subtilis ytrA, a repressor of the acetoin utilization gene
  cluster.
- Anabaena sp. strain PCC 7120 plmA, a regulator involved in plasmid
  maintenance [4].
- Bacillus subtilis araR, a transcriptional repressor of the arabinose
  operon.
The profile we developed covers the entire gntR-type HTH domain, from the
well-conserved part of helix 1 to the end of the wing.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Expert(s) to contact by email:
Rigali S.; [email protected]
-Last update: February 2004 / Text revised.
[ 1] Buck D., Guest J.R.
"Overexpression and site-directed mutagenesis of the succinyl-CoA
synthetase of Escherichia coli and nucleotide sequence of a gene (g30)
that is adjacent to the suc operon."
Biochem. J. 260:737-747(1989).
PubMed=2548486
[ 2] Haydon D.J., Guest J.R.
"A new family of bacterial regulatory proteins."
FEMS Microbiol. Lett. 63:291-295(1991).
PubMed=2060763
[ 3] Rigali S., Derouaux A., Giannotta F., Dusart J.
"Subdivision of the helix-turn-helix GntR family of bacterial
regulators in the FadR, HutC, MocR, and YtrA subfamilies."
J. Biol. Chem. 277:12507-12515(2002).
PubMed=11756427; DOI=10.1074/jbc.M110968200
[ 4] Lee M.H., Scherer M., Rigali S., Golden J.W.
"PlmA, a new member of the GntR family, has plasmid maintenance
functions in Anabaena sp. strain PCC 7120."
J. Bacteriol. 185:4315-4325(2003).
PubMed=12867439
[ 5] Van Aalten D.M.F., DiRusso C.C., Knudsen J.
EMBO J. 20:2041-2050(2001).
[ 6] Xu Y., Heath R.J., Li Z., Rock C.O., White S.W.
"The FadR.DNA complex. Transcriptional control of fatty acid
metabolism in Escherichia coli."
J. Biol. Chem. 276:17373-17379(2001).
PubMed=11279025; DOI=10.1074/jbc.M100195200
{END}
{PDOC00043}
{PS50931; HTH_LYSR}
{BEGIN}
********************************
* LysR-type HTH domain profile *
********************************
The lysR-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH)
domain of about 60 residues present in lysR-type transcriptional regulators
(LTTR), one of the most common regulator families in prokaryotes. The family
is named after the Escherichia coli regulator lysR [1]. LysR proteins are
present in diverse bacterial genera, archaea and algal chloroplasts. All
LTTRs contain the DNA-binding lysR-type HTH domain, usually in the
N-terminal part. Most LTTRs require a small compound that acts as
co-inducer. The C-terminal part of lysR proteins can contain a regulatory
domain with two subdomains involved in (1) co-inducer recognition/response
and (2) DNA binding and response. LTTRs activate the transcription of
operons and regulons involved in very diverse functions, such as amino acid
biosynthesis, CO2 fixation, antibiotic resistance, regulation of virulence
factors, nodulation for nitrogen fixing bacteria, oxidative stress response
or aromatic compounds catabolism.
Most LTTRs act as a transcriptional activator of the target genes and also
as a repressor of their own expression. Typical LTTRs bind to a sequence of
about 50-60 bp, which contains two distinct sites, (1) a recognition-binding
site (RBS) centered near -65 of the target transcription start site and with
an inverted repeat motif including the T-N(11)-A motif and (2) an
activation-binding site (ABS) which overlaps the -35 region of the
transcription start site of the regulated gene. LysR proteins are mainly
cytoplasmic, but some seem membrane-bound [2].
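The T-N(11)-A motif mentioned above is a T and an A separated by eleven arbitrary nucleotides. As a minimal sketch, candidate occurrences can be listed in a promoter sequence as shown below in Python. The function name find_t_n11_a and the toy DNA string are invented for illustration; the scan does not test for the surrounding inverted repeat or for position relative to the transcription start site, so real recognition-binding sites would need further filtering.

```python
import re

# T, then any 11 nucleotides, then A. The lookahead lets overlapping
# candidates be reported as well.
T_N11_A = re.compile(r"(?=(T[ACGT]{11}A))")

def find_t_n11_a(dna: str) -> list:
    """Return (1-based start, matched 13-mer) for every T-N(11)-A candidate."""
    dna = dna.upper()
    return [(m.start() + 1, m.group(1)) for m in T_N11_A.finditer(dna)]

# Invented toy sequence, not a real promoter.
toy_promoter = "GGTACGATCGATCGAGG"
for start, site in find_t_n11_a(toy_promoter):
    print(start, site)
```

Because T and A are frequent, this motif alone matches many positions in real DNA; it is best read as a first-pass filter.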
The crystal structure of the lysR DNA-binding domain of CbnR shows three
alpha helices and two anti-parallel beta strands (see <PDB:1IXC>), with the
helix-turn-helix motif comprising the second and third helices and the
strands being called the wing. Most LTTRs are likely tetramers [3].
Some proteins known to contain a lysR domain:
- Proteus vulgaris blaA, a transcriptional regulator of beta-lactamase.
- Pseudomonas putida catR, a regulator of catechol catabolism for benzoate
  degradation.
- Escherichia coli cynR, a regulator for detoxification of cyanate.
- Klebsiella aerogenes cysB, a regulator of cysteine biosynthesis.
- Vibrio cholerae irgB, an iron-dependent regulator of virulence factors.
- Escherichia coli lysR, a transcriptional regulator of lysine biosynthesis.
- Escherichia coli nhaR, a regulator of a sodium/proton (Na+/H+)
  antiporter.
- Rhizobium meliloti nodD and syrM, regulators of nodulation genes involved
  in nitrogen fixation symbiosis.
- Salmonella typhimurium oxyR, a regulator of intracellular hydrogen
  peroxide and oxidative stress response.
- Ralstonia solanacearum phcA, a regulator of virulence factors.
The profile we developed covers the entire lysR-type HTH domain.
-Sequences known to belong to this class detected by the profile: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Expert(s) to contact by email:
Schell M.; [email protected]
-Last update: October 2003 / Pattern removed, profile added and text revised.
[ 1] Henikoff S., Haughn G.W., Calvo J.M., Wallace J.C.
"A large family of bacterial activator proteins."
Proc. Natl. Acad. Sci. U.S.A. 85:6602-6606(1988).
PubMed=3413113
[ 2] Schell M.A.
"Molecular biology of the LysR family of transcriptional regulators."
Annu. Rev. Microbiol. 47:597-626(1993).
PubMed=8257110; DOI=10.1146/annurev.mi.47.100193.003121
[ 3] Muraoka S., Okumura R., Ogawa N., Nonaka T., Miyashita K., Senda T.
"Crystal structure of a full-length LysR-type transcriptional
regulator, CbnR: unusual combination of two subunit forms and
molecular bases for causing and changing DNA bend."
J. Mol. Biol. 328:555-566(2003).
PubMed=12706716
{END}
{PDOC00044}
{PS00045; HISTONE_LIKE}
{BEGIN}
*********************************************************
* Bacterial histone-like DNA-binding proteins signature *
*********************************************************
Bacteria synthesize a set of small, usually basic proteins of about 90
residues that bind DNA and are known as histone-like proteins [1,2]. The
exact function of these proteins is not yet clear but they are capable of
wrapping DNA and stabilizing it against denaturation under extreme
environmental conditions. The sequence of a number of different types of
these proteins is known:
- The HU proteins, which, in Escherichia coli, are a dimer of closely related
  alpha and beta chains and, in other bacteria, can be a dimer of identical
  chains. HU-type proteins have been found in a variety of eubacteria,
  cyanobacteria and archaebacteria, and are also encoded in the chloroplast
  genome of some algae [3].
- The integration host factor (IHF), a dimer of closely related chains which
  seem to function in genetic recombination as well as in translational and
  transcriptional control [4] in enterobacteria.
- The bacteriophage SPO1 transcription factor 1 (TF1) which selectively
  binds to and inhibits the transcription of hydroxymethyluracil-containing
  DNA, such as SPO1 DNA, by RNA polymerase in vitro.
- The African Swine fever virus protein A104R (or LMW5-AR) [5].
As a signature pattern for this family of proteins, we use a twenty residue
sequence which includes three perfectly conserved positions. According to
the tertiary structure of one of these proteins [6], this pattern spans
exactly the first half of the flexible DNA-binding arm.
-Consensus pattern: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x(1,2)-[GA]-x-[KN]-P-x-[TN]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Last update: December 2004 / Pattern and text revised.
[ 1] Drlica K., Rouviere-Yaniv J.
"Histonelike proteins of bacteria."
Microbiol. Rev. 51:301-319(1987).
PubMed=3118156
[ 2] Pettijohn D.E.
"Histone-like proteins and bacterial chromosome structure."
J. Biol. Chem. 263:12793-12796(1988).
PubMed=3047111
[ 3] Wang S.L., Liu X.-Q.
"The plastid genome of Cryptomonas phi encodes an hsp70-like protein,
a histone-like protein, and an acyl carrier protein."
Proc. Natl. Acad. Sci. U.S.A. 88:10783-10787(1991).
PubMed=1961745
[ 4] Friedman D.I.
"Integration host factor: a protein for all reasons."
Cell 55:545-554(1988).
PubMed=2972385
[ 5] Neilan J.G., Lu Z., Kutish G.F., Sussman M.D., Roberts P.C.,
Yozawa T., Rock D.L.
"An African swine fever virus gene with similarity to bacterial DNA
binding proteins, bacterial integration host factors, and the Bacillus
phage SPO1 transcription factor, TF1."
Nucleic Acids Res. 21:1496-1496(1993).
PubMed=8464748
[ 6] Tanaka I., Appelt K., Dijk J., White S.W., Wilson K.S.
"3-A resolution structure of a protein with histone-like properties in
prokaryotes."
Nature 310:376-381(1984).
PubMed=6540370
{END}
{PDOC00045}
{PS00046; HISTONE_H2A}
{BEGIN}
*************************
* Histone H2A signature *
*************************
Histone H2A is one of the four histones, along with H2B, H3 and H4, which
form the eukaryotic nucleosome core. Using alignments of histone H2A
sequences [1,2,E1] we selected, as a signature pattern, a conserved region
in the N-terminal part of H2A. This region is conserved both in classical
S-phase regulated H2A's and in variant histone H2A's which are synthesized
throughout the cell cycle.
-Consensus pattern: [AC]-G-L-x-F-P-V
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 2.
-Last update: November 1995 / Pattern and text revised.
[ 1] Wells D.E., Brown D.
"Histone and histone gene compilation and alignment update."
Nucleic Acids Res. 19:2173-2188(1991).
PubMed=2041803
[ 2] Thatcher T.H., Gorovsky M.A.
"Phylogenetic analysis of the core histones H2A, H2B, H3, and H4."
Nucleic Acids Res. 22:174-179(1994).
PubMed=8121801
[E1] http://research.nhgri.nih.gov/histones/
{END}
{PDOC00046}
{PS00047; HISTONE_H4}
{BEGIN}
************************
* Histone H4 signature *
************************
Histone H4 is one of the four histones, along with H2A, H2B and H3, which
form the eukaryotic nucleosome core. Along with H3, it plays a central role
in nucleosome formation. The sequence of histone H4 has remained almost
invariant in more than 2 billion years of evolution [1,E1]. The region we
use as a signature pattern is a pentapeptide found in positions 14 to 18 of
all H4 sequences. It contains a lysine residue which is often acetylated [2]
and a histidine residue which is implicated in DNA-binding [3].
-Consensus pattern: G-A-K-R-H
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: 3.
-Last update: November 1995 / Text revised.
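The stated position of the pentapeptide can be checked by reporting match
coordinates in 1-based residue numbering. The following Python sketch is an
illustration only: the helper find_motif is hypothetical, and the sequence
is an assumed N-terminal fragment of a mature H4 (initiator methionine
removed), not data taken from this entry. Numbering shifts by one if the
initiator methionine is counted.

```python
import re

# Assumed N-terminal fragment of a mature histone H4 (initiator Met
# removed); included for illustration only.
H4_NTERM = "SGRGKGGKGLGKGGAKRHRK"


def find_motif(sequence: str, motif_regex: str):
    """Return 1-based, inclusive (start, end) positions of each match."""
    return [(m.start() + 1, m.end())
            for m in re.finditer(motif_regex, sequence)]


# G-A-K-R-H has no wildcards, so it is already a valid regular expression
# once the '-' separators are dropped.
hits = find_motif(H4_NTERM, "GAKRH")
print(hits)  # [(14, 18)]
```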
[ 1] Thatcher T.H., Gorovsky M.A.
"Phylogenetic analysis of the core histones H2A, H2B, H3, and H4."
Nucleic Acids Res. 22:174-179(1994).
PubMed=8121801
[ 2] Doenecke D., Gallwitz D.
"Acetylation of histones in nucleosomes."
Mol. Cell. Biochem. 44:113-128(1982).
PubMed=6808351
[ 3] Ebralidse K.K., Grachev S.A., Mirzabekov A.D.
"A highly basic histone H4 domain bound to the sharply bent region of
nucleosomal DNA."
Nature 331:365-367(1988).
PubMed=3340182; DOI=10.1038/331365a0
[E1] http://research.nhgri.nih.gov/histones/
+-----------------------------------------------------------------------+
PROSITE is copyright. It is produced by the Swiss Institute of
Bioinformatics (SIB). There are no restrictions on its use by non-profit
institutions as long as its content is in no way modified. Usage by and
for commercial entities requires a license agreement. For information
about the licensing scheme send an email to [email protected] or
see: http://www.expasy.org/prosite/prosite_license.htm.
+-----------------------------------------------------------------------+
{END}
{PDOC00047}
{PS00048; PROTAMINE_P1}
{BEGIN}
**************************
* Protamine P1 signature *
**************************
Protamines are small, highly basic proteins that substitute for histones in
sperm chromatin during the haploid phase of spermatogenesis. They pack
sperm DNA into a highly condensed, stable and inactive complex. There are
two different types of mammalian protamine, called P1 and P2. P1 has been
found in all species studied, while P2 is sometimes absent. There seems to
be a single type of avian protamine whose sequence is closely related to