Field Guide to Protein Folds
This appendix is based on the contribution of Nicholas Furnham, London School of Hygiene & Tropical
Medicine, UK
TABLE OF CONTENTS
1. 14-3-3 Protein Domain
2. ATP-binding Cassette (ABC) Domain
3. Ankyrin Repeat Domain
4. Armadillo Domain
5. BAR Domain
6. β-propeller Fold
7. BIR Domain
8. BRCT Domain
9. Bromodomain
10. BTB/POZ Domain
11. C2 Domain
12. Calmodulin
13. Caspase Recruitment Domain (CARD)
14. Centromeric-A-targeting Domain (CATD)
15. Complement Control Protein (CCP) Domain
16. Calponin Homology (CH) Domain
17. Cold Shock Domain (CSD)
18. Complement C1r/C1s, Uegf, Bmp1 (CUB) Domain
19. Cyclin-box Domain
20. Death Domain
21. Dbl Homology (DH) Domain
22. DEXD/H Domain
23. EF-hand Domain
24. Epidermal Growth Factor (EGF)-like Domain
25. FERM Domain
26. Fibronectin Type III (FNIII) Domain
27. Formin Homology (FH) Domain
28. Greek Key Motif
29. HAMP Domain
30. HEAT Repeat Domain
31. Helix-turn-helix DNA-binding Motif
32. Immunoglobulin (Ig) Domain
33. Jelly Roll Fold
34. Kelch Motif/Domain
35. K Homology (KH) Domain
36. LD Motif
37. LIM Domain
38. Leucine-rich Repeats (LRR) Domain
39. NAD(H)/NAD(P)-binding Domain
40. Oligonucleotide/Oligosaccharide-binding Fold (OB) Domain
41. PAS Domain
42. Polo-box Domain (PBD)
43. PDZ Domain
44. Pleckstrin Homology (PH) Domain
45. Phosphotyrosine-binding (PTB) Domain
46. RNA Recognition Motif (RRM) Domain
47. S1 Domain
48. Src Homology 2 (SH2) Domain
49. Src Homology 3 (SH3) Domain
50. Spectrin-like Repeats
51. Ubiquitin Fold
52. von Willebrand Factor Type A Domain
53. WD40 Repeat Domain
54. Winged-helix Domain (WHD)
55. Zinc Finger Domain
56. Zinc Ribbon Domain
1. 14-3-3 PROTEIN DOMAIN
CATH: 1.20.190.20
SCOP: a.118.7.1
InterPro: IPR000308
Pfam: PF00244
The 14-3-3 protein domain is an abundant type of adaptor
protein that recognizes and specifically interacts with phosphorylated proteins in eukaryotes. Their name refers to their
properties in chromatographic fractionation. Most species
contain more than one isoform of this protein. These are
not products of alternative splicing but are separate genes
and differ in short sections of the sequence. All seven mammalian isoforms form dimers, which have a common horseshoe-like
structure. Each domain consists of nine α helices, with the
five C-terminal helices forming a cuplike structure and the
remainder involved in forming the dimer interface. The
dimeric structure is stabilized by various salt bridges. Formation of homo- and heterodimers is considered to be one
of the factors affecting specificity of the protein with its different protein targets, though not all isoforms are able to
form heterodimers.
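
Each entry in this guide lists a CATH identifier such as 1.20.190.20. As a minimal sketch of how these dotted numbers can be read, the Python below splits an identifier into the four levels of the CATH hierarchy (class, architecture, topology, homologous superfamily) and names the top-level class. The function names `parse_cath` and `class_name` are hypothetical helpers introduced here for illustration, and the class names are taken from general knowledge of CATH rather than from this appendix. Partial identifiers such as 2.105, as used in the β-propeller entry, are accepted.

```python
# Level names of the CATH hierarchy, in the order they appear in an identifier.
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

# Top-level CATH classes (general knowledge, not stated in the appendix).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def parse_cath(identifier: str) -> dict:
    """Split a dotted CATH-style identifier into named integer levels."""
    parts = identifier.strip().split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a CATH-style identifier: {identifier!r}")
    return dict(zip(LEVELS, (int(p) for p in parts)))


def class_name(identifier: str) -> str:
    """Return the name of the top-level class for an identifier."""
    return CATH_CLASSES.get(parse_cath(identifier)["class"], "unknown")


if __name__ == "__main__":
    print(parse_cath("1.20.190.20"))
    print(class_name("1.20.190.20"))
```

Read this way, the 14-3-3 identifier falls in the mainly α class, which agrees with the nine-helix description above.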
2. ATP-BINDING CASSETTE (ABC) DOMAIN
CATH: 3.20.50.300
SCOP: c.37.1.12
InterPro: IPR017871
Pfam: PF00005
The ATP-binding cassette (ABC) domain is one of two
domains that form the ABC transporters, which are found
in all kingdoms of life and comprise one of the largest protein families. It binds and hydrolyzes ATP, thereby coupling
transport to ATP hydrolysis in a large number of biological processes. Its sequence is highly conserved, displaying
a typical Walker A phosphate-binding loop and a Walker B
magnesium-binding site found in one arm of the L-shaped
structure that the domain adopts. This arm, formed mainly
of β strands, also contains the important residues for ATP
hydrolysis and/or binding (located in the P-loop). The
ATP-binding pocket is found at the end of the arm. The
hinge between the two arms contains both a histidine loop
motif and a Q-motif, making contact with the γ phosphate
of the ATP molecule. The other arm, mostly an α-helical
subdomain, contains the signature motif (LSGGQ) and is
in direct contact with the transmembrane domain of the
transporter.
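
The sequence motifs named in this entry can be located with simple pattern matching. The sketch below is illustrative only: the signature motif LSGGQ is quoted from the entry, whereas the Walker A pattern G-x(4)-G-K-[S/T] is the commonly cited consensus and is an assumption not given in the text. The function name `find_motifs` and the toy sequence are hypothetical, and real family members may deviate from both patterns.

```python
import re

# Commonly cited Walker A (P-loop) consensus: G-x(4)-G-K-[ST].
# This consensus is an assumption; the entry names the motif but not its sequence.
WALKER_A = re.compile(r"G.{4}GK[ST]")

# ABC signature motif exactly as quoted in the entry.
ABC_SIGNATURE = re.compile(r"LSGGQ")


def find_motifs(sequence: str) -> dict:
    """Return 1-based start positions and matched text for each motif."""
    seq = sequence.upper()
    return {
        "walker_a": [(m.start() + 1, m.group()) for m in WALKER_A.finditer(seq)],
        "abc_signature": [(m.start() + 1, m.group()) for m in ABC_SIGNATURE.finditer(seq)],
    }


if __name__ == "__main__":
    # Made-up sequence for illustration; not a real protein.
    toy = "MKTLLGPSGSGKSTLLRAIAGLSGGQKQRVAIA"
    print(find_motifs(toy))
```

Exact string matching of this kind is only a first pass; profile-based resources such as Pfam and InterPro, whose accessions are listed with each entry, are the more robust way to assign a domain.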
3. ANKYRIN REPEAT DOMAIN
CATH: 1.25.40.20
SCOP: d.211.1.1
InterPro: IPR002110
Pfam: PF00023
The ankyrin repeat domain is a 33-residue repeating unit
consisting of a simple helix-turn-helix with the helices
arranged in an anti-parallel fashion. The turn projects out
from the helices at a 90° angle to facilitate the formation
of hairpin-like β sheets with neighboring loops. Thus, the
molecule has an L-shaped structure, with the helices as the
vertical arm and the N- and C-terminal stretches as the
base. The domain, found across all kingdoms of life and
particularly prevalent in eukaryotes, often occurs as repeats
of between four and six units. They can also be found as a
relatively large number of repeats (the largest predicted to
be made of 34 repeats) and can stack to form a variety of
tertiary structures with an intrinsic property to form compact and concave structures.
4. ARMADILLO DOMAIN
CATH: 1.25.10.10
SCOP: a.118.1.1
InterPro: IPR000225
Pfam: PF00514
The armadillo (ARM motif) domain has multiple copies of
a 42-residue repeat, consisting of three α helices: a short
helix of two turns, followed by two longer helices of three
to four turns each. The two longer helices pack together in
an anti-parallel manner similar to the helical packing in
the HEAT repeat domains. Proteins that contain the motif
often have many tandem repeated copies. Multiple copies
of the repeat form a right-handed superhelix of α helices
with each repeat rotated 30° with respect to the preceding
repeat to form an α-solenoid structure. This features a positively charged groove, which is presumed to interact with
the acidic surfaces of the known interaction partners.
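
The 30° rotation per repeat quoted above implies simple superhelix arithmetic, sketched below under the idealizing assumption that every repeat is rotated by exactly the same angle. The function names are hypothetical and the calculation says nothing about rise or radius, which the entry does not give.

```python
def repeats_per_turn(rotation_per_repeat_deg: float) -> float:
    """Number of repeats needed for one full 360-degree superhelical turn."""
    if rotation_per_repeat_deg <= 0:
        raise ValueError("rotation per repeat must be positive")
    return 360.0 / rotation_per_repeat_deg


def cumulative_rotation(n_repeats: int, rotation_per_repeat_deg: float = 30.0) -> float:
    """Rotation of the last repeat relative to the first, in degrees."""
    if n_repeats < 1:
        raise ValueError("need at least one repeat")
    return (n_repeats - 1) * rotation_per_repeat_deg


if __name__ == "__main__":
    print(repeats_per_turn(30.0))    # repeats in one idealized turn
    print(cumulative_rotation(12))   # rotation spanned by 12 repeats
```

With a 30° step, an idealized full turn takes 12 repeats, so a protein with fewer repeats spans only part of a superhelical turn.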
5. BAR DOMAIN
CATH: 1.20.1270.60
SCOP: a.238.1.1
InterPro: IPR004148
Pfam: PF03114
The Bin-amphiphysin-Rvs (BAR) domain comprises three
long α helices coiled together. These coils are often associated as dimers to form a functional banana-shaped six-helix bundle. The coiled coil trimer in the BAR domain
of amphiphysin has approximately 210 amino acids, with
positively charged residues at the end of the coiled coil not
involved in the dimerization as well as along the curved
surface that forms the banana-shaped quaternary structure.
These positively charged residues are thought to facilitate
the interaction with the phospholipids of the membranes.
BAR domain proteins, which include amphiphysin and the sorting
nexins, function in diverse cellular processes such as endocytosis and actin
reorganization.
6. β-PROPELLER FOLD
CATH: 2.105, 2.110, 2.115, 2.120, 2.130, 2.140
SCOP: b.66, b.67, b.68, b.69, b.70
InterPro: IPR010620, IPR001680
Pfam: PF00400, PF06739
The β-propeller fold is found in many different structures
from a variety of organisms across all kingdoms of life. It
adopts a highly symmetrical structure formed of between
four and eight fold repeats arranged toroidally around a
central axis. Each fold repeat is formed of a four-stranded
twisted anti-parallel β sheet. The repeats are arranged in
ring-like fashion around a central tunnel. The ring is closed
by what has been termed a ‘molecular Velcro,’ with both
termini forming one of the four-stranded anti-parallel β
sheets either as a 1 + 3 or a 2 + 2 combination of strands
from the N- and C-terminal ends. There are cases where
closure is achieved by each end forming a separate sheet/
blade and stabilized by hydrophobic interactions. For a further example of a seven-bladed β-repeat see the entry for
WD40 Domain.
7. BIR DOMAIN
CATH: 1.10.1170.10
SCOP: g.52.1.1
InterPro: IPR001370
Pfam: PF00653
The baculovirus inhibitor of apoptosis protein repeat (BIR)
domain, also referred to as inhibitor of apoptosis (IAP)
domain, is an approximately 70 amino acid zinc-binding
domain. Occurring either as a single domain or as two
or three tandem repeats, they consist of a mainly alpha
orthogonal bundle comprising four or five short α helices,
a three-stranded β sheet, and a zinc atom packed into a
highly hydrophobic environment created by a number of
residues in the vicinity of the zinc pocket and by three coordinating cysteine residues and one histidine residue, which
are highly conserved among all BIR domains. The principal
function of BIRs is in mediating protein–protein interactions, and the surface of the domain has a number of hydrophobic regions that would facilitate such interactions with
a number of other proteins.
8. BRCT DOMAIN
CATH: 3.40.50.10190
SCOP: c.15.1
InterPro: IPR001357
Pfam: PF00533
The BRCT (breast cancer susceptibility protein C-terminal)
domain comprises a Rossmann fold with a central four-stranded β sheet flanked by a single α helix on one side and
two α helices on the opposite side. The domain is an approximately 100 amino acid tandem repeat, which appears to
act as a phospho-protein-binding domain. BRCT repeats
are defined by conserved clusters of hydrophobic residues
that occupy the core of the repeat structure and by glycine
residues that facilitate a tight turn between α1 and β2. There
is considerable diversity in the multidomain architectures
in which the BRCT domains are found. They can exist as
single isolated domains, multiple tandem BRCT repeats,
and in association with other functional domains. They
can also be found in multiple but isolated copies, where an
unstructured region linking the two domains separates two
distinct single BRCT domains.
9. BROMODOMAIN
CATH: 1.20.920.10
SCOP: a.29.2.1
InterPro: IPR001487
Pfam: PF00439
The bromodomain is central to epigenetic control of gene
transcription through its role in recognizing acetylated lysine residues on histones.
The structure adopts a conserved left-handed bundle of four
α helices, with two interhelical loops of variable length and
sequence between the first and second and third and fourth
helices. These constitute a hydrophobic pocket that both
stabilizes the structure and interacts with the acetyl-lysine.
The N and C termini are located close to each other, indicating the modular nature of the domain and its involvement in protein–protein interactions and that multiple
bromodomains can be placed sequentially in a chromosomal protein. Though generally the sequence conservation
between bromodomains is low, the residues (two tyrosines
and an asparagine) involved in acetyl-lysine recognition are
highly conserved. The acetyl-lysine forms a specific hydrogen bond between the oxygen of the acetyl carbonyl group
and the side chain amide nitrogen of the conserved asparagine. A network of water-mediated hydrogen bonds with
backbone carbonyl groups at the base of the binding cleft
also contributes to acetyl-lysine binding.
10. BTB/POZ DOMAIN
CATH: 3.30.710.10
SCOP: d.42.1.1
InterPro: IPR000210
Pfam: PF00651
The BTB domain (Broad-Complex, Tramtrack and Bric a
Brac) is found at the N terminus in 5–10% of zinc finger proteins, in poxvirus proteins involved in dimerization (hence
its other name, POZ domain, for poxvirus and zinc finger),
and in proteins that have a Kelch motif. These domains are
about 120 amino acids long. The fold is based on a cluster of six α helices flanked by a β sheet. As a dimer the N
terminus of each chain is associated with the main body of
the other chain, generating one of the β sheets between the
first β strand of one monomer and the fifth β strand of the
other. This, along with the sixth α helix, forms an extended
concave surface on the underside of the protein dimer and
is implicated in ligand binding.
11. C2 DOMAIN
CATH: 2.60.40.150
SCOP: b.7.1.1, b.7.1.2
InterPro: IPR000008
Pfam: PF00168
The calcium-binding C2 domain comprises approximately
130 residues forming an anti-parallel β sandwich formed
of two β sheets each containing four strands. The calcium-binding site is located between the loops that connect the
second and third β strands and the sixth and seventh β
strands. Though forming the same tertiary structure, two
distinct topologies exist, differing in their β-strand connectivity. Occurring in single and multiple copies they have
been found in a wide range of eukaryotic signaling proteins
and are involved in a wide range of functions including signal transduction, vesicular transport, GTPase regulation,
lipid modification, and protein phosphorylation. The common mechanism by which the C2 domain acts is that
calcium binding changes the electrostatic potential,
enhancing phospholipid binding, which suggests that the C2
domain functions as an electrostatic switch.
12. CALMODULIN
CATH: 1.10.238.10
SCOP: a.39.1.5
InterPro: IPR011992
Pfam: PF13202
Calmodulin is a small dumb-bell-shaped protein composed
of two globular domains connected together by a flexible
linker and acts as an intermediary protein sensing calcium
levels and relaying signals to various calcium-sensitive
enzymes, ion channels, and other proteins. The globular
domain of calmodulin is a particular type of EF-hand (see
entry for EF-hand Domain) that binds two calcium ions.
The linker between the two domains can be highly flexible, permitting it to interact with a range of target protein
partners.
13. CASPASE RECRUITMENT DOMAIN (CARD)
CATH: 1.10.533.10
SCOP: a.77.1.3
InterPro: IPR001315
Pfam: PF00619
The caspase recruitment domain (CARD) has about 94
residues which form six anti-parallel amphipathic α helices
that pack together to form a hydrophobic core. This domain
resembles the death domain (see Death Domain). Helices
2–5 form a four-helix bundle, with the two other helices
crossing on top of helices 4 and 5. It is the orientation of the
latter two helices that distinguishes the CARD from the
death domain, which shares a similar six-helix bundle. One
side of the domain has predominantly basic residues, while
the other side has predominantly acidic residues, which
contribute to the protein–protein interactions that define
the CARD's function in the regulation of caspase activation
and apoptosis. In addition, a number of CARD proteins
have been shown to play a role in regulating inflammation
in response to bacterial and viral pathogens as well as to a
variety of endogenous stress signals.
14. CENTROMERIC-A-TARGETING DOMAIN (CATD)
CATH: 1.10.20.10
SCOP: a.22.1.1
InterPro: IPR000164
Pfam: PF00125
The centromeric-A-targeting domain (CATD) comprises
the first loop and the second α helix of the CENP-A histone fold domain that replaces histone H3 in centromeric
nucleosomes. It confers a unique structural rigidity to the
nucleosomes into which it assembles. CATD is confined
to the structured core of the nucleosome, indicating that
many of the essential features of CENP-A are within the
rigid core of the nucleosome. This supports the model that
centromere identity is maintained by a unique nucleosome
structure that serves to distinguish the centromere from the
rest of the chromosome.
15. COMPLEMENT CONTROL PROTEIN (CCP) DOMAIN
CATH: 2.10.70.10
SCOP: g.18.1.1
InterPro: IPR000436
Pfam: PF00084
Complement control protein (CCP), also called the Sushi
domain or short consensus repeat (SCR) domain, is found
in a number of complement and adhesion proteins. Approximately 60 amino acids in length, it comprises mainly β
strands based on a β sandwich arrangement, with one face
formed of three strands and the other opposing face of two
strands, with the regions between comprising well-defined
turns and less defined loops. They often occur in tandem
arrays linked by short sequences, with up to 30 domains
occurring together. Experimental evidence suggests that
the stability of a CCP domain is influenced by the other CCP domains with which it is associated. There are a number of
conserved residues primarily involved in maintaining the
structural fold, including four invariant cysteine residues
involved in intramolecular disulfide bonds, a highly conserved tryptophan, and conserved glycine.
16. CALPONIN HOMOLOGY (CH) DOMAIN
CATH: 1.10.418.10
SCOP: a.40.1.1
InterPro: IPR001715
Pfam: PF00307
The calponin homology (CH) domain has an all α-helical
architecture dominated by four α helices each comprising
11–18 residues connected by relatively long loops. A further three shorter and less well ordered α helices are also
present. Often found in pairs they form the F-actin-binding region of proteins associated with a large superfamily of cytoskeletal proteins responsible for the localization
and cross-linking of filamentous actin. The domain is
also present in signaling proteins involved in modulation
of actin filaments, often as a single domain, and may not
directly interact with actin. As a dimer, the last helix of the
first CH domain and the first helix of the second domain
bind F-actin. Phosphatidylinositol 4,5-bisphosphate modulates the function of actin-binding proteins through the
CH domain, interacting with the second helix of the second CH domain, and thus may regulate actin binding by a competitive
mechanism.
17. COLD SHOCK DOMAIN (CSD)
CATH: 2.40.50.140
SCOP: b.40.4.5
InterPro: IPR002059
Pfam: PF00313
The cold shock domain (CSD) is a small (approximately
70 residues) ancient nucleic acid-binding domain found
in all kingdoms of life. Its structure consists of a nearly
closed anti-parallel β barrel formed of a three-stranded β
sheet crossing at 90° over a β ladder. The first strands of the
sheet and ladder are twisted to form the barrel. The connecting loops are generally short except for the loop that
joins the sheet to the ladder, which is relatively long. Two
consensus RNA-binding motifs are found on adjacent β
strands of the β sheet. They contain aromatic residues that
enable base stacking with single-stranded DNA. In bacteria
this domain is found by itself, but eukaryotic counterparts
contain additional domains at the N and C termini and are
referred to as Y-box binding proteins. In plants the CSD is
found with additional domains at the C terminus.
18. COMPLEMENT C1r/C1s, Uegf, Bmp1 (CUB) DOMAIN
CATH: 2.60.120.290
SCOP: b.23.1.1
InterPro: IPR000859
Pfam: PF00431
The complement C1r/C1s, Uegf, Bmp1 (CUB) domain is
an approximately 110 residue domain often occurring as
repeat arrays in a range of extracellular and plasma membrane-associated proteins. The domain consists of two four- to five-stranded β sheets. Four conserved cysteine residues
form two disulfide bridges at opposite edges of the same
face of the β sandwich. CUB domains are associated with
a wide variety of functions, from complement activation
to developmental patterning, tissue repair, and cell signaling. Many of the proteins are proteases. Although the role of
the CUB domain is not yet fully understood, CUB domains have been
shown to be involved in oligomerization as well as substrate
and protein–protein interaction partner recognition.
19. CYCLIN-BOX DOMAIN
CATH: 1.10.472.10
SCOP: a.74.1.1
InterPro: IPR006671, IPR013763
Pfam: CL0065
The cyclin-box is an approximately 100-residue domain
found in all cyclin and cyclin-like domains that acts as a
generalized adaptor motif to recognize diverse proteins
and DNAs that are involved in cell cycle and transcriptional regulation. It consists of five helices with a central
helix (helix 3) surrounded by the other four helices. Often
the cyclin-box region is duplicated to form a paired N- and
C-terminal set of repeats, although there is little sequence
similarity between the two. In addition they may have
embellishments to this core such as extra helices or loop
regions. No single residue is completely conserved among
all the different domains; however, some positions are more
conserved than others, for example, an alanine that seems
to be important for helical packing.
20. DEATH DOMAIN
CATH: 1.10.533.10
SCOP: a.77.1.2
InterPro: IPR000488
Pfam: PF00531
The death domain is one of the largest classes of protein interaction modules and it plays a pivotal role in the
apoptosis, inflammation, necrosis, and immune cell signaling pathways. It comprises four subfamilies: the death
domain, the death effector domain, the caspase recruitment domain, and the pyrin domain. All share a common
six-helical structural fold, although individual subfamilies
have distinct structural embellishments and conserved
sequence characteristics that are unique to the subfamily,
for example, in length and/or direction of helices. In addition, sequence similarity is low among subfamily members,
resulting in entirely different surface features that may be
responsible for specificity in protein–protein interactions.
The domains mostly occur in combination with domains
outside of the death domain superfamily or with other subfamily domains, although they can be found as the only
motif in the protein.
21. Dbl HOMOLOGY (DH) DOMAIN
CATH: 1.20.900.10
SCOP: a.87.1.1
InterPro: IPR000219
Pfam: PF00621
The Dbl homology (DH) or RhoGEF domain is approximately 150 residues in length and consists of an elongated
all α-helical bundle composed of nine α helices and four
3₁₀ helices. The DH domain contains three highly conserved blocks of sequence that form three long helices, which
pack together to form the core of the domain. A U-shaped
arrangement of the helices, with a block of three shorter
helices stacking against the three long helices, forms a
structural scaffold against which the pleckstrin homology (PH)
domain packs; the DH domain is almost invariably followed by a PH domain.
The presence of the PH domain is not absolutely required for catalysis of nucleotide exchange, but
does appear to greatly increase catalytic efficiency in many
cases.
22. DEXD/H DOMAIN
CATH: 3.40.50.300
SCOP: c.37.1.16
InterPro: IPR011545
Pfam: PF00270
The DEXD/H is a motif, in single letter amino acid code,
found in a highly conserved helicase domain. These proteins belong to a large grouping of proteins that can be subclassified into five superfamilies, one of which contains the
DEXD/H-box, and can be further arranged into subgroups.
The helicase domain with the DEXD/H motif is a parallel
α/β structure sharing the same topology as the RecA-like
domain consisting of five parallel β strands surrounded
by five α helices and extended by a further two parallel β
strands and two α helices. A number of other motifs that are unique to DEXD/H proteins, such
as the Q-motif,
are found within this domain. These motifs are located at
the strand–loop or helix–loop transitions. In addition to the
domain that contains the DEXD/H motif, a second domain
is found in all solved structures of DEXD/H-box-containing helicases; it contains motifs that may coordinate
ATPase and unwinding activities. Flanking sequences that
are variable in length and composition are thought to provide additional interactions with substrates/co-factors or
confer additional activities.
23. EF-HAND DOMAIN
CATH: 1.10.238.10, 1.10.238.110
SCOP: a.39.1.10, a.39.1.4, a.39.1.5, a.39.1.6, a.39.1.7, a.39.1.8, a.39.1.9
InterPro: IPR002048
Pfam: PF00036
The EF-hand is a structural motif formed of a helix-loop-helix that binds calcium or occasionally magnesium. It is
characterized by a 12-residue sequence, corresponding to
the loop, which coordinates the metal ion in a pentagonal
bipyramidal configuration. The six residues involved in
coordinating calcium follow the pattern X, Y, Z, –Y,
–X, –Z. The last residue is an invariant Glu or Asp
providing two oxygen atoms for coordinating the calcium
ion. The two helices orientate like the spread thumb and
forefinger of the human hand, giving rise to the name. A
number of these structural motifs appear together within
a protein, for example, three in parvalbumin, and four in
calmodulin and troponin C. Each motif within a
protein can have different binding affinities for the metal
ion.
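
The 12-residue loop described in this entry can be annotated programmatically. The sketch below assumes the conventional labeling of the chelating positions (1, 3, 5, 7, 9, and 12 as X, Y, Z, -Y, -X, -Z), which is general knowledge rather than something spelled out position by position in the text. The function names and the example loop sequence are illustrative assumptions.

```python
# Conventional labels for the calcium-chelating positions of the 12-residue
# EF-hand loop (1-based). Assumed convention; the entry does not list positions.
LIGAND_POSITIONS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def annotate_loop(loop: str) -> list:
    """Return (position, label, residue) for each chelating position."""
    loop = loop.upper()
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to have 12 residues")
    return [(pos, label, loop[pos - 1]) for pos, label in LIGAND_POSITIONS.items()]


def has_acidic_last_residue(loop: str) -> bool:
    """Check the entry's statement that the last residue is Glu (E) or Asp (D)."""
    return len(loop) == 12 and loop[-1].upper() in {"D", "E"}


if __name__ == "__main__":
    example_loop = "DKDGDGTITTKE"  # illustrative 12-residue loop
    for pos, label, residue in annotate_loop(example_loop):
        print(pos, label, residue)
    print(has_acidic_last_residue(example_loop))
```

A check of this kind flags only the canonical loop; as the entry notes, individual motifs within one protein can differ in their affinity for the metal ion, which sequence alone does not predict.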
24. EPIDERMAL GROWTH FACTOR (EGF)-LIKE DOMAIN
CATH: 2.10.25.10
SCOP: g.3.11.1
InterPro: IPR006209
Pfam: PF00008
The epidermal growth factor (EGF)-like domain is made
up of about 30–40 residues often found in the extracellular domain of membrane-bound proteins or in proteins
that are known to be secreted. The structure consists of a
two-stranded β sheet followed by a loop to a C-terminal
short two-stranded sheet and is held together by six cysteine
residues that form three disulfide bonds. Proteins can contain more than one copy of the domain.
25. FERM DOMAIN
CATH: 3.10.20.90, 1.20.80.10, 2.30.29.30
SCOP: a.11.2.1, b.55.1.5, d.15.1.4
InterPro: IPR000299
Pfam: PF09379, PF00373, PF09380
The FERM domain (F for 4.1 protein, Ezrin, Radixin,
Moesin) is an N-terminal domain made up of three
subdomains featuring a ubiquitin-like fold, a four-helix
bundle, and a phosphotyrosine-binding-like domain.
These subdomains are organized by intermediate interdomain interactions to form characteristic grooves and clefts
that together form a compact clover-shaped structure. One
groove, created by the fourth β strand of the ubiquitin-like
fold and the first α helix of the phosphotyrosine-binding-like domain, is positively charged, while a second groove,
found between the four-helix bundle and phosphotyrosine-binding-like domain, is negatively charged. This produces
a pronounced polarization around the FERM domain. The
two grooves have been shown to be the points of interaction between the domain and specific membrane-bound
proteins.
26. FIBRONECTIN TYPE III (FNIII) DOMAIN
CATH: 2.60.40.10
SCOP: b.1.2.1
InterPro: IPR003961
Pfam: PF00041
The fibronectin type III (FNIII) domain is one of the
three different kinds of structural units found in fibronectin, a multifunctional protein of the extracellular matrix
and serum. It is characterized by a consensus sequence
of approximately 90 residues that forms a fold similar to
that of an immunoglobulin domain, consisting of seven β
strands that form a sandwich of two anti-parallel β sheets,
one containing three strands and the other four. The superfamily of sequences believed to contain FNIII repeats represents 45 different families that are widely distributed in
animal species, but also found more sporadically in yeast,
plant, and bacterial proteins.
27. FORMIN HOMOLOGY (FH) DOMAIN
CATH:
SCOP: a.207.1.1
InterPro: IPR015425
Pfam: PF02181
Formin proteins are a family of highly conserved eukaryotic proteins implicated in a range of actin-based processes.
The defining feature of formins is a highly conserved
approximately 400 residue domain, the formin homology 2
domain. It forms an almost entirely α-helical dimeric structure consisting of a number of somewhat arbitrarily defined
subdomains. The N-terminal forms a ‘lasso,’ a region that
encircles the C-terminal helix of the dimer-related subunit,
followed by a linker region, a globular subunit and coiled
coil region, and a terminal helix. FH2 domains dimerize
to promote self-association of formin proteins. Two other
formin homology domains (FH1 and FH3) are also characteristic of the formin protein. The proline-rich FH1 domain
is involved in interacting with a wide variety of other proteins and the less well-conserved FH3 domain is important
for determining intracellular localization of formin family
proteins.
28. GREEK KEY MOTIF
CATH:
SCOP:
InterPro:
Pfam:
The Greek key motif is a very common structural unit in
proteins. It is defined as four β strands with a ‘+3,–1,–1’
topology. These motifs share little or no sequence similarity
or common function and are found in a wide range of proteins of either all β or α + β classes. Despite the similarities
in topology, they have different three-dimensional structures depending on the hydrogen-bonding pattern within
the motif and thus can be subclassified into three distinct
classes.
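
The '+3,–1,–1' topology can be unpacked into the side-by-side order of the four strands in the sheet. The sketch below assumes the usual reading of this notation, in which each number gives how many strand positions along the sheet, and in which direction, the chain moves to reach the next strand. It ignores strand direction and crossover connections, and the function names are hypothetical.

```python
def strand_positions(connections: list) -> list:
    """Sheet position of each strand, with strand 1 placed at position 0."""
    positions = [0]
    for step in connections:
        positions.append(positions[-1] + step)
    return positions


def spatial_order(connections: list) -> list:
    """Strand numbers (1-based, in sequence order) read across the sheet."""
    positions = strand_positions(connections)
    if len(set(positions)) != len(positions):
        raise ValueError("two strands cannot occupy the same sheet position")
    return [strand for _, strand in sorted((p, i + 1) for i, p in enumerate(positions))]


if __name__ == "__main__":
    greek_key = [+3, -1, -1]
    print(strand_positions(greek_key))
    print(spatial_order(greek_key))
```

Under this reading, strand 2 lies three positions away from strand 1 and the chain then steps back one position at a time, so the first and last strands in sequence end up as neighbors in the sheet.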
29. HAMP DOMAIN
CATH: 1.10.287.130
SCOP: a.30.2.1
InterPro: IPR003660
Pfam: PF00672
The HAMP domain is present in histidine kinases, adenyl
cyclases, methyl-accepting proteins, and phosphatases,
which gives rise to its name. It comprises approximately 50
residues forming two long α helices that span membranes
in prokaryotes, fungi, plants, and protists, and it functions
to connect extracellular sensory domains with intracellular signaling domains. It has a heptad repeat, which is
a hallmark of a coiled coil structure. The two helices are
connected by a loop of approximately 13 residues which has
been observed to be tightly packed into the groove between
the two helices in experimentally resolved structures. The
ability to form a coiled coil has been proposed as part of
the mechanism of signal transduction in which the domain
alternates between two parallel helices and a canonical
coiled coil.
30. HEAT REPEAT DOMAIN
CATH: 1.25.10.10
SCOP: a.118.1.2
InterPro: IPR000357
Pfam: PF02985
The HEAT repeat (named after four cytoplasmic proteins it
is found in: Huntingtin, elongation factor 3 (EF3), protein
phosphatase 2A (PP2A), and the yeast PI3-kinase TOR1)
is related at the superfamily level to the ARM/armadillo
repeat domain. It consists of repeats 37–47 amino acids in
length formed of two anti-parallel α helices and two turns
arranged about a common axis with conserved asparagine
and arginine residues at positions 19 and 25. These repeats
are linked by flexible inter-unit loops. HEAT repeats occur
in series consisting of 3–36 units to form rod-like helical structures that can act as protein–protein interaction
surfaces.
31. HELIX-TURN-HELIX DNA-BINDING MOTIF
CATH: 1.10.10.60
SCOP: a.4.1.1
InterPro: IPR000047
Pfam:
The helix-turn-helix motif is one of the principal structural
motifs capable of binding DNA. As the name suggests, the
HTH motif is made up of helices 1 and 2 in the 3-helix structure shown. It functions as a DNA recognition and binding
motif, binding in the major groove of the DNA duplex, with
the second helix contributing most to the recognition of the
correct DNA strand and termed the recognition helix. The
first helix stabilizes the interaction with the DNA through
hydrogen bonds and van der Waals interactions and is
always in the same relative orientation to the recognition
helix. The helix-turn-helix motif can be found in various
combinations with other secondary structural elements
and in multiple copies within the same domain.
32. IMMUNOGLOBULIN (Ig) DOMAIN
CATH: 2.60.40.10
SCOP: b.1.1.1, b.1.1.4
InterPro: IPR013151
Pfam: PF00047
The immunoglobulin domain is one of the most populous
protein families in the human genome, with 765 members identified. It is found in many eukaryotes as well as
bacteria, probably through horizontal gene transfer. These
domains contain about 70–110 amino acids and are subcategorized according to size and function. They have seven
to nine anti-parallel β strands forming a barrel-like shape,
although due to the lack of hydrogen bonds around the
barrel, they are in effect two distinct β-pleated sheets and
form a β sandwich. This is often termed a simple Greek key,
which is shared with a number of other domains. Interactions between hydrophobic amino acids in the interior of
the sandwich and highly conserved disulfide bonds formed
between cysteine residues in the second and sixth strands
stabilize the Ig fold.
33.
JELLY ROLL FOLD
CATH: 2.60.120.10
SCOP: b.82.1.1, b.82.1.10, b.82.1.11, b.82.1.12, b.82.1.15, b.82.1.16, b.82.1.18, b.82.1.19, b.82.1.2,
b.82.1.20, b.82.1.22, b.82.1.23, b.82.1.24, b.82.1.3, b.82.1.5, b.82.1.6, b.82.1.7, b.82.1.8, b.82.1.9,
b.82.2.13, b.82.3.1, b.82.3.2, b.82.3.3
InterPro: IPR014710
Pfam:
The term ‘jelly roll fold’ was first coined to describe a more
complicated version of the Greek key topology (see Greek
key entry). The same topology has been described as a
wedge shape, a β barrel, a β sandwich, and an eight-stranded
β barrel with a β roll topology. The jelly roll motif can be
considered to be a single long β hairpin coiled in a helical
manner to form two four-stranded anti-parallel β sheets.
Various structural embellishments ranging from extensive
regions of coil to additional sheets and helices are permissible. The topology is well conserved even in cases of little
sequence similarity. The fold exists in many functional contexts including glucose 6-phosphate isomerase, germin (a
metal-binding protein with oxalate oxidase and superoxide
dismutase activities), auxin-binding protein, seed storage
protein 7S, and acireductone dioxygenase, among others.
34.
KELCH MOTIF/DOMAIN
CATH: 2.130.10.80
SCOP: b.68.11.1
InterPro: IPR006652
Pfam: PF01344
The Kelch motif is a short, approximately 50 residue, repeat
motif comprising a four-stranded anti-parallel β sheet that
is usually repeated six or seven times to form a propeller-like
structure. Sequence identity is relatively low between
individual repeats, ranging from 11 to 50%. A key set of
conserved residues distinguishes Kelch repeats from the
large group of WD repeat proteins that also form β propellers.
The Kelch domain may be associated with other
domains at both the N and C termini or can be found by
itself.
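Sequence identity figures such as the 11 to 50% range quoted above depend on how identity is counted. As a rough sketch, assuming the two repeats have already been aligned to equal length (with '-' marking gaps) and that identity is computed over gap-free columns only, a pairwise percent identity could be calculated as follows. The function name percent_identity and the toy sequences are hypothetical and are not taken from any real Kelch protein.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned columns holding identical residues.

    Assumes `a` and `b` are pre-aligned strings of equal length in which
    '-' marks a gap. Other definitions of identity exist (for example,
    dividing by the length of the shorter sequence).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy aligned fragments, made up for illustration only.
print(percent_identity("ACDE", "ACDF"))
print(percent_identity("AC-E", "ACDE"))
```

Because the result depends on the alignment and on the denominator chosen, values from a sketch like this are only comparable when computed the same way throughout.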
35.
K HOMOLOGY (KH) DOMAIN
CATH: 3.30.1370.10, 3.30.300.20, 3.30.1140.32
SCOP: d.51.1.1, d.52.3.1, i.1.1.1
InterPro: IPR004088 (Type I), IPR004044 (Type II)
Pfam: PF00013 (Type I), PF07650 (Type II)
The K homology domain is an approximately 75 residue
conserved sequence present in an assortment of nucleic
acid-binding proteins. Though the KH motif is conserved,
structural studies have revealed that there are actually two
different versions, named type I (left) and type II (right),
which have two different folds. Type I KH domains have
a β sheet, abutted by three α helices, composed of three
β strands (ordered as β1, β2, β3) with β1 and β2 parallel
to each other and β3 anti-parallel to both. In Type II
KH domains the β1 and β3 strands are adjacent to each other and
the β2 strand is adjacent and anti-parallel to the β1 strand.
A main variable loop region is different in the two types
of domain, occurring between β3 and β2 in Type I and
between β2 and β1 in Type II. KH domains are often found
in multiple copies, with some evidence that the relative
orientation of tandem repeats differs markedly between
Type I and Type II. The KH domains have been hypothesized
to have evolved from a common ancestor through N- and
C-terminal extensions or by extension, displacement, and
deletion from one of the existing topologies.
36.
LD MOTIF
CATH:
SCOP:
InterPro: (IPR001904)
Pfam:
The LD motif is a short leucine-rich sequence with the general
consensus LDXLLXXL, where X can be any residue. It
was first identified in paxillin, where the motif is repeated
five times and the conserved leucine and aspartate residues
are at the beginning of the sequence (except in the third
repeat where LD is substituted with VE), giving rise to the
name. LD motifs are highly conserved throughout the paxillin
superfamily members such as leupaxin, Hic-5, and PaxB, as
well as across a diverse set of species. The structural fragments
of LD motifs that have been solved show that the motif forms
a predominantly α-helical structure.
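The consensus LDXLLXXL can be written as a regular expression for scanning a sequence. The following is a minimal sketch rather than a validated motif finder: the function name find_ld_motifs and the toy sequence are hypothetical, and because real LD motifs can deviate from the strict consensus (as the VE substitution in the third paxillin repeat shows), an exact match will miss some genuine motifs.

```python
import re

# Strict consensus LDXLLXXL; "." stands for X (any residue).
LD_CONSENSUS = re.compile(r"LD.LL..L")


def find_ld_motifs(sequence):
    """Return (0-based start, matched text) for each strict consensus match."""
    return [(m.start(), m.group()) for m in LD_CONSENSUS.finditer(sequence)]


# Toy sequence made up for illustration; it is not a real paxillin sequence.
toy = "MAAALDELLAALGGSVEALLKKLTT"
print(find_ld_motifs(toy))
```

In the toy sequence the VE-containing segment is deliberately not reported, which illustrates why a strict pattern is better treated as a first-pass filter than as a definition of the motif.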
37.
LIM DOMAIN
CATH: 2.10.110.10
SCOP: g.39.1.3
InterPro: IPR001781
Pfam: PF00412
LIM domains, first discovered in the proteins Lin11, Isl-1,
and Mec-3, are composed of approximately 55 residues, 8
(mostly cysteine and histidine) of which are highly conserved
and located at defined intervals. This conservation
indicates that the LIM domain binds metal cofactors, and
the domain has been shown to bind two zinc ions. In fact the LIM
domain consists of two zinc fingers (see entry for Zinc
Finger Domain), each of which comprises two orthogonally packed anti-parallel β hairpins. Rubredoxin-type zinc
knuckles connect the strands of the first and third β hairpins, while the second and fourth β hairpins are connected
by tight turns containing a moderately conserved glycine.
The second of the zinc fingers is terminated by an α helix.
The secondary structure and tertiary fold are established by
the conserved tetrahedral zinc coordination. LIM domains
function as a modular protein-binding interface, mediating
protein–protein interactions.
38.
LEUCINE-RICH REPEATS (LRR) DOMAIN
CATH: 3.80.10.10
SCOP: c.10.1.1, c.10.2.1, c.10.2.2, c.10.2.3, c.10.2.4, c.10.2.6, c.10.2.7, c.10.2.8, c.10.3.1
InterPro: IPR001611
Pfam: PF00560
Leucine-rich repeats (LRRs) comprise a motif 20–30 residues
in length with a highly conserved segment consisting of
an 11-residue stretch LXXLXLXX(N/T/S/C)XL or
a 12-residue stretch LXXLXLXX(C/S)XXL, in which L is
valine, leucine, or isoleucine. Typically, each repeat unit has
a β strand–turn–α helix structure, although the helical segment
is quite variable and may be replaced by a 3₁₀ helix, a
polyproline II (pII) helix, or a β turn.
The number of repeats ranges from 2 to 45, and together they adopt
a characteristic arc or horseshoe shape with a parallel β sheet on
the concave (inner) face. The concave face and the adjacent
loops are the most common protein interaction surfaces
on LRR proteins. LRRs occur in organisms from viruses to
eukaryotes, and appear to provide a structural framework
for the formation of protein–protein interactions.
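The two consensus stretches quoted above can be expressed as regular expressions, with the consensus "L" expanded to valine, leucine, or isoleucine. This is an illustrative sketch only: the names LRR_11, LRR_12, and classify_lrr_segment are hypothetical, X is assumed to mean any residue, and the test segments are invented rather than taken from a real LRR protein.

```python
import re

# "L" in the consensus stands for Val, Leu or Ile; X is assumed to be any residue.
HYDROPHOBIC = "[LIV]"

# LXXLXLXX(N/T/S/C)XL, the 11-residue stretch
LRR_11 = re.compile(
    f"{HYDROPHOBIC}..{HYDROPHOBIC}.{HYDROPHOBIC}..[NTSC].{HYDROPHOBIC}"
)
# LXXLXLXX(C/S)XXL, the 12-residue stretch
LRR_12 = re.compile(
    f"{HYDROPHOBIC}..{HYDROPHOBIC}.{HYDROPHOBIC}..[CS]..{HYDROPHOBIC}"
)


def classify_lrr_segment(segment):
    """List which consensus stretch(es) the start of `segment` satisfies."""
    hits = []
    if LRR_11.match(segment):
        hits.append("11-residue")
    if LRR_12.match(segment):
        hits.append("12-residue")
    return hits


# Invented segments for illustration only.
print(classify_lrr_segment("LKELDLSNNQI"))
print(classify_lrr_segment("LRELYLSGCPKL"))
```

A segment with serine or cysteine at the ninth position can satisfy both patterns, which is why the function returns a list rather than a single label.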
39.
NAD(H)/NAD(P)-BINDING DOMAIN
CATH: 3.40.50.720
SCOP: c.2.1
InterPro: IPR016040
Pfam: CL0063
The nicotinamide adenine dinucleotide (NAD)-binding
domain is an ancient protein domain superfamily that is
found in all kingdoms of life. It consists of a Rossmann-like
fold with a three-layered α helix–β sheet–α helix arrangement, where the six β strands are parallel. The strands are in
the order of 6-5-4-1-2-3 with a long loop between strands 3
and 4, which creates a natural cavity that binds the adenine
ring of the NAD cofactor. An extensive network of
hydrogen bonds and van der Waals interactions gives
rise to a consensus sequence associated with NAD binding,
GXGXXG, in which the first two glycine residues are involved
in binding the NAD, while the third is involved in protein
packing. In addition to NAD, NADP can be
accommodated either by a conformational change in the
loop that connects the second strand to the second helix or
by a mutation from a conserved aspartate to an asparagine.
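As an illustration of how the GXGXXG consensus might be located in a sequence, the sketch below uses a regular expression with a lookahead so that overlapping candidates are all reported. The function name find_gxgxxg and the example sequences are hypothetical, X is assumed to mean any residue, and a match indicates only a candidate site, not demonstrated NAD binding.

```python
import re

# A lookahead captures G.G..G without consuming it, so overlapping hits are kept.
GLY_MOTIF = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(sequence):
    """Return (0-based start, matched text) for every GXGXXG occurrence."""
    return [(m.start(), m.group(1)) for m in GLY_MOTIF.finditer(sequence)]


# Invented sequences for illustration only.
print(find_gxgxxg("MKVAVIGAGSGGLALA"))
print(find_gxgxxg("GAGAGGAG"))
```

Glycine-rich stretches are common, so in practice a hit would need to be checked against the surrounding fold (for example, its position relative to the first β strand and helix).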
40.
OLIGONUCLEOTIDE/OLIGOSACCHARIDE-BINDING FOLD (OB) DOMAIN
CATH: 2.40.50.140
SCOP: b.40.4.1, b.40.4.3
InterPro: IPR004365
Pfam: CL0021
The oligonucleotide/oligosaccharide-binding fold domain
is a 70–105 residue structural motif whose variants share
little sequence similarity. The variability in length results
from dramatic differences in the size of variable loops found
between well-conserved elements of secondary structure. Like the Ig domain, the structure contains a Greek
key motif consisting of two three-stranded anti-parallel
β sheets, where one strand is shared between both sheets.
The β sheets pack orthogonally forming a flattened β barrel.
Frequently a helix is found between strands 3 and 4 that
packs against one end of the barrel. Two glycines, or other
small residues, contribute to the OB fold, one in the first
half of the first strand and one at the beginning of strand
4. OB-fold domains tend to use a common ligand-binding
interface centered on strands 2 and 3 and loops between
strands 1 and 3, strand 3 and the helix, the helix and strand
4, and strand 4 and strand 5. This defines a cleft that runs
perpendicular to the β-barrel axis and is where the majority
of nucleic acid-binding partners bind.
41.
PAS DOMAIN
CATH: 3.30.450.20
SCOP: d.110.3.1, d.110.3.2
InterPro: IPR013767
Pfam: PF00989
The PAS domain, which takes its name from three proteins—the period circadian protein, aryl hydrocarbon
receptor nuclear translocator protein, and single-minded
protein—is a sensor domain involved in signal transduction that is found in a wide range of organisms. It is formed
of a structurally conserved α/β fold with little conservation
between sequences. The structure consists of a central
six-stranded β sheet with the N- and C-terminal β strands at
the center. The domain can be divided into four segments:
the first is an N-terminal helical lariat; the second consists
of the first three strands of the central β sheet core,
interleaved with a hairpin turn and two short α helices; the
third comprises a helical connector running diagonally
across the β sheet; and the fourth consists of the last three
strands of the β sheet, which are connected by a
final hairpin turn.
42.
POLO-BOX DOMAIN (PBD)
CATH: 3.30.1120.30
SCOP: d.223.1.1, d.223.1.2
InterPro: IPR000959
Pfam: PF00659
The Polo-box domain (PBD) contains at its core a continuous six-stranded anti-parallel β sheet and an α helix.
PBDs occur as a linked homodimer whose two copies are related
by 2-fold symmetry; together they form a 12-stranded β
sandwich flanked by three α-helical segments. The domain
regulates the kinase activity of the protein it is found in,
which is located in the centrosomes, kinetochores, and central
spindle structures during mitosis and promotes mitosis
and cytokinesis by phosphorylation of a range of substrates.
The phospho-peptide binds in a cleft located between the
two PBDs, making a short anti-parallel β sheet
that stabilizes the interaction. Histidine and lysine residues, which are some of the few residues highly conserved
between Polo-box domains, are involved in interacting
with the phospho-peptide.
43.
PDZ DOMAIN
CATH: 2.30.42.10
SCOP: b.36.1.1, b.36.1.2, b.36.1.3, b.36.1.4, b.36.1.6
InterPro: IPR001478
Pfam: PF00595
The PDZ domain, which takes its name from three proteins
in which it was first observed (postsynaptic density protein,
Drosophila discs large tumor suppressor, and zonula
occludens-1 protein), is found in all kingdoms of life. It
comprises six β strands and two α helices that fold to form
a six-stranded β sandwich. PDZ domains are modular protein interaction domains that contribute to protein targeting and complex assembly. The C-terminal peptides of the
target protein bind as an anti-parallel β strand in a groove
between the second strand and the second helix, in essence
extending the β sheet. A conserved set of residues (GLGF),
found in the loop between the first and second strand, is
important for stabilizing the C-terminal carboxylate group.
The N and C termini are located on the opposite side to the
binding site, a feature shared with other protein interaction
domains such as SH2.
44.
PLECKSTRIN HOMOLOGY (PH) DOMAIN
CATH: 2.30.29.30
SCOP: b.55.1.1
InterPro: IPR001849
Pfam: PF00169
The pleckstrin homology (PH) domain is approximately
120 residues in length and commonly found as a constituent
of signaling proteins as well as proteins of the cytoskeleton.
Its basic structure consists of a seven-stranded anti-parallel
β sheet that has a strong bend, resulting in a conformation
that is referred to as an orthogonal sandwich or up-and-down
β barrel. In addition there is an amphipathic α helix
that blocks one end of the bent sheet. The loops connecting
the β strands differ greatly in length, providing the source
of the domain’s specificity to a range of phosphoinositides,
which differ by being phosphorylated at different sites
within the inositol ring. The only conserved residue among
PH domains is a single tryptophan located within the α
helix that serves to nucleate the core of the domain.
45.
PHOSPHOTYROSINE-BINDING (PTB) DOMAIN
CATH: 2.30.29.30
SCOP: b.55.1.2
InterPro: IPR013625
Pfam: PF08416
The phosphotyrosine-binding domain, as the name suggests, binds phosphotyrosine. It is structurally similar to
the pleckstrin homology domain (see entry for Pleckstrin
Homology Domain) and consists of a β sandwich containing two nearly orthogonal, anti-parallel β sheets and three
α helices. One β sheet is made up of the first four strands,
while the second is made up of the remaining strands plus
parts of the first and second strands. The phospho-peptide-binding
site is formed by the fifth strand, the C-terminal α
helix, and the loop connecting the first strand and the second α helix. The N terminus of the phospho-peptide adopts
an extended conformation, forming an additional strand to
the β sheet.
46.
RNA RECOGNITION MOTIF (RRM) DOMAIN
CATH: 3.30.70.330
SCOP: d.58.7.1, d.58.7.3
InterPro: IPR000504
Pfam: PF00076
The RNA recognition motif is one of the most abundant
eukaryotic protein domains and is commonly found in
all kingdoms of life. The RRM domain consists of an αβ
sandwich with a β1-α1-β2-β3-α2-β4 topology, comprising
one four-stranded anti-parallel β sheet and two α helices
packed against the β sheet. Four highly conserved residues
contributing to RNA binding are located in the central
strands of the β sheet, with other conserved residues making up a consensus of approximately 90 residues forming
the hydrophobic core. A common archetype of RRM–RNA
interaction is defined by two nucleotides stacking
against two aromatic rings located on the middle strands of
the β sheet. A third aromatic ring interacts with sugar rings
of the RNA and a positively charged side chain forms a salt
bridge with the phosphate between the two nucleotides.
Combinations of two or more RRM domains allow
for continuous recognition of a long nucleotide sequence.
Recent studies have shown that as well as RNA recognition the RRM domain is also involved in protein–protein
interactions.
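Topology strings such as β1-α1-β2-β3-α2-β4 can be checked mechanically against the prose description (here, four strands and two helices). The sketch below is only an illustration: the function name count_elements is hypothetical, and it assumes that elements are separated by hyphens or en dashes and that each element begins with β or α.

```python
import re
from collections import Counter


def count_elements(topology):
    """Count β strands and α helices in a topology string like 'β1-α1-β2'."""
    # Split on a plain hyphen or an en dash; drop any empty pieces.
    elements = [e for e in re.split(r"[-–]", topology) if e]
    kinds = Counter(e[0] for e in elements)
    return {"strands": kinds["β"], "helices": kinds["α"]}


print(count_elements("β1-α1-β2-β3-α2-β4"))
```

The counts say nothing about how the strands pair in the sheet; they only confirm that the element order and the stated totals agree.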
47.
S1 DOMAIN
CATH: 2.40.50.140
SCOP: b.40.4.16
InterPro: IPR003029
Pfam: PF00575
The S1 domain is an approximately 70–80 residue domain
discovered in ribosomal S1 protein and found in a range
of RNA-binding proteins, especially those associated with
the initiation of translation and turnover of mRNA. The S1
domain consists of a five-stranded anti-parallel β barrel,
with the β strands arranged in a Greek key topology and a
β bulge in the first strand permitting the formation of the
barrel. Some of the connecting loops contain very small sections of α helix. The termini are orientated such that arrays
of S1 domains can be arranged in a consecutive fashion.
Residues on the surface of the domain are not strictly conserved, reflecting the varied specificity of RNA binding of
the S1 domain. Structural similarity to cold shock domain
family proteins, at least one other ribosomal protein, and
domains of several aminoacyl-tRNA synthetases indicates
that they all diverged from an ancient nucleic acid-binding
domain.
48.
Src HOMOLOGY 2 (SH2) DOMAIN
CATH: 3.30.505.10
SCOP: d.93.1.1
InterPro: IPR000980
Pfam: PF00017
The Src homology 2 (SH2) domain consists of approximately 100 residues with two α helices and seven β strands,
with five of the β strands forming a central β sheet flanked
by the two α helices with the remaining two β strands at the
N and C termini. Unlike the SH3 and PDZ domains, SH2
domains function specifically in protein tyrosine kinase
pathways, because their binding depends on tyrosine
phosphorylation (pTyr). Two regions mediate this specificity:
the first is the phosphorylated tyrosine residue-binding
site; the second is a region that interacts with ligand residues
C-terminal to the pTyr. Most of the binding interactions
occur in the loop between β strands two and three. In
addition, further interactions occur in a deep hydrophobic
binding pocket that interacts with the residue three positions
C-terminal to the pTyr. SH2 domains are found in a wide variety
of protein contexts and frequently occur as repeats in a single
protein sequence.
49.
Src HOMOLOGY 3 (SH3) DOMAIN
CATH: 2.30.30.40
SCOP: b.34.2.1
InterPro: IPR001452
Pfam: PF00018
The Src homology 3 (SH3) domain is approximately half
the size of the SH2 domain, consisting of around 50 residues. The structure is composed of five or six β strands
arranged as two tightly packed anti-parallel β sheets, which
are arranged in a barrel-like structure. The linker regions
between the strands may contain short α helices. The
domain contains a relatively flat hydrophobic ligand-binding
pocket consisting of three shallow grooves defined by
conserved aromatic residues, in which the protein ligand
adopts an extended left-handed helical arrangement. The domain
recognizes proline-rich sequences, in particular those
containing a PXXP motif. Recent studies have shown that the
specificity and cellular function of SH3 domains are far
more diverse than previously appreciated. Like SH2
domains, SH3 domains are found in the context of other domains and
may mediate many diverse processes such as increasing
local concentration of proteins, altering their subcellular
location, and mediating the assembly of large multiprotein
complexes.
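Candidate SH3 ligands are often screened for the PXXP core mentioned above. As a hedged sketch, the following scans a sequence for every PXXP occurrence, including overlapping ones. The function name find_pxxp and the toy peptide are hypothetical, and a PXXP match alone does not establish binding, since the text notes that SH3 specificity is more diverse than this motif suggests.

```python
import re

# Lookahead so that overlapping P..P cores in proline-rich stretches are all found.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence):
    """Return (0-based start, matched text) for each PXXP occurrence."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence)]


# Invented peptide for illustration only.
print(find_pxxp("AAPPLPPRAA"))
```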
50.
SPECTRIN-LIKE REPEATS
CATH: 1.20.58.60
SCOP: a.7.1.1
InterPro: IPR002017
Pfam: PF00435
Cytoskeletal proteins of the spectrin family have an elongated structure composed of repeating units termed the
spectrin-like repeat. This unit comprises a triple helical structure, with three long helices separated by a loop
between the first and second helix and a turn between the
second and last helix. There is little sequence similarity
between repeats, although some residues are more highly
conserved and correspond to a set of residues between the
a and d heptad positions in the helical bundle. The repeats
are defined by a characteristic tryptophan residue in the
first helix and a leucine two residues from the carboxyl
end of the third helix. The second helix is interrupted by
proline in some sequences.
51.
UBIQUITIN FOLD
CATH: 3.10.20
SCOP: k.45.1.1
InterPro: IPR000626
Pfam: PF00240
The ubiquitin domain has an overall topology of an α/β roll
consisting of five β strands and two α helices in the order
β1–β2–α1–β3–β4–α2–β5. Some members of the superfamily
have additional decorations to the basic fold, including
a small helix or an additional strand before
the first principal helix, or replacement of the second helix with a
β strand. In addition, the length of the connecting loops
varies considerably, which might modulate the interaction with other
domains or proteins. Ubiquitin is a small domain of just 76
residues, with poor sequence conservation between superfamily
members except for seven lysine residues, which are
important for linking to the target protein or to another
ubiquitin molecule to form a ubiquitin chain attached
to a target protein. The length and branching pattern of the
ubiquitin chain alter the fate of the target protein.
52.
VON WILLEBRAND FACTOR TYPE A DOMAIN
CATH: 3.40.50.410
SCOP: c.62.1.1, c.62.1.2, c.62.1.4
InterPro: IPR002035
Pfam: PF00092
von Willebrand factor is a large multimeric protein that
has a central role in hemostasis and thrombosis. It comprises a mosaic of many types of domains termed type A
to D, with type A found in a number of other proteins such
as complement factor B, the integrins, and collagen types
VI, VII, XII, and XIV among others. All von Willebrand
factor type A domains share a common fold of a central β
sheet with one anti-parallel edge strand flanked on two
sides by amphipathic helices that lie against the β sheet face
forming a globular domain. Often a Mg²⁺ ion is bound to the
carboxy-terminal end of the β sheet. In von Willebrand
factor domains with a bound metal ion, the metal coordinates
residues in loops that constitute what is termed a metal
ion-dependent adhesion site (MIDAS) and is involved in
allosteric movement of the C-terminal α7 helix from the
ligand-binding face at the MIDAS to the opposite end of
the domain, which contacts other domains.
53.
WD40 REPEAT DOMAIN
CATH: 2.130.10.10
SCOP: b.69.4.1
InterPro: IPR001680
Pfam: PF00400
The WD40 repeat is an approximately 40-residue tract
characterized by a glycine-histidine (GH) dipeptide 11–24
residues from the N terminus and a tryptophan-aspartic
acid (WD) dipeptide found at the C terminus, which also
gives rise to its name. The WD domains contain 4–16 WD
repeats, which circularize to form a β propeller (see entry
for β-Propeller Fold). Each repeat contains four β strands,
although the repeat structure is not equivalent to a single
propeller blade. In fact each propeller blade contains the first
three strands of one repeat and the last strand of the previous
repeat. This sharing of strands stabilizes the
overall structure. The evolutionary pressure to conserve
the sequence is apparently to form the propeller structure,
as it provides a stable platform for several protein–protein
interactions.
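The offset between repeats and blades described above can be made concrete with a small sketch. It assumes the reading that each blade takes the first three strands of one repeat plus the last strand of the preceding repeat, and that the propeller closes on itself so that the first blade borrows from the final repeat. The function name blade_composition, the 0-based repeat indices, and the order of strands listed within a blade are all hypothetical choices made for illustration.

```python
def blade_composition(n_repeats):
    """Map each blade to the (repeat index, strand number) pairs it contains.

    Each blade is given strands 1-3 of its own repeat plus strand 4 of the
    previous repeat; the modulo wraps the first blade round to the last
    repeat, reflecting circular closure of the propeller.
    """
    blades = []
    for i in range(n_repeats):
        previous = (i - 1) % n_repeats
        blades.append([(previous, 4), (i, 1), (i, 2), (i, 3)])
    return blades


# A hypothetical seven-repeat domain.
for index, blade in enumerate(blade_composition(7)):
    print(index, blade)
```

Every strand is assigned to exactly one blade, yet no blade is built from a single repeat, which is the sense in which strand sharing ties neighbouring repeats together.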
54.
WINGED-HELIX DOMAIN (WHD)
CATH: 1.10.10.10
SCOP: a.4.5
InterPro: IPR011991
Pfam: CL0123*
The winged-helix domain is an elaboration of the common
helix-turn-helix domain, a common denominator in basal
and specific transcription factors found in all kingdoms of
life. It comprises a three-helix core in the form of a right-handed
helical bundle with a partly open configuration.
This core is embellished with a C-terminal β strand hairpin
unit that packs against the shallow cleft of the partially open
core. Two extensions between the last helix and the first of
the C-terminal β strands and between this β strand and the
last β strand form the two ‘wings.’ The wings can play an
important part in the interaction with DNA, although this
is not considered to be the major interaction site, which is
on the third helix. As well as interacting with DNA it can
form protein–protein interactions.
* This represents the general helix-turn-helix clan, of which the winged-helix domain is an example.
55.
ZINC FINGER DOMAIN
CATH: 3.30.160.60
SCOP: g.37.1.1
InterPro: IPR007087
Pfam: PF00096
Note: References relate to the ‘classic’ zinc finger motif (C2H2).
Several different zinc finger motifs have been characterized,
and vary with regard to structure, as well as binding modes
and affinities. This entry relates to the classic (C2H2) motif,
which is the most common DNA-binding motif found in
eukaryotic transcription factors. It contains two conserved
cysteine and two conserved histidine residues that coordinate
the zinc ion. Generally zinc fingers contain few secondary
structural elements, with C2H2 zinc fingers consisting of
just two short β strands followed by an α helix. Individual
zinc finger domains typically occur as tandem repeats with
two, three, or more fingers comprising the DNA-binding
domain of the protein; they can bind in the major groove of
DNA and are typically spaced at 3-base pair intervals. The
α helix contains the recognition site for sequence-specific
contacts with the DNA.
56.
ZINC RIBBON DOMAIN
CATH: 2.20.25.10
SCOP: g.41.3
InterPro: IPR013137
Pfam: CL0167
Unlike previously characterized zinc-containing DNA/
RNA-binding modules, which contain an α helix, the zinc
ribbon domain comprises a three-stranded β sheet. The
domain is similar to the other zinc-containing DNA/RNA-binding
modules (see Zinc Finger Domain) inasmuch as
it forms a small globular domain stabilized by the coordination
of a zinc ion, with a well-defined secondary structure,
a nonpolar interior, and a charged surface. The three β
strands form an anti-parallel β sheet with the connecting β
turns containing a cysteine-2X-cysteine motif, the cysteines of
which are involved in the zinc ion coordination. Homologous
proteins containing the domain have been found in all
kingdoms of life. In Eukaryota the domain is found in the
N-terminal region of transcription initiation factor TFIIB,
which binds to the TATA-binding protein-promoter complex and facilitates the recruitment of RNA polymerase II.