CHAPTER 3
STRUCTURAL ELEMENTS OF THE PROTEINS
The secondary structure elements of proteins can be divided into three main types (Fig. 1):
• α-helix
• β-strands
• loops
Figure 1. Secondary structures. A schematic representation of the secondary structure elements gives a clear and
quick overview of a protein's 3D structure: helices = cylinders or spirals, strands = arrows pointing N→C,
loops = ribbons. This representation has also made it possible to identify the super-secondary structures and the
structural motifs found in proteins.
In a protein, peptide segments of various lengths fold to form regular structures. These
structures are widespread because they are stable: they minimize steric repulsion and
maximize the number of hydrogen bonds that can form.
The most common secondary structure is the α-helix, followed by the β-strand; β-strands usually
interact with one another to form a β-sheet. An α-helix is typically found on the outer face of the
protein, with one side facing the solvent and the other facing the protein interior.
In this case the helix is called amphipathic, and its primary sequence shows a regular
alternation of hydrophobic and hydrophilic amino acids.
The β-sheet is a stable structure made of several strands that can be parallel or antiparallel,
depending on the relative direction of the strands.
α-helix
The α-helix is the most frequent fold that a peptide chain adopts, as shown by the fact that
it is the most common secondary structure element in proteins. The α-helix is a regular
structure characterized by well-defined parameters. It forms when a run of successive
dihedral angle pairs ɸ and ψ take values around -60° and -50°. The peptide planes are then
arranged helically around a longitudinal axis. The α-helix has a pitch of 5.4 Å, and each
turn of the helix contains 3.6 residues (Fig. 2).
Figure 2. Schematic representation of α-helix structure highlighting the hydrogen bonds.
The high stability of this conformation comes from the fact that the NH and CO groups of the
peptide backbone are involved in hydrogen bonds (except those at the ends of the helix). Each
hydrogen bond forms between the oxygen of the CO group of one residue and the hydrogen of the
NH group of the residue four positions further along the chain. The
direction of the hydrogen bond is almost parallel to the helix axis. In proteins, the α-helix is
usually right-handed because the amino acids have the L configuration; in a left-handed
helix the side chains R would be too close to the CO groups, destabilizing the structure. The
side chains R of the amino acids all point towards the outside of the helix. The physicochemical
characteristics of these groups influence how α-helices pack against one another
to build the tertiary structure of the protein. Some amino acids are considered
“good starters” of the α-helix; others, such as proline, can destabilize the helix and distort it.
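The helix parameters quoted above can be turned into a few derived quantities. The following is a minimal sketch that uses only the pitch (5.4 Å) and the number of residues per turn (3.6) given in the text, and assumes an ideal, regular helix; the function names are illustrative.

```python
# Helix parameters quoted in the text.
PITCH_ANGSTROM = 5.4      # advance along the helix axis per full turn
RESIDUES_PER_TURN = 3.6   # residues contained in one turn


def rise_per_residue() -> float:
    """Axial advance contributed by each residue, in angstroms."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN


def rotation_per_residue() -> float:
    """Rotation about the helix axis from one residue to the next, in degrees."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues: int) -> float:
    """Approximate axial length of an ideal helix with n_residues residues."""
    return n_residues * rise_per_residue()


def backbone_hbonds(n_residues: int) -> list[tuple[int, int]]:
    """Pairs (i, i + 4): the C=O of residue i is bonded to the N-H of residue i + 4.

    Residues are numbered from 1; only pairs that fit inside the helix are listed.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


if __name__ == "__main__":
    print(round(rise_per_residue(), 2))       # 1.5
    print(round(rotation_per_residue(), 1))   # 100.0
    print(backbone_hbonds(8))                 # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

Each residue therefore advances the helix by about 1.5 Å and rotates it by about 100°. The list of pairs also shows why the first and last residues of a helix have backbone groups without a partner.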
β-sheet
Like the α-helix, the β-strand has a regular conformation. Segments of the
peptide chain in the β conformation are extended, with the peptide planes arranged in a
“zig-zag”. The side chains of the residues are oriented perpendicular to the median plane of the
structure and point alternately to one side and the other of that plane (Fig.
3).
Figure 3. Schematic representation of the β-strand.
In proteins, two or more β-strands usually line up side by side and form hydrogen bonds
with one another, generating an extended, pleated structure called a β-sheet. In β-sheets the
hydrogen bonds form between neighbouring strands (Fig. 4) rather than within the same
segment as in the α-helix. β-sheets are generally not planar but tend to adopt a curved,
slightly twisted shape.
Figure 4. Schematic representation of the parallel β-sheet.
Antiparallel β-sheets are the most stable. Their high stability is due to the formation of
linear hydrogen bonds between the main chains. Figure 5 shows that the CO and
NH groups are aligned: the donor-hydrogen-acceptor angle is close to 180°, and the
hydrogen bond is very stable.
Figure 5. Top: antiparallel β-sheet; bottom: parallel β-sheet.
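The linearity of a hydrogen bond can be checked numerically from atomic coordinates. The sketch below measures the angle at the hydrogen between the donor nitrogen and the acceptor oxygen. The coordinates in the example are hypothetical values chosen only to contrast a linear and a bent geometry; they are not taken from any structure.

```python
import math

Vec = tuple[float, float, float]


def angle_deg(donor: Vec, hydrogen: Vec, acceptor: Vec) -> float:
    """Angle donor-H...acceptor, measured at the hydrogen atom, in degrees."""
    v1 = [d - h for d, h in zip(donor, hydrogen)]      # H -> donor
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]   # H -> acceptor
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to protect acos from rounding errors.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms: N, H and O on one line.
    linear = angle_deg((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
    # Same donor and hydrogen, acceptor moved off the N-H axis.
    bent = angle_deg((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 1.0, 0.0))
    print(round(linear, 1), round(bent, 1))
```

In the linear case the angle is 180°, the geometry described for antiparallel sheets; any displacement of the acceptor off the N-H axis gives a smaller angle.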
Parallel β-sheets are less stable because their hydrogen bonds are not linear. Structures with
only parallel β-sheets are therefore relatively rare. To reach a high degree of stability,
parallel β-sheets must adopt particular conformations, whereas β-barrel structures
made of antiparallel β-strands are among the most frequent architectures.
Loops
In addition to the two regular elements described above, proteins contain peptide segments of
variable length that are apparently disorganized. These segments, called loops,
connect α-helices or β-strands and play an important role in organizing the 3D
structure of the peptide chain. They are relatively flexible and allow the chain to change
direction between segments in α and β conformations.
Short loops of 3-5 residues that connect two consecutive antiparallel β-strands
(β-turns) are very common in proteins (Figure 6).
Figure 6. Left: hairpin loops in many different proteins. Right: the two types of hairpin
loop most frequently found in protein structures.
Moreover, loops are often involved in forming the active site of enzymes. Glycine and proline
are very frequent in loop regions; their effects on the
conformation of the chain have been described in previous chapters.
Because secondary structures are connected by loops of different lengths, it is possible to
define the concept of topology: the way in which the different secondary structure
elements are connected to one another.
Topological diagrams
Topological diagrams are very useful for representing the connections between the
secondary structure elements of a protein. For example, β-strands can be connected in several
topologies. Identifying the topology of a protein is important because two proteins can have
the same fold, and therefore a comparable three-dimensional structure, only if they have the
same topology. Figure 7 shows three types of β-sheet with different connections.
Figure 7. Topological diagrams of some β-sheets:
a) β-sheet with 4 antiparallel strands
b) β-sheet with 5 parallel strands
c) β-barrel structure with 8 antiparallel strands
Diagram a) shows a simple connection called “up and down”, in which the C-terminal end of one
β-strand is linked to the N-terminal end of the next strand through a short loop. Diagram b)
shows parallel β-strands connected by long loops. Because the loops are long, an α-helix can
be found between one strand and the next, and this type of connection is called βαβ.
Diagram c) illustrates a mixed connection with hairpin connections (left and right) and a
Greek key motif (center).
Structural motifs
The combination of α and β structures gives rise to simple structural motifs that, once
assembled, can generate complex three-dimensional structures. In general, a complex
protein structure can be described as a sum of basic structural motifs (Fig. 8). The
most frequent structural motifs are:
1. helix-loop-helix: found in many Ca2+ binding proteins (calmodulin, parvalbumin,
troponin C) and in DNA binding proteins.
2. β hairpin: two antiparallel β-strands held together by a short loop of 2-5 residues.
3. Greek key: at least four β-strands, two short loops and one long loop are needed to
generate this motif.
4. β-α-β: two parallel β-strands connected through one α-helix.
Figure 8. Representation of some frequent structural motifs.
Helix-loop-helix motif
Figure 9 shows the helix-loop-helix motif specific for DNA binding (left) and for Ca2+
binding (right).
Figure 9. Two types of helix-loop-helix motif. Left: a typical DNA binding motif. Right: a
typical Ca2+ binding motif.
The Ca2+ binding motif was first identified in parvalbumin. It is also called the
“EF hand” because the E and F helices are the regions of the protein used to describe the
Ca2+ binding site (Fig. 10).
Figure 10. Schematic representation of the Ca2+ binding motif.
Comparing the motif with a hand, helix E runs along the index finger, starting from its base;
the 12-residue loop is represented by the middle finger; and helix F runs from the base of the
thumb toward its tip. The Ca2+ ligands are generally located on the loop
connecting the two helices, which are almost perpendicular to each other.
The motif includes two helices, E and F, flanking a loop of 12 contiguous residues; five of
these residues bind the Ca2+. Their side chains carry oxygen atoms, the most favourable ligands
for Ca2+, which has a high coordination number (between 6 and 8). The side chains of
aspartate and glutamate residues are preferred. The sequences of EF motifs, reported in
figure 11, show conserved positions that define a consensus sequence: residue
6 must be a glycine; the Ca2+ binding amino acids (shown in orange) must have side
chains that can act as ligands; and the residues forming the hydrophobic core are
shown in green.
Figure 11. Consensus sequences of the EF hand motifs in three different proteins.
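The constraints on the loop can be written as a simple check. The sketch below tests only what the text states: a loop of 12 residues, a glycine at position 6, and several residues whose side chains carry oxygen atoms. The set of oxygen-bearing residues and the example loop are assumptions made for illustration; this is not a validated EF hand detector.

```python
# One-letter codes of residues whose side chains carry oxygen atoms able to
# coordinate Ca2+. The exact membership of this set is an assumption.
OXYGEN_LIGANDS = set("DENQST")


def check_ef_loop(loop: str) -> dict:
    """Report how a candidate loop compares with the constraints in the text."""
    loop = loop.upper()
    return {
        "length_is_12": len(loop) == 12,
        "gly_at_6": len(loop) >= 6 and loop[5] == "G",   # position 6, 1-based
        "oxygen_side_chains": sum(res in OXYGEN_LIGANDS for res in loop),
    }


if __name__ == "__main__":
    # Illustrative 12-residue loop, used here only as example input.
    print(check_ef_loop("DKDGDGTITTKE"))
```

A candidate would be considered compatible when the first two entries are true and the count of oxygen-bearing side chains is at least five, the number of Ca2+ binding residues given above.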
Figure 12 shows the helix-loop-helix motif found in transcription factors that interact with
DNA. The recognition helix, shown in red, contains positively charged residues that
interact with DNA; the other helix has a structural role.
Figure 12. Typical DNA binding helix-loop-helix motif.
β hairpin
The β hairpin is the simplest structural motif formed by β-strands. It consists of two adjacent
antiparallel β-strands joined by a loop of 2 to 5 residues. This motif
can occur in isolation or as part of more complex β-sheet structures. Figure 13 illustrates
this: in bovine trypsin inhibitor the β hairpin occurs as an isolated motif, whereas in
erabutoxin, a toxin found in snake venom, the hairpin is part of a more complex β-sheet
formed by two β hairpins and one β-strand.
Figure 13. Left: bovine trypsin inhibitor. Right: the two hairpin motifs present in erabutoxin.
Greek key motif
The Greek key motif is believed to have arisen from a modification of the β hairpin,
specifically from a long hairpin that subsequently folded back on itself in its central part.
This is why a β-strand is paired with another β-strand three strands away (Fig. 14). Many
proteins contain this motif, especially in antiparallel β-barrels.
Figure 14. Representation of the Greek key motif.
β-α-β motif
The β-α-β motif connects two parallel strands: it consists of
two parallel β-strands connected by one α-helix and two short loops (Fig. 15).
Figure 15. β-α-β motif
In this motif, the α-helix connects the carboxyl end of one strand to the amino end of the next
strand. The helix packs closely against the two strands through hydrophobic interactions.
Several motifs connected together give rise to complex protein structures. The loop linking the
carboxyl terminus of the β-strand to the amino terminus of the α-helix is often involved in
forming the active site. The β-α-β motif almost always has a right-handed connection, which
places the α-helix correctly above the plane formed by the two strands (Fig. 16). All known
proteins have a right-handed β-α-β motif except subtilisin.
Figure 16. Right-hand connection (a) and left-hand connection (b) of β-α-β motif.
The protein domains
Structural motifs and secondary structures assemble with one another in preferred ways.
This section presents the preferred assemblies formed by α structures, α-β structures and
β structures.
α domain structures
Coiled coil (supercoiled helices)
α-helices can be arranged in different ways, but one of the most frequent arrangements is
formed by supercoiled parallel helices. When two α-helices supercoil around each other, the
number of residues per turn changes from 3.6 to 3.5. The helices then show a “heptad repeat”
sequence, in which a leucine occurs every seven residues. In figure 17, the seven
amino acids are labelled a, b, c, d, e, f and g, where d is a leucine.
Moreover, the helices face each other every 3.5 residues, and the position at this interface is
always occupied by a non-polar amino acid, usually a valine.
Figure 17. Scheme and amino acid sequence of two supercoiled helices.
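The heptad lettering can be reproduced by assigning the labels a to g cyclically along a sequence. The sketch below does this and extracts the residues at positions a and d, assuming (as the text suggests) that these are the two non-polar interface positions. The example sequence is hypothetical and was built only so that valine falls on a and leucine on d.

```python
HEPTAD = "abcdefg"

# Seven residues span two turns of a supercoiled helix: 7 / 2 = 3.5 per turn.
RESIDUES_PER_TURN_COILED = len(HEPTAD) / 2


def heptad_register(sequence: str, start: str = "a") -> list[tuple[str, str]]:
    """Pair each residue with its heptad letter, starting at position `start`."""
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]


def interface_residues(sequence: str, start: str = "a") -> list[str]:
    """Residues falling on the a and d positions of the heptad."""
    return [res for res, pos in heptad_register(sequence, start) if pos in "ad"]


if __name__ == "__main__":
    hypothetical = "VKQLEDKVKQLEDK"   # two made-up heptads, not a real protein
    print(RESIDUES_PER_TURN_COILED)            # 3.5
    print(interface_residues(hypothetical))    # ['V', 'L', 'V', 'L']
```

Positions a and d are alternately three and four residues apart, which averages to the 3.5 residues per turn quoted for supercoiled helices.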
The interactions between the supercoiled helices are strengthened by hydrophobic contacts and
by electrostatic interactions between the amino acids located near the hydrophobic residues.
These amino acids, such as the residues in positions g and e (Fig. 18), carry opposite
charges, which strengthens the interaction between the helices.
Figure 18. Packing of the residues involved in supercoiled helices and role of the
electrostatic interactions.
α helical bundle
The α helical bundle is often found in domains consisting of α-helices. It consists of two
pairs of antiparallel helices packed against each other. In these domains the helices are
strongly amphipathic, as shown in Figure 19: the internal region is strongly hydrophobic, while
the external region is hydrophilic.
Figure 19. Scheme of the α helical bundle.
The fold is extremely stable because, in addition to the hydrophobic interactions at the
interface of the four helices, each helix has its own intra-helix hydrogen bonds. The α helical
bundle occurs as a single domain in monomeric proteins, but it can also be formed by dimers or
tetramers. Interestingly, the α helical bundle is found in proteins with completely
unrelated functions, for example cytochrome b562 and human growth hormone, as
illustrated in figure 20.
Figure 20. Cytochrome b562 and human growth hormone.
Figure 21 shows the same domain in Rop, a dimeric protein. Here each monomer
consists of two antiparallel helices, and the two monomers pack together to form the α helical
bundle. The structural architecture is identical, but whereas in the previous examples the α
helical bundle was formed by a single polypeptide chain, here the two subunits together form
the domain.
Figure 21. α helical bundle in the protein Rop
Globin fold
The globin fold is one of the most important α-helical structures. It consists
of a bundle of eight helices, named A-H, connected by short loops and arranged to form a
hydrophobic pocket that holds the active site, the heme group. The length of the helices
varies: the longest is helix H, with approximately 28 residues, and the shortest is helix C,
with about 7 residues (Fig. 22).
Figure 22. Globin structural domain.
The helices that pack against each other are not adjacent in the sequence, except for the last
two (G and H). The domain cannot be decomposed into simple structural motifs but can be
described as helices wrapped around a central core in different directions. This type of
fold is present in many proteins with related functions, such as myoglobin, phycocyanins and
hemoglobins.
α/β domain structures
The β-α-β motif is a simple motif that can generate three different classes of structure: the
TIM barrel, the open β-sheet and the horseshoe (Fig. 23).
Figure 23. TIM barrel, open β sheet and the horseshoe structure.
In the TIM barrel the α-helices lie outside a barrel made of parallel β-strands. In the
open β-sheet the strands are twisted relative to one another and the α-helices are on both
sides of the sheet. The third class consists of leucine-rich sequences in which the β-strands
form a curved β-sheet that is partially shielded from the solvent by α-helices. The
helices are located on only one side of the sheet and the structure remains open, which gives
it the name horseshoe.
The β-α-β motif is the common building block of these three classes. The structural diversity
arises from the different ways the motifs are connected. Two β-α-β motifs can be joined in two
ways to form a β-sheet of four parallel strands (Fig. 24): strand β3 can be adjacent to strand
β2, giving the strand order 1234, or adjacent to strand β1, giving the order 4312.
The β-α-β motif is always right-handed, so in the first case the helices are all on the same
side of the sheet, producing the TIM barrel or horseshoe structures. In the second case,
aligning the strands requires rotating the second motif, which puts the helices on both sides
of the sheet and forms an open β-sheet (Fig. 24).
Figure 24. Connection type of two β-α-β motifs.
TIM barrel
The TIM barrel has more structural constraints than the open β-sheet, which in theory can be
extended indefinitely. The TIM barrel is very frequent in proteins because of its high
stability. It has a defined number of β-strands, generally
eight, which act as the staves of a closed barrel surrounded by α-helices. It
is one of the largest and most regular domain structures and requires about 200 amino acids.
The central part of the β-sheet is composed entirely of hydrophobic amino acids, packed closely
against the hydrophobic side chains of the helices that face the β-strands, while the outer
faces of the helices are hydrophilic (Fig. 25).
Figure 25. The TIM barrel structure and strands sequence.
In the interactions between the α-helices and β-strands and in the hydrophobic core of the
barrel, the residues Val, Ile and Leu predominate, accounting for approximately 40% of the
amino acids. This type of protein topology clearly illustrates the division between the
structural region and the active region: the barrel is the structural part, while the active
site is always located in the loops connecting the C-terminal end of a β-strand to the
N-terminal end of the following α-helix (Fig. 26). In general, proteins have a structural core
that is separate from the active site, and the TIM barrel follows this rule.
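The share of Val, Ile and Leu in a set of residues is easy to compute. The sketch below is a minimal helper for that calculation; the input string in the example is made up and stands in for the residues of a barrel core, which would have to be selected from a real structure beforehand.

```python
def val_ile_leu_fraction(residues: str) -> float:
    """Fraction of residues (one-letter codes) that are Val, Ile or Leu."""
    residues = residues.upper()
    if not residues:
        raise ValueError("at least one residue is required")
    return sum(res in "VIL" for res in residues) / len(residues)


if __name__ == "__main__":
    # Made-up residue selection, for illustration only.
    print(val_ile_leu_fraction("VILAG"))   # 0.6
```

A value near 0.4 for the residues at the helix-strand interface and in the barrel core would match the proportion quoted in the text.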
All TIM barrel structures have an enzymatic function. In some cases the barrel makes up the
entire protein, while in others it is one domain of a multi-domain protein. An example is
pyruvate kinase (Fig. 26), which folds into several domains, one of which is a TIM barrel. In
multi-domain proteins the enzyme activity is always associated with the TIM barrel.
Figure 26. Pyruvate kinase structure and position of the active site in TIM barrel.
Open α-β sheet
Open sheet structures have α-helices on both sides of the sheet. Such a structure
can never close into a barrel: it could do so only if the β-strands
enclosed the α-helices on one face of the sheet. Moreover, there are always two adjacent
β-strands whose connections to the next strand lie on opposite sides of the sheet. In this
region the connections change direction, and the active site is always located here, i.e. near
the C-terminal ends of the β-strands. Figure 27 shows how strand 1 is connected
to strand 2 through an α-helix and how strand 4 is connected to strand 5, again through an
α-helix. At the point where the connection reverses (i.e. where the C-terminal ends are
located) there is a small cavity where the active site can be found. Another feature is that
the α-helices always pack tightly against the sheet through hydrophobic interactions.
Figure 27. Position of the active site in the open sheet structure.
The following examples show how the position of the active site can be identified from
the protein topology (Fig. 28 and Fig. 29).
Figure 28. Flavodoxin and adenylate kinase structure and position of the clefts where the
active site is located.
Figure 29. Hexokinase and phosphoglycerate mutase structure and position of the clefts where the active site is
located.
Horseshoe structure
The last α-β structure is the horseshoe. Figure 30 shows the horseshoe scheme:
the architecture resembles the TIM barrel, because all the α-helices are located
on the same side of the β-sheet, but the structure is open and takes the typical shape of a
horseshoe.
Figure 30. Horseshoe structure domain.
In this structure the number of β-strands is greater than 8, and the main feature is the
presence of many leucine residues. These motifs are therefore also called leucine-rich motifs:
the β-strand, the α-helix and the loop contain many leucines that interact in the interior of
the structure, forming a strong hydrophobic core that stabilizes it (Fig. 31). The leucine
residues at positions 2, 5, 7, 12, 17, 20 and 24 are generally invariant and therefore form a
consensus sequence that allows the horseshoe domain to be identified.
Figure 31. Interaction of leucine residues.
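The invariant leucine positions can be used as a simple screen for a candidate repeat. The sketch below checks which of the positions listed in the text (2, 5, 7, 12, 17, 20, 24) carry a leucine. The positions are counted from 1 within a single repeat, and the example repeat is synthetic: it was built with leucines at exactly those positions and a placeholder residue elsewhere.

```python
# Invariant leucine positions given in the text, counted from 1 within a repeat.
LEU_POSITIONS = (2, 5, 7, 12, 17, 20, 24)


def leucine_matches(repeat: str) -> list[int]:
    """Consensus positions of `repeat` that are occupied by a leucine."""
    repeat = repeat.upper()
    return [p for p in LEU_POSITIONS if p <= len(repeat) and repeat[p - 1] == "L"]


def matches_consensus(repeat: str) -> bool:
    """True when every consensus position carries a leucine."""
    return leucine_matches(repeat) == list(LEU_POSITIONS)


if __name__ == "__main__":
    # Synthetic repeat: L at the consensus positions, A (placeholder) elsewhere.
    synthetic = "".join("L" if i in LEU_POSITIONS else "A" for i in range(1, 25))
    print(synthetic, matches_consensus(synthetic))
```

A practical screen would probably tolerate conservative substitutions at some positions; that refinement is left out here because the text describes the leucines only as generally invariant.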
Antiparallel β domain structures
In antiparallel β structures, the β-strands are usually arranged in two β-sheets
packed against each other, creating a distorted barrel that forms the core of the molecule.
However, the barrel is not the only structure formed by antiparallel β-strands. Depending on
how the β-strands are connected to each other, these structures can be divided into:
Up and down structures. This type of connection is very frequent in barrel-shaped structures
of 8 β-strands, in which each strand is connected to the next by a short loop (for
example, retinol binding protein). Proteins with this topology generally bind bulky,
hydrophobic ligands inside their structure.
Greek key structures. Here too the strands form a barrel. This topology is found in
immunoglobulins and in many enzymes.
Jelly roll structures. These are found in various macromolecules, including viral coat
proteins and the hemagglutinin of influenza virus.
Beta barrel structure
In “all β” proteins the β barrel appears to be the most stable structure. It usually consists
of 8 antiparallel β-strands. Eight is the ideal number for a barrel, since it gives the most
compact packing. However, barrels with a different number of β-strands exist. β barrels can
have different topologies and, consequently, different connections. Figure 32 shows a typical
barrel, in which the eight strands form the scaffold while the loops accommodate the active
site.
Figure 32. Superoxide dismutase barrel structure (Cu,Zn) with eight antiparallel strands.
The Greek key motif is a frequent topology in β barrel structures; in it,
strand n is connected with strand n - 3 or n + 3 (Fig. 33).
Figure 33. Greek key motif in an antiparallel barrel domain.
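One way to read the n ± 3 rule is in terms of which strands sit next to each other in the sheet. The sketch below assumes, as an interpretation rather than something the text spells out, that a sheet is described by listing its strands in spatial order, each numbered by its position along the chain. Neighbouring strands one position apart in the chain are then labelled as hairpin connections and those three apart as Greek key connections. The function names and the example order are illustrative.

```python
def neighbour_pairs(order: list[int], closed: bool = False) -> list[tuple[int, int]]:
    """Spatially adjacent strand pairs; `closed` adds the pair that closes a barrel."""
    pairs = list(zip(order, order[1:]))
    if closed and len(order) > 2:
        pairs.append((order[-1], order[0]))
    return pairs


def classify_neighbours(order: list[int], closed: bool = False) -> dict:
    """Group adjacent strand pairs by their separation along the chain."""
    result = {"hairpin": [], "greek_key": [], "other": []}
    for a, b in neighbour_pairs(order, closed):
        pair = (min(a, b), max(a, b))
        gap = abs(a - b)
        if gap == 1:
            result["hairpin"].append(pair)
        elif gap == 3:
            result["greek_key"].append(pair)
        else:
            result["other"].append(pair)
    return result


if __name__ == "__main__":
    # Four strands in the spatial order 4-1-2-3: strand 4 lies next to strand 1.
    print(classify_neighbours([4, 1, 2, 3]))
```

For the spatial order 4-1-2-3, strands 1-2 and 2-3 form hairpins while strands 1 and 4, three positions apart in the chain, form the Greek key pairing.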
The up and down structure is another topology often found in these proteins: the C-terminal
end of one strand is connected to the N-terminal end of the next strand, and so on. Figure 34
shows, for example, the structure of retinol binding protein; in this case the active site is
located inside the barrel itself.
Figure 34. Barrel structure of retinol binding protein.
The active site is formed by hydrophobic amino acids from the β-strands. Two sheets
pack against each other to form an antiparallel barrel. In Figure 34, strands 1, 2, 3, 4, 5
and 6 form one sheet while strands 1, 8, 7, 6 and 5 form the second. Strands 1, 5 and 6
contribute to both sheets.
Another example of a protein with an up and down topology is
neuraminidase. The overall structure is complex because the protein is a tetramer (Fig. 35),
but breaking each monomer down into its motifs reveals simple structural
principles.
Figure 35. The tetrameric structure of neuraminidase
The resulting structure is not exactly barrel-shaped, because the β-sheets are arranged in a
circle around an axis passing through the center of the molecule. The protein contains a
total of 1600 amino acids and is involved in the hydrolysis of sialic acid. Each monomer
consists of six repeated sheets, each made of 4 strands connected in an
up and down topology (Fig. 36). The six sheets are arranged like the blades of a six-bladed
propeller.
Figure 36. Neuraminidase: structure of a monomer and its topology.
The six sheets of a monomer have the same topology, and the connections between them are
identical: strand 4 of one sheet is connected to strand 1 of the next sheet,
and so on. This produces a molecule with six-fold pseudo-symmetry, in which the 12 loops
are all located on the same side of the molecule. These connecting loops form the
active site, and neuraminidase is a clear example of the separation between structural and
functional regions. The β-strands are the structural scaffold on which the active site, made
by the loops connecting the strands, is built (Fig. 37).
Figure 37. Neuraminidase and its active site.
Jelly roll domains
The jelly roll is another β barrel structure. To understand it, it is useful to imagine a
strip of paper whose two long edges each carry four strands, with the strands on
opposite edges interacting with each other (Fig. 38).
Figure 38. Schematic representation of the jelly roll barrel structure.
Now imagine wrapping this strip of paper around a cylinder so that the β-strands lie along its
sides, with the loops at the top and bottom of the cylinder. The antiparallel strands are held
together by hydrogen bonds in the pairs 1-8, 2-7, 3-6 and 4-5, and are arranged so that strand
1 is adjacent to strand 2, strand 7 to strand 4, strand 5 to strand 6 and strand 3 to strand 8.
All adjacent strands are antiparallel. Strand 8 still interacts with strand 1, strand 2 with
strand 7 and so on; in other words, the pairs of antiparallel β-strands pack against one
another, forming the structure of the protein. The corresponding topology is shown in figure 39.
Figure 39. Topology of the jelly roll.
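The hydrogen-bonded pairs of the paper-strip model follow a simple rule: strand i pairs with strand n + 1 - i, where n is the total number of strands. The sketch below generates the pairs for the eight-stranded case described in the text; the function name is illustrative.

```python
def jelly_roll_pairs(n_strands: int = 8) -> list[tuple[int, int]]:
    """Pairs (i, n + 1 - i) of strands facing each other on the folded strip."""
    if n_strands < 2 or n_strands % 2:
        raise ValueError("an even number of strands (at least 2) is required")
    return [(i, n_strands + 1 - i) for i in range(1, n_strands // 2 + 1)]


if __name__ == "__main__":
    print(jelly_roll_pairs())   # [(1, 8), (2, 7), (3, 6), (4, 5)]
```

The same pairing is what a long hairpin would give: the first strand pairs with the last, the second with the one before last, and so on toward the central turn.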
An example of a jelly roll is the head of hemagglutinin (Fig. 40), the globular region of the
influenza virus protein that must recognize sialic acid to begin the infection process.
Figure 40. Hemagglutinin: the monomer and the jelly roll barrel in the terminal region of the monomer.
Hemagglutinin is a trimer anchored in the viral membrane. Each monomer consists of two chains,
named HA1 and HA2: HA1 has 328 amino acids and HA2 has 221. The two chains are
joined by disulfide bridges.
Together the two chains form a structure made of a stem, which extends from the
membrane, and a globular domain at its end.
HA1 starts near the membrane, without entering it, and forms an extended structure that
runs along the stem for about 100 Å up to the globular region. The apex is a globular jelly
roll structure of eight strands comprising approximately 150 residues. After the globular
region, HA1 returns along the stem and strengthens it with a further 70 residues.
HA2 contributes only to the stem and to the insertion into the membrane.
The recognition site for sialic acid is located on the globular head, in an internal pocket of
the jelly roll (Fig. 40), more than 100 Å from the membrane. Antibodies of the immune system
bind the molecule near this binding site to prevent viral infection. To escape this defense
mechanism, the virus accumulates mutations on the rim of the pocket; the inner part of the
pocket must be conserved to preserve the ability of the molecule to recognize sialic
acid.
Domains with parallel β-helices
β helix with 2 strands
Domains consisting of parallel β-strands are relatively rare because their hydrogen bonds are
less stable than those of antiparallel β-strands. To form a stable structure,
parallel β-strands can therefore be organized into β-helices. In such structures the
polypeptide chain forms a wide helix made of β-strands connected by loops. Two types of such
structures are currently known. In the simplest, the β-helix is made of two sheets, and each
turn of the helix contains two strands and two loops (Fig. 41).
Figure 41. Scheme of the β-helix with two strands.
The basic structural unit of this motif contains 18 amino acids: three in each strand and six
in each loop. The sequence shows specific repeats; in particular, a consensus
sequence of nine residues can be identified, Gly-Gly-X-Gly-X-Asp-X-U-X, where X is any amino
acid and U is an amino acid with a bulky hydrophobic side chain. The first six residues form
the loop and the last three the β-strand.
Another feature of these motifs is that they bind calcium ions through the Asp residue.
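A nine-residue consensus of this kind maps directly onto a regular expression. The sketch below assumes the consensus reads Gly-Gly-X-Gly-X-Asp-X-U-X, with X any residue and U a bulky hydrophobic residue. The residues accepted for U (L, I, V, F, M, W, Y) are an assumption, and the example sequence is made up to contain two consecutive repeats.

```python
import re

# G G X G X D X U X, written with one-letter codes.
# The character class standing for U is an assumed set of bulky hydrophobic residues.
BETA_HELIX_REPEAT = re.compile(r"GG.G.D.[LIVFMWY].")


def find_repeats(sequence: str) -> list[tuple[int, str]]:
    """Start index (0-based) and text of each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in BETA_HELIX_REPEAT.finditer(sequence.upper())]


if __name__ == "__main__":
    made_up = "GGAGNDTLSGGSGDDVIY"   # two invented nine-residue repeats
    for start, repeat in find_repeats(made_up):
        # First six residues: loop. Last three: β-strand.
        print(start, repeat[:6], repeat[6:])
```

Two consecutive matches correspond to the 18-residue structural unit: two loops of six residues and two strands of three.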
The other structure made of parallel β-strands is also a helix, in which the basic unit is
formed by three very short β-strands, of 3 to 5 residues each,
connected by loops (Fig. 42).
Figure 42. Schematic representation of a helix formed by three β–strands.
Three strands make up the structure: two are almost parallel and the third is perpendicular to
the first two. The loop connecting the first and second strands has only two residues, while
the other two loops are much longer and vary in length and conformation. In this way the helix
forms three large parallel sheets arranged like three faces of a prism. An example of this
type is pectate lyase (Fig. 43).
Figure 43. Pectate lyase structure.
The database CATH
The CATH database classifies proteins on a structural basis.
The classification is hierarchical. The two co-authors are C.A. Orengo and J.M. Thornton. Unlike
a primary database, where experimental data are entered without any manipulation, CATH is a
secondary database in which the information is analyzed, selected and then stored.
During evolution, many families of proteins with different sequences but related structures have
arisen. Because proteins with very different sequences may have a similar three-dimensional
structure, a classification based on three-dimensional structure is very useful for identifying
significant relationships. Classification in CATH is semi-automatic, i.e. partly manual and partly
automatic.
The abbreviation CATH means: Class, Architecture, Topology, Homologous superfamily, terms
that identify the four main levels of classification of proteins:
1. Class
2. Architecture
3. Topology
4. Homology
The class (C-level) is a very simple level and is assigned automatically. The class is determined
according to the secondary structure content of the protein. Four classes are defined: α, β, αβ and
a fourth one in which the content of secondary structure is minimal.
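A minimal sketch of how a class could be assigned from secondary structure content is given below. It assumes the secondary structure is available as a per-residue string with "H" for helix, "E" for strand and any other letter for everything else. The two cut-offs (min_content and dominance) are hypothetical values chosen for illustration and are not the thresholds used by CATH.

```python
def assign_class(ss_string, min_content=0.10, dominance=0.80):
    """Return a class number (1 = alpha, 2 = beta, 3 = alpha-beta, 4 = little
    secondary structure) from a per-residue secondary structure string.

    Both cut-offs are illustrative assumptions, not CATH parameters.
    """
    n = len(ss_string)
    if n == 0:
        raise ValueError("empty secondary structure string")

    helix = ss_string.count("H") / n
    strand = ss_string.count("E") / n
    regular = helix + strand

    if regular < min_content:          # almost no helix or strand
        return 4
    if helix / regular >= dominance:   # helices dominate
        return 1
    if strand / regular >= dominance:  # strands dominate
        return 2
    return 3                           # mixed helix and strand


print(assign_class("HHHHHHHHCC"))  # mostly helix
print(assign_class("HHHHEEEECC"))  # mixed
```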
The architecture (A-level) considers the shape of the domain as determined by the orientation of
the secondary structures, but ignores the connections between them. Currently, this
classification is done manually using a simple description of the secondary structure arrangement,
such as β-barrel, three-layer sandwich, etc.
The topology (T-level) considers the connections between the elements of secondary structure: the
structures are clustered into fold groups according to the shape and connections of the
secondary structures. The proteins are classified into fold families.
At the homologous superfamily level (H-level), proteins with a possible common ancestor,
defined as homologous, are grouped together. In this way, the homologous superfamilies are
defined. There is also a fifth level, S (Sequence family), in which proteins with sequence identity
≥ 35% are clustered. There are also many sublevels that will not be discussed in this chapter.
The classes are numbered from 1 to 4: class 1 for α, class 2 for β, and so on. The next level is
the architecture, which, as previously described, describes the arrangement of the secondary
structures independently of their connections. Figure 44 shows a series of proteins belonging to
different architectures (bundle of helices, β-barrel, propeller, horseshoe, etc.).
Figure 44. Examples of proteins classified into several architecture groups.
At this level, only the orientation of the secondary structures is taken into consideration; the
next level is reached only when the topology and the connections between the different elements
are considered (Fig. 45).
An example is shown in figure 45: it starts with the α-β class, which then branches into
three different architectures: TIM barrel, sandwich and roll. The sandwich architecture branches
again into two different topologies, flavodoxin and β-lactamase. Although these proteins
have a similar arrangement and orientation of their secondary structures, they are characterized by
a different topology (i.e. different connections between secondary structure elements).
Figure 45. Example of branching and classification levels.
Flavodoxin and β-lactamase are in the same sandwich architecture and in the same α-β class, but they
belong to two groups with different topology. Proteins belonging to the same topological group
have a relatively similar fold because their secondary structure elements, and the connections
between them, are conserved. Generally, the length of the loops connecting the secondary structure
elements is the most variable component among proteins belonging to the same topological group. The
length of the secondary structure elements themselves, such as the β-strands and α-helices, can also
change. Usually, proteins with the same topology possess a conserved core and therefore have similar
structures, although their functions may differ.
The classification of a protein is hierarchical, and each protein can be identified
by a specific number (Fig. 46). In the example, the identification number is 1.10.490.20:
classes are numbered from 1 to 4, while the architecture, topology and homologous superfamily
levels are numbered starting from ten and increasing by ten. So the number
1.10.490.20 indicates that the protein belongs to class 1, architecture 10, topology 490 and
homologous superfamily 20.
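The numbering scheme can be illustrated with a short parser that splits a CATH number into its four levels and reports the deepest level two numbers have in common. This is only a sketch: the function and type names are invented for this example, and apart from 1.10.490.20 the numbers used are hypothetical.

```python
from typing import NamedTuple

# Class numbering as described in the text (1 = alpha, 2 = beta, and so on).
CLASS_NAMES = {1: "alpha", 2: "beta", 3: "alpha-beta", 4: "few secondary structures"}


class CathNumber(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    homologous_superfamily: int


def parse_cath_number(code):
    """Split a string such as '1.10.490.20' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {code!r}")
    number = CathNumber(*(int(p) for p in parts))
    if number.cath_class not in CLASS_NAMES:
        raise ValueError(f"unknown class {number.cath_class}")
    return number


def deepest_shared_level(a, b):
    """Return 'C', 'A', 'T' or 'H' for the deepest level shared by a and b,
    or None if even the class differs."""
    shared = None
    for level, x, y in zip("CATH", a, b):
        if x != y:
            break
        shared = level
    return shared


first = parse_cath_number("1.10.490.20")
second = parse_cath_number("1.10.238.10")  # hypothetical number for comparison
print(CLASS_NAMES[first.cath_class], deepest_shared_level(first, second))
```

Two domains that share the class and architecture fields but differ in the topology field, like the flavodoxin and β-lactamase example, would give "A" as the deepest shared level.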
Figure 46. Example of classification and corresponding numbering.
In this way, all proteins can be numbered and classified, and each number identifies a single
group in the hierarchy. Figure 47 shows another example of the hierarchical classification of
proteins.
Figure 47. Example of classification and corresponding numbering
The levels are not equally populated. For example, at the H-level (homologous
superfamily) some folds are more represented than others. The high frequency of a fold is
correlated with its structural relevance. Some folds are found in enzymes with different
functional characteristics. Therefore, a fold has intrinsic qualities (stability, flexibility)
independent of the function with which it is associated. Figure 48 shows some highly populated
folds at the H-level.
Figure 48. Some of the folds (H-level) most represented.
Criteria for classification
The classification methodology is semi-automatic.
The structures are selected from the PDB database. Native or mutant protein structures, solved
by diffraction or NMR, are chosen when their resolution is 3 Å or better. The next step
is the sequence comparison, which is a direct step, since proteins with a sequence identity
greater than 35% are assigned directly to the S-level.
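The two selection criteria above can be sketched as simple checks. The example assumes that the two sequences have already been aligned (gaps written as "-") and computes identity over the columns where neither sequence has a gap; that denominator, the inclusive handling of the 35% and 3 Å boundaries, and the function names are assumptions made for this illustration.

```python
def passes_resolution(resolution_angstrom, cutoff=3.0):
    """True if the structure is resolved at the cut-off or better (lower value)."""
    return resolution_angstrom <= cutoff


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def same_sequence_family(aligned_a, aligned_b, threshold=35.0):
    """True if the pair would be grouped at the S-level under the 35% criterion."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned sequences: 8 identical positions out of 10.
print(percent_identity("ACDEFGHIKL", "ACDEYGHWKL"))
print(same_sequence_family("ACDEFGHIKL", "ACDEYGHWKL"))
```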
The next step is the division of the proteins into domains, which are then analyzed individually.
The assignment of the class is automatic: the procedure examines the secondary structure
composition by analyzing the values of the φ and ψ angles and counting how many of them
correspond to α or β structure. The structures are then compared to define the H and T
levels. SSAP is the software that carries out the comparison automatically. The program
compares the distances between residues of the three-dimensional structures in a sequential manner.
The parameter used for classification is the score S, which is proportional to the inverse of the
sum of these differences. A small difference indicates similar structures, which will therefore have
a large S value. When S is equal to 100 the structures are completely identical. For the T and H
levels the thresholds are 70 and 80, respectively: for values between 70 and 80 the protein is
classified at the T level, while for values larger than 80 it is classified at the H level.
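The threshold rule for the score S can be written as a small decision function. This is a sketch of the rule as stated in the text only; the function name is invented here, and treating a score of exactly 80 as T-level is an assumption, since the text does not specify the boundary.

```python
def level_from_ssap_score(score, t_threshold=70.0, h_threshold=80.0):
    """Map a structural similarity score S (0 to 100) to 'H', 'T' or None.

    Scores above h_threshold give the H level, scores from t_threshold up to
    h_threshold give the T level, and lower scores give neither.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("S is expected to lie between 0 and 100")
    if score > h_threshold:
        return "H"
    if score >= t_threshold:
        return "T"
    return None


for s in (60.0, 75.0, 85.0):
    print(s, level_from_ssap_score(s))
```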
The architecture level is defined manually, since it is difficult to define in an automatic way.
Architectures that are not easily described at first analysis are grouped
into a specific architecture simply called 'complex architecture'. Finally, a CATH number is
assigned. The proteins can be retrieved from the database using:
- PDB code
- CATH code
- Keywords that define the properties of the protein.
In this database, protein function is not taken into consideration.