A primer on the structure and function of proteins
Introduction
Proteins have an essential role to play in virtually all biological processes. They participate in an
extraordinarily diverse set of chemical reactions and physical tasks. Note that the word protein, coined by
Jöns J. Berzelius in 1838, is derived from the Greek word proteios, which means “of the first rank.”
Examples of the significance and scope of protein functionality:
1. Enzymatic catalysis
The vast majority of chemical reactions within biological systems are
catalyzed by enzymes. Enzymes have an enormous influence on the pace
of biological systems, as they can typically increase the rate of a biochemical
reaction a million-fold. The reactions under enzymatic control can be quite
simple, or as complex as the replication of an entire genome.
2. Transport and storage
Transport of small, but critically important molecules is carried out by
specific proteins. Examples are the globins; haemoglobins are used to
transport oxygen in red blood cells, and myoglobins to transport oxygen in
muscle. Over time haemoglobin and myoglobin have evolved very precise,
but divergent functions with respect to their role in oxygen transport within
an organism.
3. Motion
The physical motion of muscle contraction, the movement of chromosomes
during mitosis and meiosis, and the propulsion of sperm by flagella are all
achieved by proteins.
4. Structure and support
Proteins can also serve as structural elements. The strength of skin,
tendons, dentin and bone is achieved, in part, by the use of collagen, a
relatively simple protein that forms long fibrous structures.
5. Immunity
Proteins play a central role in the process of distinguishing self from non-self, recognizing and binding to foreign substances derived from viruses,
bacteria, and cells from other organisms. The evolution of immune proteins
represents an important area of research, as the proteins of infectious agents
are under strong selective pressure to adapt to the immune systems of
their host organisms, and the proteins of the host organism's immune
system are under strong pressure to evolve counter-adaptations. An
evolutionary arms race is played out every day at the level of the immune
system.
6. Control of gene expression
Precise control of the level of gene expression is essential to the proper
growth and function of cells. The incredibly complex process of
development from a fertilized egg to a multi-cellular organism such as a
human being is under genetic control through the production (expression)
and function of proteins such as transcription factors.
The proteins encoded in the genomes of all organisms, from viruses to humans, have evolved under a
dynamic process of mutation, genetic drift, recombination, and natural selection. However, it is by natural
selection (sometimes called Darwinian selection) that proteins have evolved to perform specific functions.
The functional properties of proteins depend on their three-dimensional structure. As the three
dimensional structure is derived from a linear chain of amino acids that folds into compact domains with
their own three dimensional structures, we must understand the organization and evolution of proteins at
different levels of structure in order to understand the origins of protein function.
Amino acids are the building blocks of proteins
The monomers of a protein are amino acids. The basic structure of an amino acid is composed of an
amino group, a carboxyl group, a hydrogen atom, and a unique R-group, all bonded to a central carbon
atom (the α carbon). The twenty types of amino acids found in proteins are defined by twenty R-group
side chains. The side chains vary in size, shape, charge, hydrogen bonding capacity, and chemical reactivity.

      H    O
      │    ║
H3N ― Cα ― C
      │    │
      R    O
Amino acid                 3-letter code   1-letter code
Glycine                    Gly             G
Alanine                    Ala             A
Serine                     Ser             S
Proline                    Pro             P
Valine                     Val             V
Threonine                  Thr             T
Cysteine                   Cys             C
Leucine                    Leu             L
Isoleucine                 Ile             I
Asparagine                 Asn             N
Aspartic acid/Aspartate    Asp             D
Glutamine                  Gln             Q
Lysine                     Lys             K
Glutamic acid/Glutamate    Glu             E
Methionine                 Met             M
Histidine                  His             H
Phenylalanine              Phe             F
Arginine                   Arg             R
Tyrosine                   Tyr             Y
Tryptophan                 Trp             W
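The amino acid codes listed above map naturally onto a lookup table. The following Python sketch is only an illustration; the names THREE_TO_ONE and to_one_letter are hypothetical and are not part of the original notes.

```python
# Three-letter to one-letter amino acid codes, as listed in the table above.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Ser": "S", "Pro": "P", "Val": "V",
    "Thr": "T", "Cys": "C", "Leu": "L", "Ile": "I", "Asn": "N",
    "Asp": "D", "Gln": "Q", "Lys": "K", "Glu": "E", "Met": "M",
    "His": "H", "Phe": "F", "Arg": "R", "Tyr": "Y", "Trp": "W",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide into one-letter codes.

    The residue order is preserved, so the first residue of the input is
    also the first letter of the output.
    """
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


print(len(THREE_TO_ONE))             # 20
print(to_one_letter("Ala-Met-Ile"))  # AMI
```

A dictionary is used because each three-letter code corresponds to exactly one one-letter code, so the mapping can also be inverted without ambiguity.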
In fact, the same twenty amino acids are used in proteins of all species from bacteria to whales. This
indicates that the fundamental alphabet of proteins, these twenty amino acids, is very ancient: more than 2.5 billion
years old. What is truly remarkable is the exceptional range of functions and complexity of life forms (see
Tree of Life: http://tolweb.org/tree/phylogeny.html) that have been constructed from this simple set of
twenty monomers.
Although there are only 20 amino acids, the total number of possible unique polypeptides is “nearly
infinite”. Consider polypeptides of just 150 amino acids. There are 20^150 (approximately 1.4 × 10^195)
different possibilities. This number is far greater than the estimated number of electrons in the universe! The number of
possible 3D conformations they can theoretically adopt is even greater.
Let’s say that there are 10 million species on Earth with an average of 5,000 genes per genome. There
would then be 5 × 10^10 protein sequences. Although this number is but a tiny fraction of the total polypeptide
sequence space, it is nonetheless several orders of magnitude greater than the number of
proteins known to science at this time. Fortunately, it is clear that these proteins represent a highly limited
number of 3D protein folds. Estimates of the total number range from 650 to 10,000; most fit ~1000 folds.
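The arithmetic in the two preceding paragraphs is easy to check. The Python sketch below is a minimal illustration; the species and gene counts are the round numbers assumed in the text, and the variable names are hypothetical.

```python
import math

ALPHABET_SIZE = 20   # amino acids
LENGTH = 150         # residues in the example polypeptide

# Python integers are arbitrary precision, so the count is exact.
n_sequences = ALPHABET_SIZE ** LENGTH
log10_sequences = LENGTH * math.log10(ALPHABET_SIZE)
print(f"20^150 is about 10^{log10_sequences:.1f}")   # 10^195.2

# Round numbers assumed in the text, not measured values.
n_species = 10_000_000
genes_per_genome = 5_000
n_proteins = n_species * genes_per_genome
print(f"{n_proteins:.0e} protein sequences")         # 5e+10

# Fraction of the 150-residue sequence space this would occupy.
log10_fraction = math.log10(n_proteins) - log10_sequences
print(f"fraction of sequence space: about 10^{log10_fraction:.0f}")
```

Working with base-10 logarithms keeps the comparison readable: the example set of proteins covers roughly one part in 10^184 of the possible 150-residue sequences.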
The amino acids are often subdivided by the physicochemical properties of their side chains. A Venn
diagram is provided that shows them grouped according to the nine most commonly used properties.
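[Figure: Venn diagram of the twenty amino acids (one-letter codes) grouped into nine overlapping sets: Tiny, Small, Aliphatic, Aromatic, Polar, Hydrophobic, Charged, Positive, and Negative. Cysteine appears twice, as C(S-S) and C(S-H).]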
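Overlapping property groups of this kind are conveniently represented as sets. The Python sketch below covers only five of the nine properties, and the memberships shown are an assumption based on a commonly used grouping rather than a transcription of the figure; the function name properties is hypothetical.

```python
# Assumed memberships (one-letter codes); other classifications differ in detail.
AROMATIC = set("FWYH")
ALIPHATIC = set("ILV")
POSITIVE = set("KRH")
NEGATIVE = set("DE")
CHARGED = POSITIVE | NEGATIVE   # union of the two charged sub-groups

GROUPS = {
    "aromatic": AROMATIC,
    "aliphatic": ALIPHATIC,
    "positive": POSITIVE,
    "negative": NEGATIVE,
    "charged": CHARGED,
}


def properties(residue: str) -> list[str]:
    """Return the names of all groups (from GROUPS) that contain the residue."""
    return sorted(name for name, members in GROUPS.items() if residue in members)


print(properties("H"))  # ['aromatic', 'charged', 'positive']
print(properties("L"))  # ['aliphatic']
```

Histidine illustrates why a Venn diagram is used instead of a simple list: a single residue can belong to several groups at once.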
Note that it is impossible to classify all amino acids into an absolute set of sub-groups, as the scales for a
particular physicochemical property are artificial. For example, see the four different hydrophobicity scales in
the figure below. In fact, there are more than 115 different scales for measuring the properties of a side chain. Of course, many of these different scales are trying to measure the same property, and hence are
highly correlated with each other. The important thing to remember is that the amino acid will be in a
different, and far more complicated, environment than the one in which its physicochemical property was
inferred. It will be in a unique protein environment, interacting with a unique set of other amino acids in
three-dimensional space.
Four different scales for measuring the same physicochemical property:
hydrophobicity. References for the scales are as follows: (1) Janin
(1979); (2) Wolfenden, et al. (1981); (3) Kyte and Doolittle (1982); and
(4) Rose, et al. (1985).
There are some substantial differences among these four scales. Note
that only two of the scales (1 and 4) classify cysteine as the most
hydrophobic amino acid. Scales 1 and 4 are derived using 3D structure
information, whereas scales 2 and 3 just use the properties of the side
chains. Amino acid behaviours reflect a myriad of factors (e.g., long-range
and short-range interactions, environment, steric effects, etc.) and
no one scale seems suitable for all cases.
Figure adapted from: http://prowl.rockefeller.edu/aainfo/hydro.htm
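To make one of these scales concrete, the Python sketch below stores the Kyte and Doolittle (1982) hydropathy values as they are commonly tabulated and averages them over a short peptide. The function name mean_hydropathy is hypothetical, and the average is only a crude summary that ignores the protein environment discussed above.

```python
# Kyte and Doolittle (1982) hydropathy index; larger values = more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str) -> float:
    """Average hydropathy of a sequence written in one-letter codes."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


# On this scale isoleucine, not cysteine, ranks as the most hydrophobic residue.
most_hydrophobic = max(KYTE_DOOLITTLE, key=KYTE_DOOLITTLE.get)
print(most_hydrophobic)                   # I
print(round(mean_hydropathy("AMI"), 2))   # 2.73
```

Swapping in a different dictionary of values would give a different ranking, which is the practical consequence of the differences among scales noted above.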
Amino acid polymers are built by using the peptide bond
Individual amino acids are joined together during the process of protein synthesis, or translation, by the
formation of a PEPTIDE BOND. The resulting polypeptide chain has two parts: the MAIN CHAIN (or BACKBONE)
is the regularly repeating part of the polypeptide, and the SIDE CHAIN is the variable part of the polypeptide
composed of R-groups. The polypeptide chain is directional, having different ends called the α-amino end
(AMINO TERMINUS) and the α-carboxyl end (CARBOXY TERMINUS). The convention is that the α-amino end of
the polypeptide is considered the “beginning” end. Hence, Ala-Met-Ile is not the same tri-peptide as Ile-Met-Ala.
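                   CH3
                   │
                   S
                   │
                   CH2
                   │
      H   O        CH2          H   O
      │   ║        │            │   ║
NH3 ― C ― C ― NH ― C ― C ― NH ― C ― C ― O
      │            │   ║        │
      CH3          H   O        CH―CH3
                                │
                                CH2
                                │
                                CH3
   Alanine     Methionine    Isoleucine

The NH3 end (left) is the amino or N-terminus; the C―O end (right) is the carboxyl or C-terminus. The C―NH links between adjacent residues are the peptide bonds.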
Most natural polypeptides are between 50 and 2000 amino acids in length. Polypeptides of fewer than 30
amino acids are sometimes called OLIGOPEPTIDES.
The structural hierarchy of a protein can be described at four levels
1. PRIMARY STRUCTURE refers to the linear arrangement of amino acids along a polypeptide chain. This
level of structure includes any covalent connections between elements of the chain, such as disulfide
bonds.
[Figure: amino acids]
2. SECONDARY STRUCTURE refers to the folding of the polypeptide chain into regular structures such as α-helices and β-sheets. Proteins fold into more complex structures through stable association of such
secondary structural elements.
[Figure: β-sheet and α-helix]
3. TERTIARY STRUCTURE refers to the folding of regions between the regular secondary structures and the
longer-range interactions (hydrogen bonds, hydrophobic interactions and van der Waals interactions) that
combine to form a compact globular structure. Such structures are sometimes called DOMAINS.
[Figure: proteorhodopsin, side view and top view]
4. QUATERNARY STRUCTURE refers to the spatial arrangement, and interactions, of subunits that combine to
form multimeric, or multi-subunit, proteins. Multimeric proteins are built by using more than one
polypeptide chain. Examples include haemoglobin, which is composed of two α-globins and two β-globins (note that in this case α and β do not refer to secondary structure!); and immunoglobulin G, which
is composed of two L chains and two H chains.
[Figure: haemoglobin and immunoglobulin G]
Amino acid sequences are specified by genes and a genetic code
The information contained in the DNA that specifies the amino acid sequence of a protein is converted to
an mRNA by the process of TRANSCRIPTION. The conversion of the information contained in the mRNA
into the amino acid sequence of a polypeptide is called TRANSLATION. The faithful translation of the
information contained in a nucleic acid is achieved by reading units of three nucleotides (each called a CODON)
and translating each unit into the correct one of 20 different amino acids; the relationship between a codon and an
amino acid is called the GENETIC CODE.
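The two steps just described can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: the transcribe function assumes its input is the coding strand of the DNA, the table is the standard genetic code, and the function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCTATGATTTAA")
print(mrna)             # AUGGCUAUGAUUUAA
print(translate(mrna))  # MAMI
```

Because 64 codons map onto 20 amino acids plus stop signals, several codons specify the same amino acid; this redundancy matters for the discussion of mutation that follows.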
Consider the following:
• The function of a protein is determined by its three-dimensional structure and the properties of the
specific amino acids at each location in that structure.
• The three-dimensional structure of the protein is itself a function of the amino acid sequence (e.g.,
the sulphur atoms in cysteine side chains can form disulfide bonds).
• The amino acid sequence, in turn, is determined by the nucleotide sequence of the encoding
gene.
Because the genetic code determines how a nucleotide sequence will be translated into a polypeptide, it
defines how random changes to the gene brought about by the process of mutation will impact the
function of the encoded protein. Knowledge of the genetic code turns out to be very useful to understanding
rates of molecular evolution as well as the evolution of function. In fact, later in this course we will use the
genetic code to define an index of natural selection pressure on a protein. See the last lecture (Lecture 3)
for a more detailed summary of the genetic code and theories of the evolution of the code itself.
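One way to see how the genetic code shapes the impact of mutation is to enumerate every single-nucleotide change to a codon and ask whether the encoded amino acid changes. The Python sketch below is an illustration under the standard genetic code; the function name classify_point_mutations is hypothetical, and this is not the index of selection pressure developed later in the course.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutations(codon: str) -> dict[str, int]:
    """Count the nine single-nucleotide changes to a codon by their effect."""
    counts = {"synonymous": 0, "nonsynonymous": 0, "nonsense": 0}
    original = GENETIC_CODE[codon]
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            new = GENETIC_CODE[mutant]
            if new == original:
                counts["synonymous"] += 1      # same amino acid
            elif new == "*":
                counts["nonsense"] += 1        # premature stop codon
            else:
                counts["nonsynonymous"] += 1   # different amino acid
    return counts


print(classify_point_mutations("CUU"))  # leucine codon
print(classify_point_mutations("UGG"))  # tryptophan codon
```

For the leucine codon CUU, three of the nine possible changes leave the amino acid unchanged, whereas every change to the single tryptophan codon UGG alters the protein.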
Prosthetic groups
Many proteins contain a small tightly bound molecule called a prosthetic group. A PROSTHETIC GROUP is
any small molecule that is not a polypeptide, which is tightly bound to the protein and plays an essential
role in its function. Prosthetic groups often have an important influence on the folded, three dimensional,
shape of the protein.
One of the best known examples of a prosthetic group is the HEME MOLECULE of haemoglobin. The mature
haemoglobin molecule has a quaternary structure comprised of two alpha globins and two beta globins;
i.e., the mature molecule is comprised of four subunits forming a tetramer. Each of the subunits has a
characteristic “GLOBIN FOLD” which enfolds the heme prosthetic group. Of course you will remember that
heme is the oxygen carrier and that oxygen metabolism is extremely widespread. Thus it is not surprising
that the globin fold is extremely conserved among a huge diversity of organisms. In fact, globins have
been used for oxygen transport (haemoglobin) and storage (myoglobin) for more than 800 million years.
Prosthetic groups can be linked to proteins either by covalent bonds or by noncovalent bonds, as in
haemoglobin. Remember that not all proteins have prosthetic groups.
Covalent modifications affect the structure and function of proteins
Because of the complexity at different levels of structure (1°, 2°, 3°, and 4°), and the use of prosthetic
groups, the basic set of 20 amino acids provides a remarkable foundation for diversity of protein function.
However, capabilities can be further modified, or enhanced, by covalent modifications following the
synthesis of the polypeptide chain. As the purpose of this topic is to provide a primer on protein function,
a full accounting of this source of molecular diversity is not possible. Rather, a table of example
mechanisms is provided below.
Examples of different forms of covalent modification:
1. Disulfide bridge
Covalent bridge between two cysteine residues in the same chain or in
different polypeptide chains. The disulfide bridge is an important
mechanism by which folded conformations of proteins are stabilized.
2. Polypeptide cleavage
Many proteins are cleaved or trimmed sometime after synthesis. For
example many digestive enzymes are synthesized in an inactive form.
This allows for safe storage in the pancreas. To use such an enzyme, it is
secreted into the digestive tract where a peptide bond is cleaved, yielding
the active form of the enzyme.
3. Modification of AA side chains
The genetic code only allows for 20 different amino acids. To use an
amino acid with a physicochemical property not allowed by the genetic
code, one of the 20 fundamental amino acids must be altered. Covalent
modification of amino acid side chains occurs after polymerization of a
polypeptide chain.
In collagen, the side chains of proline and lysine are often modified by
attachment of hydroxyl groups that act to stabilize the correct structure of
the collagen fibre.
Phosphate groups can be linked to the hydroxyl groups of tyrosine, serine,
or threonine. Phosphorylation or dephosphorylation of specific residues is
used to modulate the activity of enzymes. Hormones such as adrenaline
act by stimulating the phosphorylation of serine and threonine residues.
Phosphorylation as a molecular switch: production of cancer by some
tumour viruses is by excessive phosphorylation of tyrosine residues on
proteins that control cell proliferation.
4. Addition of carbohydrates
Glycoproteins are proteins with covalently bound carbohydrates.
5. Addition of lipids
Lipoproteins contain tightly bound lipids. In membrane lipoproteins, fatty
acids are often connected to cysteine residues as a means to increase the
hydrophobicity of the amino acid side chain. In other lipoproteins, lipids
are bound to side-chains via hydrophobic interactions.
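As the entry on side-chain modification notes, phosphate groups attach to the hydroxyl groups of serine, threonine, and tyrosine. The Python sketch below simply lists those residues in a sequence; the function name and the toy sequence are hypothetical, and having a hydroxyl group is only a prerequisite, since whether a site is actually phosphorylated depends on its context in the protein.

```python
# Residues whose side chains carry a hydroxyl group.
HYDROXYL_RESIDUES = {"S": "serine", "T": "threonine", "Y": "tyrosine"}


def candidate_phosphosites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue name) for every Ser, Thr, and Tyr."""
    return [
        (position, HYDROXYL_RESIDUES[aa])
        for position, aa in enumerate(sequence, start=1)
        if aa in HYDROXYL_RESIDUES
    ]


toy_sequence = "MASTKYG"  # hypothetical peptide, for illustration only
print(candidate_phosphosites(toy_sequence))
# [(3, 'serine'), (4, 'threonine'), (6, 'tyrosine')]
```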
[Figure: Insulin is derived from the cleavage of proinsulin at two locations. This results in two chains (21 and 30 AA in length), which are covalently bonded together by disulfide bonds to form an insulin molecule. Labels: cleavage sites, disulfide bonds.]
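Cleavage of a precursor at two sites can be sketched as slicing a string. In the Python example below the residues are placeholders rather than a real sequence, and the segment order and the length of the connecting segment are assumptions made for illustration; only the chain lengths of 30 and 21 come from the text. The function name cleave is hypothetical.

```python
def cleave(chain: str, sites: list[int]) -> list[str]:
    """Cut a chain after each 1-based residue position listed in sites."""
    cuts = [0] + sorted(sites) + [len(chain)]
    return [chain[start:end] for start, end in zip(cuts, cuts[1:])]


# Placeholder precursor: 30-residue chain, connecting segment, 21-residue chain.
LONG_CHAIN, CONNECTING, SHORT_CHAIN = 30, 35, 21   # CONNECTING is an assumed length
precursor = "b" * LONG_CHAIN + "x" * CONNECTING + "a" * SHORT_CHAIN

long_chain, removed, short_chain = cleave(
    precursor, [LONG_CHAIN, LONG_CHAIN + CONNECTING]
)
print(len(long_chain), len(removed), len(short_chain))  # 30 35 21
```

The sketch covers only the cutting step; the disulfide bonds that hold the two resulting chains together are not modelled.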
The fundamental nature of protein functionality: binding and transmission of conformational changes
In the first section of this topic we provided just a small survey of the significance and scope of protein
function. Proteins, as a class of macromolecules, achieve this impressive range of function through the
ability to recognize and interact with a highly diverse set of molecules. In fact this capacity is unique to
proteins. It is all the more impressive when you consider that this capacity is based on a fundamental set
of only 20 amino acids.
• Proteins can recognize and interact with a wide variety of molecules: Examples include (i) binding
of the heme prosthetic group by haemoglobin; (ii) binding of individual proteins into the long arrays found in
contractile muscle fibres; (iii) binding of foreign molecules to antibody proteins; (iv) regulation of
gene expression through binding of DNA sequence elements.
• Proteins are capable of forming complementary surfaces and clefts: The capacity for such a
diversity of inter-molecular interactions is derived from the precise formation of clefts and the
interactions (hydrogen bonds, van der Waals, etc.) with the side chains found at key positions in
such clefts. Thus amino acid sequence contributes to this both through its influence on protein
folding and 3D structure, and also through the physicochemical properties of specific amino acid
residues.
• Catalytic power of proteins is derived from an ability to bind substrates and orient them precisely:
The enormous increase in the rate of biochemical reactions (more than a million-fold) that are catalyzed by proteins is
achieved by binding and orienting substrates in very close and optimal proximity for the specific
reaction. Proteins also use charged groups to polarize substrates and stabilize transition states.
• Proteins can receive, integrate, and transmit molecular signals: Proteins contain regulatory sites
called ALLOSTERIC SITES that control both the binding of other molecules and their catalytic activity.
Reversible binding of O2 in myoglobin and haemoglobin is a nice example. The binding of H+ and
CO2 at sites distant to the heme group will precisely alter the micro-environment of the heme
pocket, promoting the release of O2. Thus, reversible binding of O2 is achieved by sensitivity to
pH and carbon dioxide concentrations. ALLOSTERIC CONTROL derives from conformational
changes associated with binding at separate, non-adjacent sites. Note that myoglobin
oxygen affinity is not sensitive to pH and carbon dioxide concentrations; thus the evolution of
haemoglobin involved the evolution of a mechanism capable of perceiving a change in its
environment.
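The pH sensitivity described in the last point can be illustrated numerically. The Python sketch below uses the Hill equation, which is not part of these notes, as a simple stand-in for oxygen binding; the half-saturation pressures (P50) and Hill coefficients are rough illustrative assumptions, not measurements, and all names are hypothetical.

```python
def fractional_saturation(p_o2: float, p50: float, hill_n: float) -> float:
    """Hill equation: fraction of O2-binding sites occupied at pressure p_o2."""
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)


# Assumed, illustrative parameters (pressures in torr).
MYOGLOBIN = {"p50": 2.8, "hill_n": 1.0}             # no cooperativity, no pH effect
HAEMOGLOBIN_PH_7_4 = {"p50": 26.0, "hill_n": 2.8}
HAEMOGLOBIN_PH_7_2 = {"p50": 31.0, "hill_n": 2.8}   # lower pH modelled as a higher P50

TISSUE_P_O2 = 40.0  # assumed oxygen pressure in tissue

for name, params in [
    ("myoglobin", MYOGLOBIN),
    ("haemoglobin, pH 7.4", HAEMOGLOBIN_PH_7_4),
    ("haemoglobin, pH 7.2", HAEMOGLOBIN_PH_7_2),
]:
    saturation = fractional_saturation(TISSUE_P_O2, **params)
    print(f"{name}: {saturation:.2f}")
```

Under these assumptions, lowering the pH reduces the saturation of haemoglobin at a given oxygen pressure, meaning more O2 is released, while the myoglobin curve is unaffected because no pH dependence is built into it.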
[Figure labels: Substrate, Enzyme, Allosteric activator, X, Product K, Product J]
Binding of the allosteric activator to the enzyme changes the conformation of the
enzyme. This allows the enzyme to bind the substrate and catalyze the
production of products K and J.
Through conformational changes, information is transmitted between distant sites in a protein; this is the
basis for a MOLECULAR SWITCH. The capacity for molecular switches is hugely important to evolution: without
such capacity it is not possible to regulate complex biochemical pathways, initiate and halt
developmental processes, or perceive changes in the cellular environment and mount the appropriate
response.