Protein Structure
Amino Acid Structure
• You are going to have to learn the names and structures of the 20 amino acids, along with their 3 letter and 1 letter codes. One of the best ways for you to learn them is to draw the structures out yourself, not just look at pictures of them.
• I am going to spend a little bit of time going through them systematically to help you out.
• Most of the one letter codes are simply the first letter of the amino acid, or an obvious extension of it: R for arginine, Y for tyrosine, F for phenylalanine.
• A few are arbitrary and you just have to memorize them.
• The basic structure of an amino acid: the central carbon atom, the alpha carbon, Cα, is connected to an amino group (-NH2) on one side, and a carboxylic acid group (-COOH) on the other side. The alpha carbon is also connected to the R group, which is different for each of the amino acids. The fourth bond on the alpha carbon is to a hydrogen atom.
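If you like to quiz yourself at a keyboard, the code table can be written down as a small lookup. This is only a study-aid sketch: the names AMINO_ACIDS, to_three_letter, first_letter_codes and codes_to_memorize are made up for this example and are not from any library.

```python
# One-letter code -> (three-letter code, full name) for the 20 standard amino acids.
AMINO_ACIDS = {
    "G": ("Gly", "glycine"),
    "A": ("Ala", "alanine"),
    "S": ("Ser", "serine"),
    "T": ("Thr", "threonine"),
    "D": ("Asp", "aspartic acid"),
    "E": ("Glu", "glutamic acid"),
    "N": ("Asn", "asparagine"),
    "Q": ("Gln", "glutamine"),
    "K": ("Lys", "lysine"),
    "R": ("Arg", "arginine"),
    "H": ("His", "histidine"),
    "C": ("Cys", "cysteine"),
    "M": ("Met", "methionine"),
    "V": ("Val", "valine"),
    "L": ("Leu", "leucine"),
    "I": ("Ile", "isoleucine"),
    "F": ("Phe", "phenylalanine"),
    "Y": ("Tyr", "tyrosine"),
    "W": ("Trp", "tryptophan"),
    "P": ("Pro", "proline"),
}


def to_three_letter(sequence):
    """Spell a one-letter sequence out in three-letter codes."""
    return "-".join(AMINO_ACIDS[residue][0] for residue in sequence.upper())


def first_letter_codes():
    """One-letter codes that are simply the first letter of the name."""
    return sorted(code for code, (_, name) in AMINO_ACIDS.items()
                  if name[0].upper() == code)


def codes_to_memorize():
    """Codes that are NOT the first letter of the name."""
    return sorted(set(AMINO_ACIDS) - set(first_letter_codes()))


if __name__ == "__main__":
    print(to_three_letter("MKF"))
    print(codes_to_memorize())
```

The second function lists the codes that need extra attention, whether they are "obvious extensions" like R, Y and F or truly arbitrary ones.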
Glycine and Alanine, the Simplest Amino Acids
• The R group on glycine (Gly, G) is
just a hydrogen. Thus, the alpha
carbon on glycine has 2
hydrogens stuck to it, and for
that reason it does not have an L
form and a D form. It is nonpolar, but only weakly
hydrophobic. It can be found in
either hydrophilic or
hydrophobic environments.
Glycine
• Alanine (Ala, A) has a methyl
group as its R group. This makes
it weakly hydrophobic, but like
glycine, it can be found
anywhere in a protein.
Alanine
Adding an –OH: serine and threonine
• Serine (Ser, S) is alanine
with an –OH group
attached. This makes
serine a polar amino acid:
anytime you have a C-O or a
C-N bond, you get polarity,
which implies a hydrophilic
character and the likelihood
of hydrogen bonds.
Serine
• Threonine (Thr, T) adds
another methyl group to
the R group carbon of
serine. Serine and
threonine have very similar
properties.
Threonine
Acidic Amino Acids: aspartic acid and glutamic acid
• Aspartic acid (Asp, D) is just
alanine with a carboxylic acid
group attached. At physiological
pHs this group is in the –COO⁻ form. This makes it hydrophilic
and subject to electrostatic
interactions with basic amino
acids that have a + charge.
• Glutamic acid (Glu, E) simply has
one more carbon between the
acid group and the alpha carbon
than aspartic acid. Properties
are very similar.
Aspartic acid
Glutamic acid
Amide Derivatives: asparagine and glutamine
• Asparagine (Asn, N) has the
carboxylic acid on its R group
converted to an amide: –CONH2 instead of –COOH. This
makes asparagine polar but
not charged.
• Glutamine (Gln, Q) does the
same thing with glutamic acid:
an amide group instead of an
acid group.
Asparagine
Glutamine
Basic Amino Acids: Varied Structures
• Lysine (Lys, K) is the simplest of the basic amino acids. Its R group is a 4 carbon chain ending in an amino group. At physiological pHs, the amino group is in the –NH3+ form, making lysine a hydrophilic charged amino acid that forms ionic bonds with acidic amino acids.
• Arginine (Arg, R) has a 3 carbon chain ending in a more complicated structure: a central carbon connected to 3 nitrogens. This structure carries a + charge at physiological pHs.
• Histidine (His, H) is like alanine connected to a 5 member ring containing 3 carbons and 2 nitrogens. Histidine can carry a + charge, but its pKa is 6.10, close to physiological pH, so only part of the histidine population is charged at any moment. This means that under physiological conditions, small changes in pH will change the amount of charge on the histidine.
Lysine
Arginine
Histidine
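To see why a pKa near physiological pH matters, you can put numbers on it with the Henderson-Hasselbalch relationship: the fraction of side chains carrying the + charge is 1 / (1 + 10^(pH - pKa)). The sketch below uses the pKa of 6.10 quoted above; the pH of 7.4 is an assumed "physiological" value, and the function name is made up for this example. Inside a folded protein the real pKa can shift, so treat the output as a rough guide only.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a basic group in its protonated (+) form at a given pH."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


HISTIDINE_PKA = 6.10  # value quoted on the slide

if __name__ == "__main__":
    # Assumed pH values around the physiological range.
    for pH in (6.1, 6.5, 7.0, 7.4):
        share = fraction_protonated(pH, HISTIDINE_PKA)
        print(f"pH {pH}: {share:.1%} of histidines carry a + charge")
```

At pH 7.4 only roughly 5% of free histidine side chains are protonated, and that share climbs quickly as the pH drops toward 6.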
Sulfur-containing: cysteine and methionine
• Cysteine (Cys, C) is identical to serine except
that the –OH group has been replaced by an
–SH group. That is, it’s like alanine with an -SH
attached. Cysteine often forms disulfide
bridges with other cysteines, which help
stabilize the three dimensional structure of
proteins.
• Methionine (Met, M) has a linear R group that
is 2 carbons, a sulfur and a carbon. This makes
methionine fairly hydrophobic. Methionine is
the first amino acid in every protein as it is
being synthesized, although it is often removed
after synthesis is complete. Methionines are
also found in the middle of protein sequences.
Aliphatic and Hydrophobic: Leucine, Isoleucine, Valine
• These three amino acids are very similar. They all contain hydrocarbon chains, which makes them hydrophobic, usually found in the interior of proteins. Aliphatic means there are no benzene-type rings (which are called aromatic).
• Valine (Val, V) has an R group that is a V of 3 carbons attached.
• Leucine (Leu, L) is like valine with an extra carbon between the 3 carbon V and the alpha carbon.
• Isoleucine (Ile, I) has the same number of carbons in its R group as leucine, but arranged slightly differently: it’s like valine with an extra carbon on one of the arms.
Valine
Leucine
Isoleucine
Aromatic: Phenylalanine, Tyrosine, Tryptophan
• Phenylalanine (Phe, F) is like alanine with a phenyl group (a benzene ring) attached. It is a very hydrophobic amino acid, usually buried in the interior of proteins or membranes.
• Tyrosine (Tyr, Y) is phenylalanine with an –OH group attached to the ring. It is also hydrophobic, but not as much as phenylalanine. The –OH group can form hydrogen bonds, so in some proteins, tyrosine is found exposed to water.
• Tryptophan (Trp, W) is the largest and least common amino acid. It has an R group with 2 rings. The 5 member ring contains a nitrogen. This structure is an indole ring, with indole being the compound that gives feces its characteristic odor. Because of the nitrogen, tryptophan can form hydrogen bonds even though it is quite hydrophobic.
Phenylalanine
Tyrosine
Tryptophan
Imino Acid: Proline
• Proline (Pro, P) is unique in that its R
group is attached to the amino nitrogen
as well as to the alpha carbon. For a
chemist, this makes proline an imino
acid, not an amino acid. This bond
means that proline necessarily
introduces a kink in the polypeptide
backbone. Once proline is in a peptide bond, there is no H attached to
the N, so proline can’t donate a
backbone hydrogen bond (its C=O can still accept one). All other amino
acids can at least form hydrogen bonds
with the N-H in the backbone. Proline
is hydrophobic.
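The groupings used on the last few slides can be collected into one lookup, which also gives a quick way to estimate the charge on a peptide. This is a rough sketch: the group names are just labels for the slide headings, the function names are made up, and the charge estimate counts only Asp/Glu as -1 and Lys/Arg as +1. It ignores histidine (only partly charged) and the two chain ends.

```python
# Amino acids grouped the way the slides present them (one-letter codes).
GROUPS = {
    "simplest": "GA",
    "hydroxyl": "ST",
    "acidic": "DE",
    "amide": "NQ",
    "basic": "KRH",
    "sulfur": "CM",
    "aliphatic": "VLI",
    "aromatic": "FYW",
    "imino": "P",
}

# Reverse lookup: residue -> group name.
GROUP_OF = {residue: name
            for name, residues in GROUPS.items()
            for residue in residues}


def approximate_charge(sequence):
    """Crude net charge near neutral pH: D/E count -1, K/R count +1."""
    charge = 0
    for residue in sequence.upper():
        if residue in "DE":
            charge -= 1
        elif residue in "KR":
            charge += 1
    return charge


if __name__ == "__main__":
    peptide = "MKRDLLE"  # made-up example sequence
    print([GROUP_OF[r] for r in peptide])
    print(approximate_charge(peptide))
```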
Peptide Linkage
• The peptide bond connects the amino group of one amino acid to the acid group of the next amino acid. This bond is called a peptide bond; organic chemists would call this an amide bond.
– The peptide bond region is almost always planar with the C=O sticking out one side, and the H on the nitrogen sticking out the other side. These groups are both polar and easily form hydrogen bonds.
• The other two bonds in the polypeptide backbone are called psi (ψ), between the acid carbon and the alpha carbon, and phi (φ), between the amino nitrogen and the alpha carbon.
• These bonds can rotate freely, but they are constrained by steric hindrance between the R groups (i.e. they bump into each other); the book refers to this as Van der Waals forces. Formation of hydrogen bonds and ionic bonds also influences the phi and psi bond angles, as do hydrophobic interactions: the need for some amino acids to get away from water and others to be in contact with water.
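The phi and psi angles are dihedral (torsion) angles, each defined by four consecutive backbone atoms: phi by C(previous residue), N, Cα, C, and psi by N, Cα, C, N(next residue). Here is a minimal sketch of how such an angle can be computed from atom coordinates. The function name and the toy coordinates are made up for illustration; real coordinates would come from a structure file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return tuple(x * s for x in a)


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle about the p1-p2 bond, in degrees (-180 to 180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Remove the components along the central bond, leaving the parts
    # of b0 and b2 that are perpendicular to it.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates: four atoms with the outer two at right angles.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For phi you would pass the coordinates of C, N, Cα, C in that order; for psi, N, Cα, C, N.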
Levels of Protein Structure
• Primary (1°) structure: the amino acid sequence.
• Secondary (2°) structure: local structures, mostly the alpha helix and the beta sheet.
• Tertiary (3°) structure: the overall folding pattern of the whole polypeptide.
• Quaternary (4°) structure: how different polypeptides join together to form a protein with multiple subunits.
• Between secondary and tertiary is a very important level: the domain. A domain is a region of a polypeptide that can fold into a compact functional structure independent of the rest of the protein. Most proteins are composed of several domains. Domains are usually 100-200 amino acids long. Domains are more conserved in evolution than whole proteins are.
Pyruvate kinase, an enzyme with 3 domains
Secondary Structure
• There are just 2 common secondary structures
in proteins: the alpha helix and the beta sheet.
Both are held together by hydrogen bonds
between the C=O in one peptide bond and the
N-H in another peptide bond.
• The alpha helix is a rigid cylinder formed when
a single chain is twisted so the C=O of one
amino acid is hydrogen-bonded to the N-H of
the fourth amino acid down the backbone.
– This gives one turn every 3.6 amino acids
• In an alpha helix, the protein backbone is in the
center, with the R groups jutting out.
• The transmembrane regions of proteins often
consist of alpha helices with hydrophobic R
groups. The hydrophilic backbone is shielded
from the membrane interior by the R groups.
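Because transmembrane helices are runs of hydrophobic R groups, a common first-pass way to look for them is to slide a window along the sequence and average a hydrophobicity score. The sketch below uses the Kyte-Doolittle hydropathy scale. The 19-residue window and the 1.6 cutoff are a conventional rule of thumb taken here as assumptions, and the function names are made up. It is a rough screen, not a real predictor.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def has_transmembrane_candidate(sequence, window=19, threshold=1.6):
    """True if any window is hydrophobic enough to suggest a membrane helix."""
    return any(average > threshold
               for average in hydropathy_profile(sequence, window))


if __name__ == "__main__":
    # Made-up sequence: a leucine-rich stretch flanked by charged residues.
    toy = "KKKK" + "L" * 20 + "DDDD"
    print(has_transmembrane_candidate(toy))
```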
More Alpha Helix
• Many transmembrane proteins
use several alpha helices wrapped
up together.
– Ion channels and other transporter
molecules
• Sometimes 2 or 3 alpha helices
will wrap around each other: this
is called a coiled coil.
– In an amphipathic alpha helix, the
hydrophobic R groups lie on one side
of the helix. In a coiled coil, these
hydrophobic sides interact with each
other, while hydrophilic R groups on
the other side of each helix can
interact with water.
– Structural proteins like keratin (in
skin) and myosin (muscle) are long
coiled-coil rods.
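With 3.6 amino acids per turn, each residue sits 100 degrees further around the helix than the one before it. Plotting those angles gives a "helical wheel", and summing a hydrophobicity score around the wheel shows whether the hydrophobic R groups cluster on one side. The sketch below is a simplified version of that idea: the hydrophobic set and the +1/-1 scoring are assumptions made for this example, and the function names are made up.

```python
import math

DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn

# Assumed simple split: these count as hydrophobic (+1), everything else as -1.
HYDROPHOBIC = set("AVLIMFWC")


def wheel_angles(sequence):
    """Angular position of each residue around the helix axis, in degrees."""
    return [(residue, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, residue in enumerate(sequence.upper())]


def hydrophobic_moment(sequence):
    """Near 0 if hydrophobic residues are spread evenly; larger if one-sided."""
    x = y = 0.0
    for residue, angle in wheel_angles(sequence):
        score = 1.0 if residue in HYDROPHOBIC else -1.0
        x += score * math.cos(math.radians(angle))
        y += score * math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)


if __name__ == "__main__":
    # Made-up 18-residue peptides (18 residues = 5 full turns).
    print(hydrophobic_moment("L" * 18))
    print(hydrophobic_moment("LKKLLKKLKKLLKKLLKK"))
```

A helix made only of leucine scores close to zero because there is no "side" to it; a helix with leucines on one face and lysines on the other scores much higher.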
More Secondary Structure: Beta Sheet
• A beta strand is a region of a polypeptide in which the V-shaped N-C-C backbone units of the amino acids alternate up and down. The R
groups also alternate up and down.
• Two or more beta strands next to each other form a beta pleated sheet.
• Beta sheets are held together by hydrogen bonds between the amino acids
of the different strands.
• The strands can be parallel or anti-parallel. Or even mixed.
More Beta Sheet
• Beta sheets form
strong and rigid
proteins, such as silk
protein.
• Beta sheets are
usually not planar:
they tend to curl up.
• A beta barrel is a
common
transmembrane
structure that readily
forms a pore.
Protein Domains
• Protein domains are structures that fold independently into compact units that have a specific function.
– Domains can contain alpha helices, beta sheets, or both, plus less well defined regions.
– Proteins can be composed of just a single domain, or they can contain several different domains.
• Example of a multi-domain protein: steroid hormone receptors are proteins that have two domains:
– a ligand binding domain that binds to the steroid hormone
– a DNA binding domain that binds to DNA and stimulates transcription
– The two domains are connected by a few amino acids called a hinge region.
• Some domains are found in several different proteins: they are shuffled between different proteins during evolution.
– An example: the TIM barrel is found in at least 30 different proteins.
• The three dimensional structure is more evolutionarily conserved than the amino acid sequence.
Some TIM Barrel Proteins
Domain Structure of a Family of Proteins
• Proteins often have
several different
domains, with similar
proteins having
slightly different
arrangements of the
domains.
• Here are the domain
structures of several
extracellular proteins.
The red domain is the
“chordin-like cysteine-rich domain”.
Protein Folding
• Proteins spontaneously fold into a
lowest energy conformation. This is the
active conformation, which allows the
protein to function properly.
– In the 1950s Christian Anfinsen showed
that pancreatic RNase could refold itself
into its active configuration after
denaturation, without any external
guidance. This, and many confirming
experiments on other proteins, has led
to the general belief that the amino
acid sequence of a protein contains all
the information needed to fold itself
properly, without any additional energy
input.
• Natural selection has strongly favored
protein sequences that have a single
conformation that forms easily and
without making mistakes. The vast
majority of randomly chosen protein
sequences don’t do this: they have
multiple conformations that are about
equally low energy.
Anfinsen experiment: RNase is held
together by disulfide bonds as well as
the non-covalent bonds of the folding.
The protein was denatured by adding
urea and reducing the disulfides to –SH
groups. The experimental group had
the urea removed first, then the
disulfides re-formed by oxidation. The
control had disulfides re-formed first,
while the protein was still in a random
conformation.
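To see why the control matters, count the ways the disulfides could re-form at random. RNase A has 8 cysteines that pair into 4 disulfide bonds (a detail not on the slide), and the number of ways to pair 8 cysteines is 7 × 5 × 3 × 1 = 105, only one of which is the native pattern. The sketch below does that count and also lists the pairings. The function names are made up for this example.

```python
def count_pairings(n_cysteines):
    """Number of ways to pair up n cysteines into disulfides (0 if n is odd)."""
    if n_cysteines % 2:
        return 0
    total = 1
    for k in range(n_cysteines - 1, 0, -2):
        total *= k  # first cysteine picks a partner, then recurse on the rest
    return total


def pairings(cysteines):
    """Yield every complete set of disulfide pairs for the given cysteines."""
    cysteines = list(cysteines)
    if not cysteines:
        yield []
        return
    first = cysteines[0]
    for i in range(1, len(cysteines)):
        rest = cysteines[1:i] + cysteines[i + 1:]
        for others in pairings(rest):
            yield [(first, cysteines[i])] + others


if __name__ == "__main__":
    n = count_pairings(8)
    print(n, "possible pairings; chance of the native one at random:", 1 / n)
```

So if the disulfides re-form while the chain is still in a random conformation, only about 1% of the molecules would be expected to land on the correct set of bonds purely by chance.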
Forces That Fold Proteins
• Proteins are thought to fold into the lowest free energy conformation.
• It is thought that the fastest acting forces are hydrophilic and hydrophobic
interactions: the need for amino acids with hydrophobic R groups to
aggregate together away from water, and the need for the hydrophilic
amino acids to fit into the structure of water molecules.
• Formation of hydrogen bonds and ionic bonds (electrostatic interactions)
happens after the initial hydrophobic interactions.
• Van der Waals forces: the slight attraction between all atoms coupled with
repulsion if they get too close. This makes the atoms of a protein pack
together (as seen in space-filling models).
Protein Folding Energetics
• An unfolded protein has high entropy:
there are many different conformations it
can be in. This is symbolized by the width
of the funnel.
• Unfolded proteins also have a high free
energy, meaning that the protein chain
moves easily between different
conformations
• As protein folding proceeds, both the free
energy and the entropy decrease to a
minimum. The protein assumes a single
conformation (meaning entropy is reduced
to a minimum) that is the lowest free
energy state.
• There are various intermediate folding
states, some of which can trap the protein
into a misfolded and inactive conformation.
Bioinformatics: The Protein Folding Problem
• The protein folding problem: predict the three dimensional structure of a protein from its amino acid sequence.
• If proteins fold into their lowest energy configuration, based entirely on their amino acid sequence, you would think that we could figure out the rules and be able to predict three dimensional structure just from the sequence.
– It’s easy to get the amino acid sequence: just translate the DNA sequence of the genes.
• However, this hasn’t proved true: we can make useful guesses about structure, but they are still very inaccurate.
A few small proteins whose structure has been predicted from the amino acid sequence (blue), then compared to the actual structure (red).
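The "easy" step, translating DNA into an amino acid sequence, really is only a few lines of code. The sketch below uses the standard genetic code and assumes you already have the coding strand, in frame, with no introns. The function name and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G
# (first base changes slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), CODE)}


def translate(dna):
    """Translate a coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTTTAAACGTTGGTGA"))  # made-up open reading frame
```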
Chaperone Proteins
• Cells contain machinery to
unfold and refold proteins that
are mis-folded.
– Some cellular proteins require
specific chaperones to fold
properly: they can’t fold into
their active configuration by
themselves
• Best studied is Hsp100, also
called Clp. It uses ATP energy
to unfold a protein (and also
break any disulfide bridges).
Then, the protein chain gets
fed through a very tiny
opening, allowing it to refold
on the other side.
Protein Aggregates
• Under various stress conditions (like heat or high pH), proteins unfold
from their correct, functional configuration. Often, the unfolded
proteins bind together into insoluble aggregates.
– think what happens when cooking an egg white: the clear water-soluble
liquid before cooking is due to albumin proteins existing as individual
globular proteins.
– When you heat the egg whites, you unfold the proteins, and they aggregate
together into an insoluble (but more easily digested) white mass, because the
hydrophobic amino acids from different polypeptides stick together to get
away from the aqueous environment. This is a protein aggregate.
Protein Aggregation in
Neurodegenerative Diseases
• A more relevant protein aggregate
issue: neurodegenerative
diseases. Alzheimer’s, prion
diseases, Huntington’s disease, and
Parkinson’s disease all involve
the formation of insoluble
protein aggregates in the brain.
– These aggregates are mis-folded
proteins that form fibrils rich in
beta sheet structures. They are
called amyloid.
– As the protein folds, it gets caught
in a local free energy minimum,
which can combine with other
mis-folded proteins.
– Why these aggregates are toxic to
neurons is still unclear.
Prions
• A prion is an “infectious protein”. Prion proteins are encoded in the genome and expressed at high levels in the nervous system. They presumably have a function in normal cells, as yet unknown.
• Prions are the agents that cause various neurodegenerative diseases: mad cow disease (bovine spongiform encephalopathy), chronic wasting disease in deer and elk, scrapie in sheep, and kuru and Creutzfeldt-Jakob disease in humans.
• The normal prion protein (PrP) is folded into a specific conformation, a state called PrPC. Prion diseases are caused by the same protein folded abnormally, a state called PrPSc.
• A PrPSc can bind to a normal PrPC protein and convert it to PrPSc. This conversion spreads throughout the body, causing the disease to occur.
– It is also a form of inheritance that does not involve nucleic acids.
• Several prion-like proteins are known in yeast, a model eukaryote. These proteins have 2 stable conformations, which can be inherited across generations.
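The idea that one PrPSc converts a PrPC, which can then convert more, is a chain reaction, and a toy calculation shows how a tiny seed can take over. The sketch below is a deliberately simple model, not a description of real prion kinetics: the rate constant, the starting amounts and the function name are all made-up assumptions.

```python
def simulate_conversion(normal, misfolded, rate, steps):
    """Toy model: each step, misfolded protein converts some normal protein.

    The amount converted per step is rate * normal * misfolded, capped at
    the amount of normal protein that is left.
    """
    history = [(normal, misfolded)]
    for _ in range(steps):
        converted = min(normal, rate * normal * misfolded)
        normal -= converted
        misfolded += converted
        history.append((normal, misfolded))
    return history


if __name__ == "__main__":
    # Hypothetical numbers: 1000 normal molecules seeded with 1 misfolded one.
    for step, (prp_c, prp_sc) in enumerate(simulate_conversion(1000.0, 1.0, 0.001, 20)):
        print(step, round(prp_c, 1), round(prp_sc, 1))
```

Without a misfolded seed nothing happens; with even one, the misfolded form grows slowly at first, then rapidly, until the normal form is used up.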
Quaternary Structure: Assembly of Subunits
• Most proteins are composed of more
than one polypeptide chain (subunit).
• Simplest cases: 2 identical subunits bind
together to form a symmetrical structure
– Sometimes 4 identical subunits, as in
neuraminidase
• Protein complexes with many different
subunits are common.
• Even larger structures: viruses,
ribosomes, etc. can often be
disassembled, and then they
spontaneously reassemble.
– This implies all the information needed for
assembly is present in the proteins
themselves
– Some of these structures also contain RNA
or DNA.
Superoxide dismutase, a
dimer of 2 identical
subunits.
ATP synthetase, multiple copies
of at least 5 different subunits
Quaternary Structure of Fibrous Proteins
• Fibrous proteins provide structure to a cell: actin filaments, microtubules,
collagen, keratin, etc.
• These fibers are formed from many identical subunits binding together.
Virus Assembly
• This is also a spontaneous process, but it can be very complicated.