Introduction to Bioinformatics
Lecture 14: Protein Folding
Centre for Integrative Bioinformatics VU (IBIVU)
18 Apr 2006
Introduction to Protein Structure
• Great book covering the basics of protein structure:
– Short Introduction to Molecular Structures
– “Introduction to Protein Structure”
• Chapters 1 to 5
• Carl Branden & John Tooze
ISBN: 0-8153-2305-0
Prelude: molecular structures
• John Dalton (1810): A New System of Chemical Philosophy
• Elements, but no structures yet
• Mendeleev (1869)
Johannes van ’t Hoff
• Chimie dans l’Espace
“Proposal for the development of three-dimensional chemical structural formulae” (1875)
• Tetrahedral carbon atom
Linus Pauling (1951)
• Atomic Coordinates and Structure Factors for Two Helical Configurations of Polypeptide Chains
• Alpha-helix
James Watson & Francis Crick (1953)
• Molecular structure of nucleic acids
DNA/Protein structure-function analysis and prediction
The building blocks:
• Chains of amino acids
• Three-dimensional structures
• Four levels of protein architecture
• Amino acids: classes
• Disulphide bridges
• Histidine
• Proline
• Ramachandran plot: mainchain dihedral angles
• Rotamers: sidechain dihedral angles
The Building Blocks (proteins)
• Proteins consist of chains of amino acids
• Bound together through the peptide bond
• Special folding of the chain yields structure
• Structure determines the function
Chains of amino acids
Three-dimensional Structures
• Four hierarchical levels of protein architecture
Amino acids: physicochemical classes
• Hydrophobic amino acids
Alanine         Ala   A
Phenylalanine   Phe   F
Leucine         Leu   L
Methionine      Met   M
Valine          Val   V
Isoleucine      Ile   I
Proline         Pro   P
• Charged amino acids
Aspartate (-)   Asp   D
Lysine (+)      Lys   K
Glutamate (-)   Glu   E
Arginine (+)    Arg   R
• Polar amino acids
Serine          Ser   S
Tyrosine        Tyr   Y
Asparagine      Asn   N
Histidine       His   H
Threonine       Thr   T
Cysteine        Cys   C
Glutamine       Gln   Q
Tryptophan      Trp   W
• Glycine (sidechain is only a hydrogen)
Glycine         Gly   G
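The classification on this slide can be written down as a small lookup table. The sketch below is only an illustration: the class labels follow the slide, while the names `AA_CLASS` and `class_composition` and the example sequence are made up here.

```python
# Physicochemical classes as listed on the slide, keyed by one-letter code.
# The dictionary and function names are illustrative, not from the lecture.
AA_CLASS = {
    **dict.fromkeys("AFLMVIP", "hydrophobic"),
    **dict.fromkeys("DKER", "charged"),
    **dict.fromkeys("SYNHTCQW", "polar"),
    "G": "glycine",
}


def class_composition(sequence):
    """Count how many residues of each class occur in a one-letter sequence."""
    counts = {}
    for residue in sequence.upper():
        cls = AA_CLASS.get(residue, "unknown")
        counts[cls] = counts.get(cls, 0) + 1
    return counts


# Hypothetical eight-residue peptide, used only to exercise the function.
print(class_composition("MKVLAGDE"))
```

Other classification schemes exist (histidine, for instance, is often counted as charged), so the table should be read as this slide's grouping rather than a universal one.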
Disulphide bridges
• Two cysteines can form disulphide bridges
• Anchoring of secondary structure elements
Ramachandran plot
• Only certain combinations of values of the phi (φ) and psi (ψ) angles are observed
[Figure: backbone dihedral angles phi, psi and omega, and a Ramachandran plot with axes phi and psi]
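A backbone dihedral angle can be computed from four atom positions. The following is a minimal sketch using only the standard library; the helper names are illustrative and the coordinates in the example are toy points, not real protein data. With the usual definitions, phi is measured over C(i-1), N(i), Cα(i), C(i) and psi over N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy geometry: the outer atoms lie on the same side of the central bond (cis).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```

Applying this to every residue of a structure and plotting psi against phi gives the Ramachandran plot.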
Rotamers: highly populated combinations of side-chain dihedral angles
• Side-chain dihedral angles (χ angles) are numbered χ1, χ2, χ3, ... going outward from the Cα atom
• The number of χ angles depends on the amino acid type
• Rotamers are usually defined as low-energy side-chain conformations
• Using a library of rotamers allows a structure to be modelled by trying only the most likely side-chain conformations, which saves time and produces a structure that is more likely to be correct
DNA/Protein structure-function analysis and prediction
Motifs of protein structure
• Secondary structure elements
• Renderings of proteins
• Alpha helix
• Beta-strands & sheets
• Turns and motifs
• Domains formed by motifs
Motifs of protein structure
• Global structural characteristics:
– Outside hydrophilic, inside hydrophobic (unless…)
– Often globular form (unless…)
Artymiuk et al, Structure of Hen Egg White Lysozyme (1981)
Secondary structure elements
Alpha-helix
Beta-strand
Renderings of proteins
• Irving Geis:
Renderings of proteins
• Jane Richardson:
Alpha helix
• Hydrogen bond:
from N-H at position n, to C=O at position n-4 (‘n-n+4’)
Other helices
• Alternative helices are also possible
– 3₁₀-helix: hydrogen bond from N-H at position n to C=O at position n-3
• Bigger chance of bad contacts
– α-helix: hydrogen bond from N-H at position n to C=O at position n-4
– π-helix: hydrogen bond from N-H at position n to C=O at position n-5
• Structure more open: no contacts
• Hollow in the middle too small for e.g. water
• At the edge of the Ramachandran plot
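The three helix types differ only in how far back along the chain the hydrogen-bond acceptor sits. A small sketch, with illustrative names and 0-based residue numbering, that lists the backbone hydrogen-bond pairs of an ideal helix:

```python
# Donor-to-acceptor offsets taken from the slide: N-H at n bonds to C=O at n - offset.
HELIX_OFFSET = {"3-10": 3, "alpha": 4, "pi": 5}


def hbond_pairs(n_residues, helix="alpha"):
    """Return (donor, acceptor) residue index pairs for an ideal helix."""
    offset = HELIX_OFFSET[helix]
    return [(n, n - offset) for n in range(offset, n_residues)]


# A six-residue alpha helix has room for only two such bonds.
print(hbond_pairs(6, "alpha"))
```

The first few N-H groups and the last few C=O groups of any helix have no partner inside the helix, which is why the list is shorter than the chain.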
Helices
• Backbone hydrogen bonds form the structure
– Often covers hydrophobic centre of protein
• Sidechains point outwards (‘Xmas tree’)
– Possibly: one side hydrophobic, one side hydrophilic (amphipathic helices)
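One way to put a number on "one side hydrophobic, one side hydrophilic" is a simplified hydrophobic moment. The sketch below makes several assumptions that are not on the slide: an ideal alpha helix with 100° between consecutive sidechains (3.6 residues per turn), a binary hydrophobicity score (1 for the hydrophobic class A, F, L, M, V, I, P, 0 otherwise) instead of a published scale, and made-up example sequences.

```python
import cmath
import math

HYDROPHOBIC = set("AFLMVIP")  # binary scale, an assumption for illustration


def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Length-normalised vector sum of hydrophobic sidechain directions."""
    if not sequence:
        return 0.0
    total = 0j
    for i, residue in enumerate(sequence.upper()):
        if residue in HYDROPHOBIC:
            # Each sidechain points outwards at an angle set by its position.
            total += cmath.exp(1j * math.radians(degrees_per_residue * i))
    return abs(total) / len(sequence)


# Hypothetical sequences: hydrophobic residues on one face versus everywhere.
print(hydrophobic_moment("LKKLLKKLLKLLKKLLKK"))
print(hydrophobic_moment("L" * 18))
```

A value near zero means the hydrophobic sidechains are spread evenly around the helix; a larger value means they cluster on one face, as in an amphipathic helix.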
Beta-strands: beta-sheets
• Beta-strands next to each other form hydrogen bonds
Parallel or Antiparallel sheets
[Figure: anti-parallel and parallel sheets]
• Usually only parallel or anti-parallel
• Occasionally mixed
• Sidechains alternating (up-down)
Turns and motifs
• Between the secondary structure elements are loops
• Very short loops between two β-strands: turn
• Different secondary structure elements often appear together: motifs
– Helix-turn-helix
– Calcium binding motif
– Hairpin
– Greek key motif
– β-α-β motif
Helix-turn-helix motif
• Helix-turn-helix: important for DNA recognition by proteins
• EF-hand: calcium binding motif
Hairpin / Greek key motif
• Different possible hairpins: type I/II
• Greek key: anti-parallel beta-sheets
b--b motif
• Most common way to obtain
parallel b-sheets
• Usually the motif is ‘righthanded’
Domains formed by motifs
• Within a protein, different domains can be identified
– For example:
• ligand binding domain
• DNA binding domain
• Catalytic domain
• Domains are built from motifs of secondary structure elements
Alpha/beta barrels
• TIM barrel, named after triosephosphate isomerase
• Usually 8 β-strands, at least 200 amino acids
• Often hydrophobic interior
– alternating amino acids in the strands
Alpha/beta barrels
• Active site formed by (variable) loop regions at top of the barrel
• Exception: active site in the core of methylmalonyl-coenzyme A mutase
Summary
• Amino acids form polypeptide chains
• Chains fold into a three-dimensional structure
• Specific backbone angles are permitted or not: Ramachandran plot
• Secondary structure elements: α-helix, β-sheet
• Common structural motifs: Helix-turn-helix, Calcium binding motif, Hairpin, Greek key motif, β-α-β motif
• Combination of elements and motifs: tertiary structure
• Many protein structures available: PDB
Sequence-Structure-Function
What can we do with bioinformatics?
[Diagram: Sequence, Structure and Function linked by arrows, each marked as knowledge-based (green) or ab initio (red). Arrow labels:]
– Inverse folding, threading
– BLAST
– Folding: impossible but for the smallest structures
– Function prediction from structure: very difficult
• Ab initio prediction (based on first principles) is still not generally successful (red)
• Many bioinformatics methods are therefore knowledge-based (green)
Active protein conformation
• The active conformation of a protein is the native state
• Unfolded, denatured state is reached at:
– high temperature
– high pressure
– high concentrations of urea (8 M)
• Equilibrium between the two forms
[Figure: denatured state and native state]
Anfinsen’s Theorem (1950’s)
• Primary structure determines tertiary structure.
“In the mid 1950’s Anfinsen began to concentrate on the problem of the relationship between structure and function in enzymes. […] He proposed that the information determining the tertiary structure of a protein resides in the chemistry of its amino acid sequence. […] It was demonstrated that, after cleavage of disulfide bonds and disruption of tertiary structure, many proteins could spontaneously refold to their native forms. This work resulted in general acceptance of the ‘thermodynamic hypothesis’ (Nobel Prize Chemistry 1972).”
www.nobel.se/chemistry/laureates/1972/anfinsen-bio.html
• Anfinsen performed unfolding/refolding experiments
Dimensions: Sequence Space
• How many sequences of length $n$ are possible?
$N(\mathrm{seq}) = 20 \cdot 20 \cdot 20 \cdots = 20^n$
e.g. for $n = 100$, $N = 20^{100} \approx 10^{130}$, which is astronomically large
– Only a subset of these will fold into a stable conformation
• The probability $p$ of finding the same sequence twice is $p = 1/N$, e.g. $1/10^{130}$, which is nearly zero.
• Evolution: divergent or convergent
– sequences are dissimilar, in divergent and particularly in convergent evolution
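The size of sequence space is easy to check with logarithms. A short sketch (the function name is illustrative):

```python
import math


def log10_sequences(n, alphabet=20):
    """Base-10 logarithm of the number of sequences of length n."""
    return n * math.log10(alphabet)


# 20^100 is about 10^130, as stated on the slide.
print(round(log10_sequences(100), 1))  # 130.1
```

Working with the exponent avoids printing a 131-digit integer and makes it easy to try other lengths.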
Dimensions: Fold Space
• How many folds exist?
– Sequences cluster into sequence families and fold families
– Some have many members, some few or only one:
• Using Zipf’s law: $n(r) = a / r^b$
• For sequence families: $b \approx 0.64 \Rightarrow n_{total} \approx 60000$
• For fold families: $b \approx 0.8 \Rightarrow n_{total} \approx 14000$
$r$ is the rank of the family, $n(r)$ is the number of proteins in the $r$-th family, and $a$ is a scaling constant that depends on the number of proteins in the dataset. The constant $b$ does not depend on the size of the dataset.
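Zipf's law as written on this slide is a one-line function. In the sketch below the scaling constant `a = 1000` is an arbitrary illustrative value, since the slide gives only the exponents.

```python
def zipf_family_size(rank, a, b):
    """Expected number of proteins in the family of the given rank: a / rank**b."""
    return a / rank ** b


# Family sizes at ranks 1, 10 and 100 for the fold-family exponent b = 0.8.
for r in (1, 10, 100):
    print(r, round(zipf_family_size(r, a=1000.0, b=0.8), 1))
```

A smaller exponent gives a flatter distribution, so family sizes fall off more slowly with rank.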
Levinthal’s paradox (1969)
• Denatured protein re-folds in ~0.1 – 1000 seconds
• Protein with e.g. 100 amino acids, each with 2 torsions (φ and ψ)
Each can assume 3 conformations (1 trans, 2 gauche)
$3^{100 \times 2} \approx 10^{95}$ possible conformations!
• Or:
100 amino acids with 3 possibilities in the Ramachandran plot (α, β, L): $3^{100} \approx 10^{47}$ conformations
• If the protein can visit one conformation in one ps ($10^{-12}$ s), an exhaustive search costs $10^{47} \times 10^{-12}$ s $= 10^{35}$ s $\approx 10^{27}$ years!
(the lifetime of the universe $\approx 10^{10}$ years…)
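The Levinthal estimate can be reproduced in a few lines. This sketch uses the exact value of 3^100 (about 5 × 10^47) rather than the slide's rounded 10^47, so it lands roughly one order of magnitude higher; the function name and the 365.25-day year are choices made here.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_years(n_residues, states_per_residue=3, seconds_per_state=1e-12):
    """Years needed to visit every conformation once, one per time step."""
    conformations = states_per_residue ** n_residues  # exact integer
    return conformations * seconds_per_state / SECONDS_PER_YEAR


print(f"{levinthal_years(100):.1e} years")
```

Whichever rounding is used, the result exceeds the age of the universe by many orders of magnitude, which is the point of the paradox.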
Levinthal’s paradox
Protein folding problem:
– Predict the 3D structure from sequence
– Understand the folding process
From 1D to 3D…
What to fold?
…fastest folders
[Figure: chart with axes in nanoseconds and CPU-days to CPU-years for PPA, alpha helix, BBA5, beta hairpin and villin]
Pande et al. “Atomistic Protein Folding Simulations on the Submillisecond Time Scale
Using Worldwide Distributed Computing” Biopolymers (2003) 68 91–109
Rates: predicted vs experiment
[Figure: predicted folding time (nanoseconds) against experimental measurement (nanoseconds), both axes from 1 to 100000, for PPA, alpha helix, beta hairpin, BBAW and villin]
Experiments:
– BBAW: Gruebele et al., UIUC
– villin: Raleigh et al., SUNY, Stony Brook
– beta hairpin: Eaton et al., NIH
– alpha helix: Eaton et al., NIH
– PPA: Gruebele et al., UIUC
Predictions: Pande et al., Stanford
Molten globule
• First step: hydrophobic collapse
• Molten globule: globular structure, not yet correctly folded
• Local minimum on the free energy surface
Folded state
• Native state = lowest point on the free energy landscape
• Many possible routes
• Many possible local minima (misfolded structures)
Folding energy
• Each protein conformation has a certain energy and a certain flexibility (entropy)
• Corresponds to a point on a multidimensional free energy surface
Three coordinates per atom
3N-6 dimensions possible
$\Delta G = \Delta H - T\Delta S$
In very rough generalities:
$\Delta H$ relates to bond formation/breaking
$\Delta S$ relates to configurational freedom and water ordering
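The free energy relation can be turned into a small calculation. The sketch below assumes a simple two-state equilibrium between the denatured and native forms and uses made-up values for the enthalpy and entropy of folding; the function names and the units (kJ/mol, K) are choices made here, not part of the slide.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)


def delta_g(delta_h, temperature, delta_s):
    """Free energy change: delta_h in kJ/mol, delta_s in kJ/(mol K)."""
    return delta_h - temperature * delta_s


def fraction_native(delta_g_folding, temperature):
    """Two-state population of the native form for a given folding free energy."""
    return 1.0 / (1.0 + math.exp(delta_g_folding / (R * temperature)))


# Hypothetical folding reaction: favourable enthalpy, unfavourable entropy.
dg = delta_g(delta_h=-200.0, temperature=300.0, delta_s=-0.5)
print(dg, fraction_native(dg, 300.0))
```

Because the enthalpy and entropy terms nearly cancel, a modest rise in temperature can change the sign of the free energy change and tip the equilibrium towards the denatured state.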
Hydrophobic Effect
Fundamental: the hydrophobic effect is a solvent effect
[Figure: oil and water]
How is the interfacial water layer ordered?
Hydrophobic Effect in Protein Folding
[Figure: Unfolded → Folded + HOH (released water), $\Delta S$ = +]
• Unfolded: more hydrocarbon-water interfacial area, more water ordered
• Folded: less hydrocarbon-water interfacial area, less water ordered
Helper proteins
• Forming and breaking disulfide bridges
– Disulfide bridge forming enzymes: Dsb
– protein disulfide isomerase: PDI
• “Isomerization” of proline residues
– Peptidyl prolyl isomerases
• Chaperones
– Heat shock proteins
– GroEL/GroES complex
– Preventing or breaking ‘undesirable interactions’…
Disulfide bridges
• Equilibria during the folding process
Proline: two conformations
• Peptide bond nearly always trans (1000:1)
• For proline the cis conformation is also possible (trans:cis equilibrium = 4:1)
• For folding, all prolines need to be in the correct conformation (usually trans)
– Isomerization is a bottleneck; cyclophilin catalyses it
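The trans:cis ratios quoted here correspond to free energy differences through $\Delta G = RT \ln K$. The sketch below assumes a temperature of 298 K, which the slide does not state, and the function names are illustrative.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)


def free_energy_gap(ratio, temperature=298.0):
    """Free energy difference (kJ/mol) implied by a trans:cis population ratio."""
    return R * temperature * math.log(ratio)


def minor_fraction(ratio):
    """Fraction of molecules in the less populated (cis) form."""
    return 1.0 / (1.0 + ratio)


print(round(free_energy_gap(1000), 1), "kJ/mol for an ordinary peptide bond")
print(round(free_energy_gap(4), 1), "kJ/mol for a bond preceding proline")
print(minor_fraction(4), "of proline bonds are cis at equilibrium")
```

The small gap for proline explains why a sizeable share of unfolded chains start with at least one proline in the wrong isomer and must wait for isomerization.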
Chaperones
• During the folding process, hydrophobic parts on the outside?
– Risk of aggregation of proteins
• Chaperones offer protection
– Are mainly formed at high temperatures (when needed)
– Heat-shock proteins: Hsp70, Hsp60 (GroEL), Hsp10 (GroES)
GroEL/GroES complex
• GroEL:
– Two rings of seven subunits each
– Each subunit has an equatorial, intermediate and apical domain
– ATP hydrolysis; ATP/ADP diffuse through the intermediate domain
• GroES:
– Also seven subunits
– Closes cavity of GroEL
GroEL/GroES mechanism
• GroES binding changes both sides of GroEL
– closed cavity
– open cavity
• Cycle
– protein binds side 1
– GroES covers, ATP binds
– ATP → ADP + Pi
– ATP binds side 2
– ATP → ADP + Pi
• GroES opens
• folded protein exits
• ADP exits
– New protein binds
Alternative folding: prions
• Prion proteins are found in the brain
• Function unknown
• Two forms
– normal alpha-structure
– harmful beta-structure
• beta-structure can aggregate and form ‘plaques’
– Blocks certain tissues and functions in the brain
Protein flexibility
• Even a correctly folded protein is dynamic
– Crystal structure yields the average position of the atoms
– ‘Breathing’ overall motion possible
B-factors
• The average motion of an atom around the average position
[Figure: B-factors for a beta-sheet and alpha helices]
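A B-factor can be converted into a displacement with the standard isotropic relation $B = 8\pi^2 \langle u^2 \rangle$, where $\langle u^2 \rangle$ is the mean-square displacement in Å². That formula is not on the slide; it is added here as background, and the function name and example value are illustrative.

```python
import math


def rms_displacement(b_factor):
    """Root-mean-square displacement in angstrom for an isotropic B-factor in square angstrom."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# A B-factor of 20 square angstrom corresponds to roughly half an angstrom.
print(round(rms_displacement(20.0), 2))
```

Loops and surface sidechains typically show larger values than the tightly packed core, which is what a B-factor colouring of a structure makes visible.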
Protein Tertiary Structure Tied to Function
Conformational changes
• Conformational changes often play an important role in the function of the protein
• Estrogen receptor
– With activator (agonist) bound: active
– With inactivator (antagonist) bound: not active
[Figure: active and inactive conformations]
Main points
• Anfinsen: proteins fold reversibly!
• Levinthal: too many conformations for fast folding?
– First hydrophobic collapse, then local rearrangement
• Protein folding funnel
– Assistance with protein folding
• Disulphide bridge formation
• Proline isomerization
• Chaperonins
• Intrinsic flexibility: Breathing / Conformational change
– Conformational changes for
• Activation / Deactivation