Chapter 4: Protein Structures II
BINF 6101/8101, Spring 2017
Protein Structure Classification
Why bother?
•  Reveals structural and evolutionary relationships
•  Describes the currently known fold space
•  Assists protein structure prediction (BINF6202/8202)
Two popular protein classification databases:
•  SCOP (Structural Classification of Proteins) / SCOPe (extended)
http://scop.berkeley.edu
Latest release: v2.06 (February 2016)
244,326 domains
Murzin et al. J. Mol. Biol. 247, 536-540, 1995
•  CATH: Class (C), Architecture (A), Topology (T), and Homologous superfamily (H)
http://www.cathdb.info/
Latest release: v4.1 (January 1, 2015) | CATH-B
308,999 domains
Orengo et al. Structure, 5, 1093-1108, 1997
Classifications are Domain-based. Why?
•  Domains are the basic units for protein structure comparison/classification
(protein domain databases: SCOP, CATH)
•  Structural domains are evolutionary, functional, and folding units of proteins
•  Very useful in protein structure prediction: many structural similarities are
between domains
•  Protein design
Nucleic Acids Res. 1998 Jan 1;26(1):316-9.
Classifications are Domain-based
What is a protein domain?
--There is no single agreed-upon definition of a protein domain
--General Considerations:
•  compact, semi-independent units *
(close to spherical shape)
•  interactions between domains are weak
(small contact)
•  identifiable hydrophobic core **
(interface is more hydrophilic)
*  Wetlaufer DB. PNAS 1973; 70:697-701
** Swindells MB. Protein Science 1995; 4:103-112
Pyruvate kinase
Simple Cases—Continuous Domains
Adding to the Complexity—Discontinuous Domains
N-terminal
C-terminal
SCOP Classification:
33844 px  c.56.5.4   d1cg2a1  1cg2  A:26-213,A:327-414
39360 px  d.58.19.1  d1cg2a2  1cg2  A:214-326
About 20% of multidomain proteins are non-contiguous
Redfern OC. et al, PLoS Computational Biology, 2007
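The region strings in the SCOP classification above (for example A:26-213,A:327-414) encode which chain segments make up a domain. A minimal sketch of how such a string could be split into segments and checked for discontinuity is given below. The function names are hypothetical, and the parser assumes the simple chain:start-end form shown on the slide; whole-chain regions, negative residue numbers, and insertion codes are not handled.

```python
def parse_scop_region(region):
    """Split a region string such as 'A:26-213,A:327-414' into (chain, start, end) tuples.

    Assumes every segment has the form chain:start-end with positive residue numbers.
    """
    segments = []
    for part in region.split(","):
        chain, _, span = part.partition(":")
        start, end = span.split("-")
        segments.append((chain, int(start), int(end)))
    return segments


def is_discontinuous(region):
    """A domain built from more than one chain segment is discontinuous."""
    return len(parse_scop_region(region)) > 1


def domain_length(region):
    """Number of residue positions covered by all segments of the domain."""
    return sum(end - start + 1 for _, start, end in parse_scop_region(region))


if __name__ == "__main__":
    for sid, region in [("d1cg2a1", "A:26-213,A:327-414"), ("d1cg2a2", "A:214-326")]:
        print(sid, parse_scop_region(region), is_discontinuous(region), domain_length(region))
```

For 1cg2, the first domain is reported as discontinuous because the second domain (A:214-326) is inserted between its two segments.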
Multi-domain Proteins
~50% of proteins are multi-domain (data from 2005)
It could be as high as 80% in eukaryotes
Redfern OC. et al, PLoS Computational Biology, 2007
Hierarchical Structure Classification by SCOP
SCOP—Structural Classification of Proteins
Family: Clear evolutionary relationship
(1) Pairwise residue identities between the proteins are 30% or greater.
(2) Proteins with lower sequence identity but very similar functions and structures; for
example, many globins have sequence identities of only 15%.
Superfamily: Probable common evolutionary origin
Proteins that have low sequence identities, but whose structural and functional
features suggest that a common evolutionary origin is probable are placed together in
superfamilies. For example, actin, the ATPase domain of the heat shock protein, and
hexokinase together form a superfamily.
Fold: Major structural similarity
(1) Proteins have the same major secondary structures in the same arrangement and with
the same topological connections.
(2) Proteins placed together in the same fold category may not have a common
evolutionary origin: the structural similarities could arise just from the physics and
chemistry of proteins favoring certain packing arrangements and chain topologies.
Class: secondary structure content and organization
Murzin et al. J. Mol. Biol. 247, 536-540, 1995
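The four SCOP levels described above are what the dotted classification strings such as c.56.5.4 encode: class, fold, superfamily, and family. The sketch below shows one way to parse such a string and find the deepest level two domains share. The function and type names are hypothetical, and the class-letter table covers only the four classes named in these slides (a, b, c, d).

```python
from typing import NamedTuple, Optional

# Only the four classes discussed in these slides; SCOP defines further classes.
CLASS_NAMES = {"a": "all alpha", "b": "all beta", "c": "alpha/beta", "d": "alpha+beta"}

LEVELS = ("class", "fold", "superfamily", "family")


class Sccs(NamedTuple):
    cls: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(text: str) -> Sccs:
    """Parse a string like 'c.56.5.4' into its four hierarchy levels."""
    cls, fold, superfamily, family = text.split(".")
    return Sccs(cls, int(fold), int(superfamily), int(family))


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Return the deepest level two classifications have in common, or None."""
    shared = None
    for level, x, y in zip(LEVELS, parse_sccs(a), parse_sccs(b)):
        if x != y:
            break
        shared = level
    return shared


if __name__ == "__main__":
    first, second = "c.56.5.4", "d.58.19.1"  # the two domains of 1cg2
    print(CLASS_NAMES[parse_sccs(first).cls], CLASS_NAMES[parse_sccs(second).cls])
    print(deepest_shared_level(first, second))
```

The two domains of 1cg2 differ already at the class level, so the function returns None for that pair.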
Hierarchical Structure Classification by SCOP
α/β: mainly parallel beta sheets
α+β: segregated alpha and beta regions, mainly anti-parallel beta sheets
Automatic Assignment in SCOPe
Fox NK, Brenner SE, Chandonia JM. 2014. SCOPe: Structural Classification of Proteins—extended, integrating SCOP
and ASTRAL data and classification of new structures. Nucleic Acids Research 42:D304-309
Timeline of SCOP(e) Releases
SCOP Classification Statistics
SCOP ID
SCOP Classification:
33844 px  c.56.5.4   d1cg2a1  1cg2  A:26-213,A:327-414
39360 px  d.58.19.1  d1cg2a2  1cg2  A:214-326
Redfern OC. et al, PLoS Computational Biology, 2007
All of these different proteins share the TIM-barrel fold, named after triosephosphate isomerase
All α
All β
α/β
α+β
Common Folds
Immunoglobulin fold
• all-β protein fold
• consists of 2 layers
• ~7 antiparallel β-strands
arranged in two β-sheets.
TIM barrel fold
• α/β protein fold
• named after
triosephosphate isomerase
• eight α-helices and
eight parallel β-strands
Rossmann fold
• α/β protein fold
• named after
Michael Rossmann
• Parallel β-strands connected
by α-helices
PDB: Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
Protein Data Bank (PDB)
•  X-ray structures and NMR structures
(modeled structures in a separate place)
•  Each PDB entry has one unique 4-character ID (a digit followed by three letters or digits)
1GUO
1JUN
1TAO
1PHD
Some PDB Statistics
Protein Structure Methods: X-Ray Crystallography
Steps needed
•  Purify the protein
•  Crystallize the protein
•  Collect diffraction data
•  Calculate electron density
•  Fit residues into density
Pros
•  No size limits, well-established
Cons
•  Difficult for membrane proteins
•  Cannot see hydrogen atoms
•  Not every protein can be crystallized!
Images from PDB and “Protein Structure and
Function” by Gregory A Petsko and Dagmar Ringe
Analysis of Diffraction Pattern
•  The diffraction pattern is analyzed by mathematical and computational
methods (Fourier transform analysis) to produce an electron density map.
•  Note that the objective result of a crystallographic experiment is not really a
picture of the atoms, but a map of the distribution of electrons in the
molecule, i.e. an electron density map, because the X-rays are scattered from the
electron cloud of the atoms.
•  Since the electrons are mostly tightly localized around the nuclei, the
electron density map gives us a good picture of the molecule.
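The Fourier step described above can be illustrated with a one-dimensional toy crystal. The sketch below is not how real crystallographic software works: it assumes point-like atoms in a unit cell of length 1, uses made-up positions and electron counts, and assumes the phases are known (in a real experiment they are not measured directly). It only shows that summing structure factors gives back a density with peaks at the atom positions, and that including more terms gives sharper peaks.

```python
import cmath
import math

# Hypothetical toy crystal: (fractional position, electron count) for two point atoms.
TOY_ATOMS = [(0.25, 6.0), (0.60, 8.0)]


def structure_factors(atoms, h_max):
    """F(h) = sum_j Z_j * exp(2*pi*i*h*x_j) for h = -h_max .. h_max."""
    return {
        h: sum(z * cmath.exp(2j * math.pi * h * x) for x, z in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors, x):
    """Fourier synthesis rho(x) = sum_h F(h) * exp(-2*pi*i*h*x) (cell length 1)."""
    total = sum(f * cmath.exp(-2j * math.pi * h * x) for h, f in factors.items())
    return total.real


def density_map(atoms, h_max, n_grid=100):
    """Density sampled on n_grid evenly spaced points across the unit cell."""
    factors = structure_factors(atoms, h_max)
    return [electron_density(factors, i / n_grid) for i in range(n_grid)]


if __name__ == "__main__":
    for h_max in (3, 20):  # few terms = coarse map, many terms = finer map
        rho = density_map(TOY_ATOMS, h_max)
        peak_index = max(range(len(rho)), key=rho.__getitem__)
        print(f"h_max={h_max}: highest density at x = {peak_index / 100:.2f}")
```

Truncating the sum at a small h_max plays the role of low resolution: the peaks are still present but broader.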
Resolution of X-ray Structures
3.0 Å
2.0 Å
1.0 Å
Resolution is a measure of the level of detail present in the diffraction pattern and the level of
detail that will be seen when the electron density map is calculated.
Images from PDB and “Protein Structure and Function” by Gregory A Petsko and Dagmar Ringe
Resolution of X-ray Structures
Determining the Structure of Myoglobin by x-ray Method
Structure Determination by NMR
Steps needed
•  Purify the protein
•  Dissolve the protein
•  Collect NMR data
•  Assign NMR signals
•  Calculate the structure
Pros
•  No need to crystallize the protein
•  Can see hydrogen atoms
Cons
•  Difficult for insoluble proteins
•  Works best with small proteins; practical size limit (<50 kDa)
Image from “Protein Structure and Function” by Gregory A Petsko and Dagmar Ringe
Determining the Structure of Myoglobin by NMR
•  Based on magnetic moments of atomic nuclei.
•  NMR measures the interactions of atomic nuclei.
•  A typical NMR structure includes an ensemble of
protein structures, all of which are consistent with the
observed list of experimental restraints.
Alternate Location Indicator
Intrinsically Disordered Proteins
•  Contain protein segments that lack definable structure
•  Enriched in amino acids whose high abundance disfavors a
well-defined structure
–  Lys, Arg, Glu, and Pro
•  Disordered regions can adapt their conformation to many different
binding partners, facilitating interaction with numerous partner
proteins
Intrinsically Disordered Proteins
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     ALA A    10
REMARK 465     LEU A    11
REMARK 465     TYR A    12
REMARK 465     ASP A    13
REMARK 465     GLU A    14
REMARK 465     ASN A    15
REMARK 465     GLN A    16
REMARK 465     LYS A    17
REMARK 465     GLY A    34
REMARK 465     SER A    35
REMARK 465     ASP A    36
REMARK 465     THR A    37
REMARK 465     LYS A    38
REMARK 465     VAL A    39
REMARK 465     LEU A    40
REMARK 465     ASN A    97
REMARK 465     LYS A    98
...........
1AZ3, ECORV ENDONUCLEASE
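REMARK 465 records like the ones above can be read programmatically to list the residues that were not located in the experiment. The following is a minimal sketch, not a full PDB parser: the function names are hypothetical, the sample lines are a short excerpt of the 1AZ3 records shown on the slide, and insertion codes are not handled.

```python
SAMPLE_REMARK_465 = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE",
    "REMARK 465",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     ALA A    10",
    "REMARK 465     LEU A    11",
    "REMARK 465     GLY A    34",
    "REMARK 465     SER A    35",
    "REMARK 465     ASP A    36",
]


def parse_remark_465(lines):
    """Return (residue_name, chain, sequence_number) for each missing residue."""
    missing = []
    in_table = False
    for line in lines:
        if not line.startswith("REMARK 465"):
            continue
        fields = line.split()[2:]  # drop "REMARK" and "465"
        if fields[:2] == ["M", "RES"]:
            in_table = True  # column header; residue rows follow
            continue
        if in_table and len(fields) >= 3:
            # The last three fields are residue name, chain, and number;
            # an optional model number may precede them.
            res_name, chain, seq = fields[-3:]
            missing.append((res_name, chain, int(seq)))
    return missing


def contiguous_ranges(numbers):
    """Group residue numbers into inclusive (start, end) runs."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n
        else:
            ranges.append([n, n])
    return [tuple(r) for r in ranges]


if __name__ == "__main__":
    missing = parse_remark_465(SAMPLE_REMARK_465)
    print(missing)
    print(contiguous_ranges(seq for _, _, seq in missing))
```

Grouping the numbers into runs makes it easy to see whether the unobserved residues sit at a terminus or form an internal loop.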
Intrinsically Disordered Proteins
ATOM    171  N   LEU A  33      26.730 -44.512   3.763  1.00 42.58           N
ATOM    172  CA  LEU A  33      27.036 -44.367   2.333  1.00 42.29           C
ATOM    173  C   LEU A  33      28.374 -43.713   1.986  1.00 39.56           C
ATOM    174  O   LEU A  33      28.664 -42.605   2.433  1.00 39.78           O
ATOM    175  CB  LEU A  33      25.885 -43.635   1.634  1.00 37.47           C
ATOM    176  CG  LEU A  33      24.530 -44.331   1.784  1.00 32.39           C
ATOM    177  CD1 LEU A  33      23.595 -43.397   2.462  1.00 32.67           C
ATOM    178  CD2 LEU A  33      23.966 -44.741   0.434  1.00 32.82           C
ATOM    179  N   SER A  41      22.783 -41.387  -5.851  1.00 20.25           N
ATOM    180  CA  SER A  41      22.712 -41.306  -7.306  1.00 28.02           C
ATOM    181  C   SER A  41      22.385 -42.623  -8.008  1.00 27.89           C
ATOM    182  O   SER A  41      22.445 -42.687  -9.214  1.00 25.11           O
ATOM    183  CB  SER A  41      24.019 -40.757  -7.877  1.00 33.28           C
ATOM    184  OG  SER A  41      24.113 -40.937  -9.299  1.00 39.21           O
ATOM    185  N   THR A  42      22.154 -43.682  -7.238  1.00 33.31           N
ATOM    186  CA  THR A  42      21.792 -45.011  -7.752  1.00 30.10           C
ATOM    187  C   THR A  42      20.607 -45.407  -6.888  1.00 30.51           C
ATOM    188  O   THR A  42      19.810 -46.272  -7.239  1.00 34.44           O
ATOM    189  CB  THR A  42      22.947 -46.040  -7.570  1.00 27.32           C
ATOM    190  OG1 THR A  42      23.998 -45.743  -8.494  1.00 33.52           O
ATOM    191  CG2 THR A  42      22.476 -47.447  -7.809  1.00 21.11           C
ATOM    192  N   ILE A  43      20.520 -44.767  -5.726  1.00 30.77           N
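In the ATOM records above, the residue number jumps from 33 to 41, which corresponds to the unobserved residues 34 to 40. A minimal sketch of how such gaps could be detected from ATOM records is shown below. It assumes the standard fixed-column PDB layout (chain identifier in column 22, residue number in columns 23-26); the function names are hypothetical and the sample lines are a subset of the records on the slide. A jump in numbering is only a hint: some files skip numbers for other reasons, so REMARK 465 remains the authoritative list.

```python
SAMPLE_ATOMS = [
    "ATOM    171  N   LEU A  33      26.730 -44.512   3.763  1.00 42.58           N",
    "ATOM    172  CA  LEU A  33      27.036 -44.367   2.333  1.00 42.29           C",
    "ATOM    179  N   SER A  41      22.783 -41.387  -5.851  1.00 20.25           N",
    "ATOM    185  N   THR A  42      22.154 -43.682  -7.238  1.00 33.31           N",
    "ATOM    192  N   ILE A  43      20.520 -44.767  -5.726  1.00 30.77           N",
]


def residue_numbers(pdb_lines):
    """Collect the residue numbers seen in ATOM records, grouped by chain."""
    seen = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            chain = line[21]            # column 22: chain identifier
            res_seq = int(line[22:26])  # columns 23-26: residue sequence number
            seen.setdefault(chain, set()).add(res_seq)
    return seen


def find_gaps(pdb_lines):
    """Return (chain, first_missing, last_missing) for each jump in numbering."""
    gaps = []
    for chain, numbers in sorted(residue_numbers(pdb_lines).items()):
        ordered = sorted(numbers)
        for prev, nxt in zip(ordered, ordered[1:]):
            if nxt - prev > 1:
                gaps.append((chain, prev + 1, nxt - 1))
    return gaps


if __name__ == "__main__":
    print(find_gaps(SAMPLE_ATOMS))
```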
Intrinsically Disordered Proteins
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     ALA A     1
REMARK 465     THR A     2
REMARK 465     SER A     3
REMARK 465     THR A     4
REMARK 465     LYS A     5
REMARK 465     LYS A     6
REMARK 465     GLU A   142
REMARK 465     ASP A   143
REMARK 465     ASN A   144
REMARK 465     ALA A   145
REMARK 465     ASP A   146
REMARK 465     SER A   147
REMARK 465     GLY A   148
REMARK 465     GLN A   149
N-terminal and C-terminal disordered regions
1EYA
p53 Protein and Its Binding Partners
Protein folding and stability
Protein Stability and Folding
A protein’s function depends on its 3D-structure
Proteins are stabilized by many(!) collective weak interactions
Protein Stability
•  Protein stability is a small difference of large numbers.
•  Proteins are stable (ΔG of folding < 0) only over a narrow environmental range.
•  In fact, there are forces pushing the equilibrium between folded and unfolded
in both directions.
•  Stabilizing forces: intraprotein salt bridges, hydrogen bonds, dipole-dipole interactions, and
VDW interactions (all of which are electrostatic in nature).
•  Destabilizing forces: primarily electrostatic interactions with solvent and conformational
entropy reduction.
•  The hydrophobic effect is not a true force; rather, it is an emergent, largely
entropy-driven property of the solvent.
•  Common denaturants include: detergents, organic solvents, extreme pH,
extreme temperature, high ionic strength, etc.
Thermal Denaturation
Loss of structural integrity with accompanying loss of function is called denaturation.
Tm: melting temperature
Ribonuclease A: CD method
Apomyoglobin: W content
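The link between ΔG, stability, and Tm can be made concrete with a two-state (folded/unfolded) model. The sketch below is an illustration under stated assumptions, not a fit to the ribonuclease A or apomyoglobin data: it assumes a temperature-independent unfolding enthalpy (heat capacity change ignored), and the Tm and enthalpy values are made up. With those assumptions, the folding free energy is ΔG_fold(T) = -ΔH_unfold (1 - T/Tm), which is zero at Tm, where half of the molecules are folded.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def delta_g_fold(temp_k, tm_k, dh_unfold_kj):
    """Folding free energy in kJ/mol; negative means the folded state is favored.

    Assumes a two-state transition with a temperature-independent unfolding
    enthalpy dh_unfold_kj (the heat capacity change is ignored).
    """
    return -dh_unfold_kj * (1.0 - temp_k / tm_k)


def fraction_folded(temp_k, tm_k, dh_unfold_kj):
    """Equilibrium fraction of folded molecules at temperature temp_k."""
    dg = delta_g_fold(temp_k, tm_k, dh_unfold_kj)
    return 1.0 / (1.0 + math.exp(dg / (R_KJ * temp_k)))


if __name__ == "__main__":
    tm, dh = 333.0, 400.0  # hypothetical values: Tm = 60 C, 400 kJ/mol
    for temp in (313.0, 328.0, 333.0, 338.0, 353.0):
        print(f"{temp - 273.15:5.1f} C  dG_fold = {delta_g_fold(temp, tm, dh):7.2f} kJ/mol  "
              f"folded = {fraction_folded(temp, tm, dh):.3f}")
```

Even 20 degrees below Tm the folding free energy in this example is only about -24 kJ/mol, which illustrates the point that stability is a small net difference.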
Absorption of UV light by Aromatic Amino Acids
•  The aromatic amino acids absorb light in the UV region
•  Proteins typically have UV absorbance maxima around 275–280 nm
•  Tryptophan and tyrosine are the strongest chromophores
•  Concentration can be determined by UV-visible spectrophotometry using
Beer's law: A = ε·c·l
ε: molar absorptivity
c: concentration
l: length of the light path
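Beer's law can be rearranged to c = A / (ε·l) to obtain a concentration from a measured absorbance. The sketch below shows that arithmetic. The helper for estimating ε at 280 nm is an addition not taken from these slides: it uses the commonly cited per-residue values of 5500 (Trp), 1490 (Tyr), and 125 (cystine) M^-1 cm^-1 from Pace et al. (1995), and the example sequence and absorbance are made up.

```python
def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Beer's law rearranged: c = A / (epsilon * l), in mol/L.

    epsilon is the molar absorptivity in M^-1 cm^-1, path_cm the path length in cm.
    """
    return absorbance / (epsilon * path_cm)


def estimate_epsilon_280(sequence, n_cystines=0):
    """Rough molar absorptivity at 280 nm from Trp, Tyr, and cystine counts.

    Assumption (not from the slides): per-residue values of Pace et al. 1995.
    """
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines


if __name__ == "__main__":
    sequence = "MKWVTFYSLLYLAW"  # hypothetical peptide, for illustration only
    eps = estimate_epsilon_280(sequence)
    conc = concentration_molar(absorbance=0.70, epsilon=eps, path_cm=1.0)
    print(f"epsilon_280 = {eps} M^-1 cm^-1, concentration = {conc * 1e6:.1f} uM")
```

A protein with no tryptophan or tyrosine gives ε = 0 with this estimate, in which case absorbance at 280 nm cannot be used this way and the division would fail.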
Circular Dichroism (CD) Analysis
•  CD measures the molar absorption difference Δε of left- and right-circularly
polarized light: Δε = εL – εR
•  Chromophores in the chiral environment produce characteristic signals
•  CD signals from peptide bonds depend on the chain conformation
Protein Denaturation
Ribonuclease Refolding Experiment
•  Ribonuclease is a small protein that contains 8
cysteines linked via four disulfide bonds
•  Urea in the presence of 2-mercaptoethanol fully
denatures ribonuclease
•  When urea and 2-mercaptoethanol are removed,
the protein spontaneously refolds, and the correct
disulfide bonds are reformed
•  The sequence alone determines the native
conformation
•  Quite a "simple" experiment, but so important that it
earned Christian Anfinsen the 1972 Nobel Prize in
Chemistry
Ribonuclease Refolding Experiment
2-Mercaptoethanol
Ribonuclease Refolding Experiment
“Simulations of CI2 in 8 M urea indicate that urea
promotes unfolding by both indirect and direct
mechanisms. Direct urea interactions consisted of
hydrogen bonding to the polar moieties of the protein,
particularly peptide groups, leading to screening of
intramolecular hydrogen bonds. Solvation of the
hydrophobic core proceeded via the influx of water
molecules, then urea. Urea also promoted protein
unfolding in an indirect manner by altering water structure
and dynamics, as also occurs on the introduction of
nonpolar groups to water, thereby diminishing the
hydrophobic effect and facilitating the exposure of the
hydrophobic core residues. Overall, urea-induced effects
on water indirectly contributed to unfolding by
encouraging hydrophobic solvation, whereas direct
interactions provided the pathway.”
PNAS April 29, 2003 vol. 100 no. 9, 5142-5147
Protein Folding
•  Proteins fold to their lowest-energy conformation on microsecond-to-second
time scales. How can they find the right fold so fast?
•  It is practically impossible for protein folding to occur by randomly
trying every conformation until the lowest-energy one is found
(Levinthal's paradox, see next slide)
•  The search for the minimum is not random, because moves toward the
native structure are thermodynamically favored
Levinthal’s Paradox
Introducing Levinthal’s paradox...
Q: Assume a 300 aa protein, with 3 possible
conformational states per residue. How many possible
conformations are there?
A: 3^300 (about 10^143) conformations (easy question)
Q: Assuming some finite time for each transition (a few
ps), how long would it take the protein to fold
assuming a completely random sampling?
A: Far longer than the age of the universe.
Levinthal, Cyrus (1969). "How to Fold Graciously"
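The arithmetic behind the two answers above can be checked directly. The sketch below uses the numbers from the slide (300 residues, 3 states per residue) and assumes 1 ps per conformational transition, the low end of "a few ps"; the age of the universe is taken as roughly 13.8 billion years. The function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate


def levinthal_search_years(n_residues, states_per_residue, seconds_per_step):
    """Return (number of conformations, years needed to sample each one once)."""
    n_conformations = states_per_residue ** n_residues  # exact Python integer
    years = n_conformations * seconds_per_step / SECONDS_PER_YEAR
    return n_conformations, years


if __name__ == "__main__":
    n_conf, years = levinthal_search_years(300, 3, 1e-12)
    print(f"conformations: about 10^{len(str(n_conf)) - 1}")
    print(f"exhaustive search: {years:.1e} years")
    print(f"ratio to age of universe: {years / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Changing the time per step by a few orders of magnitude does not alter the conclusion, since the exponent of the conformation count dominates.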
Proteins Folding Path
Chaperones in Protein Folding
Protein Misfolding and Human Disease