Introduction to 3D-Structure Visualization and
Homology Modeling using
the Swiss-Model Workspace
L. Bordoli
Biozentrum of the University of Basel and
Swiss Institute of Bioinformatics
May 2009
Outline
• Recapitulation: properties of protein
structures
– Amino acids properties
– Protein folding
– Primary, Secondary, Tertiary and Quaternary
structure
• The Protein Structure Database (PDB)
• Representation of Structural Information
– file formats
– structure visualization using DeepView
Recapitulation: Protein Structures
• Brief Recap: Some properties of protein
structures
– Primary Structure
• Amino acids
• Peptide bonds
– Secondary Structure
– Tertiary Structure
– Quaternary Structure
Recapitulation: Primary Structures
• Proteins are polypeptides (generally: polyamides)
• Carboxyl group reacts with amine group
• Backbone + side chains
Recapitulation: Amino Acids
• 20 standard L-amino acids
• Stereochemistry: L- and D-amino acids
Amino Acids: Side Chain Properties
• Neutral hydrophobic: Alanine, Valine, Leucine, Isoleucine, Proline,
Tryptophan, Phenylalanine, Methionine
• Neutral polar: Glycine, Serine, Threonine, Tyrosine, Cysteine,
Asparagine, Glutamine
• Basic: Lysine, Arginine, (Histidine)
• Acidic: Aspartic acid, Glutamic acid
The hydropathy index of an amino acid is a number representing the hydrophobic or
hydrophilic properties of its side chain: the larger the number, the more hydrophobic
the amino acid.
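The hydropathy index can be illustrated with a short sketch. The slides do not say which scale they use; the values below are the widely used Kyte-Doolittle scale, which is an assumption here, and the function name `mean_hydropathy` and the example peptides are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slides do not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Average hydropathy index over a one-letter amino acid sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return sum(values) / len(values)


# Hypothetical peptides: one built from hydrophobic residues, one from charged ones.
for peptide in ("LIVAF", "KDERK"):
    print(peptide, round(mean_hydropathy(peptide), 2))
```

A larger mean value indicates a more hydrophobic stretch, matching the convention stated above.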
Amino Acids: Side Chain Properties
• Chemical properties of standard L-amino acids
• Approx. pKa values of side chains:
– Arg 12.5
– Lys 10.8
– Tyr 10.1
– Cys 8.3
– His 6.0
– Glu 4.1
– Asp 3.9
Ka = acid dissociation constant, a measure of the degree of deprotonation
(pKa = -log10 Ka)
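As a rough illustration of what these pKa values imply, the following sketch applies the Henderson-Hasselbalch relation to estimate the deprotonated fraction of each side chain at a given pH. It assumes the approximate pKa values listed above and ignores any influence of the protein environment; the function name `fraction_deprotonated` is hypothetical.

```python
# Approximate side-chain pKa values taken from the list above.
SIDE_CHAIN_PKA = {
    "Arg": 12.5, "Lys": 10.8, "Tyr": 10.1, "Cys": 8.3,
    "His": 6.0, "Glu": 4.1, "Asp": 3.9,
}


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10^(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Deprotonated fraction of each side chain at pH 7.0.
for residue, pka in SIDE_CHAIN_PKA.items():
    print(f"{residue}: {fraction_deprotonated(pka, 7.0):.4f}")
```

At pH 7 the acidic side chains (Asp, Glu) come out almost fully deprotonated and the basic ones (Lys, Arg) almost fully protonated, while His sits close to its transition point.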
Energetics of protein folding
ΔGfold = ΔH - TΔS
When a system changes from a well-defined initial state to a well-defined final
state, the Gibbs free energy change ΔG equals the work exchanged by the system
with its surroundings, less the work of the pressure forces, during a
reversible transformation of the system from the same initial state to the
same final state.
The enthalpy change ΔH: change in the internal energy of the system plus
pressure-volume work (the heat exchanged at constant pressure)
The entropy change ΔS: change in the degree of disorder of a
thermodynamic system
Protein Folding: Hydrophobic Effects
• main driving force for protein folding
Water molecules in bulk water are mobile
and can form H-bonds in all directions.
Hydrophobic surfaces don’t form H-bonds. The
surrounding water molecules have to orient and
become more ordered.
The entropy loss can be minimized by
gathering the hydrophobic surfaces
together in the core of a protein and
separating them from the solvent.
Protein Folding: Hydrogen Bonds
• An H-bond occurs when two electronegative atoms (e.g. N, O)
compete for the same hydrogen atom.
• H-bonding partners include:
– main chain atoms
– side chain atoms
– water molecules
– ligands, etc.
Q: Do H-bonds stabilize a protein fold?
Protein Folding: Hydrogen Bonds
• In the unfolded state, all potential hydrogen bonding
partners in the extended polypeptide chain are satisfied
by hydrogen bonds to water. When the protein folds,
these protein-to-water H-bonds are broken, and only
some are replaced by (often sub-optimal) intra-protein
H-bonds (enthalpic terms increase).
• It would appear that hydrogen bonding is destabilizing to
folded protein structure.
• However, one must also consider entropy. When a
protein folds, and those hydrogen bonds that the protein
made to bulk water are broken, the entropy of the
solvent increases.
Protein Folding: Hydrogen Bonds
• The entropy and enthalpy terms are closely balanced,
and until recently it was considered that H-bonds
made no overall contribution to protein stability.
• However, it is now generally accepted that H-bonds
make a positive contribution to protein
stabilization.
• We must remember that if we break or delete an
intramolecular hydrogen bond in a protein
without the possibility of forming a compensating
H-bond to solvent, that protein will be
destabilized.
Energetics of protein folding
• Contributions to ΔGfold:
– H-bonds
– hydrophobic effects (entropy)
– salt bridges (enthalpy)
– SS-bonds
– loss of solvation
– entropy change
– dispersion / VdW contacts
– conformational energy
• Difference of two very large energetic terms
• Low overall stabilization energy
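The point that the folding free energy is a small difference of two large terms can be made concrete with ΔGfold = ΔH - TΔS. The numbers below are hypothetical values chosen only for illustration, not measured data, and the function name `gibbs_free_energy` is an assumption.

```python
def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return ΔG = ΔH - TΔS (ΔH in kJ/mol, ΔS in kJ/(mol*K), T in K)."""
    return delta_h - temperature * delta_s


# Hypothetical values for illustration only (not measured data).
delta_h = -200.0      # kJ/mol
delta_s = -0.6        # kJ/(mol*K)
temperature = 298.0   # K

delta_g = gibbs_free_energy(delta_h, delta_s, temperature)
print(f"dH = {delta_h:.1f} kJ/mol, T*dS = {temperature * delta_s:.1f} kJ/mol")
print(f"dG_fold = {delta_g:.1f} kJ/mol")
```

With these assumed inputs both ΔH and TΔS are on the order of a couple of hundred kJ/mol, yet their difference is roughly ten times smaller, which is the sense in which the overall stabilization energy is low.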
Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure
Secondary structure: the three-dimensional form of local segments of
proteins, such as loops or helices.
Secondary Structures: α-Helices
• α-Helices: Every backbone N-H group donates a hydrogen bond
to the backbone C=O group of the amino acid four residues earlier
(i+4 -> i hydrogen bonding).
[Figure: α-helix in atomic, full-atom (CPK), and ribbon (cartoon) representations]
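The i+4 -> i hydrogen-bonding pattern of the α-helix can be enumerated with a few lines of code. This is a minimal sketch for an idealized helix, assuming 1-based residue numbering; the function name `alpha_helix_hbonds` is hypothetical.

```python
def alpha_helix_hbonds(n_residues):
    """List (donor, acceptor) residue pairs for an idealized α-helix.

    The backbone N-H of residue i+4 donates a hydrogen bond to the
    backbone C=O of residue i (1-based residue numbers).
    """
    return [(i + 4, i) for i in range(1, n_residues - 3)]


# Hydrogen bonds in an idealized helix of 8 residues.
for donor, acceptor in alpha_helix_hbonds(8):
    print(f"N-H of residue {donor} -> C=O of residue {acceptor}")
```

In this idealized picture the first four N-H groups and the last four C=O groups of the helix have no intra-helical partner.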
Secondary Structures: β-sheets
• β-sheets - beta strands connected laterally by three or more
hydrogen bonds, forming a generally twisted, pleated sheet.
Secondary Structures: β-sheets
• Most β-sheets have a left-handed twist:
[Figure: bovine pancreatic trypsin inhibitor; twist of 0° - 30° per aa]
Secondary Structures: β-sheets
• Parallel and anti-parallel β-sheets
Structural motifs
• Structural motifs (often referred to as super-secondary structures)
consist of several secondary structure elements and loops.
• Examples:
– Helix loop Helix: Consists of alpha helices bound by a looping stretch
of amino acids. Important in DNA binding proteins.
– Beta Hairpin: Extremely common. Two anti-parallel beta strands
connected by a tight turn of a few amino acids between them.
– Zinc Finger: Two beta strands with an alpha helix end folded over to
bind a zinc ion. This motif is seen in transcription factors.
– Greek Key: 4 beta strands folded over into a sandwich shape.
Peptide bonds
• Geometry of peptide bonds
Peptide bonds
• Definition of dihedral angles Φ, Ψ, and ω.
A dihedral angle is the angle of
intersection of two planes. It is the
measure of an angle having its vertex
on the intersecting edge and one side
in each of the planes. The sides of the
angle are perpendicular to the
intersecting edge.
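The definition above translates into a short calculation on four points: the first three define one plane, the last three the other, and the middle two points lie on the intersecting edge. The sketch below is one common way to compute the signed angle with plain vector algebra; the helper names are hypothetical and the test coordinates are made up. In a protein backbone, Φ is measured over the atoms C(i-1), N(i), Cα(i), C(i), Ψ over N(i), Cα(i), C(i), N(i+1), and ω over Cα(i), C(i), N(i+1), Cα(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    p1-p2 is the intersecting edge; p0 and p3 each define one plane.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the edge b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the outer points are on the same side (cis) or
# on opposite sides (trans) of the central edge.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```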
Peptide bonds
• The possible values of the dihedral angles Φ and Ψ are
constrained geometrically by steric clashes between neighboring
atoms.
Peptide bonds
• Ramachandran plots: the permitted values of Φ and Ψ
[Figure: Ramachandran plots, Ψ (deg) versus Φ (deg)]
Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure
The tertiary structure of a protein or any other macromolecule is its
three-dimensional structure, as defined by the atomic coordinates
Tertiary Structure
• Very large proteins (proteins with more than
10,000 residues are possible) rarely form
one large compact structure, but are often
organized into individual domains of ~200-500
residues.
• Domains: The definition of protein domains
adopted here is that of compactly folded
structures with their own hydrophobic core which
may fold independently of the rest of the chain.
Tertiary Structure: Domains
Phosphorylase kinase domain
MAP Kinase ERK-2
Phosphotransferase domain
Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure
Quaternary Structure
• Arrangement of multiple folded protein molecules in a
multi-subunit complex.
• e.g.: human hemoglobin: 4 chains: α2β2
Where do we find protein structures?
http://www.wwpdb.org/
http://www.pdb.org
http://www.ebi.ac.uk/pdbe/
http://www.pdbj.org
Growth of the Protein Data Bank PDB
[Figure: total and yearly number of structures in the PDB. Source: http://www.pdb.org]
Representation of Structural Information
• Structural information to represent:
– Atom types (chemical element and hybridization)
– Atom coordinates
– Atom charges (full or partial)
– Topology (connectivity of atoms)
– Chemical bond type
– Chirality and Ambiguities
– Trajectories
– Surfaces and scalar fields (e.g. electrostatics)
– Identification (IUPAC name, trivial names)
– Experimental details (source of data)
– Accuracy and reliability information
– Annotation (cross references with other databases)
File formats and their limitations
• Representation of Structural Information
– File formats:
• SMILES
• MOL2 (Tripos Inc.)
• SDF
• PDB
• mmCIF
• PDBML
PDB file format
• http://www.pdb.org
• File format is column based:
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    MUSCLE PROTEIN                          02-JUN-93   1MYS
• Sections:
– Title
– Primary Structure
– Heterogen Section
– Secondary Structure
– Connectivity Annotation Section
– Miscellaneous Features Section
– Crystallographic and Coordinate Transformation Section
– Coordinate Section
– Connectivity Section
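Because the format is column based, a record is read by slicing fixed character positions rather than by splitting on whitespace. The sketch below reads the HEADER record shown above, assuming the column layout of the wwPDB format description (classification in columns 11-50, deposition date in columns 51-59, ID code in columns 63-66); the function name `parse_header` is hypothetical.

```python
def parse_header(line):
    """Split a PDB HEADER record into its fixed-width fields."""
    return {
        "classification": line[10:50].strip(),   # columns 11-50
        "deposition_date": line[50:59].strip(),  # columns 51-59
        "id_code": line[62:66].strip(),          # columns 63-66
    }


# Rebuild the example record with explicit padding so the columns line up.
header_line = ("HEADER    " + "MUSCLE PROTEIN".ljust(40)
               + "02-JUN-93" + "   " + "1MYS")
print(parse_header(header_line))
```

Slicing by column keeps multi-word fields such as the classification intact, which whitespace splitting would break apart.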
PDB file format
HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS,
EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, REMARK
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    3-EPIMERASE                             01-DEC-98   1RPX
TITLE     D-RIBULOSE-5-PHOSPHATE 3-EPIMERASE FROM SOLANUM TUBEROSUM
TITLE    2 CHLOROPLASTS
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: PROTEIN (RIBULOSE-PHOSPHATE 3-EPIMERASE);
COMPND   3 CHAIN: A, B, C;
COMPND   4 EC: 5.1.3.1;
COMPND   5 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: SOLANUM TUBEROSUM;
SOURCE   3 ORGANISM_COMMON: POTATO;
SOURCE   4 ORGANISM_TAXID: 4113;
SOURCE   5 ORGANELLE: CHLOROPLAST;
SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   7 EXPRESSION_SYSTEM_TAXID: 562
KEYWDS    3-EPIMERASE, CHLOROPLAST, CALVIN CYCLE, OXIDATIVE PENTOSE
KEYWDS   2 PHOSPHATE PATHWAY
EXPDTA    X-RAY DIFFRACTION
AUTHOR    J.KOPP,G.E.SCHULZ
REVDAT   4   24-FEB-09 1RPX    1       VERSN
REVDAT   3   01-MAR-05 1RPX    1       DBREF
REVDAT   2   01-APR-03 1RPX    1       JRNL
REVDAT   1   07-APR-99 1RPX    0
JRNL        AUTH   J.KOPP,S.KOPRIVA,K.H.SUSS,G.E.SCHULZ
JRNL        TITL   STRUCTURE AND MECHANISM OF THE AMPHIBOLIC ENZYME
JRNL        TITL 2 D-RIBULOSE-5-PHOSPHATE 3-EPIMERASE FROM POTATO
JRNL        TITL 3 CHLOROPLASTS.
JRNL        REF    J.MOL.BIOL.                   V. 287   761 1999
JRNL        REFN                   ISSN 0022-2836
JRNL        PMID   10191144
JRNL        DOI    10.1006/JMBI.1999.2643
REMARK   1
....
PDB file format
HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS,
EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, REMARK
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
...
REMARK   1
REMARK   2
REMARK   2 RESOLUTION.    2.30 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : X-PLOR 3.8.5.1
REMARK   3   AUTHORS     : BRUNGER
REMARK   3
REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 2.3
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 35.0
REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.0
REMARK   3   DATA CUTOFF HIGH         (ABS(F)) : 100000.0
REMARK   3   DATA CUTOFF LOW          (ABS(F)) : 0.001
REMARK   3   COMPLETENESS (WORKING+TEST)   (%) : 97.2
REMARK   3   NUMBER OF REFLECTIONS             : 49783
REMARK   3
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM
REMARK   3   R VALUE            (WORKING SET) : 0.174
REMARK   3   FREE R VALUE                     : 0.212
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 3.01
REMARK   3   FREE R VALUE TEST SET COUNT      : 1500
REMARK   3   ESTIMATED ERROR OF FREE R VALUE  : 0.005
...
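The quality indicators in the REMARK records (resolution, R value, free R value) can be pulled out with simple text handling. The sketch below is an illustration only: it assumes REMARK 3 lines of the "LABEL : VALUE" form shown above, which varies between refinement programs, and the function names are hypothetical.

```python
def parse_resolution(lines):
    """Return the resolution in Angstroms from the REMARK 2 record, if present."""
    for line in lines:
        tokens = line.split()
        if tokens[:3] == ["REMARK", "2", "RESOLUTION."] and len(tokens) > 3:
            try:
                return float(tokens[3])
            except ValueError:
                return None  # e.g. entries without a numeric resolution
    return None


def parse_remark3_values(lines):
    """Collect 'LABEL : VALUE' pairs from REMARK 3 lines into a dict."""
    values = {}
    for line in lines:
        if line.split()[:2] == ["REMARK", "3"] and ":" in line:
            label, _, value = line.partition(":")
            # Drop the leading 'REMARK 3' tokens and collapse padding.
            values[" ".join(label.split()[2:])] = value.strip()
    return values


# A few lines copied from the 1RPX excerpt above.
remarks = [
    "REMARK   2 RESOLUTION.    2.30 ANGSTROMS.",
    "REMARK   3   R VALUE            (WORKING SET) : 0.174",
    "REMARK   3   FREE R VALUE                     : 0.212",
]
stats = parse_remark3_values(remarks)
gap = float(stats["FREE R VALUE"]) - float(stats["R VALUE (WORKING SET)"])
print(parse_resolution(remarks), stats, round(gap, 3))
```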
PDB file format
MODEL, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, ENDMDL
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
...
ATOM     74  N   ASP A  10      12.982  78.264  31.707  1.00 48.50           N
ATOM     75  CA  ASP A  10      14.137  79.163  31.764  1.00 46.20           C
ATOM     76  C   ASP A  10      14.910  79.105  30.460  1.00 43.70           C
ATOM     77  O   ASP A  10      14.572  78.355  29.547  1.00 45.78           O
ATOM     78  CB  ASP A  10      15.133  78.752  32.855  1.00 49.64           C
ATOM     79  CG  ASP A  10      14.471  78.300  34.129  1.00 57.95           C
ATOM     80  OD1 ASP A  10      13.809  79.129  34.788  1.00 57.91           O
ATOM     81  OD2 ASP A  10      14.651  77.114  34.487  1.00 63.05           O
...
HETATM 5200  S   SO4   231      30.451  80.354  18.252  1.00 51.91           S
HETATM 5201  O1  SO4   231      30.153  81.805  18.105  1.00 57.57           O
HETATM 5202  O2  SO4   231      31.895  80.187  18.738  1.00 54.06           O
HETATM 5203  O3  SO4   231      29.512  79.607  19.287  1.00 46.19           O
HETATM 5204  O4  SO4   231      30.193  79.714  16.846  1.00 50.16           O
...
x, y, z atom coordinates: columns 31-54; element symbol: columns 77-78
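An ATOM or HETATM record is read the same way as the header, by fixed columns. The sketch below parses one of the ASP 10 lines from the excerpt above; the slice positions follow the column layout of the wwPDB format description, and the function name `parse_atom_record` is hypothetical.

```python
def parse_atom_record(line):
    """Parse a PDB ATOM/HETATM record using its fixed column positions."""
    return {
        "record": line[0:6].strip(),        # columns 1-6
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21].strip(),          # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": float(line[54:60]),    # columns 55-60
        "b_factor": float(line[60:66]),     # columns 61-66
        "element": line[76:78].strip(),     # columns 77-78
    }


# The C-alpha atom of ASP 10 in chain A, copied from the excerpt above.
ca_line = "ATOM     75  CA  ASP A  10      14.137  79.163  31.764  1.00 46.20           C"
atom = parse_atom_record(ca_line)
print(atom["name"], atom["res_name"], atom["res_seq"], atom["x"], atom["y"], atom["z"])
```

Column slicing matters here because adjacent numeric fields can touch without a separating space, for example when a coordinate is large or negative.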
Representation of Structural Information
Alanine (Ala, A)
ATOM    263  N   ALA A  35       1.429  34.959 -16.825  1.00 35.48           N
ATOM    264  CA  ALA A  35       0.523  34.398 -17.829  1.00 35.10           C
ATOM    265  C   ALA A  35      -0.724  33.878 -17.157  1.00 33.88           C
ATOM    266  O   ALA A  35      -1.850  34.138 -17.600  1.00 33.13           O
ATOM    267  CB  ALA A  35       1.209  33.268 -18.594  1.00 33.84           C
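The coordinate records contain no explicit bonds for standard residues; the geometry is implied by the atom positions. As a small check, the sketch below computes the distances between covalently bonded atoms of Ala 35 from the coordinates listed above. The bond list is supplied by hand from the known alanine topology, and the names `ALA35` and `BONDS` are hypothetical.

```python
import math

# Coordinates (in Angstroms) of Ala 35, copied from the records above.
ALA35 = {
    "N": (1.429, 34.959, -16.825),
    "CA": (0.523, 34.398, -17.829),
    "C": (-0.724, 33.878, -17.157),
    "O": (-1.850, 34.138, -17.600),
    "CB": (1.209, 33.268, -18.594),
}

# Covalent bonds within an alanine residue (known topology, entered by hand).
BONDS = [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB")]


def bond_length(atoms, first, second):
    """Euclidean distance between two named atoms."""
    return math.dist(atoms[first], atoms[second])


for first, second in BONDS:
    print(f"{first}-{second}: {bond_length(ALA35, first, second):.3f} A")
```

The resulting distances fall between about 1.2 and 1.6 Å, the range expected for covalent bonds between these atom types.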
References and further reading:
1. Thomas E. Creighton, “Proteins: Structures and Molecular Properties”.
2. Arthur M. Lesk, “Introduction to Protein Architecture: The Structural Biology of Proteins”.