Introduction to 3D-Structure Visualization and Homology Modeling using the Swiss-Model Workspace
L. Bordoli
Biozentrum of the University of Basel and Swiss Institute of Bioinformatics
May 2009

Outline
• Recapitulation: properties of protein structures
  – Amino acid properties
  – Protein folding
  – Primary, Secondary, Tertiary and Quaternary structure
• The Protein Structure Database (PDB)
• Representation of Structural Information
  – file formats
  – structure visualization using DeepView

Recapitulation: Protein Structures
• Brief Recap: Some properties of protein structures
  – Primary Structure
    • Amino acids
    • Peptide bonds
  – Secondary Structure
  – Tertiary Structure
  – Quaternary Structure

Recapitulation: Primary Structures
• Proteins are polypeptides (generally: polyamides)
• The carboxyl group of one amino acid reacts with the amine group of the next
• Backbone + side chains

Recapitulation: Amino Acids
• 20 standard L-amino acids
• Stereochemistry: L- and D-amino acids
[Figure: the "L" and "D" forms of an amino acid]

Amino Acids: Side Chain Properties
• Neutral hydrophobic: Alanine, Valine, Leucine, Isoleucine, Proline, Tryptophan, Phenylalanine, Methionine
• Neutral polar: Glycine, Serine, Threonine, Tyrosine, Cysteine, Asparagine, Glutamine
• Basic: Lysine, Arginine, (Histidine)
• Acidic: Aspartic Acid, Glutamic Acid
The hydropathy index of an amino acid is a number representing the hydrophobic (*) or hydrophilic (**) properties of its side chain: the larger the number, the more hydrophobic the side chain.

Amino Acids: Side Chain Properties
• Chemical properties of standard L-amino acids:
• Approx.
pKa values of side chains:
  – Arg 12.5
  – Lys 10.8
  – Tyr 10.1
  – Cys 8.3
  – His 6.0
  – Glu 4.1
  – Asp 3.9
Ka = acid dissociation constant (pKa = -log10 Ka): a measure of the degree of deprotonation

Energetics of protein folding
ΔGfold = ΔH - TΔS
When a system changes from a well-defined initial state to a well-defined final state, the Gibbs free energy change ΔG equals the work exchanged by the system with its surroundings, less the work of the pressure forces, during a reversible transformation of the system from the same initial state to the same final state.
The enthalpy change ΔH: change in the heat content of the system (internal energy plus pressure-volume work)
The entropy change ΔS: change in the degree of disorder of a thermodynamic system

Protein Folding: Hydrophobic Effects
• Main driving force for protein folding
Water molecules in bulk water are mobile and can form H-bonds in all directions. Hydrophobic surfaces do not form H-bonds, so the surrounding water molecules have to orient themselves and become more ordered. This entropy loss can be minimized by gathering the hydrophobic surfaces together in the core of a protein and separating them from the solvent.

Protein Folding: Hydrogen Bonds
• An H-bond occurs when two electronegative atoms (e.g. N, O) compete for the same hydrogen atom.
• H-bonding partners include:
  – main chain atoms
  – side chain atoms
  – water molecules
  – ligands, etc.
[Figure: N-H ... O=C hydrogen bond]
Q: Do H-bonds stabilize a protein fold?

Protein Folding: Hydrogen Bonds
• In the unfolded state, all potential hydrogen bonding partners in the extended polypeptide chain are satisfied by hydrogen bonds to water. When the protein folds, these protein-to-water H-bonds are broken, and only some are replaced by (often sub-optimal) intra-protein H-bonds (enthalpic terms increase).
• It would appear that hydrogen bonding is destabilizing to folded protein structure.
• However, one must also consider entropy.
When a protein folds and the hydrogen bonds it made to bulk water are broken, the entropy of the solvent increases.

Protein Folding: Hydrogen Bonds
• The entropy and enthalpy terms are closely balanced, and until recently H-bonds were considered to make no overall contribution to protein stability.
• It is now generally accepted, however, that H-bonds make a positive contribution to protein stabilization.
• We must remember that if we break or delete an intramolecular hydrogen bond in a protein without the possibility of forming a compensating H-bond to solvent, that protein will be destabilized.

Energetics of protein folding
ΔGfold = ΔH - TΔS
Contributions:
  – H-bonds
  – hydrophobic effects (entropy)
  – salt bridges (enthalpy)
  – SS-bonds
  – loss of solvation
  – entropy change
  – dispersion / VdW contacts
  – conformational energy
• Difference of two very large energetic terms
• Low overall stabilization energy

Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure
Secondary structure: the three-dimensional form of local segments of proteins, such as the formation of loops or helices.

Secondary Structures: α-Helices
• α-Helices: Every backbone N-H group donates a hydrogen bond to the backbone C=O group of the amino acid four residues earlier (i+4 -> i hydrogen bonding).
[Figure: atomic representation, full atom (CPK) representation, and ribbon (cartoon) representation]

Secondary Structures: β-sheets
• β-sheets: beta strands connected laterally by three or more hydrogen bonds, forming a generally twisted, pleated sheet.

Secondary Structures: β-sheets
• Most β-sheets have a left-handed twist when viewed across the strands (the same twist appears right-handed when viewed along the strands).
[Figure: bovine pancreatic trypsin inhibitor; twist of 0° to 30° per amino acid]

Secondary Structures: β-sheets
• Parallel and anti-parallel β-sheets

Structural motifs
• Structural motifs (often referred to as super-secondary structures) consist of several secondary structure elements and loops.
Examples:
  – Helix-loop-helix: Consists of alpha helices joined by a looping stretch of amino acids. Important in DNA-binding proteins.
  – Beta hairpin: Extremely common. Two anti-parallel beta strands connected by a tight turn of a few amino acids.
  – Zinc finger: Two beta strands with an alpha helix end folded over to bind a zinc ion. This motif is seen in transcription factors.
  – Greek key: Four beta strands folded over into a sandwich shape.

Peptide bonds
• Geometry of peptide bonds
[Figure: geometry of the peptide bond]

Peptide bonds
• Definition of the dihedral angles Φ, Ψ, and ω.
A dihedral angle is the angle of intersection of two planes. It is the measure of an angle having its vertex on the intersecting edge and one side in each of the planes. The sides of the angle are perpendicular to the intersecting edge.

Peptide bonds
• Dihedral angles Φ and Ψ: the possible values are constrained geometrically by steric clashes between neighboring atoms.

Peptide bonds
• Ramachandran plots: the permitted values of Φ and Ψ
[Figure: Ramachandran plots; axes Φ (deg) and Ψ (deg)]

Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure
The tertiary structure of a protein or any other macromolecule is its three-dimensional structure, as defined by the atomic coordinates.

Tertiary Structure
• Very large proteins (proteins with more than 10,000 residues are possible) rarely form one large compact structure, but are often organized into individual domains of ~200-500 residues.
• Domains: The definition of protein domains adopted here is that of compactly folded structures with their own hydrophobic core, which may fold independently of the rest of the chain.
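
The dihedral-angle definition above can be turned into a small calculation. The following is a minimal sketch, not taken from the lecture: it computes the signed dihedral angle defined by four points, which is how Φ (C of residue i-1, N, CA, C), Ψ (N, CA, C, N of residue i+1) and ω (CA, C, N of residue i+1, CA of residue i+1) are obtained from atomic coordinates. The function name dihedral_deg and the test points are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    """Scalar product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    """Vector product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (degrees) between planes (p0,p1,p2) and (p1,p2,p3)."""
    b1 = sub(p1, p0)
    b2 = sub(p2, p1)  # central bond: the intersecting edge of the two planes
    b3 = sub(p3, p2)
    n1 = cross(b1, b2)  # normal of the first plane
    n2 = cross(b2, b3)  # normal of the second plane
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points (not real protein coordinates) around a bond along x.
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
quarter = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(cis, trans, quarter)
```

Applying such a function to every residue of a structure and plotting the resulting (Φ, Ψ) pairs yields a Ramachandran plot.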
Tertiary Structure: Domains
[Figure: phosphorylase kinase domain; MAP kinase ERK-2; phosphotransferase domain]

Principles of protein structure
• Primary Structure
• Secondary Structure
• Tertiary Structure
• Quaternary Structure

Quaternary Structure
• Arrangement of multiple folded protein molecules in a multi-subunit complex.
• e.g. human hemoglobin: 4 chains, α2β2

Where do we find protein structures?
  http://www.wwpdb.org/
  http://www.pdb.org
  http://www.ebi.ac.uk/pdbe/
  http://www.pdbj.org

Growth of the Protein Data Bank PDB
[Figure: total and yearly number of PDB entries. Source: PDB, http://www.pdb.org]

Representation of Structural Information
  – Atom types (chemical element and hybridization)
  – Atom coordinates
  – Atom charges (full or partial)
  – Topology (connectivity of atoms)
  – Chemical bond type
  – Chirality and ambiguities
  – Trajectories
  – Surfaces and scalar fields (e.g. electrostatics)
  – Identification (IUPAC name, trivial names)
  – Experimental details (source of data)
  – Accuracy and reliability information
  – Annotation (cross references with other databases)

File formats and their limitations
• Representation of Structural Information
  – File formats:
    • SMILES
    • MOL2 (Tripos Inc.)
    • SDF
    • PDB
    • mmCIF
    • PDBML

PDB file format
• http://www.pdb.org
• File format is column based

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    MUSCLE PROTEIN                          02-JUN-93   1MYS

• Sections:
  • Title
  • Primary Structure
  • Heterogen Section
  • Secondary Structure
  • Connectivity Annotation Section
  • Miscellaneous Features Section
  • Crystallographic and Coordinate Transformation Section
  • Coordinate Section
  • Connectivity Section

PDB file format

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    3-EPIMERASE                             01-DEC-98   1RPX
TITLE     D-RIBULOSE-5-PHOSPHATE 3-EPIMERASE FROM SOLANUM TUBEROSUM
TITLE    2 CHLOROPLASTS
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: PROTEIN (RIBULOSE-PHOSPHATE 3-EPIMERASE);
COMPND   3 CHAIN: A, B, C;
COMPND   4 EC: 5.1.3.1;
COMPND   5 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: SOLANUM TUBEROSUM;
SOURCE   3 ORGANISM_COMMON: POTATO;
SOURCE   4 ORGANISM_TAXID: 4113;
SOURCE   5 ORGANELLE: CHLOROPLAST;
SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE   7 EXPRESSION_SYSTEM_TAXID: 562
KEYWDS    3-EPIMERASE, CHLOROPLAST, CALVIN CYCLE, OXIDATIVE PENTOSE
KEYWDS   2 PHOSPHATE PATHWAY
EXPDTA    X-RAY DIFFRACTION
AUTHOR    J.KOPP,G.E.SCHULZ
REVDAT   4   24-FEB-09 1RPX    1       VERSN
REVDAT   3   01-MAR-05 1RPX    1       DBREF
REVDAT   2   01-APR-03 1RPX    1       JRNL
REVDAT   1   07-APR-99 1RPX    0
JRNL        AUTH   J.KOPP,S.KOPRIVA,K.H.SUSS,G.E.SCHULZ
JRNL        TITL   STRUCTURE AND MECHANISM OF THE AMPHIBOLIC ENZYME
JRNL        TITL 2 D-RIBULOSE-5-PHOSPHATE 3-EPIMERASE FROM POTATO
JRNL        TITL 3 CHLOROPLASTS.
JRNL        REF    J.MOL.BIOL.                   V. 287   761 1999
JRNL        REFN                   ISSN 0022-2836
JRNL        PMID   10191144
JRNL        DOI    10.1006/JMBI.1999.2643
REMARK   1
....

Record types: HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, REMARK

PDB file format
Record types: HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, REMARK

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
...
REMARK   1
REMARK   2
REMARK   2 RESOLUTION. 2.30 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : X-PLOR 3.8.5.1
REMARK   3   AUTHORS     : BRUNGER
REMARK   3
REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 2.3
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 35.0
REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.0
REMARK   3   DATA CUTOFF HIGH         (ABS(F)) : 100000.0
REMARK   3   DATA CUTOFF LOW          (ABS(F)) : 0.001
REMARK   3   COMPLETENESS (WORKING+TEST)   (%) : 97.2
REMARK   3   NUMBER OF REFLECTIONS             : 49783
REMARK   3
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT.
REMARK   3   CROSS-VALIDATION METHOD          : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION  : RANDOM
REMARK   3   R VALUE            (WORKING SET) : 0.174
REMARK   3   FREE R VALUE                     : 0.212
REMARK   3   FREE R VALUE TEST SET SIZE   (%) : 3.01
REMARK   3   FREE R VALUE TEST SET COUNT      : 1500
REMARK   3   ESTIMATED ERROR OF FREE R VALUE  : 0.005
...

PDB file format
Record types: MODEL, ATOM, SIGATM, ANISOU, SIGUIJ, TER, HETATM, ENDMDL

         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
...
ATOM     74  N   ASP A  10      12.982  78.264  31.707  1.00 48.50
ATOM     75  CA  ASP A  10      14.137  79.163  31.764  1.00 46.20
ATOM     76  C   ASP A  10      14.910  79.105  30.460  1.00 43.70
ATOM     77  O   ASP A  10      14.572  78.355  29.547  1.00 45.78
ATOM     78  CB  ASP A  10      15.133  78.752  32.855  1.00 49.64
ATOM     79  CG  ASP A  10      14.471  78.300  34.129  1.00 57.95
ATOM     80  OD1 ASP A  10      13.809  79.129  34.788  1.00 57.91
ATOM     81  OD2 ASP A  10      14.651  77.114  34.487  1.00 63.05
...
HETATM 5200  S   SO4   231      30.451  80.354  18.252  1.00 51.91
HETATM 5201  O1  SO4   231      30.153  81.805  18.105  1.00 57.57
HETATM 5202  O2  SO4   231      31.895  80.187  18.738  1.00 54.06
HETATM 5203  O3  SO4   231      29.512  79.607  19.287  1.00 46.19
HETATM 5204  O4  SO4   231      30.193  79.714  16.846  1.00 50.16
...
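
Because the format is column based, coordinate records can be read by slicing fixed character positions rather than by splitting on whitespace. The sketch below is illustrative only and is not part of the lecture: it assumes the standard wwPDB column layout for ATOM/HETATM records (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54, occupancy 55-60, B-factor 61-66). The names Atom and parse_atom_line are hypothetical; the three records are copied from the 1RPX excerpt above.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    record: str
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column positions (0-based slices)."""
    return Atom(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21:22].strip(),  # empty string when no chain ID is given
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


RECORDS = [
    "ATOM     74  N   ASP A  10      12.982  78.264  31.707  1.00 48.50",
    "ATOM     75  CA  ASP A  10      14.137  79.163  31.764  1.00 46.20",
    "HETATM 5200  S   SO4   231      30.451  80.354  18.252  1.00 51.91",
]

atoms = [parse_atom_line(r) for r in RECORDS if r.startswith(("ATOM", "HETATM"))]

# Distance between the backbone N and CA atoms of ASP 10, in Angstroms.
n, ca = atoms[0], atoms[1]
n_ca_distance = math.dist((n.x, n.y, n.z), (ca.x, ca.y, ca.z))
print(f"N-CA distance: {n_ca_distance:.2f} A")
```

Fixed-width slicing matters here because fields may run together or be blank (the SO4 record has no chain ID), so whitespace splitting would shift the columns. For real work, an established parser such as the one in Biopython is the safer choice.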
[Figure annotations for the coordinate records: x, y, z atom coordinates; element symbols N, C, C, O, C, C, O, O (ASP 10 atoms) and S, O, O, O, O (SO4 231 atoms)]

Representation of Structural Information
Alanine (Ala, A)

ATOM    263  N   ALA A  35       1.429  34.959 -16.825  1.00 35.48           N
ATOM    264  CA  ALA A  35       0.523  34.398 -17.829  1.00 35.10           C
ATOM    265  C   ALA A  35      -0.724  33.878 -17.157  1.00 33.88           C
ATOM    266  O   ALA A  35      -1.850  34.138 -17.600  1.00 33.13           O
ATOM    267  CB  ALA A  35       1.209  33.268 -18.594  1.00 33.84           C

References and further reading:
1. Thomas E. Creighton, "Proteins: Structures and Molecular Properties".
2. Arthur M. Lesk, "Introduction to Protein Architecture: The Structural Biology of Proteins".