Introduction to Bioinformatics
Lecture 14: Protein Folding
Centre for Integrative Bioinformatics VU (IBIVU)
18 Apr 2006

Introduction to Protein Structure
• Great book covering the basics of protein structure:
  – Short Introduction to Molecular Structures
  – “Introduction to Protein Structure”
    • Chapters 1 to 5
    • Carl Branden & John Tooze, ISBN: 0-8153-2305-0

Prelude: molecular structures
• John Dalton (1810): A New System of Chemical Philosophy
• Elements, but no structures yet
• Mendeleev (1869)

Johannes van ’t Hoff
• Chimie dans l’Espace: “Proposal for the development of three-dimensional chemical structural formulae” (1875)
• Tetrahedral carbon atom

Linus Pauling (1951)
• Atomic Coordinates and Structure Factors for Two Helical Configurations of Polypeptide Chains
• Alpha-helix

James Watson & Francis Crick (1953)
• Molecular structure of nucleic acids

DNA/Protein structure-function analysis and prediction
The building blocks:
• Chains of amino acids
• Three-dimensional structures
• Four levels of protein architecture
• Amino acids: classes
• Disulphide bridges
• Histidine
• Proline
• Ramachandran plot: main-chain dihedral angles
• Rotamers: side-chain dihedral angles

The Building Blocks (proteins)
• Proteins consist of chains of amino acids
• Bound together through the peptide bond
• Special folding of the chain yields structure
• Structure determines the function

Chains of amino acids

Three-dimensional Structures
• Four hierarchical levels of protein
  architecture

Amino acids: physicochemical classes
• Hydrophobic amino acids
  Alanine (Ala, A), Valine (Val, V), Phenylalanine (Phe, F), Isoleucine (Ile, I), Leucine (Leu, L), Proline (Pro, P), Methionine (Met, M)
• Charged amino acids
  Aspartate (-) (Asp, D), Glutamate (-) (Glu, E), Lysine (+) (Lys, K), Arginine (+) (Arg, R)
• Polar amino acids
  Serine (Ser, S), Threonine (Thr, T), Tyrosine (Tyr, Y), Cysteine (Cys, C), Asparagine (Asn, N), Glutamine (Gln, Q), Histidine (His, H), Tryptophan (Trp, W)
• Glycine (side chain is only a hydrogen)
  Glycine (Gly, G)

Disulphide bridges
• Two cysteines can form disulphide bridges
• Anchoring of secondary structure elements

Ramachandran plot
• Only certain combinations of values of the phi (φ) and psi (ψ) angles are observed
[Figure: backbone dihedral angles φ, ψ and ω]

Rotamers: highly populated combinations of side-chain dihedral angles
Rotamers
• are amino acid side-chain dihedral angles (χ), numbered χ1, χ2, χ3, ... going outward from the Cα atom
• have different numbers of χ-angles depending on amino acid type
• are usually defined as low-energy side-chain conformations
• the use of a library of rotamers allows the modeling of a structure while trying the most likely side-chain conformations, saving time and producing a structure that is more likely to be correct.
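
The φ, ψ and χ angles above are all dihedral (torsion) angles, each defined by four consecutive atoms. As a minimal sketch, assuming atom positions are available as plain (x, y, z) tuples, one such angle could be computed as below. The name `dihedral_deg` and the example coordinates are illustrative only and do not come from the lecture.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180..180) about the p1-p2 bond.

    For phi of residue i the four atoms would be C(i-1), N(i), CA(i), C(i);
    for psi they would be N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)            # bond pointing back from p1 to p0
    b1 = _sub(p2, p1)            # central bond p1 -> p2
    b2 = _sub(p3, p2)            # bond p2 -> p3
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components along the central bond, keeping only the
    # parts of b0 and b2 that are perpendicular to it.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: a quarter turn about the z axis.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Evaluating this for every residue of a structure and plotting ψ against φ would give a Ramachandran plot; applying it to side-chain atoms would give the χ angles that a rotamer library tabulates.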

DNA/Protein structure-function analysis and prediction
Motifs of protein structure
• Secondary structure elements
• Renderings of proteins
• Alpha helix
• Beta-strands & sheets
• Turns and motifs
• Domains formed by motifs

Motifs of protein structure
• Global structural characteristics:
  – Outside hydrophilic, inside hydrophobic (unless…)
  – Often globular form (unless…)
Artymiuk et al., Structure of Hen Egg White Lysozyme (1981)

Secondary structure elements
[Figure: alpha-helix and beta-strand]

Renderings of proteins
• Irving Geis
• Jane Richardson

Alpha helix
• Hydrogen bond: from N-H at position n to C=O at position n-4 (‘n-n+4’)

Other helices
• Alternative helices are also possible
  – $3_{10}$-helix: hydrogen bond from N-H at position n to C=O at position n-3
    • Bigger chance of bad contacts
  – α-helix: hydrogen bond from N-H at position n to C=O at position n-4
  – π-helix: hydrogen bond from N-H at position n to C=O at position n-5
    • structure more open: no contacts
    • Hollow in the middle too small for e.g.
      water
    • At the edge of the Ramachandran plot

Helices
• Backbone hydrogen bonds form the structure
  – Often covers hydrophobic centre of protein
• Side chains point outwards (‘Xmas tree’)
  – Possibly: one side hydrophobic, one side hydrophilic (amphipathic helices)

Beta-strands: beta-sheets
• Beta-strands next to each other form hydrogen bonds

Parallel or antiparallel sheets
[Figure: anti-parallel and parallel sheets]
• Usually only parallel or anti-parallel
• Occasionally mixed
• Side chains alternating (up-down)

Turns and motifs
• Between the secondary structure elements are loops
• Very short loops between two β-strands: turn
• Different secondary structure elements often appear together: motifs
  – Helix-turn-helix
  – Calcium binding motif
  – Hairpin
  – Greek key motif
  – β-α-β motif

Helix-turn-helix motif
• Helix-turn-helix important for DNA recognition by proteins
• EF-hand: calcium binding motif

Hairpin / Greek key motif
• Different possible hairpins: type I/II
• Greek key: anti-parallel beta-sheets

β-α-β motif
• Most common way to obtain parallel β-sheets
• Usually the motif is ‘right-handed’

Domains formed by motifs
• Within a protein different domains can be identified
  – For example:
    • ligand binding domain
    • DNA binding domain
    • Catalytic domain
• Domains are built from motifs of secondary structure elements

Alpha/beta barrels
• TIM barrel, named after triosephosphate isomerase
• Usually 8 β-strands, at least 200 amino acids
• Often hydrophobic interior
  – alternating amino acids in the strands

Alpha/beta barrels
• Active site formed by (variable) loop
  regions at top of the barrel
• Exception: active site in the core of methylmalonyl-coenzyme A mutase

Summary
• Amino acids form polypeptide chains
• Chains fold into three-dimensional structure
• Specific backbone angles are permitted or not: Ramachandran plot
• Secondary structure elements: α-helix, β-sheet
• Common structural motifs: helix-turn-helix, calcium binding motif, hairpin, Greek key motif, β-α-β motif
• Combination of elements and motifs: tertiary structure
• Many protein structures available: PDB

Sequence-Structure-Function
What can we do with bioinformatics?
[Figure: diagram linking Sequence, Structure and Function, colour-coded as knowledge-based or ab initio. Labels: “Inverse folding, Threading”; “BLAST”; “Folding: impossible but for the smallest structures”; “Function prediction from structure: very difficult”]
• Ab initio prediction (based on first principles) is still not generally successful (red)
• Many bioinformatics methods are therefore knowledge-based (green)

Active protein conformation
• Active conformation of protein is the native state
• Unfolded, denatured state
  – high temperature
  – high pressure
  – high concentrations of urea (8 M)
• Equilibrium between the two forms: denatured state ⇌ native state

Anfinsen’s Theorem (1950’s)
• Primary structure determines tertiary structure.
“In the mid 1950’s Anfinsen began to concentrate on the problem of the relationship between structure and function in enzymes. […] He proposed that the information determining the tertiary structure of a protein resides in the chemistry of its amino acid sequence. […] It was demonstrated that, after cleavage of disulfide bonds and disruption of tertiary structure, many proteins could spontaneously refold to their native forms. This work resulted in general acceptance of the ‘thermodynamic hypothesis’ (Nobel Prize Chemistry 1972).”
www.nobel.se/chemistry/laureates/1972/anfinsen-bio.html
• Anfinsen performed unfolding/refolding experiments

Dimensions: Sequence Space
• How many sequences of length n are possible?
  $N(\mathrm{seq}) = 20 \cdot 20 \cdot 20 \cdots = 20^{n}$
  e.g. for n = 100, $N = 20^{100} \approx 10^{130}$, which is astronomically large
  – Only a subset of these will fold in a stable conformation
• The probability p of finding the same sequence twice is $p = 1/N$; e.g. $1/10^{130}$ is nearly zero.
• Evolution: divergent or convergent
  – sequences are dissimilar, in divergent and particularly in convergent evolution

Dimensions: Fold Space
• How many folds exist?
  – Sequences cluster into sequence families and fold families
  – some have many members, some few or only one:
    • Using Zipf’s law: $n(r) = a / r^{b}$
    • For sequence families: $b \approx 0.64$, $n_{\mathrm{total}} \approx 60000$
    • For fold families: $b \approx 0.8$, $n_{\mathrm{total}} \approx 14000$
  Here r is the rank of the family, n(r) is the number of proteins in the r-th family, and a is a scaling constant that depends on the number of proteins in the dataset. The constant b does not depend on the size of the dataset.

Levinthal’s paradox (1969)
• Denatured protein refolds in ~0.1 – 1000 seconds
• Protein with e.g. 100 amino acids, each with 2 torsions (φ and ψ)
  Each can assume 3 conformations (1 trans, 2 gauche):
  $3^{100 \times 2} \approx 10^{95}$ possible conformations!
• Or: 100 amino acids with 3 possibilities in the Ramachandran plot (α, β, L): $3^{100} \approx 10^{47}$ conformations
• If the protein can visit one conformation in one ps ($10^{-12}$ s), an exhaustive search costs $10^{47} \times 10^{-12}\,\mathrm{s} = 10^{35}\,\mathrm{s} \approx 10^{27}$ years!
  (the lifetime of the universe ≈ $10^{10}$ years…)

Levinthal’s paradox
Protein folding problem:
  – Predict the 3D structure from sequence
  – Understand the folding process

From 1D to 3D…

What to fold? …fastest folders
[Figure: folding times (nanoseconds) and simulation cost (CPU-days to CPU-years) for PPA, alpha helix, BBA5, beta hairpin and villin]
Pande et al., “Atomistic Protein Folding Simulations on the Submillisecond Time Scale Using Worldwide Distributed Computing”, Biopolymers (2003) 68, 91–109

Rates: predicted vs experiment
[Figure: predicted folding time (nanoseconds) against experimental measurement (nanoseconds), both axes from 1 to 100000, for PPA, alpha helix, beta hairpin, BBAW and villin]
Experiments:
• BBAW: Gruebele et al., UIUC
• villin: Raleigh et al., SUNY, Stony Brook
• beta hairpin: Eaton et al., NIH
• alpha helix: Eaton et al., NIH
• PPA: Gruebele et al., UIUC
Predictions: Pande et al., Stanford

Molten globule
• First step: hydrophobic collapse
• Molten globule: globular structure, not yet correctly folded
• Local minimum on the free energy surface

Folded state
• Native state = lowest point on the free energy landscape
• Many possible routes
• Many possible local minima (misfolded structures)

Folding energy
• Each protein conformation has a certain energy and a certain flexibility (entropy)
• Corresponds to a point on a multidimensional free energy surface
  – Three coordinates per atom: 3N-6 dimensions possible
$\Delta G = \Delta H - T\,\Delta S$
In very rough generalities:
• ΔH relates to bond formation/breaking
• ΔS relates to configurational freedom and water ordering

Hydrophobic Effect
Fundamental: The Hydrophobic
Effect is a Solvent Effect
[Figure: oil and water mixed, then separated into an oil phase]
• How is the interfacial water layer ordered?

Hydrophobic Effect in Protein Folding
[Figure: unfolded chain with ordered water (HOH) going to folded chain plus released water (HOH); ΔS = +]
• Unfolded: more hydrocarbon-water interfacial area, more water ordered
• Folded: less hydrocarbon-water interfacial area, less water ordered

Helper proteins
• Forming and breaking disulfide bridges
  – Disulfide bridge forming enzymes: Dsb
  – Protein disulfide isomerase: PDI
• “Isomerization” of proline residues
  – Peptidyl prolyl isomerases
• Chaperones
  – Heat shock proteins
  – GroEL/GroES complex
  – Preventing or breaking ‘undesirable interactions’…

Disulfide bridges
• Equilibria during the folding process

Proline: two conformations
• Peptide bond nearly always trans (1000:1)
• For proline the cis conformation is also possible (trans:cis equilibrium = 4:1)
• For folding, all prolines need to be in trans conformation
  – Isomerization is the bottleneck; cyclophilin catalyses it

Chaperones
• During folding process hydrophobic parts outside?
  – Risk for aggregation of proteins
• Chaperones offer protection
  – Are mainly formed at high temperatures (when needed)
  – Heat-shock proteins: Hsp70, Hsp60 (GroEL), Hsp10 (GroES)

GroEL/GroES complex
• GroEL:
  – 2 x seven subunits in a ring
  – Each subunit has an equatorial, intermediate and apical domain
  – ATP hydrolysis; ATP/ADP diffuse through the intermediate domain
• GroES:
  – Also seven subunits
  – Closes cavity of GroEL

GroEL/GroES mechanism
• GroES binding changes both sides of GroEL
  – closed cavity
  – open cavity
• Cycle
  – protein binds side 1
  – GroES covers, ATP binds
  – ATP → ADP + Pi
  – ATP binds side 2
  – ATP → ADP + Pi
    • GroES opens
    • folded protein exits
    • ADP exits
  – New protein binds

Alternative folding: prions
• Prion proteins are found in the brain
• Function unknown
• Two forms
  – normal alpha-structure
  – harmful beta-structure
• beta-structure can aggregate and form ‘plaques’
  – Blocks certain tissues and functions in the brain

Protein flexibility
• Even a correctly folded protein is dynamic
  – Crystal structure yields average position of the atoms
  – ‘Breathing’ overall motion possible

B-factors
• The average motion of an atom around the average position
[Figure: B-factors shown on a structure with beta-sheet and alpha helices]

Protein Tertiary Structure Tied to Function

Conformational changes
• Often conformational changes play an important role for the function of the protein
• Estrogen receptor
  – With activator (agonist) bound: active
  – With inactivator (antagonist) bound: not active
[Figure: active and inactive conformations]

Main points
• Anfinsen: proteins fold reversibly!
• Levinthal: too many conformations for fast folding?
  – First hydrophobic collapse, then local rearrangement
• Protein folding funnel
  – Assistance with protein folding
    • Sulphur bridge formation
    • Proline isomerization
    • Chaperonins
• Intrinsic flexibility: Breathing / Conformational change
  – Conformational changes for
    • Activation / Deactivation
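
The order-of-magnitude estimates behind the sequence-space and Levinthal arguments in this lecture can be re-derived with a few lines of arithmetic. The sketch below works with base-10 logarithms so that no huge numbers are formed. The function names and the 365.25-day year are assumptions made for illustration; the inputs (20 amino acids, 100 residues, 3 states per torsion or per residue, 1 ps per conformation) are the ones quoted on the slides.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length, about 3.16e7 s

def log10_states(options_per_unit, n_units):
    """log10 of options_per_unit ** n_units."""
    return n_units * math.log10(options_per_unit)

def log10_search_years(log10_conformations, seconds_per_conformation=1e-12):
    """log10 of the years needed to visit every conformation exactly once."""
    log10_seconds = log10_conformations + math.log10(seconds_per_conformation)
    return log10_seconds - math.log10(SECONDS_PER_YEAR)

log10_sequences = log10_states(20, 100)    # sequences of length 100
log10_torsion = log10_states(3, 2 * 100)   # 3 states for each of 200 torsions
log10_rama = log10_states(3, 100)          # 3 Ramachandran regions per residue
log10_years = log10_search_years(log10_rama)

print(f"sequences:           10^{log10_sequences:.1f}")
print(f"torsion conformers:  10^{log10_torsion:.1f}")
print(f"Ramachandran states: 10^{log10_rama:.1f}")
print(f"exhaustive search:   10^{log10_years:.1f} years")
```

Because the slides round each exponent down before multiplying, the unrounded search time comes out roughly one order of magnitude above the quoted figure. Either way it exceeds the age of the universe by many orders of magnitude, which is the point of the paradox.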