* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download protein_mol_biophysics_slides
Nucleic acid analogue wikipedia , lookup
Gene expression wikipedia , lookup
Peptide synthesis wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Expression vector wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Magnesium transporter wikipedia , lookup
Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup
Interactome wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Protein purification wikipedia , lookup
Western blot wikipedia , lookup
Point mutation wikipedia , lookup
Metalloprotein wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Protein–protein interaction wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
Homology modeling wikipedia , lookup
Biosynthesis wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Proteolysis wikipedia , lookup
Folding simulation: self-organization of 4-helix bundle protein yellow = helical turns Protein structure Protein: heteropolymer chain made of amino acid residues R φ | +H 3 N- C- ψ COO- | H Chain of amino acid residues 20 different amino acids More than 50,000 different proteins in human body alone Protein structure str an d primary Linear chain of amino acids secondary Local regular structures tertiary m yo glo bin β- αhe lix Hierarchical levels of structure: 3-D compact structure with long-range contacts The biological function is determined by shape. The shape is determined by primary sequence. How? (“Protein Folding Problem”) Protein Folding “Protein folding”: Primary sequence Native state ∆S<0 Random coil Highly organized Compact 3-d structure p Huge variation in the possible primary sequence: 20N (20 different amino acids, N is # of amino acids in a chain) Most sequences do not fold; primary sequence must be carefully chosen Methods for finding primary sequences that fold to specific shapes: l Evolution: trial and error, requires lots of time l Engineering: Understand underlying principles of Self-organization Protein Folding Problem l Proteins are long (>50) chains of amino acid residues l Biological functioning requires protein chain to fold to very specific compact shape: “native state” l Chain is very flexible: each amino acid has internal degrees of l Paradox: Even with super-fast sampling rate 10-12sec/config freedom (Φ, Ψ, sidechain, e.g. 4 states each) ⇒ > 64 configurations Ex: Myoglobin (153 amino acids) 64153 = 10276 configs ⇒ 10264 seconds (10256 yrs) to randomly find native state. (degeneracy of native state reduces this to 10118 years) Actual real protein folding times: milli-seconds !! l How ?? Folding must be a guided deterministic process, not random. Configuration space is frustrated, ultra-metric. Different initial configurations converge to native state. How does primary sequence determine folding dynamics. l Interactions are non-linear: Anti-chaotic dynamics !? Energy Landscape Funnel shaped, different initial configurations guide system to the same native state. Anti-Chaos? Is this a valid and useful approach? (B. Gerstman and Y. Garbourg, Journal of Polymer Science B: Polymer Physics, 36, 2761-2769, 1998.) -- 0 Many other axes are necessary to represent all the structural degrees of freedom. Which are most important? Ultimate Physics Aim: Determine which aspects of 1-D sequence of amino acids in peptide chain determine efficient folding pathway and final shape (native state). Immediate Aim: Determine if formalism of non-linear dynamics is useful for investigating protein folding. This work: Can formalism of non-linear dynamics show that large scale un-folding is deterministic (and is it mathematically anti-chaotic) and distinguish random thermal fluctuations ? Use data from lattice simulations of protein unfolding (realistic folding simulations of full proteins not available) First check to confirm that model realistically simulates protein dynamics. Compare results from model for characteristics that have been experimentally measured; e.g. Heat Capacity Why use computer model? The system is complex - Huge number of degrees of structural freedom - Many terms in the Hamiltonian - System is not solvable analytically Monte Carlo simulations are very useful for these kinds of systems - Interested in relaxation times (non-equilibrium dynamics), as well as final configurations (equilibrium). Lattice model and interaction Hamiltonian Red: backbone Green: side chain Interaction Hamiltonian: ss ssp bb bb rep H = ∑ ∑ a ij ∑ E ij + a ij E + a ij E rep + i j> i p ∑a l l i El + ∑a m m i E m close enough contact (or preferred state)? Yes: a = 1; No: a = 0. ss : sidechain-sidechain bb: backbone-backbone l : local m : cooperative p : hydrophobic or polar or hydrophilic Protein Configuration Energy Determined by Interaction Hamiltonian i,j : amino acid residue number in the primary sequence. aijss: are sidechains of i and j close enough to interact; yes = 1, no = 0. Eijssp: sidechain-sidechain energy (p = 1 hydrophobic-hydrophobic, p = 2 hydrophilic-hydrophilic, p = 3 hydrophobic-hydrophilic). aijbb: are backbones i and j close enough to interact; y=1, n=0. Ebb: backbone-backbone interaction energy ( hydrogen bond, dipole, soft core repulsion combined together) ail: are residues i-1, i, i+1, arranged so that ‘i’ is in its preferred user-defined local configuration (i.e. α-helix, β-sheet, turn); y=1, n=0. El: local propensity energy. aim: are residues i-1, i, i+1, i+2 arranged so that i and i+1 are both in the same preferred local configuration; y=1, n=0 Em: medium range (cooperative) propensity energy DNA Nucleus DNA Packaging Inside Nucleus Nucleosome Supercoil Chromosome Protein Scaffold Double Helix (Partially Disrupted) 2 nanometers 1 turn = 10 base pairs = 3.4 nanometers Minor Groove Major Groove NUCLEOTIDES Phosphate Base Sugar TRIPLET CODONS GENETIC CODE Initiation Codon Termination Codons U U First Base in Codon C A G C A G UUU Phe UCU Ser UAU Tyr UGU Cys UUC Phe UCC Ser UAC Tyr UGC Cys UUA Leu UCA Ser UAA UAA UGA UUG Leu UCG Ser UAG UGA UGG Trp CUU Leu CCU Pro CAU His CGU Arg CUC Leu CCC Pro CAC His CGC Arg CUA Leu CCA Pro CAA Gln CGA Arg CUG Leu CCG Pro CAG Gln CGG Arg AUU Ile ACU Thr AAU Asn AGU Ser Ile ACC Thr AAC Asn AGC Ser AUA Ile ACA Thr AAA Lys AGA Arg AUG AUG Met ACG Thr AAG Lys AGG Arg GUU Val GCU Ala GAU Asp GGU Gly GUC Val GCC Ala GAC Asp GGC Gly GUA Val GCA Ala GAA Glu GGA Gly GUG Val GCG Ala GAG Glu GGG Gly AUC Third Base in Codon