Proteins - Structure, folding and domains
Tommi Kajander
X-ray crystallography lab, Institute of Biotechnology, University of Helsinki

Outline
– basics on proteins
– stability and folding
– structures and domains
– X-ray crystallography

Protein structure
– primary structure: the amino acid sequence and the peptide bond; the polypeptide runs from the amino (N) terminus to the carboxy (C) terminus of the chain
– secondary structure, tertiary structure (the folded state) + oligomeric state (quaternary structure)

Cellular functions
Proteins are produced by living cells, translated from the encoding gene. They may function inside the cell (in its various compartments, e.g. DNA-binding proteins in the nucleus, or in the cytosol), on the cell membrane, or be secreted (e.g. microbial enzymes such as cellulases; animal growth factors, antibodies, lysozyme; proteases for food processing in the stomach and gut).

Protein functional classes
– enzymes
– structural proteins (collagen etc.)
– ion channels
– transporters (hemoglobin, transport across membranes)
– immune system proteins (binding)
– other binding proteins (growth factors, DNA-binding proteins, chaperones)
Functions are often tied to specific domains (see below).

Building blocks
– the 20 natural L-amino acids (no D-amino acids)
– the peptide bond; the amino and carboxy termini define the direction of the polypeptide
– zwitterions, chiral, with different “side chains”

The 20 amino acids
– 20 natural amino acids encoded by DNA (one amino acid per base triplet; 64 triplets)
– all L-amino acids (stereochemistry), chiral around the Cα atom
– hydrophobic, polar and charged: what does this mean, and what are the implications? Learn them!
– some have special properties (Gly, Pro, Cys...)
– atom nomenclature:
  – heavy atoms: N, C, O, Cα (peptide unit) + side-chain atoms Cβ, Cγ, Xδ etc., named by distance from Cα
  – some side chains are branched (numbered 1, 2...)
  – three-letter and one-letter codes (Glycine, Gly, G)

Hydrophobic amino acids
– inside proteins: the hydrophobic core, sticky binding pockets
– aromatic rings, stacking
– delocalization of the π-electrons (Tyr, Trp; hydrogen bonds)
– volume and shape
– proline ring, imino acid

Polar and charged amino acids

Sulphur-containing amino acids
• Cysteine and methionine
• Cys is the most reactive amino acid: the thiol group can oxidize and deprotonate, pKa ca. 8-9.
• Disulphide bridges are structurally important; free -SH groups can also cause non-specific dimerization/aggregation.
• Met is potentially nucleophilic, and hydrophobic.

Acidic groups: Asp, Glu
– pKa ≈ 4, but it can vary substantially with the environment (proteins tune the protonation of functional groups)
– active sites/enzymes
– metal ion binding (other polar residues also bind metals, e.g. His and transition metals)
– acid/base catalysis
– thermostability/ionic networks

Basic groups: Arg, Lys, His
– Arg and Lys are positively charged at physiological (neutral) pH
– His: metal coordination and catalysis; pKa around 6-7, so its protonation state is variable (mostly uncharged?)
– Lys: catalytic base in active sites
– Arg: ion pairs, electrostatic interactions in active sites and in ligand binding (e.g. PO4)

Other polar
– Asn, Gln: hydrogen bonding; Asn can deamidate
– Asn: N-linked glycosylation of proteins
– Ser, Thr, Tyr: H-bonds; Ser in catalysis; Ser, Thr and Tyr are phosphorylated and dephosphorylated in cellular signalling events (the most common signalling mechanism)

“Turn” residues: Glycine, Proline
– common in turns; they break secondary structure (not always)
– Pro is an imino acid and generates kinks, if not turns
– both destabilize secondary structure in any case (helices and strands)
– Pro and hydroxyproline (with Gly) in the collagen triple helix
– Gly has NO side chain
– the peptide bond preceding Pro can be cis or trans, usually trans
– Pro is rigid, whereas Gly can adopt a wide range of backbone phi and psi angles

pKa, pI
– pKa: -log10 of the acid dissociation constant, Ka = [H+][A-]/[HA] for HA ⇌ H+ + A-
– pI, the isoelectric point: the pH at which the net charge of the protein is zero
– it can be calculated approximately from the sequence (e.g. ProtParam on ExPASy) or measured experimentally by isoelectric focusing
– so the pI tells whether the protein carries a net positive or negative charge at a given pH
– implications for solubility?

pKas of amino acid side chains

Residue   pKa        State at physiological pH (7.4)
Asp       4.5        charged (-)
Glu       4.6        charged (-)
His       6.2        neutral/charged
Cys       9.1-9.5    -SH (neutral)
Tyr       9.7        neutral
Lys       10.4       -NH3+, charged (+)
Arg       ca. 12     charged (+)

Roles of residues in proteins
– hydrophilic out, hydrophobic in (more precisely the groups, not necessarily whole side chains: e.g. lysine has a long aliphatic hydrocarbon chain)
– function/active site:
  – specificity comes from shape (conformation) and hydrophobic contacts
  – hydrogen bonding (polar/charged)
  – catalytic residues (acids, bases, nucleophiles, metal ion binding)

About enzymes
Types of catalysis:
– general acid/base catalysis (Lys, Asp, Glu, His)
– electrostatic catalysis (exclusion of solvent)
– nucleophilic and electrophilic catalysis: Ser and Cys (proteases) are the most common nucleophiles; metals act as electrophiles
(+ reduction of entropy on binding: high effective concentration, catalysis in a preordered active site)
The serine protease catalytic triad (a charge-relay system)

Amino acids in the polypeptide context
Polypeptide backbone + side chains of the amino acid residues = protein.

Protein synthesis
In cells, peptide bonds are formed on the ribosome during translation of messenger RNA (mRNA): each amino acid is transferred from its transfer RNA (tRNA) and added to the C-terminus of the growing polypeptide.

Structure formation
– the properties of the side chains mostly determine the higher-order structure of proteins, and their function
– hydrogen bonds between the amide and carbonyl groups of the peptide “backbone” are important for secondary structure (which is still defined by the side chains)
– the peptide bond is planar (important!)
– proteins can be a) fibrous/filamentous (collagen, silk, muscle myosin and actin), b) soluble (e.g.
enzymes, growth factors, insulin etc.), or, in the cell's lipid membrane, c) “membrane proteins” (hydrophobic/lipid-soluble: ion channels and pumps that control cell homeostasis, transporters, and receptors such as G-protein-coupled receptors for signalling). Cell-surface receptors typically have a transmembrane domain, an extracellular ligand-binding domain and an intracellular region for signalling inside the cell.

Secondary structure
– the fold/structure of the protein is stabilized in secondary structure: α-helices or β-strands
– the sequential arrangement (topology) and spatial organization of these elements define the FOLD of the protein
– amino acids have different propensities for forming particular secondary structures (e.g. Ala and non-β-branched residues in α-helices)

Steric restraints
– the Ramachandran plot shows the variable angles (phi, psi) of the polypeptide main chain
– only a restricted set of conformations is available for the different secondary structures (and amino acids), due to steric effects of the side chains
– Gly has the most freedom, hence it is important in turns

Polypeptide “backbone” angles
– 3 main-chain dihedrals, but only two (phi, psi) really vary, since the planar peptide bond fixes the third (omega)
– these two create all of the backbone structures seen in proteins

Ramachandran plot (example from MLE, PDB code 1MUC)
– β-strands: spread out
– left-handed α-helices: mostly glycine
– right-handed α-helices: bunched up
– >90% of residues should be in the most favoured areas
– glycines appear in many areas

α-helix
– 3.6 residues/turn
– main-chain H-bonding from residue i to i+4
– 1.5 Å rise/residue, 5.4 Å rise/turn
– independently stabilized by H-bonding
– side chains point back, like a Christmas tree
– helices often have different faces
– helix dipole/capping (N/C)

Helical wheel and packing
– amphipathicity, binary pattern (polar/aliphatic, helical packing), “heptad repeat”
– packing: “grooves to ridges” (in real life more variable; Bowie, Nat. Struct. Biol.
1997)

β-strand
– flat-ish
– two varieties: parallel and antiparallel
– often distorted: twisted, bulged
– certain proteins can also form amyloid fibrils, “infinite β-sheets” (Alzheimer's disease, prions, other amyloid diseases, e.g. lysozyme, β2-microglobulin and transthyretin variants)
(figure: antiparallel β-sheet)

β-turns
• nearly 1/3 of residues are involved in turns (various types)
• hairpin loops connect β-strands
• classification (here, type I and II) by torsion angles

Supersecondary structure
Regular secondary structure patterns:
– sometimes functional: the HTH motif (DNA-binding element)
– sometimes structural: β-α-β, β-hairpin

Sequence analysis
– conserved domains or motifs (web tools)
– membrane vs. soluble proteins: hydrophobicity plots (scales), again on the web
– types of membrane proteins: α-helical (e.g. 7-TM proteins, ion channels), β-barrel, single-span α-helical (many receptors and adhesion molecules)
(figure: GABA-transporter domains)

Domains
– defined as independently foldable units
– often mark different functional units
– separated by linker regions
(figures: MLE, PDB code 1MUC; Goldman, Helin et al.)

Domains
– mammalian proteins are often “beads on a string”
– repeated domains often occur
– different functions, structural variations
– see e.g. the SMART or Pfam databases for the domain composition of proteins
(figure: domains 19-20 of Factor H; Jokiranta et al. & Goldman, EMBO J, 2006)

Protein folds
– some protein folds are unique, but “fold space” must be limited (structural genomics attempts to map it)
– many proteins have several domains
– most proteins fold into already known structures

Some common folds
Common folds recur even when there is no detectable sequence homology (<10-20% identity); structure is preserved much longer than sequence.
– β/α barrel
– 7-TM receptor
– 4-helix bundle
– β-sandwich domains (immunoglobulin-like) etc.
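The hydrophobicity plots mentioned under sequence analysis can be sketched as a sliding-window average over a hydropathy scale. The sketch below assumes the Kyte-Doolittle scale, a 19-residue window and a cut-off of 1.6, a commonly used heuristic for flagging candidate transmembrane helices; the scale, window and threshold are assumptions rather than values from these notes, and the test sequence is made up.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy per window; element i covers residues i .. i+window-1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows hydrophobic enough to span a membrane."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Made-up sequence: charged flank + 19 hydrophobic residues + charged flank.
seq = "KDEKRDESKN" + "LLIVALLFAVLIGVLLAIF" + "KDERKSDEKN"
print(candidate_tm_windows(seq))
```

Windows overlapping the hydrophobic stretch are flagged, while windows dominated by the charged flanks are not. Plotting `hydropathy_profile(seq)` against residue position gives the familiar hydropathy plot; a single-span membrane protein shows one broad peak above the threshold.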
β/α barrel
– 40% of unique proteins
– usually enzymes
– 8 repeated β/α units

7-TM fold (PDB code 1F88)
– example: bovine rhodopsin
– >500 such proteins in humans
– 7 transmembrane helices
– signal transduction; the signals are very varied

Repeat proteins
– tetratricopeptide repeat (TPR), leucine-rich repeat, HEAT repeat, ankyrin repeat
– protein-protein interactions in various cellular contexts
– structural scaffolds

The β-propeller (a repeat protein of sorts)
– all β-sheet
– 4-7 “blades” of 4 strands each
– neuraminidase, WD-40 repeat proteins (e.g. the G-protein β subunit)
– binding site often in the center

β-structures: “sandwich”, barrels, propellers
(figure: Neurospora crassa CMLE, Kajander et al. (2002))

The β-sandwich fold
– antibodies (immunoglobulins), adhesion molecules and receptors: NCAM, ICAM etc., RAGE, FGFR etc.
(figures: antibody structure; multidomain structures: integrin αvβ3, Hsp90 complex with p23)

Protein complexes
Stable:
– ribosomes (small and large subunits; protein-RNA)
– F0F1 ATPase (ATP synthase, proton pump)
– proteasome, etc.
Transient:
– nucleic acid polymerase complexes (protein-DNA)
– chaperones and their (unfolded) substrate proteins
– signalling complexes (e.g. G-proteins and their receptors)
Viral capsids

Protein folding
“The protein folding problem”
– complex, vs. e.g. DNA/RNA structure
– we can't predict the structure ab initio, but there are cases of success for small proteins
– what stabilizes the folded (native) state?
– driving forces
– mechanisms

Levinthal's paradox
– folding by sampling all conformations (contact order) would take longer than the age of the universe, so there must be a folding landscape in which not all conformations are explored
– original example: 3^100 ≈ 5 × 10^47 conformations; at 10^13 per second, or 3 × 10^20 per year, it would take about 10^27 years...

Hydrophobic vs. hydrophilic
– the most basic idea of protein folding is binary patterning (used in the design of helical bundles, the simplest case)
Kamtekar et al. & Hecht MH. Protein design by binary patterning of polar and nonpolar amino acids.
Science (1993).

Folded/native state
– a free energy minimum; typical functional proteins have a single defined native state
– hydrophobic and aromatic residues inside; charged/polar residues on the outside OR hydrogen bonded (solvation by the protein)
– one observed feature of thermophilic proteins is an increased number and extent of ionic networks
– minimal “alphabet” designs (polar, hydrophobic, turn), e.g. a functional SH3 domain by Riddle et al. & Baker D. (amino acids: Ile, Lys, Glu, Ala, Gly)
– both cases used combinatorial methods (selection or screening) on mutant libraries (Kamtekar et al. & Riddle et al., and others)

Thermodynamics of folding
Large opposing and favoring terms:
– the hydrophobic effect (release of water from hydrophobic surface, entropy driven)
– hydrogen bonding of the backbone, favorable enthalpy
– cooperativity, e.g. helix and β-sheet formation; main-chain hydrogen bonds define the folded topology
– buried polar side chains are solvated by H-bonding
– loss of conformational freedom of the protein chain (unfavorable entropy)
– other effects? charge burial?
– small proteins (<100 residues) are sometimes stabilized by bound metals or disulphides
Subtracting two large values: proteins are marginally stable (ca. 5-20 kcal/mol) ...and one hydrogen bond is worth 2-10 kcal/mol.

Thermodynamic studies of protein stability
– denaturation by heat, or chemical denaturation (0-6 M Gu-HCl)
– CD spectroscopy (secondary structure)
– Trp-fluorescence quenching (tertiary structure)

Protein folding
– CI2: collapse and formation of secondary structure occur simultaneously: nucleation-condensation (a basic foldon)
– single-domain proteins typically exhibit two-state unfolding with a sharp transition; multidomain or oligomeric proteins do not
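Two-state chemical denaturation curves like those described above are often analysed with the linear extrapolation model, in which the unfolding free energy falls linearly with denaturant concentration, ΔG([D]) = ΔG(H2O) − m[D]. The sketch below assumes that model and a hypothetical protein with ΔG(H2O) = 5 kcal/mol (the low end of the stability range quoted above) and m = 2 kcal mol⁻¹ M⁻¹; both numbers are illustrative, not measured.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def fraction_unfolded(denaturant_m, dg_water, m_value, temp_k=298.15):
    """Fraction unfolded for a two-state N <-> U equilibrium.

    Assumes the linear extrapolation model dG_unf = dG_water - m * [D]
    (energies in kcal/mol, m in kcal/mol/M, denaturant in M).
    """
    dg_unf = dg_water - m_value * denaturant_m      # > 0 means the native state is favoured
    k_unf = math.exp(-dg_unf / (R_KCAL * temp_k))   # equilibrium constant K = [U]/[N]
    return k_unf / (1.0 + k_unf)


def transition_midpoint(dg_water, m_value):
    """Denaturant concentration where [U] = [N], i.e. dG_unf = 0."""
    return dg_water / m_value


# Hypothetical protein: dG_water = 5 kcal/mol, m = 2 kcal/mol/M.
for conc in (0.0, 1.0, 2.0, 2.5, 3.0, 4.0, 6.0):
    print(f"{conc:3.1f} M GuHCl: fraction unfolded = {fraction_unfolded(conc, 5.0, 2.0):.3f}")
print("midpoint:", transition_midpoint(5.0, 2.0), "M")
```

With these numbers only about 2 in 10,000 molecules are unfolded in water at 25 °C, even though 5 kcal/mol is less than the energy attributed above to a few hydrogen bonds, and the transition is centred at 2.5 M. A larger m-value gives a steeper, more cooperative transition; fitting this curve to CD or Trp-fluorescence data is one way to extract ΔG(H2O).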
Special case (?): repeat proteins
– the 1D Ising model
– mechanical unfolding (step-wise + elasticity; hearing (hair cells in the ear); AFM)

1D Ising model: stability of TPRs vs. number of TPR repeats
• a two-state transition
• stability increases with the number of repeats
• cooperativity (slope of the transition) increases with the number of repeats
(in addition, dominant local interactions and regular structure are retained as repeats are added)
The 1D Ising model captures this behavior (as shown for the helix-coil transition by the Zimm-Bragg model):
– folded helix = spin up (+1)
– unfolded helix = spin down (-1)
– this describes the folded/unfolded helices in TPRs → each up-down “spin” interaction will cost energy
(Kajander et al. & Regan (2005) J. Am. Chem. Soc.)

Folding states
– at equilibrium: U→N, or U→I→N (end states observable)
– intermediates are seen as several transitions, or as a difference between the fluorescence and far-UV CD transitions
– kinetics tell about the folding pathway

Folding kinetics
– two-state kinetics vs. intermediates
– first-order kinetics if two-state
– transition state theory: $k_f = \nu\kappa\, e^{-\Delta G^\ddagger/RT}$ → $\Delta\Delta G^\ddagger = \Delta G^\ddagger_{\mathrm{mut}} - \Delta G^\ddagger_{\mathrm{wt}} = RT\ln(k_f/k_f')$, where $k_f'$ is the folding rate of the mutant

Chevron plots
• measure folding & unfolding (dilution into or out of e.g. GuHCl)
• deviations from the ideal chevron: effects of intermediates; mutations show what they affect

Folding, φ-value analysis
– to study the folding pathway one needs to look at kinetics (e.g. Trp fluorescence by stopped-flow rapid mixing)
– e.g. φ-value analysis of mutants (Fersht & co-workers): $\phi = \Delta\Delta G^\ddagger/\Delta\Delta G_{\mathrm{eq}}$ compares the effect of a mutation on the transition state with its effect on the folded state
– φ ranges from 0 to 1: 0 = unfolded in the transition state (the mutation has no effect on the folding rate), 1 = structured in the transition state
– mutations “mutate away” individual side-chain interactions

H/D exchange with NMR
→ variability in local stability
– protection factor pf = ku/kex
– CTPR2 & CTPR3 -NH hydrogen exchange (Main et al. & Regan, PNAS 2005)

3D-structural techniques for biomolecules
X-ray (and neutron) scattering & crystallography
– elucidates structure-function relationships at the atomic level
– very large size range (40 Da to >1 MDa) (ribosome, viruses)
NMR spectroscopy also provides detailed structural data.
– complementary to crystallographic efforts and results
– NMR has been restricted to molecules below 40 kDa
Electron microscopy, single particle and diffraction
– low resolution, but the highest molecular masses (e.g. clathrin-coated pits, whole ribosomes, enveloped viruses)

Strategy in protein structure determination via X-ray crystallography
– Obtain pure preparations, in milligram quantities, of the macromolecule(s) of interest.
– Screen for preliminary crystallization conditions on an incomplete factorial basis.
– Consider the biochemistry.
– Refine crystallization conditions / mutants / protein chemical modifications.
– Collect diffraction data from suitable crystals.
– Solve the “phase” problem.
– Analyze the electron density maps and build the atomic model.
(figure: landscape of structural biology: cloning, protein expression and purification, crystallization trials, optimization, data collection, model building and refinement)

Crystallization
– typically by vapour diffusion; also microdialysis, “batch” (no diffusion), or diffusion in a capillary
– use 24/48/96-well plates
– different temperatures (4 °C/RT/other)
– start with various random screens
– crystallization automation with robotics: http://www.biocenter.helsinki.fi/bi/xray/automation/

Crystals and symmetry
Why do we need a crystal?
– scattering from a single molecule is undetectable: we need a lattice
– crystals have translational symmetry
– we can resolve features at atomic resolution

Unit cell
– the smallest object from which you can make the entire crystal by translation along its edges
– bounded by lattice points
– edges: a, b, c; angles: α, β, γ

What is in a unit cell?
A unit cell is built from asymmetric units.
(figure: examples of asymmetric units)
Asymmetric unit: the smallest unit from which a unit cell can be built by application of crystallographic symmetry operators.
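The definition above can be made concrete: each copy in the unit cell is the asymmetric unit transformed by one symmetry operator, written as a rotation plus a translation acting on fractional coordinates. The sketch below uses the standard general-position operators of P2₁ (unique axis b) and P2₁2₁2₁, two of the common protein space groups listed in these notes; the atom coordinates are hypothetical, and atoms on special positions are not handled.

```python
# Symmetry operators as (rotation matrix rows, translation) on fractional coordinates.
P21 = [  # unique axis b
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.0)),  # -x, y+1/2, -z (2-fold screw along b)
]

P212121 = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0)),    # x, y, z
    (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (0.5, 0.0, 0.5)),  # -x+1/2, -y, z+1/2
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (0.0, 0.5, 0.5)),  # -x, y+1/2, -z+1/2
    (((1, 0, 0), (0, -1, 0), (0, 0, -1)), (0.5, 0.5, 0.0)),  # x+1/2, -y+1/2, -z
]


def apply_operator(op, xyz):
    """Apply one symmetry operator and wrap the result back into the 0..1 cell."""
    rotation, translation = op
    return tuple(
        (sum(r * c for r, c in zip(row, xyz)) + t) % 1.0
        for row, t in zip(rotation, translation)
    )


def fill_unit_cell(asu_atoms, operators):
    """One copy of the asymmetric unit per operator (general positions only)."""
    return [[apply_operator(op, xyz) for xyz in asu_atoms] for op in operators]


# Hypothetical asymmetric unit with two atoms (fractional coordinates).
asu = [(0.10, 0.20, 0.30), (0.25, 0.40, 0.05)]
for copy in fill_unit_cell(asu, P212121):
    print([tuple(round(c, 3) for c in xyz) for xyz in copy])
```

The number of copies per cell equals the number of operators (2 for P2₁, 4 for P2₁2₁2₁). Applying the 2₁ screw operator twice returns every atom to its original position one cell further along b, which the modulo wrap maps back into the reference cell.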
Translational symmetry
Covers space:
– screw axes: sliding rotations (2₁, 3₁, 3₂, 4₁, 4₂, 4₃), repeats that extend across unit cells
– (glide planes: sliding mirrors)
(figures: 2₁, 4₁ and 4₂ screw axes)

Asymmetric unit
The smallest object from which the unit cell can be built up by application of crystallographic symmetry.
Contents of an asymmetric unit:
– at least one macromolecule per asymmetric unit
– can be more, if there is non-crystallographic symmetry: a dimeric protein may be in the asymmetric unit (or not); the 5-fold symmetric virus coat is always in the asymmetric unit (a 5-fold axis cannot be crystallographic)

Symmetry
Crystallographic (space group) symmetry:
– the symmetry elements apply throughout the crystal; a 2-fold axis in unit cell A will be coincident with other 2-fold axes in different unit cells
Non-crystallographic (local) symmetry:
– the symmetry elements apply locally, only to the atoms around a particular lattice point; the axes in different unit cells are parallel but not (except by chance) coincident with each other

Crystal
Crystal lattice: asymmetric unit + symmetry = unit cell

What is a protein crystal?
– a crystal lattice is a periodic, symmetric system, but proteins are asymmetric and fit periodicity poorly
– inorganic crystals are hard and dry; protein crystals are moist and include up to 70% solvent
– solvent channels between molecules make it possible to soak in chemicals; enzymes can be active in the crystal

Bravais lattices
Bravais, 1850:
– 14 different crystal lattices are possible
– due to the symmetry of the unit cell
– constraints on cell lengths & angles
230 space groups; 65 biological ones
– point group: symmetry around a point
– space group: symmetry arrangement that covers space
The word “group” here has its full mathematical meaning.

Space groups
Certain space groups are more common in protein (and other) crystals.
60% of PDB structures belong to 5 space groups:

Space group   Percentage (1997)
P2₁2₁2₁       23%
P3₂21         8%
C2            8%
P2₁           11%
P2₁2₁2        8%

Space groups
– 230 possible space groups: all available symmetry operations on all 14 Bravais lattices
– biological ones are chiral: 65 of those

Protein crystals
– only a limited set of symmetries is allowed for proteins: no mirrors!
– symmetry has to fill space: 2-fold, 3-fold, 4-fold, 6-fold (no 5- or 7-fold etc.: they can't build a crystal lattice!)
– screw axis, e.g. P2₁

Crystals are not like solution?
– protein crystals contain 35-75% (vol.) water
– crystals can be active: enzyme activity, time-resolved crystallography
– bacteriorhodopsin photobleaching: pumping in the D96N variant crystal (colour changes)
– myoglobin CO photolysis

Diffraction
Are these good? For our purposes they are good when they diffract X-rays well.
(figures: data collection; data from a synchrotron)

Microscopy: how it works
– the specimen diffracts parallel light
– the objective lens focuses the diffracted image

Diffraction
– different orders of diffraction (waves): 1, 2, 3, 4, 5...

Crystal lattice: a theoretical concept
Lattice planes:
– can be defined by their intersections with the axes
– the spacing d between the planes is used in Bragg's law; waves scattered from identical lattice points are in phase when
  nλ = 2d sinθ

Fourier transform
• adding up the components of nth order (of increasing frequency) gives the sum = the repeating unit
• the more terms, the finer the detail

$$\rho(xyz) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} |F_{hkl}|\, e^{i\alpha_{hkl}}\, e^{-2\pi i(hx+ky+lz)}$$

where the amplitudes $|F_{hkl}|$ are measured and the phases $\alpha_{hkl}$ are not.

The “phase problem”
– to reconstruct the diffracted waves we need to know the phases; the intensities collected from diffraction only give the amplitude of each wave, I ∝ |F|²
– only the intensities can be obtained directly by the X-ray detector
Patterson function
• methods (in practice):
• heavy-atom derivatization, inclusion of (other) anomalous scattering elements (e.g.
selenomethionine labelling, Br-nucleotides), or using a model (a homologous structure, ca. >30-40% identity) and finding its position and orientation in the crystal unit cell (more exactly, in the asymmetric unit)
• find the heavy-atom structure with a Patterson function calculation (a Fourier transform without the phases):

$$P(xyz) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} I_{hkl}\, e^{-2\pi i(hx+ky+lz)}$$

Difference Patterson function (I_PH − I_P ≈ I_H)
– vector peaks between atoms related by symmetry operators of the space group are found on special sections called the Harker planes
– from these, the heavy-atom positions H(x,y,z) can be obtained

Structure solution
– obtain an initial electron density map with the heavy-atom coordinates (initial phase refinement)
– build a model (and/or use non-crystallographic symmetry averaging and solvent flattening to improve the maps, possibly with phase extension to higher resolution if data are available)
– automated chain-tracing programs + hand building on graphics
(figure: a map phased with 4 Pt atoms)

Finishing the structure
Refinement: fit the model back to the X-ray data via a minimization target function (minimizing an energy that includes known geometric/chemical terms).

The meaning of resolution (e.g. 1.9, 2.5 and 3.0 Å)
The smaller the lattice plane separation d, the wider the diffraction angle and the higher the resolution of the “image”.

How do you know when the crystal structure is right?
– data completeness and signal/noise (I/σ): the true measures of resolution
– R-factor (Rwork, Rfree):

$$R = \frac{\sum \big|\,|F_o| - |F_c|\,\big|}{\sum |F_o|}$$

– geometry (Ramachandran plot, r.m.s.d. of bonds and angles)
– look at the electron density maps

Links and literature
– Google: PDB, www.expasy.com (various sequence analysis tools + SWISS-MODEL), SMART, Pfam, plus web courses on crystallography
– EDS: http://eds.bmc.uu.se/eds/ (view maps for PDB structures)
– protein interaction domains: http://pawsonlab.mshri.on.ca/index.php (doesn't include all...)
– free PC/Mac software for structure viewing and analysis: PyMOL (best by far), Swiss-PdbViewer, RasMol (see also the PDB site)
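The R-factor in the validation checklist above is simple arithmetic over the reflections. The sketch below computes R for the working set and for a test set flagged for Rfree, assuming the calculated amplitudes have already been scaled to the observed ones; the reflection values and free flags are made up for illustration.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), with Fc already on the scale of Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, is_free):
    """Rwork from reflections used in refinement, Rfree from the excluded test set."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Made-up amplitudes for four reflections; the third is in the free (test) set.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 84.0, 51.0, 40.0]
is_free = [False, False, True, False]
print(round(r_factor(f_obs, f_calc), 4))  # all reflections together
print(r_work_and_r_free(f_obs, f_calc, is_free))
```

Because the test reflections are never used to fit the model, Rfree is an unbiased check: a model that only fits the working set (over-fitting) shows a low Rwork but a clearly higher Rfree.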
Reading
– Introduction to Protein Structure, Brändén C and Tooze J.
– Crystallography Made Crystal Clear, Rhodes G.
– Outline of Crystallography for Biologists, Blow D.
– Structure and Mechanism in Protein Science, Fersht A.