* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Outline Visualizing proteins with PyMol
Point mutation wikipedia , lookup
Paracrine signalling wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Biosynthesis wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Genetic code wikipedia , lookup
Gene expression wikipedia , lookup
Expression vector wikipedia , lookup
Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup
Signal transduction wikipedia , lookup
Magnesium transporter wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Structural alignment wikipedia , lookup
Interactome wikipedia , lookup
Metalloprotein wikipedia , lookup
Homology modeling wikipedia , lookup
Protein purification wikipedia , lookup
Biochemistry wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
Western blot wikipedia , lookup
Protein–protein interaction wikipedia , lookup
StructuralBioinforma/cs Lecture1:ProteinStructure FrankDiMaio([email protected]) Outline • “Homework”fornext/me – installPyMOL(seeemail) – installFoldit(hFp://fold.it/portal/), completefirst6introductorypuzzles • Today’slecture – proteinstructure,visualiza/on VisualizingproteinswithPyMol Obtaining PyMOL We will use an educational build of PyMOL that is freely available. I will email you download instructions. Very Basic Tutorial Here is a simple tutorial that will show you how to open and visualize a protein: http://www.pymolwiki.org/index.php/Practical_Pymol_for_Beginners For today’s lecture I will be using PyMOL to look at the protein ubiquitin. (pdb file: 1ubq from the Protein Data Bank - http://www.rcsb.org/pdb/ home/home.do) (pymol -> show how to move protein, explain default representation) Mo/va/on:Whydowecareabout macromolecularstructure? • Sequence-->Structure-->Func/on • Structuredeterminesfunc/on,sounderstandingstructurehelps ourunderstandingoffunc/on • Structuremoreconservedthansequence • Structureallowsiden/fica/onofmoredistantevolu/onary rela/onships • Structureisencodedinsequence • Understandingthedeterminantsofstructureallowsdesignand manipula/onofproteins Proteinbioinforma/csproblems •Proteinclassifica/on •Analysisofproteinstructuralproper/es •Structurepredic/onfromsequence •Func/onpredic/onfromsequenceandstructure •Interfaces,ac/vesites,regulatorysites,specificity •Modelingmolecularmo/ons •Predic/ngphysicalproper/es (stability,bindingaffini/es) •Designofstructureandfunc/on ProteinsarePolymersofAminoAcids Amino acids Aminoacidshave chiralcenters polypep/de Non-polarorHydrophobicAminoAcids Serine(Ser) Glycine (Gly, G) Glycine(Gly) Alanine (Ala, A) Alanine(Ala) Valine(Val) Valine (Val, V) Isoleucine(Ile) Isoleucine (Ile, I) OH Threonin Leucine(Leu) Leucine (Leu, L) CH2 H CH3 CH CH3 CH3 Phenylalanine (Phe, F) Phenylalanine(Phe) Tyrosine (Tyr, Y) Tyrosine(Tyr) CH CH3 CH2 CH2 CH CH3 CH3 CH OH CH3 CH3 Trptophan (Trp, W) Methionine (Met, M) Proline (Pro, P) Proline(Pro) Trptophan(Trp) Methionine(Met) Histidine(His) (pKa=6.0) CH2 CH2 CH2 CH2 CH2 H N CH2 S HN CH3 Serine(Ser) Threonine(Thr) Cysteine(Cys) Asparagine(Asn) Aspartic Ac (pKa=3.9) O C CH2 N C NH Backbone bonds: red Side chain bonds: black OH O Glutamine (Gln) CH2 O- N S PolarorHydrophilicAminoAcids HN NH CH3 OH Serine (Ser, S) Serine(Ser) (Ile) 3 Threonine (Thr, T) Threonine(Thr) Cysteine (Cys, C) Cysteine(Cys) Asparagine (Asn, N) Asparagine(Asn) Glutamine (Gln, Q) Glutamine (Gln) Leucine(Leu) CH2 CH2 CH CH3 CH OH OH C SH CH3 CH2 CH2 CH2 CH2 O C NH2 CH3 NH2 hionine(Met) Histidine Histidine(His) (His, H) Proline(Pro) =6.0) pKa(pK = 6.0 a H2 Glutamic Acid(Glu) Lysine(Lys) Arginine(Arg) Aspartic AsparticAcid(Asp) Acid (Asp, D) Glutamic Acid (Glu, (pK E) Lysine (Lys, K) Arginine (Arg, R) = 10.8) (pK =4.1) a a (pKa=3.9) pKa = 3.9 pKa = 4.1 pKa = 10.8 (pKa = 12.5) pKa = 12.5 CH2 H N H2 CH2 N C H3 NH O- O CH2 CH2 CH2 CH2 CH2 CH2 CH2 CH2 CH2 NH C O- ragine(Asn) O Glutamine (Gln) O NH3+ (pymol1ubq–showhowtodisplaysequence,explainatomcoloring, CH2 CH2 selectaspecificaminoacidtype) +C NH2 NH2 Waterandhydrogenbonds (Hydrogenbondsarealsocalled“H-bonds”) NOTE: H-bond distances vary. The H-bond distance shown is only the approximate average H-bond distance in liquid water. (see AAM, Table 2.3) O-O distance: 0.274 nm = 2.74 Å =1.77Å =0.965Å Lehninger 4/e Fig 2.1 Important: The O-H distance of ~2.74 Å in an H-bond is smaller than the sum of : (i) the O-H covalent bond distance of ~0.97 Å + (ii) the H vdW-radius of ~1.2 Å + (iii) the O vdW-radius of ~1.4 Å, since this sum is: 0.97 + 1.2 + 1.4 = ~ 3.6 Å. 10Å=1nm=10-9m Hydrogen bonds in general Ingeneral: AhydrogenbondcanberepresentedasD-H….A,where: D-H=weaklyacidic“donor”group,suchasO-H,N-H A=weaklybasic“acceptor”atomsuchasO,N (Lehninger4/eFig.2.4) The Building Blocks of All Proteins Gly Ala Val Ile Leu Met Phe Tyr Trp Pro Ser Cys Thr His Asp Lys Glu Asn Gln Arg A Polypeptide Chain Linking amino acids by forming peptide units. The order of the amino acids is called the “Primary Structure” of a protein GeneralFeaturesofPolypep/des Backbonehastwopolar groupsperresidue pep/debondshavedouble bondcharacterandpreferto beplanar Bondanglesandlengthsarelargelyinvariant,proteinsadoptdifferent conforma/onsbyvaryingphiandpsi (pymol->showhowtomeasuredistances,anglesandtorsions) Two main chain torsional angles per residue: phi (Φ) and psi (Ψ) Cαn+1 Pep/deunit Cαn Pep/deunit EachpepPdeunitcontains sixatoms: α C ,C,O,N,Handthenext Cα. Cαn−1 If one peptide unit is kept fixed, Φ and Ψ define the orientation of the second peptide unit. So, in a first approximation, the course of a polypeptide chain is defined by a pair of dihedral angles (Φ and Ψ) per amino acid residue. 14 The definition of phi (Φ) Cαn+1 Pep/deunit Cαn The four highlighted atoms Pep/deunit (Cn-1,Nn, Cαn, Cn) define the dihedral angle Φ Cαn−1 If one peptide unit is kept fixed, Φ and Ψ define the orientation of the second peptide unit. So, in a first approximation, the course of a polypeptide chain is defined by 15 a pair of dihedral angles (Φ and Ψ) per amino acid residue. The definition of psi (Ψ) Cαn+1 The four highlighted atoms Pep/deunit (Nn, Cαn, Cn, Nn+1) define the dihedral angle Ψ Cαn Pep/deunit Cαn−1 If one peptide unit is kept fixed, Φ and Ψ define the orientation of the second peptide unit. So, in a first approximation, the course of a polypeptide chain is defined by 16 a pair of dihedral angles (Φ and Ψ) per amino acid residue. Ramachandrandiagramalsocalledthe(Φ,Ψ) -plot Regions with allowed torsion angles (Φ, Ψ) for non-Gly and non-Pro residues βA β P α αL βP βA c = α-helix = left-handed α-helix = parallel β-strands (“ideal”) = antiparallel β-strands (“ideal”) = collagen Blue: region of (Φ, Ψ) combinations of lowest energy Green: region of (Φ, Ψ) combinations with somewhat less favorable energy Tan: (Φ, Ψ) –plot also called the “Ramachandran diagram” region of (Φ, Ψ) combinations with high energy. Duetoclashesbetweenatomsfromsubsequentresidues,notallcombinaPonsofΦandΨarepossible For18aminoacidstheso-calledRamachandranplotisasabove. ForGly,whichhasnosidechain,theRamachandranislessrestricted. 17 ForPro,whichhasaspecialsidechain,theRamachandranis(much)morerestricted. Sidechaindependenceof Ramachandranangles • Torsionpreferencesvaryfordifferentsidechains • MostlooklikealaninebecauseofclasheswithCβ Classifyingproteinstructure 1. Theirprimarystructureistheaminoacidsequenceofthe polypep/dechain. 2.Secondarystructureisthelocalspa/alarrangementofa polypep/de’sbackboneatoms.Commonsecondarystructures area-helicesandb-strands. 3.TerParystructurereferstothethree-dimensionalstructureof theen/repolypep/dechain. 4.Someproteinsarecomposedoftwoormorepolypep/de chains.Thespa/alarrangementofthesechainsisaprotein’s quaternarystructure. Higher-orderStructure (pymol->showcartoonrepresenta/on) ProteinSecondaryStructure:Thea-helix Purple: Hydrogen Bonds i Red: Oxygen i+1 Dark Blue: Nitrogen i+2 i+3 Light Blue: Hydrogen Green: Carbon i+4 A standard a-helix has hydrogen bonds between residues i and i+4. (pymolshowhydrogenbondsinhelix) Amphipathic a-Helix Yellow: hydrophobic amino acids Blue: hydrophylic amino acids Val – Lys – Glu – Leu – Leu – Asp – Lys – Val - Glu 3 4 ProteinSecondaryStructure:The b-strand Purple: Hydrogen Bonds Red: Oxygen Dark Blue: Nitrogen Light Blue: Hydrogen Green: Carbon b-sheet b-strands come together to form b-sheets (the interaction can be either parallel or anti-parallel). ParallelvsAn/parallelb-strandInterac/ons (pymolshowbetasheets) β-sheetsforma“pleatedsheet” Note: Therearealso manyMIXED ß-sheets,with somestrands paralleland othersanPparallel 7.0Å Cβofside chain Lehningerf04.074thed. VVP2/eFig6-10 InbothparallelandanP-parallelβ-sheets: ThesidechainspointalternaPnglyinoppositedirecPons 25 Parallelb-strands:whyaretheytwisted? Afullyextendedchainisflat Realbetastrandstwistandare notflat Parallelb-strands:whyaretheytwisted? • Ondiagonal:notwisttopep/debackbone phi=-psi • Abovediagonal:right-handedtwistto pep/debackbone;favoredbysidechains largerthanAla • Foragivenenergy,moreaccessiblestates forright-handtwistthanforleo-hand (pymol->showtwistinsheet) Hydrophobic / hydrophilic patterning in b-strands Thr – Leu – Asn – Ile – Lys - Phe 2 (pymol->showhydrophobicpaFerninginbetasheet) Protein Secondary Structure: Loops and Turns loop Example: an antigen binding domain of an antibody Active site residues and binding residues are often found in loops. Turns are short loops (2-4 residues), and typically have more regular structure than loops. Between secondary and tertiary structure • Supersecondary structure: arrangement of elements of same or different secondary structure into motifs; a motif is usually not stable by itself. • Domains: A domain is an independent unit, usually stable by itself; it can comprise the whole protein or a part of the protein. b-hairpin:Mostcommonformof/ghtturn type Fi+1 Yi+1 Fi+2 Yi+2 I -60 -30 -90 0 I’ 60 30 90 0 i+3 II -60 120 80 0 II’ 60 -120 -80 0 i+2 i+1 i TypeII’ b-hairpin:Mostcommonformof/ghtturn Example of a b-hairpin in bovine pancreatic trypsin inhibitor– BPTI. Example of a protein with two bhairpins: erabutoxin from whale. The helix-turn-helix motif • Thismo/fischaracteris/cofproteinsbindingtothemajorDNAgrove. • Theproteinscontainingthismo/frecognizepalindromicDNAsequences. • Thesecondhelixisresponsiblefornucleo/desequencerecogni/on. The helix-turn-helix motif βαβmo/f Why? • Shorterconnec/onsinright-handedtopology? • Accessibilitytohelixterminiforhydrogenbonding? • Trappedends? TriosePhosphateIsomerase(TIM) Adomainwhichoccursinamanyproteins. 5 Notethe“β-barrel”in thecentersurrounded by α-helices 4 3 6 2 7 8 1 Notethe8-fold repeatedβ-αmoPf The“TIMbarrel”:α/βclasstopology 36 Protein Tertiary Structure • Most proteins adopt a unique three dimensional structure that is essential to the biological role they perform. Protein structures can be divided into three groups: globular proteins, fibrous proteins, and integral membrane proteins. Examples: HIV protease (globular) Porin (membrane) Collagen (fibrous) Most globular proteins share these characteristics 1) Hydrophobics on the inside 2) Close packing 3) Most polar groups involved in a hydrogen bond Hydrophobic residues of procarboxypeptidase Most globular proteins share these characteristics 1) Hydrophobics on the inside 2) Close packing 3) Most polar groups involved in a hydrogen bond acylphosphatase (pymol) Most globular proteins share these characteristics 1) Hydrophobics on the inside 2) Close packing 3) Most polar groups involved in a hydrogen bond Hydrogen bond between a serine and a backbone carbonyl Fibrous Proteins • highly elongated molecules that generally function as structural materials • their sequences are usually highly repetitive Example: Collagen is the major stress-bearing component of bone, tendon and other connective tissues. A fiber 1 mm in diameter can support 20 lbs. 15 Å 3000 Å Collagen Sequence: G-X-Y Hyp Pro Gly Hyp Pro Gly Collagen Hyp = 4-hydroxyproline • Pro is converted to Hyp by the enzyme prolyl hydroxylase which uses vitamin C as a cofactor – hence vitamin C deficiency can lead to unstable collagen à connective tissue problems (aka scurvy) FIBROUSPROTEINS:KeraPnandcoiledcoilα-helices α-keratin is the principal protein of mammalian hair, nails, skin. VVP2/eFig6-15. VV(notVVP)f08.253rded.. 43 Membrane Proteins • ~30% of human proteins are membrane proteins • ~70% of therapeutics are directed towards membrane proteins Membrane proteins are important for: 1) ion and solute transport 2) detection of external signals, e.g. hormones 3) cell-to-cell recognition Membrane Proteins: hydrophobic residues are found on the exterior membrane blue: hydrophilic sidechains yellow: hydrophobic sidechains Hydrophobic environment water Space filling representation of porin protein Ribbon representation of porin protein (pymol->show2POR) membrane Membraneproteins areooeneitherall-αorall-β TheproteinavoidsplacingmainchainC=OandNHgroups inthehydrophobicbilayer) OmpFPorin LIPID BILAYER LIPID BILAYER Bacteriorhodopsin α-HELICEScrossingthemembrane β-BARRELcrossingthemembrane 46 Mul/-domainproteins • Manyproteinscontain‘independent’domainsconnectedbylinkers.Itis commontocombinerecogni/ondomainswithac/va/ondomains.By piecingdomainstogetherinnewwaysitispossibletocreatenewfunc/ons. Example: Src tyrosine kinase. The SH3 domain recognizes substrate and the kinase domain phosphorylates the substrate. SH3 SH2 Kinase Interesting fact: the human genome does not contain more types of protein domains than more primitive organisms, but rather just puts them together in more complicated ways. MulP-domainproteinsareverycommon The order of the symbols indicates the order of the domains Domains are compact folded “nodules” of a protein chain Livingorganismsolenstringdomainstogetherintooneproteinchainand thenmodifyeachdomainforaspecificfuncPon 48 Intrinsically Unfolded Proteins • Whatareunstructuredproteins?Proteinsorsegmentsof proteinsthatlackawell-structured3Dfold.Theyare referredtoas“na/velyunfolded”or“intrinsicallyunfolded” • Howprevalentareunstructuredproteins?Approximately 40%ofproteinshaveunstructuredregionsthatarelonger than50residues,6-17%ofproteinsinSwiss-Protare probablyfullydisordered(basedontheore/calpredic/ons). • WhatarethefuncPonsofunstructuredproteins?Thereare many(seelater). ManyIntrinsicallyUnfoldedProteinsAdopt StructureUponBindingPartnerMolecules DysonandWright(2005)NatRevMolCellBiol.6:197-208 What are some of the unique features of disordered proteins? • Extensivebindinginterfacescanbecreatedwithrela/vely smallproteins • Conforma/onalflexibilityallowsaproteinsegmenttobind itstargetaswellastoamodifyingenzyme(i.e.posttransla/onalmodifica/on). • Pliable(unstructured)proteinscaninteractwithmany differentbindingpartners.Chaperonesooencontain unstructuredregionsthatareusedtorecognizeadiverse arrayofsubstrates. ClassifyingTer/aryStructure Historically:MuchworkdonebyChothiatodeveloprules governingpackingarrangementsofsecondarystructure (likeridge-into-groovemodelforhelix-helixpacking) Modernschemesusesequencesimilarityandstructurestructurecomparisonstoorganizetheproteinuniverse andelucidatestructuralandevolu/onaryrela/onships – SCOP–structuralclassifica/onofproteins – CATH–classarchitecturetopologyhomologous superfamily – Dalidomaindic/onary CATH • acombina/onofmanualandautomated hierarchicalclassifica/on • fourmajorlevels: – Class(C)–basedonsecondary structurecontent – Architecture(A)–basedongross orienta/onofsecondarystructures – Topology(T)–basedonconnec/ons andnumbersofsecondarystructures – Homologoussuperfamily(H)–based onstructure/func/onevolu/onary commonali/es • providesusefulgeometricinforma/on(e.g. architecture) • par/alautoma/onmayresultinexamples nearfixedthresholdsbeingassigned inaccurately SCOP • • • • apurelymanualhierarchical classifica/on threemajorlevels: – Family–basedonclear evolu/onaryrela/onship (pairwiseresidueiden//es betweenproteinsare>30%) – Superfamily–basedonprobable evolu/onaryorigin(low sequenceiden/tybutcommon structure/func/onfeatures – Fold–basedonmajorstructural similarity(majorsecondary structuresinsamearrangement andtopology providesdetailedevolu/onary informa/on manualprocessinfluencesupdate frequencyandequallyexhaus/ve examina/on α-Helicalproteins • Mostlyα-helices • Parallel/an/parellelbundles • Globulararrangements βproteins • Mostlyβstrands • Manyan/parallelstrand pairsfromβhairpins • “βsandwich”–twosheets pairedwith~30°rota/on α/βproteins • Intermixedαandβsecondarystructureelements • • • • Manyβαβelements:mostlyparallelstrandpairs Openfacedsheetswithtwoedges;twoorthreelayers Barrelswithnoedgestrands Helicestendtofollowthedirec/onofstrands Otherproteinclasses • α+β:segregatedαandβsecondarystructure • small:/nyproteinsthatusedisulfidebondstoconferstability • coiledcoils:2+parallelheliceswrappedaroundoneanother Theproteinstructureuniverse X-RayCrystallography • crystallizeandimmobilizesingle, perfectprotein • bombardwithX-rays,record scaFeringdiffrac/onpaFerns • determineelectrondensitymap fromscaFeringandphasevia Fouriertransform: • useelectrondensityand biochemicalknowledgeofthe proteintorefineanddetermine amodel NMRSpectroscopy determiningconstraints usingconstraintstodetermine secondarystructure • proteininaqueoussolu/on, mo/leandtumbles/vibrates withthermalmo/on • NMRdetectschemicalshiosof atomicnucleiwithnon-zero spin,shiosduetoelectronic environmentnearby • determinedistancesbetween specificpairsofatomsbased onshios,“constraints” • useconstraintsand biochemicalknowledgeofthe proteintodeterminean ensembleofmodels Cryo-electronmicroscopy