Structural Bioinformatics
Lecture 1: Protein Structure
Frank DiMaio ([email protected])
Outline
•  “Homework” for next time
–  install PyMOL (see email)
–  install Foldit (http://fold.it/portal/),
complete first 6 introductory puzzles
•  Today’s lecture
–  protein structure, visualization
Visualizing proteins with PyMOL
Obtaining PyMOL
We will use an educational build of PyMOL that is freely available. I will
email you download instructions.
Very Basic Tutorial
Here is a simple tutorial that will show you how to open and visualize a
protein:
http://www.pymolwiki.org/index.php/Practical_Pymol_for_Beginners
For today’s lecture I will be using PyMOL to look at the protein ubiquitin.
(pdb file: 1ubq from the Protein Data Bank - http://www.rcsb.org/pdb/home/home.do)
(pymol -> show how to move protein, explain default representation)
Motivation: Why do we care about
macromolecular structure?
•  Sequence --> Structure --> Function
•  Structure determines function, so understanding structure helps
our understanding of function
•  Structure is more conserved than sequence
•  Structure allows identification of more distant evolutionary
relationships
•  Structure is encoded in sequence
•  Understanding the determinants of structure allows design and
manipulation of proteins
Protein bioinformatics problems
•  Protein classification
•  Analysis of protein structural properties
•  Structure prediction from sequence
•  Function prediction from sequence and structure
•  Interfaces, active sites, regulatory sites, specificity
•  Modeling molecular motions
•  Predicting physical properties
(stability, binding affinities)
•  Design of structure and function
Proteins are Polymers of Amino Acids
[Figure: amino acids joined into a polypeptide]
Amino acids have chiral centers
Non-polar or Hydrophobic Amino Acids
Glycine (Gly, G)
Alanine (Ala, A)
Valine (Val, V)
Isoleucine (Ile, I)
Leucine (Leu, L)
Phenylalanine (Phe, F)
Tyrosine (Tyr, Y)
Tryptophan (Trp, W)
Methionine (Met, M)
Proline (Pro, P)
[Figure: side-chain structures. Backbone bonds: red; side chain bonds: black]
Polar or Hydrophilic Amino Acids
Serine (Ser, S)
Threonine (Thr, T)
Cysteine (Cys, C)
Asparagine (Asn, N)
Glutamine (Gln, Q)
Histidine (His, H), pKa = 6.0
Aspartic Acid (Asp, D), pKa = 3.9
Glutamic Acid (Glu, E), pKa = 4.1
Lysine (Lys, K), pKa = 10.8
Arginine (Arg, R), pKa = 12.5
(pymol 1ubq – show how to display sequence, explain atom coloring,
select a specific amino acid type)
Water and hydrogen bonds
(Hydrogen bonds are also called “H-bonds”)
NOTE:
H-bond distances vary.
The H-bond distance shown is only
the approximate average H-bond
distance in liquid water.
(see AAM, Table 2.3)
[Figure: hydrogen bond between two water molecules. O-O distance: 0.274 nm = 2.74 Å; H···O hydrogen bond: 1.77 Å; O-H covalent bond: 0.965 Å. Lehninger 4/e Fig 2.1]
Important:
The O-O distance of ~2.74 Å in an H-bond is smaller than the sum of:
(i) the O-H covalent bond distance of ~0.97 Å
+ (ii) the H vdW-radius of ~1.2 Å
+ (iii) the O vdW-radius of ~1.4 Å,
since this sum is: 0.97 + 1.2 + 1.4 = ~3.6 Å.
10 Å = 1 nm = 10^-9 m
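The comparison above is simple arithmetic, and a few lines of Python make it easy to redo with other radii. This is only a sketch: the constant names are made up for illustration, and the numbers are the approximate values quoted on the slide, with 2.74 Å taken as the donor-to-acceptor (O to O) distance.

```python
# Distances in angstroms, as quoted on the slide (approximate values).
O_H_COVALENT = 0.97   # O-H covalent bond length
H_VDW_RADIUS = 1.2    # van der Waals radius of hydrogen
O_VDW_RADIUS = 1.4    # van der Waals radius of oxygen
O_O_OBSERVED = 2.74   # observed donor-to-acceptor distance in liquid water


def angstrom_to_nm(value):
    """Convert angstroms to nanometers (10 angstroms = 1 nm)."""
    return value / 10.0


# Distance expected if the H and the acceptor O merely touched.
no_hbond_sum = O_H_COVALENT + H_VDW_RADIUS + O_VDW_RADIUS
# How much closer the atoms actually sit because of the H-bond.
shortening = no_hbond_sum - O_O_OBSERVED

print(f"sum without an H-bond: {no_hbond_sum:.2f} A")
print(f"observed distance:     {O_O_OBSERVED:.2f} A "
      f"({angstrom_to_nm(O_O_OBSERVED):.3f} nm)")
print(f"shortening:            {shortening:.2f} A")
```

The sum comes to 3.57 Å, which the slide rounds to ~3.6 Å.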
Hydrogen bonds in general
In general:
A hydrogen bond can be represented as D-H….A, where:
D-H = weakly acidic “donor” group, such as O-H, N-H
A = weakly basic “acceptor” atom such as O, N
(Lehninger 4/e Fig. 2.4)
The Building Blocks of All Proteins
Gly, Ala, Val, Ile, Leu, Met, Phe, Tyr, Trp, Pro,
Ser, Cys, Thr, His, Asp, Lys, Glu, Asn, Gln, Arg
A Polypeptide Chain
Linking amino acids by forming peptide units.
The order of the amino acids is called the “Primary Structure” of a protein
General Features of Polypeptides
Backbone has two polar groups per residue
Peptide bonds have double-bond character and prefer to be planar
Bond angles and lengths are largely invariant; proteins adopt different
conformations by varying phi and psi
(pymol -> show how to measure distances, angles and torsions)
Two main chain torsional angles per residue:
phi (Φ) and psi (Ψ)
[Figure: two consecutive peptide units joining Cα(n−1), Cα(n) and Cα(n+1)]
Each peptide unit contains six atoms: Cα, C, O, N, H and the next Cα.
If one peptide unit is kept fixed,
Φ and Ψ define the orientation of the second peptide unit.
So, in a first approximation, the course of a polypeptide chain is defined by
a pair of dihedral angles (Φ and Ψ) per amino acid residue.
The definition of phi (Φ)
The four highlighted atoms (C(n−1), N(n), Cα(n), C(n))
define the dihedral angle Φ.
The definition of psi (Ψ)
The four highlighted atoms (N(n), Cα(n), C(n), N(n+1))
define the dihedral angle Ψ.
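Both Φ and Ψ are ordinary dihedral angles over four atoms, so one function covers both. The sketch below uses the common atan2 formulation in plain Python. The helper names `phi` and `psi` are illustrative, and the coordinates in the usage lines are toy points chosen to give round numbers, not real protein atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)          # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi of residue n: atoms C(n-1), N(n), CA(n), C(n)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi of residue n: atoms N(n), CA(n), C(n), N(n+1)."""
    return dihedral(n, ca, c, n_next)


# Toy geometry: a cis arrangement, then the last point rotated by a quarter turn.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, -1)))
```

With real coordinates, the four points would come from the backbone atom records of two neighboring residues in a PDB file such as 1ubq.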
Ramachandran diagram, also called the (Φ, Ψ)-plot
Regions with allowed torsion angles (Φ, Ψ) for
non-Gly and non-Pro residues
α = α-helix
αL = left-handed α-helix
βP = parallel β-strands (“ideal”)
βA = antiparallel β-strands (“ideal”)
c = collagen
Blue: region of (Φ, Ψ) combinations of lowest energy
Green: region of (Φ, Ψ) combinations with somewhat less favorable energy
Tan: region of (Φ, Ψ) combinations with high energy.
Due to clashes between atoms from subsequent residues, not all combinations of Φ and Ψ are possible.
For 18 amino acids the so-called Ramachandran plot is as above.
For Gly, which has no side chain, the Ramachandran plot is less restricted.
For Pro, which has a special side chain, the Ramachandran plot is (much) more restricted.
Side chain dependence of
Ramachandran angles
•  Torsion preferences vary for different side chains
•  Most look like alanine because of clashes with Cβ
Classifying protein structure
1. A protein’s primary structure is the amino acid sequence of the
polypeptide chain.
2. Secondary structure is the local spatial arrangement of a
polypeptide’s backbone atoms. Common secondary structures
are α-helices and β-strands.
3. Tertiary structure refers to the three-dimensional structure of
the entire polypeptide chain.
4. Some proteins are composed of two or more polypeptide
chains. The spatial arrangement of these chains is a protein’s
quaternary structure.
Higher-order Structure
(pymol -> show cartoon representation)
Protein Secondary Structure: The α-helix
Purple: Hydrogen bonds
Red: Oxygen
Dark Blue: Nitrogen
Light Blue: Hydrogen
Green: Carbon
[Figure: helix residues labeled i, i+1, i+2, i+3, i+4]
A standard α-helix has hydrogen bonds between residues i and i+4
(backbone C=O of residue i to backbone N-H of residue i+4).
(pymol -> show hydrogen bonds in helix)
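The i to i+4 rule can be turned into a list of expected hydrogen-bonded residue pairs for any helical segment. This is a minimal sketch; the function name and the residue numbers in the usage line are illustrative only.

```python
def helix_hbond_pairs(first, last, offset=4):
    """Return the (i, i + offset) residue pairs that fit inside [first, last].

    With the default offset of 4 this is the standard alpha-helix pattern.
    """
    return [(i, i + offset) for i in range(first, last - offset + 1)]


# A hypothetical helix spanning residues 1 to 9 has five i -> i+4 pairs.
for donor_side, acceptor_side in helix_hbond_pairs(1, 9):
    print(f"residue {donor_side} <-> residue {acceptor_side}")
```

Note that the first four N-H groups and the last four C=O groups of a helix have no partner within the helix under this rule, which is why the count is the segment length minus four.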
Amphipathic α-Helix
Yellow: hydrophobic amino acids
Blue: hydrophilic amino acids
Val – Lys – Glu – Leu – Leu – Asp – Lys – Val – Glu
Hydrophobic residues recur every 3 to 4 positions.
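One way to see why this sequence is amphipathic is a helical-wheel projection: each residue is rotated about 100° from the previous one (360° divided by the 3.6 residues per turn of an ideal α-helix), and the hydrophobic residues end up on one face. The sketch below assumes that ideal 100° step and uses the hydrophobic set from the earlier slide in this lecture; all names are illustrative.

```python
import math

# Non-polar set as listed on the amino-acid slide of this lecture.
HYDROPHOBIC = {"Gly", "Ala", "Val", "Ile", "Leu",
               "Phe", "Tyr", "Trp", "Met", "Pro"}
DEGREES_PER_RESIDUE = 100.0  # assumption: ideal helix, 3.6 residues per turn


def wheel_angles(residues):
    """Angular position (degrees) of each residue around the helix axis."""
    return [(name, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, name in enumerate(residues)]


def circular_mean(angles_deg):
    """Mean direction of a set of angles, in degrees within [0, 360)."""
    x = sum(math.cos(math.radians(a)) for a in angles_deg)
    y = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_distance(a, b):
    """Smallest separation between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


helix = "Val Lys Glu Leu Leu Asp Lys Val Glu".split()
angles = wheel_angles(helix)
hydrophobic = [a for name, a in angles if name in HYDROPHOBIC]
polar = [a for name, a in angles if name not in HYDROPHOBIC]
center = circular_mean(hydrophobic)

for name, angle in angles:
    face = "hydrophobic face" if name in HYDROPHOBIC else "polar face"
    print(f"{name}: {angle:5.1f} deg, "
          f"{angular_distance(angle, center):5.1f} deg from center ({face})")
```

For this sequence the four hydrophobic residues fall within 50° of their common direction, while every polar residue is at least 90° away.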
Protein Secondary Structure: The β-strand
Purple: Hydrogen bonds
Red: Oxygen
Dark Blue: Nitrogen
Light Blue: Hydrogen
Green: Carbon
β-sheet
β-strands come together to form β-sheets (the interaction can be either
parallel or anti-parallel).
Parallel vs Antiparallel β-strand Interactions
(pymol -> show beta sheets)
β-sheets form a “pleated sheet”
Note: There are also many MIXED β-sheets, with some strands
parallel and others antiparallel.
[Figure labels: 7.0 Å; Cβ of side chain. Lehninger f04.07, 4th ed.; VVP 2/e Fig 6-10]
In both parallel and anti-parallel β-sheets:
The side chains point alternatingly in opposite directions
Parallel β-strands: why are they twisted?
A fully extended chain is flat
Real beta strands twist and are
not flat
•  On diagonal: no twist to peptide backbone
phi = -psi
•  Above diagonal: right-handed twist to
peptide backbone; favored by side chains
larger than Ala
•  For a given energy, more accessible states
for right-hand twist than for left-hand
(pymol -> show twist in sheet)
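The diagonal rule above can be written as a small test on phi + psi, which is zero exactly on the phi = -psi diagonal and positive above it. This is a sketch of the slide's rule only: labeling points below the diagonal as left-handed is the implied counterpart, the tolerance is an arbitrary choice, and the angle pairs in the usage lines are illustrative.

```python
def backbone_twist(phi, psi, tolerance=1.0):
    """Classify backbone twist relative to the phi = -psi diagonal (degrees).

    tolerance is an assumed band around the diagonal that counts as flat.
    """
    offset = phi + psi  # zero on the diagonal, positive above it
    if abs(offset) <= tolerance:
        return "no twist"
    return "right-handed" if offset > 0 else "left-handed"


print(backbone_twist(-120.0, 120.0))  # on the diagonal
print(backbone_twist(-120.0, 140.0))  # above the diagonal
print(backbone_twist(-120.0, 100.0))  # below the diagonal
```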
Hydrophobic / hydrophilic patterning in β-strands
Thr – Leu – Asn – Ile – Lys – Phe
Hydrophobic residues recur every 2 positions.
(pymol -> show hydrophobic patterning in beta sheet)
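Because side chains along a β-strand point alternately to opposite sides of the sheet, splitting the sequence into even and odd positions separates the two faces. The sketch below applies that to the example strand; the function names are illustrative and the hydrophobic set is the one listed on the amino-acid slide of this lecture.

```python
# Non-polar set as listed on the amino-acid slide of this lecture.
HYDROPHOBIC = {"Gly", "Ala", "Val", "Ile", "Leu",
               "Phe", "Tyr", "Trp", "Met", "Pro"}


def strand_faces(residues):
    """Split a strand into its two faces (alternating positions)."""
    return residues[0::2], residues[1::2]


def hydrophobic_fraction(residues):
    """Fraction of residues in the list that are in the hydrophobic set."""
    return sum(r in HYDROPHOBIC for r in residues) / len(residues)


strand = "Thr Leu Asn Ile Lys Phe".split()
face_a, face_b = strand_faces(strand)
print("face A:", face_a, hydrophobic_fraction(face_a))
print("face B:", face_b, hydrophobic_fraction(face_b))
```

One face comes out entirely polar and the other entirely hydrophobic, which is the pattern expected for a strand with one side facing solvent and the other facing the protein core.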
Protein Secondary Structure: Loops and Turns
loop
Example: an antigen binding
domain of an antibody
Active site residues and
binding residues are often
found in loops.
Turns are short loops (2-4 residues), and typically have more regular
structure than loops.
Between secondary and
tertiary structure
•  Supersecondary structure: arrangement of
elements of same or different secondary
structure into motifs; a motif is usually not stable
by itself.
•  Domains: A domain is an independent unit,
usually stable by itself; it can comprise the whole
protein or a part of the protein.
β-hairpin: Most common form of tight turn
type    Φ(i+1)   Ψ(i+1)   Φ(i+2)   Ψ(i+2)
I        -60      -30      -90        0
I’        60       30       90        0
II       -60      120       80        0
II’       60     -120      -80        0
[Figure: Type II’ turn with residues i, i+1, i+2 and i+3 labeled]
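The table of ideal turn angles lends itself to a nearest-match lookup: measure Φ and Ψ of residues i+1 and i+2 and find the turn type whose ideal values are closest. The sketch below does that with the table values; the 30° tolerance is an assumption made for illustration, and the measured angles in the usage lines are made up.

```python
# Ideal (phi(i+1), psi(i+1), phi(i+2), psi(i+2)) in degrees, from the table.
TURN_TYPES = {
    "I": (-60, -30, -90, 0),
    "I'": (60, 30, 90, 0),
    "II": (-60, 120, 80, 0),
    "II'": (60, -120, -80, 0),
}


def angle_diff(a, b):
    """Smallest difference between two angles in degrees (handles wraparound)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def classify_turn(phi1, psi1, phi2, psi2, tolerance=30.0):
    """Return the closest turn type, or None if no type is within tolerance.

    A type matches only if all four angles are within `tolerance` degrees
    of the ideal values (the tolerance is an assumed cutoff).
    """
    observed = (phi1, psi1, phi2, psi2)
    best_type, best_dev = None, None
    for name, ideal in TURN_TYPES.items():
        worst = max(angle_diff(o, i) for o, i in zip(observed, ideal))
        if best_dev is None or worst < best_dev:
            best_type, best_dev = name, worst
    return best_type if best_dev <= tolerance else None


print(classify_turn(-65, -25, -85, 5))
print(classify_turn(55, -125, -90, 10))
print(classify_turn(180, 180, 180, 180))
```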
Example of a β-hairpin in bovine
pancreatic trypsin inhibitor (BPTI).
Example of a protein with two β-hairpins: erabutoxin from sea snake.
The helix-turn-helix motif
•  This motif is characteristic of proteins binding to the major groove of DNA.
•  The proteins containing this motif recognize palindromic DNA sequences.
•  The second helix is responsible for nucleotide sequence recognition.
βαβ motif
Why?
•  Shorter connections in right-handed topology?
•  Accessibility to helix termini for hydrogen bonding?
•  Trapped ends?
Triose Phosphate Isomerase (TIM)
A domain which occurs in many proteins.
Note the “β-barrel” in the center, surrounded by α-helices.
Note the 8-fold repeated β-α motif (strands numbered 1-8 in the figure).
The “TIM barrel”: α/β class topology
Protein Tertiary Structure
•  Most proteins adopt a unique three dimensional structure that is
essential to the biological role they perform. Protein structures can be
divided into three groups: globular proteins, fibrous proteins, and
integral membrane proteins.
Examples:
HIV protease
(globular)
Porin
(membrane)
Collagen
(fibrous)
Most globular proteins share these characteristics
1)  Hydrophobics on the inside
2)  Close packing
3)  Most polar groups involved in
a hydrogen bond
Hydrophobic residues of
procarboxypeptidase
[Figure: acylphosphatase]
(pymol)
[Figure: hydrogen bond between a serine
and a backbone carbonyl]
Fibrous Proteins
•  highly elongated molecules that generally function as structural materials
•  their sequences are usually highly repetitive
Example: Collagen is the major stress-bearing component of bone,
tendon and other connective tissues. A fiber 1 mm in diameter can
support 20 lbs.
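To put the 20 lbs figure in more familiar engineering units, it can be converted to a tensile stress. This is a rough sketch under stated assumptions: the load is treated as a static weight under standard gravity, the fiber as a solid cylinder, and the function name is made up for illustration.

```python
import math

LB_TO_KG = 0.45359237       # exact definition of the avoirdupois pound
STANDARD_GRAVITY = 9.80665  # m/s^2


def tensile_stress_mpa(load_lb, diameter_mm):
    """Stress in MPa for a load (pounds) hanging from a cylindrical fiber."""
    force_n = load_lb * LB_TO_KG * STANDARD_GRAVITY
    area_mm2 = math.pi * (diameter_mm / 2.0) ** 2
    return force_n / area_mm2  # N/mm^2 is the same unit as MPa


print(f"{tensile_stress_mpa(20.0, 1.0):.1f} MPa")
```

Under these assumptions the slide's figure corresponds to a stress on the order of 100 MPa.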
[Figure: collagen, about 15 Å wide and 3000 Å long]
Sequence: G-X-Y
[Figure labels: Gly, Pro, Hyp, repeated]
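The G-X-Y pattern means glycine sits at every third position. A quick way to check a stretch of sequence for that pattern, without knowing where the repeat starts, is to try all three reading frames. The sketch below does this; the function name is illustrative and the sequences are toy examples rather than real collagen.

```python
def gly_frame(residues):
    """Return the frame (0, 1 or 2) in which every third residue is Gly.

    Returns None if no frame has Gly at all of its positions.
    """
    for frame in range(3):
        positions = range(frame, len(residues), 3)
        if len(positions) > 0 and all(residues[i] == "Gly" for i in positions):
            return frame
    return None


print(gly_frame("Gly Pro Hyp Gly Pro Hyp".split()))
print(gly_frame("Pro Hyp Gly Pro Hyp Gly".split()))
print(gly_frame("Gly Pro Hyp Ala Pro Hyp".split()))
```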
Collagen
Hyp = 4-hydroxyproline
•  Pro is converted to Hyp by the enzyme prolyl hydroxylase which
uses vitamin C as a cofactor – hence vitamin C deficiency can
lead to unstable collagen --> connective tissue problems (aka
scurvy)
FIBROUS PROTEINS: Keratin and coiled coil α-helices
α-keratin is the principal protein of mammalian hair, nails, skin.
VVP 2/e Fig 6-15.
VV (not VVP) f08.25, 3rd ed.
Membrane Proteins
•  ~30% of human proteins are
membrane proteins
•  ~70% of therapeutics are
directed towards membrane
proteins
Membrane proteins are important for:
1) ion and solute transport
2) detection of external signals, e.g. hormones
3) cell-to-cell recognition
Membrane Proteins: hydrophobic residues are
found on the exterior
blue: hydrophilic sidechains
yellow: hydrophobic sidechains
[Figure: space filling and ribbon representations of porin protein; the membrane is the hydrophobic environment, with water on either side]
(pymol -> show 2POR)
Membrane proteins
are often either all-α or all-β
(The protein avoids placing main chain C=O and NH groups
in the hydrophobic bilayer)
OmpF Porin: β-BARREL crossing the membrane
Bacteriorhodopsin: α-HELICES crossing the membrane
Multi-domain proteins
•  Many proteins contain ‘independent’ domains connected by linkers. It is
common to combine recognition domains with activation domains. By
piecing domains together in new ways it is possible to create new functions.
Example: Src tyrosine kinase. The SH3
domain recognizes substrate and the
kinase domain phosphorylates the
substrate.
SH3
SH2
Kinase
Interesting fact: the human genome
does not contain more types of protein
domains than more primitive
organisms, but rather just puts them
together in more complicated ways.
Multi-domain proteins are very common
The order of the
symbols indicates
the order of the
domains
Domains are compact
folded “nodules” of a
protein chain
Living organisms often string domains together into one protein chain and
then modify each domain for a specific function
Intrinsically Unfolded Proteins
•  What are unstructured proteins? Proteins or segments of
proteins that lack a well-structured 3D fold. They are
referred to as “natively unfolded” or “intrinsically unfolded”.
•  How prevalent are unstructured proteins? Approximately
40% of proteins have unstructured regions that are longer
than 50 residues; 6-17% of proteins in Swiss-Prot are
probably fully disordered (based on theoretical predictions).
•  What are the functions of unstructured proteins? There are
many (see later).
Many Intrinsically Unfolded Proteins Adopt
Structure Upon Binding Partner Molecules
Dyson and Wright (2005) Nat Rev Mol Cell Biol. 6:197-208
What are some of the unique features of
disordered proteins?
•  Extensive binding interfaces can be created with relatively
small proteins
•  Conformational flexibility allows a protein segment to bind
its target as well as a modifying enzyme (i.e. post-translational modification).
•  Pliable (unstructured) proteins can interact with many
different binding partners. Chaperones often contain
unstructured regions that are used to recognize a diverse
array of substrates.
Classifying Tertiary Structure
Historically: Much work done by Chothia to develop rules
governing packing arrangements of secondary structure
(like ridge-into-groove model for helix-helix packing)
Modern schemes use sequence similarity and structure-structure comparisons to organize the protein universe
and elucidate structural and evolutionary relationships
–  SCOP – structural classification of proteins
–  CATH – class architecture topology homologous
superfamily
–  Dali domain dictionary
CATH
•  a combination of manual and automated
hierarchical classification
•  four major levels:
–  Class (C) – based on secondary
structure content
–  Architecture (A) – based on gross
orientation of secondary structures
–  Topology (T) – based on connections
and numbers of secondary structures
–  Homologous superfamily (H) – based
on structure/function evolutionary
commonalities
•  provides useful geometric information (e.g.
architecture)
•  partial automation may result in examples
near fixed thresholds being assigned
inaccurately
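The four CATH levels nest, so a classification can be stored as four numbers and compared level by level. The sketch below assumes the dot-separated numeric form in which CATH identifiers are conventionally written (C.A.T.H); the helper names are made up, and the specific codes in the usage lines are only illustrative inputs.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    klass: int          # Class (C)
    architecture: int   # Architecture (A)
    topology: int       # Topology (T)
    superfamily: int    # Homologous superfamily (H)


LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath(code):
    """Parse a dot-separated 'C.A.T.H' string into a CathCode."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def deepest_shared_level(code_a, code_b):
    """Name of the deepest level two domains share, or None if Class differs."""
    shared = None
    for level, a, b in zip(LEVELS, parse_cath(code_a), parse_cath(code_b)):
        if a != b:
            break
        shared = level
    return shared


print(parse_cath("3.40.50.720"))
print(deepest_shared_level("3.40.50.720", "3.40.50.300"))
print(deepest_shared_level("3.40.50.720", "1.10.10.10"))
```

Two domains that share only the first three numbers have the same topology but are placed in different homologous superfamilies.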
SCOP
•  a purely manual hierarchical
classification
•  three major levels:
–  Family – based on clear
evolutionary relationship
(pairwise residue identities
between proteins are >30%)
–  Superfamily – based on probable
evolutionary origin (low
sequence identity but common
structure/function features)
–  Fold – based on major structural
similarity (major secondary
structures in same arrangement
and topology)
•  provides detailed evolutionary
information
•  manual process influences update
frequency and equally exhaustive
examination
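The >30% identity guideline for the Family level can be checked directly on a pairwise alignment. The sketch below counts identical residues over aligned, non-gap columns; that choice of denominator is an assumption (other conventions exist), the function names are illustrative, and the sequences are toy strings rather than real proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_family_threshold(aligned_a, aligned_b, threshold=30.0):
    """True if pairwise identity is above the threshold quoted on the slide."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(percent_identity("ACDEFGHIKL", "ACDAAAAAAA"))
print(passes_family_threshold("ACDEFGHIKL", "ACDEFAAAAA"))
```

Identity alone is only the first test here: SCOP is described above as a manual classification, so a number like this would inform a curator's decision rather than replace it.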
α-Helical proteins
•  Mostly α-helices
•  Parallel/antiparallel bundles
•  Globular arrangements
β proteins
•  Mostly β strands
•  Many antiparallel strand
pairs from β hairpins
•  “β sandwich” – two sheets
paired with ~30° rotation
α/β proteins
•  Intermixed α and β secondary structure elements
•  Many βαβ elements: mostly parallel strand pairs
•  Open-faced sheets with two edges; two or three layers
•  Barrels with no edge strands
•  Helices tend to follow the direction of strands
Other protein classes
•  α+β: segregated α and β secondary structure
•  small: tiny proteins that use disulfide bonds to confer stability
•  coiled coils: 2+ parallel helices wrapped around one another
The protein structure universe
X-Ray Crystallography
•  crystallize and immobilize single,
perfect protein
•  bombard with X-rays, record
scattering diffraction patterns
•  determine electron density map
from scattering and phase via
Fourier transform
•  use electron density and
biochemical knowledge of the
protein to refine and determine
a model
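The Fourier transform step is usually written as the electron density equation below. The equation shown on the original slide is not preserved in this transcript, so this is the standard textbook form rather than a reproduction of it. Here V is the unit-cell volume, |F_hkl| is the structure-factor amplitude obtained from the measured intensity of reflection (h, k, l), α_hkl is its phase, and (x, y, z) are fractional coordinates in the unit cell.

```latex
\rho(x, y, z) = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l}
    \lvert F_{hkl} \rvert \, e^{i \alpha_{hkl}} \, e^{-2 \pi i (h x + k y + l z)}
```

The amplitudes come from the diffraction pattern, but the phases are not measured directly, which is why the slide lists "scattering and phase" as separate inputs.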
NMR Spectroscopy
[Figure panels: determining constraints; using constraints to determine
secondary structure]
•  protein in aqueous solution,
motile and tumbles/vibrates
with thermal motion
•  NMR detects chemical shifts of
atomic nuclei with non-zero
spin, shifts due to electronic
environment nearby
•  determine distances between
specific pairs of atoms based
on shifts, “constraints”
•  use constraints and
biochemical knowledge of the
protein to determine an
ensemble of models
Cryo-electron microscopy