5- Look for signatures of protein-protein interactions and
use these to guide docking together of the different subunits or domains of the complex (Quaternary
structure).
Coming Soon !!! The next lecture will review step 4 and cover this as well as the actual docking (step 5).
pg 23
5- Look for signatures of protein-protein interactions and
use these to guide docking together of the different subunits or domains of the complex (Quaternary
structure). In our case we are just reconstructing a single protein, but have only found structural
templates for two separate portions of the whole. However there are other very important cases of
multi-protein complexes where separate protein molecules must be correctly assembled into a
complex in order to support function.
5a- Residues that are highly conserved yet outside the active site are strong candidates for binding
interfaces.
5b- Exposed hydrophobes; burial of these within a binding interface can provide driving force for
complex formation.
5c- Patches of conserved charge may attract and bind to conserved opposite charge on the partner
protein.
Special in our case: We know that the two structural units must be close enough in 3D space to be
connected by a stretch of 19 amino acids. The domain we have done together in class covered
residues 1-550. The other domain with good homology covers residues 569-643. The 19 residues
between the models need to be able to stretch from the C-terminus of the large domain to the
N-terminus of the small domain.
(There is a feeble structural model for residues 524-643. This could be used with extreme caution to
bridge the two decent models. We will not attempt that in this class.)
(Numbers are drawn from page 16, SwissModel results)
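The linker constraint can be turned into a quick numerical check on any docked pair of models. What follows is a minimal sketch, not part of the class workflow. It assumes that a fully extended chain spans at most about 3.8 Å per residue (the usual Cα-Cα spacing, used here as an upper bound rather than a prediction), and the function names and example coordinates are hypothetical.

```python
import math

# Assumed upper bound: consecutive C-alpha atoms are about 3.8 angstroms apart.
CA_CA_SPACING = 3.8


def max_linker_span(n_residues, spacing=CA_CA_SPACING):
    """Largest distance a linker of n_residues could bridge if fully extended.

    A linker of n residues sits between two anchor residues, so it
    contributes n + 1 consecutive CA-CA steps.
    """
    return (n_residues + 1) * spacing


def linker_can_bridge(c_term_xyz, n_term_xyz, n_residues):
    """Return (feasible, gap) for two terminus coordinates given in angstroms."""
    gap = math.dist(c_term_xyz, n_term_xyz)
    return gap <= max_linker_span(n_residues), gap


if __name__ == "__main__":
    # Hypothetical CA coordinates for the C-terminus of the large domain
    # and the N-terminus of the small domain in a trial docking.
    feasible, gap = linker_can_bridge((0.0, 0.0, 0.0), (30.0, 40.0, 0.0), 19)
    print(f"gap = {gap:.1f} A, feasible = {feasible}")
```

A docking pose that fails this test can be discarded at once. A pose that passes is only permitted by the linker, not confirmed by it.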
pg 24
5a- Residues that are highly conserved yet outside the active site are strong
candidates for binding interfaces.
We will perform a Multiple Sequence Alignment.
There is a variety of on-line tools for this. We will use the well-known ‘Clustal-Omega’:
http://www.ebi.ac.uk/Tools/msa/clustalo/
Our sequences go here.
The CRUCIAL decision is which sequences to use: We want to identify amino acids conserved
among ‘bifurcating’ hydrogenases. These may not be conserved among hydrogenases in general.
Therefore we need to select ONLY sequences of hydrogenases that bifurcate.
pg 25
5a- Residues that are highly conserved yet outside the active site are strong
candidates for binding interfaces.
Sequences for our Multiple Sequence Alignment should be those of
bifurcating hydrogenases:
Tm1426_NP_229226.1 = GI:15644177 this is our Thermotoga maritima protein, our unknown
Clo1313_1881 = YP_005688384.1 this is the homologous protein from Clostridium pasteurianum
Cthe_0342 = YP_001036773_HydA this is the homologous protein from Clostridium thermocellum
Clo1313_1791_YP_005688298_HydA this is another homologous protein from Clostridium
pasteurianum
YP_001956466 this is an NAD-dependent Fe-hydrogenase from a termite gut bacterium.
I have posted these sequences on our web site as a plain text file
HydAsequencesCulled.txt .
You would generate an equivalent file by copying the amino acid sequences in FASTA
format from each of the proteins into a text file with the heading line for each sequence
beginning with ‘>’. Do not include the symbol ‘:’ anywhere in the file.
To run Clustal, select the entire text content, copy and paste into the Clustal box.
Alternatively, upload the file (this did not work for me). This is ‘step 1’.
In ‘step 2’ choose ‘Clustal w numbers’ (the numbers are very helpful).
Choose whether or not to be notified by email (useful, because the message will include your
job number and allow you to access it later, but not forever!).
Click ‘Submit’.
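Sequences copied from an NCBI protein page arrive with position numbers and blocks of ten residues, so they need tidying before they go into the FASTA file. The sketch below shows one way to do that. The function name is hypothetical, and replacing ‘:’ with ‘_’ in the heading line is my own choice for satisfying the no-colon rule.

```python
import re


def numbered_block_to_fasta(header, numbered_lines, width=60):
    """Build one FASTA record from numbered, space-separated sequence lines.

    header         -- heading text, with or without the leading '>'
    numbered_lines -- lines such as '61 tlkpyegmkv ktntpeiyem ...'
    """
    # The heading must not contain ':' anywhere, so swap it for '_'.
    clean_header = header.lstrip(">").replace(":", "_")
    # Keep letters only: this drops the position numbers and the spaces.
    residues = "".join(re.sub(r"[^A-Za-z]", "", line) for line in numbered_lines)
    residues = residues.upper()
    body = "\n".join(residues[i:i + width] for i in range(0, len(residues), width))
    return f">{clean_header}\n{body}\n"


if __name__ == "__main__":
    record = numbered_block_to_fasta(
        ">Tm1426_NP_229226.1_GI:15644177_Hyd-alpha",
        ["601 megkievlgt fcvencgasp nvivddkiig gatfekvlee lskng"],
    )
    print(record, end="")
```

Appending the records for all five proteins to one text file gives a file equivalent to HydAsequencesCulled.txt.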
pg 26
Aside: accumulating a txt file containing protein amino acid sequences.
Use the accession name of the protein (here NP_229226.1) in the /protein/ database of NCBI to
get the amino acid sequence directly. It is way down at the bottom of the page.
>Tm1426_NP_229226.1_GI:15644177_Hyd-alpha
1 mkiyvdgrev iindnernll ealknvgiei pnlcylseas iygacrmclv eingqittsc
61 tlkpyegmkv ktntpeiyem rrnilelila thnrdcttcd rngscklqky aedfgirkir
121 fealkkehvr desapvvrdt skcilcgdcv rvceeiqgvg viefakrgfe svvttafdtp
181 lietecvlcg qcvaycptga lsirndidkl iealesdkiv igmiapavra aiqeefgide
241 dvamaeklvs flktigfdkv fdvsfgadlv ayeeahefye rlkkgerlpq ftsccpawvk
301 haehtypqyl qnlssvkspq qalgtvikki yarklgvpee kiflvsfmpc takkfeaere
361 ehegivdivl ttrelaqlik msridinrve pqpfdrpygv ssqaglgfgk aggvfscvls
421 vlneeigiek vdvkspedgi rvaevtlkdg tsfkgaviyg lgkvkkflee rkdveiievm
481 acnygcvggg gqpypndsri rehrakvlrd tmgikslltp venlflmkly eedlkdehtr
541 heilhttyrp rrrypekdve ilpvpngekr tvkvclgtsc ytkgsyeilk klvdyvkend
601 megkievlgt fcvencgasp nvivddkiig gatfekvlee lskng
>Clo1313_1881_YP_005688384.1_NC_017304.1_HydA
1 mqmvnvtidn ckiqvpanyt vleaakqani diptlcflkd inevgacrmc vvevkgarsl
61 qaacvypvse glevytqtpa vrearkvtle lilsnhekkc ltcvrsence lqrlakdlnv
121 kdirfegems nlpiddlsps vvrdpnkcvl crrcvsmckn vqtvgaidvt ergfrttvst
181 afnkplsevp cvncgqcinv cpvgalrekd didkvweala npelhvvvqt apavrvalge
241 efgmpigsrv tgkmvaalsr lgfkkvfdtd taadltimee gtelinrikn ggklplitsc
301 spgwikfceh nypefldnls scksphemfg avlksyyaqk ngidpskvfv vsimpctakk
361 feaqrpelss tgypdvdvvl ttrelarmik eagidfnslp dkqfddpmge asgagvifga
421 tggvmeaair tvgellsgkp adkieytevr gldgikeasi eldgftlkaa vahglgnark
481 lldkikagea dyhfieimac pggcingggq piqpssvrnw kdirceraka iyeedeslpi
541 rkshenpkik mlyeeffgep gshkahellh thyekrenyp vk
>Cthe_0342_YP_001036773_HydA
1 mqmvnvtidn ckiqvpanyt vleaakqani diptlcflkd inevgacrmc vvevkgarsl
61 qaacvypvse glevytqtpa vrearkvtle lilsnhekkc ltcvrsence lqrlakdlnv
121 kdirfegems nlpiddlsps vvrdpnkcvl crrcvsmckn vqtvgaidvt ergfrttvst
181 afnkplsevp cvncgqcinv cpvgalrekd didkvweala npelhvvvqt apavrvalge
241 efgmpigsrv tgkmvaalsr lgfkkvfdtd taadltimee gtelinrikn ggklplitsc
301 spgwikfceh nypefldnls scksphemfg avlksyyaqk ngidpskvfv vsimpctakk
361 feaqrpelss tgypdvdvvl ttrelarmik etgidfnslp dkqfddpmge asgagvifga
421 tggvmeaair tvgellsgkp adkieytevr gldgikeasi eldgftlkaa vahglgnark
481 lldkikagea dyhfieimac pggcingggq piqpssvrnw kdirceraka iyeedeslpi
541 rkshenpkik mlyeeffgep gshkahellh thyekrenyp vk
>Clo1313_1791_YP_005688298_HydA
Begin a txt file to house all the aa sequences. (No need for step 1.)
Then use BLAST to find homologues. Do this directly from the Entrez/protein page.
>Cthe_0430_YP_001036861.1_HydA-Identical_to_Clo1313-1791
1 mdnreymlid gipveingek nllelirkag iklptfcyhs elsvygacrm cmvenewggl
61 daacstppra gmsiktnter lqkyrkmile lllanhcrdc ttcnnngkck lqdlamryni
121 shirfpntas npdvddsslc itrdrskcil cgdcvrvcne vqnvgaidfa yrgskmtist
181 vfdkpifesn cvgcgqcala cptgaivvkd dtqkvwkeiy dkntrvsvqi apavrvalgk
241 elglndgena igkivaalrr mgfddifdts tgadltvlee saellrrire gkndmplfts
301 ccpawvnyce kfypellphv stcrspmqmf asiikeeyst sskrlvhvav mpctakkfea
361 arkefkvngv pnvdyvlttq elvrmikesg ivfselepea idmpfgtytg agvifgvsgg
421 vteavlrrvv sdksptsfrs laytgvrgmn gvkeasvmyg drklkvavvs glknagdlie
481 rikagehydl vevmacpggc ingggqpfvq seerekrgkg lysadklcni ksseenplmm
541 tlykgilkgr vhellhvdya skkeak
pg 27
RESULTS: MSA of 5 bifurcating HydAs
The Cthe_0430 sequence turned out to be identical to another I already had (Clo1313_1791).
Therefore it was dropped.
You need a minimum of 5 sequences for later steps. Therefore I did a BLAST search and
identified the termite gut bacterial sequence as a bifurcating hydrogenase.
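Identical sequences are easy to miss by eye, so the check can be automated before the file goes to Clustal. This is a minimal sketch with a hypothetical function name. It treats two entries as identical when their residues match after whitespace and letter case are ignored.

```python
def group_identical(records):
    """Return lists of names whose sequences are identical.

    records -- dict mapping sequence name to amino acid sequence;
               whitespace and letter case are ignored in the comparison.
    """
    groups = {}
    for name, seq in records.items():
        key = "".join(seq.split()).upper()
        groups.setdefault(key, []).append(name)
    # Only report sequences that occur under more than one name.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Toy records (first ten residues only) to show the call.
    toy = {
        "Clo1313_1791": "mdnreymlid",
        "Cthe_0430": "MDNREYMLID",
        "Tm1426": "mkiyvdgrev",
    }
    print(group_identical(toy))
```

Keep one name from each reported group and delete the rest, then count again to confirm that at least 5 distinct sequences remain.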
pg 28
MSA of 5 bifurcating HydAs
The hydrogenase from a bacterium inhabiting termite gut is described as
being NAD-dependent, so I think it is a bifurcating hydrogenase.
This sequence has a C-terminal NuoE-like domain, similar to our target
Tm1426, so it could be a good model.
4 Cys ligands of a FeS cluster
The alignments are fun to look at, but we need to see the degree of conservation mapped onto our model
structure to know which conserved amino acids are on the surface and not near active sites. To do this
we will use the Consurf tool. However, we will want to use the MSA that Clustal made for us.
Download your MSA from the Clustal page that presents your results. Choose an informative title but
retain the .clustal_num file type.
pg 29
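Before going to Consurf, the downloaded alignment can be scanned for fully conserved columns with a few lines of code. The sketch below uses a small hand-written parser for the ‘Clustal w numbers’ text layout instead of a library reader. The function names are hypothetical, and the all-or-nothing conservation test is a deliberate simplification of the graded scores that Consurf reports.

```python
def parse_clustal(text):
    """Read Clustal-format text into a dict of name -> aligned sequence."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL title line, and the consensus
        # lines (which start with whitespace).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]  # a trailing residue count is ignored
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def conserved_columns(seqs):
    """Return 1-based alignment columns where every sequence has the same residue."""
    columns = zip(*seqs.values())
    return [
        i
        for i, col in enumerate(columns, start=1)
        if len(set(col)) == 1 and col[0] != "-"
    ]


if __name__ == "__main__":
    # Tiny made-up alignment, only to show the calls.
    toy = (
        "CLUSTAL O(1.2.0) multiple sequence alignment\n\n"
        "seqA      MKC-A\t5\n"
        "seqB      MRCGA\t5\n"
        "          * * *\n"
    )
    print(conserved_columns(parse_clustal(toy)))
```

The column numbers refer to the alignment, not to any one protein. To locate a column in Tm1426, count the non-gap characters in its row up to that column.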
5a- Residues that are highly conserved yet outside the active site are strong
candidates for binding interfaces.
Displaying the conservation patterns of the MSA as colour codes on
the structural model. Go to
http://consurf.tau.ac.il/
Choose Amino acids
‘Yes’ there is a structure ( = our model, saved from SwissModel, page 17)
Upload the pdb file you saved from SwissModel (page 17).
modelhydrogenase_QMEANlocal_in_Bfactor_cofactors.pdb
Click ‘Next’
The Chain identifier can be selected among options that Consurf has already found within
your pdb file. In our case there is only one option: ‘A’.
Do we have an MSA? YES (that is what we used the Clustal-Omega for.)
Upload the MSA you just saved.
modelhydrogenase_QMEANlocal_in_Bfactor_cofactors.clustal_num
I called my analysis “HydA-LargeDomain”
NO we do not have a phylogenetic tree (5 sequences is nowhere near a sufficient number
to do that.)
I tend to give the job a title “HydA-LargeDomain” and give my email address because this
will provide a record and access to the analysis, for a week.
Click Submit.
pg 30
5a- Residues that are highly conserved yet outside the active site are strong
candidates for binding interfaces.
Displaying the conservation patterns of the MSA as colour codes on
the structural model. Go to
http://consurf.tau.ac.il/
Choose Amino acids
‘Yes’ there is a structure ( = our model, saved from SwissModel, page 17)
Upload the pdb file you saved from SwissModel (page 17).
modelhydrogenase_QMEANlocal_in_Bfactor_cofactors.pdb
Click ‘Next’
The Chain identifier can be selected among options that Consurf has already found within
your pdb file. In our case there is only one option: ‘A’.
Do we have an MSA? YES (that is what we used the Clustal-Omega for.)
Upload the MSA you just saved.
clustalo-HydAwTermite_21Jan2014-oy.clustal_num.clustal_num
Provide the query sequence name. This has to exactly match a sequence in the MSA and
have the same amino acid sequence as in the pdb. I used “Tm1426-NP_229226GI15644177_Hyd-alpha”
NO we do not have a phylogenetic tree (5 sequences is nowhere near a sufficient number
to do that.)
I tend to give the job a title “HydA-LargeDomain” and give my email address because this
will provide a record and access to the analysis, for a week.
Click Submit.
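A mismatch in the query name or sequence is an easy way to lose a submission, so it is worth checking both before uploading. This is a minimal sketch under stated assumptions: the function name is hypothetical, the MSA is held as a dict of name to aligned sequence, and the model sequence is accepted if it occurs inside the ungapped MSA entry (our model covers only part of the protein). The exact rule Consurf applies may differ.

```python
def check_query(msa, query_name, pdb_sequence):
    """Check that the query name is in the MSA and that the sequences agree.

    msa          -- dict mapping sequence name to aligned sequence ('-' for gaps)
    query_name   -- the name to be typed into the query box
    pdb_sequence -- one-letter sequence of the chain in the model pdb file
    """
    if query_name not in msa:
        return "query name not found in MSA"
    ungapped = msa[query_name].replace("-", "").upper()
    # Assumption: the model may cover only part of the full-length protein.
    if pdb_sequence.upper() not in ungapped:
        return "PDB sequence does not match the MSA entry"
    return "ok"


if __name__ == "__main__":
    # Toy alignment rows, only to show the call.
    toy_msa = {"Tm1426": "MK-IYVDG", "Other": "MQMVNVTI"}
    print(check_query(toy_msa, "Tm1426", "MKIYV"))
```

The name comparison is exact, including case and punctuation, which mirrors the requirement that the query name match the MSA entry exactly.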
pg 31
5a- Displaying the conservation patterns of the MSA as colour codes on the
structural model.
Another visual representation of the MSA in 1D. Also useful. I download this file and retain it.
Click on this link to gain access to a pdb file in which each residue is accompanied by a
colour code proportional to the conservation score.
pg 32
Download these! And this too. Give each an informative name.
pg 33
5a- Displaying the conservation patterns of the MSA as colour codes on the structural model.
Open the desired† pdb file in Chimera, then open the colouring script “chimera_consurf.cmd”.
There is a patch of exposed residues.
[Figure: model surface in Consurf colours, from highly conserved to very variable, shown in two views (the second flipped over the horizontal axis). Labelled residues: Phe121, Ile119.]
Open your saved pdb file (the one with the consurf
† Warning: because we used only 5 sequences, Consurf flags every residue as evaluated on
insufficient data. Open the pdb file that does not show the insufficient-data marking.
pg 34
5b- Exposed hydrophobes; burial of these within a binding interface can provide driving force for
complex formation.
5c- Patches of conserved charge may attract and bind to conserved opposite charge on the partner
protein.
To see which conserved residues are hydrophobic, we can leave Consurf’s colours on the surface and
colour atoms according to hydrophobicity. I made Ile, Leu, Val, Phe and Trp green. Alternatively we can
make Asp and Glu red and Arg and Lys blue.
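The colour scheme just described is a simple lookup by residue type, and writing it down helps when annotating a list of surface residues by hand. The sketch below only restates that scheme. The function name is hypothetical, and it does not issue any Chimera commands.

```python
# Residue classes and colours as used on the slide.
HYDROPHOBIC = {"ILE", "LEU", "VAL", "PHE", "TRP"}
ACIDIC = {"ASP", "GLU"}
BASIC = {"ARG", "LYS"}


def colour_for(resname):
    """Return the slide's colour for a three-letter residue name, or None."""
    name = resname.upper()
    if name in HYDROPHOBIC:
        return "green"
    if name in ACIDIC:
        return "red"
    if name in BASIC:
        return "blue"
    return None  # residue types that the scheme leaves uncoloured


if __name__ == "__main__":
    for label in ["Phe121", "Ile119", "Arg120", "Arg46", "Pro550"]:
        # The first three characters of each label are the residue type.
        print(label, colour_for(label[:3]))
```

Residues that come back as None, such as Pro550, keep whatever Consurf colour they already have.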
[Figure labels: Phe121, Ile119, Arg120, Arg46, Pro550 (C-term), Met1 and Lys2 (N-term)]
pg 35
Why did you mark the locations of the termini of the protein?
Because in this case they can provide a reality check on the quality of our docking model. Guess why.
[Figure labels: Ile119, Arg120, Arg46, Pro550, Met1]
pg 36
part b: the C-terminal domain
0- Get the genes.
1- Get the amino acid sequence of the protein. (Primary structure)
2- Search for other proteins with high homology and known structure.
3- Model the amino acid sequence of the query protein onto the fold of the homologous template
protein, use simulated molecular dynamics to allow the new amino acid side chains to adjust to their
folded environment and to allow loops and secondary structures to adjust to their new lengths.
(Secondary and tertiary structures)
4- Test to see if function is likely to be supported by the model structure, on the basis of fit with
cofactors. (Tertiary structure and function)
5- Look for signatures of protein-protein interactions and use these to guide docking together of the
different subunits or domains of the complex (Quaternary structure).
Now, do it all again for the other domain of Tm1426: the small C-terminal domain, which has
homology to the NuoE ferredoxin.
Multiple sequence alignment of C-term of HydA (Tm1426) with other NuoE-like domains
that might also bind to HydA:
Tm1426 = NP_229226, also Tm1424 = NP_229224
YP_001956466 (from a termite-gut bacterium’s HydA)
Clo1313-1885 = YP_005688388 and Cthe_0338 = YP_001036769 (identical, keep just Clo1313-1885)
Clo1313_1793 = YP_005688300 and Cthe_0428 = YP_001036859 (identical, keep Clo1313_1793)
Component of a hydrogenase: 2AUV (page 14)
Also: complex I of Thermus thermophilus
pg 37
3b
C-terminal domain of the thioredoxin-like 2Fe2S ferredoxin HndA, from the NADP-reducing
hydrogenase of Desulfovibrio fructosovorans.
[Figure: model with N and C termini labelled; red = poor reliability]
pg 38
3b
NuoE: thioredoxin-like 2Fe2S ferredoxin of Complex I from T. thermophilus
pg 39
3b
NuoE of 3iam-B
2auv-A (only includes C-terminal
domain)
pg 40
4b
Active site integrity: does the Fe2S2 cluster fit?
N- and C- termini of the
models are marked with
blue and red arrows
respectively (guess why
this could be useful).
modeled on 2auv
modeled on 3iam
pg 41
5b
Multiple sequence alignment of C-term of HydA (Tm1426) with other NuoE-like domains
that might also bind to HydA. Tm1426=NP_229226
Tm1424=NP_229224
YP_001956466 (from the termite-gut bacterium HydA)
Clo1313-1885= YP_005688388
Clo1313_1793=YP_005688300
start of 2AUV model
clustalo-I20140123-023154-0071-61010473-oy
pg 42
Examples of models coloured to show conserved residues, hydrophobes, negative charges and
positive charges.
N-term
Flip over horiz axis
I have to admit: none of this is obvious or compelling. Your
job is to identify something intelligent to go by and just try.
pg 43
5b manual docking exploiting your
chemical intelligence.
pg 44
Meanwhile: let's see if a computer can do any better. (Not part of the assignment, but an
interesting external control.)
pg 45
Review of goals and steps:
Develop a model for the structure of an enzyme complex when you know the
nucleotide sequence of the gene.
Goals
1- obtain the amino acid sequence of the protein.
(Primary structure)
2- Search for other proteins with high homology
and known structure.
3- Model the amino acid sequence of the query
protein onto the fold of the homologous template
protein (simulated molecular dynamics to adjust the
sequence of one protein to the structure of the other).
(Secondary and tertiary structures)
4- Test to see if function is likely to be supported
by the model structure, on the basis of fit with
cofactors. (Tertiary structure and function)
5- Look for signatures of protein-protein
interactions and use these to guide docking
together of the different subunits or domains of
the complex (Quaternary structure).
5a: exposed conserved residues, 5b exposed
hydrophobes, 5c exposed complementary charges.
Tools (just a small sampling of the many)
1- http://web.expasy.org/translate/
(the parent site provides access to many more tools:
http://www.expasy.org/)
2- BLAST of various sorts:
http://blast.ncbi.nlm.nih.gov/Blast.cgi
(parent site provides access to many more tools:
http://www.ncbi.nlm.nih.gov/ )
3- SwissModel
http://swissmodel.expasy.org/?pid=smd03
(again, this is just one of a whole family of tools.)
4- Download the template protein from the PDB and
transfer the cofactor coordinates into your model.
http://www.rcsb.org/pdb/home/home.do
5a- Homology must be obtained within a curated set
of proteins that retain the same activity as the
model, in order to identify features of interest and
interaction patterns, since modules are reused in
many different contexts. I have done this for you.
pg 46