Structure prediction analysis of huntingtin – 2016/03/07
Premise for this work:
In the absence of additional experimental data so far, I have decided to use computational tools to model
the huntingtin structure. I will compare the models generated with the domain predictions from InterPro
and my experimental domain mapping data from the limited-proteolysis/mass spectrometry experiment.
Please note: these models are computationally generated and whilst they may provide important hints
and clues they must also be interpreted with a pinch of salt.
Phyre2 analysis
1. Method of structure prediction
3D model building using advanced remote homology detection methods:
- Gather homologous sequences based on the submitted query sequence by scanning against the nr20
(no sequences with >20% mutual sequence identity) protein sequence database, and generate a
multiple sequence alignment.
- Predict secondary structure with PSIPRED and generate a hidden Markov model (HMM) from both
the alignment and the secondary structure prediction.
- Scan the query HMM against the fold library, a database of HMMs of proteins of known
(experimentally determined) structure, and construct crude backbone-only models from the top
alignment hits.
- Correct insertions and deletions in the query sequence relative to the aligned templates by loop
modeling.
- Add amino acid side chains to generate the final Phyre2 model.
2. Query submitted sequences
NB: Phyre2 is unable to process protein sequences >1200 amino acids, so the complete huntingtin
sequence cannot be submitted in a single run and was analysed in segments.
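The segmentation used in this section can be written down as a short sketch. It assumes the full-length sequence is held in a Python string; the helper name `extract_segment` and the constant `MAX_LEN` are illustrative only and are not part of Phyre2.

```python
MAX_LEN = 1200  # submission limit noted above

# Residue ranges (1-based, inclusive) as labelled in this analysis
SEGMENTS = {
    "a": (1, 90),
    "b": (1, 1200),
    "c": (91, 1290),
    "d": (1201, 2400),
    "e": (2401, 3144),
}


def segment_length(start, end):
    """Number of residues in a 1-based, inclusive range."""
    return end - start + 1


def extract_segment(sequence, start, end, max_len=MAX_LEN):
    """Cut residues start..end (1-based, inclusive) out of a sequence string."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError(f"range {start}-{end} does not fit the sequence")
    if segment_length(start, end) > max_len:
        raise ValueError(f"range {start}-{end} exceeds {max_len} residues")
    # Python slices are 0-based and end-exclusive, hence start - 1
    return sequence[start - 1:end]
```

Checking each range against the limit before submission avoids silently submitting a truncated or over-long segment.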
a) Exon1 (1-90)
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPP
PPPPPPGPAVAEEPLHRP
No reliable models were calculated as none of the alignments had confidence measurements >10%.
However, the secondary structure prediction shows an alpha-helical region across the N17 and polyQ, see
the additional PDF file (Phyre2_SS_1-90.pdf).
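To relate the predicted helix to residue numbers, it can help to locate the N17 and polyQ regions in the sequence itself. The following is a minimal sketch; `longest_run` is a hypothetical helper, and the toy sequence uses assumed tract lengths for illustration only. Running it on the submitted exon1 sequence above would give the actual positions.

```python
import re


def longest_run(sequence, residue):
    """Return (start, end), 1-based inclusive, of the longest run of one residue.

    Returns None if the residue does not occur in the sequence.
    """
    best = None
    for match in re.finditer(f"{re.escape(residue)}+", sequence):
        length = match.end() - match.start()
        if best is None or length > best[1] - best[0] + 1:
            # match.start() is 0-based; match.end() is end-exclusive
            best = (match.start() + 1, match.end())
    return best


# The first 17 residues of huntingtin (N17), followed by toy tracts
n17 = "MATLEKLMKAFESLKSF"
toy_exon1 = n17 + "Q" * 23 + "P" * 11  # tract lengths are assumptions

polyq = longest_run(toy_exon1, "Q")
polyp = longest_run(toy_exon1, "P")
```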
b) 1-1200
MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPP
PPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESD
VRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLT
RTSKRPEESVQETLAAAVPKIMASFGNFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQY
FYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVY
ELTLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSIVELIAGGGSS
CSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAASSGVSTPGSAGHDIITEQPRSQ
HTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAVPSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVT
PSDSSEIVLDGTDNQYLGLQIGQPQDEDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDK
FVLRDEATEPGDQENKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSC
VGAAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSILSRSRFHVGDW
MGTIRTLTGNTFSLADCIPLLRKTLKDESSVTCKLACTAVRNCVMSLCSSSYSELGLQLIIDVLTLRNSSYW
LVRTELLETLAEIDFRLVSFLEAKAENLHRGAHHYTGLLKLQERVLNNVVIHLLGDEDPRVRHVAAASLIRL
VPKLFYKCDQGQADPVVAVARDQSSVYLKLLMHETQPPSHFSVSTITRIYRGYNLLPSITDVTMENNLSRVI
AAVSHELITSTTRALTFGCCEALCLLSTAFPVCIWSLGWHCGVPPLSASDESRKSCTVGMATMILTLLSSAW
FPLDLSAHQDALILAGNLLAASAPKSLRSSWASEEEANPAATKQEEVWPALGDRALVPMVEQLFSHLLKVIN
ICAHVLDDVAPGPAIKAALPSLTNPPSLSPIRRKGKEKEPGEQASVPLSPKKGSEASAASRQSDTSGPVTTS
KSSSLGSFYHLPSYLKLHDVLKATHANYKVTLDLQNSTEKFGGFLRSALDVLSQILELATLQDIGK
The secondary structure prediction for this region can be seen in Phyre2_SS_1-1200.pdf. It shows
alpha-helical regions between ~90-400 and ~660-1170, in line with the InterPro predictions and the
experimentally determined putative domain boundaries.
Template with greatest sequence coverage:
Residues 131-1152 (85% coverage) are aligned with 97% confidence using the template d2bpta1 from
PDB 2BPT: structure of the NUP1P:KAP95P complex.
Fold library id: d2bpta1
Fold: alpha-alpha superhelix
Superfamily: ARM repeat
Family: Armadillo repeat
No model was generated on the basis of this structural prediction because only the top 20 predictions
were modelled and this result was ranked number 24. The template structure shows an extended armadillo
repeat structure, so I would expect the modelled structure to show a similar armadillo repeat architecture.
On the next page, an alignment of the sequences is shown with the predicted secondary structure
architecture. The query sequence has many extended insertion regions relative to the template sequence.
It is not clear from this preliminary analysis whether these insertions would also form armadillo repeats,
or whether they would be loop regions or linkers between domains of armadillo repeats.
Only a sparse array of alpha-helices is predicted between amino acids ~390-660 based on the secondary
structure prediction. Interestingly, this corresponds to the region of huntingtin sequence known to be
cleaved by endogenous proteases (sites 402 through to 586, see previous posting
http://dx.doi.org/10.5281/zenodo.46008). Perhaps this region is more open and therefore more accessible
to digestion by proteases.
Template with highest confidence:
Residues 130-936 (67% coverage) are aligned with 99% confidence using the template d1b3ua_ from
PDB 1B3U: structure of the constant regulatory domain of human PP2A, PR65ALPHA.
Fold library id: d1b3ua_
Fold: alpha-alpha superhelix
Superfamily: ARM repeat
Family: HEAT repeat
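The coverage percentages quoted for the two templates can be reproduced from the aligned ranges. The sketch below assumes that coverage means aligned residues divided by the length of the submitted query (1200 here), with inclusive residue numbering; `coverage_percent` is an illustrative name.

```python
def coverage_percent(start, end, query_length):
    """Percentage of the query covered by an aligned range (1-based, inclusive)."""
    aligned = end - start + 1
    return 100.0 * aligned / query_length


# Aligned ranges quoted in this section, against the 1-1200 query
d2bpta1 = coverage_percent(131, 1152, 1200)  # 1022 of 1200 residues
d1b3ua = coverage_percent(130, 936, 1200)    # 807 of 1200 residues
```

Under these assumptions the two values round to 85% and 67%, matching the figures reported by Phyre2.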
As this result was ranked top, a model was generated by Phyre2. The PDB model generated from the
template detailed above is shown in cartoon format. The protein model is coloured from blue through to
red from N to C terminus. The model is shown in 2 orientations, related by a 90 degree rotation in the
plane of the screen.
I am not surprised PP2A was pulled out as a model template, as it is one of the best characterized
HEAT repeat proteins. The curved architecture is reminiscent of the importin structures, which wrap
around their binding partners. Perhaps huntingtin fulfils its role as a scaffold protein in a similar
fashion.
On the next page, an alignment of the sequences is shown with the predicted secondary structure
architecture. The model shows a classic HEAT repeat protein with a curved overall architecture forming a
U-shape. In this model, 400-620 has limited predicted secondary structure, in agreement with the
previous template alignment. As such, the PDB model generated covers amino acids 130-399 and 626-936.
The missing region is cut out of the model, so the remaining regions are simply stitched together in the
middle of an alpha-helix. In reality, there is likely an extended loop region joining these two regions of
HEAT repeats, which is not represented by this model.
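A stitched join of this kind can be spotted by looking for jumps in residue numbering in the model file. The sketch below assumes the model is a standard fixed-column PDB file and that it retains the query residue numbering; both function names are illustrative.

```python
def ca_residue_numbers(pdb_lines):
    """Collect residue numbers of C-alpha atoms from PDB ATOM records."""
    numbers = []
    for line in pdb_lines:
        # Columns 13-16 hold the atom name, columns 23-26 the residue number
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            numbers.append(int(line[22:26]))
    return numbers


def numbering_gaps(residue_numbers):
    """Return (last_before_gap, first_after_gap) pairs for non-consecutive residues."""
    pairs = zip(residue_numbers, residue_numbers[1:])
    return [(a, b) for a, b in pairs if b - a > 1]
```

Usage would be along the lines of `numbering_gaps(ca_residue_numbers(open(path)))`, where `path` points to the downloaded model. A gap in numbering with no corresponding break in the backbone indicates residues that were omitted and joined directly.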
Overall coverage with high confidence templates
78 templates were aligned with greater than 90% confidence. Together these spanned amino acids 91 to
1152, and all corresponded to armadillo repeats or HEAT repeats. As such, the HEAT/armadillo repeat
domains could map to 91-399 and 626-1152. These data tally with the InterPro predictions of 81-401 and
696-1152. Both sets of predictions line up with the experimentally determined domain boundaries at
~100, 420, 627 and 1180. See the additional PDF (Phyre2_Htt_1-1200.pdf) for details of all results from
this analysis.
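The agreement between the Phyre2 and InterPro ranges can be made explicit by computing their overlap. This is a minimal sketch using the ranges quoted above; `overlap` is an illustrative helper, and the pairing of ranges is my own assumption about which predictions correspond.

```python
def overlap(a, b):
    """Return the shared (start, end) of two inclusive ranges, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


phyre2 = [(91, 399), (626, 1152)]
interpro = [(81, 401), (696, 1152)]

# Compare the first Phyre2 range with the first InterPro range, and so on
shared = [overlap(p, i) for p, i in zip(phyre2, interpro)]
```

With these inputs the first Phyre2 range lies entirely within the InterPro range, while the second pair shares 696-1152, so the main difference is where the second domain starts.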
c) 91-1290
KKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMD
SNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAV
PKIMASFGNFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLVPVED
EHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYELTLHHTQHQDHNVVTGA
LELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSIVELIAGGGSSCSPVLSRKQKGKVLLGEE
EALEDDSESRSDVSSSALTASVKDEISGELAASSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSS
ATDGDEEDILSHSSSQVSAVPSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLG
LQIGQPQDEDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQENKPC
RIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVGAAVALHPESFFSKLYK
VPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSILSRSRFHVGDWMGTIRTLTGNTFSLADCI
PLLRKTLKDESSVTCKLACTAVRNCVMSLCSSSYSELGLQLIIDVLTLRNSSYWLVRTELLETLAEIDFRLV
SFLEAKAENLHRGAHHYTGLLKLQERVLNNVVIHLLGDEDPRVRHVAAASLIRLVPKLFYKCDQGQADPVVA
VARDQSSVYLKLLMHETQPPSHFSVSTITRIYRGYNLLPSITDVTMENNLSRVIAAVSHELITSTTRALTFG
CCEALCLLSTAFPVCIWSLGWHCGVPPLSASDESRKSCTVGMATMILTLLSSAWFPLDLSAHQDALILAGNL
LAASAPKSLRSSWASEEEANPAATKQEEVWPALGDRALVPMVEQLFSHLLKVINICAHVLDDVAPGPAIKAA
LPSLTNPPSLSPIRRKGKEKEPGEQASVPLSPKKGSEASAASRQSDTSGPVTTSKSSSLGSFYHLPSYLKLH
DVLKATHANYKVTLDLQNSTEKFGGFLRSALDVLSQILELATLQDIGK
To determine how the prediction would alter in the absence of exon1, I reran Phyre2 with the huntingtin
sequence from amino acids 91-1290. The results were the same as for the analysis of amino acids 1-1200.
See the additional PDFs (Phyre2_Htt_91-1290.pdf and Phyre2_SS_91-1291.pdf) for details of all results
from this analysis.
d) 1201-2400
SPKKGSEASAASRQSDTSGPVTTSKSSSLGSFYHLPSYLKLHDVLKATHANYKVTLDLQNSTEKFGGFLRSA
LDVLSQILELATLQDIGKCVEEILGYLKSCFSREPMMATVCVQQLLKTLFGTNLASQFDGLSSNPSKSQGRA
QRLGSSSVRPGLYHYCFMAPYTHFTQALADASLRNMVQAEQENDTSGWFDVLQKVSTQLKTNLTSVTKNRAD
KNAIHNHIRLFEPLVIKALKQYTTTTCVQLQKQVLDLLAQLVQLRVNYCLLDSDQVFIGFVLKQFEYIEVGQ
FRESEAIIPNIFFFLVLLSYERYHSKQIIGIPKIIQLCDGIMASGRKAVTHAIPALQPIVHDLFVLRGTNKA
DAGKELETQKEVVVSMLLRLIQYHQVLEMFILVLQQCHKENEDKWKRLSRQIADIILPMLAKQQMHIDSHEA
LGVLNTLFEILAPSSLRPVDMLLRSMFVTPNTMASVSTVQLWISGILAILRVLISQSTEDIVLSRIQELSFS
PYLISCTVINRLRDGDSTSTLEEHSEGKQIKNLPEETFSRFLLQLVGILLEDIVTKQLKVEMSEQQHTFYCQ
ELGTLLMCLIHIFKSGMFRRITAAATRLFRSDGCGGSFYTLDSLNLRARSMITTHPALVLLWCQILLLVNHT
DYRWWAEVQQTPKRHSLSSTKLLSPQMSGEEEDSDLAAKLGMCNREIVRRGALILFCDYVCQNLHDSEHLTW
LIVNHIQDLISLSHEPPVQDFISAVHRNSAASGLFIQAIQSRCENLSTPTMLKKTLQCLEGIHLSQSGAVLT
LYVDRLLCTPFRVLARMVDILACRRVEMLLAANLQSSMAQLPMEELNRIQEYLQSSGLAQRHQRLYSLLDRF
RLSTMQDSLSPSPPVSSHPLDGDGHVSLETVSPDKDWYVHLVKSQCWTRSDSALLEGAELVNRIPAEDMNAF
MMNSEFNLSLLAPCLSLGMSEISGGQKSALFEAAREVTLARVSGTVQQLPAVHHVFQPELPAEPAAYWSKLN
DLFGDAALYQSLPTLARALAQYLVVVSKLPSHLHLPPEKEKDIVKFVVATLEALSWHLIHEQIPLSLDLQAG
LDCCCLALQLPGLWSVVSSTEFVTHACSLIYCVHFILEAVAVQPGEQLLSPERRTNTPKAISEEEEEVDPNT
QNPKYITAACEMVAEMVESLQSVLALGHKRNSGVPAFLTPLLRNIIIS
No templates were found that allowed models to be built with >55% confidence, so the generated models
are likely to be highly inaccurate. See the additional PDF (Phyre2_Htt_1201-2400.pdf) for details of all
results from this analysis. The region is nonetheless predicted to contain many elements of secondary
structure, with ~68% of the protein predicted to fold as alpha helix (see Phyre2_SS_1201-2400.pdf).
However, with no confident models built by this analysis, and no domains determined by other prediction
analyses (InterPro) or by experimental domain mapping to date, it is unclear how this region would be
folded at the tertiary structure level.
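A helix fraction such as the ~68% quoted above can be computed from a per-residue secondary structure assignment. The sketch below assumes a three-state string (H for helix, E for strand, C for coil); this string format is an assumption for illustration, since the prediction here was delivered as a PDF.

```python
from collections import Counter


def state_fractions(ss_string):
    """Fraction of residues in each of the three states H, E and C."""
    if not ss_string:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss_string)
    total = len(ss_string)
    return {state: counts[state] / total for state in "HEC"}


# Toy assignment, not real prediction output
toy = state_fractions("HHHHCCEECC")
```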
e) 2401-3144
LARLPLVNSYTRVPPLVWKLGWSPKPGGDFGTAFPEIPVEFLQEKEVFKEFIYRINTLGWTSRTQFEETWAT
LLGVLVTQPLVMEQEESPPEEDTERTQINVLAVQAITSLVLSAMTVPVAGNPAVSCLEQQPRNKPLKALDTR
FGRKLSIIRGIVEQEIQAMVSKRENIATHHLYQAWDPVPSLSPATTGALISHEKLLLQINPERELGSMSYKL
GQVSIHSVWLGNSITPLREEEWDEEEEEEADAPAPSSPPTSPVNSRKHRAGVDIHSCSQFLLELYSRWILPS
SSARRTPAILISEVVRSLLVVSDLFTERNQFELMYVTLTELRRVHPSEDEILAQYLVPATCKAAAVLGMDKA
VAEPVSRLLESTLRSSHLPSRVGALHGVLYVLECDLLDDTAKQLIPVISDYLLSNLKGIAHCVNIHSQQHVL
VMCATAFYLIENYPLDVGPEFSASIIQMCGVMLSGSEESTPSIIYHCALRGLERLLLSEQLSRLDAESLVKL
SVDRVNVHSPHRAMAALGLMLTCMYTGKEKVSPGRTSDPNPAAPDSESVIVAMERVSVLFDRIRKGFPCEAR
VVARILPQFLDDFFPPQDIMNKVIGEFLSNQQPYPQFMATVVYKVFQTLHSTGQSSMVRDWVMLSLSNFTQR
APVAMATWSLSCFFVSASTSPWVAAILPHVISRMGKLEQVDVNLFCLVATDFYRHQIEEELDRRAFQSVLEV
VAAPGSPYHRLLTCLRNVHKVTTC
1 model was built with 82.5% confidence across 2745-3091 using template c2jkrL from PDB 2JKR: AP2
CLATHRIN ADAPTOR CORE with Dileucine peptide RM(phosphoS)QIKRLLSE.
Fold library id: c2jkrL_
PDB header: endocytosis
Chain: L
PDB Molecule: ap-2 complex subunit alpha-2
PDB Title: ap2 clathrin adaptor core with dileucine peptide rm(2 phosphos)qikrllse
Below, an alignment of the sequences is shown with the predicted secondary structure architecture.
The PDB model generated from the template detailed above is shown in cartoon format. The protein model is
coloured from blue through to red from N to C terminus. The model is shown in 2 orientations, related by
a 90 degree rotation in the plane of the screen.
The model shows extended helical bundles which, whilst not explicitly classified as HEAT or armadillo
repeats, bear some structural similarity to them. This predicted domain maps almost precisely to the
armadillo repeat domain 2740-3083 determined by InterPro, which in turn tallies with the experimentally
determined fragment 2672-3130, lending strength to this prediction.
3. Overall conclusions
- The high confidence models determined by Phyre2 show the expected secondary structural features
of huntingtin: extended alpha-helical regions.
- These alpha-helical motifs are generally folded into HEAT (2 antiparallel helices joined by a short
loop to form a helical hairpin) or armadillo (3 helices: H2 and H3 packed together in an antiparallel
fashion, perpendicular to the shorter H1, with a sharp loop between H1 and H2 mediated by a conserved
glycine) repeats in the predicted models, again in line with previous predictions by InterPro.
- High confidence models (>80%) consisting of these tertiary structure features were built for regions
130-399, 626-936 and 2745-3091. The regions of huntingtin sequence covered by each of these models
correlate with domain predictions by InterPro as well as with the experimentally determined putative
domain boundaries from the limited proteolysis and mass spectrometry experiments.
- Despite fairly confident prediction of alpha-helical secondary structure motifs throughout the region
of huntingtin sequence from 1201-2400, no models were built with confidence. This is consistent with
the absence of predicted domains in the InterPro analysis and, to some degree, with the experimental
data.
4. Next steps:
- Repeat analysis with other structural prediction programmes
- Continue experimental domain determination by limited proteolysis with alternative enzymes, e.g.
chymotrypsin
- Begin domain construct design for BVES expression
Reference
Phyre2: Kelley LA et al. The Phyre2 web portal for protein modeling, prediction and analysis. Nature
Protocols 10, 845-858 (2015).