Structure prediction analysis of huntingtin – 2016/03/07

Premise for this work: In the absence of additional experimental data so far, I have decided to use computational tools to model the huntingtin structure. I will compare the models generated with the domain predictions from InterPro and my experimental domain mapping data from the limited-proteolysis/mass spectrometry experiment. Please note: these models are computationally generated and, whilst they may provide important hints and clues, they must be interpreted with a pinch of salt.

Phyre2 analysis

1. Method of structure prediction

3D model building using advanced remote homology detection methods:
- Gather homologous sequences based on the submitted query sequence by scanning against the nr20 protein sequence database (no sequences with >20% mutual sequence identity) to generate a multiple sequence alignment.
- Predict secondary structure with PSIPRED and generate a hidden Markov model (HMM) from both the alignment and the secondary structure prediction.
- Scan this HMM against the fold library, a database of HMMs of proteins of known (experimentally determined) structure, and construct crude backbone-only models from the top alignment hits.
- Correct insertions and deletions in the query sequence relative to the aligned templates by loop modelling.
- Add amino acid side chains to generate the final Phyre2 model.

2. Query submitted sequences

NB: Phyre2 is unable to process protein sequences >1200 amino acids, so the complete huntingtin sequence cannot be submitted in a single run.

a) Exon1 (1-90)

MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPP PPPPPPGPAVAEEPLHRP

No reliable models were calculated as none of the alignments had confidence measurements >10%. However, the secondary structure prediction shows an alpha-helical region across the N17 and polyQ; see the additional PDF file (Phyre2_SS_1-90.pdf).
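The 1200-residue limit is why the full-length sequence has to be submitted in pieces. As a minimal sketch (this is not part of Phyre2; the function names `split_into_windows` and `extract` are illustrative), non-overlapping windows in 1-based residue numbering can be generated as follows, assuming a full-length protein of 3144 residues as implied by the last segment analysed here (2401-3144):

```python
def split_into_windows(length, max_len=1200):
    """Return 1-based inclusive (start, end) windows covering residues 1..length."""
    windows = []
    start = 1
    while start <= length:
        end = min(start + max_len - 1, length)
        windows.append((start, end))
        start = end + 1
    return windows


def extract(sequence, start, end):
    """Slice a sequence using 1-based inclusive residue numbers."""
    return sequence[start - 1:end]


HTT_LENGTH = 3144  # assumption: taken from the final segment, 2401-3144
print(split_into_windows(HTT_LENGTH))
# [(1, 1200), (1201, 2400), (2401, 3144)]
```

An overlapping window, such as a rerun that starts after exon1, can be cut with `extract` directly by passing the desired start and end residues.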
b) 1-1200

MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQAQPLLPQPQPPPP PPPPPPGPAVAEEPLHRPKKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESD VRMVADECLNKVIKALMDSNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLT RTSKRPEESVQETLAAAVPKIMASFGNFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQY FYSWLLNVLLGLLVPVEDEHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVY ELTLHHTQHQDHNVVTGALELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSIVELIAGGGSS CSPVLSRKQKGKVLLGEEEALEDDSESRSDVSSSALTASVKDEISGELAASSGVSTPGSAGHDIITEQPRSQ HTLQADSVDLASCDLTSSATDGDEEDILSHSSSQVSAVPSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVT PSDSSEIVLDGTDNQYLGLQIGQPQDEDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDK FVLRDEATEPGDQENKPCRIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSC VGAAVALHPESFFSKLYKVPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSILSRSRFHVGDW MGTIRTLTGNTFSLADCIPLLRKTLKDESSVTCKLACTAVRNCVMSLCSSSYSELGLQLIIDVLTLRNSSYW LVRTELLETLAEIDFRLVSFLEAKAENLHRGAHHYTGLLKLQERVLNNVVIHLLGDEDPRVRHVAAASLIRL VPKLFYKCDQGQADPVVAVARDQSSVYLKLLMHETQPPSHFSVSTITRIYRGYNLLPSITDVTMENNLSRVI AAVSHELITSTTRALTFGCCEALCLLSTAFPVCIWSLGWHCGVPPLSASDESRKSCTVGMATMILTLLSSAW FPLDLSAHQDALILAGNLLAASAPKSLRSSWASEEEANPAATKQEEVWPALGDRALVPMVEQLFSHLLKVIN ICAHVLDDVAPGPAIKAALPSLTNPPSLSPIRRKGKEKEPGEQASVPLSPKKGSEASAASRQSDTSGPVTTS KSSSLGSFYHLPSYLKLHDVLKATHANYKVTLDLQNSTEKFGGFLRSALDVLSQILELATLQDIGK

The secondary structure prediction for this region can be seen in Phyre2_SS_1-1200.pdf. It shows alpha-helical regions between ~90-400 and ~660-1170, in line with the InterPro predictions and the experimentally determined putative domain boundaries.

Template with greatest sequence coverage: residues 131-1152 (85% coverage) are aligned with 97% confidence using the template d2bpta1 from PDB 2BPT: structure of the NUP1P:KAP95P complex.

Fold library id: d2bpta1
Fold: alpha-alpha superhelix
Superfamily: ARM repeat
Family: Armadillo repeat

No model was generated on the basis of this structural prediction because only the top 20 predictions were modelled and this result was ranked number 24.
The template structure shows an extended armadillo repeat structure, so I would expect the modelled structure to show a similar armadillo repeat architecture. On the next page, an alignment of the sequences is shown with the predicted secondary structure architecture. The query sequence has many extended insertions relative to the template sequence. Whether these would also form armadillo repeats, or would instead be loop regions or linkers between domains of armadillo repeats, is not clear from this preliminary analysis.

Only a sparse array of alpha-helices is predicted between amino acids ~390-660 based on the secondary structure prediction. Interestingly, this corresponds to the region of the huntingtin sequence known to be cleaved by endogenous proteases (sites 402 through to 586, see previous posting http://dx.doi.org/10.5281/zenodo.46008). Perhaps this region is more open and therefore accessible to digestion by proteases.

Template with highest confidence: residues 130-936 (67% coverage) are aligned with 99% confidence using the template d1b3ua_ from PDB 1B3U: structure of the constant regulatory domain of human PP2A, PR65alpha.

Fold library id: d1b3ua_
Fold: alpha-alpha superhelix
Superfamily: ARM repeat
Family: HEAT repeat

As this result was ranked top, a model was generated by Phyre2. The PDB model generated from the template detailed above is shown in cartoon format. The protein model is coloured from blue through to red from N to C terminus. The model is shown in 2 orientations, related by a 90 degree rotation in the plane of the screen.

I am not surprised PP2A was pulled out as a model template as this is one of the most well characterized HEAT repeat proteins. The curved architecture is reminiscent of the importin structures which wrap around their binding partners. Perhaps huntingtin fulfils its role as a scaffold protein in a similar fashion.
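The coverage percentages quoted for the two templates can be reproduced with simple arithmetic. The sketch below assumes coverage means the aligned span divided by the length of the submitted query (1200 residues for this run); Phyre2 may compute it slightly differently, for example by excluding gapped positions, and the function name `coverage_percent` is illustrative.

```python
def coverage_percent(start, end, query_length):
    """Percentage of the query spanned by an alignment (1-based, inclusive)."""
    aligned = end - start + 1
    return 100.0 * aligned / query_length


QUERY_LENGTH = 1200  # length of the 1-1200 submission

# Alignment ranges as reported in the text for each template
alignments = {"d2bpta1": (131, 1152), "d1b3ua_": (130, 936)}

for template, (start, end) in alignments.items():
    print(template, round(coverage_percent(start, end, QUERY_LENGTH)))
# d2bpta1 85
# d1b3ua_ 67
```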
On the next page, an alignment of the sequences is shown with the predicted secondary structure architecture. The model shows a classic HEAT repeat protein with a curved overall architecture forming a U-shape. In this model, 400-620 has limited predicted secondary structure, in agreement with the previous analysis. As such, the PDB model generated covers amino acids 130-399 and 626-936. The missing region is cut out of the model, meaning that the remaining regions are simply stitched together in the middle of an alpha-helix. In reality, there is likely an extended loop region joining these two regions of HEAT repeats which is not represented by this model.

Overall coverage with high confidence templates

78 templates were aligned with greater than 90% confidence. These spanned from amino acid 91 to 1152, all corresponding to armadillo repeats or HEAT repeats. As such, the HEAT/armadillo repeat domains could map to 91-399 and 626-1152. These data tally with the InterPro predictions, 81-401 and 696-1152. Both of these prediction analyses line up with the experimentally determined domain boundaries at ~100, 420, 627 and 1180. See additional PDF (Phyre2_Htt_1-1200.pdf) with details of all results from this analysis.
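To put numbers on how well these boundaries agree, the residue ranges quoted above can be compared directly. This is only an illustrative sketch: the helper names are made up, and pairing each Phyre2 boundary with the experimental boundary in the same list position is an assumption about which boundaries correspond.

```python
def overlap_len(a, b):
    """Number of residues shared by two 1-based inclusive ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def boundary_offsets(predicted, experimental):
    """Absolute distance between boundaries paired by position."""
    return [abs(p - e) for p, e in zip(predicted, experimental)]


phyre2 = [(91, 399), (626, 1152)]    # possible HEAT/armadillo domains
interpro = [(81, 401), (696, 1152)]  # InterPro predictions

for p, i in zip(phyre2, interpro):
    print(p, i, "shared residues:", overlap_len(p, i))

# Experimental boundaries are approximate (~100, 420, 627, 1180)
print(boundary_offsets([91, 399, 626, 1152], [100, 420, 627, 1180]))
# [9, 21, 1, 28]
```

On these numbers every Phyre2 boundary lies within about 30 residues of an experimental one, which is the sense in which the analyses "line up".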
c) 91-1290

KKELSATKKDRVNHCLTICENIVAQSVRNSPEFQKLLGIAMELFLLCSDDAESDVRMVADECLNKVIKALMD SNLPRLQLELYKEIKKNGAPRSLRAALWRFAELAHLVRPQKCRPYLVNLLPCLTRTSKRPEESVQETLAAAV PKIMASFGNFANDNEIKVLLKAFIANLKSSSPTIRRTAAGSAVSICQHSRRTQYFYSWLLNVLLGLLVPVED EHSTLLILGVLLTLRYLVPLLQQQVKDTSLKGSFGVTRKEMEVSPSAEQLVQVYELTLHHTQHQDHNVVTGA LELLQQLFRTPPPELLQTLTAVGGIGQLTAAKEESGGRSRSGSIVELIAGGGSSCSPVLSRKQKGKVLLGEE EALEDDSESRSDVSSSALTASVKDEISGELAASSGVSTPGSAGHDIITEQPRSQHTLQADSVDLASCDLTSS ATDGDEEDILSHSSSQVSAVPSDPAMDLNDGTQASSPISDSSQTTTEGPDSAVTPSDSSEIVLDGTDNQYLG LQIGQPQDEDEEATGILPDEASEAFRNSSMALQQAHLLKNMSHCRQPSDSSVDKFVLRDEATEPGDQENKPC RIKGDIGQSTDDDSAPLVHCVRLLSASFLLTGGKNVLVPDRDVRVSVKALALSCVGAAVALHPESFFSKLYK VPLDTTEYPEEQYVSDILNYIDHGDPQVRGATAILCGTLICSILSRSRFHVGDWMGTIRTLTGNTFSLADCI PLLRKTLKDESSVTCKLACTAVRNCVMSLCSSSYSELGLQLIIDVLTLRNSSYWLVRTELLETLAEIDFRLV SFLEAKAENLHRGAHHYTGLLKLQERVLNNVVIHLLGDEDPRVRHVAAASLIRLVPKLFYKCDQGQADPVVA VARDQSSVYLKLLMHETQPPSHFSVSTITRIYRGYNLLPSITDVTMENNLSRVIAAVSHELITSTTRALTFG CCEALCLLSTAFPVCIWSLGWHCGVPPLSASDESRKSCTVGMATMILTLLSSAWFPLDLSAHQDALILAGNL LAASAPKSLRSSWASEEEANPAATKQEEVWPALGDRALVPMVEQLFSHLLKVINICAHVLDDVAPGPAIKAA LPSLTNPPSLSPIRRKGKEKEPGEQASVPLSPKKGSEASAASRQSDTSGPVTTSKSSSLGSFYHLPSYLKLH DVLKATHANYKVTLDLQNSTEKFGGFLRSALDVLSQILELATLQDIGK

To determine how the prediction would alter in the absence of exon1, I reran Phyre2 with the huntingtin sequence from amino acids 91-1290. The results were the same as for the analysis of amino acids 1-1200. See additional PDFs (Phyre2_Htt_91-1290.pdf and Phyre2_SS_91-1291.pdf) with details of all results from this analysis.
d) 1201-2400

SPKKGSEASAASRQSDTSGPVTTSKSSSLGSFYHLPSYLKLHDVLKATHANYKVTLDLQNSTEKFGGFLRSA LDVLSQILELATLQDIGKCVEEILGYLKSCFSREPMMATVCVQQLLKTLFGTNLASQFDGLSSNPSKSQGRA QRLGSSSVRPGLYHYCFMAPYTHFTQALADASLRNMVQAEQENDTSGWFDVLQKVSTQLKTNLTSVTKNRAD KNAIHNHIRLFEPLVIKALKQYTTTTCVQLQKQVLDLLAQLVQLRVNYCLLDSDQVFIGFVLKQFEYIEVGQ FRESEAIIPNIFFFLVLLSYERYHSKQIIGIPKIIQLCDGIMASGRKAVTHAIPALQPIVHDLFVLRGTNKA DAGKELETQKEVVVSMLLRLIQYHQVLEMFILVLQQCHKENEDKWKRLSRQIADIILPMLAKQQMHIDSHEA LGVLNTLFEILAPSSLRPVDMLLRSMFVTPNTMASVSTVQLWISGILAILRVLISQSTEDIVLSRIQELSFS PYLISCTVINRLRDGDSTSTLEEHSEGKQIKNLPEETFSRFLLQLVGILLEDIVTKQLKVEMSEQQHTFYCQ ELGTLLMCLIHIFKSGMFRRITAAATRLFRSDGCGGSFYTLDSLNLRARSMITTHPALVLLWCQILLLVNHT DYRWWAEVQQTPKRHSLSSTKLLSPQMSGEEEDSDLAAKLGMCNREIVRRGALILFCDYVCQNLHDSEHLTW LIVNHIQDLISLSHEPPVQDFISAVHRNSAASGLFIQAIQSRCENLSTPTMLKKTLQCLEGIHLSQSGAVLT LYVDRLLCTPFRVLARMVDILACRRVEMLLAANLQSSMAQLPMEELNRIQEYLQSSGLAQRHQRLYSLLDRF RLSTMQDSLSPSPPVSSHPLDGDGHVSLETVSPDKDWYVHLVKSQCWTRSDSALLEGAELVNRIPAEDMNAF MMNSEFNLSLLAPCLSLGMSEISGGQKSALFEAAREVTLARVSGTVQQLPAVHHVFQPELPAEPAAYWSKLN DLFGDAALYQSLPTLARALAQYLVVVSKLPSHLHLPPEKEKDIVKFVVATLEALSWHLIHEQIPLSLDLQAG LDCCCLALQLPGLWSVVSSTEFVTHACSLIYCVHFILEAVAVQPGEQLLSPERRTNTPKAISEEEEEVDPNT QNPKYITAACEMVAEMVESLQSVLALGHKRNSGVPAFLTPLLRNIIIS

No templates were found to build models with >55% confidence, rendering the generated models highly inaccurate. See additional PDF (Phyre2_Htt_1201-2400.pdf) with details of all results from this analysis.

However, the region is predicted to have many elements of secondary structure, with ~68% of the protein predicted to fold as alpha helix. See additional PDF (Phyre2_SS_1201-2400.pdf). With no confident models built by this analysis, and no domains determined by other prediction analyses (InterPro) or by experimental domain mapping to date, it is unclear how this region would be folded at the tertiary structure level.
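The ~68% helix figure is read from the Phyre2 secondary structure report. As a rough illustration of how such a fraction can be derived, the sketch below assumes a per-residue prediction string in the three-state H/E/C alphabet (helix, strand, coil), as produced by predictors such as PSIPRED. The function names and the toy string are invented for illustration and are not huntingtin data.

```python
from itertools import groupby


def helix_fraction(ss):
    """Fraction of residues predicted as helix ('H') in a three-state string."""
    ss = ss.upper()
    if not ss:
        raise ValueError("empty secondary structure string")
    return ss.count("H") / len(ss)


def helical_segments(ss, min_len=4):
    """1-based inclusive ranges of consecutive 'H' runs of at least min_len."""
    segments = []
    pos = 1
    for state, run in groupby(ss.upper()):
        n = len(list(run))
        if state == "H" and n >= min_len:
            segments.append((pos, pos + n - 1))
        pos += n
    return segments


toy = "CCHHHHHHCCHHC"  # made-up example, not real prediction output
print(helix_fraction(toy))
print(helical_segments(toy))
```

A high helix fraction only describes secondary structure; it says nothing about how the helices pack, which is why no tertiary conclusion is drawn for this region.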
e) 2401-3144

LARLPLVNSYTRVPPLVWKLGWSPKPGGDFGTAFPEIPVEFLQEKEVFKEFIYRINTLGWTSRTQFEETWAT LLGVLVTQPLVMEQEESPPEEDTERTQINVLAVQAITSLVLSAMTVPVAGNPAVSCLEQQPRNKPLKALDTR FGRKLSIIRGIVEQEIQAMVSKRENIATHHLYQAWDPVPSLSPATTGALISHEKLLLQINPERELGSMSYKL GQVSIHSVWLGNSITPLREEEWDEEEEEEADAPAPSSPPTSPVNSRKHRAGVDIHSCSQFLLELYSRWILPS SSARRTPAILISEVVRSLLVVSDLFTERNQFELMYVTLTELRRVHPSEDEILAQYLVPATCKAAAVLGMDKA VAEPVSRLLESTLRSSHLPSRVGALHGVLYVLECDLLDDTAKQLIPVISDYLLSNLKGIAHCVNIHSQQHVL VMCATAFYLIENYPLDVGPEFSASIIQMCGVMLSGSEESTPSIIYHCALRGLERLLLSEQLSRLDAESLVKL SVDRVNVHSPHRAMAALGLMLTCMYTGKEKVSPGRTSDPNPAAPDSESVIVAMERVSVLFDRIRKGFPCEAR VVARILPQFLDDFFPPQDIMNKVIGEFLSNQQPYPQFMATVVYKVFQTLHSTGQSSMVRDWVMLSLSNFTQR APVAMATWSLSCFFVSASTSPWVAAILPHVISRMGKLEQVDVNLFCLVATDFYRHQIEEELDRRAFQSVLEV VAAPGSPYHRLLTCLRNVHKVTTC

1 model was built with 82.5% confidence across 2745-3091 using template c2jkrL_ from PDB 2JKR: AP2 clathrin adaptor core with dileucine peptide RM(phosphoS)QIKRLLSE.

Fold library id: c2jkrL_
PDB header: endocytosis
Chain: L
PDB Molecule: ap-2 complex subunit alpha-2
PDBTitle: ap2 clathrin adaptor core with dileucine peptide rm(phosphos)qikrllse

Below, an alignment of the sequences is shown with the predicted secondary structure architecture. The PDB model generated from the template detailed above is shown in cartoon format. The protein model is coloured from blue through to red from N to C terminus. The model is shown in 2 orientations, related by a 90 degree rotation in the plane of the screen.

The model shows extended helical bundles which, whilst not explicitly determined to be HEAT or armadillo repeats, bear some structural similarity to them. This predicted domain maps almost precisely to the armadillo repeat domain 2740-3083 determined by InterPro, which in turn tallies with the experimentally determined fragment 2672-3130, lending strength to this prediction.

3.
Overall conclusions

- The high confidence models determined by Phyre2 show the expected secondary structural features of huntingtin: extended alpha-helical regions.
- These alpha-helical motifs are generally folded into HEAT repeats (2 antiparallel helices forming a helical hairpin) or armadillo repeats (3 helices: H2 and H3 packed together in an antiparallel fashion, perpendicular to the shorter H1, with a sharp loop between H1 and H2 mediated by a conserved glycine) in the predicted models, again in line with previous predictions by InterPro.
- High confidence models (>80%) consisting of these tertiary structure features were built for regions 130-399, 626-936 and 2745-3091. The regions of huntingtin sequence for each of these models correlate with domain predictions by InterPro as well as with the experimentally determined putative domain boundaries from the limited proteolysis and mass spectrometry experiments.
- Despite fairly confident prediction of alpha-helical secondary structure motifs throughout the region of huntingtin sequence from 1201-2400, no models were built with confidence, consistent with the InterPro analysis and, to some degree, the experimental data.

4. Next steps:

- Repeat analysis with other structural prediction programmes
- Continue experimental domain determination by limited proteolysis with alternative enzymes, e.g. chymotrypsin
- Begin domain construct design for BVES expression

Phyre2: The Phyre2 web portal for protein modeling, prediction and analysis. Kelley LA et al. Nature Protocols 10, 845-858 (2015)