Primary Structure

The primary sequence is:

DYKDDDDKEVQLQESGPSLVKPSQTLSLTCSVTGDSVTSGYWSWIRQFPGNKLDY
MGYISYRGSTYYNPSLKSRISITRDTSKNQVYLQLKSVSSEDTATYYCSYFDSDDYA
MEYWGQGTSVTVSGGGGSGGGGSGGGGSQIVLTQSPAIMSASPGEKVTLTCSASSSV
SSSHLYWYQQKPGSSPKLWIYSTSNLASGVPARFSGSGSGTSYSLTISSMEAEDAASY
FCHQWSSFPFTFGSGTKLEIKRAP

A sequence in the PDB is identical to this one except that it lacks the DYKDDDDK (Sigma's FLAG tag) at the beginning and the final P [1]. Appendix 1 shows the PDB search result, and Figure 2 of the appendix reproduces the sequence from the referenced paper, identifying each section of the sequence that is also found here. The protein is scFv6H4, a modified single-chain Fv (scFv) antibody fragment that binds methamphetamine and one of its derivatives.

The two best homology matches from Blastp 2.2.24 (Appendix 2) are:

100% (score 384): Chain A, Crystal Structures Of A Therapeutic Single Chain Antibody In Complex With Methamphetamine. This entry contains Sigma's FLAG tag (DYKDDDDK) and a hexahistidine string (HHHHHH) at the end, after the phosphate.
81% (score 241): anti BoNT/A Hc scFv antibody [synthetic construct].

The molecular weight is given as 27.4 kDa [2], which is small enough to lead to rapid renal clearance after the drug is injected as a treatment against certain amphetamine derivatives. The protein is often found in both monomeric and dimeric forms. It is a recombinant construct derived from the heavy and light chains of the murine monoclonal antibody mAb6H4. The FLAG epitope (DYKDDDDK) has been fused directly to the start of the heavy chain's first framework region (EVQ to SVT), giving the N-terminal stretch DYKDDDDKEVQLQESGPSLVKPSQTLSLTCSVT.
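The 27.4 kDa figure can be cross-checked against the sequence alone. A peptide's average molecular weight is the sum of its average residue masses plus one water molecule for the free termini. The sketch below assumes the standard average residue masses (typed in by hand here, so treat the table as an assumption) and uses Python's `collections.Counter` for the composition. For the full 251-residue construct, FLAG tag included, it gives 26940.51 Da, the same value as the ExPASy calculation described below.

```python
from collections import Counter

# The construct as given above (wrapped lines joined; FLAG tag included).
SEQUENCE = (
    "DYKDDDDKEVQLQESGPSLVKPSQTLSLTCSVTGDSVTSGYWSWIRQFPGNKLDY"
    "MGYISYRGSTYYNPSLKSRISITRDTSKNQVYLQLKSVSSEDTATYYCSYFDSDDYA"
    "MEYWGQGTSVTVSGGGGSGGGGSGGGGSQIVLTQSPAIMSASPGEKVTLTCSASSSV"
    "SSSHLYWYQQKPGSSPKLWIYSTSNLASGVPARFSGSGSGTSYSLTISSMEAEDAASY"
    "FCHQWSSFPFTFGSGTKLEIKRAP"
)

# Assumed average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def composition(seq: str) -> Counter:
    """Count how many times each residue type occurs."""
    return Counter(seq)


def average_mw(seq: str) -> float:
    """Average molecular weight: summed residue masses plus one water."""
    counts = composition(seq)
    return sum(AVERAGE_RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER


if __name__ == "__main__":
    counts = composition(SEQUENCE)
    print(f"length: {len(SEQUENCE)}")
    print(f"average Mw: {average_mw(SEQUENCE):.2f} Da")
    for aa, n in counts.most_common(3):
        print(f"{aa}: {n} ({100 * n / len(SEQUENCE):.1f}%)")
```

Serine comes out as the most common residue (50 of 251, about 20%), which fits the high serine content noted in the composition analysis.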
The CDRs of the heavy chain are given as {GDSVTSGYWS}{YISYRGSTY}{SDDYAMEY}, and the CDRs of the light chain as {SASSSVSSSHLY}{STSNLASG}{HQWSSFPFT}.

The variable heavy chain (VH) is given as:
{EVQLQESGPSLVKPSQTLSLTCSVT}{WIRQFPGNKLDYMG}{YNPSLKSRISITRDTSKNQVYLQLKSVSSEDTATYYCSY}{WGQGTSVTV}

The variable light chain (VL) is given as:
{QIVLTQSPAIMSASPGEKVTLT}{WYQQKPGSSPKLWIY}{VPARFSGSGSGTSYSLTISSMEAEDAASYFC}{FGSGTKLEIKRA}

The linker is given as SGGGGSGGGGSGGGGS.

Surprisingly, there is no hexahistidine (6×His) tag, but a phosphate group is present. Phosphorylation is critical to the function of many enzymes and affects quaternary folding, so this will be looked at.

The pI and Mw of the protein were calculated solely from the primary amino acid sequence, entered into the ExPASy calculator [3]. The results can be seen in Appendix 3, Figure 4: the pI was calculated as 5.39 and the Mw as 26940.51 Da. Since the theoretical Mw closely matches the measured Mw of 27.4 kDa, the theoretical pI of 5.39 is expected to lie close to the real value. As no measured pI could readily be found in the research literature, the theoretical pI from the calculator will be used.

The calculator was also used to calculate the amino acid composition, shown in Appendix 3, Figure 5. The high proportion of serine is worth noting. This could affect the kinetics of antibody-fragment binding, or allow glycosylation if the serines are at the surface. Serine's polar side chain may also help keep the protein soluble. Also worth noting is that there are twice as many hydrophobic amino acids as polar ones, as seen in Figure 6 of Appendix 3. This has implications for the structure: most of the hydrophobic amino acids will be folded inward, away from the aqueous environment of the blood.
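The theoretical pI is the pH at which the protein's net charge is zero. A minimal sketch of that idea follows, assuming a simple Henderson-Hasselbalch model with one commonly quoted set of pKa values. Those pKa values are an assumption, not the Bjellqvist scale the ExPASy tool uses, so the result lands near 5.39 rather than exactly on it. The model also treats every cysteine as a free thiol.

```python
# The construct as given above (FLAG tag included).
SEQUENCE = (
    "DYKDDDDKEVQLQESGPSLVKPSQTLSLTCSVTGDSVTSGYWSWIRQFPGNKLDY"
    "MGYISYRGSTYYNPSLKSRISITRDTSKNQVYLQLKSVSSEDTATYYCSYFDSDDYA"
    "MEYWGQGTSVTVSGGGGSGGGGSGGGGSQIVLTQSPAIMSASPGEKVTLTCSASSSV"
    "SSSHLYWYQQKPGSSPKLWIYSTSNLASGVPARFSGSGSGTSYSLTISSMEAEDAASY"
    "FCHQWSSFPFTFGSGTKLEIKRAP"
)

# Assumed pKa values (a simple textbook-style set, NOT ExPASy's Bjellqvist scale).
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    # Basic groups contribute +1 times the fraction protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for aa, pka in PKA_BASIC.items():
        charge += seq.count(aa) / (1.0 + 10 ** (ph - pka))
    # Acidic groups contribute -1 times the fraction deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa, pka in PKA_ACIDIC.items():
        charge -= seq.count(aa) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect for the pH where the (monotonically falling) net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    print(f"estimated pI: {isoelectric_point(SEQUENCE):.2f}")
    print(f"net charge at pH 7.4: {net_charge(SEQUENCE, 7.4):+.2f}")
```

With these assumed values the estimate falls between pH 5 and 6, and the net charge at pH 7.4 comes out negative. The bisection works because every term in the net charge decreases as the pH rises, so there is exactly one zero crossing.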
The main structure seems to be determined more by the hydrophobic amino acids than by the structural amino acids. Glycine is the most common structural amino acid; it is often found in scFv linkers for flexibility, as is the case here.

The molecular weight and pI are critical to the protein for several reasons. Molecular weight affects renal clearance: a sufficiently large protein escapes filtration by the kidneys and stays in the body much longer, which greatly improves its pharmacokinetics. Many drugs are PEGylated for this reason, meaning that large molecules (polyethylene glycol chains) are attached to the drug so that it avoids clearance by the kidneys and liver. Renal clearance depends on glomerular filtration, whose rate (the glomerular filtration rate, GFR) is the standard measure of kidney function [4].

The isoelectric point determines the solubility of the protein: a protein is least soluble at a pH equal to its pI, where its net charge is zero, so a protein with a pI close to 7 is difficult to keep dissolved in water at physiological pH. The pH of blood is close to or very slightly above neutral, at about 7.4. If the drug is to stay dissolved in the blood, and thus mobile, its pI must lie away from this value so that the protein carries a net positive or negative charge. With a pI of 5.39, this scFv carries a net negative charge at blood pH. If this protein were found in your blood, it would probably mean you were being treated to clear methamphetamine or one of its derivatives from your blood.

Secondary Structure

The sequence was submitted to Jpred3, an online secondary structure prediction tool run by the University of Dundee [5]. Jpred3 aligns the query against known sequences and predicts the secondary structure from the best matches. The result is found in Appendix 4, Figure 1. The prediction is dominated by beta strands (given as B's in the results), which cover much of the sequence; no alpha helices were predicted. One disulphide bond was also predicted in the heavy chain and one in the light chain [6].

Secondary structure and solvent exposure were also predicted with NetSurfP, to identify which amino acids are buried and which are exposed. The results are in Appendix 5. Residues 26 to 60 are mostly buried, and this range covers the first heavy chain CDR and the start of the second.
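The buried/exposed statements come from the class column of the NetSurfP output in Appendix 5. A small sketch, assuming the whitespace-separated row layout shown there (class, amino acid, position, then the numeric columns), parses the rows and reports the fraction of residues classed B within a position window. It uses the class column as given rather than re-thresholding RSA, since the Appendix 5 header notes the class is not based on RSA alone. The excerpt is rows 34 to 43 of Appendix 5, which span the first heavy chain CDR (GDSVTSGYWS).

```python
# Rows 34-43 copied from the NetSurfP output in Appendix 5
# (columns: class, amino acid, position, RSA, ASA, Z-score, pA, pB, pCoil).
NETSURFP_EXCERPT = """\
Type AA Pos RSA ASA Z-Score pA pB pCoil
E G 34 0.288 22.658 -1.614 0.004 0.138 0.858
B D 35 0.286 41.155 -1.441 0.005 0.262 0.733
B S 36 0.192 22.502 -0.994 0.004 0.42 0.576
B V 37 0.13 20.027 -1.237 0.021 0.451 0.528
B T 38 0.245 34.037 -1.213 0.022 0.359 0.619
B S 39 0.22 25.737 -1.274 0.019 0.141 0.84
B G 40 0.117 9.232 -0.188 0.019 0.141 0.84
B Y 41 0.087 18.571 0.141 0.021 0.451 0.528
B W 42 0.029 6.854 0.236 0.021 0.756 0.223
B S 43 0.05 5.86 -0.337 0.021 0.756 0.223
"""


def parse_rows(text: str) -> list[tuple[str, str, int]]:
    """Return (class, amino_acid, position) for each data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, '#' comment lines and repeated header rows.
        if len(fields) < 3 or fields[0] not in ("B", "E"):
            continue
        rows.append((fields[0], fields[1], int(fields[2])))
    return rows


def buried_fraction(rows: list[tuple[str, str, int]], start: int, end: int) -> float:
    """Fraction of residues in [start, end] (inclusive) classed as buried."""
    window = [cls for cls, _, pos in rows if start <= pos <= end]
    if not window:
        raise ValueError("no residues in the requested window")
    return sum(cls == "B" for cls in window) / len(window)


if __name__ == "__main__":
    rows = parse_rows(NETSURFP_EXCERPT)
    segment = "".join(aa for _, aa, _ in rows)
    print(segment, buried_fraction(rows, 34, 43))
```

On this excerpt nine of the ten CDR residues are classed as buried. Feeding in all of the Appendix 5 rows and calling `buried_fraction(rows, 26, 60)` would put a number on "mostly buried" in the same way.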
Residues 7* to 119 are again mostly buried, again including the final part of the CDR. The linker is exposed. In general, the CDRs of both chains are buried, while the linker and terminal regions are exposed. The secondary structure is very similar in the heavy and light chains.

Tertiary Structure

The tertiary structure of the protein was modelled with the SWISS-MODEL web server. A workspace was set up, the structure was modelled from the sequence, and the result was saved as Model1.PDB. This was then viewed in Swiss-PdbViewer 4.0.1. First, the linker was found by highlighting the glycine stretch in the middle of the sequence in the control panel; this can be seen in Appendix 6. The three CDRs of the heavy chain are located very close together on one side of the molecule. The three CDRs of the light chain are also located close together and, more interestingly, lie in the same region of the protein as the heavy chain CDRs. The 3GKZ structure from the PDB was loaded alongside the model and superimposed using Magic Fit, giving an RMS deviation of 0.28 Å. This is very low, indicating an almost identical structure.

Quaternary Structure

The antigen was found bound to the CDRs. The closest amino acids are Gly40, Tyr41, Trp42, Ser43, Tyr58, Ile59, Ser60, Tyr66, Ser108, Tyr111, Met113, Glu114, Tyr115, His173, Tyr175, His230, Gln231, Trp232, Ser234, Phe235, Pro236 (which introduces a strong bend in the CDR around the antigen), Phe237 and Thr238. There is a very significant aromatic presence in this area of the molecule, with the aromatic side chains of numerous amino acids pointing directly towards the antigen. The amino acids within 5 Å of the antigen are Glu114, Tyr175, His230 and Trp232. There are two hydrogen bonds to the antigen, one from His230 and one from Glu114.
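The residue numbers above appear to count from the first residue of the full construct, FLAG tag included; this is an assumption, though it agrees with the NetSurfP numbering in Appendix 5. Under that assumption, a short sketch can locate each CDR (taking the CDR sequences exactly as listed in the Primary Structure section) and assign every listed residue to its CDR. Along the way it checks that the residue letter at each position matches the three-letter name.

```python
# The construct as given in the Primary Structure section (FLAG tag included).
SEQUENCE = (
    "DYKDDDDKEVQLQESGPSLVKPSQTLSLTCSVTGDSVTSGYWSWIRQFPGNKLDY"
    "MGYISYRGSTYYNPSLKSRISITRDTSKNQVYLQLKSVSSEDTATYYCSYFDSDDYA"
    "MEYWGQGTSVTVSGGGGSGGGGSGGGGSQIVLTQSPAIMSASPGEKVTLTCSASSSV"
    "SSSHLYWYQQKPGSSPKLWIYSTSNLASGVPARFSGSGSGTSYSLTISSMEAEDAASY"
    "FCHQWSSFPFTFGSGTKLEIKRAP"
)

# CDR sequences as listed in the report.
CDRS = {
    "H1": "GDSVTSGYWS", "H2": "YISYRGSTY", "H3": "SDDYAMEY",
    "L1": "SASSSVSSSHLY", "L2": "STSNLASG", "L3": "HQWSSFPFT",
}

# Residues closest to the antigen, as listed in the report (1-based numbering).
CONTACTS = [
    "Gly40", "Tyr41", "Trp42", "Ser43", "Tyr58", "Ile59", "Ser60", "Tyr66",
    "Ser108", "Tyr111", "Met113", "Glu114", "Tyr115", "His173", "Tyr175",
    "His230", "Gln231", "Trp232", "Ser234", "Phe235", "Pro236", "Phe237", "Thr238",
]

THREE_TO_ONE = {
    "Gly": "G", "Tyr": "Y", "Trp": "W", "Ser": "S", "Ile": "I", "Met": "M",
    "Glu": "E", "His": "H", "Gln": "Q", "Phe": "F", "Pro": "P", "Thr": "T",
}


def cdr_ranges(seq: str, cdrs: dict[str, str]) -> dict[str, tuple[int, int]]:
    """1-based inclusive (start, end) of each CDR, from its first exact match."""
    ranges = {}
    for name, motif in cdrs.items():
        idx = seq.find(motif)
        if idx < 0:
            raise ValueError(f"CDR {name} not found in sequence")
        ranges[name] = (idx + 1, idx + len(motif))
    return ranges


def assign_contacts(seq, contacts, ranges):
    """Map each listed residue to the CDR containing it (None if outside all CDRs)."""
    result = {}
    for label in contacts:
        three, pos = label[:3], int(label[3:])
        if seq[pos - 1] != THREE_TO_ONE[three]:
            raise ValueError(f"{label} does not match residue {seq[pos - 1]}")
        result[label] = next(
            (name for name, (start, end) in ranges.items() if start <= pos <= end),
            None,
        )
    return result


if __name__ == "__main__":
    ranges = cdr_ranges(SEQUENCE, CDRS)
    for label, cdr in assign_contacts(SEQUENCE, CONTACTS, ranges).items():
        print(label, cdr)
```

Every listed residue falls inside a CDR. The four within 5 Å sit in CDR-H3 (Glu114), CDR-L1 (Tyr175) and CDR-L3 (His230, Trp232), and none of the listed residues falls in CDR-L2.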
The protein shows a very strong electrostatic potential at the site of the CDRs that may be involved in attracting the antigen, and when zoomed in, it can be seen that Glu114 contributes significantly to this potential right up to the antigen binding site. The antigens for this protein are methamphetamine and 3,4-methylenedioxymethamphetamine (MDMA).

This project has identified the two amino acids that hydrogen-bond to the antigen; a nearby proline that helps bend the CDR around the antigen; the number of serines around the CDR region, which may be involved in the kinetics of binding; and a strong electrostatic potential right up to the binding site itself, which may help attract the drug to the active site.

Appendix 1
Figure 1. A search result taken directly from the PDB main webpage search window.
Figure 2. The functions of the various sections of the amino acid sequence.
Figure 3. Blast results showing 3GKZ_A.

Appendix 2: Protein pI and Mw Calculation
Figure 4. Calculation of the pI and Mw.
Figure 5. The percentage of each amino acid.
Figure 6. Breakdown of the amino acids by physicochemical and structural properties.

Appendix 4
Figure 7 a/b. The secondary structure prediction, showing the beta-sheets.
Figure 8. Identification of disulphide bonds.

Appendix 5

NetSurfP output columns:
Col. 1: Class assignment, B for buried or E for exposed (threshold: 25% exposure, but not based on RSA)
Col. 2: Amino acid
Col. 3: Amino acid number
Col. 5: Relative surface accessibility (RSA)
Col. 6: Absolute surface accessibility (ASA)
Col. 7: Z-fit score for the RSA prediction
Col. 8: Probability for alpha-helix
Col. 9: Probability for beta-strand
Col. 10: Probability for coil

Type AA Pos  RSA    ASA      Z-Score  pA     pB     pCoil
E    D   1   0.879  126.621   0.598   0.003  0.003  0.994
E    Y   2   0.356   76.077  -1.544   0.018  0.088  0.893
B    K   3   0.208   42.744  -1.117   0.019  0.141  0.84
E    D   4   0.451   65.061   0.351   0.019  0.141  0.84
B    D   5   0.222   31.99   -2.136   0.02   0.205  0.775
B    D   6   0.099   14.237  -0.918   0.02   0.205  0.775
B    D   7   0.279   40.276   0.095   0.022  0.359  0.619
B    K   8   0.046    9.38   -0.592   0.022  0.552  0.426
B    E   9   0.155   27.166   0.059   0.023  0.655  0.322
B    V  10   0.068   10.375   0.314   0.011  0.918  0.071
E    Q  11   0.26    46.365   1.635   0.011  0.918  0.071
B    L  12   0.058   10.675   0.692   0.011  0.918  0.071
E    Q  13   0.307   54.777   0.809   0.021  0.756  0.223
B    E  14   0.201   35.167   0.423   0.004  0.514  0.481
E    S  15   0.383   44.864  -0.056   0.004  0.138  0.858
E    G  16   0.513   40.405  -1.644   0.018  0.047  0.935
E    P  17   0.551   78.229  -2.373   0.018  0.047  0.935
E    S  18   0.509   59.655  -1.822   0.019  0.141  0.84
E    L  19   0.396   72.434  -0.891   0.022  0.359  0.619
E    V  20   0.308   47.309   0.483   0.021  0.451  0.528
E    K  21   0.474   97.584   1.195   0.02   0.205  0.775
E    P  22   0.477   67.757  -0.515   0.005  0.045  0.951
E    S  23   0.606   71.047  -1.65    0.005  0.045  0.951
E    Q  24   0.426   76.012  -0.283   0.004  0.138  0.858
E    T  25   0.434   60.224   0.146   0.004  0.616  0.381
B    L  26   0.115   21.057   0.973   0.001  0.9    0.099
E    S  27   0.402   47.091   1.141   0.001  0.9    0.099
B    L  28   0.069   12.579   0.694   0.001  0.959  0.04
B    T  29   0.162   22.483   0.711   0.001  0.959  0.04
B    C  30   0.039    5.433   0.03    0.001  0.959  0.04
E    S  31   0.317   37.141   0.645   0.001  0.9    0.099
B    V  32   0.086   13.172  -0.603   0.002  0.816  0.182
B    T  33   0.179   24.869  -0.773   0.004  0.514  0.481
E    G  34   0.288   22.658  -1.614   0.004  0.138  0.858
B    D  35   0.286   41.155  -1.441   0.005  0.262  0.733
B    S  36   0.192   22.502  -0.994   0.004  0.42   0.576
B    V  37   0.13    20.027  -1.237   0.021  0.451  0.528
B    T  38   0.245   34.037  -1.213   0.022  0.359  0.619
B    S  39   0.22    25.737  -1.274   0.019  0.141  0.84
B    G  40   0.117    9.232  -0.188   0.019  0.141  0.84
B    Y  41   0.087   18.571   0.141   0.021  0.451  0.528
B    W  42   0.029    6.854   0.236   0.021  0.756  0.223
B    S  43   0.05     5.86   -0.337   0.021  0.756  0.223
B    W  44   0.032    7.696   0.21    0.018  0.846  0.136
B    I  45   0.041    7.511   0.56    0.018  0.846  0.136
B    R  46   0.12    27.434   0.339   0.023  0.655  0.322
B    Q  47   0.102   18.164   0.388   0.022  0.359  0.619
E    F  48   0.308   61.856   0.984   0.005  0.045  0.951
E    P  49   0.372   52.815   0.383   0.005  0.015  0.979
E    G  50   0.67    52.76   -2.003   0.005  0.015  0.979
E    N  51   0.426   62.352   0.448   0.018  0.019  0.964
E    K  52   0.472   97.111  -0.534   0.018  0.088  0.893
B    L  53   0.174   31.786   0.779   0.005  0.262  0.733
B    D  54   0.177   25.477  -0.072   0.004  0.616  0.381
B    Y  55   0.05    10.749   0.357   0.001  0.9    0.099
B    M  56   0.028    5.523   0.678   0.001  0.959  0.04
B    G  57   0.019    1.456  -0.038   0.001  0.959  0.04
B    Y  58   0.088   18.891   0.083   0.001  0.959  0.04
B    I  59   0.034    6.272  -0.023   0.001  0.959  0.04
B    S  60   0.147   17.217   0.524   0.004  0.616  0.381
E    Y  61   0.369   78.855  -0.952   0.004  0.197  0.799
E    R  62   0.603  137.973  -2.17    0.005  0.015  0.979
E    G  63   0.406   31.913  -2.156   0.016  0.005  0.979
E    S  64   0.509   59.608  -1.153   0.018  0.047  0.935
E    T  65   0.352   48.767  -0.554   0.019  0.141  0.84

Appendix 6
Heavy chain CDRs.
Light chain CDRs.
Match with 3GKZ.
The antigen bound in the middle of all the CDRs.
The antigen shown close up to find the amino acids within 5 Å; the hydrogen bonds can be seen in green.
The strong electrostatic potential near the antigen.
The electrostatic potential at the active site; Glu114 gives a strong reading in red.

References
1. Celikel R, Peterson EC, Owens SM, Varughese KI. Crystal structures of a therapeutic single chain antibody in complex with two drugs of abuse: methamphetamine and 3,4-methylenedioxymethamphetamine. Protein Sci. 2009 Nov;18(11):2336-2345. Published online 2009 Sep 16. doi:10.1002/pro.244.
2. Peterson EC, Laurenzana EM, Atchley WT, Hendrickson HP, Owens SM. Development and preclinical testing of a high-affinity single-chain antibody against (+)-methamphetamine. J Pharmacol Exp Ther. 2008 Apr;325(1):124-133. Epub 2008 Jan 11.
3. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A. Protein identification and analysis tools on the ExPASy server. In: Walker JM (ed.), The Proteomics Protocols Handbook. Humana Press; 2005.
4. Stevens LA, Coresh J, Greene T, Levey AS. Assessing kidney function: measured and estimated glomerular filtration rate. N Engl J Med. 2006 Jun;354(23):2473-2483. doi:10.1056/NEJMra054415. PMID 16760447.
5. Cole C, Barber JD, Barton GJ. Nucleic Acids Res. 2008;35(suppl. 2):W197-W201.
6. Ceroni A, Passerini A, Vullo A, Frasconi P. DISULFIND: a disulfide bonding state and cysteine connectivity prediction server. Nucleic Acids Res. 2006;34(Web Server issue):W177-W181.