Virtual modelling of proteins
Jacek Leluk
Interdyscyplinarne Centrum Modelowania Matematycznego i Komputerowego (Interdisciplinary Centre for Mathematical and Computational Modelling), Uniwersytet Warszawski (University of Warsaw)

Main functions of proteins (selected)
  Enzymes
  Immunoglobulins
  Transport factors (e.g. hemoglobin)
  Hormones, neurotransmitters
  Structural and storage proteins
  Contractile proteins (muscles, flagella)

Protein: a polymer of amino acids
  A protein consists of one or more chains.
  Some proteins (proteids) contain other components: sugars, lipids, nucleotides, metal ions, other compounds.
  The basic unit of a protein is the amino acid. There are 20 biogenic (genetically encoded) amino acids.

Amino acids
  Amino acid: an organic compound that contains an amino group and an acidic group (usually a carboxyl group).
  [Figures: general formula of an amino acid; alanine]

Amino acid – polypeptide – protein
  [Figure]

Protein chain folding
  [Figure]

Diversity of proteins
  [Figures: glucagon; insulin; ROP protein; light-"harvesting" protein from purple bacteria]

Sequence – structure – function
  The central dogma of molecular biology originally assumed a very strict relationship between genetic information, protein structure and function:
  1 gene → 1 sequence → 1 structure → 1 function
  The dogma still holds today, but in a less strict form. These relationships are not strictly one-to-one: for example, a protein with a single sequence may adopt different secondary and tertiary structures.
  All information about a protein's structure (and its function as well) is contained in its amino acid sequence, which is unique for each protein.
  To apply these relationships in protein modelling, we first have to learn to read and understand the information "written" in the amino acid sequence.
  How well we can currently read this "writing" depends on the complexity of the protein; prediction accuracy ranges from 20% to 80%.

What do we have?
  Biomolecular databases (genomic, protein and bibliographic)
  Tools for theoretical analysis of biomolecules
  Labs for experimental verification of the results
  Knowledge (theories, hypotheses, theoretical models)

Regular types of structure (secondary structure)
  α-helix
  β-strand (β-sheet)

3D protein structures: structure-function relationship
  [Figures: sea anemone toxin; snake toxin]
  [Figures: bacterial RNase; mammalian RNase; RNase inhibitor (inhibits both RNases)]

Errors (mutations) and resulting implications: sickle cell anemia
  Sickle cell anemia is a genetic disease caused by a single amino acid substitution in the hemoglobin β-chain (one residue out of 146). Hemoglobin S has Val instead of Glu at position 6 of the β-chain.
  Homozygotes (HbSS) develop severe, often life-threatening disease; heterozygotes (HbAS) are usually only mildly affected but show increased resistance to malaria.

  Normal hemoglobin, β-chain
  VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
  VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
  KEFTPPVQAAYQKVVAGVANALAHKYH

  Hemoglobin S, β-chain
  VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
  VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
  KEFTPPVQAAYQKVVAGVANALAHKYH

  [Figures: normal and altered hemoglobin]

Glucagon
  Glucagon (pig), hormone, 29 amino acids
  HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
  Glucagon (synthetic), hormone, 29 amino acids
  HSQGTFTSDYSKYLDSKKAQEFVQWLMNT

"Gluca con" modelling
  Glucagon (pig)     HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
  Glucagon (synth.)  HSQGTFTSDYSKYLDSKKAQEFVQWLMNT
  Gluca con          LAALIAAVAAAIAAVLRRIAEVLAIVAAL
  Hydrophobic amino acids: L, I, V, F, M, Y, (W)

"Gluca con" design: results
  Glucagon (pig)     HSQGTFTSDYSKYLDSRRAQDFVQWLMNT
  Glucagon (synth.)  HSQGTFTSDYSKYLDSKKAQEFVQWLMNT
  Gluca con          LAALIAAVAAAIAAVLRRIXEVLAIVAAL

Can we "improve" Nature at the molecular level? What for?
  Our goal is to understand natural mechanisms and then apply that knowledge to our needs, not to alter the mechanisms that evolved naturally.

Role and significance of theoretical protein modeling and design
  Saving time
  Saving money
  Saving labor and materials
  Increasing our knowledge
  Supporting experimental work

The value of virtual protein design =

           3     10        20        30        40        50        60
P01055   ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
P01057   ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
P01056   QSSKPCCBHCACTKSIPPQCRCTDLRLDSCHSACKSCICTLSIPAQCV-CBBIBDFCYEP-CKS
P01058   ESSKPCCDQCSCTKSMPPKCRCSDIRLNSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
P01059   ESSKPCCDLCTCTKSIPPQCHCNDMRLNSCHSACKSCICALSEPAQCF-CVDTTDFCYKS-CHN
P01063   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
P17734   QSSKPCCRQCACTKSIPPQCRCSQVRLNSCHSACKSCACTFSIPAQCF-CGBIBBFCYKP-CKS
P81483   -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P81484   -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P16343   ESSKPCCSSC-CTRSRPPQCQCTDVRLNSCHSACKSCMCTFSDPGMCS-CLDVTDFCYKP-CKS
P01064   EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
P82469   -SSGPCCDRCRCTKSEPPQCQCQDVRLNSCHSACEACVCSHSMPGLCS-CLDITHFCHEP-CKS
P01061   ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
P01062   ESSEPCCDSCDCTKSIPPECHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
P01060   QSSPPCCBICVCTASIPPQCVCTBIRLBSCHSACKSCMCTRSMPGKCR-CLBTTBYCYKS-CKS
1BBI:    ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
1D6R:I   ---KPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CK-
1DF9:C   ESSEPCCDSCDCTKSIPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
1PI2:    EYSKPCCDLCMCTRSMPPQCSCED-RINSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
1PBI:A   DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKQ-CHN
AAB4719  ESSKPCCDQCTCTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
TISYC2   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
JC2225   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TIZB2    ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
JC2073   ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
JC2072   ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
0506164  ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
0401177  ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
763679A  ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TISYD2   EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
0907248  ESSEPCCDSCRCTKSIPPQCHCADIRLNSCHSACKSCMCTRSMPGKCR-CLDTDDFCYKP-CES
1102213  ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
1102213  ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
0404180  EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
TIZB1B   ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
TIMB     ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
TIZB1P   ESSHPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
JC1066   ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCTKP-CES
Q41066   DVKSACCDTCLCTKSDPPTCRCVDVGET-CHSACDSCICALSYPPQCQ-CFDTHKFCYKA-CHN
P80321   STTTACCDFCPCTRSIPPQCQCTDVREK-CHSACKSCLCTLSIPPQCH-CYDITDFCYPS-CR-
Q41065   DVKSACCDTCLCTKSNPPTCRCVDVRET-CHSACDSCICAYSNPPKCQ-CFDTHKFCYKA-CHN
P81705   --TSACCDKCFCTKSNPPICQCRDVGET-CHSACKFCICALSYPAQCH-CLDQNTFCYDK-CDS
P56679   DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKA-CHN
P16346   --TTACCNFCPCTRSIPPQCRCTDIGET-CHSACKTCLCTKSIPPQCH-CADITNFCYPK-CN-
P01065   DVKSACCDTCLCTRSQPPTCRCVDVGER-CHSACNHCVCNYSNPPQCQ-CFDTHKFCYKA-CHS
P24661   DVKSACCDTCLCTKSEPPTCRCVDVGER-CHSACNSCVCRYSNPPKCQ-CFDTHKFCYKS-CHN
P07679   KRPWECCDIAMCTRSIPPICRCVDKVDR-CSDACKDCEETEDN--RHV-CFDTYIGDPGPTCHD
P19860   ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
P22737   ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
220645   ES-EGCCDRCICTKSMPPQCHCHDVRLDSCHSDCETCICTRSYPAQCR-CADTTDFCYKP-C-S
P09864   TRPWKCCDRAICTKSFPPMCRCMDMVEQ-CAATCKKCGPATSDSSRRV-CEDXY----------
P09863   KRPWKCCDQAVCTRSIPPICRCMDQVFE-CPSTCKACGPSVGDPSRRV-CQDQYV----------
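
An alignment like the one above is usually read column by column: positions where every sequence carries the same residue are candidates for structurally or functionally important sites. The following is a minimal sketch of that idea in Python, not the method used in the original slides. The function names conserved_columns and percent_identity are illustrative assumptions, and the three rows are copied from the alignment only as sample input.

```python
from typing import List

GAP = "-"  # gap symbol used in the alignment


def conserved_columns(rows: List[str]) -> List[int]:
    """Return 1-based column numbers where all rows share one residue (no gaps)."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all aligned rows must have the same length")
    conserved = []
    for i in range(width):
        column = {r[i] for r in rows}  # distinct symbols in this column
        if len(column) == 1 and GAP not in column:
            conserved.append(i + 1)
    return conserved


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


# Three rows copied from the alignment (first, second, and one of the last rows).
rows = [
    "ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP",
    "ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS",
    "KRPWECCDIAMCTRSIPPICRCVDKVDR-CSDACKDCEETEDN--RHV-CFDTYIGDPGPTCHD",
]

cols = conserved_columns(rows)
print("conserved columns:", cols)
print("residues there:   ", "".join(rows[0][i - 1] for i in cols))
print("identity of rows 1 and 2 (%):", round(percent_identity(rows[0], rows[1]), 1))
```

Passing all rows of the alignment instead of three gives the positions retained across the whole set; columns in which every row has a gap are skipped by design.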