Protein Folding
Proteins

What is a protein?
• A protein is a molecule consisting of amino acids linked in a linear chain through peptide bonds.

[Figure: Protein primary structure]
[Figure: Peptide formation]

There are many kinds of proteins.
• Structural -- determine the shape and function of cells
• Enzymes -- speed up chemical reactions
• Ligand-binding -- bind small molecules and transport them to other locations

Cells
• muscle
• nerve

Structural proteins
• collagen -- in connective tissue such as cartilage
• elastin -- in connective tissue such as cartilage
• keratin -- in hair and nails
• actin -- in muscle
• myosin -- in muscle, to generate mechanical forces

Enzymes
• glucose isomerase -- converts glucose into fructose
• rennin -- used to make cheese
• cellulase -- breaks down cellulose into sugars to make ethanol
• amylase -- used in detergent for machine dishwashing

Ligand-binding proteins
• hemoglobin -- transports oxygen from the lungs
• antibodies -- bind foreign substances for destruction

The string of amino acids tends to “fold” into a shape.

[Figure: Hemoglobin structure]
[Figure: Heart of Steel (Hemoglobin) by Julian Voss-Andreae]
[Figure: Protein views (Triose phosphate isomerase)]
[Figure: Visualizing proteins]

Amino acids
• There are 20 different standard amino acids.
• The different amino acids differ in chemical properties.

Amino Acid      3-Letter  1-Letter  Polarity  Acidity         Hydrophobicity index
Alanine         Ala       A         nonpolar  neutral          1.8
Arginine        Arg       R         polar     basic (strong)  -4.5
Asparagine      Asn       N         polar     neutral         -3.5
Aspartic acid   Asp       D         polar     acidic          -3.5
Cysteine        Cys       C         nonpolar  neutral          2.5
Glutamic acid   Glu       E         polar     acidic          -3.5
Glutamine       Gln       Q         polar     neutral         -3.5
Glycine         Gly       G         nonpolar  neutral         -0.4
Histidine       His       H         polar     basic (weak)    -3.2
Isoleucine      Ile       I         nonpolar  neutral          4.5
Leucine         Leu       L         nonpolar  neutral          3.8
Lysine          Lys       K         polar     basic           -3.9
Methionine      Met       M         nonpolar  neutral          1.9
Phenylalanine   Phe       F         nonpolar  neutral          2.8
Proline         Pro       P         nonpolar  neutral         -1.6
Serine          Ser       S         polar     neutral         -0.8
Threonine       Thr       T         polar     neutral         -0.7
Tryptophan      Trp       W         nonpolar  neutral         -0.9
Tyrosine        Tyr       Y         polar     neutral         -1.3
Valine          Val       V         nonpolar  neutral          4.2

Hydrophobicity index
• The larger the index, the stronger the tendency to be internal in the protein; the lower the index, the stronger the tendency to appear near the protein surface.
• Amino acids with a high index are called hydrophobic; those with a low index are called hydrophilic.

What is the shape of the protein?
• This is the “protein folding problem.”
• The geometry and chemistry of the parts of the protein determine how it behaves in the cell.

DNA
• DNA is deoxyribonucleic acid.
• It occurs as long molecules in a double helix.

[Figure: DNA is a long molecule in a double helix]

What makes DNA?
• DNA consists of sequences of nucleotides.
• There are 4 kinds of nucleotide: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T).

Matching
• Each A has weak (“hydrogen”) bonds with T on the other chain.
• Each C has weak (“hydrogen”) bonds with G on the other chain.

A single chain carries the information
• For example, the two strings might be
  ACGGTCAG
  TGCCAGTC
• Hence all the information is in the order of A, C, G, T in one of the chains.
• We write DNA as a (long) string of A, C, G, T, for example AGGCTACATAG…

Human DNA
• Humans have 46 chromosomes.
• Each chromosome is essentially a double helix of DNA, with variable numbers of nucleotides, from 50,000,000 to 250,000,000 base pairs.
• There are a total of about 2,860,000,000 nucleotide pairs.

Genes
• A gene is a portion of the DNA that tells how to make a protein.

DNA for beta hemoglobin
  ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTG
  CCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGA
  GGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCA
  GAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
  CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAA
  AGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGAC
  AACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT
  GTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCT
  GGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGC
  AAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAG
  TGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
  CTAA

DNA determines the order of amino acids

Primary structure for beta hemoglobin -- the order
  MVHLTPEEKSAVTALWGKVNVDEVG
  GEALGRLLVVYPWTQRFFESFGDLSTP
  DAVMGNPKVKAHGKKVLGAFSDGLA
  HLDNLKGTFATLSELHCDKLHVDPEN
  FRLLGNVLVCVLAHHFGKEFTPPVQA
  AYQKVVAGVANALAHKYH

[Figure: Hemoglobin structure]

How does DNA determine the order of amino acids?
• Three successive nucleotides form a “codon.”
• Different codons stand for different amino acids.

Translating codons
  Ala/A   GCT, GCC, GCA, GCG
  Arg/R   CGT, CGC, CGA, CGG, AGA, AGG
  Asn/N   AAT, AAC
  Asp/D   GAT, GAC
  Cys/C   TGT, TGC
  Gln/Q   CAA, CAG
  Glu/E   GAA, GAG
  Gly/G   GGT, GGC, GGA, GGG
  His/H   CAT, CAC
  Ile/I   ATT, ATC, ATA
  START   ATG
  Leu/L   TTA, TTG, CTT, CTC, CTA, CTG
  Lys/K   AAA, AAG
  Met/M   ATG
  Phe/F   TTT, TTC
  Pro/P   CCT, CCC, CCA, CCG
  Ser/S   TCT, TCC, TCA, TCG, AGT, AGC
  Thr/T   ACT, ACC, ACA, ACG
  Trp/W   TGG
  Tyr/Y   TAT, TAC
  Val/V   GTT, GTC, GTA, GTG
  STOP    TAG, TGA, TAA

The order of amino acids is important
• Consider what may happen when the “wrong” amino acid is in a certain position.
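To make the codon table concrete, here is a minimal Python sketch of translation; it is an illustration, not part of the original slides. The names `CODONS_BY_AMINO_ACID`, `CODON_TO_AMINO_ACID` and `translate` are made up for this example, and the routine assumes the DNA string starts at the start codon and is read in frame, three letters at a time, until a stop codon. The last lines show how changing a single nucleotide (GAG to GTG in the seventh codon) puts a different amino acid in that position.

```python
# Codon table as given in the slides: one-letter amino acid code -> DNA codons.
CODONS_BY_AMINO_ACID = {
    "A": "GCT GCC GCA GCG",
    "R": "CGT CGC CGA CGG AGA AGG",
    "N": "AAT AAC",
    "D": "GAT GAC",
    "C": "TGT TGC",
    "Q": "CAA CAG",
    "E": "GAA GAG",
    "G": "GGT GGC GGA GGG",
    "H": "CAT CAC",
    "I": "ATT ATC ATA",
    "L": "TTA TTG CTT CTC CTA CTG",
    "K": "AAA AAG",
    "M": "ATG",  # ATG also serves as the START codon
    "F": "TTT TTC",
    "P": "CCT CCC CCA CCG",
    "S": "TCT TCC TCA TCG AGT AGC",
    "T": "ACT ACC ACA ACG",
    "W": "TGG",
    "Y": "TAT TAC",
    "V": "GTT GTC GTA GTG",
    "*": "TAG TGA TAA",  # STOP codons, written here as "*"
}

# Invert the table: codon -> one-letter amino acid code.
CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}


def translate(dna):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TO_AMINO_ACID[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First ten codons of the beta hemoglobin DNA from the slides.
normal = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
# Same stretch with one nucleotide changed in the seventh codon (GAG -> GTG).
mutated = "ATGGTGCATCTGACTCCTGTGGAGAAGTCT"

print(translate(normal))   # MVHLTPEEKS
print(translate(mutated))  # MVHLTPVEKS
```

Storing the table per amino acid and inverting it keeps the code visually close to the slide; a lookup keyed directly by codon would work equally well.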

Primary structure for beta hemoglobin
  MVHLTPEEKSAVTALWGKVNVDEVG
  GEALGRLLVVYPWTQRFFESFGDLSTP
  DAVMGNPKVKAHGKKVLGAFSDGLA
  HLDNLKGTFATLSELHCDKLHVDPEN
  FRLLGNVLVCVLAHHFGKEFTPPVQA
  AYQKVVAGVANALAHKYH

Sickle cell anemia beta hemoglobin
  MVHLTPVEKSAVTALWGKVNVDEVG
  GEALGRLLVVYPWTQRFFESFGDLSTP
  DAVMGNPKVKAHGKKVLGAFSDGLA
  HLDNLKGTFATLSELHCDKLHVDPEN
  FRLLGNVLVCVLAHHFGKEFTPPVQA
  AYQKVVAGVANALAHKYH

Simple model
• Pretend there are only 2 kinds of amino acid -- H and P.
• H stands for “hydrophobic”.
• P stands for “polar” (hydrophilic).
• Pretend that they must be placed on a grid.
• Example: HHPPPPPPPHH

A folding of HHPPPPPPPHH
[Grid figure; cells as extracted: H H P P P P H H P P P]

Another folding of HHPPPPPPPHH
[Grid figure; cells as extracted: H H P H P P H P P P P]

Energy
• HH has energy -1.
• PP has energy 0.
• HP has energy 0.
• PH has energy 0.
• The protein folds so as to minimize the energy.

A folding of HHPPPPPPPHH with energy -2
[Grid figure; cells as extracted: H H P P P P H H P P P]

A folding of HHPPPPPPPHH with energy -4
[Grid figure; cells as extracted: H H P H P P H P P P P]

A folding of HHPPPPPPPHH with ? energy
[Grid figure; cells as extracted: H H H H P P P P P P P]

The real problem
• There are 20 amino acids.
• Pairs have different energies.
• Typically a protein has 100 or more amino acids.
• The protein is in 3 dimensions.
• It does not need to be on a grid.
• It must be worked on a computer.

The Direct Approach
• Write down a formula for the energy E, taking into account the (variable) locations of all amino acids, all charges and electrostatic attractions and repulsions, and all constraints.
• Minimize E.

Indirect Methods
• Statistics of amino acids in known structures
• Neural network models
• Nearest neighbor methods
• Hidden Markov models

Does a method work?
• We want to be able to check some answers, to see whether a method appears to work.
• Professor Zhijun Wu works on some problems related to this.

NMR
• NMR is Nuclear Magnetic Resonance.
• Using NMR one can often find the distances between some particular atoms in a protein.

Distances
[Figure: four atoms A1, A2, A3, A4, with the distances d(2,3) and d(1,4) marked]
• Here d(1,4) is the distance between the first and fourth atoms.

Locations
• A1 is at $(x_{11}, x_{12}, x_{13})$.
• A2 is at $(x_{21}, x_{22}, x_{23})$.
• A3 is at $(x_{31}, x_{32}, x_{33})$.
• A4 is at $(x_{41}, x_{42}, x_{43})$.
• Once you know all the locations, you know the shape of the protein.

Position Matrix
• Form the matrix X whose ith row holds the coordinates of atom Ai:

$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{pmatrix}$$

Matrix Equation
• It turns out that $X X^T = D$, where D is a matrix that can be obtained just using all the numbers d(i,j).

The matrix D
• If there are n atoms and the last is at the origin, then the entry of D in the ith row and jth column is

$$D_{ij} = \frac{d(i,n)^2 - d(i,j)^2 + d(j,n)^2}{2}$$

Solving the matrix equation
• Professor Zhijun Wu studies ways to solve such matrix equations rapidly.

Energy
• HH has energy -1.
• PP has energy 0.
• HP has energy 0.
• PH has energy 0.
• The protein folds so as to minimize the energy.

What is the best folding of
• HPPHPPHPHPPHPHPHHH
• (Careful: the answer is on the next slide)

HPPHPPHPHPPHPHPHHH
[Grid figure; cells as extracted: P P H P H H H H H H P H H P P P P P]
with energy -11
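
The energy rule of the H/P grid model can be written as a short Python sketch. This is an illustration only: the function name `hp_energy` and both layouts are made up here and are not necessarily the foldings drawn on the slides. It assumes that every pair of H residues sitting on neighboring grid points contributes -1, including pairs that are consecutive in the chain; that convention is an assumption, chosen because it is consistent with the energies -2 and -4 quoted for HHPPPPPPPHH.

```python
def hp_energy(sequence, coords):
    """Energy of an H/P chain placed on a square grid.

    sequence: string of 'H' and 'P'.
    coords:   list of (x, y) grid points, one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one grid point per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("two residues occupy the same grid point")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be grid neighbors")

    residue_at = dict(zip(coords, sequence))
    energy = 0
    for (x, y), residue in residue_at.items():
        if residue != "H":
            continue
        # Look only right and up so each neighboring pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            if residue_at.get(neighbor) == "H":
                energy -= 1
    return energy


sequence = "HHPPPPPPPHH"

# Chain laid out in a straight line.
straight = [(i, 0) for i in range(len(sequence))]

# Chain bent so that the four H residues form a 2 x 2 block.
compact = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2),
           (1, 3), (0, 3), (0, 2), (0, 1), (1, 1)]

print(hp_energy(sequence, straight))  # -2
print(hp_energy(sequence, compact))   # -4
```

Four H residues can share at most four neighboring pairs on a square grid, so under this convention -4 is the lowest energy HHPPPPPPPHH can reach.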
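
The formula for the matrix D in the NMR slides can be checked numerically. With the last atom at the origin, d(i,n) is the length of the position vector of atom i, and expanding the squared distance between atoms i and j gives their dot product as (d(i,n)^2 - d(i,j)^2 + d(j,n)^2) / 2, which is the (i,j) entry of X times its transpose. The Python sketch below uses made-up coordinates for four atoms, and its helper names are hypothetical. It only verifies this identity; it does not attempt the harder inverse problem of recovering X from D.

```python
import math

# Hypothetical coordinates for four atoms; the last atom sits at the origin.
X = [
    (1.0, 2.0, 0.5),
    (-0.5, 1.5, 2.0),
    (2.0, -1.0, 1.0),
    (0.0, 0.0, 0.0),
]
n = len(X)


def distance(i, j):
    """d(i, j): Euclidean distance between atoms i and j (0-based indices)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))


def d_entry(i, j):
    """Entry (i, j) of D, computed from distances alone."""
    last = n - 1
    return (distance(i, last) ** 2 - distance(i, j) ** 2
            + distance(j, last) ** 2) / 2


def dot(u, v):
    """Dot product of two coordinate triples."""
    return sum(a * b for a, b in zip(u, v))


# D built from distances only, and X X^T built from the coordinates.
D = [[d_entry(i, j) for j in range(n)] for i in range(n)]
XXT = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]

max_difference = max(
    abs(D[i][j] - XXT[i][j]) for i in range(n) for j in range(n)
)
print(max_difference < 1e-9)  # True
```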