CS882, Fall 2006
1001 Stories of Protein Folding
Ming Li
School of Computer Science, University of Waterloo

By the time I finish telling these protein stories, I hope we know better how to fold them by computers.

Prelude: Why should you care?
Through 3 billion years of evolution, nature has created an enormous number of protein structures for different biological functions.
Understanding these structures is key to proteomics.
Fast computation of protein structures is one of the most important unsolved problems in science today. Much more important than, for example, the P≠NP conjecture.
We now have a real chance to solve it.

This course:
I do ½ of the course, so that we understand everything about proteins.
You do ½ of the course, presenting all methods for protein folding. 50% of the marks.
You do a final project designing your own method for folding proteins. 50% of the marks.

Proteins: the life story
Proteins are building blocks of life.
In a cell, 70% is water and 15%-20% is protein.
Examples:
  hormones: regulate metabolism
  structures: hair, wool, muscle, ...
  antibodies: immune response
  enzymes: chemical reactions

Sickle-cell anemia: the hemoglobin protein is made of 4 chains, 2 alpha and 2 beta. A single mutation from Glu to Val occurs at residue 6 of the beta chain. The trait is recessive. Homozygotes die, but heterozygotes have resistance to malaria, hence the mutation had some evolutionary advantage in Africa.

[Figure: DNA (A, C, G, T) is transcribed into mRNA (A, C, G, U), which is translated into protein (20 amino acids); reverse transcription of mRNA gives cDNA.]
Human: 3 billion bases, 30k genes.
E. coli: 5 million bases, 4k genes.
Codon: three nucleotides encode an amino acid. There are 64 codons for 20 amino acids, so some amino acids have more than one codon.

Proteins are built from 20 amino acids and fold in space into functional shapes.
Several polypeptide chains can form more complex structures.

What happened in sickle-cell anemia
Hemoglobin: the mutation to valine creates a hydrophobic patch on the surface.
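The codon bookkeeping and the single Glu to Val substitution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 27-base DNA fragment is an assumed beta-globin-like coding sequence chosen for the example (it is not taken from the slides), the function names are hypothetical, and residue 6 is counted without the initial Met, so the change falls in the 7th codon of the fragment.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed with the third base varying fastest (TCAG order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna):
    """Coding-strand DNA -> mRNA: the same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


def codons_per_amino_acid():
    """Count how many of the 64 codons encode each amino acid ('*' = stop)."""
    counts = {}
    for aa in CODON_TABLE.values():
        counts[aa] = counts.get(aa, 0) + 1
    return counts


# Assumed illustrative fragment: start codon followed by eight codons.
NORMAL_DNA = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Point mutation A -> T in the middle base of the 7th codon (GAG -> GTG).
SICKLE_DNA = NORMAL_DNA[:19] + "T" + NORMAL_DNA[20:]

if __name__ == "__main__":
    print(translate(transcribe(NORMAL_DNA)))  # MVHLTPEEK
    print(translate(transcribe(SICKLE_DNA)))  # MVHLTPVEK
    print(codons_per_amino_acid()["L"])       # 6
```

A one-base change turns a charged glutamic acid (E) into a hydrophobic valine (V), which is the substitution the slide describes. The codon counts also show the degeneracy noted above: 64 codons map onto 20 amino acids plus stop signals.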
Amino acids stories
About 500 amino acids are known in nature. Only 20 (22) are used in proteins.
The first amino acid was discovered in asparagus in 1806, hence the name asparagine.
All 20 amino acids in proteins had been discovered by 1935.
Traces of glycine, alanine, etc. were found in a meteorite in Australia in 1969. This led to the conjecture that life has an extraterrestrial origin.

20 amino acids: the boring part
Hydrophobic amino acids (neutral, non-polar): Alanine, Valine, Phenylalanine, Proline, Methionine, Isoleucine, Leucine
Charged amino acids: Aspartic acid, Glutamic acid, Lysine, Arginine
Polar amino acids: Serine, Threonine, Tyrosine, Histidine, Cysteine, Asparagine, Glutamine, Tryptophan
Polar: a molecule with one positively and one negatively charged end, e.g. H2O is polar, oil is non-polar.
Simplest amino acid: Glycine

Why do proteins fold? Some philosophy
Taken alone, folding a protein is thermodynamically unfavorable because it reduces the disorder, or entropy, of the protein. So why do proteins fold?
One of the most important factors driving the folding of a protein is the interaction of polar and nonpolar side chains with the environment. Nonpolar (water-hating) side chains tend to push themselves to the inside of a protein, while polar (water-loving) side chains tend to place themselves on the outside of the molecule. In addition, other noncovalent interactions, including electrostatic and van der Waals interactions, make the folded protein slightly more stable than the unfolded one.
When oil, a nonpolar, hydrophobic molecule, is placed into water, the two push each other away. Since proteins have nonpolar side chains, their reaction in a watery environment is similar to that of oil in water. The nonpolar side chains are pushed to the interior of the protein, allowing them to avoid water molecules and giving the protein a globular shape. There is, however, a substantial difference in how the polar side chains react to the water.
The polar side chains place themselves on the outside of the protein molecule, which allows them to interact with water molecules by forming hydrogen bonds. Folding increases entropy by placing the nonpolar side chains on the inside, which in turn compensates for the decrease in entropy as hydrogen bonds form between the polar side chains and water molecules.

1-letter labels and how to remember them
If only one amino acid begins with a letter, that letter is used:
  C = Cys = Cysteine
  H = His = Histidine
  I = Ile = Isoleucine
  M = Met = Methionine
  S = Ser = Serine
  V = Val = Valine
Otherwise the letter is assigned to the more frequent one:
  A = Ala = Alanine
  G = Gly = Glycine
  L = Leu = Leucine
  P = Pro = Proline
  T = Thr = Threonine
The losers try phonetically:
  F = Phe = Phenylalanine
  R = Arg = Arginine
  Y = Tyr = Tyrosine
  W = Trp = Tryptophan (double ring)
When everything fails:
  D = Asp = Aspartic acid
  N = Asn = Asparagine
  E = Glu = Glutamic acid
  Q = Gln = Glutamine
  K = Lys = Lysine

They really look all the same
One amino acid: the difference is only in the side chain R.
Many amino acids connect into a polypeptide chain, losing H2O at each link.

The amino acids are connected to form polypeptide chains
The chain runs from the N terminal to the C terminal.
One water molecule (H2O) is lost when each peptide bond forms.
The peptide bond is planar and rigid, with known bond distances and angles.

They could have been different
L-form vs D-form: looking down the H-Cα bond from H, the groups of the L-form read CO, R, N (CORN); those of the D-form read N, R, CO. The two forms are mirror images.
All amino acids that occur in proteins have the L-form. It is unclear why the D-form was not chosen.
In functioning proteins, only the L-form occurs.
In nature, the L- and D-forms occur with equal chance.
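The one-letter labels and the water lost per peptide bond can be sketched as a small lookup table with helper functions. This is a minimal illustration: the function names are hypothetical, and the water count assumes a single linear (non-cyclic) chain.

```python
# One-letter to three-letter codes for the 20 standard amino acids.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq):
    """'MVH' -> 'Met-Val-His', written from N terminal to C terminal."""
    return "-".join(ONE_TO_THREE[aa] for aa in seq.upper())


def to_one_letter(three_letter_seq):
    """'Met-Val-His' -> 'MVH' (case-insensitive, dash-separated input)."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in three_letter_seq.split("-"))


def waters_released(n_residues):
    """A linear chain of n residues has n - 1 peptide bonds, each releasing one H2O."""
    return max(n_residues - 1, 0)


if __name__ == "__main__":
    print(to_three_letter("MVHLTPEEK"))
    print(to_one_letter("Met-Val-His"))
    print(waters_released(len("MVHLTPEEK")))
```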
Story of cysteines
Two cysteine residues in different (non-adjacent) parts of a protein sequence can be oxidized to form a disulfide bridge, the end product of air oxidation:
  2 cysteines + ½ O2 = 2 linked cysteines + H2O
Disulfide bridges have two functions:
  stabilizing a single protein fold
  linking two chains (e.g. the A and B chains of insulin)
[Figure: disulfide bond between two cysteines. Cysteine side chain: HS-CH2-, with the final bond (drawn in red) connecting to Cα.]
Note: We will not study amino acids one by one, but we will study their structures when we meet them.

The Φ and Ψ angles
The angle at the N-Cα bond is the Φ angle.
The angle at the Cα-C' bond is the Ψ angle.
No side chain is involved (the side chain is attached at Cα).
These angles determine the backbone structure.

The Ramachandran plot
Red: good. Yellow: OK. White: forbidden, except for glycine.
L-amino acids cannot form a long left-handed helix, but Gly (also Asn, Asp) can form a short left-handed helix, with the side chain forming a hydrogen bond with the main chain.

The story of Glycine
Glycine has no side chain (just H), so it can adopt phi and psi angles in all 4 quadrants of the Ramachandran plot. Thus, it frequently occurs in turn regions of proteins, where any other residue would be sterically hindered.
[Figure: glycine side chain, a single H.]

Staggered carbon atoms for side chains
Ethane (CH3CH3): the staggered arrangement is most favorable, as are its 120° rotations; the aligned arrangement is too crowded.
Valine: conformation (b) is more favorable, being the least crowded.
[Figure labels: Cβ, Cα]
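The Φ and Ψ angles above are dihedral (torsion) angles, each defined by four consecutive backbone atoms: Φ by C'(i-1), N, Cα, C' and Ψ by N, Cα, C', N(i+1). A minimal sketch of that computation follows, assuming Cartesian coordinates given as 3-tuples and the usual sign convention (angles in degrees between -180 and 180). The helper names are hypothetical and the coordinates in the usage lines are made up for illustration, not taken from a real structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi: rotation about the N-CA bond."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi: rotation about the CA-C' bond."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Made-up coordinates: the outer atoms sit at right angles around the z axis.
    print(phi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    # Made-up coordinates: the outer atoms sit on opposite sides (trans).
    print(psi((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

Computing the (Φ, Ψ) pair for every residue of a chain and plotting Φ against Ψ gives the Ramachandran plot. Side chains never enter the calculation, which matches the statement that only backbone atoms are involved.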