Computational Biology 1
Protein Structure, Stability and Folding
Guna Rajagopal, Bioinformatics Institute

References:
• Alberts et al., Molecular Biology of the Cell, 4th Ed., pp. 129–190
• Branden and Tooze, Introduction to Protein Structure, 2nd Ed.

The Big Picture
Genome projects are providing parts lists for the genetic and protein components of the cellular circuitry. Bioinformatic analysis of these data assigns protein function (and sometimes structure) by homology and partially identifies regulatory sites on DNA and functional RNAs. Partial networks can be constructed by homology to known biochemical networks. Genetic defects that lead to disease can also be identified at this level, and evolutionary relationships among organisms can be calculated from the same data.

Structural biology provides experimental data on the three-dimensional structure of biomolecules, together with computational approaches for predicting structure from sequence and for predicting biomolecular recognition. Both static and dynamic models of biomolecular interactions are the basis for rational drug design and for automated prediction of biochemical reaction networks.

[Figure: regulatory network controlling commitment to sporulation in Bacillus subtilis (σA, σH, Spo0A, AbrB, SinR/SinI, KinA/KinB, Spo0F, Spo0B, RapA/PhrA, Spo0E)]

Biochemical and genetic network analysis integrates data from all the steps above to predict cellular system function. Such analyses provide insight into how cells process and act upon complex external and internal signals. These networks are the fundamental control mechanisms that:
1) lead to partial penetrance of genotype and maintenance of population heterogeneity,
2) determine the reliability of cellular function and the propensity for disease given partial failure of a network component,
3) govern adaptation of pathogens to pharmaceutical attack, and
4) may provide the basis for reversal of developmental defects and early detection of cellular control failure.
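Point 2) above (reliability of cellular function given partial failure of a network component) can be made concrete with a deliberately tiny model. The sketch below is a toy written for illustration, not a model from this course: it assumes the phosphorelay named in the sporulation network (KinA and KinB passing phosphate to Spo0F, then Spo0B, then Spo0A) can be treated as a plain directed graph, and it ignores phosphatases such as RapA and Spo0E, transcriptional feedback and all kinetics. `PHOSPHORELAY`, `signal_reaches` and `single_knockouts` are hypothetical names. The question it answers is: which single-component failures cut the signal to Spo0A?

```python
from collections import deque

# Toy, simplified wiring of the sporulation phosphorelay (an assumption):
# phosphate flows from the sensor kinases through Spo0F and Spo0B to Spo0A.
PHOSPHORELAY = {
    "KinA": ["Spo0F"],
    "KinB": ["Spo0F"],
    "Spo0F": ["Spo0B"],
    "Spo0B": ["Spo0A"],
    "Spo0A": [],
}


def signal_reaches(graph, sources, target, knocked_out=frozenset()):
    """Breadth-first search: can any source still reach target once the
    knocked-out components are removed from the network?"""
    queue = deque(s for s in sources if s not in knocked_out)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in knocked_out:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_knockouts(graph, sources, target):
    """List the components whose loss, on its own, cuts the signal."""
    return [node for node in graph
            if node != target
            and not signal_reaches(graph, sources, target, {node})]


sources = ["KinA", "KinB"]
print(single_knockouts(PHOSPHORELAY, sources, "Spo0A"))  # ['Spo0F', 'Spo0B']
```

With two redundant kinases, losing either KinA or KinB still leaves a path, whereas Spo0F and Spo0B are single points of failure in this toy wiring. A realistic reliability analysis would need quantitative, dynamic models rather than simple reachability.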
Systems Biology
Ultimately, integration of genomic data and genome-derived data (such as data from gene chips), structural and molecular dynamics data, and network functional analyses will lead to a quantitative understanding of differential developmental processes and, finally, a full tracing of the molecular basis of development from fertilized egg to adult organism.

[Figure: adult organism, 1.5 mm long, ~1000 cells]

Why Are the Quantitative Sciences Important to Biology?
Many of the technological innovations that allowed us to peer more closely into the workings of living systems involve or require physical, mathematical, and computational techniques:
• Enzymology & metabolism
• System-level understanding of regulation
• Pattern formation and development
• Protein structure & function
• Transport (flagellar motors, ion pumps)
• Mechanisms of mutation and heredity
• Gel electrophoresis
• NMR/X-ray structural analysis and imaging
• Sequence assembly and analysis
• Mass spectrometry in proteomics
• Control theory
• Neuronal signalling and modeling
• Stochastic processes
• Machine learning and knowledge discovery
• Data mining
• Adaptive complex systems theory

Biology in the High-Throughput Era
Levels: genomes, gene products, pathways & physiology, structure & function, populations & evolution, ecosystems.
• Scientific challenges
• Algorithmic challenges
• Data integration challenges
• Computational challenges

Recent Nobel prizes in medicine went to discoveries that had profound physical implications for cellular function.

1997: Discovery of Prions
A prion is an infectious agent that has no genetic material. Unlike most proteins, it can fold into more than one structure.
One of the structures is “healthy”; the other forms long filaments that disrupt cellular function. Indeed, the unhealthy form catalyzes conversion of the healthy one!
Questions:
1) How can a protein have two stable “lowest-energy states”?
2) What is the rate of interconversion between them?
3) What does the autocatalysis do?
4) How much interconversion leads to disease?
5) Under what conditions can the disease be transmitted?

Biology after the Genome
• Great effort (and money) has gone into sequencing the human (and other) genomes. The idea was that once we found all the parts of the cellular program, we would know how cells function.
• In fact, you need to know the physical behavior of each part, and how the parts interact among themselves and with the environment, in order to determine the behavior of the whole.
• We are now at a point where physical, mathematical and computational approaches to integrating the available biological data are necessary.

Bacteria Have ~10¹⁰ Molecules
Component | Percent of total cell weight | Number of types of each molecule
Water | 70 | 1
Inorganic ions | 1 | 20
Sugars and precursors | 1 | 250
Amino acids and precursors | 0.4 | 100
Nucleotides and precursors | 0.4 | 100
Fatty acids and precursors | 1 | 50
Other small molecules | 0.2 | ~300
Macromolecules (proteins, nucleic acids, and polysaccharides) | 26 | ~3000
Total | 100 | ~4000

Complex Behaviors of Living Systems
• Myxococcus xanthus colony undergoing traveling-wave self-organization on its way to sporulation.
• Human neutrophil tracking a Staphylococcus.
• Drosophila melanogaster embryo developing.

The Grand Challenges
• Improve in vitro macromolecular synthesis.
• Conceptually link atomic (mutational) changes to population evolution (via molecular & systems modeling).
• Novel polymers for smart materials, mirror-enzymes & drug selection.
• Model combinations of external signals & genome programming on expression.
• Manipulate stem-cell fate & stability.
• Engineer reduction of mutation & cancerous proliferation.
• Programmed cells to replace or augment (low-toxicity) drugs.
• Programming of cell and tissue morphology.
• Quantitate robustness & evolvability.
• Engineer sensor-effector feedback networks where macro-morphologies determine the functions; past (Darwinian) or future (prosthetic).

Protein Structure Overview
• Proteins are biopolymers and are among the main building blocks from which all cells are built.
• They execute nearly all cell functions: they act as enzymes, channels/pumps, signal and message carriers, molecular machines, antibodies, toxins, hormones, elastic fibers, etc.
• They influence how our bodies function (or malfunction!).
• Their structure (or conformation) under physiological conditions governs their function.

Diversity of Protein Structures
The diversity of viable proteins has been constrained by natural selection to give:
• desired function
• adequate stability
• foldability
• evolvability from appropriate evolutionary precursors.

Levinthal’s Paradox and Folding Pathways

Overview of Protein Function

Protein Architecture
Protein architecture is based on three principles:
• Formation of a polypeptide chain
• Folding of this chain into a compact, function-enabling structure (i.e. the native structure)
• Post-translational modification of the folded structure

Proteins Are Chains of Amino Acids
• Polymer: a molecule composed of repeating units

Views of a Protein
[Figures: wireframe; ball and stick]
See the PDB website. These figures can also be produced with RasMol.

Views of a Protein
[Figures: spacefill; cartoon]
CPK colors:
• Carbon = green, black, or grey
• Nitrogen = blue
• Oxygen = red
• Sulfur = yellow
• Hydrogen = white

[Figure: tight packing inside a protein molecule]

Different Degrees of Mobility within a Protein Molecule
Fluctuations of portions of the structure as seen in molecular dynamics (MD) simulations. Some parts of the protein seem more mobile than others.

Shape and Structure of Proteins
• The shape of a protein is specified by its amino acid (AA) sequence.
• Proteins are made up of 20 different AAs, each linked to its neighbour by a covalent peptide bond.
• The 3-D shape (conformation) of a protein influences its function.

AA Sequence and Structure
Anfinsen’s experiment (see any biochemistry textbook)

Amino Acid Composition
• Basic amino acid structure: a central carbon (Cα) carries a hydrogen atom, an amino group, a carboxyl group and a side chain R, i.e. H2N–CαH(R)–COOH.
  – The side chain, R, varies for each of the 20 amino acids.
As part of a polypeptide chain, the side chains differ in their tendency to interact with one another and with water, owing to their different electrical properties and sizes (steric effects). This influences the final conformation.

Side Chain Properties
• The electronegativity of carbon is about in the middle of the scale for light elements.
  – Carbon does not readily form hydrogen bonds with water: hydrophobic.
  – O and N are generally more likely than C to hydrogen-bond to water: hydrophilic.
• We group the amino acids into three general groups:
  – Hydrophobic
  – Charged (positive/basic & negative/acidic)
  – Polar

The Hydrophobic Amino Acids
They engage only in van der Waals interactions, and their tendency to avoid water is the basis for the hydrophobic effect. Proline severely limits allowable conformations!

The Charged Amino Acids

The Polar Amino Acids
Able to make hydrogen bonds to one another, to the peptide backbone and to water.

More Polar Amino Acids

And then there’s…

The Peptide Bond
Convention: start at the amino terminus and proceed to the carboxy terminus.

Polypeptides
• Amino acids linked in a chain form a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
• Since part of each amino acid is lost during dehydration synthesis (one water molecule per peptide bond formed), we call the units of a protein amino acid residues.

Interactions Responsible for Stability of Polypeptides

Protein Stability
• High temperatures break the weak bonds that stabilize the native state, eventually converting it to the denatured state.
• The denatured state is identified by loss of biochemical activity.
• Because the free energy difference between the denatured and native states is so small, a single mutation can cause a stable protein to unfold. Conversely, a few additional interactions can increase stability, e.g. Taq DNA polymerase used in PCR.
• Thermophilic proteins retain their structure and activity at high temperatures (e.g. those found in microorganisms that live in thermal vents in the deep ocean).

Post-translational Modifications
(An important process that determines protein function)

Primary & Secondary Structure
• Primary structure: the linear sequence of amino acids comprising a protein, e.g. AGVGTVPMTAYGNDIQYYGQVT…
• Secondary structure:
  – Regular patterns of hydrogen bonding in proteins give rise to two patterns found in nearly every known protein structure: the α-helix and the β-sheet.
  – The location and direction of these periodic, repeating structures is known as the secondary structure of the protein.

Secondary Structure
Hydrogen bonds between backbone groups of the amino acids provide stability.

Planarity of the Peptide Bond
• Psi (ψ): the angle of rotation about the Cα–C bond.
• Phi (φ): the angle of rotation about the N–Cα bond.
• The planar bond angles and bond lengths are fixed.
[Figure: the angles psi and phi]

The Alpha Helix

Properties of the Alpha Helix
• φ ≈ ψ ≈ −60°
• Hydrogen bonds between the C=O of residue n and the N–H of residue n+4
• 3.6 residues/turn
• 1.5 Å rise/residue
• 100° turn/residue
• 4 to 40+ residues in length
• Often amphipathic or “dual-natured”:
  – half hydrophobic and half hydrophilic
  – mostly when surface-exposed
• If we examine many α-helices, we find trends:
  – Helix formers: Ala, Glu, Leu, Met
  – Helix breakers: Pro, Gly, Tyr, Ser

The Beta Strand (& Sheet)
φ ≈ −135°, ψ ≈ +135°

Properties of Beta Sheets
• Formed of stretches of 5–10 residues in extended conformation
• Pleated: each Cα is a bit above or below the previous one
• Parallel/antiparallel, contiguous/non-contiguous

The Ramachandran Plot
• G. N. Ramachandran: first calculations of the sterically allowed regions of phi and psi

Experimental Observation of Secondary Structures via Circular Dichroism
Effect of secondary structure on polarized light.
[Figure: computed CD spectra of poly(Lys) in alpha-helix, beta-sheet and random-coil conformations]

Turns and Loops
• Secondary structure elements are connected by regions of turns and loops.
• Turns: short regions of non-α, non-β conformation.
• Loops: larger stretches with no secondary structure; often disordered.
  – “Random coil”
  – Sequences vary much more than in secondary structure regions.

Secondary Structure Prediction
All the prediction schemes seem to agree on the approximate location of alpha helices and beta strands but disagree considerably on their lengths and end positions. Loops and turns are predicted very inconsistently. Applying many methods is more informative than relying on a single one.
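The closing point, that several prediction methods together are more informative than one, can be sketched as a per-residue majority vote. This is one simple way of combining predictions, assumed here for illustration rather than taken from any particular tool; the three prediction strings are hypothetical, written in a three-state alphabet (H = helix, E = strand, C = coil), and `consensus` is a hypothetical helper. Alongside the consensus it reports, for each residue, the fraction of methods that agree, which shows where the methods disagree.

```python
from collections import Counter


def consensus(predictions):
    """Combine per-residue secondary-structure predictions by majority vote.

    predictions: equal-length strings over H (helix), E (strand), C (coil).
    Returns (consensus_string, agreement) where agreement[i] is the fraction
    of methods that voted for the consensus state at residue i.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    states, agreement = [], []
    for column in zip(*predictions):  # one column = one residue
        # most_common orders equal counts by first occurrence, so ties go
        # to whichever method is listed first.
        state, votes = Counter(column).most_common(1)[0]
        states.append(state)
        agreement.append(votes / len(column))
    return "".join(states), agreement


# Hypothetical outputs of three prediction methods for one 16-residue segment.
preds = [
    "CCHHHHHHCCCEEEEC",
    "CHHHHHHHHCCEEECC",
    "CCHHHHHCCCEEEEEC",
]
cons, agreement = consensus(preds)
uncertain = [i for i, a in enumerate(agreement) if a < 1.0]
print(cons)       # CCHHHHHHCCCEEEEC
print(uncertain)  # [1, 7, 8, 10, 14]
```

In this made-up example every residue with agreement below 1.0 sits at the end of a helix or strand, mirroring the observation that methods agree on approximate locations but disagree on lengths and end positions.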