Spring 2006: http://www.stanford.edu/class/cs273/

CS273 Algorithms for Structure and Motion in Biology
Instructors: Serafim Batzoglou and Jean-Claude Latombe
Teaching Assistant: Sam Gross
Emails: serafim@cs.stanford.edu, latombe@cs.stanford.edu, ssgross@cs.stanford.edu
Need a Scribe!!

Range of Bio-CS Interaction
Enormous range over space and time.
[Figure: scales from body system, tissue/organs, cells, and molecules down to genes, with example problems: sequence alignment; robotic surgery; soft-tissue simulation and surgical training; simulation of cell interaction; molecular structures, similarities and motions]

CS273 Focus on Proteins
Proteins are the workhorses of all living organisms. They perform many vital functions, e.g.:
- Catalysis of reactions
- Transport of molecules
- Building blocks of muscles
- Storage of energy
- Transmission of signals
- Defense against intruders

Proteins are also of great interest from a computational viewpoint:
- They are large molecules (a few hundred to several thousand atoms).
- They are made of building blocks (amino acids) drawn from a small "library" of 20 amino acids.
- They have an unusual kinematic structure: a long serial linkage (the backbone) with short side chains.

Proteins are associated with many challenging problems:
- Predict folded structures and motion pathways
- Understand why some proteins misfold or only partially fold, causing diseases such as cystic fibrosis, Parkinson's, and Creutzfeldt-Jakob disease (related to mad cow disease)
- Find structural similarities among proteins and classify proteins
- Find functional structural motifs in proteins
- Predict how proteins bind to other proteins and to smaller molecules
- Design new drugs
- Engineer and design proteins and protein-like structures (polymers)

Central Dogma of Molecular Biology
[Figure: DNA → RNA (transcription) → protein (translation)]

Protein Sequence
[Figure: backbone atoms of consecutive residues, starting at residue i-1, with the peptide bond highlighted]
- A long sequence of amino acids (dozens to thousands), also called residues
- A dictionary of 20 amino acids (several billion years old)
- Peptide bond (partial double-bond character)

Central Dogma of Molecular Biology
Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

Levels of Protein Structures
[Figure: structure levels, including quaternary structure (hemoglobin, 4 polypeptide chains), and example proteins that are mostly α-helices, mostly β-sheets, or mixed]

Folding
[Figure: unfolded (denatured) state → intermediate states → folded (native) state, along many pathways]

How (we think) a protein folds ...
ΔG = ΔH − TΔS
http://www-shakh.harvard.edu/ProFold2.html

Motion of Proteins in Folded State
[Figure panels: HIV-1 protease; structural variability of the overall ensemble of native ubiquitin structures [Shehu, Kavraki, Clementi, 2005]; flexible loop (loop 7) of amylosucrase]

Central Dogma of Molecular Biology
Binding
[Figure panels: inhibitor binding to HIV protease; ligand-protein binding; protein-protein binding]

Binding of Pyruvate to LDH (reduction of pyruvate to lactate)
[Figure: pyruvate and the coenzyme NADH (nicotinamide adenine dinucleotide) in the lactate dehydrogenase environment, with a loop and residues GLN-101, ARG-106, ASP-166, ARG-169, HIS-193, ASP-195, THR-245]

What is CS273 about?
- Algorithms and computational schemes for molecular biology problems
- Molecular biology seen by computer scientists

The Shock of Two Cultures
y = f(x)
- Biologists like experiments, specifics, and classifications. They would rather know many (x_i, y_i) pairs, i.e., facts, and classify them than know f.
- Computer scientists like simulation, abstractions, and general algorithms. They want to know f (the explanation of the facts) and efficient ways to compute it, but rarely care about any particular (x_i, y_i).
One challenge of computational biology is to fuse these two cultures.

Two Views of a BioComputation Class
- Where IT resources for biology are available and how to use them
- How to design efficient data structures and algorithms for biology

Main Ideas Behind CS273
1. The information is in the sequence
   - Sequence → Structure (shape) → Function
   - Sequence similarity → Structural/functional similarity
   - Sequences are related by evolution
2. Biomolecules move and bind to achieve their functions
   - Deformation: folded structures of proteins
   - Motion + deformation: multi-molecule complexes
   One cannot just "jump" from sequence to function.
[Figure: Sequence → Structure via protein folding, Structure → Function via ligand-protein binding, with sequence similarity and structure similarity compared at each level]

CS273 is about algorithms for sequence, structure, and motion:
- Finding sequence and shape similarities
- Relating structure to function
- Extracting structure from experimental data
- Computing and analyzing motion pathways

Vision Underlying CS273
Goal of computational biology: low-cost, high-bandwidth in-silico biology
Requirements:
- Reliable models
- Efficient algorithms
Algorithmic efficiency comes from exploiting properties of molecules and processes:
- Proteins are long kinematic chains
- Atoms cannot bunch up together
- Forces have relatively short ranges
Computational biology is more than applying computers to biological problems or mimicking nature (e.g., performing molecular dynamics (MD) simulation).

Tentative Schedule
 #  Date      Topic
 1  April 5   Introduction
 2  April 10  Protein geometric and kinematic models
 3  April 12  Conformational space
 4  April 17  Inverse kinematics and applications
 5  April 19  Sequence similarity
 6  April 24  Sequence similarity
 7  April 26  Sequence similarity
 8  May 1     Structure comparison
 9  May 3     Structure comparison
10  May 8     Protein phylogeny, clustering, and classification
11  May 10    Protein phylogeny, clustering, and classification
12  May 15    Energy maintenance
13  May 17    Energy maintenance
14  May 22    Structure prediction
15  May 24    Roadmap methods
16  May 31    Structure prediction
17  June 5    Structure prediction
18  June 7    TBA
19  June 12   Project presentations (2 hours)

Instructors and TAs
Instructors:
- Serafim Batzoglou
- Jean-Claude Latombe
TA:
- Sam Gross
Emails: serafim@cs.stanford.edu, latombe@cs.stanford.edu, ssgross@cs.stanford.edu
Class website: http://cs273.stanford.edu

Expected Work
- Regular attendance at lectures and active participation
- Class scribing (assignments will depend on the number of students)
- Exciting programming project: http://www.stanford.edu/class/cs273/project/project.html
  - Structure prediction
  - Clustering and distance metrics
  - Protein design
  - Something else

Questions?
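The Central Dogma slides show transcription and translation only as figure labels. The two steps can be sketched in a few lines of Python under stated assumptions: the input DNA is taken to be the coding strand (so transcription reduces to replacing T with U), and `CODON_TABLE`, `transcribe`, and `translate` are hypothetical helpers whose table holds only a handful of the 64 codons of the standard genetic code.

```python
# Hypothetical, deliberately partial codon table (mRNA codon -> one-letter amino acid).
# The real standard genetic code has 64 codons; only a few are listed here.
CODON_TABLE = {
    "AUG": "M",              # methionine (also the usual start codon)
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G",              # glycine
    "GCU": "A",              # alanine
    "AAA": "K",              # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (assumption: coding, not template, strand)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons missing from the partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG
```

A real translator would use the full codon table and search for the start codon instead of assuming the sequence begins with it.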
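The folding slide's relation ΔG = ΔH − TΔS can be made concrete with a short calculation. This is a minimal sketch that assumes hypothetical, temperature-independent values of ΔH and ΔS for the unfolded-to-folded transition (real values vary with temperature and from protein to protein); the temperature is the 37°C from the physiological-conditions slide. A negative ΔG means the folded state is favored.

```python
def gibbs_free_energy(delta_h: float, delta_s: float, temperature_k: float) -> float:
    """ΔG = ΔH - TΔS, with ΔH in kJ/mol, ΔS in kJ/(mol·K), and T in kelvin."""
    return delta_h - temperature_k * delta_s


# Hypothetical values for unfolded -> folded, for illustration only.
DELTA_H = -200.0         # kJ/mol: negative, so folding releases heat
DELTA_S = -0.6           # kJ/(mol·K): negative, so the chain loses entropy on folding
T_BODY = 273.15 + 37.0   # 37 °C expressed in kelvin

dg = gibbs_free_energy(DELTA_H, DELTA_S, T_BODY)
print(f"ΔG at 37 °C: {dg:.2f} kJ/mol")  # ΔG at 37 °C: -13.91 kJ/mol

# Temperature at which ΔG = 0, i.e. folded and unfolded states are equally stable
# (valid only under the constant-ΔH, constant-ΔS assumption above).
t_balance = DELTA_H / DELTA_S
print(f"ΔG = 0 at {t_balance:.1f} K")   # ΔG = 0 at 333.3 K
```

The enthalpy term favors folding while the entropy term opposes it; above the balance temperature the −TΔS term wins and the unfolded state becomes favored.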
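Sequence similarity underlies the main idea "Sequence similarity → structural/functional similarity" and is scheduled for three lectures. One standard way to score it is global alignment by dynamic programming (Needleman-Wunsch). The sketch below is an illustration, not the course's method: `global_alignment_score` is a hypothetical helper using unit scores (match +1, mismatch −1, gap −1) instead of an amino-acid substitution matrix such as BLOSUM62, and it returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


# Two short, made-up peptide strings differing in one residue (I vs L).
print(global_alignment_score("MKTAYIAK", "MKTAYLAK"))  # 6
```

The table has (len(a)+1) × (len(b)+1) cells, so the running time is proportional to the product of the sequence lengths.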
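The "Vision" slide notes that atoms cannot bunch up together and that forces have relatively short ranges. Together these properties let interacting atom pairs be found without testing all n² pairs: hash each atom into a uniform grid whose cell size equals an interaction cutoff, then compare each atom only with atoms in its own cell and the 26 adjacent cells. This is a sketch of that standard cell-grid idea, not necessarily how the course implements it; `neighbor_pairs` and the cutoff values are hypothetical, and coordinates are plain (x, y, z) tuples.

```python
import math
from collections import defaultdict
from itertools import product


def neighbor_pairs(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, whose distance is <= cutoff."""
    # Hash every atom into a cubic cell of side `cutoff`.
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)

    pairs = []
    for (cx, cy, cz), members in grid.items():
        # Atoms within `cutoff` of each other lie in the same or adjacent cells.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    # The i < j test keeps each unordered pair exactly once.
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        pairs.append((i, j))
    return sorted(pairs)


atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (5.5, 5.0, 5.0)]
print(neighbor_pairs(atoms, 1.5))  # [(0, 1), (2, 3)]
```

Because atoms cannot bunch up, each cell holds a bounded number of atoms, so the work grows roughly linearly with the number of atoms rather than quadratically.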