Structural Bioinformatics Seminar
Dina Schneidman

Outline
• Seminar requirements
• Biological introduction
• How to prepare a seminar lecture

Seminar Requirements
• No prior knowledge of biology is assumed or required!
• Attend ALL lectures.
• Prepare one of the lectures.

Seminar Goals
• Learn how to study a new subject from articles.
• Learn how to present work in Computer Science.

Biological Introduction

Schedule
• Introduction to molecular structure.
• Introduction to pattern matching.
• Introduction to protein structure alignment (comparison).
• Protein docking.

Small Ligands
Small organic molecules, composed of tens of atoms. Highly flexible: they can have many torsional degrees of freedom.

DNA: The Code of Life
DNA is a polymer. The monomer units of DNA are nucleotides: A, T, C, G. DNA is normally a double-stranded macromolecule.

RNA
RNA is a polymer too. The monomer units of RNA are nucleotides: A, U (instead of T), C, G. DNA serves as the template for the synthesis of RNA.

Protein
Protein is a polymer too. The monomer units of proteins are the 20 amino acids. Each amino acid is encoded by 3 RNA nucleotides (a codon).

Hemoglobin sequence:
VHLTPEEKSAVTALWGKVNVDEVGGEAL GRLLVVYPWTQRFFESFGDLSTPDAVMG NPKVKAHGKKVLGA FSDGLAHLDNLKGTFATLSELHXDKLHVD PENFRLLGNVLVCVLAHHFGKEFTPPVQ AAYQKVVAGVANA LAHKYH

The Central Dogma
Gene (DNA) → transcription → mRNA → translation → protein → symptoms (phenotype).
Cells express different subsets of the genes in different tissues and under different conditions.

The central dogma as sequences:
• DNA, alphabet {A, C, G, T}: a sequence of nucleic acids over a 4-letter alphabet. Base pairs: guanine-cytosine and thymine-adenine.
• mRNA, alphabet {A, C, G, U} (T → U): a sequence of nucleic acids over a 4-letter alphabet.
• Protein, alphabet {A, D, ..., Y}: a sequence of amino acids over a 20-letter alphabet.

Bioinformatics: Computational Genomics
• DNA mapping.
• Protein or DNA sequence comparisons.
• Exploration of huge textual databases.
In essence, one-dimensional methods and intuition.

Structural Bioinformatics (Structural Genomics)
• Elucidation of the 3D structures of biomolecules.
• Analysis and comparison of biomolecular structures.
• Prediction of biomolecular recognition.
Handles three-dimensional (3-D) structures: geometric computing (a methodology shared by computational geometry, computer vision, computer graphics, pattern recognition, etc.).

Protein Structural Comparison
Apoamicyanin (PDB 1aaj) vs. pseudoazurin (PDB 1pmy). Algorithmic solution: about 1 sec. (Fischer, Nussinov, Wolfson, ~1990).

Introduction to Protein Structure
Amino acids and the peptide bond.
• Cα atoms.
• Cβ: the first side-chain carbon (present in all amino acids except glycine).
Display styles: backbone or secondary-structure display; wire-frame or ribbon display; space-filling model.

Geometric Representation
A 3-D curve $\{v_i\}$, $i = 1, \dots, n$.

Secondary Structure
β-strands and sheets; hydrogen bonds.

The Holy Grail: Protein Folding
From sequence to structure. Even relatively primitive computational folding models have been proved NP-hard, already in the 2-D case.

Determination of Protein Structures
• X-ray crystallography
• NMR (nuclear magnetic resonance)
• EM (electron microscopy)
An NMR result is an ensemble of models (example: cystatin, PDB 1a67).

The Protein Data Bank (PDB)
An international repository of 3D molecular data. It contains the x-y-z coordinates of all atoms of the molecule, plus additional data.
http://pdb.tau.ac.il
http://www.rcsb.org/pdb/

Why bother with structures when we have sequences?
• In evolutionarily related proteins, structure is much better preserved than sequence.
• Structural motifs may predict similar biological function.
• Getting insight into protein folding.
• Recovering the limited (?) number of protein folds.

Applications
• Classification of protein databases by structure.
• Search for partial and disconnected structural patterns in large databases.
• Extracting structural information is difficult; we want to extract "new" folds.

Applications (continued)
• Speeding up drug discovery.
• Detection of structural pharmacophores in an ensemble of drugs (similar substructures in drugs acting on a given receptor: a pharmacophore).
• Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).

Object Recognition
Models stored in a database are recognized in a scene (Lamdan, Schwartz, Wolfson, "Geometric Hashing", 1988).

Protein Alignment = Geometric Pattern Discovery
• The superimposition pattern is not known a priori: this is pattern discovery.
• The recovered matching can be inexact.
• We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.

Geometric Task
Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce "large" superimpositions of corresponding 3-D points.

Geometric Task (continued)
Aspects:
• Object representation (points, vectors, segments)
• Object resemblance (distance function)
• Transformation (translations, rotations, scaling)
• Optimization technique

Transformations
(U is a 3×3 rotation matrix, t a translation vector, s a scale factor.)
• Translation: $T(x) = x + t$
• Translation and rotation, i.e. rigid motion (Euclidean transformation): $T(x) = Ux + t$
• Translation, rotation and scaling: $T(x) = s(Ux + t)$

Inexact Alignment
Simple case: two closely related proteins with the same number of amino acids.
Question: how should the alignment error be measured?

Superposition: Best Least Squares (RMSD, Root Mean Square Deviation)
Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$:
$$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert p_i - q_i \rVert^2}$$
Find a 3-D rigid transformation $T^*$ such that
$$\mathrm{rmsd}(T^*(P), Q) = \min_{T} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert T(p_i) - q_i \rVert^2}$$
A closed-form solution exists for this task, and it can be computed in O(n) time.

Problem Statement with the RMSD Metric
Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment (a set of matched elements and a transformation) with RMSD less than ε. (The problem belongs to NP.)

Docking Problem
Given two molecules, find their correct association.
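The slides state that the best least-squares superposition has a closed-form solution computable in O(n) time, but they do not say which one. The sketch below is one possible illustration, assuming plain Python lists of (x, y, z) tuples in which corresponding points share an index (the "simple case" of two proteins with the same number of amino acids). It uses Horn's unit-quaternion method; the Kabsch SVD method is another well-known closed form. The helper names (`centroid`, `rmsd`, `jacobi_eigen`, `superpose_rmsd`) and the toy coordinates are hypothetical, not taken from the seminar.

```python
import math

# Hypothetical helpers: a sketch of least-squares superposition, not the seminar's code.


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def rmsd(P, Q):
    """Plain RMSD between corresponding points, with no superposition."""
    n = len(P)
    return math.sqrt(sum((p[k] - q[k]) ** 2 for p, q in zip(P, Q) for k in range(3)) / n)


def jacobi_eigen(A, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix,
    computed with cyclic Jacobi rotations."""
    n = len(A)
    a = [list(map(float, row)) for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def superpose_rmsd(P, Q):
    """Minimum RMSD between P and Q over rigid motions T(x) = U x + t,
    via Horn's quaternion method. Returns (rmsd, U, t)."""
    cp, cq = centroid(P), centroid(Q)
    X = [[p[k] - cp[k] for k in range(3)] for p in P]
    Y = [[q[k] - cq[k] for k in range(3)] for q in Q]
    # S[a][b] = sum_i X_i[a] * Y_i[b]: the only O(n) pass over the points.
    S = [[sum(x[a] * y[b] for x, y in zip(X, Y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    evals, V = jacobi_eigen(N)
    best = max(range(4), key=lambda i: evals[i])
    # The unit eigenvector of the largest eigenvalue is the optimal rotation quaternion.
    q0, qx, qy, qz = (V[k][best] for k in range(4))
    U = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    # Translation maps the rotated centroid of P onto the centroid of Q.
    t = [cq[r] - sum(U[r][c] * cp[c] for c in range(3)) for r in range(3)]
    moved = [[sum(U[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)] for p in P]
    return rmsd(moved, Q), U, t


if __name__ == "__main__":
    # Made-up toy data: Q is P rotated 90 degrees about z, then shifted.
    P = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0), (0.0, 2.0, 1.0)]
    Q = [(-y + 4.0, x - 1.0, z + 2.0) for x, y, z in P]
    print(rmsd(P, Q))               # large: no superposition applied
    print(superpose_rmsd(P, Q)[0])  # close to zero after the optimal rigid motion
```

Building the 3×3 matrix S is the only step that touches all n points; the 4×4 eigenproblem has constant size, which is consistent with the O(n) bound on the slide. This sketch covers only the simple case with a known one-to-one correspondence. It does not address the harder problem of finding the largest matched subset with RMSD below ε.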
How to Present a Paper in Computer Science

Lecture Preparation
• The lecture should fill the given time slot (~90 minutes).
• Use PowerPoint slides for the presentation.
• Each slide usually takes 1-2 minutes.
• The slides should not be overloaded.
• Use a mouse or pointer.
• Use colors, pictures, tables and animation, but don't overdo it.

What to Say and How
• Communicate the key ideas during your lecture.
• Don't get lost in technical details.
• Structure your talk.
• Use a top-down approach.

Lecture Structure
• Introduction: general description of the paper.
• Body: abstract of the current method.
• Technical details.
• Conclusions and discussion.

Introduction
The most important part of your talk!
• Title + short explanation of the presented topic.
• Lecture outline.
• Problem definition, input and output. Don't forget to define the problem!
• Problem motivation.
• Introduce the terminology of the field.
• Short review of existing approaches (don't forget to add references!).

Body
• Abstract of the major results presented in the paper.
• Significance of the results.
• Sketch of the method.

Technicalities
• Extended presentation of the method.
• Present the key algorithmic ideas clearly and carefully.
• Complexity of the method.
• Experimental results.

Conclusions and Discussion
• Summarize the major contributions of the work.
• You can highlight points based on technical details you couldn't discuss in the introduction.
• Present related open problems.
• Don't forget to thank the audience!
• Questions.

Getting to the Audience
• Use repetition: "Tell them what you're going to tell them. Tell them. Then tell them what you told them."
• Remind, don't assume.
• Maintain eye contact.
• Control your voice and motion.

Thanks, and good luck with your lectures!