Download Slide1 - upatras eclass

Bioinformatics

Lecturer: Christos Makris
Office: Π502 (ΠΡΟΚΑΤ)
E-mail: [email protected]
Teaching: Friday 09:00-11:00

Introduction

Bioinformatics can be defined as: “the application of computational techniques and methods in an effort to understand and organize data and information that is related to biological macromolecules...”.

Bioinformatics

Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying "informatics techniques" (derived from disciplines such as applied maths, computer science and statistics) to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications.

Research Areas
- Efficient storage and organization of biological data.
- Development of tools that permit the efficient analysis of biological data.
- Development of tools for data analysis of experimental results.

Course Contents

General Introduction

Part I
- Introduction to algorithms for efficient storage and management of strings and biological data sequences.
- Algorithms for exact pattern matching (Boyer-Moore, Knuth-Morris-Pratt, Shift-Or, multiple pattern matching).
- Introduction to suffix trees and their applications.
- Algorithms for approximate pattern matching and sequence alignment.
- Algorithms for searching in sequence databases (FASTA, BLAST, PROSITE).

Part II
- The theoretical basis of molecular design.
- Molecular models and biochemical information.
- Structure-based drug design.
- Open problems.

Part III
- Clustering and categorization techniques for biological data, in order to predict the behavior of biological molecules.

Course Exam

Course exams consist of:
1. Delivering a project by 1-2 students (30% of the total grade), based on the book: Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press.
2. Presentation and oral exam on the class notes and one more project (70% of the total grade).

Basic Notions (1)

Bioinformatics is the conceptualization of biology in terms of molecules (physical chemistry) and the application of “information techniques” (applied mathematics, computer science and statistics) for comprehending and organizing the information related to these molecules on a large scale.

Figure from “Bioinformatics: From Genomes to Drugs”, T. Lengauer.

Basic Notions (2)
- DNA consists of a double strand of bases.
- The bases are joined in a specific sequence and store the genetic information of every organism: A (adenine), T (thymine), C (cytosine), G (guanine).
- Every DNA molecule can be considered as a string over the alphabet {A, C, T, G}.
- Double helix: the two strands are complementary, so knowing one strand determines the other (A-T, C-G).

Genomes
- The term genome refers to the DNA sequence of a living organism.
- The human genome consists of 46 chromosomes.
- Every cell contains the whole genome of the organism (with a distinction between prokaryotes and eukaryotes).

Proteins
- Proteins are molecules that consist of one or more polypeptides.
- A polypeptide is a polymer consisting of amino acids.
- Cells produce their proteins from 20 different amino acids.
- A protein sequence can be considered as a sequence over an alphabet of 20 characters, Σ = {Ala, Arg, Asp, Asn, Cys, Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr, Val}.

Genes
- Genes are the basic unit of inheritance.
- A gene encodes a protein, since it stores the information for its construction.
- The human genome consists of ~40,000 genes (an early estimate; more recent estimates are closer to 20,000 protein-coding genes).

Molecular Biology Dogma
- DNA (RNA polymerase) -> pre-RNA (T -> U (uracil))
- pre-RNA (spliceosome) -> RNA
- RNA (ribosome) -> protein
  (the first step for eukaryotes)
- The process deals with the part of the DNA that codes a gene.
- The term gene expression refers to the production of RNA from a gene.
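
The string view of DNA used in these slides can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the function names are hypothetical, the complementary strand is returned read in the opposite direction (a common convention, assumed here), and transcription is modeled only as the T -> U substitution mentioned above.

```python
# Illustrative sketch; all names here are hypothetical, not from the slides.
DNA_ALPHABET = set("ACGT")
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}  # A-T and C-G pairing


def is_dna(seq: str) -> bool:
    """Return True if seq is a non-empty string over the alphabet {A, C, G, T}."""
    return len(seq) > 0 and set(seq) <= DNA_ALPHABET


def complement_strand(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def transcribe(seq: str) -> str:
    """Model transcription only as the substitution T -> U."""
    return seq.replace("T", "U")


if __name__ == "__main__":
    strand = "GATTACA"
    print(is_dna(strand))             # validity check against the alphabet
    print(complement_strand(strand))  # the other strand of the double helix
    print(transcribe(strand))         # RNA-style string with U instead of T
```

Because base pairing is fixed, applying `complement_strand` twice returns the original string, which is one way to see that knowing one strand determines the other.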
Central Dogma of Molecular Biology

[Figure: DNA molecule. Source: https://public.ornl.gov/site/gallery/default.cfm - U.S. Department of Energy Genome Programs]

[Figure: Central dogma of molecular biology. Source: http://www.accesexcellence.org/AB/GG]
- Replication
- Transcription
- Synthesis

[Figure: Genetic code. Source: http://www.accesexcellence.org/AB/GG]

[Figure: In a nutshell. Source: http://www.doegenomes.org - U.S. Department of Energy Genome Programs]

[Figure: Small changes imply variations. Source: http://www.doegenomes.org - U.S. Department of Energy Genome Programs]

Genomics and Proteomics Notions
- Genomics (studying genes)
- Proteomics (studying proteins)

Molecular Biology Targets
- Sequencing and comparison of the genomes of different organisms (evolution, correlation).
- Gene recognition and determination of the function that genes define (recognition of contact areas and, from there, gene recognition).
- Understanding gene expression (every gene is activated when it is expressed; the aim is to study the activation procedure).
- Understanding the role of genes in several diseases (gene mutation).
The data that we use can come from experiments or from biological databases.

Solving Computational Problems Using Bioinformatics Tools
- Combining gene information (sequencing and linking).
- Comparing sequences (information retrieval algorithms using schematic similarities).
- Protein categorization.
- Information extraction from gene expression.
- Representing cells as transcription networks.

Bioinformatics Research Areas
- Designing and implementing computational tools for automatic knowledge extraction from biological databases
- Analysing biological data sequences
- Biological data categorization
- Molecular modeling
- Protein analysis
- Drug design with computers

Computational Tools
- Managing molecular biology data is increasingly demanding, and the relational database model does not seem adequate.
- The target is to design and implement a model that permits automated knowledge discovery from large-scale data.
- Recognizing common structural characteristics not only at the sequence level but also at the two-dimensional (2D) or three-dimensional (3D) level. Similarity tracking between 2D or 3D shapes.

Sequence Analysis
- Exact matching
- Approximate matching
- Alignment (local or global)
- Searching for maximal common subsequences
- Databases of biological macromolecules (compression)
- Algorithms for searching, indexing and cross-referencing

Categorization/Clustering of Biological Data
- Categorization takes place using common motifs, structural or functional.
- We are interested in applications that integrate different data types (data integration: sequences, 3D coordinates, functional knowledge).

Molecular Modeling
- Selection of the proper model that describes sufficiently the intramolecular correlations of the studied biological system.
- Computing the energy state of the system and its minimization.
- Analysis of these calculations and control of the final formulation, so that the conditions and restrictions that have been set are fulfilled.

Protein Analysis
- Determining the three-dimensional structure of a protein and its amino acid sequence.
- Studying the protein docking problem and the DNA-protein docking problem.

Computer Based Drug Design
- Computer systems store useful information related to: 1) the three-dimensional architecture of molecules, 2) physical and chemical properties, 3) comparison of one molecule with other molecules, 4) complexes of micromolecules and macromolecules, 5) predictions for new molecules.
- The first target of scientists engaged in computer based drug design is the effective representation of the structure of normal and pathological molecules, which are subsequently compared to pathogenic enzymes and active receptors; in this way the target of molecular design is set.
- So, if we know the protein structure and the way that the receptor at the active region acts, we can “build” and simulate their docking, saving time and space.

Computer Based Drug Design
- Graphic algorithms
- Geometric calculations
- Numerical methods
- Graph-theoretic methods

In Silico Drug Design
- Designing the drug compound
- Optimizing the drug compound
- In vitro and in vivo tests
- Toxicity tests
- Human tests
- Performance tests

Design
[Figure labels, in order: Genome sequences; Gene finding; Protein sequences; Structure prediction; Protein structure; Geometry calculation; Protein surface; Molecular modeling; Force fields; Molecular docking]

Molecular Docking Problems
- The lack of a general and unified tool for designing molecular structures that can cover the whole set of biological molecules.
- The increasing computational complexity, expressed in time and required resources, which grows with the size of the molecule.
- Selection of the proper representation model (analogous to the biological molecule) and definition of the crucial parameters (e.g. geometrical coordinates) that should be checked, especially at the level of computational geometry.
- Robustness to errors in the input data, and rebuilding of a three-dimensional model from incomplete data.
- Simultaneous representation of a set of physicochemical properties (entropy, energy), and the means by which this information can be made comprehensible and interpretable by a researcher.

Phylogenetics
- Phylogenetics records the genetic relationships between species and the history of living organisms.
- A phylogenetic tree is the tree that depicts the evolution of different species with a common ancestor.
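
As a rough illustration of such a tree, the sketch below stores a toy phylogeny as nested tuples. The species names and the topology are hypothetical examples chosen for illustration, and the helper names are assumptions, not part of the course material.

```python
# Hypothetical toy tree: a leaf is a string (a species or sequence label),
# an internal node is a tuple of subtrees (a hypothetical common ancestor).
tree = (("human", "chimp"), "mouse")


def leaves(node):
    """Collect the leaf labels of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def count_internal(node):
    """Count internal nodes, i.e. hypothetical common ancestors."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal(child) for child in node)


if __name__ == "__main__":
    print(leaves(tree))          # leaf labels in left-to-right order
    print(count_internal(tree))  # number of ancestor nodes, root included
```

Nested tuples are only one possible representation; a real analysis would typically also store branch lengths, which this sketch omits.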
- Leaves: species / biological sequences
- Internal nodes: hypothetical common ancestors

Projects (1)

Handbook of Computational Molecular Biology
- Sequence alignment (pairwise sequence alignment, spliced alignment and similarity-based gene recognition, multiple sequence alignment, parametric sequence alignment)
- String data structures (lookup tables, suffix trees and suffix arrays, suffix tree applications, enhanced suffix trees and applications)
- Genome assembly and EST clustering (computational methods for genome assembly, assembling the human genome, comparative methods for sequence assembly, information-theoretic approach to genome reconstruction, expressed sequence tags clustering and applications, algorithms for large-scale clustering and assembly of biological sequence data)

Projects (2)
- Genome-scale computational methods (comparison of long genomic sequences, algorithms and applications, chaining algorithms and applications in comparative genomics, computational analysis of alternative splicing, human genetic linkage analysis, haplotype inference)
- Phylogenetics (phylogenetic reconstruction, consensus trees and supertrees, large-scale phylogenetic analysis, high-performance phylogeny reconstruction)
- Microarrays and gene expression analysis (microarray data: annotation retrieval, storage and communication, computational methods for microarray design, clustering algorithms for gene expression analysis, biclustering algorithms, identifying gene regulatory networks from gene expression data, modeling and analysis of gene networks using feedback control analysis)

Projects (3)
- Computational structural biology (predicting protein structure and supersecondary structure, protein structure prediction with lattice models, protein structure determination via NMR spectral data, geometric and signal processing of reconstructed 3D maps of molecular complexes, in search of remote homologs, biomolecular modeling using parallel supercomputers)
- Bioinformatic databases and data mining (string search in external memory, index structures for approximate matching in sequence databases, algorithms for motif search, data mining in computational biology)

Interesting Articles - Books
- J. Cohen, Bioinformatics: An Introduction for Computer Scientists, ACM Computing Surveys, 36(2), 2004, 122-158.
- N. Luscombe, D. Greenbaum, M. Gerstein, What is Bioinformatics? A Proposed Definition and Overview of the Field, Methods of Information in Medicine, 2001.
- Thomas Lengauer, Raimund Mannhold, Hugo Kubinyi, and Hendrik Timmerman, Bioinformatics: From Genomes to Drugs.
- Thomas Lengauer, Bioinformatics: From Genomes to Therapies (two volumes).
- Neil Jones, Pavel Pevzner, An Introduction to Bioinformatics Algorithms.
- S. Aluru, Handbook of Computational Molecular Biology.
- Dan Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press.

Main Journals
- Bioinformatics, Oxford University Press
- IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
- Journal of Computational Biology
- Science, Nature, Nucleic Acids Research, Journal of Molecular Biology, Proceedings of the National Academy of Sciences (PNAS)

Main Conferences
- RECOMB (Research in Computational Molecular Biology)
- IEEE Computer Society Bioinformatics Conference
- PSB (Pacific Symposium on Biocomputing)
- ISMB (Intelligent Systems for Molecular Biology)

Medical Informatics

Medical informatics is interdisciplinary and draws on information science, informatics and medical science. It deals with the resources, devices and techniques required to improve the storage, retrieval and use of information in health and biomedical science. The combination of health informatics and bioinformatics is termed biomedical informatics.
Biomedical Informatics
- Medical databases and genomics
- Proteomics and analysis
- Selection of proper therapies
- Gene expression in medical diagnosis and prediction
- Modeling and simulating biological structures and procedures
- Medical annotation of biological databases
- Functional and molecular image processing
- Embedding molecular data of patients in electronic health records
- Medical decision systems
- Interfaces for molecular information in patients
- Semantic interoperability and ontologies in bioinformatics
- Technologies for biomedical information integration
- Multilevel modeling and vertical integration of information
- Data extraction in biomedical informatics

- Data interoperability and standards
- Biomedical informatics in public health
- Connecting biobanks with large-scale databases in order to extract knowledge
- Records that relate molecular, family and clinical data
- Biodefence systems and networks
- Patient life management
- Personal identification and personal genomics
- Informatics for supporting pharmacogenetic and stratified clinical tests
- Informatics that allows the development of medical devices and biosensors
- Applied pharmaceutical precision
- Clinical and ethical facts concerning biomedical data processing
- Medical decision support systems
- Interfaces for molecular information to doctors
- Semantic interoperability and ontologies in biomedicine
- Technologies for biomedical data integration
- Multilevel modeling and vertical integration of information
- Knowledge extraction in biomedical informatics
