CS 7010: Computational Methods in Bioinformatics (course review)

Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu

Technical Definitions
NIH (http://www.bisti.nih.gov/)
* Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.
* Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.

Course Topics
* Data interpretation in analytical technologies
* Data management and computational infrastructure
* Discovery from data mining
* Modeling, prediction and design
* Theoretical in silico biology
Cover classical/mainstream bioinformatics problems from a computer science perspective

Discovery from Data Mining (I)

Discovery from Data Mining (II)
* Data source
    * Genomic / protein sequence
    * Microarray data
    * Protein interaction
* Complicated data
    * Large-scale, high-dimensional
    * Noisy (false positives and false negatives)

Discovery from Data Mining (III)
* Pattern/knowledge discovery from data
    * many biological data are generated by biological processes that are not well understood
    * interpretation of such data requires discovery of convoluted relationships hidden in the data
        * which segment of a DNA sequence represents a gene or a regulatory region
        * which genes are possibly responsible for a particular disease

Modeling, Prediction and Design (I)
* Modeling and prediction of biological objects/processes
    * Sequence comparison
    * Secondary structure prediction
    * Gene finding
    * Regulatory sequence identification

Modeling, Prediction and Design (II)
* Prediction of outcomes of biological processes
    * computing will become an integral part of modern biology through an iterative process of
        * model formulation
        * computational prediction
        * experimental validation
* From prediction to engineering design
    * Drug design
    * Protein structure prediction to protein engineering
    * Design of genetically modified species

Scope of Bioinformatics
* Tasks: data management; data mining; modeling; prediction; theory formulation
* Objects of study: genes, proteins, protein complexes, pathways, cells, organisms, ecosystem
* Bioinformatics is an indispensable part of biological science
* It has an engineering aspect and a scientific aspect
* It draws on computer science, biology, statistics, mathematics, physics, chemistry, engineering, …

Bioinformatics Foundations
* Technology
* Biology/medicine
* Computer Science
* Statistics
From interdisciplinary field to a distinct discipline

Course Coverage
* A general introduction to the field of bioinformatics
    * problem definitions: from biological problem to computable problem
    * key computational techniques
* A way of thinking: tackling a “biological problem” computationally
    * how to look at a biological problem from a computational point of view
    * how to formulate a computational problem to address a biological issue
    * how to collect statistics from biological data
    * how to build a computational model
    * how to design algorithms for the model
    * how to test and evaluate a computational algorithm
    * how to assess the confidence of a prediction result

Dong’s top 10 list for computational methods in BI
1. Dynamic programming
2. Neural network
3. Hidden Markov Model
4. Hypothesis test
5. Bayesian statistics
6. Clustering
7. Information theory
8. Support Vector Machine
9. Maximum likelihood
10. Sampling search (Gibbs, Monte Carlo, etc.)

Research Areas
1. “Solved” problems
2. “Developed” areas with remaining challenges that are hard to solve
3. Developing areas
4. Emergent areas
5. Future directions
[Figure: diagram with labels 5, 3, 2, 4, 1]

“Solved” Problems
* DNA sequence base calling and assembly
* Pairwise sequence comparison
* Protein secondary structure prediction
* Disordered region in proteins
* Transmembrane segment prediction
* Subcellular localization
* Signal peptide prediction
* Protein geometry
* Homology modeling
* Physical/genetic mapping informatics

“Developed” areas with remaining challenges
* Gene finding
* Phylogenetic tree construction and evolution
* Protein docking
* Drug design
* Protein design
* Linkage analysis and quantitative traits (QTL)
* Microarray data collection
* Gene expression clustering

Developing Areas
* Multiple sequence comparison and remote homolog search
* Repetitive sequence analysis
* Protein structure comparison
* Protein tertiary structure prediction
* RNA secondary structure prediction
* Regulatory sequence analysis
* Computational proteomics
* Protein interaction networks
* Gene ontology and function prediction
* Computational neuroscience and applications in various species and systems (e.g., cancer)

Emergent Areas
* Pathway (regulatory network) prediction
* ChIP-chip analysis
* Tiling array analysis
* Haplotype/SNP analysis
* Computational comparative genomics
* Text (literature) mining
* Small RNA and anti-sense regulation
* Alternative splicing prediction
* Computational metabolomics

Possible future directions
* Genome semantics
* Membrane protein structure prediction
* RNA tertiary structure prediction
* Post-translational modification
* Dynamics of regulatory networks
* Virtual cell/organism modeling
* Phenotype-genotype relationship
* … (nobody knows)

Where is the science going? (1)
* Bioinformatics has been a “technology” for biological research: interpretation of data generated by bench biologists
* We are starting to see a trend in which computational predictions guide experimental design
* As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biology research
* As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful prediction capabilities

Where is the science going? (2)
Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems ....... duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history.
-- Temple Smith, Current Topics in Computational Molecular Biology

Major research centers (1)
National Center for Biotechnology Information (NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)
* the home of many important databases, including GenBank
* the home of many important bioinformatics tools, including BLAST

Major research centers (2)
European Molecular Biology Laboratory (EMBL) (http://www.embl-heidelberg.de/)
* has some of the most powerful research groups in bioinformatics
* has numerous tools and databases

Major research centers (3)
* Sanger Institute (http://www.sanger.ac.uk/)
* The Institute for Genomic Research (TIGR, http://www.tigr.org/)
* Swiss-Prot (http://www.tigr.org/)

Major Universities in the US
* University of California at Santa Cruz
* University of California at San Diego
* Washington University
* University of Southern California
* Stanford University
* Columbia University
* Boston University
* Harvard University
* MIT
* Virginia Tech

Major journals
* Bioinformatics
* Nucleic Acids Research
* Genome Research
* Journal of Computational Biology
* Journal of Bioinformatics and Computational Biology
* In Silico Biology
* Briefings in Bioinformatics
* Applied Bioinformatics
* IEEE/ACM Transactions on Computational Biology and Bioinformatics
* Proteins: Structure, Function, and Bioinformatics
* Journal of Computer Science and Technology
* Genomics, Proteomics and Bioinformatics
* …

Major conferences
* Intelligent Systems for Molecular Biology (ISMB)
* Annual International Conference on Research in Computational Molecular Biology (RECOMB)
* IEEE/Computational Systems Bioinformatics Conference (CSB)
* Pacific Symposium on Biocomputing (PSB)
* European Conference on Computational Biology (ECCB)
* IEEE Conference on Bioinformatics and Bioengineering (BIBE)
* International Workshop on Genome Informatics (GIW)
* Asia-Pacific Bioinformatics Conference (APBC)
* …

Academicians
* Michael Waterman
* Phil Green
* Gene Myers
* Barry Honig
No Nobel Prize winner yet…

Discussions
* Scope of the new biology (large-scale)
* Technology (tool development) vs. science (biological application)
* Knowledge vs. prediction
* Experimental vs. computational/theoretical
* First principles vs. empirical/statistical
* Automated vs. curated
One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.

Choosing Bioinformatics as a Career - 1
* Field outlook
* Must be a believer in bioinformatics (for its value to science)
* Must have strong motivation and be willing to go the extra mile (learn more disciplines)
* Technologist vs. technician

Choosing Bioinformatics as a Career - 2
* Molecular & cellular and evolutionary biology: understanding the science
* Computational, mathematical, and statistical sciences: mastering the techniques
* High-throughput measurement technologies: knowing what biological data are obtainable
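
Supplementary sketch: dynamic programming for pairwise sequence comparison
The slides name dynamic programming first among the key computational methods and list pairwise sequence comparison among the "solved" problems, without showing either. As an illustration only, the sketch below computes a global alignment score in the Needleman-Wunsch style. The function name `global_alignment_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for this example and are not taken from the course material.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b (Needleman-Wunsch style).

    The scoring parameters are illustrative assumptions, not course values.
    """
    # prev[j] = best score for aligning the prefix of a processed so far with b[:j].
    # Before any character of a is consumed, b[:j] can only be aligned to j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Three ways to end the alignment of a[:i] and b[:j]:
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))  # 2: three matches and one gap
```

Only two rows of the score table are kept, so memory grows with the length of one sequence while time remains proportional to the product of the two lengths. Recovering the alignment itself, rather than just its score, would require keeping the full table or traceback pointers.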