Doug Raiford
Lesson 1

Biologists and Computer Scientists
  Note the word “Scientists”

5/23/2017 Introduction 2

Wikipedia
  Computational biology encompasses bioinformatics
  Bioinformatics applies algorithms and statistical techniques to the interpretation, classification and understanding of biological datasets
NCBI
  Bioinformatics: Research, development or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze or visualize such data.
  Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems
For the purposes of this course we are treating the terms as synonymous

It’s All About the Data
  Virtually every biological experiment requires a processor and software

Genetic material comprised of 3 billion base-pairs
  The sheer volume of data requires the involvement of computational and storage techniques in order to analyze

Can now identify which genes are affected by a disease or treatment
  Thousands of genes per experiment
  Multiple experiments per time-point
  Multiple time-points

Data growing exponentially
  Thousands of complete genomes
  Each genome results in thousands of experiments

Vast amounts of data
  More data coming in daily
  Sophisticated computational techniques required
    ▪ Clustering
    ▪ Searches
    ▪ Optimizations
    ▪ Data mining
    ▪ Pattern recognition
    ▪ Classification

A little about me
  Work
  School

Moodle is the primary page
  Weekly schedule
    ▪ When homeworks are due
    ▪ When projects are due
    ▪ Links to quizzes, projects, and homeworks
Instructor website
  Syllabus
  Slides

Bioinformatics: Sequence and Genome Analysis
Beginning Perl for Bioinformatics

3:30 to 5:00 Tuesday and Thursday
  Or by appointment
  Social Science 412
  Phone 406-243-5605
  Email [email protected]
  A little about myself

Try to get assignments in on time
  Letter grade for each day late

  Grade   Range
  A       90 - 100
  B+      87 - 89
  B       80 - 86
  C+      77 - 79
  C       70 - 76
  D+      67 - 69
  D       60 - 66
  F       00 - 59

  Component           Undergrad   Graduate
  Homework            10%         8%
  Quizzes             25%         21%
  Exams (3 of them)   30%         25%
  Projects            35%         29%
  Grad Project        NA          17%

Your work in this class needs to be your own
  Overly similar work (to that of your classmates or to content from the web) will be considered to be the result of copying
  First offense will result in a zero on the assignment
  Second will be referred to the Dean of Academic Affairs
  Student Conduct Code http://life.umt.edu/vpsa/student_conduct.php

Let me know of any special needs during this first week
  Letter from Disability Services for Students (DSS)
  Religious observances
  Officially sanctioned, scheduled University extracurricular activity
  Opportunity to make up class assignments or other graded assignments

Improve the computer scientist’s understanding of biological systems and problems
Improve the biologist’s understanding of the science of computing and provide the beginnings of a CS skill-set

Four Distinct Audiences
  Computer Scientists
    Undergrad
    Grad
  Biologists etc.
    Undergrad
    Grad
  Computer scientists all about the algorithms, implementations, programming languages, design, etc.
  Biologists mostly just want an introduction to programming
  Undergrads
    High-level overview
  Graduate Students
    Specific tools and skills that will aid them in research

Undergraduates
  Some algorithms (even implement some)
  New languages: Perl and R
  Introduce programming concepts
  Lots of practice programming (8 projects)
  Lots of guidance from me
Graduate students
  Practice writing a grant (a draft and a final version)
  Practice writing a paper (a draft and a final version)
  Practice using several actual Bio Tools
All
  Team projects

Computer science wise
  Not really anything new
  More of an application of existing techniques
  Dynamic programming techniques
  Hidden Markov Models
  Exploratory data analysis
    ▪ Clustering
    ▪ Multivariate analysis
    ▪ Clustering
    ▪ Principal components analysis

Research
  Ph.D. generating publications
Employee in a company
  Drug company
  Genomics lab

Bioinformatician
  www.simplyhired.com

Techniques that are successful in bioinformatics are the same that are successful in other data-intensive fields

Hunger, need for clean water
Global warming
Disease

Genetically engineered crops
  Disease resistant
  Greater yields
Water treatment
  Genetically engineered microbes
    ▪ Sewage treatment: purification
    ▪ Clean oil spills

Plants consume CO2 and release O2
  But the carbon is released back into the atmosphere over a period of time
  Genetically engineered plants could convert it into a stable form

Genetically enhanced microbes convert back to fuel
  Methanococcus jannaschii
  Takes CO2 and converts it to methane

Test for increased risk of certain cancers
Personalize medicine
  Leukemia
    ▪ Genetic profile resistant to certain chemotherapy
  Increased risk of drug reactions

Many drugs bind to protein active sites
  Computational techniques for predicting drug performance

Actually alter our genetic code to treat genetic disorder
  Or simply add disembodied gene to our complement

What does it have to do with informatics?
  Where do computer scientists fit in this picture?
  Role of computers and computer scientists

Why biologists would attend

CS types good at the data analysis
  Must understand what the data means
  Don’t know what to look for: what questions to ask
  Don’t speak the lingo
    Haploid, Hypertonic, Hypotonic, Erythematous, Cilia, Cell membrane, Nucleus, Lytic cycle, Gene, Biotic factors, Nulliparity, Hyperosmotic, Natural selection, Fluid mosaic model, Solute, Homologous chromosome, Ribosome, Mitochondria, Diffusion, Leucocytes, Photosynthesis, Genetic variation, Organism, Plasma membrane, Cytoplasm, Wagners disease, Meiosis, Habitat, Diploid, Cell, Youpon, Concentration gradient, Ecosystem, Homeostasis, Mitosis, Osmosis, Allele, Enzyme, Autotrophic, Egestion, Mitochondrion, Gamete, Organisms, Nucleotide, Aminoacyl, Gene expression, Point mutation, Duplication event

Biologists understand the data
  Don’t know how to formulate the problem in CS terms
  Don’t know what magic the CS types can bring to the table
  Don’t speak the lingo
    Acyclic graph, Heap sort, Huffman coding, Adjacency-matrix, Admissible vertex, Abstract data type, Algorithm, All pairs shortest path, Euclidean distance, Hash, Tree, Linked list, Heap, Complexity analysis, Recursion, Dynamic programming, Graph, Hamiltonian path, Heuristic, Hidden Markov Model, Principal components analysis, Isomorphic, Simplex algorithm, Mahalanobis distance, Discrete event simulation, NP-complete, Big O, Optimization problem, Polymorphism, Polynomial time, Clustering, Classifying, Stack, Queue, Stochastic modeling, Tail recursion, Binary tree, Self organizing map, Shortest common string, Minimum spanning tree, Singular matrix, Trie, Vertex cover

Won’t be a full-fledged bioinformatician
  Will be able to contribute given close guidance, practice, and continued training

Biologists perform all steps
  Determine problem to be solved given data
  Determine which tool to utilize
  Manually format data for input to tool (might involve data retrieval if utilizing repository data)
  Run tool
  Analyze results

Biologist, Computer Scientist, Biologist
  Determine problem to be solved given data
  Develop algorithmic approach
  Implement algorithm (write code)
  Format data for input to algorithm (might involve data retrieval if utilizing repository data)
  Run code
  Analyze results

CS types
  Provide beginnings of a biology background
  Introduce some existing tools, sources of data, and analysis techniques
Biologists
  Introduce some existing tools, sources of data, and analysis techniques
  Provide some programming essentials
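
The shared workflow listed earlier (format data for input, run code, analyze results) can be sketched in a few lines. This is only an illustration, not part of the course materials: it is written in Python although the course itself uses Perl and R, and the FASTA-style input, the function names parse_fasta and gc_content, and the choice of GC content as the analysis are all assumptions made for the example.

```python
# Hypothetical sketch of the "format data, run code, analyze results" steps.
# The FASTA-style input and the GC-content measure are illustrative assumptions.


def parse_fasta(text):
    """Format step: turn FASTA-style text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            records[name] = ""
        elif name is not None:
            # Sequence lines are appended to the most recent record.
            records[name] += line.upper()
    return records


def gc_content(sequence):
    """Run step: fraction of bases that are G or C (0.0 if empty)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


if __name__ == "__main__":
    raw = ">gene_a demo\nATGC\nGGCC\n>gene_b\nATAT\n"
    for name, seq in parse_fasta(raw).items():
        # Analyze step: interpreting these numbers is the biologist's job.
        print(f"{name}: length={len(seq)} GC={gc_content(seq):.2f}")
```

The split into a formatting function and an analysis function mirrors the division of labor on the slides: the computer scientist writes and runs the code, and the biologist decides what question the numbers answer.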