Summer Bioinformatics Workshop 2008

Introduction to Bioinformatics
Chi-Cheng Lin, Ph.D., Professor
Department of Computer Science
Winona State University – Rochester
[email protected]

Outline
• What is Bioinformatics
• The Human Genome Project
• Applications of Bioinformatics
• References
Acknowledgement: The presentation includes adaptations from DOE’s “Human Genome Project and Beyond Primer” and Dr. Yan Asmann’s (Mayo Clinic) lecture notes

Bioinformatics
• Living things have the ability to store, utilize, and pass on information
• Bioinformatics strives to
  – determine what information is biologically important
  – decipher how it is used to precisely control the chemical environment within living organisms

What is Bioinformatics
• The collaboration of Biology and Informatics
• Originally referred to the use of computational tools to organize and analyze genetic and protein sequence data (first coined by Dr. Hwa Lim in 1988)

NCBI’s Definition of Bioinformatics
• NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/)
  – “Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.”
  – “The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.”

Human Genome Project
• Goals include
  – Identify genes in human DNA
  – Determine sequence making up human DNA
  – Store this information in databases
  – Improve tools for data analysis
  – Etc.
• Milestone
  – April 2003: HGP sequencing is completed and project is declared finished two years ahead of schedule

Interesting Numbers characterizing the Human Genome
• 3 billion:
  – The number of chemical nucleotide bases (A, C, G, and T) contained in the haploid human genome
• 3 million:
  – The number of locations where single-base DNA differences occur in the human genome
• 2.4 million:
  – The number of bases comprising the largest known human gene (the average gene comprises 3000 bases)
• 30,000:
  – The total number of genes estimated (much lower than previous estimates of 80,000 to 140,000)
• 99.9%
  – Fraction of nucleotide bases that are exactly the same in all people
• 50%
  – Fraction of discovered genes for which function is unknown
• 2%
  – Fraction of genome that codes for proteins (the rest: “junk”(?) DNA)
• 9%, 11%, 26%, 28%, 45%, 83%, 89%, and 95%
  – The percentage of genes E. coli, rice, roundworm, yeast, fruit fly, zebrafish, mouse, and chimpanzee share with humans, respectively.

How does the human genome stack up?

Organism                              Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9700                  9

Humans share most of the same protein families with worms, flies, and plants!
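
The rounded figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it uses the numbers quoted on the slides (3 billion bases, 99.9% identical, 2% protein-coding, and the genome-size table), and the names GENOMES and bases_per_gene are hypothetical helpers, not part of the workshop material.

```python
# Back-of-the-envelope checks using the rounded figures quoted on the slides.
GENOME_BASES = 3_000_000_000  # haploid human genome size (slide figure)

# 99.9% of bases are identical in all people, so 0.1% (1 in 1000) can differ.
variable_bases = GENOME_BASES * 1 // 1000

# About 2% of the genome codes for proteins.
coding_bases = GENOME_BASES * 2 // 100

# (genome size in bases, estimated genes), copied from the comparison table.
GENOMES = {
    "Human": (3_000_000_000, 30_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "E. coli": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def bases_per_gene(organism: str) -> float:
    """Average number of genome bases per estimated gene (a crude gene-density measure)."""
    size, genes = GENOMES[organism]
    return size / genes


if __name__ == "__main__":
    print(f"Bases that can differ between people: {variable_bases:,}")
    print(f"Protein-coding bases: {coding_bases:,}")
    for name in GENOMES:
        print(f"{name:18s} {bases_per_gene(name):>12,.0f} bases per gene")
```

The 0.1% of bases that can differ works out to 3 million, which agrees with the "3 million locations" figure. The bases-per-gene column shows how much more sparsely genes are spread in the human genome than in E. coli or HIV, at least by this rough measure.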

Anticipated Benefits of Genome Research
• Molecular medicine
• Microbial genomics
• Bioarchaeology
• Anthropology
• Evolution
• Human Migration
• DNA identification (forensics)
• Agriculture, livestock breeding, and bioprocessing

ELSI: Ethical, Legal, and Social Issues
• Privacy and confidentiality of genetic information
• Fairness in the use of genetic information
• Psychological impact, stigmatization, and discrimination
• Reproductive issues
• Clinical issues
• Uncertainties associated with gene tests for susceptibilities and complex conditions
• Fairness in access to advanced genomic technologies
• Conceptual and philosophical implications
• Health and environmental issues
• Commercialization of products

[Cartoon: Mike Thompson, Detroit, Michigan -- from The Detroit Free Press. Source: http://cagle.msnbc.com/news/gene/gene5.asp]

Future Challenges: What We Still Don’t Know
• Gene prediction and discovery
  – location, function, structure, regulation, etc.
• Single-base DNA variations among individuals
  – Correlation with health and disease
  – Disease-susceptibility prediction
• Genes involved in complex traits and multigene disorders
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Systems biology
  – Coordination of gene expression and protein synthesis
  – Interaction of proteins in complex molecular machines
  – Microbial consortia useful for environmental restoration
• Developmental genetics and genomics
• Evolutionary conservation among organisms
• And many more …

Tackle Future Challenges: Bioinformatics
• High volume of data to store, compute, and analyze
• Huge amount of information to retrieve, interpret, and visualize
• Complex system to study, model, and simulate
THAT’S WHY BIOINFORMATICS IS INDISPENSABLE!!
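
To give a rough sense of the "high volume of data to store" point, the sketch below estimates the raw storage needed for one haploid human genome, using the 3 billion base figure quoted in this deck. The two encodings compared (one byte per base as plain text, and two bits per base because there are only four bases) are illustrative assumptions, and storage_bytes, pack_2bit, and the CODE mapping are hypothetical names chosen for this example, not an established file format.

```python
GENOME_BASES = 3_000_000_000  # haploid human genome size (slide figure)

# Hypothetical 2-bit code for the four bases; any fixed assignment would do.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to store n_bases at a fixed number of bits per base, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


def pack_2bit(seq: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base, four bases per byte.

    The first base of each group of four goes into the lowest two bits.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


if __name__ == "__main__":
    as_text = storage_bytes(GENOME_BASES, 8)
    as_2bit = storage_bytes(GENOME_BASES, 2)
    print(f"One byte per base: {as_text / 1e9:.2f} GB")
    print(f"Two bits per base: {as_2bit / 1e9:.2f} GB")
    print(f"'ACGT' packs into {len(pack_2bit('ACGT'))} byte")
```

These are lower bounds for the sequence alone. Real data sets also carry annotations, quality information, and many genomes rather than one, so the volume to store, compute on, and analyze grows well beyond this estimate.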

Genomics Studies
• Genomics
  – Study of the whole genome
  – Sequencing and annotating genomes
• Comparative genomics
  – Comparison and characterization of genomes from different species to identify genes and their functions and to investigate evolutionary history
• Functional genomics
  – Understanding the function of genes and other parts of the genome
• Structural genomics
  – Determining the 3D structure of all proteins
• Pharmacogenomics
  – Study of how an individual's genetic inheritance affects the body's response to drugs

Genome Sequencing
[Cartoon: Drew Sheneman, New Jersey -- The Newark Star Ledger. Source: http://cagle.msnbc.com/news/gene/gene14.asp]

Human Migration Patterns using DNA Sequences
[Figure]

Medicine and the New Genetics
Gene Testing, Pharmacogenomics, Gene Therapy
• Anticipated benefits:
  – Improved diagnosis of disease
  – Earlier detection of genetic predispositions to disease
  – Pharmacogenomics:
    • Genetic testing before prescribing drugs
    • Dose-selection based on genetic variations
    • Drugs tailor-made to each patient
However, the application of pharmacogenomics in medical practice is still quite limited today, due to the lack of genetic information from a large population.

References
• NCBI (National Center for Biotechnology Information) homepage: http://www.ncbi.nlm.nih.gov/
• NCBI Science Primer: http://www.ncbi.nlm.nih.gov/About/primer/
• Human Genome Project Information (esp. link to the Education module): http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
• The Human Genome Project and Beyond Primer: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/primer.ppt