Data management
NCBI Bioinformatics Workshop
Rabat, Morocco 2012

What is Bioinformatics?
Bioinformatics is the application of information technology to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. (Source: Wikipedia)

What is NCBI?
On November 4, 1988, President Ronald Reagan signed the Health Omnibus Programs Extension Act, which created the National Center for Biotechnology Information (NCBI) as part of the National Library of Medicine at NIH. NCBI's mandate is to:
• Create automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics.
• Perform research into advanced methods of analyzing and interpreting molecular biology data.
• Enable biotechnology researchers and medical care personnel to use the systems and methods developed.

History of molecular biology
1860s Genetics: Gregor Mendel discovered that heritable factors (genes) determine the characteristics of an organism and are passed to offspring from both parents.
1944 Molecular biology: Oswald Avery, Colin MacLeod and Maclyn McCarty showed that DNA is the molecule that carries genes.
1962 Nobel Prize: James Watson, Francis Crick and Maurice Wilkins (Rosalind Franklin, whose X-ray diffraction work was essential to the discovery, had died in 1958).
1970 Central Dogma (first announced in 1958) restated by Francis Crick in Nature.
Central Dogma of molecular biology
The central dogma of molecular biology was first enunciated by Francis Crick in 1958 and restated in a Nature paper published in 1970. The general transfers describe the normal flow of biological information: DNA can be copied to DNA (DNA replication), DNA information can be copied into mRNA (transcription), and proteins can be synthesized using the information in mRNA as a template (translation).
Does the central dogma still stand? Koonin EV. Biol Direct. 2012 Aug 23;7(1):27. [Epub ahead of print]

History of biotechnology
1590 The microscope is invented by Janssen.
1675 Leeuwenhoek discovers protozoa and bacteria.
1879 Flemming discovers chromatin and describes rod-like structures in the cell nucleus, later called ‘chromosomes’.
1885 The Escherichia coli bacterium is discovered (later a major research and production tool for biotechnology).
1942 The electron microscope is used to identify and characterize a bacteriophage, a virus that infects bacteria.
1953 Watson and Crick reveal the three-dimensional structure of DNA.
1973 Cohen and Boyer perform the first successful recombinant DNA experiment, using bacterial genes.
1983 The polymerase chain reaction (PCR) technique is invented by Kary Mullis.
1995 The first bacterial genome (Haemophilus influenzae) is sequenced by whole-genome shotgun sequencing.
2001 The sequence of the human genome is published in Science and Nature, making it possible for researchers all over the world to begin developing treatments.
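The transcription and translation steps of the central dogma can be illustrated with a small sketch. It assumes the input is the coding (sense) strand written 5' to 3', starts translation at the first AUG, and includes only a handful of codons from the standard genetic code; the names `transcribe`, `translate` and `CODON_TABLE` are hypothetical, not part of any library.

```python
# Minimal sketch of DNA -> mRNA -> protein (replication is not shown).
# Assumption: only a small subset of the standard genetic code is included.
CODON_TABLE = {
    "AUG": "M",                          # methionine, the usual start codon
    "UUU": "F", "UUC": "F",              # phenylalanine
    "GGC": "G", "GGA": "G",              # glycine
    "AAA": "K",                          # lysine
    "UGG": "W",                          # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(dna: str) -> str:
    """Transcription: the mRNA matches the coding strand with U in place of T."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # "X" marks a codon that is missing from this partial table.
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ccATGTTTGGCAAATGGTAAgc")
    print(mrna)             # CCAUGUUUGGCAAAUGGUAAGC
    print(translate(mrna))  # MFGKW
```

A real pipeline would use the full 64-codon table and handle alternative genetic codes; the partial table keeps the example short.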
2005 Next Generation Sequencing: Illumina (MiSeq), Ion Torrent, PacBio

History of Bioinformatics
Sequence databases
1960s Margaret Dayhoff collected protein sequences in a database that later became PIR
1980 EMBL (ENA); 1982 GenBank; 1984 DDBJ; 1986 Swiss-Prot
Sequence comparison
1970 Needleman-Wunsch global pairwise alignment
1973 Multiple alignment
1981 Smith-Waterman local alignment
Database searches by sequence similarity
1988 FASTA by Pearson and Lipman
1990 BLAST by Altschul, Gish, Miller, Myers and Lipman
Text search and retrieval system
1990 Entrez designed by Lipman and Benson

Algorithms
• Gene prediction
• Protein structure
• Hidden Markov Models
• Clustering
• Trees

Problem Solving
Figure: Problem-solving cycle linking data, data management, validation, experiment, hypothesis, interpretation, model, visualization and analysis.
For every complex problem, there is an answer that is clear, simple, and wrong… - H. L. Mencken

ROC curve analysis
Receiver Operating Characteristic (ROC) curve analysis (Metz, 1978; Zweig & Campbell, 1993) plots the true positive rate against the false positive rate as the decision threshold of a classifier or diagnostic test is varied.

Challenges in Computational Biology
• Multiple alignments and phylogenetic trees
• Protein structure prediction
• Genome assembly and annotation
• Homology searches

Challenging issues in Bioinformatics
• Data management
– processing, storage
– accuracy (high throughput, low quality)
– search and retrieval
– presentation
• Data analysis
– algorithms
– statistical techniques
• Simulation
– modeling and prediction
– parameter estimation
– prediction accuracy

NCBI mission: discovery initiative
Figure: NCBI discovery cycle of validation, analysis, search and visualization.

What is GenBank?
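Needleman-Wunsch global pairwise alignment is a dynamic-programming algorithm: each cell of a score matrix holds the best score for aligning two prefixes, and a traceback from the last cell recovers one optimal alignment. The sketch below assumes a simple illustrative scoring scheme (match +1, mismatch -1, linear gap penalty -1) rather than a substitution matrix such as BLOSUM; the function names are hypothetical.

```python
from typing import Tuple

# Illustrative scoring scheme (an assumption, not a standard matrix).
MATCH, MISMATCH, GAP = 1, -1, -1


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns the optimal score and one optimal alignment (gaps shown as '-').
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * GAP  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + GAP,                                 # gap in b
                score[i][j - 1] + GAP,                                 # gap in a
            )
    # Traceback from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(s)  # 5: six matches and one gap
    print(top)
    print(bottom)
```

Clamping every cell at a minimum of 0, then tracing back from the highest-scoring cell until a zero cell is reached, turns this into Smith-Waterman local alignment.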
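To make ROC curve analysis concrete, the following sketch sweeps a decision threshold down through a set of classifier scores, records one (false positive rate, true positive rate) point per distinct score, and estimates the area under the curve (AUC) with the trapezoidal rule. The function names and toy data are hypothetical; tied scores are processed together so they form a single diagonal step.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR), lowering the threshold from the highest score.

    labels: 1 marks a positive case, any other value a negative case.
    Both classes must be present.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lowering the threshold to this score calls every tied case positive at once.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts)
    print(auc(pts))  # 0.75
```

An AUC of 1.0 means every positive case scores above every negative case, while 0.5 corresponds to a random ranking.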
NCBI’s Primary Sequence Database
• Nucleotide-only sequence database
• Archival in nature
– Historical
– Reflective of submitter point of view (subjective)
– Redundant
• GenBank data
– Direct submissions (traditional records)
– Batch submissions (EST, GSS, STS)
– FTP accounts (genome data)
• Three collaborating databases
– GenBank
– DNA Data Bank of Japan (DDBJ)
– European Molecular Biology Laboratory (EMBL) Database

Sequence Databases
Figure: Labs and sequencing centers submit sequences to GenBank, which is updated ONLY by submitters; curators build RefSeq from these records, and RefSeq is updated continually by NCBI. Related resources shown: Genome Assembly, UniGene, Algorithms.

Next Generation Sequencing
NGS produces a lot of data.

Information retrieval
NCBI discovery initiative
Entrez search and retrieval system
Vice President Gore, 1997: "From a computer in the comfort of your own home or from one in your neighborhood library, you will be able to access timely and accurate information. Already 30,000 people a day are using MEDLINE. By making it more accessible -- free and private -- we can increase that number many times over."

Improve information retrieval
• Add links
• Filters
• Related information

Rescuing Zero-Result PubMed Searches
Figure: Zero-result PubMed searches in 2008 and 2011. Rescue mechanisms shown: spelling correction, auto-complete, gene sensor, citation sensor/Hydra; remaining searches unassisted. Labels include "19% improvement", "37% improvement" and "16% of all PubMed searches".

Sequence analysis

Visualization

NCBI Bioinformatics Workshop 2009
NCBI Bioinformatics Workshop 2011
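Entrez can also be queried programmatically through NCBI's E-utilities. As a minimal sketch, the function below only builds an ESearch request URL from the `db`, `term` and `retmax` parameters; it does not contact the server, and the helper name `esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request for an Entrez database.

    db: Entrez database name, e.g. "pubmed" or "nucleotide"
    term: Entrez query, which may use field tags such as [Title]
    retmax: maximum number of record IDs to return
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


if __name__ == "__main__":
    print(esearch_url("pubmed", "bioinformatics[Title]", retmax=5))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) returns an XML list of matching record IDs. NCBI's usage guidelines ask clients to limit their request rate (about three requests per second without an API key) and to identify themselves with `tool` and `email` parameters or an API key.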