Quick Overview of Bioinformatics
NCBI
Chuong Huynh
NIH/NLM/NCBI
New Delhi, India
September 28, 2004
[email protected]

What is bioinformatics? - Definition
• My definition
  – bringing biological themes to computers
• Peter Elkin: Primer on Medical Genomics: Part V: Bioinformatics
  – “Bioinformatics is the discipline that develops and applies informatics to the field of molecular biology.”
• BISTIC Bioinformatics Definition
  – “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data”
• BISTIC Computational Biology Definition
  – “Computational Biology: the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.”
• http://www.bisti.nih.gov/

Useful/Necessary Bioinformatics Skills
• Strong background in some aspect of molecular biology!!!
• Ability to communicate biological questions comprehensibly to computer scientists
• Thorough comprehension of the problem in the bioinformatics field
• Statistics (association studies, clustering, sampling)
• Ability to filter, parse, and munge data and determine the relationships between the data sets
• Mathematics (e.g. algorithm development)
• Engineering (e.g. robotics)
• Good knowledge of a few molecular biology software packages (molecular modeling / sequence analysis)
• Command line computing environment (Linux/Unix knowledge)
• Data administration (esp. relational database concept) and Computer Programming Skills/Experience (C/C++, Sybase, Java, Oracle) and Scripting Language Knowledge (Perl and perhaps Python)

Bioinformatics Flow Chart (0)
1a. Sequencing
1b. Analysis of nucleic acid seq.
2. Analysis of protein seq.
3. Molecular structure prediction
4. Molecular interaction
5. Metabolic and regulatory networks
6. Gene & Protein expression data
7. Drug screening: ab initio drug design OR drug compound screening in database of molecules
8. Genetic variability

Bioinformatics Flow Chart (1)
1a. Sequencing
  – Base calling
  – Physical mapping
  – Fragment assembly
1b. Analysis of nucleic acid seq.
  – Gene finding: stretch of DNA coding for protein; analysis of noncoding region of genome
  – Multiple seq alignment: evolutionary tree
2. Analysis of protein seq.
  – Sequence relationship
3. Molecular structure prediction
  – 3D modeling; DNA, RNA, protein, lipid/carbohydrate
4. Molecular interaction
  – Protein-protein interaction
  – Protein-ligand interaction
5. Metabolic and regulatory networks

Bioinformatics Flow Chart (2)
6. Gene & Protein expression data
  – EST
  – DNA chip/microarray
7. Drug screening: ab initio drug design OR drug compound screening in database of molecules
  a) Lead compound binds tightly to binding site of target protein
  b) Lead optimization – lead compound modified to be nontoxic, few side effects, target deliverable
  Drug molecules designed to be complementary to binding sites with physicochemical and steric restrictions.
8. Genetic variability
  – Now investigated at the genome scale
  – SNP, SAGE

Genome Sequencing
• Strategy: clone by clone vs whole genome shotgun
• Libraries: subcloning; generate small insert libraries
• Sequencing
• Assembly: process of taking raw single-pass reads into contiguous consensus sequence (Phred/Phrap)
• Closure: process of ordering and merging consensus sequences into a single contiguous sequence
• Annotation
  – DNA features (repeats/similarities)
  – Gene finding
  – Peptide features
  – Initial role assignment
  – Others: regulatory regions
• Release: data released to the public, e.g. EMBL or GenBank

• Most genomes will be sequenced and can be sequenced; few problems are unsolvable.
• The problem lies in understanding what you have:
  – Gene finding/gene prediction
  – Annotation

Sequencing
[Figure: genomic DNA -> shearing/sonication -> small DNA fragments (1.0-2.0 kb) -> subclone and sequence (clone library, pUC18) -> DNA sequencing of random clones -> shotgun reads -> assembly -> contigs -> finishing (finishing reads; both strands coverage; gaps filled) -> complete sequence]

Annotation of eukaryotic genomes
[Figure: genomic DNA -> (transcription) -> unprocessed RNA -> (RNA processing) -> mature mRNA (Gm3 AAAAAAA) -> (translation) -> nascent polypeptide -> (folding) -> active enzyme -> (function) -> Reactant A to Product B. Annotation approaches shown: ab initio gene prediction, comparative gene prediction, functional identification]

Annotation
• Predict protein
• Extract ORFs
• Remove errors
• Compare with database of ‘known function proteins’
• Provide transitive annotations

Positional Cloning

Positional Candidate Cloning

The new information is always partial
• Complete Eukaryotic Genomes
• Ongoing Eukaryotic
• Prokaryotic Ongoing
• Published
• Even a complete genome is only partially understood

Why not use the genome sequence once it's ‘ready’?
• Finding exons
  – 30% overprediction
  – 20% not found at all
  – Comparison systems rely on EST sequences which themselves contain large error rates
• Expressed sequences are there in part and represent a very, very powerful key.
  – Others are looking through partial data
  – Once the genome is done …when?

Interpreting data from many sources

Genomics and Tropical Diseases
How Can Genomics Contribute to the Control of Tropical Diseases?
Challenges and Opportunities
The Role of Bioinformatics

Strategic emphases for research
http://www.who.int/tdr/grants/strategic-emphases/default.htm
WHO/TDR

Genomics and World Health Report 2002

Why Pathogen Genomics?
“The power and cost-effectiveness of modern genome sequencing technology mean that complete genome sequences of 25 of the major bacterial and parasitic pathogens could be available within five years. For about 100 million dollars (…), we could buy the sequence of every virulence determinant, every protein antigen and every drug target.”
B. Bloom (1995) A microbial minimalist. Nature 378:236

Genomics and Drug Development for Tropical Diseases: Challenges
• Knowledge limitations
  – A large proportion of pathogen genes have unknown function
  – Heavy investment in genomics is made by the commercial sector, so the results are not widely available
• Emphasis and priorities
  – Genomes of non-pathogenic model organisms (S. cerevisiae, D. melanogaster, C. elegans, A. thaliana)
  – Genomes of pathogens that affect individuals in developed countries
  – Neglected diseases, neglected pathogens

Doing Successful Science in the new millennium
• Huge increase in available biological information
• The classic paradigm of ‘molecular biology’ is now shifting rapidly to genomics
• Understanding of the new paradigms concerns more than ‘just bench biology’
• Discovery requires large-scale systems and broad collaborations; global problems
• Funding comes in large amounts at group level; it is no longer a single laboratory or institution effort.
• Accountable output

The Bigger Picture (Malaria)

Genomics Approach to Drug Development: Opportunities
• Classical laboratory assays aim at targets in which mutation is lethal to the pathogen
  – Valuable targets can be missed
• Sulphonamides: inhibition of the p-aminobenzoic acid pathway is not lethal for growth in the laboratory but severely attenuates the capacity to cause disease

Genomics Approach to Drug Development: Opportunities
• New approaches for the identification of gene products specifically involved in the disease process may uncover further drug targets
• Pathogen genomics and data mining for the discovery of new drug targets
  – Signature tagged mutagenesis (STM)
  – Transposon site hybridization (TraSH)

Fosmidomycin
• September 1999: a basic science breakthrough (data mining through bioinformatics identifies new targets for chemotherapy of malaria)
• 1st semester 2001: Results of Phase I clinical trials

Fosmidomycin example - lesson
• A lesson to take home: 1½ years from data mining and laboratory research to phase II, proof-of-principle clinical trials

Bioinformatics: Opportunities in Health Research and Development
• New drug research and development
  – Identification of novel drug/vaccine targets
  – Structural predictions
  – Tapping into biodiversity
  – Reconstruction of metabolic pathways
  – Systems biology
• Identification of vaccine candidates through analysis of surface antigens and epitopes

A Window of Opportunity for Disease Endemic Countries
• Bioinformatics is an extremely important tool, with relevance to studying pathogenic organisms
  – Pathogens of interest to DECs already being sequenced (e.g. P. falciparum, T. cruzi, T. brucei, Leishmania sp.)
• Computational biology is ‘people-intensive’, less affected by infrastructure, economics, etc. than other areas of biological research
• ‘Critical mass’ issues less critical – a world-wide community is within reach

Relatively Modest Hardware Needs and Technical Support
• Linux operating system permits use of the personal computer as a powerful workstation
  – Vast repository of public domain software for computational biology
• Individual accounts for remote access and data processing can be opened at high-performance computer facilities and regional centers
  – EMB network nodes, FIOCRUZ (Brazil), SANBI (South Africa), CECALCULA (Venezuela), ICGEB (Trieste and New Delhi)

Relatively Modest Hardware Needs and Technical Support
• Powerful searches using public websites
  – NCBI, EMB nodes, Sanger Center, Expasy/SwissProt, KEGG database
• High-speed internet access is becoming more and more available in disease endemic countries through regional and international support, e.g.:
  – Asia-Pacific Advanced Network Consortium (APAN) http://www.th.apan.net/
  – MIMCom Malaria Research Resources http://www.nlm.nih.gov/mimcom/about.html

International Training Course on Bioinformatics and Computational Biology Applied to Genome Studies (Train-the-trainers Workshop)
May 21-June 15, 2001
FIOCRUZ, Brazil

TDR Regional Training Centers & Regional Training Courses on Bioinformatics Applied to Tropical Diseases
• Africa
  – SANBI, Cape Town, South Africa
    • Course: Jan 20-Feb 02, 2002; Mar 19-Apr 4, 2003; Feb 2-15, 2004 (with NBN series)
  – Univ of Ibadan, Ibadan, Nigeria
    • Course: May 26-Jun 07, 2003
• South America
  – USP, São Paulo, Brazil
    • Course: Feb 18-March 02, 2002; July 17-19, 2003; July 5-16, 2004
• Southeast Asia
  – ICGEB, New Delhi, India
    • Course: Apr 26-May 09, 2002; Sep 22-Oct 06, 2003; Sept 28-Oct 11, 2004
  – Mahidol University, Bangkok, Thailand
    • Course: Jul 09-23, 2002; Sep 29-Oct 10, 2003; July 26-Aug 6, 2004

Training Course on Bioinformatics and Functional Genomics Applied to Insect Vectors of Human Diseases
At the Center for Bioinformatics and Applied Genomics (CBAG) and Center for Vector and Vector-Borne Diseases (CVVD), Faculty of Science, Mahidol University, Bangkok, Thailand
January 17-28, 2005

Training Course on Functional Genomics of Insect Vectors of Human Diseases
African Center for Training in Functional Genomics of Insect Vectors of Human Diseases (AFRO VECTGEN)
At the Malaria Research and Training Center (MRTC), Bamako, Mali
Dec 1-16, 2004

Beginning Bioinformatics Books
• Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition – Baxevanis & Ouellette 2001. John Wiley Publishing.
• Developing Bioinformatics Computer Skills – Gibas & Jambeck 2001. O’Reilly.
• Bioinformatics: Genome Sequence Analysis – Mount 2001
• Bioinformatics For Dummies – Claverie & Notredame 2003
• Bioinformatics and Functional Genomics – Pevsner 2003
• Introduction to Bioinformatics – Lesk 2002
• Fundamental Concepts of Bioinformatics – Krane & Raymer 2003
• Beginning Perl for Bioinformatics – Tisdall 2002
• Primer of Genome Science – Gibson & Muse 2002

The Challenge
What is expected of you?

Course Schedule
Take out your course schedule.

Comments and Suggestions