Genomics
Genomics is the study of an organism's genome and the function of its genes. More than 200 microbial genomes have been completely sequenced. Key question: how can we use this rich source of information?
DNA code: A, G, C, T
[Figure: a gene delimited by start and stop signals]

Functional genomics
Single genes    Aspect studied (method)                          All genes
DNA             Organisation (high-throughput sequencing)        GENOME
RNA             Expression (DNA arrays)                          TRANSCRIPTOME
PROTEIN         Synthesis/structure (2D gels, MS, NMR, X-ray)    PROTEOME
METABOLISM      Flux (NMR, kinetics, models)                     METABOLOME
All levels feed into: FUNCTION

Reading the genome map
Steps:
1. Determine the complete DNA sequence
2. Predict genes
3. Translate genes to proteins
4. Predict functions of proteins
5. Reconstruct metabolic pathways
6. Predict regulatory elements
7. Reconstruct regulatory networks
Next: experimental confirmation! (transcriptomics, proteomics, metabolomics)

Genomics: from sequence to predicted function
Raw sequence data: a bacterial genome sequence of 2,000,000 to 5,000,000 nucleotides, for example:
AAACACTTAGACAATCAATATAAAGATGAAGTGAA
CGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAAC
AAATCAAAGATCAAAAAAGGATAGAAGAACAAGAA
AAACCACAAACACTTAGACAATCAATATAAAGATG
AAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTG
CAGGAACAAATCAAAGATCAAAAAAGGATAGAAGA
ACAAGAAAAACCACAAACACTTAGACAATCAATAT
AAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGA
AAACTTGCAGGAACAAATCAAAGATCAAAAAAGGA
TAGAAGAACAAGAAAAACCACAAACACTTAGACAA
TCAATATAAAGATGAAGTGAACGCTCTTAAAGAGA
AGTTGGAAAACTTGCAGGAACAAATCAAAGATCAA
AAAAGGATAGAAGAACAAGAAAAACCACAAACACT
TAGACAATCAATATAAAGATGAAGTGAACGCTCTT
AAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAA
AGATCAAAAAAGGATAGAAGAACAAGAAAAACCAC
AAACACTTAGACAATCAATATAAAGATGAAGTGAA
CGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAAC
AAATCAAAGATCAAAAAAGGATAGAAGAACAAGAA
AAACCAC

A virtual cell: overview of predicted pathways

What do we want to learn?
An overview of:
• the complete repertoire of genes and proteins
• the complete metabolic network
• the complete regulatory network
• diversity and evolution
Systems biology: understand how a whole cell works.

Genome content
Organism    Size (Mb)   Total genes
bacteria    2           2,000
yeast       12          6,300
worm        97          19,000
fly         137         14,000
man         3,500       30,000
Fraction of each genome that is genes versus "junk": ?

Microbial genomes
Microbial genome sequencing:
• 1995-2000: mainly pathogenic bacteria
• 2000-present: genomes of many food-relevant micro-organisms
  - lactic acid bacteria
  - food spoilage bacteria

Genome Sequencing Projects
2005: 250 complete genomes, 600 million bases, 600 thousand proteins
[Figure: growth in the number of genome sequencing projects, 1997-2003]

Microbial genomes (status Sept. 2004)
                      Archaea     Bacteria
Sequenced genomes     23          236
Size range (Mb)       0.5-5.8     0.6-9.1
Genes                 540-4500    470-8300
% GC                  31-68       22-72
Coding density is ~85-90%, with on average ~1 gene per kb.

Bacterial genomes
              Number    Topology    Size
Chromosomes   1         circular    0.6-9 Mb
Plasmids      0-10      circular    1-250 kb

Exceptions
Linear chromosomes:
• Borrelia burgdorferi (0.91 Mb)
• Rickettsia typhi (1.11 Mb)
• Desulfotalea psychrophila (3.52 Mb)
• Streptomyces coelicolor (8.67 Mb)
Two chromosomes:
• Ralstonia solanacearum (3.72 and 2.09 Mb)
• Agrobacterium tumefaciens (2.84 and 2.07 Mb)
• Vibrio cholerae (2.96 and 1.07 Mb)
• Brucella melitensis (2.12 and 1.18 Mb)
• Deinococcus radiodurans (2.65 and 0.41 Mb)

Biological Databases
Database types (examples):
• sequence: EMBL, GenBank
• annotation: SwissProt
• enzyme: Enzyme, Brenda
• genome: Entrez, EBI Genome Reviews
• structure: PDB, SCOP
• pathway: KEGG, EcoCyc
• organism: FlyBase, WormBase
• organizational: Pfam, COG
These databases are summarized each year in the January issue of Nucleic Acids Research.

Genome Databases
Main databases:
• NCBI Entrez
  www.ncbi.nlm.nih.gov/genomes/lproks.cgi
• EBI Genome Reviews
  www.ebi.ac.uk/genomes/bacteria.html
• TIGR Comprehensive Microbial Resource (CMR)
  www.tigr.org/tdb
• Integrated Genomics GOLD
  www.genomesonline.org
• CBS Genome Atlas
  www.cbs.dtu.dk/services/GenomeAtlas
Specialized databases:
• Sanger Institute (UK): own genomes, many pathogenic bacteria
  www.sanger.ac.uk/projects
• Pasteur Institute (France): own genomes, many pathogenic bacteria
  www.pasteur.fr/english.html
• MIPS (Germany): PEDANT, all genomes
  http://pedant.gsf.de/
• DOE-JGI (USA): own genomes, many microbial and environmental
  http://genome.jgi-psf.org/microbial/
Overviews of databases:
• ABIM (France): organism databases
  www.up.univ-mrs.fr/~wabim/english/genome.html
Complete genomes:
• COGENT (COmplete GENome Tracking: a flexible data environment for computational genomics), EBI (UK)
• Complete genomes, NCBI (Haemophilus influenzae, E. coli, Mycoplasma genitalium)
• Completed Genomes at the EBI, EBI (UK)
• Completed microbial genomes, InfoBioGen (France)
• Completed microbial genomes, TIGR
• Completely sequenced genomes, Rockefeller (USA)
• EMGLib (completely sequenced bacterial genomes and the yeast genome), PBIL (France)
• Fully Sequenced Genomes Present In The Public DataBases, GOLD (USA)
• Integr8 (integrated views of complete genomes and proteomes), EBI (UK)
• PEDANT (Protein Extraction, Description, and Analysis Tool), MIPS (more than 200
Comparative genomics databases:
• ERGO (USA): comparative genomics analysis
  http://ergo.integratedgenomics.com/ERGO
• STRING (Germany): Search Tool for the Retrieval of Interacting Genes/Proteins
  http://www.bork.embl-heidelberg.de/STRING
Metabolic pathway-genome databases (PGDB):
• KEGG (Japan): Kyoto Encyclopedia of Genes and Genomes
  http://www.genome.jp/kegg/kegg2.html
• EcoCyc (USA): E. coli metabolic pathways (highly curated)
  http://www.ecocyc.org
• BioCyc: collection of PGDBs
  http://www.biocyc.org

Modeling metabolic networks: what are the questions?
• modeling the components and their wiring (roadmap)
• modeling regulatory interactions (traffic lights)
• modeling fluxes and dynamics (traffic)
• predictive modeling: rational design (solve traffic jams)
• "genomics modeling": provide biological interpretation of omics data

Genome sequence annotation
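Steps 2 and 3 of reading the genome map (predict genes, translate genes to proteins), together with the GC content and coding-density figures quoted for microbial genomes, can be illustrated with a deliberately naive first annotation pass. This is a sketch under stated assumptions, not any particular annotation pipeline: every ATG-to-stop open reading frame (ORF) on either strand that reaches a hypothetical `min_codons` threshold (assumed default 100 codons) is treated as a candidate gene, and the standard genetic code is used. All function names (`gc_content`, `reverse_complement`, `translate`, `find_orfs`, `coding_density`) are hypothetical helpers introduced here; alternative start codons (GTG, TTG), ribosome binding sites and overlapping genes are ignored.

```python
"""Naive first-pass genome annotation sketch (illustrative assumptions only).

Hypothetical helpers: gc_content, reverse_complement, translate,
find_orfs, coding_density. An ORF here is ATG ... stop in one reading
frame; real gene finders also use GTG/TTG starts and statistical models.
"""

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, str]]:
    """Return (strand, start, end, protein) for ATG-to-stop ORFs on both strands.

    Coordinates are 0-based, half-open, on the forward strand, and include
    the stop codon. Only the first ATG after a stop opens an ORF, so each
    stop yields at most one (the longest) ORF. min_codons excludes the stop.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        protein = translate(s[start:i])
                        if strand == "+":
                            orfs.append((strand, start, i + 3, protein))
                        else:  # map reverse-strand positions back to the forward strand
                            orfs.append((strand, n - (i + 3), n - start, protein))
                    start = None
    return orfs


def coding_density(orfs, genome_length: int) -> float:
    """Percentage of genome positions covered by at least one ORF."""
    covered = set()
    for _, start, end, _ in orfs:
        covered.update(range(start, end))
    return 100.0 * len(covered) / genome_length


if __name__ == "__main__":
    # Toy sequence with one short ORF: ATG + 5 x GCT + TAA.
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    orfs = find_orfs(toy, min_codons=3)
    print(f"GC: {gc_content(toy):.1f}%")  # GC: 60.0%
    print(orfs)  # [('+', 2, 23, 'MAAAAA')]
    print(f"coding density: {coding_density(orfs, len(toy)):.1f}%")  # coding density: 84.0%
```

Applied to a full genome string, `find_orfs` returns whatever ORFs pass the length cut-off, and `coding_density` can then be compared with the ~85-90% and ~1 gene per kb averages above. A fixed length threshold is only a rough proxy: dedicated bacterial gene finders such as Glimmer or GeneMark score candidate ORFs with statistical models of coding sequence instead.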