Astrobiology + Bioinformatics
R. Eric Collins, 11 August 2010

METABOLISMS

BACTERIA
    chemo-litho-autotrophs
    chemo-litho-heterotrophs
    chemo-organo-autotrophs
    chemo-organo-heterotrophs
    photo-litho-autotrophs
    photo-litho-heterotrophs
    photo-organo-autotrophs
    photo-organo-heterotrophs

ARCHAEA
    chemo-litho-autotrophs
    chemo-litho-heterotrophs
    chemo-organo-autotrophs
    chemo-organo-heterotrophs
    photo-litho-heterotrophs
    photo-organo-heterotrophs

EUCARYA
    chemo-organo-heterotrophs*
    photo-organo-autotrophs*
    photo-organo-heterotrophs*
    *utilizes Bacterial endosymbiont

Bioinformatics
● “The application of statistics and computer science to the field of molecular biology”
● Common applications of Bioinformatics:
    ● Sequence analysis
    ● Genome annotation and comparative genomics
    ● Computational evolutionary biology
    ● Analysis of gene expression and regulation
    ● Prediction of protein structure and protein expression
    ● Modeling complex ecological systems

Central Dogma of Molecular Biology (for a biologist)
    DNA (replication) -> transcription -> RNA -> translation -> protein

Central Dogma of Molecular Biology (for a computer scientist)
    cp DNA.tar DNA.tar.1
    md5 DNA.tar DNA.tar.1
    MD5 (DNA.tar) = 483f0777e...
    MD5 (DNA.tar.1) = f39e1e9...
DNA.tar
    tar -xf DNA.tar RNA.c
    gcc -o protein RNA.c
protein
http://www.youtube.com/watch?v=D3fOXt4MrOM

The Era of Molecular Genetics and Exobiology
● 1924: Alexander Oparin writes “The Origin of Life”
● 1947, 1952: Joshua Lederberg founded modern bacterial genetics and gene manipulation
● 1954: “The Origins of Life” by JBS Haldane, geneticist
● 1960: Lederberg writes “Exobiology: Approaches to Life Beyond Earth”
● 1965: Linus Pauling founded the use of “Molecules as Documents of Evolutionary History”
● 1977, 1990: Carl Woese identified Archaea as the Third Domain of Life

The Rise of Computers
    NASA Ames Center for Bioinformatics (1996 to 2001)
    NASA Center for Astrobioinformatics (December 2003 to Feb 2004)
    NASA Center for Computational Astrobiology (2000 to 2008)

The PCR Revolution: Culture Independence
● gene sequencing (informational & functional)
● identification of cells at microscopic level
● community fingerprinting
● metabolic profiling of DNA, RNA, protein, lipids

Beaufort Sea, Canadian Arctic
Collins et al. 2010
Ribosomal gene sequencing

Shark Bay, Western Australia
Leuko et al. 2006
cc photo by flickr user Koala:Bear
Community fingerprinting

The Sequencing Revolution: Comparative Genomics
● Genomics
    ● Sanger sequencing: $7000/Mb, 96 x 700bp reads
● Transcriptomics
    ● Microarrays: $100-$1000 per slide, ~10,000 probes
● Proteomics
    ● Mass spectrometer, ~$500 per experiment
DOE JGI IMG: 1911 Bacteria, 84 Archaea, 76 Eukarya

Siberian Permafrost
Ayala-del-Río et al. 2010
cc photo by flickr user Магадан
Gene expression by Psychrobacter arcticus 273-4

Black Sea, Russia
Fuchsman and Rocap 2006
cc photo by flickr user И. Максим
One way of computing genetic similarity

Protein similarity by whole genome BLAST (ranks)
[Figure: matrix of BLAST hit ranks (1-4) for proteins A1, A2, ..., An against proteins B1, B2, ..., Bn]

Reciprocal best BLAST hits
[Figure: the same rank matrix for proteins A1, A2, ..., An against proteins B1, B2, ..., Bn]

Whole genome comparisons all Bacteria vs.
all Archaea with reciprocal best BLAST hits
    Limited by genome size of bacterium
    Limited by genome size of archaeon
    (oxygen-using salt-loving Archaeon)
    (oxygen-sensitive high-temperature-loving Archaeon)
Anaerobic/thermophilic Bacteria are genomically more similar to Archaea than other Bacteria are

The Sequencing Revolution (2.0): Metagenomics
● Next Generation Sequencing technology
    ● 454: $30/Mb, 1 million x 400bp reads, 12 hours
    ● Illumina: $6/Mb, 15 million x 2 x 100bp reads, 5 days
    ● SOLiD: $3/Mb, 200 million x 2 x 25bp reads, 5 days
    ● PacBio, Ion Torrent, Helicos, ...
● Applications
    ● Metagenomics: whole community sequencing
    ● Deep Sequencing: hypervariable tag sequencing
    ● Transcriptomics: whole transcriptome sequencing
    ● ????
● Essential resources
    ● IMG/m (217 metagenomes), CAMERA

Diffuse Hydrothermal Vents
Sogin et al. 2006
Short, error-prone sequencing reads ... x 20,000 (or 20,000,000)
“Rare Biosphere”

World Ocean viromes
Angly et al. 2006
Genome assembly with short reads is hard
Virus genes are mostly unknown

Cuatro Ciénegas, Mexico
Breitbart et al. 2008

Pilbara craton, Western Australia
Shen et al. 2001
cc photo by flickr user ccferg
Sulfate-reducing Bacteria & Archaea
Fractionation == biological sulfate reduction (?)
Placing time boundaries on the evolution of metabolisms

Matching observations to genetics

Clustering proteins by similarity
[Figure: matrix of BLAST hit ranks (1-4) for proteins A1, A2, ..., An against proteins B1, B2, ..., Bn]

Birth, Death, Innovation

Project ideas
● Noise/error filter for short sequencing reads
● Genome assembly from short error-prone reads
● Mathematical formalization of bacteria vs. archaea genome size relationships
● Better ways of calling outliers, e.g. in all vs. all BLAST comparisons
● Digitization of Bergey's manual
● Methods for astronomy analogies in biology
    ● gamma ray bursts? habitable zones? ...
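
The reciprocal best BLAST hit comparison named in the slides can be sketched in a few lines. This is a minimal illustration, not the author's pipeline: it assumes the BLAST results have already been parsed into nested dictionaries of scores, and the names `best_hits`, `reciprocal_best_hits`, and the toy bit scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: scores[query][subject] = BLAST bit score.
Scores = Dict[str, Dict[str, float]]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its rank-1 (highest-scoring) subject."""
    best: Dict[str, str] = {}
    for query, subjects in scores.items():
        if subjects:
            # Highest score wins; ties are broken by name so the result
            # is deterministic.
            best[query] = min(subjects, key=lambda s: (-subjects[s], s))
    return best


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (invented for illustration): genome A searched against genome B,
# and genome B searched against genome A.
a_vs_b: Scores = {
    "A1": {"B1": 250.0, "B2": 90.0},
    "A2": {"B1": 120.0, "B2": 80.0},
    "A3": {"B3": 60.0},
}
b_vs_a: Scores = {
    "B1": {"A1": 245.0, "A2": 118.0},
    "B2": {"A1": 88.0, "A2": 79.0},
    "B3": {"A3": 61.0},
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy example A2's best hit is B1, but B1's best hit is A1, so A2 is left unpaired. Because each protein can appear in at most one pair, the number of reciprocal best hits can never exceed the protein count of the smaller genome, which is consistent with the "Limited by genome size" annotations on the whole genome comparison slide.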
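
The per-platform figures quoted in the two Sequencing Revolution slides can be turned into a rough yield-per-run comparison. The sketch below is only arithmetic on the numbers from the slides, under stated assumptions: 1 Mb is taken as 10^6 bases, "x 2" is read as paired-end sequencing, the read counts are taken to be per run, and the $/Mb price is assumed to apply to the whole run. The `Platform` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    usd_per_mb: float   # price quoted in the slides
    reads: int          # reads per run (assumed)
    ends: int           # 2 if the slide says "x 2" (assumed paired-end)
    read_len: int       # bases per read

    @property
    def megabases(self) -> float:
        # Assumes 1 Mb = 1,000,000 bases.
        return self.reads * self.ends * self.read_len / 1_000_000

    @property
    def implied_run_cost(self) -> float:
        # Assumes the quoted $/Mb applies uniformly to the whole run.
        return self.megabases * self.usd_per_mb


platforms = [
    Platform("Sanger", 7000, 96, 1, 700),
    Platform("454", 30, 1_000_000, 1, 400),
    Platform("Illumina", 6, 15_000_000, 2, 100),
    Platform("SOLiD", 3, 200_000_000, 2, 25),
]

for p in platforms:
    print(f"{p.name:8s} {p.megabases:12.4f} Mb  ~${p.implied_run_cost:,.0f} per run")
```

Under these assumptions a Sanger plate yields well under 0.1 Mb while a SOLiD run yields about 10,000 Mb, which illustrates why the shorter, more error-prone reads made whole community sequencing practical.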