Are there bacterial species, and what is the goal of metagenomics

Tom Doak, Lynch lab, Biology; [email protected]
20% time, labs of HaiXu Tang and YuZhen Ye

The concept of species
• Reproductive isolation (however, morphology is always the first-pass method)
• Species we can be sure about; all higher taxonomic levels we can argue about
• Common ancestor (for the genome as a set): Phylogenies

Old-fashioned bacterial classification: shape, staining, metabolism

Figure 25-4 Bacterial shapes and cell-surface structures. Bacteria are classified into three different shapes: (A) spheres (cocci), (B) rods (bacilli), and (C) spiral cells (spirochetes). (D) They are also classified as Gram-positive or Gram-negative. Bacteria such as Streptococci and Staphylococci have a single membrane and a thick cell wall made of cross-linked peptidoglycan. They retain the violet dye used in the Gram staining procedure and are thus called Gram-positive. Gram-negative bacteria such as E. coli and Salmonella have two membranes, separated by a periplasmic space (see Figure 11-17). The peptidoglycan layer in the cell wall of these organisms is located in the periplasmic space and is thinner than in Gram-positives; they therefore fail to retain the dye in the Gram staining procedure. The inner membrane of Gram-negative bacteria is a phospholipid bilayer, and the inner leaflet of the outer membrane is also made primarily of phospholipids; the outer leaflet of the outer membrane, however, is composed of a unique glycosylated lipid called lipopolysaccharide (LPS) (see Figure 25-40). (E) Cell-surface projections are important for bacterial behavior. Many bacteria swim using the rotation of helical flagella (see Figure 15-68). The bacterium illustrated has only a single flagellum at one pole; others such as E. coli are decorated with multiple flagella all over the surface. Straight pili (also called fimbriae) are used to adhere to surfaces in the host and to facilitate genetic exchange between bacteria.
Both flagella and pili are anchored to the cell surface by large multiprotein complexes.

Bacterial pathogenomics. Mark J. Pallen & Brendan W. Wren. Nature 449, 835-842 (18 October 2007)

The problem of horizontal transfer
• Genes in a genome do not share common descent. In the worst case, a genome is a mosaic of genes from different sources (from different "species").
• Two general types:
  – Selfish elements (transposons and phage)
  – Metabolic genes (generally, enzymes)
• Identified by:
  – Genome comparisons
  – Compositional differences across a genome

We found that 755 of 4,288 ORFs (547.8 kb) have been introduced into the E. coli genome in at least 234 lateral transfer events since this species diverged from the Salmonella lineage 100 million years (Myr) ago. The average age of introduced genes was 14.4 Myr, yielding a rate of transfer of 16 kb/Myr/lineage since divergence. Although most of the acquired genes subsequently were deleted, the sequences that have persisted (~18% of the current chromosome) have conferred properties permitting E. coli to explore otherwise unreachable ecological niches.

IslandViewer | An integrated interface for computational identification and visualization of genomic islands: Salmonella enterica subsp. enterica serovar Typhi str. CT18

Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413, 852-856 (2001)
30 genes for vitamin B12 synthesis

Ulrich Dobrindt, Bianca Hochhut, Ute Hentschel & Jörg Hacker. 2004. Genomic islands in pathogenic and environmental microorganisms. Nature Reviews Microbiology 2, 414-424.
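The "compositional differences across a genome" criterion listed under horizontal transfer can be illustrated with a small sketch. The idea is that recently acquired DNA often keeps the base composition of its donor, so windows whose GC content is far from the genome-wide average are candidates for foreign origin. This is only a minimal illustration, not the method used by IslandViewer or by the studies cited above: the function names, the window and step sizes, and the z-score cutoff are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc, z) for sliding windows with unusual GC content.

    window, step and z_cutoff are illustrative defaults (assumptions),
    not values taken from the slides or from any published tool.
    """
    starts = range(0, max(len(genome) - window + 1, 1), step)
    gc = [(s, gc_fraction(genome[s:s + window])) for s in starts]
    values = [g for _, g in gc]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Perfectly uniform composition: nothing stands out.
        return []
    flagged = []
    for s, g in gc:
        z = (g - mu) / sigma
        if abs(z) >= z_cutoff:
            flagged.append((s, g, z))
    return flagged


if __name__ == "__main__":
    # Toy "genome": an AT-only background with a GC-only insert in the middle.
    toy = "AT" * 5000 + "GC" * 500 + "AT" * 5000
    for start, gc, z in atypical_windows(toy, z_cutoff=3.0):
        print(f"window at {start}: GC={gc:.2f}, z={z:.2f}")
```

GC content is the crudest compositional signal; real analyses typically also use codon usage or oligonucleotide frequencies, and compositional outliers alone do not prove horizontal transfer.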
Genomic islands in two strains of a marine phototroph, Prochlorococcus
The two strains of Prochlorococcus marinus are cyanobacteria, major oxygenic producers in the oceans, consuming a large part of atmospheric CO2. The strains MED4 and MIT9312 differ by only 0.8% of their genome, yet their distributions throughout the ocean are very different, for unknown reasons. The reason for the difference in distribution may have to do with genes encoded within five genomic islands specific to MED4 (ISL1, ISL4, ISL5) or to MIT9312 (ISL2, ISL3).

Identification of Genomic Islands: Synechococcus sp. WH8102 vs. Sargasso Sea
MUMmerplot (alignment of Sargasso Sea reads against the genome of this marine cyanobacterium)

Evolution of virulence in Pseudomonas syringae pv. phaseolicola (John Mansfield)

Sequence and functional analyses of Haemophilus spp. genomic islands

Gene islands and genome diversity in Pseudomonas aeruginosa: Different Pseudomonas aeruginosa strains show a remarkable genomic diversity mainly caused by insertion and deletion of mobile DNA blocks such as (pro)phages, plasmids, genomic islands and other elements. We have monitored large genomic islands in several P. aeruginosa strains and analysed these DNA blocks both for function of their encoded proteins and mobilisation from the host genome. Although these islands represent strain-specific insertions and can be excised and mobilised with different frequencies, the islands have apparently evolved from a common ancestor with phage- and plasmid-like characteristics and belong to a family of related genetic elements. All contain homologous parts with genes found in all related islands. Within these conserved parts unrelated blocks of DNA are interspersed. By screening larger collections of P.
aeruginosa strains we could show that members of this family of genomic islands are widespread within this species, and in the meantime more than 30 related DNA elements have been detected in the genomes of many different β- and γ-proteobacteria. Pseudomonas aeruginosa is an opportunistic pathogen for plants, animals and man. It is responsible for severe nosocomial infections and chronically colonizes lungs of patients with cystic fibrosis (CF), leading to morbidity and mortality. The genomes of the reference strain PAO1 (www.pseudomonas.com) and of several other strains have been sequenced completely; the results are invaluable for research on Pseudomonas genomics.

What metabolic abilities are found on islands?
• Pathogenesis (e.g., host range and disease effectors)
• Drug resistance (e.g., multidrug resistance)
• Alternative fuels (e.g., degrading PCBs)
• Novel biosynthetic pathways (e.g., vitamin B12)
• Microbial competition effectors (antibiotics)
Genomic islands in pathogenic and environmental microorganisms. Ulrich Dobrindt, Bianca Hochhut, Ute Hentschel & Jörg Hacker. Nature Reviews Microbiology 2, 414-424 (May 2004)

The genome of Salinibacter ruber: Convergence and gene exchange among hyperhalophilic bacteria and archaea
A schematic representation of the hypersalinity island identified in the genome of Salinibacter. Mongodin E F et al. PNAS 2005;102:18147-18152. ©2005 by National Academy of Sciences

The notion that all prokaryotes belong to genomically and phenomically cohesive clusters that we might legitimately call "species" is a contentious one. At issue are (1) whether such clusters actually exist; (2) what species definition might most reliably identify them, if they do; and (3) what species concept (by which is meant a genetic and ecological theory of speciation) might best explain species existence and rationalize a species definition, if we could agree on one. We review existing theories and some relevant data.
We conclude that microbiologists now understand in some detail the various genetic, population, and ecological processes that effect the evolution of prokaryotes. There will be on occasion circumstances under which these, working together, will form groups of related organisms sufficiently like each other that we might all agree to call them "species," but there is no reason that this must always be so. Thus, there is no principled way in which questions about prokaryotic species, such as how many there are, how large their populations are, or how globally they are distributed, can be answered. These questions can, however, be reformulated so that metagenomic methods and thinking will meaningfully address the biological patterns and processes whose understanding is our ultimate target.

"... in the end, I think the debate about species reality boils down, sadly, to different interpretations of the word 'real'." J. Mallet (2005)

Our quotation is from a review of Coyne and Orr's recent authoritative monograph, Speciation (Coyne and Orr 2004). The book deals overwhelmingly with the problems and practices of systematists who work with nonmicrobes (mostly animals) and the arguments of philosophers and historians who have taken an interest in what these systematists do. But Mallet's conclusion applies equally to debates among microbiologists. We too remain deeply divided, in our case about whether or not prokaryotes (i.e., Bacteria and Archaea; pace Pace 2006) have real species and if so how we might recognize, enumerate, and integrate them into existing theoretical frameworks in ecology, population genetics, and evolutionary biology. To the philosophically inclined, this should be more interesting than sad, however.
At the end of this essay we will conclude that prokaryotic genomics shows us that there is no reasonable interpretation of the word "real" that can be applied to microbial species generally, but that thinking about species has been highly productive, and learning to do without them will be even more so.

Figure 1. The problematics of any metapopulation lineage-based general species concept. Arrowheads represent populations or subpopulations that might or might not comprise a single species. Often, phylogenetic relationships between such clusters of individuals will be unknown or ambiguous: (left panel) Common memberships in a "metapopulation lineage" cannot be established. (middle panel) As well, there is in principle no way of knowing at what degree of divergence subpopulations assume independent "evolutionary roles and tendencies" [i.e., bacteria don't have sex by which to define species], and thus no way of recognizing minimally inclusive groupings (that is, of distinguishing species from higher taxonomic groupings). (right panel) And, when individuals are the product of extensive gene exchange, the very notion of lineage becomes problematic.

HGT (gene exchange between non-related organisms) appears commonplace among bacteria, but contributes just small fragments of genetic information, leaving the traditional tree of life intact.
From: Comparing Gene Trees and Genome Trees: A Cobweb of Life? PLoS Biol 3:e347

Clustering of cores: A possible recourse for species monism and realism?

Figure 2. Comparison of average nucleotide identities (ANI) with gene content. 773 genomes available in NCBI's RefSeq database were initially clustered using 16S rRNA identity of at least 97% as a guide to form groups.
A dozen clusters were selected (list of genomes within each cluster is available in Supplemental Table 1). For genomes within each cluster, pairwise ANI was calculated essentially as described in Konstantinidis and Tiedje (2005). Shared genes for each pair of genomes were identified as reciprocal top-scoring BLASTP matches (E-value < 0.001, z = 20,000,000). The proportion of shared genes was calculated as a ratio of the number of shared genes over the average number of genes in two genomes. Each ORF in a genome was assigned to a functional category according to the Clusters of Orthologous Groups (COG) database (August 2005 release), and three selected categories are depicted in this figure: categories J, P, and Q in COG category one-letter designation. Note that genomes of the E. coli/Shigella group have similar ANI values, but dramatically varying gene content. Some groups form tight clusters (e.g., Legionella spp.), while others exhibit a continuum of ANI/shared genes values (e.g., Burkholderia spp.). The clustering also exhibits a large variability in the number of shared genes if genes are considered by functional category.

Figure 3. Phylogenetic relationships among selected genomes in the Prochlorococcus marinus/marine Synechococcus group. Each point in a triangle (simplex) represents a set of orthologous genes that contains at least four analyzed genomes (and as many as 19 genomes from this group). Position of the point in the barycentric coordinate system (triangle) depends on bootstrap support values for each of three possible tree topologies with which each vertex is associated. The closer the point to the vertex, the higher its bootstrap support for that tree topology. Poorly resolved relationships result in points located closer to the center of the triangle. Values at each vertex refer to the number of sets of orthologous genes that support the tree topology at the vertex overall, with at least 80% and at least 90% bootstrap support, respectively.
For a full description of the methodology used to analyze embedded quartets, see Zhaxybayeva and Gogarten (2003) and Zhaxybayeva et al. (2006). Genomes are designated by their strain names. (Bold) Genomes of marine Synechococcus spp., (italics) low-light adapted Prochlorococcus marinus genomes, (plain font) Prochlorococcus marinus high-light adapted strains (all genomes are from NCBI's RefSeq database). Full analyses of the phylogenetic relationships within this group as well as details on the selection of sets of orthologous genes and phylogenetic analyses performed will be presented elsewhere (O. Zhaxybayeva, F. Doolittle, T. Papke, and P. Gogarten, in prep.).

Clustering of cores: A possible recourse for species monism and realism?

[Figure axes: Residual Ks; Synonymous substitutions per site (Ks); Distance from the Escherichia coli replication origin (Mb); Codon usage bias (standard deviations from the CAI mean)]

We separated sequence divergence into rate and time components, revealing that different regions of the Escherichia coli and Salmonella enterica chromosomes diverged over a ~70-million-year period. Genetic isolation first occurred at regions carrying species-specific genes, indicating that physiological distinctiveness between the nascent Escherichia and Salmonella lineages was maintained for tens of millions of years before the complete genetic isolation of their chromosomes.
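The shared-gene measure described in the Figure 2 caption (reciprocal top-scoring matches, then the number of shared genes divided by the average gene count of the two genomes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two dictionaries are hypothetical inputs, assumed to hold each gene's single top-scoring BLASTP hit in the other genome after the E-value filter has already been applied, and the gene names are invented for the example.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return the set of (gene_a, gene_b) pairs that are each other's top hit.

    Both arguments are hypothetical, pre-parsed mappings:
    gene in one genome -> its top-scoring hit in the other genome.
    """
    return {
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


def shared_gene_proportion(n_shared, n_genes_a, n_genes_b):
    """Shared genes divided by the average number of genes in the two genomes."""
    average_genes = (n_genes_a + n_genes_b) / 2
    if average_genes == 0:
        raise ValueError("both genomes are empty")
    return n_shared / average_genes


if __name__ == "__main__":
    # Invented toy data: genome A has 4 genes, genome B has 6 genes.
    a_to_b = {"a1": "b1", "a2": "b2", "a3": "b2"}
    b_to_a = {"b1": "a1", "b2": "a3"}
    pairs = reciprocal_best_hits(a_to_b, b_to_a)
    print(sorted(pairs))
    print(shared_gene_proportion(len(pairs), 4, 6))
```

Dividing by the average gene count keeps the measure symmetric between the two genomes, which matters for pairs such as the E. coli/Shigella group where gene content varies widely at similar ANI.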