Evolutionary and Ecological Bioinformatics
Biology/Computer Science 327, Fall 2015
Professors Fred Cohan and Danny Krizanc
DATE (LECTURER): LECTURE TITLE
Sept. 8 (Cohan): 1. Bioinformatic approaches to ecology and evolution
Sept. 10 (Krizanc): 2. Algorithms in everyday life and in research
Sept. 15 (Cohan): 3. Approaches to phylogeny through overall similarity of organisms (phenetics vs. cladistics)
Sept. 17 (Krizanc): 4. Alignment of DNA and protein sequences
Sept. 22 (Krizanc): 5. Distance-based algorithms for estimating relationships (UPGMA and NJ)
Sept. 24 (Krizanc): 6. Maximum parsimony approach to phylogeny; search algorithms for finding the best phylogeny
Sept. 29 (Krizanc): 7. Models of molecular evolution (including Jukes-Cantor, neutral theory, transition-transversion); incorporating molecular models in maximum likelihood algorithms for phylogeny estimation
Oct. 1 (Krizanc): 9. Bayesian approaches to phylogeny and your own life
Oct. 6 (Krizanc): 8. Testing the robustness of a tree
Oct. 8 (Krizanc): 10. Gene trees vs. species trees; splits trees and phylogenetic networks
Oct. 13 (Cohan): 11. The importance of using phylogeny for testing hypotheses about natural selection; phylogenetic algorithms for testing natural selection
Oct. 15 (Krizanc): 12. Assembly algorithms for genome sequencing, from isolates, metagenomes, and uncultivated single cells (Velvet)
Oct. 20 (Cohan and Krizanc): 13. Algorithms for structural annotation (where are the genes) and databases for functional annotation (what are the genes)
Oct. 22 (Krizanc): 14. Genome-based trees (based on gene content, gene order, and sequence of concatenation) (FastTree); supertrees.
Oct. 27: Fall break
Oct. 29 (Cohan): 15. Gene duplication in evolution; genome-wide analysis of adaptation through gene acquisition
Nov. 3 (Cohan): 16. Analyses of adaptation through changes in genome-wide gene expression
Nov. 5 (Cohan and Krizanc): 17. Research projects
Nov. 10 (Cohan and Krizanc): 18. Genome-wide approaches for finding shared genes under recent positive selection
Nov. 12 (Cohan): 19. Metagenomics in ecosystems biology: how to find out the physiological processes occurring in an ecosystem even when we don’t know who the organisms are
Nov. 17 (Cohan): 20. Metagenomic approaches for characterizing community-wide organismal diversity; sorting sequences into taxa
Nov. 19 (GIS guest speaker: Sophie Breitbart): 21. Ecological niche modeling
Nov. 24: (cancelled)
Nov. 26: Thanksgiving
Dec. 1 (Cohan): 22. Metagenomic approaches to finding out what unidentified genes do (ecological annotation); bioinformational bioprospecting
Dec. 3 (Cohan): 23. The human microbiome: types of communities across humans, functional screening for novel genes, antibiotic holocausts and health consequences
Dec. 8 (Cohan): 24. Baseball, biology, global climate change, and big data
Dec. 10: 25. Molecular approaches for identifying microbial diversity in natural communities: AdaptML and Ecotype Simulation
Registrar-scheduled time for our Final Exam (Cohan and Krizanc): POSTER SESSION, Zelnick Pavilion
TEXTBOOK READINGS
Ch. 1, 2; Ch. 3, 4, 12; Ch. 5, 6; Ch. 5, 8; Ch. 9; pp. 75-78; Ch. 10; pp. 82-89; Ch. 15; Ch. 14
HOMEWORK ASSIGNMENTS
1. Make a tree (with help from computer algorithms). Due Oct. 6
2. A pencil and paper phylogenetic problem set. Due Oct. 22
3. Ecological niche modeling. Due Dec. 1
4. Project abstract. Due Dec. 3
5. Comparing genomes to characterize past natural selection. Due Dec. 8
TERM PROJECT
Poster on research project. Due at time of final exam
Paper on (the same) research project. Due at time of final exam
GRADING
Homework assignments: 50%
Term project poster: 20%
Term project paper: 30%
READINGS
Textbook: Phylogenetic Trees Made Easy: A How-To Manual, Fourth Edition, Barry G. Hall,
2011, Sinauer Associates.
Supplementary Readings will be listed on the class WesFiles web site.
CONTACT INFORMATION (Email is the best way to set up an appointment.)
Fred Cohan
207 Shanklin
X3482
[email protected]
Office hours: Fridays 1:15-2:15, and by appointment
Danny Krizanc
631 Exley Science Center
X2186
[email protected]
Abby Cram
111 Hall-Atwater
[email protected]
Office hours: Tuesdays 1:00-2:00 (with additional times to be announced)
Sophie Breitbart
QAC Office, Allbritton
[email protected]
Office hours: Mondays 3:00-4:00, Wednesday 3:00-4:00, Thursday 7:00-9:00 pm (except Nov.
19 & Dec. 3) at the QAC and by appointment. You can also email her.
November 18, 2015
Supplementary Reading
Sep. 8: 1. Bioinformatic approaches to ecology and evolution
Sep. 10: 2. Algorithms in everyday life and in research
Sep. 15: 3. Approaches to phylogeny through overall similarity of organisms
Ginsberg et al. give a really nice example of the Big Data approach, in this case to
predict influenza levels before the CDC can, based on Google search queries
(Ginsberg et al., 2009). See also Salathé et al. (2013). Larson et al.
provide phylogenetic evidence that wild pigs were domesticated in six
different places around Eurasia (Larson et al., 2005); similarly, Thalmann et al.
show that dogs were domesticated four times in Europe (Thalmann et al.,
2013). Keeling and Palmer chart phylogenetically the most significant
horizontal transfer events in eukaryotic history (Keeling and Palmer, 2008).
Mikkelsen et al. have identified those genes in the genome that have been
under selection for new adaptations in humans (Mikkelsen et al., 2005).
Merhej et al. compared bacterial genomes to test whether different lineages
evolving independently toward pathogenicity (or mutualism) tend to lose the
same genes convergently (Merhej et al., 2009). (They do!) Christina Richards
et al. explored the circumstances under which gene expression changed over
the course of an organism’s life, in the case of the plant Arabidopsis (Richards
et al., 2012). Fierer et al. explored how the bacterial community on hands
varies between the left and right hands and between people, and the effects
of washing on hands’ bacterial communities (Fierer et al., 2008). Knight et al.
showed, in a meta-analysis across various high-impact studies from the Earth
Microbiome Project, how the similarity of environment drives the similarity of
bacterial communities (Knight et al., 2012).
Harel’s Chapter 4 is a "gentle" introduction to the notion of NP-completeness
or why some problems are hard for computers to solve (Harel, 2000).
Nosenko et al. give a recent phylogeny of animals based on various genes;
they explain how to choose the best set of genes when genes differ in the
phylogenies they yield; for our purposes, the article shows how some
evolutionary groups that were not obvious from morphological similarities
were discovered through sequence analysis (Nosenko et al., 2013). Related to
this, Adoutte et al. show how morphological and sequence data yield
different relationships among animal phyla (Adoutte et al., 2000). Funch and
Kristensen present their discovery of an animal phylum (Funch and
Kristensen, 1995). Schloss and Handelsman present a phylogeny of the
bacterial phyla, showing that most of the phyla do not have even a single
cultivated species (Schloss and Handelsman, 2004). My colleagues and I offer
an example of discovering new taxa based on sequence data alone, at the
level of new genera and new species (Kim et al., 2012). My recent
encyclopedia chapter on species gives an overview of the concepts of species,
including the dynamic qualities species have long been expected to have
(Cohan, 2013). Mallet gives a species concept based on Darwin’s idea that two
species should have no or very little overlap in a set of distinguishing
characteristics; his concept does not deal with the dynamic qualities of
cohesion, irreversible separateness, and so on (Mallet, 1995). Genoways and
Choate, from the heyday of numerical taxonomy, illustrate two ways of
presenting data on clustering of organisms by their overall phenotypic
similarity (Genoways and Choate, 1972). Kämpfer et al. make a case that
species of Streptomyces form distinct, justifiable units when we demarcate
species at the 80% similarity level for phenotypic traits (Kämpfer et al., 1991).
Futuyma, in his textbook, explains the limitations of the phenetic approach to
phylogeny (where all characters are used), and why we should constrain our
analyses to those characters that are derived (Futuyma, 1998). Baum and
Smith, in their textbook, give a clearer example of using shared, derived
characters for making a phylogeny (Baum and Smith, 2013).
Sep. 17: 4. Sequence Alignment
Sep. 22: 5. Distance-based Methods for Phylogeny Construction
Sep. 24: 6. Maximum parsimony approach
Sep. 29: 7. Models of molecular evolution and maximum likelihood approach
Oct. 1: 8. Bayesian methods
Oct. 6: 9. Testing the robustness of trees
Sean Eddy’s paper gives a biologist’s view of something called dynamic
programming, which is the central idea behind a number of bioinformatics
algorithms including how to perform pairwise sequence alignment (Eddy,
2004). I’ve also included the original papers introducing ClustalW (the most
commonly used multiple alignment tool), MUSCLE (a newer tool
recommended by Hall) (Edgar, 2004) and GUIDANCE (a tool for evaluating the
quality of alignments described in Chapter 12 of Hall) (Penn et al., 2010).
Morrison tries to answer the question “Why would phylogeneticists ignore
computerized sequence alignment?” and makes some interesting points along
the way (Morrison, 2009). His conclusion is that the current tools aren’t good
enough.
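As a rough illustration of the dynamic programming idea behind pairwise alignment, here is a minimal sketch of global alignment scoring using the Needleman-Wunsch recurrence. The function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; they are not taken from the readings, and real tools use more elaborate scoring schemes.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best global alignment score of sequences a and b.

    Illustrative scoring only: the default match/mismatch/gap values
    are assumptions, not values prescribed by any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned against nothing but gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned against nothing but gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]
```

Each cell depends only on its three neighbors, so the table is filled in time proportional to the product of the two sequence lengths. Recovering the alignment itself would require a traceback step, which is omitted here.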
I’ve included the original papers describing UPGMA (by Michener and Sokal)
(Michener and Sokal, 1957) and Neighbor-Joining (by Saitou and Nei) (Saitou
and Nei, 1987). Both are pretty heavy going but interesting. For gentler
descriptions of these algorithms I suggest Wikipedia. For a computer science
perspective on this and the next three lectures I have also included Mona
Singh’s notes (from a course she teaches at Princeton) on phylogeny
reconstruction.
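For readers who want to see the clustering step of UPGMA spelled out, the following is a minimal sketch, assuming the pairwise distances are supplied as a dictionary keyed by `frozenset` pairs. The function name, the input format, and the nested-tuple output are all illustrative choices, and branch lengths are not computed.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    (Both the input format and the output format are assumptions made
    for this sketch.)
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two closest clusters currently in play.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # The distance from the new cluster to every other cluster is the
        # size-weighted average of the distances from its two children.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))
```

With four hypothetical taxa where A and B are closest, C is next, and D is most distant, the function returns `('D', ('C', ('A', 'B')))`. Neighbor-Joining differs in that it corrects each pairwise distance for the average divergence of the two taxa from all others before choosing a pair to join.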
The paper by Bos and Posada is a nice review of different models of DNA
evolution and how they are used in building trees (Bos and Posada, 2005). The
article by Guindon et al. discusses some recent developments in maximum
likelihood algorithms that have had a real impact on how fast they are and
how large a tree they can construct (Guindon et al., 2010). Sumner et al.
discuss why it might not be such a good idea to use the most general model
available when estimating trees (Sumner et al., 2012).
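To make the idea of a substitution model concrete, here is a small sketch of the simplest one, the Jukes-Cantor correction, which converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function name is hypothetical, and the sketch assumes two pre-aligned sequences of equal length with no gaps.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Estimate substitutions per site under the Jukes-Cantor model.

    Assumes seq1 and seq2 are already aligned, equal in length,
    and free of gap characters (a simplification for this sketch).
    """
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(x != y for x, y in zip(seq1, seq2))
    p = differences / len(seq1)  # observed proportion of differing sites
    if p >= 0.75:
        # The correction is undefined once the sequences look random.
        raise ValueError("sequences are too divergent for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The corrected distance always exceeds the raw proportion p when p is positive, because the model accounts for multiple substitutions at the same site that leave no visible trace.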
McGrayne discusses implicit, embedded use of Bayesian methods in baseball
batting averages and other issues of daily import (McGrayne, 2011) (p. 130).
Silver introduces Bayesian analysis using the mysterious panties (or nighty)
story (Silver, 2012) (p. 245). Huelsenbeck et al. review the use of Bayesian
methods in phylogeny reconstruction (Huelsenbeck et al., 2001). Ronquist and
Huelsenbeck introduce the third iteration of the program MrBayes (Ronquist
and Huelsenbeck, 2003).
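The core calculation behind all of these Bayesian readings is a single update of a prior belief by the likelihood of the evidence. A minimal sketch for a yes/no hypothesis follows; the function name and the numbers in the usage note are hypothetical and chosen only to show the arithmetic.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Apply Bayes' rule to a hypothesis that is either true or false.

    prior: probability of the hypothesis before seeing the evidence.
    likelihood_if_true: probability of the evidence if the hypothesis holds.
    likelihood_if_false: probability of the evidence if it does not.
    """
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence
```

For example, with an illustrative prior of 0.04 and evidence that is ten times more likely under the hypothesis (0.5 versus 0.05), `posterior(0.04, 0.5, 0.05)` is roughly 0.29: strong evidence raises a small prior substantially without making the hypothesis probable. Bayesian phylogenetic programs apply the same rule over the space of trees, which is why they rely on sampling rather than direct calculation.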
The paper by Anisimova and Gascuel introduces an approximate likelihood
ratio test that can be used in conjunction with maximum likelihood methods
to estimate one’s confidence in the clades of a given tree (Anisimova and
Gascuel, 2006). This turns out to be much faster than using non-parametric
approaches such as bootstrapping.
Oct. 8: 10. Gene trees vs. species trees; splits trees and phylogenetic networks
Oct. 13: 11. The importance of using phylogeny for testing hypotheses about natural selection; phylogenetic algorithms for testing natural selection
Oct. 15: 12. Assembly algorithms for genome sequencing, from isolates, metagenomes, and uncultivated single cells
Oct. 20: 13. Algorithms for structural annotation (where are the genes) and databases for functional annotation (what are the genes)
The paper by Degnan and Rosenberg shows how lineage sorting can cause
serious problems when trying to infer the correct species tree from gene trees
(Degnan and Rosenberg, 2006). White et al. study the discordance between
gene trees for three subspecies of mouse (White et al., 2009). The Iwabe et al.
paper uses gene duplication/loss parsimony to root the tree of life (Iwabe et
al., 1989). Zmasek and Eddy describe a straightforward algorithm for inferring
duplication/loss events given a gene tree and its corresponding species tree
(Zmasek and Eddy, 2001). Ropars et al. present genomic evidence for adaptive
horizontal gene transfers between cheese fungi; apparently, this was driven
by artificial selection for deliciousness (Ropars et al., 2015).
Donoghue presents the classic case for why every evolutionary biologist
needs to pay attention to phylogeny (Donoghue, 1989). In their book on
comparative biology, Harvey and Pagel explain how phylogeny can be used to
make tests of natural selection (Harvey and Pagel, 1991). Probert et al.
analyze the relationship between seed longevity and various phenotypic and
environmental factors. In one analysis, they perform the tests using a pre-Donoghue, non-phylogenetic approach, and in another, they make a test
based on phylogenetically independent contrasts (Probert et al., 2009). Our
own Mike Singer presents a very nice phylogenetically independent contrasts
analysis of the effect of caterpillar specialization on the effect of bird
predation (Singer et al., 2014). Laurin compares the accuracy of various
methods for phylogenetically independent contrasts (Laurin, 2010).
Zerbino and Birney present Velvet, an algorithm for sequence assembly from
very short reads (Zerbino and Birney, 2008). Miller et al. present an overview
of algorithms for assembly from short-read sequencing (Miller et al., 2010).
Waterston et al. discuss the Celera project as a cannibalization of the
worldwide human genome project (Waterston et al., 2002). And from the
Venter group a rebuttal (Myers et al., 2002). She et al. discuss the challenge of
genome sequencing in the context of duplicated regions (She et al., 2004).
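Velvet builds its assemblies on a de Bruijn graph, in which every k-mer found in the reads becomes an edge between its two overlapping (k-1)-mers. The sketch below shows only that graph-construction step under simplifying assumptions (error-free reads, a single strand); the function name is hypothetical, and none of Velvet's error correction or graph simplification is represented.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads as an adjacency list.

    Nodes are (k-1)-mers; each distinct k-mer contributes one edge from
    its prefix to its suffix. Assumes error-free, single-stranded reads,
    which is a simplification made for this sketch.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # record each distinct k-mer once
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

For the two overlapping reads "ACGT" and "CGTA" with k = 3, the graph is the single path AC, CG, GT, TA, which spells out the merged sequence ACGTA. Repeats and duplicated regions create branches in the graph, which is one way to see why they make assembly hard.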
Van Domselaar et al. present an overview of genome annotation (van
Domselaar et al., 2014). Pruitt et al. describe the current state of NCBI’s
Reference Sequences project, against which “extrinsic gene finding” is used to
characterize putative ORFs (Pruitt et al., 2012). Hyatt et al. present Prodigal, a
recent algorithm for identifying genes by ab initio methods (Hyatt et al.,
2010), and Borodovsky et al. present GeneMark, among the first successful
ab initio algorithms (Borodovsky et al., 2003). Delcher et al. describe Glimmer
(Delcher et al., 2007). Functional annotation requires a reliable database with
functions accurately attributed to genes; the UniProt Consortium provides
one such database (Consortium, 2013). The KEGG database is another
(Kanehisa et al., 2014). The TIGRFAM database of protein families is
searchable by nhmmer (Wheeler and Eddy, 2013). Gardy et al. present PSORTb, a
program for assigning an unknown protein to a location within a bacterial cell
(Gardy et al., 2005). The COG database of 25 broad functional categories of
proteins is presented by Tatusov et al. (Tatusov et al., 2003). An example of a RAST
annotation is presented by Kopac et al. (Kopac et al., 2014).
Oct. 22: 14. Genome-based trees (based on gene content, gene order, and sequence of concatenation) (FastTree); supertrees; need for highly resolved trees.
Oct. 29: 15. Gene duplication in evolution; genome-wide analysis of adaptation through gene acquisition
Nov. 3: 16. Adaptation through gene acquisition (part 2); analyses of adaptation through changes in genome-wide gene expression
Rokas et al. empirically show that about 20 genes are all that is required to
yield a high-resolution tree based on concatenation of genes (Rokas et al.,
2003). Lin et al. present an algorithm for producing genome-based trees of
prokaryotes, taking into account the content and structure of the genome (Lin
et al., 2009). Huson and Steel present an algorithm for producing a tree based
on gene content (Huson and Steel, 2004). Li et al. produce a distance-based
tree-building algorithm based on the Kolmogorov distance between genomes
(Li et al., 2001). Whidden et al. present an algorithm for producing a
supertree while accommodating horizontal genetic transfer (Whidden et al.,
2014). Swenson et al. present a two-step algorithm for producing supertrees
with enormous numbers of organisms (Swenson et al., 2012).
Brenner et al. present their classic result that families of gene duplicates are
extremely common in a genome (Brenner et al., 1995). Zhong et al. explore
the young duplicated genes specific to various species and lineages of
Drosophila fruit flies, and find that there has been much convergence of
duplication events across lineages (Zhong et al., 2013). Merhej et al. make a
case for convergent losses of genes in the origins of various lineages of
obligately intracellular parasites (Merhej et al., 2009). The Welch et al. paper
shows the huge magnitude of gene content differences within one bacterial
species (Welch et al., 2002). Popa et al. perform a comprehensive network
analysis to identify donor-recipient pairs in recent HGT events, showing that
most HGT events have involved close relatives (Popa et al., 2011).
Nevertheless, this same group also showed that the radical transformation
that resulted in the Haloarchaea involved over 1000 HGT events from various
bacteria (a different domain) (Nelson-Sathi et al., 2012). Such a radical
transformation is hindered by architectural constraints, as I have discussed
(Wiedenbeck and Cohan, 2011; Cohan, 2010). Veyrier et al. identify the genes
in the various steps from non-pathogenic Mycobacterium species to the
human tuberculosis pathogen (Veyrier et al., 2009). Sun et al. used a genomic
comparison to figure out the critical step toward Plague’s transmission from
fleas (Sun et al., 2014). The bioinformatics for the Plague study was
performed by Chain et al. (Chain et al., 2004).
Luo et al. (as discussed by Cohan and Kopac) identify the genes that
distinguish environmental from gut-commensal E. coli, and use a
bioinformatic approach to show that these changes are adaptive (Luo et al.,
2011; Cohan and Kopac, 2011). Kleiner et al. present a case of genomic
“reverse ecology,” where genomes indicate aspects of the environment that
were previously unknown, in this case the sea grass sediment environment
(Kleiner et al., 2012). Bhaya et al. do the same for a hot springs environment
(Bhaya et al., 2007). Hao and Golding present evidence that evolution is
accelerated when a gene enters a new organism through HGT (Hao and
Golding, 2006). Touchon et al. identify HGT events among members of the
species taxon E. coli, and show that among closest relatives, nearly all HGT
events involve genes without a function for the bacteria (Touchon et al.,
2009). Kopac et al. also showed that, among extremely close relatives, nearly
all of the genes acquired are not of known function (Kopac et al., 2014).
Richards et al. worked out the timing of gene gains and losses in Streptococcus, and
found an early period of net gene gains and then a later period of net losses
(Richards et al., 2014).
Herring et al. use a genome “re-sequencing” approach to infer that single
changes in one gene might have manifold effects on gene expression across
the genome (Herring et al., 2006). Ferea et al. present a classic piece of work
showing the hundreds of gene expression changes that yeast undergoes as it
spontaneously evolves to be aerobic (in the absence of competitors) (Ferea et
al., 1999). Sumby et al. use genome-wide gene expression and genome
resequencing to show that passaging a non-pathogenic strain of Strep
through a mouse brings about evolution of virulence: a single change
in a signal-transducing gene brings about massive changes in gene expression,
including dozens of virulence genes (Sumby et al., 2006). Hahne et al. explore
in one strain of Bacillus subtilis the various genome-wide gene expression changes
that respond to a salinity challenge (Hahne et al., 2010). Dettman et al.
discuss the diversity of evolutionary responses among closely related
populations to a single selection pressure through the magic of genome-wide
gene expression analyses (Dettman et al., 2012). Vital et al. work out the
transcriptome differences between commensal and aquatic strains of E. coli
(Vital et al., 2015). Gómez-Lozano et al. show the power of RNA-seq to open-endedly
discover differences in expression beyond the annotated genome
(Gómez-Lozano et al., 2012).
Nov. 10: 18. Genome-wide approaches for finding shared genes under recent positive selection
Nov. 12: 19. Metagenomics in ecosystems biology: how to find out the physiological processes occurring in an ecosystem even when we don’t know who the organisms are
Williamson et al. present a genome-wide analysis of selective sweeps in the
human genome, across the entire species and within ethnic groups
(Williamson et al., 2007). Pavlidis et al. presented a new algorithm (SweeD)
for detecting selective sweeps from an input of thousands of whole-genome
sequences (Pavlidis et al., 2013). Here they applied it to detect several genes
that underwent a selective sweep on human chromosome 1. Clark et al.
performed a genome-wide phylogenetic analysis of positive selection in the
human lineage, compared to chimps, and with mouse as the outgroup (Clark
et al., 2003). Note how they identified the individual genes under selection in
the human lineage, and how they identified functional classes of genes with a
particularly high frequency of accelerated evolution in humans. Vos
developed a species concept for bacteria based on each ecotype having its
own unique history of positive selection (Vos, 2011); you might think about
how this idea may yield the same or different demarcations of ecotypes. Our
Kopac et al. 2014 article implements this approach to find evidence that the
most closely related strains we can find within Bacillus subtilis are already
ecologically divergent (Kopac et al., 2014). Vos et al. present their new
computer package ODoSE to find bacterial ecotypes as units that are different
in their histories of positive selection (Vos et al., 2013). PAML implements a
maximum likelihood approach (Yang, 2007).
Bell et al. present evidence that increasing bacterial diversity increases the
productivity of an ecosystem (Bell et al., 2005). Lay et al. investigate the
functional diversity in an extremely cold and salty spring at the top of the
world; they find that certain functions are found redundantly in a great
diversity of organisms, while others are not (Lay et al., 2013). Simon et al. use
a metagenomic approach to studying the microbial organismic diversity on a
glacier; they also discover the genes responsible for protection against the
cold in this community (Simon et al., 2009). McHardy et al. present a package
called PhyloPythia, for identifying organisms from a single metagenomic
sequence, based on nucleotide composition (McHardy et al., 2007). Cecchini
et al. use a metagenomic approach to figure out which organisms provide
certain functions in the environment, in this case the ability to utilize prebiotic
compounds (Cecchini et al., 2013). McMahon et al. present a functional
screen for novel genes that provide a certain function, and they show that the
host in which metagenomic segments are cloned makes a big difference in
their expression (and ability to be screened) (McMahon et al., 2012). Sommer
et al. perform a functional screen for antibiotic resistance genes in human
guts; surprisingly, there are many resistance genes that show only a distant
relationship to those resistance genes isolated from cultured bacteria
(Sommer et al., 2009). Robertson et al. perform a functional screen for novel
nitrilases, and are able to chart the history of evolutionary transitions from
activity on one enantiomer to activity on another (Robertson et al., 2004).
Rinke et al. show how single-cell genomics (i.e., sequencing the entire
genome of one cell we cannot culture) adds to our understanding of the
functional repertoire of an ecosystem (Rinke et al., 2013). (More from Rinke
in the next lecture on the diversity of organisms in bacterial communities.)
And one last bit (discovered after the lecture). Rocca et al. have tested the
assumption that metagenomic or metatranscriptomic abundance of a gene is
correlated with the amount of the process coded by the gene in the
ecosystem; not such a great correlation, it turns out (Rocca et al., 2015)†.
Nayfach et al. have very recently developed a pipeline for accurately
annotating the functions of genes in a metagenome (Nayfach et al., 2015)†.
Nov. 17: 20. Metagenomic approaches for characterizing community-wide organismal diversity
DeSantis et al. present their algorithm and web site, GreenGenes, for
classifying a 16S rRNA sequence to a taxon (DeSantis et al., 2006).
Konstantinidis and Tiedje present evidence for criteria (or a range of criteria)
of 16S rRNA divergence for demarcating taxa of different ranks
(Konstantinidis and Tiedje, 2005). Kim et al. is my foray into discovery of new
genera and species by 16S rRNA analysis of environmental DNA (Kim et al.,
2012). There are various tools for classifying unknown sequences into taxa,
including Qiime (Caporaso et al., 2010), Mothur (Schloss et al., 2009), and
BioMaS (Fosso et al., 2015). An approach to gene-based classification to
identify elephant populations that are being poached is presented by Wasser
et al. (Wasser et al., 2015). Sogin et al. present the first high-throughput
sequencing of environmental DNA from a marine habitat, providing evidence
that there is an extraordinary diversity of extremely rare organisms (Sogin et
al., 2006). Hughes et al. review the various ways to estimate the total richness
of a community when we know we haven’t sampled enough organisms to see
everyone who is there (Hughes et al., 2001). We briefly revisit Simon et al.,
who gave an example of characterizing the organismic diversity of a
community by assigning protein-coding genes from the metagenome to taxa
(Simon et al., 2009); also, we revisit PhyloPythia (McHardy et al., 2007). Hess
et al. perform the amazing feat of obtaining a nearly complete genome
sequence of various organisms from the metagenome fragments of a cow’s
rumen (Hess et al., 2011); Mackelprang et al. obtained a similar result from
permafrost soil, obtaining the sequence of a novel methanogen from
permafrost soil (Mackelprang et al., 2011). Rinke et al. provide results from
single-cell genome sequencing of various phyla that had never previously
been sequenced; this provided evidence for four previously unknown
superphyla (Rinke et al., 2013). Just to show that we care about the gene-based
discovery of phylogenetic supergroups in non-bacteria, we provide the
discovery of superorders of mammals (Bininda-Emonds et al., 2007).
Nov. 19: 21. Ecological niche modeling (Sophie Breitbart lecture)
Dec. 1: 22. Metagenomic approaches to finding out what unidentified genes do (ecological annotation); bioinformatic prospecting
Dec. 3: 23. The human microbiome: types of communities across humans, functional screening for novel genes, antibiotic holocausts and health consequences
Levine et al. present an ecological niche modeling analysis of where the
pathogen monkeypox could survive in today’s world (Levine et al., 2007). Why
might monkeypox not be able to survive in some of the places where it is
predicted to be able to? Peterson and colleagues predict with ENM what
habitats the chachalaca of Mexico will be leaving and which it will be entering
between now and 2055; more generally, they predict species turnovers
among all species of birds and mammals in Mexico (Peterson et al., 2009).
Batalden et al. predict the future geographic distribution of monarch
butterflies, assuming that their food organisms (milkweeds) can keep up with
climate change (Batalden et al., 2007). Peterson predicts the future
distribution of malaria in Africa (Peterson, 2009). And now, something
completely different, or is it? Lozier et al. present an ENM analysis of
Sasquatch sightings (Lozier et al., 2009)!
Here are the references for the metagenome projects discussed in class (Wu
et al., 2009; Turnbaugh et al., 2007; Gilbert et al., 2010; 10K, 2009; Davies et
al., 2012; Tyson et al., 2004). Knight et al. and Field et al. plead for a new
standard of coverage of environmental data in metagenomics studies (Knight
et al., 2012; Field et al., 2011). Plewniak et al. give a nice old-style example of how
we can identify the genes responsible for adaptation to a given geochemical
stressor, if we already know the genes (Plewniak et al., 2013). Inskeep et al.
give a nice example of extremely different sets of geochemical stressors
across habitats in a metagenome study (Inskeep et al., 2010). Biddle et al. give
an example of less extreme variation among environments, where the same
phyla are found everywhere, possibly a good source of ecological annotation
(Biddle et al., 2011). Mackay et al. describe the Drosophila melanogaster
genetic reference panel, which consists of the genome sequences of 168
inbred lines derived from a single natural population; this is being used to
determine the genes responsible for each of many physiological, behavioral,
and ecological traits (Mackay et al., 2012).
Our story today begins with the emergence of the germ theory of disease, and
an attitude both within households and in the public health establishment
that the only good germ is a dead germ; I recommend The Gospel of Germs by
Nancy Tomes as a great narrative of this period, from the 1870’s mostly until
the antibiotic revolution of the 1940’s (Tomes, 1998). Martin Blaser’s Missing
Microbes and Alanna Collen’s 10% Human are extremely engaging book-length
accounts of the importance of our gut microbiomes in human health
(Blaser, 2014; Collen, 2015). Pollan and Velasquez-Manoff have recently
written short popular accounts on this issue (Pollan, 2013; Velasquez-Manoff,
2013) http://www.nytimes.com/2013/05/19/magazine/say-hello-to-the-100-trillion-bacteria-that-make-up-your-microbiome.html?ref=magazine . The
most direct repercussion of the germ-as-enemy approach, leading to overuse
of antibiotics, has been the emergence of antibiotic resistance. Forslund et al.
present data on the prevalence of antibiotic resistance in different countries,
and the relationship between use of antibiotics for animal agriculture and
resistance in the human gut microbiome (Forslund et al., 2013). More
recently, we have reached an appreciation for the beneficial qualities of our
gut bacteria, and Khosravi and Mazmanian describe the disease-fighting
importance of our resident bacteria (Khosravi and Mazmanian, 2013). Pérez-Cobas
et al. describe the lasting effect of an antibiotic regimen on the composition
of an individual’s gut microbiome (Perez-Cobas et al., 2013). Cho et al. chart
the change in the mouse microbiome with early antibiotic treatment (Cho et
al., 2012). Liping Zhao presents a proposal for a research field where we use
various bioinformatic approaches to determine the organismal changes
correlated with obesity and leanness, and then perform experiments to test
the effects of the implicated bacteria (Zhao, 2013). Wu et al. make a case that
there are two primary clusters of bacterial gut communities across humanity,
one dominated by Prevotella and associated with a carbohydrate diet, and
another dominated by Bacteroides and associated with a diet high in fat and
proteins; they also show that the microbiome can be changed in the short
term but that it probably takes a long time to fully change a human’s gut
microbiome (Wu et al., 2011). Lozupone and Knight have developed a very
useful algorithm called Unifrac for clustering bacterial communities by their
phylogenetic differences; it is described in a couple of articles (Lozupone et
al., 2006; Lozupone and Knight, 2005). Muegge et al. find a functional pattern
to the differences in microbiomes of mammalian herbivores vs. carnivores;
they find such interesting things as carnivores tending to have microbiomes
with lots of amino acid degradation enzymes, while herbivores tend to have
lots of amino acid biosynthesis enzymes, which makes sense when you think
about it. They also make the case that the microbiomes of human vegans
tend to look more like those of mammalian herbivores, while microbiomes of
human meat-eaters tend to look like those of mammalian carnivores (Muegge
et al., 2011). A mouse study by Zhang et al. (including Liping Zhao) shows that
the microbiome of a mouse is rapidly changed with the onset of a high-fat
diet, and changes back quickly with resumption of a low-fat diet; they also
identify key phylotypes that change rapidly with the change in diet, a step
toward replacing fecal transplant with targeted probiotic therapy (Zhang et
al., 2012). The next step in Liping Zhao’s paradigm is to test each of these taxa
for its effect on weight by introducing it into gnotobiotic mice; I supply a
(non-bioinformatic) example with a previously suspected effect of the bacterium
on reducing inflammation (Sokol et al., 2008). With thanks to our own Ariel
Kaluzhny, here is an article from Wired on a start-up to do poop metagenome
sequencing for the CDC (Zhang, 2015).
Dec. 8: 24. Baseball, biology, and big data
I have written a couple of pieces on Sandy Koufax’s perfect game, and what it
taught me about using our imaginations better to have a fuller and more
useful data set (Cohan, 2011b; Cohan, 2011c). I also wrote an article on how
Big Data approaches can be used better in biology, in homage to Moneyball
(Cohan, 2012). Lozupone and Knight wrote their break-out piece on UniFrac,
showing that changes in adaptations to salinity were the most difficult
transitions in bacterial history; they also lamented that the resolution of the
environmental data was such that they could not also investigate the difficulty
of more subtle changes in salinity adaptation (Lozupone and Knight, 2007).
Travis Sawchik’s Big Data Baseball talks about the second wave of big data
transforming baseball (Sawchik, 2015). Schoenfield’s article on Ned Yost
discusses how a manager who has totally eschewed big data has succeeded
(Schoenfeld, 2015). The disappointment over the missing environmental data led to various
conferences on what environmental parameters (and sequencing and
assembly tools!) we should be recording when we spend millions of dollars on
genome and metagenome sequencing (Field et al., 2008; Gilbert et al., 2010).
David Toomey writes about how we see only what we expect to see. This was
exemplified by microbiologists’ lack of interest in exploring what life may exist in
Yellowstone’s hot springs, owing to their “knowledge” that life couldn’t
possibly exist at such high temperatures (Toomey, 2013, pp. 12-13). See also
Thomas Brock’s account of his discovery (Brock, 1995) and Thomas Kuhn’s
account of our limitations toward discovery (Kuhn, 1996, pp. 63-64). Hurwitz
and Sullivan have organized the unknown diversity among marine viral
proteins by clustering them, and then trying to find out what ocean properties
each cluster is associated with (Hurwitz and Sullivan, 2013). A whole new and
open-ended way of seeing biodiversity (maybe with less preexisting bias) is
being explored by Map of Life, a mobile application that lets researchers in
the field add the species they encounter to its knowledge base; it also
lists all the species that you can expect in the area you are in (using your
GPS). It is a kind of community-based wiki of species distributions
(http://mol.org) (Goldsmith, 2015).
Dec. 10
25. Molecular approaches for identifying microbial diversity in natural communities and tests of models of bacterial speciation
Two of our recent papers have explored the diversity of models of bacterial
speciation (Cohan, 2011a; Kopac et al., 2014). Some of our earlier papers
defined a wider range of speciation models but were more wedded to the
Stable Ecotype model (Cohan and Perry, 2007). A classic paper by Silvia Acinas
on marine bacterial diversity used a lineage-through-time approach to
provide evidence for ecotypes (Acinas et al., 2004); Danny and I developed
our algorithm Ecotype Simulation to estimate the parameters of speciation
dynamics that would best match a lineage-through-time plot (Koeppel et al.,
2008). We later compared Ecotype Simulation to various other algorithms
that also purported to find ecotypes, and found that ES competed well
(Francisco et al., 2014). We showed that the putative ecotypes demarcated
for Bacillus by Ecotype Simulation were ecologically distinct, as based on
habitat associations and physiological differences relating to heat tolerance
(Connor et al., 2010). In what I call Ford Doolittle’s theory of Very Little
Species, he has compellingly argued that the rate of ecological diversification
may be extremely rapid in many bacteria, owing to the high rate of horizontal
genetic transfer (Doolittle and Zhaxybayeva, 2009). Our genomic study of
variation within one putative ecotype of Bacillus has indeed shown an
extremely high rate of speciation, and has supported the Nano-Niche model
of speciation (Kopac et al., 2014). Michiel Vos has argued that we can identify
bacterial ecotypes by finding groups with unique histories of positive selection
(Vos, 2011). In our study of diversification among extremely close relatives of
Synechococcus within Yellowstone’s hot springs, we found that putative
ecotypes identified by ES (with up to 0.5% genomic divergence) were
ecologically homogeneous, indicating a slow rate of speciation in this group
(Becraft et al., 2015). Also, our analysis of several genomes did not indicate
evidence of positive selection for any gene, within putative ecotypes (Olsen et
al., 2015). A not-quite-published paper by Bendall et al. (from the Trina
McMahon and Rex Malmstrom labs) gives evidence for genome-wide sweeps
in some sequence clusters from Trout Bog Lake, but single-gene sweeps in
other clusters. They offer a recombinational model for the differences in the
breadth of selective sweeps, but in a paper to come out as soon as Bendall et
al. comes out, I argue for an ecological mechanism (Cohan, 2016). Kwong et
al. show that a phylogeny based on a concatenation of the core genome of
Listeria monocytogenes yielded many unexpected, small sequence clusters
(Kwong et al., 2015); this illustrates a way to test for microdiversity within the
sequence clusters of Trout Bog Lake.
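As a rough illustration of the lineage-through-time idea mentioned in these readings (this is not the Ecotype Simulation algorithm, and it is not taken from any of the cited papers), the sketch below counts how many lineages of a rooted, bifurcating, ultrametric tree exist at a given age. The function name, the input format (a plain list of internal-node ages), and the example ages are all assumptions made for illustration.

```python
from bisect import bisect_right


def lineages_through_time(branching_ages, query_ages):
    """Count lineages alive at each query age (time before present).

    branching_ages: ages of the internal nodes (splits) of a rooted,
        bifurcating, ultrametric tree. Hypothetical input format.
    query_ages: ages at which to count lineages.

    Each split older than the query age has already turned one lineage
    into two, so the count is 1 + (number of splits older than that age).
    """
    ordered = sorted(branching_ages)
    total = len(ordered)
    counts = []
    for age in query_ages:
        # bisect_right gives the number of splits with age <= query age,
        # so the remainder are the splits strictly older than it.
        older_splits = total - bisect_right(ordered, age)
        counts.append(1 + older_splits)
    return counts


# Made-up example: a four-tip tree with splits 10, 4 and 1.5 time units ago.
example_ages = [10.0, 4.0, 1.5]
ltt = lineages_through_time(example_ages, [12.0, 5.0, 3.0, 0.0])
print(ltt)  # prints [1, 2, 3, 4]
```

Plotting these counts against age gives a lineage-through-time plot. As described above, the parameters of speciation dynamics are then estimated as those that would best match such a plot.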
Genome 10K Community of Scientists (2009). Genome 10K: A proposal to obtain whole-genome sequence for 10,000 vertebrate
species. Journal of Heredity 100: 659-674.
Acinas, S. G., Klepac-Ceraj, V., Hunt, D. E., Pharino, C., Ceraj, I., Distel, D. L. & Polz, M. F. (2004). Fine-scale phylogenetic architecture of a complex bacterial community. Nature 430: 551-554.
Adoutte, A., Balavoine, G., Lartillot, N., Lespinet, O., Prud'homme, B. & de Rosa, R. (2000). The new
animal phylogeny: reliability and implications. Proc Natl Acad Sci U S A 97(9): 4453-4456.
Anisimova, M. & Gascuel, O. (2006). Approximate likelihood-ratio test for branches: A fast, accurate, and
powerful alternative. Syst Biol 55(4): 539-552.
Batalden, R. V., Oberhauser, K. & Peterson, A. T. (2007). Ecological niches in sequential generations of
eastern North American monarch butterflies (Lepidoptera: Danaidae): the ecology of migration
and likely climate change implications. Environ Entomol 36(6): 1365-1373.
Baum, D. A. & Smith, S. D. (2013). Tree Thinking: An Introduction to Phylogenetic Biology. Greenwood
Village, Colorado: Roberts & Company Publishers.
Becraft, E. D., Wood, J. M., Rusch, D. B., Kühl, M., Jensen, S. I., Bryant, D. A., Roberts, D. W., Cohan, F. M.
& Ward, D. M. (2015). The molecular dimension of microbial species: 1. Ecological distinctions
among, and homogeneity within, putative ecotypes of Synechococcus inhabiting the
cyanobacterial mat of Mushroom Spring, Yellowstone National Park. Front Microbiol 6: 590.
Bell, T., Newman, J. A., Silverman, B. W., Turner, S. L. & Lilley, A. K. (2005). The contribution of species
richness and composition to bacterial services. Nature 436(7054): 1157-1160.
Bhaya, D., Grossman, A. R., Steunou, A. S., Khuri, N., Cohan, F. M., Hamamura, N., Melendrez, M. C.,
Bateson, M. M., Ward, D. M. & Heidelberg, J. F. (2007). Population level functional diversity in a
microbial community revealed by comparative genomic and metagenomic analyses. ISME J 1(8):
703-713.
Biddle, J. F., White, J. R., Teske, A. P. & House, C. H. (2011). Metagenomics of the subsurface Brazos-Trinity Basin (IODP site 1320): comparison with other sediment and pyrosequenced
metagenomes. ISME J 5(6): 1038-1047.
Bininda-Emonds, O. R., Cardillo, M., Jones, K. E., MacPhee, R. D., Beck, R. M., Grenyer, R., Price, S. A.,
Vos, R. A., Gittleman, J. L. & Purvis, A. (2007). The delayed rise of present-day mammals. Nature
446(7135): 507-512.
Blaser, M. J. (2014). Missing Microbes: How the Overuse of Antibiotics Is Fueling Our Modern Plagues.
New York: Henry Holt and Co.
Borodovsky, M., Mills, R., Besemer, J. & Lomsadze, A. (2003). Prokaryotic gene prediction using
GeneMark and GeneMark.hmm. Curr Protoc Bioinformatics Chapter 4: Unit4 5.
Bos, D. H. & Posada, D. (2005). Using models of nucleotide evolution to build phylogenetic trees. Dev
Comp Immunol 29(3): 211-227.
Brenner, S. E., Hubbard, T., Murzin, A. & Chothia, C. (1995). Gene duplications in H. influenzae. Nature
378(6553): 140.
Brock, T. D. (1995). The road to Yellowstone--and beyond. Annu Rev Microbiol 49: 1-28.
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Pena,
A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E.,
Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., Reeder, J., Sevinsky, J. R., Turnbaugh,
P. J., Walters, W. A., Widmann, J., Yatsunenko, T., Zaneveld, J. & Knight, R. (2010). QIIME allows
analysis of high-throughput community sequencing data. Nat Methods 7(5): 335-336.
Cecchini, D. A., Laville, E., Laguerre, S., Robe, P., Leclerc, M., Dore, J., Henrissat, B., Remaud-Simeon, M.,
Monsan, P. & Potocki-Veronese, G. (2013). Functional metagenomics reveals novel pathways of
prebiotic breakdown by human gut bacteria. PLoS One 8(9): e72766.
Chain, P. S., Carniel, E., Larimer, F. W., Lamerdin, J., Stoutland, P. O., Regala, W. M., Georgescu, A. M.,
Vergez, L. M., Land, M. L., Motin, V. L., Brubaker, R. R., Fowler, J., Hinnebusch, J., Marceau, M.,
Medigue, C., Simonet, M., Chenal-Francisque, V., Souza, B., Dacheux, D., Elliott, J. M., Derbise,
A., Hauser, L. J. & Garcia, E. (2004). Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci U S A 101(38): 13826-13831.
Cho, I., Yamanishi, S., Cox, L., Methe, B. A., Zavadil, J., Li, K., Gao, Z., Mahana, D., Raju, K., Teitler, I., Li,
H., Alekseyenko, A. V. & Blaser, M. J. (2012). Antibiotics in early life alter the murine colonic
microbiome and adiposity. Nature 488(7413): 621-626.
Clark, A. G., Glanowski, S., Nielsen, R., Thomas, P. D., Kejariwal, A., Todd, M. A., Tanenbaum, D. M.,
Civello, D., Lu, F., Murphy, B., Ferriera, S., Wang, G., Zheng, X., White, T. J., Sninsky, J. J., Adams,
M. D. & Cargill, M. (2003). Inferring nonneutral evolution from human-chimp-mouse
orthologous gene trios. Science 302(5652): 1960-1963.
Cohan, F. M. (2010). Synthetic biology: now that we're creators, what should we create? Curr Biol
20(16): R675-677.
Cohan, F. M. (2011a). Are species cohesive?--A view from bacteriology. In Bacterial Population Genetics:
A Tribute to Thomas S. Whittam, 43-65 (Eds S. Walk and P. Feng). Washington, DC: American
Society for Microbiology Press.
Cohan, F. M. (2011b). A more perfect numbers game. In Los Angeles Times.
Cohan, F. M. (2011c). Q&A: Frederick Cohan. Current Biology 21(11): R412-R414.
Cohan, F. M. (2012). Science needs more Moneyball. American Scientist 100(3): 182-185.
Cohan, F. M. (2013). Species. In Brenner's Encyclopedia of Genetics, Second Edition, 506-511 (Eds S.
Maloy and K. Hughes). Amsterdam: Elsevier.
Cohan, F. M. (2016). Bacterial Speciation: Genetic Sweeps in Bacterial Species. Current Biology.
Cohan, F. M. & Kopac, S. M. (2011). Microbial genomics: E. coli relatives out of doors and out of body.
Curr Biol 21(15): R587-589.
Cohan, F. M. & Perry, E. B. (2007). A systematics for discovering the fundamental units of bacterial
diversity. Current Biology 17: R373-R386.
Collen, A. (2015). 10% Human: How Your Body's Microbes Hold the Key to Health and Happiness. New
York: HarperCollins.
Connor, N., Sikorski, J., Rooney, A. P., Kopac, S., Koeppel, A. F., Burger, A., Cole, S. G., Perry, E. B.,
Krizanc, D., Field, N. C., Slaton, M. & Cohan, F. M. (2010). The ecology of speciation in Bacillus.
Applied and Environmental Microbiology 76: 1349-1358.
The UniProt Consortium (2013). Update on activities at the Universal Protein Resource (UniProt) in 2013.
Nucleic Acids Res 41: D43-D47.
Davies, N., Field, D. & Genomic Observatories Network (2012). Sequencing data: A genomic network to
monitor Earth. Nature 481(7380): 145.
Degnan, J. H. & Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees.
PLoS Genet 2(5): e68.
Delcher, A. L., Bratke, K. A., Powers, E. C. & Salzberg, S. L. (2007). Identifying bacterial genes and
endosymbiont DNA with Glimmer. Bioinformatics 23(6): 673-679.
DeSantis, T. Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E. L., Keller, K., Huber, T., Dalevi, D., Hu, P. &
Andersen, G. L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench
compatible with ARB. Appl Environ Microbiol 72(7): 5069-5072.
Dettman, J. R., Rodrigue, N., Melnyk, A. H., Wong, A., Bailey, S. F. & Kassen, R. (2012). Evolutionary
insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol 21(9):
2058-2077.
Donoghue, M. J. (1989). Phylogenies and the analysis of evolutionary sequences, with examples from
seed plants. Evolution 43: 1137-1156.
Doolittle, W. F. & Zhaxybayeva, O. (2009). On the origin of prokaryotic species. Genome Res 19(5): 744-756.
Eddy, S. R. (2004). What is dynamic programming? Nat Biotechnol 22(7): 909-910.
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Research 32: 1792-1797.
Ferea, T. L., Botstein, D., Brown, P. O. & Rosenzweig, R. F. (1999). Systematic changes in gene expression
patterns following adaptive evolution in yeast. Proc Natl Acad Sci U S A 96(17): 9721-9726.
Field, D., Amaral-Zettler, L., Cochrane, G., Cole, J. R., Dawyndt, P., Garrity, G. M., Gilbert, J., Glockner, F.
O., Hirschman, L., Karsch-Mizrachi, I., Klenk, H. P., Knight, R., Kottmann, R., Kyrpides, N., Meyer,
F., San Gil, I., Sansone, S. A., Schriml, L. M., Sterk, P., Tatusova, T., Ussery, D. W., White, O. &
Wooley, J. (2011). The Genomic Standards Consortium. PLoS Biol 9(6): e1001088.
Field, D., Garrity, G., Gray, T., Morrison, N., Selengut, J., Sterk, P., Tatusova, T., Thomson, N., Allen, M. J.,
Angiuoli, S. V., Ashburner, M., Axelrod, N., Baldauf, S., Ballard, S., Boore, J., Cochrane, G., Cole,
J., Dawyndt, P., De Vos, P., DePamphilis, C., Edwards, R., Faruque, N., Feldman, R., Gilbert, J.,
Gilna, P., Glockner, F. O., Goldstein, P., Guralnick, R., Haft, D., Hancock, D., Hermjakob, H., Hertz-Fowler, C., Hugenholtz, P., Joint, I., Kagan, L., Kane, M., Kennedy, J., Kowalchuk, G., Kottmann,
R., Kolker, E., Kravitz, S., Kyrpides, N., Leebens-Mack, J., Lewis, S. E., Li, K., Lister, A. L., Lord, P.,
Maltsev, N., Markowitz, V., Martiny, J., Methe, B., Mizrachi, I., Moxon, R., Nelson, K., Parkhill, J.,
Proctor, L., White, O., Sansone, S. A., Spiers, A., Stevens, R., Swift, P., Taylor, C., Tateno, Y., Tett,
A., Turner, S., Ussery, D., Vaughan, B., Ward, N., Whetzel, T., San Gil, I., Wilson, G. & Wipat, A.
(2008). The minimum information about a genome sequence (MIGS) specification. Nat
Biotechnol 26(5): 541-547.
Fierer, N., Hamady, M., Lauber, C. L. & Knight, R. (2008). The influence of sex, handedness, and washing
on the diversity of hand surface bacteria. Proc Natl Acad Sci U S A 105(46): 17994-17999.
Forslund, K., Sunagawa, S., Kultima, J. R., Mende, D. R., Arumugam, M., Typas, A. & Bork, P. (2013).
Country-specific antibiotic use practices impact the human gut resistome. Genome Res 23(7):
1163-1169.
Fosso, B., Santamaria, M., Marzano, M., Alonso-Alemany, D., Valiente, G., Donvito, G., Monaco, A.,
Notarangelo, P. & Pesole, G. (2015). BioMaS: a modular pipeline for Bioinformatic analysis of
Metagenomic AmpliconS. BMC Bioinformatics 16: 203.
Francisco, J. C., Cohan, F. M. & Krizanc, D. (2014). Accuracy and efficiency of algorithms for the
demarcation of bacterial ecotypes from DNA sequence data. Int. J. Bioinformatics Research and
Applications 10: 409-425.
Funch, P. & Kristensen, R. (1995). Cycliophora is a new phylum with affinities to Entoprocta and
Ectoprocta. Nature 378: 711-714.
Futuyma, D. J. (1998). Evolutionary Biology.
Gardy, J. L., Laird, M. R., Chen, F., Rey, S., Walsh, C. J., Ester, M. & Brinkman, F. S. (2005). PSORTb v.2.0:
expanded prediction of bacterial protein subcellular localization and insights gained from
comparative proteome analysis. Bioinformatics 21(5): 617-623.
Genoways, H. H. & Choate, J. R. (1972). A multivariate analysis of systematic relationships among
populations of the short-tailed shrew (genus Blarina) in Nebraska. Systematic Zoology 21: 106-116.
Gilbert, J. A., Meyer, F., Jansson, J., Gordon, J., Pace, N., Tiedje, J., Ley, R., Fierer, N., Field, D., Kyrpides,
N., Glockner, F. O., Klenk, H. P., Wommack, K. E., Glass, E., Docherty, K., Gallery, R., Stevens, R. &
Knight, R. (2010). The Earth Microbiome Project: Meeting report of the "1st EMP meeting on
sample selection and acquisition" at Argonne National Laboratory October 6 2010. Stand
Genomic Sci 3(3): 249-253.
Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S. & Brilliant, L. (2009). Detecting
influenza epidemics using search engine query data. Nature 457(7232): 1012-1014.
Goldsmith, G. R. (2015). The field guide, rebooted. Science 349: 594.
Gómez-Lozano, M., Marvig, R. L., Molin, S. & Long, K. S. (2012). Genome-wide identification of novel
small RNAs in Pseudomonas aeruginosa. Environ Microbiol 14(8): 2006-2016.
Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. (2010). New algorithms
and methods to estimate maximum-likelihood phylogenies: assessing the performance of
PhyML 3.0. Syst Biol 59(3): 307-321.
Hahne, H., Mader, U., Otto, A., Bonn, F., Steil, L., Bremer, E., Hecker, M. & Becher, D. (2010). A
comprehensive proteomics and transcriptomics analysis of Bacillus subtilis salt stress
adaptation. J Bacteriol 192(3): 870-882.
Hao, W. & Golding, G. B. (2006). The fate of laterally transferred genes: life in the fast lane to adaptation
or death. Genome Res 16(5): 636-643.
Harel, D. (2000). Sometimes we just don't know. In Computers Ltd.: What They Really Can't Do, 91-117.
Oxford: Oxford Univ. Press.
Harvey, P. H. & Pagel, M. D. (1991). The Comparative Method in Evolutionary Biology. Oxford: Oxford
University Press.
Herring, C. D., Raghunathan, A., Honisch, C., Patel, T., Applebee, M. K., Joyce, A. R., Albert, T. J., Blattner,
F. R., van den Boom, D., Cantor, C. R. & Palsson, B. O. (2006). Comparative genome sequencing
of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat Genet
38(12): 1406-1412.
Hess, M. (2011). Metagenomic discovery of biomass-degrading genes and genomes from cow rumen.
Science 331: 463-467.
Huelsenbeck, J. P., Ronquist, F., Nielsen, R. & Bollback, J. P. (2001). Bayesian inference of phylogeny and
its impact on evolutionary biology. Science 294(5550): 2310-2314.
Hughes, J. B., Hellmann, J. J., Ricketts, T. H. & Bohannan, B. J. (2001). Counting the uncountable:
statistical approaches to estimating microbial diversity. Appl Environ Microbiol 67(10): 4399-4406.
Hurwitz, B. L. & Sullivan, M. B. (2013). The Pacific Ocean virome (POV): a marine viral metagenomic
dataset and associated protein clusters for quantitative viral ecology. PLoS One 8(2): e57355.
Huson, D. H. & Steel, M. (2004). Phylogenetic trees based on gene content. Bioinformatics 20(13): 2044-2049.
Hyatt, D., Chen, G. L., Locascio, P. F., Land, M. L., Larimer, F. W. & Hauser, L. J. (2010). Prodigal:
prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics
11: 119.
Inskeep, W. P., Rusch, D. B., Jay, Z. J., Herrgard, M. J., Kozubal, M. A., Richardson, T. H., Macur, R. E.,
Hamamura, N., Jennings, R., Fouke, B. W., Reysenbach, A. L., Roberto, F., Young, M., Schwartz,
A., Boyd, E. S., Badger, J. H., Mathur, E. J., Ortmann, A. C., Bateson, M., Geesey, G. & Frazier, M.
(2010). Metagenomes from high-temperature chemotrophic systems reveal geochemical
controls on microbial community structure and function. PLoS One 5(3): e9773.
Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S. & Miyata, T. (1989). Evolutionary relationship of
archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated
genes. Proc Natl Acad Sci U S A 86(23): 9355-9359.
Kämpfer, P., Kroppenstedt, R. M. & Dott, W. (1991). A numerical classification of the genera
Streptomyces and Streptoverticillium using miniaturized physiological tests. Journal of General
Microbiology 137: 1831-1891.
Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. (2014). Data, information,
knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42(Database issue):
D199-205.
Keeling, P. J. & Palmer, J. D. (2008). Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9(8):
605-618.
Khosravi, A. & Mazmanian, S. K. (2013). Disruption of the gut microbiome as a risk factor for microbial
infections. Curr Opin Microbiol 16(2): 221-227.
Kim, J. S., Makama, M., Petito, J., Park, N. H., Cohan, F. M. & Dungan, R. S. (2012). Diversity of Bacteria
and Archaea in hypersaline sediment from Death Valley National Park, California.
MicrobiologyOpen 1(2): 135-148.
Kleiner, M., Wentrup, C., Lott, C., Teeling, H., Wetzel, S., Young, J., Chang, Y. J., Shah, M., VerBerkmoes,
N. C., Zarzycki, J., Fuchs, G., Markert, S., Hempel, K., Voigt, B., Becher, D., Liebeke, M., Lalk, M.,
Albrecht, D., Hecker, M., Schweder, T. & Dubilier, N. (2012). Metaproteomics of a gutless marine
worm and its symbiotic microbial community reveal unusual pathways for carbon and energy
use. Proceedings of the National Academy of Sciences of the United States of America 109(19):
E1173-E1182.
Knight, R., Jansson, J., Field, D., Fierer, N., Desai, N., Fuhrman, J. A., Hugenholtz, P., van der Lelie, D.,
Meyer, F., Stevens, R., Bailey, M. J., Gordon, J. I., Kowalchuk, G. A. & Gilbert, J. A. (2012).
Unlocking the potential of metagenomics through replicated experimental design. Nat
Biotechnol 30(6): 513-520.
Koeppel, A., Perry, E. B., Sikorski, J., Krizanc, D., Warner, W. A., Ward, D. M., Rooney, A. P., Brambilla, E.,
Connor, N., Ratcliff, R. M., Nevo, E. & Cohan, F. M. (2008). Identifying the fundamental units of
bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics.
Proceedings of the National Academy of Sciences 105: 2504-2509.
Konstantinidis, K. T. & Tiedje, J. M. (2005). Towards a genome-based taxonomy for prokaryotes. J
Bacteriol 187(18): 6258-6264.
Kopac, S., Wang, Z., Wiedenbeck, J., Sherry, J., Wu, M. & Cohan, F. M. (2014). Genomic heterogeneity
and ecological speciation within one subspecies of Bacillus subtilis. Applied and Environmental
Microbiology 80: 4842-4853.
Kuhn, T. (1996). The Structure of Scientific Revolutions. Chicago: University of Chicago.
Kwong, J. C., Mercoulia, K., Tomita, T., Easton, M., Li, H. Y., Bulach, D. M., Stinear, T. P., Seemann, T. &
Howden, B. P. (2015). Prospective whole genome sequencing enhances national surveillance of
Listeria monocytogenes. Journal of Clinical Microbiology.
Larson, G., Dobney, K., Albarella, U., Fang, M., Matisoo-Smith, E., Robins, J., Lowden, S., Finlayson, H.,
Brand, T., Willerslev, E., Rowley-Conwy, P., Andersson, L. & Cooper, A. (2005). Worldwide
phylogeography of wild boar reveals multiple centers of pig domestication. Science 307(5715):
1618-1621.
Laurin, M. (2010). Assessment of the Relative Merits of a Few Methods to Detect Evolutionary Trends.
Systematic Biology 59: 689-704.
Lay, C. Y., Mykytczuk, N. C., Yergeau, E., Lamarche-Gagnon, G., Greer, C. W. & Whyte, L. G. (2013).
Defining the functional potential and active community members of a sediment microbial
community in a high-arctic hypersaline subzero spring. Appl Environ Microbiol 79(12): 3637-3648.
Levine, R. S., Peterson, A. T., Yorita, K. L., Carroll, D., Damon, I. K. & Reynolds, M. G. (2007). Ecological
niche and geographic distribution of human monkeypox in Africa. PLoS One 2(1): e176.
Li, M., Badger, J. H., Chen, X., Kwong, S., Kearney, P. & Zhang, H. (2001). An information-based sequence
distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17(2):
149-154.
Lin, G. N., Cai, Z., Lin, G., Chakraborty, S. & Xu, D. (2009). ComPhy: prokaryotic composite distance
phylogenies inferred from whole-genome gene sets. BMC Bioinformatics 10 Suppl 1: S5.
Lozier, L. D., Aniello, P. & Hickerson, M. J. (2009). Predicting the distribution of Sasquatch in western
North America: anything goes with ecological niche modelling. Journal of Biogeography.
Lozupone, C., Hamady, M. & Knight, R. (2006). UniFrac--an online tool for comparing microbial
community diversity in a phylogenetic context. BMC Bioinformatics 7: 371.
Lozupone, C. & Knight, R. (2005). UniFrac: a new phylogenetic method for comparing microbial
communities. Appl Environ Microbiol 71(12): 8228-8235.
Lozupone, C. A. & Knight, R. (2007). Global patterns in bacterial diversity. Proc Natl Acad Sci U S A
104(27): 11436-11440.
Luo, C., Walk, S. T., Gordon, D. M., Feldgarden, M., Tiedje, J. M. & Konstantinidis, K. T. (2011). Genome
sequencing of environmental Escherichia coli expands understanding of the ecology and
speciation of the model bacterial species. Proc Natl Acad Sci U S A 108(17): 7200-7205.
Mackay, T. F., Richards, S., Stone, E. A., Barbadilla, A., Ayroles, J. F., Zhu, D., Casillas, S., Han, Y., Magwire,
M. M., Cridland, J. M., Richardson, M. F., Anholt, R. R., Barron, M., Bess, C., Blankenburg, K. P.,
Carbone, M. A., Castellano, D., Chaboub, L., Duncan, L., Harris, Z., Javaid, M., Jayaseelan, J. C.,
Jhangiani, S. N., Jordan, K. W., Lara, F., Lawrence, F., Lee, S. L., Librado, P., Linheiro, R. S., Lyman,
R. F., Mackey, A. J., Munidasa, M., Muzny, D. M., Nazareth, L., Newsham, I., Perales, L., Pu, L. L.,
Qu, C., Ramia, M., Reid, J. G., Rollmann, S. M., Rozas, J., Saada, N., Turlapati, L., Worley, K. C.,
Wu, Y. Q., Yamamoto, A., Zhu, Y., Bergman, C. M., Thornton, K. R., Mittelman, D. & Gibbs, R. A.
(2012). The Drosophila melanogaster Genetic Reference Panel. Nature 482(7384): 173-178.
Mackelprang, R., Waldrop, M. P., DeAngelis, K. M., David, M. M., Chavarria, K. L., Blazewicz, S. J., Rubin,
E. M. & Jansson, J. K. (2011). Metagenomic analysis of a permafrost microbial community
reveals a rapid response to thaw. Nature 480(7377): 368-371.
Mallet, J. (1995). A species definition for the modern synthesis. Trends Ecol. Evol. 10: 294-299.
McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted
Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy. New
Haven: Yale.
McHardy, A. C., Martin, H. G., Tsirigos, A., Hugenholtz, P. & Rigoutsos, I. (2007). Accurate phylogenetic
classification of variable-length DNA fragments. Nat Methods 4(1): 63-72.
McMahon, M. D., Guan, C., Handelsman, J. & Thomas, M. G. (2012). Metagenomic analysis of
Streptomyces lividans reveals host-dependent functional expression. Appl Environ Microbiol
78(10): 3622-3629.
Merhej, V., Royer-Carenzi, M., Pontarotti, P. & Raoult, D. (2009). Massive comparative genomic analysis
reveals convergent evolution of specialized bacteria. Biol Direct 4: 13.
Michener, C. D. & Sokal, R. R. (1957). A Quantitative Approach to a Problem in Classification. Evolution
11: 130-162.
Mikkelsen, T. S., Hillier, L. W., et al. (2005). Initial sequence of the chimpanzee genome and
comparison with the human genome. Nature 437(7055): 69-87.
Miller, J. R., Koren, S. & Sutton, G. (2010). Assembly algorithms for next-generation sequencing data.
Genomics 95(6): 315-327.
Morrison, D. A. (2009). Why would phylogeneticists ignore computerized sequence alignment? Syst Biol 58: 150-158.
Muegge, B. D., Kuczynski, J., Knights, D., Clemente, J. C., Gonzalez, A., Fontana, L., Henrissat, B., Knight,
R. & Gordon, J. I. (2011). Diet drives convergence in gut microbiome functions across
mammalian phylogeny and within humans. Science 332(6032): 970-974.
Myers, E. W., Sutton, G. G., Smith, H. O., Adams, M. D. & Venter, J. C. (2002). On the sequencing and
assembly of the human genome. Proc Natl Acad Sci U S A 99(7): 4145-4146.
Nayfach, S., Bradley, P. H., Wyman, S. K., Laurent, T. J., Williams, A., Eisen, J. A., Pollard, K. S. & Sharpton,
T. J. (2015). Automated and Accurate Estimation of Gene Family Abundance from Shotgun
Metagenomes. PLoS Comput Biol 11(11): e1004573.
Nelson-Sathi, S., Dagan, T., Landan, G., Janssen, A., Steel, M., McInerney, J. O., Deppenmeier, U. &
Martin, W. F. (2012). Acquisition of 1,000 eubacterial genes physiologically transformed a
methanogen at the origin of Haloarchaea. Proc Natl Acad Sci U S A 109(50): 20537-20542.
Nosenko, T., Schreiber, F., Adamska, M., Adamski, M., Eitel, M., Hammel, J., Maldonado, M., Muller, W.
E., Nickel, M., Schierwater, B., Vacelet, J., Wiens, M. & Worheide, G. (2013). Deep metazoan
phylogeny: when different genes tell different stories. Mol Phylogenet Evol 67(1): 223-233.
Olsen, M. T., Nowack, S., Wood, J. M., Becraft, E. D., LaButti, K., Lipzen, A., Martin, J., Schackwitz, W. S.,
Rusch, D. B., Cohan, F. M., Bryant, D. A. & Ward, D. M. (2015). The molecular dimension of
microbial species: 3. Comparative genomics of Synechococcus isolates with different light
responses and in situ diel transcription patterns of associated putative ecotypes in the
Mushroom Spring microbial mat. Front Microbiol 6: 604.
Pavlidis, P., Zivkovic, D., Stamatakis, A. & Alachiotis, N. (2013). SweeD: likelihood-based detection of
selective sweeps in thousands of genomes. Mol Biol Evol 30(9): 2224-2234.
Penn, O., Privman, E., Ashkenazy, H., Landan, G., Graur, D. & Pupko, T. (2010). GUIDANCE: a web server
for assessing alignment confidence scores. Nucleic Acids Res 38(Web Server issue): W23-28.
Perez-Cobas, A. E., Gosalbes, M. J., Friedrichs, A., Knecht, H., Artacho, A., Eismann, K., Otto, W., Rojo, D.,
Bargiela, R., von Bergen, M., Neulinger, S. C., Daumer, C., Heinsen, F. A., Latorre, A., Barbas, C.,
Seifert, J., dos Santos, V. M., Ott, S. J., Ferrer, M. & Moya, A. (2013). Gut microbiota disturbance
during antibiotic therapy: a multi-omic approach. Gut 62(11): 1591-1601.
Peterson, A. T. (2009). Shifting suitability for malaria vectors across Africa with warming climates. BMC
Infect Dis 9: 59.
Peterson, I., Borrell, L. N., El-Sadr, W. & Teklehaimanot, A. (2009). A temporal-spatial analysis of malaria
transmission in Adama, Ethiopia. Am J Trop Med Hyg 81(6): 944-949.
Plewniak, F., Koechler, S., Navet, B., Dugat-Bony, E., Bouchez, O., Peyret, P., Seby, F., Battaglia-Brunet, F.
& Bertin, P. N. (2013). Metagenomic insights into microbial metabolism affecting arsenic
dispersion in Mediterranean marine sediments. Mol Ecol 22(19): 4870-4883.
Pollan, M. (2013). Some of my best friends are germs. In New York Times. New York.
Popa, O., Hazkani-Covo, E., Landan, G., Martin, W. & Dagan, T. (2011). Directed networks reveal genomic
barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res
21(4): 599-609.
Probert, R. J., Daws, M. I. & Hay, F. R. (2009). Ecological correlates of ex situ seed longevity: a
comparative study on 195 species. Annals of Botany 104(1): 57-69.
Pruitt, K. D., Tatusova, T., Brown, G. R. & Maglott, D. R. (2012). NCBI Reference Sequences (RefSeq):
current status, new features and genome annotation policy. Nucleic Acids Res 40(Database
issue): D130-135.
Richards, C. L., Rosas, U., Banta, J., Bhambhra, N. & Purugganan, M. D. (2012). Genome-wide patterns of
Arabidopsis gene expression in nature. PLoS Genet 8(4): e1002662.
Richards, V. P., Palmer, S. R., Pavinski Bitar, P. D., Qin, X., Weinstock, G. M., Highlander, S. K., Town, C.
D., Burne, R. A. & Stanhope, M. J. (2014). Phylogenomics and the dynamic genome evolution of
the genus Streptococcus. Genome Biol Evol 6(4): 741-753.
Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J. F., Darling, A., Malfatti, S.,
Swan, B. K., Gies, E. A., Dodsworth, J. A., Hedlund, B. P., Tsiamis, G., Sievert, S. M., Liu, W. T.,
Eisen, J. A., Hallam, S. J., Kyrpides, N. C., Stepanauskas, R., Rubin, E. M., Hugenholtz, P. & Woyke,
T. (2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature
499(7459): 431-437.
Robertson, D. E., Chaplin, J. A., DeSantis, G., Podar, M., Madden, M., Chi, E., Richardson, T., Milan, A.,
Miller, M., Weiner, D. P., Wong, K., McQuaid, J., Farwell, B., Preston, L. A., Tan, X., Snead, M. A.,
Keller, M., Mathur, E., Kretz, P. L., Burk, M. J. & Short, J. M. (2004). Exploring nitrilase sequence
space for enantioselective catalysis. Appl Environ Microbiol 70(4): 2429-2436.
Rocca, J. D., Hall, E. K., Lennon, J. T., Evans, S. E., Waldrop, M. P., Cotner, J. B., Nemergut, D. R., Graham,
E. B. & Wallenstein, M. D. (2015). Relationships between protein-encoding gene abundance and
corresponding process are commonly assumed yet rarely observed. ISME J 9(8): 1693-1699.
Rokas, A., Williams, B. L., King, N. & Carroll, S. B. (2003). Genome-scale approaches to resolving
incongruence in molecular phylogenies. Nature 425(6960): 798-804.
Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed
models. Bioinformatics 19(12): 1572-1574.
Ropars, J. (2015). Adaptive Horizontal Gene Transfers between Multiple Cheese-Associated Fungi.
Current Biology 25: 1-8.
Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing
phylogenetic trees. Mol Biol Evol 4(4): 406-425.
Salathé, M., Freifeld, C. C., Mekaru, S. R., Tomasulo, A. F. & Brownstein, J. S. (2013). Influenza A (H7N9)
and the Importance of Digital Epidemiology. N Engl J Med.
Sawchik, T. (2015). Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak. New York:
Flatiron Books.
Schloss, P. D. & Handelsman, J. (2004). Status of the microbial census. Microbiol Mol Biol Rev 68(4): 686-691.
Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., Lesniewski, R. A.,
Oakley, B. B., Parks, D. H., Robinson, C. J., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D. J. &
Weber, C. F. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ
Microbiol 75(23): 7537-7541.
Schoenfeld, B. (2015). How Ned Yost Made the Kansas City Royals Unstoppable. In New York Times.
She, X., Jiang, Z., Clark, R. A., Liu, G., Cheng, Z., Tuzun, E., Church, D. M., Sutton, G., Halpern, A. L. &
Eichler, E. E. (2004). Shotgun sequence assembly and recent segmental duplications within the
human genome. Nature 431(7011): 927-930.
Silver, N. (2012). The Signal and the Noise: Why So Many Predictions Fail--but Some Don't. New York:
Penguin.
Simon, C., Wiezer, A., Strittmatter, A. W. & Daniel, R. (2009). Phylogenetic diversity and metabolic
potential revealed in a glacier ice metagenome. Appl Environ Microbiol 75(23): 7519-7526.
Singer, M. S., Lichter-Marck, I. H., Farkas, T. E., Aaron, E., Whitney, K. D. & Mooney, K. A. (2014).
Herbivore diet breadth mediates the cascading effects of carnivores in food webs. Proc Natl
Acad Sci U S A 111(26): 9521-9526.
Sogin, M. L., Morrison, H. G., Huber, J. A., Mark Welch, D., Huse, S. M., Neal, P. R., Arrieta, J. M. &
Herndl, G. J. (2006). Microbial diversity in the deep sea and the underexplored "rare biosphere".
Proc Natl Acad Sci U S A 103(32): 12115-12120.
Sokol, H., Pigneur, B., Watterlot, L., Lakhdari, O., Bermudez-Humaran, L. G., Gratadoux, J. J., Blugeon, S.,
Bridonneau, C., Furet, J. P., Corthier, G., Grangette, C., Vasquez, N., Pochart, P., Trugnan, G.,
Thomas, G., Blottiere, H. M., Dore, J., Marteau, P., Seksik, P. & Langella, P. (2008).
Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut
microbiota analysis of Crohn disease patients. Proc Natl Acad Sci U S A 105(43): 16731-16736.
Sommer, M. O., Dantas, G. & Church, G. M. (2009). Functional characterization of the antibiotic
resistance reservoir in the human microflora. Science 325(5944): 1128-1131.
Sumby, P., Whitney, A. R., Graviss, E. A., DeLeo, F. R. & Musser, J. M. (2006). Genome-wide analysis of
group A streptococci reveals a mutation that modulates global phenotype and disease
specificity. PLoS Pathog 2(1): e5.
Sumner, J. G., Jarvis, P. D., Fernandez-Sanchez, J., Kaine, B. T., Woodhams, M. D. & Holland, B. R. (2012).
Is the general time-reversible model bad for molecular phylogenetics? Syst Biol 61(6): 1069-1074.
Sun, Y.-C., Jarrett, C. O., Bosio, C. F. & Hinnebusch, B. J. (2014). Retracing the
Evolutionary Path that Led to Flea-Borne Transmission of Yersinia pestis. Cell Host & Microbe
15(5): 578-586.
Swenson, M. S., Suri, R., Linder, C. R. & Warnow, T. (2012). SuperFine: fast and accurate supertree
estimation. Syst Biol 61(2): 214-227.
Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M.,
Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V.,
Vasudevan, S., Wolf, Y. I., Yin, J. J. & Natale, D. A. (2003). The COG database: an updated version
includes eukaryotes. BMC Bioinformatics 4: 41.
Thalmann, O., Shapiro, B., Cui, P., Schuenemann, V. J., Sawyer, S. K., Greenfield, D. L., Germonpre, M. B.,
Sablin, M. V., Lopez-Giraldez, F., Domingo-Roura, X., Napierala, H., Uerpmann, H. P., Loponte, D.
M., Acosta, A. A., Giemsch, L., Schmitz, R. W., Worthington, B., Buikstra, J. E., Druzhkova, A.,
Graphodatsky, A. S., Ovodov, N. D., Wahlberg, N., Freedman, A. H., Schweizer, R. M., Koepfli, K.
P., Leonard, J. A., Meyer, M., Krause, J., Paabo, S., Green, R. E. & Wayne, R. K. (2013). Complete
mitochondrial genomes of ancient canids suggest a European origin of domestic dogs. Science
342(6160): 871-874.
Tomes, N. (1998). The Gospel of Germs: Men, Women, and the Microbe in American Life. Cambridge,
Mass.: Harvard University Press.
Toomey, D. (2013). Weird Life: The Search for Life that Is Very, Very Different from our Own. New York:
Norton.
Touchon, M., Hoede, C., Tenaillon, O., Barbe, V., Baeriswyl, S., Bidet, P., Bingen, E., Bonacorsi, S.,
Bouchier, C., Bouvet, O., Calteau, A., Chiapello, H., Clermont, O., Cruveiller, S., Danchin, A.,
Diard, M., Dossat, C., Karoui, M. E., Frapy, E., Garry, L., Ghigo, J. M., Gilles, A. M., Johnson, J., Le
Bouguenec, C., Lescat, M., Mangenot, S., Martinez-Jéhanne, V., Matic, I., Nassif, X., Oztas, S.,
Petit, M. A., Pichon, C., Rouy, Z., Ruf, C. S., Schneider, D., Tourret, J., Vacherie, B., Vallenet, D.,
Médigue, C., Rocha, E. P. & Denamur, E. (2009). Organised genome dynamics in the Escherichia
coli species results in highly diverse adaptive paths. PLoS Genet 5(1): e1000344.
Turnbaugh, P. J., Ley, R. E., Hamady, M., Fraser-Liggett, C. M., Knight, R. & Gordon, J. I. (2007). The
human microbiome project. Nature 449(7164): 804-810.
Tyson, G. W., Chapman, J., Hugenholtz, P., Allen, E. E., Ram, R. J., Richardson, P. M., Solovyev, V. V.,
Rubin, E. M., Rokhsar, D. S. & Banfield, J. F. (2004). Community structure and metabolism
through reconstruction of microbial genomes from the environment. Nature 428(6978): 37-43.
van Domselaar, G., Graham, M. & Stothard, P. (2014). Prokaryotic genome annotation. In Bioinformatics
and Data Analysis in Microbiology, 81-111 (Ed Ö. Taştan Bishop). Norfolk: Caister Academic
Press.
Velasquez-Manoff, M. (2013). A cure for the allergy epidemic? In New York Times.
Veyrier, F., Pletzer, D., Turenne, C. & Behr, M. A. (2009). Phylogenetic detection of horizontal gene
transfer during the step-wise genesis of Mycobacterium tuberculosis. BMC Evol Biol 9: 196.
Vital, M., Chai, B., Ostman, B., Cole, J., Konstantinidis, K. T. & Tiedje, J. M. (2015). Gene expression
analysis of E. coli strains provides insights into the role of gene regulation in diversification. ISME
J 9(5): 1130-1140.
Vos, M. (2011). A species concept for bacteria based on adaptive divergence. Trends Microbiol 19(1): 1-7.
Vos, M., te Beek, T. A., van Driel, M. A., Huynen, M. A., Eyre-Walker, A. & van Passel, M. W. (2013).
ODoSE: a webserver for genome-wide calculation of adaptive divergence in prokaryotes. PLoS
One 8(5): e62447.
Wasser, S. K., Brown, L., Mailand, C., Mondol, S., Clark, W., Laurie, C. & Weir, B. S. (2015).
CONSERVATION. Genetic assignment of large seizures of elephant ivory reveals Africa's major
poaching hotspots. Science 349(6243): 84-87.
Waterston, R. H., Lander, E. S. & Sulston, J. E. (2002). On the sequencing of the human genome. Proc
Natl Acad Sci U S A 99(6): 3712-3716.
Welch, R. A., Burland, V., Plunkett, G., 3rd, Redford, P., Roesch, P., Rasko, D., Buckles, E. L., Liou, S. R.,
Boutin, A., Hackett, J., Stroud, D., Mayhew, G. F., Rose, D. J., Zhou, S., Schwartz, D. C., Perna, N.
T., Mobley, H. L., Donnenberg, M. S. & Blattner, F. R. (2002). Extensive mosaic structure revealed
by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A
99(26): 17020-17024.
Wheeler, T. J. & Eddy, S. R. (2013). nhmmer: DNA homology search with profile HMMs. Bioinformatics
29(19): 2487-2489.
Whidden, C., Zeh, N. & Beiko, R. G. (2014). Supertrees Based on the Subtree Prune-and-Regraft Distance.
Syst Biol 63(4): 566-581.
White, M. A., Ane, C., Dewey, C. N., Larget, B. R. & Payseur, B. A. (2009). Fine-scale phylogenetic
discordance across the house mouse genome. PLoS Genet 5(11): e1000729.
Wiedenbeck, J. & Cohan, F. M. (2011). Origins of bacterial diversity through horizontal gene transfer and
adaptation to new ecological niches. FEMS Microbiology Reviews 35: 957-976.
Williamson, S. H., Hubisz, M. J., Clark, A. G., Payseur, B. A., Bustamante, C. D. & Nielsen, R. (2007).
Localizing recent adaptive evolution in the human genome. PLoS Genet 3(6): e90.
Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N. N., Kunin, V., Goodwin, L., Wu,
M., Tindall, B. J., Hooper, S. D., Pati, A., Lykidis, A., Spring, S., Anderson, I. J., D'Haeseleer, P.,
Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J. F., Lucas,
S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E. M., Kyrpides, N. C., Klenk, H. P.
& Eisen, J. A. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea.
Nature 462(7276): 1056-1060.
Wu, G. D., Chen, J., Hoffmann, C., Bittinger, K., Chen, Y. Y., Keilbaugh, S. A., Bewtra, M., Knights, D.,
Walters, W. A., Knight, R., Sinha, R., Gilroy, E., Gupta, K., Baldassano, R., Nessel, L., Li, H.,
Bushman, F. D. & Lewis, J. D. (2011). Linking long-term dietary patterns with gut microbial
enterotypes. Science 334(6052): 105-108.
Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8): 1586-1591.
Zerbino, D. R. & Birney, E. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn
graphs. Genome Res 18(5): 821-829.
Zhang, C., Zhang, M., Pang, X., Zhao, Y., Wang, L. & Zhao, L. (2012). Structural resilience of the gut
microbiota in adult mice under high-fat dietary perturbations. ISME J 6(10): 1848-1857.
Zhang, S. (2015). Microbiome Startup uBiome Will Sequence Poop for the CDC. In Wired.
Zhao, L. (2013). The gut microbiota and obesity: from correlation to causality. Nat Rev Microbiol 11(9):
639-647.
Zhong, Y., Jia, Y., Gao, Y., Tian, D., Yang, S. & Zhang, X. (2013). Functional requirements driving the gene
duplication in 12 Drosophila species. BMC Genomics 14: 555.
Zmasek, C. M. & Eddy, S. R. (2001). A simple algorithm to infer gene duplication and speciation events
on a gene tree. Bioinformatics 17(9): 821-828.