Extremely Halophilic Archaeon Sequence
21
Genome Sequence of an Extremely Halophilic Archaeon
Shiladitya DasSarma
INTRODUCTION
Extreme halophiles are novel microorganisms that require 5–10 times the salinity of
seawater (ca. 3–5M NaCl) for optimal growth (1,2). They include diverse prokaryotic
species, both archaeal and bacterial, and some eukaryotic organisms. Extreme halophiles are found in hypersaline environments near the sea or salt deposits of marine or
nonmarine origin. Two of the largest hypersaline lakes supporting a variety of halophilic species are the Great Salt Lake in the western United States and the Dead Sea in
the Middle East. Some of the most interesting hypersaline environments are small artificial solar salterns used for producing salt from the sea, which are distributed throughout the world. Many hypersaline environments exhibit salinity that increases over time and support the sequential growth of progressively more halophilic species,
including complex microbial mats and spectacular blooms of bright red and red-orange
colored species. These environments are important ecologically, frequently supporting
entire populations of such exotic birds as pink flamingoes, which obtain their color from
the pigmented halophilic microorganisms. A critical feature of halophilic microbes that
prevents cell lysis in hypersaline environments is their high internal concentration of
compatible solutes (e.g., amino acids, polyols, and salts), which act as osmoprotectants.
Although a wide variety of halophiles has been cultured, the genome of only a single
extreme halophile, Halobacterium sp NRC-1, has been completely sequenced thus far
(3,4). This species is a typical halophile commonly found in many hypersaline environments, including the Great Salt Lake and solar salterns. Phylogenetically, it is classified
as an archaeon, a member of the third branch of life (Fig. 1). It has a growth optimum
of 4.5M NaCl, close to the saturation point, and a high internal concentration of K+ salts. Halobacterium NRC-1 is a mesophilic archaeon, with a temperature optimum of 42°C for growth. Although Halobacterium species are thought to have limited physiological capabilities, strain NRC-1 is metabolically quite versatile, growing aerobically,
anaerobically, and phototrophically. Phototrophic growth is mediated by the light-driven
proton pumping of bacteriorhodopsin, which forms a two-dimensional crystalline lattice
in the purple membrane. Halobacterium NRC-1 is also highly resistant to ultraviolet and γ-radiation and displays sophisticated motility responses, including phototaxis, chemotaxis, and gas vesicle-mediated flotation. One of the most notable features of Halobacterium NRC-1, revealed by genome sequencing, is a highly acidic proteome, which is likely essential to maintain protein solubility and function in high salinity. Significantly, this organism is amenable to analysis using well-developed genetic methodology, including gene knockouts, expression vectors, and complementation systems, which make Halobacterium NRC-1 a good model for functional genomic studies among extremophiles and archaea (2).
Fig. 1. Whole genome tree of selected archaeal organisms. Gene content phylogeny done by neighbor-joining using the SHOT web server (19) indicates that Halobacterium is located at the base of the archaeal branch of the phylogenetic tree.
From: Microbial Genomes
Edited by: C. M. Fraser, T. D. Read, and K. E. Nelson © Humana Press Inc., Totowa, NJ
In addition to Halobacterium NRC-1, several other halophiles are the subject of
ongoing genome projects. The most notable among these are two Dead Sea archaea,
Haloarcula marismortui and Haloferax volcanii (1), which are slightly less halophilic
than Halobacterium NRC-1, with an optimum salinity of 2–3M NaCl and a high magnesium ion tolerance, reflecting the salt composition of their environment. They also
display the metabolic capability to grow on simple sugars and carbohydrates as carbon and energy sources. Several other interesting categories of halophiles worthy of genomic studies include alkaliphilic halophiles, which grow in soda lakes with a pH of 9.0–11.0; psychrotrophic halophiles, which grow at freezing temperatures in Antarctic lakes; bacterial halophiles, which tolerate a wide range of salinity; and eukaryotic halophiles, such as the green alga Dunaliella salina. Finally, sequencing of a haloarchaeal strain whose chromosome is nearly identical to that of strain NRC-1 is also in progress. A listing of current genome projects on halophiles is maintained on the Halophile Genomes Web site at the University of Maryland Biotechnology Institute, Center
of Marine Biotechnology (http://zdna2.umbi.umd.edu).
THE HALOBACTERIUM GENOME
The genomes of Halobacterium species were originally studied a half-century ago;
they are composed of two components, a major fraction that is G+C-rich and a relatively A+T-rich (58% G+C) satellite (5). Subsequent studies showed that the satellite
deoxyribonucleic acid (DNA) corresponded mainly to large heterogeneous extrachromosomal replicons containing many transposable insertion sequence (IS) elements (6).
For Halobacterium NRC-1, extensive mapping revealed the presence of three replicons:
pNRC100, about 200 kbp; pNRC200, nearly twice the size of pNRC100; and a 2-Mbp
chromosome (Fig. 2) (7,8). The pNRC100 replicon was found to be partly identical to
pNRC200 and to exist as inversion isomers (7). The chromosomes of strain NRC-1 and
another wild-type strain, GRB, were compared by restriction mapping, which showed
extensive regions of similarity and a few regions with differences, including a large
inversion and an insertion. Ordered cosmid libraries representing the genomes of
Halobacterium species GRB and H. volcanii were also constructed and compared by
hybridization, which indicated the lack of any detectable conserved gene organization
(9). These and other mapping projects suggest that significant diversity exists within
the genomes of halophilic archaea.
Genome Sequencing and Analysis
Because of the high G+C composition and the large number of IS elements, the Halobacterium NRC-1 genome was sequenced in two stages. Initially, the pNRC100 replicon
was sequenced by a combination of random shotgun sequencing of libraries made from
purified covalently closed circular DNA and directed sequencing of cloned and mapped
HindIII fragments (3,7). This approach permitted the assembly of an unstable replicon
that undergoes frequent DNA rearrangements, including inversion isomerization, and
contains many IS elements. Subsequently, whole genome random shotgun sequencing
was performed, providing 7.5× coverage of the relatively stable large chromosome (4). The remaining lower-quality regions were sequenced from polymerase chain reaction (PCR) fragments and by primer walking. The NRC-1 genome was assembled using the Phred, Phrap,
and Consed programs, initially masking all the known and putative new IS elements,
to avoid the formation of chimeric contigs (4,10).
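The 7.5× coverage figure can be put in perspective with the Lander–Waterman model, which assumes that reads land uniformly at random. Under that idealization, the expected fraction of bases hit by no read at mean coverage c is e^-c. This is an assumption: G+C-rich and repetitive DNA, such as IS elements, is covered unevenly in practice. The constant names below are illustrative. The sketch applies the model to the 2,014,239-bp chromosome:

```python
import math

CHROMOSOME_BP = 2_014_239  # large chromosome size given in the text
COVERAGE = 7.5             # shotgun coverage given in the text


def uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) probability that a given base is covered by no read."""
    return math.exp(-coverage)


def expected_uncovered_bases(genome_length: int, coverage: float) -> float:
    """Expected number of bases hit by no read, assuming uniform random read placement."""
    return genome_length * uncovered_fraction(coverage)


if __name__ == "__main__":
    print(f"fraction uncovered: {uncovered_fraction(COVERAGE):.2e}")
    print(f"expected uncovered bases: {expected_uncovered_bases(CHROMOSOME_BP, COVERAGE):.0f}")
```

Under these assumptions, about 0.06% of the chromosome (roughly 1,100 bp) would be expected to remain uncovered by shotgun reads alone. That residue is the kind of gap left for directed finishing such as PCR-based sequencing and primer walking.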
The complete genome sequence of Halobacterium NRC-1 revealed a 2,571,010-bp genome, comprising the 2,014,239-bp G+C-rich chromosome and two smaller circular replicons, the 191,346-bp pNRC100 and the 365,425-bp pNRC200 (Table 1; Fig. 2) (3,4). Interestingly,
pNRC100 and pNRC200 contained a 145,428-bp region of 100% identity, including
33- to 39-kb inverted repeats, which mediate inversion isomerization; the small single
copy region; and a part of the large single copy regions (Fig. 2) (7). The unique regions
of the large single copy region contained 45,918 bp for pNRC100 and 219,997 bp for
pNRC200. Glimmer (Gene Locator and Interpolated Markov Modeler) was used to identify 2,630 likely genes in the genome, of which 64% coded for proteins with significant
matches to the databases (4). In addition, 52 ribonucleic acid (RNA) genes were identified. About 40 genes in pNRC100 and pNRC200 coded for proteins likely to be essential
or important for cell viability, such as a DNA polymerase, TBP and TFB transcription
factors, and the arginyl–tRNA (transfer RNA) synthetase, suggesting that these replicons should be classified as minichromosomes rather than megaplasmids (3,4).
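The sizes and gene counts quoted in this paragraph are internally consistent, and a quick arithmetic check confirms it:

- The three replicons sum to the total genome size.
- Each pNRC replicon equals the shared 145,428-bp region plus its own unique DNA.
- The 2,630 Glimmer genes plus the 52 RNA genes appear to account for the 2,682 predicted genes listed in Table 1.

The constant names are illustrative; the values are taken from the text.

```python
# Sizes in bp and gene counts as stated in the text; names are illustrative.
CHROMOSOME_BP = 2_014_239
PNRC100_BP = 191_346
PNRC200_BP = 365_425
SHARED_PNRC_BP = 145_428      # region of 100% identity between pNRC100 and pNRC200
UNIQUE_PNRC100_BP = 45_918    # unique large single copy DNA of pNRC100
UNIQUE_PNRC200_BP = 219_997   # unique large single copy DNA of pNRC200
GLIMMER_GENES = 2_630
RNA_GENES = 52


def genome_totals() -> dict:
    """Derive totals from the component figures for comparison with the reported values."""
    return {
        "genome_bp": CHROMOSOME_BP + PNRC100_BP + PNRC200_BP,
        "pnrc100_bp": SHARED_PNRC_BP + UNIQUE_PNRC100_BP,
        "pnrc200_bp": SHARED_PNRC_BP + UNIQUE_PNRC200_BP,
        "predicted_genes": GLIMMER_GENES + RNA_GENES,
    }


if __name__ == "__main__":
    totals = genome_totals()
    assert totals["genome_bp"] == 2_571_010
    assert totals["pnrc100_bp"] == PNRC100_BP
    assert totals["pnrc200_bp"] == PNRC200_BP
    assert totals["predicted_genes"] == 2_682
    print(totals)
```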
Proteome Analysis
One of the most dramatic results of genome sequencing of Halobacterium NRC-1
was the finding of an extremely acidic complement of encoded proteins, which is likely
directly related to protein function in its hypersaline (>4M KCl) cytoplasm (11). Calculated isoelectric points (pIs) for predicted proteins showed an average pI of approx 5,
a prediction confirmed by proteomic analysis (Fig. 3). Similarly, acidic proteomes were
predicted from partial genome sequences of two other halophiles, H. marismortui and
H. volcanii. In contrast, the average pIs of nearly all other proteomes are close to neutral. Notable exceptions are Methanobacterium thermoautotrophicum, which also contains both an acidic proteome and a relatively high (~1M) internal concentration of K+
ions, and three hyperthermophiles (Pyrobaculum aerophilum, Pyrococcus furiosus, and
Sulfolobus solfataricus), which have relatively basic proteomes. Homology modeling
has shown that the acidic pI of Halobacterium NRC-1 proteins is correlated with a high
concentration of surface negative charge (11). For example, a transcription factor (TbpE)
and a topoisomerase subunit (GyrA) showed a marked increase in surface negative charge
when compared to their homologs in nonhalophilic organisms (11).
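The average-pI comparison can be illustrated with a short sketch. For each protein, it finds by bisection the pH at which the Henderson–Hasselbalch net charge is zero, then averages over a set of sequences. Several parts of this are assumptions:

- The pKa table is one common textbook-style set; published tools use slightly different values.
- The model ignores structural effects on side-chain pKa.
- The example sequences are hypothetical.

This is not necessarily the procedure used to produce Fig. 3.

```python
from statistics import mean
from typing import Iterable

# Assumed pKa values (one common textbook-style set). Other tools use different
# tables, so absolute pI values shift slightly between implementations.
POSITIVE_PKA = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH, treating every ionizable group independently."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH of zero net charge, found by bisection (net charge falls as pH rises)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def average_pi(proteins: Iterable[str]) -> float:
    """Mean pI over a set of predicted protein sequences."""
    return mean(isoelectric_point(p) for p in proteins)


if __name__ == "__main__":
    toy_proteome = ["MDEEDLSEDEAE", "MKKLRAHK"]  # hypothetical sequences
    for p in toy_proteome:
        print(p, round(isoelectric_point(p), 2))
    print("average pI:", round(average_pi(toy_proteome), 2))
```

A proteome rich in Asp and Glu, and with surface negative charge as described for TbpE and GyrA, pulls the average pI down toward the acidic end of the scale.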
G+C Composition and IS Elements
Common characteristics of halophile genomes are a G+C-rich major fraction, a less G+C-rich satellite fraction, and a preponderance of IS elements (6). For Halobacterium NRC-1, the two pNRC replicons, which represent only 22% of the genome, are substantially less G+C rich (58–59% G+C) than the large chromosome (68% G+C) and contain a majority (69/91 or 76%) of the IS elements in the genome (Fig. 2). In addition, two regions of the chromosome are less G+C rich than average: a 270-kbp region (region I) with 65% G+C and 13 IS elements, and a 150-kbp region (region II) with 66% G+C and 4 IS elements (Fig. 2) (11). Interestingly, a 15-kbp region on the pNRC inverted repeats is higher in G+C content (64%) than pNRC100 as a whole (58%) and lacks any IS elements (3), indicating the occurrence of genomic regions with diverse character in all three replicons. Altogether, there are 91 IS elements, representing 12 families, in the NRC-1 genome (Table 1) (4). These findings suggest the involvement of IS elements in DNA exchange between the replicons of Halobacterium NRC-1.
Fig. 2. (A) Circular map of the Halobacterium NRC-1 large chromosome and (B) aligned linear genetic maps of pNRC100 and pNRC200 replicons. (A) The circular map of the large chromosome shows the locations of IS elements (outer scale), χ² analysis (red line), and G+C composition of open reading frames (black line). Colored bars associated with the outermost circle indicate the positions of the chromosomal IS elements (ISH1, beige; ISH2, purple; ISH3, green; ISH4, yellow; ISH6, pink; ISH8, blue; ISH10, red). Roman numerals I and II indicate AT-rich islands. (B) The circular replicons are depicted in linear form, with the genes and IS elements represented as blocks. The two replicons contain 145,428 bp of identity and either 45,918 bp or 219,997 bp of unique DNA for pNRC100 and pNRC200, respectively (3,4). The 33- to 39-kb inverted repeats are shown in yellow (conserved in all copies) and orange (conserved in some, but not all, copies); the small single copy regions are in purple; the common large single copy regions are in bright green; and the unique large single copy regions are in tan (pNRC100) and light green (pNRC200). The IS elements are shown in dark orange (ISH2), brown (ISH3), indigo (ISH5), blue (ISH7), dark green (ISH8), teal (ISH9), red (ISH10), and blue-gray (ISH11). The pNRC replicons contain 69 IS elements (44 unique), 29 on pNRC100 and 40 on pNRC200, with 6 elements in the inverted repeats (each repeated twice in both pNRC100 and pNRC200), 4 elements in the SSC region in both pNRC100 and pNRC200, 7 elements in the common large single copy region in both pNRC100 and pNRC200, and 23 elements in the unique large single copy regions, 6 in pNRC100 and 17 in pNRC200. (Figure 2A reproduced with permission from Cold Spring Harbor Laboratory Press, ref. 11.)
Table 1
Halobacterium NRC-1 Genome Statistics
                                  Total  Chromosome     pNRC200     pNRC100
Size (bp)                     2,571,010   2,014,239     365,425     191,346
G+C composition (%)                65.9        67.9        59.2        57.9
Number of predicted genes         2,682       2,111         374         197
Coding (%)                           84          87          76          71
Number of IS elements                91          22          40          29
  ISH1                                1           1           0           0
  ISH2                               13           4           5           4
  ISH3                               23           5          10           8
  ISH4                                2           1           0           1
  ISH5                                6           0           4           2
  ISH6                                2           1           1           0
  ISH7                                4           0           2           2
  ISH8                               21           5          10           6
  ISH9                                4           0           2           2
  ISH10                               6           2           2           2
  ISH11                               7           2           3           2
  ISH12                               2           1           1           0
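Low-G+C regions such as chromosomal regions I and II can be located by scanning the sequence in windows and flagging windows that fall below the genome-wide average. This is a simpler, windowed variant and not the ORF-based G+C line or the χ² analysis plotted in Fig. 2. The window size, step, and margin are hypothetical parameters chosen for illustration.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """G+C fraction among unambiguous A/C/G/T bases (N and other codes are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def gc_windows(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(start, G+C fraction) for each window of `window` bp, advancing by `step` bp."""
    last_start = max(len(seq) - window, 0)
    return [(i, gc_fraction(seq[i:i + window])) for i in range(0, last_start + 1, step)]


def low_gc_windows(seq: str, window: int, step: int,
                   margin: float = 0.02) -> List[Tuple[int, float]]:
    """Windows whose G+C fraction falls more than `margin` below the sequence-wide average."""
    average = gc_fraction(seq)
    return [(start, gc) for start, gc in gc_windows(seq, window, step) if gc < average - margin]


if __name__ == "__main__":
    toy = "GC" * 50 + "AT" * 50 + "GC" * 50  # hypothetical sequence with a central AT-rich island
    print(low_gc_windows(toy, window=100, step=50))
```

Merging adjacent flagged windows into islands, and counting IS elements inside them, would be a further step not shown here. The replicon-level percentages in the text follow from simple arithmetic: (191,346 + 365,425) / 2,571,010 ≈ 0.217 and 69/91 ≈ 0.758, matching the quoted 22% and 76%.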
The high G+C composition of Halobacterium NRC-1 is likely an adaptation to survival under intense solar radiation (e.g., to minimize targets for thymine dimer formation).
Statistically, the number of thymine dimer sites is expected to be nearly 60% lower for the NRC-1 large chromosome than for a replicon of comparable size with 50% G+C.
However, dinucleotide analysis indicated even fewer sites, by an additional 20%, than
predicted from the G+C content (11). The high G+C composition also results in an
extreme third-position G+C bias in the codon usage (86% G+C vs 70% and 46% in the
first two positions) (11).
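The "nearly 60%" estimate follows from base composition alone. If bases are independent and strand-symmetric, then P(T) = P(A) = (1 − G+C)/2. A TT site on either strand appears on the forward strand as TT or AA, so the expected fraction of such sites is 2·P(T)². Both independence and strand symmetry are simplifying assumptions. The sketch also counts TT sites and codon-position G+C in actual sequence, which is the kind of comparison behind the additional depletion and the third-position bias reported in the text. Only TT sites are counted here, following the text's focus on thymine dimers, although cyclobutane pyrimidine dimers can also form at other adjacent pyrimidines.

```python
from typing import Tuple


def expected_tt_fraction(gc: float) -> float:
    """Expected fraction of dinucleotide positions that are TT on either strand.

    Assumes independent bases with P(A) = P(T) = (1 - gc) / 2; a TT on the
    reverse strand appears as AA on the forward strand.
    """
    p_t = (1.0 - gc) / 2.0
    return 2.0 * p_t * p_t


def tt_reduction(gc: float, reference_gc: float = 0.5) -> float:
    """Fractional reduction in expected TT sites relative to a reference G+C content."""
    return 1.0 - expected_tt_fraction(gc) / expected_tt_fraction(reference_gc)


def observed_tt_sites(seq: str) -> int:
    """Count overlapping TT and AA dinucleotides on the forward strand."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] in ("TT", "AA"))


def gc_by_codon_position(cds: str) -> Tuple[float, ...]:
    """G+C fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    cds = cds[: len(cds) - len(cds) % 3]  # drop any incomplete trailing codon
    fractions = []
    for pos in range(3):
        bases = cds[pos::3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases) if bases else 0.0)
    return tuple(fractions)


if __name__ == "__main__":
    print(round(tt_reduction(0.68), 3))  # reduction expected at 68% G+C vs 50% G+C
```

At 68% G+C, `tt_reduction` gives about 0.59, which is the "nearly 60%" in the text. Comparing `observed_tt_sites(seq) / (len(seq) - 1)` against `expected_tt_fraction(gc)` for a real sequence is one way to measure the further depletion attributed to dinucleotide bias.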
ANNOTATION OF THE HALOBACTERIUM GENOME
The Halobacterium Genome Consortium, an international group representing 12
institutions, conducted annotation of the NRC-1 genome from summer 1999 to summer
2000. Data were released periodically, starting at 3× coverage, until completion, and a workshop was held in Amherst, Massachusetts, in January 2000. This effort led to a thorough analysis of this first halophile sequence and made it maximally useful to the community. In the subsequent 2-year period, numerous additional genes were identified. The highlights of the current annotation are summarized here, and a comprehensive
database is available at the Halophile Genomes web site (http://zdna2.umbi.umd.edu).
Fig. 3. Average pI profiles of proteomes predicted from genome sequences: Halobacterium
sp NRC-1 (NRC1), Haloarcula marismortui (Hma), Haloferax volcanii (Hvo), Archaeoglobus
fulgidus (Afu), Methanosarcina acetivorans (Mac), Methanococcus jannaschii (Mja), Methanobacterium thermoautotrophicum (Mth), Pyrobaculum aerophilum (Pae), Pyrococcus furiosus (Pfu),
Sulfolobus solfataricus (Sso), Thermoplasma acidophilum (Tac), Bacillus subtilis (Bsu), Escherichia coli K12 (Eco), Saccharomyces cerevisiae (Sce).
DNA Replication
The Halobacterium NRC-1 genome codes for a heterodimeric family D DNA polymerase found in Archaea; many eukaryotic-like replication proteins, including 2 family B DNA polymerases (one coded by pNRC200), origin recognition and helicase recruiters (10 Orc1/Cdc6), a replicative helicase (MCM), ssDNA-binding proteins (6 Rfa), primases (2 Pri), clamp loaders (RfcABC), a processivity clamp (2 proliferating cell nuclear antigen homologs), a type I topoisomerase (TopA), type II topoisomerases (Top6A and Top6B), and RNA primer removal enzymes (Rad2 and RNaseH); and a few bacterial replication genes, coding for a primase (DnaG) and a topoisomerase (GyrA and GyrB). Interestingly, multiple
copies of genes coding for eukaryotic origin recognition complex proteins Orc1/Cdc6
were found, including 3 scattered on the large chromosome, suggesting the possibility
of multiple replication origins (11). When analyzed for strand-specific G+C nucleotide variation or G+C skew, the large chromosome of Halobacterium NRC-1 was found
to contain 4 inflection points. Two of the three orc1/cdc6 genes were located near the
inflection points, suggesting that Halobacterium NRC-1 has a novel replication system
with two separate origins of replication on the large chromosome (11).
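G+C skew, (G − C)/(G + C) along one strand, typically flips sign at replication origins and termini, so its cumulative sum peaks or dips there. The sketch below computes window skews and reports where the sign flips between adjacent windows, treating the chromosome as circular. The non-overlapping windowing is an assumption; the text does not say how the skew analysis in (11) was windowed or smoothed.

```python
from itertools import accumulate
from typing import List


def window_gc_skew(seq: str, window: int) -> List[float]:
    """(G - C) / (G + C) in consecutive, non-overlapping windows of one strand."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews: List[float]) -> List[float]:
    """Running sum of window skews; its extrema mark candidate origins and termini."""
    return list(accumulate(skews))


def skew_sign_changes(skews: List[float], circular: bool = True) -> List[int]:
    """Indices of windows whose skew has the opposite sign to the preceding window.

    With circular=True the last window is also compared with the first, because the
    chromosome is a circle. A window with exactly zero skew never counts as a flip,
    so real data is usually smoothed before flips are read off.
    """
    changes = [i for i in range(1, len(skews)) if skews[i - 1] * skews[i] < 0]
    if circular and len(skews) > 1 and skews[-1] * skews[0] < 0:
        changes.insert(0, 0)
    return changes


if __name__ == "__main__":
    toy = "G" * 10 + "C" * 10 + "G" * 10 + "C" * 10  # hypothetical, exaggerated skew pattern
    skews = window_gc_skew(toy, 10)
    print(skews, cumulative_skew(skews), skew_sign_changes(skews))
```

On genuine sequence, small windows produce many spurious flips. A handful of robust inflection points, four on the NRC-1 large chromosome according to the text, emerge only after smoothing or with larger windows.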
DNA Repair
The Halobacterium NRC-1 genome contains many DNA repair genes (Fig. 4), likely necessary to repair DNA damage resulting from intense solar radiation in its environment (12). Consistent with expectations, NRC-1 displays high levels of resistance to both ultraviolet and γ-radiation. Photoreactivation is a very efficient process in Halobacterium, and two photolyase/cryptochrome homologs are encoded in the genome, one of which probably functions in DNA repair. Base excision repair is likely carried out by the Ogg, AlkA, MutY, and Nth homologs and probably by XthA, a homolog of the endonuclease IV family of AP endonucleases, and by Ogt, a possible methyltransferase for the repair of methylation damage. Halobacterium NRC-1 also encodes homologs of the bacterial excision repair complex UvrABCD. Interestingly, the presence of some genes coding for homologs of the eukaryotic form of excision repair (Rad2, Rad3, Rad25, and ERCC4) suggests the existence of duplicate repair systems in NRC-1. Mismatch repair proteins MutS1, MutS2, and MutL are found in Halobacterium NRC-1. Also present are RadA1 and RadA2, RecA/Rad51 homologs that likely encode recombinases; MRE11; and a Holliday junction resolvase, which are likely involved in homologous recombination and recombinational repair. A homolog of the bacterial UmuC polymerase for damage bypass is found in the Halobacterium NRC-1 genome, as is a eukaryotic-type ATP-dependent DNA ligase.
Transcription
Like other archaea, a simplified version of a eukaryotic RNA polymerase II–like transcription system is found in Halobacterium NRC-1; it contains Rpo subunits A, C, B', B",
E', E", H, K, L, N, and M (4). In addition, a surprising finding was that the NRC-1 genome
codes for 13 copies of TBP and TFB transcription factor genes, including 5 complete and
1 partial tbp genes (4 located on pNRC100, 1 on pNRC200, and 1 on the large chromosome) and 7 tfb genes (2 on pNRC200 and 5 on the large chromosome) (13). These results
suggested the possibility of a novel mechanism for gene regulation using alternate TBP–TFB combinations for promoter selection. Consistent with this hypothesis, analysis of the genome sequence and saturation mutagenesis of the bop promoter provided evidence for alternate TATA box sequences (14). Nearly 100 transcriptional regulators, mostly of the bacterial type, have also been identified.
Fig. 4. Integrated view of the genome of Halobacterium NRC-1 (4). Aspects of energy production, nutrient uptake, membrane assembly, cation and anion transport, and signal transduction are depicted. ATP synthesis by chemiosmotic coupling of proton transport by the respiratory chain and by light-driven proton pumping by bacteriorhodopsin (BR; purple oval) or chloride transport by halorhodopsin (HR; blue oval) is shown. Below, the semiphosphorylated Entner–Doudoroff pathway is shown, and the presence of fatty acid oxidation and the citric acid cycle is indicated. Enzymes not yet identified are marked with asterisks. A variety of nutrient uptake systems coded by the genome (represented by yellow or brown structures) are shown, including glycerol 3-phosphate (UgpABCE) and sugar (RbsAC) ABC transporters, a lactate (LctP) transporter, a formate–oxalate antiporter (OxiT), a spermidine and putrescine uptake ABC transporter (PotABCD), and amino acid (PutP, Cat) and dipeptide (DppABCDF) transporters. Other amino acid uptake systems, represented by a generic ABC transporter, are also likely to exist. Components of the protein translocation machinery (SecDEFY, SRP19, SRP54, SRα) are shown in black. Carotenoid and retinal (Ret) biosynthesis is shown. Cation transporters (in green) shown are for K⁺ (TrkAH and KdpABC), Na⁺ (NhaC), Cd²⁺ (ZntX and Cd efflux ATPase), Co²⁺ (CbiNOQ), Cu²⁺ (NosFY), Fe³⁺ (iron permease and HemUV), and Zn²⁺ (ZurMA). Anion transporters shown (in red) are for SO₄²⁻ (CysAT), PO₄³⁻ (PstABC and phosphate permease), Cl⁻ (chloride channel), and arsenate (ArsABC). A complex system of photoreceptors and signal transduction components is shown, including 2 sensory receptors (SRI in blue and SRII in orange) and 17 transducers (Htr I–Htr X, Htr XII–Htr XVIII) responding to light (hν), O₂, or amino acids, as indicated. Transmission of the motility signal to the flagellar motor via CheAW and CheY is shown by arrows. A flagellum is depicted as a wavy line. Single examples of sensor kinases (membrane bound [white rhombus] or cytoplasmic) and response regulators are identified. Gas vesicles (white ovals) and DNA repair systems are indicated within the cell.
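The alternate TBP–TFB hypothesis can be given a sense of scale by enumerating the possible pairings. Purely for counting, this sketch assumes that any TBP could pair with any TFB; the text does not establish that every pairing forms a functional complex. The gene labels are hypothetical placeholders, since the text gives only counts (5 complete tbp genes and 7 tfb genes).

```python
from itertools import product
from typing import List, Tuple

# Hypothetical labels for illustration: the text gives gene counts, not this naming.
COMPLETE_TBP = [f"tbp{x}" for x in "ABCDE"]   # 5 complete tbp genes
TFB = [f"tfb{x}" for x in "ABCDEFG"]          # 7 tfb genes


def tbp_tfb_pairs(tbps: List[str], tfbs: List[str]) -> List[Tuple[str, str]]:
    """Every TBP-TFB pairing; each is a candidate promoter-recognition specificity."""
    return list(product(tbps, tfbs))


if __name__ == "__main__":
    pairs = tbp_tfb_pairs(COMPLETE_TBP, TFB)
    print(len(pairs))  # 5 x 7 combinations
```

Counting only the complete tbp genes gives 35 possible combinations, or 42 if the partial tbp gene were also functional. Even a subset of these would offer far more promoter-selection options than a single TBP–TFB pair.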
Protein Synthesis
The translation system of Halobacterium NRC-1 has hybrid eukaryotic and bacterial
character, but like other Archaea, all of its ribosomal proteins have eukaryotic homologs. Interestingly, the ribosomal protein genes of Halobacterium NRC-1 are organized
into multigene clusters that resemble operons of bacteria. In addition to the 52 RNAs
(16S, 23S, and 5S rRNAs, 47 tRNAs [transfer RNAs], 7S RNA, and RNaseP), NRC-1
has 18 different aminoacyl–tRNA synthetases coded in the genome plus the GatABC
amidotransferases for charging with glutamine and asparagine (4). Interestingly, one
aminoacyl–tRNA synthetase, ArgRS, closely related to the bacterial and yeast mitochondrial enzymes, is coded by pNRC200.
For protein secretion, the Halobacterium NRC-1 general secretory (Sec) machinery
is a hybrid of eukaryotic and bacterial systems. Sec61α, Sec61γ, SRP54, SRP19, and the 7S RNA are related to the corresponding eukaryotic factors, while FtsY, SecD, and SecF (but not SecA) are related to the bacterial factors (4). In addition to the Sec system, recent bioinformatic analysis has suggested that the twin-arginine translocation (Tat) pathway, which bacteria use mainly to export redox proteins, is also present in NRC-1 and may be commonly used in this archaeon (15,16).
Cell Envelope
Halobacterium NRC-1 cells are surrounded by a single lipid bilayer membrane and
an S layer assembled from the cell surface glycoprotein. The cytoplasm is in osmotic
equilibrium with the hypersaline environment, with a correspondingly high intracellular K+ concentration that may be equivalent to the external Na+ concentration. Like
other Archaea, the polar lipids are based on archaeol, a glycerol diether lipid containing
phytanyl chains derived from C20 isoprenoids. The Halobacterium NRC-1 genome contained all of the key enzyme genes of isoprenoid synthesis, including HMG–coenzyme
A reductase (MvaA), the target of the growth inhibitor mevinolin (4). To maintain the
ionic balance, NRC-1 encodes multiple K+ transporters, including KdpABC, an ATP-driven
K+ transport system, and TrkAH, a low-affinity K+ transporter driven by the membrane
potential (Fig. 4). Active Na+ efflux is likely mediated by NhaC proteins, which are unidirectional Na+/H+ antiporters. Interestingly, the genes coding for KdpABC and some copies of TrkA (three of five) and NhaC (one of three) are found on pNRC200. In addition, active transporters were identified for the uptake of cationic amino acids (Cat), proline (PutP), dipeptides (DppABCDF), oligopeptides (AppACF), and sugars (Rbs); for the removal of heavy metals (arsenite and cadmium) and other toxic compounds (multidrug resistance homologs); and for phosphate (multiple copies of the PstABC system and phosphate permease).
Purple Membrane
Halobacterium NRC-1 contains purple membrane, a two-dimensional crystalline lattice
of the light-driven proton pump bacteriorhodopsin, a complex of a protein, bacterio-opsin, and a chromophore, retinal (Fig. 4). Under high-illumination conditions, cells can grow phototrophically, a capability recently recognized in planktonic bacteria as well (12). Five purple membrane regulon genes, which are clustered on the chromosome and coordinately regulated, were identified: bop, specifying bacteriorhodopsin; crtB1 and brp, coding for the first and last committed steps of retinal synthesis, respectively; blp, a gene of unknown function; and bat, the sensor–regulator (14). The bat gene product (Bat) is a member of a small gene family and contains a GAF (cGMP-binding) domain, a PAS/PAC (redox-sensing) domain, and a DNA-binding helix-turn-helix motif, which likely binds an upstream activator sequence (UAS) for gene activation. The bop
gene TATA box sequence deviates from the consensus archaeal promoter sequence, suggesting the involvement of novel factors, such as alternate TBP and TFB proteins, in its
transcription (14).
Taxis and Signal Transduction
Halobacterium species are highly chemotactic and phototactic, with both chemical
gradients and gradients of light intensity or color modulating their swimming behavior.
A large number of taxis genes have been identified, including sopI and sopII, coding for the phototaxis receptors SRI and SRII, which belong to the bacteriorhodopsin family (as does halorhodopsin, a chloride pump) (Fig. 4) (12). SRI mediates attractant responses to orange light and repellent responses to near-ultraviolet light, while SRII is a blue light repellent photoreceptor. Interestingly, homologs of haloarchaeal rhodopsins have recently been found in the genomes of fungi, algae, marine bacteria, and cyanobacteria (12). A total of 17 htr genes coding for integral membrane proteins homologous to bacterial chemotaxis receptors were found, as was a complete set of che genes encoding chemotaxis determinants. There are 6 flagellin genes and an archaeal-type flagellar apparatus (16). A large gene cluster, flaD-K, codes for the archaeal flagellar apparatus, with flaD, flaE, flaG, flaH, flaI, and flaJ similar to those of other archaea and only flaK resembling a bacterial flagellar regulator. Two-component regulatory systems are evident in
the Halobacterium NRC-1 genome, including 6 response regulator genes and 14 histidine kinases. The Halobacterium NRC-1 genome revealed the presence of several possible circadian photoregulators, including a eukaryotic cryptochrome and a cyanobacterial
KaiC-like protein, consistent with a circadian rhythm in this phototrophic microbe (12).
Gas Vesicles
Halobacterium species, like many photosynthetic aquatic prokaryotes, possess the
ability to regulate buoyancy by the synthesis of gas-filled vesicles (Fig. 4). The requirements for gas vesicle formation have been extensively studied in NRC-1 by genetic analysis (17). A cluster of genes, gvpMLKJIHGFEDACN(O), present on both pNRC100 and
pNRC200 in NRC-1 was shown to be necessary and sufficient for wild-type gas vesicle
synthesis. Interestingly, the genome sequence of Halobacterium NRC-1 also revealed
a silent, but nearly complete, gvp gene cluster, lacking only gvpM, on pNRC200 (4,12).
Carotenoids and Retinal
Halobacterium produces red-orange carotenoids that are essential for phototransduction and protection against photodamage, the most abundant being bacterioruberins (Fig. 4). Genes encoding bacterial-type phytoene synthases, crtB1 and crtB2, have been identified in Halobacterium NRC-1, and several of the subsequent desaturation steps are likely catalyzed by the products of crtI1, crtI2, and crtI3 (4). Genes for the subsequent conversion to bacterioruberin have not yet been identified. In a branch of the carotenoid pathway, lycopene is cyclized by the crtY gene product to form β-carotene, which is oxidatively cleaved to form retinal by the brp and blh gene products (Fig. 4) (18). For certain
steps of the carotenoid biosynthetic pathway, multiple genes may exist in Halobacterium NRC-1, and these may be differentially regulated by light or oxygen.
Energy Metabolism
Halobacterium NRC-1 can grow chemoorganotrophically, either aerobically or anaerobically, and has phototrophic capability using bacteriorhodopsin. Halobacterium requires all but 5 of the 20 amino acids for growth, and several amino acids may be used as a source of energy. Aerobically, arginine and aspartate can be used via the citric acid cycle; anaerobically, arginine can be used via the arginine deiminase pathway, coded by the arcRACB genes on pNRC200 (Fig. 4) (3). Genes for a gluconeogenic pathway for carbohydrate synthesis during growth on amino acids and nearly all genes for a reverse Embden–Meyerhof glycolytic pathway are present. Although Halobacterium is reported to be unable to metabolize sugars, a sugar uptake transporter and genes coding for glucose dehydrogenase and 2-keto-3-deoxygluconate kinase, enzymes of a semiphosphorylated Entner–Doudoroff pathway, are present in Halobacterium NRC-1. The genes for gluconeogenesis and for catabolism of glyceraldehyde 3-phosphate (produced by glucose catabolism) to pyruvate are also present. Halobacterium NRC-1 also possesses genes encoding enzymes of the bacterial-like fatty acid β-oxidation pathway and a 2-oxoacid dehydrogenase complex.
EVOLUTION AND LATERAL GENE TRANSFERS
Halobacterium NRC-1 is an organism of evolutionary interest that is distantly related
to some methanogens and is classified as a euryarchaeote based on the 16S rRNA
sequence. After complete sequencing, the Halobacterium NRC-1 genome was compared to 11 other complete genomes by gene content analysis using the DARWIN suite
of programs (4). The results confirmed the archaeal status of NRC-1, with the closest
relatives being Archaeoglobus fulgidus and Methanococcus jannaschii. Interestingly,
however, similarities were also noted to the Gram-positive bacterium, Bacillus subtilis,
and the radiation-resistant bacterium, Deinococcus radiodurans. More recently, whole
genome analysis using a larger number of completed genomes showed Halobacterium
NRC-1 to branch at the root of the archaeal tree (Fig. 1) (19). The discrepancy between the 16S rRNA and whole genome trees requires more detailed investigation because it raises the possibility that halophiles appeared at a very early point in evolution. However, another possibility is that the position of NRC-1 in whole genome trees is distorted, with Halobacterium pulled away from the other archaea and toward the bacteria as a consequence of many lateral gene transfers from bacteria.
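Gene-content trees are built from pairwise distances based on shared genes, which is why laterally acquired genes can distort them. One common normalization divides the number of shared orthologous groups by the gene count of the smaller genome. Whether SHOT or DARWIN used exactly this formula is an assumption here, and the genomes and ortholog-group IDs below are hypothetical toy data. A tree-building step such as neighbor joining would then be applied to the resulting matrix; that step is not shown.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def gene_content_distance(a: Set[str], b: Set[str]) -> float:
    """1 - (shared orthologous groups / gene count of the smaller genome)."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("gene sets must be non-empty")
    return 1.0 - len(a & b) / smaller


def distance_matrix(genomes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise gene-content distances, keyed by (name1, name2) in sorted order."""
    return {
        (x, y): gene_content_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    # Toy ortholog groups (hypothetical); "b*" groups are of bacterial origin.
    genomes = {
        "archaeon1": {"a1", "a2", "a3", "a4"},
        "archaeon2": {"a1", "a2", "a3", "a5"},
        "halophile": {"a1", "a2", "b1", "b2"},
        "bacterium": {"b1", "b2", "b3", "a1"},
    }
    for pair, d in distance_matrix(genomes).items():
        print(pair, round(d, 2))
```

In this toy example the "halophile" ends up closer to the bacterium (distance 0.25) than to archaeon1 (0.5), solely because it carries two groups of bacterial origin. This is the kind of pull toward the bacteria that the paragraph above describes.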
A comprehensive analysis of gene histories of Halobacterium NRC-1 has recently
been conducted (S. P. Kennedy and S. DasSarma, unpublished). Detailed phylogenetic
analysis of proteins catalogued as having bacterial phylogenies in the National Center
for Biotechnology Information Clusters of Orthologous Groups database was carried
out. In addition, bacterial-like genes that are clustered together in the genome and encode specific metabolic pathways were also subjected to phylogenetic analysis. Based on this
analysis, several hundred proteins, including biosynthetic, transport, and energy systems
(e.g., histidine utilization, purine metabolism, glycerol utilization) and components of
the electron transport chain were found to display clear bacterial histories. These genes
are likely to have been acquired in this halophile by lateral gene transfers. Surprisingly,
these bacterial genes showed no physical linkage to IS elements, suggesting that
the genes were acquired at an early point in evolution and that any vestige of the
recombination events underlying their acquisition has since been erased by amelioration. Although the mechanisms
responsible for interdomain genetic exchanges are unknown, the finding of hundreds
of bacterial genes in NRC-1 likely reflects the long-term opportunity for exchanges
between halophilic bacteria and archaea cohabiting hypersaline environments over evolutionary time. In this respect, NRC-1 is similar to some other mesophilic archaea (20)
and hyperthermophilic bacteria (21) in having large numbers of horizontally acquired
genes in its genome.
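The first step of such a survey, flagging candidate proteins whose closest homologs are mostly bacterial, can be sketched as a simple best-hit screen. This is a deliberately cruder technique than the detailed phylogenetic analysis described in the text: it assumes that, for each protein, the domains of life of its top database hits have already been obtained (for example, from a similarity search), and the protein names, hit lists, and 0.8 threshold are hypothetical.

```python
from collections import Counter


def classify_history(top_hit_domains: list[str], threshold: float = 0.8) -> str:
    """Assign a protein to the domain of life that dominates its top homolog hits.

    Returns that domain if it accounts for at least `threshold` of the hits,
    otherwise "uncertain". A crude screen, not a phylogenetic reconstruction.
    """
    if not top_hit_domains:
        return "uncertain"
    domain, count = Counter(top_hit_domains).most_common(1)[0]
    return domain if count / len(top_hit_domains) >= threshold else "uncertain"


# Hypothetical proteins and the domains of their five best database hits.
hits = {
    "protein_1": ["Bacteria"] * 5,
    "protein_2": ["Archaea", "Archaea", "Archaea", "Archaea", "Bacteria"],
    "protein_3": ["Archaea", "Bacteria", "Archaea", "Bacteria", "Eukarya"],
}

labels = {name: classify_history(domains) for name, domains in hits.items()}
bacterial_candidates = [name for name, label in labels.items() if label == "Bacteria"]
print(labels)
print(bacterial_candidates)  # ['protein_1']
```

Proteins flagged this way would only be candidates; establishing a bacterial history, as in the study described here, still requires building and inspecting phylogenetic trees.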
Acquisition of Respiratory Chain Components
Two of the most interesting cases of possible lateral gene transfer into Halobacterium NRC-1 involve the genes encoding electron transport chain components and menaquinone biosynthetic
proteins (11). Ten nuo genes, encoding subunits of NADH dehydrogenase, along with 3
cox genes, encoding subunits of cytochrome-c oxidase, are clustered together into probable operons, as are 6 men genes, for menaquinone biosynthesis. Interestingly, the nuo
gene order is conserved with respect to Escherichia coli, with closest branching to Synechocystis sp. PCC 6803; the men gene order is conserved with respect to both E. coli and
D. radiodurans, with closest branching to B. subtilis. Moreover, G+C analysis of
these two groups of genes showed that they are distinguishable from the average chromosomal genes (64% and 73% G+C for the two groups, compared with a chromosomal average of 68%). These results point to the interesting possibility that halophiles adapted to an oxidizing atmosphere by acquiring electron transport chain components from aerobic bacteria through lateral transfer
events. Further analysis is necessary to determine whether such transfers of respiratory
genes have occurred once or repeatedly in the evolution of the diversity of modern
halophiles.
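The G+C comparison described above amounts to computing the base composition of each gene group and comparing it with the chromosomal average. A minimal Python sketch follows; it uses the 68% chromosomal average quoted in the text, while the sequences are short hypothetical stand-ins rather than the actual nuo or men genes.

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_deviation(genes: list[str], genome_average: float) -> float:
    """Pooled G+C of a gene group minus the genome average, in percentage points."""
    return 100 * (gc_fraction("".join(genes)) - genome_average)


CHROMOSOME_AVERAGE = 0.68  # chromosomal average G+C quoted in the text

# Toy sequences standing in for a gene cluster; not real nuo or men sequences.
toy_cluster = ["GGCCGCAT", "GCGCATGC"]
print(round(gc_deviation(toy_cluster, CHROMOSOME_AVERAGE), 1))  # 7.0
```

Pooling the genes of a cluster, as done here, gives a single value per group; computing the value per gene instead would show whether the deviation is uniform across the cluster.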
Evolution of Purple Membrane
Retinal-containing chromoproteins like bacteriorhodopsin in purple membrane and
sensory rhodopsins have recently been discovered in diverse bacteria and eukaryotes
and are therefore present in all three branches of life, Archaea, Bacteria, and Eukarya
(12,22). Although the evolutionary origin of retinal chromoproteins is unclear at present,
their wide distribution in nature is consistent with horizontal transmission. An interesting further speculation is that primordial rhodopsins were an early evolutionary invention and may have been responsible for the original dominant form of phototrophy in the
sea, pre-dating chlorophyll-based photosynthesis. Such early phototrophs, with the relatively simple capacity for coupling transmembrane light-driven proton pumping to adenosine triphosphate synthesis (22,23), could have arisen in a reducing atmosphere (although
a small quantity of oxygen would have been necessary for the synthesis of retinal). Evolution of organisms with more complex chlorophyll-based photosynthetic systems operating
with great efficiency could subsequently have displaced purple membrane–containing
organisms from most environments. The complementarity of the spectra
for purple membrane, with a peak at 568 nm, and photosynthetic membranes, with a
trough at this same wavelength, is striking (Fig. 5) and is consistent with coevolution
of the two types of membranes. Moreover, both chlorophyll-based cyanobacteria and
purple membrane–based haloarchaea still coexist in modern hypersaline environments,
with the former dominating at relatively lower salinity and the latter dominating at
saturating salinity.
Fig. 5. Ultraviolet-visible spectra of Halobacterium NRC-1 purple membrane (PM), red membrane (RM), and photosynthetic membrane. Purple membrane and red membrane were
separated on a sucrose gradient, and their spectra were plotted alongside that of photosynthetic membrane. The
complementarity of purple and photosynthetic membrane spectra is apparent, consistent with
coevolution of the two membranes.
Evolution of pNRC Replicons
The genome organization of Halobacterium NRC-1, with a large chromosome and
two related extrachromosomal replicons, is both complex and intriguing. One possible
reason for the maintenance of multiple replicons, including pNRC100 and pNRC200,
is that they have captured some essential genes and are therefore required for viability.
The compatibility between these related replicons may be explained by the presence of
multiple origins of replication of different compatibility groups (3,24). Because dozens
of copies of IS elements are present on these replicons, the transposable elements are
likely responsible for frequently promoting exchanges of DNA between them. Moreover, once an extrachromosomal replicon is established in two or more copies, continued
DNA exchanges between individual copies of the smaller replicons and the large chromosome could result in generation of additional genomic diversity.
Such a possible scheme has been proposed for the evolution of pNRC100, including
multiple replicon fusions of precursor plasmids, followed by the acquisition of chromosomal genes, with both processes mediated by IS elements (3). The duplication of a portion of a pNRC100 precursor replicon through unequal crossing over of two IS element
pairs would have resulted in the formation of inverted repeats, which subsequently would
serve to stabilize the region within the repeats and create inversion isomers. Through
such processes, essential genes may have been captured from the chromosome and stabilized on the pNRC100 and pNRC200 replicons, leading these replicons to attain
minichromosome status.
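The generation of inversion isomers by recombination between inverted repeats can be illustrated with a toy string model. The sketch assumes a single pair of perfect inverted repeats, represented by a hypothetical 3-bp sequence standing in for an IS element, and models recombination simply as inversion of the segment between them; real IS-mediated events are considerably more complex.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inversion_isomer(seq: str, repeat: str) -> str:
    """Return the isomer produced by recombination between a pair of inverted repeats.

    Assumes `repeat` occurs in `seq` with its reverse complement downstream.
    Inverting the span from the start of the first copy to the end of the
    second leaves both repeat copies in place and flips the region between them.
    """
    inverted = reverse_complement(repeat)
    start = seq.find(repeat)
    if start < 0:
        raise ValueError("repeat not found")
    end = seq.find(inverted, start + len(repeat))
    if end < 0:
        raise ValueError("inverted copy of repeat not found downstream")
    end += len(inverted)
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


# Hypothetical replicon: flanks "TTT", repeat "ACG", unique region "GGA", inverted copy "CGT".
original = "TTT" + "ACG" + "GGA" + "CGT" + "TTT"
isomer = inversion_isomer(original, "ACG")
print(isomer)  # TTTACGTCCCGTTTT: unique region GGA now reads TCC
print(inversion_isomer(isomer, "ACG") == original)  # True
```

Applying the operation twice restores the original sequence, which is why a region bounded by inverted repeats can interconvert between two isomeric forms.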
The existence of multiple minichromosome replicons with the capability to acquire
new genes and harboring multiple essential genes is a highly novel feature of the Halobacterium NRC-1 genome. As a result, the NRC-1 genome may be in a state of
competitive dynamic equilibrium between several essential replicons.
Such a condition may arise from time to time in evolution and subside during intervening
periods as replicon fusions reduce the number of replicons. The heterogeneity of minichromosomes among Halobacterium strains is a testament to such underlying dynamic
processes (25). Given these findings in Halobacterium NRC-1, it is not inconceivable
that competition between replicons is a general phenomenon in evolution and may play
an important role in shaping the long-term evolution of prokaryotic genomes, including the evolution of new chromosomes from plasmids.
FUTURE PROSPECTS
The complete sequence of Halobacterium NRC-1 has provided an excellent platform
for evolutionary and comparative genomic analysis of an extremely halophilic archaeon
(4,11). Because NRC-1 is one of the few sequenced mesophilic archaea and coinhabits a dynamic
environment populated by a multitude of bacteria, hundreds of genes with bacterial or
uncertain histories have been uncovered in its genome. Additional genomic studies of diverse halophiles (1) (e.g., marine haloarchaea) are necessary to provide a significantly better understanding of the evolutionary position of these novel microorganisms. The finding of
large dynamic extrachromosomal replicons, containing both essential genes and a large
number of IS elements, has suggested the occurrence of multiple chromosomes that may
compete for genes (3).
In addition to evolutionary insights, the ease of culture and the wide range of biological responses of halophiles promise significant opportunities in functional genomics
and biotechnology. DNA arrays, proteomics, and gene knockouts are all approaches
available for further studies of Halobacterium biology (2,26). The recent use of a whole
genome microarray to study purple membrane expression illustrates the power of functional genomic approaches and reminds us of the need to adhere to established, rigorous
genetic practices in the postgenomic era (27,28). Significantly, halophilic archaea serve
as excellent models for fundamental aspects of eukaryotic biology (e.g., DNA replication, transcription, and translation). Finally, halophilic proteins and complexes, many
of which are highly novel, offer genuine opportunities for biotechnology,
including the development of new vaccines and antibiotics (29,30).
ACKNOWLEDGMENTS
Studies of haloarchaeal genomics in my laboratory have been generously supported
by the National Science Foundation. I wish to thank many current and former students
and associates and collaborators in the Halobacterium Genome Consortium who provided much of the information collected in this chapter. Special thanks are given to Dr.
Philip Harriman for support and encouragement.
REFERENCES
1. DasSarma S, Arora P. Halophiles. In: Encyclopedia of Life Sciences. London: Macmillan, 2000,
pp. 458–466.
2. DasSarma S, Robb FT, Place AR, et al. (eds). Archaea: A Laboratory Manual—Halophiles.
Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1995.
3. Ng W-L, Ciufo SA, Smith TM, et al. Snapshot of a large dynamic replicon from a halophilic
archaeon: megaplasmid or minichromosome? Genome Res 1998; 8:1131–1141.
4. Ng WV, Kennedy SP, Mahairas GG, et al. Genome sequence of Halobacterium species NRC-1.
Proc Natl Acad Sci USA 2000; 97:12,176–12,181.
5. Joshi JG, Guild WR, Handler P. The presence of two species of DNA in some halobacteria.
J Mol Biol 1963; 6:34–38.
6. Charlebois RL, Doolittle WF. Transposable elements and genome structure in halobacteria. In:
Berg DE, Howe MM (eds). Mobile DNA. Washington, DC: American Society for Microbiology,
1989, pp. 297–307.
7. Ng W-L, Kothakota S, DasSarma S. Structure of the large gas vesicle plasmid in Halobacterium
halobium: inversion isomers, inverted repeats, and insertion sequences. J Bacteriol 1991; 173:
1958–1964.
8. Hackett NR, Bobovnikova Y, Heyrovska N. Conservation of chromosomal arrangement among
three strains of the genetically unstable archaeon Halobacterium species. J Bacteriol 1994; 176:
7711–7718.
9. St Jean A, Charlebois RL. Comparative genomic analysis of the Haloferax volcanii DS2 and
Halobacterium sp GRB contig maps reveals extensive rearrangement. J Bacteriol 1996; 178:
3860–3868.
10. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res
1998; 8:195–202.
11. Kennedy SP, Ng WV, Salzberg SL, Hood L, DasSarma S. Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome
sequence. Genome Res 2001; 11:1641–1650.
12. DasSarma S, Kennedy SP, Berquist B, et al. Genomic perspective on the photobiology of Halobacterium species NRC-1, a phototrophic, phototactic, and UV-tolerant haloarchaeon. Photosynth Res 2001; 70:3–17.
13. Baliga NS, Goo YA, Ng WV, Hood L, Daniels CJ, DasSarma S. Is gene expression in Halobacterium NRC-1 regulated by multiple TBP and TFB transcription factors? Mol Microbiol 2000;
36:1184–1185.
14. Baliga NS, Kennedy SP, Ng WV, Hood L, DasSarma S. Genomic and genetic dissection of an
archaeal regulon. Proc Natl Acad Sci USA 2001; 98:2521–2525.
15. Bolhuis A. Protein transport in the halophilic archaeon Halobacterium sp NRC-1: a major role
for the twin-arginine translocation pathway? Microbiology 2002; 148:3335–3346.
16. Patenge N, Berendes A, Engelhardt H, Schuster SC, Oesterhelt D. The fla gene cluster is
involved in the biogenesis of flagella in Halobacterium. Mol Microbiol 2001; 41:653–663.
17. DasSarma S, Arora P. Genetic analysis of the gas vesicle gene cluster in haloarchaea. FEMS
Microbiol Lett 1997; 153:1–10.
18. Peck RF, Echavarri-Erasun C, Johnson EA, et al. brp and blh are required for synthesis of the
retinal cofactor of bacteriorhodopsin in Halobacterium. J Biol Chem 2001; 276:5739–5744.
19. Korbel JO, Snel B, Huynen MA, Bork P. SHOT: a web server for the construction of genome
phylogenies. Trends Genet 2002; 18:158–162.
20. Deppenmeier U, Johann A, Hartsch T, et al. The genome of Methanosarcina mazei: evidence for
lateral gene transfer between bacteria and archaea. J Mol Microbiol Biotechnol 2002; 4:453–461.
21. Nelson KE, Clayton RA, Gill SR, et al. Evidence for lateral gene transfer between archaea and
bacteria from genome sequence of Thermotoga maritima. Nature 1999; 399:323–329.
22. Beja O, Aravind L, Koonin EV, et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 2000; 289:1902–1906.
23. Racker E, Stoeckenius W. Reconstitution of purple membrane vesicles catalyzing light-driven
proton uptake and adenosine triphosphate formation. J Biol Chem 1974; 249:662–663.
24. Ng W-L, DasSarma S. Minimal replication origin of the 200-kilobase Halobacterium plasmid
pNRC100. J Bacteriol 1993; 175:4584–4596.
25. Ng W-L, Arora P, DasSarma S. Large deletions in class III gas vesicle-deficient mutants of
Halobacterium. Syst Appl Microbiol 1994; 16:560–568.
26. Peck RF, DasSarma S, Krebs MP. Homologous gene knockout in the archaeon Halobacterium
with ura3 as a counterselectable marker. Mol Microbiol 2000; 35:667–676.
27. Baliga NS, Pan M, Goo YA, et al. Coordinate regulation of energy transduction modules in
Halobacterium sp analyzed by a global systems approach. Proc Natl Acad Sci USA 2002; 99:
14,913–14,918.
28. DasSarma S. Biology reports Ltd. faculty of 1000 commentary. Available at: http://www.facultyof1000.com/article/12403819. Accessed January 8, 2003.
29. Stuart ES, Morshed F, Sremac M, DasSarma S. Antigen presentation using novel particulate
organelles from halophilic archaea. J Biotechnol 2001; 88:119–128.
30. Hansen JL, Ippolito JA, Ban N, Nissen P, Moore PB, Steitz TA. The structures of four macrolide antibiotics bound to the large ribosomal subunit. Mol Cell 2002; 10:117–128.