21
Genome Sequence of an Extremely Halophilic Archaeon

Shiladitya DasSarma

INTRODUCTION

Extreme halophiles are novel microorganisms that require 5–10 times the salinity of seawater (ca. 3–5M NaCl) for optimal growth (1,2). They include diverse prokaryotic species, both archaeal and bacterial, and some eukaryotic organisms. Extreme halophiles are found in hypersaline environments near the sea or salt deposits of marine or nonmarine origin. Two of the largest hypersaline lakes supporting a variety of halophilic species are the Great Salt Lake in the western United States and the Dead Sea in the Middle East. Some of the most interesting hypersaline environments are small artificial solar salterns used for producing salt from the sea, which are distributed throughout the world. Many hypersaline environments exhibit gradients of increasing salinity temporally and produce sequential growth of progressively more halophilic species, including complex microbial mats and spectacular blooms of bright red and red-orange colored species. These environments are important ecologically, frequently supporting entire populations of such exotic birds as pink flamingoes, which obtain their color from the pigmented halophilic microorganisms. A critical feature of halophilic microbes that prevents cell lysis in hypersaline environments is their high internal concentration of compatible solutes (e.g., amino acids, polyols, and salts), which act as osmoprotectants.

Although a wide variety of halophiles has been cultured, the genome of only a single extreme halophile, Halobacterium sp NRC-1, has been completely sequenced thus far (3,4). This species is a typical halophile commonly found in many hypersaline environments, including the Great Salt Lake and solar salterns. Phylogenetically, it is classified as an archaeon, a member of the third branch of life (Fig. 1).
It has a growth optimum of 4.5M NaCl, close to the saturation point, and a high concentration of K+ salts internally. Halobacterium NRC-1 is a mesophilic archaeon, with a temperature optimum of 42°C for growth. Although Halobacterium species are thought to have limited physiological capabilities, strain NRC-1 is metabolically quite versatile, growing aerobically, anaerobically, and phototrophically. Phototrophic growth is mediated by the light-driven proton pumping of bacteriorhodopsin, which forms a two-dimensional crystalline lattice in the purple membrane. Halobacterium NRC-1 is also highly resistant to ultraviolet and γ-radiation and displays sophisticated motility responses, including phototaxis, chemotaxis, and gas vesicle-mediated flotation. One of the most notable features of Halobacterium NRC-1, revealed by genome sequencing, is a highly acidic proteome, which is likely essential to maintain protein solubility and function in high salinity. Significantly, this organism is amenable to analysis using well-developed genetic methodology, including gene knockouts, expression vectors, and complementation systems, which make Halobacterium NRC-1 a good model for functional genomic studies among extremophiles and archaea (2).

Fig. 1. Whole genome tree of selected archaeal organisms. Gene content phylogeny done by neighbor-joining using the SHOT web server (19) indicates that Halobacterium is located at the base of the archaeal branch of the phylogenetic tree.

From: Microbial Genomes. Edited by: C. M. Fraser, T. D. Read, and K. E. Nelson © Humana Press Inc., Totowa, NJ

In addition to Halobacterium NRC-1, several other halophiles are the subject of ongoing genome projects.
The most notable among these are two Dead Sea archaea, Haloarcula marismortui and Haloferax volcanii (1), which are slightly less halophilic than Halobacterium NRC-1, with an optimum salinity of 2–3M NaCl and a high magnesium ion tolerance, reflecting the salt composition of their environment. They also display metabolic capability for growth in media containing simple sugars and carbohydrates as carbon and energy sources. Several other interesting categories of halophiles worthy of genomic studies include alkaliphilic halophiles, which grow in soda lakes with pH of 9.0–11.0; psychrotrophic halophiles, which grow at freezing temperatures in Antarctic lakes; bacterial halophiles, which tolerate a wide range of salinity; and eukaryotic halophiles, such as the green alga Dunaliella salina. Finally, sequencing of a haloarchaeal strain with a nearly identical chromosome to strain NRC-1 is also in progress. A listing of current genome projects on halophiles is maintained on the Halophile Genomes Web site at the University of Maryland Biotechnology Institute, Center of Marine Biotechnology (http://zdna2.umbi.umd.edu).

THE HALOBACTERIUM GENOME

The genomes of Halobacterium species were originally studied a half-century ago; they are composed of two components, a major fraction that is G+C-rich and a relatively A+T-rich (58% G+C) satellite (5). Subsequent studies showed that the satellite deoxyribonucleic acid (DNA) corresponded mainly to large heterogeneous extrachromosomal replicons containing many transposable insertion sequence (IS) elements (6). For Halobacterium NRC-1, extensive mapping revealed the presence of three replicons: pNRC100, about 200 kbp; pNRC200, nearly twice the size of pNRC100; and a 2-Mbp chromosome (Fig. 2) (7,8). The pNRC100 replicon was found to be partly identical to pNRC200 and to exist as inversion isomers (7).
The chromosomes of strain NRC-1 and another wild-type strain, GRB, were compared by restriction mapping, which showed extensive regions of similarity and a few regions with differences, including a large inversion and an insertion. Ordered cosmid libraries representing the genomes of Halobacterium species GRB and H. volcanii were also constructed and compared by hybridization, which indicated the lack of any detectable conserved gene organization (9). These and other mapping projects suggest that significant diversity exists within the genomes of halophilic archaea.

Genome Sequencing and Analysis

Because of the high G+C composition and the large number of IS elements, the Halobacterium NRC-1 genome was sequenced in two stages. Initially, the pNRC100 replicon was sequenced by a combination of random shotgun sequencing of libraries made from purified covalently closed circular DNA and directed sequencing of cloned and mapped HindIII fragments (3,7). This approach permitted the assembly of an unstable replicon that undergoes frequent DNA rearrangements, including inversion isomerization, and contains many IS elements. Subsequently, whole genome random shotgun sequencing was performed, providing 7.5× coverage of the relatively stable large chromosome (4). Remaining lower-quality regions were sequenced using polymerase chain reaction fragments and by primer walking. The NRC-1 genome was assembled using the Phred, Phrap, and Consed programs, initially masking all the known and putative new IS elements, to avoid the formation of chimeric contigs (4,10). The complete genome sequence of Halobacterium NRC-1 revealed a 2,571,010-bp genome, including the 2,014,239-bp G+C-rich chromosome, and two smaller circles, 191,346-bp pNRC100, and 365,425-bp pNRC200 (Table 1; Fig. 2) (3,4).
Interestingly, pNRC100 and pNRC200 contained a 145,428-bp region of 100% identity, including 33- to 39-kb inverted repeats, which mediate inversion isomerization; the small single copy region; and a part of the large single copy regions (Fig. 2) (7). The unique regions of the large single copy region contained 45,918 bp for pNRC100 and 219,997 bp for pNRC200. Glimmer (Gene Locator and Interpolated Markov Modeler) was used to identify 2,630 likely genes in the genome, of which 64% coded for proteins with significant matches to the databases (4). In addition, 52 ribonucleic acid (RNA) genes were identified. About 40 genes in pNRC100 and pNRC200 coded for proteins likely to be essential or important for cell viability, such as a DNA polymerase, TBP and TFB transcription factors, and the arginyl–tRNA (transfer RNA) synthetase, suggesting that these replicons should be classified as minichromosomes rather than megaplasmids (3,4).

Proteome Analysis

One of the most dramatic results of genome sequencing of Halobacterium NRC-1 was the finding of an extremely acidic complement of encoded proteins, which is likely directly related to protein function in its hypersaline (>4M KCl) cytoplasm (11). Calculated isoelectric points (pIs) for predicted proteins showed an average pI of approx 5, a prediction confirmed by proteomic analysis (Fig. 3). Similarly, acidic proteomes were predicted from partial genome sequences of two other halophiles, H. marismortui and H. volcanii. In contrast, the average pIs of nearly all other proteomes are close to neutral. Notable exceptions are Methanobacterium thermoautotrophicum, which also contains both an acidic proteome and a relatively high (~1M) internal concentration of K+ ions, and three hyperthermophiles (Pyrobaculum aerophilum, Pyrococcus furiosus, and Sulfolobus solfataricus), which have relatively basic proteomes.
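As a rough illustration of how a calculated pI can be obtained from a predicted protein sequence, the Python sketch below sums Henderson–Hasselbalch charges over the ionizable groups and finds the pH of zero net charge by bisection. It is not the procedure used in ref. 11: the pKa values are approximate textbook-style assumptions, and the two short peptides are hypothetical sequences chosen only to contrast an acidic with a basic composition.

```python
# Approximate pKa values for ionizable groups (assumed, not from the chapter).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "Nterm": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "Cterm": 2.0}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Start with the free amino and carboxyl termini.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for residue in sequence:
        if residue in PKA_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge falls monotonically as pH rises.
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Hypothetical peptides, for illustration only.
acidic_peptide = "MDEEDLDAEV"
basic_peptide = "MKRKLKRAKV"

if __name__ == "__main__":
    print("acidic peptide pI:", round(isoelectric_point(acidic_peptide), 2))
    print("basic peptide pI:", round(isoelectric_point(basic_peptide), 2))
```

Averaging such values over every predicted protein would give a proteome-wide figure comparable in spirit to the average pI discussed above, although the exact number depends on the pKa set chosen.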
Homology modeling has shown that the acidic pI of Halobacterium NRC-1 proteins is correlated with a high concentration of surface negative charge (11). For example, a transcription factor (TbpE) and a topoisomerase subunit (GyrA) showed a marked increase in surface negative charge when compared to their homologs in nonhalophilic organisms (11).

G+C Composition and IS Elements

Common characteristics of halophile genomes are their high G+C composition major fraction, low G+C satellite fraction, and a preponderance of IS elements (6). For Halobacterium NRC-1, the two pNRC replicons, which represent only 22% of the genome, are substantially less G+C rich (58–59% G+C) than the large chromosome (68% G+C) and contain a majority (69/91 or 76%) of the IS elements in the genome (Fig. 2). In addition, two regions of the chromosome are less G+C rich than average, with one 270-kbp region (region I) containing 65% G+C and 13 IS elements and a second 150-kbp region (region II) with 66% G+C and 4 IS elements (Fig. 2) (11). Interestingly, a 15-kbp region on the pNRC inverted repeats is higher in G+C content (64%) than pNRC100 as a whole (58%) and lacks any IS elements (3), indicating the occurrence of genomic regions with diverse character in all three replicons. All together, there are 91 IS elements, which represent 12 families in the NRC-1 genome (Table 1) (4). These findings suggest the involvement of IS elements in DNA exchange between the replicons of Halobacterium NRC-1.

Fig. 2. (A) Circular map of the Halobacterium NRC-1 large chromosome and (B) aligned linear genetic maps of pNRC100 and pNRC200 replicons. (A) The circular map of the large chromosome plots the locations of IS elements (outer scale), χ-squared analysis (red line), and G+C composition of open reading frames (black line). Colored bars associated with the outermost circle indicate the position of the chromosomal IS elements (ISH1, beige; ISH2, purple; ISH3, green; ISH4, yellow; ISH6, pink; ISH8, blue; ISH10, red). Roman numerals I and II indicate AT-rich islands. (B) The circular replicons are depicted in linear forms, with the genes and IS elements represented as blocks. The two replicons contain 145,428 bp of identity and either 45,918 bp or 219,997 bp of unique DNA for pNRC100 and pNRC200, respectively (3,4). The 33- to 39-kb inverted repeats are shown in yellow (conserved in all copies) and orange (conserved in some, but not all, copies); the small single copy (SSC) regions are in purple; the common large single copy regions are in bright green; and the unique large single copy regions are in tan (pNRC100) and light green (pNRC200). The IS elements are shown in dark orange (ISH2), brown (ISH3), indigo (ISH5), blue (ISH7), dark green (ISH8), teal (ISH9), red (ISH10), and blue-gray (ISH11). The pNRC replicons contain 69 IS elements (44 unique), 29 on pNRC100 and 40 on pNRC200; with 6 elements in the inverted repeats (repeated twice in both pNRC100 and pNRC200 each), 4 elements in the SSC region in both pNRC100 and pNRC200, 7 elements in the common large single copy region in both pNRC100 and pNRC200; and 23 elements in the unique large single copy regions, 6 in pNRC100 and 17 in pNRC200. (Figure 2A reproduced with permission from Cold Spring Harbor Laboratory Press, ref. 11.)

Table 1
Halobacterium NRC-1 Genome Statistics

                             Total        Chromosome   pNRC200    pNRC100
Size (bp)                    2,571,010    2,014,239    365,425    191,346
G+C composition (%)          65.9         67.9         59.2       57.9
Number of predicted genes    2,682        2,111        374        197
Coding (%)                   84           87           76         71
Number of IS elements        91           22           40         29
  ISH1                       1            1            0          0
  ISH2                       13           4            5          4
  ISH3                       23           5            10         8
  ISH4                       2            1            0          1
  ISH5                       6            0            4          2
  ISH6                       2            1            1          0
  ISH7                       4            0            2          2
  ISH8                       21           5            10         6
  ISH9                       4            0            2          2
  ISH10                      6            2            2          2
  ISH11                      7            2            3          2
  ISH12                      2            1            1          0
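The summary figures in Table 1 can be cross-checked with a few lines of arithmetic. The Python sketch below, which uses only the sizes, G+C percentages, and IS element counts quoted in Table 1 (the variable names are illustrative), recomputes the total genome size, the size-weighted genome G+C content, the share of the genome carried by the two pNRC replicons, and their share of the IS elements.

```python
# Values transcribed from Table 1; dictionary keys are illustrative names.
replicons = {
    "chromosome": {"size_bp": 2_014_239, "gc_percent": 67.9, "is_elements": 22},
    "pNRC200": {"size_bp": 365_425, "gc_percent": 59.2, "is_elements": 40},
    "pNRC100": {"size_bp": 191_346, "gc_percent": 57.9, "is_elements": 29},
}

# Total genome size is the sum of the three replicons.
total_size = sum(r["size_bp"] for r in replicons.values())

# Genome-wide G+C is the size-weighted mean of the replicon values.
weighted_gc = (
    sum(r["size_bp"] * r["gc_percent"] for r in replicons.values()) / total_size
)

# Fraction of the genome, and of the IS elements, on the pNRC replicons.
pnrc_size = replicons["pNRC200"]["size_bp"] + replicons["pNRC100"]["size_bp"]
pnrc_fraction = pnrc_size / total_size

total_is = sum(r["is_elements"] for r in replicons.values())
pnrc_is = replicons["pNRC200"]["is_elements"] + replicons["pNRC100"]["is_elements"]
pnrc_is_share = pnrc_is / total_is

if __name__ == "__main__":
    print("total size (bp):", total_size)
    print("weighted G+C (%):", round(weighted_gc, 1))
    print("pNRC share of genome (%):", round(100 * pnrc_fraction))
    print("pNRC share of IS elements (%):", round(100 * pnrc_is_share))
```

The recomputed values agree with the Total column of Table 1 and with the 22% and 69/91 (76%) figures quoted in the text.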
The high G+C composition of Halobacterium NRC-1 is likely an adaptation to survival under intense solar radiation (e.g., to minimize targets for thymine dimer formation). Statistically, the number of thymine dimer sites is expected to be nearly 60% lower for the NRC-1 large chromosome compared to a comparable size replicon of 50% G+C. However, dinucleotide analysis indicated even fewer sites, by an additional 20%, than predicted from the G+C content (11). The high G+C composition also results in an extreme third-position G+C bias in the codon usage (86% G+C vs 70% and 46% in the first two positions) (11).

ANNOTATION OF THE HALOBACTERIUM GENOME

The Halobacterium Genome Consortium, an international group representing 12 institutions, conducted annotation of the NRC-1 genome from summer 1999 to summer 2000. Data were released starting at 3× coverage periodically until completion, with a workshop held in Amherst, Massachusetts, in January 2000. This effort led to a thorough analysis of this first halophile sequence and made it maximally useful to the community. In the subsequent 2-year period, numerous additional genes have been identified. The high points of the current annotation are summarized here, and a comprehensive database is available at the Halophile Genomes web site (http://zdna2.umbi.umd.edu).

Fig. 3. Average pI profiles of proteomes predicted from genome sequences: Halobacterium sp NRC-1 (NRC1), Haloarcula marismortui (Hma), Haloferax volcanii (Hvo), Archaeoglobus fulgidus (Afu), Methanosarcina acetivorans (Mac), Methanococcus jannaschii (Mja), Methanobacterium thermoautotrophicum (Mth), Pyrobaculum aerophilum (Pae), Pyrococcus furiosus (Pfu), Sulfolobus solfataricus (Sso), Thermoplasma acidophilum (Tac), Bacillus subtilis (Bsu), Escherichia coli K12 (Eco), Saccharomyces cerevisiae (Sce).
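Returning to the thymine dimer estimate given in the discussion of G+C composition: the "nearly 60% lower" figure can be reproduced with a simple model. The Python sketch below assumes that bases occur independently, that A and T are equally frequent, and that a potential dimer site is any adjacent TT pair on either strand; these are simplifying assumptions rather than the analysis of ref. 11, and the short test sequence is hypothetical.

```python
def expected_tt_frequency(gc_percent):
    """Expected frequency of a TT dinucleotide under base independence.

    Assumes A and T each make up half of the non-G+C fraction.
    """
    t_fraction = (1.0 - gc_percent / 100.0) / 2.0
    return t_fraction * t_fraction


def count_tt_sites(sequence):
    """Count adjacent TT pairs on both strands of a DNA sequence.

    A TT pair on the complementary strand appears as AA on the given
    strand, so overlapping TT and AA dinucleotides are both counted.
    """
    sequence = sequence.upper()
    pairs = (sequence[i:i + 2] for i in range(len(sequence) - 1))
    return sum(1 for pair in pairs if pair in ("TT", "AA"))


# Expected reduction for a 68% G+C replicon relative to a 50% G+C one.
reduction = 1.0 - expected_tt_frequency(68) / expected_tt_frequency(50)

if __name__ == "__main__":
    print("expected reduction in TT sites: {:.1%}".format(reduction))
    # Hypothetical sequence, for illustration only.
    print("TT sites in ATTTGAAC:", count_tt_sites("ATTTGAAC"))
```

Comparing an observed count from `count_tt_sites` with the expectation from `expected_tt_frequency` is one way to look for the kind of additional depletion reported by the dinucleotide analysis.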
DNA Replication

The Halobacterium NRC-1 genome codes for a heterodimeric family D DNA polymerase found in Archaea; many eukaryotic-like replication proteins; 2 family B DNA polymerases, one coded by pNRC200; origin recognition and helicase recruiters (10 Orc1/Cdc6); replicative helicase (MCM); ssDNA binding proteins (6 Rfa); primases (2 Pri); clamp loaders (RfcABC); processivity clamp (2 proliferating cell nuclear antigen homologs); type I topoisomerase (TopA); type II topoisomerases (Top6A and Top6B); RNA primer removal (Rad2 and RNaseH); and a few bacterial genes involved in replication, a primase (DnaG) and topoisomerase (GyrA and GyrB). Interestingly, multiple copies of genes coding for eukaryotic origin recognition complex proteins Orc1/Cdc6 were found, including 3 scattered on the large chromosome, suggesting the possibility of multiple replication origins (11). When analyzed for strand-specific G+C nucleotide variation or G+C skew, the large chromosome of Halobacterium NRC-1 was found to contain 4 inflection points. Two of the three orc1/cdc6 genes were located near the inflection points, suggesting that Halobacterium NRC-1 has a novel replication system with two separate origins of replication on the large chromosome (11).

DNA Repair

The Halobacterium NRC-1 genome contains many DNA repair genes (Fig. 4), likely necessary to repair DNA damage resulting from intense solar radiation in its environment (12). Consistent with expectations, NRC-1 displays high levels of resistance to both ultraviolet and γ-radiation. Photoreactivation is a very efficient process in Halobacterium, and two photolyase/cryptochrome homologs are encoded in the genome, one of which probably functions in DNA repair.
Base excision repair is likely carried out by the Ogg, AlkA, MutY, and Nth homologs and probably by XthA, a homolog of the endonuclease IV family of AP endonucleases, and Ogt, a possible methylation damage repair methylase. Halobacterium NRC-1 also encodes homologs of the bacterial excision repair complex UvrABCD. Interestingly, the presence of some genes coding for homologs of the eukaryotic form of excision repair (Rad2, Rad3, Rad25, and ERCC4) suggests the existence of duplicate repair systems in NRC-1. Mismatch repair proteins MutS1, MutS2, and MutL are found in Halobacterium NRC-1. RadA1 and RadA2, homologs of RecA/Rad51 that likely encode recombinases; MRE11; and a Holliday junction resolvase, all likely involved in homologous recombination and recombinational repair, are also present. A homolog of the bacterial UmuC polymerase for damage bypass is found in the Halobacterium NRC-1 genome, as is a eukaryotic adenosine triphosphate (ATP)-type DNA ligase.

Transcription

Like other archaea, a simplified version of a eukaryotic RNA polymerase II–like transcription system is found in Halobacterium NRC-1; it contains Rpo subunits A, C, B', B", E', E", H, K, L, N, and M (4). In addition, a surprising finding was that the NRC-1 genome codes for 13 copies of TBP and TFB transcription factor genes, including 5 complete and 1 partial tbp genes (4 located on pNRC100, 1 on pNRC200, and 1 on the large chromosome) and 7 tfb genes (2 on pNRC200 and 5 on the large chromosome) (13). These results suggested the possibility of a novel mechanism for gene regulation using alternate TBP–TFB combinations for promoter selection. Consistent with this hypothesis, analysis of the genome sequence and saturation mutagenesis of the bop promoter provided evidence for alternate TATA box sequences (14). Nearly 100 transcriptional regulators, mostly bacterial type, have also been identified.

Fig. 4. Integrated view of the genome of Halobacterium NRC-1 (4). Aspects of energy production, nutrient uptake, membrane assembly, cation and anion transport, and signal transduction are depicted. ATP synthesis by chemiosmotic coupling of proton transport by the respiratory chain and by light-driven proton pumping by bacteriorhodopsin (BR; purple oval) or chloride transport by halorhodopsin (HR; blue oval) is shown. Below, the semiphosphorylated Entner–Doudoroff pathway is shown, and the presence of fatty acid oxidation and the citric acid cycle is indicated. Enzymes not yet identified are marked with asterisks. A variety of nutrient uptake systems (represented by yellow or brown structures) coded by the genome, including glycerol 3-phosphate (UgpABCE) and sugar (RbsAC) ABC transporters, a lactate (LctP) transporter, formate–oxalate antiporter (OxiT), spermidine and putrescine uptake ABC transporter (PotABCD), and amino acid (PutP, Cat) and dipeptide (DppABCDF) transporters, are shown. Other amino acid uptake systems, represented by a generic ABC transporter, are also likely to exist. Components of the protein translocation machinery (SecDEFY, SRP19, SRP54, SRα) (in black) are shown. Carotenoid and retinal (Ret) biosynthesis is shown. Cation transporters (in green) shown are for K⁺ (TrkAH and KdpABC), Na⁺ (NhaC), Cd²⁺ (ZntX and Cd efflux ATPase), Co²⁺ (CbiNOQ), Cu²⁺ (NosFY), Fe³⁺ (iron permease and HemUV), and Zn²⁺ (ZurMA). Anion transporters shown (in red) are for SO₄²⁻ (CysAT), PO₄³⁻ (PstABC and phosphate permease), Cl⁻ (chloride channel), and arsenate (ArsABC). A complex system of photoreceptors and signal transduction components is shown, including 2 sensory receptors (SRI shown in blue and SRII shown in orange) and 17 transducers (Htr I–Htr X, HtrXII–Htr XVIII) responding to light (hν), O₂, or amino acids, as indicated. Transmission of the motility signal to the flagellar motor via CheAW and CheY is shown by arrows. A flagellum is depicted as a wavy line. Single examples of sensor kinases (membrane bound [white rhombus] or cytoplasmic) and response regulators are identified. Gas vesicles (white ovals) and DNA repair systems are indicated within the cell.

Protein Synthesis

The translation system of Halobacterium NRC-1 has hybrid eukaryotic and bacterial character, but like other Archaea, all of its ribosomal proteins have eukaryotic homologs. Interestingly, the ribosomal protein genes of Halobacterium NRC-1 are organized into multigene clusters that resemble operons of bacteria. In addition to the 52 RNAs (16S, 23S, and 5S rRNAs, 47 tRNAs [transfer RNAs], 7S RNA, and RNaseP), NRC-1 has 18 different aminoacyl–tRNA synthetases coded in the genome plus the GatABC amidotransferases for charging with glutamine and asparagine (4). Interestingly, one aminoacyl–tRNA synthetase, ArgRS, closely related to the bacterial and yeast mitochondrial enzymes, is coded by pNRC200. For protein secretion, the Halobacterium NRC-1 general secretory (Sec) machinery is a hybrid of eukaryotic and bacterial systems. Sec61α, Sec61γ, SRP54, SRP19, and the 7S RNA are related to the corresponding eukaryotic factors, while FtsY, SecD, and SecF (but not SecA) are related to the bacterial factors (4). In addition to the Sec system, recent bioinformatic analysis has suggested that the twin-arginine (Tat) protein export pathway used for secretion of mainly redox proteins in bacteria is also present in NRC-1 and may be commonly used in this archaeon (15,16).

Cell Envelope

Halobacterium NRC-1 cells are surrounded by a single lipid bilayer membrane and an S layer assembled from the cell surface glycoprotein. The cytoplasm is in osmotic equilibrium with the hypersaline environment, with a correspondingly high intracellular K+ concentration that may be equivalent to the external Na+ concentration.
Like other Archaea, the polar lipids are based on archaeol, a glycerol diether lipid containing phytanyl chains derived from C20 isoprenoids. The Halobacterium NRC-1 genome contained all of the key enzyme genes of isoprenoid synthesis, including HMG–coenzyme A reductase (MvaA), the target of the growth inhibitor mevinolin (4). To maintain the ionic balance, NRC-1 encodes multiple K+ transporters, including KdpABC, an ATP-driven K+ transport system, and TrkAH, a low-affinity K+ transporter driven by the membrane potential (Fig. 4). Active Na+ efflux is likely mediated by NhaC proteins, which are unidirectional Na+/H+ antiporters. Interestingly, genes coding KdpABC and copies of TrkA (three of five) and NhaC (one of three) are found on pNRC200. In addition, active transporters for nutrient uptake were identified for cationic amino acids (Cat) and proline (PutP), dipeptides (DppABCDF), oligopeptides (AppACF), a sugar transporter (Rbs), removal of heavy metals (arsenite and cadmium) and other toxic compounds (multidrug resistance homologs), and multiple copies of phosphate transporter systems, PstABC, and phosphate permease.

Purple Membrane

Halobacterium NRC-1 contains purple membrane, a two-dimensional crystalline lattice of the light-driven proton pump, bacteriorhodopsin, a complex of a protein, bacterio-opsin, and a chromophore, retinal (Fig. 4). Under high-illumination conditions, cells can grow phototrophically, a capability recently recognized in planktonic bacteria (12). Five purple membrane regulon genes, which are clustered on the chromosome and coordinately regulated, were identified, including bop, specifying bacteriorhodopsin; crtB1 and brp, coding the first and last committed steps of retinal synthesis, respectively; blp, a gene of unknown function; and bat, the sensor–regulator (14).
The bat gene product (Bat) is a member of a small gene family, containing a GAF (cGMP-binding) domain, PAS/PAC (redox-sensing) domain, and DNA-binding helix-turn-helix motif, which likely binds a UAS (upstream activator sequence) for gene activation. The bop gene TATA box sequence deviates from the consensus archaeal promoter sequence, suggesting the involvement of novel factors, such as alternate TBP and TFB proteins, in its transcription (14).

Taxis and Signal Transduction

Halobacterium species are highly chemotactic and phototactic, with both chemical gradients and gradients of light intensity or color modulating their swimming behavior. A large number of taxis genes have been identified, including sopI and sopII, coding for the phototaxis receptors SRI and SRII, which are in the bacteriorhodopsin family (which also includes halorhodopsin, a chloride pump) (Fig. 4) (12). SRI mediates attractant responses to orange light and repellent responses to near-ultraviolet light, while SRII is a blue light repellent photoreceptor. Interestingly, homologs of haloarchaeal rhodopsins have recently been found in the genomes of fungi, algae, marine bacteria, and cyanobacteria (12). A total of 17 htr genes coding for integral membrane proteins homologous to bacterial chemotaxis receptors were found, as was a complete set of che genes encoding chemotaxis determinants. There are 6 flagellin genes and an archaeal-type flagellar apparatus (16). A large gene cluster, flaD-K, codes the archaeal flagellar apparatus, with flaD, flaE, flaG, flaH, flaI, and flaJ similar to other archaea and only flaK resembling a bacterial flagellar regulator. Two-component regulatory systems are evident in the Halobacterium NRC-1 genome, including 6 response regulator genes and 14 histidine kinases.
The Halobacterium NRC-1 genome revealed the presence of several possible circadian photoregulators, including a eukaryotic cryptochrome and a cyanobacterial KaiC-like protein, consistent with a circadian rhythm in this phototrophic microbe (12).

Gas Vesicles

Halobacterium species, like many photosynthetic aquatic prokaryotes, possess the ability to regulate buoyancy by the synthesis of gas-filled vesicles (Fig. 4). The requirements for gas vesicle formation have been extensively studied in NRC-1 by genetic analysis (17). A cluster of genes, gvpMLKJIHGFEDACN(O), present on both pNRC100 and pNRC200 in NRC-1 was shown to be necessary and sufficient for wild-type gas vesicle synthesis. Interestingly, the genome sequence of Halobacterium NRC-1 also revealed a silent, but nearly complete, gvp gene cluster, lacking only gvpM, on pNRC200 (4,12).

Carotenoids and Retinal

Halobacterium produces red-orange carotenoids that are essential for phototransduction and protection against photodamage, the most abundant being bacterioruberins (Fig. 4). Genes encoding bacterial phytoene synthases, crtB1 and crtB2, have been identified in Halobacterium NRC-1, and several subsequent desaturation steps are likely coded by crtI1, crtI2, and crtI3 (4). Genes that catalyze subsequent conversion to bacterioruberin have not yet been identified. In a branch of the carotenoid pathway, lycopene is cyclized by the crtY gene product to form β-carotene, which is oxidatively cleaved to form retinal by the brp and blh gene products (Fig. 4) (18). For certain steps of the carotenoid biosynthetic pathway, multiple genes may exist in Halobacterium NRC-1, and these may be differentially regulated by light or oxygen.

Energy Metabolism

Halobacterium NRC-1 can grow chemoorganotrophically, either aerobically or anaerobically, and has phototrophic capability using bacteriorhodopsin.
Halobacterium requires all but 5 of the 20 amino acids for growth, and several amino acids may be used as a source of energy. Aerobically, arginine and aspartate can be used via the citric acid cycle; anaerobically, arginine can be used via the arginine deiminase pathway, coded by the arcRACB genes on pNRC200 (Fig. 4) (3). Genes for a gluconeogenic pathway for carbohydrate synthesis during growth on amino acids and nearly all genes for a reverse Embden–Meyerhof glycolytic pathway are present. Although Halobacterium is reported to be unable to metabolize sugars, a sugar uptake transporter and genes coding for glucose dehydrogenase and 2-keto-3-deoxygluconate kinase, enzymes of a semi-phosphorylated Entner–Doudoroff pathway, are present in Halobacterium NRC-1. The genes for gluconeogenesis and catabolism of glyceraldehyde 3-phosphate (produced by glucose catabolism) to pyruvate are also present. Halobacterium NRC-1 also possesses genes encoding enzymes of the bacterial-like fatty acid β-oxidation pathway and a 2-oxoacid dehydrogenase complex.

EVOLUTION AND LATERAL GENE TRANSFERS

Halobacterium NRC-1 is an organism of evolutionary interest that is distantly related to some methanogens and is classified as a euryarchaeote based on the 16S rRNA sequence. After complete sequencing, the Halobacterium NRC-1 genome was compared to 11 other complete genomes by gene content analysis using the DARWIN suite of programs (4). The results confirmed the archaeal status of NRC-1, with the closest relatives being Archaeoglobus fulgidus and Methanococcus jannaschii. Interestingly, however, similarities were also noted to the Gram-positive bacterium, Bacillus subtilis, and the radiation-resistant bacterium, Deinococcus radiodurans. More recently, whole genome analysis using a larger number of completed genomes showed Halobacterium NRC-1 to branch at the root of the archaeal tree (Fig. 1) (19).
The discrepancy between the 16S rRNA and whole genome trees requires a more detailed investigation because it suggests the possibility for the appearance of halophiles at a very early point in evolution. However, an additional possibility is that the position of NRC-1 in whole genome trees is distorted, with Halobacterium pulled away from the other archaea and toward the bacteria as a consequence of many lateral gene transfers from bacteria. A comprehensive analysis of gene histories of Halobacterium NRC-1 has recently been conducted (S. P. Kennedy and S. DasSarma, unpublished). Detailed phylogenetic analysis of proteins catalogued as having bacterial phylogenies in the National Center for Biotechnology Information Clusters of Orthologous Groups database was carried out. In addition, bacterial-like genes clustered together in the genome and coding specific metabolic pathways were also subjected to phylogenetic analysis. Based on this analysis, several hundred proteins, including biosynthetic, transport, and energy systems (e.g., histidine utilization, purine metabolism, glycerol utilization) and components of the electron transport chain were found to display clear bacterial histories. These genes are likely to have been acquired in this halophile by lateral gene transfers. Surprisingly, no physical link was observed with IS elements for these bacterial genes, suggesting that the genes were acquired at an early point in evolution, and any vestige of the underlying acquisition recombinational activity has been ameliorated. Although the mechanisms responsible for interdomain genetic exchanges are unknown, the finding of hundreds of bacterial genes in NRC-1 likely reflects the long-term opportunity for exchanges between halophilic bacteria and archaea cohabiting hypersaline environments over evolutionary time.
In this respect, NRC-1 is similar to some other mesophilic archaea (20) and hyperthermophilic bacteria (21) in having large numbers of horizontally acquired genes in its genome.

Acquisition of Respiratory Chain Components

Two of the most interesting cases of possible lateral gene transfers into Halobacterium NRC-1 are the genes encoding electron transport chain factors and biosynthetic proteins (11). Ten nuo genes, encoding subunits of NADH dehydrogenase, along with 3 cox genes, encoding subunits of cytochrome-c oxidase, are clustered together into probable operons, as are 6 men genes, for menaquinone biosynthesis. Interestingly, the nuo gene order is conserved with respect to Escherichia coli, with closest branching to Synechocystis sp PCC6803; the men gene order is conserved with respect to both E. coli and D. radiodurans, with closest branching to B. subtilis. Moreover, the G+C analysis of these two groups of genes showed they were distinguishable from the average chromosomal genes (64 or 73% compared with 68%). These results point to the interesting possibility that adaptation of halophiles to an oxidizing atmosphere occurred via the acquisition of electron transport chain components from aerobic bacteria through lateral transfer events. Further analysis is necessary to determine whether such transfers of respiratory genes have occurred once or repeatedly in the evolution of the diversity of modern halophiles.

Evolution of Purple Membrane

Retinal-containing chromoproteins like bacteriorhodopsin in purple membrane and sensory rhodopsins have recently been discovered in diverse bacteria and eukaryotes and are therefore present in all three branches of life, Archaea, Bacteria, and Eukarya (12,22). Although the evolutionary origin of retinal chromoproteins is unclear at present, their wide distribution in nature is consistent with horizontal transmission.
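As an aside on the G+C comparison reported above for the nuo and men gene clusters, the calculation itself is simple and can be sketched as follows. This is only an illustration, not the analysis pipeline used in ref. 11. The function names and the toy sequences are hypothetical, and the 68% background is the chromosomal average quoted in the text.

```python
def gc_percent(sequence):
    """Return the G+C content of a DNA sequence as a percentage.

    Assumption: ambiguous symbols (anything other than A, C, G, T) are
    ignored rather than counted toward the sequence length.
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


def gc_offsets(gene_sequences, background_percent):
    """Map each gene name to its G+C percentage minus the background value."""
    return {
        name: gc_percent(seq) - background_percent
        for name, seq in gene_sequences.items()
    }


if __name__ == "__main__":
    # Invented toy sequences, not real Halobacterium genes.
    toy_genes = {
        "toy_gene_1": "ATGGCCGCGGACGTCTGA",
        "toy_gene_2": "ATGAAAGCTTTAGGCTAA",
    }
    for name, offset in gc_offsets(toy_genes, background_percent=68.0).items():
        print(name, round(offset, 1))
```

Genes whose offset from the background is consistently large are candidates for foreign origin, although composition alone cannot establish a transfer, which is why the chapter pairs it with gene order and phylogenetic evidence.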
An interesting further speculation is that primordial rhodopsins were an early evolutionary invention and may have been responsible for the original dominant form of phototrophy in the sea, pre-dating chlorophyll-based photosynthesis. Such early phototrophs, with the relatively simple capacity for coupling transmembrane light-driven proton pumping to adenosine triphosphate synthesis (22,23), could have arisen in a reducing atmosphere (although a small quantity of oxygen would have been necessary for the synthesis of retinal). Evolution of organisms with more complex chlorophyll-based photosynthetic systems operating with great efficiency could subsequently have displaced purple membrane–containing organisms from most environments. Interestingly, the complementarity of the spectra for purple membrane, with a peak at 568 nm, and photosynthetic membranes, with a trough at this same wavelength, is striking (Fig. 5) and is consistent with coevolution of the two types of membranes. Moreover, both chlorophyll-based cyanobacteria and purple membrane–based haloarchaea still coexist in modern hypersaline environments, with the former dominating at relatively lower salinity and the latter dominating at saturating salinity.

Fig. 5. Ultraviolet-visible spectra of Halobacterium NRC-1 purple membrane (PM), red membrane (RM), and photosynthetic membrane (PM). Purple membrane and red membrane were separated on a sucrose gradient, and spectra were plotted with photosynthetic membrane. The complementarity of purple and photosynthetic membrane spectra is apparent, consistent with coevolution of the two membranes.

Evolution of pNRC Replicons

The genome organization of Halobacterium NRC-1, with a large chromosome and two related extrachromosomal replicons, is both complex and intriguing. One possible reason for the maintenance of multiple replicons, including pNRC100 and pNRC200, is that they have captured some essential genes and are therefore required for viability.
The compatibility between these related replicons may be explained by the presence of multiple origins of replication of different compatibility groups (3,24). Because dozens of copies of IS elements are present on these replicons, the transposable elements are likely responsible for frequently promoting exchanges of DNA between them. Moreover, once an extrachromosomal replicon is established in two or more copies, continued DNA exchanges between individual copies of the smaller replicons and the large chromosome could result in generation of additional genomic diversity. Such a possible scheme has been proposed for the evolution of pNRC100, including multiple replicon fusions of precursor plasmids, followed by the acquisition of chromosomal genes, with both processes mediated by IS elements (3). The duplication of a portion of a pNRC100 precursor replicon through unequal crossing over of two IS element pairs would have resulted in the formation of inverted repeats, which subsequently would serve to stabilize the region within the repeats and create inversion isomers. Through such processes, essential genes may have been captured from the chromosome, stabilized on the pNRC100 and pNRC200 replicons, and resulted in their achievement of minichromosome status. The existence of multiple minichromosome replicons with the capability to acquire new genes and harboring multiple essential genes is a highly novel character of the Halobacterium NRC-1 genome. As a result, the NRC-1 genomic condition may be one of a competitive dynamic equilibrium between several essential replicons in the genome. Such a condition may arise from time to time in evolution and subside for intervening periods through reduction in numbers by replicon fusions. The heterogeneity of minichromosomes among Halobacterium strains is testament to such underlying dynamic processes (25).
Given these findings in Halobacterium NRC-1, it is not inconceivable that competition between replicons is a general phenomenon in evolution and may play an important role in shaping the long-term evolution of prokaryotic genomes, including the evolution of new chromosomes from plasmids.

FUTURE PROSPECTS

The complete sequence of Halobacterium NRC-1 has provided an excellent platform for evolutionary and comparative genomic analysis of an extremely halophilic archaeon (4,11). In this organism, one of the few sequenced mesophilic archaea and an inhabitant of a dynamic environment populated by a multitude of bacteria, hundreds of genes with bacterial or uncertain histories have been uncovered. Additional genomic studies of diverse halophiles (1) (e.g., marine haloarchaea) are necessary to provide a significantly better understanding of the evolutionary position of these novel microorganisms. The finding of large dynamic extrachromosomal replicons, containing both essential genes and a large number of IS elements, has suggested the occurrence of multiple chromosomes that may compete for genes (3). In addition to evolutionary insights, the ease of culture and the wide range of biological responses of halophiles promise significant opportunities in functional genomics and biotechnology. DNA arrays, proteomics, and gene knockouts are all approaches available for further studies of Halobacterium biology (2,26). The recent use of a whole genome microarray to study purple membrane expression illustrates the power of functional genomic approaches and reminds us of the need to adhere to established rigorous genetic practices in the postgenomic era (27,28). Significantly, halophilic archaea serve as excellent models for fundamental aspects of eukaryotic biology (e.g., DNA replication, transcription, and translation).
Finally, halophilic proteins and complexes, many of which are extremely novel, provide genuine future opportunities for biotechnology, including the development of new vaccines and antibiotics (29,30).

ACKNOWLEDGMENTS

Studies of haloarchaeal genomics in my laboratory have been generously supported by the National Science Foundation. I wish to thank many current and former students and associates and collaborators in the Halobacterium Genome Consortium who provided much of the information collected in this chapter. Special thanks are given to Dr. Philip Harriman for support and encouragement.

REFERENCES

1. DasSarma S, Arora P. Halophiles. In: Encyclopedia of Life Sciences. London: Macmillan, 2000, pp. 458–466.
2. DasSarma S, Robb FT, Place AR, et al. (eds). Archaea: A Laboratory Manual—Halophiles. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1995.
3. Ng W-L, Ciufo SA, Smith TM, et al. Snapshot of a large dynamic replicon from a halophilic archaeon: megaplasmid or minichromosome? Genome Res 1998; 8:1131–1141.
4. Ng WV, Kennedy SP, Mahairas GG, et al. Genome sequence of Halobacterium species NRC-1. Proc Natl Acad Sci USA 2000; 97:12,176–12,181.
5. Joshi JG, Guild WR, Handler P. The presence of two species of DNA in some halobacteria. J Mol Biol 1963; 6:34–38.
6. Charlebois RL, Doolittle WF. Transposable elements and genome structure in halobacteria. In: Berg DE, Howe MM (eds). Mobile DNA. Washington, DC: American Society for Microbiology, 1989, pp. 297–307.
7. Ng W-L, Kothakota S, DasSarma S. Structure of the large gas vesicle plasmid in Halobacterium halobium: inversion isomers, inverted repeats, and insertion sequences. J Bacteriol 1991; 173:1958–1964.
8. Hackett NR, Bobovnikova Y, Heyrovska N. Conservation of chromosomal arrangement among three strains of the genetically unstable archaeon Halobacterium species. J Bacteriol 1994; 176:7711–7718.
9. St Jean A, Charlebois RL. Comparative genomic analysis of the Haloferax volcanii DS2 and Halobacterium sp GRB contig maps reveals extensive rearrangement. J Bacteriol 1996; 178:3860–3868.
10. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res 1998; 8:195–202.
11. Kennedy SP, Ng WV, Salzberg SL, Hood L, DasSarma S. Understanding the adaptation of Halobacterium species NRC-1 to its extreme environment through computational analysis of its genome sequence. Genome Res 2001; 11:1641–1650.
12. DasSarma S, Kennedy SP, Berquist B, et al. Genomic perspective on the photobiology of Halobacterium species NRC-1, a phototrophic, phototactic, and UV-tolerant haloarchaeon. Photosyn Res 2001; 70:3–17.
13. Baliga NS, Goo YA, Ng WV, Hood L, Daniels CJ, DasSarma S. Is gene expression in Halobacterium NRC-1 regulated by multiple TBP and TFB transcription factors? Mol Microbiol 2000; 36:1184–1185.
14. Baliga NS, Kennedy SP, Ng WV, Hood L, DasSarma S. Genomic and genetic dissection of an archaeal regulon. Proc Natl Acad Sci USA 2001; 98:2521–2525.
15. Bolhuis A. Protein transport in the halophilic archaeon Halobacterium sp NRC-1: a major role for the twin-arginine translocation pathway? Microbiology 2002; 148:3335–3346.
16. Patenge N, Berendes A, Engelhardt H, Schuster SC, Oesterhelt D. The fla gene cluster is involved in the biogenesis of flagella in Halobacterium. Mol Microbiol 2001; 41:653–663.
17. DasSarma S, Arora P. Genetic analysis of the gas vesicle gene cluster in haloarchaea. FEMS Microbiol Lett 1997; 153:1–10.
18. Peck RF, Echavarri-Erasun C, Johnson EA, et al. brp and blh are required for synthesis of the retinal cofactor of bacteriorhodopsin in Halobacterium. J Biol Chem 2001; 276:5739–5744.
19. Korbel JO, Snel B, Huynen MA, Bork P. SHOT: a web server for the construction of genome phylogenies. Trends Genet 2002; 18:158–162.
20. Deppenmeier U, Johann A, Hartsch T, et al. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J Mol Microbiol Biotechnol 2002; 4:453–461.
21. Nelson KE, Clayton RA, Gill SR, et al. Evidence for lateral gene transfer between archaea and bacteria from genome sequence of Thermotoga maritima. Nature 1999; 399:323–329.
22. Beja O, Aravind L, Koonin EV, et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science 2000; 289:1902–1906.
23. Racker E, Stoeckenius W. Reconstitution of purple membrane vesicles catalyzing light-driven proton uptake and adenosine triphosphate formation. J Biol Chem 1974; 249:662–663.
24. Ng W-L, DasSarma S. Minimal replication origin of the 200-kilobase Halobacterium plasmid pNRC100. J Bacteriol 1993; 175:4584–4596.
25. Ng W-L, Arora P, DasSarma S. Large deletions in class III gas-vesicles deficient mutants of Halobacterium. Sys Appl Microbiol 1994; 16:560–568.
26. Peck RF, DasSarma S, Krebs MP. Homologous gene knockout in the archaeon Halobacterium with ura3 as a counterselectable marker. Mol Microbiol 2000; 35:667–676.
27. Baliga NS, Pan M, Goo YA, et al. Coordinate regulation of energy transduction modules in Halobacterium sp analyzed by a global systems approach. Proc Natl Acad Sci USA 2003; 99:14,913–14,918.
28. DasSarma S. Biology reports Ltd. faculty of 1000 commentary. Available at: http://www.facultyof1000.com/article/12403819. Accessed January 8, 2003.
29. Stuart ES, Morshed F, Sremac M, DasSarma S. Antigen presentation using novel particulate organelles from halophilic archaea. J Biotechnol 2001; 88:119–128.
30. Hansen JL, Ippolito JA, Ban N, Nissen P, Moore PB, Steitz TA. The structures of four macrolide antibiotics bound to the large ribosomal subunit. Mol Cell 2002; 10:117–128.