Download Human mutations in glucose 6-phosphate dehydrogenase reflect

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Epigenetics of neurodegenerative diseases wikipedia , lookup

RNA-Seq wikipedia , lookup

Non-coding DNA wikipedia , lookup

Genome (book) wikipedia , lookup

Mutagen wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Expanded genetic code wikipedia , lookup

Designer baby wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Genomics wikipedia , lookup

Genome evolution wikipedia , lookup

Gene wikipedia , lookup

Metagenomics wikipedia , lookup

Human genome wikipedia , lookup

Computational phylogenetics wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Sequence alignment wikipedia , lookup

Koinophilia wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Helitron (biology) wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Genome editing wikipedia , lookup

Oncogenomics wikipedia , lookup

Epistasis wikipedia , lookup

Genetic code wikipedia , lookup

Microevolution wikipedia , lookup

Frameshift mutation wikipedia , lookup

Mutation wikipedia , lookup

Point mutation wikipedia , lookup

Transcript
Human mutations in glucose 6-phosphate
dehydrogenase reflect evolutionary history
ROSARIO NOTARO,* ADEYINKA AFOLAYAN,*,† AND LUCIO LUZZATTO*,1
*Department of Human Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10021,
USA; and †Department of Biochemistry, Obafemi Awolowo University, Ile-Ife, Nigeria
ABSTRACT
Glucose 6-phosphate dehydrogenase
(G6PD) is a cytosolic enzyme encoded by a housekeeping X-linked gene whose main function is to
produce NADPH, a key electron donor in the defense against oxidizing agents and in reductive biosynthetic reactions. Inherited G6PD deficiency is
associated with either episodic hemolytic anemia
(triggered by fava beans or other agents) or life-long
hemolytic anemia. We show here that an evolutionary
analysis is a key to understanding the biology of a
housekeeping gene. From the alignment of the
amino acid (aa) sequence of 52 glucose 6-phosphate
dehydrogenase (G6PD) species from 42 different
organisms, we found a striking correlation between
the aa replacements that cause G6PD deficiency in
humans and the sequence conservation of G6PD:
two-thirds of such replacements are in highly and
moderately conserved (50 –99%) aa; relatively few
are in fully conserved aa (where they might be lethal)
or in poorly conserved aa, where presumably they
simply would not cause G6PD deficiency. This is
consistent with the notion that all human mutants
have residual enzyme activity and that null mutations
are lethal at some stage of development. Comparing
the distribution of mutations in a human housekeeping gene with evolutionary conservation is a useful
tool for pinpointing amino acid residues important
for the stability or the function of the corresponding
protein. In view of the current explosive increase in
full genome sequencing projects, this tool will become rapidly available for numerous other genes.—
Notaro, N., Afolayan, A., Luzzatto, L. Human mutations in glucose 6-phosphate dehydrogenase reflect
evolutionary history. FASEB J. 14, 485– 494 (2000)
Key Words: G6PD deficiency 䡠 G6PD variants 䡠 housekeeping
genes 䡠 human mutants 䡠 evolution
Glucose 6-phosphate dehydrogenase (G6PD) is
a cytosolic enzyme whose main function is to produce NADPH, a key electron donor in the defense
against oxidizing agents and in reductive biosynthetic reactions (1). In humans, genetically determined G6PD deficiency is associated with either
episodic hemolytic anemia, triggered by fava beans
0892-6638/00/0014-0485/$02.25 © FASEB
or other agents (referred to hereinafter as mild
phenotype), or life-long hemolytic anemia (referred
to hereafter as severe phenotype) (2). Among 122
variants characterized at the molecular level, those
with severe phenotype are all rare, but those with
mild phenotype have become common in many
populations as a result of malaria selection (1).
G6PD was first identified in the early 1930s in yeast
(3) and human red blood cells (4), and since then in
all organisms that have been tested for this enzyme
activity as well as in all tissues and cell types from
higher animals and plants (5); hence, it can be
regarded as a prototype example of a housekeeping
enzyme. Structural and functional features of the
G6PD promoter conform to those of a typical housekeeping gene (6 – 8). The G6PD gene from Homo
sapiens was the first to be cloned (9). Since then, 52
complete G6PD coding sequences have become
available; of these, 43 were sought out deliberately
and 9 have emerged in the last 2 years from the
complete sequencing of microorganisms. From
these sequence data we have derived a general idea
of the evolution of G6PD over a Myr time scale
(macroevolution). We used this evolutionary information to understand the distribution of human
mutations that affect enzyme activity (microevolution).
MATERIALS AND METHODS
G6PD sequences
For this study we have selected only complete G6PD sequences for which there was an available open reading frame
from a start to a stop codon. The following 49 G6PD complete
coding sequences have been obtained from GENBANK by
keyword search or by BLAST search using human G6PD
sequence as probe (in parentheses, GENBANK sequence ID):
Actinobacillus actinomycetemcomitans (1651208); Anabaena sp.
(988293); Aquifex aeolicus (2983137); Arabidopsis thaliana 1,
chloroplast (3021305); Arabidopsis thaliana 2 (1174336); Aspergillus niger (870831); Bacillus subtilis (1303961); Borrelia
1
Correspondence: Department of Human Genetics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue,
New York, NY 10021, USA. E-mail: [email protected]
485
burgdorferi (2688568); Caenorhabditis elegans (1321752); Ceratitis capitata (460877); Chlamydia trachomatis (1791245); Cricetulus griseus (2828743); Drosophila melanogaster (157470); Emericella nidulans (642160); Erwinia chrysanthemi (397854);
Escherichia coli (1788158); Fugu rubripes (2734869); Haemophilus influenzae (1573543); Helicobacter pylori (2314250); Homo
sapiens (31543); Kluyveromyces lactis (5539); Leuconostoc mesenteroides (149631); Macropus robustus (560549); Medicago sativa
(603219); Mus musculus G6PD1 (51114) and G6PD2
(1806126); Mycobacterium tuberculosis zwf (2117215) and zwf2
(2131049); Nicotiana tabacum cytosolic TCG6 (3021508), cytosolic TCG9 (3021510), chloroplast TPG16 (AJ001771), chloroplast TPG18 (3021532) and chloroplast (1480344) G6PD;
Nostoc sp. (558505); Petroselinum crispum cytosolic cG6PDH1
(2352921), cytosolic cG6PDH2 (2352923) and chloroplast
pG6PDH (2352919); Pichia jadinii (348417); Plasmodium falciparum (438212); Pseudomonas aeruginosa (2935661); Rattus
norvegicus (56196); Saccharomyces cerevisiae (171545); Solanum
tuberosum cytosolic (471345) and chloroplast (1197385)
G6PD; Spinacia oleracea, chloroplast (2276344); Synechococcus
sp. (988286); Synechocystis sp. (1652530); Treponema pallidum
(3322776); Zymomonas mobilis (155591). Two additional complete sequences (Deinococcus radiodurans and Streptococcus
pnaeumoniae) were retrieved by BLAST search from the
archive of The Institute of Genomic Research. The complete
sequence from Rhodobacter capsulatus was retrieved from the
archive of University of Chicago. The full genomes of Archaeoglobus fulgidus, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Mycoplasma genitalium, Mycoplasma pneumoniae, and Rickettsia prowazekii are listed in GENBANK and
were searched by using BLAST; the full-sequence Pyrococcus
horikoshii was searched by using BLAST on the server of
National Institute of Technology and Evaluation, Tokyo,
Japan. We have not included the sequence of a rabbit
microsomal G6PD (10); it is likely that this sequence belongs
to the non-glucose-6-phosphate-specific hexose-6-phosphate
dehydrogenase previously characterized from rat liver microsomes (11).
Alignment and sequence analysis
The sequences were aligned using the program PIMA 1.4
(12) (Human Genome Center, Baylor College of Medicine,
Houston, Tex.: http://dot.imgen.bcm.tmc.edu:9331/multialign.html). The alignment was manually refined. The percentages of identity and similarity were calculated from the
alignment using the program GeneDoc (13). For aa similarity, we adopted the criteria of the Dayhoff’s PAM 250 matrix
(14): 1) DEQHN, 2) FY, 3) KR, 4) SAT, and 5) LIVM. For
physicochemical grouping, we adopted GeneDoc criteria in
which aa are classified in twelve groups based on three
subsequent hierarchical levels: the first level is based on size,
the second is based on electrical charge for polar aa, the third
is based on aromaticity for nonpolar aa (15, 16).
Phylogenetic tree
The evolutionary tree of G6PD was derived from the alignment of the 52 sequences. Distances between sequences were
calculated from the alignment by using PROTDIST (according to Dayhoff’s PAM 250 matrix; ref 14); the tree was then
generated by using KITSCH and drawn by using TREEVIEW
(17). The programs PROTDIST and KITSCH are part of the
PHYLIP 3.57 package (18).
Amino acid solvent accessibility
The 3-dimensional structure of human G6PD was previously
modeled on the crystal structure of L. mesenteroides G6PD
486
Vol. 14
March 2000
(19). Specifically, residues 27–512 of human G6PD are
aligned to residues 1– 486 of L. mesenteroides. The coordinates
of this model have been used to calculate the solvent accessibility (SA) of individual aa residues in the monomeric
structure of human G6PD using Swisse-PdbViewer (20) program (http://www.expasy.ch/spdbv/mainpage.htm). Amino
acids with SA ⬍ 10% are regarded as buried and amino acids
with SA ⱖ 10% are regarded as exposed (21).
Human mutations
One hundred twenty-two mutations or combination of mutations of human G6PD have recently been tabulated (22): 114
variants have missense mutations; of these, 106 have a single
mutation, 7 have two mutations, and 1 has three mutations.
Six variants have in-frame deletions (four single aa deletions,
one 2 aa, and one 8 aa deletion). One variant has a nonsense
mutation and one has a splicing site mutation. We have made
use of only 118 variants from this database (Fig. 4) because no
clinical data are available on the other four. Of these 118
variants, 3 do not affect enzyme activity; all others entail
enzyme deficiency: of these, 57 are associated with chronic
nonspherocytic hemolytic anemia (WHO class I, severe phenotype) and 58 are associated with the risk of acute hemolytic
anemia (WHO classes II and III, mild phenotype). In studying
the relationship between human mutations and evolutionary
conservation, we have confined the analysis to only the 99
single missense mutations and the 4 single amino acid
deletions.
RESULTS
G6PD in living organisms
We have retrieved from available databases a total of
52 sequences clearly homologous to human G6PD,
which we have used as reference. Of these, 50 were
annotated as G6PD and 2 were not. When all of these
sequences are aligned by using conventional algorithms (Fig. 1), we find that the mammalian sequences have ⬃94% identity (by comparison, homology among mammalian hemoglobins is of the
order of 70%); when these are compared with those
of lower vertebrates, invertebrates, and microorganisms, it is not surprising that the degree of identity
decreases but remains higher than 20% (Table 1).
The percentage of similarity among all of the 52
sequences ranges between 43 to 98%. A long stretch
of 141 AA (163–303 in human) is aligned without a
single gap in all sequences (except H. pylori and A.
aeolicus). Some G6PD species in plants have aminoterminal extensions related to their localization in
chloroplasts; G6PD from Plasmodium falciparum has
an intriguing amino-terminal extension (23) that
may have a different enzymatic function (24), but in
the G6PD-like portion the homology is of the same
order of magnitude as that found in other organisms.
In our retrieval of G6PD sequences, it was of
special interest that no coding sequence recognizable as G6PD could be found in seven microorgan-
The FASEB Journal
NOTARO ET AL.
Figure 1. Macroevolution of the G6PD molecule. This diagram reflects a synoptic analysis of 52 different G6PD sequences.
Human G6PD (italics) is used as the reference sequence and the even exons are underlined and numbered: the coding
sequence starts in exon 2. In the lettering of human G6PD, aa solvent accessibility (SA) is reported with the following color code:
blue, buried aa (SA ⬍10%); red, exposed aa (SA⫽10%). Immediately above the human sequence we show the similarity
consensus for the 52 sequences analyzed (similarity), and above that the identity consensus sequence (identity). At each
position, the most frequent aa is shown, with a coding for its frequency (white uppercase in blue field: 100%; black uppercase
in green field: 76 –99%; white lowercase in red field: 51–75%); hyphens are entered in the two consensus lines for aa with ⬍50%
conservation. For aa similarity we have adopted the criteria of the Dayhoff PAM 250 matrix (14): DEQHN, FY, KR, SAT, and
LIVM. Conserved blocks described in the text are boxed and identified with roman numerals. Above the identity consensus
sequence, runs of underlined a and b indicate ␣-helices and ␤-strands respectively, as found in the Leuconostoc mesenteroides
molecule (sec. structure) (19). The structure of the first 24 aa and the last 3 aa (lowercase) is not known for L. mesenteroids.
isms whose genome has been fully sequenced. Four
of these (Archaeoglobus fulgidus, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Pyrococcus horikoshii) are Archea, which grow in an oxygenMACRO- AND MICROEVOLUTION OF G6PD
poor habitat. The other three are Eubacteria with a
small and defective genome, two of which (Mycoplasma genitalium and M. pneumoniae) are cell surface
parasites, whereas one (Rickettsia prowazekii) is an
487
TABLE 1. Identity and similarity among G6PD sequencesa
HUMAN
MOUSE
FUGRU
DROME
CAEEL
SOLTU
YEAST
SYNP7
ECOLI
LEUME
AQUAE
HUMAN
515
98
89
76
76
70
64
54
53
49
45
MOUSE
93
515
89
77
76
71
64
54
53
50
45
FUGRU
76
75
514
75
74
69
63
52
51
51
42
DROME
59
59
59
523
73
66
64
53
50
48
44
CAEEL
56
55
55
54
522
68
64
56
52
50
45
SOLTU
49
49
48
44
44
511
64
55
53
50
47
YEAST
41
42
43
42
43
46
505
56
55
54
46
SYNP7
32
32
30
29
32
32
31
524
58
54
44
ECOLI
31
31
30
28
31
32
30
38
491
59
47
LEUME
28
27
28
27
28
27
30
30
31
486
43
AQUAE
23
23
22
20
22
23
23
25
24
24
431
I
D
E
N
T
I
T
Y
SIMILARITY
a
Percentage of identity and similarity were determined on the alignment of the 52 G6PD sequences (8) with the program GeneDoc (25)
according to the Dayhoff matrix PAM 250 (26). Identity is shown in the upper right portion of the matrix; similarity is shown in the lower left
portion of the matrix. In the highlighted diagonal the number of amino acids for each sequence is shown. Abbreviations as follows: HUMAN:
H. sapiens; MOUSE: M. musculus; FUGRU: F. rubripes; DROME: D. melanogaster; CAEEL: C. elegans; SOLTU: S. tuberosum (cytosol); YEAST: S.
cerevisiae; SYNP7: Synechoccus sp.; ECOLI: E. coli; LEUME: L. mesenteroides; AQUAE: A. aeolicus.
obligate intracellular parasite that might be able to
capitalize on the G6PD activity of the host cell.
The homology between G6PD sequences from
Eubacteria and Eukaryota clearly supports a common
evolutionary origin of G6PD throughout living organisms. Unlike tissue-specific genes such as globins,
a housekeeping gene such as G6PD allows us to
outline evolutionary relationships for most of the
living organism, the only exception so far being the
Archea. A dendrogram based on economy principles
is consonant with taxonomy (Fig. 2), and some
specific points are noteworthy. 1) Metazoa, fungi,
and plants all separate early on from bacteria. 2)
Vertebrates separate cleanly from invertebrates. 3)
The chloroplast G6PD sequences of the plants separate early from the cytosolic G6PD species of the
plants and from G6PD of fungi and metazoa. 4)
Rather than being nearer metazoa, the G6PD sequences of fungi seem to fall between chloroplast
G6PD and cytosolic G6PD.
Sequence conservation and 3-dimensional structure
In all of the 52 known G6PD species, there are 25
identical aa and 56 similar aa. The total number of aa
is 515 in human and between 425 and 604 in most
species. If we apply broader criteria of similarity,
whereby aa are classified into 12 groups sharing some
physicochemical properties, there are in fact 143 conserved aa. The overall percent homology among G6PD
sequences is of course only a rough measure of evolutionary conservation, because homology is not uniform
throughout the sequence. The degree of homology
tends to decrease toward both the NH2 terminus and
the carboxyl terminus, perhaps because these regions
are allowed greater flexibility without prejudice to the
overall conformation of the molecule. Similarity con488
Vol. 14
March 2000
servation is shaped in blocks; we have identified 12 by
visual inspection (Fig. 1). Although we cannot yet fully
explain the significance of each conserved block, we
can offer a rationale in most cases. Block I is the NADP
binding region (residues 38 – 47 of human G6PD); it
has a characteristic dinucleotide pocket (GXXGXX),
flanked on either side by additional conserved hydrophobic amino acids (e.g., IIM 35–37, P 50 and P 62, L
55, and L 61) and a KKK motif. Block IV comprises the
catalytic site, which surrounds a completely conserved
lysine; in human G6PD this is K 205, previously shown
to be essential for enzyme activity though not for
substrate binding (25).
To understand the significance of other regions of
high conservation, we must consider the conformation of the protein. The crystal structure has been
fully solved for L. mesenteroides G6PD (26) and human G6PD (27). The latter agrees well with a previously published model (19); therefore, it is reasonable to presume that the 3-dimensional structure is
essentially conserved. With reference to the human
G6PD model (19), we find that the conserved blocks
III and V are facing the active center (see above).
Block X contributes much of the subunit interface.
Blocks II, VII, VIII, IX, and XI are part of the
hydrophobic core. The carboxyl-terminal ends of
block IV and block XI also contribute to the subunit
interface. Blocks VI and XII include amphipatic ␣
helices. To test the notion that surface accessibility is
significantly related to evolutionary conservation, we
consider four groups of aa based on similarity conservation (fully conserved, 100%; highly conserved,
75–99%; moderately conserved, 50 –74%; poorly
conserved, ⬍50%). We find that buried aa residues
are over-represented and that exposed aa residues
are under-represented among those with a similarity
higher than 75% (Fig. 3) (␹2⫽50; P⬍0.001).
The FASEB Journal
NOTARO ET AL.
served. In addition, the mutations in human G6PD
are spread throughout the sequence. Only one discrete cluster emerges in the aa range 380 – 410 that
corresponds to the subunit interface in the enzymatically active G6PD dimer (19),
To assess how human mutations associated with
G6PD deficiency relate in general to the evolutionary
history of the G6PD sequence, we refer again to the
four similarity conservation groups defined above
and see that a distinct pattern emerges (Table 2; Fig.
5). Fully conserved amino acids and poorly conserved amino acids are under-represented in G6PDdeficient mutants. By contrast, highly and moderately conserved amino acids are over-represented in
G6PD-deficient mutants. This skewed distribution of
mutations among the different amino acid conservation group is statistically significant (␹2⫽9.36;
P⬍0.03).
In 84% (83/99) of cases, regardless of the degree
of evolutionary conservation, aa replacements in
human mutants do not respect similarity; in 68%
(67/99) of cases, the aa replaced in the mutants are
not found in any nonhuman G6PD—if so, they
probably would not cause G6PD deficiency. In support of this notion, we noticed that in 9 of the 16
cases (56%) where a human mutation does respect
similarity, the mutated residue is normal in another
species; but of the 83 cases where the human mutation does not respect similarity, there are only 23
(28%) where the mutated residue is normal in
another species.
Figure 2. Evolutionary tree of G6PD derived from the alignment of the 52 sequences (see Fig. 1 and Materials and
Methods). Distances between sequences were calculated from
the alignment by using PROTDIST according to Dayhoff’s
PAM 250 matrix (14); the tree was then generated by using
KITSCH and drawn by using TREEVIEW (17). The programs
PROTDIST and KITSCH are part of the PHYLIP 3.57 package
(18). Cyt, cytosol; Chl: chloroplast. #The author who reported
the M. sativa G6PD sequence (603219) did not state whether
the sequence is encoding for the G6PD present in the cytosol
or in the chloroplast. From the alignment and from this
phylogenetic tree, we infer that the sequence almost certainly
encodes for a cytosolic G6PD. $The author who reported A.
thaliana 2 G6PD sequence (1174336) did not state whether
the sequence is encoding for the G6PD present in the cytosol
or in the chloroplast. From the alignment and from this
phylogenetic tree, we infer that the sequence almost certainly
encodes for a chloroplast G6PD.
Mutations in human G6PD
We next analyzed the known mutations of human
G6PD (microevolution) in relation to the macroevolutionary conservation of the protein sequence (Fig.
4). The majority of mutations causing G6PD deficiency are missense, and obvious ‘null’ mutations
(early nonsense mutations, mutations destroying the
reading frame, mutations in the substrate binding or
coenzyme binding regions) have never been obMACRO- AND MICROEVOLUTION OF G6PD
DISCUSSION
The G6PD gene is unique among housekeeping
genes because of the existence of a large number of
human mutants that have been discovered, of which
many are polymorphic rather than sporadic. Analyzing this system therefore affords an opportunity to
relate the naturally occurring genetic variation in the
human species (microevolution) to sequence information available for G6PD from more than 50 other
organisms (macroevolution).
G6PD is not ubiquitous
Knowledge of the complete sequence of the genome
of several microorganisms has been one of the most
significant advances in genomics research in the past
5 years. A remarkable implication is that for the first
time it is possible to ask directly not only what genes
can be found in an organism, but also what genes are
lacking. To our surprise, we found no recognizable
G6PD sequence in seven microorganisms whose genome had been fully sequenced. Indeed, three of
the microorganisms that lack G6PD are Eubacteria,
489
Figure 3. Conservation and 3-dimensional structure. Each bar represents the distribution of G6PD amino acids among the
various conservation categories (see text) for exposed aa, total aa, and buried aa, respectively.
with small and ‘defective’ genomes, and parasitic
habit (Mycoplasma genitalium and M. pneumoniae,
Rickettsia prowazekii). We surmise that all three might
be able to capitalize on the G6PD activity of the host
cell. The other four microorganisms that lack G6PD
(Archaeoglobus fulgidus, Methanococcus jannaschii, Meth-
Figure 4. Microevolution of the G6PD molecule. This diagram reflects a synoptic analysis of 52 different G6PD sequences and
of 118 human mutations. Human G6PD is used as the reference sequence and the exons are in black boxes; the coding sequence
starts in exon 2. In the lettering of human G6PD, aa similarity (cfr. Fig. 1) is reported with the following color code: White letters
in blue field, 100%; black letters in green field, 76 –99%; white letters in red field, 51–75%; black lowercase for aa with ⬍50%
conservation. Below the human sequence, aa replacements in human mutants (mutations) are shown in uppercase in a black
field if they have a severe clinical phenotype, in boldface underlined lowercase if they have a mild clinical phenotype, in italics
lowercase if they do not have any known clinical phenotype, and in lowercase if they have been observed only in association with
other mutations; ⌬ indicates a deleted residue; Z indicates a nonsense mutation; 兰 indicates a mutation that interferes with
splicing. The human mutations are from ref 22
490
Vol. 14
March 2000
The FASEB Journal
NOTARO ET AL.
TABLE 2. Human G6PD deficiency mutations relate to evolutionary conservationa
Amino acid residue with increasing similarity conservation, n (%)
1) G6PD aa
sequence
2) Mutations, mildc
3) Mutations, severeb
4) Mutations, totalh
⬍50%
50–75%
76–99%
100%
Total
151 (29.3)
10e (18.6)
9d (18.4)
19 (18.4)
128f (24.9)
18 (33.3)
13 (26.5)
31g (30.1)
180f (34.9)
22 (40.7)
24e (49)
46g (44.7)
56 (10.9)
4 (7.4)
3 (6.1)
7 (6.8)
515
54
49
103
a
Of 115 known human mutations that affect G6PD activity (see Fig. 1), we have included in this analysis only 103: the 99 missense mutations
b
that are not found in combination with other mutations and the 4 single amino acid deletions.
Associated with chronic nonspherocytic
c
d
hemolytic anemia (WHO class I).
Associated with the risk of acute hemolytic anemia (WHO class II and III).
Of these nine mutations,
two are in frame deletions and seven are in the subunit interface; it appears that elsewhere a simple replacement of a nonconserved aa is most
e
f
unlikely to cause a severe phenotype.
Of these mutations, one is an in-frame deletion.
The aa in these two cells add up to 308, or
g
h
59.8% of the entire sequence.
The mutations in these two cells add up to 77, or 74.8% of the total number of mutations analyzed.
The
frequencies of mutations observed in the four conservation classes (line 4) when compared with those expected based on the respective aa
2
frequencies (line 1) yielded a ␹ value of 9.36 (P ⬍ 0.03).
anobacterium thermoautotrophicum, Pyrococcus horikoshii)
are Archea, which grow in an oxygen-poor habitat; we
surmise that they do not need G6PD because they do
not need to defend themselves against oxidative
stress. These evolutionary findings, while showing
that life without G6PD is possible, provide independent confirmation for the notion derived from targeting the G6PD gene in mouse embryonic stem
cells—namely, that in the organisms that do have
G6PD, its only indispensable function consists not in
pentose synthesis, but rather in supplying reductive
potential in the form of NADPH (28). NADPH in
turn serves as a defense against oxidative stress as
well as for the synthesis of nitric oxide (29, 30).
Phylogenesis
The phylogenetic tree we obtained for G6PD clearly
supports a common evolutionary origin throughout
living organisms and is consonant with taxonomy
(Fig. 2). Our findings on the evolution of the G6PD
gene are reminiscent of those that have been reported with other housekeeping genes, most of
which are ancient in evolutionary history (31):
namely, the degree of homology bears a broadly
inverse correlation to evolutionary distances, and
certain critical regions are nearly completely conserved. P. falciparum seems to be at an evolutionary
dead-end, as is often the case with parasitic protozoa.
The absence of G6PD in all the Archea that have been
fully sequenced is intriguing, but not unique. Eight
hundred sixty-four clusters of orthologous groups
(COG) have been defined on the basis of consistent
patterns of sequence similarities when comparing
protein sequences encoded in eight complete genomes from six major phylogenetic lineages (32).
Each COG consists of individual proteins or groups
of paralogs from at least three lineages. Indeed, 26%
of COG show the same phylogenetic pattern of
G6PD (presence in Bacteria and Eukarya, absence in
Archaea) (33). This phylogenetic pattern suggests
that G6PD gene has been lost in the common
ancestor of Archaea. Alternatively, the common ancestor of Archea and Eukaria did not have G6PD, and
Eukaria later acquired G6PD gene by horizontal
transfer from Bacteria.
Figure 5. Conservation and G6PD variants. The upper bar represents the distribution of G6PD mutants among the various
conservation categories (see text). For comparison, the lower bar represents the distribution of all amino acids in human G6PD
among the various conservation categories.
MACRO- AND MICROEVOLUTION OF G6PD
491
Evolutionary conservation and 3-dimensional
structure
In general, one would expect in an enzyme that is a
globular protein that the active center and the
hydrophobic core would show a high degree of
conservation (34), and we have validated this principle in the case of G6PD: 64.2% of buried (but only
35.8% of exposed) amino acid residues are highly or
fully conserved (Fig. 1, Fig. 3; P⬍0.001). In addition,
we note that in a homodimer the subunit interface is
structurally special because each amino acid within
that region is represented twice, in symmetrical
positions, on the contact surface. As a result, any
amino acid replacement within this region will produce two nearby changes in the structure of the
protein. In fact, we have found that the subunit
interface probably accounts for the majority of the
conserved blocks in the G6PD sequence on a longrange evolutionary scale, suggesting that constraints
against sequence change are greater than average in
that region.
Genetic variation in human G6PD
Comparing the distribution of mutations in a human
gene with its evolutionary history is a powerful tool
for pinpointing the function of domains and even of
individual aa within a protein. To do this, it is useful
to consider the range of potential phenotypic (clinical) consequences of different types of mutations in
a housekeeping gene such as G6PD. 1) Null mutations have never been observed. We have obtained
mice heterozygous for G6PD deficiency from G6PD
null embryonic stem cells (28), but no hemizygous
G6PD-deficient mice; we have recently found that
the condition is lethal at an early stage in embryonic
development (35). 2) Mutations that compromise
drastically substrate binding, coenzyme binding, or
the catalytic mechanism would be expected to affect
G6PD function in all cells to approximately the same
extent and they ought to cluster in the respective
regions of the sequence; no such clusters are seen
(Fig. 4). We presume that mutations with such
drastic effects may frequently be lethal. 3) Mutations
that cause instability of the G6PD protein molecule
would be expected to produce marked deficiency of
G6PD in red cells (much more than in other cells),
because these cells have a long life span after they
have lost capacity for protein synthesis (36). The
majority of known human G6PD-deficient variants
fall into this category. 4) The only cluster of mutations emerges between aa 380 and 410, and corresponds to the subunit interface in the enzymatically
active G6PD dimer (19). Moreover, nearly all of the
mutations in this cluster cause a severe phenotype
(class I); indeed, 34% of all class I mutations fall
492
Vol. 14
March 2000
within this 6% of the entire sequence. This highlights the critical role of precisely fitting noncovalent
interactions for the formation and stability of the
dimeric molecule.
The distribution of mutations in relationship to
the evolutionary history of the G6PD sequence shows
a definite and peculiar pattern: more than two-thirds
of the aa replacements that cause G6PD deficiency in
humans are in highly and moderately conserved aa,
whereas relatively few are in fully or poorly conserved
aa (Fig. 5, P⬍0.03). Fully conserved amino acids are
under-represented in G6PD-deficient mutants, presumably because of a higher probability that their
replacement may be lethal. Poorly conserved amino
acids are also under-represented, presumably because in many cases their replacement may not cause
G6PD deficiency and therefore such mutations may
go undetected. In fact, of the 19 mutants in this
group, 7 are in the subunit interface region (see
above) and 3 are in-frame deletions rather than
missense mutations; of the remaining, none has a
severe phenotype, confirming that replacements of
these nonconserved amino acids are generally well
tolerated. By contrast, highly and moderately conserved amino acids are over-represented in G6PDdeficient mutants; indeed, two-thirds of these mutants affect such amino acids residues, as though the
consequence of their replacement is often serious
enough to cause instability but not sufficiently serious to be lethal. This finding is a novel modification
of the notion that mutations associated with genetic
defects tend to affect the amino acids residues that
are most conserved.
Human polymorphic mutations vs. human sporadic
mutations
Extensive databases on human disease genes in
several cases comprise 100 or more mutations (37).
However, G6PD is the only case of a housekeeping
gene in which many (about one-half; see Table 2) of
the known mutations are both potentially pathogenic and polymorphic2 as a result of malaria selection (38, 39). Indeed, each of the mutant alleles
concerned constitutes an example of balanced polymorphism, whereby the deleterious phenotypic consequences of the mutation are balanced by the
resistance that it confers against Plasmodium falciparum. By contrast, we can presume that each of the
sporadic G6PD mutants has remained sporadic because its phenotypic consequences are too deleterious to be balanced by the resistance it might confer
2
The other pertinent case is of course that of hemoglobin,
produced by two highly tissue specific (nonhousekeeping)
genes. Polymorphic mutants of both the ␣ and the ␤ globin
genes are known, although they are not as numerous as the
G6PD mutants.
The FASEB Journal
NOTARO ET AL.
against Plasmodium falciparum. If we analyze separately the severe sporadic mutations and the mild
polymorphic mutations against the backdrop of macroevolution, we find that the former are slightly
more frequent in the 76 –99% conservation bracket
and the latter are more frequent in the 50 –75%
conservation bracket (see Table 2). However, with
the number of mutants known thus far, the difference is not yet significant.
5.
6.
7.
8.
CONCLUSION
9.
The evolutionary conservation of a housekeeping
gene such as G6PD is greater than that of tissuespecific genes, presumably because the latter may
require more specific adaptation to the physiology of
individual organisms. Total lack of G6PD is almost
certainly lethal in mammals, but partial G6PD deficiency confers biological advantage with respect to
malaria. Most of the aa replacements causing G6PD
deficiency take place in positions that are conserved,
but less than fully conserved. Thus, the range of
permissible G6PD mutations in the microevolution
of the human species bears a clear imprint of the
macroevolutionary history of this gene. With increasing numbers of full genome sequences becoming
available, the approach we have used for the analysis
of G6PD mutations is likely to be generally applicable to genes (particularly housekeeping genes) underlying other human diseases.
10.
11.
12.
13.
14.
15.
16.
17.
We thank Dr. Margaret Adams for providing the coordinates of the human G6PD model, Dr. Mike Rosen for valuable
help with the analysis of the 3-dimensional structure, Drs.
Nicola Pavletich and Dr. Martin Wiedmann for critical reading of the manuscript, and Dr. Howard T. Thaler for helpful
suggestions. We also thank all colleagues of the G6PD groups
in Ibadan (Nigeria), Napoli (Italy), and London (UK). This
work was supported by MSKCC and NIH (PO1 HL 59312).
R.N. is on leave of absence from the Hematology Division of
Federico II University, Naples, Italy.
Note added in proof: In a publication that appeared after this
paper was submitted, Cheng et al. [J. Biomed. Sci. (1999) 6,
106 –114] have reported an independent analysis of human
mutations in relation to amino acid conservation among 23
G6PD sequences from different organisms; some of their
conclusions are in good agreement with ours.
18.
19.
20.
21.
22.
23.
REFERENCES
24.
1.
Luzzatto, L., and Mehta, A. (1995) Glucose 6-phosphate dehydrogenase deficiency In The Metabolic and Molecular Basis of
Inherited Disease (Scriver, C. R., Beaudet, A. L., Sly, W. S., and
Valle, D., eds) pp. 3367–3398, McGraw-Hill, New York
2. Beutler, E. (1991) Glucose 6-phosphate dehydrogenase deficiency. N. Engl. J. Med. 324, 169 –174
3. Warburg, O., and Christian, W. (1932) Über ein neues Oxydationsferment und sein Absorptionsspektrum. Biochem. Z. 254, 438 – 458
4. Warburg, O., and Christian, W. (1931) Über Aktivierung der
robinsonschen Hexosemono-phosphorsäure in roten Blutzellen
MACRO- AND MICROEVOLUTION OF G6PD
25.
26.
and die Gewinnung aktivierender Fermentlösung. Biochem. Z.
242, 206 –227
Luzzatto, L., and Battistuzzi, G. (1985) Glucose-6-phosphate
dehydrogenase. Adv. Hum. Genet. 14, 217–329
Martini, G., Toniolo, D., Vulliamy, T., Luzzatto, L., Dono, R.,
Viglietto, G., Paonessa, G., D’Urso, M., and Persico, M. G.
(1986) Structural analysis of the X-linked gene encoding human
glucose 6-phosphate dehydrogenase. EMBO J. 5, 1849 –1855
Philippe, M., Larondelle, Y., Lemaigre, F., Mariame, B., Delhez,
H., Mason, P., Luzzatto, L., and Rousseau, G. G. (1994) Promoter function of the human glucose-6-phosphate dehydrogenase gene depends on two GC boxes that are cell specifically
controlled. Eur. J. Biochem. 226, 377–384
Ursini, M. V., Scalera, L., and Martini, G. (1990) High levels of
transcription driven by a 400 bp segment of the human G6PD
promoter. Biochem. Biophys. Res. Commun. 170, 1203–1209
Persico, M. G., Viglietto, G., Martini, G., Toniolo, D., Paonessa,
G., Moscatelli, C., Dono, R., Vulliamy, T., Luzzatto, L., and
D’Urso, M. (1986) Isolation of human glucose-6-phosphate
dehydrogenase (G6PD) cDNA clones: primary structure of the
protein and unusual 5⬘ non-coding region. Nucleic Acids Res. 14,
2511–2522; 7822
Ozols, J. (1993) Isolation and the complete amino acid sequence of lumenal endoplasmic reticulum glucose-6-phosphate
dehydrogenase. Proc. Natl. Acad. Sci. USA 90, 5302–5306
Hino, Y., and Minakami, S. (1982) Hexose-6-phosphate dehydrogenase of rat liver microsomes. Isolation by affinity chromatography and properties. J. Biol. Chem. 257, 2563–2568
Smith, R. F., and Smith, T. F. (1990) Automatic generation of
primary sequence patterns from sets of related protein sequences. Proc. Natl. Acad. Sci. USA 87, 118 –122
Nicholas, K. B., and Nicholas, B. J. (1997) GeneDoc. Pittsburgh
Supercomputer Center
Dayhoff, M. O. (1979) Atlas of Protein Sequences and Structure, Vol.
5, Suppl. 3, National Biomedical Research Foundation, Washington, D.C.
Nicholas, K. B., and Nicholas, H. B. J. (1997) Analysis and
visualization of genetic variant. http://wwwcriscom/⬃Ketchup/genedocshtml
Taylor, W. R. (1986) The classification of amino acid conservation. J. Theor. Biol. 119, 205–218
Page, R. D. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12,
357–358
Felsenstein, J. (1989) PHYLIP—phylogeny inference package
(version 3.2). Cladistic 5, 164 –166
Naylor, C. E., Rowland, P., Basak, A. K., Gover, S., Mason, P. J.,
Bautista, J. M., Vulliamy, T. J., Luzzatto, L., and Adams, M. J.
(1996) Glucose 6-phosphate dehydrogenase mutations causing
enzyme deficiency in a model of the tertiary structure of the
human enzyme. Blood 87, 2974 –2982
Guex, N., and Peitsch, M. C. (1997) SWISS-MODEL and the
Swiss-dbViewer: an environment for comparative protein modeling. Electrophoresis 18, 2714 –2723
Goldman, N., Thorne, J. L., and Jones, D. T. (1998) Assessing
the impact of secondary structure and solvent accessibility on
protein evolution. Genetics 149, 445– 458
Vulliamy, T., Luzzatto, L., Hirono, A., and Beutler, E. (1997)
Hematologically important mutations: glucose-6-phosphate dehydrogenase. Blood Cells Mol. Dis. 23, 302–313
O’Brien, E., Kurdi-Haidar, B., Wanachiwanawin, W., Carvajal,
J. L., Vulliamy, T. J., Cappadoro, M., Mason, P. J., and Luzzatto,
L. (1994) Cloning of the glucose 6-phosphate dehydrogenase
gene from Plasmodium falciparum. Mol. Biochem. Parasitol. 64,
313–326
Scopes, D. A., Bautista, J. M., Vulliamy, T. J., and Mason, P. J.
(1997) Plasmodium falciparum glucose-6-phosphate dehydrogenase (G6PD)-the N-terminal portion is homologous to a
predicted protein encoded near to G6PD in Haemophilus
influenzae. Mol. Microbiol. 23, 847– 848
Bautista, J. M., Mason, P. J., and Luzzatto, L. (1995) Human
glucose-6-phosphate dehydrogenase. Lysine 205 is dispensable
for substrate binding but essential for catalysis. FEBS Lett. 366,
61– 64
Rowland, P., Basak, A. K., Gover, S., Levy, H. R., and Adams,
M. J. (1994) The 3-dimensional structure of glucose 6-phos-
493
27.
28.
29.
30.
31.
32.
33.
494
phate dehydrogenase from Leuconostoc mesenteroides refined
at 2.0 A resolution. Structure 2, 1073–1087
Au, S. W., Naylor, C. E., Gover, S., Vandeputte-Rutten, L.,
Scopes, D. A., Mason, P. J., Luzzatto, L., Lam, V. M., and Adams,
M. J. (1999) Solution of the structure of tetrameric human
glucose 6-phosphate dehydrogenase by molecular replacement.
Acta Crystallogr. D. Biol. Crystallogr. 55, 826 – 834
Pandolfi, P. P., Sonati, F., Rivi, R., Mason, P., Grosveld, F., and
Luzzatto, L. (1995) Targeted disruption of the housekeeping
gene encoding glucose 6-phosphate dehydrogenase (G6PD):
G6PD is dispensable for pentose synthesis but essential for
defense against oxidative stress. EMBO J. 14, 5209 –5215
Tsai, K. J., Hung, I. J., Chow, C. K., Stern, A., Chao, S. S., and
Chiu, D. T. (1998) Impaired production of nitric oxide, superoxide, and hydrogen peroxide in glucose 6-phosphate-dehydrogenase-deficient granulocytes. FEBS Lett. 436, 411– 414
Nathan, C. (1992) Nitric oxide as a secretory product of
mammalian cells. FASEB J. 6, 3051–3064
Straus, D., and Gilbert, W. (1985) Genetic engineering in the
Precambrian: structure of the chicken triosephosphate isomerase gene. Mol. Cell. Biol. 5, 3497–3506
Tatusov, R. L., Koonin, E. V., and Lipman, D. J. (1997) A
genomic perspective on protein families. Science 278, 631– 637
Tatusov, R. L., Koonin, E. V. K., and Lipman, D. J. (1998) A
natural system of gene families from complete genomes. http://
www.ncbi.nlm.nih.gov/COG
Vol. 14
March 2000
34.
35.
36.
37.
38.
39.
Chothia, C., and Lesk, A. M. (1986) The relation between the
divergence of sequence and structure in proteins. EMBO J. 5,
823– 826
Longo, L., Rosti, V., Gaboli, M., Tabarini, D., Pandolfi, P. P., and
Luzzatto, L. (1997) Mice heterozygous for complete glucose
6-phosphate dehydrogenase (G6PD) deficiency are viable and
show somatic cell selection. Blood 90 (Suppl. 1), 408a
Mason, P. J. (1996) New insights into G6PD deficiency. Br. J.
Haematol. 94, 585–591
Krawczak, M., and Cooper, D. N. (1997) The human gene
mutation database. Trends Genet. 13, 121–122
Luzzatto, L. (1979) Genetics of red cells and susceptibility to
malaria. Blood 54, 961–976
Ruwende, C., Khoo, S. C., Snow, R. W., Yates, S. N., Kwiatkowski,
D., Gupta, S., Warn, P., Allsopp, C. E., Gilbert, S. C., Peschu, N.,
et al. (1995) Natural selection of hemi- and heterozygotes for
G6PD deficiency in Africa by resistance to severe malaria. Nature
(London) 376, 246 –249
The FASEB Journal
Received for publication February 28, 1999.
Revised for publication September 16, 1999.
NOTARO ET AL.