Download - Wiley Online Library

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Polymorphism (biology) wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Zinc finger nuclease wikipedia , lookup

Metagenomics wikipedia , lookup

Dominance (genetics) wikipedia , lookup

History of genetic engineering wikipedia , lookup

Copy-number variation wikipedia , lookup

Human genetic variation wikipedia , lookup

Genomics wikipedia , lookup

Genetic code wikipedia , lookup

Genetic engineering wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Public health genomics wikipedia , lookup

No-SCAR (Scarless Cas9 Assisted Recombineering) Genome Editing wikipedia , lookup

Gene expression profiling wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Genome evolution wikipedia , lookup

Population genetics wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Genome (book) wikipedia , lookup

Oncogenomics wikipedia , lookup

Gene therapy of the human retina wikipedia , lookup

Gene wikipedia , lookup

Gene therapy wikipedia , lookup

Gene expression programming wikipedia , lookup

Gene nomenclature wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Microsatellite wikipedia , lookup

Saethre–Chotzen syndrome wikipedia , lookup

Gene desert wikipedia , lookup

Epistasis wikipedia , lookup

RNA-Seq wikipedia , lookup

Genome editing wikipedia , lookup

Mutation wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Frameshift mutation wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Helitron (biology) wikipedia , lookup

Designer baby wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Point mutation wikipedia , lookup

Microevolution wikipedia , lookup

Transcript
American Journal of Hematology 70:269–277 (2002)
Spectrum of ␤-Thalassemia Mutations and Their
Association With Allelic Sequence Polymorphisms at
the ␤-Globin Gene Cluster in an Eastern
Indian Population
Ritushree Kukreti,1* Debasis Dash,1 Vineetha K E,1 Sanchita Chakravarty,1 Swapan Kr Das ,2
Madhusnata De ,2 and Geeta Talukder 2
2
1
Functional Genomics Unit, Centre for Biochemical Technology (CSIR), Delhi, University Campus, Delhi, India
Thalassemia Counselling Department, Vivekananda Institute of Medical Sciences, Ramakrishna Mission Seva Pratisthan, Calcutta, India
In this report, the spectrum of ␤-thalassemia mutations and genotype-to-phenotype correlations were defined in large number of patients (␤-thalassemia carriers and major) with
varying disease severity in an Eastern Indian population mainly from the state of West
Bengal. The five most common ␤-thalassemia mutations were detected, which included
IVS1-5 (G➝C), codon 15 (G➝A), codon 26 (G➝A), codon 30 (G➝C), and codon 41/42
(−TCTT). These accounted for 85% in 80 ␤-thalassemic alleles deciphered from 56 patients, including ␤-thalassemia major and carriers, and 15% of alleles remained uncharacterized in these patients. Expression of the human ␤-globin gene is regulated by an
array of cis-acting DNA elements, including five DNase I hypersensitive sites (HSs) in the
locus control region (LCR), promoters that incorporate certain silencer elements, and
enhancers at 3ⴕ of the ␤-globin gene. For detailed studies and to understand the molecular basis of ␤-thalassemia, we studied two groups of subjects: a group of 12 patients from
four families having ␤-thalassemia major and carrier phenotype and a control group of 26
healthy individuals. In these two groups, we examined portions of the ␤-globin gene
locus control region HSs 1, 2, 3, and 4, which included the (CA)x(TA)y repeat motif, the
(AT)xNy(AT)z repeat motif, the inverted repeat sequence TGGGGACCCCA, the promoter
region of the G␥-globin gene, an (AT)x(T)y repeat 5ⴕ of the silencer region, and the ␤-globin
gene and its 3ⴕ flanking region. We investigated the allelic sequence polymorphisms in
these regions and their association with the ␤-thalassemia mutations to know the possible genotype–phenotype relationship in ␤-thalassemia patients. An analysis of cisacting regulatory regions showed varied sequence haplotypes associated with some
frequent ␤-thalassemia mutations in this Eastern Indian population. Am. J. Hematol.
© 2002 Wiley-Liss, Inc.
70:269–277, 2002.
Key words: ␤-thalassemia mutation; ␤-globin gene; locus control region; hypersensitive
site; allelic sequence polymorphisms
INTRODUCTION
␤-Thalassemia is a highly prevalent autosomal recessive disorder characterized by the reduced or absent expression of the ␤-globin gene, leading to an imbalance of
␣- and ␤-globin chains [1]. So far, over 300 ␤-thalassemia alleles have been characterized in or around the
␤-globin region. Normally ␤-thalassemia trait in India is
3.3% with 1–2 per 1,000 couples being at risk of having
an affected offspring each year, leading to a high societal
burden [2]. As the ethnic composition of the Indian popu© 2002 Wiley-Liss, Inc.
Contract grant sponsor: Department of Biotechnology, Government of
India, Programme on Functional Genomics.
*Correspondence to: Dr. Ritushree Kukreti, Functional Genomics
Unit, Centre for Biochemical Technology (CSIR), Delhi University
Campus, Mall Road, Delhi 110 007, India. E-mail: [email protected]
Received for publication 12 November 2001; Accepted 15 March
2002
Published online in Wiley InterScience (www.interscience.wiley.
com). DOI: 10.1002/ajh.10117
270
Kukreti et al.
Fig. 1. Physical representation of the ␤-globin genes and ␤-globin LCR. (a) Location of the seven restriction enzyme sites
defining the RFLP haplotypes (Orkin et al., 1982), HincII 5ⴕ of the ␧-globin gene; HindIII in G␥ and A␥ genes; HincII in the
⌿␤1-globin gene; HincII 3ⴕ of the ⌿␤1-globin gene; AvaII in the ␤-globin gene; BamH1 3ⴕ of the ␤-globin gene. (b) The
human ␤-globin LCR, located from approximately 8–22 kb upstream of the ␧-globin gene is composed in 5 hypersensitive
sites. Arrows above the map indicate location of major HSs of the LCR. HS4 showing SNP at the Palindromic sequence, HS2
showing AT rich region and HS1 showing (CA)x(TA)y motif (c) ␤-globin gene showing positions of 5ⴕ silencer region
including (AT)x(T)y motif, promoter region, exons, introns, and 3ⴕ sequences (enhancer) present downstream to Exon3 of
the ␤-globin gene. Polymorphic positions, in the 5ⴕ flanking region, inside the ␤-globin gene and 3ⴕ flanking region defining
the sequence haploytype. ␤-thalassemia mutations Cd 15 (G➝A), Cd 30 (G➝C),IVSI-5 (G➝C) are represented by bold
characters. ␤-globin gene and HSs of the ␤-LCR are amplified and sequenced by appropriate synthetic oligonucleotides.
lation is heterogeneous [3], each region of the country
has its own distinct set and frequency of ␤-thalassemia
mutations [4]. Orkin et al. (5, Fig. 1) observed the entire
chromosomal environments of the ␤-thalassemia mutations bearing the mutant alleles are in strong linkage
disequilibrium to specific patterns of DNA restriction
site polymorphism, referred to as the ␤-globin gene cluster or RFLP haplotypes. Analysis of these haplotypes in
the ␤-globin gene cluster has been useful in determining
the chromosomal background of ␤-thalassemia mutations in several human populations. The eastern region of
India is not well characterized in this regard. The Bengali
population from the state of West Bengal has been the
subject of our study. It is an admixture of native people
with later migrants who settled there and left their genetic
imprints. ␤-Thalassemia mutations are found in abun-
dance here. We have sequenced minimal combinations of
cis-acting regulatory elements that normally direct full
expression of the ␤-globin gene [6,7] (Fig. 1) and analyzed the allelic sequence variations at the ␤-globin gene
cluster in eastern Indian population. In the present study,
we also attempted to study the nucleotide variations in
the ␤-globin gene cluster and their association with the
␤-thalassemia mutations so that we could explore the
genetic basis of the clinical diversity of the disease in the
eastern part of India.
MATERIALS AND METHODS
Genomic DNA Samples
␤-Thalassemia patients studied for this work were recruited from the Vivekananda Institute of Medical Sci-
Spectrum of ␤-Thalassaemia Mutations in Eastern India
271
G
TABLE I. PCR-Based Methods of Analysis for the ␤-LCR-HS1, -HS2, -HS3, -HS4, ␥-Promoter Region, (AT)x(T)y, and 3ⴕ Flanking
Sequence (Enhancer) Polymorphisms
Region analyzed
Primers (sequence 5⬘–3⬘)
␤-LCR–HS1
FP–CCTGCAAGTTATCTGGTCAC
RP–CTTAGGGGCTTATTTTATTTTGT
␤-LCR–HS2
FP–CAGGGCAGATGGCAAAAA
RP–CTGACCCCGTATGTGAGCA
FP–ATGGGGCAATGAAGCAAAGGAA
RP–ACCCATACATAGGAAGCCCATAGC
FP–GCAAACACAGCAAACACAACGAC
RP–CAGGGCAAGCCATCTCATAGC
FP–GGCCCCTTCCCCACACTATC
RP–ATGGCAGAGGCAGAGGACAGGTTG
FP–TTCCCAAAACCTAATAAGTAAC
RP–CCTCAGCCCTCCCTCTAA
␤-LCR–HS3
␤-LCR–HS4
G
␥–promoter
region
(AT)x(T)y region
3⬘ flanking
sequences
(enhancer)
FP–TGCCCTGGCCCACAAGTATC
RP–TCAGGGGAAAGGTGGTATCTCTAA
Location of the primers
corresponding to
positions of the human
␤-globin gene
(Coordinates GenBank)
13156–13601
8757–9217
4397–4992
668–1109
34273–34816
61440–61960
63586–64125
Size of
amplified
products
(bp)
PCR profile
94°C, 5⬘; (94°C, 30⬙; 62°C,
cycles; (94°C, 30⬙; 61°C,
cycles; (94°C, 30⬙; 60°C,
23 cycles; 72°C, 10⬘
94°C, 5⬘; (94°C, 30⬙; 62°C,
27 cycles; 72°C, 10⬘
94°C, 5⬘; (94°C, 30⬙; 62°C,
27 cycles; 72°C, 10⬘
94°C, 5⬘; (94°C, 30⬙; 62°C,
35 cycles; 72°C, 10⬘
94°C, 5⬘; (94°C, 30⬙; 62°C,
35 cycles; 72°C, 10⬘
94°C, 5⬘; (94°C, 30⬙; 62°C,
cycles; (94°C, 30⬙; 61°C,
cycles; (94°C, 30⬙; 60°C,
23 cycles; 72°C, 10⬘
94°C, 5⬘; (94°C, 30⬙; 62°C,
35 cycles; 72°C, 10⬘
1⬘; 72°C, 1⬘) 3
1⬘; 72°C, 1⬘) 3
1⬘; 72°C, 1⬘)
445
75⬙; 72°C, 15⬙)
460
75⬙; 72°C, 15⬙)
595
75⬙; 72°C, 15⬙)
442
75⬙; 72°C, 15⬙)
544
1⬘; 72°C, 1⬘) 3
1⬘; 72°C, 1⬘) 3
1⬘; 72°C, 1⬘)
520
75⬙; 72°C, 15⬙)
539
PCR, polymerase chain reaction; FP, forward primer; RP, reverse primer; bp, base pairs.
ences, Calcutta, after diagnosis. The disease was diagnosed by clinical and hematological data. Control blood
samples were collected from different communities with
informed consent.
out direct sequencing. Standard Gene Scan analysis software was used to analyze the data. We also opted for
direct DNA sequencing of the ␤-globin gene to further
confirm the mutations.
Phenotype Analysis
Examination of Cis-Acting Potential Regulators
Genomic DNA was isolated from whole blood in anticoagulant (EDTA) by using the standard proteinase
K–phenol–chloroform procedure as described in Old et
al. [8]. Abnormal hemoglobins were analyzed by agarose
gel electrophoresis at pH 8.6, Hb A2 and Hb F were
determined by gel elution and alkali denaturation [9],
respectively. ␣-Globin genotype was determined as described previously [10].
␤-Globin gene, spanning a 1.8-kb region, and the following regions cis to the ␤-globin gene were examined
(GenBank accession number U01317): (a) ␤-globin gene
cluster locus control 5⬘ HS-1, 5⬘ HS-2, 5⬘ HS-3, and 5⬘
HS-4; (b) promoter region of G␥-globin gene; (c) 5⬘ silencer region the (AT)x(T)y repeat; and (d) 3⬘ flanking
region to ␤-globin gene. The (CA)x(TA)y repeat motif in
␤-LCRHS1, the (AT) x N y (AT) z repeat motif in
␤-LCRHS2, and the (AT)x(T)y repeat motif in the 5⬘
␤-globin gene promoter region were analyzed by Genescan to determine the size of the PCR products, and
direct sequencing was carried out using forward and reverse primers [12,13]. The location of these cis-acting
regulatory elements, the PCR primers used for their amplification, sizing, and sequencing, and the optimum conditions for polymerase chain reactions with the PerkinElmer Gene Amp PCR System 9600 (Perkin-Elmer, Oak
Brook, IL) are shown in Table I. The PCR products were
purified with the Qiaquick PCR purification Kit (Qiagen), and DNA sequencing was performed by the ABI
Prism 377 automated DNA sequencer (Applied Biosystems) using dye terminator chemistry. The sequencing
reaction consisted of 25 cycles, with denaturation at 96°C
Mutation Analysis by Allele-Specific
Oligonucleotide, ARMS-PCR, and SnapShot
Primer Extension Kit
Screening for known five ␤-thalassemia mutations,
IVS1-1 (G➝T), IVS1-5 (G➝C), codon 8/9 (+G), codon
26 (G➝A), and codon 41/42 (−TCTT), was performed
with the Amplification Refractory Mutation System
(ARMS) as described in Old et al. [8]. The singlenucleotide primer extension method was used for validation or comparative genotyping of known SNPs and
point mutations [11]. This straightforward technique permits exact base identity determination of a polymorphic
locus by using ABI Prism Snapshot ddNTP primer extension kit (Applied Biosystems, Foster City, CA) with-
272
Kukreti et al.
TABLE II. Spectrum of ␤-Thalassemia Mutations in
Eastern India
RESULTS
HS4, 5⬘ silencer region (AT)x(T)y, promoter region of
G
␥-globin gene, and 3⬘ flanking region. The pedigrees,
hematological parameters, ␣-genotype, and ␤-thalassemia mutations of all members of the four families are
given in Fig. 2. Genotypes linked to ␤-thalassemia mutations in these members were revealed by PCR and genomic sequencing (Table III). Complete sequencing of
the repeat motifs (CA) x (TA) y , (AT) x N y (AT) z , and
(AT)x(T)y of ␤-globin gene cluster was carried in 26
normal individuals from eastern Indian population, and
the number and frequencies of observed repeat polymorphisms are shown in Table IV. It is often difficult to
correctly align the repeat polymorphism. We have combined analysis of PCR product size using fluorescent
primers with information derived from sequence analysis
to show that it is possible to discern the pattern of various
repeats in the individuals with correct typing of the motifs [12,13].
Spectrum of ␤-Thalassemia Mutations in
Eastern India
Polymorphic Sites in ␤-LCR HS1, ␤-LCR HS2,
␤-LCR HS3, and ␤-LCR HS4
A total of 80 ␤-thalassemic alleles have been deciphered from 10 homozygous ␤-thalassemia, 40 heterozygous ␤-thalassemia, 4 Hb E/␤-thalassemia, and 6 double heterozygous ␤-thalassemia Eastern Indian patients.
␤-Thalassemia carriers and ␤-thalassemia major were
defined by the phenotypic characteristics (high levels of
Hb A2, >3.5%, and transfusion dependence) [14]. Common mutations IVS1-1 (G➝T), IVS1-5 (G➝C), codon
8/9 (+G), codon 26 (G➝A), codon 41/42 (−TCTT),
codon 15 (G➝A), and codon 30 (G➝C) described in the
Indian population [15] were screened by ARMS. Five
different mutations were detected as shown in Table II:
IVS1-5 (G➝C), Hb E:codon 26 (G➝A), codon 41/42
(−TCTT), codon 15 (G➝A), and codon 30 (G➝C).
These accounted for 28.75%, 25%, 17.5%, 3.75%, and
10% of all the studied ␤-thalassemic alleles, respectively,
and 15% was uncharacterized. The single nucleotide extension technique, using fluorescent-labeled dideoxy
nucleotide and primers designed to terminate on the base
adjacent to the SNP, has also been used to detect these
mutations in the ␤-globin gene, and it is a cost-effective
method for investigating point mutations and deletions
for diagnostic purposes.
The ␤-LCR HS1 sequence analysis of the 445-bp fragment revealed three polymorphic patterns in the (CA)x(TA) y motif: (CA) 1 2 (TA) 6 , (CA) 1 0 (TA) 8 , and
(CA)12(TA)7, the latter being a novel polymorphic pattern found in the Indian population. The observed variations led to allelic variability, both in base composition
and in the length of the repeat, and they add an element
to the genetically variable environment of the ␤-thalassemia chromosome. Sequence analysis of the 462-bp
HS2 fragment, including the highly polymorphic
(AT)xNy(AT)z motif of the ␤-LCR, showed polymorphisms identical to those described previously [16,17]
and occurred in three different sequence configurations
of the (AT) x N y (AT) z motif: (AT) 1 0 N 1 2 (AT) 1 1 ,
(AT)9N12(AT)11, (AT)9N12(AT)10 specific to the genotype of the SNP at the palindromic region of HS4 of the
␤-LCR. It was observed that the G allele of ␤-LCR HS4
is associated with only one distinct polymorphic pattern
of AT-rich segments, (AT)9N12(AT)10 of the LCR HS2
region [17] (Table IV), indicating these mutational
events were probably rare and arose from a common
founder. In the core region of 5⬘ HS3 of the LCR, there
were no variations from the reference sequence among
the different haplotypes (data not shown).
Mutation
IVS1–5 (G→C)
Hb E:codon 26 (G→A)
Codon 41/42 (−TCTT)
Codon 15 (G→A)
Codon 30 (G→C)
Uncharacterized
Total
Alleles
%
23
20
14
3
8
12
80
28.75
25.0
17.50
3.75
10.0
15.0
100
for 30 sec and annealing at 60°C for 4 min. Sequences
were aligned with the corresponding wild-type sequences
using the Factura and Sequence Navigator software programs.
Family Studies
We studied 12 patients belonging to four unrelated
families with ␤-thalassemia carrier and major phenotype
from an eastern Indian population. Three individuals of
family D showed the phenotypes of heterozygous and
homozygous ␤-thalassemia; however, as a result of sequence analysis, no mutation was detected in either allele
of the ␤-globin gene. All the family members were analyzed for ␤-globin gene and subsequently cis-acting elements: ␤-LCR HS1, ␤-LCR HS2, ␤-LCR HS3, ␤-LCR
Polymorphic Sites in Promoter Region of the
G
␥-Globin Gene, 5ⴕ Silencer Region to the
␤-Globin Gene, ␤-Globin Gene, and Its 3ⴕ
Flanking Region
Analysis of G␥-globin gene promoter showed no variation from the normal sequence. We examined sequence
polymorphism in the 5⬘-flanking region (−650 to −250)
upstream of a cap site and in the ␤-globin gene (positions
−224 to poly A signal) and in its 3⬘ flanking region
Spectrum of ␤-Thalassaemia Mutations in Eastern India
273
Fig. 2. Pedigrees, hematological results, ␣-genotype and ␤-thalassemia mutations of patients from four families with
␤-thalassemia carrier and ␤-thalassemia major phenotype. N: ␤-globin gene GenBank refernce sequence. Arrow indicate
the family where the members showed the phenotypes of heterozygous and homozgous ␤-thalassemia, however no
mutation was detected in either allele of the ␤-globin gene.
(sequences at the end of exon 3 of the ␤-globin gene)
[18] in 24 ␤-thalassemic alleles from eastern India. The
5⬘-flanking region contains numerous dimorphic nucleotide positions [10]. Three of them were polymorphic in
our samples at positions −551, −521, and −340. In addition, a microsatellite, (AT)x(T)y, is subject to frequent
micro-insertion or deletion [19]. We found (AT)7(T)7,
(AT)8(T)5, and (AT)9(T)5 configurations in the families
studied here. No polymorphisms were observed in either
the 5⬘ or the 3⬘ untranslated regions of the gene. Five
dimorphic sites defining the intragenic polymorphism in
the ␤-globin gene (Fig. 1, positions +59, +511, +569,
+576, and +1,161) and four dimorphic sites in the 3⬘
flanking region of the ␤-globin gene (Fig. 1, positions
101, 181, 183, and 339) have been studied. Three ␤-thalassemia mutations, i.e., codon 15 (G➝A), codon 30
(G➝C), and IVS1-5 (G➝C), and two uncharacterized
alleles were analyzed in our family members.
A
B
C
D
E
F
G
H
12–6
10–8
*
*
*
*
*
12–7
*
10–12–11
9–12–10
*
*
9–12–10
9–12–10
9–12–10
9–12–10
*
HS2
(AT)xNy(AT)
−10,623
−551
T
C
*
C
C
C
*
*
C
−158
C
*
*
*
*
*
*
*
*
␥-Promoter region
HS4
(TGGGGACCCCA)
−18,542
A
G
*
*
G
G
G
*
*
G
7–7
8–5
9–5
*
*
8–5
8–5
8–5
9–5
C
*
*
*
*
T
T
*
*
A
*
*
*
*
*
*
*
*
5⬘ Silencer region
(AT)x(T)y
−530 −521 −491
T
C
*
*
*
C
C
C
C
−340
C
*
T
*
*
*
T
T
T
Exon 1
+59
C
G
*
*
*
*
*
G
G
G
T
T
*
*
*
*
T
T
C
*
*
*
*
*
*
*
*
T
*
C
C
C
C
C
C
C
␤-Globin gene
Intron 2
+511 +569 +576 +1161
1
4
5
2
2
2
3
6
6
FR
G
*
*
*
*
*
*
*
*
G
*
A
*
*
*
*
*
*
C
*
A
*
*
A
A
A
A
A
*
*
*
*
*
*
*
*
3⬘ to ␤-globin gene
101 181 183 339
a
In ␤-LCR HS2, N ⳱ ACA CAT ATA CGT. Various positions in the ␤-LCR are numbered relative to ␧-globin gene. Various positions are numbered at the 5⬘ end, within the ␤-globin gene in relation
to the ␤-globin gene cap site and at the 3⬘ end in relation to its polyadenylylation site. Positions +59, +511, +569, +576, and +1161 refer to the framework defined by Orkin et al. in 1982 [5]; FR–framework.
In each position, asterisks indicate identity with the ␤-globin GenBank reference sequence. BGC refers to normal individual from eastern Indian population.
BGC ref
Codon 15
Codon 30
Codon 30
Codon 30
IVS1–5
IVS1–5
Uncharacterised allele
Uncharacterised allele
␤-thalassemia
mutations
HS1
(CA)x(TA)y
−6,284
␤-LCR
TABLE III. ␤-Thalassemia Mutations and Their Associated Sequence Haplotypes of ␤-Globin Gene Cluster in Eastern Indian Populationsa
274
Kukreti et al.
Spectrum of ␤-Thalassaemia Mutations in Eastern India
275
TABLE IV. Number (n) and Percentages (%) of Different Repeat Sequences of the ␤-Globin Gene Cluster in the Alleles From
Normal Individuals of Eastern Indian Population
Origin (Eastern India)
Repeat sequences
n
%
␤-LCR HS1 (CA)x(TA)y
(CA)12(TA)6
(CA)10(TA)8
(CA)12(TA)7
No.a
34
7
11
52
65.4
13.5
21.1
100
18
16
18
52
34.6
30.8
34.6
100
34
12
6
52
65.4
23.1
11.5
100
␤-LCR HS2(AT)xNy(TA)z
(AT)9N12(TA)10
(AT)9N12(TA)11
(AT)10N12(TA)11
No.
5⬘ Silencer region (AT)x(T)y
(AT)7(T)7
(AT)8(T)5
(AT)9(T)5
No.
a
No. refers to total number of analyzed alleles.
Sequence Polymorphism Associated With
␤-Thalassemia Mutations
Analyzed sequence haplotypes in the ␤-globin gene
cluster defines the chromosome background in which the
thalassemia mutation arose. Nine single nucleotide polymorphisms and three length variants were identified in
the ␤-globin gene cluster, and these sites could be segregated as eight sequence haplotypes in the thalassemia
families studied. Three observed ␤-thalassemia mutations, codon 15 (G➝A), codon 30 (G➝C), and IVS1-5
(G➝C), were found to be linked with different sequence
haplotype, and sequence variations within the structural
gene refer to the framework as described by Orkin et al.
[5] (Table III). The codon 15 (G➝A) mutation is linked
to haplotype A and sequence framework 4. The codon 30
(G➝C) is associated with three different haplotypes: B,
C, and D. Haplotype B has sequence framework 5, while
haplotypes C and D are linked with identical sequence
framework 2. The most common IVS1-5 (G➝C) mutation is associated with two different haplotypes, E and F,
and sequence frameworks are namely 2 and 3, while in
family D, the patient with the ␤-thalassemia major phenotype inherited different chromosome haplotypes, G
and H, which are linked to the identical framework 6
sequence containing the normal ␤-globin gene (uncharacterized allele). Because this is the first molecular characterization involving the repeat sequence polymorphism
in the ␤-globin gene cluster in eastern India, we combined both the technique of automated genotyping with
direct sequence analysis and family studies to demonstrate the nature of repeat polymorphic sequences.
DISCUSSION
We have investigated the molecular basis of ␤-thalassemia in Eastern India and linked some ␤-thalassemia
mutations to ␤-globin gene cluster sequence haplotype
with population samples from eastern India. This is the
first report of an association of repeat polymorphic markers with the globin gene mutations in this region of India.
Five different ␤-thalassemia mutations were detected in
85% of the total 80 ␤-thalassemic alleles studied (Table
II). Among the ␤-thalassemia mutations, IVS1-5 (G➝C)
and Hb E are the most common mutations, occurring at
percentages of 28.75 and 25, respectively. IVS1-5
(G➝C) happens to be the most frequent mutation in
many other parts of India as well, i.e., Punjab, Gujrat,
Maharastra, and southern India [2,15,20]. Characterization of the mutation patterns revealed from such study
should provide the basis for prenatal diagnosis and genetic counseling of affected individuals.
Although regulation of ␤-globin gene expression may
have diverse explanations, there could be genetic elements cis to the ␤-globin gene that affect the severity of
the disease. Several genetic loci are candidates for cisacting regulators of the ␤-globin gene transcription [7].
The ␤-globin gene LCR, located 6–18 kb 5⬘ to the ␧-globin gene, plays a vital role in the tissue-specific and
developmental-specific expression of globin genes. It
consists of 1–5 hypersensitive sites (HSs) [21], which
occur in phylogenetically conserved regions that bind
transcription factors. Three different polymorphic (CA)x(TA)y patterns of ␤-LCR HS1, namely, (CA)12(TA)6,
(CA)10(TA)8, and (CA)12(TA)7, were observed in our
patients. These polymorphic patterns are also present in
the studied normals (Table IV), indicating that it is presumably a polymorphism [6]. We observed from automated genotyping and direct sequence analysis that the G
allele in the palindromic sequence of the HS4 locus is
associated with only one distinct pattern,
(AT)9N12(AT)10, of HS2 of the ␤-LCR. There appears to
be linkage disequilibrium at the two loci with the pre-
276
Kukreti et al.
ponderance of occurrence of the G allele at HS4 along
with the (AT)9N12(AT)10 allele at the HS2 locus of
␤-LCR, suggesting that the G allele could be an evolutionarily new mutation in the study population [17].
There are reports that sequence variations in the HS2
region of ␤-LCR are associated with altered levels of
fetal hemoglobin [16] in the thalassemia intermediate
patients. The DNA silencer region 5⬘ to the ␤-globin
gene acts as a negative regulatory element, and it is possible that the ␤-globin gene silencer with the core structure (AT)x(T)y may modulate expression of ␤-globin
gene [22]. Three length polymorphisms were observed in
the (AT)x(T)y sequence region: (AT)7(T)7, (AT)8(T)5,
and (AT)9(T)5. The AT dinucleotide insert accompanies
either the presence or the absence of a T➝A replacement
at −528, showing a +ATA,−T event as reported previously [23]. These variants have also been described by
Perrin et al. [24], Ragusa et al. [25], and Gasperini et al.
[26] mostly in Sicily, Algerian, and Sardinian populations. The (AT)9(T)5 motif has been studied with varying
implications. It has been associated with silent ␤-thalassemia, mild phenotype, and higher levels of Hb F in some
homozygous ␤-thalassemia patients [25]. The dimorphisms observed in the 5⬘ region at positions −551, −521,
and −340 within the ␤-globin gene at positions +59,
+511, +569, and +1,161 and in the 3⬘ flanking region of
the ␤-globin gene at positions 181 and 183 have also
been reported in Melanesian, Caucasian, Asian, and African populations [18].
The codon 15 (G➝A) mutation appears on one sequence haplotype A. The codon 30 (G➝C) mutation
showed association with three distinct sequence haplotypes: B, C, and D. These haplotypes exhibit differences
at both the sides of the mutation and could be the result
of a conversion between a normal and a thalassemia
chromosome. Such events are likely, as variant 9-12-10
in the ␤-LCR HS2 and variant 9-3 in the silencer region
5⬘ to the ␤-globin gene are frequent in normal alleles
from the eastern Indian population (Table IV). The most
common mutation, IVS1-5 (G➝C), appears on two sequence haplotypes, E and F, differing by three base substitutions (−551, −521, and +59). The haplotypes associated with this mutation are also represented among the
normal alleles of this population. In India this mutation is
strongly linked with eight different haplotypes and two
different sequence frameworks, 2 and 3 [27], but it is also
found in the same percentage in other populations, such
as Indo-Mauritians, with three varied haplotypes on an
identical sequence framework [27]. Recently Bandyopadhyay et al. [2] found this mutation to be strongly
linked (82%) with a + − − − − + haplotype (HindII ␧,
HindIII ␥G, HindIII ␥A, HindII 5⬘␺␤, HindII 3⬘␺␤, and
HinfI ␤) in an eastern Indian population. This mutation is
considered to be the oldest ␤-thalassemia allele in India,
on the basis of its high haplotype diversity as well its
wide distribution in this subcontinent. This type of molecular heterogeneity underlies the wide spectrum of
clinical manifestations of the disease. In brief, this analysis revealed the association of some common ␤-thalassemia mutations with the varied haplotypes which would
be generated due to the large size of the chromosomal
portion and the existence of hot spots for meiotic recombinations leading to many recombinations, substitutions,
or conversions with the adjacent regions [24]. This exercise gives us an opportunity to determine the allele
diversity associated with the ␤-thalassemia mutations; it
also allows us to study the nucleotide variations in the
␤-globin gene cluster and their associations with the
␤-thalassemia mutations in order to explore the genetic
basis of the clinical diversity of the disease in eastern part
of India, primarily the state of West Bengal.
One of our aims was to learn if the severity of the
phenotype of the thalassemia carrier and thalassemia major in family D could be predicted on the basis of genotypic analysis. The clinical phenotypes are usually heterogeneous and depend mainly on the ␤-thalassemia
mutation inherited [28]. ␤-Globin gene analysis for family D members did not reveal any mutation in either
heterozygous state or homozygous state corresponding to
the severity of the disease. Studies of such families
would be of interest, as they may reveal some association
of the sequence polymorphisms in the cluster with the
low ␤-globin gene expression and consequently the phenotypes of the ␤-thalassemia. To explain the phenotype
for a ␤-thalassemia carrier and ␤-thalassemia major, we
postulated the existence of an unknown genetic determinant in these patients that might be present in ␤-globin
gene cluster and affecting the phenotype; simple sequences present throughout the cluster have been known
to form a number of unusual structures in chromatin, and
they are believed to play an important role in eukaryotic
gene regulation [29]. Sequence analysis of major regulatory regions of ␤-globin gene cluster detected various
changes from the reference sequence. We found sequence haplotypes G and H to be associated with the
uncharacterized allele in patients from family D. The
nucleotide changes (data not shown) and number of repeat variations (Table IV) in the ␤-globin gene cluster
implied in our study are also represented among the normal alleles of this population. This indicates that the
sequence repeat polymorphisms present in the ␤-globin
gene cluster are common polymorphisms [6,7,10,16],
and although they may not be necessarily associated with
the hematological characteristic of ␤-thalassemia, they
could play an important role in subtle expression of the
locus as a whole [16]. We speculate that the genetic
determinant may lie either in other unexamined cisacting sequences, i.e., phylogenetically conserved regions of the LCR, 5⬘ of the ␧-globin gene, 5⬘ of the
␦-globin gene, other 3⬘ sequences of the ␤-globin gene,
Spectrum of ␤-Thalassaemia Mutations in Eastern India
or elsewhere in the genome, which may involve the function of a gene encoding for a transcription factor regulating the function of the ␤-globin gene [27,30]. With the
recent success of the Human Genome Project, it is anticipated that more genetic modifiers and environmental
factors will be discovered that can help in understanding
the variations in phenotypes of ␤-thalassemia.
14.
15.
16.
ACKNOWLEDGMENTS
The authors thank Prof. Samir K. Brahmachari and Dr.
Mitali Mukherjee for scientific help and Ms. R. Jaya, Ms.
Sakshi, and Ms. Ruchi for technical support. Financial
support from the Department of Biotechnology, Government of India, in the Programme on Functional Genomics to S.K.B. is duly acknowledged.
17.
18.
19.
REFERENCES
1. Weatherall DJ, Clegg JB. Thalassemia—a global public health problem. Nat Med 1996;2:847–849.
2. Bandyopadhyay A, Bandyopadhyay S, Chowdhury MD, Dasgupta
UB. Major ␤-globin gene mutations in Eastern India and their associated haplotypes. Hum Hered 1999;49:232–235.
3. Varawalla NY, Old JM, Sarkar R, Venkatesan R, Weatherall DJ. The
spectrum of ␤- thalassemia mutation on the Indian subcontinent: the
basis for prenatal diagnosis. Br J Haematol 1991;78:242–247.
4. Thapar R. A history of India. Vol 1. Harmondsworth, England: Penguin; 1996.
5. Orkin SH, Kazazian HH Jr, Antonarakis SE, Goff SC, Boehm CD,
Sexton JP, Waber PG, Giardina PJ. Linkage of ␤-thalassemia mutations and ␤-globin gene polymorphisms with DNA polymorphisms in
human ␤-globin gene cluster. Nature 1982;296:627–631.
6. Pasceri P, Pannell D, Wu X, Ellis J. Full activity from human ␤-globin
locus control region transgenes requires 5⬘HS1, distal ␤-globin promoter, and 3⬘ ␤-globin sequences. Blood 1998;82:853–862.
7. Lu Z-H, Steinberg MH. Fetal hemoglobin in sickle cell anemia: relation to regulatory sequences cis to the ␤-globin gene. Blood 1996;87:
1604–1611.
8. Old JM, Varawalla NY, Weatherall DJ. The rapid detection and prenatal diagnosis of ␤-thalassemia in the Asian Indian and Cypriot populations in the UK. Lancet 1990;336:834–837.
9. De M, Chakraborty G, Das SK, Bhattacharya DK, Talukder G. Molecular studies of haemoglobin E in tribal populations of Tripura.
Lancet 1997;349:1297.
10. Dimovski AJ, Adekile AD, Divoky V, Baysal E, Huisman THJ. Polymorphic pattern of the (AT)x(T)y motif at −530 5⬘ to the ␤-globin gene
in over 40 patients homozygous for various ␤-thalassemia mutations.
Am J Hematol 1994;45:51–57.
11. Fortina P, Delgrosso K, Sakazume T, Santacroce R, Moutereau S, Su
HJ, Graves D, McKenzie S, Surrey S. Simple two-color array-based
approach for mutation detection. Eur J Hum Genet 2000;8:884–894.
12. Tatu T, Thein SL. Automated genotyping for accurate assignment of
the (AT)xNy(AT)z motif within the ␤-globin locus control region—
hypersensitive site 2. Br J Hematol 2001;112:488–492.
13. Saleem Q, Choudhry S, Mukerji M, Bashyam L, Padma MV, Chakravarthy A, Maheshwari MC, Jain S, Brahmachari SK. Molecular analysis
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
277
of autosomal dominant hereditary ataxias in the Indian population:
high frequency of SCA2 and evidence for a common founder mutation. Hum Genet 2000;106:179–187.
Chui DHK, Waye JS. Hydrops fetalis caused by ␣-thalassemia: an
emerging health care problem. Blood 1998;91:2213–2222.
Varawalla NY, Fitches AC, Old JM. Analysis of ␤-globin gene haplotypes in Asian Indians: origin and spread of ␤ on the Indian subcontinent. Hum Genet 1992;90:443–449.
Samakoglu S, Philipsen S, Grosveld F, Luleci G, Bagci H. Nucleotide
changes in the ␥-globin promoter and the (AT)xNy(AT)z polymorphic
sequence of ␤LCRS-2 region associated with altered levels of Hb F.
Eur J Hum Genet 1999;7:345–356.
Kukreti R, Rao CB, Das SK, De M, Talukder G, Vaz F, Verma IC,
Brahmachari SK. Study of the single nucleotide polymorphism (SNP)
at the palindromic sequence of hypersensitive site (HS) 4 of the human
␤-globin locus control region (LCR) in Indian population. Am J Hematol 2002;69:77–79.
Fullerton SM, Harding RM, Boyce AJ, Clegg JB. Molecular and population genetic analysis of allelic sequence diversity at the human ␤-globin locus. Proc Natl Acad Sci U S A 1994;91:1805–1809.
Harding RM, Fullerton SM, Griffiths RC, Bond J, Cox MJ, Schneider
JA, Moulin DS, Clegg JB. Archaic African and Asian lineages in the
genetic ancestry of modern humans. Am J Hum Genet 1997;60:772–
789.
Venkatesan R, Sarkar R, Old JM. ␤-Thalassemia mutations and their
linkage to ␤-haplotypes in Tamilnadu in Southern India. Clin Genet
1992;42:251–256.
Orkin SH. Regulation of globin gene expression in erythroid cells. Eur
J Biochem 1995;23:271–281.
Drew LR, Tang DC, Berg PE, Rodgers GP. The role of trans-acting
factors and DNA-bending in the silencing of human ␤-globin gene
expression. Nucleic Acids Res 2000;28:2823–2830.
Elion J, Berg PE, Trabuchet G, Schechter AN, Krishnamoorthy R,
Labie D. Is polymorphism 0.5 kb 5⬘ to the ␤-globin gene relevant to
the ␤S gene expression? Blood 1989;74(Suppl 1):527a.
Perrin P, Bouhassa R, Mselli L, Garguier N, Nigon V-M, Bennani C,
Labie D, Trabuchet G. Diversity of sequence haplotypes associated
with ␤-thalassemia mutations in Algeria: implications for their origin.
Gene 1998;213:169–177.
Ragusa A, Lombardo M, Beldjord C, Ruberto C, Lombardo T, Elion
J, Nagel RL, Krishnamoorthy R. Genetic epidemiology of ␤-thalassemia in Sicily: do sequences 5⬘ to the G␥ gene and 5⬘ to the ␤ gene
interact to enhance HBF expression in ␤-thalassemia? Am J Hematol
1992;40:199–206.
Gasperini D, Perseu L, Melis MA, Maccioni L, Sollaino MC, Paglietti
E, Cao A, Galanello R. Heterozygous ␤-thalassemia with thalassemia
intermedia phenotype. Am J Hematol 1998;57:43–47.
Kotea N, Ramasawmy R, Lu CY, Fa NS, Gerard N, Beesoon S, Ducrocq R, Surrun SK, Nagel RL, Krishnamoorthy R. Spectrum of
␤-thalassemia mutations and their linkage to ␤-globin gene haplotypes
in the Indo-Mauritians. Am J Hematol 2000;63:11–15.
Rund D, Oron-Karni V, Filon D, Goldfarb A, Rachmilewitz E, Oppenheim A. Genetic analysis of ␤-thalassemia intermedia in Israel:
diversity of mechanisms and unpredictability of phenotype. Am J Hematol 1997;54:16–22.
Wells RD, Collier DA, Hanvey JC, Shimizu M, Wohlrab F. The chemistry and biology of unusual DNA structures adopted by oligopurine–
oligopyrimidine sequences. FASEB J 1988;2:2939–2949.
Onishi Y, Kiyama R. Enhancer activity of HS2 of the human ␤-LCR
is modulated by distance from the key nucleosome. Nucleic Acids Res
2001;29:3448–3457.