Human Molecular Genetics, 2009, Vol. 18, No. 6, 1156–1170
doi:10.1093/hmg/ddn442
Advance Access published on January 6, 2009
Functional SNPs in the SCGB3A2 promoter are
associated with susceptibility to Graves’ disease
Huai-Dong Song1,2,*,†, Jun Liang3,†,‡, Jing-Yi Shi1,†, Shuang-Xia Zhao1,†, Zhi Liu1,†,
Jia-Jun Zhao3,†, Yong-De Peng4, Guan-Qi Gao5, Jiong Tao1, Chun-Ming Pan1, Li Shao1,
Feng Cheng1, Yi Wang6, Guo-Yue Yuan7, Chao Xu1, Bing Han1, Wei Huang8, Xun Chu8, Yi Chen1,
Yan Sheng1, Rong-Ying Li1, Qing Su9, Ling Gao3, Wei-Ping Jia10, Li Jin6, Ming-Dao Chen1,
Sai-Juan Chen1,2, Zhu Chen1,2 and Jia-Lun Chen1
1Ruijin Hospital, State Key Laboratory of Medical Genomics, Molecular Medicine Center, Shanghai Institute of Endocrinology, Shanghai Jiao Tong University (SJTU), School of Medicine, Shanghai 20025, China
2Shanghai Center for Systems Biomedicine, SJTU, 800 Dong Chuan Road, Shanghai 200240, China
3Department of Endocrinology, Shandong Province Hospital, Shandong University, 324 Jing 5 Road, Jinan 250021, China
4Department of Endocrinology, The First People’s Hospital, Shanghai Jiaotong University, Shanghai 200080, China
5Department of Endocrinology, The People’s Hospital of Linyi, Shandong Province, 27 Liberation Road, Linyi 276003, China
6Centre of Anthropology, Fudan University, 220 Handan Road, Shanghai 200433, China
7Department of Endocrinology, Hospital of Jiangsu University, Zhenjiang, Jiangsu 212001, China
8Chinese National Human Genome Center at Shanghai, Zhang Jiang High Tech Park, 250 Bi Bo Road, Shanghai 201203, China
9Department of Endocrinology, Xin Hua Hospital, Shanghai Jiao Tong University (SJTU), School of Medicine, Shanghai 20092, China
10Shanghai Diabetes Institute, Shanghai Jiaotong University, No. 6 Hospital, Shanghai 200233, China
Received November 14, 2008; Revised December 22, 2008; Accepted December 30, 2008
Graves’ disease (GD) is one of the most common human autoimmune diseases, and recent data estimated a prevalence of clinical hyperthyroidism of 0.25–1.09% in the population. Several reports have linked GD to the region 5q12–q33, and a locus between markers D5s436 and D5s434 was specifically linked to GD susceptibility in the Chinese population. In the present study, association analysis was performed using a large number of single-nucleotide polymorphisms (SNPs) at this locus in 2811 patients with GD recruited from different geographic regions of China. The strongest associations with GD in the combined Chinese Han cohorts were mapped to two SNPs in the promoter (pSNP) of SCGB3A2 [SNP76, rs1368408, P = 1.43 × 10⁻⁶, odds ratio (OR) = 1.28 and SNP75, −623 to −622, P = 7.62 × 10⁻⁵, OR = 1.32, respectively], a gene implicated in immune regulation. In addition, pSNP haplotypes composed of the SNP76 (rs1368408)+SNP74 (rs6882292) or SNP76+SNP75 (−623 to −622, AG/T) variants are correlated with high disease susceptibility (P = 0.0007 and P = 0.0192, respectively) in this combined Chinese Han cohort. Furthermore, these haplotypes were associated with reduced SCGB3A2 gene expression levels in human thyroid tissue, while functional analysis revealed a relatively low efficiency of SCGB3A2 promoters of the SNP76+SNP75 and SNP76+SNP74 haplotypes in driving gene expression. These results suggest that the SCGB3A2 gene may contribute to GD susceptibility.
*To whom correspondence should be addressed. Tel: +86 2164370045 Ext. 610808; Fax: +86 2164743206; Email: [email protected]
†The authors wish it to be known that, in their opinion, the first six authors should be regarded as joint First Authors.
‡Present address: Department of Endocrinology, the Fourth Hospital of Xuzhou, Jiangsu Province, China.
© The Author 2009. Published by Oxford University Press. All rights reserved.
For Permissions, please email: [email protected]
INTRODUCTION
Graves’ disease (GD) is one of the most common human autoimmune diseases, with recent data estimating frequencies of up to 1.3% (0.5% clinical and 0.7% subclinical) in the USA (1) and 0.25–1.09% in China (2). The hallmark of GD is the production of thyroid-stimulating hormone receptor (TSHR)-stimulating antibodies, leading to hyperthyroidism. GD is a complex trait disease that develops in genetically susceptible individuals through the interactions of susceptibility genes (3) and non-genetic factors, such as infection (4).
Many genetic studies of GD have been carried out and several genes, such as human leukocyte antigen (3), cytotoxic T lymphocyte antigen 4 (CTLA-4) (5,6), CD40 gene (7), PTPN22 (8), TSHR (9) and SAS-ZFAT (10), have been linked to GD susceptibility. However, none of these genes shows an absolute correlation with disease predisposition and the exact genetic requirements for the development of GD are still unknown. A previous genome-wide study of 54 Chinese Han GD pedigrees provided the strongest evidence for linkage at D5s436 on chromosome 5q31. When four additional markers around D5s436 were used, a maximum two-point LOD score of 4.31 and a maximum multipoint LOD score of 4.12 were obtained for marker D5s2090 (11). Interestingly, from a dataset of 123 Japanese sibling pairs, the 5q31 locus was also linked with autoimmune thyroid disease (AITD), including GD and Hashimoto’s disease, with a maximum multipoint LOD score of 3.14 at D5s436 (12). Data from linkage analysis conducted on 445 subjects from 29 families of a homogeneous founder Caucasian population, the Old Order Amish of Lancaster County, Pennsylvania, also support a linkage with AITD at chromosome 5q (13). Given the inherent inaccuracies of linkage analysis in identifying susceptibility genes (14), it is reasonable to hypothesize that the previously observed linkages point to the same locus involved in GD predisposition.
In the present study, we performed association analysis on a large number of single-nucleotide polymorphisms (SNPs) to identify the putative GD susceptibility gene at the 5q31 locus in the Chinese Han population. First, we used 179 SNPs within a 3.0 Mb region surrounding marker D5s2090 and found the most significant association signal to be at SNP rs1368408. Subsequent association analysis was then performed using 122 SNPs from a 1.0 Mb region surrounding rs1368408 for two independent populations collected from Shandong province and the city of Shanghai. The results suggested that SNP76 (rs1368408) and SNP75 in the promoter of the Secretoglobin Family 3A Member 2 (SCGB3A2) gene may be the causal variants of GD. Next, these results were further confirmed by association analysis in 2811 Chinese Han patients with GD and 2807 healthy individuals recruited from different geographic regions in China. Finally, functional analysis in vivo and in vitro revealed that the susceptible alleles of SNP76, SNP75 and SNP74, which are located in the promoter of the SCGB3A2 gene, affect the binding of transcription factors to the promoter of SCGB3A2 and that the SNP76+SNP75 and SNP76+SNP74 haplotypes are associated with lower levels of SCGB3A2 gene expression.
RESULTS
Defining the GD susceptibility region by association analysis of a 3.0 Mb region surrounding marker D5s2090
To narrow down the GD susceptibility locus, we started with a 3.0 Mb region surrounding D5s2090, defined by a decrease in the LOD score of 1.5 or more with an approximately 99% confidence interval for linkage (Fig. 1A). The NCBI database indicates that this region, from markers D5s436 to D5s413, contains 25 genes (Fig. 1B, Supplementary Material, Table S1). Accordingly, 179 SNPs distributed with an average spacing of 15 Kb were selected from the NCBI SNP database (dbSNP) (NCBI Human Genome Build 36.1) for genotyping of 384 GD patients and 382 healthy subjects from Shandong province, China. Data quality control (QC) filters removed 40 SNPs with minor allele frequencies (MAFs) <1% (n = 32) or a Hardy–Weinberg equilibrium (HWE) P < 1 × 10⁻⁶ in controls (n = 8) (missing data in Supplementary Material, Table S2) (15). Of the remaining 139 SNPs, 4 have significantly different allele frequencies (at the P < 0.001 level) in the GD and normal subjects, and the strongest association was measured for SNP rs1368408 (P = 3.69 × 10⁻⁵). It is also notable that the four SNPs, including SNP rs1368408, form a cluster, suggesting a locus of strong association (Fig. 1C, Supplementary Material, Table S2). These results led us to further investigate a 1.0 Mb region surrounding SNP rs1368408 (between SHGC-111280 and RH92492), which contains 11 genes.
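The QC filters described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the text does not state which HWE test was used, so a one-degree-of-freedom chi-square goodness-of-fit test is assumed, and both thresholds are applied to control genotype counts. The function names and the example counts are hypothetical.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square goodness-of-fit test; the paper does not
    specify the HWE test it used.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(control_genotypes, maf_min=0.01, hwe_p_min=1e-6):
    """Keep a SNP only if MAF >= 1% and HWE P >= 1e-6 in controls."""
    maf = minor_allele_frequency(*control_genotypes)
    hwe_p = hwe_chi2_pvalue(*control_genotypes)
    return maf >= maf_min and hwe_p >= hwe_p_min


if __name__ == "__main__":
    # Hypothetical control genotype counts (AA, AB, BB).
    for counts in [(25, 50, 25), (990, 10, 0), (90, 0, 10)]:
        print(counts, passes_qc(counts))
```

In this sketch the second example fails on MAF and the third on HWE, which mirrors the two reasons for exclusion given in the text.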
Identification of a susceptibility gene in a 1.0 Mb region surrounding rs1368408
A total of 83 SNPs in the exons and promoters of the 11 genes in the 1.0 Mb region surrounding rs1368408 were identified by re-sequencing. In addition, 39 SNPs in the intergenic sequences of the same region, distributed at approximate intervals of 5 Kb, were selected from the NCBI dbSNP. The allele frequencies for the 122 SNPs within the 1.0 Mb region were measured (Fig. 1D and E and Table 1) in 541 GD patients and 478 normal subjects from Shandong province. Data QC filters removed 18 SNPs from the analysis. Notably, out of the remaining 104 SNPs, 20 exhibit significantly different allele frequencies between the two groups, with P-values <0.05 (Table 1 and Fig. 2A). Further analysis of these 20 SNPs revealed that 7 are distributed in the Secretoglobin Family 3A Member 2 (SCGB3A2, also designated Uteroglobin-related protein 1, UGRP1) gene, including 4 in the promoter region [pSNPs: SNP72 (−1351, G/A); SNP74 (rs6882292); SNP75 (−623 to −622, AG/T); SNP76 (rs1368408)], 2 in the introns [iSNPs: SNP77 (rs2278376); SNP78 (rs3217372)] and one synonymous SNP in exon 3 [cSNP: SNP89 (rs34212847)] (Table 1 and Fig. 2A). The most significant associations were measured at SNP76 (P = 4.11 × 10⁻⁸) and SNP75 (P = 1.37 × 10⁻⁸) (Table 1 and Fig. 2A and C). SNPs with relatively weak, albeit significant, GD associations were also detected in adjacent regions, including the promoter/coding portions of the SPINK5 (2 SNPs), KIAA0555 (2 SNPs), MGC23985 (1 SNP) and SPINK1 (1 SNP) genes, and in intergenic regions (7 SNPs)
(Table 1 and Fig. 2A). Furthermore, in order to exclude false positives, we analyzed 20 neutral SNPs on different chromosomes as genomic controls (GCs). In the population, the GC inflation factor (λgc) was 0.5351. Our statistical results were all normalized to the GC. Notably, the above Mass array results were corroborated by the results of the sequencing analysis for the three pSNPs of the SCGB3A2 gene (SNP76, SNP75 and SNP74) in the Shandong population.
Figure 1. Linkage and association analysis, as well as SNP distribution and gene content of the region between markers D5s436 and D5s413 on chromosome 5q31. (A) Non-parametric LOD score (NPL) sketch map from the original genome scan (11). The multipoint analysis localized the GD susceptibility locus to within an approximate interval of 2 cM between markers D5s436 and D5s434. The multipoint LOD scores throughout this interval are greater than 3.0, with a maximum multipoint LOD score of 4.12 at the marker D5s2090 (11). (B) The 3.0 Mb region surrounding D5s2090, defined by a decrease in LOD score of 1.5 or more, an approximately 99% confidence interval for linkage, was used as the focal point of our search to identify candidate GD-susceptibility genes. The region identified contains 25 genes. Red lines represent the genes with forward orientations, and blue lines represent those with reverse orientations. The symbols 1–25 represent the gene names (Supplementary Material, Table S1). (C) The genotype results of the 179 SNPs selected from the 3.0 Mb region surrounding marker D5s2090 (D5s436–D5s413). Four SNPs with significant differences at the P < 0.001 level between GD and control subjects are marked in (B) with a red cross, and the most significant GD association was observed for SNP rs1368408 (P = 3.69 × 10⁻⁵) (see Supplementary Material, Table S2 for detailed information). (D) The positions of 122 SNPs located in 11 genes within the 1.0 Mb region (from marker SHGC-111280 to RH92492). Of these, 83 SNPs were identified by re-sequencing these genes and 39 SNPs were selected from the NCBI dbSNP at approximate intervals of 5 Kb, which are marked in (D) (see Supplementary Material, Table S1 for detailed information). Each gene is indicated by a box. (E) The SNPs located on exon 3, introns and the 5′ and 3′ flanking regions of the SCGB3A2 gene. A total of 38 SNPs were identified by re-sequencing a 15 Kb region of SCGB3A2.
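The case–control allelic test and the genomic-control normalization mentioned in the text can be sketched as below. Several points are assumptions rather than the authors' stated method: a 2 × 2 allelic chi-square test, a Woolf (log-based) 95% confidence interval for the OR, and the median-based estimator of the inflation factor. The paper does not say how λgc was estimated or whether it was floored at 1, as many implementations do. The allele counts are hypothetical, back-calculated from the 25.87% (control) and 37.69% (case) frequencies that the flattened Table 1 appears to list for SNP76 in Shandong, assuming complete genotyping of 541 cases and 478 controls.

```python
import math
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) for a 2 x 2 table of allele counts."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_pvalue_1df(chi2):
    """Upper-tail P-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio_ci(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Odds ratio with a Woolf confidence interval (assumed method)."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def inflation_factor(null_chi2_values):
    """Median-based genomic-control inflation factor from null SNPs."""
    return median(null_chi2_values) / CHI2_1DF_MEDIAN


def gc_adjust(chi2, lam):
    """Divide a test statistic by the inflation factor."""
    return chi2 / lam


if __name__ == "__main__":
    # Hypothetical allele counts: cases 408 minor / 674 major,
    # controls 247 minor / 709 major.
    counts = (408, 674, 247, 709)
    chi2 = allelic_chi2(*counts)
    print("chi2 =", chi2, "P =", chi2_pvalue_1df(chi2))
    print("OR, 95% CI =", odds_ratio_ci(*counts))
```

Because the counts are rounded reconstructions, the OR and interval only approximate the published values, and the P-value is not expected to reproduce the GC-normalized figure in Table 1.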
Next, the linkage disequilibrium (LD) regions of the 104 SNPs within the 1.0 Mb region were evaluated using the Haploview program (16). Three LD regions composed of these SNPs were observed in the Shandong population (Fig. 2A, bottom panel). They were located between SNP32 and SNP39, SNP65 and SNP103, and SNP141 and SNP148, respectively (Fig. 2A, bottom panel). Interestingly, when the 20 SNPs with significantly different allele frequencies between the GD and control populations in Shandong were examined for their locations within the LD block structure, 7 of them were found to be distributed in the middle region, whereas 5 and 8 of them, with relatively weak association signals with GD (P-value: 0.0402–0.0028), were found in the left and right LD blocks, respectively (Fig. 2A and Table 1). It was notable that all SNPs in this 1 Mb region with a P-value less than 0.001 are distributed in the middle region (Table 1 and Fig. 2A and C).
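For orientation, the pairwise LD measures that underlie LD block displays such as those produced by Haploview can be computed from two-locus haplotype counts as sketched below. This is only an illustration of D′ and r² for a single pair of biallelic SNPs, assuming phased haplotype counts are available; it does not reproduce Haploview's haplotype estimation or its block-definition rules, and the function name and counts are hypothetical.

```python
def pairwise_ld(h11, h12, h21, h22):
    """Return (D', r^2) for two biallelic loci.

    h11..h22 are counts (or frequencies) of the four haplotypes
    A1B1, A1B2, A2B1 and A2B2.
    """
    total = h11 + h12 + h21 + h22
    f11 = h11 / total
    p_a = (h11 + h12) / total  # frequency of allele A1
    p_b = (h11 + h21) / total  # frequency of allele B1
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = f11 - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / denom
    return d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical haplotype counts.
    print(pairwise_ld(50, 0, 0, 50))    # complete LD
    print(pairwise_ld(25, 25, 25, 25))  # linkage equilibrium
    print(pairwise_ld(60, 10, 0, 30))   # D' near 1 but r^2 below 1
```

The third example shows why the two measures are usually reported together: D′ reaches 1 whenever one of the four haplotypes is absent, while r² also reflects how well one SNP predicts the other.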
To identify causal variants of GD in this 1.0 Mb region, the genotype data of the 104 of the 122 SNPs suitable for logistic regression analysis in the Shandong population were further mined by logistic regression analysis (5,17) (Fig. 2D–K and Supplementary Material, Table S3). When SNP72 (−1351, G/A), SNP74 (rs6882292), SNP75 (−623 to −622, AG/T), SNP76 (rs1368408), SNP77 (rs2278376), SNP78 (rs3217372) and SNP89 (rs34212847) were individually put
Table 1. The name and location of the SNPs in the 1.0 Mb region around rs1368408 and the results of association analysis for these SNPs
Shandong: Control (%), Case (%), P-value, OR, OR (95% CI)
Shanghai: Control (%), Case (%), P-value, OR, OR (95% CI)
SNP27
SNP28
SNP29
SNP30
SNP31
SNP32
SNP33
SNP34
SNP35
SNP36
SNP37
SNP38
SNP39
SNP40
SNP41
SNP42
SNP43
SNP44
SNP45
SNP46
SNP47
SNP48
SNP49
SNP50
SNP51
SNP52
SNP53
SNP54
SNP55
SNP56
SNP57
SNP58
SNP59
SNP60
SNP61
SNP62
SNP63
SNP64
SNP65
SNP66
SNP67
SNP68
SNP69
SNP70
SNP71
SNP72
SNP73
SNP74
SNP75
SNP76
SNP77
SNP78
SNP79
SNP80
SNP81
SNP82
SNP83
SNP84
SNP85
SNP86
SNP87
STK32A
STK32A
STK32A
STK32A
STK32A
STK32A
STK32A
DPYSL3
DPYSL3
DPYSL3
DPYSL3
DPYSL3
DPYSL3
Intergenic
Intergenic
Intergenic
KIAA0555
KIAA0555
KIAA0555
KIAA0555
KIAA0555
KIAA0555
KIAA0555
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
SPINK1
SPINK1
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
40.19
33.54
40.68
6.00
39.53
32.30
39.90
6.13
0.7781
0.6548
0.7679
0.9437
0.97
0.95
0.97
1.02
34.35
32.06
40.08
5.50
33.57
33.37
37.71
6.64
0.7006
0.5351
0.3782
0.2177
0.97
1.06
0.91
1.22
40.36
28.57
23.20
12.46
23.27
23.12
23.31
11.57
31.43
37.78
26.32
22.10
11.06
19.68
22.78
23.19
11.15
28.13
0.2775
0.3571
0.5682
0.5450
0.0567
0.8358
0.9119
0.8028
0.1094
0.90
0.90
0.93
0.90
0.80
0.98
0.99
0.95
0.85
35.29
26.77
19.70
21.00
19.79
20.23
18.20
15.22
31.07
32.74
26.51
18.91
24.11
19.94
20.62
19.24
16.81
28.67
0.1698
0.8932
0.6068
0.0546
0.9244
0.8035
0.4877
0.3112
0.2762
0.89
0.99
0.95
1.20
1.01
1.02
1.07
1.13
0.89
40.70
23.96
12.58
33.28
46.05
49.91
31.95
38.39
19.66
9.78
36.86
47.02
43.24
28.49
0.2869
0.0264
0.0588
0.1573
0.7059
0.0028
0.0916
0.91
0.78 0.63 – 0.97
0.75
1.17
1.04
0.76 0.64 – 0.91
0.85
43.58
26.19
12.34
34.89
50.00
43.57
24.62
51.67
31.77
12.38
38.90
47.43
47.29
25.63
0.0010
0.0238
0.9729
0.0645
0.3252
0.1124
0.6438
1.38 1.14 – 1.68
1.31 1.04 – 1.66
1.00
1.19
0.90
1.16
1.05
28.22
35.36
6.85
26.57
30.72
16.05
10.30
6.05
14.43
4.48
4.39
8.38
17.60
10.12
20.40
19.39
29.32
41.03
6.17
32.45
34.96
18.09
10.08
8.92
13.92
5.08
4.46
7.84
19.90
11.94
16.26
20.95
0.6914
0.0203
0.5960
0.0097
0.0561
0.2757
0.9146
0.0402
0.7775
0.5594
0.9454
0.7822
0.2174
0.2485
0.0657
0.4110
1.06
28.21
1.27 1.04 – 1.55 38.18
0.89
4.81
1.33 1.08 – 1.64 26.12
1.21
33.01
1.16
15.15
0.98
8.78
1.53 1.08 – 2.17 7.82
0.96
13.60
1.14
4.04
1.02
4.59
0.93
24.92
1.16
17.86
1.20
8.28
0.76
17.87
1.10
17.43
24.36
41.81
6.15
24.73
31.39
17.01
8.17
8.85
13.77
4.79
5.87
25.89
14.98
9.01
18.81
19.90
0.0253
0.0780
0.2042
0.5234
0.3696
0.3043
0.5696
0.3837
0.8976
0.3359
0.1368
0.6140
0.0441
0.7375
0.5393
0.1463
0.82 0.69 – 0.98
1.16
1.30
0.93
0.93
1.15
0.92
1.14
1.01
1.20
1.30
1.05
0.81 0.66 – 0.99
1.10
1.06
1.18
18.93
20.32
0.1683
1.09
5.81
12.67
10.20
7.29
5.25
7.35
4.79
8.63
10.24
25.87
11.44
9.75
19.03
11.92
7.21
11.35
6.47
13.93
7.92
7.39
4.13
12.34
4.33
14.71
19.85
37.69
14.60
14.61
24.88
11.83
8.90
12.44
0.5554
0.4289
0.2082
0.9653
0.2659
0.0004
0.6520
0.0001
1.37 × 10⁻⁸
4.11 × 10⁻⁸
0.0443
0.0016
0.0896
0.9544
0.2027
0.4452
1.12
1.12
0.76
0.99
0.78
1.77
0.90
1.83
2.17
1.73
1.32
1.58
1.41
0.99
1.26
1.11
6.72
12.15
8.72
6.35
3.42
7.32
3.00
5.27
6.89
14.95
9.40
9.17
19.74
10.89
6.41
8.54
4.60
13.65
9.06
6.12
5.58
8.52
7.01
5.96
8.65
20.49
12.25
13.56
21.52
11.33
9.55
11.23
0.0448
0.3222
0.7672
0.8059
0.0153
0.2995
1.46 × 10⁻⁵
0.4986
0.1348
0.0012
0.0366
0.0014
0.4304
0.7379
0.0146
0.0427
0.67
1.14
1.04
0.96
1.67
1.18
2.44
1.14
1.28
1.47
1.35
1.55
1.11
1.05
1.54
1.35
6.43
12.60
9.00
6.54
3.84
7.48
3.65
6.83
8.66
19.47
8.72
8.70
18.94
11.38
8.40
10.12
6.11
13.45
9.02
6.50
4.47
9.38
4.31
8.43
11.12
23.59
10.19
10.67
20.71
11.67
9.82
11.61
0.5514
0.2500
0.9830
0.9512
0.1431
0.0035
0.1270
0.0041
7.62 × 10⁻⁵
1.43 × 10⁻⁶
0.0158
0.0033
0.0823
0.6778
0.0341
0.2018
0.95
1.08
1.00
0.99
1.17
1.28
1.19
1.26
1.32
1.28
1.19
1.25
1.12
1.03
1.19
1.17
5.36
11.86
12.67
6.49
11.67
12.76
0.3037
0.8998
0.9549
1.23
0.98
1.01
6.20
10.35
11.64
4.97
9.87
14.15
0.2119
0.7062
0.0918
0.79
0.95
1.25
5.44
11.67
11.19
6.10
11.36
12.03
0.5167
0.8300
0.2283
1.13
0.97
1.09
rs4705132
rs6894633
rs6580458
rs55936730
rs6884181
I7-1,+113728
rs918797
rs3805533
rs1049171
E14-2,+60813
rs3749721
I11-1,+54766
rs2241696
rs958677
rs7716144
rs981644
rs3763094
rs3763095
rs2116766
rs7735403
rs1432827
rs6895278
P1,−963
rs6895278
rs12655012
rs12659905
rs1016104
rs4705201
rs17107298
rs11319
P1,−133
rs3806925
rs4705194
rs1594671
rs1368412
rs1025489
rs7702893
rs3777125
rs6877288
rs6895894
rs6877478
rs7726085
rs7726552
rs7727031
P6,−1664
P5,−1351
P4,−1301 to −1303
P3,rs6882292
P2,−623 to −622
P1,rs1368408
I1-1,rs2278376
I1-2,rs3217372
rs10058203
rs2116805
I1.3,+1454
I1.4-1,+1779
rs13355689
I1.4-2,+1939
rs6859234
rs41291429
rs6859391
1.30 – 2.42
1.36 – 2.44
1.67 – 2.82
1.43 – 2.10
1.01 – 1.73
1.20 – 2.09
0.45 – 0.99
1.10 – 2.54
1.61 – 3.69
1.16 – 1.85
1.02 – 1.78
1.18 – 2.04
1.09 – 2.19
1.01 – 1.82
Combined Han: Control (%), Case (%), P-value
Position
Sequence near the polymorphism
146599399
146637662
146637777
146639268
146703083
146708589
146708683
146751943
146752171
146752641
146753229
146758688
146784559
146842422
146892300
146947379
146992484
146992496
147004669
147008123
147053948
147102213
147143408
147150530
147156489
147162250
147167239
147172314
147183226
147184385
147191585
147192922
147200067
147205835
147212204
147216985
147226926
147232327
147234886
147234964
147235002
147235514
147235795
147236113
147236803
147237116
147237164
147237749
147237844
147238355
147238688
147238737
147239235
147239598
147239920
147240245
147240275
147240405
147240931
147240961
147241008
GGGAGC[G/C]AACACT
TTCTAT[G/T]TTTACT
AGATCA[T/C]GTTTTA
TTGACA[C/T]AATTGC
ACTTTT[C/T]AGCTGG
CAAATG[C/T]TGTGCT
GAGGAT[A/G]AGTGAC
CTCTTA[G/A]TTTACA
ACCATT[G/A]TCTCTG
GGGGAA[C/T]TGGGAA
CCCTAG[G/A]GTCTGC
TCAAAA[C/T]CTCAAC
AGGTTG[G/A]ATTACA
CTGCTT[G/T]GATAGA
GCAAGG[C/T]GTTCCT
TAGGCA[G/T]GTTAAA
ACAGGA[C/T]GCCAGA
ACAACA[A/G]GAACTA
TGAACT[G/T]ATGGTG
GAGATA[T/C]ACTAAA
GAAATC[A/T]CTACTG
TTACTA[C/T]GTGCCA
NTATAG[T/C]TAGAAA
TTACTA[C/T]GTGCCA
ATTATT[C/T]AGGTAG
GCTATC[A/G]TTGGCT
TTATCA[A/G]TGTTAT
ATGTAG[A/C]GGTTAA
TTATCT[A/G]AAGTTT
GGTCAC[C/T]GCGAGG
TTTTCC[T/C]GACAGA
TCCTAG[C/T]GCTAAG
TAAGCC[A/G]AGTGTG
GCTTCC[A/G]AGCTTC
GTCAGC[C/T]CAATTT
CTCCCT[A/G]TCACAG
AAAAAA[A/T]TTCAAA
GTTATT[C/G]CAATCA
TCTGGG[A/G]TCTTGG
AATAAA[A/G]GTCGTT
ATTTGC[A/G]TATGAA
TGTATA[C/T]GTATGT
CTTGGC[A/T]TTTATA
GGCCTG[C/T]GTGGCA
ATTTAT[A/T]TATACT
TTCATG[G/A]TGTCTT
AAAGAT[AAA/−]GAAATG
ATTTAT[G/A]TTCCCA
TCAAA[AG/T]ACACT
TTGTTT[G/A]GTGAGA
AGTAAG[C/A]CTTGCC
TTTTTT[T/−]ATTTTA
GCTTCT[A/G]CCTAAG
CCTACA[A/C]TGGCAA
CACATG[C/A]ATGTGT
AAGGCT[C/T]ACCATC
TCCTAA[C/T]GGTTCC
CATGTT[A/G]GAATTA
ATGACG[A/G]AGAGTG
CTTCTC[C/T]GAGGAG
AGAAAG[C/G]TAAGTA
OR, OR (95% CI)
1.10– 1.49
1.09– 1.45
1.16– 1.50
1.17– 1.40
1.04– 1.35
1.09– 1.44
1.03– 1.38
Table 1. Continued
SNP symbols, Marker location, Description
SNP88
SNP89
SNP90
SNP91
SNP92
SNP93
SNP94
SNP95
SNP96
SNP97
SNP98
SNP99
SNP100
SNP101
SNP102
SNP103
SNP104
SNP105
SNP106
SNP107
SNP108
SNP109
SNP110
SNP111
SNP112
SNP113
SNP114
SNP115
SNP116
SNP117
SNP118
SNP119
SNP120
SNP121
SNP122
SNP123
SNP124
SNP125
SNP126
SNP127
SNP128
SNP129
SNP130
SNP131
SNP132
SNP133
SNP134
SNP135
SNP136
SNP137
SNP138
SNP139
SNP140
SNP141
SNP142
SNP143
SNP144
SNP145
SNP146
SNP147
SNP148
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
SCGB3A2
Intergenic
Intergenic
MGC23985
MGC23985
MGC23985
MGC23985
Intergenic
LOC391839
LOC391839
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
Intergenic
SPINK5
SPINK5
SPINK5
SPINK5
SPINK5
SPINK5
SPINK5
SPINK5
SPINK5
Intergenic
Intergenic
SPINK5L2
SPINK5L2
SPINK5L2
SPINK5L2
MGC21394
MGC21394
LOC402232
LOC402232
rs3910207
E3-1,rs34212847
rs3843496
rs3910183
3U1-1,+3679
rs17107376
rs61012413
rs60040551
rs4705204
rs17107378
rs17107379
rs17107380
rs17107381
rs7708635
rs1594666
rs1010764
rs1549204
rs2250145
rs1432974
P2,−1412
rs1153089
rs9654488
rs7712580
rs721570
rs1153084
rs11958481
rs1432973
rs7722416
rs4705047
rs7703202
rs1432978
rs4273592
rs961841
rs1363525
rs2161337
rs4421091
rs2080085
rs7700964
rs7703112
rs17775074
rs7706528
rs2895729
rs2287771
rs2303062
rs2303063
rs2303064
rs2303065
rs2303067
rs2303068
rs4349706
rs3088193
rs2303064
rs2287770
rs6881658
rs6895745
rs4705055
rs4269285
rs17096690
rs10477364
rs9325091
rs1023714
Shandong: Control (%), Case (%), P-value, OR, OR (95% CI)
Shanghai: Control (%), Case (%), P-value, OR, OR (95% CI)
Combined Han: Control (%), Case (%), P-value, OR, OR (95% CI)
6.99
28.85
11.85
2.89
11.59
11.46
3.99
11.97
28.56
12.21
2.51
12.24
11.95
4.32
0.0003
0.8912
0.8031
0.6245
0.6651
0.7441
0.7465
1.81 1.32 – 2.47 7.25
0.99
26.02
1.03
9.38
0.86
2.56
1.06
11.16
1.05
10.94
1.09
5.48
8.83
26.12
12.01
4.06
10.71
10.15
4.63
0.1653
0.9536
0.0535
0.0682
0.7626
0.5557
0.4249
1.24
1.01
1.32
1.61
0.96
0.92
0.84
8.05
27.94
10.67
2.88
11.07
11.24
6.03
9.54
28.49
11.44
2.85
11.38
11.34
5.83
0.0110
0.5728
0.2396
0.9413
0.6520
0.8866
0.7105
1.20 1.05 – 1.37
1.03
1.08
0.99
1.03
1.01
0.97
13.18
13.19
11.40
7.01
18.55
5.16
29.48
16.22
36.34
22.21
41.80
13.18
12.88
11.45
7.21
21.31
7.64
29.96
14.74
38.51
23.95
36.83
0.9968
0.8417
0.9737
0.8919
0.1572
0.0555
0.8524
0.4823
0.3311
0.4041
0.0301
1.00
0.97
1.00
1.03
1.19
1.52
1.02
0.89
1.10
1.10
0.81 0.67 – 0.98
10.88
12.69
11.03
8.85
17.01
9.46
26.30
16.60
35.86
20.64
16.11
11.38
10.95
7.77
19.65
9.22
27.96
17.17
36.89
22.72
0.0005
0.4096
0.9587
0.4127
0.1238
0.8523
0.3406
0.7005
0.5930
0.2342
1.57 1.22 – 2.03
0.88
0.99
0.87
1.19
0.97
1.09
1.04
1.05
1.13
11.38
13.49
10.31
10.25
18.89
10.08
12.65
12.76
10.48
10.10
20.41
10.27
0.0751
0.6327
0.7962
0.8283
0.0819
0.7830
1.13
0.94
1.02
0.98
1.10
1.02
39.39
5.65
32.26
48.89
2.63
35.83
0.0030
0.1172
0.1257
0.3542
0.6221
0.0603
1.47 1.14 – 1.90 36.36
0.45
11.53
1.17
39.38
8.78
0.84
9.80
0.93
25.07
0.56
10.48
35.93
15.29
38.46
7.14
9.17
26.43
10.77
0.8491
0.0529
0.6864
0.2466
0.6025
0.4501
0.8234
0.98
1.38
0.96
0.80
0.93
1.07
1.03
12.68
21.33
10.07
10.87
20.17
5.86
45.77
31.31
3.27
35.63
38.42
8.50
43.10
25.63
4.31
35.15
39.07
12.16
0.2593
0.0323
0.3337
0.8362
0.7863
0.0176
0.90
0.76 0.59 – 0.97
1.33
0.98
1.03
1.49 1.08 – 2.05
49.60
31.81
2.59
31.95
34.94
10.74
41.49
5.01
53.50
35.02
4.04
26.46
34.56
6.42
46.42
4.21
0.0615
0.0869
0.1169
0.0060
0.8427
0.0091
0.0146
0.3467
1.17
1.16
1.58
0.77 0.63 – 0.93
0.98
0.57 0.37 – 0.87
1.22 1.04 – 1.44
0.83
4.81
4.64
0.9176
0.96
20.79
21.85
16.22
16.22
0.0152
0.0029
22.36
25.33
0.1488
0.74 0.58 – 0.94 17.92
0.69 0.55 – 0.88 17.89
5.64
1.18
25.22
15.05
14.24
6.25
25.13
0.0775
0.0220
0.5716
0.9624
0.81
0.76 0.60 – 0.96
1.11
1.00
48.61
44.49
44.80
45.43
46.33
45.89
48.93
46.60
44.14
47.58
44.31
40.99
40.89
50.10
0.3809
0.8771
0.2628
0.6229
0.0197
0.0275
0.6241
0.92
0.99
1.12
0.96
0.80 0.67 – 0.96
0.82 0.68 – 0.98
1.05
51.31
40.03
41.43
45.69
44.64
43.11
49.21
47.19
48.98
46.81
47.87
47.35
44.04
52.86
0.0353
0.0001
0.0207
0.3687
0.3165
0.6690
0.0748
0.85 0.73 – 0.99
1.44 1.19 – 1.73
1.24 1.03 – 1.50
1.09
1.12
1.04
1.16
49.22
49.02
48.46
49.33
34.17
6.27
23.33
22.82
47.63
46.48
46.57
47.61
38.37
7.11
22.59
22.33
0.5029
0.2854
0.4198
0.4600
0.1137
0.5192
0.7546
0.8401
0.94
0.90
0.93
0.93
1.20
1.14
0.96
0.98
47.19
48.76
48.87
47.69
35.19
6.55
21.56
21.55
43.05
44.11
43.93
46.72
28.81
9.07
17.40
18.95
0.0595
0.0221
0.0112
0.6422
0.0125
0.0569
0.0079
0.0964
0.85
0.83
0.82
0.96
0.75
1.42
0.77
0.85
0.71 – 0.97
0.70 – 0.96
0.59 – 0.94
0.63 – 0.93
Position
Sequence near the polymorphism
147241677
147241803
147242005
147242108
147242145
147242413
147243151
147243346
147243630
147243731
147243784
147243815
147243864
147244810
147245459
147247923
147251658
147266247
147266805
147267670
147267677
147270206
147280966
147283244
147304359
147307881
147311055
147314223
147322787
147323729
147328074
147334766
147338580
147363753
147370184
147377498
147389414
147391761
147397943
147411105
147417300
147421180
147425205
147460200
147460220
147460273
147460305
147461148
147461211
147496791
147496955
147508590
147519728
147528521
147528541
147528612
147529356
147562510
147566982
147602376
147604459
GTTTCC[C/T]CATCAG
CTTGGT[G/A]TGACAT
CCAGAT[C/T]AGTTTT
CTCTAA[G/T]TTAAAC
ATCTCA[T/C]GGTGTT
TTTCCT[C/T]TACTCT
CACCTA[C/G]TTGACT
CTTTCA[C/T]TCTGTG
CATATT[G/T]ATGCAT
TCCTAT[A/G]GGAAAG
TTACTT[A/G]ATGACT
TAGATG[A/C]CTCTCA
TCTTTC[C/T]GCCTAC
CCATCA[G/T]CCATAC
TGTAGA[C/G]AAGCTG
ACTAAT[A/C]ACCATG
AAAATT[A/C]TTTGTG
CTGTCT[C/T]AGTACT
GAAGGT[A/G]TCACAA
CACGGT[A/G]GCTCAC
AGCCAG[G/A]CACGGT
AGGAAG[A/G]AAAAAC
GAGAAA[C/T]TTCAAA
TCATTA[A/G]AGGAAA
CCCTGA[A/G]CCTTCA
AGTCAT[G/T]AGAAAA
ATGCTA[A/G]GATGAT
AATGCC[A/C]GTCAGC
AGACGA[G/T]CTAATT
CCATTG[C/T]TCTGTG
ATTCTG[A/T]GAAGTT
TGGTAG[C/T]GGTGAT
TCTTCC[G/T]TTCAAT
GGTTTT[C/T]CTGTGT
GACTCA[A/G]TGATAC
AATATA[A/G]TTCTGA
CCCCTG[C/T]CAACAA
CCTACA[C/T]CTCTTT
GAAATA[A/G]TTTAAT
TGCTCT[A/G]TGGCTT
TACATA[C/T]GGTGAG
CATACA[C/T]GTACAA
CCTTCA[T/C]GTTAAT
TCTTCT[A/G]TCTCGG
TTTGCA[G/A]TGAATA
GAGAAC[G/A]ATCCTA
AGTGCA[T/C]GGCAAC
GAAGGT[A/G]AATCAA
CCTCCA[G/A]CAACTC
CCCCAG[T/C]TCTGAA
AGACAT[C/G]TCCACC
GAGAAC[A/G]ATCCTA
AGGTGA[C/T]GCTGAA
AACCTT[T/G]CATAGT
GAACTT[G/C]CAATCA
AGAGCA[C/T]ATCAGC
AATGGG[T/C]GGAGTA
AATATA[A/G]GAATCA
CATTTC[A/G]TATCTC
AAGGAC[A/T]ACCAGG
ATTCAG[A/T]TCTTAA
P-values in bold indicate allele frequencies that differ significantly between GD and normal subjects. Blank P-value entries indicate the 40 SNPs with MAF <1% or HWE P < 1 × 10⁻⁶ in controls that were removed from the analysis.
Figure 2. The results of the association and logistic regression analysis for SNPs located in the 1 Mb region around SNP rs1368408 in the Shandong, Shanghai and combined Han populations. A total of 122 SNPs located in the 1 Mb region around SNP rs1368408 were genotyped in all subjects from Shandong and Shanghai. After removal of the SNPs with MAFs <1% or HWE P < 1 × 10⁻⁶ in the controls, the SNP case–control associations [−log10(P-value) plotted against location in megabases] and the SNP linkage disequilibrium (LD) region analysis for the Shandong and Shanghai populations are presented in (A) and (B), respectively. The SNPs from the SCGB3A2 region that have strong associations with GD are marked within the red vertical lines. The most significantly associated SNPs are located in the SCGB3A2 gene in two independent studies, with the smallest P-values of 1.37 × 10⁻⁸ and 1.46 × 10⁻⁵ in the Shandong (top portion of A) and Shanghai populations (top portion of B), respectively (see Table 1 for detailed information). The LD regions of these SNPs in the 1.0 Mb region were analyzed with Haploview software in the Shandong (bottom portion of A) and Shanghai populations (bottom portion of B). Three LD blocks composed of these SNPs are observed in these two independent populations (bottom portion of A and B). They are located between SNP32 and SNP39, SNP65 and SNP103, and SNP141 and SNP148, respectively. The SNPs in the SCGB3A2 gene are marked by the rectangle. (C) The 38 SNPs located in the 15.0 Kb region of SCGB3A2 were genotyped in 2811 case subjects with GD and 2807 control subjects in the combined Chinese Han population. The case–control association plots [−log10(P-value)] for the SNPs located in the 15 Kb region are magnified for the Shandong, Shanghai and combined Chinese Han populations in (C). (D–K) Two-locus logistic regression analyses of SNP75 (−623 to −622, AG/T) and SNP76 (rs1368408) in the Shandong (D–G) and combined Han (H–K) populations. SNP75 and SNP76 were put individually into the regression models as the best markers in the SCGB3A2 gene, and all other markers were sequentially added to see if a second locus could improve the model. In the Shandong population, 8 of the 104 SNPs suitable for logistic regression analysis improved the model with SNP75 (D) and 8 markers improved the model with SNP76 (F), at the P < 0.01 level. In contrast, when we tested a regression model by taking each one of the 104 loci in turn and adding the test locus to it, all the markers could be improved by adding SNP75 (E) or SNP76 (G) (see Supplementary Material, Table S3 for detailed information). Moreover, in the combined Han population, 6 of the 33 SNPs suitable for logistic regression analysis improved the model with SNP75 (H) and 10 markers improved the model with SNP76 (J), at the P < 0.01 level. In contrast, when we tested a regression model by taking each one of the 33 loci in turn and adding the test locus to it, all the markers could be improved by adding SNP75 (I) or SNP76 (K) (see Supplementary Material, Table S4 for detailed information).
into the regression models as the best marker for the region of SCGB3A2, only SNPs in the other three regions could improve these models, with a cut-off P-value <0.01 (SNP47, SNP53; SNP107; and SNP128, respectively) (Fig. 2D and F and Supplementary Material, Table S3). However, SNP47, SNP107 and SNP128 are unlikely to contribute to the susceptibility to GD because their mutation frequencies are lower in the GD group than in the control group (Table 1). Next, we tested the regression model by taking each one of the 104 loci in turn and adding the test locus to it. Interestingly, the majority of the markers could be improved by adding each of the SNPs in the SCGB3A2 gene (SNP72, SNP74, SNP75, SNP76, SNP78 and SNP89) (Supplementary Material, Table S3 and Fig. 2E and G). In contrast, only 16 SNP models could be improved by SNP53, at a P-value of less than 0.01 (Supplementary Material, Table S3). These results suggested that SNP53 was not likely to be the causal variant, owing to its very limited impact on the overall model of SNPs, and that the weak association we observed for this SNP was probably due to LD with causal variants residing in the SCGB3A2 region. With regard to the SNPs in the SCGB3A2 region, SNP76 (rs1368408) and SNP75 (−623 to −622, AG/T) are probably the most important for the susceptibility to GD because they improve the model with any one of the 104 SNPs, with the lowest P-values among the SNPs of SCGB3A2 (Supplementary Material, Table S3 and Fig. 2E and G). However, these results do not exclude the possibility that multiple SNPs located in the SCGB3A2 region act in combination to increase the risk of GD.

Table 2. Frequencies of SCGB3A2 haplotypes in different populations

Population              Haplotype (SNP no.)          Control          Case             P-value       OR    95% CI
                        71 72 73 74 75 76 77 78 89   Number  Percent  Number  Percent
Shandong      SNP74     0  0  0  1  0  1  1  1  0    28      2.94     79      7.30     1.11 × 10⁻⁵   2.60  1.67–4.04
              SNP75     0  1  0  0  1  1  0  0  1    56      5.88     108     9.98     7.04 × 10⁻⁴   1.77  1.27–2.48
                        0  0  0  0  1  1  0  0  0    16      1.68     51      4.71     1.31 × 10⁻⁴   2.89  1.64–5.11
              SNP76     0  0  0  0  0  1  0  0  0    33      3.47     58      5.36     0.0392        1.58  1.02–2.44
                        1  0  1  0  0  1  1  1  0    11      1.16     29      2.68     0.0135        2.36  1.17–4.74
              Other     0  0  0  0  0  0  0  0  0    651     68.38    645     59.61    4.04 × 10⁻⁵   0.68  0.57–0.82
                        0  0  0  0  0  0  1  0  0    28      2.94     2       0.18     2.67 × 10⁻⁷   0.06  0.01–0.26
Shanghai      SNP74     0  0  0  1  0  1  1  1  0    63      5.24     48      5.39     0.8781        1.03
              SNP75     0  1  0  0  1  1  0  0  1    84      6.99     74      8.31     0.2564        1.21
              SNP76     1  0  1  0  0  1  1  1  0    36      3.00     45      5.06     0.0157        1.72  1.10–2.70
              Other     0  0  0  0  0  0  0  0  0    995     82.78    674     75.73    7.23 × 10⁻⁵   0.65  0.52–0.80
Combined Han  SNP74     0  0  0  1  0  1  1  1  0    200     3.80     257     5.20     0.0007        1.39  1.15–1.68
              SNP75     0  1  0  0  1  1  0  0  1    401     7.62     440     8.90     0.0192        1.18  1.03–1.36
              SNP76     1  0  1  0  0  1  1  1  0    161     3.06     187     3.78     0.0448        1.24  1.00–1.54
              Other     0  0  0  0  0  0  0  0  0    4140    78.68    3655    73.90    1.35 × 10⁻⁸   0.77  0.70–0.84

Bold letters indicate those haplotypes with significant differences between GD and normal subjects. All data shown here are haplotypes whose frequencies are more than 2%.
Because multiple SNPs may act in combination to increase the risk of disease, haplotypes of the SNPs on the SCGB3A2 gene were investigated and their frequencies in the GD and control groups were compared. In the population of Shandong Province, 7 haplotypes with a frequency of more than 2% were formed from 9 SCGB3A2 SNPs and accounted for 85% of all haplotypes (Table 2). Five of these haplotypes showed significantly higher frequencies among individuals with GD than the control group. As shown in Table 2, the haplotype 000101110 displayed the highest statistical difference (P = 1.11 × 10⁻⁵, OR 2.60, CI 1.67–4.04), followed by haplotype 000011000 (P = 1.31 × 10⁻⁴, OR 2.89, CI 1.64–5.11) and 010011001 (P = 7.04 × 10⁻⁴, OR 1.77, CI 1.27–2.48). In contrast, haplotype 000000000 was more frequently observed in controls than in GD patients (P = 4.04 × 10⁻⁵, OR 0.68, CI 0.57–0.82) (Table 2). Notably, all the haplotypes with close associations with GD contained one or two variants of the SNP76, SNP75 or SNP74 alleles (Table 2).
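As an illustration of how an odds ratio and its 95% confidence interval can be obtained from haplotype counts, the following minimal Python sketch applies the standard log-scale (Woolf) interval to the Shandong haplotype 000101110. It is not taken from the study's own analysis pipeline. The haplotype counts (79 case and 28 control chromosomes) come from Table 2, whereas the chromosome totals (1082 and 952) are an assumption inferred from the reported percentages, and the function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(case_hap, case_total, ctrl_hap, ctrl_total, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for one haplotype."""
    a, b = case_hap, case_total - case_hap  # cases: with / without the haplotype
    c, d = ctrl_hap, ctrl_total - ctrl_hap  # controls: with / without the haplotype
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Shandong haplotype 000101110: 79 case and 28 control chromosomes (Table 2).
# The totals 1082 and 952 are assumptions inferred from the reported percentages.
or_, lo, hi = odds_ratio_ci(79, 1082, 28, 952)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Under these assumed totals the sketch lands close to the reported OR of 2.60 (CI 1.67–4.04); a different denominator or interval method would shift the result slightly.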
At the same time, a replication study was performed in 545 patients and 603 normal subjects from Shanghai, a metropolitan city in China where many individuals come from different regions and have multiple founders. After 16 SNPs were removed from the analysis by data QC filters (15), 24 of the remaining 106 SNPs showed different distributions between patients and controls in the Shanghai population analysis (Table 1 and Fig. 2B and C). Of those, nine were found in the SCGB3A2 gene, with the most significant association found in the promoter of this gene [SNP73 (−1301 to −1303, AAA/−), P = 1.46 × 10⁻⁵, OR 2.44, CI 1.61–3.69] (Table 1). Interestingly, three SNPs in the SCGB3A2 gene, named SNP76 (rs1368408), SNP77 (rs2278376) and SNP78 (rs3217372), had significant frequency differences between patients and controls in two independent populations (Table 1). We noticed that SNP76 (rs1368408), the nucleotide variant with the most significant association signal in the Shandong population, also exhibited significantly higher allele frequencies in patients with GD than in healthy individuals collected from Shanghai (20.49 versus 14.95%, P = 1.20 × 10⁻³, OR 1.47, CI 1.16–1.85) (Table 1). In the Shanghai population, only one haplotype, 101001110, was more frequent in individuals with GD than in the controls (P = 0.0157, OR 1.72, CI 1.10–2.70) (Table 2). All of these results strongly suggested that SNP76 and related haplotypes conferred susceptibility to GD.
Table 3. False positive report probability (FPRP) values for eight SNPs with significant differences between 2811 patients with GD and 2807 healthy individuals

SNP     Description       Odds ratio    Reported     Statistical power   Prior probability
symbol                    (95% CI)      P-value      under recessive     0.25    0.1     0.01    0.001   0.0001  0.00001
                                                     model (a)
SNP72   P5, −1351         1.28 (1.10–1.49)  0.0035       0.980           0.004   0.013   0.128   0.596   0.937   0.993
SNP74   P3, rs6882292     1.26 (1.09–1.45)  0.0041       0.993           0.004   0.011   0.112   0.559   0.927   0.992
SNP75   P2, −623 to −622  1.32 (1.16–1.50)  7.62 × 10⁻⁵  0.975           0.000   0.000   0.002   0.021   0.175   0.680
SNP76   P1, rs1368408     1.28 (1.17–1.40)  1.43 × 10⁻⁶  1.000           0.000   0.000   0.000   0.000   0.001   0.007
SNP77   I1-1, rs2278376   1.19 (1.04–1.35)  0.0158       1.000           0.020   0.058   0.405   0.873   0.986   0.999
SNP78   I1-2, rs3217372   1.25 (1.09–1.44)  0.0033       0.994           0.006   0.018   0.166   0.667   0.953   0.995
SNP81   I1.3              1.19 (1.03–1.38)  0.0341       0.999           0.060   0.161   0.679   0.955   0.995   1.000
SNP89   E3-1, rs34212847  1.20 (1.05–1.37)  0.0110       1.000           0.021   0.059   0.409   0.875   0.986   0.999

(a) Statistical power is the power to detect an odds ratio of 1.5 for the homozygotes with the rare genetic variant, with an α level equal to the reported P-value. FPRP values below 0.2 are in bold face.
To further confirm the associations of the SCGB3A2 variants with GD susceptibility, the 15 Kb region containing the SCGB3A2 gene and its 5′ and 3′ flanking regions was completely re-sequenced. A total of 38 SNPs were found in this region, with 13, 12 and 13 SNPs distributed in the exons and introns, the 5′ flanking region and the 3′ flanking region of SCGB3A2, respectively. Subsequent association analyses of the 38 SNPs residing in and around the SCGB3A2 gene were performed in 2811 patients with GD and 2807 healthy individuals, who were collected from Jiangsu, Henan, Anhui and Fujian Province, along with samples collected from the Shandong and Shanghai populations; all subjects were from the Chinese Han population. After excluding 5 SNPs with HWE P ≤ 10⁻⁶ in controls, 8 of the remaining 33 SNPs in the SCGB3A2 region had significant frequency differences between the patients with GD and healthy individuals in the combined Han population. Similarly, the most significant differences between the GD patients and controls were measured for SNP76 and SNP75, which are located in the promoter of SCGB3A2 (P = 1.43 × 10⁻⁶ and 7.62 × 10⁻⁵, respectively) (Table 1 and Fig. 2C). Interestingly, in the Chinese Han cohorts recruited from different geographic regions of China, four haplotypes with frequencies higher than 2% were formed from nine SCGB3A2 SNPs and accounted for more than 90% of all haplotypes (Table 2). Three of these haplotypes had significantly higher frequencies among patients with GD than the control group. Notably, similar to the results from the Shandong population studies, the haplotypes of the SNP76 (rs1368408, G/A)+SNP74 (rs6882292, G/A) (000101110) or SNP76+SNP75 (−623 to −622, AG/T) (010011001) variants also correlated with high disease susceptibility in the combined Chinese Han cohort (P = 0.0007 and P = 0.0192, respectively) (Table 2). In contrast, haplotype 000000000 was more frequently observed in the control group than in GD patients (P = 1.35 × 10⁻⁸, OR 0.77, CI 0.70–0.84) (Table 2). Moreover, the results of logistic regression analysis of the combined Chinese Han cohorts also suggested that the pSNPs in the SCGB3A2 gene, SNP76 and SNP75, were the strongest determinants of susceptibility to GD because they improved the model when combined with any one of the other 33 SNPs (Fig. 2H–K and Supplementary Material, Table S4). All the results strongly suggested that SNP76 and SNP75 conferred susceptibility to GD, particularly when they existed in haplotypes of SNP76 and SNP75 or SNP76 and SNP74 (Table 2).
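The nine-digit haplotype codes used above can be read position by position. The short Python sketch below decodes a code into the SNPs carrying the variant allele. It assumes that the digits follow the SNP order of the Table 2 header (SNP71, SNP72, SNP73, SNP74, SNP75, SNP76, SNP77, SNP78, SNP89) and that 1 marks the variant allele and 0 the reference allele; the function name `decode_haplotype` is hypothetical.

```python
# SNP order of the nine-digit haplotype codes (assumed to follow the Table 2 header).
SNP_ORDER = ["SNP71", "SNP72", "SNP73", "SNP74", "SNP75",
             "SNP76", "SNP77", "SNP78", "SNP89"]


def decode_haplotype(code: str) -> list[str]:
    """Return the SNPs at which a haplotype code carries the variant allele ('1')."""
    if len(code) != len(SNP_ORDER) or set(code) - {"0", "1"}:
        raise ValueError(f"expected {len(SNP_ORDER)} binary digits, got {code!r}")
    return [snp for snp, bit in zip(SNP_ORDER, code) if bit == "1"]


for code in ("000101110", "010011001", "000000000"):
    print(code, decode_haplotype(code))
```

Read this way, 000101110 carries the SNP74 and SNP76 variants and 010011001 carries the SNP75 and SNP76 variants, which agrees with how the text labels these two haplotypes.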
The false positive report probability (FPRP) of the SNPs with significant association with GD in the combined Chinese Han cohorts was also analyzed. In the present study, for each genetic variant, the FPRP value was calculated using the assigned prior probability range, the statistical power to detect an odds ratio (OR) of 1.5, and the detected ORs and P-values. As shown in Table 3, among the eight genetic variants with a significant difference between the patients with GD and healthy individuals, the FPRP values of five SNPs were below 0.2 for prior probabilities from 0.25 to 0.01, which is a relatively high prior probability range. However, the FPRP values for SNP76 and SNP75 were very low even for low prior probabilities, remaining below 0.2 even for a prior probability of 0.0001 (0.001 and 0.175, respectively). This was especially true for SNP76, for which the FPRP value was 0.007 even for a prior probability of 0.00001 (Table 3). Interestingly, the case–control study of these eight SNPs with significant differences between the 2811 patients with GD and 2807 healthy individuals has more than 97% statistical power to detect a SNP with an α level equal to its reported P-value, corresponding to a relative risk of 1.5 for GD (Table 3).
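For orientation, the FPRP combines the significance level α, the statistical power 1 − β and the prior probability π as FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. The Python sketch below evaluates this expression over the prior probabilities listed in Table 3, using the reported P-value and power of SNP72 as inputs. It is a minimal illustration rather than the spreadsheet used in the study: that spreadsheet may derive α differently (for example from the OR and its confidence interval), so the printed values need not match Table 3 exactly. The function name `fprp` is hypothetical.

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability for a single association finding."""
    false_positive = alpha * (1.0 - prior)  # significant although no true association
    true_positive = power * prior           # significant and truly associated
    return false_positive / (false_positive + true_positive)


# Prior probabilities considered in Table 3.
priors = [0.25, 0.1, 0.01, 0.001, 0.0001, 0.00001]

# Inputs taken from the SNP72 row of Table 3 (reported P-value and power).
for prior in priors:
    print(f"prior {prior:g}: FPRP {fprp(alpha=0.0035, power=0.980, prior=prior):.3f}")
```

The loop makes the qualitative behavior visible: for a fixed P-value and power, the FPRP rises steeply as the prior probability falls, which is why only findings with very small P-values stay below 0.2 at priors of 0.0001 or less.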
The pSNPs (SNP76, SNP75 and SNP74) most strongly
associated with GD are also correlated with lower
SCGB3A2 expression
Since the GD-associated SNP haplotypes are located in the SCGB3A2 promoter, a region that contains relatively well-conserved transcription factor binding sites (Fig. 3), we hypothesized that these pSNPs may affect the expression of SCGB3A2. To test this, seven SCGB3A2 promoter/luciferase reporter constructs were made and transfected into HeLa and SPC-A1 cells, which are human cervical carcinoma and lung carcinoma cell lines, respectively (Fig. 4A and B). As shown in Figure 4, the luciferase activities of pGL3-(SNP76+SNP75), pGL3-(SNP76+SNP74) and pGL3-(SNP76+SNP75+SNP74) were decreased in both HeLa and SPC-A1 cells, relative to other haplotypes (Fig. 4A and B). The co-transfection of thyroid transcription factor-1 (TTF-1) increased the overall luciferase activity levels, while the relative influence of the pSNPs on reporter gene expression remained unchanged. Next, using electrophoretic mobility shift assays (EMSAs), we asked whether the GD susceptibility alleles of SNP76, SNP75 and SNP74 affected the binding of transcription factors to the SCGB3A2 promoter. Two main bands, I and II, were identified in the EMSAs with each of the SNP76, SNP75 and SNP74 probes after incubation with nuclear extracts of SPC-A1 cells. Compared to probes for alleles not linked to GD, the SNP76 and SNP75 probes produced one band of stronger intensity (Fig. 4C), whereas one band produced by the SNP74 probe was less intense (Fig. 4C). In addition, the use of unlabeled AG and T allele probes to compete with the labeled AG allele probe of SNP75 showed that the T allele was better able to compete with the AG allele for binding in band II. These data also suggested that the susceptible T allele of SNP75 had higher binding affinity for the transcription factor in band II than the non-susceptible AG allele (Fig. 4C, right panel).

Figure 3. Sequence conservation and transcription factor binding sites near SNP76, SNP75 and SNP74, as predicted by the UCSC web site (http://genome.ucsc.edu/) and the Alibaba 2.1 software, respectively. In the human sequence, two TTF-1, one NF-κB and one C/EBPα binding site near SNP76, and one TTF-1 motif adjacent to SNP75, were predicted. In the presence of the pSNPs, the C/EBPα site near SNP76 disappears, while a TBP binding site appears at the positions of SNP75 and SNP74. The broken lines indicate the putative TBP binding sites in alleles carrying SNP75 or SNP74.
We next sought to determine whether the differential promoter activity associated with the GD-linked SNPs is also seen in thyroid tissue. Samples of thyroid tissue were collected from 93 patients in Shandong Province with thyroid adenoma or multinodular goiter but not hyperthyroidism. The expression of the SCGB3A2 gene in the thyroid tissue samples derived from patients with SNP76+SNP75 and SNP76+SNP74 alleles was significantly lower than in samples from patients with wild-type alleles (P = 0.047 and 0.027, respectively) (Fig. 4D and E). The effect of these SNPs on SCGB3A2 transcription was further confirmed using allele-specific transcript quantification (ASTQ) (5). The relative contribution of each haplotype to SCGB3A2 transcript production in five samples of thyroid tissue from heterozygous individuals was evaluated using a Mae III restriction fragment length polymorphism (RFLP) located at SNP rs34212847 (SNP89) in exon 3 of the SCGB3A2 gene (Fig. 4F). As shown in Figure 4Fc, the intensities of the 380 and 253 bp bands represented the SCGB3A2 mRNA levels transcribed from the GD-susceptible haplotype T:A:A and the non-susceptible haplotype AG:G:G, respectively, at the corresponding positions (SNP75, SNP76 and SNP89). Because the intensities of the bands depend on the lengths of the digested RT–PCR products from ASTQ, when the amounts of mRNA transcribed from the two alleles are equal, the ratio of the intensities between the 380 bp band and the 253 bp band should theoretically be greater than 1:1. In fact, when equal amounts of the ASTQ products amplified from the lung tissue of six individuals homozygous for the AG:G:G alleles were separated on an agarose gel, with or without Mae III digestion, the actual ratio of intensities between the 380 bp band and the 253 bp band was 2.1 ± 0.4 (mean ± SD) (Fig. 4Fb). However, the ratio of the ASTQ bands derived from thyroid tissues of five individuals heterozygous at the SNP75, SNP76 and SNP89 positions was 0.9 ± 0.2 (mean ± SD) (Fig. 4Fc), which was significantly lower than that measured in individuals homozygous for the non-susceptible alleles (P < 0.001). These results suggested that SCGB3A2 mRNA levels were lower in individuals with the GD-susceptible haplotype.
The expression pattern of SCGB3A2 and its receptor
MARCO gene in human and mouse
With regard to SCGB3A2 gene expression, it was previously reported that the highest mRNA level was observed in the human lung by northern blot analysis, whereas low expression was also detected in human thyroid tissue (18). In the present study, using RT–PCR analysis, we confirmed that the mRNA of SCGB3A2 was expressed at a high level in lung tissues in both mouse and human, though low-level transcripts were also present in thyroid and kidney in human, and in adrenal gland, thymus, brain, muscle and skin in mice (Fig. 5A). Recently, the macrophage scavenger receptor with collagenous structure (MARCO) protein was identified as a receptor for SCGB3A2 (19). Interestingly, we found by semi-quantitative RT–PCR analysis that MARCO was expressed in a wide range of tissues, including immunity-related ones such as spleen, thymus, lymph node and liver (Fig. 5B).

Figure 4. The effect of SNPs on SCGB3A2 expression in in vitro and in vivo analyses. Relative luciferase activities of the reporter plasmids containing SCGB3A2 promoter regions with distinct pSNP combinations and a wild-type control were detected in SPC-A1 (A) and HeLa (B) cell lines. Open and filled bars represent co-transfection with or without a plasmid expressing the gene for thyroid transcription factor-1 (TTF-1). Luciferase activities are normalized according to pRLO activity, and relative luciferase activity (fold) is expressed as the fold induction relative to the transfection of empty vector (pGL3-Basic) in each reporter gene assay. The results are the average of three independent experiments performed in triplicate. The bars indicate the standard error. (C) Binding affinity of nuclear factors to the 25–50 bp promoter regions around SNP76, SNP75 and SNP74 of SCGB3A2. The 25–50 bp oligonucleotides, including wild-type and mutant alleles of SNP76, SNP75 and SNP74 of SCGB3A2, were labeled with [γ-32P] dATP. Arrows indicate the bands of the EMSAs using each of the SNP76, SNP75 and SNP74 probes incubated with nuclear extracts from SPC-A1 cells. Top and bottom arrows correspond to band I and band II, respectively. LWP and NLWP: the labeled and unlabeled wild-type probes, respectively; LMP and NLMP: labeled and unlabeled mutant probes, respectively; NP: extracted nuclear protein. (D) The expression levels of SCGB3A2 in thyroid tissues with haplotype SNP76+SNP75 (n = 11) were significantly decreased based on real-time RT–PCR analysis, as compared with those devoid of SNP76, SNP75 or SNP74 (n = 16). (E) Comparison of SCGB3A2 gene expression with the SNP76+SNP74 haplotype (n = 5) and wild-type haplotypes. (F) ASTQ of SCGB3A2 using the Mae III RFLP located at position SNP89 (rs34212847) G/A in exon 3 of RNA (cDNA) derived from thyroid tissues of five individuals heterozygous at the SNP75, SNP76 and SNP89 positions (Fb). Relative contributions of the susceptible (SNP75T–SNP76A–SNP89A) and non-susceptible (SNP75AG–SNP76G–SNP89G) haplotypes to SCGB3A2 expression are presented as a SNP89 A (380 bp) to G (253 bp) ratio. The smaller bands from the SNP89 G allele (127 bp) are not included in calculating the ratio, owing to their weak intensities. As a result, in six control samples homozygous for the SNP89 G allele (Fc), the mean ratio of intensity between the 380 bp band and the 253 bp band was 2.1:1, instead of the theoretical ratio of 1:1.
DISCUSSION
Our case–control study of 2811 GD patients and 2807 healthy individuals, using a large number of SNPs located in the 5q12–q33 region, which is linked to GD, identified and validated a new gene (SCGB3A2) associated with GD. Significant associations between GD and several SNPs in the SCGB3A2 gene were identified, with the strongest associations mapped to SNPs in the SCGB3A2 promoter (pSNPs) (SNP76 and SNP75). Furthermore, the results of the logistic regression analysis in the combined Chinese Han cohorts suggested that these two SNPs were probably the causal variants because they improved the model when combined with any one of the other 33 SNPs in the SCGB3A2 gene. Interestingly, in our study cohorts that were recruited from different geographic regions of China, three haplotypes showed significantly higher frequencies among patients with GD than among the control individuals. However, the haplotypes contributing to the susceptibility to GD were different in two subsets, the Shandong and Shanghai populations. The haplotypes of the SNP76 (rs1368408, G/A)+SNP74 (rs6882292, G/A) (000101110) or SNP76+SNP75 (−623 to −622, AG/T) (010011001) variants were correlated with high disease susceptibility in the Shandong subset, whereas a significant association between the haplotype of SNP76+SNP73 (−1301 to −1303, AAA/−)+SNP71 (−1664, A/T) (101001110) and GD was identified in the Shanghai subset. These results were similar to the observation in most Mendelian monogenic disorders, in which a spectrum of different mutations in a gene (or genes) causes a disease (20). The notion was supported by the recent study showing that rare DNA sequence variants in some genes collectively contributed significantly to low plasma levels of HDL-C, a common quantitative trait (21). In fact, previous studies have also documented that the causal variants in a gene differ among ethnic and geographic populations with a common complex disease (5,6,22).

Figure 5. RT–PCR analysis of the expression of SCGB3A2 and the gene for its receptor, MARCO, in different human and mouse tissues. (A) The SCGB3A2 transcript was detected at a high level in lung tissues from both mouse and human, while low-level expression was detected in human thyroid and kidney, and in adrenal gland, thymus, brain, muscle and skin of mice. (B) The MARCO gene was expressed at a high level in lung and liver (human and mouse), mammary gland (human), submandibular gland, spleen, thymus and epididymis fat (mouse), while a low level of expression was measured in thyroid and muscle (human and mouse), lymph node (human) and testis (mouse).
However, as we were concluding our study, a study describing the lack of an association between SCGB3A2 and GD was reported (23). This report shows that the allele frequency distribution of the SNPs within the SCGB3A2 gene does not differ significantly between 146 GD patients and 142 unrelated controls (23). However, the sample size in that study was relatively small, and the number of SNPs used was limited. Indeed, in recent years, some statisticians have suggested that the prior odds against an association in a case–control study would usually exceed 1000:1, even for candidate genes, and may even exceed 10 000:1 for random polymorphisms (24). The arguments of Wacholder et al. (25) would then suggest the use of statistical significance levels in the range of 10⁻⁴ to 10⁻⁶. According to these criteria, few previous molecular epidemiology studies, with the sample sizes in the hundreds that have been typical in the field, were likely to attain such levels of statistical significance. This lack of statistical power, together with the usual sources of bias (e.g. confounding, inappropriate controls and measurement error), might account for most of the observed failures to replicate reported associations between genetic variants and diseases (24). In recent years, Wacholder et al. (25) defined the probability of no association given a statistically significant finding as the FPRP and developed a statistical procedure for calculating it. In this mathematical model, a high FPRP could be a consequence of any combination of a low prior probability that the association between the genetic variant and the disease is real, low statistical power or a relatively high P-value. Given that some estimates of the overall FPRP in the molecular epidemiology literature have been near 0.95 (26), Wacholder et al. (25) considered that an FPRP value near 0.5 would represent a substantial improvement over current practice in studies of association between genetic variants and diseases. They further suggested that large studies or pooled analyses that attempt to be more definitive evaluations of a hypothesis about association between a genetic variant and a disease should use a more stringent FPRP value, perhaps below 0.2 (25). The current work found that among the eight genetic variants with significant association with GD in the region of SCGB3A2, the FPRP values for SNP76 and SNP75 were very low for this prior probability range and were quite robust even for low prior probabilities. These data suggested that these two SNPs with significant association with GD in the promoter of the SCGB3A2 gene were noteworthy. Although the SCGB3A2 region probably harbors etiological DNA variants, we cannot rule out that there are other primary disease-causing polymorphisms within the 5q12–q33 region linked to GD.
The SCGB3A2 gene encodes a secretory protein and is reported to be a target of the homeodomain transcription factor T/EBP (TTF-1), which regulates the expression of thyroid- and lung-specific genes, such as thyroglobulin (27), thyroid peroxidase (28,29), TSH receptor (30) and Na/I symporter (31) in the thyroid, and surfactant proteins (32) and Clara cell secretory protein (33) in the lung. Previous reports state that SCGB3A2 is expressed at high levels in human lung tissue and at low levels in the thyroid (18). The SCGB3A2 protein has been detected specifically in the epithelial cells of the respiratory system (18). SCGB3A2 mRNA levels are down-regulated in inflamed mouse lungs, whereas the expression level returns to normal following dexamethasone treatment (18). A recent study demonstrated that expression of SCGB3A2 was reduced in a mouse model of allergic airway inflammation by a mechanism involving IL-5 and IL-9 (34,35). However, the constitutive expression of SCGB3A2 mRNA is enhanced by IL-10 (36). Furthermore, a polymorphism (G/A) at the −112 locus (which is SNP rs1368408) of the human SCGB3A2 gene promoter has been found to be associated with an increased risk of adult bronchial asthma in the Japanese population (37), although the association has not been replicated in small populations recruited from another Japanese population involving asthmatic children (38), a Germanic Caucasian population (39) and Indian populations (40). Interestingly, Inoue et al. (41) recently showed that the mean plasma SCGB3A2 levels for subjects with the −112A allele were significantly lower than for those without it (P = 0.025). Moreover, severe asthma patients not treated with oral corticosteroids had significantly lower plasma SCGB3A2 levels compared to mild- or moderate-asthma patients and controls. In this study, we also found that the pSNPs (SNP76, SNP75 and SNP74) most strongly associated with GD tended to be associated with reduced SCGB3A2 gene expression levels in human thyroid tissue, while functional analysis revealed a relatively low efficiency of SCGB3A2 promoters of the SNP76+SNP75 and SNP76+SNP74 haplotypes in driving gene expression. Recently, MARCO was identified as the SCGB3A2 receptor (19), and it is expressed in the macrophages of spleen and lymph nodes (42) and lung alveoli (19). We confirmed the tissue distribution of MARCO, including its expression in immunity-related organs. It is tempting to speculate that SCGB3A2 protein secreted from the lung tissue may regulate the functions of immune organs via MARCO, and thereby contribute to the susceptibility to GD. Further studies are needed to confirm this hypothesis.
MATERIALS AND METHODS
Sample recruitment
A total of 541 unrelated individuals with GD were recruited from Shandong Province, China. The control group was made up of 478 unrelated healthy subjects from the same geographic region screened for the absence of thyroid disease. The diagnosis of GD was based on documented clinical and biochemical evidence of hyperthyroidism, diffuse goiter and the presence of at least one of the following items: positive TSH receptor antibody tests, diffusely increased 131I (iodine-131) uptake in the thyroid gland or presence of exophthalmos. All individuals classified as affected were interviewed and examined by experienced clinicians. Two additional series, of 545 cases and 603 controls and of 1725 cases and 1726 controls, that met identical criteria were collected from Shanghai and from different geographic regions, such as Jiangsu, Henan, Anhui and Fujian Province in China, respectively. All subjects were Han Chinese in origin. After receiving informed consent, 5 ml blood samples were collected from all participants for DNA preparations, as well as for biochemical measurements.
Identification of SNPs, genotyping and QC filters
Several steps were taken to narrow down the size of the region(s) associated with GD susceptibility. First, 179 SNPs in the 3.0 Mb region surrounding D5s2090 were selected from the NCBI dbSNP (NCBI Human Genome Build 36.1) for association analysis in 384 GD and 382 normal subjects from Shandong Province, taken in the order in which they were sampled. The results covered a region with strong association, as indicated by four SNPs with statistical significance (at the P-value <0.001 level). Next, a second SNP association study was performed for the 1.0 Mb region between SHGC-111280 and RH92492, which was determined to have the highest association with GD. The information on the 11 genes contained in this 1.0 Mb region and the primers used for amplifying the exons and promoters of these genes are given in Supplementary Material, Table S5. Each exon was sequenced using flanking primers that were about 100 base pairs upstream of the 5′ intron–exon junction or downstream of the 3′ intron–exon junction. This approach enabled us to sequence all regions that could affect the amino acid sequence, as well as the splice sites of these genes. The 1000–2000 base pairs upstream of the first exon of these genes were also re-sequenced. A total of 39 intergenic SNPs within the 1.0 Mb region, which were distributed over approximately 5 Kb, were selected from the NCBI dbSNP (NCBI Human Genome Build 36.1). Furthermore, the 15 Kb region containing the exons and introns and the 5′ and 3′ flanks of the SCGB3A2 gene, which had the strongest association with GD, was completely resequenced. We found 38 SNPs in this region, of which 13, 12 and 13 SNPs were distributed in the exons and introns, the 5′ flanking region and the 3′ flanking region of SCGB3A2, respectively. Genomic DNA was amplified using specific primers and the PCR products were sequenced with an ABI 3700 DNA Sequencer (Applied Biosystems), as described (43). We sequenced PCR products from 48 unrelated individuals with GD to identify SNPs. All genotyping was performed using the Mass-Array™ Technology Platform of Sequenom, Inc. (San Diego, CA, USA). The genotyping results of a small number of key SNPs on the SCGB3A2 gene, such as SNP78-71, SNP82 and SNP91 in the Shanghai population, and SNP76, SNP75 and SNP74 in the Shandong population, were confirmed using either a second batch of the Mass-Array™ or direct sequencing. SNPs with MAF <1% or HWE P ≤ 10⁻⁶ in controls were removed from the analysis (15).
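The QC rule described above (dropping SNPs with a low minor allele frequency or a strong departure from Hardy-Weinberg equilibrium in controls) can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a 1-df chi-square HWE test, although the text does not say which HWE test was applied; the thresholds follow the stated cut-offs, with the HWE comparison read as "P ≤ 10⁻⁶ is removed"; and the genotype counts in the usage lines are hypothetical. The function names `hwe_chi_square_p` and `passes_qc` are also hypothetical.

```python
import math


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic SNPs) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa: int, n_ab: int, n_bb: int,
              maf_min: float = 0.01, hwe_p_min: float = 1e-6) -> bool:
    """Keep a SNP only if control MAF >= maf_min and HWE P > hwe_p_min."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    maf = min(freq_a, 1.0 - freq_a)
    if maf < maf_min:
        return False
    return hwe_chi_square_p(n_aa, n_ab, n_bb) > hwe_p_min


# Hypothetical control genotype counts (AA, AB, BB), for illustration only.
print(passes_qc(250, 500, 250))  # genotypes in exact HWE proportions
print(passes_qc(500, 0, 500))    # no heterozygotes at all
print(passes_qc(990, 10, 0))     # rare variant
```

Applying the filter to controls only is the convention stated in the text; a true disease association can itself distort genotype proportions among cases.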
Statistical analysis of association
In the case–control design, allele/genotype frequencies, ORs
and significance values were analyzed by χ² analysis using
SPSS (version 13.0; SPSS Inc.). A P-value < 0.05 was considered significant. The genotype data were further analyzed by
logistic regression analysis, as previously described (5,17).
LD regions were analyzed by Haploview (16). Haplotypes
were generated for SNPs within genes using the PHASE
program (Version 2.1). Haplotype frequencies were calculated
for cases and controls separately, and significance was
assessed by χ² tests; a P-value < 0.05 was considered
significant. FPRP was analyzed using the FPRP calculation
spreadsheet provided by Wacholder et al. (25).
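The allelic association statistics and the FPRP can be illustrated with a short Python sketch. The authors used SPSS and the spreadsheet of Wacholder et al.; the code below is only an assumed equivalent for a single 2 × 2 allele table, using a Pearson χ² test without continuity correction and the FPRP expression α(1 − π) / [α(1 − π) + (1 − β)π], where α is the observed P-value, 1 − β the statistical power and π the prior probability of a true association. All counts, the power and the prior in the example are hypothetical.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Odds ratio, Pearson chi-square (1 df) and P-value for a 2x2 allele table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    cases = case_a + case_b
    controls = ctrl_a + ctrl_b
    allele_a = case_a + ctrl_a
    allele_b = case_b + ctrl_b
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        cases * controls * allele_a * allele_b
    )
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 df
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    return odds_ratio, chi2, p_value


def fprp(alpha, power, prior):
    """False-positive report probability for a given prior probability."""
    false_pos = alpha * (1.0 - prior)
    true_pos = power * prior
    return false_pos / (false_pos + true_pos)


# Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
odds_ratio, chi2, p_value = allelic_association(60, 40, 40, 60)
print(f"OR={odds_ratio:.2f} chi2={chi2:.2f} P={p_value:.4f}")
print(f"FPRP={fprp(p_value, power=0.8, prior=0.01):.3f}")
```

The FPRP depends strongly on the assumed prior, which is why it is usually reported over a range of priors rather than as a single value.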
Cell culture, transfections and luciferase assays
To construct the promoter/luciferase reporter plasmids containing the various SNP changes in the SCGB3A2 promoter
region, seven types of fragments were generated: the wild-type
(rs1368408 G/G, −623~−622 AG/AG and rs6882292 G/G),
SNP76 (rs1368408 A/A), SNP75 (−623~−622 T/T),
SNP74 (rs6882292 A/A),
SNP76+SNP75 (rs1368408 A/A and −623~−622 T/T),
SNP76+SNP74 (rs1368408 A/A and rs6882292 A/A) and
SNP76+SNP75+SNP74 (rs1368408 A/A, −623~−622 T/T
and rs6882292 A/A). Each fragment was separately cloned
into the KpnI–HindIII sites of the pGL3-Basic luciferase
reporter vector (Promega) to generate the pGL3-N,
pGL3-SNP76, pGL3-SNP75, pGL3-SNP74, pGL3-(SNP76+SNP75),
pGL3-(SNP76+SNP74) and pGL3-(SNP76+SNP75+SNP74)
plasmids (Fig. 4A and B). The sequence of
each insert was verified by direct sequencing. For construction
of the pcDNA3.1-TTF1 expression plasmid, the respective
coding sequence was amplified by RT–PCR from total
RNA prepared from SPC-A1 human lung adenocarcinoma
cells. The forward primer, 5′-ATCCTCGAGATGTCGATGAGTCCAAAGC-3′,
and the reverse primer, 5′-ATAGGATCCACCAGGTCCGACCGTATAGC-3′,
were used for PCR
amplification. The amplified fragment was then cloned into
the XhoI–BamHI sites of the pcDNA3.1/Myc-His (+) C vector
(Invitrogen). The sequence of the insert was verified by
direct sequencing. HeLa cells and SPC-A1 cells were grown
in DMEM supplemented with 10% FCS. Transient
transfections were performed using Lipofectamine 2000
(Invitrogen), according to the manufacturer’s protocol.
Briefly, 5 × 10⁴ cells per well were seeded in 12-well plates
16 h before the experiment and transfected at approximately
50–70% confluence with 300 ng of SCGB3A2 luciferase reporter construct and 30 ng of pcDNA3.1-TTF1 or pcDNA3.1
vector, together with 0.3 ng of pRL-SV40 as a normalization
control. After 36 h of incubation, luciferase activities were determined using the Dual Luciferase Reporter Assay System
(Promega) according to the manufacturer’s instructions. To
correct for transfection efficiency, the luminescence of
each SCGB3A2 luciferase reporter construct was normalized
to that of the pRL-SV40 control plasmid. Promoter
activity was expressed as the ratio of the relative luciferase units of
each SCGB3A2 construct to those of the promoterless pGL3-Basic vector in the presence of the same transactivating plasmid. Data are reported as the mean value of at
least three independent experiments (triplicate samples).
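The normalization described above amounts to two successive ratios, sketched here in Python. This is a hedged illustration only: the well readings are invented, the function names are hypothetical, and averaging the per-well ratios of triplicate wells within one experiment is an assumption about how the replicates were combined.

```python
from statistics import mean


def relative_luciferase_units(wells):
    """Mean firefly/Renilla ratio over replicate wells.

    `wells` is a list of (firefly, renilla) luminescence readings; the
    Renilla signal from pRL-SV40 corrects for transfection efficiency.
    """
    return mean(firefly / renilla for firefly, renilla in wells)


def promoter_activity(construct_wells, basic_wells):
    """Activity of a reporter construct relative to promoterless pGL3-Basic.

    Both sets of wells must come from cells co-transfected with the same
    transactivating plasmid (pcDNA3.1-TTF1 or empty pcDNA3.1).
    """
    return relative_luciferase_units(construct_wells) / relative_luciferase_units(
        basic_wells
    )


# Hypothetical triplicate readings (firefly, Renilla) from one experiment.
construct = [(3000.0, 100.0), (3300.0, 110.0), (2700.0, 90.0)]
pgl3_basic = [(200.0, 100.0), (220.0, 110.0), (180.0, 90.0)]
print(promoter_activity(construct, pgl3_basic))
```

Dividing by the pGL3-Basic value measured under the same transactivator keeps any effect of TTF1 on the vector backbone out of the reported fold activity.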
Electrophoretic mobility shift assays
Nuclear extracts from SPC-A1 cells were prepared using
NE-PER® Nuclear and Cytoplasmic Extraction Kits (Pierce).
Double-stranded oligonucleotide probes were used in this
study. The oligonucleotide sequences were as follows: SNP76 probe,
sense strand, 5′-TCC AAA TTG TTT [G/A]GT GAG AAA
ACA T-3′, antisense strand, 5′-ATG TTT TCT CAC
[C/T]AA ACA ATT TGG A-3′; SNP75 probe, sense strand,
5′-TTT TCA AA[AG/T] ACA CTC TGA TTT TAG ATC
TTA AGC CTA TTA TTC TA-3′, antisense strand, 5′-TAG
AAT AAT AGG CTT AAG ATC TAA AAT CAG AGT
GT[CT/A] TTT GAA AA-3′; and SNP74 probe, sense
strand, 5′-TGT GTT ATT TAT [G/A]TT CCC ATT TTA-3′,
antisense strand, 5′-TAA AAT GGG AA[C/T] ATA AAT
AAC ACA-3′. These probes were labeled at the 5′ end with
[γ-32P] dATP and T4 polynucleotide kinase (Promega). The
labeled oligonucleotides were separated from the unincorporated nucleotides using a MicroSpin™ G-25 Column (Amersham). An aliquot of 25 μg of nuclear extract was incubated
with 1 μl (radioactivity: 6 1051 cpm/min) of radiolabeled
probe for 30 min on ice in 20 μl of binding buffer (Promega).
Specificity of protein binding to the radiolabeled oligonucleotides
was demonstrated by the addition of a 10-fold excess of
unlabeled competing oligonucleotides. After 20 min of incubation at room temperature, the samples were resolved on a
6% polyacrylamide gel in 0.5× TBE at 250 V for 2 h on ice.
After electrophoresis, the polyacrylamide gel was dried and
autoradiographed. For the competition study, nuclear extracts
were pre-incubated with a 0.5-, 1.5-, 3-, 5- or 10-fold excess of unlabeled
wild-type or mutant SNP75 probe before the [γ-32P]
dATP-labeled wild-type SNP75 probe was added.
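Because each EMSA probe is double-stranded, the listed antisense strand should be the reverse complement of the sense strand for every allele. The Python sketch below checks this for the three probe pairs given above. The probe sequences are taken from the text; the template notation with `{}` marking the variant position, the pairing of alleles and the function names are illustrative choices, not part of the original protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence; spaces are ignored."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# name: (sense template, antisense template, [(sense allele, antisense allele)])
PROBES = {
    "SNP76": (
        "TCC AAA TTG TTT {}GT GAG AAA ACA T",
        "ATG TTT TCT CAC {}AA ACA ATT TGG A",
        [("G", "C"), ("A", "T")],
    ),
    "SNP75": (
        "TTT TCA AA{} ACA CTC TGA TTT TAG ATC TTA AGC CTA TTA TTC TA",
        "TAG AAT AAT AGG CTT AAG ATC TAA AAT CAG AGT GT{} TTT GAA AA",
        [("AG", "CT"), ("T", "A")],
    ),
    "SNP74": (
        "TGT GTT ATT TAT {}TT CCC ATT TTA",
        "TAA AAT GGG AA{} ATA AAT AAC ACA",
        [("G", "C"), ("A", "T")],
    ),
}


def check_probes(probes=PROBES):
    """Return {probe name: True if every allele pair anneals perfectly}."""
    results = {}
    for name, (sense, antisense, alleles) in probes.items():
        results[name] = all(
            reverse_complement(sense.format(s))
            == antisense.format(a).replace(" ", "")
            for s, a in alleles
        )
    return results


print(check_probes())
```

A check of this kind is a cheap way to catch transcription errors in oligonucleotide orders before labeling.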
Real-time reverse transcriptase–polymerase chain reaction (RT–PCR)
To measure the relative expression levels of SCGB3A2 in
thyroid tissues from individuals without GD who carried the combined variants
SNP76+SNP75 or SNP76+SNP74, or who carried none of
SNP76, SNP75 and SNP74, quantitative PCR was performed using TaqMan. After informed
consent was obtained, 93 thyroid tissue samples were collected at the
Shandong Provincial Hospital, Jinan, China, during
surgery on patients with thyroid adenoma or multinodular
goiter but without hyperthyroidism. Eleven thyroid
tissue samples belonged to the SNP76+SNP75 haplotype group,
5 to the SNP76+SNP74 group and 16 to the group without
SNP76, SNP75 and SNP74. Primer sequences for real-time
PCR were as follows: human SCGB3A2 (forward,
5′-GCTACTGCCTTCCTCATCAACAA-3′; reverse,
5′-CCCTCCACAAGGTGCTCAAC-3′) and GAPDH (forward,
5′-GAAGGTGAAGGTCGGAGTC-3′; reverse,
5′-GAAGATGGTGATGGGATTTC-3′). TaqMan probe sequences were as
follows: SCGB3A2 probe (5′-TGCCCCTTCCTGTTGACAAGTTGGC-3′)
and GAPDH probe (5′-CAAGCTTCCCGTTCTCAGCC-3′).
Reaction temperatures and cycling parameters were as follows: 95 °C for 15 min, then 45 cycles at
94 °C for 30 s, 58 °C for 40 s and 72 °C for 1 min, then 72 °C for
10 min. Quantification was accomplished by comparison with
standard curves generated from known amounts of plasmid containing the gene of interest (100–10 000 000 copies).
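Quantification against a plasmid standard curve can be sketched as a linear fit of threshold cycle (Ct) against log10 copy number. The Python example below is an assumed illustration: the Ct values are invented, the function names are hypothetical, and expressing SCGB3A2 copies relative to GAPDH copies is an assumption suggested by the GAPDH primers listed above rather than a step the source spells out.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_expression(target_copies, reference_copies):
    """Target copies per reference-gene copy (assumed normalization)."""
    return target_copies / reference_copies


# Hypothetical 10-fold plasmid dilution series (100 to 10 000 000 copies).
standards = [1e2, 1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [33.4, 30.1, 26.8, 23.5, 20.2, 16.9]
slope, intercept = fit_standard_curve(standards, standard_ct)
print(round(slope, 3), round(intercept, 3))
print(copies_from_ct(26.8, slope, intercept))
```

A slope close to −3.32 corresponds to an amplification efficiency near 100%, which is a common sanity check on a dilution series before unknown samples are interpolated.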
Allele-specific transcript quantification
The lung or thyroid tissues were collected from patients
with lung cancer or thyroid adenoma or multinodular goiter
undergoing surgery. The regions containing SNP75
(−623~−622, AG/T), SNP76 (rs1368408, G/A) and
SNP89 (rs34212847, G/A) were divided into two fragments (both of which contained SNP rs1368408) and
amplified from these samples using the following two pairs
of primers: first pair, forward, 5′-CATATGGACTCCGCTTTCTATTTC-3′;
reverse, 5′-CAACCCTGCAAATATGTGC-3′;
second pair, forward, 5′-GGATTCGTTGGGCTCTTTG-3′;
reverse, 5′-TGGTAGAACAGGTTTCAGGCAG-3′.
The amplified products were cloned into the pGEM-T
Easy vector and sequenced to identify individuals
with heterozygous or homozygous haplotypes at the positions of SNP75, SNP76 and SNP89 in the SCGB3A2
gene. ASTQ was performed as previously described
(5), with some modifications. cDNAs were prepared
from lung and thyroid tissues. The SCGB3A2 gene was amplified by PCR using the primers 5′-TGGTGACCATCAGCCTTTG-3′
and 5′-TGTCCTTTTCACGGGTCACTAC-3′.
Reaction temperatures and cycling parameters were as
follows: 95 °C for 15 min, then 35 cycles at 95 °C for 30 s,
62 °C for 30 s and 72 °C for 30 s. The PCR products were
labeled with DIG-11-dUTP (Roche, Germany) at the 35th
cycle of PCR. ASTQ PCR products were digested with
MaeIII (Roche, Germany) and resolved on a 2% agarose gel.
As a control to monitor whether the ASTQ PCR products
were fully digested, equal amounts of PCR product amplified
from individuals homozygous for G at SNP89
(rs34212847, G/G) were digested with MaeIII, because the
G at the SNP89 position forms a cleavage site for MaeIII.
Digested products were transferred to a positively charged nylon
membrane (Roche, Germany) in alkaline solution; the membrane was then baked at 80 °C for 30 min. According to the manufacturer’s instructions, the membrane was washed, blocked
and then incubated with anti-DIG serum/alkaline phosphatase
conjugate. CDP-Star was used as the chemiluminescence substrate. Signals were visualized on X-ray film. Data were
obtained by scanning the exposed bands with Quantity
One software (Bio-Rad). The band intensities of the 380
and 253 bp digested fragments were determined and represent
the levels of SCGB3A2 mRNA transcribed from the susceptible haplotype (T:A:A) and the non-susceptible haplotype
(AG:G:G), respectively, at the corresponding positions (SNP75, SNP76
and SNP89). The smaller bands from the SNP89
G allele (127 bp) were not included in calculating the ratio
of the T:A:A and AG:G:G haplotypes owing to their weak
intensities; this led to all SNP89 A/G ratios being overestimated. Normalization was therefore conducted using the lung
tissues of six control individuals with homozygous alleles at
the SNP75, SNP76 and SNP89 positions.
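The band-ratio arithmetic can be sketched as follows. The fragment sizes come from the text (the 380 bp product is cut by MaeIII into 253 and 127 bp pieces when SNP89 carries G), but the normalization step is only a generic assumption: the source does not detail how the six homozygous controls were used, so `reference_ratio` below is a hypothetical correction factor, and all intensities are invented.

```python
UNCUT_BP = 380      # undigested product, SNP89 A allele (T:A:A haplotype)
CUT_LARGE_BP = 253  # larger MaeIII fragment, SNP89 G allele (AG:G:G haplotype)
CUT_SMALL_BP = 127  # smaller MaeIII fragment, excluded from the ratio


def raw_allelic_ratio(intensity_380, intensity_253):
    """Uncorrected T:A:A to AG:G:G signal ratio from the two scored bands."""
    return intensity_380 / intensity_253


def normalized_allelic_ratio(intensity_380, intensity_253, reference_ratio):
    """Raw ratio divided by a reference ratio (hypothetical correction).

    `reference_ratio` stands for the raw 380/253 ratio expected when both
    alleles are equally represented; how it is obtained is an assumption.
    """
    return raw_allelic_ratio(intensity_380, intensity_253) / reference_ratio


# Hypothetical band intensities from one heterozygous sample.
print(normalized_allelic_ratio(900.0, 1000.0, reference_ratio=1.5))
```

Because the 127 bp fragment carries part of the G-allele signal, leaving it out lowers the denominator and inflates the raw ratio, which is why some reference-based correction is needed.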
Semi-quantitative RT–PCR
The expression patterns of the SCGB3A2 gene and MARCO
gene in mouse and human tissues were analyzed by semi-quantitative RT–PCR. First-strand cDNAs were synthesized from total RNA (1–2 μg) from different tissues
using oligo(dT) (Promega) in a 20 μl reaction. cDNAs were
then amplified using gene-specific PCR primers. GAPDH
was used as an internal control. The PCR mixture contained
cDNA (1 μl), 10 mM dNTP (0.5 μl), 10× PCR buffer for
Taq plus (2 μl), Taq plus DNA polymerase (2 U)
(Sangon), GAPDH primers (10 pmol) and gene-specific
intron-spanning primers (20 pmol). Reactions were carried
out in a PCR apparatus (PTC-100; MJ Research, Inc.). One
PCR cycle consisted of denaturation for 30 s (94 °C), annealing
for 30 s (60 °C) and extension for 45 s (72 °C). Each PCR reaction consisted of 25–28 cycles.
SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG online.
ACKNOWLEDGEMENTS
We thank all patients and normal individuals for participating
in this study, and Professor Ding-Liang Zhu and Dr Lin Lu at
Ruijin Hospital for providing the DNA of healthy subjects in
Shanghai.
Conflict of Interest statement. None declared.
FUNDING
This work was supported in part by the National Key Program
for Basic Research (973), National Natural Science Foundation of China (30530370, 30470815 and 30771017),
Chinese High Tech Program (863), Commission for Science
and Technology of Shanghai, Shandong and Jiangsu Province,
and the Foundation for the Author of National Excellent Doctoral Dissertation of People’s Republic of China.
REFERENCES
1. Hollowell, J.G., Staehling, N.W., Flanders, W.D., Hannon, W.H.,
Gunter, E.W., Spencer, C.A. and Braverman, L.E. (2002) Serum TSH, T4,
and thyroid antibodies in the United States population (1988 to 1994):
National Health and Nutrition Examination Survey (NHANES III).
J. Clin. Endocrinol. Metab., 87, 489–499.
2. Chen, X., Wu, W.S., Chen, G.L., Zhang, K.Z., Zhang, F.L., Lin, Y.C., Liu,
Y.C., Liu, X.Y., Fang, Z.P. and Luo, C.R. (2000) The effect of salt
iodization for 10 years on the prevalences of endemic goiter and
hyperthyroidism. Chin. J. Endocrinol. Metab., 18, 342–344.
3. Tomer, Y. and Davies, T.F. (2003) Searching for the autoimmune thyroid
disease susceptibility genes: from gene mapping to gene function. Endocr.
Rev., 24, 694–717.
4. Onodera, T. and Awaya, A. (1990) Anti-thyroglobulin antibodies induced
with recombinant reovirus infection in BALB/c mice. Immunology, 71,
581–585.
5. Ueda, H., Howson, J.M.M., Esposito, L., Heward, J., Snook, H.,
Chamberlain, G., Rainbow, D.B., Hunter, K.M.D., Smith, A.N. and Di
Genova, G. (2003) Association of the T-cell regulatory gene CTLA4 with
susceptibility to autoimmune disease. Nature, 423, 506–511.
6. Yanagawa, T., Hidaka, Y., Guimaraes, V., Soliman, M. and DeGroot, L.J.
(1995) CTLA-4 gene polymorphism associated with Graves’ disease in a
Caucasian population. J. Clin. Endocrinol. Metab., 80, 41–45.
7. Tomer, Y., Concepcion, E. and Greenberg, D.A. (2002) A C/T
single-nucleotide polymorphism in the region of the CD40 gene is
associated with Graves’ disease. Thyroid, 12, 1129–1135.
8. Velaga, M.R., Wilson, V., Jennings, C.E., Owen, C.J., Herington, S.,
Donaldson, P.T., Ball, S.G., James, R.A., Quinton, R. and Perros, P.
(2004) The codon 620 tryptophan allele of the lymphoid tyrosine
phosphatase (LYP) gene is a major determinant of Graves’ disease.
J. Clin. Endocrinol. Metab., 89, 5862–5865.
9. Hiratani, H., Bowden, D.W., Ikegami, S., Shirasawa, S., Shimizu, A.,
Iwatani, Y. and Akamizu, T. (2005) Multiple SNPs in intron 7 of
thyrotropin receptor are associated with Graves’ disease. J. Clin.
Endocrinol. Metab., 90, 2898–2903.
10. Shirasawa, S., Harada, H., Furugaki, K., Akamizu, T., Ishikawa, N., Ito,
K., Ito, K., Tamai, H., Kuma, K. and Kubota, S. (2004) SNPs in the
promoter of a B cell-specific antisense transcript, SAS-ZFAT, determine
susceptibility to autoimmune thyroid disease. Hum. Mol. Genet., 13,
2221–2231.
11. Jin, Y., Teng, W., Ben, S., Xiong, X., Zhang, J., Xu, S., Shugart, Y.Y.,
Jin, L., Chen, J. and Huang, W. (2003) Genome-wide scan of Graves’
disease: evidence for linkage on chromosome 5q31 in Chinese Han
pedigrees. J. Clin. Endocrinol. Metab., 88, 1798–1803.
12. Sakai, K., Shirasawa, S., Ishikawa, N., Ito, K., Tamai, H., Kuma, K.,
Akamizu, T., Tanimura, M., Furugaki, K. and Yamamoto, K. (2001)
Identification of susceptibility loci for autoimmune thyroid disease to
5q31-q33 and Hashimoto’s thyroiditis to 8q23-q24 by multipoint affected
sib-pair linkage analysis in Japanese. Hum. Mol. Genet., 10, 1379–1386.
13. Allen, E.M., Hsueh, W.C., Sabra, M.M., Pollin, T.I., Ladenson, P.W.,
Silver, K.D., Mitchell, B.D. and Shuldiner, A.R. (2003) A genome-wide
scan for autoimmune thyroiditis in the Old Order Amish: replication of
genetic linkage on chromosome 5q11.2-q14.3. J. Clin. Endocrinol.
Metab., 88, 1292–1296.
14. Roberts, S.B., MacLean, C.J., Neale, M.C., Eaves, L.J. and Kendler, K.S.
(1999) Replication of linkage studies of complex traits: an examination of
variation in location estimates. Am. J. Hum. Genet., 65, 876–884.
15. Hom, G., Graham, R.R., Modrek, B., Taylor, K.E., Ortmann, W., Garnier,
S., Lee, A.T., Chung, S.A., Ferreira, R.C. and Pant, P.V. (2008)
Association of systemic lupus erythematosus with C8orf13-BLK and
ITGAM-ITGAX. N. Engl. J. Med., 358, 900–909.
16. Barrett, J.C., Fry, B., Maller, J. and Daly, M.J. (2005) Haploview: analysis
and visualization of LD and haplotype maps. Bioinformatics, 21,
263–265.
17. Cordell, H.J. and Clayton, D.G. (2002) A unified stepwise regression
procedure for evaluating the relative effects of polymorphisms within a
gene using case/control or family data: application to HLA
in type 1 diabetes. Am. J. Hum. Genet., 70, 124–141.
18. Niimi, T., Keck-Waggoner, C.L., Popescu, N.C., Zhou, Y., Levitt, R.C.
and Kimura, S. (2001) SCGB3A2, a uteroglobin/Clara cell secretory
protein-related protein, is a novel lung-enriched downstream target gene
for the T/EBP/NKX2.1 homeodomain transcription factor. Mol.
Endocrinol., 15, 2021–2036.
19. Bin, L.H., Nielson, L.D., Liu, X., Mason, R.J. and Shu, H.B. (2003)
Identification of uteroglobin-related protein 1 and macrophage scavenger
receptor with collagenous structure as a lung-specific ligand-receptor pair.
J. Immunol., 171, 924–930.
20. Reich, D.E. and Lander, E.S. (2001) On the allelic spectrum of human
disease. Trends Genet., 17, 502–510.
21. Cohen, J.C., Kiss, R.S., Pertsemlidis, A., Marcel, Y.L., McPherson, R. and
Hobbs, H.H. (2004) Multiple rare alleles contribute to low plasma levels
of HDL cholesterol. Science, 305, 869–872.
22. Vaidya, B., Imrie, H., Perros, P., Young, E.T., Kelly, W.F., Carr, D.,
Large, D.M., Toft, A.D., McCarthy, M.I., Kendall-Taylor, P. and Pearce,
S.H. (1999) The cytotoxic T lymphocyte antigen-4 is a major Graves’
disease locus. Hum. Mol. Genet., 8, 1195–1199.
23. Yang, Y., Lingling, S., Ying, J., Yushu, L., Zhongyan, S., Wei, H. and
Weiping, T. (2005) Association study between the IL4, IL13, IRF1 and
UGRP1 genes in chromosomal 5q31 region and Chinese Graves’ disease.
J. Hum. Genet., 50, 574–582.
24. Thomas, D.C. and Clayton, D.G. (2004) Betting odds and genetic
associations. J. Natl Cancer Inst., 96, 421–423.
25. Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. and
Rothman, N. (2004) Assessing the probability of false-positive reports
in molecular epidemiology studies. J. Natl Cancer Inst., 96,
434–442.
26. Colhoun, H.M., McKeigue, P.M. and Davey Smith, G. (2003) Problems of
reporting genetic associations with complex outcomes. Lancet, 361,
865–872.
27. Civitareale, D., Lonigro, R., Sinclair, A.J. and Di Lauro, R. (1989) A
thyroid-specific nuclear protein essential for tissue-specific expression of
the thyroglobulin promoter. EMBO J., 8, 2537–2542.
28. Francis-Lang, H., Price, M., Polycarpou-Schwarz, M. and Di Lauro, R.
(1992) Cell-type-specific expression of the rat thyroperoxidase promoter
indicates common mechanisms for thyroid-specific gene expression. Mol.
Cell. Biol., 12, 576–588.
29. Kikkawa, F., Gonzalez, F.J. and Kimura, S. (1990) Characterization of a
thyroid-specific enhancer located 5.5 kilobase pairs upstream of the
human thyroid peroxidase gene. Mol. Cell. Biol., 10, 6216–6224.
30. Shimura, H. (1994) Thyroid-specific expression and cyclic adenosine
3′,5′-monophosphate autoregulation of the thyrotropin receptor gene
involves thyroid transcription factor-1. Mol. Endocrinol., 8, 1049–1069.
31. Endo, T., Kaneshige, M., Nakazato, M., Ohmori, M., Harii, N. and Onaya,
T. (1997) Thyroid transcription factor-1 activates the promoter activity of
rat thyroid Na+/I− symporter gene. Mol. Endocrinol., 11, 1747–1755.
32. Bohinski, R.J., Di Lauro, R. and Whitsett, J.A. (1994) The lung-specific
surfactant protein B gene promoter is a target for thyroid transcription
factor 1 and hepatocyte nuclear factor 3, indicating common factors for
organ-specific gene expression along the foregut axis. Mol. Cell. Biol., 14,
5671–5681.
33. Ray, M.K., Chen, C.Y., Schwartz, R.J. and DeMayo, F.J. (1996)
Transcriptional regulation of a mouse Clara cell-specific protein (mCC10)
gene by the NKx transcription factor family members thyroid
transcription factor 1 and cardiac muscle-specific homeobox protein
(CSX). Mol. Cell. Biol., 16, 2056–2064.
34. Chiba, Y., Srisodsai, A., Supavilai, P. and Kimura, S. (2005) Interleukin-5
reduces the expression of uteroglobin-related protein (UGRP) 1 gene in
allergic airway inflammation. Immunol. Lett., 97, 123–129.
35. Chiba, Y., Kusakabe, T. and Kimura, S. (2004) Decreased expression of
uteroglobin-related protein 1 in inflamed mouse airways is mediated by
IL-9. Am. J. Physiol. Lung Cell. Mol. Physiol., 287, L1193–L1198.
36. Srisodsai, A., Kurotani, R., Chiba, Y., Sheikh, F., Young, H.A., Donnelly,
R.P. and Kimura, S. (2004) Interleukin-10 induces uteroglobin-related
protein (UGRP) 1 gene expression in lung epithelial cells through
homeodomain transcription factor T/EBP/NKX2.1. J. Biol. Chem., 279,
54358–54368.
37. Niimi, T., Munakata, M., Keck-Waggoner, C.L., Popescu, N.C., Levitt,
R.C., Hisada, M. and Kimura, S. (2002) A polymorphism in the human
UGRP1 gene promoter that regulates transcription is associated with an
increased risk of asthma. Am. J. Hum. Genet., 70, 718–725.
38. Jian, Z., Nakayama, J., Noguchi, E., Shibasaki, M. and Arinami, T. (2003)
No evidence for association between the −112G/A polymorphism of
UGRP1 and childhood atopic asthma. Clin. Exp. Allergy, 33,
902–904.
39. Heinzmann, A., Dietrich, H. and Deichmann, K.A. (2003) Association of
uteroglobulin-related protein 1 with bronchial asthma. Int. Arch. Allergy
Immunol., 31, 291–295.
40. Batra, J., Niphadkar, P.V., Sharma, S.K. and Ghosh, B. (2005)
Uteroglobin-related protein 1 (UGRP1) gene polymorphisms and
atopic asthma in the Indian population. Int. Arch. Allergy Immunol.,
136, 1–6.
41. Inoue, K., Wang, X., Saito, J., Tanino, Y., Ishida, T., Iwaki, D., Fujita, T., Kimura,
S. and Munakata, M. (2008) Plasma UGRP1 levels associate with
promoter G-112A polymorphism and the severity of asthma. Allergol. Int.,
57, 57–64.
42. Elomaa, O., Kangas, M., Sahlberg, C., Tuukkanen, J., Sormunen, R.,
Liakka, A., Thesleff, I., Kraal, G. and Tryggvason, K. (1995) Cloning
of a novel bacteria-binding receptor structurally related to scavenger
receptors and expressed in a subset of macrophages. Cell, 80, 603–609.
43. Song, H.D., Sun, X.J., Deng, M., Zhang, G.W., Zhou, Y., Wu, X.Y.,
Sheng, Y., Chen, Y., Ruan, Z. and Jiang, C.L. (2004) Hematopoietic gene
expression profile in zebrafish kidney marrow. Proc. Natl Acad. Sci. USA,
101, 16240–16245.