Human Molecular Genetics, 2009, Vol. 18, No. 6, 1156–1170
doi:10.1093/hmg/ddn442
Advance Access published on January 6, 2009

Functional SNPs in the SCGB3A2 promoter are associated with susceptibility to Graves’ disease

Huai-Dong Song1,2,*,†, Jun Liang3,†,‡, Jing-Yi Shi1,†, Shuang-Xia Zhao1,†, Zhi Liu1,†, Jia-Jun Zhao3,†, Yong-De Peng4, Guan-Qi Gao5, Jiong Tao1, Chun-Ming Pan1, Li Shao1, Feng Cheng1, Yi Wang6, Guo-Yue Yuan7, Chao Xu1, Bing Han1, Wei Huang8, Xun Chu8, Yi Chen1, Yan Sheng1, Rong-Ying Li1, Qing Su9, Ling Gao3, Wei-Ping Jia10, Li Jin6, Ming-Dao Chen1, Sai-Juan Chen1,2, Zhu Chen1,2 and Jia-Lun Chen1

1 Ruijin Hospital, State Key Laboratory of Medical Genomics, Molecular Medicine Center, Shanghai Institute of Endocrinology, Shanghai Jiao Tong University (SJTU), School of Medicine, Shanghai 20025, China
2 Shanghai Center for Systems Biomedicine, SJTU, 800 Dong Chuan Road, Shanghai 200240, China
3 Department of Endocrinology, Shandong Province Hospital, Shandong University, 324 Jing 5 Road, Jinan 250021, China
4 Department of Endocrinology, The First People’s Hospital, Shanghai Jiaotong University, Shanghai 200080, China
5 Department of Endocrinology, The People’s Hospital of Linyi, Shandong Province, 27 Liberation Road, Linyi 276003, China
6 Centre of Anthropology, Fudan University, 220 Handan Road, Shanghai 200433, China
7 Department of Endocrinology, Hospital of Jiangsu University, Zhenjiang, Jiangsu 212001, China
8 Chinese National Human Genome Center at Shanghai, Zhang Jiang High Tech Park, 250 Bi Bo Road, Shanghai 201203, China
9 Department of Endocrinology, Xin Hua Hospital, Shanghai Jiao Tong University (SJTU), School of Medicine, Shanghai 20092, China
10 Shanghai Diabetes Institute, Shanghai Jiaotong University, No. 6 Hospital, Shanghai 200233, China

Received November 14, 2008; Revised December 22, 2008; Accepted December 30, 2008

Graves’ disease (GD) is one of the most common human autoimmune diseases, and recent data estimated a prevalence of clinical hyperthyroidism of 0.25–1.09% in the population. Several reports have linked GD to the region 5q12–q33, and a locus between markers D5s436 and D5s434 was specifically linked to GD susceptibility in the Chinese population. In the present study, association analysis was performed using a large number of single-nucleotide polymorphisms (SNPs) at this locus in 2811 patients with GD recruited from different geographic regions of China. The strongest associations with GD in the combined Chinese Han cohorts were mapped to two SNPs in the promoter (pSNP) of SCGB3A2 [SNP76, rs1368408, P = 1.43 × 10⁻⁶, odds ratio (OR) = 1.28 and SNP75, -623~-622, P = 7.62 × 10⁻⁵, OR = 1.32, respectively], a gene implicated in immune regulation. On the other hand, pSNP haplotypes composed of the SNP76 (rs1368408)+SNP74 (rs6882292) or SNP76+SNP75 (-623~-622, AG/T) variants are correlated with high disease susceptibility (P = 0.0007 and P = 0.0192, respectively) in this combined Chinese Han cohort. Furthermore, these haplotypes were associated with reduced SCGB3A2 gene expression levels in human thyroid tissue, while functional analysis revealed a relatively low efficiency of SCGB3A2 promoters of the SNP76+SNP75 and SNP76+SNP74 haplotypes in driving gene expression. These results suggest that the SCGB3A2 gene may contribute to GD susceptibility.

* To whom correspondence should be addressed. Tel: +86 2164370045 Ext. 610808; Fax: +86 2164743206; Email: [email protected]
† The authors wish it to be known that, in their opinion, the first six authors should be regarded as joint First Authors.
‡ Present address: Department of Endocrinology, the Fourth Hospital of Xuzhou, Jiangsu Province, China.

© The Author 2009. Published by Oxford University Press.
All rights reserved. For Permissions, please email: [email protected]

INTRODUCTION

Graves’ disease (GD) is one of the most common human autoimmune diseases, with recent data estimating frequencies of up to 1.3% (0.5% clinical and 0.7% subclinical) in the USA (1) and 0.25–1.09% in China (2). The hallmark of GD is the production of thyroid-stimulating hormone receptor (TSHR)-stimulating antibodies, leading to hyperthyroidism. GD is a complex-trait disease that develops in genetically susceptible individuals through the interaction of susceptibility genes (3) and non-genetic factors, such as infection (4). Many genetic studies of GD have been carried out, and several genes, such as human leukocyte antigen (3), cytotoxic T lymphocyte antigen 4 (CTLA-4) (5,6), CD40 (7), PTPN22 (8), TSHR (9) and SAS-ZFAT (10), have been linked to GD susceptibility. However, none of these genes shows an absolute correlation with disease predisposition, and the exact genetic requirements for the development of GD are still unknown.

A previous genome-wide study of 54 Chinese Han GD pedigrees provided the strongest evidence for linkage at D5s436 on chromosome 5q31. When four additional markers around D5s436 were used, a maximum two-point LOD score of 4.31 and a maximum multipoint LOD score of 4.12 were obtained for marker D5s2090 (11). Interestingly, in a dataset of 123 Japanese sibling pairs, the 5q31 locus was also linked with autoimmune thyroid disease (AITD), including GD and Hashimoto’s disease, with a maximum multipoint LOD score of 3.14 at D5s436 (12). Data from linkage analysis conducted on 445 subjects from 29 families of a homogeneous founder Caucasian population, the Old Order Amish of Lancaster County, Pennsylvania, also support a linkage with AITD at chromosome 5q (13). Given the inherent inaccuracies of linkage analysis in identifying susceptibility genes (14), it is reasonable to hypothesize that the previously observed linkages point to the same locus involved in GD predisposition.

In the present study, we performed association analysis on a large number of single-nucleotide polymorphisms (SNPs) to identify the putative GD susceptibility gene at the 5q31 locus in the Chinese Han population. First, we used 179 SNPs within a 3.0 Mb region surrounding marker D5s2090 and found the most significant association signal at SNP rs1368408. Subsequent association analysis was then performed using 122 SNPs from a 1.0 Mb region surrounding rs1368408 in two independent populations collected from Shandong province and the city of Shanghai. The results suggested that SNP76 (rs1368408) and SNP75 in the promoter of the Secretoglobin Family 3A Member 2 (SCGB3A2) gene may be the causal variants of GD. Next, these results were confirmed by association analysis in 2811 Chinese Han patients with GD and 2807 healthy individuals recruited from different geographic regions in China. Finally, functional analysis in vivo and in vitro revealed that the susceptible alleles of SNP76, SNP75 and SNP74, which are located in the promoter of the SCGB3A2 gene, affect the binding of transcription factors to the promoter of SCGB3A2, and that the SNP76+SNP75 and SNP76+SNP74 haplotypes are associated with lower levels of SCGB3A2 gene expression.

RESULTS

Defining the GD susceptibility region by association analysis of a 3.0 Mb region surrounding marker D5s2090

To narrow down the GD susceptibility locus, we started with a 3.0 Mb region surrounding D5s2090, defined by a decrease in the LOD score of 1.5 or more, corresponding to a 99% confidence interval for linkage (Fig. 1A). The NCBI database indicates that this region, from marker D5s436 to marker D5s413, contains 25 genes (Fig. 1B, Supplementary Material, Table S1).
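The LOD-drop rule used above to delimit the candidate region can be illustrated with a minimal sketch. The marker positions and LOD scores below are hypothetical (only the peak value of 4.12 is taken from the text), and the function name `lod_drop_interval` is an illustrative choice, not something defined in the study.

```python
def lod_drop_interval(positions, lod_scores, drop=1.5):
    """Return (left, right) positions of the contiguous region around the
    peak in which the LOD score stays within `drop` units of the maximum."""
    if not positions or len(positions) != len(lod_scores):
        raise ValueError("positions and lod_scores must be non-empty and of equal length")
    peak = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak] - drop
    # Walk outwards from the peak until the score falls below the threshold.
    left = peak
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]


# Hypothetical marker positions (Mb) and multipoint LOD scores.
positions = [145.0, 145.5, 146.0, 146.5, 147.0, 147.5, 148.0, 148.5, 149.0]
lods = [1.2, 2.1, 2.9, 3.6, 4.12, 3.8, 3.0, 2.4, 1.0]
interval = lod_drop_interval(positions, lods)
print(interval)
```

With these made-up scores the support interval spans the markers whose LOD score is at least 4.12 - 1.5 = 2.62; a real analysis would interpolate between markers rather than snap to marker positions.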
Accordingly, 179 SNPs distributed with an average spacing of 15 kb were selected from the NCBI SNP database (dbSNP) (NCBI Human Genome Build 36.1) for genotyping of 384 GD patients and 382 healthy subjects from Shandong province, China. Data quality control (QC) filters removed 40 SNPs with minor allele frequencies (MAFs) <1% (n = 32) or a Hardy–Weinberg equilibrium (HWE) P ≤ 10⁻⁶ in controls (n = 8) (missing data in Supplementary Material, Table S2) (15). Of the remaining 139 SNPs, 4 had significantly different allele frequencies (P < 0.001) between GD and normal subjects, and the strongest association was observed for SNP rs1368408 (P = 3.69 × 10⁻⁵). Notably, the four SNPs, including rs1368408, form a cluster, suggesting a locus of strong association (Fig. 1C, Supplementary Material, Table S2). These results led us to further investigate a 1.0 Mb region surrounding SNP rs1368408 (between SHGC-111280 and RH92492), which contains 11 genes.

Identification of a susceptibility gene in a 1.0 Mb region surrounding rs1368408

A total of 83 SNPs in the exons and promoters of the 11 genes in the 1.0 Mb region surrounding rs1368408 were identified by re-sequencing. In addition, 39 SNPs in the intergenic sequences of the same region, distributed at intervals of approximately 5 kb, were selected from the NCBI dbSNP. The allele frequencies of the 122 SNPs within the 1.0 Mb region were measured (Fig. 1D and E and Table 1) in 541 GD patients and 478 normal subjects from Shandong province. Data QC filters removed 18 SNPs from the analysis. Notably, of the remaining 104 SNPs, 20 exhibited significantly different allele frequencies between the two groups, with P-values <0.05 (Table 1 and Fig. 2A).
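The QC step described in this section (dropping SNPs with a low MAF or a strong HWE deviation in controls) can be sketched as below. This is a minimal illustration under stated assumptions: HWE is tested with a 1-degree-of-freedom Pearson chi-square rather than an exact test, MAF is computed on pooled cases and controls, and all genotype counts and function names are hypothetical; the text does not say which HWE test or MAF denominator was used.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB)."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(case_counts, control_counts, maf_min=0.01, hwe_p_min=1e-6):
    """Keep a SNP only if MAF >= maf_min and the control HWE P > hwe_p_min."""
    pooled = tuple(c + k for c, k in zip(case_counts, control_counts))
    if minor_allele_frequency(*pooled) < maf_min:
        return False
    return hwe_chi2_p(*control_counts) > hwe_p_min


# Hypothetical genotype counts (AA, AB, BB) for three SNPs.
print(passes_qc((100, 190, 94), (100, 200, 82)))  # common SNP, controls near HWE
print(passes_qc((0, 2, 382), (0, 3, 379)))        # very rare minor allele
print(passes_qc((50, 0, 50), (50, 0, 50)))        # no heterozygotes at all
```

The thresholds are the ones quoted in the text (MAF of 1% and an HWE P-value of 10⁻⁶); only the first hypothetical SNP survives both filters.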
Further analysis of these 20 SNPs revealed that 7 are distributed in the Secretoglobin Family 3A Member 2 (SCGB3A2, also designated Uteroglobin-related protein 1, UGRP1) gene, including 4 in the promoter region [pSNPs: SNP72 (-1351, G/A); SNP74 (rs6882292); SNP75 (-623~-622, AG/T); SNP76 (rs1368408)], 2 in the introns [iSNPs: SNP77 (rs2278376); SNP78 (rs3217372)] and one synonymous SNP in exon 3 (cSNP): SNP89 (rs34212847) (Table 1 and Fig. 2A). The most significant associations were measured at SNP76 (P = 4.11 × 10⁻⁸) and SNP75 (P = 1.37 × 10⁻⁸) (Table 1 and Fig. 2A and C). SNPs with relatively weak, albeit significant, GD associations were also detected in adjacent regions, including the promoter/coding portions of the SPINK5 (2 SNPs), KIAA0555 (2 SNPs), MGC23985 (1 SNP) and SPINK1 (1 SNP) genes, and in intergenic regions (7 SNPs) (Table 1 and Fig. 2A). Furthermore, in order to exclude false positives, we analyzed 20 neutral SNPs on different chromosomes as genomic controls (GCs). In the population, the GC inflation factor (λgc) was 0.5351. Our statistical results were all normalized to the GC. Notably, the above Mass array results were corroborated by the results of the sequencing analysis for the three pSNPs of the SCGB3A2 gene (SNP76, SNP75 and SNP74) in the Shandong population.

Figure 1. Linkage and association analysis, SNP distribution and gene content of the region between markers D5s436 and D5s413 on chromosome 5q31. (A) Non-parametric LOD score (NPL) sketch map from the original genome scan (11). The multipoint analysis localized the GD susceptibility locus to within an approximate interval of 2 cM between markers D5s436 and D5s434. The multipoint LOD scores throughout this interval are greater than 3.0, with a maximum multipoint LOD score of 4.12 at marker D5s2090 (11). (B) The 3.0 Mb region surrounding D5s2090, defined by a decrease in LOD score of 1.5 or more (a 99% confidence interval for linkage), was used as the focal point of our search to identify candidate GD-susceptibility genes. The region identified contains 25 genes. Red lines represent genes with forward orientations, and blue lines represent those with reverse orientations. The symbols 1–25 represent the gene names (Supplementary Material, Table S1). (C) The genotype results of the 179 SNPs selected from the 3.0 Mb region surrounding marker D5s2090 (D5s436–D5s413). Four SNPs with significant differences at the P-value <0.001 level between GD and control subjects are marked in (B) with a red cross, and the most significant GD association was observed for SNP rs1368408 (P = 3.69 × 10⁻⁵) (see Supplementary Material, Table S2 for detailed information). (D) The positions of 122 SNPs located in 11 genes within the 1.0 Mb region (from marker SHGC-111280 to RH92492). Of these, 83 SNPs were identified by re-sequencing these genes and 39 SNPs were selected from the NCBI dbSNP at intervals of approximately 5 kb, which are marked in (D) (see Supplementary Material, Table S1 for detailed information). Each gene is indicated by a box. (E) The SNPs located in exon 3, the introns and the 5′ and 3′ flanking regions of the SCGB3A2 gene. A total of 38 SNPs were identified by re-sequencing a 15 kb region of SCGB3A2.

Next, the linkage disequilibrium (LD) regions of the 104 SNPs within the 1.0 Mb region were evaluated using the Haploview program (16). Three LD regions composed of these SNPs were observed in the Shandong population (Fig. 2A, bottom panel). They were located between SNP32 and SNP39, SNP65 and SNP103, and SNP141 and SNP148, respectively (Fig. 2A, bottom panel).
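As a brief illustration of the pairwise LD measures on which block definitions of this kind are built, the sketch below computes D, D′ and r² for two biallelic loci from phased haplotype frequencies. The frequencies are hypothetical, the function name `pairwise_ld` is illustrative, and estimating haplotype frequencies from unphased genotypes is a separate step that is not shown; this is not a description of Haploview's internal block algorithm.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: allele B (10%) occurs only on haplotypes carrying A (15%).
d, d_prime, r2 = pairwise_ld(p_ab=0.10, p_a=0.15, p_b=0.10)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

In this made-up case D′ is 1 while r² stays well below 1, which is why two SNPs in the same LD block can still show different association strengths.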
Interestingly, when the 20 SNPs with significantly different allele frequencies between the GD and control populations in Shandong were examined for their locations within the LD block structure, 7 of them were found in the middle region, whereas 5 and 8 of them, with relatively weak association signals with GD (P-value: 0.0402–0.0028), were found in the left and right LD blocks, respectively (Fig. 2A and Table 1). Notably, all SNPs in this 1 Mb region with a P-value of less than 0.001 are located in the middle region (Table 1 and Fig. 2A and C).

To identify causal variants of GD in this 1.0 Mb region, the genotype data of the 104 SNPs (of 122) suitable for logistic regression analysis in the Shandong population were examined further by logistic regression (5,17) (Fig. 2D–K and Supplementary Material, Table S3). When SNP72 (-1351, G/A), SNP74 (rs6882292), SNP75 (-623~-622, AG/T), SNP76 (rs1368408), SNP77 (rs2278376), SNP78 (rs3217372) and SNP89 (rs34212847) were individually put

Table 1.
The name and location of the SNPs in the 1.0 Mb region around rs1368408 and the results of association analysis for these SNPs Shandong Control Case (%) (%) P-value OR OR (95%CI) Shanghai Control Case (%) (%) P-value OR OR (95%CI) SNP27 SNP28 SNP29 SNP30 SNP31 SNP32 SNP33 SNP34 SNP35 SNP36 SNP37 SNP38 SNP39 SNP40 SNP41 SNP42 SNP43 SNP44 SNP45 SNP46 SNP47 SNP48 SNP49 SNP50 SNP51 SNP52 SNP53 SNP54 SNP55 SNP56 SNP57 SNP58 SNP59 SNP60 SNP61 SNP62 SNP63 SNP64 SNP65 SNP66 SNP67 SNP68 SNP69 SNP70 SNP71 SNP72 SNP73 SNP74 SNP75 SNP76 SNP77 SNP78 SNP79 SNP80 SNP81 SNP82 SNP83 SNP84 SNP85 SNP86 SNP87 STK32A STK32A STK32A STK32A STK32A STK32A STK32A DPYSL3 DPYSL3 DPYSL3 DPYSL3 DPYSL3 DPYSL3 Intergenetic Intergenetic Intergenetic KIAA0555 KIAA0555 KIAA0555 KIAA0555 KIAA0555 KIAA0555 KIAA0555 Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic SPINK1 SPINK1 Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 40.19 33.54 40.68 6.00 39.53 32.30 39.90 6.13 0.7781 0.6548 0.7679 0.9437 0.97 0.95 0.97 1.02 34.35 32.06 40.08 5.50 33.57 33.37 37.71 6.64 0.7006 0.5351 0.3782 0.2177 0.97 1.06 0.91 1.22 40.36 28.57 23.20 12.46 23.27 23.12 23.31 11.57 31.43 37.78 26.32 22.10 11.06 19.68 22.78 23.19 11.15 28.13 0.2775 0.3571 0.5682 0.5450 0.0567 0.8358 0.9119 0.8028 0.1094 0.90 0.90 0.93 0.90 0.80 0.98 0.99 0.95 0.85 35.29 26.77 19.70 21.00 19.79 20.23 18.20 15.22 31.07 32.74 26.51 18.91 24.11 19.94 20.62 19.24 16.81 28.67 0.1698 0.8932 0.6068 0.0546 0.9244 0.8035 0.4877 0.3112 0.2762 0.89 0.99 0.95 1.20 1.01 1.02 1.07 1.13 0.89 40.70 23.96 12.58 33.28 46.05 49.91 31.95 38.39 19.66 9.78 36.86 47.02 43.24 28.49 0.2869 0.0264 0.0588 0.1573 0.7059 0.0028 0.0916 0.91 0.78 0.63 – 0.97 0.75 1.17 1.04 0.76 0.64 – 0.91 0.85 43.58 26.19 12.34 34.89 50.00 
43.57 24.62 51.67 31.77 12.38 38.90 47.43 47.29 25.63 0.0010 0.0238 0.9729 0.0645 0.3252 0.1124 0.6438 1.38 1.14 – 1.68 1.31 1.04 – 1.66 1.00 1.19 0.90 1.16 1.05 28.22 35.36 6.85 26.57 30.72 16.05 10.30 6.05 14.43 4.48 4.39 8.38 17.60 10.12 20.40 19.39 29.32 41.03 6.17 32.45 34.96 18.09 10.08 8.92 13.92 5.08 4.46 7.84 19.90 11.94 16.26 20.95 0.6914 0.0203 0.5960 0.0097 0.0561 0.2757 0.9146 0.0402 0.7775 0.5594 0.9454 0.7822 0.2174 0.2485 0.0657 0.4110 1.06 28.21 1.27 1.04 – 1.55 38.18 0.89 4.81 1.33 1.08 – 1.64 26.12 1.21 33.01 1.16 15.15 0.98 8.78 1.53 1.08 – 2.17 7.82 0.96 13.60 1.14 4.04 1.02 4.59 0.93 24.92 1.16 17.86 1.20 8.28 0.76 17.87 1.10 17.43 24.36 41.81 6.15 24.73 31.39 17.01 8.17 8.85 13.77 4.79 5.87 25.89 14.98 9.01 18.81 19.90 0.0253 0.0780 0.2042 0.5234 0.3696 0.3043 0.5696 0.3837 0.8976 0.3359 0.1368 0.6140 0.0441 0.7375 0.5393 0.1463 0.82 0.69 – 0.98 1.16 1.30 0.93 0.93 1.15 0.92 1.14 1.01 1.20 1.30 1.05 0.81 0.66 – 0.99 1.10 1.06 1.18 18.93 20.32 0.1683 1.09 5.81 12.67 10.20 7.29 5.25 7.35 4.79 8.63 10.24 25.87 11.44 9.75 19.03 11.92 7.21 11.35 6.47 13.93 7.92 7.39 4.13 12.34 4.33 14.71 19.85 37.69 14.60 14.61 24.88 11.83 8.90 12.44 0.5554 0.4289 0.2082 0.9653 0.2659 0.0004 0.6520 0.0001 1.37 10 28 4.11 10 28 0.0443 0.0016 0.0896 0.9544 0.2027 0.4452 1.12 1.12 0.76 0.99 0.78 1.77 0.90 1.83 2.17 1.73 1.32 1.58 1.41 0.99 1.26 1.11 6.72 12.15 8.72 6.35 3.42 7.32 3.00 5.27 6.89 14.95 9.40 9.17 19.74 10.89 6.41 8.54 4.60 13.65 9.06 6.12 5.58 8.52 7.01 5.96 8.65 20.49 12.25 13.56 21.52 11.33 9.55 11.23 0.0448 0.3222 0.7672 0.8059 0.0153 0.2995 1.46 10 25 0.4986 0.1348 0.0012 0.0366 0.0014 0.4304 0.7379 0.0146 0.0427 0.67 1.14 1.04 0.96 1.67 1.18 2.44 1.14 1.28 1.47 1.35 1.55 1.11 1.05 1.54 1.35 6.43 12.60 9.00 6.54 3.84 7.48 3.65 6.83 8.66 19.47 8.72 8.70 18.94 11.38 8.40 10.12 6.11 13.45 9.02 6.50 4.47 9.38 4.31 8.43 11.12 23.59 10.19 10.67 20.71 11.67 9.82 11.61 0.5514 0.2500 0.9830 0.9512 0.1431 0.0035 0.1270 0.0041 7.62 10 25 1.43 10 26 0.0158 
0.0033 0.0823 0.6778 0.0341 0.2018 0.95 1.08 1.00 0.99 1.17 1.28 1.19 1.26 1.32 1.28 1.19 1.25 1.12 1.03 1.19 1.17 5.36 11.86 12.67 6.49 11.67 12.76 0.3037 0.8998 0.9549 1.23 0.98 1.01 6.20 10.35 11.64 4.97 9.87 14.15 0.2119 0.7062 0.0918 0.79 0.95 1.25 5.44 11.67 11.19 6.10 11.36 12.03 0.5167 0.8300 0.2283 1.13 0.97 1.09 rs4705132 rs6894633 rs6580458 rs55936730 rs6884181 I7-1,1113728 rs918797 rs3805533 rs1049171 E14-2,160813 rs3749721 I11-1,154766 rs2241696 rs958677 rs7716144 rs981644 rs3763094 rs3763095 rs2116766 rs7735403 rs1432827 rs6895278 P1,2963 rs6895278 rs12655012 rs12659905 rs1016104 rs4705201 rs17107298 rs11319 P1,2133 rs3806925 rs4705194 rs1594671 rs1368412 rs1025489 rs7702893 rs3777125 rs6877288 rs6895894 rs6877478 rs7726085 rs7726552 rs7727031 P6,21664 P5,21351 P4,2130121303 P3,rs6882292 P2,26232622 P1,rs1368408 I1-1,rs2278376 I1-2,rs3217372 rs10058203 rs2116805 I1.3,11454 I1.4-1,11779 rs13355689 I1.4-2,11939 rs6859234 rs41291429 rs6859391 1.30 – 2.42 1.36 – 2.44 1.67 – 2.82 1.43 – 2.10 1.01 – 1.73 1.20 – 2.09 0.45 – 0.99 1.10 – 2.54 1.61 – 3.69 1.16 – 1.85 1.02 – 1.78 1.18 – 2.04 1.09 – 2.19 1.01 – 1.82 Combined Han Control Case (%) (%) P-value Position Sequence near the polymorphism 146599399 146637662 146637777 146639268 146703083 146708589 146708683 146751943 146752171 146752641 146753229 146758688 146784559 146842422 146892300 146947379 146992484 146992496 147004669 147008123 147053948 147102213 147143408 147150530 147156489 147162250 147167239 147172314 147183226 147184385 147191585 147192922 147200067 147205835 147212204 147216985 147226926 147232327 147234886 147234964 147235002 147235514 147235795 147236113 147236803 147237116 147237164 147237749 147237844 147238355 147238688 147238737 147239235 147239598 147239920 147240245 147240275 147240405 147240931 147240961 147241008 GGGAGC[G/C]AACACT TTCTAT[G/T]TTTACT AGATCA[T/C]GTTTTA TTGACA[C/T]AATTGC ACTTTT[C/T]AGCTGG CAAATG[C/T]TGTGCT GAGGAT[A/G]AGTGAC CTCTTA[G/A]TTTACA ACCATT[G/A]TCTCTG 
GGGGAA[C/T]TGGGAA CCCTAG[G/A]GTCTGC TCAAAA[C/T]CTCAAC AGGTTG[G/A]ATTACA CTGCTT[G/T]GATAGA GCAAGG[C/T]GTTCCT TAGGCA[G/T]GTTAAA ACAGGA[C/T]GCCAGA ACAACA[A/G]GAACTA TGAACT[G/T]ATGGTG GAGATA[T/C]ACTAAA GAAATC[A/T]CTACTG TTACTA[C/T]GTGCCA NTATAG[T/C]TAGAAA TTACTA[C/T]GTGCCA ATTATT[C/T]AGGTAG GCTATC[A/G]TTGGCT TTATCA[A/G]TGTTAT ATGTAG[A/C]GGTTAA TTATCT[A/G]AAGTTT GGTCAC[C/T]GCGAGG TTTTCC[T/C]GACAGA TCCTAG[C/T]GCTAAG TAAGCC[A/G]AGTGTG GCTTCC[A/G]AGCTTC GTCAGC[C/T]CAATTT CTCCCT[A/G]TCACAG AAAAAA[A/T]TTCAAA GTTATT[C/G]CAATCA TCTGGG[A/G]TCTTGG AATAAA[A/G]GTCGTT ATTTGC[A/G]TATGAA TGTATA[C/T]GTATGT CTTGGC[A/T]TTTATA GGCCTG[C/T]GTGGCA ATTTAT[A/T]TATACT TTCATG[G/A]TGTCTT AAAGAT[AAA/2]GAAATG ATTTAT[G/A]TTCCCA TCAAA[AG/T]ACACT TTGTTT[G/A]GTGAGA AGTAAG[C/A]CTTGCC TTTTTT[T/2]ATTTTA GCTTCT[A/G]CCTAAG CCTACA[A/C]TGGCAA CACATG[C/A]ATGTGT AAGGCT[C/T]ACCATC TCCTAA[C/T]GGTTCC CATGTT[A/G]GAATTA ATGACG[A/G]AGAGTG CTTCTC[C/T]GAGGAG AGAAAG[C/G]TAAGTA OR OR (95%CI) 1.10– 1.49 1.09– 1.45 1.16– 1.50 1.17– 1.40 1.04– 1.35 1.09– 1.44 1.03– 1.38 Continued 1159 Marker location Human Molecular Genetics, 2009, Vol. 18, No. 6 SNP Description symbols Table 1. 
Continued SNP88 SNP89 SNP90 SNP91 SNP92 SNP93 SNP94 SNP95 SNP96 SNP97 SNP98 SNP99 SNP100 SNP101 SNP102 SNP103 SNP104 SNP105 SNP106 SNP107 SNP108 SNP109 SNP110 SNP111 SNP112 SNP113 SNP114 SNP115 SNP116 SNP117 SNP118 SNP119 SNP120 SNP121 SNP122 SNP123 SNP124 SNP125 SNP126 SNP127 SNP128 SNP129 SNP130 SNP131 SNP132 SNP133 SNP134 SNP135 SNP136 SNP137 SNP138 SNP139 SNP140 SNP141 SNP142 SNP143 SNP144 SNP145 SNP146 SNP147 SNP148 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 SCGB3A2 Intergenetic Intergenetic MGC23985 MGC23985 MGC23985 MGC23985 Intergenetic LOC391839 LOC391839 Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic Intergenetic SPINK5 SPINK5 SPINK5 SPINK5 SPINK5 SPINK5 SPINK5 SPINK5 SPINK5 Intergenetic Intergenetic SPINK5L2 SPINK5L2 SPINK5L2 SPINK5L2 MGC21394 MGC21394 LOC402232 LOC402232 rs3910207 E3-1,rs34212847 rs3843496 rs3910183 3U1-1,13679 rs17107376 rs61012413 rs60040551 rs4705204 rs17107378 rs17107379 rs17107380 rs17107381 rs7708635 rs1594666 rs1010764 rs1549204 rs2250145 rs1432974 P2,21412 rs1153089 rs9654488 rs7712580 rs721570 rs1153084 rs11958481 rs1432973 rs7722416 rs4705047 rs7703202 rs1432978 rs4273592 rs961841 rs1363525 rs2161337 rs4421091 rs2080085 rs7700964 rs7703112 rs17775074 rs7706528 rs2895729 rs2287771 rs2303062 rs2303063 rs2303064 rs2303065 rs2303067 rs2303068 rs4349706 rs3088193 rs2303064 rs2287770 rs6881658 rs6895745 rs4705055 rs4269285 rs17096690 rs10477364 rs9325091 rs1023714 Shandong Control Case (%) (%) P-value OR OR (95%CI) Shanghai Control Case (%) (%) P-value OR OR (95%CI) Combined Han Control Case (%) (%) P-value OR OR (95%CI) 6.99 28.85 11.85 2.89 11.59 11.46 3.99 11.97 28.56 12.21 2.51 12.24 11.95 4.32 0.0003 0.8912 0.8031 0.6245 0.6651 0.7441 0.7465 1.81 1.32 – 2.47 7.25 0.99 26.02 1.03 9.38 0.86 
2.56 1.06 11.16 1.05 10.94 1.09 5.48 8.83 26.12 12.01 4.06 10.71 10.15 4.63 0.1653 0.9536 0.0535 0.0682 0.7626 0.5557 0.4249 1.24 1.01 1.32 1.61 0.96 0.92 0.84 8.05 27.94 10.67 2.88 11.07 11.24 6.03 9.54 28.49 11.44 2.85 11.38 11.34 5.83 0.0110 0.5728 0.2396 0.9413 0.6520 0.8866 0.7105 1.20 1.05 – 1.37 1.03 1.08 0.99 1.03 1.01 0.97 13.18 13.19 11.40 7.01 18.55 5.16 29.48 16.22 36.34 22.21 41.80 13.18 12.88 11.45 7.21 21.31 7.64 29.96 14.74 38.51 23.95 36.83 0.9968 0.8417 0.9737 0.8919 0.1572 0.0555 0.8524 0.4823 0.3311 0.4041 0.0301 1.00 0.97 1.00 1.03 1.19 1.52 1.02 0.89 1.10 1.10 0.81 0.67 – 0.98 10.88 12.69 11.03 8.85 17.01 9.46 26.30 16.60 35.86 20.64 16.11 11.38 10.95 7.77 19.65 9.22 27.96 17.17 36.89 22.72 0.0005 0.4096 0.9587 0.4127 0.1238 0.8523 0.3406 0.7005 0.5930 0.2342 1.57 1.22 – 2.03 0.88 0.99 0.87 1.19 0.97 1.09 1.04 1.05 1.13 11.38 13.49 10.31 10.25 18.89 10.08 12.65 12.76 10.48 10.10 20.41 10.27 0.0751 0.6327 0.7962 0.8283 0.0819 0.7830 1.13 0.94 1.02 0.98 1.10 1.02 39.39 5.65 32.26 48.89 2.63 35.83 0.0030 0.1172 0.1257 0.3542 0.6221 0.0603 1.47 1.14 – 1.90 36.36 0.45 11.53 1.17 39.38 8.78 0.84 9.80 0.93 25.07 0.56 10.48 35.93 15.29 38.46 7.14 9.17 26.43 10.77 0.8491 0.0529 0.6864 0.2466 0.6025 0.4501 0.8234 0.98 1.38 0.96 0.80 0.93 1.07 1.03 12.68 21.33 10.07 10.87 20.17 5.86 45.77 31.31 3.27 35.63 38.42 8.50 43.10 25.63 4.31 35.15 39.07 12.16 0.2593 0.0323 0.3337 0.8362 0.7863 0.0176 0.90 0.76 0.59 – 0.97 1.33 0.98 1.03 1.49 1.08 – 2.05 49.60 31.81 2.59 31.95 34.94 10.74 41.49 5.01 53.50 35.02 4.04 26.46 34.56 6.42 46.42 4.21 0.0615 0.0869 0.1169 0.0060 0.8427 0.0091 0.0146 0.3467 1.17 1.16 1.58 0.77 0.63 – 0.93 0.98 0.57 0.37 – 0.87 1.22 1.04 – 1.44 0.83 4.81 4.64 0.9176 0.96 20.79 21.85 16.22 16.22 0.0152 0.0029 22.36 25.33 0.1488 0.74 0.58 – 0.94 17.92 0.69 0.55 – 0.88 17.89 5.64 1.18 25.22 15.05 14.24 6.25 25.13 0.0775 0.0220 0.5716 0.9624 0.81 0.76 0.60 – 0.96 1.11 1.00 48.61 44.49 44.80 45.43 46.33 45.89 48.93 46.60 44.14 47.58 44.31 40.99 
40.89 50.10 0.3809 0.8771 0.2628 0.6229 0.0197 0.0275 0.6241 0.92 0.99 1.12 0.96 0.80 0.67 – 0.96 0.82 0.68 – 0.98 1.05 51.31 40.03 41.43 45.69 44.64 43.11 49.21 47.19 48.98 46.81 47.87 47.35 44.04 52.86 0.0353 0.0001 0.0207 0.3687 0.3165 0.6690 0.0748 0.85 0.73 – 0.99 1.44 1.19 – 1.73 1.24 1.03 – 1.50 1.09 1.12 1.04 1.16 49.22 49.02 48.46 49.33 34.17 6.27 23.33 22.82 47.63 46.48 46.57 47.61 38.37 7.11 22.59 22.33 0.5029 0.2854 0.4198 0.4600 0.1137 0.5192 0.7546 0.8401 0.94 0.90 0.93 0.93 1.20 1.14 0.96 0.98 47.19 48.76 48.87 47.69 35.19 6.55 21.56 21.55 43.05 44.11 43.93 46.72 28.81 9.07 17.40 18.95 0.0595 0.0221 0.0112 0.6422 0.0125 0.0569 0.0079 0.0964 0.85 0.83 0.82 0.96 0.75 1.42 0.77 0.85 0.71 – 0.97 0.70 – 0.96 0.59 – 0.94 0.63 – 0.93 Position Sequence near the polymorphism 147241677 147241803 147242005 147242108 147242145 147242413 147243151 147243346 147243630 147243731 147243784 147243815 147243864 147244810 147245459 147247923 147251658 147266247 147266805 147267670 147267677 147270206 147280966 147283244 147304359 147307881 147311055 147314223 147322787 147323729 147328074 147334766 147338580 147363753 147370184 147377498 147389414 147391761 147397943 147411105 147417300 147421180 147425205 147460200 147460220 147460273 147460305 147461148 147461211 147496791 147496955 147508590 147519728 147528521 147528541 147528612 147529356 147562510 147566982 147602376 147604459 GTTTCC[C/T]CATCAG CTTGGT[G/A]TGACAT CCAGAT[C/T]AGTTTT CTCTAA[G/T]TTAAAC ATCTCA[T/C]GGTGTT TTTCCT[C/T]TACTCT CACCTA[C/G]TTGACT CTTTCA[C/T]TCTGTG CATATT[G/T]ATGCAT TCCTAT[A/G]GGAAAG TTACTT[A/G]ATGACT TAGATG[A/C]CTCTCA TCTTTC[C/T]GCCTAC CCATCA[G/T]CCATAC TGTAGA[C/G]AAGCTG ACTAAT[A/C]ACCATG AAAATT[A/C]TTTGTG CTGTCT[C/T]AGTACT GAAGGT[A/G]TCACAA CACGGT[A/G]GCTCAC AGCCAG[G/A]CACGGT AGGAAG[A/G]AAAAAC GAGAAA[C/T]TTCAAA TCATTA[A/G]AGGAAA CCCTGA[A/G]CCTTCA AGTCAT[G/T]AGAAAA ATGCTA[A/G]GATGAT AATGCC[A/C]GTCAGC AGACGA[G/T]CTAATT CCATTG[C/T]TCTGTG ATTCTG[A/T]GAAGTT TGGTAG[C/T]GGTGAT TCTTCC[G/T]TTCAAT 
GGTTTT[C/T]CTGTGT GACTCA[A/G]TGATAC AATATA[A/G]TTCTGA CCCCTG[C/T]CAACAA CCTACA[C/T]CTCTTT GAAATA[A/G]TTTAAT TGCTCT[A/G]TGGCTT TACATA[C/T]GGTGAG CATACA[C/T]GTACAA CCTTCA[T/C]GTTAAT TCTTCT[A/G]TCTCGG TTTGCA[G/A]TGAATA GAGAAC[G/A]ATCCTA AGTGCA[T/C]GGCAAC GAAGGT[A/G]AATCAA CCTCCA[G/A]CAACTC CCCCAG[T/C]TCTGAA AGACAT[C/G]TCCACC GAGAAC[A/G]ATCCTA AGGTGA[C/T]GCTGAA AACCTT[T/G]CATAGT GAACTT[G/C]CAATCA AGAGCA[C/T]ATCAGC AATGGG[T/C]GGAGTA AATATA[A/G]GAATCA CATTTC[A/G]TATCTC AAGGAC[A/T]ACCAGG ATTCAG[A/T]TCTTAA

P-values in bold indicate allele frequencies that differ significantly between GD and normal subjects. A blank in the P-value column indicates one of the 40 SNPs removed from the analysis because of a MAF <1% or an HWE P ≤ 10⁻⁶ in controls.

Figure 2. The results of the association and logistic regression analyses for SNPs located in the 1 Mb region around SNP rs1368408 in the Shandong, Shanghai and combined Han populations. A total of 122 SNPs located in the 1 Mb region around SNP rs1368408 were genotyped in all subjects from Shandong and Shanghai. After removal of the SNPs with MAFs <1% or HWE P ≤ 10⁻⁶ in the controls, the SNP case–control associations [−log10(P-value) plotted against location in megabases] and the SNP linkage disequilibrium (LD) region analysis are presented for Shandong (A) and Shanghai (B). The SNPs from the SCGB3A2 region that have strong associations with GD are marked within the red vertical lines. The most significantly associated SNPs are located in the SCGB3A2 gene in two independent studies, with the smallest P-values of 1.37 × 10⁻⁸ and 1.46 × 10⁻⁵ in the Shandong (top portion of A) and Shanghai (top portion of B) populations, respectively (see Table 1 for detailed information).
The LD regions of these SNPs in the 1.0 Mb region were analyzed with Haploview software in the Shandong (bottom portion of A) and Shanghai (bottom portion of B) populations. Three LD blocks composed of these SNPs are observed in these two independent populations (bottom portion of A and B). They are located between SNP32 and SNP39, SNP65 and SNP103, and SNP141 and SNP148, respectively. The SNPs in the SCGB3A2 gene are marked by the rectangle. (C) The 38 SNPs located in the 15.0 kb region of SCGB3A2 were genotyped in 2811 case subjects with GD and 2807 control subjects in the combined Chinese Han population. The case–control association plots [−log10(P-value)] for the SNPs located in the 15 kb region are magnified for the Shandong, Shanghai and combined Chinese Han populations in (C). (D–K) Two-locus logistic regression analyses of SNP75 (-623~-622, AG/T) and SNP76 (rs1368408) in the Shandong (D–G) and combined Han (H–K) populations. SNP75 and SNP76 were put individually into the regression models as the best markers in the SCGB3A2 gene, and all other markers were added sequentially to see whether a second locus could improve the model. In the Shandong population, 8 of the 104 SNPs suitable for logistic regression analysis improved the model with SNP75 (D) and eight markers improved the model with SNP76 (F), at the P-value <0.01 level. Conversely, when each of the 104 loci was taken in turn as the base model and the test locus was added to it, all the markers could be improved by adding SNP75 (E) or SNP76 (G) (see Supplementary Material, Table S3 for detailed information). Moreover, in the combined Han population, 6 of the 33 SNPs suitable for logistic regression analysis improved the model with SNP75 (H) and 10 markers improved the model with SNP76 (J), at the P-value <0.01 level.
In contrast, when we tested a regression model by taking each one of the 33 loci in turn and adding the test locus to it, all the markers could be improved by adding SNP75 (I) or SNP76 (K) (see Supplementary Material, Table S4 for detailed information).

into the regression models as the best marker for the region of SCGB3A2, only SNPs in the other three regions could improve these models, with a cut-off P-value <0.01 (SNP47, SNP53; SNP107; and SNP128, respectively) (Fig. 2D and F and Supplementary Material, Table S3). However, SNP47, SNP107 and SNP128 are unlikely to contribute to the susceptibility to GD because their mutation frequencies are lower in the GD group than in the control group (Table 1). Next, we tested the regression model by taking each one of the 104 loci in turn and adding the testing locus to it. Interestingly, the majority of the markers could be improved by adding each of the SNPs in the SCGB3A2 gene (SNP72, SNP74, SNP75, SNP76, SNP78 and SNP89) (Supplementary Material, Table S3 and Fig. 2E and G). In contrast, only 16 SNP models could be improved by SNP53, at a P-value of less than 0.01 (Supplementary Material, Table S3). These results suggested that SNP53 was not likely to be the causal variant, owing to its very limited impact on the overall model of SNPs, and that the weak association we observed for this SNP was probably due to LD with causal variants residing in the SCGB3A2 region. With regard to the SNPs in the SCGB3A2 region, SNP76 (rs1368408) and SNP75 (−623 −622, AG/T) are probably the most important for the susceptibility to GD because they improve the model with any one of the 104 SNPs, with the lowest P-values among the SNPs of SCGB3A2 (Supplementary Material, Table S3 and Fig. 2E and G). However, these results do not exclude the possibility that multiple SNPs located in the SCGB3A2 region act in combination to increase the risk of GD.

Table 2. Frequencies of SCGB3A2 haplotypes in different populations

Population     Haplotype   Control number (%)   Case number (%)   P-value        OR (95% CI)
Shandong       000101110   28 (2.94)            79 (7.30)         1.11 × 10⁻⁵    2.60 (1.67–4.04)
Shandong       010011001   56 (5.88)            108 (9.98)        7.04 × 10⁻⁴    1.77 (1.27–2.48)
Shandong       000011000   16 (1.68)            51 (4.71)         1.31 × 10⁻⁴    2.89 (1.64–5.11)
Shandong       000001000   33 (3.47)            58 (5.36)         0.0392         1.58 (1.02–2.44)
Shandong       101001110   11 (1.16)            29 (2.68)         0.0135         2.36 (1.17–4.74)
Shandong       000000000   651 (68.38)          645 (59.61)       4.04 × 10⁻⁵    0.68 (0.57–0.82)
Shandong       000000100   28 (2.94)            2 (0.18)          2.67 × 10⁻⁷    0.06 (0.01–0.26)
Shanghai       000101110   63 (5.24)            48 (5.39)         0.8781         1.03
Shanghai       010011001   84 (6.99)            74 (8.31)         0.2564         1.21
Shanghai       101001110   36 (3.00)            45 (5.06)         0.0157         1.72 (1.10–2.70)
Shanghai       000000000   995 (82.78)          674 (75.73)       7.23 × 10⁻⁵    0.65 (0.52–0.80)
Combined Han   000101110   200 (3.80)           257 (5.20)        0.0007         1.39 (1.15–1.68)
Combined Han   010011001   401 (7.62)           440 (8.90)        0.0192         1.18 (1.03–1.36)
Combined Han   101001110   161 (3.06)           187 (3.78)        0.0448         1.24 (1.00–1.54)
Combined Han   000000000   4140 (78.68)         3655 (73.90)      1.35 × 10⁻⁸    0.77 (0.70–0.84)

Haplotype digits give the alleles at SNP71, SNP72, SNP73, SNP74, SNP75, SNP76, SNP77, SNP78 and SNP89, in that order. Bold letters indicate those haplotypes with significant differences between GD and normal subjects. All data shown here are haplotypes whose frequencies are more than 2%.

Because multiple SNPs may act in combination to increase the risk of disease, haplotypes of the SNPs in the SCGB3A2 gene were investigated and their frequencies in the GD and control groups were compared. In the population of Shandong Province, 7 haplotypes with a frequency of more than 2% were formed from 9 SCGB3A2 SNPs and accounted for 85% of all haplotypes (Table 2). Five of these haplotypes showed significantly higher frequencies among individuals with GD than the control group.
As shown in Table 2, the haplotype 000101110 displayed the highest statistical difference (P = 1.11 × 10⁻⁵, OR 2.60, CI 1.67–4.04), followed by haplotype 000011000 (P = 1.31 × 10⁻⁴, OR 2.89, CI 1.64–5.11) and 010011001 (P = 7.04 × 10⁻⁴, OR 1.77, CI 1.27–2.48). In contrast, haplotype 000000000 was more frequently observed in controls than in GD patients (P = 4.04 × 10⁻⁵, OR 0.68, CI 0.57–0.82) (Table 2). Notably, all the haplotypes closely associated with GD contained one or two variants of the SNP76, SNP75 or SNP74 alleles (Table 2). At the same time, a replication study was performed in 545 patients and 603 normal subjects from Shanghai, a metropolitan city in China where many individuals come from different regions and have multiple founders. After 16 SNPs were removed from the analysis by data QC filters (15), 24 out of 106 SNPs had different distribution patterns between patients and controls in the Shanghai population analysis (Table 1 and Fig. 2B and C). Of those, nine were found in the SCGB3A2 gene, with the most significant association found in the promoter of this gene [SNP73 (−1301 −1303, AAA/−), P = 1.46 × 10⁻⁵, OR 2.44, CI 1.61–3.69] (Table 1). Interestingly, three SNPs in the SCGB3A2 gene, named SNP76 (rs1368408), SNP77 (rs2278376) and SNP78 (rs3217372), had significant frequency differences between patients and controls in the two independent populations (Table 1). We noticed that SNP76 (rs1368408), the nucleotide variant with the most significant association signal in the Shandong population, also exhibited significantly higher allele frequencies in patients with GD than in healthy individuals collected from Shanghai (20.49 versus 14.95%, P = 1.20 × 10⁻³, OR 1.47, CI 1.16–1.85) (Table 1). In the Shanghai population, only one haplotype, 101001110, was more frequent in individuals with GD than in the controls (P = 0.0157, OR 1.72, CI 1.10–2.70) (Table 2). All of these results strongly suggested that SNP76 and related haplotypes conferred susceptibility to GD.
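The nine-digit haplotype codes quoted above can be read position by position. The following is a minimal sketch, assuming that the digits follow the column order of the Table 2 header (SNP71, SNP72, SNP73, SNP74, SNP75, SNP76, SNP77, SNP78, SNP89) and that 1 marks the variant allele; the function name is illustrative and not taken from the study.

```python
# Column order assumed from the Table 2 header; "1" is taken to mean the variant allele.
SNP_ORDER = ["SNP71", "SNP72", "SNP73", "SNP74", "SNP75",
             "SNP76", "SNP77", "SNP78", "SNP89"]


def variant_snps(haplotype: str) -> list[str]:
    """Return the SNPs that carry the variant allele in a 9-digit haplotype code."""
    if len(haplotype) != len(SNP_ORDER) or set(haplotype) - {"0", "1"}:
        raise ValueError(f"expected {len(SNP_ORDER)} binary digits, got {haplotype!r}")
    return [snp for snp, bit in zip(SNP_ORDER, haplotype) if bit == "1"]


# Haplotypes discussed in the text.
for code in ("000101110", "000011000", "010011001", "101001110", "000000000"):
    print(code, variant_snps(code))
```

Read this way, every GD-associated haplotype named in the text carries the SNP76 variant together with SNP75 or SNP74 (or, in Shanghai, SNP73 and SNP71), while 000000000 carries none of the variants.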
Table 3. False positive report probability (FPRP) values for eight SNPs with significant differences between 2811 patients with GD and 2807 healthy individuals

                                                                                          FPRP at prior probability
SNP     Description         Odds ratio (95% CI)   Reported P-value   Statistical power(a)   0.25    0.1     0.01    0.001   0.0001   0.00001
SNP72   P5, −1351           1.28 (1.10–1.49)      0.0035             0.980                  0.004   0.013   0.128   0.596   0.937    0.993
SNP74   P3, rs6882292       1.26 (1.09–1.45)      0.0041             0.993                  0.004   0.011   0.112   0.559   0.927    0.992
SNP75   P2, −623 −622       1.32 (1.16–1.50)      7.62 × 10⁻⁵        0.975                  0.000   0.000   0.002   0.021   0.175    0.680
SNP76   P1, rs1368408       1.28 (1.17–1.40)      1.43 × 10⁻⁶        1.000                  0.000   0.000   0.000   0.000   0.001    0.007
SNP77   I1-1, rs2278376     1.19 (1.04–1.35)      0.0158             1.000                  0.020   0.058   0.405   0.873   0.986    0.999
SNP78   I1-2, rs3217372     1.25 (1.09–1.44)      0.0033             0.994                  0.006   0.018   0.166   0.667   0.953    0.995
SNP81   I1.3                1.19 (1.03–1.38)      0.0341             0.999                  0.060   0.161   0.679   0.955   0.995    1.000
SNP89   E3-1, rs34212847    1.20 (1.05–1.37)      0.0110             1.000                  0.021   0.059   0.409   0.875   0.986    0.999

(a) Statistical power is the power to detect an odds ratio of 1.5 for the homozygotes with the rare genetic variant, with an α level equal to the reported P-value. FPRP values below 0.2 are in bold face.

To further confirm the associations of the SCGB3A2 variants with GD susceptibility, the 15 Kb region containing the SCGB3A2 gene and its 5′ and 3′ flanking regions was completely re-sequenced. A total of 38 SNPs were found in this gene, with 13, 12 and 13 SNPs distributed in the exons and introns, the 5′ flanking region and the 3′ flanking region of SCGB3A2, respectively. Subsequent association analyses of the 38 SNPs residing in and around the SCGB3A2 gene were performed in 2811 patients with GD and 2807 healthy individuals, who were collected from Jiangsu, Henan, Anhui and Fujian Province, along with the samples collected from the Shandong and Shanghai populations; all subjects were from the Chinese Han population.
After excluding 5 SNPs with HWE P < 1 × 10⁻⁶ in controls, 8 of the remaining 33 SNPs in the SCGB3A2 region had significant frequency differences between the patients with GD and healthy individuals in the combined Han population. Similarly, the most significant differences between the GD patients and controls were measured for SNP76 and SNP75, which are located in the promoter of SCGB3A2 (P = 1.43 × 10⁻⁶ and 7.62 × 10⁻⁵, respectively) (Table 1 and Fig. 2C). Interestingly, in the Chinese Han cohorts recruited from different geographic regions of China, four haplotypes with frequencies higher than 2% were formed from nine SCGB3A2 SNPs and accounted for more than 90% of all haplotypes (Table 2). Three of these haplotypes had significantly higher frequencies among patients with GD than the control group. Notably, similar to the results from the Shandong population studies, the haplotypes of the SNP76 (rs1368408, G/A)+SNP74 (rs6882292, G/A) (000101110) or SNP76+SNP75 (−623 −622, AG/T) (010011001) variants also correlated with high disease susceptibility in the combined Chinese Han cohort (P = 0.0007 and P = 0.0192, respectively) (Table 2). In contrast, haplotype 000000000 was more frequently observed in the control group than in GD patients (P = 1.35 × 10⁻⁸, OR 0.77, CI 0.70–0.84) (Table 2). Moreover, the results of logistic regression analysis of the combined Chinese Han cohorts also suggested that the pSNPs in the SCGB3A2 gene, SNP76 and SNP75, were the strongest determinants of susceptibility to GD because they improved the model when combined with any one of the other 33 SNPs (Fig. 2H–K and Supplementary Material, Table S4). All the results strongly suggested that SNP76 and SNP75 conferred susceptibility to GD, particularly when they existed in haplotypes of SNP76 and SNP75 or SNP76 and SNP74 (Table 2).
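The odds ratio, confidence interval and P-value quoted above for haplotype 000000000 in the combined cohort can be approximated from the Table 2 counts (3655 case chromosomes and 4140 control chromosomes carry it). The following is a minimal sketch under stated assumptions: the chromosome totals of 4946 and 5262 are back-calculated here from the reported percentages and are not given in the source, the test is a Pearson χ² without continuity correction, and the interval is a Wald (Woolf) interval. The source states only that χ² analysis was run in SPSS, so the exact options used there are unknown.

```python
import math


def two_by_two(case_with, case_total, ctrl_with, ctrl_total):
    """Odds ratio, Wald 95% CI and Pearson chi-square P-value (1 df) for a 2x2 table."""
    a, b = case_with, case_total - case_with      # cases: carrier / non-carrier chromosomes
    c, d = ctrl_with, ctrl_total - ctrl_with      # controls: carrier / non-carrier chromosomes
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))      # upper tail of chi-square with 1 df
    return odds_ratio, (ci_low, ci_high), p_value


# Totals are assumptions derived from 3655 / 73.90% and 4140 / 78.68%.
odds_ratio, (ci_low, ci_high), p_value = two_by_two(3655, 4946, 4140, 5262)
print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}, P = {p_value:.2e}")
```

With these inputs the sketch gives an OR of about 0.77 with a CI of about 0.70–0.84 and a P-value on the order of 10⁻⁸, in line with the values reported above.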
The false positive report probability (FPRP) of the SNPs with significant association with GD in the combined Chinese Han cohorts was also analyzed. In the present study, for each genetic variant, the FPRP value was calculated using the assigned prior probability range, the statistical power to detect an odds ratio (OR) of 1.5, and the detected ORs and P-values. As shown in Table 3, among the eight genetic variants with a significant difference between the patients with GD and healthy individuals, the FPRP values of five SNPs were below 0.2 for prior probabilities from 0.25 to 0.01, which is a relatively high prior probability range. However, the FPRP values for SNP76 and SNP75 were very low even for low prior probabilities, remaining below 0.2 even for a prior probability of 0.0001 (0.001 and 0.175, respectively). This was especially true for SNP76, for which the FPRP value was 0.007 even for a prior probability of 0.00001 (Table 3). Interestingly, for these eight SNPs with significant differences between the 2811 patients with GD and 2807 healthy individuals, the case–control study had more than 97% statistical power, at an α level equal to the reported P-value, to detect a SNP corresponding to a relative risk of 1.5 for GD (Table 3).

The pSNPs (SNP76, SNP75 and SNP74) most strongly associated with GD are also correlated with lower SCGB3A2 expression

Since the GD-associated SNP haplotypes are located in the SCGB3A2 promoter, a region that contains relatively well-conserved transcription factor binding sites (Fig. 3), we hypothesized that these pSNPs may affect the expression of SCGB3A2. To test this, seven SCGB3A2 promoter/luciferase reporter constructs were made and transfected into HeLa and SPC-A1 cells, which are human cervical carcinoma and lung carcinoma cell lines, respectively (Fig. 4A and B).
As shown in Figure 4, the luciferase activities of pGL3-(SNP76+SNP75), pGL3-(SNP76+SNP74) and pGL3-(SNP76+SNP75+SNP74) were decreased in both HeLa and SPC-A1 cells, relative to other haplotypes (Fig. 4A and B). The co-transfection of thyroid transcription factor-1 (TTF-1) increased the overall luciferase activity levels, while the relative influence of the pSNPs on reporter gene expression remained unchanged. Next, using electrophoretic mobility shift assays (EMSAs), we asked whether the GD susceptibility alleles of SNP76, SNP75 and SNP74 affected the binding of transcription factors to the SCGB3A2 promoter. Two main bands, I and II, were identified in the EMSAs with each of the SNP76, SNP75 and SNP74 probes after incubation with nuclear extracts of SPC-A1 cells. Compared to probes for alleles not linked to GD, the SNP76 and SNP75 probes produced one band of stronger intensity (Fig. 4C), whereas one band produced by the SNP74 probe was less intense (Fig. 4C). In addition, the use of unlabeled AG and T allele probes to compete for the labeled AG allele probe of SNP75 showed that the T allele was better able to compete with the AG allele for binding in band II. These data also suggested that the susceptible allele T of SNP75 had a higher binding affinity for the transcription factor in band II than the non-susceptible allele AG (Fig. 4C, right panel).

Figure 3. Sequence conservation and transcription factor binding sites near SNP76, SNP75 and SNP74, as predicted by the UCSC web site (http://genome.ucsc.edu/) and the Alibaba 2.1 software, respectively. In the human sequence, two TTF-1, one NFκB and one C/EBPα binding site near SNP76, and one TTF-1 motif adjacent to SNP75, were predicted. In the presence of the pSNPs, the C/EBPα site near SNP76 disappears, while a TBP binding site appears at the positions of SNP75 and SNP74. The broken lines indicate the putative TBP binding sites created by the SNP75 or SNP74 variant alleles.
We next sought to determine if the differential promoter activity associated with the GD-linked SNPs is also seen in thyroid tissue. Samples of thyroid tissue were collected from 93 patients in the Shandong province with thyroid adenoma or multinodular goiter but not hyperthyroidism. The expression of the SCGB3A2 gene in the thyroid tissue samples derived from patients with SNP76+SNP75 and SNP76+SNP74 alleles was significantly lower than it was in samples from patients with wild-type alleles (P = 0.047 and 0.027, respectively) (Fig. 4D and E). The effect of these SNPs on SCGB3A2 transcription was further confirmed using allele-specific transcript quantification (ASTQ) (5). The relative contribution of each haplotype to SCGB3A2 transcript production in five samples of thyroid tissue from heterozygous individuals was evaluated using a Mae III restriction fragment length polymorphism (RFLP) located at SNP rs34212847 (SNP89) in exon 3 of the SCGB3A2 gene (Fig. 4F). As shown in Figure 4Fc, the intensities of the 380 and 253 bp bands represented the SCGB3A2 mRNA levels transcribed from the GD-susceptible haplotype T:A:A and the non-susceptible haplotype AG:G:G at the corresponding positions (SNP75, SNP76 and SNP89). Because the intensities of the bands depend on the lengths of the digested RT–PCR products from ASTQ, when the mRNA levels transcribed from the two alleles are equal, the ratio of the intensities between the 380 bp band and the 253 bp band should theoretically be greater than 1:1. In fact, when equal amounts of the ASTQ products amplified from the lung tissue of six individuals with homozygous AG:G:G alleles were separated on an agarose gel, with or without Mae III digestion, the actual ratio of intensities between the 380 bp band and the 253 bp band was 2.1 ± 0.4 (mean ± SD) (Fig. 4Fb).
However, the ratio of the ASTQ bands derived from thyroid tissues of five individuals with heterozygosity at the SNP75, SNP76 and SNP89 positions was 0.9 ± 0.2 (mean ± SD) (Fig. 4Fc), which was significantly lower than that measured in individuals homozygous for the non-susceptible alleles (P < 0.001). These results suggested that SCGB3A2 mRNA levels were lower in individuals with the GD-susceptible haplotype.

The expression pattern of SCGB3A2 and its receptor MARCO gene in human and mouse

With regard to SCGB3A2 gene expression, it was previously reported that the highest mRNA level was observed in the human lung by northern blot analysis, whereas low expression was also detected in human thyroid tissue (18). In the present study, using RT–PCR analysis, we confirmed that the mRNA of SCGB3A2 was expressed at high

Figure 4. The effect of SNPs on SCGB3A2 expression in in vitro and in vivo analyses. Relative luciferase activities of the reporter plasmids containing SCGB3A2 promoter regions with distinct pSNP combinations and a wild-type control were detected in SPC-A1 (A) and HeLa (B) cell lines. Open and filled bars represent co-transfection with or without a plasmid expressing the gene for thyroid transcription factor-1 (TTF-1). Luciferase activities are normalized according to pRLO activity, and relative luciferase activity (fold) is expressed as the fold induction relative to the transfection of empty vector (pGL3-Basic) in each reporter gene assay. The results are the average of three independent experiments performed in triplicate. The bars indicate the standard error. (C) Binding affinity of nuclear factors to the 25–50 bp promoter regions around SNP76, SNP75 and SNP74 of SCGB3A2. The 25–50 bp oligonucleotides, including the wild-type and mutant alleles of SNP76, SNP75 and SNP74 of SCGB3A2, were labeled with [γ-32P] dATP.
Arrows indicate the bands of the EMSAs using each of the SNP76, SNP75 and SNP74 probes incubated with nuclear extracts from SPC-A1 cells. Top and bottom arrows correspond to band I and band II, respectively. LWP and NLWP: the labeled and unlabeled wild-type probes, respectively; LMP and NLMP: labeled and unlabeled mutant probes, respectively; NP: extracted nuclear protein. (D) The expression levels of SCGB3A2 in thyroid tissues with haplotype SNP76+SNP75 (n = 11) were significantly decreased based on real-time RT–PCR analysis, as compared with those devoid of SNP76, SNP75 or SNP74 (n = 16). (E) Comparison of SCGB3A2 gene expression with the SNP76+SNP74 haplotype (n = 5) and wild-type haplotypes. (F) Allele-specific ASTQ of SCGB3A2 using the Mae III RFLP located at the position SNP89 (rs34212847) G/A in exon 3 of RNA (cDNA) derived from thyroid tissues of five heterozygous individuals at the SNP75, SNP76 and SNP89 positions (Fb). Relative contributions of the susceptible (SNP75T–SNP76A–SNP89A) and non-susceptible (SNP75AG–SNP76G–SNP89G) haplotypes to SCGB3A2 expression are presented as a SNP89A (380 bp) to G (253 bp) ratio. The smaller sized bands from the SNP89 G allele (127 bp) are not included in calculating the ratio, owing to their weak intensities. As a result, in six control samples homozygous for the SNP89 G allele (Fc), the mean ratio of intensity between the 380 bp band and the 253 bp band was 2.1:1, instead of the theoretical ratio of 1:1.

level in lung tissues in both mouse and human, though low-level transcripts were also present in thyroid and kidney in human, and in adrenal gland, thymus, brain, muscle and skin in mice (Fig. 5A). Recently, the macrophage scavenger receptor with collagenous structure (MARCO) protein was identified as a receptor for SCGB3A2 (19). Interestingly, we found by semi-quantitative RT–PCR analysis that MARCO was expressed in a wide range of tissues, including immunity-related ones such as spleen, thymus, lymph node and liver (Fig.
5B).

DISCUSSION

Our case–control study of 2811 GD patients and 2807 healthy individuals, using a large number of SNPs located in the 5q12–q33 region, which is linked to GD, identified and validated a new gene (SCGB3A2) associated with GD. A significant association between GD and several SNPs in the SCGB3A2 gene was identified, with the strongest associations mapped to SNPs in the SCGB3A2 promoter (pSNPs) (SNP76 and SNP75). Furthermore, the results of the logistic regression analysis in the combined Chinese Han cohorts suggested that these two SNPs were probably the causal variants because they improved the model when combined with any one of the other 33 SNPs in the SCGB3A2 gene. Interestingly, in our study cohorts that were recruited from different geographic regions of China, three of the haplotypes showed significantly higher frequencies among patients with GD than among the control individuals. However, the haplotypes contributing to the susceptibility to GD were different in two subsets, the Shandong and Shanghai populations. The haplotypes of the SNP76 (rs1368408, G/A)+SNP74 (rs6882292, G/A) (000101110) or SNP76+SNP75 (−623 −622, AG/T) (010011001) variants were correlated with high disease susceptibility in the Shandong subset, whereas a significant association between the haplotype of SNP76+SNP73 (−1301 −1303, AAA/−)+SNP71 (−1664, A/T) (101001110) and GD was identified in the Shanghai subset. These results were similar to the observation in

Figure 5. RT–PCR analysis of the expression of SCGB3A2 and the gene for its receptor, MARCO, in different human and mouse tissues. (A) The SCGB3A2 transcript was detected at a high level in lung tissues from both mouse and human, while low-level expression was detected in human thyroid and kidney, and in adrenal gland, thymus, brain, muscle and skin of mice.
(B) The MARCO gene was expressed at a high level in lung and liver (human and mouse), mammary gland (human), submandibular gland, spleen, thymus and epididymis fat (mouse), while a low level of expression was measured in thyroid and muscle (human and mouse), lymph node (human) and testis (mouse).

most Mendelian monogenic disorders, in which a spectrum of different mutations in a gene (or genes) causes a disease (20). This notion was supported by a recent study showing that rare DNA sequence variants in some genes collectively contributed significantly to low plasma levels of HDL-C, a common quantitative trait (21). In fact, previous studies have also documented that the causal variants in a gene differ between ethnic and geographic populations with the same common complex disease (5,6,22). However, as we were concluding our study, a study describing the lack of an association between SCGB3A2 and GD was reported (23). That report showed that the allele frequency distributions of the SNPs within the SCGB3A2 gene did not differ significantly between 146 GD patients and 142 unrelated controls (23). However, the sample size in that study was relatively small, and the number of SNPs used was limited. Indeed, in recent years, some statisticians have suggested that the prior odds against an association in a case–control study would usually exceed 1000:1, even for candidate genes, and may even exceed 10 000:1 for random polymorphisms (24). The arguments of Wacholder et al. (25) would then suggest the use of statistical significance levels in the range of 10⁻⁴ to 10⁻⁶. According to these criteria, few previous molecular epidemiology studies, with sample sizes in the hundreds that have been typical in the field, were likely to attain such levels of statistical significance. This lack of statistical power, together with the usual sources of bias (e.g.
confounding, inappropriate controls and measurement error), might account for most of the observed failures to replicate reported associations between genetic variants and diseases (24). In recent years, Wacholder et al. (25) defined the probability of no association given a statistically significant finding as the FPRP and developed a statistical procedure for calculating it. In this mathematical model, a high FPRP could be a consequence of any combination of a low prior probability that the association between the genetic variant and the disease was real, low statistical power or a relatively high P-value. Given that some estimates of the overall FPRP in the molecular epidemiology literature have been near 0.95 (26), Wacholder et al. (25) considered that an FPRP value near 0.5 would represent a substantial improvement over current practice in studies of association between genetic variants and diseases. They further suggested that large studies or pooled analyses that attempt to be more definitive evaluations of a hypothesis about the association between a genetic variant and a disease should use a more stringent FPRP value, perhaps below 0.2 (25). The current work found that, among the eight genetic variants with significant association with GD in the region of SCGB3A2, the FPRP values for SNP76 and SNP75 were very low for this prior probability range and were quite robust even for low prior probabilities. These data suggested that these two SNPs in the promoter of the SCGB3A2 gene, which are significantly associated with GD, were noteworthy. Although the SCGB3A2 region probably harbors etiological DNA variants, it cannot be ruled out that there are other primary disease-causing polymorphisms within the GD-linked region 5q12–q33.
The SCGB3A2 gene encodes a secretory protein and is reported to be a target of the homeodomain transcription factor T/EBP (TTF-1), which regulates the expression of thyroid- and lung-specific genes, such as thyroglobulin (27), thyroid peroxidase (28,29), TSH receptor (30) and Na/I symporter (31) in the thyroid, and surfactant proteins (32) and Clara cell secretory protein (33) in the lung. Previous reports state that SCGB3A2 is expressed at high levels in human lung tissue and at low levels in the thyroid (18). The SCGB3A2 protein has been detected specifically in the epithelial cells of the respiratory system (18). SCGB3A2 mRNA levels are down-regulated in inflamed mouse lungs, whereas the expression level returns to normal following dexamethasone treatment (18). A recent study demonstrated that expression of SCGB3A2 was reduced in a mouse model of allergic airway inflammation by a mechanism involving IL-5 and IL-9 (34,35). However, the constitutive expression of SCGB3A2 mRNA is enhanced by IL-10 (36). Furthermore, a polymorphism (G/A) at the −112 locus (that is, SNP rs1368408) of the human SCGB3A2 gene promoter has been associated with an increased risk of adult bronchial asthma in the Japanese population (37), although the association has not been replicated in small populations recruited from another Japanese population involving asthmatic children (38), a Germanic Caucasian population (39) and an Indian population (40). Interestingly, Inoue et al. (41) recently showed that the mean plasma SCGB3A2 levels for subjects with the −112A allele were significantly lower than for those without it (P = 0.025). Moreover, severe asthma patients not treated with oral corticosteroids had significantly lower plasma SCGB3A2 levels compared to mild- or moderate-asthma patients and controls.
In this study, we also found that the pSNPs (SNP76, SNP75 and SNP74) most strongly associated with GD tended to be associated with reduced SCGB3A2 gene expression levels in human thyroid tissue, while functional analysis revealed a relatively low efficiency of SCGB3A2 promoters of the SNP76+SNP75 and SNP76+SNP74 haplotypes in driving gene expression. Recently, MARCO was identified as the SCGB3A2 receptor (19), and it is expressed in the macrophages of spleen and lymph nodes (42) and lung alveoli (19). We confirmed the tissue distribution of MARCO, including its expression in immunity-related organs. It is tempting to speculate that SCGB3A2 protein secreted from the lung tissue may regulate the functions of immune organs via MARCO, and thereby contribute to the susceptibility to GD. Further studies are needed to confirm this hypothesis.

MATERIALS AND METHODS

Sample recruitment

A total of 541 unrelated individuals with GD were recruited from Shandong Province, China. The control group was made up of 478 unrelated healthy subjects from the same geographic region screened for the absence of thyroid disease. The diagnosis of GD was based on documented clinical and biochemical evidence of hyperthyroidism, diffuse goiter and the presence of at least one of the following items: positive TSH receptor antibody tests, diffusely increased 131I (iodine-131) uptake in the thyroid gland or presence of exophthalmos. All individuals classified as affected were interviewed and examined by experienced clinicians. Two additional series of 545 cases and 603 controls, and 1725 cases and 1726 controls, that met identical criteria were collected from Shanghai and from different geographic regions, such as Jiangsu, Henan, Anhui and Fujian Province in China, respectively. All subjects were Han Chinese in origin. After informed consent was obtained, 5 ml blood samples were collected from all participants for DNA preparation, as well as for biochemical measurements.
Identification of SNPs, genotyping and QC filters

Several steps were taken to narrow down the size of the region(s) associated with GD susceptibility. First, 179 SNPs in the 3.0 Mb region surrounding D5S2090 were selected from the NCBI dbSNP (NCBI Human Genome Build 36.1) for association analysis in 384 GD and 382 normal subjects from Shandong Province, selected in the order in which they were sampled. The results revealed a region with strong association, as indicated by four SNPs with statistical significance (at the P-value <0.001 level). Next, a second SNP association study was performed for the 1.0 Mb region between SHGC-111280 and RH92492, which was determined to have the highest association with GD. The information on the 11 genes contained in this 1.0 Mb region and the primers used for amplifying the exons and promoters of these genes are given in Supplementary Material, Table S5. Each exon was sequenced using flanking primers that were about 100 base pairs upstream of the 5′ intron–exon junction or downstream of the 3′ intron–exon junction. This approach enabled us to sequence all regions that could affect the amino acid sequence, as well as the splicing sites of these genes. The 1000–2000 base pairs upstream of the first exon of these genes were also re-sequenced. A total of 39 intergenic SNPs within the 1.0 Mb region, which were distributed over approximately 5 Kb, were selected from the NCBI dbSNP (NCBI Human Genome Build 36.1). Furthermore, the 15 Kb region containing the exons and introns and the 5′ and 3′ flanks of the SCGB3A2 gene, which had the strongest association with GD, was completely re-sequenced. We found 38 SNPs in this region; 13, 12 and 13 SNPs were distributed in the exons and introns, the 5′ flanking region and the 3′ flanking region of SCGB3A2, respectively. Genomic DNA was amplified using specific primers and the PCR products were sequenced with an ABI 3700 DNA Sequencer (Applied Biosystems), as described (43).
We sequenced PCR products from 48 unrelated individuals with GD to identify SNPs. All genotyping was performed using the Mass-Array™ Technology Platform of Sequenom, Inc. (San Diego, CA, USA). The genotyping results of a small number of key SNPs in the SCGB3A2 gene, such as SNP78-71, SNP82 and SNP91 in the Shanghai population, and SNP76, SNP75 and SNP74 in the Shandong population, were confirmed using either a second batch of the Mass-Array™ or direct sequencing. SNPs with MAF <1% or HWE P < 1 × 10⁻⁶ in controls were removed from the analysis (15).

Statistical analysis of association

In the case–control design, allele/genotype frequencies, ORs and significance values were analyzed by χ² analysis using SPSS (version 13.0; SPSS Inc.). A P-value <0.05 was considered significant. The genotype data were further mined by logistic regression analysis, as previously described (5,17). LD regions were analyzed by Haploview (16). Haplotypes were generated for SNPs within genes using the PHASE program (Version 2.1). Haplotype frequencies were calculated for cases and controls separately, significance was assessed by χ² values, and a P-value <0.05 was considered significant. FPRP was analyzed using the FPRP calculation spreadsheet provided by Wacholder et al. (25).
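For readers without the spreadsheet, the FPRP calculation can be sketched as follows. The formula FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], with π the prior probability, α the significance level and 1 − β the statistical power, is the one given by Wacholder et al. (25). How the spreadsheet obtains α and the power is an assumption here: the sketch recomputes the two-sided P-value from the odds ratio and its 95% CI under a normal approximation, and takes the power to detect an OR of 1.5 at that α. Function and variable names are illustrative.

```python
import math
from statistics import NormalDist


def fprp(odds_ratio, ci_low, ci_high, prior, alt_or=1.5):
    """Return (FPRP, power) from an odds ratio > 1 and its 95% confidence interval."""
    norm = NormalDist()
    # Standard error of log(OR), recovered from the width of the 95% CI.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * norm.inv_cdf(0.975))
    z = abs(math.log(odds_ratio)) / se
    alpha = math.erfc(z / math.sqrt(2))            # two-sided P-value, used as alpha
    power = norm.cdf(math.log(alt_or) / se - z)    # power to detect alt_or at that alpha
    false_positive = alpha * (1 - prior)
    return false_positive / (false_positive + power * prior), power


# SNP76 as listed in Table 3: OR 1.28 (95% CI 1.17-1.40).
for prior in (0.25, 0.1, 0.01, 0.001, 0.0001, 0.00001):
    value, power = fprp(1.28, 1.17, 1.40, prior)
    print(f"prior = {prior:g}: FPRP = {value:.3f} (power = {power:.3f})")
```

Because the published ORs and CIs are rounded to two decimals, values from this sketch can differ somewhat from Table 3; for SNP76 at a prior of 0.00001 it gives an FPRP of roughly 0.007.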
Cell culture, transfections and luciferase assays

To construct the promoter/luciferase reporter plasmids containing the various SNP changes in the SCGB3A2 promoter region, seven types of fragments were generated, including the wild-type (rs1368408 G/G, −623 −622 AG/AG and rs6882292 G/G), SNP76 (rs1368408 A/A), SNP75 (−623 −622 T/T), SNP74 (rs6882292 A/A), SNP76+SNP75 (rs1368408 A/A and −623 −622 T/T), SNP76+SNP74 (rs1368408 A/A and rs6882292 A/A) and SNP76+SNP75+SNP74 (rs1368408 A/A, −623 −622 T/T and rs6882292 A/A), each of which was separately cloned into the Kpn I–Hind III site of the pGL3-Basic luciferase reporter vector (Promega) to generate the pGL3-N, pGL3-SNP76, pGL3-SNP75, pGL3-SNP74, pGL3-(SNP76+SNP75), pGL3-(SNP76+SNP74) and pGL3-(SNP76+SNP75+SNP74) plasmids (Fig. 4A and B). The sequence of each insert was verified by direct sequencing. For construction of the pcDNA3.1-TTF1 expression plasmid, the respective coding sequence was amplified by RT–PCR from total RNA prepared from SPC-A1 human lung adenocarcinoma cells. The forward primer, 5′-ATCCTCGAGATGTCGATGAGTCCAAAGC-3′, and the reverse primer, 5′-ATAGGATCCACCAGGTCCGACCGTATAGC-3′, were used for PCR amplification. The amplified fragment was then cloned into the Xho I–BamH I site of the pcDNA3.1/Myc-His (+) C vector (Invitrogen). The sequence of the insert was verified by direct sequencing. HeLa cells and SPC-A1 cells were grown in DMEM medium supplemented with 10% FCS. Transient transfections were performed using Lipofectamine 2000 (Invitrogen), according to the manufacturer’s protocol. Briefly, 5 × 10⁴ cells per well were seeded in 12-well plates 16 h before the experiment, and transfected at approximately 50–70% confluence with 300 ng of SCGB3A2 luciferase reporter construct and 30 ng of pcDNA3.1-TTF1 or pcDNA3.1 vector, together with 0.3 ng of pRL-SV40 as a normalization control.
After 36 h incubation, luciferase activities were determined using a Dual Luciferase Reporter Assay System (Promega) according to the manufacturer’s instructions. To correct for transfection efficiency, the luminescence of each SCGB3A2 luciferase reporter construct was normalized to that of the pRL-SV40 control plasmid. Promoter activity was expressed as the ratio of the relative luciferase units of each SCGB3A2 construct to those of the promoterless pGL3-Basic vector in the presence of the same transactivating plasmid. Data are reported as the mean value of at least three independent experiments (triplicate samples).

Electrophoretic mobility shift assays

Nuclear extracts from SPC-A1 cells were prepared using NE-PER® Nuclear and Cytoplasmic Extraction Kits (PIERCE). Double-stranded oligonucleotide probes were used in this study. The oligo sequences are as follows: SNP76 probe, sense strand, 5′-TCC AAA TTG TTT [G/A]GT GAG AAA ACA T-3′, antisense strand, 5′-ATG TTT TCT CAC [C/T]AA ACA ATT TGG A-3′; SNP75 probe, sense strand, 5′-TTT TCA AA[AG/T] ACA CTC TGA TTT TAG ATC TTA AGC CTA TTA TTC TA-3′, antisense strand, 5′-TAG AAT AAT AGG CTT AAG ATC TAA AAT CAG AGT GT[CT/A] TTT GAA AA-3′; and SNP74 probe, sense strand, 5′-TGT GTT ATT TAT [G/A]TT CCC ATT TTA-3′, antisense strand, 5′-TAA AAT GGG AA[C/T] ATA AAT AAC ACA-3′. These probes were labeled at the 5′ end with [γ-32P] dATP and T4 polynucleotide kinase (Promega). The labeled oligonucleotides were separated from the unincorporated nucleotides using a MicroSpin™ G-25 Column (Amersham). An aliquot of 25 μg nuclear extract was incubated with 1 μl (radioactivity: 6 1051 cpm/min) of radiolabeled probe for 30 min on ice in 20 μl binding buffer (Promega). Specificity of protein binding to radiolabeled oligonucleotides was demonstrated by the addition of a 10-fold excess of unlabeled competing oligonucleotides.
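Because the probes are double-stranded, each antisense strand listed above should be the reverse complement of its sense strand for both alleles. The following Python sketch checks this for the three probe pairs. It assumes that the bracketed notation, such as [G/A], lists the two alleles in the same order on both strands; the helper names are hypothetical and are not part of the published protocol.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Probe pairs (sense, antisense) as listed in the text, written 5' to 3'.
PROBES = {
    "SNP76": ("TCC AAA TTG TTT [G/A]GT GAG AAA ACA T",
              "ATG TTT TCT CAC [C/T]AA ACA ATT TGG A"),
    "SNP75": ("TTT TCA AA[AG/T] ACA CTC TGA TTT TAG ATC TTA AGC CTA TTA TTC TA",
              "TAG AAT AAT AGG CTT AAG ATC TAA AAT CAG AGT GT[CT/A] TTT GAA AA"),
    "SNP74": ("TGT GTT ATT TAT [G/A]TT CCC ATT TTA",
              "TAA AAT GGG AA[C/T] ATA AAT AAC ACA"),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expand_alleles(probe):
    """Expand 'XXX[A/B]YYY' into the two allele-specific sequences,
    in the order the alleles are listed, with spaces removed."""
    probe = probe.replace(" ", "")
    match = re.search(r"\[([ACGT]+)/([ACGT]+)\]", probe)
    head, tail = probe[:match.start()], probe[match.end():]
    return head + match.group(1) + tail, head + match.group(2) + tail


def strands_are_complementary(sense, antisense):
    """True if each sense allele is the reverse complement of the
    antisense allele listed in the same position."""
    return all(
        reverse_complement(s) == a
        for s, a in zip(expand_alleles(sense), expand_alleles(antisense))
    )


results = {name: strands_are_complementary(*pair) for name, pair in PROBES.items()}
```

A check of this kind is a cheap safeguard against transcription errors in oligonucleotide orders before annealing and labeling.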
After 20 min incubation at room temperature, the samples were resolved on a 6% polyacrylamide gel in 0.5× TBE at 250 V for 2 h on ice. After electrophoresis, the polyacrylamide gel was dried and autoradiographed. For the competition study, nuclear extracts were pre-incubated with 0.5-, 1.5-, 3-, 5- or 10-fold unlabeled wild-type or mutant SNP75 probes before adding the [γ-32P] dATP-labeled wild-type SNP75 probe.

Real-time reverse transcriptase–polymerase chain reaction (RT-PCR)

Quantitative PCR was performed using TaqMan to measure the relative expression levels of SCGB3A2 in thyroid tissues from individuals without GD who carried the combined variants SNP76+SNP75 or SNP76+SNP74, or who carried none of SNP76, SNP75 and SNP74. After informed consent, 93 thyroid tissue samples were collected at the Shandong Provincial Hospital, Jinan, China, during surgery on patients with thyroid adenoma or multinodular goiter but without hyperthyroidism. Of these, 11 samples belonged to the SNP76+SNP75 haplotype group, 5 to the SNP76+SNP74 group and 16 to the group without SNP76, SNP75 and SNP74. Primer sequences for real-time PCR were as follows: human SCGB3A2 primers (forward, 5′-GCTACTGCCTTCCTCATCAACAA-3′; reverse, 5′-CCCTCCACAAGGTGCTCAAC-3′) and GAPDH primers (forward, 5′-GAAGGTGAAGGTCGGAGTC-3′; reverse, 5′-GAAGATGGTGATGGGATTTC-3′). TaqMan probe sequences were as follows: SCGB3A2 probe (5′-TGCCCCTTCCTGTTGACAAGTTGGC-3′) and GAPDH probe (5′-CAAGCTTCCCGTTCTCAGCC-3′). Reaction temperatures and cycling parameters were as follows: 95°C for 15 min, then 45 cycles at 94°C for 30 s, 58°C for 40 s and 72°C for 1 min, then 72°C for 10 min. Quantification was accomplished by comparison with standard curves generated from known amounts of plasmid containing the gene of interest (100–10 000 000 copies).
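Standard-curve quantification of this kind can be sketched in Python as follows. The sketch assumes the usual linear relationship between the threshold cycle (Ct) and log10 of the template copy number, fitted by ordinary least squares, and it assumes that SCGB3A2 copy numbers are expressed relative to GAPDH; the text lists GAPDH primers and a GAPDH probe but does not spell out the normalization. All Ct values and function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, ref_ct, target_curve, ref_curve):
    """Target copies divided by reference-gene copies (assumed normalization)."""
    return copies_from_ct(target_ct, *target_curve) / copies_from_ct(ref_ct, *ref_curve)


# Hypothetical plasmid dilution series spanning 100 to 10 000 000 copies.
dilutions = [1e2, 1e3, 1e4, 1e5, 1e6, 1e7]
hypothetical_ct = [40 - 3.32 * math.log10(c) for c in dilutions]
curve = fit_standard_curve(dilutions, hypothetical_ct)
```

With a fitted `curve`, `copies_from_ct(sample_ct, *curve)` converts a sample Ct into an estimated copy number. A slope close to −3.32 corresponds to a doubling of product in every cycle.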
Allele-specific transcript quantification

The lung or thyroid tissues were collected from patients with lung cancer, thyroid adenoma or multinodular goiter undergoing surgery. The regions containing SNP75 (−623−622, AG/T), SNP76 (rs1368408, G/A) and SNP89 (rs34212847, G/A) were divided into two fragments (both of which contained the SNP rs1368408) and amplified from these samples using the following two pairs of primers: the first pair: forward, 5′-CATATGGACTCCGCTTTCTATTTC-3′; reverse, 5′-CAACCCTGCAAATATGTGC-3′; and the second pair: forward, 5′-GGATTCGTTGGGCTCTTTG-3′; reverse, 5′-TGGTAGAACAGGTTTCAGGCAG-3′. The amplified products were cloned into the pGEM-T Easy vector and sequenced to identify individuals with heterozygous or homozygous haplotypes at the SNP75, SNP76 and SNP89 positions in the SCGB3A2 gene.

ASTQ was performed as previously described (5), with some modifications. The cDNAs were prepared from lung and thyroid tissues. The SCGB3A2 gene was amplified by PCR using the primers 5′-TGGTGACCATCAGCCTTTG-3′ and 5′-TGTCCTTTTCACGGGTCACTAC-3′. Reaction temperatures and cycling parameters were as follows: 95°C for 15 min, then 35 cycles at 95°C for 30 s, 62°C for 30 s and 72°C for 30 s. The PCR products were labeled with DIG-11-dUTP (Roche, Germany) at the 35th cycle of PCR. ASTQ PCR products were digested with MaeIII (Roche, Germany) and resolved on a 2% agarose gel. As a control to monitor whether the ASTQ PCR products were fully digested, equal amounts of PCR product amplified from individuals with the homozygous genotype at SNP89 (rs34212847, G/G) were digested with MaeIII, because the G at the SNP89 position forms a cleavage site for MaeIII. Digested products were transferred to a positively charged N membrane (Roche, Germany) in alkaline solution; the membrane was then baked at 80°C for 30 min.
According to the manufacturer’s instructions, the membrane was washed, blocked and then incubated with anti-DIG serum/alkaline phosphatase conjugate. CDP-Star was used as the chemiluminescence substrate. Signals were visualized on X-ray film. Data were obtained by scanning the exposed bands with the Quantity One software (BIO-RAD). The band intensities of the 380 and 253 bp digested fragments were determined; they represent the levels of SCGB3A2 mRNA transcribed from the susceptible haplotype (T:A:A) and the non-susceptible haplotype (AG:G:G), respectively, at the corresponding positions (SNP75, SNP76 and SNP89). The smaller bands from the SNP89 G allele (127 bp) were not included in calculating the ratio of the T:A:A and AG:G:G haplotypes owing to their weak intensities. This led to all SNP89 A/G ratios being overestimated; thus, normalization was conducted using the lung tissues of six control individuals with homozygous alleles at the SNP75, SNP76 and SNP89 positions.

Semi-quantitative RT-PCR

The expression patterns of the SCGB3A2 gene and the MARCO gene in mouse and human tissues were analyzed by semi-quantitative RT-PCR. First-strand cDNAs were synthesized from total RNA (1–2 μg) from different tissues using oligo(dT) (Promega) in a 20 μl reaction. cDNAs were then amplified using gene-specific PCR primers. GAPDH was used as an internal control. The PCR mixture contained cDNA (1 μl), 10 mM dNTP (0.5 μl), 10× PCR buffer for Taq plus (2 μl), Taq plus DNA polymerase (2 U) (Sangon), GAPDH primers (10 pmol) and gene-specific intron-spanning primers (20 pmol). Reactions were carried out in a PCR apparatus (PTC-100, MJ Research, Inc.). One PCR cycle consisted of denaturation for 30 s (94°C), annealing for 30 s (60°C) and extension for 45 s (72°C). Each PCR reaction consisted of 25–28 cycles.

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG online.
ACKNOWLEDGEMENTS

We thank all patients and normal individuals for participating in this study, and Professor Ding-Liang Zhu and Dr Lin Lu in Ruijin Hospital for providing the DNA of healthy subjects in Shanghai.

Conflict of Interest statement. None declared.

FUNDING

This work was supported in part by the National Key Program for Basic Research (973), National Natural Science Foundation of China (30530370, 30470815 and 30771017), Chinese High Tech Program (863), Commission for Science and Technology of Shanghai, Shandong and Jiangsu Province, and the Foundation for the Author of National Excellent Doctoral Dissertation of People’s Republic of China.

REFERENCES

1. Hollowell, J.G., Staehling, N.W., Flanders, W.D., Hannon, W.H., Gunter, E.W., Spencer, C.A. and Braverman, L.E. (2002) Serum TSH, T4, and thyroid antibodies in the United States population (1988 to 1994): National Health and Nutrition Examination Survey (NHANES III). J. Clin. Endocrinol. Metab., 87, 489–499.
2. Chen, X., Wu, W.S., Chen, G.L., Zhang, K.Z., Zhang, F.L., Lin, Y.C., Liu, Y.C., Liu, X.Y., Fang, Z.P. and Luo, C.R. (2000) The effect of salt iodization for 10 years on the prevalences of endemic goiter and hyperthyroidism. Chin. J. Endocrinol. Metab., 18, 342–344.
3. Tomer, Y. and Davies, T.F. (2003) Searching for the autoimmune thyroid disease susceptibility genes: from gene mapping to gene function. Endocr. Rev., 24, 694–717.
4. Onodera, T. and Awaya, A. (1990) Anti-thyroglobulin antibodies induced with recombinant reovirus infection in BALB/c mice. Immunology, 71, 581–585.
5. Ueda, H., Howson, J.M.M., Esposito, L., Heward, J., Snook, H., Chamberlain, G., Rainbow, D.B., Hunter, K.M.D., Smith, A.N. and Di Genova, G. (2003) Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature, 423, 506–511.
6. Yanagawa, T., Hidaka, Y., Guimaraes, V., Soliman, M. and DeGroot, L.J. (1995) CTLA-4 gene polymorphism associated with Graves’ disease in a Caucasian population. J. Clin. Endocrinol. Metab., 80, 41–45.
7. Tomer, Y., Concepcion, E. and Greenberg, D.A. (2002) A C/T single-nucleotide polymorphism in the region of the CD40 gene is associated with Graves’ disease. Thyroid, 12, 1129–1135.
8. Velaga, M.R., Wilson, V., Jennings, C.E., Owen, C.J., Herington, S., Donaldson, P.T., Ball, S.G., James, R.A., Quinton, R. and Perros, P. (2004) The codon 620 tryptophan allele of the lymphoid tyrosine phosphatase (LYP) gene is a major determinant of Graves’ disease. J. Clin. Endocrinol. Metab., 89, 5862–5865.
9. Hiratani, H., Bowden, D.W., Ikegami, S., Shirasawa, S., Shimizu, A., Iwatani, Y. and Akamizu, T. (2005) Multiple SNPs in intron 7 of thyrotropin receptor are associated with Graves’ disease. J. Clin. Endocrinol. Metab., 90, 2898–2903.
10. Shirasawa, S., Harada, H., Furugaki, K., Akamizu, T., Ishikawa, N., Ito, K., Ito, K., Tamai, H., Kuma, K. and Kubota, S. (2004) SNPs in the promoter of a B cell-specific antisense transcript, SAS-ZFAT, determine susceptibility to autoimmune thyroid disease. Hum. Mol. Genet., 13, 2221–2231.
11. Jin, Y., Teng, W., Ben, S., Xiong, X., Zhang, J., Xu, S., Shugart, Y.Y., Jin, L., Chen, J. and Huang, W. (2003) Genome-wide scan of Graves’ disease: evidence for linkage on chromosome 5q31 in Chinese Han pedigrees. J. Clin. Endocrinol. Metab., 88, 1798–1803.
12. Sakai, K., Shirasawa, S., Ishikawa, N., Ito, K., Tamai, H., Kuma, K., Akamizu, T., Tanimura, M., Furugaki, K. and Yamamoto, K. (2001) Identification of susceptibility loci for autoimmune thyroid disease to 5q31-q33 and Hashimoto’s thyroiditis to 8q23-q24 by multipoint affected sib-pair linkage analysis in Japanese. Hum. Mol. Genet., 10, 1379–1386.
13. Allen, E.M., Hsueh, W.C., Sabra, M.M., Pollin, T.I., Ladenson, P.W., Silver, K.D., Mitchell, B.D. and Shuldiner, A.R. (2003) A genome-wide scan for autoimmune thyroiditis in the Old Order Amish: replication of genetic linkage on chromosome 5q11.2-q14.3. J. Clin. Endocrinol. Metab., 88, 1292–1296.
14. Roberts, S.B., MacLean, C.J., Neale, M.C., Eaves, L.J. and Kendler, K.S. (1999) Replication of linkage studies of complex traits: an examination of variation in location estimates. Am. J. Hum. Genet., 65, 876–884.
15. Hom, G., Graham, R.R., Modrek, B., Taylor, K.E., Ortmann, W., Garnier, S., Lee, A.T., Chung, S.A., Ferreira, R.C. and Pant, P.V. (2008) Association of systemic lupus erythematosus with C8orf13-BLK and ITGAM-ITGAX. N. Engl. J. Med., 358, 900–909.
16. Barrett, J.C., Fry, B., Maller, J. and Daly, M.J. (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics, 21, 263–265.
17. Cordell, H.J. and Clayton, D.G. (2002) A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes. Am. J. Hum. Genet., 70, 124–141.
18. Niimi, T., Keck-Waggoner, C.L., Popescu, N.C., Zhou, Y., Levitt, R.C. and Kimura, S. (2001) SCGB3A2, a uteroglobin/Clara cell secretory protein-related protein, is a novel lung-enriched downstream target gene for the T/EBP/NKX2.1 homeodomain transcription factor. Mol. Endocrinol., 15, 2021–2036.
19. Bin, L.H., Nielson, L.D., Liu, X., Mason, R.J. and Shu, H.B. (2003) Identification of uteroglobin-related protein 1 and macrophage scavenger receptor with collagenous structure as a lung-specific ligand-receptor pair. J. Immunol., 171, 924–930.
20. Reich, D.E. and Lander, E.S. (2001) On the allelic spectrum of human disease. Trends Genet., 17, 502–510.
21. Cohen, J.C., Kiss, R.S., Pertsemlidis, A., Marcel, Y.L., McPherson, R. and Hobbs, H.H. (2004) Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science, 305, 869–872.
22. Vaidya, B., Imrie, H., Perros, P., Young, E.T., Kelly, W.F., Carr, D., Large, D.M., Toft, A.D., McCarthy, M.I., Kendall-Taylor, P. and Pearce, S.H. (1999) The cytotoxic T lymphocyte antigen-4 is a major Graves’ disease locus. Hum. Mol. Genet., 8, 1195–1199.
23. Yang, Y., Lingling, S., Ying, J., Yushu, L., Zhongyan, S., Wei, H. and Weiping, T. (2005) Association study between the IL4, IL13, IRF1 and UGRP1 genes in chromosomal 5q31 region and Chinese Graves’ disease. J. Hum. Genet., 50, 574–582.
24. Thomas, D.C. and Clayton, D.G. (2004) Betting odds and genetic associations. J. Natl Cancer Inst., 96, 421–423.
25. Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. and Rothman, N. (2004) Assessing the probability of false-positive reports in molecular epidemiology studies. J. Natl Cancer Inst., 96, 434–442.
26. Colhoun, H.M., McKeigue, P.M. and Davey Smith, G. (2003) Problems of reporting genetic associations with complex outcomes. Lancet, 361, 865–872.
27. Civitareale, D., Lonigro, R., Sinclair, A.J. and Di Lauro, R. (1989) A thyroid-specific nuclear protein essential for tissue-specific expression of the thyroglobulin promoter. EMBO J., 8, 2537–2542.
28. Francis-Lang, H., Price, M., Polycarpou-Schwarz, M. and Di Lauro, R. (1992) Cell-type-specific expression of the rat thyroperoxidase promoter indicates common mechanisms for thyroid-specific gene expression. Mol. Cell. Biol., 12, 576–588.
29. Kikkawa, F., Gonzalez, F.J. and Kimura, S. (1990) Characterization of a thyroid-specific enhancer located 5.5 kilobase pairs upstream of the human thyroid peroxidase gene. Mol. Cell. Biol., 10, 6216–6224.
30. Shimura, H. (1994) Thyroid-specific expression and cyclic adenosine 3′,5′-monophosphate autoregulation of the thyrotropin receptor gene involves thyroid transcription factor-1. Mol. Endocrinol., 8, 1049–1069.
31. Endo, T., Kaneshige, M., Nakazato, M., Ohmori, M., Harii, N. and Onaya, T. (1997) Thyroid transcription factor-1 activates the promoter activity of rat thyroid Na+/I-Symporter Gene. Mol. Endocrinol., 11, 1747–1755.
32. Bohinski, R.J., Di Lauro, R. and Whitsett, J.A. (1994) The lung-specific surfactant protein B gene promoter is a target for thyroid transcription factor 1 and hepatocyte nuclear factor 3, indicating common factors for organ-specific gene expression along the foregut axis. Mol. Cell. Biol., 14, 5671–5681.
33. Ray, M.K., Chen, C.Y., Schwartz, R.J. and DeMayo, F.J. (1996) Transcriptional regulation of a mouse Clara cell-specific protein (mCC10) gene by the NKx transcription factor family members thyroid transcription factor 1 and cardiac muscle-specific homeobox protein (CSX). Mol. Cell. Biol., 16, 2056–2064.
34. Chiba, Y., Srisodsai, A., Supavilai, P. and Kimura, S. (2005) Interleukin-5 reduces the expression of uteroglobin-related protein (UGRP) 1 gene in allergic airway inflammation. Immunol. Lett., 97, 123–129.
35. Chiba, Y., Kusakabe, T. and Kimura, S. (2004) Decreased expression of uteroglobin-related protein 1 in inflamed mouse airways is mediated by IL-9. Am. J. Physiol. Lung. Cell. Mol. Physiol., 287, L1193–L1198.
36. Srisodsai, A., Kurotani, R., Chiba, Y., Sheikh, F., Young, H.A., Donnelly, R.P. and Kimura, S. (2004) Interleukin-10 induces uteroglobin-related protein (UGRP) 1 gene expression in lung epithelial cells through homeodomain transcription factor T/EBP/NKX2.1. J. Biol. Chem., 279, 54358–54368.
37. Niimi, T., Munakata, M., Keck-Waggoner, C.L., Popescu, N.C., Levitt, R.C., Hisada, M. and Kimura, S. (2002) A polymorphism in the human UGRP1 gene promoter that regulates transcription is associated with an increased risk of asthma. Am. J. Hum. Genet., 70, 718–725.
38. Jian, Z., Nakayama, J., Noguchi, E., Shibasaki, M. and Arinami, T. (2003) No evidence for association between the −112G/A polymorphism of UGRP1 and childhood atopic asthma. Clin. Exp. Allergy, 33, 902–904.
39. Heinzmann, A., Dietrich, H. and Deichmann, K.A. (2003) Association of uteroglobulin-related protein 1 with bronchial asthma. Int. Arch. Allergy Immunol., 31, 291–295.
40. Batra, J., Niphadkar, P.V., Sharma, S.K. and Ghosh, B. (2005) Uteroglobin-related protein 1 (UGRP1) gene polymorphisms and atopic asthma in the Indian population. Int. Arch. Allergy Immunol., 136, 1–6.
41. Inoue, K., Wang, X., Saito, J., Tanino, Y., Ishida, T., Iwaki, D., Fujita, T., Kimura, S. and Munakata, M. (2008) Plasma UGRP1 levels associate with promoter G-112A polymorphism and the severity of asthma. Allergol. Int., 57, 57–64.
42. Elomaa, O., Kangas, M., Sahlberg, C., Tuukkanen, J., Sormunen, R., Liakka, A., Thesleff, I., Kraal, G. and Tryggvason, K. (1995) Cloning of a novel bacteria-binding receptor structurally related to scavenger receptors and expressed in a subset of macrophages. Cell, 80, 603–609.
43. Song, H.D., Sun, X.J., Deng, M., Zhang, G.W., Zhou, Y., Wu, X.Y., Sheng, Y., Chen, Y., Ruan, Z. and Jiang, C.L. (2004) Hematopoietic gene expression profile in zebrafish kidney marrow. Proc. Natl Acad. Sci. USA, 101, 16240–16245.