Supplementary Information

dbGaP dataset and quality control

Genome-wide SNP data for 7 018 individuals comprising 2 339 trios, in which each child was affected with any type of nonsyndromic orofacial cleft (NSOFC; cleft lip with or without cleft palate, CL/P, or cleft palate only, CPO), were downloaded from dbGaP (accession number: phs000094.v1.p1)¹. Available genotype data included a total of 1 387 466 SNPs, comprising 601 273 genotyped SNPs and an additional 786 193 SNPs imputed with BEAGLE using HapMap Phase II samples as a reference panel. Quality control of the original data prior to release included removal of samples that showed high rates of Mendelian errors, inconsistent gender, or unexpected relatedness. For each individual, high-confidence imputed SNPs (r² ≥ 0.9) were converted to their respective genotypes, while SNPs with r² < 0.9 were removed. Following the recommendations accompanying the dbGaP data release, we removed individuals who had <95% of SNPs successfully genotyped, or in whom a gross autosomal chromosomal anomaly (≥10 Mb) had been detected. We then filtered SNPs, removing those with >5% missing genotypes, minor allele frequency (MAF) <0.01, >50 Mendelian errors, or Hardy-Weinberg equilibrium (HWE) P<10⁻⁵ in the Caucasian and Asian subsets of samples separately. After these filtering steps, the population comprised 39% Caucasian individuals and 49% Asians, with the remaining 11% comprising a mix of Pacific Islanders, African Americans, Native Americans, and individuals of mixed ancestry. The numbers of SNPs and samples used in each analysis are shown in Table 1. All genome coordinates quoted are hg18.

Genome-wide detection of putative parent-of-origin effects

We used an extension of the Transmission Asymmetry Test and the Parental Asymmetry Test²,³ to study the role of parent-of-origin (PofO) effects in NSOFC. Using rules of Mendelian inheritance in trios, we identified the transmitted and non-transmitted alleles in each parent, and the paternally and maternally inherited alleles in each child.
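The Mendelian-inheritance rules above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `child_parental_alleles` and `parent_transmission`, and the encoding of genotypes as two-letter strings such as "AG", are assumptions made for illustration. For one biallelic SNP in one trio, the sketch tries both ways of splitting the child's two alleles between the parents and accepts the assignment only if exactly one split is compatible with the parental genotypes.

```python
from typing import Optional, Tuple


def child_parental_alleles(father: str, mother: str, child: str) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) for the child at one SNP.

    Genotypes are hypothetical two-letter strings, e.g. "AG".
    Returns None when parental origin is ambiguous, and raises
    ValueError when the trio is not consistent with Mendelian inheritance.
    """
    c1, c2 = child
    solutions = set()
    # Try both possible splits of the child's alleles between the parents.
    for paternal, maternal in ((c1, c2), (c2, c1)):
        if paternal in father and maternal in mother:
            solutions.add((paternal, maternal))
    if not solutions:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    if len(solutions) > 1:
        return None  # both splits are possible, so origin cannot be assigned
    return solutions.pop()


def parent_transmission(parent: str, transmitted: str) -> Tuple[str, str]:
    """Return (transmitted_allele, non_transmitted_allele) for one parent."""
    a, b = parent
    if transmitted == a:
        return a, b
    if transmitted == b:
        return b, a
    raise ValueError("transmitted allele is not present in the parent")


if __name__ == "__main__":
    # Father AA, mother AG, child AG: the G allele must be maternal.
    print(child_parental_alleles("AA", "AG", "AG"))
    # A heterozygous mother who transmitted G did not transmit A.
    print(parent_transmission("AG", "G"))
```

In this sketch, a trio in which father, mother and child are all heterozygous returns `None`, because both splits of the child's alleles are compatible with the parents.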
To avoid potential false positives due to low-frequency alleles, we utilized only SNPs for which all three members of a trio had genotypes available; incomplete trios were discarded for this analysis⁴. Within each trio, SNPs for which all three family members were heterozygous could not be assigned parental origin and were removed (~5% of markers).

We then performed four different tests to detect putative PofO effects: (i) analysis of transmission bias from heterozygous fathers to affected children (PAT); (ii) analysis of transmission bias from heterozygous mothers to affected children (MAT); (iii) a comparison of the maternal and paternal odds ratios (PofO); and (iv) a comparison of the relative frequency of the two classes of heterozygotes in affected children (HET). These methods were implemented in PLINK⁵ by modification of the ‘poo.cpp’ file. Assuming independent transmission of maternal and paternal alleles, the significance of the PAT, MAT and HET tests was calculated using the chi-square distribution with 1 d.f., under the null hypothesis of an equal distribution of the two parental alleles to affected offspring from each parent. Significance for the PofO test was calculated using a normal distribution (Figure 1).

We performed a combined analysis for the main phenotype NSOFC (including both NSCL/P and NSCPO trios), and also separate analyses for each of the two etiologically distinct subtypes, NSCL/P and NSCPO⁶,⁷. Separate analyses were performed using all samples combined (which included Caucasians, Asians, Pacific Islanders, mixed and others) and using the two major ethnic groups (Caucasians and Asians) separately. Because of the reduced sample size, we did not subdivide CPO samples based on ethnicity.

SNPs showing putative PofO effects in this discovery cohort were defined as follows. As a primary filter, we first selected those SNPs that showed nominal significance (P_PofO<0.05) in the PofO test.
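As an illustration of the test statistics described above, the following Python sketch shows one plausible form of the calculations. It is an assumption, not the authors' modified PLINK code: the PAT, MAT and HET tests are written as a simple two-category chi-square asymmetry test with 1 d.f., and the PofO test as a z-test on the difference between the paternal and maternal log odds ratios. The function names and all counts are hypothetical.

```python
import math
from typing import Tuple


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))


def asymmetry_test(n1: int, n2: int) -> Tuple[float, float]:
    """Chi-square test (1 d.f.) of the null hypothesis that n1 and n2 are equal.

    For PAT or MAT, n1 and n2 would be the transmitted and non-transmitted
    counts of one allele from heterozygous fathers or mothers; for HET, the
    counts of the two classes of heterozygous children (assumed formulation).
    """
    stat = (n1 - n2) ** 2 / (n1 + n2)
    return stat, chi2_1df_pvalue(stat)


def pofo_z_test(pat_t: int, pat_u: int, mat_t: int, mat_u: int) -> Tuple[float, float]:
    """Compare paternal and maternal transmission odds ratios with a z-test.

    t and u are transmitted and non-transmitted counts; all must be > 0.
    """
    log_or_diff = math.log(pat_t / pat_u) - math.log(mat_t / mat_u)
    se = math.sqrt(1 / pat_t + 1 / pat_u + 1 / mat_t + 1 / mat_u)
    z = log_or_diff / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


if __name__ == "__main__":
    # Hypothetical counts for a single SNP.
    print("PAT:", asymmetry_test(60, 40))
    print("MAT:", asymmetry_test(48, 52))
    print("PofO:", pofo_z_test(60, 40, 48, 52))
```

The sketch uses only the standard library; `math.erfc` gives both the chi-square (1 d.f.) and the two-sided normal tail probabilities, so no statistics package is required.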
Then, using a significance threshold of P<10⁻⁵, SNPs were considered as showing a possible PofO bias if they were significant in any of the four tests: PAT, MAT, PofO and HET. A subset of these SNPs was then carried forward for further investigation in a replication cohort (see below). Using the PLINK --blocks function, we calculated the number of distinct linkage disequilibrium (LD) blocks containing GWAS SNPs showing P_PofO<0.05, and either P_MAT<10⁻⁴ or P_PAT<10⁻⁴. Enrichment analysis was performed using a chi-square test with 1 d.f., under the null hypothesis of an equal number of MAT and PAT LD blocks.

SNP selection for replication

We selected the most significant candidate loci for replication in an independent family-based NSOFC cohort. SNP selection covered the most significant candidate SNPs/loci for the main phenotype NSOFC and for both etiologically distinct cleft phenotypes, namely NSCL/P and NSCPO. As we had no access to a replication sample comprising Asian trios, SNP selection for replication was based on the genome-wide PofO analysis in the Caucasian sample and the Caucasian/Asian combined sample. For each candidate locus identified, we tried to select at least two SNPs to allow for possible technical failures. The final replication SNP set comprised 64 SNPs (32 SNPs for NSOFC, 33 SNPs for NSCL/P and 5 SNPs for NSCPO, with six SNPs shared between NSOFC and NSCL/P). Note that two additional SNPs overlapping SEMA4D, a gene with roles in axon guidance⁸, were also included for CLP+CPO replication analysis.

Additionally, a gender-determinant site was included in the SNP set to allow confirmation of the maternal and paternal samples. This variant is located in the GYG2 gene, which lies outside the pseudoautosomal region PAR1 but still in a region of X-chromosomal and Y-chromosomal homology⁹. For this variant, 46,XY males are heterozygous with both a C and an A allele, whereas 46,XX females are homozygous with only a C allele.
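The sex check based on the GYG2 variant can be sketched in a few lines of Python. This is an illustrative assumption about how such a check might be coded, not the procedure actually used: the function names and the two-letter genotype encoding are hypothetical, and only the allele rule (C/A in 46,XY males, C/C in 46,XX females) comes from the text.

```python
def infer_sex_from_gyg2(genotype: str) -> str:
    """Infer sex from the genotype call at the GYG2 gender-determinant site.

    genotype is a hypothetical two-letter call such as "CA" or "CC".
    """
    alleles = set(genotype.upper())
    if alleles == {"A", "C"}:
        return "male"      # heterozygous: both a C and an A allele
    if alleles == {"C"}:
        return "female"    # homozygous: only a C allele
    return "unknown"       # failed or unexpected call


def sex_is_consistent(reported_sex: str, genotype: str) -> bool:
    """True when the reported sex matches the genotype-determined sex."""
    return infer_sex_from_gyg2(genotype) == reported_sex.lower()


if __name__ == "__main__":
    # A sample labelled as the father should carry the C/A genotype.
    print(sex_is_consistent("male", "CA"))
    print(sex_is_consistent("female", "CA"))
```

A mismatch between the reported and inferred sex of a parent would flag the trio for exclusion or a possible sample swap.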
Replication sample

The set of 64 SNPs selected for replication was genotyped in a trio sample of white individuals of European origin. Initially we had access to 1 534 European families, each with an index patient affected with NSOFC. Samples were recruited either in the context of the EUROCRAN/ITALCLEFT study, or were part of a large German NSOFC cohort¹⁰. Trios were excluded from the study if any member showed (i) incomplete phenotype status (i.e., unaffected, NSCL/P or NSCPO definition missing), (ii) an inconsistency between reported gender and genotype-determined gender (49 triads), (iii) a missing genotype rate >20% (173 triads), or (iv) Mendelian inconsistencies for more than two of the genotyped SNPs (45 triads). After applying these stringent criteria, 1 197 nuclear trios remained for further analyses. Of these, 746 trios were part of the EUROCRAN/ITALCLEFT studies (273 from the Netherlands, 124 from Italy, 118 from the UK, 73 from Slovakia, 71 from Hungary, 33 from Bulgaria, 23 from Slovenia, 21 from Estonia and 10 from Spain), and the remaining 451 trios were recruited in Bonn, Germany. At the phenotypic level, the sample was subdivided into 931 trios that had an index patient with NSCL/P and 266 with NSCPO.

SNP genotyping

Peripheral venous blood samples were collected from the majority of probands, with the remainder providing buccal swabs or saliva. Extracted genomic DNA was diluted to a concentration of 5 ng/μl for the genotyping assay. Genotyping was conducted using the Sequenom MassARRAY MALDI-TOF mass spectrometry system (Sequenom Inc., San Diego, CA, USA). Primers were synthesized at Metabion, Germany. Using Sequenom MassARRAY Assay Design Software 3.4, two multiplex assays comprising all 64 selected SNPs plus the gender-specific variant were designed. Primer sequences and PCR/assay conditions are available upon request. Genotype data were analyzed using the Sequenom Spectrodesigner software package.
Inter- and intraplate duplicates were included to check for genotype consistency across DNA plates. Allele peaks were analyzed using Sequenom Typer Analysis software, and genotype calls were confirmed by visual inspection of cluster plots. In order to exclude low-quality assays prior to genotyping of the entire replication sample, a test plate was run for each of the two plexes. Based on this analysis, three SNPs (rs7649938, rs523340, rs8059365) were removed from the assay as no genotype clusters could be observed. An additional set of eight SNPs (rs704573, rs17294646, rs4132699, rs8063919, rs1950678, rs10421474, rs4244713 and rs1489816) was removed due to ambiguous cluster plots. Thanks to the inclusion of back-up SNPs, these failures resulted in the loss of only four of the 46 chromosomal loci identified by the PofO GWAS.

Quality control and statistical analysis

Fifty-three successfully genotyped SNPs underwent post-genotyping quality control. A SNP was excluded from statistical analyses if it (i) had a genotype call rate of less than 85%, (ii) showed >50 Mendelian errors, or (iii) showed deviation from HWE with P<10⁻⁵. No MAF filter was applied (minimum MAF in the dataset: 0.048). In total, five SNPs failed one or more of these quality control checks (rs1034832, rs2196457, rs11999884, rs1920435 and rs6011617), resulting in the loss of four loci. Our final filtered replication dataset comprised 48 SNPs mapping to 38 distinct chromosomal loci.

In the replication sample, each SNP was statistically analyzed in the phenotypic group in which it was initially identified. PAT, MAT, PofO and HET tests for analysis of PofO effects were performed as described above. Subsequently, combined analyses in which we pooled the discovery and replication samples were conducted, again applying each of the four statistical tests.

References

1. Beaty TH, Murray JC, Marazita ML et al: A genome-wide association study of cleft lip with and without cleft palate identifies risk variants near MAFB and ABCA4. Nat Genet 2010; 42: 525-529.
2. Weinberg CR, Wilcox AJ, Lie RT: A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet 1998; 62: 969-978.
3. Weinberg CR: Methods for detection of parent-of-origin effects in genetic studies of case-parents triads. Am J Hum Genet 1999; 65: 229-235.
4. Curtis D, Sham PC: A note on the application of the transmission disequilibrium test when a parent is missing. Am J Hum Genet 1995; 56: 811-812.
5. Purcell S, Neale B, Todd-Brown K et al: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559-575.
6. Lacheretz M, Poupard B: [Inheritance of harelip and cleft palate. Reexamination apropos of statistics of 879 cases, 212 of them familial]. Chirurgie 1972; 98: 264-270.
7. Sperber GH: Formation of the primary and secondary palate. Cleft Lip and Palate: From Origin to Treatment. Oxford University Press 2002: 5-13.
8. Isbister CM, Tsai A, Wong ST, Kolodkin AL, O'Connor TP: Discrete roles for secreted and transmembrane semaphorins in neuronal growth cone guidance in vivo. Development 1999; 126: 2007-2019.
9. Zhai L, Mu J, Zong H, DePaoli-Roach AA, Roach PJ: Structure and chromosomal localization of the human glycogenin-2 gene GYG2. Gene 2000; 242: 229-235.
10. Mangold E, Ludwig KU, Birnbaum S et al: Genome-wide association study identifies two susceptibility loci for nonsyndromic cleft lip with or without cleft palate. Nat Genet 2010; 42: 24-26.