Supplementary Information
dbGaP dataset and quality control
Genome-wide SNP data for 7 018 individuals comprising 2 339 trios in which each child
was affected with any type of NSOFC (CL/P or CPO), were downloaded from dbGaP
(Accession number: phs000094.v1.p1)¹. Available genotype data included a total of 1
387 466 SNPs, comprising 601 273 genotyped SNPs and an additional 786 193 SNPs
imputed with BEAGLE using HapMap Phase II samples as a reference panel. Quality
control of the original data prior to release included removal of samples that showed high
rates of Mendelian errors, inconsistent gender, and unexpected relatedness. For each
individual, high-confidence imputed SNPs (r²≥0.9) were converted to their respective
genotypes, while those SNPs with r²<0.9 were removed. Following the recommendations
accompanying the dbGaP data release, we removed individuals that had <95% of SNPs
successfully genotyped, or those in whom a gross autosomal chromosomal anomaly
(≥10 Mb) had been detected. We then filtered SNPs, removing those with >5% missing
genotypes, minor allele frequency (MAF) <0.01, >50 Mendelian errors, or Hardy-Weinberg
equilibrium (HWE) P<10⁻⁵ in the Caucasian and Asian subsets of samples
separately. After these filtering steps, the population comprised 39% Caucasian
individuals, 49% Asians, with the remaining 11% comprising a mix of Pacific Islanders,
African Americans, Native Americans, and individuals of mixed ancestry. The number of
SNPs and samples used in each analysis are shown in Table 1. All genome coordinates
quoted are hg18.
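The SNP-level filters described above can be summarized in a short Python sketch. The `SnpStats` record and its field names are hypothetical, and the per-SNP statistics are assumed to have been computed beforehand. Treating a SNP as failing when HWE is violated in either ancestry subset is also an assumption, since the text does not spell out that detail.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    snp_id: str
    missing_rate: float      # fraction of samples without a genotype call
    maf: float               # minor allele frequency
    mendel_errors: int       # Mendelian errors counted across trios
    hwe_p_caucasian: float   # HWE P value in the Caucasian subset
    hwe_p_asian: float       # HWE P value in the Asian subset


def passes_snp_qc(s: SnpStats) -> bool:
    """Apply the thresholds quoted in the text to one SNP."""
    return (
        s.missing_rate <= 0.05
        and s.maf >= 0.01
        and s.mendel_errors <= 50
        # Assumption: failing HWE in either subset removes the SNP.
        and min(s.hwe_p_caucasian, s.hwe_p_asian) >= 1e-5
    )


# Made-up example records, one passing and two failing.
snps = [
    SnpStats("snp_ok", 0.01, 0.20, 3, 0.40, 0.70),
    SnpStats("snp_rare", 0.01, 0.005, 0, 0.50, 0.50),
    SnpStats("snp_hwe", 0.02, 0.30, 1, 1e-7, 0.60),
]
kept = [s.snp_id for s in snps if passes_snp_qc(s)]
print(kept)
```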
Genome-wide detection of putative parent-of-origin effects
We used an extension of the Transmission Asymmetry Test and the Parental Asymmetry
Test²,³ to study the role of parent-of-origin (PofO) effects in NSOFC. Using rules of Mendelian inheritance
in trios, we identified the transmitted and non-transmitted alleles in each parent, and the
paternally and maternally inherited alleles in each child. To avoid potential false positives
due to low-frequency alleles, we utilized only SNPs where all three members of
a trio had genotypes available, with incomplete trios being discarded for this analysis⁴.
Within each trio, SNPs for which all three family members were heterozygous could not
be assigned parental origin and were removed (~5% of markers). We then performed four
different tests to detect putative PofO effects: (i) analysis of transmission bias from
heterozygous fathers to affected children (PAT); (ii) analysis of transmission bias from
heterozygous mothers to affected children (MAT); (iii) a comparison of the maternal and
paternal odds ratios (PofO); (iv) a comparison of the relative frequency of the two classes
of heterozygotes in affected children (HET). These methods were implemented in
PLINK⁵ by modification of the ‘poo.cpp’ file. Assuming independent transmission of
maternal and paternal alleles, the significance of the PAT, MAT and HET tests was
calculated using the Chi-square distribution at 1 d.f., under the null hypothesis of an
equal distribution of the two parental alleles to affected offspring from each parent.
Significance for the PofO test was calculated using a normal distribution (Figure 1). We
performed a combined analysis for the main phenotype NSOFC (including both NSCL/P
and NSCPO trios), and also separate analyses for each of the two etiologically distinct
subtypes NSCL/P and NSCPO⁶,⁷. Separate analyses were performed using all samples
combined (which included Caucasians, Asians, Pacific Islanders, mixed and others) and
using the two major ethnic groups (Caucasians and Asians) separately. Because of the
reduced sample size, we did not subdivide CPO samples based on ethnicity.
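As a rough illustration of the steps above, the following Python sketch assigns parental origin within a trio and computes the kinds of statistics described. It is not the authors' modified ‘poo.cpp’ code, which is not shown here. The function names, the 0/1/2 genotype coding and the input counts are hypothetical. The formulas are the standard forms for a 1 d.f. chi-square test of two equal counts and for a z-test on the difference of two log odds ratios, and are assumed here to match the tests named in the text.

```python
import math

# Alleles a parent can transmit, keyed by genotype coded as the
# number of copies of an arbitrary reference allele "A" (0, 1 or 2).
GAMETES = {0: (0,), 1: (0, 1), 2: (1,)}


def inherited_alleles(father, mother, child):
    """Return (paternal, maternal) copies of A inherited by the child.

    Returns None when the origin is ambiguous (all three heterozygous)
    or when the trio is Mendelian-inconsistent.
    """
    options = [
        (p, m)
        for p in GAMETES[father]
        for m in GAMETES[mother]
        if p + m == child
    ]
    return options[0] if len(options) == 1 else None


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(stat / 2.0))


def asymmetry_test(n_a, n_b):
    """Chi-square test (1 d.f.) of the null that two counts are equal.

    Usable for transmitted vs. non-transmitted counts from one parent
    (PAT, MAT) or for the two heterozygote classes in children (HET).
    """
    stat = (n_a - n_b) ** 2 / (n_a + n_b)
    return stat, chi2_1df_pvalue(stat)


def pofo_z_test(pat_t, pat_u, mat_t, mat_u):
    """Z-test comparing paternal and maternal transmission odds ratios."""
    diff = math.log(pat_t / pat_u) - math.log(mat_t / mat_u)
    se = math.sqrt(1 / pat_t + 1 / pat_u + 1 / mat_t + 1 / mat_u)
    z = diff / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P value


# Made-up counts: father transmits A 60 times and not 40 times,
# mother shows the opposite pattern.
stat, p_pat = asymmetry_test(60, 40)
z, p_pofo = pofo_z_test(60, 40, 40, 60)
print(inherited_alleles(2, 0, 1), inherited_alleles(1, 1, 1))
```

The enumeration in `inherited_alleles` leaves exactly one consistent option for every informative trio, which is why the all-heterozygous case drops out naturally as ambiguous.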
SNPs showing putative PofO effects in this discovery cohort were defined as
follows: As a primary filter, we first selected those SNPs that showed nominal
significance (P_PofO<0.05) in the PofO test. Then, using a significance threshold of P<10⁻⁵,
SNPs were considered as showing a possible PofO bias if they were significant in any of
the four tests: PAT, MAT, PofO and HET. A subset of these SNPs were then carried
forward for further investigation in a replication cohort (see below).
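The two-stage definition above can be written as a small predicate. This is a minimal sketch in which the function and parameter names are hypothetical; the thresholds are the ones quoted in the text.

```python
def is_pofo_candidate(p_pofo, p_pat, p_mat, p_het,
                      primary_alpha=0.05, strict_alpha=1e-5):
    """Return True if a SNP meets the discovery-stage PofO definition."""
    # Primary filter: nominal significance in the PofO test.
    if p_pofo >= primary_alpha:
        return False
    # Secondary filter: any of the four tests below the strict threshold.
    return min(p_pat, p_mat, p_pofo, p_het) < strict_alpha


# Made-up P values for illustration.
print(is_pofo_candidate(p_pofo=0.01, p_pat=1e-6, p_mat=0.5, p_het=0.3))
```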
Using the PLINK --blocks function, we calculated the number of distinct linkage
disequilibrium (LD) blocks containing GWAS SNPs showing P_PofO<0.05, and either
P_MAT<10⁻⁴ or P_PAT<10⁻⁴. Enrichment analysis was performed using a chi-square test at
1 d.f., under the null hypothesis of an equal number of MAT and PAT LD blocks.
SNP selection for replication
We selected the most significant candidate loci for replication in an independent family-based
NSOFC cohort. SNP selection covered the most significant candidate SNPs/loci for
the main phenotype NSOFC and for both etiologically distinct cleft phenotypes, namely
NSCL/P and NSCPO. As we had no access to a replication sample comprising Asian
trios, SNP selection for replication was based on the genome-wide PofO analysis in the
Caucasian sample and the Caucasian/Asian combined sample.
For each candidate locus identified, we tried to select at least two SNPs to allow
for possible technical failures. The final replication SNP set comprised 64 SNPs (32
SNPs for NSOFC, 33 SNPs for NSCL/P and 5 SNPs for NSCPO; with six SNPs shared
between NSOFC and NSCL/P). Note that two additional SNPs overlapping SEMA4D, a
gene with roles in axon guidance⁸, were also included for CLP+CPO replication analysis.
Additionally, a gender-determinant site was included in the SNP set to allow
confirmation of the maternal and paternal samples. This particular variant is located in
the GYG2 gene, which lies outside the pseudoautosomal region PAR1 but still in a
region of X-chromosomal and Y-chromosomal homology⁹. For this variant, 46,XY
males are heterozygous with both a C and an A allele, whereas 46,XX females are
homozygous with only a C allele.
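The sex check based on this variant amounts to a simple lookup, sketched below in Python. The function names and the allele-string input format are hypothetical; only the C/A versus C/C rule comes from the text.

```python
def infer_sex_from_gyg2(alleles):
    """Infer sex from the observed alleles at the GYG2 variant.

    `alleles` is a string of observed allele letters, e.g. "CA" or "CC".
    Returns "male", "female", or None for missing or unexpected calls.
    """
    observed = set(alleles)
    if observed == {"C", "A"}:
        return "male"      # 46,XY: heterozygous C/A
    if observed == {"C"}:
        return "female"    # 46,XX: homozygous C
    return None


def parents_consistent(father_alleles, mother_alleles):
    """Check that the reported father and mother match the inferred sexes."""
    return (infer_sex_from_gyg2(father_alleles) == "male"
            and infer_sex_from_gyg2(mother_alleles) == "female")


print(parents_consistent("CA", "CC"))
```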
Replication sample
The set of 64 SNPs selected for replication was genotyped in a trio sample of white
individuals of European origin. Initially we had access to 1 534 European families, each
with an index patient affected with NSOFC. Samples were recruited either in the context
of the EUROCRAN/ITALCLEFT study, or were part of a large German NSOFC
cohort¹⁰. Trios were excluded from the study if any member showed (i) incomplete
phenotype status (i.e., unaffected, NSCL/P or NSCPO definition missing), (ii) a gender
inconsistency between reported gender and genotype-determined gender (49 triads), (iii)
a missing genotype rate >20% (173 triads), (iv) Mendelian inconsistencies for more than
two of the genotyped SNPs (45 triads).
After applying these stringent criteria, 1 197 nuclear trios remained for further
analyses. Of these, 746 trios were part of the EUROCRAN/ITALCLEFT studies (273
from the Netherlands, 124 from Italy, 118 from the UK, 73 from Slovakia, 71 from
Hungary, 33 from Bulgaria, 23 from Slovenia, 21 from Estonia and 10 from Spain), and
the 451 remaining trios were recruited in Bonn, Germany. At the phenotypic level, the
sample was subdivided into 931 trios that had an index patient with NSCL/P and 266
with NSCPO.
SNP genotyping
Peripheral venous blood samples were collected from the majority of probands, with the
remainder from buccal swabs/saliva. Extracted genomic DNA was diluted to a
concentration of 5 ng/μl for the genotyping assay. Genotyping was conducted using the
Sequenom MassARRAY MALDI-ToF mass spectrometer system (Sequenom Inc., San
Diego, CA, USA). Primers were synthesized at Metabion, Germany. Using Sequenom
MassARRAY Assay Design Software 3.4, two multiplex assays comprising all 64
selected SNPs plus the gender-specific variant were designed. Primer sequences and
PCR/assay conditions are available upon request. Genotype data were analyzed using
Sequenom Spectrodesigner Software package. Inter- and intraplate duplicates were
included to check for genotype consistencies across DNA plates. Allele peaks were
analyzed using Sequenom Typer Analysis software and genotype calls were confirmed
by visual inspection of cluster plots.
In order to exclude low-quality assays prior to genotyping of the entire replication
sample, a test plate was run for each of the two plexes. Based on this analysis, three SNPs
(rs7649938, rs523340, rs8059365) were removed from the assay as no genotype clusters
could be observed. An additional set of eight SNPs (rs704573, rs17294646, rs4132699,
rs8063919, rs1950678, rs10421474, rs4244713 and rs1489816) were removed due to
ambiguous cluster plots. Thanks to the inclusion of back-up SNPs, these failures only
resulted in a loss of four out of the total 46 chromosomal loci identified by the PofO
GWAS.
Quality control and statistical analysis
Fifty-three successfully genotyped SNPs underwent post-genotyping quality control. A
SNP was excluded from statistical analyses if it (i) had a genotype call rate of less than
85%, (ii) showed >50 Mendelian errors, or (iii) showed deviation from HWE with P<10⁻⁵.
No MAF filter was applied (minimum MAF in the dataset: 0.048). In total, five SNPs
failed one or more of these quality control checks (rs1034832, rs2196457, rs11999884,
rs1920435 and rs6011617) resulting in the loss of four loci. Our final filtered replication
dataset comprised 48 SNPs mapping to 38 distinct chromosomal loci.
In the replication sample, each SNP was statistically analyzed in the phenotypic
group that it was initially identified in. PAT, MAT, PofO and HET tests for analysis of
PofO effects were performed, as described above. Subsequently, combined analyses in
which we pooled both the discovery and replication samples were conducted, again
applying each of the four statistical tests.
References
1. Beaty TH, Murray JC, Marazita ML et al: A genome-wide association study of
cleft lip with and without cleft palate identifies risk variants near MAFB and
ABCA4. Nat Genet 2010; 42: 525-529.
2. Weinberg CR, Wilcox AJ, Lie RT: A log-linear approach to case-parent-triad
data: assessing effects of disease genes that act either directly or through maternal
effects and that may be subject to parental imprinting. Am J Hum Genet 1998; 62:
969-978.
3. Weinberg CR: Methods for detection of parent-of-origin effects in genetic studies
of case-parents triads. Am J Hum Genet 1999; 65: 229-235.
4. Curtis D, Sham PC: A note on the application of the transmission disequilibrium
test when a parent is missing. Am J Hum Genet 1995; 56: 811-812.
5. Purcell S, Neale B, Todd-Brown K et al: PLINK: a tool set for whole-genome
association and population-based linkage analyses. Am J Hum Genet 2007; 81:
559-575.
6. Lacheretz M, Poupard B: [Inheritance of harelip and cleft palate. Reexamination
apropos of statistics of 879 cases, 212 of them familial]. Chirurgie 1972; 98: 264-270.
7. Sperber GH: Formation of the primary and secondary palate. Cleft Lip and Palate:
From Origin to Treatment Oxford University Press 2002: 5–13.
8. Isbister CM, Tsai A, Wong ST, Kolodkin AL, O'Connor TP: Discrete roles for
secreted and transmembrane semaphorins in neuronal growth cone guidance in
vivo. Development 1999; 126: 2007-2019.
9. Zhai L, Mu J, Zong H, DePaoli-Roach AA, Roach PJ: Structure and chromosomal
localization of the human glycogenin-2 gene GYG2. Gene 2000; 242: 229-235.
10. Mangold E, Ludwig KU, Birnbaum S et al: Genome-wide association study
identifies two susceptibility loci for nonsyndromic cleft lip with or without cleft
palate. Nat Genet 2010; 42: 24-26.