LIST OF SUPPLEMENTARY MATERIAL TABLES AND FIGURES
Table S1: Summary of sample-level QC for each genotyping platform. Cells indicate the
number of samples removed at each QC step. *Includes known duplicate pairs intended for
cross-platform concordance checks. †Includes unexpected duplicates and additional affected
siblings. ‡Includes 43 control samples recruited for TS GWAS.
Table S2: Strongest Associated GWAS Variants (p<10^-3) in EU, AJ, SA, Trio, Combined
Case-Control and Combined Trio-Case-Control Samples
Single nucleotide polymorphisms (SNPs) listed by rs# include those with association
P-values<10^-3 for the EU, AJ and SA case-control subgroups individually and combined, trios,
and trio-case-controls. The chromosome (Chr) and base pair location for each SNP are listed in
columns to the right of the SNP column. OR indicates the odds ratio for the tested allele in the
trio sample. Direction indicates whether the association between OCD and the A1 allele is
positive (+) or negative (-) in each individual subgroup within the combined
(EU, AJ, SA, trios) samples. The left gene and right gene columns list the closest genes in the
SNP region, the SNP either being within the gene (no distance given) or between right and left
flanking genes (+ distance in kilobases) or downstream (- distance in kilobases). For SNPs
located within genes, other functional elements in the region are as noted. QTL (eQTL) columns
list genes whose expression or methylation levels (m) are associated (P-value) with the
specified SNP in that row, specifically as identified previously in EU-ancestry frontal (F),
parietal (P) or cerebellar (C) tissue. mQTL and F eQTL data are unavailable for X chromosome
SNPs.
Table S3: Strongest Associated GWAS Variants within previously identified linkage
regions
Table S4: Strongest Associated GWAS Variants within previously identified candidate
genes
Table S5: Enrichment of miRNA Target Sets in the Best Supported SNPs from the OCD
GWAS. Gene sets regulated by each miRNA were downloaded from TargetScan
(www.targetscan.org) and filtered to remove genes with a <90% probability of being regulated by
each miRNA set (micro-RNA Annotation). LD-pruned independent SNPs lying within each
target gene set (Target Gene Number) at a given p-value threshold (SNP P_threshold) from the
indicated “Sample” were then tested for enrichment, and the number of intervals with p-values
less than the threshold (# Intervals +) is noted. Only results with an empirical p-value
(Empirical_P) of p<0.1 are shown, and p-values corrected for multiple testing (Corrected_P) are
listed.
Table S6: Enrichment of Gene Ontology (GO) Pathway Target Sets in the Best Supported
SNPs from the OCD GWAS. Enrichment of the best supported SNPs in each GO pathway
target set (Target_gene set ID and Gene Set Annotation) was tested using INRICH for the
Case-Control, Trio or Trio-Case-Control analyses. Each GO target gene set contains the
indicated “Target Gene Number” and within this set the number of intervals containing SNPs (#
Intervals +) that are below the p-value threshold (SNP P_threshold) are shown. Only results
with an empirical p-value (Empirical_P) of p<0.05 or p-values after correction for multiple testing
(Corrected_P) of p<1 are shown.
Table S7: Detailed Association Results of SNPs in the 14.6 kb region Surrounding
rs6131295 in the Trio Analysis. Location (CHR, BP (hg19)), minor allele (A1), minor allele
frequency (freq (A1)), Odds Ratio (OR), p-value (P), type of SNP (0=imputed), and r2 and D’ to
rs6131295 are listed.
Figure S1: Quality Control Pipeline
Figure S2: Multi-dimensional scaling (MDS) plot of all OCD GWAS case-control samples
Figure S3: Multi-dimensional scaling (MDS) plot of OCD GWAS case-control samples of
European ancestry
Figure S4: Multi-dimensional scaling (MDS) plot of European ancestry OCD GWAS
identifying South African Subsample. Plots of additional MDS dimensions (here the 2nd and
5th dimensions) demonstrated a separation of the South African (SA) case-control sample
(green) from the other European ancestry samples.
Figure S5: Schematic of differential SNP missingness tests for cross-platform
comparisons. 9961 SNPs were removed based on differential missingness with respect to
phenotype (i.e. between cases and controls). An additional 4960 SNPs were removed due to
differential missingness with respect to flanking SNP genotypes.
Figure S6: MDS plot of OCD trio founders.
Figure S7: Quantile-quantile (QQ) Plots of Observed versus Expected p-values in: (a) EU,
(b) AJ, and (c) SA Samples.
The 95% confidence interval of expected values is indicated in grey. Corresponding genomic
control lambda values are indicated within each plot (lambda=1.009 for EU, 0.982 for AJ, and
0.969 for SA).
Figure S8: Quantile-quantile (QQ) Plots of Observed versus Expected p-values among
SNPs from 22 Candidate Genes for: (a) Case-Control samples and (b) Combined
Trio-Case-Control Samples.
The 95% confidence interval of expected values is indicated in grey. Corresponding genomic
control lambda values are indicated within each plot (lambda=1.085 for Case-Controls and
lambda=1.168 for trio-case-controls).
Figure S9: Regional results plot of top hits in meta-analyses.
a) LocusZoom plot of rs26728 from the case-control meta-analysis; b) LocusZoom plot of
rs4868342 from the case-control meta-analysis; c) LocusZoom plot of rs297941 from the
trio-case-control meta-analysis.
Figure S10: Cluster and LocusZoom plots of rs6131295, the top SNP in family based TDT
analysis.
a) Normalized intensity plot of SNP genotype clusters from BeadStudio (Illumina, San Diego,
CA, USA); b) Regional results plot of rs6131295, which is 90kb 3’ to BTBD3.
Figure S11: LocusZoom Plot of Directly Genotyped and Imputed SNPs near rs6131295.
Locations and observed -log10(p-values) for genotyped SNPs are shown as circles, imputed
SNPs as diamonds. Red, orange, green and blue colors indicate the r^2 (derived from 1000
Genomes CEU data) between each plotted SNP and the top SNP in the region (rs6131295, in
purple). Blue lines indicate the estimated recombination rate from HapMap release 22.
Figure S12: Interrelationships between strongest GWAS findings in Trio and
Trio-Case-Control Meta-Analysis.
rs6131295 is an eQTL for BTBD3 and ISM1 in cis and DHRS11 in trans (indicated by blue
arrows). Co-expression of DHRS11 and FAIM2 (green arrows) and ISM1 and ADCY8 (orange
arrows) in approximately 16 regions from each of 40 brains across the human lifespan (9 wk
post-conception to 40 years) was found by examination of the BrainSpan project
(BrainSpan.org, access date 10/2011). “r” indicates the correlation coefficient between each
pair of genes and “Rank” refers to the rank order of correlations of the 22,238 genes examined.
SUPPLEMENTARY MATERIALS
Abbreviations: AJ, Ashkenazi-Jewish European-derived samples; EU, European-ancestry,
non-isolate samples collected from the US, Canada and Europe; SA, South African samples
collected from Cape Town, South Africa.
SUBJECTS
Case and trio subjects were recruited and assessed as described in the main text. Cases and
trios were recruited predominantly from OCD specialty clinics, and controls were recruited from
Bonn, Germany and from Cape Town, South Africa. For study inclusion, all cases and trio
probands were required to have a DSM-IV diagnosis of OCD. The controls from Bonn had no
lifetime history of any Axis I disorder, and the South African controls were diagnostically
unscreened. Additional unscreened controls, genotyped on two different Illumina SNP arrays,
came from: 1) the Study of Addiction: Genes and Environment (SAGE) cohort (1,288
individuals, Illumina Hap1M) [1-3]; 2) the HYPERGENES Consortium, Milan, Italy (501
individuals, Illumina Hap1M); 3) the Illumina ‘iControl’ Genotype Control Database (3,212
individuals, Illumina Hap550k_v1); and 4) a cohort of Dutch ancestry (653 individuals, Illumina
Hap550k_v1) [4] (Table S1).
GENOTYPING
As described in the main text, 1817 OCD cases, 504 controls, and 663 OCD trios (2041
samples, including 1326 parents, 663 probands, and 52 affected siblings) were genotyped on
the Illumina Human610-Quadv1_B SNP array (Illumina, San Diego, CA, USA) at the Broad
Institute of Harvard and MIT Center for Genotyping and Analysis (CGA) (Cambridge, MA, USA)
in two batches (Sept-Nov 2008 and Dec 2008-Feb 2009). The method of genotype calling for
the OCD samples was the same as for the TS samples and is described in the accompanying TS
GWAS paper [5]. In total, 1586 OCD cases, 448 controls, and 1739 OCD trio samples were
successfully genotyped at CGA with call rate > 97%. An additional 43 European-descent control
samples recruited for the TS GWAS were included in the OCD GWAS as control samples,
bringing the genotyped control sample total to 491.
Previously genotyped control datasets were also included in the OCD GWAS, including SAGE
control samples (n=1288), iControls (n=3212), Dutch controls (n=653) and Italian controls
(n=501). The first three of these datasets are described in the accompanying TS GWAS
manuscript. The latter control dataset was genotyped on Illumina Hap1M, consisting of 501
Italian controls from the HYPERGENES consortium collected in Milan, Italy and characterized
as normotensive, non-obese, and non-dyslipidemic with no abnormal findings on physical
examination, but with no formal assessment for neurologic or psychiatric conditions. 2781
SAGE samples were utilized in the initial platform-specific SNP QC steps to increase the power
of detecting low quality SNPs and samples. The SAGE cases and non-European controls were
then removed for further quality control steps and the final GWAS.
QUALITY CONTROL PROCEDURES
QC Overview
A schematic of the ordered QC pipeline is provided in Supplementary Figure S1. Initial QC
steps were performed in parallel within each of the five datasets. SNP genotyping concordance
was checked on duplicate TS samples that were genotyped together with the OCD samples on
two different platforms (Hap610 and on Hap550 or Hap370) to confirm the robustness of
Illumina genotyping across different platforms and sites as well as to remove SNPs with
discrepant calls across platforms.
Platform-specific QC included removing SNPs and samples with low call rate (<98%), samples
with ambiguous genomic sex or discordance between genomic and phenotypic sex, and
strand-ambiguous SNPs or SNPs with allele frequency significantly different from HapMap CEU
reference data. Batch effects were investigated in the samples genotyped at CGA, and no
evidence for a batch effect was found. Three SNPs with p<10^-5 in the batch effect regression
analysis were flagged, and none of these appeared in the top 580 SNPs in the case-control
meta-analysis or in the top 584 SNPs in the final case-control and trio meta-analysis. Any SNPs
with a low concordance rate among different platforms, based on the TS samples, were
removed from the OCD GWAS dataset.
As noted in the main text, two SNP QC thresholds were generally used for each step: a more
stringent threshold at which SNPs were removed from the analysis, and a second, more liberal
threshold at which SNPs were flagged in an annotation file and re-examined later for potential
QC-related bias.
Platform merging and initial cross-platform comparisons
At this stage in the QC process, all samples were merged into a single dataset using PLINK.
Following the merge, 23 SNPs were either mismatched or tri-allelic and were removed. SNP
allele frequencies were compared between platforms, and any SNP with an absolute allele
frequency difference >0.15 between two platforms was flagged. Lastly, any SNP not in
common between the cleaned Hap1M, Hap610 and Hap550 platforms was removed, leaving
485,232 cleaned SNPs for subsequent analyses.
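
The cross-platform comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, which relied on PLINK; the function names, the dictionary layout and the example frequencies are hypothetical, and only the 0.15 threshold and the "common to all platforms" rule come from the text.

```python
from itertools import combinations


def flag_frequency_outliers(freqs_by_platform, max_diff=0.15):
    """Return SNP IDs whose allele frequency differs by more than max_diff
    between any two platforms.

    freqs_by_platform: {platform_name: {snp_id: allele_frequency}}
    (a hypothetical in-memory layout chosen for illustration).
    """
    flagged = set()
    for a, b in combinations(sorted(freqs_by_platform), 2):
        shared = freqs_by_platform[a].keys() & freqs_by_platform[b].keys()
        for snp in shared:
            if abs(freqs_by_platform[a][snp] - freqs_by_platform[b][snp]) > max_diff:
                flagged.add(snp)
    return flagged


def common_snps(freqs_by_platform):
    """Return the SNPs present on every platform (the set kept after merging)."""
    return set.intersection(*(set(f) for f in freqs_by_platform.values()))


# Made-up frequencies for three platforms named as in the text.
example = {
    "Hap1M": {"rs_a": 0.21, "rs_b": 0.30, "rs_c": 0.45, "rs_d": 0.10},
    "Hap610": {"rs_a": 0.22, "rs_b": 0.48, "rs_c": 0.44},
    "Hap550": {"rs_a": 0.20, "rs_b": 0.31, "rs_c": 0.46},
}
flagged = flag_frequency_outliers(example)
kept = common_snps(example)
```

Here `rs_b` is flagged because its frequency differs by 0.18 between two platforms, and `rs_d` is dropped because it is absent from two of the three platforms.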
Removal of duplicates, related samples and individuals of non-European descent
For all 7667 case-control samples remaining in the common dataset and 1654 trio samples,
pairwise estimation of genome-wide identity-by-descent (IBD) was conducted with an
LD-pruned set of 51,516 SNPs using PLINK. The parent-proband relationship was confirmed
(Z1>0.9) in 401 complete trios. Among the incomplete trios, 106 probands with European
ancestry were included as cases in the case-control samples. One individual from each
case-control sample pair with either a pi-hat>0.1 or Z1>0.2, representing unexpected duplicates
or relatives, was removed from subsequent analyses. For unexpected duplicates or relatives
between the case-control samples and the trio samples, the case-control sample was
removed from subsequent analyses. All remaining case-control samples were subjected to a
multi-dimensional scaling (MDS) analysis to identify individuals of non-European ancestry
(Supplementary Figure S2), and the remaining trio samples were subjected to Mendelian
error checking.
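
As a rough illustration of the relatedness filter, the following Python sketch applies the pi-hat and Z1 thresholds from the text to a list of pairwise IBD estimates. The input layout and sample IDs are hypothetical. The text does not say which member of a related case-control pair was dropped, nor how related pairs within the trio set were handled, so both choices below are labeled assumptions.

```python
def related_pairs(ibd_rows, pi_hat_max=0.1, z1_max=0.2):
    """Return sample pairs exceeding either relatedness threshold.

    ibd_rows: iterable of (sample_a, sample_b, pi_hat, z1) tuples,
    a hypothetical layout standing in for pairwise IBD output.
    """
    return [(a, b) for a, b, pi_hat, z1 in ibd_rows
            if pi_hat > pi_hat_max or z1 > z1_max]


def samples_to_drop(ibd_rows, trio_ids):
    """Pick one sample to remove from each related pair."""
    drop = set()
    for a, b in related_pairs(ibd_rows):
        if a in drop or b in drop:
            continue  # pair already resolved by an earlier removal
        a_trio, b_trio = a in trio_ids, b in trio_ids
        if a_trio and b_trio:
            continue  # assumption: trio-trio pairs are left to trio-specific QC
        if a_trio:
            drop.add(b)  # keep the trio member, drop the case-control sample
        elif b_trio:
            drop.add(a)
        else:
            drop.add(max(a, b))  # assumption: arbitrary but deterministic choice
    return drop


rows = [
    ("cc1", "cc2", 0.50, 0.90),  # related case-control pair
    ("cc3", "t1", 0.25, 0.50),   # case-control sample related to a trio member
    ("cc4", "cc5", 0.02, 0.05),  # unrelated pair, below both thresholds
]
dropped = samples_to_drop(rows, trio_ids={"t1"})
```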
The majority of samples clustered along a diagonal with samples of Dutch origin at the top left
(blue) and Italian origin at the lower middle (red), consistent with the expected distribution of
European ancestry samples along a Northern to Southern European cline. The samples
clustered at the bottom right were later identified as Ashkenazi Jewish (AJ) samples
(Supplementary Figure S2). However, 46 cases and 141 controls fell far outside this general
European ancestry cluster and thus were removed from analysis due to the presence of
non-European genetic ancestry (Supplementary Table S1).
Separation of case-control samples into genetically homogeneous sub-populations of
European-ancestry derived samples: EU, SA, and AJ
After removing all individuals with non-European genetic ancestry, a second European ancestry
MDS analysis was performed to stratify remaining samples into more homogeneous
subpopulations and to re-assign individuals whose self-reported ancestry did not reflect
observed genetic ancestry (Supplementary Figure S3).
As expected, most case-control subjects clustered together in a homogeneous cloud along the
expected Northern-Southern European cline (from the top middle to the left bottom). Within the
EU cluster, the Dutch cases (light blue) and the Italian cases (pink) genotyped on Hap610 at
Broad were well matched by the Dutch controls (blue) and the Italian HYPERGENES controls
(yellow). The individuals within the EU cluster were separated out as a non-isolate European
ancestry stratum (EU) for sub-population-specific QC and analysis.
AJ Sub-population
Two major clusters of individuals distinct from the main EU sample in the European ancestry
MDS analysis were identified as AJ ancestry (Supplementary Figure S3). The middle red
cluster represents half-AJ/half-EU ancestry. Due to the small number of samples, this “half-AJ
cluster” was combined with the main AJ cluster and analyzed together as a single AJ stratum.
SA Subpopulation
Although the South African (SA) cases and controls (green) also fell within the general EU
cluster, further MDS analyses identified additional dimensions that distinguished SA cases and
controls from the EU samples, and thus they were analyzed separately as an SA-specific
stratum (Supplementary Figure S4, green).
Subpopulation-specific QC
After separating the final samples into three subpopulation-specific strata (EU, AJ, SA), an
additional set of QC analyses was undertaken within each subpopulation to optimize
case-control matching and to remove remaining poorly performing samples and SNPs
(Supplementary Figure S1). First, samples that demonstrated low-level relatedness (Z1>0.1)
with a large number (≥20) of other samples in the subpopulation were removed. Second,
samples within each sub-population were subjected to a cluster analysis (--cluster in PLINK),
and any sample whose pairwise identity-by-state distance from the closest samples was > 5
standard deviations compared to the rest of samples was removed. Mean heterozygosity was
calculated, and any sample with |Fhet| > 0.05 was also removed from the final analysis.
Following these sample QC steps, SNPs were tested for the presence of Hardy-Weinberg
disequilibrium in controls from each subpopulation. Any SNPs with HWE p<10^-10 were removed;
those with HWE p<10^-5 were flagged. SNPs were also flagged in all samples if they generated
>1% Mendelian errors in the 400 OCD trios.
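
The heterozygosity filter can be sketched as follows. This is a simplified illustration of a method-of-moments inbreeding coefficient, F = (observed homozygotes - expected homozygotes) / (non-missing genotypes - expected homozygotes). It omits any small-sample correction that a tool such as PLINK may apply, and the function names and genotype coding are assumptions made for the example.

```python
def inbreeding_f(genotypes, allele_freqs):
    """Estimate F for one sample.

    genotypes: per-SNP minor-allele counts (0, 1, 2) or None if missing.
    allele_freqs: per-SNP minor-allele frequencies in the reference sample.
    Simplification: expected homozygosity is 1 - 2p(1-p) with no
    small-sample correction.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue
        n_called += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    denom = n_called - expected_hom
    if denom == 0:
        return 0.0  # assumption: no informative SNPs, treat as F = 0
    return (observed_hom - expected_hom) / denom


def passes_het_filter(f, limit=0.05):
    """Keep samples whose |F| does not exceed the limit used in the text."""
    return abs(f) <= limit


freqs = [0.5, 0.5, 0.5, 0.5]
f_typical = inbreeding_f([0, 1, 2, 1], freqs)     # as many homozygotes as expected
f_excess_het = inbreeding_f([1, 1, 1, 1], freqs)  # all heterozygous
```

A strongly negative F (excess heterozygosity) can indicate sample contamination, and a strongly positive F can indicate inbreeding or ancestry mismatch with the reference frequencies, which is why both tails are filtered.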
Given the use of five different datasets across three different nested Illumina platforms, we
performed an additional QC step to identify SNPs with differential missingness between cases
and controls across 5 cross-platform comparisons (Supplementary Figure S5). For each of
these comparisons, SNPs were removed using increasing levels of stringency with decreasing
minor allele frequency thresholds. SNPs with MAF≥0.2 were excluded if p<10^-5 (chi-square
test, 1 df). SNPs with MAF<0.2 but ≥0.1 were excluded if p<10^-4. Lastly, SNPs with MAF<0.1
were excluded if p<10^-3. In addition, a differential missingness test relative to adjacent
genotypes (haplotype-based missingness test, --test-mishap in PLINK) was performed, with
SNPs excluded for p<10^-10.
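
The MAF-tiered exclusion rule can be written out as a short Python sketch. The thresholds are those given in the text; the Pearson chi-square on a 2x2 table (missing versus called, by case versus control) is one plausible reading of the "chi-square test, 1 df" and is an assumption, as are the function names and example counts.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def missingness_chi2(miss_case, n_case, miss_ctrl, n_ctrl):
    """Pearson chi-square for a 2x2 table of missing/called by case/control."""
    a, b = miss_case, n_case - miss_case
    c, d = miss_ctrl, n_ctrl - miss_ctrl
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so there is no evidence of a difference
    return n * (a * d - b * c) ** 2 / denom


def p_threshold(maf):
    """Exclusion threshold from the text, by minor allele frequency tier."""
    if maf >= 0.2:
        return 1e-5
    if maf >= 0.1:
        return 1e-4
    return 1e-3


def exclude_snp(maf, miss_case, n_case, miss_ctrl, n_ctrl):
    """True if the SNP shows differential missingness at its MAF tier."""
    stat = missingness_chi2(miss_case, n_case, miss_ctrl, n_ctrl)
    return chi2_1df_pvalue(stat) < p_threshold(maf)


# Hypothetical counts: 50/1000 missing in cases versus 10/1000 in controls.
strong = exclude_snp(0.30, 50, 1000, 10, 1000)
weak = exclude_snp(0.30, 12, 1000, 10, 1000)
```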
In addition, for EU cases of known Dutch ancestry genotyped on the Hap610 platform, all SNPs
absent from the Dutch Hap550 control dataset were removed to reduce any differential
missingness specific to these matched samples. For EU cases of known Italian ancestry
genotyped on the Hap610 platform, all SNPs absent from the Italian 1M control dataset were
removed for the same reason.
Two further rounds of MDS analyses were conducted within each ancestry-specific
subpopulation. The first set of subpopulation-specific MDS analyses was used to remove any
remaining samples with poor case-control matching. A final MDS analysis was then performed
to identify MDS dimensions which could explain any residual population stratification. MDS
dimensions were retained for subsequent association analysis if: 1) they were associated with
the OCD phenotype at p<0.01 or 2) for dimensions associated with the OCD phenotype at
values associated with inclusion of each MDS dimension. All dimensions demonstrating a
notable drop in genomic control values relative to prior MDS dimensions were retained. These
MDS dimensions were included as covariates in the logistic regression model used for tests of
association.
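
Because the retention rule refers to genomic control values, a brief sketch of how the genomic control lambda is commonly computed from a set of association p-values may be useful. This is the standard median-based estimator, not code from the study; the function name and the simulated p-values are illustrative assumptions.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(p_values):
    """Median-based genomic control lambda.

    Each two-sided p-value (0 < p <= 1) is converted to a 1-df chi-square
    statistic via the squared normal quantile, and the median statistic is
    divided by its expectation under the null.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a null distribution, so lambda should be near 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_control_lambda(null_like)
```

Under this approach, lambda would be recomputed after adding each candidate MDS dimension as a covariate, and a dimension that noticeably lowers lambda toward 1 would be kept.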
Trio samples
As described in the methods section, the trio samples were recruited from sites in EU-descent
and non-EU descent countries including the following: Germany, France, the Netherlands, Italy,
Canada, the United States, the United Arab Emirates, South Africa, Mexico, Brazil, and Costa
Rica (Central Valley of Costa Rica, CVCR). For trio sample inclusion, the OCD-affected
proband and both biological parents were required. On the MDS plot of the founders in the trio
samples (Supplementary Figure S6), 75% of the founders clustered at the bottom with
the European ancestry samples (black). The self-reported Mexican samples (blue) overlapped
with the CVCR samples (green). The Brazilian samples (red) showed genetic components
distinct from those of the European samples. Due to the heterogeneous nature of the trio
samples and the small sample size within each population stratum, the quality
control tests that require a homogeneous population were not applied. Instead, we removed any
SNPs that failed the quality control tests in the European case-control samples.
The Mendelian error test was applied to the 401 complete trios. One trio with >0.1% Mendelian
errors was removed, and the remaining trios were subjected to a second Mendelian error test.
In this second test, 2803 SNPs were found to have ≥1% (≥4) Mendelian errors and
were therefore excluded. SNPs that were monomorphic in the trio samples were also removed.
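
A Mendelian error check for autosomal SNPs can be illustrated with a few lines of Python. The genotype coding (0, 1 or 2 copies of the A1 allele) and the function names are assumptions made for this sketch; the ≥1% per-SNP threshold is the one stated above.

```python
def is_mendelian_consistent(father, mother, child):
    """Check one autosomal trio genotype, coded as A1 allele counts (0, 1, 2)."""
    def transmissible(genotype):
        # Number of A1 alleles a parent with this genotype can pass on.
        return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

    return any(f + m == child
               for f in transmissible(father)
               for m in transmissible(mother))


def snp_fails_mendel(trio_genotypes, max_error_pct=1):
    """True if a SNP has Mendelian errors in at least max_error_pct percent
    of trios with complete genotypes.

    trio_genotypes: list of (father, mother, child); None marks a missing call.
    """
    complete = [t for t in trio_genotypes if None not in t]
    errors = sum(1 for t in complete if not is_mendelian_consistent(*t))
    # Integer comparison avoids floating-point issues at the boundary.
    return len(complete) > 0 and errors * 100 >= max_error_pct * len(complete)


# 400 hypothetical trios: 4 errors reaches the 1% threshold, 3 does not.
four_errors = [(0, 0, 1)] * 4 + [(1, 1, 2)] * 396
three_errors = [(0, 0, 1)] * 3 + [(1, 1, 2)] * 397
```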
Post-hoc confirmation of QC analyses
As a final step to confirm the quality of the QC process, we examined the square of the GWAS
test statistic for any correlation with residual call rate, Hardy-Weinberg p-value and minor allele
frequency of the surviving SNPs, none of which were significant (data not shown).
Sex Chromosome SNPs
For analysis of sex chromosome SNPs, males and females were assessed separately for each
subgroup, with adjustment by MDS factors as described above, and then combined via
meta-analysis, using number of cases or trios as a weighting factor.
X chromosome QC
QC steps for X chromosome SNPs followed the same pipeline as for autosomal SNPs (Figure
S1) with a few modifications. In the first QC step, a SNP call rate threshold of 98% was used as
calculated based on female samples only. Similarly, for resolution of strand-ambiguous SNPs,
allele frequencies were estimated based on female samples only. Third, prior to merging
samples from each platform, samples with a call rate <95% on the X chromosome were
removed from analysis. After dataset merging, 1915 SNPs were removed for having
heterozygous genotypes in males. Finally, in the subpopulation-specific QC, a more
conservative cutoff for SNPs in Hardy-Weinberg disequilibrium was used (HWE p<0.001 in
female controls). Of note, no pre-defined pseudo-autosomal SNPs were genotyped on the
610Quad and thus were not available for analysis. Similarly, since only 129 Y chromosome
SNPs passed QC with a call rate>98% in males, all Y chromosome SNPs were removed from
the analysis. All X-chromosome SNPs with p<1x10^-3 are provided in Table S2.
ANALYSES
Subpopulation-specific association analysis
Following QC, each of the three cleaned datasets (EU, AJ, SA) was analyzed as a separate
subpopulation in PLINK using logistic regression under an additive model with
subpopulation-specific MDS dimensions incorporated as covariates in each analysis (EU: 4 MDS
dimensions; AJ: 1 MDS dimension; SA: 2 MDS dimensions). The remaining 400 trios with 467,978
SNPs were analyzed in PLINK using the Transmission Disequilibrium Test (TDT).
Quantile-quantile plots of each case-control subpopulation-specific analysis and of the TDT
analysis of the trio samples revealed no evidence of residual population stratification or
significant systematic technical artifacts (Supplementary Figures S7a-c).
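
For readers unfamiliar with the TDT, the basic statistic is easy to state in code. The sketch below implements the classical McNemar-type form, (b - c)^2 / (b + c) with 1 degree of freedom, where b and c are the numbers of heterozygous parents who did and did not transmit the tested allele. The counts are hypothetical, and this is a textbook illustration rather than a reproduction of the PLINK analysis.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT from transmission counts of heterozygous parents.

    Returns (chi-square statistic, p-value with 1 df, odds ratio).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    odds_ratio = b / c if c else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
chi2, p_value, odds_ratio = tdt(60, 40)
```

Because only transmissions within families are compared, the test is robust to population stratification, which is relevant given the heterogeneous ancestry of the trio sample.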
Meta-analysis
Meta-analysis was conducted using METAL, which combined the p-values using the number of
cases in each subpopulation-specific stratum for weighting [6]. Two meta-analyses were
conducted: a case-control meta-analysis of the three European-derived populations (EU, AJ,
SA) and a final meta-analysis of three case-control populations and one trio dataset (EU, SA,
AJ, Trio).
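
A p-value-based, sample-size-weighted meta-analysis of the kind described can be sketched as below. Each p-value is converted to a signed Z-score, weights are taken as the square root of the supplied count, and the combined statistic is the weighted sum divided by the square root of the sum of squared weights. This mirrors the sample-size scheme documented for METAL, but the code is an independent illustration, and the per-stratum numbers in the example are made up.

```python
import math
from statistics import NormalDist


def weighted_z_meta(studies):
    """Combine per-stratum results with a weighted Z-score.

    studies: list of (p_value, direction, n), where direction is +1 or -1
    for the effect of the tested allele and n is the weighting count
    (for example the number of cases or trios in the stratum).
    Returns (combined Z, two-sided combined p-value).
    """
    nd = NormalDist()
    numerator = 0.0
    sum_w_sq = 0.0
    for p_value, direction, n in studies:
        z = abs(nd.inv_cdf(p_value / 2.0)) * direction
        w = math.sqrt(n)
        numerator += w * z
        sum_w_sq += w * w
    z_meta = numerator / math.sqrt(sum_w_sq)
    p_meta = 2.0 * nd.cdf(-abs(z_meta))
    return z_meta, p_meta


# Hypothetical strata: same direction of effect, different sizes.
z_same, p_same = weighted_z_meta([(0.05, +1, 100), (0.05, +1, 100)])
# Opposite directions with equal weights cancel out.
z_opp, p_opp = weighted_z_meta([(0.05, +1, 100), (0.05, -1, 100)])
```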
Using the sign test with 3616 LD-pruned SNPs with p<0.01, there was evidence for increased
consistent directionality (1907/3616=0.52; p=5.25 x 10^-4 for 1-sided binomial test) between the
trios and the combined case-controls. On further limiting analysis to the 414 LD-pruned SNPs
with p<0.001, we found no evidence that the directionality of effect was more consistent
(205/414=0.49; p=0.60 for 1-sided binomial test), although this loss of statistical significance
can likely be attributed to decreased power provided by the smaller sample size.
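
The sign test above is a one-sided binomial test against a null proportion of 0.5, and it can be checked with an exact computation using only integer arithmetic. The function name is an assumption; the counts are those reported in the text.

```python
from math import comb


def one_sided_binomial_p(successes, n):
    """Exact P(X >= successes) for X ~ Binomial(n, 0.5)."""
    tail = sum(comb(n, k) for k in range(successes, n + 1))
    return tail / 2 ** n  # Python divides arbitrarily large integers exactly


# SNPs with a consistent direction of effect in trios and case-controls.
p_at_001 = one_sided_binomial_p(1907, 3616)   # SNPs with p<0.01
p_at_0001 = one_sided_binomial_p(205, 414)    # SNPs with p<0.001
```

The first value is on the order of 5 x 10^-4 and the second is close to 0.6, in line with the reported results.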
eQTL and mQTL enrichment tests
Previously generated expression quantitative trait loci (eQTL) data from lymphoblastoid cell
lines (LCL) [7], frontal lobes [8], parietal lobes [7] and the cerebellum [7] were used to
annotate the 580 SNPs with the strongest evidence of association in the trio-case-control
meta-analysis (p<0.001) [7]. Annotation of these SNPs was also conducted with data regarding
methylation levels (methylation quantitative trait loci, mQTLs) [7] within the cerebellum.
Details regarding eQTL and mQTL data collection are provided in the accompanying TS GWAS
paper [5].
Gene and coding SNP enrichment tests
The top SNPs from the trio-case-control analyses with p<0.001 and with p<0.01 were compared
to 1,000 random sets of the same size, conditioning on allele frequency, to yield an empirical
distribution. An enrichment p-value was then calculated as the proportion of randomized sets in
which the eQTL (or mQTL) count matched or exceeded the actual observed count in the list of
top SNP associations, to test whether the SNPs with the strongest observed associations were
enriched for eQTLs or mQTLs [7].
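
The empirical enrichment p-value described above can be sketched in Python. Matching random sets to the top SNPs by minor allele frequency bin is one way to "condition on allele frequency" and is an assumption here, as are the function name, the input dictionaries and the toy data.

```python
import random


def empirical_enrichment_p(top_snps, all_snps, is_qtl, maf_bin,
                           n_sets=1000, seed=0):
    """Proportion of MAF-matched random SNP sets whose QTL count matches
    or exceeds the count observed among the top SNPs.

    is_qtl: {snp: bool}; maf_bin: {snp: bin label}. Both are hypothetical
    lookup tables prepared in advance.
    Returns (observed QTL count, empirical p-value).
    """
    rng = random.Random(seed)
    top = sorted(set(top_snps))
    observed = sum(1 for s in top if is_qtl[s])

    # Pool of candidate SNPs per MAF bin, and how many to draw from each bin.
    pool = {}
    for s in sorted(all_snps):
        pool.setdefault(maf_bin[s], []).append(s)
    needed = {}
    for s in top:
        needed[maf_bin[s]] = needed.get(maf_bin[s], 0) + 1

    hits = 0
    for _ in range(n_sets):
        count = 0
        for b, k in needed.items():
            count += sum(1 for s in rng.sample(pool[b], k) if is_qtl[s])
        if count >= observed:
            hits += 1
    return observed, hits / n_sets


# Toy data: 100 SNPs in one MAF bin, and only the 5 top SNPs are QTLs.
snps = [f"snp{i}" for i in range(100)]
top = snps[:5]
qtl = {s: (s in top) for s in snps}
bins = {s: "0.1-0.2" for s in snps}
observed, p_emp = empirical_enrichment_p(top, snps, qtl, bins)
```

With 1,000 random sets, the smallest non-zero empirical p-value that can be resolved is 0.001.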
Enrichment of missense polymorphisms and genic SNPs was also assessed, using a similar
approach to that applied for eQTL enrichment. To do this, each polymorphic SNP in HapMap
was assigned a function, following the dbSNP functional classification scheme, as previously
described. SNPs were considered “genic” if they were located within a coding region, an
intron, or 2 kb of upstream or downstream flanking sequence. Coding SNPs were assigned a
function depending on how each allele altered the translated amino acid sequence. If either
allele was nonsynonymous, the SNP was assigned a “missense,” “nonsense,” or “frameshift”
annotation. To test for enrichment of genic SNPs and specifically for missense polymorphisms,
a similar approach to that applied for eQTL enrichment was used.
GWAS SNPs within genomic regions with previous suggestive evidence for linkage in OCD
were examined, and their strength of association is summarized in Supplementary Table S3.
Potential enrichment of top hits (at thresholds of p<0.001 and p<0.01) for the combined
trio-case-control sample among the SNPs from 22 previously identified candidate genes was
examined using INRICH [40].
The Q-Q plot of candidate gene SNPs for the combined case-control group shows little inflation
(λ=1.085, Supplementary Figure S8), suggesting no evidence for over-representation within
these genes. While the Q-Q plot of the combined trio-case-control sample indicates small
inflation (λ=1.168, Supplementary Figure S8), the follow-up enrichment test demonstrated no
over-representation of top hits (p<0.001 and p<0.01) within previously identified candidate
genes (p=0.15 and p=0.10, respectively). For these 22 OCD candidate genes examined, the
lowest SNP p-values are reported in Supplementary Table S4. The strongest finding was
observed for ADARB2 [22], with a p-value=1.6x10^-4, which did not survive correction for
multiple testing of candidate gene SNPs (corrected p=0.53).
Potential enrichment of micro-RNA (miRNA) binding sites among LD-independent associated
genomic intervals was also examined using a TargetScan [56] probability of conserved targeting
cutoff of >0.9 and Entrez Genes hg v.18 (http://www.ncbi.nlm.nih.gov/gene) (Supplementary
Table S5). Moreover, signals of enriched association with pre-defined gene pathways were also
queried via INRICH, providing empirical and corrected p-values for target gene sets from the
GWAS results (Supplementary Table S6).
SUPPLEMENTARY RESULTS
Case-control meta-analysis of European ancestry derived samples (EU, AJ, and SA)
Separate analyses of the MDS-identified EU, SA and AJ subpopulations were conducted to
reduce the genetic heterogeneity. The case-control European-ancestry meta-analysis produced
580 loci with association p-values <1 x 10^-3. No SNP showed genome-wide significant evidence
for association. The results for the SNPs with p-values <1 x 10^-3 are provided, along with the
complete annotation information, including eQTL data from all three tissues (LCL, cerebellum,
and frontal cortex) and cerebellar mQTL data (Supplementary Table S2). LocusZoom plots of
loci discussed in the main text from the case-control meta-analysis, including rs26728 (within
EFNA5), rs4868342 (within HMP19) and rs297941 (5’ to FAIM2), are shown in Supplementary
Figure S9 [9].
Trio samples (Family-based TDT results)
The strongest evidence for association was found for rs6131295 on 20p12, which reached the
genome-wide significance threshold (p = 3.84 x 10^-8). The cluster plot of rs6131295 shows
well-separated genotype clusters, suggesting that the association signal at rs6131295 is
unlikely to be due to genotyping artifacts (Supplementary Figure S10a).
Trio-case-control meta-analysis of all OCD samples (EU, AJ, SA, Trios)
The global meta-analysis of all subpopulations consisting of 1465 OCD cases and 5557 controls
and 400 trios produced 584 loci with association p-values <10^-3 (complete annotated list
provided in Supplementary Table S2). The regional results plot of rs297941 for the global
meta-analysis is shown in Supplementary Figure S9.
The brain-wide expression patterns of genes represented among the most strongly associated
GWAS SNPs were examined, in addition to correlation with expression of other GWAS
implicated genes. A schematic illustrating inter-correlations is found in Supplementary Figure
S12.
SUPPLEMENTARY REFERENCES
1. Bierut LJ, Saccone NL, Rice JP, Goate A, Foroud T, Edenberg H et al. Defining
alcohol-related phenotypes in humans. The Collaborative Study on the Genetics of Alcoholism.
Alcohol Res Health 2002; 26(3): 208-213.
2. Bierut LJ, Madden PA, Breslau N, Johnson EO, Hatsukami D, Pomerleau OF et al.
Novel genes identified in a high-density genome wide association study for nicotine
dependence. Hum Mol Genet 2007; 16(1): 24-35.
3. Bierut LJ, Strickland JR, Thompson JR, Afful SE, Cottler LB. Drug use and dependence
in cocaine dependent subjects, community-based individuals, and their siblings. Drug
Alcohol Depend 2008; 95(1-2): 14-22.
4. Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D et al.
Common variants conferring risk of schizophrenia. Nature 2009; 460(7256): 744-747.
5. Scharf JM, Mathews CA. Copy number variation in Tourette syndrome: another case of
neurodevelopmental generalist genes? Neurology 2010; 74(20): 1564-1565.
6. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide
association scans. Bioinformatics 2010; 26(17): 2190-2191.
7. Nicolae DL, Gamazon E, Zhang W, Duan S, Dolan ME, Cox NJ. Trait-associated SNPs
are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet
2010; 6(4): e1000888.
8. Gibbs JR, van der Brug MP, Hernandez DG, Traynor BJ, Nalls MA, Lai SL et al.
Abundant quantitative trait loci exist for DNA methylation and gene expression in human
brain. PLoS Genet 2010; 6(5): e1000952.
9. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP et al. LocusZoom:
regional visualization of genome-wide association scan results. Bioinformatics 2010;
26(18): 2336-2337.