Download 1 - bioRxiv

Supplemental Material
Common Variant Associations with Fragile X Syndrome
TABLE OF CONTENTS
Supplemental Methods
    Functional genomics
Supplemental Figures
    Supplemental Figure 1. Sample ancestry
    Supplemental Figure 2. SNP allele frequencies in FXS cases & controls
    Supplemental Figure 3. Comparison of relatedness in FXS cases & controls
    Supplemental Figure 4. Brain RNA-seq Data in FMR1 association region
    Supplemental Figure 5. 5C results for the FMR1 region
Supplemental Tables
    Supplemental Table 1. Subject characteristics
    Supplemental Table 2. Fine-mapping results
References
Page 1 of 12
SUPPLEMENTAL METHODS
FUNCTIONAL GENOMICS
Brain samples. Dorsolateral prefrontal cortex (Brodmann area 9) dissected postmortem from nine
adult schizophrenia cases and nine adult controls was obtained from Dr Craig Stockmeier (University of
Mississippi Medical Center) (1). Cases and controls were sex- and age-matched, and all were of European
ancestry. Controls had no history of psychiatric disorders or substance abuse. Frontal cortex from nine fetal
brains (gestational age 17-19 weeks) was obtained from the NIH NeuroBiobank (https://neurobiobank.nih.gov).
All fetal samples were of African American ancestry. Samples were genotyped on the Illumina Human Omni
Express chip to confirm sample integrity. Samples were dry homogenized to a fine powder using a liquid
nitrogen-cooled mortar and pestle. Aliquots of each sample were prepared and used for multiple purposes,
including ATAC-seq, RNA-seq, chromosome conformation capture-carbon copy (5C), and DNA microarray. Sample
processing was conducted blind to case-control status.
Cell lines. Three neural progenitor cell lines were used: an NIH human iPSC, H1 hESC, and ReNCellVM cells. We
used a modified protocol from Zeng et al. (2) to differentiate NIH human iPSCs and H1 hESCs. iPSCs were
collected after 51 days of differentiation, and H1 hESCs were collected after 47 days of differentiation.
ReNCellVM cells were grown and differentiated according to the pre-aggregation differentiation protocol by
Donato et al. (3).
RNA-seq. RNA-sequencing data was generated from all brain samples and cell lines. We extracted total RNA
from 25 mg of pulverized tissue from fetal and adult brain using Norgen’s Fatty Tissue RNA Purification Kit
(Norgen Biotek, Thorold, ON, Canada). We extracted RNA from the cell pellets (each with ~3-6 million cells) using
Norgen’s Total RNA extraction kit. RNA concentration was measured using fluorometry (Qubit 2.0 Fluorometer),
and RNA quality was verified using a microfluidics platform (Bioanalyzer, Agilent Technologies). Barcoded libraries
were created (Illumina TruSeq RNA Sample Preparation Kit v4) using 1 µg of total RNA as input. Samples were
randomly assigned to bar codes and lanes. Libraries were quantified using fluorometry and equal amounts of all
barcoded samples were pooled and sequenced using Illumina HiSeq 2000 (100 bp paired-end reads). We
mapped lane-level reads to the human genome (hg19) with Tophat (4) (v2.0.6) using default parameters and the
--library-type fr-firststrand option for strand-specific libraries. Using samtools (5), we removed reads with quality
score <10 or potential PCR duplicates. Mapped reads were summarized into gene-level expression estimates of
total read count (TReC). TReC is the number of reads that overlapped exonic regions of a gene (determined
using countReads in the R package isoform (6)). Gene models were based on non-overlapping exons from
Ensembl genes. We excluded genes with very low expression levels (sum of TReC across all samples <50). TReC
data were normalized using the weighted trimmed mean of M-values scale-normalization method in edgeR (7).
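The low-expression filter described above can be sketched as follows. This is a minimal illustration only: the function name, the gene labels, and the counts are hypothetical and are not taken from the study data, and the normalization step in edgeR is not reproduced here.

```python
def filter_low_expression(trec, min_total=50):
    """Keep genes whose TReC summed across all samples is at least min_total.

    trec maps a gene identifier to a list of per-sample read counts.
    Genes with a total below min_total (<50 in the text) are excluded.
    """
    return {gene: counts for gene, counts in trec.items()
            if sum(counts) >= min_total}


# Hypothetical counts for three genes across four samples (not real data).
example_trec = {
    "GENE_A": [120, 98, 143, 110],  # total 471, kept
    "GENE_B": [3, 0, 5, 1],         # total 9, excluded
    "GENE_C": [10, 15, 12, 13],     # total 50, kept (not below 50)
}

kept = filter_low_expression(example_trec)
print(sorted(kept))
```

A gene with a total of exactly 50 is retained in this sketch, because the text excludes only genes with a sum below 50.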
Total-stranded RNA-seq. We used total-stranded RNA-seq to detect unannotated features on five samples:
DLPFC from a control, PFC from a fetal sample, and three neural progenitor cell lines (differentiated NIH human
iPSCs, H1 hESCs, and ReNCellVM cells). RNA was extracted as detailed above.
5C. Chromosome conformation capture-carbon copy (5C) was performed on five human fetal prefrontal cortex
samples. The region targeted by 5C was chrX:146200000-147460000 (hg19). Tissue processing was based on two
previously published protocols (8, 9), one of which was designed for postmortem brain tissue (9). Frozen tissue
(~250 mg) was pulverized using the Cellcrusher. To produce a single cell suspension, pulverized tissue was
treated with 1mL of 0.125% (w/v) collagenase in 1x PBS and placed in shaker/incubator (300 RPM, 30 minutes,
37°C). Tissue was then Dounce homogenized with pestle A (10x) and returned to shaker/incubator (300 RPM, 30
minutes, 37°C) followed by further dissociation by Dounce homogenization with pestle B (10x), and spun at
400 x g for 2 minutes at room temperature. After the supernatant was discarded, the pellet was re-suspended in 2
mL of 1x PBS. Cells were filtered using a 40 µm cell strainer, and transferred to a 15 mL conical tube. Following
confirmation of single cell suspension and quantification of cells (Countess Automatic Cell Counter), we
proceeded with the 5C protocol. We crosslinked cells with formaldehyde (1% final concentration) and quenched
the reaction using glycine (150 mM). Cells were then lysed and DNA was digested using HindIII. 3C library
preparation was done according to the protocol by Naumova et al. (10), and 5C library preparation according
to the protocol by van Berkum et al. (11). 5C primers (226 forward primers; 6 reverse primers) were designed using
the my5C.primers module from the my5C web tools for 5C experiments, based on hg19 (12). Reverse
primers were based on predicted transcription start sites (TSS; Gencode V17 Comprehensive), while forward
primers mapped to all other HindIII fragments that yielded acceptable primers according to the criteria of the
my5C.primers module. 5C libraries were indexed (KAPA Hyper) and sequenced on two lanes of Illumina
HiSeq2500 (2x100bp). Plots were generated with the Bioconductor package Sushi. Two criteria for interactions
were examined: a liberal criterion (FDR < 0.1 in 3 of 5 samples) and a more conservative criterion (FDR < 0.05 in
4 of 5 samples).
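The two interaction criteria can be expressed as a small rule over per-sample FDR values. The sketch below assumes that "in 3 of 5 samples" means at least 3 of the 5 samples; the function names and the example FDR values are hypothetical, and the upstream calculation of FDR is not shown.

```python
def passes_criterion(fdr_by_sample, fdr_cutoff, min_samples):
    """Return True if at least min_samples samples have FDR below fdr_cutoff."""
    n_significant = sum(1 for fdr in fdr_by_sample if fdr < fdr_cutoff)
    return n_significant >= min_samples


def classify_interaction(fdr_by_sample):
    """Apply the liberal and conservative criteria to one candidate interaction."""
    return {
        # Liberal: FDR < 0.1 in (at least) 3 of 5 samples.
        "liberal": passes_criterion(fdr_by_sample, 0.1, 3),
        # Conservative: FDR < 0.05 in (at least) 4 of 5 samples.
        "conservative": passes_criterion(fdr_by_sample, 0.05, 4),
    }


# Hypothetical FDR values for one interaction across the five samples.
example_fdrs = [0.02, 0.08, 0.04, 0.30, 0.09]
result = classify_interaction(example_fdrs)
print(result)
```

In this example four samples fall below 0.1 but only two fall below 0.05, so the interaction meets the liberal criterion and fails the conservative one.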
ATAC-seq library preparation and sequencing. Assay for transposase-accessible chromatin using sequencing (ATAC-seq) was used to map chromatin accessibility genome-wide. This method probes DNA accessibility with
hyperactive Tn5 transposase, which inserts sequencing adapters into accessible regions of chromatin. We used
tissue from human DLPFC of nine schizophrenia cases and nine controls along with PFC from nine fetuses (see
above for sample description). Approximately 20 mg of pulverized material was used for ATAC-seq. Frozen
samples were thawed in 1 ml of nuclear isolation buffer (20 mM Tris-HCl, 50 mM EDTA, 5 mM Spermidine, 0.15
mM Spermine, 0.1% mercaptoethanol, 40% glycerol, pH 7.5), inverted for 5 minutes to mix, and samples were
filtered through Miracloth to remove larger pieces of tissue. Samples were centrifuged at 1100 x g for 10 min at
4°C. The resulting pellet was washed with 50 µl RSB buffer, centrifuged again, and supernatant was removed.
The final crude nuclear pellet was re-suspended in transposition reaction mix and libraries prepared for
sequencing as described in Buenrostro et al. (13). All samples were barcoded and combined into pools. Each
pool contained 6 randomly selected samples. Each pool was sequenced on two lanes of an Illumina 2500
sequencer (San Diego, CA, USA).
The raw fastq files were processed with cutadapt (version 1.2.0, URLs) (14) to remove adaptors and low-quality
reads. Cutadapt-filtered reads were aligned to hg19 using bowtie2 (version 2.1.0, URLs) (15) with default
parameters. In alignment, all reads were treated as single-read sequences, regardless of whether ATAC-seq
libraries were sequenced as single-end or paired-end. The aligned bam files were sorted using samtools (version
0.1.18, URLs) (16), duplicates were removed using Picard MarkDuplicates (URLs), and the files were then converted
to bed format using BedTools (version v2.17.0, URLs) (17). ENCODE “blacklist” regions were removed (i.e., empirically
identified genomic regions that produce artifactually high signal in functional genomic experiments). Narrow
open chromatin peaks were called from the final bed files using MACS2 (URLs). For visualization, bigwig files
were generated using wigToBigWig (version 4, URLs) (18) and bedgraph files were output by MACS2.
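The blacklist-removal step amounts to discarding any aligned interval that overlaps a blacklisted region. The text does not state which tool or command performed this step, so the sketch below only illustrates the logic in plain Python, assuming BED-style half-open coordinates; the function names and the intervals are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_blacklisted(intervals, blacklist):
    """Return the intervals that do not overlap any blacklisted region."""
    return [iv for iv in intervals
            if not any(overlaps(iv, region) for region in blacklist)]


# Hypothetical aligned reads and one hypothetical blacklisted region.
reads = [("chrX", 100, 150), ("chrX", 480, 530), ("chr1", 480, 530)]
blacklist = [("chrX", 500, 600)]

kept_reads = remove_blacklisted(reads, blacklist)
print(kept_reads)
```

The nested scan is quadratic and is meant only to show the rule; a genome-scale run would normally rely on an interval-aware tool.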
SUPPLEMENTAL FIGURES
SUPPLEMENTAL FIGURE 1. SAMPLE ANCESTRY
Scatterplot of the first two principal components (PCs) for FXS cases (filled diamonds), GABC controls (open
squares), and the means for ten HapMap 3 samples. Nine FXS cases are outliers, eight with variable African
ancestry and one that clustered with the GIH / MEX HapMap samples. Abbreviations. ASW=African ancestry in
Southwest USA. CEU=Utah residents with Northern and Western European ancestry from the CEPH collection.
CHB=Han Chinese in Beijing, China. CHD=Chinese in Metropolitan Denver, Colorado. GIH=Gujarati Indians in
Houston, Texas. JPT=Japanese in Tokyo, Japan. LWK=Luhya in Webuye, Kenya. MEX=Mexican ancestry in Los
Angeles, California. TSI=Toscani in Italia. YRI=Yoruba in Ibadan, Nigeria.
SUPPLEMENTAL FIGURE 2. SNP ALLELE FREQUENCIES IN FXS CASES & CONTROLS
Scatterplot of the allele frequencies in FXS cases and controls for all GWAS SNPs. Contours indicate SNP density. Most SNPs
had similar allele frequencies in cases and controls (Spearman ρ=0.97, P<10^-300). Five SNPs (all in the vicinity of FMR1, red
X’s) had divergent allele frequencies and reached genome-wide significance in case-control analyses (P<5x10^-8).
SUPPLEMENTAL FIGURE 3. COMPARISON OF RELATEDNESS IN FXS CASES & CONTROLS
Genomic relatedness was computed for all pairs of FXS and control subjects using autosomal, LD-pruned SNPs. Subject
quality control had previously removed one member of any pair of subjects with 𝜋̂ > 0.20. Three pairs of cases (6 individuals)
were related at approximately the third degree. Despite this, the median 𝜋̂ values were lower for case-case pairs (0.0078)
than control-control pairs (0.0094). In other words, case pairs tended to be less interrelated than control pairs.
SUPPLEMENTAL FIGURE 4. BRAIN RNA-SEQ DATA IN FMR1 ASSOCIATION REGION
The top tracks show GENCODE genes, classical FMR1 markers, and the fine-mapping results. The next track is empty as there
was no evidence of RNA alignments to genomic areas without current annotations. The next tracks show gene-level
expression in brain from RNA-seq of dorsolateral prefrontal cortex (DLPFC) in 9 SCZ cases and 9 controls along with
prefrontal cortex (PFC) from 9 fetuses. FMR1 was robustly expressed, but there was no detectable expression of FMR1-AS1.
We used total-stranded RNA-seq to detect unannotated features on 5 samples: DLPFC in a control, PFC in a fetal sample,
and three neural progenitor cell lines (an NIH human iPSC on day 51 of a differentiation protocol, H1 cells on day 47 of a
differentiation protocol, and ReNCellVM cells). All samples had substantial alignments to the forward strand (blue) mostly
corresponding to known FMR1 exons. The reverse strand alignments (red) had very low counts consistent with “leaky”
reverse direction transcription from the FMR1 promoter. We did not find evidence of an unannotated feature in the
association region.
SUPPLEMENTAL FIGURE 5. 5C RESULTS FOR THE FMR1 REGION
5C (chromosome conformation capture-carbon copy) results from five human fetal prefrontal cortex samples. The region
shown was targeted by 5C (hg19, chrX:146200000-147460000). Plots were generated with the Bioconductor package Sushi.
The top plot shows interactions with a liberal criterion (FDR < 0.1 in 3 of 5 samples), and the middle plot a more conservative
criterion (FDR < 0.05 in 4 of 5 samples). The bottom track shows GENCODE genes, the positions of classical markers
(including the FMR1 promoter CGG repeat), and the genome-wide significant common variation. The miRNA gene cluster on
the left was unremarkable in the GWAS, but has dense DNA-DNA interactions, including some with the GWAS region.
Interactions involving the FMR1 promoter containing the CGG repeat are discussed in the text.
SUPPLEMENTAL TABLES
SUPPLEMENTAL TABLE 1. SUBJECT CHARACTERISTICS
Characteristic                                                     FXS Cases     Controls
Number                                                             89            266
Percent male                                                       100%          100%
Mean age at assessment (SD)                                        10.5 (5.2)    20.8 (3.5)
Vineland adaptive behavior scale, socialization score, mean (SD)   53.8 (17.0)   –
Vineland adaptive behavior scale, composite score, mean (SD)       43.1 (14.9)   –
SD=standard deviation. Vineland scores were standardized to mean=100 and SD=15. FXS cases were markedly impaired.
SUPPLEMENTAL TABLE 2. FINE-MAPPING RESULTS
chrX (hg19)  SNP          Alleles  Fcase    Fcontrol  P (fine mapping)  OR      SE      P (Table 2)
146844358    rs73602793   A/C      0.09574  0.05139   0.095             1.954   0.4084  –
146852679    rs5952060    C/T      0.7835   0.4004    5.91e-12          5.419   0.264   –
146853379    rs141685640  G/A      0.03093  0.0364    0.79              0.8448  0.6364  –
146861015    rs45631655   A/T      0.4124   0.1799    4.88e-7           3.2     0.2389  –
146865083    rs148756788  C/T      0.1957   0.1073    0.018             2.024   0.3024  –
146876718    rs112146098  C/G      0.3505   0.1156    6.609e-9          4.128   0.2573  –
146882730    rs61577930   G/A      0.06186  0.06852   0.81              0.8963  0.4596  –
146884909    rs112866660  C/T      0.02062  0.03212   0.55              0.6344  0.7612  –
146886586    rs5905137    C/T      0.5567   0.2805    1.35e-7           3.221   0.2289  –
146895120    rs2197706    A/C      0.8041   0.3876    6.83e-14          6.487   0.2729  2.53e-10
146908213    rs5905149    A/C      0.6289   0.2655    4.04e-12          4.687   0.2348  5.76e-10
146908857    rs12009090   T/C      0        0.002169  0.65              0       –       –
146913828    rs7876251    G/A      0.7113   0.3362    6.85e-12          4.866   0.2445  3.68e-9
146918268    rs4824253    G/A      0.7113   0.3362    6.85e-12          4.866   0.2445  6.41e-9
146923549    rs5905152    C/T      0.01031  0         0.028             –       –       –
146925273    rs138345743  A/T      0.01042  0.01293   0.84              0.8035  1.086   –
146939662    rs45631657   C/T      0.375    0.0991    5.20e-12          5.455   0.264   –
146940339    rs139987830  A/C      0        0.002141  0.65              0       –       –
146944900    rs12008373   T/A      0        0.002141  0.65              0       –       –
146951567    rs148850319  C/T      0        0.002141  0.65              0       –       –
146955690    rs149005990  A/G      0        0.002146  0.65              0       –       –
146962305    rs140441434  T/C      0        0.002141  0.65              0       –       –
146963621    rs4824231    T/C      0.7423   0.3212    1.13e-14          6.086   0.2524  4.57e-8
146968287    rs181526497  A/T      0        0.03426   0.064             0       –       –
146980516    rs25705      T/C      0.433    0.167     5.74e-9           3.808   0.2395  –
146989460    rs12556135   T/C      0.07216  0.1159    0.21              0.5934  0.4182  –
146994239    rs146302130  C/G      0        0.02141   0.15              0       –       –
146994959    rs1805422    C/G      0.04124  0.0985    0.071             0.3936  0.5337  –
146995450    rs12013181   C/T      0        0.002141  0.65              0       –       –
147003339    rs25726      G/A      0.06186  0.1113    0.14              0.5262  0.4464  –
147010529    rs25708      A/G      0.05155  0.02784   0.23              1.898   0.5385  –
147013704    rs29283      G/A      0        0.002151  0.65              0       –       –
Fine-mapping using 32 SNPs in 97 FXS cases and 467 male controls. The first allele given is the least common in this sample
and is the reference for the odds ratio (OR) and frequencies in male cases (Fcase) and controls (Fcontrol). SE is standard error.
The first P is for fine mapping, and the second is from the initial GWAS. Genome-wide significant SNPs are highlighted in red.
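The relationship between the tabulated frequencies, odds ratios, and standard errors can be checked with a short calculation. The source does not state how OR and SE were computed, so the sketch below rests on assumptions: each male subject contributes one X-chromosome allele, there are no missing genotypes (97 case and 467 control alleles), the OR is the allelic odds ratio from the resulting 2x2 table, and SE is the Woolf standard error of log(OR). The function names are hypothetical.

```python
import math


def allele_counts(freq, n_alleles):
    """Convert an allele frequency into (first-allele count, other-allele count)."""
    first = round(freq * n_alleles)
    return first, n_alleles - first


def odds_ratio_and_se(f_case, f_control, n_case, n_control):
    """Allelic odds ratio and Woolf standard error of log(OR).

    Assumes one allele per hemizygous male and no missing genotypes.
    Undefined (division by zero) when any cell of the 2x2 table is zero.
    """
    a, b = allele_counts(f_case, n_case)        # cases: first allele, other allele
    c, d = allele_counts(f_control, n_control)  # controls: first allele, other allele
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, se_log_or


# rs2197706: Fcase=0.8041, Fcontrol=0.3876 (values from the table above).
or_, se = odds_ratio_and_se(0.8041, 0.3876, 97, 467)
print(f"OR = {or_:.3f}, SE = {se:.4f}")
```

Under these assumptions the calculation closely reproduces the tabulated OR of 6.487 and SE of 0.2729 for rs2197706. Rows with a frequency of 0 have an empty cell in the 2x2 table, so the Woolf SE is undefined for them. Rows whose frequencies imply a slightly smaller sample (some missing genotypes) would need their own allele totals.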
REFERENCES
1. Zhu H, Urban DJ, Blashka J, McPheeters MT, Kroeze WK, Mieczkowski P, et al. Quantitative analysis of
focused A-to-I RNA editing sites by ultra-high-throughput sequencing in psychiatric disorders. PLoS ONE.
2012;7(8):e43227.
2. Zeng H, Guo M, Martins-Taylor K, Wang X, Zhang Z, Park JW, et al. Specification of region-specific neurons
including forebrain glutamatergic neurons from human induced pluripotent stem cells. PLoS ONE.
2010;5(7):e11853.
3. Donato R, Miljan EA, Hines SJ, Aouabdi S, Pollock K, Patel S, et al. Differential development of neuronal
physiological responsiveness in two human neural stem cell lines. BMC Neurosci. 2007;8:36.
4. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics.
2009;25(9):1105-11.
5. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and
SAMtools. Bioinformatics. 2009;25(16):2078-9.
6. Sun W, Liu Y, Crowley JJ, Chen TH, Zhou H, Chu H, et al. IsoDOT Detects Differential RNA-isoform
Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and
Specificity. J Am Stat Assoc. 2015;110(511):975-86.
7. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis
of digital gene expression data. Bioinformatics. 2010;26(1):139-40.
8. Stadhouders R, Kolovos P, Brouwer R, Zuin J, van den Heuvel A, Kockx C, et al. Multiplexed chromosome
conformation capture sequencing for rapid genome-scale high-resolution detection of long-range
chromatin interactions. Nat Protoc. 2013;8(3):509-24.
9. Mitchell AC, Bharadwaj R, Whittle C, Krueger W, Mirnics K, Hurd Y, et al. The genome in three dimensions: a
new frontier in human brain research. Biol Psychiatry. 2014;75(12):961-9.
10. Naumova N, Smith EM, Zhan Y, Dekker J. Analysis of long-range chromatin interactions using Chromosome
Conformation Capture. Methods. 2012;58(3):192-203.
11. van Berkum NL, Dekker J. Determining spatial chromatin organization of large genomic regions using 5C
technology. Methods Mol Biol. 2009;567:189-213.
12. Lajoie BR, van Berkum NL, Sanyal A, Dekker J. My5C: web tools for chromosome conformation capture
studies. Nat Methods. 2009;6(10):690-1.
13. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and
sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat
Methods. 2013;10(12):1213-8.
14. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads.
EMBnet.journal. 2011;17(1):10-2.
15. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357-9.
16. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and
SAMtools. Bioinformatics. 2009;25(16):2078-9.
17. Quinlan AR. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics.
2014;47:11.12.1-34.
18. Speir ML, Zweig AS, Rosenbloom KR, Raney BJ, Paten B, Nejad P, et al. The UCSC Genome Browser
database: 2016 update. Nucleic Acids Res. 2016;44(D1):D717-25.