S1 Text. Supplementary Methods

Filtering of Genomic Regions

We filtered the assembly used in analyses of allelic expression biases to identify regions where we have high confidence in our SNP calls. To do so, we first identified genomic regions with evidence for large-scale copy-number variation; second, we identified repeats and selfish genetic elements; and third, we identified genomic regions with unusually high proportions of heterozygous genotype calls in the inbred C. rubella line Cr1GR1, which is expected to be highly homozygous. Regions with a high proportion of repeats, evidence for copy-number variation, or a high proportion of heterozygous calls in Cr1GR1 were removed from further analyses of allele-specific expression.

To identify regions with large-scale copy-number variation, we used the software Control-FREEC, which uses read depth information to call copy-number variants (Boeva et al. 2011, http://bioinfo-out.curie.fr/projects/freec/). Control-FREEC does not require sequences from a reference sample, and controls for variation in GC content, a major source of variability in read depth. We ran Control-FREEC 6.4 in sliding windows of 50 kb on .bam files filtered to retain only primary alignments. We then filtered out all genomic regions with copy-number variant calls in any of the samples for which we had genomic data.

To identify regions with repeats, we ran RepeatModeler 1.0.5 (www.repeatmasker.org) on the C. rubella reference genome to build a custom library of repeats. We then ran RepeatMasker 4.0.1 (www.repeatmasker.org) using this custom library to identify repetitive regions. We assessed the cumulative distribution of repeats in the genome and set the filtering threshold such that all 50 kb windows within the 30% most repetitive regions were removed. This corresponded to filtering out all 50 kb windows with more than ~17% of sites assigned as repeats by RepeatMasker.
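The quantile-based window filter described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the (scaffold, start, fraction) tuple layout, and the toy values are all assumptions made for the example. The same logic would apply to any per-window metric, such as the fraction of sites assigned as repeats.

```python
from typing import List, Tuple

# Hypothetical window record: (scaffold name, window start in bp, metric value),
# where the metric is e.g. the fraction of sites assigned as repeats.
Window = Tuple[str, int, float]


def quantile_cutoff(values: List[float], top_fraction: float) -> float:
    """Return the largest metric value that is still retained when the
    `top_fraction` of windows with the highest values is removed."""
    ordered = sorted(values)
    # Number of windows to keep, clamped so at least one window is kept.
    n_keep = int(round(len(ordered) * (1.0 - top_fraction)))
    n_keep = min(max(n_keep, 1), len(ordered))
    return ordered[n_keep - 1]


def filter_windows(windows: List[Window], top_fraction: float) -> Tuple[float, List[Window]]:
    """Drop windows whose metric lies above the empirical cutoff.
    Windows tied with the cutoff value are retained."""
    cutoff = quantile_cutoff([value for _, _, value in windows], top_fraction)
    kept = [w for w in windows if w[2] <= cutoff]
    return cutoff, kept


# Toy data (invented for illustration): ten 50 kb windows on one scaffold.
toy_fractions = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.17, 0.25, 0.40, 0.60]
toy_windows = [("scaffold_1", i * 50_000, f) for i, f in enumerate(toy_fractions)]

# Remove the 30% most repetitive windows.
cutoff, kept = filter_windows(toy_windows, top_fraction=0.30)
print(f"cutoff = {cutoff}, windows kept = {len(kept)} of {len(toy_windows)}")
```

In this sketch the cutoff is derived from the empirical distribution of the window metric rather than fixed in advance, which mirrors how a "top 30%" criterion translates into an absolute per-window threshold.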
Finally, we filtered out all genomic regions containing a high proportion of heterozygous genotype calls in the C. rubella line Cr1GR1, which has been inbred in the lab for at least six generations. This was also done using the cumulative distribution of the fraction of heterozygous calls across the genome, in 50 kb windows, and we set the cutoff such that windows within the top 20% most heterozygous regions were removed. This corresponded to filtering out all 50 kb windows with more than ~9% heterozygous calls in C. rubella Cr1GR1.

Supplementary Figs S2-S5 show the regions kept for analysis, as well as the distribution of repeats, copy-number variant calls, and fraction of heterozygous SNP calls in C. rubella across the eight main scaffolds of the C. rubella assembly. For all scaffolds, the proportion of sites assigned as repeats was elevated around centromeric regions, and genomic windows with evidence for copy-number variation often overlapped with windows with high repeat content and high proportions of heterozygous calls in C. rubella Cr1GR1. After filtering, we retained approximately 55% of the assembly, comprising regions where we had high confidence in our SNP calls.

Validation of ASE Results by qPCR

We validated ASE results by qPCR. For this, we used the TaqMan® Reverse Transcription Reagents (Life Technologies, Carlsbad, CA, USA) with oligo(dT)16 primers to convert mRNA into cDNA following the manufacturer's protocol, and performed qPCR with the Custom TaqMan® Gene Expression Assay (Life Technologies, Carlsbad, CA, USA) with the dyes FAM and VIC, also following the manufacturer's protocol. The qPCR for both alleles was multiplexed in one well to directly compare the two alleles using a Bio-Rad CFX96 Touch™ Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA). Primers were designed to match both alleles, whereas probes were designed to overlay SNPs that distinguished the two alleles.
The primers were additionally designed so that either the forward or the reverse primer spanned an intron, ensuring that only cDNA derived from mRNA, and not any remaining DNA contamination, was included in the analysis. To exclude dye bias, we tested five genes using reciprocal probes (S11 Table) labeled with VIC and FAM and assessed whether the expression signal differed. Differences in expression signal were inferred from the relative expression difference between the two alleles, as well as from the quantification cycle (Cq value) (Table 4).

References

Boeva V, Zinovyev A, Bleakley K, Vert J-P, Janoueix-Lerosey I, Delattre O, Barillot E. 2011. Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 27:268–269.