Description of de novo copy number calls for cleft case-parent trios:
Both MinimumDistance and PennCNV utilize the log R ratios (LRRs) and B allele frequencies
(BAFs) from the Illumina 610 Quad array probes to infer de novo deletions. The LRR is a standardized
estimate of the probe intensity, quantifying the total number of allele copies at the locus of interest. The
BAF is a standardized estimate for the proportion of the B allele's contribution to the total probe
intensity, assessing genotype at the probe of interest. The BAF is standardized so homozygous
genotypes in copy neutral states (two allele copies) have BAFs of approximately zero or one (for AA and
BB genotypes, respectively), and heterozygous AB genotypes yield BAFs roughly equal to 0.5. As a
quality control step, we excluded triads in which any sample (father, mother or child) had whole-genome
amplified DNA.
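As an illustration of how BAFs relate to genotypes in a copy-neutral region, the following minimal Python sketch assigns AA, AB or BB from a single BAF value. The function name and the tolerance of 0.15 are hypothetical choices made for this example only; they are not taken from PennCNV or MinimumDistance, which model BAFs probabilistically rather than with hard cutoffs.

```python
def call_copy_neutral_genotype(baf, tol=0.15):
    """Assign a genotype from a BAF, assuming two allele copies at the locus.

    tol is a hypothetical tolerance around the expected BAF clusters
    (0 for AA, 0.5 for AB, 1 for BB); it is an illustrative assumption.
    Returns None when the BAF falls between clusters.
    """
    if not 0.0 <= baf <= 1.0:
        raise ValueError("BAF must lie between 0 and 1")
    if baf <= tol:
        return "AA"
    if baf >= 1.0 - tol:
        return "BB"
    if abs(baf - 0.5) <= tol:
        return "AB"
    return None  # ambiguous: not close to any copy-neutral cluster


if __name__ == "__main__":
    for value in (0.02, 0.48, 0.97, 0.25):
        print(value, call_copy_neutral_genotype(value))
```

BAFs that fall between the three clusters (such as 0.25 here) are left uncalled, since such values are not expected for a clean copy-neutral genotype.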
The PennCNV algorithm for detecting de novo DNA copy number aberrations is based on a
hidden Markov model (HMM), jointly modeling the unknown copy number states in all three triad
members (Wang et al. 2008). The state transition probabilities are based on observed LRRs and BAFs in
the samples, and the population BAF. Maximum likelihood methods are employed to identify the most
likely copy number states in the father, mother and offspring, and these are encoded as a three digit
numerical code. A normal DNA copy number (two allele copies) is designated as a 3, a hemizygous deletion (one
allele copy) is indicated as a 2, and a homozygous deletion (zero allele copies) is indicated as a 1. Thus, de
novo deletions in offspring whose parents both have normal copy number are encoded as triad state '332' (loss of one
allele copy in the child) or '331' (loss of both alleles). PennCNV addresses genomic waves by
incorporating the population GC content at each marker into the HMM. The joint PennCNV HMM
considers all possible copy number states, including inherited deletions (e.g. '322'). MinimumDistance, by contrast,
was developed specifically for detecting de novo copy number changes, because the computational
demands of the joint PennCNV HMM are substantial and false positive identifications of de novo
deletions remain a concern even when the recommended PennCNV quality control procedures, including
genomic wave correction, are employed. The MinimumDistance approach is based on the "minimum
distance" statistic, which captures differences in copy number estimates between the offspring and each of
the parents at each locus, making it robust by design to genomic waves and other probe-specific artifacts
(Scharpf et al. 2012). In particular, when the samples of the triad members are hybridized on the
same plate (which is the highly recommended and commonly employed approach), MinimumDistance is
an effective approach for reducing technical and experimental sources of noise which can generate false
positives. Following genome-wide segmentation of these minimum distances by circular binary
segmentation (an extremely fast procedure), final inference regarding de novo copy number events is
based on a posterior calling step on the inferred candidate regions. This procedure is about an order of
magnitude faster than the joint PennCNV HMM. MinimumDistance uses the same code for the triad
copy number states, where '332' and '331' represent de novo loss of alleles in the proband. All analyses
using the MinimumDistance algorithm were carried out in the statistical environment R (http://cran.r-project.org/) using the packages DNAcopy, GenomicRanges, GWASTools, IRanges and MinimumDistance, all
available as free software via Bioconductor (http://www.bioconductor.org/). The results of the
MinimumDistance calls were converted to BED format, suitable for use in the UCSC Genome
Browser, and submitted to the FaceBase Hub in January 2013. This file contained de novo deletions
from subjects of European ancestry drawn from both the oral cleft study and the dental caries study, the latter serving as a
control group (see Younkin et al. 2014). No personal information on individual subjects was included in
this BED file, and both deletions and amplifications were included (coded to display amplifications in
red and deletions in blue).
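The triad state code and the minimum distance statistic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PennCNV or MinimumDistance implementation: the function names are hypothetical, only the three state digits defined in this description (3, 2, 1) are handled, and the minimum distance is taken here to be whichever of the two offspring-minus-parent LRR differences is smaller in absolute value, with its sign kept (our reading of Scharpf et al. 2012).

```python
# State digit -> number of allele copies, as defined in this description.
COPY_NUMBER = {"3": 2, "2": 1, "1": 0}


def decode_triad_state(code):
    """Translate a triad code (father, mother, offspring) into copy numbers."""
    if len(code) != 3 or any(digit not in COPY_NUMBER for digit in code):
        raise ValueError("expected three digits, each one of 1, 2 or 3")
    father, mother, offspring = (COPY_NUMBER[digit] for digit in code)
    return {"father": father, "mother": mother, "offspring": offspring}


def is_de_novo_deletion(code):
    """True when both parents carry two copies and the offspring fewer."""
    cn = decode_triad_state(code)
    return cn["father"] == 2 and cn["mother"] == 2 and cn["offspring"] < 2


def minimum_distance(lrr_offspring, lrr_father, lrr_mother):
    """Signed offspring-minus-parent LRR difference of smallest magnitude.

    Assumption: this follows our reading of the statistic; the actual
    package works on whole vectors of markers, not single values.
    """
    d_father = lrr_offspring - lrr_father
    d_mother = lrr_offspring - lrr_mother
    return d_father if abs(d_father) <= abs(d_mother) else d_mother


if __name__ == "__main__":
    for state in ("332", "331", "322", "333"):
        print(state, decode_triad_state(state), is_de_novo_deletion(state))
    # Offspring LRR is low relative to both parents: large negative value.
    print(minimum_distance(-0.75, 0.0, -0.25))
    # Offspring LRR matches the father's (inherited pattern): value is zero.
    print(minimum_distance(-0.5, -0.5, 0.0))
```

Because the statistic is close to zero whenever the offspring resembles at least one parent, inherited deletions and artifacts shared by all three samples largely cancel out, and only regions where the offspring differs from both parents stand out in the subsequent segmentation.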
References
Scharpf RB, Beaty TH, Schwender H, Younkin SG, Scott AF, Ruczinski I: Fast detection of de novo copy
number variants from SNP arrays for case-parent trios. BMC Bioinformatics 2012, 13:330.
Wang K, Chen Z, Tadesse MG, Glessner J, Grant SFA, Hakonarson H, Bucan M, Li M: Modeling genetic
inheritance of copy number variations. Nucleic Acids Res 2008, 36(21):e138.
Younkin SG, Scharpf RB, Schwender H, Parker MM, Scott AF, Marazita ML, Beaty TH, Ruczinski I: A
genome-wide study of de novo deletions identifies a candidate locus for non-syndromic isolated cleft
lip/palate risk. BMC Genetics 2014, 15:24.