SUPPLEMENTARY MATERIAL

Noninvasive Prenatal Detection of Sex Chromosomal Aneuploidies by Sequencing Circulating Cell-Free DNA from Maternal Plasma

Amin R. Mazloom1, Željko Džakula1, Huiquan Wang1, Paul Oeth1, Taylor Jensen1, John Tynan1, Ron McCullough1, Juan-Sebastian Saldivar1, Mathias Ehrich2, Dirk van den Boom2, Allan T. Bombard1,2, Margo Maeder2, Graham McLennan2, Wendy Meschino3, Glenn E. Palomaki4, Jacob A. Canick4, Cosmin Deciu1

1 Sequenom Center for Molecular Medicine, San Diego, CA, USA
2 Sequenom Inc., San Diego, CA, USA
3 North York General Hospital, Department of Genetics, Toronto, ON, Canada
4 Division of Medical Screening and Special Testing, Department of Pathology and Laboratory Medicine, Women & Infants Hospital, Alpert Medical School of Brown University, Providence, RI, USA

Removal of Systematic Biases

To remove systematic biases from raw measurements, we extended a previously developed parameterized error removal and unbiased normalization (PERUN) protocol to the sex chromosomes. Chromosomes X and Y were both partitioned into contiguous, nonoverlapping 50 kbp genomic regions and parameterized as previously described for autosomes [1]. The parameterization and normalization start by extracting the GC content of each region from the reference human genome. For each sequenced sample, linear regression was applied to the measured numbers of aligned reads per region (counts) as a function of the region-specific GC content. Regions that exhibited outlier behavior in either counts or GC content were excluded from this linear model. The sample-specific GC bias coefficients were evaluated as the slopes of the straight lines relating the counts to the GC content. For each genomic region, a regression of measured read counts versus the sample-specific GC bias coefficient values yielded the region-specific model parameters. These parameters are used to flatten the systematic variations in measured read counts.
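The two regression steps described above can be illustrated with a minimal sketch. It is not the authors' PERUN implementation: the function names are hypothetical, and the flattening formula (counts modeled as level × intercept + GC coefficient × slope) is an assumption, since the text states only that the region parameters are used to flatten systematic variation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sample_gc_bias(counts, gc, keep=None):
    """Sample-specific GC bias coefficient: slope of counts versus region GC content.

    keep is an optional list of booleans marking the regions retained after
    outlier exclusion (how outliers are detected is not modeled here).
    """
    if keep is not None:
        counts = [c for c, k in zip(counts, keep) if k]
        gc = [g for g, k in zip(gc, keep) if k]
    return fit_line(gc, counts)[1]


def fit_region_parameters(count_matrix, gc_bias):
    """Per-region regression of counts (across samples) on the GC bias coefficients.

    count_matrix[s][r] is the count of sample s in region r.
    Returns one (intercept, slope) pair per region.
    """
    n_regions = len(count_matrix[0])
    return [fit_line(gc_bias, [row[r] for row in count_matrix])
            for r in range(n_regions)]


def flatten_counts(counts, gc_bias_coeff, region_params):
    """ASSUMED flattening: counts ~ level * intercept + gc_bias_coeff * slope,
    solved for the per-region level."""
    return [(c - gc_bias_coeff * slope) / intercept
            for c, (intercept, slope) in zip(counts, region_params)]
```

Under this assumed model, a euploid sample that follows the fitted parameters exactly yields a flattened level of 1 in every region, whatever its GC bias coefficient.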
The model parameters for chromosome X were derived from 752 euploid samples corresponding to female fetuses. Region filtering for chromosome X was based on 10-fold cross-validation, stratified with respect to the levels of the GC bias coefficient. The final region selection comprised 76.7% of chromosome X and was used to quantify the amount of chromosome X present in the sample.

Model parameters for chromosome Y were extracted from a separate set of adult male samples. To identify informative chromosome Y regions, all euploid count profiles were first normalized according to the PERUN procedure. For each chromosome Y region, we calculated the median and mean absolute deviation (MAD) of the normalized counts over the subset of 752 female samples as well as over the subset of 757 male samples. The two medians and MADs were then combined to yield a single, region-specific t-statistic value. A subset of regions representing 2.2% of chromosome Y generated t-values that exceeded a predefined cutoff of 50. Those regions were used to evaluate the representation of chromosome Y.

Distribution of Chromosome X Representations

The distribution of chromosome X representations for the euploid female pregnancies in the training cohort (compared with the normal distribution by means of a quantile-quantile plot in Figure S1) is markedly asymmetrical, with a heavy left tail. The observed skew may result from putative maternal and/or fetal mosaicism of monosomy X, as well as from technological imperfections (GC bias and other systematic errors). Such a distribution increases the complexity associated with the detection of fetal chromosome X abnormalities.

Standardized Chromosome X Representations

To express chromosome X representations in terms of standardized z-scores, we estimate the width of a censored chromosome X distribution for normal female samples from the training set.
First, we establish a linear model relating the theoretical standard normal quantiles (the predictor) to the observed chromosome X quantiles (the response variable). Figure S2 illustrates the distribution of the residuals from the point estimates based on the linear model. Samples whose residuals deviate by more than 5 to the left or more than 3 to the right of the mode of the distribution (Figure S2, horizontal lines) are excluded, yielding a censored distribution of the chromosome X representation. The width, MAD_X, of this censored distribution is then used to standardize female chromosome X representations as follows:

$$Z_X = \frac{\mathrm{chrX} - \mathrm{median}(\mathrm{chrX})}{\mathrm{MAD}_X} \qquad \text{(S1)}$$

In Equation S1, chrX is the chromosome X representation for female pregnancies, Z_X is the standardized equivalent of chrX, median(chrX) represents the median of chrX, and MAD_X stands for the median absolute deviation of the censored distribution of chromosome X representations.

Non-reportable SCA regions for female samples

To reflect the complexity of the detection of chromosome X aneuploidy, we introduce two non-reportable regions centered on the ±3 cutoffs on the Z_X scale. These regions comprise Z_X values within the segments [−3.5, −2.5] and [2.5, 3.5].

Non-reportable SCA regions for male samples

The classification of a male fetus as normal or aneuploid with respect to chromosome Y relies on both the X and Y chromosomal representations. The depletion of chromosome X in normal male fetuses is accompanied by a proportional elevation of chromosome Y. Doubling of the ratio between chromosomes Y and X indicates [47,XYY] fetal aneuploidy. On the other hand, an elevation of chromosome Y in the absence of chromosome X depletion suggests a Klinefelter pregnancy [47,XXY]. A two-dimensional XY scatter plot relating the two chromosomal representations (chromosome X on the abscissa and chromosome Y on the ordinate) forms the basis of our classifier for male fetal aneuploidies.
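As an illustration of Equation S1 and of the female non-reportable regions described above, the following minimal sketch standardizes a chromosome X representation and flags values in the gray zone. Several points are assumptions rather than the authors' implementation: the function names are hypothetical, the MAD is left unscaled (no normal-consistency factor), the median is taken over a reference set of training representations, and the two non-reportable regions are assumed to be symmetric about zero (2.5 ≤ |Z_X| ≤ 3.5). The censoring step itself is not reproduced; the censored reference is taken as given.

```python
from statistics import median


def median_absolute_deviation(values):
    """Unscaled MAD: the median of absolute deviations from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def standardize_chrx(chr_x, reference, censored_reference):
    """Equation S1: (chrX - median of chrX) / MAD of the censored distribution.

    reference: chromosome X representations of training female samples (assumed).
    censored_reference: the same set after residual-based censoring (taken as given).
    """
    return (chr_x - median(reference)) / median_absolute_deviation(censored_reference)


def in_female_nonreportable_zone(z_x, lower=2.5, upper=3.5):
    """True when Z_X falls in either band around the +/-3 cutoffs (assumed symmetric)."""
    return lower <= abs(z_x) <= upper
```

For example, with a toy reference of `[1, 2, 3, 4, 5]` used for both arguments, the median is 3 and the MAD is 1, so a representation of 6 standardizes to Z_X = 3 and falls inside the non-reportable band.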
Both the measured elevation of the fetal chromosome Y representation and the depletion of chromosome X are proportional to the fraction of fetal DNA in the maternal plasma. An insufficient amount of fetal DNA therefore adversely affects the signal-to-noise ratio. The problem is exacerbated by the routine appearance of a non-zero background chromosome Y signal in all pregnancies, both male and female, in addition to the stochastic errors and in spite of the absence of chromosome Y from the maternal genome. The artifact may be partially attributed to misalignments, but a significant contribution to the noise stems from the homology between chromosome Y and other chromosomes (most notably chromosome X, e.g. 3.5 Mb of TGIFL X/Y).

The scarcity of [47,XXY] and [47,XYY] aneuploidies impeded both the training of our male sex aneuploidy classifier and the assessment of the classifier's accuracy. Furthermore, at low fetal fraction values, the areas on the XY scatter plot occupied by the two male sex aneuploidies and by the euploid male samples partially overlap. To reflect the complexity of the detection of chromosome Y aneuploidy due to insufficient fetal DNA levels, we introduced two non-reportable (NR) zones for samples pertaining to male fetuses.

The first such zone is defined by the euploid control samples containing pooled male fetal DNA at a median level of 4%. This zone comprises the half-plane defined by chromosome Y levels below the 0.15th percentile of the euploid control measurements (Figure S4, yellow horizontal line).

The second zone outlines the overlap between the XXY, XYY, and euploid male areas. It is shaped as a right-angled triangle just above the origin of the XY scatter plot. The triangle touches the first NR zone with its horizontal cathetus (Figure S4). The location of its vertical cathetus is determined by Z_X = −3.
The hypotenuse of the triangle coincides with the straight line defining the upper limit of the 99% confidence interval for the normal male area (Figure S4, upper tilted blue line). The line's intersection with the euploid male control cutoff (Figure S4, yellow horizontal line) marks the right extremity of the triangle.

The cutoffs introduced above are summarized in Table S1. The decision tree that implements the SCA algorithm is depicted in Figure S5, using the variable names from Table S1.

Table S1: Description of cutoffs used in the sex chromosome aneuploidy (SCA) decision tree

Variable    Description
            Chromosome X representation for sample
            Chromosome Y representation for sample
            Predicted fetal sex for sample
            Gray zone for chromosome X
            Standardized chromosome X representation for sample as described in the text
            0.15th percentile of chromosome X representations in the female pregnancies of the training cohort
            99.85th percentile of chromosome X representations in the female pregnancies of the training cohort
            Threshold for maternal monosomy 45,X
            Threshold for maternal trisomy 47,XXX
            99.7% CI of euploid female chromosome X representation
            99% CI of the point estimate of chromosome Y representation at level of chromosome X
            0.05 percentile confidence level of the point estimate of chromosome Y representation at level of chromosome X
            0.15th percentile of the chromosome Y representation level of the pooled male control samples
            Gray zone for the male pregnancies, with a right-angled triangular geometry in the chromosome XY representation plane

Figure S1: Quantile comparison between the theoretical normal distribution and the distribution of chromosome X representations observed in the training cohort. Standard normal quantiles and observed chromosome X quantiles are shown along the abscissa and ordinate, respectively. The solid line connects the first and the third quartiles of the chromosome X representations.
Figure S2: Distribution of the residuals of female chromosome X representations. The residuals are estimated from the linear model trained on the interquartile range of the female cohort.

Figure S3: Gray ribbons: non-reportable zone for female fetal sex aneuploidy classification.

Figure S4: The gray triangular region delineates male pregnancies deemed non-reportable for sex chromosomal aneuploidies. The dotted yellow horizontal line depicts the 0.15th percentile of the male euploid control samples spiked with 4% fetal fraction. The vertical dotted lines correspond to Z_X = −3 and Z_X = 3.

Figure S5: The decision tree used in the sex chromosome aneuploidy (SCA) algorithm (for a description of the variable names, see Table S1).

References:

1. Jensen TJ, Zwiefelhofer T, Tim RC, et al. High-throughput massively parallel sequencing for fetal aneuploidy detection from maternal plasma. PLOS ONE 2013; in press.