SUPPLEMENTARY MATERIAL
Noninvasive Prenatal Detection of Sex
Chromosomal Aneuploidies by Sequencing
Circulating Cell-Free DNA from Maternal Plasma
Amin R. Mazloom1, Željko Džakula1, Huiquan Wang1, Paul Oeth1, Taylor Jensen1,
John Tynan1, Ron McCullough1, Juan-Sebastian Saldivar1, Mathias Ehrich2, Dirk
van den Boom2, Allan T. Bombard1,2, Margo Maeder2, Graham McLennan2,
Wendy Meschino3, Glenn E. Palomaki4, Jacob A. Canick4, Cosmin Deciu1
1 Sequenom Center for Molecular Medicine, San Diego, CA, USA
2 Sequenom Inc., San Diego, CA, USA
3 North York General Hospital, Department of Genetics, Toronto, ON, Canada
4 Division of Medical Screening and Special Testing, Department of Pathology and
Laboratory Medicine, Women & Infants Hospital, Alpert Medical School of Brown
University, Providence, RI, USA
Removal of Systematic Biases
To remove systematic biases from raw measurements, we extended a previously developed
parameterized error removal and unbiased normalization (PERUN) protocol to sex
chromosomes. Both chromosomes X and Y were partitioned into contiguous, non-overlapping 50 kbp genomic regions and parameterized as previously described for
autosomes.1 The parameterization and normalization start by extracting the GC content
of each region from the reference human genome. For each sequenced sample, linear
regression was applied to the measured numbers of aligned reads per region (counts) as a
function of the region-specific GC content. Regions which exhibited outlier behavior in
either counts or GC content were not included in this linear model. The sample-specific
GC bias coefficients were evaluated as the slopes of the straight lines relating the counts
to the GC content. For each genomic region, a regression of measured read counts versus
sample-specific GC bias coefficient values yielded the region-specific model parameters.
The parameters are employed to flatten the systematic variations in measured read
counts. The model parameters for chromosome X were derived from 752 euploid samples
corresponding to female fetuses. Region filtering for chromosome X was based on 10-fold cross-validation, stratified with respect to the levels of the GC bias coefficient. The
final region selection comprised 76.7% of chromosome X and was used to quantify the
amount of chromosome X present in the sample. Model parameters for chromosome Y
were extracted from a separate set of adult male samples. To identify informative
chromosome Y regions, all euploid count profiles were first normalized according to the
PERUN procedure. For each chromosome Y region, we calculated the median and mean
absolute deviation (MAD) of the normalized counts over the subset of 752 female
samples as well as over the subset of 757 male samples. The two medians and MADs
were then combined to yield a single, region-specific t-statistic value. A subset of regions
representing 2.2% of chromosome Y generated t-values that exceeded a predefined cutoff
of 50. Those regions were used to evaluate the representation of chromosome Y.
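
The normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names are hypothetical. The flattening formula (subtract the GC-driven term, then divide by the region intercept) is an assumption about how the region parameters are applied. The Welch-style form of the t-statistic, and the reading of MAD as the median absolute deviation, are likewise assumptions, since the text does not state how the medians and MADs are combined. Outlier filtering and cross-validation are omitted.

```python
from statistics import median


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def gc_bias_coefficient(counts, gc):
    """Sample-specific GC bias: slope of region counts versus region GC content."""
    return fit_line(gc, counts)[1]


def region_parameters(counts_by_sample, gc):
    """Per-region (intercept, slope) of counts regressed on the samples' GC bias."""
    bias = [gc_bias_coefficient(c, gc) for c in counts_by_sample]
    return [fit_line(bias, [c[r] for c in counts_by_sample])
            for r in range(len(gc))]


def flatten(counts, gc, params):
    """ASSUMED normalization: remove the GC-driven term, scale by the intercept."""
    g = gc_bias_coefficient(counts, gc)
    return [(c - g * s) / i for c, (i, s) in zip(counts, params)]


def mad(values):
    """Median absolute deviation (ASSUMED meaning of MAD here)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def robust_t(male, female):
    """ASSUMED Welch-style statistic built from medians and MADs."""
    se = (mad(male) ** 2 / len(male) + mad(female) ** 2 / len(female)) ** 0.5
    return (median(male) - median(female)) / se
```

Under these assumptions, a chromosome Y region would be retained when `robust_t` of its normalized counts in male versus female samples exceeds the cutoff of 50 quoted above.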
Distribution of Chromosome X Representations
The distribution of chromosome X representations for the euploid female pregnancies in
the training cohort (compared with the normal distribution by means
of a quantile-quantile plot in Figure S1) is markedly asymmetrical, with a heavy left tail.
The observed skew may result from putative maternal and/or fetal mosaicism of
monosomy X, as well as technological imperfections (GC bias and other systematic
errors). Such a distribution increases the complexity associated with detection of fetal
chromosome X abnormalities.
Standardized Chromosome X Representations
To express chromosome X representations in terms of standardized z-scores, we estimate
the width of a censored chromosome X distribution for normal female samples from the
training set. First, we establish a linear model relating the theoretical standard normal
quantiles (the predictor) to the observed chromosome X quantiles (the response variable).
Figure S2 illustrates the distribution of the residuals from the point estimates based on the
linear model. The samples whose residuals deviate more than 5 to the left or more than
3 to the right of mode of the distribution (Figure S2, horizontal lines) are excluded,
yielding a censored distribution of the chromosome X representation. The width, $\hat{\sigma}$, of
this censored distribution is then used to standardize female chromosome X
representations as follows:
$$Z_X = \frac{\mathrm{chrX} - \hat{\mu}}{\hat{\sigma}} \qquad \text{(S1)}$$
In Equation S1 above, chrX is the chromosome X representation for female pregnancies,
ZX is the standardized equivalent of chrX, $\hat{\mu}$ represents the median of chrX, and $\hat{\sigma}$ stands
for the median absolute deviation of the censored distribution of chromosome X
representations.
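
The censoring and standardization described in this section can be illustrated with a short Python sketch. It is a hedged reading of the text, not the authors' code. The line is fitted on the interquartile range, as the caption of Figure S2 states. Two points are assumptions made only for this example: the mode of the residual distribution is approximated by the median of the residuals, and the cutoffs of 5 (left) and 3 (right) are taken in units of the median absolute deviation of the residuals. The median in Equation S1 is computed here over the censored values, which is also an assumption. All function names are hypothetical.

```python
from statistics import NormalDist, median


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mad(values):
    """Median absolute deviation."""
    m = median(values)
    return median([abs(v - m) for v in values])


def censor(values, left=5.0, right=3.0):
    """Drop samples whose Q-Q residuals fall outside an asymmetric window."""
    ordered = sorted(values)
    n = len(ordered)
    # Theoretical standard normal quantiles at plotting positions (i + 0.5) / n.
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    # Fit the line on the interquartile range only (see Figure S2 caption).
    lo, hi = n // 4, n - n // 4
    intercept, slope = fit_line(theo[lo:hi], ordered[lo:hi])
    resid = [v - (intercept + slope * t) for v, t in zip(ordered, theo)]
    center = median(resid)  # ASSUMPTION: stands in for the mode
    unit = mad(resid)       # ASSUMPTION: unit of the 5 / 3 cutoffs
    return [v for v, r in zip(ordered, resid)
            if -left * unit <= r - center <= right * unit]


def standardize(chr_x, censored):
    """Equation S1: (chrX - median) / MAD, both taken from the censored set."""
    return (chr_x - median(censored)) / mad(censored)
```

The asymmetric window (wider on the left than on the right) mirrors the heavy left tail reported for the training cohort.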
Non-reportable SCA regions for female samples
To reflect the complexity of the detection of chromosome X aneuploidy, we introduce
two non-reportable regions centered on the ±3 cutoffs on the ZX scale. These regions
comprise ZX values within the segments [–3.5, –2.5] and [2.5, 3.5].
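
As a minimal sketch, the female non-reportable check reduces to a test on the absolute value of ZX. The function name is hypothetical, and two details are assumptions: that the two bands sit symmetrically around cutoffs at –3 and +3 (the values shown as vertical lines in Figure S4), and that the band edges are inclusive.

```python
def in_female_nonreportable_zone(zx, low=2.5, high=3.5):
    """True when |ZX| falls inside a band centered on a cutoff of magnitude 3."""
    return low <= abs(zx) <= high
```

A sample with ZX = –2.7 would be flagged as non-reportable under this reading, while ZX = 0.4 or ZX = –4.2 would not.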
Non-reportable SCA regions for male samples
The classification of a male fetus as euploid or aneuploid with respect to chromosome Y
relies on both X and Y chromosomal representations. The depletion of the chromosome
X in normal male fetuses is accompanied by the proportional elevation of the
chromosome Y. Doubling of the ratio between chromosomes Y and X indicates
[47,XYY] fetal aneuploidy. On the other hand, an elevation of the chromosome Y in the
absence of the chromosome X depletion suggests a Klinefelter pregnancy [47,XXY]. A
two-dimensional XY scatter plot relating the two chromosomal representations
(chromosome X on the abscissa and chromosome Y on the ordinate) forms the basis of
our classifier for male fetal aneuploidies.
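
The pattern described above follows from a simple mixture model, worked through here as an illustration. The model assumes a euploid (46,XX) mother, a fetal fraction f, and no background chromosome Y signal or measurement noise. The symbols f, D_X, D_Y, n_X, and n_Y are introduced only for this sketch: D_X and D_Y are the expected chromosome X and Y doses per genome equivalent of plasma DNA, and n_X and n_Y are the fetal copy numbers.

```latex
\begin{align*}
D_X &= 2(1 - f) + f\,n_X, & D_Y &= f\,n_Y \\[4pt]
\text{46,XX:}\quad D_X &= 2,     & D_Y &= 0 \\
\text{46,XY:}\quad D_X &= 2 - f, & D_Y &= f \\
\text{47,XXY:}\quad D_X &= 2,    & D_Y &= f \\
\text{47,XYY:}\quad D_X &= 2 - f, & D_Y &= 2f
\end{align*}
```

Under these assumptions the ratio D_Y / D_X equals f / (2 - f) for 46,XY and 2f / (2 - f) for 47,XYY, exactly double, while 47,XXY shows the chromosome Y elevation of a normal male without any chromosome X depletion.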
Both the measured elevation of the fetal chromosome Y representation and the depletion
of the chromosome X are proportional to the fraction of the fetal DNA in the maternal
plasma. An insufficient amount of fetal DNA therefore adversely affects the signal-to-noise
ratio. The problem is exacerbated by the routine appearance of a non-zero background
chromosome Y signal in all pregnancies, both male and female, in addition to the
stochastic errors and in spite of the absence of the chromosome Y from the maternal
genome. The artifact may be partially attributed to misalignments, but a significant
contribution to the noise stems from the homology between the chromosome Y and other
chromosomes (most notably chromosome X, e.g. 3.5 Mb of TGIFL X/Y).
The scarcity of [47,XXY] and [47,XYY] aneuploidies impeded both the training of our
male sex aneuploidy classifier and the assessment of the classifier’s accuracy.
Furthermore, at low fetal fraction values, the areas on the XY scatter plot occupied by the
two male sex aneuploidies as well as euploid male samples partially overlap. To reflect
the complexity of the detection of chromosome Y aneuploidy due to insufficient fetal
DNA levels, we introduced two non-reportable zones for samples pertaining to male
fetuses. The first such zone is defined by the euploid control samples containing pooled
male fetal DNA at a median level of 4%. This zone comprises the half-plane defined by
chromosome Y levels below the 0.15 percentile of the euploid control measurements
(Figure S4, yellow horizontal line).
The second zone outlines the overlap between the XXY, XYY, and euploid male areas. It
is shaped as a right-angled triangle just above the origin of the XY scatter plot. The
triangle touches the first NR zone with its horizontal cathetus (Figure S4). The location of
its vertical cathetus is determined by the line ZX = –3. The hypotenuse of the triangle
coincides with the straight line defining the upper bound of the 99% confidence interval for
the normal male area (Figure S4, upper tilted blue line). The line’s intersection with the
euploid male control cutoff (Figure S4, yellow horizontal line) marks the right extremity
of the triangle.
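
The geometry of the two male non-reportable zones can be sketched as a pair of point-in-region tests. This is an illustration under stated assumptions rather than the authors' implementation. The abscissa is taken on the ZX scale. The upper confidence line is represented by a hypothetical intercept and slope, with a negative slope because chromosome Y rises as chromosome X falls. Boundary points are treated as inside the triangle. The class and field names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class MaleZones:
    y_cut: float      # 0.15 percentile of chrY in the pooled male controls
    zx_left: float    # vertical cathetus (ZX = -3 in the text)
    intercept: float  # hypothetical: upper line is y = intercept + slope * zx
    slope: float      # hypothetical; negative for the normal male trend

    def first_zone(self, zx, y):
        """Half-plane below the chromosome Y control cutoff."""
        return y < self.y_cut

    def second_zone(self, zx, y):
        """Right-angled triangle between the two catheti and the hypotenuse."""
        upper = self.intercept + self.slope * zx
        return zx >= self.zx_left and self.y_cut <= y <= upper

    def non_reportable(self, zx, y):
        return self.first_zone(zx, y) or self.second_zone(zx, y)
```

With illustrative values `MaleZones(y_cut=1.0, zx_left=-3.0, intercept=1.0, slope=-1.0)`, the triangle has vertices at (–3, 1), (0, 1), and (–3, 4). The rightmost vertex is where the hypotenuse meets the horizontal cutoff, matching the description of the triangle's right extremity.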
The above-introduced cutoffs are summarized in Table S1. The decision tree which
implements the SCA algorithm is depicted in Figure S5, using the variable names from
Table S1.
Table S1: Description of cutoffs used in the sex chromosome aneuploidy (SCA)
decision tree
Variable
Description
Chromosome X representation for sample
Chromosome Y representation for sample
Predicted fetal sex for sample
Gray zone for chromosome X
Standardized chromosome X representation for sample as described in the text
0.15 percentile of chromosome X representations in the female pregnancies of the training cohort
99.85 percentile of chromosome X representations in the female pregnancies of the training cohort
Threshold for maternal monosomy 45,X
Threshold for maternal trisomy 47,XXX
99.7% CI of euploid female chromosome X representation
99% CI of the point estimate of chromosome Y representation at level of chromosome X
0.05 percentile confidence level of the point estimate of chromosome Y representation at level of chromosome X
The 0.15 percentile of the chromosome Y representation level of the pooled male control samples
The gray zone for the male pregnancies with a right-angle triangular geometry in chromosome-XY representation plane
Figure S1: Quantile comparison between the theoretical normal distribution and the
distribution of chromosome X representations observed in the training cohort.
Standard normal quantiles and observed chromosome X quantiles are shown along
the abscissa and ordinate, respectively. The solid line connects the first and the third
quartiles of the chromosome X.
Figure S2: Distribution of the residuals of female chromosome X representations.
The residuals are estimated from the linear model trained on the interquartile range
of the female cohort.
Figure S3: Gray ribbons: non-reportable zone for female fetal sex aneuploidy
classification.
Figure S4: The gray triangular region delineates male pregnancies deemed non-reportable for sex chromosomal aneuploidies. The dotted yellow horizontal line
depicts the 0.15 percentile of the male euploid control samples spiked with 4%
fetal fraction. The vertical dotted lines correspond to ZX = –3 and ZX = 3.
Figure S5: The decision tree used in the sex chromosome aneuploidy (SCA)
algorithm (for a description of the variables’ names, see Table S1).
References:
1. Jensen TJ, Zwiefelhofer T, Tim RC, et al. High-throughput massively parallel
sequencing for fetal aneuploidy detection from maternal plasma. PLOS ONE
2013; in press.