Detection of Structural Variants
Structural Variants (SVs)
► Variants that change the landscape of chromosomes
► Copy-number variations (CNVs)
Microscopic
Sub-microscopic
deletions/insertions/duplications usu. > 1kb,
unbalanced rearrangements
large-scale CNVs ≥ 50 kb
► Other structural variants
Indels (insertions, deletions usu. ≤1kb)
Balanced rearrangements (inversions, translocations)
Segmental duplications
Repeats
SVs - Functional Impacts
► Wide-spread in human genomes
► Evolution, genetic diversity between individuals, genetic diseases
► More significant impact on phenotypic variation than SNPs
► Higher de novo locus-specific mutation rate than SNPs
New mutation rate
Zhang et al. Annual Reviews. 2009
Detecting Methods – Now & Then
Low-throughput
► Microscopy
Lejeune et al. Acad Sci 248 (1959)
Down’s syndrome (trisomy 21)
► In situ hybridization (ISH), fluorescent labeled probes (FISH)
► Southern blotting
► PCR-based methods
High-throughput
► Comparative genome hybridization – array-CGH
R. Redon et al., Nature 444, 444 (2006)
DNA microarrays
Detection of CNVs
Resolution: low, > 50 kb
► Fosmid paired-end sequencing (FPES)
E. Tuzun et al., Nat. Genet. 37, 727 (2005)
Low resolution: > 8 kb
Laborious
High-throughput
Next-generation sequencing
► Paired-end mapping (PEM)
Korbel et al. Science 318 (2007)
Detection of deletions < 1 kb (< insert size)
Breakpoints in a small region
► Read depth of coverage
Yoon et al. Genome Research 19 (2009)
Detection of large SVs and SVs in complex genomic regions (segmental duplication-rich)
Paired-end mapping (PEM)
Korbel et al. Science 318, 420 (2007)
3kb-fragments
Flow Chart
-> Paired-end mapping to the reference human genome
-> Analysis for distribution
-> Determination of cutoffs
-> Detection of SVs
Signatures used to detect SVs
1.
Deletions: paired-ends spanning longer regions than a specified cutoff
2.
Simple insertions: paired-ends spanning shorter regions than a cutoff
3.
Mated insertions: sequences connected to a distal locus with paired ends
4.
Unmated insertions: sequences connected to a distal locus with only one predicted breakpoint
5.
Inversions: Paired-ends with different orientation
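The span and orientation signatures above can be sketched in a few lines. This is only an illustration, not the pipeline of Korbel et al.: the `ReadPair` fields, the `classify_pair` helper, and the cutoff values are hypothetical, and mated/unmated insertions (which need information about a distal locus) are not handled.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    start: int                  # leftmost mapped coordinate of the pair
    end: int                    # rightmost mapped coordinate of the pair
    expected_orientation: bool  # False if the ends map in an unexpected relative orientation


def classify_pair(pair, min_span, max_span):
    """Label one mapped pair from its span and orientation only.

    min_span and max_span stand in for the cutoffs that would be derived
    from the observed span distribution (hypothetical parameters).
    """
    if not pair.expected_orientation:
        return "inversion candidate"
    span = pair.end - pair.start
    if span > max_span:
        return "deletion candidate"   # ends lie further apart than the cutoff
    if span < min_span:
        return "insertion candidate"  # ends lie closer together than the cutoff
    return "concordant"
```

In practice a single discordant pair would not be trusted on its own; several pairs supporting the same event would normally be required before calling an SV.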
Detected SVs
SV size: simple insertions (2 - 3 kb); others (~3 kb or larger)
Average breakpoint resolution: 644 bp (allowing validation by PCR)
SV validation
► PCR (97%)
► Comparison with the Database of Genomic Variants (DGV) (60%)
► Comparison with an alternative human genome assembly (“Celera assembly”) (12-22%)
► Array-CGH (65%)
► Fiber-FISH
► One-pass PCR for breakpoint junctions
CNVs by Read Depth of Coverage
Yoon et al. Genome Research 19 (2009)
Pipeline
Raw genome sequence data
(.fastq files/shotgun sequencing)
Mapping/alignment to the reference genome
(.bam files/MAQ alignment)
Filtering out reads of low mapping quality
(SAMtools)
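As a rough illustration of the filtering step, the sketch below drops low-quality alignments from SAM-formatted text lines using only the standard library. It relies on the SAM convention that header lines start with `@` and that the fifth tab-separated field is the mapping quality (MAPQ). The function name `filter_by_mapq` and the threshold of 30 are assumptions for the example; the slides do not state which cutoff was used.

```python
def filter_by_mapq(sam_lines, min_mapq=30):
    """Yield header lines and alignments whose MAPQ is at least min_mapq.

    min_mapq=30 is an arbitrary illustrative threshold, not a value
    taken from the slides.
    """
    for line in sam_lines:
        if line.startswith("@"):      # header lines pass through unchanged
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:  # field 5 of a SAM record is MAPQ
            yield line
```

With BAM files the same effect is normally obtained with a dedicated tool rather than by parsing text by hand.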
Estimation of Read Depth (RD)
► RD = read count = number of mapped reads in nonoverlapping windows of 100bp
► Each read counted once, by the start position
► Adjustment of RD for the deviation in coverage at a given GC content (GC content influences sequence coverage) → GC-corrected RD
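A minimal sketch of the two steps above: counting reads in non-overlapping 100 bp windows by start position, then rescaling for GC content. The correction shown (scale each window by the overall median divided by the median of windows in the same GC bin) is one common form and is an assumption here, since the slides do not give the formula. The function names, 0-based coordinates, and the pre-computed `gc_bins` input are also hypothetical.

```python
from collections import defaultdict
from statistics import median

WINDOW = 100  # window size in bp, as stated in the slides


def window_counts(read_starts, chrom_length, window=WINDOW):
    """Count each read once, in the window that contains its start position."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in read_starts:           # 0-based start positions (assumption)
        counts[pos // window] += 1
    return counts


def gc_correct(counts, gc_bins):
    """Rescale each window by overall median / median of its GC bin.

    gc_bins[i] is a label (e.g. rounded GC percentage) for window i.
    """
    overall = median(counts)
    by_bin = defaultdict(list)
    for count, gc in zip(counts, gc_bins):
        by_bin[gc].append(count)
    bin_median = {gc: median(values) for gc, values in by_bin.items()}
    corrected = []
    for count, gc in zip(counts, gc_bins):
        m = bin_median[gc]
        # Leave the count unchanged when the bin median is zero
        corrected.append(count * overall / m if m > 0 else float(count))
    return corrected
```

After correction, windows from GC bins that are systematically over- or under-covered are brought onto a common scale, so remaining deviations are more likely to reflect copy number.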
Event Detection – Event-wise testing
► GC-corrected RD = quantitative measurement of genome copy number
► Deletion = decrease, duplication = increase in coverage (across multiple consecutive
windows)
► Event-wise testing method as CNV-calling algorithm
Based on significance testing
Search for small events of statistical significance and cluster them into larger events
Event-wise testing - Algorithm
► Conversions
Z-score: $Z_i = (RD_i - \overline{RD}) / \sigma_{RD}$, where $\overline{RD}$ is the mean RD and $\sigma_{RD}$ its standard deviation
Upper-tail probability: $P_i^{upper} = P(Z > Z_i)$
Lower-tail probability: $P_i^{lower} = P(Z < Z_i)$
► For an interval A of $l$ consecutive windows
If $\max\{P_i^{upper} \mid i \in A\} < (FPR / (L/l))^{1/l}$ → duplication
If $\max\{P_i^{lower} \mid i \in A\} < (FPR / (L/l))^{1/l}$ → deletion
FPR = the nominal false positive rate desired for the entire chromosome
L = no. of windows of a chromosome
l = no. of windows in the interval A
► Search first for 2-window events, then increase the event size by 1 window at a time
► Stop at event size N-1, where N is the first size for which $(FPR / (L/N))^{1/N} > 0.5$
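The algorithm above can be sketched with the standard library alone. This is a simplified reading of the slide, not the published implementation: it scans every interval exhaustively, does not cluster or merge overlapping calls, and the function names and the default `fpr=0.05` are assumptions.

```python
from statistics import NormalDist, mean, stdev


def tail_probabilities(rd):
    """Convert per-window RD values to upper- and lower-tail probabilities."""
    mu, sigma = mean(rd), stdev(rd)
    norm = NormalDist()
    z = [(x - mu) / sigma for x in rd]
    upper = [1.0 - norm.cdf(v) for v in z]  # P(Z > Z_i)
    lower = [norm.cdf(v) for v in z]        # P(Z < Z_i)
    return upper, lower


def ewt_threshold(fpr, L, l):
    """Per-window threshold (FPR / (L/l))^(1/l) for an l-window event."""
    return (fpr / (L / l)) ** (1.0 / l)


def max_event_size(fpr, L):
    """Largest event size N-1 before the threshold first exceeds 0.5."""
    n = 2
    while n <= L and ewt_threshold(fpr, L, n) <= 0.5:
        n += 1
    return n - 1


def call_events(rd, fpr=0.05):
    """Return (start, end, kind) for every interval passing the test."""
    upper, lower = tail_probabilities(rd)
    L = len(rd)
    calls = []
    for l in range(2, max_event_size(fpr, L) + 1):
        threshold = ewt_threshold(fpr, L, l)
        for s in range(L - l + 1):
            if max(upper[s:s + l]) < threshold:
                calls.append((s, s + l, "duplication"))
            elif max(lower[s:s + l]) < threshold:
                calls.append((s, s + l, "deletion"))
    return calls
```

Because every window in the interval must pass the threshold, the probability of the whole interval passing by chance is roughly the threshold raised to the power l, which is how the per-window cutoff relates back to the chromosome-wide FPR.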
Event-wise testing - CNV calling results
Call Results - Filtering
► Merging of clusters of small events with copy number change in the same direction
► Filtering out events whose median RD lies between 0.75× and 1.25× the overall mean RD
► Significance testing (one-sided Z-test) of merged events. Significance filtering threshold: $10^{-6}$
► Increased stringency but reduced sensitivity
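The two filters above might look like the sketch below for a single merged event. The slides do not spell out the exact form of the one-sided Z-test, so the version here (event mean against the overall mean, with the standard error taken as the overall standard deviation divided by the square root of the number of windows) is an assumption, as are the function name `keep_event` and its parameters.

```python
from math import sqrt
from statistics import NormalDist, mean, median


def keep_event(event_rd, overall_mean, overall_sd, alpha=1e-6):
    """Return True if a merged event survives both filters.

    event_rd: GC-corrected RD values of the windows in the merged event.
    alpha: significance threshold (10^-6 in the slides).
    """
    ratio = median(event_rd) / overall_mean
    if 0.75 <= ratio <= 1.25:
        return False  # too close to the overall mean RD to be a convincing CNV

    # Assumed form of the one-sided Z-test on the event mean
    z = (mean(event_rd) - overall_mean) / (overall_sd / sqrt(len(event_rd)))
    cdf = NormalDist().cdf(z)
    p = cdf if ratio < 1 else 1.0 - cdf  # lower tail for deletions, upper for duplications
    return p < alpha
```

A strict threshold such as this one trades sensitivity for stringency: short events with modest RD shifts tend to be discarded even when they are real.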
Call Results - Simulations
► To test the FPRs and false negative rates of the EWT calls
► Pairwise comparison of RD among individuals
→ distinguish polymorphic from monomorphic events
T-test
Polymorphic: t-test P-value < 0.001 and absolute difference between median read counts > 0.5
► Simulation of the obtained data sets
Comparison between RD and PEM
                          RD         PEM
Median size               1100 bp    414 bp
Simple repeats            less       more
Segmental duplications    more       less