Detection of Structural Variants

Structural Variants (SVs)
► Variants that change the landscape of chromosomes
► Copy-number variations (CNVs)
   Microscopic and sub-microscopic deletions, insertions and duplications (usually > 1 kb); unbalanced rearrangements
   Large-scale CNVs: ≥ 50 kb
► Other structural variants
   Indels (insertions and deletions, usually ≤ 1 kb)
   Balanced rearrangements (inversions, translocations)
   Segmental duplications
   Repeats

SVs – Functional Impacts
► Widespread in human genomes
► Contribute to evolution, to genetic diversity between individuals and to genetic diseases
► More significant impact on phenotypic variation than SNPs
► Higher de novo locus-specific mutation rate than SNPs
[Figure: new mutation rate, Zhang et al., Annual Reviews (2009)]

Detecting Methods – Now & Then
Low-throughput
► Microscopy (Lejeune et al., Acad Sci 248 (1959))
   Down's syndrome (trisomy 21)
   Fluorescently labeled probes (FISH)
► Southern blotting
► In situ hybridization (ISH)
► PCR-based methods
High-throughput
► Comparative genome hybridization: array-CGH (R. Redon et al., Nature 444, 444 (2006))
   DNA microarrays; detection of CNVs
   Resolution: low, > 50 kb
► Fosmid paired-end sequencing (FPES) (E. Tuzun et al., Nat. Genet. 37, 727 (2005))
   Low resolution: > 8 kb; laborious
► Next-generation sequencing
   Paired-end mapping (PEM) (Korbel et al., Science 318 (2007)): detection of deletions < 1 kb (< insert size); breakpoints localized to a small region
   Read depth of coverage (Yoon et al., Genome Research 19 (2009)): detection of large SVs and of SVs in complex genomic regions (rich in segmental duplications)

Paired-end mapping (PEM), Korbel et al., Science 318, 420 (2007)
► 3-kb fragments
► Flow chart: paired-end mapping to the reference human genome -> analysis of the span distribution -> determination of cutoffs -> detection of SVs

Signatures used to detect SVs
1. Deletions: paired ends spanning longer regions than a specified cutoff
2. Simple insertions: paired ends spanning shorter regions than a cutoff
3. Mated insertions: sequences connected to a distal locus with paired ends
4. Unmated insertions: sequences connected to a distal locus with only one predicted breakpoint
5. Inversions: paired ends whose orientation differs from the expected one

Detected SVs
► SV size: simple insertions 2–3 kb; others ~3 kb or larger
► Average breakpoint resolution: 644 bp (allowing validation by PCR)

SV validation
► PCR (97%)
► Comparison with the Database of Genomic Variants (DGV) (60%)
► Comparison with an alternative human genome assembly ("Celera assembly") (12–22%)
► Array-CGH (65%)
► Fiber-FISH
► One-pass PCR for breakpoint junctions

CNVs by Read Depth of Coverage, Yoon et al., Genome Research 19 (2009)

Pipeline
► Raw genome sequence data (.fastq files, shotgun sequencing)
► Mapping/alignment to the reference genome (.bam files, MAQ alignment)
► Filtering out reads of low mapping quality (SAMtools)

Estimation of Read Depth (RD)
► RD = read count = number of mapped reads in non-overlapping 100-bp windows
► Each read is counted once, in the window containing its start position
► RD is adjusted for the coverage deviation associated with each GC content, because GC content influences sequence coverage -> GC-corrected RD

Event Detection – Event-wise testing
► GC-corrected RD = quantitative measure of genome copy number
► Deletion = decrease, duplication = increase in coverage (across multiple consecutive windows)
► Event-wise testing (EWT) as the CNV-calling algorithm
   Based on significance testing
   Searches for small, statistically significant events and clusters them into larger events

Event-wise testing – Algorithm
► Conversions
   Z-score: $Z_i = (RD_i - \overline{RD}) / SD$, where $\overline{RD}$ is the mean RD and $SD$ its standard deviation
   Upper-tail probability: $P_i^{upper} = P(Z > Z_i)$
   Lower-tail probability: $P_i^{lower} = P(Z < Z_i)$
► For an interval A of l consecutive windows:
   Duplication if $\max_{i \in A} P_i^{upper} < \left( \frac{FPR}{L/l} \right)^{1/l}$
   Deletion if $\max_{i \in A} P_i^{lower} < \left( \frac{FPR}{L/l} \right)^{1/l}$
   FPR = nominal false positive rate desired for the entire chromosome
   L = number of windows on the chromosome
   l = number of windows in interval A
► Search first for 2-window events, then increase the event size by one window at a time
► Stop at size N - 1, where N is the first size for which $(FPR/(L/N))^{1/N} > 0.5$

Event-wise testing – CNV calling results
[Figure: EWT CNV calling results]

Call Results – Filtering
► Clusters of small events with copy-number change in the same direction are merged
► Events whose median RD lies between 0.75× and 1.25× the overall mean RD are filtered out
► Merged events undergo significance testing (one-sided Z-test) with a significance threshold of $10^{-6}$
► Increased stringency, but reduced sensitivity

Call Results – Simulations
► Simulations test the false positive and false negative rates of the EWT calls
► Pairwise comparison of RD among individuals (t-test) distinguishes polymorphic from monomorphic events
   Polymorphic: t-test P-value < 0.001 and absolute difference between median read counts > 0.5
► Simulation of the obtained data sets

Comparison between RD and PEM
Feature                   RD         PEM
Median size               1100 bp    414 bp
Simple repeats            less       more
Segmental duplications    more       less
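The PEM flow chart and its five signatures can be illustrated with a small Python sketch. This is not Korbel et al.'s pipeline. The names `ReadPair`, `span_cutoffs` and `classify_pair` are hypothetical. The cutoffs are assumed to be median span ± k·MAD, because the slides only say that cutoffs are derived from the span distribution. The expected library orientation is assumed to be forward-reverse ('+' then '-'). Separating mated from unmated insertions requires clustering pairs across loci, which is not shown, so interchromosomal pairs are simply labeled "distal".

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # mapped position of end 1
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str

def span_cutoffs(spans, k=3.0):
    """Hypothetical cutoffs: median span +/- k * MAD of the library's spans."""
    med = median(spans)
    mad = median(abs(s - med) for s in spans)
    return med - k * mad, med + k * mad

def classify_pair(pair, low, high):
    """Assign one read pair to a PEM signature (sketch, assumed '+'/'-' library)."""
    if pair.chrom1 != pair.chrom2:
        return "distal"  # candidate mated/unmated insertion (or translocation)
    # Order the two ends by position so span and orientation are well defined.
    (p1, s1), (p2, s2) = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if s1 == s2:
        return "inversion"  # both ends on the same strand
    if (s1, s2) != ("+", "-"):
        return "other"  # outward-facing pair, not one of the five listed signatures
    span = p2 - p1  # approximates fragment length (read length ignored)
    if span > high:
        return "deletion"  # ends map farther apart than the cutoff
    if span < low:
        return "simple_insertion"  # ends map closer together than the cutoff
    return "concordant"
```

With spans clustered around 3 kb (the PEM library size), a pair mapping 9 kb apart is labeled a deletion and one mapping 1 kb apart a simple insertion. Intrachromosomal pairs that map very far apart also fall into "deletion" here; a fuller implementation would cluster such pairs before calling an event.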
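The read-depth definition (non-overlapping 100-bp windows, each read counted once by its start position) translates directly into a counting loop. This is a minimal sketch. The function name `read_depth` is hypothetical, and the input is assumed to be the 0-based start coordinates of reads that already passed the mapping-quality filter on one chromosome.

```python
def read_depth(read_starts, chrom_length, window=100):
    """Read depth per non-overlapping window; each read counted once by its start."""
    n_windows = (chrom_length + window - 1) // window  # last window may be partial
    counts = [0] * n_windows
    for start in read_starts:  # 0-based start coordinates of mapped reads (assumed)
        if 0 <= start < chrom_length:
            counts[start // window] += 1
    return counts
```

For example, `read_depth([5, 99, 100, 250, 299], 300)` assigns two reads to each of the first and last windows and one to the middle window. Counting by start position means a read overlapping a window boundary is never counted twice.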
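The slides say only that RD is adjusted for the coverage deviation seen at each GC content. The sketch below assumes a median-ratio scaling: each window's count is multiplied by the overall median RD divided by the median RD of all windows with the same GC percentage. `gc_correct` is a hypothetical name, and GC content is assumed to be given per window as an integer percentage.

```python
from collections import defaultdict
from statistics import median

def gc_correct(rd, gc_percent):
    """Assumed median-ratio GC correction of per-window read depth."""
    overall = median(rd)
    by_gc = defaultdict(list)
    for count, gc in zip(rd, gc_percent):
        by_gc[gc].append(count)
    gc_median = {gc: median(counts) for gc, counts in by_gc.items()}
    corrected = []
    for count, gc in zip(rd, gc_percent):
        m_gc = gc_median[gc]
        # Windows whose GC stratum has zero median depth are left uncorrected.
        corrected.append(count * overall / m_gc if m_gc > 0 else float(count))
    return corrected
```

If 40%-GC windows average 10 reads and 60%-GC windows average 20, both strata are rescaled to the overall median of 15. A coverage difference explained only by GC content therefore disappears before event detection.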
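The form of the EWT threshold can be motivated by a short calculation. It assumes that, under the null hypothesis, the per-window tail probabilities are independent and uniform on [0, 1] (an assumption; the slides do not state it). All l windows in A must fall below the bound t, and a chromosome holds roughly L/l disjoint intervals of length l, so setting the expected number of false calls equal to FPR gives:

```latex
\begin{aligned}
P\Big(\max_{i \in A} P_i^{upper} < t\Big) &= t^{\,l}
  && \text{(every window in $A$ below $t$)} \\
\frac{L}{l}\, t^{\,l} &= FPR
  && \text{(about $L/l$ disjoint intervals of length $l$)} \\
t &= \left( \frac{FPR}{L/l} \right)^{1/l}
\end{aligned}
```

For example, with FPR = 0.05 and L = 200 windows, t ≈ 0.022 for l = 2, t ≈ 0.460 for l = 8 and t ≈ 0.508 for l = 9, so the search stops at l = 8. A window at the mean has tail probability 0.5, so once t exceeds 0.5, intervals of unremarkable windows lying slightly on one side of the mean could pass. This is presumably why the search stops there.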
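The EWT algorithm can be sketched end to end in Python: Z-scores, tail probabilities, the per-size threshold and the stopping rule. This is an illustration under stated assumptions, not Yoon et al.'s code. `tail_probabilities`, `ewt_threshold` and `event_wise_test` are hypothetical names. Z-scores are assumed to follow a standard normal distribution, and the standard deviation is the population SD of the chromosome's GC-corrected RD.

```python
from statistics import NormalDist, mean, pstdev

def tail_probabilities(rd):
    """Z-score each window and return its upper- and lower-tail probabilities."""
    mu, sd = mean(rd), pstdev(rd)
    if sd == 0:
        raise ValueError("RD profile is constant; Z-scores are undefined")
    std_normal = NormalDist()  # assumes Z-scores are ~ N(0, 1) under the null
    z = [(x - mu) / sd for x in rd]
    upper = [1.0 - std_normal.cdf(zi) for zi in z]  # P(Z > Z_i)
    lower = [std_normal.cdf(zi) for zi in z]        # P(Z < Z_i)
    return upper, lower

def ewt_threshold(fpr, n_windows, l):
    """Per-window tail-probability bound {FPR / (L / l)}^(1/l)."""
    return (fpr / (n_windows / l)) ** (1.0 / l)

def event_wise_test(rd, fpr=0.05):
    """Scan intervals of l = 2, 3, ... windows; return (type, start, end) calls."""
    upper, lower = tail_probabilities(rd)
    n = len(rd)  # L in the slides
    events = []
    l = 2
    # Stop at N - 1, where N is the first size whose threshold exceeds 0.5.
    while l <= n and ewt_threshold(fpr, n, l) <= 0.5:
        t = ewt_threshold(fpr, n, l)
        for start in range(n - l + 1):
            end = start + l  # half-open window range [start, end)
            if max(upper[start:end]) < t:
                events.append(("duplication", start, end))
            elif max(lower[start:end]) < t:
                events.append(("deletion", start, end))
        l += 1
    return events
```

On a noisy baseline with a five-window block of elevated depth, the block is reported as a duplication, along with the shorter sub-intervals inside it. Those overlapping small calls are exactly what the subsequent merging step clusters into larger events.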
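The first two filtering steps, merging same-direction events and dropping events whose median RD stays within 0.75×–1.25× of the overall mean, can be sketched as follows. `merge_events` and `filter_by_median` are hypothetical names. Overlapping or abutting half-open intervals are assumed to count as one cluster. The one-sided Z-test with threshold $10^{-6}$ is not sketched, because the slides do not specify its test statistic.

```python
from statistics import median

def merge_events(events):
    """Merge overlapping or abutting (type, start, end) events of the same type."""
    merged = []
    # Sorting by type, then start, puts each type's events in positional order.
    for kind, start, end in sorted(events, key=lambda e: (e[0], e[1])):
        if merged and merged[-1][0] == kind and start <= merged[-1][2]:
            prev_kind, prev_start, prev_end = merged[-1]
            merged[-1] = (kind, prev_start, max(prev_end, end))
        else:
            merged.append((kind, start, end))
    return merged

def filter_by_median(events, rd, low=0.75, high=1.25):
    """Drop events whose median RD lies within [low, high] x the overall mean RD."""
    mean_rd = sum(rd) / len(rd)
    kept = []
    for kind, start, end in events:
        event_median = median(rd[start:end])
        if low * mean_rd <= event_median <= high * mean_rd:
            continue  # near-normal depth: unlikely to be a real copy-number change
        kept.append((kind, start, end))
    return kept
```

Merging first and filtering second mirrors the order on the slide. Small significant fragments are combined before the median test, so a real CNV is judged over its full extent rather than over a single fragment.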