HIV Forum PacBio Analysis
12 March 2015

FIND MEANING IN COMPLEXITY
PACIFIC BIOSCIENCES® CONFIDENTIAL
© Copyright 2015 by Pacific Biosciences of California, Inc. All rights reserved.
For Research Use Only. Not for use in diagnostic procedures.


Single Molecule, Real-Time (SMRT®) DNA Sequencing

[Figure labels: SMRT® Cell, PacBio® RS II, Zero-Mode Waveguide, Phospholinked Nucleotide, Trace]


PacBio Sample Preparation and Enhancing Accuracy Using CCS

SMRTbell™ template preparation: Key Advantages:
• Structurally linear
• Topologically circular
• Structural homogeneity of templates
• Provides sequences of both forward and reverse strands in the same trace

Reads of Insert (ROI) / Circular Consensus Sequencing (CCS)
• Generates multiple passes on each molecule sequenced
• Result: Highly accurate intra-molecular circular consensus sequence


Examined Datasets

• Examined datasets:
  • Monogram. 28 samples. ~2 kb long region
  • UNC. 4 samples. ~2.8 kb long region
• One PacBio chip per sample
  • P6/C4 v2 conditions
  • 4 h movies
  • Modified calculator settings (3:1 pol:template ratio)
  • Pol Reads/Subreads filtered on ≥ 75% predicted accuracy, ≥ 50 bp length
  • CCS filtered on ≥ 6 passes and ≥ 90% predicted accuracy


Monogram Read Statistics

• Monogram Raw Reads

              Number Reads   Mean Length   N50 Length   Number CCS / ROI
  MGM Mean    62140          16910         31060        25360
  MGM Median  60980          17050         31410        28710


UNC Read Statistics

• UNC Raw Reads

              Number Reads   Mean Length   N50 Length   Number CCS / ROI
  UNC Mean    62090          12890         26830        14300
  UNC Median  61440          12710         26860        14220


Mapped Read Statistics

• Map ROI reads to HXB2 genome for rough mapping statistics
• Monogram:
  • 99.9% map (no contamination)
  • 1848 base target
  • 91% of hits are full length within 2% (truncated PCR products)
  • 24,340 mean coverage, uniform
• UNC:
  • 99.9% map (no contamination)
  • 2891 base target
  • 80% of hits are full length within 2% (truncated PCR products)
  • 13,010 mean coverage, uniform
• At 10k coverage and a 1% minor variant, 95% confident the minor variant is seen 84 or more times


De-novo Analysis of UNC Mixtures

• Design: 4 mixes, 8 subspecies at different frequencies, each subspecies with two AA variants
• Use de-novo LAA analysis, which yields a set of unique sequences estimated to be in the sample.
  o Originally designed for multiple HLA diploid gene sequencing.
  o Relies on reads covering entire genes, so variants are phased by coming from a single molecule in one continuous read
• Take LAA estimates, directly translate to amino acids, place next to the reference amino acid sequence, and call amino acid variants (no alignment)


De-novo Analysis of UNC Mixtures

• Results with standard parameters.
• Every run calls three sequences of length 2890 bases.
• For 11/12, the top three most abundant subspecies were called exactly correct. For one sequence, half of one variant was missed.
• Used default settings designed to minimize false positives in general; tuning is likely to yield more sensitive results.

10 30 46 54 63 71 82 88
L...................D...............M.G.....I........L.......A..........V.I...N.L
GADRQGTVSFSFPQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIG...
GADRQGTVSFSFPQITLWQRPLVTIKIGGQLKEALLDTGADNTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKVIGTVLVGPTPVNIIGRNLLTQIG...
GADRQGTVSFSFPQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKLIGGIGGFIKVRQYDQILIEICGHKVIGTVLVGPTPVNIIGRNLLTQIG...
GADRQGTVSFSFPQITLWQRPIVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFVKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIG...
962 AA


Conclusions and Future Work

• Good PacBio coverage on HIV Forum samples
• Long PacBio reads span regions in their entirety.
• De-novo analysis on mixtures of subspecies
• Works in progress:
  • Parameter tuning for more sensitive de novo analysis
  • Standard variant positions and haplotype estimation.
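
Worked example (coverage vs. minor variant detection)

The Mapped Read Statistics slide states that at 10k coverage a 1% minor variant is seen 84 or more times with 95% confidence. The slides do not say how that figure was derived. The sketch below shows one way it could be reproduced, assuming each read independently carries the minor variant with probability equal to its frequency (a binomial model). The function name min_minor_reads is hypothetical and not part of any PacBio tool.

```python
import math


def min_minor_reads(coverage, minor_freq, confidence=0.95):
    """Largest k such that P(X >= k) >= confidence.

    Assumption (not stated in the slides): the number of reads X that
    carry the minor variant is Binomial(coverage, minor_freq), i.e. reads
    sample the variant independently. Requires 0 < minor_freq < 1.
    """
    log_p = math.log(minor_freq)
    log_q = math.log1p(-minor_freq)
    log_n_fact = math.lgamma(coverage + 1)
    tail_budget = 1.0 - confidence  # probability allowed below k

    below = 0.0  # running P(X < k)
    for k in range(coverage + 1):
        if below > tail_budget:
            # P(X >= k) has dropped under the confidence level,
            # so k - 1 is the last count that still satisfies it.
            return k - 1
        # Binomial pmf at k, computed in log space to avoid overflow.
        log_pmf = (
            log_n_fact
            - math.lgamma(k + 1)
            - math.lgamma(coverage - k + 1)
            + k * log_p
            + (coverage - k) * log_q
        )
        below += math.exp(log_pmf)
    return coverage


if __name__ == "__main__":
    # Inputs taken from the slide: 10k coverage, 1% minor variant.
    print(min_minor_reads(10_000, 0.01))
```

Under this binomial assumption, min_minor_reads(10_000, 0.01) evaluates to 84, which agrees with the figure on the slide. The same function can be called with the reported mean coverages (24,340 for Monogram, 13,010 for UNC) to estimate the corresponding read counts, with the caveat that real coverage is not perfectly uniform and sequencing errors are ignored here.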