HIV Forum PacBio Analysis - Forum for Collaborative HIV Research

HIV Forum PacBio Analysis
12 March 2015
FIND MEANING IN COMPLEXITY
PACIFIC BIOSCIENCES® CONFIDENTIAL
© Copyright 2015 by Pacific Biosciences of California, Inc. All rights reserved.
For Research Use Only. Not for use in diagnostic procedures.
Single Molecule, Real-Time (SMRT®) DNA Sequencing
SMRT® Cell
PacBio® RS II
Zero-Mode Waveguide
Phospholinked Nucleotide
Trace
PacBio Sample Preparation and Enhancing Accuracy Using CCS
SMRTbell™ template preparation:
Key Advantages:
• Structurally linear
• Topologically circular
• Structural homogeneity of templates
• Provides sequences of both forward and reverse strands in the same trace
Reads of Insert (ROI) / Circular Consensus Sequencing (CCS)
Generates multiple passes on each molecule sequenced
Result: Highly accurate intra-molecular circular consensus sequence
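To illustrate why multiple passes over one molecule improve accuracy, here is a minimal sketch of a per-position majority vote. It is a deliberate simplification and not the CCS algorithm itself: it assumes the passes are already aligned to equal length and contain only substitution errors. The function name and the toy passes are hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Per-column majority vote over equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to equal length")
    consensus = []
    for column in zip(*passes):
        # Take the most frequent base in this column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy data: five passes of one molecule, three of them with one error each.
toy_passes = ["ACGTTA", "ACGATA", "ACGTTA", "TCGTTA", "ACGTTC"]
print(majority_consensus(toy_passes))  # ACGTTA
```

Because the errors in the toy passes fall at different positions, each one is outvoted by the other passes.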
Examined Datasets
• Examined Datasets:
  • Monogram. 28 samples. ~2kb long region
  • UNC. 4 samples. ~2.8kb long region
• One PacBio Chip Per Sample
  • P6/C4 v2 conditions
  • 4h movies
  • Modified calculator settings (3:1 pol:template ratio)
  • Pol Reads/Subreads filtered on ≥ 75% predicted accuracy, ≥ 50bp length
  • CCS filtered on ≥ 6 passes and ≥ 90% predicted accuracy
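The CCS filter above can be sketched as a simple predicate. This is only an illustration of the two thresholds. The record type and its field names (`num_passes`, `predicted_accuracy`) are hypothetical and do not correspond to any particular PacBio file format or tool.

```python
from dataclasses import dataclass


@dataclass
class CcsRead:
    """Hypothetical per-read summary; field names are assumptions."""
    name: str
    num_passes: int
    predicted_accuracy: float  # fraction between 0 and 1


def passes_ccs_filter(read, min_passes=6, min_accuracy=0.90):
    """Keep reads with >= 6 passes and >= 90% predicted accuracy."""
    return read.num_passes >= min_passes and read.predicted_accuracy >= min_accuracy


# Toy reads: only the first satisfies both thresholds.
reads = [
    CcsRead("read_a", 8, 0.97),
    CcsRead("read_b", 5, 0.99),   # too few passes
    CcsRead("read_c", 10, 0.85),  # predicted accuracy too low
]
kept = [r.name for r in reads if passes_ccs_filter(r)]
print(kept)  # ['read_a']
```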
Monogram Read Statistics
• Monogram Raw Reads
              Number Reads   Mean Length   N50 Length   Number CCS / ROI
MGM Mean      62140          16910         31060        25360
MGM Median    60980          17050         31410        28710
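For readers unfamiliar with the N50 column, the sketch below computes it under the usual definition: the length L such that reads of length L or longer contain at least half of all sequenced bases. The function name and the toy lengths are illustrative, not values from these runs.

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one read length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # 8
```

Because N50 weights each read by its length, it is normally larger than the mean read length, which matches the pattern in the table.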
UNC Read Statistics
• UNC Raw Reads
              Number Reads   Mean Length   N50 Length   Number CCS / ROI
UNC Mean      62090          12890         26830        14300
UNC Median    61440          12710         26860        14220
Mapped Read Statistics
• Map ROI reads to the HXB2 genome for rough mapping statistics
• Monogram:
  • 99.9% of reads map (no contamination)
  • 1848-base target
  • 91% of hits are full length to within 2% (shorter hits attributed to truncated PCR products)
  • 24,340 mean coverage, uniform
• UNC:
  • 99.9% of reads map (no contamination)
  • 2891-base target
  • 80% of hits are full length to within 2% (shorter hits attributed to truncated PCR products)
  • 13,010 mean coverage, uniform
• At 10k coverage and a 1% minor variant, we are 95% confident that the minor variant is seen 84 or more times
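The last bullet can be checked with a binomial model. This is an assumption, since the slides do not state the model used: each of the 10,000 reads is treated as independently carrying the minor variant with probability 0.01. The sketch finds the largest count k for which P(X ≥ k) is still at least 95%. The function name is illustrative.

```python
def min_count_at_confidence(coverage, minor_fraction, confidence=0.95):
    """Largest k with P(X >= k) >= confidence, X ~ Binomial(coverage, minor_fraction)."""
    if not 0.0 < minor_fraction < 1.0:
        raise ValueError("minor_fraction must be strictly between 0 and 1")
    tail_budget = 1.0 - confidence
    odds = minor_fraction / (1.0 - minor_fraction)
    pmf = (1.0 - minor_fraction) ** coverage  # P(X = 0); may underflow for huge coverage
    cdf = 0.0
    for k in range(coverage + 1):
        cdf += pmf  # cdf is now P(X <= k)
        if cdf > tail_budget:
            # P(X <= k - 1) <= tail_budget, so P(X >= k) >= confidence.
            return k
        pmf *= (coverage - k) / (k + 1) * odds  # advance to P(X = k + 1)
    return coverage


print(min_count_at_confidence(10_000, 0.01))
```

Under these assumptions the result should reproduce the figure of 84 quoted above. A normal approximation gives much the same answer: mean 100, standard deviation about 9.95, so 100 − 1.645 × 9.95 ≈ 83.6.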
De-novo Analysis of UNC Mixtures
• Design: 4 mixes, 8 subspecies at different frequencies, each subspecies with two AA variants
• Use de novo LAA (Long Amplicon Analysis), which yields the set of unique sequences estimated to be in the sample.
  o Originally designed for multiple HLA diploid gene sequencing.
  o Relies on reads covering entire genes, so variants are phased because they come from a single molecule in one continuous read
• Take LAA estimates, directly translate to amino acids, place next to reference amino acid sequence, and call amino acid variants (no alignment)
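The translate-and-compare step in the last bullet can be sketched as follows. This is an illustration and not the code used for the analysis: it assumes the sequence starts in frame, uses the standard genetic code, and compares position by position with no alignment, so any insertion or deletion would shift every downstream call. Function names and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def call_aa_variants(reference_aa, query_aa):
    """Position-wise comparison (1-based), with no alignment step."""
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference_aa, query_aa), start=1)
        if ref != alt
    ]


# Toy example: one codon change, GCC (A) -> ACC (T), at amino acid position 2.
reference_aa = translate("ATGGCCAAA")  # MAK
query_aa = translate("ATGACCAAA")      # MTK
print(call_aa_variants(reference_aa, query_aa))  # ['A2T']
```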
De-novo Analysis of UNC Mixtures
• Results with standard parameters.
• Every run calls three sequences of length 2890 bases.
• For 11 of the 12 called sequences, the three most abundant subspecies were called exactly correctly. For one sequence, half of one variant was missed.
• Used default settings designed to minimize false positives in general; tuning is likely to yield more sensitive results.
Position labels: 10, 30, 46, 54, 63, 71, 82, 88
L...................D...............M.G.....I........L.......A..........V.I...N.L
GADRQGTVSFSFPQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIG...
GADRQGTVSFSFPQITLWQRPLVTIKIGGQLKEALLDTGADNTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKVIGTVLVGPTPVNIIGRNLLTQIG...
GADRQGTVSFSFPQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKLIGGIGGFIKVRQYDQILIEICGHKVIGTVLVGPTPVNIIGRNLLTQIG... 962 AA
GADRQGTVSFSFPQITLWQRPIVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFVKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIG...
Conclusions and Future Work
• Good PacBio coverage of the HIV Forum samples
• Long PacBio reads span the regions in their entirety.
• De novo analysis works on mixtures of subspecies
• Works in progress:
  • Parameter tuning for more sensitive de novo analysis
  • Standard variant positions and haplotype estimation.