Download Reference - Human Microbiome Journal Club

Next-Generation Sequencing of Microbial Genomes and Metagenomes
Christine King
Farncombe Metagenomics Facility
Human Microbiome Journal Club
July 13, 2012

Overview
- Next-generation sequencing
  - Applications
  - Instruments
  - Library prep and sequencing chemistry
  - Sequence quality
- Project overview
  - Microbial genomes
  - Microbial communities

DNA Sequencing
- 1st generation
  - Sanger chain termination
  - Capillary electrophoresis
- 2nd generation (NGS)
  - High throughput, "massively parallel"
  - Shorter reads
  - Sequencing-by-synthesis
- 3rd generation
  - Single molecule
  - Nanopores

Applications
- DNA sequencing
  - De novo genomes
  - Resequencing
    - Shotgun (e.g. mutant strains)
    - Amplicon (e.g. HLA, cancer)
    - Sequence capture (e.g. exome)
  - Metagenome
    - Amplicon (e.g. 16S, COI, viral)
    - Shotgun
  - ChIP
- RNA sequencing
  - Gene expression
  - Gene annotation, splice variants
  - Metatranscriptome

Instruments

Instrument      # of reads  Read length (bp)  Total output (Gb)  Cost per base  Run time  Technology
GS FLX          1M          450               0.5                $$$$           ++        emPCR, SBS, light detection
GS FLX+         1M          650               0.6                $$$$           ++        emPCR, SBS, light detection
GS Jr           100K        450               0.05               $$$$           ++        emPCR, SBS, light detection
GAIIx           640M        2 x 150           90                 $$             +++       Bridge PCR, SBS, fluorophore
HiSeq 2000      6B          2 x 100           600                $              +++       Bridge PCR, SBS, fluorophore
MiSeq           12M         2 x 150           2                  $$             ++        Bridge PCR, SBS, fluorophore
PacBio RS       >10K        >1000             0.01               $$$$           +         Single-molecule seq, fluorophore
SOLiD 5500xl    1.4B        75 + 35           155                $              +++       emPCR, probe ligation, fluorophore
Ion PGM - 316   1M          >100              0.1                $$$            +         emPCR, SBS, pH change
Ion PGM - 318   6M          >100              1                  $$             +         emPCR, SBS, pH change

Which instrument(s) to use?
- Read length vs number of reads
- Cost per base, per sample, per project (multiplexing?)
- Accuracy
- Run time, wait time

Application       Length  # Reads  Accuracy  Instruments           Considerations
De novo (small)   +++     ++       ++        MiSeq, 454, Ion       Mix lengths
De novo (large)   +++     +++      ++        HiSeq, 454, SOLiD     Mix lengths, MP
Re-seq (small)    ++      ++       ++        MiSeq, Ion            Multiplex?
Re-seq (large)    ++      +++      ++        HiSeq, SOLiD          Enrichment?
RNA-seq (count)   +       +++      +         Illumina, SOLiD, Ion  Ref? Size? Rare?
Amplicons         +++     +        +++       454, MiSeq            Size? Multiplex?
Metagenomics      ++      +++      +++       Illumina, 454, SOLiD  Length vs depth

Library Preparation
- Goal: fragments of DNA, each end flanked by adapter sequences
- Adapters contain amplification and sequencing primer binding sites; they are platform- and chemistry-specific
- Optional: sample-specific barcodes/indexes/MIDs/tags allow multiplexing during sequencing
- Library QC: quantity, size

Library Preparation
- Library types:
  - Shotgun (DNA)
    - May begin with ChIP
    - May follow with sequence capture
  - Mate pair (DNA)
  - Amplicon (DNA)
  - Total RNA
    - May enrich for mRNA (poly-A enrichment, rRNA depletion)
    - Convert to cDNA (then similar to DNA protocols)
  - Small RNA
    - RNA ligations, convert to cDNA after

Library Preparation: Shotgun
- Fragmentation
  - Sonication
  - Nebulization
  - Enzymatic
- End repair
  - 3' overhangs digested
  - 5' overhangs filled
  - 5' phosphate added

Library Preparation: Shotgun
- Adapter ligation
  - T-overhangs
  - Forked structure controls orientation
- Library amplification
  - Few cycles
  - Enrich for correctly adapted fragments
  - Required to complete adapter structure in some protocols
- Size selection
  - Gel excision, AMPure beads
  - Limit insert size as needed, remove artifacts

Library Preparation: Amplicon
- Amplify region of interest using PCR
- Primers contain adapter sequences

Library Preparation: Mate Pair
- Begin with large fragments (e.g. 3 kb, 20 kb)
- Circularize and fragment again
  - Illumina: direct ligation
  - 454: Cre/Lox recombination
- Enrich for fragments containing the junction
- Proceed with shotgun library prep

Library Preparation: Mate Pair
- Why? Paired sequences are a known distance apart; improves genome assembly
- Note: 454 calls these "paired end libraries", not to be confused with Illumina's "paired end sequencing"!
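
The role of sample-specific barcodes described under Library Preparation can be illustrated with a minimal demultiplexing sketch. It assumes the barcode is read as the first bases of each read (an inline barcode, as with barcoded amplicon primers) and that only exact matches are accepted. The barcode sequences, sample names, and reads are hypothetical, and this is not the procedure used by any particular instrument software.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample map; real barcode sequences come from
# the library prep kit or the primer design, not from this sketch.
BARCODES = {
    "ACGAGT": "sample_A",
    "TCTCTA": "sample_B",
}
BARCODE_LEN = 6


def demultiplex(reads, barcodes=BARCODES, barcode_len=BARCODE_LEN):
    """Group reads by their leading barcode and strip the barcode.

    Reads whose first `barcode_len` bases match no barcode exactly are
    kept, untrimmed, under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(tag)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            bins[sample].append(insert)
    return dict(bins)


# Made-up reads: barcode followed by a short insert.
reads = [
    "ACGAGTTTGACCA",
    "TCTCTAGGCATTC",
    "ACGAGTCCATGGA",
    "NNNNNNACGTACG",
]
result = demultiplex(reads)
for sample, seqs in sorted(result.items()):
    print(sample, seqs)
```

Exact matching is the simplest choice; tolerating one mismatch would recover more reads at the risk of misassignment if barcodes are too similar to each other.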

Sequencing: Illumina
- Cluster generation
  - Library fragments hybridize to oligos on the flow cell
  - New strand synthesized; original denatured and removed
  - Free end binds to an adjacent oligo (bridge formation)
  - Complementary strand synthesized, then denatured (both strands tethered to the flow cell)
  - Repeat to form a clonal cluster
  - Cleave one oligo, denature to leave ssDNA clusters
  - ~800K clusters/mm^2

Sequencing: Illumina
- Variety of workflows:
  - Single- or paired-end reads
  - 0, 1, or 2 index reads

Sequencing: Illumina
- At each cycle, all 4 fluorescently labeled nucleotides pass over the flow cell
- Each cluster incorporates one nt (terminator) per cycle
- Fluor is imaged, then cleaved
- De-block and repeat

Sequencing: Illumina
- Other terminology:
  - cBot: accessory instrument that performs cluster generation
  - Lanes: divisions (8) of HiSeq and GAIIx flow cells
  - PhiX: bacteriophage with a small, balanced genome; PhiX library is spiked in with samples for QC
  - Phasing/pre-phasing: nt incorporation falls behind or jumps ahead on a portion of strands in the cluster, which contributes to noise
  - Chastity filter: measures signal purity (after intensity corrections); if the background signal is high, the cluster is discarded
  - BaseSpace: cloud computing site for processing MiSeq data
  - File format: fastq

Sequencing: 454
- emPCR: clonal amplification of bead-bound library in microdroplets
- Library input amounts critical!
  - One molecule per bead
  - Titration procedure

Sequencing: 454
- Library capture: beads coated with complementary oligo
- Amplification: droplet contains PCR reagents and the other oligo
- Post-PCR: millions of identical fragments attached to the bead

Sequencing: 454
- Bead recovery: physical and chemical disruption
- Enrichment: capture successfully amplified beads using biotinylated primers + magnetic streptavidin beads

Sequencing: 454
- Deposit bead layers onto PicoTiterPlate:
  - Enzyme beads
  - Enriched DNA beads
  - More enzyme beads
  - PPiase beads

Sequencing: 454
- Pyrosequencing
  - 4 nucleotides flow separately
  - nt incorporation → PPi → light
    - APS + PPi → ATP (sulfurylase)
    - Luciferin + ATP → light + oxyluciferin (luciferase)
  - Amount of light proportional to number of nt incorporated
  - Rinse and repeat with next nt

Sequencing: 454
- Camera captures light emitted from every well during every nucleotide flow

Sequencing: 454
- Flowgram: representation of a sequence, based on the pattern of light emitted from a single well

Sequencing: 454
- Other terminology:
  - Lib-L/Lib-A: adapter variants, "ligated" or "annealed"
  - Titanium chemistry: ~450 bp reads on all instruments
  - XL+ chemistry: ~700 bp reads on the FLX+ instrument
  - Flow: one of the four nucleotides flows over the PTP
  - Cycle: a set of four flows, in order
  - Valley flow: the number of bases incorporated in a given read during that flow is uncertain, e.g. 1.5 units of light (background signal, homopolymers)
  - File format: sff (standard flowgram format)

Sequencing: Ion Torrent
- Procedures and chemistry similar to 454
- Instead of PPi, measure H+ release (pH change) via semiconductor chip
- No expensive camera or laser required, no modified nucleotides

Sequence Quality

Phred (Q) score  Probability of error (P)  Base call accuracy
10               1 in 10                   90%
20               1 in 100                  99%
30               1 in 1K                   99.9%
40               1 in 10K                  99.99%
50               1 in 100K                 99.999%

- Error probabilities determined using training sets, platform-specific biases
- Expressed as a quality value (QV or Q score) per base
- Similar to PHRED scores:
  - Q = -10 log10(P)
  - P = 10^(-Q/10)

Project 1: Microbial Genome
- Considerations:
  - Reference genome?
  - How much coverage do I want?
  - How big is the genome?
  - How much data do I need?
    - bp needed = genome size x coverage
  - Which instrument/chemistry configuration to use?
- Coverage
  - Depth: number of times a particular base is "covered" by a read (e.g. 25X)
  - Breadth: % of genome with at least 1X coverage

Project 1: Microbial Genome
- Sample preparation
  - Isolate high-quality (not degraded) and high-purity (no RNA) gDNA
  - Verify on a gel
  - Quantify using dsDNA-specific dye
- Library preparation
  - Can do this yourself if you like
  - ~$200 per sample for Nextera
    - Cheaper protocols
    - Cheaper in bulk
  - Barcode compatibility

Project 1: Microbial Genome
- Library QC
  - Insert size confirmed on BioAnalyzer (within range, no artifacts)
  - Pool barcoded libraries (normalize based on PicoGreen quantification)
  - Absolute quantification of library pools using qPCR

Project 1: Microbial Genome
- MiSeq sequencing
  - Dilute and denature library pool (optimal concentration requires titration...)
  - Spike in PhiX library as needed (e.g. 1%)
  - Prepare and load reagents, flow cell
  - Basic filtering and de-multiplexing performed automatically
  - Download fastq files from BaseSpace

Project 1: Microbial Genome
- Data processing
  - Additional filtering
  - Trim the ends
  - Remove PCR duplicates
  - Assembly: overlapping reads are assembled to each other based on sequence similarity = contigs

Project 1: Microbial Genome
- What's next?
  - Polish the genome (hybrid assemblies, mate pair libraries)
  - Annotate (ORFs, RNAseq)
  - Compare

Project 2: Microbial Community
- Shotgun metagenomics
  - Unbiased survey of community content
  - Random library fragments may provide very little taxonomic resolution (e.g. conserved, unknown)
  - Identify genes, classify by function
- Targeted metagenomics
  - Limited survey of community content
  - Targeted loci provide excellent taxonomic resolution, but may exclude certain taxa
  - Identify OTUs, classify by taxonomy

Project 2: Microbial Community
- 16S rRNA
  - Multi-copy gene (1.5 kb)
  - Conserved and hypervariable regions
  - Extensive databases from known species

Project 2: Microbial Community
- Considerations:
  - Biases in sampling methods, culturing, DNA isolation, PCR... replicate
  - Available SOPs
  - How many reads per sample?
  - Read length matters!
- Sample preparation:
  - Isolate DNA
  - PCR amplify, purify
    - High-fidelity polymerase
    - Barcoded primers
    - No primer dimers!
  - Normalize PCR products and pool

Project 2: Microbial Community
- 454 sequencing
  - emPCR titrations with different library input
  - Bulk emPCR
  - Sequence
  - Basic filtering
  - Collect sff files
- Data processing
  - De-multiplexing
  - Additional filtering
  - Trim the barcodes, primers
  - Check for chimeras

Project 2: Microbial Community
- Clustering
  - Sequences grouped by similarity = OTUs

Project 2: Microbial Community
- Taxonomic identification
  - OTUs are classified by comparing to known 16S sequences
  - Level of classification (e.g. family vs genus)?
- Diversity
  - Within sample
  - Between samples
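
The slides do not name specific diversity metrics. As one possible illustration, the sketch below computes the Shannon index for within-sample diversity and the Bray-Curtis dissimilarity for between-sample diversity from per-sample OTU counts. The choice of these two metrics, the sample names, and the counts are assumptions made for the example, not part of the original presentation.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors.

    Both vectors must list the same OTUs in the same order.
    0 means identical composition, 1 means no shared OTUs.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Hypothetical OTU counts; positions correspond to the same four OTUs
# in each sample.
otu_table = {
    "sample_A": [50, 30, 20, 0],
    "sample_B": [25, 25, 25, 25],
}

for name, counts in otu_table.items():
    print(name, "Shannon:", round(shannon(counts), 3))

print(
    "Bray-Curtis A vs B:",
    round(bray_curtis(otu_table["sample_A"], otu_table["sample_B"]), 3),
)
```

Count-based metrics like these are sensitive to unequal read numbers per sample, so in practice samples are usually brought to a comparable depth (or converted to relative abundances) before comparison.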
