Sequencing technology
› Roche/454 GS-FLX (‘454’)
› Illumina

Prokaryotic profiling
› De novo genome sequencing
› Metagenomics
› SNP profiling
› Species quantification

Viral profiling
› De novo genome sequencing

Classic chain-terminator sequencing

Dye chain-terminator sequencing

Next-generation sequencing

Next-gen sequencing principle
› Massively parallel
› Add A, C, T and G nucleotides
› Catch a signal

Roche/454 GS-FLX+ (‘454’)
› Pyrosequencing: problems with homopolymers (e.g. AAAAAA)
› Long-read sequencing: 500-1000 bp
› Variable sequencing length
› 1 million reads/run, 1 Gb/run
› Sequencing speed: ~1 day/run
› Next-next generation: Ion Torrent PGM/Proton

Illumina
› Sequencing by synthesis
› Short-read sequencing: 36, 72, …, 150 bp
› Fixed sequencing length
› 1 billion reads/run, 100 Gb/run (= 33 x human genome!)
› Sequencing speed: 3-10 days, depending on read length

SOLiD
› Short-read sequencing (similar to Illumina)

454 Illumina
Price per run: $10,000/run
Price per machine: $200,000-$500,000
› Supporting IT hardware
› Peripheral devices such as fragmentation instrument, PCR equipment …
› Negotiating power …

Use service centers!
› NXTGNT (BE), GATC (EU), BaseClear (NL), BGI …
› No overhead cost, no maintenance etc.
› Cheaper

Next-generation sequencing has become 2nd generation sequencing
Next-next-generation sequencing is almost there: 3rd generation sequencing
› Helicos: True Single Molecule Sequencing
› Ion Torrent/Life: cheap and fast
› Nanopore: unlimited read size
› …

The evolution of sequencing technology goes hand in hand with the evolution of
› IT infrastructure/hardware
› Analysis software

Hardware
› 1 Illumina run ~ 100 Gb text file ~ 5-million-page book
› Processing power and storage are an issue!
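
The throughput figures quoted above can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the 100 bp Illumina read length, the 1000 bp 454 read length and the ~3 Gb human genome size are assumptions chosen to match the slide totals, and `run_yield_bp` is a hypothetical helper name.

```python
# Figures quoted on the slides; read lengths and genome size are assumptions.
HUMAN_GENOME_BP = 3e9  # assumed haploid human genome size (~3 Gb)


def run_yield_bp(reads_per_run, read_length_bp):
    """Total number of bases produced by one sequencing run."""
    return reads_per_run * read_length_bp


illumina_bp = run_yield_bp(1e9, 100)   # 1 billion reads, 100 bp assumed
roche454_bp = run_yield_bp(1e6, 1000)  # 1 million reads, 1000 bp (upper end)

# How many human genomes fit in one Illumina run?
human_genome_equivalents = illumina_bp / HUMAN_GENOME_BP

# Characters per page implied by the "5-million-page book" comparison.
chars_per_page = illumina_bp / 5e6

print(f"Illumina: {illumina_bp / 1e9:.0f} Gb/run")
print(f"454: {roche454_bp / 1e9:.0f} Gb/run")
print(f"Human genome equivalents: {human_genome_equivalents:.0f}")
print(f"Characters per page: {chars_per_page:.0f}")
```

Under these assumptions one Illumina run yields about a hundred times more bases than a 454 run, which is why storage and processing power become the bottleneck.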
Software
› Mapping to a human genome: ‘a couple of hours’

Prokaryotic genomics 101
› Prokaryotes = bacteria + archaea
› Prokaryotic genomes
  Large circular genome (0.5-10 Mb): the ‘chromosome’
  Small plasmids (1-1000 kb) (virulence factors, antibiotic resistance …)
  (Almost) no introns
  Easy ORF annotation

1953: Watson and Crick discover the DNA double helix
1977: First complete genome: bacteriophage φX174
1995: First genome of a free-living organism: H. influenzae
2001: First draft of the human genome
2006: >200 complete bacterial genomes
2012: Countless bacterial genomes have been sequenced using next-gen sequencing

Complete bacterial genomes used to be
› Expensive
› Difficult to obtain
› ‘Nature’ or ‘Science’ work
› Remained complex until the invention of next-generation sequencing

Using next-generation sequencing, de novo sequencing has become
› Relatively easy
› Relatively cheap
› Routine research

Already >10 complete bacterial genomes published in 2012
› More than just an assembly!

Practical
1. Get some DNA from an isolated species of interest
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Assemble (1 h)
   Pure de novo assembly
   Guided assembly
5. Annotate the genome (days-weeks)

Assembly: multiple ‘short’ reads → 1 long sequence
Existing software
› Velvet
› SSAKE
› Newbler
› …
Source: Nature 2009, MacLean et al.
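
To make the idea of assembly (multiple short reads merged into one long sequence) concrete, here is a minimal sketch of a greedy overlap merge. It is a toy illustration only, not the algorithm of any of the packages listed above: it assumes error-free reads from a single strand, the function names `overlap` and `greedy_assemble` are hypothetical, and the reads are made up for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up reads covering an 18 bp toy "genome", given out of order.
reads = ["ATCCGATT", "ATGGCGTG", "GTGCAATC"]
print(greedy_assemble(reads))
```

Real data adds sequencing errors, repeats and reads from both strands, which is why dedicated assemblers use more elaborate graph-based methods.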
Relatively cheap
› Sequencing cost: depends on coverage
  Illumina, 30x, 5 Mb genome: $10-$100
  454, 30x, 5 Mb genome: $1000-$5000
› Equipment
  IT infrastructure, sequencing equipment, people …

Relatively easy
› Need for IT support
› No out-of-the-box standard solution for everything
› Several different software packages for assembly

De novo genome assembly
› Study of 1 single species
› Need for species isolation

Metagenomics analysis
› Study of a community of species
› No need for isolation (culturing bias!)
› Study the collective gene pool and function of the community/ecology
› No need for individual functions

Practical
1. Get bacterial DNA or RNA from a sample
   Soil
   Gut/fecal
   Ocean water (e.g. Craig Venter)
   …
2. Sequence: long or short reads (1-10 days)
3. Obtain your sequences
4. Map on a database of known genes (1 day)
5. Annotate/analyse the community (weeks)

2010: Giant panda genome (2nd carnivore)
› No umami taste receptor -> no meat affinity
› The panda is more a dog than a bear
› The panda is a carnivore eating bamboo!

Still 2010!: Panda ‘microbiome’
Gut microbiome of the panda reveals the presence of bamboo/cellulose-degrading pathways

A clinical example: gut microbiome can predict diabetes and malnourishment
PLoS One (2011), Brown et al.
PLoS One (2010), Valladares et al.
Gut Pathology (2011), Gupta et al.

Classical SNP analysis - practical
1. Design PCR primers
2. Generate amplicons
3. Re-sequence using long-read sequencing
   Conserve ‘SNP blocks’
4. Detect SNPs
5. Correlate SNPs to drug resistance, severity of symptoms …

Amplicon resequencing is the same for human, prokaryotic and viral analyses
Many standardized out-of-the-box solutions available
Very simple analysis

Watch out for the overkill…
› Don’t use a bazooka to kill a fly!
› Throughput can be too high

Profile the coding region of hepatitis C
Lauck et al. 2012

Use next-generation sequencing to predict the optimal HIV therapy
Thielen et al. 2012

Imagine the following research questions
› Which (known) species/groups are present in a certain sample?
› Does this composition change with a certain treatment, change of conditions, patients etc.?

No need for de novo genome sequencing
No metagenomics: species instead of functions

Prokaryotes have the 16S rDNA gene, coding for ribosomal RNA
The 16S rDNA region is 1.5 kb long
16S rDNA is specific for each species/strain
Theoretical: $4^{1500} \approx 10^{903}$ possibilities
In practice: 16S rDNA sequence known for millions of species

16S rDNA can be isolated in different species using universal PCR primers
› Isolate/amplify different regions using the same primers
Compare the isolated sequences against a database of known sequences

Practical procedure
1. Sample an environment and isolate DNA
2. Do a universal PCR amplification
3. Sequence using long-read sequencing: the longer the better!
4. Obtain sequences
5. Map sequences against a reference database
6. Annotate the data

Example: The Antarctica project
› Which parameters determine the composition of bacterial communities in Antarctic lakes?
› 20 different samples/lakes
› Sequence 16S rDNA genes
› 1 x 454 run (1 million 500 bp sequences)
› Map all sequences back to the RDP database

Analyse the data using computing power
› Compare different locations
  Is species A present in location 1, location 2, …?
› Assess the distribution in a single location
  How dominant is the most dominant species in location 1?
  How many species are in location 1?
  …

Visualize!
Analyse different samples on different taxonomic levels
› Include the taxonomic tree of life of bacteria
› Use a ‘taxonomy browser’

Analyse a single location

Compare different locations

Analysis                 Lab work difficulty       Analysis difficulty
De novo genome           ++ (isolate)              +
Metagenomics             +                         +++ (pathways etc.)
SNP                      +++ (design primers)      ++ (correlate)
Species quantification   ++ (universal primers)    ++

Viral profiling
› Viral profiling = prokaryotic profiling, but…
  Cheaper
  Faster
  Easier
› De novo genome sequencing = OK
› Don’t spend $10,000 on a 100 kb genome!
› Multiplexing/pooling capacity is limited!

Watch out for the overkill
› An Illumina run can be split into 8 lanes
› >20 samples per lane can be combined
  Still >100 Mb per sample…

Thanks for your attention!
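
Backup: the species quantification questions raised earlier (is a species present in several locations, how dominant is the most dominant species, how many species are in one location) can be sketched in a few lines. This is an illustrative sketch only: the `(location, taxon)` pairs are invented placeholders standing in for reads already mapped to a reference database, and `profile` and `summarize` are hypothetical helper names.

```python
from collections import Counter

# Placeholder data: one (location, taxon) pair per mapped 16S read.
assignments = [
    ("lake1", "taxon_A"), ("lake1", "taxon_A"), ("lake1", "taxon_B"),
    ("lake2", "taxon_B"), ("lake2", "taxon_C"),
    ("lake2", "taxon_B"), ("lake2", "taxon_B"),
]


def profile(assignments):
    """Count reads per taxon for each location."""
    counts = {}
    for location, taxon in assignments:
        counts.setdefault(location, Counter())[taxon] += 1
    return counts


def summarize(taxon_counts):
    """Number of taxa and relative abundance of the most dominant taxon."""
    total = sum(taxon_counts.values())
    top_taxon, top_reads = taxon_counts.most_common(1)[0]
    return {
        "richness": len(taxon_counts),
        "dominant": top_taxon,
        "dominance": top_reads / total,
    }


profiles = profile(assignments)

# Compare locations: which taxa are found in every location?
shared = set.intersection(*(set(c) for c in profiles.values()))

for location, taxon_counts in profiles.items():
    print(location, summarize(taxon_counts))
print("Present in all locations:", sorted(shared))
```

Read counts are used here as a simple proxy for abundance; repeating the same counting per taxonomic level would give the multi-level view mentioned on the visualization slide.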