Gene-Boosted Assembly of a Novel Bacterial Genome from

Harry Presman
Overview

Motivation

Assembly

Results

Advantages/Limitations
Motivation

Next-gen sequencers produce short read lengths

Useful for polymorphism discovery

Difficult to assemble whole genomes

Current assembly algorithms produce highly fragmented results
Sequencing P. aeruginosa (PAb1)

Source of common in-hospital infections

Chosen due to available comparators, PAO1 and PA14

8,627,900 shotgun reads (Solexa)
Assembly

Step 1: AMOScmp
 Comparative assembler
 Uses MUMmer
○ Alignment system based on suffix trees
 Referenced in “Comparative Genome Assembly”
 PA14 – 2053 contigs
 PAO1 – 2797 contigs
Assembly

Step 2: Multiple sequence alignment
 Align PAO1 and PA14 assemblies
 Use Minimus to fill gaps with contigs
○ AMOS component for small data sets
 Re-map reads using AMOScmp to clean assembly
 Closed 203 gaps
Assembly

Step 3: Gene-boosted assembly
 University of Maryland annotation pipeline
○ Based on BLAST and Glimmer
 Protein-coding genes used to fill gaps
○ Identify genes at contig edges and gaps
○ Extract amino acid (AA) sequences
○ Use tblastn to identify potential filler reads
○ Use ABBA to assemble reads into gaps
 Closed 185 gaps
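
The first sub-step above (identify genes at contig edges and gaps) can be illustrated with a minimal sketch. It assumes contigs and annotated genes are both given as half-open intervals in the coordinates of a reference genome; the `Gene` class, the function names, the `margin` parameter, and the example coordinates are all hypothetical and are not taken from the annotation pipeline itself.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # reference coordinate, 0-based, inclusive
    end: int    # reference coordinate, exclusive


def find_gaps(contigs):
    """Return uncovered intervals between contigs given as (start, end) pairs."""
    ordered = sorted(contigs)
    if not ordered:
        return []
    gaps = []
    covered_to = ordered[0][1]
    for start, end in ordered[1:]:
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    return gaps


def genes_near_gaps(genes, contigs, margin=0):
    """Return names of genes that overlap a gap widened by `margin` bases."""
    gaps = find_gaps(contigs)
    hits = []
    for gene in genes:
        for gap_start, gap_end in gaps:
            if gene.start < gap_end + margin and gene.end > gap_start - margin:
                hits.append(gene.name)
                break  # one overlapping gap is enough to select the gene
    return hits


# Hypothetical layout: three contigs leave two gaps, (100, 150) and (300, 320).
contigs = [(0, 100), (150, 300), (320, 400)]
genes = [Gene("A", 10, 50), Gene("B", 90, 160),
         Gene("C", 200, 250), Gene("D", 310, 330)]
print(genes_near_gaps(genes, contigs))
```

In the workflow on the slide, the amino acid sequences of the selected genes would then be passed to tblastn to recruit reads; that step is not reproduced here.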
Aside

Tested gene-boosted analysis alone

PAb1 assembled using PA14 proteins

96% of PAb1 proteins assembled using only this method

Lacks global genome structure information
Assembly
Step 4: Cleanup
 SSAKE
○ “Short Sequence Assembly by K-mer search and 3’ read Extension”
 Edena
○ “Exact DE Novo Assembler”
 Velvet
 Closed 46 gaps
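
The idea behind "3’ read extension" can be shown with a simplified greedy sketch. This is not SSAKE's actual implementation: it scans all reads on each pass, ignores reverse complements and sequencing errors, and builds no consensus. The function name, the `min_overlap` parameter, and the toy reads are assumptions made for illustration only.

```python
def extend_3prime(seed, reads, min_overlap):
    """Greedily extend `seed` at its 3' end using exact prefix overlaps."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contig = seed
    unused = [r for r in reads if r != seed]
    while True:
        best = None  # (overlap length, read) with the longest overlap so far
        for read in unused:
            # The read must add at least one new base, hence len(read) - 1.
            max_olen = min(len(read) - 1, len(contig))
            for olen in range(max_olen, min_overlap - 1, -1):
                if contig.endswith(read[:olen]):
                    if best is None or olen > best[0]:
                        best = (olen, read)
                    break  # longest overlap for this read has been found
        if best is None:
            return contig  # no read overlaps the 3' end any more
        olen, read = best
        contig += read[olen:]  # append only the non-overlapping tail
        unused.remove(read)


# Toy reads, not real sequencing data.
reads = ["GTACGG", "ACGGTT", "TTTTTT"]
print(extend_3prime("ACGTAC", reads, min_overlap=3))
```

Each pass consumes one read, so the loop always terminates; a production assembler would index read prefixes rather than rescanning the whole read set.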

Results

76 contigs containing 6,290,005 bp
 94% of bases in single scaffold
 5602 protein-coding genes identified
 Error rate per read = 1.04%
 Error with coverage > 20X is zero
 Slight bias toward high gene coverage regions
Results

SNP analysis
 Aligned PA14 and PAb1
 5,537,508/5,568,550 bp agreed
 1157/5,568,550 possible sequence errors
 187/1104 indels in error
 Accuracy of assembly: > 99.97%
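
The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the quoted accuracy is one minus the fraction of aligned bases flagged as possible sequence errors; that reading is consistent with the "> 99.97%" figure but is not stated on the slide. The variable names are illustrative.

```python
# Counts taken from the SNP analysis slide.
aligned_bp = 5_568_550
agreeing_bp = 5_537_508
possible_errors = 1_157
indels_in_error = 187
total_indels = 1_104

# Bases that differ between the PA14 and PAb1 alignments.
differing_bp = aligned_bp - agreeing_bp

# Fraction of aligned bases on which the two genomes agree.
identity = agreeing_bp / aligned_bp

# Assumed definition of accuracy: bases not flagged as possible errors.
accuracy = 1 - possible_errors / aligned_bp

# Share of indels judged to be assembly errors.
indel_error_fraction = indels_in_error / total_indels

print(f"differing bases: {differing_bp:,}")
print(f"identity: {identity:.4%}")
print(f"accuracy: {accuracy:.4%}")
print(f"indels in error: {indel_error_fraction:.1%}")
```

Under this reading, most of the roughly 31,000 differing bases are real differences between the strains rather than assembly errors.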
Advantages/Limitations

Requires related genomes and protein sequences
 GenBank contains > 650 microbial genomes
Genome size should not matter
 High speed and low cost
 ¼ of a single Solexa sequencing run in this case
Thank You
Questions?