CS 6293 AT: Current Bioinformatics
HW2
Papers
1. BLAT--The BLAST-Like Alignment Tool
2. Classification of DNA sequences using Bloom Filters
Course Instructor
Dr. Jianhua Ruan
Presenters
Husnu Narman
Nihat Altiparmak
BLAT--The BLAST-Like Alignment Tool
W. James Kent (2002)
UCSC
Cited by 2229 (Google Scholar)
Brief Information About BLAST
• BLAST: Basic Local Alignment Search Tool
• Find a gene in different kinds of databases
1. Divide the query into short words and compare them with the database
2. Scan the database for exact word matches
3. Extend exact matches to High-Scoring Segment Pairs (HSPs)
4. List all of the HSPs in the database
5. Evaluate the HSPs, handle exceptions, and report the results
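The BLAST steps above can be sketched as an ungapped seed-and-extend search. This is a minimal sketch, not NCBI BLAST: it assumes exact word hits, an X-drop ungapped extension, and hypothetical values for the word size `W`, the scores, `X_DROP` and `MIN_SCORE`. Real BLAST additionally performs gapped extension and reports statistical significance (E-values).

```python
from collections import defaultdict

# Hypothetical parameters chosen for illustration only.
W = 4                     # word size
MATCH, MISMATCH = 1, -2   # nucleotide scores
X_DROP = 3                # stop once the score drops this far below its best
MIN_SCORE = 6             # minimum score for an HSP to be reported


def score(a: str, b: str) -> int:
    return MATCH if a == b else MISMATCH


def query_words(query: str) -> dict[str, list[int]]:
    """Step 1: split the query into overlapping words of length W."""
    words = defaultdict(list)
    for i in range(len(query) - W + 1):
        words[query[i:i + W]].append(i)
    return words


def extend(query: str, db: str, qi: int, di: int) -> tuple[int, int, int, int]:
    """Step 3: extend a word hit at (qi, di) both ways with an X-drop rule.

    Returns the HSP as (score, query_start, db_start, length).
    """
    # Extend to the right of the seed word.
    cur = best_r = right_len = 0
    step = 0
    while qi + W + step < len(query) and di + W + step < len(db):
        cur += score(query[qi + W + step], db[di + W + step])
        step += 1
        if cur > best_r:
            best_r, right_len = cur, step
        elif best_r - cur > X_DROP:
            break
    # Extend to the left of the seed word.
    cur = best_l = left_len = 0
    step = 1
    while qi - step >= 0 and di - step >= 0:
        cur += score(query[qi - step], db[di - step])
        if cur > best_l:
            best_l, left_len = cur, step
        elif best_l - cur > X_DROP:
            break
        step += 1
    total = W * MATCH + best_r + best_l
    return total, qi - left_len, di - left_len, left_len + W + right_len


def blast_like(query: str, db: str) -> list[tuple[int, int, int, int]]:
    words = query_words(query)
    hsps = set()
    # Step 2: scan the database for exact matches to any query word.
    for di in range(len(db) - W + 1):
        for qi in words.get(db[di:di + W], []):
            hsp = extend(query, db, qi, di)
            # Step 4: collect HSPs that pass the score threshold.
            if hsp[0] >= MIN_SCORE:
                hsps.add(hsp)
    # Step 5: report HSPs, highest score first.
    return sorted(hsps, reverse=True)
```

For example, `blast_like("ACGTACGTTT", "GGGACGTACGTTTGGG")` returns `[(10, 0, 3, 10)]`: several word hits extend to the same full-length HSP, which the set collapses into one result.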
BLAT
• BLAT: The Blast-Like Alignment Tool
• Find a gene in different kinds of databases
• Why a new search tool?
Differences between BLAST and BLAT
• Indexing: BLAST builds an index of the query; BLAT builds an index of the database
• Extension: BLAST triggers an extension when one or two hits occur; BLAT triggers extensions on any number of perfect or near-perfect hits
• Results: BLAST returns a list of exons sorted by size; BLAT looks up the location of a sequence in the genome or determines the exon structure of an mRNA
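The indexing difference can be sketched as follows. This is a minimal sketch assuming BLAT's approach of indexing non-overlapping k-mers of the database and looking up every overlapping k-mer of the query, with hits grouped by diagonal (database position minus query position). `K`, `MIN_HITS`, `index_database` and `find_candidates` are hypothetical; real BLAT also handles near-perfect hits and stitches hits into full alignments.

```python
from collections import defaultdict

K = 4         # hypothetical k-mer size (genome-scale searches use larger k)
MIN_HITS = 2  # hypothetical number of hits required on one diagonal


def index_database(db: str) -> dict[str, list[int]]:
    """Index the non-overlapping k-mers of the database by position."""
    index = defaultdict(list)
    for pos in range(0, len(db) - K + 1, K):
        index[db[pos:pos + K]].append(pos)
    return index


def find_candidates(query: str, index: dict[str, list[int]]) -> list[tuple[int, int]]:
    """Look up every overlapping query k-mer and count hits per diagonal.

    Returns sorted (diagonal, hit_count) pairs with at least MIN_HITS hits;
    a diagonal is db_position - query_position.
    """
    hits = defaultdict(int)
    for qpos in range(len(query) - K + 1):
        for dpos in index.get(query[qpos:qpos + K], []):
            hits[dpos - qpos] += 1
    return sorted((d, n) for d, n in hits.items() if n >= MIN_HITS)
```

Indexing the database once lets many queries reuse the same index, and requiring several hits on one diagonal filters out isolated chance matches before any extension work is done.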
Classification of DNA sequences using Bloom Filters
Stranneheim et al. (2010)
Stockholm, Sweden
• Next-generation sequencing technologies
– Produce complex datasets
– Require new, efficient, specialized sequence analysis algorithms
• Often only novel sequences are of interest, so unneeded sequences (those belonging to a known genome) must be removed
• A new algorithm (FACS) classifies sequences as belonging or not belonging to a reference sequence
• Source code is available at:
– http://facs.biotech.kth.se
Bloom Filter
• A memory-efficient data structure for testing whether an element is part of a reference set
• An m-bit vector with k hash functions
• Never returns a false negative, but may return a false positive
• Optimal number of hash functions: $k = \frac{m}{n} \ln 2$, where n is the number of elements in the set
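A minimal sketch of the data structure above, assuming SHA-256 with double hashing (h1 + i·h2) to derive the k bit positions. The hash scheme and the names `BloomFilter` and `optimal_parameters` are illustrative, not taken from FACS. The sizing uses the standard formulas $m = -n \ln p / (\ln 2)^2$ and $k = \frac{m}{n}\ln 2$ for a target false positive rate p.

```python
import hashlib
import math


class BloomFilter:
    """An m-bit Bloom filter queried through k hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m bits packed into bytes

    def _positions(self, item: str) -> list[int]:
        # Derive k bit indices from one SHA-256 digest by double hashing
        # (an illustrative choice of hash functions).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        # Set all k bits for the item.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # Reported present only if all k bits are set: an added item is
        # always found, but unrelated items may collide (false positives).
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def optimal_parameters(n: int, p: float) -> tuple[int, int]:
    """Return (m, k) for n items at a target false positive rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k
```

For instance, `optimal_parameters(1000, 0.01)` gives `(9586, 7)`: about 9.6 bits and 7 hash functions per stored element for a 1% false positive rate.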
Example Bloom Filter
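Figure: the set {x, y, z} is stored in a bit vector with m = 18 and k = 3, each element setting three bits. Querying w finds one of its three bits still 0, so w is reported as not in the set.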
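Plugging the example's values (n = 3 elements x, y, z, m = 18, k = 3) into the standard false positive approximation, which assumes independent, uniformly distributed hash positions, shows that k = 3 is slightly below the optimum for this filter:

```latex
\begin{aligned}
p &\approx \left(1 - e^{-kn/m}\right)^{k} \\
p_{k=3} &\approx \left(1 - e^{-3 \cdot 3/18}\right)^{3} = \left(1 - e^{-0.5}\right)^{3} \approx 0.3935^{3} \approx 0.061 \\
k_{\mathrm{opt}} &= \frac{m}{n}\ln 2 = \frac{18}{3}\ln 2 \approx 4.16 \\
p_{k=4} &\approx \left(1 - e^{-4 \cdot 3/18}\right)^{4} \approx 0.4866^{4} \approx 0.056
\end{aligned}
```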
Method
• A Bloom filter is built from the k-mers of the reference sequence, using the desired k-mer size and false positive rate
• Query sequences are then classified by looking up their k-mers in the Bloom filter
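The method above can be sketched as follows. This is a simplified illustration, not FACS itself: it assumes a read belongs to the reference when at least a hypothetical fraction `THRESHOLD` of its k-mers are found in the filter, uses hypothetical values for `K`, the filter size and the hash count, and ignores reverse complements. The names `Bloom`, `build_filter` and `classify` are illustrative.

```python
import hashlib

K = 5            # hypothetical k-mer size
THRESHOLD = 0.8  # hypothetical fraction of k-mers that must hit the filter


class Bloom:
    """Compact Bloom filter (SHA-256 double hashing, illustrative)."""

    def __init__(self, m: int, k: int):
        self.m, self.k, self.bits = m, k, bytearray((m + 7) // 8)

    def _positions(self, item: str):
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def kmers(seq: str, k: int = K):
    """All overlapping k-mers of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference: str, m: int = 8192, hashes: int = 4) -> Bloom:
    """Insert every k-mer of the reference into a new filter."""
    bf = Bloom(m, hashes)
    for kmer in kmers(reference):
        bf.add(kmer)
    return bf


def classify(read: str, bf: Bloom) -> bool:
    """True if the read is judged to belong to the reference."""
    words = list(kmers(read))
    if not words:  # read shorter than K: nothing to test
        return False
    hits = sum(w in bf for w in words)
    return hits / len(words) >= THRESHOLD
```

Because a Bloom filter has no false negatives, a read copied exactly from the reference always has every k-mer found; the filter's false positive rate only affects how often unrelated reads are wrongly judged to belong.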
Evaluation
• Experimental metagenome dataset (Allander et al. 2005) containing 177,184 reads
• Human genome used as the reference
• FACS, BLAT and SSAHA2 compared
21x
31x
Evaluation
Figure: False Positive Rate and False Positive Rate (Missed), in percent (y-axis 0 to 0.06), for FACS, BLAT and SSAHA2
Any Questions?