Korilog
BIOINFORMATICS Solutions
KLAST
high-performance sequence similarity search tool
& integration with KNIME platform
Patrick Durand, PhD, CEO
Knime UGM Meeting – Zurich, Feb. 2014
Sequence analysis: a big challenge
DNA sequence... deluge of data
Context
1. Modern sequencers produce huge amounts of data
2. Reference databanks contain large amounts of data
Problem
How can all that data be compared quickly and efficiently?
Our answer
KLAST
The KLAST project
Context
Public/private collaborative R&D project to create new high-performance tools for NGS
Partners
www.korilog.com
team.inria.fr/genscale
Duration
Stage 1: 2011 – 2013: creation of KLAST
Stage 2: 2013 – 2015: enhancements of KLAST + metagenomics-oriented tools
Funding and Support
Région Bretagne / Oséo Innovation / CRITT Santé Bretagne
KLAST algorithm
KLAST = PLAST + ORIS + data filtering engine
Algorithm PLAST
⇨ “Protein / Protein” algorithm
(KLASTp, KLASTx, tKLASTn, tKLASTx)
Van-Hoa Nguyen, Dominique Lavenier, "PLAST: Parallel Local Alignment Search Tool for database comparison", BMC Bioinformatics, 2009
Algorithm ORIS
⇨ “Nucleotide / Nucleotide” algorithm
(KLASTn)
Dominique Lavenier, "Ordered Index Seed Algorithm for Intensive DNA Sequence Comparison", HiCOMB 2008
Efficient usage of hardware capabilities
⇨ Multi-core architectures
⇨ SSE3 architecture / AVX2 ready
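Both PLAST and ORIS are seed-based methods: short words shared by the two banks are located first, and only those positions are examined further. The sketch below illustrates that general idea with a plain k-mer index. It is a generic illustration, not the PLAST or ORIS implementation; the function names, the seed length `k=4`, and the toy sequences are assumptions made for the example.

```python
from collections import defaultdict

def build_seed_index(sequences, k):
    """Map each k-mer (seed) to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index

def find_seed_hits(queries, subjects, k=4):
    """Return (query id, query pos, subject id, subject pos) for every shared seed."""
    query_index = build_seed_index(queries, k)
    subject_index = build_seed_index(subjects, k)
    hits = []
    # Visit shared seeds in sorted order so all occurrences of one seed
    # are handled together before moving to the next seed.
    for seed in sorted(query_index.keys() & subject_index.keys()):
        for q_id, q_pos in query_index[seed]:
            for s_id, s_pos in subject_index[seed]:
                hits.append((q_id, q_pos, s_id, s_pos))
    return hits

# Toy data (hypothetical): one query and two subject sequences.
queries = {"q1": "ACGTACGA"}
subjects = {"s1": "TTACGTAC", "s2": "GGGGGGGG"}
hits = find_seed_hits(queries, subjects, k=4)
for hit in hits:
    print(hit)
```

Indexing both banks means each seed is looked up once rather than rescanning the subject bank for every query, which is the property that makes this family of methods suitable for bank-against-bank comparison.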
KLAST algorithm
Differences between KLAST and BLAST
⇨ amino acid algorithm
* hits localization (subset seeds) + hits filtering (specific hardware usage)
⇨ nucleotide algorithm
* hits filtering (specific heuristic & specific hardware usage)
⇨ data filtering engine (scores, e-value, identity, coverage, etc.)
Common parts between KLAST and BLAST
⇨ last algorithm step (dynamic programming) + statistical model
Questions
⇨ how do the tools' results differ? (quality)
⇨ how fast is KLAST compared to BLAST? (speed)
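The data filtering engine mentioned above selects hits on criteria such as e-value, identity and coverage. A minimal sketch of that kind of filter follows. The `Hit` record, the `filter_hits` function and the default thresholds are hypothetical and are not taken from KLAST.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query_id: str
    subject_id: str
    evalue: float          # expected number of chance hits with this score
    identity: float        # percent identical positions in the alignment
    query_coverage: float  # percent of the query covered by the alignment

def filter_hits(hits, max_evalue=1e-3, min_identity=30.0, min_coverage=50.0):
    """Keep only the hits that satisfy all three thresholds."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.identity >= min_identity
        and h.query_coverage >= min_coverage
    ]

# Toy hits (hypothetical values).
hits = [
    Hit("read1", "P1", 1e-30, 85.0, 92.0),  # passes every threshold
    Hit("read2", "P2", 0.5, 40.0, 80.0),    # e-value too high
    Hit("read3", "P3", 1e-10, 25.0, 95.0),  # identity too low
    Hit("read4", "P4", 1e-8, 60.0, 30.0),   # coverage too low
]
kept = filter_hits(hits)
print([h.query_id for h in kept])
```

Filtering inside the search step, rather than afterwards, keeps result files small when millions of reads are compared.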
Benchmark
Study of Tara Oceans data sets
KLAST and BLAST+ benchmark: comparison of 8,245 sequences (translated 454 reads) from Tara Oceans metagenomic data against 15 million proteins from Uniprot.
Both algorithms ran on 8 Intel Xeon cores.
1. Speedup is 18x: 8,238 min (BLAST+) vs. 469 min (KLAST)
2. KLAST covers 96% of BLAST results
Benchmark data courtesy of Jean-Marc Aury, Eric Pelletier and Thomas Vannier research team
(National Sequencing Centre – CEA, France).
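The reported speedup follows directly from the two run times quoted in the benchmark. The short check below reproduces the arithmetic; the variable names are chosen for the example.

```python
# Run times reported in the benchmark, in minutes.
blast_minutes = 8238
klast_minutes = 469

speedup = blast_minutes / klast_minutes
print(f"speedup: {speedup:.1f}x")  # 17.6x, reported as 18x after rounding
print(f"BLAST+: {blast_minutes / 60:.1f} h, KLAST: {klast_minutes / 60:.1f} h")
```

In wall-clock terms this is roughly 137 hours against a little under 8 hours on the same 8 cores.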
KLAST integration
Challenge
Combining the KLAST sequence comparison engine with data integration and analysis tools
Integration with KNIME
What is the KLAST Extension for KNIME?
Provides nodes to
+ run KLAST on NGS data sets
+ annotate results: Enzyme, GO, InterPro, NCBI Taxonomy (full and Lowest Common Ancestor)
+ filter data
+ import and export sequences and results
+ quickly prototype sequence analysis workflows
+ manage databanks: EMBL, Genbank, Uniprot, RefSeq, Silva, DNA Barcoding, standard FASTA, user-defined, etc.
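The Lowest Common Ancestor annotation listed above assigns a sequence to the deepest taxon shared by all of its hits. The sketch below shows one simple way to compute it from a child-to-parent table. It is not the KLAST Extension's code; the function names and the simplified toy taxonomy are assumptions made for the example.

```python
def lineage(taxon, parent):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]

def lowest_common_ancestor(taxa, parent):
    """Return the deepest taxon shared by the lineages of all `taxa`."""
    lineages = [lineage(t, parent) for t in taxa]
    lca = None
    # Walk down from the root while every lineage agrees.
    for level in zip(*lineages):
        if len(set(level)) != 1:
            break
        lca = level[0]
    return lca

# Toy child -> parent table (hypothetical, with intermediate ranks omitted).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacteria",
}

print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], parent))
print(lowest_common_ancestor(["Escherichia coli", "Bacillus subtilis"], parent))
```

A read whose hits fall in several genera of the same family is therefore reported at the family level instead of being assigned arbitrarily to one species.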
KLAST: graphical mode
Study of Tara Oceans data sets
Dataset: klastp comparison of 8,245 proteins vs. Uniprot (15 million sequences)
CPU time: 469 minutes for the klastp workflow vs. 8,238 minutes for blastp on a Genoscope cluster node (8 cores).
Speedup: 18x
Datasets and tests provided by Jean-Marc Aury, Eric Pelletier and Thomas Vannier
(French National Sequencing Center / Genoscope / CEA)
KLAST: graphical mode
Study of functional & taxonomy diversity
Display the taxonomy diversity of a result dataset as a pie chart
Dataset: klastx comparison of 97,000 sequences (454 reads) vs. SwissProt_bacteria (350,000 sequences)
Computation: 2 h on an Apple MacBook Air (4 cores, 4 GB RAM).
Metagenomics dataset provided by Philippe Vandenkoornhuyse and Alexis Dufresne, CAREN – CNRS UMR 6553 EcoBio, Rennes
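The data behind such a pie chart is simply the share of sequences assigned to each taxon. A minimal sketch of that aggregation is given below. The `taxon_shares` function and the phylum assignments are hypothetical and are not results from the dataset above.

```python
from collections import Counter

def taxon_shares(assignments):
    """Return the percentage of sequences per taxon, largest share first."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.most_common()}

# Toy assignments (hypothetical): one taxon label per annotated read.
assignments = ["Proteobacteria"] * 5 + ["Firmicutes"] * 3 + ["Bacteroidetes"] * 2
shares = taxon_shares(assignments)
for taxon, pct in shares.items():
    print(f"{taxon}: {pct:.1f}%")
```

The resulting percentages can be passed to any charting component to draw the slices.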
More information: contact Patrick Durand
Email: [email protected]
Phone: +33 (0) 960 368 038
www.klast-search.com