Korilog BIOINFORMATICS Solutions

KLAST high-performance sequence similarity search tool & integration with KNIME platform
Patrick Durand, PhD, CEO
KNIME UGM Meeting – Zurich, Feb. 2014

Sequence analysis big challenge
DNA sequence...
Context
1. Modern sequencers produce huge amounts of data
2. Reference databanks contain large amounts of data
Problem
How to compare all that data quickly and efficiently?
Our answer
KLAST
… deluge of data

The KLAST project
Context
Public/private collaborative R&D project to create new high-performance tools for NGS
Partners
www.korilog.com
team.inria.fr/genscale
Duration
Stage 1: 2011 – 2013: creation of KLAST
Stage 2: 2013 – 2015: enhancements of KLAST + metagenomics-oriented tools
Funding and Support
Région Bretagne / Oséo Innovation / CRITT Santé Bretagne

KLAST algorithm
KLAST = PLAST + ORIS + data filtering engine
Algorithm PLAST
⇨ “Protein / Protein” algorithm (KLASTp, KLASTx, tKLASTn, tKLASTx)
Van-Hoa Nguyen, Dominique Lavenier, « PLAST: Parallel Local Alignment Search Tool for database comparison », BMC Bioinformatics 2009
Algorithm ORIS
⇨ “Nucleotide / Nucleotide” algorithm (KLASTn)
Dominique Lavenier, « Ordered Index Seed Algorithm for Intensive DNA Sequence Comparison », HiCOMB 2008
Efficient usage of hardware capabilities
⇨ Multi-core architectures
⇨ SSE3 architecture / AVX2 ready

KLAST algorithm
Differences between KLAST and BLAST
⇨ amino acid algorithm
  * hits localization (subset seeds) + hits filtering (specific hardware usage)
⇨ nucleotides algorithm
  * hits filtering (specific heuristic & specific hardware usage)
⇨ data filtering engine (scores, e-value, identity, coverage, etc.)
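The data filtering engine listed above selects hits by score, e-value, identity and coverage. The slides do not show how it is implemented, so the following is only a minimal sketch of that kind of threshold filtering. The `Hit` record, its field names, the `filter_hits` function and the sample accession strings are all hypothetical and are not part of KLAST.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One query/subject alignment. All field names are hypothetical."""
    query_id: str
    subject_id: str
    bit_score: float
    evalue: float
    identity_pct: float        # percent of identical positions in the alignment
    query_coverage_pct: float  # percent of the query covered by the alignment


def filter_hits(
    hits: Iterable[Hit],
    min_bit_score: float = 0.0,
    max_evalue: float = 10.0,
    min_identity_pct: float = 0.0,
    min_query_coverage_pct: float = 0.0,
) -> List[Hit]:
    """Keep only the hits that satisfy every threshold (illustrative only)."""
    return [
        h for h in hits
        if h.bit_score >= min_bit_score
        and h.evalue <= max_evalue
        and h.identity_pct >= min_identity_pct
        and h.query_coverage_pct >= min_query_coverage_pct
    ]


# Made-up example data: identifiers and values are placeholders.
hits = [
    Hit("read_1", "P12345", 250.0, 1e-60, 92.0, 88.0),
    Hit("read_1", "Q67890", 45.0, 2e-3, 35.0, 40.0),
    Hit("read_2", "A0A000", 120.0, 1e-20, 60.0, 95.0),
]
kept = filter_hits(
    hits, max_evalue=1e-10, min_identity_pct=50.0, min_query_coverage_pct=80.0
)
print([h.subject_id for h in kept])  # ['P12345', 'A0A000']
```

Combining the criteria with a logical AND is one plausible design choice; a real engine might also support OR conditions or per-databank thresholds.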
Common parts between KLAST and BLAST
⇨ last algorithm step (dynamic programming) + statistical model
Questions
⇨ what are the differences between the tools' results? (quality)
⇨ how fast is KLAST compared to BLAST? (speed)

Benchmark
Study of Tara Oceans data sets
KLAST and BLAST+ benchmark: comparison of 8,245 sequences (translated 454 reads) from Tara Oceans metagenomic data against 15 million proteins from Uniprot. Both algorithms ran on 8 Intel Xeon cores.
1. Speedup is 18x (8,238 min for BLAST+ vs. 469 min for KLAST)
2. KLAST covers 96% of BLAST results
Benchmark data courtesy of Jean-Marc Aury, Eric Pelletier and Thomas Vannier research team (National Sequencing Centre – CEA, France).

KLAST integration
Challenge
Combining KLAST sequence comparison engine and data integration & analysis tools

Integration with KNIME
What is KLAST Extension for KNIME?
Provides nodes to
+ run KLAST on NGS data sets
+ annotate results: Enzyme, GO, InterPro, NCBI Taxonomy (full and Lowest Common Ancestor)
+ filter data
+ import and export sequences and results
+ quickly prototype sequence analysis workflows
+ manage databanks: EMBL, Genbank, Uniprot, RefSeq, Silva, DNA Barcoding, standard FASTA, user-defined, etc.

KLAST: graphical mode
Study of Tara Oceans data sets
Dataset: klastp comparison of 8,245 proteins vs. Uniprot (15 million sequences)
CPU time: 469 minutes for klastp workflow vs. 8,238 minutes for blastp on a Genoscope cluster node (8 cores).
Speedup: 18x
Datasets and tests provided by Jean-Marc Aury, Eric Pelletier and Thomas Vannier (French National Sequencing Center / Genoscope / CEA)

KLAST: graphical mode
Study of functional & taxonomy diversity
Display the taxonomy diversity of a result dataset as a pie chart
Dataset: klastx comparison of 97,000 sequences (454 reads) vs. SwissProt_bacteria (350,000 sequences)
Computation: 2 h on an Apple MacBook Air (4 cores, 4 GB RAM).
Metagenomics dataset provided by Philippe Vandenkoornhuyse and Alexis Dufresne, CAREN – CNRS UMR 6553 EcoBio, Rennes

More information: contact Patrick Durand
Email: [email protected]
Phone: +33 (0) 960 368 038
www.klast-search.com
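
As a quick check of the figures quoted in the benchmark slides, the short sketch below recomputes the speedup from the two run times (8,238 min for blastp, 469 min for klastp) and the average throughput of the klastx run (97,000 sequences in 2 h). The function names are illustrative only; the input numbers are the ones given in the slides.

```python
def speedup(baseline_minutes: float, candidate_minutes: float) -> float:
    """Ratio of baseline run time to candidate run time."""
    if candidate_minutes <= 0:
        raise ValueError("candidate_minutes must be positive")
    return baseline_minutes / candidate_minutes


def throughput_per_minute(n_sequences: int, minutes: float) -> float:
    """Average number of query sequences processed per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return n_sequences / minutes


# Tara Oceans benchmark: blastp vs. klastp on the same 8-core node.
ratio = speedup(8238, 469)
print(f"speedup: {ratio:.1f}x")  # speedup: 17.6x

# klastx run on the laptop: 97,000 reads in 2 hours (120 minutes).
rate = throughput_per_minute(97_000, 120)
print(f"throughput: {rate:.0f} sequences/min")  # throughput: 808 sequences/min
```

The exact ratio is about 17.6, which the slides round to 18x.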