HP-SEE
Short Fragment Sequence Alignment on the HP-SEE infrastructure
www.hp-see.eu
M. KOZLOVSZKY, G. WINDISCH, Á. BALASKÓ, MTA SZTAKI & Obuda University
The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 261499
MIPRO 2012 – Opatija, Croatia 25.05.2012

Overview
- The HP-SEE project
- HP-SEE Life Sciences Virtual Community
- HP-SEE Bioinformatics Life Science gateway
- Sequence alignment applications
- Workflow based online bioinformatics services
- Future plans

The HP-SEE Life Science VRC and its objectives
Main goal: Match the combined HPC resources to the regional needs of the life science and bioscience communities, foster research in the field within the region with the help of the large-scale, high-availability infrastructure, and facilitate cooperation between the sparsely distributed life science research centres.

Data and limitations
- The Life Sciences domain has been revolutionized by advances in both computer hardware and software algorithms:
  - Assembling the human genome
  - Gene-expression chips to understand cellular processes
- Exponential growth in the amount of publicly available genomic data (GenBank).
- Traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types.
- Existing computational tools were created by experimentalists dealing with data sets that were minuscule in comparison to those available today. As a result, software that was once perfectly adequate now performs slowly or is incapable of successful analysis on traditional computational platforms.

Accessible infrastructure

Country            Center            Computing cores   Teraflops
Bulgaria           BG Blue Gene/P    8192              27.85
Bulgaria           HPCG              576               3.23
FYR of Macedonia   FINKI SC          2016              9
Hungary            NIIFI SC          144               0.5
Hungary            Pecs SC           1152              10
Hungary            Debrecen SC       3078              18
Hungary            Szeged            2112              14
Romania            InfraGRID         400               2.5
Romania            IFIN_BIO          256               2.72
Romania            IFIN_BC           368               3.9
Romania            NCIT              562               3.4
Romania            UVT Blue Gene/P   4096              13.9
Serbia             PARADOX           672               6.26
TOTAL                                23624             115.26

Map legend: HP-SEE supercomputing infrastructure; SEE-GRID-SCI grid infrastructure

HP-SEE's LS Applications
7 applications from 5 countries
- Greece:
  - Searching for novel miRNA genes and their targets (miRs)
- Montenegro:
  - DNA Multi-core Analysis (DNAMA)
- Georgia:
  - Modeling of some biochemical processes with the purpose of realization of their thin and purposeful synthesis (MSBP)
- Hungary:
  - Deep sequencing for short fragment alignment (DeepAligner) - gUSE & workflow based
  - In-silico Disease Gene Mapper (DiseaseGene) - gUSE & workflow based
  - Network models of short and long term memory (CMSLTM)
- Armenia:
  - Molecular Dynamics Study of Complex systems (MDSCS)

Why gUSE/WS-PGRADE
Infrastructure
- HP-SEE infrastructure
- Based on gLite and ARC as middleware
- Authentication procedures are painful (as usual)
Application
- Interoperability with grids is a plus
- Workflow-like process with embedded (legacy) applications
- Restricted input parameter sets for the algorithms
- Service-like operation
- Portal features for a community
Knowledge, licensing & support
- Open source software environment needed
- Knowledge transfer required for the application specific modules

HP-SEE Bioinformatics eScience Gateway
- HP-SEE Bioinformatics eScience Gateway hosted at Obuda University, operated by MTA SZTAKI.
- gUSE + WS-PGRADE (v3.3.2), Liferay based
- Uses the SEE region's supercomputing & grid infrastructure
- Accessible at: http://ls-hpsee.nik.uni-obuda.hu:8080/liferay-portal-6.0.5

Architecture and application porting steps
Unified porting steps of the applications:

DeepAligner: Deep sequencing for short fragment alignment
Description & Objectives
- Mapping short fragment reads to open-access eukaryotic genomes is solvable by a group of algorithms (BLAST, BWA, PatternHunter, and other sequence alignment tools). BLAST (mpiBLAST or ScalaBLAST) is one of the most frequently used tools in bioinformatics; the others are relatively new, fast, lightweight tools that align short sequences.
- Local installations of these algorithms typically cannot handle such problem sizes, so the procedure runs slowly, while web based implementations cannot accept a high number of queries.
- The HP-SEE infrastructure gives access to massively parallel architectures, and the sequence alignment code is distributed free for academia.
Result: Online workflow based short sequence alignment service
Impact: Freely available service/code for large scale short sequence alignment
Collaborations: Hungarian Bioinformatics Association, Semmelweis University
HP-SEE infrastructure used: Hungarian HPC, NIIF's supercomputing sites

DeepAligner: Deep sequencing for short fragment alignment (contd.)
- Small scale launch (home cluster): PBS/Linux cluster at Obuda University, John von Neumann Faculty of Informatics.
- Activity and technical assistance in pre-production stage: technical assistance was provided by MTA SZTAKI and NIIF.
- Porting: the application was ported using Perl/C. The workflow and GUI were created for the application by Obuda University.
- Benchmarking: scaled from 32 cores to 96 cores (MPI).
- DeepAligner status: the online service uses two sites of NIIF's supercomputing infrastructure (Budapest and Szeged).
- Foreseen activities: optimization of parameter assignments in the GUI, more scientific publications about short sequence alignment. Further scaling is planned with performance analysis.
- More information: http://hpseewiki.ipb.ac.rs/index.php/DeepAligner

Development & working on gUSE/WS-PGRADE
Close collaboration and useful support (pros)
- ARC middleware connector was developed from scratch by MTA SZTAKI on request
- ASM and ARC submitter related bugs have been found and reported
- Helpful and skilled support & development team
Documentation: installation, development and upgrade (cons)
- Hard to find information and hard to use it
  - Installation
  - Configuration
  - Upgrade

Future plans
- Additional plug-in like online bioinformatics services
  - More sequence alignment workflows
  - More multiple sequence alignment workflows
  - Sequence database quality measurement workflows
- Open up the gateway for users outside the SEE region

Thank you for your attention! Questions?

gUSE/WS-PGRADE architecture
Figure labels: DeepAligner, DiseaseGene, ASM (Application Specific Module), WS-PGRADE
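
Illustrative sketch: workflow based short fragment alignment
The scatter, align and gather pattern behind a workflow based alignment service such as DeepAligner can be sketched as follows. This is a minimal illustration under stated assumptions, not the DeepAligner code: a simple seed-and-verify matcher stands in for the real aligners (BLAST, BWA), a thread pool stands in for MPI ranks or grid jobs, and every function name, the toy reference string, and the parameters k and max_mismatches are hypothetical choices made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_seed_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k, max_mismatches):
    """Return (start, mismatches) of the best ungapped hit, or None."""
    best = None
    tried = set()
    # Use each k-mer of the read as a seed, then verify the full read.
    for offset in range(len(read) - k + 1):
        for seed_pos in index.get(read[offset:offset + k], ()):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference) or start in tried:
                continue
            tried.add(start)
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best


def split_into_chunks(reads, n_chunks):
    """Scatter step: deal the reads round-robin into n_chunks lists."""
    return [reads[i::n_chunks] for i in range(n_chunks)]


def align_chunk(chunk, reference, index, k, max_mismatches):
    """Worker step: align every read of one chunk independently."""
    return {r: align_read(r, reference, index, k, max_mismatches) for r in chunk}


def run_workflow(reads, reference, k=4, max_mismatches=1, workers=3):
    """Scatter the reads, align the chunks concurrently, gather the hits."""
    index = build_seed_index(reference, k)
    chunks = split_into_chunks(reads, workers)
    hits = {}
    # Threads only illustrate the structure; a real deployment would
    # dispatch each chunk to a separate MPI rank or cluster job.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(align_chunk, c, reference, index, k, max_mismatches)
            for c in chunks
        ]
        for future in futures:
            hits.update(future.result())  # gather step
    return hits


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACCGATAGGCTA"  # toy reference, not real data
    reads = ["TGCAAGGC", "TTGCCGAT", "GGGGGGGG"]
    for read, hit in run_workflow(reads, reference).items():
        print(read, hit)
```

With the toy input, the first read maps exactly at position 4, the second maps at position 12 with one mismatch, and the third stays unmapped. Because each chunk is aligned independently of the others, the same structure can be scaled by changing only the number of workers, which is the property a core-count benchmark exercises.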