HP-SEE
Short Fragment Sequence Alignment on the HP-SEE infrastructure
www.hp-see.eu
M. KOZLOVSZKY, G. WINDISCH, Á. BALASKÓ,
MTA SZTAKI & Obuda University
The HP-SEE initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 261499
Overview
- The HP-SEE project
- HP-SEE Life Sciences Virtual Community
- HP-SEE Bioinformatics Life Science gateway
- Sequence alignment applications: workflow-based online bioinformatics services
- Future plans
MIPRO 2012, Opatija, Croatia, 25.05.2012
The HP-SEE Life Science VRC and its objectives
Main goal:
- Match the combined HPC resources to the regional needs of the life/bioscience communities, foster research in the field within the region using the large-scale, high-availability infrastructure, and facilitate cooperation between the sparsely distributed life science research centres.
Data and limitations
- The Life Sciences domain has been revolutionized by advances in both computer hardware and software algorithms.
  - Assembling the Human Genome
  - Gene-expression chips to understand cellular processes
- Exponential growth in the amount of publicly available genomic data.
  - GenBank
- Traditional database approaches are no longer sufficient for rapidly performing life science queries involving the fusion of data types.
- Existing computational tools were created by experimentalists dealing with data sets that were minuscule in comparison to those available today. As a result, software that was once perfectly adequate now performs slowly or is incapable of successful analysis on traditional computational platforms.
Accessible infrastructure
- HP-SEE Supercomputing infrastructure
- SEE-GRID-SCI Grid infrastructure

Country            Computing center    Cores   Teraflops
Bulgaria           BG Blue Gene/P       8192       27.85
                   HPCG                  576        3.23
FYR of Macedonia   FINKI SC             2016        9
Hungary            NIIFI SC              144        0.5
                   Pecs SC              1152       10
                   Debrecen SC          3078       18
                   Szeged               2112       14
Romania            InfraGRID             400        2.5
                   IFIN_BIO              256        2.72
                   IFIN_BC               368        3.9
                   NCIT                  562        3.4
                   UVT Blue Gene/P      4096       13.9
Serbia             PARADOX               672        6.26
TOTAL                                  23624      115.26
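The totals in the infrastructure table can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that, for the Hungarian and Romanian sites, the core and teraflops figures are listed in the same order as the site names, which is an inference from the slide layout rather than something the slides state.

```python
# (cores, teraflops) per computing center, copied from the table above.
# Assumption: values are paired with site names in their listed order.
centers = {
    "BG Blue Gene/P": (8192, 27.85),
    "HPCG": (576, 3.23),
    "FINKI SC": (2016, 9.0),
    "NIIFI SC": (144, 0.5),
    "Pecs SC": (1152, 10.0),
    "Debrecen SC": (3078, 18.0),
    "Szeged": (2112, 14.0),
    "InfraGRID": (400, 2.5),
    "IFIN_BIO": (256, 2.72),
    "IFIN_BC": (368, 3.9),
    "NCIT": (562, 3.4),
    "UVT Blue Gene/P": (4096, 13.9),
    "PARADOX": (672, 6.26),
}

# Sum each column; round the floating-point sum to the table's precision.
total_cores = sum(cores for cores, _ in centers.values())
total_tflops = round(sum(tflops for _, tflops in centers.values()), 2)

print(total_cores, total_tflops)  # prints: 23624 115.26
```

Both sums agree with the TOTAL row, which supports the row assignment used here.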
HP-SEE’s LS Applications
7 applications from 5 countries
- Greece:
  - Searching for novel miRNA genes and their targets (miRs)
- Montenegro:
  - DNA Multi-core Analysis (DNAMA)
- Georgia:
  - Modeling of some biochemical processes with the purpose of realization of their thin and purposeful synthesis (MSBP)
- Hungary:
  - Deep sequencing for short fragment alignment (DeepAligner) - gUSE & workflow based
  - In-silico Disease Gene Mapper (DiseaseGene) - gUSE & workflow based
  - Network models of short and long term memory (CMSLTM)
- Armenia:
  - Molecular Dynamics Study of Complex systems (MDSCS)
Why gUSE/WS-PGRADE
- Infrastructure
  - HP-SEE infrastructure
    - Based on gLite and ARC as middleware
    - Authentication procedures are painful (as usual)
  - Interoperability with grids is a plus
- Application
  - Workflow-like process with embedded (legacy) applications
  - Restricted input parameter sets for the algorithms
  - Service-like operation
  - Portal features for a community
- Knowledge, licensing & support
  - Open source software environment needed
  - Knowledge transfer required for the application-specific modules
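The "restricted input parameter sets" and "embedded (legacy) applications" requirements above can be pictured as a thin validation layer in front of a command-line tool. The following is a minimal sketch under stated assumptions, not the gateway's actual code: the parameter names, the allowed values, and the `legacy_aligner` executable with its flags are all hypothetical.

```python
# Hypothetical parameter schema: names, choices and ranges are illustrative
# only and are not taken from the HP-SEE gateway.
ALLOWED_PARAMS = {
    "algorithm": {"choices": {"blast", "bwa"}},
    "max_mismatches": {"range": (0, 3)},
    "threads": {"range": (1, 96)},
}


def validate_params(params, schema=ALLOWED_PARAMS):
    """Return a cleaned copy of params, or raise ValueError on the first problem."""
    unknown = set(params) - set(schema)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    cleaned = {}
    for name, rule in schema.items():
        if name not in params:
            raise ValueError(f"missing parameter: {name}")
        value = params[name]
        if "choices" in rule and value not in rule["choices"]:
            raise ValueError(f"{name} must be one of {sorted(rule['choices'])}")
        if "range" in rule:
            low, high = rule["range"]
            if not isinstance(value, int) or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in [{low}, {high}]")
        cleaned[name] = value
    return cleaned


def build_command(params, executable="legacy_aligner"):
    """Build an argument list for a (hypothetical) legacy tool from validated params."""
    cleaned = validate_params(params)
    args = [executable]
    for name in sorted(cleaned):
        args += [f"--{name}", str(cleaned[name])]
    return args
```

Rejecting anything outside the schema before the job is submitted keeps the service-like operation predictable: users can only trigger runs the portal operators have decided to support.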
HP-SEE Bioinformatics eScience Gateway
- Hosted at Obuda University, operated by MTA SZTAKI
- gUSE + WS-PGRADE (v3.3.2), Liferay based
- Uses the SEE region’s supercomputing & grid infrastructure
- Accessible at: http://ls-hpsee.nik.uni-obuda.hu:8080/liferay-portal-6.0.5
Architecture and application porting steps
Unified porting steps of the applications:
DeepAligner: Deep sequencing for short fragment alignment

Description & Objectives
Mapping short fragment reads to open-access eukaryotic genomes can be done with a group of algorithms (BLAST, BWA, PatternHunter, and other sequence alignment tools). BLAST (as mpiBLAST or ScalaBLAST) is one of the most frequently used tools in bioinformatics; the others are relatively new, fast, lightweight tools that align short sequences. Local installations of these algorithms typically cannot handle such problem sizes, so the procedure runs slowly, while web-based implementations cannot accept a high number of queries. The HP-SEE infrastructure gives access to massively parallel architectures, and the sequence alignment code is distributed free of charge for academia.
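To make "mapping short reads to a genome" concrete, here is a minimal seed-and-verify sketch. It is not DeepAligner's code and not how BLAST or BWA work internally; it is a simplified, swapped-in illustration that indexes the reference by k-mers and accepts a read placement if the number of mismatches stays under a threshold. All function names and the default threshold are assumptions made for this example.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (position, mismatches) hits for one read.

    Every k-mer of the read is used as a seed, looked up in the index, and the
    whole read is then compared against the reference at the implied offset.
    No gaps (insertions or deletions) are considered in this sketch.
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        for seed_pos in index.get(read[offset:offset + k], ()):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
exact_hits = map_read("TAGCCGAT", reference, index, k=4)    # exact copy of reference[8:16]
one_off_hits = map_read("TAGCTGAT", reference, index, k=4)  # one substituted base
```

Even this toy version shows why problem size matters: the index grows with the genome and every read triggers several lookups and comparisons, so millions of reads against a eukaryotic genome quickly exceed what a single workstation handles comfortably.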

Result
Online workflow-based short sequence alignment service

Impact
Freely available service/code for large-scale short sequence alignment

Collaborations
Hungarian Bioinformatics Association, Semmelweis University
HP-SEE infrastructure used: Hungarian HPC, NIIF’s supercomputing sites
DeepAligner: Deep sequencing for short fragment alignment (contd.)
- Small scale launch (home cluster): PBS/Linux cluster at the Obuda University, John von Neumann Faculty of Informatics.
- Activity and technical assistance in pre-production stage: technical assistance was provided by MTA SZTAKI and NIIF.
- Porting: the application was ported using Perl and C. The workflow and GUI for the application were created by Obuda University.
- Benchmarking: scaled from 32 cores to 96 cores (MPI).

DeepAligner Status
- The online service uses two sites of NIIF’s supercomputing infrastructure (Budapest and Szeged).
- Foreseen activities: optimization of the parameter assignments in the GUI, more scientific publications about short sequence alignment. Further scaling is planned, together with performance analysis.
- More information: http://hpseewiki.ipb.ac.rs/index.php/DeepAligner
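The benchmarking note (32 to 96 cores with MPI) and the planned performance analysis can be illustrated with two small helpers. This is a sketch under assumptions: the slides do not say how DeepAligner divides its work, so splitting the read set evenly across workers is shown only as one common approach, and no timings are given in the slides, so the efficiency helper is left without example measurements.

```python
def partition_reads(reads, n_workers):
    """Split reads into n_workers contiguous chunks whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    base, extra = divmod(len(reads), n_workers)
    chunks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional read each.
        size = base + (1 if rank < extra else 0)
        chunks.append(reads[start:start + size])
        start += size
    return chunks


def parallel_efficiency(t_base, t_scaled, cores_base, cores_scaled):
    """Speedup relative to the baseline run divided by the increase in core count.

    A value of 1.0 means the run time dropped in proportion to the added cores.
    """
    return (t_base / t_scaled) / (cores_scaled / cores_base)


# Hypothetical workload: 10 read identifiers spread over 3 workers.
chunk_sizes = [len(chunk) for chunk in partition_reads(list(range(10)), 3)]
```

With measured wall-clock times for the 32-core and 96-core runs, `parallel_efficiency(t32, t96, 32, 96)` would show how close the scaling comes to the ideal value of 1.0.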
Development & working on gUSE/WS-PGRADE
- Close collaboration and useful support (pros)
  - ARC middleware connector was developed from scratch by MTA SZTAKI on request
  - ASM and ARC submitter related bugs have been found and reported
  - Helpful and skilled support & development team
- Documentation: installation, development and upgrade (cons)
  - Hard to find information and hard to use it
    - Installation
    - Configuration
    - Upgrade
Future plans
- Additional plug-in like online bioinformatics services
  - More sequence alignment workflows
  - More multiple sequence alignment workflows
  - Sequence database quality measurement workflows
- Open up the gateway for users outside the SEE region

Thank you for your attention!
Questions?
gUSE/WS-PGRADE architecture
[Figure: gUSE/WS-PGRADE architecture diagram; recoverable labels: DeepAligner, DiseaseGene, ASM (Application Specific Module), WS-PGRADE]