Research Proposal
Parallel and Distributed Analytics Approaches on Big Data Clouds
JAMIA MILLIA ISLAMIA UNIVERSITY, NEW DELHI-110025
Submitted By:
Mahboob Alam
Submitted To:
Department of Computer Engineering
Jamia Millia Islamia University
Research Synopsis
Overview:
Big Data refers to data sets so large and complex that they are difficult to
process with traditional, sequential data processing applications. Data-intensive
parallel and distributed approaches are typically employed instead, such as the
MapReduce programming paradigm (e.g., Apache Hadoop). However, one of the
most interesting challenges lies not in the storage and management of the
data but in the insights and impact that analysis of the data can generate.
From this perspective, providing effective and efficient algorithms
and tools for Big Data analytics and mining is fundamental. The potential of
Big Data lies in our ability to provide business and the scientific
community with solutions based on the approach known as ‘data-driven discovery’.
The project will investigate, develop and test distributed formulations of data
mining algorithms that are suitable for parallel and distributed computing
paradigms. Depending on ongoing collaborations, the project may contribute to
multi-disciplinary applications for the analysis of very large data sets in one of the
following domains: Climate Science, Neuroscience, or Finance.
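To make the idea of a "distributed formulation" of a data mining algorithm
concrete, the following is a minimal sketch of one k-means iteration written
as a map step followed by a reduce step. k-means is chosen here only for
illustration; the proposal does not name a specific algorithm. All function
names are hypothetical, and the partitions are plain in-memory lists standing
in for data blocks that a framework such as Hadoop would distribute.

```python
from functools import reduce


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(partition, centroids):
    """Map step: runs independently on one data partition.

    Returns {centroid index: (coordinate sums, point count)} for that partition.
    """
    partial = {}
    for point in partition:
        k = nearest(point, centroids)
        sums, count = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([a + b for a, b in zip(sums, point)], count + 1)
    return partial


def merge(left, right):
    """Reduce step: combine the partial sums of two partitions."""
    out = dict(left)
    for k, (sums, count) in right.items():
        if k in out:
            sums0, count0 = out[k]
            out[k] = ([a + b for a, b in zip(sums0, sums)], count0 + count)
        else:
            out[k] = (sums, count)
    return out


def kmeans_step(partitions, centroids):
    """One k-means iteration expressed as map followed by reduce."""
    # Each map_partition call is independent, so it could run on a separate node.
    partials = [map_partition(p, centroids) for p in partitions]
    totals = reduce(merge, partials, {})
    # A centroid with no assigned points keeps its previous position.
    return [
        [s / totals[k][1] for s in totals[k][0]] if k in totals else list(c)
        for k, c in enumerate(centroids)
    ]
```

Only the small per-centroid sums and counts travel between nodes, not the raw
points, which is what makes this formulation attractive when the data is far
larger than the model.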
Computer simulation is widely used in scientific and industrial areas as
diverse as earth sciences, drug design, healthcare, and manufacturing.
The execution of simulation applications can be computationally demanding,
requiring special computing resources such as clusters, grids or clouds.
However, it is not only the execution of the simulation that raises technical
challenges. Simulation programs typically generate large amounts of data that
need to be processed and analyzed.
Objective:
The objective of the proposed PhD research is to design novel architectures and
algorithms to store, process and analyze large volumes of structured and
unstructured data. As a recent approach, cloud computing technologies have been
proposed to execute large-scale distributed simulation (e.g. in the CloudSME
European project). To demonstrate the interdisciplinary nature of the CloudSME
simulation platform (originally designed for manufacturing and engineering),
the proposed research will investigate how to apply these results to big
datasets, and how to extend them with cloud-based big data storage and
analytical tools.
Simulation analytics can be performed in areas such as biomolecular simulation,
financial data generation and health care. The research will focus on
the problem of storing, processing and retrieving meaningful insight
from petabytes of data. A multilayer architecture can be designed between the
data-generating sources and the end users, ensuring that each layer uses the
best-of-breed technology for its specific task.
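The layered idea can be sketched as a chain of interchangeable stages that
share one narrow interface, so that the technology behind any single layer can
be replaced without touching the others. The layer names (ingest, store,
analyse) and the record format below are assumptions made for illustration;
the proposal does not fix the number or the role of the layers, and the
in-memory list merely stands in for a distributed store.

```python
from typing import Callable, Iterable, List

Record = dict
# Every layer consumes a stream of records and produces a stream of records.
Layer = Callable[[Iterable[Record]], Iterable[Record]]


def build_pipeline(layers: List[Layer]) -> Layer:
    """Compose layers in order, from data sources towards end users."""
    def run(records: Iterable[Record]) -> Iterable[Record]:
        for layer in layers:
            records = layer(records)
        return records
    return run


def ingest(records: Iterable[Record]) -> Iterable[Record]:
    """Ingestion layer: drop records that lack the expected field."""
    return (r for r in records if "value" in r)


def store(records: Iterable[Record]) -> Iterable[Record]:
    """Storage layer: an in-memory list standing in for a distributed store."""
    return list(records)


def analyse(records: Iterable[Record]) -> Iterable[Record]:
    """Analytics layer: reduce the stored records to a summary for end users."""
    values = [r["value"] for r in records]
    mean = sum(values) / len(values) if values else None
    return [{"count": len(values), "mean": mean}]


pipeline = build_pipeline([ingest, store, analyse])
summary = list(pipeline([{"value": 2.0}, {"bad": 1}, {"value": 4.0}]))
```

Because each layer depends only on the shared record interface, replacing
`store` with a different backend would leave `ingest` and `analyse` unchanged.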
Recent publications relevant to the project
[1] T Kiss, P Borsody, G Terstyanszky, S Winter, P Greenwell, S McEldowney, H
Heindl: Large-scale virtual screening experiments on Windows Azure-based
cloud resources, Concurrency and Computation: Practice and Experience, DOI:
10.1002/cpe.3113, 2013.
[2] T Kiss, P Greenwell, H Heindl, G Terstyanszky, N Weingarten: Parameter
Sweep Workflows for Modelling Carbohydrate Recognition, Journal of Grid
Computing, Vol 8, No 4, pp 587-601, DOI: 10.1007/s10723-010-9166-8, 2010.
[3] S J E Taylor, T Kiss, G Terstyanszky, P Kacsuk, N Fantini: Cloud
Computing for Simulation in Manufacturing and Engineering: Introducing the
CloudSME Simulation Platform, to be published in the proceedings of ANSS 14.