Performance Comparison of
Clustered Systems
Yugandhar Maram, #91527748
Anjana Vadivel, #78563168
Stuthi Balaji, #34682837
Motivation and Goals



The primary motivation is to study the
architecture of distributed systems and
understand the typical issues that arise in
middleware.
The aim of this project is to analyze the
performance of several distributed systems and
explain the results.
We are currently considering Hadoop and
Spark as our target distributed
environments.
Implementation Details



To analyze both systems, we are using Hive,
which runs on top of each of them.
We issue TPC-H benchmark SQL queries to
Hive against a database of many GBs that is
spread across the nodes of each system.
Hive translates the SQL queries into
Hadoop/Spark jobs, which are then executed
in a distributed manner.
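
As a conceptual illustration of the translation step above, the sketch below shows how a simple grouped aggregate can be expressed as map, shuffle, and reduce phases. It is plain single-process Python, not Hive's actual query planner. The rows are made-up sample data, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_group_by_sum`) are hypothetical. Only the column names `l_returnflag` and `l_quantity` come from the TPC-H `lineitem` table.

```python
from collections import defaultdict

# Query being illustrated:
#   SELECT l_returnflag, SUM(l_quantity) FROM lineitem GROUP BY l_returnflag;
# Made-up rows standing in for a tiny slice of the lineitem table.
ROWS = [
    {"l_returnflag": "A", "l_quantity": 17},
    {"l_returnflag": "N", "l_quantity": 36},
    {"l_returnflag": "A", "l_quantity": 8},
    {"l_returnflag": "R", "l_quantity": 28},
    {"l_returnflag": "N", "l_quantity": 24},
]


def map_phase(rows):
    """Emit one (key, value) pair per input row, as a mapper would."""
    for row in rows:
        yield row["l_returnflag"], row["l_quantity"]


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate the values of each key; here the aggregate is SUM."""
    return {key: sum(values) for key, values in groups.items()}


def run_group_by_sum(rows):
    """Chain the three phases for the GROUP BY / SUM query above."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_group_by_sum(ROWS))
```

On a real cluster the map and reduce phases run in parallel on different nodes and the shuffle moves data over the network, which is one place where the architectures under comparison may differ.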
Implementation Details (cont.)



We will then analyze the performance of these
systems based on the latency to produce the
required results.
We will compare the architectural differences
between the systems to explain the results of the
above queries.
We will also analyze the performance of the same
systems with databases of different sizes.
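
The latency comparison described above could be organized along the lines of the following sketch. It assumes each system is reachable through some callable that runs one query and blocks until the results are ready. The names `measure_latency`, `summarize`, and `compare` are hypothetical, and the demo uses sleeping stand-ins instead of a real Hive client, so the numbers it prints say nothing about Hadoop or Spark.

```python
import statistics
import time


def measure_latency(run_query, query, repeats=3):
    """Return wall-clock latencies in seconds for `repeats` runs of one query."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)  # assumed to block until the results are available
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to min, median, and max."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


def compare(engines, queries, repeats=3):
    """Time every query on every engine.

    engines: dict mapping an engine name to a callable taking a query string.
    queries: dict mapping a query name to its SQL text.
    Returns a dict keyed by (engine name, query name).
    """
    results = {}
    for engine_name, run_query in engines.items():
        for query_name, query in queries.items():
            latencies = measure_latency(run_query, query, repeats)
            results[(engine_name, query_name)] = summarize(latencies)
    return results


if __name__ == "__main__":
    # Stand-in engines that only sleep; replace with real query submission.
    def fake_engine(delay_seconds):
        def run(query):
            time.sleep(delay_seconds)
        return run

    engines = {"engine_a": fake_engine(0.02), "engine_b": fake_engine(0.01)}
    queries = {"q1": "SELECT 1"}
    for key, summary in compare(engines, queries).items():
        print(key, summary)
```

Rerunning the same comparison once per database size would give the size-versus-latency data for each system.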
Related Work/Progress



We have set up a Hadoop environment on our
local machines and successfully run MapReduce
programs.
We have also set up Hive on top of Hadoop and
run sample queries to verify that it works
correctly.
We have studied the architecture of Hadoop
distributed systems, the Google File System, and
other topics relevant to this project.
Evaluation Plans



This week, we will start the next phase:
running TPC-H queries on Hive.
Once we are familiar with the whole setup on
our local systems, we will start the actual analysis
on cluster nodes.
Then we will start the last phase: explaining the
results and presenting our analysis.
References



http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/#download
The Google File System by Sanjay Ghemawat,
Howard Gobioff, and Shun-Tak Leung.
The Hadoop Distributed File System:
Architecture and Design by Dhruba Borthakur.