Performance Comparison of Clustered Systems
Yugandhar Maram, #91527748
Anjana Vadivel, #78563168
Stuthi Balaji, #34682837

Motivation and Goals
The primary motivation is to study the architecture of distributed systems and understand typical issues that arise in middleware.
The aim of this project is to analyze the performance of various distributed systems and explain the results.
We are currently considering Hadoop and Spark as our target distributed environments.

Implementation Details
To analyze both systems, we use Hive, which runs on top of each of them.
We issue TPC-H benchmark SQL queries to Hive, which queries a database many gigabytes in size that is spread across the cluster nodes.
Hive translates the SQL queries into Hadoop (MapReduce) or Spark jobs, which are then executed in a distributed manner.

Implementation Details (cont.)
Next, we will analyze the performance of these systems based on query latency, that is, the time taken to produce the required results.
We will use the differences in the systems' architectures to explain the results of these queries.
We will also analyze and explain the performance of the same systems on databases of different sizes.

Related Work/Progress
We have set up a Hadoop environment on our local machines and successfully run MapReduce programs.
We have also set up Hive on top of Hadoop and run sample queries to verify that it works correctly.
We have studied the architecture of Hadoop, the Google File System, and other topics relevant to this project.

Evaluation Plans
This week, we will begin the next phase by running TPC-H queries on Hive.
Once we are familiar with the whole setup on our local machines, we will start the actual analysis on the cluster nodes.
Finally, we will explain the results and present our analysis.
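
The Implementation Details slides describe Hive compiling each TPC-H query into distributed jobs and comparing the systems by query latency. As a rough illustration only, the Python sketch below simulates, in a single process, how a GROUP BY aggregation in the style of TPC-H Q1 decomposes into map, shuffle, and reduce phases, plus a simple latency harness. The toy row layout and the helper names (map_phase, shuffle, reduce_phase, run_query, median_latency) are hypothetical. This is not Hive's actual query compiler; a real cluster would run map tasks in parallel and partition keys across reducer nodes.

```python
import time
from collections import defaultdict
from statistics import median
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical toy schema: (l_returnflag, l_linestatus, l_quantity).
# The real TPC-H lineitem table has many more columns.
Row = Tuple[str, str, float]
Key = Tuple[str, str]


def map_phase(rows: Iterable[Row]) -> List[Tuple[Key, float]]:
    """Mapper: emit a (group key, value) pair for every input row."""
    return [((flag, status), qty) for flag, status, qty in rows]


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Shuffle: collect all values sharing a key (one process stands in for the network)."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Key, List[float]]) -> Dict[Key, float]:
    """Reducer: apply SUM(l_quantity) to each group."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(splits: List[List[Row]]) -> Dict[Key, float]:
    """Roughly: SELECT l_returnflag, l_linestatus, SUM(l_quantity)
    FROM lineitem GROUP BY l_returnflag, l_linestatus.

    Each inner list plays the role of one input split stored on a different node.
    """
    pairs: List[Tuple[Key, float]] = []
    for split in splits:  # on a real cluster these map tasks would run in parallel
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))


def median_latency(query: Callable[[], object], runs: int = 5) -> float:
    """Run a query `runs` times and return the median wall-clock latency in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        timings.append(time.perf_counter() - start)
    return median(timings)


if __name__ == "__main__":
    splits = [
        [("A", "F", 17.0), ("N", "O", 36.0)],
        [("A", "F", 8.0), ("R", "F", 28.0)],
    ]
    print(run_query(splits))
    # {('A', 'F'): 25.0, ('N', 'O'): 36.0, ('R', 'F'): 28.0}
    print(f"median latency: {median_latency(lambda: run_query(splits)):.6f} s")
```

Calling median_latency on synthetic inputs of increasing size would mirror, in miniature, the planned comparison across database sizes. On the actual cluster, the timed callable would instead submit the TPC-H query to Hive, so the measured latency would include job scheduling and network shuffle costs that this single-process sketch omits.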
References    http://bradhedlund.com/2011/09/10/understandi ng-hadoop-clusters-and-the-network/#download The Google File System by Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Hadoop Distributed File System: Architecture and Design by Dhruba Borthakur.
