Performance Comparison of Clustered Systems
Yugandhar Maram, #91527748
Anjana Vadivel, #78563168
Stuthi Balaji, #34682837

OUTLINE
* Motivation/Goals
* System architecture/tools used/Software integrated
* Related work and efforts
* Validation/Evaluation Results

Motivation and Goals
* To study the architecture of widely used distributed systems and familiarise ourselves with Hadoop, Spark, and the Google File System.
* To analyze the performance of these distributed systems under high workloads.
* Hive DB and SparkSQL

System Architecture
* Hadoop cluster with the database distributed across nodes.
* Hive (issuing SQL queries to the Hadoop distributed system).
* Spark cluster using HDFS.
* SparkSQL (issuing SQL queries to the Spark distributed system).

Tools used/Software Integrated
* Hadoop and Spark, with Hive and SparkSQL atop those systems, respectively.
* TPC-H benchmark data for load generation.
* DBGen

Related work and efforts (cont.)
* Set up the Hadoop and Spark environments, along with Hive and SparkSQL databases of size 30 GB, on the cluster.
* Issued TPC-H benchmark SQL queries to the Hive and SparkSQL databases; each query runs against data spread across the nodes of the systems.

Hive Query Results

THANK YOU!!
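
Backup: query-timing sketch

The step of issuing the same benchmark queries to both Hive and SparkSQL and comparing them could be organised roughly as below. This is a minimal sketch, not the harness used in the project: the function names `time_query` and `compare_engines`, the choice of median wall-clock time as the metric, and the placeholder query are all assumptions made for illustration. The two `fake_*` executors are stand-ins so the sketch runs without a cluster.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[str], object], sql: str,
               repeats: int = 3) -> List[float]:
    """Run `sql` through `run_query` `repeats` times.

    Returns the wall-clock duration of each run in seconds.
    `run_query` is a hypothetical callable that must block until the
    engine has returned the full result.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)
        timings.append(time.perf_counter() - start)
    return timings


def compare_engines(engines: Dict[str, Callable[[str], object]],
                    queries: Dict[str, str],
                    repeats: int = 3) -> Dict[str, Dict[str, float]]:
    """Return {query_name: {engine_name: median_seconds}}."""
    results: Dict[str, Dict[str, float]] = {}
    for query_name, sql in queries.items():
        results[query_name] = {
            engine_name: statistics.median(time_query(run, sql, repeats))
            for engine_name, run in engines.items()
        }
    return results


if __name__ == "__main__":
    # Stand-in executors (assumption): they only sleep, they do not
    # contact Hive or Spark, so the numbers printed mean nothing.
    def fake_hive(sql: str) -> None:
        time.sleep(0.02)

    def fake_spark(sql: str) -> None:
        time.sleep(0.01)

    # Placeholder query against the TPC-H `lineitem` table; it is not
    # one of the official TPC-H benchmark queries.
    queries = {"count_lineitem": "SELECT COUNT(*) FROM lineitem"}

    engines = {"hive": fake_hive, "sparksql": fake_spark}
    for name, per_engine in compare_engines(engines, queries).items():
        print(name, per_engine)
```

On a real cluster, the SparkSQL executor could wrap something like `spark.sql(sql).collect()` on a PySpark `SparkSession`, and the Hive executor could wrap whichever Hive client is available; both must fetch all rows so that the timing covers the full query rather than just its submission.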