LoadAtomizer: A Locality and I/O Load aware Task Scheduler for MapReduce
M. Asahara, S. Nakadai, and T. Araki
In Proc. of the IEEE 4th International Conference on Cloud Computing Technology and Science, 2012
Presented 2015.11.26, Network & Database Lab., 김민수

Index
• Background
• Introduction
• LoadAtomizer
• Simulation and Evaluation
• Conclusions and Future Work

Background - Hadoop
• Hadoop = HDFS + MapReduce
  - HDFS (Hadoop Distributed File System): the storage layer and its components
  - MapReduce: the data-processing layer and its components
• Hadoop workflow

Introduction
• Data-intensive computing suffers from I/O bottlenecks
• Data-locality-based task scheduling policy: place a computing task near its input data
• Data locality alone is not good enough
  - A task can still stall on heavy I/O load generated by other tasks

LoadAtomizer - Motivation
• When all jobs running on a cluster have the same or similar I/O characteristics, locality-aware task schedulers work effectively
• When the I/O characteristics of jobs differ from each other, locality-aware task schedulers do not always work well
• If the schedulers were also aware of the I/O loads of the storages and the network, they could steer tasks away from contended resources
• Example: two MapReduce jobs on a cluster with eight slaves
  - TeraSort job: slave s1 runs a reduce task (shuffle and merge phase)
  - Grep job: slave s2 runs a map task
  - Compare the resulting I/O loads and network traffic

LoadAtomizer - Design Issues
• I/O load aware task assignment
• I/O load aware storage selection
• Network load aware scheduling
• Locality awareness

LoadAtomizer - Maintaining I/O Load Information
• Collects I/O load information from the slaves and the network switches
• Classifies the state of each storage and network link as heavily loaded or lightly loaded
• Organizes the states in a topology-aware load tree

LoadAtomizer - Storage Selection and Task Scheduling
• The job scheduling policy is independent of the locality and I/O load aware task scheduling, so other job schedulers can be combined with it
• Each map task chooses a lightly loaded storage reachable through a lightly loaded network path
• The algorithm quickly chooses a lightly loaded storage that meets three conditions:
  - the storage stores input data of at least one map task
  - the slave can connect to the storage through a lightly loaded network path
  - the storage is closer to the slave than the other candidates

Implementation
• Prototype of LoadAtomizer built into Apache Hadoop 0.23.1 on Linux 2.6.26
• Modified some modules of the Hadoop MapReduce framework and HDFS so that a slave can be commanded to read from the storage selected by LoadAtomizer
• Storage monitor: /proc/diskstats
• Network monitor: /proc/net/dev
• Uses a threshold approach to decide whether a resource has become heavily or lightly loaded

Evaluation Setup
• I/O-heavy job: TeraSort (40 GB)
• I/O-light jobs: grep (64 GB), word count (32 GB)
• Slaves: 2 GHz quad-core CPU, 12 GB RAM, 300 GB 10k-rpm SAS disk, gigabit Ethernet port
• Storage threshold: 100 I/Os; network threshold: 80 Mbps
• HDFS block size: 256 MB; three replicas per block
• Eight reduce tasks; 1 GB memory allocated to each map and reduce task
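The map/shuffle/reduce phases summarized in the Background section can be illustrated with a minimal word-count sketch in plain Python (illustrative only; real Hadoop distributes each phase across HDFS-hosted slaves, and none of these function names come from the Hadoop API):

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group the intermediate values by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the values collected for each key.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle(map_phase("to be or not to be")))
```

In Hadoop the shuffle step is where reduce tasks pull intermediate data over the network, which is exactly the phase the motivation example (TeraSort on s1) highlights as an I/O load source.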
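One way to picture the topology-aware load tree is a tree whose leaves are slave storages and whose inner nodes are network switches, each carrying a heavily/lightly-loaded flag; a path is lightly loaded when no switch on the route between two leaves is heavily loaded. The class and helper below are an illustrative sketch, not the paper's data structure:

```python
class LoadNode:
    """Node of a hypothetical topology-aware load tree.

    Leaves represent slave storages, inner nodes represent switches;
    'loaded' marks the node as heavily loaded."""

    def __init__(self, name, loaded=False, children=()):
        self.name = name
        self.loaded = loaded
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def path_to_root(self):
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return path

def path_lightly_loaded(slave, storage):
    # The route between two leaves runs up to their lowest common
    # ancestor; it is lightly loaded if no switch on it is loaded.
    up_a = slave.path_to_root()
    up_b = storage.path_to_root()
    common = next(n for n in up_a if n in up_b)
    route = up_a[:up_a.index(common) + 1] + up_b[:up_b.index(common)]
    switches = [n for n in route if n.children]  # inner nodes only
    return not any(n.loaded for n in switches)

# Example cluster: two racks under a core switch; rack2 is congested.
s1 = LoadNode("s1"); s2 = LoadNode("s2"); s3 = LoadNode("s3")
rack1 = LoadNode("rack1-switch", children=[s1, s2])
rack2 = LoadNode("rack2-switch", loaded=True, children=[s3])
root = LoadNode("core-switch", children=[rack1, rack2])
```

With this tree, s1 can reach s2 over a lightly loaded path (only rack1-switch is on the route), but not s3 (the route crosses the loaded rack2-switch).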
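The three-condition storage selection can be sketched as a filter-then-pick step, assuming each candidate replica location knows its load state, the network-path state toward the scheduling slave, and its topological distance (all dictionary keys here are hypothetical, not the paper's interfaces):

```python
def choose_storage(candidates):
    """Pick a storage for a map task per the three conditions above.

    Each candidate is a dict with (hypothetical) keys:
      'has_input'  - stores input data of at least one map task
      'storage_ok' - the storage itself is lightly loaded
      'path_ok'    - the slave reaches it over a lightly loaded path
      'distance'   - topological distance from the slave (smaller = closer)
    Returns the closest candidate meeting all conditions, or None.
    """
    eligible = [c for c in candidates
                if c["has_input"] and c["storage_ok"] and c["path_ok"]]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["distance"])

# Example: the local replica is overloaded, so the scheduler falls
# back to the closest lightly loaded replica (same rack).
candidates = [
    {"name": "local-disk",  "has_input": True, "storage_ok": False,
     "path_ok": True, "distance": 0},
    {"name": "same-rack",   "has_input": True, "storage_ok": True,
     "path_ok": True, "distance": 1},
    {"name": "remote-rack", "has_input": True, "storage_ok": True,
     "path_ok": True, "distance": 2},
]
picked = choose_storage(candidates)
```

This mirrors the trade-off the paper motivates: strict locality would pick the local disk, while load awareness prefers the nearest uncongested replica.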
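The threshold-based classification behind the /proc monitors can be sketched as follows. The /proc/diskstats and /proc/net/dev line formats are real Linux interfaces, but the counters chosen and the classification logic are an illustrative guess at the paper's monitor, reusing its 100-I/Os and 80-Mbps thresholds:

```python
IO_THRESHOLD = 100        # paper's storage threshold (I/Os)
NET_THRESHOLD_MBPS = 80   # paper's network threshold (Mbps)

def disk_ios_in_flight(diskstats_line):
    # /proc/diskstats: "major minor name" then 11+ counters; the 9th
    # counter after the device name is the number of I/Os in progress.
    fields = diskstats_line.split()
    return int(fields[3 + 8])

def link_mbps(rx_bytes_delta, tx_bytes_delta, interval_s):
    # /proc/net/dev exposes cumulative per-interface byte counters;
    # sampling twice and differencing yields throughput in Mbps.
    return (rx_bytes_delta + tx_bytes_delta) * 8 / interval_s / 1e6

def heavily_loaded(ios, mbps):
    # A slave is heavily loaded if either resource exceeds its threshold.
    return ios > IO_THRESHOLD or mbps > NET_THRESHOLD_MBPS

# Sample /proc/diskstats line: 150 I/Os currently in progress on sda.
sample = "8 0 sda 120 30 9000 500 80 10 7000 400 150 900 1300"
```

Sampling both files periodically and comparing against the thresholds is enough to drive the heavily/lightly-loaded flags in the load tree.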