PARMON: A Comprehensive Cluster Monitoring System
PARMON Team
Centre for Development of Advanced Computing, Bangalore, India
Contact: Rajkumar Buyya ([email protected])

Topics of Discussion
  PARMON System Model & Architecture
  PARMON Server
  PARMON Client
  PARMON Features and Services
  PARMON Installation and its Usage
  Monitoring with PARMON
  PARMON Integration with other products
  Conclusions and Future Directions

Motivations
  Workstation clusters have of late become a cost-effective solution for HPC.
  C-DAC's PARAM OpenFrame is a large cluster of more than 40 Ultra-4 workstations interconnected through low-latency, high-bandwidth communication networks.
  Monitoring such large systems is a tedious and challenging task, since a typical workstation is designed to work as a standalone system rather than as part of a workstation cluster.
  System administrators require tools to monitor such large systems effectively.
  PARMON provides a solution to this problem.

C-DAC HPCC Software Architecture
  APPLICATIONS
  SYSTEM MANAGEMENT TOOLS
  Parallel File System: C-PFS
  Development Tools: F90 IDE, DIVIA
  Languages: C, F77, F90
  Message Passing Interfaces: C-MPI, PVM
  Light Weight Protocols
  SOLARIS
  CLUSTER HARDWARE

PARMON - Salient Features
  Online creation of node and group databases
  Monitoring of system activities at the component, node, group, or entire-cluster level
  Designed using state-of-the-art Java technology
  Monitoring of system components: CPU, memory, disk, and network
  Monitoring of multiple instances of the same component
  Facility for defining events, with automatic notification
  Miscellaneous facilities: message broadcast, invocation of system management commands (halt, reboot, etc.), system information and configuration
  PARMON provides a GUI for initiating activities/requests and presents results graphically.
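
To make the monitoring levels listed above concrete, here is a minimal Java sketch of how readings could roll up from component instances to a node and then to a group or cluster, with a simple threshold check standing in for an event definition. Everything in it is an assumption made for illustration: the class MonitoringLevels, its Node type, the method names, and the sample values are hypothetical and are not taken from PARMON's code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names come from PARMON itself.
public class MonitoringLevels {

    /** One node's CPU readings, keyed by component instance (e.g. "cpu0", "cpu1"). */
    static final class Node {
        final String name;
        final Map<String, Double> cpuUsagePercent = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        /** Records a reading for one CPU instance; returns this node for chaining. */
        Node cpu(String instance, double percent) {
            cpuUsagePercent.put(instance, percent);
            return this;
        }

        /** Node level: mean over all CPU instances on this node (0.0 if none). */
        double averageCpu() {
            return cpuUsagePercent.values().stream()
                    .mapToDouble(Double::doubleValue)
                    .average()
                    .orElse(0.0);
        }
    }

    /** Group or cluster level: mean of the node-level averages (0.0 if empty). */
    static double averageCpu(List<Node> nodes) {
        return nodes.stream().mapToDouble(Node::averageCpu).average().orElse(0.0);
    }

    /** Assumed event rule: report nodes whose average CPU usage exceeds the threshold. */
    static List<String> nodesOverThreshold(List<Node> nodes, double thresholdPercent) {
        return nodes.stream()
                .filter(n -> n.averageCpu() > thresholdPercent)
                .map(n -> n.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Node node1 = new Node("node1").cpu("cpu0", 80.0).cpu("cpu1", 100.0);
        Node node2 = new Node("node2").cpu("cpu0", 20.0).cpu("cpu1", 40.0);
        List<Node> group = Arrays.asList(node1, node2);

        System.out.println("component: node1/cpu0 = " + node1.cpuUsagePercent.get("cpu0"));
        System.out.println("node:      node1 average = " + node1.averageCpu());
        System.out.println("group:     average = " + averageCpu(group));
        System.out.println("event:     over 75% = " + nodesOverThreshold(group, 75.0));
    }
}
```

In this sketch a group and the whole cluster are both just lists of nodes, so the same aggregation serves both levels. A real monitor would also need to decide how to weight nodes that have different numbers of CPUs.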

PARMON System Model
  PARMON Client on JVM (parmon)
  PARMON Server on Solaris Node (parmond)
  PARMON High-Speed Switch

PARMON Implementation
  Server
    Multithreaded using POSIX and Solaris threads
    Developed in C, as it needs to access system internals
    It is a stateless server
  Client
    Developed in Java; Java features are used extensively
    A new window is created for each client request and interacts with the server
    Threads are used extensively while creating online resource utilization meters
    Reconfigures dynamically when the node database changes

Setting up of PARMON
  Server installation & invocation
    Binding to port
    Rights (requires root permission for full functionality)
    parmond or parmond <port-no> (either at boot time or on-line)
    Needs to be loaded on all nodes to be monitored
  Client installation & invocation
    Java-based client (client machine can be a PC/workstation supporting a JVM)
    CLASSPATH (pointing to classes.zip, parmon.jar)
    jar file (parmon.jar)
    java parmon or java parmon <port-no>

Monitoring System Activities and Resource Utilization
  PARMON Launcher
  Creation of Node Database
  Node Deletion
  Group Creation
  Group Modification/Deletion
  Resource Utilization at a Glance
  Selection of Nodes/Group
  CPU Usage Monitoring
  Memory Usage Monitoring
  Disk/Network Usage Monitoring
  Message Viewer (System logs)
  Process Activities
  Kernel Data Catalog - CPU
  Kernel Data Catalog - Memory
  Kernel Data Catalog - Disk
  Kernel Data Catalog - Network
  Catalog of CPU Parameters
  Component View - Physical
  Component View - Logical
  Message Broadcast
  System Configuration
  System Information
  Issuing Commands: halt, shutdown, etc.
  Node Diagnostics - Online (SunVTS)
  Online Help

PARMON Integration with other Products
  PARMON can send resource utilization information to any other product if protocols are made available.
  Diagram labels: Node 1, parmond, Node N, PARAM online bulletin board

Conclusions and Future Directions
  PARMON has been used successfully to monitor the PARAM OpenFrame supercomputer, a cluster of 48 Ultra-4 workstations running the Sun Solaris operating system.
  Portable across platforms supporting Java
  Comprehensive monitoring support and GUI
  PARMON supports Solaris and Linux clusters; support for NT clusters is planned.
  Can easily be extended to support web-based monitoring of clusters by creating an interface server (running on a web server) between the client and the PARMON servers running on the cluster nodes.

Thank You