Jiyeon Kim, Yongkwan Park, Sungjoo Kwon, Jaeyoung Choi
{heaven, psiver, lithmmon}@ss.ssu.ac.kr, [email protected]
School of Computing, Soongsil University
1-1, Sangdo-Dong, Dongjak-Ku, Seoul 156-743, Korea
2017-05-25

Motivation
- Linux cluster systems are widely used for high-performance computing
  - They emphasize the use of commodity hardware and open source software
  - They deliver very high performance at extremely low cost
- System management is a challenging task. It calls for:
  - Automatic and convenient installation of OS and application software packages
  - An effective way to navigate and interact with cluster components
  - Mechanisms and tools to perform collective commands
  - Services such as monitoring, fault detection, and recovery

What is CATS-i?
- Cluster Administration ToolS on the Internet
- A collection of system management tools
- Provides automatic and convenient installation of OS and application software packages
- Provides efficient monitoring and management of cluster nodes with simple operations over the Internet
- Provides an easy-to-use GUI for PBS
- Easy-to-install CATS-i RPM package

CATS-i System Architecture
- Client daemon: gets system information from the local OS on each node
- Server daemon: runs on the server node and collects information from the client daemons
- Setup tool and management tool: implemented in Java, with Internet support
[Figure: the setup tool and the management tool connect to the server daemon and its repository; the server daemon connects to the client daemons]

Difference with CATS-i
- NodeCloner (CACR at Caltech)
  - Makes all nodes identical using BOOTP and NFS
  - Does not provide a GUI
  - The setup files related to NodeCloner must be edited by hand
- Beoboot
- Rembo Technology SaRL, Switzerland
  - Boot-ROM booting using DHCP
  - Uses a batch file interpreter
  - Drawback: batch files must be written, and the interface is difficult

Difference with CATS-i
- LUI (Linux Utility for cluster Installation), IBM
  - Supports the BOOTP protocol, using DHCP and PXE
  - GUI interface
  - Heterogeneous clusters
  - Resource objects must be defined
  - Uses TFTP: as the number of nodes increases, the I/O load increases
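The client daemon / server daemon split on the architecture slide can be pictured with a small sketch. This is only an illustration under assumptions, not the CATS-i implementation (which the slides say is written in Java): the class names, the `probe` hook, and the report fields are hypothetical, and network transport is replaced by direct method calls.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class NodeReport:
    """One status report from a client daemon (fields are illustrative)."""
    node: str
    cpu_percent: float
    mem_percent: float
    timestamp: float = field(default_factory=time.time)


class ClientDaemon:
    """Runs on each node and reads system information from the local OS."""

    def __init__(self, node: str, probe: Callable[[], Dict[str, float]]):
        self.node = node
        self.probe = probe  # hypothetical hook that queries the local OS

    def collect(self) -> NodeReport:
        sample = self.probe()
        return NodeReport(self.node, sample["cpu"], sample["mem"])


class ServerDaemon:
    """Runs on the server node and keeps the latest report for every node."""

    def __init__(self) -> None:
        self.repository: Dict[str, NodeReport] = {}

    def receive(self, report: NodeReport) -> None:
        self.repository[report.node] = report

    def poll(self, clients: List[ClientDaemon]) -> None:
        # In a real deployment this would be a network exchange.
        for client in clients:
            self.receive(client.collect())


# Two simulated nodes with fixed readings instead of real OS queries.
clients = [
    ClientDaemon("node1", lambda: {"cpu": 12.5, "mem": 40.0}),
    ClientDaemon("node2", lambda: {"cpu": 87.0, "mem": 63.5}),
]
server = ServerDaemon()
server.poll(clients)
```

The setup and management tools would then read from `server.repository` rather than contacting each node themselves, which matches the single collection point shown in the figure.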
Installation using IP Multicasting
- Provides the same installation speed while reducing I/O load
- The multicast client module is provided automatically through NFS
- The server sends the slave node disk image to a class D IP address
- Timeout and retransmission make up for the unreliability of UDP

Setup tools with IP multicasting
[Figure: the master node (network configuration info, node DB, GUI, error/flow control, multicast server module) sends over UDP to a class D IP address (224.0.0.0 ~ 239.255.255.255), reaching Node 1, Node 2, Node 3, ..., Node N]

Setup tool in the CATS-i
- Disk cloning using NFS
  - A slave node must be booted with a DHCP- and NFS-enabled kernel
  - It boots the same way as a diskless terminal using DHCP
  - It makes a disk image of the slave node, including hard disk information
  - It stores the slave node disk image on the server disk

OS Setup tools Architecture - Disk cloning
[Figure: interface, master node, slave node. Components shown: DHCP server, mode change, boot kernel image, init program, backup wizard, client program, boot disk management daemon, NFS client, NFS server, image file, lock management, disk config, low disk input, hard disk. Labeled steps: 1. Start, 2. Command, 3. Booting, 4. IP info, 5. Query, 7. Mode, 8. Operation, 11. Partition info, 15. Result]
- Disk cloning preparation: steps 1, 2, 3
- Command operation: steps 4, 5, 6, 7, 8
- Make disk image: steps 9, 10, 11, 12, 13
- Save disk image: steps 14, 15

OS Setup tools Architecture - Installation
[Figure: interface, master node, slave node. Components shown: DHCP server, mode change, boot kernel image, init program, restore wizard, client program. Labeled steps: 1. Start, 2. Command, 4. IP info, 5. Query, 7. Mode, 8. Operation, 8. Connect, 15. Result]
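The multicast installation idea (one datagram reaches every slave node, with timeout and retransmission covering UDP loss) can be sketched as a small simulation. This is a sketch under stated assumptions, not the CATS-i protocol: the chunk numbering, the `Receiver` class, the loss model (each listed chunk is lost once), and the round limit are all hypothetical, and no real sockets are opened.

```python
import ipaddress

CHUNK_SIZE = 4  # bytes per datagram; tiny so the example is easy to follow


def is_class_d(address: str) -> bool:
    """True if the IPv4 address lies in 224.0.0.0 ~ 239.255.255.255."""
    return ipaddress.IPv4Address(address).is_multicast


def split_image(image: bytes, size: int = CHUNK_SIZE) -> dict:
    """Number each chunk so receivers can tell which ones they missed."""
    return {seq: image[i:i + size]
            for seq, i in enumerate(range(0, len(image), size))}


class Receiver:
    """Simulated slave node that loses some chunks on first delivery."""

    def __init__(self, total_chunks: int, drop: set):
        self.total = total_chunks
        self.drop = set(drop)  # sequence numbers lost the first time
        self.chunks = {}

    def deliver(self, seq: int, data: bytes) -> None:
        if seq in self.drop:
            self.drop.discard(seq)  # lost once; a resend will get through
            return
        self.chunks[seq] = data

    def missing(self) -> list:
        return [s for s in range(self.total) if s not in self.chunks]

    def image(self) -> bytes:
        return b"".join(self.chunks[s] for s in range(self.total))


def multicast_install(image: bytes, receivers: list, max_rounds: int = 5) -> int:
    """Send every chunk once, then resend only what some receiver lacks."""
    chunks = split_image(image)
    pending = sorted(chunks)
    datagrams_sent = 0
    for _ in range(max_rounds):
        for seq in pending:
            datagrams_sent += 1  # one datagram reaches all receivers
            for receiver in receivers:
                receiver.deliver(seq, chunks[seq])
        # After a timeout, each receiver reports the chunks it still lacks.
        pending = sorted({s for r in receivers for s in r.missing()})
        if not pending:
            return datagrams_sent
    raise TimeoutError(f"receivers still missing chunks: {pending}")


image = b"disk-image-bytes"  # 16 bytes, so 4 chunks
total = len(split_image(image))
nodes = [Receiver(total, {1}), Receiver(total, {1, 3}), Receiver(total, set())]
sent = multicast_install(image, nodes)
```

In this run the sender emits 6 datagrams (4 in the first round, 2 resent), whereas sending the image to each of the three nodes separately would take at least 12. That is the sense in which multicasting keeps the server's I/O load from growing with the number of nodes.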
Installation steps (also shown in the figure: boot disk management daemon, multicast client, lock management, server, sender, image file, disk config, format, low disk output, hard disk; labeled steps 3. Booting and 9. Start command)
- Installation preparation: steps 1, 2, 3
- Command operation: steps 4, 5, 6, 7, 8, 9
- Installation: steps 10, 11, 12, 13, 14

OS Setup tools
[Screenshots: slave node, master node]

Related works for CMS - VACM
- VA Linux Systems
- Cluster administration tool that runs on VA Linux
- Reports real-time hardware sensor data such as temperature, fan speed, and voltage

Related works for CMS - MAT
- Ryerson University, Canada
- Implemented with Tcl/Tk
- Incurs a lot of overhead when displaying rapidly changing data
- Manages each node individually
- Mainly monitors system files

Related works for CMS - SCMS
- Kasetsart University
- Consists of a real-time monitoring system, parallel Unix commands, and numerous system administration utilities
- Supports a Java applet to report real-time system information
- Supports a 3D interface using VRML

Related works for CMS - M3C
- Oak Ridge National Lab
- Implemented with Java
- Users can manage multiple cluster groups in one interface
- Supports job scheduling and software installation

Management tools in the CATS-i
- The management tool offers maintenance of cluster nodes.
- Characteristics of the management tool
  - Many nodes can be bound into one cluster group, and multiple cluster groups can be managed in one place.
  - The same operation can be applied efficiently to all or selected nodes.
  - It offers users real-time monitoring of resource information such as CPU and memory.
  - The console, implemented in Java, is interactive and easy to use.
  - Job scheduling uses JPBS through the Internet.
- CATS-i offers many resource-related functions.
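Applying the same operation to all or selected nodes of a cluster group can be illustrated as follows. This is a minimal sketch, assuming a hypothetical `runner` callback that stands in for whatever remote call reaches a node; the group names and node names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical cluster groups; each group binds several nodes together.
groups: Dict[str, List[str]] = {
    "compute": ["node1", "node2", "node3"],
    "storage": ["node4"],
}


def run_on_nodes(command: str,
                 group: str,
                 runner: Callable[[str, str], str],
                 selected: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Apply one command to all nodes of a group, or to a selected subset."""
    nodes = groups[group]
    if selected is not None:
        wanted = set(selected)
        nodes = [n for n in nodes if n in wanted]
    # Run the command on every node in parallel and keep results per node.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = pool.map(lambda n: runner(n, command), nodes)
        return dict(zip(nodes, results))


def fake_runner(node: str, command: str) -> str:
    """Stand-in for a real remote call to the node."""
    return f"{node}: ran '{command}'"


everything = run_on_nodes("uptime", "compute", fake_runner)
subset = run_on_nodes("uptime", "compute", fake_runner, selected=["node2"])
```

Passing the runner as a parameter keeps the group and selection logic separate from the transport, so the same function could sit behind a reboot, a package install, or a file operation.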
CATS-i function
- Node status: CPU, memory, process, user list, account
- Disk space
- File management
- Alarm
- System log
- Shutdown/Reboot
- Package management
- JPBS

Management tools - Node status
- Shows node information for each group
- Real-time information about CPU and memory
[Screenshot: total view]

Management tools - Node status
- Enables users to monitor resource information of cluster nodes: CPU, memory, accounts, users, real-time CPU and memory usage, and processes, and to manage basic information
[Screenshots: Performance, Process, Disk, Account, User List]

Management tools - File management
- Provides file management functions for a cluster group
- Very easy to use: to perform a file-related job, users right-click to show a pop-up menu
[Screenshot: File Management]

Management tools - Alarm function
- Monitors important system parameters: processor utilization, memory usage, etc.
- Notifications about system functions are sent by e-mail

Management tools - System log
- Log information is very useful in various situations
- The server daemon collects log information from each node
[Screenshot: Log Tree]

Management tools - RPM package
- Users can install, remove, and upgrade application packages with the management tool, and query installed RPMs
- Supports Red Hat Linux
- Implemented with a thread library
[Screenshot: Option Dialog]

Management tools - PBS Interface
- Enables users to use a general PBS through the same CATS-i interface
[Screenshots: JPBS job submission dialog, main screen]

Conclusion & Future works
- CATS-i will offer more functions, such as:
  - Status of CPU temperature, voltage, and speed
  - Extended aggregation of services
    - Statistical memory and CPU information for each user
    - Statistical information can be displayed graphically
    - Network monitoring using SNMP, and network analysis to detect network bottlenecks of clusters
  - Enhanced alarm services
    - The administrator can specify the alarm conditions and the actions to be taken
    - In an emergency, CATS-i can shut down or reboot cluster nodes
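The enhanced alarm service, where the administrator specifies a condition and the action to take, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the rule names, the threshold values, the metric keys, and the e-mail addresses are hypothetical, and the message is only built, not sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, Dict, List, Tuple


@dataclass
class AlarmRule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # when the alarm fires
    action: str                                    # "notify", "reboot", "shutdown"


def evaluate(metrics: Dict[str, float],
             rules: List[AlarmRule]) -> List[Tuple[str, str]]:
    """Return (rule name, action) for every rule whose condition holds."""
    return [(r.name, r.action) for r in rules if r.condition(metrics)]


def build_notification(node: str,
                       fired: List[Tuple[str, str]],
                       admin: str = "admin@example.com") -> EmailMessage:
    """Compose the e-mail an administrator would receive (not sent here)."""
    msg = EmailMessage()
    msg["To"] = admin
    msg["From"] = "alarm@example.com"
    msg["Subject"] = f"[alarm] {node}: {len(fired)} alarm(s)"
    msg.set_content("\n".join(f"{name} -> {action}" for name, action in fired))
    return msg


# Illustrative rules; the thresholds are assumptions, not CATS-i defaults.
rules = [
    AlarmRule("high CPU", lambda m: m["cpu_percent"] > 90, "notify"),
    AlarmRule("memory exhausted", lambda m: m["mem_percent"] > 95, "notify"),
    AlarmRule("overheating", lambda m: m["cpu_temp_c"] > 85, "shutdown"),
]
metrics = {"cpu_percent": 97.0, "mem_percent": 60.0, "cpu_temp_c": 88.0}
fired = evaluate(metrics, rules)
mail = build_notification("node3", fired)
```

Keeping the condition and the action as separate fields means an emergency rule (shut down on overheating) and a routine one (notify on high CPU) go through the same evaluation path.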