Grid Computing in Hong Kong
Dr. Cho-Li Wang
Systems Research Group
Department of Computer Science and Information Systems
The University of Hong Kong

Agenda
• Grid computing – a simple picture
• The Hong Kong Grid
• SRG Projects
  - SLIM, ODGPC
  - G-JavaMPI
  - JESSICA2
  - LOTS DSM for Grid
• Summary and Conclusion

Grid Computing : A Simple Picture
• Much like “utilities” in our daily lives – electricity, water, etc.
• Advantages:
  - Cost-effectiveness
  - Platform extensibility
  - Convenience (plug and play, P&P)
• Diagram: resource providers (CPU power, memory, network, storage, data, services) are linked to end users through grid computing, i.e. access to remote resources via standard protocols for cross-domain collaboration

Grid Computing in Hong Kong - The Hong Kong Grid
• The experimental grid in HK
• Supported under HKU Foundation Seed Grant
• http://www.hkgrid.org/

The Hong Kong Grid (HKGrid)
Goals:
• to construct and make available a grid test bed to facilitate the development of grid middleware and applications by local industry and institutions in Hong Kong and their partners in the region
• to demonstrate the benefits of adopting grid technologies and to showcase any outstanding results of development or application
HKGrid provides a platform for its members to experiment with various research prototypes and pilot applications.

HKGrid - Current constituents
Institutions                               Computing facilities
City University of HK                      Service gateway (2-way Xeon SMP)
HK Baptist University                      2-way Xeon SMP x 64 (#300 in TOP500, 6/2003)
HK University of Science and Technology    4-way SMP cluster
The HK Polytechnic University              Service gateway (2-way Xeon SMP)
The HK Institute of HPC                    Service gateway (2-way Xeon SMP)
HKU – Computer Centre                      2-way Xeon SMP x 128 (#240 in TOP500, 11/2003)
HKU – Department of CSIS                   Pentium 4 x 300 (#340 in TOP500, 6/2003)
A theoretical maximum computing power of 4 Tflop/s

Grid Point Monitoring with Ganglia
URL: http://gideon.csis.hku.hk/status/

HKU Grid Point: Grid and Cluster Software
• Grid middleware (remote job submission)
  - Globus Toolkit (GT) 2.0, 2.4, 3.0.1
  - Gatekeeper: gideon.csis.hku.hk
• Job scheduling (local job scheduler)
  - OpenPBS 2.3.16
  - Maui 3.2.5
• Programming
  - HPF, Fortran 90
  - C, C++, Java with MPI
  - JESSICA2 (HKU)
• Communication library (IPC / network communication)
  - MPICH-G2 1.2.3
• Diagram labels: Gideon, Ostrich, Srgdell, Real

Main Computing Facilities: HKU-CSIS Gideon 300 Cluster

Research Projects in HKGrid
• HKBU: Knowledge Grid (autonomous grid service composition)
• HKPU: Peer-to-peer (P2P) grid, meta scheduler, fault tolerance
• HKUST: Development of sensor Grid infrastructure
• HKU ETI: Modelling of Air Quality in Hong Kong (E-Business Technology Institute with the Environmental Protection Department, HKSAR)
• Computer Centre: HKU campus grid; scientific applications running across the ApGrid
• CSIS: Robust Speech Recognition (J. Wu and Dr. Q. Huo)
• CSIS: Simulation for the DNA Shuffling Experiment (W.H. Hon and Dr. T.W. Lam)
• CSIS: Approximate String Matching on DNA Sequences (L.L. Cheng)
• CSIS: Whole Genome Alignment via Mutation-Sensitive Sequence Similarity (H.L. Chan, N. Lu, and Dr. T.W. Lam)
• ME: Parallel Simulation of Turbulent Flow Model (Dr. C.H. Liu, Dept. of Mechanical Engineering)
• CSIS: HKU Grid Point (863 Project: China National Grid)
• CSIS: Asia-Pacific Grid
• …..

HKGrid – Connections
• Links to China National Grid (CNGrid) and Asia-Pacific Grid (ApGrid) via CERNET and APAN
• Internet2 connection to the Abilene backbone at Chicago, USA
• Plays the role of a gateway for the other bigger grids

China National Grid (CNGrid) : 863 Project
China National Grid participants:
• Shanghai Supercomputer Center
• Institute of Computing Technology, Chinese Academy of Sciences
• The University of Hong Kong (CSIS)
• Xi'an Jiaotong University
• University of Science and Technology of China
• National University of Defense Technology
• Institute of Applied Physics, Chinese Academy of Sciences
• Tsinghua University
Note on the diagram: the grid system software developed by the Institute of Computing Technology, Chinese Academy of Sciences, has already connected the grid nodes of the Institute, Huazhong University of Science and Technology, and the University of Hong Kong through VEGA_GOS …
• Supporting software: VEGA (织女星, "Vega") grid management system: dynamic service deployment, single sign-on, data replication, and performance monitoring.
• Developed by the Institute of Computing Technology, Chinese Academy of Sciences; V.1.0 released

ApGrid / PRAGMA Testbed
• 10 countries
• 21 organizations
• 22 clusters
• 853 CPUs

ApGrid Demo on the HKU School Open Day (Oct. 2003)

Grid Research at HKU-CSIS
SRG Projects:
• SLIM + ODGPC
• G-JavaMPI
• JESSICA2
• LOTS DSM

Our Goal
To construct an advanced grid computing platform to accommodate utility-like computing via traditional and “pervasive” means
• Utility computing: to aggregate and make use of distributed computing resources transparently
• Traditional means: to utilize the dedicated HPC facilities distributed across institutions
  - Performance and reliability are key
• Pervasive means: any user can be a resource provider (e.g., by contributing idle PCs), a consumer, or both
  - Convenience and security are key

Research at HKU – An Advanced Grid Computing Platform (Programming Environment)
Diagram (objectives, AGP components, research issues):
• Objectives: user’s convenience; convenient system administration; performance and reliability
• AGP: G-JavaMPI, JESSICA, LOTS, SLIM, ODGPC
• Research issues: load balancing; single-system image; grid point construction, including on-demand grid point construction (ODGPC)

SLIM
Single Linux Image Management

SLIM
• Utility computing decouples computing platforms (resources) and computing logic (applications)
  - I.e., a single platform can run completely different applications
• Problem: different applications demand different execution environments (OS, shared libraries, supporting apps, etc.)
• Hassles associated with managing execution environments (EEs) on the resource provider side offset the benefits of resource sharing
• SLIM is a network service for managing and constructing EEs, and disseminating them to remote computing platforms

SLIM – System design
How does it work?
• A node sends an EE specification across the network to find the Boot server
• Boot server delivers the requested Linux kernel
• Image server constructs an EE by collecting shared libraries, user data, etc.
• Linux kernel boots, and contacts the Image Server to “mount” the EE via a file synchronization protocol such as NFS
• Aggressive caching techniques are deployed to optimize performance

SLIM – Ongoing and future work
SLIM has been managing:
• the HKU-CSIS grid point (350 nodes) for various grid research projects
• an additional 300+ lab machines for teaching purposes (different courses have different requirements)
Future work:
• To overcome the challenges in deploying SLIM over broadband links
• To realize “pervasive utility computing”

On-Demand Grid Point Construction (ODGPC)
1. Software installation at SLIM server
2. Client boots and obtains kernel
3. OS image/App disseminated
4. Process to generate certificates
Diagram labels: SLIM server (OS image, DHCP, TFTP, /usr/local/gt3.2), clients, CA server, client1 certificate

SLIM and ODGPC Performance Evaluation
• Boot up 256 PCs (OS only): < 5 minutes
• Boot up 100 machines (Linux + GT3): 6 minutes
• Generate certificates for 100 machines (Step 4): 30 minutes
• Total time: 6 + 30 = 36 minutes

SLIM – Key references
• http://www.csis.hku.hk/~cmlee/slim/
• C.M. Lee, R.S.C. Ho, D.H.F. Hung, C.L. Wang, and F.C.M. Lau, “Managing Execution Environments for Utility Computing,” Network Research Workshop 2004 (with APAN 2004), March 2004.
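
The SLIM boot sequence described in the system-design slide above (EE specification, kernel delivery, EE construction, mount, caching) can be sketched as a small in-memory simulation. This is only an illustrative sketch, not SLIM's actual implementation: the class names (`EESpec`, `BootServer`, `ImageServer`), the function `boot_node`, and the dictionary-based cache are all assumptions made for the example, and a real deployment would deliver kernels and images over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EESpec:
    """Hypothetical execution-environment (EE) specification sent by a node."""
    kernel: str
    libraries: tuple
    user: str


class BootServer:
    """Hypothetical boot server: maps a requested kernel name to a kernel image."""

    def __init__(self, kernels):
        self.kernels = kernels

    def deliver_kernel(self, spec):
        return self.kernels[spec.kernel]


class ImageServer:
    """Hypothetical image server: builds an EE from shared libraries and user data."""

    def __init__(self, library_store, user_data):
        self.library_store = library_store
        self.user_data = user_data
        self.cache = {}   # constructed EEs, keyed by specification
        self.builds = 0   # number of EEs actually constructed

    def construct_ee(self, spec):
        # Caching (assumed form): an EE already built for this spec is reused.
        if spec not in self.cache:
            self.builds += 1
            self.cache[spec] = {
                "libraries": {n: self.library_store[n] for n in spec.libraries},
                "user_data": self.user_data.get(spec.user, {}),
            }
        return self.cache[spec]


def boot_node(spec, boot_server, image_server):
    """Simulate one node: fetch the kernel, then 'mount' the constructed EE."""
    kernel = boot_server.deliver_kernel(spec)
    ee = image_server.construct_ee(spec)
    return {"kernel": kernel, "mounted_ee": ee}
```

In this sketch, booting many nodes with the same specification constructs the EE once and serves the cached copy afterwards, which is one simple way to read the "aggressive caching" point on the slide.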
(LinuxPilot 2004/04)

G-JavaMPI
A grid-enabled Java-MPI system with dynamic load-balancing via process migration

G-JavaMPI
• A grid-enabled implementation of the Java binding of MPI, supporting efficient MPI communication among distributed Java processes
• Supports transparent Java process migration (through JVMDI) within and across grid points for balancing CPU and network loads
• Communication-aware process migration policies based on:
  - the application’s communication pattern
  - available network bandwidth on grid overlays

G-JavaMPI – System design
• Diagram: grid points, each with a Gatekeeper and an LS, connected across the WAN by Java-MPI communication; steps are labelled (1), (2), (3) and (1*), (2*), (3*)
• Migrating: restarting a new process through a Globus remote job request with delegated user credentials and Java-MPI job credentials
• (*) Some legacy messages are redirected during migration
• M: the migration module resides in each JVM

G-JavaMPI – Ongoing and future work
• The migration mechanism has been implemented
• Future work targets process migration policies
• Goal: to offset performance pitfalls caused by heterogeneity through dynamic process migration
• Sources of heterogeneity in grids: CPU, network, runtime environments, etc.
• CPU and network heterogeneities cause long “blocking” periods in cooperative processes, thus limiting the system throughput
• G-JavaMPI aims to detect and eliminate “blocking” through process migration (e.g., migrating a “bottleneck” process to a faster node)

G-JavaMPI – Key references
• L. Chen, C.L. Wang, and F.C.M. Lau, “A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports,” Journal of Computer Science and Technology (China), Vol. 18, No. 4, July 2003, pp. 505-514.
• L. Chen, C.L. Wang, F.C.M. Lau, and R.K.K. Ma, “A Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports,” International Workshop on Grid and Cooperative Computing (GCC-2002), December 26-28, 2002, Hainan, China, pp. 640-652.
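
The idea of migrating a “bottleneck” process to a faster node while taking network bandwidth into account can be illustrated with a toy policy. The slides describe migration policies as future work, so this is not G-JavaMPI's policy: the cost model (compute time = work / node speed, communication time = bytes / bandwidth), the migration-cost threshold, and every name below are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float       # work units per second (assumed metric)
    bandwidth: float   # bytes per second to the process's peers (assumed metric)


@dataclass
class Process:
    rank: int
    node: Node
    work: float        # work units per iteration
    comm_bytes: float  # bytes exchanged with peers per iteration


def step_time(proc, node):
    """Estimated time of one iteration of `proc` if it ran on `node`."""
    return proc.work / node.speed + proc.comm_bytes / node.bandwidth


def pick_migration(processes, idle_nodes, migration_cost, remaining_steps):
    """Return (process, target_node) for the bottleneck process, or None.

    The bottleneck is the process with the longest estimated iteration time;
    its peers block while waiting for it. Migration is proposed only if the
    time saved over the remaining iterations exceeds the one-off cost.
    """
    if not processes or not idle_nodes:
        return None
    bottleneck = max(processes, key=lambda p: step_time(p, p.node))
    target = min(idle_nodes, key=lambda n: step_time(bottleneck, n))
    saved_per_step = step_time(bottleneck, bottleneck.node) - step_time(bottleneck, target)
    if saved_per_step * remaining_steps > migration_cost:
        return bottleneck, target
    return None
```

Because the estimate includes the communication term, a node with a fast CPU but a slow link to the process's peers can lose to a slower, better-connected node, which is the sense in which such a policy is communication-aware.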
JESSICA2 : A Java-Enabled Single-System Image Computing Architecture
• JESSICA2 is a distributed Java Virtual Machine (DJVM) which consists of a group of extended JVMs running on a distributed environment to support true parallel execution of a multithreaded Java application.
• Java threads can freely move across node boundaries and execute in parallel to achieve more scalable high-performance computing using clusters.
• The JESSICA2 DJVM provides standard JVM services that are compliant with the Java language specification, as if running on a single machine: a Single System Image (SSI).

JESSICA2 Architecture
Diagram: a multithreaded Java program runs on one master JESSICA2 JVM and several worker JESSICA2 JVMs; threads migrate between the JVMs (thread migration, JIT compiler mode, portable Java frame), and all JVMs share a Global Object Space.

JESSICA2 Main Features
• Transparent Java thread migration
  - Runtime capturing and restoring of thread execution context
  - No source code modification; no bytecode instrumentation (preprocessing); no new API introduced
  - Enables dynamic load balancing on clusters
• Full-speed computation
  - JITEE: cluster-aware bytecode execution engine
  - Operated in Just-In-Time (JIT) compilation mode
  - Zero cost if no migration
• Transparent remote object access
  - Global Object Space: a shared global heap spanning all cluster nodes
  - Adaptive migrating-home protocol for memory consistency + various optimizing schemes
  - I/O redirection

Ray Tracing on JESSICA2 (64 PCs)
• Linux 2.4.18-3 kernel (Redhat 7.3)
• 64 nodes: 108 seconds
• 1 node: 4402 seconds (about 1.2 hours)
• Speedup = 4402/108 ≈ 40.76

JESSICA – Key references
• W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “A Lightweight Solution for Transparent Java Thread Migration in Just-in-Time Compilers,” The 2003 International Conference on Parallel Processing (ICPP-2003), Taiwan, Oct. 6-10, 2003, pp. 465-472.
• W.Z. Zhu, C.L. Wang, and F.C.M. Lau, “JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support,” IEEE Fourth International Conference on Cluster Computing (CLUSTER 2002), Chicago, USA, September 23-26, 2002, pp. 381-388.
• M.J.M. Ma, C.L. Wang, and F.C.M. Lau, “JESSICA: Java-Enabled Single-System-Image Computing Architecture,” Journal of Parallel and Distributed Computing, Vol. 60, No. 10, October 2000, pp. 1194-1222.

LOTS: Large Object Space on Grid
Diagram: several nodes, each a LOTS / OS / H/W stack, are joined through the Grid into one Large Global Object Space.
• A large software distributed shared memory (DSM) system for the Grid
• Provides a global object space larger than the process space (4 GB on a 32-bit CPU)
• Uses the local hard disk to store recently unused objects
• Scope Consistency + Home Migration to reduce redundant data traffic

Summary
• Performance
  - G-JavaMPI and JESSICA establish extensible grid platforms (good for computation-intensive applications)
  - Process/thread migration enables performance optimization and load balancing
  - LOTS supports a shared memory programming environment on a large object space (easier to develop data grid applications)
• Reliability
  - G-JavaMPI migrates processes from failed machines
  - SLIM helps construct platforms for failover
• Convenience
  - G-JavaMPI, JESSICA, and LOTS enable users to harness distributed resources via traditional means
  - SLIM and ODGPC simplify Grid point management

Conclusion
• Grid/utility computing are relatively new paradigms that deserve further investigation
• We address the performance, reliability, and user convenience issues in grid/utility computing
• Our advanced grid computing platform (consisting of G-JavaMPI, JESSICA2, LOTS, and SLIM/ODGPC) is geared for deployment in the HKGrid for easy adoption of Grid technologies.

Q&A
Thank you!
The SRGers (Photo: 12/2003)

Reference
• Hong Kong Grid: http://www.hkgrid.org/
• Grid Computing Research Portal: http://grid.csis.hku.hk/
• The HKU Systems Research Group: http://www.srg.csis.hku.hk
• VEGA Project: http://vega.ict.ac.cn/
• The HK Supercomputing Directory: http://www.hkhpc.org/~SuperDir/
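
As a worked check of the JESSICA2 ray-tracing figures reported in this deck (1 node: 4402 seconds; 64 nodes: 108 seconds), the short sketch below recomputes the speedup and also derives the parallel efficiency, which the slides do not state. The function names are illustrative only.

```python
def speedup(t_serial, t_parallel):
    """Speedup = single-node time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nodes):
    """Parallel efficiency = speedup divided by the number of nodes."""
    return speedup(t_serial, t_parallel) / nodes


s = speedup(4402, 108)
e = efficiency(4402, 108, 64)
print(f"speedup = {s:.2f}, efficiency = {e:.1%}")
```

With the slide's numbers this gives a speedup of about 40.76 and an efficiency of roughly 64% on 64 nodes.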