Division of Labor: Tools for Growing and Scaling Grids

Tim Freeman, Kate Keahey, Ian Foster, Abhishek Rana, Frank Wuerthwein, Borja Sotomayor

ICSOC '06, 12/05/06

Division of Labor

"The greatest improvements in the productive powers of labour, and the greater part of the skill, dexterity, and judgment with which it is anywhere directed, or applied, seem to have been the effects of the division of labour." (Adam Smith)

How can we implement division of labor in Grid computing?
* tools to implement an abstraction
* requirements for an abstraction

Overview

* Problem Definition
  * The Edge Service Use Case
* Workspace Service
  * Overview of the workspace service
  * Extensions to workspace service
* Implementation and Evaluation
  * CPU enforcement
  * Network enforcement
* Status of the Edge Services Project
* Conclusions

Providers and Consumers

Resource provider:
* Has a limited number of resources
* Has to balance the software needs of multiple users
* Has to provide a limited execution environment for security reasons

Resource consumers:
* Want the resources when they need them, and as much as they need
* Want to use specific software packages
* Want as much control as possible over resources

The Edge Service Use Case

[Figure]

Edge Services: Challenges

* VO-specific Edge Services
  * Each VO has very specific configuration requirements
* Resource management
  * The VOs would like to provide quality of service to their users
  * The resource needs of the VOs change dynamically
* Dynamic, policy-based deployment and management of Edge Services
  * Updates, ephemeral edge services, infrastructure testing, short-term usage

Division of Labor Dimensions

* Environment and Configuration
* Isolation
  * Critical from the point of view of the provider if the VOs are to be allowed some independence
* Resource usage and accounting
  * Application-independent
  * Management along different resource aspects
  * Dynamically renegotiable/adaptable

GT4 workspace service

The GT4 Virtual Workspace Service (VWS) allows an authorized client to deploy and manage workspaces on demand.
* GT4 WSRF front-end
* Leverages multiple GT services
* Currently implements workspaces as VMs
  * Uses the Xen VMM, but others could also be used
* Current release: 1.2.1 (December '06)
* http://workspace.globus.org

Workspace Service Usage Scenario

* The workspace service has a WSRF frontend that allows users to deploy and manage virtual workspaces.
* The VWS manages a set of nodes inside the TCB (typically a cluster). This is called the node pool.
* Each node must have a VMM (Xen) installed, along with the workspace backend (software that manages individual nodes).
* VM images are staged to a designated image node inside the TCB.

[Diagram: VWS Service on the VWS Node, an Image Node, and twelve pool nodes, all inside the Trusted Computing Base (TCB)]

Deploying Workspaces

* Adapter-based implementation model
* Workspace
  * Workspace metadata
  * Resource Allocation
* Transport adapters
  * Default scp, then gridftp
* Control adapters
  * Default ssh
  * Deprecated: PBS, SLURM
* VW deployment adapter
  * Xen
  * Previous versions: VMware

[Diagram: VWS Service, Image Node, and twelve pool nodes]

Interacting with Workspaces

* The workspace service publishes information on each workspace as standard
  WSRF Resource Properties.
* Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to).
* Users can interact directly with their workspaces the same way they would with a physical machine.

[Diagram: VWS Service, Image Node, and twelve pool nodes inside the Trusted Computing Base (TCB)]

Deployment Request Arguments

A workspace, composed of:
* VM image
* Workspace metadata
  * XML document
  * Includes deployment-independent information:
    * VMM and kernel requirements
    * NICs + IP configuration
    * VM image location
  * Need not change between deployments

Resource Allocation:
* Specifies availability, memory, CPU%, disk
* Changes during or between deployments

Workspace Service Interfaces

* Workspace Factory Service
  * Handles creation of workspaces. Also publishes information on what types of workspaces it can support.
  * Create() takes the workspace metadata/image and a resource allocation
* Workspace Service
  * Handles management of each created workspace (start, stop, pause, migrate, inspecting VW state, ...)
* Workspace Resource Instance
  * Workspace Resource Properties publish the assigned resource allocation, how the VW was bound to metadata (e.g. IP address), duration, and state

[Diagram labels: Create(), inspect & manage, notify]
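The split between deployment-independent metadata and a changeable resource allocation, as described on the Deployment Request Arguments slide, could be modelled roughly as follows. This is a minimal sketch, not the VWS schema or client API: the class names, field names, units, and the example image URL and kernel string are all hypothetical, chosen only to mirror the items listed on the slide.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class WorkspaceMetadata:
    """Deployment-independent part of a request (hypothetical field names)."""
    vmm: str                 # VMM requirement, e.g. "xen"
    kernel: str              # kernel requirement
    nics: Tuple[str, ...]    # NIC and IP configuration entries
    image_location: str      # where the VM image can be found


@dataclass(frozen=True)
class ResourceAllocation:
    """Deployment-specific part; may change during or between deployments."""
    availability_min: int    # how long the workspace should be available
    memory_mb: int
    cpu_percent: int
    disk_mb: int

    def __post_init__(self) -> None:
        if not 0 < self.cpu_percent <= 100:
            raise ValueError("cpu_percent must be in (0, 100]")
        if min(self.availability_min, self.memory_mb, self.disk_mb) <= 0:
            raise ValueError("availability, memory and disk must be positive")


@dataclass(frozen=True)
class DeploymentRequest:
    metadata: WorkspaceMetadata
    allocation: ResourceAllocation


def with_allocation(request: DeploymentRequest, **changes: int) -> DeploymentRequest:
    """Return a new request with an updated allocation and the same metadata."""
    return replace(request, allocation=replace(request.allocation, **changes))


# Illustrative values only; the image URL and kernel string are made up.
request = DeploymentRequest(
    metadata=WorkspaceMetadata(
        vmm="xen",
        kernel="2.6",
        nics=("eth0: dhcp",),
        image_location="file:///images/edge-service.img",
    ),
    allocation=ResourceAllocation(
        availability_min=60, memory_mb=896, cpu_percent=45, disk_mb=2048
    ),
)
resized = with_allocation(request, cpu_percent=60)
```

Keeping both parts immutable and deriving a new request for each change reflects the slide's point that metadata need not change between deployments while the allocation can.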
[Workspace Service Interfaces diagram, additional label: authorize & instantiate]

Extensions to Resource Allocation

Overview

* Problem Definition
  * The Edge Service Use Case
* Workspace Service
  * Overview of the workspace service
  * Extensions to workspace service
* Implementation and Evaluation
  * CPU resource allocation
  * Network resource allocation
* Status of the Edge Services Project
* Conclusions

Edge Services Today

* Compute Element (CE) implemented as GT GRAM
* Both VOs share the same resource
* Job throughput is low, as both VOs are equally impacted by the high VO1 traffic

[Diagram: VO1 and VO2 submit to a single shared GRAM; throughput labels 7.83 jpm and 8 jpm]

Allocating Resources for Edge Services

Resource Allocation (shown twice in the diagram, with identical values):
* MEM: 896 MB
* CPU %: 45%
* CPU arch: AMD Athlon

Workspace Service Dom0 CPU %: 10%

* Job throughput for VO2 is high, as it is unimpacted by the high VO1 traffic

[Diagram: a separate GRAM per VO; throughput labels 4.18 jpm and 22.36 jpm]

Tracking Requests Over Time

* Histogram of request throughput
* Resource usage is enforced on an "as needed" basis

[Figure: Comparison of Request Throughput over Time. Series: VO1Client, VO2Client. Y axis: completed jobs. X axis: time (in 30 second buckets).]

Increasing Load on VO1

* Histogram of request throughput
* The load on VO1 increases 2x and 3x
* Request throughput for VO2 is unimpacted

[Figure: VO2 (under changing VO1 load conditions). Series: 1mill-VO2, 2mill-VO2, 3mill-VO2. Y axis: jobs completed. X axis: time (in 30 second buckets).]

Network Resource Allocation

* Processing network traffic requires CPU
  * In Xen: for both dom0 and guest domains
* CPU allocation tradeoffs
* Scheduling frequency
* The mechanism is general
  * Save for direct drivers

[Diagram: two domU guests and dom0]

Network Resource Allocation

* Network Allocation Implementation
  * CPU allocations based on a parameter sweep
  * Linux network shaping tools
* Negotiating network resource allocations
  * Close to maximum bandwidth
  * Policy: accepting only CPU allocations that match the bandwidth

Storage Element (SE) Edge Service

Resource Allocation (identical for VO1 GridFTP and VO2 GridFTP):
* MEM: 128 MB
* CPU %: 6%
* CPU arch: AMD Athlon
* NIC incoming: 4.1 MB/s

Workspace Service Dom0 CPU %: 22%

Negotiating Bandwidth

[Figure]

Renegotiating CPU and Bandwidth

Resource Allocation, VO1 GridFTP:
* MEM: 128 MB
* CPU %: renegotiated from 6% to 14%
* CPU arch: AMD Athlon
* NIC incoming: renegotiated from 4.1 MB/s to 8.2 MB/s

Resource Allocation, VO2 GridFTP:
* MEM: 128 MB
* CPU %: 6%
* CPU arch: AMD Athlon
* NIC incoming: 4.1 MB/s

Workspace Service Dom0 CPU %: 22%

[Figure]

Renegotiating CPU

Resource Allocation, VO1 GridFTP:
* MEM: 128 MB
* CPU %: renegotiated from 14% to 34%
* CPU arch: AMD Athlon
* NIC incoming: 8.2 MB/s

Resource Allocation, VO2 GridFTP:
* MEM: 128 MB
* CPU %: 6%
* CPU arch: AMD Athlon
* NIC incoming: 4.1 MB/s

Workspace Service Dom0 CPU %: 22%

[Figure]

Edge Services: Status

* OSG activity
  * www.opensciencegrid.org/esf
* Edge Services in use (database caches)
  * ATLAS: mysql-gsi db built by the DASH project
  * CMS: frontier database
* Base Image library
  * SDSC: SL3.0.3, FC4, CentOS4.1
  * FNAL: SL3.0.3, SL4, LTS 3, LTS 4
* Sites
  * Production: SDSC
  * Also testing at FNAL, UC and ANL

Related Work

* Edge Service efforts
  * VO boxes, EGEE
  * APAC, static Edge Services
  * Grid-Ireland, static Edge Services
* OGF efforts: WS-Agreement, JSDL
* Managed Services
* QoS with Xen
  * Padma Apparao, Intel (VTDC paper)
  * Rob Gardner & team, HP
  * Credit-based scheduler
* Grid computing and virtualization
  * Work at University of Florida, Purdue, Northwestern, Duke and others

Conclusions

* VM-based workspaces are a promising tool to implement "division of labor"
* Renegotiation is an important
  resource management tool
  * Enforcement methods: dynamic reallocation, migration, etc.
* Aggregate resource allocations
* Protocols
* Different resource aspects influence each other
* More work on managing VM resources is needed
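The renegotiation slides state a policy of accepting only CPU allocations that match the requested bandwidth, with CPU figures derived from a parameter sweep. One way such an acceptance check might look is sketched below. Everything here is an assumption made for illustration: the sweep table simply reuses the two bandwidth/CPU pairs that appear together on the slides (4.1 MB/s with 6%, 8.2 MB/s with 14%), "match" is read as "at least the swept CPU share", and the 100% node capacity check is added by this sketch rather than taken from the talk.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical sweep results: (incoming MB/s, guest CPU % needed to sustain it).
# The two pairs reuse figures from the slides; a real sweep would have more points.
SWEEP: Tuple[Tuple[float, int], ...] = ((4.1, 6), (8.2, 14))


def min_cpu_for(bandwidth: float,
                sweep: Sequence[Tuple[float, int]] = SWEEP) -> Optional[int]:
    """CPU % of the smallest swept bandwidth that covers the request, if any."""
    for swept_bandwidth, cpu_percent in sorted(sweep):
        if swept_bandwidth >= bandwidth:
            return cpu_percent
    return None  # the request exceeds every swept bandwidth


def accept(cpu_percent: int, bandwidth: float,
           other_guests_cpu: int, dom0_cpu: int) -> bool:
    """Accept a (re)negotiated allocation under the assumed policy."""
    needed = min_cpu_for(bandwidth)
    if needed is None or cpu_percent < needed:
        return False  # CPU share cannot sustain the requested bandwidth
    # Assumed capacity rule: guests plus dom0 must fit on one node.
    return cpu_percent + other_guests_cpu + dom0_cpu <= 100


# VO1 asks for 8.2 MB/s while VO2 holds 6% and dom0 holds 22%.
too_small = accept(6, 8.2, other_guests_cpu=6, dom0_cpu=22)
matching = accept(14, 8.2, other_guests_cpu=6, dom0_cpu=22)
cpu_only = accept(34, 8.2, other_guests_cpu=6, dom0_cpu=22)
```

Under these assumptions the request that keeps 6% CPU is rejected, while the 14% and 34% requests are accepted, mirroring the sequence shown on the two renegotiation slides.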