Grid Computing 7700, Fall 2005
Lecture 5: Grid Architecture and Globus
Gabrielle Allen
[email protected]
http://www.cct.lsu.edu/~gallen

Concrete Example
I have a source file Main.F on machine A and an input file on machine B. Main.F is written using MPI; it will need around 4 GB of core memory to run, will take several hours to complete, and will produce a large output file. What functionality do we need?

Issues
– How do we select a machine to run it on?
– How do we provide an executable that can run on that machine?
– How do we move the input file?
– How do we start the executable?
– How do we monitor the job? When does it start? When does it finish?
– How do we move the output file back?
– What about security?
– How do we know if it didn't work, and how it failed?

How to Select a Machine
What properties of a machine are we interested in?
– What resources does my executable require?
  • 4 GB memory, "several hours of compute time"
  • Enough disk space for the output
– What kind of environment do I need on the machine?
  • OS limitations?
  • MPI? (Which version?) Fortran?
– What resources am I authorized to run on?
– How quickly will it run?
– How much will it cost, and what is my allocation there?
– How do I find all this information? What should the user provide?

More Complicated
– What if the program needs to read data kept on machine C while it is running?
– What about distributing across processors on different machines?
– What if I have a lot of interconnected programs?
– How do I find the output file afterwards?
– What if it doesn't work?

Questions
– What kind of functionality do we need?
– What tools exist to do this?
– What features of distributed computing must these tools be designed to handle?
– What design issues should we watch for?
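The machine-selection questions above amount to filtering candidate machines against the job's hard requirements and then ranking whatever survives. A minimal Python sketch, assuming the machine properties are already known (finding them is the resource discovery problem); all class names, fields, machine names and the "shortest estimated queue wait" ranking policy are hypothetical illustrations, not part of Globus or any other toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    """What we would like to know about a candidate machine (hypothetical fields)."""
    name: str
    memory_gb: float
    free_disk_gb: float
    os: str
    mpi_versions: set[str]
    compilers: set[str]
    authorized: bool                # do I have an account/allocation here?
    allocation_hours_left: float    # remaining allocation (cost)
    est_queue_wait_hours: float     # how quickly will it start?


@dataclass
class JobRequirements:
    """What the user (or a tool) must provide about the job."""
    memory_gb: float
    runtime_hours: float
    output_disk_gb: float
    allowed_os: set[str]
    mpi_versions: set[str]          # any one of these is acceptable
    compiler: str                   # e.g. a Fortran compiler for Main.F


def can_run(m: Machine, job: JobRequirements) -> bool:
    """Hard constraints: every one must hold for the machine to qualify."""
    return (
        m.authorized
        and m.memory_gb >= job.memory_gb
        and m.free_disk_gb >= job.output_disk_gb
        and m.os in job.allowed_os
        and bool(m.mpi_versions & job.mpi_versions)
        and job.compiler in m.compilers
        and m.allocation_hours_left >= job.runtime_hours
    )


def select_machine(machines: list[Machine], job: JobRequirements) -> Optional[Machine]:
    """Filter on hard constraints, then rank by one possible policy:
    shortest estimated queue wait. Returns None if nothing qualifies."""
    candidates = [m for m in machines if can_run(m, job)]
    return min(candidates, key=lambda m: m.est_queue_wait_hours, default=None)


# Usage with made-up machines and the job from the example
job = JobRequirements(memory_gb=4, runtime_hours=6, output_disk_gb=20,
                      allowed_os={"Linux"}, mpi_versions={"MPICH-1.2"},
                      compiler="f90")
machines = [
    Machine("A", 2, 100, "Linux", {"MPICH-1.2"}, {"f90"}, True, 1000, 0.5),
    Machine("B", 8, 200, "Linux", {"MPICH-1.2"}, {"f90"}, True, 1000, 3.0),
    Machine("C", 16, 50, "Linux", {"MPICH-1.2", "LAM"}, {"f90"}, True, 500, 1.0),
]
best = select_machine(machines, job)
print(best.name if best else "no suitable machine")  # C
```

Machine A is rejected for having too little memory; B and C both qualify, and C wins only because of the chosen ranking policy. A different policy (cheapest allocation, fastest expected run) would be a one-line change to the `key` function.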
Abstract Requirements
– Single sign-on
– Job submission, monitoring and management
  • submit a job to a resource on the grid
  • monitor the progress of a submitted job
  • retrieve results
  • cancel a job
– File transfer
  • move files from A to B, securely, reliably and efficiently
– Resource discovery
  • locate resources or services with particular characteristics
– Less typical: metacomputing, workflow enactment, resource brokering, ...

What do I have to choose from?
– Globus Toolkit
  • version 2 is widely deployed; nearest thing to a de facto standard
  • horizontally integrated bag of tools
  • suits grid application developers better than end users
  • brand-new version 4, based on web services
– UNICORE
  • less widely deployed; few UK deployments
  • vertically integrated
  • suits end users better than application developers
– Condor
  • high-throughput computing
  • great for cycle harvesting
– Web Services?
  • GT4, or roll your own using Web Services tools
– Others
  • yes, there are others

[Figure: "Generation Game" (adapted from Ian Foster, GGF7 Plenary). Increased functionality and standardization plotted against time: custom solutions (academic teams; bag of heterogeneous protocols and toolkits; monolithic design) → de facto standards (Globus Toolkit, Condor, UNICORE; GridFTP, GSI; X.509, LDAP, FTP; recognised the Internet, ignored the Web) → Open Grid Services Architecture (open services-based architecture building on Web services; GGF + OASIS + W3C; multiple implementations; industry participation).]
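The "job submission, monitoring and management" requirement boils down to tracking each job through a small set of states and refusing impossible transitions. A minimal sketch, assuming a simple five-state model; the state names, the `Job` class and the output location are illustrative assumptions, not taken from GRAM or any particular scheduler.

```python
from enum import Enum, auto


class JobState(Enum):
    PENDING = auto()    # submitted, waiting in the remote queue
    ACTIVE = auto()     # running
    DONE = auto()       # finished successfully; results can be retrieved
    FAILED = auto()     # finished unsuccessfully
    CANCELLED = auto()  # cancelled by the user


# Legal transitions; terminal states (DONE, FAILED, CANCELLED) have none.
TRANSITIONS = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED, JobState.CANCELLED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED, JobState.CANCELLED},
}


class Job:
    """Client-side view of a submitted job (hypothetical, not a GRAM API)."""

    def __init__(self, job_id: str, output_location: str):
        self.job_id = job_id
        self.output_location = output_location
        self.state = JobState.PENDING      # state right after submission
        self.history = [JobState.PENDING]  # every state seen, useful when diagnosing failures

    def update(self, new_state: JobState) -> None:
        """Apply a state change reported by monitoring; reject impossible ones."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.job_id}: {self.state.name} -> {new_state.name} not allowed")
        self.state = new_state
        self.history.append(new_state)

    def cancel(self) -> None:
        self.update(JobState.CANCELLED)

    def retrieve_results(self) -> str:
        """Only a successfully finished job has results to fetch."""
        if self.state is not JobState.DONE:
            raise RuntimeError(f"{self.job_id} is {self.state.name}; no results to retrieve")
        return self.output_location


job = Job("job-001", "machineB:/scratch/output.dat")
job.update(JobState.ACTIVE)
job.update(JobState.DONE)
print(job.retrieve_results())   # machineB:/scratch/output.dat
```

Keeping the state history also gives a partial answer to "how do we know if it didn't work, and how it failed?": a job that ends in FAILED after ACTIVE failed while running, whereas one that goes straight from PENDING to FAILED never started.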
Grid Architecture
Layers, from top to bottom: Application, Collective, Resource, Connectivity, Fabric

Fabric Layer
– Contains the resources themselves, which the Grid infrastructure needs to access
– Fabric components implement the local, resource-specific operations on which higher-level Grid operations are built, for example:
  • NFS storage protocol
  • Kerberos security
  • PBS queuing system
– The Grid cannot provide more than the local operations support: for example, advance reservation is only possible if the local scheduler offers it

Fabric Layer
– Computational resources
– Storage resources
– Network resources
– But also
  • Database resources
  • Code repository resources
  • etc.

Fabric Layer
What is the minimum functionality?
– Introspection mechanisms:
  • Computational resources: hardware and software characteristics, state information such as current load and queue state
  • Storage resources: hardware and software characteristics, available space
  • Network resources: network characteristics and load
– Resource management mechanisms:
  • Computational resources: starting programs, monitoring and controlling execution of the resulting programs
  • Storage resources: file put and get

Fabric Layer
What is desirable?
– Introspection mechanisms:
  • Storage resources: bandwidth utilization
– Resource management mechanisms:
  • Computational resources: control over resources allocated to processes, advance reservation
  • Storage resources: third-party transfers, high-performance transfers, put and get of file subsets, callback functionality
  • Network resources: control of resources, prioritization, reservation

Connectivity Layer
– Core communication and authentication protocols needed for network transactions
– Exchange of data between Fabric layer resources
– Security
– Requirements: transport, routing, naming
– Assumed to use protocols from the TCP/IP stack (IP, ICMP, TCP, UDP, DNS, OSPF, RSVP, ...), but could use others
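The Fabric layer's "minimum functionality" for a computational resource splits into two groups of operations: introspection (what is this machine and how busy is it?) and management (start, monitor and control programs). A minimal Python sketch, assuming a toy adapter that simply runs local processes; `ComputeResource`, `LocalProcessResource` and their methods are hypothetical names, and a real adapter would wrap a local system such as PBS rather than `subprocess`.

```python
import os
import platform
import subprocess
import sys
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ComputeInfo:
    """Introspection result: characteristics and current state."""
    cpus: int
    os_name: str
    load: float         # 1-minute load average, 0.0 if the platform lacks it
    running_jobs: int


class ComputeResource(ABC):
    """Minimum Fabric-layer functionality for a computational resource."""

    @abstractmethod
    def info(self) -> ComputeInfo:
        """Introspection: hardware/software characteristics and state."""

    @abstractmethod
    def start(self, executable: str, args: list[str]) -> str:
        """Management: start a program and return a handle for it."""

    @abstractmethod
    def status(self, handle: str) -> str:
        """Monitoring: 'ACTIVE', 'DONE' or 'FAILED'."""

    @abstractmethod
    def kill(self, handle: str) -> None:
        """Control: terminate the program."""


class LocalProcessResource(ComputeResource):
    """Toy adapter that runs programs as local processes.

    It offers no advance reservation, so no higher Grid layer could
    offer reservation on top of this resource either.
    """

    def __init__(self) -> None:
        self._procs: dict[str, subprocess.Popen] = {}

    def info(self) -> ComputeInfo:
        load = os.getloadavg()[0] if hasattr(os, "getloadavg") else 0.0
        running = sum(1 for p in self._procs.values() if p.poll() is None)
        return ComputeInfo(os.cpu_count() or 1, platform.system(), load, running)

    def start(self, executable: str, args: list[str]) -> str:
        handle = uuid.uuid4().hex
        self._procs[handle] = subprocess.Popen([executable, *args])
        return handle

    def status(self, handle: str) -> str:
        rc = self._procs[handle].poll()   # None while still running
        if rc is None:
            return "ACTIVE"
        return "DONE" if rc == 0 else "FAILED"

    def kill(self, handle: str) -> None:
        proc = self._procs[handle]
        if proc.poll() is None:
            proc.kill()
        proc.wait()

    def wait(self, handle: str) -> None:
        """Convenience for demos: block until the program exits."""
        self._procs[handle].wait()


res = LocalProcessResource()
ok = res.start(sys.executable, ["-c", "print('hello from the fabric')"])
bad = res.start(sys.executable, ["-c", "import sys; sys.exit(3)"])
res.wait(ok)
res.wait(bad)
print(res.status(ok), res.status(bad))   # DONE FAILED
```

The abstract class is the point of the sketch: higher layers program against `info`/`start`/`status`/`kill`, and each site supplies an adapter for its own local system.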
Connectivity Layer
Security requirements:
– Single sign-on to all resources
– Delegation of rights
– Integration with local security
– Implementation of trust relations
– Secure transport of data

Resource Layer
– Protocols for secure negotiation, initiation, monitoring, control and accounting on individual resources
– Concerned only with individual resources; operations across collections of resources are addressed in the next layer (Collective)
– Information protocols
  • Obtaining information about the structure and state of a resource
– Management protocols
  • Negotiating access for given resource requirements and performing operations (job starting, data access)
  • Monitoring and controlling resources and processes

Collective Layer
– Deals with operations across collections of resources
– Builds on a relatively small number of Resource and Connectivity protocols
– Examples:
  • Directory services (to provide information about resources)
  • Co-allocation, scheduling and brokering services
  • Monitoring and diagnostic services
  • Data replication services
  • Community authorization and accounting services
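A Collective-layer directory service can be sketched as an aggregator over many Resource-layer information sources: each resource registers a provider, the directory answers queries across all of them, and it caches answers so that it does not poll every resource on every query. A minimal sketch, assuming each provider is just a Python callable returning a dictionary; the class name, the field names and the time-to-live caching policy are hypothetical. MDS-2 in GT2 follows a broadly similar local-registry/collective-registry pattern, but over LDAP.

```python
import time
from typing import Callable

# A Resource-layer information protocol, reduced to a callable that returns
# one resource's current structure/state as a dictionary.
InfoProvider = Callable[[], dict]


class Directory:
    """Collective-layer directory over many resources (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._providers: dict[str, InfoProvider] = {}
        self._cache: dict[str, tuple[float, dict]] = {}  # name -> (fetched_at, info)
        self._ttl = ttl_seconds
        self._clock = clock

    def register(self, name: str, provider: InfoProvider) -> None:
        """A resource (or its local registry) registers with the directory."""
        self._providers[name] = provider

    def _info(self, name: str) -> dict:
        """Return cached info if it is fresh enough, otherwise ask the resource."""
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        info = self._providers[name]()
        self._cache[name] = (now, info)
        return info

    def query(self, predicate: Callable[[dict], bool]) -> list[str]:
        """Names of all registered resources whose info satisfies predicate."""
        return sorted(n for n in self._providers if predicate(self._info(n)))


# Usage with two made-up resources and a controllable clock
polls = {"clusterA": 0}

def cluster_a_info() -> dict:
    polls["clusterA"] += 1
    return {"memory_gb": 8, "free_cpus": 32}

now = [0.0]
directory = Directory(ttl_seconds=60, clock=lambda: now[0])
directory.register("clusterA", cluster_a_info)
directory.register("clusterB", lambda: {"memory_gb": 2, "free_cpus": 4})

print(directory.query(lambda i: i["memory_gb"] >= 4))  # ['clusterA']
print(directory.query(lambda i: i["free_cpus"] > 0))   # ['clusterA', 'clusterB']
print(polls["clusterA"])                                # 1 (second query used the cache)
```

The cache is the trade-off every directory has to make: a longer TTL means less load on resources but staler answers about rapidly changing state such as queue length.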
UNICORE
– Packaged software with a GUI
– Open source: http://unicore.sourceforge.net/
– Designed for firewalls
– Strict security model: explicit delegation
– Abstract Job Object (AJO): built-in workflow management
– Resource broker: can submit to Globus grids
– Has the notion of a software resource
– Few APIs: extend through plug-ins; starting to expose service interfaces
– Serves the user
http://www.unicore.org/

Condor: High-Throughput Computing
– Condor converts collections of workstations and clusters into a distributed high-throughput computing facility
– Emphasis on policy management and reliability
– High-throughput scheduler
– Supports job checkpointing and migration (single-processor jobs only)
– Remote system calls
– Condor-G lets Condor users add Globus-enabled resources to their private view of a Condor pool ("flock")
– "Glide-in"
http://www.cs.wisc.edu/condor/

Legion/Avaki
– Object-based meta-system providing a single integrated infrastructure
– All components are objects (unlike GT)
  • Data abstraction, encapsulation, inheritance, polymorphism
– API to core services
– Core object types:
  • Classes/metaclasses: managers and policy makers
  • Host objects: abstractions of processing resources (one or many)
  • Vault objects: persistent storage
  • Implementation objects and caches: "executables"
  • Binding agents: map objects to physical addresses
  • Context objects: naming of objects

Globus Toolkit V2
– GT2 "Implements Grid protocols for security, information discovery, resource management, data management, communication, fault detection and portability"
– A bag of tools rather than a uniform programming model; aims to provide distinct services with well-defined APIs
– Assumes suitable software is deployed on resources to provide basic Fabric functionality (although some tools to help with this are provided)
  • Discovering and packaging structure and state information

Globus Toolkit version 2
– "Single sign-on" through the Grid Security Infrastructure (GSI)
– Remote execution of jobs
  • GRAM, job-managers, Resource Specification Language (RSL)
– GridFTP
  • Efficient, reliable file transfer; third-party file transfers
– MDS (Metacomputing Directory Service)
  • Resource discovery (GRIS and GIIS)
– Co-allocation (DUROC)
  • Limited by support from the scheduling infrastructure
– Other GSI-enabled utilities
  • gsi-ssh, grid-cvs, etc.
– Low-level APIs and command-line interfaces
– Commodity Grid Kits (CoG kits): Java, Perl, Python
– Widespread deployment, lots of projects
[Figure: layered view, top to bottom: Applications; Diverse global services; Core services; Local OS]

Globus Toolkit V2
Connectivity
– Grid Security Infrastructure (GSI) protocols
  • Based on public-key infrastructure (PKI) and Internet protocols
  • Single sign-on: authentication creates a proxy credential, a digitally signed certificate that grants the holder the right to perform operations on behalf of the signer for a limited time
  • Delegation: communication of a (possibly restricted) proxy credential to a remote service
  • The credential format is an extension of the X.509 certificate
  • The remote delegation protocol is based on the Transport Layer Security (TLS) protocol, the successor to SSL
  • The high-level programming API extends the Generic Security Service Application Programming Interface (GSS-API)

Globus Toolkit V2
Resource Layer
– Grid Resource Allocation and Management (GRAM) protocol
– Monitoring and Discovery Service (MDS-2)
– Grid File Transfer Protocol (GridFTP)

GRAM Protocol
Grid Resource Allocation and Management
– Creation and management of remote computations
– GSI for authentication, authorization and delegation
– GRAM implementations map requests expressed in a Resource Specification Language (RSL) into commands understood by local schedulers and computers
– Multiple GRAM implementations exist (with C, Java and Python interfaces)
– GT2 implementation:
  • Based on the HTTP protocol
  • The "gatekeeper" initiates remote computations
  • The "jobmanager" manages a remote computation
  • The GRAM reporter monitors and publishes information

MDS-2
Monitoring and Discovery Service
– Framework for discovering and accessing structure and status information about resources (and services)
  • Data model for representing information
  • Protocols for publishing and accessing information
– GT2 implementation:
  • Based on LDAP (Lightweight Directory Access Protocol)
  • Local registry to manage the collection and publication of information at a single location
  • Collective registry to support queries for information from multiple locations
  • Caching for performance

GridFTP Protocol
Extended version of the File Transfer Protocol (FTP):
– GSI security
– Partial file access, high-speed striping
– Third-party transfers
– Separate control and data channels

Web Services
A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface that is described in a machine-processable format such as WSDL.
Other systems interact with the Web service in a manner prescribed by its interface, using messages (usually enclosed in a SOAP envelope). These messages are typically conveyed using HTTP and are normally composed of XML.
Software applications written in various programming languages and running on various platforms can use web services to exchange data over networks. This interoperability (e.g., between Java and Python, or between Windows and Linux applications) is due to the use of open standards. OASIS and the W3C are the primary committees responsible for the architecture and standardization of web services. Specifications for additional features are under development.
Basically:
Web service = TRANSPORT (HTTP) + MESSAGING (SOAP) + DESCRIPTION (WSDL) + DISCOVERY (UDDI) + DATA FORMAT (XML)

Service Oriented Architecture
Components are defined by service interfaces (e.g. Web Services)
Characterized by:
– An abstract, logical view of programs, databases, etc.
– Services defined by the messages they exchange (not by properties of the agents themselves)
– The internal structure of an agent is not relevant (so legacy systems can be accommodated)
– Services defined by machine-processable metadata (documented semantics)
– A small number of operations
– Services oriented towards network usage
– Platform neutral (e.g. messages in XML)

Open Grid Services Architecture
Resulted from an attempt to standardize the GT protocols, influenced by the uptake of web services and SOA ideas:
– Modularize components for different grid functions
– Uniform treatment of network entities (service orientation)
– Standard IDLs aligned with Web services
– Develop within a standards body (Global Grid Forum)

Open Grid Services Architecture
Grid Service
– A web service extended to support transient and stateful services
OGSI specification
– Open Grid Services Infrastructure
– Defines interfaces, behaviours and conventions for grid services
– Now replaced by a range of web service specifications
OGSA defines the services and interfaces required in a working grid environment
– GGF working groups are identifying the required functions and then defining OGSI-compliant interfaces
Multiple implementations
– GT3: reference implementation of OGSI and basic OGSA services
– GT4: pure web services

GT4
– Released April 2005
– Service-oriented architecture
– Web services used to describe and invoke most components
– GT4 web service containers for deploying and managing GT4 services (Java, C, Python)
– Most interfaces still need to be standardized

Coursework 3
Write one or two pages describing each of the following Globus components:
– GRAM
– MDS
– GridFTP
The best documentation and relevant papers are at http://www.globus.org

Required Reading
The Physiology of the Grid
– See course page for link