Monitoring and Discovery - Pegasus Workflow Management System

Grid Discovery and Monitoring Systems
Laura Pearlman, USC/Information Sciences Institute
With materials from Ben Clifford and others from the Globus Project Team

Outline
• Overview of information systems
• Some real implementations
  - Globus MDS2 / BDII
  - Globus MDS4
  - Inca
  - GMA / R-GMA

Discovery and Monitoring
• Discovery: finding resources that exist, at any moment, possibly meeting some criteria
  - E.g., “find linux boxes with Java 1.5 installed”
• Monitoring: determining the state of one or more resources
  - E.g., “how much memory is free on machine X?”
• “Monitoring” and “Discovery” information sometimes overlap
  - “find me machines with 2G memory” vs. “how much memory does Machine X have”

Examples of Useful Information
• Characteristics of a compute resource
  - Software available, networks connected to, load, type of CPU, disk space
• Characteristics of a network
  - Bandwidth and latency, protocols
• Information about a service
  - Contact info, version number, etc.

Who uses this information?
• Individual users, trying to pick the ‘best’ resource
• Brokers or workflow systems trying to find suitable resources
• VO administrators who want to know the state of every resource
• System administrators may use this information, but probably also have local site monitoring systems in place

What Interfaces are Needed?
• Graphic and command-line interfaces for individual users and administrators
• Programmatic interfaces for brokers, workflow systems, etc.
• Asynchronous notifications for administrators
  - “send me mail when we’re almost out of disk space”

Monitoring/Discovery Problems in Grids
• Dynamic in nature
  - VOs come and go
  - Resources join and leave VOs
  - Resources change status and fail
• Geographically distributed users
• Geographically distributed resources
• Heterogeneous implementations

Grid Information: Facts of Life
• Information is always old
  - Distributed state hard to obtain
• Components will fail
  - We must deal with this gracefully
• Scalability and overhead
  - Many different usage scenarios

Resource Discovery/Monitoring
[Figure: resources (R) grouped into VO-A and VO-B, reached over a network by dispersed users]
• Distributed users and resources
• Variable resource status
• Variable grouping

Resource Discovery/Monitoring
[Figure: the same diagram of resources (R), VO-A, VO-B, network, and dispersed users]
• Some resources have failed
• A network partition has occurred
• Still, some work can get done…

Scalability
• Large numbers
  - Many resources
  - Many users
• Independence
  - Resources shouldn’t affect one another
  - VOs shouldn’t affect one another
• Graceful degradation of service
  - “As much function as possible”
  - Tolerate partitions, prune failures

Failure Scenarios
• User is disconnected
• Resource fails or is disconnected
• Discovery service fails or is disconnected
• Network partition

When a user is disconnected
• This should not adversely affect other users
• Some state (such as the user’s subscriptions) may need to be cleaned up.
• Some systems use soft-state to deal with this issue:
  - Subscriptions are valid for a limited time and must be periodically refreshed
  - If the user does not come back in time to refresh the subscription, it will be removed automatically.

When a resource disappears
• Monitoring services should indicate that the resource is no longer there
• Discovery services should stop advertising the resource
• Neither of these can be guaranteed to happen instantaneously.
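
The soft-state idea from the "When a user is disconnected" slide can be sketched in a few lines of Python. This is only an illustration, not code from any of the systems discussed: the class name SoftStateRegistry, its methods, and the 60-second lifetime are all assumptions made for the example. The same pattern applies to resource registrations, which is why a vanished resource stops being advertised only after its registration times out.

```python
import time


class SoftStateRegistry:
    """Hypothetical registry whose entries expire unless they are refreshed."""

    def __init__(self, lifetime_s, clock=time.monotonic):
        self.lifetime_s = lifetime_s
        self.clock = clock        # injectable so the example is deterministic
        self._expires_at = {}     # subscriber id -> time at which it expires

    def refresh(self, subscriber):
        """Create a subscription, or renew one that already exists."""
        self._expires_at[subscriber] = self.clock() + self.lifetime_s

    def prune(self):
        """Remove every subscription whose lifetime has run out."""
        now = self.clock()
        expired = [s for s, t in self._expires_at.items() if t <= now]
        for s in expired:
            del self._expires_at[s]
        return expired

    def active(self):
        """Return the subscribers that are still registered."""
        self.prune()
        return sorted(self._expires_at)


# Simulated clock: alice keeps refreshing, bob disconnects and never returns.
now = [0.0]
registry = SoftStateRegistry(lifetime_s=60, clock=lambda: now[0])
registry.refresh("alice")
registry.refresh("bob")
now[0] = 45.0
registry.refresh("alice")
now[0] = 90.0
print(registry.active())  # ['alice']
```

No explicit "unsubscribe" message is needed: bob's entry disappears on its own, at the cost of staying visible for up to one lifetime after he leaves.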

When a discovery service dies
• Users cannot discover new resources.
• They may have old information cached; this data is still useful, although it degrades in quality/usefulness.
• Users can contact the resources directly and determine their status.
• Some implementations allow for mirroring of discovery services.

When the network is partitioned
• This could be seen as a generalization of some of the previous scenarios: all of the previous scenarios can be modelled as appropriate network partitions.
• If there is a discovery service in a user’s partition, the user should be able to discover resources in that partition.

Information Systems
• We sometimes refer to Discovery and Monitoring as “Information Systems”
  - This is misleading, as we’re not including general-purpose database systems
• Discovery and Monitoring information is:
  - Often stale as soon as it’s reported
  - Sometimes inconsistent
  - Often updated by running probes, either on-demand or periodically

Discovery Services
• Used to locate monitoring services with information about resources.
• May cache some resource data
  - May even cache enough resource data to act as a monitoring system.
• Generally involve a database-like query interface
  - Languages like LDAP, XPath, SQL
• Usually a relatively small number (maybe even just one, or one with a mirror) are deployed in a VO.

Two Models for Discovery Services
[Figure: two diagrams, one with a Discovery Service and several Monitoring Services, the other with a combined Monitoring & Discovery Service and several Monitoring Services]

Monitoring Services
• Used to monitor the state of a resource
• Service interface usually involves db-like queries
  - With languages like LDAP, XPath, SQL
• Often also provides for asynchronous notification
• Typically also includes a back-end provider interface
  - Allows locally-written scripts, programs, etc. to collect information for the monitoring service
• Typically deployed on each host that houses a resource.
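
As a rough illustration of the back-end provider interface described on the "Monitoring Services" slide, the sketch below lets locally written functions plug into a small monitoring service. The names MonitoringService, register_provider, and query are hypothetical, and a real service would expose an LDAP, XPath, or SQL query interface rather than a Python method.

```python
import os
import shutil
import time


class MonitoringService:
    """Hypothetical per-host monitoring service with pluggable providers."""

    def __init__(self):
        self._providers = {}   # metric name -> zero-argument callable

    def register_provider(self, name, collect):
        """Plug in a locally written probe under the given metric name."""
        self._providers[name] = collect

    def query(self, names=None):
        """Run the selected providers (all by default) and report their values."""
        selected = list(self._providers) if names is None else names
        report = {}
        for name in selected:
            report[name] = {
                "value": self._providers[name](),
                "collected_at": time.time(),   # lets consumers judge staleness
            }
        return report


def free_disk_bytes():
    """Example probe: free space on the filesystem holding the current directory."""
    return shutil.disk_usage(".").free


service = MonitoringService()
service.register_provider("free_disk_bytes", free_disk_bytes)
service.register_provider("cpu_count", os.cpu_count)
print(sorted(service.query()))  # ['cpu_count', 'free_disk_bytes']
```

Each value carries a collection timestamp because, as noted earlier, monitoring information is often stale as soon as it is reported.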

How Different Implementations Differ
• Overall architecture
  - Are monitoring and discovery separate?
• Wire protocol
  - LDAP, Web Services, custom
• Query Language
  - LDAP, XPath, SQL
• Caching Strategies
• Schemas
  - Really more a deployment issue

MDS2 / BDII history
• MDS2 was developed as part of the Globus Toolkit
  - It’s now superseded by MDS4, which has a different architecture.
• BDII is a reimplementation of MDS2 by EGEE, and is still in use.

MDS2 Architecture Overview
• The Grid Resource Information Service (GRIS) collects information about a local resource and responds to requests for that information
  - Uses pluggable information providers
• The Grid Index Information Service (GIIS) aggregates information from various GRIS servers
• Users may query the GIIS for aggregated information or query the GRIS servers directly.
• GIIS servers may be arranged hierarchically.

MDS2 Architecture
[Figure: GIIS servers above GRIS servers, each GRIS with its information providers (IP)]

MDS2 GIIS
• Grid Index Information Service (GIIS) servers aggregate information from GRIS servers and other GIIS servers.
  - These other servers register themselves to the GIIS server.
  - Registrations must be periodically refreshed
• GIIS servers cache information (results from previous queries).
• If a GIIS server receives a query for which there is no fresh cached information, it forwards the query to its registered servers.

MDS2 GRIS
• A Grid Resource Information Server (GRIS):
  - Runs on each host that has resources to be monitored.
  - Accepts requests for information about local resources
     · May come from users or GIIS servers
  - Runs a local “information provider” to collect and format the information
     · Unless the requested information is cached and relatively fresh
  - Caches the information and replies to the request

MDS2 Query Language
• Both the GIIS and GRIS servers use LDAP as the service protocol and query language.
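
The GRIS caching rule and the GIIS aggregation described on the MDS2 slides above can be sketched as follows. This is a simplified model under stated assumptions, not MDS2 code: the class names Gris and Giis, the max_age_s parameter, and the dictionary results are invented for the example, and the real servers speak LDAP. The GIIS's own cache and the expiry of registrations are left out to keep the sketch short.

```python
import time


class Gris:
    """Hypothetical GRIS: re-runs its provider only when the cache is stale."""

    def __init__(self, host, provider, max_age_s, clock=time.monotonic):
        self.host = host
        self.provider = provider      # callable standing in for an information provider
        self.max_age_s = max_age_s
        self.clock = clock
        self._cached = None
        self._fetched_at = None
        self.provider_runs = 0        # counted only to make the caching visible

    def query(self):
        now = self.clock()
        stale = self._fetched_at is None or now - self._fetched_at > self.max_age_s
        if stale:
            self._cached = self.provider()
            self._fetched_at = now
            self.provider_runs += 1
        return {self.host: self._cached}


class Giis:
    """Hypothetical GIIS: forwards a query to every registered server and merges the answers."""

    def __init__(self):
        self._registered = []

    def register(self, server):
        """Accept a GRIS or another GIIS; both expose query()."""
        self._registered.append(server)

    def query(self):
        merged = {}
        for server in self._registered:
            merged.update(server.query())
        return merged


# Two hosts behind a site-level GIIS, which registers with a top-level GIIS.
now = [0.0]
gris_a = Gris("host-a", lambda: {"load": 0.5}, max_age_s=30, clock=lambda: now[0])
gris_b = Gris("host-b", lambda: {"load": 1.5}, max_age_s=30, clock=lambda: now[0])
site = Giis()
site.register(gris_a)
site.register(gris_b)
top = Giis()
top.register(site)

print(top.query())           # {'host-a': {'load': 0.5}, 'host-b': {'load': 1.5}}
top.query()                  # answered from each GRIS cache
print(gris_a.provider_runs)  # 1
```

Because a GIIS and a GRIS answer the same query() call, a GIIS can register with another GIIS, which is what allows the hierarchical arrangement mentioned above.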

LDAP Basics
• Hierarchical data model
• Each entry has a distinguished name and a set of attribute/value pairs
• Distinguished name
  - Is a collection of name-value pairs
  - Must be unique
  - Determines the entry’s place in the hierarchy
     · Each entry’s DN must include its parent’s DN
• Queries
  - Can search on attributes or DNs
  - Results can include children (or not) or include only certain attributes.

MDS4 Overview
• MDS4 is a redesign of MDS
• The MDS4 Index Service acts as both a monitoring and discovery service.
  - Uses WSRF standard resource property queries as its query interface.
• A second monitoring service, the MDS4 Trigger Service, examines aggregated information and takes action when certain conditions are met.
  - E.g., “send email when a remote system appears to be down”.
• MDS4 uses WSRF standards for its query and registration interfaces.

WS-Resource Review
• A WS-Resource is a Web Service that exposes internal state as Resource Properties
  - An XML element of arbitrary complexity
• Each WS-Resource has a Resource Property Document
  - An XML document that includes all its Resource Properties
• Example: The WS-GRAM service advertises information about its associated queues and clusters as a resource property.

Retrieving Resource Properties
• GetResourceProperty
  - Gets a single named resource property
• GetMultipleResourceProperties
  - Gets a set of named resource properties
• QueryResourceProperties
  - Returns the results of a query against a resource’s resource property set
• Subscription/notification
  - Clients subscribe and get periodic or occasional notifications

What this means…
• Standard requests can be used to get state information from any WS-Resource.
• This means that every WS-Resource is also a monitoring service!
  - But not necessarily monitoring anything (i.e., providing any interesting state)
• We sometimes want information from sources other than WS-Resources
  - Non-WSRF services
  - General system information
  - Catalogues of installed software

Service Groups Review
• A service group is a service that represents a group of other services or resources
• Service groups contain Service Group Entries (SGEs), which consist of:
  - The address of the SGE itself,
  - The address of the Service Group that the SGE belongs to, and
  - A Content element consisting of arbitrarily-formatted data
• SGEs are created via the Service Group Add request

The MDS4 Index Service
• Acts as a Discovery Service
  - Gathers information from other WS-Resources
  - Including other Index Servers
• Acts as a Monitoring Service
  - Caches all the information it gathers
• Also has a pluggable interface for Information Providers
  - Programs or Java classes that gather information

An MDS4 Index Deployment
[Figure: several Index services, together with GRAM and RFT services and information providers (IP)]

The MDS4 Index Data Model
• The Index Service keeps its data as a Service Group
  - Registering a new resource to be monitored is accomplished by adding a service group entry to the service group.
• The data in each SGE contains both:
  - Configuration information
     · E.g., “query the X resource property from server Y”
  - and the actual collected data.

Index Data Model (simplified)
[Figure: an Index Service Group holding SGEs; each SGE has an SGE EPR, an SG EPR, and Content; Content holds Config (GetRP, RP, EPR) and Data (GLUECE: Queue with Name and State, Cluster with Name and OS)]

Data Model continued
• In the Index Service data model, data is grouped with its configuration information
• Can have the “same” data in two different places in the tree, if it was acquired from two different information sources.
  - E.g., information about a host’s load average from two different GRAM servers running on that host.
• Relatively easy to find where each piece of data came from.
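
The data model above, in which each service group entry carries its configuration alongside the data collected with it, might be modelled like this. The sketch is illustrative only: the class and field names, the "source" config key, and the example.org addresses are assumptions, and the real Index Service stores XML rather than Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceGroupEntry:
    entry_epr: str    # address of the SGE itself
    group_epr: str    # address of the service group the SGE belongs to
    config: dict      # how to collect, e.g. which property from which server
    data: dict = field(default_factory=dict)   # most recently collected values


@dataclass
class IndexServiceGroup:
    epr: str
    entries: list = field(default_factory=list)

    def add(self, entry_epr, config):
        """Register a resource to be monitored by adding an entry for it."""
        entry = ServiceGroupEntry(entry_epr, self.epr, config)
        self.entries.append(entry)
        return entry

    def find(self, key):
        """Return (source, value) for every entry whose data contains key."""
        return [(e.config["source"], e.data[key]) for e in self.entries if key in e.data]


# The "same" load average, acquired from two different GRAM servers on one host.
index = IndexServiceGroup("https://index.example.org/sg")
first = index.add("https://index.example.org/sge/1",
                  {"source": "gram-a", "property": "GLUECE"})
second = index.add("https://index.example.org/sge/2",
                   {"source": "gram-b", "property": "GLUECE"})
first.data["load_average"] = 0.42
second.data["load_average"] = 0.45

print(index.find("load_average"))  # [('gram-a', 0.42), ('gram-b', 0.45)]
```

Keeping config and data in the same entry is what makes it easy to tell where each value came from, at the price of possibly holding two slightly different copies of the same fact.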

How the Index Updates its Data
• Periodically, the Index Service examines each SGE in its Service Group
• If the SGE’s registration has expired and not been renewed, it is destroyed.
• Otherwise, the Index
  - looks at the Config part of the SGE content,
  - gathers data as specified by that config information, and
  - updates the data in the Data part of the SGE content
• Data is updated periodically, not on demand.

Querying the Index Service
• The Index Service advertises its service group as a resource property
  - You can fetch the whole thing with GetRP or GetMultipleRPs
  - Most people use QueryRP to query it.
• QueryRP allows you to specify a dialect and a query
  - Currently, only XPath is supported as a dialect

XPath Queries
• Search an XML document and return some subset of the XML entities.
• If an entity is included in the results, it’s included in its entirety
  - Unlike LDAP, no way to leave out attributes or children

MDS4 Trigger Service
• A second monitoring service in MDS4
• The Index is geared more towards queries intended for resource location and selection.
• The Trigger service is intended to alert people to problems.
  - Can be configured to take action (e.g., send mail to an administrator) when issues arise.

MDS4 Trigger Service
• Maintains information in a service group, like the Index Service
• SGE config information also includes an XPath query and an action
  - The action is the name of a program to run.
• Periodically, the trigger service looks at each SGE in its service group:
  - It evaluates the SGE’s XPath query against the SGE’s data.
  - If the query returns true, it runs the program specified by the action.

MDS4 WebMDS
• Provides a simple HTTP interface to query an MDS Index Service
  - Really, to query resource properties of any WS-Resource
• Optionally applies XSLT transforms to the query results.
• Designed as a user interface, to be used with a web browser
  - But some people are using it to provide a REST-like interface to MDS4.
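
The Trigger Service cycle described above (evaluate each entry's XPath query against its data, then run the action on a match) can be approximated with Python's standard library. This is a sketch under assumptions: the entry layout, the Cluster/Host XML, and the function name evaluate_triggers are invented; the action is a Python callable standing in for the program the real service would run; and xml.etree.ElementTree supports only a subset of XPath, so a non-empty match list is used in place of "the query returns true".

```python
import xml.etree.ElementTree as ET


def evaluate_triggers(entries):
    """Run each entry's action when its query matches; return the names that fired."""
    fired = []
    for entry in entries:
        root = ET.fromstring(entry["data"])
        # Assumption: any match at all counts as the query being true.
        if root.findall(entry["xpath"]):
            entry["action"](entry["name"])
            fired.append(entry["name"])
    return fired


alerts = []   # stands in for "send mail to an administrator"
entries = [
    {
        "name": "cluster-a",
        "xpath": ".//Host[@state='down']",
        "data": "<Cluster><Host name='n1' state='up'/>"
                "<Host name='n2' state='down'/></Cluster>",
        "action": alerts.append,
    },
    {
        "name": "cluster-b",
        "xpath": ".//Host[@state='down']",
        "data": "<Cluster><Host name='n3' state='up'/></Cluster>",
        "action": alerts.append,
    },
]

print(evaluate_triggers(entries))  # ['cluster-a']
print(alerts)                      # ['cluster-a']
```

In a deployment this function would be called on a timer, matching the periodic evaluation the slide describes.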

INCA
• Monitoring system developed at SDSC
• Users define tests for Inca to run.
• Inca runs them and stores the results in a database.
• Users can view the results on a web page.
• Can be configured to send mail if tests fail, etc.
• Can run tests using the user’s credentials
From the Inca 2.1 User’s Guide, http://inca.sdsc.edu/releases/2.1/guide/userguide.html

Inca Query Interface
• Uses an SQL database internally
• End-users can query using a web page or receive notifications via email.
• A web-services interface is also available
  - Uses a custom query language
• Overall a nice monitoring/testing framework
• Not designed as a discovery service

GMA (Grid Monitoring Architecture)
• Proposed architecture with three components:
  - Producers produce information
  - Consumers consume information
  - Directories keep track of what information is available
     · what producers can be queried, not the actual data
Diagram from “A Grid Monitoring Architecture”, B. Tierney et al., http://wwwdidc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-2.pdf

R-GMA
• Relational Grid Monitoring Architecture
• Implements the GMA model
  - Except that users never interact with the directory service (called a “registry” in R-GMA)
  - A consumer service does that instead, and users query the consumer service.
• Uses SQL as its query language.

An R-GMA Query
• Client sends SQL query to Consumer Service
• Consumer Service contacts registry for list of producers to contact
• Consumer Service queries producers and buffers results
• Client retrieves results from Consumer Service
Diagram from “R-GMA: Architectural Design” at http://www.r-gma.org/archconsumers.html

For More Information
• Globus: http://www.globus.org
• Inca: http://inca.sdsc.edu
• R-GMA: http://www.r-gma.org
• XML / Xpath / XSLT: http://www.w3c.org