Download Monitoring and Discovery - Pegasus Workflow Management System

Grid Discovery and
Monitoring Systems
Laura Pearlman
USC/Information Sciences Institute
With materials from Ben Clifford and others
from the Globus Project Team
Outline

Overview of information systems

Some real implementations

Globus MDS2 / BDII

Globus MDS4

Inca

GMA / R-GMA
Discovery and Monitoring

Discovery: finding resources that exist, at any moment, possibly meeting some criteria
  E.g., “find Linux boxes with Java 1.5 installed”

Monitoring: determining the state of one or more resources
  E.g., “how much memory is free on machine X?”

“Monitoring” and “Discovery” information sometimes overlap
  “find me machines with 2G memory” vs. “how much memory does Machine X have?”
Examples of Useful Information
Characteristics of a compute resource
  Software available, networks connected to, load, type of CPU, disk space
Characteristics of a network
  Bandwidth and latency, protocols
Information about a service
  Contact info, version number, etc.
Who uses this information?



Individual users, trying to pick the ‘best’ resource
Brokers or workflow systems trying to find suitable
resources
VO administrators who want to know the state of
every resource.

System administrators may use this information, but
probably also have local site monitoring systems in
place
What Interfaces are Needed?



Graphic and command-line interfaces for
individual users and administrators
Programmatic interfaces for brokers,
workflow systems, etc.
Asynchronous notifications for
administrators

“send me mail when we’re almost out of
disk space”
Monitoring/Discovery
Problems in Grids

Dynamic in nature

VOs come and go

Resources join and leave VOs

Resources change status and fail

Geographically distributed users

Geographically distributed resources

Heterogeneous implementations
Grid Information: Facts of Life
Information is always old
Distributed state hard to obtain
Components will fail
  We must deal with this gracefully
Scalability and overhead
Many different usage scenarios
Resource Discovery/Monitoring
[Diagram: dispersed users (?) reaching resources (R) across a network; the resources are grouped into VO-A and VO-B]

Distributed users and resources

Variable resource status

Variable grouping
Resource Discovery/Monitoring
[Diagram: the same dispersed users (?), network, and resources (R) grouped into VO-A and VO-B, after failures]

Some resources have failed

A network partition has occurred

Still, some work can get done…
Scalability

Large numbers
  Many resources
  Many users

Independence
  Resources shouldn’t affect one another
  VOs shouldn’t affect one another

Graceful degradation of service
  “As much function as possible”
  Tolerate partitions, prune failures

Failure Scenarios

User is disconnected

Resource fails or is disconnected

Discovery service fails or is disconnected

Network partition
When a user is disconnected



This should not adversely affect other users
Some state (such as the user’s subscriptions) may
need to be cleaned up.
Some systems use soft-state to deal with this
issue:


Subscriptions are valid for a limited time and must
be periodically refreshed
If the user does not come back in time to refresh the
subscription, it will be removed automatically.
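
The soft-state idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular Grid system implements it: the class name `SoftStateRegistry`, the 60-second lifetime, and the injectable `clock` are all assumptions made for the example.

```python
import time


class SoftStateRegistry:
    """Holds subscriptions that expire unless they are refreshed (soft state)."""

    def __init__(self, lifetime_seconds, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock          # injectable so the example can use a fake clock
        self._expires_at = {}       # subscriber id -> expiry time

    def subscribe(self, subscriber_id):
        """Create or refresh a subscription; both simply reset the expiry time."""
        self._expires_at[subscriber_id] = self.clock() + self.lifetime

    refresh = subscribe

    def prune(self):
        """Remove every subscription whose lifetime has run out."""
        now = self.clock()
        expired = [s for s, t in self._expires_at.items() if t <= now]
        for s in expired:
            del self._expires_at[s]
        return expired

    def active(self):
        self.prune()
        return sorted(self._expires_at)


# Usage with a fake clock: alice refreshes in time, bob is disconnected.
now = [0.0]
registry = SoftStateRegistry(lifetime_seconds=60, clock=lambda: now[0])
registry.subscribe("alice")
registry.subscribe("bob")
now[0] = 45.0
registry.refresh("alice")
now[0] = 70.0
print(registry.active())
```

No explicit "unsubscribe on disconnect" step is needed: a user who never comes back simply stops refreshing, and the state is cleaned up on the next prune.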
When a resource disappears



Monitoring services should indicate that
the resource is no longer there
Discovery services should stop advertising
the resource
Neither of these can be guaranteed to
happen instantaneously.
When a discovery service dies




Users cannot discover new resources.
They may have old information cached –
this data is still useful, although it degrades
in quality/usefulness.
Users can contact the resources directly
and determine their status.
Some implementations allow for mirroring
of discovery services.
When the network is partitioned


This could be seen as a generalization of
some of the previous scenarios – all of the
previous scenarios can be modelled as
appropriate network partitions.
If there is a discovery service in a user’s
partition, the user should be able to
discover resources in that partition.
Information Systems

We sometimes refer to Discovery and
Monitoring as “Information Systems”


This is misleading, as we’re not including
general-purpose database systems
Discovery and Monitoring information is:

Often stale as soon as it’s reported

Sometimes inconsistent

Often updated by running probes, either
on-demand or periodically
Discovery Services


Used to locate monitoring services with information about resources.
May cache some resource data
  May even cache enough resource data to act as a monitoring system.
Generally involve a database-like query interface
  Languages like LDAP, XPath, SQL
Usually a relatively small number (maybe even just one, or one with a mirror) are deployed in a VO.
Two Models for Discovery
Services
[Diagram: two models. One shows a Discovery Service together with several separate Monitoring Services; the other shows a combined Monitoring & Discovery Service.]
Monitoring Services

Used to monitor the state of a resource
Service interface usually involves db-like queries
  With languages like LDAP, XPath, SQL
Often also provides for asynchronous notification
Typically also includes a back-end provider interface
  Allows locally-written scripts, programs, etc. to collect information for the monitoring service
Typically deployed on each host that houses a resource.
How Different
Implementations Differ

Overall architecture
  Are monitoring and discovery separate?
Wire protocol
  LDAP, Web Services, custom
Query Language
  LDAP, XPath, SQL
Caching Strategies
Schemas
  Really more a deployment issue
MDS2 / BDII history

MDS2 was developed as part of the Globus
Toolkit


It’s now superseded by MDS4, which has a
different architecture.
BDII is a reimplementation of MDS2 by
EGEE, and is still in use.
MDS2 Architecture Overview

The Grid Resource Information Service (GRIS)
collects information about a local resource and
responds to requests for that information




Uses pluggable information providers
The Grid Index Information Service (GIIS)
aggregates information from various GRIS servers
Users may query the GIIS for aggregated
information or query the GRIS servers directly.
GIIS servers may be arranged hierarchically.
MDS2 Architecture
[Diagram: GIIS servers arranged hierarchically above GRIS servers; each GRIS has its own information providers (IP)]
MDS2 GIIS

Grid Index Information Service (GIIS) servers
aggregate information from GRIS servers and
other GIIS servers.




These other servers register themselves to the GIIS
server.
Registrations must be periodically refreshed
GIIS servers cache information (results from
previous queries).
If a GIIS server receives a query for which there is
no fresh cached information, it forwards the query
to its registered servers.
MDS2 GRIS

A Grid Resource Information Service (GRIS):
  Runs on each host that has resources to be monitored.
  Accepts requests for information about local resources
    May come from users or GIIS servers
  Runs a local “information provider” to collect and format the information
    Unless the requested information is cached and relatively fresh
  Caches the information and replies to the request
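
The GRIS request path described above (answer from cache if fresh, otherwise run the provider, cache, reply) might look roughly like this. It is a toy model rather than MDS2 code: the `Gris` class, the `max_age_seconds` parameter, and the `free_memory_provider` function are hypothetical names, and a real provider would probe the host instead of returning canned values.

```python
import time


class Gris:
    """Toy GRIS: answers from cache when fresh, else runs an information provider."""

    def __init__(self, providers, max_age_seconds, clock=time.monotonic):
        self.providers = providers      # name -> zero-argument callable
        self.max_age = max_age_seconds
        self.clock = clock
        self._cache = {}                # name -> (timestamp, value)

    def query(self, name):
        now = self.clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] <= self.max_age:
            return cached[1]            # relatively fresh: skip the provider
        value = self.providers[name]()  # run the local information provider
        self._cache[name] = (now, value)
        return value


calls = []


def free_memory_provider():
    """Hypothetical provider; returns a different value on each run."""
    calls.append(1)
    return {"free_memory_mb": 512 * len(calls)}


now = [0.0]
gris = Gris({"memory": free_memory_provider}, max_age_seconds=30,
            clock=lambda: now[0])
first = gris.query("memory")    # provider runs
second = gris.query("memory")   # served from cache
now[0] = 31.0
third = gris.query("memory")    # cache is stale, provider runs again
```

The same request may come from a user or from a GIIS; the GRIS does not need to distinguish between them.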
MDS2 Query Language

Both the GIIS and GRIS servers use LDAP
as the service protocol and query
language.
LDAP Basics



Hierarchical data model
Each entry has a distinguished name and a set of attribute/value pairs
Distinguished name
  Is a collection of name-value pairs
  Must be unique
  Determines the entry’s place in the hierarchy
    Each entry’s DN must include its parent’s DN
Queries
  Can search on attributes or DNs
  Results can include children (or not) or include only certain attributes.
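
To make the LDAP data model concrete, here is a small pure-Python imitation of it. It does not talk to an LDAP server and does not use LDAP filter syntax; the DNs, attribute names, and the `search` helper are invented for illustration. It shows the three points above: a child's DN contains its parent's DN, a search can be limited to an entry, its children, or its whole subtree, and results can be trimmed to selected attributes.

```python
# Each entry: distinguished name -> attribute/value pairs (all made up).
directory = {
    "o=grid": {"objectclass": "organization"},
    "ou=site-a, o=grid": {"objectclass": "site"},
    "host=node1, ou=site-a, o=grid": {
        "objectclass": "host", "os": "linux", "memory_mb": "2048"},
    "host=node2, ou=site-a, o=grid": {
        "objectclass": "host", "os": "solaris", "memory_mb": "4096"},
}


def parent_dn(dn):
    """A child's DN is its own name-value pair followed by its parent's DN."""
    _, _, rest = dn.partition(",")
    return rest.strip() or None


def search(base, scope, match, attrs=None):
    """scope: 'base' (the entry itself), 'one' (direct children), 'sub' (subtree)."""
    results = {}
    for dn, entry in directory.items():
        if scope == "base":
            in_scope = dn == base
        elif scope == "one":
            in_scope = parent_dn(dn) == base
        else:
            in_scope = dn == base or dn.endswith(", " + base)
        if in_scope and all(entry.get(k) == v for k, v in match.items()):
            results[dn] = {k: v for k, v in entry.items()
                           if attrs is None or k in attrs}
    return results


# Whole subtree, filtered on an attribute, returning only one attribute.
linux_hosts = search("o=grid", "sub", {"os": "linux"}, attrs=["memory_mb"])
# Direct children of the root only.
sites = search("o=grid", "one", {})
```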
MDS4 Overview


MDS4 is a redesign of MDS
The MDS4 Index Service acts as both a monitoring and discovery service.
  Uses WSRF standard resource property queries as its query interface.
A second monitoring service, the MDS4 Trigger Service, examines aggregated information and takes action when certain conditions are met.
  E.g., “send email when a remote system appears to be down”.
MDS4 uses WSRF standards for its query and registration interfaces.
WS-Resource Review

A WS-Resource is a Web Service that exposes internal state as Resource Properties
  An XML element of arbitrary complexity
Each WS-Resource has a Resource Property Document
  An XML document that includes all its Resource Properties
Example: The WS-GRAM service advertises information about its associated queues and clusters as a resource property.
Retrieving Resource Properties

GetResourceProperty
  Gets a single named resource property
GetMultipleResourceProperties
  Gets a set of named resource properties
QueryResourceProperty
  Returns the results of a query against a resource’s resource property set
Subscription/notification
  Clients subscribe and get periodic or occasional notifications
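
The difference between the three retrieval operations can be imitated locally with Python's standard XML library. This is not a WSRF client: the functions below only mimic what each request returns, and the property document and its element names (`Version`, `Contact`, `Queue`) are invented for the example.

```python
import xml.etree.ElementTree as ET

# A made-up resource property document; element names are illustrative only.
RP_DOCUMENT = ET.fromstring("""
<ResourceProperties>
  <Version>4.0</Version>
  <Contact>admin@example.org</Contact>
  <Queue name="short" state="open"/>
  <Queue name="long" state="closed"/>
</ResourceProperties>
""")


def get_resource_property(doc, name):
    """Mimics GetResourceProperty: all elements of one named property."""
    return doc.findall(name)


def get_multiple_resource_properties(doc, names):
    """Mimics GetMultipleResourceProperties: several named properties at once."""
    return {name: doc.findall(name) for name in names}


def query_resource_properties(doc, xpath):
    """Mimics QueryResourceProperty: a query over the whole property document."""
    return doc.findall(xpath)


version = get_resource_property(RP_DOCUMENT, "Version")[0].text
both = get_multiple_resource_properties(RP_DOCUMENT, ["Version", "Contact"])
open_queues = query_resource_properties(RP_DOCUMENT, "./Queue[@state='open']")
```

The first two operations select by property name only; the query form can also filter on content, which is why it is the one most useful for discovery.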
What this means…


Standard requests can be used to get state
information from any WS-Resource.
This means that every WS-Resource is also a
monitoring service!


But not necessarily monitoring anything (i.e.,
providing any interesting state)
We sometimes want information from sources
other than WS Resources



Non-WSRF services
General system information
Catalogues of installed software
Service Groups Review


A service group is a service that represents a
group of other services or resources
Service groups contain Service Group Entries
(SGEs), which consist of:




The address of the SGE itself,
The address of the Service Group that the SGE
belongs to, and
A Content element consisting of arbitrarily-formatted
data
SGEs are created via the Service Group Add
request
The MDS4 Index Service

Acts as a Discovery Service



Gathers information from other WS-Resources
Including other Index Servers
Acts as a Monitoring Service


Caches all the information it gathers
Also has a pluggable interface for
Information Providers

Programs or Java classes that gather information
An MDS4 Index Deployment
[Diagram: a hierarchy of Index services, with GRAM and RFT services and information providers (IP) feeding into them]
The MDS4 Index Data Model

The Index Service keeps its data as a
Service Group


Registering a new resource to be monitored
is accomplished by adding a service group
entry to the service group.
The data in each SGE contains both:

Configuration information


E.g., “query the X resource property from server Y”
and the actual collected data.
Index Data Model (simplified)
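[Diagram: the Index Service Group contains SGEs. Each SGE holds the SG EPR, the SGE EPR, and a Content element. Content has a Config part (GetRP, RP, EPR) and a Data part (e.g., GLUECE with Queue Name and State, Cluster Name and OS).]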
Data Model continued


In the Index Service data model, data is
grouped with its configuration information
Can have the “same” data in two different
places in the tree, if it was acquired from
two different information sources.


E.g., information about a host’s load
average from two different GRAM servers
running on that host.
Relatively easy to find where each piece of
data came from.
How the Index Updates its Data



Periodically, the Index Service examines each SGE
in its Service Group
If the SGE’s registration has expired and not been
renewed, it is destroyed.
Otherwise, the Index




looks at the Config part of the SGE content,
gathers data as specified by that config information,
and
updates the data in the Data part of the SGE content
Data is updated periodically, not on demand.
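
The update cycle above can be sketched as follows. This is a simplified model, not the MDS4 implementation: `ServiceGroupEntry`, `IndexService`, and `update_once` are hypothetical names, and a plain Python callable stands in for the Config part of an SGE.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ServiceGroupEntry:
    config: Callable[[], Any]   # how to gather the data (stands in for Config)
    expires_at: float           # end of the registration lifetime
    data: Any = None            # the Data part, filled in by the index


@dataclass
class IndexService:
    clock: Callable[[], float] = time.monotonic
    entries: Dict[str, ServiceGroupEntry] = field(default_factory=dict)

    def register(self, key, gather, lifetime):
        self.entries[key] = ServiceGroupEntry(gather, self.clock() + lifetime)

    def update_once(self):
        """One periodic pass over every SGE in the service group."""
        now = self.clock()
        for key in list(self.entries):
            sge = self.entries[key]
            if sge.expires_at <= now:
                del self.entries[key]     # expired and not renewed: destroy
            else:
                sge.data = sge.config()   # gather per config, store in Data

    def query(self, key):
        """Queries read stored data; they never trigger collection."""
        return self.entries[key].data


now = [0.0]
index = IndexService(clock=lambda: now[0])
index.register("gram-1", lambda: {"load": 0.3}, lifetime=100)
index.register("gram-2", lambda: {"load": 0.9}, lifetime=10)
before = index.query("gram-1")   # nothing has been collected yet
now[0] = 20.0
index.update_once()              # gram-2 has expired, gram-1 is refreshed
```

Because collection happens only in the periodic pass, a query made before the first pass sees no data, and a query made between passes sees data that may already be out of date.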
Querying the Index Service

The Index Service advertises its service
group as a resource property



You can fetch the whole thing with GetRP or
GetMultipleRPs
Most people use QueryRP to query it.
QueryRP allows you to specify a dialect and
a query

Currently, only XPath is supported as a
dialect
XPath Queries


Search an XML document and return some
subset of the XML entities.
If an entity is included in the results, it’s
included in its entirety

Unlike LDAP, no way to leave out attributes
or children
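
A short example of the "included in its entirety" behaviour, using Python's `xml.etree.ElementTree`, which supports only a subset of XPath. The document structure (`Index`, `Entry`, `Host`, `OS`, `Load`) is made up for the example and is not the real Index Service schema.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<Index>
  <Entry>
    <Host name="node1"><OS>linux</OS><Load>0.2</Load></Host>
  </Entry>
  <Entry>
    <Host name="node2"><OS>solaris</OS><Load>1.7</Load></Host>
  </Entry>
</Index>
""")

# Select Host elements whose OS child has the text "linux".
matches = doc.findall(".//Host[OS='linux']")

for host in matches:
    # Each match is the whole element: attributes and all children come along,
    # even though the query only mentioned OS.
    print(ET.tostring(host, encoding="unicode"))
```

To get just the load value, the client has to pick it out of the returned element itself, or write a query that selects the `Load` element directly.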
MDS4 Trigger Service



A second monitoring service in MDS4
The Index is geared more towards queries
intended for resource location and
selection.
The Trigger service is intended to alert
people to problems.

Can be configured to take action (e.g., send
mail to an administrator) when issues arise.
MDS4 Trigger Service


Maintains information in a service group, like the
Index Service
SGE config information also includes an XPath
query and an action


The action is the name of a program to run.
Periodically, the trigger service looks at each SGE
in its service group:


It evaluates the SGE’s XPath query against the SGE’s
data.
If the query returns true, it runs the program
specified by the action.
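
The trigger loop can be sketched like this. Several things here are assumptions made for the example: a query is treated as "true" when it matches at least one element, Python callables stand in for the programs that an action would name, and the `Host`/`Disk` data is invented.

```python
import xml.etree.ElementTree as ET


def run_triggers(entries, actions):
    """entries: dicts with 'query', 'action', and 'data' (an XML element).
    actions: action name -> callable, standing in for a program to run."""
    fired = []
    for entry in entries:
        # Assumption: the query "returns true" if it matches anything.
        if entry["data"].findall(entry["query"]):
            actions[entry["action"]](entry["data"])
            fired.append(entry["action"])
    return fired


sent = []
actions = {
    "mail-admin": lambda data: sent.append(
        "disk almost full on " + data.get("name")),
}
entries = [
    {"query": "./Disk[@status='low']", "action": "mail-admin",
     "data": ET.fromstring('<Host name="node1"><Disk status="low"/></Host>')},
    {"query": "./Disk[@status='low']", "action": "mail-admin",
     "data": ET.fromstring('<Host name="node2"><Disk status="ok"/></Host>')},
]
fired = run_triggers(entries, actions)
```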
MDS4 WebMDS

Provides a simple HTTP interface to query
an MDS Index Service



Really, to query resource properties of any
WS-Resource
Optionally applies XSLT transforms to the
query results.
Designed as a user interface, to be used
with a web browser

But some people are using it to provide a
REST-like interface to MDS4.
INCA






Monitoring system developed at SDSC
Users define tests for Inca to run.
Inca runs them and stores the results in a
database.
Users can view the results on a web page.
Can be configured to send mail if tests fail,
etc.
Can run tests using the user’s credentials
From the Inca 2.1 User’s Guide, http://inca.sdsc.edu/releases/2.1/guide/userguide.html
Inca Query Interface



Uses an SQL database internally
End-users can query using a web page or
receive notifications via email.
A web-services interface is also available



Uses a custom query language
Overall a nice monitoring/testing
framework
Not designed as a discovery service
GMA (Grid Monitoring
Architecture)

Proposed architecture with three
components:

Producers produce information

Consumers consume information

Directories keep track of what information
is available

what producers can be queried, not the actual data
Diagram from “A Grid Monitoring Architecture”, B. Tierney et al., http://wwwdidc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-2.pdf
R-GMA

Relational Grid Monitoring Architecture

Implements the GMA model



Except that users never interact with the
directory service (called a “registry” in R-GMA)
A consumer service does that instead, and
users query the consumer service.
Uses SQL as its query language.
An R-GMA Query
•Client sends SQL query to Consumer Service
•Consumer Service contacts registry for list of
producers to contact
•Consumer service queries producers and buffers
results
•Client retrieves results from consumer service
Diagram from “R-GMA: Architectural Design” at http://www.r-gma.org/archconsumers.html
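
The four steps of an R-GMA query can be mimicked with in-memory SQLite databases standing in for producers. This is a sketch of the flow only, not the R-GMA API: `Registry`, `ConsumerService`, `make_producer`, and the `cpu_load` table are hypothetical, and the caller names the table explicitly because the sketch does not parse the SQL to find it.

```python
import sqlite3


def make_producer(rows):
    """A producer publishes rows of a table; here, an in-memory SQLite DB."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cpu_load (host TEXT, load REAL)")
    db.executemany("INSERT INTO cpu_load VALUES (?, ?)", rows)
    return db


class Registry:
    """Knows which producers publish which table, not the data itself."""

    def __init__(self):
        self._producers = {}

    def register(self, table, producer):
        self._producers.setdefault(table, []).append(producer)

    def lookup(self, table):
        return list(self._producers.get(table, []))


class ConsumerService:
    def __init__(self, registry):
        self.registry = registry
        self._buffer = []

    def start_query(self, table, sql):
        """Step 1: the client hands over an SQL query."""
        self._buffer = []
        for producer in self.registry.lookup(table):     # step 2: ask registry
            self._buffer.extend(producer.execute(sql))   # step 3: query, buffer

    def pop_results(self):
        """Step 4: the client retrieves the buffered results."""
        results, self._buffer = self._buffer, []
        return results


registry = Registry()
registry.register("cpu_load", make_producer([("node1", 0.2), ("node2", 1.7)]))
registry.register("cpu_load", make_producer([("node3", 0.9)]))
consumer = ConsumerService(registry)
consumer.start_query("cpu_load",
                     "SELECT host, load FROM cpu_load WHERE load < 1.0")
rows = consumer.pop_results()
```

The client only ever talks to the consumer service; the registry lookup and the fan-out to producers are hidden behind it.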
For More Information

Globus: http://www.globus.org

Inca: http://inca.sdsc.edu

R-GMA: http://www.r-gma.org

XML / XPath / XSLT: http://www.w3c.org