Grid Computing 7700
Fall 2005
Lecture 5: Grid Architecture and Globus
Gabrielle Allen
[email protected]
http://www.cct.lsu.edu/~gallen
Concrete Example


I have a source file Main.F on machine A,
an input file on machine B. Main.F is written
using MPI, it will need around 4GB of core
memory to run, it will take several hours to
complete, and will produce a large output
file.
What functionality do we need?
Issues








How to select a machine to run it on?
How to provide an executable which can run on
that machine?
How to move the input file?
How to start the executable?
How to monitor the job? When does it start?
When does it finish?
How to move the output file back?
What about security?
How do we know if it didn’t work and how it failed?
How to Select a Machine

What properties of a machine are we interested in?
– What resources does my executable require?
• 4 GB memory, “several hours of compute time”
• Enough disk space for the output
– What kind of environment do I need on the machine?
• OS limitations?
• MPI? (Which version?), Fortran?
– What resources am I authorized to run on?
– How quickly will it run?
– How much will it cost/what is my allocation there?
– How to find all this information? What should the user provide?
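A minimal sketch of the selection step, assuming each machine publishes a simple description of itself. The class names, field names, machine names and the 50 GB / 6 hour figures are illustrative assumptions, not part of any grid toolkit; only the 4 GB memory requirement comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    memory_gb: float
    free_disk_gb: float
    max_walltime_h: float
    software: set = field(default_factory=set)
    authorized: bool = False  # may this user run here at all?


@dataclass
class JobRequirements:
    memory_gb: float
    disk_gb: float
    walltime_h: float
    software: set = field(default_factory=set)


def candidates(job, machines):
    """Return the machines that satisfy every requirement of the job."""
    return [
        m for m in machines
        if m.authorized
        and m.memory_gb >= job.memory_gb
        and m.free_disk_gb >= job.disk_gb
        and m.max_walltime_h >= job.walltime_h
        and job.software <= m.software  # all required software is installed
    ]


# Hypothetical requirements for the Main.F example.
job = JobRequirements(memory_gb=4, disk_gb=50, walltime_h=6,
                      software={"mpi", "fortran"})
machines = [
    Machine("alpha", 8, 200, 12, {"mpi", "fortran"}, authorized=True),
    Machine("beta", 2, 500, 48, {"mpi", "fortran"}, authorized=True),   # too little memory
    Machine("gamma", 16, 100, 24, {"mpi"}, authorized=True),            # no Fortran
    Machine("delta", 16, 100, 24, {"mpi", "fortran"}, authorized=False),
]
print([m.name for m in candidates(job, machines)])  # prints ['alpha']
```

Ranking the surviving candidates (by expected turnaround or cost) is a separate step that this sketch leaves out.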
More Complicated





What if the program might need to read in data
kept on machine C while it is running?
What about distributing across processors on
different machines?
What if I have a lot of interconnected programs?
How do I find the output file afterwards?
What if it doesn’t work?
Questions




What kind of functionality do we need?
What tools exist to do this?
What features of distributed computing
must they be designed for?
What design issues to watch for?
Abstract Requirements


Single sign-on
Job submission, monitoring and management
– submit a job to a resource on the grid
– monitor the progress of a submitted job
– retrieve results
– cancel job
File transfer
– move files from A to B, securely, reliably and efficiently
Resource discovery
– locate resources or services with particular characteristics
Less typical:
– Metacomputing, workflow enactment, resource brokering, ...
What do I have to choose from?

Globus Toolkit
– version 2 is widely deployed; nearest thing to a de facto standard
– horizontally integrated bag of tools
– suits grid application developers better than end users
– brand new V4 based on web services
UNICORE
– less widely deployed; few UK deployments
– vertically integrated
– suits end users better than application developers

Condor
– high throughput computing
– great for cycle harvesting

Web Services?
– GT4 or roll your own using Web Services tools

Others
– yes, there are others
Generation Game
[Figure, adapted from Ian Foster's GGF7 plenary: grid generations plotted against time (horizontal axis) and increased functionality and standardization (vertical axis)]
Earlier generation:
– Computationally intensive
– File access/transfer
– Bag of various heterogeneous protocols & toolkits
– Monolithic design
– Recognised internet, ignored Web
– Academic teams
Later generation:
– Data and knowledge intensive
– Open services-based architecture
– Builds on Web services
– Global Grid Forum
– Industry participation
Progression along the curve: custom solutions; Globus Toolkit, Condor, Unicore (de facto standards GridFTP, GSI; built on X.509, LDAP, FTP, …); Open Grid Services Architecture (Web services, GGF + OASIS + W3C, multiple implementations); app-specific services
Grid Architecture
Application
Collective
Resource
Connectivity
Fabric
Fabric Layer


Contains the resources themselves which the Grid
infrastructure needs to access
Fabric components implement local, resource
specific operations to provide higher level Grid
operations
– NFS storage protocol
– Kerberos security
– PBS queuing system

Grid cannot provide more than local operations can
support (e.g. advance reservation)
Fabric Layer




Computational resources
Storage resources
Network resources
But also
– Database resources
– Code repository resources
– Etc.
Fabric Layer

What is the minimum functionality?
– Introspection mechanisms:
• Computational resources: hardware, software
characteristics, state information such as current load and
queue state
• Storage resources: hardware, software characteristics,
available space
• Network resources: network characteristics and load
– Resource management mechanisms
• Computational resources: starting programs, monitoring and
controlling execution of resulting programs
• Storage resources: file put and get
Fabric Layer

What is desirable?
– Introspection mechanisms:
• Storage resources: bandwidth utilization
– Resource management mechanisms
• Computational resources: control over resources
allocated to processes, advance reservation
• Storage resources: 3rd party transfers, high
performance transfers, put and get of file subsets,
callback functionality
• Network resources: control of resources,
prioritization, reservation
Connectivity Layer





Core communication and authentication protocols
for needed network transactions
Exchange of data between fabric layer resources
Security
Requirements: transport, routing, naming
Assumed to use protocols from the TCP/IP stack (IP,
ICMP, TCP, UDP, DNS, OSPF, RSVP, …), but could
be others.
Connectivity Layer

Security requirements
– Single sign-on to all resources
– Delegation of rights
– Integration with local security
– Implementation of trust relations
– Secure transport of data
Resource Layer



Protocols for secure negotiation, initiation,
monitoring, control, accounting on individual
resources
Concerned with individual resources only (collections of
resources are addressed in the next layer)
Information protocols
– Obtaining information about structure and state of a
resource

Management protocols
– Negotiating access for given resource requirements,
performing operations (job starting, data access).
Monitoring and controlling resources and processes.
Collective Layer



Dealing with operations across collective resources
Build on a relatively small number of resource/connectivity
protocols
Examples
– Directory services (to provide information about resources)
– Co-allocation, scheduling, brokering services
– Monitoring and diagnostic services
– Data replication services
– Community authorization and accounting services
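To make one of these examples concrete, here is a toy sketch of co-allocation: a request spanning several resources is granted only if every resource can satisfy its share, otherwise nothing is reserved. The function name, the dictionary layout and the cluster names are assumptions for illustration; real co-allocators also have to deal with queues, timing and partial failures.

```python
def co_allocate(request, free):
    """All-or-nothing allocation across several resources.

    request: {resource_name: cpus_wanted}
    free:    {resource_name: cpus_available}, updated in place on success
    """
    # First pass: check every resource before touching any of them.
    if any(free.get(name, 0) < cpus for name, cpus in request.items()):
        return False
    # Second pass: commit the whole request.
    for name, cpus in request.items():
        free[name] -= cpus
    return True


free = {"clusterA": 64, "clusterB": 16}
print(co_allocate({"clusterA": 32, "clusterB": 32}, free))  # False, clusterB is too small
print(co_allocate({"clusterA": 32, "clusterB": 16}, free))  # True
print(free)  # {'clusterA': 32, 'clusterB': 0}
```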
UNICORE


Packaged Software with GUI
Open source
– http://unicore.sourceforge.net/


Designed for firewalls
Strict security model
– explicit delegation

Abstract Job Object (AJO)
– built-in workflow management

Resource Broker
– can submit to Globus grids


Has notion of software resource
Few APIs
– extend through plug-ins
– starting to expose service
interfaces

Serves the user
http://www.unicore.org/
Condor: High-throughput computing
Condor converts collections of workstations and
clusters into a distributed high-throughput
computing facility
– Emphasis on policy management and reliability
– High-throughput scheduler
– Supports job checkpoint and migration
• single processor jobs only
– Remote system calls
Condor-G lets Condor users add Globus-enabled
resources to their private view of a Condor pool
("flock")
– "glide-in"

http://www.cs.wisc.edu/condor/
Legion/Avaki


Object based meta-system, providing a single integrated
infrastructure
All components are objects (unlike GT)
– Data abstraction, encapsulation, inheritance, polymorphism


API to core services
Core object types
– Classes/metaclasses: managers and policy makers
– Host objects: abstractions of processing resources (one or many)
– Vault objects: persistent storage
– Implementation objects and caches: “executables”
– Binding agents: map objects to physical addresses
– Context objects: naming of objects
Globus Toolkit V2



GT2 “Implements Grid protocols for security,
information discovery, resource management, data
management, communication, fault detection and
portability”
Bag of tools rather than a uniform programming
model, aims to provide distinct services with well
defined APIs
Assumes suitable software deployed on resources
to provide basic fabric functionality (although
some tools to help this are provided)
– Discovering and packaging structure and state
information
Globus Toolkit version 2
"Single sign-on" through Grid Security Infrastructure (GSI)
Remote execution of jobs
– GRAM, job-managers, Resource Specification Language (RSL)
Grid-FTP
– Efficient, reliable file transfer; third-party file transfers
MDS (Metacomputing Directory Service)
– Resource discovery (GRIS and GIIS)
Co-allocation (DUROC)
– Limited by support from scheduling infrastructure
Other GSI-enabled utilities
– gsi-ssh, grid-cvs, etc.
Low-level APIs and command-line interfaces
Commodity Grid Kits (CoG-kits), Java, Perl, Python
Widespread deployment, lots of projects
[Figure: layered stack, top to bottom: Applications; Diverse global services; Core services; Local OS]
Globus Toolkit V2

Connectivity
– Grid Security Infrastructure (GSI) protocols
– Based on public-key-infrastructure (PKI) and Internet protocols
– Single sign-on (authentication creates a proxy credential: a digitally
signed certificate that grants the holder the right to perform
operations on behalf of the signer for a limited time)
– Delegation (communication of a (restricted) proxy credential to a
remote service)
– Credential format is an extension of the X.509 certificate
– Remote delegation protocol based on the transport layer security (TLS)
protocol (follow-on to SSL)
– High-level programming API: extensions of the generic security service
application programming interface (GSS-API)
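The proxy and delegation ideas can be sketched as a toy data model. This is not GSI: there is no cryptography, no X.509 and no TLS here, and the class, method and right names are assumptions for illustration. The sketch only shows the two rules that matter: a proxy never outlives its signer, and a delegated proxy never carries more rights than its signer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    subject: str
    issuer: Optional["Credential"]  # None for the user's long-lived identity
    not_after: float                # expiry time, seconds since the epoch
    rights: frozenset

    def delegate(self, holder, lifetime_s, now, rights=None):
        """Issue a proxy that is limited in time and, optionally, in rights."""
        granted = self.rights if rights is None else self.rights & frozenset(rights)
        expiry = min(self.not_after, now + lifetime_s)
        return Credential(f"{self.subject}/proxy@{holder}", self, expiry, granted)

    def permits(self, right, now):
        """Valid only if every credential in the chain is unexpired and grants the right."""
        cred = self
        while cred is not None:
            if now > cred.not_after or right not in cred.rights:
                return False
            cred = cred.issuer
        return True


now = 1_000_000.0  # fixed clock so the example is reproducible
user = Credential("CN=Alice", None, now + 365 * 86400,
                  frozenset({"submit", "read", "write"}))
proxy = user.delegate("login-host", 12 * 3600, now)                    # single sign-on
job_proxy = proxy.delegate("compute-host", 3600, now, rights={"read"})  # restricted delegation

print(job_proxy.permits("read", now))         # True
print(job_proxy.permits("write", now))        # False, right was not delegated
print(job_proxy.permits("read", now + 7200))  # False, proxy has expired
```

The user signs in once to create `proxy`; remote services then act with `job_proxy` without ever seeing the long-lived identity.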
Globus Toolkit V2

Resource Layer
– Grid Resource Allocation and Management
(GRAM) protocol
– Monitoring and Discovery Service (MDS-2)
– Grid File Transfer Protocol (GridFTP)
GRAM Protocol

Grid Resource Allocation and Management
– Creation and management of remote computations
– GSI for authentication, authorization, delegation
– GRAM implementations map requests expressed in a Resource
Specification Language (RSL) into commands understood by
local schedulers and computers
– Multiple GRAM implementations exist (with C, Java, Python
interfaces)
– GT2 implementation
• Based on HTTP protocol
• “gatekeeper” initiates remote computations
• “jobmanager” manages remote computation
• GRAM reporter monitors and publishes information
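For a feel of what a GRAM request looks like, the sketch below builds an RSL-style string of the form &(attribute=value)... for the Main.F example. The helper names are hypothetical. The attribute names (executable, jobType, count, maxMemory, maxWallTime, stdout) and the quoting rule follow common GT2 RSL usage as far as we are aware, and the units assumed here (megabytes, minutes) should be checked against the Globus documentation before relying on them.

```python
import re

# Values made only of these characters are written without quotes (a simplification).
_SIMPLE = re.compile(r"[A-Za-z0-9_./:-]+")


def rsl_value(value):
    """Quote a value if it contains anything beyond simple characters."""
    text = str(value)
    if _SIMPLE.fullmatch(text):
        return text
    # Assumed quoting rule: wrap in double quotes, double any embedded quote.
    return '"' + text.replace('"', '""') + '"'


def to_rsl(**attributes):
    """Build a conjunctive RSL-style request from keyword arguments."""
    return "&" + "".join(
        f"({name}={rsl_value(value)})" for name, value in attributes.items()
    )


request = to_rsl(
    executable="/home/user/main",  # hypothetical path to the compiled Main.F
    jobType="mpi",
    count=8,
    maxMemory=4096,                # assumed to be in MB: the 4 GB from the example
    maxWallTime=360,               # assumed to be in minutes: "several hours"
    stdout="main.out",
)
print(request)
# &(executable=/home/user/main)(jobType=mpi)(count=8)(maxMemory=4096)(maxWallTime=360)(stdout=main.out)
```

A GRAM implementation would translate such a request into the commands of the local scheduler, for example a PBS submission script.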
MDS-2

Monitoring and Discovery Service
– Framework for discovering and accessing structure and
status information about resources (and services)
• Data model for representing information
• Protocols for publishing and accessing information
– GT2 implementation
• Based on LDAP (lightweight directory access protocol)
• Local registry to manage collection and publication of
information at a single location
• Collective registry to support queries for information from
multiple locations
• Caching for performance
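The local/collective registry split with caching can be sketched without LDAP. In the toy model below, a local registry reports on one resource and a collective registry answers queries across several of them, re-probing a resource only when its cached entry is older than a time-to-live. All class and attribute names are assumptions for this example; the real GT2 services (GRIS and GIIS) speak LDAP.

```python
class LocalRegistry:
    """Collects and publishes information about a single resource."""

    def __init__(self, name, probe):
        self.name = name
        self._probe = probe  # callable returning a dict of current state

    def query(self):
        return {"host": self.name, **self._probe()}


class CollectiveRegistry:
    """Answers queries over many local registries, with a simple cache."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._locals = []
        self._cache = {}  # host name -> (time fetched, record)

    def register(self, local):
        self._locals.append(local)

    def search(self, predicate, now):
        results = []
        for local in self._locals:
            entry = self._cache.get(local.name)
            if entry is None or now - entry[0] > self.ttl_s:
                entry = (now, local.query())  # cache miss or stale entry
                self._cache[local.name] = entry
            if predicate(entry[1]):
                results.append(entry[1])
        return results


probes = {"count": 0}


def make_probe(free_cpus):
    def probe():
        probes["count"] += 1  # count how often resources are actually contacted
        return {"free_cpus": free_cpus}
    return probe


index = CollectiveRegistry(ttl_s=30)
index.register(LocalRegistry("alpha", make_probe(8)))
index.register(LocalRegistry("beta", make_probe(0)))

print(index.search(lambda r: r["free_cpus"] > 0, now=0))   # [{'host': 'alpha', 'free_cpus': 8}]
print(index.search(lambda r: r["free_cpus"] > 0, now=10))  # same answer, served from cache
print(probes["count"])                                     # 2
```

The cache trades freshness for load: within the time-to-live, answers may describe a state that has already changed.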
GridFTP Protocol

Extended version of file transfer protocol
– GSI security
– Partial file access, high speed striping
– Third party transfers
– Separate control/data channels
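Partial file access is what makes striped transfers possible: each data channel moves only a byte range of the file. The sketch below shows just the range arithmetic on an in-memory byte string. It is a simplification under our own assumptions (one contiguous range per stripe, hypothetical function name) and does not reproduce the block layout that a real GridFTP server uses.

```python
def stripe_ranges(file_size, stripes):
    """Split [0, file_size) into contiguous (offset, length) ranges."""
    base, extra = divmod(file_size, stripes)
    ranges, offset = [], 0
    for i in range(stripes):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first stripes
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


data = b"0123456789"
ranges = stripe_ranges(len(data), 3)
print(ranges)  # [(0, 4), (4, 3), (7, 3)]

# Each range could be fetched over its own data channel, in any order.
parts = {offset: data[offset:offset + length] for offset, length in reversed(ranges)}
reassembled = b"".join(parts[offset] for offset in sorted(parts))
print(reassembled == data)  # True
```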
Web Services









A Web service is a software system designed to support interoperable
machine-to-machine interaction over a network.
It has an interface that is described in a machine-processable format such
as WSDL.
Other systems interact with the Web service in a manner prescribed by its
interface using messages (usually enclosed in a SOAP envelope).
These messages are typically conveyed using HTTP, and normally
consist of XML.
Software applications written in various programming languages and running
on various platforms can use web services to exchange data over networks.
This interoperability (e.g., between Java and Python, or Windows and Linux
applications) is due to the use of open standards.
OASIS and the W3C are the primary committees responsible for the
architecture and standardization of web services.
Specifications for additional features are under development.
Basically: Web service = TRANSPORT (HTTP) + MESSAGING (SOAP) +
DESCRIPTION (WSDL) + DISCOVERY (UDDI) + MESSAGE (XML)
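As a small illustration of the MESSAGING and MESSAGE parts, the sketch below builds a SOAP 1.1 envelope with Python's standard XML library. The envelope namespace is the standard SOAP 1.1 one; the application namespace urn:example:jobs, the operation name submitJob and its parameters are made up for this example and do not belong to any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:jobs"                            # hypothetical service namespace


def build_request(operation, params):
    """Wrap one operation call and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_request("submitJob", {"executable": "/home/user/main", "count": 8})
print(message)
```

In practice the message would be sent as the body of an HTTP POST to the service endpoint, and the WSDL description would tell a client which operations and parameters exist.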
Service Oriented Architecture


Components are defined by service interfaces (e.g.
Web Services)
Characterized by:
– Abstract logical view of programs, databases etc
– Services defined by exchanged messages (not by
properties of the agents themselves)
– Internal structure of agent is not relevant (can
accommodate legacy systems)
– Services defined by machine processable meta data
(documented semantics)
– Small number of operations
– Services oriented towards network usage
– Platform neutral (e.g. messages in XML)
Open Grid Services Architecture

Resulted from an attempt to standardize GT
protocols, influenced by the uptake of web
services and SOA ideas:
– Modularize components for different grid
functions
– Uniform treatment of network entities (service
orientation)
– Standard IDLs aligned with Web services
– Develop within standards body (Global Grid
Forum)
Open Grid Services Architecture

Grid Service
– A web service which is extended to include transient and stateful
services

OGSI specification
– Open Grid Services Infrastructure
– Defines interfaces, behaviours and conventions for grid services
– Now replaced by range of web service definitions

OGSA defines services and interfaces required in a working grid
environment
– GGF working groups are identifying required functions and then making
OGSI compliant interfaces

Multiple implementations
– GT3: reference implementation of OGSI and basic OGSA services
– GT4: pure web services
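What "transient and stateful" means can be sketched with a factory that creates service instances on demand, each with its own state and a limited lifetime that clients may extend. This is a toy model under our own assumptions: the class names, the handle format and the explicit `now` argument are illustrative and are not the OGSI interfaces.

```python
import itertools


class Counter:
    """A stateful service instance: it remembers its value between calls."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


class Factory:
    """Creates transient instances and forgets them once their lifetime ends."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._instances = {}  # handle -> (termination time, instance)

    def create(self, now, lifetime_s):
        handle = f"counter-{next(self._ids)}"
        self._instances[handle] = (now + lifetime_s, Counter())
        return handle

    def lookup(self, handle, now):
        expiry, instance = self._instances[handle]
        if now > expiry:
            del self._instances[handle]  # lifetime over: the instance is gone
            raise KeyError(handle)
        return instance

    def keep_alive(self, handle, now, lifetime_s):
        instance = self.lookup(handle, now)
        self._instances[handle] = (now + lifetime_s, instance)


factory = Factory()
handle = factory.create(now=0, lifetime_s=60)
factory.lookup(handle, now=10).add(5)
print(factory.lookup(handle, now=20).add(1))  # 6, state persisted between calls
factory.keep_alive(handle, now=50, lifetime_s=60)
print(factory.lookup(handle, now=100).value)  # 6, lifetime was extended
```

A plain stateless web service would have no counterpart to `handle`: every call would start from scratch.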
GT4





Released April 2005
Service oriented architecture
Web services to describe and invoke most
components
GT4 web service containers for deploying and
managing GT4 services (Java, C, Python)
Most interfaces still need to be standardized
Coursework 3

Write one or two pages describing each of
the following Globus components:
– GRAM
– MDS
– GridFTP

Best documentation and relevant papers at
http://www.globus.org
Required Reading

The Physiology of the Grid
– See course page for link