Download Data - Digital Science Center

http://www.grid2002.org
Complexity Computational Environment, integrating data and simulation on the Grid: Multiscale computing
JPL, June 18 2003
Geoffrey Fox, Marlon Pierce
Community Grids Lab
Indiana University
[email protected]
http://academia.web.cern.ch/academia/lectures/grid/
Grid Backdrop from CT Project
• Grid Computational Environment (GCE) for SERVOGrid based
on Web services (WS)
• Job submission, job management, simple security (to be addressed), file processing
• Support as WS key simulation and Pattern recognition codes
(DISLOC*, SIMPLEX*, VC, PARK, GEOFEST*, DAHMM,
PDPC)
– * Current
• Support databases and visualization
• Simple workflow, notification, metadata services
• Initial Schema for GEM specific (meta-)data
• Portlet based Interfaces
• Extend to ACES (Japan, Australia) for distributed computers, software, databases, clients
• Collaboration and other useful portlets
• Can inherit Globus support from Alliance Portal, NMI efforts
AIST Additions
• Compatibility with Grid Services
• Use of OGSA-DAI XML and SQL database standards
– Including extensions for streaming (sensor) data
– Including extensions for integration with simulations
• Optimization for parallel simulations (e.g. parallel IO) (?)
• Better workflow, notification, metadata services
– openGIS/GML compatibility (fault etc. Schema)
– Semantic Grid
• Autonomic (Robust Reliable Resilient) services (?)
• Support multi-scale simulations and data assimilation
• ServoPSE Problem Solving Environments (?)
– GeoLanguage (ServoML specializing CCEML) integrating
workflow and multi-scale support
– Interactive portlet based front end with Matlab and/or Mathematica style interface
SERVOGrid Caricature
[Figure labels: Repositories, Federated Databases; Sensor Nets, Streaming Data; Loosely Coupled Filters; Closely Coupled Compute Nodes; Analysis and Visualization]
Sources of Grid Technology?
• Grids support distributed collaboratories or virtual
organizations that support People, Computers,
Observational Data and results of thought and data
processing
• The Web and Web Services
– Most important for Information Grids as these are naturally
service-based
• Distributed Objects (CORBA Java/Jini COM)
– Distributed Object same as a Service
• Globus Legion Condor NetSolve Ninf and other High
Performance Computing activities
– Compute/File Grids that need to be made into services (Globus
GT3) and integrated with Information Grids for Geocomplexity
• Peer-to-peer Networks
Taxonomy of Grid Functionalities
Name of Grid Type: Description of Grid Functionality
• Compute/File Grid or Data File Grid: Run multiple jobs with distributed compute and data resources (Global “UNIX Shell”)
• Desktop Grid (e.g. SETI@Home): “Internet Computing” and “Cycle Scavenging” with secure sandbox on large numbers of untrusted computers
• Information Grid or Data Service Grid: Grid service access to distributed information, data and knowledge repositories
• Complexity or Hybrid Grid: Hybrid combination of Information and Compute/File Grid emphasizing integration of experimental data, filters and simulations: Data assimilation
• Campus Grid: Grid supporting University community computing
• Enterprise Grid: Grid supporting a company’s enterprise infrastructure
Approach
• Build on e-Science methodology and Grid technology
• Geocomplexity (and Biocomplexity) applications with multi-scale models, scalable parallelism, data assimilation as key issues
– Data-driven models for earthquakes
• Use existing code/database technology (SQL/Fortran/C++) linked to “Application Web/OGSA services”
– XML specification of models, computational steering, scale supported at “Web Service” level as don’t need “high performance” here
– Allows use of Semantic Grid technology
• AIST builds on CT
[Figure labels: Typical codes; Application WS; WS linking to user and other WS (data sources)]
SERVOGrid (Complexity) Computing Model
[Figure labels: OGSA-DAI Grid Services; Grid; Grid Data Assimilation; HPC Simulation; Analysis; Control; Visualize. Annotations: this type of Grid integrates with parallel computing; multiple HPC facilities but only use one at a time; many simultaneous data sources and sinks; distributed filters massage data for simulation]
Data Assimilation
• Data assimilation implies one is solving some optimization
problem which might have Kalman Filter like structure
$$\min_{\text{Theoretical Unknowns}} \; \sum_{i=1}^{N_{obs}} \frac{\left(\text{Data}_i(\text{position},\text{time}) - \text{Simulated\_Value}\right)^2}{\text{Error}_i^2}$$
• As discussed by DAO at Earth Science meeting, one will
become more and more dominated by the data (Nobs much
larger than number of simulation points).
• Natural approach is to form for each local (position, time)
patch the “important” data combinations so that
optimization doesn’t waste time on large error or insensitive
data.
• Data reduction done in natural distributed fashion NOT on
HPC machine as distributed computing most cost effective
if calculations essentially independent
– Filter functions must be transmitted from HPC machine
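The optimization described on this slide can be sketched as a weighted sum of squared residuals. This is only an illustration under stated assumptions, not the SERVOGrid code: the linear `simulate` model, the observation tuples and the crude grid search are all hypothetical choices made for the example.

```python
def chi_squared(unknowns, observations, simulate):
    """Sum over observations of ((data - simulated value) / error) squared."""
    total = 0.0
    for position, time, data, error in observations:
        residual = data - simulate(unknowns, position, time)
        total += (residual / error) ** 2
    return total


def simulate(unknowns, position, time):
    # Hypothetical stand-in for a simulation: value = a * position + b * time.
    a, b = unknowns
    return a * position + b * time


# (position, time, data, error) tuples, generated from a = 2, b = 1 without noise.
observations = [(1.0, 0.0, 2.0, 0.1), (0.0, 1.0, 1.0, 0.1), (1.0, 1.0, 3.0, 0.2)]

# Crude grid search over the two unknowns; a real system would use a proper optimizer.
candidates = [(a / 2, b / 2) for a in range(0, 9) for b in range(0, 9)]
best = min(candidates, key=lambda u: chi_squared(u, observations, simulate))
```

Observations with a large error contribute little to the sum, which is why the slide suggests filtering them out before the optimization runs.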
Distributed Filtering
Nobs(local patch) >> Nfiltered(local patch) ≈ Number_of_Unknowns(local patch)
In the simplest approach, filtered data are obtained by linear transformations of the original data based on the Singular Value Decomposition of the least squares matrix.
[Figure: the HPC machine factorizes the matrix to a product of local patches, sends the needed filter to the distributed machine and receives filtered data; geographically distributed sensor patches reduce Nobs to Nfiltered for local patch 1, local patch 2, ...]
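A minimal sketch of the send-filter / receive-filtered-data exchange follows. To stay within the Python standard library it swaps the SVD-based filter for the simpler normal-equations projection F = AᵀW, which is also a linear transformation derived from the least squares matrix. The two-unknown straight-line model and all function names are assumptions made for the illustration.

```python
def make_filter(design, errors):
    """HPC side: build the filter F = A^T W, one row per unknown."""
    n_unknowns = len(design[0])
    return [[design[i][k] / errors[i] ** 2 for i in range(len(design))]
            for k in range(n_unknowns)]


def apply_filter(filter_rows, data):
    """Patch side: reduce N_obs raw values to N_unknowns filtered values."""
    return [sum(f * d for f, d in zip(row, data)) for row in filter_rows]


def solve_2x2(m, rhs):
    """HPC side: solve a 2x2 linear system by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    x0 = (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det
    x1 = (m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det
    return [x0, x1]


# Hypothetical patch: 5 observations of value = x0 + x1 * t, unit errors.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
design = [[1.0, t] for t in times]
errors = [1.0] * len(times)
data = [1.0 + 2.0 * t for t in times]       # generated with x0 = 1, x1 = 2

filter_rows = make_filter(design, errors)    # sent from the HPC machine to the patch
filtered = apply_filter(filter_rows, data)   # 2 numbers travel back instead of 5
# A^T W A, built column by column with the same filter (the matrix is symmetric).
normal = [apply_filter(filter_rows, [row[k] for row in design]) for k in range(2)]
estimate = solve_2x2(normal, filtered)
```

Only `filtered` crosses the network from the patch, so the traffic scales with the number of unknowns rather than with the number of observations.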
Grid Politics
• There is a Global Grid Forum meeting 3 times per year with about
700 attendees per meeting
– Exchange information and define standards for “everything” not done in
W3C and OASIS
– e.g. Grid Service, Security, What is a Job, Database, Computer, How to build
portals ….
• There is a large project called Globus developing software largely
for “compute/file” Grids
• There are some 50 Grid projects (mainly in Europe and USA)
developing software and applications as well as installing
infrastructure
– Some are “deployment”: EDG NMI VDT …..
• There are related initiatives called CyberInfrastructure (NSF USA)
and e-Science (UK)
• There is a proposed OMII (Open Middleware Infrastructure
Institute) – an international Alliance of separately funded projects
with common coordination
OGSA OGSI & Hosting Environments
• Start with Web Services in a hosting environment
• Add OGSI to get a Grid service and a component model
• Add OGSA to get Interoperable Grid “correcting” differences in base
platform and adding key functionalities
[Figure: layered stack, bottom to top: Network; Hosting Environment for WS; OGSI on Web Services (“Given to us from on high”); OGSA Environment with broadly applicable services (registry, authorization, monitoring, data access, etc.); more specialized services (data replication, workflow, etc.), possibly OGSA; domain-specific services, not OGSA. Side labels: models for resources & other entities; other models]
OGSI Open Grid Service Interface
• http://www.gridforum.org/ogsi-wg
• It is a “component model” for web services.
• It defines a set of behavior patterns that each OGSI service must exhibit.
• Every “Grid Service” portType extends a common base type.
– Defines an introspection model for the service
– You can query it (in a standard way) to discover
   • What methods/messages a port understands
   • What other port types the service provides
   • If the service is “stateful”, what the current state is
• Factory Model
• A set of standard portTypes for
– Message subscription and notification
– Service collections
• Each service is identified by a URI called the “Grid Service Handle” (GSH)
• GSHs are bound dynamically to Grid Service References (GSRs, typically WSDL docs)
– A GSR may be transient. GSHs are fixed.
– Handle map services translate GSHs into GSRs.
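The handle-to-reference indirection can be sketched with a small in-memory map. This is an illustration only and not the OGSI interface: the `HandleMap` class, its method names and the example URLs are hypothetical.

```python
class HandleMap:
    """Toy handle-map service: fixed handles (GSH) to current references (GSR)."""

    def __init__(self):
        self._references = {}

    def bind(self, handle, reference):
        # Rebinding replaces the old reference; the handle itself never changes.
        self._references[handle] = reference

    def resolve(self, handle):
        try:
            return self._references[handle]
        except KeyError:
            raise LookupError(f"no reference bound to handle {handle!r}") from None


handle_map = HandleMap()
gsh = "http://example.org/services/disloc/instance-1"          # hypothetical handle
handle_map.bind(gsh, "http://host-a.example.org/disloc?wsdl")
first = handle_map.resolve(gsh)
handle_map.bind(gsh, "http://host-b.example.org/disloc?wsdl")  # the service moved
second = handle_map.resolve(gsh)
```

Clients keep only `gsh`; resolving it again after the service moves yields the new reference without any change on the client side.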
OGSA-DAI
(Malcolm Atkinson Edinburgh)
UK e-Science Grid Core Programme
Development of Data Access and Integration Services for OGSA
http://umbriel.dcs.gla.ac.uk/NeSC/general/projects/OGSA_DAI
– Access to XML Databases
– Access to Relational Databases
– Distributed Query Processing (DB Federation)
– XML Schema Support for e-Science
DAI Key Services
• GridDataService (GDS): Access to data & DB operations
• GridDataServiceFactory (GDSF): Makes GDS & GDSF
• GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
• GridDataTranslationService (GDTS): Translates or Transforms Data
• GridDataTransportDepot (GDTD): Data transport with persistence
Integrated Structured Data Transport
Relational & XML models supported
Role-based Authorisation
Binary structured files (later)
Interface transparency: one GDS supports multiple database types
[Figure: several Clients connect to one Grid Data Service, which fronts a relational database, an XML database and a directory / file system]
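The idea of interface transparency can be sketched with one client-facing class over two interchangeable backends. This does not use or imitate the OGSA-DAI API: the class names, the `faults` table and the XML layout are assumptions made for the illustration, built on Python's standard `sqlite3` and `xml.etree.ElementTree` modules.

```python
import sqlite3
import xml.etree.ElementTree as ET


class RelationalBackend:
    """Hypothetical backend holding fault records in an in-memory SQL table."""

    def __init__(self, rows):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE faults (name TEXT, slip REAL)")
        self._db.executemany("INSERT INTO faults VALUES (?, ?)", rows)

    def faults(self):
        cursor = self._db.execute("SELECT name, slip FROM faults ORDER BY name")
        return [{"name": name, "slip": slip} for name, slip in cursor]


class XmlBackend:
    """Hypothetical backend holding the same records as an XML document."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def faults(self):
        records = [{"name": f.get("name"), "slip": float(f.get("slip"))}
                   for f in self._root.findall("fault")]
        return sorted(records, key=lambda r: r["name"])


class DataService:
    """Single client-facing interface, whatever the backend is."""

    def __init__(self, backend):
        self._backend = backend

    def query_faults(self):
        return self._backend.faults()


relational = DataService(RelationalBackend([("A", 1.5), ("B", 0.5)]))
xml_based = DataService(XmlBackend(
    '<faults><fault name="B" slip="0.5"/><fault name="A" slip="1.5"/></faults>'))
same_answer = relational.query_faults() == xml_based.query_faults()
```

The client code calls `query_faults()` in both cases and never sees SQL or XML syntax.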
Integration of Data and Filters
• One has the OGSA-DAI Data repository interface
combined with WSDL of the (Perl, Fortran, Python …)
filter
• User only sees WSDL not data syntax
• Some non-trivial issues as to where the filtering compute
power is
– Microsoft says filter next to data
[Figure: a Filter exposed through the WSDL of the filter, sitting in front of the OGSA-DAI interface to the DB]
[Figure labels: Grid Portals; Multi Scale; Info Grid; Load Balancing Algorithms; Parallel Computing; Integrated CCE; Computer Science; Extended/Integrated VA+PARK+GEOFEST; Large System Simulations; Visualization Grid; e-Science; Collaboration Grid; Infrastructure; Modeling; Clusters; General Complex Systems Simulations; Databases; Geology; GeoInformatics; Experiments; Sensors/Satellites; Other Fields; X-Complexity Field; BioComplexity; Stock Market; Complex Fluids]
SERVOGrid Complexity Computing Environment
[Figure labels: Users; Portal Aggregation; CCE Control; Middle Tier with XML Interfaces; Database Service (Database); Sensor Service; Compute Service; Parallel Simulation Service; Application Service-1, Application Service-2, Application Service-3; XML Meta-data Service; Complexity Simulation Service; Visualization Service]
SERVOGrid Requirements
• Seamless Access to Data repositories and large scale
computers
• Integration of multiple data sources including sensors,
databases, file systems with analysis system
– Including filtered OGSA-DAI
• Rich meta-data generation and access with SERVOGrid
specific Schema extending openGIS standards and using
Semantic Grid
• Portals with component model for user interfaces and web
control of all capabilities
• Collaboration to support world-wide work
• Basic Grid tools: workflow and notification
[Figure: layered architecture, top to bottom: Portal such as “Jetspeed”; several AWS; Application/User Framework supporting development and deployment of OGSI compliant AWS (Application Web Services); Generic Application Services; OGSA Interoperability Layer; “Sophisticated” System Services; OGSA Interoperability Layer; Resource Grid Services; Resources (e.g. DAI compliant database). Side labels: Hosting Environment; Grid Computing or Programming Environments; Web Services; “Core” Grid]
Taxonomy of Grid Operational Style
Name of Grid Style: Description of Grid Operational or Architectural Style
• Semantic Grid: Integration of Grid and Semantic Web meta-data and ontology technologies
• Peer-to-peer Grid: Grid built with peer-to-peer mechanisms
• Lightweight Grid: Grid designed for rapid deployment and minimum life-cycle support costs
• Collaboration Grid: Grid supporting collaborative tools like the Access Grid, whiteboard and shared applications
• R3 or Autonomic Grid: Fault tolerant and self-healing Grid; Robust Reliable Resilient (R3)
Paradigms Protocols Platforms and Hosting
• We can start from the Web view where the basic
Grid paradigm is
• Meta-data rich Web Services communicating via
messages
• These have some basic support from some runtime
such as .NET, Jini (pure Java), Apache
Tomcat+Axis (Web Service toolkit), Enterprise
JavaBeans, WebSphere (IBM) or GT3 (Globus
Toolkit 3)
– These are the distributed equivalent of operating
system functions as in UNIX Shell
• Called Hosting Environment or platform
Permeating Principles and Policies
• Meta-data rich Message-linked Web Services as the permeating paradigm
• “User” Component Model such as “Enterprise JavaBean (EJB)” or .NET.
• Service Management framework including a possible Factory mechanism
• High level Invocation Framework describing how you interact with system components.
– This could for example be used to allow the system to be built from either W3C or GGF style (OGSI) Web Services and to protect the user from changes in their specifications.
• Security is a service but the need for fine grain selective authorization encourages
• Policy context that sets the rules for each particular Grid.
– Currently OGSA supports policies for routing, security and resource use.
• The Grid Fabric or set of resources needs mechanisms to manage them. This includes automatic recording of meta-data and configuration of software.
• Quality of service (QoS) for the Network and this implies performance monitoring and bandwidth reservation services.
– Challenging as end-to-end and not just backbone QoS is needed.
• Messaging systems like MQSeries from IBM provide robustness from asynchronous delivery and can abstract destination and allow customization of content such as converting between different interface specifications.
• Messaging is built on transport mechanisms which can be used to support mechanisms to implement QoS and to virtualize ports
Virtualization
• The Grid could and sometimes does virtualize various
concepts
• Location: URI (Universal Resource Identifier) virtualizes
URL
• Replica management (caching) virtualizes file location
generalized by GriPhyn virtual data concept
• Protocol: message transport and WSDL bindings
virtualize transport protocol as a QoS request
• P2P or Publish-subscribe messaging virtualizes matching
of source and destination services
• Semantic Grid virtualizes Knowledge as a meta-data
query
• Brokering virtualizes resource allocation
• Virtualization implies references can be indirect
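As one illustration of the indirection listed above, publish-subscribe messaging can be sketched with a tiny in-process broker. The `Broker` class, the topic names and the message contents are hypothetical; a real Grid messaging system would add transport, persistence and security.

```python
from collections import defaultdict


class Broker:
    """Toy topic broker: publishers and subscribers never reference each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher never names a destination; the broker does the matching.
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
received = []
broker.subscribe("job/finished", received.append)
delivered = broker.publish("job/finished", {"job": "disloc-42"})
ignored = broker.publish("job/failed", {"job": "disloc-43"})
```

The return value of `publish` is the number of subscribers reached, so `ignored` is zero because nobody subscribed to that topic.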
Interfaces and Functionality and Semantics I
• The Grid platform tries to minimize detail in protocols and
maximize detail in interfaces to enhance scaling
• However rich meta-data and semantics are critical for
correct and interesting operation
– Put as much semantic interpretation as you can into specific
services
– Lack of Semantic interoperation is in fact main weakness of
today’s Grids and Web services
• Everything becomes a service whether system or
application level
• There are some very important “Global Services”
– Discovery (look up) and Registration of service metadata
– Workflow
– MetaSchedulers
Interfaces and Functionality and Semantics II
• There are many other generally important services
• OGSA-DAI The Database Service
• Portal Service linked to by WSRP (Web services
for Remote Portals)
• Notification of events
• Job submission
• Provenance – interpret meta-data about history of
data
• File Interfaces
• Sensor service – satellites …
• Visualization
• Basic brokering/scheduling
Categories of Worldwide Grid Services
to be exploited by SERVOGrid
• 1) Types of Grid
– R3
– Lightweight
– P2P
– Federation and Interoperability
• 2) Core Infrastructure and Hosting Environment
– Service Management
– Component Model
– Service wrapper/Invocation
– Messaging
• 3) Security Services
– Certificate Authority
– Authentication
– Authorization
– Policy
• 4) Workflow Services and Programming Model
– Enactment Engines (Runtime)
– Languages and Programming
– Compiler
– Composition/Development
• 5) Notification Services
• 6) Metadata and Information Services
– Basic including Registry
– Semantically rich Services and meta-data
– Information Aggregation (events)
– Provenance
• 7) Information Grid Services
– OGSA-DAI/DAIT
– Integration with compute resources
– P2P and database models
• 8) Compute/File Grid Services
– Job Submission
– Job Planning Scheduling Management
– Access to Remote Files, Storage and Computers
– Replica (cache) Management
– Virtual Data
– Parallel Computing
• 9) Other services including
– Grid Shell
– Accounting
– Fabric Management
– Visualization, Data-mining and Computational Steering
– Collaboration
• 10) Portals and Problem Solving Environments
• 11) Network Services
– Performance
– Reservation
– Operations
Two-level Programming I
• The paradigm implicitly assumes a two-level Programming
Model
• We make a Service (same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such nuggets accept and produce data from users, files and databases
[Figure: a Nugget with Data flowing in and out]
• The Grid is built by coordinating such nuggets assuming
we have solved problem of programming the nugget
Two-level Programming II
• The Grid is discussing the linkage and distribution of the nuggets, with the only addition being runtime interfaces to the Grid as opposed to UNIX data streams
[Figure: Nugget1, Nugget2, Nugget3 and Nugget4 linked together]
• Familiar from use of UNIX Shell, PERL or Python scripts
to produce real applications from core programs
• Such interpretative environments are the single processor
analog of Grid Programming and this tends to be called
workflow
• Workflow is the composition of multiple services
(programs) together to make a new service
– Includes “Software Bus”, “Application Integration”, “Coordination Languages” etc.
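The two-level model can be sketched in a few lines: nuggets are ordinary functions, and the workflow level only wires them together. The three nuggets below are hypothetical local stand-ins for remote services, and `compose` is an illustrative helper rather than any particular workflow engine.

```python
def compose(*services):
    """Chain services so each output feeds the next; the result is itself a service."""
    def workflow(data):
        for service in services:
            data = service(data)
        return data
    return workflow


# Hypothetical nuggets standing in for remote services.
def read_sensor(count):
    return [float(i) for i in range(count)]


def remove_offset(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def peak(values):
    return max(abs(v) for v in values)


pipeline = compose(read_sensor, remove_offset, peak)
result = pipeline(5)

# Because a workflow is itself a service, it can be composed again.
scaled = compose(pipeline, lambda value: value * 10)
scaled_result = scaled(5)
```

The workflow code knows nothing about how each nugget is programmed, which is the separation the two slides describe.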
Workflow
• Workflow has at least 4 parts
– “Programming Environment” – typically GUI to drag and drop
services and their linkages (familiar from AVS etc. which was
workflow for visualization)
– Language – from XML to extended Python
– Compiler – converting Language into executable
– Runtime controlling flow of information and notification events
• Can use Python, Mathematica, Matlab, JavaSpaces, IBM
BPEL4WS, DoE CCA etc.
– Don’t think current systems are very near “what we will want” but
expect much progress over next 3 years and plenty of systems to
work with
• Metadata critical to tell you how to combine services in a
sensible way – so workflow engines must interface with
metadata service
Workflow GCEs and Problem Solving
Environments (PSEs)
• There is some confusion between fields of workflow
(Grid Computing Environments GCE) and PSEs
• To extent PSEs “just” allow manipulation of “nuggets”,
they are indistinguishable from a domain specific GCE
• They are distinct if they support intra nugget operations
such as
– Integration of mesh and simulation
– Closely coupled code linkage
– Generation of code from high level interface like Mathematica
• Even in latter case, a new generation of PSEs should be
built with Grid architecture – e.g. message based – and
using Grid services like metadata and notification
[Figure labels: Database (Jobs, Tools, Selected GeoInformatics Data); XML Meta-data Service (Tool MetaData, Job MetaData, MultiScale Ontologies); Complexity Scripts; Workflow; SERVOGrid Complexity Simulation Service; SERVOPSE Programs using CCEML (SERVOML)]
Importance of Metadata Service; how should this be implemented?
Metadata Approaches
• Specialized services like UDDI and MDS (Globus)
– Nobody likes UDDI
– MDS uses LDAP
– RGMA is MDS with a relational database backend
• “By hand” as in current GEM Portal which is roughly
same as using service stored SDE’s (Service Data
Elements) as in OGSI
• Some new MDS coming from Globus GT3?
– Current MDS has both a Schema (insufficient for us) and a
“database technology”
• Semantic Grid technologies
• Some basic XML database (Oracle, Xindice …)
• If “OGSA compliant” (not defined yet), then doesn’t
matter that much
Workflow and SERVOGrid CCE
• SERVOGrid should use workflow technology to support both
– “code and data coupling” (DISLOC with SIMPLEX etc.)
– Multiscale features
• Implementing multiscale model requires
– building Web services for each model,
– describing each model with metadata and
– Describing linkage of models (linkage of ports on web services)
– And describing when to use which scale model
• So workflow and multiscale depend on web services described by
rich metadata
• This analysis isn’t correct if scales must be “tightly coupled” as
current workflow won’t support this (CCA from DoE claims to
address this but not clear if general)
– We should focus on multiscale models with loose “nugget”
coupling
– Hopefully we will learn how to take same architecture, compile
away inefficiencies and get high performance on tighter coupling
than conventional distributed workflow
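The requirement to describe each model with metadata, including when to use which scale model, can be sketched as a small registry lookup. The field names, the kilometre ranges and the two registered models are assumptions made for the illustration and do not come from any SERVOGrid schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelMetadata:
    name: str
    min_scale_km: float          # smallest length scale the model is valid for
    max_scale_km: float          # upper bound (exclusive) of the valid range
    run: Callable[[float], str]  # stand-in for invoking the model's web service


def select_model(registry, scale_km):
    """Return the first registered model whose metadata covers the requested scale."""
    for model in registry:
        if model.min_scale_km <= scale_km < model.max_scale_km:
            return model
    raise LookupError(f"no model registered for scale {scale_km} km")


registry = [
    ModelMetadata("fault-patch", 0.0, 10.0, lambda s: f"fine model at {s} km"),
    ModelMetadata("regional", 10.0, 1000.0, lambda s: f"coarse model at {s} km"),
]

chosen = select_model(registry, 2.5)
output = chosen.run(2.5)
```

A workflow engine could make this lookup against the metadata service before linking ports, which keeps the choice of scale out of the models themselves.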
Technologies under development at Indiana
• Portal Infrastructure and Portlets integrating with rest of
Globus/OGSA-DAI Community
– Including job submission, management of modest meta-data and
linkage to databases
– Should package as “application web service toolkit” and test on
ACES world wide iSERVOGrid
• “Some” core portal Metadata (Semantic Grid) services
• Messaging system between Web services that is useful for
– “Service Management”/Autonomic Grids
– Security
– Notification service
• Collaboration infrastructure and portlets
Web Services as a Portlet
• Each Web Service naturally has a
user interface specified as “just
another port”
– Customizable for universal access
• This gives each Web Service a
Portlet view specified (in XML as
always) by WSRP (Web services
for Remote Portals)
• So component model for resources
“automatically” gives a component
model for user interfaces
– When you build your
application, you define portlet
at same time
[Figure: Application as a WS. An application or content source Web Service has general application ports that interface with other Web Services (WSDL), WSRP ports that define the WS as a Portlet (the user face of the Web Service), and other ports (Grid Service) to be OGSI compliant]
Online Knowledge Center built from Portlets
• Web Services provide a component model for the middleware (see large “common component architecture” effort in Dept. of Energy)
• Should match each WSDL component with a corresponding user interface component
• Thus one “must use” a component model for the portal with again an XML specification (portalML) of portal
[Figure: a set of UI components. Sample page with several portlets: proxy credential manager, submission, monitoring. Other panels: Administer Grid Portal; Provide information about application and host parameters; Select application to edit]