Communication and Distributed Systems
G. Alonso, W. Bausch, C. Pautasso
Information and Communication Systems Research Group
Department of Computer Science
Swiss Federal Institute of Technology, Zurich (ETHZ)
Who are we?

Information and Communication Systems Research Group. Our expertise:
 • High performance, high availability systems (e.g., data replication, backup architectures, parallel data processing)
 • Information systems architecture (mainframe or cluster based)
 • Large data repositories (e.g., the HEDC data warehouse for the HESSI satellite)
 • Large distributed systems (e.g., enterprise middleware, workflow, internet electronic commerce)
Our interests:
 • System architectures for applications with unconventional requirements:
    - performance: cluster computing and parallel processing
    - availability: long-term fault tolerance, hot backups
    - communications: ubiquitous computing, mobile ad-hoc networks
What do we do?

Until recently, our work has been in the area of high performance/cluster/parallel computing in the IT world:
 • high performance transaction processing systems (throughput, availability)
 • very large data repositories (scalability, evolution)
 • data processing applications (large scale data mining)
In recent years we have started several projects in scientific applications:
 • HEDC: a multi-terabyte data repository for NASA with the capability to store and process astrophysics data. The raw data production rate is 1 GB per day; data goes online 24 hours after delivery from the satellite, fully preprocessed, cleaned, catalogued, and indexed, with relevant events identified (solar flares, gamma ray bursts, non-solar events, etc.) and an extended catalogue of analyses for the events of interest. Users can browse, produce more analyses, request batch processing jobs, perform low resolution analyses, retrieve and store data, etc.
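The HEDC bullet above describes a staged ingest pipeline. A minimal sketch of that structure in Python (stage and function names are illustrative assumptions, not HEDC's actual code):

    # Each raw daily delivery (~1 GB) passes through fixed stages before going online.
    from typing import Callable, Dict, List

    Stage = Callable[[Dict], Dict]

    def preprocess(batch: Dict) -> Dict:
        batch["preprocessed"] = True      # calibration and cleaning
        return batch

    def catalogue(batch: Dict) -> Dict:
        batch["catalogued"] = True        # catalogue and index entries
        return batch

    def detect_events(batch: Dict) -> Dict:
        batch["events"] = []              # flag solar flares, gamma ray bursts, ...
        return batch

    PIPELINE: List[Stage] = [preprocess, catalogue, detect_events]

    def ingest(raw_batch: Dict) -> Dict:
        for stage in PIPELINE:            # run all stages within the 24 hour window
            raw_batch = stage(raw_batch)
        return raw_batch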
Cluster computing

Question: "Will cluster computing penetrate the IT market?" (next talk …)
Our answer: it did a long time ago, but we do not benchmark our clusters in terms of Gflops and there is no Top500 list. Examples:
 • eBay: a cluster of several hundred nodes for high performance transaction processing, fully duplicated for availability purposes
 • Google: several thousand nodes used for parallel data processing (4 weeks to retrieve, sort, classify, index and store the entire contents of the web)
 • TPC-C results for the Compaq ProLiant 8500-700-192P:
    - Server: 24 nodes (8 x Pentium III Xeon 700 MHz, 8 GB RAM each)
    - Clients: 48 nodes (2 x Pentium III 800 MHz, 512 MB RAM each)
    - Disks: > 2,500 SCSI drives
    - Total storage capacity: > 42 TB
Grid computing

Similarly, we have an IT version of (data or computing) grids: virtual business processes.

[Figure: the WISE engine. Enterprises in a trading community post services (activity, behavior, price) to a WWW catalogue; a virtual enterprise process is defined and compiled from the catalogue with IvyFrame and enacted by the WISE engine, which provides persistence, security, quality of service, execution guarantees, availability, and awareness.]
Complex computing environments

[Figure: a scientific dataflow. Inputs (elevation samples, a topo-map, soil samples, satellite images, vegetation samples, rainfall records) flow through analysis steps (digital elevation reconstruction, slope analysis producing slope, angle and slope length, spatial interpolation, vegetation model, precipitation model) into derived results (vegetation cover, vegetation evolution and change, soil erosion, estimated rainfall, a spatial rainfall data model, hydrograph, storm and discharge models).]
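Process support systems represent such computations as a directed acyclic graph of tasks and data dependencies. A minimal sketch of that representation in Python (task names taken from the figure; the encoding itself is an assumption, not the slides' format):

    # A dataflow as a DAG: each task maps to the set of tasks it depends on.
    from graphlib import TopologicalSorter   # Python 3.9+

    DAG = {
        "elevation_reconstruction": set(),
        "slope_analysis": {"elevation_reconstruction"},
        "precipitation_model": set(),
        "vegetation_model": set(),
        "soil_erosion": {"slope_analysis", "precipitation_model", "vegetation_model"},
    }

    # Any topological order is a valid sequential schedule; tasks whose
    # predecessors are all done can run in parallel on a cluster.
    print(list(TopologicalSorter(DAG).static_order()))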
High level components

[Figure: levels of parallelism, from tasks (Task i-1, Task i, Task i+1) through functions (func1, func2, func3) and loop bodies down to individual machine instructions.]

Code granularity:
 • Large grain (application level): programs, managed as tasks
 • Medium grain (operating system level): functions, managed as threads or processes
 • Fine grain (compiler level): loops, parallelized by compilers
 • Very fine grain (instruction level): individual instructions, parallelized in hardware by the CPU
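A minimal sketch of the medium-grain level in Python (illustrative; the slides do not prescribe a language): the unit of parallelism is a function executed by OS processes, whereas large grain would ship whole programs to different nodes.

    from concurrent.futures import ProcessPoolExecutor

    def func(chunk):                     # medium grain: one function per process
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]
        with ProcessPoolExecutor(max_workers=4) as pool:
            print(sum(pool.map(func, chunks)))   # combine partial results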
Process Management Systems

Processes are the way to build distributed computations over clusters.
 • Process: a description of resources (programs, applications, data) plus a specification of control and data flow.
 • Process Management System: management of distributed resources (scheduling, load balancing, parallelism, data transfer, etc.). An operating system at a coarse granularity?
The use of databases reduces by about 50 % the …
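A minimal sketch of what such a process description might contain (a hypothetical encoding; the slides do not define a concrete format, and the program paths are invented):

    # A process = resources + control flow + data flow.
    PROCESS = {
        "resources": {
            "align": {"program": "/opt/bio/align", "data": "db.fasta"},   # hypothetical paths
            "merge": {"program": "/opt/bio/merge"},
        },
        "control_flow": [("align", "merge")],          # align must finish before merge
        "data_flow": {("align", "merge"): "hits"},     # align's 'hits' output feeds merge
    }

The process management system's job is then to schedule the tasks named in such a description across the cluster.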
OPERA
Open Process Engine for Reliable Activities

 • Kernel based system instead of a complete application.
 • Which functionality needs to be implemented in the kernel and which functionality is application specific is an open research area.

[Figure: the OPERA service stack. Kernel: distribution services, external applications, load balancing, process migration, persistence, logging, query interfaces. Advanced services: atomicity, exception handling, recovery, event services. Expanded services: workflow services with modelling, scheduling, and fault tolerance.]
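A minimal sketch of the kernel-versus-extension split (the interfaces are assumptions for illustration; OPERA's actual API is not shown in these slides):

    # The kernel offers generic services; application-specific behaviour is plugged in.
    class Kernel:
        def __init__(self):
            self.services = {}                         # e.g. persistence, logging

        def register(self, name, service):
            self.services[name] = service              # extension point

        def run(self, task):
            self.services["logging"].log(f"start {task}")
            # ... scheduling, load balancing, persistence would happen here ...
            self.services["logging"].log(f"done {task}")

    class PrintLogger:                                 # one possible extension
        def log(self, msg):
            print(msg)

    kernel = Kernel()
    kernel.register("logging", PrintLogger())
    kernel.run("align")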
Application specific extensions

[Figure: the OPERA kernel at the center, surrounded by application specific extensions: modeling and compiler, an interface for data and applications, and metadata and tools.]
BioOpera

Typical computing environments in bioinformatics:
 • Multiple step complex computations over heterogeneous applications:
    - algorithmic complexity (most of them NP-hard or NP-complete)
    - large input/output data sets (exponential growth is not uncommon)
    - complex data dependencies and control flow
    - long lived computations over unreliable platforms
 • Current programming environments are very primitive:
    - support for single algorithms only (complex computations are put together by hand or with very low level scripts)
    - it is difficult to specify, run, keep track of, and reproduce computations
    - no automated support; computations are manually maintained
BioOpera: A Process Support System

BioOpera is:
 • a programming environment for multiple step computations over heterogeneous applications:
    - program invocations, control & data flow
    - parallelism (data, search space)
    - failure behaviour
 • a run time environment for processes:
    - navigation
    - distribution / load balancing
    - persistence / recovery
 • monitoring and interaction tools:
    - at the process level
    - at the task level

[Figure: a process front-end (programming tool, monitoring tool) and an application front-end (data visualization, similarity search) sit on top of BioOpera, which provides process enactment, process monitoring, and system awareness (lineage, recovery, and awareness information) and drives a bioinformatics application (DARWIN) to produce the process output.]
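A minimal sketch showing the three specification concerns (invocation, parallelism, failure behaviour) in one hypothetical process definition; BioOpera's actual syntax is not shown in these slides, and the field names are invented:

    PROCESS = {
        "tasks": {
            "search": {
                "invoke": ["darwin", "search", "{chunk}"],   # program invocation
                "foreach": "chunks",                         # data parallelism
                "on_failure": {"retry": 3, "then": "skip"},  # failure behaviour
            },
            "collect": {"invoke": ["merge", "results/*"], "after": ["search"]},
        },
        "inputs": {"chunks": ["db.00", "db.01", "db.02"]},
    }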
BioOpera: Architecture

[Figure: the BioOpera client (the process support system frontend, with local and remote interface layers) talks to the BioOpera server. An OCR compiler translates a process specification into the data layer (template space, instance space, config space, data space). In the execution layer, a navigator (process interpreter, recovery) feeds an activity queue, and a dispatcher (scheduling, distribution, load balancing) assigns activities to execution clients running Darwin on a cluster on a LAN, observed by a load monitor.]
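A minimal sketch of the navigator/dispatcher split from the figure (the component interfaces are assumptions):

    import queue

    activity_queue = queue.Queue()

    def navigator(state):
        # Interpret the process: enqueue every task whose predecessors are done.
        for task, deps in state["deps"].items():
            if task not in state["done"] and deps <= state["done"]:
                activity_queue.put(task)

    def dispatcher(nodes):
        # Assign queued activities to the least loaded node (load balancing).
        while not activity_queue.empty():
            task = activity_queue.get()
            node = min(nodes, key=lambda n: n["load"])
            node["load"] += 1
            print(f"run {task} on {node['name']}")

    state = {"deps": {"preprocess": set(), "all_vs_all": {"preprocess"}}, "done": set()}
    navigator(state)                     # only 'preprocess' is ready to run
    dispatcher([{"name": "n1", "load": 0}, {"name": "n2", "load": 2}])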
The All vs. All

 • Problem: compute a complete cross-comparison of the SwissProt v38 protein database.
 • SwissProt v38: 80'000 entries, average sequence length 400 residues.
 • A complete cross-comparison (using state of the art methods) takes 2 years of CPU time on a single PC.
 • Currently, doing this on a much smaller scale (just updates) takes several months.

[Figure: the All vs. All as a four-task process. Task 1: user input (database, output file, input params). Task 2: parallel preprocessing. Task 3: All vs. All execution units, run n times, producing reference and raw output files. Task 4: merge into the master result.]
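With 80'000 entries there are 80'000 × 79'999 / 2 ≈ 3.2 × 10^9 pairwise comparisons, which is why Task 3 is split into n parallel execution units. A minimal sketch of the split/merge structure (the comparison function is a stand-in for the real alignment code):

    from concurrent.futures import ProcessPoolExecutor
    from itertools import combinations

    def compare(pair):                        # stand-in for a real alignment
        a, b = pair
        return (a, b, abs(len(a) - len(b)))

    def all_vs_all(entries, workers=4):
        pairs = combinations(entries, 2)      # n*(n-1)/2 comparisons
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(compare, pairs, chunksize=1024))   # Task 4: merge

    if __name__ == "__main__":
        print(len(all_vs_all(["MKT", "GAVL", "WW", "PQRS"])))       # 6 pairs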
The All vs. All lifecycle

[Figure: two aligned time series over roughly five weeks (Dec 17 to Jan 21): available processors and jobs on CPU (running, successful, and failed jobs), each on a 0-30 scale. The run survives a sequence of disruptions: another user needing the cluster, a cluster failure, all jobs manually killed, a disk space shortage, the cluster busy with other jobs, and the server down for maintenance.]
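Surviving these disruptions relies on the persistence/recovery listed earlier: completed work is recorded in stable storage so that, after a failure, execution resumes where it stopped instead of restarting. A minimal sketch of that idea (the file name and format are illustrative):

    import json, os

    CHECKPOINT = "process_state.json"

    def load_done():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return set(json.load(f))          # tasks already completed
        return set()

    def mark_done(done, task):
        done.add(task)
        with open(CHECKPOINT, "w") as f:
            json.dump(sorted(done), f)            # persist after every task

    done = load_done()
    for task in ["preprocess", "all_vs_all", "merge"]:
        if task in done:
            continue                              # recovery: skip finished work
        print(f"running {task}")
        mark_done(done, task)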
Conclusions

BioOpera offers:
 • a better way to program complex computations over clusters,
 • persistent computations and recoverability,
 • monitoring and querying facilities,
 • a scalable architecture that can be applied both in clusters and in grids.
BioOpera is not:
 • a parallel programming language or a message passing library (too low level),
 • a scripting language (too inflexible),
 • a workflow tool (too oriented to human activities).
BioOpera is extensible:
 • it accepts different process programming models,
 • monitoring and querying facilities can be expanded and changed,
 • BioOpera servers can be combined to form clusters of servers.