Communication and Distributed Systems
G. Alonso, W. Bausch, C. Pautasso
Information and Communication Systems Research Group
Department of Computer Science
Swiss Federal Institute of Technology, Zurich (ETHZ)
Who are we?

Information and Communication Systems Research Group. Our expertise:
 • High performance, high availability systems (e.g., data replication, backup architectures, parallel data processing)
 • Information systems architecture (mainframe or cluster based)
 • Large data repositories (e.g., the HEDC data warehouse for the HESSI satellite)
 • Large distributed systems (e.g., enterprise middleware, workflow, internet electronic commerce)
Our interests:
 • System architectures for applications with unconventional requirements:
    - performance: cluster computing and parallel processing
    - availability: long-term fault tolerance, hot backups
    - communications: ubiquitous computing, mobile ad-hoc networks
What do we do?

Until recently, our work has been in the area of high performance/cluster/parallel computing in the IT world:
 • high performance transaction processing systems (throughput, availability)
 • very large data repositories (scalability, evolution)
 • data processing applications (large scale data mining)
In recent years we have started several projects in scientific applications:
 • HEDC: a multi-terabyte data repository for NASA with the capability to store and process astrophysics data. The raw data production rate is 1 GB per day; data goes online 24 hours after delivery from the satellite, fully preprocessed, cleaned, catalogued, and indexed, with relevant events identified (solar flares, gamma ray bursts, non-solar events, etc.) and an extended catalogue of analyses for the events of interest. Users can browse, produce more analyses, request batch processing jobs, perform low resolution analyses, retrieve and store data, etc.
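The HEDC bullet above describes a staged ingest pipeline. A minimal sketch of that structure in Python (stage and function names are illustrative assumptions, not HEDC's actual code):

    # Each raw daily delivery (~1 GB) passes through fixed stages before going online.
    from typing import Callable, Dict, List

    Stage = Callable[[Dict], Dict]

    def preprocess(batch: Dict) -> Dict:
        batch["preprocessed"] = True      # calibration and cleaning
        return batch

    def catalogue(batch: Dict) -> Dict:
        batch["catalogued"] = True        # catalogue and index entries
        return batch

    def detect_events(batch: Dict) -> Dict:
        batch["events"] = []              # flag solar flares, gamma ray bursts, ...
        return batch

    PIPELINE: List[Stage] = [preprocess, catalogue, detect_events]

    def ingest(raw_batch: Dict) -> Dict:
        for stage in PIPELINE:            # run all stages within the 24 hour window
            raw_batch = stage(raw_batch)
        return raw_batch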
Cluster computing

Question: "Will cluster computing penetrate the IT market?" (next talk …)
Our answer: it did a long time ago, but we do not benchmark our clusters in terms of Gflops and there is no Top500 list. Examples:
 • eBay: a cluster of several hundred nodes for high performance transaction processing, fully duplicated for availability purposes
 • Google: several thousand nodes used for parallel data processing (4 weeks to retrieve, sort, classify, index and store the entire contents of the web)
 • TPC-C results for the Compaq ProLiant 8500-700-192P:
    - Server: 24 nodes (8 x Pentium III Xeon 700 MHz, 8 GB RAM each)
    - Clients: 48 nodes (2 x Pentium III 800 MHz, 512 MB RAM each)
    - Disks: > 2,500 SCSI drives
    - Total storage capacity: > 42 TB
Grid computing

Similarly, we have an IT version of (data or computing) grids: virtual business processes.

[Figure: the WISE engine. Enterprises in a trading community post services (activity, behavior, price) to a WWW catalogue; a virtual enterprise process is defined and compiled from the catalogue with IvyFrame and enacted by the WISE engine, which provides persistence, security, quality of service, execution guarantees, availability, and awareness.]
Complex computing environments

[Figure: a scientific dataflow. Inputs (elevation samples, a topo-map, soil samples, satellite images, vegetation samples, rainfall records) flow through analysis steps (digital elevation reconstruction, slope analysis producing slope, angle and slope length, spatial interpolation, vegetation model, precipitation model) into derived results (vegetation cover, vegetation evolution and change, soil erosion, estimated rainfall, a spatial rainfall data model, hydrograph, storm and discharge models).]
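Process support systems represent such computations as a directed acyclic graph of tasks and data dependencies. A minimal sketch of that representation in Python (task names taken from the figure; the encoding itself is an assumption, not the slides' format):

    # A dataflow as a DAG: each task maps to the set of tasks it depends on.
    from graphlib import TopologicalSorter   # Python 3.9+

    DAG = {
        "elevation_reconstruction": set(),
        "slope_analysis": {"elevation_reconstruction"},
        "precipitation_model": set(),
        "vegetation_model": set(),
        "soil_erosion": {"slope_analysis", "precipitation_model", "vegetation_model"},
    }

    # Any topological order is a valid sequential schedule; tasks whose
    # predecessors are all done can run in parallel on a cluster.
    print(list(TopologicalSorter(DAG).static_order()))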
High level components

[Figure: levels of parallelism, from tasks (Task i-1, Task i, Task i+1) through functions (func1, func2, func3) and loop bodies down to individual machine instructions.]

Code granularity:
 • Large grain (application level): programs, managed as tasks
 • Medium grain (operating system level): functions, managed as threads or processes
 • Fine grain (compiler level): loops, parallelized by compilers
 • Very fine grain (instruction level): individual instructions, parallelized in hardware by the CPU
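A minimal sketch of the medium-grain level in Python (illustrative; the slides do not prescribe a language): the unit of parallelism is a function executed by OS processes, whereas large grain would ship whole programs to different nodes.

    from concurrent.futures import ProcessPoolExecutor

    def func(chunk):                     # medium grain: one function per process
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]
        with ProcessPoolExecutor(max_workers=4) as pool:
            print(sum(pool.map(func, chunks)))   # combine partial results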
Process Management Systems

Processes are the way to build distributed computations over clusters.
 • Process: a description of resources (programs, applications, data) plus a specification of control and data flow.
 • Process Management System: management of distributed resources (scheduling, load balancing, parallelism, data transfer, etc.). An operating system at a coarse granularity?
The use of databases reduces by about 50 % the …
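A minimal sketch of what such a process description might contain (a hypothetical encoding; the slides do not define a concrete format, and the program paths are invented):

    # A process = resources + control flow + data flow.
    PROCESS = {
        "resources": {
            "align": {"program": "/opt/bio/align", "data": "db.fasta"},   # hypothetical paths
            "merge": {"program": "/opt/bio/merge"},
        },
        "control_flow": [("align", "merge")],          # align must finish before merge
        "data_flow": {("align", "merge"): "hits"},     # align's 'hits' output feeds merge
    }

The process management system's job is then to schedule the tasks named in such a description across the cluster.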
OPERA
Open Process Engine for Reliable Activities

 • Kernel based system instead of a complete application.
 • Which functionality needs to be implemented in the kernel and which functionality is application specific is an open research area.

[Figure: the OPERA service stack. Kernel: distribution services, external applications, load balancing, process migration, persistence, logging, query interfaces. Advanced services: atomicity, exception handling, recovery, event services. Expanded services: workflow services with modelling, scheduling, and fault tolerance.]
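A minimal sketch of the kernel-versus-extension split (the interfaces are assumptions for illustration; OPERA's actual API is not shown in these slides):

    # The kernel offers generic services; application-specific behaviour is plugged in.
    class Kernel:
        def __init__(self):
            self.services = {}                         # e.g. persistence, logging

        def register(self, name, service):
            self.services[name] = service              # extension point

        def run(self, task):
            self.services["logging"].log(f"start {task}")
            # ... scheduling, load balancing, persistence would happen here ...
            self.services["logging"].log(f"done {task}")

    class PrintLogger:                                 # one possible extension
        def log(self, msg):
            print(msg)

    kernel = Kernel()
    kernel.register("logging", PrintLogger())
    kernel.run("align")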
Application specific extensions

[Figure: the OPERA kernel at the center, surrounded by application specific extensions: modeling and compiler, an interface for data and applications, and metadata and tools.]
BioOpera

Typical computing environments in bioinformatics:
 • Multiple step complex computations over heterogeneous applications:
    - algorithmic complexity (most of them NP-hard or NP-complete)
    - large input/output data sets (exponential growth is not uncommon)
    - complex data dependencies and control flow
    - long lived computations over unreliable platforms
 • Current programming environments are very primitive:
    - support for single algorithms only (complex computations are put together by hand or with very low level scripts)
    - it is difficult to specify, run, keep track of, and reproduce computations
    - no automated support; computations are manually maintained
BioOpera: A Process Support System

BioOpera is:
 • a programming environment for multiple step computations over heterogeneous applications:
    - program invocations, control & data flow
    - parallelism (data, search space)
    - failure behaviour
 • a run time environment for processes:
    - navigation
    - distribution / load balancing
    - persistence / recovery
 • monitoring and interaction tools:
    - at the process level
    - at the task level

[Figure: a process front-end (programming tool, monitoring tool) and an application front-end (data visualization, similarity search) sit on top of BioOpera, which provides process enactment, process monitoring, and system awareness (lineage, recovery, and awareness information) and drives a bioinformatics application (DARWIN) to produce the process output.]
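A minimal sketch showing the three specification concerns (invocation, parallelism, failure behaviour) in one hypothetical process definition; BioOpera's actual syntax is not shown in these slides, and the field names are invented:

    PROCESS = {
        "tasks": {
            "search": {
                "invoke": ["darwin", "search", "{chunk}"],   # program invocation
                "foreach": "chunks",                         # data parallelism
                "on_failure": {"retry": 3, "then": "skip"},  # failure behaviour
            },
            "collect": {"invoke": ["merge", "results/*"], "after": ["search"]},
        },
        "inputs": {"chunks": ["db.00", "db.01", "db.02"]},
    }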
BioOpera: Architecture

[Figure: the BioOpera client (the process support system frontend, with local and remote interface layers) talks to the BioOpera server. An OCR compiler translates a process specification into the data layer (template space, instance space, config space, data space). In the execution layer, a navigator (process interpreter, recovery) feeds an activity queue, and a dispatcher (scheduling, distribution, load balancing) assigns activities to execution clients running Darwin on a cluster on a LAN, observed by a load monitor.]
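A minimal sketch of the navigator/dispatcher split from the figure (the component interfaces are assumptions):

    import queue

    activity_queue = queue.Queue()

    def navigator(state):
        # Interpret the process: enqueue every task whose predecessors are done.
        for task, deps in state["deps"].items():
            if task not in state["done"] and deps <= state["done"]:
                activity_queue.put(task)

    def dispatcher(nodes):
        # Assign queued activities to the least loaded node (load balancing).
        while not activity_queue.empty():
            task = activity_queue.get()
            node = min(nodes, key=lambda n: n["load"])
            node["load"] += 1
            print(f"run {task} on {node['name']}")

    state = {"deps": {"preprocess": set(), "all_vs_all": {"preprocess"}}, "done": set()}
    navigator(state)                     # only 'preprocess' is ready to run
    dispatcher([{"name": "n1", "load": 0}, {"name": "n2", "load": 2}])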
The All vs. All

 • Problem: compute a complete cross-comparison of the SwissProt v38 protein database.
 • SwissProt v38: 80'000 entries, average sequence length 400 residues.
 • A complete cross-comparison (using state of the art methods) takes 2 years of CPU time on a single PC.
 • Currently, doing this on a much smaller scale (just updates) takes several months.

[Figure: the All vs. All as a four-task process. Task 1: user input (database, output file, input params). Task 2: parallel preprocessing. Task 3: All vs. All execution units, run n times, producing reference and raw output files. Task 4: merge into the master result.]
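With 80'000 entries there are 80'000 × 79'999 / 2 ≈ 3.2 × 10^9 pairwise comparisons, which is why Task 3 is split into n parallel execution units. A minimal sketch of the split/merge structure (the comparison function is a stand-in for the real alignment code):

    from concurrent.futures import ProcessPoolExecutor
    from itertools import combinations

    def compare(pair):                        # stand-in for a real alignment
        a, b = pair
        return (a, b, abs(len(a) - len(b)))

    def all_vs_all(entries, workers=4):
        pairs = combinations(entries, 2)      # n*(n-1)/2 comparisons
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(compare, pairs, chunksize=1024))   # Task 4: merge

    if __name__ == "__main__":
        print(len(all_vs_all(["MKT", "GAVL", "WW", "PQRS"])))       # 6 pairs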
The All vs. All lifecycle

[Figure: two aligned time series over roughly five weeks (Dec 17 to Jan 21): available processors and jobs on CPU (running, successful, and failed jobs), each on a 0-30 scale. The run survives a sequence of disruptions: another user needing the cluster, a cluster failure, all jobs manually killed, a disk space shortage, the cluster busy with other jobs, and the server down for maintenance.]
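Surviving these disruptions relies on the persistence/recovery listed earlier: completed work is recorded in stable storage so that, after a failure, execution resumes where it stopped instead of restarting. A minimal sketch of that idea (the file name and format are illustrative):

    import json, os

    CHECKPOINT = "process_state.json"

    def load_done():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as f:
                return set(json.load(f))          # tasks already completed
        return set()

    def mark_done(done, task):
        done.add(task)
        with open(CHECKPOINT, "w") as f:
            json.dump(sorted(done), f)            # persist after every task

    done = load_done()
    for task in ["preprocess", "all_vs_all", "merge"]:
        if task in done:
            continue                              # recovery: skip finished work
        print(f"running {task}")
        mark_done(done, task)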
Conclusions

BioOpera offers:
 • a better way to program complex computations over clusters,
 • persistent computations and recoverability,
 • monitoring and querying facilities,
 • a scalable architecture that can be applied both in clusters and in grids.
BioOpera is not:
 • a parallel programming language or a message passing library (too low level),
 • a scripting language (too inflexible),
 • a workflow tool (too oriented to human activities).
BioOpera is extensible:
 • it accepts different process programming models,
 • monitoring and querying facilities can be expanded and changed,
 • BioOpera servers can be combined to form clusters of servers.