S(O)OS IN A NUTSHELL
Towards the Large-Scale OS
S(o)OS: Realising resource-independent execution support on tera-device systems

Structure of this presentation
CONTENT
1. Background / Introduction: current problems of OS and HPC / tightly coupled systems
2. The basic idea behind S(o)OS: a quick overview of the main concepts
3. Project structure & management approach

S(o)OS Service-oriented Operating Systems 5/23/2017 Presenter

Issues for large-scale systems
BACKGROUND

Brief overview of current HPC
TRENDS
• We face systems with
  – Hundreds of thousands of cores
  – Heterogeneous cores
  – Accelerators attached using a wide range of technologies
  – A myriad of connection options
    • HyperTransport / QuickPath
    • PCI Express
    • Gigabit Ethernet
    • InfiniBand
    • Lots of vendor-specific connectivity
• These systems cannot easily be programmed
The potential cannot be exploited.
Current approaches are working on the symptoms (PGAS, CUDA, OpenCL, etc.).

Current Operating Systems
ISSUES
• Focus on homogeneous resource infrastructures
• Scale well with respect to processes but not with respect to processing units
• Are essentially centralised => bottleneck, OS jitter ...
Future environments will be large-scale, heterogeneous and potentially widely distributed.
Current OS architectures cannot deal with this.

Current Operating Systems
COMMUNICATION
[Figure: the operating system and several processing units, each with its own cache. Labelled interactions: Assign Process, Load Process, Procedure Call, Context Switch, Memory Access, Cross Core Comm.]
Massive communication and management overhead

Distributed Programming
ISSUES
• Require a lot of knowledge about
  – The resource infrastructure
  – The relationship between algorithm and data
  – The potential distribution of the algorithms
  – Its connectivity / communication
  – Etc.
Large-scale tightly coupled systems will become a common good.
Programming models must become more efficient and manageable.

Revising OS Architectures
TOWARDS S(O)OS

GENERAL CONCEPT
• Distribute operating system and code across the resource infrastructure
• Rearrange running code and operating system according to
  – Availability of resources
  – Requirements of code
  – Linkage between data

OVERALL CONCEPT
[Figure labels: Process Code, Virtual Memory, virtual execution, Segmented Code, Physical Memory, actual execution, resource requirements, resource capabilities, match, deployment]

REAL VON NEUMANN
[Figure labels: Memory, PU, ALU, MU, Control, In / Out, Data Bus]

THE OS MONSTER

OBSTACLES
• Limited cache size
• Communication is costly
• Resource environment is dynamic or changes between executions
• Data consistency
• Code consistency

BASIC PRINCIPLES 1
• Distributed, Self-Managed Microkernels
  – Following the SOA / Grid principle, OS functionalities are separate “services”
  – “Code is where the data is”
  – Dynamic composition of elements according to code segment requirements
[Figure labels: File Management, Scheduler, HAL, Job Mgr, Memory, I/O]

BASIC PRINCIPLES 2
• Runtime Code Behaviour Analysis
  – Identify code parts with strong relationships
  – Primary and secondary datasets
  – OS relation
  – Distribute segments according to requirements and availability
  – Annotate memory with analysis results
[Figure labels: OS Modules, Procedure Calls etc., Code Segments, Self-referencing]

“Hacking” Applications
[Figure: the virtual memory of Process 1 and Process 2, addresses &x00000000 to &x00000800 in steps of &x100, each with additional address information (type and relations). Example annotations shown: accessed from &x00000600, &x00000C00, &x00000F00; calls &x0002AB00, &x00000000, &x001D1F00; reads from &x00011F00, &x000BAE30; writes to &x000BAE30; accessed from &x00000100, &x00000800, &x00000200; jumps to &x00000600.]
[Figure: Annotated Virtual Memory. A graph over the blocks OS1, OS2, P1.B1, P1.B2, P2.B1, D.B1 and D.B2, with edges labelled f: (values 0.2 to 0.9) and w: (values 0.3 to 0.9).]

BASIC PRINCIPLES 3
• Distributed Execution Model
  – Whole code can be distributed, not just threads
  – Execution context may move between cores
  – Distribution may vary with infrastructure
  – Essential distribution information is maintained with code
  – Reduce communication overhead
[Figure labels: Virtual process space, Exec Env, Exec Env]

PROJECT GOALS
• New OS architectures / paradigms
• New approaches and algorithms to deal with future distributed execution systems
• Proof-of-concept implementation of distributed execution support tools

Notes on Project Structure
MANAGING S(O)OS

WORK PACKAGES
Two main strands:
• Design Strand
  Development of algorithms, architectures, reference implementations
  – WP2: looks at the development from a hardware perspective
  – WP3: examines (communication) protocols
  – WP4: deals with distributed execution models
• Integration & Testing Strand
  Aligns the models, performs integrated, application-related tests
  – WP5: OS model
  – WP6: Application

PARTICIPANTS
• HLRS, University of Stuttgart
• Instituto de Telecomunicações Aveiro
• RETIS Lab, Sant'Anna School of Advanced Studies
• CTIT, Universiteit Twente
• Ecole Polytechnique Fédérale de Lausanne
• European Microsoft Innovation Centre
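
APPENDIX: PLACEMENT SKETCH
The idea from BASIC PRINCIPLES 2 and 3 (annotate memory with relationship information, then distribute segments so that strongly related blocks stay together and communication is reduced) can be illustrated with a small Python sketch. This is not the S(o)OS algorithm: the greedy strategy, the function names place_blocks and cross_unit_weight, the unit names, the capacity limit and all link weights are assumptions made for illustration. Only the block names are borrowed from the Annotated Virtual Memory figure.

```python
from collections import defaultdict

# Hypothetical link weights between blocks. The block names follow the
# "Annotated Virtual Memory" figure; the pairs and values are invented here
# and are read as "strength of coupling" purely for this sketch.
LINKS = {
    ("OS1", "P1.B1"): 0.9,
    ("P1.B1", "P1.B2"): 0.8,
    ("P1.B2", "D.B1"): 0.7,
    ("OS2", "P2.B1"): 0.9,
    ("P2.B1", "D.B2"): 0.6,
    ("P1.B2", "P2.B1"): 0.2,
}


def place_blocks(links, units, capacity):
    """Greedily assign blocks to processing units, strongest links first.

    A block is put on the same unit as an already placed partner if that
    unit still has room; otherwise it goes to the least-loaded unit.
    """
    placement = {}
    load = defaultdict(int)

    def put(block, preferred=None):
        if block in placement:
            return
        if preferred is not None and load[preferred] < capacity:
            unit = preferred
        else:
            candidates = [u for u in units if load[u] < capacity]
            if not candidates:
                raise ValueError("not enough capacity for all blocks")
            unit = min(candidates, key=lambda u: load[u])
        placement[block] = unit
        load[unit] += 1

    # Visit links from the strongest to the weakest coupling.
    for (a, b), _weight in sorted(links.items(), key=lambda kv: -kv[1]):
        if a in placement:
            put(b, placement[a])
        elif b in placement:
            put(a, placement[b])
        else:
            put(a)
            put(b, placement[a])
    return placement


def cross_unit_weight(links, placement):
    """Sum the weights of links whose two ends sit on different units."""
    return sum(w for (a, b), w in links.items() if placement[a] != placement[b])


if __name__ == "__main__":
    result = place_blocks(LINKS, ["pu0", "pu1"], capacity=4)
    for block, unit in sorted(result.items()):
        print(block, "->", unit)
    print("cross-unit weight:", cross_unit_weight(LINKS, result))
```

With these made-up weights, the blocks of each process end up on the same unit as the OS module and data block they are most strongly linked to, and only the weakest link crosses units. A real system would also have to handle the obstacles listed earlier, such as limited cache size and a resource environment that changes between executions.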