Advanced Operating Systems
Lecture 8: Distributed Systems (introduction)
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Univ. of Tehran, Distributed Operating Systems

Covered topic
- Distributed systems: why, and how
- References: Chapter 1 of the textbook

Outline
- Why distributed systems
- Challenges
- Communication
- Distributed operating systems
- Architectural models

Distributed System? (examples)
- The Internet
- A sensor network
- Gnutella peer-to-peer system
- Food web of Little Rock Lake, WI

Problems?
- Bigger problems such as weather forecasting, economic modeling, scientific problems, etc.
- Faster machines? It is getting harder to extract the performance modern applications require out of a single-processor machine.
- Some applications are inherently distributed: sensor networks, etc.
- A lot of data to store in one place
- More efficient use of resources, sharing resources
- Solution: distributed computing

Distributed systems
Definitions
- A collection of autonomous computers linked by a network, with software designed to produce an integrated computing facility
- A distributed system is a collection of independent computers that appear to users as a single computer
- A system in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages
Examples
- World Wide Web
- Automatic teller machines
- Cell phones

A working definition
A distributed system is a collection of entities, each of which is autonomous, programmable, asynchronous and failure-prone, and which communicate through an unreliable communication medium.
Our interest in distributed systems involves algorithmics, design and implementation, maintenance, and study.

Advantages
  Item                   Description
  Economics              Microprocessors offer a better price/performance than mainframes
  Speed                  A distributed system may have more total computing power than a mainframe
  Inherent distribution  Some applications involve spatially separated machines
  Reliability            If one machine crashes, the system as a whole can still survive
  Incremental growth     Computing power can be added in small increments

Disadvantages
  Item                   Description
  Software               Little software, OSs, etc., exist at present for distributed systems
  Networking             The network can saturate or cause other problems
  Security               Easy access also applies to secret data (privacy!)

A range of challenges
- Failures (of nodes or network)
- Asynchrony
- Scalability
- Security

Consequences
- Concurrency
  - Concurrency is the norm instead of the exception
  - Synchronization is critical
- No global clock
  - There is a limit as to how accurate a global clock can be
  - Contrary to parallel systems
- Independent failures
  - The more stuff you add, the more likely something will break
  - Single system view says independent failures should not affect users

Communication Issues
Building a system out of interconnected computers requires that some major issues be addressed:
- Independent failure
- Unreliable communication
- Insecure communication
- Costly communication
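The independent-failure point above ("the more stuff you add, the more likely something will break") can be made concrete with a small calculation. This is only a sketch under an assumption that is not in the slides: each of n nodes fails independently with the same probability p during the interval of interest, so the probability that at least one node fails is 1 - (1 - p)^n. The function name `prob_any_failure` is made up for illustration.

```python
def prob_any_failure(n: int, p: float) -> float:
    """Return P(at least one of n nodes fails).

    Assumes (for illustration only) that nodes fail independently,
    each with the same probability p over the interval of interest.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # All n nodes survive with probability (1 - p) ** n.
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"{n:5d} nodes: {prob_any_failure(n, 0.01):.3f}")
```

Under this assumption and with p = 0.01, the chance that something is broken rises from 1% for a single node to about 63% for 100 nodes, which is why the single system view has to mask failures rather than hope they do not occur.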
Distributed Operating Systems
- A distributed operating system supports the encapsulation and protection of resources inside servers, and it supports the mechanisms required to access these resources, including naming, communication and scheduling
- The software for multiple-CPU systems can be divided into three rough classes:
  - Network operating systems (file servers)
  - Distributed operating systems
  - Shared-memory multiprocessors

Parallel Computing
- A large collection of processing elements that can communicate and cooperate to solve large problems quickly
- A form of information processing which uses concurrent events during execution
- In other words, both the language and the hardware support concurrency

Parallel Architectures
- Unlike traditional von Neumann machines, there is no single standard architecture used on parallel machines
- In fact, dozens of different parallel architectures have been built and are being used
- Several people have tried to classify the different types of parallel machines
- The taxonomy proposed by Flynn is the most commonly used

Ex. Building a mail server
- Mail arrives from the outside world
- Store it until... user reads/deletes/saves it
- Solution: one server w/ disk to store mailboxes
- Problems:
  - Performance: stable performance under high load
  - Consistent w.r.t. client-side copies
  - Concurrent mail arrival, deletion
  - Crash recovery (crash while updating a mailbox)
  - Availability

Other problems?
- Not necessarily plenty of bandwidth
- Not necessarily low latency
- Significant variance in latency and bandwidth
- Frequent and unpredictable partial failure of the channel
- Lost messages, etc.
What else has changed?
- We don't have hardware support for synchronization/atomicity among hosts
- We don't have a global timer or clock
- Frequent and unpredictable failure of some CPUs, I/O devices, etc.
- Snoopy caches are not practical, because broadcasting is too expensive

Challenges
There are a number of challenges found in building distributed systems:
- Heterogeneity
- Openness
- Security
- Scalability
- Failure handling
- Concurrency
- Transparency

Heterogeneity
- Applies to:
  - Networks
  - Computer hardware
  - Operating systems
  - Programming languages
  - Implementations
- Middleware is a software layer that helps to handle heterogeneity

Openness
- The characteristic that a system can be extended in various ways
  - Hardware extensions
  - Software extensions
- Historically, computer systems were largely closed
  - UNIX broke the mold for OS
  - IBM PC broke the mold for hardware

Security
Security is a huge issue in computing in general, but even more so in distributed computing:
- Communication
- Distributed resources
- Infrastructure attacks

Scalability
- Distributed systems operate at many different scales
  - Two workstations and a file server
  - Department computers…
- Often the more important question is not "can you scale?" but "can you scale well?"
- Consider the Internet

Failure Handling
- What happens when a fault occurs?
  - Detect
  - Mask
  - Tolerate
- Fault-tolerant design is based on two approaches:
  - Hardware redundancy
  - Software recovery
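The "detect" step of failure handling is often approximated with heartbeats and a timeout. The sketch below is one possible illustration, not something the slides prescribe: the class name `HeartbeatDetector`, its methods, and the timeout value are all hypothetical. Timestamps are passed in explicitly and come from the observer's local clock only, since there is no global clock to rely on.

```python
from typing import Dict, List


class HeartbeatDetector:
    """Timeout-based failure detector (illustrative sketch).

    A node is *suspected* if no heartbeat has been recorded from it
    for more than `timeout` time units on the observer's local clock.
    """

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Record the local time at which the heartbeat arrived.
        self.last_seen[node] = now

    def suspected(self, now: float) -> List[str]:
        # Nodes whose most recent heartbeat is older than the timeout.
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    detector = HeartbeatDetector(timeout=3.0)
    detector.heartbeat("a", now=0.0)
    detector.heartbeat("b", now=0.0)
    detector.heartbeat("a", now=2.5)
    print(detector.suspected(now=4.0))
```

Because the system is asynchronous and messages can be lost or delayed, a suspected node may simply be slow; a detector like this can only suspect a failure, which is why masking and tolerating faults are listed as separate steps.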
Hardware Redundancy
- Two computers are employed for a single application, one acting as a standby
  - A very costly, but often very effective, solution
- Redundancy can be planned at a finer grain
  - Individual servers can be replicated
  - Redundant hardware can be used for noncritical activities when no faults are present

Software Redundancy
- Software must be designed so that the state of permanent data can be recovered or "rolled back" when a fault is detected
- Transaction processing

Concurrency
- Concurrency in a distributed system does not necessarily mean concurrency within a single program
  - Many users invoke similar commands
  - Many different server processes may be running
- Synchronization, of course, is a problem

Transparency
  Transparency  Description
  Access        Hide differences in data representation and how a resource is accessed
  Location      Hide where a resource is located
  Migration     Hide that a resource may move to another location
  Relocation    Hide that a resource may be moved to another location while in use
  Replication   Hide that a resource may have several copies
  Concurrency   Hide that a resource may be shared by several competitive users
  Failure       Hide the failure and recovery of a resource
  Persistence   Hide whether a (software) resource is in memory or on disk

Scalability Problems
  Concept                 Example
  Centralized services    A single server for all users
  Centralized data        A single on-line telephone book
  Centralized algorithms  Doing routing based on complete information
Examples of scalability limitations.

Scaling Techniques (1)
Figure 1.4: The difference between letting a server or a client check forms as they are being filled in.
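The form-checking figure contrasts validating each field on the server with validating on the client and shipping the completed form once. A rough way to see the difference is to count round trips. Everything in this sketch is hypothetical (the `FormServer` class, the function names, and the idea of counting one round trip per server call); it only illustrates the trade-off and is not taken from the lecture.

```python
from typing import Callable, Dict

Validators = Dict[str, Callable[[str], bool]]


class FormServer:
    """Toy server that counts how many requests (round trips) it receives."""

    def __init__(self, validators: Validators) -> None:
        self.validators = validators
        self.round_trips = 0

    def check_field(self, name: str, value: str) -> bool:
        # One network round trip per field checked remotely.
        self.round_trips += 1
        return self.validators[name](value)

    def submit(self, form: Dict[str, str]) -> bool:
        # One round trip for the whole form; the server re-validates
        # because it should not trust checks done on the client.
        self.round_trips += 1
        return all(self.validators[k](v) for k, v in form.items())


def fill_with_server_checks(server: FormServer, form: Dict[str, str]) -> bool:
    """Server checks every field as it is filled in, then accepts the form."""
    for name, value in form.items():
        if not server.check_field(name, value):
            return False
    return server.submit(form)


def fill_with_client_checks(
    server: FormServer, form: Dict[str, str], validators: Validators
) -> bool:
    """Client checks all fields locally and contacts the server only once."""
    if not all(validators[k](v) for k, v in form.items()):
        return False  # rejected locally, no network traffic at all
    return server.submit(form)
```

For a form with three valid fields, the first approach costs four round trips in this model and the second costs one, which is the sense in which moving work to the client helps hide communication latency.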
Scaling Techniques (2)
Figure 1.5: An example of dividing the DNS name space into zones.

Hardware Models
Figure 1.6: Different basic organizations and memories in distributed computer systems.

Multiprocessors (1)
Figure 1.7: A bus-based multiprocessor.

Multiprocessors (2)
Figure 1.8:
a) A crossbar switch
b) An omega switching network

Homogeneous Multicomputer Systems
Figure 1.9:
a) Grid
b) Hypercube

Software Models
  System      Description                                        Main goal
  DOS         Tightly-coupled operating system for               Hide and manage hardware resources
              multiprocessors and homogeneous multicomputers
  NOS         Loosely-coupled operating system for               Offer local services to remote clients
              heterogeneous multicomputers (LAN and WAN)
  Middleware  Additional layer on top of NOS implementing        Provide distribution transparency
              general-purpose services
An overview of:
- DOS (Distributed Operating Systems)
- NOS (Network Operating Systems)
- Middleware

Uniprocessor Operating Systems
Figure 1.11: Separating applications from operating system code through a microkernel.

Multicomputer Operating Systems (1)
Figure 1.14

Multicomputer Operating Systems (2)
Figure 1.15: Alternatives for blocking and buffering in message passing.

Distributed Shared Memory Systems (1)
a) Pages of address space distributed among four machines
b) Situation after CPU 1 references page 10
c) Situation if page 10 is read only and replication is used

Distributed Shared Memory Systems (2)
Figure 1.18: False sharing of a page between two independent processes.
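False sharing can be illustrated with a toy model of page-based distributed shared memory. The sketch assumes a deliberately simple protocol that the slides do not specify: every page has exactly one owner, and a write by another machine moves the whole page to the writer. The 4096-byte page size and the helper names `alternating_writes` and `count_page_transfers` are made up for the example.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only

Access = Tuple[str, int]  # (machine, address written)


def alternating_writes(addr_x: int, addr_y: int, rounds: int) -> List[Access]:
    """Machine A writes variable x, then machine B writes variable y, repeatedly."""
    accesses: List[Access] = []
    for _ in range(rounds):
        accesses.append(("A", addr_x))
        accesses.append(("B", addr_y))
    return accesses


def count_page_transfers(accesses: List[Access], page_size: int = PAGE_SIZE) -> int:
    """Count how often a page must move between machines.

    Single-owner model: the first machine to touch a page owns it;
    a write by a different machine transfers the page to that machine.
    """
    owner: Dict[int, str] = {}
    transfers = 0
    for machine, address in accesses:
        page = address // page_size
        if page in owner and owner[page] != machine:
            transfers += 1
        owner[page] = machine
    return transfers


if __name__ == "__main__":
    same_page = alternating_writes(addr_x=0, addr_y=8, rounds=4)
    separate_pages = alternating_writes(addr_x=0, addr_y=PAGE_SIZE, rounds=4)
    print("x and y on one page:  ", count_page_transfers(same_page))
    print("x and y on two pages: ", count_page_transfers(separate_pages))
```

In this model the two processes never touch the same variable, yet placing x and y on one page makes the page bounce on almost every write, while placing them on separate pages causes no transfers at all.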
Network Operating System (1)
Figure 1.19: General structure of a network operating system.

Positioning Middleware
Figure 1.22: General structure of a distributed system as middleware.

Software Layers
- Applications, services
- Middleware
- Platform:
  - Operating system
  - Computer and network hardware

Middleware
- What does it do?
  - Provides an API for the application
  - Hides the underlying heterogeneity
- Examples
  - Sun RPC, ISIS
  - CORBA
  - RMI
  - DCOM

Middleware and Openness
Figure 1.23: In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as well as the interfaces they offer to applications.

Comparison between Systems
  Item                     Distributed OS   Distributed OS       Network OS  Middleware-based OS
                           (multiproc.)     (multicomp.)
  Degree of transparency   Very high        High                 Low         High
  Same OS on all nodes     Yes              Yes                  No          No
  Number of copies of OS   1                N                    N           N
  Basis for communication  Shared memory    Messages             Files       Model specific
  Resource management      Global, central  Global, distributed  Per node    Per node
  Scalability              No               Moderately           Yes         Varies
  Openness                 Closed           Closed               Open        Open

Next Lecture
- DS architecture
- References
  - Chapter 2 of the book
  - The Anatomy of the Grid
  - Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications