Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Distributed Systems COEN 317 Introduction Chapter 1,2,3 COEN 317 JoAnne Holliday Email: [email protected] (best way to reach me) Office: Engineering 247, (408) 551-1941 Office Hours: TW 3:00-4:30 and by appointment Class web page: http://www.cse.scu.edu/~jholliday/ Textbook: Distributed Systems, Principles and Paradigms By Tanenbaum and van Steen We will cover chapter 4-8 and parts of 9. Read chapter 1. Review chapters 2 if needed for networks and 3 as needed for threads and processes Chapter 1: Introduction Chapter 2: Communication, Networking Chapter 3: Processes Definition of a Distributed System (1) A distributed system is: A collection of independent computers that appears to its users as a single coherent system. Definition of a Distributed System (2) 1.1 A distributed system organized as middleware. Note that the middleware layer extends over multiple machines. Threads (chapter 3) Message propagation times are long. Send a message and let one thread wait for response while another continues with task. Distributed systems “Distributed System” covers a wide range of architectures from slightly more distributed than a centralized system to a truly distributed network of peers. One Extreme: Centralized Centralized: mainframe and dumb terminals All of the computation is done on the mainframe. Each line or keystroke is sent from the terminal to the mainframe. Moving Towards Distribution In a client-server system, the clients are workstations or computers in their own right and perform computations and formatting of the data. However, the data and the application which manipulates it ultimately resides on the server. More Decentralization In Distributed-with-Coordinator, the nodes or sites depend on a coordinator node with extra knowledge or processing abilities Coordinator might be used only in case of failures or other problems True Decentralization A true Distributed system has no distinguished node which acts as a coordinator and all nodes or sites are equals. The nodes may choose to elect one of their own to act as a temporary coordinator or leader Distributed Systems: Pro and Con Some things that were difficult in a centralized system become easier – Doing tasks faster by doing them in parallel – Avoiding a single point of failure (all eggs in one basket) – Geographical distribution Some things become more difficult – Transaction commit – Snapshots, time and causality – Agreement (consensus) Advantages of the True Distributed System • No central server or coordinator means it is scalable • SDDS, Scalable Distributed Data Structures, attempt to move distributed systems from a small number of nodes to thousands of nodes • We need scalable algorithms to operate on these networks/structures – For example peer-to-peer networks Transparency in a Distributed System Transparency Description Access Hide differences in data representation and how a resource is accessed Location Hide where a resource is located Migration Hide that a resource may move to another location Relocation Hide that a resource may be moved to another location while in use Replication Hide that copies of a resource exist and a user might use different ones at different times Concurrency Hide that a resource may be shared by several competitive users Failure Hide the failure and recovery of a resource Persistence Hide whether a (software) resource is in memory or on disk Important: location, migration (relocation), replication, concurrency, failure. Scalability • Something is scalable if it “increases linearly with size” where size is usually number of nodes or distance. • “X is scalable with the number of nodes” • Every site (node) is directly connected to every other site through a communication channel. Number of channels is NOT scalable. For N sites there are N! channels. • Sites connected in a ring. # of channels IS scalable. (N channels for N sites) Scalability Problems Concept Example Centralized services A single server for all users Centralized data A single on-line telephone book Centralized algorithms Doing routing based on complete information Examples of scalability limitations. Scaling Techniques (1) 1.4 The difference between letting: a) a server or b) a client check forms as they are being filled Scaling Techniques (2) 1.5 An example of dividing the DNS name space into zones. Characteristics of Scalable Distributed Algorithms • No machine (node, site) has complete information about the system state. • Sites make decisions based only on local information. • Failure of one site does not ruin the algorithm. • There is no implicit assumption that a global clock exists. Homogeneous and tightly coupled vs heterogeneous and loosely coupled We will study heterogeneous and loosely coupled systems. Multiprocessors (1) 1.7 A bus-based multiprocessor. Multiprocessors (2) 1.8 a) A crossbar switch b) An omega switching network Homogeneous Multicomputer Systems 1-9 a) (a) Grid b) (b) Hypercube: 2N nodes at degree N Software Concepts System Description Main Goal DOS Tightly-coupled operating system for multiprocessors and homogeneous multicomputers Hide and manage hardware resources NOS Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN) Offer local services to remote clients Middleware Additional layer atop of NOS implementing general-purpose services Provide distribution transparency • DOS (Distributed Operating Systems) • NOS (Network Operating Systems) • Middleware Uniprocessor Operating Systems 1.11 Separating applications from operating system code through a microkernel. Distributed Operating Systems 1.14 May share memory or other resources. Network Operating System 1-19 General structure of a network operating system. Middleware based Distributed System 1-22 General structure of a distributed system as middleware. Middleware and Openness 1.23 In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as well as the interfaces they offer to applications. Comparison between Systems Item Distributed OS Network OS Middlewarebased OS Multiproc. Multicomp. Very High High Low High Yes Yes No No Number of copies of OS 1 N N N Basis for communication Shared memory Messages Files Model specific Resource management Global, central Global, distributed Per node Per node Scalability No Moderately Yes Varies Openness Closed Closed Open Open Degree of transparency Same OS on all nodes A comparison between multiprocessor operating systems, multicomputer operating systems, network operating systems, and middleware based distributed systems. Modern Architectures 1-31 An example of horizontal distribution of a Web service. Two meanings of synchronous and asynchronous communications • Synchronous communications is where a process blocks after sending a message to wait for the answer or before receiving. • Sync and async have come to describe the communications channels with which they are used. • Synchronous: message transit time is short and bounded. If site does not respond in x sec, site can be declared dead. Simplifies algorithms! • Asynchronous: message transit time is unbounded. If a message is not received in a given time interval, it could just be slow. What makes Distributed Systems Difficult? • Asynchrony – even “synchronous” systems have time lag. • Limited local knowledge – algorithms can consider only information acquired locally. • Failures – parts of the distributed system can fail independently leaving some nodes operational and some not. Example: Byzantine Agreement Introduced as voting problem (Lamport, Shostak, Pease ’82) A and B can defeat enemy iff both attack A sends message to B: Attack at Noon! General A General B The Enemy Byzantine Agreement Impossible with unreliable networks Possible if some guarantees of reliability – Guaranteed delivery within bounded time – Limitations on corruption of messages – Probabilistic guarantees (send multiple messages)