* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download dist-prog2
Abstraction (computer science) wikipedia , lookup
Scheduling (computing) wikipedia , lookup
Thread (computing) wikipedia , lookup
Reactive programming wikipedia , lookup
Supercomputer wikipedia , lookup
Data-intensive computing wikipedia , lookup
Stream processing wikipedia , lookup
Process management (computing) wikipedia , lookup
Flow-based programming wikipedia , lookup
Parallel computing wikipedia , lookup
Distributed computing wikipedia , lookup
Client/Server Distributed Systems 240-322, Semester 1, 2005-2006 2. Distributed Programming Concepts Objectives – explain the general meaning of distributed programming beyond client/server – look at the history of distributed programming 240-322 Cli/Serv.: Dist. Prog./2 1 Overview 1. Definition 2. From Parallel to Distributed 3. Forms of Communication 4. Data Distribution 5. Algorithmic Distribution 6. Granularity 7. Load Balancing 8. Brief History of Distributed Programming 240-322 Cli/Serv.: Dist. Prog./2 2 1. Definition Distributed programming is the spreading of a computational task across several programs, processes, or processors. Includes parallel (concurrent) and networked programming. Definition 240-322 Cli/Serv.: Dist. Prog./2 is a bit vague. 3 2. From Parallel to Distributed Most parallel languages talk about processes: – these can be on different processors or on different computers The implementor may choose to add language features to explicitly say where a process should run. May also choose to address network issues (bandwidth, failure, etc.) at the language level. 240-322 Cli/Serv.: Dist. Prog./2 continued 4 Often resources required by programs are distributed, which means that the programs must be distributed. 240-322 Cli/Serv.: Dist. Prog./2 continued 5 240-322 Cli/Serv.: Dist. Prog./2 continued 6 Network Transparency Most users want networks to be as transparent (invisible) as possible: – users do not want to care which machine is used to store their files – they do not want to know where a process is running 240-322 Cli/Serv.: Dist. Prog./2 7 3. Forms of Communication 1-to-1 communication processes 1-to-many 240-322 Cli/Serv.: Dist. Prog./2 communication These can be supported on top of shared memory or distributed memory platforms. continued 8 many-to-1 communication many-to-many 240-322 Cli/Serv.: Dist. Prog./2 communication 9 4. Data Distribution Divide input data between identical separate processes. Examples: – database search – edge detection in an image – builders making a room with bricks 240-322 Cli/Serv.: Dist. Prog./2 10 Boss-Workers send part of database send answer workers (all database search engines) boss 240-322 Cli/Serv.: Dist. Prog./2 11 workers often need to talk to one another workers (all builders) send bricks done boss 240-322 Cli/Serv.: Dist. Prog./2 talking 12 Boss - Eager Workers workers (all builders) ask for bricks send bricks boss 240-322 Cli/Serv.: Dist. Prog./2 talking 13 Things to Note The code is duplicated in every process. The maximum no. of processes depends on the size of the task and difficulty of dividing data. Talking can be very hard to code. Talking is usually called communication, synchronisation or cooperation 240-322 Cli/Serv.: Dist. Prog./2 continued 14 Communication is almost always implemented using message passing. How are processes assigned to processors? 240-322 Cli/Serv.: Dist. Prog./2 15 5. Algorithmic Distribution Divide algorithm into parallel parts / processes – e.g. UNIX pipes dirty plates on table collector plates in cupboard 240-322 Cli/Serv.: Dist. Prog./2 dirty plates washer Stacker wipe dry plates clean wet plates Drier 16 Things to Note Talking is simple: pass data to next process which ‘wakes up’ that process. Talking becomes harder to code if there are loops. How 240-322 Cli/Serv.: Dist. Prog./2 to assign processes to processors? 17 Several Workers per Sub-task Use both algorithmic and data distribution. Problems: how to divide data? how to combine data? collector Drier washer collector Drier Stacker 240-322 Cli/Serv.: Dist. Prog./2 Drier 18 Parallelise Separate Sub-tasks Build a house: plumbing brick laying paint electrical wiring b | (pl & e) | pt 240-322 Cli/Serv.: Dist. Prog./2 19 6. Granularity Amount of data handled by a process: Course grained: lots of data per process – e,g, UNIX processes Fine grained: small amounts of data per process – e.g. UNIX threads, Java threads 240-322 Cli/Serv.: Dist. Prog./2 20 7. Load Balancing How to assign processes to processors? Want to ‘even out’ work so that each processor does about the same amount of work. But: – different processors have different capabilities – must consider cost of moving a process to a processor (e.g. network speed, load) 240-322 Cli/Serv.: Dist. Prog./2 21 8. Brief History of (UNIX) Distributed Programming 1970’s: UNIX was a multi-user, time-sharing OS – &, pipes – interprocess communication (IPC) on a single processor mid 1980’s: System V UNIX – added extra IPC mechanisms: shared memory, messages, queues, etc. 240-322 Cli/Serv.: Dist. Prog./2 continued 22 late 1970's to mid 1980’s: ARPA – US Advanced Research Projects Agency – funded research that produced TCP/IP, sockets – added to BSD Unix 4.2 mid-late 1980’s: utilities developed – telnet, ftp – r* utilities: rlogin, rcp, rsh – client-server model based on sockets 240-322 Cli/Serv.: Dist. Prog./2 continued 23 1986: System V UNIX – released TL1, a set of socket-based libraries that support OSI – not widely used late 1980’s: Sun Microsystems – NFS (Network File System) – RPC (Remote Procedure Call) – NIS (Network Information Services) 240-322 Cli/Serv.: Dist. Prog./2 continued 24 early 1990’s – POSIX threads (light-weight processes) – Web client-server model based on TCP/IP mid 1990's: Java – Java threads – Java Remote Method Invocation (RMI) – CORBA 240-322 Cli/Serv.: Dist. Prog./2 continued 25 late 1990's / early 2000's – J2EE, .NET – peer-to-peer (P2P) Napster, Gnutella, etc. – JXTA 240-322 Cli/Serv.: Dist. Prog./2 26