Advanced
Operating Systems
Lecture 7: Concurrency
University of Tehran
Dept. of EE and Computer Engineering
By:
Dr. Nasser Yazdani
Univ. of Tehran
Distributed Operating Systems
1
How to Use a Shared Resource
- Some general problems and solutions
- References
Outline
- Introduction
- Motivation
- Implementing mutual exclusion
- Implementing restartable atomic sequences
- Kernel design considerations
- The performance of three software techniques
- Conclusions
Why Coordinate?
- Critical section:
  - Must execute atomically, without interruption.
  - Atomicity is usually required only w.r.t. other operations on the same data structures.
- What are the sources of interruption?
  - Hardware interrupts, UNIX signals.
  - Thread preemption.
  - Interleaving of multiple CPUs.
Spooling Example: Correct
Shared memory holds the spooler directory: slots 4 (abc), 5 (Prog.c), 6 (Prog.n); out = 4, in = 7. Each process has its own local variable int next_free.

1. Process 1: next_free = in;        /* reads 7 */
2. Process 1 stores F1 into slot next_free (7)
3. Process 1: in = next_free + 1;    /* in = 8 */
4. Process 2: next_free = in;        /* reads 8 */
5. Process 2 stores F2 into slot next_free (8)
6. Process 2: in = next_free + 1;    /* in = 9 */

Each file gets its own slot; no update is lost.
Spooling Example: Races
Same shared spooler directory (slots 4: abc, 5: Prog.c, 6: Prog.n; out = 4, in = 7), but now the steps of the two processes interleave:

1. Process 1: next_free = in;        /* reads 7 */
2. Process 2: next_free = in;        /* value: 7 */
3. Process 1 stores F1 into slot next_free (7)
4. Process 1: in = next_free + 1;    /* in = 8 */
5. Process 2 stores F2 into slot next_free (7), overwriting F1
6. Process 2: in = next_free + 1;    /* in = 8 */

F1 is lost: both processes wrote into the same slot.
Critical Section Problem
- N threads all compete to use the same shared data.
- This can lead to a race condition.
- Each thread has a code segment, called a critical section, in which the shared data is accessed.
- We need to ensure that when one thread is executing in its critical section, no other thread is allowed to execute in its critical section.
Critical Region (Critical Section)
Process {
  while (true) {
    ENTER CRITICAL SECTION
    Access shared variables;   // critical section
    LEAVE CRITICAL SECTION
    Do other work
  }
}
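The ENTER/LEAVE pattern maps directly onto a mutex. Here is a minimal Python sketch (the `worker` and `counter` names are illustrative, not from the lecture): four threads repeatedly enter the critical section to update a shared variable, and the lock guarantees no increment is lost.

```python
import threading

counter = 0                 # shared variable
lock = threading.Lock()     # guards the critical section

def worker(iterations):
    global counter
    for _ in range(iterations):
        lock.acquire()      # ENTER CRITICAL SECTION
        counter += 1        # access shared variable
        lock.release()      # LEAVE CRITICAL SECTION
        # ... do other work here ...

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: every increment happened atomically
```

Using `with lock:` is the idiomatic equivalent of the acquire/release pair and also releases the lock if the critical section raises an exception.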
Critical Region Requirements
- Mutual Exclusion:
  - At most one process may execute within the critical section at a time.
- Progress:
  - If no process is executing in its critical section and several processes are trying to get into their critical sections, then entry to the critical section cannot be postponed indefinitely.
  - No process running outside its critical region may block other processes.
- Bounded Wait:
  - A process requesting entry to a critical section should only have to wait for a bounded number of other processes to enter and leave the critical section.
  - No process should have to wait forever to enter its critical region.
- Speed and Number of CPUs:
  - No assumptions may be made about the relative speeds or the number of CPUs.
Critical Regions (2)
Mutual exclusion using critical regions
Synchronization Approaches
- Disabling interrupts
- Lock variables
- Strict alternation
- Peterson's solution
- TSL
- Sleep and wakeup
- Message passing
Disabling Interrupts
- How does it work?
  - Disable all interrupts just after entering a critical section and re-enable them just before leaving it.
- Why does it work?
  - With interrupts disabled, no clock interrupts can occur. (The CPU is only switched from one process to another as a result of clock or other interrupts, and with interrupts disabled, no switching can occur.)
- Problems:
  - What if the process forgets to re-enable the interrupts?
  - Multiprocessors? (Disabling interrupts only affects one CPU.)
  - Only usable inside the OS.
Lock Variables
int lock = 0;

while (lock) { };          /* busy wait until lock becomes free */
lock = 1;                  /* EnterCriticalSection */
access shared variables;
lock = 0;                  /* LeaveCriticalSection */

Does the above code work?
Strict Alternation
Thread Me;  /* for two threads */
{
  while (true)
  { while (turn != my_thread_id) { };
    Access shared variables;   // critical section
    turn = other_thread_id;
    Do other work
  }
}
Satisfies mutual exclusion but not progress. Why?
- Notes:
  - while (turn != my_thread_id) { };  /* busy waiting */
  - A lock (the turn variable) that uses busy waiting is called a spin lock.
Using Flags
int flag[2] = {false, false};
Thread Me;
{
  while (true)
  { flag[my_thread_id] = true;
    while (flag[other_thread_id]) { };
    Access shared variables;   // critical section
    flag[my_thread_id] = false;
    Do other work
  }
}
Can block indefinitely. Why? ("You go ahead!")
Test & Set (TSL)
- Requires hardware support
- Does the test and the set atomically

char Test_and_Set(char* target)
// all done atomically
{
  char temp = *target;
  *target = true;
  return temp;
}
Problems with TSL
- Operates at memory-bus speeds, not CPU speeds.
  - Much slower than a cached load or store.
- Prevents other use of the memory system.
  - Interferes with other CPUs and DMA.
- Silly to spin in TSL on a uniprocessor.
  - Add a thread_yield() after every TSL.
Other Similar Hardware Instructions
- Swap is equivalent to TSL

void Swap(char* x, char* y)
// all done atomically
{
  char temp = *x;
  *x = *y;
  *y = temp;
}
Peterson's Solution
int flag[2] = {false, false};
int turn;
Thread Me;
{
  while (true)
  { flag[my_thread_id] = true;
    turn = other_thread_id;
    while (flag[other_thread_id]
           && turn == other_thread_id) { };
    Access shared variables;   // critical section
    flag[my_thread_id] = false;
    Do other work
  }
}
It works!!! Why?
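The pseudocode above can be run almost verbatim in Python. The sketch below (thread function name `me` and the iteration count are illustrative) relies on CPython's global interpreter lock making the interleaving of these simple operations sequentially consistent; on real hardware with a relaxed memory model, Peterson's algorithm additionally needs memory fences.

```python
import sys
import threading

sys.setswitchinterval(1e-4)   # switch threads often to exercise the protocol

flag = [False, False]   # flag[i]: thread i wants to enter
turn = 0                # who yields when both want in
counter = 0             # shared variable protected by the protocol
N = 2000

def me(my_id):
    global turn, counter
    other = 1 - my_id
    for _ in range(N):
        flag[my_id] = True      # announce interest
        turn = other            # politely let the other go first
        while flag[other] and turn == other:
            pass                # busy wait (spin)
        counter += 1            # critical section
        flag[my_id] = False     # leave the critical section

threads = [threading.Thread(target=me, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000: no increment was lost
```

Setting `turn = other` *after* raising one's own flag is what breaks the "you go ahead!" deadlock of the flags-only attempt: at most one thread can lose the tie.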
Sleep and Wakeup
- Problems with the previous solutions:
  - Busy waiting
  - Wastes CPU
  - Priority inversion:
    - A high-priority thread waits for a low-priority thread to leave the critical section.
    - The low-priority thread can never execute, since the high-priority thread is never blocked.
- Solution: sleep and wakeup
  - When blocked, go to sleep.
  - Wake up when it is OK to retry entering the critical section.
  - Semaphore operations are built on sleep and wakeup.
Semaphores
- A semaphore count represents a count of abstract resources.
- A new variable type with two operations:
  - The Down (P) operation acquires a resource and decrements count.
  - The Up (V) operation releases a resource and increments count.
  - Every semaphore operation is indivisible (atomic).
- Semaphores solve the problem of the wakeup bit.
What's Up? What's Down?
- Definitions of P and V:

Down(S) {
  while (S <= 0) { };   // no-op (busy wait)
  S = S - 1;
}

Up(S) {
  S++;
}

- Counting semaphores: 0..N
- Binary semaphores: 0, 1
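Counting semaphores are available directly in Python's standard library. A small sketch (the `use_resource` function and the count of 2 are illustrative): five threads compete for two abstract resources, with `acquire` playing the role of Down and `release` the role of Up. Note that library semaphores put a blocked thread to sleep rather than busy waiting as in the definition above.

```python
import threading

S = threading.Semaphore(2)   # two abstract resources available

acquired = []                # record which threads got a turn

def use_resource(i):
    S.acquire()              # Down(S): take a resource (blocks if count is 0)
    acquired.append(i)       # ... use the resource ...
    S.release()              # Up(S): give the resource back

threads = [threading.Thread(target=use_resource, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(acquired))  # [0, 1, 2, 3, 4]: everyone eventually got a resource
```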
Possible Deadlocks with Semaphores
Example: P0 and P1 share two semaphores S and Q, with S = 1 and Q = 1:

  P0               P1
  Down(S); // S=0  Down(Q); // Q=0
  Down(Q); // Q=-1 Down(S); // S=-1
  // P0 blocked    // P1 blocked
  Up(S);           Up(Q);   // never reached
  Up(Q);           Up(S);

DEADLOCK: each process holds the semaphore the other is waiting for.
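The deadlock comes from the two processes acquiring S and Q in opposite orders. One standard remedy is to impose a single global acquisition order; a Python sketch (the `p0`/`p1` names mirror the slide, the log list is illustrative) where both threads take S before Q, so the circular wait can never form:

```python
import threading

S = threading.Lock()   # binary semaphores, both initially 1
Q = threading.Lock()

log = []

def p0():
    with S:            # Down(S)
        with Q:        # Down(Q) -- same order as p1
            log.append("P0 in critical section")
    # Up(Q) then Up(S) happen automatically on exit

def p1():
    with S:            # acquire in the SAME global order: S before Q
        with Q:
            log.append("P1 in critical section")

t0 = threading.Thread(target=p0)
t1 = threading.Thread(target=p1)
t0.start(); t1.start()
t0.join(); t1.join()
print(log)   # both threads finish; no deadlock
```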
Monitor
- A simpler way to synchronize
- A set of programmer-defined operations

monitor monitor-name {
  // variable declarations
  public entry P1(..)
  { ... };
  ...
  public entry Pn(..)
  { ... };
  begin
    initialization code
  end
}
Monitor Properties
- The internal implementation of a monitor type cannot be accessed directly by the various threads.
- The encapsulation provided by the monitor type limits access to the local variables to the local procedures only.
- The monitor construct does not allow concurrent access to the procedures defined within the monitor.
- Only one thread/process can be active within the monitor at a time.
- Synchronization is built in.
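Python has no monitor keyword, but the discipline can be sketched as a class whose every public entry takes one shared lock, so only one thread is ever active "inside the monitor" (the `SharedCounter` class and method names are illustrative):

```python
import threading

class SharedCounter:
    """Monitor-style class: one lock guards every public entry,
    so at most one thread is active inside the monitor at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0          # local variable: reachable only via the methods

    def increment(self):
        with self._lock:         # implicit ENTER on the public entry
            self._value += 1     # body runs with mutual exclusion

    def get(self):
        with self._lock:
            return self._value

m = SharedCounter()

def hammer():
    for _ in range(1000):
        m.increment()

threads = [threading.Thread(target=hammer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(m.get())   # 4000: encapsulation + the per-monitor lock prevent races
```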
Cooperating Processes via Message Passing
- IPC is best provided by a "messaging system."
- Messaging systems and shared-memory systems are not mutually exclusive; they can be used simultaneously within a single OS or a single process.
- Two basic operations:
  - Send(destination, &message)
  - Receive(source, &message)
- Message size: fixed or variable.
- Real-life analogy: a conversation.
Message Passing
Direct Communication
- Binds the algorithm to process names.
- The sender explicitly names the receiver, or the receiver explicitly names the sender:
  - Send(P, message)
  - Receive(Q, message)
- A link is established automatically between every pair of processes that want to communicate.
- Processes must know each other's identity.
- One link per pair of processes.
Indirect Communication
- send(A, message)      /* send a message to mailbox A */
- receive(A, message)   /* receive a message from mailbox A */
- A mailbox is an abstract object into which messages can be placed and from which they can be removed.
- A mailbox is owned either by a process or by the system.
Mailbox Communication
- Ownership by a process:
  - The process that declares the mailbox is the owner.
  - Any other process that knows the name of the mailbox can use it.
  - When the process terminates, the mailbox disappears.
- Ownership by the system:
  - The mailbox is independent of any process.
  - The OS provides mechanisms that allow a user to:
    - Create a mailbox
    - Send and receive messages through the mailbox
    - Destroy a mailbox
Asynchronous Messaging
- With automatic buffering, the sender does not know whether the message was received.
- Use an ACKNOWLEDGEMENT mechanism:
- Process P:
  - Send(Q, message)
  - Receive(Q, message)
- Process Q:
  - Receive(P, message)
  - Send(P, "acknowledgement")
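The P/Q acknowledgement exchange can be sketched with one mailbox per process, using Python's thread-safe `queue.Queue` as the mailbox abstraction (the mailbox and function names are illustrative):

```python
import queue
import threading

# one mailbox per process (indirect communication)
mailbox_p = queue.Queue()
mailbox_q = queue.Queue()

def send(mailbox, message):
    mailbox.put(message)     # place the message in the destination mailbox

def receive(mailbox):
    return mailbox.get()     # blocks until a message arrives

def process_q():
    msg = receive(mailbox_q)             # Q: Receive(P, message)
    send(mailbox_p, "acknowledgement")   # Q: Send(P, "acknowledgement")

t = threading.Thread(target=process_q)
t.start()
send(mailbox_q, "hello")                 # P: Send(Q, message)
ack = receive(mailbox_p)                 # P: Receive(Q, message) -- wait for the ack
t.join()
print(ack)   # acknowledgement
```

The blocking `receive` is what gives P certainty: it cannot proceed until Q's acknowledgement has actually arrived.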
Message Problems (1)
- When using messaging systems, failures can occur.
- We need error recovery: exception-condition handling.
- Process termination:
  - P (sender) terminates and Q (receiver) waits (blocks) forever.
    - Solutions: the system terminates Q; the system notifies Q that P terminated; Q has an internal mechanism (a timer) for how long to wait for a message from P.
  - P (sender) sends a message and Q (receiver) terminates: with automatic buffering, P sends messages until the buffer is full, or forever; with no buffering, P blocks forever.
Message Problems (2)
- Lost messages:
  - The OS guarantees retransmission.
  - The sender is responsible for detecting lost messages using timeouts.
  - The sender gets an exception.
- Scrambled messages:
  - A message arrives from sender P at receiver Q, but it is scrambled or corrupted due to noise in the communication channel.
  - Solution: an error-detection mechanism such as a CHECKSUM to detect errors, plus an error-correction mechanism to fix them (e.g., retransmission).
Fast Mutual Exclusion for Uniprocessors
- Describes restartable atomic sequences: an optimistic mechanism for implementing atomic operations on a uniprocessor.
  - Assumes that short atomic sequences are rarely interrupted.
  - Relies on a recovery mechanism.
  - Yields performance improvements.
Motivation for Efficient Mutual Exclusion
- Modern applications use multiple threads:
  - As a program-structuring device
  - As a mechanism for portability to multiprocessors
  - As a way to manage I/O and server concurrency
- Many OSs are built on top of a microkernel:
  - Many services are implemented as multithreaded user-level applications.
  - Even single-threaded programs rely on basic OS services that are implemented outside the kernel.
Implementing Mutual Exclusion on a Uniprocessor
- Pessimistic methods:
  - Memory-interlocked instructions
  - Software reservation
  - Kernel emulation
- Restartable atomic sequences
Memory-Interlocked Instructions
- Implicitly delay interrupts until the instruction completes.
- Require special hardware support from the processor and bus.
- The cycle time for an interlocked access is several times greater than that for a non-interlocked access.
Software Reservation
- Explicitly guards against arbitrary interleaving.
- A thread must register its intent to perform an atomic operation, and then wait.
- Examples:
  - Dekker's algorithm
  - Lamport's algorithm
  - Peterson's algorithm
Kernel Emulation
- A strictly uniprocessor solution.
- Explicitly disables interrupts during operations that must execute atomically.
- Although it requires no special hardware, its runtime cost is high:
  - The kernel must be invoked on every synchronization operation.
Restartable Atomic Sequences
- Instead of using a mechanism that guards against interrupts, we can recognize when an interrupt occurs and recover.
- The recovery process: "restart the sequence."
- Attractive because they:
  - Do not require hardware support.
  - Have a short code path, with one load and one store per atomic read-modify-write.
  - Do not involve the kernel on every atomic operation.
Implementing Restartable Atomic Sequences
- Requires kernel support to ensure that a suspended thread is resumed at the beginning of the sequence.
- Kernel implementation strategies:
  - Explicit registration in Mach
  - Designated sequences in Taos
Explicit Registration in Mach
- The kernel keeps track of each address space's restartable atomic sequence.
- An application registers the starting address and length of the sequence with the kernel.
- In response to a failure:
  - Replace the restartable atomic sequence with a conventional mechanism's code.
Costs of Explicit Registration
- Cost of subroutine linkage:
  - Because the kernel identifies restartable atomic sequences by a single PC range per address space, they cannot be inlined.
- Cost of checking the return PC:
  - The kernel must check the return PC whenever a thread is suspended.
  - The common-case savings must make this additional scheduling overhead worthwhile.
Designated Sequences in Taos
- The kernel must recognize every interrupted sequence.
- Uses a two-stage check to recognize atomic sequences:
  - 1st stage: rejects most interrupted code sequences that are not restartable. (The opcode of the suspended instruction is used as an index into a hash table containing the instructions eligible to appear in a restartable atomic sequence.)
  - 2nd stage: uses another table (indexed by opcode).
Costs of Designated Sequences
- Cost of the two-stage check on every thread switch.
Kernel Design Considerations
- Placement of the PC check
- Mutual exclusion in the kernel
Placement of the PC Check
- When should the kernel check/adjust the PC of a suspended thread?
  - When it is first suspended.
  - When it is about to be resumed.
- Detection at user level:
  - Whenever a suspended thread is resumed by the kernel, it returns to a fixed user-level sequence.
  - That sequence determines whether the thread was suspended within a restartable atomic sequence.
  - (Adds complexity and overhead: the return address must be saved on the user-level stack at each suspension.)
Mutual Exclusion in the Kernel
- The kernel is itself a client of the thread-management facilities.
- Two events can trigger a thread switch:
  - Page fault
  - Thread preemption
- Careless ordering of the PC check could lead to mutual recursion between the thread scheduler and the virtual memory system.
Software Techniques for Mutual Exclusion
- Restartable atomic sequences (R.A.S.) vs. kernel emulation vs. software reservation.
- Performance is discussed at three levels:
  - Basic overhead of the various mechanisms.
  - Effect on the performance of common thread-management operations.
  - Effect of mutual-exclusion overhead on the performance of several applications.
Microbenchmarks
- Performance is measured with a test that enters a critical section (e.g., via TSL) in a loop, one million times.
- Two versions of Lamport's algorithm (fast and meta).
Thread Management Overhead
- Different thread-management packages.
- Two threads alternately using a mutex and a condition variable.
Application Performance
- afs-bench: file-system intensive, like cp
- parthenon-n: theorem prover with n threads
- procon-64: producer-consumer
- Thread suspensions: for R.A.S., the number of times the PC check is performed
Conclusions
- R.A.S. represent a "common case" approach to mutual exclusion on a uniprocessor.
- R.A.S. are appropriate for uniprocessors that do not support memory-interlocked atomic instructions.
- Even on processors that do have hardware support for synchronization, better performance may be possible.
Next Lecture
- Distributed systems
- References:
  - Read the first chapter of the book.