CS 519, Lecture 2: Processes, Threads, and Synchronization (Operating System Theory)

Assignment 1
- Design and implement a threads package (details left out on purpose!)
- Check the Web page and the newsgroup for further details
- Groups of 3 students. Anybody still looking for a group?
- Due 2/12/2001 at 5pm

Introduction: The von Neumann Model
- Both text (program) and data reside in memory
- Execution cycle: fetch instruction, decode instruction, execute instruction
- [Figure: CPU connected to memory.]
- The OS is just a program: OS text and data reside in memory too, and OS functionality is invoked through procedure calls

Execution Mode
- Most processors support at least two modes of execution, for protection reasons: privileged (kernel-mode) and non-privileged (user-mode)
- The portion of the OS that executes in kernel-mode is called the kernel
  - Accesses hardware resources
  - Protected from interference by user programs
  - Code running in kernel-mode can do anything: no protection
- User code executes in user-mode
- OS functionality that does not need direct access to hardware may run in user-mode

Interrupts and Traps
- Interrupt: an asynchronous event
  - External events which occur independently of instruction execution in the processor, e.g. DMA completion
  - Can be masked (specifically or not)
- Trap: a synchronous event
  - Conditionally or unconditionally caused by the execution of the current instruction, e.g. a floating-point error or a trap instruction
- Interrupt and trap events are predefined (typically as integers)
- Each interrupt and trap has an associated interrupt vector
  - The interrupt vector specifies a handler that should be called when the event occurs
- Interrupts and traps force the processor to save the current state of execution and jump to the handler

Process
- An "instantiation" of a program
- System abstraction: the set of resources required for executing a program
  - Execution context(s)
  - Address space
  - File handles, communication endpoints, etc.
- Historically, all of the above were "lumped" into a single abstraction
- More recently, split into several abstractions: threads, address space, protection domain, etc.
- OS process management:
  - Supports creation of processes and interprocess communication (IPC)
  - Allocates resources to processes according to specific policies
  - Interleaves the execution of multiple processes to increase system utilization

Process Image
- The physical representation of a process in the OS
- A process control structure includes:
  - Identification info: process, parent process, user
  - Control info: scheduling (state, priority), resources (memory, opened files), IPC
  - Execution contexts: threads
  - An address space consisting of code, data, and stack segments

Process Address Space
[Figure: a process address space holding OS, code, globals, heap, and stack; the stack contains activation frames for nested calls, e.g. A() calls B(), which calls C().]

Virtual Memory
- Process addresses are virtual memory addresses
- Mapping from virtual memory to physical memory
- Translation: Translation Look-aside Buffer (TLB), page table
- Details next lecture

User Mode
- When running in user-mode, a process can only access its virtual memory and processor resources (registers) directly
- All other resources can only be accessed through the kernel, by "calling the system"
- System call
  - A system call is a call because it looks like a procedure call
  - It's a system call because, in reality, it's a software trap
- Why is a system call a trap instead of a procedure call?
- Because we need to switch to kernel mode

Mode Switching
- One or several bits in the processor status word (PSW) indicate the current mode of execution; sometimes also the previous mode
- (Other bits in the PSW: interrupt enabled/disabled flags, alignment check bit, arithmetic overflow bit, parity bit, carry bit, etc.)
- Mode bits can be changed explicitly while in kernel-mode, but not in user-mode
- The processor status word can be loaded from the interrupt vector along with the address of the interrupt handler
- Setting the kernel-mode bit in the interrupt vector allows interrupt and trap handlers to be "automatically" executed in kernel mode

System Call in a Monolithic OS
[Figure: read(...) in user mode executes a trap instruction; the interrupt vector for the trap loads the PC and PSW, transferring control in kernel mode to the in-kernel file system's code for the read system call; iret returns to user mode.]

Signals
- The OS may need to "upcall" into user processes
- Signals: the UNIX mechanism to upcall when an event of interest occurs
- Potentially interesting events are predefined: e.g., segmentation violation, message arrival, kill, etc.
- When interested in "handling" a particular event (signal), a process indicates its interest to the OS and gives the OS a procedure that should be invoked in the upcall

Signals (Cont'd)
- When an event of interest occurs, the kernel handles the event first, then modifies the process's stack to look as if the process's code made a procedure call to the signal handler
- When the user process is scheduled next, it executes the handler first
- From the handler, the user process returns to where it was when the event occurred
[Figure: the stack before and after signal delivery; the handler's frame is pushed on top of the interrupted frames A and B.]

Process Creation
- How do we create a process? By system call. In UNIX, a process can create another process using the fork() system call: int pid = fork()
- The creating process is called the parent and the new process is called the child
- The child process is created as a copy of the parent process (process image and process control structure), except for the identification and scheduling state
- Parent and child processes run in two different address spaces: by default, no memory sharing
- Process creation is expensive because of this copying

Example of Process Creation Using fork()
- The UNIX shell is a command-line interpreter whose basic purpose is to let the user run applications on a UNIX system:
    cmd arg1 arg2 ... argn

Inter-Process Communication
- Most operating systems provide several abstractions for inter-process communication: message passing, shared memory, etc.
- Communication requires synchronization between processes (i.e., data must be produced before it is consumed)
- Synchronization can be implicit (message passing) or explicit (shared memory)
- Explicit synchronization can be provided by the OS (semaphores, monitors, etc.) or can be achieved exclusively in user-mode (if processes share memory)
- More on shared memory and message passing later; synchronization after we talk about threads

Threads
- Why limit ourselves to a single execution context? For example, we then have to use select() to deal with multiple outstanding events
- Having multiple execution contexts is more natural
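The fork()-based process creation used by the shell example can be sketched as follows. run_command is a hypothetical helper name, not part of any standard API: the parent forks, the child replaces its copied image with the command via execvp, and the parent waits for the child's exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One iteration of a shell loop: fork a child, exec "cmd arg1 ... argn"
 * (argv is NULL-terminated), and wait.  Returns the child's exit status,
 * or -1 on error. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {                 /* child: becomes the command */
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    int status;                     /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that everything the child inherits (the whole process image) is copied at fork time, only to be thrown away by execvp, which is the copying cost the slide mentions.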
- Multiprocessor systems are another motivation
- Multiple execution contexts => threads
- All the threads of a process share the same address space and the same resources
- Each thread contains:
  - An execution state: running, ready, etc.
  - An execution context: PC, SP, other registers
  - A per-thread stack

Process Address Space Revisited
[Figure: (a) a single-threaded address space with OS, code, globals, heap, and one stack; (b) a multi-threaded address space with one stack per thread.]

Threads vs. Processes
- Why multiple threads? Can't we use multiple processes to do whatever it is that we do with multiple threads?
- Sure, but we would need to be able to share memory (and other resources) between multiple processes
- This sharing is directly supported by threads (see later in the lecture)
- Operations on threads (creation, termination, scheduling, etc.) are cheaper than the corresponding operations on processes
  - This is because thread operations do not involve manipulating the other resources associated with processes
- Inter-thread communication is supported through shared memory without kernel intervention
- Why not? We already have multiple instances of other resources, so why not multiple execution contexts?

Process State Diagram
[Figure: process states, including ready (in memory) and suspended (swapped out), with swap-in and swap-out transitions.]

Thread State Diagram
[Figure: thread states ready, running, and blocked; dispatch moves ready to running and timeout moves running back to ready (thread scheduling); waiting for an event moves running to blocked and event occurrence moves blocked to ready; suspend/activate transitions to and from suspended (swapped out) belong to process scheduling.]

Thread Switching
- Typically referred to as context switching
- Context switching is the act of taking a thread off the processor and replacing it with another one that is waiting to run
- A context switch takes place when:
  - The time quota allocated to the executing thread expires
  - The executing thread performs a blocking system call
  - A memory fault (e.g. a page miss) occurs

Context Switching
- How to do a context switch?
- Very carefully!!
- Save the state of the currently executing thread
  - Copy all "live" registers to the thread control block
  - Need at least one scratch register, which points to the area of memory in the thread control block that the registers should be saved to
- Restore the state of the thread to run next
  - Copy the values of the live registers from the thread control block back into the registers
- How do we get into and out of the context switching code?

Context Switching (Cont'd)
- The OS is just code stored in memory, so... call a context switching subroutine
- The subroutine saves the context of the current thread, restores the context of the next thread to be executed, and returns
- The subroutine returns in the context of another (the next) thread!
- Eventually, the system will switch back to the current thread
- To the thread, it appears as if the context switching subroutine just took a long while to return

Thread Switching (Cont'd)
- What if we are switching to a thread of a different process? Then this is a context switch to that process
- Implications for caches, TLB, page table, etc.:
  - Caches: with physical addresses, no problem; with virtual addresses, the cache must either carry a process tag or be flushed on context switch
  - TLB: each entry must carry a process tag, or the TLB must be flushed on context switch
  - Page table: there is typically a page table pointer (register) that must be reloaded on context switch

Process Swapping
- May want to swap out an entire process
  - Thrashing occurs if too many processes compete for resources
- To swap out a process:
  - Suspend all of its threads; must keep track of whether each thread was blocked or ready
  - Copy all of its information to backing store (except for the PCB)
- To swap a process back in:
  - Copy the needed information back into memory, e.g. the page table and thread control blocks
  - Restore each thread to blocked or ready; must check whether the awaited event(s) have already occurred

Threads & Signals (Usual UNIX Implementation)
- A signal is directed to a single thread (we don't want a signal delivered multiple times), but the handler is associated with the process
- Signals can be ignored/masked by threads
- Common strategy: one thread is responsible for fielding all signals (it calls sigwait() in an infinite loop); all other threads mask all signals
- What happens if the kernel wants to signal a process when all of its threads are blocked? And when there are multiple threads, which thread should the kernel deliver the signal to?
  - The OS writes into the PCB that a signal should be delivered to the process
  - If one thread is responsible for all signals, it is scheduled to run and fields the signal
  - Otherwise, the next time any thread that does not mask the signal is allowed to run, the signal is delivered to that thread as part of the context switch

POSIX Threads (Pthreads) API
- Thread creation and termination: pthread_create(&tid, NULL, start_fn, arg); pthread_exit(status);
- Thread join: pthread_join(tid, &status);
- Mutual exclusion: pthread_mutex_lock(&lock); pthread_mutex_unlock(&lock);
- Condition variables: pthread_cond_wait(&c, &lock); pthread_cond_signal(&c);

Thread Implementation
- Kernel-level threads (lightweight processes)
  - The kernel sees multiple execution contexts
  - Thread management is done by the kernel
- User-level threads
  - Implemented as a thread library, which contains the code for thread creation, termination, scheduling, and switching
  - The kernel sees one execution context and is unaware of thread activity
  - Can be preemptive or not

User-Level vs. Kernel-Level Threads
- Advantages of user-level threads
  - Performance: low-cost thread operations and context switching, since it is not necessary to go into the kernel
  - Flexibility: scheduling can be application-specific
  - Portability: a user-level thread library is easy to port
- Disadvantages of user-level threads
  - If a user-level thread blocks in the kernel, the entire process (all threads of that process) is blocked
  - Cannot take advantage of multiprocessing (the kernel assigns one process to only one processor)

User-Level vs. Kernel-Level Threads (Cont'd)
[Figure: with user-level threads, thread scheduling happens in user space and the kernel's process scheduling assigns processes to the processor; with kernel-level threads, the kernel schedules the threads themselves.]
- There is no reason why we shouldn't have both (e.g. Solaris)
- Most systems now support kernel threads
- User-level threads are available as linkable libraries

Kernel Support for User-Level Threads
- Even kernel threads are not quite the right abstraction for supporting user-level threads
- There is a mismatch between where the scheduling information is available (user level) and where scheduling on real processors is performed (the kernel)
- When a kernel thread is blocked, the corresponding physical processor is lost to all user-level threads, although some of them may be ready to run
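A minimal sketch tying together the Pthreads calls listed earlier: pthread_create and pthread_join, plus a mutex protecting a shared counter. The names demo_t, worker, and run_counter_demo are ours, chosen for illustration.

```c
#include <pthread.h>

typedef struct {
    long counter;            /* shared data */
    pthread_mutex_t lock;    /* protects counter */
    int iters;
} demo_t;

/* start_fn passed to pthread_create */
static void *worker(void *arg) {
    demo_t *d = arg;
    for (int i = 0; i < d->iters; i++) {
        pthread_mutex_lock(&d->lock);
        d->counter++;        /* critical section */
        pthread_mutex_unlock(&d->lock);
    }
    return NULL;
}

/* Create nthreads threads, join them all, return the final count. */
long run_counter_demo(int nthreads, int iters) {
    demo_t d = { 0, PTHREAD_MUTEX_INITIALIZER, iters };
    pthread_t tids[nthreads];
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, &d);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return d.counter;
}
```

Without the lock/unlock pair around counter++, increments can be lost to interleaving, which is exactly the data-integrity problem the synchronization part of this lecture addresses.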
Why Kernel Threads Are Not the Right Abstraction
[Figure: user-level threads multiplexed by user-level scheduling onto kernel threads; when a kernel thread blocks, the physical processor beneath it is lost to the user-level scheduler.]

Scheduler Activations
- The kernel allocates processors to processes in the form of scheduler activations
- Each process contains a user-level thread system, which controls the scheduling of the allocated processors
- The kernel notifies a process whenever the number of allocated processors changes, or when a scheduler activation blocks due to the user-level thread running on it
- The process notifies the kernel when it needs more or fewer scheduler activations (processors)

User-Level Threads on Top of Scheduler Activations
[Figure: user-level threads scheduled by user-level scheduling onto scheduler activations (some active, some blocked), which kernel-level scheduling maps onto physical processors.]

Thread/Process Operation Latencies

    Operation     User-level threads (μs)   Kernel threads (μs)   Processes (μs)
    Null fork             34                       948                11,300
    Signal-wait           37                       441                 1,840

    VAX uniprocessor running a UNIX-like OS, 1992.

Synchronization
- Why synchronization?
- Problem: threads share data, and data integrity must be maintained
- Example: transfer $10 from account A to account B
    A ← A - 10
    B ← B + 10
- If another thread were taking a snapshot of the account balances, we don't want it to read A and B between the previous two statements

Terminology
- Critical section: a section of code which reads or writes shared data
- Race condition: the potential for interleaved execution of a critical section by multiple threads; the results are non-deterministic
- Mutual exclusion: a synchronization mechanism that avoids race conditions by ensuring exclusive execution of critical sections
- Deadlock: permanent blocking of threads
- Livelock: execution but no progress
- Starvation: one or more threads denied resources

Requirements for Mutual Exclusion
- No assumptions on hardware: speed, number of processors
- Execution of the CS takes a finite time
- A thread/process not in the CS cannot prevent other threads/processes from entering the CS
- Entering the CS cannot be delayed indefinitely: no deadlock or starvation

Synchronization Primitives
- Most common primitives: mutual exclusion, condition variables, semaphores, monitors
- Of these, it suffices to have semaphores, or mutual exclusion plus condition variables

Mutual Exclusion
- Mutual exclusion is used when a thread wants to be the only one modifying a set of data items
- Has three components: Lock, Unlock, Waiting
- Example: transferring $10 from A to B, with per-account locks or one transfer lock

    Lock(A)                    Function Transfer(Amount, A, B)
    Lock(B)                      Lock(Transfer_Lock)
    A ← A - 10                   A ← A - 10
    B ← B + 10                   B ← B + 10
    Unlock(B)                    Unlock(Transfer_Lock)
    Unlock(A)

Implementation of Mutual Exclusion
- There are many read/write algorithms for mutual exclusion (see any OS book)
  - Disadvantages: difficult to get correct, and not very general
- Simple with a little help from the hardware: atomic read-modify-write instructions
  - Test-and-set
  - Fetch-and-increment
  - Compare-and-swap

Atomic Read-Modify-Write Instructions
- Test-and-set
  Read a memory location and, if the value is 0, set it to 1 and return true; otherwise, return false
- Fetch-and-increment
  Return the current value of a memory location and increment the value in memory by 1
- Compare-and-swap
  Compare the value of a memory location with an old value and, if they are the same, replace it with a new value

Simple Spin Lock
Using test-and-set:

    Spin_lock(lock) {
        while (test-and-set(lock) == 0)
            ;    /* spin until test-and-set succeeds */
    }

    Spin_unlock(lock) {
        lock = 0;
    }

Load-Linked, Store-Conditional (LL/SC)
- This pair of instructions may be easier to implement in hardware
- Load-linked (or load-locked) returns the value of a memory location
- Store-conditional stores a new value to the same memory location only if the value of that location has not been changed since the LL; it returns 1 or 0 to indicate success or failure
- If a thread is switched out between an LL and an SC, then the SC automatically fails

What To Do While Waiting?
- Spinning
  - Waiting threads keep testing the location until it changes value
  - Doesn't work on a uniprocessor system
- Blocking
  - The OS or run-time system deschedules waiting threads
- Spinning vs. blocking becomes an issue in multiprocessor systems (with more than one thread per processor)
- What is the main trade-off? The overhead of blocking vs. the expected waiting time

Deadlock
[Figure, built over three slides: one thread holds lock A and requests lock B, while another holds lock B and requests lock A; neither can proceed.]

Deadlock (Cont'd)
- Deadlock can occur whenever multiple parties are competing for exclusive access to multiple resources
- How can we deal with deadlocks?
- Deadlock prevention
  - Design a system without one of mutual exclusion, hold and wait, no preemption, or circular wait (the four necessary conditions)
  - To prevent circular wait, impose a strict ordering on resources. For instance, if you need to lock variables A and B, always lock A first, then lock B
- Deadlock avoidance
  - Deny requests that may lead to unsafe states (Banker's algorithm)
  - Running the algorithm on all resource requests is expensive
- Deadlock detection and recovery
  - Check for circular wait periodically. If a circular wait is found, abort all deadlocked processes (an extreme solution, but very common)
  - Checking for circular wait is expensive

Condition Variables
- A condition variable is always associated with a condition and a lock
- Typically used to wait for the condition to take on a given value
- Three operations:
  - cond_wait(lock, cond_var): release the lock; sleep on cond_var; when awakened by the system, reacquire the lock and return
  - cond_signal(cond_var): if at least one thread is sleeping on cond_var, wake one up; otherwise, no effect
  - cond_broadcast(cond_var): if at least one thread is sleeping on cond_var, wake everyone up; otherwise, no effect

Producer/Consumer Example

    Producer:
        lock(lock_bp)
        while (free_bp.is_empty())
            cond_wait(lock_bp, cond_freebp_empty)
        buffer ← free_bp.get_buffer()
        unlock(lock_bp)
        ... produce data in buffer ...
        lock(lock_bp)
        data_bp.add_buffer(buffer)
        cond_signal(cond_databp_empty)
        unlock(lock_bp)

    Consumer:
        lock(lock_bp)
        while (data_bp.is_empty())
            cond_wait(lock_bp, cond_databp_empty)
        buffer ← data_bp.get_buffer()
        unlock(lock_bp)
        ... consume data in buffer ...
        lock(lock_bp)
        free_bp.add_buffer(buffer)
        cond_signal(cond_freebp_empty)
        unlock(lock_bp)

Condition Variables (Cont'd)
- Condition variables are implemented using locks
- The implementation is tricky because it involves multiple locks and the scheduling queue
- Implemented in the OS or in run-time thread systems, because they involve scheduling operations (sleep/wakeup)

Semaphores
- Synchronized counting variables
- A semaphore comprises an integer value and two operations, P() and V()
  - P(): decrement the value; while the value is negative, sleep
  - V(): increment the value; if any threads are sleeping in P(), wake one of them up

Monitors
- Semaphores have a few limitations: they are unstructured and difficult to program correctly. Monitors eliminate these limitations and are as powerful as semaphores
- A monitor consists of a software module with one or more procedures, an initialization sequence, and local data (which can only be accessed by the procedures)
- Only one process can execute within the monitor at any one time (mutual exclusion)
- Synchronization within the monitor is implemented with condition variables (wait/signal) => one queue per condition variable

Lock- and Wait-Free Synchronization (or Mutual Exclusion Considered Harmful)
- Problem: critical sections. If a process is delayed or halted in a critical section, it is hard or impossible for other processes to make progress
- Lock-free (aka non-blocking) concurrent objects: some process will complete an operation on the object in a finite number of steps
- Wait-free concurrent objects: each process attempting to perform an operation on the concurrent object will complete it in a finite number of steps
- What is the essential difference between these two?
- The latter definition guarantees lack of starvation

Herlihy's Paper
- What's the point?
  - Lock- and wait-free synchronization can be very useful for building multiprocessor systems; they provide progress guarantees
  - Building lock- and wait-free concurrent objects is HARD, so a methodology for implementing these objects is required
  - Lock- and wait-free synchronization provide guarantees but can be rather expensive: lots of copying
- M. Greenwald and D. Cheriton. The Synergy between Non-blocking Synchronization and Operating System Structure. In Proceedings of OSDI '96, Oct. 1996.

Single Word Protocol

    Fetch_and_add_sync(obj: integer, v: integer) returns (integer)
        success: boolean := false
        loop
            exit when success
            old: integer := obj
            new: integer := old + v
            success := compare_and_swap(obj, old, new)
        end loop
        return old
    end Fetch_and_add_sync

- Lock-free but not wait-free!!

Small Objects
- A small object occupies one block (or less)
- Basic idea of a lock-free object implementation:
  1. Allocate a new block
  2. Copy the object to the new block, making sure the copy is consistent
  3. Modify the copied block, i.e. perform the operation on the copied block
  4. Try to atomically replace the pointer to the old block with a pointer to the new one
- Basic idea of a wait-free object implementation:
  - Same as before, except that processes should announce their operations in a shared data structure
  - Any process in step 3 should perform all previously announced operations, not just its own

Single Word Protocol (Cont'd)

    Fetch_and_add_sync(obj: integer, v: integer) returns (integer)
        success: boolean := false
        loop
            exit when success
            old: integer := obj        -- allocate, copy and modify:
            new: integer := old + v    -- a single word, so the copy is consistent
            success := compare_and_swap(obj, old, new)    -- replace
        end loop
        return old
    end Fetch_and_add_sync

- Lock-free but not wait-free!!

Large Objects
- Left for you to read
- The basic idea is to make local changes in the data structure using the small-object protocol
- Complicated, but no different from locking: e.g., if you lock an entire tree to modify a few nodes, performance will go down the drain
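For comparison, the single-word Fetch_and_add_sync protocol above maps directly onto C11 atomics. This is a sketch, not the paper's code; it uses atomic_compare_exchange_weak, which conveniently reloads the currently observed value into old when the swap fails, so each retry recomputes old + v from fresh data.

```c
#include <stdatomic.h>

/* Single-word lock-free fetch-and-add: read the old value, compute the
 * new one, and atomically install it only if the word has not changed
 * in the meantime; otherwise retry.  Lock-free but not wait-free: an
 * individual thread can, in principle, lose the CAS race forever. */
long fetch_and_add_sync(_Atomic long *obj, long v) {
    long old = atomic_load(obj);
    while (!atomic_compare_exchange_weak(obj, &old, old + v)) {
        /* CAS failed: old now holds the current value; loop and retry */
    }
    return old;
}
```

The allocate/copy/modify/replace steps of the small-object protocol collapse here into computing old + v and the CAS, because the whole object is a single word.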