59.305 - Operating Systems

INTRODUCTION

What is an operating system?
Early Systems
Simple Batch Systems
Multiprogramming Batched Systems
Time-Sharing Systems
Personal-Computer Systems
Parallel Systems
Distributed Systems
Real-Time Systems

What is an Operating System

A program that acts as an intermediary between a user of a computer and the computer hardware.
A systems program which controls all the computer's resources and provides a base upon which application programs can be written.
Operating system goals:
   o Execute user programs and make solving user problems easier.
   o Make the computer system convenient to use.
   o Use the computer hardware in an efficient manner.

Computer System Components

1. Hardware - provides basic computing resources (CPU, memory, I/O devices).
2. Operating system - controls and coordinates the use of the hardware among the various application programs for the various users.
3. Applications programs - define the ways in which the system resources are used to solve the computing problems of the users (compilers, database systems, video games, business programs).
4. Users (people, machines, other computers).

Operating System Functions

Resource allocator - manages and allocates resources.
Control program - controls the execution of user programs and operation of I/O devices.
Kernel - the one program running at all times (all else being application programs).

Early Systems - bare machine (early 1950s) - First Generation.

Structure
   o Large machines run from console
   o Single user system
   o Programmer/User as operator
   o Paper tape or punched cards
Early Software
   o Assemblers
   o Loaders
   o Linkers
   o Libraries of common subroutines
   o Compilers
   o Device drivers
Secure
Inefficient use of expensive resources
   o Low CPU utilization
   o Significant amount of setup time

Simple Batch Systems - Second Generation.
Use an operator (somebody to work the machine).
Add a card reader (a device to read programs written on punched cards).
Reduce setup time by batching similar jobs.
Automatic job sequencing - automatically transfers control from one job to another. First rudimentary operating system.
Resident monitor
   o initial control in monitor
   o control transfers to job
   o when job completes control transfers back to monitor
Problems:
1. How does the monitor know about the nature of the job (e.g., Fortran versus Assembly) or which program to execute?
2. How does the monitor distinguish
   a) job from job?
   b) data from program?
Solution: introduce control cards.

Control Cards

Special cards that tell the resident monitor which programs to run.
Parts of resident monitor
   o Control card interpreter - responsible for reading and carrying out instructions on the cards.
   o Loader - loads systems programs and applications programs into memory.
   o Device drivers - know special characteristics and properties for each of the system's I/O devices.
Problem: Slow Performance - since I/O and CPU could not overlap, and the card reader was very slow.
Solution: Off-line operation - speed up computation by loading jobs into memory from tapes, with card reading and line printing done off-line using smaller machines.
Advantage of off-line operation - main computer not constrained by the speed of the card readers and line printers, but only by the speed of faster magnetic tape units.
No changes need to be made to the application programs to change from direct to off-line I/O operation.
Real gain - possibility of using multiple reader-to-tape and tape-to-printer systems for one CPU.

Spooling - overlap the I/O of one job with the computation of another job. (Simultaneous Peripheral Operation On Line)

Simple Multiprogramming. While executing one job, the operating system:
   o reads the next job from the card reader into a storage area on the disk (job queue).
   o outputs the printout of previous job from disk to the line printer.
Job pool - data structure that allows the operating system to select which job to run next, in order to increase CPU utilization.

Multiprogramming and Time Sharing - Third Generation

Multiprogramming
Several jobs are kept in main memory at the same time, and the CPU is shared between them. Each job is called a process.

OS Features Needed for Multiprogramming
I/O routine supplied by the system.
Memory management - the system must allocate the memory to several jobs.
CPU scheduling - the system must choose among several jobs ready to run.
Allocation of devices.

Time-Sharing Systems - Interactive Computing
Most efficient for many users to share a large computer.
The CPU is shared between several processes.
Each process belongs to a user and I/O is to/from a separate terminal for each user.
On-line file system must be available for users to access data and code.

Personal-Computer Systems - Fourth Generation

Personal computers - computer system dedicated to a single user.
I/O devices - keyboards, mice, display screens, small printers.
User convenience and responsiveness.
Can adopt technology developed for larger operating systems; often individuals have sole use of computer and do not need advanced CPU utilization or protection features.

Parallel Systems - multiprocessor systems with more than one CPU in close communication.
Tightly coupled system - processors share memory and a clock; communication usually takes place through the shared memory.
Advantages of parallel systems:
   o Increased throughput
   o Economical
   o Increased reliability
Symmetric multiprocessing
   o Each processor runs an identical copy of the operating system.
   o Many processes can run at once without performance deterioration.
Asymmetric multiprocessing
   o Each processor is assigned a specific task; master processor schedules and allocates work to slave processors.
   o More common in extremely large systems.

Distributed Systems - distribute the computation among several physical processors.
Loosely coupled system - each processor has its own local memory; processors communicate with one another through various communication lines, such as high-speed networks.
Advantages of distributed systems:
   o Resource sharing
   o Computation speed up - load sharing
   o Reliability
   o Communication

Real-Time Systems
Often used as a control device in a dedicated application such as controlling scientific experiments, medical imaging systems, industrial control systems, and some display systems.
Well-defined fixed-time constraints.
OS must be able to respond very quickly.

COMPUTER-SYSTEM STRUCTURES

Computer-System Operation
I/O Structure
Storage Structure
Storage Hierarchy
Hardware Protection
General System Architecture

Computer-System Operation

I/O devices and the CPU can operate concurrently.
Each device controller is in charge of a particular device type.
Each device controller has a local buffer.
CPU moves data from/to main memory to/from the local buffers.
I/O is from the device to local buffer of controller.
Device controller informs CPU that it has finished its operation by causing an interrupt.

Interrupts

Types
1. Hardware - Asynchronous
   Device informs CPU that something has happened, e.g. a key has been pressed on the keyboard.
2. Hardware - Synchronous
   CPU has tried to do something that has caused the interrupt, e.g. tried to read from an invalid memory location. (This is not always a problem; it may mean that the page is on disk and needs to be fetched.) Often called an Exception or Trap.
3. Software
   CPU asked for the interrupt to happen, e.g. to perform an OS call. Often called a Trap.

Hardware Interrupts
I/O devices use Asynchronous Hardware Interrupts (i.e. caused by the outside world and may happen at any time).
Transfers control to the interrupt service routine, through the interrupt vector, which contains the addresses of all the service routines.
CPU must save the address of the interrupted instruction.

Interrupt Handling

Interrupt handling is a very important part of the OS.
The operating system must preserve the state of the CPU by storing all registers.
Determine which type of interrupt has occurred:
   o polling - ask each device if it caused the interrupt.
   o vectored interrupt system - device identifies itself when it causes the interrupt.
Separate segments of code determine what action should be taken for each type of interrupt.

I/O Calls

Blocking I/O
User program requests I/O, control returns to user program only upon I/O completion.
   o CPU may be allocated to another process.
Non-Blocking I/O
After I/O starts, control returns to user program without waiting for I/O completion.

Direct Memory Access (DMA) Structure

Used for high-speed I/O devices able to transmit information at close to memory speeds.
Device controller transfers blocks of data from buffer storage directly to main memory without CPU intervention.
Only one interrupt is generated per block, rather than one interrupt per byte.

Storage Structure

Main memory - the only large storage medium that the CPU can access directly.
Secondary storage - extension of main memory that provides large non-volatile storage capacity.
Magnetic disks
   o Disk surface is logically divided into tracks, which are subdivided into sectors.
   o The disk controller determines the logical interaction between the device and the computer.

Storage Hierarchy

Storage systems can be organized in a hierarchy:
   o speed
   o cost
   o volatility
Most programs make accesses to memory which are localised
   o in time, i.e. the program spends a lot of time executing short sections of code.
   o in space, i.e. the program reads and writes to certain memory locations a lot; these locations tend to be close together.
Caching - copying information into faster storage system; main memory can be viewed as a fast cache for secondary memory.
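
As a rough illustration of locality in space, the sketch below sums the same matrix twice: once row by row, so that consecutive accesses touch adjacent addresses, and once column by column, so that each access jumps a whole row ahead. Both functions return the same total; on most cached machines the row-major version tends to run faster, though by how much depends on the hardware. The names ROWS, COLS, sum_row_major and sum_col_major are made up for this example and are not part of the notes.

```c
#define ROWS 512
#define COLS 512

/* Row-major traversal: the inner loop walks adjacent memory locations,
   so each cache line fetched from main memory is fully used. */
long sum_row_major(int m[ROWS][COLS])
{
    long total = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            total += m[r][c];
    return total;
}

/* Column-major traversal: the inner loop jumps COLS elements at a time,
   so successive accesses are far apart and locality in space is poor. */
long sum_col_major(int m[ROWS][COLS])
{
    long total = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            total += m[r][c];
    return total;
}
```

Timing the two functions on a large matrix is one simple way to observe the effect of the cache on a particular machine.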

Hardware Protection

Dual-Mode Operation
I/O Protection
Memory Protection
CPU Protection

Dual-Mode Operation

Sharing system resources requires operating system to ensure that an incorrect program cannot cause other programs to execute incorrectly.
Provide hardware support to differentiate between at least two modes of operations.
   o User mode - execution done on behalf of a user.
   o Monitor mode (also supervisor mode or system mode) - execution done on behalf of operating system.
Mode bit added to computer hardware (in CPU flags) to indicate the current mode: monitor (0) or user (1).
When an interrupt or fault occurs hardware switches to monitor mode.
Certain privileged instructions can be issued only in monitor mode.
Some CPUs have more complex protection mechanisms with many levels of protection (sometimes called rings).

I/O Protection

All I/O instructions are privileged instructions.
Must ensure that a user program could never gain control of the computer in monitor mode.

Memory Protection

Must provide memory protection at least for the interrupt vector and the interrupt service routines.
In order to have memory protection, add two registers that determine the range of legal addresses a program may access:
   o base register - holds the smallest legal physical memory address.
   o limit register - contains the size of the range.
Memory outside the defined range is protected.

Protection hardware

When executing in monitor mode, the operating system has unrestricted access to both monitor and users' memory.
The load instructions for the base and limit registers are privileged instructions.
In practice, memory protection is much more complicated than this. A device called a Memory Management Unit (MMU) controls access to memory.

CPU Protection - how does the OS stay in control.

Timer - interrupts computer after specified period to ensure operating system maintains control.
   o Timer is decremented every clock tick.
   o When timer reaches the value 0, an interrupt occurs.
Timer used to implement multiprogramming.
Timer also used to compute the current time.
Load-timer is a privileged instruction.
User programs cannot disable interrupts.

General-System Architecture

Given that I/O instructions are privileged, how does the user program perform I/O?
System call - the method used by a process to request action by the operating system.
   o Usually takes the form of a trap (software interrupt).
   o Control passes through an interrupt vector to a service routine in the OS, and the mode bit is automatically set to supervisor mode.
   o The OS verifies that the parameters are correct and legal, executes the request, and returns control to the instruction following the system call.

OPERATING-SYSTEM STRUCTURES

System Components
Operating-System Services
System Calls
System Programs
System Structure
Virtual Machines
System Design and Implementation
Booting

Most operating systems support the following types of system components:
   o Process Management
   o Main-Memory Management
   o Secondary-Storage Management
   o I/O System Management
   o File Management
   o Protection System
   o Networking
   o Command-Interpreter System

Process Management

A process is a program in execution. A process needs certain resources, including CPU time, memory, files, and I/O devices, to accomplish its task.
The operating system is responsible for the following activities in connection with process management:
   o process creation and deletion.
   o process suspension and resumption (scheduling).
   o provision of mechanisms for:
        process synchronization
        process communication
Process management is usually performed by the kernel.

Main-Memory Management

Memory is a large array of words or bytes, each with its own address. It is a repository of quickly accessible data shared by the CPU and I/O devices.
The operating system is responsible for the following activities in connection with memory management:
   o Keep track of which parts of memory are currently being used and by whom.
   o Decide which processes to load when memory space becomes available.
   o Allocate and deallocate memory space as needed, e.g. the C function 'malloc' (or 'New' in Pascal) allocates a specified amount of memory; this happens via an OS call. The functions 'free' (C) and 'Dispose' (Pascal) deallocate this memory.

I/O System Management

The I/O system consists of:
   o A buffer-caching system
   o A general device-driver interface
   o Drivers for specific hardware devices (device drivers)
Device Drivers
   o Must have access to I/O hardware.
   o Must be able to handle interrupts.
   o Communicate with other parts of the OS (file system, networking etc).

File Management

A file is a collection of related information. Commonly, files represent programs (both source and object forms) and data.
Files may also be used to represent devices (e.g. lpt1: in DOS).
The operating system is responsible for the following activities in connection with file management:
   o File creation and deletion.
   o Directory creation and deletion.
   o Support of primitives for manipulating files and directories.
   o Mapping files onto secondary storage, e.g. free space allocation.
   o File backup on stable (non-volatile) storage media.

Protection System

Protection refers to a mechanism for controlling access by programs, processes, or users to both system and user resources.
Operating systems commonly control access by using permissions. All system resources have an owner and a permission associated with them.
Users may be combined into groups for the purpose of protection, e.g. in UNIX every file has an owner and a group. The following is a listing of all the information about a file.

    rwxr-xr-- martin staff 382983 Jan 18 10:20 notes305.html

The first field is the protection information; it shows the permissions for the owner, then the group, then everybody else.
The first rwx means that the owner has read, write and execute permissions.
The next r-x means that the group has read and execute permissions.
The next r-- means that all other users have only read permission.
The name of the owner of the file is martin; the name of the group for the file is staff; the length of the file is 382983 bytes; the file was last modified on Jan 18 at 10:20; and the name of the file is notes305.html.
There is usually a special user corresponding to the system administrator; this user has permission to do anything. On UNIX systems this user is called root.

Networking (Distributed Systems)

A distributed system is a collection of processors that do not share memory or a clock. Each processor has its own local memory.
The processors in the system are connected through a communication network.
A distributed system provides user access to various system resources.
Access to a shared resource allows:
   o Computation speed-up
   o Increased data availability
   o Enhanced reliability

Command-Interpreter System

Many commands are given to the operating system by control statements which deal with:
   o process creation and management (e.g. running a program)
   o I/O handling (e.g. set terminal type)
   o secondary-storage management (e.g. format a disk)
   o main-memory management (e.g. specify virtual memory parameters)
   o file-system access (e.g. print file)
   o protection (e.g. set permissions)
   o networking (e.g. set IP address)
The program that reads and interprets control statements is called variously:
   o command-line interpreter
   o shell (in UNIX)
Its function is to get and execute the next command statement.
Some operating systems have no command line interpreter and use a GUI for all system administration (e.g. NT).

Operating-System Services

Program execution - ability to load a program into memory and to run it.
I/O operations - since user programs cannot execute I/O operations directly, the operating system must provide some means to perform I/O.
File-system manipulation - capability to read, write, create, and delete files.
Communications - exchange of information between processes executing either on the same computer or on different systems tied together by a network. Implemented via shared memory or message passing.
Error detection - ensure correct computing by detecting errors in the CPU and memory hardware, in I/O devices, or in user programs.

Additional operating-system functions exist not for helping the user, but rather for ensuring efficient system operation.
Resource allocation - allocating resources to multiple users or multiple processes running at the same time.
Accounting - keep track of and record which users use how much and what kinds of computer resources, for account billing or for accumulating usage statistics.
Protection - ensuring that all access to system resources is controlled.

System Calls

System calls provide the interface between a running program and the operating system.
   o Generally available as an assembly-language instruction to generate a software interrupt (e.g. INT 21h in DOS).
   o Systems programming languages such as C allow system calls to be made directly.
Three general methods are used to pass parameters between a running program and the operating system:
   o Pass parameters in registers.
   o Store the parameters in a table in memory, and pass the table address as a parameter in a register.
   o The program pushes (stores) the parameters onto the stack, and the operating system pops them off the stack.

System Programs

System programs provide a convenient environment for program development and execution. They can be divided into:
   o File manipulation
   o Status information
   o File modification
   o Programming-language support
   o Program loading and execution
   o Communications
   o Application programs
Most users' view of the operating system is defined by system programs, not the actual system calls.

System Structure - Simple Approach

MS-DOS - written to provide the most functionality in the least space; it was not divided into modules.
MS-DOS has some structure, but its interfaces and levels of functionality are not well separated.
UNIX - limited by hardware functionality, the original UNIX operating system had limited structuring. The UNIX OS consists of two separable parts:
   o the systems programs.
   o the kernel, which consists of everything below the system-call interface and above the physical hardware. Provides the file system, CPU scheduling, memory management, and other operating-system functions; a large number of functions for one level.
Often this is called a Monolithic Kernel.
Many modern operating systems use a Microkernel - the kernel provides only the following minimal services:
1. Interprocess communication.
2. Memory management.
3. Low level process management.
4. Low level I/O.
All other services are provided by user level processes.

System Structure - Layered Approach

The operating system is divided into a number of layers (levels), each built on top of lower layers. The bottom layer (layer 0) is the hardware; the highest (layer N) is the user interface.
With modularity, layers are selected such that each uses functions (operations) and services of only lower-level layers.
A layered design was first used in the THE operating system of Dijkstra in 1968. Its six layers are as follows:

    Level 5: user programs
    Level 4: buffering for input and output devices
    Level 3: operator-console device driver
    Level 2: memory management
    Level 1: CPU scheduling
    Level 0: hardware

Virtual Machines

A virtual machine takes the layered approach to its logical conclusion. It treats hardware and the operating system kernel as though they were all hardware.
A virtual machine provides an interface identical to the underlying bare hardware.
The operating system creates the illusion of multiple processes, each executing on its own processor with its own (virtual) memory.
The resources of the physical computer are shared to create the virtual machines.
   o CPU scheduling can create the appearance that users have their own processor.
   o Spooling and a file system can provide virtual I/O.
   o A terminal serves as the virtual machine console.

Advantages and Disadvantages of Virtual Machines

The virtual-machine concept provides complete protection of system resources since each virtual machine is isolated from all other virtual machines. This isolation, however, permits no direct sharing of resources.
A virtual-machine system is a perfect vehicle for operating-systems research and development. System development is done on the virtual machine, instead of on a physical machine, and so does not disrupt normal system operation.
The virtual machine concept is difficult to implement due to the effort required to provide an exact duplicate of the underlying machine.

System Design Goals

User goals - operating system should be convenient to use, easy to learn, reliable, safe, and fast.
System goals - operating system should be easy to design, implement, and maintain, as well as flexible, reliable, error-free, and efficient.

Mechanisms and Policies

Mechanisms determine how to do something; policies decide what will be done.
The separation of policy from mechanism is a very important principle; it allows maximum flexibility if policy decisions are to be changed later.

System Implementation

Traditionally written in assembly language, operating systems are now mostly written in higher level languages.
Code written in a high-level language:
   o can be written faster.
   o is more compact.
   o is easier to understand and debug.
An operating system is far easier to port (move to some other hardware) if it is written in a high level language.
UNIX was among the first operating systems to be written in a high-level language. At the time there were very few suitable languages, so a new one, called C, was developed from an existing language called B.

Booting

Modern operating systems are designed to run on machines with a wide range of different hardware.
Booting - starting a computer by loading the kernel.
Bootstrap program - code stored in ROM that is able to locate the kernel, load it into memory, and start its execution.
Once the kernel is loaded it must identify all the hardware present in the machine and load relevant device drivers.

PROCESSES

Process Concepts
Process Scheduling
Processes Creation and Termination
Cooperating Processes
Threads
Interprocess Communication

Process Concepts

An operating system executes a variety of programs:
   o Batch system - jobs
   o Time-shared systems - user programs or tasks
Textbook uses the terms job and process almost interchangeably.
Process - a program in execution; process execution must progress in a sequential fashion.
A process includes:
   o program counter
   o stack
   o data section
As a process executes, it changes state.
   o New: The process is being created.
   o Running: Instructions are being executed.
   o Waiting: The process is waiting for some event to occur.
   o Ready: The process is waiting to be assigned to a processor.
   o Terminated: The process has finished execution.
[Figure: diagram of process states]
Process Control Block (PCB) - Information associated with each process.
   o Process ID (name, number)
   o Process state
   o Priority, owner, etc...
   o Program counter
   o CPU registers
   o CPU scheduling information
   o Memory-management information
   o Accounting information
   o I/O status information

Process Scheduling

Process scheduling queues
   o job queue - set of all processes in the system.
   o ready queue - set of all processes residing in main memory, ready and waiting to execute.
   o device queues - set of processes waiting for a particular I/O device.
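
The process control block and the ready queue described above can be sketched as a pair of C structures. This is only a minimal illustration under assumed names: struct pcb, struct queue, admit and dispatch are invented for this example, the PCB holds just a few of the fields listed, and a real kernel's layout will differ considerably.

```c
#include <stddef.h>

/* The five process states listed in the notes. */
enum proc_state { PROC_NEW, PROC_READY, PROC_RUNNING, PROC_WAITING, PROC_TERMINATED };

/* A cut-down process control block (hypothetical layout). */
struct pcb {
    int pid;                        /* process ID */
    enum proc_state state;          /* process state */
    int priority;                   /* CPU scheduling information */
    unsigned long program_counter;  /* saved program counter */
    unsigned long registers[8];     /* saved CPU registers; 8 is arbitrary */
    struct pcb *next;               /* link used by the scheduling queues */
};

/* A FIFO queue of PCBs, usable as a ready queue or a device queue. */
struct queue {
    struct pcb *head;
    struct pcb *tail;
};

/* Append a PCB to the tail of a queue. */
static void queue_put(struct queue *q, struct pcb *p)
{
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Remove and return the PCB at the head of a queue, or NULL if empty. */
static struct pcb *queue_get(struct queue *q)
{
    struct pcb *p = q->head;
    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
        p->next = NULL;
    }
    return p;
}

/* New -> Ready: place a process on the ready queue. */
void admit(struct queue *ready, struct pcb *p)
{
    p->state = PROC_READY;
    queue_put(ready, p);
}

/* Ready -> Running: take the next process off the ready queue. */
struct pcb *dispatch(struct queue *ready)
{
    struct pcb *p = queue_get(ready);
    if (p != NULL)
        p->state = PROC_RUNNING;
    return p;
}
```

Moving a process between queues then amounts to unlinking its PCB from one list and linking it onto another; here the ready queue is served first-in first-out, which is only one of several possible policies.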
Process migration between the various queues.
Schedulers
   o Long-term scheduler (job scheduler) - selects which processes should be brought into the ready queue.
   o Short-term scheduler (CPU scheduler) - selects which process should be executed next and allocates CPU.
Short-term scheduler is invoked very frequently (milliseconds) => (must be fast).
Long-term scheduler is invoked very infrequently (seconds, minutes) => (may be slow).
The long-term scheduler controls the degree of multiprogramming.
Processes can be described as either:
   o I/O-bound process - spends more time doing I/O than computations; many short CPU bursts.
   o CPU-bound process - spends more time doing computations; few very long CPU bursts.

Context Switch

When CPU switches to another process, the system must save the state of the old process and load the saved state for the new process.
Context-switch time is overhead; the system does no useful work while switching.
Time dependent on hardware support.

Process Creation

Parent process creates children processes, which, in turn create other processes, forming a tree of processes.
[Figure: example process tree]
Resource sharing - 3 possibilities
   o Parent and children share all resources.
   o Children share subset of parent's resources.
   o Parent and child share no resources.
Execution - 2 choices.
   o Parent and children execute concurrently.
   o Parent waits until children terminate.
Address space
   o Child duplicate of parent.
   o Child has a program loaded into it.
UNIX examples
   o fork system call creates new process.
   o The new process is an exact copy of the parent and continues execution from the same point as its parent.
   o The only difference between parent and child is the value returned from the fork call:
        0 for the child.
        the process id (pid) of the child, for the parent.
   o The execve system call is used after a fork to replace the process' memory space with a new program.
   o Using fork and execve we can write a simple command line interpreter.

    while (true) {
        read_command_line(&command, &parameters);
        if (fork() != 0) {
            /* parent: wait for the child to finish */
            waitpid(-1, &status, 0);
        } else {
            /* child: replace this process with the command */
            execve(command, parameters, 0);
        }
    }

Process Termination

Process executes last statement and asks the operating system to delete it (exit).
   o Output data from child to parent (via wait).
   o Process' resources are deallocated by operating system.
Parent may terminate execution of children processes (abort).
   o Child has exceeded allocated resources.
   o Task assigned to child is no longer required.
   o Parent is exiting. Operating system does not allow child to continue if its parent terminates. - Cascading termination.

Cooperating Processes

Independent process cannot affect or be affected by the execution of another process.
Cooperating process can affect or be affected by the execution of another process.
Advantages of process cooperation:
   o Information sharing
   o Computation speed-up
   o Modularity
   o Convenience

Producer-Consumer Problem

Paradigm for cooperating processes; producer process produces information that is consumed by a consumer process.
   o unbounded-buffer places no practical limit on the size of the buffer.
   o bounded-buffer assumes that there is a fixed buffer size.
Shared-memory solution:

Shared data

    typedef .... item;
    item buffer[N];          /* slots 0..N-1 */
    int in = 0, out = 0;

Producer process

    while (true) {
        ... produce an item in nextp ...
        while ((in + 1) % N == out)
            ;                /* buffer full: no-op */
        buffer[in] = nextp;
        in = (in + 1) % N;
    }

Consumer process

    while (true) {
        while (in == out)
            ;                /* buffer empty: no-op */
        nextc = buffer[out];
        out = (out + 1) % N;
        ... consume the item in nextc ...
    }

Solution is correct, but uses busy waiting: the loop

    while (in == out)
        ;

uses CPU time doing nothing. Later we will see how to avoid this.
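
One property of the shared-memory solution that is easy to miss is that a buffer of N slots holds at most N-1 items, because in == out is reserved to mean "empty". The single-threaded sketch below isolates the index arithmetic so that this can be checked. It is not a concurrent implementation: the names struct ring, ring_put and ring_get are invented here, N is fixed at 4 for the example, and the functions return false where the processes above would busy-wait.

```c
#include <stdbool.h>

#define N 4              /* number of slots; at most N-1 items are stored */

typedef int item;        /* stand-in for the notes' "typedef .... item" */

struct ring {
    item buffer[N];
    int in;              /* next free slot (advanced by the producer) */
    int out;             /* next full slot (advanced by the consumer) */
};

/* Producer side: returns false where the producer would busy-wait. */
bool ring_put(struct ring *r, item x)
{
    if ((r->in + 1) % N == r->out)
        return false;                 /* buffer full */
    r->buffer[r->in] = x;
    r->in = (r->in + 1) % N;
    return true;
}

/* Consumer side: returns false where the consumer would busy-wait. */
bool ring_get(struct ring *r, item *x)
{
    if (r->in == r->out)
        return false;                 /* buffer empty */
    *x = r->buffer[r->out];
    r->out = (r->out + 1) % N;
    return true;
}
```

Starting from in = out = 0, three calls to ring_put succeed and the fourth fails, and ring_get then returns the items in the order they were inserted.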

Threads

A thread (or lightweight process) is a basic unit of CPU utilization; it consists of:
   o program counter
   o register set
   o stack space
A thread shares with its peer threads its:
   o code section
   o data section
   o operating-system resources
A traditional or heavyweight process is equal to a task with one thread.
In a task containing multiple threads, while one server thread is blocked and waiting, a second thread in the same task could run.
   o Cooperation of multiple threads in same job confers higher throughput and improved performance.
   o Applications that require sharing a common buffer (producer-consumer problem) benefit from thread utilization.
Threads provide a mechanism that allows sequential processes to make blocking system calls while also achieving parallelism.

Types of threads

Kernel-supported threads; OS supports threads directly.
   o Overhead for thread creation.
User-level threads; supported above the kernel, via a set of library calls at the user level.
   o Cannot use multiple processors.
Hybrid approach implements both user-level and kernel-supported threads.
Two types of threads you are likely to see:
   o POSIX threads - POSIX is a standard for UNIX systems; the standard includes a thread library.
   o WIN32 threads - those available on Windows 95 and NT.

Interprocess Communication (IPC)

Provides a mechanism to allow processes to communicate / synchronize their actions.
Message system - processes communicate with each other without resorting to shared variables.
IPC facility provides two operations:
   o send(message) - messages can be of either fixed or variable size.
   o receive(message)
If P and Q wish to communicate, they need to:
   o establish a communication link between them
   o exchange messages via send/receive
Implementation questions:
How are links established?
Can a link be associated with more than two processes?
How many links can there be between every pair of communicating processes?
What is the capacity of a link?
Is the size of a message that the link can accommodate fixed or variable?
Is a link unidirectional or bidirectional?

Direct Communication

Processes must name each other explicitly:
   o send(P, message) - send a message to process P
   o receive(Q, message) - receive a message from process Q
Properties of communication link
   o Links are established automatically.
   o A link is associated with exactly one pair of communicating processes.
   o Between each pair there exists exactly one link.
   o The link may be unidirectional, but is usually bidirectional.

Indirect Communication

Messages are directed to and received from mailboxes (also referred to as ports).
   o Each mailbox has a unique id.
   o Processes can communicate only if they share a mailbox.
Properties of communication link
   o Link established only if the two processes share a mailbox in common.
   o A link may be associated with many processes.
   o Each pair of processes may share several communication links.
   o Link may be unidirectional or bidirectional.
Operations
   o create a new mailbox
   o send and receive messages through mailbox
   o destroy a mailbox
Mailbox sharing
   o P1, P2, and P3 share mailbox A.
   o P1 sends; P2 and P3 receive.
   o Who gets the message?
Solutions
   o Allow a link to be associated with at most two processes.
   o Allow only one process at a time to execute a receive operation.
   o Allow the system to select arbitrarily the receiver. Sender is notified who the receiver was.

Buffering - queue of messages attached to the link; implemented in one of three ways.
   o Zero capacity - 0 messages. Sender must wait for receiver (rendezvous).
   o Bounded capacity - finite length of n messages. Sender must wait if link full.
   o Unbounded capacity - infinite length. Sender never waits.

Exception Conditions - error recovery
   o Process terminates
   o Lost messages
   o Scrambled messages

Pipes

A pipe is a simple method for communicating between two processes.
As far as the processes are concerned the pipe appears to be just like a file.
When one process (A) performs a write, the data is buffered in the pipe. When the other process (B) reads, it reads from the pipe, blocking if there is no input.
In UNIX (and DOS) the output of one process can be piped into another using the '|' character, e.g.
    cat classlist | sort | more
The cat command prints the contents of the file 'classlist'; this is piped into the sort command, which sorts the list. Finally, the sorted list is sent to the more command, which prints it one screenful at a time.
Pipes may be implemented using shared memory (UNIX) or even with temporary files (DOS).

CPU SCHEDULING
  Basic Concepts
  Scheduling Criteria
  Scheduling Algorithms
  Multiple-Processor Scheduling
  Real-Time Scheduling
  Algorithm Evaluation

Basic Concepts
Maximum CPU utilization is obtained with multiprogramming.
CPU-I/O Burst Cycle - process execution consists of a cycle of CPU execution and I/O wait.
CPU burst distribution
Short-term scheduler - selects from among the processes in memory that are ready to execute, and allocates the CPU to one of them.
CPU scheduling decisions may take place when a process:
  1. switches from running to waiting state.
  2. switches from running to ready state.
  3. switches from waiting to ready.
  4. terminates.
Scheduling under 1 and 4 is non-preemptive (cooperative). All other scheduling is preemptive.

Dispatcher
The dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves:
  o switching context
  o switching to user mode
  o jumping to the proper location in the user program to restart that program
Dispatch latency - time it takes for the dispatcher to stop one process and start another running.
Scheduling Criteria
  o CPU utilization - keep the CPU as busy as possible
  o Throughput - number of processes that complete their execution per time unit
  o Turnaround time - amount of time to execute a particular process
  o Waiting time - amount of time a process has been waiting in the ready queue
  o Response time - amount of time from when a request was submitted until the first response is produced, not until output is complete (for time-sharing environments)
Optimization
  o Max CPU utilization
  o Max throughput
  o Minimum turnaround time
  o Minimum waiting time
  o Minimum response time

First-Come, First-Served (FCFS) Scheduling
Example:
    Process   Burst time
    P1        24
    P2         3
    P3         3
Suppose that the processes arrive in the order: P1, P2, P3.
The schedule is: P1 runs from 0 to 24, P2 from 24 to 27, P3 from 27 to 30.
Waiting time for: P1 = 0, P2 = 24, P3 = 27
Average waiting time: (0 + 24 + 27)/3 = 17
Suppose that the processes arrive in the order: P2, P3, P1.
The schedule is: P2 runs from 0 to 3, P3 from 3 to 6, P1 from 6 to 30.
Waiting time for: P1 = 6, P2 = 0, P3 = 3
Average waiting time: (6 + 0 + 3)/3 = 3
Much better than the previous case.
Convoy effect: short processes wait behind a long process.

Shortest-Job-First (SJF) Scheduling
Associate with each process the length of its next CPU burst. Use these lengths to schedule the process with the shortest time.
Two schemes:
  a) non-preemptive - once the CPU is given to a process, it cannot be preempted until it completes its CPU burst.
  b) preemptive - if a new process arrives with a CPU burst length less than the remaining time of the currently executing process, preempt. This scheme is known as Shortest-Remaining-Time-First (SRTF).
SJF is optimal - it gives the minimum average waiting time for a given set of processes.
Example of SJF
    Process   Arrival time   CPU time
    P1        0              7
    P2        2              4
    P3        4              1
    P4        5              4
SJF (non-preemptive): P1 0-7, P3 7-8, P2 8-12, P4 12-16.
Average waiting time = (0 + 6 + 3 + 7)/4 = 4
SRTF (preemptive): P1 0-2, P2 2-4, P3 4-5, P2 5-7, P4 7-11, P1 11-16.
Average waiting time = (9 + 1 + 0 + 2)/4 = 3
How do we know the length of the next CPU burst? Can only estimate the length.
Can be done by using the length of previous CPU bursts, using exponential averaging.
  1. T_n = actual length of the n'th CPU burst
  2. P_n = predicted value of the n'th CPU burst
  3. 0 <= W <= 1
  4. Define: P_{n+1} = W T_n + (1 - W) P_n
Examples:
  o W = 0: P_{n+1} = P_n. Recent history does not count.
  o W = 1: P_{n+1} = T_n. Only the actual last CPU burst counts.
If we expand the formula, we get:
    P_{n+1} = W T_n + (1 - W) W T_{n-1} + (1 - W)^2 W T_{n-2} + ... + (1 - W)^q W T_{n-q} + ...
Since W and (1 - W) are both at most 1, each successive term has less weight than its predecessor; with W = 1/2, for example, the weights are 1/2, 1/4, 1/8, ...

Priority Scheduling
A priority number (integer) is associated with each process.
The CPU is allocated to the process with the highest priority (smallest integer -> highest priority).
  a) preemptive
  b) non-preemptive
SJF is priority scheduling where the priority is the predicted next CPU burst time.
Problem = Starvation (or indefinite blocking) - low-priority processes may never execute.
Solution = Aging - as time progresses, increase the priority of the process.

Round Robin (RR)
Each process gets a small unit of CPU time (time quantum), usually 10-100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue.
If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n - 1)q time units.
Performance
  o q large -> FIFO
  o q small -> q must be large with respect to the context-switch time, otherwise overhead is too high.
Example of RR with time quantum = 20
    Process   CPU time
    P1        53
    P2        17
    P3        68
    P4        24
With all four processes ready at time 0 in the order listed, the schedule is:
    P1 0-20, P2 20-37, P3 37-57, P4 57-77, P1 77-97, P3 97-117, P4 117-121, P1 121-134, P3 134-162
Typically, higher average turnaround than SRTF, but better response.

Multilevel Queue
Ready queue is partitioned into separate queues. Example:
  o foreground (interactive)
  o background (batch)
Each queue has its own scheduling algorithm. Example:
  o foreground - RR
  o background - FCFS
Scheduling must be done between the queues.
  o Fixed priority scheduling. Example: serve all from foreground, then from background. Possibility of starvation.
  o Time slice - each queue gets a certain amount of CPU time which it can schedule amongst its processes. Example: 80% to foreground in RR, 20% to background in FCFS.

Multilevel Feedback Queue
A process can move between the various queues; aging can be implemented this way.
A multilevel-feedback-queue scheduler is defined by the following parameters:
  o number of queues
  o scheduling algorithm for each queue
  o method used to determine when to upgrade a process
  o method used to determine when to demote a process
  o method used to determine which queue a process will enter when that process needs service
Example of multilevel feedback queue
Three queues:
  o Q0 - time quantum 8 milliseconds
  o Q1 - time quantum 16 milliseconds
  o Q2 - FCFS
Scheduling
  o A new job enters queue Q0, which is served FCFS. When it gains the CPU, the job receives 8 milliseconds. If it does not finish in 8 milliseconds, the job is moved to queue Q1.
  o At Q1, the job is again served FCFS and receives 16 additional milliseconds. If it still does not complete, it is preempted and moved to queue Q2.

Multiple-Processor Scheduling
CPU scheduling is more complex when multiple CPUs are available.
Homogeneous processors within a multiprocessor (CPUs must be the same).
Load sharing - use a common ready queue.
Each processor schedules itself, or one processor is used for scheduling.

Real-Time Scheduling
Hard real-time systems - required to complete a critical task within a guaranteed amount of time.
Soft real-time computing - requires that critical processes receive priority over less fortunate ones.

Algorithm Evaluation
Deterministic modeling - takes a particular predetermined workload and defines the performance of each algorithm for that workload.
Queuing models - make a mathematical model based on the distributions of job start times and burst times.
Simulation - write a program to schedule imaginary tasks using various algorithms.
Implementation - code the algorithms into the OS.

Summary
2 queues - ready and I/O request.
FCFS is simple but causes short jobs to wait for long jobs.
SJF is optimal, giving the shortest average waiting time, but it needs to know the length of the next burst.
SJF is a type of priority scheduling; it may suffer from starvation, which can be prevented using aging.
RR gives good response time; it is preemptive. FCFS is non-preemptive; priority algorithms can be either.
With RR the problem is selecting the quantum.
Multiple-queue algorithms use the best of each algorithm by having more than one queue.
Feedback queues allow jobs to move from queue to queue.
Algorithms may be evaluated by deterministic methods, mathematical models, simulation and implementation.

PROCESS SYNCHRONIZATION
  Background
  The Critical-Section Problem
  Synchronization Hardware
  Semaphores
  Classical Problems of Synchronization
  Critical Regions
  Monitors
  Atomic Transactions

Background
Concurrent access to shared data may result in data inconsistency.
Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes.
Suppose that we modify the producer-consumer code by adding a variable counter, initialized to 0 and incremented each time a new item is added to the buffer. The new scheme is illustrated by the following:

Shared data
    typedef .... item;
    item buffer[N];
    int in=0, out=0, counter=0;

Producer process
    while(true) {
        ... produce an item in nextp ...
        while(counter == N) ;    /* no-op: buffer full */
        buffer[in] = nextp;
        in = (in+1)%N;
        counter = counter + 1;
    }

Consumer process
    while(true) {
        while(counter == 0) ;    /* no-op: buffer empty */
        nextc = buffer[out];
        out = (out+1)%N;
        counter = counter - 1;
        ... consume the item in nextc ...
    }

The statements:
    counter = counter + 1;
    counter = counter - 1;
must be executed atomically.

The Critical-Section Problem
n processes all compete to use some shared data.
Each process has a section of code, called its critical section, in which the shared data is accessed.
Problem - ensure that when one process is executing in its critical section, no other process is allowed to execute in its critical section.
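To see why the two counter statements must execute atomically, the following sketch replays them at the level of individual machine steps (load, modify, store). It is a deterministic simulation rather than real concurrent code; the step names, the register variables and the three example orderings are assumptions introduced here for illustration and do not come from the notes.

```c
#include <assert.h>

/* One machine-level step of "counter = counter + 1" (producer, P_*)
   or "counter = counter - 1" (consumer, C_*). Hypothetical names. */
enum step { P_LOAD, P_ADD, P_STORE, C_LOAD, C_SUB, C_STORE };

/* Executes the six steps in the given order and returns the final counter.
   reg_p and reg_c model the private registers of producer and consumer. */
int run_interleaving(int counter, const enum step order[6])
{
    int reg_p = 0, reg_c = 0;
    assert(order != 0);
    for (int k = 0; k < 6; k++) {
        switch (order[k]) {
        case P_LOAD:  reg_p = counter;   break;
        case P_ADD:   reg_p = reg_p + 1; break;
        case P_STORE: counter = reg_p;   break;
        case C_LOAD:  reg_c = counter;   break;
        case C_SUB:   reg_c = reg_c - 1; break;
        case C_STORE: counter = reg_c;   break;
        }
    }
    return counter;
}

/* Atomic execution: one statement completes before the other starts. */
const enum step serial_order[6] =
    { P_LOAD, P_ADD, P_STORE, C_LOAD, C_SUB, C_STORE };

/* Both load the old value; the consumer stores last, so the increment is lost. */
const enum step lost_increment[6] =
    { P_LOAD, P_ADD, C_LOAD, C_SUB, P_STORE, C_STORE };

/* Both load the old value; the producer stores last, so the decrement is lost. */
const enum step lost_decrement[6] =
    { P_LOAD, P_ADD, C_LOAD, C_SUB, C_STORE, P_STORE };
```

Starting from counter = 5, the serial order leaves the value at 5, while the two interleaved orders leave 4 and 6 respectively, which is the inconsistency that a critical section is meant to rule out.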
Structure of process Pi
    while(true) {
        entry section
        critical section
        exit section
        remainder section
    }

A solution to the critical-section problem must satisfy the following three requirements:
  1. Mutual Exclusion. If process Pi is executing in its critical section, then no other processes can be executing in their critical sections.
  2. Progress. If no process is executing in its critical section and there are some processes that wish to enter their critical section, then the selection of the processes that will enter the critical section next cannot be postponed indefinitely.
  3. Bounded Waiting. A bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.
Assume that each process is executing at a nonzero speed.
No assumption is made concerning the relative speed of the n processes.

Initial attempts to solve the problem.
Only 2 processes, P0 and P1.
General structure of process Pi (other process Pj):
    while(true) {
        entry section
        critical section
        exit section
        remainder section
    }
Processes may share some common variables to synchronize their actions.

Algorithm 1
Shared variables:
    int turn=0;
    turn = i -> Pi can enter its critical section
Process Pi
    while(true) {
        while(turn!=i) ;    /* no-op */
        critical section
        turn = j;
        remainder section
    }
Satisfies mutual exclusion, but not progress.

Algorithm 2
Shared variables:
    bool flag[2] = {false, false};
    flag[i] = true -> Pi ready to enter its critical section
Process Pi
    while(true) {
        flag[i] = true;
        while(flag[j]) ;    /* no-op */
        critical section
        flag[i] = false;
        remainder section
    }
Does not satisfy progress: if the two processes set their flags to true at the same time, then they will both wait forever.

Algorithm 3
Combined shared variables of algorithms 1 and 2.
Process Pi
    while(true) {
        flag[i] = true;
        turn = j;
        while (flag[j] && turn==j) ;    /* no-op */
        critical section
        flag[i] = false;
        remainder section
    }
Meets all three requirements; solves the critical-section problem for two processes.

Bakery Algorithm - Critical section for n processes
Before entering its critical section, a process receives a number. The holder of the smallest number enters the critical section.
If processes Pi and Pj receive the same number, then if i < j, Pi is served first; else Pj is served first.
The numbering scheme always generates numbers in non-decreasing order of enumeration. Example: 1,2,3,3,3,3,4,5...

Bakery Algorithm
Shared data
    bool choosing[n] = {false,..};
    int number[n] = {0,..};
Process Pi
    while(true) {
        choosing[i] = true;
        max = 0;
        for (k = 0; k < n; k++)      /* k, not i: i is this process's own index */
            if (max < number[k]) max = number[k];
        number[i] = max + 1;
        choosing[i] = false;
        for (j = 0; j < n; j++) {
            while (choosing[j]) ;    /* wait while Pj is choosing a number */
            while (number[j] != 0 &&
                   (number[j] < number[i] ||
                    (number[j] == number[i] && j < i))) ;
        }
        critical section
        number[i] = 0;
        remainder section
    }

Synchronization Hardware
Test and modify the content of a word atomically.
    bool TestandSet(bool *target) {
        bool t = *target;     /* all this is */
        *target = true;       /* done by one */
        return t;             /* machine instruction */
    }

    void Exchg(bool *a, bool *b) {
        bool temp = *a;       /* all this is */
        *a = *b;              /* done by one */
        *b = temp;            /* machine instruction */
    }

Mutual exclusion algorithm
  o Shared data: bool lock=false;
  o Process Pi
        while(true) {
            while (TestandSet(&lock)) ;    /* no-op */
            critical section
            lock = false;
            remainder section
        }
or
  o Shared data: bool lock=false;
  o Process Pi
        bool key;
        while(true) {
            key = true;
            do {
                Exchg(&lock, &key);
            } while(key);
            critical section
            lock = false;
            remainder section
        }

Semaphore - synchronization tool that does not require busy waiting.
Semaphore S
  o integer variable
  o introduced by Dijkstra
  o can only be accessed via two indivisible (atomic) operations:
        wait(S):   S = S - 1; if S < 0 then block(S)
        signal(S): S = S + 1; if S <= 0 then wakeup(S)
  o sometimes wait and signal are called down and up, or P and V
block(S) - results in suspension of the process invoking it (sometimes called sleep).
wakeup(S) - results in resumption of exactly one process that has invoked block(S).

Example: critical section for n processes
Shared variables:
    semaphore mutex=1;
Process Pi
    while(true) {
        wait(&mutex);
        critical section
        signal(&mutex);
        remainder section
    }

The wait and signal operations must be implemented so that they execute atomically.
Uniprocessor:
  o Disable interrupts around the code segment implementing the wait and signal operations.
Multiprocessor:
  o If no special hardware is provided, use a correct software solution to the critical-section problem, where the critical sections consist of the wait and signal operations.
  o Use special hardware if available, i.e., TestandSet.
Implementation of the wait(S) operation with the TestandSet instruction:
Shared variables:
    boolean lock = false;
Code for wait(S)
    while (TestandSet(&lock)) ;
    S = S - 1;
    if (S < 0) {
        lock = false;
        block(S);
    } else
        lock = false;
Race condition exists! The lock is released before block(S) is called, so a signal issued by another process in between can arrive before this process is actually suspended.

Semaphore can be used as a general synchronization tool:
Execute B in Pj only after A has executed in Pi.
Use a semaphore flag initialized to 0.
Code:
    Pi              Pj
    ...             ...
    A               wait(flag)
    signal(flag)    B

Deadlock - two or more processes are waiting indefinitely for an event that can be caused by only one of the waiting processes.
Let S and Q be two semaphores initialized to 1.
    P0              P1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    ...             ...
    signal(S)       signal(Q)
    signal(Q)       signal(S)
Starvation - indefinite blocking. A process may never be removed from the semaphore queue in which it is suspended.

Two types of semaphores:
  o Counting semaphore - integer value can range over an unrestricted domain.
  o Binary semaphore - integer value can range only between 0 and 1; can be simpler to implement.

Classical Problems of Synchronization
  o Bounded-Buffer Problem
  o Readers and Writers Problem
  o Dining-Philosophers Problem

Bounded-Buffer Problem
Shared data
    typedef .... item;
    item buffer[n];
    semaphore full=0, empty=n, mutex=1;
    item nextp, nextc;
Producer process
    while(true) {
        ... produce an item in nextp ...
        wait(&empty);      /* wait while buffer is full */
        wait(&mutex);
        ... add nextp to buffer ...
        signal(&mutex);
        signal(&full);     /* one more in buffer */
    }
Consumer process
    while(true) {
        wait(&full);       /* wait while no data */
        wait(&mutex);
        ... remove an item from buffer to nextc ...
        signal(&mutex);
        signal(&empty);    /* one less in buffer */
        ... consume the item in nextc ...
    }

Readers-Writers Problem
A number of processes, some reading data, some writing. Any number of processes can read at the same time, but while a writer is writing no other process may access the data.
Shared data
    semaphore mutex=1, wrt=1;
    int readcount=0;
Writer process
    wait(&wrt);
    ... writing is performed ...
    signal(&wrt);
Reader process
    wait(&mutex);
    readcount = readcount + 1;
    if (readcount == 1) wait(&wrt);
    signal(&mutex);
    ... reading is performed ...
    wait(&mutex);
    readcount = readcount - 1;
    if (readcount == 0) signal(&wrt);
    signal(&mutex);

Dining-Philosophers Problem
A problem posed by Dijkstra in 1965.
Possible solution to the problem:
    void philosopher(int no) {
        while(1) {
            ...think....
            take_fork(no);              /* get the left fork */
            take_fork((no+1) % N);      /* get the right fork */
            ....eat.....
            put_fork(no);               /* put left fork down */
            put_fork((no+1) % N);       /* put down right fork */
        }
    }
"take_fork" waits until the specified fork is available and then grabs it.
Unfortunately this solution will not work: what happens if all the philosophers grab their left fork at the same time? Each then waits forever for a right fork that is never released.
Better solution.
Shared data
    int p[N];              /* status of the philosophers */
    semaphore s[N]=0;      /* semaphore for each philosopher */
    semaphore mutex=1;     /* semaphore for mutual exclusion */
Code
    #define LEFT(n)  (((n)+N-1)%N)    /* Macros to give left */
    #define RIGHT(n) (((n)+1)%N)      /* and right around the table */

    void test(int no) {               /* can philosopher 'no' eat */
        if ((p[no] == HUNGRY) &&
            (p[LEFT(no)] != EATING) &&
            (p[RIGHT(no)] != EATING)) {
            p[no] = EATING;
            signal(&s[no]);           /* if so then eat */
        }
    }

    void take_forks(int no) {         /* get both forks */
        wait(&mutex);                 /* only one at a time here please */
        p[no] = HUNGRY;               /* I'm Hungry */
        test(no);                     /* can I eat? */
        signal(&mutex);
        wait(&s[no]);                 /* wait until I can */
    }

    void put_forks(int no) {          /* put the forks down */
        wait(&mutex);                 /* only one at a time here */
        p[no] = THINKING;             /* let me think */
        test(LEFT(no));               /* see if my neighbours can now eat */
        test(RIGHT(no));
        signal(&mutex);
    }

    void philosopher(int no) {
        while(1) {
            ...think....
            take_forks(no);           /* get the forks */
            ....eat.....
            put_forks(no);            /* put forks down */
        }
    }

High-level synchronization constructs

Monitors
High-level synchronization construct that allows the safe sharing of an abstract data type among concurrent processes. (Hoare and Brinch Hansen 1974)
A collection of procedures, variables and data structures.
Only one process can be active in a monitor at any instant.
    monitor example
        integer i;
        condition c;
        procedure producer(x);
        begin
            . . .
        end
        procedure consumer(x);
        begin
            . . .
        end
    end monitor;
To allow a process to wait within the monitor, a condition variable must be declared, as:
    condition x;
Condition variables can only be used with the operations wait and signal.
  o The operation wait(x) means that the process invoking this operation is suspended until another process invokes signal(x).
  o The signal(x) operation resumes exactly one suspended process. If no process is suspended, then the signal operation has no effect.
The producer-consumer problem can be solved as follows using monitors:
    monitor ProducerConsumer
        condition full, empty;
        integer count;

        procedure enter;
        begin
            if count = N then wait(full);
            ...enter item...
            count := count + 1;
            if count = 1 then signal(empty)
        end;

        procedure remove;
        begin
            if count = 0 then wait(empty);
            ...remove item...
            count := count - 1;
            if count = N - 1 then signal(full)
        end;

        count := 0;
    end monitor;

    procedure producer;
    begin
        while true do
        begin
            ...produce item...
            ProducerConsumer.enter
        end
    end;

    procedure consumer;
    begin
        while true do
        begin
            ProducerConsumer.remove;
            ...consume item...
        end
    end;

The dining philosophers problem can also be solved easily.
    monitor dining-philosophers
        status state[n];
        condition self[n];

        procedure pickup (i:integer);
        begin
            state[i] := hungry;
            test (i);
            if state[i] <> eating then wait(self[i]);
        end;

        procedure putdown (i:integer);
        begin
            state[i] := thinking;
            test ((i+4) mod 5);
            test ((i+1) mod 5);
        end;

        procedure test (k:integer);
        begin
            if state[(k+4) mod 5] <> eating
               and state[k] = hungry
               and state[(k+1) mod 5] <> eating then
            begin
                state[k] := eating;
                signal(self[k]);
            end;
        end;

    begin
        for i := 0 to 4 do state[i] := thinking;
    end
    end monitor

    procedure philosopher(no:integer);
    begin
        while true do
        begin
            ...think....
            pickup(no);
            ....eat.....
            putdown(no)
        end
    end

There are very few languages that support constructs such as monitors... expect this to change. One language that does is Java. Here is a Java class that can be used to solve the producer-consumer problem.
    class CubbyHole {
        private int seq;
        private boolean available = false;

        public synchronized int get() {
            while (available == false) {
                try {
                    wait();
                } catch (InterruptedException e) { }
            }
            available = false;
            notify();
            return seq;
        }

        public synchronized void put(int value) {
            while (available == true) {
                try {
                    wait();
                } catch (InterruptedException e) { }
            }
            seq = value;
            available = true;
            notify();
        }
    }

Monitor implementation using semaphores.
What happens when a monitor signals a condition variable? A process waiting on the variable can't be active at the same time as the signaling process, therefore there are 2 choices:
  1. Signaling process waits until the waiting process either leaves the monitor or waits for another condition.
  2. Waiting process waits until the signaling process either leaves the monitor or waits for another condition.
Variables
    semaphore mutex=1, next=0;
    int next_count=0;
'mutex' provides mutual exclusion inside the monitor. 'next' is used to suspend signaling processes. 'next_count' gives the number of processes suspended on 'next'.
Each external procedure F will be replaced by:
    sem_wait(&mutex);
    ... body of F; ...
    if (next_count > 0)
        sem_signal(&next);
    else
        sem_signal(&mutex);
Mutual exclusion within a monitor is ensured by 'mutex'.
For each condition variable x, we have:
    semaphore x_sem=0;
    int x_count=0;
The operation wait(x) can be implemented as:
    x_count = x_count + 1;
    if (next_count > 0)
        sem_signal(&next);
    else
        sem_signal(&mutex);
    sem_wait(&x_sem);
    x_count = x_count - 1;
The operation signal(x) can be implemented as:
    if (x_count > 0) {
        next_count = next_count + 1;
        sem_signal(&x_sem);
        sem_wait(&next);
        next_count = next_count - 1;
    }

Conditional-wait construct
    cond_wait(x,c);
'c' is an integer expression evaluated when the wait operation is executed.
The value of c (priority number) is stored with the name of the process that is suspended.
When signal(x) is executed, the process with the smallest associated priority number is resumed next.
Must check two conditions to establish the correctness of this system:
  o User processes must always make their calls on the monitor in a correct sequence.
  o Must ensure that an uncooperative process does not ignore the mutual-exclusion gateway provided by the monitor, and try to access the shared resource directly, without using the access protocols.
Atomic Transactions
Transaction - program unit that must be executed atomically; that is, either all the operations associated with it are executed to completion, or none are performed.
Must preserve atomicity despite the possibility of failure.
We are concerned here with ensuring transaction atomicity in an environment where failures result in the loss of information on volatile storage.

Log-Based Recovery
Write-ahead log - all updates are recorded on the log, which is kept in stable storage; the log has the following fields:
  o transaction name
  o data item name, old value, new value
The log has a record of <Ti starts>, and either <Ti commits> if the transaction commits, or <Ti aborts> if the transaction aborts.
Recovery algorithm uses two procedures:
  o undo(Ti) - restores the value of all data updated by transaction Ti to the old values. It is invoked if the log contains record <Ti starts>, but not <Ti commits>.
  o redo(Ti) - sets the value of all data updated by transaction Ti to the new values. It is invoked if the log contains both <Ti starts> and <Ti commits>.

Checkpoints - reduce recovery overhead
  1. Output all log records currently residing in volatile storage onto stable storage.
  2. Output all modified data residing in volatile storage to stable storage.
  3. Output log record <checkpoint> onto stable storage.
Recovery routine examines the log to determine the most recent transaction Ti that started executing before the most recent checkpoint took place.
  o Search log backward for first <checkpoint> record.
  o Find subsequent <Ti starts> record.
redo and undo operations need to be applied only to transaction Ti and all transactions Tj that started executing after transaction Ti.

Concurrent Atomic Transactions
Serial schedule - the transactions are executed sequentially in some order.
Example of a serial schedule in which T0 is followed by T1: see the first schedule table below.
Conflicting operations - Oi and Oj conflict if they access the same data item, and at least one of these operations is a write operation.
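The definition of conflicting operations translates directly into a predicate. The sketch below is only an illustration: the struct op representation and the names op_kind and ops_conflict are assumptions made here, not part of the notes.

```c
#include <assert.h>
#include <stdbool.h>

enum op_kind { OP_READ, OP_WRITE };

/* Hypothetical representation of one operation in a schedule. */
struct op {
    int txn;             /* transaction index, e.g. 0 for T0 */
    enum op_kind kind;   /* read or write */
    char item;           /* data item name, e.g. 'A' */
};

/* Two operations conflict if they access the same data item
   and at least one of them is a write. */
bool ops_conflict(struct op a, struct op b)
{
    assert(a.txn != b.txn);   /* the definition compares operations of different transactions */
    return a.item == b.item &&
           (a.kind == OP_WRITE || b.kind == OP_WRITE);
}
```

For instance, write(A) in T0 conflicts with read(A) in T1, whereas two reads of A, or accesses to different items, do not conflict.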
Conflict serialisable schedule - schedule that can be transformed into a serial schedule by a series of swaps of non-conflicting operations.
Serial schedule (T0 followed by T1):
    T0        | T1
    ----------|----------
    read(A)   |
    write(A)  |
    read(B)   |
    write(B)  |
              | read(A)
              | write(A)
              | read(B)
              | write(B)
Example of a concurrent serialisable schedule:
    T0        | T1
    ----------|----------
    read(A)   |
    write(A)  |
              | read(A)
              | write(A)
    read(B)   |
    write(B)  |
              | read(B)
              | write(B)

Locking protocol governs how locks are acquired and released; a data item can be locked in the following modes:
  o Shared: If Ti has obtained a shared-mode lock on data item Q, then Ti can read this item, but it cannot write Q.
  o Exclusive: If Ti has obtained an exclusive-mode lock on data item Q, then Ti can both read and write Q.
Two-phase locking protocol
  o Growing phase: A transaction may obtain locks, but may not release any lock.
  o Shrinking phase: A transaction may release locks, but may not obtain any new locks.
The two-phase locking protocol ensures conflict serializability, but does not ensure freedom from deadlock.

Timestamp-ordering scheme - transaction ordering protocol for determining serialisability order.
  o With each transaction Ti in the system, associate a unique fixed timestamp, denoted by TS(Ti).
  o If Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj).
Implement by assigning two timestamp values to each data item Q.
  o W-timestamp(Q) - denotes the largest timestamp of any transaction that executed write(Q) successfully.
  o R-timestamp(Q) - denotes the largest timestamp of any transaction that executed read(Q) successfully.
Example of a schedule possible under the timestamp protocol:
    T0        | T1
    ----------|----------
    read(B)   |
              | read(B)
              | write(B)
    read(A)   |
              | read(A)
              | write(A)
There are schedules that are possible under the two-phase locking protocol but are not possible under the timestamp protocol, and vice versa.
The timestamp-ordering protocol ensures conflict serializability; conflicting operations are processed in timestamp order.

DEADLOCKS
  System Model
  Deadlock Characterization
  Methods for Handling Deadlocks
  Deadlock Prevention
  Deadlock Avoidance
  Deadlock Detection
  Recovery from Deadlock
  Combined Approach to Deadlock Handling

The Deadlock Problem
A set of blocked processes each holding a resource and waiting to acquire a resource held by another process in the set.
Example
  o System has 2 tape drives.
  o P1 and P2 each hold one tape drive and each needs another one.
Example
Semaphores A and B, initialized to 1.
    P0          P1
    wait(A)     wait(B)
    wait(B)     wait(A)
Example: bridge crossing
  o Traffic only in one direction.
  o Each section of a bridge can be viewed as a resource.
  o If a deadlock occurs, it can be resolved if one car backs up (preempt resources and rollback).
  o Several cars may have to be backed up if a deadlock occurs.
  o Starvation is possible.

System Model
Resource types R1, R2, ..., Rm
Examples of resource types - CPU cycles, memory space, I/O devices
Each resource type Ri has Wi instances, e.g. 2 CPUs, 1 Floppy Disk, 2 Hard Disks.
Each process utilizes a resource (using system calls) as follows:
  o request
  o use
  o release

Deadlock Characterization - deadlock can arise if four conditions hold simultaneously.
Mutual exclusion: only one process at a time can use a resource.
Hold and wait: a process holding at least one resource is waiting to acquire additional resources held by other processes.
No preemption: a resource can be released only voluntarily by the process holding it, after that process has completed its task.
Circular wait: there exists a set {P0, P1, ..., Pn} of waiting processes such that P0 is waiting for a resource that is held by P1, P1 is waiting for a resource that is held by P2, ..., Pn-1 is waiting for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0.
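The circular-wait condition can be checked mechanically in the simple case where each waiting process waits on exactly one other process. The sketch below assumes a hypothetical waits_for array (not from the notes) in which waits_for[i] is the index of the process holding the resource that Pi wants, or -1 if Pi is not waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if some chain Pi -> Pj -> ... leads back to Pi.
   Assumes every non-negative entry of waits_for is a valid index below n. */
bool has_circular_wait(const int waits_for[], int n)
{
    assert(waits_for != 0 && n > 0);
    for (int start = 0; start < n; start++) {
        int cur = start;
        /* A cycle through 'start' has at most n edges, so n hops suffice. */
        for (int hops = 0; hops < n; hops++) {
            cur = waits_for[cur];
            if (cur < 0)
                break;           /* chain ends at a process that is not waiting */
            if (cur == start)
                return true;     /* the chain closed on the starting process */
        }
    }
    return false;
}
```

With three processes, {1, 2, 0} describes P0 waiting on P1, P1 on P2 and P2 on P0, which is a circular wait; {1, 2, -1} is a plain chain that ends at P2 and is not.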
Resource-Allocation Graph - a diagram showing allocations
A set of vertices V and a set of edges E.
V is partitioned into two types:
  o P = {P1, P2, ..., Pn}, the set consisting of all the processes in the system.
  o R = {R1, R2, ..., Rm}, the set consisting of all resource types in the system.
request edge - directed edge Pi -> Rj
assignment edge - directed edge Rj -> Pi
[Figure legend: process; resource type with 4 instances; Pi requests an instance of Rj; Pi is holding an instance of Rj.]
[Figures: example of a resource-allocation graph with no cycles; example of a resource-allocation graph with a cycle.]
If graph contains no cycles -> no deadlock.
If graph contains a cycle ->
  o if only one instance per resource type, then deadlock.
  o if several instances per resource type, possibility of deadlock.
e.g. R={1r1,2r2,1r3}, E={(p1,r1),(p2,r3),(r1,p2),(r2,p2),(r2,p1),(r3,p3),(p3,r2)}
e.g. R={2r1,2r2}, E={(p1,r1),(r1,p2),(r1,p3),(r2,p1),(p3,r2),(r2,p4)}

Methods for Handling Deadlocks
  o Ensure that the system will never enter a deadlock state. (traffic lights)
  o Allow the system to enter a deadlock state and then recover. (back up cars)
  o Ignore the problem and pretend that deadlocks never occur in the system; used by most operating systems, including UNIX.

Deadlock Prevention - restrain the ways resource requests can be made.
Mutual Exclusion - not required for sharable resources; must hold for nonsharable resources.
Hold and Wait - must guarantee that whenever a process requests a resource, it does not hold any other resources.
  o Require process to request and be allocated all its resources before it begins execution, or allow process to request resources only when the process has none.
  o Low resource utilization; starvation possible.
No Preemption
  o If a process that is holding some resources requests another resource that cannot be immediately allocated to it, then all resources currently being held are released.
  o Preempted resources are added to the list of resources for which the process is waiting.
  o Process will be restarted only when it can regain its old resources, as well as the new ones that it is requesting.
Circular Wait - impose a total ordering of all resource types, and require that each process requests resources in an increasing order of enumeration.

Deadlock Avoidance - requires that the system has some additional a priori information available.
Simplest and most useful model requires that each process declare the maximum number of resources of each type that it may need.
The deadlock-avoidance algorithm dynamically examines the resource-allocation state to ensure that there can never be a circular-wait condition.
Resource-allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes.

Safe State - when a process requests an available resource, the system must decide if immediate allocation leaves the system in a safe state.
System is in a safe state if there exists a safe sequence of all processes.
Sequence <P1, P2, ..., Pn> is safe if for each Pi, the resources that Pi can still request can be satisfied by the currently available resources plus the resources held by all the Pj, with j < i.
  o If Pi's resource needs are not immediately available, then Pi can wait until all Pj have finished.
  o When Pj is finished, Pi can obtain needed resources, execute, return allocated resources, and terminate.
  o When Pi terminates, Pi+1 can obtain its needed resources, and so on.
If a system is in a safe state -> no deadlocks.
If a system is in an unsafe state -> possibility of deadlock.
Avoidance -> ensure that a system will never enter an unsafe state.
e.g. 12 instances of a resource.
                                             p0   p1   p2
    Max Needs                                10    4    9
    Current Needs (instances currently held)  5    2    2
The system is safe because <p1, p0, p2> satisfies the safety condition.
The following diagram shows how deadlock can occur. At point t, any move upwards would enter an unsafe state.
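For a single resource type, the safe-sequence definition can be checked with a short loop. The following is a minimal sketch, assuming the "Current Needs" row of the 12-instance example is the number of instances each process currently holds; the function name is_safe_state and the MAX_PROCS limit are introduced here for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 16   /* arbitrary limit for this sketch */

/* Returns true if the processes can all finish in some order, each one
   obtaining its remaining need (max_need - held) from what is available
   and then returning everything it holds. */
bool is_safe_state(int total, const int max_need[], const int held[], int n)
{
    bool finished[MAX_PROCS] = { false };
    int available = total;

    assert(n > 0 && n <= MAX_PROCS);
    for (int i = 0; i < n; i++)
        available -= held[i];
    assert(available >= 0);

    for (int done = 0; done < n; ) {
        bool progressed = false;
        for (int i = 0; i < n; i++) {
            if (!finished[i] && max_need[i] - held[i] <= available) {
                available += held[i];   /* process i finishes and releases */
                finished[i] = true;
                done++;
                progressed = true;
            }
        }
        if (!progressed)
            return false;               /* nobody can finish: unsafe */
    }
    return true;
}
```

With 12 instances, maximum needs (10, 4, 9) and holdings (5, 2, 2), 3 instances are free and the check succeeds, consistent with the sequence <p1, p0, p2>. If p2 were given one more instance, so that the holdings become (5, 2, 3), only p1 could finish and the state would be unsafe.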
Resource-Allocation Graph Algorithm
Claim edge Pi -> Rj indicates that process Pi may request resource Rj; represented by a dashed line.
Claim edge converts to request edge when a process requests a resource.
When a resource is released by a process, assignment edge reconverts to a claim edge.
Resources must be claimed a priori in the system.
Example: E={(r1,p1)}, C={(p1,r2),(p2,r1),(p2,r2)}
  o no cycles -> system is safe now.
  o if p2 requests r2 and the request is granted -> system is unsafe.

Banker's Algorithm (Dijkstra 1965)
Multiple resource types.
Each process must a priori claim maximum use.
When a process requests a resource it may have to wait.
When a process gets all its resources it must return them in a finite amount of time.
Data Structures for the Banker's algorithm, where n = number of processes and m = number of resource types.
  o Available: Vector of length m. If Available[j] = k, there are k instances of resource type Rj available.
  o Max: n x m matrix. If Max[i,j] = k, then process Pi may request at most k instances of resource type Rj.
  o Allocation: n x m matrix. If Allocation[i,j] = k, then Pi is currently allocated k instances of Rj.
  o Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj to complete its task. Need[i,j] = Max[i,j] - Allocation[i,j].
Example: a banker has 10 thousand dollars and four customers: Florence, Dougal, Dylan and Zebedee. Each customer has a maximum need and starts owing nothing.
    Name       Used  Max
    Florence     0    6
    Dougal       0    5
    Dylan        0    4
    Zebedee      0    7
    Available = 10
Safe.
    Name       Used  Max
    Florence     1    6
    Dougal       1    5
    Dylan        2    4
    Zebedee      4    7
    Available = 2
Safe, because any requests for loans, except to Dylan, can wait until Dylan repays his loan.
    Name       Used  Max
    Florence     1    6
    Dougal       2    5
    Dylan        2    4
    Zebedee      4    7
    Available = 1
Unsafe, since if all customers ask for their maximum, none will get it, causing deadlock.

Safety Algorithm
1. Let Work and Finish be vectors of length m and n, respectively.
   Initialize: Work := Available; Finish[i] := false for i = 1, 2, ..., n.
2. Find an i such that both:
   1. Finish[i] = false
   2. Need_i <= Work (every element of Need_i is <= the corresponding element of Work)
   If no such i exists, go to step 4.
3. Work := Work + Allocation_i; Finish[i] := true; go to step 2.
4. If Finish[i] = true for all i, then the system is in a safe state.
May require an order of m x n^2 operations to decide whether a state is safe.

Resource-Request Algorithm for process Pi
Request_i = request vector for process Pi. If Request_i[j] = k, then process Pi wants k instances of resource type Rj.
1. If Request_i <= Need_i, go to step 2. Otherwise, raise error condition, since process has exceeded its maximum claim.
2. If Request_i <= Available, go to step 3. Otherwise, Pi must wait, since resources are not available.
3. Pretend to allocate requested resources to Pi by modifying the state as follows:
     Available := Available - Request_i;
     Allocation_i := Allocation_i + Request_i;
     Need_i := Need_i - Request_i;
   o If safe -> the resources are allocated to Pi.
   o If unsafe -> Pi must wait, and the old resource-allocation state is restored.

Example of Banker's algorithm
5 processes P0 through P4; 3 resource types A (10 instances), B (5 instances), and C (7 instances).
Snapshot at time T0:
          Allocation     Max        Need     Available
           A  B  C     A  B  C    A  B  C    A  B  C
    P0     0  1  0     7  5  3    7  4  3    3  3  2
    P1     2  0  0     3  2  2    1  2  2
    P2     3  0  2     9  0  2    6  0  0
    P3     2  1  1     2  2  2    0  1  1
    P4     0  0  2     4  3  3    4  3  1
Sequence <P1, P3, P4, P2, P0> satisfies safety criteria.
P1 now requests resources. Request_1 = (1,0,2).
  o Check that Request_1 <= Available (that is, (1,0,2) <= (3,3,2)) -> true.
          Allocation     Need     Available
           A  B  C     A  B  C    A  B  C
    P0     0  1  0     7  4  3    2  3  0
    P1     3  0  2     0  2  0
    P2     3  0  2     6  0  0
    P3     2  1  1     0  1  1
    P4     0  0  2     4  3  1
  o Executing safety algorithm shows that sequence <P1, P3, P4, P0, P2> satisfies safety requirement.
From this state, can request for (3,3,0) by P4 be granted?
From this state, can request for (0,2,0) by P0 be granted?

Deadlock Detection
Allow system to enter deadlock state
Detection algorithm
Recovery scheme

Single Instance of Each Resource Type
Maintain wait-for graph
  o Nodes are processes.
  o Pi -> Pj if Pi is waiting for Pj.
Periodically invoke an algorithm that searches for a cycle in the graph.
An algorithm to detect a cycle in a graph requires an order of n^2 operations, where n is the number of vertices in the graph.

Several Instances of a Resource Type
Data structures
  o Available: A vector of length m indicates the number of available resources of each type.
  o Allocation: An n x m matrix defines the number of resources of each type currently allocated to each process.
  o Request: An n x m matrix indicates the current request of each process. If Request[i,j] = k, then process Pi is requesting k more instances of resource type Rj.

Detection Algorithm
1. Let Work and Finish be vectors of length m and n, respectively. Initialize: Work := Available. For i = 1, 2, ..., n, if Allocation_i <> 0, then Finish[i] := false; otherwise, Finish[i] := true.
2. Find an index i such that both:
   1. Finish[i] = false.
   2. Request_i <= Work.
   If no such i exists, go to step 4.
3. Work := Work + Allocation_i; Finish[i] := true; go to step 2.
4. If Finish[i] = false, for some i, 1 <= i <= n, then the system is in a deadlock state. Moreover, if Finish[i] = false, then Pi is deadlocked.
Algorithm requires an order of m x n^2 operations to detect whether the system is in a deadlocked state.

Example of Detection algorithm
Five processes P0 through P4; three resource types A (7 instances), B (2 instances), and C (6 instances).
Snapshot at time T0:
          Allocation    Request   Available
           A  B  C     A  B  C    A  B  C
    P0     0  1  0     0  0  0    0  0  0
    P1     2  0  0     2  0  2
    P2     3  0  3     0  0  0
    P3     2  1  1     1  0  0
    P4     0  0  2     0  0  2
Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i.
P2 requests an additional instance of type C.
           Request
           A  B  C
    P0     0  0  0
    P1     2  0  2
    P2     0  0  1
    P3     1  0  0
    P4     0  0  2
State of system?
  o Can reclaim resources held by process P0, but insufficient resources to fulfill other processes' requests.
  o Deadlock exists, consisting of processes P1, P2, P3, and P4.

Detection-Algorithm Usage
When, and how often, to invoke depends on:
  o How often is a deadlock likely to occur?
  o How many processes will need to be rolled back? (one for each disjoint cycle)
If detection algorithm is invoked arbitrarily, there may be many cycles in the resource graph and so we would not be able to tell which of the many deadlocked processes "caused" the deadlock.

Recovery from Deadlock
Process termination
  o Abort all deadlocked processes.
  o Abort one process at a time until the deadlock cycle is eliminated.
  o In which order should we choose to abort?
      Priority of the process.
      How long process has computed, and how much longer to completion.
      Resources the process has used.
      Resources process needs to complete.
      How many processes will need to be terminated.
      Is process interactive or batch?
Resource Preemption
  o Selecting a victim - minimize cost.
  o Rollback - return to some safe state, restart process from that state.
  o Starvation - same process may always be picked as victim; include number of rollbacks in cost factor.

Combined Approach to Deadlock Handling
Combine the three basic approaches (prevention, avoidance, and detection), allowing the use of the optimal approach for each class of resources in the system.
Partition resources into hierarchically ordered classes.
Use most appropriate technique for handling deadlocks within each class.

MEMORY MANAGEMENT
  o Background
  o Logical versus Physical Address Space
  o Swapping
  o Contiguous Allocation
  o Paging
  o Segmentation
  o Segmentation with Paging

Background
Program must be brought into memory and placed within a process for it to be executed.
User programs go through several steps before being executed.
Address binding of instructions and data to memory addresses can happen at three stages:
  o Compile time: If memory location known a priori, absolute code can be generated; must recompile code if starting location changes.
  o Load time: Must generate relocatable code if memory location is not known at compile time.
  o Execution time: Binding delayed until run time if the process can be moved during its execution from one memory segment to another. Need hardware support for address maps (e.g., base and limit registers).
Dynamic Loading - routine is not loaded until it is called.
  o Better memory-space utilization; unused routine is never loaded.
  o Useful when large amounts of code are needed to handle infrequently occurring cases.
  o No special support from the operating system is required; implemented through program design.
Dynamic Linking - linking postponed until execution time.
  o Small piece of code, stub, used to locate the appropriate memory-resident library routine.
  o Stub replaces itself with the address of the routine, and executes the routine.
  o Operating system needed to check if routine is in the process's memory address space.
Overlays - keep in memory only those instructions and data that are needed at any given time.
  o Needed when process is larger than amount of memory allocated to it.
  o Implemented by user, no special support needed from operating system; programming design of overlay structure is complex.

Logical versus Physical Address Space
The concept of a logical address space that is bound to a separate physical address space is central to proper memory management.
  o Logical address - generated by the CPU; also referred to as virtual address.
  o Physical address - address seen by the memory unit.
Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding scheme.
Memory-management unit (MMU) - hardware device that maps virtual to physical address.
In MMU scheme, the value in a relocation register is added to every address generated by a user process at the time it is sent to memory.
The user program deals with logical addresses; it never sees the real physical addresses.

Swapping
A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution.
Backing store - fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images.
Major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.
Modified versions of swapping are found on many systems, e.g., UNIX and Windows 95.
[Figure: schematic view of swapping]

Contiguous Allocation
Main memory is usually divided into two partitions:
  o Resident operating system, often held in low memory with interrupt vector.
  o User processes then held in high memory.
Single-partition allocation
  o Relocation-register scheme used to protect user processes from each other, and from changing operating-system code and data.
  o Relocation register contains value of smallest physical address; limit register contains range of logical addresses - each logical address must be less than the limit register.
Multiple-partition allocation
  o Hole - block of available memory; holes of various sizes are scattered throughout memory.
  o When a process arrives, it is allocated memory from a hole large enough to accommodate it.
  [Figure: example of multiple-partition allocation]
  Operating system maintains information about:
  o allocated partitions
  o free partitions (holes)
Dynamic storage-allocation problem - how to satisfy a request of size n from a list of free holes.
  o First-fit: Allocate the first hole that is big enough.
  o Best-fit: Allocate the smallest hole that is big enough; must search entire list, unless ordered by size. Produces the smallest leftover hole.
  o Worst-fit: Allocate the largest hole; must also search entire list. Produces the largest leftover hole.
First-fit and best-fit are better than worst-fit in terms of speed and storage utilization.
External fragmentation - total memory space exists to satisfy a request, but it is not contiguous.
Internal fragmentation - allocated memory may be slightly larger than requested memory; difference between these two numbers is memory internal to a partition, but not being used.
Reduce external fragmentation by compaction.
  o Shuffle memory contents to place all free memory together in one large block.
  o Compaction is possible only if relocation is dynamic, and is done at execution time.
  o I/O problem
      Latch job in memory while it is involved in I/O.
      Do I/O only into OS buffers.

Paging - logical address space of a process can be noncontiguous; process is allocated physical memory wherever the latter is available.
Divide physical memory into fixed-sized blocks called frames (size is power of 2, between 512 bytes and 8192 bytes).
Divide logical memory into blocks of same size called pages.
Keep track of all free frames.
To run a program of size n pages, need to find n free frames and load program.
Set up a page table to translate logical to physical addresses.
No external fragmentation but internal fragmentation.
Address generated by CPU is divided into:
  o Page number (p) - used as an index into a page table which contains base address of each page in physical memory.
  o Page offset (d) - combined with base address to define the physical memory address that is sent to the memory unit.
Separation between user's view of memory and actual physical memory reconciled by address translation hardware; logical addresses are translated into physical addresses.

Implementation of page table
Page table is kept in main memory.
Page-table base register (PTBR) points to the page table.
Page-table length register (PTLR) indicates size of the page table.
In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction.
The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative registers or translation look-aside buffers (TLBs).
Associative registers - parallel search
[Figure: associative registers holding (page number, frame number) pairs]
Address translation (A', A'')
  o If A' is in an associative register, get frame number out.
  o Otherwise get frame number from page table in memory.
Hit ratio - percentage of times that a page number is found in the associative registers; ratio related to number of associative registers.
Effective Access Time (EAT)
  o associative lookup = e time units
  o memory cycle time = m time units
  o hit ratio = a
  EAT = (m + e) a + (2m + e)(1 - a) = 2m + e - a m
  (with m = 1 time unit this reduces to 2 + e - a)
Memory protection implemented by associating protection bits with each frame.
Valid-invalid bit attached to each entry in the page table:
  o "valid" indicates that the associated page is in the process' logical address space, and is thus a legal page.
  o "invalid" indicates that the page is not in the process' logical address space.
Write bit attached to each entry in the page table.
  o pages which have not been written may be shared between processes
  o do not need to be swapped - can be reloaded.

Multilevel Paging - partitioning the page table allows the operating system to leave partitions unused until a process needs them.
A two-level page-table scheme
A logical address (on 32-bit machine with 4K page size) is divided into:
  o a page number consisting of 20 bits.
  o a page offset consisting of 12 bits.
Since the page table is paged, the page number is further divided into:
  o a 10-bit page number.
  o a 10-bit page offset.
Thus, a logical address is as follows:
    | p1 (10 bits) | p2 (10 bits) | d (12 bits) |
where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table.
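The 10/10/12 split described above can be sketched in Python. This is only an illustration: the names split_address and translate are hypothetical, and the page tables are modelled as nested dictionaries rather than memory-resident tables, which is an assumption made to keep the sketch short.

```python
PAGE_OFFSET_BITS = 12   # 4K pages
INNER_BITS = 10         # p2: displacement within a page of the outer page table
OUTER_BITS = 10         # p1: index into the outer page table


def split_address(addr):
    """Split a 32-bit logical address into (p1, p2, d)."""
    d = addr & ((1 << PAGE_OFFSET_BITS) - 1)
    p2 = (addr >> PAGE_OFFSET_BITS) & ((1 << INNER_BITS) - 1)
    p1 = (addr >> (PAGE_OFFSET_BITS + INNER_BITS)) & ((1 << OUTER_BITS) - 1)
    return p1, p2, d


def translate(addr, outer_table):
    """Walk the two levels and return the physical address.

    outer_table maps p1 -> inner table; each inner table maps p2 -> frame number.
    A missing entry raises KeyError, standing in for an invalid reference.
    """
    p1, p2, d = split_address(addr)
    frame = outer_table[p1][p2]
    return (frame << PAGE_OFFSET_BITS) | d


# Hypothetical mapping: outer entry 1, inner entry 3 -> frame 7.
tables = {1: {3: 7}}
print(split_address(0x00403ABC))            # -> (1, 3, 2748)
print(hex(translate(0x00403ABC, tables)))   # -> 0x7abc
```

Each dictionary lookup corresponds to one memory access in the real scheme, which is why a two-level walk costs two table accesses before the data access.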
[Figure: address-translation scheme for a two-level 32-bit paging architecture]
Multilevel paging and performance
  o Since each level is stored as a separate table in memory, converting a logical address to a physical one with 4-level paging may take four memory accesses, followed by the access to the data itself.
  o Even though time needed for one memory access is quintupled (4-level paging), caching permits performance to remain reasonable.
  o Cache hit rate of 98 percent, memory access of 100 ns, TLB lookup 20 ns, 4-level paging:
      effective access time = 0.98 x 120 + 0.02 x 520 = 128 nanoseconds
    (120 = 20 + 100 on a hit; 520 = 20 + 5 x 100 on a miss), which is only a 28 percent slowdown in memory access time.

Inverted Page Table
One entry for each real page of memory; entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page.
Decreases memory needed to store each page table, but increases time needed to search the table when a page reference occurs.
Use hash table to limit the search to one - or at most a few - page-table entries.

Shared pages
One copy of read-only (reentrant) code shared among processes (e.g., text editors, compilers, window systems).

Segmentation - memory-management scheme that supports user view of memory.
A program is a collection of segments. A segment is a logical unit such as:
  o code
  o local variables
  o global variables
  o stack
[Figure: example of segmentation]
Logical address consists of a two-tuple: <segment-number, offset>
A segment table maps two-dimensional user-defined addresses into one-dimensional physical addresses; each entry in the table has:
  o base - contains the starting physical address where the segment resides in memory.
  o limit - specifies the length of the segment.
Segment-table base register (STBR) points to the segment table's location in memory.
Segment-table length register (STLR) indicates number of segments used by a program; segment number s is legal if s < STLR.
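The base/limit lookup above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name translate_segment and the sample table values are hypothetical, the table length plays the role of STLR, and the check that the offset is below the limit is the usual rule, stated here as an assumption since the notes only define limit as the segment length.

```python
def translate_segment(segment_table, s, offset):
    """Map a logical address <s, offset> to a physical address.

    segment_table is a list of (base, limit) pairs; its length stands in for STLR.
    """
    if not 0 <= s < len(segment_table):       # segment number s is legal if s < STLR
        raise IndexError("illegal segment number")
    base, limit = segment_table[s]
    if not 0 <= offset < limit:               # offset must fall inside the segment
        raise ValueError("offset beyond segment limit")
    return base + offset


# Hypothetical segment table: (base, limit) for segments 0 and 1.
table = [(1400, 1000), (6300, 400)]
print(translate_segment(table, 1, 53))   # -> 6353
```

An out-of-range segment number or offset raises an exception, which stands in for the trap that hardware would deliver to the operating system.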
Sharing
  o shared segments
  o same segment number
Protection - with each entry in segment table associate:
  o validation bit = 0 -> illegal segment
  o read/write/execute privileges
Allocation
  o first fit/best fit
  o external fragmentation
Protection bits associated with segments; code sharing occurs at segment level.
Since segments vary in length, memory allocation is a dynamic storage-allocation problem.

Segmentation with Paging
The Intel Pentium uses segmentation with paging for memory management, with a two-level paging scheme.
Considerations in comparing memory-management strategies:
  o Hardware support
  o Performance
  o Fragmentation
  o Relocation
  o Swapping
  o Sharing
  o Protection

VIRTUAL MEMORY
  o Background
  o Demand Paging
  o Performance of Demand Paging
  o Page Replacement
  o Page-Replacement Algorithms
  o Allocation of Frames
  o Thrashing
  o Other Considerations
  o Demand Segmentation

Background
Virtual memory - separation of user logical memory from physical memory.
  o Only part of the program needs to be in memory for execution.
  o Logical address space can therefore be much larger than physical address space.
  o Need to allow pages to be swapped in and out.
Virtual memory can be implemented via:
  o Demand paging
  o Demand segmentation

Demand Paging
Bring a page into memory only when it is needed.
  o Less I/O needed
  o Less memory needed
  o Faster response
  o More users
Page is needed => reference to it
  o invalid reference => abort
  o not-in-memory => bring to memory
Valid-Invalid bit
With each page table entry a valid-invalid bit is associated (1 = in-memory, 0 = not-in-memory).
Initially valid-invalid bit is set to 0 on all entries.
[Figure: example of a page table snapshot]
During address translation, if valid-invalid bit in page table entry is 0 => page fault.

Page Fault
1. If there is ever a reference to a page, first reference will trap to OS -> page fault.
2. OS looks at another table to decide:
   a) Invalid reference => abort.
   b) Just not in memory.
3. Get empty frame.
4. Swap page into frame.
5. Reset tables, validation bit = 1.
6. Restart instruction:
   o block move
   o auto increment/decrement location
What happens if there is no free frame?
Page replacement - find some page in memory, but not really in use, swap it out.
  o algorithm
  o performance - want an algorithm which will result in minimum number of page faults.
Same page may be brought into memory several times.

Performance of Demand Paging
Page Fault Rate 0 <= p <= 1.0
  o if p = 0, no page faults
  o if p = 1, every reference is a fault
Effective Access Time (EAT)
  EAT = (1 - p) x memory access + p x (page fault overhead + [swap page out] + swap page in + restart overhead)
Example:
  o memory access time = 1 microsecond
  o 50% of the time the page that is being replaced has been modified and therefore needs to be swapped out.
  o Swap Page Time = 10 msec = 10,000 microseconds
  o Expected swap cost per fault = 10,000 + 0.5 x 10,000 = 15,000 microseconds
  o EAT = (1 - p) x 1 + p x 15000, approximately 1 + 15000p (in microseconds)

Page Replacement
Prevent over-allocation of memory by modifying page-fault service routine to include page replacement.
Use modify (dirty) bit to reduce overhead of page transfers - only modified pages are written to disk.
Page replacement completes separation between logical memory and physical memory - large virtual memory can be provided on a smaller physical memory.

Page-Replacement Algorithms
Want lowest page-fault rate.
Evaluate algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string.
In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.

First-In-First-Out (FIFO) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process); successive contents of each frame:
    frame 1:  1  4  5
    frame 2:  2  1  3
    frame 3:  3  2  4
    9 page faults
4 frames:
    frame 1:  1  5  4
    frame 2:  2  1  5
    frame 3:  3  2
    frame 4:  4  3
    10 page faults
FIFO Replacement - Belady's Anomaly: more frames does not necessarily mean fewer page faults.

Optimal Algorithm
Replace the page that will not be used for the longest period of time.
4 frames example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
    frame 1:  1  4
    frame 2:  2
    frame 3:  3
    frame 4:  4  5
    6 page faults
How do you know this?
Used for measuring how well an algorithm performs.

Least Recently Used (LRU) Algorithm
4 frames, same reference string:
    frame 1:  1  5
    frame 2:  2
    frame 3:  3  5  4
    frame 4:  4  3
    8 page faults
Counter implementation
  o Every page entry has a counter; every time page is referenced through this entry, copy the clock into the counter.
  o When a page needs to be changed, look at the counters to determine which page to change.
Stack implementation - keep a stack of page numbers in a doubly linked form:
  o Page referenced: move it to the top; requires 6 pointers to be changed.
  o No search for replacement.

LRU Approximation Algorithms
Reference bit
  o With each page associate a bit, initially = 0.
  o When page is referenced, bit set to 1.
  o Replace the one which is 0 (if one exists). We do not know the order, however.
Second chance
  o Need reference bit.
  o Clock replacement.
  o If page to be replaced (in clock order) has reference bit = 1, then:
      a) set reference bit 0.
      b) leave page in memory.
      c) replace next page (in clock order), subject to same rules.
Counting Algorithms - keep a counter of the number of references that have been made to each page.
  o LFU Algorithm: replaces page with smallest count.
  o MFU Algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
Page-Buffering Algorithm - desired page is read into a free frame from the pool before the victim is written out.

Allocation of Frames
Each process needs minimum number of pages.
Example: IBM 370 - 6 pages to handle SS MOVE instruction:
  a) Instruction is 6 bytes, might span 2 pages.
  b) 2 pages to handle from.
  c) 2 pages to handle to.
Two major allocation schemes:
  o fixed allocation
  o priority allocation
Fixed allocation
  o Equal allocation - if 100 frames and 5 processes, give each 20 frames.
  o Proportional allocation - allocate according to the size of process.
      s_i = size of process p_i
      S = sum(s_i)
      m = total number of frames
      a_i = allocation for p_i = (s_i / S) x m
    Example: m = 64, s_1 = 10, s_2 = 127
      a_1 = 10/137 x 64, approximately 5
      a_2 = 127/137 x 64, approximately 59
Priority allocation
  o Use a proportional allocation scheme using priorities rather than size.
  o If process Pi generates a page fault, either select for replacement one of its frames, or select for replacement a frame from a process with lower priority number.
Global versus local allocation
  o Global replacement - process selects a replacement frame from the set of all frames; one process can take a frame from another.
  o Local replacement - each process selects from only its own set of allocated frames.

Thrashing
If a process does not have "enough" pages, the page-fault rate is very high:
  o low CPU utilization.
  o operating system thinks that it needs to increase the degree of multiprogramming.
  o another process added to the system.
Thrashing = a process is busy swapping pages in and out.
Why does paging work? Locality model
  o Process migrates from one locality to another.
  o Localities may overlap.
Why does thrashing occur? sum(size of locality) > total memory size

Working-Set Model
Δ = working-set window = a fixed number of page references. Example: 10,000 instructions
WSS_i - working set of process Pi = total number of pages referenced in the most recent Δ (varies in time)
  o If Δ too small, will not encompass entire locality.
  o If Δ too large, will encompass several localities.
  o If Δ = infinity => will encompass entire program.
D = sum(WSS_i) = total demand frames
If D > m => thrashing.
Policy: if D > m, then suspend one of the processes.
How do you keep track of the working set? Approximate with: interval timer + a reference bit
Example: Δ = 10,000
  o Timer interrupts after every 5000 time units.
  o Keep in memory 2 bits for each page.
  o Whenever a timer interrupts, copy the reference bits and then set the values of all reference bits to 0.
  o If one of the bits in memory = 1 => page in working set.
Not completely accurate (why?)
Improvement = 10 bits and interrupt every 1000 time units

Page-Fault Frequency Scheme
Establish "acceptable" page-fault rate.
  o If actual rate too low, process loses frame.
  o If actual rate too high, process gains frame.

Other Considerations
1. Prepaging
2. Page size selection
   - fragmentation
   - table size
   - I/O overhead
   - locality
3. Program structure
   - Array A[1024,1024] of integer
   - Each row is stored in one page
   - One frame
   - Program 1
       for j := 1 to 1024 do
         for i := 1 to 1024 do
           A[i,j] := 0;
     1024 x 1024 page faults
   - Program 2
       for i := 1 to 1024 do
         for j := 1 to 1024 do
           A[i,j] := 0;
     1024 page faults
4. I/O interlock and addressing

Demand Segmentation - used when insufficient hardware to implement demand paging.
OS/2 allocates memory in segments, which it keeps track of through segment descriptors.
Segment descriptor contains a valid bit to indicate whether the segment is currently in memory.
  o If segment is in main memory, access continues.
  o If not in memory, segment fault.

FILE-SYSTEM INTERFACE
  o File Concept
  o Access Methods
  o Directory Structure
  o Protection

File Concept
Contiguous logical address space
Types:
  o Data: numeric, character, binary
  o Program: source, object (load image)
  o Documents
File Structure
None - sequence of words, bytes
Simple record structure
  o Lines
  o Fixed length
  o Variable length
Complex Structures
  o Formatted document
  o Relocatable load file
Can simulate last two with first method by inserting appropriate control characters.
Who decides:
  o Operating system
  o Program
File Attributes
  o Name - only information kept in human-readable form.
  o Type - needed for systems that support different types.
  o Location - pointer to file location on device.
  o Size - current file size.
  o Protection - controls who can do reading, writing, executing.
  o Time, date, and user identification - data for protection, security, and usage monitoring.
Information about files is kept in the directory structure, which is maintained on the disk.
File Operations
  o create
  o write
  o read
  o reposition within file - file seek
  o delete
  o truncate
Access Methods
  o Sequential Access
  o Direct Access

Directory Structure - a collection of nodes containing information about all files.
Both the directory structure and the files reside on disk.
Backups of these two structures are kept on tapes.
Organize the directory (logically) to obtain:
  o Efficiency - locating a file quickly.
  o Naming - convenient to users.
      Two users can have same name for different files.
      The same file can have several different names.
  o Grouping - logical grouping of files by properties, e.g., all Pascal programs, all games, ...
Single-Level Directory - a single directory for all users.
  o Naming problem
  o Grouping problem
Two-Level Directory - separate directory for each user.
  o Path name
  o Can have the same file name for different users
  o Efficient searching
  o No grouping capability
Tree-Structured Directories
  o Efficient searching
  o Grouping capability
  o Current directory (working directory)
      cd /avi/books/os
      type ch1
  o Absolute or relative path name
  o Creating a new file is done in current directory.
  o Delete a file
      rm <file-name>
  o Creating a new subdirectory is done in current directory.
      mkdir <dir-name>
    Example: if in current directory /avi/books
      mkdir modula
  o Deleting "books" => deleting the entire subtree rooted by "books".
Acyclic-Graph Directories - have shared subdirectories and files.
  o Two different names (aliasing)
  o If A deletes D => dangling pointer. Solutions:
      Backpointers, so we can delete all pointers. Variable-size records are a problem.
      Backpointers using a daisy-chain organization.
      Entry-hold-count solution.
General Graph Directory
How do we guarantee no cycles?
  o Allow only links to files, not subdirectories.
  o Garbage collection.
  o Every time a new link is added, use a cycle-detection algorithm to determine whether it is OK.
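The last option, checking each new link for cycles, can be sketched in Python. This is one possible check among several, shown under stated assumptions: the directory graph is modelled as a dictionary from a directory name to the names it links to, and the function name creates_cycle and the sample names are hypothetical.

```python
def creates_cycle(graph, src, dst):
    """Return True if adding the link src -> dst would create a cycle.

    graph maps each directory to the list of entries it already links to.
    A cycle appears exactly when src is already reachable from dst.
    """
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


# Hypothetical directory graph: / -> avi, jim; avi -> books.
graph = {"/": ["avi", "jim"], "avi": ["books"], "books": []}
print(creates_cycle(graph, "books", "avi"))   # -> True  (books -> avi -> books)
print(creates_cycle(graph, "jim", "books"))   # -> False (sharing, but no cycle)
```

The search visits each directory at most once per new link, so the cost grows with the size of the subtree reachable from the link target.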
Protection
File owner/creator should be able to control:
  o what can be done
  o by whom
Types of access
  o Read
  o Write
  o Execute
  o Append
  o Delete
  o List
Access Lists and Groups
Mode of access: read, write, execute
Three classes of users
                            RWX
  a) owner access    7  =>  111
  b) group access    6  =>  110
  c) public access   1  =>  001
Ask manager to create a group (unique name), say G, and add some users to the group.
For a particular file (say game) or subdirectory, define an appropriate access.
    chmod 761 game      (7 = owner, 6 = group, 1 = public)
Attach a group to a file
    chgrp G game

FILE-SYSTEM IMPLEMENTATION
  o File-System Structure
  o Allocation Methods
  o Free-Space Management
  o Directory Implementation
  o Efficiency and Performance
  o Recovery

File-System Structure
File structure
  o Logical storage unit
  o Collection of related information
File system resides on secondary storage (disks).
File system organized into layers.
File control block - storage structure consisting of information about a file.

Contiguous Allocation - each file occupies a set of contiguous blocks on the disk.
Simple - only starting location (block #) and length (number of blocks) are required.
Random access.
Wasteful of space (dynamic storage-allocation problem).
Files cannot grow.
Mapping from logical to physical. If A is the logical address, and Q and R are the quotient and remainder when A is divided by the block size (512), then:
    Q = A div 512
    R = A mod 512
  o Block to be accessed = Q + starting address
  o Displacement into block = R

Linked Allocation - each file is a linked list of disk blocks; blocks may be scattered anywhere on the disk.
[Figure: each block holds a pointer to the next block, followed by data]
Allocate as needed, link together.
[Figure: example of a file that starts at block 9]
Simple - need only starting address
Free-space management system - no waste of space
No random access
Mapping
    Q = A div 511
    R = A mod 511
  o Block size is smaller to allow space for the pointer.
  o Block to be accessed is the Qth block in the linked chain of blocks representing the file.
  o Displacement into block = R + 1
File-allocation table (FAT) - disk-space allocation used by Windows 95.

Indexed Allocation - brings all pointers together into the index block.
Need index table
Random access
Dynamic access without external fragmentation, but have overhead of index block.
Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words. We need only 1 block for index table.
    Q = A div 512
    R = A mod 512
  o Q = displacement into index table
  o R = displacement into block
Mapping from logical to physical in a file of unbounded length (block size of 512 words).
Linked scheme - link blocks of index tables (no limit on size).
    Q1 = A div (512*511)
    R1 = A mod (512*511)
  o Q1 = block of index table
  o R1 is used as follows:
      Q2 = R1 div 512
      R2 = R1 mod 512
      Q2 = displacement into block of index table
      R2 = displacement into block of file
Two-level index (maximum file size is 512^3)
    Q1 = A div (512*512)
    R1 = A mod (512*512)
  o Q1 = displacement into outer-index
  o R1 is used as follows:
      Q2 = R1 div 512
      R2 = R1 mod 512
      Q2 = displacement into block of index table
      R2 = displacement into block of file
Combined scheme: UNIX (4K bytes per block)
  o directly accessed: 48K bytes
  o single indirection: 2^22 bytes
  o double indirection: 2^32 bytes

Free-Space Management
Bit vector (n blocks)
  o Block number calculation:
      (number of bits per word) * (number of 0-value words) + offset of first 1 bit
  o Bit map requires extra space. E.g.:
      block size = 2^12 bytes
      disk size = 2^30 bytes (1 gigabyte)
      n = 2^30 / 2^12 = 2^18 bits (256K bits)
  o Easy to get contiguous files
Need to protect:
  o Pointer to free list
  o Bit map
      Must be kept on disk.
      Copy in memory and disk may differ.
      Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk.
      Solution: Set bit[i] = 1 on disk. Allocate block[i]. Set bit[i] = 1 in memory.
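The block-number calculation for the bit vector can be sketched in Python. This is an illustration under stated assumptions, not the notes' own code: a 1 bit marks a free block (the convention implied by the formula, which skips 0-value words), words are 32 bits wide, the offset is counted from the most significant bit, and the names first_free_block and BITS_PER_WORD are hypothetical.

```python
BITS_PER_WORD = 32  # assumed word size


def first_free_block(words):
    """Return the number of the first free block, or None if no block is free.

    words is the bit vector as a list of 32-bit integers; a 1 bit marks a free
    block, and bit 0 of each word is taken to be its most significant bit.
    """
    for zero_words, word in enumerate(words):
        if word != 0:
            # Offset of the first 1 bit, counted from the most significant end.
            offset = BITS_PER_WORD - word.bit_length()
            # (bits per word) * (number of 0-value words) + offset of first 1 bit
            return BITS_PER_WORD * zero_words + offset
    return None


# Two fully allocated words, then a word whose first 1 bit is at offset 8.
print(first_free_block([0, 0, 0x00800000]))   # -> 72

# Size of the bit map for the example disk: 2^30-byte disk, 2^12-byte blocks.
n_bits = 2**30 // 2**12
print(n_bits, n_bits // 8)                    # -> 262144 32768
```

The second calculation shows the extra space the notes mention: 2^18 bits, which is 32K bytes for a 1-gigabyte disk.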
Linked list (free list)
  o Cannot get contiguous space easily
  o No waste of space
Grouping
Counting

Directory Implementation
Linear list of file names with pointers to the data blocks.
  o simple to program
  o time-consuming to execute
Hash Table - linear list with hash data structure.
  o decreases directory search time
  o collisions - situations where two file names hash to the same location
  o fixed size

Efficiency and Performance
Efficiency dependent on:
  o disk allocation and directory algorithms
  o types of data kept in file's directory entry
Performance
  o disk cache - separate section of main memory for frequently used blocks
  o free-behind and read-ahead - techniques to optimize sequential access
  o improve PC performance by dedicating section of memory as virtual disk, or RAM disk

Recovery
Consistency checker - compares data in directory structure with data blocks on disk, and tries to fix inconsistencies.

SECONDARY-STORAGE STRUCTURE
Disk Structure
Disk Scheduling
Disk Management
Swap-Space Management
Disk Reliability
Stable-Storage Implementation

Disk Structure
A disk can be viewed as an array of blocks. There exists a mapping scheme from logical block address Bi to physical address (track, sector).
  o Smallest storage allocation area is a block.
  o Internal fragmentation on block.

Disk Scheduling
Disk Requests - Track/Sector
  o Seek
  o Latency
  o Transfer
Minimize Seek Time
Seek Time is proportional to Seek Distance
A number of different algorithms exist. We illustrate them with a request queue (0-199).
  98, 183, 37, 122, 14, 124, 65, 67
  Head pointer 53
FCFS
SSTF
SCAN
C-SCAN
LOOK
C-LOOK

Disk Management
Disk formatting
  o physical
  o logical
Boot block initializes system.
Need methods to detect and handle bad blocks.

Swap-Space Management
Swap-space use
Swap-space location
  o normal file system
  o separate disk partition
Swap-space management
  o 4.3BSD allocates swap space when process starts (holds text segment and data segment).
  o Kernel uses swap maps to track swap-space use.

Disk Reliability
Disk striping
RAID
  o Mirroring or shadowing keeps duplicate of each disk.
  o Block interleaved parity.

Stable-Storage Implementation
Write-ahead log scheme requires stable storage.
To implement stable storage:
  o Replicate information on more than one nonvolatile storage media with independent failure modes.
  o Update information in a controlled manner to ensure that failure during data transfer does not damage information.

PROTECTION
Goals of Protection
Domain of Protection
Access Matrix
Implementation of Access Matrix
Revocation of Access Rights
Capability-Based Systems
Language-Based Protection

Goals of Protection
Operating system consists of a collection of objects, hardware or software.
Each object has a unique name and can be accessed through a well-defined set of operations.
Protection problem - ensure that each object is accessed correctly and only by those processes that are allowed to do so.

Domain Structure
Access-right = <object-name,rights-set>
Rights-set is a subset of all valid operations that can be performed on the object.
Domain = set of access-rights
  AR1 = <file_A,{Read,Write}>
  AR8 = <file_A,{Read}>

Domain Implementation
Simple Operating System consists of 2 domains:
  o user
  o supervisor
UNIX
  o Domain = user-id
  o Domain switch accomplished via file system.
      Each file has associated with it a domain bit (setuid bit).
      When file is executed and setuid = on, then user-id is set to owner of the file being executed. When execution completes user-id is reset.
Multics Rings - Let Di and Dj be any two domain rings. If j < i => Di is a subset of Dj.

Access Matrix
Rows - domains
Columns - domains + objects
Each entry - Access rights

Use of Access Matrix
If a process in Domain Di tries to do "op" on object Oj, then "op" must be in the access matrix.
Can be expanded to dynamic protection.
  o Operations to add, delete access rights.
  o Special access rights:
      owner of Oj
      control
      switch from domain Di to Dj
Access matrix design separates mechanism from policy.
  o Mechanism - operating system provides Access-matrix + rules. It ensures that the matrix is only manipulated by authorized agents and that rules are strictly enforced.
  o Policy - user dictates policy. Who can access what object and in what mode.

Implementation of Access Matrix
Each column = Access-control list for one object. Defines who can perform what operation.
  Domain 1 = Read,Write
  Domain 2 = Read
  Domain 3 = Read
  ...
Each row = Capability list (like a key). For each domain, what operations allowed on what objects.
  Object 1 - Read
  Object 4 - Read,Write,Execute
  Object 5 - Read,Write,Delete,Copy

Revocation of Access Rights
Access List - Delete access rights from access list.
  o simple
  o immediate
Capability List - Scheme required to locate capability in the system before capability can be revoked.
  o Reacquisition
  o Back-pointers

Language-Based Protection
Specification of protection in a programming language allows the high-level description of policies for the allocation and use of resources.
Language implementation can provide software for protection enforcement when automatic hardware-supported checking is unavailable.
Interpret protection specifications to generate calls on whatever protection system is provided by the hardware and the operating system.

SECURITY
The Security Problem
Authentication
Program Threats
System Threats
Threat Monitoring
Encryption

The Security Problem
Security must consider external environment of the system, and protect it from:
  o unauthorized access.
  o malicious modification or destruction.
  o accidental introduction of inconsistency.
Easier to protect against accidental than malicious misuse.

Authentication
User identity most often established through passwords, which can be considered a special case of either keys or capabilities.
Passwords must be kept secret.
  o Frequent change of passwords.
  o Use of "non-guessable" passwords (not in dictionary or username - 1st letter).
  o Log all invalid access attempts.

Program Threats
Trojan Horse
  o Code segment that misuses its environment.
  o Exploits mechanisms for allowing programs written by users to be executed by other users.
Trap Door
  o Specific user identifier or password that circumvents normal security procedures.
  o Could be included in a compiler or the kernel.

System Threats
Worms - use spawn mechanism; standalone program.
Internet worm
  o Exploited UNIX networking features (remote access) and bugs in finger and sendmail programs.
  o Grappling hook program uploaded main worm program.
Viruses - fragment of code embedded in a legitimate program.
  o Mainly affect microcomputer systems.
  o Spread by downloading viral programs from public bulletin boards or exchanging floppy disks containing an infection.
  o Safe computing.

Threat Monitoring
Check for suspicious patterns of activity - e.g., several incorrect password attempts may signal password guessing.
Audit log - records the time, user, and type of all accesses to an object; useful for recovery from a violation and developing better security measures.
Scan the system periodically for security holes; done when the computer is relatively unused.
Check for:
  o Short or easy-to-guess passwords
  o Unauthorized set-uid programs
  o Unauthorized programs in system directories
  o Unexpected long-running processes
  o Improper directory protections
  o Improper protections on system data files
  o Dangerous entries in the program search path (Trojan horse)
  o Changes to system programs; monitor checksum values

Encryption
Encrypt clear text into cipher text.
Properties of good encryption technique:
  o Relatively simple for authorized users to encrypt and decrypt data.
  o Encryption scheme depends not on the secrecy of the algorithm but on a parameter of the algorithm called the encryption key.
  o Extremely difficult for an intruder to determine the encryption key.
Data Encryption Standard substitutes characters and rearranges their order on the basis of an encryption key provided to authorized users via a secure mechanism. Scheme only as secure as the mechanism.
Public-key encryption based on each user having two keys:
  o public key - published key used to encrypt data.
  o private key - key known only to individual user used to decrypt data.
Must be an encryption scheme that can be made public without making it easy to figure out the decryption scheme.
  o Efficient algorithm for testing whether or not a number is prime.
  o No efficient algorithm is known for finding the prime factors of a number.
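
The two-key idea can be made concrete with a toy example. The notes do not name a specific algorithm; the sketch below assumes RSA, which rests on exactly the two facts listed above, and uses deliberately tiny primes (61 and 53) that offer no real security. The function names make_keys, encrypt and decrypt are hypothetical, and the modular inverse via pow requires Python 3.8 or later.

```python
from math import gcd


def make_keys(p, q, e):
    """Build a toy RSA key pair from two primes p, q and a public exponent e."""
    n = p * q                      # modulus, published as part of both keys
    phi = (p - 1) * (q - 1)        # computable only if the factors of n are known
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)*(q-1)")
    d = pow(e, -1, phi)            # private exponent: inverse of e modulo phi
    return (e, n), (d, n)


def encrypt(message, public_key):
    """Encrypt an integer message (0 <= message < n) with the public key."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(cipher, private_key):
    """Recover the integer message with the private key."""
    d, n = private_key
    return pow(cipher, d, n)


# Assumed toy parameters; real keys use primes hundreds of digits long.
public_key, private_key = make_keys(61, 53, 17)

if __name__ == "__main__":
    cipher = encrypt(65, public_key)
    print(cipher, decrypt(cipher, private_key))
```

Anyone may learn (e, n), but deriving d requires phi, and therefore the prime factors of n. With 61 and 53 that is trivial; the scheme depends on n being far too large to factor.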