59.305 - Operating Systems
INTRODUCTION
What is an operating system?
Early Systems
Simple Batch Systems
Multiprogramming Batched Systems
Time-Sharing Systems
Personal-Computer Systems
Parallel Systems
Distributed Systems
Real-Time Systems
What is an Operating System?
A program that acts as an intermediary between a user of a computer and the
computer hardware.
A systems program which controls all the computer's resources and provides a
base upon which application programs can be written.
Operating system goals:
o Execute user programs and make solving user problems easier.
o Make the computer system convenient to use.
o Use the computer hardware in an efficient manner.
Computer System Components
1. Hardware - provides basic computing resources (CPU, memory, I/O devices).
2. Operating system - controls and coordinates the use of the hardware among the
various application programs for the various users.
3. Applications programs - define the ways in which the system resources are used
to solve the computing problems of the users (compilers, database systems, video
games, business programs).
4. Users (people, machines, other computers).
Operating System Functions
o Resource allocator - manages and allocates resources.
o Control program - controls the execution of user programs and operation of I/O devices.
o Kernel - the one program running at all times (all else being application programs).
Early Systems - bare machine (early 1950s) - First
Generation.
Structure
o Large machines run from console
o Single user system
o Programmer/User as operator
o Paper tape or punched cards
Early Software
o Assemblers
o Loaders
o Linkers
o Libraries of common subroutines
o Compilers
o Device drivers
Secure
Inefficient use of expensive resources
o Low CPU utilization
o Significant amount of setup time
Simple Batch Systems - Second Generation.
Use an operator (somebody to work the machine)
Add a card reader (a device to read programs written on punched cards)
Reduce setup time by batching similar jobs
Automatic job sequencing - automatically transfers control from one job to
another. First rudimentary operating system.
Resident monitor
o initial control in monitor
o control transfers to job
o when job completes control transfers back to monitor
Problems:
1. How does the monitor know about the nature of the job (e.g., Fortran versus
Assembly) or which program to execute?
2. How does the monitor distinguish
a) job from job?
b) data from program?
Solution: introduce control cards
Control Cards

Special cards that tell the resident monitor which programs to run.

Parts of resident monitor
o Control card interpreter - responsible for reading and carrying out instructions on the cards.
o Loader - loads systems programs and applications programs into memory.
o Device drivers - know special characteristics and properties for each of the system's I/O devices.
Problem: Slow Performance - since I/O and CPU could not overlap, and card
reader very slow.
Solution: Off-line operation - speed up computation by loading jobs into memory
from tapes and card reading and line printing done off-line using smaller
machines.
Advantage of off-line operation - main computer not constrained by the speed of
the card readers and line printers, but only by the speed of faster magnetic tape
units.
No changes need to be made to the application programs to change from direct to
off-line I/O operation.
Real gain - possibility of using multiple reader-to-tape and tape-to-printer systems
for one CPU.
Spooling (Simultaneous Peripheral Operation On-Line) - overlap the I/O of one job
with the computation of another job; a simple form of multiprogramming.


While executing one job, the operating system:
o reads the next job from the card reader into a storage area on the disk (job
queue).
o outputs the printout of previous job from disk to the line printer.
Job pool - data structure that allows the operating system to select which job to
run next, in order to increase CPU utilization.
Multiprogramming and Time Sharing - Third Generation
Multiprogramming

Several jobs are kept in main memory at the same time, and the CPU is shared
between them. Each job is called a process.
OS Features Needed for Multiprogramming
o I/O routine supplied by the system.
o Memory management - the system must allocate the memory to several jobs.
o CPU scheduling - the system must choose among several jobs ready to run.
o Allocation of devices.
Time-Sharing Systems - Interactive Computing
Most efficient for many users to share a large computer.
The CPU is shared between several processes.
Each process belongs to a user and I/O is to/from a separate terminal for each
user.
On-line file system must be available for users to access data and code.
Personal-Computer Systems - Fourth Generation
Personal computers - computer system dedicated to a single user.
I/O devices - keyboards, mice, display screens, small printers.
User convenience and responsiveness.
Can adopt technology developed for larger operating systems; often individuals
have sole use of computer and do not need advanced CPU utilization or protection
features.
Parallel Systems - multiprocessor systems with more
than one CPU in close communication.
Tightly coupled system - processors share memory and a clock; communication
usually takes place through the shared memory.
Advantages of parallel systems:
o Increased throughput
o Economical
o Increased reliability
Symmetric multiprocessing
o Each processor runs an identical copy of the operating system.
o Many processes can run at once without performance deterioration.
Asymmetric multiprocessing
o Each processor is assigned a specific task; master processor schedules and
allocates work to slave processors.
o More common in extremely large systems.
Distributed Systems - distribute the computation among
several physical processors.


Loosely coupled system - each processor has its own local memory; processors
communicate with one another through various communication lines, such as
high-speed networks.
Advantages of distributed systems:
o Resource sharing
o Computation speed up - load sharing
o Reliability
o Communication
Real-Time Systems



Often used as a control device in a dedicated application such as controlling
scientific experiments, medical imaging systems, industrial control systems, and
some display systems.
Well-defined fixed-time constraints.
OS must be able to respond very quickly.
COMPUTER-SYSTEM STRUCTURES



Computer-System Operation
I/O Structure
Storage Structure



Storage Hierarchy
Hardware Protection
General System Architecture
Computer-System Operation
I/O devices and the CPU can operate concurrently.
Each device controller is in charge of a particular device type.
Each device controller has a local buffer.
CPU moves data from/to main memory to/from the local buffers.
I/O is from the device to local buffer of controller.
Device controller informs CPU that it has finished its operation by causing an
interrupt.
Interrupts
Types
1. Hardware - Asynchronous
Device informs CPU that something has happened e.g. a key has been pressed on
the keyboard.
2. Hardware - Synchronous
CPU has tried to do something that has caused the interrupt, e.g. tried to read from
an invalid memory location (not always a problem; it may mean that the page is
on disk and needs to be fetched). Often called an Exception or Trap.
3. Software
CPU asked for the interrupt to happen. e.g. to perform an OS Call. Often called a
Trap.
Hardware Interrupts



I/O devices use Asynchronous Hardware Interrupts (i.e. caused by outside world
and may happen at any time).
Transfers control to the interrupt service routine, through the interrupt vector,
which contains the addresses of all the service routines.
CPU must save the address of the interrupted instruction.
Interrupt Handling
Interrupt handling is a very important part of the OS.
The operating system must preserve the state of the CPU by storing all registers.
Determine which type of interrupt has occurred:
o polling - ask each device if it caused the interrupt.
o vectored interrupt system - device identifies itself when it causes the
interrupt.
Separate segments of code determine what action should be taken for each type of
interrupt.
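A minimal sketch in C of how vectored dispatch might look; the handler names and the simulated interrupt in main are illustrative, not any real OS's code:

    #include <stdio.h>

    typedef void (*isr_t)(void);

    static void timer_isr(void)    { puts("timer tick"); }
    static void keyboard_isr(void) { puts("key pressed"); }
    static void disk_isr(void)     { puts("disk transfer done"); }

    /* The interrupt vector: a table of service-routine addresses,
       indexed by the number the interrupting device supplies. */
    static isr_t interrupt_vector[] = { timer_isr, keyboard_isr, disk_isr };

    static void dispatch(int vector_number) {
        /* by now the CPU has saved the interrupted instruction's address */
        interrupt_vector[vector_number]();   /* transfer to the handler */
    }

    int main(void) {
        dispatch(1);   /* simulate a keyboard interrupt */
        return 0;
    }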
I/O Calls
Blocking I/O

User program requests I/O, control returns to user program only upon I/O
completion.
o CPU may be allocated to another process.
Non-Blocking I/O

After I/O starts, control returns to user program without waiting for I/O
completion.
Direct Memory Access (DMA) Structure



Used for high-speed I/O devices able to transmit information at close to memory
speeds.
Device controller transfers blocks of data from buffer storage directly to main
memory without CPU intervention.
Only one interrupt is generated per block, rather than one interrupt per byte.
Storage Structure



Main memory - the only large storage medium that the CPU can access directly.
Secondary storage - extension of main memory that provides large non-volatile
storage capacity.
Magnetic disks
o Disk surface is logically divided into tracks, which are subdivided into sectors.
o The disk controller determines the logical interaction between the device and the computer.
Storage Hierarchy



Storage systems can be organized in a hierarchy:
o speed
o cost
o volatility
Most programs make accesses to memory which are localised
o in time
i.e. the program spends a lot of time executing short sections of code.
o in space
i.e. the program reads and writes to certain memory locations a lot; these
locations tend to be close together.
Caching - copying information into faster storage system; main memory can be
viewed as a fast cache for secondary memory.
Hardware Protection
Dual-Mode Operation
I/O Protection
Memory Protection
CPU Protection
Dual-Mode Operation


Sharing system resources requires the operating system to ensure that an incorrect
program cannot cause other programs to execute incorrectly.
Provide hardware support to differentiate between at least two modes of
operations.
o User mode - execution done on behalf of a user.
o Monitor mode (also supervisor mode or system mode) - execution done on behalf of the operating system.
Mode bit added to computer hardware (in CPU flags) to indicate the current mode: monitor (0) or user (1).
When an interrupt or fault occurs, hardware switches to monitor mode.


Certain Privileged instructions can be issued only in monitor mode.
Some CPUs have more complex protection mechanisms with many levels of
protection (sometimes called rings).
I/O Protection


All I/O instructions are privileged instructions.
Must ensure that a user program can never gain control of the computer in monitor mode.
Memory Protection



Must provide memory protection at least for the interrupt vector and the interrupt
service routines.
In order to have memory protection, add two registers that determine the range of
legal addresses a program may access:
o base register - holds the smallest legal physical memory address.
o limit register - contains the size of the range.
Memory outside the defined range is protected.
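A sketch of the check the hardware applies to every user-mode memory reference; the base and limit values here are illustrative:

    #include <stdbool.h>
    #include <stdio.h>

    /* Registers loaded by the OS (via privileged instructions). */
    static unsigned base  = 300040;   /* smallest legal address */
    static unsigned limit = 120900;   /* size of the legal range */

    /* The comparison the protection hardware makes; an address that
       fails it causes a trap to the operating system. */
    static bool address_ok(unsigned addr) {
        return addr >= base && addr < base + limit;
    }

    int main(void) {
        printf("%d\n", address_ok(300040));  /* 1: first legal address */
        printf("%d\n", address_ok(420940));  /* 0: just past the range -> trap */
        return 0;
    }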
Protection hardware
When executing in monitor mode, the operating system has unrestricted access to
both monitor and users' memory.
The load instructions for the base and limit registers are privileged instructions.
In practice, memory protection is much more complicated than this. A device
called a Memory Management Unit (MMU) controls access to memory.
CPU Protection - how does the OS stay in control?
Timer - interrupts computer after specified period to ensure operating system
maintains control.
o Timer is decremented every clock tick.
o When timer reaches the value 0, an interrupt occurs.
Timer used to implement multiprogramming.
Timer also used to compute the current time.
Load-timer is a privileged instruction.
User programs cannot disable interrupts.
General-System Architecture


Given that I/O instructions are privileged, how does the user program perform
I/O?
System call - the method used by a process to request action by the operating
system.
o Usually takes the form of a trap (software interrupt).
o Control passes through an interrupt vector to a service routine in the OS,
and the mode bit is automatically set to supervisor mode.
o The OS verifies that the parameters are correct and legal, executes the
request, and returns control to the instruction following the system call.
OPERATING-SYSTEM STRUCTURES
System Components
Operating-System Services
System Calls
System Programs
System Structure
Virtual Machines
System Design and Implementation
Booting
Most operating systems support the following types of
system components:
Process Management
Main-Memory Management
Secondary-Storage Management
I/O System Management
File Management
Protection System
Networking
Command-Interpreter System
Process Management



A process is a program in execution. A process needs certain resources, including
CPU time, memory, files, and I/O devices, to accomplish its task.
The operating system is responsible for the following activities in connection with
process management:
o process creation and deletion.
o process suspension and resumption (scheduling).
o provision of mechanisms for:
 process synchronization
 process communication
Process management is usually performed by the kernel.
Main-Memory Management

Memory is a large array of words or bytes, each with its own address. It is a
repository of quickly accessible data shared by the CPU and I/O devices.

The operating system is responsible for the following activities in connection with
memory management:
o Keep track of which parts of memory are currently being used and by
whom.
o Decide which processes to load when memory space becomes available.
o Allocate and deallocate memory space as needed.
e.g. the C function 'malloc' (or 'New' in Pascal) allocates a specified
amount of memory; this happens via an OS call. The functions 'free'(C)
and 'Dispose'(Pascal) deallocate this memory.
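For example, in C:

    #include <stdlib.h>
    #include <stdio.h>

    int main(void) {
        /* ask for room for 100 ints; the C library obtains the
           memory from the OS on our behalf */
        int *p = malloc(100 * sizeof(int));
        if (p == NULL) return 1;    /* allocation can fail */
        p[0] = 42;
        printf("%d\n", p[0]);
        free(p);                    /* give the memory back */
        return 0;
    }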
I/O System Management


The I/O system consists of:
o A buffer-caching system
o A general device-driver interface
o Drivers for specific hardware devices (device drivers)
Device Drivers
o Must have access to I/O hardware
o Must be able to handle interrupts.
o Communicate with other parts of the OS (File system, Networking etc).
File Management


A file is a collection of related information.
Commonly, files represent programs (both source and object forms) and data.
Files may also be used to represent devices (e.g. lpt1: in DOS).
The operating system is responsible for the following activities in connection with
file management:
o File creation and deletion.
o Directory creation and deletion.
o Support of primitives for manipulating files and directories.
o Mapping files onto secondary storage.
e.g. free space allocation.
o File backup on stable (non-volatile) storage media.
Protection System


Protection refers to a mechanism for controlling access by programs, processes, or
users to both system and user resources.
Operating Systems commonly control access by using permissions. All system
resources have an owner and a permission associated with them. Users may be
combined into groups for the purpose of protection.
e.g. in UNIX every file has an owner and a group.
The following is a listing of all the information about a file.
rwxr-xr--  martin  staff  382983 Jan 18 10:20  notes305.html
The first field is the protection information; it shows the permissions for the
owner, then the group, then everybody else.
The first rwx means that the owner has read, write and execute permissions.
The next r-x means that the group has read and execute permissions.
The next r-- means that all other users have only read permission.

The name of the owner of the file is martin; the name of the group for the file is
staff; the length of the file is 382983 bytes; the file was created on Jan 18 at
10:20 and the name of the file is: notes305.html
There is usually a special user corresponding to the system administrator; this user
has permission to do anything. On UNIX systems this user is called root.
Networking (Distributed Systems)
A distributed system is a collection of processors that do not share memory or a
clock. Each processor has its own local memory.
The processors in the system are connected through a communication network.
A distributed system provides user access to various system resources.
Access to a shared resource allows:
o Computation speed-up
o Increased data availability
o Enhanced reliability
Command-Interpreter System


Many commands are given to the operating system by control statements which
deal with:
o process creation and management (e.g. running a program)
o I/O handling (e.g. set terminal type)
o secondary-storage management (e.g. format a disk)
o main-memory management (e.g. specify virtual memory parameters)
o file-system access (e.g. print file)
o protection (e.g. set permissions)
o networking (e.g. set IP address)
The program that reads and interprets control statements is called variously:
o command-line interpreter
o shell (in UNIX)
Its function is to get and execute the next command statement.
Some operating systems have no command line interpreter and use a GUI for all
system administration (e.g. NT).
Operating-System Services
Program execution - ability to load a program into memory and to run it.
I/O operations - since user programs cannot execute I/O operations directly, the
operating system must provide some means to perform I/O.
File-system manipulation - capability to read, write, create, and delete files.
Communications - exchange of information between processes executing either
on the same computer or on different systems tied together by a network.
Implemented via shared memory or message passing.
Error detection - ensure correct computing by detecting errors in the CPU and
memory hardware, in I/O devices, or in user programs.
Additional operating-system functions exist not for helping the user, but rather for
ensuring efficient system operation.



Resource allocation - allocating resources to multiple users or multiple processes
running at the same time.
Accounting - keep track of and record which users use how much and what kinds
of computer resources for account billing or for accumulating usage statistics.
Protection - ensuring that all access to system resources is controlled.
System Calls


System calls provide the interface between a running program and the operating
system.
o Generally available as an assembly-language instruction to generate a
software interrupt. (e.g. INT 21h in DOS)
o Systems programming languages such as C allow system calls to be made
directly.
Three general methods are used to pass parameters between a running program
and the operating system:
o Pass parameters in registers.
o Store the parameters in a table in memory, and the table address is passed
as a parameter in a register.
o Push (store) the parameters onto the stack by the program, and pop off the
stack by the operating system.
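As a concrete illustration, on Linux the syscall() wrapper exposes the trap directly (the C library normally hides it behind functions such as write()); the call number and parameters end up in registers:

    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void) {
        const char msg[] = "hello via trap\n";
        /* system call number, then its parameters: fd, buffer, length */
        syscall(SYS_write, 1, msg, sizeof msg - 1);
        return 0;
    }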
System Programs

System programs provide a convenient environment for program development
and execution. They can be divided into:
o File manipulation
o Status information
o File modification
o Programming-language support
o
o
o

Program loading and execution
Communications
Application programs
Most users' view of the operating system is defined by system programs, not the
actual system calls.
System Structure - Simple Approach


MS-DOS - written to provide the most functionality in the least space; it was not
divided into modules. MS-DOS has some structure, but its interfaces and levels of
functionality are not well separated.
UNIX - limited by hardware functionality, the original UNIX operating system
had limited structuring. The UNIX OS consists of two separable parts:
o the systems programs.
o the kernel, which consists of everything below the system-call interface
and above the physical hardware. Provides the file system, CPU
scheduling, memory management, and other operating-system functions; a
large number of functions for one level.
Often this is called a Monolithic Kernel
Many modern operating systems use a Microkernel - The kernel provides
only the following minimal services.
1. Interprocess communication.
2. Memory management.
3. Low level process management.
4. Low Level I/O
All other services are provided by user level processes.
System Structure - Layered Approach



The operating system is divided into a number of layers (levels), each built on top
of lower layers. The bottom layer (layer 0) is the hardware; the highest (layer N)
is the user interface.
With modularity, layers are selected such that each uses functions (operations)
and services of only lower-level layers.
A layered design was first used in the THE operating system of Dijkstra in 1968.
Its six layers are as follows:
_______________________________________________
Level 5: user programs
_______________________________________________
Level 4: buffering for input and output devices
_______________________________________________
Level 3: operator-console device driver
_______________________________________________
Level 2: memory management
_______________________________________________
Level 1: CPU scheduling
_______________________________________________
Level 0: hardware
_______________________________________________
Virtual Machines
A virtual machine takes the layered approach to its logical conclusion. It treats
hardware and the operating system kernel as though they were all hardware.
A virtual machine provides an interface identical to the underlying bare hardware.
The operating system creates the illusion of multiple processes, each executing on
its own processor with its own (virtual) memory.
The resources of the physical computer are shared to create the virtual machines.
o CPU scheduling can create the appearance that users have their own
processor.
o Spooling and a file system can provide virtual I/O.
o A terminal serves as the virtual machine console.
Advantages and Disadvantages of Virtual Machines



The virtual-machine concept provides complete protection of system resources
since each virtual machine is isolated from all other virtual machines. This
isolation, however, permits no direct sharing of resources.
A virtual-machine system is a perfect vehicle for operating-systems research and
development. System development is done on the virtual machine, instead of on a
physical machine and so does not disrupt normal system operation.
The virtual machine concept is difficult to implement due to the effort required to
provide an exact duplicate of the underlying machine.
System Design Goals


User goals - operating system should be convenient to use, easy to learn, reliable,
safe, and fast.
System goals - operating system should be easy to design, implement, and
maintain, as well as flexible, reliable, error-free, and efficient.
Mechanisms and Policies


Mechanisms determine how to do something; policies decide what will be done.
The separation of policy from mechanism is a very important principle; it allows
maximum flexibility if policy decisions are to be changed later.
System Implementation
Traditionally written in assembly language, operating systems are now mostly
written in higher level languages.
Code written in a high-level language:
o can be written faster.
o is more compact.
o is easier to understand and debug.
An operating system is far easier to port (move to some other hardware) if it is
written in a high level language.
The first OS to be written in a high-level language was UNIX; at the time there
were very few suitable languages, so a new one was developed from an existing
language called B - it was called C.
Booting
Modern operating systems are designed to run on machines with a wide range of
different hardware.
Booting - starting a computer by loading the kernel.
Bootstrap program - code stored in ROM that is able to locate the kernel, load it
into memory, and start its execution.
Once the kernel is loaded it must identify all the hardware present in the machine
and load relevant device drivers.
PROCESSES
Process Concepts
Process Scheduling
Processes Creation and Termination
Cooperating Processes
Threads
Interprocess Communication
Process Concepts
An operating system executes a variety of programs:
o Batch system - jobs
o Time-shared systems - user programs or tasks
Textbook uses the terms job and process almost interchangeably.
Process - a program in execution; process execution must progress in a sequential
fashion.
A process includes:
o program counter
o stack
o data section
As a process executes, it changes state.
o New: The process is being created.
o Running: Instructions are being executed.
o Waiting: The process is waiting for some event to occur.
o Ready: The process is waiting to be assigned to a processor.
o Terminated: The process has finished execution.
Diagram of process states: new -> ready on admission; ready -> running when scheduled; running -> ready when preempted by an interrupt; running -> waiting on an I/O or event wait; waiting -> ready when the I/O or event completes; running -> terminated on exit.
Process Control Block (PCB) - Information associated with each process.
o Process ID (name, number)
o Process state
o Priority, owner, etc...
o Program counter
o CPU registers
o CPU scheduling information
o Memory-management information
o Accounting information
o I/O status information
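A sketch of what a PCB might look like as a C structure; the field names and sizes are illustrative, not those of any particular OS:

    enum pstate { NEW, READY, RUNNING, WAITING, TERMINATED };

    struct pcb {
        int           pid;             /* process ID                     */
        enum pstate   state;           /* current process state          */
        int           priority;       /* CPU scheduling information     */
        unsigned long pc;              /* saved program counter          */
        unsigned long regs[16];        /* saved CPU registers            */
        void         *page_table;      /* memory-management information  */
        long          cpu_time_used;   /* accounting information         */
        int           open_files[16];  /* I/O status information         */
    };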
Process Scheduling



Process scheduling queues
o job queue - set of all processes in the system.
o ready queue - set of all processes residing in main memory, ready and
waiting to execute.
o device queues - set of processes waiting for a particular I/O device.
Process migration between the various queues.
Schedulers
o Long-term scheduler (job scheduler) - selects which processes should be
brought into the ready queue.
o Short-term scheduler (CPU scheduler) - selects which process should be
executed next and allocates CPU.
Short-term scheduler is invoked very frequently
(milliseconds) => (must be fast).
Long-term scheduler is invoked very infrequently
(seconds, minutes) => (may be slow).
The long-term scheduler controls the degree of multiprogramming.
Processes can be described as either:
o I/O-bound process - spends more time doing I/O than computations; many
short CPU bursts.
o CPU-bound process - spends more time doing computations; few very
long CPU bursts.
Context Switch



When CPU switches to another process, the system must save the state of the old
process and load the saved state for the new process.
Context-switch time is overhead; the system does no useful work while switching.
Time dependent on hardware support.
Process Creation

Parent process creates children processes, which, in turn create other processes,
forming a tree of processes.
Example Process Tree



Resource sharing - 3 possibilities
o Parent and children share all resources.
o Children share subset of parent's resources.
o Parent and child share no resources.
Execution - 2 choices.
o Parent and children execute concurrently.
o Parent waits until children terminate.
Address space
o Child duplicate of parent.
o Child has a program loaded into it.

UNIX examples
o fork system call creates new process.
o The new process is an exact copy of the parent and continues execution
from the same point as its parent.
o The only difference between parent and child is the value returned from
the fork call.
 0 for the child.
 the process id (pid) of the child, for the parent.
o The execve system call is used after a fork to replace the process' memory space with a new program.
o Using fork and execve we can write a simple command line interpreter:
    while (true) {
        read_command_line(&command, &parameters);
        if (fork() != 0) {
            /* parent: wait for the child to terminate */
            waitpid(-1, &status, 0);
        } else {
            /* child: overlay this process with the command */
            execve(command, parameters, 0);
        }
    }
Process Termination


Process executes last statement and asks the operating system to delete it (exit).
o Output data from child to parent (via wait).
o Process' resources are deallocated by operating system.
Parent may terminate execution of children processes (abort).
o Child has exceeded allocated resources.
o Task assigned to child is no longer required.
o Parent is exiting.
 Operating system does not allow child to continue if its parent
terminates.
- Cascading termination.
Cooperating Processes



Independent process cannot affect or be affected by the execution of another
process.
Cooperating process can affect or be affected by the execution of another process.
Advantages of process cooperation:
o Information sharing
o Computation speed-up
o Modularity
o Convenience
Producer-Consumer Problem

Paradigm for cooperating processes; producer process produces information that
is consumed by a consumer process.
o unbounded-buffer places no practical limit on the size of the buffer.
o bounded-buffer assumes that there is a fixed buffer size.
Shared-memory solution:
Shared data

    typedef .... item;
    item buffer[N];
    int in = 0, out = 0;
Producer process
    while (true) {
        ...
        produce an item in nextp
        ...
        while ((in + 1) % N == out)
            ; /* no-op: buffer full */
        buffer[in] = nextp;
        in = (in + 1) % N;
    }

Consumer process
    while (true) {
        while (in == out)
            ; /* no-op: buffer empty */
        nextc = buffer[out];
        out = (out + 1) % N;
        ...
        consume the item in nextc
        ...
    }

Solution is correct, but uses busy waiting:

    while (in == out)
        ; /* no-op */

This uses CPU time doing nothing. Later we will see how to avoid this.
Threads
A thread (or lightweight process) is a basic unit of CPU utilization; it consists of:
o program counter
o register set
o stack space
A thread shares with its peer threads its:
o code section
o data section
o operating-system resources
A traditional or heavyweight process is equal to a task with one thread.
In a task containing multiple threads, while one server thread is blocked and
waiting, a second thread in the same task could run.
o Cooperation of multiple threads in same job confers higher throughput and
improved performance.
o Applications that require sharing a common buffer (producer-consumer
problem) benefit from thread utilization.
Threads provide a mechanism that allows sequential processes to make blocking
system calls while also achieving parallelism.
Types of threads
Kernel-supported threads; OS supports threads directly.
o Overhead for thread creation.
User-level threads; supported above the kernel, via a set of library calls at the user level.
o Cannot use multiple processors.
Hybrid approach implements both user-level and kernel-supported threads.
Two types of threads you are likely to see:
o POSIX threads
POSIX is a standard for UNIX systems - the standard includes a thread
library.
o WIN32 threads
- those available on Windows 95 and NT.
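A minimal POSIX threads example: two threads created with pthread_create share the process's global data section:

    #include <pthread.h>
    #include <stdio.h>

    int counter = 0;                     /* shared by all threads */

    void *worker(void *arg) {
        counter++;                       /* unsynchronized here; a real
                                            program would protect this */
        printf("thread %ld running\n", (long)arg);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);          /* wait for both to finish */
        pthread_join(t2, NULL);
        printf("counter = %d\n", counter);
        return 0;
    }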
Interprocess Communication (IPC)
Provides a mechanism to allow processes to communicate / synchronize their actions.



Message system - processes communicate with each other without resorting to
shared variables.
IPC facility provides two operations:
o send(message) - messages can be of either fixed or variable size.
o receive(message)
If P and Q wish to communicate, they need to:
o establish a communication link between them
o exchange messages via send/receive
Implementation questions:
How are links established?
Can a link be associated with more than two processes?
How many links can there be between every pair of communicating processes?
What is the capacity of a link?
Is the size of a message that the link can accommodate fixed or variable?
Is a link unidirectional or bidirectional?
Direct Communication


Processes must name each other explicitly:
o send(P, message) - send a message to process P
o receive(Q, message) - receive a message from process Q
Properties of communication link
o Links are established automatically.
o A link is associated with exactly one pair of communicating processes.
o Between each pair there exists exactly one link.
o The link may be unidirectional, but is usually bidirectional.
Indirect Communication
Messages are sent to and received from mailboxes (also referred to as ports).
o Each mailbox has a unique id.
o Processes can communicate only if they share a mailbox.
Properties of communication link
o Link established only if the two processes share a mailbox in common.
o A link may be associated with many processes.
o Each pair of processes may share several communication links.
o Link may be unidirectional or bidirectional.
Operations
o create a new mailbox
o send and receive messages through mailbox
o destroy a mailbox
Mailbox sharing
o P1 , P2 , and P3 share mailbox A.
o P1 sends; P2 and P3 receive.
o Who gets the message?
Solutions
o Allow a link to be associated with at most two processes.
o Allow only one process at a time to execute a receive operation.
o Allow the system to select arbitrarily the receiver. Sender is notified who
the receiver was.
Buffering - queue of messages attached to the link;
implemented in one of three ways.



Zero capacity - 0 messages. Sender must wait for receiver (rendezvous).
Bounded capacity - finite length of n messages. Sender must wait if link is full.
Unbounded capacity - infinite length. Sender never waits.
Exception Conditions - error recovery



Process terminates
Lost messages
Scrambled Messages
Pipes
A pipe is a simple method for communicating between two processes.
As far as the processes are concerned, the pipe appears to be just like a file.
When the writing process (A) performs a write, the data is buffered in the pipe.
When the reading process (B) reads, it reads from the pipe, blocking if there is no input.
In UNIX (and DOS) the output of one process can be piped into another using the '|'
character, e.g.

    cat classlist | sort | more

The cat command prints the contents of the file 'classlist'; this is piped into the
sort command, which sorts the list. Finally, the sorted list is sent to the more
command, which prints it one screenful at a time.

Pipes may be implemented using shared memory (UNIX) or even with temporary
files (DOS).
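A minimal sketch of the UNIX pipe system call in C, with a parent reading what its child writes:

    #include <unistd.h>
    #include <stdio.h>
    #include <sys/wait.h>

    int main(void) {
        int fd[2];
        char buf[32];

        pipe(fd);                      /* fd[0] = read end, fd[1] = write end */
        if (fork() == 0) {             /* child: the writer (process A) */
            close(fd[0]);
            write(fd[1], "hello", 6);  /* buffered in the pipe */
            _exit(0);
        }
        close(fd[1]);                  /* parent: the reader (process B) */
        read(fd[0], buf, sizeof buf);  /* blocks until input is available */
        printf("got: %s\n", buf);
        wait(NULL);
        return 0;
    }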
CPU SCHEDULING



Basic Concepts
Scheduling Criteria
Scheduling Algorithms



Multiple-Processor Scheduling
Real-Time Scheduling
Algorithm Evaluation
Basic Concepts



Maximum CPU utilization is obtained with multiprogramming.
CPU-I/O Burst Cycle - Process execution consists of a cycle of CPU execution
and I/O wait.
CPU burst distribution

Short-term scheduler -selects from among the processes in memory that are ready
to execute, and allocates the CPU to one of them.

CPU scheduling decisions may take place when a process:
1. switches from running to waiting state.
2. switches from running to ready state.
3. switches from waiting to ready.
4. terminates.
Scheduling under 1 and 4 is non-preemptive (cooperative).
All other scheduling is preemptive.
Dispatcher


Dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves:
o switching context
o switching to user mode
o jumping to the proper location in the user program to restart that program
Dispatch latency - time it takes for the dispatcher to stop one process and start
another running.
Scheduling Criteria
CPU utilization - keep the CPU as busy as possible
Throughput - # of processes that complete their execution per time unit
Turnaround time - amount of time to execute a particular process
Waiting time - amount of time a process has been waiting in the ready queue
Response time - amount of time it takes from when a request was submitted until
the first response is produced, not output (for time sharing environment)
Optimization
o Max CPU utilization
o Max throughput
o Minimum turnaround time
o Minimum waiting time
o Minimum response time
First-Come, First-Served (FCFS) Scheduling

Example:
    Process   Burst time
    P1        24
    P2        3
    P3        3
Suppose that the processes arrive in the order: P1, P2, P3.
A diagram to show this schedule is:

    | P1                     | P2 | P3 |
    0                        24   27   30

Waiting time for: P1 = 0, P2 = 24, P3 = 27
Average waiting time: (0 + 24 + 27)/3 = 17
Suppose that the processes arrive in the order: P2, P3, P1.
The diagram for the schedule is:

    | P2 | P3 | P1                     |
    0    3    6                        30

Waiting time for: P1 = 6, P2 = 0, P3 = 3
Average waiting time: (6 + 0 + 3)/3 = 3
Much better than previous case.
Convoy effect: short process behind long process
Shortest-Job-First (SJF) Scheduling


Associate with each process the length of its next CPU burst. Use these lengths to
schedule the process with the shortest time.
Two schemes:
a) non-preemptive - once CPU given to the process it cannot be preempted until it
completes its CPU burst.
b) preemptive - if a new process arrives with CPU burst length less than
remaining time of current executing process, preempt. This scheme is known as
the Shortest-Remaining Time-First (SRTF).

SJF is optimal - gives minimum average waiting time for a given set of processes.
Example of SJF
    Process   Arrival time   CPU time
    P1        0              7
    P2        2              4
    P3        4              1
    P4        5              4

SJF (non-preemptive): P1 (0-7), P3 (7-8), P2 (8-12), P4 (12-16)
Average waiting time = (0 + 6 + 3 + 7)/4 = 4

SRTF (preemptive): P1 (0-2), P2 (2-4), P3 (4-5), P2 (5-7), P4 (7-11), P1 (11-16)
Average waiting time = (9 + 1 + 0 + 2)/4 = 3
How do we know the length of the next CPU burst?
Can only estimate the length.
Can be done by using the length of previous CPU bursts, using exponential averaging:

1. Tn = actual length of the n'th CPU burst
2. Pn = predicted value for the n'th CPU burst
3. W, where 0 <= W <= 1
4. Define: Pn+1 = W*Tn + (1-W)*Pn

Examples:

W = 0:
    Pn+1 = Pn
    Recent history does not count.

W = 1:
    Pn+1 = Tn
    Only the actual last CPU burst counts.

If we expand the formula, we get:

    Pn+1 = W*Tn + (1-W)*W*Tn-1 + (1-W)^2*W*Tn-2 + ... + (1-W)^q*W*Tn-q + ...

So if W = 1/2, each successive term has less and less weight.
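A small C program that applies the averaging formula; the burst lengths, W = 1/2, and initial prediction P0 = 10 are illustrative values:

    #include <stdio.h>

    int main(void) {
        double W = 0.5;          /* weight of the most recent burst */
        double P = 10.0;         /* initial prediction P0 */
        double T[] = {6, 4, 6, 4, 13, 13, 13};  /* observed burst lengths */

        for (int n = 0; n < 7; n++) {
            P = W * T[n] + (1 - W) * P;  /* Pn+1 = W*Tn + (1-W)*Pn */
            printf("after burst %d: predict %.2f\n", n + 1, P);
        }
        return 0;
    }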
Priority Scheduling


A priority number (integer) is associated with each process.
The CPU is allocated to the process with the highest priority (smallest integer ->
highest priority).
a) preemptive
b) non-preemptive


SJF is a priority scheduling where priority is the predicted next CPU burst time.
Problem = Starvation (or indefinite blocking) - low priority processes may never
execute.
Solution = Aging - as time progresses increase the priority of the process.
Round Robin (RR)



Each process gets a small unit of CPU time (time quantum), usually 10-100
milliseconds. After this time has elapsed, the process is preempted and added to
the end of the ready queue.
If there are n processes in the ready queue and the time quantum is q , then each
process gets 1/n of the CPU time in chunks of at most q time units at once. No
process waits more than (n -1)q time units.
Performance
q large -> FIFO
q small -> q must be large with respect to context switch, otherwise overhead is
too high.
Example of RR with time quantum = 20
    Process   CPU time
    P1        53
    P2        17
    P3        68
    P4        24

The resulting schedule is: P1 (0-20), P2 (20-37), P3 (37-57), P4 (57-77),
P1 (77-97), P3 (97-117), P4 (117-121), P1 (121-134), P3 (134-154), P3 (154-162).

Typically, higher average turnaround than SRTF, but better response.
Multilevel Queue

Ready queue is partitioned into separate queues.
Example:
foreground (interactive)
background (batch)

Each queue has its own scheduling algorithm.
Example:
foreground - RR
background - FCFS

Scheduling must be done between the queues.
o Fixed priority scheduling
Example:
serve all from foreground then from background.
Possibility of starvation.

Time slice - each queue gets a certain amount of CPU time which it can schedule
amongst its processes.
Example:
80% to foreground in RR
20% to background in FCFS
Multilevel Feedback Queue


A process can move between the various queues; aging can be implemented this
way.
Multilevel-feedback-queue scheduler defined by the following parameters:
o number of queues
o scheduling algorithm for each queue
o method used to determine when to upgrade a process
o method used to determine when to demote a process
o method used to determine which queue a process will enter when that
process needs service
Example of multilevel feedback queue


Three queues:
o Q0 - time quantum 8 milliseconds
o Q1 - time quantum 16 milliseconds
o Q2 - FCFS
Scheduling
A new job enters queue Q0 which is served FCFS. When it gains CPU, job
receives 8 milliseconds. If it does not finish in 8 milliseconds, job is moved to
queue Q1 . At Q1 , job is again served FCFS and receives 16 additional
milliseconds. If it still does not complete, it is preempted and moved to queue Q2 .
Multiple-Processor Scheduling
CPU scheduling more complex when multiple CPUs are available.
Homogeneous processors within a multiprocessor (CPUs must be the same).
Load sharing - use a common ready queue.
Each processor schedules itself, or one processor is used for scheduling.
Real-Time Scheduling


Hard real-time systems - required to complete a critical task within a guaranteed
amount of time.
Soft real-time computing - requires that critical processes receive priority over
less fortunate ones.
Algorithm Evaluation
Deterministic modeling - takes a particular predetermined workload and defines
the performance of each algorithm for that workload.
Queuing models - make a mathematical model based on the distributions of job
start times and burst times.
Simulation - write a program to schedule imaginary tasks using various
algorithms.
Implementation - code the algorithms into the OS.
Summary
2 queues - ready and I/O request.
FCFS simple but causes short jobs to wait for long jobs.
SJF is optimal giving shortest waiting time but need to know length of next burst.
SJF is a type of priority scheduling - may suffer from starvation - prevented using aging.
RR gives good response time; it is preemptive. FCFS is non-preemptive; priority algorithms can be either. The problem is selecting the quantum.
Multiple-queue algorithms use the best of each algorithm by having more than one queue. Feedback queues allow jobs to move from queue to queue.
Algorithms may be evaluated by deterministic methods, mathematical models and implementation.
PROCESS SYNCHRONIZATION
Background
The Critical-Section Problem
Synchronization Hardware
Semaphores
Classical Problems of Synchronization
Critical Regions
Monitors
Atomic Transactions
Background



Concurrent access to shared data may result in data inconsistency.
Maintaining data consistency requires mechanisms to ensure the orderly
execution of cooperating processes.
Suppose that we modify the producer-consumer code by adding a variable
counter, initialized to 0 and incremented each time a new item is added to the
buffer.
The new scheme is illustrated by the following:
Shared data

    typedef .... item;
    item buffer[N];
    int in = 0, out = 0, counter = 0;

Producer process

    while (true) {
        ...
        produce an item in nextp
        ...
        while (counter == N)
            ; /* no-op: buffer full */
        buffer[in] = nextp;
        in = (in + 1) % N;
        counter = counter + 1;
    }

Consumer process

    while (true) {
        while (counter == 0)
            ; /* no-op: buffer empty */
        nextc = buffer[out];
        out = (out + 1) % N;
        counter = counter - 1;
        ...
        consume the item in nextc
        ...
    }


The statements:
counter = counter + 1;
counter = counter - 1;
must be executed atomically.
The Critical-Section Problem
n processes all competing to use some shared data
Each process has a section of code, called its critical section, in which the
shared data is accessed.
Problem - ensure that when one process is executing in its critical section, no
other process is allowed to execute in its critical section.
Structure of process Pi:

    while (true) {
        entry section
        critical section
        exit section
        remainder section
    }
A solution to the critical-section problem must satisfy the following three requirements:
1. Mutual Exclusion. If process Pi is executing in its critical section, then no other
processes can be executing in their critical sections.
2. Progress. If no process is executing in its critical section and there are some
processes that wish to enter their critical section, then the selection of the
processes that will enter the critical section next cannot be postponed indefinitely.
3. Bounded Waiting. A bound must exist on the number of times that other
processes are allowed to enter their critical sections after a process has made a
request to enter its critical section and before that request is granted.


Assumption that each process is executing at a nonzero speed.
No assumption concerning relative speed of the n processes.
Initial attempts to solve the problem.

Only 2 processes, P0 and P1
General structure of process Pi (other process Pj )
while(true) {
entry section
critical section
exit section
remainder section
}

Processes may share some common variables to synchronize their actions.
Algorithm 1

Shared variables:

    int turn = 0;

turn == i means Pi can enter its critical section.

Process Pi:

    while (true) {
        while (turn != i)
            ; /* no-op */
        critical section
        turn = j;
        remainder section
    }

Satisfies mutual exclusion, but not progress.
Algorithm 2

Shared variables:

    bool flag[2] = {false, false};

flag[i] == true means Pi is ready to enter its critical section.

Process Pi:

    while (true) {
        flag[i] = true;
        while (flag[j])
            ; /* no-op */
        critical section
        flag[i] = false;
        remainder section
    }

Does not satisfy progress: if the two processes set their flags to true at the same time, then they will both wait forever.
Algorithm 3
Combined shared variables of algorithms 1 and 2.

Process Pi:

    while (true) {
        flag[i] = true;
        turn = j;
        while (flag[j] && turn == j)
            ; /* no-op */
        critical section
        flag[i] = false;
        remainder section
    }

Meets all three requirements; solves the critical-section problem for two processes.
Bakery Algorithm - Critical section for n processes



Before entering its critical section, process receives a number. Holder of the
smallest number enters the critical section.
If processes Pi and Pj receive the same number, if i < j , then Pi is served first; else
Pj is served first.
The numbering scheme always generates numbers in increasing order of
enumeration.
Example: 1,2,3,3,3,3,4,5...
Bakery Algorithm
Shared data:

    bool choosing[n] = {false, ...};
    int number[n] = {0, ...};

Process Pi:

    while (true) {
        choosing[i] = true;
        max = 0;
        for (j = 0; j < n; j++)
            if (max < number[j]) max = number[j];
        number[i] = max + 1;
        choosing[i] = false;
        for (j = 0; j < n; j++) {
            while (choosing[j])
                ; /* wait while Pj picks a number */
            while (number[j] != 0 &&
                   (number[j] < number[i] ||
                    (number[j] == number[i] && j < i)))
                ; /* wait for processes ahead of us */
        }
        critical section
        number[i] = 0;
        remainder section
    }
Synchronization Hardware

Test and modify the content of a word atomically:

    bool TestandSet(bool *target) {
        bool t = *target;    /* all this is done */
        *target = true;      /* by one machine   */
        return t;            /* instruction      */
    }

    void Exchg(bool *a, bool *b) {
        bool temp = *a;      /* all this is done */
        *a = *b;             /* by one machine   */
        *b = temp;           /* instruction      */
    }
Mutual exclusion algorithm

Shared data:

    bool lock = false;

Process Pi:

    while (true) {
        while (TestandSet(&lock))
            ; /* no-op */
        critical section
        lock = false;
        remainder section
    }

or, using Exchg:

Process Pi:

    bool key;
    while (true) {
        key = true;
        do {
            Exchg(&lock, &key);
        } while (key);
        critical section
        lock = false;
        remainder section
    }
Semaphore - synchronization tool that does not require
busy waiting.
Semaphore S

Integer variable introduced by Dijkstra; can only be accessed via two indivisible (atomic) operations:

    wait(S):   S = S - 1; if S < 0 then block(S)
    signal(S): S = S + 1; if S <= 0 then wakeup(S)

Sometimes wait and signal are called down and up, or P and V.

block(S) - results in suspension of the process invoking it (sometimes called sleep).
wakeup(S) - results in resumption of exactly one process that has invoked block(S).
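A sketch of how wait and signal might be implemented on top of block and wakeup; the per-semaphore process queue and the atomicity of each operation are assumed to be provided by the kernel:

    struct process;                 /* per-process record, assumed given */

    typedef struct {
        int value;                  /* negative => that many blocked     */
        struct process *queue;      /* processes suspended on S          */
    } semaphore;

    void block(semaphore *S);       /* suspend caller on S's queue       */
    void wakeup(semaphore *S);      /* resume one process from S's queue */

    void wait(semaphore *S) {       /* must execute atomically */
        S->value = S->value - 1;
        if (S->value < 0)
            block(S);               /* add caller to queue and sleep     */
    }

    void signal(semaphore *S) {     /* must execute atomically */
        S->value = S->value + 1;
        if (S->value <= 0)
            wakeup(S);              /* remove one process and resume it  */
    }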
Example: critical section for n processes
Shared variables:

    semaphore mutex = 1;

Process Pi:

    while (true) {
        wait(&mutex);
        critical section
        signal(&mutex);
        remainder section
    }
The wait and signal operations must be implemented so that they execute atomically.


Uniprocessor:
o Disable interrupts around the code segment implementing the wait and
signal operations.
Multiprocessor:
o If no special hardware provided, use a correct software solution to the
critical-section problem, where the critical sections consist of the wait and
signal operations.
o Use special hardware if available, i.e., TestandSet:
Implementation of the wait(S) operation with the TestandSet instruction:

Shared variables:

    bool lock = false;

Code for wait(S):

    while (TestandSet(&lock))
        ; /* no-op */
    S = S - 1;
    if (S < 0) {
        lock = false;
        block(S);
    } else
        lock = false;

Race condition exists!
Semaphore can be used as general synchronization tool:
Execute B in Pj only after A is executed in Pi.
Use a semaphore flag initialized to 0.
Code:

    Pi:             Pj:
    ...             ...
    A               wait(flag)
    signal(flag)    B

Deadlock - two or more processes are waiting indefinitely for an event that can be
caused by only one of the waiting processes.
Let S and Q be two semaphores initialized to 1.

    P0:           P1:
    wait(S);      wait(Q);
    wait(Q);      wait(S);
    ...           ...
    signal(S);    signal(Q);
    signal(Q);    signal(S);
 Starvation - indefinite blocking: a process is never removed from the semaphore queue in which it is suspended.
Two types of semaphores:


Counting semaphore - integer value can range over an unrestricted domain.
Binary semaphore - integer value can range only between 0 and 1; can be simpler
to implement.
Classical Problems of Synchronization



Bounded-Buffer Problem
Readers and Writers Problem
Dining-Philosophers Problem
Bounded-Buffer Problem
Shared data:

    typedef .... item;
    item buffer[N];
    semaphore full = 0, empty = N, mutex = 1;
    item nextp, nextc;

Producer process:

    while (true) {
        ...
        produce an item in nextp
        ...
        wait(&empty);    /* wait while buffer is full */
        wait(&mutex);
        ...
        add nextp to buffer
        ...
        signal(&mutex);
        signal(&full);   /* one more in buffer */
    }
Consumer process:

    while (true) {
        wait(&full);     /* wait while no data */
        wait(&mutex);
        ...
        remove an item from buffer to nextc
        ...
        signal(&mutex);
        signal(&empty);  /* one less in buffer */
        ...
        consume the item in nextc
        ...
    }
Readers-Writers Problem
A number of processes, some reading data, some writing. Any number of processes can
read at the same time, but if a writer is writing then no other process must be able to
access the data.
Shared data:

    semaphore mutex = 1, wrt = 1;
    int readcount = 0;

Writer process:

    wait(&wrt);
    ...
    writing is performed
    ...
    signal(&wrt);

Reader process:

    wait(&mutex);
    readcount = readcount + 1;
    if (readcount == 1) wait(&wrt);    /* first reader locks out writers */
    signal(&mutex);
    ...
    reading is performed
    ...
    wait(&mutex);
    readcount = readcount - 1;
    if (readcount == 0) signal(&wrt);  /* last reader admits writers */
    signal(&mutex);
Dining-Philosophers Problem
A Problem posed by Dijkstra in 1965
Possible solution to the problem:

    void philosopher(int no) {
        while (1) {
            ...think....
            take_fork(no);            /* get the left fork   */
            take_fork((no+1) % N);    /* get the right fork  */
            ....eat.....
            put_fork(no);             /* put left fork down  */
            put_fork((no+1) % N);     /* put down right fork */
        }
    }

"take_fork" waits until the specified fork is available and then grabs it.
Unfortunately this solution will not work: what happens if all the philosophers grab their left fork at the same time?
Better solution.
 Shared data

    int p[N];              /* status of the philosophers     */
    semaphore s[N] = 0;    /* semaphore for each philosopher */
    semaphore mutex = 1;   /* semaphore for mutual exclusion */

 Code

    #define LEFT(n)  (n+N-1)%N     /* macros to give left and right */
    #define RIGHT(n) (n+1)%N       /* around the table              */

    void test(int no) {            /* can philosopher 'no' eat?      */
        if ((p[no] == HUNGRY) &&
            (p[LEFT(no)] != EATING) &&
            (p[RIGHT(no)] != EATING)) {
            p[no] = EATING;
            signal(&s[no]);        /* if so then eat                 */
        }
    }

    void take_forks(int no) {      /* get both forks                 */
        wait(&mutex);              /* only one at a time here please */
        p[no] = HUNGRY;            /* I'm hungry                     */
        test(no);                  /* can I eat?                     */
        signal(&mutex);
        wait(&s[no]);              /* wait until I can               */
    }

    void put_forks(int no) {       /* put the forks down             */
        wait(&mutex);              /* only one at a time here        */
        p[no] = THINKING;          /* let me think                   */
        test(LEFT(no));            /* see if my neighbours           */
        test(RIGHT(no));           /* can now eat                    */
        signal(&mutex);
    }

    void philosopher(int no) {
        while (1) {
            ...think....
            take_forks(no);        /* get the forks                  */
            ....eat.....
            put_forks(no);         /* put forks down                 */
        }
    }
High-level synchronization constructs
Monitors
High-level synchronization construct that allows the safe sharing of an abstract data type
among concurrent processes. (Hoare and Brinch Hansen 1974)
A collection of procedures, variables and data structures. Only one process can be active
in a monitor at any instant.
monitor example
integer i;
condition c;
procedure producer(x);
begin
.
.
.
end
procedure consumer(x);
begin
.
.
.
end
end monitor;

To allow a process to wait within the monitor, a condition variable must be
declared, as:
condition x;

Condition variables can only be used with the operations wait and signal.
o The operation
wait(x);
means that the process invoking this operation is suspended until another
process invokes
signal(x);
o
The signal(x) operation resumes exactly one suspended process. If no
process is suspended, then the signal operation has no effect.
The producer consumer problem can be solved as follows using monitors:
monitor ProducerConsumer
condition full, empty;
integer count;
procedure enter;
begin
if count = N then wait(full);
...enter item...
count := count + 1;
if count = 1 then signal(empty)
end;
procedure remove;
begin
if count = 0 then wait(empty);
...remove item...
count := count - 1;
if count = N - 1 then signal(full)
end;
count := 0;
end monitor;
procedure producer;
begin
while true do
begin
...produce item...
ProducerConsumer.enter
end
end;
procedure consumer;
begin
while true do
begin
ProducerConsumer.remove;
...consume item...
end
end;
The dining philosophers problem can also be solved easily.
monitor dining-philosophers
status state[n];
condition self[n];
procedure pickup (i:integer);
begin
state[i] := hungry;
test (i);
if state[i] <> eating then wait(self[i]);
end;
procedure putdown (i:integer);
begin
state[i] := thinking;
test ((i+4) mod 5);
test ((i+1) mod 5);
end;
procedure test (k:integer);
begin
if state[(k+4) mod 5] <> eating
and state[k] = hungry
and state[(k+1) mod 5] <> eating
then begin
state[k] := eating;
signal(self[k]);
end;
end;
begin
for i := 0 to 4
do state[i] := thinking;
end
end monitor
procedure philosopher(no:integer);
begin
while true do
begin
...think....
pickup(no);
....eat.....
putdown(no)
end
end
There are very few languages that support constructs such as monitors... expect this to
change. One language that does is Java. Here is a Java class that can be used to solve the
producer consumer problem.
class CubbyHole {
    private int seq;
    private boolean available = false;

    public synchronized int get() {
        while (available == false) {
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
        available = false;
        notify();
        return seq;
    }

    public synchronized void put(int value) {
        while (available == true) {
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
        seq = value;
        available = true;
        notify();
    }
}
Monitor implementation using semaphores.
What happens when a monitor signals a condition variable?
A process waiting on the variable can't be active at the same time as the signaling
process; therefore there are 2 choices:
1. Signaling process waits until the waiting process either leaves the monitor or
waits for another condition.
2. Waiting process waits until the signaling process either leaves the monitor or
waits for another condition.

Variables:

    semaphore mutex = 1, next = 0;
    int next-count = 0;

'mutex' provides mutual exclusion inside the monitor.
'next' is used to suspend signaling processes.
'next-count' gives the number of processes suspended on 'next'.

Each external procedure F will be replaced by








Mutual exclusion within a monitor is ensured. by 'mutex'
For each condition variable x, we have:


semaphore x-sem=0;
int x-count=0;
The operation wait(x) can be implemented as:






sem_wait(&mutex);
...
body of F;
...
if (next-count > 0)
sem_signal(&next);
else sem_signal(&mutex);
x-count = x-count + 1;
if (next-count > 0)
sem_signal(&next);
else sem_signal(&mutex);
sem_wait(&x-sem);
x-count = x-count - 1;
The operation signal(x) can be implemented as:





if (x-count > 0) {
next-count = next-count + 1;
sem_signal(&x-sem);
sem_wait(&next);
next-count = next-count - 1;
}
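As a hedged sketch, the same scheme transliterates to Java using java.util.concurrent.Semaphore; the class name MonitorSupport and the method names enter/leave/waitX/signalX are illustrative, not part of the notes.

import java.util.concurrent.Semaphore;

// Sketch of the semaphore-based monitor scheme above; names are illustrative.
class MonitorSupport {
    private final Semaphore mutex = new Semaphore(1); // monitor entry
    private final Semaphore next  = new Semaphore(0); // suspended signalers
    private int nextCount = 0;                        // processes waiting on 'next'

    private final Semaphore xSem = new Semaphore(0);  // one condition variable x
    private int xCount = 0;

    void enter() throws InterruptedException {
        mutex.acquire();                 // sem_wait(&mutex) on entry to F
    }

    void leave() {                       // exit protocol of F
        if (nextCount > 0) next.release();
        else mutex.release();
    }

    void waitX() throws InterruptedException {
        xCount++;
        if (nextCount > 0) next.release();
        else mutex.release();
        xSem.acquire();                  // suspend until signaled
        xCount--;
    }

    void signalX() throws InterruptedException {
        if (xCount > 0) {
            nextCount++;
            xSem.release();              // resume one waiter
            next.acquire();              // signaler suspends itself
            nextCount--;
        }
    }
}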
Conditional-wait construct
cond_wait(x,c);
'c' is an integer expression evaluated when the wait operation is executed.
The value of c (a priority number) is stored with the name of the process that is
suspended. When signal(x) is executed, the process with the smallest associated
priority number is resumed next.
Must check two conditions to establish the correctness of this system:
o User processes must always make their calls on the monitor in a correct
sequence.
o Must ensure that an uncooperative process does not ignore the mutual-exclusion
gateway provided by the monitor, and try to access the shared resource directly,
without using the access protocols.
Atomic Transactions
Transaction - program unit that must be executed atomically; that is, either all the
operations associated with it are executed to completion, or none are performed.
Must preserve atomicity despite the possibility of failure.
We are concerned here with ensuring transaction atomicity in an environment
where failures result in the loss of information on volatile storage.
Log-Based Recovery
Write-ahead log - all updates are recorded on the log, which is kept in stable
storage; the log has the following fields:
o transaction name
o data item name, old value, new value
The log has a record of <Ti starts>, and either <Ti commits> if the transaction
commits, or <Ti aborts> if the transaction aborts.
Recovery algorithm uses two procedures:
o undo(Ti) - restores the value of all data updated by transaction Ti to the old
values. It is invoked if the log contains the record <Ti starts>, but not <Ti commits>.
o redo(Ti) - sets the value of all data updated by transaction Ti to the new
values. It is invoked if the log contains both <Ti starts> and <Ti commits>.
Checkpoints - reduce recovery overhead
1. Output all log records currently residing in volatile storage onto stable storage.
2. Output all modified data residing in volatile storage to stable storage.
3. Output log record <checkpoint> onto stable storage.
Recovery routine examines the log to determine the most recent transaction Ti that
started executing before the most recent checkpoint took place.
o Search the log backward for the first <checkpoint> record.
o Find the subsequent <Ti start> record.
redo and undo operations need to be applied only to transaction Ti and all
transactions Tj that started executing after transaction Ti.
Concurrent Atomic Transactions
Serial schedule - the transactions are executed sequentially in some order.
Example of a serial schedule in which T0 is followed by T1:
    T0    |   T1
----------|----------
 read(A)  |
 write(A) |
 read(B)  |
 write(B) |
          | read(A)
          | write(A)
          | read(B)
          | write(B)
Conflicting operations - Oi and Oj conflict if they access the same data item, and at
least one of these operations is a write operation.
Conflict serialisable schedule - schedule that can be transformed into a serial
schedule by a series of swaps of non-conflicting operations.
Example of a concurrent serialisable schedule:
    T0    |   T1
----------|----------
 read(A)  |
 write(A) |
          | read(A)
          | write(A)
 read(B)  |
 write(B) |
          | read(B)
          | write(B)
A locking protocol governs how locks are acquired and released; a data item can be
locked in the following modes:
o Shared: If Ti has obtained a shared-mode lock on data item Q, then Ti can
read this item, but it cannot write Q.
o Exclusive: If Ti has obtained an exclusive-mode lock on data item Q, then
Ti can both read and write Q.
Two-phase locking protocol
o Growing phase: A transaction may obtain locks, but may not release any lock.
o Shrinking phase: A transaction may release locks, but may not obtain any new locks.
The two-phase locking protocol ensures conflict serialisability, but does not
ensure freedom from deadlock.
Timestamp-ordering scheme - transaction-ordering protocol for determining
serialisability order.
o With each transaction Ti in the system, associate a unique fixed timestamp,
denoted by TS(Ti).
o If Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters
the system, then TS(Ti) < TS(Tj).
Implement by assigning two timestamp values to each data item Q.
o W-timestamp(Q) - denotes the largest timestamp of any transaction that
executed write(Q) successfully.
o R-timestamp(Q) - denotes the largest timestamp of any transaction that
executed read(Q) successfully.
Example of a schedule possible under the timestamp protocol:
    T0    |   T1
----------|----------
 read(B)  |
          | read(B)
          | write(B)
 read(A)  |
          | read(A)
          | write(A)
There are schedules that are possible under the two-phase locking protocol but are
not possible under the timestamp protocol, and vice versa.
The timestamp-ordering protocol ensures conflict serialisability; conflicting
operations are processed in timestamp order.
DEADLOCKS
System Model
Deadlock Characterization
Methods for Handling Deadlocks
Deadlock Prevention
Deadlock Avoidance
Deadlock Detection
Recovery from Deadlock
Combined Approach to Deadlock Handling
The Deadlock Problem
A set of blocked processes, each holding a resource and waiting to acquire a
resource held by another process in the set.
Example
o System has 2 tape drives.
o P1 and P2 each hold one tape drive and each needs another one.
Example
semaphores A and B, initialized to 1
   P0        P1
-------   -------
wait(A)   wait(B)
wait(B)   wait(A)
Example: bridge crossing
o Traffic only in one direction.
o Each section of a bridge can be viewed as a resource.
o If a deadlock occurs, it can be resolved if one car backs up (preempt
resources and rollback).
o Several cars may have to be backed up if a deadlock occurs.
o Starvation is possible.
System Model
Resource types R1, R2, ..., Rm
Examples of resource types - CPU cycles, memory space, I/O devices
Each resource type Ri has Wi instances.
e.g. 2 CPUs, 1 Floppy Disk, 2 Hard Disks
Each process utilizes a resource (using system calls) as follows:
o request
o use
o release
Deadlock Characterization - deadlock can arise if four conditions hold
simultaneously.
Mutual exclusion: only one process at a time can use a resource.
Hold and wait: a process holding at least one resource is waiting to acquire
additional resources held by other processes.
No preemption: a resource can be released only voluntarily by the process holding
it, after that process has completed its task.
Circular wait: there exists a set {P0, P1, ..., Pn} of waiting processes such that P0
is waiting for a resource that is held by P1, P1 is waiting for a resource that is held
by P2, ..., Pn-1 is waiting for a resource that is held by Pn, and Pn is waiting for a
resource that is held by P0.
Resource-Allocation Graph - a diagram showing allocations
A set of vertices V and a set of edges E.
V is partitioned into two types:
o P = {P1, P2, ..., Pn}, the set consisting of all the processes in the system.
o R = {R1, R2, ..., Rm}, the set consisting of all resource types in the system.
request edge - directed edge Pi -> Rj
assignment edge - directed edge Rj -> Pi
Example (graph notation)
o Process
o Resource type with 4 instances
o Pi requests an instance of Rj
o Pi is holding an instance of Rj
Example of a resource-allocation graph with no cycles.
Example of a resource-allocation graph with a cycle.
If the graph contains no cycles -> no deadlock.
If the graph contains a cycle ->
o if only one instance per resource type, then deadlock.
o if several instances per resource type, possibility of deadlock.
e.g. R={1r1,2r2,1r3}, E={(p1,r1),(p2,r3),(r1,p2),(r2,p2),(r2,p1),(r3,p3),(p3,r2)}
e.g. R={2r1,2r2}, E={(p1,r1),(r1,p2),(r1,p3),(r2,p1),(p3,r2),(r2,p4)}
Methods for Handling Deadlocks
Ensure that the system will never enter a deadlock state. (traffic lights)
Allow the system to enter a deadlock state and then recover. (back up cars)
Ignore the problem and pretend that deadlocks never occur in the system; used by
most operating systems, including UNIX.
Deadlock Prevention - restrain the ways resource requests can be made.
Mutual Exclusion - not required for sharable resources; must hold for nonsharable
resources.
Hold and Wait - must guarantee that whenever a process requests a resource, it
does not hold any other resources.
o Require a process to request and be allocated all its resources before it
begins execution, or allow a process to request resources only when the
process has none.
o Low resource utilization; starvation possible.
No Preemption -
o If a process that is holding some resources requests another resource that
cannot be immediately allocated to it, then all resources currently being
held are released.
o Preempted resources are added to the list of resources for which the
process is waiting.
o Process will be restarted only when it can regain its old resources, as well
as the new ones that it is requesting.
Circular Wait - impose a total ordering of all resource types, and require that each
process requests resources in an increasing order of enumeration.
Deadlock Avoidance - requires that the system has some additional a priori
information available.
Simplest and most useful model requires that each process declare the maximum
number of resources of each type that it may need.
The deadlock-avoidance algorithm dynamically examines the resource-allocation
state to ensure that there can never be a circular-wait condition.
Resource-allocation state is defined by the number of available and allocated
resources, and the maximum demands of the processes.
Safe State - when a process requests an available resource, the system must decide
if immediate allocation leaves the system in a safe state.
The system is in a safe state if there exists a safe sequence of all processes.
Sequence <P1, P2, ..., Pn> is safe if for each Pi, the resources that Pi can still
request can be satisfied by the currently available resources plus the resources
held by all the Pj, with j < i.
o If Pi's resource needs are not immediately available, then Pi can wait until
all Pj have finished.
o When Pj is finished, Pi can obtain its needed resources, execute, return its
allocated resources, and terminate.
o When Pi terminates, Pi+1 can obtain its needed resources, and so on.
If a system is in a safe state -> no deadlocks.
If a system is in an unsafe state -> possibility of deadlock.
Avoidance -> ensure that a system will never enter an unsafe state.
e.g. 12 instances of a resource.
      Max Needs   Current Needs
p0    10          5
p1    4           2
p2    9           2
The system is safe because <p1, p0, p2> satisfies the safety condition.
(A diagram, omitted here, shows how deadlock can occur: at point t, any move
upwards would enter an unsafe state.)
Resource-Allocation Graph Algorithm
Claim edge Pi -> Rj indicates that process Pi may request resource Rj;
represented by a dashed line.
A claim edge converts to a request edge when a process requests a resource.
When a resource is released by a process, an assignment edge reconverts to a
claim edge.
Resources must be claimed a priori in the system.
Example
E={(r1,p1)}  C={(p1,r2),(p2,r1),(p2,r2)}
no cycles -> system is safe
now if p2 requests r2 -> system is unsafe.
Banker's Algorithm (Dijkstra 1965)
Multiple resource types.
Each process must a priori claim its maximum use.
When a process requests a resource it may have to wait.
When a process gets all its resources it must return them in a finite amount of time.
Data structures for the Banker's algorithm, where n = number of processes, and m
= number of resource types:
o Available: Vector of length m. If Available[j] = k, there are k instances of
resource type Rj available.
o Max: n x m matrix. If Max[i,j] = k, then process Pi may request at most k
instances of resource type Rj.
o Allocation: n x m matrix. If Allocation[i,j] = k, then Pi is currently
allocated k instances of Rj.
o Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of
Rj to complete its task.
Need[i,j] = Max[i,j] - Allocation[i,j].
Example: consider the following:
A banker has 10 thousand dollars and four customers: Florence, Dougal, Dylan
and Zebedee. Each customer has a maximum need and starts owing nothing.
Name      Used   Max
Florence  0      6
Dougal    0      5
Dylan     0      4
Zebedee   0      7
Available = 10
Safe
Name      Used   Max
Florence  1      6
Dougal    1      5
Dylan     2      4
Zebedee   4      7
Available = 2
Safe, because any requests for loans, except to Dylan, can wait until Dylan repays his
loan.
Name      Used   Max
Florence  1      6
Dougal    2      5
Dylan     2      4
Zebedee   4      7
Available = 1
Unsafe, since if all customers ask for their maximum, none will get it, causing deadlock.
Safety Algorithm
1. Let Work and Finish be vectors of length m and n, respectively.
Initialize:
Work := Available
Finish[i] := false for i = 1, 2, ..., n.
2. Find an i such that both:
a) Finish[i] = false
b) Need i <= Work (element-wise: every element of Need i is <= the
corresponding element of Work)
If no such i exists, go to step 4.
3. Work := Work + Allocation i
Finish[i] := true
go to step 2.
4. If Finish[i] = true for all i, then the system is in a safe state.
May require an order of m x n² operations to decide whether a state is safe.
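Here is a minimal Java sketch of the safety algorithm; the class name BankerSafety, the helper leq, and the matrix layout (one row per process) are illustrative assumptions, not part of the notes.

// Minimal sketch of the safety algorithm; names and layout are illustrative.
class BankerSafety {
    // Returns true if the state (available, allocation, need) is safe.
    static boolean isSafe(int[] available, int[][] allocation, int[][] need) {
        int n = allocation.length;        // number of processes
        int m = available.length;         // number of resource types
        int[] work = available.clone();   // step 1: Work := Available
        boolean[] finish = new boolean[n];

        boolean progress = true;
        while (progress) {                // steps 2-3
            progress = false;
            for (int i = 0; i < n; i++) {
                if (!finish[i] && leq(need[i], work)) {
                    for (int j = 0; j < m; j++)
                        work[j] += allocation[i][j];  // Pi finishes, returns resources
                    finish[i] = true;
                    progress = true;
                }
            }
        }
        for (boolean f : finish)          // step 4
            if (!f) return false;
        return true;
    }

    // Element-wise comparison: v <= w in every component.
    static boolean leq(int[] v, int[] w) {
        for (int j = 0; j < v.length; j++)
            if (v[j] > w[j]) return false;
        return true;
    }
}

Running isSafe on the T0 snapshot in the example below (Available = (3,3,2)) returns true, consistent with the safe sequence given there.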
Resource-Request Algorithm for process Pi
Request i = request vector for process Pi.
If Request i[j] = k, then process Pi wants k instances of resource type Rj.
1. If Request i <= Need i, go to step 2. Otherwise, raise an error condition, since
the process has exceeded its maximum claim.
2. If Request i <= Available, go to step 3. Otherwise, Pi must wait, since resources
are not available.
3. Pretend to allocate the requested resources to Pi by modifying the state as follows:
Available := Available - Request i;
Allocation i := Allocation i + Request i;
Need i := Need i - Request i;
o If safe -> the resources are allocated to Pi.
o If unsafe -> Pi must wait, and the old resource-allocation state is restored.
Example of Banker's algorithm
5 processes P0 through P4; 3 resource types A (10 instances), B (5 instances), and
C (7 instances).
Snapshot at time T0:
     Allocation   Max      Available   Need
     A B C        A B C    A B C       A B C
P0   0 1 0        7 5 3    3 3 2       7 4 3
P1   2 0 0        3 2 2                1 2 2
P2   3 0 2        9 0 2                6 0 0
P3   2 1 1        2 2 2                0 1 1
P4   0 0 2        4 3 3                4 3 1
Sequence <P1, P3, P4, P2, P0> satisfies the safety criteria.
P1 now requests resources: Request 1 = (1,0,2).
o Check that Request 1 <= Available (that is, (1,0,2) <= (3,3,2)) -> true.
     Allocation   Need     Available
     A B C        A B C    A B C
P0   0 1 0        7 4 3    2 3 0
P1   3 0 2        0 2 0
P2   3 0 2        6 0 0
P3   2 1 1        0 1 1
P4   0 0 2        4 3 1
o Executing the safety algorithm shows that sequence <P1, P3, P4, P0, P2>
satisfies the safety requirement.
From this state, can a request for (3,3,0) by P4 be granted?
From this state, can a request for (0,2,0) by P0 be granted?
Deadlock Detection
Allow the system to enter a deadlock state
Detection algorithm
Recovery scheme
Single Instance of Each Resource Type
Maintain a wait-for graph
o Nodes are processes.
o Pi -> Pj if Pi is waiting for Pj.
Periodically invoke an algorithm that searches for a cycle in the graph.
An algorithm to detect a cycle in a graph requires an order of n² operations,
where n is the number of vertices in the graph.
Several Instances of a Resource Type
Data structures
o Available: A vector of length m indicates the number of available
resources of each type.
o Allocation: An n x m matrix defines the number of resources of each type
currently allocated to each process.
o Request: An n x m matrix indicates the current request of each process. If
Request[i,j] = k, then process Pi is requesting k more instances of resource
type Rj.
Detection Algorithm
1. Let Work and Finish be vectors of length m and n, respectively. Initialize:
Work := Available.
For i = 1, 2, ..., n, if Allocation i <> 0, then Finish[i] := false; otherwise,
Finish[i] := true.
2. Find an index i such that both:
a) Finish[i] = false.
b) Request i <= Work.
If no such i exists, go to step 4.
3. Work := Work + Allocation i
Finish[i] := true
go to step 2.
4. If Finish[i] = false for some i, 1 <= i <= n, then the system is in a deadlock state.
Moreover, if Finish[i] = false, then Pi is deadlocked.
The algorithm requires an order of m x n² operations to detect whether the system
is in a deadlocked state.
Example of Detection algorithm
Five processes P0 through P4; three resource types A (7 instances), B (2
instances), and C (6 instances).
Snapshot at time T0:
     Allocation   Request   Available
     A B C        A B C     A B C
P0   0 1 0        0 0 0     0 0 0
P1   2 0 0        2 0 2
P2   3 0 3        0 0 0
P3   2 1 1        1 0 0
P4   0 0 2        0 0 2
Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i.
P2 requests an additional instance of type C.
     Request
     A B C
P0   0 0 0
P1   2 0 2
P2   0 0 1
P3   1 0 0
P4   0 0 2
State of system?
o Can reclaim the resources held by process P0, but there are insufficient
resources to fulfill the other processes' requests.
o A deadlock exists, consisting of processes P1, P2, P3, and P4.
Detection-Algorithm Usage
When, and how often, to invoke depends on:
o How often is a deadlock likely to occur?
o How many processes will need to be rolled back?
(one for each disjoint cycle)
If the detection algorithm is invoked arbitrarily, there may be many cycles in the
resource graph, and so we would not be able to tell which of the many deadlocked
processes "caused" the deadlock.
Recovery from Deadlock
Process termination
o Abort all deadlocked processes.
o Abort one process at a time until the deadlock cycle is eliminated.
o In which order should we choose to abort?
- Priority of the process.
- How long the process has computed, and how much longer until completion.
- Resources the process has used.
- Resources the process needs to complete.
- How many processes will need to be terminated.
- Is the process interactive or batch?
Resource Preemption
o Selecting a victim - minimize cost.
o Rollback - return to some safe state, restart the process from that state.
o Starvation - the same process may always be picked as victim; include the
number of rollbacks in the cost factor.
Combined Approach to Deadlock Handling
Combine the three basic approaches (prevention, avoidance, and detection),
allowing the use of the optimal approach for each class of resources in the system.
Partition resources into hierarchically ordered classes.
Use the most appropriate technique for handling deadlocks within each class.
MEMORY MANAGEMENT
Background
Logical versus Physical Address Space
Swapping
Contiguous Allocation
Paging
Segmentation
Segmentation with Paging
Background
A program must be brought into memory and placed within a process for it to be
executed.
User programs go through several steps before being executed.
Address binding of instructions and data to memory addresses can happen at three
stages:
Compile time: If the memory location is known a priori, absolute code can be
generated; must recompile code if the starting location changes.
Load time: Must generate relocatable code if the memory location is not known at
compile time.
Execution time: Binding is delayed until run time if the process can be moved
during its execution from one memory segment to another. Needs hardware
support for address maps (e.g., base and limit registers).
Dynamic Loading - a routine is not loaded until it is called.
o Better memory-space utilization; an unused routine is never loaded.
o Useful when large amounts of code are needed to handle infrequently
occurring cases.
o No special support from the operating system is required; implemented
through program design.
Dynamic Linking - linking postponed until execution time.
o A small piece of code, the stub, is used to locate the appropriate
memory-resident library routine.
o The stub replaces itself with the address of the routine, and executes the
routine.
o Operating system needed to check whether the routine is in the process's
memory address space.
Overlays - keep in memory only those instructions and data that are needed at any
given time.
o Needed when a process is larger than the amount of memory allocated to it.
o Implemented by the user; no special support needed from the operating
system, but the programming design of the overlay structure is complex.
Logical versus Physical Address Space
The concept of a logical address space that is bound to a separate physical address
space is central to proper memory management.
o Logical address - generated by the CPU; also referred to as a virtual address.
o Physical address - address seen by the memory unit.
Logical and physical addresses are the same in compile-time and load-time
address-binding schemes; logical (virtual) and physical addresses differ in the
execution-time address-binding scheme.
Memory-management unit (MMU) - hardware device that maps virtual to
physical addresses.
In the MMU scheme, the value in a relocation register is added to every address
generated by a user process at the time it is sent to memory.
The user program deals with logical addresses; it never sees the real physical
addresses.
Swapping
A process can be swapped temporarily out of memory to a backing store, and then
brought back into memory for continued execution.
Backing store - fast disk large enough to accommodate copies of all memory
images for all users; must provide direct access to these memory images.
The major part of swap time is transfer time; total transfer time is directly
proportional to the amount of memory swapped.
Modified versions of swapping are found on many systems, e.g., UNIX and
Windows 95.
(Schematic view of swapping - diagram omitted.)
Contiguous Allocation
Main memory is usually divided into two partitions:
o Resident operating system, often held in low memory with the interrupt vector.
o User processes, then held in high memory.
Single-partition allocation
o Relocation-register scheme used to protect user processes from each other,
and from changing operating-system code and data.
o The relocation register contains the value of the smallest physical address;
the limit register contains the range of logical addresses - each logical
address must be less than the limit register.
Multiple-partition allocation
o Hole - block of available memory; holes of various sizes are scattered
throughout memory.
o When a process arrives, it is allocated memory from a hole large enough
to accommodate it.
o The operating system maintains information about:
- allocated partitions
- free partitions (holes)
Dynamic storage-allocation problem - how to satisfy a request of size n from a list
of free holes.
o First-fit: Allocate the first hole that is big enough.
o Best-fit: Allocate the smallest hole that is big enough; must search the
entire list, unless ordered by size. Produces the smallest leftover hole.
o Worst-fit: Allocate the largest hole; must also search the entire list.
Produces the largest leftover hole.
First-fit and best-fit are better than worst-fit in terms of speed and storage
utilization.
External fragmentation - total memory space exists to satisfy a request, but it is
not contiguous.
Internal fragmentation - allocated memory may be slightly larger than the
requested memory; the difference between these two numbers is memory internal
to a partition, but not being used.
Reduce external fragmentation by compaction.
o Shuffle memory contents to place all free memory together in one large block.
o Compaction is possible only if relocation is dynamic, and is done at
execution time.
o I/O problem
- Latch the job in memory while it is involved in I/O.
- Do I/O only into OS buffers.
Paging - the logical address space of a process can be noncontiguous; a process is
allocated physical memory wherever the latter is available.
Divide physical memory into fixed-sized blocks called frames (size is a power of
2, between 512 bytes and 8192 bytes).
Divide logical memory into blocks of the same size called pages.
Keep track of all free frames.
To run a program of size n pages, need to find n free frames and load the program.
Set up a page table to translate logical to physical addresses.
No external fragmentation, but internal fragmentation.
The address generated by the CPU is divided into:
o Page number (p) - used as an index into a page table which contains the
base address of each page in physical memory.
o Page offset (d) - combined with the base address to define the physical
memory address that is sent to the memory unit.
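As a small illustrative sketch (the 4096-byte page size, the example address, and the page-table contents are assumptions, not from the notes), the split into p and d can be computed with shift and mask operations:

// Sketch: splitting a logical address into page number and offset,
// assuming a 4096-byte (2^12) page size.
class PagingDemo {
    static final int PAGE_SIZE = 4096;          // 2^12
    static final int OFFSET_BITS = 12;

    public static void main(String[] args) {
        int logical = 20_000;                   // example logical address
        int p = logical >>> OFFSET_BITS;        // page number: 20000 / 4096 = 4
        int d = logical & (PAGE_SIZE - 1);      // offset: 20000 mod 4096 = 3616

        int[] pageTable = {7, 3, 9, 0, 5};      // hypothetical frame numbers
        int physical = pageTable[p] * PAGE_SIZE + d;
        System.out.println("frame " + pageTable[p] + ", physical " + physical);
    }
}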
The separation between the user's view of memory and actual physical memory is
reconciled by address-translation hardware; logical addresses are translated into
physical addresses.
Implementation of page table
Page table is kept in main memory.
Page-table base register (PTBR) points to the page table.
Page-table length register (PTLR) indicates the size of the page table.
In this scheme every data/instruction access requires two memory accesses: one
for the page table and one for the data/instruction.
The two-memory-access problem can be solved by the use of a special fast-lookup
hardware cache called associative registers or translation look-aside buffers
(TLBs).
Associative registers - parallel search
 Page No | Frame No
_________|_________
|________|_________|
|________|_________|
|________|_________|
|________|_________|
Address translation (A', A'')
o If A' is in an associative register, get the frame number out.
o Otherwise get the frame number from the page table in memory.
Hit ratio - percentage of times that a page number is found in the associative
registers; the ratio is related to the number of associative registers.
Effective Access Time (EAT)
o associative lookup = e time units
o memory cycle time = m time units
o hit ratio = a
EAT = (m + e)a + (2m + e)(1 - a) = 2m + e - am
e.g. with m = 100ns, e = 20ns and a = 0.8:
EAT = 0.8 x 120 + 0.2 x 220 = 140ns.
Memory protection is implemented by associating protection bits with each frame.
Valid-invalid bit attached to each entry in the page table:
o "valid" indicates that the associated page is in the process's logical address
space, and is thus a legal page.
o "invalid" indicates that the page is not in the process's logical address space.
Write bit attached to each entry in the page table:
o pages which have not been written may be shared between processes.
o they do not need to be swapped out - they can simply be reloaded.
Multilevel Paging - partitioning the page table allows the operating system to
leave partitions unused until a process needs them.
A two-level page-table scheme: a logical address (on a 32-bit machine with 4K
page size) is divided into:
o a page number consisting of 20 bits.
o a page offset consisting of 12 bits.
Since the page table is paged, the page number is further divided into:
o a 10-bit page number.
o a 10-bit page offset.
Thus, a logical address is as follows:
| p1 (10 bits) | p2 (10 bits) | d (12 bits) |
where p1 is an index into the outer page table, and p2 is the displacement within
the page of the outer page table.
(Address-translation scheme for a two-level 32-bit paging architecture - diagram
omitted.)
Multilevel paging and performance
o Since each level is stored as a separate table in memory, converting a
logical address to a physical one may take four memory accesses.
o Even though the time needed for one memory access is quintupled (4-level
paging), caching permits performance to remain reasonable.
o Cache hit rate of 98 percent, memory access of 100ns, TLB lookup 20ns,
4-level paging:
effective access time = 0.98 x 120 + 0.02 x 520
= 128 nanoseconds
which is only a 28 percent slowdown in memory access time.
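For concreteness, here is a minimal sketch of extracting p1, p2 and d from a 32-bit logical address under the 10 + 10 + 12 split above; the class name and example address are illustrative assumptions.

// Sketch: extracting p1, p2 and d (10 + 10 + 12 bits) from a 32-bit address.
class TwoLevelSplit {
    public static void main(String[] args) {
        int addr = 0x12345678;           // example logical address
        int d  = addr & 0xFFF;           // low 12 bits: page offset
        int p2 = (addr >>> 12) & 0x3FF;  // next 10 bits: inner page-table index
        int p1 = (addr >>> 22) & 0x3FF;  // top 10 bits: outer page-table index
        System.out.printf("p1=%d p2=%d d=%d%n", p1, p2, d);
    }
}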
Inverted Page Table
One entry for each real page of memory; an entry consists of the virtual address of
the page stored in that real memory location, with information about the process
that owns that page.
Decreases the memory needed to store each page table, but increases the time
needed to search the table when a page reference occurs.
Use a hash table to limit the search to one - or at most a few - page-table entries.
Shared pages
One copy of read-only (reentrant) code shared among processes (i.e., text editors,
compilers, window systems).
Segmentation - memory-management scheme that supports the user view of
memory.
A program is a collection of segments. A segment is a logical unit such as:
o code
o local variables
o global variables
o stack
Logical address consists of a two-tuple:
<segment-number, offset>
A segment table maps two-dimensional user-defined addresses into
one-dimensional physical addresses; each entry in the table has:
o base - contains the starting physical address where the segment resides in
memory.
o limit - specifies the length of the segment.
Segment-table base register (STBR) points to the segment table's location in
memory.
Segment-table length register (STLR) indicates the number of segments used by a
program; segment number s is legal if s < STLR.
Sharing
o shared segments
o same segment number
Protection
With each entry in the segment table associate:
o validation bit = 0 -> illegal segment
o read/write/execute privileges
Allocation
o first fit/best fit
o external fragmentation
Protection bits are associated with segments; code sharing occurs at the segment
level.
Since segments vary in length, memory allocation is a dynamic storage-allocation
problem.
Segmentation with Paging
The Intel Pentium uses segmentation with paging for memory management, with
a two-level paging scheme.
Considerations in comparing memory-management strategies:
o Hardware support
o Performance
o Fragmentation
o Relocation
o Swapping
o Sharing
o Protection
VIRTUAL MEMORY
Background
Demand Paging
Performance of Demand Paging
Page Replacement
Page-Replacement Algorithms
Allocation of Frames
Thrashing
Other Considerations
Demand Segmentation
Background
Virtual memory - separation of user logical memory from physical memory.
o Only part of the program needs to be in memory for execution.
o Logical address space can therefore be much larger than physical address space.
o Need to allow pages to be swapped in and out.
Virtual memory can be implemented via:
o Demand paging
o Demand segmentation
Demand Paging
Bring a page into memory only when it is needed.
o Less I/O needed
o Less memory needed
o Faster response
o More users
Page is needed => reference to it
o invalid reference => abort
o not-in-memory => bring to memory
Valid-Invalid bit
With each page table entry a valid-invalid bit is associated (1 = in-memory,
0 = not-in-memory).
Initially the valid-invalid bit is set to 0 on all entries.
(Example of a page table snapshot - diagram omitted.)
During address translation, if the valid-invalid bit in the page table entry is 0 =>
page fault.
Page Fault
1. If there is ever a reference to a page, the first reference will trap to the OS ->
page fault.
2. OS looks at another table to decide:
a) Invalid reference => abort.
b) Just not in memory.
3. Get an empty frame.
4. Swap the page into the frame.
5. Reset tables, validation bit = 1.
6. Restart the instruction. Complications:
o block move
o auto increment/decrement location
What happens if there is no free frame?
Page replacement - find some page in memory, but not really in use, and swap it out.
o algorithm
o performance - want an algorithm which will result in the minimum number
of page faults.
The same page may be brought into memory several times.
Performance of Demand Paging
Page Fault Rate 0 <= p <= 1.0
o if p = 0, no page faults
o if p = 1, every reference is a fault
Effective Access Time (EAT)
EAT = (1 - p) x memory access
    + p x (page fault overhead
           + [swap page out]
           + swap page in
           + restart overhead)
Example:
o memory access time = 1 microsecond
o 50% of the time the page that is being replaced has been modified and
therefore needs to be swapped out.
o Swap Page Time = 10 msec = 10,000 microseconds
o Overhead per fault = 10,000 (swap in) + 0.5 x 10,000 (swap out) = 15,000
o EAT = (1 - p) x 1 + p x 15,000 ≈ 1 + 15,000p (in microseconds)
Page Replacement
Prevent over-allocation of memory by modifying the page-fault service routine to
include page replacement.
Use a modify (dirty) bit to reduce the overhead of page transfers - only modified
pages are written to disk.
Page replacement completes the separation between logical memory and physical
memory - a large virtual memory can be provided on a smaller physical memory.
Page-Replacement Algorithms
Want the lowest page-fault rate.
Evaluate an algorithm by running it on a particular string of memory references
(reference string) and computing the number of page faults on that string.
In all our examples, the reference string is
1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.
First-In-First-Out (FIFO) Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process):
1 | 1 4 5
2 | 2 1 3
3 | 3 2 4
9 page faults
4 frames:
1 | 1 5 4
2 | 2 1 5
3 | 3 2
4 | 4 3
10 page faults
FIFO Replacement - Belady's Anomaly: more frames does not imply fewer page
faults.
Optimal Algorithm
Replace the page that will not be used for the longest period of time.
4 frames example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
1 | 1 4
2 | 2
3 | 3
4 | 4 5
6 page faults
How do you know this? You don't - it requires future knowledge, so the optimal
algorithm is used for measuring how well a real algorithm performs.
Least Recently Used (LRU) Algorithm
Replace the page that has not been used for the longest period of time.
1 | 1 5
2 | 2
3 | 3 5 4
4 | 4 3
8 page faults
Counter implementation
o Every page entry has a counter; every time the page is referenced through
this entry, copy the clock into the counter.
o When a page needs to be replaced, look at the counters to determine which
page to evict.
Stack implementation - keep a stack of page numbers in a doubly linked form:
o Page referenced: move it to the top (requires 6 pointers to be changed).
o No search for replacement.
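To make the fault counts concrete, here is a minimal FIFO simulation sketch on the reference string above; the class and variable names are illustrative assumptions.

import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Sketch: counting FIFO page faults on the reference string above.
class FifoSim {
    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int frames = 4;
        ArrayDeque<Integer> queue = new ArrayDeque<>(); // loading order
        Set<Integer> inMemory = new HashSet<>();
        int faults = 0;

        for (int page : refs) {
            if (!inMemory.contains(page)) {
                faults++;
                if (inMemory.size() == frames)
                    inMemory.remove(queue.removeFirst()); // evict oldest page
                queue.addLast(page);
                inMemory.add(page);
            }
        }
        System.out.println(faults + " page faults"); // prints 10
    }
}

Setting frames to 3 makes the same code print 9, reproducing Belady's anomaly from the tables above.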
LRU Approximation Algorithms
Reference bit
o With each page associate a bit, initially = 0.
o When the page is referenced, the bit is set to 1.
o Replace a page whose bit is 0 (if one exists). We do not know the order,
however.
Second chance
o Needs the reference bit.
o Clock replacement.
o If the page to be replaced (in clock order) has reference bit = 1, then:
a) set the reference bit to 0.
b) leave the page in memory.
c) consider the next page (in clock order), subject to the same rules.
Counting Algorithms - keep a counter of the number of references that have been
made to each page.
o LFU Algorithm: replaces the page with the smallest count.
o MFU Algorithm: based on the argument that the page with the smallest
count was probably just brought in and has yet to be used.
Page-Buffering Algorithm - the desired page is read into a free frame from the
pool before the victim is written out.
Allocation of Frames
Each process needs a minimum number of pages.
Example: IBM 370 - 6 pages to handle an SS MOVE instruction:
a) The instruction is 6 bytes, so it might span 2 pages.
b) 2 pages to handle the from address.
c) 2 pages to handle the to address.
Two major allocation schemes:
o fixed allocation
o priority allocation
Fixed allocation
o Equal allocation - if 100 frames and 5 processes, give each process 20 frames.
o Proportional allocation - allocate according to the size of the process:
s_i = size of process p_i
S = sum(s_i)
m = total number of frames
a_i = allocation for p_i = (s_i / S) x m
Example: m = 64, s_1 = 10, s_2 = 127
a_1 = 10/137 x 64 ≈ 5
a_2 = 127/137 x 64 ≈ 59
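A small sketch of the proportional computation (the class name and the choice to round to the nearest frame are assumptions):

// Sketch: proportional frame allocation, as in the example above.
class ProportionalAlloc {
    public static void main(String[] args) {
        int m = 64;                      // total frames
        int[] s = {10, 127};             // process sizes
        int S = 0;
        for (int size : s) S += size;    // S = 137
        for (int i = 0; i < s.length; i++) {
            long a = Math.round((double) s[i] * m / S);  // 5 and 59
            System.out.println("p" + (i + 1) + " gets " + a + " frames");
        }
    }
}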
Priority allocation
o Use a proportional allocation scheme using priorities rather than size.
o If process Pi generates a page fault,
- select for replacement one of its frames, or
- select for replacement a frame from a process with a lower priority number.
Global versus local allocation
o Global replacement - a process selects a replacement frame from the set of
all frames; one process can take a frame from another.
o Local replacement - each process selects from only its own set of allocated
frames.
Thrashing
If a process does not have "enough" pages, the page-fault rate is very high. This
leads to:
o low CPU utilization.
o the operating system thinks that it needs to increase the degree of
multiprogramming.
o another process is added to the system.
Thrashing = a process is busy swapping pages in and out.
Why does paging work?
Locality model
o A process migrates from one locality to another.
o Localities may overlap.
Why does thrashing occur?
sum(size of locality) > total memory size
Working-Set Model
Δ = working-set window = a fixed number of page references
Example: 10,000 instructions
WSSi = working set of process Pi = total number of pages referenced in the most
recent Δ references (varies in time).
o If Δ is too small, it will not encompass the entire locality.
o If Δ is too large, it will encompass several localities.
o If Δ = infinity => it will encompass the entire program.
D = sum(WSSi) = total demand frames
If D > m => thrashing.
Policy: if D > m, then suspend one of the processes.
How do you keep track of the working set?
Approximate with an interval timer + a reference bit.
Example: Δ = 10,000
o Timer interrupts after every 5000 time units.
o Keep in memory 2 bits for each page.
o Whenever a timer interrupts, copy and then set the values of all reference
bits to 0.
o If one of the bits in memory = 1 => the page is in the working set.
Not completely accurate (why?).
Improvement: 10 bits and interrupt every 1000 time units.
Page-Fault Frequency Scheme
Establish an "acceptable" page-fault rate.
o If the actual rate is too low, the process loses a frame.
o If the actual rate is too high, the process gains a frame.
Other Considerations
1. Prepaging
2. Page size selection
- fragmentation
- table size
- I/O overhead
- locality
3. Program structure
- Array A[1024,1024] of integer
- Each row is stored in one page
- One frame
- Program 1:
for j := 1 to 1024 do
    for i := 1 to 1024 do
        A[i,j] := 0;
1024 x 1024 page faults
- Program 2:
for i := 1 to 1024 do
    for j := 1 to 1024 do
        A[i,j] := 0;
1024 page faults
4. I/O interlock and addressing
Demand Segmentation - used when there is insufficient hardware to implement
demand paging.
OS/2 allocates memory in segments, which it keeps track of through segment
descriptors.
A segment descriptor contains a valid bit to indicate whether the segment is
currently in memory.
o If the segment is in main memory, access continues.
o If not in memory, a segment fault occurs.
FILE-SYSTEM INTERFACE
File Concept
Access Methods
Directory Structure
Protection
File Concept
Contiguous logical address space
Types:
o Data
- numeric
- character
- binary
o Program
- source
- object (load image)
o Documents
File Structure
o None - sequence of words, bytes
o Simple record structure
- Lines
- Fixed length
- Variable length
o Complex structures
- Formatted document
- Relocatable load file
Can simulate the last two with the first method by inserting appropriate control
characters.
Who decides the structure:
o Operating system
o Program
File Attributes
o Name - only information kept in human-readable form.
o Type - needed for systems that support different types.
o Location - pointer to file location on device.
o Size - current file size.
o Protection - controls who can do reading, writing, executing.
o Time, date, and user identification - data for protection, security, and
usage monitoring.
Information about files is kept in the directory structure, which is maintained on
the disk.
File Operations
o create
o write
o read
o reposition within file - file seek
o delete
o truncate
Access Methods
o Sequential Access
o Direct Access
Directory Structure - a collection of nodes containing information about all files.
o Both the directory structure and the files reside on disk.
o Backups of these two structures are kept on tapes.
Organize the directory (logically) to obtain:
o Efficiency - locating a file quickly.
o Naming - convenient to users.
- Two users can have the same name for different files.
- The same file can have several different names.
o Grouping - logical grouping of files by properties, e.g., all Pascal programs,
all games, ...
Single-Level Directory - a single directory for all users.
o Naming problem
o Grouping problem
Two-Level Directory - separate directory for each user.
o Path name
o Can have the same file name for different users
o Efficient searching
o No grouping capability
Tree-Structured Directories
o Efficient searching
o Grouping capability
o Current directory (working directory)
o Absolute or relative path name
Creating a new file is done in the current directory.
Deleting a file:
cd /avi/books/os
type ch1
rm <file-name>
Creating a new subdirectory is done in the current directory:
mkdir <dir-name>
Example: if in current directory /avi/books
mkdir modula
Deleting "books" => deleting the entire subtree rooted by "books".
Acyclic-Graph Directories - have shared subdirectories and files.
o Two different names (aliasing)
o If A deletes D => dangling pointer.
Solutions:
o Backpointers, so we can delete all pointers. Variable-size records are a problem.
o Backpointers using a daisy-chain organization.
o Entry-hold-count solution.
General Graph Directory
How do we guarantee no cycles?
o Allow only links to files, not subdirectories.
o Garbage collection.
o Every time a new link is added, use a cycle-detection algorithm to
determine whether it is OK.
Protection
File owner/creator should be able to control:
o what can be done
o by whom
Types of access
o Read
o Write
o Execute
o Append
o Delete
o List
Access Lists and Groups
Mode of access: read, write, execute
Three classes of users:
a) owner access
b) group access
c) public access
Each class is given one octal digit encoding its R, W and X bits:
        R W X
7  =>   1 1 1
6  =>   1 1 0
1  =>   0 0 1
Ask the manager to create a group (unique name), say G, and add some users to
the group.
For a particular file (say game) or subdirectory, define an appropriate access:
chmod 761 game
(owner = 7, group = 6, public = 1)
Attach a group to a file:
chgrp G game
FILE-SYSTEM IMPLEMENTATION
File-System Structure
Allocation Methods
Free-Space Management
Directory Implementation
Efficiency and Performance
Recovery
File-System Structure
o File structure
- Logical storage unit
- Collection of related information
o File system resides on secondary storage (disks).
o File system organized into layers.
o File control block - storage structure consisting of information about a file.
Contiguous Allocation - each file occupies a set of contiguous blocks on the disk.
o Simple - only the starting location (block #) and length (number of blocks)
are required.
o Random access.
o Wasteful of space (dynamic storage-allocation problem).
o Files cannot grow.
Mapping from logical to physical:
If A is the logical address, and Q and R are the quotient and remainder when A is
divided by the block size (512), then:
Q = A div 512
R = A mod 512
o Block to be accessed = Q + starting address
o Displacement into block = R
Linked Allocation - each file is a linked list of disk blocks; blocks may be
scattered anywhere on the disk.
         _________
block = | pointer |
        |---------|
        |         |
        |_________|
Allocate as needed, link together.
Example: file starts at block 9.
o Simple - need only the starting address.
o Free-space management system - no waste of space.
o No random access.
Mapping:
Q = A div 511
R = A mod 511
o The block size is smaller to allow space for the pointer.
o The block to be accessed is the Qth block in the linked chain of blocks
representing the file.
o Displacement into block = R + 1
File-allocation table (FAT) - disk-space allocation used by Windows 95.
Indexed Allocation - brings all pointers together into the index block.
o Need index table
o Random access
o Dynamic access without external fragmentation, but have the overhead of
the index block.
Mapping from logical to physical in a file of maximum size 256K words and
block size 512 words: we need only 1 block for the index table.
Q = A div 512
R = A mod 512
o Q = displacement into index table
o R = displacement into block
Mapping from logical to physical in a file of unbounded length (block size of 512
words):
o Linked scheme - link blocks of index tables (no limit on size).
Q1 = A div (512 x 511)
R1 = A mod (512 x 511)
- Q1 = block of index table
- R1 is used as follows:
Q2 = R1 div 512
R2 = R1 mod 512
- Q2 = displacement into block of index table
- R2 = displacement into block of file
o Two-level index (maximum file size is 512³):
Q1 = A div (512 x 512)
R1 = A mod (512 x 512)
- Q1 = displacement into outer index
- R1 is used as follows:
Q2 = R1 div 512
R2 = R1 mod 512
- Q2 = displacement into block of index table
- R2 = displacement into block of file
Combined scheme: UNIX (4K bytes per block)
o directly accessed: 48K bytes
o single indirection: 2²² bytes
o double indirection: 2³² bytes
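A minimal sketch of the two-level Q1/R1/Q2/R2 mapping above with 512-word blocks; the class name and example address are illustrative assumptions.

// Sketch: two-level indexed-allocation mapping with 512-word blocks.
class TwoLevelIndex {
    public static void main(String[] args) {
        final int BLOCK = 512;
        int a = 1_000_000;                 // logical address (in words)

        int q1 = a / (BLOCK * BLOCK);      // displacement into outer index
        int r1 = a % (BLOCK * BLOCK);
        int q2 = r1 / BLOCK;               // displacement into inner index block
        int r2 = r1 % BLOCK;               // displacement into file block

        System.out.printf("outer=%d inner=%d offset=%d%n", q1, q2, r2);
    }
}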
Free-Space Management
1. Bit vector (n blocks)
o The block number of the first free block is calculated as:
(number of bits per word) x (number of 0-value words) + offset of first 1 bit
o The bit map requires extra space.
E.g.: block size = 2¹² bytes, disk size = 2³⁰ bytes (1 gigabyte),
n = 2³⁰ / 2¹² = 2¹⁸ = 256K bits
o Easy to get contiguous files (see the scan sketch after this list).
2. Need to protect:
o Pointer to free list
o Bit map
- Must be kept on disk.
- Copy in memory and disk may differ.
- Cannot allow a situation for block[i] where bit[i] = 1 in memory and
bit[i] = 0 on disk.
Solution:
- Set bit[i] = 1 on disk.
- Allocate block[i].
- Set bit[i] = 1 in memory.
3. Linked list (free list)
o Cannot get contiguous space easily
o No waste of space
4. Grouping
5. Counting
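The scan sketch referred to above: finding the first free block in a bit map, assuming the convention that a 1 bit marks a free block and that bit 0 of each word is the lowest-order bit; the class and method names are illustrative.

// Sketch: first free block in a bit map packed into 32-bit words.
class BitmapScan {
    static int firstFreeBlock(int[] words) {
        final int BITS_PER_WORD = 32;
        for (int w = 0; w < words.length; w++) {
            if (words[w] != 0) {  // skip all-0 (fully allocated) words
                int offset = Integer.numberOfTrailingZeros(words[w]);
                return BITS_PER_WORD * w + offset;
            }
        }
        return -1;                // no free block
    }

    public static void main(String[] args) {
        int[] map = {0x0, 0x0, 0b1000};          // first 1 bit: word 2, offset 3
        System.out.println(firstFreeBlock(map)); // prints 67 = 32*2 + 3
    }
}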
Directory Implementation
o Linear list of file names with pointers to the data blocks.
- simple to program
- time-consuming to execute
o Hash Table - linear list with hash data structure.
- decreases directory search time
- collisions - situations where two file names hash to the same location
- fixed size
Efficiency and Performance
Efficiency dependent on:
o disk allocation and directory algorithms
o types of data kept in the file's directory entry
Performance
o disk cache - separate section of main memory for frequently used blocks
o free-behind and read-ahead - techniques to optimize sequential access
o improve PC performance by dedicating a section of memory as a virtual
disk, or RAM disk
Recovery
Consistency checker - compares data in the directory structure with the data
blocks on disk, and tries to fix inconsistencies.
SECONDARY-STORAGE STRUCTURE
Disk Structure
Disk Scheduling
Disk Management
Swap-Space Management
Disk Reliability
Stable-Storage Implementation
Disk Structure
A disk can be viewed as an array of blocks.
There exists a mapping scheme from logical block address Bi to physical address
(track, sector).
o The smallest storage allocation area is a block.
o Internal fragmentation occurs within a block.
Disk Scheduling
Disk requests - track/sector
o Seek
o Latency
o Transfer
Minimize seek time: seek time is proportional to seek distance.
A number of different algorithms exist. We illustrate them with a request queue
(cylinders 0-199):
98, 183, 37, 122, 14, 124, 65, 67
Head pointer at 53.
o FCFS
o SSTF
o SCAN
o C-SCAN
o LOOK
o C-LOOK
(A seek-distance sketch for SSTF follows below.)
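The sketch referred to above computes the total head movement under SSTF for the request queue given; the class and variable names are illustrative assumptions.

import java.util.ArrayList;
import java.util.List;

// Sketch: total head movement under SSTF for the queue above.
class SstfDemo {
    public static void main(String[] args) {
        List<Integer> queue =
            new ArrayList<>(List.of(98, 183, 37, 122, 14, 124, 65, 67));
        int head = 53, moved = 0;

        while (!queue.isEmpty()) {
            // pick the pending request closest to the current head position
            int best = 0;
            for (int i = 1; i < queue.size(); i++)
                if (Math.abs(queue.get(i) - head) < Math.abs(queue.get(best) - head))
                    best = i;
            int next = queue.remove(best);
            moved += Math.abs(next - head);
            head = next;
        }
        System.out.println("total head movement: " + moved); // 236 cylinders
    }
}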
Disk Management
Disk formatting
o physical
o logical
Boot block initializes the system.
Need methods to detect and handle bad blocks.
Swap-Space Management
Swap-space use
Swap-space location
o normal file system
o separate disk partition
Swap-space management
o 4.3BSD allocates swap space when a process starts (holds text segment and
data segment).
o Kernel uses swap maps to track swap-space use.
Disk Reliability
o Disk striping
o RAID
- Mirroring or shadowing keeps a duplicate of each disk.
- Block interleaved parity.
Stable-Storage Implementation
o Write-ahead log scheme requires stable storage.
o To implement stable storage:
- Replicate information on more than one nonvolatile storage medium
with independent failure modes.
- Update information in a controlled manner to ensure that failure during
data transfer does not damage information.
PROTECTION
Goals of Protection
Domain of Protection
Access Matrix
Implementation of Access Matrix
Revocation of Access Rights
Capability-Based Systems
Language-Based Protection
Goals of Protection
o Operating system consists of a collection of objects, hardware or software.
o Each object has a unique name and can be accessed through a well-defined
set of operations.
o Protection problem - ensure that each object is accessed correctly and only
by those processes that are allowed to do so.
Domain Structure
o Access-right = <object-name, rights-set>
Rights-set is a subset of all valid operations that can be performed on the object.
o Domain = set of access-rights
e.g.
AR1 = <file_A,{Read,Write}>
AR8 = <file_A,{Read}>
Domain Implementation
A simple operating system consists of 2 domains:
o user
o supervisor
UNIX
o Domain = user-id
o Domain switch accomplished via the file system.
- Each file has associated with it a domain bit (setuid bit).
- When a file is executed and setuid = on, then the user-id is set to the
owner of the file being executed. When execution completes, the
user-id is reset.
Multics Rings - Let Di and Dj be any two domain rings.
If j < i => Di is a subset of Dj.
Access Matrix
o Rows - domains
o Columns - domains + objects
o Each entry - access rights
Use of Access Matrix
If a process in domain Di tries to do "op" on object Oj, then "op" must be in the
access matrix.
Can be expanded to dynamic protection.
o Operations to add, delete access rights.
o Special access rights:
- owner of Oj
- control - switch from domain Di to Dj
Access-matrix design separates mechanism from policy.
o Mechanism - the operating system provides the access matrix + rules.
It ensures that the matrix is only manipulated by authorized agents and
that rules are strictly enforced.
o Policy - the user dictates policy: who can access what object and in what
mode.
Implementation of Access Matrix
Each column = access-control list for one object.
Defines who can perform what operation.
Domain 1 = Read, Write
Domain 2 = Read
Domain 3 = Read
...
Each row = capability list (like a key).
For each domain, what operations are allowed on what objects.
Object 1 - Read
Object 4 - Read, Write, Execute
Object 5 - Read, Write, Delete, Copy
Revocation of Access Rights
o Access List - delete access rights from the access list.
- simple
- immediate
o Capability List - a scheme is required to locate the capability in the system
before the capability can be revoked.
- Reacquisition
- Back-pointers
Language-Based Protection
o Specification of protection in a programming language allows the high-level
description of policies for the allocation and use of resources.
o Language implementation can provide software for protection enforcement
when automatic hardware-supported checking is unavailable.
o Interpret protection specifications to generate calls on whatever protection
system is provided by the hardware and the operating system.
SECURITY
The Security Problem
Authentication
Program Threats
System Threats
Threat Monitoring
Encryption
The Security Problem
Security must consider the external environment of the system, and protect it from:
o unauthorized access.
o malicious modification or destruction.
o accidental introduction of inconsistency.
It is easier to protect against accidental than malicious misuse.
Authentication
User identity is most often established through passwords; this can be considered
a special case of either keys or capabilities.
Passwords must be kept secret.
o Frequent change of passwords.
o Use of "non-guessable" passwords (not in a dictionary, not a simple variant
of the username).
o Log all invalid access attempts.
Program Threats
o Trojan Horse
- Code segment that misuses its environment.
- Exploits mechanisms for allowing programs written by users to be
executed by other users.
o Trap Door
- Specific user identifier or password that circumvents normal security
procedures.
- Could be included in a compiler or the kernel.
System Threats
o Worms - use a spawn mechanism; standalone program.
Internet worm:
- Exploited UNIX networking features (remote access) and bugs in the
finger and sendmail programs.
- Grappling hook program uploaded the main worm program.
o Viruses - fragment of code embedded in a legitimate program.
- Mainly affect microcomputer systems.
- Spread by downloading viral programs from public bulletin boards or
exchanging floppy disks containing an infection.
- Defence: safe computing.
Threat Monitoring
Check for suspicious patterns of activity - e.g., several incorrect password
attempts may signal password guessing.
Audit log - records the time, user, and type of all accesses to an object; useful for
recovery from a violation and for developing better security measures.
Scan the system periodically for security holes; done when the computer is
relatively unused. Check for:
o Short or easy-to-guess passwords
o Unauthorized set-uid programs
o Unauthorized programs in system directories
o Unexpected long-running processes
o Improper directory protections
o Improper protections on system data files
o Dangerous entries in the program search path (Trojan horse)
o Changes to system programs; monitor checksum values
Encryption
Encrypt clear text into cipher text.
Properties of a good encryption technique:
o Relatively simple for authorized users to encrypt and decrypt data.
o The encryption scheme depends not on the secrecy of the algorithm, but on
a parameter of the algorithm called the encryption key.
o Extremely difficult for an intruder to determine the encryption key.
Data Encryption Standard substitutes characters and rearranges their order on the
basis of an encryption key provided to authorized users via a secure mechanism.
The scheme is only as secure as that mechanism.
Public-key encryption is based on each user having two keys:
o public key - published key used to encrypt data.
o private key - key known only to the individual user, used to decrypt data.
There must be an encryption scheme that can be made public without making it
easy to figure out the decryption scheme. Such a scheme relies on two facts:
o There is an efficient algorithm for testing whether or not a number is prime.
o No efficient algorithm is known for finding the prime factors of a number.