Fundamental Concepts of Operating Systems
Prof. Lixin Tao
Pace University
December 4, 2002
Chapter 1: Introduction
• An operating system is a software layer between the CPU hardware and user programs.
  o The OS provides a collection of common services so that application programmers don't need to work with low-level system primitives. As a result, the OS makes programming easier. Since such services are implemented by experts, they may also be more efficient.
  o The OS manages the computer hardware resources to improve their utilization or the performance of particular applications.
• A computer running an OS can be considered a virtual machine that supports all the machine instructions as well as the system calls (OS services) implemented by the OS.
• The Java virtual machine is a program that interprets and executes Java bytecode files. It runs on top of the OS layer, thus providing a higher-level virtual machine with more abstractions and services.
• The main design goal of an OS is to make the computer system more efficient and easier to use.
• A mainframe system is powerful and expensive. The OS should try to improve its utilization.
• A process is a program in execution. Multiple processes may run the same copy of a program in memory.
• Batch systems are an old form of OS that minimize human interaction time with mainframe systems to improve system utilization.
• Multiprogramming systems keep multiple processes (jobs) resident in main memory waiting for the CPU to run. When one process is waiting for an I/O event, another process can take over the CPU. This improves system utilization. With proper process scheduling, multiprogramming also improves the responsiveness of applications to user events.
• Time-sharing is a special form of multiprogramming. A time unit is selected to balance the overhead of process context switching against the responsiveness of the system to user events. All processes ready to run take turns, round robin, using the CPU for up to the specified time unit. The result is the illusion that each process is running concurrently on its own virtual machine.
• Parallel execution usually implies that multiple processes are using multiple hardware units to run at the same time. Concurrent execution usually implies that multiple processes are running on the same hardware based on time-sharing or multiplexing.
• A personal computer using network resources (like a file server) is called a workstation. A PC or workstation serves one person at a time, and its OS focuses on the responsiveness of the system to user events.
• A real-time system aims to complete the execution of a program within rigid time constraints. A real-time OS usually simplifies its structure and functions to reduce its execution overhead.
• A multiprocessor system has multiple processors inside one cabinet connected through a high-speed inter-processor network. The processors have no local main memory; they share common memory modules on the network. The multiple processors are used to run the same application faster. Speedup, defined as the execution time on a uniprocessor divided by the execution time on the multiprocessor, is the main objective of multiprocessor systems.
• A symmetric multiprocessor (SMP) is a multiprocessor in which all processors have the same role and run the same OS.
• A multicomputer differs from a multiprocessor in that each processor has local memory, there is no shared memory, and message passing is the only mechanism for inter-processor communication. The main objective of a multicomputer is also speedup. While the size of a multiprocessor is limited to about 1024 processors (due to the limits of shared memory), a multicomputer can easily scale up to thousands of processors.
• A parallel system is either a multiprocessor or a multicomputer.
• A distributed system has multiple computers running in different rooms, buildings, or cities and connected through networks. The main objectives of a distributed system are resource sharing and fault tolerance.
• A local-area network (LAN) connects computers within a building. A wide-area network (WAN) connects computers across cities and countries. The Internet is one example of a WAN. Most of today's networks use the TCP/IP protocol suite for communication.
• In a client-server system, one or more computers run server programs and provide services, and many client computers visit these server machines to get service. Servers and clients do not have the same role. The servers and the network bandwidth (number of bits transferred per second over the network) are the performance bottlenecks.
• In a peer-to-peer system, all computers have the same role. Each one can be a client at one time and a server at another time. There are fewer bottlenecks in such a system.
• A clustered system usually uses a local cluster of networked computers to simulate a parallel system or provide high availability. It is much less expensive than parallel systems like multiprocessors or multicomputers.
Chapter 2: Computer-System Structures
• The CPU, main memory, and I/O devices are connected through a system bus. The system bus has wires for memory addresses, wires for data, and wires for controlling the bus (deciding who gets the next chance to use the bus). At any time, the bus can have only one data source sending data on the bus, and one or more listeners getting data from the bus.
• At system boot, a short bootstrap program runs from read-only memory (ROM) to load the core of the OS into memory.
• The program and data of a process must be loaded into main memory to run.
• An I/O device is controlled by its device controller. A device controller has a control register to receive commands from the CPU and to show the status of the I/O device, and a few data registers.
• All I/O operations are performed through shared OS functions called interrupt handlers. An interrupt handler uses a device driver, usually provided by the I/O device manufacturer, to control the corresponding device controller.
• At the low end of main memory is the interrupt vector: each memory word is indexed by the ID of a type of I/O device and holds the starting address of the corresponding interrupt handler.
• When an I/O device needs attention from the CPU, it generates an interrupt signal on the system bus.
• After the execution of each instruction, the hardware checks whether interrupt handling is enabled. If it is enabled, and an interrupt has been requested by some source on the system bus, the machine enters its interrupt-handling phase.
• The state of a process includes the value of the program counter (indicating the address of the next instruction to be executed), the values of the general-purpose registers, its stack for method invocation, its open files, and its code.
• During the interrupt-handling phase, the hardware saves the values of the program counter and registers into the current process's Process Control Block (PCB) in main memory, uses the ID of the interrupt source carried on the system bus address wires to look up the starting address of the interrupt handler in the interrupt vector, and starts to run the interrupt handler.
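As an illustration (not part of the original notes), the following Java sketch simulates an interrupt vector as an array of handler entry points indexed by device ID. The device IDs, handler bodies, and class name are all made up; real hardware would save the PC and registers before dispatching and restore them afterward.

  public class InterruptVectorDemo {
      static final int KEYBOARD = 0, DISK = 1;              // assumed device IDs
      static final Runnable[] interruptVector = new Runnable[16];

      public static void main(String[] args) {
          // "Install" handlers: the vector maps a device ID to a handler entry point.
          interruptVector[KEYBOARD] = () -> System.out.println("keyboard handler: read scan code");
          interruptVector[DISK]     = () -> System.out.println("disk handler: block transfer done");

          // Hardware would save the PC/registers into the PCB here, then dispatch:
          dispatch(DISK);
          // ...and finally restore the saved state to resume the interrupted process.
      }

      static void dispatch(int deviceId) {
          Runnable handler = interruptVector[deviceId];     // vector lookup by device ID
          if (handler != null) {
              handler.run();                                // jump to the handler's start address
          }
      }
  }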
• At the end of interrupt processing, the interrupted process's saved state is loaded back into the registers and the process resumes execution.
• Interrupt processing may be nested: the execution of an interrupt handler may itself be interrupted by an event with higher priority.
• When a program needs service from the OS, or when a serious error happens, it generates a trap, which is basically an interrupt generated by software. An interrupt handler processes it.
• For slow I/O devices, the transfer of each word may need an interrupt so that the CPU can copy the word between main memory and the device controller's data register.
• For faster I/O devices like hard disks, a Direct Memory Access (DMA) controller coordinates the transfer of a large block of data between main memory and the I/O device, and interrupts the CPU only once at the end of the data transfer.
• When a process needs the service of an I/O device but the device is busy serving other processes, it puts itself in the waiting queue for that device.
• Spooling is a technique to improve system utilization. When a process needs to send data to a slow device, the data is first copied to a hard disk buffer; the process can then resume its execution. The OS later takes care of the off-line transfer of data from the disk buffer to the I/O device.
• A typical memory system has four layers: registers, cache, main memory, and hard disk. They are ordered from fast to slow, from small capacity to large capacity, and from expensive to cheap.
• The objective of a layered memory system is to implement a less expensive memory that still supports fast access.
• When a word is needed from a layer of memory and it is not there, the block of data containing that word is copied up from the lower layer, in the hope that future memory accesses will be to these words.
• The success of a layered memory system is based on the principle of locality of reference: the memory accesses of an executing program usually stay within a small address window during a small time span. If the program is accessing address K, it will most likely access an address near K in the near future.
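A simple (hypothetical) way to observe locality: traversing a two-dimensional array row by row touches consecutive addresses and is cache-friendly, while traversing it column by column jumps across memory. The array size and timing code below are illustrative only; exact numbers depend on the machine.

  public class LocalityDemo {
      public static void main(String[] args) {
          int n = 4096;
          int[][] a = new int[n][n];

          long t0 = System.nanoTime();
          for (int i = 0; i < n; i++)          // row-major: consecutive addresses
              for (int j = 0; j < n; j++)
                  a[i][j]++;
          long t1 = System.nanoTime();

          for (int j = 0; j < n; j++)          // column-major: large strides, poor locality
              for (int i = 0; i < n; i++)
                  a[i][j]++;
          long t2 = System.nanoTime();

          System.out.println("row-major    (good locality): " + (t1 - t0) / 1_000_000 + " ms");
          System.out.println("column-major (poor locality): " + (t2 - t1) / 1_000_000 + " ms");
      }
  }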
• The principle of locality of reference follows from the largely sequential execution of programs and from data being stored in arrays.
• Data consistency is a challenge for a layered memory system: the same data may have multiple copies in several layers.
• A hard disk has multiple platters. Each surface of a platter is divided into circular tracks for storing data. Each track is divided into equal-size sectors. Data transfer between a hard disk and main memory is done in multiples of sectors.
• All the read/write heads of a hard disk are moved by the same mechanical arm. It is slow to move the read/write heads from the innermost track to the outermost track, or vice versa. As a result, disk addresses are assigned by cylinder: the corresponding tracks on all platters make up a cylinder, and all data in a cylinder can be read or written without mechanical movement of the read/write heads.
• Hardware protection mechanisms include
  o Dual-mode operation: a bit in a control register specifies the current system mode. OS code can only run in system mode (also called supervisor mode or privileged mode). Applications can only run in user mode. All I/O operations are handled by OS interrupt handlers and can therefore run in system mode only.
  o Base and limit registers: the OS assigns a consecutive block of main memory to a process for execution. The start address and length of this memory space are copied into the base register and limit register respectively. Each memory address generated by the process is checked against these two registers by hardware to see whether it falls in the authorized memory space (see the sketch after this list). A user process is not allowed to access memory locations outside of its assigned block.
  o Hardware timer: a timer issues an interrupt at specified time intervals. As a result the OS always gets a chance to check whether the system is still under its control. This timer is also used to implement time-sharing.
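A minimal sketch of the base/limit check described above, with made-up register values; real hardware performs this comparison on every memory reference and raises a trap to the OS on a violation.

  public class BaseLimitCheck {
      static final long BASE  = 0x4000;    // assumed start of the process's memory block
      static final long LIMIT = 0x1000;    // assumed length of the block

      // Translate a logical address to a physical address, trapping on violation.
      static long translate(long logical) {
          if (logical >= LIMIT) {
              throw new IllegalStateException("addressing error: trap to OS");
          }
          return BASE + logical;           // relocation by the base register
      }

      public static void main(String[] args) {
          System.out.printf("logical 0x0200 -> physical 0x%04X%n", translate(0x0200));
          try {
              translate(0x2000);           // outside the assigned block
          } catch (IllegalStateException e) {
              System.out.println(e.getMessage());
          }
      }
  }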
Chapter 3: Operating System Structures
• The OS is interrupt-driven. When an I/O device needs service, it generates an interrupt. When a program needs service, it generates a trap (software interrupt).
• Logically, an OS has components for process management, main-memory management, file management, I/O system management, disk management, networking, protection, and the command-interpreter system.
• The command interpreter is the most visible user interface of an OS. It allows a user to type a command, and the OS then executes the corresponding OS program to provide the service. In Unix, the command interpreter is called a shell. A batch file contains one or more commands for the OS to execute in a particular order.
• The OS provides many services to facilitate the execution of programs:
  o Program execution: load code, execute it, and end it.
  o I/O operations.
  o File system manipulation: all data on hard disk are abstracted as files. A file is a logical sequence of bits. The OS can create a file, read/write a file, delete a file, buffer data during file reads/writes, and manage files with directories.
  o Communications: allow processes to communicate through shared memory or message passing.
  o Error detection.
  o Resource allocation: allocate CPU time, main memory space, file buffers, I/O devices, and communication sockets to processes.
  o Accounting.
  o Protection.
• Applications get OS services through system calls. Each OS needs to publish its application programming interface (API) to allow programs to invoke its services. For MS Windows, it is the Win32 API.
• Parameter passing for OS system calls:
  o For fixed, small-size data, use registers.
  o For large data, keep the data in user memory space and put its starting address in a specified register; the system call goes to that register to find the parameter data.
• A daemon is a process that runs without being associated with a particular user login session. It keeps listening for client invocations to provide some service. A Web server and a Domain Name System (DNS) server are example system daemons.
• Processes running on the same system can communicate through shared memory.
• Processes running on the same or different systems can communicate through message passing: the sender uses a send() method to pass the data to its local OS; the OS routes the data to the OS of its destination; the receiving process invokes receive() to get the data from its local OS.
• An OS usually consists of a kernel (the core functions of the OS that always stay in main memory) and some system programs. Typical system programs include the command interpreter and the programs implementing the OS commands.
• The structure of an OS has to balance the needs of performance and maintainability.
• In Unix, the kernel implements the common primitive operations; the system programs implement higher-level services based on the primitive kernel operations; and the shells and user programs get services through the system call API.
• An OS with a layered structure strives to let each layer of the OS invoke services only from its immediate lower neighbor layer. Performance is a big issue for this approach.
• An OS based on a microkernel minimizes the kernel functions and implements most of the OS services as system programs.
• As an extreme example of the virtual machine concept, special system software can be used to simulate multiple copies of the underlying hardware machine, and support the installation and execution of the same or different operating systems on each of these virtual machines.
• Mechanisms should be separated from policies if possible: the OS should have rich built-in mechanisms for its management and configuration, but the actual policy controlling these mechanisms should stay out of the OS itself. For example, we build the process-scheduling primitives into an OS, but ideally we can decide whether we want first-come-first-served or priority-based scheduling at OS installation time.
• System generation refers to the process of compiling and building an executable version of an OS for a particular environment. Many policies and system constants can be specified at this stage.
Chapter 4: Processes
• A process is also called a job. It is the basic unit of resource allocation in an OS. Each process has a unique ID number. Inside a process there are one or more threads of execution. For simplicity, this chapter focuses on processes with a single thread.
• Process execution states:
  o A process in the Ready state can run as soon as it gets the CPU.
  o A process in the Running state is executing on some CPU. In a system with one CPU, only one process may be in the Running state at a time.
  o A process in the Waiting state is waiting for some event to occur. Typical events are I/O completion or the reception of a signal.
  o A newly created process is in the Ready state and is put in the ready queue.
  o When the CPU is available, the CPU scheduler chooses one of the processes in the ready queue, removes it from the queue, and starts running it in the Running state.
  o When the running process issues a system call for I/O service, the process is put in the Waiting state and linked into the waiting queue of the particular I/O device.
  o When the I/O service completes for a process, the process is put back in the Ready state and inserted into the ready queue.
• Each process is represented inside the OS by its Process Control Block (PCB). A PCB is an object with fields to save the process's execution state, its unique process ID, the value of the program counter, the values of the general-purpose registers, the values of the base and limit registers, the list of open files, etc. The PCB also has a pointer (reference) field to other PCBs so that PCBs can be linked into various queues such as the ready queue and the waiting queues.
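A minimal Java sketch of a PCB as described above; the field set and the queue link are illustrative only, and real kernels keep considerably more per-process state.

  import java.util.ArrayList;
  import java.util.List;

  class ProcessControlBlock {
      enum State { NEW, READY, RUNNING, WAITING, TERMINATED }

      int pid;                               // unique process ID
      State state = State.NEW;               // current execution state
      long programCounter;                   // saved PC
      long[] registers = new long[16];       // saved general-purpose registers
      long baseRegister, limitRegister;      // memory-protection registers
      List<String> openFiles = new ArrayList<>();
      ProcessControlBlock next;              // link used to form ready/waiting queues

      public static void main(String[] args) {
          ProcessControlBlock pcb = new ProcessControlBlock();
          pcb.pid = 42;
          pcb.state = State.READY;
          System.out.println("PCB " + pcb.pid + " in state " + pcb.state);
      }
  }

Linking PCBs through the next field is how the ready queue and the per-device waiting queues are formed.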
• Potential processes (jobs) wait on the hard disk. When there is enough main memory, the long-term scheduler (job scheduler) chooses one of them and moves it into main memory. Such a process is then in the ready state.
• The long-term scheduler determines how many processes reside in main memory for execution, and thus determines the degree of multiprogramming.
• A CPU-bound process spends more of its time on CPU execution. An I/O-bound process spends more of its time waiting for I/O operations. It is ideal for system throughput and utilization if the long-term scheduler chooses a good mix of these two types of processes to run in main memory.
• The short-term scheduler (CPU scheduler) decides which process in the ready queue takes over the CPU for execution.
• If the processes take up more memory space than is available in main memory, a medium-term scheduler may be used to swap out and swap in partially executed processes.
• Process context switch time is significant overhead. A typical context switch takes from 1 to 1000 microseconds.
• When a user logs in to a Unix system, a new process is created to run the shell for that user. When the user issues a command at the shell, a new process is created to execute the command. During the execution of a program, new child processes may be created to run subtasks.
• If two processes communicate through shared memory with a bounded buffer, the two processes must cooperate so that there is no buffer underflow or overflow.
• If two processes communicate through message passing, they can either use direct communication through a direct link, or use indirect communication through a mailbox. The sender normally uses a method of the form send(pid, message), and the receiver uses receive(pid, message).
• A blocking send blocks the sending process until the message reaches its destination process or mailbox. A nonblocking send hands the message to the OS buffer and resumes execution right away.
• A blocking receive blocks until a message is available. A nonblocking receive returns either a valid message or null.
• If both the send and the receive are blocking, we have a rendezvous between the sender and the receiver.
• Buffers are usually used to implement communication links. A send is blocked if the link's buffer is full.
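The following sketch, using Java threads and an ArrayBlockingQueue as a stand-in for the link buffer (names and capacity are made up), illustrates the difference between a blocking receive, which waits for a message, and a nonblocking receive, which returns null when no message is available; put() on a full queue likewise models a send blocking on a full link buffer.

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class MessagePassingDemo {
      public static void main(String[] args) throws InterruptedException {
          BlockingQueue<String> link = new ArrayBlockingQueue<>(2);  // bounded link buffer

          String none = link.poll();                        // nonblocking receive: no message yet
          System.out.println("nonblocking receive -> " + none);      // prints null

          Thread sender = new Thread(() -> {
              try {
                  link.put("hello");                        // blocking send: waits if buffer is full
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
          });
          sender.start();

          String msg = link.take();                         // blocking receive: waits for a message
          System.out.println("blocking receive -> " + msg);
          sender.join();
      }
  }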
• Each networked computer has a unique IP address. A Uniform Resource Locator (URL) contains a domain name that is friendlier to users than a raw IP address. The Domain Name System (DNS) translates a domain name into its IP address.
• Each process on a computer can listen on a port to get messages from its clients. A port is represented by a unique integer. A socket is the abstraction of a port on a particular computer.
• When a server starts on a server machine, it creates a server socket on a particular port, publishes its IP address and port number, and keeps listening on that port for potential client messages.
• When a client needs to communicate with a server, it creates a new socket on the client machine to connect to the server's server socket (identified by the server's IP address and port number). The server socket then creates a new socket on the server machine to communicate with this client. Both the server and the client know the IP address and port number of their partner's socket, and they can communicate as if writing into or reading from file streams. In the above discussion, except for the server socket's port number, all the sockets' port numbers are chosen at random from the unused port numbers on their own machines.
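A minimal Java sockets sketch of the pattern just described, with both ends in one program for brevity; the port number 5000 and the message text are arbitrary choices.

  import java.io.*;
  import java.net.ServerSocket;
  import java.net.Socket;

  public class EchoDemo {
      public static void main(String[] args) throws Exception {
          ServerSocket serverSocket = new ServerSocket(5000);      // listen on a published port

          Thread server = new Thread(() -> {
              try (Socket conn = serverSocket.accept();            // new socket for this client
                   BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                   PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                  out.println("echo: " + in.readLine());           // read request, write reply
              } catch (IOException e) {
                  e.printStackTrace();
              }
          });
          server.start();

          // Client side: connect from a randomly chosen local port to the server's socket.
          try (Socket client = new Socket("localhost", 5000);
               PrintWriter out = new PrintWriter(client.getOutputStream(), true);
               BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
              out.println("hello");
              System.out.println(in.readLine());                   // prints "echo: hello"
          }
          server.join();
          serverSocket.close();
      }
  }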
• Java Remote Method Invocation (RMI) allows an object to invoke a method on a remote server object. For a class implementing the server object, a stub class is generated by a tool; the stub implements the same public methods as the server object. A matching skeleton class is generated by the same tool to run on the server machine and communicate with the stub.
• When a server starts, a skeleton object is created and it keeps listening for potential messages from its corresponding stub objects.
• When a client needs to invoke a method on the remote server, it creates a stub object on its local machine. The matching stub and skeleton objects know how to communicate with each other. The client invokes a method on its local stub object (a proxy). The method body of the stub marshals its parameter values into a platform-independent form, and passes the method name and the marshaled parameter values to the remote skeleton object.
• The remote skeleton object unmarshals the parameter values for its local platform and invokes the local server object. When the server object returns the method's return value, the skeleton object marshals the return value into the platform-independent form and passes it to the remote stub object. The stub object unmarshals the return value and passes it back to the client as its own return value.
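For reference, a minimal sketch of the same idea with current Java RMI, where the stub is generated dynamically at run time and no separate skeleton class is needed; the interface name, registry port 1099, and binding name are arbitrary choices, and both ends are shown in one JVM for brevity.

  import java.rmi.Remote;
  import java.rmi.RemoteException;
  import java.rmi.registry.LocateRegistry;
  import java.rmi.registry.Registry;
  import java.rmi.server.UnicastRemoteObject;

  // Remote interface shared by client and server.
  interface Echo extends Remote {
      String echo(String msg) throws RemoteException;
  }

  public class EchoRmiDemo implements Echo {
      public String echo(String msg) { return "echo: " + msg; }

      public static void main(String[] args) throws Exception {
          // Server side: export the object (a dynamic stub is created) and register it.
          Echo stub = (Echo) UnicastRemoteObject.exportObject(new EchoRmiDemo(), 0);
          Registry registry = LocateRegistry.createRegistry(1099);
          registry.rebind("Echo", stub);

          // Client side: look up the stub and call it like a local object.
          Echo proxy = (Echo) LocateRegistry.getRegistry("localhost", 1099).lookup("Echo");
          System.out.println(proxy.echo("hello"));   // parameters are marshaled under the hood
      }
  }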
Chapter 5: Threads
• A process may contain one or more threads of computation.
• Each thread is represented by its own program counter value, register values, and execution stack contents.
• Threads belonging to the same process share code and data. It is up to the programmer to protect the integrity of the shared data.
• The overhead of switching the CPU from one thread to another is much smaller than that of a process context switch.
• Multithreading is critical for keeping an application responsive.
• A multithreaded program running on a multiprocessor may speed up execution by letting each processor run a separate thread.
• A multithreaded program running on a uniprocessor can still overlap the execution of multiple threads on different functional units, such as the CPU and the I/O devices.
• Threads created by a programming language or library are user threads.
• Threads supported by the OS are kernel threads.
• User threads are mapped to kernel threads for execution.
• Thread scheduling is platform-dependent. A multithreaded application should be tested on all potential client platforms.
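A small Java example of the idea: two threads of the same process sharing one counter. Without the synchronized keyword (monitors are discussed in Chapter 7), the integrity of the shared data would not be protected. The class name and iteration count are arbitrary.

  public class SharedCounterDemo {
      static long counter = 0;                                 // data shared by both threads

      static synchronized void increment() { counter++; }      // monitor-style protection

      public static void main(String[] args) throws InterruptedException {
          Runnable work = () -> {
              for (int i = 0; i < 1_000_000; i++) increment();
          };
          Thread t1 = new Thread(work);
          Thread t2 = new Thread(work);
          t1.start(); t2.start();                              // both threads share code and data
          t1.join();  t2.join();
          System.out.println("counter = " + counter);          // 2000000 with synchronization
      }
  }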
Chapter 6: CPU Scheduling
• The CPU scheduler decides which ready process gets the CPU to run next.
• CPU-bound processes spend much of their time in CPU computation; I/O-bound processes spend much of their time in I/O operations. If the system has a good mix of CPU-bound and I/O-bound processes, a CPU scheduler can improve CPU utilization by overlapping CPU and I/O operations.
• If a scheduling policy can take the CPU away from a process that could still run on the CPU, it is a preemptive scheduling policy.
• Scheduling criteria:
  o CPU utilization.
  o Throughput: number of processes finished per time unit.
  o Turnaround time: time from submission to completion of a process.
  o Waiting time: total time a process spends in the ready queue.
  o Response time: time delay of a process's response to a user request.
• Scheduling algorithms (a small comparison of the first two follows this list):
  o First-come, first-served (FCFS).
  o Shortest-job-first (SJF).
  o Priority: the job with the highest priority runs first.
  o Round-robin: each process takes its turn to run for up to a time quantum.
  o Multilevel queue: processes of a particular priority level go to their own separate queue; each queue is FCFS; processes in a queue will not be scheduled until all queues of higher priority are empty. It is usually preemptive.
  o Multilevel feedback queue: similar to multilevel queue, but processes can migrate to neighboring queues based on some scheduling policy.
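A toy calculation (burst times made up) comparing average waiting time under FCFS and SJF for jobs that all arrive at time 0; it illustrates why SJF minimizes average waiting time.

  import java.util.Arrays;

  public class SchedulingDemo {
      // Average waiting time when jobs run in the given order (all arrive at time 0).
      static double averageWait(int[] bursts) {
          double totalWait = 0, clock = 0;
          for (int burst : bursts) {
              totalWait += clock;    // this job waited until now
              clock += burst;        // then occupies the CPU for its burst
          }
          return totalWait / bursts.length;
      }

      public static void main(String[] args) {
          int[] arrivalOrder = {24, 3, 3};                // example burst times
          System.out.println("FCFS average wait = " + averageWait(arrivalOrder));   // 17.0

          int[] sjfOrder = arrivalOrder.clone();
          Arrays.sort(sjfOrder);                          // SJF: shortest burst first
          System.out.println("SJF  average wait = " + averageWait(sjfOrder));       // 3.0
      }
  }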
• CPU scheduling must avoid process starvation, where a process never gets its turn to run because there are always higher-priority processes ahead of it. One approach (aging) is to raise the priority of a process gradually over time.
Chapter 7: Process Synchronization
• If one process tries to modify some shared data while one or more other processes try to access the same data, a race condition may happen, so that the final value of the shared data or the process output depends on the relative order of process scheduling.
• A critical section is a section of code in which multiple processes may try to access shared data. To prevent race conditions, an entry section and an exit section should be placed around a critical section to make sure that at any time, at most one process is executing in the critical section.
• A solution to the critical-section problem must not make assumptions about the relative speeds of the involved processes, and must satisfy the following conditions:
  o Mutual exclusion: at most one process runs in the critical section at a time.
  o Progress: (1) only processes waiting to enter the critical section take part in the decision of who enters next; (2) this decision cannot be postponed indefinitely.
  o Bounded waiting: each process waiting to enter the critical section gets its turn within a bounded number of tries (competitions).
• The two-process critical-section problem can be solved by the following algorithm (Peterson's algorithm). Process i runs the code below, where j is the index of the other process:
  boolean[] flag = {false, false};
  int turn;
  do {
      flag[i] = true;                 // declare intent to enter
      turn = j;                       // give priority to the other process
      while (flag[j] && turn == j);   // busy-wait while the other process has priority
      // critical section
      flag[i] = false;                // exit section
      // remainder section
  } while (true);
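A Java rendering of the same algorithm, simplified to two fixed threads and not taken from the notes: declaring the shared variables volatile supplies the memory-ordering guarantees the busy-wait loop relies on. The iteration counts and names are arbitrary.

  public class Peterson {
      // volatile gives the ordering/visibility guarantees Peterson's algorithm needs on the JVM.
      static volatile boolean flag0 = false, flag1 = false;
      static volatile int turn = 0;
      static int counter = 0;                  // shared data protected by the algorithm

      static void enter(int i) {
          int j = 1 - i;
          if (i == 0) flag0 = true; else flag1 = true;
          turn = j;
          while ((i == 0 ? flag1 : flag0) && turn == j) { /* busy wait */ }
      }

      static void exit(int i) {
          if (i == 0) flag0 = false; else flag1 = false;
      }

      public static void main(String[] args) throws InterruptedException {
          Runnable worker0 = () -> { for (int k = 0; k < 100_000; k++) { enter(0); counter++; exit(0); } };
          Runnable worker1 = () -> { for (int k = 0; k < 100_000; k++) { enter(1); counter++; exit(1); } };
          Thread a = new Thread(worker0), b = new Thread(worker1);
          a.start(); b.start();
          a.join();  b.join();
          System.out.println("counter = " + counter);   // 200000 if mutual exclusion held
      }
  }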
• The multiple-process critical-section problem can be solved by the Bakery algorithm. Each process planning to enter the critical section must take an ever-increasing sequence number. The process with the smallest sequence number is the next to enter the critical section. In case more than one process gets the same sequence number (processes may apply for sequence numbers in parallel), their unique process ID numbers are used to break the tie.
• Hardware atomic instructions like testAndSet() can greatly simplify the solutions to critical-section problems. In the pseudocode below, testAndSet() executes atomically and its argument is passed by reference:
  boolean testAndSet(ref boolean target) {   // executed atomically by the hardware
      boolean rv = target;
      target = true;
      return rv;
  }
  ……
  boolean lock = false;
  ……
  do {
      while (testAndSet(lock));   // spin until the lock was previously false
      // critical section
      lock = false;               // release the lock
      // remainder section
  } while (true);
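In Java, the same test-and-set idea is available through java.util.concurrent.atomic; below is a minimal sketch (class and field names invented) of a spin lock built on AtomicBoolean.getAndSet().

  import java.util.concurrent.atomic.AtomicBoolean;

  public class SpinLock {
      private final AtomicBoolean locked = new AtomicBoolean(false);

      public void lock() {
          while (locked.getAndSet(true)) { /* busy-wait: test-and-set returned true */ }
      }

      public void unlock() {
          locked.set(false);
      }

      // Example use: protect a shared counter with the spin lock.
      static int counter = 0;

      public static void main(String[] args) throws InterruptedException {
          SpinLock lock = new SpinLock();
          Runnable work = () -> {
              for (int i = 0; i < 100_000; i++) {
                  lock.lock();
                  counter++;                 // critical section
                  lock.unlock();
              }
          };
          Thread t1 = new Thread(work), t2 = new Thread(work);
          t1.start(); t2.start();
          t1.join();  t2.join();
          System.out.println(counter);       // 200000
      }
  }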
• Semaphores support the atomic operations wait() and signal():
  o class Semaphore {
        public int value;        // initial value is 1 for mutual exclusion
        public ProcessQueue q;   // initially q is empty
    }
  o void wait(Semaphore s) {
        s.value--;
        if (s.value < 0) {
            add this process's PCB to s.q;
            block();
        }
    }
  o void signal(Semaphore s) {
        s.value++;
        if (s.value <= 0) {      // 1 - s.value = number of waiting processes
            remove the PCB of a process p from s.q;
            wakeup(p);
        }
    }
• Solving the critical-section problem with a semaphore:
  Semaphore mutex;
  mutex.value = 1;
  ……
  do {
      wait(mutex);
      // critical section
      signal(mutex);
      // remainder section
  } while (true);
• Improper use of semaphores may lead to deadlock (no process can proceed) or starvation (some process never gets a chance to enter the critical section).
• If a computer cannot guarantee the atomicity of the semaphore wait() and signal() operations in hardware, a software mutual-exclusion approach must be used inside them. Even though the software approach reintroduces busy waiting, such busy waiting only happens during the very short wait() and signal() operations.
• Classical problems of synchronization (a sketch of the first one follows this list):
  o The bounded-buffer problem: a producer process and a consumer process share a fixed-size buffer; the producer must wait when the buffer is full, the consumer must wait when it is empty, and access to the buffer itself must be mutually exclusive.
  o The readers-writers problem: each writer needs exclusive access to the shared data, but multiple readers may access it at the same time.
  o The dining-philosophers problem: multiple processes compete for limited resources, and there are possibilities for deadlock to occur.
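A compact bounded-buffer sketch using java.util.concurrent.Semaphore with the usual three semaphores (empty slots, full slots, and a mutex); the buffer capacity and item count are arbitrary.

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.concurrent.Semaphore;

  public class BoundedBufferDemo {
      static final int CAPACITY = 4;
      static final Deque<Integer> buffer = new ArrayDeque<>();
      static final Semaphore empty = new Semaphore(CAPACITY);  // free slots
      static final Semaphore full  = new Semaphore(0);         // filled slots
      static final Semaphore mutex = new Semaphore(1);         // protects the buffer

      public static void main(String[] args) throws InterruptedException {
          Thread producer = new Thread(() -> {
              try {
                  for (int i = 0; i < 10; i++) {
                      empty.acquire();           // wait(empty): block if the buffer is full
                      mutex.acquire();
                      buffer.addLast(i);
                      mutex.release();
                      full.release();            // signal(full)
                  }
              } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
          });
          Thread consumer = new Thread(() -> {
              try {
                  for (int i = 0; i < 10; i++) {
                      full.acquire();            // wait(full): block if the buffer is empty
                      mutex.acquire();
                      System.out.println("consumed " + buffer.removeFirst());
                      mutex.release();
                      empty.release();           // signal(empty)
                  }
              } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
          });
          producer.start(); consumer.start();
          producer.join();  consumer.join();
      }
  }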
• Critical regions and monitors are higher-level language constructs for process synchronization. They are usually implemented with semaphores, but they reduce the chance of misplacing semaphore operations. They are less powerful than raw semaphores.
• Variants of the monitor concept appear in Java classes through the synchronized keyword.
• In enterprise computing, a transaction is an atomic operation made up of a sequence of more primitive operations. Either all operations in a transaction succeed, or none of them take effect.
Chapter 8: Deadlocks
• For a process to use a resource, it normally goes through three steps: request (to the OS), use, and release.
• A deadlock happens when, in a set of processes, each holds some resources and each needs resources held by other processes in the set in order to run.
• Necessary conditions for a deadlock to happen:
  o Mutual exclusion: some resources cannot be shared.
  o Hold and wait: at least one process is holding a resource while waiting for resources currently held by other processes.
  o No preemption: resources cannot be taken away from a process holding them; they can only be released voluntarily.
  o Circular wait: a set {P0, P1, …, Pn} of processes exists such that P0 is waiting for some resource held by P1, P1 is waiting for some resource held by P2, …, and Pn is waiting for some resource held by P0.
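An illustrative (intentionally broken) Java fragment showing how hold-and-wait plus circular wait produce a deadlock between two threads; the lock names are arbitrary, and the program normally hangs rather than terminating.

  public class DeadlockDemo {
      static final Object resourceA = new Object();
      static final Object resourceB = new Object();

      public static void main(String[] args) {
          Thread p0 = new Thread(() -> {
              synchronized (resourceA) {                 // hold A ...
                  sleep(100);
                  synchronized (resourceB) {             // ... and wait for B
                      System.out.println("p0 got both");
                  }
              }
          });
          Thread p1 = new Thread(() -> {
              synchronized (resourceB) {                 // hold B ...
                  sleep(100);
                  synchronized (resourceA) {             // ... and wait for A: circular wait
                      System.out.println("p1 got both");
                  }
              }
          });
          p0.start();
          p1.start();                                    // with the sleeps, both threads block forever
      }

      static void sleep(long ms) {
          try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      }
  }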
• Resource-allocation graph: each process and each resource type is represented by a vertex; there is a request edge from a process vertex to a resource-type vertex if the process is requesting an instance of that resource type; there is an assignment edge from an instance of a resource type to a process vertex if that instance has been assigned to the process.
• The existence of a directed cycle in a resource-allocation graph is a necessary condition for a deadlock, but it is not a sufficient condition.
• Deadlock prevention: use a policy that breaks one of the four necessary conditions for a deadlock. Since deadlock prevention ignores the actual resource-allocation state, it is the most conservative approach.
• A system is in a safe state if there is an order in which resources can be assigned to the involved processes, one after another, so that all the processes can finish execution. A system in a safe state cannot have a deadlock. A system in an unsafe state may eventually enter a deadlock.
• Deadlock avoidance: use an algorithm (such as the banker's algorithm) to make sure that the system always stays in a safe state.
• Deadlock avoidance uses the actual resource-allocation state and is therefore less conservative than deadlock prevention, but it has much more execution overhead.
• Deadlock detection: make no effort to prevent or avoid deadlocks; when system performance becomes low, run a deadlock-detection algorithm, then terminate some processes involved in a deadlock or preempt some of their resources.
• Most operating systems do not address the deadlock problem, for performance reasons.
Chapter 9: Memory Management
• Programs usually use a logical address space that is either a flat array of words starting from address zero, or a set of segments, each representing a logical code or data unit.
• The physical memory assigned to a process usually doesn't start at address zero, and the physical address space is shared among multiple processes.
• The logical addresses in a program can be mapped to physical addresses at compile or assembly time, at module linking time, at executable loading time, or at execution time. A program can run in an arbitrary memory location only if it maps its logical addresses to physical ones at execution time.
• In its simplest form, the execution-time mapping of logical addresses to physical addresses can be done by a relocation register.
• If there is not enough physical memory to hold all processes that need to run, swapping may be used to temporarily move a partially executed process out to the hard disk and later bring it back into memory. When the swap-out and swap-in are done to support priority scheduling, they are also called roll out and roll in.
• For contiguous memory allocation, first-fit (using the first encountered memory hole that is big enough for the request) is usually preferred to best-fit (using the smallest hole that is big enough) or worst-fit (using the largest hole) due to its simplicity.
• Contiguous memory allocation suffers from external fragmentation: even though there is enough free memory in total to run a new process, the free memory is scattered in many holes, each too small to be useful.
• Compaction moves all processes' memory to one end of memory to consolidate the small holes into a single big one. Compaction is usually too time-consuming to be practical.
• Paging: partition the logical address space into fixed-size pages, usually 4 to 8 KB. Partition the physical address space into frames of the same size. Each page can be loaded into any frame. The mapping from a page to a frame is done at execution time through a page table. A logical address is made up of a page number followed by an offset inside the page (a worked example follows).
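A worked example of splitting a logical address into page number and offset, assuming 4 KB pages (12 offset bits) and a made-up page table; real page-table entries also carry protection and status bits.

  public class PagingDemo {
      static final int OFFSET_BITS = 12;                  // 4 KB pages
      static final int PAGE_SIZE = 1 << OFFSET_BITS;
      static final int[] pageTable = {7, 3, 0, 5};        // hypothetical page -> frame mapping

      static int translate(int logicalAddress) {
          int pageNumber = logicalAddress >>> OFFSET_BITS;     // high-order bits
          int offset     = logicalAddress & (PAGE_SIZE - 1);   // low-order 12 bits
          int frame      = pageTable[pageNumber];              // page-table lookup
          return frame * PAGE_SIZE + offset;                   // physical address
      }

      public static void main(String[] args) {
          int logical = 2 * PAGE_SIZE + 0x1A4;                 // page 2, offset 0x1A4
          System.out.printf("logical 0x%05X -> physical 0x%05X%n", logical, translate(logical));
          // page 2 maps to frame 0, so the physical address is just the offset 0x001A4
      }
  }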
• Logically, the length of a page table is determined by the size of the logical address space, which is huge, and each process must maintain its own page table. Page tables are usually kept in main memory.
• Internal fragmentation: a process is allocated whole pages, so it may receive slightly more memory than it needs; the unused portion of its last page is wasted.
• A smaller page size reduces internal fragmentation, but it also increases the page-table size and the per-page disk transfer overhead.
• Several processes can share pages if their page tables all refer to the same frames for the shared pages.
• Memory protection can be implemented by attaching access-control bits to the entries of a page table.
• An associative memory can be used as a translation look-aside buffer (TLB) that acts as a cache for a small subset of the active page-table entries. The TLB reduces the number of extra memory accesses needed for page-table lookups.
• A multi-level (hierarchical) page table pages the page table itself. This avoids allocating a huge page table up front and allocates memory for sub-tables only when they are needed.
• Segmentation supports the logical view that a program is made up of variable-sized segments of code and data. A segment table provides the base and limit values for each segment. A logical address is made up of a segment number and an offset within the segment, and is mapped to a physical address with the help of the segment table, similar to the relocation-register approach.
• With segmentation it is more natural to attach protection bits to segment-table entries and to share segments among multiple processes.
• Variable segment sizes may cause difficulty in memory allocation (external fragmentation).
• Segmentation with paging is more popular: each variable-sized segment is further partitioned into fixed-size pages. This approach is used by Intel architectures.
Chapter 10: Virtual Memory
• Virtual memory uses a large disk space to support the illusion that physical memory is as large as the disk. It is usually implemented through demand paging or demand segmentation.
• Demand paging: the disk is divided into a sequence of pages (a page is a multiple of a sector). The physical memory is divided into frames of the same size. The virtual memory is as large as the disk capacity. Each process has its own page table to map its pages on disk to memory frames. When a page is needed but is not in memory, a free frame is found, or created by paging out a victim page to the disk, and the new page is loaded into that frame.
• Demand paging can have reasonable performance because most software exhibits locality of reference.
• Virtual memory may improve CPU utilization because it allows more processes to be in the ready state.
• While swapping in/out moves an entire process image between a disk and main memory, paging in/out moves only individual pages.
• Both swapping and paging use raw disk I/O (treating the disk as a sequence of sectors and allocating consecutive sectors to the data or memory image), which is much more efficient than file-system I/O, where the data are usually scattered around the disk and I/O has to go through directory searches and data copying through multiple buffers.
• Memory-mapped files: use a system call to assign a contiguous block of sectors on the virtual-memory disk to a file, use lazy (demand) loading of pages from the file system, and access the contents of the file through virtual-memory addresses. When not enough memory frames are available, pages are moved between the virtual-memory disk and memory. The file data are finally copied back to the file-system disk when the file is closed. This approach can improve performance because the paging in/out uses raw disk I/O.
• Common page-replacement policies include FIFO (which exhibits Belady's anomaly: adding frames may not reduce the number of page faults), LRU (least recently used), and approximations of LRU. A small simulation follows.
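A small simulation (reference string and frame count made up) that counts page faults under FIFO and LRU; with three frames, LRU faults less often than FIFO on this particular string.

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.LinkedHashSet;

  public class PageReplacementDemo {
      static int fifoFaults(int[] refs, int frames) {
          Deque<Integer> memory = new ArrayDeque<>();      // oldest page at the head
          int faults = 0;
          for (int page : refs) {
              if (!memory.contains(page)) {
                  faults++;
                  if (memory.size() == frames) memory.removeFirst();   // evict the oldest page
                  memory.addLast(page);
              }
          }
          return faults;
      }

      static int lruFaults(int[] refs, int frames) {
          LinkedHashSet<Integer> memory = new LinkedHashSet<>();  // insertion order = recency
          int faults = 0;
          for (int page : refs) {
              if (memory.remove(page)) {           // hit: refresh recency
                  memory.add(page);
              } else {
                  faults++;
                  if (memory.size() == frames) {   // evict the least recently used page
                      memory.remove(memory.iterator().next());
                  }
                  memory.add(page);
              }
          }
          return faults;
      }

      public static void main(String[] args) {
          int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2};
          System.out.println("FIFO faults: " + fifoFaults(refs, 3));   // 10
          System.out.println("LRU  faults: " + lruFaults(refs, 3));    // 8
      }
  }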
• Dirty bits can reduce page-replacement overhead. Attach a dirty bit to each page-table entry. When a page is first loaded into a frame, clear its dirty bit. When a write happens inside the page, set its dirty bit. When the page needs to be replaced, it has to be copied back to the disk only if its dirty bit is set.
• A process needs a minimum number of page frames to run properly. The lower bound of this minimum is usually determined by the maximum number of pages a single machine instruction can reference.
• Thrashing: the processes have too few page frames, so most of the execution time is spent paging in and out.
• A working set is the set of distinct pages referenced in the last k memory accesses, where k is called the working-set window size. A process needs enough memory frames for the pages currently in its working set to run properly. If the total of the working-set sizes of the active processes exceeds the total number of physical memory frames, some processes must be suspended or swapped out to avoid thrashing.
• In Windows 2000, each process is initially assigned two numbers: a minimum and a maximum number of frames, decided by the OS. A process starts to run with its minimum number of frames. When a process needs more frames, it can get them from the shared pool of free frames. When the number of free frames falls below a threshold, an automatic working-set trimming procedure reduces the number of frames owned by each process back toward its minimum, so that more processes may be started.
Chapter 11: File-System Interface
• A file is a sequence of bytes. A file is usually stored on a disk.
• The main attributes of a file include its name, an ID number unique within the file system, type, data location on disk, size, protection information, times of creation and last modification, owner user ID, and group ID.
• The attributes of a file are stored in an entry of a directory file, which is also usually stored on the hard disk.
• Some OSs use the file-name extension to indicate the data type of a file, and associate a particular file extension with a particular application that can process that type of file.
• The main file operations include creating a file, reading a file, writing a file, deleting a file, and moving a file to another directory location.
• A sequential-access file is accessed from beginning to end. It maintains a current file position. A newly opened file has its current position at the start of the file data. Each read or write accesses the file at the current position and then advances it for the next access. The contents of a sequential file don't need to have a uniform structure.
• A random-access file is made up of fixed-size records, and each record can be accessed directly from the file's starting address, the record size, and the sequence number of the record of interest.
• The directory structure can be a tree, an acyclic graph, or a general graph; it is usually a tree with links (aliases). Directories allow more logical file organization and reduce the chance of name conflicts. File links are pointers to existing files or directories; they support file or directory sharing. File links are usually ignored during file-system traversal (for example, during a file search) to avoid cycles.
• A file path is a sequence of a device name (possibly omitted) and directory names that eventually leads to the file or directory of interest.
• To simplify file specification, the OS maintains a current directory, and users can change it. A path starting from the file-system root ("/" in Unix, a device name like "c:" in DOS) is called an absolute path, and a path starting from the current directory is called a relative path.
• File protection mainly takes two forms. One is access-control bits for read, write, and execute, each of which can be specified for the file owner, a particular group of users, and all other users. The more general approach is the access-control list (ACL), in which access rights can be specified for each individual user. A combination of the two approaches is common.
• Most OSs use environment variables to search for files. For example, the environment variable PATH is usually used to specify where to find an executable file, and the environment variable CLASSPATH is used to specify where to find Java source or class files during Java compilation or execution. These variables hold a sequence of directory paths (possibly Java JAR or zip files for CLASSPATH) separated by a separator (";" for DOS, ":" for Unix). The current directory is represented by a period ".". A search for a file controlled by an environment variable follows the order in which the component paths are listed in the variable's value.
Chapter 12: File-System Implementation
• A disk partition usually has a boot control block, a partition control block, a directory structure in which each file is represented by an FCB (File Control Block, directory entry, or inode in Unix), and a file data area.
• The basic data-access unit of a disk is a sector, which is usually 512 bytes or more. A block is a multiple of a sector. A cluster is a multiple of a block. Accessing larger data units improves disk-access efficiency.
• While a disk used for virtual memory usually uses contiguous block allocation, a file system usually uses linked lists of blocks or indexed blocks to store file data.
• In DOS, each partition has a FAT (File Allocation Table) on the disk. The number of entries in the FAT equals the number of clusters the partition supports. Each FAT entry corresponds to one cluster, and it holds the number of the next cluster belonging to the same file. Because the FAT (or part of it) can be cached in main memory at execution time, file accesses can avoid the disk accesses that would otherwise be needed just to follow the linked list to find the cluster of interest. A FAT is therefore basically a table that centralizes all the linked-list pointers.
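A toy sketch of following a FAT chain in memory; the table contents, the end-of-chain marker -1, and the starting clusters are all invented for illustration.

  public class FatChainDemo {
      static final int END_OF_CHAIN = -1;
      // fat[c] = next cluster of the same file, or END_OF_CHAIN.
      // Here one file occupies clusters 2 -> 5 -> 3, another occupies 4 -> 6.
      static final int[] fat = {0, 0, 5, END_OF_CHAIN, 6, 3, END_OF_CHAIN};

      static void printChain(int startCluster) {
          System.out.print("clusters:");
          for (int c = startCluster; c != END_OF_CHAIN; c = fat[c]) {
              System.out.print(" " + c);      // each hop is a table lookup, not a disk read
          }
          System.out.println();
      }

      public static void main(String[] args) {
          printChain(2);   // clusters: 2 5 3
          printChain(4);   // clusters: 4 6
      }
  }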
• In Unix, each inode (directory entry) has 12 direct pointers to file data blocks, so that small files can be accessed without indexing. An inode also has one single-indirect pointer that points to an index block holding pointers to file data blocks, plus one double-indirect pointer and one triple-indirect pointer that generalize the single-indirect approach to two-level and three-level index tables for supporting even larger files. (For example, with 4 KB blocks and 4-byte block pointers, a single-indirect block alone covers 1024 further blocks, or 4 MB of data.)
• Since each file access needs the file attributes stored in its directory entry, an open() system call is usually used to copy a file's directory entry into memory for fast access. Since an OS limits the number of such opened directory entries, it is good practice to close a file when it is no longer needed, so that the space for its in-memory directory entry can be recycled.
• An OS usually keeps two in-memory tables for opened files. One is at the system level; each of its entries holds the general attributes of a file (basically a copy of the directory entry on disk). The other is per process; each of its entries holds information for a file specific to that process, such as the current read position. The system-level table avoids duplicating file information across multiple processes that access the same file. Each per-process table entry has a pointer to one of the system-level table entries. When a user opens a file, the return value is the index of the file's entry in the per-process table. This integer index is usually called a file handle or a file descriptor. All other file-access operations take this index as an argument.
• A disk needs to be mounted into a file system before it can be accessed. DOS and the Macintosh OS use implicit mounting: when a new disk is detected, it is automatically mounted into the file system under a special device name ("C:", for example) or as a special folder. Unix doesn't use special device names for this purpose: a disk can be mounted onto any directory of the existing file system, and the original contents of that directory are hidden until the disk is unmounted. Unless a disk is listed in a boot-up script for automatic mounting, a Unix user normally needs to issue mount commands manually to use a new disk.
• In Unix, a mount table is maintained to find out which prefix of an absolute path refers to which disk device.
• When you write to a file, you may just be writing to a data buffer. When the buffer is full, when you issue a flush() call, or when you close the file, the data in the buffer are copied to the persistent disk copy of the file.