Fundamental Concepts of Operating Systems
Prof. Lixin Tao
Pace University
December 4, 2002
Chapter 1: Introduction
• An operating system is a software layer between the CPU hardware and user programs.
  o The OS provides a collection of common services so that application programmers don't need to work with low-level system primitives. As a result, the OS makes programming easier. Since such services are implemented by experts, they may also be more efficient.
  o The OS manages the computer hardware resources to improve their utilization or the performance of particular applications.
• A computer running an OS can be considered a virtual machine that supports all the machine instructions as well as the system calls (OS services) implemented by the OS.
• The Java virtual machine is a program that interprets and executes Java bytecode files. It runs on top of the OS layer, thus providing a higher-level virtual machine with more abstractions and services.
• The main design goal of an OS is to make the computer system more efficient and easier to use.
• A mainframe system is powerful and expensive. The OS should try to improve its utilization.
• A process is a program in execution. Multiple processes may run the same copy of a program in memory.
• Batch systems are an old form of OS that minimize human interaction time with mainframe systems to improve system utilization.
• Multiprogramming systems keep multiple processes (jobs) resident in main memory waiting for the CPU to run. When one process is waiting for an I/O event, another process can take over the CPU. This improves system utilization. With proper process scheduling, multiprogramming also improves the responsiveness of applications to user events.
• Time-sharing is a special form of multiprogramming. A time unit is selected to balance the overhead of process context switching against the responsiveness of the system to user events. All processes ready to run take turns, round robin, using the CPU for up to the specified time unit. The result is the illusion that each process is running concurrently on its own virtual machine.
• Parallel execution usually implies that multiple processes are using multiple hardware units to run at the same time. Concurrent execution usually implies that multiple processes are running on the same hardware based on time-sharing or multiplexing.
• A personal computer using network resources (like a file server) is called a workstation. A PC or workstation serves one person at a time, and its OS focuses on the responsiveness of the system to user events.
• A real-time system aims to complete the execution of a program within rigid time constraints. A real-time OS usually simplifies its structure and functions to reduce its execution overhead.
• A multiprocessor system has multiple processors inside one cabinet connected through a high-speed inter-processor network. The processors have no local main memory; they share common memory modules on the network. The multiple processors are used to run the same application faster. Speedup, defined as the execution time on a uniprocessor divided by the execution time on the multiprocessor, is the main objective of multiprocessor systems.
• A symmetric multiprocessor (SMP) is a multiprocessor in which all processors have the same role and run the same OS.
• A multicomputer differs from a multiprocessor in that each processor has local memory, there is no shared memory, and message passing is the only mechanism for inter-processor communication. The main objective of a multicomputer is also speedup. While the size of a multiprocessor is limited to about 1024 processors (due to the limits of shared memory), a multicomputer can easily scale up to thousands of processors.
• A parallel system is either a multiprocessor or a multicomputer.
• A distributed system has multiple computers running in different rooms, buildings, or cities and connected through networks. The main objectives of a distributed system are resource sharing and fault tolerance.
• A local-area network (LAN) connects computers within a building. A wide-area network (WAN) connects computers across cities and countries. The Internet is one example of a WAN. Most of today's networks use the TCP/IP protocol suite for communication.
• In a client-server system, one or more computers run server programs and provide services, and many client computers visit these server machines to get service. Servers and clients do not have the same role. The servers and the network bandwidth (number of bits transferred per second over the network) are the performance bottlenecks.
• In a peer-to-peer system, all computers have the same role. Each one can be a client at one time and a server at another time. There are fewer bottlenecks in such a system.
• A clustered system usually uses a local cluster of networked computers to simulate a parallel system or provide high availability. It is much less expensive than parallel systems like multiprocessors or multicomputers.
Chapter 2: Computer-System Structures
• The CPU, main memory, and I/O devices are connected through a system bus. The system bus has wires for memory addresses, wires for data, and wires for controlling the bus (deciding who gets the next chance to use the bus). At any time, the bus can have only one data source sending data on the bus, and one or more listeners getting data from the bus.
• At system boot, a short bootstrap program runs from read-only memory (ROM) to load the core of the OS into memory.
• The program and data of a process must be loaded into main memory to run.
• An I/O device is controlled by its device controller. A device controller has a control register to receive commands from the CPU and to show the status of the I/O device, and a few data registers.
• All I/O operations are performed through shared OS functions called interrupt handlers. An interrupt handler uses a device driver, usually provided by the I/O device manufacturer, to control the corresponding device controller.
• At the low end of main memory is the interrupt vector: each memory word is indexed by the ID of a type of I/O device and holds the starting address of the corresponding interrupt handler.
• When an I/O device needs attention from the CPU, it generates an interrupt signal on the system bus.
• After the execution of each instruction, the hardware checks whether interrupt handling is enabled. If it is enabled, and an interrupt has been requested by some source on the system bus, the machine enters its interrupt-handling phase.
• The state of a process includes the value of the program counter (indicating the address of the next instruction to be executed), the values of the general-purpose registers, its stack for method invocation, its open files, and its code.
• During the interrupt-handling phase, the hardware saves the values of the program counter and registers into the current process's Process Control Block (PCB) in main memory, uses the ID of the interrupt source carried on the system bus address wires to look up the starting address of the interrupt handler in the interrupt vector, and starts to run the interrupt handler.
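As an illustration (not part of the original notes), the following Java sketch simulates an interrupt vector as an array of handler entry points indexed by device ID. The device IDs, handler bodies, and class name are all made up; real hardware would save the PC and registers before dispatching and restore them afterward.

  public class InterruptVectorDemo {
      static final int KEYBOARD = 0, DISK = 1;              // assumed device IDs
      static final Runnable[] interruptVector = new Runnable[16];

      public static void main(String[] args) {
          // "Install" handlers: the vector maps a device ID to a handler entry point.
          interruptVector[KEYBOARD] = () -> System.out.println("keyboard handler: read scan code");
          interruptVector[DISK]     = () -> System.out.println("disk handler: block transfer done");

          // Hardware would save the PC/registers into the PCB here, then dispatch:
          dispatch(DISK);
          // ...and finally restore the saved state to resume the interrupted process.
      }

      static void dispatch(int deviceId) {
          Runnable handler = interruptVector[deviceId];     // vector lookup by device ID
          if (handler != null) {
              handler.run();                                // jump to the handler's start address
          }
      }
  }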
• At the end of interrupt processing, the interrupted process's saved state is loaded back into the registers and the process resumes execution.
• Interrupt processing may be nested: the execution of an interrupt handler may itself be interrupted by an event with higher priority.
• When a program needs service from the OS, or when a serious error happens, it generates a trap, which is basically an interrupt generated by software. An interrupt handler processes it.
• For slow I/O devices, the transfer of each word may need an interrupt so that the CPU can copy the word between main memory and the device controller's data register.
• For faster I/O devices like hard disks, a Direct Memory Access (DMA) controller coordinates the transfer of a large block of data between main memory and the I/O device, and interrupts the CPU only once at the end of the data transfer.
• When a process needs the service of an I/O device but the device is busy serving other processes, it puts itself in the waiting queue for that device.
• Spooling is a technique to improve system utilization. When a process needs to send data to a slow device, the data is first copied to a hard disk buffer; the process can then resume its execution. The OS later takes care of the off-line transfer of data from the disk buffer to the I/O device.
• A typical memory system has four layers: registers, cache, main memory, and hard disk. They are ordered from fast to slow, from small capacity to large capacity, and from expensive to cheap.
• The objective of a layered memory system is to implement a less expensive memory that still supports fast access.
• When a word is needed from a layer of memory and it is not there, the block of data containing that word is copied up from the lower layer, in the hope that future memory accesses will be to these words.
• The success of a layered memory system is based on the principle of locality of reference: the memory accesses of an executing program usually stay within a small address window during a small time span. If the program is accessing address K, it will most likely access an address near K in the near future.
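A simple (hypothetical) way to observe locality: traversing a two-dimensional array row by row touches consecutive addresses and is cache-friendly, while traversing it column by column jumps across memory. The array size and timing code below are illustrative only; exact numbers depend on the machine.

  public class LocalityDemo {
      public static void main(String[] args) {
          int n = 4096;
          int[][] a = new int[n][n];

          long t0 = System.nanoTime();
          for (int i = 0; i < n; i++)          // row-major: consecutive addresses
              for (int j = 0; j < n; j++)
                  a[i][j]++;
          long t1 = System.nanoTime();

          for (int j = 0; j < n; j++)          // column-major: large strides, poor locality
              for (int i = 0; i < n; i++)
                  a[i][j]++;
          long t2 = System.nanoTime();

          System.out.println("row-major    (good locality): " + (t1 - t0) / 1_000_000 + " ms");
          System.out.println("column-major (poor locality): " + (t2 - t1) / 1_000_000 + " ms");
      }
  }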
• The principle of locality of reference follows from the largely sequential execution of programs and from data being stored in arrays.
• Data consistency is a challenge for a layered memory system: the same data may have multiple copies in several layers.
• A hard disk has multiple platters. Each surface of a platter is divided into circular tracks for storing data. Each track is divided into equal-size sectors. Data transfer between a hard disk and main memory is done in multiples of sectors.
• All the read/write heads of a hard disk are moved by the same mechanical arm. It is slow to move the read/write heads from the innermost track to the outermost track, or vice versa. As a result, disk addresses are assigned by cylinder: the corresponding tracks on all platters make up a cylinder, and all data in a cylinder can be read or written without mechanical movement of the read/write heads.
• Hardware protection mechanisms include
  o Dual-mode operation: a bit in a control register specifies the current system mode. OS code can only run in system mode (also called supervisor mode or privileged mode). Applications can only run in user mode. All I/O operations are handled by OS interrupt handlers and can therefore run in system mode only.
  o Base and limit registers: the OS assigns a consecutive block of main memory to a process for execution. The start address and length of this memory space are copied into the base register and limit register respectively. Each memory address generated by the process is checked against these two registers by hardware to see whether it falls in the authorized memory space (see the sketch after this list). A user process is not allowed to access memory locations outside of its assigned block.
  o Hardware timer: a timer issues an interrupt at specified time intervals. As a result the OS always gets a chance to check whether the system is still under its control. This timer is also used to implement time-sharing.
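A minimal sketch of the base/limit check described above, with made-up register values; real hardware performs this comparison on every memory reference and raises a trap to the OS on a violation.

  public class BaseLimitCheck {
      static final long BASE  = 0x4000;    // assumed start of the process's memory block
      static final long LIMIT = 0x1000;    // assumed length of the block

      // Translate a logical address to a physical address, trapping on violation.
      static long translate(long logical) {
          if (logical >= LIMIT) {
              throw new IllegalStateException("addressing error: trap to OS");
          }
          return BASE + logical;           // relocation by the base register
      }

      public static void main(String[] args) {
          System.out.printf("logical 0x0200 -> physical 0x%04X%n", translate(0x0200));
          try {
              translate(0x2000);           // outside the assigned block
          } catch (IllegalStateException e) {
              System.out.println(e.getMessage());
          }
      }
  }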
Chapter 3: Operating System Structures
• The OS is interrupt-driven. When an I/O device needs service, it generates an interrupt. When a program needs service, it generates a trap (software interrupt).
• Logically, an OS has components for process management, main-memory management, file management, I/O system management, disk management, networking, protection, and the command-interpreter system.
• The command interpreter is the most visible user interface of an OS. It allows a user to type a command, and the OS then executes the corresponding OS program to provide the service. In Unix, the command interpreter is called a shell. A batch file contains one or more commands for the OS to execute in a particular order.
• The OS provides many services to facilitate the execution of programs:
  o Program execution: load code, execute it, and end it.
  o I/O operations.
  o File system manipulation: all data on hard disk are abstracted as files. A file is a logical sequence of bits. The OS can create a file, read/write a file, delete a file, buffer data during file reads/writes, and manage files with directories.
  o Communications: allow processes to communicate through shared memory or message passing.
  o Error detection.
  o Resource allocation: allocate CPU time, main memory space, file buffers, I/O devices, and communication sockets to processes.
  o Accounting.
  o Protection.
• Applications get OS services through system calls. Each OS needs to publish its application programming interface (API) to allow programs to invoke its services. For MS Windows, it is the Win32 API.
• Parameter passing for OS system calls:
  o For fixed, small-size data, use registers.
  o For large data, keep the data in user memory space and put its starting address in a specified register; the system call goes to that register to find the parameter data.
• A daemon is a process that runs without being associated with a particular user login session. It keeps listening for client invocations to provide some service. A Web server and a Domain Name System (DNS) server are example system daemons.
• Processes running on the same system can communicate through shared memory.
• Processes running on the same or different systems can communicate through message passing: the sender uses a send() method to pass the data to its local OS; the OS routes the data to the OS of its destination; the receiving process invokes receive() to get the data from its local OS.
• An OS usually consists of a kernel (the core functions of the OS that always stay in main memory) and some system programs. Typical system programs include the command interpreter and the programs implementing the OS commands.
• The structure of an OS has to balance the needs of performance and maintainability.
• In Unix, the kernel implements the common primitive operations; the system programs implement higher-level services based on the primitive kernel operations; and the shells and user programs get services through the system call API.
• An OS with a layered structure strives to let each layer of the OS invoke services only from its immediate lower neighbor layer. Performance is a big issue for this approach.
• An OS based on a microkernel minimizes the kernel functions and implements most of the OS services as system programs.
• As an extreme example of the virtual machine concept, special system software can be used to simulate multiple copies of the underlying hardware machine, and support the installation and execution of the same or different operating systems on each of these virtual machines.
• Mechanisms should be separated from policies if possible: the OS should have rich built-in mechanisms for its management and configuration, but the actual policy controlling these mechanisms should stay out of the OS itself. For example, we build the process-scheduling primitives into an OS, but ideally we can decide whether we want first-come-first-served or priority-based scheduling at OS installation time.
• System generation refers to the process of compiling and building an executable version of an OS for a particular environment. Many policies and system constants can be specified at this stage.
Chapter 4: Processes
• A process is also called a job. It is the basic unit of resource allocation in an OS. Each process has a unique ID number. Inside a process there are one or more threads of execution. For simplicity, this chapter focuses on processes with a single thread.
• Process execution states:
  o A process in the Ready state can run as soon as it gets the CPU.
  o A process in the Running state is executing on some CPU. In a system with one CPU, only one process may be in the Running state at a time.
  o A process in the Waiting state is waiting for some event to occur. Typical events are I/O completion or the reception of a signal.
  o A newly created process is in the Ready state and is put in the ready queue.
  o When the CPU is available, the CPU scheduler chooses one of the processes in the ready queue, removes it from the queue, and starts running it in the Running state.
  o When the running process issues a system call for I/O service, the process is put in the Waiting state and linked into the waiting queue of the particular I/O device.
  o When the I/O service completes for a process, the process is put back in the Ready state and inserted into the ready queue.
• Each process is represented inside the OS by its Process Control Block (PCB). A PCB is an object with fields to save the process's execution state, its unique process ID, the value of the program counter, the values of the general-purpose registers, the values of the base and limit registers, the list of open files, etc. The PCB also has a pointer (reference) field to other PCBs so that PCBs can be linked into various queues such as the ready queue and the waiting queues.
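A minimal Java sketch of a PCB as described above; the field set and the queue link are illustrative only, and real kernels keep considerably more per-process state.

  import java.util.ArrayList;
  import java.util.List;

  class ProcessControlBlock {
      enum State { NEW, READY, RUNNING, WAITING, TERMINATED }

      int pid;                               // unique process ID
      State state = State.NEW;               // current execution state
      long programCounter;                   // saved PC
      long[] registers = new long[16];       // saved general-purpose registers
      long baseRegister, limitRegister;      // memory-protection registers
      List<String> openFiles = new ArrayList<>();
      ProcessControlBlock next;              // link used to form ready/waiting queues

      public static void main(String[] args) {
          ProcessControlBlock pcb = new ProcessControlBlock();
          pcb.pid = 42;
          pcb.state = State.READY;
          System.out.println("PCB " + pcb.pid + " in state " + pcb.state);
      }
  }

Linking PCBs through the next field is how the ready queue and the per-device waiting queues are formed.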
• Potential processes (jobs) wait on the hard disk. When there is enough main memory, the long-term scheduler (job scheduler) chooses one of them and moves it into main memory. Such a process is then in the ready state.
• The long-term scheduler determines how many processes reside in main memory for execution, and thus determines the degree of multiprogramming.
• A CPU-bound process spends more of its time on CPU execution. An I/O-bound process spends more of its time waiting for I/O operations. It is ideal for system throughput and utilization if the long-term scheduler chooses a good mix of these two types of processes to run in main memory.
• The short-term scheduler (CPU scheduler) decides which process in the ready queue takes over the CPU for execution.
• If the processes take up more memory space than is available in main memory, a medium-term scheduler may be used to swap out and swap in partially executed processes.
• Process context switch time is significant overhead. A typical context switch takes from 1 to 1000 microseconds.
• When a user logs in to a Unix system, a new process is created to run the shell for that user. When the user issues a command at the shell, a new process is created to execute the command. During the execution of a program, new child processes may be created to run subtasks.
• If two processes communicate through shared memory with a bounded buffer, the two processes must cooperate so that there is no buffer underflow or overflow.
• If two processes communicate through message passing, they can either use direct communication through a direct link, or use indirect communication through a mailbox. The sender normally uses a method of the form send(pid, message), and the receiver uses receive(pid, message).
• A blocking send blocks the sending process until the message reaches its destination process or mailbox. A nonblocking send hands the message to the OS buffer and resumes execution right away.
• A blocking receive blocks until a message is available. A nonblocking receive returns either a valid message or null.
• If both the send and the receive are blocking, we have a rendezvous between the sender and the receiver.
• Buffers are usually used to implement communication links. A send is blocked if the link's buffer is full.
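The following sketch, using Java threads and an ArrayBlockingQueue as a stand-in for the link buffer (names and capacity are made up), illustrates the difference between a blocking receive, which waits for a message, and a nonblocking receive, which returns null when no message is available; put() on a full queue likewise models a send blocking on a full link buffer.

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  public class MessagePassingDemo {
      public static void main(String[] args) throws InterruptedException {
          BlockingQueue<String> link = new ArrayBlockingQueue<>(2);  // bounded link buffer

          String none = link.poll();                        // nonblocking receive: no message yet
          System.out.println("nonblocking receive -> " + none);      // prints null

          Thread sender = new Thread(() -> {
              try {
                  link.put("hello");                        // blocking send: waits if buffer is full
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              }
          });
          sender.start();

          String msg = link.take();                         // blocking receive: waits for a message
          System.out.println("blocking receive -> " + msg);
          sender.join();
      }
  }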
• Each networked computer has a unique IP address. A Uniform Resource Locator (URL) contains a domain name that is friendlier to users than a raw IP address. The Domain Name System (DNS) translates a domain name into its IP address.
• Each process on a computer can listen on a port to get messages from its clients. A port is represented by a unique integer. A socket is the abstraction of a port on a particular computer.
• When a server starts on a server machine, it creates a server socket on a particular port, publishes its IP address and port number, and keeps listening on that port for potential client messages.
• When a client needs to communicate with a server, it creates a new socket on the client machine to connect to the server's server socket (identified by the server's IP address and port number). The server socket then creates a new socket on the server machine to communicate with this client. Both the server and the client know the IP address and port number of their partner's socket, and they can communicate as if writing into or reading from file streams. In the above discussion, except for the server socket's port number, all the sockets' port numbers are chosen at random from the unused port numbers on their own machines.
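A minimal Java sockets sketch of the pattern just described, with both ends in one program for brevity; the port number 5000 and the message text are arbitrary choices.

  import java.io.*;
  import java.net.ServerSocket;
  import java.net.Socket;

  public class EchoDemo {
      public static void main(String[] args) throws Exception {
          ServerSocket serverSocket = new ServerSocket(5000);      // listen on a published port

          Thread server = new Thread(() -> {
              try (Socket conn = serverSocket.accept();            // new socket for this client
                   BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                   PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                  out.println("echo: " + in.readLine());           // read request, write reply
              } catch (IOException e) {
                  e.printStackTrace();
              }
          });
          server.start();

          // Client side: connect from a randomly chosen local port to the server's socket.
          try (Socket client = new Socket("localhost", 5000);
               PrintWriter out = new PrintWriter(client.getOutputStream(), true);
               BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
              out.println("hello");
              System.out.println(in.readLine());                   // prints "echo: hello"
          }
          server.join();
          serverSocket.close();
      }
  }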
• Java Remote Method Invocation (RMI) allows an object to invoke a method on a remote server object. For a class implementing the server object, a stub class is generated by a tool; the stub implements the same public methods as the server object. A matching skeleton class is generated by the same tool to run on the server machine and communicate with the stub.
• When a server starts, a skeleton object is created and it keeps listening for potential messages from its corresponding stub objects.
• When a client needs to invoke a method on the remote server, it creates a stub object on its local machine. The matching stub and skeleton objects know how to communicate with each other. The client invokes a method on its local stub object (a proxy). The method body of the stub marshals its parameter values into a platform-independent form, and passes the method name and the marshaled parameter values to the remote skeleton object.
• The remote skeleton object unmarshals the parameter values for its local platform and invokes the local server object. When the server object returns the method's return value, the skeleton object marshals the return value into the platform-independent form and passes it to the remote stub object. The stub object unmarshals the return value and passes it back to the client as its own return value.
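For reference, a minimal sketch of the same idea with current Java RMI, where the stub is generated dynamically at run time and no separate skeleton class is needed; the interface name, registry port 1099, and binding name are arbitrary choices, and both ends are shown in one JVM for brevity.

  import java.rmi.Remote;
  import java.rmi.RemoteException;
  import java.rmi.registry.LocateRegistry;
  import java.rmi.registry.Registry;
  import java.rmi.server.UnicastRemoteObject;

  // Remote interface shared by client and server.
  interface Echo extends Remote {
      String echo(String msg) throws RemoteException;
  }

  public class EchoRmiDemo implements Echo {
      public String echo(String msg) { return "echo: " + msg; }

      public static void main(String[] args) throws Exception {
          // Server side: export the object (a dynamic stub is created) and register it.
          Echo stub = (Echo) UnicastRemoteObject.exportObject(new EchoRmiDemo(), 0);
          Registry registry = LocateRegistry.createRegistry(1099);
          registry.rebind("Echo", stub);

          // Client side: look up the stub and call it like a local object.
          Echo proxy = (Echo) LocateRegistry.getRegistry("localhost", 1099).lookup("Echo");
          System.out.println(proxy.echo("hello"));   // parameters are marshaled under the hood
      }
  }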
Chapter 5: Threads
• A process may contain one or more threads of computation.
• Each thread is represented by its own program counter value, register values, and execution stack contents.
• Threads belonging to the same process share code and data. It is up to the programmer to protect the integrity of the shared data.
• The overhead of switching the CPU from one thread to another is much smaller than that of a process context switch.
• Multithreading is critical for keeping an application responsive.
• A multithreaded program running on a multiprocessor may speed up execution by letting each processor run a separate thread.
• A multithreaded program running on a uniprocessor can still overlap the execution of multiple threads on different functional units, such as the CPU and the I/O devices.
• Threads created by a programming language or library are user threads.
• Threads supported by the OS are kernel threads.
• User threads are mapped to kernel threads for execution.
• Thread scheduling is platform-dependent. A multithreaded application should be tested on all potential client platforms.
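A small Java example of the idea: two threads of the same process sharing one counter. Without the synchronized keyword (monitors are discussed in Chapter 7), the integrity of the shared data would not be protected. The class name and iteration count are arbitrary.

  public class SharedCounterDemo {
      static long counter = 0;                                 // data shared by both threads

      static synchronized void increment() { counter++; }      // monitor-style protection

      public static void main(String[] args) throws InterruptedException {
          Runnable work = () -> {
              for (int i = 0; i < 1_000_000; i++) increment();
          };
          Thread t1 = new Thread(work);
          Thread t2 = new Thread(work);
          t1.start(); t2.start();                              // both threads share code and data
          t1.join();  t2.join();
          System.out.println("counter = " + counter);          // 2000000 with synchronization
      }
  }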
Chapter 6: CPU Scheduling
• The CPU scheduler decides which ready process gets the CPU to run next.
• CPU-bound processes spend much of their time in CPU computation; I/O-bound processes spend much of their time in I/O operations. If the system has a good mix of CPU-bound and I/O-bound processes, a CPU scheduler can improve CPU utilization by overlapping CPU and I/O operations.
• If a scheduling policy can take the CPU away from a process that could still run on the CPU, it is a preemptive scheduling policy.
• Scheduling criteria:
  o CPU utilization.
  o Throughput: number of processes finished per time unit.
  o Turnaround time: time from submission to completion of a process.
  o Waiting time: total time a process spends in the ready queue.
  o Response time: time delay of a process's response to a user request.
• Scheduling algorithms (a small comparison of the first two follows this list):
  o First-come, first-served (FCFS).
  o Shortest-job-first (SJF).
  o Priority: the job with the highest priority runs first.
  o Round-robin: each process takes its turn to run for up to a time quantum.
  o Multilevel queue: processes of a particular priority level go to their own separate queue; each queue is FCFS; processes in a queue will not be scheduled until all queues of higher priority are empty. It is usually preemptive.
  o Multilevel feedback queue: similar to multilevel queue, but processes can migrate to neighboring queues based on some scheduling policy.
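A toy calculation (burst times made up) comparing average waiting time under FCFS and SJF for jobs that all arrive at time 0; it illustrates why SJF minimizes average waiting time.

  import java.util.Arrays;

  public class SchedulingDemo {
      // Average waiting time when jobs run in the given order (all arrive at time 0).
      static double averageWait(int[] bursts) {
          double totalWait = 0, clock = 0;
          for (int burst : bursts) {
              totalWait += clock;    // this job waited until now
              clock += burst;        // then occupies the CPU for its burst
          }
          return totalWait / bursts.length;
      }

      public static void main(String[] args) {
          int[] arrivalOrder = {24, 3, 3};                // example burst times
          System.out.println("FCFS average wait = " + averageWait(arrivalOrder));   // 17.0

          int[] sjfOrder = arrivalOrder.clone();
          Arrays.sort(sjfOrder);                          // SJF: shortest burst first
          System.out.println("SJF  average wait = " + averageWait(sjfOrder));       // 3.0
      }
  }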
• CPU scheduling must avoid process starvation, where a process never gets its turn to run because there are always higher-priority processes ahead of it. One approach (aging) is to raise the priority of a process gradually over time.
Chapter 7: Process Synchronization
• If one process tries to modify some shared data while one or more other processes try to access the same data, a race condition may happen, so that the final value of the shared data or the process output depends on the relative order of process scheduling.
• A critical section is a section of code in which multiple processes may try to access shared data. To prevent race conditions, an entry section and an exit section should be placed around a critical section to make sure that at any time, at most one process is executing in the critical section.
• A solution to the critical-section problem must not make assumptions about the relative speeds of the involved processes, and must satisfy the following conditions:
  o Mutual exclusion: at most one process runs in the critical section at a time.
  o Progress: (1) only processes waiting to enter the critical section take part in the decision of who enters next; (2) this decision cannot be postponed indefinitely.
  o Bounded waiting: each process waiting to enter the critical section gets its turn within a bounded number of tries (competitions).
• The two-process critical-section problem can be solved by the following algorithm (Peterson's algorithm). Process i runs the code below, where j is the index of the other process:
  boolean[] flag = {false, false};
  int turn;
  do {
      flag[i] = true;                 // declare intent to enter
      turn = j;                       // give priority to the other process
      while (flag[j] && turn == j);   // busy-wait while the other process has priority
      // critical section
      flag[i] = false;                // exit section
      // remainder section
  } while (true);
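A Java rendering of the same algorithm, simplified to two fixed threads and not taken from the notes: declaring the shared variables volatile supplies the memory-ordering guarantees the busy-wait loop relies on. The iteration counts and names are arbitrary.

  public class Peterson {
      // volatile gives the ordering/visibility guarantees Peterson's algorithm needs on the JVM.
      static volatile boolean flag0 = false, flag1 = false;
      static volatile int turn = 0;
      static int counter = 0;                  // shared data protected by the algorithm

      static void enter(int i) {
          int j = 1 - i;
          if (i == 0) flag0 = true; else flag1 = true;
          turn = j;
          while ((i == 0 ? flag1 : flag0) && turn == j) { /* busy wait */ }
      }

      static void exit(int i) {
          if (i == 0) flag0 = false; else flag1 = false;
      }

      public static void main(String[] args) throws InterruptedException {
          Runnable worker0 = () -> { for (int k = 0; k < 100_000; k++) { enter(0); counter++; exit(0); } };
          Runnable worker1 = () -> { for (int k = 0; k < 100_000; k++) { enter(1); counter++; exit(1); } };
          Thread a = new Thread(worker0), b = new Thread(worker1);
          a.start(); b.start();
          a.join();  b.join();
          System.out.println("counter = " + counter);   // 200000 if mutual exclusion held
      }
  }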
• The multiple-process critical-section problem can be solved by the Bakery algorithm. Each process planning to enter the critical section must take an ever-increasing sequence number. The process with the smallest sequence number is the next to enter the critical section. In case more than one process gets the same sequence number (processes may apply for sequence numbers in parallel), their unique process ID numbers are used to break the tie.
• Hardware atomic instructions like testAndSet() can greatly simplify the solutions to critical-section problems. In the pseudocode below, testAndSet() executes atomically and its argument is passed by reference:
  boolean testAndSet(ref boolean target) {   // executed atomically by the hardware
      boolean rv = target;
      target = true;
      return rv;
  }
  ……
  boolean lock = false;
  ……
  do {
      while (testAndSet(lock));   // spin until the lock was previously false
      // critical section
      lock = false;               // release the lock
      // remainder section
  } while (true);
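In Java, the same test-and-set idea is available through java.util.concurrent.atomic; below is a minimal sketch (class and field names invented) of a spin lock built on AtomicBoolean.getAndSet().

  import java.util.concurrent.atomic.AtomicBoolean;

  public class SpinLock {
      private final AtomicBoolean locked = new AtomicBoolean(false);

      public void lock() {
          while (locked.getAndSet(true)) { /* busy-wait: test-and-set returned true */ }
      }

      public void unlock() {
          locked.set(false);
      }

      // Example use: protect a shared counter with the spin lock.
      static int counter = 0;

      public static void main(String[] args) throws InterruptedException {
          SpinLock lock = new SpinLock();
          Runnable work = () -> {
              for (int i = 0; i < 100_000; i++) {
                  lock.lock();
                  counter++;                 // critical section
                  lock.unlock();
              }
          };
          Thread t1 = new Thread(work), t2 = new Thread(work);
          t1.start(); t2.start();
          t1.join();  t2.join();
          System.out.println(counter);       // 200000
      }
  }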
• Semaphores support the atomic operations wait() and signal():
  o class Semaphore {
        public int value;        // initial value is 1 for mutual exclusion
        public ProcessQueue q;   // initially q is empty
    }
  o void wait(Semaphore s) {
        s.value--;
        if (s.value < 0) {
            add this process's PCB to s.q;
            block();
        }
    }
  o void signal(Semaphore s) {
        s.value++;
        if (s.value <= 0) {      // 1 - s.value = number of waiting processes
            remove the PCB of a process p from s.q;
            wakeup(p);
        }
    }
• Solving the critical-section problem with a semaphore:
  Semaphore mutex;
  mutex.value = 1;
  ……
  do {
      wait(mutex);
      // critical section
      signal(mutex);
      // remainder section
  } while (true);
• Improper use of semaphores may lead to deadlock (no process can proceed) or starvation (some process never gets a chance to enter the critical section).
• If a computer cannot guarantee the atomicity of the semaphore wait() and signal() operations in hardware, a software mutual-exclusion approach must be used inside them. Even though the software approach reintroduces busy waiting, such busy waiting only happens during the very short wait() and signal() operations.
• Classical problems of synchronization (a sketch of the first one follows this list):
  o The bounded-buffer problem: a producer process and a consumer process share a fixed-size buffer; the producer must wait when the buffer is full, the consumer must wait when it is empty, and access to the buffer itself must be mutually exclusive.
  o The readers-writers problem: each writer needs exclusive access to the shared data, but multiple readers may access it at the same time.
  o The dining-philosophers problem: multiple processes compete for limited resources, and there are possibilities for deadlock to occur.
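A compact bounded-buffer sketch using java.util.concurrent.Semaphore with the usual three semaphores (empty slots, full slots, and a mutex); the buffer capacity and item count are arbitrary.

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.concurrent.Semaphore;

  public class BoundedBufferDemo {
      static final int CAPACITY = 4;
      static final Deque<Integer> buffer = new ArrayDeque<>();
      static final Semaphore empty = new Semaphore(CAPACITY);  // free slots
      static final Semaphore full  = new Semaphore(0);         // filled slots
      static final Semaphore mutex = new Semaphore(1);         // protects the buffer

      public static void main(String[] args) throws InterruptedException {
          Thread producer = new Thread(() -> {
              try {
                  for (int i = 0; i < 10; i++) {
                      empty.acquire();           // wait(empty): block if the buffer is full
                      mutex.acquire();
                      buffer.addLast(i);
                      mutex.release();
                      full.release();            // signal(full)
                  }
              } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
          });
          Thread consumer = new Thread(() -> {
              try {
                  for (int i = 0; i < 10; i++) {
                      full.acquire();            // wait(full): block if the buffer is empty
                      mutex.acquire();
                      System.out.println("consumed " + buffer.removeFirst());
                      mutex.release();
                      empty.release();           // signal(empty)
                  }
              } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
          });
          producer.start(); consumer.start();
          producer.join();  consumer.join();
      }
  }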
• Critical regions and monitors are higher-level language constructs for process synchronization. They are usually implemented with semaphores, but they reduce the chance of misplacing semaphore operations. They are less powerful than raw semaphores.
• Variants of the monitor concept appear in Java classes through the synchronized keyword.
• In enterprise computing, a transaction is an atomic operation made up of a sequence of more primitive operations. Either all operations in a transaction succeed, or none of them take effect.
Chapter 8: Deadlocks
• For a process to use a resource, it normally goes through three steps: request (to the OS), use, and release.
• A deadlock happens when, in a set of processes, each holds some resources and each needs resources held by other processes in the set in order to run.
• Necessary conditions for a deadlock to happen:
  o Mutual exclusion: some resources cannot be shared.
  o Hold and wait: at least one process is holding a resource while waiting for resources currently held by other processes.
  o No preemption: resources cannot be taken away from a process holding them; they can only be released voluntarily.
  o Circular wait: a set {P0, P1, …, Pn} of processes exists such that P0 is waiting for some resource held by P1, P1 is waiting for some resource held by P2, …, and Pn is waiting for some resource held by P0.
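An illustrative (intentionally broken) Java fragment showing how hold-and-wait plus circular wait produce a deadlock between two threads; the lock names are arbitrary, and the program normally hangs rather than terminating.

  public class DeadlockDemo {
      static final Object resourceA = new Object();
      static final Object resourceB = new Object();

      public static void main(String[] args) {
          Thread p0 = new Thread(() -> {
              synchronized (resourceA) {                 // hold A ...
                  sleep(100);
                  synchronized (resourceB) {             // ... and wait for B
                      System.out.println("p0 got both");
                  }
              }
          });
          Thread p1 = new Thread(() -> {
              synchronized (resourceB) {                 // hold B ...
                  sleep(100);
                  synchronized (resourceA) {             // ... and wait for A: circular wait
                      System.out.println("p1 got both");
                  }
              }
          });
          p0.start();
          p1.start();                                    // with the sleeps, both threads block forever
      }

      static void sleep(long ms) {
          try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
      }
  }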
• Resource-allocation graph: each process and each resource type is represented by a vertex; there is a request edge from a process vertex to a resource-type vertex if the process is requesting an instance of that resource type; there is an assignment edge from an instance of a resource type to a process vertex if that instance has been assigned to the process.
• The existence of a directed cycle in a resource-allocation graph is a necessary condition for a deadlock, but it is not a sufficient condition.
• Deadlock prevention: use a policy that breaks one of the four necessary conditions for a deadlock. Since deadlock prevention ignores the actual resource-allocation state, it is the most conservative approach.
• A system is in a safe state if there is an order in which resources can be assigned to the involved processes, one after another, so that all the processes can finish execution. A system in a safe state cannot have a deadlock. A system in an unsafe state may eventually enter a deadlock.
• Deadlock avoidance: use an algorithm (such as the banker's algorithm) to make sure that the system always stays in a safe state.
• Deadlock avoidance uses the actual resource-allocation state and is therefore less conservative than deadlock prevention, but it has much more execution overhead.
• Deadlock detection: make no effort to prevent or avoid deadlocks; when system performance becomes low, run a deadlock-detection algorithm, then terminate some processes involved in a deadlock or preempt some of their resources.
• Most operating systems do not address the deadlock problem, for performance reasons.
Chapter 9: Memory Management
• Programs usually use a logical address space that is either a flat array of words starting from address zero, or a set of segments, each representing a logical code or data unit.
• The physical memory assigned to a process usually doesn't start at address zero, and the physical address space is shared among multiple processes.
• The logical addresses in a program can be mapped to physical addresses at compile or assembly time, at module linking time, at executable loading time, or at execution time. A program can run in an arbitrary memory location only if it maps its logical addresses to physical ones at execution time.
• In its simplest form, the execution-time mapping of logical addresses to physical addresses can be done by a relocation register.
• If there is not enough physical memory to hold all processes that need to run, swapping may be used to temporarily move a partially executed process out to the hard disk and later bring it back into memory. When the swap-out and swap-in are done to support priority scheduling, they are also called roll out and roll in.
• For contiguous memory allocation, first-fit (using the first encountered memory hole that is big enough for the request) is usually preferred to best-fit (using the smallest hole that is big enough) or worst-fit (using the largest hole) due to its simplicity.
• Contiguous memory allocation suffers from external fragmentation: even though there is enough free memory in total to run a new process, the free memory is scattered in many holes, each too small to be useful.
• Compaction moves all processes' memory to one end of memory to consolidate the small holes into a single big one. Compaction is usually too time-consuming to be practical.
• Paging: partition the logical address space into fixed-size pages, usually 4 to 8 KB. Partition the physical address space into frames of the same size. Each page can be loaded into any frame. The mapping from a page to a frame is done at execution time through a page table. A logical address is made up of a page number followed by an offset inside the page (a worked example follows).
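A worked example of splitting a logical address into page number and offset, assuming 4 KB pages (12 offset bits) and a made-up page table; real page-table entries also carry protection and status bits.

  public class PagingDemo {
      static final int OFFSET_BITS = 12;                  // 4 KB pages
      static final int PAGE_SIZE = 1 << OFFSET_BITS;
      static final int[] pageTable = {7, 3, 0, 5};        // hypothetical page -> frame mapping

      static int translate(int logicalAddress) {
          int pageNumber = logicalAddress >>> OFFSET_BITS;     // high-order bits
          int offset     = logicalAddress & (PAGE_SIZE - 1);   // low-order 12 bits
          int frame      = pageTable[pageNumber];              // page-table lookup
          return frame * PAGE_SIZE + offset;                   // physical address
      }

      public static void main(String[] args) {
          int logical = 2 * PAGE_SIZE + 0x1A4;                 // page 2, offset 0x1A4
          System.out.printf("logical 0x%05X -> physical 0x%05X%n", logical, translate(logical));
          // page 2 maps to frame 0, so the physical address is just the offset 0x001A4
      }
  }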
• Logically, the length of a page table is determined by the size of the logical address space, which is huge, and each process must maintain its own page table. Page tables are usually kept in main memory.
• Internal fragmentation: a process is allocated whole pages, so it may receive slightly more memory than it needs; the unused portion of its last page is wasted.
• A smaller page size reduces internal fragmentation, but it also increases the page-table size and the per-page disk transfer overhead.
• Several processes can share pages if their page tables all refer to the same frames for the shared pages.
• Memory protection can be implemented by attaching access-control bits to the entries of a page table.
• An associative memory can be used as a translation look-aside buffer (TLB) that acts as a cache for a small subset of the active page-table entries. The TLB reduces the number of extra memory accesses needed for page-table lookups.
• A multi-level (hierarchical) page table pages the page table itself. This avoids allocating a huge page table up front and allocates memory for sub-tables only when they are needed.
• Segmentation supports the logical view that a program is made up of variable-sized segments of code and data. A segment table provides the base and limit values for each segment. A logical address is made up of a segment number and an offset within the segment, and is mapped to a physical address with the help of the segment table, similar to the relocation-register approach.
• With segmentation it is more natural to attach protection bits to segment-table entries and to share segments among multiple processes.
• Variable segment sizes may cause difficulty in memory allocation (external fragmentation).
• Segmentation with paging is more popular: each variable-sized segment is further partitioned into fixed-size pages. This approach is used by Intel architectures.
Chapter 10: Virtual Memory
• Virtual memory uses a large disk space to support the illusion that physical memory is as large as the disk. It is usually implemented through demand paging or demand segmentation.
• Demand paging: the disk is divided into a sequence of pages (a page is a multiple of a sector). The physical memory is divided into frames of the same size. The virtual memory is as large as the disk capacity. Each process has its own page table to map its pages on disk to memory frames. When a page is needed but is not in memory, a free frame is found, or created by paging out a victim page to the disk, and the new page is loaded into that frame.
• Demand paging can have reasonable performance because most software exhibits locality of reference.
• Virtual memory may improve CPU utilization because it allows more processes to be in the ready state.
• While swapping in/out moves an entire process image between a disk and main memory, paging in/out moves only individual pages.
• Both swapping and paging use raw disk I/O (treating the disk as a sequence of sectors and allocating consecutive sectors to the data or memory image), which is much more efficient than file-system I/O, where the data are usually scattered around the disk and I/O has to go through directory searches and data copying through multiple buffers.
• Memory-mapped files: use a system call to assign a contiguous block of sectors on the virtual-memory disk to a file, use lazy (demand) loading of pages from the file system, and access the contents of the file through virtual-memory addresses. When not enough memory frames are available, pages are moved between the virtual-memory disk and memory. The file data are finally copied back to the file-system disk when the file is closed. This approach can improve performance because the paging in/out uses raw disk I/O.
• Common page-replacement policies include FIFO (which exhibits Belady's anomaly: adding frames may not reduce the number of page faults), LRU (least recently used), and approximations of LRU. A small simulation follows.
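A small simulation (reference string and frame count made up) that counts page faults under FIFO and LRU; with three frames, LRU faults less often than FIFO on this particular string.

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.LinkedHashSet;

  public class PageReplacementDemo {
      static int fifoFaults(int[] refs, int frames) {
          Deque<Integer> memory = new ArrayDeque<>();      // oldest page at the head
          int faults = 0;
          for (int page : refs) {
              if (!memory.contains(page)) {
                  faults++;
                  if (memory.size() == frames) memory.removeFirst();   // evict the oldest page
                  memory.addLast(page);
              }
          }
          return faults;
      }

      static int lruFaults(int[] refs, int frames) {
          LinkedHashSet<Integer> memory = new LinkedHashSet<>();  // insertion order = recency
          int faults = 0;
          for (int page : refs) {
              if (memory.remove(page)) {           // hit: refresh recency
                  memory.add(page);
              } else {
                  faults++;
                  if (memory.size() == frames) {   // evict the least recently used page
                      memory.remove(memory.iterator().next());
                  }
                  memory.add(page);
              }
          }
          return faults;
      }

      public static void main(String[] args) {
          int[] refs = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2};
          System.out.println("FIFO faults: " + fifoFaults(refs, 3));   // 10
          System.out.println("LRU  faults: " + lruFaults(refs, 3));    // 8
      }
  }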
• Dirty bits can reduce page-replacement overhead. Attach a dirty bit to each page-table entry. When a page is first loaded into a frame, clear its dirty bit. When a write happens inside the page, set its dirty bit. When the page needs to be replaced, it has to be copied back to the disk only if its dirty bit is set.
• A process needs a minimum number of page frames to run properly. The lower bound of this minimum is usually determined by the maximum number of pages a single machine instruction can reference.
• Thrashing: the processes have too few page frames, so most of the execution time is spent paging in and out.
• A working set is the set of distinct pages referenced in the last k memory accesses, where k is called the working-set window size. A process needs enough memory frames for the pages currently in its working set to run properly. If the total of the working-set sizes of the active processes exceeds the total number of physical memory frames, some processes must be suspended or swapped out to avoid thrashing.
• In Windows 2000, each process is initially assigned two numbers: a minimum and a maximum number of frames, decided by the OS. A process starts to run with its minimum number of frames. When a process needs more frames, it can get them from the shared pool of free frames. When the number of free frames falls below a threshold, an automatic working-set trimming procedure reduces the number of frames owned by each process back toward its minimum, so that more processes may be started.
Chapter 11: File-System Interface
• A file is a sequence of bytes. A file is usually stored on a disk.
• The main attributes of a file include its name, an ID number unique within the file system, type, data location on disk, size, protection information, times of creation and last modification, owner user ID, and group ID.
• The attributes of a file are stored in an entry of a directory file, which is also usually stored on the hard disk.
• Some OSs use the file-name extension to indicate the data type of a file, and associate a particular file extension with a particular application that can process that type of file.
• The main file operations include creating a file, reading a file, writing a file, deleting a file, and moving a file to another directory location.
• A sequential-access file is accessed from beginning to end. It maintains a current file position. A newly opened file has its current position at the start of the file data. Each read or write accesses the file at the current position and then advances it for the next access. The contents of a sequential file don't need to have a uniform structure.
• A random-access file is made up of fixed-size records, and each record can be accessed directly from the file's starting address, the record size, and the sequence number of the record of interest.
• The directory structure can be a tree, an acyclic graph, or a general graph; it is usually a tree with links (aliases). Directories allow more logical file organization and reduce the chance of name conflicts. File links are pointers to existing files or directories; they support file or directory sharing. File links are usually ignored during file-system traversal (for example, during a file search) to avoid cycles.
• A file path is a sequence of a device name (possibly omitted) and directory names that eventually leads to the file or directory of interest.
• To simplify file specification, the OS maintains a current directory, and users can change it. A path starting from the file-system root ("/" in Unix, a device name like "c:" in DOS) is called an absolute path, and a path starting from the current directory is called a relative path.
• File protection mainly takes two forms. One is access-control bits for read, write, and execute, each of which can be specified for the file owner, a particular group of users, and all other users. The more general approach is the access-control list (ACL), in which access rights can be specified for each individual user. A combination of the two approaches is common.
• Most OSs use environment variables to search for files. For example, the environment variable PATH is usually used to specify where to find an executable file, and the environment variable CLASSPATH is used to specify where to find Java source or class files during Java compilation or execution. These variables hold a sequence of directory paths (possibly Java JAR or zip files for CLASSPATH) separated by a separator (";" for DOS, ":" for Unix). The current directory is represented by a period ".". A search for a file controlled by an environment variable follows the order in which the component paths are listed in the variable's value.
Chapter 12: File-System Implementation
• A disk partition usually has a boot control block, a partition control block, a directory structure in which each file is represented by an FCB (File Control Block, directory entry, or inode in Unix), and a file data area.
• The basic data-access unit of a disk is a sector, which is usually 512 bytes or more. A block is a multiple of a sector. A cluster is a multiple of a block. Accessing larger data units improves disk-access efficiency.
• While a disk used for virtual memory usually uses contiguous block allocation, a file system usually uses linked lists of blocks or indexed blocks to store file data.
• In DOS, each partition has a FAT (File Allocation Table) on the disk. The number of entries in the FAT equals the number of clusters the partition supports. Each FAT entry corresponds to one cluster, and it holds the number of the next cluster belonging to the same file. Because the FAT (or part of it) can be cached in main memory at execution time, file accesses can avoid the disk accesses that would otherwise be needed just to follow the linked list to find the cluster of interest. A FAT is therefore basically a table that centralizes all the linked-list pointers.
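A toy sketch of following a FAT chain in memory; the table contents, the end-of-chain marker -1, and the starting clusters are all invented for illustration.

  public class FatChainDemo {
      static final int END_OF_CHAIN = -1;
      // fat[c] = next cluster of the same file, or END_OF_CHAIN.
      // Here one file occupies clusters 2 -> 5 -> 3, another occupies 4 -> 6.
      static final int[] fat = {0, 0, 5, END_OF_CHAIN, 6, 3, END_OF_CHAIN};

      static void printChain(int startCluster) {
          System.out.print("clusters:");
          for (int c = startCluster; c != END_OF_CHAIN; c = fat[c]) {
              System.out.print(" " + c);      // each hop is a table lookup, not a disk read
          }
          System.out.println();
      }

      public static void main(String[] args) {
          printChain(2);   // clusters: 2 5 3
          printChain(4);   // clusters: 4 6
      }
  }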
• In Unix, each inode (directory entry) has 12 direct pointers to file data blocks, so that small files can be accessed without indexing. An inode also has one single-indirect pointer that points to an index block holding pointers to file data blocks, plus one double-indirect pointer and one triple-indirect pointer that generalize the single-indirect approach to two-level and three-level index tables for supporting even larger files. (For example, with 4 KB blocks and 4-byte block pointers, a single-indirect block alone covers 1024 further blocks, or 4 MB of data.)
• Since each file access needs the file attributes stored in its directory entry, an open() system call is usually used to copy a file's directory entry into memory for fast access. Since an OS limits the number of such opened directory entries, it is good practice to close a file when it is no longer needed, so that the space for its in-memory directory entry can be recycled.
• An OS usually keeps two in-memory tables for opened files. One is at the system level; each of its entries holds the general attributes of a file (basically a copy of the directory entry on disk). The other is per process; each of its entries holds information for a file specific to that process, such as the current read position. The system-level table avoids duplicating file information across multiple processes that access the same file. Each per-process table entry has a pointer to one of the system-level table entries. When a user opens a file, the return value is the index of the file's entry in the per-process table. This integer index is usually called a file handle or a file descriptor. All other file-access operations take this index as an argument.
• A disk needs to be mounted into a file system before it can be accessed. DOS and the Macintosh OS use implicit mounting: when a new disk is detected, it is automatically mounted into the file system under a special device name ("C:", for example) or as a special folder. Unix doesn't use special device names for this purpose: a disk can be mounted onto any directory of the existing file system, and the original contents of that directory are hidden until the disk is unmounted. Unless a disk is listed in a boot-up script for automatic mounting, a Unix user normally needs to issue mount commands manually to use a new disk.
• In Unix, a mount table is maintained to find out which prefix of an absolute path refers to which disk device.
• When you write to a file, you may just be writing to a data buffer. When the buffer is full, when you issue a flush() call, or when you close the file, the data in the buffer are copied to the persistent disk copy of the file.