Operating Systems 2230
Computer Science & Software Engineering
Lecture 8: Complexity of I/O Devices
Modern operating systems are expected to support a wide variety of different
input and output (I/O) devices.
However, the variety of these devices makes their consistent, logical, and efficient
support difficult. In particular, the I/O transfer rates and sizes are of most
concern for efficient operation of an operating system.
The types of I/O devices supported can be grouped roughly into three distinct categories:

Human interface devices communicate with the user, including keyboard (10Bps), mouse (50Bps), laser printer (4MBps), and video displays (50MBps).

Machine-readable devices communicate within the single computer system, or provide digital input from external sensors, including digital-analogue converters (100KBps), floppy disks (10KBps), magneto-optical disks (1MBps), magnetic tape (5MBps), and magnetic disk (10MBps).

Communication devices connect computer systems to each other, including modem (5KBps), standard Ethernet (1.25MBps), ATM networks (19.375MBps), and very fast Ethernet (125MBps).
Note that the (approximate) I/O transfer speeds shown above are presented
in multiples of Bytes-per-second (Bps).
Many I/O devices, typically communications/networking devices, report their
transfer speeds in bits-per-second (bps). Moreover, communication speeds are
given in multiples of 1000bps, not 1024bps, so that transferring one megabyte
(2^20 bytes) of data at 1Mbps (10^6 bits per second) may take almost 5% longer than you’d think!
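We can check this discrepancy with a short calculation (a sketch; the 1Mbps link speed is just the illustrative figure from above):

```python
# Transfer time for one megabyte (2**20 bytes) over a 1Mbps link,
# where communication speeds use multiples of 1000 (so 1Mbps = 10**6 bits/s).
megabyte_bits = 2**20 * 8             # 8,388,608 bits
link_bps = 10**6                      # 1Mbps in bits per second

actual = megabyte_bits / link_bps     # what the transfer really takes
naive = (10**6 * 8) / link_bps        # naive estimate, assuming 1MB = 10**6 bytes

print(round(actual, 3))                       # 8.389 (seconds)
print(round(100 * (actual / naive - 1), 1))   # 4.9 (% longer than the naive 8.0s)
```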
The diversity of uses to which I/O devices are put makes it difficult for an
operating system to take a single, uniform, and consistent approach to their
management. We can highlight many of these differences:
Data rate: our previous examples have highlighted transfer rates spanning
7-8 orders of magnitude.

Application: the expected use of a device affects the software policies and
priorities employed by an operating system. For example, different output
devices may be supported at different priorities (particularly in a real-time,
alert-based system) and otherwise identical disk drives may be managed
differently if one is the swapping device, and the other “only” stores users’ files.

Complexity of control: devices such as mice and keyboards require little
control (being read-only devices), whereas bidirectional, mirrored disk
drives are much more complex.

Data transfer models: typically character stream-based (keyboards), or
block-based (disks and tapes).

Error management: complex I/O devices often recover from their own errors,
and the operating system only hears of catastrophic failures. In addition,
some errors may be handled by the operating system, whereas others
must “percolate” to the user’s application.
Stallings describes I/O management as the “messiest aspect of operating system design”.
Types of I/O Functions
An operating system may be expected to support I/O using one of three methods. Which method is employed depends on the complexity of the I/O device:
Programmed, or polled, I/O
The processor issues an I/O-based instruction on behalf of the currently executing process. The process loops incessantly until the I/O request is satisfied.
Of course, in a multi-programmed environment, all other processes are delayed,
too. This technique is often termed busy-waiting.
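Busy-waiting can be sketched as a loop that repeatedly reads a device’s status register. The class below is a hypothetical stand-in for the hardware (real code would read a memory-mapped or port-mapped register instead):

```python
class SimulatedDevice:
    """A hypothetical device whose status register reads 'busy' a fixed
    number of times before the operation completes (illustrative only)."""
    def __init__(self, busy_polls):
        self.remaining = busy_polls
        self.data = 0x42

    def status(self):
        # Each poll of the status register "advances" the simulated device.
        if self.remaining > 0:
            self.remaining -= 1
            return "busy"
        return "ready"

def programmed_io_read(device):
    polls = 0
    while device.status() != "ready":   # busy-wait: the processor does no
        polls += 1                      # useful work while looping here
    return device.data, polls

data, polls = programmed_io_read(SimulatedDevice(busy_polls=1000))
print(hex(data), polls)   # 0x42 1000
```

The count of wasted polls is exactly what interrupt-driven I/O and DMA (below) are designed to eliminate.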
Interrupt-driven I/O
The processor again issues an I/O-based instruction on behalf of the currently
executing process.
The process either continues execution until it is informed that the I/O has
completed (termed asynchronous I/O), or the process is blocked, another process may execute, and the original process is eventually marked as Ready when
the operating system receives the appropriate “I/O done” hardware interrupt.
Direct Memory Access (DMA)
The processor issues an I/O-based instruction on behalf of the currently executing process, but directs the request to a DMA (hardware) module. The DMA
module manages the data transfer between main memory and the I/O device
without processor intervention.
When the whole data block has been transferred, the DMA module interrupts
the processor (as described above).
Complexity of I/O Responsibilities
As computer hardware and operating systems have evolved, the methods employed to manage I/O have increased in complexity. These evolutionary steps
have been:
1. the processor directly controls the I/O device.
2. an I/O controller or module is employed. The processor communicates
with this controller using programmed I/O (no interrupts), but is unaware
of the device’s external interface.
3. an I/O controller is again used, but efficiency is increased because the
controller interrupts the processor when ready.
4. the I/O controller communicates with the DMA module, relieving the
processor of all I/O responsibilities.
5. the I/O controller supports its own processor and I/O instruction set. The
main processor initiates I/O by providing the controller with the address
of a sequence of I/O instructions in main memory. When the I/O channel
has fetched and executed this sequence, it interrupts the main processor.
6. the I/O controller supports its own processor and local memory. The I/O
controller is now directed from the main processor by simply providing
a description of the I/O task required. When this I/O processor has
performed its task, it interrupts the main processor.
In this evolutionary sequence, the I/O controller becomes increasingly “intelligent”, and the main processor is increasingly relieved of any I/O responsibilities,
leaving it to perform other computations.
Direct Memory Access
The Direct Memory Access (DMA) controller has the responsibility of transferring blocks of data between main memory and any I/O devices. Typically,
data sizes are kilobytes or megabytes.
The DMA module uses the main bus to perform the transfer. Ideally this use
will be when the processor does not require the bus, otherwise the DMA module
must suspend the processor’s execution.
The latter technique is often termed cycle-stealing as the DMA module’s actions steal a processor cycle.
Figure 1: DMA and Interrupt Breakpoints.
As can be seen from Figure 1, the processor does not need to use the bus all
of the time (and certainly not for data stored entirely in its registers). This
provides opportunities for the DMA controller to transfer data using the bus
when it is otherwise idle.
The DMA module is able to suspend the processor during its instruction
fetch-decode-execute cycle. During the DMA breakpoints, the DMA module
suspends the processor and transfers a single unit (typically only one byte)
between memory and the I/O module.
Notice that the DMA module does not interrupt the processor, it just suspends
it. There is no need for the processor to save its execution context and execute
another routine, as the DMA module does not alter this context. The processor
runs slower due to its suspension, but not as slow as if it were interrupted at
the completion of each byte’s transfer or if polling were being used.
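The saving can be estimated with some back-of-the-envelope figures (all of the cycle counts below are assumptions chosen for the comparison, not measurements of any real machine):

```python
# Rough comparison of processor cycles consumed transferring a 4096-byte
# block, under assumed per-event costs (hypothetical figures).
block_bytes = 4096
interrupt_cost = 100   # cycles to save context, run the handler, restore (assumed)
steal_cost = 1         # cycles lost per bus cycle the DMA module steals (assumed)

# Interrupt-driven I/O, one interrupt per byte transferred:
per_byte_interrupts = block_bytes * interrupt_cost

# DMA with cycle-stealing, plus a single interrupt at completion:
dma_cycle_stealing = block_bytes * steal_cost + interrupt_cost

print(per_byte_interrupts)   # 409600
print(dma_cycle_stealing)    # 4196
```

Even with these crude assumptions, DMA costs the processor two orders of magnitude fewer cycles for the same block.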
Direct Memory Access Organisation
At the hardware level, the main processor, DMA module, and other I/O modules may be configured in different ways (Figure 11.3; this and all subsequent
figures are taken from Stallings’ web-site):
In 11.3(a), DMA modules are separate devices, on the same bus as the processor
and I/O modules.
In 11.3(b), more sophisticated (faster) devices have their own DMA controller,
or a single DMA controller (here, often termed an I/O channel) may directly
support multiple devices.
In 11.3(c), the DMA module acts as an I/O bridge between the main system
bus and a new I/O bus.
It may now be possible for a single system to have a variety of different I/O bus
standards, such as IDE (Integrated Drive Electronics) or SCSI (Small Computer System Interface) for disk drives, a USB v2.0 bus (Universal Serial Bus,
60MBps) for external drives and scanners, and a Firewire bus (IEEE 1394,
50MBps) for digital multimedia.
This decreases the number of direct interfaces that the DMA module must
support, and removes half of the main bus traffic which can potentially interfere
with the processor’s execution.
Moreover, some devices may now only be contacted via their DMA (now, bus) controller.
Figure 11.3 Alternative DMA Configurations: (a) Single-bus, detached DMA; (b) Single-bus, Integrated DMA-I/O; (c) I/O bus (separate from the system bus).
Logical Organisation of I/O
As with most aspects of operating system design, a hierarchical structure can
be employed to decompose I/O responsibilities into manageable subproblems.
At the lowest level of the I/O hierarchy are sections of code which must interact
directly with hardware, and complete their activities in a few billionths of a second.
At the highest level, application programs wish to communicate their I/O requirements at a more logical level, and wish to be isolated from hardware
specifics. We see these more “logical” interfaces represented in a consistent,
small set of system calls.
There are three distinct levels in this hierarchy:
Logical I/O: treats the I/O device as a logical resource, while ignoring the
control of the device. Operating system interfaces allow application programs
to interact with devices at this level using familiar operations such as
open, seek, read, write, and close.
Device I/O: the required I/O operations and the data’s location are converted to sequences of instructions and accesses.
Scheduling and control: the operating system schedules the I/O requests
in an attempt to maximise their throughput (based on the I/O device’s
characteristics). “Returning” interrupts are also handled at this level, with
control information (status) being returned to the invoking processes.
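The layering can be sketched as calls flowing downwards through three classes. These class names and the block size are hypothetical, chosen only to mirror the three levels just described; real operating systems implement these layers in kernel code:

```python
class SchedulingAndControl:
    """Lowest level: queues requests and talks to the (simulated) hardware."""
    def __init__(self):
        self.queue = []
    def submit(self, op, block):
        self.queue.append((op, block))                 # schedule the request
        return f"status: {op} of block {block} completed"

class DeviceIO:
    """Middle level: converts logical operations into block-level requests."""
    def __init__(self, lower, block_size=4096):
        self.lower, self.block_size = lower, block_size
    def read(self, byte_offset):
        return self.lower.submit("read", byte_offset // self.block_size)

class LogicalIO:
    """Highest level: the familiar open/seek/read style interface."""
    def __init__(self, lower):
        self.lower, self.position = lower, 0
    def seek(self, offset):
        self.position = offset
    def read(self):
        return self.lower.read(self.position)

fs = LogicalIO(DeviceIO(SchedulingAndControl()))
fs.seek(10000)
print(fs.read())   # status: read of block 2 completed
```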
Examples of Logical I/O Organisation
How much operating system code is “squeezed” above Device I/O depends
mostly on how the device is represented to an application (Figure 11.4). For
example, the Communications port’s code may include a full TCP/IP protocol
suite, requiring tens of thousands of lines of code.
Figure 11.4 A Model of I/O Organization: (a) local peripheral device; (b) communications port; (c) file system (each resting on its own Scheduling & Control layer).
The Need For I/O Buffering
As we have seen, if a process wishes to input a block of characters, it may either
poll the I/O module until the request is satisfied, or employ interrupt-driven
I/O. The second approach is more efficient as other processes may execute while
we wait for the I/O.
However, the interrupt-driven approach interferes with the desirable use of
swapping: while the I/O blocked process is waiting, the operating system may
choose to suspend the process by moving some of its physical memory to the
swapping device.
The memory (page) holding the buffer to receive the input must not be swapped
out, as it must be resident when the I/O request is satisfied. It is thus impossible
to completely swap out the process.
Moreover, the operating system must avoid the single-process deadlock condition under which the process is blocked on the I/O event, and the I/O event is
blocked waiting for the process to be swapped in!
To avoid this deadlock, the region of memory receiving the I/O result must be
locked in memory, however long this takes.
There is an analogous condition for processes wishing to perform output.
Types Of I/O Buffering
The operating system must ensure that the pages of memory involved in I/O
requests are not swapped out while requests are pending. The standard solution
is for the operating system to reserve some memory, collectively termed I/O
buffers, for all I/O requests (Figure 11.5).
An application’s data is either initially copied to an I/O buffer (for an output
operation), or finally copied from an I/O buffer (for an input operation). Under
Linux, try the free command.
Figure 11.5 I/O Buffering Schemes (input), showing the I/O device, operating system, and user process under: (a) no buffering; (b) single buffering; (c) double buffering; (d) circular buffering.
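Double buffering (scheme (c)) lets the operating system fill one buffer from the device while the process drains the other. A minimal sketch of the alternation, with illustrative data (this sequential simulation only shows which buffer each block passes through; on real hardware the fill and the drain overlap in time):

```python
def double_buffered_input(device_blocks):
    """Simulate double-buffered input: the OS alternates between two
    buffers, filling one while the process consumes the other."""
    buffers = [None, None]
    filling = 0
    trace = []                          # (buffer index, block) pairs, in order
    for block in device_blocks:
        buffers[filling] = block        # OS fills buffer `filling` from the device
        trace.append((filling, block))  # process drains that buffer...
        filling = 1 - filling           # ...while the OS switches to the other
    return trace

print(double_buffered_input([b"aa", b"bb", b"cc"]))
# [(0, b'aa'), (1, b'bb'), (0, b'cc')]
```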
Disk I/O and Its Performance
Over the last thirty years, speed improvements of processors and memory have
far outstripped the improvements in disk speeds. Disks have become much
larger, but not correspondingly faster.
Electromechanical disks still provide the best combination of lowest access time
and cost of storage, and the highest storage capacity for read/write digital storage.
Figure 11.16 Disk Data Layout (tracks separated by inter-track gaps; sectors within a track separated by inter-sector gaps).
The speed of all file servers and user applications is limited by the number of
disk drive operations (reads and writes) that can be performed per second.
When operating at full capacity, a disk is spinning with a constant angular
velocity. To read or write information from a disk, (one of) its disk head(s)
must be positioned over the correct track, and then must wait for the correct
sector to spin underneath the head (Figure 11.16).
Disk Access Speeds
On a fixed-head disk, track selection simply involves electronically selecting the
correct head over the required track (standard PC-BIOS supports up to 255
heads per drive).
More typically, on a moving-head disk, one head must first be positioned over
the required track (Enhanced IDE drives support a maximum of 16 heads per drive).
The time spent waiting for the head to position itself is termed the seek time.
Typical high-performance disk drives can move the read/write head over half
the disk surface (an average seek) in 8-12ms.
The time taken for the correct sector to spin under the head is termed the rotational delay. At 5,400 revolutions per minute (a typical value for inexpensive
drives), the disk completes a full revolution in 1/90s, i.e. about 11ms. We thus say the
average rotational delay is 5.5ms.
Older disks (usually those less than 540 MBytes) spin at 3,600 RPM. More
expensive drives are typically rated at 7,200 RPM or 10,000 RPM.
The sum of the seek times and rotational delay is termed the access time.
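Using the figures above, a representative average access time is easy to compute (the 10ms average seek is an assumed value within the 8-12ms range just quoted):

```python
rpm = 5400
revolution_ms = 60_000 / rpm           # one full revolution: about 11.1ms
avg_rotational_ms = revolution_ms / 2  # on average, wait half a revolution
avg_seek_ms = 10                       # assumed, within the 8-12ms range above

access_time_ms = avg_seek_ms + avg_rotational_ms
print(round(revolution_ms, 1))         # 11.1
print(round(access_time_ms, 1))        # 15.6
```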
Once the correct sector is positioned under the head, the data may be transferred
between the disk and disk controller (either read or written).
While an IDE interface transfers data at a burst rate of about 2 MBytes/s and
SCSI typically transfers data at 10 MBytes/s, the data is read from the actual
disk drive at 5 Mbits/s to 48 Mbits/s. A 4,096 byte transfer (a typical cluster
size) would transfer in 0.5-2 ms.
Finally, the data may be transferred from the disk’s controller to the main
memory (hopefully via the DMA module).
This transfer rate depends on the bus used in the PC. The ISA specification
transfers at about 2 MBytes/s, and PCI specification at up to 132 MBytes/s.
We can summarise the times required for a typical disk operation:
35% Head seek to required track.
25% Rotational latency (for correct sector to spin under head).
25% Data transfer from disk to controller.
10% Disk driver software handling.
5% Data transfer from controller to memory.
Disk Scheduling Policies
Of clear importance in minimising the total access times of disk data is minimising the disk’s seek time.
On a single-user workstation, a typical sequence of disk read requests (track+sector)
will arrive (relatively slowly) from a single application, often in sequential order.
However, on a multi-user system, disk requests may appear in a random sequence of (track+sector) requests.
We could consider the random arrival order as the worst-case access pattern,
giving the worst possible performance.
The simplest disk scheduling strategy is the obvious first-in-first-out (FIFO),
in which all requests are serviced in order. It is a fair strategy as all requests
are eventually satisfied and have a bounded response time.
Figure 11.7 introduces an example track request sequence of 55, 58, 39, 18,
90, 160, 150, 38, 184. We can consider the effect of some simple disk
scheduling policies on this single data set.
We can improve the scheduling strategy by appreciating that the selection function (in software) can execute much faster than the required hardware positioning: track numbers can be sorted very quickly. Stallings further introduces
three simple strategies:
Shortest Seek Time First (SSTF) selects the next disk request which
results in the least movement of the disk head.
The SCAN policy keeps moving the disk head in the same direction until there are no more requests in that direction. This policy overcomes
potential starvation in the SSTF policy.
The C-SCAN policy restricts scanning to one direction only, thus reducing
the expected waiting time for all tracks.
Figure 11.7 Comparison of Disk Scheduling Algorithms (track number over time): (a) FIFO; (b) SSTF; (c) SCAN; (d) C-SCAN (see Table 11.3).
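The four policies can be compared directly on the track request sequence above. Following Stallings’ example, we assume the head starts at track 100 and (for SCAN and C-SCAN) is initially moving towards higher-numbered tracks:

```python
def fifo(start, requests):
    # Service requests strictly in arrival order.
    return list(requests)

def sstf(start, requests):
    # Always pick the pending request closest to the current head position.
    pending, pos, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

def scan(start, requests):
    # Sweep towards higher tracks first, then reverse direction.
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down

def cscan(start, requests):
    # Service in one direction only, then wrap to the lowest pending track.
    up = sorted(t for t in requests if t >= start)
    low = sorted(t for t in requests if t < start)
    return up + low

def total_movement(start, order):
    pos, moved = start, 0
    for track in order:
        moved += abs(track - pos)
        pos = track
    return moved

requests = [55, 58, 39, 18, 90, 160, 150, 38, 184]
for policy in (fifo, sstf, scan, cscan):
    print(policy.__name__, total_movement(100, policy(100, requests)))
# fifo 498, sstf 248, scan 250, cscan 322
```

Dividing each total by the nine requests gives average seek lengths of about 55.3 (FIFO), 27.6 (SSTF), 27.8 (SCAN), and 35.8 (C-SCAN) tracks, showing how much head movement the smarter policies save over FIFO.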