
Systems Architecture II (CS 282-001)
Lecture 10: Interfacing I/O Devices to Memory, Processor, and Operating System*
Jeremy R. Johnson
Monday, August 6, 2001

*This lecture was derived from material in the text (Chap. 8). All figures from Computer Organization and Design: The Hardware/Software Interface, Second Edition, by David Patterson and John Hennessy, are copyrighted material (COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).

August 6, 2001 Systems Architecture II 1

Introduction
• Objective: To learn how an I/O device communicates with a user program.
– How is a user I/O request transformed into a device command and communicated to the device?
– How is data actually transferred to or from a memory location?
– What is the role of the operating system?
• Topics
– Role of the OS
– Giving commands to the I/O system
  • I/O commands
  • Memory-mapped I/O
– Communicating with the processor
  • Polling
  • Interrupts
– Transferring data between a device and memory
  • Direct Memory Access (DMA)
– Designing an I/O system

Characteristics of I/O
• The responsibilities of the OS arise from three characteristics of I/O systems:
– The I/O system is shared by multiple programs using the processor.
– I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations. Because interrupts cause a transfer to kernel (or supervisor) mode, they must be handled by the OS.
– The low-level control of an I/O device is complex because it requires managing a set of concurrent events and because the requirements for correct device control are often very detailed.

Functions of the OS
• The OS guarantees that a user’s program accesses only the portions of an I/O device to which the user has rights.
• The OS provides abstractions for accessing devices by supplying routines that handle low-level device operations.
• The OS handles the interrupts generated by I/O devices.
• The OS tries to provide equitable access to the shared I/O resources and schedules accesses to enhance system throughput.

Types of Communication Required
• The OS must be able to give commands to the I/O device (e.g. read, write, disk seek).
• The device must be able to notify the OS when it has completed an operation or has encountered an error.
• Data must be transferred between memory and an I/O device.

Giving Commands to I/O Devices
• Dedicated I/O instructions (e.g. Intel 80x86)
– Command and device number are specified in the instruction
– Processor communicates the device address via a set of wires included as part of the I/O bus
– Illegal to execute while in user mode
• Memory-mapped I/O
– Portions of the address space are assigned to I/O devices
– Commands and data are written to special addresses
– Data and status information are read from special addresses
– Memory system ignores the operation (determined by the address)
– I/O controller sees the operation and transmits it to the device

Communicating with the Processor
• Polling
– Simplest way for an I/O device to communicate with the processor
– I/O device simply puts information in a status register, and the processor must come and get the information
– Processor periodically checks the status bits to see if it is time for the next I/O operation
• Interrupt-driven I/O
– The disadvantage of polling is that it can waste a lot of processor time: the processor keeps checking devices that usually have nothing to report.
– When a device wants to notify the processor that it has completed some operation or that it needs attention, it causes the processor to be interrupted
– An interrupt is similar to an exception, except
  • it is asynchronous with respect to instruction execution
  • the processor must be notified of the device causing the interrupt
  • interrupts must be prioritized according to the devices that caused them

Overhead of Polling
• Determine the impact of polling on three different devices:
– Assume 400 cycles for a polling operation and a 500 MHz clock
– Determine the fraction of CPU time consumed in the following 3 cases (assume that you poll often enough so that no data is lost and that the devices are potentially always busy)
– Mouse must be polled 30 times per second
– Floppy disk transfers data to the processor in 16-bit units and has a transfer rate of 50 KB/sec
– Hard disk drive transfers data in 4-word chunks and can transfer at 4 MB/sec

Overhead of Polling
1. Mouse: 30 accesses per second
– $30 \times 400 = 12{,}000$ cycles per second for polling
– Fraction of processor clock cycles $= \frac{12 \times 10^{3}}{500 \times 10^{6}} \approx 0.002\%$
2. Floppy drive: $\frac{50\ \text{KB/sec}}{2\ \text{bytes/polling access}} = 25\text{K}$ accesses/sec
– $25\text{K} \times 400 = 10 \times 10^{6}$ cycles per second for polling
– Fraction of processor clock cycles $= \frac{10 \times 10^{6}}{500 \times 10^{6}} = 2\%$
3. Hard drive: $\frac{4\ \text{MB/sec}}{16\ \text{bytes/polling access}} = 250\text{K}$ accesses/sec
– $250\text{K} \times 400 = 100 \times 10^{6}$ cycles per second for polling
– Fraction of processor clock cycles $= \frac{100 \times 10^{6}}{500 \times 10^{6}} = 20\%$

Transferring Data between a Device and Memory
• Using polling
– Initiate the transfer and periodically check for completion
– Periodically check for updates from the device (e.g. mouse)
• Interrupt-driven
– OS initiates the transfer and waits for an interrupt to indicate that the transfer has completed or an error has occurred
– OS still transfers data in small chunks and must communicate through interrupts many times during the complete I/O operation
• Direct Memory Access (DMA)
– Also interrupt-driven, but in this case the transfer is controlled by the device without intervention by the OS (an interrupt occurs only when the entire transfer is complete or an error occurs)
– Appropriate for high-bandwidth devices with relatively large blocks of data

Overhead of Interrupt-Driven I/O
• Assume the hard disk drive transfers data in 4-word chunks and can transfer at 4 MB/sec
– 500 MHz clock
– Overhead of each transfer, including the interrupt, is 500 cycles
– Hard drive is transferring data only 5% of the time
• The interrupt rate when the disk is busy is the same as the polling rate
– $250\text{K} \times 500 = 125 \times 10^{6}$ cycles per second for the disk
– Fraction of processor clock cycles $= \frac{125 \times 10^{6}}{500 \times 10^{6}} = 25\%$
• Assuming that the disk is only transferring data 5% of the time
– Fraction of processor clock cycles $= 25\% \times 5\% = 1.25\%$
– Compare to polling, which costs 20% whether or not the disk is active: the absence of overhead when the disk is not active is the major advantage of an interrupt-driven interface

Overhead of DMA
• Assume the hard disk drive transfers data in 4-word chunks and can transfer at 4 MB/sec
– 500 MHz clock
– Assume the transfer uses DMA and the initial DMA setup takes 1000 cycles
– Overhead of the interrupt at completion is 500 cycles
• If the average transfer is 8 KB, what fraction of the CPU is consumed if the disk is active 100% of the time (ignore processor/DMA controller bus contention)?
– Each DMA transfer takes $\frac{8\ \text{KB}}{4\ \text{MB/sec}} = 2 \times 10^{-3}$ sec
– Cycles/sec for disk $= \frac{(1000 + 500)\ \text{cycles/transfer}}{2 \times 10^{-3}\ \text{sec/transfer}} = 750 \times 10^{3}$ clock cycles/sec
– Fraction of processor clock cycles $= \frac{750 \times 10^{3}}{500 \times 10^{6}} = 0.15\%$

Issues with DMA
• With DMA there is another path to memory
• This creates difficulties with virtual memory and caches
– Should physical or virtual addresses be used?
– If virtual, the DMA unit must translate them to physical addresses
– If physical, must ensure that a transfer does not cross a page boundary (otherwise the memory addresses would not be contiguous)
– Can break a transfer into a sequence of page-size transfers
– OS must not remap memory during a DMA transfer
– The value of a memory location as seen by the DMA unit and by the processor may differ: the stale data or coherency problem (value in cache different from memory). Solved by routing DMA through the cache or by flushing the cache

Designing an I/O System
• Design an I/O system that ensures that latency is bounded by a certain amount.
• Design an I/O system to meet a set of bandwidth constraints given a workload.

Designing an I/O System
• Consider the following system:
– 300 MHz CPU
– 50,000 instructions in the OS per I/O operation
– A memory backplane bus capable of a transfer rate of 100 MB/sec
– SCSI-2 controllers with a transfer rate of 20 MB/sec, each accommodating up to seven disks
– Disk drives with read/write bandwidth of 5 MB/sec and an average seek plus rotational latency of 10 ms
• If the workload consists of 64-KB reads and the user program needs 100,000 instructions per I/O operation, find the maximum sustainable I/O rate and the number of disks and SCSI controllers required (ignore disk conflicts).
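The polling, interrupt-driven, and DMA overhead estimates worked earlier in this lecture all follow the same pattern: processor cycles spent per second divided by the clock rate. The following is a minimal Python sketch of that arithmetic, not part of the original slides. The function names are illustrative, and decimal units (1 KB = 10^3 bytes, 1 MB = 10^6 bytes) are assumed, which is what the slide figures imply.

```python
# Sketch of the overhead arithmetic from the polling, interrupt, and DMA slides.
# Assumptions (not from the slides): function names, and decimal KB/MB units.

CLOCK_HZ = 500e6  # 500 MHz clock, as stated in the lecture examples


def polling_fraction(polls_per_sec, cycles_per_poll=400):
    """Fraction of CPU cycles spent polling a device at the given rate."""
    return polls_per_sec * cycles_per_poll / CLOCK_HZ


def interrupt_fraction(transfers_per_sec, cycles_per_interrupt=500, busy_fraction=1.0):
    """Fraction of CPU cycles spent servicing interrupts.

    busy_fraction scales the cost by how often the device is actually active;
    an idle device generates no interrupts and therefore no overhead.
    """
    return transfers_per_sec * cycles_per_interrupt * busy_fraction / CLOCK_HZ


def dma_fraction(bytes_per_transfer, bytes_per_sec, setup_cycles=1000, completion_cycles=500):
    """Fraction of CPU cycles spent on DMA setup plus the completion interrupt."""
    seconds_per_transfer = bytes_per_transfer / bytes_per_sec
    cycles_per_sec = (setup_cycles + completion_cycles) / seconds_per_transfer
    return cycles_per_sec / CLOCK_HZ


mouse = polling_fraction(30)                   # 30 polls per second
floppy = polling_fraction(50e3 / 2)            # 50 KB/sec in 2-byte units
disk_poll = polling_fraction(4e6 / 16)         # 4 MB/sec in 16-byte chunks
disk_intr = interrupt_fraction(4e6 / 16, busy_fraction=0.05)
disk_dma = dma_fraction(8e3, 4e6)              # 8 KB transfers at 4 MB/sec

for name, frac in [("mouse (polling)", mouse), ("floppy (polling)", floppy),
                   ("disk (polling)", disk_poll), ("disk (interrupts, 5% busy)", disk_intr),
                   ("disk (DMA, 100% busy)", disk_dma)]:
    print(f"{name}: {frac:.4%}")
```

Changing `busy_fraction` or the transfer size shows how quickly the interrupt and DMA costs fall as the device idles more or moves larger blocks per transfer.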
Designing an I/O System
• To find the maximum I/O rate, find the rate for the two fixed components (CPU and memory bus) to determine which is the bottleneck
• Max I/O rate of CPU $= \frac{\text{Instruction execution rate}}{\text{Instructions per I/O}} = \frac{300 \times 10^{6}}{(50 + 100) \times 10^{3}} = 2000$ I/Os/sec
• Max I/O rate of bus $= \frac{\text{Bus bandwidth}}{\text{Bytes per I/O}} = \frac{100 \times 10^{6}}{64 \times 10^{3}} = 1562$ I/Os/sec
• The bus is the bottleneck, so the rest of the system is sized for 1562 I/Os/sec
• To determine the number of disks, we need to know the time per I/O operation
– Time per I/O $= 10\ \text{ms} + \frac{64\ \text{KB}}{5\ \text{MB/sec}} = 10\ \text{ms} + 12.8\ \text{ms} = 22.8\ \text{ms}$
– $1000 / 22.8 = 43.9$ I/Os per second per disk
– $1562 / 43.9 = 35.6$, so 36 disks

Designing an I/O System
• To compute the number of SCSI buses, we need to know the average transfer rate per disk
– Transfer size / transfer time $= 64\ \text{KB} / 22.8\ \text{ms} = 2.74$ MB/sec
– Assume that disk accesses are not clustered so that we can use the full bandwidth of the bus
– $2.74 \times 7 = 19.18$ MB/sec, which is below the 20 MB/sec bus rate, so we can use seven disks per SCSI bus
– With 36 disks at seven per bus, $36 / 7 = 5.14$, so six SCSI buses and controllers are needed
• This calculation required several simplifying assumptions. In practice, where those assumptions do not hold, simulation is used.
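The sizing steps above can be sketched as a short Python script. This is an illustration rather than part of the original slides: the variable names are hypothetical, the CPU is assumed to execute one instruction per cycle (which is how the slides turn 300 MHz into 300 × 10^6 instructions per second), and decimal units are used throughout. With purely decimal units the per-disk transfer rate comes out near 2.81 MB/sec rather than the 2.74 MB/sec on the slide, but the conclusion of seven disks per bus is unchanged.

```python
import math

# Sketch of the I/O system sizing example. Names and the one-instruction-per-cycle
# assumption are illustrative; decimal units (KB = 1e3, MB = 1e6 bytes) are assumed.

cpu_instr_per_sec = 300e6            # 300 MHz CPU, assumed 1 instruction per cycle
instr_per_io = 50_000 + 100_000      # OS instructions + user program instructions
bus_bytes_per_sec = 100e6            # memory backplane bus
io_bytes = 64e3                      # 64-KB reads
disk_bytes_per_sec = 5e6             # disk read/write bandwidth
disk_latency_sec = 10e-3             # average seek plus rotational latency
scsi_bytes_per_sec = 20e6            # SCSI-2 controller transfer rate
scsi_max_disks = 7                   # disks a controller can accommodate

# Step 1: the slower of the two fixed components bounds the I/O rate.
cpu_io_rate = cpu_instr_per_sec / instr_per_io
bus_io_rate = bus_bytes_per_sec / io_bytes
max_io_rate = min(cpu_io_rate, bus_io_rate)

# Step 2: disks needed to sustain that rate.
time_per_io = disk_latency_sec + io_bytes / disk_bytes_per_sec
io_rate_per_disk = 1.0 / time_per_io
disks = math.ceil(max_io_rate / io_rate_per_disk)

# Step 3: how many disks fit on one SCSI bus without exceeding its bandwidth.
bytes_per_sec_per_disk = io_bytes / time_per_io
disks_per_bus = min(scsi_max_disks, int(scsi_bytes_per_sec // bytes_per_sec_per_disk))
controllers = math.ceil(disks / disks_per_bus)

print(f"CPU limit: {cpu_io_rate:.0f} I/Os/sec, bus limit: {bus_io_rate:.1f} I/Os/sec")
print(f"disks: {disks}, disks per SCSI bus: {disks_per_bus}, controllers: {controllers}")
```

The script keeps the bus limit as 1562.5 I/Os/sec instead of truncating to 1562 as the slide does; the disk and controller counts are the same either way.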
