CS4100 template, Department of Computer Science, National Tsing Hua University (清華大學資訊工程系)

CS4100: Computer Architecture
I/O Systems
Department of Computer Science, National Tsing Hua University, first semester of academic year 95 (Fall 2006)
Adapted from Prof. D. Patterson's class notes, Copyright 1998, 2000 UCB

5 Components of Any Computer
• Processor (active)
  - Control ("brain")
  - Datapath ("brawn")
• Memory (passive): where programs and data live when running
• Devices
  - Input: keyboard, mouse
  - Output: display, printer
  - Disk: where programs and data live when not running

"What's This Stuff Good For?"
Remote diagnosis: "NeoRest ExII," a high-tech toilet, features microprocessor-controlled seat warmers, automatic lid openers, air deodorizers, water sprays and blow-dryers that do away with the need for toilet tissue... Toto's engineers are now working on a model that analyzes urine to determine blood-sugar levels in diabetics and then automatically sends a daily report, by modem, to the user's physician.
One Digital Day, 1998 (www.intel.com/onedigitalday)

Motivation for Input/Output
• I/O is how humans interact with computers
• I/O gives computers long-term memory
• I/O lets computers do amazing things:
  - Read the pressure of a synthetic hand and control the synthetic arm and hand of a fireman
  - Control propellers and fins, and communicate in BOB (Breathable Observable Bubble)
• A computer without I/O is like a car without wheels: great technology, but it won't get you anywhere

I/O Design Issues
• Many factors besides performance matter (expandability, resilience)
• I/O performance is complex: both latency and throughput matter
• I/O performance depends on many aspects of the system: access latency, throughput, the connection between devices and the system, the memory hierarchy, and the OS
[Figure: processor and cache on a memory-I/O bus with main memory and I/O controllers for two disks, graphics output, and a network; devices signal the processor through interrupts.]

Outline
• I/O performance measures
• Types and characteristics of I/O devices
• Buses
• Interfacing I/O devices
• Designing an I/O system

I/O System Performance
• I/O system performance depends on many aspects of the system (it is limited by the weakest link in the chain):
  - The CPU
  - The memory system: internal and external caches, main memory
  - The underlying interconnection (buses)
  - The I/O controller
  - The I/O device
  - The speed of the I/O software (operating system)
  - The efficiency of the software's use of the I/O devices
• Two common performance metrics:
  - Throughput: I/O bandwidth
  - Response time: latency

Simple Producer-Server Model
[Figure: Producer -> Queue -> Server]
• Throughput: the number of tasks completed by the server per unit time
  - To get the highest possible throughput, the server should never be idle and the queue should never be empty
• Response time: begins when a task is placed in the queue and ends when it is completed by the server
  - To minimize response time, the queue should be empty and the server should be idle
• These are conflicting goals

Throughput vs. Response Time
[Figure: response time (ms, 0 to 300) vs. percentage of maximum throughput (20% to 100%).]
• You pay a steep price in response time to get the last few percent of maximum throughput

Throughput Enhancement
[Figure: one producer feeding two queue/server pairs]
• In general, throughput can be improved by throwing more hardware at the problem
• Response time is much harder to reduce: it is ultimately limited by the speed of light (and we are far from that limit)

I/O Benchmarks for Performance Measurement (1/2)
• Supercomputer applications:
  - Large-scale scientific problems => large files
  - One large read and many small writes to snapshot the computation
  - Concerned with data rate: MB/second between memory and disk
• Transaction processing:
  - Examples: airline reservation systems and bank ATMs
  - Small changes to a large shared database
  - Concerned with I/O rate: number of disk accesses per second
  - Typical benchmark: TPC-C, light/medium-weight queries on order entry

I/O Benchmarks for Performance Measurement (2/2)
• File system:
  - Measurements of UNIX file systems in an engineering environment:
    80% of accesses are to files smaller than 10 KB
    90% of all file accesses are to data with sequential addresses on the disk
    67% are reads, 27% writes, 6% read-modify-write
  - A synthetic benchmark: 70 files, 200 KB, in 5 phases: MakeDir, Copy, ScanDir, ReadAll, Make

I/O Device Examples and Speeds
• I/O speed: bytes transferred per second (from mouse to display, rates span roughly a million to one)

Device            Behavior   Partner   Data rate (KB/s)
Keyboard          Input      Human               0.01
Mouse             Input      Human               0.02
Voice output      Output     Human               5.00
Floppy disk       Storage    Machine            50.00
Laser printer     Output     Human             100.00
Magnetic disk     Storage    Machine        10,000.00
Network-LAN       I or O     Machine        10,000.00
Graphics display  Output     Human          30,000.00

• We will concentrate on disks in the following discussion

Disk History (1/2)
(Data density in Mbit/sq. in.; capacity of the unit shown in MBytes)
• 1973: 1.7 Mbit/sq. in., 140 MBytes
• 1979: 7.7 Mbit/sq. in., 2,300 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"

Disk History (2/2)
• 1989: 63 Mbit/sq. in., 60,000 MBytes
• 1997: 3090 Mbit/sq. in., 8100 MBytes
Source: New York Times, 2/23/98 (as above)

1-inch Disk Drive!
• 2000 IBM MicroDrive:
  - 1.7" x 1.4" x 0.2"
  - 1 GB, 3600 RPM, 5 MB/s, 15 ms seek
  - Digital camera, PalmPC?
• 2006 MicroDrive? 9 GB, 50 MB/s!
  - Assuming it finds a niche in a successful product
  - Assuming past trends continue

Storage Technology Drivers
• Driven by the prevailing computing paradigm:
  - 1950s: migration from batch to on-line processing
  - 1990s: migration to ubiquitous computing: computers in phones, books, cars, video cameras, …; a nationwide fiber-optic network with wireless tails
• Effects on the storage industry:
  - Embedded storage: smaller, cheaper, more reliable, lower power
  - Data utilities: high-capacity, hierarchically managed storage; network-attached storage (NAS)

Historical Perspective
• Form factor and capacity drive the market more than performance does
• 1970s: mainframes => 14-inch diameter disks
• 1980s: minicomputers, servers => 8-inch and 5.25-inch diameter disks
• Late 1980s/early 1990s:
  - Pizza-box PCs => 3.5-inch diameter disks
  - Laptops, notebooks => 2.5-inch disks
  - Palmtops didn't use disks, so 1.8-inch diameter disks didn't make it

Technology Trends
• Disk capacity now doubles every 18 months; before 1990 it doubled every 36 months
• The I/O gap:
  - Today: processing power doubles every 18 months
  - Today: memory size doubles every 18 months (4X/3 yr)
  - Today: disk capacity doubles every 18 months
  - Disk positioning rate (seek + rotate) doubles only every ten years!

Disk Device Technology
[Figure: platter with inner and outer tracks divided into sectors; a head at the end of an arm moved by the actuator]
• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits are recorded in tracks, which in turn are divided into sectors (e.g., 512 bytes); each sector carries an error-correction code to find and correct errors
• The actuator moves the head (at the end of the arm) over the track ("seek"), waits for the sector to rotate under the head, then reads or writes
• "Cylinder": all the tracks under the heads
[Photo: disk head, arm, actuator, spindle, and 12 platters]

Magnetic Disk Characteristics
• Cylinder: all the tracks under the heads at a given arm position, on all surfaces
• Read/write is a three-stage process, plus controller overhead:
  - Seek time: position the arm over the proper track (8 to 20 ms average)
  - Rotational latency: wait for the desired sector to rotate under the head (0.5 rotation / RPM)
  - Transfer time: transfer a block of bits (a sector) under the read/write head (2 to 15 MB/s)
  - Disk controller time
• Average seek time is in the range of 8 ms to 12 ms:
  - Average seek time = (sum of the times for all possible seeks) / (total number of possible seeks)
  - Due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number

Typical Numbers of a Magnetic Disk
• Diameter: 1.8" to 8"; 1 to 15 platters
• 1,000 to 5,000 tracks per surface
• 64 to 200 sectors per track (512 bytes/sector)
  - A sector is the smallest unit that can be read or written (sector number, gap, sector data + CRC, gap, …)
• Traditionally all tracks have the same number of sectors
  - Constant bit density: record more sectors on the outer tracks
  - Recently relaxed: constant bit size, with speed varying with track location

Typical Numbers of a Magnetic Disk (continued)
• Rotational latency:
  - Most disks rotate at 3,600 to 7,200 RPM, approximately 16 ms to 8 ms per revolution, respectively
  - The average latency to the desired information is halfway around the disk: 8 ms at 3,600 RPM, 4 ms at 7,200 RPM
• Transfer time is a function of:
  - Transfer size (usually a sector): 1 KB/sector
  - Rotation speed: 3,600 RPM to 10,000 RPM
  - Recording density: bits per inch on a track
  - Diameter: typical diameters range from 1.8 to 5.25 inches
  - Typical values: 2 to 40 MB per second

An Example: Barracuda 180
• 181.6 GB, 3.5-inch disk
• 12 platters, 24 surfaces
• 31.2 Gbit/sq. in. areal density
• 7200 RPM; 4.16 ms = 1/2 rotation
• SCSI interface
• 10 watts (idle)
• 0.1 ms controller time
• 8.0 ms average seek
• 35 to 64 MB/s (internal)
• $7.50/GB (lower-capacity ATA/IDE disks: ~$2/GB)
Source: www.seagate.com

Disk Device Performance
• Disk latency = queuing time + controller time + seek time + rotation time + transfer time
• Average distance of a sector from the head? Half the time of a rotation:
  - 7200 revolutions per minute => 120 revolutions/s
  - 1 revolution = 1/120 s => 8.33 ms
  - 1/2 rotation (revolution) => ≈ 4.17 ms
• Average number of tracks moved under the arm?
  - Sum of all possible seek distances / number of possible seeks
  - Assumes the seek distance is random

Example
• Assume a 512-byte sector, rotation at 5400 RPM, an advertised average seek time of 12 ms, a transfer rate of 4 MB/s, a controller overhead of 1 ms, and an idle queue (no queuing delay)
• Disk access time = seek time + rotational latency + transfer time + controller time + queuing delay
  = 12 ms + 0.5 / 5400 RPM + 0.5 KB / (4 MB/s) + 1 ms + 0
  = 12 ms + 0.5 / 90 RPS + 0.125 / 1024 s + 1 ms + 0
  = 12 ms + 5.6 ms + 0.1 ms + 1 ms + 0 ms
  ≈ 18.7 ms
• If real seeks are 1/3 of the advertised seek time, the access time is about 10.7 ms, with rotational delay accounting for about 50% of the time!
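The arithmetic in the example above can be checked with a short Python sketch. The function name `disk_access_time_ms` and its parameter names are hypothetical; the model is the slide's own (seek + rotational latency + transfer + controller + queuing), with rotational latency taken as half a revolution and 1 MB treated as 1024 KB, as in the slide's 0.125 / 1024 s step. Unrounded, the totals come out near 18.7 ms and 10.7 ms.

```python
def disk_access_time_ms(seek_ms, rpm, sector_kb, transfer_mb_per_s,
                        controller_ms, queue_ms=0.0):
    """Average access time (ms) = seek + rotational latency + transfer
    + controller overhead + queuing delay."""
    revs_per_s = rpm / 60.0
    rotational_ms = 0.5 / revs_per_s * 1000.0                       # wait half a revolution on average
    transfer_ms = sector_kb / (transfer_mb_per_s * 1024.0) * 1000.0  # 1 MB = 1024 KB, as on the slide
    return seek_ms + rotational_ms + transfer_ms + controller_ms + queue_ms


# Parameters from the example: 512-byte sector, 5400 RPM, 12 ms advertised seek,
# 4 MB/s transfer rate, 1 ms controller overhead, idle queue.
advertised = disk_access_time_ms(12, 5400, 0.5, 4, 1)
realistic = disk_access_time_ms(12 / 3, 5400, 0.5, 4, 1)   # real seeks = 1/3 of advertised
rotation_share = disk_access_time_ms(0, 5400, 0, 4, 0) / realistic

print(f"advertised seek: {advertised:.2f} ms")   # 18.68 ms
print(f"realistic seek:  {realistic:.2f} ms")    # 10.68 ms
print(f"rotation share:  {rotation_share:.0%}")  # 52%
```

Passing zero for every term except the rotation isolates the rotational latency, which is how the 52% share is obtained; the same call with 7200 RPM reproduces the 4.17 ms half-rotation figure for the Barracuda.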
Areal Density
• Bits recorded along a track: the metric is bits per inch (BPI)
• Number of tracks per surface: the metric is tracks per inch (TPI)
• We care about bit density per unit area:
  - The metric is bits per square inch, called areal density
  - Areal density = BPI x TPI

Data Rate: Inner vs. Outer Tracks
• To keep things simple, disks originally kept the same number of sectors per track
  - Since the outer track is longer, it has fewer bits per inch
• Competition led manufacturers to keep BPI the same for all tracks ("constant bit density"):
  - More capacity per disk
  - More sectors per track toward the edge
  - Since the disk spins at constant speed, outer tracks have a faster data rate
• Outer-track bandwidth is 1.7X inner-track bandwidth!

Disk Performance Model/Trends
• Capacity: +100%/year (2X every 1.0 year)
• Transfer rate (BW): +40%/year (2X every 2.0 years)
• Rotation + seek time: -8%/year (1/2 in 10 years)
• MB/$: >100%/year (2X every <1.5 years), from fewer chips + areal density
• Areal density growth changed slope from 30%/yr to 60%/yr around 1991

Reliability and Availability
• Two terms that are often confused:
  - Reliability: Is anything broken?
  - Availability: Is the system still available to the user?
• Availability can be improved by adding hardware:
  - Example: adding ECC to memory
• Reliability can only be improved by:
  - Better environmental conditions
  - Building more reliable components
  - Building with fewer components
• Improving availability may come at the cost of lower reliability

Disk Arrays
• Arrays of small, inexpensive disks
• Increase potential throughput by having many disk drives:
  - Data is spread over multiple disks
  - Multiple accesses are made to several disks
• Reliability is lower than that of a single disk:
  - But availability can be improved with redundant disks (RAID): lost information is reconstructed from redundant information
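The slide does not say what form the redundant information takes. One common choice, used by parity-based RAID levels such as RAID 4 and RAID 5, is a bytewise XOR across the data disks: any single lost block equals the XOR of the parity block and all surviving blocks. The Python sketch below assumes equal-sized blocks and at most one failed disk; the function names are illustrative, not taken from any RAID implementation.

```python
from functools import reduce


def parity_block(blocks):
    """Bytewise XOR of equal-sized blocks; the result is the redundant parity block."""
    if len({len(b) for b in blocks}) > 1:
        raise ValueError("all blocks must have the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block: XOR of the parity and every surviving block."""
    return parity_block(list(surviving_blocks) + [parity])


# Three data disks plus one parity disk (toy 3-byte "sectors").
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = parity_block(data)

# Disk 1 fails; its contents are recomputed from the other disks and the parity.
rebuilt = reconstruct([data[0], data[2]], parity)
print(parity.hex(), rebuilt == data[1])   # bb99ff True
```

XOR works here because every value is its own inverse (x ^ x = 0), so XOR-ing the parity with the survivors cancels them out and leaves only the missing block. One extra disk therefore protects the whole group against a single failure, which is the "small number of extra disks" trade-off the disk summary refers to.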
Disk Summary
• Magnetic disks continue to advance rapidly: 60%/yr in capacity, 40%/yr in bandwidth, slow improvements in seek and rotation, MB/$ improving 100%/yr?
  - Designs fit high-volume form factors
• Disk performance: disk latency = queuing time + controller time + seek time + rotation time + transfer time
• RAID:
  - Higher performance with more disk arms per $
  - Adds an availability option with a small number of extra disks

What Is a Bus?
• A bus is:
  - A shared communication link
  - A single set of wires used to connect multiple subsystems
[Figure: processor (control, datapath), memory, input, and output connected by a bus]
• A bus is also a fundamental tool for composing large, complex systems: a systematic means of abstraction

Example: Pentium System Organization
[Figure: processor/memory bus, PCI bus, and I/O buses]

Advantages of Buses
• Versatility:
  - New devices can be added easily
  - Peripherals can be moved between computer systems that use the same bus standard
• Low cost:
  - A single set of wires is shared in multiple ways
[Figure: processor, memory, and three I/O devices on one bus]

Disadvantages of Buses
• A bus creates a communication bottleneck:
  - Bus bandwidth can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
  - The length of the bus
  - The number of devices on the bus
  - The need to support a range of devices with widely varying latencies and widely varying data transfer rates

The General Organization of a Bus
[Figure: control lines, address lines, data lines]
• Control lines:
  - Signal requests and acknowledgments
  - Indicate what type of information is on the data lines
• Address/data lines carry information between the source and the destination:
  - Data and addresses, which may be shared in a multiplexed way
  - Complex commands

Terminology
• A bus transaction includes two parts:
  - Issuing the command (and address): the request
  - Transferring the data: the action (response)
• The master is the one who starts the bus transaction by issuing the command (and address)
  - These are often preceded by arbitration
• The slave is the one who responds to the address by:
  - Sending data to the master if the master asks for data
  - Receiving data from the master if the master wants to send data
[Figure: bus master issues a command to the bus slave; data can go either way]

Buses According to Functionality
• Processor-memory bus (design specific):
  - Short and high speed
  - Needs to match the memory system to maximize memory-to-processor bandwidth
  - Connects directly to the processor
  - Optimized for cache block transfers
• I/O bus (industry standard, e.g., SCSI):
  - Usually lengthy and slower
  - Needs to match a wide range of I/O devices
  - Connects to the processor-memory bus or backplane bus
• Backplane bus (standard or proprietary, e.g., PCI):
  - Backplane: an interconnection structure in the chassis that allows processors, memory, and I/O devices to coexist
  - Cost advantage: one bus for all components

A Computer System with One Bus: Backplane Bus
• A single bus (the backplane bus) is used for:
  - Processor-to-memory communication
  - Communication between I/O devices and memory
• Advantages: simple and low cost
• Disadvantages: slow, and the bus can become a major bottleneck
• Example: IBM PC-AT
[Figure: processor, memory, and I/O devices on one backplane bus]

A Two-Bus System
• I/O buses tap into the processor-memory bus via bus adapters:
  - Processor-memory bus: for processor-memory traffic
  - I/O buses: provide expansion slots for I/O devices
• Apple Macintosh II:
  - NuBus: processor, memory, and a few selected I/O devices
  - SCSI bus: the rest of the I/O devices
[Figure: processor-memory bus with three bus adapters, each driving an I/O bus]

A Three-Bus System
• A small number of backplane buses tap into the processor-memory bus:
  - Processor-memory bus for processor-memory traffic
  - I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is reduced
[Figure: processor-memory bus -> bus adapter -> backplane bus -> bus adapters -> I/O buses]

Main Components of Intel Chipset
• Northbridge:
  - Handles memory
  - Graphics
• Southbridge: I/O
  - PCI bus
  - Disk controllers
  - USB controllers
  - Audio
  - Serial I/O
  - Interrupt controller
  - Timers

Buses According to Clocking
• Synchronous bus:
  - Includes a clock in the control lines
  - A fixed protocol for communication relative to the clock
  - Advantage: involves very little logic and can run very fast
  - Disadvantages: every device on the bus must run at the same clock rate; to avoid clock skew, the bus cannot be long if it is fast
• Asynchronous bus:
  - It is not clocked
  - It can accommodate a wide range of devices
  - It can be lengthened without worrying about clock skew
  - It requires a handshaking protocol

Simple Synchronous Protocol
• All devices operate synchronously and all can source/sink data at the same rate
• Even memory buses are more complex than this:
  - Memory (the slave) may take time to respond
  - It may need to control the data rate
[Timing diagram: Bus Req, Bus Grant, R/W, Address (Cmd+Addr), Data (Data1, Data2)]

Simple Synchronous Protocol (continued)
• The slave indicates when it is prepared for the data transfer
• The actual transfer goes at bus rate
[Timing diagram: Bus Req, Bus Grant, R/W, Address (Cmd+Addr), Wait, Data (Data1: first write failed; Data1 repeated; Data2)]

Asynchronous Handshake (Read)
• t0: The master obtains control and asserts the address and direction; it then waits a specified amount of time for slaves to decode the target
• t1: The master asserts the request line
• t2: The slave asserts ack, indicating it is ready to transmit data
• t3: The master releases req; data received
• t4: The slave releases ack
[Timing diagram: Address (master asserts address, then next address), Data (slave asserts data), Read, Req, Ack, at times t0 to t5]

Asynchronous Handshake (Write)
• t0: The master obtains control and asserts the address, direction, and data; it then waits a specified amount of time for slaves to decode the target
• t1: The master asserts the request line
• t2: The slave asserts ack, indicating the data has been received
• t3: The master releases req
• t4: The slave releases ack
[Timing diagram: Address (master asserts address, then next address), Data (master asserts data), Read, Req, Ack, at times t0 to t5]

Multiple Potential Bus Masters: Need for Arbitration
• Bus arbitration decides which master gets to use the bus:
  - A bus master wanting to use the bus asserts a bus request
  - It cannot use the bus until its request is granted
  - It must signal the arbiter when it has finished using the bus
• Arbitration tries to balance:
  - Bus priority: the highest-priority device is serviced first
  - Fairness: the lowest-priority device should never be starved
• Arbitration schemes fall into four broad classes:
  - Daisy-chain arbitration
  - Centralized, parallel arbitration
  - Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus (NuBus)
  - Distributed arbitration by collision detection: like Ethernet

Daisy Chain Bus Arbitration
• Advantage: simple
• Disadvantages:
  - Cannot assure fairness: a low-priority device may be locked out indefinitely
  - The daisy-chain grant signal also limits the bus speed
[Figure: the bus arbiter passes Grant through Device 1 (highest priority), Device 2, …, Device N (lowest priority); Request and Release are wired-OR lines]

Centralized Parallel Arbitration
• Used in essentially all processor-memory buses and in high-speed I/O buses
[Figure: bus arbiter with separate Req and Grant lines to each of Device 1, Device 2, …, Device N]

Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:
  - Address and data can be transmitted in one bus cycle if separate address and data lines are available
  - Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
  - By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  - Example: the SPARCstation 20's memory bus is 128 bits wide
  - Cost: more bus lines
• Block transfers:
  - The bus transfers multiple words in back-to-back bus cycles
  - Only one address needs to be sent at the beginning
  - The bus is not released until the last word is transferred
  - Cost: (a) increased complexity, (b) longer response time for other requests

Increasing Transaction Rate on a Multimaster Bus
• Overlapped arbitration: perform arbitration for the next transaction during the current transaction
• Bus parking: a master can hold onto the bus and perform multiple transactions as long as no other master makes a request
• Overlapped address/data phases: requires one of the above techniques
• Split-phase (or packet-switched) bus:
  - Completely separate address and data phases
  - Arbitrate separately for each
  - The address phase yields a tag, which is matched with the data phase
• All of the above are used in most modern memory buses

Summary of Bus Options
Option          High performance                        Low cost
Bus width       Separate address and data lines         Multiplexed address and data lines
Data width      Wider (e.g., 32 bits)                   Narrower (e.g., 8 bits)
Transfer size   Multiple words: less bus overhead       Single word: simpler
Bus masters     Multiple (requires arbitration)         Single master (no arbitration)
Clocking        Synchronous                             Asynchronous
Protocol        Pipelined                               Serial

Bus Summary
• Buses are important for building large-scale systems:
  - Their speed is critically dependent on factors such as length, number of devices, etc.
  - They are critically limited by capacitance
• Important terminology:
  - Master: the device that can initiate new transactions
  - Slaves: devices that respond to the master
• Two types of bus timing:
  - Synchronous: the bus includes a clock
  - Asynchronous: no clock, just REQ/ACK strobing

What Is Needed to Make I/O Work?
• A way to connect many types of devices to the processor-memory system
• A way to control these devices, respond to them, and transfer data
• A way to present them to user programs so they are useful
[Figure: user programs use files and APIs provided by the operating system; the processor and memory connect over a PCI bus and a SCSI bus to devices with command and data registers]

Responsibilities of the Operating System
• The operating system acts as the interface between the I/O hardware and the program that requests I/O
• This is due to three characteristics of I/O systems:
  - The I/O system is shared by multiple programs using the processor
  - I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations; interrupts must be handled by the OS because they cause a transfer to supervisor mode
  - The low-level control of an I/O device is complex: it requires managing a set of concurrent events, and the requirements for correct device control are very detailed

Functions the OS Must Provide
• Provide protection to shared I/O resources:
  - Guarantee that a user's program can only access the portions of an I/O device to which the user has rights
• Provide abstraction for accessing devices:
  - Supply routines that handle low-level device operation
• Handle the interrupts generated by I/O devices
• Provide equitable access to the shared I/O resources:
  - All user programs must have equal access to the I/O resources
• Schedule accesses in order to enhance system throughput

OS: I/O Requirements
• The OS must be able to communicate with I/O devices and to prevent the user program from communicating with the I/O device directly
  - If user programs could perform I/O directly, there would be no protection of the shared I/O resources
• Three types of communication are required:
  - The OS must be able to give commands to the I/O devices
  - The I/O device must notify the OS when it has completed an operation or encountered an error
  - Data must be transferred between memory and the I/O device
• We study how these can be done next...

Instruction Set Architecture for I/O
• Two methods are used to address the device:
  - Special I/O instructions
  - Memory-mapped I/O
• Special I/O instructions specify both the device number and the command word:
  - Device number: the processor communicates this via a set of wires normally included as part of the I/O bus
  - Command word: this is usually sent on the bus data lines
• Memory-mapped I/O:
  - Portions of the address space are assigned to I/O devices
  - Reads and writes to those addresses are interpreted as commands to the I/O devices
  - The I/O address space is often protected by address translation

Memory-Mapped I/O
• I/O devices communicate with the processor through a set of registers in the I/O controller
• Addresses from the processor do not go to regular memory but correspond to registers in I/O devices
[Figure: address space from 0 to 0xFFFFFFFF; addresses from 0xFFFF0000 up map to the control register and data register]

Processor-I/O Speed Mismatch
• A 1 GHz microprocessor can execute 1 billion load or store instructions per second, a 4,000,000 KB/s data rate
  - I/O device data rates range from 0.01 KB/s to 30,000 KB/s
• Input: the device may not be ready to send data as fast as the processor loads it
  - Also, it might be waiting for a human to act
• Output: the device may not be ready to accept data as fast as the processor stores it
• What to do?
Processor Checks Status before Acting
• The path to a device generally has 2 registers:
  - Control register: says it's OK to read/write (I/O ready) [think of a flagman on a road]
  - Data register: contains the data
• The processor reads from the control register in a loop, waiting for the device to set the Ready bit (0 => 1) in the control register to say it's OK
• The processor then loads from (input) or writes to (output) the data register
  - A load from or store into the data register resets the Ready bit (1 => 0) of the control register

Polling: Programmed I/O
• Advantage:
  - Simple: the processor is totally in control and does all the work
• Disadvantage:
  - Polling overhead can consume a lot of CPU time
[Figure: flowchart. The CPU asks "Is the data ready?"; if no, it loops (busy-wait); if yes, it reads the data and stores it to memory, then asks "Done?" and repeats if not. A busy-wait loop is not an efficient way to use the CPU unless the device is very fast! But checks for I/O completion can be dispersed among computation-intensive code.]

Alternative to Polling?
• It is wasteful to have the processor spend most of its time "spin-waiting" for I/O to be ready
• We would like an unplanned procedure call that is invoked only when the I/O device is ready
• Solution: use the exception mechanism to help I/O. Interrupt the program when I/O is ready, and return when done with the data transfer

I/O Interrupt
• An I/O interrupt is just like an exception except:
  - An I/O interrupt is asynchronous
  - Further information needs to be conveyed
• An I/O interrupt is asynchronous with respect to instruction execution:
  - An I/O interrupt is not associated with any instruction
  - An I/O interrupt does not prevent any instruction from completing
  - The processor can pick a convenient point to take the interrupt
• An I/O interrupt is more complicated than an exception:
  - It needs to convey the identity of the device generating the interrupt
  - Interrupt requests can have different urgencies, so they need to be prioritized

Interrupt-Driven Data Transfer
• Advantage:
  - The user program is halted only during the actual transfer
• Disadvantage: special hardware is needed to:
  - Cause an interrupt (I/O device)
  - Detect an interrupt (processor)
  - Save the proper state to resume after the interrupt (processor)
[Figure: (1) the I/O controller interrupts the CPU while it runs the user program (add, sub, and, or, nop); (2) the PC is saved; (3) control jumps to the interrupt service address; (4) the interrupt service routine reads from the device, stores to memory, …, and returns with rti]

Questions Raised about Interrupts
• Which I/O device caused the exception?
  - The interrupt needs to convey the identity of the device generating it
• Can interrupts be avoided during the interrupt routine?
  - What if a more important interrupt occurs while servicing this interrupt?
  - Should the interrupt routine be allowed to be entered again?
• Who keeps track of the status of all the devices, handles errors, and knows where to put/supply the I/O data?

Improving Data Transfer Performance
• Thus far: the OS gives commands to I/O, and the I/O device notifies the OS when it has completed an operation or encountered an error
• What about data transfer to the I/O device?
  - The processor is busy doing loads/stores between memory and the I/O data register
• Ideal: specify the block of memory to be transferred, and be notified on completion?
  - Direct Memory Access (DMA): a simple computer transfers a block of data to/from memory and I/O, interrupting when done

What Is DMA (Direct Memory Access)?
• I/O devices often transfer large amounts of data to memory:
  - A disk must transfer a complete block (4K? 16K?)
  - Large packets from the network
  - Regions of the frame buffer
• DMA gives an external device the ability to write memory directly: much lower overhead than having the processor request one word at a time
  - The processor (or at least the memory system) acts like a slave

Delegating I/O from the CPU: DMA
• Direct Memory Access (DMA):
  - External to the CPU
  - Acts as a master on the bus
  - Transfers blocks of data to or from memory without CPU intervention
• The CPU sends a starting address, direction, and length count to the DMA controller (DMAC), then issues "start"
• The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory
[Figure: CPU, memory, DMAC, and I/O controller (IOC) with its device, all on one bus]

Delegating I/O from the CPU: IOP
[Figure: CPU, memory, and an I/O processor (IOP) on the main memory bus; devices D1 … Dn on the I/O bus behind the IOP]
• (1) The CPU issues an instruction to the IOP (OP, device, address of the target device's commands)
• (2) The IOP looks in memory for commands (OP: what to do; Addr: where to put the data; Cnt: how much; Other: special requests)
• (3) Device-to/from-memory transfers are controlled by the IOP directly; the IOP steals memory cycles
• (4) The IOP interrupts the CPU when done

DMA and the Memory System
• DMA goes to memory without passing through address translation or the cache system
• Issue: should DMA use virtual or physical addresses?
  - Physical address: what if the transfer crosses a page boundary?
  - Virtual address: needs address translation (mapping provided by the OS)
  - Break a transfer into a series of transfers, each within a page boundary, then chain the transfers
• Issue: cache coherence, or the stale-data problem
  - What if I/O devices write data that is currently in the cache?
  - Solutions:
    Route I/O through the cache (expensive)
    The OS flushes the cache on I/O operations
    Have hardware invalidate cache lines (remember "coherence" cache misses?)

Summary
• I/O performance is limited by the weakest link in the chain between the OS and the device
• Disk I/O benchmarks: I/O rate vs. data rate vs. latency
• Three components of disk access time: seek time, rotational latency, transfer time
• I/O device notifying the operating system:
  - Polling: it can waste a lot of processor time
  - I/O interrupt: similar to an exception except asynchronous
• Delegating I/O responsibility from the CPU: DMA or IOP
• I/O control leads to operating systems
• Wide range of devices: multimedia and high-speed networks pose challenges
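The DMA and Memory System slide suggests breaking a transfer into a series of transfers, each within a page boundary, and chaining them. A minimal Python sketch of that splitting step follows, assuming 4 KB pages and a page table represented as a dict from virtual page number to physical frame number; both the representation and the name `split_dma` are hypothetical, and a real DMA controller would take the chain in whatever descriptor format its hardware defines.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def split_dma(vaddr, length, page_table, page_size=PAGE_SIZE):
    """Split one virtually addressed transfer into (physical_address, byte_count)
    pieces that never cross a page boundary; the list is the chain of transfers
    handed to the DMA engine."""
    chain = []
    while length > 0:
        vpn, offset = divmod(vaddr, page_size)
        count = min(length, page_size - offset)          # stop at the end of this page
        paddr = page_table[vpn] * page_size + offset     # translation supplied by the OS
        chain.append((paddr, count))
        vaddr += count
        length -= count
    return chain


# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 9}
chain = split_dma(4000, 5000, page_table)
print(chain)   # [(32672, 96), (12288, 4096), (36864, 808)]
```

Only the first piece is short, because the transfer starts 96 bytes before the end of virtual page 0; the three pieces land in non-contiguous physical frames, which is why the transfer cannot be issued as a single physical-address DMA.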
