CS4100: Computer Architecture
I/O Systems
Department of Computer Science, National Tsing Hua University
First Semester, Academic Year 95 (Fall 2006)
Adapted from Prof. D. Patterson’s class notes
Copyright 1998, 2000 UCB
5 Components of Any Computer
Computer
• Processor (active)
  • Control ("brain")
  • Datapath ("brawn")
• Memory (passive): where programs and data live when running
• Devices
  • Input: keyboard, mouse
  • Output: display, printer
  • Disk: where programs and data live when not running
Input/Output-1
Computer Architecture
“What’s This Stuff Good For?”
Remote Diagnosis: "NeoRest ExII," a high-tech toilet, features microprocessor-controlled seat warmers, automatic lid openers, air deodorizers, water sprays, and blow-dryers that do away with the need for toilet tissue... Toto's engineers are now working on a model that analyzes urine to determine blood-sugar levels in diabetics and then automatically sends a daily report, by modem, to the user's physician.
One Digital Day, 1998
www.intel.com/onedigitalday
Motivation for Input/Output
• I/O is how humans interact with computers
• I/O gives computers long-term memory
• I/O lets computers do amazing things:
  • Read the pressure of a synthetic hand and control the synthetic arm and hand of a fireman
  • Control propellers and fins, and communicate, in BOB (Breathable Observable Bubble)
• A computer without I/O is like a car without wheels: great technology, but it won't get you anywhere
I/O Design Issues
• Many factors besides performance (expandability, resilience)
• I/O performance is complex: both latency and throughput matter
• I/O performance depends on many aspects of the system:
  • Access latency, throughput, the connection between devices and the system, the memory hierarchy, and the OS
[Figure: the processor (with its cache) connects over the memory-I/O bus to main memory and to I/O controllers for two disks, graphics output, and the network; the controllers signal the processor with interrupts]
Outline
• I/O performance measures
• Types and characteristics of I/O devices
• Buses
• Interfacing I/O devices
• Designing an I/O system
I/O System Performance
• I/O system performance depends on many aspects of the system (it is limited by the weakest link in the chain):
  • The CPU
  • The memory system:
    • Internal and external caches
    • Main memory
  • The underlying interconnection (buses)
  • The I/O controller
  • The I/O device
  • The speed of the I/O software (operating system)
  • The efficiency of the software's use of the I/O devices
• Two common performance metrics:
  • Throughput: I/O bandwidth
  • Response time: latency
Simple Producer-Server Model
[Figure: Producer -> Queue -> Server]
• Throughput:
  • Number of tasks completed by the server per unit time
  • To get the highest possible throughput:
    • The server should never be idle
    • The queue should never be empty
• Response time:
  • Begins when a task is placed in the queue and ends when the server completes it
  • To minimize response time:
    • The queue should be empty and the server should be idle
• These are conflicting goals
Throughput vs. Response Time
[Figure: response time (ms, 0 to 300) versus percentage of maximum throughput (20% to 100%)]
• You pay a steep price in response time to get the last few percent of maximum throughput
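The shape of this curve follows from basic queueing theory. A minimal sketch, assuming an M/M/1 queue (Poisson arrivals, exponentially distributed service times, one server), a model the slide itself does not name: under that assumption the mean response time is T = S / (1 - U), where S is the mean service time and U is the utilization, i.e., throughput as a fraction of the maximum. The 10 ms service time is a hypothetical value.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time (waiting in queue + service) for an M/M/1 queue.

    Assumes Poisson arrivals and exponentially distributed service times.
    utilization = throughput / maximum throughput, and must be in [0, 1).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 10.0  # hypothetical mean service time per task
    for pct in (20, 40, 60, 80, 90, 95, 99):
        t = mm1_response_time(service_ms, pct / 100)
        print(f"{pct:3d}% of max throughput -> {t:7.1f} ms")
```

Under this model, moving from 80% to 99% of maximum throughput multiplies the mean response time by 20 (50 ms to 1,000 ms), which is the steep price the figure describes.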
Throughput Enhancement
[Figure: one producer feeding two queues, each with its own server]
• In general, throughput can be improved by:
  • Throwing more hardware at the problem
• Response time is much harder to reduce:
  • Ultimately it is limited by the speed of light (we are far from that limit)
I/O Benchmarks for Performance Measurement (1/2)
• Supercomputer applications:
  • Large-scale scientific problems => large files
  • One large read and many small writes to snapshot the computation
  • Concerned with data rate: MB/second between memory and disk
• Transaction processing:
  • Examples: airline reservation systems and bank ATMs
  • Small changes to a large shared database
  • Concerned with I/O rate: number of disk accesses per second
  • Typical benchmark: TPC-C
    • Light/medium-weight queries on order entry
I/O Benchmarks for Performance Measurement (2/2)
• File system:
  • Measurements of UNIX file systems in an engineering environment:
    • 80% of accesses are to files smaller than 10 KB
    • 90% of all file accesses are to data with sequential addresses on the disk
    • 67% are reads, 27% writes, 6% read-modify-write
  • A synthetic benchmark: 70 files of 200 KB in 5 phases:
    • MakeDir, Copy, ScanDir, ReadAll, Make
Outline
• I/O performance measures
• Types and characteristics of I/O devices
• Buses
• Interfacing I/O devices
• Designing an I/O system
I/O Device Examples and Speeds
• I/O speed: bytes transferred per second (from mouse to display, a range of about 1 to a million)

Device            Behavior   Partner   Data Rate (KB/s)
Keyboard          Input      Human                 0.01
Mouse             Input      Human                 0.02
Voice output      Output     Human                 5.00
Floppy disk       Storage    Machine              50.00
Laser printer     Output     Human               100.00
Magnetic disk     Storage    Machine          10,000.00
Network-LAN       I or O     Machine          10,000.00
Graphics display  Output     Human            30,000.00

• We will concentrate on disks in the following discussion
Disk History (1/2)
[Figure: disk units labeled with data density (Mbit/sq. in.) and capacity of the unit shown (MBytes)]
• 1973: 1.7 Mbit/sq. in., 140 MBytes
• 1979: 7.7 Mbit/sq. in., 2,300 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
Disk History (2/2)
• 1989: 63 Mbit/sq. in., 60,000 MBytes
• 1997: 3,090 Mbit/sq. in., 8,100 MBytes
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
1-inch Disk Drive!
• 2000 IBM MicroDrive:
  • 1.7" x 1.4" x 0.2"
  • 1 GB, 3600 RPM, 5 MB/s, 15 ms seek
  • Digital camera, PalmPC?
• 2006 MicroDrive?
  • 9 GB, 50 MB/s!
  • Assuming it finds a niche in a successful product
  • Assuming past trends continue
Storage Technology Drivers
• Driven by the prevailing computing paradigm:
  • 1950s: migration from batch to on-line processing
  • 1990s: migration to ubiquitous computing
    • Computers in phones, books, cars, video cameras, …
    • Nationwide fiber-optic network with wireless tails
• Effects on the storage industry:
  • Embedded storage: smaller, cheaper, more reliable, lower power
  • Data utilities: high-capacity, hierarchically managed storage
  • Network-attached storage (NAS)
Historical Perspective
• Form factor and capacity drive the market more than performance does
• 1970s: mainframes => 14-inch diameter disks
• 1980s: minicomputers, servers => 8" and 5.25" diameter disks
• Late 1980s/early 1990s:
  • Pizza-box PCs => 3.5-inch diameter disks
  • Laptops, notebooks => 2.5-inch disks
  • Palmtops didn't use disks, so 1.8-inch diameter disks didn't make it
Technology Trends
[Figure: disk capacity trend, annotated "Disk capacity now doubles every 18 months; before 1990, every 36 months" and "The I/O GAP"]
• Today: processing power doubles every 18 months
• Today: memory size doubles every 18 months (4X/3 yr)
• Today: disk capacity doubles every 18 months
• Disk positioning rate (seek + rotate) doubles every ten years!
Disk Device Technology
[Figure: platter with inner and outer tracks and sectors; arm, head, and actuator]
• Several platters, with information recorded magnetically on both surfaces (usually)
• Bits are recorded in tracks, which are in turn divided into sectors (e.g., 512 bytes); each sector carries an error correction code to find and correct errors
• The actuator moves the head (at the end of the arm) over the track ("seek"), waits for the sector to rotate under the head, then reads or writes
• "Cylinder": all tracks under the heads
Photo of Disk Head, Arm, Actuator
[Photo: spindle, arm, head, actuator, and platters (12)]
Magnetic Disk Characteristics
• Cylinder: all the tracks under the heads at a given point on all surfaces
• Read/write is a 3-stage process:
  • Seek time: position the arm over the proper track (8 to 20 ms avg.)
  • Rotational latency: wait for the desired sector to rotate under the head (0.5 rotation / RPM)
  • Transfer time: transfer a block of bits (sector) under the read-write head (2 to 15 MB/sec)
• Plus disk controller time
• Average seek time is in the range of 8 ms to 12 ms
  • (Sum of time for all possible seeks) / (total # of possible seeks)
  • Due to locality of disk reference, the actual average seek time may be only 25% to 33% of the advertised number
[Figure: head, cylinder, platter, track, and sector]
Typical Numbers of a Magnetic Disk
• Diameter: 1.8" to 8"
• Platters: 1 to 15
• Tracks: 1,000 to 5,000 tracks per surface
• Sectors: 64 to 200 sectors per track (512 bytes/sector)
  • A sector is the smallest unit that can be read or written
  • Layout along a track: (sector #, gap, sector information + CRC, gap, …)
• Traditionally, all tracks have the same number of sectors
• Constant bit density: more sectors on the outer tracks
  • Recently relaxed: constant bit size, speed varies with track location
[Figure: platter, track, and sector]
Typical Numbers of a Magnetic Disk
• Rotational latency:
  • Most disks rotate at 3,600 to 7,200 RPM
  • Approximately 16 ms to 8 ms per revolution, respectively
  • The average latency to the desired information is halfway around the disk: 8 ms at 3,600 RPM, 4 ms at 7,200 RPM
• Transfer time is a function of:
  • Transfer size (usually a sector): 1 KB/sector
  • Rotation speed: 3,600 RPM to 10,000 RPM
  • Recording density: bits per inch on a track
  • Diameter: typical diameters range from 1.8 to 5.25 in
• Typical values: 2 to 40 MB per second
[Figure: track, sector, cylinder, head, and platter]
An Example: Barracuda 180
Source: www.seagate.com
• 181.6 GB, 3.5-inch disk
• 7200 RPM; SCSI
• 4.16 ms = 1/2 rotation
• 12 platters, 24 surfaces
• 31.2 Gbit/sq. in. areal density
• 10 watts (idle)
• 0.1 ms controller time
• 8.0 ms avg. seek
• 35 to 64 MB/s (internal)
• $7.50/GB
• (Lower-capacity ATA/IDE disks: ~$2/GB)
Disk Device Performance
Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Transfer Time
• Average distance of a sector from the head? 1/2 the time of a rotation
  • 7,200 revolutions per minute => 120 rev/sec
  • 1 revolution = 1/120 sec => 8.33 milliseconds
  • 1/2 rotation (revolution) => 4.17 ms
• Average number of tracks the arm moves?
  • Sum of all possible seek distances / # of possible seeks
  • Assumes the seek distance is random
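The definition above (sum of all possible seek distances divided by the number of possible seeks) can be evaluated directly. This sketch assumes the starting and target tracks are independent and uniformly distributed over N tracks, which is one reading of "random"; under that assumption the average works out to (N^2 - 1) / (3N), roughly one third of the tracks. The track count of 1,000 is a hypothetical value.

```python
from fractions import Fraction


def average_seek_distance(num_tracks: int) -> Fraction:
    """Average |start - target| over every (start, target) track pair.

    Brute-force version of "sum all possible seek distances / # possible",
    assuming start and target are uniform and independent (zero-length
    "seeks" to the same track are included).
    """
    total = sum(abs(start - target)
                for start in range(num_tracks)
                for target in range(num_tracks))
    return Fraction(total, num_tracks * num_tracks)


def closed_form(num_tracks: int) -> Fraction:
    """Closed form of the same average: (N^2 - 1) / (3N)."""
    return Fraction(num_tracks * num_tracks - 1, 3 * num_tracks)


if __name__ == "__main__":
    n = 1000  # hypothetical number of tracks per surface
    avg = average_seek_distance(n)
    print(f"average seek distance: {float(avg):.1f} tracks "
          f"(~{float(avg) / n:.0%} of {n})")
```

Because seek time is not linear in distance (the arm accelerates and decelerates), this distance average is only a starting point for estimating average seek time.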
Example
• 512-byte sector, rotating at 5,400 RPM, advertised average seek time of 12 ms, transfer rate of 4 MB/sec, controller overhead of 1 ms, queue idle so no queuing delay
Disk Access Time = Seek Time + Rotational Latency + Transfer Time + Controller Time + Queuing Delay
Disk Access Time = 12 ms + 0.5 rotation / 5,400 RPM + 0.5 KB / 4 MB/s + 1 ms + 0
Disk Access Time = 12 ms + 0.5 / 90 RPS + 0.125 / 1024 s + 1 ms + 0
Disk Access Time = 12 ms + 5.6 ms + 0.1 ms + 1 ms + 0 ms
Disk Access Time = 18.7 ms
• If real seeks are 1/3 of the advertised seek time (4 ms), the access time drops to 10.7 ms, with rotational delay accounting for about 50% of the time!
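The same calculation as a small function, so the parameters can be varied. It follows the slide's model term by term; the unit convention (1 MB = 1024 KB) matches the slide's 0.125 / 1024 s step, and the function name is illustrative only.

```python
def disk_access_time_ms(seek_ms: float, rpm: float, transfer_kb: float,
                        rate_mb_per_s: float, controller_ms: float,
                        queue_ms: float = 0.0) -> float:
    """Average disk access time in milliseconds.

    seek + rotational latency (half a revolution) + transfer
    + controller overhead + queuing delay. 1 MB is taken as 1024 KB.
    """
    rotation_ms = 0.5 / (rpm / 60.0) * 1000.0                # half a revolution
    transfer_ms = transfer_kb / (rate_mb_per_s * 1024.0) * 1000.0
    return seek_ms + rotation_ms + transfer_ms + controller_ms + queue_ms


if __name__ == "__main__":
    advertised = disk_access_time_ms(12.0, 5400, 0.5, 4.0, 1.0)
    realistic = disk_access_time_ms(12.0 / 3, 5400, 0.5, 4.0, 1.0)
    print(f"advertised seek: {advertised:.1f} ms")           # 18.7 ms
    print(f"1/3 of advertised seek: {realistic:.1f} ms")     # 10.7 ms
```

Plugging in the Barracuda 180 figures (8.0 ms seek, 7200 RPM, 0.1 ms controller time) is a quick way to see how much the faster spindle shrinks the rotational term.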
Areal Density
• Bits recorded along a track
  • Metric is Bits Per Inch (BPI)
• Number of tracks per surface
  • Metric is Tracks Per Inch (TPI)
• We care about bit density per unit area
  • Metric is Bits Per Square Inch
  • Called Areal Density
  • Areal Density = BPI x TPI
Data Rate: Inner vs. Outer Tracks
• To keep things simple, disks originally kept the same number of sectors per track
  • Since the outer track is longer, it has fewer bits per inch
• Competition decided to keep BPI the same for all tracks ("constant bit density")
  • More capacity per disk
  • More sectors per track towards the edge
  • Since the disk spins at constant speed, outer tracks have a faster data rate
• Bandwidth of the outer track is 1.7X that of the inner track!
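Under constant bit density a track's capacity grows with its circumference, so at a fixed rotation speed its data rate grows in proportion to its radius. The sketch below uses hypothetical radii and BPI (not from the slides) and ignores gaps, sector headers, and ECC; with an outer radius 1.7 times the inner radius it reproduces the 1.7X bandwidth ratio.

```python
import math


def track_data_rate_mb_s(radius_in: float, bpi: float, rpm: float) -> float:
    """Raw data rate (MB/s, 1 MB = 10**6 bytes) of one track at constant BPI.

    Ignores inter-sector gaps, sector headers, and ECC bytes.
    """
    bits_per_track = 2 * math.pi * radius_in * bpi   # circumference x bit density
    revs_per_s = rpm / 60.0
    return bits_per_track / 8 * revs_per_s / 1e6


if __name__ == "__main__":
    bpi, rpm = 400_000, 7200        # hypothetical recording density and speed
    inner_r, outer_r = 0.8, 1.36    # hypothetical radii in inches (ratio 1.7)
    inner = track_data_rate_mb_s(inner_r, bpi, rpm)
    outer = track_data_rate_mb_s(outer_r, bpi, rpm)
    print(f"inner {inner:.1f} MB/s, outer {outer:.1f} MB/s, "
          f"ratio {outer / inner:.2f}")
```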
Disk Performance Model/Trends
• Capacity: +100%/year (2X / 1.0 yrs)
• Transfer rate (BW): +40%/year (2X / 2.0 yrs)
• Rotation + seek time: -8%/year (1/2 in 10 yrs)
• MB/$: >100%/year (2X / <1.5 yrs)
  • Fewer chips + areal density
• Areal density: slope changed from 30%/yr to 60%/yr around 1991
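The "2X / N yrs" figures follow from compound growth: a quantity growing at rate r per year changes by 2X after ln 2 / ln(1 + r) years. A small sketch to check the pairs on the slide (the function name is illustrative):

```python
import math


def doubling_time_years(annual_rate: float) -> float:
    """Years for a quantity to change by a factor of 2 at a compound annual rate.

    annual_rate = 0.40 means +40%/year; a negative rate (e.g. -0.08)
    returns the time to halve instead.
    """
    factor = 1.0 + annual_rate
    if factor <= 0 or factor == 1.0:
        raise ValueError("rate must be greater than -100% and nonzero")
    target = 2.0 if annual_rate > 0 else 0.5
    return math.log(target) / math.log(factor)


if __name__ == "__main__":
    for name, rate in [("capacity", 1.00), ("transfer rate", 0.40),
                       ("rotation + seek", -0.08)]:
        years = doubling_time_years(rate)
        print(f"{name:16s} {rate:+.0%}/yr -> 2X change in {years:.1f} yrs")
```

The results are 1.0, about 2.1, and about 8.3 years, so "1/2 in 10 yrs" for -8%/year is a loose rounding.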
Reliability and Availability
• Two terms that are often confused:
  • Reliability: Is anything broken?
  • Availability: Is the system still available to the user?
• Availability can be improved by adding hardware:
  • Example: adding ECC to memory
• Reliability can only be improved by:
  • Better environmental conditions
  • Building more reliable components
  • Building with fewer components
• Improving availability may come at the cost of lower reliability
Disk Arrays
• Arrays of small and inexpensive disks
• Increase potential throughput with many disk drives:
  • Data is spread over multiple disks
  • Multiple accesses are made to several disks
• Reliability is lower than that of a single disk
• But availability is improved with redundant disks (RAID):
  • Lost information is reconstructed from redundant information
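One common way to store the redundant information is a parity block, as in RAID levels 3 to 5 (the slide does not say which scheme is meant): the parity block is the bitwise XOR of the data blocks, so any single lost block equals the XOR of all the surviving blocks. A minimal sketch with made-up 4-byte blocks:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bitwise XOR of equal-length blocks, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes]) -> bytes:
    """Rebuild the single missing block (data or parity) from all the others."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd", b"\x01\x02\x03\x04"]
    parity = xor_blocks(data)                  # stored on the extra disk
    # Suppose the disk holding data[1] fails.
    rebuilt = reconstruct([data[0], data[2], parity])
    print(rebuilt == data[1])                  # True
```

One extra disk per group is enough to survive any single disk failure, which is the "small number of extra disks" trade-off on the summary slide.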
Disk Summary
• Magnetic disks continue their rapid advance: 60%/yr capacity, 40%/yr bandwidth, slow improvements in seek and rotation, MB/$ improving 100%/yr?
  • Designs to fit high-volume form factors
• Disk performance:
  • Disk Latency = Queuing Time + Controller Time + Seek Time + Rotation Time + Transfer Time
• RAID:
  • Higher performance with more disk arms per $
  • Adds an availability option for a small number of extra disks
Outline
• I/O performance measures
• Types and characteristics of I/O devices
• Buses
• Interfacing I/O devices
• Designing an I/O system
What Is a Bus?
• A bus is:
  • A shared communication link
  • A single set of wires used to connect multiple subsystems
[Figure: processor (control, datapath), memory, input, and output connected by a shared bus]
• A bus is also a fundamental tool for composing large, complex systems:
  • A systematic means of abstraction
Ex.: Pentium System Organization
[Figure: processor/memory bus, PCI bus, and I/O buses]
Advantages of Buses
• Versatility:
  • New devices can be added easily
  • Peripherals can be moved between computer systems that use the same bus standard
• Low cost:
  • A single set of wires is shared in multiple ways
[Figure: processor, memory, and three I/O devices on one shared bus]
Disadvantage of Buses
• It creates a communication bottleneck:
  • Bus bandwidth can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
  • The length of the bus
  • The number of devices on the bus
  • The need to support a range of devices with:
    • Widely varying latencies
    • Widely varying data transfer rates
[Figure: processor, memory, and three I/O devices on one shared bus]
The General Organization of a Bus
• Control lines:
  • Signal requests and acknowledgments
  • Indicate what type of information is on the data lines
• Address/data lines carry information between the source and the destination:
  • Data and addresses may be shared in a multiplexed way
  • Complex commands
[Figure: a bus made of control lines, address lines, and data lines]
Terminology
• A bus transaction includes two parts:
  • Issuing the command (and address): request
  • Transferring the data: action, response
  • These are often preceded by arbitration
• The master is the one who starts the bus transaction by:
  • Issuing the command (and address)
• The slave is the one who responds to the address by:
  • Sending data to the master if the master asks for data
  • Receiving data from the master if the master wants to send data
[Figure: bus master and bus slave; the master issues the command, and data can go either way]
Buses According to Functionality
• Processor-memory bus (design specific)
  • Short and high speed
  • Needs to match the memory system to maximize memory-to-processor bandwidth, e.g., for cache block transfers
  • Connects directly to the processor
  • Optimized for cache block transfers
• I/O bus (industry standard, e.g., SCSI)
  • Usually lengthy and slower
  • Needs to match a wide range of I/O devices
  • Connects to the processor-memory bus or backplane bus
• Backplane bus (standard or proprietary, e.g., PCI)
  • Backplane: an interconnection structure within the chassis that allows processors, memory, and I/O devices to coexist
  • Cost advantage: one bus for all components
A Computer System with One Bus: Backplane Bus
• A single bus (the backplane bus) is used for:
  • Processor-to-memory communication
  • Communication between I/O devices and memory
• Advantages: simple and low cost
• Disadvantages: slow, and the bus can become a major bottleneck
• Example: IBM PC-AT
[Figure: processor, memory, and I/O devices all on the backplane bus]
A Two-Bus System
• I/O buses tap into the processor-memory bus via bus adapters:
  • Processor-memory bus: for processor-memory traffic
  • I/O buses: provide expansion slots for I/O devices
• Apple Macintosh-II:
  • NuBus: processor, memory, and a few selected I/O devices
  • SCSI bus: the rest of the I/O devices
[Figure: processor and memory on the processor-memory bus; three bus adapters each connect an I/O bus]
A Three-Bus System
• A small number of backplane buses tap into the processor-memory bus:
  • Processor-memory bus for processor-memory traffic
  • I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is reduced
[Figure: processor and memory on the processor-memory bus; a bus adapter connects it to a backplane bus, and further bus adapters connect two I/O buses]
Main Components of Intel Chipset
• Northbridge:
  • Handles memory
  • Graphics
• Southbridge: I/O
  • PCI bus
  • Disk controllers
  • USB controllers
  • Audio
  • Serial I/O
  • Interrupt controller
  • Timers
Buses According to Clocking
• Synchronous bus:
  • Includes a clock in the control lines
  • A fixed protocol for communication relative to the clock
  • Advantage: very little logic, and it can run very fast
  • Disadvantages:
    • Every device on the bus must run at the same clock rate
    • To avoid clock skew, the bus cannot be long if it is fast
• Asynchronous bus:
  • It is not clocked
  • It can accommodate a wide range of devices
  • It can be lengthened without worrying about clock skew
  • It requires a handshaking protocol
Simple Synchronous Protocol
• All devices operate synchronously, and all can source/sink data at the same rate
• Even memory buses are more complex than this:
  • The memory (slave) may take time to respond
  • It may need to control the data rate
[Timing diagram: Bus Req, Bus Grant, R/W, Address (Cmd+Addr), and Data (Data1, Data2) lines]
Simple Synchronous Protocol
• The slave indicates when it is prepared for the data transfer
• The actual transfer goes at the bus rate
[Timing diagram: Bus Req, Bus Grant, R/W, Address (Cmd+Addr), Wait, and Data lines; annotated "First write failed", with Data1 sent twice before Data2]
Asynchronous Handshake (Read)
t0: Master obtains control and asserts the address and direction (read); waits a specified amount of time for slaves to decode the target
t1: Master asserts the request line
t2: Slave asserts ack, indicating it is ready to transmit data
t3: Master releases req; data received
t4: Slave releases ack
[Timing diagram: Address, Data, Read, Req, and Ack lines over t0 to t5; the master asserts the address, the slave asserts the data, then the next address follows]
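The read handshake can be modeled as two cooperating state machines that react only to each other's signals, which is what lets an asynchronous bus work without a shared clock. The sketch below is a simplified software model, not a hardware description; the Bus fields, the dict standing in for the slave's storage, and the step limit (a stand-in for a bus timeout) are all hypothetical. It records the events in the order t0 to t4.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bus:
    """Shared wires of a hypothetical asynchronous bus (note: no clock)."""
    address: Optional[int] = None
    data: Optional[int] = None
    read: bool = False
    req: bool = False
    ack: bool = False
    log: list = field(default_factory=list)    # event trace, for illustration


def master_read(bus: Bus, addr: int):
    """Master side of the four-phase read handshake (a generator)."""
    bus.address, bus.read = addr, True
    bus.log.append("t0: master drives address and read direction")
    yield                                      # give slaves time to decode
    bus.req = True
    bus.log.append("t1: master asserts req")
    while not bus.ack:                         # wait for the slave's ack
        yield
    value = bus.data                           # data is valid while ack is high
    bus.req = False
    bus.log.append(f"t3: master latches {value:#x}, releases req")
    while bus.ack:                             # wait for the slave to drop ack
        yield
    bus.address = None                         # t5: bus free for the next address
    return value


def slave(bus: Bus, memory: dict):
    """Slave side: reacts only to req, never to a clock."""
    while True:
        if bus.req and not bus.ack and bus.read and bus.address in memory:
            bus.data, bus.ack = memory[bus.address], True
            bus.log.append("t2: slave drives data, asserts ack")
        elif not bus.req and bus.ack:
            bus.data, bus.ack = None, False
            bus.log.append("t4: slave releases ack")
        yield


def run_read(addr: int, memory: dict, max_steps: int = 100):
    """Interleave the two state machines until the master finishes."""
    bus = Bus()
    master, responder = master_read(bus, addr), slave(bus, memory)
    for _ in range(max_steps):
        try:
            next(master)
        except StopIteration as done:
            return done.value, bus.log
        next(responder)
    raise TimeoutError("no slave acknowledged the request")


if __name__ == "__main__":
    value, log = run_read(0x40, {0x40: 0xBEEF})
    print("\n".join(log))
```

Neither side ever counts time: each step waits on a level change from the other, so a slow slave simply holds ack low longer. The write handshake has the same shape, with the master driving the data at t0.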
Asynchronous Handshake (Write)
t0: Master obtains control and asserts the address, direction (write), and data; waits a specified amount of time for slaves to decode the target
t1: Master asserts the request line
t2: Slave asserts ack, indicating the data has been received
t3: Master releases req
t4: Slave releases ack
[Timing diagram: Address, Data, Read (direction), Req, and Ack lines over t0 to t5; the master asserts the address and the data, then the next address follows]
Multiple Potential Bus Masters: Need Arbitration
• Bus arbitration: decide which master gets to use the bus
  • A bus master wanting to use the bus asserts a bus request
  • It cannot use the bus until its request is granted
  • It must signal the arbiter when it has finished using the bus
• Try to balance:
  • Bus priority: the highest-priority device is serviced first
  • Fairness: even the lowest-priority device should never be starved
• Arbitration schemes can be divided into four broad classes:
  • Daisy chain arbitration
  • Centralized, parallel arbitration
  • Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus (NuBus)
  • Distributed arbitration by collision detection: like Ethernet
Daisy Chain Bus Arbitration
• Advantage: simple
• Disadvantages:
  • Cannot assure fairness: a low-priority device may be locked out indefinitely
  • The daisy-chain grant signal also limits the bus speed
[Figure: the bus arbiter's Grant line passes from Device 1 (highest priority) through Device 2 to Device N (lowest priority); the devices share wired-OR Request and Release lines back to the arbiter]
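A few lines make the fairness problem concrete. This is a simplified model (one grant per bus cycle, a hypothetical request pattern): the grant ripples from the highest-priority device downward and stops at the first requester, so a device further down the chain never wins while a higher-priority device keeps requesting.

```python
from typing import Optional, Sequence


def daisy_chain_grant(requests: Sequence[bool]) -> Optional[int]:
    """Index of the device that ends up holding the grant, or None.

    Device 0 is nearest the arbiter (highest priority). The grant ripples
    down the chain: a requesting device keeps it, a device that is not
    requesting passes it to its neighbor.
    """
    for position, requesting in enumerate(requests):
        if requesting:
            return position          # this device absorbs the grant
    return None                      # no requests: the grant goes unused


if __name__ == "__main__":
    # Hypothetical load: devices 0 and 2 want the bus every cycle.
    wins = [0, 0, 0]
    for _ in range(1000):
        winner = daisy_chain_grant([True, False, True])
        if winner is not None:
            wins[winner] += 1
    print(wins)   # [1000, 0, 0]: device 2 is starved
```

The loop over positions also hints at the speed problem: in hardware the grant must propagate through every device ahead of the winner before the bus can be used.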
Centralized Parallel Arbitration
• Used in essentially all processor-memory buses and in high-speed I/O buses
[Figure: Device 1, Device 2, ..., Device N each have their own Req and Grant lines to a central bus arbiter]
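With a private request and grant line per device, the central arbiter sees every request at once and can apply any policy. The sketch below assumes a rotating (round-robin) priority, one policy that meets the fairness goal from the arbitration slide; the slides do not say which policy real buses use, and the class name is hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Central arbiter with a private request/grant pair per device.

    Priority rotates: the device just granted becomes the lowest priority,
    so a requesting device waits at most N - 1 grants.
    """

    def __init__(self, num_devices: int) -> None:
        self.num_devices = num_devices
        self.top = 0                   # device that currently has top priority

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        for offset in range(self.num_devices):
            device = (self.top + offset) % self.num_devices
            if requests[device]:
                self.top = (device + 1) % self.num_devices
                return device
        return None                    # no device is requesting


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    print([arbiter.grant([True, False, True]) for _ in range(6)])   # [0, 2, 0, 2, 0, 2]
```

Under the same constant-request pattern a fixed-priority arbiter would grant device 0 every time; rotating the priority gives device 2 every other grant.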
Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:
  • Address and data can be transmitted in one bus cycle if separate address and data lines are available
  • Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
  • By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  • Ex: the SPARCstation 20's memory bus is 128 bits wide
  • Cost: more bus lines
• Block transfers:
  • The bus transfers multiple words in back-to-back bus cycles
  • Only one address needs to be sent at the beginning
  • The bus is not released until the last word is transferred
  • Cost: (a) increased complexity, (b) increased response time for other requests
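The benefit of block transfers and wider data buses comes from amortizing the address and access overhead over several words. The cycle costs below are hypothetical (1 cycle to send an address, a fixed 4-cycle access latency, 1 cycle per bus-width chunk of data), chosen only to illustrate the trend; real buses differ.

```python
import math


def bus_cycles(words: int, block_size: int, bus_width_words: int,
               address_cycles: int = 1, access_cycles: int = 4) -> int:
    """Bus cycles to move `words` words in blocks of `block_size` words.

    Each block costs one address phase, one access latency, and one cycle
    per bus-width chunk of data; a partial final block is charged as a full
    block. All cycle costs are hypothetical parameters.
    """
    blocks = math.ceil(words / block_size)
    data_cycles_per_block = math.ceil(block_size / bus_width_words)
    return blocks * (address_cycles + access_cycles + data_cycles_per_block)


if __name__ == "__main__":
    total_words = 256
    for block, width in [(1, 1), (4, 1), (16, 1), (16, 4)]:
        cycles = bus_cycles(total_words, block, width)
        print(f"block={block:2d} words, width={width} word(s): "
              f"{cycles:4d} cycles, {total_words / cycles:.2f} words/cycle")
```

The flip side is the cost noted on the slide: while a 16-word block holds the bus, other masters wait longer for it.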
Increasing Transaction Rate on Multimaster Bus
• Overlapped arbitration
  • Perform arbitration for the next transaction during the current transaction
• Bus parking
  • A master can hold onto the bus and perform multiple transactions as long as no other master makes a request
• Overlapped address/data phases
  • Requires one of the above techniques
• Split-phase (or packet-switched) bus
  • Completely separate address and data phases
  • Arbitrate separately for each
  • The address phase yields a tag, which is matched with the data phase
• All of the above are used in most modern memory buses
Summary of Bus Options

Option          High performance                          Low cost
Bus width       Separate address & data lines             Multiplex address & data lines
Data width      Wider (e.g., 32 bits)                     Narrower (e.g., 8 bits)
Transfer size   Multiple words: less bus overhead         Single word: simpler
Bus masters     Multiple (requires arbitration)           Single master (no arbitration)
Clocking        Synchronous                               Asynchronous
Protocol        Pipelined                                 Serial
Bus Summary
• Buses are important for building large-scale systems
  • Speed is critically dependent on factors such as length, number of devices, etc.
  • Critically limited by capacitance
• Important terminology:
  • Master: the device that can initiate new transactions
  • Slaves: devices that respond to the master
• Two types of bus timing:
  • Synchronous: the bus includes a clock
  • Asynchronous: no clock, just REQ/ACK strobing
Outline
• I/O performance measures
• Types and characteristics of I/O devices
• Buses
• Interfacing I/O devices
• Designing an I/O system
What Is Needed to Make I/O Work?
• A way to connect many types of devices to the Proc-Mem
• A way to control these devices, respond to them, and transfer data
• A way to present them to user programs so they are useful
[Figure: the operating system presents files and APIs to user programs; the processor and memory connect via a PCI bus and a SCSI bus to device controllers with command and data registers]
Responsibilities of the Operating System
• The operating system acts as the interface between:
  • The I/O hardware and the program that requests I/O
• This role follows from 3 characteristics of I/O systems:
  • The I/O system is shared by multiple programs using the processor
  • I/O systems often use interrupts (externally generated exceptions) to communicate information about I/O operations
    • Interrupts must be handled by the OS because they cause a transfer to supervisor mode
  • The low-level control of an I/O device is complex:
    • It requires managing a set of concurrent events
    • The requirements for correct device control are very detailed
Functions OS Must Provide
• Provide protection to shared I/O resources
  • Guarantee that a user's program can only access the portions of an I/O device to which the user has rights
• Provide abstraction for accessing devices:
  • Supply routines that handle low-level device operation
• Handle the interrupts generated by I/O devices
• Provide equitable access to the shared I/O resources
  • All user programs must have equal access to the I/O resources
• Schedule accesses in order to enhance system throughput
OS: I/O Requirements
• The OS must be able to communicate with I/O devices and to prevent the user program from communicating with the I/O device directly
• If user programs could perform I/O directly:
  • There would be no protection of the shared I/O resources
• 3 types of communication are required:
  • The OS must be able to give commands to the I/O devices
  • The I/O device must notify the OS when it has completed an operation or encountered an error
  • Data transfers between memory and the I/O device
• We study how these can be done next...
Instruction Set Architecture for I/O
• Two methods are used to address the device:
  • Special I/O instructions
  • Memory-mapped I/O
• Special I/O instructions specify:
  • Both the device number and the command word
    • Device number: the processor communicates this via a set of wires normally included as part of the I/O bus
    • Command word: this is usually sent on the bus data lines
• Memory-mapped I/O:
  • Portions of the address space are assigned to I/O devices
  • Reads and writes to those addresses are interpreted as commands to the I/O devices
  • The I/O address space is often protected by address translation
Memory Mapped I/O
• I/O devices communicate with the processor through a set of registers in the I/O controller
• Some addresses from the processor do not refer to regular memory but correspond to registers in I/O devices
[Figure: address space from 0 to 0xFFFFFFFF; the region from 0xFFFF0000 upward holds a device's control register and data register]
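A way to picture the address decoding: loads and stores at or above 0xFFFF0000 (the boundary in the figure) are routed to device registers, and everything else goes to RAM. The register offsets and the ToyDevice/ToyBus classes are hypothetical, and in hardware this decoding is done by address comparators rather than software; the point is that the processor issues the same store either way.

```python
IO_BASE = 0xFFFF0000            # start of the I/O region shown in the figure
CONTROL_REG = IO_BASE + 0x0     # hypothetical register offsets
DATA_REG = IO_BASE + 0x4


class ToyDevice:
    """Stands in for an I/O controller with a control and a data register."""

    def __init__(self) -> None:
        self.regs = {CONTROL_REG: 0, DATA_REG: 0}
        self.received: list[int] = []

    def write(self, addr: int, value: int) -> None:
        self.regs[addr] = value
        if addr == DATA_REG:            # a store here acts as a device command
            self.received.append(value)

    def read(self, addr: int) -> int:
        return self.regs[addr]


class ToyBus:
    """Routes each load/store to RAM or to the device, by address alone."""

    def __init__(self, device: ToyDevice) -> None:
        self.ram: dict[int, int] = {}
        self.device = device

    def store(self, addr: int, value: int) -> None:
        if addr >= IO_BASE:
            self.device.write(addr, value)
        else:
            self.ram[addr] = value

    def load(self, addr: int) -> int:
        if addr >= IO_BASE:
            return self.device.read(addr)
        return self.ram.get(addr, 0)


if __name__ == "__main__":
    bus = ToyBus(ToyDevice())
    bus.store(0x1000, 42)               # ordinary memory write
    bus.store(DATA_REG, ord("A"))       # same kind of store, but reaches the device
    print(bus.load(0x1000), bus.device.received)   # 42 [65]
```

Protection then comes for free from address translation: the OS simply does not map the I/O region into user programs' address spaces.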
Processor-I/O Speed Mismatch
• A 1 GHz microprocessor can execute 1 billion load or store instructions per second, a 4,000,000 KB/s data rate (with 4-byte words)
  • I/O device data rates range from 0.01 KB/s to 30,000 KB/s
• Input: the device may not be ready to send data as fast as the processor loads it
  • Also, it might be waiting for a human to act
• Output: the device may not be ready to accept data as fast as the processor stores it
• What to do?
Processor Checks Status before Acting
• The path to a device generally has 2 registers:
  • Control register: says it's OK to read/write (I/O ready) [think of a flagman on a road]
  • Data register: contains data
• The processor reads from the control register in a loop, waiting for the device to set the Ready bit (0 => 1) in the control register to say it's OK
• The processor then loads from (input) or writes to (output) the data register
  • A load from or store into the data register resets the Ready bit (1 => 0) of the control register
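The protocol above in code: busy-wait on the Ready bit, then access the data register, which clears the bit. Python cannot touch real device registers, so the hypothetical SimulatedInputDevice plays the controller's role and becomes ready after a few status reads; the bit position, the delay, and the class and function names are assumptions for illustration.

```python
from collections import deque

READY_BIT = 0x1  # hypothetical position of the Ready bit in the control register


class SimulatedInputDevice:
    """Toy controller: each byte becomes ready after `delay` status reads."""

    def __init__(self, data: bytes, delay: int = 3) -> None:
        self.pending = deque(data)
        self.delay = delay
        self.countdown = delay
        self.control = 0                      # Ready bit starts at 0

    def read_control(self) -> int:
        if not self.control & READY_BIT and self.pending:
            self.countdown -= 1
            if self.countdown <= 0:           # device has produced the next byte
                self.control |= READY_BIT     # Ready: 0 => 1
        return self.control

    def read_data(self) -> int:
        self.control &= ~READY_BIT            # loading the data register: Ready 1 => 0
        self.countdown = self.delay
        return self.pending.popleft()


def polled_read(dev: SimulatedInputDevice, count: int) -> tuple[bytes, int]:
    """Programmed I/O: spin on the control register, then load the data register.

    Returns the bytes read and how many status reads found the device not ready.
    """
    if count > len(dev.pending):
        raise ValueError("device does not have that many bytes")
    out, wasted_polls = bytearray(), 0
    for _ in range(count):
        while not dev.read_control() & READY_BIT:   # busy-wait loop
            wasted_polls += 1
        out.append(dev.read_data())
    return bytes(out), wasted_polls


if __name__ == "__main__":
    data, wasted = polled_read(SimulatedInputDevice(b"hi"), 2)
    print(data, wasted)   # b'hi' 4
```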
Polling: Programmed I/O
• Advantage:
  • Simple: the processor is totally in control and does all the work
• Disadvantage:
  • Polling overhead can consume a lot of CPU time
[Flowchart: the CPU repeatedly asks the I/O controller "Is the data ready?"; if not, it loops (a busy-wait loop, not an efficient way to use the CPU unless the device is very fast); if so, it reads the data, stores it to memory, and checks "done?", looping back until done. Annotation: checks for I/O completion can be dispersed among computation-intensive code]
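How much CPU time polling costs depends on how often the device must be polled and how long each poll takes. A back-of-the-envelope sketch: the 1 GHz clock matches the processor on the speed-mismatch slide, while the poll rates and the 400-cycle cost per poll are hypothetical figures.

```python
def polling_overhead(polls_per_second: float, cycles_per_poll: float,
                     clock_hz: float) -> float:
    """Fraction of processor cycles spent polling (0.0 to 1.0)."""
    return polls_per_second * cycles_per_poll / clock_hz


if __name__ == "__main__":
    clock = 1e9              # 1 GHz processor
    cycles_per_poll = 400    # hypothetical cost of one poll (call, check, return)
    # Hypothetical devices: a mouse polled 30 times/s, and a disk delivering
    # 10 MB/s in 16-byte chunks, which must be polled once per chunk.
    for name, rate in [("mouse-like", 30), ("disk-like", 10e6 / 16)]:
        frac = polling_overhead(rate, cycles_per_poll, clock)
        print(f"{name:10s} {frac:.4%} of CPU time")
```

The mouse costs about 0.001% of the processor, while the disk costs 25%, which is why polling is only attractive for slow devices and motivates interrupts and DMA.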
Alternative to Polling?
• It is wasteful to have the processor spend most of its time "spin-waiting" for I/O to be ready
• We would like an unplanned procedure call that is invoked only when the I/O device is ready
• Solution: use the exception mechanism to help I/O
  • Interrupt the program when I/O is ready, and return when done with the data transfer
I/O Interrupt
• An I/O interrupt is just like an exception except:
  • An I/O interrupt is asynchronous
  • Further information needs to be conveyed
• An I/O interrupt is asynchronous with respect to instruction execution:
  • An I/O interrupt is not associated with any instruction
  • An I/O interrupt does not prevent any instruction from completing
    • The processor can pick a convenient point to take an interrupt
• An I/O interrupt is more complicated than an exception:
  • It needs to convey the identity of the device generating the interrupt
  • Interrupt requests can have different urgencies:
    • Interrupt requests need to be prioritized
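The last two points can be sketched as a pending-request queue that always hands out the most urgent request first, with each request carrying its device's identity. This is a simplified software model with hypothetical device names and priority numbers (lower number = more urgent); it covers only which request to service next, not nesting or preemption of a handler already running, which real processors manage with interrupt controllers and priority masks.

```python
import heapq
import itertools
from typing import Optional


class InterruptController:
    """Pending interrupt requests, serviced most-urgent first.

    Lower priority number = more urgent; requests at the same level are
    served in arrival order. Each request carries its device's identity.
    """

    def __init__(self) -> None:
        self._pending: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()     # tie-breaker within a level

    def raise_irq(self, priority: int, device: str) -> None:
        heapq.heappush(self._pending, (priority, next(self._arrival), device))

    def next_irq(self) -> Optional[tuple[int, str]]:
        """Priority and identity of the request to service next, if any."""
        if not self._pending:
            return None
        priority, _, device = heapq.heappop(self._pending)
        return priority, device


if __name__ == "__main__":
    ic = InterruptController()
    ic.raise_irq(3, "keyboard")
    ic.raise_irq(1, "disk")        # more urgent, even though it arrived later
    ic.raise_irq(3, "mouse")
    order = []
    while (irq := ic.next_irq()) is not None:
        order.append(irq[1])
    print(order)   # ['disk', 'keyboard', 'mouse']
```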
Interrupt Driven Data Transfer
• Advantage:
  • The user program is halted only during the actual transfer
• Disadvantage: special hardware is needed to:
  • Cause an interrupt (I/O device)
  • Detect an interrupt (processor)
  • Save the proper state to resume after the interrupt (processor)
[Figure: (1) the device's I/O controller raises an interrupt while the CPU runs the user program (add, sub, and, or, nop); (2) the CPU saves the PC; (3) it jumps to the interrupt service address; (4) the interrupt service routine reads from the device, stores to memory, ..., then executes rti]
Questions Raised about Interrupts
• Which I/O device caused the exception?
  • The interrupt needs to convey the identity of the device generating it
• Can interrupts be avoided during the interrupt routine?
  • What if a more important interrupt occurs while servicing this interrupt?
  • Should the interrupt routine be allowed to be entered again?
• Who keeps track of the status of all the devices, handles errors, and knows where to put/supply the I/O data?
Improving Data Transfer Performance
• Thus far: the OS gives commands to I/O, and the I/O device notifies the OS when it has completed an operation or encountered an error
• What about data transfer to the I/O device?
  • The processor is busy doing loads/stores between memory and the I/O data register
• Ideal: specify the block of memory to be transferred, and be notified on completion?
  • Direct Memory Access (DMA): a simple computer transfers a block of data to/from memory and I/O, interrupting when done
What is DMA (Direct Memory Access)?
• I/O devices often transfer large amounts of data to memory:
  • A disk must transfer a complete block (4K? 16K?)
  • Large packets from the network
  • Regions of the frame buffer
• DMA gives an external device the ability to write memory directly: much lower overhead than having the processor request one word at a time
  • The processor (or at least the memory system) acts like a slave
Delegating I/O from CPU: DMA
• Direct Memory Access (DMA):
  • External to the CPU
  • Acts as a master on the bus
  • Transfers blocks of data to or from memory without CPU intervention
[Figure: CPU, memory, DMA controller (DMAC), and I/O controller (IOC) with its device. The CPU sends a starting address, direction, and length count to the DMAC, then issues "start". The DMAC provides handshake signals for the peripheral controller, and memory addresses and handshake signals for memory.]
Delegating I/O from CPU: IOP
[Figure: the CPU, main memory, and an I/O processor (IOP) share the main memory bus; the IOP drives a separate I/O bus to devices D1, D2, ..., Dn]
(1) The CPU issues an instruction to the IOP: OP, Device, Address (the target device, and where its commands are in memory)
(2) The IOP looks in memory for commands: OP, Addr, Cnt, Other (what to do, where to put the data, how much, special requests)
(3) Device-to/from-memory transfers are controlled by the IOP directly; the IOP steals memory cycles
(4) The IOP interrupts the CPU when done
DMA and Memory System
• DMA goes to memory without passing through address translation and the cache system
• Issue: does DMA use virtual or physical addresses?
  • Physical address: what if the transfer crosses a page boundary?
    • Break the transfer into a series of transfers, each within a page boundary, then chain the transfers
  • Virtual address: needs address translation (mapping provided by the OS)
• Issue: cache coherence, or the stale data problem
  • What if I/O devices write data that is currently in the cache?
  • Solutions:
    • Route I/O through the cache (expensive)
    • OS flushes the cache on I/O operations
    • Have hardware invalidate cache lines (remember "coherence" cache misses?)
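The page-boundary fix for physical-address DMA can be sketched as a function that breaks one (address, length) request into a chain of transfers that each stay inside a single page. The 4 KB page size is an assumption. Each chunk is then safe to hand to the DMA controller, because consecutive virtual pages may map to non-contiguous physical frames, and the OS would supply the physical address of each chunk.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def split_at_pages(start: int, length: int,
                   page_size: int = PAGE_SIZE) -> list[tuple[int, int]]:
    """Split [start, start + length) into (address, length) chunks,
    none of which crosses a page boundary. The DMA controller would be
    programmed with these as a chain of transfers."""
    chunks = []
    addr, remaining = start, length
    while remaining > 0:
        room_in_page = page_size - (addr % page_size)   # bytes left in this page
        size = min(room_in_page, remaining)
        chunks.append((addr, size))
        addr += size
        remaining -= size
    return chunks


if __name__ == "__main__":
    for addr, size in split_at_pages(0x1F00, 0x2200):
        print(f"transfer {size:#06x} bytes at {addr:#06x}")
```

An 8,704-byte transfer starting 256 bytes before a page boundary becomes four chained transfers: a 256-byte head, two full pages, and a 256-byte tail.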
Summary
• I/O performance is limited by the weakest link in the chain between the OS and the device
• Disk I/O benchmarks: I/O rate vs. data rate vs. latency
• Three components of disk access time:
  • Seek time, rotational latency, transfer time
• I/O device notifying the operating system:
  • Polling: it can waste a lot of processor time
  • I/O interrupt: similar to an exception, except asynchronous
• Delegating I/O responsibility from the CPU: DMA or IOP
• I/O control leads to operating systems
• Wide range of devices
  • Multimedia and high-speed networks pose challenges