Download I/O The Five Parts of a Computer

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
Transcript
I/O
Spring 2016
Tore Brox-Larsen
Includes original material by Kai Li, Andrew S.
Tanenbaum, Pål Halvorsen Otto J. Anshus, Bård
Fjukstad, Daniel Stødle, and Andy Bavier
The Five Parts of a Computer
•  CPU
–  Datapath
–  Control
•  Memory
•  Input
•  Output
Motivation: CS Stanford First Pillars of Strategy
Pervasive computer systems: Within two decades,
computers will routinely collect terabytes of data person
per day. A majority of our business and personal decisions will
rely on pervasive data. To create the capability to collect and
interpret such massive amounts of data, we will build computer
systems of unprecedented scale and connectivity. And we will
invent new, more scalable techniques for extracting information
from massive data streams. To facilitate this transformation,
the Stanford CS Department has to strengthen its faculty in
machine learning, distributed databases, software engineering,
privacy, and security.
Motivation: CS Stanford Second Pillar of Strategy
Physical and virtual environments: Within two decades,
we will embed sensors into a majority of our physical
spaces (cars, buildings, urban areas). These sensors will
enable us to gather detailed models of physical spaces and
activities, and to seamlessly integrate these models into virtual
worlds. As a result, we will overcome critical boundaries that
presently constrain people-to-people and people-toenvironment interactions. We will be able to carry out
meaningful long-term interactions with others regardless of
where they are, what language they speak, and so on. To
enable this transformation, the Stanford CS Department has to
increase its presence in sensor networks, computer vision,
natural language, embedded systems, computer graphics,
creative design, and human-computer interaction.
Motivation: CS Stanford Third Pillar of Strategy
Computational foundations for other academic
disciplines: Within two decades, about a dozen non-CS
disciplines in science and engineering will adopt CS as a core
methodology. These disciplines will rely on CS to provide
computational methods for problem solving and as a
foundational framework in which key concepts can be stated
formally. And they will rely on CS to harvest scientific
information from massive new datasets. This dialogue between
CS and other scientific discipline will fundamentally transform
science and engineering. The Stanford CS Department has to
help Stanford University to position itself as a global leader in
this new array of CS-enabled disciplines, through a plurality of
activities in faculty recruiting, research, and education.
Comment: This development will contribute to the
increase in volume and variance of I/O
I/O Relevance Example
•  Volvo 240 GL (1984): No embedded processors
•  Current VW Diesel “Cheatmobile”: ~40 embedded processors
•  Current Tesla: ~70 embedded processors
–  Interacting through I/O with car, driver, passengers, service stations,
manufacturer
–  Future will include car-to-car information exchange
•  I/O appears to be increasingly varied and relevant for
emerging computer systems
Input/Output
•  The processor receives input in the form of electrical signals
•  The processor generates and transmits output in the form of
electrical signals
•  Sensors generate electrical output based on some physical
condition
•  Actuators influence some physical condition based on
electrical input
•  Devices may be
–  Simple sensor or actuator (GPS, accelerometer, gyroscope, stepping
motor, …)
–  Complex system, possibly combining
•  One or more sensors/actuators
•  Mechanics
•  Memory
•  Software, …
Devices
•  Devices hook on to and communicate with computer via
–  Standardized interfaces
•  Electrical
•  Mechanical
•  Communication Protocols
–  Choice of interface depends on technical, economical, and market
issues
Input and Output
•  A computers job is to move and process data
–  Computation (CPU, caches, memory)
–  Move data in and out of the computer (between I/O devices and
memory or registers)
•  Challenges with I/O devices
–  Different categories: Storage, networking, displays, …
–  Large number of device drivers needed
–  Device drivers that run in kernel mode may crash systems
•  Goals of the OS
–  Provide a generic, consistent, convenient and reliable way to access
I/O devices
–  Achieve potential system performance
The Mother of all Demos (1968)
•  First demo of modern mouse-keyboard, graphical user
interface
•  Integrates new development, hardware & software
•  Doug Engelbart (1925-2013). Stanford Research Institute
(SRI), Doug Engelbart Institute
•  Find the video at:
–  http://sloan.stanford.edu/MouseSite/1968Demo.html
Yngvar Lundh – Computing Pioneer
•  Founder (1960) and leader
of Digital Group
(“Siffergruppen”) at NDRE
•  “Siffergruppen” – Family
home of Norsk Data,
Mycron,
Dolphin interconnect etc.
•  Implemented early (1959)
video game (Tic-tac-toe) at
MIT on TX-0
•  A lot of other interesting
stuff
•  Prof. emer. At Ifi/UiO
Big Picture
I/O Devices
•  Keyboard, mouse, joystick, sound card, display, scanner, camera, printer,
network card, DVD, disk, memory stick, GPS, wind-sensor, motion,
radiation, temperature, flow, humidity, gyroscope, etc. etc.
•  Large diversity:
–  many, widely differing device types
–  devices within each type also differ
•  Speed:
–  varying, often slow access & transfer compared to CPU
–  some device-types require very fast access & transfer
(e.g., graphic display, high-speed networks)
•  Access:
–  sequential vs. random
–  read, write, read & write
•  ...
•  Expect to see new types of I/O devices, and new application of old types
Today we talk about I/O
•  characteristics
•  interconnection
•  devices & controllers (disks will be lectured in detail later)
•  data transfers
•  I/O software
•  buffering
•  ...
Device Controllers
•  Piece of HW that controls one or more identical or similar
devices
•  Location
–  integrated on the host motherboard
–  PC-card (e.g., PCI)
–  embedded in the device itself
(e.g., disks often have additional embedded controllers)
–  Combinations of the above
Device Drivers
•  Software that provides interface between
–  Single device or class of devices
–  Operating system
•  Interface between operating system and device drivers may
be:
–  Standardized
–  Non-standardized
I/O: “Bird’s Eye View”
Device
Device
controller
Device
driver
Device
Device
controller
Device
driver
..
.
..
.
Device
controller
Device
driver
Device
Device
Rest of the
operating
system
I/O System
I/O Devices
•  Block devices: store information in fixed-size blocks, each
one with its own address
– 
– 
– 
– 
common block sizes: 512 B – 64 KB
addressable
it is possible to read or write each block independently of all others
e.g., disks, floppy, tape, CD, DVD, ...
•  Character devices: delivers or accepts a stream of
characters, without regard to any block structure
–  it is not addressable and does not have any seek operation
–  e.g., keyboards, mice, terminals, line printers, network interfaces,
and most other devices that are not disk-like...
•  Does all devices fit in?
–  clocks and timers
–  memory-mapped screens
Operating system’s job
•  Goal of OS: Abstract IO devices
•  Present a unified interface to user space
–  Provide a generic, consistent, convenient and
reliable way to access I/O devices
–  For instance, user-level app can call
open(“/dev/device”, …) and interact with “any” device
–  Device independence
•  Achieve good performance
–  What is “good” performance?
I/O Software Stack
Four Basic Questions
•  How are devices connected to CPU/memory?
•  How are device controller registers accessed &
protected?
•  How are data transmitted?
•  Synchronization: interrupts versus polling?
Classic North Bridge/South Bridge Architecture:
Via P4X266 Chipset
•  The north bridge manages traffic from
– 
– 
– 
– 
•  The south bridge manages traffic from
– 
– 
– 
– 
– 
– 
CPU
CPU & caches
memory
advanced graphics ports (AGPs)
(peripheral component interconnect (PCI)
busses)
memory
universal serial bus (USB)
IEEE 1394
ATA
(PCI busses)
keyboard & mouse
...
PCI
•  Via P4X266
–  PCI on south bridge
–  Increased south-north link compared to older
–  Integrated 10/100 Ethernet on south bridge
USB, ...
audio
north
bridge
AGP
south
bridge
ATA
?
keyboard,
mouse,
floppy
Hub Architecture: Intel 850 Chipset
•  The memory controller hub (MCH)
manages traffic from
–  CPU & caches
–  memory
–  AGP
•  The I/O controller hub (ICH)
manages traffic from
–  all other devices....
four 8-bit, 66 MHz ; 266 MB/s
•  Most of the Intel 8XX chipsets
have the hub architecture
Hub Architecture: Intel 875P Chipset
•  MCH improvements
–  AGP:
4x à 8x
–  memory interface:
200 à 400 MHz
–  system (front side) bus:
400/533 à 800 MHz
–  Gbps network interface
•  But, still only
–  four 8-bit, 66 MHz
(266 MBps) hub-to-hub
interface
–  32 bit, 33 MHz PCI bus
•  However, some chipsets
(e.g., 840) have a 64-bit,
33/66 MHz PCI Controller
Hub (P64H) connected
directly to the MCH by a
2x (16 bit) wide hub interface
•  Server chipsets (e.g., E7500) may
have several P64Hs replacing the ICH
http://en.wikipedia.org/wiki/List_of_Intel_chipsets
Intel 5520
Intel Z97
Northbridge functionality migrated onto Processor Chip
Intel X99 Chipset
Northbridge functionality migrated onto Processor Chip
Four Basic Questions
•  How are devices connected to CPU/memory?
•  How are device controller registers accessed & protected?
•  How are data transmitted?
•  Synchronization: interrupts versus polling?
Accessing Device Controller Registers
•  To communicate with the CPU, each controller have a few
registers for communicating with the device
–  The device driver needs to access these registers
•  Additionally, some devices need a memory buffer
•  Two alternatives: port I/O and memory mapped I/O
Port I /O
•  Devices registers mapped onto “ports” – ports form
a separate address space
memory
I/O ports
•  The processor has special I/O instructions to read/
write ports
•  Port access is protected by allowing I/O instructions
only when processor is in kernel/supervisor mode
•  Used for example by IBM 360 and successors
Memory Mapped I/ O
•  Device register addresses are mapped into regular
address space
Memory address space
memory
mapped I/O
•  Use regular load/store instructions to read/write
registers
•  Use memory protection mechanism to protect
device registers from unauthorized access
•  Used for example by PDP-11
Memory Mapped I/O vs. Port I/O
•  Ports:
–  special I/O instructions are CPU dependent
•  Memory mapped:
+  memory protection mechanism allows greater flexibility than
protected instructions
+  may use all memory reference instructions for I/O
–  Device register addresses must not be cached (must be able to
selectively disable caching of address regions)
–  Cannot “drown” I/O device address logic by presenting devices with
every memory address accessed. Bridges are initiated to make sure
only allocated address regions are forwarded onto slow peripheral
buses.
•  Intel Pentium use a hybrid
–  Address 640K to 1M is used for memory mapped I/O data buffers
–  I/O ports 0 to 64K is used for device control registers
Four Basic Questions
•  How are devices connected to CPU/memory?
•  How are device controller registers accessed & protected?
•  How are data transmitted?
•  Synchronization: interrupts versus polling?
Performing I/O Data Transmissions
•  Programmed I/O (PIO)
–  the CPU handles the transfers
–  transfers data between registers and device
•  Interrupt driven I/O
–  use CPU to transfer data, but let an I/O module run concurrently
•  Direct Memory Access (DMA)
–  an adaptor accesses main memory
–  transfers blocks of data between memory and device
•  Channel
–  simple specialized peripheral processor dedicated to I/O
–  handles most transmission, but less control
–  shared memory. No private memory.
•  Peripheral Processor (PPU)
–  general processor dedicated to I/O control and transmission
–  shared and private memory. (CDC 6600, 1964)
Programmed I/O Example RS-232 serial port
•  Simple serial controller
–  Status registers (ready, busy, ... )
–  Data register
•  Output
–  CPU:
•  Wait until device is not “busy”
•  Write data to “data” register
•  Tell device “ready”
–  Device
•  Wait until “ready”
•  Clear “ready” and set “busy”
•  Take data from “data” register
•  Clear “busy”
PIO
•  Device delivers data
to controller
Pentium 4
Processor
registers
cache(s)
•  PIO:
–  CPU reads data from
controller buffer to
register
RDRAM
memory
controller
hub
RDRAM
RDRAM
RDRAM
–  CPU writes register
to memory location
•  CPU is busy moving
data
I/O
controller
hub
free PCI slots
free PCI slots
disk controller
Direct Memory Access (DMA)
• Example
– Disk
• A simple disk adaptor
– Status register (done, interrupt, ...)
– DMA command
– DMA memory address and size
– DMA data buffer
• DMA Write CPU:
– Wait until DMA device is “ready”
– Clear “ready”
–  Set DMAWrite, address, size
– Set “start”
– Block current thread/process
– Disk adaptor:
• DMA data to device (size--; address++)
• Interrupt when “size == 0”
– CPU (interrupt handler):
• Put the blocked thread/process into ready queue
• Disk: Move data to disk
DMA
•  Device delivers data to
controller
Pentium 4
Processor
registers
•  DMA:
1.  set up DMA controller
2.  DMA controller initiates transfer
3.  data is moved (increasing
address, reducing count)
4.  disk controller notifies
DMA controller when finished
(count = 0)
5.  DMA controller interrupts
•  CPU is free
•  Cycle stealing on memory bus
cache(s)
DMA
controller
address
count
....
RDRAM
memory
controller
hub
RDRAM
RDRAM
RDRAM
I/O
controller
hub
free PCI slots
free PCI slots
disk controller
PIO vs. DMA
•  DMA:
+  supports large transfers, latency of requiring bus is amortized over
hundreds/thousands of bytes
–  may be expensive for small transfers
–  overhead to handle virtual memory and cache consistence
o  is common practice
•  PIO:
–  uses the CPU
–  loads data into registers and cache
+  potentially faster for small transfers with carefully designed software
Detour – Latency Numbers
Latency Comparison Numbers
-------------------------L1 cache reference
0.5 ns
Branch mispredict
5
ns
L2 cache reference
7
ns
14x L1 cache
Mutex lock/unlock
25
ns
Main memory reference
100
ns
20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy
3,000
ns
Send 1K bytes over 1 Gbps network
10,000
ns
0.01 ms
Read 4K randomly from SSD*
150,000
ns
0.15 ms
Read 1 MB sequentially from memory
250,000
ns
0.25 ms
Round trip within same datacenter
500,000
ns
0.5 ms
Read 1 MB sequentially from SSD*
1,000,000
ns
1
ms 4X memory
Disk seek
10,000,000
ns
10
ms 20x datacenter roundtrip
Read 1 MB sequentially from disk
20,000,000
ns
20
ms 80x memory, 20X SSD
Send packet CA->Netherlands->CA
150,000,000
ns 150
ms
Notes
----1 ns = 10-9 seconds
1 ms = 10-3 seconds
* Assuming ~1GB/sec SSD
Credit
-----By Jeff Dean:
http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Source: https://gist.github.com/jboner/2841832
Four Basic Questions
•  How are devices connected to CPU/memory?
•  How are device controller registers accessed & protected?
•  How are data transmitted?
•  Synchronization: Interrupts versus polling?
Synchronization: Interrupts vs. Polling
•  Polling:
–  processor polls the device while waiting for I/O to complete
–  wastes cycles – inefficient
•  Interrupt:
–  device asserts interrupt when I/O completed
–  frees processor to move on to other tasks
–  interrupt processing is costly and introduces latency penalty
•  Possible strategy:
–  apply interrupts, but reduce interrupts frequency through careful
driver/controller interaction
Polling in Program I/O
• Wait until device is not “busy”
– A polling loop!
• Advantages
– Simple
• Disadvantage
– Slow
– Waste CPU cycles
• Example
– If a device runs 100 operations / second, CPU
may need to wait for 10 msec or 10,000,000 CPU
cycles (1Ghz CPU)
• Interrupt mechanism will allow CPU to avoid polling
Interrupt-Driven I/O
•  Writing a string to the printer using interruptdriven I/O
a)  code executed when print system call is made
b)  interrupt service procedure
Interrupt-Driven Device
•  Simple mouse controller
– Status registers (done, int, ...)
– Data registers (ΔX, ΔY, button)
• Input
– Mouse:
•  Wait until “done”
•  Store ΔX, ΔY, and button into data
registers
•  Raise interrupt
– CPU (interrupt handler)
•  Clear “done”
•  Move ΔX, ΔY, and button into kernel
buffer
•  Set “done”
•  Call scheduler
Synchronous Read
Asynchronous Read
Application uses POSIX P1003.4 Asynchronous I/O Interface Functions
Device Driver Design Issues
•  Operating system and driver communication
–  Commands and data between OS and device drivers
•  Driver and hardware communication
–  Commands and data between driver and hardware
•  Driver operations
– 
– 
– 
– 
– 
– 
Initialize devices
Interpreting commands from OS
Schedule multiple outstanding requests
Manage data transfers
Accept and process interrupts
Maintain the integrity of driver and kernel data structures
Design Issues
•  Statically install device drivers
–  Reboot OS to install a new device driver
•  Dynamically download device drivers
–  No reboot, but use an indirection
–  Load drivers into kernel memory
–  Install entry points and maintain related data structures
–  Initialize the device drivers
Dynamic binding of Device Drivers
• Indirection
open(1,…)
– Indirect table for all device
driver entry points
• Download a driver
– Allocate kernel memory
– Store driver code
– Link up all entry points
• Delete a driver
– Unlink entry points
– Deallocate kernel memory
Example System: UAV Instruments
Unmanned Aerial Vehicles
•  Many different sensors and
instruments onboard
•  GPS, Inertial Measurement Unit (IMU), camera, IR
camera, spectrometer, networking (GPRS, Iridium
satellite modem, Microhard radio modem), laser
altimeter, custom instruments (measure engine
RPM, internal temperatures, …)
•  Custom payload computer
manages sensor and
instrument package, logs data,
communicates with ground
station
Picture from Norut’s flight campaign in Ny-Ålesund, Svalbard, July 2012.
Instrument characteristics
• Instruments/sensors are typically either serial or USB
• Gather between 1 KB and 10 MB of data/sec
– Sensors: Stream data.
•  Little work for driver once device is configured
•  Deliver data such as position, attitude, other measurements
– Communications: Streaming I/O
•  Interfaces vary from Ethernet-based radio modem networking to
“back to the eighties” modems over 2G/3G or satellite complete
with AT command set and other “fun” stuff.
– Video using analog video capture USB sticks
– Off-the-shelf DSLRs for gathering still images. Interface to
camera resembles a block-based device.
Driver implementation
• All (custom) drivers implemented in user
space
– Not dynamically loaded by kernel, but managed by payload control
software. Hopefully no kernel panics this way!
– Use kernel’s standard interface to I/O devices
•  open(“/dev/device”, …)
– Most drivers implemented in python
– Biggest challenge: Buggy or unreliable hardware
•  Next on the list: Naming! (Especially for serial devices)
– Best approach to fix transient bugs: Disconnect power
to device. Sadly, no simple way to do this for USB- or
serial-oriented devices
Device Driver Interface
•  Open( deviceNumber )
–  Initialization and allocate resources (buffers)
•  Close( deviceNumber )
–  Cleanup, deallocate, and possibly turnoff
•  Device driver types
– 
– 
– 
– 
Block: fixed sized block data transfer
Character: variable sized data transfer
Terminal: character driver with terminal control
Network: streams for networking
Device Driver Interface
•  Block devices:
–  read( deviceNumber, deviceAddr, bufferAddr )
•  transfer a block of data from “deviceAddr” to “bufferAddr”
–  write( deviceNumber, deviceAddr, bufferAddr )
•  transfer a block of data from “bufferAddr” to “deviceAddr”
–  seek( deviceNumber, deviceAddress )
•  move the head to the correct position
•  usually not necessary
•  Character devices:
–  read( deviceNumber, bufferAddr, size )
•  reads “size” bytes from a byte stream device to “bufferAddr”
–  write( deviceNumber, bufferAddr, size )
•  write “size” bytes from “bufferSize” to a byte stream device
Some Unix Device Driver Interface Entry Points
•  init(): Initialize hardware
•  start(): Boot time initialization (require system services)
•  open(dev, flag, id): initialization for read or write
•  close/release(dev, flag, id): release resources after read and write
•  halt(): call before the system is shutdown
•  intr(vector): called by the kernel on a hardware interrupt
•  read()/write(): data transfer
•  poll(pri): called by the kernel 25 to 100 times a second
•  ioctl(dev, cmd, arg, mode): special request processing
Device-Independent I/O Software
•  Functions of the device-independent I/O software:
Uniform interfacing for device drivers
Buffering
Error reporting
Allocating and releasing dedicate devices
Providing a device-independent block size
...
Why Buffering in Kernel?
•  Speed mismatch between the producer and consumer
–  Character device and block device, for example
–  Adapt different data transfer sizes (packets vs. streams)
•  DMA requires contiguous physical memory
–  I/O devices see physical memory
–  User programs use virtual memory
•  Spooling
–  Avoid deadlock problems
•  Caching
–  Serve for same requests of the same data
–  Reduce I/O operations
Buffering
a)  No buffer
c)  Kernel buffer, copying to user
–  interrupt per character/block
b)  User buffering
–  user blocks until buffer full or I/
O complete
–  paging problems!?
–  what if buffer is full/busy when
new data arrives?
d)  Double kernel buffering
–  alternate buffers, read from one,
write to the other
Asynchronous I/O
•  Why do we want asynchronous I/O?
–  Life is simple if all I/O is synchronous
•  How to implement asynchronous I/O?
–  On
• 
• 
–  On
• 
a read
copy data from a system buffer if the data is there
otherwise, block the current process
a write
copy to a system buffer, initiate the write and return
Example: Clocks
•  Old, simple clocks used power lines and caused an interrupt at every
voltage pulse (50 - 60 Hz)
•  New clocks use
xtal
oscillator
frequency adjuster
–  quartz crystal oscillators generating periodic signals at a very high frequency
–  counter which is decremented each pulse - if zero, it causes an interrupt
–  register to load the counter
•  May have several outputs
•  Different modes
interrupt
default
value
–  one-shot - counter is restored only by software
–  square-wave - counter is reset immediately (e.g., for clock ticks)
Examples: Clocks
•  HW only generates clock interrupts
•  It is up to the clock software (driver) to make use of this
–  Maintaining time-of-day
–  Preventing processes from running longer than allowed
–  Accounting for CPU usage
–  Handling ALARM system call
–  Providing watchdog timers
–  Doing profiling, monitoring, and statistics gathering
Example: Keyboard
•  Keyboards provide input as a
sequence of bits
•  Example - coded with IRA (international reference alph.):
“K” = b7b6b5b4b3b2b1 = 1001011
•  Raw mode vs. Cooked mode
•  Buffering
Example: Keyboard
Intel 82C55A
Example: Keyboard
Pentium
Processor
registers
Intel 82C55A
cache(s)
memory
controller
hub
RDRAM
interrupt
RDRAM
RDRAM
RDRAM
I/O
controller
hub
PCI slots
PCI slots
PCI slots
keyboard, mouse, ...
Summary
•  A large fraction of the OS is concerned with I/O
•  Several ways to do I/O
•  Several layers of software