I/O
Spring 2016
Tore Brox-Larsen
Includes original material by Kai Li, Andrew S. Tanenbaum, Pål Halvorsen, Otto J. Anshus, Bård Fjukstad, Daniel Stødle, and Andy Bavier

The Five Parts of a Computer
• CPU
  – Datapath
  – Control
• Memory
• Input
• Output

Motivation: CS Stanford First Pillar of Strategy
Pervasive computer systems: Within two decades, computers will routinely collect terabytes of data per person per day. A majority of our business and personal decisions will rely on pervasive data. To create the capability to collect and interpret such massive amounts of data, we will build computer systems of unprecedented scale and connectivity. And we will invent new, more scalable techniques for extracting information from massive data streams. To facilitate this transformation, the Stanford CS Department has to strengthen its faculty in machine learning, distributed databases, software engineering, privacy, and security.

Motivation: CS Stanford Second Pillar of Strategy
Physical and virtual environments: Within two decades, we will embed sensors into a majority of our physical spaces (cars, buildings, urban areas). These sensors will enable us to gather detailed models of physical spaces and activities, and to seamlessly integrate these models into virtual worlds. As a result, we will overcome critical boundaries that presently constrain people-to-people and people-to-environment interactions. We will be able to carry out meaningful long-term interactions with others regardless of where they are, what language they speak, and so on. To enable this transformation, the Stanford CS Department has to increase its presence in sensor networks, computer vision, natural language, embedded systems, computer graphics, creative design, and human-computer interaction.

Motivation: CS Stanford Third Pillar of Strategy
Computational foundations for other academic disciplines: Within two decades, about a dozen non-CS disciplines in science and engineering will adopt CS as a core methodology. These disciplines will rely on CS to provide computational methods for problem solving and as a foundational framework in which key concepts can be stated formally. And they will rely on CS to harvest scientific information from massive new datasets. This dialogue between CS and other scientific disciplines will fundamentally transform science and engineering. The Stanford CS Department has to help Stanford University to position itself as a global leader in this new array of CS-enabled disciplines, through a plurality of activities in faculty recruiting, research, and education.
Comment: This development will contribute to the increase in volume and variance of I/O

I/O Relevance Example
• Volvo 240 GL (1984): No embedded processors
• Current VW Diesel “Cheatmobile”: ~40 embedded processors
• Current Tesla: ~70 embedded processors
  – Interacting through I/O with car, driver, passengers, service stations, manufacturer
  – Future will include car-to-car information exchange
• I/O appears to be increasingly varied and relevant for emerging computer systems

Input/Output
• The processor receives input in the form of electrical signals
• The processor generates and transmits output in the form of electrical signals
• Sensors generate electrical output based on some physical condition
• Actuators influence some physical condition based on electrical input
• Devices may be
  – Simple sensor or actuator (GPS, accelerometer, gyroscope, stepping motor, …)
  – Complex system, possibly combining
    • One or more sensors/actuators
    • Mechanics
    • Memory
    • Software, …

Devices
• Devices hook on to and communicate with the computer via
  – Standardized interfaces
    • Electrical
    • Mechanical
    • Communication protocols
  – Choice of interface depends on technical, economic, and market issues

Input and Output
• A computer's job is to move and process data
  – Computation (CPU, caches, memory)
  – Move data in and out of the computer (between I/O devices and memory or registers)
• Challenges with I/O devices
  – Different categories: Storage, networking, displays, …
  – Large number of device drivers needed
  – Device drivers that run in kernel mode may crash systems
• Goals of the OS
  – Provide a generic, consistent, convenient and reliable way to access I/O devices
  – Achieve potential system performance

The Mother of all Demos (1968)
• First demo of modern mouse-keyboard, graphical user interface
• Integrates new development, hardware & software
• Doug Engelbart (1925-2013). Stanford Research Institute (SRI), Doug Engelbart Institute
• Find the video at:
  – http://sloan.stanford.edu/MouseSite/1968Demo.html

Yngvar Lundh – Computing Pioneer
• Founder (1960) and leader of Digital Group (“Siffergruppen”) at NDRE
• “Siffergruppen”
  – Family home of Norsk Data, Mycron, Dolphin interconnect etc.
• Implemented early (1959) video game (Tic-tac-toe) at MIT on TX-0
• A lot of other interesting stuff
• Prof. emer. at Ifi/UiO

Big Picture I/O Devices
• Keyboard, mouse, joystick, sound card, display, scanner, camera, printer, network card, DVD, disk, memory stick, GPS, wind-sensor, motion, radiation, temperature, flow, humidity, gyroscope, etc.
• Large diversity:
  – many, widely differing device types
  – devices within each type also differ
• Speed:
  – varying, often slow access & transfer compared to CPU
  – some device types require very fast access & transfer (e.g., graphic display, high-speed networks)
• Access:
  – sequential vs. random
  – read, write, read & write
• ...
• Expect to see new types of I/O devices, and new applications of old types

Today we talk about I/O
• characteristics
• interconnection
• devices & controllers (disks will be lectured in detail later)
• data transfers
• I/O software
• buffering
• ...

Device Controllers
• Piece of HW that controls one or more identical or similar devices
• Location
  – integrated on the host motherboard
  – PC-card (e.g., PCI)
  – embedded in the device itself (e.g., disks often have additional embedded controllers)
  – combinations of the above

Device Drivers
• Software that provides the interface between
  – Single device or class of devices
  – Operating system
• Interface between operating system and device drivers may be:
  – Standardized
  – Non-standardized

I/O: “Bird’s Eye View”
[Figure labels: Device, Device controller, Device driver, Rest of the operating system, I/O System]

I/O Devices
• Block devices: store information in fixed-size blocks, each one with its own address
  – common block sizes: 512 B – 64 KB
  – addressable
  – it is possible to read or write each block independently of all others
  – e.g., disks, floppy, tape, CD, DVD, ...
• Character devices: deliver or accept a stream of characters, without regard to any block structure
  – not addressable, and no seek operation
  – e.g., keyboards, mice, terminals, line printers, network interfaces, and most other devices that are not disk-like...
• Do all devices fit in?
  – clocks and timers
  – memory-mapped screens

Operating system’s job
• Goal of OS: Abstract I/O devices
• Present a unified interface to user space
  – Provide a generic, consistent, convenient and reliable way to access I/O devices
  – For instance, user-level app can call open(“/dev/device”, …) and interact with “any” device
  – Device independence
• Achieve good performance
  – What is “good” performance?

I/O Software Stack

Four Basic Questions
• How are devices connected to CPU/memory?
• How are device controller registers accessed & protected?
• How are data transmitted?
• Synchronization: interrupts versus polling?
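
The device-independence point on the "Operating system's job" slide can be tried from user space. The following is a minimal sketch, assuming a Unix-like system where the null device (`os.devnull`, normally `/dev/null`) is available; the helper name `device_roundtrip` is hypothetical and not part of the lecture. It uses the same open/write/read/close calls that would be used for any other device file, and leaves the meaning of each call to the driver behind the file.

```python
import os


def device_roundtrip(path, payload):
    """Write payload to a device file, then try to read from it.

    Hypothetical helper for illustration: the calls are the same for
    any device file; only the driver behind the file differs.
    """
    fd = os.open(path, os.O_RDWR)        # same call for "any" device
    try:
        written = os.write(fd, payload)  # the driver decides what a write means
        data = os.read(fd, 64)           # the driver decides what a read returns
    finally:
        os.close(fd)                     # release the descriptor in all cases
    return written, data


# The null device accepts every byte written and reports end-of-file on read.
written, data = device_roundtrip(os.devnull, b"hello device\n")
print(written, data)
```

Replacing `os.devnull` with another device file keeps the code unchanged, which is the sense in which the interface is device independent; permissions and blocking behaviour will of course differ per device.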

Classic North Bridge/South Bridge Architecture: Via P4X266 Chipset
• The north bridge manages traffic from
  – CPU & caches
  – memory
  – advanced graphics ports (AGPs)
  – (peripheral component interconnect (PCI) busses)
• The south bridge manages traffic from
  – universal serial bus (USB)
  – IEEE 1394
  – ATA
  – (PCI busses)
  – keyboard & mouse
  – ...
• Via P4X266
  – PCI on south bridge
  – Increased south-north link compared to older chipsets
  – Integrated 10/100 Ethernet on south bridge
[Figure labels: CPU, memory, north bridge, AGP, PCI, south bridge, USB, audio, ATA, keyboard, mouse, floppy]

Hub Architecture: Intel 850 Chipset
• The memory controller hub (MCH) manages traffic from
  – CPU & caches
  – memory
  – AGP
• The I/O controller hub (ICH) manages traffic from
  – all other devices....
• Most of the Intel 8XX chipsets have the hub architecture
[Figure label, hub interface: four 8-bit, 66 MHz; 266 MB/s]

Hub Architecture: Intel 875P Chipset
• MCH improvements
  – AGP: 4x → 8x
  – memory interface: 200 → 400 MHz
  – system (front side) bus: 400/533 → 800 MHz
  – Gbps network interface
• But, still only
  – four 8-bit, 66 MHz (266 MBps) hub-to-hub interface
  – 32 bit, 33 MHz PCI bus
• However, some chipsets (e.g., 840) have a 64-bit, 33/66 MHz PCI Controller Hub (P64H) connected directly to the MCH by a 2x (16 bit) wide hub interface
• Server chipsets (e.g., E7500) may have several P64Hs replacing the ICH
http://en.wikipedia.org/wiki/List_of_Intel_chipsets

Intel 5520

Intel Z97
• Northbridge functionality migrated onto processor chip

Intel X99 Chipset
• Northbridge functionality migrated onto processor chip

Four Basic Questions
• How are devices connected to CPU/memory?
• How are device controller registers accessed & protected?
• How are data transmitted?
• Synchronization: interrupts versus polling?
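
The interconnect figures quoted on the chipset slides can be checked with a short calculation: peak rate = bus width in bytes × clock × transfers per clock. This is a sketch under stated assumptions: "four 8-bit, 66 MHz" is read here as an 8-bit link making four transfers per clock, the clocks are taken at their nominal 66.66 and 33.33 MHz, and 1 MB is 10^6 bytes. The function name is hypothetical.

```python
def peak_bandwidth_mb_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak transfer rate in MB/s, with 1 MB = 10**6 bytes.

    width_bits / 8 gives bytes per transfer; MHz times bytes gives MB/s.
    """
    return width_bits / 8 * clock_mhz * transfers_per_clock


# Assumption: the hub interface moves 8 bits, four times per 66.66 MHz clock.
hub_interface = peak_bandwidth_mb_s(8, 66.66, transfers_per_clock=4)

# Classic 32-bit, 33 MHz PCI bus, one transfer per clock.
pci_32_33 = peak_bandwidth_mb_s(32, 33.33)

# 64-bit, 66 MHz PCI, as served by the P64H mentioned on the slide.
pci_64_66 = peak_bandwidth_mb_s(64, 66.66)

print(f"hub interface: {hub_interface:.1f} MB/s")
print(f"PCI 32/33:     {pci_32_33:.1f} MB/s")
print(f"PCI 64/66:     {pci_64_66:.1f} MB/s")
```

Under these assumptions a single 64-bit, 66 MHz PCI bus can outrun the hub interface, which is consistent with the slide's remark that the P64H is attached to the MCH over a wider (16-bit) hub link.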

Accessing Device Controller Registers
• Each controller has a few registers through which the CPU communicates with the device
  – The device driver needs to access these registers
• Additionally, some devices need a memory buffer
• Two alternatives: port I/O and memory mapped I/O

Port I/O
• Device registers mapped onto “ports”
  – ports form a separate address space
• The processor has special I/O instructions to read/write ports
• Port access is protected by allowing I/O instructions only when the processor is in kernel/supervisor mode
• Used for example by IBM 360 and successors
[Figure labels: memory, I/O ports]

Memory Mapped I/O
• Device register addresses are mapped into regular address space
• Use regular load/store instructions to read/write registers
• Use memory protection mechanism to protect device registers from unauthorized access
• Used for example by PDP-11
[Figure labels: memory address space, memory mapped I/O]

Memory Mapped I/O vs. Port I/O
• Ports:
  – special I/O instructions are CPU dependent
• Memory mapped:
  + memory protection mechanism allows greater flexibility than protected instructions
  + may use all memory reference instructions for I/O
  – Device register addresses must not be cached (must be able to selectively disable caching of address regions)
  – Cannot “drown” I/O device address logic by presenting devices with every memory address accessed. Bridges are initialized to make sure only allocated address regions are forwarded onto slow peripheral buses.
• Intel Pentium uses a hybrid
  – Addresses 640K to 1M are used for memory mapped I/O data buffers
  – I/O ports 0 to 64K are used for device control registers

Four Basic Questions
• How are devices connected to CPU/memory?
• How are device controller registers accessed & protected?
• How are data transmitted?
• Synchronization: interrupts versus polling?
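
The memory mapped I/O idea above can be illustrated with a toy simulation: one address space, where ordinary loads and stores reach either RAM or device registers depending on the address. Everything here is a hypothetical illustration, not a real machine: the classes `ToyBus` and `ToySerialPort`, the register layout, and the base address `0x100` are assumptions made for the sketch.

```python
class ToySerialPort:
    """Hypothetical device with two registers: STATUS (0) and DATA (1)."""

    NUM_REGS = 2
    STATUS, DATA = 0, 1
    BUSY = 0x01

    def __init__(self):
        self.sent = bytearray()  # bytes the device has "transmitted"

    def read_reg(self, reg):
        # This toy device finishes instantly, so it never reports BUSY.
        return 0

    def write_reg(self, reg, value):
        if reg == self.DATA:
            self.sent.append(value)


class ToyBus:
    """One address space: RAM below mmio_base, device registers at mmio_base."""

    def __init__(self, ram_size, mmio_base, device):
        self.ram = bytearray(ram_size)
        self.mmio_base = mmio_base
        self.device = device

    def _is_mmio(self, addr):
        return self.mmio_base <= addr < self.mmio_base + self.device.NUM_REGS

    def load(self, addr):
        # A regular load; the address alone decides whether RAM or device answers.
        if self._is_mmio(addr):
            return self.device.read_reg(addr - self.mmio_base)
        return self.ram[addr]

    def store(self, addr, value):
        # A regular store; no special I/O instruction is involved.
        if self._is_mmio(addr):
            self.device.write_reg(addr - self.mmio_base, value)
        else:
            self.ram[addr] = value


MMIO_BASE = 0x100  # assumed base address of the device register window
port = ToySerialPort()
bus = ToyBus(ram_size=0x100, mmio_base=MMIO_BASE, device=port)

bus.store(0x10, 42)  # an ordinary memory write
for byte in b"OK":
    # Poll the status register, then write the data register.
    while bus.load(MMIO_BASE + ToySerialPort.STATUS) & ToySerialPort.BUSY:
        pass
    bus.store(MMIO_BASE + ToySerialPort.DATA, byte)

print(bus.load(0x10), bytes(port.sent))
```

With port I/O, `load` and `store` would never reach the device; a separate pair of operations (the special I/O instructions) would address a separate port space instead.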

Performing I/O Data Transmissions
• Programmed I/O (PIO)
  – the CPU handles the transfers
  – transfers data between registers and device
• Interrupt driven I/O
  – use CPU to transfer data, but let an I/O module run concurrently
• Direct Memory Access (DMA)
  – an adaptor accesses main memory
  – transfers blocks of data between memory and device
• Channel
  – simple specialized peripheral processor dedicated to I/O
  – handles most transmission, but less control
  – shared memory. No private memory.
• Peripheral Processor (PPU)
  – general processor dedicated to I/O control and transmission
  – shared and private memory. (CDC 6600, 1964)

Programmed I/O Example: RS-232 serial port
• Simple serial controller
  – Status registers (ready, busy, ...)
  – Data register
• Output
  – CPU:
    • Wait until device is not “busy”
    • Write data to “data” register
    • Tell device “ready”
  – Device:
    • Wait until “ready”
    • Clear “ready” and set “busy”
    • Take data from “data” register
    • Clear “busy”

PIO
• Device delivers data to controller
• PIO:
  – CPU reads data from controller buffer to register
  – CPU writes register to memory location
• CPU is busy moving data
[Figure labels: Pentium 4 processor, registers, cache(s), memory controller hub, RDRAM, I/O controller hub, free PCI slots, disk controller]

Direct Memory Access (DMA)
• Example: Disk
• A simple disk adaptor
  – Status register (done, interrupt, ...)
  – DMA command
  – DMA memory address and size
  – DMA data buffer
• DMA Write
  – CPU:
    • Wait until DMA device is “ready”
    • Clear “ready”
    • Set DMAWrite, address, size
    • Set “start”
    • Block current thread/process
  – Disk adaptor:
    • DMA data to device (size--; address++)
    • Interrupt when “size == 0”
  – CPU (interrupt handler):
    • Put the blocked thread/process into ready queue
  – Disk: Move data to disk

DMA
• Device delivers data to controller
• DMA:
  1. set up DMA controller
  2. DMA controller initiates transfer
  3. data is moved (increasing address, reducing count)
  4. disk controller notifies DMA controller when finished (count = 0)
  5. DMA controller interrupts
• CPU is free
• Cycle stealing on memory bus
[Figure labels: Pentium 4 processor, registers, cache(s), DMA controller (address, count, ...), memory controller hub, RDRAM, I/O controller hub, free PCI slots, disk controller]

PIO vs. DMA
• DMA:
  + supports large transfers; latency of acquiring the bus is amortized over hundreds/thousands of bytes
  – may be expensive for small transfers
  – overhead to handle virtual memory and cache consistency
  o is common practice
• PIO:
  – uses the CPU
  – loads data into registers and cache
  + potentially faster for small transfers with carefully designed software

Detour – Latency Numbers
Latency Comparison Numbers
--------------------------
L1 cache reference                            0.5 ns
Branch mispredict                               5 ns
L2 cache reference                              7 ns              14x L1 cache
Mutex lock/unlock                              25 ns
Main memory reference                         100 ns              20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy                3,000 ns
Send 1K bytes over 1 Gbps network          10,000 ns    0.01 ms
Read 4K randomly from SSD*                150,000 ns    0.15 ms
Read 1 MB sequentially from memory        250,000 ns    0.25 ms
Round trip within same datacenter         500,000 ns    0.5  ms
Read 1 MB sequentially from SSD*        1,000,000 ns    1    ms   4x memory
Disk seek                              10,000,000 ns   10    ms   20x datacenter roundtrip
Read 1 MB sequentially from disk       20,000,000 ns   20    ms   80x memory, 20x SSD
Send packet CA->Netherlands->CA       150,000,000 ns  150    ms

Notes
-----
1 ns = 10^-9 seconds
1 ms = 10^-3 seconds
* Assuming ~1GB/sec SSD

Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Source: https://gist.github.com/jboner/2841832

Four Basic Questions
• How are devices connected to CPU/memory?
• How are device controller registers accessed & protected?
• How are data transmitted?
• Synchronization: interrupts versus polling?

Synchronization: Interrupts vs. Polling
• Polling:
  – processor polls the device while waiting for I/O to complete
  – wastes cycles; inefficient
• Interrupt:
  – device asserts interrupt when I/O completed
  – frees processor to move on to other tasks
  – interrupt processing is costly and introduces latency penalty
• Possible strategy:
  – apply interrupts, but reduce interrupt frequency through careful driver/controller interaction

Polling in Programmed I/O
• Wait until device is not “busy”
  – A polling loop!
• Advantages
  – Simple
• Disadvantages
  – Slow
  – Wastes CPU cycles
• Example
  – If a device runs 100 operations/second, the CPU may need to wait for 10 ms per operation, or 10,000,000 CPU cycles (1 GHz CPU)
• Interrupt mechanism will allow CPU to avoid polling

Interrupt-Driven I/O
• Writing a string to the printer using interrupt-driven I/O
  a) code executed when print system call is made
  b) interrupt service procedure

Interrupt-Driven Device
• Simple mouse controller
  – Status registers (done, int, ...)
  – Data registers (ΔX, ΔY, button)
• Input
  – Mouse:
    • Wait until “done”
    • Store ΔX, ΔY, and button into data registers
    • Raise interrupt
  – CPU (interrupt handler):
    • Clear “done”
    • Move ΔX, ΔY, and button into kernel buffer
    • Set “done”
    • Call scheduler

Synchronous Read

Asynchronous Read
• Application uses POSIX P1003.4 Asynchronous I/O Interface Functions

Device Driver Design Issues
• Operating system and driver communication
  – Commands and data between OS and device drivers
• Driver and hardware communication
  – Commands and data between driver and hardware
• Driver operations
  – Initialize devices
  – Interpret commands from OS
  – Schedule multiple outstanding requests
  – Manage data transfers
  – Accept and process interrupts
  – Maintain the integrity of driver and kernel data structures

Design Issues
• Statically install device drivers
  – Reboot OS to install a new device driver
• Dynamically download device drivers
  – No reboot, but use an indirection
  – Load drivers into kernel memory
  – Install entry points and maintain related data structures
  – Initialize the device drivers

Dynamic binding of Device Drivers
• Indirection
  – Indirect table for all device driver entry points
• Download a driver
  – Allocate kernel memory
  – Store driver code
  – Link up all entry points
• Delete a driver
  – Unlink entry points
  – Deallocate kernel memory
[Figure label: open(1,…)]

Example System: UAV Instruments
Unmanned Aerial Vehicles
• Many different sensors and instruments onboard
• GPS, Inertial Measurement Unit (IMU), camera, IR camera, spectrometer, networking (GPRS, Iridium satellite modem, Microhard radio modem), laser altimeter, custom instruments (measure engine RPM, internal temperatures, …)
• Custom payload computer manages sensor and instrument package, logs data, communicates with ground station
Picture from Norut’s flight campaign in Ny-Ålesund, Svalbard, July 2012.

Instrument characteristics
• Instruments/sensors are typically either serial or USB
• Gather between 1 KB and 10 MB of data/sec
  – Sensors: Stream data.
    • Little work for driver once device is configured
    • Deliver data such as position, attitude, other measurements
  – Communications: Streaming I/O
    • Interfaces vary from Ethernet-based radio modem networking to “back to the eighties” modems over 2G/3G or satellite, complete with AT command set and other “fun” stuff.
  – Video using analog video capture USB sticks
  – Off-the-shelf DSLRs for gathering still images. Interface to camera resembles a block-based device.

Driver implementation
• All (custom) drivers implemented in user space
  – Not dynamically loaded by kernel, but managed by payload control software. Hopefully no kernel panics this way!
  – Use kernel’s standard interface to I/O devices
    • open(“/dev/device”, …)
  – Most drivers implemented in Python
  – Biggest challenge: Buggy or unreliable hardware
    • Next on the list: Naming! (Especially for serial devices)
  – Best approach to fix transient bugs: Disconnect power to device.
    Sadly, no simple way to do this for USB- or serial-oriented devices

Device Driver Interface
• Open( deviceNumber )
  – Initialize and allocate resources (buffers)
• Close( deviceNumber )
  – Clean up, deallocate, and possibly turn off
• Device driver types
  – Block: fixed sized block data transfer
  – Character: variable sized data transfer
  – Terminal: character driver with terminal control
  – Network: streams for networking

Device Driver Interface
• Block devices:
  – read( deviceNumber, deviceAddr, bufferAddr )
    • transfer a block of data from “deviceAddr” to “bufferAddr”
  – write( deviceNumber, deviceAddr, bufferAddr )
    • transfer a block of data from “bufferAddr” to “deviceAddr”
  – seek( deviceNumber, deviceAddress )
    • move the head to the correct position
    • usually not necessary
• Character devices:
  – read( deviceNumber, bufferAddr, size )
    • read “size” bytes from a byte stream device to “bufferAddr”
  – write( deviceNumber, bufferAddr, size )
    • write “size” bytes from “bufferAddr” to a byte stream device

Some Unix Device Driver Interface Entry Points
• init(): Initialize hardware
• start(): Boot time initialization (requires system services)
• open(dev, flag, id): initialization for read or write
• close/release(dev, flag, id): release resources after read and write
• halt(): called before the system is shut down
• intr(vector): called by the kernel on a hardware interrupt
• read()/write(): data transfer
• poll(pri): called by the kernel 25 to 100 times a second
• ioctl(dev, cmd, arg, mode): special request processing

Device-Independent I/O Software
• Functions of the device-independent I/O software:
  – Uniform interfacing for device drivers
  – Buffering
  – Error reporting
  – Allocating and releasing dedicated devices
  – Providing a device-independent block size
  – ...

Why Buffering in Kernel?
• Speed mismatch between the producer and consumer
  – Character device and block device, for example
  – Adapt different data transfer sizes (packets vs. streams)
• DMA requires contiguous physical memory
  – I/O devices see physical memory
  – User programs use virtual memory
• Spooling
  – Avoid deadlock problems
• Caching
  – Serve repeated requests for the same data
  – Reduce I/O operations

Buffering
a) No buffer
  – interrupt per character/block
b) User buffering
  – user blocks until buffer full or I/O complete
  – paging problems!?
c) Kernel buffer, copying to user
  – what if buffer is full/busy when new data arrives?
d) Double kernel buffering
  – alternate buffers, read from one, write to the other

Asynchronous I/O
• Why do we want asynchronous I/O?
  – Life is simple if all I/O is synchronous
• How to implement asynchronous I/O?
  – On a read
    • copy data from a system buffer if the data is there
    • otherwise, block the current process
  – On a write
    • copy to a system buffer, initiate the write and return

Example: Clocks
• Old, simple clocks used power lines and caused an interrupt at every voltage pulse (50 - 60 Hz)
• New clocks use
  – quartz crystal oscillators generating periodic signals at a very high frequency
  – counter which is decremented each pulse; when it reaches zero, it causes an interrupt
  – register to load the counter
• May have several outputs
• Different modes
  – one-shot: counter is restored only by software
  – square-wave: counter is reset immediately (e.g., for clock ticks)
[Figure labels: xtal oscillator, frequency adjuster, interrupt, default value]

Examples: Clocks
• HW only generates clock interrupts
• It is up to the clock software (driver) to make use of this
  – Maintaining time-of-day
  – Preventing processes from running longer than allowed
  – Accounting for CPU usage
  – Handling ALARM system call
  – Providing watchdog timers
  – Doing profiling, monitoring, and statistics gathering

Example: Keyboard
• Keyboards provide input as a sequence of bits
• Example, coded with IRA (International Reference Alphabet): “K” = b7b6b5b4b3b2b1 = 1001011
• Raw mode vs. Cooked mode
• Buffering

Example: Keyboard
[Figure labels: Intel 82C55A, Pentium processor, registers, cache(s), memory controller hub, RDRAM, interrupt, I/O controller hub, PCI slots, keyboard, mouse, ...]

Summary
• A large fraction of the OS is concerned with I/O
• Several ways to do I/O
• Several layers of software
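
The programmable clock from the "Example: Clocks" slide (a counter decremented on each crystal pulse, a register to reload it, and one-shot versus square-wave modes) can be simulated in a few lines. This is a minimal sketch and not a model of any particular chip: the class `ToyClock`, its attribute names, and the 1 MHz crystal frequency are assumptions chosen for illustration.

```python
class ToyClock:
    """Toy programmable clock: counter, holding register, and two modes."""

    ONE_SHOT = "one-shot"
    SQUARE_WAVE = "square-wave"

    def __init__(self, reload_value, mode):
        self.holding_register = reload_value  # the "register to load the counter"
        self.counter = reload_value
        self.mode = mode
        self.interrupts = 0                   # number of interrupts raised so far

    def pulse(self):
        """Handle one crystal pulse: decrement, and interrupt at zero."""
        if self.counter == 0:
            return  # expired one-shot clock: idle until software restarts it
        self.counter -= 1
        if self.counter == 0:
            self.interrupts += 1
            if self.mode == self.SQUARE_WAVE:
                # Square-wave mode: the counter is reset immediately.
                self.counter = self.holding_register

    def restart(self):
        """One-shot mode: the counter is restored only by software."""
        self.counter = self.holding_register


CRYSTAL_HZ = 1_000_000  # assumed crystal frequency (1 MHz)
RELOAD = 1000           # assumed reload value
tick_seconds = RELOAD / CRYSTAL_HZ  # time between interrupts: 1 ms

periodic = ToyClock(RELOAD, ToyClock.SQUARE_WAVE)
one_shot = ToyClock(RELOAD, ToyClock.ONE_SHOT)
for _ in range(5 * RELOAD):
    periodic.pulse()
    one_shot.pulse()

print(tick_seconds, periodic.interrupts, one_shot.interrupts)
```

In this sketch the square-wave clock keeps producing ticks that the clock driver can use for time-of-day and scheduling, while the one-shot clock fires once and then waits for `restart()`.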