Download CS 519 -- Operating Systems -

CS519: Lecture 4 I/O and File Management I/O Devices  So far we have talked about how to abstract and manage CPU and memory (processes, VM, etc)  Now: I/O and file management  I/O devices are the computer’s interface to the outside world (I/O  Input/Output) Example devices: display, keyboard, mouse, speakers, network interface, and disk CS 519 2 Operating System Theory Basic Computer Structure CPU Memory Memory Bus (System Bus) Bridge I/O Bus NIC Disk CS 519 3 Operating System Theory Intel SR440BX Motherboard CPU System Bus & MMU/AGP/PCI Controller I/O Bus IDE Disk Controller USB Controller Another I/O Bus Serial & Parallel Ports CS 519 Keyboard & Mouse 4 Operating System Theory Communication Between CPU and I/O Devices  How does the CPU communicate with I/O devices? Memory-mapped communication Each I/O device assigned a portion of the physical address space CPU  I/O device • CPU writes to locations in this area to "talk" to I/O device I/O device  CPU • Polling: CPU repeatedly check location(s) in portion of address space assigned to device • Interrupt: Device sends an interrupt (on an interrupt line) to get the attention of the CPU Programmed I/O, Interrupt-Driven, Direct Memory Access PIO and ID = word at a time DMA = block at a time CS 519 5 Operating System Theory Programmed I/O vs. DMA  Programmed I/O is ok for sending commands, receiving status, and communication of a small amount of data  Inefficient for a large amount of data Keeps CPU busy during the transfer Programmed I/O  memory operations  slow  Direct Memory Access Device read/write directly from/to memory Transfer from memory to device typically initiated from CPU Transfer from device to memory can be initiated by the device or the CPU CS 519 6 Operating System Theory Programmed I/O vs. 
DMA CPU Memory Interconnect CPU Memory Interconnect CPU Memory Interconnect Disk Disk Disk Programmed I/O DMA DMA CS 519 7 Operating System Theory Device Driver  OS module controlling an I/O device  Hides the device specifics from the above layers in the kernel  Supporting a common API  UNIX: block or character device Block: device communicates with the CPU/memory in fixed-size blocks Character/Stream: stream of bytes  Translates logical I/O into device I/O  E.g., logical disk blocks into {head, track, sector}  Performs data buffering and scheduling of I/O operations  Structure Several synchronous entry points: device initialization, queue I/O requests, state control, read/write An asynchronous entry point to handle interrupts CS 519 8 Operating System Theory Some Common Entry Points for UNIX Device Drivers             Attach: attach a new device to the system. Close: note the device is not in use. Halt: prepare for system shutdown. Init: initialize driver globals at load or boot time. Intr: handle device interrupt (not used). Ioctl: implement control operations. Mmap: implement memory-mapping (SVR4). Open: connect a process to a device. Read: character-mode input. Size: return logical size of block device. Start: initialize driver at load or boot time. Write: character-mode output. CS 519 9 Operating System Theory I/O Buffering  I/O Transfer – DMA  After an I/O request is placed the source/destination of the I/O transfer must be locked in memory  To allow user process to continue (when possible), data is often copied from user address space to kernel buffers (or vice-versa) which are pinned to memory Copying is expensive  asynchronous I/O  Devices are typically slow compared to CPU  How do we speed up accesses? 
  - Caching, of course ...
• I/O buffering
  - Buffer cache: a buffer in main memory for block devices
  - Character queue: follows the producer/consumer model (characters in the queue are read once)

User to Driver Control Flow
[Figure: control flow from user-level read, write, and ioctl calls into the kernel; labels: ordinary file, special file, file system, character device, block device, buffer cache, character queue, driver_read/write, driver-strategy]

Buffer Cache
• When an I/O request is made for a block, the buffer cache is checked first
• If the block is missing from the cache, it is read into the buffer cache from the device
• Exploits locality of reference, like any other cache
• Replacement policies are similar to those for VM
• UNIX
  - Historically, UNIX has a buffer cache for the disk which does not share buffers with character/stream devices
  - This adds overhead in a path that has become increasingly common: disk -> NIC

Disks
[Figure: disk platter showing sectors and tracks]
• Seek time: time to move the disk head to the desired track
• Rotational delay: time to reach the desired sector once the head is over the desired track
• Transfer rate: rate at which data is read from or written to the disk
• Some typical parameters:
  - Seek: ~10-15 ms
  - Rotational delay: ~4.17 ms at 7200 rpm (half of an 8.33 ms rotation)
  - Transfer rate: 30 MB/s

Disk Scheduling
• Disks are at least four orders of magnitude slower than main memory
  - The performance of disk I/O is vital for the performance of the computer system as a whole
  - Access time (seek time + rotational delay) >> transfer time for a sector
  - Therefore the order in which sectors are read matters a lot
• Disk scheduling
  - Usually based on the position of the requested sector rather than on process priority
  - May reorder the stream of read/write requests to improve performance

Disk Scheduling Policies
• Shortest-service-time-first (SSTF): pick the request that requires the least movement of the head
• SCAN (back and forth over the disk): good service distribution
• C-SCAN (one way with fast return): lower service variability
  - Problem with SSTF, SCAN, and C-SCAN: the arm may not move for a long time (due to rapid-fire accesses to the same track)
• N-step SCAN: scans N records at a time by breaking the request queue into segments of size at most N and cycling through them
• FSCAN: uses two sub-queues; during a scan, one queue is consumed while the other is filled

RAID
• Redundant Array of Inexpensive Disks (RAID)
  - A set of physical disk drives viewed by the OS as a single logical drive
  - Replaces large-capacity disks with multiple smaller-capacity drives to improve I/O performance (at a lower price)
  - Data are distributed across the physical drives in a way that enables simultaneous access to data from multiple drives
  - Redundant disk capacity compensates for the higher probability of failure that comes with multiple drives
  - Improves availability because there is no single point of failure
• Six levels of RAID represent different design alternatives

RAID Level 0
• Does not include redundancy
• Data is striped across the available disks
  - Total storage space across all disks is divided into strips
  - Strips are mapped round-robin to consecutive disks
  - A set of consecutive strips that maps exactly one strip to each disk in the array is called a stripe
• Can you see how this improves the disk I/O bandwidth?
• What access pattern gives the best performance?
[Figure: strips mapped round-robin across the disks of the array; stripe 0 holds strips 0-3]

RAID Level 1
• Redundancy is achieved by duplicating all the data
• Every disk has a mirror disk that stores exactly the same data
• A read can be serviced by either of the two disks that contain the requested data (improved performance over RAID 0 if reads dominate)
• A write must be done on both disks, but the two writes can proceed in parallel
• Recovery is simple but the cost is high
[Figure: each strip (0, 1, 2, 3, ...) stored on a disk and again on its mirror]

RAID Levels 2 and 3
• Parallel access: all disks participate in every I/O request
• Small strips, since the size of each read/write = # of disks * strip size
• RAID 2: an error-correcting code is calculated across corresponding bits on each data disk and stored on log(# data disks) parity disks
  - Hamming code: can correct single-bit errors and detect double-bit errors
  - Less expensive than RAID 1 but still pretty high overhead; not really needed in most reasonable environments
• RAID 3: a single redundant disk that keeps parity bits
  - $P(i) = X_2(i) \oplus X_1(i) \oplus X_0(i)$
  - In the event of a failure, data can be reconstructed, e.g. $X_2(i) = P(i) \oplus X_1(i) \oplus X_0(i)$
  - Can only tolerate a single failure at a time
[Figure: data bits b0, b1, b2 on the data disks and P(b) on the parity disk]

RAID Levels 4 and 5
• RAID 4
  - Large strips with a parity strip, like RAID 3
  - Independent access: each disk operates independently, so multiple I/O requests can be satisfied in parallel
  - Independent access -> small write = 2 reads + 2 writes
  - Example, if a write is performed only on strip 0:
    $P'(i) = X_2(i) \oplus X_1(i) \oplus X_0'(i)$
    $\phantom{P'(i)} = X_2(i) \oplus X_1(i) \oplus X_0'(i) \oplus X_0(i) \oplus X_0(i)$
    $\phantom{P'(i)} = P(i) \oplus X_0'(i) \oplus X_0(i)$
  - The parity disk can become a bottleneck
[Figure: strips 0-2 with parity strip P(0-2); strips 3-5 with parity strip P(3-5)]
• RAID 5
  - Like RAID 4, but parity strips are distributed across all disks

File System
• A file system is an abstraction of the disk
  - File -> track/sector
• To a user process
  - A file looks like a contiguous block of bytes (Unix)
  - A file system provides a coherent view of a group of files
  - A file system provides protection
• API: create, open, delete, read, write files
• Performance: throughput vs. response time
• Reliability: minimize the potential for lost or destroyed data
  - E.g., RAID could be implemented in the OS as part of the disk device driver

Unix File System
• Ordinary files (uninterpreted)
• Directories
  - A file of files
  - Organized as a rooted tree
  - Pathnames (relative and absolute)
  - Contain links to the parent and to themselves
  - Multiple links to a file can exist
  - A link is either hard or symbolic

Unix File Systems (Cont'd)
• Tree-structured file hierarchies
• Mounted on existing space by using mount
• No links between different file systems

File Naming
• Each file has a unique name
  - The user-visible (external) name must be symbolic
  - In a hierarchical file system, unique external names are given as pathnames (path from the root to the file)
• Internal names: the i-node in UNIX, an index into an array of file descriptors/headers for a volume
• Directory: translation from external to internal name
  - A single internal name may have more than one external name
• Information about a file is split between the directory and the file descriptor: name, type, size, location on disk, owner, permissions, date created, date last modified, date last accessed, link count

Name Space
• In UNIX, "devices are files"
  - E.g., /dev/cdrom, /dev/tape
• A user process accesses a device by accessing the corresponding file
[Figure: directory tree rooted at /, including usr and nodes A, B, C, D]

File Allocation
• Contiguous: a contiguous set of blocks is pre-allocated to a file at the time of file creation
  - Good for sequential files
  - File size must be known at the time of file creation
  - External fragmentation, as in memory allocation when each job is given a contiguous block
• So what do we do?
• Dynamic allocation (new space allocated on demand)
• First fit (first chunk of sufficient size), best fit (smallest chunk of sufficient size), nearest fit (chunk of sufficient size that is closest to the previous allocation for the same file)
• Indexed allocation (contiguous and chained allocation are other options) with a file allocation table (FAT). The FAT includes file names and the corresponding index block numbers
• Use a disk allocation table (bitmap, chained, or indexed) to manage the free space

File Allocation Strategies
• Contiguous allocation: find a contiguous chunk for the whole file
• Chained allocation: each block holds a pointer to the next block allocated to the file
• Indexed: an index block points to the file's blocks

Free Space Management
• Bitmap: one bit for each block on the disk
  - Good for finding a contiguous group of free blocks
  - Small enough to be kept in memory
  - Requires a sequential scan of the bits
• Chained free portions: each holds a pointer to the next one
• Indexed: treats free space as a file

UNIX File i-nodes
[Figure: UNIX i-node structure; not recoverable from this text]

File System Buffer Cache
[Figure: three layers. Application: read/write files. OS: translates files to disk blocks, maintains the buffer cache, controls disk accesses (read/write blocks). Hardware.]
• Any problems?

File System Buffer Cache
• Disks are "stable" while memory is volatile
  - What happens if you buffer a write and the machine crashes before the write has been saved to disk?
  - Can use write-through, but write performance will suffer
  - In UNIX: use un-buffered I/O when writing i-nodes or pointer blocks; use buffered I/O for other writes and force a sync every 30 seconds
• What about replacement?
• How can we further improve performance?

Application-controlled caching
[Figure: the same three layers as in the File System Buffer Cache figure, with the replacement policy moved from the OS to the application]

Application-Controlled File Caching
• Two-level block replacement: responsibility is split between the kernel and user level
• A global allocation policy, performed by the kernel, decides which process will give up a block
• A block replacement policy decided by the user:
  - The kernel provides the candidate block as a hint to the process
  - The process can overrule the kernel's choice by suggesting an alternative block
  - The suggested block is replaced by the kernel
• Example of an alternative replacement policy: most-recently used (MRU)

Sound kernel-user cooperation
• Oblivious processes should do no worse than under LRU
• Foolish processes should not hurt other processes
• Smart processes should perform better than LRU whenever possible, and they should never perform worse
  - If the kernel selects block A and the user chooses B instead, the kernel swaps the positions of A and B in the LRU list and places B in a "placeholder" which points to A (the kernel's choice)
  - If the user process later misses on B (i.e., it made a bad choice) and B is found in the placeholder, then the block pointed to by the placeholder is chosen for replacement (this prevents hurting other processes)

File System Consistency
• A file system almost always uses a buffer/disk cache for performance reasons
• Two copies of a disk block (buffer cache, disk) -> consistency problem if the system crashes before all the modified blocks are written back to disk
• This problem is especially critical for blocks that contain control information: i-node, free-list, and directory blocks
• Utility programs check block and directory consistency
• Write critical blocks from the buffer cache to disk immediately
• Data blocks are written to disk periodically: sync

More on File System Consistency
• To maintain file system consistency, the ordering of updates from the buffer cache to disk is critical
• Example: if the directory block (which contains a pointer to the i-node) is written back before the i-node and the system crashes, the directory structure will be inconsistent
• Similarly, if the free list is updated before the i-node and the system crashes, the free list will be incorrect
• A more elaborate solution: use dependencies between blocks containing control data in the buffer cache to specify the ordering of updates

Protection Mechanisms
• Files are OS objects: they have unique names and a finite set of operations that processes can perform on them
• A protection domain is a set of {object, rights} pairs, where a right is the permission to perform one of the operations
• At every instant in time, each process runs in some protection domain
• In Unix, a protection domain is {uid, gid}
• The protection domain in Unix is switched when running a program with SETUID/SETGID set, or when the process enters kernel mode by issuing a system call
• How to store all the protection domains?
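
One way to picture the storage question is as a sparse matrix of rights indexed by (domain, object), which can be kept grouped by object or grouped by domain. The following is a minimal sketch under stated assumptions: the domain names, object names, and helper functions are all hypothetical illustrations, not part of any real kernel interface.

```python
from collections import defaultdict

# Hypothetical example data: (domain, object) -> set of rights.
# Real Unix domains are {uid, gid} pairs; plain names are used here for brevity.
ACCESS_MATRIX = {
    ("alice", "notes.txt"): {"read", "write"},
    ("bob", "notes.txt"): {"read"},
    ("alice", "/dev/tape"): {"read"},
}


def by_object(matrix):
    """Group the matrix by object: each object lists the domains that may use it."""
    table = defaultdict(dict)
    for (domain, obj), rights in matrix.items():
        table[obj][domain] = set(rights)
    return dict(table)


def by_domain(matrix):
    """Group the matrix by domain: each domain lists the objects it may use."""
    table = defaultdict(dict)
    for (domain, obj), rights in matrix.items():
        table[domain][obj] = set(rights)
    return dict(table)


def allowed(matrix, domain, obj, right):
    """Return True if `domain` holds `right` on `obj`; missing entries mean no access."""
    return right in matrix.get((domain, obj), set())
```

Both groupings hold the same information; they differ in where the list lives (with the object or with the process) and therefore in how easy it is to enumerate or revoke rights.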

Protection Mechanisms (cont'd)
• Access Control List (ACL): associate with each object a list of all the protection domains that may access the object, and how
  - In Unix, the ACL is reduced to three protection domains: owner, group, and others
• Capability List (C-list): associate with each process a list of objects that may be accessed, along with the permitted operations
  - C-list implementation issues: where/how to store them (hardware, kernel, encrypted in user space) and how to revoke them

Log-Structured File System (LFS)
• As memory gets larger, the buffer cache grows -> a larger fraction of read requests can be satisfied from the buffer cache with no disk access
• In the future, most disk accesses will be writes -> but in most file systems writes are usually done in small chunks (control data, for instance), which makes the file system highly inefficient
• LFS idea: structure the entire disk as a log
• Periodically, or when required, all the pending writes buffered in memory are collected and written as a single contiguous segment at the end of the log

LFS segment
• Contains i-nodes, directory blocks, and data blocks, all mixed together
• Each segment starts with a segment summary
• Segment size: 512 KB - 1 MB
• Two key issues:
  - How to retrieve information from the log?
  - How to manage the free space on disk?
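
The idea of collecting pending writes and emitting them as one segment can be sketched in a few lines. This is a toy model under stated assumptions, not the LFS implementation: the `Log` class, its method names, and the tiny segment size counted in blocks are all hypothetical, and the "disk" is just a Python list.

```python
class Log:
    """Toy log-structured store: blocks are only appended, never updated in place."""

    def __init__(self, segment_blocks=4):
        # Blocks per segment (assumed toy value; real segments are 512 KB - 1 MB).
        self.segment_blocks = segment_blocks
        self.segments = []  # the on-"disk" log, oldest segment first
        self.pending = []   # writes buffered in memory

    def write(self, kind, ident, data):
        """Buffer one block (i-node, directory, or data) and flush when a segment fills."""
        self.pending.append((kind, ident, data))
        if len(self.pending) >= self.segment_blocks:
            self.flush()

    def flush(self):
        """Write all pending blocks as one contiguous segment at the end of the log."""
        if not self.pending:
            return None
        # The segment summary records what each block in the segment is.
        summary = [(kind, ident) for kind, ident, _ in self.pending]
        blocks = [data for _, _, data in self.pending]
        self.segments.append({"summary": summary, "blocks": blocks})
        self.pending = []
        return len(self.segments) - 1  # index of the segment just written
```

Mixed block kinds land in the same segment, and the summary at the front is what later lets a reader or cleaner tell them apart.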

File Location in LFS
• The i-node contains the disk addresses of the file's blocks, as in standard UNIX
• But there is no fixed location for the i-node
• An i-node map is used to maintain the current location of each i-node
• i-node map blocks can also be scattered, but a fixed checkpoint region on the disk identifies the location of all the i-node map blocks
• i-node map blocks are usually cached in main memory, so disk accesses for them are rare

Segment Cleaning in LFS
• An LFS disk is divided into segments that are written sequentially
• Live data must be copied out of a segment before the segment can be rewritten
• The process of copying data out of a segment is called cleaning
• A separate cleaner thread moves along the log, removes old segments from the end, and puts live data into memory for rewriting in the next segment
• As a result, an LFS disk looks like a big circular buffer, with the writer thread adding new segments at the front and the cleaner thread removing old segments from the end
• Bookkeeping is not trivial: the i-node must be updated when blocks are moved to the current segment

LFS Performance
[Figure: LFS performance results; not recoverable from this text]

LFS Performance (Cont'd)
[Figure: LFS performance results, continued; not recoverable from this text]
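
The interaction between the i-node map and the cleaner described above can be illustrated with a small model. This is a sketch under simplifying assumptions: the `TinyLFS` class and its methods are hypothetical, each log entry stands for a whole i-node (data blocks and the checkpoint region are omitted), and liveness is decided purely by whether the i-node map still points at the entry.

```python
class TinyLFS:
    """Toy model of i-node lookup and segment cleaning in a log-structured layout."""

    def __init__(self):
        self.segments = {}     # segment number -> list of (inode_no, payload)
        self.imap = {}         # i-node map: inode_no -> (segment number, slot)
        self.next_segment = 0  # segment numbers grow toward the head of the log

    def append_segment(self, inodes):
        """Write i-nodes as a new segment at the head of the log and update the map."""
        seg = self.next_segment
        self.next_segment += 1
        self.segments[seg] = list(inodes)
        for slot, (inode_no, _) in enumerate(inodes):
            self.imap[inode_no] = (seg, slot)
        return seg

    def read_inode(self, inode_no):
        """Locate an i-node through the map; it has no fixed position on disk."""
        seg, slot = self.imap[inode_no]
        return self.segments[seg][slot][1]

    def is_live(self, seg, slot):
        """An entry is live only if the i-node map still points at it."""
        inode_no, _ = self.segments[seg][slot]
        return self.imap.get(inode_no) == (seg, slot)

    def clean_oldest(self):
        """Copy live entries out of the oldest segment, then free that segment."""
        if not self.segments:
            return None
        seg = min(self.segments)
        live = [entry for slot, entry in enumerate(self.segments[seg])
                if self.is_live(seg, slot)]
        if live:
            self.append_segment(live)  # rewriting also updates the i-node map
        del self.segments[seg]
        return seg
```

Overwritten i-nodes simply become dead entries in old segments, so the cleaner only pays to copy what the map still references.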
