Transcript
I/O and File Systems
CS 519: Operating System Theory
Computer Science, Rutgers University
Instructor: Thu D. Nguyen
TA: Xiaoyan Li
Spring 2002

I/O Devices
So far we have talked about how to abstract and manage the CPU and memory. Computation "inside" the computer is useful only if some results are communicated "outside" of the computer. I/O devices are the computer's interface to the outside world (I/O = Input/Output). Example devices: display, keyboard, mouse, speakers, network interface, and disk.

Computer Science, Rutgers 2 CS 519: Operating System Theory

Basic Computer Structure
[Diagram: CPU and Memory on the Memory Bus (System Bus), connected through a Bridge to an I/O Bus holding a NIC and a Disk]

Computer Science, Rutgers 3 CS 519: Operating System Theory

Intel SR440BX Motherboard
[Diagram: CPU on the System Bus with the MMU/AGP/PCI controller; an I/O bus with the IDE disk controller and USB controller; another I/O bus with the serial & parallel ports and keyboard & mouse]

Computer Science, Rutgers 4 CS 519: Operating System Theory

OS: Abstractions and Access Methods
The OS must virtualize a wide range of devices into a few simple abstractions:
Storage: hard drives, tapes, CDROM
Networking: Ethernet, radio, serial line
Multimedia: DVD, camera, microphones
The operating system should provide consistent methods to access these abstractions; otherwise, programming is too hard.

Computer Science, Rutgers 5 CS 519: Operating System Theory

User/OS method interface
The same interface is used to access devices (like disks and network lines) and more abstract resources like files.
4 main methods: open(), close(), read(), write()
Semantics depend on the type of the device (block, char, net).
These methods are system calls because they are the methods the OS provides to all processes.

Computer Science, Rutgers 6 CS 519: Operating System Theory

Unix I/O Methods
fileHandle = open(pathName, flags, mode)
A file handle is a small integer, valid only within a single process, used to operate on the device or file.
pathName: a name in the file system. In Unix, devices are put under /dev. E.g.
/dev/ttya is the first serial port, /dev/sda the first SCSI drive.
flags: blocking or non-blocking, ...
mode: read only, read/write, append, ...
errorCode = close(fileHandle)
The kernel will free the data structures associated with the device.

Computer Science, Rutgers 7 CS 519: Operating System Theory

Unix I/O Methods
byteCount = read(fileHandle, byte [] buf, count)
Read at most count bytes from the device and put them in the byte buffer buf, starting at the 0th byte. The kernel can give the process fewer bytes; the user process must check byteCount to see how many were actually returned. A negative byteCount signals an error (the value is the error type).
byteCount = write(fileHandle, byte [] buf, count)
Write at most count bytes from the buffer buf; the actual number written is returned in byteCount. A negative byteCount signals an error.

Computer Science, Rutgers 8 CS 519: Operating System Theory

Unix I/O Example
What's the correct way to write 1000 bytes? Calling ignoreMe = write(fileH, buffer, 1000); works most of the time. What happens if the device can't accept 1000 bytes right now? The disk is full? Someone just deleted the file? Let's work this out on the board.

Computer Science, Rutgers 9 CS 519: Operating System Theory

I/O semantics
From this basic interface, there are three different dimensions to how I/O is processed: blocking vs. non-blocking, buffered vs. unbuffered, synchronous vs. asynchronous. The OS tries to support as many of these dimensions as possible for each device. The semantics are specified during the open() system call.

Computer Science, Rutgers 10 CS 519: Operating System Theory

Blocking vs. Non-Blocking I/O
Blocking: the process is suspended until all bytes in the count field are read or written. E.g., for a network device, if the user wrote 1000 bytes, then the operating system would write the bytes to the device one at a time until the write() completed.
+ Easy to use and understand
- If the device just can't perform the operation (e.g. you unplug the cable), what to do?
Give up and return the number of bytes successfully written.
Non-blocking: the OS only reads or writes as many bytes as possible without suspending the process.
+ Returns quickly
- More work for the programmer (but necessary for robust programs)

Computer Science, Rutgers 11 CS 519: Operating System Theory

Buffered vs. Unbuffered I/O
Sometimes we want the ease of programming of blocking I/O without the long waits if the buffers on the device are small. Buffered I/O allows the kernel to make a copy of the data.
write() side: allows the process to write() many bytes and continue processing.
read() side: as the device signals that data is ready, the kernel places the data in the buffer. When the process calls read(), the kernel just copies the buffer.
Why not use buffered I/O?
- Extra copy overhead
- Delays sending data

Computer Science, Rutgers 12 CS 519: Operating System Theory

Synchronous vs. Asynchronous I/O
Synchronous I/O: the user process does not run at the same time the I/O does; it is suspended during the I/O. So far, all the methods presented have been synchronous.
Asynchronous I/O: the read() or write() call returns a small object instead of a count. There is a separate set of methods in Unix: aio_read(), aio_write(). The user can call methods on the returned object to check "how much" of the I/O has completed. The user can also allow a signal handler to run when the I/O has completed.

Computer Science, Rutgers 13 CS 519: Operating System Theory

Handling Multiple I/O streams
If we use blocking I/O, how do we handle > 1 device at a time? Example: a small program that reads from a serial line and outputs to a tape, and is also reading from the tape and outputting to a serial line. Structure the code like this?
while (TRUE) {
    read(tape);         // block until tape is ready
    write(serial line); // send data to serial line
    read(serial line);
    write(tape);
}
Could use non-blocking I/O, but it is a huge waste of cycles if data is not ready.
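Returning to the Unix I/O example above ("what's the correct way to write 1000 bytes?"): a robust caller must loop, because write() may accept fewer bytes than requested or be interrupted by a signal. A minimal sketch; the helper name write_all is ours, not part of the Unix API:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Keep calling write() until all `count` bytes are out or a real
 * error occurs.  Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: just retry */
            return -1;      /* real error: disk full, bad fd, ... */
        }
        p += n;             /* short write: advance past what went out */
        left -= (size_t)n;
    }
    return 0;
}
```

A caller would then check a single return value, e.g. `if (write_all(fileH, buffer, 1000) < 0) ...`, instead of ignoring write()'s byte count.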
Computer Science, Rutgers 14 CS 519: Operating System Theory

Solution: the select system call
totalFds = select(nfds, readfds, writefds, errorfds, timeout);
nfds: the range (0..nfds) of file descriptors to check.
readfds: bit map of fileHandles. The user sets bit X to ask the kernel to check if fileHandle X is ready for reading; the kernel returns a 1 if data can be read on that fileHandle.
writefds: bit map of fileHandles; the user sets bit Y to check fileHandle Y for writing. Operates the same as the read bitmap.
errorfds: bit map to check for errors.
timeout: how long to wait before un-suspending the process.
totalFds = number of set bits; a negative number is an error.

Computer Science, Rutgers 15 CS 519: Operating System Theory

Three Device Types
Most operating systems have three device types:
Character devices: used for serial-line types of devices (e.g. USB port)
Network devices: used for network interfaces (e.g. Ethernet card)
Block devices: used for mass storage (e.g. disks and CDROM)
What you can expect from the read/write methods changes with each device type.

Computer Science, Rutgers 16 CS 519: Operating System Theory

Character Devices
The device is represented by the OS as an ordered stream of bytes: bytes are sent out the device by the write() system call and read from the device by the read() system call. The byte stream has no "start"; you just open it and start reading/writing. The user has no control over the read/write ratio: the sender process might make one write() of 1000 bytes, and the receiver may have to make 1000 read() calls, each receiving 1 byte.

Computer Science, Rutgers 17 CS 519: Operating System Theory

Network Devices
Like character devices, but each write() call either sends the entire block (packet), up to some maximum fixed size, or none of it. On the receiver, the read() call returns all the bytes in the block, or none.
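The select() call described above can be exercised with a pipe standing in for a slow device. A minimal POSIX sketch; the helper name is_readable is ours:

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Return 1 if fd can be read without blocking, 0 if not, -1 on error. */
int is_readable(int fd)
{
    fd_set rfds;
    struct timeval tv = {0, 0};  /* zero timeout: poll, don't suspend */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);           /* "set bit X" for this fileHandle */
    int total = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (total < 0)
        return -1;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

A real event loop would pass many descriptors and a nonzero timeout so the process sleeps until one of them becomes ready.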
Computer Science, Rutgers 18 CS 519: Operating System Theory

Block Devices
The OS presents the device as a large array of blocks. Each block has a fixed size (1 KB - 8 KB is typical), and the user can read/write only in fixed-size blocks. Unlike other devices, block devices support random access: we can read or write anywhere in the device without having to "read all the bytes first". But how do we specify the nth block with the current interface?

Computer Science, Rutgers 19 CS 519: Operating System Theory

The file pointer
The OS adds a concept called the file pointer. A file pointer is associated with each open file; if the device is a block device, the next read or write operates at the position in the device pointed to by the file pointer. The file pointer is measured in bytes, not blocks.

Computer Science, Rutgers 20 CS 519: Operating System Theory

Seeking in a device
To set the file pointer:
absoluteOffset = lseek(fileHandle, offset, whence);
whence specifies whether the offset is absolute (from byte 0) or relative to the current file pointer position. The absolute offset is returned; negative numbers signal error codes. For block devices, the offset should be an integral multiple of the block size.
How could you tell what the current position of the file pointer is without changing it?
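One standard answer to the question above is a relative seek of zero bytes. A small sketch (helper names are ours); an ordinary file is used below, since the same file-pointer rules apply:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Report the file pointer without moving it: a relative seek of 0. */
off_t current_position(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Position the file pointer at block n: an absolute seek to a
 * multiple of the block size, as the slide recommends. */
off_t seek_to_block(int fd, off_t block_size, off_t n)
{
    return lseek(fd, block_size * n, SEEK_SET);
}
```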
Computer Science, Rutgers 21 CS 519: Operating System Theory

Block Device Example
You want to read the 10th block of a disk; each disk block is 2048 bytes long.
fh = open(/dev/sda, , , );
pos = lseek(fh, size*count, );
if (pos < 0) error;
bytesRead = read(fh, buf, blockSize);
if (bytesRead < 0) error;
…

Computer Science, Rutgers 22 CS 519: Operating System Theory

Getting and setting device specific information
Unix has an IO ConTroL system call:
errorCode = ioctl(fileHandle, int request, object);
request is a numeric command to the device; an optional, arbitrary object can also be passed to the device. The meaning of the command and the type of the object are device-specific.

Computer Science, Rutgers 23 CS 519: Operating System Theory

Communication Between CPU and I/O Devices
How does the CPU communicate with I/O devices?
Memory-mapped communication: each I/O device is assigned a portion of the physical address space.
CPU → I/O device: the CPU writes to locations in this area to "talk" to the I/O device.
I/O device → CPU: Polling: the CPU repeatedly checks location(s) in the portion of the address space assigned to the device. Interrupt: the device sends an interrupt (on an interrupt line) to get the attention of the CPU.
Programmed I/O, interrupt-driven I/O, direct memory access: PIO and interrupt-driven I/O move a word at a time; DMA moves a block at a time.

Computer Science, Rutgers 24 CS 519: Operating System Theory

Programmed I/O vs. DMA
Programmed I/O is OK for sending commands, receiving status, and communicating a small amount of data, but it is inefficient for a large amount of data: it keeps the CPU busy during the transfer, and programmed I/O memory operations are slow.
Direct Memory Access: the device reads/writes directly from/to memory. Memory → device transfers are typically initiated by the CPU; device → memory transfers can be initiated by either the device or the CPU.

Computer Science, Rutgers 25 CS 519: Operating System Theory

Programmed I/O vs. DMA
[Diagram: three CPU/memory/interconnect/disk configurations: programmed I/O, DMA, and DMA with device memory] Problems?
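The polling flavor of programmed I/O described above can be modeled in a few lines. This is a pure in-memory simulation (the struct and names are ours), not real device access:

```c
#include <assert.h>

/* A pretend memory-mapped device: two "registers" in ordinary memory. */
struct toy_device {
    volatile int status;  /* 1 = data ready, set by the "device" */
    volatile int data;    /* the data register */
};

/* Programmed I/O with polling: spin on the status register, then
 * move one word and acknowledge. */
int pio_read_word(struct toy_device *dev)
{
    while (dev->status == 0)
        ;                 /* polling: the CPU burns cycles here */
    int word = dev->data;
    dev->status = 0;      /* acknowledge so the device can refill */
    return word;
}
```

The busy-wait loop is exactly the cost the slide complains about: the CPU does nothing useful while the transfer is pending, which is why interrupts and DMA exist.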
Computer Science, Rutgers 26 CS 519: Operating System Theory Direct Memory Access Used to avoid programmed I/O for large data movement Requires DMA controller Bypasses CPU to transfer data directly between I/O device and memory Computer Science, Rutgers 27 CS 519: Operating System Theory Six step process to perform DMA transfer Computer Science, Rutgers 28 CS 519: Operating System Theory Performance I/O a major factor in system performance Demands CPU to execute device driver, kernel I/O code Context switches due to interrupts Data copying Network traffic especially stressful Computer Science, Rutgers 29 CS 519: Operating System Theory Improving Performance Reduce number of context switches Reduce data copying Reduce interrupts by using large transfers, smart controllers, polling Use DMA Balance CPU, memory, bus, and I/O performance for highest throughput Computer Science, Rutgers 30 CS 519: Operating System Theory Device Driver OS module controlling an I/O device Hides the device specifics from the above layers in the kernel Supporting a common API UNIX: block or character device Block: device communicates with the CPU/memory in fixed-size blocks Character/Stream: stream of bytes Translates logical I/O into device I/O E.g., logical disk blocks into {head, track, sector} Performs data buffering and scheduling of I/O operations Structure Several synchronous entry points: device initialization, queue I/O requests, state control, read/write An asynchronous entry point to handle interrupts Computer Science, Rutgers 31 CS 519: Operating System Theory Some Common Entry Points for UNIX Device Drivers Attach: attach a new device to the system. Close: note the device is not in use. Halt: prepare for system shutdown. Init: initialize driver globals at load or boot time. Intr: handle device interrupt (not used). Ioctl: implement control operations. Mmap: implement memory-mapping (SVR4). Open: connect a process to a device. Read: character-mode input. 
Size: return logical size of block device.
Start: initialize driver at load or boot time.
Write: character-mode output.

Computer Science, Rutgers 32 CS 519: Operating System Theory

User to Driver Control Flow
[Diagram: read, write, and ioctl calls cross from user to kernel; ordinary files go through the file system, special files directly to a device; the block-device path goes through the buffer cache to driver_strategy, the character-device path through the character queue to driver_read/write]

Computer Science, Rutgers 33 CS 519: Operating System Theory

Buffer Cache
When an I/O request is made for a block, the buffer cache is checked first. If the block is missing from the cache, it is read into the buffer cache from the device. Exploits locality of reference, as any other cache; replacement policies are similar to those for VM.
UNIX: historically, UNIX has a buffer cache for the disk which does not share buffers with character/stream devices. This adds overhead in a path that has become increasingly common: disk → NIC.

Computer Science, Rutgers 34 CS 519: Operating System Theory

Disks
[Diagram: sectors and tracks]
Seek time: time to move the disk head to the desired track.
Rotational delay: time to reach the desired sector once the head is over the desired track.
Transfer rate: rate at which data is read from or written to the disk.
Some typical parameters: seek ~10-15 ms; rotational delay ~4.15 ms for 7200 rpm; transfer rate 30 MB/s.

Computer Science, Rutgers 35 CS 519: Operating System Theory

Disk Scheduling
Disks are at least four orders of magnitude slower than main memory, so the performance of disk I/O is vital for the performance of the computer system as a whole. Access time (seek time + rotational delay) >> transfer time for a sector; therefore, the order in which sectors are read matters a lot.
Disk scheduling is usually based on the position of the requested sector rather than on the process priority, and may reorder the stream of read/write requests to improve performance.

Computer Science, Rutgers 36 CS 519: Operating System Theory

Disk Scheduling Policies
Shortest-service-time-first (SSTF): pick the request that requires the least movement of the head.
SCAN
(back and forth over disk): good service distribution.
C-SCAN (one way with fast return): lower service variability.
Problem with SSTF, SCAN, and C-SCAN: the arm may not move for a long time (due to rapid-fire accesses to the same track).
N-step SCAN: scan of N records at a time, breaking the request queue into segments of size at most N and cycling through them.
FSCAN: uses two sub-queues; during a scan, one queue is consumed while the other one is produced.

Computer Science, Rutgers 37 CS 519: Operating System Theory

Disk Management
Low-level formatting, or physical formatting: dividing a disk into sectors that the disk controller can read and write.
To use a disk to hold files, the operating system still needs to record its own data structures on the disk: partition the disk into one or more groups of cylinders, then perform logical formatting ("making a file system").
The boot block initializes the system: the bootstrap is stored in ROM, along with a bootstrap loader program.
Methods such as sector sparing are used to handle bad blocks.

Computer Science, Rutgers 38 CS 519: Operating System Theory

Swap-Space Management
Swap space: virtual memory uses disk space as an extension of main memory. Swap space can be carved out of the normal file system, or, more commonly, it can be in a separate disk partition.
Swap-space management: 4.3BSD allocates swap space when a process starts; it holds the text segment (the program) and the data segment. The kernel uses swap maps to track swap-space use. Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual memory page is first created.

Computer Science, Rutgers 39 CS 519: Operating System Theory

Disk Reliability
Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. RAID is one important technique currently in common use.
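The scheduling policies above are easy to prototype. Here is a toy SSTF orderer over track numbers (function name is ours); the expected ordering in the test is the classic textbook run with the head starting at track 53:

```c
#include <assert.h>
#include <stdlib.h>

/* Reorder requests[0..n-1] into the sequence SSTF would service them,
 * starting from track `head`: always pick the closest pending track. */
void sstf_order(int *requests, int n, int head)
{
    for (int i = 0; i < n; i++) {
        int best = i;
        for (int j = i + 1; j < n; j++)
            if (abs(requests[j] - head) < abs(requests[best] - head))
                best = j;
        int tmp = requests[i];          /* service the closest request next */
        requests[i] = requests[best];
        requests[best] = tmp;
        head = requests[i];             /* the head has now moved there */
    }
}
```

Note how the run illustrates the starvation problem the slide mentions: far-away tracks (183) are serviced last, and a steady stream of nearby requests could postpone them indefinitely.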
Computer Science, Rutgers 40 CS 519: Operating System Theory

RAID
Redundant Array of Inexpensive Disks (RAID): a set of physical disk drives viewed by the OS as a single logical drive. Replace large-capacity disks with multiple smaller-capacity drives to improve I/O performance (at lower price). Data are distributed across the physical drives in a way that enables simultaneous access to data from multiple drives. Redundant disk capacity is used to compensate for the increase in the probability of failure due to multiple drives. Availability improves because there is no single point of failure. Six levels of RAID represent different design alternatives.

Computer Science, Rutgers 41 CS 519: Operating System Theory

RAID Level 0
Does not include redundancy. Data is striped across the available disks: the total storage space across all disks is divided into strips, and strips are mapped round-robin to consecutive disks. A set of consecutive strips that maps exactly one strip to each disk in the array is called a stripe.
Can you see how this improves the disk I/O bandwidth? What access pattern gives the best performance?
[Diagram: stripe 0 = strips 0-3, one per disk; strips 4-7 form the next stripe]

Computer Science, Rutgers 42 CS 519: Operating System Theory

RAID Level 1
Redundancy is achieved by duplicating all the data: every disk has a mirror disk that stores exactly the same data. A read can be serviced by either of the two disks that contain the requested data (improved performance over RAID 0 if reads dominate). A write request must be done on both disks but can be done in parallel. Recovery is simple, but cost is high.
[Diagram: each strip stored on a disk and again on its mirror]
Computer Science, Rutgers 43 CS 519: Operating System Theory

RAID Levels 2 and 3
Parallel access: all disks participate in every I/O request. Small strips, since the size of each read/write = # of disks * strip size.
RAID 2: an error-correcting code is calculated across corresponding bits on each data disk and stored on log(# data disks) parity disks. A Hamming code can correct single-bit errors and detect double-bit errors. Less expensive than RAID 1 but still pretty high overhead; not really needed in most reasonable environments.
RAID 3: a single redundant disk that keeps parity bits: P(i) = X2(i) ⊕ X1(i) ⊕ X0(i). In the event of a failure, data can be reconstructed: X2(i) = P(i) ⊕ X1(i) ⊕ X0(i). Can only tolerate a single failure at a time.
[Diagram: data disks b0, b1, b2 and parity disk P(b)]

Computer Science, Rutgers 44 CS 519: Operating System Theory

RAID Levels 4 and 5
RAID 4: large strips with a parity strip, like RAID 3. Independent access: each disk operates independently, so multiple I/O requests can be satisfied in parallel. An independent-access small write = 2 reads + 2 writes. Example, if a write is performed only on strip 0:
P'(i) = X2(i) ⊕ X1(i) ⊕ X0'(i) = X2(i) ⊕ X1(i) ⊕ X0'(i) ⊕ X0(i) ⊕ X0(i) = P(i) ⊕ X0'(i) ⊕ X0(i)
The parity disk can become a bottleneck.
[Diagram: strips 0-2 with parity strip P(0-2); strips 3-5 with parity strip P(3-5)]
RAID 5: like RAID 4, but the parity strips are distributed across all disks.

Computer Science, Rutgers 45 CS 519: Operating System Theory

File System
A file system is an abstraction of the disk: file → track/sector. To a user process, a file looks like a contiguous block of bytes (Unix). A file system provides a coherent view of a group of files, and provides protection.
API: create, open, delete, read, write files.
Performance: throughput vs.
response time.
Reliability: minimize the potential for lost or destroyed data. E.g., RAID could be implemented in the OS as part of the disk device driver.

Computer Science, Rutgers 46 CS 519: Operating System Theory

File API
To read or write, you need to open. open() returns a handle to the opened file, and the OS associates a data structure with the handle. The data structure maintains the current "cursor" position in the stream of bytes in the file; reads and writes take place from the current position, though a different location can be specified explicitly. When done, the file should be closed.

Computer Science, Rutgers 47 CS 519: Operating System Theory

Files vs. Disk
[Diagram: how should files map onto the disk?]

Computer Science, Rutgers 48 CS 519: Operating System Theory

Files vs. Disk
[Diagram: contiguous layout] What's the problem with this mapping function? What's the potential benefit of this mapping function?

Computer Science, Rutgers 49 CS 519: Operating System Theory

Files vs. Disk
[Diagram: another file-to-disk mapping] What's the problem with this mapping function?

Computer Science, Rutgers 50 CS 519: Operating System Theory

UNIX File i-nodes
[Diagram: i-node structure]

Computer Science, Rutgers 51 CS 519: Operating System Theory

Defragmentation
Want link-based organization of the disk blocks of a file for efficient random access. Want sequential layout of disk blocks for efficient sequential access. How to reconcile?

Computer Science, Rutgers 52 CS 519: Operating System Theory

Defragmentation (cont'd)
The base structure is link-based; optimize for sequential access. Defragmentation: move the blocks around to simulate an actual sequential layout of files. Group allocation of blocks: group tracks together (cylinder).
Try to allocate all blocks of a file from a single cylinder so that they are close together.

Computer Science, Rutgers 53 CS 519: Operating System Theory

Free Space Management
No policy issues here, just mechanism.
Bitmap: one bit for each block on the disk. Good for finding a contiguous group of free blocks (files are often accessed sequentially) and small enough to be kept in memory.
Chained free portions: each holds a pointer to the next one. Not so good for sequential access.
Index: treats free space as a file.

Computer Science, Rutgers 54 CS 519: Operating System Theory

File System
OK, we have files. How can we name them? How can we organize them?

Computer Science, Rutgers 55 CS 519: Operating System Theory

Tree-Structured Directories
[Diagram]

Computer Science, Rutgers 56 CS 519: Operating System Theory

File Naming
Each file has an associated human-readable name, e.g., usr, bin, mid-term.pdf, design.pdf. A file name must be globally unique; otherwise, how would the system know which file we are referring to? The OS must maintain a mapping between file names and the set of blocks belonging to each file. The mappings are kept in directories.

Computer Science, Rutgers 57 CS 519: Operating System Theory

Acyclic-Graph Directories
Have shared subdirectories and files. [Diagram]

Computer Science, Rutgers 58 CS 519: Operating System Theory

General Graph Directory
[Diagram]

Computer Science, Rutgers 59 CS 519: Operating System Theory

Unix File System
Ordinary files (uninterpreted). Directories: files of files, organized as a rooted tree, with pathnames (relative and absolute); each directory contains links to its parent and to itself. Multiple links to a file can exist, either hard or symbolic.

Computer Science, Rutgers 60 CS 519: Operating System Theory

Unix File Systems (Cont'd)
Tree-structured file hierarchies, mounted on existing space by using mount. No links between different file systems.

Computer Science, Rutgers 61 CS 519: Operating System Theory

File Naming
Each file has a unique name. The user-visible (external) name must be symbolic. In a hierarchical file system, unique external
names are given as pathnames (the path from the root to the file).
Internal names: the i-node in UNIX, an index into an array of file descriptors/headers for a volume.
Directory: translation from external to internal name. There may be more than one external name for a single internal name.
Information about a file is split between the directory and the file descriptor: name, type, size, location on disk, owner, permissions, date created, date last modified, date last accessed, link count.

Computer Science, Rutgers 62 CS 519: Operating System Theory

Name Space
In UNIX, "devices are files", e.g., /dev/cdrom, /dev/tape. A user process accesses devices by accessing the corresponding file.
[Diagram: directory tree rooted at / with usr, A, B, C, D]

Computer Science, Rutgers 63 CS 519: Operating System Theory

File System Buffer Cache
[Diagram: the application issues read/write file calls; the OS translates files to disk blocks, maintains the buffer cache, and controls disk accesses (read/write blocks) to the hardware] Any problems?

Computer Science, Rutgers 64 CS 519: Operating System Theory

File System Buffer Cache
Disks are "stable" while memory is volatile. What happens if you buffer a write and the machine crashes before the write has been saved to disk? You can use write-through, but write performance will suffer. In UNIX: use unbuffered I/O when writing i-nodes or pointer blocks; use buffered I/O for other writes and force a sync every 30 seconds.
What about replacement? How can we further improve performance?

Computer Science, Rutgers 65 CS 519: Operating System Theory

Application-controlled caching
[Diagram: as above, but the replacement policy is supplied by the application]
Computer Science, Rutgers 66 CS 519: Operating System Theory

Application-Controlled File Caching
Two-level block replacement: responsibility is split between the kernel and user level. A global allocation policy, performed by the kernel, decides which process will give up a block. A block replacement policy is decided by the user: the kernel provides the candidate block as a hint to the process, the process can overrule the kernel's choice by suggesting an alternative block, and the suggested block is replaced by the kernel. Example of an alternative replacement policy: most-recently used (MRU).

Computer Science, Rutgers 67 CS 519: Operating System Theory

Sound kernel-user cooperation
Oblivious processes should do no worse than under LRU. Foolish processes should not hurt other processes. Smart processes should perform better than LRU whenever possible, and they should never perform worse.
If the kernel selects block A and the user chooses B instead, the kernel swaps the positions of A and B in the LRU list and places B in a "placeholder" which points to A (the kernel's choice). If the user process misses on B (i.e.
it made a bad choice), and B is found in the placeholder, then the block pointed to by the placeholder is chosen (prevents hurting other processes) Computer Science, Rutgers 68 CS 519: Operating System Theory File System Consistency File system almost always uses a buffer/disk cache for performance reasons Two copies of a disk block (buffer cache, disk) consistency problem if the system crashes before all the modified blocks are written back to disk This problem is critical especially for the blocks that contain control information: i-node, free-list, directory blocks Utility programs for checking block and directory consistency Write critical blocks from the buffer cache to disk immediately Data blocks are written to disk periodically: sync Computer Science, Rutgers 69 CS 519: Operating System Theory More on File System Consistency To maintain file system consistency the ordering of updates from buffer cache to disk is critical Example: if the directory block (contains pointer to inode) is written back before the i-node and the system crashes, the directory structure will be inconsistent Similar case when free list is updated before i-node and the system crashes, free list will be incorrect A more elaborate solution: use dependencies between blocks containing control data in the buffer cache to specify the ordering of updates Computer Science, Rutgers 70 CS 519: Operating System Theory Protection Mechanisms Files are OS objects: unique names and a finite set of operations that processes can perform on them Protection domain is a set of {object,rights} where right is the permission to perform one of the operations At every instant in time, each process runs in some protection domain In Unix, a protection domain is {uid, gid} Protection domain in Unix is switched when running a program with SETUID/SETGID set or when the process enters the kernel mode by issuing a system call How to store all the protection domains? 
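As a concrete case of storing protection domains compactly: Unix collapses each object's ACL to owner/group/other rwx bits and checks the calling process's {uid, gid} against them. A toy version of that check (names and representation are ours, not the kernel's code):

```c
#include <assert.h>

/* mode holds 9 permission bits as in 0644; want is an OR of
 * 4 (read), 2 (write), 1 (execute). */
int allowed(unsigned mode, unsigned file_uid, unsigned file_gid,
            unsigned proc_uid, unsigned proc_gid, unsigned want)
{
    unsigned bits;
    if (proc_uid == file_uid)
        bits = (mode >> 6) & 7;   /* owner domain */
    else if (proc_gid == file_gid)
        bits = (mode >> 3) & 7;   /* group domain */
    else
        bits = mode & 7;          /* everyone else */
    return (bits & want) == want;
}
```

Only the first matching domain is consulted, as in Unix: an owner denied a right does not fall through to the group or other bits.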
Computer Science, Rutgers 71 CS 519: Operating System Theory

Protection Mechanisms (cont'd)
Access Control List (ACL): associate with each object a list of all the protection domains that may access the object, and how. In Unix, the ACL is reduced to three protection domains: owner, group, and others.
Capability List (C-list): associate with each process a list of the objects that may be accessed, along with the permitted operations. C-list implementation issues: where/how to store them (hardware, kernel, encrypted in user space) and how to revoke them.

Computer Science, Rutgers 72 CS 519: Operating System Theory

Log-Structured File System (LFS)
As memory gets larger, buffer cache sizes increase, increasing the fraction of read requests which can be satisfied from the buffer cache with no disk access. In the future, most disk accesses will be writes, but writes are usually done in small chunks in most file systems (control data, for instance), which makes the file system highly inefficient.
LFS idea: structure the entire disk as a log. Periodically, or when required, all the pending writes being buffered in memory are collected and written as a single contiguous segment at the end of the log.

Computer Science, Rutgers 73 CS 519: Operating System Theory

LFS segment
Contains i-nodes, directory blocks, and data blocks, all mixed together. Each segment starts with a segment summary. Segment size: 512 KB - 1 MB.
Two key issues: How to retrieve information from the log? How to manage the free space on disk?
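The first question is answered by LFS's i-node map: i-nodes have no fixed home, so a map records where the current copy of each one lives. A toy, purely in-memory sketch of that indirection (all names and sizes are ours):

```c
#include <assert.h>

#define LOG_CAP 64
#define NINODES 8

static int log_data[LOG_CAP];  /* the "disk": an append-only log */
static int log_head;           /* next free log address */
static int imap[NINODES];      /* i-node number -> current log address */

/* Writing an i-node appends a fresh copy at the head of the log and
 * repoints the map; the old copy becomes dead space for the cleaner. */
int lfs_write_inode(int ino, int contents)
{
    int addr = log_head++;
    log_data[addr] = contents;
    imap[ino] = addr;
    return addr;
}

/* Reading goes through one level of indirection via the map. */
int lfs_read_inode(int ino)
{
    return log_data[imap[ino]];
}
```

In the real system the map itself lives in log blocks found via the fixed checkpoint region; here it is just an array.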
Computer Science, Rutgers 74 CS 519: Operating System Theory

File Location in LFS
The i-node contains the disk addresses of the file blocks, as in standard UNIX, but there is no fixed location for the i-node. An i-node map is used to maintain the current location of each i-node. The i-node map blocks can also be scattered, but a fixed checkpoint region on the disk identifies the location of all the i-node map blocks. The i-node map blocks are usually cached in main memory most of the time, so disk accesses for them are rare.

Computer Science, Rutgers 75 CS 519: Operating System Theory

Segment Cleaning in LFS
The LFS disk is divided into segments that are written sequentially. Live data must be copied out of a segment before the segment can be re-written; the process of copying data out of a segment is called cleaning. A separate cleaner thread moves along the log, removes old segments from the end, and puts live data into memory for rewriting in the next segment. As a result, an LFS disk appears like a big circular buffer, with the writer thread adding new segments to the front and the cleaner thread removing old segments from the end. Bookkeeping is not trivial: the i-node must be updated when blocks are moved to the current segment.

Computer Science, Rutgers 76 CS 519: Operating System Theory

LFS Performance
[Graphs]

Computer Science, Rutgers 77 CS 519: Operating System Theory

LFS Performance (Cont'd)
[Graphs]

Computer Science, Rutgers 78 CS 519: Operating System Theory

Elephant File System
How many times have you deleted something and then wished you hadn't? What about wishing you could find an old version?
Disks are getting big enough and cheap enough to do something about these problems. Elephant maintains files' histories. Nice clean idea: names can be extended with a time, and Elephant looks for the version close to that time.

Computer Science, Rutgers 79 CS 519: Operating System Theory

What Histories Are Used For
Undo: the ability to undo a recent change. The complete history is retained, but only for a limited period of time, e.g., an hour, a day, or a week.
Long-term history: users may want to recover old versions, and typically think in terms of landmark versions, as in source control. Changes are grouped so they don't require as much space.

Computer Science, Rutgers 80 CS 519: Operating System Theory

Retention Policies
Keep One. Keep All. Keep Safe: basically undo for some period of time. Keep Landmarks: retain only landmark versions. The user can specify any version as a landmark, and a heuristic groups changes into landmark versions: past a certain time, all changes made within some amount of time (e.g., several minutes) are grouped into one landmark version. Keep track of when versions are merged.

Computer Science, Rutgers 81 CS 519: Operating System Theory

Implementation
The implementation is fairly straightforward. One interesting point is that the cleaner needs to lock the inode log while cleaning; this prevents writes while the cleaner is cleaning a file, so it must be fast (while holding the lock, anyway). Different from the LFS cleaner, since it does not need to coalesce free space. Supports application-guided retention. Performs virtually the same as FFS; better for large copies.

Computer Science, Rutgers 82 CS 519: Operating System Theory

File System Profile
Keep One: 33.6% of files, 56.3% of bytes
Keep Safe: 3.9% of files, 28.5% of bytes
Keep Landmarks: 62.4% of files, 15.2% of bytes

Computer Science, Rutgers 83 CS 519: Operating System Theory
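The landmark-grouping heuristic described under Retention Policies can be sketched directly: past a certain age, versions created within a window of the next version are merged, and only the last version of each burst survives as a landmark. A toy model; the names and representation are ours:

```c
#include <assert.h>

/* times[0..n-1]: version creation times, ascending.  Sets keep[i]=1
 * for versions that survive as landmarks (the last version of each
 * burst of changes); returns how many are kept. */
int group_landmarks(const int *times, int n, int window, int *keep)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (i == n - 1 || times[i + 1] - times[i] > window) {
            keep[i] = 1;   /* end of a burst: this version is a landmark */
            kept++;
        } else {
            keep[i] = 0;   /* merged into a later version of the burst */
        }
    }
    return kept;
}
```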