File System Implementation
CSCI 5103 Operating Systems
Instructor: Abhishek Chandra

- File System Layout and Structure
- File System Data Structures
- Storage Allocation and Management
- File System Recovery

File System Implementation
How are file systems implemented on underlying storage?
- Assume disk as the main storage medium
- Issues:
  - How is the data organized on disk?
  - Where is file system information kept?
  - How to achieve space, performance efficiency?

Layered File System Structure
- Logical file system: Manages file system metadata
  - File organization, e.g., directories
  - File control block (FCB): File info, e.g., inode
- File-organization module: Translates logical file blocks to physical blocks
  - Free space management and block allocation
- Basic file system: Performs I/O on storage device
  - Issues read/write requests to disk
  - Manages file buffers and caches
- I/O control: Device drivers and interrupt handlers
  - Handles hardware-level I/O operations

File System Data Structures
- File system metadata: Information about disk partitions, directories, files, data
- File system maintains several data structures
  - On disk: Persistent structures
  - In-memory: For quick access, caching

On-Disk Structures
- Volume-level:
  - Boot control block: Used to boot an OS from disk; first block of volume
  - Volume control block: Contains information about volume
    - E.g.: volume size, block size, free blocks, free FCBs
    - UNIX superblock
- File-level:
  - Directory structure: File names, file ids (e.g., inode numbers)
  - Per-file FCB: Contains file metadata

In-Memory Structures
- Mount table: Information about mounted volumes
- Directory cache: Info about recently accessed directories
- System open-file table: FCBs of all open files
- Per-process open-file table: Pointers to system open-file table entries
- Buffers: Data being read/written from/to disk

File System Structures
- Mount Table
- File Control Block
- In-memory File Tables

Mount Table
- In-memory mount table
  - Contains entries for mounted file systems
  - Each entry contains pointer to file system, type
  - Windows: Entries for each drive (C:, D:, etc.)
  - UNIX: In-memory inode of mount point directory is flagged, contains pointer to mount table entry
- What does OS do while mounting a volume?
  - Verifies that device contains valid file system
  - Adds an entry in the in-memory mount table

File Control Block
- Contains metadata about file
  - File properties: owner, mode, timestamps, etc.
  - Pointers to disk blocks containing file data
- Example: inode on UNIX systems
- Directory can be treated as special file
  - Can have its own FCB
  - Will contain list of <filename, FCB>

Inode Structure
[Figure: an inode holding file properties (owner, mode, timestamps, ...), direct data block pointers, and single, double, and triple indirect pointers, all leading to data blocks]

In-Memory File Tables
- System-wide open-file table:
  - Contains FCBs for all open files in the system
- Per-process open-file table: Each entry contains
  - Pointer to system-wide open-file table entry for file
  - Other info: File offset, Access mode
- File descriptor/file handle: Index into per-process open-file table
  - Process uses this handle for all operations to the file

UNIX: In-Memory File Tables
[Figure: per-process open-file tables for Proc A and Proc B (slots 0-4) point to system file table entries ("RW, offset=10" and "R, offset=100"), which point to the File1 and File2 inodes in the in-memory inode table]

File Open
What happens when user calls: open("file1", "RW")?
Several steps:
1. Search system-wide open-file table for file1 FCB
2. If not found, search directory structure (cached in memory or from disk)
3. Copy file1 FCB into system-wide open-file table
4. Create a new entry in per-process table pointing to file1 FCB in system table
5. Initialize mode (RW), file offset in per-process table entry
6. Return index to per-process table entry as the file descriptor

File System Implementation
- Directory Implementation
- File Implementation:
  - Space Allocation
  - Free Space Management

Directory Implementation
How does directory get implemented on disk?
- Linear list of file entries
  - Problems?
- Sorted list:
  - Could keep entries sorted via B-tree
- Hash table
  - Hash on file name provides index into linear list
- In-memory directory cache
  - Recently accessed directories are cached

File Implementation
How is space allocated on disk for file contents?
- Contiguous allocation
- Linked allocation
- Indexed allocation

Contiguous Allocation
- Whole file allocated on contiguous set of blocks
- How to do direct access to block n?
- Advantages? Problems?
- Issues:
  - Fragmentation
  - File size estimation/growth
- Extent: Additional chunk of contiguous space added to file
  - File location: location of first block + no. of blocks, pointer to next extent

Linked Allocation
- Each block contains pointer to next block
- How to do direct access to block n?
- Advantage? Problem?
- Issues:
  - Direct access is inefficient
  - Space requirement for pointers: Can be mitigated with clustering
  - Reliability: what if a pointer is corrupted?

File Allocation Table (FAT)
- One entry for each block
- Directory entry of file contains index to first block
- Each entry contains index of next block
- Last entry contains EOF value, free entry contains 0
- Issues:
  - FAT must be cached in memory for efficiency
  - Can be large for large disks

Indexed Allocation
- Each file has an index block
  - Contains list of pointers to data blocks
- How to do direct access to block n?
- Advantage? Problem?
- What if index block size is:
  - Small?
  - Large?

Indexed Allocation: Supporting Larger Files
- Linked index:
  - Linked list of index blocks
- Multi-level index:
  - First-level index contains pointers to second-level index blocks
  - Like hierarchical paging
- Combined scheme:
  - Have a set of direct index blocks as well as indirect blocks (at different levels)

UNIX: Combined Indexed Allocation
[Figure: inode with file properties, direct data block pointers, and single, double, and triple indirect pointers leading to data blocks]

Selecting File Allocation Method
- Depends on:
  - File access method: Sequential vs. direct-access
  - File size: large vs. small
- System could support multiple allocation methods:
  - Specify access method at file creation time
  - File allocation method may be changed if access method changes
  - Small files (contiguous) changed to indexed as size grows

Free-Space Management
- What happens to blocks of a file that is deleted?
  - Added to a free-block list
  - Can be allocated to a different file
- How is the free-block list implemented?

Bit Vector
- Each block represented by a bit
  - Bit=1 => block=free
- Advantages:
  - Simple to implement
  - Finding free block is a hardware supported bit operation
- Disadvantages:
  - Need to keep in RAM for efficiency
  - Space may become an issue

Linked List
- Each free block contains pointer to next free block
- Similar issues as Linked Block Allocation
- May be ok if only one free block needed
- FAT keeps track of free blocks as well

Grouping and Counting
- Grouping: Each free block contains n block addresses:
  - n-1 free blocks
  - nth block contains another group of addresses
  - Advantage?
- Counting
  - Useful for contiguous block allocation
  - Keep a free block + count
  - Count = no. of contiguous free blocks
  - Similar to extent, run-length encoding
  - Could be stored in a B-tree
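The six steps on the File Open slide can be traced with a small in-memory model. This is only a sketch under stated assumptions: the class names `FCB`, `ProcessEntry`, and `FileSystem`, the dictionary standing in for the on-disk directory, and the list-based tables are all hypothetical illustrations, not how any particular kernel lays out these structures.

```python
from dataclasses import dataclass


@dataclass
class FCB:
    """Hypothetical file control block: just a name and an owner."""
    name: str
    owner: str = "root"


@dataclass
class ProcessEntry:
    """Hypothetical per-process open-file table entry."""
    system_index: int  # pointer (index) into the system-wide table
    mode: str          # access mode, e.g. "RW"
    offset: int = 0    # current file offset


class FileSystem:
    def __init__(self, directory):
        self.directory = directory  # name -> FCB, stands in for the directory structure
        self.system_table = []      # system-wide open-file table (list of FCBs)

    def open(self, proc_table, name, mode):
        # Step 1: search the system-wide open-file table for the FCB.
        index = next(
            (i for i, fcb in enumerate(self.system_table) if fcb.name == name),
            None,
        )
        if index is None:
            # Step 2: not found, so search the directory structure.
            fcb = self.directory[name]  # raises KeyError if the file does not exist
            # Step 3: copy the FCB into the system-wide table.
            self.system_table.append(fcb)
            index = len(self.system_table) - 1
        # Steps 4 and 5: new per-process entry with mode and offset initialised.
        proc_table.append(ProcessEntry(system_index=index, mode=mode, offset=0))
        # Step 6: the index into the per-process table is the file descriptor.
        return len(proc_table) - 1


fs = FileSystem({"file1": FCB("file1"), "file2": FCB("file2")})
proc_a, proc_b = [], []
fd_a = fs.open(proc_a, "file1", "RW")
fd_b = fs.open(proc_b, "file1", "R")   # reuses the FCB already in the system table
fd_a2 = fs.open(proc_a, "file2", "R")
```

Two processes opening the same file share one system-wide entry but each get their own descriptor, mode, and offset, which matches the split between the two tables described on the In-Memory File Tables slide.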
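For the combined indexed scheme on the UNIX inode slides, the question "how to do direct access to block n?" comes down to working out which pointer level covers n. The sketch below assumes 4096-byte blocks, 4-byte block pointers, and 12 direct pointers; the slides give none of these numbers, so treat them as illustrative parameters, and `locate_block` as a hypothetical helper name.

```python
BLOCK_SIZE = 4096    # assumed block size in bytes
PTR_SIZE = 4         # assumed size of one block pointer in bytes
NUM_DIRECT = 12      # assumed number of direct pointers in the inode
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE  # pointers that fit in one index block


def locate_block(n):
    """Return (level, indices) describing how to reach logical block n.

    `indices` lists the slot to follow at each level, outermost first.
    """
    if n < 0:
        raise ValueError("block number must be non-negative")
    if n < NUM_DIRECT:
        return ("direct", [n])
    n -= NUM_DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single", [n])
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return ("double", [n // PTRS_PER_BLOCK, n % PTRS_PER_BLOCK])
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        return (
            "triple",
            [
                n // PTRS_PER_BLOCK ** 2,
                (n // PTRS_PER_BLOCK) % PTRS_PER_BLOCK,
                n % PTRS_PER_BLOCK,
            ],
        )
    raise ValueError("block number exceeds maximum file size")


# Total number of data blocks addressable under these assumptions.
max_blocks = NUM_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3
```

Small files are reached with a single inode lookup through the direct pointers, while each indirect level costs one extra index-block read, which is the trade-off the combined scheme is designed around.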
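The File Allocation Table slide describes a table with one entry per block, where each entry holds the index of the next block. A minimal sketch follows, assuming a sentinel of -1 for the EOF value (the slide does not say which value is used), 0 for free entries as on the slide, and block 0 reserved so that 0 is never a valid "next block"; the function names are hypothetical.

```python
EOF = -1   # assumed end-of-file marker; the slide only says "EOF value"
FREE = 0   # free entries contain 0, as stated on the slide


def fat_chain(fat, first_block):
    """Return all blocks of a file by following FAT entries from first_block."""
    chain = []
    block = first_block
    while block != EOF:
        chain.append(block)
        block = fat[block]
    return chain


def nth_block(fat, first_block, n):
    """Direct access to logical block n: requires following n FAT entries."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("file has fewer blocks than requested")
    return block


def free_blocks(fat):
    """Blocks whose entry is FREE; block 0 is treated as reserved (assumption)."""
    return [i for i, entry in enumerate(fat) if entry == FREE and i != 0]


# An 8-block volume: one file occupies blocks 2 -> 5 -> 4.
fat = [0, 0, 5, 0, EOF, 4, 0, 0]
file_blocks = fat_chain(fat, first_block=2)
third = nth_block(fat, first_block=2, n=2)
available = free_blocks(fat)
```

Direct access still walks the chain, but the walk happens in the table rather than on data blocks, which is why the slide stresses that the FAT must be cached in memory.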
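The Bit Vector and Counting slides can be illustrated together. The sketch below stores the free-space map in a single Python integer with bit=1 meaning free, as on the slide, and derives the (first block, count) runs that the counting scheme would store. The class name `BitVector` and its methods are hypothetical, and the lowest-set-bit trick stands in for the hardware bit operation the slide mentions.

```python
class BitVector:
    """Free-space bit vector: bit b is 1 when block b is free."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = (1 << num_blocks) - 1  # every block starts out free

    def first_free(self):
        """Index of the lowest-numbered free block, or None if the disk is full."""
        if self.bits == 0:
            return None
        # bits & -bits isolates the lowest set bit.
        return (self.bits & -self.bits).bit_length() - 1

    def allocate(self):
        block = self.first_free()
        if block is None:
            raise RuntimeError("no free blocks")
        self.bits &= ~(1 << block)  # clear the bit: block is now in use
        return block

    def free(self, block):
        if not 0 <= block < self.num_blocks:
            raise ValueError("block out of range")
        self.bits |= 1 << block     # set the bit: block is free again

    def free_runs(self):
        """Counting representation: list of (first free block, run length)."""
        runs, start = [], None
        for b in range(self.num_blocks):
            is_free = (self.bits >> b) & 1
            if is_free and start is None:
                start = b
            elif not is_free and start is not None:
                runs.append((start, b - start))
                start = None
        if start is not None:
            runs.append((start, self.num_blocks - start))
        return runs


bv = BitVector(8)
allocated = [bv.allocate() for _ in range(3)]  # takes blocks 0, 1, 2
bv.free(1)                                     # block 1 is released
runs = bv.free_runs()
```

The run list is much shorter than the bit vector when free space is mostly contiguous, which is the case the Counting slide targets.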
Space Maps
- Useful for large disks
  - Divided into smaller chunks
- Each chunk has a space map
  - Maintained as a log of block activity on disk
  - Maintained as a balanced tree in memory: can be created by replaying the log

Caching and Buffering
- When a file is read in by a process:
  - Data may be accessed multiple times by the process
  - Data may be accessed by other processes
  - Going back to disk will be slow
- When a file is written by a process:
  - Data may be used again in the near future
  - Synchronous write to disk may be slow
- Would like to cache the file data
  - Buffer cache: Disk blocks are kept in memory

Memory Mapping
- A file is mapped to a region of virtual memory
- File I/O is done via reading/writing to memory region
- Page cache
  - File contents are cached in units of pages
  - Can be managed by virtual memory system

Unified Buffer Cache
- What if we have separate buffer cache and page cache?
- Where does data get cached for:
  - regular I/O (via read/write syscalls)?
  - memory-mapped file I/O (via virtual addresses)?
- Double caching: Same data is duplicated in both buffer and page cache
- Unified buffer cache: Single page cache for buffering all file data

Caching Issues
- How to partition memory between process pages and file system page cache?
  - Tradeoff between thrashing and I/O performance
- When to write data from cache to disk?
  - Synchronous: Written directly to disk in order
  - Asynchronous: Written from cache later, maybe out-of-order
- Cache replacement:
  - Sequential-access file: Should we use LRU?
  - Free-behind/read-ahead: free blocks as read, prefetch subsequent blocks

File System Recovery
- File system can get corrupted. How?
  - System crashes => inconsistent data/metadata, buffers may not have been flushed
  - Disk corruption: might lose part or all of data
  - Software bugs: in file system, kernel, device drivers
- Recovery Techniques:
  - Consistency Checking
  - Log-Structured File Systems
  - Backup and Restore

Consistency Checking
- Check if the FS is consistent after system crash
  - Repair inconsistencies if possible
  - E.g.: UNIX fsck, Windows chkdsk
- When to check for consistency?
  - Every boot, after n boots, set an inconsistency flag
- How to repair damage?
  - Depends on FS structure, storage management algorithms
  - E.g.: Can reconstruct file with FAT even if directory entry is lost, but not with indexed allocation if inode is lost

Journaling File Systems
- Examples: NTFS, ext3/ext4
- Use a transactional log for metadata updates
  - Inspired by databases
  - Transaction: unit of operations for a specific task
  - Transaction first written to disk and committed
  - Operations applied later: transaction can be removed upon successful completion
- Recovery: Apply incomplete transactions (committed to the log but not yet fully applied)
- Performance:
  - Transactions written sequentially to disk
  - Recovery requires only a few operations

Log-Structured File Systems
- The journal is the file system
  - No updates are made in place; data is written to new data blocks
- What happens to old data?
  - Can garbage collect or could maintain multiple versions
- Performance:
  - Write performance is good (sequential writes)
  - Reads may need a lot of seeks
- Recovery: journal is the file system

Log-Structured File Systems (Contd.)
- Updating metadata
  - Data block location is updated in inode
  - Modified inode is also written out to journal
  - Problem? What about directory entries?
- Avoiding recursive metadata updates
  - Maintain and update a centralized inode map
  - Use inode number (not pointer) in directory entries

Backup and Restore
- Handles disk failures/loss of data
- Full backup: Copy all file system state to a backup medium (e.g., a tape)
- Incremental backup: Changes from previous backup (incremental or full)
- Backups may be done periodically
  - Cycle of 1 full followed by N incremental
  - How big should N be?
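The write-then-commit-then-apply sequence on the Journaling File Systems slide can be sketched with a toy store. Everything here is an assumption made for illustration: the class `JournaledStore`, the tuple record format, and the dictionary standing in for in-place metadata do not correspond to the on-disk format of NTFS or ext3/ext4.

```python
class JournaledStore:
    """Toy metadata store with a sequential journal (hypothetical format)."""

    def __init__(self):
        self.disk = {}     # in-place metadata: key -> value
        self.journal = []  # sequential log of ("write", txid, key, value) / ("commit", txid)

    def log(self, txid, ops, commit=True):
        """Append a transaction's operations to the journal, then its commit record.

        commit=False simulates a crash before the commit record reached disk.
        """
        for key, value in ops:
            self.journal.append(("write", txid, key, value))
        if commit:
            self.journal.append(("commit", txid))

    def apply(self, txid):
        """Apply a committed transaction in place, then remove it from the journal."""
        for record in self.journal:
            if record[0] == "write" and record[1] == txid:
                _, _, key, value = record
                self.disk[key] = value
        self.journal = [r for r in self.journal if r[1] != txid]

    def recover(self):
        """After a crash: drop uncommitted transactions, replay committed ones."""
        committed = [r[1] for r in self.journal if r[0] == "commit"]
        self.journal = [r for r in self.journal if r[1] in committed]
        for txid in committed:
            self.apply(txid)


store = JournaledStore()
store.log(1, [("inode7.size", 4096), ("inode7.blocks", 1)])  # committed, not yet applied
store.log(2, [("inode9.size", 512)], commit=False)           # crash before commit
store.recover()
```

Recovery only scans the journal, not the whole volume, which is the contrast with consistency checking that the Performance bullets point at.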
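The question "how big should N be?" on the Backup and Restore slide is easier to reason about with the restore chain written out. The sketch assumes one backup per day, day 0 being a full backup, and each incremental holding changes since the previous backup (full or incremental), as the slide defines it; `restore_chain` is a hypothetical helper.

```python
def restore_chain(day, n_incremental):
    """Backups that must be replayed, in order, to restore the state as of `day`.

    Assumes daily backups in cycles of 1 full followed by n_incremental incrementals.
    """
    cycle = n_incremental + 1
    full_day = day - day % cycle  # most recent full backup on or before `day`
    chain = [("full", full_day)]
    chain += [("incremental", d) for d in range(full_day + 1, day + 1)]
    return chain


short_cycle = restore_chain(day=9, n_incremental=2)   # fulls on days 0, 3, 6, 9
long_cycle = restore_chain(day=9, n_incremental=6)    # fulls on days 0, 7
```

A larger N means fewer expensive full backups but a longer chain to replay, and a restore fails if any backup in the chain is unreadable.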