Download Notes - CSE Labs User Home Pages

File System Implementation
CSCI 5103
Operating Systems
Instructor: Abhishek Chandra

File System Layout and Structure
File System Data Structures
Storage Allocation and Management
File System Recovery

File System Implementation
How are file systems implemented on underlying storage?
  Assume disk as the main storage medium
Issues:
  How is the data organized on disk?
  Where is file system information kept?
  How to achieve space, performance efficiency?

Layered File System Structure
Logical file system: Manages file system metadata
  File organization, e.g., directories
  File control block (FCB): File info, e.g., inode
File-organization module: Translates logical file blocks to physical blocks
  Free space management and block allocation
Basic file system: Performs I/O on storage device
  Issues read/write requests to disk
  Manages file buffers and caches
I/O control: Device drivers and interrupt handlers
  Handles hardware-level I/O operations

File System Data Structures
File system metadata:
  Information about disk partitions, directories, files, data
File system maintains several data structures
  On disk: Persistent structures
  In-memory: For quick access, caching

On-Disk Structures
Boot control block
  Used to boot an OS from disk
  First block of volume
Volume control block
  Contains information about volume
  E.g.: volume size, block size, free blocks, free FCBs
  UNIX superblock
Directory structure
  File names, file ids (e.g., inode numbers)
Per-file FCB:
  Contains file metadata

In-Memory Structures
Mount table:
  Information about mounted volumes
Directory cache:
  Info about recently accessed directories
System open-file table:
  FCBs of all open files
Per-process open-file table:
  Pointers to system open-file table entries
Buffers:
  Data being read/written from/to disk

File System Structures
Volume-level:
  Mount Table
File-level:
  File Control Block
  In-memory File Tables

Mount Table
In-memory mount table
  Contains entries for mounted file systems
  Each entry contains pointer to file system, type
  Windows: Entries for each drive (C:, D:, etc.)
  UNIX: In-memory inode of mount point directory is flagged, contains pointer to mount table entry
What does OS do while mounting a volume?
  Verifies that device contains valid file system
  Adds an entry in the in-memory mount table

File Control Block
Contains metadata about file
  File properties: owner, mode, timestamps, etc.
  Pointers to disk blocks containing file data
Example: inode on UNIX systems
Directory can be treated as special file
  Can have its own FCB
  Will contain list of <filename, FCB>

Inode Structure
[Figure: an inode holding file properties (owner, mode, timestamps, ...), direct data block pointers, a single indirect pointer, a double indirect pointer, and a triple indirect pointer, each leading to data blocks]

In-Memory File Tables
System-wide open-file table:
  Contains FCBs for all open files in the system
Per-process open-file table: Each entry contains
  Pointer to system-wide open-file table entry for file
  Other info: File offset, Access mode
File descriptor/file handle: Index into per-process open-file table
  Process uses this handle for all operations to the file

UNIX: In-Memory File Tables
[Figure: per-process open-file tables for Proc A and Proc B (slots 0 to 4), a system file table with entries "RW, offset=10" and "R, offset=100", and an in-memory inode table holding the File1 inode and the File2 inode]

File Open
What happens when user calls:
  open("file1", "RW")?
Several steps:
  1. Search system-wide open-file table for file1 FCB
  2. If not found, search directory structure (cached in memory or from disk)
  3. Copy file1 FCB into system-wide open-file table
  4. Create a new entry in per-process table pointing to file1 FCB in system table
  5. Initialize mode (RW), file offset in per-process table entry
  6. Return index to per-process table entry as the file descriptor

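The six open steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not how any real kernel does it: the class names `FCB`, `ProcessEntry`, and `FileSystem`, the `open_count` field, and the use of a plain list as the per-process table are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FCB:
    """In-memory copy of a file control block (e.g., an inode)."""
    name: str
    open_count: int = 0   # hypothetical field: number of opens across all processes


@dataclass
class ProcessEntry:
    """One entry of a per-process open-file table."""
    fcb: FCB              # pointer to the system-wide table entry
    mode: str
    offset: int = 0


class FileSystem:
    def __init__(self, directory):
        self.directory = directory   # name -> FCB, stands in for the on-disk directory
        self.system_table = {}       # system-wide open-file table: name -> FCB

    def open(self, process_table, name, mode):
        fcb = self.system_table.get(name)        # step 1: search system-wide table
        if fcb is None:
            fcb = self.directory[name]           # step 2: search directory (KeyError if absent)
            self.system_table[name] = fcb        # step 3: copy FCB into system-wide table
        fcb.open_count += 1
        process_table.append(ProcessEntry(fcb, mode))   # steps 4 and 5: new entry, offset 0
        return len(process_table) - 1            # step 6: index is the file descriptor
```

Two processes that open the same file get separate per-process entries (and separate offsets) that point to one shared FCB, which is the sharing shown in the UNIX file-table figure.
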
File System Implementation
Directory Implementation
File Implementation:
  Space Allocation
  Free Space Management

Directory Implementation
How does directory get implemented on disk?
Linear list of file entries
  Problems?
Sorted list:
  Could keep entries sorted via B-tree
Hash table
  Hash on file name provides index into linear list
In-memory directory cache
  Recently accessed directories are cached

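To make the difference between the linear list and the hash table concrete, here is a small sketch. The `Directory` class and its method names are hypothetical; a Python dict stands in for the hash table that maps a file name to a position in the linear list.

```python
class Directory:
    """Linear list of (file name, file id) entries plus a hash index over it."""

    def __init__(self):
        self.entries = []   # linear list, in creation order
        self.index = {}     # hash table: file name -> position in self.entries

    def add(self, name, file_id):
        if name in self.index:
            raise FileExistsError(name)
        self.index[name] = len(self.entries)
        self.entries.append((name, file_id))

    def lookup_linear(self, name):
        # Scans every entry in the worst case: cost grows with directory size.
        for entry_name, file_id in self.entries:
            if entry_name == name:
                return file_id
        raise FileNotFoundError(name)

    def lookup_hashed(self, name):
        # One hash probe gives the index into the linear list.
        if name not in self.index:
            raise FileNotFoundError(name)
        return self.entries[self.index[name]][1]
```

Both lookups return the same file id; the linear scan is the likely answer to the "Problems?" prompt for large directories.
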
File Implementation
How is space allocated on disk for file contents?
  Contiguous allocation
  Linked allocation
  Indexed allocation

Contiguous Allocation
Whole file allocated on contiguous set of blocks
  How to do direct access to block n?
  Advantages?
  Problems?
Issues:
  Fragmentation
  File size estimation/growth
Extent: Additional chunk of contiguous space added to file
  File location: location of first block + no. of blocks, pointer to next extent

Linked Allocation
Each block contains pointer to next block
  How to do direct access to block n?
  Advantage?
  Problem?
Issues:
  Direct access is inefficient
  Space requirement for pointers: Can be mitigated with clustering
  Reliability: what if a pointer is corrupted?

File Allocation Table (FAT)
One entry for each block
Directory entry of file contains index to first block
Each entry contains index of next block
Last entry contains EOF value, free entry contains 0
Issues:
  FAT must be cached in memory for efficiency
  Can be large for large disks

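A short sketch of how a FAT chain is followed may help here. The table is modeled as a Python list with one entry per block; `FREE = 0` follows the slide, while the `EOF = -1` marker and the function names are assumptions chosen for the example. Entry 0 is treated as reserved so that the value 0 can unambiguously mean "free".

```python
FREE = 0    # free entry, as on the slide
EOF = -1    # end-of-file marker (the actual on-disk value is an assumption)


def file_blocks(fat, first_block):
    """Follow the chain from the directory entry's first block until EOF."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def nth_block(fat, first_block, n):
    """Direct access to logical block n: walk n table entries, no data-block reads."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("block index past end of file")
    return block


def free_blocks(fat):
    """The same table doubles as the free-block list."""
    return [i for i, nxt in enumerate(fat) if nxt == FREE]


# A file whose chain is 2 -> 5 -> 3; entry 0 is reserved.
fat = [EOF, FREE, 5, EOF, FREE, 3, FREE, FREE]
```

With this table, `file_blocks(fat, 2)` walks blocks 2, 5, 3. Direct access still needs n steps, but they are steps through the in-memory table rather than disk reads, which is why the FAT should be cached.
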
Indexed Allocation
Each file has an index block
  Contains list of pointers to data blocks
  How to do direct access to block n?
  Advantage?
  Problem?
What if index block size is:
  Small?
  Large?

Indexed Allocation: Supporting Larger Files
Linked index:
  Linked list of index blocks
Multi-level index:
  First-level index contains pointers to second-level index blocks
  Like hierarchical paging
Combined scheme:
  Have a set of direct index blocks as well as indirect blocks (at different levels)

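The combined scheme can be illustrated by computing which pointer a given logical block number falls under. The sketch below assumes 12 direct pointers and 1024 pointers per index block (for instance 4 KiB blocks with 4-byte pointers); both numbers and the function names are assumptions, not values from the slides.

```python
def locate(block_no, n_direct=12, ptrs_per_block=1024):
    """Return (level, path) for a logical block number.

    level 0 means a direct pointer; level k means the k-level indirect pointer.
    path lists the index used at each step (one entry per index block visited,
    or the direct-pointer slot for level 0).
    """
    if block_no < n_direct:
        return 0, [block_no]
    block_no -= n_direct
    for level in (1, 2, 3):
        span = ptrs_per_block ** level      # data blocks reachable at this level
        if block_no < span:
            path = []
            for _ in range(level):
                span //= ptrs_per_block
                path.append(block_no // span)
                block_no %= span
            return level, path
        block_no -= span
    raise ValueError("block number exceeds maximum file size")


def max_file_blocks(n_direct=12, ptrs_per_block=1024):
    """Largest file, in blocks, that the combined scheme can address."""
    return n_direct + sum(ptrs_per_block ** k for k in (1, 2, 3))
```

Small files touch only direct pointers, so they need no extra index-block reads, while large files pay one extra read per indirection level. This is the same trade-off as hierarchical paging.
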
UNIX: Combined Indexed Allocation
[Figure: an inode holding file properties, direct data block pointers, a single indirect pointer, a double indirect pointer, and a triple indirect pointer, each leading to data blocks]

Selecting File Allocation Method
Depends on:
  File access method: Sequential vs. direct-access
  File size: large vs. small
System could support multiple allocation methods:
  Specify access method at file creation time
  File allocation method may be changed if access method changes
  Small files (contiguous) changed to indexed as size grows

Free-Space Management
What happens to blocks of a file that is deleted?
  Added to a free-block list
  Can be allocated to a different file
How is the free-block list implemented?

Bit Vector
Each block represented by a bit
  Bit=1 => block=free
Advantages:
  Simple to implement
  Finding free block is a hardware supported bit operation
Disadvantages:
  Need to keep in RAM for efficiency
  Space may become an issue

Linked List
Each free block contains pointer to next free block
  Similar issues as Linked Block Allocation
  May be ok if only one free block needed
FAT keeps track of free blocks as well
Grouping: Each free block contains n block addresses:
  n-1 free blocks
  nth block contains another group of addresses
  Advantage?

Counting
Useful for contiguous block allocation
Keep a free block + count
  Count = no. of contiguous free blocks
  Similar to extent, run-length encoding
  Could be stored in a B-tree

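The bit vector and the counting representation describe the same free-space information, which the following sketch tries to show. A single Python integer stands in for the bit vector, and the `bits & -bits` expression plays the role of the hardware "find first set bit" instruction. The class and method names are hypothetical.

```python
class BitVector:
    """Free-space bit vector: bit i = 1 means block i is free."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = (1 << n_blocks) - 1      # all blocks start out free

    def allocate(self):
        """Allocate the lowest-numbered free block."""
        if self.bits == 0:
            raise OSError("no free blocks")
        lowest = self.bits & -self.bits      # isolate the lowest set bit
        self.bits ^= lowest                  # clear it: block is now in use
        return lowest.bit_length() - 1

    def free(self, block):
        self.bits |= 1 << block

    def is_free(self, block):
        return (self.bits >> block) & 1 == 1

    def runs(self):
        """Counting representation: list of (first free block, count)."""
        out = []
        start = None
        for b in range(self.n_blocks + 1):
            free = b < self.n_blocks and self.is_free(b)
            if free and start is None:
                start = b                     # a run of free blocks begins
            elif not free and start is not None:
                out.append((start, b - start))
                start = None
        return out
```

After allocating blocks 0, 1, and 2 of an 8-block disk and then freeing block 1, `runs()` reports one run of length 1 at block 1 and one run of length 5 at block 3. The run list stays short when free space is mostly contiguous, which is when counting pays off.
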
Space Maps
Useful for large disks
Divided into smaller chunks
Each chunk has a space map
  Maintained as a log of block activity on disk
  Maintained as a balanced tree in memory: can be created by replaying the log

Caching and Buffering
When a file is read in by a process:
  Data may be accessed multiple times by the process
  Data may be accessed by other processes
When a file is written by a process:
  Data may be used again in the near future
  Synchronous write to disk may be slow
Would like to cache the file data
  Going back to disk will be slow
Buffer cache
  Disk blocks are kept in memory

Memory Mapping
A file is mapped to a region of virtual memory
File I/O is done via reading/writing to memory region
Page cache
  File contents are cached in units of pages
  Can be managed by virtual memory system

Unified Buffer Cache
What if we have separate buffer cache and page cache? Where does data get cached for:
  regular I/O (via read/write syscalls)?
  memory-mapped file I/O (via virtual addresses)?
Double caching: Same data is duplicated in both buffer and page cache
Single page cache for buffering all file data

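Memory-mapped file I/O can be tried directly from user space. The sketch below uses Python's standard `mmap` module to change a file through ordinary memory writes and then reads it back with a regular `read` call; the helper names `uppercase_prefix` and `demo` and the sample contents are made up for the example.

```python
import mmap
import os
import tempfile


def uppercase_prefix(path, n):
    """Modify the first n bytes of a file in place through a memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:   # length 0 maps the whole file
            region[:n] = region[:n].upper()        # a memory write, no write() syscall
            region.flush()                         # push dirty pages back to the file


def demo():
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello, file system")
        uppercase_prefix(path, 5)
        with open(path, "rb") as f:                # regular I/O sees the mapped change
            return f.read()
    finally:
        os.remove(path)
```

That the regular read observes the update made through the mapping is the behavior a unified cache is meant to provide without keeping two copies of the data.
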
Caching Issues
How to partition memory between process pages and file system page cache?
  Tradeoff between thrashing and I/O performance
When to write data from cache to disk?
  Synchronous: Written directly to disk in order
  Asynchronous: Written from cache later, maybe out-of-order
Cache replacement:
  Sequential-access file: Should we use LRU?
  Free-behind/read-ahead: free blocks as read, prefetch subsequent blocks

File System Recovery
File system can get corrupted. How?
  System crashes => inconsistent data/metadata, buffers may not have been flushed
  Disk corruption: might lose part or all of data
  Software bugs: in file system, kernel, device drivers
Recovery Techniques:
  Consistency Checking
  Log-Structured File Systems
  Backup and Restore

Consistency Checking
Check if the FS is consistent after system crash
  Repair inconsistencies if possible
  E.g.: UNIX fsck, Windows chkdsk
When to check for consistency?
  Every boot, after n boots, set an inconsistency flag
How to repair damage?
  Depends on FS structure, storage management algos
  E.g.: Can reconstruct file with FAT even if dir entry is lost, but not with indexed alloc if inode lost

Journaling File Systems
Examples: NTFS, ext3/ext4
Use a transactional log for metadata updates
  Inspired from Databases
  Transaction: unit of operations for a specific task
  Transaction first written to disk and committed
  Operations applied later: transaction can be removed upon successful completion
  Recovery: Apply incomplete transactions
Performance:
  Transactions written sequentially to disk
  Recovery requires only a few operations

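The journaling sequence (commit to the log, apply later, redo after a crash) can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: the `JournaledStore` class, the dict-based "metadata", the key names, and the `crash_midway` flag that simulates a crash partway through applying a transaction. Updates are stored as absolute values so that redoing one is harmless.

```python
class JournaledStore:
    """Toy metadata store with a write-ahead journal."""

    def __init__(self):
        self.journal = []     # stands in for the sequential on-disk log
        self.metadata = {}    # stands in for metadata structures updated in place

    def commit(self, updates):
        """Write the whole transaction to the log; from here on it is durable."""
        self.journal.append(dict(updates))

    def apply_next(self, crash_midway=False):
        """Apply the oldest committed transaction to the in-place structures."""
        txn = self.journal[0]
        for i, (key, value) in enumerate(txn.items()):
            if crash_midway and i == 1:
                return False               # simulated crash: txn stays in the journal
            self.metadata[key] = value
        self.journal.pop(0)                # completed: remove it from the log
        return True

    def recover(self):
        """After a crash, redo every transaction still in the journal."""
        while self.journal:
            self.apply_next()
```

If a crash interrupts `apply_next`, the metadata is briefly half-updated, but the committed transaction is still in the journal, so `recover` only has to replay the few entries that remain instead of scanning the whole disk.
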
Log-structured File Systems
The journal is the file system
  No updates are made in place: data written to new data blocks
What happens to old data?
  Can garbage collect or could maintain multiple versions
Performance:
  Write performance is good (sequential writes)
  Reads may need lot of seeks
  Recovery: journal is the file system

Log-structured File Systems (Contd.)
Updating metadata
  Data block location is updated in inode
  Modified inode is also written out to journal. Problem?
  What about directory entries?
Avoiding recursive metadata updates
  Maintain and update a centralized inode map
  Use inode number (not pointer) in directory entries

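The role of the inode map is easier to see in a toy model. In the sketch below the log is a Python list, an inode is just a list of data-block log addresses, and the names `LogFS`, `write_block`, and `live_addresses` are hypothetical. It supports only overwriting an existing block or appending the next one.

```python
class LogFS:
    """Toy log-structured store: every write appends, nothing is updated in place."""

    def __init__(self):
        self.log = []          # append-only records: ("data", bytes) or ("inode", [addrs])
        self.inode_map = {}    # inode number -> log address of the latest inode copy
        self.directory = {}    # file name -> inode number (not a log address)

    def _append(self, record):
        self.log.append(record)
        return len(self.log) - 1

    def create(self, name, inode_no):
        self.directory[name] = inode_no
        self.inode_map[inode_no] = self._append(("inode", []))

    def write_block(self, name, index, data):
        inode_no = self.directory[name]
        blocks = list(self.log[self.inode_map[inode_no]][1])   # copy: old inode stays in log
        data_addr = self._append(("data", data))               # new data goes to the log tail
        if index == len(blocks):
            blocks.append(data_addr)
        elif index < len(blocks):
            blocks[index] = data_addr
        else:
            raise IndexError("sketch supports overwrite or append only")
        # The modified inode is appended too; only the inode map entry changes.
        self.inode_map[inode_no] = self._append(("inode", blocks))

    def read_block(self, name, index):
        blocks = self.log[self.inode_map[self.directory[name]]][1]
        return self.log[blocks[index]][1]

    def live_addresses(self):
        """Log addresses still reachable; everything else is garbage or an old version."""
        live = set()
        for addr in self.inode_map.values():
            live.add(addr)
            live.update(self.log[addr][1])
        return live
```

Because the directory stores an inode number rather than the inode's location, rewriting a block moves the data and the inode but leaves the directory entry untouched, so the update does not ripple up to the root.
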
Backup and Restore
Handles disk failures/loss of data
Full backup: Copy all file system state to a backup medium (e.g., a tape)
Incremental backup: Changes from previous backup (incremental or full)
Backups may be done periodically
  Cycle of 1 full followed by N incremental
  How big should N be?
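
One way to think about the choice of N is to look at what a restore has to do. In this sketch each backup is a dict from path to contents, and a value of `None` marks a file deleted since the previous backup; that representation and the function name `restore` are assumptions for illustration.

```python
def restore(full, incrementals):
    """Rebuild file system state from one full backup plus incrementals, oldest first."""
    state = dict(full)
    for inc in incrementals:
        for path, contents in inc.items():
            if contents is None:
                state.pop(path, None)    # file was deleted after the previous backup
            else:
                state[path] = contents   # file was created or changed
    return state
```

A restore late in the cycle must read the full backup and up to N incrementals, so a larger N makes each backup cheaper but makes restores slower and dependent on more media.
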