Download Chapter 11 File System Implementation

Document related concepts

Asynchronous I/O wikipedia , lookup

Lustre (file system) wikipedia , lookup

File system wikipedia , lookup

B-tree wikipedia , lookup

Design of the FAT file system wikipedia , lookup

File Allocation Table wikipedia , lookup

Disk formatting wikipedia , lookup

XFS wikipedia , lookup

Computer file wikipedia , lookup

File locking wikipedia , lookup

Files-11 wikipedia , lookup

Transcript
Chapter 11
File System Implementation
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Outline
•File-System Structure
•File-System Implementation
•Directory Implementation
•Allocation Methods
•Free-Space Management
•Efficiency and Performance
•Recovery
•Log-Structured File Systems
•NFS
•Example: WAFL File System
NCHU System & Network Lab
11.1 File-System Structure
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
File-System Structure
• File system: provide efficient and convenient access to disk
• File structure
– Logical storage unit
– Collection of related information
• I/O transfers between memory and disk are performed in units
of blocks
– one or more sectors
• Two design problems
– How the file system should look to the user
•File and its attributes, operations on a file……
– Map the logical file system onto the disk
NCHU System & Network Lab
Layered File System: File system
organized into layers
NCHU System & Network Lab
Layered File System
file name
Application Programs
write(data_file, item)
Logical File System
information
of data_file
Given a symbolic file name,
use directory to provide
values needed by FOM
Transform logical address to
File-Organization Module
physical block address
2144th logical block
Basic File System
driver 1, cylinder 73,
surface 2, sector 10
I/O Control
Devices
NCHU System & Network Lab
Numeric disk address:
Driver: hardware instructions
Layered File System
•I/O control –device drivers and interrupt handlers
–Transfer information between main memory and disk
system
–Retrieve block 123  HW-specific instructions
•Basic file system
–Issue generic commands to device driver to read and write
physical blocks on the disk
–Physical block is identified as: drive 1, cylinder 73, track 2,
sector 10
NCHU System & Network Lab
Layered File System (Cont.)
•File-organization module
–Know about files, their logical blocks, and physical blocks
–Translate logical blocks to physical blocks (similar to VM)
•Logical blocks: 0 –N
–Free-space manager: track unallocated blocks
–Blocks allocation: allocated free blocked when requested
•Logical file system –manage metadata information
–Metadata: file-system structure, excluding the actual file
contents
–Manage the directory structure via file control blocks
(FCB)
–File control block –storage structure consisting of
information about a file
•Ownership, permissions, and location of the file content
NCHU System & Network Lab
A Typical File Control Block
NCHU System & Network Lab
Layered File System (Cont.)
•Why Layered file system?
–All the advantages of the layered approach
–File system standard: UFS (Unix File System),
FAT FAT32, NTFS…
–Duplication of code is minimized for different file
system standard
–Usually I/O control and the basic file system code
can be used by multiple file system formats.
NCHU System & Network Lab
11.2 File System Implementation
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
On-Disk Structures
• Boot control block: information needed by the system to boot
an OS from that partition
– UFS: called boot block; NTFS: called partition boot sector
• Volume control block: volume details
– No. of blocks, size of the blocks, free-block count and free-block
pointers, free FCB count and FCB pointers
– UFS: called superblock; NTFS: called Master File Table
• A directory structure is used to organize the files
– UFS: includes file names and inode numbers
– NTFS: in the Master File Table
• A per-file File control block: many of the file’
s details
– File permissions, ownership, size, location of the data blocks
– UFS: called inode; NTFS: within the Master File Table
NCHU System & Network Lab
A Possible File System Layout
1
2
3
1. Boot OS from the partition
2. Volume control block
3. Directory structure NCHU System & Network Lab
In-Memory Information
•Used for FS management and performance via
caching
–An in-memory mount table containing
information about each mounted volume
–An in-memory directory-structure cache
•Hold the directory information of recently accessed
directories
–The system-wide open-file table (Chapter 10)
•A copy of FCB of each open file
–The per-process open-file table (Chapter 10)
NCHU System & Network Lab
In-Memory File System Structures
•The following figure illustrates the necessary
file system structures provided by the
operating systems.
•Figure (a) refers to opening a file.
•Figure (b) refers to reading a file.
NCHU System & Network Lab
In-Memory File System Structures
File Open
File Read
NCHU System & Network Lab
Partitions and Mounting
•A disk can be sliced into multiple partitions
•A volume can span multiple partitions on multiple
disks
•Each partition can be
–Raw: contain no file system
•E.g., swap space
–Cooked: contain a file system
•Root partition: contains OS and other system files
–Mounted at boot time
•Other partitions are mounted at boot time or manually
NCHU System & Network Lab
Virtual File Systems
•Virtual File Systems (VFS) provide an objectoriented way of implementing file systems.
•Two functions
–VFS separates file-system-generic operations from their
implementation
•By defining a clean VFS interface
–VFS provides a mechanism for uniquely representing a file
throughput a network
•VFS is based on a file-representation structure, called a vnode, that
contains a numerical designator for a network-wide unique file
NCHU System & Network Lab
Schematic View of Virtual File System
Open, read, write…
NCHU System & Network Lab
Virtual File Systems (Cont.)
•VFS allows the same system call interface (the
API) to be used for different types of file
systems.
•The API is to the VFS interface, rather than
any specific type of file system.
NCHU System & Network Lab
11.3 Directory Implementation
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Directory Implementation
• Linear list of file names with pointer to the data blocks.
– Simple to program
– Time-consuming to execute- linear search to find a particular entry
•Cache and sorted list may help
• Hash Table –linear list with hash data structure.
– (file name) => hash function => a pointer to the linear list
– Decreases directory search time
– Collisions –situations where two file names hash to the same location
– Fixed size and the dependence of the hash function on that size
•If hash table has only 64 entries, hash function is modulo 64
•If we want to enlarge hash table
– Hash table becomes 128 entries, hash function must be changed
– Or use a chained-overflow hash table, each entry is a linked list
NCHU System & Network Lab
11.4 Allocation Methods
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Allocation Methods
•How to allocate space to files ?
–So that disk space is utilized effectively and files can be
accessed quickly
•An allocation method refers to how disk blocks are
allocated for files:
–Contiguous allocation
•Extend-based system
–Linked allocation
–Indexed allocation
NCHU System & Network Lab
Contiguous Allocation
• Each file occupies a set of contiguous blocks on the disk
• Simple –only starting location (block #) and length (number
of blocks) are required in the directory entry (FCB)
• Good
– Fast -- Minimal seek time and head movement
– Support both sequential and direct access
•If direct access to block i of a file that starts at block b, access block b+i
• Bad
– Finding space for a new file, similar to dynamic storage-allocation
problem
•Request n blocks from a list of free holes, can use first fit, best fit……
– External fragmentation
•compaction (expensive)
– When creating a file, may need to determine how much space is needed
for a file
– Files are difficult to grow
NCHU System & Network Lab
Contiguous Allocation of Disk Space
NCHU System & Network Lab
(a) Contiguous allocation of disk space for 7 files
(b) State of the disk after files D and F have been removed
NCHU System & Network Lab
Extent-Based Systems
•Many newer file systems (I.e. Veritas File System)
use a modified contiguous allocation scheme
•Extent-based file systems allocate disk blocks in
extents
•An extent is a contiguous block of disks
–Extents are allocated for file allocation
–A file consists of one or more extents.
•Integrate contiguous allocation and linked allocation
NCHU System & Network Lab
Linked Allocation
•Each file is a linked list of disk blocks:
–Blocks may be scattered anywhere on the disk.
–Directory contains a pointer to the first and last blocks
–Each block contains a pointer to the next block
•Advantages
–No external fragmentation
–Easy to grow –Any free block is OK
–Free-space management system –no waste of space
block
=
pointer
NCHU System & Network Lab
Linked Allocation
NCHU System & Network Lab
Storing a file as a linked list of disk blocks
NCHU System & Network Lab
Linked Allocation (Cont.)
•Disadvantages
–Effectively for only sequential-access file
•If find ith block, must start at the beginning and follow
the pointer
–Space required for the pointers
–Reliability –What if the pointers are lost
NCHU System & Network Lab
Linked Allocation (Cont.)
•Solution for spaces for pointers
–Collect blocks into clusters, and allocate the clusters than
blocks
–Fewer disk head seeks and decreases the space needed for
block allocation and free-list management
–Internal fragmentation
•Solution for reliability
–Double linked list or store the filename and relative block
number in each block
•More overhead for each file
NCHU System & Network Lab
Linked Allocation (Cont.)
•FAT (File Allocation Table)
–OS/2, MS-DOS
–The table has one entry for each disk block and is indexed
by block number
•Similar to the linked list
•Contain the block number of the next block in the file
–Significant number of disk head seeks
•One for FAT, one for data
•Improved by caching FAT
•Random access time is improved
–By reading the FAT
NCHU System & Network Lab
File-Allocation Table
把Pointer集中放置
於FAT,而不是跟
Data Block放一起
NCHU System & Network Lab
Indexed Allocation
•Bring all pointers together into the index block
–An array of disk-block addresses
–The ith entry points to the ith block of the file
–The directory contains the address of the index
block
–Similar to the paging scheme for memory
management
•Logical view.
index table
NCHU System & Network Lab
Example of Indexed Allocation
NCHU System & Network Lab
Indexed Allocation (Cont.)
•Advantage
–Support direct access
–Without external fragmentation
–Easy to create a file (no size-declaration problem)
•Disadvantage
–Wasted space: space for index block
•Worse than the linked allocation for small files
•How large the index block should be
–Large index block: waste space for small files
–Small index block: how to handle large files
NCHU System & Network Lab
Indexed Allocation (Cont.)
•Mechanism for handling the index block
–Linked scheme: link together several index blocks
–Multilevel index: like multi-level paging
•With 4096-byte blocks, we could store 1024 4-byte
pointers in an index block.
•Two levels of indexes allows 1,048,576 data blocks,
which allow a file of up to 4 gigabytes
–Combined scheme: for example BSD UNIX
System
NCHU System & Network Lab
Linked Scheme
directory
file
first index block
jeep
19

next
19
next
next
data
data
data
data
data
data
data
data
data
data
data
data
24
NCHU System & Network Lab
8
Multilevel Index (Two-Level)
NCHU System & Network Lab
Combined Scheme: UNIX (4K bytes
Per Block)
NCHU System & Network Lab
11.5 Free Space Management
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Free-Space Management
•Free-space list: used to keep track of free disk space
–bit vector (bit map)
•One bit for a block (1: free, 0: allocated)
–linked list: each free block points to the next
–grouping:
–counting:
NCHU System & Network Lab
Bit Vector
–block size = 212 bytes
–disk size = 230 bytes (1 gigabyte)
–n = 230/212 = 218 bits (or 32K bytes)

•Simple and efficient to find the first free block, or
consecutive free blocks
01 2
n-1
•Requires extra space: example:
…
bit[i] =
0  block[i] free
1  block[i] occupied
•Efficient only when the entire vector is kept in main
memory
–Write back to the disk occasionally for recovery needs
NCHU System & Network Lab
Linked List
•Link together all free blocks
•Keep a pointer to the first free block in a special
location on the disk and caching it in memory
•Good
–No waste of space
•Bad
–Cannot get contiguous space easily
–Not easy to traverse the list (infrequent action)
•Fortunately, OS needs one free block at a time
•Just find the first free block, no traversal
•FAT incorporate the linked list mechanism
NCHU System & Network Lab
Linked Free Space List on Disk
NCHU System & Network Lab
Grouping And Counting
•Grouping:
–Store the address of n free blocks in the first free block.
–The first n-1 are actually free.
–The final block contains the addresses of another n free
blocks…and so on
–Good: find a large number of free blocks quickly
•Counting:
–Keep the address of the first free blocks and the number n
of free contiguous blocks
–Since several contiguous blocks may be allocated or freed
simultaneously
NCHU System & Network Lab
Example Of Free-Space
Management
Bit Vector
11000011000000111001111110001111
Grouping (n = 3)
Block 2 stores (3, 4, 5)
Block 5 stores (8, 9, 10)
Block 10 stores (11, 12, 13)
Block 13 stores (17, 28, 25)
Block 25 stores (26, 27, -1)
Counting
2 4
8 6
17 2
25 3
NCHU System & Network Lab
11.6 Efficiency and Performance
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Efficiency and Performance
•Efficiency dependent on:
–Disk allocation and directory algorithms
–Types of data kept in file’
s directory entry
•If “
last access date”changed, directory entry must be modified
•Performance
–On-board cache –local memory in disk controller to store
entire tracks at a time
–Buffer cache –separate section of main memory for
frequently used blocks
–Synchronous writes v.s. asynchronous writes
–Free-behind and read-ahead –techniques to optimize
sequential access
NCHU System & Network Lab
Page Cache
•A page cache caches pages rather than disk blocks
using virtual memory techniques
•Memory-mapped I/O uses a page cache
•Routine I/O through the file system uses the buffer
cache
•This leads to the following figure
•Problem: double caching
–Data may be cached in both buffer cache and page cache
•Buffer cache: for using ordinary file operations
•Page cache: for using memory-mapped file
•Sol: unified buffer cache
NCHU System & Network Lab
I/O Without a Unified Buffer Cache
NCHU System & Network Lab
Unified Buffer Cache
•A unified buffer cache uses the same cache to
cache both memory-mapped pages and
ordinary file system I/O
NCHU System & Network Lab
I/O Using a Unified Buffer Cache
NCHU System & Network Lab
Free-Behind and Read-Ahead
•Free-behind
–Remove a page from the buffer as soon as the next page is
requested
–The previous pages are not likely to be used again and
waste buffer space
•Read-ahead
–A requested page and several subsequent pages are read
and cached
–These pages are likely to be requested soon
•Both are applied to sequential access
NCHU System & Network Lab
11.7 Recovery
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Recovery
• Some directory information is kept in main memory
– If a computer crash, information in memory are lost
•Cache, buffer contents, and the current I/O operation
– The file system may be in inconsistent state
• Consistency checker –compares data in directory structure
with data blocks on disk, and tries to fix inconsistencies
• Use system programs to back up data from disk to another
storage device
– Floppy disk, magnetic tape, other magnetic disk, optical
– Recover lost file or disk by restoring data from backup
NCHU System & Network Lab
11.8 Log Structured File Systems
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Log Structured File Systems
• Log structured (or journaling) file systems record each
metadata update to the file system as a transaction
• All transactions are written to a log
– A transaction is considered committed once it is written to the log
– However, the file system may not yet be updated
• The transactions in the log are asynchronously written to the
file system
– When the file system is modified, the transaction is removed from the
log
• If the file system crashes
– All remaining transactions in the log must still be performed
NCHU System & Network Lab
11.9 NFS
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
The Sun Network File System
(NFS)
•An implementation and a specification of a
software system for accessing remote files
across LANs (or WANs)
•The implementation is part of the Solaris and
SunOS operating systems running on Sun
workstations
–Using TCP or UDP/IP protocol
NCHU System & Network Lab
NFS (Cont.)
• Interconnected workstations viewed as a set of independent
machines with independent file systems
• Goal: allows sharing among these file systems in a transparent
manner
– Based on client-server model
• Steps
– A remote directory is mounted over a local file system directory
•The mounted directory looks like an integral subtree of the local file system
– Specification of the remote directory for the mount operation is
nontransparent
•The host name of the remote directory has to be provided
– But, files in the remote directory can then be accessed in a transparent
manner
– Subject to access-rights accreditation,
•Potentially any file system (or directory within a file system), can be
mounted remotely on top of any local directory
NCHU System & Network Lab
Three Independent File Systems
NCHU System & Network Lab
Three Independent File Systems
Mounts
Cascading mounts
mount S1:/user/shared over U:/user/local
mount S2:/user/dir2 over U:/user/local/dir1
NCHU System & Network Lab
NFS (Cont.)
• NFS is designed to operate in a heterogeneous environment of
different machines, operating systems, and network
architectures
– The NFS specifications independent of these media
• This independence is achieved through the use of RPC
primitives built on top of an External Data Representation
(XDR) protocol used between two implementationindependent interfaces
• The NFS specification distinguishes between
– The services provided by a mount mechanism : the mount protocol
– The actual remote-file-access services : the NFS protocol
NCHU System & Network Lab
The Mount Protocol
• Establishes initial logical connection between server and client
• Mount operation includes name of remote directory to be
mounted and name of server machine storing it
– Mount request is mapped to corresponding RPC and forwarded to
mount server running on server machine
– Export list
•Specifies local file systems that server exports for mounting
•Along with names of machines that are permitted to mount them
• Following a mount request that conforms to its export list, the
server returns a file handle —a key for further accesses
• File handle –a file-system identifier, and an inode number to
identify the mounted directory within the exported file system
• The mount operation changes only the user’
s view and does
not affect the server side
NCHU System & Network Lab
NFS Protocol
•Provides a set of RPCs for remote file operations.
•The procedures support the following operations:
–searching for a file within a directory
–reading a set of directory entries
–manipulating links and directories
–accessing file attributes
–reading and writing files
•NFS servers are stateless;
–No open() and close()
–Each request has to provide a full set of arguments
•Include a unique file identifier and offset
NCHU System & Network Lab
NFS Protocol (Cont.)
•Modified data must be committed to the
server’
s disk before results are returned to the
client
–Lose advantages of caching
•The NFS protocol does not provide
concurrency-control mechanisms
–Users must do it by themselves
•NFS is integrated into the OS via VFS
NCHU System & Network Lab
Schematic View of NFS Architecture
NCHU System & Network Lab
Three Major Layers of NFS
Architecture
• UNIX file-system interface
– Based on the open, read, write, and close calls, and file descriptors
• Virtual File System (VFS) layer –distinguishes local files
from remote ones, and local files are further distinguished
according to their file-system types
– The VFS activates file-system-specific operations to handle local
requests according to their file-system types
– Calls the NFS protocol procedures for remote requests
• NFS service layer –bottom layer of the architecture
– Implements the NFS protocol
NCHU System & Network Lab
NFS Path-Name Translation
•Path-Name Translation
–Performed by breaking the path into component names
•/user/local/dir1/file.text
•(1) usr, (2) local, (3) dir1
–Then performing a separate NFS lookup call for every pair
of component name and directory vnode
•To make lookup faster, a directory name lookup
cache on the client’
s side holds the vnodes for remote
directory names
NCHU System & Network Lab
NFS Remote Operations
•Nearly one-to-one correspondence between
regular UNIX system calls and the NFS
protocol RPCs
–Except opening and closing files
•NFS adheres to the remote-service paradigm
–But employs buffering and caching techniques for
the sake of performance
–File block and file attributes are fetched by RPCs
and are cached locally
NCHU System & Network Lab
NFS Remote Operations (Cont.)
•File-attribute cache –the inode-information
–The attribute cache is updated whenever new attributes
arrive from the server
•File-blocks cache –
–When a file is opened, the kernel checks with the remote
server whether to fetch or revalidate the cached attributes
–Cached file blocks are used only if the corresponding
cached attributes are up to date
•Read-ahead and delay-write are used
–Clients do not free delayed-write blocks until the server
confirms that the data have been written to disk
NCHU System & Network Lab
11.10 Example: WAFL File
System
Operating System Concepts –7th Edition, Jan 14, 2005
Silberschatz, Galvin and Gagne ©2005
Example: WAFL File System
•write-anywhere file layout
–Optimized for random writes
•Random I/O optimized, write optimized
–Random read can be improved by caching
–NVRAM for write caching
•A stable-storage cache
•Can support multiple snapshots
–Useful for backups, testing, and so on
NCHU System & Network Lab
The WAFL File Layout
file
file
file
All metadata lives in files
NCHU System & Network Lab
Snapshots in WAFL
NCHU System & Network Lab