Download 第十二章

Document related concepts
Transcript
Chapter 12: File System Implementation
•
•
•
•
•
•
•
•
•
File System Structure
File System Implementation
Directory Implementation
Allocation Methods
Free-Space Management
Efficiency and Performance
Recovery
Log-Structured File Systems
NFS
Operating System Concepts
1
File-System Structure
• File structure
– Logical storage unit
– Collection of related information
• File system resides on secondary storage
(disks).
• File system organized into layers.
• File control block – storage structure consisting
of information about a file.
Operating System Concepts
2
Layered File System
Operating System Concepts
3
A Typical File Control Block
ACL:Access Control List
Operating System Concepts
4
In-Memory File System Structures
• The following figure illustrates the necessary
file system structures provided by the operating
systems.
• Figure 12-3(a) refers to opening a file.
• Figure 12-3(b) refers to reading a file.
Operating System Concepts
5
In-Memory File System Structures
Operating System Concepts
6
Virtual File Systems
• Virtual File Systems (VFS) provide an objectoriented way of implementing file systems.
• VFS allows the same system call interface (the
API) to be used for different types of file
systems.
• The API is to the VFS interface, rather than any
specific type of file system.
Operating System Concepts
7
Schematic View of Virtual File
System
Operating System Concepts
8
Directory Implementation
• Linear list of file names with pointer to the data
blocks.
– simple to program
– time-consuming to execute
• Hash Table – linear list with hash data structure.
– decreases directory search time
– collisions – situations where two file names
hash to the same location
– fixed size
Operating System Concepts
9
Allocation Methods
• An allocation method refers to how disk blocks
are allocated for files
• Contiguous allocation
• Linked allocation
• Indexed allocation
Operating System Concepts
10
Contiguous Allocation
• Each file occupies a set of contiguous blocks on the
disk.
• Simple – only starting location (block #) and length
(number of blocks) are required.
• Random access.
• Wasteful of space (dynamic storage-allocation
problem).
• Files cannot grow.
Operating System Concepts
11
Contiguous Allocation of Disk Space
Operating System Concepts
12
Extent-Based Systems
• Many newer file systems (e.g. Veritas File
System) use a modified contiguous allocation
scheme.
• Extent-based file systems allocate disk blocks
in extents.
• An extent is a contiguous block of disks.
Extents are allocated for file allocation. A file
consists of one or more extents.
Operating System Concepts
13
Linked Allocation
• Each file is a linked list of disk blocks: blocks may be
scattered anywhere on the disk.
block
=
pointer
data
Operating System Concepts
14
Linked Allocation
Operating System Concepts
15
Linked Allocation (Cont.)
Simple – need only starting address
Free-space management system – no external fragment
No random access
Space required for pointers: 4-byte pointer in 512-byte
block:0.78%
• Mapping
•
•
•
•
Q
LA/512
R
Block to be accessed is the Qth block in the linked chain
of blocks representing the file.
Displacement into block = R
Operating System Concepts
16
 Clustering:linked list of clusters instead of blocks
Decreases disk space needed for block management
Increase of internal fragmentation
Improves disk throughput(fewer disk head seeks)
File-allocation table (FAT) – disk-space allocation used by MS-DOS
and OS/2.
FAT occupies a section at the beginning of a partition
FAT contains one entry for each block, containing a pointer to
the next block in that file.
Operating System Concepts
17
File-Allocation Table
Operating System Concepts
18
Indexed Allocation
• Brings all pointers together into the index block.
• Logical view.
index table
Operating System Concepts
19
Example of Indexed Allocation
Operating System Concepts
20
Indexed Allocation (Cont.)
• Need index table
• Random access
• Dynamic access without external fragmentation, but
have overhead of index block.
• Mapping from logical to physical in a file of
maximum size of 256K words and block size of 512
words. We need only 1 block for index table.
Q
LA/512
R
Q = displacement into index table
R = displacement into block
Operating System Concepts
21
Indexed Allocation – Mapping (Cont.)
• Mapping from logical to physical in a file of
unbounded length (block size of 512 words).
• Linked scheme – Link blocks of index table (no limit
on size).
Q1
LA / (512 x 512)
R1
Q1 = block of index table
R1 is used as follows:
Q2
R1 / 512
R2
Q2 = displacement into block of index table
R2 displacement into block of file:
Operating System Concepts
22
Indexed Allocation – Mapping (Cont.)
• Two-level index (maximum file size is 5123)
Q1
LA / (512 x 512)
R1
Q1 = displacement into outer-index
R1 is used as follows:
Q2
R1 / 512
R2
Q2 = displacement into block of index table
R2 displacement into block of file:
Operating System Concepts
23
Indexed Allocation – Mapping (Cont.)
LA

outer-index
index table
Operating System Concepts
file
24
Combined Scheme: UNIX (4K bytes per
block)
Operating System Concepts
25
Free-Space Management
• Bit vector (n blocks)
0 1
2
n-1
bit[i] =

…
1  block[i] free
0  block[i] occupied
Calculation of the first free block number
(number of bits per word) *
(number of 0-value words) +
offset of first 1 bit
Operating System Concepts
26
Free-Space Management (Cont.)
– Bit map requires extra space. Example:
block size = 212 bytes
disk size = 230 bytes (1 gigabyte)
n = 230/212 = 218 bits (or 32K bytes)
– Easy to get contiguous files
• Linked list (free list)
– Cannot get contiguous space easily
– No waste of space
• Grouping:linked list of free-block blocks(nodes),each node
contains n-1 free block number and a pointer to the next
node.(UNIX )
• Counting: entry in the free-space list consists of the disk address of
a region of free blocks and its length(in block)
Operating System Concepts
27
Linked Free Space List on Disk
Operating System Concepts
28
Efficiency and Performance
• Efficiency dependent on:
– disk allocation and directory algorithms
– types of data kept in file’s directory entry(e.g. “last
write time”, “last access time”)
• Performance
– disk cache – separate section of main memory for
frequently used blocks
– free-behind and read-ahead – techniques to
optimize sequential access
– improve PC performance by dedicating section of
memory as virtual disk, or RAM disk.
Operating System Concepts
29
Various Disk-Caching Locations
Operating System Concepts
30
Page Cache
• A page cache caches pages rather than disk
blocks using virtual memory techniques.
• Memory-mapped I/O uses a page cache.
• Routine I/O through the file system uses the
buffer (disk) cache.
• This leads to the following figure.
Operating System Concepts
31
I/O Without a Unified Buffer Cache
Operating System Concepts
32
Unified Buffer Cache
• A unified buffer cache uses the same page
cache to cache both memory-mapped pages and
ordinary file system I/O.
Operating System Concepts
33
I/O Using a Unified Buffer Cache
Operating System Concepts
34
Recovery
• Consistency checking – compares data in
directory structure with data blocks on disk, and
tries to fix inconsistencies.
• Use system programs to back up data from disk
to another storage device (floppy disk, magnetic
tape).
• Recover lost file or disk by restoring data from
backup.
Operating System Concepts
35
Log Structured File Systems
• Log structured (or journaling) file systems record
each set of operations to the file system as a
transaction.
• All transactions are written to a log. A transaction is
considered committed once it is written to the log.
However, the file system may not yet be updated.
• The transactions in the log are asynchronously written
to the file system. When the file system is modified,
the transaction is removed from the log.
• If the file system crashes, all remaining transactions in
the log must still be performed.
Operating System Concepts
36
The Sun Network File System (NFS)
• An implementation and a specification of a
software system for accessing remote files
across LANs (or WANs).
• The implementation is part of the Solaris and
SunOS operating systems running on Sun
workstations using an unreliable datagram
protocol (UDP/IP protocol) and Ethernet.
Operating System Concepts
37
NFS (Cont.)
• Interconnected workstations viewed as a set of
independent machines with independent file systems,
which allows sharing among these file systems in a
transparent manner.
– A remote directory is mounted over a local file system directory.
The mounted directory looks like an integral subtree of the local
file system, replacing the subtree descending from the local
directory.
– Specification of the remote directory for the mount operation is
nontransparent; the host name of the remote directory has to be
provided. Files in the remote directory can then be accessed in a
transparent manner.
– Subject to access-rights accreditation, potentially any file system
(or directory within a file system), can be mounted remotely on
top of any local directory.
Operating System Concepts
38
NFS (Cont.)
• NFS is designed to operate in a heterogeneous
environment of different machines, operating systems,
and network architectures; the NFS specifications
independent of these media.
• This independence is achieved through the use of RPC
primitives built on top of an External Data
Representation (XDR) protocol used between two
implementation-independent interfaces.
• The NFS specification distinguishes between the
services provided by a mount mechanism(mount
protocol) and the actual remote-file-access
services(NFS protocol).
Operating System Concepts
39
Three Independent File Systems
Operating System Concepts
40
Mounting in NFS
Mounts
Cascading mounts
Operating System Concepts
41
NFS Mount Protocol
• Establishes initial logical connection between server and client.
• Mount operation includes name of remote directory to be mounted
and name of server machine storing it.
– Mount request is mapped to corresponding RPC and forwarded to
mount server running on server machine.
– Export list – specifies local file systems that server exports for
mounting, along with names of machines that are permitted to
mount them.
• Following a mount request that conforms to its export list, the server
returns a file handle—a key for further accesses.
• File handle – a file-system identifier, and an inode number to
identify the mounted directory within the exported file system.
• The mount operation changes only the user’s view and does not
affect the server side.
Operating System Concepts
42
NFS Protocol
• Provides a set of remote procedure calls for remote file operations.
The procedures support the following operations:
–
–
–
–
–
searching for a file within a directory
reading a set of directory entries
manipulating links and directories
accessing file attributes
reading and writing files
• NFS servers are stateless; each request has to provide a full set of
arguments.
• Modified data must be committed to the server’s disk before results
are returned to the client (lose advantages of caching).
• The NFS protocol does not provide concurrency-control mechanisms.
Operating System Concepts
43
Three Major Layers of NFS
Architecture
• UNIX file-system interface (based on the open, read,
write, and close calls, and file descriptors).
• Virtual File System (VFS) layer – distinguishes local files
from remote ones, and local files are further distinguished
according to their file-system types.
– The VFS activates file-system-specific operations
to handle local requests according to their filesystem types.
– Calls the NFS protocol procedures for remote
requests.
• NFS service layer – bottom layer of the architecture;
implements the NFS protocol.
Operating System Concepts
44
Schematic View of NFS Architecture
Operating System Concepts
45
NFS Path-Name Translation
• Performed by breaking the path into component names
and performing a separate NFS lookup call for every
pair of component name and directory vnode(a
numerical designator for a network-wide unique file).
• To make lookup faster, a directory name lookup cache
on the client’s side holds the vnodes for remote
directory names.
Operating System Concepts
46
NFS Remote Operations
• Nearly one-to-one correspondence between regular UNIX
system calls and the NFS protocol RPCs (except opening and
closing files).
• NFS adheres to the remote-service paradigm, but employs
buffering and caching techniques for the sake of performance.
• File-blocks cache – when a file is opened, the kernel checks
with the remote server whether to fetch or revalidate the cached
attributes. Cached file blocks are used only if the corresponding
cached attributes are up to date.
• File-attribute cache – the attribute cache is updated whenever
new attributes arrive from the server.
• Clients do not free delayed-write blocks until the server
confirms that the data have been written to disk.
Operating System Concepts
47
NTFS File System
 The fundamental structure of the NTFS file system is a
volume.
 Created by the NT disk administrator utility.
 Based on a logical disk partition.
 May occupy a portions of a disk, an entire disk, or span
across several disks.
 NTFS uses clusters as the underlying unit of disk
allocation.
 A cluster is a number of disk sectors that is a power of two.
 Larger cluster size for larger volume size:512bytes to 64K
Operating System Concepts
48
File System — Internal Layout
 NTFS uses logical cluster numbers (LCNs) as disk
addresses.
 A file in NTFS is not a simple byte stream, as in MS-DOS
or UNIX, rather, it is a structured object consisting of
attributes.
 Every file in NTFS is described by one or more records in
an array stored in a special file called the Master File
Table (MFT).
 Each file on an NTFS volume has a unique ID called a file
reference.
 64-bit quantity that consists of a 48-bit file number and a 16-
bit sequence number.
 Can be used to perform internal consistency checks.
 The NTFS name space is organized by a hierarchy of
directories; the index root contains the top level of the B+
tree.
Operating System Concepts
49
NTFS Volume Layout
partition
boot
sector
Master File Table System
Files
File Area
Boot sector : up to 16 sectors
MFT:information about files and directories in the volume,as well as
available unallocated space, organized as a relational database
structure. MFT is also considered as a file(first entry in the MFT).
System files:
MFT2: a mirror of the first three rows of the MFT
Log file:
Cluster bit map:
Attribute definition table:defines the attribute types supported on this
volume and indicates whether they can be indexed and whether
they can be recovered during a system recovery operation.
Operating System Concepts
50
A MFT record
Standard
information
File name
Security
descriptor
data
A file is a collection of attributes.
For a small file, attributes can be stored in a single record.
If the attributes cannot be stored in a single record, another record is used.
If a single attribute is too large to be accomodated in a MFT record(e.g., the
data attribute), NTFS will allocate one or more separate area named extent.
If a file is a directory, the data area includes the index of files in the directory.
Operating System Concepts
51
Windows NTFS Components
I/O Manager
Log File
Service
Log the transaction
NTFS Driver
Read/write
the file
Flush the Write the
log file
cache
Fault Tolerant
Driver
Disk Driver
Read/write a
mirrored or
striped volume
Read/write
the disk
Cache
Manager
Load data from
disk into
memory
Access the mapped
file or flush the cache
Virtual Memory
Manager
Operating System Concepts
52
File System — Recovery
 All file system data structure updates are performed
inside transactions.
 Before a data structure is altered, the transaction writes a
log record that contains redo and undo information.
 After the data structure has been changed, a commit record
is written to the log to signify that the transaction
succeeded.
 After a crash, the file system data structures can be
restored to a consistent state by processing the log records.
Operating System Concepts
53
File System — Recovery (Cont.)
 This scheme does not guarantee that all the user file data
can be recovered after a crash, just that the file system
data structures (the metadata files) are undamaged and
reflect some consistent state prior to the crash..
 The log is stored in the third metadata file at the
beginning of the volume.
 The logging functionality is provided by the 2000 log file
service.
Operating System Concepts
54
LINUX下的NFS服务
 Server side:
 运行级(runlevel)3
 Mountd( mount protocol)
 Nfsd(NFS protocol)
 /etc/exports
 /var/lib/nfs/xtab

 Exportfs –a
Client side:
 Exportfs –r –o async django:/usr/tmp
Operating System Concepts
55
LINUX的虚拟文件系统VFS
 VFS对LINUX的每个文件系统的所有细节进行抽象,使得不同的文件系
统在LINUX核心以及系统中运行的其他进程看来,都是相同的。
 VFS并不是一种实际的文件系统。它只存在于内存中,不存在于任何
外存空间。
 VFS在系统启动时建立,在系统关闭时消亡。
 VFS拥有关于各种特殊文件系统的公共界面,如超级块、inode、文件
操作函数入口等。
Operating System Concepts
56
VFS的作用
VFS inode cache
VFS
VFS directory cache
MINIX FS
EXT2 FS
EXT FS
MSDOS FS
buffer cache
I/O 设备驱动
Operating System Concepts
57
文件系统类型
支持多种不同类型的文件系统是LINUX操作系统的一大特色。
目前支持的有ext, ext2, minix, umsdos, ncp, iso9660, hpfs,msdos, xia,
vfat, proc,nfs, smb, sysv, affs,ntfs以及ufs等,
参见 include/linux/autoconf.h。
Operating System Concepts
58
文件系统类型的注册
 一种是在编译核心系统时确定,并在系统初始化时通过内嵌的函
数调用向注册表登记。
 另一种则利用LINUX的模块(module)特征,把某个文件系统当作
一个模块。装入该模块时(通过kerneld或用insmod命令 )向注
册表登记它的类型,卸装该模块时则从注册表注销。
Operating System Concepts
59
操作函数
 int register_filesystem(struct file_system_type * fs);
 int unregister_filesystem(struct file_system_type * fs);
Operating System Concepts
60
管理文件系统类型的结构
static struct file_system_type *file_systems =
(struct file_system_type *) NULL;
struct file_system_type {
struct super_block *(*read_super)(struct super_block *,void *,int);
/* read_super 所指的函数用于读出该文件系统在外存的超级块 */
const char *name;
/* 文件系统的类型名,如 ext2 */
int requires_dev;
/* 支持文件系统的设备,proc 文件系统不需要任何设备 */
struct file_system_type * next;
/* 文件系统类型链表的后续指针 */
};
file_systems
next
file_system_type
next
Operating System Concepts
next=0
61
文件系统实例的管理



系统启动时,必首先装入“根”文件系统(由全程变量ROOT_DEV指示),
然后根据/etc/fstab中指定,逐个建立文件系统。
用户也可以通过mount、umount操作,随时安装或卸装文件系统。
当装入一个文件系统时,应首先向操作系统核心注册该文件系统。
Operating System Concepts
62
安装(mount)一个文件系统
root
安装点 inode
下挂文件系统
i_sb
文件系统的超级块
i_mount
s_covered
s_mounted
Operating System Concepts
63
文件系统类型和实例示意图
vfsmount
super_block
file_system_type
file_systems
vfsmntlist
mnt_sb
s_type
inode
vfsmnttail
s_covered
mnt_sb
s_mounted
inode
Operating System Concepts
64
文件系统实例的注册操作
 struct vfsmount *add_vfsmnt(
kdev_t dev, const char * dev_name,
const char * dir_name);
 void remove_vfsmnt(kdev_t dev);
 struct vfsmount *lookup_vfsmnt(kdev_t dev);
Operating System Concepts
65
文件系统实例的数据结构
static struct vfsmount *vfsmntlist = (struct vfsmount *) NULL; /* 头 */
static struct vfsmount *vfsmnttail = (struct vfsmount *) NULL; /* 尾 */
static struct vfsmount *mru_vfsmnt = (struct vfsmount *) NULL; /* 当前 */
struct vfsmount
{
kdev_t mnt_dev;
/* 文件系统所在设备的主设备号、次设备号 */
char *mnt_devname;
/* 设备名,如/dev/hda1 */
char *mnt_dirname;
/* 安装目录名称 */
unsigned int mnt_flags;
/* 设备标志,如 ro */
struct semaphore mnt_sem;
/* 对设备 I/O 操作时的信号量 */
struct super_block *mnt_sb;
/* 指向超级块 */
struct file *mnt_quotas[MAXQUOTAS]; /* 指向配额文件的指针 */
time_t mnt_iexp[MAXQUOTAS];
/* expiretime for inodes */
time_t mnt_bexp[MAXQUOTAS];
/* expiretime for blocks */
struct vfsmount *mnt_next;
/* 已注册文件系统链表的后续指针 */
};
Operating System Concepts
66
VFS 超级块(fs.h)
struct super_block {
kdev_t s_dev;
/* 包含该文件系统的主设备号、次设备号, 如0x0301代表设备/dev/hda1 */
unsigned long s_blocksize;
/* 文件系统的块大小,以字节为单位 */
unsigned char s_blocksize_bits; /* 以2的次幂表示块的大小 */
unsigned char s_lock;
/* 锁定标志,置位表示拒绝其它进程的访问 */
unsigned char s_rd_only;
unsigned char s_dirt;
/* 已修改标志 */
struct file_system_type *s_type; /* s_type指向描述文件系统类型的file_system_type结构 */
struct super_operations *s_op; /* 指向一组操作该文件系统的函数 */
struct dquot_operations *dq_op;
unsigned long s_flags;
unsigned long s_magic;
unsigned long s_time;
struct inode * s_covered; /* 指向安装点目录项的inode节点,根文件系统的VFS超级块没有此指针 */
struct inode * s_mounted;/* 指向被安装文件系统的第一个inode节点。它与s_covered共同使用,见图3-4 */
struct wait_queue * s_wait; /* 在该超级块上的等待队列 */
union {
/* 各类文件系统的特定信息 */
struct minix_sb_info minix_sb;
struct ext_sb_info ext_sb;
struct ext2_sb_info ext2_sb; /* ext2文件系统的超级块 */
struct hpfs_sb_info hpfs_sb;
struct msdos_sb_info msdos_sb;
struct isofs_sb_info isofs_sb;
struct nfs_sb_info nfs_sb;
struct xiafs_sb_info xiafs_sb;
struct sysv_sb_info sysv_sb;
struct affs_sb_info affs_sb;
struct ufs_sb_info ufs_sb;
void *generic_sbp;
} u;
Operating System Concepts
67
};
LINUX的磁盘文件系统(ext2)
典型UNIX文件系统的磁盘组织
 引导块:在文件系统的开头,通常为一个扇区,存放引导程序,用于读入并
启动操作系统。
 超级块:用于记录文件系统的管理信息。特定的文件系统定义了特定的超级
块。
 inode区:一个文件(或目录〕占据一个索引节点。第一个索引节点是该文件
系统的根节点。利用根节点,可以把一个文件系统挂在另一个文件系统的非
叶节点上。
 数据区:存放文件数据或者管理数据(如一级间址块,二级间址块等〕。
Operating System Concepts
68
EXT2体系结构
Group 0
Super
Block
File System
Descriptors
Group 1
............
Group N
block
bitmap
inode
bitmap
inode
table
Operating System Concepts
data
blocks
69
块组(block group)
 保存关于文件系统的备份信息(超级块和所有组描述符)。当某个
组的超级块或组描述符受损时,这些信息用来恢复文件系统。
 块位图(block bitmap)记录本组内各个数据块的使用情况,每一位(bit)
对应于一个数据块,0表示空闲,非0表示已分配。
 inode位图(inode bitmap)的作用类似于块位图,它记录inode表中inode
的使用情况。
Operating System Concepts
70
EXT2超级块
 每个块组(Block Group)均包含一个相同的超级块。一般,只有
块组0的超级块才读入内存,其它块组的超级块仅作为备份。
Operating System Concepts
71
EXT2组描述符(group discriptor)
 每个块组(Block Group)都有一个组描述符来描述它
 所有的组描述符在每个块组中都有备份
 组描述符一个挨一个存放,构成了组描述符表。
Operating System Concepts
72
inode关于数据块的寻址
ext2_inode
数据块
数据块
数据块
12 个直接块
数据块
一次间接块
二次间接块
数据块
三次间接块
数据块
Operating System Concepts
73
EXT2的目录
 目录是关于文件的存取路径的特殊文件
 一个目录文件就是一个目录项的列表,每一个目录项都有一个数据
结构来描述:
struct ext2_dir_entry {
__u32 inode;
/* 该目录项的inode号,
用它可以查找inode表中对应的inode */
__u16 rec_len; /* 目录项的长度 */
__u16 name_len; /* 文件名长度,最长255 */
char name[EXT2_NAME_LEN];
/* 文件名 */
}
Operating System Concepts
74
例:EXT2中查找/usr/include/stdio.h文件
 根据ROOT_DEV,从vfsmntlist链表、file_systems链表找到文件系统的超
级块,继而找出“/”的inode号(VFS的super_block.s_mounted)。
 到块组0的inode表中读取文件系统的根的inode。
 根文件是一个目录文件(由“i_mode”识别),包含了根目录下子目录和文
件的由ext2_dir_entry描述的目录项。可以在其中找到ext2_dir_entry
.name=“usr”的目录项,从该目录项的ext2_dir_entry.inode读出代表/usr
目录的inode号。
Operating System Concepts
75
EXT3文件系统
 日志文件系统(Journaling Filesystem):利用数据库的日志技术,




使得系统发生故障时恢复文件系统更快。其他如Reiserfs,Jfs,XFs等
。是目前新型文件系统的主流实现方式。
EXT3保持了EXT2文件系统的磁盘结构,只是增加了日志功能。
将EXT2文件系统升级到EXT3时,增加了一个.journal文件
新建一个EXT3文件系统时,利用系统中的某些inode实现日志。
多种日志方式
Operating System Concepts
76