Chapter 12: File System Implementation
Operating System Concepts

• File System Structure
• File System Implementation
• Directory Implementation
• Allocation Methods
• Free-Space Management
• Efficiency and Performance
• Recovery
• Log-Structured File Systems
• NFS

File-System Structure
• File structure
  – Logical storage unit
  – Collection of related information
• File system resides on secondary storage (disks).
• File system organized into layers.
• File control block – storage structure consisting of information about a file.

Layered File System
[Figure: layered file system]

A Typical File Control Block
[Figure: a typical file control block; ACL = access control list]

In-Memory File System Structures
• The following figure illustrates the necessary file system structures provided by the operating system.
• Figure 12-3(a) refers to opening a file.
• Figure 12-3(b) refers to reading a file.
[Figure 12-3: in-memory file system structures, (a) opening a file, (b) reading a file]

Virtual File Systems
• Virtual File Systems (VFS) provide an object-oriented way of implementing file systems.
• VFS allows the same system call interface (the API) to be used for different types of file systems.
• The API is to the VFS interface, rather than to any specific type of file system.
[Figure: schematic view of a virtual file system]

Directory Implementation
• Linear list of file names with pointers to the data blocks.
  – simple to program
  – time-consuming to execute
• Hash table – linear list with a hash data structure.
  – decreases directory search time
  – collisions – situations where two file names hash to the same location
  – fixed size

Allocation Methods
• An allocation method refers to how disk blocks are allocated for files:
  – Contiguous allocation
  – Linked allocation
  – Indexed allocation

Contiguous Allocation
• Each file occupies a set of contiguous blocks on the disk.
• Simple – only the starting location (block #) and length (number of blocks) are required.
• Random access.
• Wasteful of space (dynamic storage-allocation problem).
• Files cannot grow.
[Figure: contiguous allocation of disk space]

Extent-Based Systems
• Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme.
• Extent-based file systems allocate disk blocks in extents.
• An extent is a set of contiguous disk blocks. Space is allocated to a file extent by extent, and a file consists of one or more extents.

Linked Allocation
• Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.
• Each block contains data and a pointer to the next block.
[Figure: linked allocation]

Linked Allocation (Cont.)
• Simple – need only the starting address.
• Free-space management system – no external fragmentation.
• No random access.
• Space required for pointers: a 4-byte pointer in a 512-byte block uses 0.78% of the block.
• Mapping (LA = logical address): Q = LA / 512, R = LA mod 512
  – The block to be accessed is the Qth block in the linked chain of blocks representing the file.
  – Displacement into the block = R.

• Clustering: linked list of clusters instead of blocks
  – Decreases the disk space needed for block management
  – Increases internal fragmentation
  – Improves disk throughput (fewer disk-head seeks)
• File-allocation table (FAT) – disk-space allocation used by MS-DOS and OS/2.
  – The FAT occupies a section at the beginning of a partition.
  – The FAT contains one entry for each block, containing a pointer to the next block in that file.
[Figure: file-allocation table]

Indexed Allocation
• Brings all pointers together into the index block.
• Logical view: [figure: index table]
[Figure: example of indexed allocation]

Indexed Allocation (Cont.)
• Need index table
• Random access
• Dynamic access without external fragmentation, but has the overhead of an index block.
• Mapping from logical to physical addresses in a file with a maximum size of 256K words and a block size of 512 words: only 1 block is needed for the index table (256K / 512 = 512 entries).
  – Q = LA / 512, R = LA mod 512
  – Q = displacement into index table
  – R = displacement into block

Indexed Allocation – Mapping (Cont.)
• Mapping from logical to physical addresses in a file of unbounded length (block size of 512 words).
• Linked scheme – link blocks of the index table (no limit on size).
  – Q1 = LA / (512 × 512), R1 = LA mod (512 × 512)
  – Q1 = block of index table
  – R1 is used as follows: Q2 = R1 / 512, R2 = R1 mod 512
  – Q2 = displacement into block of index table
  – R2 = displacement into block of file

Indexed Allocation – Mapping (Cont.)
• Two-level index (maximum file size is 512^3 words)
  – Q1 = LA / (512 × 512), R1 = LA mod (512 × 512)
  – Q1 = displacement into outer index
  – R1 is used as follows: Q2 = R1 / 512, R2 = R1 mod 512
  – Q2 = displacement into block of index table
  – R2 = displacement into block of file

Indexed Allocation – Mapping (Cont.)
[Figure: LA → outer index → index table → file]

Combined Scheme: UNIX (4K bytes per block)
[Figure: combined scheme used by the UNIX inode]

Free-Space Management
• Bit vector (n blocks): bit[0], bit[1], bit[2], …, bit[n-1]
  – bit[i] = 1 ⇒ block[i] free
  – bit[i] = 0 ⇒ block[i] occupied
  – Calculation of the first free block number:
    (number of bits per word) × (number of 0-value words) + offset of first 1 bit

Free-Space Management (Cont.)
• Bit vector
  – Bit map requires extra space.
    Example: block size = 2^12 bytes, disk size = 2^30 bytes (1 gigabyte)
    n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)
  – Easy to get contiguous files
• Linked list (free list)
  – Cannot get contiguous space easily
  – No waste of space
• Grouping: a linked list of free-block blocks (nodes); each node contains n-1 free block numbers and a pointer to the next node (UNIX).
• Counting: each entry in the free-space list consists of the disk address of a region of free blocks and its length (in blocks).
[Figure: linked free-space list on disk]

Efficiency and Performance
• Efficiency depends on:
  – disk allocation and directory algorithms
  – types of data kept in the file's directory entry (e.g., "last write time", "last access time")
• Performance
  – disk cache – separate section of main memory for frequently used blocks
  – free-behind and read-ahead – techniques to optimize sequential access
  – improve PC performance by dedicating a section of memory as a virtual disk, or RAM disk
[Figure: various disk-caching locations]

Page Cache
• A page cache caches pages rather than disk blocks, using virtual memory techniques.
• Memory-mapped I/O uses a page cache.
• Routine I/O through the file system uses the buffer (disk) cache.
• This leads to the following figure.
[Figure: I/O without a unified buffer cache]

Unified Buffer Cache
• A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O.
[Figure: I/O using a unified buffer cache]

Recovery
• Consistency checking – compares data in the directory structure with data blocks on disk, and tries to fix inconsistencies.
• Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape).
• Recover a lost file or disk by restoring data from backup.

Log-Structured File Systems
• Log-structured (or journaling) file systems record each set of operations to the file system as a transaction.
• All transactions are written to a log. A transaction is considered committed once it is written to the log. However, the file system may not yet be updated.
• The transactions in the log are asynchronously written to the file system. When the file system is modified, the transaction is removed from the log.
• If the file system crashes, all remaining transactions in the log must still be performed.

The Sun Network File System (NFS)
• An implementation and a specification of a software system for accessing remote files across LANs (or WANs).
• The implementation is part of the Solaris and SunOS operating systems running on Sun workstations using an unreliable datagram protocol (UDP/IP protocol) and Ethernet.

NFS (Cont.)
• Interconnected workstations are viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner.
  – A remote directory is mounted over a local file system directory. The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory.
  – Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided. Files in the remote directory can then be accessed in a transparent manner.
  – Subject to access-rights accreditation, potentially any file system (or directory within a file system) can be mounted remotely on top of any local directory.

NFS (Cont.)
• NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specifications are independent of these media.
• This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces.
• The NFS specification distinguishes between the services provided by a mount mechanism (mount protocol) and the actual remote-file-access services (NFS protocol).
[Figure: three independent file systems]
[Figure: mounting in NFS, (a) mounts, (b) cascading mounts]

NFS Mount Protocol
• Establishes the initial logical connection between server and client.
• The mount operation includes the name of the remote directory to be mounted and the name of the server machine storing it.
  – The mount request is mapped to the corresponding RPC and forwarded to the mount server running on the server machine.
  – Export list – specifies the local file systems that the server exports for mounting, along with the names of machines that are permitted to mount them.
• Following a mount request that conforms to its export list, the server returns a file handle, a key for further accesses.
• File handle – a file-system identifier and an inode number that identify the mounted directory within the exported file system.
• The mount operation changes only the user's view and does not affect the server side.

NFS Protocol
• Provides a set of remote procedure calls for remote file operations. The procedures support the following operations:
  – searching for a file within a directory
  – reading a set of directory entries
  – manipulating links and directories
  – accessing file attributes
  – reading and writing files
• NFS servers are stateless; each request has to provide a full set of arguments.
• Modified data must be committed to the server's disk before results are returned to the client (losing the advantages of caching).
• The NFS protocol does not provide concurrency-control mechanisms.

Three Major Layers of NFS Architecture
• UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors).
• Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types.
  – The VFS activates file-system-specific operations to handle local requests according to their file-system types.
  – Calls the NFS protocol procedures for remote requests.
• NFS service layer – bottom layer of the architecture; implements the NFS protocol.
[Figure: schematic view of NFS architecture]

NFS Path-Name Translation
• Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode (a numerical designator for a network-wide unique file).
• To make lookup faster, a directory-name-lookup cache on the client side holds the vnodes for remote directory names.

NFS Remote Operations
• Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files).
• NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance.
• File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. Cached file blocks are used only if the corresponding cached attributes are up to date.
• File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server.
• Clients do not free delayed-write blocks until the server confirms that the data have been written to disk.

NTFS File System
• The fundamental structure of the NTFS file system is a volume.
  – Created by the NT disk administrator utility.
  – Based on a logical disk partition.
  – May occupy a portion of a disk, an entire disk, or span several disks.
• NTFS uses clusters as the underlying unit of disk allocation.
  – A cluster is a number of disk sectors that is a power of two.
  – Larger volumes use larger clusters: from 512 bytes up to 64 KB.

File System: Internal Layout
• NTFS uses logical cluster numbers (LCNs) as disk addresses.
• A file in NTFS is not a simple byte stream, as in MS-DOS or UNIX; rather, it is a structured object consisting of attributes.
• Every file in NTFS is described by one or more records in an array stored in a special file called the Master File Table (MFT).
• Each file on an NTFS volume has a unique ID called a file reference.
  – A 64-bit quantity that consists of a 48-bit file number and a 16-bit sequence number.
  – Can be used to perform internal consistency checks.
• The NTFS name space is organized by a hierarchy of directories; the index root contains the top level of the B+ tree.

NTFS Volume Layout
[Figure: partition boot sector | Master File Table | system files | file area]
• Boot sector: up to 16 sectors.
• MFT: information about the files and directories in the volume, as well as available unallocated space, organized as a relational database structure. The MFT is itself a file (the first entry in the MFT).
• System files:
  – MFT2: a mirror of the first three rows of the MFT
  – Log file: records metadata updates to the file system, used for recovery
  – Cluster bitmap: records which clusters are allocated and which are free
  – Attribute definition table: defines the attribute types supported on this volume and indicates whether they can be indexed and whether they can be recovered during a system recovery operation.

An MFT Record
[Figure: MFT record fields: standard information | file name | security descriptor | data]
• A file is a collection of attributes. For a small file, all attributes can be stored in a single record.
• If the attributes cannot be stored in a single record, another record is used.
• If a single attribute is too large to be accommodated in an MFT record (e.g., the data attribute), NTFS allocates one or more separate areas called extents.
• If a file is a directory, the data area includes the index of files in the directory.

Windows NTFS Components
[Figure: I/O Manager, NTFS Driver, Log File Service, Fault Tolerant Driver, Disk Driver, Cache Manager, and Virtual Memory Manager, with arrows labeled "log the transaction", "read/write the file", "flush the log file", "write the cache", "read/write a mirrored or striped volume", "read/write the disk", "load data from disk into memory", and "access the mapped file or flush the cache"]

File System: Recovery
• All file system data structure updates are performed inside transactions.
  – Before a data structure is altered, the transaction writes a log record that contains redo and undo information.
  – After the data structure has been changed, a commit record is written to the log to signify that the transaction succeeded.
  – After a crash, the file system data structures can be restored to a consistent state by processing the log records.

File System: Recovery (Cont.)
• This scheme does not guarantee that all the user file data can be recovered after a crash, only that the file system data structures (the metadata files) are undamaged and reflect some consistent state prior to the crash.
• The log is stored in the third metadata file at the beginning of the volume.
• The logging functionality is provided by the Windows 2000 log-file service.
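
NFS Service on Linux
• Server side:
  – Runlevel 3
  – mountd (mount protocol)
  – nfsd (NFS protocol)
  – /etc/exports
  – /var/lib/nfs/xtab
  – exportfs -a
• Client side:
  – exportfs -r -o async django:/usr/tmp

The Linux Virtual File System (VFS)
• VFS abstracts away the details of each Linux file system, so that different file systems look the same to the Linux kernel and to the other processes running in the system.
• VFS is not a real file system. It exists only in memory, not in any secondary storage.
• VFS is built when the system boots and disappears when the system shuts down.
• VFS provides a common interface to the various specific file systems, such as the superblock, inodes, and file-operation function entry points.

Role of the VFS
[Figure: VFS with its VFS inode cache and VFS directory cache; MINIX FS, EXT2 FS, EXT FS, MSDOS FS; buffer cache; I/O device drivers]

File System Types
• Support for many different types of file systems is a major feature of the Linux operating system.
• Currently supported types include ext, ext2, minix, umsdos, ncp, iso9660, hpfs, msdos, xia, vfat, proc, nfs, smb, sysv, affs, ntfs, and ufs; see include/linux/autoconf.h.

Registering File System Types
• One way is to build the file system into the kernel at compile time; it is then entered in the registry by a built-in function call during system initialization.
• The other way uses the Linux module feature and treats the file system as a module. When the module is loaded (through kerneld or with the insmod command), it registers its type in the registry; when the module is unloaded, the type is removed from the registry.

Operations
int register_filesystem(struct file_system_type *fs);
int unregister_filesystem(struct file_system_type *fs);

Structure for Managing File System Types
static struct file_system_type *file_systems = (struct file_system_type *) NULL;

struct file_system_type {
    /* read_super points to the function that reads this file system's
       superblock from secondary storage */
    struct super_block *(*read_super)(struct super_block *, void *, int);
    const char *name;                /* file system type name, e.g. ext2 */
    int requires_dev;                /* whether a device is required; the proc file system needs none */
    struct file_system_type *next;   /* next pointer in the list of file system types */
};
[Figure: file_systems → file_system_type → file_system_type → … (next = 0 ends the list)]

Managing File System Instances
• At boot, the system must first mount the "root" file system (indicated by the global variable ROOT_DEV), then set up the other file systems one by one as specified in /etc/fstab.
• Users can also mount or unmount file systems at any time with the mount and umount operations.
• When a file system is mounted, it must first be registered with the kernel.

Mounting a File System
[Figure: labels: root, mount-point inode, mounted file system, i_sb, superblock of the file system, i_mount, s_covered, s_mounted]

Diagram of File System Types and Instances
[Figure: labels: vfsmount, super_block, file_system_type, file_systems, vfsmntlist, vfsmnttail, mnt_sb, s_type, inode, s_covered, s_mounted]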
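
Registration Operations for File System Instances
struct vfsmount *add_vfsmnt(kdev_t dev, const char *dev_name, const char *dir_name);
void remove_vfsmnt(kdev_t dev);
struct vfsmount *lookup_vfsmnt(kdev_t dev);

Data Structures for File System Instances
static struct vfsmount *vfsmntlist = (struct vfsmount *) NULL;   /* head */
static struct vfsmount *vfsmnttail = (struct vfsmount *) NULL;   /* tail */
static struct vfsmount *mru_vfsmnt = (struct vfsmount *) NULL;   /* current (most recently used) */

struct vfsmount {
    kdev_t mnt_dev;                      /* major and minor numbers of the device holding the file system */
    char *mnt_devname;                   /* device name, e.g. /dev/hda1 */
    char *mnt_dirname;                   /* name of the mount directory */
    unsigned int mnt_flags;              /* device flags, e.g. ro */
    struct semaphore mnt_sem;            /* semaphore used during I/O operations on the device */
    struct super_block *mnt_sb;          /* pointer to the superblock */
    struct file *mnt_quotas[MAXQUOTAS];  /* pointers to the quota files */
    time_t mnt_iexp[MAXQUOTAS];          /* expire time for inodes */
    time_t mnt_bexp[MAXQUOTAS];          /* expire time for blocks */
    struct vfsmount *mnt_next;           /* next pointer in the list of registered file systems */
};

VFS Superblock (fs.h)
struct super_block {
    kdev_t s_dev;                        /* major and minor numbers of the device containing this
                                            file system, e.g. 0x0301 means /dev/hda1 */
    unsigned long s_blocksize;           /* block size of the file system, in bytes */
    unsigned char s_blocksize_bits;      /* block size expressed as a power of 2 */
    unsigned char s_lock;                /* lock flag; when set, other processes are denied access */
    unsigned char s_rd_only;
    unsigned char s_dirt;                /* modified (dirty) flag */
    struct file_system_type *s_type;     /* points to the file_system_type describing this file system type */
    struct super_operations *s_op;       /* points to the set of functions that operate on this file system */
    struct dquot_operations *dq_op;
    unsigned long s_flags;
    unsigned long s_magic;
    unsigned long s_time;
    struct inode *s_covered;             /* inode of the mount-point directory; the root file system's
                                            VFS superblock has no such pointer */
    struct inode *s_mounted;             /* first inode of the mounted file system; used together
                                            with s_covered (see Figure 3-4) */
    struct wait_queue *s_wait;           /* wait queue on this superblock */
    union {                              /* information specific to each file system type */
        struct minix_sb_info minix_sb;
        struct ext_sb_info ext_sb;
        struct ext2_sb_info ext2_sb;     /* ext2 superblock */
        struct hpfs_sb_info hpfs_sb;
        struct msdos_sb_info msdos_sb;
        struct isofs_sb_info isofs_sb;
        struct nfs_sb_info nfs_sb;
        struct xiafs_sb_info xiafs_sb;
        struct sysv_sb_info sysv_sb;
        struct affs_sb_info affs_sb;
        struct ufs_sb_info ufs_sb;
        void *generic_sbp;
    } u;
};

The Linux Disk File System (ext2)
Disk organization of a typical UNIX file system:
• Boot block: at the beginning of the file system, usually one sector; holds the bootstrap program that loads and starts the operating system.
• Superblock: records the file system's management information. Each particular file system defines its own superblock.
• Inode area: each file (or directory) occupies one index node (inode). The first inode is the root node of the file system; through the root node, one file system can be attached to a non-leaf node of another file system.
• Data area: holds file data or management data (such as single-indirect and double-indirect blocks).

ext2 Architecture
[Figure: the volume is divided into Group 0, Group 1, …, Group N; each group contains a superblock, the file system (group) descriptors, a block bitmap, an inode bitmap, an inode table, and data blocks]

Block Groups
• Each block group keeps backup copies of file system information (the superblock and all group descriptors). If a group's superblock or group descriptors are damaged, these copies are used to recover the file system.
• The block bitmap records the usage of each data block in the group; each bit corresponds to one data block, where 0 means free and nonzero means allocated.
• The inode bitmap works like the block bitmap; it records which inodes in the inode table are in use.

ext2 Superblock
• Every block group contains an identical copy of the superblock. Normally only the superblock of block group 0 is read into memory; the superblocks in the other groups serve only as backups.

ext2 Group Descriptors
• Each block group is described by a group descriptor.
• All group descriptors are backed up in every block group.
• The group descriptors are stored one after another, forming the group descriptor table.

Data Block Addressing in the Inode
[Figure: an ext2_inode reaches data blocks through 12 direct blocks, a single-indirect block, a double-indirect block, and a triple-indirect block]

ext2 Directories
• A directory is a special file that records the access paths of files.
• A directory file is a list of directory entries; each entry is described by the following data structure:
struct ext2_dir_entry {
    __u32 inode;                 /* inode number of this entry; used to find the corresponding inode in the inode table */
    __u16 rec_len;               /* length of this directory entry */
    __u16 name_len;              /* length of the file name, at most 255 */
    char  name[EXT2_NAME_LEN];   /* file name */
};

Example: Looking Up /usr/include/stdio.h in ext2
• Starting from ROOT_DEV, find the file system's superblock through the vfsmntlist and file_systems lists, and from it the inode number of "/" (super_block.s_mounted in the VFS).
• Read the root inode of the file system from the inode table of block group 0.
• The root is a directory file (identified by i_mode) containing the ext2_dir_entry entries for the subdirectories and files under the root directory. Find the entry whose ext2_dir_entry.name is "usr", and read the inode number of the /usr directory from that entry's ext2_dir_entry.inode.

The ext3 File System
• Journaling file system: borrows logging techniques from databases so that the file system can be recovered faster after a system failure. Other examples include ReiserFS, JFS, and XFS. Journaling is currently the mainstream approach for implementing new file systems.
• ext3 keeps the on-disk structure of the ext2 file system and only adds journaling.
• When an ext2 file system is upgraded to ext3, a .journal file is added.
• When a new ext3 file system is created, the journal is implemented using certain inodes in the system.
• Several journaling modes are supported.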