File Systems in Real-Time Embedded Applications
Introduction to File Systems
March 4th
Eric Julien

Week Agenda
• Day 1: Introduction to File Systems
• Day 2: Understanding how the File Allocation Table (FAT) Operates
• Day 3: Balancing performance, safety and resource usage in an embedded file system
• Day 4: Choosing the right storage media
• Day 5: The challenges of using NAND flash memory in embedded systems

Definition of a file system
From the user's perspective, the file system provides a means of organizing, storing and retrieving data on a permanent storage device.

Definition of a file system
From the designer's perspective, the file system refers to all the internal data structures and algorithms that support these services.

Historical overview
• 1973: CP/M operating system was first introduced. Its FS was very simple and had no directory hierarchy.
• 1980: QDOS, an operating system modeled on CP/M, was introduced. The QDOS FS was based on a data structure called the File Allocation Table.
• 1981: Microsoft bought QDOS and its FS and marketed them as MS-DOS and FAT.

FS in embedded systems
Embedded systems, as opposed to full-fledged computers, have strict limitations in terms of both processor speed and memory.
File systems designed for huge data centers (e.g. ZFS) are therefore not well-suited for small, less capable embedded systems.

Files
The file abstraction provides the user with a convenient way to retrieve previously stored pieces of data by name. A file can be seen as a labeled data container.

File metadata
File metadata is information stored on disk that describes a file. The metadata is not part of the file content. Examples of file metadata are:
• File name
• File creation date
• File size
• Security attributes

Directories
The directory abstraction provides the user with a convenient way to group related files.
Internally, the directory stores information that allows file names to be associated with the corresponding data block locations.
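As a rough illustration of how a directory can associate file names with data block locations, here is a minimal sketch. The `DirEntry` and `Directory` names and the entry layout (name, first block, size) are assumptions made for this example, not the on-disk format of any particular FS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirEntry:
    """One directory record (hypothetical layout): a name plus where the data lives."""
    name: str
    first_block: int  # index of the file's first data block on the volume
    size: int         # file size in bytes (metadata, not file content)


class Directory:
    """A directory modeled as a plain list of entries, searched linearly."""

    def __init__(self) -> None:
        self.entries = []

    def add(self, name: str, first_block: int, size: int) -> None:
        # File names must be unique within one directory.
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append(DirEntry(name, first_block, size))

    def lookup(self, name: str) -> Optional[DirEntry]:
        # Linear scan: the cost grows with the number of entries.
        for entry in self.entries:
            if entry.name == name:
                return entry
        return None


root = Directory()
root.add("config.txt", first_block=12, size=340)
root.add("log.bin", first_block=57, size=8192)
```

Here `root.lookup("log.bin")` returns the entry whose `first_block` tells the FS where to start reading, and a lookup of an unknown name returns `None`.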
Some old FS (such as early versions of DOS) had a single directory containing all files. Such FS are called flat file systems.

Device, partition and volume
The device refers to the physical storage medium (e.g. hard disk, SD card, flash memory).
The partition is a logical unit obtained by dividing the physical space of the underlying device (not FS specific).
The volume is a formatted partition or device where the FS resides (FS specific).

Common internal structures
Although internal architectures vary widely from one FS to another, the basic ingredients remain the same:
• Arrays
• Bitmaps
• Linked lists
• Unbalanced trees
• Balanced trees

Bitmaps
Often used to keep track of resource allocation.
0000000000000110
0000000000001000
Resources 1, 2 and 19 are allocated (bits are numbered from the right, starting at 0; the second row covers resources 16 to 31).
Used by ext2/3/4, NTFS, HFS and ReiserFS, among others.

Linked lists
Used to store and manage directory content (a) and file content (b).
[Figure: (a) directory Dir X as a linked list of the entries File A, File B, Dir Y and File C; (b) file File X as a linked list of Block A, Block B, Block C and Block D.]
Used by ext2/3 (a) and FAT12/16/32 (b).

Unbalanced trees
Heavily used by ext2/3 to organize data blocks. More levels of indirection are added as the file grows.
[Figure: unbalanced tree rooted at the File X metadata, with blocks A to N.]

Balanced trees (B-trees)
Figure (a) shows what a balanced tree looks like, as opposed to an unbalanced tree (b).
The B-tree is a self-balancing tree that provides logarithmic-time search at the expense of more complex node insertion and deletion.
[Figure: (a) balanced tree; (b) unbalanced tree.]

B+-tree vs. linked list
The B+-tree (a variant of the B-tree) provides fast random access.
[Figure: looking up block I of File X. In the B+-tree: I >= H so branch right, I >= I so branch right, data found in 3 hops. In the linked list of blocks A to I: data found in 8 hops.]
In a B+-tree, the search time is logarithmic and deterministic.
In a linked list, the search time is linear and non-deterministic.

File systems
• FAT
• exFAT
• Ext2/3/4
• NTFS
• HFS/HFS Plus
• Btrfs
• ZFS
• Log-structured file systems (YaFFS, JFFS)

FAT
- 3 flavors: FAT12, FAT16 and FAT32.
- DOS and Windows 9x file system.
- Simple architecture based on linked lists.
- Well-suited for embedded systems because of its low footprint (both on disk and in RAM).
- Poor performance on big volumes (remember linked lists vs. B-trees?).
- More on FAT later…

exFAT
- Smaller footprint than NTFS (more on NTFS later) but better performance than FAT32.
- Bitmaps used to track unallocated clusters (much faster than browsing the FAT).
- Huge file size limit (16 exabytes).

Ext2/3/4
• Default file system for many Linux distributions.
• Internal structure based on unbalanced trees with up to 3 levels of indirection.
• Journaling (since ext3) as a means of providing metadata reliability.
• Extents (variable-sized runs of contiguous blocks) in ext4 allow better large-file performance.

NTFS
• Default Windows file system since XP.
• Based on extents.
• Directory entries stored in a B-tree, providing much better performance than FAT for huge directories.
• Clever handling of small files: data is stored with the metadata for fast access and low internal fragmentation.

HFS/HFS Plus
• Default file system for Mac OS.
• All file and directory metadata is stored in a single giant B-tree.
• HFS Plus basically provides additional support for bigger files and longer file names.
• Journaling possible with HFS Plus.

Btrfs (B-tree file system)
• Almost everything (file, directory, resource allocation management) is stored in B-trees.
• Copy-on-write is used as a means of improving reliability. Data or metadata is never overwritten. Instead, a modified block is written out-of-place and pointers to it are then adjusted to reflect the new block location.

ZFS
• More than a regular file system: also a logical volume manager.
• Transactional model based on copy-on-write.
• Provides metadata AND data integrity by checksumming almost everything.
• Many advanced features such as data deduplication, snapshots and clones.

Log-structured file systems
• Storage medium treated as a log.
• Good reliability: logging implies copy-on-write.
• High write throughput: logging allows long sequential write operations.
• Well-suited for flash media as it inherently provides wear leveling.
• Used by YaFFS and JFFS (both flash FS).
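The log-structured idea above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `LogStructuredStore` class, its in-RAM `index` and the `recover` method are hypothetical names chosen for this example, the medium is modeled as a Python list, and garbage collection of stale records is left out. It does not reproduce how YaFFS or JFFS are implemented.

```python
class LogStructuredStore:
    """Toy log-structured store: every write is appended, nothing is overwritten."""

    def __init__(self) -> None:
        self.log = []    # the storage medium, modeled as an append-only list of records
        self.index = {}  # in-RAM map: block id -> position of its latest copy in the log

    def write(self, block_id: int, data: bytes) -> int:
        # Out-of-place update (copy-on-write): append a new copy, then repoint the index.
        position = len(self.log)
        self.log.append((block_id, data))
        self.index[block_id] = position
        return position

    def read(self, block_id: int) -> bytes:
        # Follow the index to the most recent copy of the block.
        return self.log[self.index[block_id]][1]

    def recover(self) -> None:
        # Rebuild the index by replaying the log in order; later records win.
        self.index = {}
        for position, (block_id, _data) in enumerate(self.log):
            self.index[block_id] = position


store = LogStructuredStore()
store.write(7, b"v1")
store.write(9, b"other")
store.write(7, b"v2")  # the old copy of block 7 stays untouched at position 0
```

Because each update lands at the tail of the log, writes are sequential and spread across the medium rather than hitting the same location repeatedly. The previous copy of a block remains intact until it is reclaimed, which is why the index can be rebuilt by replaying the log.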