UNIVERSITY OF GHANA
DEPARTMENT OF COMPUTER SCIENCE
CSCD102 – INTRO COMPUTER SCIENCE II
FILE MANAGEMENT – FILE SYSTEMS
------------------------------------------------------------------------
In this section the following notions are discussed:
* Purpose of a File System
* File names, naming conventions
* File allocation on storage media
* Compaction
------------------------------------------------------------------------
Purpose of File Management
The file manager handles all files on secondary storage media. To perform these tasks, file
management must:
* be able to identify the numerous files by giving unique names to them
* maintain a list telling where exactly each file is stored, how many sectors on the medium it
occupies, and in which order those sectors make up the file
* provide simple and fast algorithms to read and write files in cooperation with the device
manager
* give and deny access rights on files to users and programs
* allocate and deallocate files to processes in cooperation with the process manager
* provide users and programs with simple commands for file handling
Reference: http://courses.cs.vt.edu/~csonline/AI/Lessons/index.html
WHAT is a File?
A collection of data or information that has a name, called the filename. Almost all information
stored in a computer must be in a file. From an operating system's point of view, a file is a unit
of storage.
There are many different types of files: data files, text files, program files, directory files, and so
on.
Family of Files
Different types of files store different types of information.
For example, program files store programs, whereas text files store text.
.EXE
Pronounced ee-ex-ee file. In DOS and Windows systems, an EXE file is an executable file with an
.EXE extension.
.BAT
In DOS systems, batch files are often called BAT files because their filenames end with a .BAT
extension.
autoexec.bat
Stands for automatically executed batch file, the file that DOS automatically executes when a
computer boots up. This is a convenient place to put commands you always want to execute at
the beginning of a computing session. For example, you can set system parameters such as the
date and time, and install memory-resident programs.
log file
A file that lists actions that have occurred. For example, Web servers maintain log files listing
every request made to the server. With log file analysis tools, it's possible to get a good idea of
where visitors are coming from, how often they return, and how they navigate through a site.
Using cookies enables Webmasters to log even more detailed information about how individual
users are accessing a site.
Zone file
A file on a name server (for example a root server) that contains domain name registration
information. A zone file contains the information necessary to resolve domain names to IP
addresses and holds all information related to one domain. Zone files are also called master files.
text file
A file that holds text. The term text file is often used as a synonym for ASCII file, a file in which
characters are represented by their ASCII codes.
ascii file
A text file in which each byte represents one character according to the ASCII code. Contrast
with a binary file, in which there is no one-to-one mapping between bytes and characters. Files
that have been formatted with a word processor must be stored and transmitted as binary files
to preserve the formatting. ASCII files are sometimes called plain text files.
binary file
A file stored in binary format. A binary file is computer-readable but not human-readable. All
executable programs are stored in binary files, as are most numeric data files. In contrast, text
files are stored in a form (usually ASCII) that is human-readable.
File System
In an operating system, the overall structure in which files are named, stored, and organized.
NTFS, FAT, and FAT32 are types of file systems.
In computing, a file system (or filesystem) is a type of data store which can be used to store, retrieve
and update a set of files. The term could refer to the abstract data structures used to define files, or to
the actual software or firmware components that implement the abstract ideas.
WHAT is a Folder?
A container for programs and files in graphical user interfaces, symbolized on the screen by a
graphical image (icon) of a file folder. A folder is a means of organizing programs and
documents on a disk and can hold both files and additional folders.
Another definition: folder
In graphical user interfaces such as Windows and the Macintosh environment, a folder is an
object that can contain multiple documents. Folders are used to organize information. In the
DOS and UNIX worlds, folders are called directories.
Directories
An organizational unit, or container, used to organize folders and files into a hierarchical
structure. Directories contain bookkeeping information about files that are, figuratively
speaking, beneath them in the hierarchy. You can think of a directory as a file cabinet that
contains folders that contain files. Many graphical user interfaces use the term folder instead of
directory.
File Structure : Inverted or Tree Structure
Computer manuals often describe directories and file structures in terms of an inverted tree.
The files and directories at any level are contained in the directory above them. To access a file,
you may need to specify the names of all the directories above it. You do this by specifying a
path.
Root Directory
The topmost directory in any file system is called the root directory. The root directory is provided by
the operating system and has a special name: under DOS and Windows, the root directory is a
backslash (\).
A directory that is below another directory is called a subdirectory. A directory above a
subdirectory is called the parent directory.
parent
Refers to the directory above another directory. Every directory, except the root directory, lies
beneath another directory. The higher directory is called the parent directory, and the lower
directory is called a subdirectory. In DOS and UNIX systems, the parent directory is identified by
two dots (..).
working directory
The directory in which you are currently working. Pathnames that do not start with the root
directory are assumed by the operating system to start from the working directory.
OS commands
To read information from, or write information into, a directory, you must use an operating
system command. You cannot directly edit directory files.
For example, the DIR command in DOS reads a directory file and displays its contents.
file type
In the Macintosh environment, a four-character sequence that identifies the type of a
Macintosh file. The Macintosh Finder uses the file type and file creator to determine the
appropriate desktop icon for that file.
In the Windows environment, a designation of the operational or structural characteristics of a
file. The file type identifies the program, such as Microsoft Word, that is used to open the file.
File types are associated with a file name extension. For example, files that have the .txt or .log
extension are of the Text Document type and can be opened using any text editor.
format
The structure of a file that defines the way it is stored and laid out on the screen or in print. The
format of a file is usually indicated by its extension. For example, .txt after a file name indicates
the file is a text document, and .doc after a file name indicates it is a Word document.
File names, naming conventions
So that users, programs and the file manager itself can identify the different files,
each file must be given a unique file name.
Relative File Name
The *relative file name* is what a user normally recognises as file name; it consists of a name
and an extension, for instance |problem.txt| or |forloop.cpp|.
Apart from some exceptions, relative file names look the same in all operating systems.
The name is normally given by the user, whereas the extension (which is separated from the
name by a dot) generally indicates what kind of file it is.
Absolute File Name
The *absolute file name* is normally much longer than the user thinks it is. Here, the relative
file name is preceded by the place on disk where it is stored, that is: the drive name and the
directory names in which to find that file.
So the absolute file name consists of:
1. drive name, e.g. C:, A:, D:, etc.
2. directory name(s)
3. file name
4. extension
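The four parts listed above can be pulled out of a DOS/Windows-style absolute file name with Python's standard pathlib module. This is only a small sketch: the function name split_absolute_name and the dictionary keys are made up for illustration, and the sample path is the one used in these notes.

```python
from pathlib import PureWindowsPath


def split_absolute_name(path_text):
    """Split a DOS/Windows absolute file name into drive, directories, name, extension."""
    path = PureWindowsPath(path_text)
    return {
        "drive": path.drive,                     # e.g. "c:"
        "directories": list(path.parts[1:-1]),   # everything between drive and file
        "name": path.stem,                       # file name without extension
        "extension": path.suffix.lstrip("."),    # extension without the dot
    }


parts = split_absolute_name(r"c:\data\syllabus.doc")
print(parts)
```

PureWindowsPath only analyses the text of the path, so the sketch behaves the same on any operating system and does not touch the disk.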
Operating System and File Naming Rules
File name and extension are separated by a dot. The directories are separated by slashes (UNIX)
or backslashes (Windows, DOS). Because drive names and file organization differ from OS to
OS, absolute file names look different depending on what operating system is used.
For instance, a file with the relative name syllabus.doc, saved by the user Peter in the
directory data, would look like this:
in DOS:     c:\data\syllabus.doc
in WINDOWS: c:\data\syllabus.doc
in LINUX:   /usr/home/Peter/data/syllabus.doc
Note that the absolute file name changes when the location is different. The relative file name,
however, stays the same. So, after saving that file on a floppy disk, the absolute file name of the
backup would be:
in DOS:     a:\syllabus.doc
in WINDOWS: a:\syllabus.doc
in LINUX:   /mnt/fdd0/syllabus.doc
A relative file name is *restricted in length*. What exactly this restriction looks like again
depends on the OS.
DOS has the strictest restrictions, allowing the file name and also all directory names to be
only 8 characters long, and the extension 3.
This is commonly known as the "8.3" restriction (pronounced eight-dot-three).
All other operating systems allow the relative file name to be at least 14, but most often up to
255 characters long.
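As a rough illustration of the 8.3 restriction, the following sketch checks only the lengths of name and extension. The function name fits_8_3 is hypothetical, and the check deliberately ignores the further DOS rules about which characters are allowed.

```python
def fits_8_3(file_name):
    """Return True if the name part has 1 to 8 characters and the extension at most 3."""
    name, dot, extension = file_name.rpartition(".")
    if not dot:
        # No dot at all: the whole string is the name and there is no extension.
        name, extension = file_name, ""
    # A second dot inside the name part is not allowed in this simplified check.
    return 1 <= len(name) <= 8 and len(extension) <= 3 and "." not in name


for candidate in ["syllabus.doc", "forloop.cpp", "introduction.html", "readme"]:
    print(candidate, fits_8_3(candidate))
```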
wildcard character
A keyboard character that can be used to represent one or many characters when conducting a
query. The question mark (?) represents exactly one character, and the asterisk (*) represents
any number of characters (in most systems including none).
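Wildcard matching can be tried out with Python's standard fnmatch module, whose patterns use ? for exactly one character and * for any run of characters (including an empty one). The file names below are invented for the example.

```python
from fnmatch import fnmatchcase

names = ["problem.txt", "forloop.cpp", "syllabus.doc", "notes1.txt", "notes12.txt"]

# "*" stands for any run of characters: every name ending in .txt matches.
text_files = [n for n in names if fnmatchcase(n, "*.txt")]

# "?" stands for exactly one character: notes12.txt has one character too many.
one_character = [n for n in names if fnmatchcase(n, "notes?.txt")]

print(text_files)
print(one_character)
```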
File Allocation Table (FAT)
A file system used by MS-DOS and Windows-based operating systems to organize and
manage files. The file allocation table (FAT) is a data structure that Windows creates when you
format a volume by using the FAT or FAT32 file systems. Windows stores information about
each file in the FAT so that it can retrieve the file later.
FAT32
A derivative of the file allocation table (FAT) file system. FAT32 supports smaller cluster sizes
and larger volumes than FAT, which results in more efficient space allocation on FAT32
volumes.
NTFS file system (New Technology File System)
An advanced file system that provides performance, security, reliability, and advanced features
that are not found in any version of FAT. For example, NTFS guarantees volume consistency by
using standard transaction logging and recovery techniques. If a system fails, NTFS uses its log
file and checkpoint information to restore the consistency of the file system. In Windows 2000
and Windows XP, NTFS also provides advanced features such as file and folder permissions,
encryption, disk quotas, and compression.
File allocation on storage media
On the storage medium a file is saved in blocks (sectors) of equal size.
To access these files, device manager and file manager work together:
The device manager "knows" where to find each sector on disk, but only the file manager has a
list telling it in which sectors the file is stored.
File Allocation Table (FAT)
This list is the *File Allocation Table (FAT)*.
There are different ways of allocating files. The main concern is to provide a strategy that keeps
the FAT from growing too large, that makes it possible to retrieve a specific sector of a file, and
that does not waste too much storage space.
* contiguous file allocation
* non-contiguous file allocation (FAT)
* chained allocation
* indexed allocation
Contiguous file allocation
With contiguous file allocation a single set of blocks is allocated to a file at the time of file
creation. Each file is stored contiguously, one sector after another. The advantage is that the
FAT only has to have a single entry for each file, indicating the name, the start sector, and the
length. Moreover, it is easy to get a single block because its address can simply be calculated:
If a file starts at sector c and the n-th block is wanted (counting blocks from 0), the location on
secondary storage is simply c + n.
The disadvantage is that it may be difficult (if not impossible) to find a sufficiently large set of
contiguous blocks. From time to time it will be necessary to perform compaction.
Contiguous file allocation is nowadays only used for tapes and recordable CDs. One does not
make use of compaction algorithms, though, because data there is not supposed to be
changed. It is rather overwritten/thrown away if no longer needed.
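Both sides of contiguous allocation can be sketched in a few lines: the address calculation is a single addition, but finding room for a new file means searching for a long enough run of free sectors. The names block_address, find_free_run and free_map are assumptions made for this example, and blocks are counted from 0.

```python
def block_address(start_sector, n):
    """Sector holding block n of a contiguously stored file (blocks counted from 0)."""
    return start_sector + n


def find_free_run(free_map, length):
    """Return the first sector of a run of `length` free sectors, or None if there is none."""
    run_start, run_length = None, 0
    for sector, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = sector
            run_length += 1
            if run_length == length:
                return run_start
        else:
            run_length = 0  # the run is broken by an occupied sector
    return None


# True marks a free sector. Five sectors are free in total, but never four in a row.
free_map = [False, True, True, False, True, True, True, False]
print(block_address(10, 3))
print(find_free_run(free_map, 3))
print(find_free_run(free_map, 4))
```

The last call finds nothing although enough sectors are free in total, which is exactly the situation in which compaction becomes necessary.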
Non-contiguous file allocation (FAT)
With non-contiguous file allocation all blocks of a file can be distributed all over the storage
medium. The File Allocation Table (FAT) lists not only all files, but has an entry for each sector
the file occupies. Because all information is stored in the FAT, and no assumption about the
distribution of the file is made, this method of allocation is sometimes simply called FAT.
The advantage is that it is very easy to get a single block, because each block has its entry in the
FAT. Additionally, it is a very simple allocation method where not much overhead is produced
and no sophisticated search method for free blocks is needed.
The disadvantage is that the FAT can and will grow to an enormous size (imagine that a file of
1 MB size must have about 2000 entries in the FAT if each sector stores 512 bytes of data). That
slows the system down. Compaction will be needed from time to time.
FAT has been in use under DOS for a long time, and some variations of it are still used by
Win95 and Win98.
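The size estimate above can be checked with a short calculation, and the per-sector table itself can be modelled as a dictionary. This is a simplified sketch of the scheme as described in these notes, not of the on-disk FAT format; the table contents and the names entries_needed and sector_of_block are invented.

```python
import math

SECTOR_SIZE = 512  # bytes per sector, as in the example above


def entries_needed(file_size_bytes, sector_size=SECTOR_SIZE):
    """Number of table entries (one per sector) a file of the given size needs."""
    return math.ceil(file_size_bytes / sector_size)


# Hypothetical table: file name -> sector numbers in the order they make up the file.
fat = {"syllabus.doc": [7, 2, 9], "problem.txt": [4]}


def sector_of_block(fat, name, n):
    """Look up where block n (counted from 0) of a file is stored."""
    return fat[name][n]


print(entries_needed(1_000_000))    # 1 MB taken as 10**6 bytes
print(entries_needed(1024 * 1024))  # 1 MB taken as 2**20 bytes
print(sector_of_block(fat, "syllabus.doc", 1))
```

Depending on whether 1 MB is read as 10^6 or 2^20 bytes, the file needs 1954 or 2048 entries, which is where the rounded figure of 2000 comes from.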
Chained allocation
With chained file allocation only the first block of each file gets an entry in the FAT. Every
sector of the file has a pointer at its end that points to the file's next sector (or indicates that it
is the last).
The advantage is again that the FAT only has to have a single entry for each file, indicating file
name and position of the first sector. The files do not have to be stored contiguously.
The disadvantage is that it takes very long to retrieve a single block because its location is
neither stored nor can it be calculated. If a specific sector is needed, all preceding sectors have
to be read in order to find out where the next block is located.
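The cost of chained allocation shows up when the chain is followed in code. In this sketch the disk is a dictionary mapping a sector number to a pair of data and next-sector pointer; the layout, the END marker and the function name read_block are all made up for illustration.

```python
END = -1  # hypothetical marker meaning "this was the last sector of the file"

# disk[sector] = (data stored in the sector, pointer to the next sector)
disk = {5: ("A1", 8), 8: ("A2", 3), 3: ("A3", END)}
directory = {"fileA": 5}  # the table only records the first sector of each file


def read_block(disk, first_sector, n):
    """Return (data, sectors_read) for block n (counted from 0) by walking the chain."""
    sector = first_sector
    sectors_read = 0
    for _ in range(n):
        _, sector = disk[sector]  # a sector must be read just to learn the next pointer
        sectors_read += 1
        if sector == END:
            raise IndexError("file has fewer blocks than requested")
    data, _ = disk[sector]
    sectors_read += 1
    return data, sectors_read


print(read_block(disk, directory["fileA"], 0))
print(read_block(disk, directory["fileA"], 2))
```

Reading the third block costs three sector reads, and the cost grows linearly with the position of the block in the file.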
Indexed allocation
With indexed file allocation, too, only the first block of each file gets an entry in the FAT. In this
first sector, however, no data is stored but only pointers to where the file is on the storage
medium. That is why the first block is called the index block.
Here as well the FAT only has to have a single entry for each file, indicating file name and
position of the first sector. Additionally, it is easy to retrieve a single block because the
information about where it is stored is saved in the first block.
The disadvantage is that for each file an additional sector is needed. Even a very small file
always occupies at least two blocks, where the data would easily fit in one. So some of the
secondary storage space is wasted.
Indexed allocation is (in minor variations) implemented in all UNIXes. It is fast and reliable, and
nowadays the waste of storage space does not matter so much anymore.
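For comparison, here is the same kind of toy model for indexed allocation. The index block holds nothing but pointers, so any block is reached with two reads. The sector numbers and the names read_block and sectors_used are assumptions for this sketch; real UNIX file systems use more elaborate variations.

```python
# Sector 6 is the index block of fileA: it holds only pointers to the data sectors.
disk = {
    6: [11, 4, 9],
    11: "A1",
    4: "A2",
    9: "A3",
}
directory = {"fileA": 6}  # the table records only the index block of each file


def read_block(disk, index_sector, n):
    """Return block n (counted from 0): one read for the index block, one for the data."""
    pointers = disk[index_sector]
    return disk[pointers[n]]


def sectors_used(disk, index_sector):
    """Sectors a file occupies: its data sectors plus the index block itself."""
    return 1 + len(disk[index_sector])


print(read_block(disk, directory["fileA"], 2))
print(sectors_used(disk, directory["fileA"]))
```

A file with a single data block would still report two sectors here, which is the wasted space mentioned above.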
------------------------------------------------------------------------
fragmentation
The scattering of parts of the same disk file over different areas of the disk. Fragmentation
occurs as files on a disk are deleted and new files are added. It slows disk access and degrades
the overall performance of disk operations, although usually not severely.
defragmentation
The process of rewriting parts of a file to contiguous sectors on a hard disk to increase the
speed of access and retrieval. When files are updated, the computer tends to save these
updates on the largest continuous space on the hard disk, which is often on a different sector
than the other parts of the file. When files are thus fragmented, the computer must search the
hard disk each time the file is opened to find all of the file's parts, which slows down response
time.
Compaction
After many I/O operations, files on a hard disk are usually distributed over many tracks. That
slows the hard disk down, as it must reposition the head more often than it otherwise would.
During compaction the file manager tells the device manager which sectors' data logically
belong together (as files), and the device manager then exchanges the sectors' contents in such a
way that logically connected data are stored in neighbouring sectors. The procedure for doing
that is called a *compaction algorithm*.
Example: file distribution on a hard disk before compaction
Track 1: File B, block 1 | File A, block 2 | File B, block 2 | File C, block 5 | File A, block 1 | empty
Track 2: File C, block 1 | File C, block 2 | File A, block 3 | empty
Track 3: File C, block 3 | File C, block 4 | File A, block 4 | empty | File B, block 3
The same hard disk after compaction
Track 1: File B, block 1 | File B, block 2 | File B, block 3 | File C, block 1 | File C, block 2
Track 2: File C, block 3 | File C, block 4 | File C, block 5 | File A, block 1 | File A, block 2
Track 3: File A, block 3 | File A, block 4 | empty | empty | empty
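A very simple compaction can be sketched as sorting the occupied sectors by file and block number and moving all empty sectors to the end. This is a toy model on a small made-up layout, not the algorithm of any particular operating system; it places the files in alphabetical order, whereas a real implementation may choose any order.

```python
def compact(layout):
    """Return a layout in which each file's blocks are adjacent and in order.

    `layout` is a list with one entry per sector: a (file, block) tuple or None if empty.
    """
    used = sorted(cell for cell in layout if cell is not None)
    return used + [None] * (len(layout) - len(used))


before = [("B", 1), ("A", 2), ("B", 2), None, ("A", 1), ("C", 1), None, ("A", 3)]
after = compact(before)
print(after)
```

The sketch only computes the target layout; on a real disk the device manager would additionally have to move the sector contents and update the file table.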
Disk file systems are usually block-oriented. Files in a block-oriented file system are sequences
of blocks, often featuring fully random-access read, write, and modify operations. Pen drives
are one example of a block-oriented device.
How It Works
Technicians classify pen drives as NOT AND, also called NAND, gate-style data storage devices.
This technology works by storing data in blocks rather than randomly; as such, it doesn't work in
the same way that a computer's main memory systems — read-only memory (ROM) and
random-access memory (RAM) — do. Using blocks rather than allowing random access allows
the drive to store more information and be made at a lower cost.
Transfer Speeds
The actual transfer speed depends upon several factors, such as the computer's speed at reading
and writing to the device. Generally, a pen drive's advertised speed is the reading speed because
it is faster than the speed at which data can be written to it. Manufacturers usually list the speed
in megabytes per second (MB/s). The age of the drive and how it is being used (for example, for
writing and erasing small files) also affect the transfer speed.
Benefits
Equipped with a large amount of memory, the pen drive is often considered to be an
improvement on both the older floppy drive disks and the more modern compact disks. They can
transfer data much more quickly than these older technologies. Because they are solid state —
there are no moving parts — flash drives usually last longer and the data stored on them is safer.
Depending on the storage size, flash drives can hold anywhere from 128 MB to 32 GB or more;
by comparison, a standard CD-ROM holds about 700 MB of data.
Even a pen drive with a relatively low storage capacity tends to provide plenty of space for all
different types of files. Any file that can be stored on a computer's hard drive can usually be
copied to a flash drive, as long as there is enough memory. There are also programs that can be
run directly from the drive, without needing to be installed on the computer first.
Limitations
Pen drives do have a few limitations, including how many times they can be used. Each drive has
a limited number of program-erase cycles (P/E cycles); one cycle is the act of putting files onto
the drive and erasing them. Typically, the device can go through 100,000 P/E cycles before the
integrity of the unit is compromised and files become corrupted.
Another limitation concerns the way manufacturers build the devices. The NAND gate-style
allows a user to program or read the data one byte or word at a time, but erases data in blocks.
When only small amounts are erased, the storage capacity is reduced.
The NAND gate-style device also may cause the loss of data because of the way that the
information is accessed. Reading data in one cell may trigger changes in the cells that surround
it. Generally, a user must read the cell thousands of times before this occurs, however, and
rewriting the surrounding cells periodically may prevent this problem.
The computer chip in the drive can also wear out, causing the device to operate more slowly. The
NAND gate-style method of programming and erasing files that are smaller than a block can also
slow things down. This can make the device mark some blocks as bad, even though they are not
completely full; trying to read bad blocks and remapping them can reduce the speed with which
the device functions.
A flash file system is a file system designed for storing files on flash memory devices. These are
becoming more prevalent as the number of mobile devices is increasing, the cost per memory
size decreases, and the capacity of flash memories increases.
While a block device layer can emulate a disk drive so that a disk file system can be used on a
flash device, this is suboptimal for several reasons:
* Erasing blocks: Flash memory blocks have to be explicitly erased before they can be
written to. The time taken to erase blocks can be significant, thus it is beneficial to erase
unused blocks while the device is idle.
* Random access: Disk file systems are optimized to avoid disk seeks whenever possible,
due to the high cost of seeking. Flash memory devices impose no seek latency.
* Wear leveling: Flash memory devices tend to wear out when a single block is repeatedly
overwritten; flash file systems are designed to spread out writes evenly.
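The idea behind wear leveling can be illustrated with a toy model in which every rewrite of a logical block goes to the least-worn free physical block. The class WearLevelingSketch and its fields are invented for this example; real flash file systems and controllers are far more involved.

```python
class WearLevelingSketch:
    """Toy model: each rewrite is redirected to the least-erased free physical block."""

    def __init__(self, physical_blocks):
        self.erase_counts = [0] * physical_blocks
        self.data = [None] * physical_blocks
        self.mapping = {}  # logical block number -> physical block number

    def write(self, logical, payload):
        in_use = set(self.mapping.values())
        # Needs at least one free physical block, i.e. more physical than logical blocks.
        free = [p for p in range(len(self.data)) if p not in in_use]
        target = min(free, key=lambda p: self.erase_counts[p])
        self.erase_counts[target] += 1   # a block must be erased before it is written
        self.data[target] = payload
        self.mapping[logical] = target   # the previously used block becomes free again
        return target


flash = WearLevelingSketch(physical_blocks=4)
for version in range(12):
    flash.write(0, f"version {version}")  # the same logical block is rewritten 12 times

print(flash.erase_counts)
```

Although only one logical block is ever rewritten, the erasures end up spread evenly over all four physical blocks instead of wearing out a single one.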
File system types can be classified into disk/tape file systems, network file systems and special-purpose file systems.
Disk file systems
A disk file system takes advantage of the ability of disk storage media to randomly address data
in a short amount of time. Additional considerations include the speed of accessing data
following that initially requested and the anticipation that the following data may also be
requested. This permits multiple users (or processes) access to various data on the disk without
regard to the sequential location of the data. Examples include FAT32, exFAT, NTFS and HFS.
Optical discs
ISO 9660 and Universal Disk Format (UDF) are two common formats that target Compact
Discs, DVDs and Blu-ray discs. Mount Rainier is an extension to UDF supported by Linux 2.6
series and Windows Vista that facilitates rewriting to DVDs.
Flash file systems
A flash file system considers the special abilities, performance and restrictions of flash memory
devices. Frequently a disk file system can use a flash memory device as the underlying storage
medium, but it is much better to use a file system specifically designed for a flash device.
Tape file systems
A tape file system is a file system and tape format designed to store files on tape in a self-describing form. Magnetic tapes are sequential storage media with significantly longer random
data access times than disks, posing challenges to the creation and efficient management of a
general-purpose file system.
In a disk file system there is typically a master file directory, and a map of used and free data
regions. Any file additions, changes, or removals require updating the directory and the used/free
maps. Random access to data regions is measured in milliseconds so this system works well for
disks.
Tape requires linear motion to wind and unwind potentially very long reels of media. This tape
motion may take several seconds to several minutes to move the read/write head from one end of
the tape to the other.
Consequently, a master file directory and usage map can be extremely slow and inefficient with
tape. Writing typically involves reading the block usage map to find free blocks for writing,
updating the usage map and directory to add the data, and then advancing the tape to write the
data in the correct spot. Each additional file write requires updating the map and directory and
writing the data, which may take several seconds to occur for each file.
Tape file systems instead typically allow for the file directory to be spread across the tape
intermixed with the data, referred to as streaming, so that time-consuming and repeated tape
motions are not required to write new data.
However, a side effect of this design is that reading the file directory of a tape usually requires
scanning the entire tape to read all the scattered directory entries. Most data archiving software
that works with tape storage will store a local copy of the tape catalog on a disk file system, so
that adding files to a tape can be done quickly without having to rescan the tape media. The local
tape catalog copy is usually discarded if not used for a specified period of time, at which point
the tape must be re-scanned if it is to be used in the future.
Database file systems
Another concept for file management is the idea of a database-based file system. Instead of, or in
addition to, hierarchical structured management, files are identified by their characteristics, like
type of file, topic, author, or similar rich metadata. IBM DB2 for i (formerly known as DB2/400
and DB2 for i5/OS) is a database file system that is part of the object-based IBM i operating system
(formerly known as OS/400 and i5/OS), incorporating a single-level store and running on IBM
Power Systems (formerly known as AS/400 and iSeries). It was designed by Frank G. Soltis, IBM's
former chief scientist for IBM i. Between about 1978 and 1988, Frank G. Soltis and his team at IBM
Rochester successfully designed and applied technologies like the database file system, which
others such as Microsoft later failed to accomplish. These technologies are informally known
as 'Fortress Rochester'; in a few basic aspects they were extended from early mainframe
technologies, but in many ways they were more advanced from a technology perspective.
Some other projects that aren't "pure" database file systems but that use some aspects of a
database file system:
* Many Web CMSs use a relational DBMS to store and retrieve files. Examples: XHTML files are
stored as XML or text fields, image files are stored as blob fields; SQL SELECT (with optional
XPath) statements retrieve the files, and allow the use of more sophisticated logic and richer
information associations than "usual file systems".
* Very large file systems, embodied by applications like Apache Hadoop and Google File System,
use some database file system concepts.
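The CMS approach of keeping files in a relational database can be tried with Python's built-in sqlite3 module. The table layout, column names and sample rows below are assumptions made for this sketch; they are not taken from any particular CMS.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # throwaway in-memory database
connection.execute(
    "CREATE TABLE files (name TEXT PRIMARY KEY, author TEXT, kind TEXT, content BLOB)"
)
connection.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("syllabus.doc", "Peter", "document", b"course outline"),
        ("logo.png", "Peter", "image", b"\x89PNG..."),
        ("forloop.cpp", "Ama", "source", b"for (;;) {}"),
    ],
)

# Files are found by their characteristics (metadata) rather than by a directory path.
peters_files = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM files WHERE author = ? ORDER BY name", ("Peter",)
    )
]
content = connection.execute(
    "SELECT content FROM files WHERE name = ?", ("syllabus.doc",)
).fetchone()[0]

print(peters_files)
print(content)
```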
Transactional file systems
Some programs need to update multiple files "all at once". For example, a software installation
may write program binaries, libraries, and configuration files. If the software installation fails,
the program may be unusable. If the installation is upgrading a key system utility, such as the
command shell, the entire system may be left in an unusable state.
Transaction processing introduces the isolation guarantee, which states that operations within a
transaction are hidden from other threads on the system until the transaction commits, and that
interfering operations on the system will be properly serialized with the transaction. Transactions
also provide the atomicity guarantee, that operations inside of a transaction are either all
committed, or the transaction can be aborted and the system discards all of its partial results. This
means that if there is a crash or power failure, after recovery, the stored state will be consistent.
Either the software will be completely installed or the failed installation will be completely rolled
back, but an unusable partial install will not be left on the system.
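The atomicity guarantee can be illustrated with a toy in-memory model: all updates are applied to a private copy first and become visible only if every one of them succeeds. The function name install and the rule that a content of None means a failed write are assumptions made purely for this sketch; a real transactional file system also has to survive crashes, which this model does not attempt.

```python
def install(files, updates):
    """Apply every (name, content) update to `files`, or none of them."""
    staged = dict(files)  # work on a private copy that nobody else can see
    for name, content in updates:
        if content is None:  # stands in for a write that fails
            raise ValueError(f"could not write {name}")
        staged[name] = content
    # Commit: only now do the changes become visible, all at once.
    files.clear()
    files.update(staged)


system = {"shell": "v1", "libc": "v1"}

try:
    install(system, [("shell", "v2"), ("libc", None)])  # second write fails
except ValueError:
    pass
after_failure = dict(system)

install(system, [("shell", "v2"), ("libc", "v2")])
print(after_failure)
print(system)
```

After the failed installation the system still holds the old versions of both files; no half-upgraded state is ever visible.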
Windows, beginning with Vista, added transaction support to NTFS, in a feature called
Transactional NTFS, but its use is now discouraged. There are a number of research prototypes
of transactional file systems for UNIX systems, including the Valor file system and Amino.
Ensuring consistency across multiple file system operations is difficult, if not impossible,
without file system transactions. File locking can be used as a concurrency control mechanism
for individual files, but it typically does not protect the directory structure or file metadata. For
instance, file locking cannot prevent TOCTTOU (time-of-check-to-time-of-use) race conditions on symbolic links. File locking
also cannot automatically roll back a failed operation, such as a software upgrade; this requires
atomicity.
Network file systems
A network file system is a file system that acts as a client for a remote file access protocol,
providing access to files on a server. Examples of network file systems include clients for the
NFS, AFS and SMB protocols, and file-system-like clients for FTP.
Shared disk file systems
A shared disk file system is one in which a number of machines (usually servers) all have access
to the same external disk subsystem (usually a SAN). The file system arbitrates access to that
subsystem, preventing write collisions. Examples include GFS2 from Red Hat and GPFS from IBM.
Special file systems
A special file system presents non-file elements of an operating system as files so they can be
acted on using file system APIs. This is most commonly done in Unix-like operating systems,
but devices are given file names in some non-Unix-like operating systems as well.
Device file systems
A device file system represents I/O devices and pseudo-devices as files, called device files.
Examples in Unix-like systems include devfs and, in Linux 2.6 systems, udev. In non-Unix-like
systems, such as TOPS-10 and other operating systems influenced by it, where the full filename
or pathname of a file can include a device prefix, devices other than those containing file systems
are referred to by a device prefix specifying the device, without anything following it.
Operating Systems and File Systems
Many operating systems include support for more than one file system. Sometimes the OS and
the file system are so tightly interwoven it is difficult to separate out file system functions.
There needs to be an interface provided by the operating system software between the user and
the file system. This interface can be textual (such as provided by a command line interface, such
as the Unix shell, or OpenVMS DCL) or graphical (such as provided by a graphical user
interface, such as file browsers). If graphical, the metaphor of the folder, containing documents,
other files, and nested folders is often used (see also: directory and folder).
Unix-like operating systems
Unix-like operating systems create a virtual file system, which makes all the files on all the
devices appear to exist in a single hierarchy. This means, in those systems, there is one root
directory, and every file existing on the system is located under it somewhere. Unix-like systems
can use a RAM disk or a shared network resource as their root directory.
Unix-like systems assign a device name to each device, but this is not how the files on that
device are accessed. Instead, to gain access to files on another device, the operating system must
first be informed where in the directory tree those files should appear. This process is called
mounting a file system. For example, to access the files on a CD-ROM, one must tell the
operating system "Take the file system from this CD-ROM and make it appear under such-and-such
directory". The directory given to the operating system is called the mount point; it might,
for example, be /media. The /media directory exists on many Unix systems (as specified in the
Filesystem Hierarchy Standard) and is intended specifically for use as a mount point for
removable media such as CDs, DVDs, USB drives or floppy disks. It may be empty, or it may
contain subdirectories for mounting individual devices. Generally, only the administrator (i.e.
root user) may authorize the mounting of file systems.
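Because every mounted file system appears somewhere under the single root directory, the file system that holds a given file can be found by looking for the longest mount point that is a prefix of the file's path. The following is a minimal sketch of that lookup, not how any particular kernel implements it. The function name find_mount_point and the example mount points are assumptions made for illustration.

```python
import posixpath

def find_mount_point(path: str, mount_points: list[str]) -> str:
    """Return the longest mount point that contains the given absolute path."""
    path = posixpath.normpath(path)
    best = "/"  # the root file system contains everything by default
    for mp in mount_points:
        mp = posixpath.normpath(mp)
        # Match whole directory names only, so /media/cdrom does not
        # accidentally match /media/cdrom2.
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and len(mp) > len(best):
            best = mp
    return best

mounts = ["/", "/home", "/media/cdrom"]  # hypothetical mount table
print(find_mount_point("/media/cdrom/docs/a.txt", mounts))  # /media/cdrom
print(find_mount_point("/etc/hosts", mounts))               # /
```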
Unix-like operating systems often include software and tools that assist in the mounting process
and add new functionality to it. Some of these strategies have been termed "auto-mounting" as a
reflection of their purpose.
1. In many situations, file systems other than the root need to be available as soon as the
operating system has booted. All Unix-like systems therefore provide a facility for mounting file
systems at boot time. System administrators define these file systems in the configuration file
fstab (vfstab in Solaris), which also indicates options and mount points.
2. In some situations, there is no need to mount certain file systems at boot time, although their
use may be desired thereafter. There are some utilities for Unix-like systems that allow the
mounting of predefined file systems upon demand.
3. Removable media have become very common with microcomputer platforms. They allow
programs and data to be transferred between machines without a physical connection.
Common examples include USB flash drives, CD-ROMs, and DVDs. Utilities have therefore been
developed to detect the presence and availability of a medium and then mount that medium
without any user intervention.
4. Progressive Unix-like systems have also introduced a concept called supermounting; see, for
example, the Linux supermount-ng project. A floppy disk that has been
supermounted can be physically removed from the system. Under normal circumstances, the
disk should have been synchronized and then unmounted before its removal. Provided
synchronization has occurred, a different disk can be inserted into the drive. The system
automatically notices that the disk has changed and updates the mount point contents to reflect
the new medium. Similar functionality is found on Windows machines.
5. An automounter will automatically mount a file system when a reference is made to the
directory atop which it should be mounted. This is usually used for file systems on network
servers, rather than relying on events such as the insertion of media, as would be appropriate
for removable media.
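Item 1 above mentions the fstab configuration file. On Linux, each non-comment line of that file lists a device, a mount point, a file system type and a comma-separated list of options, followed by two numeric fields. The sketch below parses such text and picks out the entries that would be mounted at boot, that is, those without the noauto option. The sample entries, the FstabEntry type and the function name parse_fstab are assumptions made for this example, and the code reads a string rather than the real /etc/fstab.

```python
from typing import NamedTuple

class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list[str]

def parse_fstab(text: str) -> list[FstabEntry]:
    """Parse fstab-style text, ignoring blank lines and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # this sketch simply skips malformed lines
        device, mount_point, fs_type, options = fields[:4]
        entries.append(FstabEntry(device, mount_point, fs_type, options.split(",")))
    return entries

# Hypothetical sample content, not taken from a real system.
SAMPLE = """\
# device     mount point    type     options           dump pass
/dev/sda1    /              ext4     defaults          0    1
/dev/sda2    /home          ext4     defaults,noatime  0    2
/dev/cdrom   /media/cdrom   iso9660  ro,noauto,user    0    0
"""

entries = parse_fstab(SAMPLE)
at_boot = [e.mount_point for e in entries if "noauto" not in e.options]
print(at_boot)  # ['/', '/home']
```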
Linux
Linux supports many different file systems, but common choices for the system disk on a block
device include the ext* family (such as ext2, ext3 and ext4), XFS, JFS, ReiserFS and btrfs. For
raw flash without a flash translation layer (FTL) or Memory Technology Device (MTD), there are
UBIFS, JFFS2, and YAFFS, among others. SquashFS is a common compressed read-only file
system.
Solaris
The Sun Microsystems Solaris operating system in earlier releases defaulted to (non-journaled or
non-logging) UFS for bootable and supplementary file systems. Solaris defaulted to, supported,
and extended UFS.
Support for other file systems and significant enhancements were added over time, including
Veritas Software Corp. (Journaling) VxFS, Sun Microsystems (Clustering) QFS, Sun
Microsystems (Journaling) UFS, and Sun Microsystems (open source, poolable, 128-bit,
compressible, and error-correcting) ZFS.
Kernel extensions were added to Solaris to allow for bootable Veritas VxFS operation. Logging
or Journaling was added to UFS in Sun's Solaris 7. Releases of Solaris 10, Solaris Express,
OpenSolaris, and other open source variants of the Solaris operating system later supported
bootable ZFS.
Logical Volume Management allows for spanning a file system across multiple devices for the
purpose of adding redundancy, capacity, and/or throughput. Legacy environments in Solaris may
use Solaris Volume Manager (formerly known as Solstice DiskSuite). Multiple operating
systems (including Solaris) may use Veritas Volume Manager. Modern Solaris-based operating
systems eclipse the need for Volume Management by leveraging virtual storage pools in
ZFS.
OS X
OS X uses a file system that it inherited from classic Mac OS called HFS Plus, sometimes called
Mac OS Extended. HFS Plus is a metadata-rich and case-preserving but (usually)
case-insensitive file system. Due to the Unix roots of OS X, Unix permissions were added to HFS
Plus. Later versions of HFS Plus added journaling to prevent corruption of the file system
structure and introduced a number of optimizations to the allocation algorithms in an attempt to
defragment files automatically without requiring an external defragmenter.
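The phrase "case-preserving but case-insensitive" can be illustrated with a toy directory: names are stored exactly as the user typed them, but lookups ignore letter case. This is only a sketch of the behaviour, not of HFS Plus itself. The class name is invented for the example, and Python's str.casefold stands in for the file system's own Unicode comparison rules, which differ in detail.

```python
class CaseInsensitiveDirectory:
    """Toy directory: preserves the case of names, ignores it on lookup."""

    def __init__(self):
        self._entries = {}  # folded name -> (name as stored, contents)

    def create(self, name, contents):
        key = name.casefold()
        if key in self._entries:
            # "readme.txt" and "ReadMe.txt" count as the same file.
            raise FileExistsError(name)
        self._entries[key] = (name, contents)

    def read(self, name):
        return self._entries[name.casefold()][1]

    def listing(self):
        return sorted(stored for stored, _ in self._entries.values())

d = CaseInsensitiveDirectory()
d.create("ReadMe.txt", b"hello")
print(d.read("README.TXT"))  # b'hello'
print(d.listing())           # ['ReadMe.txt']
```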
Filenames can be up to 255 characters. HFS Plus uses Unicode to store filenames. On OS X, the
filetype can come from the type code, stored in the file's metadata, or the filename extension.
Microsoft Windows
[Figure: directory listing in a Windows command shell]
Windows makes use of the FAT, NTFS, exFAT and ReFS file systems (the last is supported and
usable only in Windows Server 2012; Windows cannot boot from it).
Windows uses a drive letter abstraction at the user level to distinguish one disk or partition from
another. For example, the path C:\WINDOWS represents a directory WINDOWS on the partition
represented by the letter C. Drive C: is most commonly used for the primary hard disk partition
(since at the advent of hard disks many computers had two floppy drives, A: and B:), on which
Windows is usually installed and from which it boots. This "tradition" has become so firmly
ingrained that bugs exist in many applications that assume the operating system is installed on
drive C. The use of drive letters, and the tradition of using "C" as
the drive letter for the primary hard disk partition, can be traced to MS-DOS, where the letters A
and B were reserved for up to two floppy disk drives.
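The drive letter is part of the path itself, so a program can extract it instead of assuming it is C. A minimal sketch using Python's standard pathlib module follows. PureWindowsPath applies Windows path rules on any operating system, and the helper name drive_letter is an assumption made for this example.

```python
from pathlib import PureWindowsPath

def drive_letter(path: str):
    """Return the upper-case drive letter of a Windows path, or None."""
    drive = PureWindowsPath(path).drive  # e.g. "C:" for r"C:\WINDOWS"
    if drive.endswith(":"):
        return drive[:1].upper()
    return None  # relative path or network share: no drive letter

print(drive_letter(r"C:\WINDOWS"))          # C
print(drive_letter(r"d:\data\report.txt"))  # D
print(drive_letter(r"WINDOWS\system.ini"))  # None
```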
FAT
The family of FAT file systems is supported by almost all operating systems for personal
computers, including all versions of Windows and MS-DOS/PC DOS and DR-DOS. (PC DOS is
an OEM version of MS-DOS; MS-DOS was originally based on SCP's 86-DOS. DR-DOS was
based on Digital Research's Concurrent DOS, a successor of CP/M-86.) The FAT file systems
are therefore well suited as a universal exchange format between computers and devices of almost
any type and age.
The FAT file system traces its roots back to an (incompatible) 8-bit FAT precursor in Standalone Disk BASIC and the short-lived MDOS/MIDAS project.
Over the years, the file system has been expanded from FAT12 to FAT16 and FAT32. Various
features have been added to the file system including subdirectories, codepage support, extended
attributes, and long filenames. Third parties such as Digital Research have incorporated optional
support for deletion tracking, and volume/directory/file-based multi-user security schemes to
support file and directory passwords and permissions such as read/write/execute/delete access
rights. Most of these extensions are not supported by Windows.
The FAT12 and FAT16 file systems had a limit on the number of entries in the root directory of
the file system and had restrictions on the maximum size of FAT-formatted disks or partitions.
FAT32 addresses the limitations in FAT12 and FAT16, except for the file size limit of close to
4 GB, but it remains limited compared to NTFS.
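The file size limit of "close to 4 GB" follows from the size of a file being recorded in a 32-bit field, which gives a maximum of 2^32 - 1 bytes, that is, one byte short of 4 GiB. The check below is a small sketch of that arithmetic. The constant and function names are chosen for this example.

```python
# Largest value a 32-bit unsigned size field can hold, in bytes.
FAT32_MAX_FILE_SIZE = 2**32 - 1

def fits_on_fat32(size_in_bytes: int) -> bool:
    """Return True if a single file of this size can be stored on FAT32."""
    return 0 <= size_in_bytes <= FAT32_MAX_FILE_SIZE

GIB = 1024**3
print(FAT32_MAX_FILE_SIZE)         # 4294967295
print(fits_on_fat32(3 * GIB))      # True
print(fits_on_fat32(4 * GIB))      # False
```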
FAT12, FAT16 and FAT32 also have a limit of eight characters for the file name, and three
characters for the extension (such as .exe). This is commonly referred to as the 8.3 filename
limit. VFAT, an optional extension to FAT12, FAT16 and FAT32, introduced in Windows 95
and Windows NT 3.5, allowed long file names (LFN) to be stored in the FAT file system in a
backwards compatible fashion.
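The 8.3 limit can be expressed as a simple length check: at most eight characters before the dot and at most three after it. The sketch below checks only those lengths. It deliberately ignores the rules about which characters are allowed and about reserved names, and the function name is_8_3_name is an assumption made for this example.

```python
def is_8_3_name(filename: str) -> bool:
    """Length-only check for the 8.3 filename limit (simplified)."""
    base, _, ext = filename.partition(".")
    if "." in ext:
        return False  # more than one dot cannot fit the 8.3 pattern
    return 1 <= len(base) <= 8 and len(ext) <= 3

for name in ["AUTOEXEC.BAT", "README", "index.html", "longfilename.txt"]:
    print(name, is_8_3_name(name))
```

Names that fail this check are the ones that need the long file name (LFN) support described above.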
NTFS
NTFS, introduced with the Windows NT operating system, allowed ACL-based permission
control. Other features also supported by NTFS include hard links, multiple file streams, attribute
indexing, quota tracking, sparse files, encryption, compression, and reparse points (directories
working as mount-points for other file systems, symlinks, junctions, remote storage links),
though not all of these features are well documented.
Long file paths and long file names
In hierarchical file systems, files are accessed by means of a path that is a branching list of
directories containing the file. Different file systems have different limits on the depth of the
path. File systems also have a limit on the length of an individual filename.
Copying files with long names or located in paths of significant depth from one file system to
another may cause undesirable results. This depends on how the utility doing the copying
handles the discrepancy.
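A copying utility can avoid surprises by checking a path against the limits of the destination file system before it starts. The following is a minimal sketch of such a check. The function name find_length_problems is invented for this example, paths are assumed to use "/" as the separator, and the limits are passed in as parameters because they differ from one file system to another.

```python
def find_length_problems(path: str, max_name: int, max_path: int) -> list[str]:
    """Return a list of human-readable problems; empty if the path fits."""
    problems = []
    if len(path) > max_path:
        problems.append(f"path is {len(path)} characters, limit is {max_path}")
    for component in path.split("/"):
        if len(component) > max_name:
            problems.append(
                f"name starting '{component[:12]}' is {len(component)} "
                f"characters, limit is {max_name}"
            )
    return problems

# Illustrative limits only; real values depend on the destination file system.
print(find_length_problems("docs/report.txt", max_name=255, max_path=4096))  # []
print(find_length_problems("docs/" + "x" * 300, max_name=255, max_path=4096))
```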