Download What is a file system? - Montana State University

Linux Virtual File System
Robert Ledford
Leif Wickland
CS518
Fall 2004
I/O, I/O, It's off to disk I go-o-o, A bit or
byte to read or write, I/O, I/O, I/O...
Overview
• What is a file system?
• Historical view of file systems
• Another layer of indirection
• Do I have to?
• File systems have layers
• How is it done?
• Sign me up
What is a file system?
• Speaking broadly, a file system is the
logical means for an operating system to
store and retrieve data on the computer’s
hard disks, be they local drives, network-available volumes, or exported shares in a
storage area network (SAN)
What is a file system?
• There is some ambiguity in the term “file
system”. The term can be used to mean
any of the following:
- The type of a file system refers to a specific
implementation such as ext2, reiserfs or nfs.
Each implementation contains the methods and
data structures that an operating system uses to
keep track of files on a disk or partition
What is a file system?
- An instance of a file system refers to a file
system type residing at a location such as
/dev/hda4
- Additionally a file system can refer to the
methods and data structures that an operating
system uses to keep track of files on a disk or
partition
What is a file system?
• Linux keeps regular files and directories
on block devices such as disks
• A Linux installation may have several
physical disk units, each containing one or
more file system types
• Partitioning a disk into several file system
instances makes it easier for
administrators to manage the data stored
there
What is a file system?
[Figure: overhead view of a disk platter, with a sector, a track, and a cylinder labeled]
• Disk blocks are composed of one or more contiguous sectors
• The same track on each platter in a disk makes a cylinder; partitions are groups of contiguous cylinders
What is a file system?
• Why have multiple partitions?
• Encapsulate your data:
- Since file system corruption is local to a
partition, you stand to lose only some of your
data if an accident occurs
What is a file system?
• Increase disk space efficiency:
- You can format partitions with varying block
sizes, depending on your usage
- If your data is in a large number of small files
(less than 1k) and your partition uses 4k sized
blocks, you are wasting more than 3k for every file
- In general, you waste on average one half of a
block for every file, so matching block size to the
average size of your files is important if you have
many files
What is a file system?
• Limit data growth:
- Runaway processes or maniacal users can
consume so much disk space that the operating
system no longer has room on the hard drive for
its bookkeeping operations
- This can lead to disaster. By segregating
space, you ensure that things other than the
operating system die when allocated disk space
is exhausted
What is a file system?
• Partitioning tools and utilities:
- fdisk
rledford@leonard > sudo fdisk -l /dev/hda
Disk /dev/hda: 255 heads, 63 sectors, 4863 cylinders
Units = cylinders of 16065 * 512 bytes
Device     Boot  Start  End   Blocks     Id  System
/dev/hda1  *     1      26    208813+    83  Linux
/dev/hda2        27     1070  8385930    83  Linux
/dev/hda3        1071   1853  6289447+   83  Linux
/dev/hda4        1854   4863  24177825   f   Win95 Ext'd (LBA)
/dev/hda5        1854   2375  4192933+   83  Linux
/dev/hda6        2376   2897  4192933+   83  Linux
/dev/hda7        2898   3028  1052226    82  Linux swap
/dev/hda8        3029   4863  14739606   83  Linux
- parted: GNU partition editor
What is a file system?
• Review Questions
- Where does Linux keep regular file types?
• On block devices such as disks.
- On average, how much of a block is wasted for every
file?
• On average ½ of a block is wasted for every file.
What is a file system?
• Review:
• File system instances reside on partitions
• Partitioning is a means to divide a single
hard drive into many logical drives
• A partition is a contiguous set of blocks on
a drive that are treated as an independent
disk
• A partition table is an index that relates
sections of the hard drive to partitions
What is a file system?
[Figure: A Possible File System Instance Layout. Labels for the entire disk: Master Boot Record, Partition Table, Disk Partitions. Labels within a partition: Boot Block, Super Block, Inode List, Data Blocks.]
What is a file system?
• The central structural concepts of a file
system type are:
- Boot Block
- Super Block
- Inode List
- Data Block
What is a file system?
• Boot Block:
- Occupies the beginning of a file system
- Typically residing at the first sector, it may also
contain the bootstrap code that is read into the
machine at boot time
- Although only one boot block is required to
boot the system, every file system may contain a
boot block
What is a file system?
• Super Block:
- Describes the state of a file system
- How large it is
- How many files it can store
- Where to find free space in the file system
- Additional data that assists the file
management system with operating on the file
system
What is a file system?
• Super Block:
- Duplicate copies of the super block may reside
throughout the file system in case the super
block is corrupted
What is a file system?
• Inode List:
- An inode is the internal representation of a file.
It contains the description of the disk layout of the
file data, along with attributes such as:
- file owner
- permissions
- The inode list contains all of the inodes present
in an instance of a file system
What is a file system?
• Data Blocks:
- Contain the file data in the file system
- Additional administrative data
- An allocated data block can belong to one and
only one file in the file system
What is a file system?
• On a Linux system, a user or user
program sees a single file hierarchy,
rooted at /
• Every file and directory can trace its
origins on a tree back to the root directory
What is a file system?
/
bin boot dev etc home lib mnt proc root sbin tmp usr var
What is a file system?
• Review Questions
- Where is the bootstrap code located?
• In the Boot Block.
- What contains information about a file’s owner and the
file’s permissions?
• The inode.
- What is the index that relates sections of the hard drive
to partitions?
• The Partition Table.
What is a file system?
• A file system implements the basic
operations to manipulate files and
directories
• These basic operations include:
- Opening of files
- Closing of files
- Creation of directories
- Listing of the contents of directories
- Removal of files from a directory
What is a file system?
• The kernel deals on a logical level with file
systems rather than with disks
• The separate file systems that the system
may use are not accessed by device
identifiers
• Instead they are combined into a single
hierarchical tree structure that represents
the file systems as one whole single entity
What is a file system?
• So what is a file system?
- A file system is a set of abstract data types that
are implemented for the storage, hierarchical
organization, manipulation, navigation, access,
and retrieval of data
Historical view of file systems
• Managing storage has long been a
prominent role for operating systems.
– This role was so important to the MS-DOS OS
that it was named after that function.
– DOS stands for Disk Operating System.
– Created in 1980.
Historical view of file systems
• The file system hasn’t changed much
since the 1960s.
– A research paper was presented in 1965
describing “A General-Purpose File System
For Secondary Storage.”
• Laid out the notion of a hierarchical file system much
as is used today.
Historical view of file systems
• File system features in 1965 paper
– Files
– Directories
– Links
– Access permissions
– Create, access and modify times
– Path nomenclature: Directory:directory:file
– All devices mount into a unified hierarchy
– Backing up
• Everybody knew it was a good idea back then, too.
Historical view of file systems
• This type of file system was implemented
in Multics.
• Unix was created as an “emasculated”
version of Multics.
– Project started as gaming system in 1969.
• The designers of Unix had worked on
Multics and brought to Unix a Multics-style
file system.
Historical view of file systems
File System API in Unix System V, c. 1983
• chdir: change directory
• chmod: change permission
• chown: change owner
• chroot: change root
• close: close a file
• creat: create a file
• dup: copy file descriptor
• link: add a file reference
• lseek: set open file cursor
• mknod: make a special file
• mount: graft in a file system
• open: open a file
• pipe: create a pipe
• read: read from a file
• stat: get file status
• umount: opposite of mount
• unlink: delete file
• write: write to a file
Historical view of file systems
• Tanenbaum wrote Minix, a pedagogical
version of Unix, in 1987.
– Of course, it included a Unix-style file API.
Historical view of file systems
• Linus Torvalds introduced Linux in 1991.
– He developed Linux on Minix.
– Consequently it was convenient for the OSes
to share a file system and file API.
– Thus, Linux inherited the same style file
system as presented in the 1965 paper.
– Today Linux supports a superset of the file
system features available in Unix System V.
Historical view of file systems
• Review Questions
– In what year was the paper released that
described the file system design that is the
ancestor of Linux’ file system?
• 1965
– Yes, that’s about 40 years ago
Another layer of indirection
• Multics, Unix, Minix, and Linux originally
supported only one file system.
– They only understood one type of layout on
disk for directories and files.
– Because of its origins, Linux initially supported
just the Minix file system.
• Limited to small partitions and short filenames
– However, it wasn’t long before people wanted
more from their file systems
Another layer of indirection
• The problem:
– Linux was implemented like this.
[Diagram: User Program → Minix FS Interface → Hard Drive]
– To add support for another file system in a similar manner was unsavory and didn’t scale.
• User program must call a separate API for each type of file system.
[Diagram: User Program calls Other FS and Minix FS separately; they sit above Hard Drive A and Hard Drive B]
Another layer of indirection
• “Any problem in computer science can be
solved with another layer of indirection.”
– David Wheeler (chief programmer for the
EDSAC project in the early 1950s)
Another layer of indirection
• The solution was to add a
layer of indirection to the
file system stack.
• In Linux this layer is
called the virtual file
system (VFS).
• User programs access
any file system through a
consistent API.
• All file systems
implement an API which
is called by the VFS.
[Diagram: User Program → Virtual File System → Other FS and Minix FS → Hard Drive A and Hard Drive B]
Another layer of indirection
• The VFS is
– Another layer of indirection
– A file system- and device-agnostic layer of the
operating system
– A consistent API for user applications to
access storage independent of the underlying
device or type of file system
[Figure: Task 1, Task 2, … Task n run in user space. In kernel space, the Linux kernel contains the VIRTUAL FILE SYSTEM, below it the minix, ext2, msdos, and proc file systems, then the Buffer Cache, then the device drivers for hard disk and floppy disk. Below the software/hardware boundary sit the Hard Disk and Floppy Disk.]
Robbed from http://www.cs.usfca.edu/~cruse/cs326/lesson22.ppt
Do I have to?
• Is a VFS worth doing?
– What do you think?
Do I have to?
• Cons
– Harms system
performance
• Another layer of
indirection
– Adds to the size of the
system because more
code must be written
– A conceptually simpler
system
• Pros
– Enables using multiple
file systems
• Facilitates research
– Makes the computer
more useful
• My Linux box has ext2,
ext3, FAT32, and NTFS
partitions mounted
– Facilitates code reuse
– Simplifies
implementation
Do I have to?
• Is a VFS worth doing?
– Ultimately, the answer is yes for general
purpose operating systems.
– All modern commercial operating systems do.
– What would you do if you had to design an
OS’ file system? Would you use a VFS?
File systems have layers
• Like onions and ogres…
• In this context “file system” means the
software stack that extends from the user
application to the hardware.
File systems have layers
• The file system of Unix System V has three layers
– File/Directory API
– Inodes
– Buffers
[Diagram: User Program → File system (File/Directory API → Inodes → Buffers) → Storage Device]
File systems have layers
• Unix System V File/Directory API
– Functions like open() and read()
• Called by user programs
– Directories are implemented as files
• Contain children’s
– Name
– Inode Number
File systems have layers
• Unix System V Inodes
– Allocate disk blocks for files.
– Record file attributes
• Owner
• Access Permissions
• File Size
• Type
– File
– Directory
– Special
• Not File Names
If file names aren’t stored in inodes, where are they stored?
File names are stored in the parent directory entry.
File systems have layers
• Unix System V Inodes, continued
– Stored on disk
– Cached in memory
• With added details
– Device the inode is from
– The inode number
– If the file is a mount point
– Much more…
– Many pathnames may point to a single inode
File systems have layers
• Implementation of an inode
[Diagram: an inode holds the file attributes, ten direct pointers (Direct 0 through Direct 9), and Single Indirect, Double Indirect, and Triple Indirect pointers. Direct pointers reference data blocks; the single indirect pointer references an indirect block; the double and triple indirect pointers reference two and three levels of indirect blocks.]
File systems have layers
• Notes on previous diagram
– All the direct pointers of an inode are used
before using an indirect pointer
– All of the slots in the single indirect block are
consumed before starting to use the double
indirect blocks
• Likewise for double indirect before triple indirect
File systems have layers
• Unix System V Buffers
– In memory copy of contents of a disk block
• Same size
– Mechanism through which caching is achieved
• Read ahead
• Delayed write
File systems have layers
• Review Questions
– Where is the type of a directory entry stored?
• In the inode the directory entry points to.
– Where is the name of an inode stored?
• In the directory entry which points to it.
– Can more than one file point to a given inode?
• Yes. Many files may point to the same inode.
– Can more than one inode point to a disk block?
• No. Inodes point to zero or more blocks, but a block may be
referenced by zero or one inode.
– How is a directory different than a file?
• A directory is a type of file, as indicated by the inode, which
contains a listing of the directory’s children
File systems have layers
• How do you get from a file name to a file’s contents?
– Recursive procedure
1. Start with inode of current or root directory
• If path begins with ‘/’ use root; otherwise use current
• Inodes for both are cached for the process
2. Get disk block(s) pointed to by inode for directory
3. Search directory listing for next part of path.
4. If found, get the inode pointed to by entry
5. If inode says child is a directory, go to 2. If child is a file, get disk block(s) inode points to.
File systems have layers
• Resolving a file path
to a file’s contents
– Consider how we’d
resolve the path
/foo/foobarred/found to
that file’s contents in
this example directory
tree.
– See following slides for
an example
/
  foo
    .ssh
    foobarred
      lost
      found
  .bashrc
  bar
File systems have layers
• Resolving a file path to a file’s contents
Step through the following to see the process of resolving the file name /foo/foobarred/found to its contents
Inodes
#  Type  Disk Block
0  Dir   0
1  Dir   7
2  Dir   2
3  File  3
4  Dir   5
5  File  4
6  File  6
7  File  1
Disk Blocks
#  Contents
0  foo: 1; .bashrc: 5; bar: 4
1  I’m found
2  lost: 6; found: 7
3  All my secrets would go in here…
4  I get into bars with my aliases
5  -empty-
6  I’m lost
7  foobarred: 2; .ssh: 3
File systems have layers
• Resolving a file path to a file’s contents, step by step (same inode and disk block tables as on the previous slide)
1. Inode for root directory is known to be in slot 0.
2. Root directory inode points to block 0
3. Root directory listing says that inode 1 is for child foo.
4. Foo’s inode says that it’s a directory and points to block 7
5. Foo’s directory listing says that inode 2 is for child foobarred
6. Foobarred’s inode says that it is a directory and its listing is in block 2
7. Directory listing of foobarred says that child found has inode 7.
8. Inode says that found is a file and 1 is its first block
9. The content of /foo/foobarred/found is “I’m found”
File systems have layers
• Review Questions
– What are the layers of the System V file
system?
• File/Directory API
• Inodes
• Buffers
File systems have layers
• Problems in System V file system
implementation
– Searching a directory listing for a child is time consuming
• Directory listing is unsorted
– Allows entries to be inserted and removed cheaply
– Makes searching expensive
– Requires that a linear search be performed; can’t use binary search
– Implies that time to find an entry increases linearly with the number of directory entries
File systems have layers
• Problems in System V file system
implementation
– The listing for the directory must be read from
disk at each step in a path
• Can cause the disk head to jump around
– For example, if you want to read the file
/foo/foobarred/found, you have to read and search the
three directory listings along the way
• Detrimental to performance
File systems have layers
• The Linux solution to these shortcomings
– Add another layer of indirection
– Layer is called the dcache
• Short for directory cache
• Caches the contents of directory listings
– The dcache is composed of dentries
• Short for directory entry
– A dentry is a cached association from a path
name to an inode
• Also caches relationships to other dentries
File systems have layers
• With the addition of the dcache, what does
the Linux file system software stack look
like, compared to the System V file system
software stack?
– Glad you asked
– Next slide, please
File systems have layers
Layers of the Linux kernel FS
• Since version 2.1, the Linux virtual file system has had three layers
– File/Directory API
– Dcache
– Inodes
• Buffers are not part of VFS
• Dcache doesn’t exist in System V
[Diagram: User Program → Virtual file system (File/Directory API → Dcache → Inodes) → Real FS Implementation → Buffers → Storage Device]
File systems have layers
• Modern Linux File/Directory API
– Implements a superset of the System V API
– User programs interact with the File/Directory API through path names or integer file descriptors
• A file descriptor is an index into an array of pointers to file structures
– More on that later
File systems have layers
• Modern Linux Dcache
– Improves performance
– Composed of dentries
• Caches the inode that a path name points to
• Caches relationships to other dentries
– Parent
– Siblings
– Children
• Has a hash value for the name of the file it represents
– Speeds up string comparisons
File systems have layers
• Modern Linux Inodes
– Much like System V inodes
– Contain pointers to file system implementation specific operations
File systems have layers
• Modern Linux Real File System Implementation
– Can be compiled into the kernel or can be loaded as a kernel module
File systems have layers
• Modern Linux File Buffers
– Integrated with the virtual memory cache since 1999
– See virtual memory presentation for more information
File systems have layers
• Review Questions
– What’s the purpose of a dentry?
• A dentry exists to cache a directory entry in order
to improve performance
– What are some of the things a dentry links to?
• A dentry links to
– Its parent
– A list of its children
– A list of its siblings
– Its file’s in memory inode
How is it done?
• Down to the nitty-gritty code details
– There’s nothing to fear here
How is it done?
• There are quite a few C structures that the
kernel employs to keep track of open files
– struct task_struct (View sched.h code)
• Details about a user process
• Has a pointer to a files_struct
• During a syscall, the kernel has a pointer to the
current process’ task_struct
– struct files_struct (View file.h code)
• Tracks the open files for a process
• Has an array of pointers to file struct instances.
– When a user program opens a file, it’s given an integer
index into this array
How is it done?
• File related kernel structures, continued
– struct file (View fs.h code)
• The kernel maintains an array of these with an element for
each open file system-wide
• Has a pointer to dentry; plus file owner, read/write cursor
position, and more
– struct dentry (View dcache.h code)
• Points to the inode for the file
– struct inode (View fs.h code)
• Stores its block number and the device it’s from
• Points to operations in the FS implementation which can read
the disk
How is it done?
• The following diagram shows the
relationship of the aforementioned
structures
[Diagram, inside the Linux kernel: the user program holds a file handle (an integer value). The task_struct holds a files_struct pointer and other stuff. The files_struct holds an array of file struct pointers and other stuff. Each pointer refers to a file struct in the kernel’s array of all file structs. A file struct holds a dentry pointer and other stuff. Each dentry in the dcache holds an inode pointer and other stuff. The inode structs in the in-memory inode cache correspond to the disk inodes and data blocks of the FS instance on the storage device.]
How is it done?
• Review Questions
– When a user program calls open() to open a file, a
non-negative return value indicates success. What
is the function of that non-negative number?
• The return value is the index into the array of file structure
pointers in the files_struct structure.
– What is the cardinality between user processes and
files_struct instances?
• There is a one-to-one relationship between files_struct
instances and user processes.
– Why?
• Because there are one-to-one relationships between user
processes and task_struct instances and between
task_struct instances and files_struct instances.
How is it done?
• Review Questions
– If two user processes open the same file, are
two or one file struct instances created?
• Trick question
– Normally, two instances are created
– However, if a process opened a file and then forked to
create another process, parent and child have the file
open and both share one file struct instance
How is it done?
• Great. Now we see how the data relates
• But we’d like to see some action
• Consider the following C program
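#include <fcntl.h>    /* open(), O_RDWR, O_CREAT */
#include <stdio.h>    /* printf() */
#include <unistd.h>   /* write(), lseek(), read(), close() */

int main(void) {
    char ch = 'A';
    /* O_RDWR | O_CREAT equals the octal 00102 on Linux;
       a mode argument (0644 here) is required with O_CREAT. */
    int fd = open("fsSample.txt", O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 1;
    write(fd, &ch, 1);          /* write 'A' at offset 0 */
    lseek(fd, 0, SEEK_SET);     /* move the cursor back to the start */
    read(fd, &ch, 1);           /* read the byte back */
    close(fd);
    printf("The character read was: %c\n", ch);
    return 0;
}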
• What actually happens when those file
system API functions are called?
• Links in diagrams go to Linux 2.6 source
How is it done?
User program
  open(): Takes the path name of the file to open as an argument
System call layer
  sys_open()
    getname(): Make a copy of the path name string in kernel space
    get_unused_fd(): Reserve an unused element in the array of file struct pointers in the process’ files_struct instance (diagram)
    filp_open(): The workhorse: see the next slide for more detail. Will return a file struct instance for the opened file.
    fd_install()
    putname()
How is it done?
User program
  open()
System call layer
  sys_open()
    filp_open()
      open_namei()
        path_lookup(): Determine if the path is relative to the current directory, the root directory, or a process specific root. Get the dentry for that directory. Call link_path_walk() which performs file name resolution. See next slide for more.
        __lookup_hash(): Get a dentry for the file we’re opening. Call real FS implementation to populate structure if not cached.
        vfs_create(): If the file doesn’t exist, call the real FS implementation to create it.
        may_open(): Check that the user has permission to open the file.
      dentry_open()
How is it done?
User program
  open()
System call layer
  sys_open()
    filp_open()
      open_namei()
        path_lookup()
          link_path_walk()
Purpose: Resolve the path name to a dentry
Set inode to the inode for directory that path_lookup() determined the path name was relative to
While there are segments left in the path name
    Fail if user does not have permission to inode
    Parse the next “/” separated segment in path name
    If the segment is “..”
        Set inode to the parent of inode, allowing for crossing mount points or inode being root dir
    Get a dentry for child of inode named by segment
    If the dentry for the segment isn’t cached
        Ask FS implementation for dentry and its inode
    If the dentry points to a symbolic link
        Get the dentry and inode for the item pointed to
    If the dentry is for a mount point
        Get the dentry and inode for mounted root dir
    Set inode to dentry’s inode
Return the dentry that owns inode
How is it done?
User program
  open()
System call layer
  sys_open()
    filp_open()
      open_namei(): Returned a dentry for the path name
      dentry_open(): Allocate a file struct instance. Populate that file struct from the dentry
How is it done?
User program
  open()
System call layer
  sys_open()
    getname()
    get_unused_fd()
    filp_open(): Returned a file struct instance for the opened file
    fd_install(): Set the previously reserved file struct pointer to the file struct instance returned by filp_open()
    putname(): Deallocate the kernel copy of the filename
How is it done?
User program
  open()
System call layer
  sys_open(): Returns the index where fd_install() put the new file struct pointer. Value is called the file descriptor, or FD.
How is it done?
User program
  read(): Takes the open file’s FD as an argument
System call layer
  sys_read()
    fget_light(): Get the file struct instance for the FD
    file_pos_read(): Copy the read/write cursor location from file struct
    vfs_read(): Call the real FS implementation to read a specified number of bytes from the file
    file_pos_write(): Increment the read/write cursor by the number of bytes read
    fput_light(): Release the file struct instance for the FD
How is it done?
User program
Takes the open file’s FD
as an argument
lseek()
System call layer
sys_lseek()
fget_light()
vfs_llseek()
fput_light()
Get the FD’s file struct instance
default_llseek()
Get the read/write cursor location from file struct
Update the cursor according to the arguments
Save the cursor location back into the file struct
Release the FD’s file struct instance
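A seek only touches the cursor stored in the file struct. The sketch below shows that arithmetic under simplifying assumptions: toy_seek_file and toy_default_llseek() are hypothetical names, the file size is kept in the struct itself, and errors are reported as -1 rather than with a kernel error code.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

struct toy_seek_file {
    long size;    /* current length of the file */
    long f_pos;   /* read/write cursor */
};

/* Compute the new cursor from (offset, whence) and save it back. */
long toy_default_llseek(struct toy_seek_file *f, long offset, int whence)
{
    long pos;

    switch (whence) {
    case SEEK_SET: pos = offset;            break;  /* from the start  */
    case SEEK_CUR: pos = f->f_pos + offset; break;  /* from the cursor */
    case SEEK_END: pos = f->size + offset;  break;  /* from the end    */
    default:       return -1;                       /* unknown whence  */
    }
    if (pos < 0)
        return -1;          /* refuse to move before the start */
    f->f_pos = pos;
    return pos;
}
```

A rejected seek leaves the cursor where it was, so a failed call does not disturb later reads and writes.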
How is it done?
User program
Takes the open file’s FD
as an argument
write()
System call layer
sys_write()
fget_light()
file_pos_read()
vfs_write()
file_pos_write()
fput_light()
Get the file struct instance for the FD
Copy the read/write cursor location from file struct
Call the real FS implementation to write the specified
bytes at the write cursor location
Increment the read/write cursor by the number of
bytes written
Release the file struct instance for the FD
How is it done?
User program
close()
Takes the FD of the file
to close as an argument
Frees FD’s slot in the array of file struct pointers
System call layer
sys_close()
filp_close()
Saves the value of FD so that an unused slot can be
found quickly when next one is needed
Calls the real FS implementation to flush any unwritten
data to storage
fput()
If this is the last reference to the file struct instance
Allow the real FS implementation to free resources
Deallocate the dentry for the file
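The "last reference" test in the close path is a reference count. A minimal sketch, assuming a hypothetical toy_ref_file with a counter and a flag that records when resources would be freed:

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NFDS 4

struct toy_ref_file {
    int f_count;    /* how many FD slots point at this file struct */
    int released;   /* set once the last reference is dropped */
};

/* Drop one reference; on the last one the real FS would free resources. */
void toy_fput(struct toy_ref_file *f)
{
    if (--f->f_count == 0)
        f->released = 1;
}

/* Free the FD's slot, then drop the reference that slot held.
 * Returns 0 on success, -1 if fd is out of range or not open. */
int toy_close(struct toy_ref_file *fds[TOY_NFDS], int fd)
{
    if (fd < 0 || fd >= TOY_NFDS || !fds[fd])
        return -1;
    struct toy_ref_file *f = fds[fd];
    fds[fd] = NULL;
    toy_fput(f);
    return 0;
}
```

With two slots pointing at the same file struct, closing the first slot leaves the file alive and only the second close triggers the release.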
Sign me up.
• So how do the Linux kernel and the Virtual
File System find out about a particular file
system implementation?
• (e.g., ext2, nfs, reiserfs)
Sign me up.
• When you configure a Linux kernel build you
are asked, for each of the supported file
systems, whether you want it in your build
• When the kernel is actually built, the file
system startup code contains calls to the
initialization routines of all of the built in file
systems
Sign me up.
• Linux file systems may also be built as
modules and, in this case, they may be
demand loaded as they are needed or
loaded by hand using insmod
• Whenever a file system module is loaded
it registers itself with the kernel and
unregisters itself when it is unloaded
Sign me up.
• Each specific file system's initialization
routine registers the file system with the
Virtual File System. The file system is
represented by a:
file_system_type
data structure which contains the name of
the file system and a pointer to its VFS
Super Block read routine
Sign me up.
• By the time a file system is registered its
file_system_type data structure has been
populated with data specific to the file
system
• Registration puts the file_system_type
structure into a list, pointed at by the
file_systems pointer
Sign me up.
• The kernel utilizes the file_systems list to
check if a specific file system has been
registered
• And to assist with the mapping of the
Virtual File System’s operations to a
specific implementation of a file system’s
operations by using the VFS Super Block
read routine
Sign me up.
Registered File Systems Example
[Diagram: the file_systems pointer points at a list of file_system_type
structures for ext2, reiserfs and nfs. Each structure holds name, fs_flags,
*read_super() and *next.]
Sign me up.
• Where is the file_system_type structure
declared?
• What does it look like?
linux/fs.h
struct file_system_type {
    const char *name;
    int fs_flags;
    struct super_block *(*read_super) (struct super_block *, void *, int);
    struct file_system_type * next;
};
Sign me up.
• Some details of the file_system_type
structure:
- name:
- The name of the file system type, such as
ext2, nfs, reiserfs etc.
- This field is used as a key: it is not
possible to register a file system under a
name that is already registered
- Function find_filesystem() utilizes the name
field to check if a file system has already
been registered
Sign me up.
- fs_flags:
- A number of flags which record
features of the file system
- If FS_REQUIRES_DEV is set in fs_flags
then a block device must be given when
mounting the file system
- Not all file systems need a device to hold
them. The /proc file system, for example,
does not require a block device
Sign me up.
- fs_flags: (continued)
- Aside: The command:
$ cat /proc/filesystems
displays file system information from the
file_systems list of file_system_type structures
- In particular, it is possible to see if a file
system requires a device or not by noting the
presence or absence of “nodev” in front of a
file system listing
Sign me up.
- fs_flags: (continued)
ledford@esus ~ $ cat /proc/filesystems
nodev rootfs
nodev bdev
nodev proc
nodev sockfs
nodev tmpfs
nodev shm
nodev pipefs
nodev binfmt_misc
ext3
ext2
nodev ramfs
iso9660
nodev nfs
nodev smbfs
nodev autofs
reiserfs
nodev devpts
xfs
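A listing like the one above can be produced by walking the list and testing one flag bit. The sketch below is a user-space approximation: toy_listed_fs and toy_list_filesystems() are hypothetical names, and the flag value 1 for TOY_FS_REQUIRES_DEV is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_FS_REQUIRES_DEV 1   /* assumed bit value, for illustration only */

struct toy_listed_fs {
    const char *name;
    int fs_flags;
    struct toy_listed_fs *next;
};

/* Write one line per registered type into buf: "nodev", a tab, the name.
 * Types that need a block device get an empty first column.
 * Returns the number of bytes written, not counting the terminator. */
int toy_list_filesystems(const struct toy_listed_fs *head,
                         char *buf, size_t size)
{
    size_t len = 0;

    if (size)
        buf[0] = '\0';
    for (; head; head = head->next) {
        int n = snprintf(buf + len, size - len, "%s\t%s\n",
                         (head->fs_flags & TOY_FS_REQUIRES_DEV) ? "" : "nodev",
                         head->name);
        if (n < 0 || (size_t)n >= size - len) {
            if (size)
                buf[len] = '\0';    /* drop the partial line */
            break;                  /* out of room */
        }
        len += (size_t)n;
    }
    return (int)len;
}
```

For a list holding proc (no flag) followed by ext2 (flag set), the buffer ends up with "nodev", a tab and "proc" on the first line, and a tab followed by "ext2" on the second.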
Sign me up.
- next:
- A pointer used for chaining all the
file_system_type structures together
[Diagram: file_systems points at the ext2 file_system_type structure; the
*next fields chain the ext2, reiserfs and nfs structures together. Each
structure holds name, fs_flags, *read_super() and *next.]
Sign me up.
- read_super
- This routine is called by the VFS when an
instance of the file system is mounted
struct super_block *(*read_super) (struct super_block *, void *, int);
- What is going on here?
- A VFS super_block structure is being
populated by the function call read_super
with data specific to a particular file system
Sign me up.
struct super_block *(*read_super) (struct super_block *, void *, int);
- The void* pointer points to data that has
been passed down from the mount system
call
- The trailing int signifies whether or not
read_super should be silent about errors
- This is only set when mounting
the root file system
- Several file systems may be tried
when attempting to mount the root file
system so avoiding unsightly errors is
desired
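The reason for the silent flag shows up when several types are probed in turn. The following sketch assumes a hypothetical toy_probe_type that recognises a file system by a magic number, and a single toy_try_read_super() helper instead of one read_super routine per type; the magic values are only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct toy_sb {
    int magic;
    const char *type_name;
};

struct toy_probe_type {
    const char *name;
    int magic;                     /* what this type expects on the "disk" */
    struct toy_probe_type *next;
};

int toy_error_messages;            /* counts complaints that were printed */

/* Fill in sb if the disk carries this type's magic, otherwise fail.
 * Complain only when silent is 0. */
struct toy_sb *toy_try_read_super(const struct toy_probe_type *t,
                                  struct toy_sb *sb, int disk_magic,
                                  int silent)
{
    if (disk_magic != t->magic) {
        if (!silent) {
            fprintf(stderr, "%s: bad magic number\n", t->name);
            toy_error_messages++;
        }
        return NULL;
    }
    sb->magic = disk_magic;
    sb->type_name = t->name;
    return sb;
}

/* Root mount: try every type silently until one accepts the disk. */
struct toy_sb *toy_mount_root(const struct toy_probe_type *types,
                              struct toy_sb *sb, int disk_magic)
{
    for (; types; types = types->next)
        if (toy_try_read_super(types, sb, disk_magic, 1))
            return sb;
    return NULL;
}
```

Probing a disk that belongs to the second type in the list succeeds without printing anything, whereas the same failed attempt made with silent set to 0 produces one message.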
Sign me up.
• So what gets called when we register a file
system?
• Linux finds out about new file system
types by calls to:
register_filesystem()
• And forgets about them by the calls to its
counterpart:
unregister_filesystem()
Sign me up.
The formal declarations are:
#include <linux/fs.h>
…
int register_filesystem(struct file_system_type * fs);
…
int unregister_filesystem(struct file_system_type * fs);
Sign me up.
• register_filesystem()
linux/fs/super.c
…
int register_filesystem(struct file_system_type * fs)
{
    struct file_system_type ** tmp;
    if (!fs)
        return -EINVAL;
    if (fs->next)
        return -EBUSY;
    tmp = &file_systems;
    while (*tmp) {
        if (strcmp((*tmp)->name, fs->name) == 0)
            return -EBUSY;
        tmp = &(*tmp)->next;
    }
    *tmp = fs;
    return 0;
}
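• So what’s happening here?
- Essentially register_filesystem() takes a populated, file system specific
file_system_type structure as a parameter, performs some checks and, if
they pass, adds the file_system_type structure to the end of the
file_systems list. It returns -EINVAL for a NULL argument and -EBUSY if the
structure is already linked or its name is already registered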
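The checks are easy to try out in user space. The sketch below ports the same logic to a hypothetical toy_fs_type and toy_register_filesystem(), with the list head as an ordinary static variable; it is meant for experimenting and makes no claim about the kernel's locking or later versions of the routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
    const char *name;
    int fs_flags;
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;   /* head of the list */

/* Append fs to the list unless it is NULL, already linked,
 * or its name is already present. */
int toy_register_filesystem(struct toy_fs_type *fs)
{
    struct toy_fs_type **tmp;

    if (!fs)
        return -EINVAL;
    if (fs->next)
        return -EBUSY;
    /* tmp always points at the pointer that would have to be updated. */
    for (tmp = &toy_file_systems; *tmp; tmp = &(*tmp)->next)
        if (strcmp((*tmp)->name, fs->name) == 0)
            return -EBUSY;
    *tmp = fs;    /* tmp now addresses the NULL at the end of the list */
    return 0;
}
```

Registering ext2 and then nfs succeeds, while a second structure that also calls itself ext2 is turned away with -EBUSY by the name comparison.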
Sign me up.
• unregister_filesystem()
linux/fs/super.c
…
int unregister_filesystem(struct file_system_type * fs)
{
#ifdef CONFIG_MODULES
    struct file_system_type ** tmp;
    tmp = &file_systems;
    while (*tmp) {
        if (fs == *tmp) {
            *tmp = fs->next;
            fs->next = NULL;
            return 0;
        }
        tmp = &(*tmp)->next;
    }
#endif
    return -EINVAL;
}
• So what’s happening here?
- unregister_filesystem() takes a populated, file system specific
file_system_type structure as a parameter and removes that
file_system_type structure from the file_systems list if it is present;
otherwise it returns -EINVAL
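Removal relies on the same pointer-to-pointer walk, which is what lets the head of the list and a middle element be handled by one assignment. A user-space sketch with hypothetical names (toy_ufs, toy_unregister()) and the list head passed in explicitly:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_ufs {
    const char *name;
    struct toy_ufs *next;
};

/* Unlink fs from the list that starts at *head.
 * Returns 0 if it was found, -EINVAL otherwise. */
int toy_unregister(struct toy_ufs **head, struct toy_ufs *fs)
{
    for (struct toy_ufs **tmp = head; *tmp; tmp = &(*tmp)->next) {
        if (*tmp == fs) {
            *tmp = fs->next;    /* whoever pointed at fs now skips it */
            fs->next = NULL;    /* fs may be registered again later   */
            return 0;
        }
    }
    return -EINVAL;
}
```

With foo at the head and ext2 behind it, removing foo leaves the head pointing at ext2, and a second attempt to remove foo reports -EINVAL.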
Sign me up.
• How is the file_system_type structure
populated with file system specific data?
- Each file system implementation defines
a file_system_type structure with data
specific to that file system
Sign me up.
• Example:
- The ext2 file_system_type structure
linux/fs/ext2/super.c
…
static struct file_system_type ext2_fs_type = {
    "ext2",
    FS_REQUIRES_DEV /* | FS_IBASKET */,
    ext2_read_super,
    NULL
};
/* ibaskets have unresolved bugs */
- Here we can see the fields: name, fs_flags,
read_super and next being populated
Sign me up.
• How is the register_filesystem() called
from a specific file system?
- Each file system implementation defines
an init function that calls
register_filesystem()
Sign me up.
• Example:
- The ext2 init_ext2_fs() function
linux/fs/ext2/super.c
…
static int __init init_ext2_fs(void)
{
    return register_filesystem(&ext2_fs_type);
}
- This is pretty self-explanatory
Sign me up.
• How is the unregister_filesystem() called
from a specific file system?
- Each file system implementation defines
an exit function that calls
unregister_filesystem()
Sign me up.
• Example:
- The ext2 exit_ext2_fs() function
linux/fs/ext2/super.c
…
static void __exit exit_ext2_fs(void)
{
    unregister_filesystem(&ext2_fs_type);
}
- Again, this is self-explanatory
Sign me up.
• So how and where are the calls to the init
and exit functions made?
linux/fs/ext2/super.c
…
module_init(init_ext2_fs)
module_exit(exit_ext2_fs)
- module_init() and module_exit() name the functions that
begin the file system specific registration and
unregistration processes
Sign me up.
• But when are the functions named by
module_init() and module_exit() called?
- If the file system is built as a module, the
module_init() function is called when the module
is loaded with a call to insmod
- If the file system is built into the kernel, it is
called at the same time as all of the other init
calls made during the kernel boot process
- If the file system is built as a module, the
module_exit() function is called when the module
is unloaded with a call to rmmod
Sign me up.
• So how does this all fit together to register
a file system?
Sign me up.
Registering the ext2 file system
- module_init(init_ext2_fs): called at boot time or with insmod
- init_ext2_fs(void) calls register_filesystem(&ext2_fs_type), which takes a
file_system_type structure specific to ext2 as a parameter
- register_filesystem() populates the file_systems list with the
file_system_type structure for ext2
[Diagram: file_systems points at the ext2 file_system_type structure, which
holds name (ext2), fs_flags, *read_super() and *next.]
Sign me up.
• So how does this all fit together to
unregister a file system?
Sign me up.
Unregistering the foo file system
- module_exit(exit_foo_fs): called with rmmod
- exit_foo_fs(void) calls unregister_filesystem(&foo_fs_type), which takes a
file_system_type structure specific to foo as a parameter
- unregister_filesystem() removes the file_system_type structure for foo from
the file_systems list
- The file_systems pointer is updated to point at what next was pointing at in
the file_system_type structure for foo. In this case it is ext2
[Diagram: before, file_systems points at the foo file_system_type structure
(name (foo), fs_flags, *read_super(), *next); afterwards, file_systems points
at the ext2 file_system_type structure.]
Sign me up.
• Whoptie doo, Basil.. what does it all
mean?
- For the Virtual File System layer to work with a
specific file system implementation it must have
some knowledge of the file system
- This knowledge is acquired in part by
registering the file system
- Once this is done the Virtual File System can
map calls to files, on a specific file system,
through the correct structures and perform the
requested operations
Sign me up.
• What’s next?
- After the file system has been registered
we must mount it in order to use it
- A file system is mounted at boot or with
the use of the mount command
- To unmount a file system the umount
command is used
Sign me up.
• So what does mounting do?
- When a file system is mounted the
file_systems list is searched to see if the
file system has been registered
Sign me up.
Looking for ext2 in the file_systems list
[Diagram: starting from file_systems, each file_system_type structure is asked
"Is name ext2?" in turn: nfs, NO; reiserfs, NO; ext2, YES. Each structure holds
name, fs_flags, *read_super() and *next.]
Sign me up.
- If the file system is found in the list then
the read_super() function in the
file_system_type structure for the
particular file system is called
- The call to read_super() occurs in
do_mount() in fs/super.c
[Diagram: the ext2 file_system_type structure: name (ext2), fs_flags,
*read_super(), *next.]
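The lookup followed by the call through the function pointer can be sketched as below. Everything here is hypothetical (toy_super, toy_mfs_type, toy_find_filesystem(), toy_do_mount(), and a file system called foo); only the order of the steps and the (super_block, data, silent) shape of the callback follow the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_super {
    const char *s_id;    /* which file system filled this in */
    void *s_fs_info;     /* data handed down from the mount call */
};

struct toy_mfs_type {
    const char *name;
    struct toy_super *(*read_super)(struct toy_super *, void *, int);
    struct toy_mfs_type *next;
};

/* Search the list of registered types by name. */
struct toy_mfs_type *toy_find_filesystem(struct toy_mfs_type *head,
                                         const char *name)
{
    for (; head; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head;
    return NULL;
}

/* read_super routine of the hypothetical "foo" file system. */
struct toy_super *toy_foo_read_super(struct toy_super *sb, void *data,
                                     int silent)
{
    (void)silent;            /* this toy never fails, so nothing to report */
    sb->s_id = "foo";
    sb->s_fs_info = data;    /* remember the mount data */
    return sb;
}

/* Mount: find the type by name, then let it populate the super block. */
struct toy_super *toy_do_mount(struct toy_mfs_type *head, const char *type,
                               struct toy_super *sb, void *data)
{
    struct toy_mfs_type *t = toy_find_filesystem(head, type);

    if (!t)
        return NULL;         /* the type was never registered */
    return t->read_super(sb, data, 0);
}
```

Mounting type "foo" returns the populated structure, while asking for an unregistered type such as "bar" fails before any file system code runs.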
Sign me up.
• Review read_super()
- Like we saw earlier the read_super() function
populates a VFS super_block with data from a
particular file system’s super_block
Sign me up.
• So what the heck is a super_block?
- A super_block is a structure that
maintains information about a particular
file system
- The Virtual File System has a
super_block structure that is populated
with data from a specific file system’s
implementation of the super_block
- Is this confusing?
Sign me up.
[Diagram: the VFS super_block (data 1, data 2, data 3, data 4, … data n) shown
next to the ext2 ext2_super_block (data 1, data 2, data 3, data 4, … data n).]
Sign me up.
• A little more in depth
– The call to the VFS read_super() makes a call
to a specific file system’s version of
read_super()
– The file system’s version of read_super()
populates a VFS super_block structure with
data from its version of the super_block and
returns the populated VFS super_block back
to the VFS read_super()
• Confused again?
Sign me up.
[Diagram: the VFS version, read_super(), calls the ext2 version,
ext2_read_super(), which populates the VFS super_block (data 1, data 2, …
data n) from the ext2_super_block and returns the populated super_block back
to read_super().]
Sign me up.
• So why go through all this trouble?
– In order for the VFS layer to operate on the
data or inodes residing on a specific file
system it needs to know what the file system
specific data/inode operations are
– Part of the VFS super_block structure is a
pointer to a VFS super_operations structure
– This super_operations structure is populated
during read_super with the operations specific
to a file system
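The operations pointer is what lets generic code call file system specific code without knowing which file system it is talking to. A reduced sketch, assuming hypothetical types (toy_opsb, toy_inode, toy_super_operations) with a single read_inode operation and a made-up rule for filling in the inode:

```c
#include <assert.h>
#include <stddef.h>

struct toy_opsb;

struct toy_inode {
    unsigned long i_ino;
    long i_size;
    struct toy_opsb *i_sb;
};

/* Table of operations a file system provides for its super block. */
struct toy_super_operations {
    void (*read_inode)(struct toy_inode *inode);
};

struct toy_opsb {
    const struct toy_super_operations *s_op;
};

/* The hypothetical "foo" file system's way of filling in an inode. */
void toy_foo_read_inode(struct toy_inode *inode)
{
    inode->i_size = (long)inode->i_ino * 100;   /* made-up rule */
}

static const struct toy_super_operations toy_foo_sops = {
    toy_foo_read_inode,
};

/* During the mount, foo's read_super routine installs its table. */
struct toy_opsb *toy_foo_read_super(struct toy_opsb *sb)
{
    sb->s_op = &toy_foo_sops;
    return sb;
}

/* Generic side: knows nothing about foo, it only follows s_op. */
void toy_vfs_read_inode(struct toy_opsb *sb, struct toy_inode *inode,
                        unsigned long ino)
{
    inode->i_ino = ino;
    inode->i_sb = sb;
    sb->s_op->read_inode(inode);
}
```

Swapping in another file system means installing a different table in its read_super routine; toy_vfs_read_inode() itself would stay unchanged.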
Sign me up.
• Links to the source for the above
operations:
– VFS read_super()
– ext2_read_super()
– VFS super_block
– ext2_super_block
– VFS super_operations
– ext2_sops // The ext2 version of the VFS
super_operations structure
Sign me up.
• What’s next?
– Now that we’ve populated a VFS super_block
with file system specific data we need to
maintain some lists in the VFS layer so we
know what file systems have been mounted
– In read_super there is a call to insert_super()
– insert_super places the VFS super_block
generated by read_super into the
super_blocks list
Sign me up.
fs/super.c
static void insert_super(struct super_block *s, struct file_system_type *type)
{
    s->s_type = type;
    list_add(&s->s_list, super_blocks.prev);
    list_add(&s->s_instances, &type->fs_supers);
    spin_unlock(&sb_lock);
    get_filesystem(type);
}
Sign me up.
[Diagram: the VFS version, read_super(), calls the ext2 version,
ext2_read_super(), which populates the VFS super_block and returns to
read_super(). read_super() then calls insert_super(), which places the
super_block (data 1, data 2, … data n) in the super_blocks list.]
Sign me up.
• In addition to the super_blocks list another list,
vfsmntlist, is also maintained with the currently
mounted file systems
- vfsmntlist is a list of vfsmount structures
- The vfsmntlist is populated by calls to
add_vfsmnt()
- The function do_mount() calls add_vfsmnt()
after the call to read_super()
- Within the vfsmount structure is a pointer to the
VFS super_block for the mounted file system
Sign me up.
[Diagram: do_mount() calls read_super() and then add_vfsmnt(), which adds a
vfsmount structure (data 1, data 2, … data n) to the vfsmntlist and returns to
do_mount().]
Sign me up.
• In addition to the pointer to the VFS
super_block of the file system, the
vfsmount structure contains:
– The device number of the block device
holding the file system
– And the directory where this file system is
mounted
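Those three pieces of information are enough to get from a mount directory to the super block behind it. The sketch below is a user-space illustration: toy_vfsmount, toy_vsb, toy_add_vfsmnt() and toy_sb_for_dir() are hypothetical, the field names are chosen for the example, the device numbers are arbitrary, and matching is by exact directory string only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_vsb {
    const char *fs_name;           /* stand-in for a VFS super_block */
};

struct toy_vfsmount {
    unsigned int mnt_dev;          /* device number of the block device */
    const char *mnt_dirname;       /* directory where it is mounted */
    struct toy_vsb *mnt_sb;        /* super block of the mounted file system */
    struct toy_vfsmount *next;
};

/* Put a mount record at the front of the list. */
void toy_add_vfsmnt(struct toy_vfsmount **list, struct toy_vfsmount *m)
{
    m->next = *list;
    *list = m;
}

/* Find the super block mounted exactly at dir, or NULL if there is none. */
struct toy_vsb *toy_sb_for_dir(const struct toy_vfsmount *list,
                               const char *dir)
{
    for (; list; list = list->next)
        if (strcmp(list->mnt_dirname, dir) == 0)
            return list->mnt_sb;
    return NULL;
}
```

After adding records for "/" and "/home", a lookup of "/home" returns the second super block and a lookup of a directory with nothing mounted on it returns NULL.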
Sign me up.
• Review
• In order to use a file system:
– It must be registered with VFS
• Registration places a file_system_type structure entry in the
file_systems list
– It must be mounted with VFS
• Mounting calls read_super(), which builds a VFS super_block
structure from the specific file system’s super_block
• Mounting also adds an entry into the super_blocks and the
vfsmntlist lists
• By utilizing the information in these data
structures VFS is able to resolve operations to
specific file system implementations