Operating Systems
CS3013 / CS502
WEEK 6
DISKS
FILE SYSTEM INTERFACE
FILE SYSTEM IMPLEMENTATION
Agenda
2
 Disks
 File System Interface
 File System Implementation
Objectives
3
 Identify different parts of a disk drive
 Explain hard disk addressing schemes
 Understand performance and stability issues
 Differentiate various levels of RAID
Context
4
 Early days: disks thought of as I/O devices
 Controlled like I/O devices
 Block transfer, DMA, interrupts, etc.
 Data in and out of memory (where action is)
 Today: disks as integral part of computing system
 Implementer of two fundamental abstractions
  Virtual memory
  Files
 Long-term storage of information within the system
 The real center of action
Disk Drives
5
 External connection
  IDE/ATA
  SCSI
  USB
 Cache – independent of OS
 Controller
  Details of read/write
  Cache management
  Failure management
Price per Megabyte of Magnetic Hard Disk
6
Price per GB on Different Types of Disk Drives
7
 19¢ per gigabyte – 500 GB Quadra (portable)
  7200 rpm, 10 ms avg. seek time, 16 MB drive cache
  USB 2.0 port (effective 40 MBytes/sec)
 27.3¢ per GB – 1 TB Caviar (internal)
  5400-7200 rpm, 8.9 ms avg. seek time, 16 MB drive cache
  ATA
 62¢ per GB – 80 GB Caviar (internal)
  7200 rpm, 8.9 ms avg. seek time, 8 MB drive cache
  EIDE (theoretical 66-100 MBytes/sec)
 $2.60 per GB – 146.8 GB HP SAS (hot swap)
  15,000 rpm, 3.5 ms avg. seek time
  ATA
Hard Disk Geometry
8
 Platters
 Two-sided magnetic material
 1-16 per drive, 3,000 – 15,000 RPM
 Tracks
  Concentric rings; bits laid out serially
  Divided into sectors (addressable)
 Cylinders
  Same track on each platter
  Arms move together
 Operation
  Seek: move arm to track
  Read/Write:
   Wait until sector arrives under head
   Transfer data
Moving-head Disk Mechanism
9
More on Hard Disk Drives
10
 Manufactured in clean room
  Permanent, air-tight enclosure
   “Winchester” technology
   Spindle motor integral with shaft
 “Flying heads”
  Aerodynamically “float” over moving surface
  Velocities > 100 meters/sec
  Parking position for heads during power-off
 Excess capacity
  Sector re-mapping for bad blocks
  Managed by OS or by drive controller
 20,000-100,000 hours mean time between failures
  Disk failure (usually) means total destruction of data!
Raw Disk Layout
11
 Track format – n sectors
  200 < n < 2000 in modern disks
  Some disks have fewer sectors on inner tracks
 Inter-sector gap
  Enables each sector to be read or written independently
 Sector format
  Sector address: cylinder, track, sector (or some equivalent code)
  Optional header (HDR)
  Data
  Each field separated by a small gap and with its own CRC
 Sector length
  Almost all operating systems specify uniform sector length
  512 – 4096 bytes
Formatting the Disk
12
 Write all sector addresses
 Write and read back various data patterns on all sectors
 Test all sectors
 Identify bad blocks
 Bad block
  Any sector that does not reliably return the data that was written to it!
Bad Block Management
13
 Bad blocks are inevitable
 Part of manufacturing process (less than 1%)
  Detected during formatting
  Occasionally, blocks become bad during operation
 Manufacturers add extra tracks to all disks
  Physical capacity = (1 + x) * rated capacity
 Who handles them?
  Disk controller: bad block list maintained internally
   Automatically substitutes good blocks
  Formatter: re-organize track to avoid bad blocks
  OS: bad block list maintained by OS; bad blocks never used
Bad Sector Handling – within track
14
a) A disk track with a bad sector
b) Substituting a spare for the bad sector
c) Shifting all the sectors to bypass the bad one
CTS Addressing
15
 Also known as CHS (Cylinder, Head, Sector) addressing
  Blocks per platter = heads per platter * cylinders * sectors per track
  Total blocks = blocks per platter * platters
   = cylinders * heads * sectors
LBA Addressing
16
 Logical Block Addressing (LBA)
  A linear (logical) address for the disk
  Assume a disk with C cylinders, T tracks per cylinder, S sectors per track, and a CTS address (c, t, s)
  LBA address = c * T * S + t * S + (s – 1)
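A minimal sketch of this conversion in C, assuming cylinders and tracks are numbered from 0 and sectors from 1 as in the formula above (the function name and example geometry are illustrative, not from any real driver):

#include <stdio.h>

/* Convert a CTS/CHS address (c, t, s) to a logical block address.
   T = tracks (heads) per cylinder, S = sectors per track.
   Cylinders and tracks count from 0; sectors count from 1. */
unsigned long chs_to_lba(unsigned long c, unsigned long t, unsigned long s,
                         unsigned long T, unsigned long S) {
    return c * T * S + t * S + (s - 1);
}

int main(void) {
    /* Example: cylinder 2, track 1, sector 5 on a disk with
       16 tracks per cylinder and 63 sectors per track. */
    printf("LBA = %lu\n", chs_to_lba(2, 1, 5, 16, 63));  /* 2*16*63 + 1*63 + 4 = 2083 */
    return 0;
}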
Logical vs. Physical Sector Addresses
17
 Many modern disk controllers convert [cylinder, track, sector] addresses into logical sector numbers
  Linear array
  No gaps in addressing
  Bad blocks concealed by controller
 Reason:
 Backward compatibility with older PC’s
 Limited number of bits in C, T, and S fields
Disk Drive Performance
18
 Seek time
  Position heads over a cylinder – 1 to 25 ms
 Rotational latency
  Wait for sector to rotate under head
  Full rotation – 4 to 12 ms (15,000 to 5,400 RPM)
  Latency averages ½ of rotation time
 Transfer rate
  Approx. 40-380 MB/sec (aka bandwidth)
 Transfer of 1 KByte (worked below)
  Seek (4 ms) + rotational latency (2 ms) + transfer (40 μsec) = 6.04 ms
  Effective BW here is about 170 KB/sec (misleading!)
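The 1 KByte transfer above can be checked with a few lines of C; the numbers are the slide's example values:

#include <stdio.h>

int main(void) {
    double seek_ms = 4.0, rot_latency_ms = 2.0, transfer_ms = 0.040; /* 40 usec */
    double total_ms = seek_ms + rot_latency_ms + transfer_ms;        /* 6.04 ms  */
    double kbytes = 1.0;
    /* Effective bandwidth = data moved / total time: about 166 KB/sec,
       i.e. roughly the "170 KB/sec" quoted above. */
    printf("total = %.2f ms, effective BW = %.0f KB/sec\n",
           total_ms, kbytes / (total_ms / 1000.0));
    return 0;
}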
Disk Reading Strategies
19
 Read and cache a whole track
  Automatic in many modern controllers
  Subsequent reads to same track have zero rotational latency – good for locality of reference!
  Disk arm available to seek to another cylinder
 Start from current head position
  Start filling cache with first sector under head
  Signal completion when desired sector is read
 Start with requested sector
  When no cache, or limited cache sizes
Disk Writing Strategies
20
 There are none
 The best one can do is
  Collect together a sequence of contiguous (or nearby) sectors for writing
  Write them in a single sequence of disk actions
 Caching for later writing is (usually) a bad idea
  Application has no confidence that data is actually written before a failure
  Some network disk systems provide this feature, with battery backup power for protection
Efficiency and Performance
21
 Disk Cache Mechanism
  There are many places to store disk data so the system doesn’t need to get it from the disk again and again.
Efficiency and Performance
22
 Disk Cache Management
  An essential part of any well-performing operating system
  The goal is to ensure that the disk is accessed as seldom as possible
  Keep previously read data in memory so that it might be read again
  Also hold on to written data, hoping to aggregate several writes from a process
  Can also be “smart” and do things like read-ahead: anticipate what will be needed (see the sketch below)
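A minimal sketch of the "keep previously read data in memory" idea in C: a tiny direct-mapped block cache consulted before going to the disk. The cache size and the read_block_from_disk stub are illustrative, not a real OS interface.

#include <string.h>

#define BLOCK_SIZE  512
#define CACHE_SLOTS 64

struct cache_entry {
    int  valid;
    long block_no;
    char data[BLOCK_SIZE];
};

static struct cache_entry cache[CACHE_SLOTS];

/* Stub standing in for a real disk read. */
static void read_block_from_disk(long block_no, char *buf) {
    (void)block_no;
    memset(buf, 0, BLOCK_SIZE);
}

/* Return the block, touching the disk only on a cache miss. */
void cached_read(long block_no, char *buf) {
    struct cache_entry *e = &cache[block_no % CACHE_SLOTS];  /* direct-mapped slot */
    if (!e->valid || e->block_no != block_no) {              /* miss: go to the disk */
        read_block_from_disk(block_no, e->data);
        e->block_no = block_no;
        e->valid = 1;
    }
    memcpy(buf, e->data, BLOCK_SIZE);                        /* hit, or freshly filled */
}

int main(void) {
    char buf[BLOCK_SIZE];
    cached_read(7, buf);   /* first read goes to the disk */
    cached_read(7, buf);   /* second read is served from memory */
    return 0;
}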
Performance Metrics
23
 Transaction & database systems
  Number of transactions per second
  Focus on seek and rotational latency, not bandwidth
  Track caching may be irrelevant (except read-modify-write)
 Many little files (e.g., Unix, Linux)
  Same
 Big files
  Focus on bandwidth and contiguous allocation
  Track caching important; seek time is secondary concern
 Paging support for VM
  A combination of both
  Track caching is highly relevant – locality of reference
Problem
24
 Question:–
 If mean time to failure of a disk drive is 100,000 hours,
 and if your system has 100 identical disks,
 what is mean time between need to replace a drive?
 Answer:–
 1000 hours (i.e., 41.67 days ≈ 6 weeks)
 I.e.:–
 You lose 1% of your data every 6 weeks!
 But don’t worry – you can restore most of it from backup!
Can we do better?
25
 Yes, mirrored
 Write every block twice, on two separate disks
 Mean time between simultaneous failure of both disks is >57,000 years (see the note below)
 Can we do even better?
 E.g., use fewer extra disks?
 E.g., get more performance?
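Where the >57,000-year figure comes from (a back-of-the-envelope check, assuming a repair/replacement time of roughly 10 hours, which is not stated on the slide): data is lost only if the second disk fails while the first is still being replaced, so mean time to data loss ≈ MTTF² / (2 × MTTR) = (100,000 h)² / (2 × 10 h) = 5 × 10⁸ hours ≈ 57,000 years.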
Redundant Array of Inexpensive Disks (RAID)
26
 Distribute a file system intelligently across multiple disks to
  Maintain high reliability and availability
  Enable fast recovery from failure
  Increase performance
“Levels” of RAID
27
 Level 0 – non-redundant striping of blocks across disks
 Level 1 – simple mirroring
 Level 2 – striping of bytes or bits with ECC
 Level 3 – Level 2 with parity, not ECC
 Level 4 – Level 0 with parity block
 Level 5 – Level 4 with distributed parity blocks
 …
RAID 0 – Simple Striping
28
[Diagram: four disks; disk 0 holds stripes 0, 4, 8; disk 1 holds stripes 1, 5, 9; disk 2 holds stripes 2, 6, 10; disk 3 holds stripes 3, 7, 11]
 Each stripe is one or a group of contiguous blocks
 Block/group i is on disk (i mod n)
 Advantage
 Read/write n blocks in parallel; n times bandwidth
 Disadvantage
 No redundancy at all. System MTBF is 1/n disk MTBF!
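The mapping above ("block/group i is on disk i mod n") is a one-liner; a small C sketch with illustrative names:

#include <stdio.h>

/* Map a logical stripe number to (disk, stripe slot on that disk) for RAID 0 over n disks. */
void raid0_map(long stripe, int n, int *disk, long *slot) {
    *disk = (int)(stripe % n);
    *slot = stripe / n;
}

int main(void) {
    int disk; long slot;
    raid0_map(10, 4, &disk, &slot);
    printf("stripe 10 -> disk %d, slot %ld\n", disk, slot);  /* disk 2, slot 2 */
    return 0;
}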
RAID 1 – Striping and Mirroring
29
[Diagram: eight disks; stripes 0-11 striped across disks 0-3 as in RAID 0 and mirrored on disks 4-7]
 Each stripe is written twice
  Two separate, identical disks
 Block/group i is on disks (i mod 2n) & (i+n mod 2n)
 Advantages
  Read/write n blocks in parallel; n times bandwidth
  Redundancy: System MTBF = (Disk MTBF)² at twice the cost
  Failed disk can be replaced by copying
 Disadvantage
  A lot of extra disks for much more reliability than we need
RAID 2 & 3
30
 Bit- or byte-level striping
 Requires synchronized disks
 Highly impractical
 Requires fancy electronics
 For ECC calculations
 Not used; academic interest only
 See Silberschatz, §12.7.3 (pp. 471-472)
Observation
31
 When a disk or stripe is read incorrectly,
we know which one failed!
 Conclusion:
 A simple parity disk can provide very high reliability
 (unlike simple parity in memory)
RAID 4 – Parity Disk
32
[Diagram: four data disks holding stripes 0-11 as in RAID 0, plus a parity disk holding parity 0-3, parity 4-7, parity 8-11]
 parity 0-3 = stripe 0 xor stripe 1 xor stripe 2 xor stripe 3
 n stripes plus parity are written/read in parallel
 If any disk/stripe fails, it can be reconstructed from others
 E.g., stripe 1 = stripe 0 xor stripe 2 xor stripe 3 xor parity 0-3
 Advantages
  n times read bandwidth
  System MTBF = (Disk MTBF)² at 1/n additional cost
  Failed disk can be reconstructed “on-the-fly” (hot swap)
  Hot expansion: simply add n + 1 disks all initialized to zeros
 Disadvantage
  Writing requires read-modify-write of parity stripe ⇒ only 1x write bandwidth
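The parity and reconstruction rules above are plain XOR. A self-contained C sketch (the stripe contents and sizes are made up for illustration):

#include <stdio.h>
#include <string.h>

#define NDATA  4
#define STRIPE 8

int main(void) {
    unsigned char stripe[NDATA][STRIPE] = {"AAAAAAA", "BBBBBBB", "CCCCCCC", "DDDDDDD"};
    unsigned char parity[STRIPE] = {0};

    /* parity 0-3 = stripe 0 xor stripe 1 xor stripe 2 xor stripe 3 */
    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < STRIPE; i++)
            parity[i] ^= stripe[d][i];

    /* Pretend the disk holding stripe 1 failed: rebuild it from the others plus parity. */
    unsigned char rebuilt[STRIPE];
    for (int i = 0; i < STRIPE; i++)
        rebuilt[i] = parity[i] ^ stripe[0][i] ^ stripe[2][i] ^ stripe[3][i];

    printf("rebuilt stripe 1 matches original: %s\n",
           memcmp(rebuilt, stripe[1], STRIPE) == 0 ? "yes" : "no");
    return 0;
}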
RAID 5 – Distributed Parity
33
[Diagram: five disks; stripes 0-15 and the parity blocks (parity 0-3, 4-7, 8-11, 12-15) rotated so that each disk holds some data stripes and some parity blocks]
 Parity calculation is same as RAID Level 4
 Advantages & Disadvantages – Mostly same as RAID Level 4
 Additional advantages
 avoids beating up on parity disk
 Some writes in parallel (if no contention for parity drive)
 Writing individual stripes (RAID 4 & 5)
 Read existing stripe and existing parity
 Recompute parity
 Write new stripe and new parity
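The three-step write above works because the new parity can be computed from the old parity, the old data, and the new data alone: new parity = old parity xor old data xor new data. A tiny C check of that identity (the byte values are arbitrary):

#include <stdio.h>

int main(void) {
    unsigned char old_data   = 0x5A, new_data = 0x3C;
    unsigned char other_data = 0x77;                    /* stands in for the xor of all other stripes */
    unsigned char old_parity = old_data ^ other_data;

    unsigned char new_parity = old_parity ^ old_data ^ new_data;   /* read-modify-write rule */

    /* Check: recomputing parity from scratch gives the same answer. */
    printf("incremental %02X, from scratch %02X\n",
           (unsigned)new_parity, (unsigned)(new_data ^ other_data));
    return 0;
}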
RAID 4 & 5
34
 Very popular in data centers
 Corporate and academic servers
 Built-in support in Windows XP and Linux
 Connect a group of disks to fast SCSI port (320 MB/sec bandwidth)
 OS RAID support does the rest!
 Other RAID variations also available
Another Problem - Incomplete Operation
35
 Problem – how to protect against disk write operations that don’t finish
  Power or CPU failure in the middle of a block
  Related series of writes interrupted before all are completed
 Examples:
 Database update of charge and credit
 RAID 1, 4, 5 failure between redundant writes
Consistency
36
 Required when the system crashes or data on the disk may be inconsistent:
  Consistency checker – compares data in the directory structure with data blocks on disk and tries to fix inconsistencies. For example, what if a file has a pointer to a block, but the bit map for free-space management says that block isn't allocated?
  Back-up – provides consistency by copying data to a "safe" place.
  Recovery – occurs when lost data is retrieved from backup.
Solution 1 – Stable Storage
37
 Write everything twice to separate disks
 Be sure 1st write does not invalidate previous 2nd copy
 RAID 1 is okay; RAID 4/5 not okay!
 Read blocks back to validate; then report completion
 Reading both copies
 If 1st copy okay, use it – i.e., newest value
 If 2nd copy different or bad, update it with 1st copy
 If 1st copy is bad; update it with 2nd copy – i.e., old value
Solution 1 – Stable Storage
38
 Crash recovery
 Scan disks, compare corresponding blocks
 If one is bad, replace with good one
 If both good but different, replace 2nd with 1st copy
 Result:–
 If 1st block is good, it contains latest value
 If not, 2nd block still contains previous value
 An abstraction of an atomic disk write of a single block (sketched below)
  Uninterruptible by power failure, etc.
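A user-level sketch of the stable-write protocol in C, using two ordinary files to stand in for the two disks. pread, pwrite, and fsync are real POSIX calls, but the file names, block size, and error handling are illustrative; a real implementation sits below the file system.

#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 512

/* Write the block to copy 1, read it back to validate, then do the same for copy 2.
   Completion is reported (return 0) only after both copies are verified. */
int stable_write(int fd1, int fd2, off_t block, const char *data) {
    char check[BLK];
    int fds[2] = { fd1, fd2 };
    for (int i = 0; i < 2; i++) {
        if (pwrite(fds[i], data, BLK, block * BLK) != BLK) return -1;
        if (fsync(fds[i]) != 0) return -1;                      /* force it to the medium */
        if (pread(fds[i], check, BLK, block * BLK) != BLK) return -1;
        if (memcmp(check, data, BLK) != 0) return -1;           /* read back to validate  */
    }
    return 0;
}

int main(void) {
    char buf[BLK];
    memset(buf, 'x', BLK);
    int fd1 = open("copy1.img", O_RDWR | O_CREAT, 0600);
    int fd2 = open("copy2.img", O_RDWR | O_CREAT, 0600);
    return (fd1 >= 0 && fd2 >= 0 && stable_write(fd1, fd2, 0, buf) == 0) ? 0 : 1;
}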
Solution 2 – Log-structured File System
39
 Also known as Journaling File System (JFS)
 Maintains a journal of changes to make
 Crash Recovery
 Replay the journal until the file system is consistent
 Atomic changes
  Succeed (succeeded originally or replayed completely during recovery)
  Fail (not replayed at all or skipped due to incomplete journal)
Agenda
40
 Disks
 File System Interface
 File System Implementation
Objectives
41
 Identify file systems, files and directories
 Explain file and directory structures
 Give examples of file and directory operations
 Explain access methods
 Explain protection in file system context
Questions
42
 What is a file system?
 What is a file?
 What is a directory?
File System Interface
43
 User-level portion of the file system
 Access methods
 Directory structure
 Protection
File Concept
44
 A collection of related bytes having meaning only to the creator. The file can be "free formed", indexed, structured, etc.
 The file is an entry in a directory.
 The file may have attributes (name, creator, date, type, permissions).
 The file may have structure (the OS may or may not know about this). It's a tradeoff of power versus overhead. For example,
  An operating system understands program image format in order to create a process.
  The UNIX shell understands how directory files look. (In general the UNIX kernel doesn't interpret files.)
  Usually the operating system understands and interprets file types.
Fundamental Ambiguity
45
 Is the file the “container of the information” or the “information” itself?
 Almost all systems confuse the two.
 Almost all people confuse the two.
Example
46
 Later, how do either of us know that we are using the same version of the document?
 Windows/Outlook/Exchange:
  Time-stamp is a pretty good indication that they are
  Time-stamps preserved on copy, drag and drop, transmission via e-mail, etc.
 Unix
  By default, time-stamps not preserved on copy, ftp, e-mail, etc.
  Time-stamp associated with container, not with information
File Attributes
47
 Name – only information kept in human-readable form
 Identifier – unique tag (number) identifies file within file system
 Type – needed for systems that support different types
 Location – pointer to file location on device
 Size – current file size
 Protection – controls who can do reading, writing, executing
 Time, date, and user identification – data for protection, security, and usage monitoring
 Information about files is kept in the directory structure, which is maintained on the disk.
File Operations
48
 Create
 Write
 Read
 Reposition within a file
 Delete
 Truncate
 Rename, Append, Copy
File Types
49
File Structure
50
 None
 Sequence of words/bytes
 Simple Record Structure
 Lines
 Fixed Length
 Variable Length
 Complex Structure
 Formatted Document
 Relocatable load file
 OS and programs interpret these structures.
Access Methods
51
 Sequential Access
 Implemented by the file system.
 Data is accessed one record right after the last.
 Reads cause a pointer to be moved ahead by one.
 Writes allocate space for the record and move the pointer to the new End Of File.
 Such a method is reasonable for tape
Access Methods
52
 Direct Access
 Method useful for disks.
 The file is viewed as a numbered sequence of blocks or records.
 There are no restrictions on which blocks are read/written in any order.
 User now says "read n" rather than "read next".
 "n" is a number relative to the beginning of file, not relative to an absolute physical disk location.
Access Methods
53
 OTHER ACCESS METHODS
  Built on top of direct access and often implemented by a user utility.
 Indexed
  ID plus pointer.
  An index block says what's in each remaining block or contains pointers to blocks containing particular items. Suppose a file contains many blocks of data arranged by name alphabetically.
  Example 1: Index contains the name appearing as the first record in each block. There are as many index entries as there are blocks.
  Example 2: Index contains the block number where "A" begins, where "B" begins, etc. Here there are only 26 index entries.
Access Methods
54
 Example 1: Index contains the name appearing as the first record in each block. There are as many index entries as there are blocks.
 Example 2: Index contains the block number where "A" begins, where "B" begins, etc. Here there are only 26 index entries.
 [Diagram: two index layouts over data blocks holding records such as "Adams | Data", "Arthur | Data", "Asher | Data", "Baker | Data", "Saarnin | Data", "Smith, John | Data"]
Directory
55
 Special kind of a file
 A tool for users & applications to organize and find files
  User-friendly names
  Names that are meaningful over long periods of time
 The data structure for the OS to locate files (i.e., containers) on disk
Directory Structures
56
 Single level
  One directory per system, one entry pointing to each file
  Small, single-user or single-use systems
   PDA, cell phone, etc.
 Two-level
  Single “master” directory per system
  Each entry points to one single-level directory per user
  Uncommon in modern operating systems
 Hierarchical
  Any directory entry may point to
   Another directory
   An individual file
  Used in most modern operating systems
Directory Organization - Hierarchical
57
Directory Considerations
58
 Efficiency – locating a file quickly.
 Naming – convenient to users.
 Separate users can use same name for separate files.
 The same file can have different names for different users.
 Names need only be unique within a directory
 Grouping – logical grouping of files by properties
 e.g., all Java programs, all games, …
More on Hierarchical Directories
59
 Most systems support the idea of a current (working) directory
  Absolute names – fully qualified from root of file system
   /usr/group/foo.c
  Relative names – specified with respect to working directory
   foo.c
  A special name for the working directory itself
   “.”
 Modified hierarchical – acyclic graph (no loops) and general graph
  Allow directories and files to have multiple names
  Links are file names (directory entries) that point to existing (source) files
Directory Operations
60
 Create: make a new directory
 Add, Delete entry: invoked by file create & destroy, directory create & destroy
 Find, List: search or enumerate directory entries
 Rename: change name of an entry without changing anything else about it
 Link, Unlink: add or remove entry pointing to another entry elsewhere
  Introduces possibility of loops in directory graph
 Destroy: removes directory; must be empty
Links
61
 Hard links: bi-directional relationship between file names and file (see the sketch below)
  A hard link is a directory entry that points to a source file’s metadata
  Metadata counts the number of hard links (including the source file) – link reference count
  Link reference count is decremented when a hard link is deleted
  File data is deleted and space freed when the link reference count is 0
 Symbolic (soft) links: uni-directional relationship between a file name and the file
  Directory entry points to new metadata
  Metadata points to the source file
  If the source file is deleted, the link pointer is invalid
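The link-count behaviour above can be observed from user space with the POSIX link(), symlink(), unlink(), and stat() calls (st_nlink is the link reference count). A small C sketch; the file names are illustrative:

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void show_links(const char *path) {
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: %ld hard link(s)\n", path, (long)st.st_nlink);
}

int main(void) {
    FILE *f = fopen("source.txt", "w");    /* create the source file                */
    if (f) fclose(f);

    link("source.txt", "hard.txt");        /* second name, same metadata            */
    symlink("source.txt", "soft.txt");     /* new metadata that points at the name  */
    show_links("source.txt");              /* link reference count is now 2         */

    unlink("source.txt");                  /* count drops to 1; the data stays      */
    show_links("hard.txt");                /* still reachable through the hard link */
    /* "soft.txt" is now a dangling symbolic link: its pointer is invalid. */
    return 0;
}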
Other Issues
62
 Mounting:
  Attaching portions of the file system into a directory structure
 Sharing:
  Sharing must be done through a protection scheme
  May use networking to allow file system access between systems
   Manually via programs like FTP or SSH
   Automatically, seamlessly using distributed file systems
   Semi-automatically via the world wide web
  Client-server model allows clients to mount remote file systems from servers
   Server can serve multiple clients
   Client and user-on-client identification is insecure or complicated
   NFS is the standard UNIX client-server file sharing protocol
   CIFS is the standard Windows protocol
   Standard operating system file calls are translated into remote calls
Protection
63
 File owner/creator should be able to control:
 what can be done
 by whom
 Types of access
 Read
 Write
 Execute
 Append
 Delete
 List
Protection
64
 Unix/Linux
  Mode of access: read (R), write (W), execute (X)
  Three classes of users
   Owner
   Group
   Public
  Ask manager to create a group (unique name), say G, and add some users to the group.
  For a particular file (say game) or subdirectory, define an appropriate access.
  Example listing:
   drwxr-xr-x  2 clee01 8571  4096 Apr 30  2007 old
   -rwxr-x---  1 clee01 8571 15827 Apr 30  2007 webserver
   -rw-r-----  1 clee01 8571  7758 Apr 27  2007 webserver.cxx
Protection
65
 Windows
Agenda
66
 Disks
 File System Interface
 File System Implementation
Objectives
67
 Differentiate layers in file system structure
 Explain different allocation methods
 Explain different free space management schemes
File System Implementation
68
 OS-level portion of the file system
  Allocation and free space management
  Directory implementation
File System Structure
69
 When talking about “the file system”, you are making a statement about both the rules used for file access and the algorithms used to implement those rules. Here’s a breakdown of those algorithmic pieces.
  Application Programs – the code that's making a file request.
  Logical File System – the highest level in the OS; it does protection and security. Uses the directory structure to do name resolution.
  File Organization Module – here we read the file control block maintained in the directory so we know about files and the logical blocks where information about that file is located.
  Basic File System – knowing specific blocks to access, we can now make generic requests to the appropriate device driver.
  IO Control – the device drivers and interrupt handlers. They cause the device to transfer information between that device and CPU memory.
  Devices – the disks / tapes / etc.
Layered File System
70
 Handles the CONTENT of the file. Knows the file’s internal structure.
 Handles the OPEN, etc. system calls. Understands paths, directory structure, etc.
 Uses directory information to figure out blocks, etc. Implements the READ, POSITION calls.
 Determines where on the disk blocks are located.
 Interfaces with the devices – handles interrupts.
Virtual File System
71
 Virtual File Systems (VFS) provide an object-oriented way of implementing file systems.
 VFS allows the same system call interface (the API) to be used for different types of file systems.
 The API is to the VFS interface, rather than any specific type of file system.
Allocation Methods – Contiguous Allocation
72
 Method: lay down the entire file on contiguous sectors of the disk. Define by a dyad <first block location, length>.
 1. Accessing the file requires a minimum of head movement.
 2. Easy to calculate block location: block i of a file, starting at disk address b, is at b + i.
 3. Difficulty is in finding the contiguous space, especially for a large file. The problem is one of dynamic allocation (first fit, best fit, etc.), which has external fragmentation. If many files are created/deleted, compaction will be necessary.
 It's hard to estimate at create time what the size of the file will ultimately be. What happens when we want to extend the file – we must either terminate the owner of the file, or try to find a bigger hole.
Allocation Methods – Contiguous Allocation
73
 Ideal for large, static files
  Databases, fixed system structures, OS code
  CD-ROM, DVD
 Simple address calculation
  Block address → sector address
 Fast multi-block reads and writes
  Minimize seeks between blocks
 Prone to fragmentation when …
  Files come and go
  Files change size
  Similar to unpaged virtual memory
Allocation Methods – Linked Allocation
74
 Each file is a linked list of disk blocks, scattered anywhere on the disk.
 Each block contains a pointer to the next block
 Directory points to first and last blocks
 Sector header block:
  Pointer to next block
  ID and block number of file
 At file creation time, simply tell the directory about the file. When writing, get a free block and write to it, enqueueing it to the file header.
Allocation Methods – Linked Allocation
75
 Advantages
 No space fragmentation
 Easy to create, extend files
 Ideal for lots of small files
 Disadvantages
 Lots of disk arm movement
 Space taken up by links
 Sequential access only!
Example of Linked Allocation
76
 File Allocation Table (FAT)
  Instead of a link on each block, put all links in one table – the FAT (File Allocation Table)
   Fixed place on disk
  One entry per physical block in disk
  Directory points to first & last blocks of file
  Each block points to next block (or EOF) – see the traversal sketch below
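Following a file's chain through an in-memory FAT is just repeated table lookups, as sketched below in C (the table contents and the -1 end-of-file marker are made up for illustration):

#include <stdio.h>

#define EOF_MARK (-1)

int main(void) {
    /* fat[b] holds the number of the block that follows block b in its file. */
    int fat[16] = {0};
    fat[3] = 9;  fat[9] = 12;  fat[12] = EOF_MARK;   /* a file occupying blocks 3 -> 9 -> 12 */

    int first_block = 3;                             /* taken from the directory entry */
    for (int b = first_block; b != EOF_MARK; b = fat[b])
        printf("read block %d\n", b);
    return 0;
}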
FAT File System
77
 Advantages
 Advantages of Linked File System
 FAT can be cached in memory
 Searchable at CPU speeds, pseudo-random access
 Disadvantages
 Limited size, not suitable for very large disks
 FAT cache describes entire disk, not just open files!
 Not fast enough for large databases
 Used in MS-DOS, early Windows systems
Allocation Methods – Indexed Allocation
78
 Each file uses an index block on disk to contain addresses of other disk blocks used by the file.
 When the i-th block is written, the address of a free block is placed at the i-th position in the index block.
 Method suffers from wasted space since, for small files, most of the index block is wasted. What is the optimum size of an index block?
 If the index block is too small, we can:
  Link several together
  Use a multilevel index
 UNIX keeps 12 pointers to blocks in its header. If a file is longer than this, then it uses pointers to single, double, and triple level index blocks.
Allocation Methods – Indexed Allocation
79
 Unix method (see the lookup sketch below)
  Direct blocks: pointers to first n sectors
  Single indirect table: extra block containing pointers to blocks n+1 .. n+m
  Double indirect table: extra block containing single indirect blocks
  …
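A C sketch of how a logical block number might be resolved with 12 direct pointers plus a single-indirect block. The structure is a simplification for illustration (the indirect block is held in memory, and double/triple indirection is omitted), not the real Unix on-disk inode:

#include <stdio.h>

#define NDIRECT      12
#define PTRS_PER_BLK 1024            /* e.g., a 4 KB block of 4-byte pointers */

struct toy_inode {
    long direct[NDIRECT];            /* disk addresses of the first 12 blocks   */
    long indirect[PTRS_PER_BLK];     /* stands in for the single-indirect block */
};

/* Return the disk address holding logical block i of the file, or -1. */
long block_of(const struct toy_inode *ip, long i) {
    if (i < NDIRECT)
        return ip->direct[i];                    /* one lookup                              */
    if (i < NDIRECT + PTRS_PER_BLK)
        return ip->indirect[i - NDIRECT];        /* one extra lookup via the indirect block */
    return -1;                                   /* would need double/triple indirect       */
}

int main(void) {
    static struct toy_inode ino;
    ino.direct[5]   = 4711;
    ino.indirect[0] = 9001;                      /* logical block 12 */
    printf("block 5 -> %ld, block 12 -> %ld\n", block_of(&ino, 5), block_of(&ino, 12));
    return 0;
}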
Performance Issues
80
 It's difficult to compare mechanisms because usage is different. Let's calculate, for each method, the number of disk accesses to read block i from a file:
  Contiguous: 1 access, from location start + i.
  Linked: i + 1 accesses, reading each block in turn. (Is this a fair example?)
  Indexed: 2 accesses, 1 for the index, 1 for the data.
Free Space Management
81
 Bit Vector Method (see the sketch below)
  Each block is represented by a bit; e.g., 1 1 0 0 1 1 0 means blocks 2, 3, 6 are free.
  This method allows an easy way of finding contiguous free blocks.
  Requires the overhead of disk space to hold the bitmap.
  A block is not REALLY allocated on the disk unless the bitmap is updated.
  What operations (disk requests) are required to create and allocate a file using this implementation?
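A minimal C sketch of a first-fit scan over the bit vector, using the slide's convention that a 0 bit means free (the sizes and helper names are illustrative):

#include <stdio.h>

#define NBLOCKS 56

static unsigned char bitmap[NBLOCKS / 8];    /* 1 bit per block; 0 = free, 1 = allocated */

static int  block_is_free(int b)  { return !(bitmap[b / 8] & (1 << (b % 8))); }
static void allocate_block(int b) { bitmap[b / 8] |= (unsigned char)(1 << (b % 8)); }

/* First-fit scan of the bit vector. Returns a block number, or -1 if the disk is full. */
int alloc_first_free(void) {
    for (int b = 0; b < NBLOCKS; b++)
        if (block_is_free(b)) { allocate_block(b); return b; }
    return -1;
}

int main(void) {
    allocate_block(0); allocate_block(1);                 /* pretend blocks 0 and 1 are in use */
    printf("allocated block %d\n", alloc_first_free());   /* prints 2 */
    return 0;
}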
Free Space Management
82
 Free List Method
  Free blocks are chained together, each holding a pointer to the next one free.
  This is very inefficient since a disk access is required to look at each sector.
 Grouping Method
  In one free block, put lots of pointers to other free blocks. Include a pointer to the next block of pointers.
 Counting Method
  Since many free blocks are contiguous, keep a list of dyads holding the starting address of a "chunk" and the number of blocks in that chunk.
  Format: <disk address, number of free blocks>
Directory Management
83
 The issue here is how to be able to search for information about a file in a directory given its name.
 Could have a linear list of file names with pointers to the data blocks. This is:
  Simple to program BUT time-consuming to search.
 Could use a hash table – a linear list with a hash data structure (see the sketch below).
  Use the filename to produce a value that's used as an entry to the hash table.
  Hash table contains where in the list the file data is located.
  This decreases the directory search time (file creation and deletion are faster).
  Must contend with collisions – where two names hash to the same location.
  The number of hashes generally can't be expanded on the fly.
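A toy C version of the hashed directory: hash the name to pick a bucket in a fixed-size table, then walk that bucket's chain to handle collisions. The hash function, table size, and entry layout are all illustrative:

#include <stdio.h>
#include <string.h>

#define NBUCKETS 64

struct dir_entry {
    char name[32];
    long first_block;              /* where the file's data starts            */
    struct dir_entry *next;        /* chain of names that hash to this bucket */
};

static struct dir_entry *table[NBUCKETS];

static unsigned hash_name(const char *name) {
    unsigned h = 0;
    while (*name) h = h * 31 + (unsigned char)*name++;
    return h % NBUCKETS;
}

void dir_add(struct dir_entry *e) {
    unsigned h = hash_name(e->name);
    e->next = table[h];
    table[h] = e;
}

struct dir_entry *dir_lookup(const char *name) {
    for (struct dir_entry *e = table[hash_name(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0) return e;     /* collisions resolved by the chain */
    return NULL;
}

int main(void) {
    static struct dir_entry notes = { "notes.txt", 120, NULL };
    dir_add(&notes);
    struct dir_entry *hit = dir_lookup("notes.txt");
    printf("notes.txt starts at block %ld\n", hit ? hit->first_block : -1L);
    return 0;
}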
Example: Linux Files
84
[clee01@CCC3 ~]$ stat .addressbook
  File: `.addressbook'
  Size: 0          Blocks: 0          IO Block: 8192   regular empty file
Device: 16h/22d    Inode: 32541601    Links: 1
Access: (0640/-rw-r-----)  Uid: ( 8571/ clee01)   Gid: ( 8571/ UNKNOWN)
Access: 2007-01-13 23:32:41.457165000 -0500
Modify: 2007-01-13 23:32:41.457165000 -0500
Change: 2007-01-13 23:32:41.457165000 -0500
[clee01@CCC3 ~]$ stat .cshrc
  File: `.cshrc'
  Size: 84         Blocks: 8          IO Block: 8192   regular file
Device: 16h/22d    Inode: 24230479    Links: 1
Access: (0600/-rw-------)  Uid: ( 8571/ clee01)   Gid: ( 8571/ UNKNOWN)
Access: 2008-07-10 18:09:08.425782000 -0400
Modify: 2006-07-12 01:36:38.000000000 -0400
Change: 2007-01-10 11:08:20.775838000 -0500
[clee01@CCC3 ~]$
Example: Linux Directory
85
[clee01@CCC3 ~]$ stat cs502
  File: `cs502'
  Size: 4096       Blocks: 8          IO Block: 8192   directory
Device: 16h/22d    Inode: 23494968    Links: 11
Access: (0750/drwxr-x---)  Uid: ( 8571/ clee01)   Gid: ( 8571/ UNKNOWN)
Access: 2008-07-06 10:42:55.745277000 -0400
Modify: 2008-07-05 09:26:33.081177000 -0400
Change: 2008-07-05 09:26:33.081177000 -0400
[clee01@CCC3 ~]$