CS519: Lecture 4
I/O and File Management
I/O Devices
So far we have talked about how to abstract and
manage CPU and memory (processes, VM, etc)
Now: I/O and file management
I/O devices are the computer’s interface to the
outside world (I/O = Input/Output)
Example devices: display, keyboard, mouse, speakers,
network interface, and disk
CS 519
2
Operating System Theory
Basic Computer Structure
[Figure: block diagram with CPU, Memory, Memory Bus (System Bus), Bridge, I/O Bus, NIC, and Disk]
Intel SR440BX Motherboard
[Figure: annotated motherboard photo. Labels: CPU; System Bus & MMU/AGP/PCI Controller; I/O Bus; IDE Disk Controller; USB Controller; Another I/O Bus; Serial & Parallel Ports; Keyboard & Mouse]
Communication Between CPU and
I/O Devices
How does the CPU communicate with I/O devices?
Memory-mapped communication
Each I/O device assigned a portion of the physical address space
CPU → I/O device
• CPU writes to locations in this area to "talk" to I/O device
I/O device → CPU
• Polling: CPU repeatedly checks location(s) in the portion of the address space
assigned to the device
• Interrupt: Device sends an interrupt (on an interrupt line) to get the
attention of the CPU
Programmed I/O, Interrupt-Driven, Direct Memory Access
PIO and interrupt-driven I/O = word at a time
DMA = block at a time
Programmed I/O vs. DMA
Programmed I/O is ok for sending commands,
receiving status, and communication of a small amount
of data
Inefficient for a large amount of data
Keeps CPU busy during the transfer
Programmed I/O memory operations are slow
Direct Memory Access
Device reads/writes directly from/to memory
Transfer from memory to device typically initiated from CPU
Transfer from device to memory can be initiated by the
device or the CPU
Programmed I/O vs. DMA
[Figure: three diagrams, each with CPU, Memory, Interconnect, and Disk, labeled Programmed I/O, DMA, and DMA]
Device Driver
OS module controlling an I/O device
Hides the device specifics from the above layers in the kernel
Supporting a common API
UNIX: block or character device
Block: device communicates with the CPU/memory in fixed-size blocks
Character/Stream: stream of bytes
Translates logical I/O into device I/O
E.g., logical disk blocks into {head, track, sector}
Performs data buffering and scheduling of I/O operations
Structure
Several synchronous entry points: device initialization, queue I/O requests,
state control, read/write
An asynchronous entry point to handle interrupts
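The logical-to-device translation mentioned above can be sketched as follows. This is a minimal illustration that assumes an idealized disk with a fixed geometry (every track holds the same number of sectors); the function names and the geometry values are hypothetical, and real drives remap blocks internally.

```python
def block_to_hts(block, heads, sectors_per_track):
    """Map a logical block number to (head, track, sector).

    Assumes blocks are numbered sector by sector within a track, then head
    by head within a track position, then track by track. Sectors are 1-based.
    """
    track, rest = divmod(block, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return head, track, sector + 1


def hts_to_block(head, track, sector, heads, sectors_per_track):
    """Inverse mapping: (head, track, sector) back to the logical block number."""
    return (track * heads + head) * sectors_per_track + (sector - 1)
```

With 16 heads and 63 sectors per track (assumed values), block 63 is the first sector under head 1 of track 0, and block 1008 is the first sector of track 1.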
Some Common Entry Points for UNIX
Device Drivers
Attach: attach a new device to the system.
Close: note the device is not in use.
Halt: prepare for system shutdown.
Init: initialize driver globals at load or boot time.
Intr: handle device interrupt (not used).
Ioctl: implement control operations.
Mmap: implement memory-mapping (SVR4).
Open: connect a process to a device.
Read: character-mode input.
Size: return logical size of block device.
Start: initialize driver at load or boot time.
Write: character-mode output.
I/O Buffering
I/O Transfer – DMA
After an I/O request is placed, the source/destination of the I/O
transfer must be locked in memory
To allow the user process to continue (when possible), data is often
copied from user address space to kernel buffers (or vice versa),
which are pinned in memory
Copying is expensive → asynchronous I/O
Devices are typically slow compared to CPU
How do we speed up accesses? Caching, of course …
I/O buffering
Buffer cache: a buffer in main memory for block devices
Character queue: follows the producer/consumer model (characters
in the queue are read once)
User to Driver Control Flow
[Figure: control flow from user to kernel. Labels: read, write, ioctl (user); ordinary file, special file; file system; character device, block device; buffer cache, character queue; driver_read/write, driver-strategy]
Buffer Cache
When an I/O request is made for a block, the buffer
cache is checked first
If block is missing from the cache, it is read into the
buffer cache from the device
Exploits locality of reference as any other cache
Replacement policies similar to those for VM
UNIX
Historically, UNIX has a buffer cache for the disk which
does not share buffers with character/stream devices
Adds overhead in a path that has become increasingly
common: disk → NIC
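The check-the-cache-first behavior can be illustrated with a small sketch. It assumes LRU replacement (the slides only say the policies resemble those for VM), and the class name and the `read_block` callback are hypothetical stand-ins for the real device read path.

```python
from collections import OrderedDict


class BufferCache:
    """Toy block cache with LRU replacement (an assumed policy)."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # hypothetical callback: block number -> data
        self.blocks = OrderedDict()    # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)   # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        data = self.read_block(block_no)        # go to the device only on a miss
        self.blocks[block_no] = data
        return data
```

A cache of two blocks serving the reference string 1, 2, 1, 3, 2 hits once (the second access to block 1) and misses four times, because block 2 is evicted to make room for block 3.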
Disks
Sectors
Tracks
Seek time: time to move
the disk head to the
desired track
Rotational delay: time to
reach desired sector once
head is over the desired
track
Transfer rate: rate at which data
is read from/written to disk
Some typical parameters:
Seek: ~10-15ms
Rotational delay: ~4.15ms
for 7200 rpm
Transfer rate: 30 MB/s
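The parameters above can be combined into a rough access-time estimate. The sketch below assumes the average rotational delay is half a revolution and that 1 MB means 10^6 bytes; the function names and the 512-byte sector size in the usage note are assumptions, not values from the slides.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay: half of one full revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def access_time_ms(seek_ms, rpm, nbytes, transfer_mb_per_s):
    """Seek + average rotational delay + transfer time for nbytes."""
    transfer_ms = nbytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + avg_rotational_delay_ms(rpm) + transfer_ms
```

For 7200 rpm the first function gives roughly 4.2 ms, in line with the figure quoted above. Reading one 512-byte sector with a 10 ms seek at 30 MB/s comes to about 14.2 ms, of which the transfer itself is under 0.02 ms.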
Disk Scheduling
Disks are at least four orders of magnitude slower
than main memory
The performance of disk I/O is vital for the performance of
the computer system as a whole
Access time (seek time+ rotational delay) >> transfer time
for a sector
Therefore the order in which sectors are read matters a lot
Disk scheduling
Usually based on the position of the requested sector rather
than according to the process priority
Possibly reorder the stream of read/write requests to improve
performance
Disk Scheduling Policies
Shortest-service-time-first (SSTF): pick the request that
requires the least movement of the head
SCAN (back and forth over disk): good service distribution
C-SCAN (one way with fast return): lower service variability
Problem with SSTF, SCAN, and C-SCAN: arm may not move for
long time (due to rapid-fire accesses to same track)
N-step SCAN: scan of N records at a time by breaking the
request queue in segments of size at most N and cycling through
them
FSCAN: uses two sub-queues, during a scan one queue is
consumed while the other one is produced
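The first three policies can be compared on a small request queue. This is a simplified sketch: requests are track numbers, all requests are known up front, the sweep starts toward higher track numbers, and the arm reverses at the last pending request rather than at the disk edge. All function names are hypothetical.

```python
def sstf(head, requests):
    """Shortest-service-time-first: always serve the closest pending track."""
    pending, order = list(requests), []
    while pending:
        nxt = min(pending, key=lambda t: abs(t - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def scan(head, requests):
    """Sweep upward from the head position, then reverse and sweep downward."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def c_scan(head, requests):
    """Sweep upward, jump back to the lowest pending track, sweep upward again."""
    up = sorted(t for t in requests if t >= head)
    wrapped = sorted(t for t in requests if t < head)
    return up + wrapped


def head_movement(head, order):
    """Total number of tracks the arm crosses while serving `order`."""
    total = 0
    for t in order:
        total += abs(t - head)
        head = t
    return total
```

With the head at track 100 and the illustrative queue 55, 58, 39, 18, 90, 160, 150, 38, 184, the total head movement is 248 tracks for SSTF, 250 for SCAN, and 322 for C-SCAN (counting the return sweep).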
RAID
Redundant Array of Inexpensive Disks (RAID)
A set of physical disk drives viewed by the OS as a single
logical drive
Replace large-capacity disks with multiple smaller-capacity
drives to improve the I/O performance (at lower price)
Data are distributed across physical drives in a way that
enables simultaneous access to data from multiple drives
Redundant disk capacity is used to compensate for the
increase in the probability of failure due to multiple drives
Improve availability because no single point of failure
Six levels of RAID representing different design
alternatives
RAID Level 0
Does not include redundancy
Data is striped across the available disks
Total storage space across all disks is divided into strips
Strips are mapped round-robin to consecutive disks
A set of consecutive strips that maps exactly one strip to each disk in the
array is called a stripe
Can you see how this improves the disk I/O bandwidth?
What access pattern gives the best performance?
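The round-robin mapping can be written down directly, which also hints at the answer to the two questions above. The sketch assumes strips are numbered from 0 and disks from 0; the function names and the four-disk array in the usage note are assumptions.

```python
def locate_strip(strip, num_disks):
    """Return (disk, stripe) for a logical strip under round-robin striping."""
    stripe, disk = divmod(strip, num_disks)
    return disk, stripe


def disks_touched(first_strip, count, num_disks):
    """Set of disks involved in a request for `count` consecutive strips."""
    return {locate_strip(s, num_disks)[0]
            for s in range(first_strip, first_strip + count)}
```

On a four-disk array, strip 5 lives on disk 1 in stripe 1. A request covering a whole stripe touches all four disks and can be served by them in parallel, while a one-strip request keeps only a single disk busy.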
[Figure: strips 0 through 7 mapped round-robin across the disks, with stripe 0 labeled]
RAID Level 1
Redundancy achieved by duplicating all the data
Every disk has a mirror disk that stores exactly the same data
A read can be serviced by either of the two disks that contain the
requested data (improved performance over RAID 0 if reads dominate)
A write request must be done on both disks but can be done in parallel
Recovery is simple but cost is high
[Figure: strips 0 through 3, each stored twice, once on a disk and once on its mirror]
RAID Levels 2 and 3
Parallel access: all disks participate in every I/O request
Small strips since size of each read/write = # of disks * strip size
RAID 2: error correcting code is calculated across corresponding bits
on each data disk and stored on log(# data disks) parity disks
Hamming code: can correct single-bit errors and detect double-bit errors
Less expensive than RAID 1 but still pretty high overhead – not really
needed in most reasonable environments
RAID 3: a single redundant disk that keeps parity bits
$P(i) = X_2(i) \oplus X_1(i) \oplus X_0(i)$
In the event of a failure, data can be reconstructed, e.g.
$X_2(i) = P(i) \oplus X_1(i) \oplus X_0(i)$
Can only tolerate a single failure at a time
[Figure: bits b0, b1, b2 on the data disks and parity P(b) on the redundant disk]
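The RAID 3 parity and reconstruction relations can be checked with a few lines of code. This sketch works on whole byte strings rather than individual bits, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def parity(blocks):
    """Bytewise XOR of equal-length blocks, one block per data disk."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing block from the survivors and the parity."""
    return parity(list(surviving_blocks) + [parity_block])
```

Because XOR is its own inverse, reconstruction is the same operation as computing parity, just with the parity block taking the place of the lost one. With two blocks missing there is not enough information left, which is why only a single failure can be tolerated.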
RAID Levels 4 and 5
RAID 4
Large strips with a parity strip like RAID 3
Independent access - each disk operates independently, so multiple I/O
requests can be satisfied in parallel
Independent access ⇒ small write = 2 reads + 2 writes
Example: if write performed only on strip 0:
$P'(i) = X_2(i) \oplus X_1(i) \oplus X_0'(i)$
$= X_2(i) \oplus X_1(i) \oplus X_0'(i) \oplus X_0(i) \oplus X_0(i)$
$= P(i) \oplus X_0'(i) \oplus X_0(i)$
Parity disk can become bottleneck
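The small-write rule can be sketched as code: the two reads fetch the old data strip and the old parity strip, and the two writes store the new data strip and the new parity strip. The helper names are hypothetical and strips are modeled as byte strings.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_parity, old_data, new_data):
    """New parity after overwriting one strip, without reading the other strips."""
    return xor_bytes(xor_bytes(old_parity, new_data), old_data)


def full_parity(strips):
    """Parity recomputed from scratch over all data strips, for comparison."""
    result = bytes(len(strips[0]))
    for s in strips:
        result = xor_bytes(result, s)
    return result
```

Both routes give the same parity, but the small-write route touches only two disks regardless of how wide the array is. Every such write still goes through the one parity disk.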
[Figure: RAID 4 layout. First stripe: strip 0, strip 1, strip 2, P(0-2). Second stripe: strip 3, strip 4, strip 5, P(3-5)]
RAID 5
Like RAID 4 but parity strips are distributed across all disks
File System
File system is an abstraction of the disk
File → Track/sector
To a user process
A file looks like a contiguous block of bytes (Unix)
A file system provides a coherent view of a group of files
A file system provides protection
API: create, open, delete, read, write files
Performance: throughput vs. response time
Reliability: minimize the potential for lost or
destroyed data
E.g., RAID could be implemented in the OS as part of the
disk device driver
Unix File System
Ordinary files (uninterpreted)
Directories
File of files
Organized as a rooted tree
Pathnames (relative and absolute)
Contains links to parent, itself
Multiple links to files can exist
Link - hard OR symbolic
Unix File Systems (Cont’d)
Tree-structured file
hierarchies
Mounted on existing space
by using mount
No links between different
file systems
File Naming
Each file has a unique name
User visible (external) name must be symbolic
In a hierarchical file system, unique external names are given as
pathnames (path from the root to the file)
Internal names: i-node in UNIX - an index into an array of file
descriptors/headers for a volume
Directory: translation from external to internal name
May have more than one external name for a single internal name
Information about file is split between the directory and the
file descriptor: name, type, size, location on disk, owner,
permissions, date created, date last modified, date last access,
link count
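The split between external names, internal names, and file descriptors can be sketched as below. This is a deliberately tiny model with a single flat directory and only a few descriptor fields; the class and method names are hypothetical and do not correspond to any real file system interface.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """File descriptor/header: everything about the file except its name."""
    owner: str = ""
    size: int = 0
    link_count: int = 0
    blocks: list = field(default_factory=list)


class Volume:
    def __init__(self):
        self.inodes = []   # internal name = index into this array
        self.root = {}     # directory: external name -> i-node number

    def create(self, name, owner):
        self.inodes.append(Inode(owner=owner, link_count=1))
        self.root[name] = len(self.inodes) - 1
        return self.root[name]

    def link(self, existing, new_name):
        """Add a second external name for the same internal name."""
        ino = self.root[existing]
        self.root[new_name] = ino
        self.inodes[ino].link_count += 1

    def unlink(self, name):
        """Remove one name; a returned count of 0 means the file can be reclaimed."""
        ino = self.root.pop(name)
        self.inodes[ino].link_count -= 1
        return self.inodes[ino].link_count
```

The link count lives in the descriptor rather than in the directory because several directory entries may point at the same i-node, and the file's storage can only be freed when the last of them is gone.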
Name Space
In UNIX, “devices are files”
E.g., /dev/cdrom, /dev/tape
User process accesses
devices by accessing
corresponding file
[Figure: directory tree rooted at / with nodes usr, A, B, C, D]
File Allocation
Contiguous: a contiguous set of blocks is pre-allocated to a file
at the time of file creation
Good for sequential files
File size must be known at the time of file creation
External fragmentation – like memory allocation when giving a
contiguous block to each job
So what do we do?
Dynamic allocation (new space allocated on demand)
First fit (first chunk of sufficient size), best fit (smallest chunk of
sufficient size), nearest fit (chunk of sufficient size that is closest
to the previous allocation for the same file)
Indexed allocation (contiguous and chained allocations are other
options) with file allocation table. FAT includes file names and
corresponding index block numbers
Use a disk allocation table (bit map, chained, and indexed) to
manage the free space
File Allocation Strategies
Contiguous allocation: find contiguous chunk for whole
file
Chained allocation: pointer to next block allocated to
file
Indexed: index block points to file blocks
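The practical difference between chained and indexed allocation shows up when a program asks for the n-th block of a file. The sketch below models the chain as a dictionary of next-pointers and the index block as a list; both representations and the function names are assumptions made for illustration.

```python
def chained_lookup(next_ptr, first_block, n):
    """Find the n-th file block (0-based) by following the chain.

    Returns (disk block number, number of blocks that had to be visited).
    """
    block = first_block
    for _ in range(n):
        block = next_ptr[block]
    return block, n + 1


def indexed_lookup(index_block, n):
    """Find the n-th file block (0-based) with one lookup in the index block."""
    return index_block[n]
```

For a file stored in disk blocks 7, 2, 9 (in that order), reaching the third block through the chain visits all three blocks, while the index block answers directly. Contiguous allocation needs neither structure, since the n-th block is simply the start block plus n.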
Free Space Management
Bitmap: one bit for each block on the disk
Good to find a contiguous group of free blocks
Small enough to be kept in memory
Requires sequential scan of bits
Chained free portions: pointer to the next one
Indexed: treats free space as a file
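A bitmap search for a contiguous group of free blocks can be sketched as a single sequential scan. The sketch stores one list entry per block for readability (0 = free, 1 = used), whereas a real bitmap packs eight blocks per byte; the function names are hypothetical.

```python
def find_free_run(bitmap, length):
    """Index of the first run of `length` free blocks, or -1 if none exists."""
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == length:
            return run_start
    return -1


def bitmap_size_bytes(disk_bytes, block_bytes):
    """Space needed for a packed bitmap: one bit per block, rounded up."""
    blocks = disk_bytes // block_bytes
    return (blocks + 7) // 8
```

As a size check, a 16 GiB disk with 4 KiB blocks (assumed values) needs a 512 KiB bitmap, which is why keeping it in memory is feasible.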
UNIX File i-nodes
File System Buffer Cache
[Figure: layers. application: read/write files; OS: translate file to disk blocks, maintains the buffer cache, controls disk accesses (read/write blocks); hardware]
Any problems?
File System Buffer Cache
Disks are “stable” while memory is volatile
What happens if you buffer a write and the machine crashes
before the write has been saved to disk?
Can use write-through but write performance will suffer
In UNIX
Use un-buffered I/O when writing i-nodes or pointer blocks
Use buffered I/O for other writes and force sync every 30 seconds
What about replacement?
How can we further improve performance?
Application-controlled caching
[Figure: layers. application: read/write files, replacement policy; OS: translate file to disk blocks, maintains the buffer cache, controls disk accesses (read/write blocks); hardware]
Application-Controlled File Caching
Two-level block replacement: responsibility is split
between kernel and user level
A global allocation policy performed by the kernel
which decides which process will give up a block
A block replacement policy decided by the user:
Kernel provides the candidate block as a hint to the process
The process can overrule the kernel’s choice by suggesting
an alternative block
The suggested block is replaced by the kernel
Example of an alternative replacement policy: most-recently used (MRU)
Sound kernel-user cooperation
Oblivious processes should do no worse than under LRU
Foolish processes should not hurt other processes
Smart processes should perform better than LRU whenever
possible and they should never perform worse
If kernel selects block A and user chooses B instead, the kernel
swaps the position of A and B in the LRU list and places B in a
“placeholder” which points to A (kernel’s choice)
If the user process misses on B (i.e. it made a bad choice), and B is
found in the placeholder, then the block pointed to by the
placeholder is chosen (prevents hurting other processes)
File System Consistency
File system almost always uses a buffer/disk cache for
performance reasons
Two copies of a disk block (buffer cache, disk) → consistency
problem if the system crashes before all the modified blocks
are written back to disk
This problem is critical especially for the blocks that contain
control information: i-node, free-list, directory blocks
Utility programs for checking block and directory consistency
Write critical blocks from the buffer cache to disk immediately
Data blocks are written to disk periodically: sync
More on File System Consistency
To maintain file system consistency the ordering of
updates from buffer cache to disk is critical
Example: if the directory block (contains pointer to i-node)
is written back before the i-node and the
system crashes, the directory structure will be
inconsistent
Similar case when free list is updated before i-node
and the system crashes, free list will be incorrect
A more elaborate solution: use dependencies between
blocks containing control data in the buffer cache to
specify the ordering of updates
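Dependency-driven ordering can be sketched as a topological sort over the dirty blocks. The example uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the block names and the `flush_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def flush_order(must_precede):
    """Return a write-back order that respects the dependencies.

    `must_precede` maps each block to the set of blocks that must reach
    the disk before it.
    """
    return list(TopologicalSorter(must_precede).static_order())


# The directory entry points at the i-node, so the i-node goes to disk first.
dependencies = {"directory block": {"i-node block"}}
order = flush_order(dependencies)
```

Here `order` places the i-node block ahead of the directory block. If the dependencies form a cycle, `static_order` raises `graphlib.CycleError`, so a scheme of this kind has to detect or avoid circular dependencies between blocks.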
Protection Mechanisms
Files are OS objects: unique names and a finite set of
operations that processes can perform on them
A protection domain is a set of {object, rights} pairs, where a right is the
permission to perform one of the operations
At every instant in time, each process runs in some protection
domain
In Unix, a protection domain is {uid, gid}
Protection domain in Unix is switched when running a program
with SETUID/SETGID set or when the process enters the
kernel mode by issuing a system call
How to store all the protection domains?
Protection Mechanisms (cont’d)
Access Control List (ACL): associate with each object
a list of all the protection domains that may access
the object and how
In Unix ACL is reduced to three protection domains: owner,
group and others
Capability List (C-list): associate with each process a
list of objects that may be accessed along with the
operations
C-list implementation issues: where/how to store them
(hardware, kernel, encrypted in user space) and how to
revoke them
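The two approaches can be contrasted in a short sketch. The first function models the reduced Unix ACL (owner, group, others) stored with the object as a permission mode; the second models a capability list stored with the process. Both are simplifications: the superuser is ignored, and the function names and the dictionary layout of the C-list are assumptions.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def unix_allows(uid, gids, owner, group, mode, want):
    """ACL view: the object carries its mode; pick the matching domain's bits."""
    if uid == owner:
        bits = (mode >> 6) & 7
    elif group in gids:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return (bits & want) == want


def clist_allows(clist, obj, want):
    """Capability view: the process carries a map of object -> allowed rights."""
    return (clist.get(obj, 0) & want) == want
```

With mode 0o640, the owner may read and write, group members may only read, and everyone else is refused. Revoking access in the ACL model means editing one mode on the object, while in the capability model every process holding a capability for that object has to be found.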
Log-Structured File System (LFS)
As memory gets larger, buffer cache size increases → increase in
the fraction of read requests which can be satisfied from the
buffer cache with no disk access
In the future, most disk accesses will be writes
but writes are usually done in small chunks in most file systems
(control data, for instance) which makes the file system highly
inefficient
LFS idea: structure the entire disk as a log
Periodically, or when required, all the pending writes being
buffered in memory are collected and written as a single
contiguous segment at the end of the log
LFS segment
Contain i-nodes, directory blocks and data blocks, all
mixed together
Each segment starts with a segment summary
Segment size: 512 KB - 1MB
Two key issues:
How to retrieve information from the log?
How to manage the free space on disk?
File Location in LFS
The i-node contains the disk addresses of the file
blocks as in standard UNIX
But there is no fixed location for the i-node
An i-node map is used to maintain the current
location of each i-node
i-node map blocks can also be scattered but a fixed
checkpoint region on the disk identifies the location
of all the i-node map blocks
Usually i-node map blocks are cached in main memory
most of the time, thus disk accesses for them are
rare
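The role of the i-node map can be seen in a toy model of the log. The sketch ignores segments, the checkpoint region, and directories: the disk is an append-only list, every write appends the data blocks followed by a fresh copy of the i-node, and the in-memory map records where the newest i-node sits. The class and its methods are hypothetical.

```python
class ToyLFS:
    def __init__(self):
        self.log = []         # the disk, modeled as an append-only list of blocks
        self.inode_map = {}   # i-node number -> log address of its latest i-node

    def write_file(self, ino, data_blocks):
        """Append the data blocks, then a new i-node pointing at them."""
        addrs = []
        for data in data_blocks:
            self.log.append(("data", data))
            addrs.append(len(self.log) - 1)
        self.log.append(("inode", ino, addrs))
        self.inode_map[ino] = len(self.log) - 1

    def read_file(self, ino):
        """Locate the i-node through the map, then follow its block addresses."""
        _, _, addrs = self.log[self.inode_map[ino]]
        return [self.log[a][1] for a in addrs]

    def live_addresses(self):
        """Log addresses still reachable through the i-node map."""
        live = set(self.inode_map.values())
        for addr in self.inode_map.values():
            live.update(self.log[addr][2])
        return live
```

Rewriting a file leaves its old data blocks and old i-node in the log, unreachable through the map. Those dead entries are the space that has to be recovered before the disk fills up.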
Segment Cleaning in LFS
LFS disk is divided into segments that are written sequentially
Live data must be copied out of a segment before the segment
can be re-written
The process of copying data out of a segment: cleaning
A separate cleaner thread moves along the log, removes old
segments from the end and puts live data into memory for
rewriting in the next segment
As a result, an LFS disk appears like a big circular buffer with
the writer thread adding new segments to the front and the
cleaner thread removing old segments from the end
Bookkeeping is not trivial: i-node must be updated when blocks
are moved to the current segment
LFS Performance
LFS Performance (Cont’d)