CS519: Lecture 4
I/O and File Management
I/O Devices
 So far we have talked about how to abstract and
manage CPU and memory (processes, VM, etc)
 Now: I/O and file management
 I/O devices are the computer’s interface to the
outside world (I/O = Input/Output)
Example devices: display, keyboard, mouse, speakers,
network interface, and disk
Basic Computer Structure
[Diagram: the CPU and memory sit on the memory bus (system bus); a bridge connects it to the I/O bus, where devices such as the NIC and the disk attach.]
Intel SR440BX Motherboard
[Diagram: the CPU connects through the system bus and an MMU/AGP/PCI controller to an I/O bus holding the IDE disk controller and the USB controller, and to another I/O bus with the serial and parallel ports and the keyboard and mouse.]
Communication Between CPU and I/O Devices
 How does the CPU communicate with I/O devices?
Memory-mapped communication
Each I/O device is assigned a portion of the physical address space
CPU → I/O device
• CPU writes to locations in this area to "talk" to the I/O device
I/O device → CPU
• Polling: CPU repeatedly checks location(s) in the portion of the
address space assigned to the device
• Interrupt: the device sends an interrupt (on an interrupt line) to get
the attention of the CPU
Programmed I/O (PIO), interrupt-driven I/O, and Direct Memory Access (DMA)
PIO and interrupt-driven I/O move a word at a time
DMA moves a block at a time
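To make the memory-mapped, polling style concrete, here is a minimal C sketch. The register addresses, the ready bit, and all names are hypothetical, chosen only for illustration; a real driver would get them from the bus or firmware:

    #include <stdint.h>

    /* Hypothetical memory-mapped device registers (addresses and bit
       layout invented for this example). */
    #define DEV_STATUS   ((volatile uint32_t *)0xFFFF0000)
    #define DEV_DATA     ((volatile uint32_t *)0xFFFF0004)
    #define STATUS_READY 0x1            /* assumed "device ready" bit */

    /* Programmed I/O with polling: busy-wait until the device is ready,
       then write one word to its data register. */
    void pio_write_word(uint32_t word)
    {
        while ((*DEV_STATUS & STATUS_READY) == 0)
            ;                           /* the CPU spins -- the cost of polling */
        *DEV_DATA = word;               /* a plain store "talks" to the device */
    }

An interrupt-driven design removes the busy-wait: the CPU issues the operation and resumes other work until the device raises an interrupt.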
Programmed I/O vs. DMA
 Programmed I/O is fine for sending commands,
receiving status, and communicating small amounts
of data
 Inefficient for large amounts of data
Keeps the CPU busy during the transfer
Programmed I/O → memory operations → slow
 Direct Memory Access
The device reads/writes directly from/to memory
A transfer from memory to the device is typically initiated by the CPU
A transfer from the device to memory can be initiated by the
device or the CPU
Programmed I/O vs. DMA
[Diagram: three arrangements of CPU, memory, interconnect, and disk. Under programmed I/O, every word moves through the CPU; under the two DMA arrangements, the disk transfers blocks directly to/from memory over the interconnect.]
Device Driver
 OS module controlling an I/O device
 Hides the device specifics from the layers above in the kernel
 Supports a common API
 UNIX: block or character device
Block: the device communicates with the CPU/memory in fixed-size blocks
Character/stream: stream of bytes
 Translates logical I/O into device I/O
 E.g., logical disk blocks into {head, track, sector}
 Performs data buffering and scheduling of I/O operations
 Structure
Several synchronous entry points: device initialization, queuing of I/O
requests, state control, read/write
An asynchronous entry point to handle interrupts
Some Common Entry Points for UNIX Device Drivers
 Attach: attach a new device to the system.
 Close: note the device is not in use.
 Halt: prepare for system shutdown.
 Init: initialize driver globals at load or boot time.
 Intr: handle device interrupt (not used).
 Ioctl: implement control operations.
 Mmap: implement memory-mapping (SVR4).
 Open: connect a process to a device.
 Read: character-mode input.
 Size: return logical size of block device.
 Start: initialize driver at load or boot time.
 Write: character-mode output.
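These entry points are typically collected into a table of function pointers that the kernel dispatches through. A simplified sketch in C, loosely modeled on the classic UNIX device switch (field names and signatures are assumptions, not a real kernel API):

    #include <stddef.h>

    /* Simplified device switch entry: one slot per entry point. */
    struct dev_ops {
        int  (*d_open)(int dev, int flags);
        int  (*d_close)(int dev);
        int  (*d_read)(int dev, char *buf, size_t n);
        int  (*d_write)(int dev, const char *buf, size_t n);
        int  (*d_ioctl)(int dev, unsigned long cmd, void *arg);
        void (*d_intr)(int dev);   /* asynchronous: called at interrupt time */
    };

    /* The kernel dispatches a system call to the right driver through
       the table, without knowing anything device-specific. */
    int dev_read(struct dev_ops *ops, int dev, char *buf, size_t n)
    {
        return ops->d_read ? ops->d_read(dev, buf, n) : -1;
    }

This is what supporting a common API means in practice: the layers above see only the table, never the device.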
I/O Buffering
 I/O transfer – DMA
 After an I/O request is placed, the source/destination of the I/O
transfer must be locked in memory
 To allow the user process to continue (when possible), data is often
copied from the user address space to kernel buffers (or vice versa),
which are pinned in memory
Copying is expensive → asynchronous I/O
 Devices are typically slow compared to the CPU
 How do we speed up accesses? Caching, of course …
 I/O buffering
 Buffer cache: a cache of blocks in main memory for block devices
 Character queue: follows the producer/consumer model (characters
in the queue are read once)
User to Driver Control Flow
[Diagram: read/write/ioctl calls cross the user/kernel boundary. Ordinary files go through the file system to a block device via the buffer cache and the driver's strategy routine; special files go to a character device via the character queue and the driver's read/write routines.]
Buffer Cache
 When an I/O request is made for a block, the buffer
cache is checked first
 If block is missing from the cache, it is read into the
buffer cache from the device
 Exploits locality of reference as any other cache
 Replacement policies similar to those for VM
 UNIX
Historically, UNIX has a buffer cache for the disk which
does not share buffers with character/stream devices
Adds overhead in a path that has become increasingly
common: disk → NIC
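The check-then-fill logic can be sketched in a few lines of C. All of the helpers (hash_lookup, lru_evict, disk_read) are hypothetical stand-ins for kernel internals, not a real interface:

    struct buf { char data[4096]; };              /* one cached block */

    /* Assumed helpers, declared for illustration only. */
    struct buf *hash_lookup(int dev, int blkno);
    void        hash_insert(int dev, int blkno, struct buf *bp);
    struct buf *lru_evict(void);                  /* VM-style replacement */
    void        disk_read(int dev, int blkno, char *dst);

    struct buf *bread(int dev, int blkno)
    {
        struct buf *bp = hash_lookup(dev, blkno); /* hit? */
        if (bp != NULL)
            return bp;                            /* no disk access needed */
        bp = lru_evict();                         /* pick a victim buffer */
        disk_read(dev, blkno, bp->data);          /* miss: fill from device */
        hash_insert(dev, blkno, bp);
        return bp;
    }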
Disks
[Diagram: a disk platter divided into concentric tracks and pie-slice sectors.]
 Seek time: time to move the disk head to the desired track
 Rotational delay: time to reach the desired sector once the
head is over the desired track
 Transfer rate: rate at which data is read from / written to the disk
 Some typical parameters:
 Seek: ~10–15 ms
 Rotational delay: ~4.17 ms for 7200 RPM (half a rotation on average)
 Transfer rate: ~30 MB/s
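As a quick sanity check on these numbers: reading a single 4 KB block at a random location costs roughly 10 ms (seek) + 4.17 ms (rotational delay) + 4 KB / 30 MB/s ≈ 0.13 ms (transfer), about 14.3 ms in total. The mechanical delays dominate, which is exactly what disk scheduling (next slides) tries to attack.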
Disk Scheduling
 Disks are at least four orders of magnitude slower
than main memory
The performance of disk I/O is vital for the performance of
the computer system as a whole
Access time (seek time + rotational delay) >> transfer time
for a sector
Therefore, the order in which sectors are read matters a lot
 Disk scheduling
Usually based on the position of the requested sector rather
than on the priority of the requesting process
Possibly reorders the stream of read/write requests to improve
performance
Disk Scheduling Policies
 Shortest-service-time-first (SSTF): pick the request that
requires the least movement of the head
 SCAN (back and forth over disk): good service distribution
 C-SCAN (one way with fast return): lower service variability
Problem with SSTF, SCAN, and C-SCAN: the arm may not move for a
long time (due to rapid-fire accesses to the same track)
 N-step SCAN: scan N requests at a time by breaking the
request queue into segments of size at most N and cycling through
them
 FSCAN: uses two sub-queues, during a scan one queue is
consumed while the other one is produced
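As an illustration of the simplest policy above, an SSTF pick over a pending-request array (a sketch; real schedulers keep sorted queues rather than rescanning):

    #include <stdlib.h>

    /* SSTF: among the pending requests, choose the one whose track is
       closest to the current head position. */
    int sstf_pick(const int *req_track, int n, int head)
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (abs(req_track[i] - head) < abs(req_track[best] - head))
                best = i;
        return best;   /* index of the request to service next */
    }

The starvation problem is visible here: a steady stream of requests near the head keeps the choice local and never moves the arm to distant tracks.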
RAID
 Redundant Array of Inexpensive Disks (RAID)
A set of physical disk drives viewed by the OS as a single
logical drive
Replace large-capacity disks with multiple smaller-capacity
drives to improve the I/O performance (at lower price)
Data are distributed across physical drives in a way that
enables simultaneous access to data from multiple drives
Redundant disk capacity is used to compensate for the
increase in the probability of failure due to multiple drives
Improved availability: no single point of failure
 Six levels of RAID representing different design
alternatives
RAID Level 0
 Does not include redundancy
 Data is striped across the available disks
 Total storage space across all disks is divided into strips
 Strips are mapped round-robin to consecutive disks
 A set of consecutive strips that maps exactly one strip to each disk in the
array is called a stripe
 Can you see how this improves the disk I/O bandwidth?
 What access pattern gives the best performance?
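The round-robin mapping is just division and modulus. A sketch in C (function name invented for illustration):

    /* RAID 0 address mapping: strip k lives on disk (k mod N) at
       per-disk offset (k div N). */
    void raid0_map(int strip, int ndisks, int *disk, int *offset)
    {
        *disk   = strip % ndisks;
        *offset = strip / ndisks;
    }

Consecutive strips land on consecutive disks, so a large sequential access keeps all N disks busy at once; that is where the bandwidth improvement comes from.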
[Diagram: strips 0-3 map round-robin across four disks and together form stripe 0; strips 4-7 form the next stripe.]
RAID Level 1
 Redundancy achieved by duplicating all the data
 Every disk has a mirror disk that stores exactly the same data
 A read can be serviced by either of the two disks that contain the
requested data (improved performance over RAID 0 if reads dominate)
 A write request must be done on both disks but can be done in parallel
 Recovery is simple but cost is high
[Diagram: four disks; each strip (0-3) is stored on one disk and duplicated on a mirror disk.]
RAID Levels 2 and 3
 Parallel access: all disks participate in every I/O request
 Small strips since size of each read/write = # of disks * strip size
 RAID 2: error correcting code is calculated across corresponding bits
on each data disk and stored on log(# data disks) parity disks
 Hamming code: can correct single-bit errors and detect double-bit errors
 Less expensive than RAID 1 but still pretty high overhead – not really
needed in most reasonable environments
 RAID 3: a single redundant disk that keeps parity bits
 P(i) = X2(i) ⊕ X1(i) ⊕ X0(i)
 In the event of a failure, data can be reconstructed
 Can only tolerate a single failure at a time
[Diagram: data bits b0, b1, b2 on three data disks, with parity P(b) on a fourth.]
Reconstruction after a failure: X2(i) = P(i) ⊕ X1(i) ⊕ X0(i)
RAID Levels 4 and 5
 RAID 4
 Large strips with a parity strip like RAID 3
 Independent access - each disk operates independently, so multiple I/O
request can be satisfied in parallel
 Independent access → a small write = 2 reads + 2 writes
 Example: if a write is performed only on strip 0:
P'(i) = X2(i) ⊕ X1(i) ⊕ X0'(i)
      = X2(i) ⊕ X1(i) ⊕ X0'(i) ⊕ X0(i) ⊕ X0(i)
      = P(i) ⊕ X0'(i) ⊕ X0(i)
 Parity disk can become bottleneck
[Diagram: four disks; strips 0-2 with parity strip P(0-2), strips 3-5 with parity strip P(3-5).]
 RAID 5
 Like RAID 4 but parity strips are distributed across all disks
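The small-write derivation above turns into a simple read-modify-write of the parity strip. A C sketch (buffers assumed already read into memory, which is where the 2 reads come from):

    #include <stddef.h>

    /* RAID 4/5 small write: P'(i) = P(i) XOR X0(i) XOR X0'(i), applied
       byte by byte across the strip. */
    void parity_update(unsigned char *parity,
                       const unsigned char *old_data,
                       const unsigned char *new_data, size_t strip_size)
    {
        for (size_t i = 0; i < strip_size; i++)
            parity[i] ^= old_data[i] ^ new_data[i];
    }

After this, the new data strip and the new parity strip are written back: 2 reads + 2 writes in total.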
File System
 File system is an abstraction of the disk
File → track/sector
To a user process
A file looks like a contiguous block of bytes (Unix)
A file system provides a coherent view of a group of files
A file system provides protection
 API: create, open, delete, read, write files
 Performance: throughput vs. response time
 Reliability: minimize the potential for lost or
destroyed data
E.g., RAID could be implemented in the OS as part of the
disk device driver
Unix File System
 Ordinary files (uninterpreted)
 Directories
File of files
Organized as a rooted tree
Pathnames (relative and absolute)
Contains links to its parent and to itself
Multiple links to a file can exist
Links can be hard or symbolic
Unix File Systems (Cont’d)
 Tree-structured file
hierarchies
 Mounted onto the existing name
space using mount
 No hard links between different
file systems (symbolic links can cross)
File Naming
 Each file has a unique name
 User visible (external) name must be symbolic
 In a hierarchical file system, unique external names are given as
pathnames (path from the root to the file)
 Internal names: i-node in UNIX - an index into an array of file
descriptors/headers for a volume
 Directory: translation from external to internal name
 May have more than one external name for a single internal name
 Information about a file is split between the directory and the
file descriptor: name, type, size, location on disk, owner,
permissions, date created, date last modified, date of last access,
link count
Name Space
 In UNIX, “devices are files”
 E.g., /dev/cdrom, /dev/tape
 A user process accesses a device by
accessing the corresponding file
[Diagram: a name-space tree rooted at / with subtrees such as usr.]
File Allocation
 Contiguous: a contiguous set of blocks is pre-allocated to a file
at the time of file creation
 Good for sequential files
 File size must be known at the time of file creation
 External fragmentation – like memory allocation when giving a
contiguous block to each job
 So what do we do?
 Dynamic allocation (new space allocated on demand)
 First fit (first chunk of sufficient size), best fit (smallest chunk of
sufficient size), nearest fit (chunk of sufficient size that is closest
to the previous allocation for the same file)
 Indexed allocation (contiguous and chained allocations are other
options) with file allocation table. FAT includes file names and
corresponding index block numbers
 Use a disk allocation table (bit map, chained, and indexed) to
manage the free space
File Allocation Strategies
 Contiguous allocation: find contiguous chunk for whole
file
 Chained allocation: pointer to next block allocated to
file
 Indexed: index block points to file blocks
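To see what chained allocation costs, a FAT-style lookup in C (an in-memory table is assumed, with -1 marking end of file):

    /* Chained allocation: fat[b] is the block that follows block b in
       the file, or -1 at end of file. */
    int nth_block(const int *fat, int first_block, int k)
    {
        int b = first_block;
        while (k-- > 0 && b != -1)
            b = fat[b];        /* one pointer hop per block */
        return b;              /* block number, or -1 if file is too short */
    }

Sequential access is cheap, but reaching block k of a file takes k hops; indexed allocation avoids this by putting all the block numbers in one index block.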
Free Space Management
 Bitmap: one bit for each block on the disk
Good for finding a contiguous group of free blocks
Small enough to be kept in memory
Requires a sequential scan of the bits
 Chained free portions: pointer to the next one
 Indexed: treats free space as a file
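A sketch of the bitmap scan in C (the convention that a set bit means "allocated" is an assumption of this example):

    #include <limits.h>

    /* Free-space bitmap: bit b set means block b is in use. */
    int find_free_block(const unsigned char *bitmap, int nblocks)
    {
        for (int b = 0; b < nblocks; b++)
            if ((bitmap[b / CHAR_BIT] & (1u << (b % CHAR_BIT))) == 0)
                return b;      /* first free block */
        return -1;             /* disk full */
    }

Finding a contiguous run of free blocks is the same scan looking for consecutive zero bits, which is why bitmaps suit contiguous allocation well.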
UNIX File i-nodes
[Diagram of the UNIX i-node structure (not reproduced).]
File System Buffer Cache
[Diagram: the application issues file read/write calls; the OS translates files to disk blocks, maintains the buffer cache, and controls disk accesses (reading/writing blocks) on the hardware.]
Any problems?
File System Buffer Cache
 Disks are “stable” while memory is volatile
What happens if you buffer a write and the machine crashes
before the write has been saved to disk?
Can use write-through but write performance will suffer
In UNIX
Use un-buffered I/O when writing i-nodes or pointer blocks
Use buffered I/O for other writes and force sync every 30 seconds
 What about replacement?
 How can we further improve performance?
Application-controlled caching
[Diagram: as before, the OS translates files to disk blocks, maintains the buffer cache, and controls disk accesses, but the replacement policy is now exposed to the application.]
Application-Controlled File Caching
 Two-level block replacement: responsibility is split
between kernel and user level
 A global allocation policy performed by the kernel
which decides which process will give up a block
 A block replacement policy decided by the user:
The kernel provides the candidate block as a hint to the process
The process can overrule the kernel’s choice by suggesting
an alternative block
The suggested block is replaced by the kernel
 Example of an alternative replacement policy: most-recently-used (MRU)
Sound kernel-user cooperation
 Oblivious processes should do no worse than under LRU
 Foolish processes should not hurt other processes
 Smart processes should perform better than LRU whenever
possible and they should never perform worse
 If kernel selects block A and user chooses B instead, the kernel
swaps the position of A and B in the LRU list and places B in a
“placeholder” which points to A (kernel’s choice)
 If the user process misses on B (i.e. it made a bad choice), and B is
found in the placeholder, then the block pointed to by the
placeholder is chosen (prevents hurting other processes)
File System Consistency
 File system almost always uses a buffer/disk cache for
performance reasons
 Two copies of a disk block (buffer cache, disk) → consistency
problem if the system crashes before all the modified blocks
are written back to disk
 This problem is especially critical for blocks that contain
control information: i-nodes, the free list, directory blocks
 Utility programs for checking block and directory consistency
 Write critical blocks from the buffer cache to disk immediately
 Data blocks are written to disk periodically: sync
More on File System Consistency
 To maintain file system consistency the ordering of
updates from buffer cache to disk is critical
 Example: if the directory block (which contains a pointer to the
i-node) is written back before the i-node and the
system crashes, the directory structure will be
inconsistent
 Similarly, if the free list is updated before the i-node
and the system crashes, the free list will be incorrect
 A more elaborate solution: use dependencies between
blocks containing control data in the buffer cache to
specify the ordering of updates
Protection Mechanisms
 Files are OS objects: unique names and a finite set of
operations that processes can perform on them
 A protection domain is a set of {object, rights} pairs, where a right is
the permission to perform one of the operations
 At every instant in time, each process runs in some protection
domain
 In Unix, a protection domain is {uid, gid}
 The protection domain in Unix is switched when running a program
with SETUID/SETGID set or when the process enters
kernel mode by issuing a system call
 How to store all the protection domains?
Protection Mechanisms (cont’d)
 Access Control List (ACL): associate with each object
a list of all the protection domains that may access
the object and how
In Unix, the ACL is reduced to three protection domains: owner,
group, and others
 Capability List (C-list): associate with each process a
list of objects that may be accessed along with the
operations
C-list implementation issues: where/how to store them
(hardware, kernel, encrypted in user space) and how to
revoke them
Log-Structured File System (LFS)
 As memory gets larger, the buffer cache grows → a growing
fraction of read requests can be satisfied from the
buffer cache with no disk access
 In the future, most disk accesses will be writes
 But writes are usually done in small chunks in most file systems
(control data, for instance), which makes the file system highly
inefficient
 LFS idea: structure the entire disk as a log
 Periodically, or when required, all the pending writes being
buffered in memory are collected and written as a single
contiguous segment at the end of the log
LFS segment
 Contains i-nodes, directory blocks, and data blocks, all
mixed together
 Each segment starts with a segment summary
 Segment size: 512 KB – 1 MB
 Two key issues:
How to retrieve information from the log?
How to manage the free space on disk?
File Location in LFS
 The i-node contains the disk addresses of the file
blocks, as in standard UNIX
 But there is no fixed location for the i-node
 An i-node map is used to maintain the current
location of each i-node
 i-node map blocks can also be scattered but a fixed
checkpoint region on the disk identifies the location
of all the i-node map blocks
 Usually i-node map blocks are cached in main memory
most of the time, thus disk accesses for them are
rare
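Putting the pieces together, locating a file block goes checkpoint region → i-node map → i-node → data block. A sketch with invented helper names (the structure, not a real LFS interface):

    /* Assumed helpers, for illustration only. */
    long imap_block_addr(int inum);               /* via the fixed checkpoint region */
    long imap_lookup(long imap_addr, int inum);   /* current i-node address */
    long inode_block_addr(long inode_addr, int blkno);

    /* Disk address of block blkno of file inum. */
    long lfs_block_addr(int inum, int blkno)
    {
        long imap_addr  = imap_block_addr(inum);       /* usually cached */
        long inode_addr = imap_lookup(imap_addr, inum);
        return inode_block_addr(inode_addr, blkno);    /* as in standard UNIX */
    }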
Segment Cleaning in LFS
 LFS disk is divided into segments that are written sequentially
 Live data must be copied out of a segment before the segment
can be re-written
 The process of copying data out of a segment: cleaning
 A separate cleaner thread moves along the log, removes old
segments from the end and puts live data into memory for
rewriting in the next segment
 As a result, an LFS disk appears like a big circular buffer, with
the writer thread adding new segments at the front and the
cleaner thread removing old segments from the end
 Bookkeeping is not trivial: an i-node must be updated when its blocks
are moved to the current segment
LFS Performance
[Performance graphs from the original slides are not reproduced here.]