DISK SCHEDULING
Overview




Over the past 30 years, the increase in the speed of processors and main
memory has far outstripped that of disk access, with processor and main
memory speeds increasing by about two orders of magnitude compared to
one order of magnitude for disk.
The result is that disks are currently at least four orders of magnitude slower
than main memory.
This gap is expected to continue into the foreseeable future.
Thus, the performance of disk storage subsystems is of vital concern, and
much research has gone into schemes for improving that performance.
Disk Performance Parameters








When the disk drive is operating, the disk is rotating at constant speed.
To read or write, the head must be positioned at desired track and at the
beginning of the desired sector on the track.
Track selection involves moving the head in a moveable-head system.
On a moveable-head system, the time it takes to position the head at the
track is known as seek time.
Once the track is selected, the disk controller waits until the appropriate
sector rotates to line up with the head.
The time it takes for the beginning of the sector to reach the head is known as
rotational delay, or rotational latency.
The sum of the seek time if any and the rotational delay is the access time,
the time it takes to get into position to read or write.
Once the head is in position, the read or write operation is then performed as
the sector moves under the head; this is the data transfer portion of the
operation.
Seek Time
Seek time is the time required to move the disk arm to the required
track.
The seek time consists of two key components: the initial startup time
and the time taken to traverse the cylinders that have to be crossed
once the access arm is up to speed.
The seek time can be approximated with the following formula:
Ts = m * n + s
where Ts = estimated seek time,
n = number of tracks traversed,
m = a constant that depends on the disk drive, and
s = startup time.
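As a sketch, the formula can be evaluated directly in Python; the values of m and s below are hypothetical, since the real constants depend on the specific drive:

```python
def estimated_seek_time(n_tracks, m=0.1, s=2.0):
    """Estimate seek time Ts = m * n + s.

    All times are in milliseconds.  m (per-track traversal constant)
    and s (startup time) are illustrative values, not measured ones.
    """
    return m * n_tracks + s

# Crossing 100 tracks with m = 0.1 ms/track and s = 2 ms gives about 12 ms.
print(estimated_seek_time(100))
```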
Disk Scheduling



One of the responsibilities of the operating system is to use the hardware
efficiently.
For the disk drives, this means having a fast access time and disk bandwidth.
The access time has two major components:
1. the seek time and
2. the rotational latency




The disk bandwidth is the total number of bytes transferred, divided by the
total time between the first request for service and the completion of the last
transfer.
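This definition is a one-line computation; a sketch with illustrative numbers:

```python
def disk_bandwidth(total_bytes, first_request_time, last_completion_time):
    """Disk bandwidth = total bytes transferred / total elapsed time
    between the first request and the completion of the last transfer.

    Times are in seconds; the result is in bytes per second.
    """
    return total_bytes / (last_completion_time - first_request_time)

# 4 MB transferred between t = 0.0 s and t = 0.5 s -> 8 MB/s
print(disk_bandwidth(4 * 1024 * 1024, 0.0, 0.5))
```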
We can improve both the access time and the bandwidth by scheduling the
servicing of disk I/O requests in a good order.
Whenever a process needs I/O to or from the disk, it issues a system call to
the operating system.
The request specifies several pieces of information:
1. Whether this operation is input or output
2. What the disk address for the transfer is
3. What the memory address for the transfer is
4. What the number of bytes to be transferred is
If the desired disk drive and controller are available, the request can be
serviced immediately.
If the drive or controller is busy, any new requests for service will need to be
placed on the queue of pending requests for that drive.
First-Come First-Served Scheduling
The simplest form of disk scheduling is, of course, FCFS.
This algorithm is intrinsically fair, but it generally does not provide
the fastest service.
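A sketch of the total head movement under FCFS, using a hypothetical queue of cylinder requests and an assumed starting head position:

```python
def fcfs_head_movement(requests, start):
    """Total cylinders traversed when requests are serviced
    strictly in arrival order (FCFS)."""
    total, head = 0, start
    for cyl in requests:
        total += abs(cyl - head)
        head = cyl
    return total

# Illustrative request queue (cylinder numbers), head initially at 53:
queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(queue, 53))  # 640 cylinders of head movement
```

The wide swings (e.g. 183 down to 37, then back up to 122) are exactly what the smarter algorithms below try to avoid.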
Shortest-Seek-Time-First Scheduling
It seems reasonable to service all the requests close to the current
head position before moving the head far away to service other
requests.
This assumption is the basis for the shortest-seek-time-first (SSTF)
algorithm.
The SSTF algorithm selects the request with the minimum seek time
from the current head position.
Since the seek time increases with the number of cylinders traversed
by the head, SSTF chooses the pending request closest to the current
head position.
SSTF scheduling is essentially a form of SJF scheduling and, like SJF
scheduling, it may cause starvation of some requests: if a steady stream
of requests near the current head position keeps arriving, a request for
a distant cylinder can be postponed indefinitely. Remember that requests
may arrive at any time.
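SSTF can be sketched as repeatedly picking the closest pending cylinder; the queue and starting position below are illustrative:

```python
def sstf_order(requests, start):
    """Service order under SSTF: always pick the pending request
    whose cylinder is closest to the current head position."""
    pending, head, order = list(requests), start, []
    while pending:
        nxt = min(pending, key=lambda c: abs(c - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order

queue = [98, 183, 37, 122, 14, 124, 65, 67]
order = sstf_order(queue, 53)
# Total head movement along the chosen order:
moves = sum(abs(b - a) for a, b in zip([53] + order, order))
print(order, moves)
```

For this queue the head travels 236 cylinders, far less than FCFS would, but a distant request (183 here) is always served last and could starve if closer requests kept arriving.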
SCAN Scheduling
In the SCAN algorithm, the disk arm starts at one end of the disk, and
moves toward the other end of the disk.
At the other end, the direction of head movement is reversed, and
servicing continues.
The head continuously scans back and forth across the disk.
If a request arrives in the queue just in front of the head, it will be
serviced almost immediately; a request arriving just behind the head
will have to wait until the arm moves to the end of the disk, reverses
direction, and comes back.
The SCAN algorithm is sometimes called the Elevator algorithm,
since the disk arm behaves just like an elevator in a building, first
servicing all the requests going up, and then reversing to service
requests the other way.
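The SCAN service order can be sketched as follows, assuming the arm first sweeps toward cylinder 0 (the queue and start position are illustrative; on a real disk the arm would continue all the way to cylinder 0 before reversing, even if no request lies there):

```python
def scan_order(requests, start, direction="down"):
    """SCAN (elevator): service requests in one direction until the
    arm reaches the end of the disk, then reverse.  With
    direction="down" the arm first sweeps toward cylinder 0."""
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c >= start)
    if direction == "down":
        return lower[::-1] + upper   # descending pass, then ascending pass
    return upper + lower[::-1]       # ascending pass, then descending pass

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan_order(queue, 53))  # [37, 14, 65, 67, 98, 122, 124, 183]
```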
C-SCAN Scheduling
Circular SCAN (C-SCAN) is a variant of SCAN that is designed to
provide a more uniform wait time.
Like SCAN, C-SCAN moves the head from one end of the disk to the
other, servicing requests along the way.
When the head reaches the other end, however, it immediately returns
to the beginning of the disk, without servicing any requests on the
return trip.
The C-SCAN scheduling algorithm essentially treats the cylinders as a
circular list that wraps around from the final cylinder to the first one.
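The wrap-around service order can be sketched as follows (the queue and start position are illustrative):

```python
def cscan_order(requests, start):
    """C-SCAN: sweep from the current position toward the high end,
    then jump back to cylinder 0 without servicing anything; requests
    below the starting position are served on the next ascending pass."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    return upper + lower

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(cscan_order(queue, 53))  # [65, 67, 98, 122, 124, 183, 14, 37]
```

Because every request is served on an ascending pass, no request waits longer than roughly one full sweep, which is what makes the wait time more uniform than SCAN's.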
LOOK Scheduling
Notice that, as we described them, both SCAN and C-SCAN move the
disk arm across the full width of the disk.
In practice, neither algorithm is implemented this way.
More commonly, the arm goes only as far as the final request in each
direction.
Then, it reverses direction immediately, without first going all the way
to the end of the disk.
These versions of SCAN and C-SCAN are called LOOK and C-LOOK,
because they look for a request before continuing to move in a
given direction.
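A sketch of the LOOK service order (illustrative queue); the arm reverses at the final request in each direction instead of traveling to the disk's edge:

```python
def look_order(requests, start, direction="up"):
    """LOOK: like SCAN, but the arm travels only as far as the final
    pending request in each direction before reversing."""
    upper = sorted(c for c in requests if c >= start)
    lower = sorted(c for c in requests if c < start)
    if direction == "up":
        return upper + lower[::-1]   # up to the highest request, then back
    return lower[::-1] + upper       # down to the lowest request, then back

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(look_order(queue, 53))  # [65, 67, 98, 122, 124, 183, 37, 14]
```

Note that the service order can coincide with SCAN's; the saving is in head travel, since the arm stops at cylinder 183 rather than continuing to the last cylinder of the disk.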
Selection of a Disk-Scheduling Algorithm
Given so many disk-scheduling algorithms, how do we choose the best
one?
SSTF is common and has a natural appeal.
SCAN and C-SCAN perform better for systems that place a heavy
load on the disk, because they are less likely to have the starvation
problem.
For any particular list of requests, it is possible to define an optimal
order of retrieval, but the computation needed to find an optimal
schedule may not justify the savings over SSTF or SCAN.
With any scheduling algorithm, however, performance depends
heavily on the number and types of requests.
For instance, suppose that the queue usually has just one outstanding
request.
Then, all scheduling algorithms are forced to behave the same,
because they have only one choice for where to move the disk head.
They all behave like FCFS scheduling.
Note that the requests for disk service can be greatly influenced by the
file-allocation method.
A program reading a contiguously allocated file will generate several
requests that are close together on the disk, resulting in limited head
movement.
A linked or indexed file, on the other hand, may include blocks that
are widely scattered on the disk, resulting in greater head movement.
The location of directories and index blocks also is important.
Since every file must be opened to be used, and opening a file requires
searching the directory structure, the directories will be accessed
frequently.
Suppose a directory entry is on the first cylinder and a file’s data are
on the final cylinder.
In this case, the disk head has to move the entire width of the disk.
If the directory entry were on the middle cylinder, the head would have
to move at most one-half the width.
Caching the directories and index blocks in main memory can also
help to reduce the disk-arm movement, particularly for read requests.
Because of these complexities, the disk-scheduling algorithm should
be written as a separate module of the operating system, so that it can
be replaced with a different algorithm if necessary.
Either SSTF or LOOK is a reasonable choice for the default
algorithm.
Note that the scheduling algorithms described earlier consider only
the seek distances.
For modern disks, the rotational latency can be nearly as large as the
average seek time.
But it is difficult for the operating system to schedule for improved
rotational latency because modern disks do not disclose the physical
location of logical blocks.
Disk manufacturers have been helping with this problem by
implementing disk-scheduling algorithms in the controller hardware
built into the disk drive.
If the operating system sends a batch of requests to the controller, the
controller can queue them and then schedule them to improve both
the seek time and the rotational latency.
If I/O performance were the only consideration, the operating system
would gladly turn over the responsibility of disk scheduling to the disk
hardware.
In practice, however, the operating system may have other constraints
on the service order for requests.
For instance, demand paging may take priority over application I/O,
and writes are more urgent than reads if the cache is running out of
free pages.
Also, it may be desirable to guarantee the order of a set of disk writes
to make the file system robust in the face of system crashes; consider
what could happen if the operating system allocated a disk page to a
file, and the application wrote data into that page before the operating
system had a chance to flush the modified inode and free-space list
back to disk.
To accommodate such requirements, an operating system may choose
to do its own disk scheduling, and to “spoon-feed” the requests to the
disk controller, one by one.
Disk Management

The operating system is responsible for several other aspects of disk
management as well, such as disk initialization, booting from disk, and
bad-block recovery.
Disk Formatting
A new magnetic disk is a blank slate: It is just platters of a magnetic
recording material.
Before a disk can store data, it must be divided into sectors that the
disk controller can read and write.
This process is called low-level formatting, or physical formatting.
Low-level formatting fills the disk with a special data structure for
each sector.
The data structure for a sector typically consists of a header, a data
area (usually 512 bytes in size), and a trailer.
The header and trailer contain information used by the disk
controller, such as a sector number and an error-correcting code
(ECC).
When the controller writes a sector of data during normal I/O, the
ECC is updated with a value calculated from all the bytes in the data
area.
When the sector is read, the ECC is recalculated and is compared
with the stored value.
If the stored and calculated numbers are different, this mismatch
indicates that the data area of the sector has become corrupted and
that the disk sector may be bad.
The ECC is an error-correcting code because it contains enough
information that if only 1 or 2 bits of data have been corrupted, the
controller can identify which bits have changed, and can calculate
what their correct values should be.
The ECC processing is done automatically by the controller whenever
a sector is read or written.
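This read-check cycle can be sketched with a plain checksum standing in for the ECC. Note the simplification: real controllers use error-correcting codes (such as Reed-Solomon) that can also repair a few flipped bits, whereas the CRC-32 used here can only detect corruption:

```python
import zlib

def write_sector(data: bytes):
    """Store a sector with a checksum in its trailer, mimicking how the
    controller appends redundancy computed from the data area."""
    return {"data": data, "ecc": zlib.crc32(data)}

def read_sector(sector):
    """On read, recompute the checksum and compare it with the stored
    value; a mismatch flags the sector as possibly bad."""
    ok = zlib.crc32(sector["data"]) == sector["ecc"]
    return sector["data"], ok

s = write_sector(b"hello, disk")
print(read_sector(s)[1])        # checksum matches: sector is good
s["data"] = b"hellx, disk"      # simulate corruption of the data area
print(read_sector(s)[1])        # mismatch: sector reported bad
```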
Most hard disks are low-level formatted at the factory as a part of the
manufacturing process.
This formatting enables the manufacturer to test the disk, and to
initialize the mapping from logical block numbers to defect-free
sectors on the disk.
For many hard disks, when the disk controller is instructed to low-level
format the disk, it can also be told how many bytes of data space
to leave between the header and trailer of all sectors.
It is usually possible to choose among a few sizes, such as 256, 512,
and 1024 bytes.
Formatting a disk with a larger sector size means that fewer sectors
can fit on each track, but that also means fewer headers and trailers
are written on each track, and thus increases the space available for
user data.
Some operating systems can handle only a sector size of 512 bytes.
To use a disk to hold files, the operating system still needs to record its
own data structures on the disk.
It does so in two steps.
The first step is to partition the disk into one or more groups of
cylinders.
The operating system can treat each partition as though it were a
separate disk.
For instance, one partition can hold a copy of the operating system’s
executable code, while another holds user files.
After partitioning, the second step is called logical formatting, or
“making a file system.”
In this step, the operating system stores the initial file-system data
structures onto the disk.
The data structures may include maps of free and allocated space (a
FAT or inodes) and an initial empty directory.
Some operating systems give special programs the ability to use a disk
partition as a large sequential array of logical blocks, without any
file-system data structures.
I/O to this array is sometimes called raw I/O.
For example, some database systems prefer raw I/O because it enables
them to control the exact disk location where each database record is
stored.
Raw I/O bypasses all the file-system services, such as the buffer cache,
prefetching, space allocation, file names, and directories.
Certain applications can be made more efficient by implementing their
own special-purpose storage services on a raw partition, but most
applications perform better when they use the regular file-system
services.
Boot Block
For a computer to start running - for instance, when it is powered up
or rebooted - it needs to have an initial program to run.
This initial bootstrap program tends to be simple.
It initializes all aspects of the system, from CPU registers to device
controllers and the contents of main memory, and then starts the
operating system.
To do its job, it finds the operating-system kernel on disk, loads that
kernel into memory, and jumps to an initial address to begin the
operating-system execution.
For most computers, the bootstrap is stored in read-only memory
(ROM).
This location is convenient, because ROM needs no initialization and
is at a fixed location from which the processor can start executing when
powered up or reset.
And, since ROM is read only, it cannot be infected by a computer
virus.
The problem is that changing this bootstrap code requires changing
the ROM hardware chips.
For this reason, most systems store a tiny bootstrap loader program in
the boot ROM, whose only job is to bring in a full bootstrap program
from disk.
The full bootstrap program can be changed easily: A new version is
simply written onto the disk.
The full bootstrap program is stored in a partition called the boot
blocks, at a fixed location on the disk.
A disk that has a boot partition is called a boot disk or system disk.
The code in the boot ROM instructs the disk controller to read the
boot blocks into memory (no device drivers are loaded at this point),
and then starts executing that code.
The full bootstrap program is more sophisticated than the bootstrap
loader in the boot ROM, and is able to load the entire operating
system from a nonfixed location on disk, and to start the operating
system running.
Even so, the full bootstrap code may be small.
Bad Blocks
Because disks have moving parts and small tolerances (recall that the
disk head flies just above the disk surface), they are prone to failure.
Sometimes, the failure is complete, and the disk needs to be replaced,
and its contents restored from backup media to the new disk.
More frequently, one or more sectors become defective.
Most disks even come from the factory with bad blocks.
Depending on the disk and controller in use, these blocks are handled
in a variety of ways.
On simple disks, such as some disks with IDE controllers, bad blocks
are handled manually.
For instance, the MS-DOS format command does a logical format,
and, as a part of the process, scans the disk to find bad blocks.
If format finds a bad block, it writes a special value into the
corresponding FAT entry to tell the allocation routines not to use that
block.
If blocks go bad during normal operation, a special program (such as
chkdsk) must be run manually to search for the bad blocks and to
lock them away as before.
Data that resided on the bad blocks usually are lost.
More sophisticated disks, such as the SCSI disks used in high-end PCs
and most workstations, are smarter about bad-block recovery.
The controller maintains a list of bad blocks on the disk.
The list is initialized during the low-level format at the factory, and is
updated over the life of the disk.
Low-level formatting also sets aside spare sectors not visible to the
operating system.
The controller can be told to replace each bad sector logically with
one of the spare sectors.
This scheme is known as sector sparing or forwarding.
A typical bad-sector transaction might be as follows:
1. The operating system tries to read logical block 87.
2. The controller calculates the ECC and finds that the sector is bad.
It reports this finding to the operating system.
3. The next time that the system is rebooted, a special command is run
to tell the SCSI controller to replace the bad sector with a spare.
4. After that, whenever the system requests logical block 87, the
request is translated into the replacement sector’s address by the
controller.
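The redirection described above amounts to a small translation table consulted on every request; a sketch, using the example's block numbers (the table contents are, of course, hypothetical):

```python
def make_remapper(bad_to_spare):
    """Controller-side sector sparing: translate requests for remapped
    bad blocks to their spare sectors; healthy blocks pass through."""
    def translate(block):
        return bad_to_spare.get(block, block)
    return translate

# Suppose the controller has remapped bad block 87 to spare sector 202:
translate = make_remapper({87: 202})
print(translate(87))   # redirected to the spare
print(translate(88))   # unchanged
```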
Note that such a redirection by the controller could invalidate any
optimization by the operating system’s disk-scheduling algorithm!
For this reason, most disks are formatted to provide a few spare
sectors in each cylinder, and a spare cylinder as well.
When a bad block is remapped, the controller uses a spare sector
from the same cylinder, if possible.
As an alternative to sector sparing, some controllers can be instructed
to replace a bad block by sector slipping.
For instance, suppose that logical block 17 becomes defective, and the
first available spare follows sector 202.
Then, sector slipping would remap all the sectors from sector 17 to
202, moving them all down one spot.
That is, sector 202 would be copied into the spare, then sector 201 into
202, and then 200 into 201, and so on, until sector 18 is copied into
sector 19.
Slipping the sectors in this way frees up the space of sector 18, so
sector 17 can be mapped to it.
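The remapping that sector slipping produces can be sketched as a logical-to-physical table, using the text's example and taking the spare to be the physical slot just past sector 202 (a simplified model; a real controller would also physically copy each sector's contents):

```python
def sector_slip(bad, spare):
    """Build the logical->physical map after slipping: every sector
    between the bad one and the spare moves down one physical slot,
    freeing the slot next to the bad sector for the bad sector's data."""
    mapping = {}
    for logical in range(bad + 1, spare):
        mapping[logical] = logical + 1   # e.g. 202 -> spare, 201 -> 202, ...
    mapping[bad] = bad + 1               # bad sector 17 now lives in slot 18
    return mapping

remap = sector_slip(17, 203)             # spare is the slot after sector 202
print(remap[17], remap[18], remap[202])  # 18 19 203
```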
The replacement of a bad block generally is not a totally automatic
process, because the data in the bad block usually are lost.
Thus, whatever file was using that block must be repaired (for
instance, by restoration from a backup tape), and that requires
manual intervention.
Stable-Storage Implementation


By definition, information residing in stable storage is never lost.
To implement such storage, we need to replicate the needed information on
multiple storage devices with independent failure modes.


We need to coordinate the writing of updates in a way that guarantees that a
failure during an update does not leave all the copies in a damaged state, and
that, when we are recovering from a failure, we can force all copies to a
consistent and correct value, even if there is another failure during the
recovery.
A disk write results in one of three outcomes:
1. Successful completion: The data were written correctly on disk.
2. Partial failure: A failure occurred in the midst of transfer, so only
some of the sectors were written with the new data, and the sector being
written during the failure may have been corrupted.
3. Total failure: The failure occurred before the disk write started, so
the previous data values on the disk remain intact.
We require that, whenever a failure occurs during writing of a block, the
system detects it and invokes a recovery procedure to restore the block to a
consistent state.
To do that, the system must maintain two physical blocks for each logical
block.
An output operation is executed as follows:
1. Write the information onto the first physical block.
2. When the first write completes successfully, write the same
information onto the second physical block.
3. Declare the operation complete only after the second write
completes successfully.





During recovery from a failure, each pair of physical blocks is examined.
If both are the same and no detectable error exists, then no further action is
necessary.
If one block contains a detectable error, then we replace its contents with the
value of the other block.
If both blocks contain no detectable error, but they differ in content, then we
replace the content of the first block with the value of the second.
This recovery procedure ensures that a write to stable storage either
succeeds completely or results in no change.
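A minimal sketch of the two-copy scheme, applying the recovery rule from the text (when the copies differ without detectable error, the first takes the second's value). Real implementations also keep per-copy error-detecting codes, which this sketch omits:

```python
class StableBlock:
    """Two physical copies of one logical block of stable storage."""

    def __init__(self, value=b""):
        self.copy1 = self.copy2 = value

    def write(self, value):
        self.copy1 = value   # step 1: write the first physical block
        # (a crash here leaves copy2 holding the old, consistent value)
        self.copy2 = value   # step 2: then write the second
        # the write is declared complete only after both succeed

    def recover(self):
        """If the copies differ, an update was interrupted; force the
        first copy to the second's value so the write either fully
        happened or appears never to have started."""
        if self.copy1 != self.copy2:
            self.copy1 = self.copy2
        return self.copy1

b = StableBlock(b"old")
b.copy1 = b"new"      # simulate a crash between the two physical writes
print(b.recover())    # rolls back to b'old': the write has no effect
```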



We can extend this procedure easily to allow the use of an arbitrarily large
number of copies of each block of stable storage.
Although a large number of copies further reduces the probability of a
failure, it is usually reasonable to simulate stable storage with only two
copies.
The data in stable storage are guaranteed to be safe unless a failure destroys
all the copies.