Scheduling
CS623, Lecture 7
3/9/2004
© Joel Wein, updated by T. Suel
Reading Materials:

Stallings Textbook, Chapter 9
– Background, Fair-Share Scheduler

Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible
Proportional-Share Resource Management, Proc. of the First Symposium
on Operating Systems Design and Implementation (OSDI), 1994.

C. A. Waldspurger and W. E. Weihl, "Stride Scheduling: Deterministic
Proportional-Share Resource Management," Technical Memorandum
MIT/LCS/TM-528, Laboratory for Computer Science, MIT 1995.

P. Goyal and X. Guo and H.M. Vin, A Hierarchical CPU Scheduler for
Multimedia Operating Systems, Proceedings of 2nd Symposium on
Operating System Design and Implementation (OSDI), 1996.
Outline





Basics (Stallings 9.3)
Fair-Share Scheduler (Stallings 9.3)
Lottery Scheduling
Stride Scheduling
QLinux
Short-Term Scheduler




Medium-Term Scheduler: swapping
Short-Term Scheduler: what to execute next
Give small slices of time to processes
Some other objectives (fairness and others)
Basic Strategies







Priorities
FCFS
RR
SPN (shortest process next)
SRPT (shortest remaining processing time)
HRRN (stretch) (highest response ratio next)
Feedback (penalize old guys)
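As a small illustration of one of these strategies: HRRN picks the ready process with the highest response ratio R = (w + s) / s, where w is the time spent waiting and s is the expected service time (the definition used in Stallings). The sketch below is illustrative only; the function name and the tuple layout are assumptions made for the example.

```python
def hrrn_pick(ready):
    """Pick the next process under HRRN.

    ready: list of (name, waiting_time, expected_service_time) tuples
    (a hypothetical layout chosen for this sketch).
    """
    def response_ratio(entry):
        _, waiting, service = entry
        return (waiting + service) / service

    # The process with the largest response ratio runs next.
    return max(ready, key=response_ratio)[0]


# Both jobs have waited 10 units; the short one has the higher ratio (6 vs 1.5).
print(hrrn_pick([("long", 10, 20), ("short", 10, 2)]))
```

Short jobs are favored, but a long job's ratio keeps growing while it waits, so it is eventually selected.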
Fair-Share Scheduling

Traditional techniques treat the collection of ready processes as a single pool from which to choose the next process to run.
– Broken down by priority but otherwise homogeneous.
There might be structure to the collection of processes that a traditional scheduler does not recognize.
– A user might want his set of processes as a whole to make progress, not so much any individual one.
– Or a group of users (department).
Fair Share Strategy

Each user is assigned a weighting that defines the user’s share of system resources as a fraction of the total usage of those resources.
– If user A has twice the weighting of user B, then in the long run A should be able to do twice as much work.
– Objective of the scheduler: monitor usage and give fewer resources to those that have had more than their fair share, and more to those that have had less.
FSS


G. Henry, Fair-Share Scheduler, 1984.
Divide the user community into a set of fair-share groups and allocate a fraction of the processor resource to each group.
– Each fair-share group can be thought of as running on a virtual system that is proportionally slower than the full system.
Scheduling is done on the basis of priority, which takes into account:
– Priority of the process (a higher number means lower priority)
– Recent processor usage of the process
– Recent processor usage of the group it belongs to.
Fair Share Scheduling





See Equations page 420, Stallings.
Each process assigned a base priority.
Priority of process drops as process uses
processor and as the group to which the process
belongs uses the processor.
In case of group utilization, average is normalized
by dividing by the weight of the group.
The greater the weight of group, the less its
utilization will affect its priority.
FSS

Processor utilization is measured as follows:
– The processor is interrupted 60 times per second.
– During each interrupt, the processor usage field of the currently running process is incremented, as is the corresponding group processor field.
– Once per second, priorities are recalculated.
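The bookkeeping described above can be sketched in a few lines. This is a minimal simulation, not the scheduler's actual code. It assumes the decay-by-half form of the Stallings equations, CPU(i) = CPU(i-1)/2, GCPU(i) = GCPU(i-1)/2 and P(i) = Base + CPU(i)/2 + GCPU(i)/(4 × W), which should be checked against the textbook. It also assumes that the selected process runs for a whole second between recalculations. The function name and the dictionary layout are hypothetical.

```python
TICKS_PER_SECOND = 60  # clock interrupts per second, as on the slide


def simulate_fss(processes, weights, base=60, seconds=6):
    """processes: dict process name -> group; weights: dict group -> weight.

    Returns the list of processes chosen in each one-second interval.
    """
    cpu = {p: 0.0 for p in processes}    # per-process usage count
    gcpu = {g: 0.0 for g in weights}     # per-group usage count
    priority = {p: float(base) for p in processes}
    schedule = []
    for _ in range(seconds):
        # Lower number = higher priority; ties go to the earliest-listed process.
        running = min(processes, key=lambda p: priority[p])
        schedule.append(running)
        # 60 interrupts during this second, each charging process and group.
        cpu[running] += TICKS_PER_SECOND
        gcpu[processes[running]] += TICKS_PER_SECOND
        # Once per second: decay the usage counts and recompute priorities.
        for p in cpu:
            cpu[p] /= 2
        for g in gcpu:
            gcpu[g] /= 2
        for p, g in processes.items():
            priority[p] = base + cpu[p] / 2 + gcpu[g] / (4 * weights[g])
    return schedule


# A is alone in group 1; B and C share group 2; both groups have weight 0.5.
print(simulate_fss({"A": 1, "B": 2, "C": 2}, {1: 0.5, 2: 0.5}))
# → ['A', 'B', 'A', 'C', 'A', 'B']
```

In this run A receives half of the processor and B and C a quarter each, which is what equal group weights call for.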
Lottery Scheduling: Motivation



Policy can have enormous impact on
throughput and response time.
“Accurate control over quality of service
provided to users and applications requires
support for specifying relative computation
rates.”
For interactive applications need ability to
do this on a short time-frame.
Lottery Scheduling: Problems with
Traditional Schedulers


Priority systems are ad-hoc at best; the highest priority always wins.
Fair-share schedulers:
– Relatively coarse control over long-running computations.
– “Algorithms are complex, requiring periodic usage updates, complicated dynamic priority adjustments, administrative parameter setting to ensure fairness on a time scale of minutes.”
– Priority inversion.
Basics of Lottery Scheduling

Randomized resource allocation mechanism
– Resource rights are represented by lottery tickets.
– Each allocation is determined by holding a lottery; the resource is granted to the client with the winning ticket.
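One simple way to hold such a lottery is to draw a ticket number at random and walk a list of clients, accumulating ticket counts until the running total passes the drawn number. The sketch below assumes clients are kept as (name, tickets) pairs; the function names are hypothetical.

```python
import random


def pick_winner(clients, winning_ticket):
    """Return the client holding the given ticket number (0-based).

    clients: list of (name, tickets) pairs.
    """
    total = 0
    for name, tickets in clients:
        total += tickets
        if winning_ticket < total:
            return name
    raise ValueError("winning ticket out of range")


def hold_lottery(clients, rng=random):
    """Draw one ticket uniformly at random and return the winning client."""
    total_tickets = sum(tickets for _, tickets in clients)
    return pick_winner(clients, rng.randrange(total_tickets))


clients = [("A", 10), ("B", 5), ("C", 2)]  # kept in decreasing ticket order
print(hold_lottery(clients))
```

Keeping the list ordered by decreasing ticket count shortens the average walk, since the clients most likely to win are checked first.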
Lottery Scheduling: Resource
Rights

Lottery tickets encapsulate resource rights that are abstract, relative and uniform.
– Abstract: quantify resource rights independently of machine details.
– Relative: the fraction of a resource that they represent varies dynamically in proportion to the contention for that resource.
– Uniform: rights for heterogeneous resources can be homogeneously represented as tickets.
Lottery Scheduling: Lotteries

How fair is lottery scheduling?
– Probabilistically fair. The expected allocation of resources to clients is proportional to the number of tickets that they hold.
– Since the scheduling algorithm is randomized, the actual allocated proportions are not guaranteed to match the expected proportions exactly.
– Over the “long term” the disparity decreases.
Lottery Fairness

Number of lotteries won by a client has a binomial distribution.
– Probability of winning for a client with t out of T tickets: p = t/T.
– Expected number of wins in n trials = np.
Since any client with a non-zero number of tickets will eventually win a lottery, conventional starvation does not happen.
Also operates fairly when the number of clients or tickets varies dynamically.
– For each allocation, any changes in relative ticket allocations are immediately reflected in the next allocation decision.
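The binomial model gives concrete numbers. For n lotteries with win probability p = t/T, the mean number of wins is np and the standard deviation is sqrt(np(1-p)), which are standard binomial facts. The helper below is a hypothetical name used only for this illustration.

```python
import math


def lottery_stats(n, tickets, total_tickets):
    """Mean and standard deviation of the number of wins in n lotteries."""
    p = tickets / total_tickets
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return mean, std


# A client holding 1 of 4 tickets, over 100 and over 10,000 lotteries.
for n in (100, 10_000):
    mean, std = lottery_stats(n, 1, 4)
    print(n, mean, round(std, 2), round(std / mean, 4))
```

The spread relative to the mean shrinks as n grows, which is the sense in which the disparity decreases over the long term.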
Modular Resource Management

Tickets are a useful mechanism for modular resource management.
– Use to insulate the resource management policies of independent modules
– Can be transferred
Four techniques:
– Transfers
– Inflation
– Currencies
– Compensation tickets
Ticket Transfers



Explicit transfers of tickets from one client to another.
Can be used when a client blocks for some dependency.
E.g.: Client-Server Example
– Server has no tickets of its own.
– Clients give the server all of their tickets during an RPC.
– Server’s priority is the sum of the priorities of all its active clients.
– Server can use lottery scheduling to give preferential service to high-priority clients.
Very elegant solution to a long-standing problem.
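A tiny sketch of the client-server example, assuming each client simply hands over all of its tickets while it is blocked in an RPC. The function name and data layout are made up for the illustration.

```python
def server_funding(client_tickets, waiting_on_server):
    """Tickets the server runs with: the sum of the tickets transferred
    by the clients currently blocked in an RPC to it."""
    return sum(client_tickets[c] for c in waiting_on_server)


client_tickets = {"a": 300, "b": 100, "c": 50}
# Clients a and c are waiting on the server; b is running on its own.
print(server_funding(client_tickets, {"a", "c"}))  # → 350
```

With no clients waiting the server holds no tickets at all, so it does not compete for the processor.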
Transfer


Can be used to solve priority inversion
problem in a manner similar to priority
inheritance.
A client could divide its ticket transfers across
the multiple servers on which it may be
waiting.
Ticket Inflation



Client can bump up its priority by printing
money.
Only works amongst mutually-trusting
clients.
Allows clients to adjust their priority
dynamically with zero communication.
Ticket Currencies





Can extend to express resource rights in
units that are local to each group of mutually
trusting clients.
Unique currency within each trust boundary.
Set up an exchange rate with the base
currency.
Enables inflation just within a group.
Simplifies mini-lotteries, such as for a mutex.
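The exchange-rate idea can be sketched as follows: a ticket denominated in some currency is worth its fraction of that currency's issued amount, multiplied by the base value of the tickets backing the currency. This is a simplified model that ignores the distinction between active and inactive tickets; the class, method names and numbers are assumptions made for the example.

```python
from fractions import Fraction


class Currency:
    def __init__(self, name):
        self.name = name
        self.backing = []  # (amount, currency) tickets that fund this currency
        self.issued = 0    # total amount issued in this currency

    def issue(self, amount):
        """Issue tickets denominated in this currency."""
        self.issued += amount

    def fund(self, amount, currency):
        """Back this currency with `amount` tickets of another currency."""
        currency.issue(amount)
        self.backing.append((amount, currency))

    def value(self):
        """Value of the whole currency in base units."""
        if not self.backing:  # the base currency is worth what it has issued
            return Fraction(self.issued)
        return sum(ticket_value(a, c) for a, c in self.backing)


def ticket_value(amount, currency):
    """Base-unit value of `amount` tickets denominated in `currency`."""
    return currency.value() * Fraction(amount, currency.issued)


base = Currency("base")
alice, bob = Currency("alice"), Currency("bob")
alice.fund(2000, base)
bob.fund(1000, base)
alice.issue(200)   # Alice's threads hold 200 alice tickets in total
bob.issue(100)
print(ticket_value(100, alice))  # → 1000
alice.issue(200)   # inflation inside Alice's currency only
print(ticket_value(100, alice), ticket_value(100, bob))  # → 500 1000
```

Inflating Alice's currency halves the value of her own tickets but leaves Bob's untouched, which is the containment that the trust boundary is meant to provide.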
Compensation Tickets

What happens if a thread is I/O-bound and blocks before its quantum expires?
– Without adjustment, the thread will get less than its share of the processor.
– If you complete fraction f of the quantum, your tickets are inflated by 1/f until the next time you win.
– Example: if B on average uses 1/5 of a quantum, its tickets will be inflated 5x, so it will win 5 times as often and get its correct share overall.
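The arithmetic behind the example can be checked directly. The sketch assumes two clients with equal tickets, where A always uses its full quantum and B uses 1/5 of it; the ticket counts and function names are illustrative assumptions.

```python
from fractions import Fraction


def compensated_tickets(tickets, fraction_used):
    """Inflate tickets by 1/f for a client that used fraction f of its quantum."""
    return tickets / fraction_used


def cpu_shares(clients):
    """clients: dict name -> (tickets, fraction of quantum used).

    Returns each client's expected share of CPU time with compensation.
    """
    effective = {n: compensated_tickets(t, f) for n, (t, f) in clients.items()}
    total = sum(effective.values())
    # Expected CPU time per lottery = P(win) * fraction of the quantum used.
    time = {n: effective[n] / total * clients[n][1] for n in clients}
    total_time = sum(time.values())
    return {n: time[n] / total_time for n in clients}


clients = {"A": (400, Fraction(1)), "B": (400, Fraction(1, 5))}
print(compensated_tickets(400, Fraction(1, 5)))  # → 2000
print(cpu_shares(clients))
```

B wins 5/6 of the lotteries but uses only 1/5 of a quantum each time, so both clients end up with half of the processor time.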
Implementation Issues
Need a good random number generator
Lotteries
– Randomly select a winning ticket, then search the list of clients for the winner
– Optimizations: order the list by decreasing ticket counts; use tree data structures
Experimental Evaluation

60 seconds, 2 tasks, different ticket ratios.
– A 10:1 allocation gave a 13.42:1 relative rate. As the ratio increases, the randomized results become less reliable.
Dynamically controlled ticket inflation: competing Monte Carlo simulations inflate their tickets while their errors are still high early on.
Client-server
Experimental Evaluation

Multimedia applications:
– 3 mpeg_play video viewers.
– Ticket ratio 3:2:1
– Results: 1.92:1.5:1
– Results distorted by round-robin processing of client requests by the single-threaded X11R5 server.
Use for Synchronization Resources

Contention due to synchronization can substantially affect computation rates.
– Lottery scheduling can help.
Extended the Mach CThreads library to support a lottery-scheduled mutex type.
– Each mutex has an associated mutex currency and an inheritance ticket.
All threads that are blocked waiting for the mutex perform ticket transfers to fund the mutex currency.
The mutex transfers its inheritance ticket to the thread that currently holds the mutex.
Thus the thread that acquires the mutex executes with its own funding plus the funding of all waiting threads.
Use for Synchronization Resources




This solves the priority inversion problem in
which a mutex owner with little funding could
execute very slowly due to competition with
other threads while a highly funded thread
remains blocked on the mutex.
2-minute experiment, 2 groups of threads with a
2:1 ticket ratio. Got 1.8:1.
Overall, not as fair as we’d like
But simple, elegant, OK
Stride Scheduling


Basic Idea: Make a deterministic version of
lottery scheduling to reduce short-term
variability and improve accuracy.
Implements proportional-share control over
processor time and other resources by
applying elements of rate-based flow control
algorithms designed for networks.
Stride Scheduling






Time quanta, tickets
Absolute error: difference between the specified and actual number of allocations.
Pairwise relative error: absolute error for the subsystem containing just those 2 clients.
Lottery scheduling: expected errors grow as sqrt(n), where n is the number of allocations.
Stride scheduling: pairwise relative error is never greater than 1 quantum.
Absolute error can be O(N), where N is the number of clients.
Stride Scheduling: Basic Algorithm




Mark time virtually using “passes” as the unit
as opposed to real seconds.
Compute a representation of the time interval
– stride – that a client must wait between
successive allocations.
Client with smallest stride will be scheduled
most frequently.
A client with half the stride of another will
execute twice as quickly.
Stride Scheduling: Basic Algorithm

Each client has three state variables:
– Tickets: number of tickets.
– Stride: inversely proportional to tickets; represents the interval between selections.
– Pass: virtual time index for the client’s next selection.
How to Allocate a Resource


Client with minimum pass is selected and its
pass is advanced by its stride. If more than
one client has the same minimum pass
value, then any of them may be selected.
Compensation (a client that uses only a fraction f of its quantum):
advance its pass by f*stride rather than by stride.
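The allocation rule can be sketched with a priority queue keyed on pass values. This is a minimal sketch, not the authors' implementation. It assumes integer strides computed as STRIDE1 // tickets with a large constant STRIDE1, an initial pass equal to the stride, and ties broken by client name. The function name and the optional fraction_used argument are hypothetical.

```python
import heapq

STRIDE1 = 1 << 20  # large constant so that integer strides keep enough precision


def stride_schedule(tickets, quanta, fraction_used=None):
    """tickets: dict client -> ticket count. Returns the allocation order.

    fraction_used optionally maps a client to the fraction f of each quantum
    it actually uses; its pass then advances by f * stride.
    """
    fraction_used = fraction_used or {}
    strides = {name: STRIDE1 // t for name, t in tickets.items()}
    # Each client starts with pass = stride; the heap orders by (pass, name).
    heap = [(stride, name) for name, stride in strides.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(quanta):
        pass_value, name = heapq.heappop(heap)  # client with the minimum pass
        order.append(name)
        f = fraction_used.get(name, 1)
        heapq.heappush(heap, (pass_value + int(strides[name] * f), name))
    return order


print(stride_schedule({"A": 3, "B": 2, "C": 1}, 6))
# → ['A', 'B', 'A', 'A', 'B', 'C']
```

Over six quanta the clients receive 3, 2 and 1 allocations, exactly matching the 3:2:1 ticket ratio.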
Dynamic Client Participation


The basic algorithm does not support dynamic changes in the number of clients competing for a resource.
When clients are allowed to leave and join, their state must be appropriately modified.
– Done with global variables that track the aggregate state (e.g. total tickets).
Problems


Relative error is good.
Absolute error: consider 101 clients with ticket ratio 100:1:…:1
– After 100 steps we wanted 50 units for the first job but it got 100. Oops!
Hierarchical stride scheduling: aggregates clients to improve interleaving.
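The 100:1:…:1 case can be reproduced with a small simulation. The first client holds 100 of the 200 tickets, so it should get about 50 of the first 100 quanta. The sketch assumes integer strides of STRIDE1 // tickets, an initial pass equal to the stride, and ties broken by client id; these are choices made for the illustration.

```python
import heapq
from collections import Counter

STRIDE1 = 1 << 20


def stride_counts(tickets, quanta):
    """Run basic stride scheduling and count allocations per client."""
    strides = {c: STRIDE1 // t for c, t in tickets.items()}
    heap = [(s, c) for c, s in strides.items()]
    heapq.heapify(heap)
    counts = Counter()
    for _ in range(quanta):
        pass_value, client = heapq.heappop(heap)
        counts[client] += 1
        heapq.heappush(heap, (pass_value + strides[client], client))
    return counts


tickets = {0: 100}                               # client 0 holds 100 tickets
tickets.update({i: 1 for i in range(1, 101)})    # 100 clients with 1 ticket each
print(stride_counts(tickets, 100)[0])  # → 100
print(stride_counts(tickets, 200)[0])  # → 100
```

The long-run proportion is right (100 of 200), but the big client is 50 quanta ahead of its share at the 100-step mark, which is the O(N) absolute error.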
Hierarchical Stride Scheduling

Recursive application of the basic stride scheduling algorithm.
– Individual clients are combined into groups with larger aggregate ticket allocations and correspondingly smaller strides.
– Allocation is performed by invoking the normal stride scheduling algorithm first among groups and then among individual clients within groups.
– This helps because systems often consist of a small number of high-throughput clients together with a large number of low-throughput clients.
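A two-level version of this idea is enough to show the improved interleaving. The sketch assumes just two levels (groups, then clients within a group) rather than a general tree, with integer strides of STRIDE1 // tickets and ties broken by name; the class and function names are hypothetical.

```python
import heapq
from collections import Counter

STRIDE1 = 1 << 20


class StrideQueue:
    """Basic stride scheduler over a fixed set of named entries."""

    def __init__(self, tickets):
        self.strides = {name: STRIDE1 // t for name, t in tickets.items()}
        self.heap = [(s, name) for name, s in self.strides.items()]
        heapq.heapify(self.heap)

    def pick(self):
        pass_value, name = heapq.heappop(self.heap)
        heapq.heappush(self.heap, (pass_value + self.strides[name], name))
        return name


def hierarchical_schedule(groups, quanta):
    """groups: dict group name -> dict client -> tickets."""
    # Each group competes with the sum of its members' tickets.
    top = StrideQueue({g: sum(m.values()) for g, m in groups.items()})
    inner = {g: StrideQueue(m) for g, m in groups.items()}
    # First choose a group, then a client within that group.
    return [inner[top.pick()].pick() for _ in range(quanta)]


groups = {
    "big": {"big": 100},
    "small": {f"s{i}": 1 for i in range(100)},
}
order = hierarchical_schedule(groups, 100)
print(Counter(order)["big"])  # → 50
```

With the 100 small clients aggregated into one group, the big client now alternates with them and receives 50 of the first 100 quanta.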
A Hierarchical CPU Scheduler for Multimedia
Operating Systems

Consider the requirements imposed by the various application classes that can co-exist in a multimedia system:
– Hard real-time applications (EDF, RMA).
– Soft real-time applications. Need to statistically guarantee QoS parameters such as maximum delay and throughput. E.g. video: due to multiple time-scale variations, the OS will be required to over-book the CPU, which may lead to CPU overload, so some QoS guarantees are needed. Cannot assume that the requirements are known up front.
– Best-effort applications
Bottom Line



Need different scheduling algorithms for different application classes in a multimedia system.
Need an OS framework that enables different schedulers to be employed for different applications.
Need to guarantee not just coexistence but protection between different classes of applications.
– For example, overbooking of the CPU should not violate hard real-time constraints.
Solution

Hierarchical partitioning of CPU bandwidth
– The OS should be able to partition the CPU bandwidth among various application classes, and each application class should be able to partition its allocation among subclasses or applications.
Hierarchical partitioning is specified by a tree.
– Each thread belongs to exactly one leaf node.
– Each node in the tree represents either an application class or an aggregation of application classes.
Threads are scheduled by leaf-node-dependent schedulers.
Intermediate nodes are scheduled by an algorithm that
1. Achieves fair distribution of the CPU resource
2. Does not require a priori information about threads’ needs
3. Provides throughput guarantees
4. Is computationally efficient.
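The partitioning tree can be pictured as nodes carrying weights, with each node dividing its share among its children in proportion to those weights. The sketch below only computes the resulting leaf shares when every leaf is busy; it does not implement the scheduling algorithm itself. The class names, weights and tree shape are invented for the example.

```python
from fractions import Fraction


class Node:
    def __init__(self, name, weight, children=()):
        self.name = name
        self.weight = weight
        self.children = list(children)


def leaf_shares(node, share=Fraction(1)):
    """Fraction of the CPU that reaches each leaf when all leaves are busy."""
    if not node.children:
        return {node.name: share}
    total = sum(child.weight for child in node.children)
    shares = {}
    for child in node.children:
        # A child's share is its parent's share scaled by its relative weight.
        shares.update(leaf_shares(child, share * Fraction(child.weight, total)))
    return shares


root = Node("root", 1, [
    Node("hard_rt", 1),
    Node("soft_rt", 3),
    Node("best_effort", 6, [Node("user1", 1), Node("user2", 1)]),
])
print(leaf_shares(root))
```

Here the hard real-time class gets 1/10 of the CPU, the soft real-time class 3/10, and the two best-effort users 3/10 each.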
QLinux
QLinux is a Linux kernel that can provide quality of service guarantees. QLinux, based on the Linux 2.2.x kernel, combines some of the latest innovations in operating systems research. It includes the following features:
– Hierarchical Start Time Fair Queuing (H-SFQ) CPU scheduler
– Hierarchical Start Time Fair Queuing (H-SFQ) network packet scheduler
– Lazy receiver processing (LRP) network subsystem
QLinux



The H-SFQ CPU scheduler enables hierarchical scheduling of applications by fairly allocating CPU bandwidth to individual applications and application classes.
The H-SFQ packet scheduler provides rate guarantees and fair allocation of bandwidth to packets from individual flows as well as flow aggregates (classes). Lazy receiver processing enables accurate charging of TCP/UDP protocol processing overhead (including interrupt processing) to the appropriate process.
The Cello disk scheduler supports multiple application classes, such as interactive best-effort, throughput-intensive best-effort and soft real-time, and fairly allocates disk bandwidth to these classes.