Chapter 10
Synchronization and Scheduling in
Multiprocessor Operating Systems
Copyright © 2008
Introduction
• Architecture of Multiprocessor Systems
• Issues in Multiprocessor Operating Systems
• Kernel Structure
• Process Synchronization
• Process Scheduling
• Case Studies
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008
Architecture of Multiprocessor
Systems
• Performance of a uniprocessor system depends on the
performance of its CPU, memory, and caches
– Further improvements in system performance can be
obtained only by using multiple CPUs
Architecture of Multiprocessor
Systems (continued)
• Use of a cache coherence protocol is crucial to ensure
that caches do not contain stale copies of data
– Snooping-based approach (bus interconnection)
• CPU snoops on the bus to analyze traffic and eliminate
stale copies
• Write-invalidate variant
– At a write, CPU updates memory and invalidates copies in
other caches
– Directory-based approach
• Directory contains information about copies in caches
• TLB coherence is an analogous problem
– Solution: TLB shootdown action
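
The write-invalidate variant described above can be illustrated with a toy model. This is a minimal sketch, not a real protocol implementation: the `Cache` and `SnoopyBus` classes and the address `0x10` are hypothetical, and cache lines are reduced to dictionary entries.

```python
class Cache:
    """One CPU's private cache: address -> value, valid lines only."""
    def __init__(self):
        self.lines = {}


class SnoopyBus:
    """Toy bus: every write is visible to all caches, which drop stale copies."""
    def __init__(self, num_cpus):
        self.memory = {}
        self.caches = [Cache() for _ in range(num_cpus)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache.lines:              # miss: fetch from memory
            cache.lines[addr] = self.memory.get(addr, 0)
        return cache.lines[addr]

    def write(self, cpu, addr, value):
        self.memory[addr] = value                # writer updates memory
        self.caches[cpu].lines[addr] = value     # and its own cache
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.lines.pop(addr, None)      # snoopers invalidate their copy


bus = SnoopyBus(num_cpus=2)
bus.write(0, 0x10, 1)
first = bus.read(1, 0x10)    # CPU 1 now caches the value
bus.write(0, 0x10, 2)        # CPU 1's copy is invalidated
second = bus.read(1, 0x10)   # miss, so CPU 1 refetches from memory
```

Without the invalidation loop, the second read by CPU 1 would return the stale value from its own cache.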
Architecture of Multiprocessor
Systems (continued)
• Multiprocessor systems are classified according to the
manner in which CPUs are associated with memory units
– Uniform memory access (UMA) architecture
• Previously called tightly coupled multiprocessor
• Also called symmetrical multiprocessor (SMP)
• Examples: Balance system and VAX 8800
– Nonuniform memory access (NUMA) architecture
• Examples: HP AlphaServer and IBM NUMA-Q
– No-remote-memory-access (NORMA) architecture
• Example: Hypercube system by Intel
• Is actually a distributed system (discussed later)
SMP Architecture
• SMP systems commonly use a bus or a crossbar switch as
the interconnection network
– Only one conversation can be in progress over the bus at
any time; other conversations are delayed
• CPUs face unpredictable delays in accessing memory
• Bus may become a bottleneck
– With a cross-bar switch, performance is better
• Switch delays are also more predictable
• Cache coherence protocols add to the delays
• SMP systems do not scale well beyond a small number
of CPUs
NUMA Architecture
• Actual performance of a NUMA system depends on the
nonlocal memory accesses made by processes
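
One simple way to see this dependence is a weighted average of local and nonlocal access times. The function below is an illustrative assumption, not a formula from the slides, and the latencies in the usage lines are invented numbers.

```python
def effective_access_time(local_ns, remote_ns, nonlocal_fraction):
    """Average memory access time when a fraction of accesses are nonlocal."""
    if not 0.0 <= nonlocal_fraction <= 1.0:
        raise ValueError("nonlocal_fraction must lie in [0, 1]")
    return (1.0 - nonlocal_fraction) * local_ns + nonlocal_fraction * remote_ns


all_local = effective_access_time(100, 300, 0.0)   # hypothetical latencies
half_remote = effective_access_time(100, 300, 0.5)
```

Under this model, the average access time grows linearly with the fraction of nonlocal accesses.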
Issues in Multiprocessor Operating
Systems
• Synchronization and scheduling algorithms should be
scalable, so that system performance does not degrade
with a growth in its size
Kernel Structure
• Kernel of a multiprocessor OS (SMP architecture) is
called an SMP kernel
– Any CPU can execute code in the kernel, and many
CPUs could do so in parallel
• Based on two fundamental provisions:
– Kernel is reentrant
– CPUs coordinate their activities through synchronization and
interprocessor interrupts
Kernel Structure: Synchronization
• Mutex locks for synchronization
– Locking can be coarse-grained or fine-grained
• Tradeoffs: simplicity vs. loss of parallelism
• Deadlocks are an issue in fine-grained locking
• Parallelism can be ensured without substantial locking
overhead:
– Use of separate locks for kernel functionalities
– Partitioning of the data structures of a kernel functionality
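
Partitioning a data structure so that each partition has its own lock might look as follows. This is a sketch under assumptions: `PartitionedTable` is a hypothetical name, and Python threads stand in for CPUs executing kernel code.

```python
import threading


class PartitionedTable:
    """A table split into partitions, each protected by its own mutex."""
    def __init__(self, num_partitions=4):
        self._locks = [threading.Lock() for _ in range(num_partitions)]
        self._parts = [{} for _ in range(num_partitions)]

    def _index(self, key):
        return hash(key) % len(self._parts)

    def put(self, key, value):
        i = self._index(key)
        with self._locks[i]:          # only this partition is locked
            self._parts[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self._locks[i]:
            return self._parts[i].get(key, default)


table = PartitionedTable()
workers = [threading.Thread(target=table.put, args=(pid, "ready"))
           for pid in range(100)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Updates to keys in different partitions never contend for the same lock, and because each operation holds only one lock at a time, this particular layout cannot deadlock.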
Kernel Structure: Heap Management
• Parallelism in heap management can be provided by
maintaining several free lists
• Locking is unnecessary if each CPU has its own free
list
– However, this arrangement would degrade performance
• Allocation decisions would not be optimal
• Alternative: separate free lists to hold free memory
areas of different sizes
– CPU locks an appropriate free list
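
The size-based alternative can be sketched as one free list and one lock per size class. The class name `SizeClassHeap`, the size classes, and the addresses are all hypothetical; a real kernel allocator would also split and coalesce blocks.

```python
import threading


class SizeClassHeap:
    """Free lists keyed by block size, each guarded by its own lock."""
    def __init__(self, size_classes):
        self._classes = sorted(size_classes)
        self._free = {s: [] for s in self._classes}
        self._locks = {s: threading.Lock() for s in self._classes}

    def _class_for(self, size):
        for s in self._classes:       # smallest class that fits the request
            if size <= s:
                return s
        raise ValueError("request exceeds the largest size class")

    def free(self, addr, size):
        s = self._class_for(size)
        with self._locks[s]:
            self._free[s].append(addr)

    def allocate(self, size):
        s = self._class_for(size)
        with self._locks[s]:          # lock only the appropriate free list
            return self._free[s].pop() if self._free[s] else None


heap = SizeClassHeap([32, 64, 128])
heap.free(0x1000, 64)
heap.free(0x2000, 128)
block = heap.allocate(50)             # served from the 64-byte list
```

CPUs that request blocks of different size classes lock different lists and can therefore allocate in parallel.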
Kernel Structure: Scheduling
• Scheduling suffers from heavy contention for the mutex
locks Lrq and Lawt because every CPU needs to set and
release these locks while scheduling
– Alternative: Partition processes into subsets and entrust
each subset to a CPU for scheduling
– Fast scheduling but suboptimal performance
• An SMP kernel provides graceful degradation
Kernel Structure: NUMA Kernel
• CPUs in NUMA systems have different memory access
times for local and nonlocal memory
• Each node in a NUMA system has its own separate
kernel
– Exclusively schedules processes whose address spaces
are in local memory of the node
– Concept can be generalized: An application region
ensures good performance of an application. It has
• A resource partition with one or more CPUs
• An instance of the kernel
Process Synchronization
• Queued locks may not be scalable
• In NUMA, spin locks may lead to lock starvation
• Sleep locks may be preferred to spin locks if the
memory or network traffic densities are high
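
The difference between spinning and sleeping can be sketched in user space. This is only an analogy: a non-blocking `acquire` on a Python `threading.Lock` stands in for a hardware test-and-set, and the names `SpinLock`, `SleepLock`, `hammer`, and `run` are hypothetical.

```python
import threading


class SpinLock:
    """Waiter retries at once and stays on its CPU (busy wait)."""
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass                       # each retry is a memory access

    def release(self):
        self._flag.release()


class SleepLock:
    """Waiter blocks and gives up its CPU until the lock is released."""
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        self._flag.acquire()

    def release(self):
        self._flag.release()


def hammer(lock, counter, rounds):
    for _ in range(rounds):
        lock.acquire()
        counter[0] += 1                # critical section
        lock.release()


def run(lock_cls, threads=4, rounds=200):
    lock, counter = lock_cls(), [0]
    ts = [threading.Thread(target=hammer, args=(lock, counter, rounds))
          for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter[0]


spin_total = run(SpinLock)
sleep_total = run(SleepLock)
```

Both locks give mutual exclusion. The retry loop in `SpinLock.acquire` is the part that generates the repeated memory or network traffic the slide refers to.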
Special Hardware for Process
Synchronization
• The Sequent Balance system uses a special bus called
system link and interface controller (SLIC) for
synchronization
– Special 64-bit register in each CPU in the system
• Each bit implements a spin lock using SLIC
– Spinning doesn’t generate memory/network traffic
A Scalable Software Scheme for
Process Synchronization
• Scheme for process synchronization
– NUMA and NORMA architectures
– Scalable performance
• Minimizes synchronization traffic to nonlocal memory units
(NUMA) and over network (NORMA)
Process Synchronization (continued)
• Scheduling-aware synchronization
– Adaptive lock
• A process waiting for this lock spins if holder of the lock is
scheduled to run in parallel
• Otherwise, the process is preempted and queued as in a
queued lock
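
The decision an adaptive lock makes can be sketched as below. The class `AdaptiveLock`, the process names, and the `running` set (the processes currently on some CPU) are assumptions made for illustration; the sketch models the decision only, not the actual spinning or preemption.

```python
from collections import deque


class AdaptiveLock:
    """Spin if the holder is running on another CPU, otherwise queue."""
    def __init__(self):
        self.holder = None
        self.wait_queue = deque()

    def try_acquire(self, pid, running):
        if self.holder is None:
            self.holder = pid
            return "acquired"
        if self.holder in running:     # holder will release soon: spin
            return "spin"
        self.wait_queue.append(pid)    # holder is not running: queue up
        return "queued"

    def release(self):
        # Hand the lock to the first queued waiter, if any.
        self.holder = self.wait_queue.popleft() if self.wait_queue else None
        return self.holder


lock = AdaptiveLock()
r1 = lock.try_acquire("P1", running={"P1"})
r2 = lock.try_acquire("P2", running={"P1", "P2"})   # P1 is running
r3 = lock.try_acquire("P3", running={"P3"})         # P1 was preempted
next_holder = lock.release()
```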
Process Scheduling
• CPU scheduling decisions affect performance
– How, when and where to schedule processes
• Affinity scheduling: schedule a process on a CPU
where it has executed in the past
• Good cache hit ratio
• Interferes with load balancing across CPUs
• In an SMP kernel, each CPU can perform its own scheduling
– Prevents the kernel from becoming a bottleneck
– Leads to scheduling anomalies
• Correcting them requires shuffling of processes between CPUs
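
The tension between affinity and load balancing can be sketched as a placement rule. The function `pick_cpu` and the `max_imbalance` threshold are hypothetical; real schedulers use more elaborate criteria.

```python
def pick_cpu(last_cpu, load, max_imbalance=2):
    """Prefer the CPU a process last ran on unless it is too heavily loaded.

    load[c] is the number of ready processes assigned to CPU c.
    """
    least = min(range(len(load)), key=lambda c: load[c])
    if last_cpu is not None and load[last_cpu] - load[least] <= max_imbalance:
        return last_cpu                # likely to find its data in the cache
    return least                       # give up affinity to balance the load


stay = pick_cpu(last_cpu=1, load=[3, 4, 3])
move = pick_cpu(last_cpu=1, load=[0, 5, 1])
```

A larger `max_imbalance` favors cache hits, while a smaller one favors an even load.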
Example: Process Shuffling in an SMP
Kernel
• Process shuffling can be implemented by using the
assigned workload table AWT and the interprocessor
interrupt (IPI)
– However, it leads to high scheduling overhead
• Effect is more pronounced in a system containing a large
number of CPUs
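
A single shuffling step might be sketched as follows. The representation is an assumption: the AWT is modeled as a dictionary from CPU number to a list of `(priority, name)` pairs, a larger number means a higher priority, and each CPU is taken to run its highest-priority process. The IPI is only indicated by a comment.

```python
def shuffle_once(awt):
    """Move one waiting process to a CPU running lower-priority work.

    Returns (name, source_cpu, destination_cpu), or None if no anomaly exists.
    """
    running = {cpu: max(procs, default=None) for cpu, procs in awt.items()}
    waiting = [(proc, cpu) for cpu, procs in awt.items()
               for proc in procs if proc != running[cpu]]
    if not waiting:
        return None
    proc, src = max(waiting)           # highest-priority waiting process

    def running_priority(cpu):
        return running[cpu][0] if running[cpu] else float("-inf")

    dst = min(awt, key=running_priority)
    if dst == src or proc[0] <= running_priority(dst):
        return None                    # no scheduling anomaly to correct
    awt[src].remove(proc)
    awt[dst].append(proc)
    return (proc[1], src, dst)         # kernel would now send an IPI to dst


awt = {0: [(8, "P1"), (6, "P2")], 1: [(3, "P3")]}
moved = shuffle_once(awt)              # P2 waits while lower-priority P3 runs
```

Each step scans every CPU's entry in the table, which hints at why the overhead grows with the number of CPUs.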
Process Scheduling (continued)
• Processes of an application should be scheduled on
different CPUs at the same time if they use spin locks
for synchronization
– Called coscheduling or gang scheduling
• A different approach is required when processes
exchange messages by using a blocking protocol
– In some situations, special efforts should be made not to
schedule such processes in same time slice
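
Coscheduling can be sketched as packing each application's processes into one time slice. The function `gang_schedule`, the first-fit packing rule, and the application names are assumptions chosen for brevity.

```python
def gang_schedule(apps, num_cpus):
    """Place all processes of an application in the same time slice.

    apps maps an application name to its list of processes. The result is a
    list of slices; the processes in one slice run on different CPUs at once.
    """
    slices = []
    for procs in apps.values():
        if len(procs) > num_cpus:
            raise ValueError("gang is larger than the number of CPUs")
        for s in slices:               # first slice with enough free CPUs
            if len(s) + len(procs) <= num_cpus:
                s.extend(procs)
                break
        else:
            slices.append(list(procs))
    return slices


plan = gang_schedule(
    {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"], "C": ["C1"]},
    num_cpus=4,
)
```

Because a gang is never split across slices, a process spinning on a lock is always scheduled together with the process that holds it.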
Case Studies
• Mach
• Linux
• SMP Support in Windows
Mach
• Mach OS implements scheduling hints
– Thread issues hint to influence processor scheduling
• For example, a hand-off hint to relinquish the CPU in favor
of a specific thread
Linux
• Multiprocessing support introduced in 2.0 kernel
– Coarse-grained locking was employed
• Granularity of locks was made finer in later releases
– Kernel remained nonpreemptible until the 2.6 kernel
• Kernel provides:
– Spin locks for locking of data structures
– Reader–writer spin locks
– Sequence lock
• Per-CPU data structures to reduce lock contention
• Other features: hard and soft affinity, load balancing
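
The idea behind a sequence lock can be sketched in user space. This is not the Linux kernel API: the class `SeqLock` is hypothetical, and the memory barriers that a real implementation needs are omitted.

```python
import threading


class SeqLock:
    """Readers never block; they retry if a write overlapped their read."""
    def __init__(self, value):
        self._seq = 0                     # even: stable, odd: write in progress
        self._value = value
        self._writer = threading.Lock()   # serializes writers only

    def write(self, value):
        with self._writer:
            self._seq += 1                # odd: concurrent readers must retry
            self._value = value
            self._seq += 1                # even again: value is stable

    def read(self):
        while True:
            start = self._seq
            if start % 2:                 # a writer is active
                continue
            value = self._value
            if self._seq == start:        # no write happened in between
                return value


clock = SeqLock((0, 0))
clock.write((1, 1))
snapshot = clock.read()
```

Writers are never delayed by readers, which suits data that is read often and written rarely.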
SMP Support in Windows
• A hyperthreaded CPU is considered to be several
logical processors
• Spin locks provide mutual exclusion over kernel data
– A thread holding a spinlock is never preempted
• Queued spinlock uses a scalable software
implementation scheme
• Uses many free lists of memory for parallel access
• Process default processor affinity and thread processor
affinity together define thread affinity set
• Ideal processor defines soft affinity for a thread
• Uses both hard and soft affinity
Summary
• A multiprocessor OS exploits multiple CPUs in a computer
to provide high throughput (system), computation
speedup (application), and graceful degradation (of OS,
when faults occur)
• Classification of multiprocessors
– Uniform memory access (UMA) architecture
• Also called symmetrical multiprocessor (SMP)
– Nonuniform memory access (NUMA) architecture
• OS efficiently schedules user processes in parallel
– Issues: kernel structure and synchronization delays
Summary (continued)
• Multiprocessor OS algorithms must be scalable
• Use of special kinds of locks:
– Spin locks and sleep locks
• Important scheduling concepts in multiprocessor OSs:
– Affinity scheduling
– Coscheduling
– Process shuffling