Chapter 10
Synchronization and Scheduling in Multiprocessor Operating Systems
Copyright © 2008

Introduction
• Architecture of Multiprocessor Systems
• Issues in Multiprocessor Operating Systems
• Kernel Structure
• Process Synchronization
• Process Scheduling
• Case Studies

Operating Systems, by Dhananjay Dhamdhere Copyright © 2008

Architecture of Multiprocessor Systems
• Performance of uniprocessor systems depends on CPU and memory performance, and on caches
  – Further improvements in system performance can be obtained only by using multiple CPUs
• Use of a cache coherence protocol is crucial to ensure that caches do not contain stale copies of data
  – Snooping-based approach (bus interconnection)
    • CPU snoops on the bus to analyze traffic and eliminate stale copies
    • Write-invalidate variant
      – At a write, CPU updates memory and invalidates copies in other caches
  – Directory-based approach
    • Directory contains information about copies in caches
• TLB coherence is an analogous problem
  – Solution: TLB shootdown action
• Multiprocessor systems are classified according to the manner of associating CPUs and memory units
  – Uniform memory access (UMA) architecture
    • Previously called tightly coupled multiprocessor
    • Also called symmetrical multiprocessor (SMP)
    • Examples: Balance system and VAX 8800
  – Nonuniform memory access (NUMA) architecture
    • Examples: HP AlphaServer and IBM NUMA-Q
  – No-remote-memory-access (NORMA) architecture
    • Example: Hypercube system by Intel
    • Is actually a distributed system (discussed later)

Architecture of Multiprocessor Systems
(continued)

SMP Architecture
• Commonly uses a bus or a crossbar switch as the interconnection network
  – Only one conversation can be in progress over the bus at any time; other conversations are delayed
    • CPUs face unpredictable delays in accessing memory
    • Bus may become a bottleneck
  – With a crossbar switch, performance is better
    • Switch delays are also more predictable
• Cache coherence protocols add to the delays
• SMP systems do not scale well beyond a small number of CPUs

NUMA Architecture
• Actual performance of a NUMA system depends on the nonlocal memory accesses made by processes

Issues in Multiprocessor Operating Systems
• Synchronization and scheduling algorithms should be scalable, so that system performance does not degrade with a growth in its size

Kernel Structure
• Kernel of a multiprocessor OS (SMP architecture) is called an SMP kernel
  – Any CPU can execute code in the kernel, and many CPUs could do so in parallel
• Based on two fundamental provisions:
  – Kernel is reentrant
  – CPUs coordinate their activities through synchronization and interprocessor interrupts

Kernel Structure: Synchronization
• Mutex locks for synchronization
  – Locking can be coarse-grained or fine-grained
    • Tradeoffs: simplicity vs.
      loss of parallelism
    • Deadlocks are an issue in fine-grained locking
• Parallelism can be ensured without substantial locking overhead:
  – Use of separate locks for kernel functionalities
  – Partitioning of the data structures of a kernel functionality

Kernel Structure: Heap Management
• Parallelism in heap management can be provided by maintaining several free lists
• Locking is unnecessary if each CPU has its own free list
  – Would degrade performance
    • Allocation decisions would not be optimal
• Alternative: separate free lists to hold free memory areas of different sizes
  – CPU locks an appropriate free list

Kernel Structure: Scheduling
• Suffers from heavy contention for mutex locks L_rq and L_awt because every CPU needs to set/release these locks while scheduling
  – Alternative: Partition processes into subsets and entrust each subset to a CPU for scheduling
  – Fast scheduling but suboptimal performance
• An SMP kernel provides graceful degradation

Kernel Structure: NUMA Kernel
• CPUs in NUMA systems have different memory access times for local and nonlocal memory
• Each node in a NUMA system has its own separate kernel
  – Exclusively schedules processes whose address spaces are in local memory of the node
  – Concept can be generalized: An application region ensures good performance of an application.
    It has
    • A resource partition with one or more CPUs
    • An instance of the kernel

Process Synchronization
• Queued locks may not be scalable
• In NUMA, spin locks may lead to lock starvation
• Sleep locks may be preferred to spin locks if the memory or network traffic densities are high

Special Hardware for Process Synchronization
• The Sequent Balance system uses a special bus called system link and interface controller (SLIC) for synchronization
  – Special 64-bit register in each CPU in the system
    • Each bit implements a spin lock using SLIC
  – Spinning doesn’t generate memory/network traffic

A Scalable Software Scheme for Process Synchronization
• Scheme for process synchronization
  – NUMA and NORMA architectures
  – Scalable performance
    • Minimizes synchronization traffic to nonlocal memory units (NUMA) and over network (NORMA)

Process Synchronization (continued)
• Scheduling-aware synchronization
  – Adaptive lock
    • A process waiting for this lock spins if the holder of the lock is scheduled to run in parallel
    • Otherwise, the process is preempted and queued as in a queued lock

Process Scheduling
• CPU scheduling decisions affect performance
  – How, when and where to schedule processes
• Affinity scheduling: schedule a process on a CPU where it has executed in the past
  – Good cache hit ratio
  – Interferes with load balancing across CPUs
• In an SMP kernel, CPUs can perform their own scheduling
  – Prevents kernel from becoming a bottleneck
  – Leads to scheduling anomalies
    • Correcting them requires shuffling of processes
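
The tension noted above between affinity scheduling and load balancing can be illustrated with a small sketch. It is not taken from the slides: the function names `pick_cpu` and `enqueue`, the per-CPU ready queues held in a dictionary, and the `max_imbalance` threshold are all assumptions made for illustration. The idea is to prefer the CPU on which a process last ran, and to give up that preference only when its queue is much longer than the shortest one.

```python
from typing import Dict, List, Optional


def pick_cpu(queues: Dict[int, List[str]],
             last_cpu: Optional[int],
             max_imbalance: int = 2) -> int:
    """Choose the CPU whose ready queue should receive a process.

    queues maps a CPU id to the names of processes queued on it.
    max_imbalance is a hypothetical tuning knob, not a value from the slides.
    """
    # The least loaded CPU; ties are broken by the lower CPU id.
    least_loaded = min(queues, key=lambda cpu: (len(queues[cpu]), cpu))
    if last_cpu is None or last_cpu not in queues:
        return least_loaded  # no history, so only load matters
    # Honor affinity unless it would leave the queues too unbalanced.
    if len(queues[last_cpu]) - len(queues[least_loaded]) > max_imbalance:
        return least_loaded
    return last_cpu


def enqueue(queues: Dict[int, List[str]],
            name: str,
            last_cpu: Optional[int],
            max_imbalance: int = 2) -> int:
    """Queue the process on the chosen CPU and return that CPU's id."""
    cpu = pick_cpu(queues, last_cpu, max_imbalance)
    queues[cpu].append(name)
    return cpu
```

With queues `{0: ["a", "b", "c"], 1: [], 2: ["d"]}`, a process that last ran on CPU 2 stays on CPU 2, whereas one that last ran on the heavily loaded CPU 0 is moved to CPU 1. A larger `max_imbalance` favors cache hits; a smaller one favors balanced load.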
Example: Process Shuffling in an SMP Kernel
• Process shuffling can be implemented by using the assigned workload table (AWT) and the interprocessor interrupt (IPI)
  – However, it leads to high scheduling overhead
    • Effect is more pronounced in a system containing a large number of CPUs

Process Scheduling (continued)
• Processes of an application should be scheduled on different CPUs at the same time if they use spin locks for synchronization
  – Called coscheduling or gang scheduling
• A different approach is required when processes exchange messages by using a blocking protocol
  – In some situations, special efforts should be made not to schedule such processes in the same time slice

Case Studies
• Mach
• Linux
• SMP Support in Windows

Mach
• Mach OS implements scheduling hints
  – Thread issues a hint to influence processor scheduling
    • For example, a hand-off hint to relinquish the CPU in favor of a specific thread

Linux
• Multiprocessing support introduced in 2.0 kernel
  – Coarse-grained locking was employed
    • Granularity of locks was made finer in later releases
  – Kernel was still nonpreemptible until 2.6 kernel
• Kernel provides:
  – Spin locks for locking of data structures
  – Reader–writer spin locks
  – Sequence lock
• Per-CPU data structures to reduce lock contention
• Other features: hard and soft affinity, load balancing

SMP Support in Windows
• A hyperthreaded CPU is considered to be several logical processors
• Spin locks provide mutual exclusion over kernel data
  – A thread holding a spinlock is never preempted
• Queued spinlock uses a scalable software implementation scheme
• Uses many free lists of
  memory for parallel access
• Process default processor affinity and thread processor affinity together define thread affinity set
  – Thread affinity set defines hard affinity for a thread
• Ideal processor defines soft affinity for a thread
• Uses both hard and soft affinity

Summary
• Multiprocessor OS exploits multiple CPUs in a computer to provide high throughput (system), computation speedup (application), and graceful degradation (of OS, when faults occur)
• Classification of multiprocessors
  – Uniform memory access (UMA) architecture
    • Also called symmetrical multiprocessor (SMP)
  – Nonuniform memory access (NUMA) architecture
• OS efficiently schedules user processes in parallel
  – Issues: kernel structure and synchronization delays

Summary (continued)
• Multiprocessor OS algorithms must be scalable
• Use of special kinds of locks:
  – Spin locks and sleep locks
• Important scheduling concepts in multiprocessor OSs:
  – Affinity scheduling
  – Coscheduling
  – Process shuffling
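
Coscheduling, listed among the scheduling concepts above, can be sketched as a packing problem: all processes of an application must land in the same time slice, each on a different CPU. The sketch below is an assumption-laden illustration, not the scheme used by any kernel discussed in the chapter. The function name `gang_schedule`, the first-fit packing policy, and the choice to reject an application that needs more CPUs than exist are all hypothetical.

```python
from typing import Dict, List, Tuple

# (application name, index of the process within that application)
Slot = Tuple[str, int]


def gang_schedule(apps: Dict[str, int], num_cpus: int) -> List[List[Slot]]:
    """Pack applications into time slices, one process per CPU.

    apps maps an application name to its number of processes.
    Returns a list of time slices; entry i of a slice runs on CPU i.
    Every process of an application is placed in the same slice.
    """
    slices: List[List[Slot]] = []
    for app, nprocs in apps.items():
        if nprocs > num_cpus:
            # Assumed policy: such an application cannot be coscheduled.
            raise ValueError(f"{app} needs {nprocs} CPUs, only {num_cpus} exist")
        gang = [(app, i) for i in range(nprocs)]
        # First fit: use the earliest slice with enough idle CPUs.
        for time_slice in slices:
            if len(time_slice) + nprocs <= num_cpus:
                time_slice.extend(gang)
                break
        else:
            slices.append(gang)  # no slice had room, so open a new one
    return slices
```

For example, `gang_schedule({"A": 3, "B": 2, "C": 1, "D": 4}, 4)` yields three time slices: A shares the first slice with C, B runs alone in the second, and D fills the third. Because processes of one application always run together, a process spinning on a lock is never waiting for a lock holder that has been descheduled.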