Process Scheduling in
Multiprocessor and
Multithreaded Systems
Matt Davis
CS535
4/7/2003
Outline
 Multiprocessor Systems
– Issues in MP Scheduling
– How to Allocate Processors
– Cache Affinity
– Linux MP Scheduling
 Simultaneous Multithreaded Systems
– Issues in SMT Scheduling
– Symbiotic Jobscheduling
– SMT and Priorities
– Linux SMT Scheduling
 Conclusions
Multiprocessor Systems
 Symmetric Multiprocessing (SMP):
– One copy of OS in memory, any CPU can use it
– OS must ensure that multiple processors
cannot access shared data structures at the
same time
[Figure: Shared Memory Multiprocessors — multiple CPUs connected to one shared memory]
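The mutual-exclusion requirement above can be sketched with a lock-protected run queue. This is a toy sketch, not kernel code; the class and task names are invented for illustration:

```python
import threading
from collections import deque

# In an SMP system every CPU pulls work from shared OS data structures,
# so a lock must serialize access to them.
class SharedRunQueue:
    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = deque()

    def enqueue(self, task):
        with self._lock:          # only one CPU may touch the queue at a time
            self._tasks.append(task)

    def dequeue(self):
        with self._lock:
            return self._tasks.popleft() if self._tasks else None

rq = SharedRunQueue()
for t in ("A", "B", "C"):
    rq.enqueue(t)
print(rq.dequeue())  # → A
```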
Issues in MP Scheduling
 Starvation
– Number of active parallel threads < number of
allocated processors
 Overhead
– CPU time used to transfer and start various
portions of the application
 Contention
– Multiple threads attempt to use same shared
resource
 Latency
– Delay in communication between processors
and I/O devices
How to allocate processors
 Allocate proportional to average
parallelism
 Other factors:
–System load
–Variable parallelism
–Min/Max parallelism
 Acquire/relinquish processors based
on current program needs
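Allocation proportional to average parallelism can be sketched as below. The job names and parallelism figures are made up for illustration, and real allocators would also weigh the load and min/max factors listed above:

```python
# Give each job a share of the CPUs proportional to its average
# parallelism, but never less than one CPU.
def allocate(jobs, total_cpus):
    """jobs: dict of name -> average parallelism. Returns name -> CPU count."""
    total_par = sum(jobs.values())
    return {name: max(1, int(total_cpus * par / total_par))
            for name, par in jobs.items()}

print(allocate({"render": 6.0, "compile": 3.0, "stats": 1.0}, 10))
# render gets the largest share
```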
Cache Affinity
 While a program runs, data needed is
placed in local cache
 When job is rescheduled, it will likely
access some of the same data
 Scheduling jobs where they have
“affinity” improves performance by
reducing cache penalties
Cache Affinity (cont)
 Tradeoff between processor
reallocation and cost of reallocation
–Utilization versus cache behavior
 Scheduling policies:
–Equipartition: constant number of
processors allocated evenly to all jobs.
Low overhead.
–Dynamic: constantly reallocates jobs to
maximize utilization. High utilization.
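The equipartition policy is simple enough to sketch directly (job names are illustrative):

```python
# Equipartition: divide the processors evenly among all active jobs;
# any leftover processors go to the first few jobs.
def equipartition(jobs, total_cpus):
    share, extra = divmod(total_cpus, len(jobs))
    return {job: share + (1 if i < extra else 0)
            for i, job in enumerate(jobs)}

print(equipartition(["A", "B", "C"], 8))  # → {'A': 3, 'B': 3, 'C': 2}
```

The policy's low overhead comes from exactly this simplicity: allocations change only when a job arrives or departs.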
Cache Affinity (cont)
 Vaswani and Zahorjan, 1991
–When a processor becomes available, allocate it to the runnable process that last ran on it, or else to a higher-priority job
–If a job requests additional processors,
allocate critical tasks on processor with
highest affinity
–If an allocated processor becomes idle,
hold it for a small amount of time in case
task with affinity comes along
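The first rule above can be sketched as a selection function. The data shapes (dicts with `name`, `priority`, `last_cpu`) are assumptions for illustration, and the sketch omits the idle-hold timer:

```python
# When a CPU frees up, prefer a runnable process that last ran on that
# CPU (cache affinity); fall back to the highest-priority runnable process.
def pick_next(cpu, runnable):
    with_affinity = [p for p in runnable if p["last_cpu"] == cpu]
    pool = with_affinity if with_affinity else runnable
    return max(pool, key=lambda p: p["priority"])["name"] if pool else None

procs = [{"name": "X", "priority": 5, "last_cpu": 0},
         {"name": "Y", "priority": 9, "last_cpu": 1},
         {"name": "Z", "priority": 7, "last_cpu": 0}]
print(pick_next(0, procs))  # affinity to CPU 0 wins: Z, not higher-priority Y
```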
Vaswani and Zahorjan, 1991
 Results showed that utilization was
dominant effect on performance, not cache
affinity
– But their algorithm did not degrade
performance
 Predicted that as processor speeds
increase, significance of cache affinity will
also increase
 Later studies validated their predictions
Linux 2.5 MP Scheduling
 Each processor responsible for scheduling
own tasks
– schedule()
 After process switch, check if new process
should be transferred to other CPU running
lower priority task
– reschedule_idle()
 Cache affinity
– Affinity mask stored in /proc/pid/affinity
– sched_setaffinity(), sched_getaffinity()
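The affinity mask is a bitmask of permitted CPUs: bit i set means the process may run on CPU i. A small sketch, with a hypothetical decoding helper; the live call at the end assumes a Linux host, where Python wraps the syscalls named above:

```python
import os

# Decode an affinity bitmask into a list of permitted CPU numbers.
def mask_to_cpus(mask):
    return [i for i in range(mask.bit_length()) if mask >> i & 1]

print(mask_to_cpus(0b1010))  # → [1, 3]

# On Linux, Python exposes the affinity syscalls directly:
if hasattr(os, "sched_getaffinity"):
    print(sorted(os.sched_getaffinity(0)))  # CPUs this process may use
```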
What is SMT?
 Simultaneous Multithreading
– aka HyperThreading®
 Issue instructions from multiple threads
simultaneously on a superscalar
processor
[Figure: instructions from Thread 1 and Thread 2 issued over time to the ALU, FPU, branch, and memory units]
Why SMT?
 Technique to exploit parallelism in and
between programs with minimal
additions in chip resources
 Operating system treats SMT processor
as two separate processors*
[Figure: the operating system sees the two hardware threads of one SMT processor as Processor 1 and Processor 2]
Issues With SMT Scheduling
 *Not really separate processors:
–Share same caches
 MP scheduling attempts to avoid idle
processors
–SMT-aware scheduler must differentiate
between physical and logical processors
Symbiotic Jobscheduling
 Recent studies from the University of Washington
–Where much of the early SMT research originated
 OS coschedules jobs to run on
hardware threads
 # of coscheduled jobs <= SMT level
 Occasionally swap out running set to
ensure fairness
Symbiotic Jobscheduling (cont)
 Shared system resources:
–Functional units, caches, TLBs, etc.
 Coscheduled jobs may interact well…
–Few resource conflicts, high utilization
 Or they may interact poorly
–Many resource conflicts, lower utilization
 Choice of coscheduled jobs can have
large impact on system performance
Symbiotic Jobscheduling (cont)
 Improve symbiosis by coscheduling
jobs that get along well
 Two phases of SOS (Sample,
Optimize, Symbios) jobscheduler:
–Sample – Gather data on current
performance
–Symbios – Use computed scheduling
configuration
Symbiotic Jobscheduling (cont)
 Sample phase:
–Periodically alter coscheduled job mix
–Record system utilization from hardware
performance counter registers
 Symbios phase:
–Pick job mix that had the highest
utilization
 Trade-off between sampling often or
infrequently
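The two phases can be sketched as a toy loop. The job names and utilization numbers are fabricated for illustration, and the sketch samples every pair exhaustively rather than periodically:

```python
from itertools import combinations

# Pretend measurements: utilization achieved by each coscheduled pair.
JOBS = ["fft", "gzip", "mcf", "vortex"]
UTIL = {("fft", "gzip"): 0.9, ("fft", "mcf"): 0.6,
        ("fft", "vortex"): 0.7, ("gzip", "mcf"): 0.5,
        ("gzip", "vortex"): 0.8, ("mcf", "vortex"): 0.4}

def sample_phase():
    tried = {}
    for mix in combinations(JOBS, 2):  # coschedule 2 jobs (SMT level 2)
        tried[mix] = UTIL[mix]         # in reality: read performance counters
    return tried

def symbios_phase(tried):
    return max(tried, key=tried.get)   # keep running the best mix found

best = symbios_phase(sample_phase())
print(best)  # → ('fft', 'gzip')
```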
How to Measure Utilization?
 IPC not necessarily best predictor:
– IPC can have high variations throughout
process
– High-IPC threads may unfairly take system
resources from low-IPC threads
 Other predictors: low # conflicts, high
cache hit rate, diverse instruction mix
 Balance: schedule with lowest deviation in
IPC between coschedules is considered
best
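The balance heuristic can be sketched as picking the coschedule with the smallest spread in per-thread IPC. The IPC figures below are invented for illustration:

```python
from statistics import pstdev

# Prefer the coschedule whose threads achieve the most similar IPC,
# so no thread starves its partner of shared resources.
def most_balanced(candidates):
    """candidates: dict of mix -> list of per-thread IPC values."""
    return min(candidates, key=lambda mix: pstdev(candidates[mix]))

mixes = {("A", "B"): [2.1, 0.4],   # high combined IPC, but very lopsided
         ("A", "C"): [1.2, 1.1],   # balanced
         ("B", "C"): [0.8, 0.5]}
print(most_balanced(mixes))  # → ('A', 'C')
```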
What About Priorities?
 Scheduler estimates the “natural” IPC of
job
 If a high-priority job is not meeting the desired IPC, it will be exclusively scheduled on the CPU
 Provides a truer implementation of
priority:
–Normal schedulers only guarantee proportional resource sharing, assuming no interaction between jobs
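The rule above reduces to a simple check. The 80% threshold is an assumption for illustration, not a figure from the studies:

```python
# If a high-priority job's observed IPC falls below some share of its
# "natural" IPC (its IPC when running alone), schedule it exclusively.
def needs_exclusive(observed_ipc, natural_ipc, share=0.8):
    return observed_ipc < share * natural_ipc

print(needs_exclusive(observed_ipc=1.0, natural_ipc=2.0))  # → True: starved
print(needs_exclusive(observed_ipc=1.8, natural_ipc=2.0))  # → False
```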
Another Priority Algorithm:
 SMT hardware fetches instructions to
issue from queue
 Scheduler can bias fetching algorithm
to give preference to high-priority
threads
 Hardware already exists, minimal
modifications
Symbiosis Performance Results
 Without priorities:
–Up to 17% improvement
 Software-enforced priorities:
–Up to 20%, average 8%
 Hardware-based priorities:
–Up to 30%, average 15%
Linux 2.5 SMT Scheduling
 Immediate reschedule forced when
HT CPU is executing two idle
processes
 HT-aware affinity: processes prefer
same physical CPU
 HT-aware load-balancing: distinguish
logical and physical CPU in resource
allocation
Conclusions
 Intelligent allocation of resources can
improve performance in parallel systems
 Dynamic scheduling of processors in MP
systems produces better utilization as
processor speeds increase
– Cache affinity can help improve throughput
 Symbiotic coscheduling of tasks in SMT
systems can improve average response
time
Resources
 Kenneth Sevcik, “Characterizations of Parallelism
in Applications and Their Use in Scheduling”
 Raj Vaswani and John Zahorjan, “The Implications of Cache Affinity on Processor Scheduling for Multiprogrammed, Shared Memory Multiprocessors”
 Allan Snavely et al., “Symbiotic Jobscheduling
with Priorities for a Simultaneous Multithreading
Processor”
 Linux MP cache affinity,
http://www.tech9.net/rml/linux
 Linux Hyperthreading Scheduler, http://www.kernel.org/pub/linux/kernel/people/rusty/Hyperthread_Scheduler_Modifications.html
 Daniel Bovet and Marco Cesati, Understanding the
Linux Kernel