
Lecture III: OS Support
CMPT 401
Dr. Alexandra Fedorova

The Role of the OS
• The operating system needs to provide support for the implementation of distributed systems
• We will look at how distributed systems services interact with the operating system
• We will discuss the support that the operating system needs to provide

CMPT 401 © A. Fedorova

Direct Interaction with the OS
[Figure: Process (a DS component) → system calls → OS]
• A process directly interacts with the OS via system calls
• Example: a web browser, a web server

Interaction via Middleware Layer
[Figure: Process (a DS component) → function calls or IPC → Middleware → system calls → OS]
• A process interacts with the OS indirectly, via a middleware layer
• A middleware layer directly interacts with the OS
• Example: a peer-to-peer file system implemented over a distributed hash table

Interaction via Inclusion
[Figure: DS component inside the OS]
• A DS component is a part of the operating system, i.e., an operating system daemon
• Example: Network File System (NFS) daemon
• Runs as a kernel thread, shares address space with the kernel, interacts with the rest of the OS via function calls
• Why would one want to build a DS component that interacts with the OS via inclusion?

Digression: Protection Implementation In the Kernel
• System calls are expensive
• Why? – Protection domains
• Refresh memory protection from your OS class
• Good thing: we get memory protection
• Bad thing: crossing protection domains is expensive. Why?
• So is this the best solution?

Alternative: Protection Via Language
• Safety features are guaranteed by the language/runtime
• The compiler checks that memory accesses are safe
• In addition, manifests declare what the process will and will not do
• This way you get protection
• And no need for hardware protection domains – everything can run in a single address space
• Singularity: an OS from Microsoft implemented these concepts
• ...
End digression

Infrastructure Provided by the OS
• Networking
  – Interface to network devices
  – Implementation of common protocols: TCP, UDP, IP
• Processes and threads
  – Efficient scheduling, load balancing and thread switching
  – Efficient thread synchronization
  – Efficient inter-process communication (IPC)

The Need for Good Process/Thread Support
• Many distributed applications are implemented using multiple threads or processes
• Why?

Motivation for Multithreaded Designs
• Servers provide access to large data sets (web servers, e-commerce servers)
• Even in the presence of caching, they often need to do I/O (to access files on disk or a network FS)
• I/O takes much longer than computation
• Overlapping I/O with computation improves response time
• Threads make it easy to overlap I/O with computation
• While one thread blocks on I/O, another can perform computation
[Figure: compute and block phases plotted over time. Labels: Single thread, Multiple threads, 1 request, 1.6 requests]

Process or Thread Scheduling
• Will use “process” and “thread” interchangeably
  – A single-threaded process maps to a kernel thread
  – Each thread in a multithreaded process (usually) maps to a kernel thread
• A scheduler decides which thread runs next on the CPU
• To ensure good support for DS components, a scheduler must:
  – Be scalable
  – Balance the load well
  – Ensure good interactive response
  – Keep context switches to a minimum (why?)

Case Study: Solaris™ 10 OS
• Solaris is often used on server systems
• Known for its good scalability, good load balancing and interactive performance
• We will look at Solaris runqueues and how they are managed
  – A runqueue is a scheduling queue
  – A structure containing pointers to runnable threads
  – i.e., threads that are waiting for CPU
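Returning to the motivation for multithreaded designs above, the following is a minimal sketch of how threads overlap I/O with computation. It is not taken from the lecture: the names `handle_request`, `run_serial`, `run_threaded` and the constant `IO_DELAY` are hypothetical, and `time.sleep` merely stands in for a blocking disk or network read.

```python
import threading
import time

IO_DELAY = 0.05  # assumed duration of one blocking I/O operation, in seconds


def handle_request(results, i):
    """Serve one request: block on simulated I/O, then do a little computation."""
    time.sleep(IO_DELAY)             # stand-in for a blocking file or network read
    results[i] = sum(range(1000))    # short compute phase


def run_serial(n):
    """Serve n requests one after another on a single thread."""
    results = [None] * n
    start = time.perf_counter()
    for i in range(n):
        handle_request(results, i)
    return time.perf_counter() - start, results


def run_threaded(n):
    """Serve n requests with one thread each, so the I/O waits overlap."""
    results = [None] * n
    threads = [
        threading.Thread(target=handle_request, args=(results, i))
        for i in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    serial_time, _ = run_serial(4)
    threaded_time, _ = run_threaded(4)
    print(f"single thread:    {serial_time:.3f} s")
    print(f"multiple threads: {threaded_time:.3f} s")
```

With a single thread the I/O waits add up; with multiple threads they overlap, so the same batch of requests finishes sooner. The exact timings depend on the machine.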
Runqueues in Solaris
[Figure: a global kernel priority queue (kpqueue), plus user priority queues (disp_qs) for CPU0 and for CPU1, each with entries Pri 0, Pri 1, …, Pri N]
• There is a user-level queue for each priority level
• A dispatcher runs the thread from the highest-priority non-empty queue

Processor Load Balancing
• Load balancing ensures that the load is evenly distributed among the CPUs on a multiprocessor
• This improves the overall response time
• Solaris kernel ensures that queues are well balanced when it enqueues a thread into a runqueue

    /*
     * setbackdq() keeps runqs balanced such that the difference in length
     * between the chosen runq and the next one is no more than RUNQ_MAX_DIFF.
     * (…)
     */

A comment from Solaris source code. Source: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/disp.c, line 1200

Tuning Thread Priorities For Improved Response Time
• If a thread has waited too long for a processor, its priority is elevated, so no thread is starved
• Threads holding critical resources are put to the front of the queue so that they release those resources as quickly as possible

    /*
     * Put the specified thread on the front of the dispatcher
     * queue corresponding to its current priority.
     *
     * Called with the thread in transition, onproc or stopped state
     * and locked (transition implies locked) and at high spl.
     * Returns with the thread in TS_RUN state and still locked.
     */

A comment on setfrontdq from Solaris source code. Source: http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/disp/disp.c, line 1381
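The runqueue ideas on the last three slides can be sketched in a few lines. This is a simplified illustration, not the Solaris implementation: the class `Cpu` and the functions `enqueue_back` and `enqueue_front` are hypothetical stand-ins for the behaviour the quoted comments describe, the values of `NUM_PRIORITIES` and `RUNQ_MAX_DIFF` are made up, and the sketch assumes that a larger number means a higher priority.

```python
from collections import deque

NUM_PRIORITIES = 4   # assumed number of priority levels (illustrative only)
RUNQ_MAX_DIFF = 2    # name taken from the quoted comment; the value is an assumption


class Cpu:
    """One CPU with a FIFO runqueue per priority level."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]

    def length(self):
        """Total number of runnable threads queued on this CPU."""
        return sum(len(q) for q in self.queues)

    def pick_next(self):
        """Dispatcher: take a thread from the highest-priority non-empty queue."""
        for pri in range(NUM_PRIORITIES - 1, -1, -1):
            if self.queues[pri]:
                return self.queues[pri].popleft()
        return None  # nothing runnable on this CPU


def enqueue_back(cpus, preferred, thread, pri):
    """Append a thread to the back of a queue, keeping queue lengths balanced.

    If the preferred CPU already holds at least RUNQ_MAX_DIFF more threads
    than the least-loaded CPU, the thread goes to the least-loaded CPU.
    Returns the id of the CPU that received the thread.
    """
    target = cpus[preferred]
    shortest = min(cpus, key=Cpu.length)
    if target.length() - shortest.length() >= RUNQ_MAX_DIFF:
        target = shortest
    target.queues[pri].append(thread)
    return target.cpu_id


def enqueue_front(cpu, thread, pri):
    """Put a thread at the front of the queue for its priority level."""
    cpu.queues[pri].appendleft(thread)


if __name__ == "__main__":
    cpus = [Cpu(0), Cpu(1)]
    for name, pri in [("a", 1), ("b", 3), ("c", 1)]:
        print(name, "-> CPU", enqueue_back(cpus, 0, name, pri))
    enqueue_front(cpus[0], "d", 1)   # e.g. a thread holding a critical resource
    print("CPU 0 runs next:", cpus[0].pick_next())
```

Keeping one queue per priority level makes the dispatcher's choice a scan for the first non-empty queue, and placing a thread at the front of its queue lets it run before other threads of the same priority.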
Ensuring Good Responsiveness in the Time-Sharing Scheduler
• Solaris’s time-sharing scheduler (the default scheduler) assigns priorities so as to ensure good interactive performance
• Timeslice: the amount of time a thread can run on CPU before it is pre-empted
• If thread T used up its entire timeslice on CPU:
  – priority(T) ↓, timeslice(T) ↑
• If thread T has given up CPU before using up its timeslice:
  – priority(T) ↑, timeslice(T) ↓
• Why is this done?

Time-Sharing Scheduler: Answers
• Minimizing context switch costs:
  – CPU-bound threads stay on CPU longer without a context switch
  – In compensation, they are scheduled less often, due to decreased priority
  – Reducing the number of context switches improves performance
• Ensuring good response for interactive applications
  – Interactive applications usually don’t use up their entire timeslice
  – Example: process a network message and release the CPU before the timeslice expires
  – Those applications will have their priority elevated, so they will respond quickly when a response is needed (e.g., when the next network packet arrives)

What Limits Performance of MP/MT Applications?
• The cost of context switching – depends on the hardware; the OS cannot fix it alone
  – Save/restore the registers
  – Flush the CPU pipeline
  – If switching address spaces
    • May need to flush the TLB (depends on the processor)
    • May need to flush the cache (depends on the processor)
• The cost of inter-process communication (IPC): requires context switching
• The cost of inter-thread synchronization – by and large depends on the program structure; OS can fix some of it, but not all

Thread Synchronization
• If a lock is not available, threads wait
• Execution becomes serialized

Next…
• Talk about synchronization
• Operating system support for efficient synchronization
• Transactional memory – new programming paradigm for efficient synchronization
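As a preview of the synchronization topic, the serialization effect noted on the Thread Synchronization slide can be sketched as follows. This is an illustrative assumption, not code from the lecture: `worker`, `run` and `CRITICAL_SECTION` are hypothetical names, and `time.sleep` stands in for work done while holding a lock.

```python
import threading
import time

CRITICAL_SECTION = 0.02  # assumed time spent holding the lock, in seconds


def worker(lock, intervals):
    """Hold the lock for a fixed time and record when the lock was held."""
    with lock:
        start = time.perf_counter()
        time.sleep(CRITICAL_SECTION)   # stand-in for work on shared data
        end = time.perf_counter()
        intervals.append((start, end))


def run(n, shared):
    """Run n workers; they contend for one lock if shared, else each gets its own."""
    intervals = []
    common = threading.Lock()
    threads = [
        threading.Thread(
            target=worker,
            args=(common if shared else threading.Lock(), intervals),
        )
        for _ in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, sorted(intervals)


if __name__ == "__main__":
    contended, _ = run(4, shared=True)
    uncontended, _ = run(4, shared=False)
    print(f"one shared lock: {contended:.3f} s")
    print(f"separate locks:  {uncontended:.3f} s")
```

With one shared lock, threads wait for each other and the critical sections run back to back; with separate locks, nothing waits and the sections overlap.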
