Improving IPC by Kernel Design
Jochen Liedtke
Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993

The Performance of u-Kernel-Based Systems
H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter
Proceedings of the 16th ACM Symposium on Operating Systems Principles, October 1997, pp. 66-77

Jochen Liedtke (1953 – 2001)
• 1977 – Diploma in Mathematics from the University of Bielefeld.
• 1984 – Moved to GMD (German National Research Center). Built L3; known for overcoming IPC performance hurdles.
• 1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.

The IPC Dilemma
• IPC is a core paradigm of u-kernel architectures.
• Most IPC implementations perform poorly.
• Very fast message passing is needed to run device drivers and other performance-critical components at user level.
• Result: programmers circumvent IPC, co-locating device drivers in the kernel and defeating the main purpose of the microkernel architecture.

What to Do?
• Optimize IPC performance above all else!
• Results: L3 and L4, second-generation microkernel-based operating systems.
• Many clever optimizations, but no single "silver bullet".

Summary of Techniques
• Seventeen techniques in total.

Standard System Calls (Send/Recv)
• Kernel entered/exited four times per call!
• Client: send() – system call, enter kernel, exit kernel. The client is not blocked after the send, so it must later issue receive() – another system call, enter kernel, exit kernel.
• Server: receive() – system call, enter kernel, exit kernel. Then send() for the reply – system call, enter kernel, exit kernel.

New Call/Response-Based System Calls
• Special system calls for RPC-style interaction.
• Kernel entered and exited only twice per call!
• Server: blocked in reply_and_recv_next(), waiting for a message.
• Client: call() – system call, enter kernel. The kernel suspends the client and allocates the CPU directly to the server.
• Server: resumes from being suspended, exits the kernel, and handles the message at user level.
• Server: reply_and_recv_next() – enter kernel, send the reply, reallocate the CPU to the client, and wait for the next message.
• Client: resumes, exits the kernel with the reply.

Complex Message Structure
• Batching IPC: combine a sequence of send operations into a single operation by supporting complex messages.
• Benefit: reduces the number of sends, and thus the number of kernel entries.

Direct Transfer by Temporary Mapping
• Naive message transfer: copy from sender to kernel, then from kernel to receiver – two copies.
• Optimizing the transfer by sharing memory between sender and receiver is not secure.
• L3 supports single-copy transfers by temporarily mapping a communication window into the sender.

Scheduling
• Conventionally, IPC operations (call or reply_and_recv_next) require four scheduling actions:
  – Delete the sending thread from the ready queue.
  – Insert the sending thread into the waiting queue.
  – Delete the receiving thread from the waiting queue.
  – Insert the receiving thread into the ready queue.
• These operations, together with 4 expected TLB misses, take at least 1.2 us (23% of the total IPC time T).

Solution: Lazy Scheduling
• Don't bother updating the scheduler queues!
• Instead, delay the movement of threads among queues until the queues are actually queried.
• Why? A sending thread that blocks will soon unblock again, and maybe nobody will ever notice that it blocked.
• Lazy scheduling is achieved by setting state flags (ready/waiting) in the thread control blocks (TCBs).

Pass Short Messages in Registers
• Most messages are very short: 8 bytes (plus 8 bytes of sender id), e.g. ack/error replies from device drivers or hardware-initiated interrupt messages.
• Transfer short messages via CPU registers instead of through memory.
• Performance gain: 2.4 us, or 48% of T.

Impact on IPC Performance
• For an eight-byte message, IPC time for L3 is 5.2 us compared to 115 us for Mach, a 22-fold improvement.
• For a large message (4 KB), a 3-fold improvement is seen.
Relative Importance of Techniques
• Quantifiable impact of each technique: a value of 49% means that removing that item would increase IPC time by 49%.

OS- and Application-Level Performance
• (Benchmark figures for OS-level and application-level performance.)

Conclusion
• Use a synergistic approach to improve IPC performance.
  – A thorough understanding of hardware/software interaction is required; there is no "silver bullet".
• IPC performance can be improved by a factor of 10.
• ... but even so, a microkernel-based OS will not be as fast as an equivalent monolithic OS: L4-based Linux outperforms Mach-based Linux, but not monolithic Linux.