Advanced Operating Systems
Lecture 3: OS design
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani

How to design an OS
• Some general guides and experiences.
• References
  • “Exokernel: An Operating System Architecture for Application-Level Resource Management”, Dawson R. Engler, M. Frans Kaashoek, et al.
  • “On Micro-Kernel Construction”, Jochen Liedtke

Outline
• New applications/requirements
• Organizing operating systems
• Some microkernel examples
• Object-oriented organizations
  • Spring
• Organization for multiprocessors

New vision
• Two important problems: location and scale.
• Ubiquitous computing: tiny kernels of functionality
• Virtual reality
• Mobility
• Intelligent devices
• Distributed computing: make networks appear like disks, memory, or other non-networked devices.

What is the big deal?
• Performance
• Border crossings are expensive
  • Change in locality
  • Copying between user and kernel buffers
• Application requirements differ in terms of resource management

Operating System Organization
• What is the best way to design an operating system?
• Put another way, what are the important software characteristics of an OS?
• What should go in the OS kernel and what in applications, and how should the two be partitioned?
  • Is there a minimal set for the kernel?

Important OS Software Characteristics
• Correctness and simplicity
• Power and completeness
• Performance
• Extensibility and portability
• Flexibility
• Scalability
• Suitability for distributed and parallel systems
• Compatibility with existing systems
• Security and fault tolerance

Common OS Organizations
• Monolithic
• Virtual machine
• Structured design
  • Layered designs
  • Object-oriented
  • Microkernels
• Trade-off between generality and specialization

Monolithic OS Design
• Build OS as single combined module
  • Hopefully using data abstraction.
• OS lives in its own, single address space
• Examples
  • DOS
  • early Unix systems
  • most VFS file systems

Pros/Cons of Monolithic OS Organization
+ Highly adaptable (at first . . .)
+ Little planning required
+ Potentially good performance
- Hard to extend and change
- Eventually becomes extremely complex
- Eventually performance becomes poor
- Highly prone to bugs

Virtual Machine Organizations
• A base operating system provides services in a very generic way
• One or more other operating systems live on top of the base system
  • Using the services it provides
  • To offer different views of the system to users
• Examples: IBM’s VM/370, the Java interpreter

Pros/Cons of Virtual Machine Organizations
+ Allows multiple OS personalities on a single machine
+ Good OS development environment
+ Can provide good portability of applications
- Significant performance problems
- Especially if more than 2 layers
- Lacking in flexibility

Old idea
• VM/370
  • Virtualization for binary support for legacy apps
• Why resurgence today?
  • Companies want a share of everybody’s pie
  • IBM zSeries “mainframes” support virtualization for server consolidation
    • Enables billing and performance isolation while hosting several customers
  • Microsoft has announced virtualization plans to allow easy upgrades and hosting Linux!

Layered OS Design
• Design tiny innermost layer of software
• Next layer out provides more functionality
  • Using services provided by inner layer
• Continue adding layers until all functionality required has been provided
• Examples
  • Multics
  • Fluke
  • layered file systems and comm. protocols

Pros/Cons of Layered Organization
+ More structured and extensible
+ Easy model and development
- Performance: layer crossing can be expensive
- In some cases, unnecessary layers, duplicated functionality.

Two layer OS Designs
• Only two OS layers
  • Kernel OS services
  • Non-kernel OS services
• Move certain functionality outside kernel
  • file systems, libraries
• Unlike virtual machines, kernel doesn’t stand alone
• Examples: most modern Unix systems

Pros/Cons of two layer OS
+ Many advantages of layering, without disadvantage of too many layers
+ Easier to demonstrate correctness
- Not as general as layering
- Offers no organizing principle for other parts of OS, user services
- Kernels tend to grow to monoliths

Object-Oriented OS Design
• Design internals of OS as set of privileged objects, using OO methods
• Sometimes extended into application space
• Tends to lead to client/server style of computing
• Examples
  • Mach (internally)
  • Spring (totally)

Object-Oriented Organizations
• Object-oriented organization is increasingly popular
• Well suited to OS development, in some ways
  • OSes manage important data structures
  • OSes are modularizable
  • Strong interfaces are good in OSes

Object-Orientation and Extensibility
• One of the main advantages of object-oriented programming is extensibility
• Operating systems increasingly need extensibility
• So, again, object-oriented techniques are a good match for operating system design

How object-oriented should an OS be?
• Many OSes have been built with object-oriented techniques
  • E.g., Mach and Windows NT
• But most of them leave object orientation at the microkernel boundary
  • No attempt to force object orientation on out-of-kernel modules

Pros/Cons of Object Oriented OS Organization
+ Offers organizational model for entire system
+ Easily divides system into pieces
+ Good hooks for security
- Can be a limiting model
- Must watch for performance problems
• Not widely used yet

Microkernel OS Design
• Like kernels, but with fewer abstractions exported (threads, address space, communication channel)
• Try to include only small set of required services in the microkernel
• Moves even more out of innermost OS part
  • Like parts of VM, IPC, paging, etc.
• System services (e.g. VM manager) implemented as servers on top
• High communication overhead between services implemented at user level and the microkernel limits extensibility in practice
• Examples: Mach, Amoeba, Plan 9, Windows NT, Chorus, Spring, etc.

Pros/Cons of Microkernel Organization
+ Those of kernels, plus:
+ Minimizes code for most important OS services
+ Offers model for entire system
- Microkernels tend to grow into kernels
- Requires very careful initial design choices
- Serious danger of bad performance

Organizing the Total System
• In microkernel organizations, much of the OS is outside the microkernel
• But that doesn’t answer the question of how the system as a whole gets organized
• How do you fit together the components to build an integrated system?
• While maintaining all the advantages of the microkernel

Some Important Microkernel Designs
Micro-ness is in the eye of the beholder
• Spin
• X-kernel
• Exokernel
• Mach
• Spring
• Amoeba
• Plan 9
• Windows NT

Mach
• Mach didn’t start life as a microkernel
  • Became one in Mach 3.0
• Object-oriented internally
  • Doesn’t force OO at higher levels
• Microkernel focus is on communications facilities
• Much concern with parallel/distributed systems

Mach Model
[Figure: user processes run above a software emulation layer (4.3BSD, SysV, HP/UX, and other emulators) in user space; the microkernel runs in kernel space.]

What’s In the Mach Microkernel?
• Tasks & Threads
• Ports and Port Sets
• Messages
• Memory Objects
• Device Support
• Multiprocessor/Distributed Support

Mach Task Model
[Figure: a process in user space with its address space and thread, connected to the kernel through its process port, bootstrap port, exception port, and registered ports.]

Mach Ports
• Basic Mach object reference mechanism
  • Kernel-protected communication channel
• Tasks communicate by sending messages to ports
• Threads in receiving tasks pull messages off a queue
• Ports are location independent
• Port queues protected by kernel; bounded

Port Rights
• Mechanism by which tasks control who may talk to their ports
• Kernel prevents messages being sent to a port unless the sender has its port rights
• Port rights also control which single task receives on a port

Port Sets
• A group of ports sharing a common message queue
• A thread can receive messages from a port set
  • Thus servicing multiple ports
• Messages are tagged with the actual port
• A port can be a member of at most one port set

Mach Messages
• Typed collection of data objects
  • Unlimited size
• Sent to particular port
• May contain actual data or pointer to data
• Port rights may be passed in a message
• Kernel inspects messages for particular data types (like port rights)

Mach Memory Objects
• A source of memory accessible by tasks
• May be managed by user-mode external memory manager
  • e.g. a file managed by a file server
• Accessed by messages through a port
• Kernel manages physical memory as cache of contents of memory objects

Mach Device Support
• Devices represented by ports
• Messages control the device and its data transfer
• Actual device driver outside the kernel in an external object

Mach Multiprocessor and DS Support
• Messages and ports can extend across processor/machine boundaries
  • Location transparent entities
• Kernel manages distributed hardware
• Per-processor data structures, but also structures shared across the processors
• Intermachine messages handled by a server that knows about network details

Mach’s NetMsgServer
• User-level capability-based networking daemon
• Handles naming and transport for messages
• Provides world-wide name service for ports
• Messages sent to off-node ports go through this server

NetMsgServer in Action
[Figure: on both the sender and the receiver machine, a user process and a NetMsgServer run in user space above the kernel; messages travel from the sending process through the two NetMsgServers to the receiving process.]
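
The port, port-right, and port-set rules from the Mach slides above can be illustrated with a small model. This is only a sketch under stated assumptions: the class names (`Port`, `PortSet`, `Kernel`), the string task names, and the queue capacity are hypothetical and are not Mach's actual interface.

```python
from collections import deque


class Port:
    """Hypothetical model of a kernel-protected message channel."""

    def __init__(self, name):
        self.name = name
        self.senders = set()   # tasks holding a send right
        self.port_set = None   # a port belongs to at most one port set


class PortSet:
    """A group of ports sharing one bounded message queue."""

    def __init__(self, receiver, capacity=4):
        self.receiver = receiver   # the single task allowed to receive
        self.capacity = capacity
        self.queue = deque()

    def add(self, port):
        if port.port_set is not None:
            raise ValueError("port already belongs to a port set")
        port.port_set = self


class Kernel:
    def send(self, task, port, data):
        # The kernel refuses messages from tasks without a send right.
        if task not in port.senders:
            raise PermissionError(f"{task} has no send right for {port.name}")
        queue = port.port_set.queue
        if len(queue) >= port.port_set.capacity:
            return False                    # queues are bounded
        queue.append((port.name, data))     # tag message with the actual port
        return True

    def receive(self, task, port_set):
        if task != port_set.receiver:
            raise PermissionError(f"{task} has no receive right")
        return port_set.queue.popleft() if port_set.queue else None


kernel = Kernel()
a, b = Port("a"), Port("b")
pset = PortSet(receiver="server")
pset.add(a)
pset.add(b)
a.senders.add("client")
b.senders.add("client")
kernel.send("client", a, "hello")
kernel.send("client", b, "world")
received = [kernel.receive("server", pset) for _ in range(2)]
```

One receiving thread services both ports through the shared queue, and the tag on each message tells it which port the sender actually used.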

Mach and User Interfaces
• Mach was built for the UNIX community
• UNIX programs don’t know about ports, messages, threads, and tasks
• How do UNIX programs run under Mach?
• Mach typically runs a user-level server that offers UNIX emulation
• It either provides UNIX system call semantics internally or translates them to Mach primitives

Windows NT
• More layered than some microkernel designs
• NT Microkernel provides base services
• Executive builds on base services via modules to provide user-level services
• User-level services used by
  • privileged subsystems (parts of OS)
  • true user programs

Windows NT Diagram
[Figure: in user mode, user processes and protected subsystems (Win32, POSIX); in kernel mode, the Executive above the Microkernel; hardware at the bottom.]

NT Microkernel
• Thread scheduling
• Process switching
• Exception and interrupt handling
• Multiprocessor synchronization
• Only NT part not preemptible or pageable
  • All other NT components run in threads

NT Executive
• Higher level services than microkernel
• Runs in kernel mode
  • but separate from the microkernel itself
  • ease of change and expansion
• Built of independent modules
  • all preemptible and pageable

NT Executive Modules
• Object manager
• Security reference monitor
• Process manager
• Local procedure call facility (a la RPC)
• Virtual memory manager
• I/O manager

Typical Activity in NT
[Figure: a client process and the Win32 protected subsystem interacting through the Executive, which sits above the kernel and the hardware.]

More On Microkernels
• Microkernels were the research architecture of the 80s
• But few commercial systems of the 90s really use microkernels
• To some extent, “microkernel” is now a dirty word in OS design
• Why?

Main Issue
• What should be in the kernel?
  • Different designs give different answers.
• How to implement the system efficiently?
  • Some people think microkernels are slow
  • The microkernel construction paper argues the other way.

Exokernel
• Traditional operating systems fix the interface and implementation of OS abstractions.
• Abstractions must be overly general to work with diverse application needs.
[Figure: Apache and SQL Server run on a FIXED abstractions interface provided by a traditional OS on top of the hardware.]

The Issues
• Performance
  • Denies applications the advantages of domain-specific optimizations
• Flexibility
  • Restricts the flexibility of application builders
• Functionality
  • Discourages changes to the implementations of existing abstractions

Performance
• Example: a DB can have predictable data access patterns that don't fit the OS's LRU page replacement, causing bad performance.
• Cao et al. found that application-controlled file caching can reduce running time by as much as 45%.
• There is no single way to abstract physical resources or to implement an abstraction that is best for all applications.
• OS is forced to make trade-offs
• Performance improvements of application-specific policies could be substantial

Flexibility
• Fixed high-level abstractions hide information from applications.
• Makes it difficult or impossible for applications to implement their own resource management abstractions.

Functionality
• Only one available interface between applications and hardware resources.
• Because all applications must share one set of abstractions, changes to these abstractions occur rarely, if ever
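
The Performance slide's database example can be made concrete with a toy cache simulation. The following is a minimal sketch, not taken from the slides: the function name `count_hits`, the cache size, and the access pattern are all assumptions. A repeated sequential scan over 5 pages with a 4-page cache defeats LRU completely, while a policy the application might choose for itself (evict the most recently used page) keeps most of the scan cached.

```python
from collections import OrderedDict


def count_hits(accesses, cache_size, policy):
    """Count cache hits for a page access trace under 'lru' or 'mru' eviction."""
    cache = OrderedDict()  # ordered from least to most recently used
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
            continue
        if len(cache) >= cache_size:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            cache.popitem(last=(policy == "mru"))
        cache[page] = True
    return hits


# A database-style repeated sequential scan over 5 pages, 4 passes.
scan = list(range(5)) * 4
lru_hits = count_hits(scan, cache_size=4, policy="lru")
mru_hits = count_hits(scan, cache_size=4, policy="mru")
print(f"LRU hits: {lru_hits}, MRU hits: {mru_hits}")
```

Under LRU every access in this trace misses, because the page about to be reused is always the one that was just evicted. This is the kind of mismatch between a fixed OS policy and an application's known access pattern that motivates application-level resource management.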

The Solution
• Separate protection from management
• Allow user level to manage resources
  • Application libraries implement OS abstractions
• Exokernel exports resources
  • Low level interface
  • Protects, does not manage
  • Expose hardware

Exokernel Philosophy
• Applications know better than operating systems what the goal of their resource management decisions should be
• Applications should be given as much control as possible over those decisions
[Figure: implementation view. The exokernel sits directly on the hardware (frame buffer | TLB | network | memory | disk).]

Example Exokernel: application-level resource management
[Figure: Apache runs on a library OS chosen from those available; SQL Server runs on a library OS customized for SQL Server. Each library OS offers its own abstractions interface, and both sit on the exokernel above the hardware.]

Implementation Overview
[Figure: a library OS on top of the exokernel, which sits on the hardware (frame buffer | TLB | network | memory | disk).]
• The library OS uses the low-level exokernel interface to implement higher-level abstractions.

Implementation Overview
[Figure: applications on top of library OSes, on top of the exokernel and the hardware (frame buffer | TLB | network | memory | disk).]
• Applications link to a library OS, leveraging its higher-level abstractions.

End-to-End Argument
• “If something has to be done by the user program itself, it is wasteful to do it in a lower level as well.”
• Why should the OS do anything that the user program can do itself?
• In other words, all an OS should do is securely allocate resources.

Exokernel design

Exokernel tasks
• Track ownership
• Guard all resources through bind points
• Revoke access to resources
• Abort

Design principle
• Expose hardware (securely)
• Expose allocation
• Expose names
• Expose revocation

Secure binding
• Decouples authorization from use
• Allows the kernel to protect resources without understanding their semantics
• Example: TLB entry
  • Virtual to physical mapping performed in the library (above exokernel)
  • Binding loaded into the kernel; used multiple times
• Example: packet filter
  • Predicates loaded into the kernel
  • Checked on each packet arrival

Implementing secure bindings
• Hardware mechanisms
  • Capability for physical pages of a file
  • Frame buffer regions (SGI)
• Software caching
  • Exokernel large software TLB overlaying the hardware TLB
• Downloading code into kernel
  • Avoid expensive boundary crossings

Examples of secure binding
• Physical memory allocation (hardware supported binding)
  • Library allocates physical page
  • Exokernel records the allocator and the permissions and returns a “capability” (an encrypted cipher)
  • Every access to this page by the library requires this capability
• Page fault:
  • Kernel fields it
  • Kicks it up to the library
  • Library allocates a page and gets an encrypted capability
  • Library calls the kernel to enter a particular translation into the TLB by presenting the capability
• Download code into kernel to establish secure binding
  • Packet filter for demultiplexing network packets
  • How to ensure authenticity? Only trusted servers (library OS) can download code into the kernel
• Other use of downloaded code
  • Execute code on behalf of an app that is not currently scheduled
  • E.g. application handler for garbage collection could be installed in the kernel

Visible resource revocation
• Most resources are visibly revoked
  • E.g. processor; physical page
• Library can then perform necessary action before relinquishing the resource
  • E.g. needed state saving for a processor
  • E.g. update of page table

Abort protocol
• Repossession exception passed to the library OS
• Repossession vector
  • Gives info to the library OS as to what was repossessed so that corrective action can be taken
  • Library OS can seed the vector to enable the exokernel to autosave (e.g. the disk blocks to which a physical page being repossessed should be written)

Aegis: an exokernel

Secure Bindings
• Secure binding: a protection mechanism that decouples authorization from actual use of a resource
  • Allows the kernel to protect resources without having to understand them

Aegis: processor time slice
• Linear vector of time slots
• Round robin
• An application can mark its “position” in the vector for scheduling
• Timer interrupt
  • Beginning and end of time slices
  • Control transferred to library-specified handler for actual saving/restoring
• Time to save/restore is bounded
  • Penalty? Loss of a time slice next time!

Aegis: processor environments
• Exception context
  • Program generated
• Interrupt context
  • External: e.g. timer
• Protected entry context
  • Cross domain calls
• Addressing context
  • Guaranteed mappings implemented by software TLB mimicking the library OS page table

Aegis performance

Aegis: Address translation
• On TLB miss
  • Kernel installs hardware TLB entry from software TLB for guaranteed mappings
  • Otherwise application handler called
  • Application establishes mapping
  • TLB entry with associated capability presented to the kernel
  • Kernel installs and resumes execution of the application
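
The physical-memory example of a secure binding (allocate a page, receive a capability, present it when installing a TLB entry) can be sketched as follows. This is an illustration under stated assumptions, not Aegis code: the `Exokernel` class, its method names, and the use of an HMAC as the capability are hypothetical choices; the slides only say the capability is an encrypted value.

```python
import hashlib
import hmac
import os


class Exokernel:
    """Hypothetical exokernel that guards pages without managing them."""

    def __init__(self, num_pages):
        self._secret = os.urandom(16)        # known only to the kernel
        self._free = list(range(num_pages))
        self._owner = {}                     # physical page -> library OS
        self.tlb = {}                        # (libos, virtual page) -> physical page

    def _capability(self, libos, page):
        # Assumption: model the capability as an HMAC over (owner, page).
        msg = f"{libos}:{page}".encode()
        return hmac.new(self._secret, msg, hashlib.sha256).hexdigest()

    def allocate_page(self, libos):
        """Bind time: record ownership and hand back a capability."""
        page = self._free.pop(0)
        self._owner[page] = libos
        return page, self._capability(libos, page)

    def install_mapping(self, libos, vpage, ppage, capability):
        """Access time: a cheap check, no knowledge of what the page is for."""
        expected = self._capability(libos, ppage)
        if self._owner.get(ppage) != libos or not hmac.compare_digest(expected, capability):
            raise PermissionError("capability does not authorize this page")
        self.tlb[(libos, vpage)] = ppage


kernel = Exokernel(num_pages=4)
page, cap = kernel.allocate_page("libos_a")
# The library OS picked virtual page 7 itself; the kernel only checks the binding.
kernel.install_mapping("libos_a", 7, page, cap)
```

The kernel never decides which virtual page maps where; it only verifies that the caller owns the physical page, which is the separation of protection from management described above.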

ExOS: library OS
• IPC abstraction
• VM
• Remote communication using ASH (application-specific safe handlers)
• Takeaway: significant performance improvement possible compared to a monolithic implementation

Library operating systems
• Use the low level exokernel interface
• Higher level abstractions
• Special purpose implementations
• An application can choose the library which best suits its needs, or even build its own.

Another Example

Exokernel vs. Microkernel
• A microkernel provides abstractions of the hardware such as files, sockets, graphics, etc.
• An exokernel provides almost raw access to the hardware.

Design Challenge
• How can an exokernel allow libOSes to freely manage physical resources while protecting them from each other?
  • Track ownership of resources
  • Secure bindings: libOS can securely bind to machine resources
  • Guard all resource usage
  • Revoke access to resources

Secure Bindings
• Exokernel allows libOSes to bind resources using secure bindings
  • Multiplex resources securely
  • Protection for mutually distrusting apps
  • Efficient

Guard all resource usage
• Invisible resource revocation
  • Efficient: application layer not involved
  • Traditional OS
• Visible resource revocation
  • Allows libOS to guide deallocation and track availability of resources.
  • Exokernel

Conclusion
• An exokernel securely multiplexes the available raw hardware among applications
• Application-level library operating systems implement higher-level traditional OS abstractions
• LibOSes can specialize an implementation to suit a particular application

Conclusion
• The lower the level of a primitive…
  • …the more efficiently it can be implemented
  • …the more latitude it gives to higher level abstractions
• So, separate management from protection and…
  • …implement protection at a low level (exokernel)
  • …implement management at a higher level (libOS)

Exokernel Implementation Overview
• Allows the extension, specialization, and even replacement of abstractions.
• Example: page table implementations can vary from libOS to libOS, and applications can choose whichever is most suitable for their needs.

Exokernel Implementation Principles
• Provide libOSes maximum freedom while protecting them from each other. This is achieved through separation of protection and resource management.
• Resources should only be managed to the extent required for protection.
• LibOSes handle how best to use resources, with the exokernel arbitrating between competing libraries.
• LibOSes should be able to request specific physical resources (like specific physical pages).
• Resources should not be implicitly allocated; the libOS should participate in every allocation.

Exokernel Secure Bindings
• Protection mechanism that decouples authorization (bind time) from actual use of the resource (access time).
• Authorization performed at bind time.
• Expressed in simple operations that the exokernel can implement quickly and efficiently.
• Can protect resources without understanding them.
• Example: when a page fault occurs, virtual to physical address mapping is performed, the page is loaded by the exokernel (bind time), and then used multiple times (access time).

Microkernel Construction
• Most microkernels do not perform well
  • Is it inherent in the approach or the implementation?
• IPC, the microkernel bottleneck, can be implemented an order of magnitude faster.
• Does not supervise memory
• Minimal address space management: grant, map, flush.
• Fast kernel-user switch: usually 20-30 us, but 3 in the L3 implementation

Exokernel Downloading Code
• Code can be downloaded into the exokernel, for execution at defined events (like packet arrival).
• Reduces kernel crossings.
• Can execute even when the application isn't scheduled.
• Can initiate events (e.g. initiate a response message to a packet)
• Example: a packet filter is downloaded into the exokernel (bind time), and then run on every incoming packet to determine the intended target application (access time), and can even initiate a response.

Exokernel Visible Resource Revocation
• Traditionally, OSes revoke (deallocate) resources invisibly, without application involvement (e.g. physical memory).
  • Advantage: lower latency
  • Disadvantage: applications cannot guide deallocation
• Exokernel uses visible revocation for most resources. The library OS is notified of the intention to deallocate, and has the capability of guiding the process.
• Example: the libOS is told that the exokernel will deallocate physical page “5”; it can use this information to update its page table, or even to suggest a less important page for deallocation.

Exokernel Abort Protocol
• Mechanism to take away resources when libOSes fail to respond satisfactorily to visible revocation requests.
• A repossession vector is used to keep track of forcibly deallocated resources. Library OSes can pre-load the vector with information that can be used to write state or data about the resource when it is deallocated (e.g. define disk blocks for memory paging).
• OSes normally require certain allocations to be permanent, so the exokernel can guarantee a small number of resources that cannot be forcibly deallocated. Example: page tables, exception areas
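
The interplay between visible revocation and the abort protocol described in the two slides above can be sketched in a few lines. This is a toy model under stated assumptions: the `LibOS` and `Exokernel` classes, the `cooperative` flag, and the method names are hypothetical, and a real exokernel would give the library OS a bounded amount of time to respond rather than a boolean answer.

```python
class LibOS:
    """Hypothetical library OS that may or may not cooperate with revocation."""

    def __init__(self, name, cooperative=True):
        self.name = name
        self.cooperative = cooperative
        self.page_table = {}            # virtual page -> physical page
        self.repossession_vector = []   # filled in by the exokernel on abort

    def on_revoke(self, ppage):
        """Visible revocation: drop mappings to ppage, then release it."""
        if not self.cooperative:
            return False
        self.page_table = {v: p for v, p in self.page_table.items() if p != ppage}
        return True


class Exokernel:
    def __init__(self):
        self.owner = {}                 # physical page -> owning LibOS

    def allocate(self, libos, vpage, ppage):
        self.owner[ppage] = libos
        libos.page_table[vpage] = ppage

    def revoke(self, ppage):
        libos = self.owner.pop(ppage)
        if not libos.on_revoke(ppage):
            # Abort protocol: break the binding by force and record the loss
            # so the library OS can take corrective action later.
            libos.repossession_vector.append(ppage)


kernel = Exokernel()
polite = LibOS("polite")
stubborn = LibOS("stubborn", cooperative=False)
kernel.allocate(polite, vpage=0, ppage=5)
kernel.allocate(stubborn, vpage=0, ppage=6)
kernel.revoke(5)   # handled by the library OS itself
kernel.revoke(6)   # ignored by the library OS, so the kernel repossesses
```

The cooperative library OS ends up with a consistent page table of its own making, while the uncooperative one only learns about the loss through its repossession vector.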

Exokernel Implementation
• Aegis: exokernel
  • Exports: processor, physical memory, TLB, exceptions, interrupts, and network interface.
• ExOS: library OS
  • Implements: processes, virtual memory, user-level exceptions, interprocess abstractions, and network protocols (ARP, IP, UDP, NFS)
• Compared to Ultrix

Exokernel Aegis Processor Time Slices
• Time slices partitioned and allocated at the clock granularity.
• Scheduled using round robin.
• Advanced scheduling can be implemented by a libOS through requesting specific positions in the time slices.
  • Long running apps can allocate contiguous time slices, while interactive apps can allocate several equidistant slices

Exokernel Aegis
• Exceptions
• Interrupts
• Address translations
  • Guarantees address mappings for a small number of pages, to simplify bootstrapping.
• Protected control transfers
  • For IPC abstractions
  • Changes program counter to agreed location, sets appropriate data for context for callee, and donates current time slice.
• Dynamic packet filter

Exokernel ExOS
• IPC abstractions
  • pipe: ExOS uses a shared memory buffer, an order of magnitude faster than Ultrix, which uses standard Unix pipes.
• Application-level virtual memory
  • 150x150 integer matrix multiplication: doesn't use any special ExOS or Aegis abilities; shows application-level VM doesn't incur noticeable overhead (0.1 second difference)
  • All other tests perform comparably with Ultrix (reading pages, flipping protection bits, etc.)
• Downloaded code for networking handler
  • Round trip latency for RPC faster than FRPC

Exokernel ExOS Extensibility
• Extensible page-table structures
  • Implemented inverted page tables
• Extensible schedulers
  • Stride scheduling (proportional share scheduling)
  • The processes are successfully scheduled at a ratio of 3:2:1
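
The 3:2:1 result on the extensibility slide comes from stride scheduling, which a library OS can implement on top of the exokernel's time slices. The sketch below shows the basic algorithm only; the function name, the process names, the `STRIDE1` constant, and the quantum count are assumptions for illustration, not values from the ExOS experiments.

```python
STRIDE1 = 60_000  # large constant, chosen divisible by the ticket counts used here


def stride_schedule(tickets, quanta):
    """Return how many quanta each process receives under stride scheduling."""
    # A process with more tickets gets a smaller stride and so runs more often.
    strides = {name: STRIDE1 // count for name, count in tickets.items()}
    passes = dict(strides)                  # each process starts at one stride
    counts = {name: 0 for name in tickets}
    for _ in range(quanta):
        # Run the process with the smallest pass value (ties broken by name).
        name = min(passes, key=lambda n: (passes[n], n))
        counts[name] += 1
        passes[name] += strides[name]
    return counts


counts = stride_schedule({"A": 3, "B": 2, "C": 1}, quanta=600)
print(counts)
```

With tickets in the ratio 3:2:1 the quanta received follow the same ratio, and the allocation is deterministic rather than probabilistic, which is why stride scheduling is a natural fit for a vector of time slices.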

Exokernel Conclusion
Experiments with Aegis and ExOS show
• Simple exokernel primitives can be implemented efficiently
• Fast low-level hardware multiplexing can be implemented efficiently
• Traditional OS abstractions can be implemented at user level
• Applications can create special-purpose implementations by modifying libraries

Exokernel: Other Exokernel Work
“Porting Multithreading Libraries to an Exokernel System”, Ernest Artiaga, Albert Serra, Marisa Gil. Dept. of Computer Architecture, Universitat Politecnica de Catalunya. ACM SIGOPS European Workshop, ACM 2000, pp. 121-126
• Ported Cthreads to Exokernel
• Slightly faster execution than without threading

Exokernel: Other Exokernel Work
“Fast and Flexible Application-Level Networking on Exokernel Systems”, Gregory Ganger, Dawson Engler, et al. CMU, Stanford, MIT and Vividon, Inc. ACM Transactions on Computer Systems, vol. 20, no. 1, pp. 49-83, 2002
• Implemented TCP, HTTP server, and web benchmarking tool
• TCP: 50-300% higher throughput
• HTTP: 3-8 times higher throughput
• Benchmarking: can produce loads 2-8 times heavier

Micro Kernel construction
• Microkernel should provide minimal abstractions
  • Address space, threads, IPC
• Abstractions machine independent but implementation hardware dependent for performance
• Myths about inefficiency of microkernels stem from inefficient implementation and NOT from the microkernel approach

What abstractions?
• Determining criterion:
  • Functionality, not performance
• Hardware and microkernel should be trusted but applications are not
  • Hardware provides page-based virtual memory
  • Kernel builds on this to provide protection for services above and outside the microkernel
• Principles of independence and integrity
  • Subsystems independent of one another
  • Integrity of channels between subsystems protected from other subsystems

Microkernel Concepts
• Hardware provides address space
  • mapping from virtual page to a physical page
  • implemented by page tables and TLB
• Microkernel concept of address spaces
  • Hides the hardware address spaces and provides an abstraction that supports
    • Grant?
    • Map?
    • Flush?
  • These primitives allow building a hierarchy of protected address spaces

Address spaces
[Figure: address spaces A1, A2, A3 with pages (P1, v1), (P2, v2), (P3, v3), illustrating the map, flush, and grant operations.]

• Power and flexibility of address spaces
  • Initial memory manager for address space A0 appears by magic (similar to SPIN core service BUT outside the kernel) and encompasses the physical memory
  • Allow creation of stackable memory managers (all outside the kernel)
  • Pagers can be part of a memory manager or outside the memory manager
  • All address space changes (map, grant, flush) orchestrated via kernel for protection
  • Device driver can be implemented as a special memory manager outside the kernel as well

[Figure: stacked memory managers M0, M1, M2 with (A0, P0), (A1, P1), (A2, P2) and page tables (PT), linked by map/grant, above the microkernel and the processor.]
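
The three address-space primitives named above can be modeled in a few lines. This is a minimal sketch under assumed semantics: map shares a page and the mapper keeps it, grant moves a page so the granter loses it, and flush removes every mapping derived from a page while the flusher keeps its own. The `AddressSpaces` class and its fields are hypothetical, and real kernels track this with more elaborate mapping structures.

```python
class AddressSpaces:
    """Toy model of map/grant/flush; names and structure are hypothetical."""

    def __init__(self):
        self.pages = {}     # (space, vpage) -> physical frame
        self.derived = {}   # (space, vpage) -> mappings created from it by map

    def map(self, src, dst):
        """Share the page at src with dst; src keeps its mapping."""
        self.pages[dst] = self.pages[src]
        self.derived.setdefault(src, []).append(dst)

    def grant(self, src, dst):
        """Move the page at src to dst; src loses its mapping."""
        self.pages[dst] = self.pages.pop(src)
        self.derived[dst] = self.derived.pop(src, [])
        # Whoever mapped the page to src now has dst as the derived mapping.
        for children in self.derived.values():
            for i, child in enumerate(children):
                if child == src:
                    children[i] = dst

    def flush(self, src):
        """Remove every mapping derived from src; src keeps the page."""
        for child in self.derived.pop(src, []):
            self.flush(child)
            del self.pages[child]


spaces = AddressSpaces()
spaces.pages[("A0", 0)] = 42          # A0 initially owns frame 42
spaces.map(("A0", 0), ("A1", 5))      # A0 maps the page into A1
spaces.grant(("A1", 5), ("A2", 9))    # A1 grants it on to A2
after_grant = dict(spaces.pages)
spaces.flush(("A0", 0))               # A0 revokes everything it handed out
```

Because flush follows the chain of derived mappings, a memory manager lower in the hierarchy can always reclaim pages it handed upward, which is what makes stackable user-level memory managers safe.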
Threads and IPC
  A thread executes in an address space
    PC, SP, processor registers, and state info (such as address space)
  IPC is cross address space communication
    Supported by the microkernel
    Classic method is message passing between threads via the kernel
    Sender sends info; receiver decides if it wants to receive it, and if so where
    Address space operations such as map, grant, flush need IPC
    Higher level communication (e.g. RPC) built on top of basic IPC

  Interrupts?
    Each hardware device is a thread from the kernel’s perspective
    Interrupt is a null message from a hardware thread to the software thread
    Kernel transforms hardware interrupt into a message
      Does not know or care about the semantics of the interrupt
      Device-specific interrupt handling outside the kernel
      Clearing hardware state (if privileged) is then carried out by the kernel upon the driver thread’s next IPC
  TLB handler?
    In theory the software TLB handler can be outside the microkernel
    In practice the first-level TLB handler is inside the microkernel or in hardware

Unique IDs
  Kernel provides uid over space and time for
    Threads
    IPC channels

Breaking some performance myths
  Kernel-user switches
  Address space switches
  Thread switches and IPC
  Memory effects
  Base system: 486 (50 MHz) – 20 ns cycle time

Kernel-user switches
  Machine instructions for entering and exiting the kernel
    107 cycles
  Mach measures 900 cycles for a kernel-user switch
    Why?
  Empirical proof
    L3 kernel ~ 123 cycles (accounting for some TLB, cache misses)
  Where did the remaining 800 cycles go in Mach?
    Kernel overhead (due to the construction of the kernel, not inherent in the approach)
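
The kernel-user switch figures above can be converted to wall-clock time with the 20 ns cycle time of the 50 MHz 486 base system. The sketch below only redoes that arithmetic. The helper name `cycles_to_us` and the dictionary labels are made up for illustration.

```python
CYCLE_NS = 20  # 486 at 50 MHz, as stated for the base system


def cycles_to_us(cycles, cycle_ns=CYCLE_NS):
    """Convert a cycle count to microseconds."""
    return cycles * cycle_ns / 1000


# Cycle counts quoted in the slides.
switch_cycles = {
    "hardware enter/exit instructions": 107,
    "L3 kernel": 123,
    "Mach": 900,
}

for name, cycles in switch_cycles.items():
    print(f"{name}: {cycles} cycles = {cycles_to_us(cycles):.2f} us")

hardware = switch_cycles["hardware enter/exit instructions"]
mach_overhead = switch_cycles["Mach"] - hardware    # the "remaining 800 cycles"
l3_overhead = switch_cycles["L3 kernel"] - hardware
print(f"Mach software overhead: {mach_overhead} cycles")
print(f"L3 software overhead: {l3_overhead} cycles")
```

The gap between the two overheads is the basis for the claim that the cost comes from how the kernel is built and not from the kernel-user boundary itself.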
Address space switches
  Primer on TLBs
    AS tagged TLB (MIPS R4000) vs untagged TLB (486)
    Untagged TLB requires a flush on AS switch
  Instruction and data caches
    Usually physically tagged in most modern processors, so a TLB flush has no effect on them
  Address space switch
    Complete reload of Pentium TLB ~ 864 cycles

  Do we need a TLB flush always?
    Implementation issue of “protection domains”
    SPIN implements protection domains as Modula names within a single hardware address space
    Liedtke suggests a similar approach in the microkernel in an architecture-specific manner
      PowerPC: use segment registers => no flush
      Pentium or 486: share the linear hardware address space among several user address spaces => no flush
        There are some caveats in terms of the size of each user space and how many can be “packed” in a 2**32 global space

  Upshot?
    Address space switching among medium or small protection domains can ALWAYS be made efficient by careful construction of the microkernel
    Large address space switches are ALWAYS going to be expensive due to cache effects and TLB effects, so switching cost is not the most critical issue

Thread switches and IPC

Segment switch (instead of AS switch) makes cross domain calls cheap

Memory Effects – System

Capacity induced MCPI

Portability vs. Performance
  A microkernel on top of abstract hardware, while portable:
    Cannot exploit hardware features
    Cannot take precautions to avoid performance problems specific to an architecture
    Incurs a performance penalty due to the abstraction layer
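
The earlier idea of sharing one linear hardware address space among several small user address spaces (so that a switch changes a segment and needs no TLB flush) can be sketched as follows. This is a simplified model under stated assumptions, not Liedtke's implementation: `SmallSpace`, `pack`, and the fixed equal slot size are hypothetical, and real segment hardware has further constraints.

```python
LINEAR_SPACE = 2**32  # size of the shared linear address space


class SmallSpace:
    """A user address space modelled as one segment: a base and a limit."""

    def __init__(self, base, limit):
        self.base = base
        self.limit = limit

    def translate(self, vaddr):
        """Map a user virtual address to a linear address, enforcing the limit."""
        if not 0 <= vaddr < self.limit:
            raise MemoryError("segment limit violation")
        return self.base + vaddr


def pack(slot_size, count):
    """Place `count` equally sized small spaces side by side.

    Shows the caveat from the slides: the size of each space bounds how
    many of them fit into the 2**32 global space.
    """
    if slot_size * count > LINEAR_SPACE:
        raise ValueError("spaces do not fit into the linear address space")
    return [SmallSpace(i * slot_size, slot_size) for i in range(count)]


spaces = pack(slot_size=2**22, count=4)    # four 4 MB spaces
current = spaces[1]                        # "switching" only picks another segment
print(hex(current.translate(0x10)))
```

Since all small spaces live behind the same page tables in this model, moving from one to another leaves the TLB contents valid. The limit check is what keeps one space from reaching into its neighbours.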
Examples of nonportability
  Same processor family
    Address space switch implementation
      TLB flush method preferable for 486
      Segment register switch preferable for Pentium
      => 50% change of microkernel!
    IPC implementation
      Details of the cache layout (associativity) require different handling of IPC buffers on 486 and Pentium
  Incompatible processors
    Exokernel on R4000 (tagged TLB) vs. 486 (untagged TLB)
  => Microkernels are inherently non-portable

Summary
  Minimal set of abstractions in the microkernel
  Microkernels are processor specific (at least in implementation) and non-portable
  The right abstractions and a processor-specific implementation lead to efficient processor-independent abstractions at higher layers

Performance

Key points
  Goal: extensibility akin to SPIN and Exokernel goals
  Main difference: support running several commodity operating systems on the same hardware simultaneously without sacrificing performance or functionality
  Why?
    Application mobility
    Server consolidation
    Co-located hosting facilities
    Distributed web services
    ….

Next Lecture
  Process and Thread
    “Cooperative Task Management Without Manual Stack Management”, by Atul Adya, et al.
    “Capriccio: Scalable Threads for Internet Services”, by Rob von Behren, et al.
    “The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors”, by Thomas E. Anderson, et al.