The 27 Year Old Microkernel
Sebastien Marineau-Mes & Colin Burgess

Agenda
Background and timeline
Hybrid Software Model
Anatomy of the microkernel
How things work – system calls, process manager
Q&A

All content copyright QNX Software Systems

A History of Software Innovation
1980: First commercially available microkernel OS
1982: First RTOS to support a hard disk on a PC
1985: First memory-protected RTOS
1990: First POSIX-certified RTOS
1992: First RTOS to offer built-in fault-tolerant networking
1994: US patent for scalable microkernel windowing system
1997: First RTOS to support symmetric multiprocessing (SMP)
2002: First RTOS vendor to deliver Eclipse-based IDE
2005: First to offer “bound” multiprocessing
2007: Introduces hybrid software model and opens source code
[Timeline figure: axis from 1980 to 2005, with QNX2, QNX4 and QNX6 marked]

Hybrid Software Model
Developer Enablement
  > Published source code – runtime components
  > Transparent development
      QNX development teams working in the open
      Live check-ins for features and bugfixes
Community Enablement
  > Foundry 27 developer portal
  > Initial projects: OS, Tools, BSPs, Bazaar
Business Enablement
  > Free access to development tools for non-commercial users and partners
  > Free access to source for development
      Standard business model & pricing for commercial projects
  > Ability to create and distribute derivative works
  > Flexible contribution model

Microkernel Architecture
[Diagram: Process Manager, File System, Networking, Windowing, Multi-media and Application processes attached to a Message Bus above the Microkernel (µK); CPUs listed: Arm, Mips, SH4, PowerPC, Xscale, X86]
Microkernel + Process Manager are the only trusted components
Applications and drivers are processes which plug into a message bus
  • Reside in their own memory-protected address space
  • Have a well defined message interface
  • Cannot corrupt other software components
  • Can be started, stopped and upgraded on the fly
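The message-bus idea on the architecture slide can be sketched in portable C. This is only an illustration under stated assumptions: it uses a POSIX socketpair() and fork() as a stand-in for the QNX message bus, and every toy_* name and the message layout are hypothetical, not taken from the QNX sources. It shows a client blocking on a send until a server in a separate address space replies.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message interface; none of these names come from QNX. */
enum { TOY_MSG_DOUBLE = 1 };

struct toy_request { int32_t type; int32_t value; };
struct toy_reply   { int32_t status; int32_t value; };

/* Read exactly len bytes; returns 0 on success, -1 otherwise. */
static int toy_read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 otherwise. */
static int toy_write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Server: receive one request, handle it, reply. Runs in its own process. */
static int toy_server(int fd)
{
    struct toy_request req;
    struct toy_reply rep = { -1, 0 };

    if (toy_read_all(fd, &req, sizeof req) != 0) return -1;
    if (req.type == TOY_MSG_DOUBLE) {
        rep.status = 0;
        rep.value = req.value * 2;
    }
    return toy_write_all(fd, &rep, sizeof rep);
}

/* Client: start a server process, send one request, block for the reply. */
int toy_send_double(int32_t value, int32_t *result)
{
    int sv[2];
    struct toy_request req = { TOY_MSG_DOUBLE, value };
    struct toy_reply rep = { -1, 0 };
    int rc = -1;
    pid_t pid;

    assert(result != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return -1; }
    if (pid == 0) {                 /* child: the server process */
        close(sv[0]);
        _exit(toy_server(sv[1]) == 0 ? 0 : 1);
    }

    close(sv[1]);                   /* parent: the client */
    if (toy_write_all(sv[0], &req, sizeof req) == 0 &&
        toy_read_all(sv[0], &rep, sizeof rep) == 0 &&
        rep.status == 0) {
        *result = rep.value;
        rc = 0;
    }
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return rc;
}
```

Calling toy_send_double(21, &out) should leave 42 in out. The server never touches the client's memory; the only thing the two share is the typed request and reply, which is the property the slide attributes to processes on the message bus.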

Separation of Duties – Process Manager vs. MicroKernel
procnto
Microkernel        Process Manager
Messages           Pathname
Threads            Process
Synchronization    Virtual Memory
Scheduling         procfs
Signals            Debug
Timers             Resources
Channels           Loader
Connections        Named Sems
Interrupts         imagefs

Microkernel Services
[Side panel: Messages, Threads, Synchronization, Scheduling, Signals, Timers, Channels, Connections, Interrupts]
Simple pre-emptable operations
Provides basic system services
  > Implements much of the POSIX thread and realtime standard
  > Interrupt and exception redirection
  > IPC primitives
Most of the microkernel is hardware independent
  > CPU-dependent layer for low-level CPU interfaces
  > CPU-specific optimized routines
Only piece of code that runs with full system privilege
Microkernel does not run “on its own”
  > Only reacts to external events: system calls, interrupts, exceptions

Process Manager Services
[Side panel: Pathname, Process, Virtual Memory, procfs, Debug, Resources, Loader, Named Sem, imagefs]
Implements long, complex operations and services
  > Ex: Process creation and memory management
Is a multi-threaded process that is scheduled at normal priority
  > Competes for CPU with all other threads in the system
Message driven server
  > More on this later

Procnto Source Layout
[Directory tree under /services/system/: ker, proc, pathmgr, procmgr, memmgr, with CPU-specific subdirectories (arm, mips, ppc, sh, x86, …); labels on the figure: Process lifecycle management, Pathname management, Support Functions]
Looking for source code?
Go to www.foundry27.com -> Projects -> Core Operating System -> Source Guide

Kernel call operations sequence
Kernel entry
  Phase      Interrupts   Pre-emption
  Entry      off
  Unlocked   on           Pre-emptable (kernel operation, which may include message pass)
  Locked     on           No pre-emption
  Unlocked   on           Pre-emptable
  Exit       off
Kernel exit

Anatomy of a Kernel Call
A user-mode thread makes a call to a system call stub located in libc
  > Ex: MsgSendvnc()
The system call stub executes a TRAP instruction (or whatever instruction is appropriate for the particular hardware).

    MsgSendvnc:
        lw    $8,16($29)
        addiu $2,$0,12
        syscall
        jr    $31
        nop

The processor changes privilege state, interrupts are disabled
Execution resumes at the appropriate vector: services/system/ker/<cpu>/kernel.s
  > One of the few pieces of code that is all assembly – see __ker_entry, __ker_sysenter

    /*
     * r4k_syscall_handler()
     * Streamlined path for our most common operation--kernel calls
     */
    FRAME(r4k_syscall_handler,sp,0,ra)
        .set noat
        /*
         * Coming from user mode. Save user registers, and get
         * a fresh kernel stack. Move GP to our own short data
         * area.
         */
        LD_ACTIVE_AND_KERSTACK(k0,k1)
        addiu k0,k0,REG_OFF
        SAVE_REGS(1)

Kernel Entry

    /*
     * r4k_syscall_handler()
     * Streamlined path for our most common operation--kernel calls
     */
    FRAME(r4k_syscall_handler,sp,0,ra)
        /*
         * Coming from user mode. Save user registers, and get
         * a fresh kernel stack. Move GP to our own short data
         * area.
         */
        LD_ACTIVE_AND_KERSTACK(k0,k1)
        SAVE_REGS(1)
        ACQUIRE_KERNEL(INKERNEL_NOW,zero,1)
        /*
         * Interrupts are now OK again
         */
        STI
        /*
         * Kernel call number should still be intact in v0.
         * Save the kernel call number.
         */
        sw v0,SYSCALL(s0)
    #if defined(VARIANT_instr)
        la t1,_trace_call_table
    #else
        la t1,ker_call_table
    #endif
        /*
         * Index the call table and run the C code
         */

Steps annotated on the slide, in order:
System call entry
Load kernel stack
Save thread context (register set)
Acquire the kernel lock
  • On uniprocessor systems, atomically set the INKERNEL_NOW bit
  • On SMP systems, spinlock on INKERNEL_NOW
Enable interrupts
Transfer to kernel call implementation
  • services/system/ker_call_table.c

Kernel Function Implementation
Entry from kercall table (code as shown on the slide; "…" marks text cut off by the slide layout)

    int kdecl ker_timer_create(THREAD *act, struct kera… *kap) {
        VALID_CLOCKID(kap);
        if(kap->event) {
            RD_VERIFY_PTR(act, kap->event, si…
            RD_PROBE_INT(act, kap->event, siz…
        }
        prp = act->process;
        …

Validate parameters
Verify pointers referenced by kernel are valid
  • RD_PROBE/WR_PROBE functions
  • RD_VERIFY_*/WR_VERIFY_* functions
  • If addresses are not accessible, a fault will be generated and the kernel call will return with EFAULT
All up-front work done; ready to do the real work
It’s very important to get the validation right, as a fault (due to an invalid or malicious parameter passed in to the call) could be catastrophic

Microkernel Pre-emption
[Diagram: kernel call operations sequence, as on the earlier slide]
Kernel call preemption is important
  > Interrupt activity may READY a higher priority thread to run while a kernel call is in progress
  > We want to immediately schedule this higher priority THREAD (minimize scheduling latency)
QNX does this in a novel way – preemptable kernel
  > Defer changing global kernel state
  > Implementation of kernel ops is 2 stages: do the work followed by a “commit”
  > On preemption, the active thread’s IP is backed up to re-execute the
    SYSENTER instruction
  > Allows us to only have one kernel stack – not one per thread
  > Any memory references/calculations done before locking kernel must be restartable

Lock Kernel
Most of the preparatory work is done before locking the kernel, if possible.

    int kdecl ker_sched_get(THREAD *act, struct kerargs_sched_get *kap) {
        PROCESS *prp;
        THREAD *thp;

        // Argument verification
        // Verify the target process exists.
        if((prp = (kap->pid ? lookup_pid(kap->pid) : act->process)) == NULL) return ESRCH;

        // Verify the target thread exists.
        if((thp = (kap->tid ? vector_lookup(&prp->threads, kap->tid-1) : act)) == NULL) return ESRCH;

        // Verify we have the right to examine the target process
        if(!kerisusr(act, prp)) return ENOERROR;

        // User pointer verification
        if(kap->param) {
            WR_VERIFY_PTR(act, kap->param, sizeof(*kap->param));
            WR_PROBE_OPT(act, kap->param, sizeof(*kap->param) / sizeof(int));
            kap->param->sched_curpriority = thp->priority;
            kap->param->sched_priority = thp->real_priority;
        }

        // Lock kernel to change status
        lock_kernel();
        SETKSTATUS(act,thp->policy);
        return ENOERROR;
    }

Exit Kernel
Exit kernel to run user-space thread
  > Note that the currently scheduled thread may have changed
      Ex: entered kernel due to HW interrupt, interrupt readied high-prio thread (that high-prio thread becomes RUNNING)
      Ex: Blocking kernel call causes current thread to be blocked, another to be made RUNNING
  > __ker_exit implements this
Adjust the address space if needed
  > memmgr.aspace()
Do special return processing
  > Deliver signals, pulses etc
  > This may cause a reschedule which could cause another loop through __ker_exit
Restore the context of the (newly) active thread
Call SYSEXIT

What about “Non Kernel” System Calls?
In many cases, traditional UNIX system calls are not implemented by the microkernel on QNX.
  > They are implemented in the process manager or in external servers that extend procnto
In general, many of the lengthier core POSIX operations are done by the process manager

Process Manager
First process in system
  > Created by kernel (init_objects)
Provides core services to other processes
Multi-threaded process
  > First <ncpus> threads are IDLE threads
  > Additional threads are threadpool worker threads
Message driven server
Actually a collection of (almost) independent servers
  4 message handlers
  11(!) resource managers
  > These resource managers are actually mini filesystems.

Process Manager
Message Handlers (diagram label: SYSMGR_COID)
  proc/rsrcdbmgr_*
  proc/sysmgr_*
  procmgr/*
  memmgr/*
Resource Managers (pathmgr/*)
  /dev/mem
  /dev/null
  /dev/text
  /dev/tty
  /dev/zero
  /dev/shmem
  /dev/tymem
  /dev/sem
  /proc/boot
  /proc
  /

Process Manager
Normal process… but it has certain privileges
  > Executes at higher processor privilege level
      This varies depending on processor architecture
  > Executes in kernel address space
      Not quite true
      Because proc’s address space and user address spaces don’t overlap, it may adopt a user’s address space. This makes for faster message passes between proc and user applications.
  > Has permission to use __Ring0() kernel call

Process Manager
__Ring0 Kernel Call
  > Used by proc when it needs to execute code in the kernel context
      Mostly used when manipulation of kernel structures is required
      Provides atomicity of kernel state modifications to ensure consistency
  > Occasionally used when processor privilege is required
      Ex: manipulate privileged CPU registers
  > Arguments are a function pointer and a data pointer
      Remember process manager shares address space with kernel
  > _NTO_PF_RING0 flag needed to use this kernel call
  > Only process manager has this flag set

Process Manager Example
The process manager implements many services which would actually be “kernel calls” in traditional UNIX
Example – mmap()
  > mmap() is the API through which all mappings are set up by user processes
  > malloc() uses mmap() to allocate heap memory, also known as “anonymous” memory, since it is not a mapping of a named object.
  > Not implemented as a kernel call, but rather a message that is sent to the process manager

mmap()

    void *_mmap(void *addr, size_t len, int prot, int flags, int fd, off64_t off,
                unsigned align, unsigned preload, void **base, size_t *size) {
        mem_map_t msg;

        // Type of operation
        msg.i.type = _MEM_MAP;
        msg.i.zero = 0;
        // Parameters
        msg.i.addr = (uintptr_t)addr;
        msg.i.len = len;
        msg.i.prot = prot;
        msg.i.flags = flags;
        msg.i.fd = fd;
        msg.i.offset = off;
        msg.i.align = align;
        msg.i.preload = preload;
        msg.i.reserved1 = 0;
        // Send message to procnto requesting operation be done
        if(MsgSendnc(MEMMGR_COID, &msg.i, sizeof msg.i, &msg.o, sizeof msg.o) == -1) {
            return MAP_FAILED;
        }

memmgr_handler()
The _MEM_MAP message type is picked up and passed to the memmgr message handler

    switch(msg->type) {
    …
    case _MEM_MAP:
        proc_wlock_adp(prp);
        status = memmgr_map(ctp, prp, &msg->map);
        break;
    case _MEM_CTRL:
        proc_wlock_adp(prp);
        status = memmgr_ctrl(prp, &msg->ctrl);
        break;

Process Manager
User Process                      Process Manager
malloc()
mmap()
MsgSendv()   ---- _MEM_MAP ---->  MsgReceivev()
                                  memmgr_map()
                                  vmm_mmap()
                                  map_create()
                                  pa_alloc()
                                  pte_manipulate()
return msg.o.addr;   <----------  MsgReplyv()

Other Process Manager Services
Creating processes!
  spawn() sends a _PROC_SPAWN message to create a new process
  The exec() ‘system call’ is actually a spawn message with an SPAWN_EXEC flag set!
  The fork() ‘system call’ is a _PROC_FORK message
Procfs debug filesystem
  > Similar to unix procfs
  > Used by debugger/pidin/ps

Ongoing kernel development
The kernel team is currently working on our next release
  > Codename “trinity 2”
Features include:
  > Memory management enhancements such as variable page support (~15% improvement in system performance)
  > POSIX PSE52 certification
  > PPC 9xx processor support
  > ARMv6 support
  > Cross-endian QNET capabilities
Trinity 2 is currently feature complete – bugfixing/release process underway
  > Builds available on foundry27: http://community.qnx.com/sf/wiki/do/viewPage/projects.core_os/wiki/Trinity2

Roadmap – QNX Source Postings
Source Bundle         Release Date   Description
Networking            Nov 2007       Next Generation Networking stack, protocols, drivers (io-pkt)
Block Filesystems     March 2008     Block Filesystems and Utilities
Flash Filesystems     March 2008     Flash (NOR/NAND) Filesystems and Utilities
Network Filesystems   March 2008     Block/Flash/Network Filesystems and Utilities
Devices and Drivers   June 2008      Serial, Audio, USB, PCI frameworks and drivers
System Services       June 2008      Additional system service managers
Window Systems        Sept 2008      High level Photon server and services
Graphics System       Sept 2008      Lower level graphics libraries and drivers
Multimedia            Nov 2008       Full multimedia stack

Brainteasers
Need something to chew on?
  > Try to figure out the questions below and post the answer to the OS_Tech forum on the OS project
  > Prize for the first to answer each question
  > QNX employees not eligible
1. What does STI expand to in the MIPS kernel?
2. NEED_PREEMPT(act) checks queued_event_priority. What sets queued_event_priority?
3. In the memmgr message handler, what is the purpose of “proc_wlock_adp(prp);”?

Want to learn more?
Check out the projects on www.foundry27.com
Download the QNX Momentics suite on www.qnx.com
Download the microkernel source from the QNX operating system project
Read the tech articles and wiki pages (linked off the project)
Participate in the forums on the QNX operating system project

Questions?
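
Appendix sketch: the two-stage kernel operation from the Microkernel Pre-emption slide (do the work unlocked, then lock and commit, restarting from the top if preempted) can be modelled in a few lines of portable C. This is a toy under loud assumptions: the toy_kernel structure, the toy_* functions and the single simulated preemption point are all hypothetical, and the real kernel restarts by re-executing the system call instruction rather than looping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for global kernel state; all names here are hypothetical. */
struct toy_kernel {
    long committed_total;   /* global state: written only while locked */
    int  pending_preempts;  /* how many simulated preemptions remain   */
    bool locked;
};

/* Simulates an interrupt readying a higher-priority thread. */
static bool toy_preempted(struct toy_kernel *k)
{
    if (k->pending_preempts > 0) {
        k->pending_preempts--;
        return true;
    }
    return false;
}

/*
 * Adds the sum of vals[0..n-1] to the global total.
 * Returns how many times the call had to be restarted.
 */
int toy_kercall_add(struct toy_kernel *k, const int *vals, size_t n)
{
    int restarts = 0;

    assert(k != NULL && (vals != NULL || n == 0));
    for (;;) {
        /* Stage 1: unlocked and preemptable. Only locals are written,
         * so discarding this work and starting over is harmless. */
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += vals[i];

        if (toy_preempted(k)) {
            restarts++;
            continue;       /* restart the whole call from the top */
        }

        /* Stage 2: lock, then commit the result to global state. */
        k->locked = true;
        k->committed_total += sum;
        k->locked = false;
        return restarts;
    }
}
```

With two simulated preemptions pending and the values 1, 2 and 3, the call restarts twice yet the total ends at 6 rather than 18. Because nothing global is written before the lock, restarted work is never applied more than once, which is why the slide requires everything done before locking to be restartable.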