Improving the Reliability of
Commodity Operating Systems
Introduction
Nooks
Allows existing OS extensions to execute safely
in commodity kernels
Use lightweight kernel protection domains
Restricted write access to kernel memory
Track and validate all modifications to kernel data
structures
Motivation
Computer reliability is an unsolved problem
Cost of failures continues to rise
OS extensions have become prevalent
70% of Linux kernel code
35,000 drivers on Windows XP
Written by people who are less experienced in
kernel organization
Motivation
Extensions are the leading cause of failures
In Windows XP, drivers cause 85% of failures
In Linux, device drivers introduce 7x more errors than
the rest of the kernel
Extended OS cannot be tested completely
Nooks Approach
Target existing extension architecture
Use conventional C instead of type-safe
languages
Aim to reduce the number of crashes due
to drivers and extensions
Prototype implemented in Linux
Showed graceful recovery for 99% of fault
injections
Related Work
Hardware approaches
Capability-based architectures
Recovery difficult for shared resources
Segment architectures
Difficult to program
New OS structures
Microkernels
Good fault isolation
Rebooting required to restart services
Related Work
Transaction-based systems
Works well for file systems
Language-based approaches
Limited applicability
Architecture
Core principles
Design for fault resistance, not fault tolerance
Prevent and recover from most, not all
Design for mistakes, not abuse
Extensions are generally well-behaved (not
malicious)
Can explore the design space between unprotected
and safe
Architecture
Implications
+ Can define an architecture that supports
existing drivers with moderate performance
costs
- Malicious code can bypass these mechanisms
Goals
Isolation of kernel from extension failures
Need to detect failures before they spread
Automatic recovery from failures
Backward compatibility
Functions
Reliability layer inserted between the
extensions and the OS kernel
Intercepts all interactions between the
extensions and the OS kernel
Major functions
Isolation
Interposition
Object tracking
Recovery
Isolation
Lightweight kernel protection domain
Write access to a limited portion of the kernel’s
address space
Major tasks
Creation, manipulation, and maintenance of
lightweight kernel protection domains
Inter-domain control transfer
Isolation
Extension procedure call (XPC)
Similar to lightweight RPC
Assume trusted interactions
Asymmetric relationship
Kernel has more privileges
Interposition
The Nooks interposition mechanisms
Make sure that
All control flows between the kernel and extensions
are through the XPC mechanism
All data flows between the kernel and extensions are
managed by Nooks’ object-tracking code
Extensions and the kernel communicate
through wrapper stubs
Object Tracking
Maintains a list of kernel data structures
that are manipulated by an extension
Controls all modifications to those
structures
Provides object info for cleanup when an
extension fails
Object Tracking
An object must be copied into an
extension before it is modified
Object tracking code verifies the type and
accessibility of each parameter being
passed
Recovery
Nooks detects software faults
When kernel services are invoked incorrectly
When an extension consumes too many
resources
Actions
Return to the extension
Generate an error code
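The software-fault checks on this slide can be pictured with a small user-space model. This is only a sketch in Python, not Nooks code: `DomainQuota`, `checked_alloc`, and the 4096-byte limit are hypothetical names and values chosen for illustration. It shows a service rejecting a bad argument or an over-quota request by returning an error code to the extension instead of failing.

```python
import errno

class DomainQuota:
    """Tracks how much memory one extension has been granted (hypothetical)."""
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

def checked_alloc(quota, size):
    """Model of a kernel service call made on behalf of an extension.

    Returns 0 on success or a negative errno value, following the
    Linux kernel convention for error codes.
    """
    # Fault 1: the service is invoked incorrectly (bad parameter).
    if not isinstance(size, int) or size <= 0:
        return -errno.EINVAL
    # Fault 2: the extension would consume too many resources.
    if quota.used_bytes + size > quota.limit_bytes:
        return -errno.ENOMEM
    quota.used_bytes += size
    return 0

quota = DomainQuota(limit_bytes=4096)
results = [
    checked_alloc(quota, 1024),   # valid request
    checked_alloc(quota, -1),     # invalid size
    checked_alloc(quota, 8192),   # exceeds the quota
]
```

In this model the extension keeps running after each rejected call and only sees the error code.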
Recovery
Nooks detects hardware faults
Processor raises an exception during extension
execution
Attempts to read unmapped memory
Write memory outside of its protection domain
A user or a program can trigger Nooks
recovery explicitly
Recovery
Since extensions are decoupled from
kernel, Nooks can freely release
extension-held kernel structures, such as
objects or locks, during the recovery
process
Architecture
[Figure: Nooks architecture diagram. Applications: Apache web server, Navigator web browser, Quake3D video game. Operating system kernel: memory management, file system, networking. Nooks kernel runtime with a network nook and a video nook, each with a per-nook runtime. Drivers: TCP/IP, Ethernet, video, SCSI. Hardware: Ethernet card, video card, SCSI controller card.]
Implementation
Linux 2.4.18
Worst-case target
18 months of development
22,000 lines of Nooks code (vs. 2.4 million lines
of Linux code and 50 million lines of Windows
2003 code)
Isolation
Two parts
Memory management
Extension procedure call
Memory Management
Kernel has read-write access to the entire
address space
Each extension is restricted to read-only
kernel access and read-write access to its
local domain
Nooks maintains a copy of the kernel page
table for each domain
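As a rough illustration of the per-domain page tables described on this slide, the sketch below models a page table as a Python dictionary from page name to permission. The page names and the helpers `make_extension_table` and `write_page` are assumptions made for the example; real page tables are hardware structures, and Nooks' actual data layout is not given in the slides.

```python
class ProtectionFault(Exception):
    """Raised when a domain writes to a page it may only read."""

def make_extension_table(kernel_table, local_pages):
    """Derive an extension's page table from the kernel's.

    Every kernel mapping is copied but downgraded to read-only ("r");
    only the extension's own pages are mapped read-write ("rw").
    """
    table = {page: "r" for page in kernel_table}
    for page in local_pages:
        table[page] = "rw"
    return table

def write_page(table, page):
    """Allow the write only if the given table maps the page read-write."""
    if table.get(page) != "rw":
        raise ProtectionFault(page)
    return True

# Hypothetical page names; the kernel maps everything read-write.
kernel_table = {"kernel_data": "rw", "net_heap": "rw", "video_heap": "rw"}
net_table = make_extension_table(kernel_table, local_pages=["net_heap"])
```

With these tables, `write_page(net_table, "net_heap")` succeeds, while `write_page(net_table, "kernel_data")` raises `ProtectionFault`, which is the kind of event the recovery slides treat as a detected fault.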
Memory Management
Changing protection domains is not as
costly as changing processes
Protection domains share kernel address space
Extension Procedure Call
Transparent to both the kernel and its
extensions
Managed by two functions
nooks_driver_call(func_ptr, arg_list, domain)
nooks_kernel_call(func_ptr, arg_list, domain)
Deferred call mechanisms available
Useful for network drivers to queue up packets
and perform bulk transfers
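The two XPC entry points and the deferred-call idea can be modeled in user space as follows. This is a sketch under stated assumptions, not the Nooks implementation: only the names `nooks_driver_call` and `nooks_kernel_call` and their `(func_ptr, arg_list, domain)` parameters come from the slide. The domain stack, the switch counter, the `DeferredCalls` class, and the reading of `domain` in `nooks_kernel_call` as "the calling extension" are all hypothetical.

```python
from contextlib import contextmanager

KERNEL = "kernel"
domain_stack = [KERNEL]        # top of the stack is the running domain
stats = {"switches": 0}        # counts domain changes, for illustration

@contextmanager
def _run_in(domain):
    """Switch to `domain` for the duration of a call, then switch back."""
    domain_stack.append(domain)
    stats["switches"] += 1
    try:
        yield
    finally:
        domain_stack.pop()

def nooks_driver_call(func_ptr, arg_list, domain):
    """Kernel to extension: run func_ptr inside the extension's domain."""
    with _run_in(domain):
        return func_ptr(*arg_list)

def nooks_kernel_call(func_ptr, arg_list, domain):
    """Extension to kernel: `domain` names the caller; run in the kernel."""
    with _run_in(KERNEL):
        return func_ptr(*arg_list)

class DeferredCalls:
    """Queue calls and deliver them as one batch with a single switch."""
    def __init__(self, domain):
        self.domain = domain
        self.pending = []

    def defer(self, func_ptr, arg_list):
        self.pending.append((func_ptr, arg_list))

    def flush(self):
        batch, self.pending = self.pending, []

        def run_batch():
            return [func(*args) for func, args in batch]

        return nooks_driver_call(run_batch, [], self.domain)

def send_packet(n):
    # Hypothetical driver entry point; reports the domain it ran in.
    return (domain_stack[-1], n)
```

Queuing three `send_packet` calls and flushing them costs one domain switch in this model instead of three, which is the benefit the slide attributes to deferred calls for network drivers.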
Changes to Linux Kernel
Maintain coherency between the kernel
and extension page tables
Detect exceptions that occur within
Nooks’ protection domains
Locate tasks that are no longer collocated
on the kernel stack due to isolation
Interposition
Provides wrapper stubs between
extensions and the kernel
Transparent to the kernel and drivers
Kernel modifications
Modify the standard module loader to bind extensions
to wrappers instead of kernel functions
Module initialization is changed to interpose on the
initialization call into extensions
Interposition
Some data references are interposed
Certain objects are linked directly into the
extension for reading
Kernel modification calls are wrapped
Performance-critical data structures
Shadow objects in the extension that are
synchronized before and after XPCs
Otherwise, just XPCs
Wrappers
Within the kernel’s protection domain
Three basic tasks
Check parameters for validity
Create a copy of kernel objects in the
extension’s protection domain
No serialization/deserialization necessary
Synchronization code placed in wrappers
Perform an XPC into the kernel or extension
Automatically generated
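The three wrapper tasks listed on this slide can be sketched as a small Python model. Everything here is illustrative: `make_wrapper`, `driver_open`, and `buggy_open` are hypothetical names, kernel objects are modeled as dictionaries, the XPC is modeled as a plain function call, and the policy of copying changes back only after a normal return is an assumption of the sketch.

```python
import copy

class InvalidParameter(Exception):
    """Raised when a wrapper rejects an argument before the XPC."""

def make_wrapper(extension_func, required_fields):
    """Build a wrapper stub around one extension entry point."""
    def wrapper(kernel_obj):
        # Task 1: check the parameter for validity.
        if not isinstance(kernel_obj, dict) or not required_fields.issubset(kernel_obj):
            raise InvalidParameter(repr(kernel_obj))
        # Task 2: copy the kernel object into the extension's domain.
        ext_copy = copy.deepcopy(kernel_obj)
        # Task 3: perform the XPC (modeled here as a plain call).
        result = extension_func(ext_copy)
        # Synchronization (assumed policy): publish changes to the kernel
        # object only after the extension returns normally.
        kernel_obj.clear()
        kernel_obj.update(ext_copy)
        return result
    return wrapper

def driver_open(dev):
    # Hypothetical extension function that updates device state.
    dev["open_count"] += 1
    return 0

def buggy_open(dev):
    # Corrupts its private copy, then faults before returning.
    dev["open_count"] = None
    raise RuntimeError("extension fault")
```

Because the extension only ever touches its own copy, a fault inside `buggy_open` leaves the kernel's version of the object unchanged in this model.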
Wrapper Code Sharing
50% of Nooks code base
Shared among multiple drivers
Object Tracking
Supports 43 kernel object types
Records the addresses of all objects in
use by an extension
Records the association between the
kernel and the extension versions of
writable objects
Performs garbage collection
Determines whether to copy an object
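A toy version of the bookkeeping on this slide might look like the following. It is a sketch only: the `ObjectTracker` class and its methods are hypothetical, objects are keyed by Python's `id()` in place of kernel addresses, and garbage collection is reduced to an explicit `release` call.

```python
import copy

class ObjectTracker:
    """Per-extension table of kernel objects the extension is using."""

    def __init__(self):
        # id(kernel object) -> (kernel object, extension copy or None)
        self._table = {}

    def track(self, kernel_obj, writable):
        """Record an object; copy it only if the extension may write it."""
        key = id(kernel_obj)
        if key not in self._table:
            ext_version = copy.deepcopy(kernel_obj) if writable else None
            self._table[key] = (kernel_obj, ext_version)
        _, ext_version = self._table[key]
        # Read-only objects are used in place; writable ones via the copy.
        return ext_version if ext_version is not None else kernel_obj

    def release(self, kernel_obj):
        """Forget an object once the extension is done with it."""
        self._table.pop(id(kernel_obj), None)

    def in_use(self):
        """All kernel objects still held, e.g. for cleanup after a fault."""
        return [kernel_obj for kernel_obj, _ in self._table.values()]
```

Tracking the same writable object twice returns the same extension copy, which models the recorded association between the kernel and extension versions.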
Recovery
Recovery manager releases resources
Unloading the extension
Releasing its kernel and physical resources
Reloading and restarting the extension
User-mode agent coordinates recovery
Each object is associated with a recovery
function
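The recovery sequence on this slide can be sketched as a short Python model. The function `recover`, the object type names, and the choice to release objects in reverse order of acquisition are assumptions made for illustration; the slides only state that each object has an associated recovery function and that the extension is unloaded, released, reloaded, and restarted.

```python
def recover(extension, held_objects, recovery_functions, log):
    """Model of the recovery sequence for one failed extension.

    held_objects: list of (type_name, object) pairs from object tracking.
    recovery_functions: type_name -> function that releases such an object.
    """
    log.append(("unload", extension))
    # Release in reverse order of acquisition (an assumption of this sketch).
    for type_name, obj in reversed(held_objects):
        recovery_functions[type_name](obj)
        log.append(("release", type_name))
    held_objects.clear()
    log.append(("reload", extension))
    log.append(("restart", extension))
    return log

# Hypothetical kernel state and per-type recovery functions.
locks_held = {"tx_lock"}
buffers = {"buf0", "buf1"}
recovery_functions = {
    "lock": locks_held.discard,    # drop a lock the extension held
    "buffer": buffers.discard,     # free a buffer the extension held
}
held = [("lock", "tx_lock"), ("buffer", "buf0")]
log = recover("e1000", held, recovery_functions, [])
```

After the call, the lock and the one buffer held by the failed extension are gone, while `buf1`, which it never held, is untouched.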
Implementation Limitations
Nooks does not handle all possible errors
Deliberate corruptions of system states
Infinite loops
However, a moderate reduction of system
crashes is a significant contribution
Achieving Transparency
Wrapper stubs for every call in the
extension-kernel interface
Object-tracking code for every object type
that is passed between the extension and
the kernel
Nooks transparent to both the extension
and the kernel
Reliability
Nooks can detect and recover from 99% of
extension faults
Test Methodology
Synthetic fault injection
Automatically changes single instructions in the
extension code to emulate common errors
Uninitialized variables
Bad parameters
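To make the single-instruction mutation idea concrete, here is a toy model in Python. It is not the injector used in the experiments: the instruction format, the `inject_fault` function, and the two mutation rules are hypothetical stand-ins for changes made to real machine instructions.

```python
import random

def inject_fault(program, rng):
    """Return a copy of `program` with exactly one instruction changed."""
    mutated = list(program)
    index = rng.randrange(len(mutated))
    op, operand = mutated[index]
    if op == "init":
        # Dropping an initialization emulates an uninitialized variable.
        mutated[index] = ("nop", operand)
    else:
        # Corrupting an operand emulates a bad parameter.
        mutated[index] = (op, operand + "_corrupt")
    return mutated, index

# Hypothetical three-instruction "extension".
program = [("init", "len"), ("load", "buf"), ("call", "send")]
mutated, index = inject_fault(program, random.Random(0))
changed = [i for i, (a, b) in enumerate(zip(program, mutated)) if a != b]
```

Seeding the random generator makes each trial repeatable, which helps when the same fault has to be run against both the unmodified and the protected system.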
Types of Extensions Isolated
Device drivers (network, sound cards)
Optional kernel subsystems (VFAT)
Application-specific kernel extensions
(kHTTPd)
Test Environment
VMware
Allows automation of crash testing without
reboots
5 extensions
400 tests each
Test Results
Not all fault-injection trials cause faulty
behavior
System Crashes
A system crash is easiest to detect
OS panics
Hangs
Reboots
Linux experienced 317 crashes
Nooks eliminated 313 crashes, or 99%
4 deadlocks
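The 99% figure follows directly from the counts on this slide; the short calculation below just makes the rounding explicit. The variable names are ours, and the total trial count combines the "5 extensions" and "400 tests each" figures from the Test Environment slide.

```python
total_trials = 5 * 400     # 5 extensions, 400 tests each
linux_crashes = 317        # crashes observed on unmodified Linux
eliminated = 313           # crashes that did not occur with Nooks

remaining = linux_crashes - eliminated
elimination_rate = eliminated / linux_crashes
summary = f"{eliminated}/{linux_crashes} = {elimination_rate:.1%}, {remaining} remaining"
```

Here `summary` evaluates to "313/317 = 98.7%, 4 remaining", which the slides round to 99%; the 4 remaining crashes are the deadlocks listed above.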
System Crashes
SoundBlaster and VFAT extensions are
process-oriented
Fewer crashes
kHTTPd, pcnet32, e1000 are interrupt-based
More crashes
Non-Fatal Extension Failures
Nooks cannot detect all erroneous extension
behaviors
Network could disappear
Mounted file system hangs
Recovery Errors
A faulting extension is unloaded, reloaded,
and restarted
Works well with kHTTPd
Not as well with VFAT
Corruptions can propagate to disk if not detected in
time
Summary of Reliability Experiments
Nooks eliminated 99% of the system
crashes caused by extensions
Nooks eliminated nearly 60% of non-fatal
extension failures
Performance
Dell 1.7 GHz Pentium 4
890 MB of RAM
SoundBlaster 16
Intel Pro/1000 Gb Ethernet Adapter
7200 RPM, 41 GB IDE HD
Linux 2.4.18
Sound Benchmark
Plays an MP3 file at 128 Kb/sec
150 XPCs/sec
Nooks imposes little overhead
Network Benchmark
netperf performance tool
A node sends/receives a stream of 32 KB
TCP messages via a 256KB buffer
10% overhead
Compile Benchmark
Linux kernel compilation on VFAT
25% slowdown
Web Server Benchmarks
httperf
Repeatedly request a 1-KB file and measure
the maximum request rate
60% slowdown
CPU bound
SPECweb99
3% slowdown
Summary
If the computation is not CPU bound, the
penalty may not be important
Conclusions
Nooks is achievable with modest
engineering effort
Extensions such as device drivers can be
isolated without changes to extension
code
Isolation and recovery can dramatically
improve the system’s ability to survive
extension faults