Virtualization in Linux
Atul Bansal, Manish Pal, Pulkit Gambhir
Virtualization in a nutshell
- Virtualization: running multiple machines on a single piece of hardware
- The “real” hardware is invisible to the OS
- The OS sees only an abstracted picture
- Only the Virtual Machine Monitor (VMM) talks to the hardware

More formally …
A framework for dividing the resources of a machine into multiple execution environments, using techniques such as:
1. H/W & S/W partitioning
2. Time-sharing
3. Partial or complete machine simulation
4. Emulation …
In general, we implement an M-N mapping (M: real resources, N: virtual resources), e.g. multitasking (1-N) or cluster computing (M-1).
Motivations
- Save on costs: many servers -> 1 machine
- Running legacy software
- Improve security: protect data by placing it on separate virtual machines
- Development and debugging platform
More motivations
- Hardware independence of code (Java VM)
- Compatibility issues tackled
- Server migration is eased
- Error and attack containment
- Dynamic resource sharing
- It looks cool too…

Issues in virtualization
- Some interfaces were not designed with virtualization in mind (e.g. processor privileges)
  - The VMM needs to execute all privileged instructions on behalf of the guests
- Need for an extra level of memory segmentation (between virtual machines)
  - Done entirely by the VMM; guest OSes see only an abstraction of the page table, not the real one
Issues in virtualization
- Resource sharing
  - Map all VM requests to the same network card, the same DMA controller, etc.
- Design and management of communication between different virtual machines
- Need to show abstracted hardware which has no physical equivalent (emulation)
And despite all that …
- Need transparency and near-real-machine performance!
- An extremely hard task (Bochs is a very good example of less-than-perfect performance)

Example: VMware architecture
Case Study: Xen
Virtual Architecture
Basic virtual architecture of Xen:
- CPU state
- Exceptions
- Interrupt handling
- Time
- Memory
- Devices
CPU State
- Xen provides each guest OS with virtual CPUs (only 1 real CPU).
- All privileged CPU state is handled by Xen.
- Guest OSes are not permitted to perform privileged operations.
- A hypercall interface is provided so that a guest OS can execute privileged operations on the CPU through Xen.

Hypercalls
- Analogous to the system calls provided by any OS, except that the software interrupt handler vectors to an entry point within Xen.
- Even to set up its interrupt vector table, the OS must invoke Xen hypercalls.
- Basically, any privileged operation on the CPU is performed through a hypercall to Xen.
Virtual IDT
- A virtual IDT is provided to the guest OS for setting up its interrupt vector table.
- A guest OS can submit a table of trap handlers to Xen via the set_trap_table hypercall.
- The exception stack frame presented to a virtual trap handler is identical to its native equivalent.
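The idea of submitting a table of trap handlers can be illustrated with a small simulation. This is only a sketch under stated assumptions: the entry layout `demo_trap_info`, the NULL-terminated table convention, and the functions `demo_set_trap_table` and `demo_deliver_trap` are hypothetical stand-ins, not Xen's real definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_VECTORS 256

/* Hypothetical guest handler type for this sketch. */
typedef void (*demo_handler_t)(int vector);

/* Hypothetical, simplified trap-table entry; not Xen's real trap_info_t. */
struct demo_trap_info {
    unsigned char  vector;   /* exception or interrupt vector number */
    demo_handler_t handler;  /* guest entry point; NULL ends the table */
};

/* Simulated hypervisor state: the virtual IDT of a single guest. */
static demo_handler_t demo_virtual_idt[DEMO_NUM_VECTORS];

/* Stand-in for the set_trap_table hypercall: copy the guest's table
 * into the virtual IDT. Returns the number of entries installed. */
int demo_set_trap_table(const struct demo_trap_info *table)
{
    int installed = 0;
    assert(table != NULL);
    for (; table->handler != NULL; table++) {
        demo_virtual_idt[table->vector] = table->handler;
        installed++;
    }
    return installed;
}

/* Simulated trap delivery: returns 1 if a guest handler ran, else 0. */
int demo_deliver_trap(int vector)
{
    if (vector < 0 || vector >= DEMO_NUM_VECTORS ||
        demo_virtual_idt[vector] == NULL)
        return 0;
    demo_virtual_idt[vector](vector);
    return 1;
}

/* Example guest handler that records the last vector it saw. */
int demo_last_vector = -1;
void demo_guest_handler(int vector) { demo_last_vector = vector; }
```

A guest in this model builds an array of entries ending with a NULL handler, passes it to `demo_set_trap_table`, and from then on the simulated hypervisor routes each trap to the registered handler.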

Interrupt Handling
- Interrupts are virtualized by mapping them to event channels.
- They are delivered to the guest OS using a callback supplied via the set_callbacks hypercall.
- The guest OS can map these events onto its standard interrupt dispatch mechanisms.
- Xen is responsible for determining the guest OS that will handle each physical interrupt source.
Time
- Time is important in virtualization, as a guest OS needs to be aware of both ‘real time’ and ‘virtual time’ (time of execution).
- Xen exports timestamps for system time and wall-clock time to guest operating systems through a shared page of memory.
Time Consistency
- All timestamps need to be updated and read atomically.
- Xen stores a version number in the shared info page, which is incremented before and after updating the timestamps.
- A guest can be sure that it read a consistent state by checking that the version numbers it read before and after the timestamps are equal.
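The version-number protocol above can be sketched as follows. The struct `demo_time_info`, its field names and units, and both functions are assumptions made for illustration; the real shared info page is laid out differently. Because the counter is incremented both before and after an update, an odd value means an update is in progress, so this sketch also retries on odd versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the time fields of the shared
 * info page; the real layout in Xen differs. */
struct demo_time_info {
    volatile uint32_t version;      /* bumped before and after each update */
    volatile uint64_t system_time;  /* assumed unit: nanoseconds since boot */
    volatile uint64_t wallclock;    /* assumed unit: seconds since the epoch */
};

/* Hypervisor side: the version is odd while an update is in progress. */
void demo_update_time(struct demo_time_info *t, uint64_t sys, uint64_t wall)
{
    assert(t != NULL);
    t->version++;            /* now odd: readers must retry */
    t->system_time = sys;
    t->wallclock   = wall;
    t->version++;            /* even again: the snapshot is stable */
}

/* Guest side: retry until the version is even and unchanged across the
 * reads. A real implementation also needs memory barriers between the
 * version reads and the timestamp reads. */
void demo_read_time(const struct demo_time_info *t,
                    uint64_t *sys, uint64_t *wall)
{
    uint32_t before, after;
    assert(t != NULL && sys != NULL && wall != NULL);
    do {
        before = t->version;
        *sys   = t->system_time;
        *wall  = t->wallclock;
        after  = t->version;
    } while (before != after || (before & 1u));
}
```

The reader never takes a lock; it simply repeats the read if the writer was active, which keeps the hypervisor's update path from ever waiting on a guest.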

Event Channels
- Event channels are the basic primitive provided by Xen for event notifications.
- They are the Xen equivalent of a hardware interrupt.
- Each channel stores one bit of information; the event of interest is signaled by transitioning this bit from 0 to 1.
- Notifications are received by a guest via an upcall from Xen.
Event Channels (Implementation)
- The kernel shared info page (shared_info_t) contains two bitfields for event channels:
  unsigned long evtchn_pending[…..];
  unsigned long evtchn_mask[…..];
- These two specify, respectively, whether there is an event pending (evtchn_pending) and whether the event channel is masked (evtchn_mask).
- For masked channels, no events will be delivered.
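A minimal sketch of how the two bitfields interact is shown below. The array size `DEMO_NR_CHANNELS` is an assumed value (the slide elides the real one), and `demo_shared_info`, `demo_mask`, `demo_send_event`, and `demo_handle_event` are hypothetical names for this simulation, not Xen functions.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_NR_CHANNELS   128  /* assumed size; the source elides the real one */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NR_WORDS \
    ((DEMO_NR_CHANNELS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* Hypothetical cut-down version of the shared info page. */
struct demo_shared_info {
    unsigned long evtchn_pending[DEMO_NR_WORDS];
    unsigned long evtchn_mask[DEMO_NR_WORDS];
};

static int demo_test_bit(const unsigned long *map, unsigned port)
{
    return (int)((map[port / DEMO_BITS_PER_LONG] >>
                  (port % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Guest side: mask (masked != 0) or unmask (masked == 0) a channel. */
void demo_mask(struct demo_shared_info *s, unsigned port, int masked)
{
    unsigned long bit = 1UL << (port % DEMO_BITS_PER_LONG);
    assert(port < DEMO_NR_CHANNELS);
    if (masked)
        s->evtchn_mask[port / DEMO_BITS_PER_LONG] |= bit;
    else
        s->evtchn_mask[port / DEMO_BITS_PER_LONG] &= ~bit;
}

/* Hypervisor side: signal an event by setting the pending bit.
 * Returns 1 if this was a 0 -> 1 transition on an unmasked channel,
 * i.e. the guest should receive an upcall now. */
int demo_send_event(struct demo_shared_info *s, unsigned port)
{
    assert(port < DEMO_NR_CHANNELS);
    int was_pending = demo_test_bit(s->evtchn_pending, port);
    s->evtchn_pending[port / DEMO_BITS_PER_LONG] |=
        1UL << (port % DEMO_BITS_PER_LONG);
    return !was_pending && !demo_test_bit(s->evtchn_mask, port);
}

/* Guest side: consume a pending, unmasked event. Returns 1 if handled. */
int demo_handle_event(struct demo_shared_info *s, unsigned port)
{
    assert(port < DEMO_NR_CHANNELS);
    if (!demo_test_bit(s->evtchn_pending, port) ||
        demo_test_bit(s->evtchn_mask, port))
        return 0;
    s->evtchn_pending[port / DEMO_BITS_PER_LONG] &=
        ~(1UL << (port % DEMO_BITS_PER_LONG));
    return 1;
}
```

Note that an event sent to a masked channel is not lost in this model: the pending bit stays set, and the guest can handle it after unmasking.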
Virtual CPU Setup
- Any guest OS needs to set up a virtual CPU on which it executes.
- This includes installing a vector table on the virtual IDT for handling interrupts, page faults, etc.
- The guest OS must set up a pair of hypervisor callbacks (notification and entry points for Xen).

Hypercalls for CPU Setup
set_callbacks(………………………..)
This hypercall allows a guest OS to set up the hypervisor callbacks.
set_trap_table(trap_info_t *table)
This hypercall allows a guest OS to set up its IDT.
A further hypercall is provided for the management of virtual CPUs:
vcpu_op(……..)
This hypercall can be used to bootstrap VCPUs, to bring them up and down, and to test their current status.
Start of Day
- The start-of-day environment for guest operating systems is different from that provided by the underlying hardware.
- The processor is already executing in protected mode with paging enabled.
- Domain 0 is created and booted by Xen itself.

Start of Day
- For all domains other than dom0, the analogue of the boot loader is the domain builder.
- The domain builder is user-space software running in domain 0.
- The domain builder is responsible for building the initial page tables for a domain and loading its kernel image at the appropriate virtual address.

Xen Scheduling
- Similar to traditional Linux schedulers that divide CPU time among userland processes, Xen schedules resources between VMs.
- It is like context switching between kernels.
- Xen includes kernel boot-time options for scheduling.

Scheduling Algorithms
- Atropos
  - Soft real-time scheduler
  - Guarantees absolute CPU shares
- Round Robin
  - Characterized by a “quantum” of time
- Borrowed Virtual Time
  - Proportional fair shares of CPU time
  - “Penalizes” domains that block often
  - ctx_allow: like the “quantum” above

Scheduling Algorithms
- sEDF
  - Provides weighted CPU sharing
  - Uses real-time algorithms to ensure time guarantees
  - Uses weights as well as slices and periods for scheduling and sharing
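The proportional-share idea behind weighted schedulers such as Borrowed Virtual Time can be sketched with a toy virtual-time scheduler. This is a deliberately simplified model, not Xen's BVT or sEDF: it has no blocking, no warping, and no ctx_allow, and the names `demo_domain`, `demo_pick_next`, `demo_schedule`, and `DEMO_QUANTUM` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-domain scheduling state for this sketch. */
struct demo_domain {
    unsigned      weight;  /* larger weight => larger CPU share */
    unsigned long vtime;   /* virtual time consumed so far */
    unsigned      runs;    /* number of quanta received */
};

#define DEMO_QUANTUM 1200UL  /* arbitrary units, chosen to divide evenly */

/* Pick the domain with the smallest virtual time (ties: lowest index). */
size_t demo_pick_next(const struct demo_domain *doms, size_t n)
{
    size_t best = 0;
    assert(doms != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (doms[i].vtime < doms[best].vtime)
            best = i;
    return best;
}

/* Make `quanta` scheduling decisions. Each time a domain runs, its
 * virtual time advances by quantum / weight, so heavily weighted
 * domains age more slowly and are picked more often. */
void demo_schedule(struct demo_domain *doms, size_t n, unsigned quanta)
{
    for (unsigned q = 0; q < quanta; q++) {
        size_t d = demo_pick_next(doms, n);
        assert(doms[d].weight > 0);
        doms[d].vtime += DEMO_QUANTUM / doms[d].weight;
        doms[d].runs++;
    }
}
```

With two domains weighted 2 and 1, six scheduling decisions in this model give the first domain four quanta and the second two, matching the 2:1 ratio of the weights.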

System Calls and Scheduling
Some scheduling system calls:
- nice()
- getpriority()
- setpriority()
- sched_getscheduler()
- sched_setscheduler()
- sched_getparam()
- sched_setparam()
- sched_yield()
- sched_get_priority_min()
- sched_get_priority_max()
- sched_rr_get_interval()
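A few of the calls listed above can be exercised from user space without special privileges. The wrapper names `demo_priority_range`, `demo_current_policy`, and `demo_current_nice` are hypothetical helpers for this sketch; the underlying calls are the standard Linux/POSIX ones.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/resource.h>

/* Query the static priority range of a scheduling policy.
 * Returns 0 on success, -1 if either query failed. */
int demo_priority_range(int policy, int *lo, int *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = sched_get_priority_min(policy);
    *hi = sched_get_priority_max(policy);
    return (*lo == -1 || *hi == -1) ? -1 : 0;
}

/* Return the scheduling policy of the calling process (pid 0 = self),
 * or -1 on error. */
int demo_current_policy(void)
{
    return sched_getscheduler(0);
}

/* Fetch the nice value of the calling process. getpriority() can
 * legitimately return -1, so errno must be cleared and checked. */
int demo_current_nice(int *nice_out)
{
    assert(nice_out != NULL);
    errno = 0;
    int prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        return -1;
    *nice_out = prio;
    return 0;
}
```

Changing a policy with sched_setscheduler() or raising priority with setpriority() usually requires elevated privileges, which is why this sketch only reads the current settings.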
Memory management
- Xen allocates physical memory to domains at page granularity.
- Domains may receive non-contiguous physical memory.
- Xen therefore makes a distinction between machine memory and pseudo-physical memory.
- Machine memory refers to the entire amount of memory installed in the machine.
- Pseudo-physical memory, on the other hand, is a per-domain abstraction.

Memory management
- Xen maintains a globally readable machine-to-physical table.
- Each domain is also supplied with a physical-to-machine table, which performs the inverse mapping.
- Architecture-dependent code in guest operating systems can then use the two tables to provide the abstraction of pseudo-physical memory.
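The relationship between the two tables can be illustrated with a small model. Everything here is an assumption for the sake of the example: the table size `DEMO_MACHINE_FRAMES`, the sentinel `DEMO_INVALID`, the struct `demo_domain_mem`, and all function names are hypothetical, and only a single domain is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MACHINE_FRAMES 16               /* assumed machine size, in pages */
#define DEMO_INVALID ((unsigned long)-1)     /* sentinel for "no mapping" */

/* Global machine-to-physical table: machine frame -> pseudo-physical frame. */
static unsigned long demo_m2p[DEMO_MACHINE_FRAMES];

/* Hypothetical per-domain state holding the physical-to-machine table. */
struct demo_domain_mem {
    unsigned long p2m[DEMO_MACHINE_FRAMES];
    size_t        nr_pages;
};

/* Mark every machine frame as unassigned. */
void demo_mem_init(void)
{
    for (size_t i = 0; i < DEMO_MACHINE_FRAMES; i++)
        demo_m2p[i] = DEMO_INVALID;
}

/* Give the domain one machine frame. It becomes the domain's next
 * pseudo-physical frame, so the domain sees contiguous memory even
 * when the machine frames are scattered. Returns the new
 * pseudo-physical frame number, or DEMO_INVALID on failure. */
unsigned long demo_assign_frame(struct demo_domain_mem *d, unsigned long mfn)
{
    assert(d != NULL);
    if (mfn >= DEMO_MACHINE_FRAMES || demo_m2p[mfn] != DEMO_INVALID ||
        d->nr_pages >= DEMO_MACHINE_FRAMES)
        return DEMO_INVALID;
    unsigned long pfn = d->nr_pages++;
    d->p2m[pfn]   = mfn;
    demo_m2p[mfn] = pfn;
    return pfn;
}

unsigned long demo_pfn_to_mfn(const struct demo_domain_mem *d, unsigned long pfn)
{
    return pfn < d->nr_pages ? d->p2m[pfn] : DEMO_INVALID;
}

unsigned long demo_mfn_to_pfn(unsigned long mfn)
{
    return mfn < DEMO_MACHINE_FRAMES ? demo_m2p[mfn] : DEMO_INVALID;
}
```

Assigning machine frames 9, 3, and 12 in that order yields pseudo-physical frames 0, 1, and 2, which is the sense in which pseudo-physical memory hides the non-contiguous allocation.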

Page Table Updates
- Guests are given read-only access to their page tables.
- The guest OS must explicitly request any modifications (through hypercalls).
- Xen validates all such requests and applies only the updates that it deems safe.
- This is necessary to prevent domains from adding arbitrary mappings to their page tables.
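The validate-then-apply step can be sketched as below. The two safety rules used here (a domain may only map frames it owns, and may not create a writable mapping of a page-table frame) are assumptions chosen for illustration, and the types and functions (`demo_frame`, `demo_pte`, `demo_set_frame`, `demo_update_pte`) are hypothetical rather than Xen's real interface.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FRAMES     8
#define DEMO_PT_ENTRIES 4

enum demo_frame_type { DEMO_FRAME_DATA, DEMO_FRAME_PAGE_TABLE };

/* Hypothetical per-frame bookkeeping kept by the hypervisor. */
struct demo_frame {
    int                  owner;  /* owning domain id */
    enum demo_frame_type type;
};

/* Hypothetical, simplified page-table entry. */
struct demo_pte {
    int      present;
    int      writable;
    unsigned mfn;      /* machine frame number being mapped */
};

static struct demo_frame demo_frames[DEMO_FRAMES];

/* Test/setup helper: record who owns a frame and what it is used for. */
void demo_set_frame(unsigned mfn, int owner, enum demo_frame_type type)
{
    assert(mfn < DEMO_FRAMES);
    demo_frames[mfn].owner = owner;
    demo_frames[mfn].type  = type;
}

/* Stand-in for a page-table update hypercall.
 * Returns 0 if the update was applied, -1 if it was rejected. */
int demo_update_pte(int domain, struct demo_pte *table, unsigned index,
                    struct demo_pte new_pte)
{
    assert(table != NULL);
    if (index >= DEMO_PT_ENTRIES)
        return -1;
    if (new_pte.present) {
        if (new_pte.mfn >= DEMO_FRAMES)
            return -1;                       /* no such machine frame */
        if (demo_frames[new_pte.mfn].owner != domain)
            return -1;                       /* another domain's memory */
        if (new_pte.writable &&
            demo_frames[new_pte.mfn].type == DEMO_FRAME_PAGE_TABLE)
            return -1;                       /* would bypass validation */
    }
    table[index] = new_pte;                  /* safe: apply the update */
    return 0;
}
```

Rejected requests leave the table untouched, so a guest can never observe a mapping that has not passed the checks.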

Writable Page Tables
- Guest OSes may request writable page tables as well.
- Xen must still validate modifications to ensure secure partitioning.
- Xen thus traps write attempts to certain memory pages.

Handling the trap
- Xen temporarily allows write access to that page while at the same time disconnecting it from the page table that is currently in use.
- The newly updated entries cannot be used by the MMU until Xen revalidates and reconnects the page.
- Reconnection occurs automatically later in a number of situations, e.g. when the domain is preempted.
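The trap, disconnect, write, revalidate, reconnect cycle can be modelled as a small state machine. This is a sketch under assumptions: the struct `demo_pt_page`, the functions `demo_guest_write` and `demo_revalidate`, and the validation rule (entries must fall within a permitted frame range) are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PT_ENTRIES 4
#define DEMO_EMPTY      0u  /* entry value meaning "not mapped" in this sketch */

/* Hypothetical model of one page-table page under writable page tables. */
struct demo_pt_page {
    int      connected;                 /* 1 if the MMU may walk this page */
    int      writable;                  /* 1 while the guest may write it */
    unsigned entries[DEMO_PT_ENTRIES];  /* machine frame numbers */
    unsigned traps;                     /* write faults taken so far */
};

/* Guest write to a page-table page. The first write traps: the page is
 * disconnected from the active page table and made writable. Later
 * writes go straight through until the page is revalidated. */
void demo_guest_write(struct demo_pt_page *p, size_t idx, unsigned mfn)
{
    assert(p != NULL && idx < DEMO_PT_ENTRIES);
    if (!p->writable) {
        p->traps++;
        p->connected = 0;
        p->writable  = 1;
    }
    p->entries[idx] = mfn;
}

/* Revalidate and reconnect the page, e.g. when the domain is preempted.
 * Assumed rule: the domain may only map frames in [first, last].
 * Invalid entries are cleared; returns how many were rejected. */
unsigned demo_revalidate(struct demo_pt_page *p, unsigned first, unsigned last)
{
    unsigned rejected = 0;
    assert(p != NULL);
    for (size_t i = 0; i < DEMO_PT_ENTRIES; i++) {
        unsigned mfn = p->entries[i];
        if (mfn != DEMO_EMPTY && (mfn < first || mfn > last)) {
            p->entries[i] = DEMO_EMPTY;
            rejected++;
        }
    }
    p->writable  = 0;  /* back to read-only for the guest */
    p->connected = 1;  /* the MMU may use the page again */
    return rejected;
}
```

The benefit this model tries to show is batching: several guest writes cost a single trap, and validation happens once when the page is reconnected.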

Shadow Page Tables
- Another type of page table
- The guest OS uses an independent copy of the page tables
- This copy is unknown to the hardware
- Xen propagates changes made to the guest's tables to the real ones, and vice versa.

VM assists
- Xen provides a number of “assists” for guest memory management.
- Hypercall used:
  vm_assist(unsigned int cmd, unsigned int type);
- The cmd parameter describes the action to be taken.
- The type parameter describes the kind of assist that is being referred to.

Conclusions
- Virtualization is a very exciting area
- Implementation issues still exist
- We are still moving toward real-machine-like performance
- With hardware-supported virtualization and multi-core, multi-threaded hardware, things are now looking very bright!

A quote to end it
“Would PhD virtualization be when several people get a PhD but only one is doing the work?”
JoshTriplett on Xen IRC