Operating Systems
Lecture 8: Memory Management
Physical memory | Abstraction: virtual memory
No protection | Each program isolated from all others and from the OS
Limited size | Illusion of infinite memory
Sharing visible to programs | Transparent -- can't tell if memory is shared
Easy to share data between programs | Ability to share code, data
Hardware support for protection
How is protection implemented?
Hardware support: address translation, dual-mode operation
Address translation
Address space: literally, all the addresses a program can touch. All the state that a program can
affect or be affected by.
Restrict what a program can do by restricting what it can touch!
Hardware translates every memory reference from virtual addresses to physical addresses; software
sets up and manages the mapping in the translation box.
[Figure: Address Translation in Modern Architectures (user mode, kernel mode)]
Two views of memory:
view from the CPU -- what program sees, virtual memory
view from memory -- physical memory
Translation box converts between the two views.
Translation helps implement protection because no way for program to even talk about other
program's addresses; no way for them to touch operating system code or data.
Translation can be implemented in any number of ways -- typically, by some form of table lookup
(we'll discuss various options for implementing the translation box later). Separate table for each
user address space.
Application can not modify its own translation table. If it could, could get access to all of physical
memory. Has to be restricted somehow. Dual-mode operation enables control.
when in the OS, can do anything (kernel-mode)
when in a user program, restricted to only touching that program's memory (user-mode)
Hardware requires CPU to be in kernel-mode to modify address translation tables.
OS runs in kernel mode (untranslated)
User programs run in user mode (translated)
Want to isolate each address space so its behavior can't do any harm, except to itself.
How do kernel and user interact?
Kernel -> user:
To run a user program,
create a process/thread to:
allocate and initialize address space control block
read program off disk and store in memory
allocate and initialize translation table (point to program memory)
run program (or to return to user level after calling the kernel):
set machine registers
set hardware pointer to translation table
set processor status word (user vs. kernel)
jump to start of program
User-> Kernel:
How does the user program get back into the kernel?
Voluntarily user->kernel: System call -- special instruction to jump to a specific operating system
handler. Just like doing a procedure call into the operating system kernel -- program asks OS
kernel, please do something on the program's behalf.
Can the user program call any routine in the OS? No. Just the specific ones the OS says are ok. Always
start running handler at same place, otherwise, problems!
Involuntarily user->kernel: Hardware interrupt, also program exception such as bus error,
segmentation fault, page fault
On system call, interrupt, or exception: hardware atomically
sets processor status to kernel
changes execution stack to kernel
saves current program counter
jumps to handler in kernel
handler saves previous state of any registers it uses
How does the system call pass arguments?
a. Use registers
b. Write into user memory, kernel copies into its memory
Catch with (b): user addresses are translated, kernel addresses are untranslated, so the kernel
cannot use a user address directly; it has to translate it first.
Base and Bounds
Base and bounds: Each program is loaded into a contiguous region of physical memory, but with
protection between programs. Used, for example, in the Cray-1.
[Figure: Hardware Implementation of Base and Bounds Translation (out-of-bounds reference -> trap; addressing error)]
Program has illusion it is running on its own dedicated machine, with memory starting at 0 and
going up to size = bounds. Like linker-loader, program gets contiguous region of memory. But
unlike linker-loader, protection: program can only touch locations in physical memory between
base and base + bounds.
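A minimal sketch of the check-and-relocate step in Python. The register values `BASE` and `BOUNDS` and the `AddressingError` exception are made up for illustration; real hardware does this with a comparator and an adder on every memory reference.

```python
class AddressingError(Exception):
    """Hypothetical stand-in for the hardware trap on a bad reference."""


def translate_base_bounds(virtual_addr: int, base: int, bounds: int) -> int:
    # Comparator: program addresses run from 0 up to (but not including) bounds.
    if not 0 <= virtual_addr < bounds:
        raise AddressingError(f"address {virtual_addr:#x} outside bounds {bounds:#x}")
    # Adder: relocate by the base register.
    return base + virtual_addr


# Assumed register contents for one process.
BASE, BOUNDS = 0x4000, 0x1000

print(hex(translate_base_bounds(0x0123, BASE, BOUNDS)))  # 0x4123
```

Moving a program then amounts to copying its bytes and changing `BASE`; the program's own addresses stay the same.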
[Figure: Virtual and Physical Memory Views in Base and Bounds System (logical address space, physical address space)]
Provides level of indirection: OS can move bits around behind the program's back, for instance, if
program needs to grow beyond its bounds, or if need to coalesce fragments of memory.
Stop program, copy bits, change base and bounds registers, restart.
Only the OS gets to change the base and bounds! Clearly, user program can't, or else lose
protection.
Hardware cost:
2 registers
adder, comparator
Plus, slows down hardware because need to take time to do add/compare on every memory
reference.
Base and bounds is simple and fast but it has the following disadvantages:
1. Hard to share between programs. For example, suppose two copies of "vi" are running: we want to
share code; only data and stack need to be different. We can't do this with base and bounds!
2. Hard to grow address space. We want stack and heap to grow into each other (have to
allocate for maximum future needs).
3. Needs complex memory allocation such as first fit, best fit, buddy system. In the worst case,
large chunks of memory have to be shuffled to fit a new program.
Solution to 1 & 2: segmentation
Solution to 1 & 3: paging
Solution to 1 & 2 & 3: segmentation plus paging!
Segmentation
A segment is a region of logically contiguous memory. Idea is to generalize base and bounds, by
allowing a table of base & bound pairs.
[Figure: segment table translation (virtual address, segment #, physical address)]
For example, what does it look like with this segment table, in virtual memory and physical
memory? Assume 2 bit segment ID, and 12 bit segment offset.
Segment table columns: virtual segment #, physical segment start, segment size
Although it seems that the virtual address space has gaps in it, each segment gets mapped to
contiguous locations in physical memory; there may be gaps between segments. A correct
program will never address the gaps; if it does, trap to kernel. Minor exception: stack, heap can grow.
Segmentation is efficient for sparse address spaces. It is easy to share whole segments (for
example, the code segment). A protection mode can also be added to each segment table entry. For
example, code segment would be read-only (only execution and loads are allowed). Data and stack
segment would be read-write (stores allowed). But segmentation still needs complex memory allocation
such as first fit, best fit, etc., and re-shuffling to coalesce free fragments, if no single free space is
big enough for a new segment.
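A sketch of segment translation with a 2-bit segment ID and a 12-bit offset, as assumed in the example above. The table contents, the `writable` flag layout, and the `SegmentationFault` exception are hypothetical values chosen for illustration, not the lecture's original table.

```python
OFFSET_BITS = 12  # 12-bit offset, 2-bit segment ID -> 14-bit virtual address


class SegmentationFault(Exception):
    """Hypothetical stand-in for the trap to the kernel."""


# Assumed table: segment # -> (physical start, segment size, writable?)
SEGMENT_TABLE = {
    0: (0x4000, 0x700, False),   # code: read-only
    1: (0x0000, 0x500, True),    # data
    3: (0x2000, 0x1000, True),   # stack (segment 2 is unused: a gap)
}


def translate_segment(vaddr: int, write: bool = False) -> int:
    seg = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    entry = SEGMENT_TABLE.get(seg)
    if entry is None:
        raise SegmentationFault(f"no segment {seg}")
    start, size, writable = entry
    if offset >= size:
        raise SegmentationFault(f"offset {offset:#x} past end of segment {seg}")
    if write and not writable:
        raise SegmentationFault(f"store to read-only segment {seg}")
    return start + offset


print(hex(translate_segment(0x0240)))  # segment 0, offset 0x240 -> 0x4240
print(hex(translate_segment(0x1108)))  # segment 1, offset 0x108 -> 0x108
```

Sharing a whole segment would mean two processes' tables holding the same physical start for, say, segment 0.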
How do we make memory allocation simple and easy?
Paging
Allocate physical memory in terms of fixed size chunks of memory, or pages.
Simpler, because allows use of a bitmap. What's a bitmap?
Each bit represents one page of physical memory -- 1 means allocated, 0 means unallocated. Lots
simpler than base&bounds or segmentation
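A small sketch of bitmap allocation in Python. The class name `FrameBitmap` and its methods are made up for illustration; a real kernel would pack the bits into machine words rather than use a list.

```python
class FrameBitmap:
    """One bit per physical page: 1 = allocated, 0 = unallocated."""

    def __init__(self, num_frames: int):
        self.bits = [0] * num_frames

    def allocate(self):
        # Any free page will do, so just take the first 0 bit.
        for frame, bit in enumerate(self.bits):
            if bit == 0:
                self.bits[frame] = 1
                return frame
        return None  # no free page

    def free(self, frame: int) -> None:
        self.bits[frame] = 0


bitmap = FrameBitmap(4)
first = bitmap.allocate()
second = bitmap.allocate()
bitmap.free(first)
print(bitmap.bits)  # [0, 1, 0, 0]
```

Because every page is the same size, there is no first fit or best fit decision to make and no need to coalesce fragments.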
Operating system controls mapping: any page of virtual memory can go anywhere in physical
memory.
Each address space has its own page table, in physical memory. Hardware needs two special
registers -- pointer to physical location of page table, and page table size. Example: suppose page
size is 4 bytes.
[Figure: Page table translation (virtual address = page # + offset; page table pointer and page table size registers; physical address = physical page # + offset)]
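A sketch of the lookup with the 4-byte page size from the example. The page table contents and the `PageFault` exception are assumptions made for illustration; the length of the list plays the role of the page table size register.

```python
PAGE_SIZE = 4  # bytes, as in the example above


class PageFault(Exception):
    """Hypothetical stand-in for the trap on an unmapped page."""


# Assumed table: index = virtual page #, value = physical page #.
PAGE_TABLE = [4, 3, 1]


def translate_page(vaddr: int, page_table=PAGE_TABLE) -> int:
    if vaddr < 0:
        raise PageFault("negative address")
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page >= len(page_table):          # compare against page table size
        raise PageFault(f"virtual page {page} not mapped")
    return page_table[page] * PAGE_SIZE + offset


print(translate_page(5))   # page 1, offset 1 -> 3*4 + 1 = 13
print(translate_page(10))  # page 2, offset 2 -> 1*4 + 2 = 6
```

Note that consecutive virtual pages land in unrelated physical pages; only the offset within a page is preserved.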
1. What if page size is very small? For example, a page size of 512 bytes means lots of space is
taken up with page table entries.
2. What if page size is really big? Why not have an infinite page size? Would waste unused
space inside of page. Example of internal fragmentation.
With segmentation need to re-shuffle segments to avoid external fragmentation. Paging suffers
from internal fragmentation.
3. What if address space is sparse? For example: on UNIX, code starts at 0, stack starts at
2^31 - 1. With 1KB pages, 2 million page table entries -- because have to have table that maps
entire virtual address space.
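The entry count is just the size of the virtual address space divided by the page size. A quick check in Python (the function name is made up for illustration):

```python
def page_table_entries(address_space_bytes: int, page_size_bytes: int) -> int:
    # A flat page table needs one entry per virtual page, used or not.
    return address_space_bytes // page_size_bytes


print(page_table_entries(2**31, 1024))  # 2097152, i.e. about 2 million
```

The count depends only on the span of the address space, not on how much of it the program actually touches, which is why a sparse layout is so costly here.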
Paging makes memory allocation simple. It is easy to share, but needs big page tables if the address
space is sparse.
Is there a solution that allows simple memory allocation, easy to share memory, and is efficient for
sparse address spaces?
Combining paging and segmentation? Paged Segmentation (Multi-level translation)
Multi-level translation. Use tree of tables. Lowest level is page table, so that physical memory can
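be allocated using a bitmap. Higher levels are typically segmented. For example, 2-level
translation:
[Figure: two-level translation (virtual address = segment # + page # + offset; segment table entry = page table pointer + page table size; page table entry = physical page #; result = physical address)]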
Just like recursion -- could have any number of levels. Most architectures today do some flavor of
this.
1. Where are segment table/page tables stored? Segment tables are usually in special CPU
registers, because they are small. Page tables, usually in main memory
2. How do we share memory? Can share entire segment, or a single page.
Multilevel translation only needs to allocate as many page table entries as we need. In other words,
sparse address spaces are easy. Memory allocation is easy. Sharing can be done at segment or page
level. But it has some disadvantages as well. A pointer is needed per page (typically 4KB - 16KB
pages today). Page tables need to be contiguous. Two lookups are needed per memory reference.
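A sketch of the two lookups in Python. The bit widths (2-bit segment #, 4-bit page #, 8-bit offset), the table contents, and the `TranslationFault` exception are all assumptions made for illustration.

```python
PAGE_BITS, OFFSET_BITS = 4, 8  # assumed field widths


class TranslationFault(Exception):
    """Hypothetical stand-in for the trap to the kernel."""


# Assumed segment table: segment # -> page table (list of physical page #s).
# Each list's length plays the role of the page table size field.
SEGMENT_TABLE = {
    0: [7, 2],  # code: two pages
    3: [5],     # stack: one page
}


def translate_two_level(vaddr: int) -> int:
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    page = (vaddr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    seg = vaddr >> (OFFSET_BITS + PAGE_BITS)
    page_table = SEGMENT_TABLE.get(seg)            # lookup 1: segment table
    if page_table is None or page >= len(page_table):
        raise TranslationFault(f"segment {seg}, page {page} not mapped")
    frame = page_table[page]                       # lookup 2: page table
    return (frame << OFFSET_BITS) | offset


print(hex(translate_two_level(0x0134)))  # seg 0, page 1, offset 0x34 -> 0x234
```

Only three page table entries exist in this sketch even though the address space spans four segments, which is the point about sparse address spaces.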
TLB: Translation Lookaside Buffer (a kind of page table cache)
Generic Issues in Caching
Cache hit: item is in the cache
Cache miss: item is not in the cache, have to do full operation
Effective access time = P(hit) * cost of hit + P(miss) * cost of miss
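The formula as a one-line Python function, with P(miss) taken as 1 - P(hit). The hit rate and costs in the example call are made-up numbers, not measurements.

```python
def effective_access_time(p_hit: float, hit_cost: float, miss_cost: float) -> float:
    # P(hit) * cost of hit + P(miss) * cost of miss
    return p_hit * hit_cost + (1.0 - p_hit) * miss_cost


# Assumed: 99% hit rate, 10 ns hit, 200 ns miss -> roughly 11.9 ns.
print(effective_access_time(0.99, 10.0, 200.0))
```

Even a 1% miss rate adds about 2 ns here, so the average is sensitive to the miss cost as well as the hit rate.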
1. How do you find whether item is in the cache (whether there is a cache hit)?
2. If it is not in cache (cache miss), how do you choose what to replace from cache to make room?
3. Consistency -- how do you keep cache copy consistent with real version?
Use caching at each level, to provide illusion of a terabyte, with register access times. Works
because programs aren't random.
Exploit locality: computers behave in the future like they have in the past.
Temporal locality: will reference same locations as accessed in the recent past
Spatial locality: will reference locations near those accessed in the recent past
Caching applied to address translation
Often reference same page repeatedly, why go through entire translation each time?
Translation Buffer, Translation Lookaside Buffer: hardware table of frequently used translations,
to avoid having to go through page table lookup in common case. Typically, on chip, so access
time of 5-10 ns, instead of several hundred ns for main memory.
How do we tell if needed translation is in TLB?
1. Search table in sequential order
2. Direct mapped: restrict each virtual page to use specific slot in TLB
Consistency between TLB and page tables
What happens on context switch?
Have to invalidate entire TLB contents. When new program starts running, will bring in new
translations. Alternatively, include process id tag in TLB comparator. Have to keep TLB
consistent with whatever the full translation would give.
What if translation tables change? For example, to move page from memory to disk, or vice versa.
Have to invalidate TLB entry.
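A sketch of a direct-mapped TLB in Python, covering lookup, fill on miss, single-entry invalidation, and a full flush on context switch. The class and function names are made up for illustration, and the page table is simplified to a dict from virtual page # to physical page #.

```python
class DirectMappedTLB:
    def __init__(self, num_slots: int):
        self.slots = [None] * num_slots  # each slot holds (virtual page, frame)

    def _index(self, vpage: int) -> int:
        return vpage % len(self.slots)   # each virtual page has one fixed slot

    def lookup(self, vpage: int):
        entry = self.slots[self._index(vpage)]
        if entry is not None and entry[0] == vpage:
            return entry[1]              # hit
        return None                      # miss

    def insert(self, vpage: int, frame: int) -> None:
        self.slots[self._index(vpage)] = (vpage, frame)

    def invalidate(self, vpage: int) -> None:
        # Needed when the translation tables change for this page.
        if self.lookup(vpage) is not None:
            self.slots[self._index(vpage)] = None

    def flush(self) -> None:
        # Needed on context switch when entries carry no process id tag.
        self.slots = [None] * len(self.slots)


def translate(vpage: int, tlb: DirectMappedTLB, page_table: dict) -> int:
    frame = tlb.lookup(vpage)
    if frame is None:                    # miss: do the full table lookup
        frame = page_table[vpage]
        tlb.insert(vpage, frame)
    return frame
```

With this layout, two virtual pages that map to the same slot evict each other, which is the usual cost of direct mapping compared with searching every entry.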
Relationship between TLB and hardware memory caches
Can put a cache of memory values anywhere in this process. If between translation box and
memory, called a "physically addressed cache". Could also put a cache between CPU and
translation box: "virtually addressed cache".
Virtual memory is a kind of caching: we're going to talk about using the contents of main memory
as a cache for disk.
Page Replacement Algorithms:
FIFO: First-In-First-Out (Belady's Anomaly)
LRU: Least Recently Used (implemented with counters, stacks, etc.)
NRU: Not Recently Used
Clock algorithm: arrange physical pages in a circle, with a clock hand.
1. Hardware keeps use bit per physical page frame
2. Hardware sets use bit on each reference. If use bit isn't set, means not referenced in a long time
3. On page fault: Advance clock hand (not real time)
check use bit
1 -> clear, go on
0 -> replace page
Will it always find a page or loop infinitely? Even if all use bits are set, it will eventually loop
around, clearing all use bits -> FIFO
What if hand is moving slowly?
Not many page faults and/or find page quickly
What if hand is moving quickly?
Lots of page faults and/or lots of reference bits set.
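A sketch of the clock sweep in Python. The class name `Clock` and its methods are made up for illustration; `reference` stands in for what the hardware does on each memory reference.

```python
class Clock:
    def __init__(self, num_frames: int):
        self.use = [0] * num_frames
        self.hand = num_frames - 1       # so the first advance lands on frame 0

    def reference(self, frame: int) -> None:
        self.use[frame] = 1              # set by hardware on each reference

    def pick_victim(self) -> int:
        # Called on a page fault: advance the hand until a use bit is 0.
        while True:
            self.hand = (self.hand + 1) % len(self.use)
            if self.use[self.hand]:
                self.use[self.hand] = 0  # 1 -> clear, go on
            else:
                return self.hand         # 0 -> replace page


clock = Clock(3)
for frame in range(3):
    clock.reference(frame)
print(clock.pick_victim())  # 0: a full lap clears every bit, then frame 0 goes
```

The loop always terminates within two laps, since any bit it passes is cleared on the way.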
Nth chance algorithm : don't throw page out until hand has swept by n times
OS keeps counter per page -- # of sweeps
On page fault, OS checks use bit:
1 => clear use and also clear counter, go on
0 => increment counter, if < N, go on
else replace page
How do we pick N?
Why pick large N? Better approx to LRU.
Why pick small N? More efficient; otherwise might have to look a long way to find free page.
Dirty pages have to be written back to disk when replaced. Takes extra overhead to replace a dirty
page, so give dirty pages an extra chance before replacing?
Common approach:
clean pages -- use N = 1
dirty pages -- use N = 2 (and write-back to disk when N=1)
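A sketch of the Nth chance sweep with the common setting above (N = 1 for clean pages, N = 2 for dirty pages). The class and field names are made up for illustration, and the disk write-back is only marked by a comment.

```python
class NthChance:
    def __init__(self, num_frames: int, n_clean: int = 1, n_dirty: int = 2):
        self.use = [0] * num_frames
        self.dirty = [0] * num_frames
        self.sweeps = [0] * num_frames   # OS counter per page: # of sweeps
        self.n_clean, self.n_dirty = n_clean, n_dirty
        self.hand = num_frames - 1

    def pick_victim(self) -> int:
        while True:
            self.hand = (self.hand + 1) % len(self.use)
            f = self.hand
            if self.use[f]:
                self.use[f] = 0          # 1 => clear use and counter, go on
                self.sweeps[f] = 0
                continue
            self.sweeps[f] += 1          # 0 => increment counter
            limit = self.n_dirty if self.dirty[f] else self.n_clean
            if self.sweeps[f] < limit:
                continue                 # a real OS would start the write-back here
            self.sweeps[f] = 0
            return f                     # counter reached N: replace page


pager = NthChance(3)
pager.dirty[0] = 1
print(pager.pick_victim())  # 1: dirty frame 0 gets an extra chance
```

With `n_clean = n_dirty = 1` this reduces to the plain clock algorithm.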
To summarize, many machines maintain four bits per page table entry:
use: set when page is referenced, cleared by clock algorithm
modified: set when page is modified, cleared when page is written to disk
valid: ok for program to reference this page
read-only: ok for program to read page, but not to modify it (for example, for catching
modifications to code pages)
Inverted Page Table
Page tables map virtual page # -> physical page #
Do we need the reverse? physical page # -> virtual page #?
Yes. Clock algorithm runs through page frames. What if it ran through page tables?
(i) many more entries
(ii) what if there is sharing?
Thrashing: memory overcommitted, pages tossed out while still needed.
Example: One program, touches 50 pages (each equally likely).
Have only 40 physical page frames
If have enough pages, 200 ns/ref
If have too few pages, assume every 5th page reference is a page fault:
4 refs x 200 ns
1 page fault x 10 ms for disk I/O
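Working the numbers through: with 50 equally likely pages and 40 frames, about 10 in 50 references (1 in 5) miss. A small Python check of the resulting average, using the 200 ns and 10 ms figures from the example (the function name is made up for illustration):

```python
def average_reference_ns(refs_per_fault: int, ref_ns: float, fault_ns: float) -> float:
    # Out of every refs_per_fault references, one pays the disk I/O cost
    # and the rest run at memory speed.
    total = (refs_per_fault - 1) * ref_ns + fault_ns
    return total / refs_per_fault


REF_NS = 200
FAULT_NS = 10_000_000  # 10 ms

avg = average_reference_ns(5, REF_NS, FAULT_NS)
print(avg, avg / REF_NS)
```

This comes to about 2 ms per reference, roughly a 10,000x slowdown over the 200 ns case, even though 80% of the working pages are resident.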
Denning's Working Set
Informally, collection of pages process is using right now. Formally, set of pages job has referenced
in last T seconds.
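The formal definition can be sketched directly in Python, assuming a reference trace of (time, page) pairs is available. The trace below is made up, and a real OS only approximates this since it cannot log every reference.

```python
def working_set(trace, now: float, window: float) -> set:
    """Pages referenced in the last `window` time units, i.e. in (now - window, now]."""
    return {page for t, page in trace if now - window < t <= now}


# Hypothetical trace: (time of reference, page name).
TRACE = [(1, "A"), (2, "B"), (3, "A"), (7, "C"), (8, "A"), (9, "D")]

print(sorted(working_set(TRACE, now=9, window=3)))  # ['A', 'C', 'D']
print(sorted(working_set(TRACE, now=3, window=3)))  # ['A', 'B']
```

The size of the set, not just its contents, is what the OS cares about: it says how many page frames the process needs to avoid thrashing.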
How do we pick T?
1 page fault = 10 msec
10 msec = 2 million instructions
So T needs to be a lot bigger than 1 million instructions. How do you figure out what the working
set is?
(a) Modify clock algorithm, so that it sweeps at fixed intervals. Keep idle time/page -- how many
seconds since last reference.
(b) With second chance list -- how many seconds since page got put on 2nd chance list.
Now that you know how many pages each program needs, what to do?
Global replacement
(UNIX) -- all pages in one pool.
More flexible -- if my process needs a lot, and you need a little, I can grab pages from you.