CMSC 22200
Computer Architecture
Lecture 16: Virtual Memory
Prof. Yanjing Li
University of Chicago
Administrative Stuff
- Lab 5 (multi-core)
  - Basic requirements: out
  - Extra credit (40% + 60%): out by end of this week
  - Due: 11:59pm, Dec. 1st, Thursday
  - Two late days with penalty
- Thanksgiving week
  - No labs on Wednesday
  - No lecture on Thursday
- Exam 2, Wednesday, 11/30, 7-9pm, Kent 107
  - Practice problems will be posted
Where Are We in the Lecture Schedule?
- ISA
- Uarch
  - Datapath, control
  - Single cycle, multi cycle
- Pipelining: basic, dependency handling, branch prediction
- Advanced uarch: OOO, SIMD, VLIW, superscalar
- Caches and advanced caches
- Multi-core
- Today and next 2 lectures (before exam 2)
  - Virtual memory, main memory (DRAM)
  - Review session
- Last lecture: wrap up
Exam 2 Topics
- Focus on materials not covered in Exam 1
  - But everything in class is fair game
- Microarchitecture techniques to improve ILP
  - OOO, SIMD, VLIW
  - How they work, design considerations, tradeoffs
- Caches
  - Basics, design considerations and tradeoffs, advanced techniques
- Multi-core
  - Parallel programs, speedup, parallel computer architectures, cache coherence, memory consistency, synchronization
- Virtual memory
- Main memory (DRAM)
Lecture Outline
- Synchronization mechanisms in shared memory multi-core designs
- Virtual memory
Main Multi-Core Design Issues
- Cache coherence
  - Ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations
  - What should the programmer expect the hardware to provide?
- Shared memory synchronization
  - Hardware support for synchronization primitives
- We will discuss the above issues
- Others
  - Shared resource management, interconnects, …
How NOT To Implement Locks
- Lock:
      while (lock_var == 1);
      lock_var = 1;
- Unlock:
      lock_var = 0;
- What’s the problem?
  - Testing if lock_var is 1 and setting it to 1 are not atomic
    - i.e., another processor can set lock_var to 1 in between
    - → Multiple processors acquire the lock! (see the interleaving sketched below)
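The following plain C sketch (names assumed, deliberately without atomics) spells out one interleaving of the code above in which two processors both acquire the lock:

    int lock_var = 0;                  /* 0 = free, 1 = held */

    void bad_lock(void) {
        while (lock_var == 1)          /* (1) P0 reads 0 ... (2) P1 also reads 0   */
            ;
        lock_var = 1;                  /* (3) P0 writes 1 ... (4) P1 also writes 1 */
    }                                  /* both return believing they hold the lock */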
Atomic Read & Write Instructions
- Aka. read-modify-write
- Specify a memory location and a register
  - I. Value in location read into a register
  - II. Another value stored into location
  - Many variants based on what “values” are allowed in II
- Simple example: test&set (semantics sketched below)
  - Read memory location into specified register
  - Store constant 1 into location
  - Successful if value loaded into register is 0
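As a rough sketch, test&set behaves like the C function below, except that the hardware performs both steps as a single atomic operation (the function itself is illustrative and is not atomic):

    /* What the hardware does atomically for test&set. */
    int test_and_set(int *location) {
        int old = *location;   /* I.  value in location read into a register */
        *location = 1;         /* II. constant 1 stored into location        */
        return old;            /* "successful" if the old value was 0        */
    }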
Using Test&Set to Implement a Lock
- Initialize location to 0 (a C11 equivalent is sketched below)

    lock:   t&s register, location   // atomic read-modify-write
            bnz lock                 // if not 0, try again
            ret                      // locked; value in location is 1

    unlock: st location, #0          // write 0 to location
            ret
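A rough C11 equivalent of this lock (a sketch, not the lecture’s ISA): atomic_flag_test_and_set is a test&set-style read-modify-write that returns the previous value.

    #include <stdatomic.h>

    static atomic_flag lock_var = ATOMIC_FLAG_INIT;     /* clear = 0 = free    */

    void lock(void) {
        while (atomic_flag_test_and_set(&lock_var))     /* old value 1: held   */
            ;                                           /* spin and try again  */
    }                                                   /* old value 0: locked */

    void unlock(void) {
        atomic_flag_clear(&lock_var);                   /* write 0 to location */
    }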
Many Others…
- Other read-modify-write primitives
  - Swap
  - Compare&swap (see the sketch below)
- More fancy implementations to avoid spinning, reduce memory traffic, promote fairness, etc.
- Programmers, consult the ISA specification for the most effective implementation that meets your needs
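For illustration, here is a minimal lock built on compare&swap using C11 atomics (a sketch under the assumption that 0 means free and 1 means held):

    #include <stdatomic.h>

    static atomic_int cas_lock = 0;                     /* 0 = free, 1 = held */

    void cas_acquire(void) {
        int expected = 0;
        /* Atomically: if cas_lock == 0, set it to 1; otherwise retry. */
        while (!atomic_compare_exchange_weak(&cas_lock, &expected, 1))
            expected = 0;        /* failed CAS wrote the observed value here */
    }

    void cas_release(void) {
        atomic_store(&cas_lock, 0);
    }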
Virtual Memory

Memory (Programmer’s View)
A System with Physical Memory Only
- Examples:
  - most Cray machines
  - early PCs
  - nearly all embedded systems
[Figure: the CPU’s load/store addresses are used directly as physical addresses to access memory locations 0 through N-1]
Difficulties of Direct Physical Addressing
- Difficult to support code and data relocation
  - Processes come and go; fragmentation issues
- Difficult to provide protection and isolation among multiple processes
- Difficult to support data/code sharing across processes
- Also, ISA can have an address space greater than the physical memory size
  - E.g., a 64-bit address space with byte addressability
  - What if you do not have enough physical memory?
What Are Some Alternatives?
- Base and bound (BB)
- Segmentation
- Both have limitations!
Abstraction: Virtual vs. Physical Memory
- Programmer sees virtual memory
  - Can assume the memory is private and very large
- Reality: physical memory size is much smaller than what the programmer assumes, and it can be shared
- The system (system software + hardware, cooperatively) maps virtual memory addresses to physical memory
  - The system automatically manages the physical memory space transparently to the programmer
Benefits of Virtual Memory
- Automatic management
  - Programmer does not need to know the physical size of memory nor manage it → a small physical memory can appear as a huge one to the programmer → life is easier for the programmer
- Each process has its own mapping from virtual → physical addresses, which enables
  - Code and data to be located anywhere in physical memory (efficient use of physical memory)
  - Isolation/separation of code and data of different processes in physical memory (protection and isolation)
  - Code and data sharing between multiple processes (sharing)
Virtual Memory: Basic Mechanism
- Idea: Indirection (in addressing)
- Address generated by each instruction in a program is a “virtual address”
  - i.e., it is not the physical address used to address main memory
- An “address translation” mechanism maps virtual address to a “physical address”
  - The hardware converts virtual addresses into physical addresses via an OS-managed lookup table (page table)
- Requires Hardware + Software support!
Virtual Pages, Physical Frames
- Virtual address space divided into pages
- Physical address space divided into frames
- A virtual page is mapped to
  - A physical frame, if the page is in physical memory
  - A location in disk, otherwise
- If an accessed virtual page is not in memory, but on disk
  - Generates page fault
  - Virtual memory system brings the page into a physical frame and adjusts the mapping → called demand paging
- Page table: mapping of virtual pages to physical frames
Remember: Page Table is Per Process
- Because each process has its own virtual address space
  - Illusion of full address space for each program
  - Simplifies memory allocation, sharing, linking and loading
- Which table to use is indicated by the page table base register (PTBR)
[Figure: the virtual address spaces (0 to N-1) of Process 1 and Process 2, each with pages VP 1, VP 2, …, are mapped by address translation onto one physical address space (DRAM, 0 to M-1) containing frames such as PP 2, PP 7, and PP 10; a read-only page (e.g., library code) can be shared by mapping both processes’ virtual pages to the same physical frame]
A System with Virtual Memory (Page based)
[Figure: the CPU issues virtual addresses 0 to N-1; the page table maps each virtual page either to a physical address in memory (0 to P-1) or to a location on disk]
- Physical memory is a cache for pages stored on disk
  - In fact, it is a fully associative cache in modern systems (a virtual page can be mapped to any physical frame)
Page Size
- What is the granularity of management of physical memory?
- Specified by the ISA
  - Today: 4KB, 8KB, 4MB, 2GB, …
- Small and large pages mixed together
- Large vs. small pages: many tradeoffs
  - Size of the Page Table
  - Number of page faults
  - Transfer size from disk to memory
  - Internal fragmentation
  - “Coverage” of TLB (we will see this later)
Virtual to Physical Address Translation
- Parameters
  - P = 2^p = page size (bytes)
  - N = 2^n = virtual-address limit
  - M = 2^m = physical-address limit

    virtual address  (n bits): [ virtual page number, VPN  (bits n-1 … p) | page offset (bits p-1 … 0) ]
                                          | address translation
                                          v
    physical address (m bits): [ physical frame number, PFN (bits m-1 … p) | page offset (bits p-1 … 0) ]

- Page offset bits don’t change as a result of translation (see the bit-manipulation sketch below)
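A minimal C sketch of this split, assuming 4 KB pages (p = 12); the macro and helper names are illustrative:

    #include <stdint.h>

    #define PAGE_SHIFT  12u                           /* p: page size = 2^12 bytes */
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    uint64_t vpn(uint64_t va)         { return va >> PAGE_SHIFT; }   /* bits n-1 … p */
    uint64_t page_offset(uint64_t va) { return va & OFFSET_MASK; }   /* bits p-1 … 0 */

    /* Translation replaces the VPN with the PFN and keeps the offset. */
    uint64_t make_pa(uint64_t pfn, uint64_t offset) {
        return (pfn << PAGE_SHIFT) | offset;
    }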
Address Translation Using Page Table
- Page Table contains an entry for each virtual page
  - Called Page Table Entry (PTE)
- What is in a PTE? (an illustrative layout is sketched below)
  - A valid bit → to indicate validity/presence in physical memory
  - PFN for the corresponding VPN → to support translation
  - Control bits to support replacement
  - Dirty bit to indicate if we need to “write back”
  - Protection bits to enable access control and protection
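One possible PTE layout in C, purely for illustration (field names and widths are assumptions, not any specific ISA’s format):

    #include <stdint.h>

    typedef struct {
        uint64_t valid     : 1;    /* page present in physical memory?       */
        uint64_t dirty     : 1;    /* modified: must be written back to disk */
        uint64_t reference : 1;    /* control bit used by replacement policy */
        uint64_t prot      : 3;    /* read/write/execute protection bits     */
        uint64_t user      : 1;    /* accessible at user privilege level?    */
        uint64_t pfn       : 28;   /* physical frame number for this VPN     */
    } pte_t;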
Address Translation Illustrated
- Separate (set of) page table(s) per process
- VPN forms index into page table (points to a page table entry)
- Page Table Entry (PTE) provides information about page (lookup sketched below)
[Figure: the page table base register (per process) locates the page table; the VPN of the virtual address acts as the table index and selects a PTE containing a valid bit and a physical frame number (PFN); if valid = 0, the page is not in memory (page fault); otherwise the PFN is concatenated with the unchanged page offset to form the physical address]
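A flat, single-level version of this lookup in C; page_fault() is a placeholder for the handler, and the PTE fields follow the illustrative layout above:

    #include <stdint.h>

    typedef struct { uint32_t valid : 1; uint32_t pfn : 28; } pte_t;

    #define PAGE_SHIFT 12u

    extern uint64_t page_fault(uint64_t va);           /* assumed handler    */

    uint64_t translate(const pte_t *page_table,        /* located via PTBR   */
                       uint64_t va) {
        uint64_t vpn    = va >> PAGE_SHIFT;            /* table index        */
        uint64_t offset = va & ((1u << PAGE_SHIFT) - 1);
        pte_t pte = page_table[vpn];
        if (!pte.valid)
            return page_fault(va);                     /* page not in memory */
        return ((uint64_t)pte.pfn << PAGE_SHIFT) | offset;
    }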
Another Function of VM System: Protection
# Virtual memory system serves two functions today
Address translation (for illusion of large physical memory)
Access control (protection)
! 
Not every process is allowed to access every page
" 
! 
! 
E.g., OS code and data structures should be accessible by
system software only, i.e., require supervisor level privilege to
access
Idea: Store access control information on a page basis in
the process’s page table
Enforce access control at the same time as translation
26
VM as a Tool for Memory Access Protection
- Extend Page Table Entries (PTEs) with access permission bits
- Check bits on each access
  - If violated, generate exception (Access Protection exception)

    Page table for Process i:
      VP 0:  Read? Yes   Write? No    Physical Addr: PP 6
      VP 1:  Read? Yes   Write? Yes   Physical Addr: PP 4
      VP 2:  Read? No    Write? No    Physical Addr: XXXXXXX
      …

    Page table for Process j:
      VP 0:  Read? Yes   Write? Yes   Physical Addr: PP 6
      VP 1:  Read? Yes   Write? No    Physical Addr: PP 9
      VP 2:  Read? No    Write? No    Physical Addr: XXXXXXX
      …

    (The figure also shows physical memory frames PP 0, PP 2, PP 4, PP 6, PP 8, PP 10, PP 12, …; note that PP 6 is mapped by both processes.)
Access Control
- Type of access
  - Read, write, execute
- Privilege level
  - Defined by ISA, e.g., supervisor vs. user
- PTE contains protection bits which specify which accesses can be made to this page at what privilege level (a sketch of the check follows)
[Figure: access control logic takes the type of access requested, the privilege level of the running process, and the protection bits in the PTE, and decides whether the access is allowed]
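A small C sketch of that access control logic; the enums and field names are illustrative, not any particular ISA’s encoding:

    #include <stdbool.h>

    typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC } access_t;
    typedef enum { PRIV_USER, PRIV_SUPERVISOR } priv_t;

    typedef struct {
        bool readable, writable, executable;   /* protection bits in the PTE */
        bool user_accessible;                  /* else supervisor-only page  */
    } pte_prot_t;

    bool access_allowed(pte_prot_t p, access_t type, priv_t level) {
        if (!p.user_accessible && level != PRIV_SUPERVISOR)
            return false;                      /* privilege level check */
        switch (type) {                        /* type-of-access check  */
        case ACCESS_READ:  return p.readable;
        case ACCESS_WRITE: return p.writable;
        case ACCESS_EXEC:  return p.executable;
        }
        return false;
    }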
Example: Privilege Levels in x86

Example: Page Level Protection in x86
System Support for Virtual Memory

Both HW and SW Support Required
- Page Table is in memory, managed by OS
- Hardware utilizes the information in page table to perform fast address translation
- The hardware component is called the MMU (memory management unit)
  - Page Table Base Register (PTBR)
  - Translation lookaside buffer (TLB)
  - Page walker logic
System Software (OS) Jobs for VM
- Keeping track of which physical frames are free
- Populate page table by allocating free physical frames to virtual pages
- Page replacement policy
  - When no physical frame is free, what should be swapped out?
- Sharing pages between processes (e.g., shared libraries)
- Change page tables on context switch
  - To use the running thread’s page table
- Handle page faults and ensure correct mapping
- Copy-on-write and other optimizations
Aside: An Interesting Page Replacement Algorithm
- The clock algorithm (LRU approximation); a sketch follows
- Keep a circular list of physical frames in memory and a pointer (hand)
- When a page is accessed, set the reference (R) bit in PTE
- When a page needs to be replaced, traverse the circular list starting from the hand clockwise
  - Clear R bits of examined frames
  - Replace the first frame that has the reference (R) bit not set
  - Set the hand pointer to the next frame in the list and stop
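A compact C sketch of the clock algorithm; the frame count and the R-bit array are stand-ins for the real per-frame PTE state:

    #include <stdbool.h>
    #include <stddef.h>

    #define NFRAMES 8

    static bool   r_bit[NFRAMES];     /* reference bit, set when the page is accessed */
    static size_t hand = 0;           /* the clock hand                               */

    /* Choose a victim frame: sweep clockwise from the hand, clearing R bits,
     * until a frame whose R bit is not set is found. */
    size_t clock_choose_victim(void) {
        for (;;) {
            if (!r_bit[hand]) {                   /* R bit clear: replace this frame */
                size_t victim = hand;
                hand = (hand + 1) % NFRAMES;      /* leave the hand at the next frame */
                return victim;
            }
            r_bit[hand] = false;                  /* examined: clear its R bit        */
            hand = (hand + 1) % NFRAMES;
        }
    }

    /* On every access to the page held in frame f, set r_bit[f] = true. */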
Address Translation: Page Hit (HW)
[Figure: hardware steps of a page hit; note: steps 2-3 (reading the PTE from the page table in memory) may not be necessary (TLB hit)]

Address Translation: Page Fault (HW + SW)
[Figure: steps of a page fault, resolved jointly by hardware and system software (the OS page fault handler)]
I/O Operation in Page Fault Handler
(1) Processor signals controller
  - Read block of length P starting at disk address X and store starting at memory address Y
(2) Read occurs
  - Direct Memory Access (DMA)
  - Under control of I/O controller
(3) Controller signals completion
  - Interrupt processor
  - OS resumes suspended process
[Figure: the processor initiates the block read over the memory-I/O bus; the I/O controller performs the DMA transfer from disk into memory; when done, it interrupts the processor (“read done”)]
Page Fault (“A Miss in Physical Memory”) Resolved
[Figure: before the fault, the page table maps the virtual address to a location on disk; after the fault, the page has been brought into memory and the page table maps the virtual address to a physical address in memory]
Virtual Memory System Design Considerations

Three Major Issues
I. How large is the page table and how do we store and access it?
II. How can we speed up translation & access control check?
III. Virtual memory and cache interaction
- Others
  - What happens on a context switch?
  - How to handle multiple page sizes?
  - …
Page Table Size
[Figure: a 64-bit virtual address with a 12-bit page offset has a 52-bit VPN; the page table maps the 52-bit VPN to a 28-bit PFN, which is concatenated with the 12-bit page offset to form a 40-bit physical address]
- Suppose 64-bit VA and 40-bit PA: how large is the page table?
  - 2^52 entries × ~4 bytes ≈ 16 Petabytes
  - and that is for just one process! (see the arithmetic below)
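The same arithmetic spelled out as a small C program; the page size and PTE size follow the slide’s assumptions of 4 KB pages and ~4-byte entries:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        unsigned va_bits = 64, page_shift = 12, pte_bytes = 4;
        uint64_t entries = 1ull << (va_bits - page_shift);     /* 2^52 VPNs      */
        double   pib     = (double)entries * pte_bytes / (1ull << 50);
        printf("%llu entries, ~%.0f PiB per process\n",
               (unsigned long long)entries, pib);               /* prints ~16 PiB */
        return 0;
    }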
Virtual Memory Issue I
- Page table can be huge
- Where/how do we store it?
  - In dedicated hardware structure?
  - In physical memory?
  - In virtual memory?
Solution: Multi-Level Page Tables
Example from x86 architecture
- The process may not be using the entire VM space!
  - Only the first-level page table has to be in physical memory
  - Remaining levels are in virtual memory (but get cached in physical memory when accessed)
(A two-level walk is sketched below.)
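A sketch of a two-level lookup in this style; the field widths, 10-bit indexes, and helpers such as frame_to_table() and page_fault() are illustrative assumptions, not the exact x86 format:

    #include <stdint.h>

    typedef struct { uint32_t present : 1; uint32_t pfn : 20; } entry_t;

    #define PAGE_SHIFT 12u               /* 4 KB pages                 */
    #define LEVEL_BITS 10u               /* 10-bit index at each level */

    extern uint64_t page_fault(uint64_t va);        /* assumed handler */
    extern entry_t *frame_to_table(uint32_t pfn);   /* assumed helper  */

    uint64_t walk(entry_t *page_directory,          /* level 1, from PTBR */
                  uint64_t va) {
        uint32_t dir_idx = (va >> (PAGE_SHIFT + LEVEL_BITS)) & 0x3FF;
        uint32_t tbl_idx = (va >> PAGE_SHIFT) & 0x3FF;
        uint32_t offset  = va & ((1u << PAGE_SHIFT) - 1);

        entry_t pde = page_directory[dir_idx];          /* level-1 entry     */
        if (!pde.present) return page_fault(va);        /* table not present */

        entry_t pte = frame_to_table(pde.pfn)[tbl_idx]; /* level-2 entry     */
        if (!pte.present) return page_fault(va);        /* page not present  */

        return ((uint64_t)pte.pfn << PAGE_SHIFT) | offset;
    }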
Page Table Access
- Page Table Base Register (PTBR, CR3 in x86)
  - Specifies the address of the page directory
  - Must be physical address!
- Page Table Limit Register
  - If VPN is out of the bounds then the process did not allocate the virtual page → access control exception
- PTBR is part of a process’s context
  - Just like PC, status registers, general purpose registers
  - Needs to be loaded when the process is context-switched in
Virtual Memory Issue II
- How fast do we need the address translation to be?
  - How can we make it fast?
- Idea: Use a hardware structure that caches PTEs → Translation lookaside buffer (TLB)
Speeding up Translation with a TLB
- Essentially a small cache of recent address translations
  - Avoids going to the page table on every reference
  - What happens on context switch?
- Index = lower bits of VPN
- Tag = unused bits of VPN (+ process ID sometimes)
- Data = a page-table entry
- Status = valid, dirty
The usual cache design choices (associativity, replacement policy, multi-level, etc.) all apply to the TLB (a direct-mapped lookup is sketched below).
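A direct-mapped TLB lookup in C using the index/tag split above; the entry count and field names are illustrative:

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SHIFT  12u
    #define TLB_ENTRIES 64u                    /* index = lower 6 bits of the VPN */

    typedef struct {
        bool     valid;
        uint64_t tag;                          /* remaining (upper) bits of the VPN */
        uint64_t pfn;                          /* data: the cached page-table entry */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    bool tlb_lookup(uint64_t va, uint64_t *pa) {
        uint64_t vpn   = va >> PAGE_SHIFT;
        uint64_t index = vpn % TLB_ENTRIES;    /* lower bits of VPN */
        uint64_t tag   = vpn / TLB_ENTRIES;    /* unused upper bits */
        tlb_entry_t e  = tlb[index];
        if (e.valid && e.tag == tag) {         /* TLB hit           */
            *pa = (e.pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
            return true;
        }
        return false;                          /* TLB miss: walk the page table */
    }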
TLB Examples
Typical numbers: 16–512 PTEs, 0.5–1 cycle for hit, 10–100
cycles for miss, 0.01%–1% miss rate
TLB Misses
- The TLB is small; it cannot hold all PTEs
  - Some translations will inevitably miss in the TLB
- TLB miss indicates
  - Page present, but PTE not in TLB
  - Page not present
- On TLB miss, access memory to find the appropriate PTE
  - Called walking the page directory/table
  - Large performance penalty
- Who handles TLB misses? Hardware or software?
Handling TLB Misses
- Approach #1. Hardware-Managed (e.g., x86)
  - The hardware does the page walk
  - The hardware fetches the PTE and inserts it into the TLB
    - If the TLB is full, the entry replaces another entry based on replacement policy
  - Done transparently to system software
- Approach #2. Software-Managed (e.g., MIPS)
  - The hardware raises an exception
  - The operating system does the page walk
  - The operating system fetches the PTE
  - The operating system inserts/evicts entries in the TLB
Tradeoffs
- Hardware-Managed TLB
  - Pro: No exception on TLB miss. Instruction simply waits
  - Pro: Independent instructions may continue
  - Pro: No extra instructions/data brought into caches
  - Con: Page directory/table organization is etched into the system: OS has little flexibility
- Software-Managed TLB
  - Pro: The OS can define page table/directory organization
  - Pro: More sophisticated TLB replacement policies are possible
  - Con: Need to generate an exception → performance overhead due to pipeline flush, exception handler execution, extra instructions brought to caches