Virtual Memory and Paging
J. Nelson Amaral
Large Data Sets
• Size of address space:
– 32-bit machines: 2^32 bytes = 4 GB
– 64-bit machines: 2^64 bytes = 16 EB (about 1.8 × 10^19 bytes)
• Size of main memory:
– approaching 4 GB
• How to handle:
– Applications whose data set is larger than the main
memory size?
– Sets of applications that together need more space
than the memory size?
Baer, p. 60
Multiprogramming
• More than one program resides in memory at
the same time
• I/O is slow:
– If the running program needs I/O, it relinquishes
the CPU
Baer, p. 60
Multiprogramming Challenges
• How and where to load a program into
memory?
• How does a program ask for more memory?
• How to protect one program from another?
Baer, p. 60
Virtual Memory
• Solution:
– Give each program the illusion that it could
address the whole address space
• CPU works with virtual addresses
• Memory works with real or physical addresses
Baer, p. 60
Virtual → Physical Address Translation
• Paging System
– Divide both the virtual and the physical address
spaces into pages of the same size.
– Virtual space: page
– Physical space: frame
• Fully associative mapping between pages and
frames.
– any page can be stored in any frame
Baer, p. 60
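As a concrete sketch of the page/frame split (assuming 4 KB pages, i.e., a 12-bit offset; the constants and helper names are illustrative, not from the text):

    #include <stdint.h>

    #define PAGE_SHIFT 12                    /* assumed 4 KB pages */
    #define PAGE_SIZE  (1ull << PAGE_SHIFT)

    /* The offset within the page is unchanged by translation;
       only the page number is mapped (fully associatively,
       in a paging system) to a frame number. */
    static inline uint64_t vpn(uint64_t va)    { return va >> PAGE_SHIFT; }
    static inline uint64_t offset(uint64_t va) { return va & (PAGE_SIZE - 1); }

    /* Rebuild the physical address from a frame number and offset. */
    static inline uint64_t phys(uint64_t frame, uint64_t off) {
        return (frame << PAGE_SHIFT) | off;
    }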
Paging System
• Virtual space is much larger than physical
memory
• Memory can be shared with little fragmentation
• Pages can be shared among programs
• Memory does not need to store the whole
program and its data at the same time
Baer, p. 61
Address Translation
• valid bit = 0 implies a page fault (there is no
frame in memory for this page)
Baer, p. 62
Page Fault
• Exception generated in program P1 because
valid bit = 0 in its Page Table Entry (PTE)
– Page fault handler initiates an I/O read for P1
• the I/O read takes several milliseconds to complete
– A context switch occurs:
• O.S. saves P1’s processor state and starts the I/O
operation
• O.S. hands CPU control to another program P2
• O.S. restores P2’s state into the CPU
Baer, p. 62
Address Translation
• Virtual and physical addresses can be of
different sizes.
– Example: 64-bit virtual addresses translated to
40- or 48-bit physical addresses
Baer, p. 62
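As an illustrative calculation (numbers assumed, not from the text): with 8 KB pages the page offset is 13 bits, so a 64-bit virtual address splits into a 51-bit virtual page number plus the offset, while a 40-bit physical address leaves only 27 bits for the frame number; translation replaces the 51-bit page number with a 27-bit frame number and copies the offset unchanged.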
Translation Look-Aside Buffer (TLB)
• Problem:
– Storing page table entries (PTEs) only in memory
would require a load from memory for each
address translation.
– Caching PTEs in the regular caches interferes with
the flow of instructions or data into the cache.
• Solution: the TLB, a small, high-associativity
cache dedicated to caching PTEs
Baer, p. 62
TLB organization
• Each TLB entry consists of:
– tag
– data (a PTE)
– valid bit
– dirty bit
– bits to encode memory protection
– bits to encode recency of access
• A set of TLB entries may be reserved for the
Operating System
Baer, p. 62
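A minimal C sketch of such an entry (field names and bit widths are illustrative assumptions, not those of any real processor):

    #include <stdint.h>

    /* One TLB entry, mirroring the fields listed above. */
    struct tlb_entry {
        uint64_t tag   : 40;  /* identifies the virtual page number */
        uint64_t pfn   : 28;  /* data: the PTE's physical frame number */
        uint8_t  valid : 1;   /* entry holds a usable translation */
        uint8_t  dirty : 1;   /* page has been written to */
        uint8_t  prot  : 3;   /* memory-protection bits (r/w/x) */
        uint8_t  lru   : 3;   /* recency-of-access bits for replacement */
    };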
TLB Characteristics

Architecture   Page Size (KB)   I-TLB Entries   D-TLB Entries
Alpha 21064    8                8 (FA)          32 (FA)
Alpha 21164    8                48 (FA)         64 (FA)
Alpha 21264    8                64 (FA)         128 (FA)
Pentium        4                32 (4-way)      64 (4-way)
Pentium II     4                32 (4-way)      64 (4-way)
Pentium III    4                32 (4-way)      64 (4-way)
Pentium 4      4                64 (4-way)      128 (4-way)
Core Duo       4                64 (FA)         64 (FA)

(FA = fully associative)
Baer, p. 63
Large Pages
• Recent processors implement large page sizes
(typically 4 MB pages)
– reduces page faults in applications with large
data sets (scientific and graphics)
– requires that TLB entries be reserved for large
pages.
Baer, p. 63
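To see the TLB-side benefit (an illustrative calculation): a 64-entry TLB with 4 KB pages covers 64 × 4 KB = 256 KB of memory, while the same TLB with 4 MB pages covers 64 × 4 MB = 256 MB, so large-data applications suffer far fewer TLB misses.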
Referencing Memory
Baer, p. 63
Memory Reference Process
• TLB hit?
– No → handle TLB miss
– Yes → check the valid bit
• valid bit?
– 0 → Page Fault
– 1 → check protection
• protection violation?
– Yes → Access Violation Exception
– No → continue
• store?
– Yes → turn PTE dirty bit on
• Update recency of access
Baer, p. 63
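The same flow as a C sketch (reusing the struct tlb_entry above; every helper function is a hypothetical placeholder, declared here only so the sketch is self-describing):

    #include <stdint.h>
    #include <stddef.h>

    extern struct tlb_entry *tlb_lookup(uint64_t va);
    extern void handle_tlb_miss(uint64_t va);
    extern void raise_page_fault(uint64_t va);
    extern void raise_access_violation(uint64_t va);
    extern int  violates_protection(const struct tlb_entry *e, int is_store);
    extern void update_recency(struct tlb_entry *e);

    void memory_reference(uint64_t va, int is_store) {
        struct tlb_entry *e = tlb_lookup(va);
        if (e == NULL) {              /* TLB hit? No */
            handle_tlb_miss(va);      /* in hardware, software, or both */
            e = tlb_lookup(va);       /* retry after the fill */
        }
        if (!e->valid) {              /* valid bit == 0 */
            raise_page_fault(va);
            return;
        }
        if (violates_protection(e, is_store)) {
            raise_access_violation(va);
            return;
        }
        if (is_store)
            e->dirty = 1;             /* turn the PTE dirty bit on */
        update_recency(e);            /* update recency-of-access bits */
    }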
Handling TLB Misses
• Must access page table in memory
– entirely in hardware
– entirely in software
– combination of both
• Replacement Algorithms
– LRU for 4-way associativity (Intel)
– Not Most Recently Used for full associativity
(Alpha)
Baer, p. 64
Handling TLB Miss (cont.)
• Serving a TLB miss takes 100-1000 cycles.
– Too short to justify a context switch
– Long enough to have significant impact on
performance
• even a small TLB miss rate affects CPI
Baer, p. 64
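For example (illustrative numbers, not from the text): with one memory reference per instruction, a TLB miss rate of 1 per 10,000 references, and a 500-cycle miss penalty, the TLB adds 500 / 10,000 = 0.05 cycles per instruction; a base CPI of 1.0 grows by 5%.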
OS handling of page fault
• Reserve a frame from the free list
• Find a page to replace if there is no free frame
• Find whether the faulting page is on disk
• Invalidate portions of the TLB (and maybe the
cache)
• Initiate the read for the faulting page
• Invalidate cache lines mapping to the replaced
page
• Write dirty replaced pages to disk
Baer, p. 64
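A C-style skeleton of these steps (every type and function name is a hypothetical placeholder; a real OS handler is far more involved):

    #include <stdint.h>
    #include <stddef.h>

    typedef struct frame frame_t;
    typedef struct { frame_t *frame; int dirty; } page_t;
    typedef uint64_t disk_addr_t;

    extern frame_t    *free_list_pop(void);
    extern page_t     *choose_victim(void);
    extern void        invalidate_tlb_for(page_t *p);
    extern void        invalidate_cache_lines_for(page_t *p);
    extern void        write_page_to_disk(page_t *p);
    extern disk_addr_t lookup_on_disk(uint64_t va);
    extern void        start_disk_read(disk_addr_t d, frame_t *f);

    void handle_page_fault(uint64_t faulting_va) {
        frame_t *f = free_list_pop();          /* reserve a free frame */
        if (f == NULL) {
            page_t *victim = choose_victim();  /* replacement policy */
            invalidate_tlb_for(victim);        /* drop stale translations */
            invalidate_cache_lines_for(victim);
            if (victim->dirty)
                write_page_to_disk(victim);    /* write back dirty page */
            f = victim->frame;
        }
        disk_addr_t d = lookup_on_disk(faulting_va); /* page's disk address */
        start_disk_read(d, f);  /* asynchronous; a context switch follows */
    }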
When page arrives in memory
I/O interruption is
raised
OS updates the PTE of
the page
OS schedule
requesting process for
execution
Baer, p. 64
Invalidating TLB Entries on Context
Switch
• Page Fault → Exception → Context Switch
• Let:
– PR: Relinquishing process
– PI: Incoming Process
• Problem: TLB entries are for PR, not PI
– Invalidating entire TLB on context switch leads to
many TLB misses when PI is restored
• Solution: Use a process ID number (PID)
Baer, p. 64
Process ID (PID) Number
• O.S. sets a PID for each program
• The PID is added to the tag in the TLB entries
• A PID Register stores the PID of the active
process
• Match PID Register with PID in TLB entry
• No need to invalidate TLB entries on context
switch
• PIDs are recycled by the OS
Baer, p. 64
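A sketch of the tag match with a PID (names and widths are illustrative assumptions):

    #include <stdint.h>

    struct tlb_entry_pid {
        uint64_t tag;       /* virtual-page tag */
        uint64_t pfn;       /* physical frame number */
        uint8_t  pid;       /* PID of the owning process */
        uint8_t  valid;
    };

    uint8_t pid_register;   /* PID of the active process */

    /* An entry hits only if both the tag and the PID match, so
       stale entries left by other processes are simply ignored
       instead of being invalidated on every context switch. */
    int tlb_hit(const struct tlb_entry_pid *e, uint64_t vpn_tag) {
        return e->valid && e->tag == vpn_tag && e->pid == pid_register;
    }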
Page Size × Read/Write Time
• Disk access time = seek time (0 to 10 ms)
+ rotation time (~3 ms) + transfer time
• Only the transfer time grows with page size:
– a page of size 2x pays the same seek and rotation
overhead as a page of size x
• Amortizing I/O time:
– large page size
– read/write consecutive pages
Baer, p. 65
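With illustrative numbers (assumed, not from the text): at 10 ms seek + 3 ms rotation, transferring a 4 KB page at 50 MB/s adds only about 0.08 ms, so doubling the page size to 8 KB raises the total from roughly 13.08 ms to 13.16 ms while moving twice the data.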
Large Pages
• Amortize I/O time to transfer pages
• Smaller Page Tables
– More of the page table fits in main memory
• lower probability of a double page fault (the PTE’s
own page also missing) for a single memory
reference
• Fewer TLB misses
– A single TLB entry translates more locations
• Pages cannot be too large
– transfer time and internal fragmentation grow
Baer, p. 65
Performance of Memory Hierarchy
Baer, p. 66
When to bring a missing item (to
cache, TLB, or memory)?
• On demand
Level        Miss Frequency                        Miss Resolution
Cache        few times per 100 references          5-100 cycles, entirely in hardware
TLB          few times per 10,000 references       100-1000 cycles, in hardware or software
Page Fault   few times per 10,000,000 references   millions of cycles, requires context switch
Baer, p. 66
Where to put the missing item?
• Cache: restrictive mapping (direct or low
associativity)
• TLB: fully associative or high set associativity
• Paging System: general mapping
Baer, p. 66
How do we know it is there?
• Cache: Compare tags and check valid bits
• TLB: Compare tags, PID, check valid bits
• Memory: Check Page Tables
Baer, p. 67
What happens on a replacement?
• Caches and TLBs: (approximation to) LRU
• Paging Systems:
– Sophisticated algorithms to keep page fault rate
very low
– O.S. policies allocate a number of pages to each
program according to its working set
Baer, p. 67
Simulating Memory Hierarchy
• Simulating only the memory hierarchy is faster
than full simulation to assess IPC or execution time
• Stack property of some replacement
algorithms:
– for a given sequence of memory references at a
given level of the hierarchy, the number of misses
is monotonically non-increasing with the size of
the memory
– allows simulating a range of sizes in a single
simulation pass (see the sketch below)
Baer, p. 67
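A minimal one-pass sketch in C (a simplified form of the classic stack-distance algorithm; the MAX_BLOCKS bound and the linear search are illustrative simplifications):

    #include <stdint.h>

    #define MAX_BLOCKS 1024        /* illustrative bound on distinct blocks */

    static uint64_t stack[MAX_BLOCKS];  /* stack[0] = most recently used */
    static int  top = 0;                /* number of blocks seen so far  */
    static long hist[MAX_BLOCKS + 1];   /* hist[d]: references found at depth d */
    static long cold = 0;               /* first-touch (cold) misses */

    /* Record one reference: find the block's depth in the recency
       stack, count it, then move the block to the top (LRU order). */
    void reference(uint64_t block) {
        int d;
        for (d = 0; d < top && stack[d] != block; d++)
            ;
        if (d == top) {                 /* never seen before */
            cold++;
            if (top < MAX_BLOCKS) top++;
        } else {
            hist[d + 1]++;              /* found at depth d+1 (1-based) */
        }
        for (int i = (d < top ? d : top - 1); i > 0; i--)
            stack[i] = stack[i - 1];    /* shift shallower entries down */
        stack[0] = block;
    }

    /* Misses for an LRU memory holding c blocks: cold misses plus
       every reference found deeper than c -- one pass, all sizes. */
    long misses(int c) {
        long m = cold;
        for (int d = c + 1; d <= MAX_BLOCKS; d++)
            m += hist[d];
        return m;
    }

After feeding a trace through reference(), misses(c) reports the miss count for every LRU capacity c without re-running the trace.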
Belady’s Algorithm
• Belady’s algorithm: replace the entry that will
be accessed the furthest in the future.
– It is the optimal algorithm
– It needs to know the future
• not realizable in practice
• useful in simulation to compare with practical
algorithms
Baer, p. 67
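A brute-force simulation sketch in C (the function name and the quadratic scan of the future are deliberate simplifications):

    #include <stdint.h>

    /* Belady/OPT: on a miss with no free frame, evict the resident
       block whose next reference lies furthest in the future.
       Feasible only in simulation, where the whole trace is known. */
    long belady_misses(const uint64_t *trace, int n, int capacity) {
        uint64_t frames[64];              /* resident blocks; capacity <= 64 */
        int used = 0;
        long misses = 0;
        for (int t = 0; t < n; t++) {
            int hit = 0;
            for (int i = 0; i < used; i++)
                if (frames[i] == trace[t]) { hit = 1; break; }
            if (hit) continue;
            misses++;
            if (used < capacity) {        /* a free frame is available */
                frames[used++] = trace[t];
                continue;
            }
            int victim = 0, farthest = -1;
            for (int i = 0; i < used; i++) {
                int next = n;             /* n means: never used again */
                for (int u = t + 1; u < n; u++)
                    if (trace[u] == frames[i]) { next = u; break; }
                if (next > farthest) { farthest = next; victim = i; }
            }
            frames[victim] = trace[t];    /* replace the furthest-used block */
        }
        return misses;
    }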