CMSC 22200 Computer Architecture
Lecture 16: Virtual Memory
Prof. Yanjing Li, University of Chicago

Administrative Stuff
- Lab 5 (multi-core)
  - Basic requirements: out
  - Extra credit (40% + 60%): out by end of this week
  - Due: 11:59pm, Dec. 1st, Thursday
  - Two late days with penalty
- Thanksgiving week
  - No labs on Wednesday
  - No lecture on Thursday
- Exam 2, Wednesday, 11/30, 7-9pm, Kent 107
  - Practice problems will be posted

Where Are We in the Lecture Schedule?
- ISA
- Uarch
  - Datapath, control
  - Single cycle, multi cycle
- Pipelining: basic, dependency handling, branch prediction
- Advanced uarch: OOO, SIMD, VLIW, superscalar
- Caches and advanced caches
- Multi-core
- Today and next 2 lectures (before exam 2)
  - Virtual memory, main memory (DRAM)
  - Review session
- Last lecture: wrap up

Exam 2 Topics
- Focus on materials not covered in Exam 1
  - But everything in class is fair game
- Microarchitecture techniques to improve ILP
  - OOO, SIMD, VLIW
  - How they work, design considerations, tradeoffs
- Caches
  - Basics, design considerations and tradeoffs, advanced techniques
- Multi-core
  - Parallel programs, speedup, parallel computer architectures, cache coherence, memory consistency, synchronization
- Virtual memory
- Main memory (DRAM)

Lecture Outline
- Synchronization mechanisms in shared memory multi-core designs
- Virtual memory

Main Multi-Core Design Issues
- Cache coherence
  - Ensure correct operation in the presence of private caches
- Memory consistency: ordering of memory operations
  - What should the programmer expect the hardware to provide?
- Shared memory synchronization
  - Hardware support for synchronization primitives
- We will discuss the above issues
- Others
  - Shared resource management, interconnects, ...

How NOT To Implement Locks
- Lock:   while (lock_var == 1); lock_var = 1;
- Unlock: lock_var = 0;
- What's the problem?
  - Testing if lock_var is 1 and setting it to 1 are not atomic
  - i.e., another processor can set lock_var to 1 in between -> multiple processors acquire the lock!

Atomic Read & Write Instructions
- A.k.a. read-modify-write
- Specify a memory location and a register
  - I. Value in location read into a register
  - II. Another value stored into location
  - Many variants based on what "values" are allowed in II
- Simple example: test&set
  - Read memory location into specified register
  - Store constant 1 into location
  - Successful if value loaded into register is 0

Using Test&Set to Implement a Lock
- Initialize location to 0

  lock:   t&s register, location   // atomic read-modify-write
          bnz lock                 // if not 0, try again
          ret                      // locked; value in location is 1

  unlock: st location, #0          // write 0 to location
          ret
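To make the test&set spinlock above concrete in a high-level language, here is a minimal C11 sketch using the standard atomics library. atomic_flag_test_and_set plays the role of the t&s instruction; the type name spinlock_t and the chosen memory orders are illustrative assumptions, not part of the lecture's ISA.

    #include <stdatomic.h>

    // A spinlock built on an atomic test-and-set, mirroring the slide above.
    // Initialize the flag with ATOMIC_FLAG_INIT (i.e., "location" starts at 0).
    typedef struct { atomic_flag flag; } spinlock_t;

    static void lock(spinlock_t *l) {
        // Atomically read the old value and set the flag to 1;
        // keep retrying while the old value was already 1 (lock held elsewhere).
        while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
            ;  // spin
    }

    static void unlock(spinlock_t *l) {
        // Write 0 (clear) to release the lock.
        atomic_flag_clear_explicit(&l->flag, memory_order_release);
    }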
Many Others...
- Other read-modify-write primitives
  - Swap
  - Compare&swap
- More fancy implementations to avoid spinning, reduce memory traffic, promote fairness, etc.
- Programmers, consult the ISA specification for the most effective implementation that meets your needs

Virtual Memory

Memory (Programmer's View)
(Figure: the programmer's view of memory as one large, linear address space)

A System with Physical Memory Only
- Examples: most Cray machines, early PCs, nearly all embedded systems
- CPU's load or store addresses are used directly to access memory
(Figure: CPU issues physical addresses 0 through N-1 straight to memory)

Difficulties of Direct Physical Addressing
- Difficult to support code and data relocation
  - Processes come and go; fragmentation issues
- Difficult to provide protection and isolation among multiple processes
- Difficult to support data/code sharing across processes
- Also, ISA can have an address space greater than the physical memory size
  - E.g., a 64-bit address space with byte addressability
  - What if you do not have enough physical memory?

What Are Some Alternatives?
- Base and bound (BB)
- Segmentation
- Both have limitations!

Abstraction: Virtual vs. Physical Memory
- Programmer sees virtual memory
  - Can assume the memory is private and very large
- Reality: physical memory size is much smaller than what the programmer assumes, and it can be shared
- The system (system software + hardware, cooperatively) maps virtual memory addresses to physical memory
  - The system automatically manages the physical memory space transparently to the programmer

Benefits of Virtual Memory
- Automatic management
  - Programmer does not need to know the physical size of memory nor manage it -> a small physical memory can appear as a huge one to the programmer -> life is easier for the programmer
- Each process has its own mapping from virtual -> physical addresses, which enables
  - Code and data to be located anywhere in physical memory (efficient use of physical memory)
  - Isolation/separation of code and data of different processes in physical memory (protection and isolation)
  - Code and data sharing between multiple processes (sharing)

Virtual Memory: Basic Mechanism
- Idea: indirection (in addressing)
- Address generated by each instruction in a program is a "virtual address"
  - i.e., it is not the physical address used to address main memory
- An "address translation" mechanism maps a virtual address to a "physical address"
  - The hardware converts virtual addresses into physical addresses via an OS-managed lookup table (page table)
- Requires hardware + software support!

Virtual Pages, Physical Frames
- Virtual address space divided into pages
- Physical address space divided into frames
- A virtual page is mapped to
  - A physical frame, if the page is in physical memory
  - A location on disk, otherwise
- If an accessed virtual page is not in memory, but on disk
  - Generates a page fault
  - Virtual memory system brings the page into a physical frame and adjusts the mapping -> called demand paging
- Page table: mapping of virtual pages to physical frames

Remember: Page Table is Per Process
- Because each process has its own virtual address space
  - Illusion of full address space for each program
  - Simplifies memory allocation, sharing, linking and loading
- Which table to use is indicated by the page table base register (PTBR)
(Figure: the virtual address spaces of Process 1 and Process 2 (virtual pages VP 1, VP 2, ...) are translated into one physical address space (DRAM); both processes can map a virtual page to the same physical page, e.g., PP 7 holding read-only library code)

A System with Virtual Memory (Page Based)
(Figure: CPU generates virtual addresses; the page table maps them to physical addresses in memory, or to locations on disk)
- Physical memory is a cache for pages stored on disk
  - In fact, it is a fully associative cache in modern systems (a virtual page can be mapped to any physical frame)
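As a rough software model of the mapping just described, the C sketch below shows a page table whose entries point either to a physical frame or to a disk location, with a fault path that implements demand paging. The sizes, names, and the helpers allocate_frame / read_page_from_disk are illustrative assumptions, not part of the lecture.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SHIFT 12          /* assume 4 KB pages */
    #define NUM_VPAGES 1024        /* toy virtual address space: 1024 pages */

    /* One entry per virtual page: either a physical frame or a disk location. */
    typedef struct {
        bool     in_memory;        /* valid/present bit */
        uint32_t frame_or_disk;    /* PFN if in_memory, disk block otherwise */
    } pte_t;

    static pte_t page_table[NUM_VPAGES];

    /* Hypothetical helpers standing in for the OS page-fault path. */
    extern uint32_t allocate_frame(void);                         /* pick a free (or victim) frame */
    extern void     read_page_from_disk(uint32_t disk, uint32_t pfn);

    /* Translate a virtual address; fault and bring the page in if needed.
     * Assumes vaddr fits within the toy address space (vpn < NUM_VPAGES). */
    uint64_t translate(uint64_t vaddr) {
        uint64_t vpn    = vaddr >> PAGE_SHIFT;
        uint64_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
        pte_t *pte = &page_table[vpn];

        if (!pte->in_memory) {                 /* page fault */
            uint32_t pfn = allocate_frame();
            read_page_from_disk(pte->frame_or_disk, pfn);
            pte->frame_or_disk = pfn;          /* adjust the mapping: demand paging */
            pte->in_memory = true;
        }
        return ((uint64_t)pte->frame_or_disk << PAGE_SHIFT) | offset;
    }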
Page Size
- What is the granularity of management of physical memory?
- Specified by the ISA
  - Today: 4KB, 8KB, 4MB, 2GB, ...
  - Small and large pages mixed together
- Large vs. small pages: many tradeoffs
  - Size of the page table
  - Number of page faults
  - Transfer size from disk to memory
  - Internal fragmentation
  - "Coverage" of the TLB (we will see this later)

Virtual to Physical Address Translation
- Parameters
  - P = 2^p = page size (bytes)
  - N = 2^n = virtual-address limit
  - M = 2^m = physical-address limit
- Virtual address (n bits): virtual page number (VPN) in bits n-1..p, page offset in bits p-1..0
- Address translation maps it to a physical address (m bits): physical frame number (PFN) in bits m-1..p, page offset in bits p-1..0
- Page offset bits don't change as a result of translation

Address Translation Using Page Table
- Page table contains an entry for each virtual page
  - Called Page Table Entry (PTE)
- What is in a PTE?
  - A valid bit -> to indicate validity/presence in physical memory
  - PFN for the corresponding VPN -> to support translation
  - Control bits to support replacement
  - Dirty bit to indicate if we need to "write back"
  - Protection bits to enable access control and protection

Address Translation Illustrated
- Separate (set of) page table(s) per process
- VPN forms index into the page table (points to a page table entry)
- Page Table Entry (PTE) provides information about the page
(Figure: the per-process page table base register points to the page table; the VPN acts as the table index and selects a PTE holding a valid bit and the PFN; if valid = 0, the page is not in memory (page fault); otherwise the PFN is combined with the page offset to form the physical address)

Another Function of VM System: Protection
- Virtual memory system serves two functions today
  - Address translation (for illusion of large physical memory)
  - Access control (protection)
- Not every process is allowed to access every page
  - E.g., OS code and data structures should be accessible by system software only, i.e., require supervisor level privilege to access
- Idea: store access control information on a page basis in the process's page table
- Enforce access control at the same time as translation

VM as a Tool for Memory Access Protection
- Extend Page Table Entries (PTEs) with access permission bits
- Check bits on each access
  - If violated, generate exception (Access Protection exception)

  Process i page table:   Read?  Write?  Physical Addr
    VP 0:                 Yes    No      PP 6
    VP 1:                 Yes    Yes     PP 4
    VP 2:                 No     No      XXXXXXX

  Process j page table:   Read?  Write?  Physical Addr
    VP 0:                 Yes    Yes     PP 6
    VP 1:                 Yes    No      PP 9
    VP 2:                 No     No      XXXXXXX

(Figure: both page tables point into the same physical memory, e.g., pages PP 0, PP 2, PP 4, PP 6, PP 8, PP 10, PP 12, ...)

Access Control
- Type of access
  - Read, write, execute
- Privilege level
  - Defined by ISA, e.g., supervisor vs. user
- PTE contains protection bits which specify which accesses can be made to this page at what privilege level
(Figure: access control logic takes the type of access requested, the privilege level of the running process, and the protection bits in the PTE, and outputs whether the access is allowed)

Example: Privilege Levels in x86
(Figure)

Example: Page Level Protection in x86
(Figure)

System Support for Virtual Memory

Both HW and SW Support Required
- Page table is in memory, managed by OS
- Hardware utilizes the information in the page table to perform fast address translation
- The hardware component is called the MMU (memory management unit)
  - Page Table Base Register (PTBR)
  - Translation lookaside buffer (TLB)
  - Page walker logic
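Putting the last few slides together, here is a small C sketch of what the MMU conceptually does on each access: split the virtual address into VPN and offset, index the page table, check the valid and protection bits, and splice the PFN with the unchanged offset. The field names and layout are illustrative assumptions, not any particular ISA's PTE format.

    #include <stdint.h>
    #include <stdbool.h>

    /* Sketch of a PTE with the fields listed above. */
    typedef struct {
        bool     valid;     /* present in physical memory? */
        bool     dirty;     /* page modified; write back before eviction */
        bool     user_ok;   /* accessible at user privilege level? */
        bool     writable;  /* writes permitted? */
        uint32_t pfn;       /* physical frame number */
    } pte_t;

    enum access { ACCESS_READ, ACCESS_WRITE };

    /* Translate one access; returns false on a page fault or protection violation. */
    bool translate(const pte_t *page_table, uint64_t vaddr,
                   unsigned p /* log2(page size) */, enum access type,
                   bool user_mode, uint64_t *paddr_out) {
        uint64_t vpn    = vaddr >> p;
        uint64_t offset = vaddr & ((1ull << p) - 1);
        pte_t pte = page_table[vpn];

        if (!pte.valid)                            return false;  /* page fault */
        if (user_mode && !pte.user_ok)             return false;  /* protection exception */
        if (type == ACCESS_WRITE && !pte.writable) return false;  /* protection exception */

        *paddr_out = ((uint64_t)pte.pfn << p) | offset;  /* offset bits unchanged */
        return true;
    }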
System Software (OS) Jobs for VM
- Keeping track of which physical frames are free
- Populate page table by allocating free physical frames to virtual pages
- Page replacement policy
  - When no physical frame is free, what should be swapped out?
- Sharing pages between processes (e.g., shared libraries)
- Change page tables on context switch
  - To use the running thread's page table
- Handle page faults and ensure correct mapping
- Copy-on-write and other optimizations

Aside: An Interesting Page Replacement Algorithm
- The clock algorithm (LRU approximation)
- Keep a circular list of physical frames in memory and a pointer (hand)
- When a page is accessed, set the reference (R) bit in its PTE
- When a page needs to be replaced, traverse the circular list starting from the hand, clockwise
  - Clear R bits of examined frames
  - Replace the first frame that has the reference (R) bit not set
  - Set the hand pointer to the next frame in the list and stop

Address Translation: Page Hit (HW)
(Figure; note: steps 2-3 may not be necessary on a TLB hit)

Address Translation: Page Fault (HW + SW)
(Figure)

I/O Operation in Page Fault Handler
(1) Processor signals controller
  - Read block of length P starting at disk address X and store starting at memory address Y
(2) Read occurs
  - Direct Memory Access (DMA)
  - Under control of I/O controller
(3) Controller signals completion
  - Interrupt processor
  - OS resumes suspended process
(Figure: (1) processor initiates block read over the memory-I/O bus; (2) the I/O controller performs a DMA transfer from disk to memory; (3) the controller signals read done with an interrupt)

Page Fault ("A Miss in Physical Memory") Resolved
(Figure: before the fault, the page table maps the virtual page to a location on disk; after the fault, the page is in memory and the page table maps the virtual page to a physical frame)

Virtual Memory System Design Considerations

Three Major Issues
I. How large is the page table, and how do we store and access it?
II. How can we speed up translation & access control check?
III. Virtual memory and cache interaction
- Others
  - What happens on a context switch?
  - How to handle multiple page sizes?
  - ...

Page Table Size
(Figure: a 64-bit virtual address = 52-bit VPN + 12-bit page offset; the page table maps the VPN to a 28-bit PFN, which is concatenated with the 12-bit offset to form a 40-bit physical address)
- Suppose 64-bit VA and 40-bit PA, how large is the page table?
  2^52 entries x ~4 bytes ≈ 16 PetaBytes, and that is for just one process!

Virtual Memory Issue I
- Page table can be huge
- Where/how do we store it?
  - In a dedicated hardware structure?
  - In physical memory?
  - In virtual memory?

Solution: Multi-Level Page Tables
- Example from x86 architecture
- The process may not be using the entire VM space!
  - Only the first-level page table has to be in physical memory
  - Remaining levels are in virtual memory (but get cached in physical memory when accessed)

Page Table Access
- Page Table Base Register (PTBR, CR3 in x86)
  - Specifies the address of the page directory
  - Must be a physical address!
- Page Table Limit Register
  - If the VPN is out of bounds, then the process did not allocate the virtual page -> access control exception
- PTBR is part of a process's context
  - Just like PC, status registers, general purpose registers
  - Needs to be loaded when the process is context-switched in

Virtual Memory Issue II
- How fast do we need the address translation to be?
- How can we make it fast?
- Idea: use a hardware structure that caches PTEs -> translation lookaside buffer (TLB)

Speeding up Translation with a TLB
- Essentially a small cache of recent address translations
  - Avoids going to the page table on every reference
  - What happens on a context switch?
- Index = lower bits of VPN
- Tag = unused bits of VPN (+ process ID sometimes)
- Data = a page-table entry
- Status = valid, dirty
- The usual cache design choices (associativity, replacement policy, multi-level, etc.) all apply to the TLB.
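The C sketch below models the index/tag/data breakdown just listed for a direct-mapped TLB. TLB_ENTRIES, PAGE_SHIFT, and the field layout are assumptions chosen for illustration; real TLBs are usually set-associative, and the ASID tag is optional in some designs.

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 64            /* assumed size; typical TLBs hold 16-512 PTEs */
    #define PAGE_SHIFT  12            /* assume 4 KB pages */

    /* One TLB entry: status + tag + the cached page-table entry. */
    typedef struct {
        bool     valid;
        uint16_t asid;                /* process ID tag */
        uint64_t vpn_tag;             /* VPN bits not used as the index */
        uint64_t pte;                 /* cached PTE (PFN + control bits) */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Direct-mapped lookup: index with the low VPN bits, compare the rest as a tag. */
    bool tlb_lookup(uint64_t vaddr, uint16_t asid, uint64_t *pte_out) {
        uint64_t vpn   = vaddr >> PAGE_SHIFT;
        uint64_t index = vpn % TLB_ENTRIES;        /* lower bits of VPN */
        uint64_t tag   = vpn / TLB_ENTRIES;        /* remaining VPN bits */
        tlb_entry_t *e = &tlb[index];

        if (e->valid && e->vpn_tag == tag && e->asid == asid) {
            *pte_out = e->pte;                     /* TLB hit: no page-table access needed */
            return true;
        }
        return false;                              /* TLB miss: walk the page table */
    }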
TLB Examples
- Typical numbers: 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, 0.01%-1% miss rate

TLB Misses
- The TLB is small; it cannot hold all PTEs
  - Some translations will inevitably miss in the TLB
- A TLB miss indicates either
  - Page present, but PTE not in TLB
  - Page not present
- On a TLB miss, access memory to find the appropriate PTE
  - Called walking the page directory/table
  - Large performance penalty
- Who handles TLB misses? Hardware or software?

Handling TLB Misses
- Approach #1: Hardware-Managed (e.g., x86)
  - The hardware does the page walk
  - The hardware fetches the PTE and inserts it into the TLB
    - If the TLB is full, the entry replaces another entry based on the replacement policy
  - Done transparently to system software
- Approach #2: Software-Managed (e.g., MIPS)
  - The hardware raises an exception
  - The operating system does the page walk
  - The operating system fetches the PTE
  - The operating system inserts/evicts entries in the TLB

Tradeoffs
- Hardware-Managed TLB
  - Pro: No exception on a TLB miss; the instruction simply waits
  - Pro: Independent instructions may continue
  - Pro: No extra instructions/data brought into caches
  - Con: Page directory/table organization is etched into the system; the OS has little flexibility
- Software-Managed TLB
  - Pro: The OS can define the page table/directory organization
  - Pro: More sophisticated TLB replacement policies are possible
  - Con: Need to generate an exception -> performance overhead due to pipeline flush, exception handler execution, extra instructions brought into caches
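For Approach #2 above, the sketch below shows roughly what a software TLB miss handler could do with a two-level table like the earlier x86 example: walk the directory and table in software, then install the PTE so the faulting instruction can be retried. The 10/10/12 address split and the hooks page_directory_base, pte_page, tlb_insert, and raise_page_fault are hypothetical, not a real OS or ISA interface.

    #include <stdint.h>

    /* Assumed 32-bit split: 10-bit directory index, 10-bit table index, 12-bit offset. */
    #define DIR_INDEX(va)   (((va) >> 22) & 0x3FF)
    #define TBL_INDEX(va)   (((va) >> 12) & 0x3FF)
    #define PTE_PRESENT     0x1u

    /* Hypothetical hooks provided by the OS / hardware interface. */
    extern uint32_t *page_directory_base(void);               /* current process's PTBR contents */
    extern uint32_t *pte_page(uint32_t dir_entry);            /* second-level table for a directory entry */
    extern void      tlb_insert(uint32_t va, uint32_t pte);   /* write the PTE into a TLB entry */
    extern void      raise_page_fault(uint32_t va);

    /* Exception handler for a TLB miss: walk the two-level table in software,
     * then fill the TLB and return so the faulting instruction is retried. */
    void tlb_miss_handler(uint32_t faulting_va) {
        uint32_t *dir = page_directory_base();
        uint32_t dir_entry = dir[DIR_INDEX(faulting_va)];
        if (!(dir_entry & PTE_PRESENT)) {           /* second-level table not present */
            raise_page_fault(faulting_va);
            return;
        }
        uint32_t *table = pte_page(dir_entry);
        uint32_t pte = table[TBL_INDEX(faulting_va)];
        if (!(pte & PTE_PRESENT)) {                 /* page itself not present */
            raise_page_fault(faulting_va);
            return;
        }
        tlb_insert(faulting_va, pte);               /* TLB fill; return from exception */
    }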