Operating Systems — 6. Memory Management (2006/2007)
6.1 Main Memory
Paulo Marques, Departamento de Eng. Informática, Universidade de Coimbra
[email protected]

Disclaimer
These slides and notes are heavily based on the companion material of [Silberschatz05]. In some cases, material from [Stallings04] is also used. The original material can be found at:
http://codex.cs.yale.edu/avi/os-book/os7/slide-dir/index.html
http://williamstallings.com/OS/OS5e.html
The respective copyrights belong to their owners.

Multi-step Processing of a User Program
Address binding of instructions and data to memory addresses can happen at three different stages:
- Compile time: if the memory location is known a priori, absolute code can be generated; the code must be recompiled if the starting location changes.
- Load time: relocatable code must be generated if the memory location is not known at compile time.
- Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Requires hardware support for address mapping (e.g., base and limit registers).

Memory Management Unit
- Logical address: generated by the CPU; also referred to as virtual address.
- Physical address: the address seen by the memory unit.
[Figure: the CPU puts a logical address (e.g., 0x12339912) on the address bus; the MMU translates it into a physical address (e.g., 0x00340344) before it reaches memory.]

Dynamic Relocation
[Figure: the MMU adds a relocation (base) register to every logical address.]
- The programmer does not know where the program will be placed in memory when it is executed.
- While the program is executing, it may be swapped to disk and returned to main memory at a different location (relocated).
- Memory references in the code must be translated into actual physical memory addresses.

Dynamic Loading & Dynamic Linking
Dynamic loading:
- A routine is not loaded until it is called.
- Better memory-space utilization: an unused routine is never loaded.
- Useful when large amounts of code are needed to handle infrequently occurring cases.
- No special support from the operating system is required; it is implemented through program design.
Dynamic linking:
- Linking is postponed until execution time.
- A small piece of code, the stub, locates the appropriate memory-resident library routine.
- The stub replaces itself with the address of the routine and executes the routine.
- The operating system is needed to check whether the routine is already in the process's address space.
- Dynamic linking is particularly useful for libraries; such a system is also known as shared libraries.

All that "jazz"…
[Figure: executable sizes when the "math" library is linked statically vs. dynamically.]

Simple Memory Protection
[Figure: base and limit registers define the partition a process may access.]

Simple Memory Protection (2)
[Figure: the MMU checks each logical address against the limit register and adds the base register; out-of-range references trap to the OS.]

Swapping
[Figure: processes are swapped out to a backing store on disk and swapped back into main memory.]

Swapping (2)
A process can be swapped temporarily out of memory to disk, and then brought back into memory for continued execution. Example:
- Disk transfer rate: 5 MB/sec
- Memory to swap: 1 MB
- Time it takes to swap out: 1/5 sec = 200 msec
- Average disk latency: 8 msec
- Time to swap out + swap in: 2 x (200 + 8) = 416 msec
The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped.

Contiguous Memory Allocation
- Memory must be allocated to processes as they are created.
- Hole: a block of available memory; holes of various sizes are scattered throughout memory.
- When a process arrives, it is allocated memory from a hole large enough to accommodate it.
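Tying the relocation and protection slides above to contiguous allocation: a minimal C sketch of base-and-limit translation, assuming one base/limit register pair loaded by the dispatcher when the process is scheduled. The structure and the register values are illustrative, not any real MMU's interface.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative MMU state: one base and one limit register,
 * loaded by the OS dispatcher on each context switch. */
typedef struct {
    uint32_t base;   /* start of the process's partition in physical memory */
    uint32_t limit;  /* size of the partition, in bytes */
} mmu_t;

/* Translate a CPU-generated logical address into a physical one.
 * Every reference is first checked against the limit register;
 * a reference outside the partition traps to the OS. */
static uint32_t translate(const mmu_t *mmu, uint32_t logical) {
    if (logical >= mmu->limit) {
        fprintf(stderr, "addressing error: trap to OS\n");
        exit(EXIT_FAILURE);
    }
    return mmu->base + logical;  /* dynamic relocation */
}

int main(void) {
    mmu_t mmu = { .base = 0x00340000u, .limit = 0x00100000u }; /* 1 MB partition */
    printf("0x%08x -> 0x%08x\n", 0x344u, (unsigned)translate(&mmu, 0x344u));
    return 0;
}
```

Because the check and the addition happen on every reference, the process can be moved to a different partition (base register reloaded) without recompiling or relinking its code.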
The operating system maintains information about: a) allocated partitions; b) free partitions (holes).
[Figure: successive memory snapshots as processes 5, 8, 9, 2 and 10 are allocated and freed, leaving holes of various sizes.]

Fragmentation
- External fragmentation: no single free block is large enough to accommodate a process, although overall there may be enough free memory.
- Internal fragmentation: when memory is allocated in fixed-size chunks, part of an allocated chunk may go unused.
[Figure: holes between processes 5, 10 and 2 illustrate external fragmentation; unused space inside an allocated chunk illustrates internal fragmentation.]

Memory Management
1. Fixed partitioning: equal-size partitions; unequal-size partitions.
2. Dynamic partitioning: varying-size partitions according to process needs.
3. Buddy system: a 2^N allocator.
4. Paging: an advanced form of equal-size fixed partitioning.
5. Segmentation: a form of dynamic partitioning.

Fixed Partitioning
Equal-size partitions:
- Any process whose size is less than or equal to the partition size can be loaded into an available partition.
- If all partitions are full, the operating system can swap a process out of a partition.
- Main memory use is inefficient: any program, no matter how small, occupies an entire partition (internal fragmentation).
Unequal-size partitions:
- Each process can be assigned to the smallest partition within which it fits.
- One queue per partition.
- Processes are assigned so as to minimize wasted memory within a partition (minimizes internal fragmentation).

Unequal-size partitions
[Figure: memory divided into partitions of different sizes, with one process queue per partition.]

Dynamic Partitioning
- Partitions are of variable length and number.
- A process is allocated exactly as much memory as it requires.
- Eventually, holes appear in memory: external fragmentation.
- Compaction must be used to shift processes so they are contiguous and all free memory is in one block. A time-consuming task!

Dynamic Partitioning Placement Algorithms
The operating system must decide which free block to allocate to a process.
- First-fit: allocate the first hole that is big enough. Fast: the search can stop at the first adequate block. Suffers from external fragmentation.
- Best-fit: allocate the smallest hole that is big enough. Must search the entire list, unless it is ordered by size. Produces the smallest leftover hole: external fragmentation.
- Worst-fit: allocate the largest hole. Must also search the entire list. Produces the largest leftover hole.
In general, first-fit and best-fit are better than worst-fit. Nevertheless, it cannot be said that best-fit is better than first-fit.

Buddy System
- Normally used to allocate memory in the kernel.
- The entire available space is treated as a single block of size 2^U.
- If a request of size s is such that 2^(U-1) < s <= 2^U, the entire block is allocated. Otherwise, the block is split into two equal buddies; the process continues until the smallest block greater than or equal to s is generated (see the sketch after this slide).
- When a block becomes free, it may be coalesced with its corresponding buddy.
- It is a very fast system! A compromise between internal and external fragmentation.
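The split-until-it-fits rule above is small enough to sketch directly. A toy buddy allocator in C, assuming a 2^10-byte region and simple per-order free lists; the names and sizes are illustrative, and a real kernel allocator would add coalescing of freed buddies, bitmaps and locking.

```c
#include <stdio.h>

/* Minimal buddy-system sketch over a 2^U-byte region, tracking free
 * blocks as (order, offset) pairs in per-order free lists. */
#define U 10                      /* whole region is 2^10 = 1024 bytes */
#define MAX_FREE 64

static int free_cnt[U + 1];                 /* free blocks per order */
static unsigned free_off[U + 1][MAX_FREE];  /* their offsets         */

static void add_free(int order, unsigned off) {
    free_off[order][free_cnt[order]++] = off;
}

/* Smallest order k with 2^k >= size. */
static int order_for(unsigned size) {
    int k = 0;
    while ((1u << k) < size) k++;
    return k;
}

/* Allocate by finding a free block of order >= k and splitting it
 * into buddies until a block of exactly order k remains. Returns the
 * block's offset, or -1 if no block is available. */
static long buddy_alloc(unsigned size) {
    int k = order_for(size), j = k;
    while (j <= U && free_cnt[j] == 0) j++;   /* find a big-enough block */
    if (j > U) return -1;
    unsigned off = free_off[j][--free_cnt[j]];
    while (j > k) {                           /* split: keep the low half, */
        j--;                                  /* free the high buddy       */
        add_free(j, off + (1u << j));
    }
    return (long)off;
}

int main(void) {
    add_free(U, 0);                           /* one free 1024-byte block */
    printf("100-byte request -> offset %ld (128-byte block)\n", buddy_alloc(100));
    printf("240-byte request -> offset %ld (256-byte block)\n", buddy_alloc(240));
    return 0;
}
```

A 100-byte request is rounded up to 128 bytes (internal fragmentation of 28 bytes), which is why the buddy system is a compromise between internal and external fragmentation.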
Example of the Buddy System in action
[Figure: a 2^U block is repeatedly split and coalesced as requests of various sizes arrive and are released.]

Paging
- The logical address space of a process can be noncontiguous; the process is allocated physical memory wherever it is available.
- Divide physical memory into fixed-size blocks called frames (typically 4 KB).
- Divide logical memory into blocks of the same size called pages (typically 4 KB).
- Keep track of all free frames.
- To run a program of size n pages, find n free frames and load the program.
- Set up a page table to translate logical to physical addresses.
- Quite useful when associated with swapping.
- Suffers only from internal fragmentation.

Paging (simplified)
[Figure: the 4 GB address spaces of processes A and B are mapped through page tables onto 256 MB of physical memory, with some pages kept on disk.]
- Normally there is one page table per process.
- When a process is executing, the Page-Table Base Register (PTBR) is loaded with the address of its page table.

Swap file
[Figure: pages not resident in physical memory are kept in a swap file on disk.]

Address Translation
An address of m bits generated by the CPU is divided into:
- Page number (p): used as an index into a page table, which contains the base address of each page in physical memory.
- Page offset (d): combined with that base address to form the physical address sent to the memory unit.
For a logical address space of size 2^m and a page size of 2^n, the page number p has m - n bits and the page offset d has n bits.

Paging Hardware
[Figure: the page number indexes the page table; the resulting frame number is concatenated with the page offset to form the physical address.]

Logical Model
Remember:
- Typically, one page table per process.
- Each entry of a page table is called a PTE (Page Table Entry).

Free Frames
[Figure: the free-frame list before and after allocating frames for a new process.]

Implementation in Hardware
- The page table is kept in main memory.
- The Page-Table Base Register (PTBR) points to the page table.
- The Page-Table Length Register (PTLR) indicates the size of the page table.
- In this scheme, every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction!
- The two-memory-access problem can be solved with a special fast-lookup hardware cache called a Translation Look-aside Buffer (TLB), which has a small number of entries (64-1024).

Paging Hardware with TLB
[Figure: the TLB is consulted first; only on a TLB miss is the page table in memory used, and the TLB is then updated.]

Protection
Each page table entry has an associated "valid bit". If the bit is not set, the page is not mapped in the address space of the process in question.

Structure of Page Tables
- With 4 KB pages and a 4-byte PTE, a 32-bit address space requires a 4 MB page table!
- The page table should therefore not be allocated contiguously in memory.
- Several common schemes: hierarchical paging, hashed page tables, inverted page tables.

Hierarchical Paging
- Divide the page table into smaller pieces: part of the page table can itself be paged.
- The root page always remains in main memory.
[Figure: a 4 KB root table points to 4 MB of per-process page tables covering a 4 GB address space.]

Hierarchical Paging Addressing
A logical address (on a 32-bit machine with a 4 KB page size) is divided into:
- a page number of 20 bits;
- a page offset of 12 bits.
Since the page table is paged, the page number is further divided into:
- a 10-bit outer page number (p1);
- a 10-bit inner page number (p2).
Thus, a logical address is laid out as | p1 (10 bits) | p2 (10 bits) | d (12 bits) |, where p1 is an index into the outer page table and p2 is the displacement within the page of the outer page table.

Address Translation Scheme
- TLBs are essential to guarantee good performance (see the sketch after this slide).
- For 64-bit address spaces, at least three levels are necessary! A two-level split would typically be 32+10+10+12, which is inadequate…
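The 10+10+12 split above comes down to shifts and masks. A minimal C sketch of the field extraction; the commented page-table walk only indicates, schematically, how p1 and p2 would be used.

```c
#include <stdint.h>
#include <stdio.h>

/* Field extraction for the two-level 10+10+12 scheme above. */
#define P1(va)  (((va) >> 22) & 0x3FFu)   /* outer page-table index (10 bits) */
#define P2(va)  (((va) >> 12) & 0x3FFu)   /* inner page-table index (10 bits) */
#define OFF(va) ((va) & 0xFFFu)           /* page offset (12 bits)            */

int main(void) {
    uint32_t va = 0x12339912u;
    printf("va=0x%08x -> p1=%u, p2=%u, offset=0x%03x\n",
           (unsigned)va, (unsigned)P1(va), (unsigned)P2(va), (unsigned)OFF(va));
    /* A full walk would then be, schematically:
     *   page_of_table = outer_table[P1(va)];
     *   frame         = page_of_table[P2(va)];
     *   pa            = (frame << 12) | OFF(va);
     */
    return 0;
}
```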
A Two-level Paging System
[Figure: the outer page table points to pages of the page table, which in turn point to frames in physical memory.]

Hashed Page Tables
- Common for address spaces larger than 32 bits.
- The virtual page number is hashed into a page table; each entry of this table contains a chain of elements hashing to the same location.
- Virtual page numbers are compared along this chain, searching for a match. If a match is found, the corresponding physical frame is extracted. (See the C sketch further below.)

Hashed Page Table
[Figure: the hash of the virtual page number selects a bucket; the chain is walked until the matching entry yields the frame number.]

Inverted Page Tables
- One entry for each real page (frame) of memory.
- Each entry consists of the virtual address of the page stored in that real memory location, together with information about the process that owns the page.
- Decreases the memory needed to store the page tables, but increases the time needed to search the table when a page reference occurs.
- A hash table can be used to limit the search to one (or at most a few) page-table entries.
- Used in the 64-bit UltraSPARC and PowerPC.

Inverted Page Table Architecture
[Figure: the pair (pid, page number) is searched for in the inverted table; the index of the matching entry is the frame number.]

Segmentation
- A memory-management scheme that supports the user's view of memory.
- A program is a collection of segments; a segment is a logical unit of data.

Logical View of Segmentation
[Figure: the text, data, heap and stack segments of the user space are mapped to separate regions of physical memory.]

Segmentation
- A program has several segments.
- The segments of a program do not all have to be of the same length, but there is a maximum segment length.
- An address consists of two parts: a segment number (s) and a segment displacement/offset (d).
- Since segments are not of equal size, segmentation is similar to dynamic partitioning. (Nowadays, segmentation has fallen into disuse…)

Segmentation Hardware
[Figure: the segment number indexes the segment table; the offset is checked against the segment limit and added to the segment base.]

Example of Segmentation
[Figure: the segments of a program placed at different locations in physical memory via the segment table.]

Example: Intel Pentium
- Supports both paging and segmentation, and a combination of the two.
- Maximum segment length: 4 GB.
- Maximum number of segments per process: 16 K.
- 8 K segments are private to the process (Local Descriptor Table).
- 8 K segments are shared among all processes (Global Descriptor Table).

Intel Pentium Segmentation
- The Pentium has 6 segment selector registers, allowing 6 segments to be accessed at the same time.
- Each selector is a 16-bit number: segment index (13 bits), local/global (1 bit), protection (2 bits).

Intel Pentium Paging
- Allows page sizes of 4 KB or 4 MB.
- For 4 KB pages, a normal two-level paging scheme is used: the first 10 bits index the Page Directory, and the second 10 bits index the Page Table. The inner page table entry contains the page frame, which is combined with the page offset.
- Page tables themselves can be swapped to disk. An invalid bit in the Page Directory entry indicates this; in that case, the remaining 31 bits of the entry can be used to locate the page on disk.

Intel Pentium Paging (2)
[Figure: the linear address is translated through the Page Directory and a Page Table to a 4 KB frame, or directly through the Page Directory to a 4 MB frame.]

Linux on Pentium Systems
Linux uses only 6 segments:
- Kernel code
- Kernel data
- User code
- User data
- Task-State Segment (TSS)
- Default LDT segment
Linux uses a three-level paging mechanism; on the Pentium, the size of the middle directory is set to 0 (it uses 0 bits).

Three-Level Paging in Linux
[Figure: the linear address is split into global directory, middle directory, page table and offset fields.]
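Closing this part: a small C sketch of the hashed page table lookup described above. The bucket count, hash function and entry layout are illustrative assumptions, not any particular hardware's format.

```c
#include <stdint.h>
#include <stdio.h>

/* Each bucket holds a chain of (virtual page, frame) entries that
 * hash to the same slot. */
#define BUCKETS 1024

typedef struct entry {
    uint64_t vpn;          /* virtual page number   */
    uint64_t frame;        /* physical frame number */
    struct entry *next;    /* chain of colliding entries */
} entry_t;

static entry_t *table[BUCKETS];

static unsigned hash_vpn(uint64_t vpn) {
    return (unsigned)(vpn % BUCKETS);   /* toy hash */
}

/* Walk the chain comparing virtual page numbers; on a match the
 * corresponding frame is extracted, otherwise it is a page fault. */
static long lookup(uint64_t vpn) {
    for (entry_t *e = table[hash_vpn(vpn)]; e != NULL; e = e->next)
        if (e->vpn == vpn)
            return (long)e->frame;
    return -1;  /* not mapped: page fault */
}

int main(void) {
    static entry_t a = { .vpn = 5,           .frame = 42, .next = NULL };
    static entry_t b = { .vpn = 5 + BUCKETS, .frame = 7,  .next = &a };
    table[hash_vpn(5)] = &b;   /* two entries colliding in one bucket */
    printf("vpn 5    -> frame %ld\n", lookup(5));
    printf("vpn 1029 -> frame %ld\n", lookup(5 + BUCKETS));
    printf("vpn 9    -> %ld (page fault)\n", lookup(9));
    return 0;
}
```

The cost of a lookup is the chain length, which a good hash function keeps short even for very large address spaces.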
Reference
Chapter 8: Main Memory. All of chapter 8: 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 8.8.

Operating Systems — 6. Memory Management (2006/2007)
6.2 Virtual Memory
Paulo Marques, Departamento de Eng. Informática, Universidade de Coimbra
[email protected]

Virtual Memory
As seen before, virtual memory provides a separation between physical memory and the logical address space:
- Allows for more "logical memory" than what physically exists.
- Easy sharing of memory between processes.
- Efficient/fast process creation (copy-on-write).
[Figure: the 4 GB address spaces of processes A and B are mapped through page tables onto 256 MB of physical memory, with some pages on disk.]

Paging to disk
[Figure: pages of the address space that are not resident in memory are fetched from disk on demand.]

Fetch Policy
How do we bring pages from disk into main memory?
- Demand paging brings a page into main memory only when a reference is made to a location on that page. Less I/O needed, less memory needed, faster response, more users; but many page faults when a process first starts.
- Pre-paging brings in more pages than needed. Efficient if the pages reside contiguously on disk, but uses up memory.

How does it work?
- Each page table entry has an associated valid/invalid bit (v = in memory, i = not in memory).
- Initially, the valid/invalid bit is set to i on all entries.
- During address translation, if the valid/invalid bit in the page table entry is i, a page fault occurs.
[Figure: a page-table snapshot with frame numbers for the valid entries and i for the pages not in memory.]

Example
[Figure: an address space partially resident in memory, with the remaining pages on disk.]

Steps in Handling a Page Fault
[Figure: the reference traps to the OS, the page is located on disk, read into a free frame, the page table is updated, and the instruction is restarted.]

Performance of On-Demand Paging
Page fault rate p, with 0 <= p <= 1.0:
- if p = 0, there are no page faults;
- if p = 1, every reference is a fault.
Effective Access Time (EAT):
EAT = (1 - p) x memory access time + p x page fault time
where the page fault time includes the page-fault overhead, the swap out + swap in, and the OS restart overhead.

Example
- Average page fault overhead: 10 ms. Memory access time: 100 ns.
- EAT = (1 - p) x 100 + p x 10 000 000 ns = (100 + 9 999 900 p) ns
- The EAT is directly proportional to the page-fault rate!
- If one access out of 1000 causes a page fault, then p = 0.001 and EAT ≈ 10 µs: the computer is slowed down by a factor of about 100!!! "Did you say thrashing?"

Process Creation
Virtual memory permits fast process creation (forking) by allowing Copy-on-Write:
- Copy-on-Write (COW) allows parent and child processes to initially share the same pages in memory.
- Only if either process modifies a shared page is the page copied.
- COW allows more efficient process creation, as only modified pages are copied.
- Free pages are allocated from a pool of zeroed-out pages.

Copy-on-Write
[Figure: processes A and B share pages A, B and C in main memory; when process B modifies page C, it gets a private copy of C.]

Memory-Mapped Files
Paging and virtual memory also allow for memory-mapped files:
- Memory-mapped file I/O allows file I/O to be treated as routine memory access, by mapping disk blocks to pages in memory.
- A file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page. Subsequent reads/writes to/from the file are treated as ordinary memory accesses.
- Simplifies file access by treating file I/O through memory rather than through read()/write() system calls. (See the mmap() sketch below.)
- Also allows several processes to map the same file, sharing the pages in memory.

Memory-Mapped Files
[Figure: two processes map the same file; their page tables point to shared frames backed by the file on disk.]

Example
[Figure: a file mapped into a process's address space.]

I/O Interlock
- Sometimes pages have to be locked in memory.
- Example: I/O buffers used for DMA (Direct Memory Access), while a device is copying data directly into memory.
- Implemented with a "lock bit" associated with each frame.
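The memory-mapped file slides above map directly onto the POSIX mmap() call. A minimal sketch: "data.txt" is an illustrative file name, the file is assumed non-empty, and error handling is kept short.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the whole file, read-only and shared. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  /* the mapping stays valid after close */

    /* Touching p[i] may page-fault; the kernel then reads
     * the corresponding file block in on demand. */
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];
    printf("%lld bytes, byte sum %ld\n", (long long)st.st_size, sum);

    munmap(p, st.st_size);
    return 0;
}
```

Note that after mmap() returns there are no further read() calls in the program: file access has become ordinary, demand-paged memory access.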
Replacement Policy
At a certain point, it is necessary to bring a page stored on disk back into memory. Which page should be replaced? Some considerations:
- Use the dirty (modify) bit to reduce the overhead of page transfers: only modified pages are written back to disk.
- The page removed should be the page least likely to be referenced in the near future.
- Most policies predict future behavior based on past behavior.

Basic Page Replacement Algorithm
1. Find the location of the desired page on disk.
2. Find a free frame: if there is a free frame, use it; if there is no free frame, use a page replacement algorithm to select a victim frame. If the victim is dirty, write it back to disk.
3. Read the desired page into the (newly) free frame; update the page and frame tables.
4. Restart the process.

Two Issues
- Page replacement policy: which page to replace.
- Frame allocation policy: how many frames to allocate to each process.

Algorithms for Page Replacement
- Goal: the lowest possible page-fault rate.
- An algorithm is evaluated by running it on a given string of memory references and counting the page faults.
- Example of a reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- Algorithms: FIFO, OPTIMAL, LRU, CLOCK.

Of course…
Page faults should decrease with the number of available frames!
[Figure: the expected page-fault curve, decreasing as frames increase.]

FIFO Page Replacement
- The simplest algorithm to implement: each page is time-stamped according to when it was brought into memory (a simple queue). The page that has been in memory the longest is replaced.
- Problem 1: the pages that have been in memory the longest may be used quite frequently!
- Problem 2 (Belady's anomaly): more available frames can actually result in more page faults!

Example
[Figure: FIFO page replacement on a reference string.]

Belady's Anomaly
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 (* marks a page fault; see the simulation sketch further below).

With 3 memory frames: 9 page faults

  Ref:      1  2  3  4  1  2  5  1  2  3  4  5
  Frame 1:  1  1  1  4  4  4  5  5  5  5  5  5
  Frame 2:  -  2  2  2  1  1  1  1  1  3  3  3
  Frame 3:  -  -  3  3  3  2  2  2  2  2  4  4
  Fault:    *  *  *  *  *  *  *        *  *

With 4 memory frames: 10 page faults

  Ref:      1  2  3  4  1  2  5  1  2  3  4  5
  Frame 1:  1  1  1  1  1  1  5  5  5  5  4  4
  Frame 2:  -  2  2  2  2  2  2  1  1  1  1  5
  Frame 3:  -  -  3  3  3  3  3  3  2  2  2  2
  Frame 4:  -  -  -  4  4  4  4  4  4  3  3  3
  Fault:    *  *  *  *        *  *  *  *  *  *

Belady's Anomaly
[Figure: page faults vs. number of frames for FIFO on this reference string, showing the increase from 9 to 10 faults.]

Optimal Replacement Algorithm
- A conceptual model, very hard/impossible to implement: it requires knowledge of the future.
- Used mostly as a baseline, to measure how good real algorithms are.
- Replaces the page that will not be used for the longest period of time.

LRU — Least Recently Used Algorithm
- Replaces the page that has not been referenced for the longest time.
- By the principle of locality, this should be the page least likely to be referenced in the near future.
- Each page could be tagged with the time of its last reference, but this would require a great deal of overhead.

LRU — Least Recently Used Algorithm
How to implement it?
- Counters: associate each PTE with a time-of-use counter, updated every time the page is used. Finding the LRU PTE requires scanning the whole page table. (Slow!)
- Stack: maintain a stack of page numbers; whenever a page is referenced, it is moved from its position to the top. The LRU page is at the bottom. (Some overhead on each update, but finding the LRU page is fast.)
In practice, few computers provide the hardware support needed to implement true LRU. Approximations are used…
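FIFO replacement and Belady's anomaly are easy to check by simulation. A small C sketch that replays the reference string from the slides above and counts faults; it reproduces the 9-fault (3 frames) and 10-fault (4 frames) results.

```c
#include <stdio.h>
#include <string.h>

/* FIFO page replacement: the oldest resident page is the victim. */
static int fifo_faults(const int *refs, int n, int nframes) {
    int frames[8];
    int used = 0, next = 0, faults = 0;   /* next = FIFO victim index */
    memset(frames, -1, sizeof frames);
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < nframes) {
            frames[used++] = refs[i];     /* still a free frame */
        } else {
            frames[next] = refs[i];       /* evict the oldest page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}

int main(void) {
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));
    printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));
    return 0;
}
```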
Clock Replacement (or Second Chance) Algorithm
- Requires a use bit (or reference bit) for each page; allows implementing aging of pages (i.e., approximating LRU).
- Algorithm: when a page is loaded into memory, set use = 0; when a page is referenced, set use = 1. When it is time to replace a page, the first frame encountered with the use bit at 0 is replaced; during the search, each use bit found at 1 is set to 0 (aging).
- In fact, more than one use bit can be used, for finer granularity…

Clock Replacement Algorithm
[Figure: frames arranged in a circle; the clock hand sweeps, clearing use bits until it finds a frame with use = 0.]

Comparison between page replacement algorithms
[Figure: page-fault rates of FIFO, CLOCK, LRU and OPT as a function of the number of frames.]

Frame Allocation and Thrashing
Two issues: the page replacement policy (which page to replace) and the frame allocation policy (how many frames to allocate to each process).
If a process does not have "enough" pages, its page-fault rate is very high. This leads to:
- Low CPU utilization.
- The operating system thinking it needs to increase the degree of multiprogramming, so another process is added to the system.
- High I/O.
Thrashing: a process is busy swapping pages in and out without doing anything useful!

Thrashing vs. multiprogramming
[Figure: CPU utilization rises with the degree of multiprogramming until thrashing sets in, then collapses.]

Locality in Memory
Why does paging work?
- A process migrates from working within one "locality" to another.
- Localities may actually overlap.
Why does thrashing occur?
- When the size of the locality > total memory size.
Working set: the set of pages a process is currently working on. (Captures the current locality!)

Locality in Memory Accesses
[Figure: a memory-access trace showing clusters of references, one locality per execution phase.]

Working Set Model
- Δ = working-set window: a fixed number of page references. Example: 10,000 accesses.
- WSS_i (working set size of process P_i) = total number of pages referenced in the most recent Δ (varies in time):
  - if Δ is too small, it will not encompass the entire locality;
  - if Δ is too large, it will encompass several localities;
  - if Δ = ∞, it will encompass the entire program.
- D = Σ WSS_i = total demand for frames.
- If D > m (the number of available frames), thrashing occurs.
- Policy: if D > m, then suspend one of the processes.

Working Set Model
[Figure: the working set of a process at two points in time, for a window of Δ references.]

Keeping Track of the Working Set
- Approximate with an interval timer + a reference bit.
- Example: Δ = 10,000 references; the timer interrupts every 5,000 time units; keep 2 bits in memory for each page. Whenever the timer interrupts, copy the reference bits and then set them all to 0. If one of the bits in memory = 1, the page is in the working set.
- This scheme is not completely accurate… (Why?)
- Improvement: 10 bits and an interrupt every 1,000 time units. Problem: too much time spent processing the interrupts and keeping track of the working set.

Page-Fault Frequency Scheme
- Establish an "acceptable" page-fault rate:
- If the actual rate is too low, the process loses frames.
- If the actual rate is too high, the process gains frames.

Operating Systems Example: Windows XP
- Uses demand paging with clustering: clustering brings in the pages surrounding the faulting page.
- Processes are assigned a working set minimum and a working set maximum. The working set minimum is the minimum number of pages the process is guaranteed to have in memory; a process may be assigned pages up to its working set maximum.
- When the amount of free memory in the system falls below a threshold, automatic working set trimming is performed to restore the amount of free memory: it removes pages from processes that have more pages than their working set minimum.

Reference
Chapter 9: Virtual Memory. 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.9.5, 9.9.6, 9.10.1, 9.11. Exclude from 9.4: 9.4.7 and 9.4.8.