CS 325: CS Hardware and Software Organization and Architecture
Internal Memory

Semiconductor Main Memory
• Early computers used doughnut-shaped ferromagnetic loops called cores to store each bit.
• This main memory was often referred to as “core memory” or just “core”.
• Some terms still exist: “core dump”.

Memory Cells
• Properties:
  • Has two stable or semi-stable states, representing 1 and 0.
  • Can be written to at least once to set the state.
  • Can be read to sense the state.

Memory Cell Operation
• The select line selects a cell for the operation specified by the control line.
• The control line carries a read or write signal.
• The data/sense line senses the current state (read) or sets a new state (write).

Semiconductor Memory Types

Semiconductor Memory
• RAM (Random Access Memory)
  • Misnamed, because all semiconductor memory is “random access”: the time required to access any address is constant and does not depend on the previously accessed address.
  • Read and write abilities
  • Volatile
  • Temporary storage
• Two technologies:
  • Dynamic RAM: analog device, uses a capacitor to store charge.
  • Static RAM: digital device, uses flip-flop logic gates to store state.

Dynamic RAM (DRAM)
• Bits are stored as charge in capacitors.
• Capacitor charge leaks, so a refresh circuit is needed.
• Simpler construction than static RAM.
• Used for main memory.
• Essentially analog rather than digital:
  • The level of capacitor charge determines the logic value of the memory cell.

Dynamic RAM Structure (Main Memory)
• The address line is active when the bit value of the cell is read or written.
• Write: a voltage is applied to the bit line, high for 1 and low for 0.
• Read: the charge in the capacitor is fed out onto the bit line to a sense amplifier, which compares it with a reference value.
• Reading discharges the capacitor, so its charge must be restored to complete the operation.

Static RAM (SRAM)
• Digital device that uses the same logic elements as the CPU.
• Binary values are stored using traditional flip-flop logic-gate configurations.
• No charge to leak
  • No refresh needed while powered
• More complex construction: 6 transistors per bit
• Larger and more expensive per bit, but faster than DRAM

Static RAM Structure (Cache Memory)
• The address line is active when the bit value of the cell is read or written.
• Write: the desired bit value is applied to line B and its complement to line B̄.
• Read: the bit value is read from line B.

SRAM (Cache) vs. DRAM (Main Memory)
• Both are volatile
  • Power is needed to preserve data
• Dynamic RAM
  • Simpler to build and smaller than SRAM
  • Therefore denser and less expensive
  • Needs refresh circuitry
  • Favored for larger memory units
  • Typically has a synchronous interface (responds to control inputs based on a clock signal): SDRAM, today most commonly DDR SDRAM
• Static RAM
  • Faster than DRAM
  • More expensive to build
  • Favored for cache memory
  • Typically has an asynchronous interface (responds to control inputs as quickly as possible)

SDR and DDR SDRAM
• SDR SDRAM
  • Single Data Rate Synchronous Dynamic Random Access Memory
  • One word of data transferred per clock cycle
  • 3.3 V
  • 66–133 MHz
  • Released 1993
• DDR1 SDRAM
  • Double Data Rate Synchronous Dynamic Random Access Memory
  • Two words of data transferred per clock cycle
  • 2.5 V
  • 100–200 MHz
  • Released 2000

SDR and DDR SDRAM
• DDR2 SDRAM
  • Double Data Rate Synchronous Dynamic Random Access Memory
  • Four words of data per internal memory-array clock cycle (4n prefetch); the external bus still transfers two words per I/O clock
  • 1.8 V
  • 200–400 MHz (I/O bus clock)
  • Released 2003
• DDR3 SDRAM
  • Double Data Rate Synchronous Dynamic Random Access Memory
  • Eight words of data per internal memory-array clock cycle (8n prefetch)
  • 1.5 V (1.35 V for low-voltage DDR3L)
  • 400–1067 MHz (I/O bus clock)
  • Released 2007

SDR and DDR SDRAM
• DDR4 SDRAM
  • Double Data Rate Synchronous Dynamic Random Access Memory
  • Peak bandwidth of 25.6 GB/s per 64-bit channel between RAM and CPU (DDR4-3200)
  • 1.2 V
  • 800–1600 MHz (I/O bus clock)
  • Released 2014

Virtual Memory
• Idea 1:
  • Many programs share DRAM memory so that context switches can occur
• Idea 2:
  • Allow programs to be written without memory constraints: programs can exceed the size of main memory
• Idea 3:
  • Relocation: parts of a program can be placed at different locations in memory instead of in one big chunk
• Virtual memory:
  • DRAM memory holds many programs running at the same time
  • DRAM memory is used as a kind of “cache” for disk

Memory Hierarchy: The Big Picture
[Figure: Data movement in a memory hierarchy.]

Virtual Memory
• Each process has its own private virtual address space
  • Example: 2^32 bytes
  • The CPU generates virtual addresses for processes
• Each computer has a physical address space
  • Example: 4 GB SDRAM
  • Also called “real memory”
• Address translation:
  • Maps virtual addresses to physical addresses
  • Allows multiple programs to use memory at the same time
  • Also allows some chunks of virtual memory to be kept on disk rather than in main memory

Mapping Virtual Memory to Physical Memory
• Divide memory into equal-sized “chunks” called pages (e.g., 4 KB each)
• Any page of virtual memory can be assigned to any page of physical memory
[Figure: virtual memory of a single process (code, static data, heap, stack) mapped page by page onto a 64 MB physical memory, both starting at address 0]

Comparing the Two Levels of the Hierarchy
              Cache                  Virtual Memory
Unit          Block or line          Page
Miss          Miss                   Page fault
Unit size     32–64 B                4–16 KB
Managed by    Hardware               Hardware + software (operating system)

Address Translation
• How is the physical page located?
  • Searching is impractical (too many pages)
• A page table is a data structure that contains the mapping of virtual pages to physical pages
• Each process running in the system has its own page table

How to Translate Fast?
• Problem: virtual memory requires two memory accesses!
  • One to translate the virtual address into a physical address (page table lookup)
  • One to transfer the actual data (cache hit)
  • The page table is in physical memory! => 2 main memory accesses!
• Why not create a cache of virtual-to-physical address translations to make translation fast?
• This “page table cache” is called the Translation Lookaside Buffer, or TLB

Translation Lookaside Buffer
• The TLB is a small cache of the most recent virtual-to-physical mappings. Checking it first exploits temporal locality to speed up virtual address translation.
• The strange name comes from the idea that, while a virtual-to-physical translation is underway, the hardware checks to see whether it has seen this translation recently.

Page Tables and Address Translation

Translation Lookaside Buffer
• When an address is looked up in the TLB and not found (a TLB miss), the page table in memory must be searched:
  • Either the page is present in memory, and a TLB entry must be created, or
  • The page is not present, and a page fault exception is signaled.
• On a page fault, the faulting instruction is flushed, the program counter (PC) is saved in the exception program counter (EPC), and the CPU is interrupted.
• The OS, checking the cause of the exception, discovers that a page fault was signaled.
• What is needed is the faulting virtual address:
  • Instruction page fault: the address is in the EPC
  • Data page fault: compute the address from the instruction
• The OS then:
  • Finds the disk address of the page
  • Chooses a physical page to replace
  • Initiates a read of the disk block into that page

Optimizing for Space
• The page table is too big!
  • 4 GB virtual address space ÷ 4 KB pages
  • = 2^20 (~1 million) page table entries
  • 4 MB just for the page table of a single process (at 4 bytes per entry)!
• A variety of solutions trade page table size against performance when a TLB miss occurs:
  • Use a limit register (size in memory of a process) to restrict page table size, and let it grow as more pages are needed
  • Multilevel page tables
  • Paging the page tables, etc. (take an OS class to learn more)

Read Only Memory (ROM)
• Permanent, nonvolatile storage
• Contains a permanent pattern of data that cannot be changed or added to
• No power source required to maintain bit values
• Data is wired into the chip as part of the fabrication process
• Disadvantages:
  • No room for error.
  • If one bit is wrong, the whole batch of ROMs must be thrown out.

Read-Mostly Memory
• Read-“mostly” memories can be rewritten
• Erasable Programmable ROM (EPROM)
  • The entire chip is erased optically by UV light
  • Erasure can take up to 20 minutes
  • Only one transistor per bit
• Electrically Erasable PROM (EEPROM)
  • Takes much longer to write than to read: several hundred microseconds per byte
  • Can rewrite single bytes
  • Less dense than EPROM

Flash Memory
• Erased electrically in blocks, not at the byte level
• Typical block sizes: 512, 2048, or 4096 bytes
• High density
• One transistor per bit
• Fast reads, though not as fast as DRAM
• Very slow erase

Chip Logic
• Trade-offs in chip design among speed, capacity, and cost
• The key issue is the number of bits that can be read or written simultaneously
• One extreme:
  • The physical arrangement of memory cells is the same as the logical arrangement of words in memory
  • A 16 Mbit chip is organized as 1M 16-bit words
• Other extreme:
  • One bit per chip: a 16 Mbit chip is organized as 16M 1-bit words

Organization in Detail
• A 16 Mbit (2 MB) chip can be organized as a 2048 × 2048 × 4-bit array
• This reduces the number of address pins:
  • Row and column addresses are multiplexed over the same pins
  • 11 pins to address (2^11 = 2048)
  • Adding one more pin doubles the row and column ranges (2^12 = 4096), quadrupling capacity

Chip Packaging

1 MB EPROM Packaging
• Organized as 1M 8-bit words, 32 pins
• A0–A19 address pins (20-bit address)
• D0–D7 data pins
• Power supply at Vcc and ground at Vss
• CE (chip enable) pin
  • Indicates whether the read/write address is valid for this chip
  • There could be several chips
• Vpp programming voltage pin, used in write operations

16 Mbit DRAM
• Organized as 4M × 4 bits
• Data pins D1–D4 are input/output
• WE (write enable) and OE (output enable) determine whether a read or a write occurs
• RAS (row address select) and CAS (column address select) pins
• Two Vcc and two Vss pins
• One NC (no connect) pin to make an even number of pins

Error Detection and Correction
• Hard failure
  • Permanent defect
  • Caused by:
    • Harsh environmental abuse
    • Manufacturing defects
    • Wear
• Soft error
  • Random, non-destructive
  • No permanent damage to memory
  • Caused by power supply problems

Error Detection and Correction
• A single parity bit can detect any single-bit error in a word (in fact, any odd number of flipped bits)
  • Ex: data word 10010 to send
  • 1+0+0+1+0 mod 2 = 0, so the even-parity bit is 0
  • The sender appends the parity bit, giving 100100, and sends it
  • The receiver computes parity over all six bits: 1+0+0+1+0+0 mod 2 = 0
  • A result of 0 means no error was detected
• The parity test fails to detect errors when an even number of bits are flipped
• Hamming codes can be used to detect and correct errors
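The even-parity walkthrough on the Error Detection and Correction slide can be checked in a few lines. This is a minimal sketch that models words as bit strings; the function names `even_parity_bit`, `encode`, and `check` are hypothetical and chosen for illustration only. It reproduces the 10010 example and shows why two flipped bits slip through.

```python
def even_parity_bit(bits: str) -> str:
    """Parity bit that makes the total number of 1s even."""
    return str(bits.count("1") % 2)


def encode(bits: str) -> str:
    """Sender: append the even-parity bit to the data word."""
    return bits + even_parity_bit(bits)


def check(codeword: str) -> bool:
    """Receiver: sum all bits, parity bit included, mod 2.
    A sum of 0 means no error was detected."""
    return codeword.count("1") % 2 == 0


sent = encode("10010")
print(sent, check(sent))      # 100100 True
print(check("110100"))        # False: one flipped bit is detected
print(check("111100"))        # True: two flipped bits go unnoticed
```

Because the check only counts 1s mod 2, any even number of flips leaves the sum unchanged, which is exactly the failure case the slide mentions.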
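The parity slide ends by noting that Hamming codes can also correct errors. Below is a minimal sketch of the classic Hamming(7,4) code, assuming the usual textbook layout with parity bits at positions 1, 2, and 4 and data bits at 3, 5, 6, and 7. The function names are hypothetical. The key idea is that the failed parity checks, read as a binary number (the syndrome), give the position of the single bad bit.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).
    Parity bits sit at positions 1, 2, 4; data bits at 3, 5, 6, 7."""
    c = [0] * 8                          # c[0] unused, so indices match positions
    c[3], c[5], c[6], c[7] = data
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        c[p] = sum(c[i] for i in range(1, 8) if i & p and i != p) % 2
    return c[1:]


def hamming74_correct(code: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, error position); position 0 = no error."""
    c = [0] + list(code)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(c[i] for i in range(1, 8) if i & p) % 2:   # this check failed
            syndrome += p
    if syndrome:
        c[syndrome] ^= 1                 # the syndrome names the bad position
    return [c[3], c[5], c[6], c[7]], syndrome


word = hamming74_encode([1, 0, 1, 1])    # [0, 1, 1, 0, 0, 1, 1]
received = word.copy()
received[4] ^= 1                         # corrupt position 5 (list index 4)
print(hamming74_correct(received))       # ([1, 0, 1, 1], 5)
```

This version corrects one flipped bit per 7-bit word; a double error would be miscorrected, which is why memory ECC usually adds an extra overall parity bit (SEC-DED).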
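The Organization in Detail and 16 Mbit DRAM slides describe a 2048 × 2048 × 4 array whose row and column addresses share 11 pins. The sketch below checks that arithmetic and shows how a 22-bit location address could be split into the two halves sent over those pins. Treating the high bits as the row is an assumption for illustration (real memory controllers use various mappings), and the names are hypothetical.

```python
ADDR_PINS = 11        # row and column are multiplexed over these pins
BITS_PER_CELL = 4     # 4M x 4 organization: 4 data bits per location


def capacity_bits(addr_pins: int, bits_per_cell: int = BITS_PER_CELL) -> int:
    """Rows x columns x bits per location, with the row and column
    addresses each sent over the same addr_pins pins."""
    return (1 << addr_pins) * (1 << addr_pins) * bits_per_cell


def split_address(addr: int, addr_pins: int = ADDR_PINS) -> tuple[int, int]:
    """Split a (2 * addr_pins)-bit location address into the row half
    (sent first, latched with RAS) and the column half (sent second,
    latched with CAS). Using the high bits as the row is an assumption."""
    mask = (1 << addr_pins) - 1
    return (addr >> addr_pins) & mask, addr & mask


print(capacity_bits(11) == 16 * 2**20)          # True: 16 Mbit
print(capacity_bits(12) // capacity_bits(11))   # 4: one more pin, 4x capacity
print(split_address((5 << 11) | 7))             # (5, 7)
```

Multiplexing halves the pin count at the cost of sending the address in two steps, which is why the RAS and CAS strobes exist.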
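The Address Translation and Translation Lookaside Buffer slides describe checking the TLB first, falling back to the page table on a miss, and signaling a page fault when the page is absent. Below is a minimal sketch under simplifying assumptions: 4 KB pages, a page table modeled as a Python dict where a missing key means "not present", and an unbounded dict as the TLB, so no TLB replacement policy is modeled. `translate` and `PageFault` are hypothetical names.

```python
PAGE_SIZE = 4096                              # assumed 4 KB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1      # 12 offset bits


class PageFault(Exception):
    """The virtual page is not resident in physical memory."""


def translate(vaddr: int, tlb: dict[int, int], page_table: dict[int, int]) -> int:
    """Map a virtual address to a physical address, checking the TLB first."""
    vpn = vaddr >> OFFSET_BITS                # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)          # byte offset, unchanged by translation
    if vpn in tlb:                            # TLB hit: no page table access
        ppn = tlb[vpn]
    elif vpn in page_table:                   # TLB miss, page present:
        ppn = page_table[vpn]
        tlb[vpn] = ppn                        # create the TLB entry
    else:                                     # TLB miss, page not present
        raise PageFault(hex(vaddr))
    return (ppn << OFFSET_BITS) | offset


page_table = {0x00400: 0x00012}               # hypothetical VPN -> PPN mapping
tlb: dict[int, int] = {}
print(hex(translate(0x00400ABC, tlb, page_table)))   # 0x12abc (TLB miss)
print(hex(translate(0x00400FFF, tlb, page_table)))   # 0x12fff (TLB hit)
```

Only the page number is translated; the low 12 bits pass through untouched, which is what makes fixed-size pages cheap to map. In a real system the OS would handle `PageFault` by reading the page from disk, updating the page table, and retrying.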
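The 4 MB figure on the Optimizing for Space slide follows from two steps. The slide does not state the size of a page table entry; the derivation below assumes 4 bytes per entry, which is the value that makes the slide's total come out to 4 MB.

```latex
\frac{2^{32}\,\text{B}}{2^{12}\,\text{B/page}} = 2^{20}\ \text{pages} \approx 10^{6}
\qquad
2^{20}\ \text{entries} \times 2^{2}\,\text{B/entry} = 2^{22}\,\text{B} = 4\,\text{MB}
```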
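The SDR and DDR SDRAM slides list clock rates per generation. A rough way to turn those into peak bandwidth is transfers per second times bus width. This sketch assumes a 64-bit (8-byte) channel, as on a standard DIMM, and two transfers per I/O bus clock for every DDR generation (the 4n and 8n prefetch figures describe the internal array, not the external bus). The function name is hypothetical, and the results are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gbs(io_clock_mhz: float, transfers_per_clock: int,
                       bus_width_bits: int = 64) -> float:
    """Peak transfer rate of one channel in GB/s (10**9 bytes per second)."""
    transfers_per_second = io_clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * (bus_width_bits // 8) / 1e9


# (name, top I/O clock in MHz from the slides, transfers per I/O clock)
for name, mhz, per_clock in [("SDR", 133, 1), ("DDR1", 200, 2),
                             ("DDR2", 400, 2), ("DDR3", 1067, 2),
                             ("DDR4", 1600, 2)]:
    print(f"{name}: {peak_bandwidth_gbs(mhz, per_clock):.1f} GB/s")
```

For example, DDR4 at a 1600 MHz I/O clock gives 1600 × 10^6 × 2 × 8 B = 25.6 GB/s per channel.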