Lecture 14 - Memory Organization 2

CS 325: CS Hardware and Software
Organization and Architecture
Internal Memory
1
Semiconductor Main Memory
• Early computers used doughnut-shaped ferromagnetic loops
called cores to store each bit.
• This main memory was often referred to as “core memory” or
just “core”.
• Some terms still exist: “core dump”
2
Memory Cells
•Properties:
•Includes two stable or semi-stable states
representing 1 and 0.
•Capable of being written to at least once to
set state.
•Capable of being read to sense the state.
3
Memory Cell Operation
• Select line selects cell for operation specified by control line.
• Control line has read or write signal.
• Data/Sense line captures current state, or creates new state.
4
Semiconductor Memory Types
5
Semiconductor Memory
•RAM (Random Access Memory)
• Misnamed, since all semiconductor memory is “random
access”
• Time required to access any address is constant and
does not depend on previous address accessed.
• Read and write abilities
• Volatile
• Temporary storage
•Two technologies:
• Dynamic RAM:
• Analog device, uses capacitor to store charge.
• Static RAM:
• Digital device, uses flip-flop logic gates to store state.
6
Dynamic RAM (DRAM)
•Bits stored as charge in capacitors
•Capacitor charge leaks, so a refresh circuit is
needed.
•Simpler construction than static RAM
•Used for main memory.
•Essentially analog rather than digital.
•Level of capacitor charge determines logic value
of memory cell.
7
Dynamic RAM Structure (Main Memory)
• Address line is active when bit value from cell is read/written.
• Write: voltage applied to the bit line. High = 1, Low = 0.
• Read: Charge in the capacitor is fed out onto the bit line to a sense amplifier, which compares it with a reference value.
• Readouts discharge the capacitor, which must be restored to
complete the operation
8
Static RAM (SRAM)
•Digital device that uses the same logic elements as
the CPU.
•Binary values are stored using traditional flip-flop
logic gate configurations.
•No charges to leak
• No refresh needed when powered
•More complex construction – 6 transistors
•Larger and more expensive per bit, but faster than
DRAM
9
Static RAM Structure (Cache Memory)
• Address line is active when bit value from cell is read/written.
• Write: Desired bit value is applied to line B, its complement to line B-bar (the complement of B)
• Read: Bit value is read from line B.
10
SRAM (Cache) Vs DRAM (Main Memory)
• Both are volatile
• Power needed to preserve data
• Dynamic RAM
• Simpler to build and smaller than SRAM
• Therefore more dense and less expensive
• Needs refresh circuitry
• Favored for larger memory units
• Typically has a synchronous interface (responds to control inputs
on a clock signal): SDRAM, today most commonly DDR SDRAM
• Static RAM
• Faster than DRAM
• More expensive to build
• Favored for cache memory
• Typically has an asynchronous interface (responds to control
inputs as quickly as possible)
11
SDR and DDR SDRAM
• SDR SDRAM
• Single Data Rate Synchronous Dynamic Random Access Memory 
• One word of data transferred per clock cycle
• 3.3V
• 66 – 133 MHz
• Released 1993
• DDR1 SDRAM
• Double Data Rate Synchronous Dynamic Random Access Memory
• Two words of data transferred per clock cycle
• 2.5V
• 100 – 200MHz
• Released 2000
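The clock rates and words-per-cycle figures above translate into a peak transfer rate. A minimal sketch, assuming a 64-bit (8-byte) data bus, which is not stated in the slides, and treating one "word" as one bus-wide transfer; the function name is hypothetical:

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Peak transfer rate in MB/s (1 MB = 10**6 bytes).

    bus_width_bits=64 is an assumption (a common module width),
    not a value taken from the slides.
    """
    bytes_per_transfer = bus_width_bits // 8
    # MHz * transfers/cycle * bytes/transfer = millions of bytes per second
    return clock_mhz * transfers_per_clock * bytes_per_transfer


# SDR at 133 MHz: one transfer per clock cycle
sdr = peak_bandwidth_mb_s(133, 1)
# DDR1 at 200 MHz: two transfers per clock cycle (rising and falling edge)
ddr1 = peak_bandwidth_mb_s(200, 2)

print(sdr, ddr1)
```

These are upper bounds; sustained bandwidth is lower because of refresh and row activation delays.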
12
SDR and DDR SDRAM
• DDR2 SDRAM
• Double Data Rate Synchronous Dynamic Random Access Memory
• Four words of data transferred per internal memory clock cycle (two per I/O bus clock cycle)
• 1.8V
• 200 – 400 MHz
• Released 2003
• DDR3 SDRAM
• Double Data Rate Synchronous Dynamic Random Access Memory
• Eight words of data transferred per internal memory clock cycle (two per I/O bus clock cycle)
• 1.5V (1.35V for the low-voltage DDR3L variant)
• 400 – 1067 MHz
• Released 2007
13
SDR and DDR SDRAM
• DDR4 SDRAM
• Double Data Rate Synchronous Dynamic Random Access Memory
• Over 30 Gbps bandwidth between RAM and CPU
• 1.2V
• 800 – 1600 MHz
• Released 2014
14
Virtual Memory
•Idea 1:
• Many Programs share DRAM Memory so that context
switches can occur
•Idea 2:
• Allow programs to be written without memory
constraints
• programs can exceed the size of the main memory
•Idea 3:
• Relocation: Parts of the program can be placed at
different locations in the memory instead of a big chunk.
•Virtual Memory:
• DRAM Memory holds many programs running at same
time
• Use DRAM Memory as a kind of “cache” for disk
15
Memory Hierarchy: The Big Picture
Data movement in a memory hierarchy.
16
Virtual Memory
•Each process has its own private virtual address
space
• Example: 2^32 bytes
• CPU actually generates virtual addresses for processes
•Each computer has a physical address space
• Example: 4 GB SDRAM
• Also called “real memory”
•Address translation:
• Mapping virtual to physical addresses
• Allows multiple programs to use memory at same time
• Also allows some chunks of virtual memory to be
represented on disk, not in main memory
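Address translation works on fixed-size chunks, so a virtual address is first split into a page number and an offset within the page. A minimal sketch, assuming 32-bit virtual addresses and 4 KB pages (illustrative values; the names are hypothetical):

```python
PAGE_SIZE = 4096                           # assumed page size: 4 KB
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 bits of offset for 4 KB pages


def split_virtual_address(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    vpn = vaddr >> OFFSET_BITS           # upper bits select the page
    offset = vaddr & (PAGE_SIZE - 1)     # lower bits select the byte in the page
    return vpn, offset


vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))
```

Only the page number is translated; the offset is carried over unchanged into the physical address.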
17
Mapping Virtual Memory to Physical Memory
• Divide memory into equal-sized
“chunks” (Ex: 4KB each); each chunk is called a page
• Any chunk of virtual memory can be
assigned to any chunk of physical
memory
[Figure: virtual memory of a single process (Code, Static, Heap, Stack, starting at address 0) mapped onto physical memory (64 MB, starting at address 0)]
18
Comparing the 2 levels of hierarchy
Cache                      Virtual Memory
Block or Line              Page
Miss                       Page Fault
Block Size: 32-64B         Page Size: 4K-16KB
How Managed: Hardware      How Managed: Hardware + Software (Operating System)
19
Address Translation
•How to locate the physical page?
• Search impractical (too many pages)
•A page table is a data structure which contains the
mapping of virtual pages to physical pages
•Each process running in the system has its own page
table
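A per-process page table can be pictured as a lookup from virtual page number to physical page number. A minimal sketch, assuming 4 KB pages and using a Python dict as the table; real page tables are arrays in memory with extra status bits, and all names here are hypothetical:

```python
PAGE_SIZE = 4096  # assumed page size: 4 KB


class PageFault(Exception):
    """Raised when the virtual page has no mapping in physical memory."""


def translate(page_table, vaddr):
    """Translate a virtual address using one process's page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = page_table.get(vpn)
    if ppn is None:
        raise PageFault(vpn)
    return ppn * PAGE_SIZE + offset


# Each process has its own table, so the same virtual address
# can map to different physical addresses.
process_a = {0: 5, 1: 2}
process_b = {0: 7}

print(translate(process_a, 0x0004))  # physical page 5
print(translate(process_b, 0x0004))  # physical page 7
```

Indexing directly by page number is what makes the search unnecessary: the lookup takes one step regardless of how many pages exist.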
20
How to Translate Fast?
•Problem: Virtual Memory requires two
memory accesses!
• One to translate Virtual Address into Physical Address
(page table lookup)
• One to transfer the actual data (cache hit)
• Page Table is in physical memory! => 2 main memory accesses!
•Why not create a cache of virtual to physical
address translations to make translation fast?
•This “page table cache” is called:
• Translation Lookaside Buffer, or TLB
21
Translation Lookaside Buffer
• The TLB is a small cache of the most recent virtual-physical
mappings. By checking here first, temporal locality is
exploited to speed virtual address translation.
• The strange name comes from the idea that, while the
process to perform a virtual-to-physical translation is
underway, the hardware checks to see if it has seen this
translation recently.
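The TLB behaves like a tiny cache keyed by virtual page number. A minimal sketch, assuming a 4-entry TLB with least-recently-used replacement (the slides do not specify a size or replacement policy; the class and field names are hypothetical):

```python
from collections import OrderedDict


class TLB:
    """Small cache of recent virtual-to-physical page mappings."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> ppn, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)     # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                 # slow path: page table in main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        self.entries[vpn] = ppn
        return ppn


page_table = {0: 8, 1: 3, 2: 6}
tlb = TLB()
for vpn in (0, 0, 1, 0, 2, 1):
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)
```

Repeated accesses to the same pages hit in the TLB, which is the temporal locality the slide refers to.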
22
Page Tables and Address Translation
23
Translation Lookaside Buffer
• When a virtual address is looked up in the TLB and not found, the page table in physical
memory must be consulted.
• TLB Miss:
• Either the page is present in memory, and only a TLB entry must be created, or
• The page is not present, and a page fault exception is signaled.
• The pipeline is flushed, the program counter (PC) is saved in the exception
program counter (EPC), and the CPU is interrupted
• The OS, checking the exception, discovers a page fault was
signaled.
• What is needed:
• Instruction page fault: find address in EPC
• Data page fault: compute the address from the instruction found at the EPC
• OS then:
• Finds the disk address
• Chooses a physical page to replace (writing it to disk first if it has been modified)
• Initiates read of disk block
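The three outcomes above (TLB hit, TLB miss with the page present, page fault) can be traced in a small simulation. A rough sketch under several simplifying assumptions, none of them from the slides: the TLB is an unbounded dict, the victim page is chosen first-in first-out, and the disk read is only a comment. All names are hypothetical:

```python
def access(vpn, tlb, page_table, free_frames, resident, log):
    """Return the physical page for vpn, handling TLB misses and page faults."""
    if vpn in tlb:
        log.append("TLB hit")
        return tlb[vpn]
    if page_table.get(vpn) is None:
        # Page fault: the OS handler takes over.
        log.append("page fault")
        if free_frames:
            frame = free_frames.pop()
        else:
            victim = resident.pop(0)   # FIFO victim choice (assumption)
            frame = page_table[victim]
            page_table[victim] = None  # victim is no longer in memory
            tlb.pop(victim, None)      # its cached translation is now stale
        # The read of the disk block into `frame` would be started here.
        page_table[vpn] = frame
        resident.append(vpn)
    else:
        log.append("TLB miss, page present")
    tlb[vpn] = page_table[vpn]         # create the TLB entry
    return tlb[vpn]


tlb, log = {}, []
page_table = {0: None, 1: None, 2: None}  # nothing in memory yet
free_frames = [0, 1]                      # two physical pages available
resident = []
for vpn in (0, 0, 1, 2):
    access(vpn, tlb, page_table, free_frames, resident, log)
tlb.clear()                               # e.g. the TLB is flushed
access(2, tlb, page_table, free_frames, resident, log)
print(log)
```

The third page fault finds no free physical page, so page 0 is evicted to make room for page 2.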
24
Optimizing for Space
•Page Table too big!
• 4GB Virtual Address Space ÷ 4 KB page
• 2^20 (~ 1 million) Page Table Entries
• 4 MB just for Page Table of single process (at 4 bytes per entry)!
• Variety of solutions to trade off Page Table size for
performance when miss occurs in TLB
• Use a limit register (size in memory of a process) to restrict
page table size and let it grow with more pages, if needed
• Multilevel page table
• Paging page tables, etc.
(Take OS Class to learn more)
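The page table size above follows from simple arithmetic, and the same arithmetic shows why a multilevel table helps. A minimal sketch, assuming 4-byte page table entries and, for the two-level case, 1024-entry tables at each level (both are assumptions, and the function names are hypothetical):

```python
def page_table_entries(virtual_space_bytes, page_size_bytes):
    """Number of entries needed to map every virtual page."""
    return virtual_space_bytes // page_size_bytes


def flat_table_bytes(virtual_space_bytes, page_size_bytes, entry_bytes=4):
    """Size of a single flat page table (entry_bytes=4 is an assumption)."""
    return page_table_entries(virtual_space_bytes, page_size_bytes) * entry_bytes


def two_level_table_bytes(second_level_tables_in_use,
                          entries_per_table=1024, entry_bytes=4):
    """One top-level table plus only the second-level tables actually used."""
    table_bytes = entries_per_table * entry_bytes
    return table_bytes * (1 + second_level_tables_in_use)


entries = page_table_entries(2**32, 4 * 1024)  # 2**20 entries
flat = flat_table_bytes(2**32, 4 * 1024)       # 4 MB for one process
small = two_level_table_bytes(3)               # 16 KB if only 3 regions are used
print(entries, flat, small)
```

A process that touches only a few regions of its address space pays for a few small tables instead of the full 4 MB, at the cost of an extra memory access on a TLB miss.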
25
Read Only Memory (ROM)
•Permanent, nonvolatile storage
•Contains a permanent pattern of data that cannot
be changed or added to
•No power source required to maintain bit values
•Data is actually wired into the chip as a part of the
fabrication process
•Disadvantages:
• No room for error.
• If one bit is wrong, whole batch of ROMs must be
thrown out
26
Read-mostly Memory
•Read “mostly” memories can be rewritten
•Erasable Programmable (EPROM)
•Optical erasure of entire chip by UV light
•Can take up to 20 minutes to erase
•Only one transistor per bit
•Electrically Erasable (EEPROM)
•Takes much longer to write than read
•Several hundred microseconds
•Can rewrite single bytes
•Less dense than EPROM
27
Flash Memory
•Provides block electrical erasure but not byte
level
•Typical block size 512, 2048, 4096
•High density
•One transistor per bit
•Fast read speeds, but not as fast as DRAM
•Very slow erase speed
28
Chip Logic
•Trade-offs in chip design among speed, capacity,
and cost
•Key issue is number of bits that can be written
simultaneously
• One extreme:
• Physical arrangement of memory cells same as logical
arrangement of words in memory
• 16Mbit chip is 1M 16-bit words
• Other extreme:
• One bit per chip: each chip is organized as 16M x 1 bit and holds one bit position of every word
29
Organization in detail
•A 16Mbit (2MB) chip can be organized as a
2048 x 2048 x 4bit array
•Reduces number of address pins
•Multiplex row and column address
•11 pins to address (2^11 = 2048)
•Adding one more pin doubles the range of both rows and columns (2^12 = 4096), quadrupling capacity
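Multiplexing means the same 11 pins carry the row address first and the column address second. A minimal sketch of how a 22-bit cell address would be split for the 2048 x 2048 array; the function names are hypothetical, and treating the upper bits as the row is an assumption:

```python
ROW_BITS = 11   # 2**11 = 2048 rows
COL_BITS = 11   # 2**11 = 2048 columns


def split_address(addr):
    """Split a cell address into (row, column), each sent over the same pins."""
    row = addr >> COL_BITS                # sent first
    col = addr & ((1 << COL_BITS) - 1)    # sent second
    return row, col


def addressable_cells(address_pins):
    """Cells reachable when the pins are used once for rows and once for columns."""
    return (2 ** address_pins) ** 2


print(split_address(2048))                          # first cell of row 1
print(addressable_cells(11), addressable_cells(12))
```

Going from 11 to 12 pins raises the cell count from about 4 million to about 16 million, a factor of four for one extra pin.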
30
Chip Packaging
31
1 MB EPROM Packaging
•Organized as 1M x 8-bit words, 32 pins
•A0 – A19 address pins (20 bit address)
•D0 – D7 data pins
•Power supply at Vcc and ground at Vss
•CE chip enable pin
• Indicates whether read/write address valid for this chip
• Could be several chips
• Vpp programming voltage pin used in write operations
32
16 Mbit DRAM
• Organized as 4Mx4 bits
• Data pins D1-D4 are input/output
• WE (write enable) and OE (output enable) determine if
read or write occurs
• RAS (row address select) and CAS (column address select)
pins
• 2 Vcc and Vss pins
• One NC (no connect) to make even number of pins
33
Error Detection and Correction
•Hard Failure
• Permanent defect
• Caused by
• Harsh environmental abuse
• Manufacturing defects
• Wear
•Soft Error
• Random, non-destructive
• No permanent damage to memory
• Caused by
• Power supply problems
34
Error Detection and Correction
•A single parity bit can be used to detect any odd number of
bit errors in a word
• Ex: 10010 to send
• (1+0+0+1+0) mod 2 == 0 (even parity)
• Sender adds parity bit: 100100, and sends
• Receiver computes parity over all received bits: (1+0+0+1+0+0) mod 2 == 0
• Receiver accepts the transmission as correct because the parity is even
•Parity bit test fails to detect errors when an even number of
bits are flipped (e.g. two bit errors)
•Hamming codes can be used to detect and correct
errors
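The even-parity example can be checked with a few lines of code. A minimal sketch that represents words as strings of '0' and '1' characters (a choice made for readability; the function names are hypothetical):

```python
def parity_bit(bits):
    """Even-parity bit: '1' if the data has an odd number of 1s, else '0'."""
    return str(bits.count("1") % 2)


def encode(bits):
    """Append the parity bit so the whole word has an even number of 1s."""
    return bits + parity_bit(bits)


def check(word):
    """Receiver side: True if the received word still has even parity."""
    return word.count("1") % 2 == 0


sent = encode("10010")
print(sent)               # data plus parity bit
print(check(sent))        # no errors
print(check("110100"))    # one bit flipped
print(check("111100"))    # two bits flipped
```

The last call returns True even though the word is corrupted, which is the blind spot that motivates Hamming codes.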
35