Enabling Big Memory with
Emerging Technologies
Manjunath Shevgoor
Big Memory
DRAM needs are increasing rapidly
Increased Data Gathering
In Memory Databases
Data Analytics
Need more capacity
Core count doubling ~ every 2 years
DIMM capacity doubling ~ every 3 years
Memory capacity per core expected to drop every year
[Source: Memory Scaling: Systems Architecture Perspective, O Mutlu]
Source: Kevin Lim et al., Disaggregated Memory for Expansion and Sharing in Blade Servers, ISCA’09
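A rough worked example of the trend above, assuming smooth exponential growth at the quoted doubling periods. The function name and the idealized growth model are illustrative assumptions, not taken from the talk.

```python
def capacity_per_core(years, core_doubling_years=2.0, dimm_doubling_years=3.0):
    """Relative memory capacity per core after `years`, normalized to 1.0 at year 0."""
    cores = 2 ** (years / core_doubling_years)        # core count doubles ~every 2 years
    capacity = 2 ** (years / dimm_doubling_years)     # DIMM capacity doubles ~every 3 years
    return capacity / cores


for year in (1, 2, 6):
    print(year, round(capacity_per_core(year), 3))
```

Under these assumptions the ratio shrinks by a factor of 2^(-1/6) per year, so capacity per core halves roughly every six years.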
Possible Solutions
• 3D Stacking: Increased Current Draw [MICRO’13]
• Many Rank: High Refresh Power [Under Submission]
• Memristor (Non-Volatile Memory): Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]
Thesis Statement
Memory capacity requirements are increasing at a very fast rate.
Management of high currents is crucial for effective deployment of new technologies.
This thesis hypothesizes that architecture/OS policies for data placement can help manage some of the problems posed by high currents.
Talk Outline
• Current Constraints in 3D DRAM
• Addressing Refresh Overheads in DRAM
• Improving Memristor Memory by Re-using Sneak Currents
• Conclusion and Future Work
IR-Drop in 3D DRAM
[MICRO’13]
What is a power delivery network (PDN)?
• Grid of wires that connects the power supply to the circuits
[Figure: power grid, labeled V and VSS]
• Voltage drops across every PDN
• Voltage lost on the PDN is the IR Drop
Explore architectural policies to manage IR Drop
Source: Sani R. Nassif, Power Grid Analysis Benchmarks
IR Drop in 3D DRAM
• 3D stacking increases current density (increased ‘I’)
• TSVs add resistance to the PDN (increased ‘R’)
• Navigate 8 TSV layers to reach the top die
• Insufficient voltage leads to incorrect operation
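The effect of stacking on both ‘I’ and ‘R’ can be sketched with a simple series model: every TSV layer carries the current of all dies above it, so the drop accumulates toward the top die. This is an illustrative simplification, not the PDN model used in the talk; the resistance and current values below are hypothetical.

```python
def die_voltages(vdd, tsv_resistance_ohm, die_currents_a):
    """Supply voltage seen at each die of a stack, bottom die first.

    Assumes one lumped TSV resistance per layer and a purely series path.
    """
    voltages = []
    voltage = vdd
    current_above = sum(die_currents_a)  # current crossing the first TSV layer
    for die_current in die_currents_a:
        voltage -= current_above * tsv_resistance_ohm  # V = I * R lost on this layer
        voltages.append(voltage)
        current_above -= die_current  # this die's current does not travel higher
    return voltages


# Hypothetical numbers: 1.5 V supply, 5 mOhm per layer, 0.2 A per die, 8 dies.
for layer, v in enumerate(die_voltages(1.5, 0.005, [0.2] * 8), start=1):
    print(layer, round(v, 4))
```

In this model the top die always sees the lowest voltage, which matches the observation that the dies farthest from the supply are the most constrained.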
[Figure labels: High IR Drop, Low IR Drop]
Floor Plan and Quality of Power Delivery
[Figure: V on M1 on Layer 9, plotted over X Coordinate and Y Coordinate]
• Banks that are farther away from the TSVs suffer higher IR Drop
[Figure: IR Drop maps for Layer 2 through Layer 9]
IR Drop varies along a die and across the stack
[Figure: stack with Logic Layer, Bot 4 Dies, and Top 4 Dies]
• Create constraints for Iso-IR Drop regions
• IR Drop oblivious page placement leads to 47% performance degradation
• Place critical pages in IR Drop resistant regions
Region Based Constraints
• Top Region: 1-2 Reads allowed/region
• Bottom Region: 4 Reads allowed/region
Spatio-Temporal Constraints
• At least 1 Top-Read: 8 Reads allowed/stack
• No Top-Reads: 16 Reads allowed/stack
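A minimal sketch of how a memory controller might check these limits before issuing a read, under one reading of the slide: a per-region cap, plus a per-stack cap that tightens when any top-region read is in flight. The function, the data layout, and the choice of 2 as the top-region cap (the slide says 1-2) are assumptions for illustration.

```python
from collections import Counter

TOP_LIMIT = 2               # slide says 1-2 reads per top region; 2 is an assumption
BOTTOM_LIMIT = 4            # reads allowed per bottom region
STACK_LIMIT_WITH_TOP = 8    # reads allowed per stack with at least one top read
STACK_LIMIT_NO_TOP = 16     # reads allowed per stack with no top reads


def can_issue_read(in_flight, region, is_top):
    """Return True if one more read to `region` respects the constraints.

    in_flight: list of (region_id, is_top) tuples for reads being serviced.
    """
    per_region = Counter(region_id for region_id, _ in in_flight)
    region_limit = TOP_LIMIT if is_top else BOTTOM_LIMIT
    if per_region[region] >= region_limit:
        return False
    any_top = is_top or any(top for _, top in in_flight)
    stack_limit = STACK_LIMIT_WITH_TOP if any_top else STACK_LIMIT_NO_TOP
    return len(in_flight) + 1 <= stack_limit


busy = [("b0", False)] * 4 + [("b1", False)] * 4
print(can_issue_read(busy, "b2", False))  # another bottom read
print(can_issue_read(busy, "t0", True))   # a top read on top of 8 bottom reads
```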
Dynamic Page Placement
• Pages with highest total queuing delay are moved to bottom regions
• Using page access count to promote pages can starve threads
• Scheduler ensures fairness
• Page migration is limited by Migration Penalty (10k/15M cycles)
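The selection step of Dynamic Page Placement (pick the pages with the highest total queuing delay) can be sketched as a top-k query. The per-page delay dictionary and the parameter `k` are hypothetical inputs; the fairness scheduler and the migration penalty are not modeled here.

```python
import heapq


def pages_to_promote(queuing_delay, k):
    """Return the k page ids with the highest accumulated queuing delay.

    queuing_delay: dict mapping page id -> total queuing delay in cycles.
    """
    return heapq.nlargest(k, queuing_delay, key=queuing_delay.get)


delays = {"page_a": 500, "page_b": 9000, "page_c": 120}
print(pages_to_promote(delays, 2))
```

Ranking by queuing delay rather than raw access count is the point of the slide: a page that is accessed often but never waits gains little from being moved.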
Results
Within 20% of ideal
Overview
• 3D Stacking: Increased Current Draw [MICRO’13]
• Many Rank: Refresh Overhead [Under Submission]
• Memristor (Non-Volatile Memory): Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]
Re-Thinking Data Placement in Highly Ranked DRAM Systems
Refresh Power in DRAM
Command   Current (mA)
Act       67
Read      125
Write     125
Refresh   245
Refresh consumes 96% more power than read
There can be up to 4 ranks in a DIMM
Source: Micron 8GB DDR3L data sheet
[Figure: 8-core CMP with two memory controllers (MC) on Channel 1 and Channel 2, serving Ranks 1-4]
Stagger refresh to reduce peak power
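A small arithmetic sketch of why staggering helps, using the 245 mA refresh and 125 mA read currents quoted from the data sheet. It counts only refresh current and assumes at most one rank refreshes at a time when staggered; both simplifications, and the function names, are assumptions made for illustration.

```python
READ_MA = 125      # read current from the data sheet figures above
REFRESH_MA = 245   # refresh current from the data sheet figures above


def extra_over_read(refresh_ma=REFRESH_MA, read_ma=READ_MA):
    """Fractional increase of refresh current over read current."""
    return (refresh_ma - read_ma) / read_ma


def peak_refresh_current(num_ranks, staggered, refresh_ma=REFRESH_MA):
    """Peak refresh current in mA, counting only ranks that are refreshing."""
    refreshing = 1 if staggered else num_ranks
    return refreshing * refresh_ma


print(f"{extra_over_read():.0%}")
print(peak_refresh_current(4, staggered=False), peak_refresh_current(4, staggered=True))
```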
Increase in Refresh Time
Fine grained refresh
Chip Capacity (GB)   tRFC (ns)   tRFC_2X (ns)   tRFC_4X (ns)
8                    350         240            160
16                   480         350            240
32                   640         480            350
Refresh Interval     7.8 µs      3.9 µs         1.95 µs
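Reading the refresh timings on this slide as a table (tRFC, tRFC_2X, and tRFC_4X per chip capacity, paired with refresh intervals of 7.8, 3.9, and 1.95 µs), the fraction of time a rank is busy refreshing is the refresh cycle time divided by the refresh interval. The pairing of values is an interpretation of the slide, and the names below are illustrative.

```python
# tRFC in ns for the 1x, 2x, and 4x refresh modes, keyed by chip capacity.
T_RFC_NS = {8: (350, 240, 160), 16: (480, 350, 240), 32: (640, 480, 350)}
# Refresh interval in ns for the 1x, 2x, and 4x modes (7.8, 3.9, 1.95 microseconds).
T_REFI_NS = (7800, 3900, 1950)


def refresh_busy_fraction(t_rfc_ns, t_refi_ns):
    """Fraction of time a rank is unavailable because it is refreshing."""
    return t_rfc_ns / t_refi_ns


for capacity, t_rfcs in T_RFC_NS.items():
    fractions = [refresh_busy_fraction(r, i) for r, i in zip(t_rfcs, T_REFI_NS)]
    print(capacity, [f"{f:.1%}" for f in fractions])
```

With these numbers, finer-grained refresh shortens each refresh but raises the total time spent refreshing, because tRFC does not halve when the interval does.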
Effect of Staggered Refresh
[Figure: Completion Time for 0ns, Simul Ref, Stagger Ref, Simul Ref ExtT, and Stagger Ref ExtT; bar labels: 1.000, 1.076, 1.140, 1.161, 1.369]
Each Staggered Refresh stalls many cores
[Figure: 8-core CMP with two MCs (Channel 1, Channel 2) and Ranks 1-4; queued requests are tagged by thread and rank (for example T1 R2, T2 R3, T3 R1) and stalled cores are marked]
Limit the spread - Address Mapping
[Figure: Normalized Comp. Time for No Interleave, XOR Bank Interleave, and Chan Interleave, shown with No Refresh and with Refresh; bar labels: 1.259, 1.000, 1.247, 1.346, 0.991, 0.808]
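The talk names XOR Bank Interleave without defining it. One common form of the idea XORs the bank bits of an address with the low-order row bits, so rows that would collide in one bank are spread over all banks. The sketch below shows that form only; the bit width and function name are assumptions, not the mapping evaluated in the talk.

```python
def xor_bank_index(row, bank, bank_bits=3):
    """Permute the bank index by XOR-ing it with the low-order row bits."""
    mask = (1 << bank_bits) - 1
    return (bank ^ (row & mask)) & mask


# Eight consecutive rows that all name bank 0 land in eight different banks.
print([xor_bank_index(row, 0) for row in range(8)])
```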
[Figure: the same 8-core CMP, two MCs (Channel 1, Channel 2), and Ranks 1-4, with stalled cores marked. Ideally, each thread's requests map to one rank (T1 R1, T2 R2, T3 R3)]
Rank Assigned Page Mapping
[Figure: Threads 1-8 on an 8-core CMP, two MCs (Channel 1, Channel 2), and Ranks 1-4]
(a) Strict mapping of threads to ranks.
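A minimal sketch of rank-assigned page allocation: each thread has a preferred rank, and new pages come from that rank while it has free frames. The modulo assignment, the free-frame dictionary, and the fallback to the rank with the most free frames are all hypothetical choices made for illustration; the talk does not specify them.

```python
def preferred_rank(thread_id, num_ranks=4):
    """Hypothetical assignment: with 8 threads and 4 ranks, 2 threads share a rank."""
    return thread_id % num_ranks


def allocate_page(thread_id, free_frames, num_ranks=4):
    """Pick a frame for a new page, preferring the thread's assigned rank.

    free_frames: dict mapping rank -> list of free frame numbers.
    Returns (rank, frame). Falls back to the rank with the most free frames.
    """
    rank = preferred_rank(thread_id, num_ranks)
    if not free_frames[rank]:
        rank = max(free_frames, key=lambda r: len(free_frames[r]))
    if not free_frames[rank]:
        raise MemoryError("no free frames in any rank")
    return rank, free_frames[rank].pop()


frames = {0: [10], 1: [20, 21], 2: [], 3: []}
print(allocate_page(0, frames))  # preferred rank has a free frame
print(allocate_page(4, frames))  # preferred rank is now empty, so fall back
```

With a strict mapping, a refresh of one rank stalls only the threads assigned to it, which is the effect the slide is after.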
[Figure: Norm. Comp. Time (0.0 to 2.0) for 0ns, Simultaneous Refresh, Staggered Refresh, and Staggered RA]
Staggered RA: 18.6% better than Staggered Refresh
Limit the spread - Page Mapping
[Figure: Threads 1-8 on an 8-core CMP, two MCs (Channel 1, Channel 2), and Ranks 1-4]
Relaxing Rank Assignment
% Pages not mapped to Preferred Rank   % Exec. Time reduction
0%                                     18.6%
10%                                    16.5%
15%                                    14.2%
20%                                    13.5%
33%                                    12.1%
50%                                    9.4%
Data Mapping
Address Mapping
18.6% better than Staggered Refresh
Page Mapping
Overview
• 3D Stacking: Increased Current Draw [MICRO’13]
• Many Rank: Refresh Overhead [Under Submission]
• Memristor (Non-Volatile Memory): Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]
Designing a Fast and Reliable Memory with Memristor Technology
[ICCD’15, NVMW’15]
Background
• Store data in the form of resistance
• Metal oxide sandwiched between two electrodes
• Inherently non-conducting
• Creation of conductive filaments of oxygen vacancies reduces resistance
Source: Cong Xu et al., Modeling and Design Analysis of 3D Vertical Resistive Memory - A Low Cost Cross-Point Architecture, ASPDAC 2014
Voltage Dependent Resistance
The resistance of a ReRAM cell is not constant but varies with the applied voltage
Combination of a selector in series with a memristor device
• Resistance decreases with increasing voltage
[Figure: DRAM Cell, PCM Cell, and Memristor Cell, each drawn with its Word Line and Bit Line]
Cell Size of 4F²
Cross Point Structure
[Figure: cross point array highlighting the Selected Cell; each Memristor Cell is a Memristor in series with a Selector]
Because of non-linearity, it is possible to select a cell without an access transistor.
Arrays can be layered vertically without resorting to 3D stacking.
Reading and Writing
[Figure: crossbar with the Selected Cell biased between V and 0V and the remaining lines held at V/2; the Half Selected Cells carry Sneak Current]
Effects of Ileak
Decreases voltage at selected cell
• Increases Write Latency
• Can cause Write Failure
Distorts bit line current
• Increases read complexity
• Decreases read margin
Limits Array Size
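A back-of-the-envelope sketch of why leakage limits array size: every half-selected cell on the selected bit line adds a small current, so the total grows with the number of rows. The model assumes each half-selected cell conducts the selected-cell current divided by a nonlinearity factor and ignores wire resistance; the model and all parameter values are assumptions for illustration, not figures from the talk.

```python
def bitline_leakage_ua(n_rows, i_selected_ua, nonlinearity):
    """Total sneak current (uA) contributed by half-selected cells on one bit line."""
    i_half_selected = i_selected_ua / nonlinearity  # current of one cell at V/2
    return (n_rows - 1) * i_half_selected


# Hypothetical cell: 10 uA when fully selected, 1000x less at half voltage.
for rows in (64, 512, 4096):
    print(rows, round(bitline_leakage_ua(rows, 10.0, 1000.0), 2))
```

Once the summed leakage approaches the selected cell's own current, the read margin collapses, which is one way to see the array-size limit.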
Reading from the crossbar array
Step 1: Read background current (Ileak)
[Figure: crossbar with four lines held at Vread/2, each carrying Ileak, and the crossing lines at 0, Vread/2, Vread/2, Vread/2]
Reading from the crossbar array
Step 2: Read total Vread current (Iread)
[Figure: crossbar with the selected line at Vread carrying Iread, three lines at Vread/2 carrying Ileak, and the crossing lines at 0, Vread/2, Vread/2, Vread/2]
State of selected cell determines Iread ~ Ileak
[Figure: timeline in which Read Latency spans tBG_READ followed by tREAD]
Proposal 1: Re-use value in sample and hold circuit
[Figure: read circuit with lines at Vread and Vread/2; labels: Pprech, Pacc, Vr, S1, S2, Sample and Hold, Sensing Circuit, Sneak Current]
Reusing Sneak Current Read
[Figure: Sneak Current (µA) plotted over Rows and Columns]
Re-Use Sneak Current Reading for the same Column
[Figure: timeline. Read Latency 1 = tBG_READ + tREAD; Read Latency 2 = tREAD only]
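The saving from re-using the sneak current reading for the same column can be sketched as follows: a read pays the background read (tBG_READ) only when it targets a different column from the previous read, and pays tREAD every time. That the held value stays valid until another column is read, and the timing values below, are assumptions for illustration.

```python
def total_read_latency(columns, t_bg_read, t_read):
    """Total latency of consecutive reads, re-using the background reading per column.

    columns: sequence of column ids, one per read, in issue order.
    """
    total = 0
    last_column = None
    for column in columns:
        if column != last_column:
            total += t_bg_read  # new column: sample the sneak current first
        total += t_read         # every read pays the actual read time
        last_column = column
    return total


# Hypothetical timings: background read of 10 units, read of 5 units.
print(total_read_latency([3, 3, 3, 7], t_bg_read=10, t_read=5))
```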
Impact of Cell Location
[Figure: array with Word Line Drivers and Bit Line Mux]
• Increased error rates
[Figure: a 64 Byte Cache line spread over 512 arrays, with Bit 1 in Array 1, Bit 2 in Array 2, Bit 3 in Array 3, through Bit 512 in Array 512]
Default mapping leads to some lines with high error rate
Proposal 2: Stagger the array mapping
[Figure: Default Mapping versus Proposed Mapping of Cachelines 1-4 across Array 0 to Array 3, by Nth bit in cacheline]
30X reduction in probability of a single bit error
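The idea behind staggering can be sketched with a toy mapping. In the default mapping a cache line uses the same cell position in every array, so a line that lands on a weak position is weak in all of its bits; a staggered mapping rotates the position by the array index so each line mixes strong and weak positions. The rotation below is a hypothetical choice; the talk does not give the exact function.

```python
def default_position(line, array, positions):
    """Default mapping: the same cell position in every array."""
    return line % positions


def staggered_position(line, array, positions):
    """Staggered mapping (illustrative): rotate the position by the array index."""
    return (line + array) % positions


arrays = range(4)
print([default_position(1, a, 4) for a in arrays])
print([staggered_position(1, a, 4) for a in arrays])
```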
Increase in Performance
[Figure: Performance Vs Baseline (0% to 140%) for DRAM and 32ReUse]
Exploring Address Mapping
[Figure: Normalized IPC (0.7 to 1.2) for 32Reuse, 4interleave, XOR, 32interleave, and 4Reuse]
Summary of Dissertation
• 3D Stacking: Increased Current Draw; Spatio-Temporal Constraints [MICRO’13]
• Many Rank: Refresh Overhead; Re-Thinking Data Placement [Under Submission]
• Memristor (Non-Volatile Memory): Memory Latencies; Re-use Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]
Conclusions
• 3D Stacking: IR Drop Constraints
• Many Rank: Rank Assignment
• Memristor: Re-Use Sneak Currents
Future Work
• Mitigating the Rising Cost of Process Variation in 3D DRAM
• PDN Aware Refresh Cycle Time for 3D DRAM
• Addressing Long Write Latencies in Memristor based Memory
Other Projects and Publications
• Efficiently Prefetching Complex Address Patterns
• MICRO’15
• USIMM: The Utah Simulated Memory Module
• Used for the Memory Scheduling Championship
• Efficient Scrub Mechanisms for Error-Prone Emerging Memories
• HPCA’12
• Accelerating Critical Word Access using Heterogeneous Memory
• MICRO’12
• Avoiding Information Leakage in the Memory Controller
• MICRO’15
Acknowledgements
• Rajeev
• Ashwini, Parents
• Al, Erik, Naveen, Ken
• Chris Wilkerson, Zeshan Chishti
• Utah Arch team-mates
• Karen, Ann
Thank You
Thesis Overview
• 3D Stacking: Increased Current Density [MICRO’13]
• Many Rank: High Refresh Current [Under Submission]
• Memristor (Non-Volatile Memory): Sneak Currents [ICCD’15, NVMW’11,15]
Analyze Impact of Currents: Performance Loss
Data Placement
Comparisons to Prior Work
[Figure: Normalized Exec. Time for 0ns, RA, ExtT640ns, RP_Opt, RP_Real, and Elastic Ref; bar labels: 1.36, 1.00, 1.10, 1.47, 1.18, 1.30]
[Figure: crossbar with wire resistances RW along the Word Lines and Bit Lines, node voltages VW1, VW2, through VWN and VWN1 through VWNM, biases V, V/2, and 0, and a Bit Line Mux]
Bit line and word line resistances eat into the cell Voltage
Percentage of refreshes stalling a thread
[Figure: % Refreshes, axis 0 to 100]
Memory Latency
[Figure: Memory Latency (Cycles), 0 to 500, for NoReUse, 32Reuse, and DRAM]
Memristor Read Power
[Figure: Normalized Read Power (0.5 to 1.0) for NoReUse, 4ReUse, 32ReUse, 4Interleave, and 32Interleave]
[Figure: prefetcher for Core 1 through Core 8 with $ Miss inputs, Delta History Tables, Delta Prediction Tables, Prediction and Feedback paths, and the Last Level $]
See a Delta? Predict a Delta!
[Figure: crossbar with the selected line at Vread carrying Iread, three lines at Vread/2 carrying Ileak, and the crossing lines at 0, Vread/2, Vread/2, Vread/2]
Sneak path currents can distort Iread
Sneak Currents
Compress to reduce write latency
[Figure: a 64 Byte Cache line with Bit 1 through Bit 512 in Array 1 through Array 512, shown under the Proposed Mapping With 50% Compression]
[Figure: Normalized Completion Time (0.0 to 2.5) for 0ns, Simultaneous Refresh, Staggered Refresh, and Staggered RA]
Summary
• With great density come a few challenges
• Sneak Currents limit array size, complicate reads, and delay writes
• Affect reliability
• Background current can be reused
• Reliability can be improved at the cost of write latency
• Compression can reduce write latency
• 8.3% performance improvement
• 30X reduction in multi bit error probability
Column Hit Rate
[Figure: Memory Latency (Cycles), 0 to 500, for NoReUse, 32Reuse, and DRAM, shown with Column Hit Rate (%) (CHR), 0 to 100]