Enabling Big Memory with Emerging Technologies
Manjunath Shevgoor

Big Memory
• DRAM needs are increasing rapidly
• Increased data gathering
• In-memory databases
• Data analytics

Need more capacity
• Core count doubling ~ every 2 years
• DIMM capacity doubling ~ every 3 years
• Memory capacity per core expected to drop every year
[Source: Memory Scaling: Systems Architecture Perspective, O Mutlu]
Source: Kevin Lim et al., Disaggregated Memory for Expansion and Sharing in Blade Servers, ISCA’09

Possible Solutions
• 3D Stacking: Increased Current Draw [MICRO’13]
• Many Rank: High Refresh Power [Under Submission]
• Memristor Non-Volatile Memory: Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]

Thesis Statement
Memory capacity requirements are increasing at a very fast rate. Management of high currents is crucial for effective deployment of new technologies. This thesis hypothesizes that architecture/OS policies for data placement can help manage some of the problems posed by high currents.

Talk Outline
• Current Constraints in 3D DRAM
• Addressing Refresh Overheads in DRAM
• Improving Memristor Memory by Re-using Sneak Currents
• Conclusion and Future Work

IR-Drop in 3D DRAM [MICRO’13]

What is a power delivery network?
• Grid of wires which connects power and circuits
• Voltage drops across every PDN
• Voltage lost on the PDN is the IR Drop
• Explore architectural policies to manage IR Drop
[Figure: power grid; labels V and VSS]
Source: Sani R.
Nassif, Power Grid Analysis Benchmarks

IR Drop in 3D DRAM
• 3D stacking increases current density – Increased ‘I’
• TSVs add resistance to the PDN – Increased ‘R’
• Navigate 8 TSV layers to reach the top die
• Insufficient voltage leads to incorrect operation
[Figure: stack regions labeled High IR Drop and Low IR Drop]

Floor Plan and Quality of Power Delivery
[Figure: voltage (V) on M1 on Layer 9, plotted against X Coordinate and Y Coordinate]
Banks that are farther away from the TSVs suffer higher IR Drop

[Figure: panels for Layer 2 through Layer 9]
IR Drop varies along a die and across the stack

• Create constraints for Iso-IR Drop regions
• IR Drop oblivious page placement leads to 47% performance degradation
• Place critical pages in IR Drop resistant regions
[Figure: Top 4 Dies, Bot 4 Dies, Logic Layer]

Region Based Constraints
• Top Region: 1-2 Reads allowed/region
• Bottom Region: 4 Reads allowed/region
Spatio-Temporal Constraints
• At least 1 Top-Read: 8 Reads allowed/stack
• No Top-Reads: 16 Reads allowed/stack

Dynamic Page Placement
• Pages with highest total queuing delay are moved to bottom regions
• Using page access count to promote pages can starve threads
• Scheduler ensures fairness
• Page migration is limited by Migration Penalty (10k/15M cycles)

Results
Within 20% of ideal

Overview
• 3D Stacking: Increased Current Draw [MICRO’13]
• Many Rank: Refresh Overhead [Under Submission]
• Memristor Non-Volatile Memory: Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]

Re-Thinking Data Placement in Highly Ranked DRAM Systems

Refresh Power in DRAM

Command    Current (mA)
Write      125
Refresh    245
Act        67
Read       125

• Refresh consumes 96% more power than read
• There can be up to 4 ranks in a DIMM
Source: Micron 8GB DDR3L data sheet

[Figure: 8-core CMP with two memory controllers (MC), Channel 1 and Channel 2, Ranks 1-4]
Stagger refresh to reduce peak power

Increase in Refresh Time
Fine grained refresh

Chip Capacity (GB)    tRFC (ns)    tRFC_2X (ns)    tRFC_4X (ns)
8                     350          240             160
16                    480          350             240
32                    640          480             350
Refresh Interval      7.8 µs       3.9 µs          1.95 µs

Effect of Staggered Refresh

Configuration       Completion Time
0ns                 1.000
Simul Ref           1.076
Stagger Ref         1.140
Simul Ref ExtT      1.161
Stagger Ref ExtT    1.369

Each Staggered Refresh stalls many cores
[Figure: 8-core CMP, two MCs, Channels 1 and 2, Ranks 1-4; request timeline of threads (T1-T3) to ranks (R1-R3) with stalled requests marked]

Limit the spread - Address Mapping
[Figure: Normalized Comp. Time for No Interleave, XOR Bank Interleave, and Chan Interleave, under No Refresh and Refresh; bar labels 1.259, 1.000, 1.247, 1.346, 0.991, 0.808]

Ideally
[Figure: same system and request timeline, followed by a timeline in which each thread accesses only one rank (T1 R1, T2 R2, T3 R3)]

Rank Assigned Page Mapping
[Figure: Threads 1-8 mapped onto Ranks 1-4 across Channels 1 and 2 of an 8-core CMP]
(a) Strict mapping of threads to ranks.

[Figure: Norm. Comp. Time for 0ns, Simultaneous Refresh, Staggered Refresh, and Staggered RA]
18.6% better than Staggered Refresh

Limit the spread - Page Mapping
[Figure: Threads 1-8 and Ranks 1-4 across Channels 1 and 2 of an 8-core CMP]

Relaxing Rank Assignment

% Pages not mapped to Preferred Rank    % Exec. Time reduction
0%                                      18.6%
10%                                     16.5%
15%                                     14.2%
20%                                     13.5%
33%                                     12.1%
50%                                     9.4%

Data Mapping
• Address Mapping
• Page Mapping
• 18.6% better than Staggered Refresh

Overview
• 3D Stacking: Increased Current Draw [MICRO’13]
• Many Rank: Refresh Overhead [Under Submission]
• Memristor Non-Volatile Memory: Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]

Designing a Fast and Reliable Memory with Memristor Technology [ICCD’15, NVMW’15]

Background
• Store data in the form of resistance
• Metal oxide sandwiched between two electrodes
• Inherently non-conducting
• Creation of conductive filaments of oxygen vacancies reduces resistance
Source: Cong Xu et al., Modeling and Design Analysis of 3D Vertical Resistive Memory - A Low Cost Cross-Point Architecture, ASPDAC 2014

Voltage Dependent Resistance
• The resistance of a ReRAM cell is not constant but varies with the applied voltage
• Combination of a selector in series with memristor device
• Resistance decreases with increasing voltage

[Figure: DRAM Cell, PCM Cell, and Memristor Cell, each drawn between a Word Line and a Bit Line]
Cell size of 4F²

Cross Point Structure
[Figure: cross-point array; labels Selected Cell, Memristor, Selector, Memristor Cell]
• Because of non-linearity, it is possible to select a cell without an access transistor.
• Arrays can be layered vertically without resorting to 3D stacking.

Reading and Writing
[Figure: crossbar with V applied to the selected word line, 0V on the selected bit line, and V/2 on all other lines; labels Selected Cell, Half Selected Cells, Sneak Current]

Effects of Ileak
• Decreases voltage at selected cell
  – Increases write latency
  – Can cause write failure
• Distorts bit line current
  – Increases read complexity
  – Decreases read margin
• Limits array size

Reading from the crossbar array
Step 1: Read background current (Ileak)
[Figure: crossbar with Vread/2 on every word line and unselected bit line, 0 on the selected bit line; Ileak flows from each row]

Reading from the crossbar array
Step 2: Read total Vread current (Iread)
[Figure: same crossbar with Vread on the selected word line; Iread flows from the selected row, Ileak from the others]

State of selected cell determines Iread ~ Ileak
Read Latency: tBG_READ followed by tREAD

Proposal 1: Re-use value in sample and hold circuit
[Figure: crossbar and sensing circuit; labels Vread, Vread/2, Pprech, Pacc, Vr, S1, S2, Sample and Hold, Sensing Circuit, Sneak Current]

Reusing Sneak Current Read
[Figure: Sneak Current (uA) plotted against Rows and Columns]

Re-Use Sneak Current Reading for the same Column
• Read Latency1: tBG_READ followed by tREAD
• Read Latency2: tREAD

Impact of Cell Location
[Figure: array with Word Line Drivers and Bit Line Mux; region labeled Increased error rates]

[Figure: Bit 1 through Bit 512 of a 64 Byte Cache line stored in Array 1 through Array 512]
Default mapping leads to some lines with high error rate

Proposal 2: Stagger the array mapping
[Figure: Default Mapping vs Proposed Mapping of the Nth bit in Cachelines 1-4 onto Arrays 0-3]
30X reduction in probability of a single bit error

Increase in Performance
[Figure: Performance Vs Baseline (0% to 140%) for DRAM and 32ReUse]

Exploring Address Mapping
[Figure: Normalized IPC (0.7 to 1.2) for 32Reuse, 4interleave, XOR, 32interleave, and 4Reuse]

Summary of Dissertation
• 3D Stacking: Increased Current Draw; Spatio-Temporal Constraints [MICRO’13]
• Many Rank: Refresh Overhead; Re-Thinking Data Placement [Under Submission]
• Memristor Non-Volatile Memory: Memory Latencies; Re-use Sneak Currents [ICCD’15, HPCA’12, NVMW’11,15]

Conclusions
• 3D Stacking: IR Drop Constraints
• Many Rank: Rank Assignment
• Memristor: Re-Use Sneak Currents

Future Work
• Mitigating the Rising Cost of Process Variation in 3D DRAM
• PDN Aware Refresh Cycle Time for 3D DRAM
• Addressing Long Write Latencies in Memristor based Memory

Other Projects and Publications
• Efficiently Prefetching Complex Address Patterns
  – MICRO’15
• USIMM: The Utah Simulated Memory Module
  – Used for the Memory Scheduling Championship
• Efficient Scrub Mechanisms for Error-Prone Emerging Memories
  – HPCA’12
• Accelerating Critical Word Access using Heterogeneous Memory
  – MICRO’12
• Avoiding Information Leakage in the Memory Controller
  – MICRO’15

Acknowledgements
• Rajeev
• Ashwini, Parents
• Al, Erik, Naveen, Ken
• Chris Wilkerson, Zeshan Chishti
• Utah Arch team-mates
• Karen, Ann

Thank You

Thesis Overview
• 3D Stacking: Increased Current Density [MICRO’13]
• Many Rank: High Refresh Current [Under Submission]
• Memristor Non-Volatile Memory: Sneak Currents [ICCD’15, NVMW’11,15]
[Figure labels: Analyze Impact of Currents + Data Placement; Performance Loss]

Comparisons to Prior Work
[Figure: Normalized Exec. Time for 0ns, RA, ExtT640ns, RP_Opt, RP_Real, and Elastic Ref; bar labels 1.36, 1.00, 1.10, 1.47, 1.18, 1.30]

[Figure: crossbar with Bit Lines, Word Lines, wire resistances RW, node voltages VW1 through VWNM, and Bit Line Mux; V on the selected word line, V/2 on unselected lines, 0 on the selected bit line]
Bit line and word line resistances eat into the cell voltage

Percentage of refreshes stalling a thread
[Figure: % Refreshes, 0 to 100]

Memory Latency
[Figure: Memory Latency (Cycles), 0 to 500, for NoReUse, 32Reuse, and DRAM]

Memristor Read Power
[Figure: Normalized Read Power (0.5 to 1.0) for 32ReUse, 4ReUse, 4Interleave, 32Interleave, and NoReUse]

[Figure: $ Misses from Core 1 through Core 8 feed Delta History Tables and Delta Prediction Tables; labels Prediction, Feedback, Last Level $]
See a Delta? Predict a Delta!
Sneak path currents can distort Iread
[Figure: crossbar with Vread on the selected word line, Vread/2 on unselected lines, 0 on the selected bit line; Iread on the selected row, Ileak on the others]

Sneak Currents

Compress to reduce write latency
[Figure: Bit 1 through Bit 512 of a 64 Byte Cache line stored in Array 1 through Array 512; Proposed Mapping With 50% Compression]

[Figure: Normalized Completion Time (0.0 to 2.5) for 0ns, Simultaneous Refresh, Staggered Refresh, and Staggered RA]

Summary
• With great density come a few challenges
• Sneak Currents limit array size, complicate reads, and delay writes
• Affect reliability
• Background current can be reused
• Reliability can be improved at the cost of write latency
• Compression can reduce write latency
• 8.3% performance improvement
• 30X reduction in multi bit error probability

Column Hit Rate
[Figure: Memory Latency (Cycles), 0 to 500, and Column Hit Rate (%), 0 to 100, for NoReUse, 32Reuse, and DRAM; CHR series]
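
The link between column hit rate and read latency under Proposal 1 can be sketched in a few lines of Python. This is a minimal illustration, not the evaluated design: the function name, the model of a sample-and-hold circuit that remembers only the most recently read column, and the 50-cycle timings are all assumptions. Only tBG_READ, tREAD, and the rule that a read to the same column can skip the background read come from the slides.

```python
def simulate_reads(columns, t_bg_read, t_read):
    """Return (average read latency, column hit rate) for a read sequence.

    Assumed model (hypothetical): the sample-and-hold circuit keeps the
    background (sneak) current of the most recently read column only.
    A read to that same column skips the background read and costs
    t_read; any other read costs t_bg_read + t_read.
    """
    if not columns:
        return 0.0, 0.0
    total_latency = 0
    hits = 0
    held_column = None  # column whose Ileak sample is currently held
    for col in columns:
        if col == held_column:
            hits += 1
            total_latency += t_read              # Read Latency2: tREAD only
        else:
            total_latency += t_bg_read + t_read  # Read Latency1: tBG_READ + tREAD
            held_column = col
    return total_latency / len(columns), hits / len(columns)


# Hypothetical timings in cycles; these numbers are not from the slides.
avg_latency, hit_rate = simulate_reads([3, 3, 3, 7], t_bg_read=50, t_read=50)
print(avg_latency, hit_rate)
```

In this model a sequence with no repeated columns always pays tBG_READ + tREAD, which corresponds to the NoReUse case; the more consecutive reads fall on the same column, the closer the average gets to tREAD alone.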