Introducing LRDIMM – A New Class of Memory Modules
White Paper
This paper introduces the LRDIMM, a new type of memory module for high-capacity servers and high-performance computing platforms. LRDIMM is an abbreviation for Load Reduced Dual Inline Memory Module, the newest type of DIMM supporting DDR3 SDRAM main memory. The LRDIMM is fully pin-compatible with existing JEDEC-standard DDR3 DIMM sockets, and supports higher system memory capacities when enabled in the system BIOS.
The LRDIMM changes the performance dynamics of the memory system in terms of bandwidth, capacity, functionality and power. Specifically, the LRDIMM offers substantially higher operating data rates in the highest-capacity configurations and new features that improve testability, usability and memory capacity, and does so with minimal added power in high-capacity, multiple-DIMM-per-channel configurations.
Existing RDIMM (registered DIMM) memory technology forces end-users to trade off memory capacity against operating speed, and is beginning to fall short of ever-increasing demands for higher capacity. The LRDIMM removes this trade-off, enabling high-capacity memory systems that run at the highest operating speeds.
What is Load-Reduced DIMM (LRDIMM)?
At the heart of the LRDIMM technology is the memory buffer, pioneered by the introduction of Inphi’s Isolation
Memory Buffer (iMB™). Figure 1 shows a high-level conceptual drawing of an LRDIMM, featuring one memory
buffer on the front side of the memory module and multiple ranks of DRAM mounted on both front and back
sides of the memory module. The memory buffer re-drives all of the data, command, address and clock
signals from the host memory controller and provides them to the multiple ranks of DRAM. The memory buffer
isolates the DRAM from the host, reducing the electrical load on all interfaces. Reducing the electrical loading
in this manner allows a system to operate at a higher speed for a given memory capacity, or to support a
higher memory capacity at a given speed. With existing RDIMM technology, the data bus connects directly to
the multiple ranks of DRAM, increasing electrical load and limiting system speed as the desired server capacity
increases. Figure 2 shows the system-level block diagrams for the RDIMM and the LRDIMM.
Figure 1 – LRDIMM Conceptual Drawing (the memory buffer re-drives the command/address bus and the data signals, from host to DRAM and DRAM to host, for the DRAM ranks on the module)

Figure 2 – RDIMM vs. LRDIMM (block diagrams: the host memory controller drives the CMD/ADDR/CLK bus into the RDIMM's register or the LRDIMM's memory buffer; the parallel multi-drop data bus to/from the host memory controller sees one electrical load per LRDIMM)
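To make the load-reduction arithmetic concrete, the short Python sketch below counts data-bus loads per channel under the simple model described in the text (each DRAM rank on an RDIMM is one load on the shared data bus, while each LRDIMM presents a single memory-buffer load); the function names and the model itself are illustrative assumptions, not part of any LRDIMM specification.

# Illustrative sketch: data-bus electrical loads per channel under the simple
# model in the text (assumption: one load per DRAM rank on an RDIMM, one
# memory-buffer load per LRDIMM). Names are hypothetical.

def rdimm_data_bus_loads(ranks_per_dimm: int, dimms_per_channel: int) -> int:
    # Every DRAM rank on every RDIMM hangs directly on the shared data bus.
    return ranks_per_dimm * dimms_per_channel

def lrdimm_data_bus_loads(dimms_per_channel: int) -> int:
    # The memory buffer isolates the DRAM, so each LRDIMM presents one load.
    return dimms_per_channel

if __name__ == "__main__":
    # Two quad-rank modules per channel (the 2 DPC case discussed later):
    print("RDIMM loads :", rdimm_data_bus_loads(4, 2))   # 8 loads
    print("LRDIMM loads:", lrdimm_data_bus_loads(2))     # 2 loads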
To maximize module and system-level memory capacity and overcome limitations on the number of chip selects
per channel, the LRDIMM supports Rank Multiplication, where multiple physical DRAM ranks appear to the host
controller as a single logical rank of a larger size. This is done by utilizing additional row address bits during the
Activate command as sub-rank select bits. Read and Write commands do not require the additional sub-rank
select bits, since the information is stored in the memory buffer after the Activate command. Rank Multiplication
may be disabled, or set for 2:1 or 4:1 multiplication, up to a maximum of 8 physical ranks per LRDIMM.
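The following Python sketch illustrates the rank-multiplication idea in the paragraph above: in 4:1 mode the buffer treats extra row-address bits in the Activate command as sub-rank selects and remembers, per logical rank and bank, which physical rank was opened, so later Reads and Writes need no extra bits. This is a conceptual model only; which address bits serve as sub-rank selects, and the internal bookkeeping, are assumptions rather than the actual iMB implementation.

# Conceptual model of 4:1 rank multiplication (not the actual iMB logic).
# Assumption: the most-significant row-address bits act as sub-rank selects.

class RankMultiplicationBuffer:
    def __init__(self, multiplication: int = 4, row_bits: int = 16):
        self.mult = multiplication              # 1, 2, or 4
        self.row_bits = row_bits
        self.open_subrank = {}                  # (logical_rank, bank) -> physical rank

    def activate(self, logical_rank: int, bank: int, row: int) -> int:
        # Extract the sub-rank select bits from the top of the row address.
        sub_bits = (self.mult - 1).bit_length()             # 0, 1, or 2 bits
        subrank = row >> (self.row_bits - sub_bits) if sub_bits else 0
        physical_rank = logical_rank * self.mult + subrank
        self.open_subrank[(logical_rank, bank)] = physical_rank
        return physical_rank

    def read_or_write(self, logical_rank: int, bank: int) -> int:
        # Reads/Writes carry no sub-rank bits; the buffer re-uses the mapping
        # captured at Activate time for this logical rank and bank.
        return self.open_subrank[(logical_rank, bank)]

buf = RankMultiplicationBuffer(multiplication=4)
phys = buf.activate(logical_rank=1, bank=3, row=0xC123)   # top row bits 0b11 -> sub-rank 3
assert buf.read_or_write(logical_rank=1, bank=3) == phys  # same physical rank, no extra bits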
LRDIMM Features
In addition to maximizing system memory capacity and speed, the LRDIMM architecture enables a number
of useful features for the end-user. The LRDIMM's memory buffer, the iMB, supports DRAM and LRDIMM test
features such as transparent mode and MemBIST (Memory Built-In Self-Test) functionality, VREF (voltage
reference) margining for both the data (DQ) and command/address (CA) busses, parity checking for commands,
in-band control word features similar to those of the 32882 register for RDIMMs, an optional out-of-band
SMBus (Serial Management Bus) interface for LRDIMM configuration and status registers, and an integrated
temperature sensor.
Transparent mode: For module testing purposes, the memory buffer can be placed in transparent mode, where
it acts as a simple signal re-drive buffer and passes commands and data directly through to the DRAM devices.
MemBIST: For DRAM initialization and component testing purposes, the LRDIMM's memory buffer supports
a MemBIST (Memory Built-In Self-Test) function, enabling exhaustive at-speed testing of the DRAM devices.
Testing can be performed at full operational speed, using either in-band (Command/Address bus) or out-of-band (SMBus) access.
VREF margining: LRDIMMs can use externally supplied voltage references for data (VREFDQ) and command/
address signals (VREFCA), or supply the voltage references internally from the memory buffer. When VREF is
provided by the memory buffer, the voltage level can be controlled by the host through the memory buffer's
configuration registers. The programmable voltage references allow an LRDIMM memory system to use
independent VREF references for the memory-buffer-to-DRAM interfaces and the host-controller-to-memory-buffer
interfaces, and thus enable VREF margining to be performed independently on each of these interfaces. This
independent VREF margining capability allows module and system suppliers to separately characterize, and
therefore guarantee, the robustness of the signaling interfaces of the LRDIMM memory system.
Parity checking: To detect corrupted commands on the Command/Address bus, the memory buffer checks parity
on the incoming commands and asserts an ERROUT_n signal if a parity error is detected.
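A minimal sketch of command/address parity checking is shown below, assuming even parity over the sampled command and address bits; the exact bit coverage and polarity are defined by the DDR3 RDIMM/LRDIMM specifications and are treated as assumptions here.

# Minimal parity-check sketch. Assumption: even parity over the sampled
# command/address bits; a mismatch would assert ERROUT_n on a real module.

def compute_parity(ca_bits: int) -> int:
    # Even parity: the parity bit makes the total number of 1s even.
    return bin(ca_bits).count("1") & 1

def parity_error(ca_bits: int, received_parity: int) -> bool:
    # True means the buffer should flag the command as corrupted.
    return compute_parity(ca_bits) != received_parity

assert not parity_error(0b101101101, compute_parity(0b101101101))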
SMBus interface: The memory buffer supports an out-of-band serial management bus to read and write
configuration and status registers.
Temperature sensor: The memory buffer contains an integrated temperature sensor that is updated eight
times per second and is accessible at any time through the SMBus. The memory buffer’s EVENT_n pin can be
configured as an interrupt back to the host to indicate high temperature events.
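The temperature register can be read at any time over the SMBus; the decode below assumes a JEDEC JC-42.4-style 13-bit two's-complement value with 0.0625 °C resolution, which is an assumption about the iMB's register format rather than a documented fact.

# Hypothetical decode of a raw 16-bit temperature-sensor word read over SMBus.
# Assumption: JC-42.4-style layout, 13-bit two's-complement temperature in
# bits [12:0] with 0.0625 degC per LSB; the iMB's actual format may differ.

def decode_temperature_c(raw_word: int) -> float:
    value = raw_word & 0x1FFF          # keep the 13 temperature bits
    if value & 0x1000:                 # sign bit set -> negative temperature
        value -= 0x2000
    return value * 0.0625

print(decode_temperature_c(0x0290))    # 0x290 = 656 -> 41.0 degC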
How Does LRDIMM Scale to Higher Speed?
The benefits of the register component for RDIMMs are well known, and registered DIMMs provide system
memory capacity advantages over unbuffered DIMMs. However, since the RDIMM's register component only
buffers the command and address busses, the unbuffered data bus remains a weakness for an RDIMM-based
memory system. For example, a quad-rank DDR3 RDIMM presents four electrical loads on the data bus
per RDIMM. Consequently, quad-rank DDR3 RDIMMs can only operate at a maximum data rate of 1066 MT/s
in a 1 DIMM per channel (1 DPC) configuration, and 800 MT/s in a 2 DIMM per channel (2 DPC) configuration.
LRDIMMs, which buffer the data bus along with the command and address busses, can operate at higher data
rates and in higher memory density configurations.
Figure 3 shows the simulated eye diagram of the data bus with two quad-rank RDIMMs in a 2 DIMM per channel
configuration. It demonstrates that with the presence of 8 electrical loads on the data bus, the signal integrity of
the memory channel is severely degraded and limits the signaling rate of the memory system. Specifically, it shows
that with 8 electrical loads at 1333 MT/s, the maximum data eye width on the data bus is reduced to 212 ps at
an idealized VREF point, and the eye height is less than 115 mV at the maximum voltage opening. This reduced
data eye means that two quad-rank RDIMMs in a 2 DIMM per channel configuration are not suitable for operation
at 1333 MT/s. Figure 3 thus illustrates the difficult trade-off that memory system architects face between
higher capacity and higher data rate in RDIMM memory systems.
Figure 4 shows the simulated eye diagram of the data bus with two quad-rank LRDIMMs in the 2 DIMM per
channel configuration. With the electrical loads of the eight physical ranks of DRAM devices replaced with two
electrical loads of the memory buffer on the data bus, the signal integrity of the data bus is greatly improved.
Specifically, Figure 4 shows that under the same simulation conditions as that used for the two quad-rank
RDIMMs in Figure 3, the maximum data eye width on the data bus is improved from 212 ps to 520 ps, and the
maximum data eye height at the maximum voltage opening is improved from 115 mV to 327 mV. This improved
signal integrity means that the LRDIMM is able to operate at 1333 MT/s and above, even when multiple
LRDIMMs are populated in the same channel. Figure 4 thus demonstrates higher data rate operation for
high-capacity memory modules, the basic requirement for mitigating the dependency between memory capacity
and memory bandwidth.
Figure 3 – Simulated Eye Diagram of the Data Bus with Two Quad-Rank RDIMMs at 1333 MT/s (Teye = 212.68 ps, Veye = 114.89 mV)

Figure 4 – Simulated Eye Diagram of the Data Bus with Two Quad-Rank LRDIMMs at 1333 MT/s (Teye = 519.88 ps, Veye = 327.10 mV)
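To put the simulated eye widths in context, the short calculation below expresses each eye opening as a fraction of the unit interval at 1333 MT/s (UI = 1/1.333 Gbps, roughly 750 ps); this is simple arithmetic on the values reported in Figures 3 and 4, not additional simulation data.

# Eye width as a fraction of the unit interval at 1333 MT/s.
UI_PS = 1e12 / 1.333e9          # ~750 ps per bit at 1333 MT/s

for label, eye_ps in [("RDIMM 2DPC (Fig. 3)", 212.68),
                      ("LRDIMM 2DPC (Fig. 4)", 519.88)]:
    print(f"{label}: {eye_ps:.0f} ps eye = {eye_ps / UI_PS:.0%} of the UI")
# RDIMM: ~28% of the UI; LRDIMM: ~69% of the UI.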
Implications for System Memory Capacity
One of the primary advantages of the LRDIMM is the ability to dramatically increase total system memory
capacity without sacrificing speed. By electrically isolating the DRAM from the data bus, additional ranks of
DRAM can be added to each DIMM while maintaining signal integrity, and additional DIMMs can be installed on
each system memory channel. LRDIMM capacities up to 32GB are possible today with 4Rx4 modules using
4 Gb, DDP (dual-die package) DRAM. Since each LRDIMM presents a single electrical load to the host, more
DIMMs can be installed per channel as well.
Assuming a high capacity server with 2 processors, 3 DIMM slots per channel, and 4 channels per processor,
total system memory using LRDIMMs can be increased by 2 to 3 times over RDIMM capacity on the same
system. Figure 5 shows typical RDIMM and LRDIMM capacity limits for each operating speed and voltage. For
1.5V DDR3 operation at 800 MT/s (mega-transfers per second), a system fully populated with RDIMM could
reach 384GB, using three 16GB 2Rx4 RDIMMs per channel. With support for 32GB 4Rx4 modules, system
memory capacity using LRDIMM can double that limit, reaching 768 GB. System chip-select limits (typically 8
total DRAM ranks per channel) can be overcome using the LRDIMM’s rank multiplication feature, allowing the
12 physical ranks per channel needed in this case.
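The capacity figures above follow from straightforward arithmetic, sketched below for the assumed 2-processor, 4-channels-per-processor, 3-slots-per-channel platform; the helper names are illustrative only.

# Capacity arithmetic for the assumed platform: 2 processors, 4 channels per
# processor, 3 DIMM slots per channel. Names are illustrative.

PROCESSORS, CHANNELS_PER_CPU, SLOTS_PER_CHANNEL = 2, 4, 3
DIMMS = PROCESSORS * CHANNELS_PER_CPU * SLOTS_PER_CHANNEL          # 24 DIMM slots

print("RDIMM  (16GB 2Rx4):", DIMMS * 16, "GB")                     # 384 GB
print("LRDIMM (32GB 4Rx4):", DIMMS * 32, "GB")                     # 768 GB

# Chip-select check: 3 quad-rank LRDIMMs = 12 physical ranks per channel.
physical_ranks = SLOTS_PER_CHANNEL * 4
logical_ranks = physical_ranks // 4            # with 4:1 rank multiplication
print(physical_ranks, "physical ranks appear as", logical_ranks, "logical ranks")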
At higher speeds such as 1066 or 1333 MT/s, signal integrity constraints prevent 3 DIMM per channel RDIMM
configurations. For 1.5V DDR3 operation at 1066 or 1333 MT/s, maximum RDIMM capacity is 256 GB. The
LRDIMM does not face the same signal integrity limit, allowing full 3 DIMM per channel operation at 1066 MT/s
(and at 1333 MT/s as a stretch goal), with total system memory capacity remaining at 768 GB, a 3x advantage
over RDIMM. For 1.35V DDR3L operation at 1333 MT/s, the LRDIMM advantage is even greater.
Inphi’s iMB™ is designed to support LRDIMM operation at even higher memory access speeds, up to 1600
MT/s for systems capable of operating at that rate.
Additional capacity gains will become available in the near future, up to 64 GB in an 8Rx4 LRDIMM using 4Gb
QDP (quad-die package) DRAM.
Speed    Voltage   RDIMM System (2 Nodes, 3 SPC, 4 Ch)                                                   LRDIMM System (2 Nodes, 3 SPC, 4 Ch)
(MT/s)             Maximum density RDIMM   DIMMs per channel   DIMMs per system   System Capacity        Maximum density LRDIMM   DIMMs per channel   DIMMs per system   System Capacity
800      1.5V      16 GB, 2Rx4, 4Gb        3                   24                 384 GB                 32 GB, 4Rx4, 4Gb         3                   24                 768 GB
800      1.35V     16 GB, 2Rx4, 4Gb        2                   16                 256 GB                 32 GB, 4Rx4, 4Gb         3                   24                 768 GB
1066     1.5V      16 GB, 2Rx4, 4Gb        2                   16                 256 GB                 32 GB, 4Rx4, 4Gb         3                   24                 768 GB
1066     1.35V     16 GB, 2Rx4, 4Gb        2                   16                 256 GB                 32 GB, 4Rx4, 4Gb         3                   24                 768 GB
1333     1.5V      16 GB, 2Rx4, 4Gb        2                   16                 256 GB                 32 GB, 4Rx4, 4Gb         3*                  24                 768 GB
1333     1.35V     16 GB, 2Rx4, 4Gb        1                   8                  128 GB                 32 GB, 4Rx4, 4Gb         2                   16                 512 GB

* 3 DIMM per channel LRDIMM operation at 1333 MT/s is a stretch goal (see text).

Figure 5 – System Memory Capacity, RDIMM vs. LRDIMM
Minimal Incremental Power Impact of LRDIMM
The LRDIMM not only enables server systems with higher memory capacities, it does so with minimal power
penalty. While the memory buffer on a single LRDIMM in a 1 DIMM per channel configuration draws more power
than the registering clock driver on a single RDIMM in the same configuration, the difference is greatly reduced
in the higher-density 2 and 3 DIMM per channel configurations, as shown in Figures 6 and 7.
Figure 6 shows the normalized power per RDIMM or LRDIMM for 1 and 2 DIMM per channel configurations
at various speeds. Since actual power consumed depends primarily on the memory density and DRAM
technology used, relative power is shown here for LRDIMMs and RDIMMs using the same generation DRAM
on equivalent 4Rx4, 32GB modules. RDIMM module power at 800 MT/s was normalized to 1.00, and other
results scaled to that reading. Memory was exercised using an industry-standard benchmarking tool set to
generate maximum bandwidth with 50 percent reads and 50 percent writes.
At 800 MT/s in a 1 DIMM per channel configuration, LRDIMM power was 17 percent higher than RDIMM
power, but at 800 MT/s in a 2 DIMM per channel configuration, LRDIMM power measured 3 percent lower than
RDIMM power. At 1066 MT/s in a 1 DIMM per channel configuration, LRDIMM power was 15 percent higher
than RDIMM power, but again dropped significantly in a 2 DIMM per channel configuration, to a level 15 percent
below 1 DIMM per channel RDIMM power. At 1333 MT/s, power per LRDIMM was 28 percent lower at 2
DIMM per channel than at 1 DIMM per channel.
Figure 7 shows similar results for maximum bandwidth memory accesses with 100 percent reads. The power
differential at 1 DIMM per channel is slightly higher than in the 50/50 read/write case, but again drops off
significantly in 2 DIMM per channel configurations. Since the LRDIMM is primarily intended for high-density
memory applications, power per LRDIMM at 2 DIMM per channel should be of more interest to end-users than
power in 1 DIMM per channel configurations. Figures 6 and 7 demonstrate that for the higher-density 2 DIMM
per channel configurations, end-users enjoy the benefits of LRDIMM with no power penalty.
Figure 6 – Normalized Power per DIMM (32GB 4Rx4, Closed Page, 50/50 Read/Write); normalized power for RDIMM and LRDIMM at 1 and 2 DIMM per channel, at 800, 1066 and 1333 MT/s

Figure 7 – Normalized Power per DIMM (32GB 4Rx4, Closed Page, 100% Reads); normalized power for RDIMM and LRDIMM at 1 and 2 DIMM per channel, at 800, 1066 and 1333 MT/s
Improved Power Efficiency at 2 and 3 DIMM per Channel
The LRDIMM power efficiency gains at 2 DIMM per channel result from the architectural advantage of having
the memory buffer on each LRDIMM located between the memory controller and the DRAM, rather than
having all DRAM on all RDIMMs on a given memory channel connected directly to the memory controller. This
allows for power-saving termination options for DRAM located on non-target LRDIMMs that are not possible
for RDIMMs, and for lower drive strengths due to the lower number of loads on a given channel. These same
advantages extend to 3 DIMM per channel configurations as well.
Consider the 1 DIMM per channel Read case shown in Figure 8. For the RDIMM, active DRAM power includes
the DQ output driver of the target rank identified as Item 1, and the non-target DRAM termination identified as
Item 2. In the 1 DIMM per channel case, these are similar for the LRDIMM, which adds memory buffer MDQ
receiver termination power identified as Item 3, as well as the memory buffer MDQ receiver, DLL, and DQ output
driver power identified as Item 4. Items 3 and 4 are additive to the equivalent RDIMM power at 1 DIMM per
channel, which accounts for the higher LRDIMM power in the 1 DIMM per channel case.
In the 2 DIMM per channel case shown in Figure 9, the RDIMM system requires additional non-target DRAM
terminations on the adjacent RDIMM, labeled as Item 2. In the 2 DIMM per channel LRDIMM case, the
adjacent LRDIMM requires only the additional power needed for the non-target memory buffer DQ termination,
labeled as Item 5. By isolating the non-target DRAM from the main memory channel, termination requirements
for optimum signal integrity can be modified, and incremental power for additional LRDIMMs can be minimized.
Figure 8 – Power-consuming circuits for 1 DIMM per channel read case

Figure 9 – Power-consuming circuits for 2 DIMM per channel read case

Item 1: DQ output driver
Item 2: Non-target DRAM termination
Item 3: iMB MDQ receiver termination
Item 4: iMB MDQ receiver, DLL, and active DQ output driver
Item 5: Non-target iMB DQ termination
The iMB enables further power optimization by allowing more power-efficient settings for post-buffer ODT
(on-die termination) and Ron (on resistance) settings, given the shorter trace lengths and the reduced number
of electrical loads between the iMB and the DRAM, compared to the trace lengths from the host memory
controller to the DRAM on an RDIMM-based system. In Figure 9, note that the DRAM DQ driver labeled as Item
1 sees 7 DRAM loads in addition to the remote host controller in the RDIMM case, while the DRAM DQ driver in
the LRDIMM case sees only 3 DRAM loads and the memory buffer.
A hypothetical example comparing the incremental DC power required by the second RDIMM or LRDIMM in
a 2 DIMM per channel configuration shows the magnitude of the achievable power savings. If the LRDIMM’s
non-target iMB DQ termination is 60 ohms, termination voltage is 750mV (VDD/2), and the DQ signal level is
1250mV (high) or 250mV (low), power per bit is calculated as follows for the LRDIMM:
P = V²/R = (1.25 – 0.75)² / 60 = 4.17 mW
For the RDIMM, the two non-target DRAM terminations may need to be set to 40 ohms, a parallel combination
of 20 ohms, which gives:
P = V²/R = (1.25 – 0.75)² / 20 = 12.5 mW
This would save 8.33 mW per bit for the non-target DIMM, or approximately 900 mW for the 108 bits of the
data bus (64 DQ data bits, 8 ECC DQ bits, and 36 DQS strobe bits.)
In addition, dynamic power savings are achievable due to the reduced number of capacitive loads charging or
discharging in the LRDIMM 2 DIMM per channel case compared to the RDIMM 2 DIMM per channel case. For
each data bit, the DQ driver sees 4 additional loads going from 1 DIMM per channel to 2 DIMM per channel in
the RDIMM case, and only 1 additional load in the LRDIMM case, a difference of 3 loads. Assuming a 2 pF input
capacitance for each of these 3 extra loads toggling at 1333 MHz, with 50% of the inputs charging or discharging
for a random data pattern with a high-to-low voltage swing of 1.0V, the LRDIMM 2 DIMM per channel case could
save (with Z = 1/(2πfC) and V = 1.0V)
P = 3 * 0.5 * (VRMS * IRMS) = 1.5 * (V/√2) * ((V/√2)/Z) = 1.5 * (V²/2) * (2πfC), or
P = 1.5 * π * 1.333E9 * 2E-12 = 12.5 mW per bit
For the 108 bits of the data bus, total dynamic power savings would be approximately 1350 mW. Combined
with the DC power savings calculated above, the LRDIMM 2 DIMM per channel case could save over 2 watts, a
substantial improvement in power efficiency over RDIMM technology in high-density configurations.
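The worked numbers above can be reproduced with the short calculation below, using only the values stated in the text (60 Ω vs. 20 Ω effective terminations, 0.5 V across the termination, 3 extra 2 pF loads toggling at 1333 MHz, and 108 data-bus bits).

import math

# DC termination power per bit (values from the text).
V_TERM = 1.25 - 0.75                       # 0.5 V across the termination
p_lrdimm = V_TERM**2 / 60                  # 60-ohm non-target iMB termination
p_rdimm = V_TERM**2 / 20                   # two 40-ohm DRAM terminations in parallel
dc_saving_per_bit = p_rdimm - p_lrdimm     # ~8.33 mW

# Dynamic power for 3 extra 2 pF loads toggling at 1333 MHz, 50% activity, 1.0 V swing.
V_SWING, FREQ, C_LOAD = 1.0, 1.333e9, 2e-12
dyn_saving_per_bit = 3 * 0.5 * (V_SWING**2 / 2) * (2 * math.pi * FREQ * C_LOAD)  # ~12.5 mW

BITS = 108                                 # 64 DQ + 8 ECC + 36 DQS signals
total_w = BITS * (dc_saving_per_bit + dyn_saving_per_bit)
print(f"DC: {BITS * dc_saving_per_bit * 1e3:.0f} mW, "
      f"dynamic: {BITS * dyn_saving_per_bit * 1e3:.0f} mW, total: {total_w:.2f} W")
# Roughly 900 mW + 1350 mW, i.e. over 2 W saved per channel.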
Conclusion
The LRDIMM overcomes the limitations of existing RDIMM technology, offering several strategic advantages to
end-users in the high capacity server and high performance computing markets. The LRDIMM is compatible
with existing JEDEC standard DDR3 DRAM technology, providing a painless upgrade path to higher capacity
and higher performance systems. Inphi’s iMB™ sets the standard for a new generation of memory modules,
empowering the LRDIMM architecture and enabling a new class of feature-rich functionality for data center
operating environments and manufacturing test requirements.
The LRDIMM enables higher-density memory modules and system memory capacities 2 to 3 times higher than
possible with existing RDIMM technology, taking full advantage of DDR3 and DDR3L operating speeds that were
previously limited to the lowest-capacity systems. The increased speed and memory capacity made possible by
the LRDIMM not only overcome those limitations, they do so within a power footprint comparable to existing RDIMM
power budgets, providing substantial power, cooling and cost savings for high-density data centers.
Updated 07/21/2011
Inphi Headquarters
3945 Freedom Circle
Suite 1100
Santa Clara, CA 95054
Phone: (408) 217-7300
Inphi Southern California
112 S. Lakeview Canyon Rd.
Suite 100
Westlake Village, CA 91362
Phone: (805) 719-2300
Web: www.inphi.com
© 2011 Inphi Corporation. All rights reserved. Inphi, the Inphi logo, Think fast, and ExacTik are registered
trademarks of Inphi Corporation. iMB is a trademark of Inphi Corporation.