THE TECHNOLOGY TIMES

Memory: the path to performance
Richard Murphy, Senior Architect, Advanced Memory Systems, Micron Technology
Since the beginning of the computing era, memory technology has struggled to keep pace with processors. The current “Big Data” era has raised the visibility of this disparity, referred to as the von Neumann Bottleneck or the Memory Wall, perhaps the most fundamental problem in computer architecture. Economics has dictated that processors be optimized for high performance and memory for high density, while bandwidth has been taken for granted. To paraphrase a close friend, the result is “We don’t pay for bandwidth, but maybe we should.” Shifting the focus from compute- to data-intensive applications certainly highlights the fact that systems are fundamentally unbalanced, but the problem is further exacerbated by the unprecedented change that semiconductor technology is facing. Since 2003, processors simply have not shown much improvement in clock rate. The end of Dennard Scaling, which has historically gone hand-in-hand with Moore’s Law to provide more performance in roughly the same power envelope, created the multicore era in which we now live.
Over the decades, performance improvement came “for free,” and Moore’s Law guaranteed double the number of transistors to take advantage of that improvement. Over the same period, very little attention was given to memory performance. Instead, the industry bet on the exponential increase in performance made possible by the combination of Moore’s Law and Dennard Scaling: if the system isn’t fast enough, just wait; we’ll expend some more transistors on the problem in the next generation. Indeed, the field of computer architecture became almost entirely about building CPUs that could cope with “weak” memory systems. Larger caches, deeper pipelines, superscalar out-of-order execution, and multithreading are all examples of processor architecture innovations designed to cope with the fact that memory was increasingly far away. It may seem like the processor got all the benefit from Moore’s Law, but the reality is that Moore’s Law was originally about DRAM, stating that the number of DRAM transistors would double with every generation. This prediction turned out to be remarkably accurate, with both DRAM and CPUs maintaining that schedule ever since (despite the lengthening gaps between generations that we’re now seeing as process technology slows).
In 2003, the world changed. Moore’s Law continued, but Dennard Scaling did not. While the underlying transistor technology got faster, it no longer did so in the same power envelope, and there was no longer a practical way to remove the heat. Hence the transition to multicore architecture. The only remaining problem was that the industry never invested in memory systems capable of supporting these new architectures; indeed, the number of memory controllers per core decreased with each passing generation, increasing the burden on the memory system. So we got used to paying for capacity, but not bandwidth. I saw my first multicore processor in a lab in 2001, and Moore’s Law would conservatively say that those two cores should have undergone at least six doublings by now, yielding 128 cores on my desktop today. So why are we so far behind? Because it’s impossible to build a capable memory system for that many cores out of DDR-class parts. The 1970s-era technology simply doesn’t have enough capability.
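The core-count arithmetic above can be checked directly. A minimal sketch, assuming (as the author does) one Moore's-Law doubling roughly every two years; the function name and the two-year cadence are illustrative assumptions, not from the article:

```python
# Moore's-Law projection sketch: core counts double roughly every
# two years (an assumption used only for this back-of-the-envelope check).
def projected_cores(initial_cores: int, start_year: int, end_year: int,
                    years_per_doubling: int = 2) -> int:
    """Project core count after repeated Moore's-Law doublings."""
    doublings = (end_year - start_year) // years_per_doubling
    return initial_cores * 2 ** doublings

# A 2-core part seen in a lab in 2001, six doublings later:
print(projected_cores(2, 2001, 2013))  # 2 * 2**6 = 128
```

Six doublings of a dual-core part does indeed land at 128 cores, which is the gap the author is pointing at.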
The solution is simple: focus not on processor performance but on system performance, and rebalance the architecture for more capable memory and I/O systems. Micron’s Hybrid Memory Cube (HMC), which blends the best of logic and DRAM processes into a heterogeneous package, represents a good first step toward a multicore solution. At the foundation of HMC is a small logic layer that sits below vertical stacks of DRAM die connected by through-silicon via (TSV) bonds. An energy-optimized DRAM array provides efficient access to memory bits via the logic layer, yielding an intelligent memory device truly optimized for performance and energy efficiency. This elemental change in how memory is built into a system is paramount. By placing intelligent memory on the same substrate as the processing unit, each part of the system can do what it’s designed to do more efficiently than with previous technologies. By advancing past the traditional DRAM architecture, HMC is setting a new standard for memory that will match the advancement of CPU, GPU, and ASIC roadmaps, offering system designers optimum flexibility in developing next-generation system architectures.
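To make the DDR-versus-stacked-memory bandwidth gap concrete, here is a rough comparison. The figures are assumptions drawn from public datasheets, not from this article: a single DDR3-1600 channel moves 1600 MT/s across a 64-bit bus, and Micron quoted roughly 160 GB/s aggregate for a first-generation HMC device.

```python
# Illustrative bandwidth comparison (both figures are external
# assumptions, not claims made in the article above).
DDR3_1600_CHANNEL_GBS = 1600e6 * 8 / 1e9  # 1600 MT/s * 8 bytes = 12.8 GB/s
HMC_GEN1_GBS = 160.0                      # quoted aggregate for HMC gen 1

channels_needed = HMC_GEN1_GBS / DDR3_1600_CHANNEL_GBS
print(f"One HMC device ~ {channels_needed:.1f} DDR3-1600 channels")  # ~12.5
```

On these assumed numbers, matching one HMC device would take on the order of a dozen conventional DDR3 channels, which illustrates why scaling DDR-class parts to many-core processors is so difficult.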
The Technology Times is a broadsheet publication powered by Supermicro UK. This four-page newspaper is sent to a targeted list
of financial customers in the City of London, and reports news from the technology and financial sectors.
For more information, please contact:
Sarah Powell
Supermicro UK
The Kinetic Centre
Theobald Street
Borehamwood
WD6 4PJ
Tel: +44 (0)208 387 1398
Email: [email protected]