THE TECHNOLOGY TIMES

Memory: the path to performance

Richard Murphy, Senior Architect, Advanced Memory Systems, Micron Technology

Since the beginning of the computing era, memory technology has struggled to keep pace with processors. The current “Big Data” era has raised the visibility of this disparity, referred to as the von Neumann Bottleneck or the Memory Wall, perhaps the most fundamental problem in computer architecture. Economics has dictated that processors be optimized for high performance and memory for high density, while bandwidth has been taken for granted. To paraphrase a close friend, the result is: “We don’t pay for bandwidth, but maybe we should.”

Shifting the focus from compute- to data-intensive applications certainly highlights the fact that systems are fundamentally unbalanced, but the problem is further exacerbated by the unprecedented change that semiconductor technology is facing. Since 2003, processors simply have not shown much improvement in clock rate. The end of Dennard Scaling, which has historically gone hand-in-hand with Moore’s Law to provide more performance in roughly the same power envelope, created the multicore era in which we now live.

Over the decades, performance improvement came “for free,” and Moore’s Law guaranteed double the number of transistors to take advantage of that improvement. Over the same period, very little attention was given to memory performance. Instead, the industry bet on the exponential increase in performance made possible by the combination of Moore’s Law and Dennard Scaling: if the system isn’t fast enough, just wait; we’ll expend some more transistors on the problem in the next generation. Indeed, the field of computer architecture became almost entirely about building CPUs that could cope with “weak” memory systems.
Larger caches, deeper pipelines, superscalar out-of-order execution, and multithreading are all processor architecture innovations designed to cope with the fact that memory was increasingly far away. It may seem like the processor got all the benefit from Moore’s Law, but the reality is that Moore’s Law was originally about DRAM, predicting that the number of DRAM transistors would double with every generation. This prediction turned out to be remarkably accurate, with both DRAM and CPUs maintaining that schedule ever since (despite the lengthening interval between generations that we’re now seeing as process technology slows).

In 2003, the world changed. Moore’s Law continued, but Dennard Scaling did not. While the underlying transistor technology kept getting faster, it no longer did so in the same power envelope, and we now had no way to remove the heat. Hence the transition to multicore architecture. The only remaining problem was that the industry never invested in memory systems capable of supporting these new architectures; indeed, the number of memory controllers per core decreased with each passing generation, increasing the burden on the memory system. So we got used to paying for capacity, but not bandwidth.

I saw my first multicore processor in a lab in 2001, and Moore’s Law would conservatively say that those two cores should have undergone at least six doublings by now, yielding 128 cores on my desktop today. So why are we so far behind? Because it’s impossible to build a capable memory system for that many cores out of DDR-class parts; the 1970s-era technology simply doesn’t have enough capability. The solution is simple: focus not on processor performance, but on system performance, and rebalance the architecture for more capable memory and I/O systems. Micron’s Hybrid Memory Cube (HMC), which blends the best of logic and DRAM processes into a heterogeneous package, represents a good first step toward a multicore solution.
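The core-count arithmetic above is worth making explicit: a minimal sketch, assuming one doubling per Moore’s Law generation (the function name and doubling cadence are illustrative, not from the article).

```python
# Projected core count under Moore's Law, assuming the transistor
# budget (and hence the feasible core count) doubles each generation.
def projected_cores(initial_cores: int, doublings: int) -> int:
    """Return the core count after the given number of doublings."""
    return initial_cores * 2 ** doublings

# The article's example: a 2-core part seen in 2001, after at least
# six conservative doublings, should yield a 128-core desktop part.
print(projected_cores(2, 6))  # -> 128
```

The point of the calculation is the gap it exposes: the transistor budget for 128 cores exists, but a DDR-class memory system cannot feed them.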
At the foundation of HMC is a small logic layer that sits below vertical stacks of DRAM die connected by through-silicon via (TSV) bonds. An energy-optimized DRAM array provides efficient access to memory bits via the logic layer, yielding an intelligent memory device truly optimized for performance and energy efficiency. This elemental change in how memory is built into a system is paramount. By placing intelligent memory on the same substrate as the processing unit, each part of the system can do what it’s designed to do more efficiently than with previous technologies. By advancing past the traditional DRAM architecture, HMC is setting a new standard for memory that will match the advancement of CPU, GPU, and ASIC roadmaps, offering system designers optimum flexibility in developing next-generation system architectures.

The Technology Times is a broadsheet publication powered by Supermicro UK. This four-page newspaper is sent to a targeted list of financial customers in the City of London, and reports news from the technology and financial sectors.

For more information, please contact:
Sarah Powell
Supermicro UK
The Kinetic Centre
Theobald Street
Borehamwood WD6 4PJ
Tel: +44 (0)208 387 1398
Email: [email protected]