Performance / Watt: The New Server Focus
Improving Performance / Watt For Modern Processors
Tim Shattuck
April 19, 2006
From the Paper by James Laudon
Computer Architecture News, Volume 33, Number 4, September 2005
[Tim Shattuck, 2006] [1]

At Issue: Power Hungry Servers
  Increasing Costs to Power Hardware
  Wastes Limited Resources

Three Trends
  High ratio of power consumption to performance gains
  Hardware costs account for a smaller percentage of Total Cost of Ownership (TCO)
  Energy costs are rising
  These trends are expected to make power the dominant factor in calculating TCO within five years.

Niagara Optimizations
  Simple
    Clock gating
    Pipelines
  More complex
    Hardware support for multithreading

Simple Optimizations
  Clock gating
    Don't power idle parts of the chip
  Shorter, medium-length pipelines
    Fewer registers, transistors between stages
    Less power wasted on (failed) speculation
    Allow for more cores / chip

More Optimizations
  Hardware Multithreading
    Keep on-chip resources busy
    Deals with high cache miss rates
  Boosts performance / Watt
    Increases throughput of threads
    Increases power consumption only slightly
  Increases size of the die
    4 - 7% per thread

Cores / Die
  Fewer complex cores: individual thread completion
  More simple cores: aggregate thread throughput
  Simpler cores tend to have better performance / Watt ratios

Sufficient Cache and Memory Bandwidth
  Necessary to keep threads busy
  Sun's Niagara:
    Cores connected to L2 cache by a crossbar switch
      Cache bandwidth of 76.8 GB/s
    Four memory controllers directly connected to DDR2 SDRAM memory unit (200 MHz)
      Raw memory bandwidth of 25.6 GB/s
    Controllers can reorder accesses to favor reads over writes.
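The two bandwidth figures quoted for Niagara can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope reconstruction, not the derivation used in the paper: the 8-byte-per-cycle crossbar transfer and the 128-bit memory data path are assumptions chosen because they reproduce the quoted numbers. The core count, clock speeds, and controller count come from the slides.

```python
# Back-of-the-envelope check of the Niagara bandwidth figures quoted in the slides.
# Values marked ASSUMPTION do not appear in the slides.

CORES = 8                      # from the slides
CORE_CLOCK_HZ = 1.2e9          # from the slides (1.2 GHz)
L2_BYTES_PER_CORE_CYCLE = 8    # ASSUMPTION: each core moves 8 bytes per cycle over the crossbar

MEM_CONTROLLERS = 4            # from the slides
MEM_CLOCK_HZ = 200e6           # from the slides (200 MHz)
TRANSFERS_PER_CLOCK = 2        # double data rate: one transfer on each clock edge
MEM_BUS_BYTES = 16             # ASSUMPTION: 128-bit data path per controller


def l2_bandwidth_gb_s() -> float:
    """Aggregate core-to-L2 bandwidth in GB/s under the assumptions above."""
    return CORES * CORE_CLOCK_HZ * L2_BYTES_PER_CORE_CYCLE / 1e9


def memory_bandwidth_gb_s() -> float:
    """Aggregate raw DRAM bandwidth in GB/s under the assumptions above."""
    return MEM_CONTROLLERS * MEM_CLOCK_HZ * TRANSFERS_PER_CLOCK * MEM_BUS_BYTES / 1e9


if __name__ == "__main__":
    print(f"L2 bandwidth:     {l2_bandwidth_gb_s():.1f} GB/s")
    print(f"Memory bandwidth: {memory_bandwidth_gb_s():.1f} GB/s")
```

Under these assumptions the products come out to the 76.8 GB/s and 25.6 GB/s stated on the slide. Different bus widths would give different totals.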

Testing
  SPEC JBB 2000
    Java server side business logic
  TPC-C, TPC-W
    Transactional processing tests
  XML Test
    Sun's multithreaded processing test.
  Result: Scalar processors with moderate pipelines and thread support outperformed superscalar processors.

Case Studies
  Sun's Niagara
    8 cores, 4 threads each
    Scalar cores
    Tries to maximize performance / Watt
  Intel's Pentium Extreme Edition
    2 cores, 2 threads each
    Superscalar cores
    Tries to maximize performance

Case Studies (II) - Results
  Feature            Niagara        Pentium Extreme Edition
  Clock Speed        1.2 GHz        3.2 GHz
  Pipeline Depth     6 stages       31 stages
  Number of Cores    8              2
  Number of Threads  32             4
  L2 Bandwidth       76.8 GB/s      ~180 GB/s
  Memory Bandwidth   25.6 GB/s      6.4 GB/s
  Transistor Count   279 Million    230 Million
  Power              72 W           130 W

Simple Core Limitations
  Lower single thread performance
    Amplified by lower instruction level parallelism
  Keeping a large number of threads busy may become difficult
    Hot locks: threaded applications may not scale very well

Future Directions
  Use multithreading to enhance single threaded applications
    Run-ahead execution: allows out of order execution with only a modest amount of hardware
  Software control of power consumption
    Dynamic adjustments to voltage and frequency to tune power consumption
    Control of non-processing devices' (disk, memory systems) power consumption

Conclusion
  Invest in a Niagara today!
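The figures in the Case Studies (II) table can be turned into rough per-watt ratios. The sketch below uses only the thread counts, memory bandwidths, and power figures from that table. The `Chip` class and its method names are hypothetical, introduced for illustration. These ratios measure hardware resources per watt, not benchmark performance per watt, since the slides give no benchmark scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical container for a few rows of the Case Studies (II) table."""
    name: str
    threads: int                 # hardware threads
    mem_bandwidth_gb_s: float    # raw memory bandwidth, GB/s
    power_w: float               # power, W

    def threads_per_watt(self) -> float:
        return self.threads / self.power_w

    def mem_bandwidth_per_watt(self) -> float:
        return self.mem_bandwidth_gb_s / self.power_w


# Values copied from the table in the slides.
NIAGARA = Chip("Niagara", threads=32, mem_bandwidth_gb_s=25.6, power_w=72.0)
PENTIUM_EE = Chip("Pentium Extreme Edition", threads=4, mem_bandwidth_gb_s=6.4, power_w=130.0)


def advantage(metric: str) -> float:
    """How many times larger Niagara's per-watt figure is for the given method name."""
    return getattr(NIAGARA, metric)() / getattr(PENTIUM_EE, metric)()


if __name__ == "__main__":
    for chip in (NIAGARA, PENTIUM_EE):
        print(f"{chip.name}: {chip.threads_per_watt():.3f} threads/W, "
              f"{chip.mem_bandwidth_per_watt():.3f} GB/s per W")
    print(f"threads/W advantage:   {advantage('threads_per_watt'):.1f}x")
    print(f"bandwidth/W advantage: {advantage('mem_bandwidth_per_watt'):.1f}x")
```

On the table's numbers, Niagara offers roughly 14 times as many hardware threads per watt and roughly 7 times the memory bandwidth per watt. How much of that shows up as delivered throughput depends on the workload, as the Simple Core Limitations slide notes.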