Crusoe processor (Transmeta)
TM5400/5600 TM5500/5800 TM6000
Jason Law, Byeong Kil Lee

Outline
• Crusoe technology
• Crusoe processors / architecture
• Code Morphing software
• Crusoe hardware support for code morphing
• LongRun power management
• Performance comparison
• Conclusion

Crusoe Technology
• Crusoe processor = software + hardware
Code Morphing software
• Dynamically translates x86 instructions into VLIW instructions
• Provides x86 compatibility
• Optimization and scheduling are done by software
VLIW hardware
• 128-bit Very Long Instruction Word processor
• Simple and fast
• Fewer transistors
[Figure: Crusoe VLIW; labels: low power, x86 compatibility, PC performance, 1/4, 3/4]

Crusoe Processors
[Figure: L1 cache: 128 KB; DDR-SDRAM (100 to 133 MHz); SDRAM (66 to 133 MHz)]

Features
• Lighter
• Longer
• Cooler
• x86 compatibility (Windows / Linux)
• Upgradeable (by software)
• Lower cost
• MMX support (no support for SSE / 3DNow!)
• Target: ultra-light mobile notebooks, Internet appliances, high-density servers, embedded devices
• Products: Sony, Fujitsu, NEC, RLX Technologies, ….

Crusoe Architecture TM5800

Cont.
• VLIW CPU: executes up to 4 operations in each cycle
  – Molecule: long instruction word (128 bits per molecule)
  – All atoms within a molecule are executed in parallel; molecules are executed in order
• 2 ALUs, 1 FP unit, 1 load/store unit, 1 branch unit
• In-order 7-stage integer / 10-stage FP pipeline
• 64 integer registers, 32 FP registers

Crusoe vs. x86
• The blue stuff is silicon, and the yellow is software
• Crusoe's blue part is smaller
• All of that hardware was moved off the die and into software

Code Morphing Software
A dynamic translation system that resides in ROM; it is the first program to start executing at boot.
• Drawing the H/W and S/W line
  – Software: decodes x86 instructions and generates parallel molecules
  – Hardware: executes them on a simple, high-speed VLIW engine
• Decoding and scheduling
  – Translation cache: CMS translates instructions once and saves the resulting translation for re-use, so translation is skipped the next time the code runs

Code Morphing Software: Caching
• Translation cache
  – Resides in a separate memory space
  – Its size can be set at boot time, or the OS can make the size adjustable
• Crusoe's CMS monitors actual execution
  – Keeps track of which blocks of code execute most often and optimizes them accordingly
  – Keeps track of which branches are most often taken and annotates the code accordingly

Code Morphing Software: Filtering & Prediction
• Filtering: a wide choice of execution modes for x86 code
  – Interpretation (no translation overhead)
  – Translation
  – Highly optimized code (takes longest to generate, but runs faster once translated)
• Prediction
  – Highly biased branch: follow the frequently taken path
  – Otherwise: execute both paths and select the result later

Code Morphing Software: Translation Process
• 1st pass (frontend)
  – Translates the x86 instructions into a simple sequence of atoms (using temporary registers)
• 2nd pass (optimizer)
  – Applies well-known compiler optimizations: common subexpression elimination, loop invariant removal, dead code elimination
• 3rd pass (scheduler)
  – Reorders the optimized atoms and groups them into individual molecules (scheduling in software permits more effective scheduling algorithms that consider a larger window of instructions)

Advantages of the Code Morphing Software
Traditional x86 processors
• Translate each x86 instruction every time it is encountered
• Full of complex, power-hungry transistors
Crusoe processor with Code Morphing software
• Translates instructions once, saving the resultant translation in a cache for re-use
• Much of the processor functionality is implemented in software
  – fewer logic transistors, less power
  – uses effective optimization/scheduling algorithms
  – uses a larger window of instructions
  – …

Crusoe Hardware Support for Code Morphing
Crusoe hardware has been designed specifically with dynamic translation in mind.
• Crusoe's solution for exceptions
  – All registers holding x86 state are shadowed (two copies of each register: a working copy and a shadow copy)
  – Normal atoms update only the working copy of a register
    i) No exception encountered: a "commit" operation copies all working registers into the shadow registers
    ii) Exception occurs: a "rollback" operation copies the shadow register values back into the working registers

Cont.
• Store operations are handled by holding store data in a "gated store buffer"
  – Store data is released to the memory system only at the time of a commit
  – On a rollback, stores not yet committed are dropped from the store buffer
• Safely reordering loads ahead of stores (alias hardware)
  – The load becomes a "load-and-protect" (loads the data and records the address and size of the data loaded)
  – The store becomes a "store-under-alias-mask" (checks for protected regions)
  * If the store operation would overwrite previously loaded data, the hardware raises an exception and the runtime system can take corrective action.
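
The commit/rollback scheme with shadowed registers and a gated store buffer described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ShadowedMachine`, the helper `run_translation`, and the use of Python exceptions to stand in for x86 exceptions are all hypothetical names and simplifications chosen for illustration, not Transmeta's actual design.

```python
class ShadowedMachine:
    """Toy model (hypothetical) of shadowed registers plus a gated store buffer."""

    def __init__(self, registers):
        self.working = dict(registers)  # copy updated by normal atoms
        self.shadow = dict(registers)   # last committed x86 state
        self.memory = {}                # memory as seen after commits
        self.store_buffer = []          # pending (address, value) stores

    def write_reg(self, name, value):
        # Normal atoms touch only the working copy.
        self.working[name] = value

    def store(self, address, value):
        # Stores are held back until the next commit.
        self.store_buffer.append((address, value))

    def commit(self):
        # Copy working registers into the shadows and release buffered stores.
        self.shadow = dict(self.working)
        for address, value in self.store_buffer:
            self.memory[address] = value
        self.store_buffer.clear()

    def rollback(self):
        # Restore working registers from the shadows and drop pending stores.
        self.working = dict(self.shadow)
        self.store_buffer.clear()


def run_translation(machine, atoms):
    """Run a list of callables; commit if none raises, otherwise roll back."""
    try:
        for atom in atoms:
            atom(machine)
    except Exception:
        machine.rollback()
        return False
    machine.commit()
    return True


if __name__ == "__main__":
    m = ShadowedMachine({"eax": 1, "ebx": 2})
    ok = run_translation(m, [
        lambda s: s.write_reg("eax", 10),
        lambda s: s.store(0x100, 7),
    ])
    print(ok, m.shadow, m.memory)
```

In this model a translation that faults partway through leaves no trace: the working registers return to the last committed values and none of its stores reach `memory`, which is the property the slides attribute to the commit and rollback operations.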
Sample Translation Code
[Figure: x86 instructions and the translated VLIW molecules; the molecules use 2 integer ALU atoms each]

LongRun Power Management
• Crusoe was designed for good performance at very low power
• Power = $\frac{1}{2} C V^2 F$
• Reduce transistor count to decrease capacitance
• Scale voltage and frequency dynamically to give just enough performance for the current workload

LongRun Power Management: Dynamic Power Management
• Frequency changes in steps of 33 MHz
• Voltage changes in steps of 25 mV
• Supports up to 200 frequency/voltage changes per second
• Can give cubic reductions in power consumption
  – Reduce $V^2$ and $F$

LongRun Power Management: Conventional Power Profile
[Figure: conventional power profile]

LongRun Power Management: LongRun Power Profile
[Figure: LongRun power profile]

LongRun Power Management: ACPI Standard
• ACPI: Advanced Configuration and Power Interface
  – Joint standard of Microsoft, Intel, and Toshiba
• System-level technique to reduce power
• Allows three low-power states that can be alternated
  – AutoHALT: processor executes the HLT instruction
    • Processor stops its internal clock
  – QuickStart: Southbridge gives the processor the STPCLK signal
    • Processor maintains cache coherency
  – Deep Sleep: Southbridge disables the processor CLK input
    • Southbridge maintains cache coherency

LongRun Power Management: ACPI vs. LongRun
[Figure: ACPI vs. LongRun]

LongRun Power Management: Intel SpeedStep
• Statically lowers voltage/frequency settings at startup
• Two operating points:
  – AC power: full performance
  – DC power: slightly lower performance
• Low granularity misses opportunities for power savings

LongRun Power Management: How LongRun Compares
[Figure: how LongRun compares]

Performance
"The 700 MHz TM5400 was quoted as having comparable performance to a 500-550 MHz Pentium III. Transmeta didn't offer any conventional benchmarks. Rather, it compared the power utilized on a mobile Pentium III to the power utilized on a Crusoe when completing various tasks. It appears that Transmeta would like to dictate to the mobile industry that power is what it's all about, not speed. That is Transmeta's strong suit, but some normal benchmarks would have been nice. Why not show them? If Crusoe did well in those benchmarks, do you think Transmeta wouldn't show them? I'm convinced that the Crusoe is not performing as well as mobile AMD or Intel chips. For the markets it's aimed at, that's not too big a deal, but I'd like to know."
- From an article by Rob Hughes, Jan 20, 2000

Relative Performance While Mobile (on Batteries): TM5800 vs. Pentium III ULV
[Figure: relative performance, scale 0 to 1.0, 2001]

CPUmark99 v1.1 Comparison
[Figure: CPU + core logic power in watts, scale 0 to 8.0]

Business Graphics Winmark v1.1 Comparison
[Figure: CPU + core logic power in watts, scale 0 to 8.0]

Conclusion
• Combination of hardware and software
• Using software
  - To decompose complex instructions into simple atoms
  - To schedule and optimize the atoms for parallel execution
  - Saves millions of logic transistors
  - Cuts power consumption (60~70%)
  - Enables aggressive code optimization techniques
• LongRun power management
  - Cuts power consumption by a factor of 2 to 10
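
The power savings attributed to LongRun follow from the dynamic power formula given earlier, power = 1/2 x C x V^2 x F. The short calculation below works through it as a rough sketch. The operating points (1.6 V at 600 MHz and 1.2 V at 300 MHz) are hypothetical values chosen only for illustration, not published Crusoe specifications, and the function names are likewise made up for this example.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power, P = 1/2 * C * V^2 * F."""
    return 0.5 * capacitance * voltage ** 2 * frequency


def power_ratio(v_low, f_low, v_high, f_high):
    """Power at the low operating point relative to the high one.

    The switched capacitance C is the same at both points, so it cancels.
    """
    return dynamic_power(1.0, v_low, f_low) / dynamic_power(1.0, v_high, f_high)


# Hypothetical operating points (volts, hertz), for illustration only.
HIGH = (1.6, 600e6)
LOW = (1.2, 300e6)

# Halving frequency alone versus lowering voltage along with it.
freq_only = power_ratio(HIGH[0], LOW[1], *HIGH)
freq_and_voltage = power_ratio(*LOW, *HIGH)

print(f"frequency only:        {freq_only:.3f} of full power")
print(f"frequency and voltage: {freq_and_voltage:.3f} of full power")
```

With these assumed numbers, halving the frequency alone halves the power, while also dropping the voltage from 1.6 V to 1.2 V brings it down to roughly 28% of the original. If voltage could be scaled in exact proportion to frequency, the ratio would be cubic in the frequency ratio (0.5 cubed, or 12.5%), which is the sense in which the slides speak of cubic reductions.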