Tony Kolanovic
Microprocessors
Professor Dewar
May 20, 2002

Transmeta's Crusoe Microprocessor

Transmeta -- in the beginning

Transmeta was formed in 1995. From its inception, the privately held company kept a very low profile, or rather, a complete media silence. Transmeta began recruiting high-profile talent, including Linus Torvalds, creator of the open source Linux OS. In late 1999, details began leaking out as to just what Transmeta was developing: a very low-power processor that could run at high speeds and could run a variety of instruction sets. It would be called "Crusoe."

On January 19th, 2000, Transmeta let the world in on its secret. It had made a mobile chip called Crusoe that draws a mere 1 watt of power during normal usage. The processor family is aimed at Internet and mobile devices, and a goal of production was to make it "Internet compatible." To Transmeta, that meant that not only did Crusoe have to support x86 instructions, run a standard OS, and run a browser, but it also had to support plug-ins for ShockWave, Flash, and Java.

Crusoe is unlike conventional hardware-only processors. Implementing new technology, Transmeta was able to create a processor that is lighter than many other laptop processors, allowing computer manufacturers that use Crusoe to put the chip in laptops that are 1 inch thick or less. With the flexibility that Crusoe brings to the laptop industry, manufacturers can keep the weight of a laptop under 4 lbs. and increase screen size, or add a second or even third drive. Alternatively, they can dramatically reduce both weight and thickness, or choose any combination of the above.

Besides being lighter, a Crusoe-based laptop runs longer on its battery than one built around other processors. With a high-performance 128-bit VLIW (Very Long Instruction Word) hardware engine and a software-based architecture, Crusoe requires fewer transistors.
Since all transistors burn power, having fewer of them means Crusoe requires less power to do the same amount of work. Crusoe also uses power wisely. Instead of running every application at the processor's top speed, Crusoe's "smart processor" architecture and LongRun® Power Management technology look at the application, determine how much performance it actually demands, and make the necessary adjustments, all in a few microseconds. In addition, the longer an application runs, the more Crusoe improves its performance, because the processor is always learning. The result is that Crusoe delivers the performance the application needs, no more and no less, along with the maximum battery life.

Because Crusoe uses fewer transistors and manages power more efficiently, it can run at a much lower temperature than other processors. Crusoe has a fundamentally different architecture, designed from the ground up to run cooler. The thermal images (Figures 1 through 3) contrast the operating temperatures of a Crusoe processor and a conventional "mobile" processor, both running a software DVD player. The Crusoe processor, without any cooling, runs at 48°C (118°F), whereas the conventional processor, at 105°C (221°F), can heat to the point of failure if it is not aggressively cooled. Thermal image cameras captured this side-by-side demonstration of a Pentium III 500MHz, 1.6 volt part versus a Crusoe TM5600 600MHz, 1.6 volt part.

Figure 1. Pentium III / Crusoe processor bootup.
Figure 2. Processors heating up (elapsed time = 10 sec.).
Figure 3. Peak temperature (elapsed time = 25 sec.).

In the thermal demo images (see Figures 1 through 3), each processor is booted up, and very quickly the Pentium III on the left begins to heat up. With its cooling apparatus removed, the Pentium III reaches a temperature of 105°C, or 221°F. Meanwhile, Crusoe boots the operating system, runs perfectly, and only reaches a temperature of 48°C, or 118°F.
All of the aforementioned qualities make Crusoe a remarkable processor. When Transmeta's Crusoe team began work on the processor, they started from scratch, but instead of asking "how fast can we possibly make this," as AMD's and Intel's design teams were doing, they asked "how efficient can we possibly make this, and still have it run x86 apps acceptably." The Crusoe team's answer is an impressive blend of software and hardware technology that should make anyone with even a passing interest in CPU architecture sit up and take notice.

With that backdrop in place, the rest of this paper covers the technology behind Crusoe in detail: the Code Morphing software, the VLIW core, the LongRun power management features, and more. We'll look at how those technologies work and what they offer.

The Concept

Crusoe is pitched as a "hybrid software-hardware" CPU. Many of the functions that normal x86 CPUs perform in hardware, Crusoe performs with a complex software layer called Code Morphing Software. However, everything eventually gets done in hardware. The following is a block diagram of a generic 6th or 7th generation x86 CPU.

A modern x86 CPU like the K7 or PIII doesn't actually run x86 instructions as such. Rather, the CPU translates them into a more compact, uniform, RISC-like internal instruction format. To do all this translation, the K7 or PIII needs some hefty decoding hardware, as can be seen in the diagram. The front end of almost any modern CPU (x86, PPC, Alpha, etc.) has other special hardware that the CPU uses to aggressively optimize and reorder code on the fly, as it's executing. All this hardware takes up die space and increases power requirements.

Now, here's Crusoe as it was presented by Transmeta. The blue section is silicon, and the yellow is software.
Crusoe's blue part is smaller because all of that translation, branch prediction, and out-of-order execution hardware was moved off the die and into software. Those functions are now performed in real time by a special program as the application code is executing.

Moving those functions to software doesn't mean the CPU no longer has to do them. The VLIW core does all the branch predicting and register renaming that the K7 does; it just doesn't have special hardware to do it with. Crusoe's lack of dedicated hardware for these essential front-end functions means that the software that implements them has to share hardware resources and CPU cycles with x86 application software, operating system software, and everything else. Here's a picture that shows what's really going on:

The yellow sections are, as in the previous picture, the front-end CPU functions (branch prediction, register renaming, instruction scheduling, etc.). The red sections are OS and application software. All of these things are running on the CPU core at the same time. The Crusoe CPU is still renaming registers, reordering instructions, translating x86 instructions into an internal instruction format, and so on, just like any other CPU. It just isn't using any dedicated hardware to do so. Instead, these functions are performed by the same hardware that handles instruction execution: addition, subtraction, multiplication, etc.

Doing all of this work in software clearly slows things down somewhat. Nevertheless, Transmeta has gone to great lengths to ensure that the performance penalty incurred is as small as possible.

Atoms and molecules: the instruction format

Crusoe's internal instruction format is more or less straightforward VLIW. Individual operations that are destined for the execution units are called "atoms." Atoms are roughly equivalent to RISC operations (or the K7's "ops" and the PIII's micro-ops, or "uops"). These atoms are packaged together into either 128- or 64-bit chunks called "molecules."
Molecules correspond to EPIC's "bundles" or MAJC's "packets." A 128-bit molecule contains up to four atoms. A diagram from Transmeta's technical documentation shows how this works.

A VLIW program is just a list of these molecules, which are fed into the CPU and executed in order by the execution units. Crusoe's hardware doesn't have to think about reordering them, predicting where the next instructions will come from, or anything of the sort; it just focuses on firing those instruction molecules through the execution engine as fast as possible.

Code Morphing

In a traditional superscalar design, a software writer writes a sequential program in a high-level language and then compiles it to machine code. This machine code is sequential too, and the CPU's instruction scheduling and dispatch hardware has to rearrange it so that it can run in parallel. The scheduler also aggressively examines the code for dependencies, and then reorders it before actually executing it. The end result is that the sequential, ordered code that was fed into the CPU actually gets executed in parallel and out of order. Doing all of this with the code involves a lot of work on the CPU's part, and it often isn't cheap in terms of transistors or clock cycles.

A "traditional" VLIW machine, however, does all of that reordering and parallelism hunting in software. For a more straight-ahead VLIW design like Intel's IA-64, the piece of software that does all this is the compiler. The compiler extracts the parallelism from the code, looks for dependencies, and so on, and produces optimized code that the VLIW core can run as fast as possible, in order.

Since Crusoe is a VLIW machine that's made to run code compiled for a superscalar machine, its compilation and scheduling scheme is a hybrid of both approaches. Crusoe's Code Morphing software takes a compiled x86 program and recompiles it, on the fly, to Crusoe's native VLIW instruction format.
This recompilation uses sophisticated compiler algorithms to extract parallelism from the code, look for dependencies, and do everything else that a state-of-the-art VLIW compiler does. In the two Crusoe diagrams described earlier, the yellow parts belong to the Code Morphing layer. The Code Morphing layer, which is written in Crusoe's native VLIW instruction set, sits between the CPU and the OS and BIOS, as can be seen in the following figure.

The Code Morphing software resides in flash ROM and is the first program to launch when the Crusoe processor is powered up. Once it has initialized, other system software components such as the BIOS and operating system are loaded in the traditional fashion. The Code Morphing software consists of two main modules that work together to implement the functions of an x86 processor.

The Interpreter

The Code Morphing software contains an Interpreter module that interprets x86 instructions one at a time, much like a traditional microprocessor. The Interpreter also keeps infrequently executed code from being needlessly optimized, and it gathers run-time statistics about the x86 instructions it sees in order to determine whether optimization is worthwhile.

The Translator

Upon detecting critical, frequently used x86 instruction sequences, the Code Morphing software invokes a Translator module that recompiles the x86 instructions into optimized VLIW instructions, called "translations." The native translations reduce the number of instructions executed and result in better performance. Further efficiencies are possible by saving the translations in memory that is inaccessible to normal x86 code. This special memory area is named the "Translation Cache," and it allows the Code Morphing software to re-use translations and eliminate redundant work.
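The division of labor between the Interpreter, the Translator, and the Translation Cache can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the two-instruction "x86" subset, the `HOT_THRESHOLD` value, and all class and function names are hypothetical and are not taken from Transmeta's software, and the "translation" is simply a Python closure standing in for optimized VLIW code.

```python
HOT_THRESHOLD = 3  # hypothetical: runs before a block counts as "hot"


def apply_op(op, reg, value, state):
    """Execute one toy instruction against a register dictionary."""
    if op == "mov":
        state[reg] = value
    elif op == "add":
        state[reg] = state.get(reg, 0) + value
    else:
        raise ValueError(f"unknown op: {op}")


class CodeMorpher:
    """Toy model: interpret cold blocks, translate and cache hot ones."""

    def __init__(self, hot_threshold=HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.exec_counts = {}        # block address -> times seen
        self.translation_cache = {}  # block address -> translated callable
        self.stats = {"interpreted": 0, "translated": 0, "cache_hits": 0}

    def interpret(self, block, state):
        # Interpreter: one instruction at a time, no optimization.
        for op, reg, value in block:
            apply_op(op, reg, value, state)
        self.stats["interpreted"] += 1

    def translate(self, block):
        # Translator stand-in: capture the whole block as one callable.
        ops = tuple(block)

        def translation(state):
            for op, reg, value in ops:
                apply_op(op, reg, value, state)

        self.stats["translated"] += 1
        return translation

    def run_block(self, addr, block, state):
        cached = self.translation_cache.get(addr)
        if cached is not None:
            # Previously translated: skip translation, run from the cache.
            self.stats["cache_hits"] += 1
            cached(state)
            return
        count = self.exec_counts.get(addr, 0) + 1
        self.exec_counts[addr] = count
        if count >= self.hot_threshold:
            # Block is hot: translate once, cache, then run the translation.
            translation = self.translate(block)
            self.translation_cache[addr] = translation
            translation(state)
        else:
            self.interpret(block, state)


morpher = CodeMorpher()
state = {"eax": 0}
loop_body = [("add", "eax", 1)]
for _ in range(5):
    morpher.run_block(0x1000, loop_body, state)
print(state["eax"], morpher.stats)
```

In this run the block is interpreted twice, translated on its third execution, and served from the cache afterwards, which mirrors how the one-time translation cost is spread over repeated executions.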
Upon encountering previously translated x86 instruction sequences, the Code Morphing software skips the translation process and executes the cached translation directly out of the Translation Cache. Caching and re-using translations exploits the high degree of repetition typically found in real-world workloads. The Code Morphing software matches repeated executions with entries in the Translation Cache, and the optimized translation is executed at full speed with minimal overhead. The initial cost of the translation is amortized over repeated executions.

Advantages of the Code Morphing software

The Code Morphing software provides the Crusoe processor with unprecedented flexibility by implementing the complexities of a traditional microprocessor in software. This results in the following advantages over conventional x86 processors:

Traditional x86 Processors | Crusoe Processor with Code Morphing software
Translates single instructions one at a time | Translates an entire group of x86 instructions at once
Translates each x86 instruction every time it is encountered | Translates instructions once, saving the resultant translation in a cache for re-use
Full of complex, power-hungry transistors | Much of the processor functionality is implemented in software: fewer logic transistors, less power

With conventional microprocessor designs approaching 40 million transistors, managing heat and power consumption is now one of the industry's biggest challenges. Switching a transistor on or off requires a bit of energy every time. The Crusoe processor was designed specifically to avoid these energy pitfalls by using the Code Morphing software to replace logic transistors, and it therefore generates less heat. Implementing processor logic in software has the added benefit that upgrades to that logic can be rolled out independently of the hardware.
With a talented team of engineers dedicated to developing and enhancing the Code Morphing technology, Transmeta can provide quick, low-cost improvements to performance and power consumption by simply releasing new Code Morphing software versions. In contrast, conventional x86 processors typically require new spins of hardware or exotic fabrication processes to deliver similar gains. The flexibility provided by the Code Morphing technology allows Transmeta to spearhead industry-wide initiatives for low-power, high-performance mobile computing.

Keeping it up to speed

One of the biggest challenges that Transmeta faced with the Code Morphing software is that of maintaining an acceptable level of performance. Since they were designing the software and the hardware to fit together seamlessly, however, they could add special hardware functionality to enhance the speed of the Code Morphing software.

The way that the x86 architecture does exception handling poses a problem for a CPU that tries to execute x86 code out of order. An exception occurs whenever an instruction tries to do something and runs into a problem (a load causing a page fault, for instance). When this happens on an x86 machine, the exception can't be taken care of ("handled") until all of the instructions before the one that caused the exception have completed and all the subsequent instructions have been put on hold. If you're executing instructions out of order, this poses a problem.

What Crusoe does is keep two copies of the x86 register state, a "working copy" and a "shadow copy." Both copies are first made when Crusoe loads a block of translation code. As the translation executes, the code updates the working copy only. If the whole block of the translation executes with no exceptions, then Crusoe executes a special "commit" instruction that overwrites the shadow copy with the working copy.
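The working-copy and shadow-copy scheme, including the case where a translated block faults partway through, can be sketched in Python as follows. This is a minimal illustration under assumptions: the `RegisterFile` class, the `run_block` helper, and the use of `ZeroDivisionError` as a stand-in for an x86 exception are all hypothetical, and real hardware also has to deal with pending memory stores, which the sketch ignores.

```python
class RegisterFile:
    """Two copies of the x86 register state, as described in the text."""

    def __init__(self, initial):
        self.shadow = dict(initial)   # last committed, known-good state
        self.working = dict(initial)  # state updated by translated code

    def commit(self):
        # Block finished cleanly: the working state becomes official.
        self.shadow = dict(self.working)

    def rollback(self):
        # Block faulted: throw away speculative updates.
        self.working = dict(self.shadow)


def run_block(regs, translation):
    """Run one translated block; commit on success, roll back on a fault.

    Returns True if the block committed. On False, the caller would
    re-run the original instructions one at a time, in order, to find
    the exact instruction that raised the exception.
    """
    try:
        translation(regs.working)
    except ZeroDivisionError:  # stand-in for an x86 exception
        regs.rollback()
        return False
    regs.commit()
    return True


def good_block(r):
    r["eax"] += 1
    r["ebx"] = r["eax"] * 2


def faulting_block(r):
    r["eax"] = 99              # speculative update, must not survive
    r["ebx"] = r["eax"] // 0   # raises before the block completes


regs = RegisterFile({"eax": 1, "ebx": 0})
print(run_block(regs, good_block), regs.shadow)
print(run_block(regs, faulting_block), regs.working)
```

After the faulting block, the working copy again matches the last committed state, so the partial update to `eax` never becomes visible.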
If an exception happens during the translation, however, Crusoe has to discard all its work and go back and run the instructions in order to figure out exactly which instruction threw the exception. This is when the shadow copy comes in handy: it acts as a backup copy of the register state in case an exception happens.

Crusoe also has special hardware that helps it with speculative loading. Before hoisting a LOAD above a STORE (which could potentially overwrite the LOADed data), Crusoe converts the LOAD instruction into a special "load-and-protect" instruction. It then converts the STORE into a "store-under-alias-mask" instruction that checks that it is not overwriting a protected LOAD. If the unlikely happens and the STORE does try to overwrite the LOADed data, an exception is thrown and the error is corrected. The benefit of all of this is that Crusoe can reorder LOADs however it needs to, because it has special bookkeeping facilities that let it know if it made a mistake and how to fix it.

LongRun® Power Management Technology

In addition to the Crusoe processor's inherently energy-efficient design, the Code Morphing software allows further reductions in power consumption by utilizing capabilities available only in the Crusoe hardware. One of these features is implemented in Transmeta's LongRun Power Management technology. LongRun gives the Code Morphing software the ability to adjust Crusoe's voltage and clock frequency on the fly, depending on the demands placed on the processor by software. Because power varies linearly with clock speed and with the square of voltage, lowering both together can produce roughly cubic reductions in power consumption, whereas conventional CPUs that adjust only the frequency can reduce power only linearly.
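The linear-versus-cubic comparison follows directly from the proportionality stated above, with power scaling as frequency times voltage squared. The short Python sketch below works through one case; the function name and the 50% scaling figures are illustrative assumptions, and scaling voltage in exact proportion to frequency is an idealization rather than a description of Crusoe's actual operating points.

```python
def relative_power(freq_scale, volt_scale):
    """Dynamic power relative to full speed, using P proportional to f * V**2."""
    return freq_scale * volt_scale ** 2


# Halve the clock only (frequency-only throttling): linear saving.
freq_only = relative_power(0.5, 1.0)

# Halve the clock and the voltage together (idealized): cubic saving.
freq_and_volt = relative_power(0.5, 0.5)

print(freq_only)      # 0.5   -> half the power
print(freq_and_volt)  # 0.125 -> one eighth of the power
```

Under these assumptions, halving frequency alone saves half the power, while halving both frequency and voltage cuts power to one eighth, which is the cube of one half.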
The LongRun policies are implemented within the Code Morphing software, which continuously scales both the frequency and voltage of the Crusoe processor according to the instantaneous demands of the computer system. It can detect different scenarios based on run-time performance information and then exploit these by adapting its power usage accordingly. All LongRun adjustments are seamless and transparent to the user.

Figure 1. DVD idle. PIII Average=2.309 Watts; Crusoe Average=0.326 Watts.
Figure 2. DVD bootup (~5 seconds). PIII Average=5.242 Watts; Crusoe Average=1.375 Watts.
Figure 3. LongRun launched. DVD running at normal speed. PIII Average=6.199 Watts; Crusoe Average=1.257 Watts.

LongRun is designed to provide just enough performance for the processor workload at hand. This allows it to deliver performance when necessary and conserve power when processor demand is low, thereby avoiding wasted performance and wasted energy.

How does LongRun work?

LongRun operates by configuring the Crusoe processor to run at a number of different frequency and voltage points. The LongRun algorithm in the Code Morphing software monitors the Crusoe processor and dynamically switches between these points as run-time conditions change. Idle time is monitored, and LongRun finds a frequency/voltage point that minimizes the idle time for the current workload. Although the heuristics employed are complex, the LongRun policies can be abstracted to the following points:

- If no idle time is detected during a workload, the frequency/voltage point is incremented (if possible).
- If idle time is detected, LongRun may decide that performance is being wasted and decrement the frequency/voltage level.
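The two abstracted policy points above can be written as a small Python sketch. Everything concrete in it is an assumption made for illustration: the list of operating points, the `idle_threshold` used to model "may decide that performance is being wasted," and the function names are hypothetical, and the real LongRun heuristics are described in the text as considerably more complex.

```python
# Hypothetical (MHz, volts) operating points, lowest to highest.
OPERATING_POINTS = [(300, 1.20), (400, 1.35), (500, 1.50), (600, 1.60)]


def next_level(level, idle_fraction, idle_threshold=0.1):
    """Pick the next operating-point index from observed idle time.

    idle_fraction is the share of the last interval spent idle (0.0 to 1.0).
    idle_threshold is an assumed cutoff above which performance is
    considered wasted.
    """
    top = len(OPERATING_POINTS) - 1
    if idle_fraction == 0.0:
        # No idle time: the workload wants more, step up if possible.
        return min(level + 1, top)
    if idle_fraction > idle_threshold:
        # Noticeable idle time: performance is being wasted, step down.
        return max(level - 1, 0)
    return level  # a little idle time: stay where we are


def simulate(idle_samples, start_level=0):
    """Return the operating point chosen after each idle-time sample."""
    level = start_level
    trace = []
    for idle in idle_samples:
        level = next_level(level, idle)
        trace.append(OPERATING_POINTS[level])
    return trace


for point in simulate([0.0, 0.0, 0.05, 0.4]):
    print(point)
```

With these sample inputs the sketch steps up twice while the processor is fully busy, holds its level when idle time is small, and steps back down once idle time becomes significant.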
LongRun also works in conjunction with the industry-standard ACPI (Advanced Configuration and Power Interface) specification. When the frequency and voltage scaling hits microarchitecture boundaries, Crusoe transparently switches over to traditional power models, allowing policies such as ACPI to handle power management.

How is LongRun different?

The following table illustrates some of the differences between LongRun and other conventional power management techniques.

Power Management Technology | Attributes
Legacy Power Management technology | Employs primitive techniques such as Clock Throttling to deliver only linear reductions in power. Clock Throttling chokes processor performance by alternating the processor between running at full speed and being effectively turned off.
Intel's SpeedStep technology | Power-source-based approach consisting of just two operating points. Provides coarser control and misses opportunities for further power savings.
Transmeta's LongRun technology | Adjusts both processor frequency and voltage in multiple steps based on processor activity. Allows a roughly cubic power reduction relative to the drop in frequency.

Thermal Management

Managing how a device disposes of heat is an integral part of microprocessor design. Operating temperature rises as heat collects in a device, potentially causing damage and affecting performance. Designers are careful to incorporate thermal solutions that allow their products to operate within a safe temperature range.

Conventional processors typically use Thermal Throttling for CPU thermal management. Thermal throttling regulates the thermal environment by alternating between running the processor at full speed and placing the processor in a sleep state whenever the upper limits of the thermal envelope are reached.
Performance is delivered in discrete bursts that tend to be unfavorable for applications processing smooth multimedia content, such as software DVD and MP3 playback.

By integrating a thermal model into the software algorithm, LongRun manages the Crusoe processor's thermal environment by using frequency/voltage shifts as a substitute for thermal throttling. In contrast to conventional thermal management techniques, the LongRun thermal extensions deliver higher performance at the same die temperature or the same performance at a lower die temperature, essentially expanding the thermal budget of the CPU. The LongRun thermal extensions allow the possible elimination of active cooling solutions, which reduces system weight and time-to-market, as there is no need for explicit CPU thermal management. With LongRun, Transmeta delivers superior performance for a given thermal envelope.

The Future

In 1995, Transmeta set out to expand the reach of microprocessors into new markets by dramatically changing the way microprocessors are designed. The initial market is mobile computing, in which complex, power-hungry processors have forced users to give up either battery running time or performance. The Crusoe processor solutions have been designed for lightweight (two to four pound) mobile computers and Internet access devices such as handhelds and web pads. They can give these devices PC capabilities and unplugged running times of up to a day.

To design the Crusoe processor chips, the Transmeta engineers did not resort to exotic fabrication processes. Instead they rethought the fundamentals of microprocessor design. Rather than "throwing hardware" at design problems, they chose an innovative approach that employs a unique combination of hardware and software.
Using software to decompose complex instructions into simple atoms and to schedule and optimize the atoms for parallel execution saves millions of logic transistors and cuts power consumption on the order of 60–70% over conventional approaches, while at the same time enabling aggressive code optimization techniques that are simply not feasible in traditional x86 implementations. Transmeta's Code Morphing software and fast VLIW hardware, working together, achieve low power consumption without sacrificing high performance for real-world applications.

Although the Crusoe model TM3120 and model TM5400 are impressive first efforts, the significance of the Transmeta approach to microprocessor design is likely to become more apparent over the next several years. The technology is young and offers more freedom to innovate (in both hardware and software) than conventional hardware-only designs. Nor is the approach limited to low-power designs or to x86-compatible processors. Freed to render their ideas in a combination of hardware and software, and to evolve hardware without breaking legacy code, Transmeta microprocessor designers may produce one surprise after another in the new millennium.

Bibliography

Gerritsen, Armen. The Transmeta Crusoe. [Online] Available http://cpusite.examedia.nl/docs/crusoe.html February 8, 2000.
ITWorld.com. Transmeta Crusoe. [Online] Available http://www.itworld.com/Comp/2062/ May 2000.
Simon, Jon. Transmeta Crusoe Preview At Platform 2000. [Online] Available http://www.sharkyextreme.com/hardware/articles/transmeta_crusoe/ January 31, 2000.
Transmeta. [Online] Available http://www.transmeta.com May 2002.