Tony Kolanovic
Student Id: 053-74-7657
Microprocessors
Professor Dewar
May 20, 2002
Transmeta's Crusoe Microprocessor
Transmeta -- in the beginning
In 1995, Transmeta was formed. From its inception, privately held Transmeta
kept a very low profile, or rather, a complete media silence. Transmeta began recruiting high-profile talent,
including the famous Linus Torvalds, creator of the open source Linux OS. In late 1999, details began
leaking out as to just what Transmeta was developing. It was a very low-power processor that could be run
at high speeds and could run a variety of instruction sets. It would be called "Crusoe."
On January 19th, 2000, Transmeta let the world in on its secret. Transmeta made a mobile chip called the
Crusoe, and it draws a mere 1 watt of power during normal usage. The processor family is aimed at
Internet / mobile devices, and a goal of production was to make it "Internet compatible." To Transmeta, that
meant that not only did Crusoe have to support x86 instructions, run a standard OS, and run a browser, but
it also had to be able to support plug-ins for Shockwave, Flash, or Java.
Crusoe is unlike conventional hardware-only processors. Implementing new technology, Transmeta was
able to create a processor that is lighter than many other laptop processors, allowing computer
manufacturers that use Crusoe to put the chip in laptops that are only 1 inch thick or less. With this
flexibility that Crusoe brings to the laptop industry, manufacturers can keep the weight of a laptop under 4
lbs., and increase screen size, or add a second or even third drive. Or, they can dramatically reduce both
weight and thickness. Or any combination of the above.
Besides being lighter, a Crusoe-based laptop runs longer on a battery charge. With a high-performance 128-bit
VLIW (Very Long Instruction Word) hardware engine and a software-based architecture, Crusoe requires
fewer transistors. Since all transistors burn power, having fewer of them means Crusoe requires less power
to do the same amount of work. Secondly, Crusoe uses power wisely. Instead of running every application
at the processor’s top speed, Crusoe’s “smart processor” architecture and LongRun® Power
Management technology look at the application, determine exactly how much performance it
demands, and make the necessary adjustments, all in a few microseconds. In addition, the longer you
run an application, the more Crusoe improves the application’s performance, because it is always
learning. The result is that, with Crusoe, you get the performance the application needs, no more
and no less, and the maximum battery life.
Due to the fact that the Crusoe uses fewer transistors and manages power more efficiently, it can run at a
much cooler temperature than other processors. Crusoe has a fundamentally different architecture, designed
from the ground up to run cooler.
The thermal images above contrast the operating temperatures of a Crusoe processor and a conventional
“mobile” processor, running a software DVD player. The Crusoe processor, without any cooling, runs at
48°C (118°F), whereas the conventional processor, at 105°C (221°F), can heat to the point of failure if it
is not aggressively cooled.
Thermal image cameras captured this side-by-side demonstration of a Pentium III 500MHz, 1.6 volt part
versus a Crusoe TM5600 600MHz, 1.6 volt part.
Figure 1. Pentium III / Crusoe processor bootup.
Figure 2. Processors heating up (elapsed time = 10 sec.).
Figure 3. Peak temperature (elapsed time = 25 sec.).
In the thermal demo images above (see Figures 1 through 3), each processor is booted up, and very quickly
the Pentium III on the left begins to heat up. With its cooling apparatus removed, the Pentium III reaches
temperatures of 105ºC, or 221ºF. Meanwhile, Crusoe boots the operating system, runs perfectly and only
reaches a temperature of 48ºC, or 118ºF.
All of the aforementioned qualities of the Crusoe make it a remarkable processor. When Transmeta's
Crusoe team began work on the processor, they started from scratch, but instead of asking "how fast can we
possibly make this," as AMD's and Intel's design teams were doing, they asked "how efficient can we
possibly make this, and still have it run x86 apps acceptably."
The Crusoe team's answer to the questions they asked is an impressive blend of software and hardware
technology that should make anyone with even a passing interest in CPU architecture sit up and take notice.
With that backdrop in place, we now turn to the actual technology behind the Crusoe
in detail: the Code Morphing software, the VLIW core, the LongRun power management features, and
more. We'll look at how those technologies work and what they offer.
The Concept
Crusoe is pitched as a "hybrid software-hardware" CPU. Many of the functions that normal x86 CPUs do in
hardware, Crusoe does with a complex software layer called Code Morphing Software.
However, everything eventually gets done in hardware. The following is a block diagram of a generic, 6th
or 7th generation x86 CPU.
A modern x86 CPU like the K7 or PIII doesn't actually run x86 instructions as such. Rather, the CPU
translates them into some more compact, uniform, RISC-like internal instruction format. To do all this
translation, the K7 or PIII needs some hefty decoding hardware, as can be seen in the diagram.
The front end of almost any modern CPU, x86, PPC, Alpha, etc., has other special hardware that the CPU
uses to aggressively optimize and reorder code on-the-fly, as it's executing. All this fancy hardware takes
up die space and increases power requirements.
Now, here's Crusoe as it was presented by Transmeta.
The blue section is silicon, and the yellow is software. Crusoe's blue part is smaller, because all that fancy
translation, branch prediction, and out-of-order execution hardware was moved off the die and into
software. All of those functions are now done in real-time by a special program as the application code is
executing.
Moving those functions to software doesn't mean the CPU doesn't still have to do them. That VLIW core
does all the branch predicting and register renaming that the K7 does--it just doesn't have special hardware
to do it with. Crusoe's lack of dedicated hardware for these essential front-end functions means that the
software that implements them has to share hardware resources and CPU cycles with x86 application
software, operating system software, and everything else. Here's a picture that shows what's really going
on:
The yellow sections are, as in the above picture, the front-end CPU functions (branch prediction, register
renaming, instruction scheduling, etc.). The red sections are OS and application software. All of these
things are running on the CPU core at the same time. The Crusoe CPU is still renaming registers,
reordering instructions, translating x86 instructions into an internal instruction format, etc., just like any
other CPU. It just isn't using any dedicated hardware to do it with. Instead, these functions are performed
by the same hardware that handles instruction execution, addition, subtraction, multiplication, etc.
It is clear that doing all of this stuff in software slows things down a bit. Nevertheless, Transmeta has gone
to great lengths to ensure that the performance penalty incurred is as small as possible.
Atoms and molecules: the instruction format
Crusoe's internal instruction format is more or less straightforward VLIW. Individual operations that are
destined for the execution units are called "atoms." Atoms are roughly equivalent to RISC operations (or
K7's "ops" and PIII's "µops"). These atoms are packaged together into either 128- or 64-bit chunks called
"molecules." Molecules correspond to EPIC's "bundles" or MAJC's "packets." A 128-bit molecule contains
up to four atoms. Here's a diagram from Transmeta's technical documentation to show how it works.
A VLIW program is just a list of these molecules, which are fed into the CPU and executed in order by
the execution units. Crusoe's hardware doesn't have to think about reordering them, predicting where the
next instructions will come from, or anything of the sort; it just focuses on firing those instruction
molecules through the execution engine as fast as possible.
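As a rough illustration of the format described above, the following Python sketch groups a list of atoms into molecules in program order. The names `Molecule` and `pack`, and the 32-bit atom width, are assumptions made for this example only; the real Crusoe encoding is not given here, and a real translator would also have to respect data dependencies and functional-unit constraints when filling a molecule.

```python
from dataclasses import dataclass
from typing import List

ATOM_BITS = 32   # assumption: 128 bits divided evenly among four atoms
MAX_ATOMS = 4    # from the text: a 128-bit molecule holds up to four atoms


@dataclass
class Molecule:
    atoms: List[str]

    @property
    def bits(self) -> int:
        # Assumption: one or two atoms fit the 64-bit form, more need 128 bits.
        return 64 if len(self.atoms) * ATOM_BITS <= 64 else 128


def pack(atoms: List[str]) -> List[Molecule]:
    """Naively group atoms, in program order, into molecules of at most MAX_ATOMS."""
    return [Molecule(atoms[i:i + MAX_ATOMS]) for i in range(0, len(atoms), MAX_ATOMS)]


program = pack(["add", "ld", "br", "sub", "st", "or"])
for molecule in program:
    print(molecule.bits, molecule.atoms)
```

The hardware then only has to issue each molecule in order; all decisions about what goes into a molecule were made earlier, in software.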
Code Morphing
In a traditional superscalar design, a software writer writes a sequential program in a high level language
and then compiles it to machine code. This machine code is sequential too, and the CPU's instruction
scheduling and dispatch hardware has to rearrange it so that it can run in parallel. The scheduler also
aggressively examines the code for dependencies, and then reorders it before actually executing. The end
result is that the sequential, ordered code that was fed into the CPU actually gets executed in parallel and
out-of-order. Doing all of this trickery with the code involves a lot of work on the CPU's part, and often it
isn't cheap in terms of transistors or clock cycles.
A "traditional" VLIW machine, however, does all of that reordering and parallelism hunting in software.
For a more straight-ahead VLIW design like Intel's IA-64, the piece of software that does all this is the
compiler. The compiler extracts the parallelism from the code, looks for dependencies, etc., and produces
optimized code that the VLIW core can run as fast as possible, in-order.
Since Crusoe is a VLIW machine that's made to run code compiled for a superscalar machine, its
compilation and scheduling scheme is sort of a hybrid of both approaches. Crusoe's Code Morphing
software actually takes a compiled x86 program and recompiles it, on-the-fly, to Crusoe's native VLIW
instruction format. This recompilation uses sophisticated compiler algorithms to extract parallelism from
the code, look for dependencies and do all those things that a state-of-the-art VLIW compiler does.
If you recall the two Crusoe diagrams shown earlier, the yellow parts belong to the Code Morphing layer.
The Code Morphing layer, which is written in Crusoe's native VLIW instruction set, sits between the CPU
and the OS and BIOS, as can be seen in the following figure.
The Code Morphing software resides in flash ROM and is the first application to launch when the Crusoe
processor is powered up. Upon completion of its initialization, other system software components such as
the BIOS and operating system are loaded in traditional fashion.
The Code Morphing software consists of two main modules that work in conjunction to implement the
functions of an x86 processor.
The Interpreter
The Code Morphing software contains an Interpreter module that interprets x86 instructions one at a time,
much like a traditional microprocessor. The Interpreter functionality also filters infrequently executed code
from being needlessly optimized and gathers run time statistical information about the x86 instructions it
sees for determining whether optimizations are necessary.
The Translator
Upon detecting critical, frequently used x86 instruction sequences, the Code Morphing software invokes a
Translator module that recompiles the x86 instructions into optimized VLIW instructions, called
“Translations.” The native translations reduce the number of instructions executed and result in better
performance.
Further efficiencies are possible by saving the translations in memory that is inaccessible to normal x86
code. This special memory area is named the “Translation Cache” and allows the Code Morphing software
to re-use translations and eliminate redundancies. Upon encountering previously translated x86 instruction
sequences, the Code Morphing software skips the translation process and executes the cached translation
directly out of the Translation Cache.
Caching and re-using translations exploits the high degree of repetition typically found in real world
workloads. The Code Morphing software matches repeated executions with entries in the Translation
Cache and the optimized translation is executed at full speed with minimal overhead. The initial cost of the
translation is amortized over repeated executions.
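The interplay between the Interpreter, the Translator, and the Translation Cache can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name, the `HOT_THRESHOLD` value, and the string-based "translation" are all invented for illustration, since the actual heuristics are not described here.

```python
from typing import Dict, List

HOT_THRESHOLD = 3  # hypothetical: the text gives no actual threshold


class CodeMorphingSketch:
    def __init__(self) -> None:
        self.exec_counts: Dict[int, int] = {}              # run-time statistics per block
        self.translation_cache: Dict[int, List[str]] = {}  # block address -> translation

    def translate(self, x86_block: List[str]) -> List[str]:
        # Stand-in for the Translator; a real one would emit optimized VLIW molecules.
        return [f"vliw({insn})" for insn in x86_block]

    def run(self, addr: int, x86_block: List[str]) -> str:
        """Execute one block and report which path was taken."""
        if addr in self.translation_cache:
            return "cached"                      # reuse: no translation cost
        self.exec_counts[addr] = self.exec_counts.get(addr, 0) + 1
        if self.exec_counts[addr] >= HOT_THRESHOLD:
            self.translation_cache[addr] = self.translate(x86_block)
            return "translated"                  # pay the translation cost once
        return "interpreted"                     # cold code: one instruction at a time


cms = CodeMorphingSketch()
loop_body = ["mov eax, [ebx]", "add eax, 1", "mov [ebx], eax"]
paths = [cms.run(0x1000, loop_body) for _ in range(5)]
print(paths)
```

In this model a block is interpreted until it proves itself "hot," is translated once, and from then on runs out of the cache, which is the amortization argument made above.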
Advantages of the Code Morphing software
The Code Morphing software provides the Crusoe processor with unprecedented flexibility by
implementing the complexities of a traditional microprocessor in software. This results in the following
advantages over conventional x86 processors:
Traditional x86 Processors | Crusoe Processor with Code Morphing software
Translates single instructions one at a time | Translates an entire group of x86 instructions at once
Translates each x86 instruction every time it is encountered | Translates instructions once, saving the resultant translation in a cache for re-use
Full of complex, power-hungry transistors | Much of the processor functionality is implemented in software: fewer logic transistors, less power
With conventional microprocessor designs approaching 40 million transistors, managing heat and power
consumption is now one of the industry’s biggest challenges. Switching every transistor on or off requires a
bit of energy.
The Crusoe processor was designed specifically to avoid these energy pitfalls by using the Code Morphing
software to replace logic transistors, therefore generating less heat. This approach has the added benefit
that upgrades to the microprocessor logic can be rolled out in software, independently of the hardware.
With a talented team of engineers dedicated to developing and enhancing the Code Morphing technology,
Transmeta can provide quick, low cost improvements to performance and power consumption by simply
releasing new Code Morphing software versions. In contrast, conventional x86 processors typically require
new spins of hardware or exotic fabrication processes to deliver similar gains.
The flexibility provided by the Code Morphing technology allows Transmeta to spearhead industry wide
initiatives for low power/high performance mobile computing.
Keeping it up to speed
One of the biggest challenges that Transmeta faced with the Code Morphing software is that of maintaining
an acceptable level of performance. Since they were designing the software and the hardware to fit together
seamlessly, however, they could add in special hardware functionality to enhance the speed of the Code
Morphing Software.
The way that the x86 architecture does exception handling poses a problem to a CPU that tries to execute
x86 code out-of-order. An exception occurs whenever an instruction tries to do something and runs into a
problem (a load causing a page fault, for instance). When this happens on an x86
machine, the exception can't be taken care of ("handled") until all of the instructions before the one that
caused the exception have completed and all the subsequent instructions have been put on hold. If you're
executing instructions out of order, this poses a problem.
What Crusoe does is keep two copies of the x86 register state, a "working copy" and a "shadow copy."
Both copies are first made when Crusoe loads a block of translation code. As the translation executes, the
code updates the working copy only. If the whole block of the translation executes with no exceptions, then
Crusoe executes a special "commit" instruction that overwrites the shadow copy with the working copy. If an
exception happens during the translation, however, Crusoe has to trash all its work: it restores the
working copy from the shadow copy and then goes back and runs the original instructions in order to figure
out exactly which instruction threw the exception. The shadow copy thus acts as a backup of the register
state, in case an exception happens.
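The working/shadow scheme just described is essentially commit-or-rollback. The Python sketch below models it in software only; the class and function names are hypothetical, and a Python `ArithmeticError` stands in for an x86 exception.

```python
from typing import Callable, Dict, List

Registers = Dict[str, int]


class ShadowedRegisters:
    def __init__(self, initial: Registers) -> None:
        self.working = dict(initial)   # updated as the translation executes
        self.shadow = dict(initial)    # last known-good x86 register state

    def commit(self) -> None:
        self.shadow = dict(self.working)

    def rollback(self) -> None:
        self.working = dict(self.shadow)


def run_translation(regs: ShadowedRegisters,
                    ops: List[Callable[[Registers], None]]) -> bool:
    """Run a block of operations; commit on success, roll back on an exception."""
    try:
        for op in ops:
            op(regs.working)
    except ArithmeticError:            # stand-in for an x86 exception
        regs.rollback()                # discard all speculative work
        return False                   # caller would now re-run the block in order
    regs.commit()
    return True


def add_one(r: Registers) -> None:
    r["eax"] += 1


def divide_by_ecx(r: Registers) -> None:
    r["eax"] //= r["ecx"]


faulting = ShadowedRegisters({"eax": 10, "ecx": 0})
ok_faulting = run_translation(faulting, [add_one, divide_by_ecx])

clean = ShadowedRegisters({"eax": 10, "ecx": 2})
ok_clean = run_translation(clean, [add_one, divide_by_ecx])
print(ok_faulting, faulting.working, ok_clean, clean.shadow)
```

In the faulting case the partial update to `eax` is thrown away, so the register state looks as if the block had never started.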
Crusoe also has special hardware that helps it with speculative loading. Before hoisting a LOAD above a
STORE (which could potentially overwrite the LOADed data), Crusoe converts the LOAD instruction into
a special "load-and-protect" instruction. It then converts the STORE into a "store-under-alias-mask"
instruction that looks to make sure it's not overwriting a protected LOAD. If the unlikely happens and the
STORE does try to overwrite the LOADed data, an exception is thrown and the error is corrected. The
benefit of all of this is that Crusoe can reorder LOADs however it needs to, because it has special
bookkeeping facilities that'll let it know if it made a mistake and how to fix it.
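The bookkeeping behind speculative loads can be modeled roughly as follows. This is a software analogy, not the hardware mechanism: the class name `AliasCheckedMemory`, the `AliasFault` exception, and the use of a Python set to track protected addresses are all assumptions for illustration.

```python
from typing import Dict, Optional, Set


class AliasFault(Exception):
    """Raised when a store hits an address protected by a hoisted load."""


class AliasCheckedMemory:
    def __init__(self, cells: Optional[Dict[int, int]] = None) -> None:
        self.cells: Dict[int, int] = dict(cells or {})
        self.protected: Set[int] = set()

    def load_and_protect(self, addr: int) -> int:
        # The load was hoisted above a store, so remember the address it read.
        self.protected.add(addr)
        return self.cells.get(addr, 0)

    def store_under_alias_mask(self, addr: int, value: int) -> None:
        # Check the store against every protected address before writing.
        if addr in self.protected:
            raise AliasFault(hex(addr))
        self.cells[addr] = value

    def commit(self) -> None:
        # End of the translated block: protections are no longer needed.
        self.protected.clear()


mem = AliasCheckedMemory({0x10: 7})
value = mem.load_and_protect(0x10)        # hoisted load
mem.store_under_alias_mask(0x20, 1)       # different address: fine
try:
    mem.store_under_alias_mask(0x10, 9)   # same address: reordering was unsafe
    conflict = False
except AliasFault:
    conflict = True
print(value, conflict)
```

When the fault fires, the translation would be abandoned and the code re-run with the load and store in their original order.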
LongRun® Power Management Technology
In addition to the Crusoe processor’s inherently energy efficient design, the Code Morphing software
allows further reductions in power consumption by utilizing capabilities available only in the Crusoe
hardware. One of these features is implemented in Transmeta’s LongRun Power Management technology.
The LongRun technology provides Code Morphing software with the ability to adjust Crusoe’s voltage and
clock frequency on the fly depending on the demands placed on the Crusoe processor by software. Because
power varies linearly with clock speed and by the square of voltage, adjusting both can produce cubic
reductions in power consumption, whereas conventional CPUs can adjust power only linearly (by only
adjusting the frequency).
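To make the arithmetic concrete, here is a small Python sketch of the relationship just described, with power proportional to frequency times voltage squared. The function name and the 70% figures are made up for illustration, and the example assumes that voltage can be lowered in the same proportion as frequency, which real parts only approximate.

```python
def relative_power(freq_scale: float, volt_scale: float) -> float:
    """Dynamic power relative to full speed, using P proportional to f * V^2."""
    return freq_scale * volt_scale ** 2


# Drop the clock to 70% of full speed, leaving voltage alone (linear saving).
freq_only = relative_power(0.7, 1.0)

# Drop both clock and voltage to 70% (roughly cubic saving).
freq_and_volt = relative_power(0.7, 0.7)

print(f"frequency only:        {freq_only:.3f} of full power")
print(f"frequency and voltage: {freq_and_volt:.3f} of full power")
```

Under these assumptions, scaling frequency alone leaves about 70% of full power, while scaling both leaves about 34%, since 0.7 cubed is 0.343.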
The LongRun policies are implemented within the Code Morphing software, which continuously scales both
the frequency and voltage of the Crusoe processor according to the instantaneous demands of the computer
system. It can detect different scenarios based on runtime performance information and then exploit these
by adapting its power usage accordingly. All LongRun adjustments are seamless and transparent to the
user.
Figure 1. DVD idle. PIII Average=2.309 Watts; Crusoe Average=0.326 Watts.
Figure 2. DVD bootup (~5 seconds). PIII Average=5.242 Watts; Crusoe Average=1.375 Watts.
Figure 3. LongRun launched. DVD running at normal speed. PIII Average=6.199 Watts; Crusoe Average=1.257 Watts.
LongRun is designed to provide just enough performance for the processor workload at hand. This allows it
to deliver performance when necessary and conserve power when processor demand is low, thereby
eliminating performance and energy wastage.
How does LongRun work?
LongRun operates by configuring the Crusoe processor to run at a number of different frequency and
voltage points. The LongRun algorithm in the Code Morphing software monitors the Crusoe processor and
dynamically switches between these points as runtime conditions change.
Idle time is monitored and LongRun finds a frequency/voltage point that minimizes the idle time for the
current workload. Although the heuristics employed are complex, the LongRun policies can be abstracted
to the following points:


- If no idle time is detected during a workload, the frequency/voltage point is incremented (if
  possible).
- If idle time is detected, LongRun may decide that performance is being wasted and decrement the
  frequency/voltage level.
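The two policy points above can be sketched as a simple stepping function. The operating points and the idle threshold below are hypothetical values chosen for the example, since neither the real frequency/voltage table nor the actual heuristics are given here; the threshold is just one way to model the "may decide" in the second point.

```python
from typing import List, Tuple

# Hypothetical operating points as (MHz, volts); not the real Crusoe table.
POINTS: List[Tuple[int, float]] = [(300, 1.2), (400, 1.35), (500, 1.5), (600, 1.6)]


def next_point(index: int, idle_fraction: float, idle_threshold: float = 0.1) -> int:
    """Return the index of the next operating point given the observed idle time."""
    if idle_fraction == 0.0:
        return min(index + 1, len(POINTS) - 1)   # no idle time: step up if possible
    if idle_fraction > idle_threshold:
        return max(index - 1, 0)                 # performance is being wasted: step down
    return index                                 # a little idle time: stay put


index = 1
history = []
for idle in [0.0, 0.0, 0.5, 0.05]:
    index = next_point(index, idle)
    history.append(POINTS[index][0])
print(history)
```

Starting from the 400 MHz point, two busy intervals push the sketch up to 600 MHz, a mostly idle interval drops it back to 500 MHz, and a slightly idle interval leaves it there.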
LongRun also works in conjunction with the industry standard ACPI (Advanced Configuration and Power
Interface) specification — when the frequency and voltage scaling hits microarchitecture boundaries,
Crusoe transparently switches over to traditional power models allowing policies such as ACPI to handle
power management.
How is LongRun different?
The following table illustrates some of the differences between LongRun and other conventional power
management techniques.
Power Management Technology | Attributes
Legacy Power Management technology | Employs primitive techniques such as Clock Throttling to deliver only linear reductions in power. Clock Throttling chokes processor performance by alternating the processor between running at full speed and being effectively turned off.
Intel’s SpeedStep technology | Power-source-based approach consisting of just two operating points. Provides coarser granularity of control and misses opportunities for further power gains.
Transmeta’s LongRun technology | Adjusts both processor frequency and voltage in multiple steps based on processor activity. Allows a significant cubic power reduction relative to the drop in frequency.
Thermal Management
The management of how a device will dispose of heat is an integral part of microprocessor design.
Operating temperature rises as heat collects in a device, potentially causing damage and affecting
performance. Designers are careful to incorporate thermal solutions in their designs that allow their
products to operate within a safe temperature range. Conventional processors typically use Thermal
Throttling for CPU thermal management. Thermal throttling regulates the thermal environment by
alternating between running the processor at full speed and placing the processor in a sleep state whenever
the upper limits of the thermal envelope are reached.
Performance is delivered in discrete bursts that tend to be unfavorable for applications processing smooth
multimedia content, such as software DVD and MP3 playback.
By integrating a thermal model into the software algorithm, LongRun manages the Crusoe processor’s
thermal environment by using frequency/voltage shifts as a substitute for thermal throttling. In contrast to
conventional thermal management techniques, the LongRun thermal extensions deliver higher performance
at the same die temperature or the same performance at a lower die temperature, essentially expanding the
thermal budget of the CPU. The LongRun thermal extensions allow the possible elimination of active
cooling solutions, which reduces system weight and time-to-market as there is no need for explicit CPU
thermal management.
With LongRun, Transmeta delivers superior performance for a given thermal envelope.
The Future
In 1995, Transmeta set out to expand the reach of microprocessors into new markets by dramatically
changing the way microprocessors are designed. The initial market is mobile computing, in which complex
power-hungry processors have forced users to give up either battery running time or performance. The
Crusoe processor solutions have been designed for lightweight (two to four pound) mobile computers and
Internet access devices such as handhelds and web pads. They can give these devices PC capabilities and
unplugged running times of up to a day. To design the Crusoe processor chips, the Transmeta engineers did
not resort to exotic fabrication processes. Instead they rethought the fundamentals of microprocessor
design. Rather than “throwing hardware” at design problems, they chose an innovative approach that
employs a unique combination of hardware and software. Using software to decompose complex
instructions into simple atoms and to schedule and optimize the atoms for parallel execution saves millions
of logic transistors and cuts power consumption on the order of 60–70% over conventional approaches—
while at the same time enabling aggressive code optimization techniques that are simply not feasible in
traditional x86 implementations. Transmeta’s Code Morphing software and fast VLIW hardware, working
together, achieve low power consumption without sacrificing high performance for real-world applications.
Although the Crusoe model TM3120 and model TM5400 are impressive first efforts, the significance of the
Transmeta approach to microprocessor design is likely to become more apparent over the next several
years. The technology is young and offers more freedom to innovate (both hardware and software) than
conventional hardware-only designs. Nor is the approach limited to low-power designs or to
x86-compatible processors. Freed to render their ideas in a combination of hardware and software, and to
evolve hardware without breaking legacy code, Transmeta microprocessor designers may produce one
surprise after another in the new millennium.
Bibliography
Gerritsen, Armen. The Transmeta Crusoe. [Online] Available http://cpusite.examedia.nl/docs/crusoe.html, February 8, 2000.
ITWorld.com. Transmeta Crusoe. [Online] Available http://www.itworld.com/Comp/2062/, May 2000.
Simon, Jon. Transmeta Crusoe Preview At Platform 2000. [Online] Available http://www.sharkyextreme.com/hardware/articles/transmeta_crusoe/, January 31, 2000.
Transmeta. [Online] Available http://www.transmeta.com, May 2002.