From Wikipedia, the free encyclopedia.
In computer architecture, 64-bit is an adjective
used to describe integers, memory addresses or
other data units that are at most 64 bits (8
octets) wide, or to describe CPU and ALU
architectures based on registers, address buses,
or data buses of that size.
As of 2004, 64-bit CPUs are common in
servers, and have recently been introduced to
the (previously 32-bit) mainstream personal
computer arena in the form of the AMD64,
EM64T, and PowerPC 970 (or "G5") processors.
N-bit Processors: 4-bit, 8-bit, 16-bit, 24-bit, 31-bit, 32-bit, 48-bit, 64-bit, 128-bit
N-bit Applications: 31-bit, 32-bit
N-bit Data Sizes: 4-bit, 8-bit, 16-bit, 64-bit, 128-bit
Data size names: nibble, byte, octet, word, dword, qword
These definitions are relevant to the world of
x86 processors. See linked articles for
discussion of the meaning in other
architectures. The 31-bit and 48-bit sizes
relate to IBM mainframes and AS/400s, respectively.
Although a CPU may be 64-bit internally, its
external data bus or address bus may have a
different size, either larger or smaller, and the
term is often used to describe the size of these
buses as well. For instance, many current
machines with 32-bit processors use 64-bit buses, and may occasionally be referred to as
"64-bit" for this reason. The term may also refer to the size of an instruction in the
computer's instruction set or to any other item of data. Without further qualification,
however, a computer architecture described as "64-bit" generally has integer registers that
are 64 bits wide and thus directly supports 64-bit "chunks" of data, both internally and
externally.
Architectural implications
Registers in a processor are generally divided into three groups: integer, floating point,
and other. In all common general purpose processors, only the integer registers are
capable of storing pointer values (that is, an address of some data in memory). The
non-integer registers cannot be used to store pointers for the purpose of reading or writing to
memory, and therefore cannot be used to bypass any memory restrictions imposed by the
size of the integer registers.
Nearly all common general purpose processors (with the notable exception of the ARM
and most 32-bit MIPS implementations) have integrated floating point hardware, which
may or may not use 64-bit registers to hold data for processing. For example, the AMD64
architecture defines an SSE unit which includes 16 128-bit wide registers, and the
traditional x87 floating point unit defines 8 80-bit registers in a stack configuration. By
contrast, the 64-bit Alpha family of processors defines 32 64-bit wide floating point
registers in addition to its 32 64-bit wide integer registers.
Memory limitations
Most CPUs are currently (c. 2005) designed so that the contents of a single integer
register can store the address (location) of any datum in the computer's virtual memory.
Therefore, the total number of addresses in the virtual memory — the total amount of
data the computer can keep in its working area — is determined by the width of these
registers. Beginning in the 1960s with the IBM System/360, then (amongst many others)
the DEC VAX minicomputer in the 1970s, and then with the Intel 80386 in the mid-1980s,
a de facto consensus developed that 32 bits was a convenient register size. A 32-bit
register meant that 2^32 addresses, or 4 gigabytes of RAM, could be
referenced. At the time these architectures were devised, 4 gigabytes of memory was so
far beyond the typical quantities available in installations that this was considered to be
enough "headroom" for addressing. 4-gigabyte addresses were considered an appropriate
size to work with for another important reason: 4 billion integers are enough to assign
unique references to most physically countable things in applications like databases.
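The relationship between register width and addressable memory can be checked with a little arithmetic. The following is a minimal sketch, assuming a flat byte-addressed memory; the helper name address_space_gib is hypothetical and is not taken from any library.

```c
#include <stdint.h>

/* Size, in GiB (2^30 bytes), of a flat address space whose addresses are
 * `bits` wide. Hypothetical helper: valid for 30 <= bits <= 64 and
 * returns 0 for any other width.
 * 2^bits bytes / 2^30 bytes per GiB = 2^(bits - 30) GiB. */
uint64_t address_space_gib(unsigned bits)
{
    if (bits < 30 || bits > 64)
        return 0;
    return UINT64_C(1) << (bits - 30);
}
```

With this sketch, address_space_gib(32) gives 4, matching the 4-gigabyte figure above, while address_space_gib(64) gives 2^34 GiB.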
However, with the march of time and the continual reductions in the cost of memory (see
Moore's Law), by the early 1990s installations with quantities of RAM approaching 4
gigabytes began to appear, and the use of virtual memory spaces exceeding the 4-gigabyte
ceiling became desirable for handling certain types of problems. In response, a
number of companies began releasing new families of chips with 64-bit architectures,
initially for supercomputers and high-end workstation and server machines. 64-bit
computing has gradually drifted down to the personal computer desktop, with Apple
Computer's PowerMac desktop line as of 2003 and its iMac home computer line (as of
2004) both using 64-bit processors (the G5 chip from IBM), and AMD's "AMD64"
architecture (cloned by Intel as "EM64T") becoming common in high-end PCs.
1991: MIPS Technologies produced the first 64-bit CPU, as the third revision of their MIPS RISC architecture, the R4000.
The CPU was commercially available in 1991 and used in SGI graphics workstations starting with the Indigo series,
running the 64-bit version of the IRIX operating system.
1994: Intel announced plans for the 64-bit IA-64 architecture (jointly developed with HP) as a successor to its 32-bit IA-32
processors. A 1998-1999 launch date was targeted.
1995: Fujitsu-owned HAL Computer Systems launched workstations based on a 64-bit CPU, HAL's independently
designed first generation SPARC64. IBM released 64-bit AS/400 systems, with the upgrade able to convert the operating
system, database and applications.
1996: Sun and HP released their 64-bit processors, the UltraSPARC and the PA-8000. Sun Solaris, IRIX, and other
variants of UNIX continued to be common 64-bit operating systems.
1999: Intel released the instruction set for the IA-64 architecture. First public disclosure of AMD's set of 64-bit extensions
to IA-32 called x86-64.
2000: IBM shipped its first 64-bit mainframe, the zSeries z900, and its new z/OS operating system — culminating history's
biggest 64-bit processor development investment and instantly wiping out 31-bit plug-compatible competitors
Fujitsu/Amdahl and Hitachi. 64-bit Linux on zSeries followed almost immediately.
2001: Intel finally shipped its 64-bit processor line, now branded Itanium, targeting high-end servers. It failed to meet
expectations, owing to the repeated delays in getting IA-64 to market, and became a flop. Linux was the first operating system
to run on the processor at its release.
2002: Intel introduced the Itanium 2 as a successor to the Itanium.
2003: AMD brought out its 64-bit Opteron and Athlon 64 processor lines. Apple also shipped 64-bit PowerPC chips
courtesy of IBM and Motorola, along with an update to its Mac OS X operating system. Several Linux distributions
released with support for x86-64. Microsoft announced that it would create a version of its Windows operating system for
the AMD chips. Intel maintained that its Itanium chips would remain its only 64-bit processors.
2004: Intel, reacting to the market success of AMD, admitted it had been developing a clone of the x86-64 extensions,
which it calls EM64T. Updated versions of its Xeon and Pentium 4 processor families supporting the new instructions were
2005: In March, Intel announced that its first dual-core processors will ship in the second quarter of 2005 with the release
of the Pentium Extreme Edition 840 and the new Pentium D chips. Dual-core Itanium 2 processors will follow in the fourth quarter.
2005: On April 18, Beijing Longxin rolled out its first 64-bit CPU, named Longxin II. The thumb-sized square
chip contains 13.5 million transistors and has a peak performance of 2 billion single-precision or 1 billion
double-precision floating-point operations per second. The new chip has a maximum clock frequency of
500 MHz and a power consumption ranging from 3 to 5 watts.
2005: On April 30, Microsoft publicly released Windows XP x64 Edition for x86-64 processors.
2005: In May, AMD pre-released its dual-core desktop processor family called Athlon 64 X2. Athlon 64 X2 (Toledo)
processors feature two cores with 1 MB of L2 cache memory per core and consist of about 233.2 million transistors
on a 199 mm² die.
2005: In July, IBM announced its new dual-core 64-bit PowerPC 970MP (codenamed Antares).
32 vs 64 bit
A change from a 32-bit to a 64-bit architecture is a fundamental alteration, as most
operating systems must be extensively modified to take advantage of the new
architecture. Other software must also be ported to use the new capabilities; older
software is usually supported through either a hardware compatibility mode (in which the
new processors support an older 32-bit instruction set as well as the new modes), through
software emulation, or by the actual implementation of a 32-bit processor core within the
64-bit processor die (as with the Itanium 2 processors from Intel). One significant
exception to this is the AS/400, whose software runs on a virtual ISA which is
implemented in low-level software. This software, called TIMI, is all that has to be
rewritten to move the entire OS and all software to a new platform, such as when IBM
transitioned their line from 32-bit POWER to 64-bit POWER.
While 64-bit architectures indisputably make working with huge data sets in applications
such as digital video, scientific computing, and large databases easier, there has been
considerable debate as to whether they or their 32-bit compatibility modes will be faster
than comparably-priced 32-bit systems for other tasks.
Theoretically, some programs could well be faster in 32-bit mode. Instructions for 64-bit
computing take up more storage space than the earlier 32-bit ones, so it is possible that
some 32-bit programs will fit into the CPU's high-speed cache while equivalent 64-bit
programs will not. However, in applications like scientific computing, the data being
processed often fits naturally in 64-bit chunks, and will be faster on a 64-bit architecture
because the CPU will be designed to process such information directly rather than
requiring the program to perform multiple steps. Such assessments are complicated by
the fact that in the process of designing the new 64-bit architectures, the instruction set
designers have also taken the opportunity to make other changes that address some of the
deficiencies in older instruction sets by adding new performance-enhancing facilities
(such as the extra registers in the AMD64 design).
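The "multiple steps" a 32-bit CPU needs for 64-bit data can be sketched in C. The example below is an illustration only: the type pair32 and both function names are hypothetical, and a real compiler would emit an add followed by an add-with-carry rather than this explicit comparison.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit CPU would see it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair32;

/* Add two 64-bit quantities using only 32-bit arithmetic:
 * add the low halves, detect the carry, then add the high halves plus carry. */
pair32 add64_on_32(pair32 a, pair32 b)
{
    pair32 r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);     /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* The same operation on a 64-bit machine is a single addition. */
uint64_t add64_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Both functions compute the same sum; the difference is that the first needs three dependent operations where the second needs one.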
Pros and cons
A common misconception is that 64-bit architectures are no better than 32-bit
architectures unless the computer has more than 4 GB of memory. This is not entirely true:
Some operating systems reserve portions of each process' address space for OS
use, effectively reducing the total address space available for mapping memory
for user programs. For instance, Windows XP DLLs and userland OS components
are mapped into each process' address space, leaving only 2 or 3 GB (depending
on the settings) address space available, even if the computer has 4 GB of RAM.
This restriction is not present in Linux or 64-bit Windows.
Memory mapping of files is becoming more problematic on 32-bit architectures,
especially with the introduction of relatively cheap recordable DVD technology.
A 4 GB file is no longer uncommon, and such large files cannot easily be memory
mapped on 32-bit architectures. This is an issue, as memory mapping
remains one of the most efficient disk-to-memory methods, when properly
implemented by the OS.
The main disadvantage of 64-bit architectures is that relative to 32-bit architectures the
same data occupies slightly more space in memory (due to swollen pointers and possibly
other types and alignment padding). This increases the memory requirements of a given
process, and can have implications for efficient processor cache utilisation. Maintaining a
partial 32-bit data model is one way to handle this, and is in general reasonably effective.
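The effect of wider pointers and alignment padding can be estimated with a small calculation. This is a sketch under stated assumptions: a 4-byte int, natural alignment of each member, and tail padding to the largest member. These layout rules are typical but not guaranteed by C, and the function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Estimated size of a linked-list node { pointer next; int value; }
 * for a given pointer width, assuming a 4-byte int and natural alignment
 * (each member aligned to its own size, struct padded to its largest member). */
size_t list_node_size(size_t pointer_bytes)
{
    const size_t int_bytes = 4;
    size_t size = pointer_bytes;            /* next */
    size = round_up(size, int_bytes);       /* align value */
    size += int_bytes;                      /* value */
    size_t max_align = pointer_bytes > int_bytes ? pointer_bytes : int_bytes;
    return round_up(size, max_align);       /* tail padding */
}
```

Under these assumptions list_node_size(4) is 8 and list_node_size(8) is 16: the pointer grows by 4 bytes and padding adds another 4. Pointer-heavy structures like this one are the worst case; data dominated by integers or floating point values grows far less.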
64-bit data models
Converting application software written in a high-level language from a 32-bit
architecture to a 64-bit architecture varies in difficulty. One common recurring problem
is that some programmers assume that pointers (variables that store memory addresses)
have the same length as some other data type. Programmers assume they can transfer
quantities between these data types without losing information. Those assumptions
happen to be true on some 32-bit machines (and even some 16-bit machines), but they are
no longer true on 64-bit machines. The C programming language and its descendant C++
make it particularly easy to make this sort of mistake.
To avoid this mistake in C and C++, the sizeof operator can be used to determine the size
of these primitive types if decisions based on their size need to be made at run time. Also,
limits.h in the C99 standard and climits in the C++ standard give more helpful information;
sizeof only returns the number of bytes, which is sometimes misleading, because the number
of bits in a byte (CHAR_BIT) is implementation-defined in C and C++. One needs to be careful
to use the ptrdiff_t type (in the standard header <stddef.h>) when doing pointer arithmetic; too
much code incorrectly uses "int" or "long" instead.
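The advice above can be illustrated with a few small functions. This is a minimal sketch: the function names are hypothetical, and uintptr_t is an optional type in C99, although it is widely available.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of a pointer in bits: sizeof counts bytes, CHAR_BIT gives bits per byte. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}

/* Distance between two elements of the same array, in elements.
 * ptrdiff_t is the type the language defines for this result. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}

/* Round-trip a pointer through an integer. uintptr_t is wide enough by
 * definition; int or long may not be on a 64-bit machine. */
int survives_round_trip(const void *p)
{
    uintptr_t as_integer = (uintptr_t)p;
    return (const void *)as_integer == p;
}
```

Replacing uintptr_t with int in the last function would typically still work on a 32-bit machine and silently truncate the address on most 64-bit ones, which is exactly the porting mistake described above.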
Neither C nor C++ defines the length of a pointer, int, or long to be a specific number of bits.
In most programming environments on 32-bit machines, pointers, "int" variables, and
"long" variables are all 32 bits long.
However, in many programming environments on 64-bit machines, "int" variables are
still 32 bits wide, but "long"s and pointers are 64 bits wide. These are described as having
an LP64 data model. Another alternative is the ILP64 data model in which all three data
types are 64 bits wide. However, in most cases the modifications required are relatively
minor and straightforward, and many well-written programs can simply be recompiled
for the new environment without changes. Another alternative is the LLP64 model, which
maintains compatibility with 32-bit code by leaving both int and long as 32-bit. "LL"
refers to the "long long" type, which is at least 64 bits on all platforms, including 32-bit
environments.
Note that a programming model is a choice made on a per-compiler basis, and several can
coexist on the same OS. Typically, however, the programming model chosen as the primary
model by the OS API dominates.
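The data models described above differ only in the sizes of int, long and pointers, so they can be told apart with sizeof. The sketch below assumes 8-bit bytes; the function names are hypothetical, and "ILP32" is the conventional name (not used in the text above) for the 32-bit case in which all three types are 32 bits.

```c
#include <stddef.h>

/* Name the data model implied by the byte sizes of int, long and pointers.
 * Only the models discussed in the text are recognised. */
const char *data_model_name(size_t int_bytes, size_t long_bytes, size_t ptr_bytes)
{
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4) return "ILP32";
    if (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8) return "LP64";
    if (int_bytes == 8 && long_bytes == 8 && ptr_bytes == 8) return "ILP64";
    if (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 8) return "LLP64";
    return "other";
}

/* Data model of the compiler that builds this file. */
const char *this_data_model(void)
{
    return data_model_name(sizeof(int), sizeof(long), sizeof(void *));
}
```

Because the model is a property of the compiler, the same source file can report different results when built with different compilers or compiler settings on one machine.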
Another consideration is the data model used for drivers. Drivers make up the majority of
the operating system code in most modern operating systems (although many may not be
loaded when the operating system is running). Many drivers use pointers heavily to
manipulate data, and in some cases have to load pointers of a certain size into the
hardware they support for DMA. As an example, a driver for a 32-bit PCI device asking
the device to DMA data into upper areas of a 64-bit machine's memory could not satisfy
requests from the operating system to load data from the device to memory above the 4
gigabyte barrier, because the pointers for those addresses would not fit into the DMA
registers of the device. This problem is solved by having the OS take the memory
restrictions of the device into account when generating requests to drivers for DMA.
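The check an operating system has to make before handing a buffer to such a device can be sketched as follows. The function dma_reachable is a hypothetical helper written for illustration; it is not part of any real operating system API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a device whose DMA address register is `device_bits` wide reach the
 * whole buffer [addr, addr + len)? Hypothetical helper, not a real OS API. */
bool dma_reachable(uint64_t addr, uint64_t len, unsigned device_bits)
{
    if (len == 0)
        return true;                       /* nothing to transfer */
    uint64_t last = addr + (len - 1);      /* address of the final byte */
    if (last < addr)                       /* range wraps past 2^64 */
        return false;
    if (device_bits >= 64)
        return true;
    uint64_t limit = (UINT64_C(1) << device_bits) - 1;  /* highest reachable address */
    return last <= limit;
}
```

When the check fails, the OS has to pick a different buffer that the device can reach and copy the data afterwards, or otherwise remap the transfer; which strategy is used depends on the OS and the hardware.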
Current 64-bit processor architectures
64-bit processor architectures (as of 2005) include:
The DEC Alpha architecture (view ALPHA 64-bit timeline)
Intel's IA-64 architecture (used in Intel's Itanium CPUs)
AMD's AMD64 architecture (used in AMD's Opteron and Athlon 64 CPUs).
o Intel now markets the same architecture for its own processors as EM64T.
SPARC architecture
o Sun's UltraSPARC architecture
o Fujitsu's SPARC64 architecture
IBM's POWER architecture
IBM/Motorola's PowerPC architecture (originally the PowerPC 620, more
recently the PowerPC 970 µP)
IBM's z/Architecture, used by IBM zSeries and System z9 mainframes
MIPS Technologies' MIPS IV, MIPS V, and MIPS64 architectures
HP's PA-RISC family
Some 64-bit processor architectures can execute 32-bit code natively without any
performance penalty, such as AMD64, MIPS64, SPARC64, zSeries, PowerPC64, etc. This
kind of support is commonly called biarch support or more generally multi-arch support.
Beyond 64 bits
64-bit words seem to be sufficient for most practical uses today (circa 2004). Still it may
be mentioned that IBM's System/370 used 128-bit floating point numbers, and many
modern processors also include 128-bit floating point registers. The System/370 was
notable, however, in that it also used variable-length decimal numbers of up to 16 bytes
(i.e. 128-bit).
From Wikipedia, the free encyclopedia.
IA-32, sometimes generically called x86-32, is the computer architecture of Intel's most
successful microprocessors. Within various programming language directives it is also
referred to as "i386". The term may be used to refer to the 32-bit extensions to the
original x86 architecture, or to the architecture as a whole.
This architecture defines the instruction set for the family of microprocessors installed in
the vast majority of personal computers in the world.
The term means Intel Architecture, 32-bit, which distinguishes it from the 16-bit
versions of the architecture that preceded it, and the 64-bit architecture IA-64 (which is
very different, although it has an IA-32 compatibility mode). The more generic name for
all 16 and 32-bit versions of this architecture is x86.
Intel invented this instruction set and is the biggest supplier of processors compatible
with it, but it is not the only supplier of such processors. The second biggest
supplier is AMD, and there are numerous smaller, more specialized suppliers
of these processors.
This instruction set was introduced in the Intel 80386 microprocessor in 1985. This
instruction set is still the basis of most PC microprocessors twenty years later in 2005.
Even though the instruction set has remained intact, the successive generations of
microprocessors that run it have become much faster at running it.
The IA-32 instruction set is usually described as CISC (Complex Instruction Set
Computer) architecture, though such classifications have become less meaningful with
advances in microprocessor design.
Two memory management models
IA-32 supports two memory access models: Real mode and Protected mode. In Real mode,
the processor is limited to accessing a total of just over 1 MB of memory, while in
Protected mode it can access all of its memory.
Real mode
The old DOS operating system required the real mode to work, while newer Windows,
Linux and other operating systems usually require the protected mode. Upon power-on
(that is, at boot), the processor starts in Real mode, and then it begins loading
programs automatically into RAM from ROM and disk. A program inserted somewhere
along the boot sequence may be used to put the processor into the Protected mode.
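The "just over 1 MB" limit of Real mode follows from how a real-mode address is formed: a 16-bit segment value is multiplied by 16 and added to a 16-bit offset. The text above does not spell this rule out, so the sketch below relies on the standard x86 real-mode behaviour; the function name is hypothetical.

```c
#include <stdint.h>

/* Real-mode address translation: linear = segment * 16 + offset.
 * Both inputs are 16-bit, so the result needs at most 21 bits. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

The largest possible result, real_mode_linear(0xFFFF, 0xFFFF), is 0x10FFEF, which is 65,520 bytes short of 1 MB plus 64 KB.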
Protected mode
In Protected mode, several features become available in addition to memory
addressability beyond the DOS 1 MB limit. One of them is protected
memory, which prevents programs from corrupting one another. Another one is virtual
memory, which lets programs use more memory than is physically installed on the
machine. And the third feature is task-switching, aka multitasking, which lets a computer
juggle multiple programs all at once to look like they are all running at the same time.
The size of memory in Protected mode is usually limited to 4 GB. However, this isn't the
ultimate limit of the size of memory in IA-32 processors. Through tricks in the
processor's page and segment memory management systems, IA-32 operating systems
may be able to access more than 32 bits of address space, even without the switchover to
the 64-bit paradigm. One such trick is known as PAE (Physical Address Extension).
Virtual 8086 mode
There was also a sub-mode of operation in Protected mode, called virtual 8086 mode.
This is basically a special hybrid operating mode which allowed old DOS programs and
operating systems to run while under the control of a Protected mode supervisor
operating system. This allowed for a great deal of flexibility in running both Protected
mode programs and DOS programs simultaneously. This mode was added only with the
IA-32 version of Protected mode; it did not exist in the earlier 16-bit Protected mode
of the 80286.
The 386 has eight 32-bit general purpose registers for application use. There are 8
floating point stack registers. Later processors added new registers with their various
SIMD instruction sets too, such as MMX, 3DNow!, and SSE.
There are also system registers that are used mostly by operating systems but not by
applications usually. They are known as segment, control, debug, and test registers. There
are six segment registers, used mainly for memory management. The number of control,
debug or test registers varies from model to model.
General Purpose registers
The x86 general purpose registers are not really as general purpose as their name implies,
because some highly specialized tasks can only be done using one or two specific
registers. In other architectures,
any general purpose register can be used for any purpose. The x86 general purpose
registers further subdivide into registers specializing in data and others specializing in
addressing.
Also a lot of operations can be done either inside a register or directly inside RAM
without requiring the data to be loaded into a register first. The 1970s heritage of this
architecture shows through by this behaviour.
Note: with the advent of the 64-bit extensions to x86 in AMD64, this odd behaviour has
now been cleaned up (at least in 64-bit mode). General purpose registers are now truly
general purpose and they can be used interchangeably. This does not affect the 32-bit
architecture, however.
8-bit and 16-bit register subsets
8-bit and 16-bit subsets of these registers are also accessible. For example, the lower 16
bits of the 32-bit EAX register can be accessed by calling it the AX register. Some of the
16-bit registers can be further subdivided into 8-bit subsets too; for example, the upper
8-bit half of AX is called AH, and the lower half is called AL. Similarly, EBX is
subdivided into BX (16-bit), which in turn is divided into BH and BL (8-bit).
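The register subsets are simply bit fields of the full 32-bit register, which can be mimicked in C with shifts and masks. The function names below are hypothetical and only mirror the register names for illustration.

```c
#include <stdint.h>

/* Views of a 32-bit EAX value, mirroring the register subset names. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }       /* low 16 bits */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }   /* bits 8..15  */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }          /* bits 0..7   */

/* Writing AL replaces only the low byte; the other 24 bits are preserved. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    return (eax & 0xFFFFFF00u) | value;
}
```

For an EAX value of 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78. There is no name for the upper 16 bits on their own.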
General data registers
All four of the following registers may be used as general purpose registers. However,
each has some specialized purpose as well. Each of these registers also has 16-bit and 8-bit
subset names.
EAX accumulator (with a special interpretation for arithmetic instructions; a for accumulator)
EBX base register (used for addressing data in the data segment)
ECX counter (with a special interpretation for loops, c for counter)
EDX data register
General address registers
Used only for address pointing. They have 16-bit subset names, but no 8-bit subsets.
EBP base pointer (holds the address of the current stack frame)
ESI source index (for string operations)
EDI destination index (for string operations)
ESP stack pointer (holds the top address of the stack)
EIP instruction pointer (holds the current instruction address)
Floating point stack registers
Initially, IA-32 included floating-point capabilities only on add-on processors (8087,
80287 and 80387). With the introduction of the 80486, these eight x87 floating point
registers, known as ST(0) through ST(7), are built into the CPU. Each register is 80 bits
wide and stores numbers in the extended precision format of the IEEE floating-point standard.
These registers are not accessible directly, but are accessible like a LIFO stack. The
register numbers are not fixed, but are relative to the top of the stack; ST(0) is the top of
the stack, ST(1) is the next register below the top of the stack, ST(2) is two below the top
of the stack, etc. That means that data is always pushed down from the top of the stack,
and operations are always done against the top of the stack. Registers therefore cannot
simply be accessed at random; access has to follow the stack order.
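The top-relative numbering of ST(0) through ST(7) can be modelled with eight slots and a wrapping "top" index. This is a toy model written for illustration only: all names are hypothetical, double stands in for the 80-bit format, and tag bits and stack-fault exceptions are left out.

```c
/* Toy model of the x87 register stack: eight physical slots plus a 3-bit
 * "top" index. Names are illustrative; tag bits and exceptions are omitted. */
typedef struct {
    double slot[8];   /* real registers hold 80-bit values; double is a stand-in */
    unsigned top;     /* index of the slot currently called ST(0) */
} x87_stack;

void x87_init(x87_stack *s)
{
    s->top = 0;
    for (int i = 0; i < 8; i++)
        s->slot[i] = 0.0;
}

/* ST(i) is relative to top, wrapping modulo 8. */
double *x87_st(x87_stack *s, unsigned i)
{
    return &s->slot[(s->top + i) & 7u];
}

/* Push: move top down one slot, then store into the new ST(0). */
void x87_push(x87_stack *s, double value)
{
    s->top = (s->top - 1u) & 7u;
    *x87_st(s, 0) = value;
}

/* Pop: read ST(0), then move top up one slot. */
double x87_pop(x87_stack *s)
{
    double value = *x87_st(s, 0);
    s->top = (s->top + 1u) & 7u;
    return value;
}
```

After pushing 1.0 and then 2.0, ST(0) holds 2.0 and ST(1) holds 1.0: the value pushed first has moved one position down without being copied, because only the top index changed.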
SIMD registers
MMX, 3DNow!, and SSE also added new registers of their own to the IA-32 instruction set.
MMX registers
MMX added 8 new registers to the architecture, known as MM0 through MM7
(henceforth referred to as MMn). In reality, these new registers were just aliases for the
existing x87 FPU stack registers. Hence, anything that was done to the floating point
stack would also affect the MMX registers. Unlike the FP stack, these MMn registers
were fixed, not relative, and therefore they were randomly accessible.
Each of the MMn registers holds a 64-bit integer. However, one of the main concepts of the
MMX instruction set is the concept of packed data types, which means instead of using
the whole register for a single 64-bit integer (quadword), two 32-bit integers
(doubleword), four 16-bit integers (word) or eight 8-bit integers (byte) may be used.
Also, because the 64-bit MMn registers are aliased to the FPU stack, and each of
the stack registers is 80 bits wide, the upper 16 bits of the stack registers go unused in
MMX. These bits are set to all ones, which makes the contents look like NaNs or infinities in
the floating point view. This makes it easier to tell whether you are working on floating
point data or MMX data.
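The packed data idea can be emulated in plain C: one 64-bit word is treated as four independent 16-bit lanes. The sketch below imitates a wraparound packed-word addition (the MMX instruction PADDW behaves this way); the function name is hypothetical, and real MMX code would perform the whole operation in a single instruction.

```c
#include <stdint.h>

/* Add four 16-bit lanes packed in one 64-bit word, each lane wrapping
 * independently. Plain C emulation of a wraparound packed-word add. */
uint64_t packed_add16(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = lane * 16;
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        uint16_t sum = (uint16_t)(x + y);      /* carry out of the lane is dropped */
        result |= (uint64_t)sum << shift;
    }
    return result;
}
```

The difference from an ordinary 64-bit addition shows up when a lane overflows: packed_add16(0x000300020001FFFF, 0x0001000100010001) gives 0x0004000300020000, whereas a plain 64-bit add would let the carry leak into the next lane and give 0x0004000300030000.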
3DNow! registers
3DNow! was designed to be the natural evolution of MMX from integers to floating
point. As such, it uses the exact same register naming convention as MMX, that is MM0
through MM7. The only difference is that instead of packing byte to quadword integers
into these registers, one would pack single precision floating points into these registers.
The advantage of aliasing registers with the FPU registers is that the same instruction and
data structures used to save the state of the FPU registers can also be used to save
3DNow! register states. Thus no special modifications are required to be made to
operating systems which would otherwise not know about them.
SSE registers
SSE discarded all legacy connections to the FPU stack. This also meant that this
instruction set discarded all legacy connections to previous generations of SIMD
instruction sets like MMX. But it freed the designers up, allowing them to use larger
registers, not limited by the size of the FPU registers. The designers created eight 128-bit
registers, named XMM0 through XMM7. (Note: in AMD64, the number of SSE XMM
registers has been increased from 8 to 16.)
But the downside is that operating systems had to have an awareness of this new set of
instructions in order to be able to save their register states. So Intel created a slightly
modified version of Protected mode, called Enhanced mode which enables the usage of
SSE instructions, whereas they stay disabled in regular Protected mode. An OS that is
aware of SSE will activate Enhanced mode, whereas an unaware OS will only enter into
traditional Protected mode.
SSE is a SIMD instruction set that works only on floating point values, like 3DNow!.
However, unlike 3DNow! it severs all legacy connection to the FPU stack. Because it has
larger registers than 3DNow!, SSE can pack twice the number of single precision floats
into its registers. The original SSE was limited to only single-precision numbers, like
3DNow!. The SSE2 introduced the capability to pack double precision numbers too,
which 3DNow! had no possibility of doing since a double precision number is 64-bit in
size which would be the full size of a single 3DNow! MMn register. At 128-bit, the SSE
XMMn registers could pack two double precision floats into one register. Thus SSE2 is
much more suitable for scientific calculations than either SSE1 or 3DNow!, which were
limited to only single precision.
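The packing arithmetic in this comparison is simple division of register size by element size. The sketch below assumes 32-bit float and 64-bit double, as on x86; the union and function names are hypothetical and do not correspond to any compiler intrinsic type.

```c
#include <stdint.h>

/* One 128-bit XMM-sized value viewed as its packed element types.
 * Assumes a 32-bit float and a 64-bit double, as on x86. */
typedef union {
    float   single[4];   /* SSE: four single-precision lanes  */
    double  dbl[2];      /* SSE2: two double-precision lanes  */
    uint8_t bytes[16];
} xmm_value;

/* Number of lanes for an element of element_bytes bytes in a register of
 * register_bytes bytes (8 for an MMn register, 16 for an XMMn register). */
unsigned lanes(unsigned register_bytes, unsigned element_bytes)
{
    if (element_bytes == 0)
        return 0;
    return register_bytes / element_bytes;
}
```

Here lanes(8, 4) is 2 and lanes(16, 4) is 4, which is the "twice the number of single precision floats" mentioned above; lanes(8, 8) is 1, so a 64-bit MMn register leaves no room for packing doubles, while lanes(16, 8) is 2.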
The full listing of the x86 machine language mnemonics including integer, floating point,
and SIMD instructions can be found in the X86 instruction listings link. They are
categorized into a chronological and hierarchical format showing when the instructions
first became available, and what category of instructions they are.
The original IA-32 instruction set has evolved over time with the addition of the
multimedia instruction updates. However, the ultimate evolution of IA-32 comes when it
becomes 64-bit, at which point it can no longer be called IA-32. The result is called
x86-64, and its first implementation was AMD's AMD64. It cannot be called IA-64, because
Intel and HP had already reserved that label for their new Itanium design, which, unlike
AMD64, is not really an evolution of IA-32. Later, Intel followed by imitating AMD's
design with what it calls EM64T.
SIMD Multimedia Instruction Set updates
Various generations of IA-32 CPUs since have added several extensions to the original
instruction set. They were known technically as SIMD instruction sets. However, more
colloquially they were known as Multimedia instruction sets, because they were mainly
used in multimedia entertainment software applications.
The MMX extensions were the first major upgrade. This was a set of integer-only
SIMD instructions. This was co-introduced by Intel and AMD in their Pentium
MMX and K6 processors, in 1997. It shared its registers with the x87 FPU;
therefore operating systems did not have to be modified to accept these
instructions; they worked automatically if the OS already supported x87 state-saving.
MMX was further upgraded with the addition of floating-point SIMD capabilities,
with the introduction of 3DNow! in early 1999. Like MMX, this set shared its
registers with the x87 FPU too. This extension was introduced by AMD in the
K6-2 processor, but it was never picked up by Intel.
SSE was single precision floating point SIMD introduced by Intel in late 1999,
with the introduction of the Pentium III processor. Unlike 3DNow!, it was not an
extension to the MMX extension, nor did it share its registers with the x87 FPU.
It required some modifications to operating systems for them to work. This added
programming inconvenience was made up for by the fact that SSE worked
unencumbered by any of the old limitations of the x87 FPU. This instruction set
was adopted eventually by AMD starting with its Athlon XP processor; all further
extensions to SSE will likely be adopted by AMD from now on, as it will no
longer make any extensions to its own 3DNow! instructions.
SSE2 was introduced in early 2001 with the introduction of the Pentium 4
processor. This was a further upgrade to the original SSE, adding double
precision operations to its bag of tricks.
SSE3 was introduced in early 2004, in an upgraded version of the Pentium 4,
codenamed Prescott. It featured some minor tweaks to the SSE2 extensions.
Next-generation 64-bit Instruction Sets
Two new instruction sets can claim to be the 64-bit successor to IA-32. One of them
builds on top of IA-32 but has a different name, while the other one discards IA-32
completely but has a similar name.
Intel's IA-64 architecture is not directly compatible with the IA-32 instruction set. It
completely discards all IA-32 instructions, and starts from scratch with a completely
different instruction set as well as using a VLIW design instead of out-of-order execution.
IA-64 is the architecture used by their Itanium line of processors. The Itanium has
hardware-support for IA-32, though very slow because of the different approach. IA-32
execution mode is set by the EFI program loaded on boot-up. The nomenclature "IA-64"
means "Intel Architecture, 64-bit", but the connection with IA-32 is only in the name.
AMD's AMD64 instruction set, aka x86-64, is largely built on top of IA-32, and thus
maintains the x86 family heritage. While extending the instruction set, AMD took the
opportunity to clean up, for code running in 64-bit mode, some of the odd behaviour that
has plagued this instruction set since its earliest 16-bit days. AMD also doubled the
number of general purpose registers from 8 to 16, made them much more truly general
purpose, and doubled the number of SSE registers from 8 to 16. Most of the
functionality of the segment registers was deprecated, since their usage had steadily
declined even during the IA-32 days.
By February 2004, Intel implicitly acknowledged the logic of the AMD64 instruction set,
deriving from it the EM64T, which is very similar to AMD64. This extension is
compatible with code written for the AMD64. Intel first used the set in the
Xeon Nocona core in 2004, and introduced it to the desktop market with the Pentium 4
Prescott 2M in early 2005.
In computing, IA-64 (Intel Architecture-64) is a 64-bit processor architecture developed
in cooperation by Intel and Hewlett-Packard, implemented by processors such as Itanium
and Itanium 2. The goal of Itanium was to produce a "post-RISC era" architecture, using
a very long instruction word (VLIW) design. Unlike Intel x86 processors, the Itanium is
not geared toward high performance execution of the IA-32 (x86) instruction set.
In a mainstream "out-of-order" design, a complex decoder system examines instructions
as they flow through the pipeline and sees which can be fed off to operate in
parallel across the available execution units — e.g., a series of instructions that say A = B
+ C and D = F + G will not affect each other, and so they can be fed into two different
execution units and run in parallel. The ability to extract instruction level parallelism
(ILP) from the instruction stream is essential for good performance in a modern CPU.
Predicting which code can and cannot be split up this way is a very complex task. In
many cases the inputs to one line are dependent on the output from another, but only if
some other condition is true. For instance, consider the slight modification of the example
noted before, A = B + C; IF A==5 THEN D = F + G. In this case the calculations
remain independent of each other, but the second command requires the result of the
first calculation in order to know if it should be run at all.
In these cases the circuitry on the CPU typically "guesses" what the condition will be. In
something like 90% of all cases, an IF will be taken, suggesting that in our example the
second half of the command can be safely fed into another execution unit. However, getting the
guess wrong can cause a significant performance hit when the result has to be thrown out
and the CPU waits for the results of the "right" command to be calculated. Much of the
improvement in the performance of modern CPUs is due to better prediction logic, but lately the
improvements have begun to slow.
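The cost of guessing can be illustrated with a small simulation. The sketch below assumes a two-bit saturating counter, a common textbook prediction scheme; it is not a description of any particular CPU's logic, and the function name and the workload (a branch taken nine times out of ten, echoing the 90% figure above) are made up for illustration.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count correct guesses for a sequence of branch outcomes.

    Assumed scheme: states 0-1 predict "not taken", states 2-3 predict "taken".
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# Hypothetical workload: the branch is taken nine times, then not taken once.
outcomes = ([True] * 9 + [False]) * 10
hits = simulate_two_bit_predictor(outcomes)
print(f"{hits} of {len(outcomes)} branches predicted correctly")
```

Every wrong guess in this model stands for work that a real processor would have to throw away and redo.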
IA-64 instead relies on the compiler for this task. Even before the program is fed into the
CPU, the compiler examines the code and makes the same sorts of decisions that would
otherwise happen at "run time" on the chip itself. Once it has decided what paths to take,
it gathers up the instructions it knows can be run in parallel, bundles them into one larger
instruction, and then stores it in that form in the program—hence the name VLIW or
"very long instruction word."
Moving this task from the CPU to the compiler has several advantages. First, the
compiler can spend considerably more time examining the code, a benefit the chip itself
doesn't have because it has to complete as quickly as possible. Thus the compiler's
analysis can be considerably more accurate than the equivalent logic in the chip's
circuitry. Second, the prediction circuitry is quite complex, and offloading prediction
to the compiler reduces that complexity enormously. The CPU no longer has to examine
anything; it simply breaks the instruction apart again and feeds the pieces off to the
execution units. Third, doing the prediction in the compiler is a one-off cost, rather
than one incurred every time the program is run.
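The compile-time grouping described above can be sketched in a few lines. This is a simplified illustration, not how a real IA-64 compiler works: the Instr type, the conflicts and bundle functions, and the greedy strategy are all assumptions made for the example. The width of three matches the three instruction slots of an IA-64 bundle.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str      # register written
    srcs: tuple    # registers read


def conflicts(earlier, later):
    """True if `later` may not be placed in the same group as `earlier`."""
    return (
        earlier.dest in later.srcs       # read-after-write
        or earlier.dest == later.dest    # write-after-write
        or later.dest in earlier.srcs    # write-after-read
    )


def bundle(program, width=3):
    """Greedily pack instructions, in program order, into parallel groups."""
    groups = [[]]
    for instr in program:
        current = groups[-1]
        if len(current) == width or any(conflicts(e, instr) for e in current):
            current = []
            groups.append(current)
        current.append(instr)
    return [g for g in groups if g]


program = [
    Instr("A", ("B", "C")),   # A = B + C
    Instr("D", ("F", "G")),   # D = F + G, independent of the first
    Instr("E", ("A", "D")),   # E = A + D, needs both earlier results
]
groups = bundle(program)
for number, group in enumerate(groups):
    print(number, [i.dest for i in group])
```

The first two instructions share a group because neither touches the other's registers; the third has to wait for both results, so it starts a new group.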
The downside is that a program's run-time behaviour is not always obvious in the code
used to generate it, and may vary considerably depending on the actual data being
processed. The out-of-order processing logic of a mainstream CPU can make decisions
on the basis of actual run-time data which the compiler can only guess at. That means
that it is possible for the compiler to get its prediction wrong more often than comparable
(or simpler) logic placed on the CPU. The VLIW design thus relies heavily on the
performance of the compilers, the trade-off being to decrease microprocessor hardware
complexity by increasing compiler software complexity.
The IA-64 architecture includes a very generous complement of registers: 128 each of
82-bit floating point and 64-bit integer registers. In addition to the sheer number, IA-64
adds in a register rotation mechanism that is controlled by the Register Stack Engine.
Rather than the typical spill/fill or window mechanisms used in other processors, the
Itanium can rotate in a set of new registers to accommodate for new function parameters
or temporaries. The register rotation mechanism combined with predication is also very
effective in executing automatically unrolled loops.
Instruction set
The architecture also provides instructions for multimedia operations and floating point
operations.
Where a typical VLIW will assign sub-instructions from each long instruction word to a
particular fixed functional unit, the Itanium supports several bundle mappings, which allow
more instruction mixing possibilities and include a balance between serial and
parallel execution modes. There was room left in the initial bundle encodings to add more
mappings in future versions of IA-64. In addition, the Itanium has individually settable
predicate registers that impose a kind of run-time determined "no output" mode on each
instruction.
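The effect of predicate registers can be sketched with a toy interpreter. Everything here is hypothetical (the tuple layout, the execute function and the register names), and p0 is treated as always true. The program is the earlier example, A = B + C; IF A==5 THEN D = F + G, rewritten without a branch.

```python
def execute(program, regs):
    """Run (predicate, kind, target, fn) tuples in order on a copy of regs."""
    regs = dict(regs)
    preds = {"p0": True}  # assumption: p0 always reads as true
    for qualifying_pred, kind, target, fn in program:
        if not preds.get(qualifying_pred, False):
            continue  # predicate is false: the instruction has no effect
        value = fn(regs)
        if kind == "pred":
            preds[target] = bool(value)   # a compare sets a predicate
        else:
            regs[target] = value          # an ordinary register write
    return regs


program = [
    ("p0", "reg", "A", lambda r: r["B"] + r["C"]),    # A = B + C
    ("p0", "pred", "p1", lambda r: r["A"] == 5),      # p1 = (A == 5)
    ("p1", "reg", "D", lambda r: r["F"] + r["G"]),    # (p1) D = F + G
]

taken = execute(program, {"B": 2, "C": 3, "D": 0, "F": 10, "G": 20})
skipped = execute(program, {"B": 1, "C": 1, "D": 0, "F": 10, "G": 20})
print(taken["D"], skipped["D"])
```

In the second run the compare leaves p1 false, so the final addition is suppressed and D keeps its old value, with no jump anywhere in the instruction stream.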
A raw Itanium, when first booted, is actually missing some of its instruction
functionality. A boot-ROM-like program called an EFI program is then loaded; it loads
additional code into on-chip memory to define these instructions, and performs
other boot-time configuration, such as choosing the execution mode of the processor
(64-bit versus 32-bit). This design allows an Itanium system to be deployed with different
capabilities depending on the contents of the EFI program.
IA-32 support
In order to support IA-32, the Itanium can switch into 32-bit mode with special jump
escape instructions. The IA-32 instructions have been mapped to the Itanium's functional
units. However, since the Itanium is built primarily for speed of its EPIC-style
instructions, and because it has no out-of-order execution capabilities, IA-32 code
executes at a severe performance penalty compared to either the IA-64 mode or the
Pentium line of processors. For example, the Itanium functional units do not
automatically generate integer flags as a side effect of ordinary ALU computation, and do
not intrinsically support multiple outstanding unaligned memory loads. There are also
IA-32 software emulators which are freely available for Windows and Linux, and these
emulators typically outperform the hardware-based emulation by around 50%. The
Windows emulator is available from Microsoft; the Linux emulator is available from
some Linux vendors such as Novell. Given the superior performance of the software
emulator, there has been some speculation that Intel will remove IA-32 emulation from
future Itanium processors. However, the IA-32 hardware accounts for less than 1% of the
transistors of an Itanium 2, and so there is little to gain from doing so.
Although other 64-bit architectures have existed for a long time, most (MIPS, Alpha,
PA-RISC) have faded from the marketplace. Itanium's remaining competition for the 64-bit
server and workstation market appears to be the resurrected AMD with its AMD64
architecture, and the entrenched rivals: IBM's POWER architecture, and Sun's UltraSPARC
architecture. Although Apple might have challenged Intel with its Xserve product line
based on the IBM PowerPC architecture, any such prospect evaporated with the
announcement of Apple's adoption of the Intel IA-32 architecture for its future products.
In response to favorable industry reaction to the AMD64, Intel's new version of the Xeon
(Nocona) supports EM64T extensions to IA-32, which are largely instruction-set
compatible with AMD64.
AMD64 (also called x86-64 or x64) is a 64 bit processor architecture invented by AMD. It is a
superset of the x86 architecture, which it natively supports. The AMD64 instruction set is
currently used in AMD's Athlon 64, Athlon 64 FX, Athlon 64 X2, Turion 64, Opteron
and later Sempron processors.
Architecture Overview
AMD's x86-64 instruction set (later renamed AMD64) is a straightforward extension of
the x86 architecture to 64 bits, motivated by the fact that the 4GB of memory directly
addressable by a 32 bit CPU is no longer sufficient for all applications. Some of the
changes:
New registers. The number of general-purpose registers (GPRs) is increased from
8 in x86-32 to 16, and the size of these registers is increased from 32 bits to 64
bits. Additionally, the number of 128 bit XMM registers (used for Streaming
SIMD instructions) is also increased from 8 to 16. The additional registers
increase performance.
Larger address space. Due to the 64 bit architecture, the AMD64 architecture
can address up to 256 tebibytes (also known as terabytes) of memory in its current
implementations. This is compared to just 4 GB for x86-32, only half of which is
available to applications under the most common versions of Microsoft Windows.
Future implementations of the AMD64 architecture may provide up to 16
exbibytes (also known as exabytes) of virtual address space. If paging is used
properly, 32 bit operating systems can access some of the physical address
extensions of the processor without having to execute in long mode. Virtual
memory for all programs running in 32 bit mode is still limited to 4GB.
RIP relative data access. Instructions can now reference data relative to the
program counter, which makes code in shared libraries that are not compiled to a
fixed address more efficient. It also allows shared libraries to be mapped
anywhere in the virtual address space.
SSE instructions. The AMD64 architecture includes Intel's SSE and SSE2
instructions; newer E-stepping CPUs include SSE3 as well. The x87 and MMX
instructions are also supported.
NX bit. The NX bit is a processor feature that allows the operating system to
forbid code execution in data areas, improving security. This feature is available
in both 32 bit and 64 bit modes, and is supported by Linux, Solaris, Windows XP
SP2, Windows Server 2003 SP1 and newer. The NX bit (when coupled with an
OS which takes advantage of it) is referred to in AMD's marketing literature as
Enhanced Virus Protection (EVP). While it does indeed block a common attack
vector for many types of malware (most notably buffer overflows), neither the NX bit
nor any other single technological measure is sufficient to prevent viruses from
infecting a computer. Trade regulators in The Netherlands recently asked AMD to
cease calling the NX bit "Enhanced Virus Protection" in advertisements in that
country, stating that the NX capability was not a suitable substitute for other
countermeasures, such as anti-virus software.
It should be noted that a comparable no-execute capability has long been available
on 32-bit x86 processors through segmented addressing, originally introduced in
the 80286 processor. However, segmented addressing has long been considered an
obsolete mode of operation by systems software vendors (no current PC OS relies
on it), and AMD was the first x86-family vendor to support the feature in linear
addressing mode. On 32-bit systems the NX bit requires PAE (Physical Address
Extension) paging. Intel and other x86 CPU vendors are now supporting the NX bit
in their product offerings as well.
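The address-space figures quoted in the list above follow directly from the number of address bits. The short calculation below is only a worked check: the helper name is made up, and the 48-bit width is an assumption, being the value that yields the 256 tebibyte figure.

```python
GIB = 2 ** 30   # gibibyte
TIB = 2 ** 40   # tebibyte
EIB = 2 ** 60   # exbibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


x86_32 = addressable_bytes(32)
current_amd64 = addressable_bytes(48)   # assumed width of current implementations
full_64_bit = addressable_bytes(64)

print(x86_32 // GIB, "GiB for a 32 bit address")
print(x86_32 // 2 // GIB, "GiB when only half is available to applications")
print(current_amd64 // TIB, "TiB for a 48 bit address")
print(full_64_bit // EIB, "EiB for a full 64 bit address")
```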
Operating modes
Operating mode                                  Operating system required
Long mode: 64 bit mode                          New 64 bit OS
Long mode: compatibility mode                   New 64 bit OS
Legacy mode (virtual 8086 mode, real mode)      Legacy 32 bit OS or legacy 16 bit OS
Operating mode explanation
There are two primary modes of operation for this architecture:
Long Mode
The intended primary mode of operation of the architecture; it is a combination of
the processor's native 64 bit mode and a 32 bit compatibility mode. It also
abandons some of the more half-baked or lesser-used features of the 80386. It is
used by 64 bit Operating Systems; among those that support Long Mode are
Linux, the various BSDs, Solaris 10 and Windows XP Professional x64 Edition.
Since the basic instruction set is the same, there is no major performance penalty
for executing x86 code. This is unlike Intel's IA-64, where differences in the
underlying ISA mean that running 32 bit code is like using an entirely different
processor. However, on AMD64, 32 bit x86 applications may still benefit from a
64 bit recompile, due to the additional registers in 64 bit code, which a high-level
compiler can use for optimization.
Using Long Mode, a 64 bit OS can run 32 bit applications and 64 bit applications
simultaneously. Also, x86-64 includes native support for running 16 bit x86
applications. Microsoft, however, has explicitly left out 16 bit program support in
Windows XP Professional x64 Edition due to problems in getting 16 bit x86 code
to run via their WoW64 Subsystem.
Legacy Mode
The mode used by 16 bit operating systems, like MS-DOS, and 32 bit operating
systems, such as Windows XP. In this mode, only 16 bit or 32 bit code can be
executed. 64 bit programs (such as the GUI setup program for Windows XP
Professional x64 Edition and Windows Server 2003 x64 Edition) will not run.
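Because a 64 bit OS in Long Mode can run 32 bit and 64 bit applications side by side, a program sometimes needs to know which kind it is. A minimal sketch in Python, using the standard struct module to read the native pointer size (the function name is made up for the example):

```python
import struct


def pointer_bits():
    """Width of a native pointer in the running process, in bits."""
    return struct.calcsize("P") * 8


print(f"This interpreter is running as a {pointer_bits()} bit process")
```

This reports the width of the running process, not of the operating system or the CPU: a 32 bit build of the interpreter reports 32 even on a 64 bit OS.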
Market analysis
AMD64 represents a break with AMD's past behavior of following Intel's standards, but
follows Intel's earlier behavior of extending the x86 architecture, from the 16 bit 8086 to
the 32 bit 80386 and beyond, without ever removing backwards compatibility. The
AMD64 architecture extends the 32 bit x86 architecture (IA-32) by adding 64 bit
registers, with full 32 bit and 16 bit compatibility modes for earlier software. Even the 64
bit mode is largely backwards compatible, allowing existing tools targeting x86 such as
compilers to be retargeted to AMD64 with minimal effort. The AMD64 architecture also
features the NX bit.
The following processors implement the AMD64 architecture:
o AMD Athlon 64
o AMD Athlon 64 X2
o AMD Athlon 64 FX
o AMD Opteron
o AMD Turion 64
o AMD Sempron (only 'Palermo' models using the E6 stepping)
o Intel Xeon (some models since 'Nocona')
o Intel Celeron D (some models since 'Prescott')
o Intel Pentium 4 (some models since 'Prescott')
o Intel Pentium D
o Intel Pentium Extreme Edition
o Intel Pentium M (some models starting with 'Merom')
o Intel Conroe (upcoming desktop core)
Extended Memory 64-bit Technology (EM64T) is Intel's implementation of AMD64, a
64-bit extension to the IA-32 architecture. See the AMD64 article for architectural
details.
The history of the EM64T project is long and convoluted, mainly due to the internal
politics of Intel. It began with the codename Yamhill, named after the Yamhill River
in Oregon's Willamette Valley. After several years of denying that this project
existed, Intel eventually admitted it existed in early 2004, and gave it the codename CT
(Clackamas Technology), also named after an Oregon river (the Clackamas River, also a
tributary of the Willamette River). Then within the space of weeks of the CT
announcement, Intel gave it several new names. After the spring 2004 IDF, Intel named it
IA-32E (IA-32 Extensions) and a few weeks later devised the name EM64T. Intel's
chairman at the time, Craig Barrett, admitted that this was one of their worst kept secrets.
Intel CPUs with EM64T
Intel's first processor to actively implement the EM64T technology is the processor
codenamed Nocona, which is being sold as Intel's latest multiprocessor Xeon. Since the
Xeon itself is directly based on Intel's desktop processor, the Pentium 4, the Pentium 4
also has EM64T technology built in, although as with Hyper-Threading, this feature was
not initially enabled on the then-new Prescott design, likely because Intel had not yet
perfected it at the time. Intel has since begun selling EM64T enabled Pentium 4s using
the E0 revision of the Prescott core, being sold on the market as the Pentium 4, model F.
The E0 revision also adds eXecute Disable (XD), Intel's name for the NX bit, to
EM64T; this support should be backported into the Nocona design soon. All
8xx/6xx/5x6/5x1/3x6/3x1 series CPUs have EM64T enabled, as will all future Intel
As of June 2005, none of Intel's notebook CPUs (the Pentium M family, the Celeron M
family, or the Mobile Intel Pentium 4 processors) support EM64T. The first Pentium M
derivative supporting EM64T will be the dual core Merom targeted for mid-2006.