Brno University of Technology
Architecture and Programming of x86 Processors
Microprocessor Techniques and Embedded Systems
Lecture 12
Dr. Tomas Fryza
December 2012
Contents
A brief history of single-core Intel processors
IA-32 processor registers
IA-32 processor programming in assembly language
History of Intel x86 processors
- 4-bit processor: 4004; 8-bit processors: 8080, 8085.
- 16-bit processors: the first processor in the IA (Intel Architecture) family was the
  8086, introduced in 1978, with a 20-bit address bus (AB) and a 16-bit data bus (DB).
  It came in a 40-pin dual in-line package (DIP) and in three versions: 8086 (5 MHz),
  8086-2 (8 MHz), and 8086-1 (10 MHz). It consists of 29,000 transistors.
- The 8088 is identical to the 8086, except that it has an 8-bit data bus. Intel introduced
  segmentation with the 8086 and 8088 (real-mode segmentation). At any one time, these
  processors can address up to four segments of 64 KB each.
- x86 is the generic name for the Intel processors released after the original 8086
  processor.
- Since Intel’s x86 processors are backward compatible, newer x86 processors can
  run all the programs that older processors could run. However, older processors
  may not be able to run software that has been optimized for newer x86 processors.
- The 80186 is a faster version of the 8086, also with a 20-bit AB and a 16-bit DB. It was
  never widely used in computer systems.
- The 80286 was introduced in 1982. It has a 24-bit AB, which gives 16 MB of memory
  address space (2^24 = 16,777,216 bytes). The DB is still 16 bits wide. It introduced
  protected mode (some memory protection capabilities).
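The link between address-bus width and address space can be checked with a few lines of Python. This is only an illustrative sketch: the function names `address_space_bytes` and `human_readable` are made up for this example, and the bus widths are the ones quoted in this lecture.

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given bus width."""
    return 1 << address_bits


def human_readable(size_bytes):
    """Format an exact power-of-two size using binary units (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    index = 0
    while size_bytes >= 1024 and size_bytes % 1024 == 0 and index < len(units) - 1:
        size_bytes //= 1024
        index += 1
    return f"{size_bytes} {units[index]}"


# Address-bus widths quoted in the lecture (hypothetical table for this sketch).
for name, bits in [("8086", 20), ("80286", 24), ("80386", 32), ("Pentium Pro", 36)]:
    print(f"{name}: {bits}-bit AB -> {human_readable(address_space_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why going from 20 to 24 bits multiplies the address space by 16.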
8086 memory segmentation
- Segment registers: additional registers, called segment registers, generate a memory
  address when combined with other registers in the microprocessor. In the 8086,
  memory is divided into four segments as follows:
  - Code Segment (CS): the CS register is used for addressing a memory location in the
    code segment of the memory, where the executable program is stored.
  - Data Segment (DS): the DS contains most of the data used by a program. Data are accessed
    in the data segment by an offset address or by the content of another register that
    holds the offset address.
  - Stack Segment (SS): the SS defines the area of memory used for the stack.
  - Extra Segment (ES): the ES is an additional data segment that is used by some of the
    string instructions to hold the destination data.
Figure: Memory segments of 8086.
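How a segment register is "combined" with an offset can be sketched as follows. The slides do not spell out the formula, so this relies on the standard 8086 real-mode rule (the 16-bit segment value is shifted left by four bits and added to the 16-bit offset); the function name `physical_address` is hypothetical.

```python
SEGMENT_SIZE = 1 << 16   # a 16-bit offset reaches 64 KB within one segment


def physical_address(segment, offset):
    """Real-mode 8086 address: (segment * 16 + offset), truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The 8086 has only 20 address lines, so the sum wraps around at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


# Example: CS = 1234h, IP = 0010h
print(hex(physical_address(0x1234, 0x0010)))
```

Because the segment value is only shifted by four bits, different segment:offset pairs can name the same physical byte, for example 1234h:0010h and 1235h:0000h.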
Intel 80186 architecture
Figure: Architecture of Intel 80186 processor.
Intel’s 32-bit processors in computer systems
- Intel introduced its first 32-bit processor, the 80386, in 1985. It has a 32-bit AB
  and a 32-bit DB. It established Intel’s 32-bit architecture, known as IA-32. The memory
  address space grew from 16 MB to 4 GB (2^32 = 4,294,967,296 bytes). Intel
  introduced paging into the IA architecture. It also allowed segments as large as
  4 GB to be defined, which effectively allowed a flat model (i.e. effectively turning
  off segmentation).
- The Intel 80486 processor was introduced in 1989. It is an improved version of
  the 80386 (same AB and DB) that integrates the coprocessor functions for
  floating-point arithmetic on the chip. The 80486 added more parallel execution
  capability to the instruction decode and execution units to achieve a scalar execution
  rate of one instruction per clock. It has an 8 KB on-chip L1 cache and supports an L2
  cache and multiprocessing.
- The Pentium (the name 80586 was not used because a number cannot be trademarked) was
  introduced on 22 March 1993 (20th anniversary in March 2013). It is similar to the 80486
  but uses a 64-bit wide DB. Internally, it has 128- and 256-bit wide datapaths to speed up
  internal data transfers. It added a second execution pipeline to achieve superscalar
  performance, i.e. the capability to execute two instructions per clock (the
  first superscalar x86 processor). The on-chip L1 cache was doubled: 8 KB for data and 8 KB
  for instructions. Branch prediction was added. Produced using a 0.8 micron (800 nm)
  production process, the first Pentium chips were built from 3.1 million transistors
  (compared to Core i7 chips, which have 1.4 billion transistors and are fabricated
  using a 22 nm process). The first Pentium chips were introduced in 60 MHz
  and 66 MHz versions. iComp benchmark scores rate the 66 MHz Pentium at
  565, compared with 297 for the 66 MHz 486DX2, which was the fastest chip
  available prior to the Pentium launch.
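The step from a scalar to a superscalar pipeline can be put into numbers. The sketch below computes theoretical peak rates only (clock frequency times instructions per clock, using the figures quoted above); real programs stay below these bounds, and the helper names `peak_mips` and `speedup` are invented for this example.

```python
def peak_mips(clock_mhz, instructions_per_clock):
    """Theoretical peak rate in millions of instructions per second."""
    return clock_mhz * instructions_per_clock


def speedup(new, old):
    """Ratio of two performance figures."""
    return new / old


i486_peak = peak_mips(66, 1)      # 80486: one instruction per clock
pentium_peak = peak_mips(66, 2)   # Pentium: two instructions per clock

print("peak speedup: ", speedup(pentium_peak, i486_peak))
print("iComp speedup:", round(speedup(565, 297), 2))
```

At the same 66 MHz clock the peak ratio is 2, while the iComp scores quoted above give a ratio of about 1.9.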
Intel Pentium (80586) architecture
Figure: Architecture of Intel Pentium processor.
Intel Pentium Pro
- The Pentium Pro was introduced in November 1995 as Intel’s sixth-generation x86
  design, code-named P6. The P6 has a three-way superscalar architecture (three
  instructions per clock cycle). The AB was extended to 36 bits (address space
  2^36 = 68,719,476,736 bytes, i.e. 64 GB). In addition to the L1 caches provided by the
  Pentium, the Pentium Pro has a 256 KB L2 cache in the same package as the CPU.
- Powerful, but expensive.
Figure: Intel Pentium Pro: (a) package, (b) CPU and L2 cache die.
Other Pentiums ...
- The Pentium II processor was introduced in May 1997. It added
  multimedia (MMX) instructions to the Pentium Pro architecture. The L1 data and
  instruction caches were extended to 16 KB each. It also added more comprehensive
  power management features, including Sleep and Deep Sleep modes, to conserve
  power during idle times. The Pentium II abandoned the socket approach to
  microprocessors and introduced the slot concept. It contains 7.5 million
  transistors (the first P6-generation core of the Pentium Pro contained 5.5 million
  transistors). However, its L2 cache subsystem was a downgrade compared
  with the Pentium Pro’s.
Figure: (a) Intel Pentium II Deschutes; CPU core in the middle, cache on the right, (b) mobile
version of Pentium II Tonga.
Other Pentiums ...
- The Pentium III processor (Feb 1999) introduced the Streaming SIMD Extensions
  (SSE), cache prefetch instructions, and memory fences. SSE brought a single-instruction
  multiple-data (SIMD) architecture for concurrent execution of multiple
  floating-point operations. The Pentium 4 enhanced these features further.
- Code names:
  - Katmai, 250 nm, Feb 1999
  - Coppermine, 180 nm, Oct 1999 (Note: in Mar 2000 the Pentium III Coppermine became the
    first commercial x86 processor from Intel to attain a clock speed of 1 GHz)
  - Coppermine T, 180 nm, Aug 2000
  - Tualatin, 130 nm, Apr 2001
Figure: Intel Pentium III: (a) standard logo, (b) code name Coppermine.
A 64-bit processor is born
- Intel’s 64-bit Itanium processor (released in 2001; its architecture was formerly called
  IA-64) is targeted at server applications and high-performance computing systems. The
  Itanium uses a 64-bit AB to provide a substantially larger address space. Its DB is
  128 bits wide. In a major departure, Intel moved from the CISC designs used
  in its 32-bit processors to a RISC orientation for its 64-bit Itanium processors
  (Intel calls the approach EPIC, explicitly parallel instruction computing).
  - Each 128-bit instruction word contains three instructions, and the fetch mechanism can
    read up to two instruction words per clock from the L1 cache into the pipeline.
  - When the compiler can take maximum advantage of this, the processor can execute six
    instructions per clock cycle.
  - The processor has thirty functional execution units in eleven groups: 6 general-purpose
    ALUs, 2 integer units, 1 shift unit, 4 data cache units, 6 multimedia units, 2 parallel
    shift units, 1 parallel multiply, 1 population count, 2 82-bit floating-point
    multiply-accumulate units, 2 SIMD floating-point multiply-accumulate units (two 32-bit
    operations each), and 3 branch units.
  - Each unit can execute a particular subset of the instruction set, and each unit executes
    at a rate of one instruction per cycle unless execution stalls waiting for data. While
    not all units in a group execute identical subsets of the instruction set, common
    instructions can be executed in multiple units.
Figure: Intel Itanium: (a) modified logo from 2009, (b) Itanium 2 McKinley.
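The "three instructions per 128-bit word, two words per clock" arithmetic can be sketched in a few lines. The slot and template widths used here (three 41-bit instruction slots plus a 5-bit template in the low bits) come from the IA-64 bundle format and are not stated on the slides, so treat them as an added assumption; all names are hypothetical.

```python
BUNDLE_BITS = 128
SLOTS_PER_BUNDLE = 3
SLOT_BITS = 41       # assumption: IA-64 instruction slot width (not from the slides)
TEMPLATE_BITS = 5    # assumption: template field that describes the slot types


def peak_instructions_per_clock(bundles_per_clock=2):
    """Upper bound on issued instructions when every slot holds useful work."""
    return SLOTS_PER_BUNDLE * bundles_per_clock


def split_bundle(bundle):
    """Split a 128-bit bundle into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must be a 128-bit value")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(SLOTS_PER_BUNDLE)
    )
    return template, slots


print(peak_instructions_per_clock())   # two bundles of three instructions
```

The figure of six instructions per clock is a peak value: it is reached only when the compiler manages to fill all three slots of both bundles with independent instructions.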
Intel Itanium architecture
Figure: Architecture of Intel Itanium processor.
Processor registers
- The IA-32 architecture provides ten 32-bit and six 16-bit registers. These
  registers are grouped into general, control, and segment registers.
- The general registers are further divided into data, pointer, and index registers.
Figure: IA-32 data registers.
- There are four 32-bit data registers that can be used for arithmetic, logical, and
  other operations:
  - Four 32-bit registers (EAX: accumulator, EBX: base, ECX: counter, EDX: data); or
  - Four 16-bit registers (AX, BX, CX, DX); or
  - Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL).
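The three views of a data register are not separate storage: AX is the low half of EAX, and AH and AL are the two bytes of AX. A small model makes the overlap visible. This is a teaching sketch only; the class `Reg32` and its attribute names (`e`, `x`, `h`, `l`) are invented for the example.

```python
class Reg32:
    """Models one IA-32 data register (e.g. EAX) and its 16- and 8-bit views."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32-bit register (EAX)

    @property
    def x(self):                             # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                             # bits 15..8 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                             # bits 7..0 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Reg32(0x12345678)
eax.l = 0xFF                                 # like: mov al, 0FFh
print(hex(eax.e), hex(eax.x), hex(eax.h), hex(eax.l))
```

Writing to AL or AH changes only that byte; the upper 16 bits of EAX have no name of their own and are reachable only through the full 32-bit register.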
Data, pointer, and index registers
Figure: IA-32 general registers: (a) data registers, (b) pointer and index registers.
- Some registers have special functions when executing specific instructions. For
  example, when performing a multiplication operation, one of the two operands
  should be in the EAX, AX, or AL register, depending on the operand size.
  Similarly, the ECX or CX register is assumed to contain the loop count value for
  iterative instructions.
- The two index registers (ESI, EDI) play a special role in the string processing
  instructions, but can be used as general-purpose data registers as well.
- The two pointer registers (ESP, EBP) can be used as general-purpose data registers,
  but they are almost exclusively used for maintaining the stack.
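The implicit use of the accumulator in multiplication can be modelled as below. The sketch follows the behaviour of the unsigned MUL instruction (8-bit: AX = AL × source; 16-bit: DX:AX = AX × source; 32-bit: EDX:EAX = EAX × source). The function `mul` and the register dictionary are hypothetical helpers, and flag updates are left out.

```python
def mul(regs, src, size):
    """Model of unsigned MUL; `regs` is a dict with 32-bit 'eax' and 'edx' values."""
    if size == 8:
        product = (regs["eax"] & 0xFF) * (src & 0xFF)          # AL * src
        regs["eax"] = (regs["eax"] & 0xFFFF0000) | (product & 0xFFFF)   # result in AX
    elif size == 16:
        product = (regs["eax"] & 0xFFFF) * (src & 0xFFFF)      # AX * src
        regs["eax"] = (regs["eax"] & 0xFFFF0000) | (product & 0xFFFF)   # low half in AX
        regs["edx"] = (regs["edx"] & 0xFFFF0000) | (product >> 16)      # high half in DX
    elif size == 32:
        product = (regs["eax"] & 0xFFFFFFFF) * (src & 0xFFFFFFFF)       # EAX * src
        regs["eax"] = product & 0xFFFFFFFF                     # low half in EAX
        regs["edx"] = product >> 32                            # high half in EDX
    else:
        raise ValueError("operand size must be 8, 16, or 32 bits")


regs = {"eax": 0xFFFFFFFF, "edx": 0}
mul(regs, 2, 32)                       # like: mul ebx with EBX = 2
print(hex(regs["edx"]), hex(regs["eax"]))
```

Only the source operand is written in the instruction; the other operand and the destination are fixed by the operand size, which is why EDX is overwritten by 16- and 32-bit multiplications.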
Move operation examples
Table: MOV and its operands.
Machine       Destination   Source
instruction   operand       operand   Operand notes
MOV           EAX,          42h       Source is immediate data
MOV           EBX,          EDI       Both are 32-bit register data
MOV           BX,           CX        Both are 16-bit register data
MOV           DL,           BH        Both are 8-bit register data
MOV           [EBP],        EDI       Destination is 32-bit memory data at the address stored in EBP
MOV           EDX,          [ESI]     Source is 32-bit memory data at the address stored in ESI
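The operand notes in the MOV table follow a simple rule: the width of the move is set by the register operand, and two register operands must have the same width. The checker below is a simplified sketch of that rule, not an assembler: the names `classify` and `mov_width` are hypothetical, anything that is neither a register nor a bracketed address is treated as an immediate, and immediate range checks are ignored.

```python
REGISTER_BITS = {}
for names, bits in [
    (("eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"), 32),
    (("ax", "bx", "cx", "dx", "si", "di", "bp", "sp"), 16),
    (("ah", "al", "bh", "bl", "ch", "cl", "dh", "dl"), 8),
]:
    for name in names:
        REGISTER_BITS[name] = bits


def classify(operand):
    """Return (kind, width): kind is 'register', 'memory', or 'immediate'."""
    text = operand.strip().lower()
    if text.startswith("[") and text.endswith("]"):
        return "memory", None
    if text in REGISTER_BITS:
        return "register", REGISTER_BITS[text]
    return "immediate", None


def mov_width(dest, src):
    """Width in bits of `MOV dest, src`, or ValueError if the form is not valid."""
    dest_kind, dest_bits = classify(dest)
    src_kind, src_bits = classify(src)
    if dest_kind == "immediate":
        raise ValueError("destination cannot be an immediate")
    if dest_kind == "memory" and src_kind == "memory":
        raise ValueError("MOV cannot copy memory to memory")
    if dest_kind == "register" and src_kind == "register":
        if dest_bits != src_bits:
            raise ValueError("register sizes differ")
        return dest_bits
    if dest_kind == "register":
        return dest_bits
    if src_kind == "register":
        return src_bits
    raise ValueError("memory <- immediate needs an explicit size")


for dest, src in [("EAX", "42h"), ("BX", "CX"), ("DL", "BH"), ("[EBP]", "EDI")]:
    print(f"MOV {dest}, {src}: {mov_width(dest, src)}-bit move")
```

A mixed pair such as `MOV EAX, BX` is rejected by this check, matching the table where source and destination always have the same size.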
Control registers
- There are two 32-bit control registers: the instruction pointer register (EIP, or
  IP in its 16-bit form) and the flags register (EFLAGS, or FLAGS in its 16-bit form).
- The processor uses the instruction pointer register to keep track of the location of
  the next instruction to be executed (it is sometimes called the program counter
  register). The IP register is used for 16-bit addresses and the EIP register for
  32-bit addresses.
- When an instruction is fetched from memory, the instruction pointer is updated
  to point to the next instruction. This register is also modified during the
  execution of an instruction that transfers control to another location in the
  program (such as a jump, procedure call, or interrupt).
- The 16-bit FLAGS register is used when executing 8086 processor code. The EFLAGS
  register consists of 6 status flags, 1 control flag, and 10 system flags.
Figure: Flags control register EFLAGS.
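The six status flags can be illustrated by modelling an 8-bit addition. The bit positions used here (CF = 0, PF = 2, AF = 4, ZF = 6, SF = 7, OF = 11) are the standard EFLAGS layout; the functions `add8` and `decode_status_flags` are hypothetical helpers, and only the status bits are produced (reserved, control, and system bits are ignored).

```python
STATUS_FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}


def decode_status_flags(eflags):
    """Extract the six status flags from an EFLAGS value."""
    return {name: bool((eflags >> bit) & 1) for name, bit in STATUS_FLAG_BITS.items()}


def add8(a, b):
    """Model of an 8-bit ADD: returns (result, status bits packed as in EFLAGS)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                              # carry out of bit 7
        "PF": bin(result).count("1") % 2 == 0,           # even number of 1 bits
        "AF": (a & 0xF) + (b & 0xF) > 0xF,               # carry out of bit 3
        "ZF": result == 0,                               # result is zero
        "SF": bool(result & 0x80),                       # copy of the sign bit
        "OF": bool((a ^ result) & (b ^ result) & 0x80),  # signed overflow
    }
    eflags = 0
    for name, is_set in flags.items():
        if is_set:
            eflags |= 1 << STATUS_FLAG_BITS[name]
    return result, eflags


result, eflags = add8(0x7F, 0x01)      # 127 + 1 overflows the signed 8-bit range
print(hex(result), decode_status_flags(eflags))
```

The same addition sets different flags depending on how the operands are read: 7Fh + 01h produces no unsigned carry (CF = 0) but does overflow as a signed value (OF = 1).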
Segment registers
- There are six 16-bit segment registers:
  CS   Code segment
  DS   Data segment
  SS   Stack segment
  ES   Extra segment
  FS   Extra segment
  GS   Extra segment
Figure: The six segment registers support the segmented memory architecture.
- In segmented memory organization, memory is partitioned into segments, where
  each segment is a part of memory. At any time, the processor can access
  at most six segments of the main memory. The six segment registers point
  to where these segments are located in the memory.
- A program is logically divided into two parts: a code part that contains only the
  instructions, and a data part that keeps only the data. The code segment (CS)
  register points to where the program’s instructions are stored in the main
  memory, and the data segment (DS) register points to the data part of the
  program. The stack segment (SS) register points to the program’s stack segment.
- The last three segment registers (ES, FS, and GS) are additional segment registers
  that can be used in a similar way to the other segment registers.
Segmentation models
- The segments can span the entire memory address space. As a result, we can
  effectively make segmentation invisible by mapping all segment base
  addresses to zero and setting the size to 4 GB. Such a model is called a flat
  model and is used in programming environments such as UNIX and Linux.
- Another model that uses the capabilities of segmentation to the full extent is the
  multisegment model.
Figure: Segments in a multisegment model.
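Why a base of zero and a size of 4 GB make segmentation "invisible" can be seen from the address calculation: the linear address is the segment base plus the offset, so with base 0 the offset is the linear address. The sketch below is deliberately simplified (no descriptors, privilege levels, or paging), and the `Segment` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int    # linear address where the segment starts
    size: int    # segment size in bytes

    def linear(self, offset):
        """Translate an offset inside the segment into a 32-bit linear address."""
        if not 0 <= offset < self.size:
            raise ValueError("offset outside the segment")
        return (self.base + offset) & 0xFFFFFFFF


# Flat model: every segment starts at 0 and covers the whole 4 GB space.
FLAT = Segment(base=0, size=1 << 32)

# Multisegment model: each segment has its own base and size (made-up values).
DATA = Segment(base=0x00010000, size=0x1000)

print(hex(FLAT.linear(0x08048000)))   # unchanged by the translation
print(hex(DATA.linear(0x20)))         # shifted by the segment base
```

In the multisegment case the size check is what provides protection between segments; in the flat model every 32-bit offset passes it.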
Flat segmentation models
Figure: Flat segmentation models: (a) basic, (b) protected.
Assembly language programming
- Several assemblers: NASM, YASM, MASM, ...

    ; FILENAME: sandbox.asm
    section .data

    section .text
    global _start

    _start:
        nop
    ; Put your experiments between the two nops...
        mov  edx, 'WXYZ'    ; 32-bit move: EDX = 5A595857h ('W' = 57h in the low byte)
        mov  ax, 067FEh     ; 16-bit move
        mov  bx, ax         ; 16-bit move
        mov  cl, bh         ; 8-bit move
        mov  ch, bl         ; 8-bit move
        xchg cl, ch         ; exchange values cl <-> ch
    ; Put your experiments between the two nops...
        nop

    section .bss

- Makefile example:

    # recipe lines must begin with a tab character
    sandbox: sandbox.o
    	ld -o sandbox sandbox.o
    sandbox.o: sandbox.asm
    	nasm -f elf -g -F stabs sandbox.asm -l sandbox.lst

    nasm       invokes the assembler
    -f elf     specifies that the .o file will be generated in the elf format
    -g         specifies that debug information is to be included in the .o file
    -F stabs   specifies that debug information is to be generated in the stabs format
    -l         specifies that a listing file will be generated
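Before single-stepping the sandbox program in a debugger, it helps to predict the register values by hand. The following sketch traces the six instructions of the listing in Python. It is not generated by NASM or by any debugger; the function name `run_sandbox` is hypothetical, and only the register parts written by the listing are tracked.

```python
def run_sandbox():
    """Trace the sandbox listing and return the register parts it writes."""
    # mov edx, 'WXYZ' : NASM places the first character in the lowest byte.
    edx = int.from_bytes(b"WXYZ", "little")
    # mov ax, 067FEh
    ax = 0x67FE
    # mov bx, ax
    bx = ax
    bh, bl = bx >> 8, bx & 0xFF
    # mov cl, bh / mov ch, bl
    cl = bh
    ch = bl
    # xchg cl, ch
    cl, ch = ch, cl
    cx = (ch << 8) | cl
    return {"edx": edx, "ax": ax, "bx": bx, "cl": cl, "ch": ch, "cx": cx}


for name, value in run_sandbox().items():
    print(f"{name} = {value:X}h")
```

The two 8-bit moves load CX with the bytes of BX swapped, and the final XCHG swaps them back, so CX ends up equal to BX.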
Debugging tools: KDbg, gdb, . . .
Figure: Debugging example application in KDbg: (a) main window, (b) register values.