Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
7: Basic x86 architecture Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 1 7.1: What is an instruction set architecture? Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 2 Definitions • Architecture: (also instruction set architecture: ISA) The parts of a processor design that one needs to understand to write assembly code. Examples: – instruction set specification, registers. • Microarchitecture: Implementation of the architecture. • Examples: – cache sizes and core frequency. • Example ISAs: x86, MIPS, ia64, VAX, Alpha, ARM, etc. 3 Instruction Set Architecture • Assembly Language View Application Program – Processor state • Registers, memory, … – Instructions • addl, movl, leal, … • How instructions are encoded as bytes • Layer of Abstraction – Above: how to program machine • Processor executes instructions in a sequence – Below: what needs to be built • Use variety of tricks to make it run fast • E.g., execute multiple instructions simultaneously Compiler OS ISA CPU Design Circuit Design Chip Layout 4 CISC Instruction Sets – Complex Instruction Set Computer – Dominant style through mid-80’s • Stack-oriented instruction set – Use stack to pass arguments, save program counter – Explicit push and pop instructions • Arithmetic instructions can access memory – addl %eax, 12(%ebx,%ecx,4) • requires memory read and write • Complex address calculation • Condition codes – Set as side effect of arithmetic and logical instructions • Philosophy – Add instructions to perform “typical” programming tasks 5 RISC Instruction Sets – Reduced Instruction Set Computer – Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley) • Fewer, simpler instructions – Might take more to get given task done – Can execute them with small and fast hardware • Register-oriented instruction set – Many more (typically 32) registers – Use for arguments, return pointer, temporaries • Only load and store instructions can access memory – Similar to Y86 mrmovl and rmmovl – see later! • No Condition codes – Test instructions return 0/1 in register 6 Contrast with x86 / 64-bit • Operations are highly uniform – All encoded in exactly 32 bits – All take the same time to execute (mostly) – All operate between registers, or only load/store – All operate on 64 or 32 bit quantities (nothing smaller) • No condition codes: use registers • Lots of registers, including zero – All registers are uniform 7 Other RISC features (not in Alpha) • Explicit delay slots (e.g. MIPS) – E.g. can’t use a value until 2 instructions after the load • Make most instructions conditional (e.g. ARM) – Needs condition codes (why?) – Reduces branches, increases code density • Etc. • Key message: x86 is not the only way to do this! 8 CISC vs. RISC • Original Debate – Strong opinions! – CISC proponents---easy for compiler, fewer code bytes – RISC proponents---better for optimizing compilers, can make run fast with simple chip design • Current Status – For desktop processors, choice of ISA not a technical issue • With enough hardware, can make anything run fast • Code compatibility more important – For embedded processors, RISC still makes sense • Smaller, cheaper, less power • For how much longer? 9 Comparison with MIPS (remember Digital Design?) • MIPS is RISC: Reduced Instruction Set – Motivation: simpler is faster • Fewer gates ⇒ higher frequency • Fewer gates ⇒ more transistors left for cache – Seemed like a really good idea • x86 is CISC: Complex Instruction Set – More complex instructions, addressing modes • Intel turned out to be way too good at manufacturing • Difference in gate count became too small to make a difference • x86 inside is mostly RISC anyway, decode logic is small – ⇒ Argument is mostly irrelevant these days 10 There are many architectures… • You’ve already seen MIPS 2000 → MIPS 3000 → … – Workstations, minicomputers, now mostly embedded networking • IBM S/360 → S/370 → … → zSeries – First to separate architecture from (many) implementations • ARM (several variants) – Very common in embedded systems, basis for Advanced OS course at ETHZ • IBM POWER → PowerPC (→ Cell, sort of) – Basis for all 3 last-gen games console systems • DEC Alpha – Personal favorite; killed by Compaq, team left for Intel to work on… • Intel Itanium – First 64-bit Intel product; very fast (esp. FP), hot, and expensive – Mostly overtaken by 64-bit x86 designs • etc. 11 Summary • • • • Architecture vs. Microarchitecture Instruction set architectures RISC vs. CISC x86: comparison with MIPS 12 7.2: A bit of x86 history Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 13 Intel x86 Processors • The x86 Architecture dominates the computer market • Evolutionary design – Backwards compatible up until 8086, introduced in 1978 – Added more features as time goes on • Complex instruction set computer (CISC) – Many different instructions with many different formats • But, only small subset encountered with Linux programs – Hard to match performance of Reduced Instruction Set Computers (RISC) – But, Intel has done just that! 14 Intel x86 Evolution: Milestones Name • 8086 Date Transistors 1978 29K MHz 5-10 – First 16-bit processor. Basis for IBM PC & DOS – 1MB address space • 80386 – – – – 1985 275K 16-33 First 32 bit processor , referred to as IA32 Added “flat addressing” Capable of running Unix 32-bit Linux/gcc uses no instructions introduced in later models • Pentium 4F 2005 230M 2800-3800 – First 64-bit [x86] processor – Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line 15 IntelArchitectures x86 Processors: Overview Processors X86-16 8086 286 X86-32/IA32 MMX 386 486 Pentium Pentium MMX SSE Pentium III SSE2 Pentium 4 SSE3 Pentium 4E X86-64 / EM64t Pentium 4F SSE4 Core 2 Duo Core i7 IA: often redefined as latest Intel architecture time 16 Intel x86 Processors, contd. • Machine Evolution 486 1989 Pentium 1993 Pentium/MMX ‘97 PentiumPro 1995 Pentium III 1999 Pentium 4 2001 Core 2 Duo 2006 1.9M 3.1M 74.5M 6.5M 8.2M 42M 291M • Added Features – Instructions to support multimedia operations • Parallel operations on 1, 2, and 4-byte data, both integer & FP – Instructions to enable more efficient conditional operations 17 x86 Clones: Advanced Micro Devices (AMD) • Historically – AMD has followed just behind Intel – A little bit slower, a lot cheaper • Then – Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies – Built Opteron: tough competitor to Pentium 4 – Developed x86-64, their own extension to 64 bits • Recently – Intel much quicker with dual core design – Intel currently far ahead in performance – em64t backwards compatible to x86-64 18 Intel’s 64-Bit (partially true…) • Intel Attempted Radical Shift from IA32 to IA64 – Totally different architecture (Itanium) – Executes IA32 code only as legacy – Performance disappointing • AMD Stepped in with Evolutionary Solution – x86-64 (now called “AMD64”) • Intel Felt Obligated to Focus on IA64 – Hard to admit mistake or that AMD is better • 2004: Intel Announces EM64T extension to IA32 – Extended Memory 64-bit Technology – Almost identical to x86-64! 19 Intel Nehalem-EX • Current leader (for the next few weeks) 2.3 billion transistors/die 8 or 10 cores per die 2 threads per core Up to 8 packages (= 128 contexts!) – 4 memory channels per package – Virtualization support – etc. – – – – • Good illustration of why it is hard to teach state-of-the-art processor design! 20 Intel Single-Chip Cloud Computer - 2010 • Experimental processor (only a few 100 made) – Designed for research – Working version in our Lab • 48 old-style Pentium cores • Very fast interconnection network – Hardware support for messaging between cores – Variable speed of network • Non-cache coherent – Sharing memory between cores won’t work with a conventional OS! 21 A quick note on syntax There are two common ways to write x86 Assembler: • AT&T syntax – What we'll use in this course, common on Unix • Intel syntax – Generally used for Windows machines 22 7.3: Basics of machine code Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 23 Assembly programmer’s view Memory CPU Addresses PC Registers Condition Codes Data Instructions Object Code Program Data OS Data Stack Programmer-Visible State – PC: Program counter • Address of next instruction • Called “EIP” (IA32) or “RIP” (x86-64) – Register file • Heavily used program data – Condition codes • Store status information about most recent arithmetic operation • Used for conditional branching Memory • Byte addressable array • Code, user data, (some) OS data • Includes stack used to support procedures 24 Compiling into assembly C code int sum(int x, int y) { int t = x+y; return t; } Generated ia32 assembly sum: pushl %ebp movl %esp,%ebp movl 12(%ebp),%eax addl 8(%ebp),%eax movl %ebp,%esp popl %ebp ret Obtain with command gcc -O -S code.c Produces file code.s Some compilers use single instruction “leave” 25 Assembly data types • “Integer” data of 1, 2, or 4 bytes – Data values – Addresses (untyped pointers) • Floating point data of 4, 8, or 10 bytes • No aggregate types such as arrays or structures – Just contiguously allocated bytes in memory 26 Assembly code operations • Perform arithmetic function on register or memory data • Transfer data between memory and register – Load data from memory into register – Store register data into memory • Transfer control – Unconditional jumps to/from procedures – Conditional branches 27 Object code Code for sum 0x401040 <sum>: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 • Total of 13 bytes 0x89 0xec • Each instruction 1, 2, or 3 bytes 0x5d 0xc3 • Starts at address 0x401040 • Assembler – Translates .s into .o – Binary encoding of each instruction – Nearly-complete image of executable code – Missing linkages between code in different files • Linker – Resolves references between files – Combines with static run-time libraries • E.g., code for malloc, printf – Some libraries are dynamically linked • Linking occurs when program begins execution 28 Machine instruction example • C Code int t = x+y; – Add two signed integers addl 8(%ebp),%eax Similar to expression: x += y – Add 2 4-byte integers • “Long” words in GCC parlance • Same instruction whether signed or unsigned – Operands: More precisely: int eax; int *ebp; eax += ebp[2] 0x401046: • Assembly 03 45 08 • x: • y: • t: Register Memory Register %eax M[%ebp+8] %eax – Return function value in %eax • Object Code – 3-byte instruction – Stored at address 0x401046 29 Disassembling object code Disassembled 00401040 <_sum>: 0: 55 1: 89 e5 3: 8b 45 0c 6: 03 45 08 9: 89 ec b: 5d c: c3 d: 8d 76 00 push mov mov add mov pop ret lea %ebp %esp,%ebp 0xc(%ebp),%eax 0x8(%ebp),%eax %ebp,%esp %ebp 0x0(%esi),%esi • Disassembler – – – – – objdump -d p Useful tool for examining object code Analyzes bit pattern of series of instructions Produces approximate rendition of assembly code Can be run on either a.out (complete executable) or .o file 30 Alternate disassembly Disassembled Object 0x401040: 0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03 0x45 0x08 0x89 0xec 0x5d 0xc3 0x401040 0x401041 0x401043 0x401046 0x401049 0x40104b 0x40104c 0x40104d <sum>: <sum+1>: <sum+3>: <sum+6>: <sum+9>: <sum+11>: <sum+12>: <sum+13>: push mov mov add mov pop ret lea %ebp %esp,%ebp 0xc(%ebp),%eax 0x8(%ebp),%eax %ebp,%esp %ebp 0x0(%esi),%esi Within gdb Debugger – gdb p – disassemble sum • Disassemble procedure – x/13b sum • Examine the 13 bytes starting at sum 31 What can be disassembled? % objdump -d WINWORD.EXE WINWORD.EXE: file format pei-i386 No symbols in "WINWORD.EXE". Disassembly of section .text: 30001000 <.text>: 30001000: 55 30001001: 8b ec 30001003: 6a ff 30001005: 68 90 10 00 30 3000100a: 68 91 dc 4c 30 push mov push push push %ebp %esp,%ebp $0xffffffff $0x30001090 $0x304cdc91 • Anything that can be interpreted as executable code • Disassembler examines bytes and reconstructs assembly source 32 Summary • • • • Compiling into assembly Data types in assembly Assembly code operations Object code, and disassembling it 33 7.4: 32-bit x86 architecture Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 34 general purpose Integer registers (ia32) %eax %ax %ah %al accumulate %ecx %cx %ch %cl counter %edx %dx %dh %dl data %ebx %bx %bh %bl base %esi %si source index %edi %di destination index %esp %sp stack pointer %ebp %bp base pointer 16-bit virtual registers (backwards compatibility) Origin (mostly obsolete) 35 Moving data: ia32 • movx Source, Dest – x in {b, w, l} %eax %ecx %edx %ebx %esi – movl Source, Dest: Move 4-byte “long word” – movw Source, Dest: Move 2-byte “word” %edi %esp %ebp – movb Source, Dest: Move 1-byte “byte” • Lots of these in typical code 36 Moving data: ia32 %eax %ecx movl Source, Dest: %edx • Operand Types %ebx – Immediate: Constant integer data • Example: $0x400, $-533 • Like C constant, but prefixed with ‘$’ • Encoded with 1, 2, or 4 bytes – Register: One of 8 integer registers %esi %edi %esp %ebp • Example: %eax, %edx • But %esp and %ebp reserved for special use • Others have special uses for particular instructions – Memory: 4 consecutive bytes of memory at address given by register • Simplest example: (%eax) • Various other “address modes” 37 movl operand combinations Source movl Dest Src,Dest C Analog Imm Reg movl $0x4,%eax Mem movl $-147,(%eax) temp = 0x4; Reg Reg movl %eax,%edx Mem movl %eax,(%edx) temp2 = temp1; Mem Reg temp = *p; movl (%eax),%edx *p = -147; *p = temp; Cannot do memory-memory transfer with a single instruction 38 Simple memory addressing modes • Normal (R) Mem[Reg[R]] – Register R specifies memory address movl (%ecx),%eax • Displacement D(R) Mem[Reg[R]+D] – Register R specifies start of memory region – Constant displacement D specifies offset movl 8(%ebp),%edx 39 Using simple addressing modes void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } swap: pushl %ebp movl %esp,%ebp pushl %ebx movl movl movl movl movl movl 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp ret Set Up Body Finish 40 Using simple addressing modes void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } swap: pushl %ebp movl %esp,%ebp pushl %ebx movl movl movl movl movl movl 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp ret Set Up Body Finish 41 Understanding swap void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } Register %ecx %edx %eax %ebx Value yp xp t1 t0 movl movl movl movl movl movl Offset 12 yp 8 xp 4 Rtn adr 0 Old %ebp -4 Old %ebx 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) Stack • • • # # # # # # ecx edx eax ebx *xp *yp (in memory) %ebp = = = = = = yp xp *yp (t1) *xp (t0) eax 42 ebx Understanding swap Register file %edx 0x124 %ecx 0x120 123 456 %ebx Offset yp 12 0x120 xp 8 0x124 4 Rtn adr %epb → 0 -4 %esi %edi %esp %ebp 0x104 movl movl movl movl movl movl 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) # # # # # # ecx edx eax ebx *xp *yp = = = = = = yp xp *yp (t1) *xp (t0) eax ebx 0x124 0x120 0x11c 0x118 0x114 0x110 0x10c 0x108 0x104 0x100 Memory Address %eax 43 %eax 456 %edx 0x124 %ecx 0x120 Address 123 456 %ebx Offset yp 12 0x120 xp 8 0x124 4 Rtn adr %epb → 0 -4 %esi %edi %esp %ebp 0x104 movl movl movl movl movl movl 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) # # # # # # ecx edx eax ebx *xp *yp = = = = = = yp xp *yp (t1) *xp (t0) eax ebx 0x124 0x120 0x11c 0x118 0x114 0x110 0x10c 0x108 0x104 0x100 Memory Register file Understanding swap 44 %eax 456 %edx 0x124 %ecx 0x120 %ebx 123 Address 123 456 Offset yp 12 0x120 xp 8 0x124 4 Rtn adr %epb → 0 -4 %esi %edi %esp %ebp 0x104 movl movl movl movl movl movl 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) # # # # # # ecx edx eax ebx *xp *yp = = = = = = yp xp *yp (t1) *xp (t0) eax ebx 0x124 0x120 0x11c 0x118 0x114 0x110 0x10c 0x108 0x104 0x100 Memory Register file Understanding swap 45 Complete memory addressing modes • Most General Form: D(Rb,Ri,S) – D: – Rb: – Ri: Mem[Reg[Rb]+S*Reg[Ri]+ D] Constant “displacement” 1, 2, or 4 bytes Base register: Any of 8 integer registers Index register: Any, except for %esp • Unlikely you’d use %ebp, either – S: Scale: 1, 2, 4, or 8 (why these numbers?) • Special Cases (Rb,Ri) (Rb,Ri) D(Rb,Ri) D(Rb,Ri) (Rb,Ri,S) (Rb,Ri,S) Mem[Reg[Rb]+Reg[Ri]] Mem[Reg[Rb]+Reg[Ri]] Mem[Reg[Rb]+Reg[Ri]+D] Mem[Reg[Rb]+Reg[Ri]+D] Mem[Reg[Rb]+S*Reg[Ri]] Mem[Reg[Rb]+S*Reg[Ri]] 46 Address computation examples %edx 0xf000 %ecx 0x100 Expression 0x8(%edx) (%edx,%ecx) (%edx,%ecx,4) 0x80(,%edx,2) Address Computation Address 0xf000 + 0x8 0xf008 0xf000 + 0x100 0xf100 0xf000 + 4*0x100 0xf400 2*0xf000 + 0x80 0x1e080 47 Address computation instruction • leal Src,Dest – Src is address mode expression – Set Dest to address denoted by expression • Uses – Computing addresses without a memory reference • E.g., translation of p = &x[i]; – Computing arithmetic expressions of the form x + k*y • k = 1, 2, 4, or 8 48 Summary • 32-bit x86 registers • mov instruction: loads and stores • memory addressing modes – Example: swap() • leal: address computation 49 7.5: ia32 integer arithmetic Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 50 Some arithmetic operations • Two operand instructions: Format addl subl imull sall sarl shrl xorl andl orl Src,Dest Src,Dest Src,Dest Src,Dest Src,Dest Src,Dest Src,Dest Src,Dest Src,Dest Computation Dest ← Dest + Src Dest ← Dest - Src Dest ← Dest * Src Dest ← Dest << Src Dest ← Dest >> Src Dest ← Dest >> Src Dest ← Dest ^ Src Dest ← Dest & Src Dest ← Dest | Src Also called shll Arithmetic Logical • No distinction between signed and unsigned int (why?) 51 Some arithmetic operations • One operand instructions Format incl Dest decl Dest negl Dest notl Dest Computation Dest ← Dest + 1 Dest ← Dest - 1 Dest ← -Dest Dest ← ~Dest • See book for more instructions 52 Using leal for arithmetic expressions int arith (int x, int y, int z) { int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval; } arith: pushl %ebp movl %esp,%ebp movl 8(%ebp),%eax movl 12(%ebp),%edx leal (%edx,%eax),%ecx leal (%edx,%edx,2),%edx sall $4,%edx addl 16(%ebp),%ecx leal 4(%edx,%eax),%eax imull %ecx,%eax movl %ebp,%esp popl %ebp ret Set Up Body Finish 53 Understanding arith int arith (int x, int y, int z) { int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval; } movl 8(%ebp),%eax movl 12(%ebp),%edx leal (%edx,%eax),%ecx leal (%edx,%edx,2),%edx sall $4,%edx addl 16(%ebp),%ecx leal 4(%edx,%eax),%eax imull %ecx,%eax # # # # # # # # eax edx ecx edx edx ecx eax eax = = = = = = = = Offset • • • 16 z z 12 y y 8 x x 4 Rtn adr 0 Old Old %ebp %ebp x y x+y (t1) 3*y 48*y (t4) z+t1 (t2) 4+t4+x (t5) t5*t2 (rval) Stack %ebp 54 Another example int logical(int x, int y) { int t1 = x^y; int t2 = t1 >> 17; int mask = (1<<13) - 7; int rval = t2 & mask; return rval; } logical: pushl %ebp movl %esp,%ebp movl xorl sarl andl movl %ebp,%esp popl %ebp ret 213 = 8192, 213 – 7 = 8185 movl xorl sarl andl 8(%ebp),%eax 12(%ebp),%eax $17,%eax $8185,%eax 8(%ebp),%eax 12(%ebp),%eax $17,%eax $8185,%eax # # # # eax eax eax eax = = = = Setup Body Finish x x^y (t1) t1>>17 (t2) t2 & 8185 55 7.6: 64-bit x86 architecture Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 56 Data representations: ia32 and x86-64 C data type Typical 32-bit ia32 Intel x86-64 char 1 1 1 short 2 2 2 int 4 4 4 long 4 4 8 long long 8 8 8 float 4 4 4 double 8 8 8 long double 8 10/12 10/16 char * 4 4 8 (or any other pointer) Sizes of C objects (in bytes) 57 x86-64 integer registers %rax %eax %r8 %r8d %rbx %ebx %r9 %r9d %rcx %ecx %r10 %r10d %rdx %edx %r11 %r11d %rsi %esi %r12 %r12d %rdi %edi %r13 %r13d %rsp %esp %r14 %r14d %rbp %ebp %r15 %r15d – Extend existing registers. Add 8 new ones. – Make %ebp/%rbp general purpose 58 Instructions • Long word l (4 Bytes) ↔ Quad word q (8 Bytes) • New instructions: – – – – movl → movq addl → addq sall → salq etc. • 32-bit instructions that generate 32-bit results – Set higher order bits of destination register to 0 – Example: addl 59 Swap in 32-bit mode void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } swap: pushl %ebp movl %esp,%ebp pushl %ebx movl movl movl movl movl movl 12(%ebp),%ecx 8(%ebp),%edx (%ecx),%eax (%edx),%ebx %eax,(%edx) %ebx,(%ecx) movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp ret Setup Body Finish 60 Swap in 64-bit Mode void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } swap: movl movl movl movl retq (%rdi), %edx (%rsi), %eax %eax, (%rdi) %edx, (%rsi) • Operands passed in registers (why useful?) – First (xp) in %rdi, second (yp) in %rsi – 64-bit pointers • No stack operations required • 32-bit data – Data held in registers %eax and %edx – movl operation 61 Swap Long Ints in 64-bit Mode void swap_l (long int *xp, long int *yp) { long int t0 = *xp; long int t1 = *yp; *xp = t1; *yp = t0; } swap_l: movq movq movq movq retq (%rdi), %rdx (%rsi), %rax %rax, (%rdi) %rdx, (%rsi) • 64-bit data – Data held in registers %rax and %rdx – movq operation – “q” stands for quad-word 62 7.7: Condition codes Computer Architecture and Systems Programming 252-0061-00, Herbstsemester 2013 Timothy Roscoe 63 Processor State (ia32, Partial) • Information about currently executing program – Temporary data ( %eax, … ) – Location of runtime stack ( %ebp,%esp ) – Location of current code control point ( %eip, … ) – Status of recent tests ( CF,ZF,SF,OF) %eax %ecx %edx General purpose registers %ebx %esi %edi %esp Current stack top %ebp Current stack frame %eip Instruction pointer CF ZF SF OF Condition codes 64 Condition codes (implicit setting) • Single bit registers CF Carry Flag (for unsigned) ZF Zero Flag SF Sign Flag (for signed) OF Overflow Flag (for signed) • Implicitly set (think of it as side effect) by arithmetic operations – – – – Example: addl/addq Src,Dest ↔ t = a+b CF set if carry out from most significant bit (unsigned overflow) ZF set if t == 0 SF set if t < 0 (as signed) OF set if two’s complement (signed) overflow (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0) • Not set by lea instruction • Full documentation link on course website 65 Condition Codes (Explicit Setting: Compare) • Explicit Setting by Compare Instruction cmpl/cmpq Src2,Src1 cmpl b,a like computing a-b without setting destination CF set if carry out from most significant bit (used for unsigned comparisons) ZF set if a == b SF set if (a-b) < 0 (as signed) OF set if two’s complement (signed) overflow: (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0) 66 Condition Codes (Explicit Setting: Test) • Explicit Setting by Test instruction testl/testq Src2,Src1 testl b,a like computing a&b w/o setting destination – Sets condition codes based on value of Src1 & Src2 – Useful to have one of the operands be a mask ZF set when a&b == 0 SF set when a&b < 0 67 Reading Condition Codes • SetX Instructions – Set single byte based on combinations of condition codes SetX Condition Description sete setne sets setns setg setge setl setle seta setb ZF ~ZF SF ~SF ~(SF^OF)&~ZF ~(SF^OF) (SF^OF) (SF^OF)|ZF ~CF&~ZF CF Equal / Zero Not Equal / Not Zero Negative Nonnegative Greater (Signed) Greater or Equal (Signed) Less (Signed) Less or Equal (Signed) Above (unsigned) Below (unsigned) 68 Reading Condition Codes (Cont.) • setx Instructions: Set single byte based on combination of condition codes • One of 8 addressable byte registers – Does not alter remaining 3 bytes – Typically use movzbl to finish job int gt (int x, int y) { return x > y; } %eax %ah %al %ecx %ch %cl %edx %dh %dl %ebx %bh %bl %esi %edi %esp %ebp Body movl 12(%ebp),%eax cmpl %eax,8(%ebp) setg %al movzbl %al,%eax # # # # eax = y Compare x : y al = x > y Zero rest of %eax 69 Reading Condition Codes: x86-64 • setx Instructions: – Set single byte based on combination of condition codes – Does not alter remaining 3 bytes int gt (long x, long y) { return x > y; } long lgt (long x, long y) { return x > y; } Body (same for both) xorl %eax, %eax cmpq %rsi, %rdi setg %al # eax = 0 # Compare x and y # al = x > y Is %rax zero? Yes: 32-bit instructions set high order 32 bits to 0! 70 Jumping jX Instructions: Jump to different part of code depending on condition codes jX Condition Description jmp 1 Unconditional je ZF Equal / Zero jne ~ZF Not Equal / Not Zero js SF Negative jns ~SF Non-negative jg ~(SF^OF)&~ZF Greater (Signed) jge ~(SF^OF) Greater or Equal (Signed) jl (SF^OF) Less (Signed) jle (SF^OF)|ZF Less or Equal (Signed) ja ~CF&~ZF Above (unsigned) jb CF Below (unsigned) 71 Summary • Condition codes (C, Z, S, O) • Explicit setting of condition codes – Compare – Test • Reading condition codes – setX • Jumps 72