Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Adaptation par J.Bétréma CS:APP Chapter 4 Computer Architecture Instruction Set Architecture Randal E. Bryant Carnegie Mellon University http://csapp.cs.cmu.edu CS:APP Instruction Set Architecture Application Program Compiler OS Jeu d’instructions du processeur, ou plutôt du modèle de processeur. ISA CPU Design Circuit Design Chip Layout –2– CS:APP X86 Evolution: Programmer’s View Name 8086 1978 29K 1985 275K Extended to 32 bits. Added “flat addressing” Capable of running Unix Linux/gcc uses no instructions introduced in later models Pentium –3– Transistors 16-bit processor. Basis for IBM PC & DOS Limited to 1MB address space. DOS only gives you 640K 386 Date 1993 3.1M CS:APP X86 Evolution: Programmer’s View Name Pentium III –4– Transistors 1999 8.2M Added “streaming SIMD” instructions for operating on 128-bit vectors of 1, 2, or 4 byte integer or floating point data Our fish machines Pentium 4 Date 2001 42M Added 8-byte formats and 144 new instructions for streaming SIMD mode CS:APP X86 Evolution: Clones Advanced Micro Devices (AMD) Historically AMD has followed just behind Intel A little bit slower, a lot cheaper Recently Recruited top circuit designers from Digital Equipment Corp. Exploited fact that Intel distracted by IA64 Now are close competitors to Intel –5– Developing own extension to 64 bits CS:APP New Species: IA64 Name Date Transistors Itanium 2001 10M Extends to IA64, a 64-bit architecture Radically new instruction set designed for high performance Will be able to run existing IA32 programs On-board “x86 engine” Joint project with Hewlett-Packard Itanium 2 –6– 2002 221M Big performance boost CS:APP Y86 Processor State Program registers %eax %esi %ecx %edi %edx %esp %ebx %ebp Condition codes Memory OF ZF SF PC Program Registers Same 8 as with IA32. Each 32 bits Program Counter Indicates address of instruction –7– CS:APP Condition codes Program registers %eax %esi %ecx %edi %edx %esp %ebx %ebp Condition codes Memory OF ZF SF PC Single-bit flags set by arithmetic or logical instructions » OF: Overflow ZF: Zero SF:Negative Il manque CF : Carry (retenue). On travaille donc sur des entiers relatifs (“avec signe”). Correspondance avec la notation NZVC : » N = SF Z = ZF V = OF –8– CS:APP Memory Program registers %eax %esi %ecx %edi %edx %esp %ebx %ebp Condition codes Memory OF ZF SF PC Byte-addressable storage array Words stored in little-endian byte order –9– CS:APP Machine Words Machine Has “Word Size” Nominal size of integer-valued data Including addresses Most current machines are 32 bits (4 bytes) Limits addresses to 4GB Becoming too small for memory-intensive applications High-end systems are 64 bits (8 bytes) Potentially address 1.8 X 1019 bytes Machines support multiple data formats Fractions or multiples of word size Always integral number of bytes – 10 – CS:APP Word-Oriented Memory Organization 32-bit 64-bit Words Words Addresses Specify Byte Locations Address of first byte in word Addresses of successive words differ by 4 (32-bit) or 8 (64-bit) Addr = 0000 ?? Addr = 0000 ?? Addr = 0004 ?? Addr = 0008 ?? Addr = 000c ?? – 11 – Addr = 0008 ?? Bytes Addr. 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000a 000b 000c 000d 000e 000f CS:APP Byte Ordering Example Big Endian Least significant byte has highest address Little Endian Least significant byte has lowest address Example Variable n has 4-byte representation 0x01234567 Address given by &n is 0x100 Big Endian 0x100 0x101 0x102 0x103 01 Little Endian 45 67 0x100 0x101 0x102 0x103 67 – 12 – 23 45 23 01 CS:APP Y86 Instructions Format 1--6 bytes of information read from memory Can determine instruction length from first byte Not as many instruction types, and simpler encoding than with IA32 – 13 – Each accesses and modifies some part(s) of the program state PC incrémenté selon la longueur de l’instruction, pour pointer sur l’instruction suivante CS:APP Encoding Registers Each register has 4-bit ID %eax %ecx %edx %ebx 0 1 2 3 %esi %edi %esp %ebp 6 7 4 5 Same encoding as in IA32 IA32 utilise seulement 3 bits pour coder un registre Register ID 8 indicates “no register” – 14 – Will use this in our hardware design in multiple places CS:APP Instruction Example Addition Instruction Generic Form Encoded Representation addl rA, rB 6 0 rA rB Add value in register rA to that in register rB Store result in register rB Note that Y86 only allows addition to be applied to register data Set condition codes based on result e.g., addl %eax,%esi Encoding: 60 06 Two-byte encoding First indicates instruction type Second gives source and destination registers – 15 – CS:APP Arithmetic and Logical Operations Instruction Code Add addl rA, rB Function Code 6 0 rA rB Subtract (rA from rB) subl rA, rB Refer to generically as “OPl” Encodings differ only by “function code” Set condition codes as side effect Manquent (entre autres) : inc (incrémentation), dec, sh (shift = décalage) … 6 1 rA rB And andl rA, rB 6 2 rA rB Exclusive-Or xorl rA, rB – 16 – 6 3 rA rB CS:APP Arithmetic and Logical Operations (2) Dans les architectures RISC (Reduced Instruction Set Computer) les opérations portent sur 3 registres : • deux registres sources pour les opérandes • un registre destination pour le résultat Attention : ici (x86 ou Y86) le second registre sert aussi de destination, et le second opérande est donc écrasé. – 17 – CS:APP Move Operations rrmovl rA, rB Register --> Register 2 0 rA rB 3 0 8 rB V rmmovl rA, D(rB) 4 0 rA rB D 5 0 rA rB D irmovl V, rB mrmovl D(rB), rA Register --> Memory Memory --> Register Like the IA32 movl instruction Simpler format for memory addresses Give different names to keep them distinct – 18 – Immediate --> Register CS:APP Move Instruction Examples IA32 Y86 Encoding movl $0xabcd, %edx irmovl $0xabcd, %edx 30 82 cd ab 00 00 movl %esp, %ebx rrmovl %esp, %ebx 20 43 movl -12(%ebp),%ecx mrmovl -12(%ebp),%ecx 50 15 f4 ff ff ff movl %esi,0x41c(%esp) rmmovl %esi,0x41c(%esp) 40 64 1c 04 00 00 movl $0xabcd, (%eax) — movl %eax, 12(%eax,%edx) — movl (%ebp,%eax,4),%ecx — – 19 – CS:APP Load Ce sont les instructions coûteuses ! mrmovl 12(%ebp),%ecx cas général (déplacement) mrmovl (%ebp),%ecx le registre ebp sert de pointeur mrmovl adresse absolue de lecture 0x1b8,%ecx Store rmmovl %esi,0x41c(%esp) cas général (déplacement) rmmovl %esi,(%esp) le registre esp sert de pointeur rmmovl %esi,0x41c adresse absolue d’écriture – 20 – CS:APP Adressage par déplacement mrmovl 12(%ebp),%ecx cas général (déplacement) On charge le mot d’adresse ebp + 12 dans le registre ecx. Load = charger = lire . rmmovl %esi,0x41c(%esp) cas général (déplacement) On sauvegarde le registre esi à l’adresse esp + 0x41c . Store = sauve(garde)r = écrire . – 21 – CS:APP Jump Instructions Jump Unconditionally jmp Dest 7 0 Dest Refer to generically as “jXX” Dest Encodings differ only by “function code” Based on values of condition codes Same as IA32 counterparts Encode full destination address Jump When Less or Equal jle Dest 7 1 Jump When Less jl Dest 7 2 Dest Jump When Equal je Dest 7 3 Dest Jump When Not Equal jne Dest 7 4 Dest 7 5 Unlike PC-relative addressing seen in IA32 Jump When Greater or Equal jge Dest Dest Jump When Greater jg Dest – 22 – 7 6 Dest CS:APP Sauts conditionnels Jump When Less : saut effectué si le résultat de la dernière opération (addl, subl, andl, xorl) est négatif; ne pas oublier l’overflow ! • ( SF = 1 et OF = 0 ) ou ( SF = 0 et OF = 1 ) • en résumé SF OF Jump When Equal : saut effectué si le résultat de la dernière opération est nul, soit ZF = 1 Jump When Less or Equal : SF OF or ZF = 1 Négations : Jump When Greater or Equal : SF = OF Jump When Not Equal : ZF = 0 Jump When Greater : SF = OF and ZF = 0 – 23 – CS:APP Sauts (suite et fin) Un saut est effectué en modifiant PC . Les instructions rrmovl, irmovl, rmmovl, mrmovl ne modifient jamais les codes de condition. Opérations logiques andl, xorl : ZF et SF ajustés selon résultat de l’opération, OF = 0 (clear). andl %eax, %eax # ne modifie pas le registre je ... # saut effectué si eax = 0 jl ... # saut effectué si eax < 0 xorl %eax, %ebx je ... # saut effectué si eax = ebx (mais ce test écrase ebx) – 24 – CS:APP Miscellaneous Instructions 0 0 nop Don’t do anything halt – 25 – 1 0 Stop executing instructions IA32 has comparable instruction, but can’t execute it in user mode We will use it to stop the simulator CS:APP Object Code Code for sum Dans la vraie vie : Assembler Translates .s into .o Some libraries are dynamically linked 0x401040 <sum>: Binary encoding of each instruction 0x55 • Total of 13 0x89 Nearly-complete image of executable bytes 0xe5 code • Each 0x8b instruction 1, Missing linkages between code in 0x45 2, or 3 bytes different files 0x0c • Starts at 0x03 address Linker 0x45 0x401040 0x08 Resolves references between files 0x89 Combines with static run-time 0xec libraries 0x5d E.g., code for malloc, printf 0xc3 Linking occurs when program begins execution – 26 – CS:APP Turning C into Object Code Code in files p1.c p2.c Compile with command: gcc -O p1.c p2.c -o p Use optimizations (-O) Put resulting binary in file p text C program (p1.c p2.c) Compiler (gcc -S) text Asm program (p1.s p2.s) Assembler (gcc or as) binary Object program (p1.o p2.o) Static libraries (.a) Linker (gcc or ld) binary – 27 – Executable program (p) CS:APP Disassembling Object Code Disassembled 00401040 <_sum>: 0: 55 1: 89 e5 3: 8b 45 0c 6: 03 45 08 9: 89 ec b: 5d c: c3 d: 8d 76 00 push mov mov add mov pop ret lea %ebp %esp,%ebp 0xc(%ebp),%eax 0x8(%ebp),%eax %ebp,%esp %ebp 0x0(%esi),%esi Disassembler objdump -d p – 28 – Useful tool for examining object code Analyzes bit pattern of series of instructions Produces approximate rendition of assembly code Can be run on either a.out (complete executable) or .o file CS:APP Pseudo code objet Loop: Fichier source .ys mrmovl (%ecx),%esi addl %esi,%eax irmovl $4,%ebx addl %ebx,%ecx irmovl $-1,%ebx addl %ebx,%edx jne Loop Simulateur Y86 # get *Start # add to sum # Start++ # Count-# Stop when 0 Assembleur yas 0x057: 0x05d: Fichier .yo 0x05f: 0x065: 0x067: 0x06d: 0x06f: – 29 – 506100000000 6060 308304000000 6031 3083ffffffff 6032 7457000000 |Loop: | | | | | | mrmovl (%ecx),%esi addl %esi,%eax irmovl $4,%ebx addl %ebx,%ecx irmovl $-1,%ebx addl %ebx,%edx jne Loop CS:APP Fichier .yo 0x057: 0x05d: 0x05f: 0x065: 0x067: 0x06d: 0x06f: 506100000000 6060 308304000000 6031 3083ffffffff 6032 7457000000 |Loop: | | | | | | mrmovl (%ecx),%esi addl %esi,%eax irmovl $4,%ebx addl %ebx,%ecx irmovl $-1,%ebx addl %ebx,%edx jne Loop 1. Etiquette = adresse calculée par l’assembleur (vrai pour tout assembleur) 2. Fichier « exécuté » par un simulateur – 30 – CS:APP CISC Instruction Sets Complex Instruction Set Computer Dominant style through mid-80’s Stack-oriented instruction set Use stack to pass arguments, save program counter Explicit push and pop instructions Arithmetic instructions can access memory addl %eax, 12(%ebx,%ecx,4) requires memory read and write Complex address calculation Condition codes Set as side effect of arithmetic and logical instructions Philosophy – 31 – Add instructions to perform “typical” programming tasks CS:APP RISC Instruction Sets Reduced Instruction Set Computer Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley) Fewer, simpler instructions Might take more to get given task done Can execute them with small and fast hardware Register-oriented instruction set Many more (typically 32) registers Use for arguments, return pointer, temporaries Only load and store instructions can access memory Similar to Y86 mrmovl and rmmovl No Condition codes – 32 – Test instructions return 0/1 in register CS:APP MIPS Registers – 33 – $0 $0 $1 $at $2 $v0 $3 $v1 $4 $a0 $5 $a1 $6 $a2 $7 Constant 0 Reserved Temp. $16 $s0 $17 $s1 $18 $s2 $19 $s3 $20 $s4 $21 $s5 $22 $s6 $a3 $23 $s7 $8 $t0 $24 $t8 $9 $t1 $25 $t9 $10 $t2 $26 $k0 $11 $t3 $27 $k1 $12 $t4 $28 $gp $13 $t5 $29 $sp $14 $t6 $30 $s8 $15 $t7 $31 $ra Return Values Procedure arguments Caller Save Temporaries: May be overwritten by called procedures Callee Save Temporaries: May not be overwritten by called procedures Caller Save Temp Reserved for Operating Sys Global Pointer Stack Pointer Callee Save Temp Return Address CS:APP MIPS Instruction Examples R-R Op Ra addu $3,$2,$1 R-I Op Ra addu $3,$2, 3145 sll $3,$2,2 Branch Op Ra beq $3,$2,dest Load/Store Op – 34 – Ra Rb Rd 00000 Fn # Register add: $3 = $2+$1 Rb Immediate # Immediate add: $3 = $2+3145 # Shift left: $3 = $2 << 2 Rb Offset # Branch when $3 = $2 Rb Offset lw $3,16($2) # Load Word: $3 = M[$2+16] sw $3,16($2) # Store Word: M[$2+16] = $3 CS:APP CISC vs. RISC Original Debate Strong opinions! CISC proponents---easy for compiler, fewer code bytes RISC proponents---better for optimizing compilers, can make run fast with simple chip design Current Status For desktop processors, choice of ISA not a technical issue With enough hardware, can make anything run fast Code compatibility more important For embedded processors, RISC makes sense Smaller, cheaper, less power – 35 – CS:APP Summary Y86 Instruction Set Architecture Similar state and instructions as IA32 Simpler encodings Somewhere between CISC and RISC How Important is ISA Design? Less now than before With enough hardware, can make almost anything go fast Intel is moving away from IA32 Does not allow enough parallel execution Introduced IA64 » 64-bit word sizes (overcome address space limitations) » Radically different style of instruction set with explicit parallelism » Requires sophisticated compilers – 36 – CS:APP