Chapter 3
Instructions: Language of the Machine

Note: The slides being presented represent a mix. Some are created by Mark Franklin, Washington University in St. Louis, Dept. of CSE. Many are taken from the Patterson & Hennessy book, “Computer Organization & Design”, Copyright 1998 Morgan Kaufmann Publishers. This material may not be copied or distributed for commercial purposes without express written permission of the copyright holder. The original slides may be found at:
http://books.elsevier.com/us//mk/us/subindex.asp?maintarget=companions/defaultindividual.asp&isbn=1558604286&country=United+States&srccode=&ref=&subcode=&head=&pdf=&basiccode=&txtSearch=&SearchField=&operator=&order=&community=mk

Instructions:
• Language of the Machine
• More primitive than higher level languages, e.g., no sophisticated control flow
• Very restrictive, e.g., MIPS Arithmetic Instructions
• We will use the MIPS instruction set architecture
  – similar to other architectures developed since the 1980's
  – used by NEC, Nintendo, Silicon Graphics, Sony
Instruction Set Design goals: maximize performance and minimize cost, reduce design time

MIPS arithmetic
• All arithmetic instructions perform one operation and must have 3 operands.
• Operand order is fixed (destination first, sources next).
• Design principle: regularity implies simplicity of design.
• Only one line per instruction (assembler constraint).
• # indicates start of comment.
Example:
  C code:     A = B + C
  MIPS code:  add $s0, $s1, $s2     # $s0 ← $s1 + $s2
                                    # $s0, $s1, $s2 are registers
                                    # all arithmetic operations use registers

MIPS arithmetic
• Hardware simplicity can add to programming complexity
  C code:     A = B + C + D;
              E = F - A;
  MIPS code:  add $t0, $s1, $s2     # $s1=B, $s2=C
              add $s0, $t0, $s3     # $s3=D, $s0=A
              sub $s4, $s5, $s0     # $s5=F, $s4=E
• Operands must be registers, only 32 registers provided
• Design Principle: smaller is faster. Why?

Registers vs. Memory
• Arithmetic instruction operands must be registers; only 32 registers are provided (0 to 31).
• Registers are 32 bits in length. A 32 bit chunk = “word”.
• Compiler associates variables with registers.
• What about programs with lots of variables (i.e., > 32)?
[Figure: computer organization: Processor (Control, Datapath), Memory, Input, Output (I/O)]

Memory Organization
• Memory is viewed as a large, single-dimension array.
• A memory address is an index into the array.
• "Byte addressing" means that the address points to a byte of memory.
  Address   (binary)   Contents
  0         0000       8 bits of data
  1         0001       8 bits of data
  2         0010       8 bits of data
  3         0011       8 bits of data
  4         0100       8 bits of data
  5         0101       8 bits of data
  6         0110       8 bits of data
  ...

Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
  Address   Contents
  0         32 bits of data
  4         32 bits of data
  8         32 bits of data
  12        32 bits of data
  ...
  Registers hold 32 bits of data
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
• Words are aligned, i.e., what are the least 2 significant bits of a word address?

Instructions
• Load and store instructions
• Example: A is an array 100 words in length, h has been loaded into register $s0, and $s3 contains the base address of the array.
  C code:     A[8] = h + A[8];
  MIPS code:  lw  $t0, 32($s3)      # 32 = 8 * 4 bytes
              add $t0, $s0, $t0
              sw  $t0, 32($s3)      # sw = store word
• Remember arithmetic operands are registers, not memory!

Another Example
• Can we figure out the code?
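A hint for working examples like this one: with byte addressing and 4-byte words, element i of a word array lives at base + 4*i, and an aligned word address has its two least significant bits equal to zero. The sketch below is illustrative Python rather than MIPS, and the names `word_address` and `is_word_aligned` are made up for this note.

```python
WORD_BYTES = 4  # a MIPS word is 32 bits = 4 bytes


def word_address(base: int, index: int) -> int:
    """Byte address of element `index` in a word array starting at `base`."""
    return base + index * WORD_BYTES


def is_word_aligned(address: int) -> bool:
    """An aligned word address has its two least significant bits clear."""
    return address & 0b11 == 0


# A[8] sits 32 bytes past the base, matching the offset in `lw $t0, 32($s3)`.
offset_of_a8 = word_address(0, 8)

# The last aligned word in a 2**32-byte memory starts at 2**32 - 4.
last_word = 2**32 - WORD_BYTES
```

The multiplication by 4 is the same step a compiler has to emit before it can index an array of words with a register.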
  swap(int v[], int k)
  { int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  }

  swap: muli $2, $5, 4
        add  $2, $4, $2
        lw   $15, 0($2)
        lw   $16, 4($2)
        sw   $16, 0($2)
        sw   $15, 4($2)
        jr   $31

So far:
• MIPS
  – loading words but addressing bytes
  – arithmetic on registers only
• Instruction               Meaning
  add $s1, $s2, $s3         $s1 = $s2 + $s3
  sub $s1, $s2, $s3         $s1 = $s2 - $s3
  lw  $s1, 100($s2)         $s1 = Memory[$s2+100]
  sw  $s1, 100($s2)         Memory[$s2+100] = $s1

Machine Language
• Instructions, like registers and words of data, are also 32 bits long
  – Example: add $t0, $s1, $s2
  – registers have numbers, $t0=8, $s1=17, $s2=18
  – $t0 → $t7 map to registers 8 → 15, $s0 → $s7 map to registers 16 → 23
• Instruction Format (Register or R-Type)
      op       rs       rt       rd       shamt    funct
      000000   10001    10010    01000    00000    100000
      6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
• All instructions are the same length (fixed length vs variable length)
• Can you guess what the field names stand for?

Machine Language
• Consider the load-word and store-word instructions,
  – What would the regularity principle have us do?
  – New principle: Good design demands a compromise
• Introduce a new type of instruction format
  – I-type for data transfer instructions
• Example: lw $t0, 32($s2)      # $t0 (reg 8) ← A[8], $s2 (reg 18)
      op       rs       rt       16 bit number
      35       18       8        32
• Where's the compromise?

Instruction Formats (so far)
  Instruction         Format   op   rs    rt    rd     shamt   funct   address
  add   (add)         R        0    reg   reg   reg    0       32      n.a.
  sub   (subtract)    R        0    reg   reg   reg    0       34      n.a.
  lw    (load word)   I        35   reg   reg   n.a.   n.a.    n.a.    address
  sw    (store word)  I        43   reg   reg   n.a.   n.a.    n.a.    address

Stored Program Concept
• Instructions are bits
• Programs are stored in memory (read or written just like data)
  [Figure: Processor connected to Memory; memory for data, programs, compilers, editors, etc.]
• Fetch & Execute Cycle
  – The PC (Program Counter) is a register that holds the memory address of the currently executing instruction.
    Generally (no branch or jump), on each instruction execution PC ← PC + 4
  – Instructions are fetched and put into a special register (IR: Inst. Reg.)
  – Bits in the register "control" the instruction execution.
  – When complete, the “next” instruction is “fetched” (using PC) and the newly fetched instruction is executed …

Processor Organization
[Figure: processor organization. Labels: ON-Chip Processor; OFF-Chip; Control (PC ← PC+1, PC ← Br.Addr); Registers (PC: Program Counter, IR: Instruction Reg., Other Registers); Arithmetic and Logic; Instruction Memory; Data Memory; Other Stuff]

Now for Control Instructions
• Decision making instructions: alter the control flow,
  – Instead of PC ← PC + 4, PC ← (Branch or Jump Address)
• MIPS conditional branch instructions:
      bne $s0, $s1, Label      # go to address “Label” if the contents of $s0
                               # are not equal to the contents of $s1
      beq $s0, $s1, Label
• Example: if (i==j) h = i + j;
             bne $s0, $s1, Label
             add $s3, $s0, $s1     # h result in $s3
      Label: ....

Control
• MIPS unconditional branch instructions:
      j label
• Example: if (i!=j) h=i+j; else h=i-j;
            beq $s4, $s5, Lab1
            add $s3, $s4, $s5
            j   Lab2
      Lab1: sub $s3, $s4, $s5
      Lab2: ...
• Can you build a simple for loop?

Addresses in Branches
• Instructions:
      bne $t4,$t5,Label      Next instruction is at Label if $t4 ≠ $t5
      beq $t4,$t5,Label      Next instruction is at Label if $t4 = $t5
• Formats:
      I    op    rs    rt    16 bit address
• Could specify a register (like lw and sw) and add it to address
  – use Instruction Address Register (PC = program counter): i.e., PC ← PC + (16 bits in label field of branch).
  – Label field is a signed number (branching forward & backward).
  – most branches are local (principle of locality).
• Jump instructions just use high order bits of PC
  – address boundaries of 256 MB

So far:
• Instruction               Meaning
  add $s1,$s2,$s3           $s1 = $s2 + $s3
  sub $s1,$s2,$s3           $s1 = $s2 - $s3
  lw  $s1,100($s2)          $s1 = Memory[$s2+100]
  sw  $s1,100($s2)          Memory[$s2+100] = $s1
  bne $s4,$s5,L             Next instr. is at Label if $s4 != $s5
  beq $s4,$s5,L             Next instr. is at Label if $s4 == $s5
  j   Label                 Next instr. is at Label
• Formats:
      R    op    rs    rt    rd    shamt    funct
      I    op    rs    rt    16 bit address
      J    op    26 bit address

More on Control Flow
• We have: beq, bne, what about Branch-if-less-than?
• New instruction (Set Less Than, slt):
      slt $t0, $s1, $s2      if $s1 < $s2 then $t0 = 1 else $t0 = 0
• Can use this instruction to build "blt $s1, $s2, Label"
• Can build general control structures and use them in assembly language as “pseudoinstructions”, which expand into a set of machine instructions.

Policy of Use Conventions
  Name        Register number   Usage
  $zero       0                 the constant value 0
  $v0-$v1     2-3               values for proc results & expression eval.
  $a0-$a3     4-7               arguments to pass to procedures
  $t0-$t7     8-15              temporaries
  $s0-$s7     16-23             saved
  $t8-$t9     24-25             more temporaries
  $gp         28                global pointer
  $sp         29                stack pointer
  $fp         30                frame pointer
  $ra         31                return address on procedure return
$at, reg 1; reserved for assembler. $k0, $k1, reg 26, 27; reserved for OS.

Constants
• Small constants are used quite frequently (50% of operands)
  e.g., A = A + 5;
        B = B + 1;
        C = C - 18;
• Solutions? Why not?
  – put 'typical constants' in memory and load them.
  – create hard-wired registers (like $zero) for constants like one.
• MIPS Instructions:
      addi $29, $29, 4
      slti $8, $18, 10
      andi $29, $29, 6
      ori  $29, $29, 4
• How do we make this work?

Immediate Instructions
  addi $sp, $sp, 4        # Add immediate
      op       rs       rt       immediate
      8        29       29       4
      001000   11101    11101    0000 0000 0000 0100
  lui $t0, 255            # Load upper immediate; $t0 = reg 8, rs = 0
      op       rs       rt       immediate
      001111   00000    01000    0000 0000 1111 1111
  $t0 after executing instruction:
      0000 0000 1111 1111 0000 0000 0000 0000

How about larger constants?
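The register effect of lui in the example above, and the way an ori can then fill in the lower half, can be modeled in a few lines. This is a minimal Python sketch, not MIPS code; the function names and the `load_constant` helper are illustrative assumptions, and the model assumes that ori zero-extends its 16-bit immediate.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def lui(imm16: int) -> int:
    """Load upper immediate: imm16 goes in bits 31..16, the low half is zero."""
    return ((imm16 & 0xFFFF) << 16) & MASK32


def ori(reg: int, imm16: int) -> int:
    """OR immediate: the 16-bit immediate is zero-extended before the OR."""
    return (reg | (imm16 & 0xFFFF)) & MASK32


def load_constant(value: int) -> int:
    """Build a 32-bit constant the way a lui/ori pair would (hypothetical helper)."""
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return ori(lui(upper), lower)
```

For instance, `lui(255)` models `lui $t0, 255` and yields 0x00FF0000, the bit pattern shown for $t0 above.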
• We'd like to be able to load a 32 bit constant into a register
• Must use two instructions; new "load upper immediate" instruction:
      lui $t0, 1010101010101010
      $t0:   1010101010101010 0000000000000000     (lower half filled with zeros)
• Then must get the lower order bits right, i.e.,
      ori $t0, $t0, 1010101010101010

             1010101010101010 0000000000000000
      ori    0000000000000000 1010101010101010
             ---------------------------------
             1010101010101010 1010101010101010
• Can also use addi (which sign-extends its immediate, so this works directly only when the top bit of the lower half is 0)

Procedure/subroutine Calls & Returns
The PC (Program Counter) is a register that holds the memory address of the currently executing instruction. It is incremented by 4 bytes each time a non-jump/branch instruction is executed. It is changed to the jump/branch address when required.
Calling and returning from a procedure:
  PC contents   Instruction
  X             inst 1
  X + 4         inst 2
  X + 8         jal 2500                      # $ra (reg 31) ← PC + 4 = X + 12
  X + 12        etc.
  ...
  2500          instructions for procedure
  …
  …             jr $ra                        # end of procedure: PC ← $ra, PC ← X + 12

Memory Usage
Memory is typically divided so that certain areas are reserved for certain usage. On MIPS processors the convention is as follows:
  7fffffff hex   Stack segment
                 Dynamic data    (Data segment)
  10000000 hex   Static data     (Data segment)
    400000 hex   Text segment
                 Reserved

Loading a Register With an Address > 16-bits
• Direct Loading: Say you want the word at address 0x10010020 → $v0
      lui $s0, 0x1001           # upper half of $s0 ← 0x1001
      lw  $v0, 0x0020($s0)      # $v0 ← Memory[0x10010000 + 0x0020]
• Using Global Register $gp: Many MIPS systems, on startup, load $gp with 0x10008000.
      lw  $v0, 0x8020($gp)      # $v0 ← Memory[0x10010020], 0x10010020 = 0x8020 + 0x10008000
• Use Pseudo Instruction, la (Load Address):
              .data                                   # default starting address for data: 0x10010000
      iarray: .word 8, 128, 33, 279, 55, 10345, 5235  # array of ints (decimal)
              .text
      main:   la  $s4, iarray                         # translates into lui
              lw  $s1, 0($s4)
              add $s2, $s1, $s1
      end:    jr  $ra

Example using stack and procedure return
• Copy byte string Y to X; the string terminates with a null character.
• Base Addresses: Y in $a1, X in $a0; string pointer will be in $s0
• Since $s0 is a saved register, it must be saved on the stack prior to use
  Strcpy: addi $sp, $sp, -4        # adjust stack pointer ("push");
                                   # stack grows down in memory
          sw   $s0, 0($sp)         # save $s0 on the stack
          add  $s0, $zero, $zero   # string pointer set to zero
  L1:     add  $t1, $a1, $s0       # $t1 ← addr of Y string byte
          lb   $t2, 0($t1)         # $t2 ← Y string byte
          add  $t3, $a0, $s0       # $t3 ← addr of X string byte
          sb   $t2, 0($t3)         # Y byte → X byte
          beq  $t2, $zero, L2      # if end of string, go to L2
          addi $s0, $s0, 1         # increment string pointer, $s0 ← $s0 + 1
          j    L1                  # go back and move next byte
  L2:     lw   $s0, 0($sp)         # restore old $s0 from stack
          addi $sp, $sp, 4         # “pop” the stack
          jr   $ra                 # return to calling routine

Overview of MIPS
• simple instructions all 32 bits wide
• very structured, no unnecessary baggage
• only three instruction formats
      R    op    rs    rt    rd    shamt    funct
      I    op    rs    rt    16 bit address
      J    op    26 bit address
• rely on compiler to achieve performance
  – what are the compiler's goals?
• help compiler where we can

MIPS Addressing Modes
• Register Addressing: Operand is a register.
• Base or Displacement Addressing: Operand is at a memory location whose address is the sum of a register and a constant in the instruction.
• Immediate Addressing: Operand is a constant within the instruction.
• PC-relative Addressing: Address is the sum of the PC and a constant in the instruction.
• Pseudodirect Addressing: Jump address is the 26 bits of the instruction concatenated with the upper bits of the PC.

[Figure: the five MIPS addressing modes.
 1. Immediate addressing: op, rs, rt, immediate.
 2. Register addressing: op, rs, rt, rd, ..., funct; operand in a register.
 3. Base addressing: op, rs, rt, address; register + address selects a byte, halfword, or word in memory.
 4. PC-relative addressing: op, rs, rt, address; PC + address selects a word in memory.
 5. Pseudodirect addressing: op, address; address combined with PC selects a word in memory.]

To summarize:
MIPS operands
  Name               Example                        Comments
  32 registers       $s0-$s7, $t0-$t9, $zero,       Fast locations for data. In MIPS, data must be in registers to
                     $a0-$a3, $v0-$v1, $gp,         perform arithmetic. MIPS register $zero always equals 0. Register
                     $fp, $sp, $ra, $at             $at is reserved for the assembler to handle large constants.
  2^30 memory        Memory[0],                     Accessed only by data transfer instructions. MIPS uses byte
  words              Memory[4], ...,                addresses, so sequential words differ by 4. Memory holds data
                     Memory[4294967292]             structures, such as arrays, and spilled registers, such as those
                                                    saved on procedure calls.

MIPS assembly language
  Category            Instruction               Example               Meaning                                    Comments
  Arithmetic          add                       add  $s1, $s2, $s3    $s1 = $s2 + $s3                            Three operands; data in registers
                      subtract                  sub  $s1, $s2, $s3    $s1 = $s2 - $s3                            Three operands; data in registers
                      add immediate             addi $s1, $s2, 100    $s1 = $s2 + 100                            Used to add constants
  Data transfer       load word                 lw   $s1, 100($s2)    $s1 = Memory[$s2 + 100]                    Word from memory to register
                      store word                sw   $s1, 100($s2)    Memory[$s2 + 100] = $s1                    Word from register to memory
                      load byte                 lb   $s1, 100($s2)    $s1 = Memory[$s2 + 100]                    Byte from memory to register
                      store byte                sb   $s1, 100($s2)    Memory[$s2 + 100] = $s1                    Byte from register to memory
                      load upper immediate      lui  $s1, 100         $s1 = 100 * 2^16                           Loads constant in upper 16 bits
  Conditional branch  branch on equal           beq  $s1, $s2, 25     if ($s1 == $s2) go to PC + 4 + 100         Equal test; PC-relative branch
                      branch on not equal       bne  $s1, $s2, 25     if ($s1 != $s2) go to PC + 4 + 100         Not equal test; PC-relative
                      set on less than          slt  $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0       Compare less than; for beq, bne
                      set less than immediate   slti $s1, $s2, 100    if ($s2 < 100) $s1 = 1; else $s1 = 0       Compare less than constant
  Unconditional jump  jump                      j    2500             go to 10000                                Jump to target address
                      jump register             jr   $ra              go to $ra                                  For switch, procedure return
                      jump and link             jal  2500             $ra = PC + 4; go to 10000                  For procedure call

Assembly Language vs. Machine Language
• Assembly provides convenient symbolic representation
  – much easier than writing down numbers
  – e.g., destination first
• Machine language is the underlying reality
  – e.g., destination is no longer first
• Assembly can provide 'pseudoinstructions'
  – e.g., “move $t0, $t1” exists only in Assembly
  – would be implemented using “add $t0,$t1,$zero”
• When considering performance you should count real instructions

Byte Order within Words
• Numbering of bytes within a word can run from left to right, or right to left.
• Big-Endian (left-to-right):
      .byte 0,1,2,3
      Byte #   0 1 2 3
• Little-Endian (right-to-left):
      .byte 3,2,1,0
      Byte #   3 2 1 0

Fixed vs Variable Length Instructions

Other Issues
• Things we are not going to cover
  – linkers, loaders, memory layout
  – stacks, frames (introduced briefly), recursion
  – manipulating strings and pointers
  – interrupts and exceptions
  – details of system call conventions (see example)
• Some of these we'll talk about later
• We've focused on architectural issues
  – basics of MIPS assembly language and machine code
  – We will build a processor to execute these instructions.

Alternative Architectures
• Design alternative:
  – provide more powerful operations
  – goal is to reduce number of instructions executed
  – danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
  – virtually all new instruction sets since 1982 have been RISC
  – VAX: minimize code size, make assembly language easy; instructions from 1 to 54 bytes long!
• We’ll look at PowerPC and 80x86

PowerPC
• Indexed addressing
  – example: lw $t1,$a0+$s3      # $t1 = Memory[$a0+$s3]
  – What do we have to do in MIPS?
• Update addressing
  – update a register as part of load (for marching through arrays)
  – example: lwu $t0,4($s3)      # $t0 = Memory[$s3+4]; $s3 = $s3+4
  – What do we have to do in MIPS?
• Others:
  – load multiple/store multiple
  – a special counter register: “bc Loop” decrements the counter and, if it is not 0, goes to Loop

80x86
• 1978: The Intel 8086 is announced (16 bit architecture).
• 1980: The 8087 floating point coprocessor is added.
• 1982: The 80286 increases address space to 24 bits, + instructions.
• 1985: The 80386 extends to 32 bits, new addressing modes.
• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance).
• 1997: MMX is added.
“This history illustrates the impact of the ‘golden handcuffs’ of compatibility”
“adding new features as someone might add clothing to a packed bag”
“an architecture that is difficult to explain and impossible to love”

A dominant architecture: 80x86
• See your textbook for a more detailed description
• Complexity:
  – Instructions from 1 to 17 bytes long
  – one operand must act as both a source and destination
  – one operand can come from memory
  – complex addressing modes, e.g., “base or scaled index with 8 or 32 bit displacement”
• Saving grace:
  – frequently used instructions are not too difficult to build
  – compilers avoid the portions of the architecture that are slow
“what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective”

Summary
• Instruction complexity is only one variable
  – lower instruction count vs. higher CPI / lower clock rate
• Design Principles:
  – simplicity favors regularity
  – smaller is faster
  – good design demands compromise
  – make the common case fast
• Instruction set architecture
  – a very important abstraction
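
The tradeoff named in the summary (lower instruction count vs. higher CPI / lower clock rate) can be made concrete with the usual execution-time product: time = instruction count × CPI / clock rate. The Python sketch below is only an illustration; the function name `cpu_time` and all of the numbers are hypothetical and describe no real machine.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time in seconds: instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical numbers chosen only to illustrate the tradeoff.
# "simple": more instructions, but a low CPI and a faster clock.
simple = cpu_time(instruction_count=1_000_000, cpi=1.5, clock_rate_hz=500e6)

# "powerful": fewer instructions, but a higher CPI and a slower clock.
powerful = cpu_time(instruction_count=700_000, cpi=2.5, clock_rate_hz=400e6)
```

With these made-up figures the design that executes 30% fewer instructions still takes longer, because the CPI and clock-rate penalties outweigh the savings; different figures could reverse the result, which is why instruction count alone does not decide performance.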