COM 249 – Computer Organization and Assembly Language
Chapter 2 Instructions: Language of the Computer
Based on slides from D. Patterson and www-inst.eecs.berkeley.edu/~cs152/
Modified by S. J. Fritz Spring 2009

Introduction
• Words of a computer’s language are called its instructions.
• Its vocabulary is its instruction set.
• Goal:
  – Find a language that makes it easy to build the hardware and the compiler,
  – while maximizing performance and minimizing cost.

Instruction Set (§2.1 Introduction)
• Different computers have different instruction sets
  – But with many aspects in common
• Early computers had very simple instruction sets
  – Simplified implementation
• Many modern computers also have simple instruction sets

Instruction Set Architecture
• The early trend was to add more and more instructions to new CPUs to do elaborate operations
  – The VAX architecture had an instruction to multiply polynomials!
• RISC philosophy (Cocke at IBM; Patterson and Hennessy, 1980s): Reduced Instruction Set Computing (RISC)
  – Keep the instruction set small and simple; this makes it easier to build fast hardware.
  – Let software do complicated operations by composing simpler ones.

The MIPS Instruction Set
• Stored-program concept: instructions and data are stored as numbers.
• The MIPS instruction set is used as the example throughout the book.
• Stanford MIPS was commercialized by MIPS Technologies (www.mips.com)
• Large share of the embedded core market
  – Applications in consumer electronics, network/storage equipment, cameras, printers, …
• Typical of many modern ISAs
  – See the MIPS Reference Data tear-out card, and Appendixes B and E

MIPS Architecture
• MIPS: semiconductor company that built one of the first commercial RISC architectures
• We will study the MIPS architecture in some detail in this class
• Why MIPS instead of Intel 80x86?
  – MIPS is simple and elegant; we don’t want to get bogged down in gritty details.
  – MIPS is widely used in embedded applications, x86 is little used in embedded systems, and there are more embedded computers than PCs.

Review: Instruction Set Design
[Figure: layers from top to bottom: software, instruction set, hardware]
• Which is easier to change?

Stored Program Computer
• Basic principles
  – Use of instructions that are indistinguishable from numbers
  – Use of alterable memory for programs
• Demands balance among the number of instructions, the number of clock cycles needed by an instruction, and the speed of the clock.

Overview of Design Principles
1. Simplicity favors regularity
   – keep all instructions a single size
   – require three register operands for arithmetic
   – keep register fields in the same place in each instruction
   – regularity makes implementation simpler
   – simplicity enables higher performance at lower cost
2. Smaller is faster
   – the reason that MIPS has 32 registers rather than many more
3. Make the common case fast
   – PC-relative addressing for conditional branches
   – immediate addressing for constant operands
4. Good design demands good compromises
   – compromise between larger addresses and keeping instructions the same length

MIPS Instructions
• Design Principle 1: Simplicity favors regularity
• The MIPS assembly language instruction
    add a, b, c
  means a = b + c
• Each line represents one instruction.
• Each instruction has exactly 3 operands, for simplicity.
• There is one operation per MIPS instruction.
• Instructions are related to operations (=, +, -, *, /) in C or Java.
Arithmetic Operations: Addition
• The MIPS assembly language instruction add a, b, c means a = b + c
• This sequence computes a = b + c + d + e:
    add a, b, c    # the sum of b and c is placed in a
    add a, a, d    # the sum of b, c, and d is now in a
    add a, a, e    # the sum of b, c, d, and e is now in a
• Notice that it takes 3 instructions to add four variables.

MIPS Addition and Subtraction
• Syntax of instructions:
    1 2,3,4
  where:
  1) operation by name
  2) operand getting result (“destination”)
  3) 1st operand for operation (“source1”)
  4) 2nd operand for operation (“source2”)
• Syntax is rigid:
  – 1 operator, 3 operands
  – Why? Keep hardware simple via regularity

MIPS Addition and Subtraction of Integers
• Addition in assembly
  – Example: add $s0,$s1,$s2 (in MIPS)
    Equivalent to: a = b + c (in C/Java)
    where MIPS registers $s0,$s1,$s2 are associated with C variables a, b, c
• Subtraction in assembly
  – Example: sub $s3,$s4,$s5 (in MIPS)
    Equivalent to: d = e - f (in C/Java)
    where MIPS registers $s3,$s4,$s5 are associated with C variables d, e, f

Addition and Subtraction
• How would MIPS do this C/Java statement?
    a = b + c + d - e;
• Break it into multiple instructions (a in $s0; b, c, d, e in $s1 to $s4):
    add $t0, $s1, $s2    # temp = b + c
    add $t0, $t0, $s3    # temp = temp + d
    sub $s0, $t0, $s4    # a = temp - e
• Notice: a single line of C or Java may break up into several lines of MIPS.
• Everything after the hash mark (#) on each line is ignored (comments).

Compiling C into MIPS
• How do we do this?
• C code:
    f = (g + h) - (i + j);
• Use intermediate temporary registers.
• Compiled MIPS pseudocode:
    add t0, g, h     # temp t0 = g + h
    add t1, i, j     # temp t1 = i + j
    sub f, t0, t1    # f = t0 - t1
• Comments are to the right of the #.
• Each line contains at most one instruction.

C, Java Variables vs. Registers
• In C (and most high-level languages), variables are declared first and given a type
  – Example:
      int fahr, celsius;
      char a, b, c, d, e;
• Each variable can ONLY represent a value of the declared type (cannot mix and match int and char variables).
• In assembly language, the registers have no type; the operation determines how register contents are treated.

Operands of the Computer Hardware
• Operands of arithmetic instructions must come from a limited number of special storage locations in the processor called registers.
• A MIPS register is 32 bits wide; a 32-bit unit is called a word (there is also a 64-bit version of MIPS).
• The major difference between variables in a programming language (unlimited in number) and registers is the limited number of registers, typically 32 in MIPS.

Operands of the Computer Hardware
• Design Principle 2: Smaller is faster
  – A very large number of registers may increase the clock cycle time, because electronic signals take longer when they must travel farther.
  – Using more than 32 registers would require a different instruction format.
  – The MIPS register convention is to use two-character names following a dollar sign:
      $s0, $s1, … for variables
      $t0, $t1, … for temporary locations
      $a0, $a1, … for arguments

Compiling C into MIPS Using Registers
• C code (similar to the previous example):
    f = (g + h) - (i + j);
• where f, g, h, i, j are assigned to registers $s0, $s1, $s2, $s3 and $s4 respectively
• Compiled MIPS code:
    add $t0,$s1,$s2    # register $t0 = g + h
    add $t1,$s3,$s4    # register $t1 = i + j
    sub $s0,$t0,$t1    # f gets $t0 - $t1
• Variables have been replaced with registers.

Register Operands
• Arithmetic instructions use register operands
• MIPS has a 32 × 32-bit register file (32 registers, each 32 bits)
  – Used for frequently accessed data
  – Numbered 0 to 31
  – 32-bit data is called a “word”
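To make the register-only model concrete, here is a minimal C sketch (not from the original slides) that represents a 32-entry register file as an array and executes the three-instruction sequence compiled for f = (g + h) - (i + j). The names RegFile, mips_add, mips_sub and run_example are hypothetical. Two simplifications are assumptions of the sketch: register $zero is not hardwired to 0, and arithmetic simply wraps on overflow (real MIPS add signals an overflow exception instead).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of the MIPS register file: 32 registers of 32 bits.
   Register numbers follow the MIPS convention ($t0-$t7 = 8-15, $s0-$s7 = 16-23). */
enum { T0 = 8, T1 = 9, S0 = 16, S1 = 17, S2 = 18, S3 = 19, S4 = 20 };

typedef struct {
    int32_t r[32];
} RegFile;

/* add rd, rs, rt: rd = rs + rt.
   The sum is computed on unsigned values so it wraps like 32-bit hardware
   instead of triggering C's signed-overflow undefined behavior. */
static void mips_add(RegFile *rf, int rd, int rs, int rt) {
    rf->r[rd] = (int32_t)((uint32_t)rf->r[rs] + (uint32_t)rf->r[rt]);
}

/* sub rd, rs, rt: rd = rs - rt */
static void mips_sub(RegFile *rf, int rd, int rs, int rt) {
    rf->r[rd] = (int32_t)((uint32_t)rf->r[rs] - (uint32_t)rf->r[rt]);
}

/* f = (g + h) - (i + j), with g, h, i, j in $s1-$s4 and f in $s0 */
static int32_t run_example(int32_t g, int32_t h, int32_t i, int32_t j) {
    RegFile rf = {{0}};
    rf.r[S1] = g;
    rf.r[S2] = h;
    rf.r[S3] = i;
    rf.r[S4] = j;
    mips_add(&rf, T0, S1, S2);   /* add $t0,$s1,$s2  # $t0 = g + h   */
    mips_add(&rf, T1, S3, S4);   /* add $t1,$s3,$s4  # $t1 = i + j   */
    mips_sub(&rf, S0, T0, T1);   /* sub $s0,$t0,$t1  # f = $t0 - $t1 */
    return rf.r[S0];
}
```

For example, run_example(5, 7, 2, 1) computes (5 + 7) - (2 + 1) = 9 using only register-to-register operations, just as the compiled MIPS code does.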
Memory Operands
• Programming languages have both simple variables and complex data structures.
• How can we handle large data structures with just a few registers?
  – Data structures are kept in memory.
• MIPS includes instructions to transfer data between memory and registers.
  – Data transfer instructions (load, store)

Memory Operands
• Data transfer instruction
  – load copies data from memory to a register
  – lw: load word
• Format:
    opcode register, constant(register)
  where constant(register) specifies the memory address
• Syntax:
    lw $t0, 8($s3)
  where 8 is the offset and $s3 holds the base address

Memory Addressing
• Since 1980 almost every machine uses addresses to the level of 8 bits (a byte).
• Since we could read a 32-bit word either as
  – four loads of bytes from sequential byte addresses, or
  – one load word from a single byte address,
  there are 2 questions for the design of the Instruction Set Architecture:
  – How do byte addresses map onto words?
  – Can a word be placed on any byte boundary?

Addressing Objects: Alignment
• Since 8-bit bytes are useful, most architectures address individual bytes.
• The address of a word matches the address of one of the four bytes in the word.
• Addresses of sequential words differ by 4 bytes.
• MIPS words must start at addresses that are multiples of 4; this is called the alignment restriction.

Memory Operands
• Arithmetic operations occur on registers.
• More complex data structures (arrays and structures) are kept in memory.
• MIPS must include instructions that transfer data between memory and registers (called data transfer instructions).
• To access a word in memory, the instruction must include the memory address.
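The alignment restriction and the byte-versus-word distinction can be checked with a small C sketch. The helper names and the WORD_BYTES constant are hypothetical, chosen only for illustration; addresses are modeled as plain 32-bit unsigned integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch: MIPS memory is byte addressed and a word occupies 4 bytes. */
#define WORD_BYTES 4u

/* Alignment restriction: a word must start at an address that is a multiple of 4. */
static bool is_word_aligned(uint32_t addr) {
    return addr % WORD_BYTES == 0;   /* same as (addr & 3u) == 0 */
}

/* Byte address of element `index` of a word array whose base address is `base`:
   the address an lw/sw uses with base register = base and offset = 4 * index. */
static uint32_t word_element_address(uint32_t base, uint32_t index) {
    assert(is_word_aligned(base));   /* the array itself must be word aligned */
    return base + index * WORD_BYTES;
}
```

Because every element address is the aligned base plus a multiple of 4, all of them satisfy the alignment restriction, while an address such as 1002 does not.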
Memory Operands
• Main memory is used for composite data
  – Arrays, structures, dynamic data
• To apply arithmetic operations
  – Load values from memory into registers
  – Store results from registers to memory
• Memory is byte addressed
  – Each address identifies an 8-bit byte
• Words are aligned in memory
  – Address must be a multiple of 4
• MIPS is Big Endian
  – Most-significant byte at least address of a word
  – c.f. Little Endian: least-significant byte at least address

Addressing Objects: Endianness
• Computers are grouped into those that use:
  – the address of the leftmost or “big end” byte as the word address
  – and those that use the “little end” or rightmost byte
• MIPS is in the Big Endian group

Addressing Objects: Endianness and Alignment
• Big Endian: the word address is the address of the most significant byte: IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA
• Little Endian: the word address is the address of the least significant byte: Intel 80x86, DEC VAX, DEC Alpha (Windows NT)
[Figure: within a word from msb to lsb, little endian numbers the bytes 3 2 1 0 and big endian numbers them 0 1 2 3]
• Alignment: require that objects fall on an address that is a multiple of their size.
[Figure: examples of aligned and not-aligned words at byte offsets 0 to 3]

Big Endian
• "Big Endian" means that the high-order (most significant) byte of the number is stored in memory at the lowest address, and the low-order (least significant) byte at the highest address. (The “big end” comes first.)
• A 4-byte LongInt Byte3 Byte2 Byte1 Byte0 would then be stored as:
    Base Address+0   Byte3
    Base Address+1   Byte2
    Base Address+2   Byte1
    Base Address+3   Byte0
• Motorola processors (those used in older Macs) and mainframes use "Big Endian" byte order.
• http://www.cs.umass.edu/~Verts/cs32/endian.html

Little Endian
• "Little Endian" means that the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address.
  (The little end comes first.)
• A 4-byte LongInt Byte3 Byte2 Byte1 Byte0 will be arranged in memory as follows:
    Base Address+0   Byte0
    Base Address+1   Byte1
    Base Address+2   Byte2
    Base Address+3   Byte3
• Intel processors (those used in PCs) use "Little Endian" byte order.
• http://www.cs.umass.edu/~Verts/cs32/endian.html

Big Endian and Little Endian
• To represent the value 1025 as a 4-byte integer: 00000000 00000000 00000100 00000001
    Address   Big-Endian   Little-Endian
    00        00000000     00000001
    01        00000000     00000100
    02        00000100     00000000
    03        00000001     00000000
• If UNIX were stored as two 2-byte words, a Big-Endian system would store it as UNIX and a Little-Endian system would store it as NUXI. (See http://www.webopedia.com/TERM/B/big_endian.html)

Big and Little Endian
• Both have advantages and disadvantages.
• In "Big Endian" form, because the high-order byte comes first, you can test whether the number is positive or negative by looking at the byte at offset zero.
• In "Little Endian" form, assembly language instructions for picking up a 1-, 2-, 4-, or longer-byte number proceed in exactly the same way for all formats, and multiple-precision math routines are correspondingly easy to write.
• http://www.cs.umass.edu/~Verts/cs32/endian.html

MIPS I Registers
• Programmable storage
  – $2^{32}$ bytes of memory
  – 32 × 32-bit general-purpose registers (GPRs) r0 to r31, with r0 = 0
  – Special registers PC, hi, and lo
  – All registers are 32 bits “wide”

Memory Addresses and Contents
[Figure: processor connected to data memory]
    Address   Data
    3         100
    2         10
    1         101
    0         1
• The address of the third data element is 2, and the contents of Memory[2] are 10.

Registers vs. Memory
• Registers are faster to access than memory.
• Operating on memory data requires loads and stores
  – More instructions to be executed
• The compiler must use registers for variables as much as possible
  – Only spill to memory for less frequently used variables
  – Register optimization is important!

Arrays and Data Structures
• C and Java variables map onto registers; what about large data structures like arrays?
• One of the 5 components of a computer, the memory, contains such data structures.
• But MIPS arithmetic instructions only operate on registers, never directly on memory.
• Data transfer instructions transfer data between registers and memory:
  – Memory to register (load)
  – Register to memory (store)

Anatomy: 5 Components of Any Computer
[Figure: a personal computer: processor (control, the “brain”, and the datapath containing the registers), memory, and input and output devices; loads move data from memory, stores move data to memory]
• Registers are in the datapath of the processor; if operands are in memory, we must transfer them to the processor to operate on them, and then transfer them back to memory when done.
• These are “data transfer” instructions…

Data Transfer: Memory to Registers
• To transfer a word of data, we need to specify two things:
  – Register: specify this by number ($0 - $31) or symbolic name ($s0, …, $t0, …)
  – Memory address: more difficult
• Think of memory as a single one-dimensional array, so we can address it simply by supplying a pointer to a memory address.
• Other times, we want to be able to offset from this pointer.
• Remember: “Load FROM memory”

Data Transfer: Memory to Registers
• To specify a memory address to copy from, specify two things:
  – A register containing a pointer to memory
  – A numerical offset (in bytes)
• The desired memory address is the sum of these two values.
• Example: 8($t0)
  – specifies the memory address pointed to by the value in $t0, plus 8 bytes

Data Transfer: Memory to Register
• Load instruction syntax:
    1 2, 3(4)
  for example lw $t0,12($s0), where
  1) operation name
  2) register that will receive the value
  3) numerical offset in bytes
  4) register containing the pointer to memory
• MIPS instruction name:
  – lw (meaning Load Word, so 32 bits or one word are loaded at a time)

Data Transfer: Memory to Register
• Data flow example: lw $t0,12($s0)
  This instruction takes the pointer in $s0, adds 12 bytes to it, and then loads the value from the memory location at this calculated address into register $t0.
• Notes:
  – $s0 is called the base register
  – 12 is called the offset
  – the offset is generally used in accessing elements of an array or structure: the base register points to the beginning of the array or structure

Data Transfer: Register to Memory
• We also want to store from a register into memory
  – Store instruction syntax is identical to load’s
• MIPS instruction name: sw (meaning Store Word, so 32 bits or one word are stored at a time)
• Data flow example: sw $t0,12($s0)
  This instruction takes the pointer in $s0, adds 12 bytes to it, and then stores the value from register $t0 into that memory address.
• Remember: “Store INTO memory”

Data Transfer Instructions: Load
• Load copies data from memory to a register; in MIPS this is lw, or load word.
• Format: the operation name, followed by the register to be loaded, then a constant and the register used to access memory.
• The sum of the constant portion of the instruction and the contents of the second register forms the memory address.
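The load and store semantics above, together with the earlier point that MIPS is Big Endian, can be sketched in C as a small byte array with lw and sw helpers. The names mem, MEM_BYTES, lw and sw are hypothetical, the memory holds only 64 bytes, and the assert stands in for the address-error exception that real MIPS hardware raises on a misaligned access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy memory: 64 bytes, byte addressed, big endian like MIPS. */
#define MEM_BYTES 64u
static uint8_t mem[MEM_BYTES];

/* lw rt, offset(base): effective address = base + offset.
   The most-significant byte is read from the lowest address (big endian). */
static uint32_t lw(uint32_t base, int32_t offset) {
    uint32_t addr = base + (uint32_t)offset;           /* wraps modulo 2^32 */
    assert(addr % 4 == 0 && addr <= MEM_BYTES - 4u);   /* aligned and in range */
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* sw rt, offset(base): store the 4 bytes of value, most-significant byte first. */
static void sw(uint32_t value, uint32_t base, int32_t offset) {
    uint32_t addr = base + (uint32_t)offset;
    assert(addr % 4 == 0 && addr <= MEM_BYTES - 4u);
    mem[addr]     = (uint8_t)(value >> 24);
    mem[addr + 1] = (uint8_t)(value >> 16);
    mem[addr + 2] = (uint8_t)(value >> 8);
    mem[addr + 3] = (uint8_t)value;
}
```

For example, after sw(1025, 12, 0) bytes 12 to 15 hold 00 00 04 01, matching the Big-Endian column of the 1025 example, and lw(16, -4) reads the same word back because 16 + (-4) = 12.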
Data Transfer Instructions: Load
• Assume A is an array of 100 words, with its starting (base) address in $s3
• Let g, h be variables associated with $s1, $s2
• C assignment statement:
    g = h + A[8];
• Compiling to MIPS with an operand in memory
• First transfer A[8] to a register (use load word, lw). Because MIPS is byte addressed, the offset of A[8] is 8 × 4 = 32 bytes:
    lw  $t0, 32($s3)     # temp register $t0 gets A[8]
    add $s1, $s2, $t0    # g = h + A[8]
• The constant 32 is the offset, and the register ($s3) added to form the address is called the base register.

Actual MIPS Memory Addresses and Contents
[Figure: processor connected to data memory]
    Byte address   Data
    12             100
    8              10
    4              101
    0              1
• Since MIPS addresses each byte, word addresses are multiples of 4; there are 4 bytes in a word.
• The byte address of the third word is 8.

Data Transfer Instructions: Store
• The instruction complementary to load is called store, or store word
  – sw, which copies data from a register to memory
• Format similar to the load instruction: name of operation, followed by the register to be stored, then the offset to select the element, and finally the base register.

Data Transfer Instructions: Store
• Assume variable h is associated with register $s2
• Base address of array A is in $s3
• C code:
    A[12] = h + A[8];
• MIPS code:
    lw  $t0, 32($s3)     # temp reg $t0 gets A[8]
    add $t0, $s2, $t0    # temp reg $t0 gets h + A[8]
    sw  $t0, 48($s3)     # stores h + A[8] into A[12]

MIPS Memory Addressing
• Most architectures address individual bytes; therefore the address of a word matches the address of one of the 4 bytes within the word.
• Addresses of sequential words differ by 4.
• In MIPS, words must start at addresses that are multiples of 4; this is called the alignment restriction.
• Remember MIPS is “big endian”.
• Byte addressing affects the array index.
• The offset added to the base register $s3 to reach A[8] must therefore be 4 × 8 = 32, not 8.
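C makes the same byte-versus-word distinction visible: indexing A[8] on an int32_t array moves 8 elements, but the raw byte offset from the base is 8 × 4 = 32. The sketch below mirrors the compiled sequence for A[12] = h + A[8]. The function names are hypothetical, and memcpy through an unsigned char pointer is used as a portable stand-in for lw and sw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of A[index] from the base address of an array of 32-bit words:
   the constant that goes in lw/sw (e.g. 32 for A[8], 48 for A[12]). */
static size_t word_offset(size_t index) {
    return index * sizeof(int32_t);   /* 4 bytes per word */
}

/* A[12] = h + A[8], done the way the compiled MIPS does it:
   load via base + byte offset, add in a "register", store via base + byte offset. */
static void a12_gets_h_plus_a8(int32_t *A, int32_t h) {
    unsigned char *base = (unsigned char *)A;           /* plays the role of $s3 */
    int32_t t0;                                         /* plays the role of $t0 */
    memcpy(&t0, base + word_offset(8), sizeof t0);      /* lw  $t0, 32($s3)  */
    t0 = h + t0;                                        /* add $t0, $s2, $t0 */
    memcpy(base + word_offset(12), &t0, sizeof t0);     /* sw  $t0, 48($s3)  */
}
```

Adding 32 to a byte pointer lands on the same element as A + 8 on an int32_t pointer, which is exactly why the MIPS offset must be scaled by 4.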
Constants or Immediate Operands
• Design Principle 3: Make the common case fast
• Constants occur frequently, and including constants in arithmetic instructions makes them faster than if the constants were loaded from memory:
    lw $t0, AddrConstant4($s1)    # $t0 = constant 4
• To add 4 to register $s3, use add immediate (addi):
    addi $s3, $s3, 4    # $s3 = $s3 + 4
• Since MIPS supports negative constants, there is no need for a subtract immediate instruction.

Pointers vs. Values
• Key concept:
• A register can hold any 32-bit value. That value can be a (signed) int, an unsigned int, a pointer (memory address), and so on.
• If you write add $t2,$t1,$t0 then $t0 and $t1 must contain values.
• If you write lw $t2,0($t0) then $t0 must contain a pointer to memory.
• Don’t mix these up!

Addressing: Byte vs. Word
• Every word in memory has an address, similar to an index in an array.
• Early computers numbered words like C numbers elements of an array:
  – Memory[0], Memory[1], Memory[2], …
  – Called the “address” of a word
• Computers needed to access 8-bit bytes as well as words (4 bytes/word).
• Today machines address memory as bytes (i.e., they are “byte addressed”), hence 32-bit (4-byte) word addresses differ by 4:
  – Memory[0], Memory[4], Memory[8], …

Immediates
• Immediates are numerical constants.
• They appear often in code, so there are special instructions for them.
• Add immediate:
    addi $s0,$s1,10    (in MIPS)
    f = g + 10         (in C)
  where MIPS registers $s0,$s1 are associated with C or Java variables f, g
• Syntax is similar to the add instruction, except that the last argument is a number instead of a register.

Immediates and Subtraction
• There is no subtract immediate in MIPS. Why?
• Limit the types of operations that can be done to the absolute minimum
  – negative constants are less frequent
  – if an operation can be decomposed into a simpler operation, don’t include it
  – addi …, -X = subi …, X  =>  no need for subi
• Example:
    addi $s0,$s1,-10    (in MIPS)
    f = g - 10          (in C)
  where MIPS registers $s0,$s1 are associated with C or Java variables f, g

Register Zero
• One particular immediate, the number zero (0), appears very often in code.
• So we define register zero ($0 or $zero) to always have the value 0; for example:
    add $s0,$s1,$zero    (in MIPS)
    f = g                (in C)
  where MIPS registers $s0,$s1 are associated with C variables f, g
• It is defined in hardware, so an instruction add $zero,$zero,$s0 will not do anything!

Summarizing...
• In MIPS assembly language:
  – Registers replace C variables
  – One instruction (simple operation) per line
  – Simpler is better
  – Smaller is faster
• New instructions:
    add, addi, sub    # arithmetic operations
    lw, sw            # load, store: from/to memory
• New registers:
    C or Java variables: $s0 - $s7
    Temporary variables: $t0 - $t9
    Zero: $zero

Compilation with Memory
• What offset in lw selects A[5] in C/Java?
• 4 × 5 = 20 selects A[5]: byte vs. word
• Compile by hand using registers:
    g = h + A[5];
  where g: $s1, h: $s2, $s3: base address of A
• First transfer from memory to register:
    lw $t0,20($s3)      # $t0 gets A[5]
  – Add 20 to $s3 to select A[5], put into $t0
• Next add it to h and place in g:
    add $s1,$s2,$t0     # $s1 = h + A[5]

MIPS Instruction Encoding
    Instruction  Format  op  rs   rt   rd    shamt  funct  address
    add          R       0   reg  reg  reg   0      32     n.a.
    sub          R       0   reg  reg  reg   0      34     n.a.
    addi         I       8   reg  reg  n.a.  n.a.   n.a.   constant
    lw           I       35  reg  reg  n.a.  n.a.   n.a.   address
    sw           I       43  reg  reg  n.a.  n.a.   n.a.   address

MIPS Assembler Register Convention
    Name       Number  Usage            Preserved across a call?
    $zero      0       the value 0      n/a
    $v0-$v1    2-3     return values    no
    $a0-$a3    4-7     arguments        no
    $t0-$t7    8-15    temporaries      no
    $s0-$s7    16-23   saved            yes
    $t8-$t9    24-25   temporaries      no
    $sp        29      stack pointer    yes
    $ra        31      return address   yes
• “no” means caller saved
• “yes” means callee saved
• On Green Card in Column #2 at bottom

Notes about Memory
• Pitfall: forgetting that sequential word addresses in machines with byte addressing do not differ by 1.
  – Many an assembly language programmer has toiled over errors made by assuming that the address of the next word can be found by incrementing the address in a register by 1 instead of by the word size in bytes.
  – So remember that for both lw and sw, the sum of the base address and the offset must be a multiple of 4 (to be word aligned).

Memory Operand Example 1
• C code:
    g = h + A[8];
  – g in $s1, h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires an offset of 32 (4 bytes per word)
    lw  $t0, 32($s3)     # load word: offset 32, base register $s3
    add $s1, $s2, $t0

Memory Operand Example 2
• C code:
    A[12] = h + A[8];
  – Variable h in $s2, base address of A in $s3
• Compiled MIPS code:
  – Index 8 requires an offset of 32
    lw  $t0, 32($s3)     # load word
    add $t0, $s2, $t0
    sw  $t0, 48($s3)     # store word

Unsigned Binary Integers (§2.4 Signed and Unsigned Numbers)
• Computers store numbers as binary digits (bits)
• Given an n-bit number
    $x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $0$ to $+2^n - 1$
• Example
    $0000\,0000\,0000\,0000\,0000\,0000\,0000\,1011_2 = 0 + \cdots + 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0 = 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$
• Using 32 bits: 0 to +4,294,967,295

Binary Numbers
• The MIPS word is 32 bits, so we can represent $2^{32}$ different values.
• Least significant bit refers to the rightmost bit.
• Most significant bit is the leftmost bit.
• Sign and magnitude uses a separate sign bit to distinguish positive and negative numbers. It is not used because of difficulty with arithmetic…

Two’s Complement Representation
• Makes hardware representation simple:
• A leading zero (0) means positive, a leading one (1) means negative; this bit is called the sign bit.
• All negative numbers begin with a 1.
• There is one negative number, $-2{,}147{,}483{,}648_{10}$, that does not have a corresponding positive number.

Two’s Complement Representation
• To form the negation of a binary number
  – Invert all bits to form the complement
  – Add one
• For example, to negate binary 28:
      00011100    Binary 28
      11100011    Invert the digits (0 becomes 1, 1 becomes 0)
    +        1    Then add 1
      11100100    Binary -28
• For more information see: http://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html

Two’s Complement Representation
• Going in the opposite direction (taking the negation and transforming it back into the positive binary number) uses the same steps:
  – Invert all bits to form the complement
  – Add one
• For example, to negate binary -28:
      11100100    Binary -28
      00011011    Invert the digits (0 becomes 1, 1 becomes 0)
    +        1    Then add 1
      00011100    Binary 28
• This works because the sum of a number and its bitwise complement is all ones, which represents –1:
    $x + \bar{x} = -1$, so $-x = \bar{x} + 1$

2’s Complement Simulator
• Try it with a simulator:
• http://scholar.hw.ac.uk/site/computing/activity12.asp?outline

2s-Complement Signed Integers Example
• Given an n-bit number represented as
    $x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$
• Range: $-2^{n-1}$ to $+2^{n-1} - 1$
• Example
    $1111\,1111\,1111\,1111\,1111\,1111\,1111\,1100_2 = -1\times2^{31} + 1\times2^{30} + \cdots + 1\times2^2 + 0\times2^1 + 0\times2^0 = -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$
• Using 32 bits: –2,147,483,648 to +2,147,483,647
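The unsigned and two's complement formulas above, and the invert-and-add-one rule, can be checked with a short C sketch. The function names are hypothetical; bit patterns are passed as strings of '0' and '1' (spaces ignored) so the examples can be written exactly as on the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned interpretation: x = x_{n-1}2^{n-1} + ... + x_1 2^1 + x_0 2^0.
   `bits` is a string of '0'/'1' characters; spaces are skipped for readability. */
static uint64_t unsigned_from_bits(const char *bits) {
    uint64_t x = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        if (*p == ' ') continue;
        assert(*p == '0' || *p == '1');
        x = x * 2 + (uint64_t)(*p - '0');
    }
    return x;
}

/* Two's complement interpretation of an n-bit pattern (1 <= n <= 32):
   x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + ... + x_0 2^0 */
static int64_t signed_from_bits(const char *bits) {
    int n = 0;
    for (const char *p = bits; *p != '\0'; p++) {
        if (*p != ' ') n++;
    }
    assert(n >= 1 && n <= 32);
    int64_t u = (int64_t)unsigned_from_bits(bits);
    int64_t sign_weight = (int64_t)1 << (n - 1);   /* 2^{n-1} */
    /* If the sign bit is set, its weight counts as -2^{n-1} instead of +2^{n-1}. */
    return (u & sign_weight) ? u - 2 * sign_weight : u;
}

/* Negation by "invert all bits, then add one", on 8-bit patterns. */
static uint8_t negate8(uint8_t x) {
    return (uint8_t)(~x + 1);
}
```

The same 32-bit pattern of all ones reads as 4,294,967,295 unsigned but as –1 in two's complement, and negate8 turns 00011100 (28) into 11100100 (–28) and back again.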
2s-Complement Signed Integers
• Bit 31 is the sign bit
  – 1 for negative numbers
  – 0 for non-negative numbers
• $-(-2^{n-1})$ can’t be represented
• Non-negative numbers have the same unsigned and 2s-complement representation
• Some specific numbers:
    0:              0000 0000 … 0000
    –1:             1111 1111 … 1111
    Most-negative:  1000 0000 … 0000
    Most-positive:  0111 1111 … 1111

More Examples
• References for two’s complement notation
• http://www.duke.edu/~twf/cps104/twoscomp.html
• http://en.wikipedia.org/wiki/Two's_complement
• http://mathforum.org/library/drmath/sets/select/dm_twos_complement.html
• http://www.fact-index.com/t/tw/two_s_complement.html
• http://www.hal-pc.org/~clyndes/computerarithmetic/twoscomplement.html
• http://www.vb-helper.com/tutorial_twos_complement.html
• http://web.bvu.edu/faculty/traylor/CS_Help_Stuff/Two's%20Complement.htm

Sign Extension
• Representing a number using more bits
  – Preserve the numeric value
• In the MIPS instruction set
  – addi: extend immediate value
  – lb, lh: extend loaded byte/halfword
  – beq, bne: extend the displacement
• Replicate the sign bit to the left
  – c.f. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
  – +2: 0000 0010 => 0000 0000 0000 0010
  – –2: 1111 1110 => 1111 1111 1111 1110

Representing Instructions (§2.5 Representing Instructions in the Computer)
• Instructions are encoded in binary
  – Called machine code
• MIPS instructions
  – Encoded as 32-bit instruction words
  – Small number of formats encoding operation code (opcode), register numbers, …
  – Regularity!
• Register numbers (important!)
  – $t0 – $t7 are registers 8 – 15
  – $t8 – $t9 are registers 24 – 25
  – $s0 – $s7 are registers 16 – 23
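Sign extension, including its use for the addi immediate, can be sketched in C. The function names are hypothetical; addi here models only the arithmetic on a raw 16-bit immediate field (real MIPS addi also signals an overflow exception, which the sketch ignores by wrapping).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to 32 bits (1 <= bits <= 31)
   by replicating the sign bit to the left. */
static int32_t sign_extend(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    uint32_t mask = (1u << bits) - 1u;   /* keep only the low `bits` bits */
    uint32_t sign = 1u << (bits - 1);    /* position of the sign bit */
    value &= mask;
    /* XOR then subtract: leaves non-negative values unchanged and fills the
       upper bits with 1s when the sign bit is set. */
    return (int32_t)((value ^ sign) - sign);
}

/* Zero extension, as used for unsigned values: the upper bits become 0. */
static uint32_t zero_extend(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    return value & ((1u << bits) - 1u);
}

/* addi rt, rs, imm: the 16-bit immediate field is sign-extended before adding. */
static int32_t addi(int32_t rs, uint16_t imm_field) {
    return (int32_t)((uint32_t)rs + (uint32_t)sign_extend(imm_field, 16));
}
```

For the slide's 8-bit examples, sign_extend(0x02, 8) gives +2 and sign_extend(0xFE, 8) gives –2, whose low 16 bits are 1111 1111 1111 1110. Because the immediate is sign-extended, addi with the field 0xFFF6 adds –10, which is why MIPS needs no subtract immediate.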
R-Format Instructions
• Define “fields” of the following number of bits each: 6 + 5 + 5 + 5 + 5 + 6 = 32
    6 | 5 | 5 | 5 | 5 | 6
• For simplicity, each field has a name:
    opcode | rs | rt | rd | shamt | funct

R-Format Instructions
• Meaning of fields:
  – rs (Source Register): generally used to specify the register containing the first operand
  – rt (Target Register): generally used to specify the register containing the second operand (note that the name is misleading)
  – rd (Destination Register): generally used to specify the register which will receive the result of the computation
  – shamt (Shift amount)
  – funct (Function): selects the specific variant of the opcode operation; sometimes called the function code

MIPS R-Format Instructions: Summary
    op      rs      rt      rd      shamt   funct
    6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
• MIPS fields are given names to make them easier to remember
• Instruction fields
  – op: operation code (opcode)
  – rs: first source register number
  – rt: second source register number
  – rd: destination register number
  – shamt: shift amount (00000 for now)
  – funct: function code (extends opcode)

R-format Example
    add $t0, $s1, $s2
                 op       rs      rt      rd      shamt   funct
    layout       special  $s1     $s2     $t0     0       add
    decimal      0        17      18      8       0       32
    binary       000000   10001   10010   01000   00000   100000
• MIPS instruction: 0000 0010 0011 0010 0100 0000 0010 0000two = 0232 4020hex

Hexadecimal
• Base 16
  – Compact representation of bit strings
  – 4 bits per hex digit
    0 0000   4 0100   8 1000   c 1100
    1 0001   5 0101   9 1001   d 1101
    2 0010   6 0110   a 1010   e 1110
    3 0011   7 0111   b 1011   f 1111
• Example: eca8 6420
    e    c    a    8    6    4    2    0
    1110 1100 1010 1000 0110 0100 0010 0000

Why Multiple Instruction Formats?
• Design Principle 4: Good design demands good compromises
  – there is a need to keep all instructions the same length, and a desire for a single format
• There is a problem using the previous (R-format) layout when an instruction needs longer fields
  – for example, lw must specify two registers and a constant, but the constant would have only 5 bits available, so it could take only $2^5 = 32$ different values (0 to 31)
• Solution: allow I and J formats for different instructions, but keep all of them the same length = 32 bits

Overview of MIPS
• Simple instructions, all 32 bits wide
• Very structured, no unnecessary baggage
• Only three instruction formats:
    R:  op | rs | rt | rd | shamt | funct
    I:  op | rs | rt | 16-bit address
    J:  op | 26-bit address
• Rely on the compiler to achieve performance: what are the compiler's goals?
• Help the compiler where we can

Additional MIPS Instruction Formats
• I-format: used for instructions with immediates, lw and sw (since the offset counts as an immediate), and the branches (beq and bne)
  – (but not the shift instructions; later)
• J-format: used for j and jal (jump and link)
• R-format: used for all other instructions
• It will soon become clear why the instructions have been partitioned in this way.

MIPS I-format Instructions
    op      rs      rt      constant or address
    6 bits  5 bits  5 bits  16 bits
• Format for immediate arithmetic and load/store instructions
  – rt: destination or source register number
  – Constant: $-2^{15}$ to $+2^{15} - 1$
  – Address: offset added to the base address in rs
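The R- and I-format layouts above can be sketched as two packing functions in C. encode_r and encode_i are hypothetical helper names; the shift amounts follow the bit positions of the fields (op in bits 31-26, rs in 25-21, rt in 20-16, and so on), and the asserts enforce the field widths and the signed 16-bit constant range.

```c
#include <assert.h>
#include <stdint.h>

/* Pack R-format fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
static uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                         uint32_t rd, uint32_t shamt, uint32_t funct) {
    assert(op < 64 && rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct;
}

/* Pack I-format fields: op(6) rs(5) rt(5) immediate(16).
   The constant must fit in 16 bits as a signed value (-2^15 .. 2^15 - 1);
   negative constants are stored as their low 16 two's complement bits. */
static uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t imm) {
    assert(op < 64 && rs < 32 && rt < 32);
    assert(imm >= -32768 && imm <= 32767);
    return (op << 26) | (rs << 21) | (rt << 16) | ((uint32_t)imm & 0xFFFFu);
}
```

For example, encode_r(0, 17, 18, 8, 0, 32) reproduces 0x02324020 from the add $t0, $s1, $s2 example, and encode_i(35, 9, 8, 1200) gives 0x8D2804B0 for lw $t0, 1200($t1).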
Instruction Format Names and Field Descriptions
    Name       6 bits    5 bits    5 bits    5 bits    5 bits    6 bits    Comments
               (31-26)   (25-21)   (20-16)   (15-11)   (10-6)    (5-0)
    R-format   op        rs        rt        rd        shamt     funct     Arithmetic instruction format (6 fields)
    I-format   op        rs        rt        address/immediate             Data transfer, branch, immediate instruction format (4 fields)
    J-format   op        target address                                  Jump instruction format (2 fields)
    All MIPS instructions are 32 bits.
Instruction field notes: The op and funct fields form the opcode. The rs field gives a source register, and rt is also normally a source register. rd is the destination register, and shamt supplies the shift amount for logical shift operations.

R-Format Example
• MIPS Instruction: add $8, $9, $10
    Decimal number per field representation:   0        9       10      8       0       32
    Binary number per field representation:    000000   01001   01010   01000   00000   100000
    Hex representation:       012A 4020hex
    Decimal representation:   19,546,144ten
• On the Green Card: format in column 1, opcodes in column 3

Green Card
• green card /n./ [after the "IBM System/360 Reference Data" card] A summary of an assembly language, even if the color is not green. For example, "I'll go get my green card so I can check the addressing mode for that instruction." (www.jargon.net)
• Image from Dave's Green Card Collection: http://www.planetmvs.com/greencard/

J-Format Instructions
• Define "fields" of the following number of bits each:
    6 bits   26 bits
• As usual, each field has a name:
    opcode   target address
• Key Concepts
  – Keep the opcode field identical to the R-format and I-format for consistency.
  – Combine all other fields to make room for a large target address.
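Going the other way, decoding recovers each field with a shift and a mask. This C sketch uses illustrative names (struct mips_fields, decode); it extracts every field at once, and which ones are meaningful depends on the format selected by op.

```c
#include <stdint.h>

/* All candidate fields of a MIPS instruction word, named as on the slides. */
struct mips_fields {
    unsigned op, rs, rt, rd, shamt, funct;
    unsigned imm16;    /* bits 15..0: constant or address (I-format) */
    unsigned addr26;   /* bits 25..0: target address (J-format)     */
};

struct mips_fields decode(uint32_t w)
{
    struct mips_fields f;
    f.op     = (w >> 26) & 0x3Fu;
    f.rs     = (w >> 21) & 0x1Fu;
    f.rt     = (w >> 16) & 0x1Fu;
    f.rd     = (w >> 11) & 0x1Fu;
    f.shamt  = (w >>  6) & 0x1Fu;
    f.funct  =  w        & 0x3Fu;
    f.imm16  =  w        & 0xFFFFu;
    f.addr26 =  w        & 0x03FFFFFFu;
    return f;
}
```

decode(0x012A4020) yields op 0, rs 9, rt 10, rd 8, shamt 0, funct 32, which is the add $8, $9, $10 example; 0x012A4020 is also 19,546,144 in decimal.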
Translating Assembly Language into Machine Language
• Suppose $t1 has the base of array A and $s2 corresponds to h in the assignment
    A[300] = h + A[300];
• In MIPS (try this):
    lw  $t0, 1200($t1)    # temp register $t0 gets A[300]
    add $t0, $s2, $t0     # temp register $t0 gets h + A[300]
    sw  $t0, 1200($t1)    # stores h + A[300] back into A[300]
These instructions can then be represented in machine language…

Translating MIPS Assembly Language into Machine Language
    Instruction   op    rs    rt    rd    address/shamt   funct
    lw            35    9     8           1200
    add           0     18    8     8     0               32
    sw            43    9     8           1200
The lw instruction (opcode) is 35, the base register is 9 ($t1), and the destination register ($t0) is 8. The offset 1200 = 300 × 4 goes in the address field. The add instruction is specified by 0 in the op field and 32 in the funct field. The sw instruction is 43, and the rest is similar to the lw instruction. See the summary on page 101.

Translating MIPS Assembly Language into Machine Language
Since 1200ten = 0000 0100 1011 0000two, the binary equivalent of the previous form is:
    100011   01001   01000   0000 0100 1011 0000
    000000   10010   01000   01000   00000   100000
    101011   01001   01000   0000 0100 1011 0000
• Notice the similarity in the first and last instructions. The only difference is in the third bit from the left.
• This similarity simplifies hardware design…
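A quick C check of the binary rows above: the three words are written in hex, and bit_diff_count (a hypothetical helper) counts the bit positions where two words differ. It confirms that lw and sw differ in exactly one bit, bit 29, which is the third bit from the left.

```c
#include <stdint.h>

/* Machine words for A[300] = h + A[300], converted from the binary rows. */
const uint32_t a300_code[3] = {
    0x8D2804B0u,   /* 100011 01001 01000 0000010010110000 : lw  $t0, 1200($t1) */
    0x02484020u,   /* 000000 10010 01000 01000 00000 100000 : add $t0, $s2, $t0 */
    0xAD2804B0u    /* 101011 01001 01000 0000010010110000 : sw  $t0, 1200($t1) */
};

/* Number of bit positions in which two instruction words differ. */
int bit_diff_count(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;   /* 1 wherever the words disagree */
    int n = 0;
    while (x != 0) {
        x &= x - 1u;      /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

The low 16 bits of the lw and sw words are 0x04B0 = 1200, the byte offset 300 × 4.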
Stored Program Computers
The BIG Picture
• Instructions represented in binary, just like data
• Instructions and data stored in memory
• Programs can operate on programs
  – e.g., compilers, linkers, …
• Binary compatibility allows compiled programs to work on different computers
  – Standardized ISAs

§2.6 Logical Operations
Logical Operations
• Instructions for bitwise manipulation
    Operation     C     Java   MIPS
    Shift left    <<    <<     sll
    Shift right   >>    >>>    srl
    Bitwise AND   &     &      and, andi
    Bitwise OR    |     |      or, ori
    Bitwise NOT   ~     ~      nor
• Useful for extracting and inserting groups of bits in a word

Shift Operations
    op       rs       rt       rd       shamt    funct
    6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
• shamt: how many positions to shift
• Shift left logical
  – Shift left and fill with 0 bits
  – sll by i bits multiplies by 2^i
  – sll $t2, $s0, 4    # reg $t2 = reg $s0 << 4 bits
• Shift right logical
  – Shift right and fill with 0 bits
  – srl by i bits divides by 2^i (unsigned only)
  – srl $t2, $s0, 4    # reg $t2 = reg $s0 >> 4 bits

AND Operations
• Useful to mask bits in a word
  – Select some bits, clear others to 0
  – Bit-by-bit operation: 1 if both bits are 1, 0 otherwise
    and $t0, $t1, $t2    # reg $t0 = reg $t1 & reg $t2
    $t2   0000 0000 0000 0000 0000 1101 1100 0000
    $t1   0000 0000 0000 0000 0011 1100 0000 0000
    $t0   0000 0000 0000 0000 0000 1100 0000 0000

OR Operations
• Useful to include bits in a word
  – Set some bits to 1, leave others unchanged
  – Places a 1 in the result if either bit is 1, 0 otherwise
    or $t0, $t1, $t2    # reg $t0 = reg $t1 | reg $t2
    $t2   0000 0000 0000 0000 0000 1101 1100 0000
    $t1   0000 0000 0000 0000 0011 1100 0000 0000
    $t0   0000 0000 0000 0000 0011 1101 1100 0000
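The shift and mask operations map directly onto C's unsigned operators. These one-line helpers (hypothetical names mirroring the MIPS mnemonics) reproduce the AND and OR register values above and show why srl divides only unsigned values: shifting 0xFFFFFFFF (–1 if read as signed) right by 1 gives the large positive value 0x7FFFFFFF.

```c
#include <stdint.h>

/* sll: shift left, filling with 0s; multiplies by 2^i modulo 2^32. i must be 0..31. */
uint32_t sll(uint32_t x, unsigned i) { return x << i; }

/* srl: shift right, filling with 0s; divides an unsigned value by 2^i. i must be 0..31. */
uint32_t srl(uint32_t x, unsigned i) { return x >> i; }

/* and with a mask: keep only the bits of x that are 1 in mask. */
uint32_t keep_bits(uint32_t x, uint32_t mask) { return x & mask; }

/* or with a mask: force the bits that are 1 in mask to 1, leave the rest. */
uint32_t set_bits(uint32_t x, uint32_t mask) { return x | mask; }

/* Extract a w-bit field starting at bit pos (w must be 1..31),
 * e.g. the rs field of an instruction is field(word, 21, 5). */
uint32_t field(uint32_t x, unsigned pos, unsigned w)
{
    return (x >> pos) & ((1u << w) - 1u);
}
```

With $t1 = 0x3C00 and $t2 = 0x0DC0 as on the slides, keep_bits gives 0x0C00 and set_bits gives 0x3DC0.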
NOT Operations
• Useful to invert bits in a word
  – Change 0 to 1, and 1 to 0
• For consistency, MIPS has NOR, a 3-operand instruction, instead of NOT
  – a NOR b == NOT (a OR b)
    nor $t0, $t1, $zero    # reg $t0 = ~(reg $t1 | $zero)
• Register 0 ($zero) always reads as zero
    $t1   0000 0000 0000 0000 0011 1100 0000 0000
    $t0   1111 1111 1111 1111 1100 0011 1111 1111
• The full MIPS instruction set also includes XOR

§2.7 Instructions for Making Decisions
Conditional Operations
• MIPS includes two decision-making instructions, similar to an if statement, as well as a "go to"
• Branch to a labeled instruction if a condition is true
  – Otherwise, continue sequentially
• beq rs, rt, L1    # branch on equal
  – if (rs == rt) branch to instruction labeled L1;
• bne rs, rt, L1    # branch on not equal
  – if (rs != rt) branch to instruction labeled L1;
• j L1
  – unconditional jump to instruction labeled L1

Compiling C/Java if into MIPS
• Compile by hand:
    if (i == j) f = g + h; else f = g - h;
• Use this mapping: f: $s0, g: $s1, h: $s2, i: $s3, j: $s4

Compiling If Statements
• C code:
    if (i == j) f = g + h;
    else f = g - h;
  where f, g, … are in $s0, $s1, …
• Compiled MIPS code:
          bne $s3, $s4, Else    # go to Else if i ≠ j
          add $s0, $s1, $s2     # f = g + h (skipped if i ≠ j)
          j   Exit              # go to Exit
    Else: sub $s0, $s1, $s2     # f = g - h (skipped if i = j)
    Exit: …
• The assembler calculates the addresses
[Figure: flowchart. Test i == j?; if true (i == j), f = g + h; if false (i != j), f = g - h; both paths continue at Exit]

Compiling Loop Statements
• Code for loops is similar to that for decisions
• C code:
    while (save[i] == k) i += 1;
  where i is in $s3, k in $s5, and the address of the array save is in $s6
  – First load save[i] into a temporary register. To do so, form its address by multiplying the index i by 4 (bytes per word), giving $t1.
  – Then add $t1 to the base of save in $s6.
• Compiled MIPS code:
    Loop: sll  $t1, $s3, 2      # temp reg $t1 = i * 4
          add  $t1, $t1, $s6    # $t1 = address of save[i]
          lw   $t0, 0($t1)      # temp reg $t0 = save[i]
          bne  $t0, $s5, Exit   # go to Exit if save[i] ≠ k
          addi $s3, $s3, 1      # i = i + 1
          j    Loop             # go to Loop
    Exit: …

Basic Blocks
• A basic block is a sequence of instructions with
  – No embedded branches (except at end)
  – No branch targets (except at beginning)
• A compiler identifies basic blocks for optimization
• An advanced processor can accelerate execution of basic blocks

More Conditional Operations
• Test for equality or inequality
• Set result to 1 if a condition is true
  – Otherwise, set to 0
• slt rd, rs, rt    # set on less than
  – if (rs < rt) rd = 1; else rd = 0;
• slti rt, rs, constant    # set on less than immediate
  – if (rs < constant) rt = 1; else rt = 0;
• Use in combination with beq, bne:
    slt $t0, $s1, $s2     # if ($s1 < $s2)
    bne $t0, $zero, L     #   branch to L

Branch Instruction Design
• MIPS does not include a branch-on-less-than instruction.
• Why not blt, bge, etc.?
• Follows von Neumann's warning to keep equipment simple
• Hardware for <, ≥, … is slower than for =, ≠
  – Combining it with a branch involves more work per instruction, requiring a slower clock
  – All instructions would be penalized!
• beq and bne are the common case
• This is a good design compromise: use only slt, slti, beq, bne, and $zero for all relative conditions.

Signed vs. Unsigned
• Signed comparison: slt, slti
• Unsigned comparison: sltu, sltiu
• Example
    $s0 = 1111 1111 1111 1111 1111 1111 1111 1111
    $s1 = 0000 0000 0000 0000 0000 0000 0000 0001
    slt  $t0, $s0, $s1    # signed: –1 < +1, so $t0 = 1
    sltu $t0, $s0, $s1    # unsigned: +4,294,967,295 > +1, so $t0 = 0
The value in reg $s0 is –1 if it is a signed integer and 4,294,967,295 if it is an unsigned integer. Register $s1 contains 1 in either case.

Signed vs. Unsigned
• Treating signed numbers as if they were unsigned gives a low-cost way to check whether 0 ≤ x < y
• This can be used as a bounds check for an array index.
• An unsigned comparison x < y also catches a negative x: read as unsigned, a negative x is a very large number, so the test fails for any nonnegative bound y.

Case/Switch Statement
• The simplest implementation of switch is a sequence of if-then-else statements
• An alternative is a jump address table, or jump table
  – The program indexes into the table and then jumps to the appropriate sequence
  – The jump table is an array of addresses corresponding to the labels in the code
  – The program loads the entry into a register and then jumps to the address in the register
  – MIPS includes a jump register instruction (jr) for this

"And in Conclusion…"
• Memory is byte-addressable, but lw and sw access one word at a time.
• A pointer (used by lw and sw) is just a memory address, so we can add to it or subtract from it (using an offset).
• A decision allows us to decide what to execute at run time rather than compile time.
• C/Java decisions are made using conditional statements within if, while, do while, for.
• MIPS decision-making instructions are the conditional branches: beq and bne.
• New instructions: lw, sw, beq, bne, j

§2.8 Supporting Procedures in Computer Hardware
Procedure Calling
• Steps required
  1. Place parameters in registers
  2. Transfer control to procedure
  3. Acquire storage for procedure
  4. Perform procedure's operations
  5. Place result in register for caller
  6. Return to place of call

Register Usage
• $a0 – $a3: arguments (registers 4 – 7)
• $v0, $v1: result values (registers 2 and 3)
• $t0 – $t9: temporaries
  – Can be overwritten by callee
• $s0 – $s7: saved
  – Must be saved/restored by callee
• $gp: global pointer for static data (reg 28)
• $sp: stack pointer (reg 29)
• $fp: frame pointer (reg 30)
• $ra: return address (reg 31)

Procedure Call Instructions
• Procedure call: jump and link (jal)
    jal ProcedureLabel
  – Address of following instruction put in $ra
  – Jumps to target address
• Procedure return: jump register (jr)
    jr $ra
  – Copies $ra to program counter
  – Can also be used for computed jumps
    • e.g., for case/switch statements

Procedure Call Instructions
• The "link" means that an address, or link, is formed that points to the calling site, so that the procedure can return to the proper address
  – This return address is stored in $ra
• The calling program (the caller) puts the parameter values in registers ($a0–$a3) and uses jal X to jump to procedure X (the callee).
• The callee performs the calculations and then returns to the caller using jr $ra
• The program counter (PC) holds the address of the current instruction; jal saves PC + 4, the address of the following instruction, in $ra

Using More Registers
• Compilers often need more registers than the processor has
  – Extra register values are spilled to memory
• Stack: a LIFO (last in, first out) data structure
• Stack pointer: adjusted by one word for each register saved or restored
• MIPS reserves a register for the stack pointer, $sp
• The stack "grows" from higher to lower addresses
• Push: places data on the stack (subtract from the stack pointer)
• Pop: removes data from the stack (add to the stack pointer)
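The push and pop rules above can be modelled in C as a rough sketch. Everything here (struct toy_stack, push, pop, the 64-byte size) is hypothetical; sp is kept as a byte address that starts past the top of the area, so each push subtracts 4 before storing and each pop loads before adding 4, matching the addi/sw and lw/addi pairs used in the MIPS procedure code.

```c
#include <stdint.h>

#define STACK_BYTES 64u   /* size of the toy stack area (an assumption) */

struct toy_stack {
    uint32_t words[STACK_BYTES / 4];
    uint32_t sp;          /* byte address of the top item; STACK_BYTES when empty */
};

void stack_init(struct toy_stack *s) { s->sp = STACK_BYTES; }

/* Push, like: addi $sp, $sp, -4 ; sw reg, 0($sp). Returns 0 on overflow. */
int push(struct toy_stack *s, uint32_t value)
{
    if (s->sp < 4u)
        return 0;
    s->sp -= 4u;                   /* the stack grows toward lower addresses */
    s->words[s->sp / 4u] = value;
    return 1;
}

/* Pop, like: lw reg, 0($sp) ; addi $sp, $sp, 4. Returns 0 on underflow. */
int pop(struct toy_stack *s, uint32_t *value)
{
    if (s->sp >= STACK_BYTES)
        return 0;
    *value = s->words[s->sp / 4u];
    s->sp += 4u;
    return 1;
}
```

Pushing two values moves sp from 64 down to 56; popping returns them in reverse (LIFO) order and restores sp to 64.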
Leaf Procedure Example
• C code:
    int leaf_example(int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }
  – Arguments g, …, j in $a0, …, $a3
  – f in $s0 (hence, need to save $s0 on stack)
  – Result in $v0

Leaf Procedure Example
• MIPS code:
    leaf_example:
        addi $sp, $sp, -4      # save $s0 on stack
        sw   $s0, 0($sp)
        add  $t0, $a0, $a1     # procedure body
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero   # result
        lw   $s0, 0($sp)       # restore $s0
        addi $sp, $sp, 4
        jr   $ra               # return

Non-Leaf Procedures
• Procedures that call other procedures
• For a nested call, the caller needs to save on the stack:
  – Its return address
  – Any arguments and temporaries needed after the call
• Restore from the stack after the call

Non-Leaf Procedure Example
• C code:
    int fact(int n)
    {
        if (n < 1) return 1;
        else return n * fact(n - 1);
    }
  – Argument n in $a0
  – Result in $v0

Non-Leaf Procedure Example
• MIPS code:
    fact: addi $sp, $sp, -8     # adjust stack for 2 items
          sw   $ra, 4($sp)      # save return address
          sw   $a0, 0($sp)      # save argument
          slti $t0, $a0, 1      # test for n < 1
          beq  $t0, $zero, L1
          addi $v0, $zero, 1    # if so, result is 1
          addi $sp, $sp, 8      #   pop 2 items from stack
          jr   $ra              #   and return
    L1:   addi $a0, $a0, -1     # else decrement n
          jal  fact             # recursive call
          lw   $a0, 0($sp)      # restore original n
          lw   $ra, 4($sp)      #   and return address
          addi $sp, $sp, 8      # pop 2 items from stack
          mul  $v0, $a0, $v0    # multiply to get result
          jr   $ra              #   and return

Local Data on the Stack
• Local data allocated by callee
  – e.g., C automatic variables
• Procedure frame (activation record)
  – Used by some compilers to manage stack storage
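To see what the MIPS fact code keeps on the stack, here is a C sketch that makes the frames explicit. fact is the recursive function with a base case of 1; fact_explicit is a hypothetical rewrite that pushes each saved n onto an array on the way down (the "calls") and multiplies while popping on the way back (the "returns"). It does not model the saved $ra, and the depth limit MAX_FRAMES is an assumption of the sketch.

```c
#define MAX_FRAMES 32   /* toy stack depth (an assumption) */

/* Recursive version: the compiler's stack frames hold n and the return address. */
int fact(int n)
{
    if (n < 1)
        return 1;
    return n * fact(n - 1);
}

/* Same computation with the saved arguments kept in an explicit array.
 * Returns -1 if the toy stack would overflow. */
int fact_explicit(int n)
{
    int saved_n[MAX_FRAMES];
    int depth = 0;

    while (n >= 1) {                 /* calling phase: the L1 path */
        if (depth == MAX_FRAMES)
            return -1;
        saved_n[depth++] = n;        /* sw $a0, 0($sp) */
        n = n - 1;                   /* addi $a0, $a0, -1 */
    }
    int v = 1;                       /* base case: addi $v0, $zero, 1 */
    while (depth > 0)                /* returning phase */
        v = saved_n[--depth] * v;    /* lw $a0, 0($sp); mul $v0, $a0, $v0 */
    return v;
}
```

Both versions give 120 for n = 5; the explicit one shows that each pending multiplication needs its own saved copy of n.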
Memory Layout
• Text: program code
• Static data: global variables
  – e.g., static variables in C, constant arrays and strings
  – $gp initialized to an address allowing ±offsets into this segment
• Dynamic data: heap
  – e.g., malloc in C, new in Java
• Stack: automatic storage

§2.9 Communicating with People
Character Data
• Byte-encoded character sets
  – ASCII: 128 characters
    • 95 graphic, 33 control
  – Latin-1: 256 characters
    • ASCII, +96 more graphic characters
• Unicode: 32-bit character set
  – Used in Java, C++ wide characters, …
  – Most of the world's alphabets, plus symbols
  – UTF-8, UTF-16: variable-length encodings

Byte/Halfword Operations
• Could use bitwise operations
• MIPS byte/halfword load/store
  – String processing is a common case
    lb  rt, offset(rs)     lh  rt, offset(rs)
  – Sign extend to 32 bits in rt
    lbu rt, offset(rs)     lhu rt, offset(rs)
  – Zero extend to 32 bits in rt
    sb  rt, offset(rs)     sh  rt, offset(rs)
  – Store just rightmost byte/halfword

String Copy Example
• C code (naïve):
  – Null-terminated string
    void strcpy (char x[], char y[])
    {
        int i;
        i = 0;
        while ((x[i] = y[i]) != '\0')
            i += 1;
    }
  – Addresses of x, y in $a0, $a1
  – i in $s0

String Copy Example
• MIPS code:
    strcpy:
        addi $sp, $sp, -4       # adjust stack for 1 item
        sw   $s0, 0($sp)        # save $s0
        add  $s0, $zero, $zero  # i = 0
    L1: add  $t1, $s0, $a1      # addr of y[i] in $t1
        lbu  $t2, 0($t1)        # $t2 = y[i]
        add  $t3, $s0, $a0      # addr of x[i] in $t3
        sb   $t2, 0($t3)        # x[i] = y[i]
        beq  $t2, $zero, L2     # exit loop if y[i] == 0
        addi $s0, $s0, 1        # i = i + 1
        j    L1                 # next iteration of loop
    L2: lw   $s0, 0($sp)        # restore saved $s0
        addi $sp, $sp, 4        # pop 1 item from stack
        jr   $ra                # and return

§2.10 MIPS Addressing for 32-Bit Immediates and Addresses
32-bit Constants
• Most constants are small
  – 16-bit immediate is sufficient
• For the occasional 32-bit constant
    lui rt, constant
  – Copies 16-bit constant to left 16 bits of rt
  – Clears right 16 bits of rt to 0
    lui $s0, 61           0000 0000 0011 1101 0000 0000 0000 0000
    ori $s0, $s0, 2304    0000 0000 0011 1101 0000 1001 0000 0000

Branch Addressing
• Branch instructions specify
  – Opcode, two registers, target address
• Most branch targets are near the branch
  – Forward or backward
    op       rs       rt       constant or address
    6 bits   5 bits   5 bits   16 bits
• PC-relative addressing
  – Target address = PC + offset × 4
  – PC already incremented by 4 by this time

Jump Addressing
• Jump (j and jal) targets could be anywhere in the text segment
  – Encode full address in instruction
    op       address
    6 bits   26 bits
• (Pseudo)Direct jump addressing
  – Target address = PC31…28 : (address × 4)

Target Addressing Example
• Loop code from earlier example
  – Assume Loop at location 80000
                                   Address   Fields
    Loop: sll  $t1, $s3, 2         80000     0    0    19   9    2    0
          add  $t1, $t1, $s6       80004     0    9    22   9    0    32
          lw   $t0, 0($t1)         80008     35   9    8    0
          bne  $t0, $s5, Exit      80012     5    8    21   2
          addi $s3, $s3, 1         80016     8    19   19   1
          j    Loop                80020     2    20000
    Exit: …                        80024
• bne: offset 2, so target = 80016 + 2 × 4 = 80024 (Exit); j: address field 20000, so target = 20000 × 4 = 80000 (Loop)

Branching Far Away
• If the branch target is too far to encode with a 16-bit offset, the assembler rewrites the code
• Example
        beq $s0, $s1, L1
    becomes
        bne $s0, $s1, L2
        j   L1
    L2: …

Addressing Mode Summary
[Figure not reproduced]
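The address arithmetic on the branch, jump and 32-bit constant slides can be checked with a small C sketch. The function names are hypothetical. branch_target takes the address of the branch itself and adds 4 first (the "PC already incremented" rule); jump_target takes the upper 4 bits from that incremented PC, which the slide writes as PC31…28.

```c
#include <stdint.h>

/* beq/bne: PC-relative. pc is the branch's own address; the 16-bit
 * offset field is a signed word count added to PC + 4. */
uint32_t branch_target(uint32_t pc, uint16_t offset_field)
{
    int32_t words = offset_field >= 0x8000u ? (int32_t)offset_field - 0x10000
                                            : (int32_t)offset_field;
    return pc + 4u + (uint32_t)(words * 4);   /* unsigned wraparound handles negatives */
}

/* j/jal: pseudodirect. Upper 4 bits of PC + 4, then the 26-bit field times 4. */
uint32_t jump_target(uint32_t pc, uint32_t addr_field)
{
    return ((pc + 4u) & 0xF0000000u) | ((addr_field & 0x03FFFFFFu) << 2);
}

/* lui rt, hi ; ori rt, rt, lo: build a 32-bit constant in two steps. */
uint32_t lui_ori(uint16_t hi, uint16_t lo)
{
    uint32_t r = (uint32_t)hi << 16;   /* lui: upper half = hi, lower half = 0 */
    return r | lo;                     /* ori: fill in the lower half */
}
```

With the Target Addressing Example values, branch_target(80012, 2) is 80024 (Exit) and jump_target(80020, 20000) is 80000 (Loop). lui_ori(61, 2304) is 0x003D0900, which is 4,000,000.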
§2.11 Parallelism and Instructions: Synchronization
Synchronization
• Two processors sharing an area of memory
  – P1 writes, then P2 reads
  – Data race if P1 and P2 don't synchronize
    • Result depends on order of accesses
• Hardware support required
  – Atomic read/write memory operation
  – No other access to the location allowed between the read and write
• Could be a single instruction
  – e.g., atomic swap of register ↔ memory
  – Or an atomic pair of instructions

Synchronization in MIPS
• Load linked: ll rt, offset(rs)
• Store conditional: sc rt, offset(rs)
  – Succeeds if location not changed since the ll
    • Returns 1 in rt
  – Fails if location is changed
    • Returns 0 in rt
• Example: atomic swap (to test/set lock variable)
    try: add $t0, $zero, $s4    # copy exchange value
         ll  $t1, 0($s1)        # load linked
         sc  $t0, 0($s1)        # store conditional
         beq $t0, $zero, try    # branch if store fails
         add $s4, $zero, $t1    # put loaded value in $s4

§2.12 Translating and Starting a Program
Translation and Startup
[Figure: translation and startup steps. Annotations: many compilers produce object modules directly; static linking]

Assembler Pseudoinstructions
• Most assembler instructions represent machine instructions one-to-one
• Pseudoinstructions: figments of the assembler's imagination
    move $t0, $t1       →   add $t0, $zero, $t1
    blt  $t0, $t1, L    →   slt $at, $t0, $t1
                            bne $at, $zero, L
  – $at (register 1): assembler temporary
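Returning to the ll/sc atomic swap on the Synchronization in MIPS slide: C11 has no ll/sc, but a compare-and-swap retry loop has the same shape. This is an analogue under that assumption, not a translation. Compare-and-swap succeeds whenever the value still matches what was read, whereas sc fails if the location was written at all since the ll. The name atomic_swap is hypothetical; C11 also offers atomic_exchange, which performs the whole swap in one call.

```c
#include <stdatomic.h>

/* Atomically store new_value into *lock_var and return the previous value. */
int atomic_swap(atomic_int *lock_var, int new_value)
{
    int old = atomic_load(lock_var);                      /* like ll $t1, 0($s1) */
    /* Like sc + beq $t0, $zero, try: retry until the store succeeds.
     * On failure, old is refreshed with the current value. The weak form
     * may also fail spuriously, much as sc can. */
    while (!atomic_compare_exchange_weak(lock_var, &old, new_value))
        ;
    return old;                                           /* like add $s4, $zero, $t1 */
}
```

Used as a test-and-set: a thread that gets 0 back from atomic_swap(&lock, 1) has acquired the lock; getting 1 back means another thread holds it.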
Producing an Object Module
• Assembler (or compiler) translates program into machine instructions
• Provides information for building a complete program from the pieces
  – Header: describes contents of object module
  – Text segment: translated instructions
  – Static data segment: data allocated for the life of the program
  – Relocation info: for contents that depend on absolute location of loaded program
  – Symbol table: global definitions and external refs
  – Debug info: for associating with source code

Linking Object Modules
• Produces an executable image
  1. Merges segments
  2. Resolves labels (determines their addresses)
  3. Patches location-dependent and external refs
• Could leave location dependencies for fixing by a relocating loader
  – But with virtual memory, no need to do this
  – Program can be loaded into absolute location in virtual memory space

Loading a Program
• Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory
     • Or set page table entries so they can be faulted in
  4. Set up arguments on stack
  5. Initialize registers (including $sp, $fp, $gp)
  6. Jump to startup routine
     • Copies arguments to $a0, … and calls main
     • When main returns, do exit syscall

Dynamic Linking
• Only link/load library procedure when it is called
  – Requires procedure code to be relocatable
  – Avoids image bloat caused by static linking of all (transitively) referenced libraries
  – Automatically picks up new library versions

Lazy Linkage
[Figure: lazy linkage. Labels: indirection table; stub: loads routine ID, jumps to linker/loader; linker/loader code; dynamically mapped code]
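The lazy-linkage figure can be imitated in plain C with a function-pointer table, as a toy model only. Every name here is hypothetical, and no real dynamic linker is involved: the table entry starts out pointing at a stub, the first call runs the stub, which binds the entry to the real routine (standing in for the linker/loader work) and finishes the call; later calls go straight through the patched entry.

```c
typedef int (*int_fn)(int);

static int square(int x) { return x * x; }    /* the "library routine" */

static int resolve_count = 0;                 /* times the "linker" ran */
static int square_stub(int x);

/* Indirection table: every call goes through this entry. */
static int_fn indirection_table[1] = { square_stub };

static int square_stub(int x)
{
    resolve_count++;                     /* one-time linker/loader work */
    indirection_table[0] = square;       /* patch the table entry */
    return indirection_table[0](x);      /* complete the original call */
}

int call_square(int x) { return indirection_table[0](x); }
int resolutions(void) { return resolve_count; }
```

After call_square(3) returns 9, resolutions() is 1, and it stays 1 on every later call: the cost of resolving is paid only for routines that are actually called.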
Starting Java Applications
[Figure: starting Java applications. Labels: simple portable instruction set for the JVM; interprets bytecodes; compiles bytecodes of "hot" methods into native code for host machine]

§2.13 A C Sort Example to Put It All Together
C Sort Example
• Illustrates use of assembly instructions for a C bubble sort function
• Swap procedure (leaf)
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
  – v in $a0, k in $a1, temp in $t0

The Procedure Swap
    swap: sll $t1, $a1, 2      # $t1 = k * 4
          add $t1, $a0, $t1    # $t1 = v + (k*4)
                               #   (address of v[k])
          lw  $t0, 0($t1)      # $t0 (temp) = v[k]
          lw  $t2, 4($t1)      # $t2 = v[k+1]
          sw  $t2, 0($t1)      # v[k] = $t2 (v[k+1])
          sw  $t0, 4($t1)      # v[k+1] = $t0 (temp)
          jr  $ra              # return to calling routine

The Sort Procedure in C
• Non-leaf (calls swap)
    void sort (int v[], int n)
    {
        int i, j;
        for (i = 0; i < n; i += 1) {
            for (j = i - 1;
                 j >= 0 && v[j] > v[j + 1];
                 j -= 1) {
                swap(v, j);
            }
        }
    }
  – v in $a0, n in $a1, i in $s0, j in $s1
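The swap procedure computes the address of v[k] as v + k * 4 with sll and add, then uses offsets 0 and 4. Written the same way in C, as a sketch that assumes 4-byte int32_t elements to match lw/sw word accesses (swap_words is a hypothetical name):

```c
#include <stddef.h>
#include <stdint.h>

void swap_words(int32_t v[], size_t k)
{
    size_t byte_offset = k << 2;                                  /* sll $t1, $a1, 2   */
    int32_t *p = (int32_t *)((unsigned char *)v + byte_offset);   /* add $t1, $a0, $t1 */
    int32_t temp = p[0];    /* lw $t0, 0($t1): v[k]                                   */
    int32_t next = p[1];    /* lw $t2, 4($t1): p[1] is 4 bytes past p, i.e. v[k+1]    */
    p[0] = next;            /* sw $t2, 0($t1)                                         */
    p[1] = temp;            /* sw $t0, 4($t1)                                         */
}
```

In ordinary C one would write v[k] and v[k+1] and let the compiler produce the same scaling; the explicit byte arithmetic is only there to mirror the assembly.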
The Procedure Body
    # Move params
             move $s2, $a0           # save $a0 into $s2
             move $s3, $a1           # save $a1 into $s3
    # Outer loop
             move $s0, $zero         # i = 0
    for1tst: slt  $t0, $s0, $s3      # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
             beq  $t0, $zero, exit1  # go to exit1 if $s0 ≥ $s3 (i ≥ n)
    # Inner loop
             addi $s1, $s0, -1       # j = i - 1
    for2tst: slti $t0, $s1, 0        # $t0 = 1 if $s1 < 0 (j < 0)
             bne  $t0, $zero, exit2  # go to exit2 if $s1 < 0 (j < 0)
             sll  $t1, $s1, 2        # $t1 = j * 4
             add  $t2, $s2, $t1      # $t2 = v + (j * 4)
             lw   $t3, 0($t2)        # $t3 = v[j]
             lw   $t4, 4($t2)        # $t4 = v[j + 1]
             slt  $t0, $t4, $t3      # $t0 = 0 if $t4 ≥ $t3
             beq  $t0, $zero, exit2  # go to exit2 if $t4 ≥ $t3
    # Pass params & call
             move $a0, $s2           # 1st param of swap is v (old $a0)
             move $a1, $s1           # 2nd param of swap is j
             jal  swap               # call swap procedure
    # Inner loop
             addi $s1, $s1, -1       # j -= 1
             j    for2tst            # jump to test of inner loop
    # Outer loop
    exit2:   addi $s0, $s0, 1        # i += 1
             j    for1tst            # jump to test of outer loop

The Full Procedure
    sort:  addi $sp, $sp, -20    # make room on stack for 5 registers
           sw   $ra, 16($sp)     # save $ra on stack
           sw   $s3, 12($sp)     # save $s3 on stack
           sw   $s2, 8($sp)      # save $s2 on stack
           sw   $s1, 4($sp)      # save $s1 on stack
           sw   $s0, 0($sp)      # save $s0 on stack
           …                     # procedure body
           …
    exit1: lw   $s0, 0($sp)      # restore $s0 from stack
           lw   $s1, 4($sp)      # restore $s1 from stack
           lw   $s2, 8($sp)      # restore $s2 from stack
           lw   $s3, 12($sp)     # restore $s3 from stack
           lw   $ra, 16($sp)     # restore $ra from stack
           addi $sp, $sp, 20     # restore stack pointer
           jr   $ra              # return to calling routine

Effect of Compiler Optimization
[Figure: compiled with gcc for Pentium 4 under Linux. Panels: Relative Performance, Clock Cycles, Instruction count, CPI; each for optimization levels none, O1, O2, O3]

Effect of Language and Algorithm
[Figure: panels: Bubblesort Relative Performance, Quicksort Relative Performance, Quicksort vs. Bubblesort Speedup; configurations C/none, C/O1, C/O2, C/O3, Java/int, Java/JIT]

Lessons Learned
• Instruction count and CPI are not good performance indicators in isolation
• Compiler optimizations are sensitive to the algorithm
• Java/JIT compiled code is significantly faster than JVM interpreted
  – Comparable to optimized C in some cases
• Nothing can fix a dumb algorithm!

§2.14 Arrays versus Pointers
Arrays vs. Pointers
• Array indexing involves
  – Multiplying index by element size
  – Adding to array base address
• Pointers correspond directly to memory addresses
  – Can avoid indexing complexity

Example: Clearing an Array
    void clear1(int array[], int size)
    {
        int i;
        for (i = 0; i < size; i += 1)
            array[i] = 0;
    }

    void clear2(int *array, int size)
    {
        int *p;
        for (p = &array[0]; p < &array[size]; p = p + 1)
            *p = 0;
    }

    clear1:
           move $t0, $zero          # i = 0
    loop1: sll  $t1, $t0, 2         # $t1 = i * 4
           add  $t2, $a0, $t1       # $t2 = &array[i]
           sw   $zero, 0($t2)       # array[i] = 0
           addi $t0, $t0, 1         # i = i + 1
           slt  $t3, $t0, $a1       # $t3 = (i < size)
           bne  $t3, $zero, loop1   # if (…) goto loop1

    clear2:
           move $t0, $a0            # p = &array[0]
           sll  $t1, $a1, 2         # $t1 = size * 4
           add  $t2, $a0, $t1       # $t2 = &array[size]
    loop2: sw   $zero, 0($t0)       # Memory[p] = 0
           addi $t0, $t0, 4         # p = p + 4
           slt  $t3, $t0, $t2       # $t3 = (p < &array[size])
           bne  $t3, $zero, loop2   # if (…) goto loop2

Comparison of Array vs. Ptr
• Multiply "strength reduced" to shift
• Array version requires shift to be inside loop
  – Part of index calculation for incremented i
  – c.f. incrementing pointer
• Compiler can achieve same effect as manual use of pointers
  – Induction variable elimination
  – Better to make program clearer and safer

§2.16 Real Stuff: ARM Instructions
ARM & MIPS Similarities
• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS
                             ARM              MIPS
    Date announced           1985             1985
    Instruction size         32 bits          32 bits
    Address space            32-bit flat      32-bit flat
    Data alignment           Aligned          Aligned
    Data addressing modes    9                3
    Registers                15 × 32-bit      31 × 32-bit
    Input/output             Memory mapped    Memory mapped

Compare and Branch in ARM
• Uses condition codes for result of an arithmetic/logical instruction
  – Negative, zero, carry, overflow
  – Compare instructions to set condition codes without keeping the result
• Each instruction can be conditional
  – Top 4 bits of instruction word: condition value
  – Can avoid branches over single instructions

Instruction Encoding
[Figure not reproduced]

§2.17 Real Stuff: x86 Instructions
The Intel x86 ISA
• Evolution with backward compatibility
  – 8080 (1974): 8-bit microprocessor
    • Accumulator, plus 3 index-register pairs
  – 8086 (1978): 16-bit extension to 8080
    • Complex instruction set (CISC)
  – 8087 (1980): floating-point coprocessor
    • Adds FP instructions and register stack
  – 80286 (1982): 24-bit addresses, MMU
    • Segmented memory mapping and protection
  – 80386 (1985): 32-bit extension (now IA-32)
    • Additional addressing modes and operations
    • Paged memory mapping as well as segments

The Intel x86 ISA
• Further evolution…
  – i486 (1989): pipelined, on-chip caches and FPU
    • Compatible competitors: AMD, Cyrix, …
  – Pentium (1993): superscalar, 64-bit datapath
    • Later versions added MMX (Multi-Media eXtension) instructions
    • The infamous FDIV bug
  – Pentium Pro (1995), Pentium II (1997)
    • New microarchitecture (see Colwell, The Pentium Chronicles)
  – Pentium III (1999)
    • Added SSE (Streaming SIMD Extensions) and associated registers
  – Pentium 4 (2001)
    • New microarchitecture
    • Added SSE2 instructions

The Intel x86 ISA
• And further…
  – AMD64 (2003): extended architecture to 64 bits
  – EM64T: Extended Memory 64 Technology (2004)
    • AMD64 adopted by Intel (with refinements)
    • Added SSE3 instructions
  – Intel Core (2006)
    • Added SSE4 instructions, virtual machine support
  – AMD64 (announced 2007): SSE5 instructions
    • Intel declined to follow, instead…
  – Advanced Vector Extension (announced 2008)
    • Longer SSE registers, more instructions
• If Intel didn't extend with compatibility, its competitors would!
  – Technical elegance ≠ market success

Basic x86 Registers
[Figure not reproduced]

Basic x86 Addressing Modes
• Two operands per instruction
    Source/dest operand    Second source operand
    Register               Register
    Register               Immediate
    Register               Memory
    Memory                 Register
    Memory                 Immediate
• Memory addressing modes
  – Address in register
  – Address = Rbase + displacement
  – Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)
  – Address = Rbase + 2^scale × Rindex + displacement

x86 Instruction Encoding
• Variable length encoding
  – Postfix bytes specify addressing mode
  – Prefix bytes modify operation
    • Operand length, repetition, locking, …
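The x86 memory addressing modes listed above are all special cases of one formula, base + 2^scale × index + displacement. A C sketch of that calculation (x86_ea is a hypothetical name; arithmetic wraps modulo 2^32 as in 32-bit IA-32 addressing, and the "address in register" and "base + displacement" modes are obtained by passing index 0 and, where needed, displacement 0):

```c
#include <stdint.h>

/* Returns 0 if scale is not 0..3; otherwise stores the effective address in *ea. */
int x86_ea(uint32_t base, uint32_t index, unsigned scale, int32_t disp, uint32_t *ea)
{
    if (scale > 3u)
        return 0;                                  /* only 1, 2, 4 or 8 are encodable */
    *ea = base + (index << scale) + (uint32_t)disp; /* wraps modulo 2^32 */
    return 1;
}
```

For example, base 0x1000, index 5, scale 2 (element size 4) and displacement 8 give 0x101C, the kind of address an array access with a constant offset needs; MIPS would compute it with separate sll and add instructions.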
Implementing IA-32
• Complex instruction set makes implementation difficult
  – Hardware translates instructions to simpler microoperations
    • Simple instructions: 1–1
    • Complex instructions: 1–many
  – Microengine similar to RISC
  – Market share makes this economically viable
• Comparable performance to RISC
  – Compilers avoid complex instructions

§2.18 Fallacies and Pitfalls
Fallacies
• Powerful instruction ⇒ higher performance
  – Fewer instructions required
  – But complex instructions are hard to implement
    • May slow down all instructions, including simple ones
  – Compilers are good at making fast code from simple instructions
• Use assembly code for high performance
  – But modern compilers are better at dealing with modern processors
  – More lines of code ⇒ more errors and less productivity

Fallacies
• Backward compatibility ⇒ instruction set doesn't change
  – But they do acquire more instructions
[Figure: x86 instruction set]

Pitfalls
• Sequential words are not at sequential addresses
  – Increment by 4, not by 1!
• Keeping a pointer to an automatic variable after procedure returns
  – e.g., passing pointer back via an argument
  – Pointer becomes invalid when stack popped

§2.19 Concluding Remarks
Concluding Remarks
• Design principles
  1. Simplicity favors regularity
  2. Smaller is faster
  3. Make the common case fast
  4. Good design demands good compromises
• Layers of software/hardware
  – Compiler, assembler, hardware
• MIPS: typical of RISC ISAs
  – c.f. x86

Concluding Remarks
• Measure MIPS instruction executions in benchmark programs
  – Consider making the common case fast
  – Consider compromises
    Instruction class   MIPS examples                        SPEC2006 Int   SPEC2006 FP
    Arithmetic          add, sub, addi                       16%            48%
    Data transfer       lw, sw, lb, lbu, lh, lhu, sb, lui    35%            36%
    Logical             and, or, nor, andi, ori, sll, srl    12%            4%
    Cond. Branch        beq, bne, slt, slti, sltiu           34%            8%
    Jump                j, jr, jal                           2%             0%
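The pitfall "sequential words are not at sequential addresses" can be checked in C. This sketch (word_address and element_stride are hypothetical helpers) computes the byte address of word i as base + 4 × i and confirms that consecutive 32-bit array elements sit 4 bytes apart; adding 1 instead of 4 would point into the middle of a word.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address of word i in an array of 32-bit words starting at base. */
uint32_t word_address(uint32_t base, uint32_t i) { return base + 4u * i; }

/* Distance in bytes between two consecutive int32_t elements. */
ptrdiff_t element_stride(void)
{
    int32_t a[2] = {0, 0};
    return (unsigned char *)&a[1] - (unsigned char *)&a[0];
}
```

In C source, p + 1 on an int32_t pointer already advances 4 bytes; the pitfall bites in assembly, where addi $t0, $t0, 1 really does add just one byte.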