Instruction Set Architectures
CMSC411/Computer Architecture

These slides and all associated material are © 2003 by J. Six and are available only for students enrolled in CMSC411.

Science can amuse and fascinate us all, but it is engineering that changes the world.

CMSC411 – Computer Architecture / © 2003 J. Six

Use and Distribution Notice

Possession of any of these files implies understanding and agreement to this policy. The slides are provided for the use of students enrolled in Jeff Six's Computer Architecture class (CMSC 411) at the University of Maryland Baltimore County. They are the creation of Mr. Six and he reserves all rights as to the slides. These slides are not to be modified or redistributed in any way. All of these slides may only be used by students for the purpose of reviewing the material covered in lecture. Any other use, including but not limited to, the modification of any slides or the sale of any slides or material, in whole or in part, is expressly prohibited.

Most of the material in these slides, including the examples, is derived from Computer Organization and Design, Second Edition. Credit is hereby given to the authors of this textbook for much of the content. This content is used here for the purpose of presenting this material in CMSC 411, which uses this textbook.

Instructions and Instruction Sets

Each computer uses a certain language. Each individual command (a word in the computer’s language) is called an instruction. The set of all instructions that a specific computer understands is called its instruction set.

There are multiple machine languages (instruction sets). While each is different, reflecting the design choices made during its construction, most instruction sets are similar.

Throughout this course, we will study the MIPS instruction set, with occasional comparisons to others.
MIPS-based microprocessors are currently used by Silicon Graphics/SGI, NEC, and Cisco Systems (among others).

At the Beginning: Adding

The most basic instruction for every computer is the one that performs addition. The MIPS instruction for adding numbers is quite simple…

    add a, b, c

This instruction tells the computer to add the two variables b and c and to put the sum in the variable a. This syntax is fixed: the add instruction in MIPS always takes two source operands and one destination for the result.

Adding More than Two Numbers

Since the instruction format for adding is fixed, multiple instructions are needed to add more than two numbers. For instance, to add b, c, d, and e, putting the result in a…

    add a, b, c    # sum of b & c is now stored in a
    add a, a, d    # sum of b, c, & d is now stored in a
    add a, a, e    # sum of b, c, d, & e is now stored in a

The sharp symbol (#) begins a comment. Comments are ignored by the computer, and a comment always ends at the end of the line. Also note that one and only one instruction can appear on a line.

Fixed Operands: A Design Principle

Addition naturally favors three arguments: two operands and the sum. Hardware for a fixed number of operands is much less complex than hardware that supports a variable number of operands (it’s always easier to handle exactly two than to handle one, two, three, or any other number). This gives rise to one of the cardinal rules of computer design…

Simplicity favors regularity.

C to Assembler: A Basic Example

A compiler transforms a program written in a high-level language (like C) into assembly language (what we have just seen). A simple C program…

    a = b + c;
    d = a - e;

… becomes a simple assembly language program…

    add a, b, c
    sub d, a, e
C to Assembler: A More Complex Example

A more complex C statement…

    f = (g + h) - (i + j);

… also becomes a fairly simple assembly language program. Here, we need some temporary storage, so let’s call these temporary variables t0 and t1 (there’s a reason for those names; we’ll get to that in a bit)…

    add t0, g, h      # temp var t0 = g + h
    add t1, i, j      # temp var t1 = i + j
    sub f, t0, t1     # f = t0 - t1

Registers

So far, we have used the non-descriptive term variable for the operands to our instructions. At the assembly level, the operands of arithmetic instructions must be located in registers.

Registers are one of the fundamental concepts of computer architecture. They are high-speed memory locations located on the same die as the microprocessor, and they are directly addressable by the CPU.

The size of registers varies based on the instruction set. MIPS registers are 32 bits.

General Purpose Registers

These are registers that are used for normal instructions, for example to hold arithmetic operands. Non-GP registers would be registers like the stack pointer, which contains the location of the system stack data structure.

Register Limitations

There are typically only a small number of registers in a CPU…
MIPS has 32 32-bit general purpose registers.
Intel IA-32/x86 has 8 32-bit general purpose registers.

This (severe) limitation on the number of registers is typically imposed for two reasons…
Speed: smaller is normally faster.
Cost: registers are expensive.

Modern computer designs have seen an explosive growth in general purpose registers…
Intel IA-64 has 128 64-bit general purpose registers.

Loading Information into Registers

Since all operands for arithmetic instructions must be in registers, there needs to be a way to put information from main memory into a register. In MIPS, this is accomplished using the load-word instruction (lw).
The operands for this instruction are the name of the register to be loaded, and a constant followed by a register in parentheses. The memory address to load the value from is formed by adding the constant to the pointer held in that last register.

Memory Addressing

Memory in modern microprocessors is byte addressable. That means that each individual byte can be specified by a memory address.

Each microprocessor has the concept of a word: the “preferred size” of data values for that architecture.
A MIPS word is 32 bits.
An Intel IA-32/x86 word is 32 bits.
An Intel IA-64 word is 64 bits.

Most architectures enforce alignment restrictions: each word access must be at an address that is a multiple of the word size in bytes.

Alignment Restrictions

For example, with 4-byte words, a word can be found at memory locations 0, 4, 8, and so forth. It is not legal to attempt to access a word at location 3.

    Location 20
    Location 16
    Location 12
    Location 8
    Location 4
    Location 0
    (each location shown holds one 32-bit word)

Memory Address Space

The entire memory that is addressable by a microprocessor is referred to as its address space.

MIPS has a 32-bit address space. That means that all memory addresses are 32 bits long. This allows a maximum of 2^32 memory locations (remember each memory location is a byte; this is byte-addressable memory). This means that 32-bit microprocessors can address 4 GB of memory.

The first (lowest) byte address is 0.
The last (highest) byte address is 4294967295 (2^32 - 1).

The Load-Word Instruction

Consider the C language statement…

    g = h + A[8];

We need to get the value stored in main memory at location 8 in the array A into a register in order to perform this instruction.
If the base address of A is in $s3, this can be accomplished using the load-word instruction…

    lw $t0, 8($s3)

…now that the value A[8] is in register $t0, we can perform the addition…

    add $s1, $s2, $t0     # g = h + A[8]

Memory Layout: Endianness

One decision every microprocessor design must make is how to store multibyte values in memory.

Big endian processors store the leftmost (highest order) byte at the actual address of the value. Little endian processors store the rightmost (lowest order) byte at the actual address of the value.

For example, the value 00000000 11111111 01010101 10101010 stored at location 1024…

    Address          Big Endian    Little Endian
    Location 1027    10101010      00000000
    Location 1026    01010101      11111111
    Location 1025    11111111      01010101
    Location 1024    00000000      10101010

MIPS is big endian.

Revisiting the Load-Word Instruction

Looking again at the C language statement…

    g = h + A[8];

We need to get the value stored in main memory at location 8 in the array A into a register in order to perform this instruction.

Assuming A is an integer array, and since integers are 32 bits in the MIPS architecture, the offset must be 4 x 8 = 32, because each entry in A is four bytes long and memory is byte addressable. We now have the load-word instruction…

    lw $t0, 32($s3)

The Store-Word Instruction

We have an instruction to load a value from memory into a register. Now we need an instruction to take a value from a register and copy it into memory. This is accomplished using the store-word (sw) instruction, which has the same syntax as the load-word instruction.
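As a rough illustration of two ideas from these slides (scaling an array index to a byte offset, and big-endian byte order), here is a minimal C sketch. The helper names word_offset, store_word_be, and load_word_be are hypothetical, and the byte array only stands in for main memory; this is not a description of how any real MIPS system is built.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of 32-bit (4-byte) words. */
uint32_t word_offset(uint32_t index) {
    return index * 4u;
}

/* Store a 32-bit value in big-endian order: the highest-order byte
 * goes at `addr`, the lowest-order byte at `addr + 3`. */
void store_word_be(uint8_t *mem, uint32_t addr, uint32_t value) {
    assert(addr % 4u == 0u);              /* word-alignment restriction */
    mem[addr + 0u] = (uint8_t)(value >> 24);
    mem[addr + 1u] = (uint8_t)(value >> 16);
    mem[addr + 2u] = (uint8_t)(value >> 8);
    mem[addr + 3u] = (uint8_t)(value);
}

/* Load a 32-bit value that was stored in big-endian order. */
uint32_t load_word_be(const uint8_t *mem, uint32_t addr) {
    assert(addr % 4u == 0u);              /* word-alignment restriction */
    return ((uint32_t)mem[addr + 0u] << 24) |
           ((uint32_t)mem[addr + 1u] << 16) |
           ((uint32_t)mem[addr + 2u] << 8)  |
            (uint32_t)mem[addr + 3u];
}
```

With these helpers, storing the value 00000000 11111111 01010101 10101010 (0x00FF55AA) at address 1024 leaves byte 00000000 at location 1024 and byte 10101010 at location 1027, which is the big endian layout, and word_offset(8) gives the offset 32 used in lw $t0, 32($s3).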
Store-Word Example

For example, consider the C statement…

    A[12] = h + A[8];

Assuming A is an array of 100 integers, the base address of A is stored in register $s3, and the variable h is already in register $s2, this statement could be implemented in MIPS assembly language as…

    lw  $t0, 32($s3)      # $t0 <= A[8]
    add $t0, $s2, $t0     # $t0 <= h + A[8]
    sw  $t0, 48($s3)      # A[12] <= $t0

Machine Instructions

So how are these instructions actually represented in the computer? Like everything else: as a number. In fact, each portion of an instruction is usually represented as a number, and then the numbers are abutted together to form one instruction.

So far we have used registers like $s2 and $t0. These names are for our benefit; the CPU only understands numbered registers. We normally use a convention for name->number register mappings. So far we have the s registers and the t registers (what the s and t represent will be discussed soon).

In the MIPS design, $s0->$s7 map to registers 16->23 and $t0->$t7 map to registers 8->15.

Instruction Format: R-Type

In the MIPS architecture, all instructions are 32 bits wide. The layout of these bits is known as the instruction format.

The addition and subtraction instructions are considered R-type (R for register) instructions in MIPS. They follow a standard format, common among all R-type instructions…

    opcode    rs        rt        rd        shamt     funct
    6 bits    5 bits    5 bits    5 bits    5 bits    6 bits

Opcode: the basic operation (what does this instruction do?).
Rs: the first operand register.
Rt: the second operand register.
Rd: the destination register.
Shamt: the shift amount. It is only used in shift instructions, which will be discussed at a later point.
Funct: the function (or function code). This selects the specific variant of the operation (represented in the opcode) that is desired.
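One way to see how the six R-type fields are abutted into a single 32-bit number is a small C sketch. The function name encode_r is hypothetical. The field widths are the standard MIPS R-type widths (6, 5, 5, 5, 5, and 6 bits), and the function code 32 used for add in the usage note is the standard MIPS value; treat the rest as an illustration rather than a reference assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word.
 * Layout, from the high bits down:
 *   opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    /* Each value must fit in its field. */
    assert(op < 64u && funct < 64u);
    assert(rs < 32u && rt < 32u && rd < 32u && shamt < 32u);

    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}
```

For example, encode_r(0, 18, 8, 8, 0, 32) packs add $t0, $s2, $t0 (with $s2 as register 18 and $t0 as register 8, per the mapping above) into the word 0x02484020.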
Instruction Formats

Remember our first design principle, simplicity favors regularity: all MIPS instructions are 32 bits and all register-based instructions use the common R-type format.

Ideally, all instructions would use the same format. However, this is not always possible. How would one encode a data transfer (lw/sw) instruction using the R-type instruction format? Since we need a (potentially large) offset encoded in the instruction and only two registers, the R-type format is not ideal.

Instruction Format: I-Type

This leads to another design principle…

Good design demands good compromises.

Data transfer instructions are encoded in MIPS using the I-type instruction format…

    opcode    rs        rt        address
    6 bits    5 bits    5 bits    16 bits

Opcode: the basic operation (what does this instruction do?).
Rs: the first operand register (the base address).
Rt: the register that is loaded into (lw) or stored from (sw).
Address: the offset for the instruction.

So, lw/sw instructions can reference a region of +/- 2^15 (32768) bytes from the base address.

MIPS Opcodes

Note that the R-type and I-type formats are similar (they both start with the opcode and then the registers). Each opcode has a specific instruction format, and the CPU can look at the opcode to see what format the rest of the instruction is in.

Let’s look at how the four instructions we have been looking at are encoded…

    instruction  format  opcode  rs   rt   rd   shamt  funct  address
    add          R       0       reg  reg  reg  0      32     NA
    sub          R       0       reg  reg  reg  0      34     NA
    lw           I       35      reg  reg  NA   NA     NA     addr
    sw           I       43      reg  reg  NA   NA     NA     addr

reg = register number, NA = field does not appear in this instruction format, addr = 16-bit address/offset
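A minimal C sketch of packing the I-type fields may help make the layout concrete. The function name encode_i is hypothetical; the opcodes 35 (lw) and 43 (sw) are the ones listed in the opcode table, and the sketch assumes the 16-bit offset is stored in two's complement, which is what allows negative offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the I-type fields into one 32-bit instruction word.
 * Layout, from the high bits down:
 *   opcode(6) | rs(5) | rt(5) | address(16)
 * `offset` is a signed byte offset that must fit in 16 bits. */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int32_t offset) {
    assert(op < 64u && rs < 32u && rt < 32u);
    assert(offset >= -32768 && offset <= 32767);

    /* Keep only the low 16 bits of the two's-complement offset. */
    uint32_t address = (uint32_t)offset & 0xFFFFu;

    return (op << 26) | (rs << 21) | (rt << 16) | address;
}
```

For example, encode_i(35, 19, 8, 32) encodes lw $t0, 32($s3) (base register $s3 is register 19, loaded register $t0 is register 8) as 0x8E680020, and encode_i(43, 19, 8, 48) encodes sw $t0, 48($s3) as 0xAE680030.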
C -> Assembly -> Machine

Let’s consider this C statement…

    A[300] = h + A[300];

Assuming $t1 has the base address of A and $s2 corresponds to h, this statement can be compiled into MIPS assembly…

    lw  $t0, 1200($t1)
    add $t0, $s2, $t0
    sw  $t0, 1200($t1)

We can then express the assembly language in machine language (let’s start with decimal numbers)…

    opcode  rs  rt  rd  shamt  funct
    35      9   8   address = 1200
    0       18  8   8   0      32
    43      9   8   address = 1200

C -> Assembly -> Machine

Now that we have the machine instructions in decimal, we can express our instructions in binary, just like they are stored in the computer…

    opcode  rs     rt     rd     shamt  funct
    100011  01001  01000  address = 0000 0100 1011 0000
    000000  10010  01000  01000  00000  100000
    101011  01001  01000  address = 0000 0100 1011 0000

C -> Assembly -> Machine

    A[300] = h + A[300];

    lw  $t0, 1200($t1)
    add $t0, $s2, $t0
    sw  $t0, 1200($t1)

    10001101001010000000010010110000
    00000010010010000100000000100000
    10101101001010000000010010110000

The Stored Program Concept

So, programs are made up of instructions and instructions are simply numbers: programs are just numbers stored in memory.

This means that the same memory can contain program source code, the compiled machine code, the data being used, created, and manipulated by the program, and even the compiler used to compile the program (it’s all numbers!).

This is known as the stored program concept.

[Figure: a processor connected to a memory that holds an accounting program (machine code), an editor program (machine code), a C compiler (machine code), payroll data, book text, and the source code in C for the editor program.]
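Because an instruction is just a number, the packing can also be run in reverse. The following C sketch pulls the fields back out of a 32-bit word. The names bits, Fields, and decode are hypothetical, and the sketch extracts every field regardless of format; a real decoder would first look at the opcode to decide which fields are meaningful.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive, bit 0 = least significant) of a word. */
uint32_t bits(uint32_t word, unsigned hi, unsigned lo) {
    assert(hi < 32u && lo <= hi);
    unsigned width = hi - lo + 1u;
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> lo) & mask;
}

/* All fields of both formats; which ones apply depends on the opcode. */
typedef struct {
    uint32_t op, rs, rt;        /* common to R-type and I-type */
    uint32_t rd, shamt, funct;  /* R-type only */
    uint32_t imm;               /* I-type only: 16-bit address/offset */
} Fields;

Fields decode(uint32_t word) {
    Fields f;
    f.op    = bits(word, 31, 26);
    f.rs    = bits(word, 25, 21);
    f.rt    = bits(word, 20, 16);
    f.rd    = bits(word, 15, 11);
    f.shamt = bits(word, 10, 6);
    f.funct = bits(word, 5, 0);
    f.imm   = bits(word, 15, 0);
    return f;
}
```

Decoding 0x8D2804B0 (the binary word 10001101001010000000010010110000) yields opcode 35, rs 9, rt 8, and address 1200, which is lw $t0, 1200($t1).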
Decision Making: Branching Instructions

Computers are capable of making decisions, a capability that is frequently exposed in programming languages through the if statement.

The MIPS instruction set supports decision making using two conditional branch instructions. Both of these instructions involve two registers and a label.

The branch-if-equal (beq) instruction goes to the labeled point if the two registers contain the same value…

    beq $s1, $s2, LABEL1

The branch-if-not-equal (bne) instruction goes to the labeled point if the two registers do not contain the same value…

    bne $s1, $s2, LABEL2

Labels

Remember, program instructions are stored in memory, just like data. The label used in the conditional branch instructions is simply a name for a specific memory address (the address of the instruction that should be executed next if the branch is taken).

The compiler will often generate labels where they are needed (often in places that you would not think of). This is one of the many benefits of programming in high-level languages.

The label is converted into an actual address when the program is converted from assembly language to machine language (this is normally done by a program known as an assembler).

Compiling if-then-else Statements

If-then-else statements compile nicely with beq and bne instructions. For example, let’s compile the statement…

    if (i==j) f=g+h; else f=g-h;

Assume f->j are stored in $s0->$s4. This can be compiled into MIPS assembly language quite easily…

          bne $s3, $s4, Else
          add $s0, $s1, $s2
          j   Exit
    Else: sub $s0, $s1, $s2
    Exit:

The jump (j) instruction is an unconditional branch. It simply goes to the specified label/address.

Compiling Loops

Conditional branches are also useful for loops.
For example, let’s compile the loop…

    while (save[i] == k) i += j;

Assume i, j, k are in $s3, $s4, $s5 and the base address of the integer array save is in $s6. Let’s compile…

    Loop: add $t1, $s3, $s3      # temp = i * 2
          add $t1, $t1, $t1      # temp = i * 4
          add $t1, $t1, $s6      # addr of save[i] in $t1
          lw  $t0, 0($t1)        # load save[i] into $t0
          bne $t0, $s5, Exit     # check loop condition
          add $s3, $s3, $s4      # loop body: i = i + j
          j   Loop               # we’re not done yet
    Exit:

The Zero Register and the set-on-less-than Instruction

MIPS includes the set-on-less-than (slt) instruction, which sets the destination register to one if the first operand register is less than the second operand register, and to zero otherwise…

    slt $t0, $s0, $s1     # $t0 gets 1 if $s0 < $s1, 0 otherwise

Using only the slt, beq, and bne instructions, a compiler can produce any conditional expression (<, >, ==, !=, <=, >=).

Note that sometimes this requires the value zero. The MIPS register $zero (register 0) is a zero bucket: it always contains a zero, and writing anything to it simply discards the write.

Computer Hardware Function Support

Most modern high-level languages employ the concept of a function (or procedure or method). When a function is called, there are generally six steps which are performed by the computer…

1. Parameters are placed somewhere that the called function can access them.
2. Control is transferred to the called function.
3. Storage resources needed in the called function are acquired.
4. The called function’s task is performed.
5. The return value is placed somewhere that the calling function can access it.
6. Control is transferred back to the point of origin.

In/Out Registers

As already discussed, registers are the fastest place to store information. The MIPS architecture allocates seven of its registers for function calling…

$a0 -> $a3 are argument registers. They are used to pass data into a function.
$v0 -> $v1 are return registers. They are used to pass data out of a function.
$ra is the return address register. It stores a pointer to the point of origin (the place where the function was called), which is the address to jump back to after the called function completes.

Function Calling

Functions are typically called in the MIPS architecture using the jump-and-link (jal) instruction.

This instruction takes one argument, the memory address at which the function begins. Upon execution, control is passed to the function entry point.

The return address is automatically stored in the $ra register. This is the address that will be jumped to after the function completes; it is the address of the instruction right after the jal instruction.

This instruction is referred to as CALL in other popular instruction sets.

The Program Counter

For all of this to work, the address of the current instruction must be stored somewhere (how about a register?). This is done in MIPS using the $pc (or PC) register. PC stands for program counter. This is also referred to as an instruction pointer.

Therefore, when a jal instruction is encountered (the PC points to a jal instruction), PC+4 is stored in $ra and the PC changes to the memory address specified in the jal instruction.

    00001348    add $t0, $t0, $t0
    0000134C    jal 00001D60
    00001350    add $t1, $t1, $t1

    $pc: 00001348, then 0000134C, then 00001D60 (after the jal)
    $ra: 00001350 (set by the jal)

Returning from a Function

So once we are in a function and we’re all done, how do we get back? The return address has been stored in the $ra register, so we simply need to jump to it.

This can be accomplished using the MIPS jump-register (jr) instruction. This instruction takes one parameter, the register that contains the address to jump to.
Since the $ra register has the return address, the end of the function is very simple…

    jr $ra

Function Call Flow

The calling function (the caller) puts input parameters for the called function (the callee) into registers $a0->$a3. It then uses the jal instruction to jump to the other function’s entry point.

    jal address

The callee performs the required function and puts the results into registers $v0->$v1. It then returns to the next instruction in the caller by using the jr instruction and the $ra register.

    jr $ra

Register Spilling

A function call should “cover its own tracks.” It should not alter the register contents of the calling function.

So, if the called function needs more than the $a0->$a3 and $v0->$v1 registers, some registers must be spilled. This means that the register contents are copied into memory, the registers are used, and then the original contents are restored from memory.

The ideal data structure for spilling registers is a stack. This is a basic last-in-first-out data structure.

Stack Layout

[Figure: the stack drawn from high address (top) to low address (bottom) in three states. (a) $sp at its starting position. (b) The contents of register $t1, the contents of register $t0, and the contents of register $s0 stored below that position, with $sp pointing at the saved $s0. (c) $sp back at its starting position.]

For instance, if a function needs to use $t1, $t0, and $s0, those registers must be spilled while in the called function. In this diagram, (a) shows the stack before the function call, (b) shows the stack during the function call, and (c) shows the stack after the function call.

Stack Layout: The Details

The stack is a data structure that is typically managed by the program with some assistance from the hardware.

In MIPS, one of the registers, the stack pointer ($sp), is reserved for storing the address of the most recently allocated location in the stack (where did the last thing that was put on the stack end up in memory?).
The stack pointer is adjusted by one word for each register that is spilled onto the stack (remember, a word in MIPS is 32 bits, the size of a register).

Stack Operations

Placing something on the stack is known as pushing it onto the stack. Removing something from the stack is known as popping it off of the stack.

Stacks normally grow down: they start at higher memory addresses and each push gets stored at a lower memory address.

Each push moves the stack pointer down by 4 bytes ($sp = $sp - 4).
Each pop moves the stack pointer up by 4 bytes ($sp = $sp + 4).

Function Prologues and Epilogues

[Figure: the same three stack states as before, drawn from high address to low address. (a) before the call, (b) with the contents of $t1, $t0, and $s0 saved and $sp pointing at the saved $s0, (c) after the call.]

    sub $sp, $sp, 12
    sw  $t1, 8($sp)
    sw  $t0, 4($sp)
    sw  $s0, 0($sp)

This is the very beginning of the function. It is sometimes referred to as the function prologue.

    lw  $s0, 0($sp)
    lw  $t0, 4($sp)
    lw  $t1, 8($sp)
    add $sp, $sp, 12
    jr  $ra

This is the very end of the function. It is sometimes referred to as the function epilogue.

Intel x86 and the Stack

Some microprocessors have more explicit support for the stack data structure.

Intel, for example, has PUSH and POP instructions in the x86 instruction set (both take a register as an argument)…

PUSH moves the stack pointer down 32 bits (4 bytes) and copies that register onto the new end of the stack.
POP copies the current piece of data on the stack into that register and moves the stack pointer up 32 bits (4 bytes).

    MIPS                     x86
    sub $sp, $sp, 12
    sw  $t1, 8($sp)          PUSH EAX    ($t1)
    sw  $t0, 4($sp)          PUSH EBX    ($t0)
    sw  $s0, 0($sp)          PUSH ECX    ($s0)

Register Semantics

(Slide note: given the register conventions described on this slide, in our previous example we did not need to spill $t0 and $t1, only $s0.)

The MIPS $t# registers are known as temporary registers.
They are just like temporary variables: their values are no longer needed once the computation is complete.

Therefore, MIPS provides two categories of registers with different semantics…

$t0->$t9: these are 10 temporary registers that are not preserved by the called function on a function call (their values can change within the called procedure and are not restored prior to returning).
$s0->$s7: these are 8 saved registers that are preserved by the called function (if their values are changed within the called procedure, the original values must be restored prior to returning).

Nested Procedures

So, when a procedure gets called, the jal instruction saves the return address into the $ra register. If the called procedure calls another procedure, how is this accounted for?

Well, the $ra value must be spilled before the jal instruction is executed (otherwise, the first return address would be overwritten and lost).

    sub $sp, $sp, 16
    sw  $ra, 12($sp)
    sw  $t1, 8($sp)
    sw  $t0, 4($sp)
    sw  $s0, 0($sp)

    … function body …

    lw  $s0, 0($sp)
    lw  $t0, 4($sp)
    lw  $t1, 8($sp)
    lw  $ra, 12($sp)
    add $sp, $sp, 16
    jr  $ra

If there is a jump-and-link instruction in this function body, we need the sw and lw instructions involving $ra. If there are no function calls in the body, they are not needed.

Local Data

The stack is also used for function local storage (arrays, variables, structures, and so forth that are local to the function).

The segment of the stack that is used by a function (including saved/spilled registers and any local variables) is called the stack frame.

To keep track of all of this, MIPS uses a second register to keep track of the stack, the frame pointer ($fp). This register points to the first word of the current function’s stack frame.
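The push and pop rules described in these slides (a push moves $sp down 4 bytes and stores a word there; a pop reads that word and moves $sp up 4 bytes) can be modeled with a short C sketch. The names Stack, stack_init, push, and pop, and the 64-byte size, are hypothetical choices for illustration; on real hardware the stack is simply a region of main memory addressed through $sp.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u   /* hypothetical size of the simulated stack region */

typedef struct {
    uint32_t words[STACK_BYTES / 4u]; /* simulated memory, one slot per word */
    uint32_t sp;                      /* byte offset of the most recent push */
} Stack;

/* An empty stack has sp just past the highest address of the region. */
void stack_init(Stack *s) {
    s->sp = STACK_BYTES;
}

/* Push: move sp down one word, then store the value at the new sp. */
void push(Stack *s, uint32_t value) {
    assert(s->sp >= 4u);              /* otherwise the stack would overflow */
    s->sp -= 4u;
    s->words[s->sp / 4u] = value;
}

/* Pop: read the word at sp, then move sp up one word. */
uint32_t pop(Stack *s) {
    assert(s->sp < STACK_BYTES);      /* otherwise there is nothing to pop */
    uint32_t value = s->words[s->sp / 4u];
    s->sp += 4u;
    return value;
}
```

Pushing three values and popping three times returns them in the reverse order and leaves sp where it started, which is the property a prologue and epilogue pair relies on when spilling and restoring registers.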
The Frame Pointer

[Figure: the stack drawn from high address (top) to low address (bottom) in three states, each marked with $fp and $sp. In state (b) the stack frame holds, from $fp down to $sp: saved argument registers (if any), saved return address, saved saved registers (if any), and local arrays and structures (if any).]

Here, (a), (b), and (c) represent the stack before, during, and after the function call.

Note that the frame pointer points to the first word of the stack frame and the stack pointer points to the top of the stack.

Accessing Data on the Stack

Looking at the stack…

[Figure: one stack frame. From $fp down to $sp: saved argument registers (if any), saved return address, saved saved registers (if any), local arrays and structures (if any).]

Normally saved registers, including $ra, are accessed relative to the frame pointer.

Local variables are normally referenced relative to the stack pointer (as we sometimes do not know at compile time how many local variables might be pushed onto the stack, and only $sp moves with each stack push).

If there are no local variables, the frame pointer is not normally used.

Immediate Addressing

Many common operations in computer programs involve constants (such as in the C statement x=x+12;).

To allow this, MIPS has an immediate addressing mode in which the constant is encoded right into the instruction, as in the instructions that we have already seen for adding to and subtracting from the stack pointer.

MIPS defines the instruction format involving constants (or immediate data), which is the I-type format we have already seen, as having a 16-bit data field.

    op    rs    rt    immediate

Immediate Addressing and Constant Comparison

Immediate addressing comes up a lot when doing comparisons and arithmetic operations.
Comparisons are accomplished using the immediate version of the set-on-less-than instruction (slti)…

    slti $t0, $s2, 10     # $t0 = 1 if $s2 < 10

The arithmetic instructions have immediate versions as well…

    addi $sp, $sp, 4

    op        rs       rt       immediate
    8         29       29       4
    001000    11101    11101    0000 0000 0000 0100

The Common Case

Constants occur in arithmetic operations and comparisons a lot.

The inclusion of the immediate addressing versions of the arithmetic and set-on-less-than instructions in the MIPS architecture is an illustration of “making the common case fast,” a prevailing theme in computer system performance.

This works because it is much quicker to get the constant right from the instruction than to keep it in memory and load it into a register when necessary.

Immediate Data Size

The immediate addressing mode in MIPS allows 16-bit constants. So, when a 32-bit constant is necessary, how can this value be loaded and used?

MIPS provides a load upper immediate (lui) instruction that takes a 16-bit constant and copies it into the upper 16 bits of the target register (filling the lower 16 bits with zeros).

    lui $t0, 255

    op        rs       rt       immediate
    001111    00000    01000    0000 0000 1111 1111

    $t0 before instruction execution: (whatever value it previously held)
    $t0 after instruction execution:  0000 0000 1111 1111 0000 0000 0000 0000

Loading a 32-bit Constant

Let’s say we wanted to load the 32-bit constant 0000 0000 0011 1101 0000 1001 0000 0000 into $s0.

    lui  $s0, 61           # 61 = 0000 0000 0011 1101
    addi $s0, $s0, 2304    # 2304 = 0000 1001 0000 0000

    $s0 after lui:   0000 0000 0011 1101 0000 0000 0000 0000
    $s0 after addi:  0000 0000 0011 1101 0000 1001 0000 0000

(addi sign-extends its 16-bit immediate, so this pairing works here because the top bit of the lower half, 0000 1001 0000 0000, is 0.)

Jump Addressing

The addressing mode associated with branch instructions again follows the “keep it simple” design idea.
The jump instruction follows the MIPS J-type instruction format, which consists of the opcode and the address to jump to…

    opcode (6 bits)    jump target address (26 bits)

The opcode for jump is 2. So, to jump to memory address 10000, the machine language instruction is…

    2    10000

Branch Addressing

Conditional branch instructions must also specify two registers (for the condition checking)…

    opcode (6 bits)    rs (5 bits)    rt (5 bits)    branch target (PC-relative offset) (16 bits)

A 16-bit field is too small to hold an absolute address for modern programs (a branch could only have a target within the first 2^16 bytes of memory!).

So, branch instructions employ a PC-relative addressing mode: the 16-bit field is treated as an offset and is added to the program counter to form the branch target.

PC-Relative Branch Addressing

Actually, in MIPS the branch target offset is added to the program counter plus four (the address of the instruction immediately after the branch instruction). This gives us…

    New PC = Old PC + 4 + Offset

Why this is so will be explored later, during a discussion on microprocessor control path design. It turns out that it is convenient for the hardware to increase the PC early to point to the next instruction; since this value has already been computed, it is efficient to use it.

PC-relative addressing is useful because the destination of a conditional branch is highly likely to be local to the branch. In contrast, jump (and jump-and-link) has no such spatial locality characteristics. (Why is this?)

More Quirks in MIPS Addressing

As all MIPS instructions are 4 bytes long, the PC-relative offset stored in a branch instruction is actually the number of words to the target instruction (this gives us four times the range).

The direct (J-type) addressing used in jump instructions has a 26-bit target address field.
The 26-bit field of a jump holds a word address: shifted left by two bits, it supplies the low 28 bits of the target address. The high 4 bits are copied from the current value of the PC. This is known as pseudodirect addressing.

PC-Relative Addressing Example

    Loop: add $t1, $s3, $s3       80000:   0    19   19   9    0    32
          add $t1, $t1, $t1       80004:   0    9    9    9    0    32
          add $t1, $t1, $s5       80008:   0    9    21   9    0    32
          lw  $t0, 0($t1)         80012:   35   9    8    0
          bne $t0, $s5, Exit      80016:   5    8    21   2
          add $s3, $s3, $s4       80020:   0    19   20   19   0    32
          j   Loop                80024:   2    80000
    Exit: …

The bne offset of 2 words is relative to PC + 4: 80016 + 4 + 2 × 4 = 80028, the address of Exit. The jump row shows the byte address 80000 for readability; under the word-address rule above, the 26-bit field itself would hold 80000 / 4 = 20000.

Far-Away Branching
We already discussed that the destination of a conditional branch is highly likely to be close to the branch instruction itself. Sometimes this is not the case. The assembler typically deals with this by inserting an unconditional jump (with the large 26-bit target address field) to the far target and inverting the branch condition so that the branch skips over the jump. For example, if L1 is too far away…

        beq $s0, $s1, L1

becomes

        bne $s0, $s1, L2
        j   L1
    L2: …

Summary of MIPS Addressing Modes
[Figure: the five MIPS addressing modes]
1. Immediate addressing: op | rs | rt | Immediate. The operand is the constant in the instruction.
2. Register addressing: op | rs | rt | rd | ... | funct. The operand is a register.
3. Base addressing: op | rs | rt | Address. Register + Address selects a byte, halfword, or word in memory.
4. PC-relative addressing: op | rs | rt | Address. PC + Address selects a word in memory.
5. Pseudodirect addressing: op | Address. Address concatenated with the upper bits of the PC selects a word in memory.

Summary of MIPS Registers and their Conventional Uses

register number   name       usage                         preserved on function call?
0                 $zero      always the constant zero      N/A
1                 $at        reserved for assembler        yes
2-3               $v0-$v1    results and expression eval   no
4-7               $a0-$a3    arguments                     yes
8-15              $t0-$t7    temporaries                   no
16-23             $s0-$s7    saved                         yes
24-25             $t8-$t9    more temporaries              no
26-27             $k0-$k1    reserved for the OS           yes
28                $gp        global pointer                yes
29                $sp        stack pointer                 yes
30                $fp        frame pointer                 yes
31                $ra        return address                yes

The PowerPC Architecture and Other Addressing Modes
We have seen that MIPS has five addressing modes: register, base, immediate, PC-relative, and pseudodirect. There are a number of other addressing modes that might be useful. For example, let's consider two that are found on the PowerPC architecture.
PowerPC is designed and made by IBM and Motorola and is typically found in Apple Macintosh computers. It is similar to MIPS: PowerPC has 32 integer registers, all instructions are 32 bits long, and data in memory is manipulated using loads and stores.

Indexed Addressing
Consider an array of values in memory (remember, an array is simply a set of same-type variables next to each other in memory). In this case, we might use indexed addressing. One register contains the base address of the array and another register contains the offset (or index). Using this approach, only one register needs to change to iterate through the array. Let's look at MIPS code to do this and the corresponding PowerPC instruction…

    MIPS:
        add $t0, $a0, $s3
        lw  $t1, 0($t0)

    PowerPC:
        lw  $t1, $a0+$s3

Update Addressing
Consider the array again. Another common pair of operations is to load a word from memory and then increment the base register to point to the next word. Update addressing introduces a new version of the data transfer instructions (load/store) that automatically increments the base register to point to the next word whenever data is transferred.
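The effect of update addressing can be modelled with a small Python sketch. It is only an illustration under stated assumptions: memory is a dictionary from byte address to word, registers are a dictionary keyed by name, and the helper names load_word and load_word_update are hypothetical, not real instructions or library calls.

```python
def load_word(memory, regs, dest, base, offset):
    """Plain load: dest = memory[base + offset]; the base register is unchanged."""
    regs[dest] = memory[regs[base] + offset]


def load_word_update(memory, regs, dest, base, offset):
    """Update-style load: perform the same load, then leave the base
    register pointing at the word that was just accessed."""
    address = regs[base] + offset
    regs[dest] = memory[address]
    regs[base] = address


# Three consecutive words starting at byte address 100 (made-up sample data).
memory = {100: 11, 104: 22, 108: 33}
regs = {"s3": 96, "t0": 0}

loaded = []
for _ in range(3):
    # One call does the work of a load followed by a separate add to the base.
    load_word_update(memory, regs, "t0", "s3", 4)
    loaded.append(regs["t0"])
```

With a plain load, the loop body would need a second step to add 4 to the base register after each access; the update form folds that step into the data transfer itself.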
Let's look at MIPS code to do this and the corresponding PowerPC instruction (remember, a word is 4 bytes in both architectures)…

    MIPS:
        lw   $t0, 4($s3)
        addi $s3, $s3, 4

    PowerPC:
        lwu  $t0, 4($s3)

Indexed and Update Addressing
[Figure: indexed and update addressing]
a. Indexed addressing: op | rs | rt | rd | ... Register + Register selects a word in memory.
b. Update addressing: op | rs | rt | Address. Register + Address selects a word in memory, and the register is updated.

PowerPC Instructions
Most PowerPC instructions are very similar to MIPS instructions, and PowerPC followed a lot of the same design principles. However, some of its instructions are more complex.
As an example, PowerPC introduces a special branch instruction intended for loops where the control value starts off at some value and is decremented until it reaches zero. Let's look at MIPS code for such a loop and the corresponding PowerPC instructions…

    MIPS:
        Loop: …
              addi $t0, $t0, -1
              bne  $t0, $zero, Loop

    PowerPC:
        Loop: …
              bc Loop, ctr!=0

The PowerPC bc Instruction
This instruction uses ctr, a special register (in addition to the normal 32) used for loop control. You simply set this register to the number of iterations you want and end the loop code with this bc instruction. Easy!

The Intel IA-32/x86 Architecture
The Intel 8086 started life as a 16-bit microprocessor. With the introduction of the 80386, Intel moved the x86 line (called IA-32 for the past couple of years) to a 32-bit design.
IA-32 has a huge number of instructions (over 100), and each new edition adds new instructions.
Let's look at some aspects of IA-32, including the register set, addressing modes, integer operations, and instruction encoding.

Intel IA-32's Register Set

    Name     Use (each register is 32 bits wide, bits 31 to 0)
    EAX      GPR 0
    ECX      GPR 1
    EDX      GPR 2
    EBX      GPR 3
    ESP      GPR 4
    EBP      GPR 5
    ESI      GPR 6
    EDI      GPR 7
    CS       Code segment pointer
    SS       Stack segment pointer (top of stack)
    DS       Data segment pointer 0
    ES       Data segment pointer 1
    FS       Data segment pointer 2
    GS       Data segment pointer 3
    EIP      Instruction pointer (PC)
    EFLAGS   Condition codes

The first observation to make about IA-32 is that there are only eight general purpose registers (GPRs), in sharp contrast to MIPS's 32 GPRs.

IA-32 Addressing
IA-32 uses two operands for its arithmetic, logical, and data transfer instructions. This means one operand must act as both a source and a destination.
Unlike in MIPS, the operands do not both need to be registers: IA-32 allows instructions to operate directly on data stored in memory (one of the two operands can be a memory location).
When memory locations are referenced, accesses do not need to be 32 bits wide (you do not always need to read or write a full 32-bit value). Most instructions can operate on a byte, a 16-bit value, or a 32-bit value.
By the way, in IA-32 a word is 16 bits and a 32-bit value is considered a doubleword (or DWORD).

IA-32 Integer Instructions
IA-32 integer operations fall into four major categories…
Data transfer instructions: move (from one place to another; memory or register on either end), push, and pop
Arithmetic/logic instructions: test operands (condition evaluation), integer and decimal operations
Control flow instructions: conditional branches, unconditional jumps, calls, returns
String instructions: string moves and string comparisons

Conditional Branching
Conditional branching is handled in a similar manner on PowerPC and IA-32. It is based on the concept of condition codes or flags.
Condition codes are a side effect of an operation. Most often they record how a result compares to zero, and they are checked by branch instructions. Most arithmetic and logic operations set condition codes.
This is good because you do not need to compute such a flag explicitly, and since the flags are produced as part of the operation, it is faster.
This is bad because every operation is more expensive, as these codes are computed whether they are needed or not.

IA-32 Instruction Encoding
Not all IA-32 instructions are the same size. In fact, they can vary from 1 byte to 17 bytes!
The opcode usually specifies which addressing mode is being used. Alternatively, the opcode can indicate that a postbyte follows, which specifies the addressing mode (this is like a second opcode field). Sometimes there are two postbytes.
This can be very confusing, and there are so many addressing modes that it is very difficult to keep them straight.

IA-32 Example Instructions

    Instruction              Function
    JE name                  if equal(CC) {EIP = name}; EIP - 128 <= name < EIP + 128
    JMP name                 {EIP = name}
    CALL name                SP = SP - 4; M[SP] = EIP + 5; EIP = name
    MOVW EBX,[EDI + 45]      EBX = M[EDI + 45]
    PUSH ESI                 SP = SP - 4; M[SP] = ESI
    POP EDI                  EDI = M[SP]; SP = SP + 4
    ADD EAX,#6765            EAX = EAX + 6765
    TEST EDX,#42             Set condition codes (flags) with EDX & 42
    MOVSL                    M[EDI] = M[ESI]; EDI = EDI + 4; ESI = ESI + 4

IA-32 Instruction Formats (just a few…)
Field widths are given in bits.

    a. JE EIP + displacement     JE (4) | Condition (4) | Displacement (8)
    b. CALL                      CALL (8) | Offset (32)
    c. MOV EBX, [EDI + 45]       MOV (6) | d (1) | w (1) | r-m postbyte (8) | Displacement (8)
    d. PUSH ESI                  PUSH (5) | Reg (3)
    e. ADD EAX, #6765            ADD (4) | Reg (3) | w (1) | Immediate (32)
    f. TEST EDX, #42             TEST (7) | w (1) | Postbyte (8) | Immediate (32)

RISC and CISC Architectures
We have seen MIPS in great detail, and PowerPC and Intel IA-32 in passing. There appears to be a fundamental difference between MIPS and IA-32.
This difference extends beyond these two architectures, so let's look at the RISC and CISC criteria.

MIPS: A Classic RISC Design
MIPS is very simple.
Data must be in registers before being operated on (this is known as a load/store architecture).
There is a small number of instructions.
Each instruction has a similar format.
Each instruction does one basic thing; complex tasks are done by combining a (large) number of instructions.
MIPS is classically known as a reduced instruction set computer (RISC) architecture.

IA-32: A Classic CISC Design
IA-32 is very complex.
Data does not need to be in registers before being operated on (direct memory operations are possible).
There is a large number of instructions.
Instructions have very different formats.
Each instruction can do a complex task.
IA-32 is classically known as a complex instruction set computer (CISC) architecture.

RISC vs. CISC
Ten years ago, each instruction set architecture was either CISC or RISC and the difference was clear.
This is no longer really true: each type has taken the best parts of the other and integrated them into its own design…
PowerPC, the major RISC player right now, has some complex instructions and uses CISC-like technology such as condition codes.
IA-32, the major CISC player right now, has a microprogrammed core where the complex instructions are broken down into simple microinstructions and executed in that manner.

The End of RISC vs. CISC
To a large extent, this RISC vs. CISC war is over, with each side admitting that the other had some good points and adopting those points itself.
Modern and future architectures combine these principles even further (and introduce new designs altogether). This includes ISAs such as Intel's IA-64, an architecture designed jointly by Intel (a major CISC organization) and HP (a major RISC organization).

Summary: Design Principles
Regardless of which design is personally favored, four design principles have emerged that are (almost) universally accepted…
Simplicity favors regularity.
Smaller is faster.
Good design demands good compromises.
Make the common case fast.