Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ECE200 – Computer Organization Chapter 3 – Instructions: The Language of the Machine Homework 3 3.2, 3.5, 3.9, 3.11, 3.14, 3.16, 3.19, 3.25, 3.27 Warning: Some of you will find this to be the most difficult chapter and hardest homework of the course Outline for Chapter 3 lectures Basic processor operation ISA motivation: enabling fast hardware implementations Reduced Instruction Set Computing (RISC) MIPS instruction formats and addressing modes Rundown of MIPS instructions Procedure call Arrays and pointers Assembling, linking, and loading programs Motorola MCore ISA The heart of a modern microprocessor destination operand source operand 1 Load instructions bring data from memory into the register file register file source operand 2 ALU ALU = Arithmetic Logic Unit operation Store instructions store data from the register file into memory Register file holds data to be operated on Basic processing of an instruction Instruction fetch Get the instruction from instruction memory Instruction decode Figure Read source operands Get data to be operated on from the register file Execute Do out what kind it is and how to control it the operation Write destination operand Write the result of the operation to the register file Basic processing of an instruction Instruction fetch Get the instruction from instruction memory Basic processing of an instruction Instruction decode Figure out what kind it is and how to control it control logic Basic processing of an instruction Read source operands Get data to be operated on from the register file Basic processing of an instruction Execute Do the operation Basic processing of an instruction Write destination operand Write the result of the operation to the register file The job of the ISA designer is to… create an ISA that enables fast hardware designs… that meet the design constraints (cost, power dissipation, etc.)… of the targeted market (workstation, handheld, etc.) Other jobs as well, such as Simplifying choices for the compiler Ensuring the longevity of the ISA by anticipating future technology trends Making computer hardware fast Computer hardware consists of datapaths and control, each of which may limit machine speed The datapaths pass the data that is being computed on to the different units The control steers the data to different units and selects the operation of each unit Example input select arithmetic logic unit MUX operation datapath control Making computer hardware fast Goal #1: fast execution of simple operations Goal #2: fast access of data from memory Goal #3: fast fetching and decoding of instructions Fast execution of simple operations Many programs contain predominately simple integer operations (e.g., add, and, load, branch) The results of operations are often used by successive instructions a + b + c + d Microprocessor designers make these simple operations as fast as possible and try to set the clock period of the chip to match this speed Register-register ISA All operands for arithmetic operations are read from the register file destination operand register # source operand 1 register # source operand 2 register # destination operand register file source operand 1 source operand 2 ALU operation Register file (32 registers in MIPS) holds data to be executed Big enough to hold most variables but small enough for a fast clock Very fast datapath and simple, fast control Operations which require the result of immediately prior operations can execute on back-to-back clock cycles through an operation called forwarding (Ch 4) Register-register ISA add $4, $3, $10 destination operand register # source operand 1 register # source operand 2 register # destination operand register file source operand 1 source operand 2 ALU operation Making computer hardware fast Goal #1: fast execution of simple operations Goal #2: fast access of data from memory Goal #3: fast fetching and decoding of instructions Fast access of data from memory In a register-register machine, load and store operations are used to access data in memory To access data memory, we need to first calculate the data’s memory address, called the effective address Addressing modes provide the means for representing offsets, array indexing, pointers, etc. at the assembly language level The number and complexity of addressing modes is a tradeoff between INST and (CPI,CT) More, powerful modes reduces INST Fewer, simple modes simplifies the EA calculation control and datapath, and thus the time required to access memory The ALU can serve the additional purpose of effective address calculation to save hardware Addressing mode example #1 Base mode (also called displacement mode) Used to access local variables Example: ld $2, 4 ($1) destination operand destination operand register # source operand 1 register # register file source operand 1 effective address ALU immediate value operation = add data cache Addressing mode example #2 Scaled mode Used to index arrays Example: ld $2, 4 ($1) [$3] destination operand destination operand register # source operand 1 register # source operand 2 register # register file source operand 1 source operand 2 x ALU scale immediate value operation = add + effective address data cache Making computer hardware fast Goal #1: fast execution of simple operations Goal #2: fast access of data from memory Goal #3: fast fetching and decoding of instructions Fast instruction fetching The Program Counter is a special register that holds the memory address of the currently executing instruction The PC is normally incremented to point to the next sequential instruction, except in the case of a jump or taken branch logic branch displacement branch condition PC instruction cache instruction Limiting the way that branches can modify the PC speeds up the calculation of the next PC Fast instruction decoding After an instruction is fetched, we parse its fields in hardware (decode it) to determine The operation to be performed The location of the source and destination operands Other instruction-specific information such as The addressing mode and values (e.g., offset) for memory instructions Constant values Decode is greatly simplified (and therefore execution begins sooner) if All instructions have the same length There is consistency (regularity) among instruction types in terms of where the fields are located The number of fields is limited Reduced Instruction Set Computing RISC has its roots in The IBM 801 project started by John Cocke at IBM Research in 1975 Precursor to the IBM RS/6000 workstation processors which later influenced PowerPC The Evolved into the SPARC ISA of Sun Microsystems The Berkeley RISC project started by Dave Patterson in 1980 Stanford MIPS project started by John Hennessy ~1980 Hennessy co-founded MIPS Computer RISC philosophy is that instruction sets should be simplified to enable fast hardware implementations that can be exploited by optimizing compilers University projects had the added goal of single chip implementation Original RISC approach Fixed-length (32 bits for MIPS) instructions that have only a few formats Simplifies instruction fetch and decode Code density is sacrificed Some bits are wasted for some instruction types Register-register architecture called load-store architecture Permits very fast implementation of simple instructions Easier to pipeline (Chapter 6) Requires more instructions to implement a HLL program Also Limited number of addressing modes Simplifies EA calculation and thus speeds up memory access Few if any complex arithmetic functions Instead more, simpler instructions are used Some recent deviations from RISC Shorter instructions (16 bits) intermixed with longer (32 bits) Higher code density for embedded environments Some complex operations Multiply-add for DSP and multimedia applications Experience is that selectively adding some complexity back into the ISA improves performance and/or cost effectiveness MIPS instruction formats R-type (Register) 31 26 25 op rs 16 15 rt 11 10 rd 6 5 shamt 0 funct I-Type (Immediate) 31 26 25 op 21 20 21 20 rs 16 15 rt 0 immediate J-Type (Jump) 31 26 25 op 0 target MIPS instruction formats where op is a 6-bit operation code (opcode) rs is a 5-bit source register specifier rt is a 5-bit (source or destination) register specifier or branch condition rd is a 5-bit destination register specifier shamt is a 5-bit shift amount funct is a 6-bit function field immediate is a 16-bit immediate, branch displacement, or memory address displacement target is a 26-bit jump target address Simplifications Fixed length Limited number of field types Many fields located in same location in different formats MIPS addressing modes Rundown of MIPS instructions Computational R-type and I-type Load and store instructions I-type Jump and branch instructions R-type, I-type, and J-type Only dealing with integer instructions for now Registers in MIPS 31 general purpose registers r0 r1 . . . 0 31 31 multiply/divide registers HI LO program counter register PC 0 0 r30 r31 32 GPRs r0 is equal to zero in all MIPS implementations, which makes it easy to… Load a constant, e.g., addi $2,$0,#3 Do absolute addressing, e.g., lw $4,1000($0) r31 holds return address for subroutine returns All other GPRs are “generic” although software conventions define their usage (discussed later) Registers in MIPS HI and LO Hold multiply result (HI holds upper half) Hold divide quotient (LO) and remainder (HI) PC holds address of currently executing instruction Also have a separate set of 32 floating point registers if floating point arithmetic is supported Computational operands in MIPS All computations occur on full 32 bit words Computations are on signed (2’s complement) or unsigned operands (positive numbers) Example: 1111 1111 1111 1111 1111 1111 1111 1111 is –1 as a signed number and 4,294,967,294 as an unsigned number Operands are in registers or are immediates Immediate (constant) values are 16 bits wide is sign extended (most significant bit is replicated in the uppermost 16 bits) before operated on to convert it to a 32-bit operand Value Example: 1000 0000 0000 0000 becomes 1111 1111 1111 1111 1000 0000 0000 0000 Computational instructions Arithmetic (signed, unsigned, and immediate) add, sub, mult, div addu, subu, multu, divu addi Logical and, or, nor, xor andi, ori, xori Shift (logical and arithmetic) srl, sll Shift shamt digits to right or left; put zeros in leftmost positions for srl srlv, Shift [rs] digits to right or left; put zeros in leftmost positions for srlv sral, sllv srav Same as above except sign-extend the high-order bits Computational instructions Set less than (signed, unsigned, and immediate) rd to 1 if condition is met, set to 0 otherwise Set Almost slt, sltu Condition is rs<rt slti, always paired with a branch instruction sltiu Condition is rs<immediate Load upper immediate (lui) immediate 16 bits left, append 16 zeros to right, put 32bit result into rd Shift Can create a 32-bit constant from lui and ori Move to/from HI/LO mfhi, Put HI or LO register contents into rd mthi, mflo mtlo Put rs contents into HI or LO (for interrupt recovery) Load and store operands in MIPS Memory operands can be bytes, half-words (2 bytes), or words (4 bytes [32 bits]) opcode determines the operand size Number of address bits used is 32 for accessing bytes 31 for accessing half-words 30 for accessing words Half-word and word addresses must be aligned Bit 0 must be zero for half-word accesses Bits 0 and 1 must be zero for word accesses 31 word effective address 1 0 0 0 . . . 12 8 4 0 byte addresses 1 byte memory 1 word Load operands can be signed or unsigned . . . 12 8 4 0 Load and store instructions lb, lh, lw Form effective address, get data from addressed memory location, sign extend if lb or lh, load into rt lbu, lhu, lwu Form effective address, get data from addressed memory location, zero extend if lb or lh, load into rt sb, sh, sw effective address, store data from rt (partial if sb or sh) into addressed location Form Also load/store instructions for unaligned data MIPS assembly example A is an array of words Starting address of A (memory address of A[0]) is in $4 Want to perform A[3] = A[0] + A[1] - A[2] MIPS assembly language: lw lw add lw sub sw $1, 0($4) $2, 4($4) $1, $1, $2 $2, 8($4) $1, $1, $2 $1, 12($4) # A[0] into $1 # A[1] into $2 # A[0]+A[1] into $1 # A[2] into $2 # A[0]+A[1]-A[2] into $1 # store result in A[3] Branch and jump instructions Branch instructions test a condition, either Equality beq, bne Often coupled with slt, sltu, slti, sltiu or the value of rs relative to zero or inequality of rs and rt bltz, bgtz, blez, bgez and form the target address by adding the sign extended immediate to the PC all instructions are words, immediate is shifted left two bits before being sign extended Since Operation true PC = target test condition false PC = PC+4 Branch and jump instructions Jump instructions unconditionally branch to the address formed by either left the 26-bit target two bits and combining it with the 4 high-order PC bits Shifting j The jr contents of register rs Branch and link and jump and link instructions also save the address of the next instruction into r31 bltzal, bgezal, jal, jalr Used for subroutine calls jr $31 used to return from a subroutine for loop and if-then-else example i, j, n are in $1, $2, $3; $4 holds A base address for (i=0; i<n; i++) { if (A[i] < j) A[i]=A[i]*2 else A[i]=A[i]+1 } Loop: Else: Exit: add lw slt beq add j addi sw addi addi bne $1, $0, $0 $5, 0($4) $6, $5, $2 $6, $0, Else $5, $5, $5 Exit $5, $5, 1 $5, 0($4) $4, $4, 4 $1, $1, 1 $1, $3, Loop # initialize i to 0 # A[i] into $5 # set $6=1 if A[i]<j # if not true, do else # A[i]=A[i]*2 # skip else # A[i]=A[i]+1 # store A[i] # point to next A[i] # increment i # loop back if i<n Special instructions System call (syscall) Call OS to perform system services, e.g., printf $2 contains code specifying the service to perform breakpoint Transfer control to exception handler Useful for debugging Many instruction extensions have been made to the original MIPS ISA Branches System Etc. functions Pseudoinstructions Assembler can create instruction sequences for other common operations Examples move $2, $3 blt $7, $14, Label li $4, 100 Many more examples in Appendix A Do not use these in your assignments! Register usage conventions Software conventions for the use of registers ensure that routines developed by different programmers can work together MIPS assembly convention assigns special names and meanings to the GPRs Preferred to use symbolic names instead of register numbers Caveats Conventions may vary by compiler (SGI/MIPS, gcc, etc.) The book has taken some liberties in the spirit of simplification (and so will my notes) So what is presented here (and in the book) may not be 100% accurate in terms of following “MIPS conventions” MIPS register usage conventions Register 0 1 2-3 4-7 8-15 24,25 16-23 26,27 28 29 30 31 name $zero $at $v0,$v1 $a0-$a3 $t0-$t7 $t8,$t9 $s0-$s7 $k0,$k1 $gp $sp $fp $ra usage always returns zero reserved by assembler value returned by subroutine arguments passed to subroutine temps not saved by called subroutine temps saved by called subroutine reserve for use by interrupt handler pointer to global area stack pointer frame pointer subroutine return address Procedure call To support procedures, we need means for Passing parameters to the procedure Transferring control to the procedure The procedure to use registers and memory without disturbing that of the calling program Passing results back to the calling program Returning control to the calling program calling program (1) pass parameters, transfer control procedure (2) save variables, allocate memory for local variables (3) execute procedure (4) pass results, return control (4) restore variables stack Procedure call Passing parameters $a0-$a3 reserved for parameter passing Additional parameters passed via the stack (later) Transferring control jal ProcedureName Puts PC+4 into $ra ($31) and jumps to address associated with label ProcedureName Passing results $v0,$v1 reserved for passing results Stack used for additional results Returning control jr $ra Jumps to address contained in register $ra Caller and callee-saved registers Caller: the procedure that calls another Callee: the called procedure Values in registers that are needed by the caller after the callee returns must be preserved Loop: Else: Exit: $s1-$s4 add lw slt beq add jal add j addi sw addi addi bne $s1, $zero, $zero # initialize i to 0 $t0, 0($s4) # A[i] into $t0 $t1, $t0, $s2 # set $t1=1 if A[i]<j $t1, $zero, Else # if not true, do else $a0, $t0, $zero # copy A[i] into $a0 doubleit # doubleit performs A[i]=A[i]*2 $t0, $v0, $zero # copy result into $t0 Exit # skip else $t0, $t0, 1 # A[i]=A[i]+1 $t0, 0($s4) # store A[i] $s4, $s4, 4 # point to next A[i] $s1, $s1, 1 # increment i $s1, $s3, Loop # loop back if i<n must be the same after return from doubleit Caller and callee-saved registers $t0-$t9 are caller-saved registers Calling procedure must save them before the procedure call if its want the values preserved across the call Called procedure can use them without saving them first Used for temporary values that do not need to be saved across a call, or if we use all $s0-$s7 $a0-$a3 and $v0,$v1 are also caller-saved $s0-$s7 are callee-saved register Called procedure must save them before using them Calling procedure does not need to save them before the call Used for values that need to be preserved across a call, or if we use all $t0-$t9 If the caller does not need $t0-$t9, $a0-$a3, and $v0,$v1 to be saved across a call, and the callee does not use $s0-$s7, then none of them need to be saved The stack The stack is a particular area of memory used for Passing additional parameters (beyond $a0-$a4) and results (beyond $v0, $v1) Preserving register values that need to be maintained across a procedure call Creating memory space for a procedure (e.g., for local variables) Items are pushed onto the stack and popped off 31 push 22 19 7 push 31 22 19 7 pop 31 22 19 7 22 pop 22 19 7 19 7 But note that the stack is not strictly accessed as a first-out structure The stack The LIFO nature of the stack matches the nested nature of procedure calling calling program non-leaf procedures leaf procedure Creating a stack in MIPS Need storage Convention is to use memory starting at address 7fff ffffhex (bottom of the stack) and growing to lower addresses Need a pointer (memory address register) to the top of the stack $sp ($29) by convention 7fff ffffhex $sp memory stack direction of stack growth 0hex Using the stack Pushing items onto the stack Adjust $sp to make room for new items Store items onto stack Popping items off the stack Retrieve items off stack Adjust $sp to position before items Example addi $sp, $sp, -12 sw $t1, 8($sp) sw $t0, 4($sp) sw $s0, 0($sp) lw $s0, 0($sp) lw $t0, 4($sp) lw $t1, 8($sp) addi $sp, $sp, 12 Procedure call steps Caller (before call) Save caller-saved registers Store arguments First four in $a0-$a3, remaining pushed onto the stack jal Callee Adjust stack pointer Save callee-saved registers (including $ra if non-leaf) Read arguments Execute the procedure Restore callee-saved registers Store return values and adjust stack pointer jr First two in $v0, $v1, remaining pushed onto the stack $ra Caller (after return) Retrieve arguments and restore caller-saved registers Stack before (a), during (b), and after (c) Frame pointer ($fp) provides a stable base register from which to reference local variables Provided for convenience only; some MIPS conventions use $30 as $s8 instead Example procedure: factorial Recursive function, therefore non-leaf C code int fact (int n) { if (n < 1) return (1); else return (n*fact(n-1)); } Example procedure: factorial MIPS code (see Appendix A for slight variation) fact: L1: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra # adjust stack for two items # save $ra on stack # save $a0 (n) on stack # test n<1 # if not, handle (n*fact(n-1)) # put 1 in result register $v0 # readjust stack pointer # return # put n-1 in argument register $a0 # call fact # restore n from stack # restore $ra from stack # readjust stack pointer # put n*fact(n-1) in result register $v0 # return fact execution with n=2 fact called three times, each with different values of n calling program n=2 n=1 $a0 used to pass the value of n $v0 used for the return value n=0 fact execution with n=2 fact: L1: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $sp fact execution with n=2 $a0=2 fact: L1: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $sp $ra $a0 fact execution with n=2 $a0=2 fact: L1: fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=1 L1: $sp addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $ra $a0 $ra $a0 $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra fact execution with n=2 $a0=2 fact: L1: fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=1 L1: $sp fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $ra $a0 $ra $a0 $ra $a0 $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=0 L1: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra fact execution with n=2 $a0=2 fact: L1: fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=1 L1: $sp fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $ra $a0 $ra $a0 $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=0 L1: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra fact execution with n=2 $a0=2 fact: L1: fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=1 L1: $sp fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $ra $a0 $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=0 L1: $v0=1 addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra fact execution with n=2 $a0=2 fact: L1: fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $v0=2*1*1 $a0=1 L1: $v0=1*1 $sp fact: addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra $a0=0 L1: $v0=1 addi sw sw slti beq addi addi jr addi jal lw lw addi mul jr $sp, $sp, -8 $ra, 4($sp) $a0, 0($sp) $t0, $a0, 1 $t0, $zero, L1 $v0, $zero, 1 $sp, $sp, 8 $ra $a0, $a0, -1 fact $a0, 0($sp) $ra, 4($sp) $sp, $sp, 8 $v0, $a0, $v0 $ra How pointers map to MIPS instructions Pointers are addresses of data objects Just as in HLLs, pointer versions of assembly language programs are often more efficient than array versions Compilers often will map an array version in a HLL into a pointer version in machine code How pointers map to MIPS instructions Array version of a procedure to clear an array clear1 (int array[], int size) { int i; for (i=0; i<size; i++) array[i] = 0; } clear1: Loop: add add add add sw addi slt bne $t0, $zero, $zero # initialize i to 0 $t1, $t0, $t0 # $t1 = 2*i $t1, $t1, $t1 # $t1 = 4*i $t2, $a0, $t1 # $t2 = &array[i] $zero, 0($t2) # clear array[i] $t0, $t0, 1 # increment i $t3, $t0, $a1 # set $t3=1 if i<size $t3, $zero, Loop # if $t3 not equal 0, reiterate loop How pointers map to MIPS instructions Pointer version of a procedure to clear an array clear2 (int *array, int size) { int *p; for (p=&array[0]; p<&array[size]; p++) *p = 0; } clear2: Loop: add add add add sw addi slt bne $t0, $a0, $zero $t1, $a1, $a1 $t1, $t1, $t1 $t2, $a0, $t1 $zero, 0($t0) $t0, $t0, 4 $t3, $t0, $t2 $t3, $zero, Loop # initialize p to &array[0] # $t1 = 2*size # $t1 = 4*size # $t2 = &array[size] # clear location pointed to by p # increment p to point to next element # set $t3=1 if p<&array[size] # if $t3 not equal 0, reiterate loop Assembling, linking, loading programs compile time run time Compiler Creates assembly language from HLL May produce object files directly (built-in assembly) Phases Front end Translate from HLL to intermediate (internal) form Optimizations Improve execution performance, often at the expense of increasing code size Back (optional) end (code generator) Generate assembly (or object code) from intermediate form Compiler structure Compiler optimizations Assembler Converts assembly language into an object file in binary format that can be linked with other object files Object file format Header: size and position of other object file sections Text segment: machine code (instructions) Data segment: data Relocation information: list of instructions and data that depend on absolute memory addresses when the program is loaded into memory Symbol table: location of each label in the program as well as list of unresolved references Debugging information Example object file Header Text segment Data segment Relocation info Symbol table Name myprog Text size 100hex Data size 20hex Address Instruction 0 lw $a0, 0($gp) 4 jal 0 … … Address Data 0 (X) … … Address 0 Instruction type lw Dependency X 4 jal yourproc Label Address X yourproc Linker Links object files together to form an executable (binary) that can be run on the machine should be “printf” and executable files should show machine language (in binary format) not assembly language Object Linker tasks Determine memory addresses of code and data Resolve absolute addresses Find library routines referenced by the program Resolve references among files Resulting program resides in I/O on hard drive on CD on another machine’s hard drive accessible via the network Etc. Example executable file Header Text segment Data segment Name Text size myprog 300hex Data size 50hex Address 0040 0000hex Instruction lw $a0, 8000hex($gp) 0040 0004hex jal 100040hex … … 0040 0100hex [start of yourproc] … … Address 1000 0000hex Data (X) … … $gp = 1000 8000hex Loader Part of the operating system that loads programs into memory, starts them, and terminates them when done Steps Determine memory size of text and data segments from executable header Create memory address space big enough for the program Copy text and initialized data into those memory locations Copy input parameters onto the stack Clear general registers and set $sp to first free stack location Call a startup routine that Copies input parameters into $a0-$a4 Loads the PC with the starting address of main Terminates the program when main returns The Motorola MCore architecture Introduced in 1997 to attack embedded market Cell phones Personal Digital Assistants (PDAs) Portable Global Positioning Systems (GPSs) Automotive systems (braking, engine control, etc) Etc. Features 32-bit architecture (integer arithmetic is on 32 bits) Many embedded processors are 8 or 16-bit 16-bit fixed-length instructions Two-operand register-register (load-store) architecture One source operand also serves as the destination (e.g., rx = rx + ry) Byte, half-word, and word memory accesses Power saving modes http://www.motorola.com/mcore MCore user visible registers 16 general purpose 32-bit registers R0 is stack pointer R15 holds return address 32-bit PC register 1-bit condition register Set by compare, arithmetic, and logical operations Alternative to setting a GPR for branch conditions as in MIPS MCore instruction formats Register-to-register Monadic Example: abs rx (absolute value of register rx) Dyadic Example: movt rx, ry (move ry to rx if condition register is “true”) 5-bit Example: lsli rx, imm5 (logical shift left of rx by imm5 bits) 5-bit immediate offset immediate Example: addi rx, oimm5 (add rx and unsigned value oimm5) MCore instruction formats 7-bit immediate Only used by movi rx, imm7 (put unsigned imm7 into rx) Control register Used to move between rx and special register (CReg) Memory access Scaled 4-bit immediate Scale (shift left) imm4 to match access type (byte, half-word, word) and add to rx MCore instruction formats Load relative word Only used by lrw rz,label (left-shift zero-extended DISP_8 by 2, add to PC+2, force two LSBs to zero, and load word at this address into rz) Register quadrant and multiple register Used to load/store multiple registers Flow control Scaled 11-bit displacement Example: bf label (if condition register is “false”, branch to address formed by appending one zero to right of sign-extended DISP_11 and adding to PC+2; otherwise increment PC) MCore instruction formats Register Example: jsr rx (put PC+2 in r15; jump to subroutine addressed by appending one zero to right of rx) Indirect Example jsri label (put PC+2 in r15; jump to subroutine addressed by left-shifting zero-extended DISP_8 by 2, adding to PC+2, and forcing two LSBs to zero Register with 4-bit negative displacement Only used for loopt rx, label (decrement rx; set condition bit if signed rx>0; clear the condition bit otherwise; if prior value of condition register was “true”, branch to address formed by appending one to right of one-extended DISP_4 and adding to PC+2; otherwise increment PC+2) MCore instruction set Mnemonic Description ABS ADDC ADDI ADDU AND ANDI ANDN ASR ASRC Absolute Value Add with C bit Add Immediate Add Unsigned Logical AND Logical AND Immediate AND NOT Arithmetic Shift Right Arithmetic Shift Right, Update C Bit BCLRI BF BGENI BGENR BKPT BMASKI BR BREV BSETI BSR BT BTSTI Clear Bit Branch on Condition False Bit Generate Immediate Bit Generate Register Breakpoint Bit Mask Immediate Branch Bit Reverse Bit Set Immediate Branch to Subroutine Branch on Condition True Bit Test Immediate CLRF CLRT CMPHS CMPLT CMPLTI CMPNE CMPNEI Clear Register on Condition False Clear Register on Condition True Compare Higher or Same Compare Less-Than Compare Less-Than Immediate Compare Not Equal Compare Not Equal Immediate MCore instruction set DECF DECGT DECLT DECNE DECT DIVS DIVU DOZE Decrement on Condition False Decrement Register and Set Condition if Result Greater-Than Zero Decrement Register and Set Condition if Result Less-Than Zero Decrement Register and Set Condition if Result Not Equal to Zero Decrement on Condition True Divide (Signed) Divide (Unsigned) Doze FF1 Find First One INCF INCT IXH IXW Increment on Condition False Increment on Condition True Index Halfword Index Word JMP JMPI JSR JSRI Jump Jump Indirect Jump to Subroutine Jump to Subroutine Indirect LD.[BHW] LDM LDQ LOOPT LRW LSL, LSR LSLC, LSRC LSLI, LSRI Load Load Multiple Registers Load Register Quadrant Decrement with C-Bit Update and Branch if Condition True Load Relative Word Logical Shift Left and Right Logical Shift Left and Right, Update C bit Logical Shift Left and Right by Immediate MCore instruction set MFCR MOV MOVI MOVF MOVT MTCR MULT MVC MVCV Move from Control Register Move Move Immediate Move on Condition False Move on Condition True Move to Control Register Multiply Move C Bit to Register Move Inverted C Bit to Register NOT Logical Complement OR Logical Inclusive-OR ROTLI RSUB RSUBI RTE RFI Rotate Left by Immediate Reverse Subtract Reverse Subtract Immediate Return from Exception Return from Interrupt SEXTB SEXTH ST.[BHW] STM STQ STOP SUBC SUBU SUBI SYNC Sign-Extend Byte Sign-Extend Halfword Store Store Multiple Registers Store Register Quadrant Stop Subtract with C Bit Subtract Subtract Immediate Synchronize MCore instruction set TRAP TST TSTNB Z Trap Test Operands Test for No Byte Equal Zero WAIT Wait XOR XSR XTRB0 XTRB1 XTRB2 XTRB3 Exclusive OR Extended Shift Right Extract Byte 0 Extract Byte 1 Extract Byte 2 Extract Byte 3 ZEXTB ZEXTH Zero-Extend Byte Zero-Extend Halfword MCore versus MIPS Both are 32-bit load-store GPR architectures MIPS was designed for desktops and servers, MCore for embedded systems MCore instruction length is 16 bits to save memory code space but this reduces The number of GPRs (16 versus 32 in MIPS) The number of operands (2 versus 3 in MIPS) MCore uses a condition bit versus a GPR for MIPS to set branch conditions Condition bit saves a GPR and sometimes an instruction but constrains instruction ordering MCore has more addressing modes and instruction types Saves code space but increases complexity of decoder and perhaps arithmetic units Questions?