Download + A[1] - University of Rochester ECE

Document related concepts
no text concepts found
Transcript
ECE200 – Computer Organization
Chapter 3 – Instructions:
The Language of the
Machine
Homework 3

3.2, 3.5, 3.9, 3.11, 3.14, 3.16, 3.19, 3.25, 3.27

Warning: Some of you will find this to be the
most difficult chapter and hardest homework of
the course
Outline for Chapter 3 lectures


Basic processor operation
ISA motivation: enabling fast hardware
implementations

Reduced Instruction Set Computing (RISC)

MIPS instruction formats and addressing modes

Rundown of MIPS instructions

Procedure call

Arrays and pointers

Assembling, linking, and loading programs

Motorola MCore ISA
The heart of a modern microprocessor
destination operand
source
operand 1
Load instructions
bring data from
memory into the
register file
register
file
source
operand 2
ALU
ALU = Arithmetic
Logic Unit
operation
Store instructions
store data from the
register file into
memory
Register file holds data
to be operated on
Basic processing of an instruction

Instruction fetch
 Get

the instruction from instruction memory
Instruction decode
 Figure

Read source operands
 Get

data to be operated on from the register file
Execute
 Do

out what kind it is and how to control it
the operation
Write destination operand
 Write
the result of the operation to the register file
Basic processing of an instruction

Instruction fetch
 Get
the instruction from instruction memory
Basic processing of an instruction

Instruction decode
 Figure
out what kind it is and how to control it
control
logic
Basic processing of an instruction

Read source operands
 Get
data to be operated on from the register file
Basic processing of an instruction

Execute
 Do
the operation
Basic processing of an instruction

Write destination operand
 Write
the result of the operation to the register file
The job of the ISA designer is to…

create an ISA that enables fast hardware
designs…

that meet the design constraints (cost, power
dissipation, etc.)…

of the targeted market (workstation, handheld,
etc.)

Other jobs as well, such as
 Simplifying
choices for the compiler
 Ensuring the longevity of the ISA by anticipating future
technology trends
Making computer hardware fast

Computer hardware consists of datapaths and
control, each of which may limit machine speed

The datapaths pass the data that is being
computed on to the different units

The control steers the data to different units and
selects the operation of each unit

Example
input select
arithmetic
logic unit
MUX
operation
datapath
control
Making computer hardware fast

Goal #1: fast execution of simple operations

Goal #2: fast access of data from memory

Goal #3: fast fetching and decoding of
instructions
Fast execution of simple operations

Many programs contain predominately simple
integer operations (e.g., add, and, load, branch)

The results of operations are often used by
successive instructions


a + b + c + d
Microprocessor designers make these simple
operations as fast as possible and try to set the
clock period of the chip to match this speed
Register-register ISA

All operands for arithmetic operations are read
from the register file
destination operand
register #
source operand 1
register #
source operand 2
register #
destination operand
register
file
source
operand 1
source
operand 2
ALU
operation
 Register

file (32 registers in MIPS) holds data to be executed
Big enough to hold most variables but small enough for a fast clock
 Very
fast datapath and simple, fast control
 Operations which require the result of immediately prior
operations can execute on back-to-back clock cycles
through an operation called forwarding (Ch 4)
Register-register ISA

add $4, $3, $10
destination operand
register #
source operand 1
register #
source operand 2
register #
destination operand
register
file
source
operand 1
source
operand 2
ALU
operation
Making computer hardware fast

Goal #1: fast execution of simple operations

Goal #2: fast access of data from memory

Goal #3: fast fetching and decoding of
instructions
Fast access of data from memory

In a register-register machine, load and store
operations are used to access data in memory

To access data memory, we need to first
calculate the data’s memory address, called the
effective address


Addressing modes provide the means for
representing offsets, array indexing, pointers,
etc. at the assembly language level
The number and complexity of addressing modes
is a tradeoff between INST and (CPI,CT)
 More,
powerful modes reduces INST
 Fewer, simple modes simplifies the EA calculation control and
datapath, and thus the time required to access memory

The ALU can serve the additional purpose of
effective address calculation to save hardware
Addressing mode example #1

Base mode (also called displacement mode)
 Used
to access local variables
 Example: ld $2, 4 ($1)
destination operand
destination operand
register #
source operand 1
register #
register
file
source
operand 1
effective
address
ALU
immediate
value
operation = add
data
cache
Addressing mode example #2

Scaled mode
 Used
to index arrays
 Example: ld $2, 4 ($1) [$3]
destination operand
destination operand
register #
source operand 1
register #
source operand 2
register #
register
file
source
operand 1
source
operand 2
x
ALU
scale
immediate
value
operation = add
+
effective
address
data
cache
Making computer hardware fast

Goal #1: fast execution of simple operations

Goal #2: fast access of data from memory

Goal #3: fast fetching and decoding of
instructions
Fast instruction fetching


The Program Counter is a special register that
holds the memory address of the currently
executing instruction
The PC is normally incremented to point to the
next sequential instruction, except in the case of
a jump or taken branch
logic
branch
displacement
branch
condition

PC
instruction
cache
instruction
Limiting the way that branches can modify the
PC speeds up the calculation of the next PC
Fast instruction decoding

After an instruction is fetched, we parse its fields
in hardware (decode it) to determine
 The
operation to be performed
 The location of the source and destination operands
 Other instruction-specific information such as



The addressing mode and values (e.g., offset) for memory instructions
Constant values
Decode is greatly simplified (and therefore
execution begins sooner) if
 All
instructions have the same length
 There is consistency (regularity) among instruction types in
terms of where the fields are located
 The number of fields is limited
Reduced Instruction Set Computing

RISC has its roots in
 The
IBM 801 project started by John Cocke at IBM Research
in 1975

Precursor to the IBM RS/6000 workstation processors which later
influenced PowerPC
 The

Evolved into the SPARC ISA of Sun Microsystems
 The


Berkeley RISC project started by Dave Patterson in 1980
Stanford MIPS project started by John Hennessy ~1980
Hennessy co-founded MIPS Computer
RISC philosophy is that instruction sets should
be simplified to enable fast hardware
implementations that can be exploited by
optimizing compilers
 University
projects had the added goal of single chip
implementation
Original RISC approach

Fixed-length (32 bits for MIPS) instructions that
have only a few formats
 Simplifies
instruction fetch and decode
 Code density is sacrificed


Some bits are wasted for some instruction types
Register-register architecture
called load-store architecture
 Permits very fast implementation of simple instructions
 Easier to pipeline (Chapter 6)
 Requires more instructions to implement a HLL program
 Also

Limited number of addressing modes
 Simplifies

EA calculation and thus speeds up memory access
Few if any complex arithmetic functions
 Instead
more, simpler instructions are used
Some recent deviations from RISC

Shorter instructions (16 bits) intermixed with
longer (32 bits)
 Higher

code density for embedded environments
Some complex operations
 Multiply-add

for DSP and multimedia applications
Experience is that selectively adding some
complexity back into the ISA improves
performance and/or cost effectiveness
MIPS instruction formats

R-type (Register)
31
26 25
op

rs
16 15
rt
11 10
rd
6 5
shamt
0
funct
I-Type (Immediate)
31
26 25
op

21 20
21 20
rs
16 15
rt
0
immediate
J-Type (Jump)
31
26 25
op
0
target
MIPS instruction formats
where
 op
is a 6-bit operation code (opcode)
 rs is a 5-bit source register specifier
 rt is a 5-bit (source or destination) register specifier or branch
condition
 rd is a 5-bit destination register specifier
 shamt is a 5-bit shift amount
 funct is a 6-bit function field
 immediate is a 16-bit immediate, branch displacement, or
memory address displacement
 target is a 26-bit jump target address

Simplifications
 Fixed
length
 Limited number of field types
 Many fields located in same location in different formats
MIPS addressing modes
Rundown of MIPS instructions

Computational
 R-type

and I-type
Load and store instructions
 I-type

Jump and branch instructions
 R-type,

I-type, and J-type
Only dealing with integer instructions for now
Registers in MIPS
31
general purpose registers
r0
r1
.
.
.
0
31
31
multiply/divide registers
HI
LO
program counter register
PC
0
0
r30
r31

32 GPRs
 r0
is equal to zero in all MIPS implementations, which makes it
easy to…


Load a constant, e.g., addi $2,$0,#3
Do absolute addressing, e.g., lw $4,1000($0)
 r31
holds return address for subroutine returns
 All other GPRs are “generic” although software conventions
define their usage (discussed later)
Registers in MIPS

HI and LO
 Hold
multiply result (HI holds upper half)
 Hold divide quotient (LO) and remainder (HI)

PC holds address of currently executing
instruction

Also have a separate set of 32 floating point
registers if floating point arithmetic is supported
Computational operands in MIPS


All computations occur on full 32 bit words
Computations are on signed (2’s complement) or
unsigned operands (positive numbers)
 Example:
1111 1111 1111 1111 1111 1111 1111 1111 is –1
as a signed number and 4,294,967,294 as an unsigned
number

Operands are in registers or are immediates

Immediate (constant) values are 16 bits wide
is sign extended (most significant bit is replicated in the
uppermost 16 bits) before operated on to convert it to a
32-bit operand
 Value

Example: 1000 0000 0000 0000
becomes
1111 1111 1111 1111 1000 0000 0000 0000
Computational instructions

Arithmetic (signed, unsigned, and immediate)
 add,
sub, mult, div
 addu, subu, multu, divu
 addi

Logical
 and,
or, nor, xor
 andi, ori, xori

Shift (logical and arithmetic)
 srl,

sll
Shift shamt digits to right or left; put zeros in leftmost positions for srl
 srlv,

Shift [rs] digits to right or left; put zeros in leftmost positions for srlv
 sral,

sllv
srav
Same as above except sign-extend the high-order bits
Computational instructions

Set less than (signed, unsigned, and immediate)
rd to 1 if condition is met, set to 0 otherwise
 Set
 Almost
 slt,

sltu
Condition is rs<rt
 slti,


always paired with a branch instruction
sltiu
Condition is rs<immediate
Load upper immediate (lui)
immediate 16 bits left, append 16 zeros to right, put 32bit result into rd
 Shift
 Can

create a 32-bit constant from lui and ori
Move to/from HI/LO
 mfhi,

Put HI or LO register contents into rd
 mthi,

mflo
mtlo
Put rs contents into HI or LO (for interrupt recovery)
Load and store operands in MIPS

Memory operands can be bytes, half-words (2
bytes), or words (4 bytes [32 bits])
 opcode

determines the operand size
Number of address bits used is
 32
for accessing bytes
 31 for accessing half-words
 30 for accessing words

Half-word and word addresses must be aligned
 Bit
0 must be zero for half-word accesses
 Bits 0 and 1 must be zero for word accesses
31

word effective address
1 0
0 0
.
.
.
12
8
4
0
byte
addresses
1 byte
memory
1 word
Load operands can be signed or unsigned
.
.
.
12
8
4
0
Load and store instructions

lb, lh, lw
 Form
effective address, get data from addressed memory
location, sign extend if lb or lh, load into rt

lbu, lhu, lwu
 Form
effective address, get data from addressed memory
location, zero extend if lb or lh, load into rt

sb, sh, sw
effective address, store data from rt (partial if sb or sh)
into addressed location
 Form

Also load/store instructions for unaligned data
MIPS assembly example

A is an array of words

Starting address of A (memory address of A[0])
is in $4

Want to perform A[3] = A[0] + A[1] - A[2]

MIPS assembly language:
lw
lw
add
lw
sub
sw
$1, 0($4)
$2, 4($4)
$1, $1, $2
$2, 8($4)
$1, $1, $2
$1, 12($4)
# A[0] into $1
# A[1] into $2
# A[0]+A[1] into $1
# A[2] into $2
# A[0]+A[1]-A[2] into $1
# store result in A[3]
Branch and jump instructions

Branch instructions test a condition, either
 Equality

beq, bne
Often coupled with slt, sltu, slti, sltiu
 or
the value of rs relative to zero



or inequality of rs and rt
bltz, bgtz, blez, bgez
and form the target address by adding the sign
extended immediate to the PC
all instructions are words, immediate is shifted left two
bits before being sign extended
 Since

Operation
true
PC = target
test
condition
false
PC = PC+4
Branch and jump instructions

Jump instructions unconditionally branch to the
address formed by either
left the 26-bit target two bits and combining it with
the 4 high-order PC bits
 Shifting

j
 The


jr
contents of register rs
Branch and link and jump and link instructions
also save the address of the next instruction into
r31
 bltzal,
bgezal, jal, jalr
 Used for subroutine calls
 jr $31 used to return from a subroutine
for loop and if-then-else example

i, j, n are in $1, $2, $3; $4 holds A base address
for (i=0; i<n; i++) {
if (A[i] < j)
A[i]=A[i]*2
else
A[i]=A[i]+1
}
Loop:
Else:
Exit:
add
lw
slt
beq
add
j
addi
sw
addi
addi
bne
$1, $0, $0
$5, 0($4)
$6, $5, $2
$6, $0, Else
$5, $5, $5
Exit
$5, $5, 1
$5, 0($4)
$4, $4, 4
$1, $1, 1
$1, $3, Loop
# initialize i to 0
# A[i] into $5
# set $6=1 if A[i]<j
# if not true, do else
# A[i]=A[i]*2
# skip else
# A[i]=A[i]+1
# store A[i]
# point to next A[i]
# increment i
# loop back if i<n
Special instructions

System call (syscall)
 Call
OS to perform system services, e.g., printf
 $2 contains code specifying the service to perform

breakpoint
 Transfer
control to exception handler
 Useful for debugging

Many instruction extensions have been made to
the original MIPS ISA
 Branches
 System
 Etc.
functions
Pseudoinstructions

Assembler can create instruction sequences for
other common operations

Examples
 move
$2, $3
 blt $7, $14, Label
 li $4, 100

Many more examples in Appendix A

Do not use these in your assignments!
Register usage conventions

Software conventions for the use of registers
ensure that routines developed by different
programmers can work together

MIPS assembly convention assigns special
names and meanings to the GPRs

Preferred to use symbolic names instead of
register numbers
Caveats

Conventions may vary by compiler (SGI/MIPS,
gcc, etc.)

The book has taken some liberties in the spirit of
simplification (and so will my notes)

So what is presented here (and in the book) may
not be 100% accurate in terms of following
“MIPS conventions”
MIPS register usage conventions
Register
0
1
2-3
4-7
8-15
24,25
16-23
26,27
28
29
30
31
name
$zero
$at
$v0,$v1
$a0-$a3
$t0-$t7
$t8,$t9
$s0-$s7
$k0,$k1
$gp
$sp
$fp
$ra
usage
always returns zero
reserved by assembler
value returned by subroutine
arguments passed to subroutine
temps not saved by called subroutine
temps saved by called subroutine
reserve for use by interrupt handler
pointer to global area
stack pointer
frame pointer
subroutine return address
Procedure call

To support procedures, we need means for
 Passing
parameters to the procedure
 Transferring control to the procedure
 The procedure to use registers and memory without disturbing
that of the calling program
 Passing results back to the calling program
 Returning control to the calling program
calling
program
(1) pass
parameters,
transfer control
procedure
(2) save variables,
allocate memory
for local variables
(3) execute
procedure
(4) pass results,
return control
(4) restore
variables
stack
Procedure call

Passing parameters
 $a0-$a3
reserved for parameter passing
 Additional parameters passed via the stack (later)

Transferring control
 jal


ProcedureName
Puts PC+4 into $ra ($31) and jumps to address associated with label
ProcedureName
Passing results
 $v0,$v1
reserved for passing results
 Stack used for additional results

Returning control
 jr

$ra
Jumps to address contained in register $ra
Caller and callee-saved registers

Caller: the procedure that calls another

Callee: the called procedure

Values in registers that are needed by the caller
after the callee returns must be preserved
Loop:
Else:
Exit:
 $s1-$s4
add
lw
slt
beq
add
jal
add
j
addi
sw
addi
addi
bne
$s1, $zero, $zero # initialize i to 0
$t0, 0($s4)
# A[i] into $t0
$t1, $t0, $s2
# set $t1=1 if A[i]<j
$t1, $zero, Else # if not true, do else
$a0, $t0, $zero # copy A[i] into $a0
doubleit
# doubleit performs A[i]=A[i]*2
$t0, $v0, $zero # copy result into $t0
Exit
# skip else
$t0, $t0, 1
# A[i]=A[i]+1
$t0, 0($s4)
# store A[i]
$s4, $s4, 4
# point to next A[i]
$s1, $s1, 1
# increment i
$s1, $s3, Loop # loop back if i<n
must be the same after return from doubleit
Caller and callee-saved registers

$t0-$t9 are caller-saved registers
 Calling
procedure must save them before the procedure call if
its want the values preserved across the call
 Called procedure can use them without saving them first
 Used for temporary values that do not need to be saved
across a call, or if we use all $s0-$s7
 $a0-$a3 and $v0,$v1 are also caller-saved

$s0-$s7 are callee-saved register
 Called
procedure must save them before using them
 Calling procedure does not need to save them before the call
 Used for values that need to be preserved across a call, or if
we use all $t0-$t9

If the caller does not need $t0-$t9, $a0-$a3, and
$v0,$v1 to be saved across a call, and the callee
does not use $s0-$s7, then none of them need
to be saved
The stack

The stack is a particular area of memory used for
 Passing
additional parameters (beyond $a0-$a4) and results
(beyond $v0, $v1)
 Preserving register values that need to be maintained across a
procedure call
 Creating memory space for a procedure (e.g., for local
variables)

Items are pushed onto the stack and popped off
31
push 22
19
7

push 31
22
19
7
pop
31
22
19
7
22
pop
22
19
7
19
7
But note that the stack is not strictly accessed
as a first-out structure
The stack

The LIFO nature of the stack matches the nested
nature of procedure calling
calling
program
non-leaf
procedures
leaf
procedure
Creating a stack in MIPS

Need storage
 Convention
is to use memory starting at address 7fff ffffhex
(bottom of the stack) and growing to lower addresses

Need a pointer (memory address register) to the
top of the stack
 $sp
($29) by convention
7fff ffffhex
$sp
memory
stack
direction
of stack
growth
0hex
Using the stack

Pushing items onto the stack
 Adjust
$sp to make room for new items
 Store items onto stack

Popping items off the stack
 Retrieve
items off stack
 Adjust $sp to position before items

Example
addi $sp, $sp, -12
sw $t1, 8($sp)
sw $t0, 4($sp)
sw $s0, 0($sp)
lw $s0, 0($sp)
lw $t0, 4($sp)
lw $t1, 8($sp)
addi $sp, $sp, 12
Procedure call steps

Caller (before call)
 Save
caller-saved registers
 Store arguments
First four in $a0-$a3, remaining pushed onto the stack

 jal

Callee
 Adjust
stack pointer
 Save callee-saved registers (including $ra if non-leaf)
 Read arguments
 Execute the procedure
 Restore callee-saved registers
 Store return values and adjust stack pointer

 jr

First two in $v0, $v1, remaining pushed onto the stack
$ra
Caller (after return)
 Retrieve
arguments and restore caller-saved registers
Stack before (a), during (b), and after (c)

Frame pointer ($fp) provides a stable base
register from which to reference local variables
 Provided
for convenience only; some MIPS conventions use
$30 as $s8 instead
Example procedure: factorial

Recursive function, therefore non-leaf

C code
int fact (int n)
{
if (n < 1) return (1);
else return (n*fact(n-1));
}
Example procedure: factorial

MIPS code (see Appendix A for slight variation)
fact:
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
# adjust stack for two items
# save $ra on stack
# save $a0 (n) on stack
# test n<1
# if not, handle (n*fact(n-1))
# put 1 in result register $v0
# readjust stack pointer
# return
# put n-1 in argument register $a0
# call fact
# restore n from stack
# restore $ra from stack
# readjust stack pointer
# put n*fact(n-1) in result register $v0
# return
fact execution with n=2

fact called three times, each with different
values of n
calling
program
n=2
n=1

$a0 used to pass the value of n

$v0 used for the return value
n=0
fact execution with n=2
fact:
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$sp
fact execution with n=2
$a0=2
fact:
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$sp
$ra
$a0
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$ra
$a0
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=1
L1:
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$ra
$a0
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
$v0=1
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
fact execution with n=2
$a0=2
fact:
L1:
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$v0=2*1*1
$a0=1
L1:
$v0=1*1
$sp
fact:
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
$a0=0
L1:
$v0=1
addi
sw
sw
slti
beq
addi
addi
jr
addi
jal
lw
lw
addi
mul
jr
$sp, $sp, -8
$ra, 4($sp)
$a0, 0($sp)
$t0, $a0, 1
$t0, $zero, L1
$v0, $zero, 1
$sp, $sp, 8
$ra
$a0, $a0, -1
fact
$a0, 0($sp)
$ra, 4($sp)
$sp, $sp, 8
$v0, $a0, $v0
$ra
How pointers map to MIPS instructions

Pointers are addresses of data objects

Just as in HLLs, pointer versions of assembly
language programs are often more efficient than
array versions

Compilers often will map an array version in a
HLL into a pointer version in machine code
How pointers map to MIPS instructions

Array version of a procedure to clear an array
clear1 (int array[], int size)
{
int i;
for (i=0; i<size; i++)
array[i] = 0;
}
clear1:
Loop:
add
add
add
add
sw
addi
slt
bne
$t0, $zero, $zero # initialize i to 0
$t1, $t0, $t0
# $t1 = 2*i
$t1, $t1, $t1
# $t1 = 4*i
$t2, $a0, $t1
# $t2 = &array[i]
$zero, 0($t2)
# clear array[i]
$t0, $t0, 1
# increment i
$t3, $t0, $a1
# set $t3=1 if i<size
$t3, $zero, Loop # if $t3 not equal 0, reiterate loop
How pointers map to MIPS instructions

Pointer version of a procedure to clear an array
clear2 (int *array, int size)
{
int *p;
for (p=&array[0]; p<&array[size]; p++)
*p = 0;
}
clear2:
Loop:
add
add
add
add
sw
addi
slt
bne
$t0, $a0, $zero
$t1, $a1, $a1
$t1, $t1, $t1
$t2, $a0, $t1
$zero, 0($t0)
$t0, $t0, 4
$t3, $t0, $t2
$t3, $zero, Loop
# initialize p to &array[0]
# $t1 = 2*size
# $t1 = 4*size
# $t2 = &array[size]
# clear location pointed to by p
# increment p to point to next element
# set $t3=1 if p<&array[size]
# if $t3 not equal 0, reiterate loop
Assembling, linking, loading programs
compile
time
run
time
Compiler

Creates assembly language from HLL
 May

produce object files directly (built-in assembly)
Phases
 Front

end
Translate from HLL to intermediate (internal) form
 Optimizations

Improve execution performance, often at the expense of increasing
code size
 Back

(optional)
end (code generator)
Generate assembly (or object code) from intermediate form
Compiler structure
Compiler optimizations
Assembler


Converts assembly language into an object file
in binary format that can be linked with other
object files
Object file format
 Header:
size and position of other object file sections
 Text segment: machine code (instructions)
 Data segment: data
 Relocation information: list of instructions and data that
depend on absolute memory addresses when the program is
loaded into memory
 Symbol table: location of each label in the program as well as
list of unresolved references
 Debugging
information
Example object file
Header
Text segment
Data segment
Relocation info
Symbol table
Name
myprog
Text size
100hex
Data size
20hex
Address
Instruction
0
lw $a0, 0($gp)
4
jal 0
…
…
Address
Data
0
(X)
…
…
Address
0
Instruction type
lw
Dependency
X
4
jal
yourproc
Label
Address
X
yourproc
Linker

Links object files together to form an executable
(binary) that can be run on the machine
should be
“printf”
and executable files should show machine language
(in binary format) not assembly language
 Object
Linker tasks

Determine memory addresses of code and data

Resolve absolute addresses

Find library routines referenced by the program

Resolve references among files

Resulting program resides in I/O
 on
hard drive
 on CD
 on another machine’s hard drive accessible via the network
 Etc.
Example executable file
Header
Text segment
Data segment
Name
Text size
myprog
300hex
Data size
50hex
Address
0040 0000hex
Instruction
lw $a0, 8000hex($gp)
0040 0004hex
jal 100040hex
…
…
0040 0100hex
[start of yourproc]
…
…
Address
1000 0000hex
Data
(X)
…
…
$gp = 1000 8000hex
Loader

Part of the operating system that loads
programs into memory, starts them, and
terminates them when done

Steps
 Determine
memory size of text and data segments from
executable header
 Create memory address space big enough for the program
 Copy text and initialized data into those memory locations
 Copy input parameters onto the stack
 Clear general registers and set $sp to first free stack location
 Call a startup routine that



Copies input parameters into $a0-$a4
Loads the PC with the starting address of main
Terminates the program when main returns
The Motorola MCore architecture

Introduced in 1997 to attack embedded market
 Cell
phones
 Personal Digital Assistants (PDAs)
 Portable Global Positioning Systems (GPSs)
 Automotive systems (braking, engine control, etc)
 Etc.

Features
 32-bit

architecture (integer arithmetic is on 32 bits)
Many embedded processors are 8 or 16-bit
 16-bit
fixed-length instructions
 Two-operand register-register (load-store) architecture

One source operand also serves as the destination (e.g., rx = rx + ry)
 Byte,
half-word, and word memory accesses
 Power saving modes

http://www.motorola.com/mcore
MCore user visible registers

16 general purpose 32-bit
registers
 R0
is stack pointer
 R15 holds return address

32-bit PC register

1-bit condition register
 Set
by compare, arithmetic, and
logical operations
 Alternative to setting a GPR for
branch conditions as in MIPS
MCore instruction formats

Register-to-register
 Monadic

Example: abs rx (absolute value of register rx)
 Dyadic

Example: movt rx, ry (move ry to rx if condition register is “true”)
 5-bit

Example: lsli rx, imm5 (logical shift left of rx by imm5 bits)
 5-bit

immediate
offset immediate
Example: addi rx, oimm5 (add rx and unsigned value oimm5)
MCore instruction formats
 7-bit

immediate
Only used by movi rx, imm7 (put unsigned imm7 into rx)
 Control


register
Used to move between rx and special register (CReg)
Memory access
 Scaled

4-bit immediate
Scale (shift left) imm4 to match access type (byte, half-word, word) and
add to rx
MCore instruction formats
 Load

relative word
Only used by lrw rz,label (left-shift zero-extended DISP_8 by 2, add to
PC+2, force two LSBs to zero, and load word at this address into rz)
 Register


quadrant and multiple register
Used to load/store multiple registers
Flow control
 Scaled

11-bit displacement
Example: bf label (if condition register is “false”, branch to address
formed by appending one zero to right of sign-extended DISP_11
and adding to PC+2; otherwise increment PC)
MCore instruction formats
 Register

Example: jsr rx (put PC+2 in r15; jump to subroutine addressed by
appending one zero to right of rx)
 Indirect

Example jsri label (put PC+2 in r15; jump to subroutine addressed by
left-shifting zero-extended DISP_8 by 2, adding to PC+2, and forcing two
LSBs to zero
 Register

with 4-bit negative displacement
Only used for loopt rx, label (decrement rx; set condition bit if signed
rx>0; clear the condition bit otherwise; if prior value of condition register
was “true”, branch to address formed by appending one to right of
one-extended DISP_4 and adding to PC+2; otherwise increment
PC+2)
MCore instruction set
Mnemonic
Description
ABS
ADDC
ADDI
ADDU
AND
ANDI
ANDN
ASR
ASRC
Absolute Value
Add with C bit
Add Immediate
Add Unsigned
Logical AND
Logical AND Immediate
AND NOT
Arithmetic Shift Right
Arithmetic Shift Right, Update C Bit
BCLRI
BF
BGENI
BGENR
BKPT
BMASKI
BR
BREV
BSETI
BSR
BT
BTSTI
Clear Bit
Branch on Condition False
Bit Generate Immediate
Bit Generate Register
Breakpoint
Bit Mask Immediate
Branch
Bit Reverse
Bit Set Immediate
Branch to Subroutine
Branch on Condition True
Bit Test Immediate
CLRF
CLRT
CMPHS
CMPLT
CMPLTI
CMPNE
CMPNEI
Clear Register on Condition False
Clear Register on Condition True
Compare Higher or Same
Compare Less-Than
Compare Less-Than Immediate
Compare Not Equal
Compare Not Equal Immediate
MCore instruction set
DECF
DECGT
DECLT
DECNE
DECT
DIVS
DIVU
DOZE
Decrement on Condition False
Decrement Register and Set Condition if Result Greater-Than
Zero
Decrement Register and Set Condition if Result Less-Than Zero
Decrement Register and Set Condition if Result Not Equal to Zero
Decrement on Condition True
Divide (Signed)
Divide (Unsigned)
Doze
FF1
Find First One
INCF
INCT
IXH
IXW
Increment on Condition False
Increment on Condition True
Index Halfword
Index Word
JMP
JMPI
JSR
JSRI
Jump
Jump Indirect
Jump to Subroutine
Jump to Subroutine Indirect
LD.[BHW]
LDM
LDQ
LOOPT
LRW
LSL, LSR
LSLC,
LSRC
LSLI, LSRI
Load
Load Multiple Registers
Load Register Quadrant
Decrement with C-Bit Update and Branch if Condition True
Load Relative Word
Logical Shift Left and Right
Logical Shift Left and Right, Update C bit
Logical Shift Left and Right by Immediate
MCore instruction set
MFCR
MOV
MOVI
MOVF
MOVT
MTCR
MULT
MVC
MVCV
Move from Control Register
Move
Move Immediate
Move on Condition False
Move on Condition True
Move to Control Register
Multiply
Move C Bit to Register
Move Inverted C Bit to
Register
NOT
Logical Complement
OR
Logical Inclusive-OR
ROTLI
RSUB
RSUBI
RTE
RFI
Rotate Left by Immediate
Reverse Subtract
Reverse Subtract Immediate
Return from Exception
Return from Interrupt
SEXTB
SEXTH
ST.[BHW]
STM
STQ
STOP
SUBC
SUBU
SUBI
SYNC
Sign-Extend Byte
Sign-Extend Halfword
Store
Store Multiple Registers
Store Register Quadrant
Stop
Subtract with C Bit
Subtract
Subtract Immediate
Synchronize
MCore instruction set
TRAP
TST
TSTNB
Z
Trap
Test Operands
Test for No Byte Equal Zero
WAIT
Wait
XOR
XSR
XTRB0
XTRB1
XTRB2
XTRB3
Exclusive OR
Extended Shift Right
Extract Byte 0
Extract Byte 1
Extract Byte 2
Extract Byte 3
ZEXTB
ZEXTH
Zero-Extend Byte
Zero-Extend Halfword
MCore versus MIPS

Both are 32-bit load-store GPR architectures

MIPS was designed for desktops and servers,
MCore for embedded systems

MCore instruction length is 16 bits to save
memory code space but this reduces
 The
number of GPRs (16 versus 32 in MIPS)
 The number of operands (2 versus 3 in MIPS)

MCore uses a condition bit versus a GPR for
MIPS to set branch conditions
 Condition
bit saves a GPR and sometimes an instruction but
constrains instruction ordering

MCore has more addressing modes and
instruction types
 Saves
code space but increases complexity of decoder
and perhaps arithmetic units
Questions?