Programming with MIPS Assembly Language
Leon Tietz, Ph.D.
Introduction
Assembly language programming is really not all that different from programming in a high
level language (HLL). With HLL, the task of the programmer is to use the facilities that are
provided by the language to create a higher level of functionality. Frequently, an application
will have many layers of abstraction to bridge the gap between the facilities of the language
and the desired application. This is simply the act of programming and is accomplished by
defining functions, objects, methods, etc.
With assembly language programming, the facilities that are provided by the language are
lower level, i.e. they are more primitive than with HLL. Typically, if an algorithm is coded in
both assembly language and HLL, more lines of assembly code will be required.
Assembly language programming is an important part of this course because it represents the
interface between hardware and software. This means that assembly language can only use
capabilities that are directly provided by the hardware. Computer architects refer to this as
the Instruction Set Architecture (ISA) of the computer. Each ISA requires the Assembly
language programmer to work with only the features that are provided by that ISA. Actually,
this is strictly true only for machine language. Most assemblers do provide a small amount
of support (e.g. with pseudo instructions or macros) to make the instruction set seem a little
richer (slightly higher level) than it actually is.
This class will use the MIPS ISA. MIPS is an acronym that stands for
"Microprocessor without Interlocked Pipeline Stages"; although the acronym is no longer
relevant, the name MIPS is still used. Since MIPS is used primarily in embedded applications
and some UNIX workstations, you probably don’t have a MIPS processor on which to run
this code, so we will use the SPIM simulator. Actually, it is much easier to learn to write
assembly language using a simulator than to run it on the actual hardware because the
simulator can show in detail the various actions that are being performed. This can be a
significant benefit to assist you in debugging your code. Using the simulator will allow you
to observe your code being executed one instruction at a time by using the facility known as
single-stepping. The simulator will also keep track of the path taken through your code when
you are single-stepping.
Even though each ISA and its accompanying assembly language are different, just as with
learning HLLs, once you have learned one or two, learning others becomes quite easy. MIPS
Assembly Language is relatively easy to learn. When writing HLL you are writing
statements. With assembly language these are called instructions. A typical HLL statement
will require more than one assembly language instruction. Many times when writing
assembly language, it will be helpful to think about an algorithm as one HLL statement at a
time and to write assembly language by translating just one HLL statement at a time.
MIPS is a RISC (Reduced Instruction Set Computer) architecture. In general, this means that
it has simpler instructions that are easy for the hardware to carry out quickly. Common
characteristics of RISC computers are:
• Instructions are all the same length (each MIPS instruction is 32 bits long).
• There are a relatively large number of registers available to the programmer. The
  MIPS ISA has 32 general purpose registers.
• Only a limited number of instructions access memory for data. Most instructions use
  only data in registers.
Registers
So what is a register? By the way, the word is register, not registry. The registry is that thing
that Microsoft Windows uses to keep track of many things. It is easiest to think of registers as
variables that are built right into the processor. By saying that they are built into the
processor we mean that they are not in memory. This distinction is important because the
processor has much faster access to registers than to memory and by using registers
appropriately, much higher computation speeds are achieved.
You might want to draw an analogy comparing registers to the workbench of a craftsman.
Items are brought to the workbench when they are needed and (to avoid the workbench
getting so cluttered that no work can be done there) written back to memory when they won’t
be needed again for some time.
The general purpose registers in MIPS can each hold
• an integer (either signed 2’s complement or unsigned),
• an address (the identity of a specific memory location), or
• a bit pattern (possibly one to four ASCII characters).
Of the 32 general purpose MIPS registers, two are not really general purpose. Register
0, usually written $0 or $zero, always contains a zero. Zero is a really useful number and it is
handy to have a register that always contains this value. If you write a non-zero number to
this register, and then read it, you will still read a zero. Register 31, usually written $31 or $ra,
is used by the processor to automatically store a return address when a subroutine or function
is called.
The other 30 registers are completely interchangeable from the hardware standpoint. There
are however software conventions that, when properly used, make code sharing, use of
libraries, efficient operating system support, etc. possible. Table 1 shows these conventions
and lists the names that are assigned to each of the registers. Using these names makes
reading code that follows the conventions easier. Note that these are conventions and need
not be followed if the software being developed does not need to interface with software
developed by others. You will however find it useful to follow these conventions for all your
programs.
Table 1. MIPS Registers.

Name    #    Description
$zero   0    constant 0
$at     1    reserved for the assembler - do not use
$v0     2    expression evaluation and results of a function
$v1     3    expression evaluation and results of a function
$a0     4    argument 1 sent to a function
$a1     5    argument 2 sent to a function
$a2     6    argument 3 sent to a function
$a3     7    argument 4 sent to a function
$t0     8    temporary (not preserved across function call)
$t1     9    temporary (not preserved across function call)
$t2     10   temporary (not preserved across function call)
$t3     11   temporary (not preserved across function call)
$t4     12   temporary (not preserved across function call)
$t5     13   temporary (not preserved across function call)
$t6     14   temporary (not preserved across function call)
$t7     15   temporary (not preserved across function call)
$s0     16   saved temporary (preserved across function call)
$s1     17   saved temporary (preserved across function call)
$s2     18   saved temporary (preserved across function call)
$s3     19   saved temporary (preserved across function call)
$s4     20   saved temporary (preserved across function call)
$s5     21   saved temporary (preserved across function call)
$s6     22   saved temporary (preserved across function call)
$s7     23   saved temporary (preserved across function call)
$t8     24   temporary (not preserved across function call)
$t9     25   temporary (not preserved across function call)
$k0     26   reserved for exception processing - do not use
$k1     27   reserved for exception processing - do not use
$gp     28   pointer to global area
$sp     29   stack pointer
$fp     30   frame pointer (sometimes called $s8)
$ra     31   return address (used by function call)

Instructions
Registers are used to hold information, but it is certainly necessary to do something with that
information. We need to perform operations on it. Each assembly language instruction
performs an operation. Each operation has a mnemonic that is short, yet easy for a person to
understand and type. The mnemonic associated with addition is simply add; for subtraction it
is sub. Assembly language operations are frequently referred to as opcodes (short for
operation codes) because each operation is encoded in a group of bits within the (32-bit)
instruction. Each instruction must contain not only an opcode, but it must also include
information on the registers that are being used for this instruction. Consider this instruction
that you as a MIPS Assembly Language programmer might write:
add $t1, $t2, $t3
This statement would look like $t1 = $t2 + $t3 in HLL or pseudo code. The contents of
registers $t2 and $t3 are added and the sum is stored in register $t1. Notice that the
destination is on the left just as it is in an HLL. Similarly, the instruction
sub $a0, $t4, $s1
performs subtraction and implements $a0 = $t4 - $s1.
An easy way to remember the exact operation to be performed (especially with a noncommutative arithmetic operation such as subtraction) is to mentally substitute an assignment
operation for the first comma and the arithmetic operator for the second comma.
Table 2 contains some MIPS instructions that perform arithmetic on values that are stored in
registers. Each of these instructions uses two source registers and stores the result into a third
(destination) register.
Table 2. Arithmetic and Logic Instructions.

Mnemonic   Description                          Java or C++ for: op $t1, $t2, $t3
add        Addition (with overflow checking)    $t1 = $t2 + $t3
addu       Addition (no overflow checking)      $t1 = $t2 + $t3
and        Logical (bitwise) AND                $t1 = $t2 & $t3
nor        Logical (bitwise) NOR                $t1 = ~($t2 | $t3)
or         Logical (bitwise) OR                 $t1 = $t2 | $t3
sllv       Shift left logical (variable)        $t1 = $t2 << $t3
srav       Shift right arithmetic (variable)    $t1 = $t2 >> $t3
srlv       Shift right logical (variable)       $t1 = $t2 >>> $t3
sub        Subtract (with overflow checking)    $t1 = $t2 - $t3
subu       Subtract (no overflow checking)      $t1 = $t2 - $t3
xor        Logical (bitwise) XOR                $t1 = $t2 ^ $t3

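To see the one-statement-at-a-time translation in practice, here is a minimal sketch that evaluates f = (g + h) - (i + j) using add and sub from Table 2. The register assignment (g, h, i, j in $t0 through $t3 and f in $t6) and the starting values are assumptions made only for this example. The li pseudo-instruction, the .text and .globl lines, and the closing jr $ra are explained later in this document; they are included so that the fragment can be loaded into SPIM as it stands.

```asm
        .text
        .globl main
main:   li   $t0, 9              # g = 9 (assumed value)
        li   $t1, 4              # h = 4 (assumed value)
        li   $t2, 3              # i = 3 (assumed value)
        li   $t3, 2              # j = 2 (assumed value)
        add  $t4, $t0, $t1       # $t4 = g + h = 13
        add  $t5, $t2, $t3       # $t5 = i + j = 5
        sub  $t6, $t4, $t5       # f = $t4 - $t5 = 8
        jr   $ra                 # return to the code that called main
```

Single-stepping through this in the simulator should show $t4, $t5 and $t6 change one instruction at a time.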
Sometimes it is desirable to have a constant as one of the source operands. MIPS also has
instructions to perform these operations. For example, you can write:
addi $t1, $t2, 5
to add the constant 5 to the contents of register $t2 and store the result in register $t1. In
Assembly Language programming this constant is called an immediate value. The
immediate value constant is stored in a bit field within the 32-bit MIPS instruction and is
always specified last in MIPS Assembly Language. Since 16 bits within the 32-bit instruction
are used to hold an immediate value, the possible range for signed immediate values in MIPS is
limited to -32768 to +32767. (The logical instructions andi, ori and xori treat the same 16 bits
as an unsigned value in the range 0 to 65535.) The shift instructions in Table 3 also have a
built-in constant. It is called the shift amount (shamt) and is limited to the range 0 to 31.
Table 3. Instructions with Immediate Values

Mnemonic   Description                                  Java or C++ for: op $t1, $t2, const
addi       Addition (with overflow checking)            $t1 = $t2 + imm
addiu      Addition (no overflow checking)              $t1 = $t2 + imm
andi       Logical (bitwise) AND                        $t1 = $t2 & imm
ori        Logical (bitwise) OR                         $t1 = $t2 | imm
sll        Shift left logical (constant amount)         $t1 = $t2 << shamt
sra        Shift right arithmetic (constant amount)     $t1 = $t2 >> shamt
srl        Shift right logical (constant amount)        $t1 = $t2 >>> shamt
xori       Logical (bitwise) XOR                        $t1 = $t2 ^ imm

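The short sketch below exercises a few of the instructions from Table 3. The starting value 7 and the choice of registers are assumptions for illustration only; the results in the comments follow from the arithmetic, and you can confirm them by single-stepping in SPIM.

```asm
        .text
        .globl main
main:   li   $t2, 7              # starting value (li is described in Table 6)
        addi $t1, $t2, 5         # $t1 = 7 + 5 = 12
        addi $t1, $t1, -2        # immediates may be negative: $t1 = 12 - 2 = 10
        sll  $t3, $t1, 3         # shift left by 3 bits: $t3 = 10 * 8 = 80
        sra  $t4, $t3, 2         # shift right by 2 bits: $t4 = 80 / 4 = 20
        ori  $t5, $t3, 1         # set the lowest bit: $t5 = 80 | 1 = 81
        jr   $ra                 # return to the code that called main
```

Notice that there is no subtract-immediate instruction in the table; adding a negative immediate does the same job.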
Multiplication and Division
Did you notice that the above tables included neither multiply nor divide instructions? This is
because multiplication and division are fundamentally different from addition or subtraction.
Think about what happens when you multiply two numbers. If you multiply a 2-digit number
by another 2-digit number, you will get a 4-digit product. The product of two numbers can
potentially be twice as long as either of the numbers you are multiplying. Rather than just
assume that the product will fit in a general purpose register, MIPS uses two special purpose
registers for multiplication. These are called lo (used to hold the low order half of the
product) and hi (used to hold the high order half of the product). Division uses the same two
registers: lo receives the quotient and hi receives the remainder. MIPS also has instructions
to move values from lo and hi to general purpose registers. These let you handle both the case
where you expect the result to fit into one register and the case where you must work with a
double word (see Table 4).
Table 4. Multiply and Divide Instructions

Instruction       Description                      Action
div $t1, $t2      Divide with overflow checking    lo = $t1 / $t2
                                                   hi = $t1 % $t2
divu $t1, $t2     Divide no overflow checking      lo = $t1 / $t2
                                                   hi = $t1 % $t2
mult $t1, $t2     Multiply                         hi || lo = $t1 * $t2
multu $t1, $t2    Unsigned Multiply                hi || lo = $t1 * $t2
mfhi $t1          Move from hi                     $t1 = hi
mflo $t1          Move from lo                     $t1 = lo

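As a sketch of how Table 4 is used, the fragment below multiplies and then divides two small numbers and copies the results out of lo and hi. The operands 17 and 5 are assumed values chosen so that the product fits in one register.

```asm
        .text
        .globl main
main:   li   $t1, 17             # first operand (assumed value)
        li   $t2, 5              # second operand (assumed value)
        mult $t1, $t2            # hi || lo = 17 * 5 = 85
        mflo $t3                 # the product fits in 32 bits, so $t3 = 85
        mfhi $t4                 # high order half of the product: $t4 = 0
        div  $t1, $t2            # lo = 17 / 5, hi = 17 % 5
        mflo $t5                 # quotient: $t5 = 3
        mfhi $t6                 # remainder: $t6 = 2
        jr   $ra                 # return to the code that called main
```

The values in lo and hi stay there until the next multiply or divide, so copy out anything you need before issuing another one.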
Pseudo-instructions and Macros
Assemblers typically have the ability to define operations that are more complex than a single
machine instruction. MIPS has a large number of these pseudo-instructions built into the
assembler. These pseudo-instructions make programming easier, but you should remember
that frequently the pseudo-instruction does not fit into a single 32-bit instruction. The MIPS
assembler will instead use several of the instructions that are implemented in hardware (like
those in the above tables). Examples of pseudo-instructions are given in Table 5. Assemblers
typically permit the programmer to write macros to automate some of the repetitious parts of
assembly language programming. Macros in assembly language work similarly to macros
used in spreadsheet programming; however, this course will not cover macros. The
pseudo-instructions shown here are basically predefined macros. To see how these
pseudo-instructions are actually implemented, you can use these instructions in a program and then
examine the code as displayed in the SPIM simulator.
Table 5. Example Pseudo instructions.

Pseudo instruction      Action               Comment
abs $t1, $t2            $t1 = abs($t2)       absolute value
div $t1, $t2, $t3       $t1 = $t2 / $t3      divide with overflow
div $t1, $t2, const     $t1 = $t2 / const    divide with overflow
mul $t1, $t2, $t3       $t1 = $t2 * $t3      multiply with overflow
mul $t1, $t2, const     $t1 = $t2 * const    multiply with overflow
not $t1, $t2            $t1 = ~$t2           bitwise NOT
rem $t1, $t2, $t3       $t1 = $t2 % $t3      remainder
move $t1, $t2           $t1 = $t2            copy register to register

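The sketch below uses several of the pseudo-instructions from Table 5. The comments that say "same effect as" name hardware instructions from the earlier tables that would produce the same result; they are given as an aid to understanding and are not necessarily the exact expansion that SPIM generates, which you can check in the simulator's text display. The starting values -6 and 7 are assumptions.

```asm
        .text
        .globl main
main:   li   $t2, -6             # load a constant (assumed value)
        li   $t7, 7              # a second constant (assumed value)
        move $t1, $t2            # same effect as: add $t1, $t2, $zero
        not  $t3, $t2            # same effect as: nor $t3, $t2, $zero   ($t3 = 5)
        abs  $t4, $t2            # $t4 = 6
        mul  $t5, $t4, $t3       # same effect as: mult $t4, $t3 then mflo $t5   ($t5 = 30)
        rem  $t6, $t5, $t7       # same effect as: div $t5, $t7 then mfhi $t6    ($t6 = 2)
        jr   $ra                 # return to the code that called main
```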
Getting Values into and out of Registers
All of the ideas above are useless unless we have a way to get values into registers. There are
instructions and pseudo instructions to do this as shown in Table 6. Instructions that place
values into registers are said to be load instructions. Instructions that take values from
registers and copy the values to memory are called store instructions. Store instructions are
the only ones that have the destination on the right, rather than on the left. This is done so
that the address specifier is always on the far right end of the instruction.
Locations in memory can also be given names. These are called labels. A label is defined by
starting a line of assembly language with the name of the label, followed by a colon
(examples are shown later). The label can then be used as a synonym for the address of that
memory location. Just as in HLL programming, the assembly language programmer can use a
label without knowing the actual value of the address to which it refers. The assembler will
keep track of this for you.
Table 6. Load and Store Instructions. Note that Mem[x] refers to memory location x.

(Pseudo) Instruction   Pseudo?   Name             Action
li $t1, const          y         load immediate   $t1 = const
la $t1, address        y         load address     $t1 = &address
lb $t1, address        y         load byte        Place the byte located at Mem[address] into register $t1.
lb $t1, const($t2)     n         load byte        Place the byte located at Mem[$t2+const] into register $t1.
lw $t1, address        y         load word        Place the word (4 bytes) located at Mem[address] into register $t1. address must be evenly divisible by 4.
lw $t1, const($t2)     n         load word        Place the word (4 bytes) located at Mem[$t2+const] into register $t1. $t2+const must be evenly divisible by 4.
sb $t1, address        y         store byte       Copy the right-most 8 bits of $t1 to the byte located at Mem[address].
sb $t1, const($t2)     n         store byte       Copy the right-most 8 bits of $t1 to the byte located at Mem[$t2+const].
sw $t1, address        y         store word       Copy the word in $t1 (4 bytes) to the word located at Mem[address]. address must be evenly divisible by 4.
sw $t1, const($t2)     n         store word       Copy the word in $t1 (4 bytes) to the word located at Mem[$t2+const]. $t2+const must be evenly divisible by 4.

Here is code that uses some of the ideas covered so far:
# Assume that intA and intB are integer variables and that arrC is an array of integers.
# That is, they are labels that refer to the appropriate memory locations.
# The following will compute intA = intB + arrC[2]
        lw   $t1, intB        # $t1 receives a copy of the contents of intB.
        la   $t2, arrC        # $t2 receives the address of (the beginning of) arrC.
        lw   $t3, 8($t2)      # $t3 receives arrC[2]. Note: each integer is 4 bytes long.
        add  $t1, $t1, $t3    # perform the addition.
        sw   $t1, intA        # store result to variable intA.
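The example above uses a fixed index, so the byte offset 8 can be written directly in the instruction. When the index is itself a variable, the offset has to be computed at run time. The following sketch shows one way to do this for intA = arrC[i]. The label idx, the array contents and the index value 3 are assumptions for this example, and the .data and .word directives are described later in Table 9.

```asm
        .data
arrC:   .word 10, 20, 30, 40     # hypothetical array contents
idx:    .word 3                  # hypothetical index variable i
intA:   .word 0

        .text
        .globl main
main:   lw   $t0, idx            # $t0 = i
        sll  $t0, $t0, 2         # byte offset = i * 4 (each integer is 4 bytes long)
        la   $t2, arrC           # $t2 = address of arrC[0]
        add  $t2, $t2, $t0       # $t2 = address of arrC[i]
        lw   $t3, 0($t2)         # $t3 = arrC[i], which is 40 for the data above
        sw   $t3, intA           # intA = arrC[i]
        jr   $ra                 # return to the code that called main
```

Shifting left by 2 is used here instead of a multiply because multiplying by 4 is the same as shifting left by two bits.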
Input and Output
All I/O in MIPS (as implemented in the SPIM simulator) is performed by the syscall (system
call) instruction. Table 7 shows the contents of registers used to perform the specified syscall
actions. The System Call Code must always be placed in $v0 prior to executing the syscall
instruction. The syscall instruction has no parameters specified after the opcode. It just looks
like this:
syscall
Table 7. Syscall codes

Service        System Call Code in $v0   Arguments                      Result
print_int      1                         $a0 = integer
print_string   4                         $a0 = string address
read_int       5                                                        integer (in $v0)
read_string    8                         $a0 = buffer address,
                                         $a1 = length of buffer

Example code using syscall:
# prompt and myStr are addresses of strings (discussed later)
        li   $v0, 4           # this is the code to perform print_string
        la   $a0, prompt      # prepare to print a prompt to the user
        syscall               # print the string.

        li   $v0, 5           # this is the code to perform read_int
        syscall               # read the integer (always goes to $v0)

        addi $t0, $v0, 1      # increment the integer.

        li   $v0, 4           # this is the code to perform print_string
        la   $a0, myStr       # prepare to print a string to the user
        syscall               # print the string.

        li   $v0, 1           # this is the code to perform print_int
        move $a0, $t0         # integer must be in $a0 for printing.
        syscall               # print the integer
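Because $v0 carries both the system call code going in and the result coming out, a value that was read must be copied elsewhere before the next syscall is set up. Here is a small complete sketch, using only services from Table 7, that reads two integers and prints their sum. The use of $t0 to hold the first integer is an arbitrary choice.

```asm
        .text
        .globl main
main:   li   $v0, 5              # this is the code to perform read_int
        syscall                  # first integer arrives in $v0
        move $t0, $v0            # save it before $v0 is reused
        li   $v0, 5              # read_int again
        syscall                  # second integer arrives in $v0
        add  $a0, $t0, $v0       # $a0 = first + second
        li   $v0, 1              # this is the code to perform print_int
        syscall                  # print the sum
        jr   $ra                 # return to the code that called main
```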
Conditional Expressions, Loops and Function Calls
Assembly language is not a structured language. Blocks of code that are set off by { } in Java
or C++ are not handled that way. Instead, if statements and loops must be done by using
labels and branching according to the algorithm’s needs. If you have programmed in a
language such as Q-BASIC, you have encountered this style of programming.
Table 8 shows some instructions that are used to perform these operations. These instructions
are either ‘jump’ or ‘branch’ instructions. In MIPS, the ‘jump’ instructions can transfer to an
instruction that is farther away than ‘branch’ instructions can.
Table 8. Jumps and Branches

Instruction            Description                           Action
j target_address       jump                                  unconditionally jump to the instruction at target_address
jal target_address     jump and link                         unconditionally jump to the instruction at target_address. save the address of the next instruction to $ra
jr $ra                 jump register                         jump to the instruction whose address is in $ra. This usually acts as a return statement.
beq $t1, $t2, label    branch on equal                       branch to label if $t1 == $t2
bgez $t1, label        branch on greater than or equal zero  branch to label if $t1 >= 0
bgtz $t1, label        branch on greater than zero           branch to label if $t1 > 0
blez $t1, label        branch on less than or equal zero     branch to label if $t1 <= 0
bltz $t1, label        branch on less than zero              branch to label if $t1 < 0
bne $t1, $t2, label    branch on not equal                   branch to label if $t1 != $t2

When coding a while loop, it is necessary to place a branch at the top of the loop that tests for
the condition to leave the loop, not the condition to continue looping. The same branch skips the
loop without ever entering it if the loop condition is already false on entry. This will be a
branch to a label that is just past the end of the loop body. The end of the loop body will have
a jump instruction to unconditionally go back to the top to re-execute the conditional branch.
Here is a sample while loop:
# the following assembly code implements this HLL code:
#     a = 10
#     while (a >= 0) {
#         print a,            # written this way for simplicity
#         a = a - 1
#     }
        li   $t4, 10          # loop control variable initialized to 10
top:    bltz $t4, done        # while ($t4 >= 0)
        li   $v0, 1           # this is the code to perform print_int
        move $a0, $t4         # integer must be in $a0 for printing.
        syscall               # print the integer
        addi $t4, $t4, -1     # decrement the control variable
        j    top              # back to beginning of loop
done:
Notice that in the while loop, the while test is while a>=0 but in the assembly code the test is
bltz, that is, branch if less than zero. This is the opposite test because in HLL the test
indicates when the loop should be executed, whereas in assembly language, the test indicates
when the loop should no longer be executed.
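The same inversion applies when the loop test compares two registers. The sketch below, written as a complete program, adds the integers 1 through 10. The HLL condition is i != 11, so the assembly code leaves the loop with beq. The register choices and the limit of 11 are assumptions for this example.

```asm
# the following assembly code implements this HLL code:
#     sum = 0
#     i = 1
#     while (i != 11) {
#         sum = sum + i
#         i = i + 1
#     }
#     print sum
        .text
        .globl main
main:   li   $t0, 0              # sum = 0
        li   $t1, 1              # i = 1
        li   $t2, 11             # loop limit, kept in a register so beq can use it
top:    beq  $t1, $t2, done      # leave the loop when i == 11 (opposite of i != 11)
        add  $t0, $t0, $t1       # sum = sum + i
        addi $t1, $t1, 1         # i = i + 1
        j    top                 # back to beginning of loop
done:   li   $v0, 1              # this is the code to perform print_int
        move $a0, $t0            # $a0 = sum, which is 55 for this limit
        syscall                  # print the integer
        jr   $ra                 # return to the code that called main
```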
When coding a Do-While Loop (called an Until Loop in some languages), the conditional test
will be at the bottom of the loop and will branch back to the beginning if an additional
iteration is required. No unconditional branch is needed for this loop but the loop body will
be executed at least once.
# the following assembly code implements:
#     a = 10
#     do
#     {
#         print a,            # written this way for simplicity
#         a = a - 1
#     } while (a >= 0)
        li   $t4, 10          # loop control variable initialized to 10
top:    li   $v0, 1           # this is the code to perform print_int
        move $a0, $t4         # integer must be in $a0 for printing.
        syscall               # print the integer
        addi $t4, $t4, -1     # decrement the control variable
        bgez $t4, top         # while ($t4 >= 0)
When coding an if-statement, the condition encoded is the opposite of the one that you think
about as being the condition to execute the body of the if-statement. Remember that the
conditional branch will be used to branch around the body of the if-statement in a manner
similar to the while loop above.
This is an example of an if-statement:
# The following assembly language implements this code:
#     if ($t1 > 10)
#     {
#         $t1 = $t1/2
#         $t2 = 1
#     }
        ble  $t1, 10, endif
        sra  $t1, $t1, 1      # shifting right by one bit divides by 2
        li   $t2, 1
endif:
This is an example of an if-else-statement:
# The following assembly language implements this code:
#     if ($t1 > 10)
#         $t1 = $t1/2
#     else
#         $t1 = $t1+3
        ble  $t1, 10, elsePart
        sra  $t1, $t1, 1
        j    endif
elsePart:
        addi $t1, $t1, 3
endif:
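The ble used above is a pseudo-instruction. When the comparison is against zero, one of the hardware branches from Table 8 can be used directly. As a sketch, the complete program below replaces a value with its absolute value and prints it. The test value -42 is an assumption chosen so that the body of the if-statement is executed.

```asm
# The following assembly language implements this code:
#     if ($t1 < 0)
#         $t1 = -$t1
#     print $t1
        .text
        .globl main
main:   li   $t1, -42            # hypothetical test value
        bgez $t1, endif          # skip the body when $t1 >= 0 (opposite of $t1 < 0)
        sub  $t1, $zero, $t1     # $t1 = 0 - $t1
endif:  li   $v0, 1              # this is the code to perform print_int
        move $a0, $t1            # integer must be in $a0 for printing (42 here)
        syscall                  # print the integer
        jr   $ra                 # return to the code that called main
```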
Every processor has a special purpose register called the Program Counter (PC) that always
contains the address of the next instruction to be executed. The MIPS processor also contains
a PC register. Normally the PC register is incremented by 4 (there are 4 bytes per instruction
in MIPS) as each instruction enters execution. Jump and Branch instructions are able to
change the contents of the PC and are thus able to alter the sequential nature of the program
execution. Branch instructions in MIPS have a 16-bit field within the instruction that contains
an “offset” to the destination instruction. The offset is a signed number which when
multiplied by 4 and then added to the PC will be the address of the target instruction.
When you look at the jal instruction, you should be asking: “How can the return address for a
function call be placed in a register rather than on a stack? I thought that return addresses are
always placed on the stack.” In general, it can’t. However, many functions that are called
don’t call any other functions (we call these ‘leaf’ functions because of their location on a
calling tree). Such a function can safely allow the address to remain in $ra until it is time to
return to the caller.
Any function that has been called and needs to call another function (or itself, recursively)
must save the contents of $ra (the address this function must return to) to a stack. When it is
time to return, the function will then need to retrieve this address from the stack and place it in
$ra again. We will be using stack frames to hold return addresses and other information later
in the course.
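Both cases can be seen in one small program. In the sketch below, twice is a leaf function, so it simply returns through $ra. The main function calls twice, so it is not a leaf and must save its own return address first. This is the simplest possible save of a single word, not the full stack frame that will be covered later. The function name twice and the argument 21 are assumptions for this example, and the code relies on the usual MIPS convention that the stack grows toward lower addresses.

```asm
        .text
        .globl main
main:   addi $sp, $sp, -4        # make room on the stack for one word
        sw   $ra, 0($sp)         # main is not a leaf: save its return address
        li   $a0, 21             # argument 1 sent to the function
        jal  twice               # call; $ra now holds the address of the next instruction
        move $a0, $v0            # result of the function (42) becomes the value to print
        li   $v0, 1              # this is the code to perform print_int
        syscall                  # print the integer
        lw   $ra, 0($sp)         # retrieve main's return address from the stack
        addi $sp, $sp, 4         # release the stack space
        jr   $ra                 # return to the code that called main

twice:  add  $v0, $a0, $a0       # leaf function: result = 2 * argument
        jr   $ra                 # the return address is still in $ra
```

If the sw and lw lines were left out, the jal would overwrite the address main needs, and the final jr $ra would jump back into main instead of returning.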
Assembler Directives and Memory Utilization
The Assembler must handle allocating memory for variables and strings as well as generating
the binary code for the instructions in the program. Special commands that always begin with
a “.” are used for this purpose. The following table summarizes some of these. A complete
MIPS program is then provided.
Table 9. Dot Commands

Dot Command   Description                                                 Example
.data         Introduce the section of the program that contains data.
              This data is treated like global data is in an HLL.
.text         Introduce the executable portion of the program.
.globl        Export labels (make them known outside your program).      .globl main
.align        Force words to be properly aligned.                         .align 2
.asciiz       Declare a null terminated string.                           .asciiz "This is a test."
.space        Begin an un-initialized region of data memory.              .space 50
.byte         Initialize bytes of data                                    .byte 10
                                                                          .byte 20,30,0x91
.word         Initialize words of data                                    .word 3
                                                                          .word 5, 0xFF00, 10

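As a short illustration of how declared data is used, the sketch below counts the characters in a null terminated string by loading one byte at a time until the terminating zero is found. The label msg and the string "MIPS" are assumptions for this example.

```asm
        .data
msg:    .asciiz "MIPS"           # 4 characters followed by a 0 byte

        .text
        .globl main
main:   la   $t0, msg            # $t0 = address of the current character
        li   $t1, 0              # $t1 = number of characters counted so far
next:   lb   $t2, 0($t0)         # load one character
        beq  $t2, $zero, done    # the terminating 0 byte ends the loop
        addi $t1, $t1, 1         # count the character
        addi $t0, $t0, 1         # characters are 1 byte apart
        j    next                # back to beginning of loop
done:   li   $v0, 1              # this is the code to perform print_int
        move $a0, $t1            # integer must be in $a0 for printing (4 here)
        syscall                  # print the integer
        jr   $ra                 # return to the code that called main
```

Note that the address is advanced by 1 here, not by 4 as with an array of words.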
This is a whole sample program:
# Programs should always begin with your name
# and section identifier in a comment here.
        .globl main           # the entry point (main) must always be made global
        .data                 # data segment declarations
pmpt:   .asciiz "Enter your name: "
strB:   .asciiz "This calculation was prepared\nespecially for "
buffer: .space 80             # a place for string input
# force the next entry to be word aligned (2 zeros at end of address)
        .align 2
int1:   .word 99
array1: .word 12, 15, 29, 33, 43, 0

        .text                 # begin the executable portion
main:   li   $v0, 4           # system call code for print_str
        la   $a0, pmpt        # address of string to print
        syscall               # print the prompt

        li   $v0, 8           # system call code for read_str
        la   $a0, buffer      # address of string buffer
        li   $a1, 80          # length of buffer area
        syscall               # read the string

        li   $v0, 4           # system call code for print_str
        la   $a0, strB
        syscall

        li   $v0, 4           # system call code for print_str
        la   $a0, buffer
        syscall

        li   $t0, 0           # $t0 will be used to accumulate a total
        la   $t1, array1      # address of beginning of the array
loop:   lw   $t2, 0($t1)      # get array element to $t2
        beqz $t2, done        # leave loop if we have reached sentinel value (0)
        add  $t0, $t0, $t2    # add value into accumulator
        addi $t1, $t1, 4      # adjust address to point to next array element
        j    loop

done:   lw   $t3, int1        # load the integer at int1
        mul  $a0, $t3, $t0    # perform a calculation: int1 * sum(array1)

        li   $v0, 1           # this is the code to perform print_int
        syscall               # print the integer (previously stored in $a0)

        jr   $ra              # return (to the operating system)