Chapter 3
Instructions: Language of the Machine
Note: The slides being presented represent a mix. Some are created by
Mark Franklin, Washington University in St. Louis, Dept. of CSE. Many
are taken from the Patterson & Hennessy book, “Computer Organization
& Design”, Copyright 1998 Morgan Kaufmann Publishers. This material
may not be copied or distributed for commercial purposes without
express written permission of the copyright holder. The original slides
may be found at:
http://books.elsevier.com/us//mk/us/subindex.asp?maintarget=companions/defaultindividual.asp&isbn=1558604286&country=United+States&srccode=&ref=&subcode=&head=&pdf=&basiccode=&txtSearch=&SearchField=&operator=&order=&community=mk
Instructions:
• Language of the Machine
• More primitive than higher level languages,
  e.g., no sophisticated control flow
• Very restrictive,
  e.g., MIPS Arithmetic Instructions
• We will use the MIPS instruction set architecture
  – similar to other architectures developed since the 1980's
  – used by NEC, Nintendo, Silicon Graphics, Sony
Instruction Set Design goals: maximize performance and minimize cost, reduce design time
S04
MIPS arithmetic
• All arithmetic instructions perform one operation and must have exactly 3 operands.
• Operand order is fixed (destination first, sources next).
• Design principle: regularity implies simplicity of design.
• Only one line per instruction (assembler constraint).
• # indicates start of comment.
Example:
C code:    A = B + C
MIPS code: add $s0, $s1, $s2   # $s0 ← $s1 + $s2
                               # $s0, $s1, $s2 are registers
                               # all arithmetic operations use registers.
MIPS arithmetic
• Hardware simplicity can add to programming complexity
C code:    A = B + C + D;
           E = F - A;
MIPS code: add $t0, $s1, $s2   # $s1=B, $s2=C
           add $s0, $t0, $s3   # $s3=D, $s0=A
           sub $s4, $s5, $s0   # $s5=F, $s4=E
• Operands must be registers, only 32 registers provided
• Design Principle: smaller is faster. Why?
Registers vs. Memory
• Arithmetic instruction operands must be registers,
  – only 32 registers provided (0 to 31)
• Registers are 32 bits in length. A 32-bit chunk = “word”.
• Compiler associates variables with registers.
• What about programs with lots of variables (i.e., > 32)?
[Figure: computer organization: Processor (Control, Datapath), Memory, Input, Output (I/O)]
Memory Organization
• Memory is viewed as a large, single-dimension array.
• A memory address is an index into the array.
• "Byte addressing" means that the address points to a byte of memory.
Address (binary)   Contents
0 (0000)           8 bits of data
1 (0001)           8 bits of data
2 (0010)           8 bits of data
3 (0011)           8 bits of data
4 (0100)           8 bits of data
5 (0101)           8 bits of data
6 (0110)           8 bits of data
...
Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
Address   Contents
0         32 bits of data
4         32 bits of data
8         32 bits of data
12        32 bits of data
...
Registers hold 32 bits of data
• 2^32 bytes with byte addresses from 0 to 2^32 - 1
• 2^30 words with byte addresses 0, 4, 8, ..., 2^32 - 4
• Words are aligned,
  i.e., what are the 2 least significant bits of a word address?
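The alignment question can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the helper names is_word_aligned and word_offset are illustrative assumptions.

```python
WORD_BYTES = 4  # a MIPS word is 32 bits = 4 bytes


def is_word_aligned(byte_address):
    # An aligned word address is a multiple of 4, so its 2 low bits are 00.
    return byte_address & 0b11 == 0


def word_offset(index):
    # Byte offset of element `index` in an array of words.
    return index * WORD_BYTES


print([a for a in range(16) if is_word_aligned(a)])  # [0, 4, 8, 12]
print(word_offset(8))                                # 32
print(2**32 // WORD_BYTES == 2**30)                  # True
```

The last line restates the slide's count: 2^32 bytes grouped four at a time give 2^30 words.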
Instructions
• Load and store instructions
• Example: A is an array 100 words in length, h has been loaded into register $s0, and $s3 contains the base address of the array.
C code:    A[8] = h + A[8];
MIPS code: lw  $t0, 32($s3)    # 32 = 8 * 4 bytes; lw = load word
           add $t0, $s0, $t0
           sw  $t0, 32($s3)    # sw = store word
• Remember arithmetic operands are registers, not memory!
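To see the load/add/store sequence in action, here is a small Python sketch that models registers and memory as dictionaries. The initial values (h = 5, base address 1000, A[i] = 10*i) and the helper functions are assumptions made for illustration only.

```python
# Hypothetical model: registers and word-addressed memory as dictionaries.
regs = {"$s0": 5, "$s3": 1000, "$t0": 0}              # h = 5, base of A = 1000 (assumed)
memory = {1000 + 4 * i: 10 * i for i in range(100)}   # A[i] = 10*i, one word per 4 bytes


def lw(rt, offset, base):
    # Load word: register <- Memory[base register + offset]
    regs[rt] = memory[regs[base] + offset]


def sw(rt, offset, base):
    # Store word: Memory[base register + offset] <- register
    memory[regs[base] + offset] = regs[rt]


def add(rd, rs, rt):
    # Register-to-register add, wrapped to 32 bits
    regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF


lw("$t0", 32, "$s3")       # $t0 = A[8]
add("$t0", "$s0", "$t0")   # $t0 = h + A[8]
sw("$t0", 32, "$s3")       # A[8] = $t0
print(memory[1000 + 32])   # 85
```

Note that the add never touches memory directly; only lw and sw do.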
Another Example
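• Can we figure out the code?
swap(int v[], int k)
{ int temp;
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
}
swap:
  muli $2, $5, 4      # $2 = k * 4 (byte offset of v[k])
  add  $2, $4, $2     # $2 = address of v[k]
  lw   $15, 0($2)     # $15 = v[k]
  lw   $16, 4($2)     # $16 = v[k+1]
  sw   $16, 0($2)     # v[k] = old v[k+1]
  sw   $15, 4($2)     # v[k+1] = old v[k]
  jr   $31            # return to caller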
So far:
• MIPS
  – loading words but addressing bytes
  – arithmetic on registers only
• Instruction            Meaning
  add $s1, $s2, $s3      $s1 = $s2 + $s3
  sub $s1, $s2, $s3      $s1 = $s2 – $s3
  lw $s1, 100($s2)       $s1 = Memory[$s2+100]
  sw $s1, 100($s2)       Memory[$s2+100] = $s1
Machine Language
• Instructions, like registers and words of data, are also 32 bits long
  – Example: add $t0, $s1, $s2
  – registers have numbers, $t0=8, $s1=17, $s2=18
  – $t0 → $t7 map to registers 8 → 15, $s0 → $s7 map to registers 16 → 23
• Instruction Format (Register or R-Type)
  000000  10001   10010   01000   00000   100000
  op      rs      rt      rd      shamt   funct
  6-bits  5-bits  5-bits  5-bits  5-bits  6-bits
• All instructions are the same length (fixed length vs variable length)
• Can you guess what the field names stand for?
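The R-type field layout can be reproduced by shifting each field into place. This is an illustrative Python sketch; the function name encode_r_type is an assumption, while the field widths and values come from the slide.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    # Field widths: op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 (32 bits total).
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $s1, $s2  with $t0 = 8, $s1 = 17, $s2 = 18
word = encode_r_type(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"{word:#010x}")  # 0x02324020
```

Note that the destination $t0 lands in the rd field, third among the register fields, even though it is written first in assembly.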
Machine Language
• Consider the load-word and store-word instructions,
  – What would the regularity principle have us do?
  – New principle: Good design demands a compromise
• Introduce a new type of instruction format
  – I-type for data transfer instructions
• Example: lw $t0, 32($s2)   # $t0 (reg 8) ← A[8], $s2 (reg 18)
  35      18      8       32
  op      rs      rt      16 bit number
• Where's the compromise?
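The I-type layout can be sketched the same way as the R-type one. The function name encode_i_type is an illustrative assumption; the field values are those of the lw example on the slide.

```python
def encode_i_type(op, rs, rt, immediate):
    # Field widths: op 6, rs 5, rt 5, immediate 16.
    # Masking keeps a negative offset in 16-bit two's complement form.
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


# lw $t0, 32($s2)  with op = 35, $s2 = 18 (rs), $t0 = 8 (rt)
lw_word = encode_i_type(op=35, rs=18, rt=8, immediate=32)
print(f"{lw_word:#010x}")  # 0x8e480020
```

The first 16 bits (op, rs, rt) sit in the same positions as in the R-type format; only the low 16 bits are reinterpreted, which is one way to read the compromise the slide asks about.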
Instruction Formats (so far)
Instruction       Format  op   rs    rt    rd     shamt  funct  address
add               R       0    reg   reg   reg    0      32     n.a.
sub (subtract)    R       0    reg   reg   reg    0      34     n.a.
lw (load word)    I       35   reg   reg   n.a.   n.a.   n.a.   address
sw (store word)   I       43   reg   reg   n.a.   n.a.   n.a.   address
Stored Program Concept
• Instructions are bits
• Programs are stored in memory (read or written just like data)
[Figure: Processor connected to Memory; memory for data, programs, compilers, editors, etc.]
• Fetch & Execute Cycle
  – The PC (Program Counter) is a register that holds the memory address of the currently executing instruction. Generally (no branch or jump), on each instruction execution PC ← PC + 4
  – Instructions are fetched and put into a special register (IR: Inst. Reg.)
  – Bits in the register "control" the instruction execution.
  – When complete, the “next” instruction is “fetched” (using PC) and the newly fetched instruction is executed …
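The fetch and execute cycle can be sketched as a loop over a small instruction memory. This Python model is only an illustration: instructions are stored as tuples rather than bits, the program is the three-instruction A = B + C + D; E = F - A sequence from the earlier slide, and the initial register values are assumed.

```python
# Instruction memory: one "instruction" per 4-byte word, keyed by byte address.
program = {
    0: ("add", "$t0", "$s1", "$s2"),
    4: ("add", "$s0", "$t0", "$s3"),
    8: ("sub", "$s4", "$s5", "$s0"),
}
# Assumed initial values: B = 1, C = 2, D = 3, F = 10
regs = {"$s1": 1, "$s2": 2, "$s3": 3, "$s5": 10, "$t0": 0, "$s0": 0, "$s4": 0}

pc = 0
while pc in program:
    ir = program[pc]            # fetch: instruction goes into the IR
    op, rd, rs, rt = ir         # decode: fields of the IR control execution
    if op == "add":
        regs[rd] = regs[rs] + regs[rt]
    elif op == "sub":
        regs[rd] = regs[rs] - regs[rt]
    pc = pc + 4                 # no branch or jump: PC <- PC + 4

print(regs["$s0"], regs["$s4"])  # 6 4
```

A branch or jump would replace the final PC update with an assignment of the target address.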
Processor Organization
[Figure: processor organization. ON-Chip Processor: Control (PC ← PC+1, PC ← Br.Addr, Other Stuff), Registers (PC: Program Counter, IR: Instruction Reg., Other Registers), Arithmetic and Logic. OFF-Chip: Data Memory, Instruction Memory.]
Now for Control Instructions
• Decision making instructions: alter the control flow,
  – Instead of PC ← PC + 4, PC ← (Branch or Jump Address)
• MIPS conditional branch instructions:
  bne $s0, $s1, Label   # Go to addr. “Label” if contents of $s0 are not equal to contents of $s1
  beq $s0, $s1, Label   # Go to addr. “Label” if contents of $s0 are equal to contents of $s1
• Example: if (i==j) h = i + j;
         bne $s0, $s1, Label
         add $s3, $s0, $s1   # h result in $s3
  Label: ....
Control
• MIPS unconditional branch instructions:
  j label
• Example:
  if (i!=j)
    h=i+j;
  else
    h=i-j;
        beq $s4, $s5, Lab1
        add $s3, $s4, $s5
        j Lab2
  Lab1: sub $s3, $s4, $s5
  Lab2: ...
• Can you build a simple for loop?
Addresses in Branches
• Instructions:
  bne $t4,$t5,Label   # Next instruction is at Label if $t4 ≠ $t5
  beq $t4,$t5,Label   # Next instruction is at Label if $t4 = $t5
• Formats:
  I   op   rs   rt   16 bit address
• Could specify a register (like lw and sw) and add it to address
  – use Instruction Address Register (PC = program counter):
    i.e., PC ← PC + (16-bits in label field of branch).
  – Label field is a signed number (branching forward & backward).
  – most branches are local (principle of locality).
• Jump instructions just use high order bits of PC
  – address boundaries of 256 MB
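The address arithmetic can be made concrete with a short Python sketch. It assumes the usual MIPS convention, consistent with the "PC + 4 + 100" entry for an offset of 25 in the summary table later in these slides: the 16-bit field is a signed word offset relative to the instruction after the branch, and a jump keeps the upper 4 bits of the PC. The function names are illustrative.

```python
def sign_extend_16(value):
    # Interpret the low 16 bits as a signed two's complement number.
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    # PC-relative: offset counts words from the instruction after the branch.
    return (pc + 4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF


def jump_target(pc, addr26):
    # Pseudodirect: upper 4 bits of PC + 4, then the 26-bit field, then 00.
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)


print(hex(branch_target(0x00400000, 25)))      # 0x400068
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010
print(jump_target(0x00400000, 2500))           # 10000
```

The 26-bit field plus two implied zero bits covers 2^28 bytes, which is where the 256 MB boundary comes from.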
So far:
• Instruction           Meaning
  add $s1,$s2,$s3       $s1 = $s2 + $s3
  sub $s1,$s2,$s3       $s1 = $s2 – $s3
  lw $s1,100($s2)       $s1 = Memory[$s2+100]
  sw $s1,100($s2)       Memory[$s2+100] = $s1
  bne $s4,$s5,L         Next instr. is at Label if $s4 != $s5
  beq $s4,$s5,L         Next instr. is at Label if $s4 == $s5
  j Label               Next instr. is at Label
• Formats:
  R   op   rs   rt   rd   shamt   funct
  I   op   rs   rt   16 bit address
  J   op   26 bit address
More on Control Flow
• We have: beq, bne, what about Branch-if-less-than?
• New instruction (Set Less Than, slt)
  slt $t0, $s1, $s2   # if $s1 < $s2 then $t0 = 1 else $t0 = 0
• Can use this instruction to build "blt $s1, $s2, Label"
• Can build general control structures and call them in assembly language using “pseudoinstructions” which expand into a set of machine instructions.
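One way an assembler might expand the blt pseudoinstruction is slt followed by bne. The Python sketch below only generates the text of the two real instructions; the function name expand_blt and the choice of $at as the scratch register (reserved for the assembler in the register conventions) are assumptions for illustration.

```python
def expand_blt(rs, rt, label, scratch="$at"):
    # blt rs, rt, label  ==>  set scratch to 1 if rs < rt, then branch if it is not 0.
    return [
        f"slt {scratch}, {rs}, {rt}",
        f"bne {scratch}, $zero, {label}",
    ]


for line in expand_blt("$s1", "$s2", "Label"):
    print(line)
# slt $at, $s1, $s2
# bne $at, $zero, Label
```

The expansion costs two real instructions, which matters when counting instructions for performance.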
Policy of Use Conventions
Name      Register number   Usage
$zero     0                 the constant value 0
$v0-$v1   2-3               values for proc results & expression eval.
$a0-$a3   4-7               arguments to pass to procedures
$t0-$t7   8-15              temporaries
$s0-$s7   16-23             saved
$t8-$t9   24-25             more temporaries
$gp       28                global pointer
$sp       29                stack pointer
$fp       30                frame pointer
$ra       31                return address on procedure return
$at, reg 1: reserved for assembler.
$k0, $k1, reg 26, 27: reserved for OS.
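The convention table translates directly into a lookup structure, which is roughly what an assembler needs when turning register names into 5-bit field values. A minimal Python sketch follows; the dictionary name REGISTER_NUMBERS is an assumption.

```python
# Singletons from the convention table
REGISTER_NUMBERS = {"$zero": 0, "$at": 1, "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31}

# Contiguous groups: (name prefix, first register number, how many)
for prefix, first, count in [("$v", 2, 2), ("$a", 4, 4), ("$t", 8, 8),
                             ("$s", 16, 8), ("$k", 26, 2)]:
    for i in range(count):
        REGISTER_NUMBERS[f"{prefix}{i}"] = first + i

# $t8 and $t9 are not contiguous with $t0-$t7
REGISTER_NUMBERS["$t8"] = 24
REGISTER_NUMBERS["$t9"] = 25

print(REGISTER_NUMBERS["$t0"], REGISTER_NUMBERS["$s1"], REGISTER_NUMBERS["$s2"])  # 8 17 18
print(len(REGISTER_NUMBERS))  # 32
```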
Constants
• Small constants are used quite frequently (50% of operands)
  e.g., A = A + 5;
        B = B + 1;
        C = C - 18;
• Solutions? Why not?
  – put 'typical constants' in memory and load them.
  – create hard-wired registers (like $zero) for constants like one.
• MIPS Instructions:
  addi $29, $29, 4
  slti $8, $18, 10
  andi $29, $29, 6
  ori $29, $29, 4
• How do we make this work?
Immediate Instructions
addi $sp, $sp, 4   # Add immediate
  op      rs      rt      immediate
  8       29      29      4
  001000  11101   11101   0000 0000 0000 0100
lui $t0, 255       # Load upper immediate, $t0 = reg 8 (rs field is 0)
  op      rs      rt      immediate
  001111  00000   01000   0000 0000 1111 1111
$t0 after executing instruction:
  0000 0000 1111 1111 0000 0000 0000 0000
How about larger constants?
• We'd like to be able to load a 32 bit constant into a register
• Must use two instructions, new "load upper immediate" instruction
  lui $t0, 1010101010101010
  $t0:  1010101010101010 0000000000000000   (lower half filled with zeros)
• Then must get the lower order bits right, i.e.,
  ori $t0, $t0, 1010101010101010
        1010101010101010 0000000000000000
  ori   0000000000000000 1010101010101010
  =     1010101010101010 1010101010101010
  Can also use addi
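The two-step construction can be verified with plain integer arithmetic. In this Python sketch the functions lui and ori are illustrative models of the instructions' effect on a 32-bit register value, not an assembler API.

```python
def lui(imm16):
    # Load upper immediate: constant goes in the upper 16 bits, lower 16 are zero.
    return (imm16 & 0xFFFF) << 16


def ori(reg_value, imm16):
    # OR immediate: the 16-bit constant is zero-extended before the OR.
    return reg_value | (imm16 & 0xFFFF)


t0 = lui(0b1010101010101010)
print(f"{t0:032b}")      # 10101010101010100000000000000000
t0 = ori(t0, 0b1010101010101010)
print(f"{t0:032b}")      # 10101010101010101010101010101010
print(hex(lui(255)))     # 0xff0000
```

The last line matches the lui $t0, 255 result shown on the immediate-instructions slide.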
Procedure/subroutine Calls & Returns
The PC (Program Counter) is a register that holds the memory address of the currently executing instruction. It is incremented by 4 bytes each time a non-jump/branch instruction is executed. It is changed to the jump/branch address when required.
Calling and returning from a procedure:
PC contents   Instruction
X             inst 1
X+4           inst 2
X+8           jal 2500                     # $ra (reg 31) ← PC + 4 = X + 12
X+12          etc.
…
2500          instructions for procedure
…             jr $ra                       # end of procedure: PC ← $ra, PC ← X + 12
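The call and return sequence can be traced with a few lines of Python. This is a sketch under assumptions: X is given an arbitrary value, and jal and jr are modeled as functions that return the new PC.

```python
X = 0x00400000      # assumed address of "inst 1"
PROCEDURE = 2500    # procedure address used on the slide
regs = {"$ra": 0}


def jal(pc, target):
    # Jump and link: save the return address, then jump.
    regs["$ra"] = pc + 4
    return target


def jr(register):
    # Jump register: continue at the address held in the register.
    return regs[register]


pc = X + 8                          # the jal sits at X + 8
pc = jal(pc, PROCEDURE)
print(pc, regs["$ra"] == X + 12)    # 2500 True
pc = jr("$ra")                      # end of procedure
print(pc == X + 12)                 # True
```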
Memory Usage
Memory is typically divided so that certain areas are reserved for certain usage. On MIPS processors the convention is as follows (highest addresses first):
7fffffff hex   Stack segment
               Dynamic data   (data segment)
10000000 hex   Static data    (data segment)
400000 hex     Text segment
               Reserved
Loading a Register With an Address > 16-bits
• Direct Loading: Say you want the word at address 0x10010020 → $v0
    lui $s0, 0x1001         # upper half of $s0 ← 0x1001, so $s0 = 0x10010000
    lw  $v0, 0x0020($s0)    # $v0 ← Memory[0x10010000 + 0x0020]
• Using Global Register $gp: Many MIPS systems, on startup, load $gp with 0x10008000.
    lw  $v0, 0x8020($gp)    # $v0 ← Memory[0x10010020], since 0x10010020 = 0x8020 + 0x10008000
  (0x8020 does not fit in the signed 16-bit offset field, so an assembler has to expand this lw into more than one instruction.)
• Use Pseudo Instruction, la (Load Address):
          .data                                      # default starting address for data: 0x10010000
  iarray: .word 8, 128, 33, 279, 55, 10345, 5235     # array of ints (decimal)
          .text
  main:   la  $s4, iarray                            # translates into lui
          lw  $s1, 0($s4)
          add $s2, $s1, $s1
  end:    jr  $ra
Example using stack and procedure return
• Copy byte string Y to X; the string terminates with a null character.
• Base addresses: Y in $a1, X in $a0. The string pointer (index) will be in $s0.
• Since $s0 is a saved register, it must be saved on the stack prior to use.
Strcpy: addi $sp, $sp, -4       # adjust stack pointer; stack grows down in memory, “push”
        sw   $s0, 0($sp)        # save $s0 on the stack
        add  $s0, $zero, $zero  # string pointer set to zero
L1:     add  $t1, $a1, $s0      # $t1 ← addr of Y string byte
        lb   $t2, 0($t1)        # $t2 ← Y string byte
        add  $t3, $a0, $s0      # $t3 ← addr of X string byte
        sb   $t2, 0($t3)        # Y byte → X byte
        beq  $t2, $zero, L2     # if end of string, go to L2
        addi $s0, $s0, 1        # increment string pointer, $s0 ← $s0 + 1
        j    L1                 # go back and move next byte
L2:     lw   $s0, 0($sp)        # restore old $s0 from stack
        addi $sp, $sp, 4        # “pop” the stack
        jr   $ra                # return to calling routine
Overview of MIPS
• simple instructions all 32 bits wide
• very structured, no unnecessary baggage
• only three instruction formats
  R   op   rs   rt   rd   shamt   funct
  I   op   rs   rt   16 bit address
  J   op   26 bit address
• rely on compiler to achieve performance
  – what are the compiler's goals?
• help compiler where we can
MIPS Addressing Modes
• Register Addressing: Operand is a register.
• Base or Displacement Addressing: Operand is at memory location, address is sum of a register and a constant in the instruction.
• Immediate Addressing: Operand is a constant within the instruction.
• PC-relative Addressing: Address is the sum of PC and a constant in the instruction.
• Pseudodirect Addressing: Jump address is the 26 bits of the instruction concatenated with the upper bits of the PC.
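[Figure: the five MIPS addressing modes]
1. Immediate addressing: fields op, rs, rt, Immediate
2. Register addressing: fields op, rs, rt, rd, ..., funct; operand is a Register in Registers
3. Base addressing: fields op, rs, rt, Address; Register + Address selects a Byte, Halfword, or Word in Memory
4. PC-relative addressing: fields op, rs, rt, Address; PC + Address selects a Word in Memory
5. Pseudodirect addressing: fields op, Address; PC combined with Address selects a Word in Memory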
To summarize:
MIPS operands
32 registers: $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at
  Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
2^30 memory words: Memory[0], Memory[4], ..., Memory[4294967292]
  Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

MIPS assembly language
Category            Instruction               Example              Meaning                               Comments
Arithmetic          add                       add $s1, $s2, $s3    $s1 = $s2 + $s3                       Three operands; data in registers
                    subtract                  sub $s1, $s2, $s3    $s1 = $s2 - $s3                       Three operands; data in registers
                    add immediate             addi $s1, $s2, 100   $s1 = $s2 + 100                       Used to add constants
Data transfer       load word                 lw $s1, 100($s2)     $s1 = Memory[$s2 + 100]               Word from memory to register
                    store word                sw $s1, 100($s2)     Memory[$s2 + 100] = $s1               Word from register to memory
                    load byte                 lb $s1, 100($s2)     $s1 = Memory[$s2 + 100]               Byte from memory to register
                    store byte                sb $s1, 100($s2)     Memory[$s2 + 100] = $s1               Byte from register to memory
                    load upper immediate      lui $s1, 100         $s1 = 100 * 2^16                      Loads constant in upper 16 bits
Conditional branch  branch on equal           beq $s1, $s2, 25     if ($s1 == $s2) go to PC + 4 + 100    Equal test; PC-relative branch
                    branch on not equal       bne $s1, $s2, 25     if ($s1 != $s2) go to PC + 4 + 100    Not equal test; PC-relative
                    set on less than          slt $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; for beq, bne
                    set less than immediate   slti $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0  Compare less than constant
Unconditional jump  jump                      j 2500               go to 10000                           Jump to target address
                    jump register             jr $ra               go to $ra                             For switch, procedure return
                    jump and link             jal 2500             $ra = PC + 4; go to 10000             For procedure call
Assembly Language vs. Machine Language
• Assembly provides convenient symbolic representation
– much easier than writing down numbers
– e.g., destination first
• Machine language is the underlying reality
– e.g., destination is no longer first
• Assembly can provide 'pseudoinstructions'
– e.g., “move $t0, $t1” exists only in Assembly;
would be implemented using “add $t0,$t1,$zero”
• When considering performance you should count real
instructions
Byte Order within Words
• Numbering of bytes within a word can be from left to right or right to left.
• Big-Endian (left-to-right): .byte 0,1,2,3
  Byte #: 0 1 2 3
• Little-Endian (right-to-left): .byte 3,2,1,0
  Byte #: 3 2 1 0
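The effect of byte order is easy to observe by interpreting the same four bytes both ways. A minimal Python sketch using the built-in int.from_bytes and int.to_bytes; the byte values are arbitrary examples.

```python
data = bytes([0x12, 0x34, 0x56, 0x78])   # bytes at increasing memory addresses

big = int.from_bytes(data, "big")        # byte 0 is the most significant
little = int.from_bytes(data, "little")  # byte 0 is the least significant

print(hex(big))     # 0x12345678
print(hex(little))  # 0x78563412

# Storing the same word produces opposite byte sequences in memory.
print((0x12345678).to_bytes(4, "big").hex())     # 12345678
print((0x12345678).to_bytes(4, "little").hex())  # 78563412
```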
Fixed vs Variable Length Instructions
Other Issues
• Things we are not going to cover
linkers, loaders, memory layout
stacks, frames (introduced briefly), recursion
manipulating strings and pointers
interrupts and exceptions
details of system call conventions (see example)
• Some of these we'll talk about later
• We've focused on architectural issues
– basics of MIPS assembly language and machine code
– We will build a processor to execute these
instructions.
Alternative Architectures
• Design alternative:
  – provide more powerful operations
  – goal is to reduce number of instructions executed
  – danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
  – virtually all new instruction sets since 1982 have been RISC
  – VAX: minimize code size, make assembly language easy;
    instructions from 1 to 54 bytes long!
• We’ll look at PowerPC and 80x86
PowerPC
• Indexed addressing
  – example: lw $t1,$a0+$s3   # $t1=Memory[$a0+$s3]
  – What do we have to do in MIPS?
• Update addressing
  – update a register as part of load (for marching through arrays)
  – example: lwu $t0,4($s3)   # $t0=Memory[$s3+4]; $s3=$s3+4
  – What do we have to do in MIPS?
• Others:
  – load multiple/store multiple
  – a special counter register “bc Loop”:
    decrement counter, if not 0 goto loop
80x86
• 1978: The Intel 8086 is announced (16 bit architecture).
• 1980: The 8087 floating point coprocessor is added.
• 1982: The 80286 increases address space to 24 bits, +instructions.
• 1985: The 80386 extends to 32 bits, new addressing modes.
• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance).
• 1997: MMX is added.
“This history illustrates the impact of the “golden handcuffs” of compatibility”
“adding new features as someone might add clothing to a packed bag”
“an architecture that is difficult to explain and impossible to love”
A dominant architecture: 80x86
• See your textbook for a more detailed description
• Complexity:
  – Instructions from 1 to 17 bytes long
  – one operand must act as both a source and destination
  – one operand can come from memory
  – complex addressing modes,
    e.g., “base or scaled index with 8 or 32 bit displacement”
• Saving grace:
  – frequently used instructions are not too difficult to build
  – compilers avoid the portions of the architecture that are slow
“what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective”
Summary
• Instruction complexity is only one variable
– lower instruction count vs. higher CPI / lower
clock rate
• Design Principles:
– simplicity favors regularity
– smaller is faster
– good design demands compromise
– make the common case fast
• Instruction set architecture
– a very important abstraction