Embedded Processor Architecture and Programming
王建民
Institute of Information Science, Academia Sinica
July 2008
Contents
• Introduction
• Computer Architecture
• ARM Architecture
• Development Tools
• GNU Development Tools
• ARM Instruction Set
• ARM Assembly Language
• ARM Assembly Programming
• GNU ARM ToolChain
• Interrupts and Monitor

Lecture 7
ARM Assembly Language
Outline



Coprocessor and Thumb Instructions
Assembly Language
Runtime Environment
Coprocessors (1)
• The ARM architecture supports 16 coprocessors
    • System coprocessor
    • Floating-point coprocessor
    • Application-specific coprocessor
• A coprocessor may be implemented
    • in hardware
    • in software (via the undefined instruction exception)
    • in both (common cases in hardware, the rest in software)
• Each coprocessor instruction set occupies part of the ARM instruction set.

Coprocessors (2)
• There are three types of coprocessor instruction
    • Coprocessor data processing
    • Coprocessor (to/from ARM) register transfers
    • Coprocessor memory transfers (load and store to/from memory)
• Assembler macros can be used to transform custom coprocessor mnemonics into the generic mnemonics understood by the processor.

Coprocessor Data Processing
• This instruction initiates a coprocessor operation
• The operation is performed only on internal coprocessor state
    • For example, a floating-point multiply, which multiplies the contents of two registers and stores the result in a third register
• Syntax:
    CDP{<cond>} <cp_num>,<opc_1>,CRd,CRn,CRm{,<opc_2>}
• Instruction format:
    Bits    Field      Meaning
    31-28   Cond       Condition code specifier
    27-24   1 1 1 0
    23-20   opc_1      Opcode
    19-16   CRn        Source register
    15-12   CRd        Destination register
    11-8    cp_num     Coprocessor number
    7-5     opc_2      Opcode
    4       0
    3-0     CRm        Source register

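The CDP bit layout above can be checked by packing the fields by hand. The following is a minimal sketch in C; the function name `encode_cdp` and its parameter order are illustrative choices, not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the fields of a CDP instruction into one 32-bit ARM instruction word.
   Field positions follow the slide's bit layout; encode_cdp is a
   hypothetical helper name used only for this illustration. */
uint32_t encode_cdp(uint32_t cond, uint32_t cp_num, uint32_t opc_1,
                    uint32_t crd, uint32_t crn, uint32_t crm, uint32_t opc_2)
{
    /* Every field except opc_2 (3 bits) is 4 bits wide. */
    assert(cond < 16 && cp_num < 16 && opc_1 < 16);
    assert(crd < 16 && crn < 16 && crm < 16 && opc_2 < 8);

    return (cond << 28) | (0xEu << 24) | (opc_1 << 20) | (crn << 16) |
           (crd << 12) | (cp_num << 8) | (opc_2 << 5) | crm; /* bit 4 stays 0 */
}
```

For instance, `encode_cdp(0xE, 1, 2, 3, 4, 5, 6)` corresponds to `CDP p1,2,c3,c4,c5,6` with the "always" condition and yields the word 0xEE2431C5.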
Coprocessor Register Transfers
• Instructions
    • MRC : Move to ARM Register from Coprocessor
    • MCR : Move to Coprocessor from ARM Register
• An operation may also be performed on the data as it is transferred
    • Ex. a Floating Point Convert to Integer instruction can be implemented as a register transfer to ARM.
• Syntax
    <MRC|MCR>{<cond>} <cp_num>,<opc_1>,Rd,CRn,CRm,<opc_2>
• Instruction format:
    Bits    Field      Meaning
    31-28   Cond       Condition code specifier
    27-24   1 1 1 0
    23-21   opc_1      Opcode
    20      L          Transfer to/from coprocessor
    19-16   CRn        Coprocessor source/dest register
    15-12   Rd         ARM source/dest register
    11-8    cp_num     Coprocessor number
    7-5     opc_2      Opcode
    4       1
    3-0     CRm        Coprocessor source/dest register

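The register-transfer format differs from CDP in three ways: opc_1 shrinks to 3 bits, bit 20 (L) selects the direction, and bit 4 is 1. A small C sketch makes this concrete; `encode_mrc_mcr` and its `to_arm` flag are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an MRC (to_arm != 0, L = 1) or MCR (to_arm == 0, L = 0) instruction.
   encode_mrc_mcr is an illustrative helper, not a toolchain function. */
uint32_t encode_mrc_mcr(int to_arm, uint32_t cond, uint32_t cp_num,
                        uint32_t opc_1, uint32_t rd, uint32_t crn,
                        uint32_t crm, uint32_t opc_2)
{
    assert(cond < 16 && cp_num < 16 && opc_1 < 8 && opc_2 < 8);
    assert(rd < 16 && crn < 16 && crm < 16);

    uint32_t l = to_arm ? 1u : 0u;
    return (cond << 28) | (0xEu << 24) | (opc_1 << 21) | (l << 20) |
           (crn << 16) | (rd << 12) | (cp_num << 8) | (opc_2 << 5) |
           (1u << 4) | crm;
}
```

With the "always" condition, `MRC p15,0,r0,c1,c0,0` packs to 0xEE110F10 and the matching `MCR` to 0xEE010F10; the two words differ only in the L bit.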
Coprocessor Memory Transfers (1)
• Load from memory to coprocessor registers
• Store to memory from coprocessor registers.
• Instruction format:
    Bits    Field      Meaning
    31-28   Cond       Condition code specifier
    27-25   1 1 0
    24      P          Pre/post increment
    23      U          Add/subtract offset
    22      N          Transfer length
    21      W          Base register writeback
    20      L          Load/store
    19-16   Rn         Base register
    15-12   CRd        Source/dest register
    11-8    cp_num     Coprocessor number
    7-0     Offset     Address offset

Coprocessor Memory Transfers (2)
• Syntax
    <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<address>
        • PC-relative offset generated if possible, else causes an error.
    <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn,offset]{!}>
        • Pre-indexed form, with optional writeback of the base register
    <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn],offset>
        • Post-indexed form
• <L>, when present, causes a "long" transfer to be performed (N=1); otherwise a "short" transfer is performed (N=0).
    • The effect of this is coprocessor dependent.

Thumb (1)
• Thumb is a 16-bit instruction set
    • Optimized for code density from C code (~65% of ARM code size)
    • Improved performance from narrow memory (~160% of an equivalent ARM connected to 16-bit memory system)
    • Subset of the functionality of the ARM instruction set
• Core has additional execution state - Thumb
    • It can switch back and forth between 16-bit and 32-bit instructions
    • Switch between ARM and Thumb using BX instruction

Thumb (2)
    32-bit ARM instruction (bits 31..0):     ADDS r2,r2,#1
    16-bit Thumb instruction (bits 15..0):   ADD r2,#1
• For most instructions generated by compiler:
    • Conditional execution is not used
    • Source and destination registers identical
    • Only Low registers used
    • Constants are of limited size
    • Inline barrel shifter not used

Outline



Coprocessor and Thumb Instructions
Assembly Language
Runtime Environment
The Programmer’s Model (1)
• We will not be using the Thumb instruction set.
• Memory Formats
    • We will be using the Little Endian format
        • the lowest numbered byte of a word is considered the word’s least significant byte, and the highest numbered byte is considered the most significant byte.
• Instruction Length
    • All instructions are 32 bits long.
• Data Types
    • 8-bit bytes and 32-bit words.

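The little-endian rule above can be stated as a one-line computation. This is a minimal sketch; `le_byte_at` is a hypothetical helper that models the memory layout with shifts, so it behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Byte found at address (base + index) when `word` is stored at `base`
   in little-endian format: index 0 holds the least significant byte. */
uint8_t le_byte_at(uint32_t word, unsigned index)
{
    assert(index < 4);                      /* a word has four bytes */
    return (uint8_t)(word >> (8 * index));
}
```

Storing 0x12345678 therefore places 0x78 at the lowest address and 0x12 at the highest.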
The Programmer’s Model (2)
• Processor Modes (of interest)
    • User: the “normal” program execution mode.
    • IRQ: used for general-purpose interrupt handling.
    • Supervisor: a protected mode for the operating system.
• The Register Set
    • Registers R0-R15 + CPSR
    • R13: Stack Pointer
    • R14: Link Register
    • R15: Program Counter, where bits 0:1 are ignored (why?)

The Programmer’s Model (3)
• Program Status Registers
    • CPSR (Current Program Status Register)
        • holds info about the most recently performed ALU operation
            • contains N (negative), Z (zero), C (Carry) and V (oVerflow) bits
        • controls the enabling and disabling of interrupts
        • sets the processor operating mode
    • SPSR (Saved Program Status Registers)
        • used by exception handlers
• Exceptions
    • reset, undefined instruction, SWI, IRQ.

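To make the N, Z, C and V bits concrete, the sketch below computes them for a 32-bit addition, the way a flag-setting add such as ADDS would. The function name `adds_nzcv` and the packing of the four flags into one nibble (N in bit 3 down to V in bit 0) are choices made for this illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Flags produced by a 32-bit flag-setting addition a + b,
   returned as a nibble: N Z C V (N is bit 3, V is bit 0). */
unsigned adds_nzcv(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                              /* wraps modulo 2^32 */
    unsigned n = (unsigned)(r >> 31);                /* result is negative */
    unsigned z = (r == 0);                           /* result is zero */
    unsigned c = (r < a);                            /* unsigned carry out */
    unsigned v = (unsigned)((~(a ^ b) & (a ^ r)) >> 31); /* signed overflow */
    return (n << 3) | (z << 2) | (c << 1) | v;
}

/* Small self-check of three representative cases. */
void adds_nzcv_demo(void)
{
    assert(adds_nzcv(1u, 2u) == 0x0);             /* no flags set */
    assert(adds_nzcv(0x7FFFFFFFu, 1u) == 0x9);    /* N and V: signed overflow */
    assert(adds_nzcv(0xFFFFFFFFu, 1u) == 0x6);    /* Z and C: wraps to zero */
}
```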
Assembly Language Basics (1)
• “Load/store” architecture
• 32-bit instructions
• 32-bit and 8-bit data types
• 32-bit addresses
• 37 registers (30 general-purpose registers, 6 status registers and a PC)
    • only a subset is accessible at any point in time
• No instruction to move a 32-bit constant to a register (why?)

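One way to approach the "why?" above: an instruction is itself only 32 bits wide, so it cannot also carry a full 32-bit constant. ARM data-processing instructions instead encode an immediate as an 8-bit value rotated right by an even amount. The sketch below tests whether a constant fits that form; `is_arm_immediate` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if `value` can be written as an 8-bit constant rotated right
   by an even amount (0, 2, ..., 30), and 0 otherwise. */
int is_arm_immediate(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate-right by rot. */
        uint32_t undone = rot ? ((value << rot) | (value >> (32 - rot)))
                              : value;
        if (undone <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Small self-check of encodable and non-encodable constants. */
void is_arm_immediate_demo(void)
{
    assert(is_arm_immediate(0xFFu));          /* plain 8-bit value */
    assert(is_arm_immediate(0xFF000000u));    /* 0xFF rotated right by 8 */
    assert(!is_arm_immediate(0x101u));        /* set bits span 9 positions */
    assert(!is_arm_immediate(0x12345678u));   /* needs a literal load */
}
```

Constants that fail this test are the ones that must be loaded from memory or built up in several instructions.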
Assembly Language Basics (2)
• Conditional execution
• Barrel shifter
    • scaled addressing, multiplication by a small constant, and ‘constant’ generation
• Loading constants into registers
• Loading addresses into registers
• Load and Store Multiple instructions
• Jump tables
• Co-processor instructions (we will not use these)

GNU ARM Assembler
• You can assemble the contents of any ARM assembly language source file by executing the arm-elf-as program.
    arm-elf-as -mno-fpu -o filename.o filename.s
• Though you can use the GNU Linker to create the final executable, it is preferred to use the GNU Compiler Collection to create an executable file.
    arm-elf-gcc -o filename.elf filename.s
• To execute an ARM executable file
    arm-elf-run filename.elf

Assembly Language Syntax
• Each assembly line has the following format
    [<label:>] [<instruction or directive>] @ comment
• A label can be any valid symbol followed by a :
    • Begins with a letter
    • Only use the alphabetic characters A-Z and a-z, the digits 0-9, as well as “_”, “.”, and “$”
• An instruction to assemble into machine language code.
• A directive to guide the work of the assembler
    • Begins with a .
• A comment is anything that follows a @
    • C-style comments (using “/*” and “*/”) are also allowed

Assembler Directives
• Starting a new section
    .section name
• Defining the code section of the program
    .text
• Defining the initialized data section of the program
    .data
• Defining the un-initialized data section of the program
    .bss
• End of the assembly file (optional)
    .end

Assembler Directives
• Making a symbol available to other partial programs that are linked with it
    .global symbol
• Declaring a symbol as externally defined (optional)
    .extern symbol
• Aligning the address to a particular storage boundary which is a power of 2.
    .align expression
• Declaring a common symbol that may be merged
    .comm symbol,length,alignment

Assembler Directives
• Defining / initializing storage locations
    .word  expression    @ 32 bits
    .hword expression    @ 16 bits
    .byte  expression    @ 8 bits
• Defining / initializing a string
    .ascii “string”
    .asciz “string”
• Defining memory space
    .skip  size
    .space size

Assembler Directives
• Directives similar to the statements that begin with “#” in the C programming language
    .include “file”
    .equ     symbol, expression
    .set     symbol, expression
    .if      expression
    .ifdef   symbol
    .ifndef  symbol
    .else
    .endif

The Structure of an Assembly Code
        .file   "sum2.s"
        .section .text          @ the code section
        .align  2               @ aligns the address to 4 bytes
        .global sum2            @ give the symbol an external linkage
sum2:
        add     r0, r0, r1      @ add input arguments
        mov     pc, lr          @ return from subroutine
        .end                    @ end of program
Slide callouts:
• Chunks of code or data manipulated by the linker (the section)
• Minimum required block (why?)
• First instruction to be executed (the instruction at sum2)

Example #1: Finding the Larger One
    #include <stdio.h>
    extern int max2(int a, int b);
    int main()
    {
        int a = 12345;
        int b = 6789;
        printf("The maximum of %d and %d is %d\n",a,b,max2(a,b));
    }

        .text
        .align  2
        .global max2
max2:   cmp     r0, r1          @ compare two numbers
        bge     done            @ if R0 contains the maximum
        mov     r0, r1          @ otherwise overwrite R0
done:   mov     pc, lr          @ return from subroutine

Example #2: Finding the Largest
    #include <stdio.h>
    extern int maxn(int *a, int n);
    int a[6] = { 123, 34, 45, 56, 678, 9 };
    int main()
    {
        printf("The maximum of all numbers is %d\n", maxn(a,6));
    }

        .text
        .align  2
        .global maxn
maxn:   mov     r2, r0
        mov     r3, r1
        ldr     r0, [r2], #4
loop:   subs    r3, r3, #1      @ reduce the count by 1
        beq     done            @ test if finished
        ldr     r1, [r2], #4    @ put next number in R1
        cmp     r0, r1          @ if R0 contains the larger
        movlt   r0, r1          @ otherwise overwrite R0
        b       loop            @ continue
done:   mov     pc, lr          @ return from subroutine

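As a cross-check of the loop structure, the same algorithm can be written in C with each statement mapped to the instruction it mirrors. This is only a reference sketch; `maxn_ref` and `lecture_numbers` are names introduced here, and like the assembly version it assumes at least one element.

```c
#include <assert.h>
#include <stddef.h>

/* The array from the example's C driver. */
const int lecture_numbers[6] = { 123, 34, 45, 56, 678, 9 };

/* C mirror of the maxn routine; requires n >= 1, as the assembly does. */
int maxn_ref(const int *a, int n)
{
    assert(a != NULL && n >= 1);
    int best = *a++;               /* ldr r0, [r2], #4 */
    while (--n != 0) {             /* subs r3, r3, #1 ; beq done */
        int next = *a++;           /* ldr r1, [r2], #4 */
        if (best < next)           /* cmp r0, r1 */
            best = next;           /* movlt r0, r1 */
    }
    return best;                   /* mov pc, lr */
}
```

Calling `maxn_ref(lecture_numbers, 6)` returns 678, the value the assembly routine should leave in R0.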
Does this work?
• Instead of computing the larger number by itself, it may call max2 in Example #1 to find the larger number

        .text
        .align  2
        .global maxn
maxn:   mov     r2, r0
        mov     r3, r1
        ldr     r0, [r2], #4
loop:   subs    r3, r3, #1      @ reduce the count by 1
        beq     done            @ test if finished
        ldr     r1, [r2], #4    @ put next number in R1
        bl      max2            @ call max2 to find the larger
        b       loop            @ continue
done:   mov     pc, lr          @ return from subroutine

Calling Another Function
• Be careful with the registers used in a function, especially the link register!

        .text
        .align  2
        .global maxn
maxn:   mov     r2, r0
        mov     r3, r1
        mov     r5, lr          @ save the link register
        ldr     r0, [r2], #4
loop:   subs    r3, r3, #1      @ reduce the count by 1
        beq     done            @ test if finished
        ldr     r1, [r2], #4    @ put next number in R1
        bl      max2            @ call max2 to find the larger
        b       loop            @ continue
done:   mov     lr, r5          @ restore the link register
        mov     pc, lr          @ return from subroutine

Example #3: Computing Factorial
    #include <stdio.h>
    extern int factor(int n);
    int main()
    {
        int n = 7;
        printf("The factorial of %d is %d\n", n, factor(n));
    }

        .text
        .align  2
        .global factor
factor: stmfd   sp!, {r0, lr}   @ push register on stack
        subs    r0, r0, #1
        moveq   r0, #1          @ (n-1)! = 1 if n-1 == 0
        blne    factor          @ compute (n-1)! if n-1 != 0
        ldmfd   sp!, {r1, lr}   @ pop registers from stack
        mul     r0, r1, r0      @ compute n! = n * (n-1)! (operands ordered so Rd differs from Rm, as required before ARMv6)
done:   mov     pc, lr          @ return from subroutine

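The recursion is easier to follow next to a C version that mirrors it step by step. This is a reference sketch only; `factor_ref` is a name introduced here, and the upper bound of 12 is an added guard because 13! no longer fits in a 32-bit int.

```c
#include <assert.h>

/* C mirror of the factor routine; like the assembly, it expects n >= 1. */
int factor_ref(int n)
{
    assert(n >= 1 && n <= 12);      /* 12! = 479001600 still fits in 32 bits */
    int m = n - 1;                  /* subs r0, r0, #1 */
    int sub = (m == 0) ? 1          /* moveq r0, #1 */
                       : factor_ref(m); /* blne factor */
    return n * sub;                 /* ldmfd restores n, then mul */
}
```

`factor_ref(7)` returns 5040, which is what the driver program above should print.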
Assembly codes for if-statements
Source:
    if cond then
        then_statements
    else
        else_statements
    end if;
Generated code:
        t1 = cond
        if not t1 goto else_label
        codes for then_statements
        goto endif_label
    else_label:
        codes for else_statements
    endif_label:

Assembly codes for else-if parts
• For each alternative, place in code the current else_label, and generate a new one.
Source:
    if cond1 then s1
    else if cond2 then s2
    else s4
    end if;
Generated code:
        t1 = cond1
        if not t1 goto else_label1
        codes for s1
        goto endif_label
    else_label1:
        t2 = cond2
        if not t2 goto else_label2
        codes for s2
        goto endif_label
    else_label2:
        codes for s4
    endif_label:

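The label scheme can be tried out directly in C, since `goto` gives the same flat control flow a code generator emits. The sketch below lowers a three-way sign test; `sign_goto` is an invented example, with cond1 being x > 0 and cond2 being x < 0.

```c
#include <assert.h>

/* if x > 0 then 1, else if x < 0 then -1, else 0, written with the
   else_label / endif_label scheme from the slide. */
int sign_goto(int x)
{
    int result;
    if (!(x > 0)) goto else_label1;   /* t1 = cond1 */
    result = 1;                       /* codes for s1 */
    goto endif_label;
else_label1:
    if (!(x < 0)) goto else_label2;   /* t2 = cond2 */
    result = -1;                      /* codes for s2 */
    goto endif_label;
else_label2:
    result = 0;                       /* codes for the final else */
endif_label:
    return result;
}

/* Small self-check covering all three branches. */
void sign_goto_demo(void)
{
    assert(sign_goto(5) == 1);
    assert(sign_goto(-3) == -1);
    assert(sign_goto(0) == 0);
}
```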
Assembly codes for while loops
• Create two labels: start_loop, end_loop
Source:
    while (cond) {
        s1;
        if (cond2) break;
        s2;
        if (cond3) continue;
        s3;
    };
Generated code:
    start_loop:
        if (!cond) goto end_loop
        codes for s1
        if (cond2) goto end_loop
        codes for s2
        if (cond3) goto start_loop
        codes for s3
        goto start_loop
    end_loop:

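A concrete instance shows how break and continue become jumps to the two labels. In this sketch the loop counts even values until the first negative one; `evens_before_negative` and `while_sample` are made-up names, and s2 is left empty to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Sample input used in the usage note below. */
const int while_sample[6] = { 2, 3, 4, -1, 6, 8 };

/* Counts even values in a[0..n-1], stopping at the first negative value. */
int evens_before_negative(const int *a, int n)
{
    assert(a != NULL && n >= 0);
    int i = 0, evens = 0, v;
start_loop:
    if (!(i < n)) goto end_loop;       /* while (cond) */
    v = a[i];                          /* s1: fetch and advance */
    i = i + 1;
    if (v < 0) goto end_loop;          /* if (cond2) break */
    if (v % 2 != 0) goto start_loop;   /* if (cond3) continue */
    evens = evens + 1;                 /* s3 */
    goto start_loop;
end_loop:
    return evens;
}
```

For `while_sample` the result is 2: the values 2 and 4 are counted, 3 takes the continue path, and -1 takes the break path before 6 and 8 are reached.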
Assembly codes for numeric loops
• Semantics: loop not executed if range is null, so must test before first pass.
Source:
    for J in expr1..expr2 loop
        S1
    end loop;
Generated code:
        J = expr1
    start_label:
        if J > expr2 goto end_label
        codes for S1
        J = J + 1
        goto start_label
    end_label:

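The test-before-first-pass rule is easy to verify with a small C version of the generated code. Here S1 adds J to a running sum; `sum_range_goto` is a hypothetical name, and the INT_MAX guard is an added assumption so that J + 1 cannot overflow.

```c
#include <assert.h>
#include <limits.h>

/* Sums the integers expr1..expr2 using the slide's label scheme.
   An empty range (expr1 > expr2) executes the body zero times. */
int sum_range_goto(int expr1, int expr2)
{
    assert(expr2 < INT_MAX);          /* keeps j + 1 from overflowing */
    int sum = 0;
    int j = expr1;                    /* J = expr1 */
start_label:
    if (j > expr2) goto end_label;    /* test before the first pass */
    sum += j;                         /* codes for S1 */
    j = j + 1;
    goto start_label;
end_label:
    return sum;
}
```

`sum_range_goto(1, 4)` gives 10, while the null range `sum_range_goto(5, 4)` gives 0 because the body never runs.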
Codes for short-circuit expressions
• Short-circuit expressions are treated as control structures
Source:
    if B1 or else B2 then S1…     -- if (B1 || B2) { S1..
Generated code:
        if B1 goto then_label
        if not B2 goto else_label
    then_label:
        codes for S1
        goto endif_label
    else_label:
• Inherit target labels from enclosing control structure
• Create additional labels for composite short-circuits

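A case where skipping B2 matters makes the scheme tangible: if B1 is "the pointer is NULL" and B2 dereferences that pointer, B2 must not run when B1 is true. The function below is an invented illustration of the slide's label layout.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if (p == NULL || *p == 0), else 0, using explicit labels. */
int is_null_or_zero(const int *p)
{
    int result;
    if (p == NULL) goto then_label;     /* B1 true: B2 is never evaluated */
    if (!(*p == 0)) goto else_label;    /* B2 false */
then_label:
    result = 1;                         /* codes for S1 */
    goto endif_label;
else_label:
    result = 0;
endif_label:
    return result;
}

/* Small self-check; the NULL case would crash if B2 were evaluated. */
void is_null_or_zero_demo(void)
{
    int zero = 0, seven = 7;
    assert(is_null_or_zero(NULL) == 1);
    assert(is_null_or_zero(&zero) == 1);
    assert(is_null_or_zero(&seven) == 0);
}
```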
Assembly codes for case statements
• If range is small and most cases are defined, create jump table as array of code addresses, and generate indirect jump.
Source:
    case x is
        when up: y := 0;
        when down : y := 1;
    end case;
Generated code:
        table label1, label2
        …
        jumpi x table
    label1:
        y = 0
        goto end_case
    label2:
        y = 1
        goto end_case
    end_case:

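Standard C cannot take the address of a label, so the sketch below approximates the jump table with an array of function pointers; the indexed call plays the role of `jumpi x table`. All names here (`direction`, `case_y`, `jump_table`) are illustrative.

```c
#include <assert.h>

enum direction { UP = 0, DOWN = 1 };

static int y;                                  /* variable set by the cases */

static void label1(void) { y = 0; }            /* when up   */
static void label2(void) { y = 1; }            /* when down */

/* The jump table: one code address per case value. */
static void (*const jump_table[])(void) = { label1, label2 };

/* Dispatches on x through the table and returns the resulting y. */
int case_y(enum direction x)
{
    assert(x == UP || x == DOWN);   /* index must stay inside the table */
    jump_table[x]();                /* indirect jump through the table */
    return y;
}
```

The range check corresponds to the bounds test a compiler would emit before the indirect jump when not every value of x is a defined case.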
Outline



Coprocessor and Thumb Instructions
Assembly Language
Runtime Environment
Runtime Environment
• To understand the environment in which your final output will be running.
• How a program is laid out in memory:
    • Code
    • Data
    • Stack
    • Heap
• How function callers and callees pass info

Executable Layout in Memory
    → High memory
        Runtime stack
        Dynamic data (heap)
        Global data
        Static data
        Code
    → Low memory
    (not to scale)

Overall Program Layout
• From low memory up:
    • Code (text segment, instructions)
    • Static (constant) data
    • Global data
    • Dynamic data (heap)
    • Runtime stack (procedure calls)
• Review of what’s in each section:
    [Figure: memory column, top to bottom: stack, heap, global, static, code]

Text Segment (Executable Code) (1)
• Actual machine instructions
    • Arithmetic / logical
    • Comparison
    • Branch (short distances)
    • Jump (long distances)
    • Load / store
    • Data movement
    • Constant manipulation (immediate)

Text Segment (Executable Code) (2)
• Code segment write-protected, so running code can’t overwrite itself. (Debugger can overwrite it.)
• You’ll create the precursor for the code in this segment by emitting assembly code.
• Assembler will build final text.

Data Segment (1)
• Data Objects
    • Whose size is known at compile time
    • Whose lifetime is the full run of the program (not just during a function invocation)
• Static data includes things that won’t change (can be write-protected):
    • Virtual-function dispatching tables
    • String literals used in instructions
    • Arithmetic literals could be, but more likely incorporated into instructions.

Data Segment (2)
• Global data (other than static)
    • Variables declared global
    • Local variables declared static (in C)
        • Declared local to a function.
        • Retain values even between invocations of that function (lifetime is whole run).
        • Semantic analysis ensures that static locals are not referenced outside their function scope.

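A static local is the classic way to see "local scope, whole-run lifetime" in action. In this small sketch, `next_ticket` and `issued` are invented names; the variable is only visible inside the function, yet it keeps its value across calls because it is stored with the global data rather than on the stack.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1, 2, 3, ... on successive calls. */
int next_ticket(void)
{
    static int issued = 0;      /* initialised once, lives for the whole run */
    assert(issued < INT_MAX);   /* guard against overflow */
    issued = issued + 1;
    return issued;
}
```

Three calls in a row return 1, 2 and 3; an ordinary (automatic) local would start from 0 each time and always return 1.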
Dynamic Data (Heap) (1)
• Data created by malloc or new.
• Heap data lives until deallocated or until program ends. (Sometimes longer than you want, if you lose track of it.)
• Garbage collection / reference counting are ways of automatically de-allocating dead storage in the heap.

Dynamic Data (Heap) (2)
• Heap allocation starts at bottom of heap (lower addresses) and allocates upward.
• Requirements of alignment, specifics of allocation algorithm may cause storage to be allocated out of (address) order.
    p1 = new Big();
    p2 = new Medium();
    p3 = new Big();
    p4 = new Tiny();
    [Figure: heap blocks, top to bottom: *p3, *p2, *p4, *p1, with base address 0x1000000]
    → So (int)p2 > (int)p1
    → But (int)p4 < (int)p3
    → Compare pointers for equality, not < or >.

Runtime Stack (1)
• Data used for function invocation:
    • Variables declared local to functions (including main) aka “automatic” data.
        • Except for statics (in data segment)
    • Variables declared in anonymous blocks inside function.
    • Arguments to function (passed by caller).
    • Temporaries used by generated code (not representing names in source).
    • Possibly value returned by callee to caller.

Runtime Stack (2)
• Types of data that can be allocated on runtime stack:
    • In C, all kinds of data: simple types, structs, arrays.
    • C++: stack can hold objects declared as class type, as well as pointer type.
    • Some languages don’t allow arrays on stack.

Stack Terminology (1)
• A stack is an abstract data type.
    [Figure: a stack with its Top and Base marked]
• Push new value onto Top; pop value off Top.
• Higher elements are more recent, lower elements are older.

Stack Terminology (2)
• Stack implementation can grow any direction.
• MIPS stack grows downward (from higher memory addresses to lower).
• Possible difficulty with terminology.
    • Some people (and documents) talk about going “up” and “down” the stack.
    • Some use the abstraction, where “up” means “more recent”, towards Top.
    • Some (including gdb) say “up” meaning “towards older entries”, toward Base.

Other Resources
• Registers (the fastest)
• Caches (very fast)
    • Possibly multiple levels
• Physical memory (fast)
• Virtual memory (swapping is slower)
    • Includes main memory + swap space on disk

Storage Layout Issues
• Variables (local & file-scope)
• Functions
• Objects
• Arrays
• Strings

Arrays
• C uses row-major order
    • Whole first row is stored first, then whole second row, … then whole nth row.
• Fortran uses column-major order
    • Whole first column is stored first, then whole second col, … then whole kth col.
    • Storage still a big block, but a column is contiguous instead of a row.

Generating Code for Array Refs
• In C, use size of element and range of each dimension to compute the offset of any given element:
    • If A has m rows and n columns of 4-byte elements, the byte address of A[i][j] is:
        &A[i][j] = &A + 4 * (n * i + j)

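The offset formula can be compared against what a C compiler actually does. In this sketch, `element_offset` generalises the formula to any element size, and `offset_matches_compiler` checks it on a 3-by-5 int array; both names are introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [i][j] in a row-major array with n_cols columns. */
size_t element_offset(size_t n_cols, size_t elem_size, size_t i, size_t j)
{
    assert(j < n_cols);                       /* column index must be in range */
    return elem_size * (n_cols * i + j);
}

/* Returns 1 if the formula agrees with the compiler's own addressing. */
int offset_matches_compiler(void)
{
    int A[3][5];
    size_t actual = (size_t)((char *)&A[2][3] - (char *)A);
    return actual == element_offset(5, sizeof(int), 2, 3);
}
```

With 4-byte elements and n = 5, element [2][3] sits 4 * (5 * 2 + 3) = 52 bytes past the start of the array.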
C struct Objects
• Structs in C are stored in adjacent words, enough for all fields to be aligned:
    struct cow {
        char milk;      // 3 slack-bytes after this
        char* name;     // aligned on single word
    } Cow;
    [Figure: Cow in memory: the byte ‘A’, then a word holding 0x20000, which points to the string “Bossy”]

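The slack bytes can be measured with `offsetof` instead of being assumed. This is a small sketch; `cow_slack_bytes` is a hypothetical helper. On a target with 4-byte pointers, such as the ARM discussed in this lecture, the expected answer is 3, but a 64-bit host will typically report 7, so the number depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

struct cow {
    char  milk;     /* one byte, followed by padding */
    char *name;     /* must start on a pointer-aligned address */
};

/* Number of padding bytes the compiler inserted between milk and name. */
size_t cow_slack_bytes(void)
{
    assert(offsetof(struct cow, milk) == 0);   /* first member starts at 0 */
    return offsetof(struct cow, name) - sizeof(char);
}
```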
Making Function Calls
• Each active function call has its own unique stack frame
    • Frame pointer
    • Static link
    • Return address
    • Arguments
    • Local variables and temporaries
• Who does which (caller vs callee)?
• How do callers and callees communicate?

Who does what? (1)
• Before a function call, the calling routine:
    • Saves any necessary registers
    • Pushes arguments onto the stack
    • Sets up the static link (if appropriate)
    • Saves the return address into $ra (the MIPS register name; on ARM, BL places it in the link register, R14)
    • Jumps (or branches) to the target (AKA the callee, the called function)

Who does what? (2)
• During a function call, the called routine:
    • Saves any necessary registers
    • Sets up the new frame pointer
    • Makes space for any locals or temporaries
    • Does its work
    • Sets up return value in $v0 (the MIPS register name; the ARM examples in this lecture return the value in R0)
        • Works only for integer or pointer values
    • Tears down frame pointer and static link
    • Restores any saved registers
    • Jumps to return address (saved on stack)

Who does what? (3)
• After a function call, the calling routine:
    • Removes return address and parameters from the stack
    • Gets return value from $v0
    • Restores any saved registers
    • Continues executing

Parameter Passing
• Call by value
    • Supported by C
• Call by value-result
    • Supported by Ada
• Call by reference
    • Supported by Fortran
• Call by name
    • Like C preprocessor macros

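Because C passes arguments by value, a callee that modifies its parameter only changes a private copy; the effect of call by reference has to be simulated by passing a pointer. The sketch below contrasts the two; all three function names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Call by value: n is a copy, so the caller's variable is untouched. */
void bump_by_value(int n)
{
    n = n + 1;
    (void)n;            /* the updated copy is simply discarded */
}

/* Simulated call by reference: the caller passes the variable's address. */
void bump_by_pointer(int *n)
{
    assert(n != NULL);
    *n = *n + 1;
}

/* Applies both calls to a local variable and returns its final value. */
int value_after_calls(int start)
{
    int x = start;
    bump_by_value(x);       /* no visible effect on x */
    bump_by_pointer(&x);    /* x becomes start + 1 */
    return x;
}
```

`value_after_calls(10)` returns 11, not 12: only the pointer-based call changes the caller's variable.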
An Example Program
    #include <stdio.h>

    int dump(arg1, arg2, arg3, stop)
    int arg1, arg2, arg3, *stop;
    {
        int loc1 = 5, loc2 = 6, loc3 = 7;
        int *p;
        /* walk down the stack from the caller's envp slot to our own p */
        printf("Address  Content\n");
        for (p = stop; p >= (int*)(&p); p--)
            printf("%8x: %8x\n", p, *p);
        return 9;
    }

    int main(argc, argv, envp)
    int argc;
    char *argv[], *envp[];
    {
        int var1 = 1, var2 = 2, var3 = 3;
        var3 = dump(var1, var2, var3, &envp);
    }

Sample Output
    Address     Content     Comment
    bffff9a8:   bffff9ec    envp
    bffff9a4:   bffff9e4    argv
    bffff9a0:          1    argc
    bffff99c:   420158d4    return address (crt0)
    bffff998:   bffff9b8    fp (crt0) <- fp (main)
    bffff994:          1    var1
    bffff990:          2    var2
    bffff98c:          3    var3
    bffff988:   bffff998
    bffff984:   4212a2d0
    bffff980:   4212aa58
    bffff97c:   bffff9a8    stop
    bffff978:          3    arg3
    bffff974:          2    arg2
    bffff970:          1    arg1
    bffff96c:    80483d1    return address (main)
    bffff968:   bffff998    fp (main) <- fp (dump)
    bffff964:          5    loc1
    bffff960:          6    loc2
    bffff95c:          7    loc3
    bffff958:   bffff958    p