Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
嵌入式處理器架構與程式設計 王建民 中央研究院 資訊所 2008年 7月 Contents Introduction Computer Architecture ARM Architecture Development Tools GNU Development Tools ARM Instruction Set ARM Assembly Language ARM Assembly Programming GNU ARM ToolChain Interrupts and Monitor 2 Lecture 7 ARM Assembly Language Outline Coprocessor and Thumb Instructions Assembly Language Runtime Environment 4 Coprocessors1 The ARM architecture supports 16 coprocessors A coprocessor may be implemented System coprocessor Floating-point coprocessor Application-specific coprocessor in hardware in software (via the undefined instruction exception) in both (common cases in hardware, the rest in software) Each coprocessor instruction set occupies part of the ARM instruction set. 5 Coprocessors2 There are three types of coprocessor instruction Coprocessor data processing Coprocessor (to/from ARM) register transfers Coprocessor memory transfers (load and store to/from memory) Assembler macros can be used to transform custom coprocessor mnemonics into the generic mnemonics understood by the processor. 6 Coprocessor Data Processing This instruction initiates a coprocessor operation The operation is performed only on internal coprocessor state For example, a Floating point multiply, which multiplies the contents of two registers and stores the result in a third register Syntax: CDP{<cond>} <cp_num>,<opc_1>,CRd,CRn,CRm,{<opc_2>} 31 28 27 26 25 24 23 Cond 1 1 1 0 20 19 opc_1 Opcode 16 15 CRn 12 11 CRd cp_num 8 7 5 4 opc_2 0 3 0 CRm Destination Register Opcode Source Registers Condition Code Specifier 7 Coprocessor Register Transfers Instructions MRC : Move to ARM Register from Coprocessor MCR : Move to Coprocessor from ARM Register An operation may also be performed on the data as it is transferred Ex. a Floating Point Convert to Integer instruction can be implemented as a register transfer to ARM. Syntax <MRC|MCR>{<cond>} <cp_num>,<opc_1>,Rd,CRn,CRm,<opc_2> 31 Cond 28 27 26 25 24 23 22 21 20 19 1 1 1 0 opc_1 L Condition Code Specifier 16 15 CRn 12 11 Rd 8 cp_num 7 5 4 opc_2 1 ARM Source/Dest Register Opcode Coprocesor Source/Dest Registers Transfer To/From Coprocessor Opcode 3 0 CRm 8 Coprocessor Memory Transfers1 Load from memory to coprocessor registers Store to memory from coprocessor registers. 31 28 27 26 25 24 23 22 21 20 19 Cond 1 1 0 P U N W L Condition Code Specifier 16 15 Rn 12 11 CRd cp_num Source/Dest Register Base Register Load/Store Base Register Writeback Transfer Length Add/Subtract Offset Pre/Post Increment 8 7 0 Offset Address Offset 9 Coprocessor Memory Transfers2 Syntax <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<address> PC relative offset generated if possible, else causes an error. <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn,offset]{!}> Pre-indexed form, with optional writeback of the base register <LDC|STC>{<cond>}{<L>} <cp_num>,CRd,<[Rn],offset> Post-indexed form <L> when present causes a “long” transfer to be performed (N=1) else causes a “short” transfer to be performed (N=0). Effect of this is coprocessor dependant. 10 Thumb1 Thumb is a 16-bit instruction set Optimized for code density from C code (~65% of ARM code size) Improved performance from narrow memory (~160% of an equivalent ARM connected to 16-bit memory system) Subset of the functionality of the ARM instruction set Core has additional execution state - Thumb It can switch back and forth between 16-bit and 32-bit instructions Switch between ARM and Thumb using BX instruction 11 Thumb2 31 ADDS r2,r2,#1 0 32-bit ARM Instruction For most instructions generated by compiler: 15 ADD r2,#1 0 16-bit Thumb Instruction Conditional execution is not used Source and destination registers identical Only Low registers used Constants are of limited size Inline barrel shifter not used 12 Outline Coprocessor and Thumb Instructions Assembly Language Runtime Environment 13 The Programmer’s Model1 We will not be using the Thumb instruction set. Memory Formats We will be using the Little Endian format Instruction Length the lowest numbered byte of a word is considered the word’s least significant byte, and the highest numbered byte is considered the most significant byte . All instructions are 32-bits long. Data Types 8-bit bytes and 32-bit words. 14 The Programmer’s Model2 Processor Modes (of interest) User: the “normal” program execution mode. IRQ: used for general-purpose interrupt handling. Supervisor: a protected mode for the operating system. The Register Set Registers R0-R15 + CPSR R13: Stack Pointer R14: Link Register R15: Program Counter where bits 0:1 are ignored (why?) 15 The Programmer’s Model3 Program Status Registers CPSR (Current Program Status Register) holds info about the most recently performed ALU operation controls the enabling and disabling of interrupts sets the processor operating mode SPSR (Saved Program Status Registers) contains N (negative), Z (zero), C (Carry) and V (oVerflow) bits used by exception handlers Exceptions reset, undefined instruction, SWI, IRQ. 16 Assembly Language Basics1 “Load/store” architecture 32-bit instructions 32-bit and 8-bit data types 32-bit addresses 37 registers (30 general-purpose registers, 6 status registers and a PC) only a subset is accessible at any point in time No instruction to move a 32-bit constant to a register (why?) 17 Assembly Language Basics2 Conditional execution Barrel shifter scaled addressing, multiplication by a small constant, and ‘constant’ generation Loading constants into registers Loading addresses into registers Load and Store Multiple instructions Jump tables Co-processor instructions (we will not use these) 18 GNU ARM Assembler You can assemble the contents of any ARM assembly language source file by executing the arm-elf-as program. Though you can use the GNU Linker to create the final executable, it is preferred to use the GNU Compiler Collection to create an executable file. arm-elf-as –mno-fpu –o filename.o filename.s arm-elf-gcc –o filename.elf filename.s To execute an ARM executable file arm-elf-run filename.elf 19 Assembly Language Syntax Each assembly line has the following format [<label:>] Begins with a letter A directive to guide the work of the assembler Only use the alphabetic characters A-Z and a-z, the digits 0-9, as well as “_”, “.”, and “$” An instruction to assemble into machine language code. @ comment A label can be any valid symbol followed by a : [<instruction or directive>] Begins with a . A comment is anything that follows a @ C-style comments (using “/*” and “*/”) are also allowed 20 Assembler Directives Starting a new section .section name Defining code section of program .text Defining data initialized data section of program .data Defining un-initialized data section of program .bss End of the assembly file (optional) .end 21 Assembler Directives Making a symbol available to other partial programs that are linked with it .global Declaring a symbol as externally defined (optional) .extern symbol Aligning the address to a particular storage boundary which is a power of 2. .align symbol expression Declaring a common symbol that may be merged .comm symbol,length,alignment 22 Assembler Directives Defining / initializing storage locations .word .hword .byte @ 32 bits @ 16 bits @ 8 bits Defining / initializing a string .ascii .asciz expression expression expression “string” “string” Defining memory space .skip .space size size 23 Assembler Directives Directives similar to the statements that begin with “#” in the C programming language .include .equ .set .if .ifdef .ifndef .else .endif “file” symbol, expression symbol, expression expression expression expression 24 The Structure of an Assembly Code Chunks of code or data manipulated by the linker .file "sum2.s" .section .text .align 2 .global sum2 sum2: add mov Minimum required block (why?) r0, r0, r1 pc, lr .end @ @ @ @ @ @ @ the code section aligns the address to 4 bytes give the symbol an external linkage add input arguments return from subroutine @ end of program First instruction to be executed 25 Example #1: Finding the Large One #include <stdio.h> extern int max2(int a, int b); int main() { int a = 12345; int b = 6789; printf("The maximum of %d and %d is %d\n",a,b,max2(a,b)); } .text .align 2 .global max2 max2: done: cmp bge mov mov r0, r1 done r0, r1 pc, lr @ @ @ @ compare two numbers if R0 contains the maximum otherwise overwrite R0 return from subroutine 26 Example #2: Finding the Largest #include <stdio.h> extern int maxn(int *a, int n); int a[6] = { 123, 34, 45, 56, 678, 9 }; int main() { printf("The maximum of all numbers is %d\n", maxn(a,6)); } .text .align 2 .global maxn maxn: mov r2, r0 mov r3, r1 ldr r0, [r2], #4 loop: subs r3, r3, #1 @ reduce the count by 1 beq done @ test if finished ldr r1, [r2], #4 @ put next number in R1 cmp r0, r1 @ if R0 contains the larger movlt r0, r1 @ otherwise overwrite R0 b loop @ continue done: mov pc, lr @ return from subroutine 27 Does this work? Instead of computing the larger number by itself, it may call max2 in Example #1 to find the larger number maxn: loop: done: .text .align .global mov mov ldr subs beq ldr bl b mov 2 maxn r2, r0 r3, r1 r0, [r2], #4 r3, r3, #1 done r1, [r2], #4 max2 loop pc, lr @ @ @ @ @ @ reduce the count by 1 test if finished put next number in R1 call max2 to find the larger continue return from subroutine 28 Calling Another Function Be careful with the registers used in a function, especially the link register! maxn: loop: done: .text .align .global mov mov mov ldr subs beq ldr bl b mov mov 2 maxn r2, r0 r3, r1 r5, lr r0, [r2], #4 r3, r3, #1 done r1, [r2], #4 max2 loop lr, r5 pc, lr @ save the link register @ @ @ @ @ @ @ reduce the count by 1 test if finished put next number in R1 call max2 to find the larger continue restore the link register return from subroutine 29 Example #3: Computing Factorial #include <stdio.h> extern int factor(int n); int main() { int n = 7; printf("The factorial of %d is %d\n", n, factor(n)); } .text .align .global factor: stmfd subs moveq blne ldmfd mul done: mov 2 factor sp!, {r0, lr} r0, r0, #1 r0, #1 factor sp!, {r1, lr} r0, r0, r1 pc, lr @ push register on stack @ @ @ @ @ (n-1)! = 1 if n-1 == 0 compute (n-1)! if n-1 != 0 pop registers from stack compute n! = n * (n-1)! return from subroutine 30 Assembly codes for if-statements if cond then then_statements else else_statements end if; t1 = cond if not t1 goto else_label codes for then_statements goto endif_label else_label: codes for else_statements endif_label: 31 Assembly codes for else-if parts For each alternative, place in code the current else_label, and generate a new one. if cond then s1 else if cond2 then s2 else s4 end if; t1 = cond1 if not t1 goto else_label1 codes for s1 goto endif_label else_label1: t2 = cond2 if not t2 goto else_label2 codes for s2 goto endif_label else_label2: codes for s4 endif_label: 32 Assembly codes for while loops Create two labels: start_loop, end_loop while (cond) { s1; if (cond2) break; s2; if (cond3) continue; s3; }; start_loop: if (!cond) goto end_loop codes for s1 if (cond2) goto end_loop codes for s2 if (cond3) goto start_loop: codes for s3 goto start_loop end_loop: 33 Assembly codes for numeric loops Semantics: loop not executed if range is null, so must test before first pass. for J in expr1..expr2 loop S1 end loop; J = expr1 start_label: if J > expr2 goto end_label codes for S1 J=J+1 goto start_label end_label: 34 Codes for short-circuit expressions Short-circuit expressions are treated as control structures if B1 or else B2 then S1… -- if (B1 || B2) { S1.. if B1 goto then_label if not B2 goto else_label then_label: codes for S1 goto endif_label else_label: Inherit target labels from enclosing control structure Create additional labels for composite shortcircuits 35 Assembly codes for case statements If range is small and most cases are defined, create jump table as array of code addresses, and generate indirect jump. case x is when up: y := 0; when down : y := 1; end case; table label1, label2 … jumpi x table label1: y=0 goto end_case label2: y=1 goto end_case end_case: 36 Outline Coprocessor and Thumb Instructions Assembly Language Runtime Environment 37 Runtime Environment To understand the environment in which your final output will be running. How a program is laid out in memory: Code Data Stack Heap How function callers and callees pass info 38 Executable Layout in Memory Runtime stack Dynamic data (heap) Global data High memory (not to scale) Static data Code Low memory 39 Overall Program Layout From low memory up: Code (text segment, instructions) Static (constant) data Global data Dynamic data (heap) Runtime stack (procedure calls) Review of what’s in each section: stack heap globl static code 40 Text Segment (Executable Code)1 Actual machine instructions Arithmetic / logical Comparison Branch (short distances) Jump (long distances) Load / store Data movement Constant manipulation (immediate) 41 Text Segment (Executable Code)2 Code segment write-protected, so running code can’t overwrite itself. (Debugger can overwrite it.) You’ll create the precursor for the code in this segment by emitting assembly code. Assembler will build final text. 42 Data Segment1 Data Objects Whose size is known at compile time Whose lifetime is the full run of the program (not just during a function invocation) Static data includes things that won’t change (can be write-protected): Virtual-function dispatching tables String literals used in instructions Arithmetic literals could be, but more likely incorporated into instructions. 43 Data Segment2 Global data (other than static) Variables declared global Local variables declared static (in C) Declared local to a function. Retain values even between invocations of that function (lifetime is whole run). Semantic analysis ensures that static locals are not referenced outside their function scope. 44 Dynamic Data (Heap)1 Data created by malloc or New. Heap data lives until deallocated or until program ends. (Sometimes longer than you want, if you lose track of it.) Garbage collection / reference counting are ways of automatically de-allocating dead storage in the heap. 45 Dynamic Data (Heap)2 Heap allocation starts at bottom of heap (lower addresses) and allocates upward. Requirements of alignment, specifics of allocation algorithm may cause storage to be allocated out of (address) order. *p3 p1 = new Big(); p2 = new Medium(); *p2 *p4 p3 = new Big(); p4 = new Tiny(); So (int)p2 > (int)p1 *p1 0x1000000 But (int)p4 < (int)p3 Compare pointers for equality, not < or >. 46 Runtime Stack1 Data used for function invocation: Variables declared local to functions (including main) aka “automatic” data. Except for statics (in data segment) Variables declared in anonymous blocks inside function. Arguments to function (passed by caller). Temporaries used by generated code (not representing names in source). Possibly value returned by callee to caller. 47 Runtime Stack2 Types of data that can be allocated on runtime stack: In C, all kinds of data: simple types, structs, arrays. C++: stack can hold objects declared as class type, as well as pointer type. Some languages don’t allow arrays on stack. 48 Stack Terminology1 A stack is an abstract data type. Top Base Push new value onto Top; pop value off Top. Higher elements are more recent, lower elements are older. 49 Stack Terminology2 Stack implementation can grow any direction. MIPS stack grows downward (from higher memory addresses to lower). Possible difficulty with terminology. Some people (and documents) talk about going “up” and “down” the stack. Some use the abstraction, where “up” means “more recent”, towards Top. Some (including gdb) say “up” meaning “towards older entries”, toward Base. 50 Other Resources Caches (very fast) Physical memory (fast) Virtual memory (swapping is slower) Possibly multiple levels Includes main memory + swap space on disk Registers (the fastest) 51 Storage Layout Issues Variables (local & file-scope) Functions Objects Arrays Strings 52 Arrays C uses row-major order Whole first row is stored first, then whole second row, … then whole nth row. Fortran uses column-major order Whole first column is stored first, then whole second col, … then whole kth col. Storage still a big block, but a column is contiguous instead of a row. 53 Generating Code for Array Refs In C, use size of element and range of each dimension to compute the offset of any given element: If A has m rows and n columns of 4-byte elements: &A [i] [j] is &A + 4 * (n * i + j) 54 C struct Objects Structs in C are stored in adjacent words, enough for all fields to be aligned: struct cow { char milk; // 3 slack-bytes after this char* name; // aligned on single word } Cow; ‘A’ 0x20000 “Bossy” 55 Making Function Calls Each active function call has its own unique stack frame Frame pointer Static link Return address Arguments Local variables and temporaries Who does which (caller vs callee)? How do callers and callees communicate? 56 Who does what?1 Before a function call, the calling routine: Saves any necessary registers Pushes arguments onto the stack Sets up the static link (if appropriate) Saves the return address into $ra Jumps (or branches) to the target (AKA the callee, the called function) 57 Who does what?2 During a function call, the called routine: Saves any necessary registers Sets up the new frame pointer Makes space for any locals or temporaries Does its work Sets up return value in $v0 Works only for integer or pointer values Tears down frame pointer and static link Restores any saved registers Jumps to return address (saved on stack) 58 Who does what?3 After a function call, the calling routine: Removes return address and parameters from the stack Gets return value from $v0 Restores any saved registers Continues executing 59 Parameter Passing Call by value Call by value-result Supported by Ada Call by reference Supported by C Supported by Fortran Call by name Like C preprocessor macros 60 An Example Program int dump(arg1, arg2, arg3, stop) int arg1, arg2, arg3, *stop; { int loc1 = 5, loc2 = 6, loc3 = 7; int *p; printf("Address Content\n"); for (p = stop; p >= (int*)(&p); p--) printf("%8x: %8x\n", p, *p); return 9; } int main(argc, argv, envp) int argc; char *argv[], *envp[]; { int var1 = 1, var2 = 2, var3 = 3; var3 = dump(var1, var2, var3, &envp); } 61 Sample Output Address bffff9a8: bffff9a4: bffff9a0: bffff99c: bffff998: bffff994: bffff990: bffff98c: bffff988: bffff984: bffff980: bffff97c: bffff978: bffff974: bffff970: bffff96c: bffff968: bffff964: bffff960: bffff95c: bffff958: Content bffff9ec bffff9e4 1 420158d4 bffff9b8 1 2 3 bffff998 4212a2d0 4212aa58 bffff9a8 3 2 1 80483d1 bffff998 5 6 7 bffff958 Comment envp argv argc return address (crt0) fp (crt0) <- fp (main) var1 var2 var3 stop arg3 arg2 arg1 return address (main) fp (main) <- fp (dump) loc1 loc2 loc3 p 62