Instruction Set & Assembly Language Programming
Jianjian SONG
Software Institute, Nanjing University

Content
  Computer Architecture Taxonomy
  ARM Architecture Introduction
  ARM Instruction Set
  ARM Assembly Language Programming

1. Computer Architecture Taxonomy

What is architecture?

Architecture & Organization 1
  Architecture is those attributes visible to the programmer:
    instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
    e.g. Is there a multiply instruction?
  Organization is how features are implemented:
    control signals, interfaces, memory technology.
    e.g. Is there a hardware multiply unit, or is multiplication done by repeated addition?

Architecture & Organization 2
  All members of the Intel x86 family share the same basic architecture.
  The IBM System/370 family shares the same basic architecture.
  This gives code compatibility, at least backwards.
  Organization differs between versions.

von Neumann architecture
  Memory holds both data and instructions.
  The central processing unit (CPU) fetches instructions from memory.
  Separate CPU and memory distinguishes the programmable computer.
  CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.

CPU + memory
  [Figure: the CPU sends an address to memory and receives data. The PC holds 200; memory location 200 holds ADD r5,r1,r3, which is fetched into the IR.]

Harvard architecture
  [Figure: the CPU is connected to a data memory and to a separate program memory, each with its own address and data lines; the PC addresses the program memory.]

von Neumann vs. Harvard
  Harvard can't use self-modifying code.
  Harvard allows two simultaneous memory fetches.
  Most DSPs use Harvard architecture for streaming data:
    greater memory bandwidth;
    more predictable bandwidth.

RISC vs. CISC
  Complex instruction set computer (CISC):
    many addressing modes;
    many operations.
  Reduced instruction set computer (RISC):
    load/store;
    pipelinable instructions.

Load-store Architecture
  The instruction set can only process (e.g. ADD, SUB) values held in registers (or specified directly in the instruction), and it always writes the result back to a register. The only operations on memory are loading a memory value into a register (load instructions) and storing a register value to memory (store instructions).
  By comparison, a typical CISC processor allows a memory value to be added (ADD) to a register, and sometimes allows a register value to be added (ADD) to memory.

Instruction set characteristics
  Fixed vs. variable length.
  Addressing modes.
  Number of operands.
  Types of operands.

Programming model
  Programming model: registers visible to the programmer.
  Some registers are not visible (e.g. IR).

Multiple implementations
  Successful architectures have several implementations:
    varying clock speeds;
    different bus widths;
    different cache sizes;
    etc.

2. ARM Architecture Introduction

ARM (Advanced RISC Machines)
  ARM is a design company and an IP supplier. It licenses its designs to partners, who manufacture their own distinctive chips.
  What is IP? Intellectual Property.

Features of ARM
  ARM has the general characteristics of a RISC architecture:
    a large number of registers;
    most operations are performed on registers, and data moves between memory and registers through Load/Store instructions;
    simple addressing modes;
    fixed-length instruction format.
  In addition:
    small size, low power consumption, low cost, high performance;
    dual 16-bit/32-bit instruction sets;
    many partners worldwide.

Versions and extensions of the ARM architecture
  Six versions: ARMv1 to ARMv6.
  Extensions of the ARM architecture:
    Thumb (T variant): 16-bit instruction set, used to improve instruction density;
    DSP (E variant): arithmetic instructions for DSP applications;
    Jazelle (J variant): allows direct execution of Java bytecode.
  What is instruction density? For an equivalent sequence of operations, the number of machine instructions that fit in a unit of memory.

Naming format of ARM architecture versions
  Naming string:
    ARM vx (x: instruction set version number, 1 to 6);
    characters that denote variants (e.g. T, E, J);
    the character x marks the exclusion of a feature.

ARM processor families
  ARM7 family
  ARM9 family
  ARM9E family
  ARM10 family
  SecureCore family
  Intel StrongARM
  Intel XScale

3. ARM Instruction Set
  ARM assembly language
  ARM programming model
  ARM memory organization
  ARM data operations
  ARM flow of control

Assembly language
  Why assembly language? One-to-one with instructions (more or less).
  Basic features:
    One instruction per line.
    Labels provide names for addresses (usually in first column).
    Instructions often start in later columns.
    Comments run to end of line.

ARM assembly language example
    label1  ADR r4,c
            LDR r0,[r4]    ; a comment
            ADR r4,d
            LDR r1,[r4]
            SUB r0,r0,r1   ; comment

General encoding format of ARM instructions
    Bits    31-28   27-26   25   24-21    20   19-16   15-12   11-0
    Field   cond    00      X    opcode   S    Rn      Rd      shifter_operand
  opcode: encoding of the operation
  cond: encoding of the condition under which the instruction executes
  S: whether the operation affects the value of the CPSR
  Rn: number of the register holding the first operand
  Rd: number of the destination register
  shifter_operand: the second operand

Basic addressing modes of ARM instructions
  Register addressing
    e.g. ADD R0, R1, R2     ; (R1)+(R2)→R0
  Immediate addressing
    e.g. ADD R3, R3, #2     ; (R3)+2→R3
  Register indirect addressing
    e.g. LDR R0, [R3]       ; ((R3))→R0
  Register indexed addressing
    e.g. LDR R0, [R1, #4]   ; ((R1)+4)→R0
  Relative addressing
    e.g. B rel              ; (PC)+rel→PC

Pseudo-ops
  Some assembler directives don't correspond directly to instructions:
    Define current address.
    Reserve storage.
    Constants.

ARM programming model
  Sixteen 32-bit registers (bits 31 to 0): r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14, r15 (PC).
  CPSR, holding the N, Z, C, V flags.

Endianness
  Relationship between bit and byte/word ordering defines endianness:
    little-endian (bit 31 ... bit 0):  byte 3 | byte 2 | byte 1 | byte 0
    big-endian    (bit 31 ... bit 0):  byte 0 | byte 1 | byte 2 | byte 3

ARM data types
  Word is 32 bits long.
  Word can be divided into four 8-bit bytes.
  ARM addresses can be 32 bits long.
  Address refers to byte. Address 4 starts at byte 4.
  Can be configured at power-up as either little- or big-endian mode.

ARM status bits
  An arithmetic, logical, or shifting operation sets the CPSR bits when the instruction's S bit is set:
    N (negative), Z (zero), C (carry), V (overflow).
  Examples:
    -1 + 1 = 0: NZCV = 0110.
    2^31-1 + 1 = -2^31: NZCV = 1001.

Instructions Overview
  Data instructions
  Move instructions
  Load/Store instructions
  Comparison instructions
  Branch instructions

ARM data instructions
  Basic format:
    ADD r0,r1,r2
    Computes r1+r2, stores in r0.
  Immediate operand:
    ADD r0,r1,#2
    Computes r1+2, stores in r0.

ARM data instructions
  ADD, ADC : add (w. carry)
  SUB, SBC : subtract (w. carry)
  RSB, RSC : reverse subtract (w. carry)
  MUL, MLA : multiply (and accumulate)
  AND, ORR, EOR
  BIC : bit clear
  LSL, LSR : logical shift left/right
  ASL, ASR : arithmetic shift left/right
  ROR : rotate right
  RRX : rotate right extended with C

Data operation varieties
  Logical shift: fills with zeroes.
  Arithmetic shift right: fills with copies of the sign bit.
  RRX performs 33-bit rotate, including C bit from CPSR above sign bit.

ARM move instructions
  MOV, MVN : move (negated)
    MOV r0, r1 ; sets r0 to r1

ARM load/store instructions
  LDR, LDRH, LDRB : load (half-word, byte)
  STR, STRH, STRB : store (half-word, byte)
  Addressing modes:
    register indirect : LDR r0,[r1]
    with second register : LDR r0,[r1,-r2]
    with constant : LDR r0,[r1,#4]

ARM comparison instructions
  CMP : compare (subtracts the second operand from the first)
  CMN : negated compare (adds the two operands)
  TST : bit-wise test (ANDs the two operands)
  TEQ : bit-wise equivalence test (exclusive-ORs the two operands)
  These instructions set only the NZCV bits of CPSR; the result is discarded.
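
The comparison instructions produce the same flags as the corresponding flag-setting arithmetic: CMP behaves like a subtraction and CMN like an addition whose result is thrown away. As a minimal sketch of how the four bits fall out of a 32-bit operation (the function names `nzcv_add` and `nzcv_cmp`, and the packing of N, Z, C, V into bits 3..0 of the return value, are assumptions made for illustration and are not part of the slides), the flags can be modeled in C:

```c
#include <stdint.h>

/* Pack four flag bits as N=bit 3, Z=bit 2, C=bit 1, V=bit 0 (assumed layout). */
static unsigned pack_nzcv(unsigned n, unsigned z, unsigned c, unsigned v)
{
    return (n << 3) | (z << 2) | (c << 1) | v;
}

/* Flags produced by a flag-setting 32-bit addition a + b (as for CMN a,b). */
unsigned nzcv_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                         /* wraps modulo 2^32 */
    unsigned n = r >> 31;                       /* sign bit of the result */
    unsigned z = (r == 0);
    unsigned c = (r < a);                       /* unsigned carry out of bit 31 */
    unsigned v = (~(a ^ b) & (a ^ r)) >> 31;    /* same-sign inputs, different-sign result */
    return pack_nzcv(n, z, c, v);
}

/* Flags produced by CMP a,b, i.e. the subtraction a - b. */
unsigned nzcv_cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    unsigned n = r >> 31;
    unsigned z = (r == 0);
    unsigned c = (a >= b);                      /* on ARM, C means "no borrow" */
    unsigned v = ((a ^ b) & (a ^ r)) >> 31;     /* different-sign inputs, result sign differs from a */
    return pack_nzcv(n, z, c, v);
}
```

In this model `nzcv_add(0xFFFFFFFF, 1)` yields binary 0110 for -1 + 1, and `nzcv_add(0x7FFFFFFF, 1)` yields binary 1001 for 2^31-1 + 1, since the result is negative and signed overflow occurred without an unsigned carry.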

ARM branch instructions
  B : Branch
  BL : Branch and Link

ARM ADR pseudo-op
  Cannot refer to an address directly in an instruction.
  Generate value by performing arithmetic on PC.
  ADR pseudo-op generates instruction required to calculate address:
    ADR r1,FOO

Example: C assignments
  C:
    x = (a + b) - c;
  Assembler:
    ADR r4,a        ; get address for a
    LDR r0,[r4]     ; get value of a
    ADR r4,b        ; get address for b, reusing r4
    LDR r1,[r4]     ; get value of b
    ADD r3,r0,r1    ; compute a+b
    ADR r4,c        ; get address for c
    LDR r2,[r4]     ; get value of c

C assignment, cont'd.
    SUB r3,r3,r2    ; complete computation of x
    ADR r4,x        ; get address for x
    STR r3,[r4]     ; store value of x

Example: C assignment
  C:
    y = a*(b+c);
  Assembler:
    ADR r4,b        ; get address for b
    LDR r0,[r4]     ; get value of b
    ADR r4,c        ; get address for c
    LDR r1,[r4]     ; get value of c
    ADD r2,r0,r1    ; compute partial result
    ADR r4,a        ; get address for a
    LDR r0,[r4]     ; get value of a

C assignment, cont'd.
    MUL r2,r0,r2    ; compute final value for y (Rd and Rm must differ on pre-ARMv6 cores)
    ADR r4,y        ; get address for y
    STR r2,[r4]     ; store y

Example: C assignment
  C:
    z = (a << 2) | (b & 15);
  Assembler:
    ADR r4,a            ; get address for a
    LDR r0,[r4]         ; get value of a
    MOV r0,r0,LSL #2    ; perform shift
    ADR r4,b            ; get address for b
    LDR r1,[r4]         ; get value of b
    AND r1,r1,#15       ; perform AND
    ORR r1,r0,r1        ; perform OR

C assignment, cont'd.
    ADR r4,z        ; get address for z
    STR r1,[r4]     ; store value for z

Additional addressing modes
  Base-plus-offset addressing:
    LDR r0,[r1,#16]
    Loads from location r1+16.
  Auto-indexing increments base register:
    LDR r0,[r1,#16]!
    Loads from location r1+16 and writes r1+16 back to r1.
  Post-indexing fetches, then does offset:
    LDR r0,[r1],#16
    Loads r0 from r1, then adds 16 to r1.

ARM flow of control
  All operations can be performed conditionally, testing CPSR:
    EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE
  Branch operation:
    B #100
    Can be performed conditionally.
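
Each condition suffix is a predicate over the NZCV bits left by an earlier flag-setting instruction. The sketch below models that test in C. The enum names, the function `cond_passes`, and the packing of N, Z, C, V into bits 3..0 of an unsigned value are illustrative assumptions; the predicates themselves follow the standard ARM condition-code definitions.

```c
#include <stdbool.h>

/* Condition suffixes, in the order they are listed on the slide. */
enum cond {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL, COND_VS,
    COND_VC, COND_HI, COND_LS, COND_GE, COND_LT, COND_GT, COND_LE
};

/* Returns true if an instruction carrying suffix `c` would execute, given
   flags packed as N=bit 3, Z=bit 2, C=bit 1, V=bit 0 (assumed layout). */
bool cond_passes(enum cond c, unsigned nzcv)
{
    bool n = (nzcv >> 3) & 1u;
    bool z = (nzcv >> 2) & 1u;
    bool cy = (nzcv >> 1) & 1u;
    bool v = nzcv & 1u;

    switch (c) {
    case COND_EQ: return z;              /* equal */
    case COND_NE: return !z;             /* not equal */
    case COND_CS: return cy;             /* carry set / unsigned higher or same */
    case COND_CC: return !cy;            /* carry clear / unsigned lower */
    case COND_MI: return n;              /* negative */
    case COND_PL: return !n;             /* positive or zero */
    case COND_VS: return v;              /* overflow */
    case COND_VC: return !v;             /* no overflow */
    case COND_HI: return cy && !z;       /* unsigned higher */
    case COND_LS: return !cy || z;       /* unsigned lower or same */
    case COND_GE: return n == v;         /* signed greater or equal */
    case COND_LT: return n != v;         /* signed less than */
    case COND_GT: return !z && n == v;   /* signed greater than */
    case COND_LE: return z || n != v;    /* signed less or equal */
    }
    return false;
}
```

For example, after comparing 3 with 5 only N is set (binary 1000), so `cond_passes(COND_LT, 0x8)` is true and `cond_passes(COND_GE, 0x8)` is false, which is why a BGE placed after that comparison would fall through.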

Example: if statement
  C:
    if (a < b) { x = 5; y = c + d; } else x = c - d;
  Assembler:
    ; compute and test condition
            ADR r4,a        ; get address for a
            LDR r0,[r4]     ; get value of a
            ADR r4,b        ; get address for b
            LDR r1,[r4]     ; get value of b
            CMP r0,r1       ; compare a < b
            BGE fblock      ; if a >= b, branch to false block

If statement, cont'd.
    ; true block
            MOV r0,#5       ; generate value for x
            ADR r4,x        ; get address for x
            STR r0,[r4]     ; store x
            ADR r4,c        ; get address for c
            LDR r0,[r4]     ; get value of c
            ADR r4,d        ; get address for d
            LDR r1,[r4]     ; get value of d
            ADD r0,r0,r1    ; compute y
            ADR r4,y        ; get address for y
            STR r0,[r4]     ; store y
            B after         ; branch around false block

If statement, cont'd.
    ; false block
    fblock  ADR r4,c        ; get address for c
            LDR r0,[r4]     ; get value of c
            ADR r4,d        ; get address for d
            LDR r1,[r4]     ; get value of d
            SUB r0,r0,r1    ; compute c-d
            ADR r4,x        ; get address for x
            STR r0,[r4]     ; store value of x
    after   ...

Example: Conditional instruction implementation
    ; true block
            MOVLT r0,#5     ; generate value for x
            ADRLT r4,x      ; get address for x
            STRLT r0,[r4]   ; store x
            ADRLT r4,c      ; get address for c
            LDRLT r0,[r4]   ; get value of c
            ADRLT r4,d      ; get address for d
            LDRLT r1,[r4]   ; get value of d
            ADDLT r0,r0,r1  ; compute y
            ADRLT r4,y      ; get address for y
            STRLT r0,[r4]   ; store y

Example: switch statement
  C:
    switch (test) { case 0: … break; case 1: … }
  Assembler:
            ADR r2,test             ; get address for test
            LDR r0,[r2]             ; load value for test
            ADR r1,switchtab        ; load address for switch table
            LDR r15,[r1,r0,LSL #2]  ; index switch table
    switchtab DCD case0
              DCD case1
              ...
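
The switch table above is a jump table: the value of test selects a word from an array of code addresses, and that word is loaded into the PC. A rough C analogue uses an array of function pointers. The handler bodies, their return values, the name `dispatch`, and the bounds check are assumptions added for illustration; the assembly on the slide performs no range check on test.

```c
#include <stddef.h>

/* Hypothetical case bodies; each returns a value so the effect is observable. */
int case0(void) { return 100; }
int case1(void) { return 200; }

/* Counterpart of switchtab: one code address per case label. */
static int (*const switchtab[])(void) = { case0, case1 };

/* Counterpart of LDR r15,[r1,r0,LSL #2]: index the table and jump.
   Returns -1 for out-of-range values, a check the slide's code omits. */
int dispatch(unsigned test)
{
    size_t ncases = sizeof switchtab / sizeof switchtab[0];
    if (test >= ncases)
        return -1;
    return switchtab[test]();
}
```

The `LSL #2` in the assembly plays the role of the array indexing here: each table entry is one 4-byte word, so entry number test lives at byte offset test*4 from the start of the table.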

Example: FIR filter
  C:
    for (i=0, f=0; i<N; i++)
        f = f + c[i]*x[i];
  Assembler:
    ; loop initiation code
            MOV r0,#0       ; use r0 for i
            MOV r8,#0       ; use separate index for arrays
            ADR r2,N        ; get address for N
            LDR r1,[r2]     ; get value of N
            MOV r2,#0       ; use r2 for f

FIR filter, cont'd.
            ADR r3,c        ; load r3 with base of c
            ADR r5,x        ; load r5 with base of x
    ; loop body
    loop    LDR r4,[r3,r8]  ; get c[i]
            LDR r6,[r5,r8]  ; get x[i]
            MUL r4,r6,r4    ; compute c[i]*x[i] (Rd and Rm must differ on pre-ARMv6 cores)
            ADD r2,r2,r4    ; add into running sum
            ADD r8,r8,#4    ; add one word offset to array index
            ADD r0,r0,#1    ; add 1 to i
            CMP r0,r1       ; exit?
            BLT loop        ; if i < N, continue

ARM subroutine linkage
  Branch and link instruction:
    BL foo
    Copies the return address (the address of the instruction after the BL) to r14.
  To return from subroutine:
    MOV r15,r14

Nested subroutine calls
  Nesting/recursion requires coding convention:
    ; sketch assuming a full-ascending stack (r13 points at the last item pushed)
    f1      LDR r0,[r13]        ; load arg into r0 from stack
            ; call f2()
            STR r14,[r13,#4]!   ; store f1's return adrs
            STR r0,[r13,#4]!    ; store arg to f2 on stack
            BL f2               ; branch and link to f2
            ; return from f1()
            SUB r13,r13,#4      ; pop f2's arg off stack
            LDR r15,[r13],#-4   ; restore return address into PC and return

Summary
  Load/store architecture.
  Most instructions are RISCy, operate in single cycle.
  Some multi-register operations take longer.
  All instructions can be executed conditionally.

4. ARM Assembly Language Programming
  Why and when to use?
  AT&T format and Intel format
  Grammar of ARM assembly language
  Examples

Why and when to use?
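  Low-level code in an operating system kernel deals directly with the hardware and needs special-purpose instructions.
  Special instructions of the CPU.
  Time efficiency of frequently executed code.
  Space efficiency of programs (e.g. an operating system's boot loader).
  Refer to Section 1.5 of "Linux内核源代码情景分析" (Zhejiang University Press).

AT&T format and Intel format

Grammar of ARM assembly language
  Statements
  Program format

Statements
  A statement is one of:
    instruction
    directive (pseudo-op)
    macro
  Statement format:
    { symbol } { instruction | directive | pseudo-instruction } { ;comment }

Directives
  Symbol definition directives
  Data definition directives
  Assembly control directives
  Frame description directives
  Reporting directives
  Miscellaneous directives

Directives for variables
  Declare and initialize a global variable: GBLA, GBLL, GBLS
  Declare and initialize a local variable: LCLA, LCLL, LCLS
  Assign a value to a variable: SETA, SETL, SETS

Example
                GBLA  objectsize    ; declare a global arithmetic variable
    objectsize  SETA  0xff          ; assign a value to the variable
                SPACE objectsize    ; use the variable
                GBLL  statusB
    statusB     SETL  {TRUE}

Directives for data constants
  EQU
    name EQU expr {, type}
    Usually placed in .inc files.

Allocating memory
  SPACE
    {label} SPACE byte_num
    Allocates a block of memory and initializes it to 0.
  DCB
    {label} DCB expr {, expr}
    Allocates a run of bytes and initializes them with expr.
  DCD
    {label} DCD expr {, expr}
    Allocates a run of words (the allocated memory is word-aligned) and initializes them with expr.

MACRO and MEND
  Subroutines vs. macros; the macro body.
  Use a macro when the subroutine is short but many parameters must be passed.
  MACRO: start of a macro definition.
  MEND: end of a macro definition.
  Usually placed in .mac files.
  Format:
            MACRO
    {$label} macroname {$para1, $para2, ...}
            ...                 ; code
            MEND

Example
            MACRO
    $label  xmac $p1
            ...                 ; code
    $label.loop1                ; label internal to the macro body
            ...                 ; code
            BGE $label.loop1
    $label.loop2                ; label internal to the macro body
            ...                 ; code
            BL $p1              ; parameter p1 is the name of a subroutine
            BGT $label.loop2
            ...                 ; code
            MEND

Example (cont'd)
  Result of expanding the macro call "abc xmac subr1":
            ...                 ; code
    abcloop1                    ; internal label: $label is replaced by abc
            ...                 ; code
            BGE abcloop1        ; internal label: $label is replaced by abc
    abcloop2                    ; internal label: $label is replaced by abc
            ...                 ; code
            BL subr1            ; parameter p1 is replaced by the actual value subr1
            BGT abcloop2
            ...                 ; code

Other directives
  AREA: defines a code section or a data section.
    AREA sectionname {, attr1} {, attr2}
  ENTRY: program entry point.
  END: end of the source file.

Other directives (cont'd)
  GET/INCLUDE
    INCLUDE filename
  EXPORT
    EXPORT symbol {[WEAK]}
  IMPORT
    IMPORT symbol {[WEAK]}

Pseudo-instructions
  ADR
    ADR{cond} register, expr
    Loads a PC-relative or register-relative address into a register.
    The assembler replaces it with one instruction.
  ADRL
    ADRL{cond} register, expr
    Reaches a wider address range than ADR.
    The assembler replaces it with two instructions.
  LDR
    LDR{cond} register, =[expr | label_expr]
    Loads a 32-bit constant or address into a register.
  NOP
    No operation, e.g. MOV R0, R0.

Program format
  Source files are organized in sections.
  Code sections and data sections.
  The AREA directive.
  Example

Review
  Computer architecture and ARM architecture
  Instruction set
  Assembly language programming
    Program structure
    Statements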