Assembly Language Fundamentals
Chapter 2

Directives and Instructions
Assembly language statements are either directives or instructions.
Instructions are executable statements. They are translated by the assembler into machine instructions. Ex:
    call MySub      ;transfer of control
    mov ax,5        ;data transfer
Directives tell the assembler how to generate machine code and allocate storage. Ex:
    count db 50     ;creates 1 byte of storage initialized to 50

A Template for Assembly Language Programs
    .386
    .model flat
    include csi2121.inc
    .data
    ;data allocation directives here
    .code
    main:
    ;instructions here
    ret
    end
.386 = directive to accept all instructions of the 386 and previous processors (use .586 to assemble Pentium-specific instructions)
end = directive that marks the end of the program
main = label of the entry point of the program (first instruction to execute)
ret = instruction that returns control to the caller (here the Win32 console)
Macros to perform I/O are included in csi2121.inc

The FLAT Memory Model
The .model flat directive tells the assembler to generate code that will run in protected mode and in 32-bit mode.
It also asks the assembler to do whatever is needed so that code, stack, and data share the same 32-bit memory segment.
All the segment registers will be loaded with the correct values at load time and do not need to be changed by the programmer.
Only the offset part of a logical address becomes relevant.
Each data byte (or instruction) is referred to only by a 32-bit offset address.
The directives .code and .data mark the beginning of the code and data segments. They are used only for protection:
  .code is read-only
  .data is read and write

Steps to Produce an Executable File
Source file -> assembler -> object file -> linker (+ library) -> executable file
The assembler produces an object file from the assembly language source.
The object file contains machine language code with some external and relocatable addresses that will be resolved by the linker.
Their values are undetermined at that stage.
The linker extracts object modules (compiled procedures) from a library and links them with the object file to produce the executable file.
The addresses in the executable file are all resolved but they are still logical addresses.

Using Borland’s BCC32
All these steps are performed with the command:
    bcc32 -v hello.asm
The bcc32 command calls TASM32 to assemble and produce an object file.
It then calls ILINK32 to link this object file with the C/C++ library functions and Win32 functions used by the program to produce the executable file hello.exe.
The -v option produces full debugging info.
See the LabInfo page for all the info you need.

Names
A name identifies either:
  a variable
  a label
  a constant
  a keyword (assembler-reserved word).

Names (Cont.)
A variable is a symbolic name for a location in memory that was allocated by a data allocation directive. Ex:
    count db 50     ;allocates 1 byte to variable count
A label is a name given to an instruction. It must be followed by ':'. Ex:
    main: mov eax,5
          xor eax,ebx
          jmp main

Names (Cont.)
The first character must be a letter or any one of '@', '_', '$', '?'
  subsequent characters can include digits
A programmer-chosen name must be different from an assembler reserved word
  avoid using '@' as the first character since many keywords start with it
When called from bcc32, the TASM32 assembler is case sensitive for user-defined words but case insensitive for the assembler reserved words.

Integer Constants
Integer constants are made of numerical digits with, possibly, a sign and a suffix. Ex:
    -23     (a negative integer, base 10 is default)
    1011b   (a binary number)
    1011    (a decimal number)
    0A7Ch   (a hexadecimal number)
    A7Ch    (this is the name of a variable; a hexadecimal number must start with a decimal digit)

Character and String Constants
They are any sequence of characters enclosed either in single or double quotation marks. Embedded quotes are permitted.
Ex:
    'A'
    'ABC'
    "Hello World!"
    "123"                   (this is a string, not a number)
    "This isn't a test"
    'Say "hello" to him'

Simple Data Allocation Directives
The DB (define byte) directive allocates storage for one or more byte values:
    [name] DB initval [,initval]
Each initializer can be any constant. Ex:
    a db 10, 32, 41h        ;allocate 3 bytes
    b db 0Ah, 20h, 'A'      ;same values as above
A question mark (?) in the initializer leaves the initial value of the variable undefined. Ex:
    c db ?                  ;the initial value for c is undefined
Everything that follows ";" on a line is ignored by the assembler. It is thus a comment.

Simple Data Allocation Directives (cont.)
A string is stored as a sequence of characters. Ex:
    aString db "ABCD"
    bString db 'A','B','C','D'      ;same values
    cString db 41h,42h,43h,44h      ;same values again
The (offset) address of a variable is the address of its first byte. Ex: if the following data segment starts at address 0
    .data
    Var1 db "ABC"
    Var2 db "DEFG"
then:
  The address of Var1 is 0 = the address of 'A'
  The address of 'B' is 1
  The address of 'C' is 2
  The address of Var2 is 3
  The address of 'E' is 4
  …

Simple Data Allocation Directives (cont.)
Define Word (DW) allocates a sequence of words. Ex:
    A dw 1234h, 5678h       ;allocates 2 words
Intel’s x86 processors are little endian: the lowest order byte (of a word or double word) is always stored at the lowest address. Ex: if variable A (above) is located at address 0, we have:
    address:  0    1    2    3
    value:    34h  12h  78h  56h

Simple Data Allocation Directives (cont.)
Define Double Word (DD) allocates a sequence of double words. Ex:
    B dd 12345678h          ;allocates 1 double word
If this variable is located at address 0, we have:
    address:  0    1    2    3
    value:    78h  56h  34h  12h
If a value fits into a byte, it will be stored in the lowest ordered byte available. Ex:
    V dw 'A'
the value will be stored as:
    address:  0    1
    value:    41h  00h

Simple Data Allocation Directives (cont.)
The DUP operator enables us to repeat values when allocating storage. Ex:
    a db 100 dup(?)
    ;100 bytes, uninitialized
    b db 3 dup("Ho")        ;6 bytes: "HoHoHo"
DUP can be nested:
    c db 2 dup('a', 2 dup('b'))     ;this allocates 6 bytes: 'abbabb'
DUP must be used with data allocation directives.
There is a bug in some TASM32 versions:
    b db 3 dup("Ho")
will allocate 6 bytes that will be filled with 0 (i.e. the specified initial values are ignored).

Constants
We can use the equal-sign (=) directive or the EQU directive to give a name to a constant. Ex:
    one = 1         ;this is a constant
    two equ 2       ;also a constant
For simple numeric constants like these, the EQU and = directives are equivalent.
The assembler does not allocate storage to a constant (in contrast with data allocation directives).
It merely substitutes, at assembly time, the value of the constant at each occurrence of the assigned name.

Constants (cont.)
In place of a constant, we can use a constant expression involving the standard operators used in HLLs: +, -, *, /
Ex: the following constant expression is evaluated at assembly time and given a name at assembly time:
    A = (-3 * 8) + 2
A constant can be defined in terms of another constant:
    B = (A+2)/2

Exercise 1
Suppose that the following data segment starts at address 0
    .data
    A DW 1,2
    B DW 6ABCh
    Z EQU 232
    C DB 'ABCD'
A) Find the address of variable A.
B) Find the address of variable B.
C) Find the address of variable C.
D) Find the address of character 'C'.

Data Transfer Instructions
The MOV instruction transfers the content of the source operand to the destination operand:
    mov destination,source
This changes the content of destination (but not the content of source).
Both operands must be of the same size.
An operand can be either direct or indirect.
Direct operands (this chapter) are either:
  Immediate (a constant), noted imm
  Register, noted reg
  Memory variable (with displacement), noted mem
Indirect operands are used for indirect addressing (later chapter).

Data Transfer Instructions (cont.)
Some restrictions on MOV:
  imm cannot be the destination operand
  EIP cannot be an operand
  Source and destination cannot both be mem. Direct memory-to-memory data transfer is forbidden!
    mov wordVar1,wordVar2   ;illegal

Data Transfer Instructions (cont.)
The type of an operand is given by its size (byte, word, doubleword…).
Both operands of MOV must be of the same type.
Type checking is done by the assembler.
The type assigned to a mem operand is given by its data allocation directive (DB, DW…).
The type assigned to a register is given by its size.
An imm source operand of MOV must fit into the size of the destination operand.

Data Transfer Instructions (cont.)
Examples of MOV usage:
    mov bh,255              ;8-bit operands
    mov al,256              ;error: constant too large
    mov bx,AwordVar         ;16-bit operands
    mov bx,AbyteVar         ;error: size mismatch
    mov edx,AdoublewordVar  ;32-bit operands
    mov cx,bl               ;error: size mismatch
    mov wordVar1,wordVar2   ;error: mem-to-mem

MOVZX: Move with Zero Extend
Often we want to move the content of a source operand into a destination operand of larger size.
The MOVZX instruction does this operation by filling with zeros the high order part of the destination. Usage:
    MOVZX destination,source
Immediate operands are not allowed here.
The size of destination must be strictly larger than the size of source. Example:
    mov bh,80h
    movzx ah,bh     ;illegal, size mismatch
    movzx ax,bh     ;AX = 0080h
    movzx ecx,ax    ;ECX = 00000080h
Notice that if the signed value in the source operand is negative, then MOVZX will not preserve the sign.
    mov bh,80h      ;BH = 80h (negative)
    movzx ax,bh     ;AX = 0080h (positive)

MOVSX: Move with Sign Extend
We can use the MOVSX instruction to preserve the sign of the source operand.
Usage:
    MOVSX destination,source
The high order part of the destination operand will be the sign extension of the source operand.
  The sign extension of a negative number is …111111
  The sign extension of a positive number is …0000000
Examples:
    mov bh,80h      ;BH = 80h (negative)
    movsx ax,bh     ;AX = FF80h (negative)
                    ;FFh is the sign extension of 80h
    mov bl,7Ah      ;BL = 7Ah (positive)
    movsx ax,bl     ;AX = 007Ah (positive)
                    ;00h is the sign extension of 7Ah
MOVSX preserves the signed value whereas MOVZX preserves the unsigned value.
Immediate operands are not allowed and the size of destination must be strictly larger than the size of source.

Data Transfer Instructions (cont.)
We can add a displacement to a memory operand to access a memory value without a name. Ex:
    .data
    arrB db 10h, 20h
    arrW dw 1234h, 5678h
arrB+1 refers to the location one byte beyond the beginning of arrB and arrW+2 refers to the location two bytes beyond the beginning of arrW.
    mov al,arrB     ;AL = 10h
    mov al,arrB+1   ;AL = 20h (mem with displacement)
    mov ax,arrW+2   ;AX = 5678h
    mov ax,arrW+1   ;AX = 7812h, little endian convention!
    mov ax,arrW-2   ;AX = 2010h, negative displacement permitted

Data Transfer Instructions (cont.)
The XCHG instruction exchanges the content of the source and destination operands:
    XCHG destination,source
Only mem and reg operands are permitted (and must be of the same size).
Both operands cannot be mem (direct mem-to-mem exchange is forbidden).
To exchange the content of word1 and word2, we have to do:
    mov ax,word1
    xchg word2,ax
    mov word1,ax

Exercise 2
Given the following data segment
    .data
    A dw 1234h,-1
    B dd 55h,66778899h
For each of the following instructions, indicate whether it is legal.
If it is, indicate the value, in hexadecimal, of the destination operand immediately after the instruction is executed (please verify your answers with a debugger).
    MOV eax,A
    MOV bx,A+1
    MOV bx,A+2
    MOV dx,A+4
    MOV cx,B+1
    MOV edx,B+2

Simple Arithmetic Instructions
The ADD instruction adds the source to the destination and stores the result in the destination (source remains unchanged):
    ADD destination,source
The SUB instruction subtracts the source from the destination and stores the result in the destination (source remains unchanged):
    SUB destination,source
Both operands must be of the same size and they cannot both be mem operands.
Recall that to perform A - B the CPU in fact performs A + NEG(B).

Simple Arithmetic Instructions (cont.)
ADD and SUB affect all the status flags according to the result of the operation:
  ZF (zero flag) = 1 iff the result is zero
  SF (sign flag) = 1 iff the msb of the result is one
  OF (overflow flag) = 1 iff there is a signed overflow
  CF (carry flag) = 1 iff there is an unsigned overflow
Signed overflow: when the operation generates an out-of-range (erroneous) signed value.
Unsigned overflow: when the operation generates an out-of-range (erroneous) unsigned value.

More on Overflows
An unsigned overflow occurs if and only if (IFF) the unsigned value of the result does not fit into the destination operand.
  This occurs IFF the unsigned interpretation of the result is erroneous
  It is signaled by CF=1
A signed overflow occurs IFF the signed value of the result does not fit into the destination operand.
  This occurs IFF the signed interpretation of the result is erroneous
  It is signaled by OF=1

Simple Arithmetic Instructions (cont.)
Both types of overflow occur independently and are signaled separately by CF and OF.
    mov al,0FFh
    add al,1        ;AL=00h, OF=0, CF=1
    mov al,7Fh
    add al,1        ;AL=80h, OF=1, CF=0
    mov al,80h
    add al,80h      ;AL=00h, OF=1, CF=1
Hence: we can have either type of overflow or both of them at the same time.

Overflow Example
    mov ax,4000h
    add ax,ax       ;AX = 8000h
Unsigned interpretation: the sum of the 2 magnitudes 4000h + 4000h gives 8000h. This is the result in AX (the unsigned value of the result is correct). Hence CF=0.
Signed interpretation: we add two positive numbers, 4000h + 4000h, and obtain a negative number! The signed value of the result in AX is erroneous. Hence OF=1.

Overflow Example
    mov ax,8000h
    sub ax,0FFFFh   ;AX = 8001h
Unsigned interpretation: from the magnitude 8000h we subtract the larger magnitude FFFFh, so the unsigned value of the result is erroneous. Hence CF=1.
Signed interpretation: we subtract -1 from the negative number 8000h and obtain the correct signed result 8001h. Hence OF=0.

Overflow Example
    mov ah,40h
    sub ah,80h      ;AH = C0h
Unsigned interpretation: we subtract from 40h the larger number 80h, so the unsigned value of the result is wrong. Hence CF=1.
Signed interpretation: we subtract from 40h (64) a negative number 80h (-128) and obtain a negative number, so the signed value of the result is wrong. Hence OF=1.

Exercise 3
For each of these instructions, give the content (in hexadecimal) of the destination operand and the CF and OF flags immediately after the execution of the instruction (verify your answers with a debugger).
  ADD AX,BX when AX contains 8000h and BX contains FFFFh.
  SUB AL,BL when AL contains 00h and BL contains 80h.
  ADD AH,BH when AH contains 2Fh and BH contains 52h.
  SUB AX,BX when AX contains 0001h and BX contains FFFFh.

Simple Arithmetic Instructions (cont.)
The INC (increment) and DEC (decrement) instructions add 1 to or subtract 1 from a single operand (mem or reg operand):
    INC destination
    DEC destination
They affect all status flags, except CF.
Say that initially we have CF=OF=0:
    mov bh,0FFh     ;CF=0, OF=0
    inc bh          ;bh=00h, CF=0, OF=0
    mov bh,7Fh      ;CF=0, OF=0
    inc bh          ;bh=80h, CF=0, OF=1

Simple Arithmetic Instructions (cont.)
The NEG instruction performs the two's complement of its operand:
    NEG destination
where destination is either mem or reg.
CF=0 IFF the result is 0.
OF=1 IFF there is a signed overflow. Ex:
    mov ax,-5
    neg ax          ;CF=1, OF=0
    mov ax,8000h
    neg ax          ;CF=1, OF=1 signed overflow!

I/O on the Win32 Console
Our programs will communicate with the user via the Win32 console (the MS-DOS box).
  Input is done on the keyboard
  Output is done on the screen
Modern operating systems such as Windows forbid user programs to interact directly with I/O hardware.
  User programs can only perform I/O operations via system calls
For simplicity, our programs will perform I/O operations by using macros that are provided in the csi2121.inc file.
These macros call C library functions like printf() which, in turn, call the Win32 API.
  Hence, these I/O operations will be slow but simple to use and easy to migrate to another OS
We will examine the mechanisms involved in I/O operations later in the course.

Character Output
The putch macro prints on the screen the character whose ASCII code is given by the operand. Usage:
    putch source
where source must be a 32-bit operand, i.e. either imm, reg32, or mem32 (a double word variable).
    .data
    aword dw 41h
    adword dd 61h
    .code
    putch aword     ;error: 16-bit operand
    putch adword    ;'a' is written on screen
    putch 'b'       ;'b' is written on screen
    mov eax,'c'
    putch eax       ;'c' is written on screen
    putch ax        ;error: 16-bit operand

Character Output (cont.)
Also: the cursor will advance one position after printing the character.
The putch macro calls the putchar() function from the C library. Hence:
  The number 10 = 0Ah will direct the cursor to the beginning of the next line (the "newline character" in C). So the <CR> and <LF> functions are both performed on the screen.
    putch 10        ;move the cursor to the beginning of next line

String Output
To print a string, use the following macro:
    putstr source
where source must be a mem operand (i.e. the name of a variable). It cannot be a reg or imm operand.
This macro calls printf("%s", ...) of the C library. Hence:
  The number 10 = 0Ah will move the cursor to the beginning of the next line (the "newline character" in C)
  The string must be a null-terminated string. The last character must have ASCII code = 0h. Ex:
    .data
    msg db "hello",0ah,"world",0h
    .code
    putstr msg      ;prints 'hello' on one line
                    ;and 'world' on the next line

Integer Output
To print the signed value of an integer, use:
    putint source
where source must be a 32-bit operand, i.e. either imm, reg32, or mem32 (a double word variable). Ex:
    .data
    aword dw 243
    adword dd -266
    .code
    putint aword    ;error: 16-bit operand
    putint adword   ;-266 is written on screen
    putint -1       ;-1 is written on screen
    mov eax,0FFFFFFFFh
    putint eax      ;-1 is written on screen
    putint ax       ;error: 16-bit operand

Character Input
To read a character from the keyboard, we will use the getch macro. Usage:
    getch
This macro calls getchar() from the C library. So it uses a memory buffer that we will call the input buffer.
Upon execution of getch, the input buffer is first examined. If the input buffer is empty, then getch waits for the user to enter an input line (a sequence of characters ended by <CR>).
Each character that the user enters (at the keyboard) is copied into the input buffer.
When the user enters the <CR>: the screen cursor moves to the next line, the value 0Ah is stored in the input buffer, and control is passed to the instruction following getch.
The ASCII code of the first character entered on the keyboard will be stored in AL. The remaining bits of EAX are filled with zeros. Ex:
    mov eax,-1
    getch           ;eax=41h if the user first hits 'A'

Character Input (cont.)
Example: suppose that the input buffer is initially empty and, upon execution of getch, the user enters "hello"+<CR> on the keyboard. Then, when control returns to the instruction following getch, EAX contains 068h (= 'h') and the input buffer looks like this:
    'h'  'e'  'l'  'l'  'o'  0Ah
  Pointer to next char: 'e'
  Pointer to last char: 0Ah
If the input buffer is not empty when getch is executed, then EAX will be loaded with the ASCII code of the next character in the input buffer and the pointer to the next char will increase by one.
The input buffer is empty only when the pointer to the next char points beyond the last character (i.e. 0Ah).
The user is prompted only when the input buffer is empty.

Character Input (example)
Try to understand this program:
    .386
    .model flat
    include csi2121.inc
    .code
    main:
    putch '?'
    putch 10
    getch
    putch eax
    getch
    putch eax
    getch
    putch eax
    ret
    end
It first prints "?" and moves the cursor to the next line awaiting user input.
When the user enters "abcdef" + <CR>, the program displays (before exiting):
    abc
But if, instead, the user enters "a" + <CR>, the program displays:
    a
and the cursor moves to the next line awaiting user input. If the user then enters "bcdef"+<CR>, the program prints on the next line (before exiting):
    b

I/O Example: Case Conversion
    .386
    .model flat
    include csi2121.inc
    .data
    msg1 db "Enter a lower case letter: ",0
    msg2 db 'In upper case it is: '     ;no terminating 0, so printing continues into char
    char db ?,0
    .code
    main:
    putstr msg1
    getch               ;char in eax and go to next line
    sub al,20h          ;converts to upper case
    mov char,al
    putstr msg2
    ret
    end
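
Why does "putstr msg2" also print the converted letter? The following is a minimal Python sketch of the case conversion program's data segment, not the course macros themselves. It assumes the variables are laid out in declaration order and that the uninitialized byte starts as 0. The names read_c_string, to_upper_like_program, MSG2_OFFSET and CHAR_OFFSET are hypothetical helpers introduced only for this illustration.

```python
def read_c_string(memory: bytes, start: int) -> str:
    """Return the characters from `start` up to (not including) the first 0 byte."""
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")


def to_upper_like_program(code: int) -> int:
    """Model of `sub al,20h`: subtract 20h and keep only the low 8 bits (AL)."""
    return (code - 0x20) & 0xFF


# Assumed model of the .data segment, in declaration order.
msg1 = b"Enter a lower case letter: \x00"
msg2 = b"In upper case it is: "              # no terminating 0 byte
data = bytearray(msg1 + msg2 + b"\x00\x00")  # char db ?,0 ('?' modelled as 0)

MSG2_OFFSET = len(msg1)
CHAR_OFFSET = MSG2_OFFSET + len(msg2)

# mov char,al after the user typed 'g' and sub al,20h was executed
data[CHAR_OFFSET] = to_upper_like_program(ord("g"))

# putstr msg2: reading runs past msg2 into char and stops at the 0 after it
print(read_c_string(bytes(data), MSG2_OFFSET))  # In upper case it is: G
```

Because msg2 has no terminating 0, the string read only stops at the 0 that follows char, so the letter appears at the end of the message. Subtracting 20h gives the upper case letter only for 'a' to 'z'; for any other input the program would print an unrelated character.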