Chapter 2 Software Tools and Assembly Language Syntax

2.1 Assembly Language Statements and Text Editors

Assembly Language Syntax
The example program below contains comments, directives and instructions.

    ; Example assembly language program -- adds 158 to number in memory
    ; Author:  R. Detmer
    ; Date:    10/2004

    .386
    .MODEL FLAT

    ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

    .STACK  4096                    ; reserve 4096-byte stack

    .DATA                           ; reserve storage for data
    number  DWORD   -105
    sum     DWORD   ?

    .CODE                           ; start of main program code
    _start:
            mov     eax, number     ; first number to EAX
            add     eax, 158        ; add 158
            mov     sum, eax        ; sum to memory

            INVOKE  ExitProcess, 0  ; exit with return code 0

    PUBLIC _start                   ; make entry point public
    END                             ; end of source code

Comments
• Start with a semicolon (;)
• Extend to end of line
• May follow other statements on a line

Instructions
• Each corresponds to a single instruction actually executed by the 80x86 CPU
• Examples
  – mov eax, number copies a doubleword from memory to the accumulator EAX
  – add eax, 158 adds the doubleword representation of 158 to the number already in EAX, replacing the number in EAX

Directives
• Provide instructions to the assembler program
• Typically don’t cause code to be generated
• Examples
  – .386 tells the assembler to recognize 32-bit instructions
  – DWORD tells the assembler to reserve space for a 32-bit integer value

Macros
• Each is “shorthand” for a sequence of other statements: instructions, directives or even other macros
• The assembler expands a macro to the statements it represents, and then assembles these new statements

Typical Statement Format
• name mnemonic operand(s) ; comment
• In the data segment, a name field has no punctuation
• In the code segment, a name field is followed by a colon (:)
• Some of these fields may be omitted in some statements

Identifiers
• Identifiers used in assembly language are formed from letters, digits and special characters
  – Special characters are best avoided, except for an underscore (_) as in _start
• An identifier may not begin with a digit
• An identifier may have up to 247 characters
• Restricted identifiers include instruction mnemonics, directive mnemonics, register designations and other words which have a special meaning to the assembler

Program Format
• Indent for readability, starting names in column 1 and aligning mnemonics and trailing comments where possible
• The assembler is not case-sensitive, but good practice is to
  – Use lowercase letters for instructions
  – Use uppercase letters for directives

Creating an Assembly Language Source Code File
• Use a text editor like edit (at the MS-DOS prompt) or notepad, not a word processor
• Save the program with the .asm extension

2.2 The Assembler

MASM
• Microsoft Assembler
• For source example.asm, invoked at the command prompt with
    ml /c /coff /Fl /Zi example.asm
• Switches
  – /c assemble only, without linking
  – /coff generate the object file in COFF (Common Object File Format)
  – /Fl generate an assembly listing
  – /Zi include debugging information

Output of Assembler
• Object file, e.g., example.obj
  – Contains machine language statements almost ready to execute
• Listing file, e.g., example.lst
  – Shows how MASM translated the source program

Listing File

    00000000                        .DATA
    00000000 FFFFFF97               number  DWORD   -105
    00000004 00000000               sum     DWORD   ?
    00000000                        .CODE
    00000000                        _start:
    00000000  A1 00000000 R                 mov     eax, number
    00000005  05 0000009E                   add     eax, 158
    0000000A  A3 00000004 R                 mov     sum, eax

• In the data segment, the left column gives locations of data relative to the start of the data segment
• 8 bytes are reserved for data, with the first doubleword initialized to -105
• In the code segment, the left column gives locations of instructions relative to the start of the code segment
• The bytes after each code location are the object code for the three instructions

Parts of an Instruction
• An instruction’s object code begins with the opcode, usually one byte
  – Example: A1 for mov eax, number
• Immediate operands are constants embedded in the object code
  – Example: 0000009E for add eax, 158
• Addresses are assembly-time values that must be fixed up when the program is linked and loaded
  – Example: 00000004 for mov sum, eax

2.3 The Linker

Functions of the Linker
• Combines separately assembled modules into a single module, ready to be loaded into memory
• Arranges the individual object modules end-to-end, fixing up addresses for the resulting load module
• Load module is copied to memory when the program is actually executed, and additional address correction may take place at load time

Using the Linker
• At the command prompt (entered as a single command)
    link /debug /subsystem:console /entry:start /out:example.exe example.obj kernel32.lib
• This command links example.obj and any needed procedures from the library file kernel32.lib to produce the output file example.exe

Link switches
• /out:example.exe specifies example.exe as the name of the executable program file
• /entry:start identifies _start as the label of the program entry point
• /debug tells the linker to generate files necessary for debugging, example.ilk and example.pdb
• /subsystem:console tells the linker to generate code for a console application, one that runs in an MS-DOS window

2.4 The Debugger

Functions of a Debugger
• Allows a programmer to control execution of a program, pausing after each instruction or at a preset breakpoint
• A programmer can examine the contents of variables in a high-level language, or registers or memory in assembly language
• Useful both to find errors and to “see inside” a computer to find out how it executes programs

Using WinDbg (1)
• Type WinDbg at the command prompt
• From the WinDbg menu bar choose File, then Open Executable.
  Select example.exe, or the name of your executable file
• Press the step into button

Using WinDbg (2)
• Click OK in the information window “No symbolic Info for Debugee”; source code then appears in a WinDbg child window behind the Command window
• Minimize the Command window
• Select View and then Registers to open a window that shows contents of the 80x86 registers

Using WinDbg (3)
• Select View and then Memory to open a window that shows contents of memory
  – Enter the starting memory address using the C/C++ address-of operator (&)
  – For example, if the first item in the data section is number, you could use &number as the starting address

Using WinDbg (4)
• The instruction about to be executed is highlighted in yellow
• Press the step into button to execute instructions one at a time
• When an instruction causes a register value to change, the new value is shown in red

WinDbg Display
[screenshot of the WinDbg windows not reproduced]

2.5 Data Declarations

BYTE Directive
• Reserves storage for one or more bytes of data, optionally initializing storage
• Numeric data can be thought of as signed or unsigned
• Characters are assembled to ASCII codes
• Examples

    byte1   BYTE    255             ; value is FF
    byte2   BYTE    91              ; value is 5B
    byte3   BYTE    0               ; value is 00
    byte4   BYTE    -1              ; value is FF
    byte5   BYTE    6 DUP (?)       ; 6 bytes each with 00
    byte6   BYTE    'm'             ; value is 6D
    byte7   BYTE    "Joe"           ; 3 bytes with 4A 6F 65

DWORD Directive
• Reserves storage for one or more doublewords of data, optionally initializing storage
• Examples

    double1 DWORD   -1              ; value is FFFFFFFF
    double2 DWORD   -1000           ; value is FFFFFC18
    double3 DWORD   -2147483648     ; value is 80000000
    double4 DWORD   0, 1            ; two doublewords
    double5 DWORD   100 DUP (?)     ; 100 doublewords

WORD Directive
• Reserves storage for one or more words of data, optionally initializing storage

2.6 Instruction Operands

Types of Instruction Operands
• Immediate mode
  – Constant assembled into the instruction
• Register mode
  – A code for a register is assembled into the instruction
• Memory references
  – Several different modes

Memory References
• Direct
  – At a memory location whose address (offset) is built into the instruction
  – The memory references in the example program are direct
• Register indirect
  – At a memory location whose address is in a register
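
Example: Operand Modes in One Program
The operand modes above can be seen side by side in a small variation of the chapter's example program. This is only a sketch under stated assumptions: using EBX as the address register and using lea to load the address of sum are choices made for this illustration, not something the original slides show.

```asm
; Sketch: immediate, register, direct and register indirect operands
; (EBX and lea are illustrative choices, not from the original example)
.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

.STACK  4096                    ; reserve 4096-byte stack

.DATA
number  DWORD   -105
sum     DWORD   ?

.CODE
_start:
        mov     eax, number     ; direct: address of number is built into the instruction
        add     eax, 158        ; immediate: the constant 158 is part of the instruction
        lea     ebx, sum        ; load the address of sum into register EBX
        mov     [ebx], eax      ; register indirect: store EAX at the address held in EBX

        INVOKE  ExitProcess, 0  ; exit with return code 0

PUBLIC _start                   ; make entry point public
END                             ; end of source code
```

If this is assembled and linked with the ml and link commands from sections 2.2 and 2.3, stepping through it in WinDbg should show EBX receiving the address of sum, and sum receiving 53 (that is, -105 + 158) after the last mov.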