Essentials of 80x86 Assembly Language
Chapter 2
Software Tools and Assembly Language Syntax
2.1 Assembly Language Statements and Text Editors
Assembly Language Syntax
The example program contains comments, directives and instructions:
; Example assembly language program -- adds 158 to number in memory
; Author: R. Detmer
; Date:   10/2004

.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

.STACK 4096             ; reserve 4096-byte stack

.DATA                   ; reserve storage for data
number  DWORD   -105
sum     DWORD   ?

.CODE                   ; start of main program code
_start:
        mov     eax, number     ; first number to EAX
        add     eax, 158        ; add 158
        mov     sum, eax        ; sum to memory

        INVOKE  ExitProcess, 0  ; exit with return code 0

PUBLIC _start           ; make entry point public
END                     ; end of source code
Comments
• Start with a semicolon (;)
• Extend to end of line
• May follow other statements on a line
Instructions
• Each corresponds to a single instruction
actually executed by the 80x86 CPU
• Examples
– mov eax, number
copies a doubleword from memory to the
accumulator EAX
– add eax, 158
adds the doubleword representation of 158 to
the number already in EAX, replacing the
number in EAX
Directives
• Provide instructions to the assembler
program
• Typically don’t cause code to be
generated
• Examples
– .386 tells the assembler to recognize 32-bit
instructions
– DWORD tells the assembler to reserve space
for a 32-bit integer value
Macros
• Each is “shorthand” for a sequence of
other statements – instructions, directives
or even other macros
• The assembler expands a macro to the
statements it represents, and then
assembles these new statements
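As a rough illustration of macro expansion, the sketch below wraps the first two instructions of the example program in a macro. The macro name addTwo and its parameters nbr1 and nbr2 are hypothetical, chosen only for this example; the rest follows the example program.

```asm
; Sketch only: addTwo, nbr1 and nbr2 are hypothetical names
.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

addTwo  MACRO   nbr1, nbr2      ; macro definition: shorthand for two instructions
        mov     eax, nbr1       ; first operand to EAX
        add     eax, nbr2       ; add second operand
        ENDM                    ; end of macro definition

.STACK 4096

.DATA
number  DWORD   -105
sum     DWORD   ?

.CODE
_start:
        addTwo  number, 158     ; assembler expands this to:
                                ;     mov eax, number
                                ;     add eax, 158
        mov     sum, eax        ; sum to memory (-105 + 158 = 53)

        INVOKE  ExitProcess, 0  ; exit with return code 0

PUBLIC _start
END
```

The macro statement itself generates no code where it is defined; code appears only where the macro is used.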
Typical Statement Format
• name mnemonic operand(s) ; comment
• In the data segment, a name field has no
punctuation
• In the code segment, a name field is
followed by a colon (:)
• Some of these fields may be omitted in
some statements
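The fields can be seen side by side in a short sketch. The names total and addFive are hypothetical and exist only to show a name field in each segment.

```asm
; Sketch only: total and addFive are hypothetical names
.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

.STACK 4096

.DATA
; name      mnemonic  operand(s)        ; comment
total       DWORD     0                 ; data segment name: no punctuation

.CODE
_start:                                 ; name only; other fields omitted
addFive:    add       total, 5          ; code segment name: followed by a colon
            mov       eax, total        ; name field omitted

            INVOKE    ExitProcess, 0    ; exit with return code 0

PUBLIC _start
END
```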
Identifiers
• Identifiers used in assembly language are
formed from letters, digits and special characters
– Special characters are best avoided, except for the
underscore (_), as in _start
• An identifier may not begin with a digit
• An identifier may have up to 247 characters
• Restricted identifiers include instruction
mnemonics, directive mnemonics, register
designations and other words which have a
special meaning to the assembler
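A few data declarations illustrate these rules. All names here are hypothetical; the declarations that break a rule are commented out so that the fragment should still assemble.

```asm
; Sketch only: every identifier below is a hypothetical example
.386
.MODEL FLAT

.DATA
count       DWORD   ?       ; letters only: valid
total2      DWORD   ?       ; digits allowed after the first character: valid
_temp       DWORD   ?       ; leading underscore: valid
; 2ndValue  DWORD   ?       ; not allowed: begins with a digit
; eax       DWORD   ?       ; not allowed: register designation
; mov       DWORD   ?       ; not allowed: instruction mnemonic

END
```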
Program Format
• Indent for readability, starting names in
column 1 and aligning mnemonics and
trailing comments where possible
• The assembler is not case-sensitive, but
good practice is to
– Use lowercase letters for instructions
– Use uppercase letters for directives
Creating an Assembly Language Source Code File
• Use a text editor like edit (at the MS-DOS
prompt) or notepad, not a word processor
• Save program with .asm extension
2.2 The Assembler
MASM
• Microsoft Assembler
• For source example.asm, invoked at the
command prompt with
ml /c /coff /Fl /Zi example.asm
• Switches
– /c assemble only (do not link)
– /coff generate the object file in COFF (Common Object File Format)
– /Fl generate assembly listing
– /Zi prepare for debugging
Output of Assembler
• Object file, e.g., example.obj
– Contains machine language statements
almost ready to execute
• Listing file, e.g., example.lst
– Shows how MASM translated the source
program
Listing File
00000000                        .DATA
00000000 FFFFFF97               number  DWORD   -105
00000004 00000000               sum     DWORD   ?
00000000                        .CODE
00000000                        _start:
00000000 A1 00000000 R                  mov     eax, number
00000005 05 0000009E                    add     eax, 158
0000000A A3 00000004 R                  mov     sum, eax
• Left column, data segment: locations of data relative to
start of data segment
• 8 bytes reserved for data, with first
doubleword initialized to -105
• Left column, code segment: locations of instructions relative
to start of code segment
• Middle column, code segment: object code for the three instructions
Parts of an Instruction
• Instruction’s object code begins with the
opcode, usually one byte
– Example, A1 for mov eax, number
• Immediate operands are constants
embedded in the object code
– Example, 0000009E for add eax, 158
• Addresses are assembly-time values (flagged with R in the
listing) and must be fixed up when the program is linked and loaded
– Example, 00000004 for mov sum, eax
2.3 The Linker
Functions of the Linker
• Combines separately assembled modules into a
single module, ready to be loaded into memory
• Arranges the individual object modules end-to-end,
fixing up addresses for the resulting load module
• Load module is copied to memory when the
program is actually executed, and additional
address correction may take place at load time
Using the Linker
• At the command prompt
link /debug /subsystem:console /entry:start
/out:example.exe example.obj kernel32.lib
(entered as a single command)
• This command links example.obj and any
needed procedures from the library file
kernel32.lib to produce the output file
example.exe
Link switches
• /out:example.exe specifies example.exe as
the name of the executable program file
• /entry:start identifies _start as the label of
the program entry point
• /debug tells the linker to generate files
necessary for debugging, example.ilk and
example.pdb
• /subsystem:console tells the linker to
generate code for a console application, one
that runs in an MS-DOS window
2.4 The Debugger
Functions of a Debugger
• Allows a programmer to control execution
of a program, pausing after each
instruction or at a preset breakpoint
• A programmer can examine the contents
of variables in a high-level language, or
registers or memory in assembly language
• Useful both to find errors and to “see
inside” a computer to find out how it
executes programs
Using WinDbg (1)
• Type windbg at the command prompt
• From the WinDbg menu bar choose File,
then Open Executable. Select
example.exe, or the name of your
executable file
• Press the step into button
Using WinDbg (2)
• Click OK in the information window “No
symbolic Info for Debugee”; the source code
then appears in a WinDbg child window
behind the Command window
• Minimize the Command window
• Select View and then Registers to open a
window that shows contents of the 80x86
registers
Using WinDbg (3)
• Select View and Memory to open a
window that shows contents of memory
– Enter the starting memory address using the
C/C++ address-of operator (&)
– For example, if the first item in the data
section is number, you could use &number
as the starting address
Using WinDbg (4)
• The instruction about to be executed is
highlighted in yellow.
• Press the step into button to execute each
instruction one at a time
• When an instruction causes a register
value to change, the new value is shown
in red
WinDbg Display
[Figure: WinDbg display]
2.5 Data Declarations
BYTE Directive
• Reserves storage for one or more bytes of data,
optionally initializing storage
• Numeric data can be thought of as signed or
unsigned
• Characters are assembled to ASCII codes
• Examples
byte1   BYTE    255         ; value is FF
byte2   BYTE    91          ; value is 5B
byte3   BYTE    0           ; value is 00
byte4   BYTE    -1          ; value is FF
byte5   BYTE    6 DUP (?)   ; 6 bytes each with 00
byte6   BYTE    'm'         ; value is 6D
byte7   BYTE    "Joe"       ; 3 bytes with 4A 6F 65
DWORD Directive
• Reserves storage for one or more
doublewords of data, optionally initializing
storage
• Examples
double1 DWORD   -1          ; value is FFFFFFFF
double2 DWORD   -1000       ; value is FFFFFC18
double3 DWORD   -2147483648 ; value is 80000000
double4 DWORD   0, 1        ; two doublewords
double5 DWORD   100 DUP (?) ; 100 doublewords
WORD Directive
• Reserves storage for one or more words
of data, optionally initializing storage
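By analogy with the BYTE and DWORD examples, WORD declarations might look like the sketch below. The names word1 to word6 are hypothetical; the values in the comments are the 16-bit hexadecimal representations.

```asm
; Sketch only: word1..word6 are hypothetical names
.386
.MODEL FLAT

.DATA
word1   WORD    65535       ; value is FFFF (largest unsigned word)
word2   WORD    -1          ; value is FFFF (same bit pattern, read as signed)
word3   WORD    1000        ; value is 03E8
word4   WORD    -1000       ; value is FC18
word5   WORD    -32768      ; value is 8000 (smallest signed word)
word6   WORD    10 DUP (?)  ; 10 words

END
```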
2.6 Instruction Operands
Types of Instruction Operands
• Immediate mode
– Constant assembled into the instruction
• Register mode
– A code for a register is assembled into the
instruction
• Memory references
– Several different modes
Memory References
• Direct – at a memory location whose
address (offset) is built into the instruction
– The memory references in the example
program are direct
• Register indirect – at a memory location
whose address is in a register
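The operand types above can be compared in one short program. This is a sketch that varies the example program: the use of EBX and the lea instruction to hold the address of sum is an assumption made for illustration, not part of the original example.

```asm
; Sketch only: a variation on the example program, using EBX as a pointer
.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

.STACK 4096

.DATA
number  DWORD   -105
sum     DWORD   ?

.CODE
_start:
        mov     eax, number     ; source is direct: offset of number is in the instruction
                                ; destination is register mode: EAX
        add     eax, 158        ; source is immediate: 158 is assembled into the instruction
        lea     ebx, sum        ; load the address of sum into EBX
        mov     [ebx], eax      ; destination is register indirect: address is in EBX
                                ; sum now holds -105 + 158 = 53

        INVOKE  ExitProcess, 0  ; exit with return code 0

PUBLIC _start
END
```

The final mov stores to the same location as mov sum, eax in the example program; only the way the address reaches the CPU differs.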