ECE 485/585 Microprocessors
Chapter 3: x86 Assembly Language 1
Herbert G. Mayer, PSU
Status 11/22/2016

Syllabus
Motivation
16-bit, 32-bit, 64-bit Processor
Null Program
Print Character
Print String
INT Function
Assembler Abbreviations
Macros
Procedures
Assembly and Linking
nasm Assembler
Summary
Appendix

Motivation
It is almost impossible to communicate with a microprocessor at the binary level; trying causes insanity! An assembler offers abstraction, relocatability, and program reuse. Symbolic names permit convenient definition of, and reference to, named data and code objects. Assemblers offer high-level data and control constructs, similar to those of high-level languages. Assembly programming allows a high degree of control over the target machine, and it achieves the highest performance, at least for short code sections.

Intel x86 is the most widely used microprocessor family for general computing; it is made by Intel and AMD. The ARM processor is the most widely used processor for portable devices, e.g. tablets and cell phones. We use Intel x86 here to explain the relation between a μP and its assembly language: for any one μP there may be many assemblers, but only a single binary machine-code format.

The μP architecture defines the details of the assembler instructions; yet some details of assembly language are independent of the architecture. E.g. the syntactic order in which operands are listed in an assembly instruction is arbitrary, but the bits have to be assembled into their specific bit positions of a machine instruction.

Every machine instruction has corresponding assembler syntax, AKA mnemonics. Different manufacturers of assemblers may have different syntax rules and different mnemonics for the same machine instruction. For example, some place the destination register in the leftmost operand position; e.g. a load instruction for a hypothetical machine could be:

    ld r1, [foo]       -- load word at address foo into reg r1

Others might reverse the order, use different mnemonics, or name registers differently, such as:

    load foo, %r1      -- load word at address foo into reg r1

Some manufacturers call moving bits from memory into a register a load instruction (IBM); others call it a move instruction (Intel). Assembly language bridges the gap between low-level binary machine instructions and a higher-level interface for human programmers: binary instructions execute on a digital computer, while an assembler provides a way to express programs as text readable by programmers. Assembly language is by no means high-level in the sense of being machine-independent, structured, or object-oriented. It is a low-level, target-machine-specific interface, but it shields programmers from the tedium of binary code.

Users do not deal with the target machine in terms of the bits that represent binary machine instructions. An assembler is a piece of system software that maps an assembly source program into binary instructions. Thus assembly language provides an abstraction:
1. It elevates the user to the level of a textual language, up from the level of binary object code.
2. Several different assemblers may do this in syntactically different ways for the same target μP.
3. Yet the generated binary code must follow the same machine-instruction encoding for every assembler, in order to be executable on the targeted μP.

Common to many architectures is the notion (and separation) of data space, instruction space, and perhaps other areas of program logic (stack space, read-only space, etc.).
The x86 architecture provides so-called data segments, code segments, and stack segments, as many of each as needed. At run time each segment is identified by a segment register. Specific data or code elements are identified by their offsets from the start of their respective segment.

For example, the code label next: is interpreted by the hardware as seg:offset, where seg is the segment register cs and offset is the offset of label next from the start of the code segment. Say the offset of next is 248h and the value in the cs register is 2030h; then the resulting run-time (code) address is 20548h, i.e. 2030h shifted left by 4 bits (20300h) plus 248h.

Note the left shift of the segment value by 4 bits! This is possible, and required, because all segments must be aligned at addresses divisible by 16 on the Intel x86 architecture. Thus a segment's starting address is always a multiple of 16, and the rightmost (low-order) 4 bits of its binary address are always 0; hence they need not be stored, and 16 bits suffice to hold the segment value.

This chapter introduces complete programs written in assembly language. Starting with the smallest possible complete assembly program, we progress to more sophisticated programs. One example emits a single character; the next prints a complete string onto the standard output, followed by conventions that allow us to communicate with the assembler in an abbreviated way. We also discuss macros and simple procedures (AKA functions) with calls and returns.

16-Bit, 32-Bit, 64-Bit Architecture
The Intel x86 processor started out as a 16-bit architecture in the late 1970s; the product name was Intel 8086 μP. The x86 architecture then grew into a 32-bit architecture, the initial product being the Intel 80386; in between came the 16-bit 80186 and 80286. The 32-bit version was backward compatible with the 16-bit architecture and could execute old 16-bit code.

In the early 2000s AMD produced a 64-bit version of the x86 family, very much to the surprise of Intel; Intel then productized a 64-bit x86 version as well, in addition to its new and different Itanium architecture. AMD's product name for 64-bit x86 was AMD64; Intel's name is Intel 64. Old 32-bit x86 code executes on the new 64-bit processors, and the processors still support the 16-bit modes, although a 64-bit operating system cannot run 16-bit DOS programs directly. Legacy code does not run at optimal speed, though, as legacy object code cannot take advantage of new instructions that may speed up certain applications.

[Photo: AMD64 μP]

AMD's 64-bit version of the old x86 architecture must have sent shock waves through Intel, which at the time of AMD's release had no published plans for a 64-bit version of the old x86 machine. That quickly changed, as Intel had been smart enough to have a skunk-works team design a new 64-bit Intel μP in secrecy.

All 8 old general-purpose registers were extended to 64 bits, with their names modified correspondingly to distinguish them from their 32-bit and 16-bit siblings. E.g. "eax", the 32-bit version of the ax register, became "rax" for the 64-bit version; reminder: the ax register has 16 bits. The 64-bit architecture also added 8 more GPRs to the register-starved x86; these are known as rn, with n = 8..15.

MMX Extensions
MMX registers were introduced with the Intel Pentium® processor with MMX™ technology and carried into the Pentium® II; Streaming SIMD Extensions (SSE) were introduced later, with the Pentium III. MMX stands for Multi Media Extensions; other meanings of the acronym MMX exist! The x87 floating-point registers, denoted fp(0) .. fp(7) here (Intel assembler syntax names them ST(0) .. ST(7)), are 80 bits long and process the 80-bit extended format of the IEEE 754 standard. They are aliased with the 8 64-bit MMX registers: a program can use one set or the other, in their respective lengths, but not both at the same time; a switch (the EMMS instruction) is needed before returning from MMX to floating-point use. SSE did not lengthen the MMX registers; instead it introduced new 128-bit registers.

MMX Extension Features

    Feature                Pentium® III   Pentium® III       Pentium® 4        Pentium® 4 Northwood
    Processor clock        450-600 MHz    600 MHz-1.13 GHz   1.5 GHz           2+ GHz
    L2 cache               512k off-die   256k on-die        256k on-die       512k on-die
    Execution type         Dynamic        Dynamic            NetBurst™ mArch   NetBurst™ mArch
    System bus             100 MHz        133 MHz            400 MHz (4x100)   400/533 MHz (4x100/133)
    MMX™ technology        Yes            Yes                Yes               Yes
    SSE                    Yes            Yes                Yes               Yes
    SSE2                   No             No                 Yes               Yes
    Manufacturing process  .25 micron     .18 micron         .18 micron        .13 micron
    Chipset                ICH-1          ICH-2              ICH-2             ICH-2

[Figure: MMX sample operations]

XMM Extension
SSE introduced 8 new 128-bit registers, named XMM0 .. XMM7. They are not aliased with any other registers, so they can be used together with MMX. XMM registers are usable as scratch registers for various purposes. They can hold all data types at the same time without penalty; e.g.:

    xmm0 - extended SIMD integer data
    xmm1 - single-precision FP
    xmm2 - double-precision FP, etc.

Their use does not add latencies if all operations on a register are of the same, consistent data type.

XMM SHUFPD Operation
SHUFPD: Shuffle Packed Double-Precision FP. Let XMM1 hold the two doubles x2 (high) and x1 (low), and XMM2 hold y2 (high) and y1 (low). Bit 0 of the immediate selects which element of XMM1 goes into the low half of the result; bit 1 selects which element of XMM2 goes into the high half:

    XMM1 = [ x2 | x1 ],  XMM2 = [ y2 | y1 ]
    SHUFPD XMM1, XMM2, 3   ; binary 11: XMM1 = [ y2 | x2 ]
    SHUFPD XMM1, XMM2, 2   ; binary 10: XMM1 = [ y2 | x1 ]

16-Bit, 32-Bit, 64-Bit Usage
In the assembly code below we use the following names for the ax register, depending on 16-bit, 32-bit, or 64-bit mode:

    ax    16 bits; al is its low-order byte register, ah its high-order byte
    eax   32 bits
    rax   64 bits

Likewise for the other registers, for example bx:

    bx    16 bits; bh is its high-order byte register, bl its low-order byte
    ebx   32 bits
    rbx   64 bits

Etc.
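The aliasing of the byte registers onto ax can be sketched in C with masks and shifts. The 16-bit variable ax and the set_/get_ helper names are assumptions of this model; they only stand in for the hardware register. Loading ah with 4Ch and al with 00h yields ax = 4C00h, the value the DOS programs in this chapter move into ax in a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the 16-bit ax register (illustrative model only).
   al is bits 7..0 of ax, ah is bits 15..8. */
static uint16_t ax;

/* Writing al replaces the low byte and keeps ah unchanged. */
void set_al(uint8_t value) {
    ax = (uint16_t)((ax & 0xFF00u) | value);
}

/* Writing ah replaces the high byte and keeps al unchanged. */
void set_ah(uint8_t value) {
    ax = (uint16_t)((ax & 0x00FFu) | ((uint16_t)value << 8));
}

uint8_t get_al(void) { return (uint8_t)(ax & 0x00FFu); }
uint8_t get_ah(void) { return (uint8_t)(ax >> 8); }
uint16_t get_ax(void) { return ax; }
```

For example, set_al(0x00) followed by set_ah(0x4C) leaves get_ax() equal to 0x4C00, matching mov al, 0h / mov ah, 4ch in the null program.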
A Null Program in x86 Assembly Language

Null Program
The goal here is to craft an x86 assembly language program that assembles, links, loads, and executes correctly, and then does nothing. We set up segments: code, data, and stack; here only the code segment, as the others are empty. Note the 'code' string that identifies the code segment. The assume directive communicates the implied seg portion of seg:offset. The start address (actually an offset) is defined via a label, here label start:. Labels are user-defined identifiers in the code segment, each followed by a colon.

    ; Source:  out1.asm
    ; Purpose: simplest program, no data seg, no stack
    code_s  segment 'code'      ; 'code' identifies segment
            assume cs:code_s    ; implied seg register cs
    ; use of some magic numbers: 0h, 4ch, 21h
    start:  mov al, 0h          ; termination code, same as 0
            mov ah, 4ch         ; to terminate: place 4ch in ah
            int 21h             ; call system sw for help: 21h
    code_s  ends                ; end of code segment, good death
            end start           ; end argument defines start

The program uses operating-system services, here those of DOS: service 4ch terminates the program; the 'h' stands for 'hexadecimal'. Run-time services are requested via INT 21h. The service is selected by the value in register ah, where the 'h' stands for the 'high' byte of the 2 bytes in ax. The return code, passed in al, is 0, meaning: no errors occurred. Comments start anywhere on a line with ; and end at the end of the line. Comment conventions can differ between assemblers! Careful!
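The way INT 21h picks a service by the value in ah can be modelled as a switch statement. The sketch below is hypothetical: struct regs, int21h, and the FILE* output parameter are illustrative stand-ins, not DOS's actual implementation, and only service 4Ch (used above) and service 2 (used by the next program) are covered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register file holding only the byte registers used here. */
struct regs {
    uint8_t ah;  /* service number */
    uint8_t al;  /* return code for service 4Ch */
    uint8_t dl;  /* character for service 2 */
};

/* Sketch of an INT 21h-style dispatcher: ah selects the service and the
   other registers carry its arguments. Returns the program's exit code
   for service 4Ch, or -1 when the program continues running. */
int int21h(const struct regs *r, FILE *out) {
    assert(r != NULL && out != NULL);
    switch (r->ah) {
    case 0x02:                 /* print the character in dl */
        fputc(r->dl, out);
        return -1;
    case 0x4C:                 /* terminate, return code in al */
        return r->al;
    default:
        assert(!"service not modelled in this sketch");
        return -1;
    }
}
```

The design mirrors the text: the single entry point (INT 21h) stays fixed, and the value in ah alone decides what happens.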
The assembler used here is assumed to be a Microsoft product: masm or ml; ml is the newer, compatible tool.

Print Single Character: We Choose '$'

Print Character '$'
The goal is to craft an x86 assembly language program that assembles, links, loads, and executes, for the purpose of printing one character. We also define a data segment and a stack segment, though both remain unused; they are there just for demonstration. An assembler directive defines data, here a single machine word, via dw:

    dw 999              ; reserves 1 word, initialized: 999

And we define an array of 100 machine words via the dup operator:

    dw 100 dup( 0 )     ; defines 100 words, initialized: 0
                        ; but all unused in program below!

    ; Source:  out2.asm
    ; Purpose: simple DOS program to output a character: '$'
    data_s  segment                 ; unused data segment
            dw 999                  ; define a word, init 999
    data_s  ends
    stack_s segment                 ; unused stack segment
            dw 100 dup( 0 )         ; reserve 100 words, init 0
    stack_s ends
    code_s  segment 'code'          ; THE Code Segment
            assume cs:code_s, ds:data_s
    start:  mov ax, seg data_s      ; initialize ds via ax
            mov ds, ax              ; no immediate can be moved into ds
            mov dl, '$'             ; char to print is assumed in dl
            mov ah, 2h              ; call 2h emits char in dl
            int 21h                 ; call OS routine, e.g. DOS
            mov ax, 4c00h           ; termination code in ah + al
            int 21h                 ; terminate finally via call
    code_s  ends                    ; repeat seg name at ends
            end start               ; say: Where to start

Again a special DOS system routine is called for help: INT 21h. The specific argument, communicating which help is needed, must be passed in register ah. The value 2 (AKA 2h) in ah states that character output is desired: OS service routine 2 prints one character, the one found in register dl, here the '$' character. Moving 4c00h into register ax is the same as moving 4ch into register ah and 00h into al. Note that one of the h qualifiers says "hex", while the other says "high"!!
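What dw 999 and dw 100 dup( 0 ) reserve can be illustrated byte by byte. This C sketch assumes 16-bit words and the x86 little-endian byte order (low-order byte at the lower address); store_word and stack_words are illustrative names, and the C array merely stands in for the 200 zero bytes the assembler emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the bytes reserved by "dw 999": one 16-bit word stored
   little-endian. 999 = 03E7h, so memory holds E7h, then 03h. */
void store_word(uint8_t *mem, uint16_t value) {
    assert(mem != NULL);
    mem[0] = (uint8_t)(value & 0xFFu);  /* low-order byte at lower address */
    mem[1] = (uint8_t)(value >> 8);     /* high-order byte next */
}

/* Stand-in for "dw 100 dup( 0 )": 100 words = 200 bytes, all zero.
   Objects with static storage duration start out zeroed in C. */
static uint16_t stack_words[100];
```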
Just to confuse students! 4c00h is simply two byte literals, 4ch and 00h, concatenated.

Printing a Character String

Print String
The goal now is to craft an x86 assembly program that assembles, links, loads, and executes, printing a character string. The data segment defines a string of bytes, initialized to a string literal and identified by the symbol msg. The name msg is a user-defined name for the byte address where the string starts. Note the $ character that ends the string literal; hence the length has no inherent upper limit. The $ serves as the end criterion for system SW routine 9. The stack segment here is solely a dummy segment: it holds 10 unused strings, each of length 16, solely for demonstration.

    ; Source:  out3.asm
    ; Purpose: simplest program to output character string
    data_s  segment
    msg     db "Hello CCUT class$"      ; was done in China!
    data_s  ends
    stack_s segment                     ; unused
            db 10 dup( "---S t a c k----" )
    stack_s ends                        ; repeating name stack_s OK
    code_s  segment 'code'
            assume cs:code_s, ds:data_s
    start:  mov ax, seg data_s          ; silly detour via ax
            mov ds, ax                  ; ds points to data_s
            mov dx, offset msg          ; System SW prints string at ds:dx
            mov ah, 9h                  ; sys call 9h emits string
            int 21h                     ; call OS routine
            mov ax, 4c00h               ; term code in ah + al
            int 21h                     ; term finally via call
    code_s  ends                        ; label seg name at ends
            end start                   ; start here!
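The end criterion of system SW routine 9 can be sketched as a loop that copies characters until it meets '$'. The function dos_print_string and its output buffer are assumptions of this model: the real service writes to standard output and performs no capacity check, and a string missing its '$' would make it run on through memory.

```c
#include <assert.h>
#include <stddef.h>

/* Model of DOS service 9 (ah = 9): copy characters starting at msg until
   a '$' is found; the '$' itself is not output. Output goes to a
   caller-supplied buffer instead of the screen. Returns the count. */
size_t dos_print_string(const char *msg, char *out, size_t cap) {
    size_t n = 0;
    assert(msg != NULL && out != NULL && cap > 0);
    while (msg[n] != '$') {
        assert(n + 1 < cap);  /* keep room for the C terminator */
        out[n] = msg[n];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

For example, dos_print_string("Hello CCUT class$", buf, sizeof buf) copies 16 characters, while dos_print_string("Cost: $5$", buf, sizeof buf) stops after "Cost: " and returns 6, which is why a '$' cannot be printed through service 9.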
Yup: Microsoft!

Print String
System SW routine 9 emits a character string to the standard output file; note that 9 is the same as 9h. It finds the string's start address at ds:offset, with the offset communicated in register dx. Note the built-in offset operator of the assembler, applied to a data label, here label msg. The assembler also provides the built-in seg operator to generate the other part of the final address.

INT Function
The x86 INT instruction, AKA interrupt, is not what computer science usually calls an interrupt, i.e. an asynchronous event signaled by hardware. Instead, it is a call to a low-level system SW routine. The routine is selected by the single-byte operand of INT (here 21h); the requested service is further specified by the value residing in the ah register. The actual system SW executed as a result of INT depends on the operating system under which the x86 code executes; here Microsoft DOS. Thus it may be different on a Linux, Windows, or other Unix system.

Assembler Abbreviations
The assembler directive .model small allows certain default abbreviations and assumptions. For example .data, .code, .stack, and @data are predefined in Microsoft assemblers, as are the corresponding assume statements. Here another string is printed; that string is "Hello". Note again the $ terminator, which must be supplied. The character $ has different meanings in different contexts: inside a DOS string it is merely the terminator recognized by service 9, while in assembler expressions (e.g. nasm under Linux) $ means "current code address", i.e. the location counter. Under Microsoft assembler SW, the symbol @data is predefined by ml (or masm), same as seg data. Note again the offset operator, which computes the byte distance (i.e. offset) from the start address of the segment.

    ; Source file: out4.asm
    ; Purpose:     simpler program to output string
            .model small        ; assumes stack data code
            .stack 10h          ; assumes segment name: stack
            .data               ; assumes segment name: data
    hi      db "Hello$"
            .code               ; assumes segment name: code
    start:  mov ax, @data       ; @data predefined symbol
            mov ds, ax          ; now data segment reg set
            mov dx, offset hi   ; string 2 b output by System SW
            mov ah, 9h          ; System SW 9h emits string
            int 21h             ; call System SW
            mov ax, 4c00h       ; we want to terminate: ah + al
            int 21h             ; terminate finally
            end start           ; start here!

Note again System SW routine 9 under Microsoft system SW, which outputs a $-terminated string of characters whose address is found in ds:dx. The program using the .model small abbreviation is smaller, more compact, and easier to read. The .code directive ends the previous segment, if any (here data), and starts the code segment. The .data directive ends the previous segment, if any, and starts the data segment. So how does one output a string containing the '$' character?
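Service 9 finds its string at ds:dx, a seg:offset pair. The C sketch below models how real-mode hardware translates such a pair into a 20-bit linear address, reusing the numbers from the Motivation slides (segment 2030h, offset 248h); linear_address and segment_value are illustrative names, not part of any assembler or DOS interface.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode x86 address translation: the 16-bit segment value is
   shifted left by 4 bits (multiplied by 16) and the 16-bit offset is
   added, giving a 20-bit linear address. The original 8086 had 20
   address lines, so sums beyond 1 MB wrap around; the mask models that. */
uint32_t linear_address(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Segments start on 16-byte (paragraph) boundaries, so the four
   low-order bits of a segment's start address are always 0 and the
   remaining 16 bits fit in a segment register such as ds or cs. */
uint16_t segment_value(uint32_t segment_start) {
    assert(segment_start % 16 == 0);    /* paragraph aligned */
    assert(segment_start <= 0xFFFF0u);  /* within the 1 MB address space */
    return (uint16_t)(segment_start >> 4);
}
```

With these definitions, linear_address(0x2030, 0x0248) is 0x20548, the run-time address of label next computed earlier.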
Macros
Programmers get tired of writing segment … ends; the .model small directive allows defaults and abbreviations. Macros make program source more readable and easier to maintain; here are the rules:
- Macros can be defined anywhere in assembler source, as long as a definition precedes the first use of the macro.
- The initial assembler translation process extracts all macro definitions, stores them during assembly, and uses (expands) them each time a macro name is found in the asm source.
- A macro is introduced by a user-defined name and the macro keyword, and terminated by the endm keyword.

    ; Source file: out5.asm
    ; Purpose:     macro-ized program to output character string
    start   macro               ; no parameters
            mov ax, @data       ; @data predefined symbol
            mov ds, ax          ; now data segment reg set
            endm                ; end of start macro

    Put_Str macro Str           ; one formal parameter, "Str"
            mov dx, offset Str  ; string 2 b output by DOS
            mov ah, 9h          ; DOS call 9h emits string
            int 21h             ; call system SW
            endm                ; end of Put_Str macro

    Done    macro ret_code      ; formal parameter "ret_code"
            mov ah, 4ch         ; want to terminate, ah = 4c
            mov al, ret_code    ; communicate: all is o.k.
            int 21h             ; terminate finally via DOS
            endm                ; end of macro body: Done

Macros, Program Cont'd

            .model small        ; allow predefined assumptions
            .stack 10h          ; assumes segment name: stack
            .data               ; assumes segment name: data
    hi      db "Hello$"         ; terminate string with $
            .code               ; assumes segment name: code
    main:                       ; compare to out3.asm! Way shorter!
            start               ; use of macro "start"
            Put_Str hi          ; invoke macro "Put_Str" with hi
            Done 0              ; use of macro "Done"
            end main            ; start here!
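The expansion rule, in which each formal parameter in the body is replaced by the actual parameter at the place of use, can be sketched in C. The function expand is hypothetical and deliberately minimal: one formal parameter, whole-word and case-sensitive matching; a real macro assembler handles several parameters and further cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* True for characters that may appear inside an identifier. */
static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Copy the macro body to out, replacing every whole-word occurrence
   of the formal parameter with the actual parameter. */
void expand(const char *body, const char *formal, const char *actual,
            char *out, size_t cap) {
    size_t flen = strlen(formal), alen = strlen(actual), n = 0;
    assert(flen > 0 && cap > 0);
    for (const char *p = body; *p != '\0';) {
        int at_word_start = (p == body) || !is_ident_char(p[-1]);
        if (at_word_start && strncmp(p, formal, flen) == 0 &&
            !is_ident_char(p[flen])) {
            assert(n + alen < cap);      /* room for actual + terminator */
            memcpy(out + n, actual, alen);
            n += alen;
            p += flen;                   /* skip the formal's name */
        } else {
            assert(n + 1 < cap);
            out[n++] = *p++;             /* copy other text unchanged */
        }
    }
    out[n] = '\0';
}
```

For example, expand("mov dx, offset Str", "Str", "hi", buf, sizeof buf) produces "mov dx, offset hi", the line that Put_Str hi contributes to the expanded program.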
Macros
Macros specify 0 or more formal macro parameters, which can be referenced in the macro body. At the place of macro definition these parameters are called formal parameters; they follow the macro keyword. At the place of use (the place where the macro is expanded), formals are substituted by actual parameters: when the macro name is used, its body is expanded inline at that place, with all actual parameters taking the place of the formal ones.

Assembler Procedures: Like High-Level Language Procedures

Procedures
An assembler procedure is delimited by proc and endp. Procedures can be called, and they provide a syntactic grouping mechanism to form physical modules containing logically connected actions. The Microsoft syntax rule for procedure names does not allow the : used after labels. The return instruction ret, typically the last instruction of a procedure body, returns control to the place of call, immediately after the call instruction. Physical procedure definitions allow logical modularization.

    ; Source file: out6.asm
    ; Purpose:     modular macro program to output string
    start   macro               ; no parameters
            mov ax, @data       ; @data predefined symbol
            mov ds, ax          ; now data segment reg set
            endm                ; end of "start" macro body

    Put_Str macro Str           ; "Str" must be data label
            . . .
            .data               ; assumes name: data
    hi      db "Hello$"         ; terminate string with $
            . . .
            .code               ; assumes name: code
    main    proc                ; begin of procedure body
            start               ; invoke "start" macro
            Put_Str hi          ; invoke "Put_Str" with actual hi
            Done 0              ; invoke "Done" with actual 0
            ret                 ; redundant return
    main    endp
            end main            ; entry point is "main"

Procedures
As in high-level language programs, procedures are a key syntactic tool for modularization, and they ease the pain of asm programming. Physical modules (procedures) encapsulate data and actions that belong together. Physical modules, delineated by the proc and endp keywords, are the language tool to define such logical modules. Net result: programs that are easier to write and, above all, easier to read.

Assembly and Linking of Full Programs

Assembly
Linking is the process of binding 2 or more pieces of software together so that they constitute one running program. Clearly the start address, where execution begins, must be defined, by convention. Typical tools to assemble and link include:
1. Microsoft Macro Assembler masm
2. Borland Turbo Assembler tasm
3. Microsoft Macro Assembler ml
4. Microsoft Linker link
5. Borland Turbo Linker tlink

Assembly With MASM
The old version of the Microsoft macro assembler is named masm; a newer assembler from Microsoft is named ml. This section explains the masm command briefly. The masm command in version 5.10 and older has 4 arguments, separated from one another by commas.
These arguments are file names. An argument is considered omitted if no comma (and thus no file name) is given for it. The assembler prompts for each omitted one, so it is generally better to provide them, or at least the commas; otherwise the assembler will interact repeatedly, asking for file names or for carriage returns.

It is a nuisance in masm 5.10 that the last comma (the third one, separating the 4 arguments) must be followed by another comma or by a semicolon (indicating the end of the command line); else the assembler does not recognize that the default should be used for the fourth argument. If commas without file names are given, default file names are assumed. The four file names that are the arguments of the masm command are, from left to right:
1. the assembly source program, e.g. source.asm
2. the object program generated by the assembler, e.g. source.obj
3. the listing generated by the assembler, say source.lst; yes, in days of old people actually created paper listings of the programs being processed
4. the cross-reference file, named source.crf

The suffixes obj, lst, and crf are supplied automatically by the assembler if no other names are provided. Some complete masm commands for the assembler file src1.asm are:

    masm src1.asm, src.obj, src.lst, src.crf   ; no prompting
    masm src1,src1,src1,src1                   ; no prompting
    masm src1,src1.obj,src1,src1.crf           ; no prompting
    masm src1,,,;                              ; no prompting

In these cases the assembler does not prompt, because all file names are provided or defaulted; it is smart enough to derive the suffixes (like .lst and .obj) from the respective positions.

Some incomplete masm commands for source file src2.asm are shown next; the assembler prompts the user for the missing names:

    masm src2.asm, src2.obj    ; asks for: list, cross ref file (xref)
    masm src2,foo,src2         ; creates foo.obj, src2.lst, asks xref
    masm src2,,bar.lst         ; creates src2.obj, bar.lst, asks xref
    masm src2                  ; asks for object, list, cross ref file

Borland Turbo Assembler tasm 5.10: similar to masm, but the command is tasm.

Linking Assembler Programs

Linking
The Microsoft link command also has 4 arguments: the object files and a library are inputs, the executable and a load map are outputs. The first argument names the object to be linked; it may be a concatenation of multiple object files, typically ending in the .obj suffix, joined by the + operator. For example:

    link mem0 + putdec,,,

creates an executable mem0.exe. The file name mem0 is derived from the first part of the first argument; the suffix .exe is assumed. The object file putdec.obj is also used as input, to resolve external names used in mem0.obj.

The 4 file-name arguments of the link command are:
1. object files, concatenated by +, with default suffix .obj
2. the linked executable, with suffix .exe
3. the load map file, whose name ends in .map
4. the library

If an input file is provided without a suffix, then .obj is assumed. If the executable file is specified without a suffix, then .exe is assumed. Any other file name and suffix is allowed too.

The file for the load map can be specified. If none is provided, the linker uses the file name nul, i.e. no map is written. If no file suffix is provided, the .map suffix is assumed. Similarly, a file name can be specified for the library; its suffix is .lib. The commands below do not cause the linker to prompt for additional file names, because sufficient information is provided:

    link mem0 + putdec,,,,         ; mem0.exe, no map, no library
    link mem0+putdec,foo.bar,,,    ; generate executable foo.bar
    link putdec+mem0,mem0.exe,,,   ; mem0.exe

The concatenation operator + may be surrounded by any number of blanks; commas may be surrounded by 0 or more blanks. The order in which object files are specified is immaterial, provided the main entry point is unambiguous. The commands below cause the linker to prompt for some additional information:

    link mem0 + putdec         ; asks for executable, map, and library
    link mem0+putdec,x.y       ; asks for map and lib
    link putdec+mem0,,         ; gen putdec.exe, ask for map and lib

Main Entry Point
Each assembly unit (.asm source file) must end with an end directive (in MS terminology AKA the end statement). This end statement may carry an argument, one of the labels or proc names of the program. Such a label specifies the entry point, i.e. the initial value of ip, set by the loader. However, if an executable file is composed of multiple objects, there must be one single entry point; all other source modules should not specify an argument after their end statement. If, however, two or more object modules linked into one executable do specify entry points, the linker does not complain! Instead, it takes the entry point of the first of these objects listed in the first argument of the link command.
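The positional default-name rules described above for masm and link can be sketched as one C function. default_name is a hypothetical helper modelling the behaviour as these notes describe it, not Microsoft's code; it ignores directory paths and does not model the nul default of the load map.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Positional default-name rule (as described in these notes):
   - an explicit name that already has a suffix is used as given;
   - an explicit name without a suffix gets the positional suffix;
   - an omitted name (NULL or "") is derived from the base name of the
     first input file plus the positional suffix. */
void default_name(const char *given, const char *first_input,
                  const char *suffix, char *out, size_t cap) {
    assert(first_input != NULL && suffix != NULL && out != NULL && cap > 0);
    int has_given = given != NULL && given[0] != '\0';
    const char *src = has_given ? given : first_input;
    const char *dot = strrchr(src, '.');
    size_t base_len = dot != NULL ? (size_t)(dot - src) : strlen(src);
    int n;
    if (has_given && dot != NULL)
        n = snprintf(out, cap, "%s", given);                /* e.g. bar.lst */
    else
        n = snprintf(out, cap, "%.*s%s", (int)base_len, src, suffix);
    assert(n > 0 && (size_t)n < cap);   /* result fit into the buffer */
}
```

For example, default_name("", "src1.asm", ".obj", ...) gives "src1.obj", as for masm src1,,,;, and default_name(NULL, "putdec", ".exe", ...) gives "putdec.exe", as for link putdec+mem0,,.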
And if this is not the intended entry point, program execution will bring surprises.

nasm Assembler
Below is the simplest possible meaningful asm program that outputs a character string, assuming translation via the nasm command (the Netwide Assembler):
1. ; introduces a comment, until the end of the source line
2. %define macro_name value: the value is substituted whenever the macro name is found
3. the section directive defines one of various data sections, or the code section
4. mov is the instruction to move bits into the register or memory operand on the left, from the source on the right
5. the $ pseudo-operator means: current value of the location counter
6. int 80h is an x86 instruction that calls the Linux kernel, which uses the GPRs to determine what to do (eax holds the system call number; ebx, ecx, and edx hold its arguments)

    ; Asm:  Netwide Assembler (nasm)
    ; Note: uses Linux system calls, not Microsoft!

    ; Define convenient symbolic names for Linux system calls
    %define __NR_exit    1          ; symbolic names system dependent
    %define __NR_write   4          ; 4 for output under Linux
    %define STDOUT_FILE  1          ; 1 for standard out under Linux

    section .data                   ; Other section names: .rodata and .bss
                                    ; have specific, and distinct, meanings
    message: db "Hello CCUT class"
    msglen   equ $ - message        ; # bytes in message

    section .text                   ; All executable code is in the .text section
    global _start                   ; required to announce name "_start" to linker
    _start:                         ; used by linker; similar to "main()" in C

            ; Display the string on stdout
            mov eax, __NR_write     ; system call number for write
            mov ebx, STDOUT_FILE    ; write string to stdout
            mov ecx, message        ; address of string
            mov edx, msglen         ; number of bytes to write
            int 80h                 ; call Linux

            ; Exit the program
            mov eax, __NR_exit      ; system call number for exit
            mov ebx, 0              ; exit status 0: "success"
            int 80h                 ; call Linux

Summary
Comments are introduced by a semicolon (;). The .model directive tells the assembler which memory model is to be used, and pulls in predefined macros. .stack is one such predefined directive; it tells the assembler: use of a stack is included in this program!
The leftmost column is used for optional labels. Labels are symbolic names you can refer to in the source; they ease relocation. The next column is used for instructions or pseudo instructions (directives); if no label is used, the first string is the asm command. data_s is a symbolic name chosen to name a data segment. A string literal is defined by embedding it between a pair of double quotes, e.g. "Hello ECE class"; remember the '$' terminator for DOS output.

The ends directive says: end of segment; a segment of the same name may be opened and closed again any number of times. The assume directive tells the assembler which segments the cs and ds registers refer to; it does not load them, which is why ds is set explicitly via ax. The segment 'code' directive defines the code segment. mov is the instruction to move bits to or from a register or memory, or (as source) from a literal. An instruction such as mov dx, offset msg uses only the offset part of the label's seg:offset address. The int 21h instruction is an x86 software interrupt (really a system call) that uses other registers to determine what to do. The end start directive says: start execution at the label named start.

Appendix: Some Definitions

Address
Identity of any one of the distinguishable memory units, e.g. bytes or words. On the x86 architecture a logical address is a pair seg:offset, which the hardware translates into a linear address. In real mode the segment and the offset are 16 bits long each. The machine address, called a linear address, is 20 bits long on the original x86 microprocessor. Since the 1980s Intel has produced the more famous 32-bit version of its x86 μP, and since the 2000s the 64-bit version has become common.

Assembler
A source-to-object translator that reads relocatable, abstract, machine-specific source programs and translates them into binary object code. After linking, the binary code is executable.

Binary Object
Strings of bits which, when interpreted by the target machine, are legal machine operations plus associated memory references. Jointly, these bit strings represent executable programs.

Code Segment
A subsection of memory that holds executable instructions. It is possible to embed data (e.g. constants) in the code segment, but such data are not meant for execution; generally they are skipped by a branch around the embedded data. On the x86 microprocessors, the start address of the code segment is identified by the cs register. A complete program comprises one or more code segments.

Data Segment
A subsection of memory that holds data to be manipulated. Like any segment, a data segment is identified by a segment register holding its start address (in real mode, the start address divided by 16). Such an address must be evenly divisible by 16 on the x86 family processors; such aligned addresses are also the starts of paragraphs.

Offset
Byte distance of a named object (addressable unit) from the beginning of an area that encompasses the named object.

Relocation, Relocatability
Ability of data to be placed in any location of memory. For example, referring to data (or object code) by offsets relative to some start address allows the code to be placed anywhere, as long as the respective start address is always added at execution time. Even object code can be relocatable, if all address references in that code are relative to the code's start address.

Segment
Subsection of memory with no fixed length; in real mode a segment spans at most 64 KB, since offsets are 16 bits long. A segment is identified by a segment register and holds code, data, or stack space.

Stack
Data structure holding data, identified by the stack segment register (ss). Access to these data is restricted in a specific way, often referred to as last-in, first-out. The amount of live data varies over time: data are added by an operation called pushing and removed by one called popping. The stack segment register points to the beginning of the stack segment, while the stack pointer register (sp) points to the current top. This top (i.e. the value of the sp register) varies frequently during execution.

Top of Stack
The element on the stack that is immediately accessible, AKA addressable. That element is said to be "at the top". There may be other elements in the stack as well, hidden by the top element. Additional elements are created by pushing, and elements are removed by popping. If the stack is empty and the top element is accessed, an error occurs.
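A 16-bit x86-style stack can be sketched in C as an array indexed by a stack pointer. The sketch assumes the x86 convention that push first decrements sp by 2 and then stores a word, so the stack grows toward lower offsets; stack_seg, sp, push, pop, and top are illustrative names. The asserts stand in for the error described under Top of Stack; in real mode the processor itself performs no such check.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 100                /* like: dw 100 dup( 0 ) */

static uint16_t stack_seg[STACK_WORDS];      /* the stack segment */
static uint16_t sp = STACK_WORDS * 2;        /* empty: just past last word */

/* push: decrement sp by one word (2 bytes), then store at the new top. */
void push(uint16_t value) {
    assert(sp >= 2);                   /* overflow: no room left */
    sp -= 2;
    stack_seg[sp / 2] = value;
}

/* pop: load the top word, then increment sp by 2. */
uint16_t pop(void) {
    assert(sp < STACK_WORDS * 2);      /* underflow: stack is empty */
    uint16_t value = stack_seg[sp / 2];
    sp += 2;
    return value;
}

/* top: read the top element without removing it. */
uint16_t top(void) {
    assert(sp < STACK_WORDS * 2);      /* accessing top of empty stack */
    return stack_seg[sp / 2];
}
```

After push(1) and push(2), top() returns 2; the two pops then return 2 and 1, the last-in, first-out order, and sp is back at its initial value of 200.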