Assembly Language
http://iescobar.com
Msc. Ivan A. Escobar Broitman
January-May 2012

CHAPTER 1
Introduction

Microprocessor
  - Silicon chip that contains a central processing unit (CPU).
  - The "brain" of all personal computers, most workstations, and a great number of digital devices.
  - In charge of program execution.
  - It can be RISC or CISC.

Bus Connections
  [Diagram: the CPU, memory and I/O are connected by the address bus, the data bus and the control bus.]

Bus Connections (continued)
  - A processor communicates with the system's memory and I/O circuits by means of signals that travel through a set of wires or connections known as buses.
  - Address bus: holds the memory address that will be accessed.
  - Data bus: holds the piece of data to read or write.
  - Control bus: indicates the operation to be done (read or write).

CPU Instructions
  Each instruction has:
  - an opcode (operation code), which indicates which operation to perform.
  - zero or more operands, which may be registers, constants or memory locations.

Fetch-Execute Cycle
  Fetch:
    1. Fetch an instruction from memory.
    2. Decode the instruction to determine the operation.
    3. Fetch data from memory if necessary.
  Execute:
    4. Perform the operation on the data.
    5. Store the result in memory if needed.

RISC: Reduced Instruction Set Computer
  - Microprocessor that uses a relatively small number of fast but simple instructions.
  - Cheaper to design and produce because it requires fewer transistors.
  - Mainly used in workstations.

CISC: Complex Instruction Set Computer
  - Microprocessor that uses a large number of complex (specialized) instructions.
  - Mainly used for Intel's x86 architecture.

Programming Languages
  [Diagram: layers from top to bottom: High Level Language, Assembly Language, Machine Code, Hardware.]

Machine Code
  - Lowest level programming language.
  - Each CPU instruction is represented as an opcode, which is an unsigned integer number.
  - Only language that the computer really understands.
  - Difficult for human beings to understand.
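
The fetch-execute cycle and the idea that opcodes are just unsigned integers can be tried out on a toy machine. The following is only a sketch: the four opcodes (HALT, LOAD, ADD, STORE), the one-byte accumulator and the two-cell instruction format are hypothetical and are not x86 encodings.

```python
# Hypothetical opcodes for a toy accumulator machine (not x86 encodings).
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory):
    """Execute the program stored in `memory` and return the final memory."""
    pc = 0    # address of the next instruction
    acc = 0   # accumulator register
    while True:
        opcode, operand = memory[pc], memory[pc + 1]   # 1. fetch the instruction
        pc += 2
        if opcode == HALT:                             # 2. decode it
            return memory
        if opcode == LOAD:
            acc = memory[operand]                      # 3. fetch data from memory
        elif opcode == ADD:
            acc = (acc + memory[operand]) & 0xFF       # 4. perform the operation
        elif opcode == STORE:
            memory[operand] = acc                      # 5. store the result
        else:
            raise ValueError(f"unknown opcode {opcode}")


# Program: load memory[8], add memory[9], store the sum in memory[10], halt.
program = [LOAD, 8, ADD, 9, STORE, 10, HALT, 0, 5, 7, 0]
print(run(program)[10])  # prints 12
```

Code and data share the same list here, which mirrors the way a real CPU sees memory: only the program counter decides which numbers are treated as opcodes.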
Machine Code Example
  The opcode for adding one to the accumulator in the Intel x86 is:
      01000000b or 0x40

Assembly Language
  - Same instruction set as machine code.
  - Each opcode is replaced by a symbolic name.
  - Less cryptic for human beings.

Assembly Language Example
  The Intel x86 assembly language instruction that adds one to the accumulator is:
      inc eax

Assembler
  In order to execute a program written in assembly language, it first has to be translated to machine code using a special program called an assembler.
      inc eax  ->  Assembler  ->  0x40

High Level Language
  - Its instructions are less primitive than those of assembly language and machine code.
  - Program text is much more like natural language.
  - Easier for human beings to understand.
  - Examples: FORTRAN, LISP, COBOL, BASIC and C.

Compiler
  A program written in a high level language may be translated to machine code using a compiler.
      if(x == 0) x = x + 5;
  Compiler output (assembly language) and the machine code the assembler produces for it:
          cmp esi, 0          0x81FE00000000
          jne .L1             0x7506
          add esi, 5          0x81C605000000
      .L1:

Interpreter
  An interpreter translates a high level language program to an intermediate form that is subsequently executed by a virtual machine.
      IF X = 0 THEN X = X + 5
  [Diagram: inside the interpreter, a translator produces the intermediate form, which is run by the virtual machine.]

Assembly Language Advantages
  - Program execution speed.
  - Executable code size.
  - "Bare bones" programming:
      - special instructions (FPU, MMX)
      - I/O ports
      - special CPU modes of operation

Assembly Language Disadvantages
  - Error prone.
  - Long and tedious to write.
  - Difficult to understand and modify.
  - Strongly tied to a specific computer architecture.

Commonly Used Assembly Language Applications
  - Operating Systems
  - Device Drivers
  - Communication Software
  - Real Time Systems
  - Embedded Systems
  - Graphics

Reasons for Studying Assembly Language
  - To understand some of the low level details of how a real computer operates.
  - To get to know some technologies that can only be adequately understood using assembly language.
  - To obtain a better appreciation of the inner workings of a compiler.
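
As a recap of the assembler step from this chapter (inc eax becoming 0x40), here is a minimal sketch of an assembler that understands a single instruction. The function name assemble_inc is hypothetical. The register numbering and the "0x40 plus register number" rule are not in the slides; they are the one-byte INC encoding used in 32-bit mode.

```python
# Register numbers used by the one-byte "inc r32" encoding in 32-bit mode.
REGISTER_NUMBERS = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
}


def assemble_inc(line):
    """Translate a line such as 'inc eax' into its machine code bytes."""
    parts = line.lower().split()
    if len(parts) != 2 or parts[0] != "inc" or parts[1] not in REGISTER_NUMBERS:
        raise ValueError(f"cannot assemble: {line!r}")
    # Symbolic name in, unsigned integer opcode out.
    return bytes([0x40 + REGISTER_NUMBERS[parts[1]]])


print(assemble_inc("inc eax").hex())  # prints 40
```

A real assembler does the same kind of table lookup for every mnemonic and operand combination, and also resolves labels to addresses.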
What's next?
  [Diagram relating the Assembly Language course to the Programming Languages course (Computer Science, ISC) and the Microprocessors course (Computer Engineering, ISE).]

CHAPTER 2
The Intel x86 Architecture

Moore's Law
  In 1965, Intel's co-founder Gordon Moore made the following observation:
  approximately every 18 months microchips double their power, while their cost stays roughly the same.

Intel Processors
  [Chart: transistor count (10K to 10M) against year (1970 to 2000) for the 4004, 8080, 8086, 80286, 80386, 80486, P5, P6 and P7.]

4004 (1971)
  - First microprocessor.
  - Built by Intel for Busicom calculators.
  - 4-bit registers.
  - 108 kHz.
  - 2,300 transistors.
  - 640 bytes of memory.

8080 (1974)
  - Used in the MITS Altair 8800, the first commercial personal computer.
  - 8-bit registers.
  - 16-bit address bus.
  - 2 MHz.
  - 6,000 transistors.
  - 64 KB of memory.

8086/8088 (1978)
  - Used in the original IBM PC.
  - Intel's first 16-bit microprocessor.
  - 20-bit address bus.
  - 16-bit (8086) and 8-bit (8088) data bus.
  - 4.77+ MHz.
  - 29,000 transistors.
  - Addressable memory 1 MB.

80286 (1982)
  - Used in the original IBM PC/AT.
  - 24-bit address bus.
  - 16-bit data bus.
  - 6+ MHz.
  - 134,000 transistors.
  - Multitasking, protected mode and virtual memory.
  - Addressable memory 16 MB.

80386 (1985)
  - 32-bit registers.
  - 32-bit address bus.
  - 32-bit data bus.
  - Pipelining.
  - 16+ MHz.
  - 275,000 transistors.
  - Addressable memory 4 GB.

P4: 80486 (1989)
  - Better execution speed.
  - Integrated floating point unit (FPU).
  - 8 KB L1 cache.
  - 25+ MHz.
  - 1,200,000 transistors.
  - Addressable memory 4 GB.

P5: Pentium (1993)
  - 64-bit data bus.
  - 8 KB L1 cache for data and 8 KB for code.
  - Dual pipeline for integer operations.
  - 60+ MHz.
  - 3,100,000 transistors.
  - Addressable memory 4 GB.

P6: Pentium Pro (1995)
  - 36-bit address bus.
  - 256 KB L2 cache.
  - Superpipelining.
  - Speculative and out-of-order execution.
  - 150+ MHz.
  - 5,500,000 transistors.
  - Addressable memory 64 GB.
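
The "addressable memory" figures on the processor slides follow directly from the width of the address bus: n address lines can select 2^n distinct bytes. A small sketch (the function names are hypothetical) that reproduces the figures quoted for the 8080, 8086, 80286, 80386 and Pentium Pro:

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address bus of this width can select."""
    return 2 ** address_bits


def human_readable(num_bytes):
    """Format a power-of-two byte count using binary units (1 KB = 1024 bytes)."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    index = 0
    while num_bytes >= 1024 and index < len(units) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {units[index]}"


for name, bits in [("8080", 16), ("8086", 20), ("80286", 24),
                   ("80386", 32), ("Pentium Pro", 36)]:
    print(f"{name}: {bits}-bit address bus -> {human_readable(addressable_bytes(bits))}")
```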
P55C: Pentium MMX (1997)
  - Classic Pentium with MMX technology: 64-bit SIMD multimedia and communication extensions.
  - 16 KB L1 cache for data and 16 KB for code.
  - 166+ MHz.
  - 4,500,000 transistors.
  - Addressable memory 4 GB.

Klamath: Pentium II (1997)
  - Pentium Pro with MMX technology.
  - 16 KB L1 cache for data and 16 KB for code.
  - 512 KB L2 cache.
  - 233+ MHz.
  - 7,500,000 transistors.
  - Addressable memory 64 GB.

New P6 processors
  Pentium II Xeon ("Pentium II on steroids")
    - L2 cache runs at full processor speed.
    - Designed for the computer server market.
  Celeron ("the Castrated One")
    - Pentium II with no L2 cache.
    - Designed for the sub-$1,000 PC market.

Katmai: Pentium III (1999)
  - Pentium II with a 128-bit SIMD floating-point-oriented extension to the MMX technology.
  - Processor serial number in order to "enhance security".
  - 450+ MHz.
  - Addressable memory 64 GB.

Pentium IV (2000)
  - 0.18-micron process.
  - 42 million transistors on a single chip.
  - 1.4 to 3.0 GHz.
  - Bus speed 400 MHz.

Merced: Itanium (2000)
  - Intel Architecture-64 (IA-64).
  - Developed jointly by Intel and Hewlett-Packard.
  - Hardware x86 emulation.
  - Not RISC or CISC, but EPIC (Explicitly Parallel Instruction Computing).
  - 600 MHz and 1,000 MHz.
  - Tens of millions of transistors.

x86 Basic Structure
  [Diagram: Code Cache, Branch Predictor, Prefetch & Decode Unit, Bus Interface (to RAM), Execution Unit (Integer ALU, Registers), Floating Point Unit, Data Cache.]

x86 Basic Structure (continued)
  - Execution Unit: two parallel integer pipelines enable the CPU to read, interpret, execute and dispatch two instructions simultaneously.
  - Branch Predictor: the branch prediction unit tries to guess which sequence will be executed each time the program contains a conditional jump, so that the Prefetch and Decode Unit can get the instructions ready in advance.
x86 Basic Structure (continued)
  - Floating Point Unit: third execution unit, where non-integer calculations are performed.
  - Primary Cache: two on-chip caches, one for code and one for data, which are far quicker than the external memory.
  - Bus Interface: brings a mixture of code and data into the CPU, separates the two ready for use, and then recombines them and sends them back out.

x86 Modes of Operation
  The operating mode determines which instructions and architectural features are accessible. The Intel Architecture supports three operating modes:
  - Real Mode
  - Protected Mode
  - Virtual-8086 Mode

Real Mode
  - Mode in which all x86 processors boot.
  - The CPU works like a very fast 8086.
  - Can only access up to 1 MB of memory.
  - Only one task is executed at a time.

Real Mode (continued)
  - In real-address mode, the IA-32 processor can access 1 MB of memory using 20-bit addresses in the range 0 to FFFFF hex.
  - The basic problem that Intel engineers had to solve was that the original 8086 processor had only 16-bit registers, so it was impossible to represent a 20-bit address directly.
  - They came up with a scheme known as segmented memory: all memory is divided into 64 KB units called segments.
  [Figure: memory divided into segments.]

Real Mode (continued)
  - An analogy might be a large building: segment = a floor, offset = a room on that floor.
  - Example: 8000:0250 represents an offset of 250h in segment 8000h. The segment starts at linear address 80000h; the last zero is dropped when the segment value is written.
  - To calculate the linear address (all values in hexadecimal):
        segment x 10 + offset
        8000 x 10 + 250 = 80250

Real Mode (continued)
  A typical program has three segments:
  - Code (CS)
  - Data (DS)
  - Stack (SS)

Protected Mode
  - Allows multitasking.
  - Each program has its own memory, protected from other programs.
  - Extended memory: more than 1 MB of memory available.
  - Supports virtual memory.

Protected Mode (continued)
  - When a processor is running in protected mode, each program can address up to 4 GB of memory.
  - It uses the flat memory model.
  - It only requires a 32-bit integer to hold the address of any instruction or variable.
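
The real-mode segment:offset calculation from the Real Mode slides (segment x 10h + offset) can be checked with a few lines of code. This is a sketch: the function name is hypothetical, and the final mask to 20 bits models the 8086's 20-bit address bus, which is an assumption the slides do not spell out.

```python
def linear_address(segment, offset):
    """Real-mode linear address for a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must fit in 16 bits")
    # Multiplying by 10h is the same as shifting left by four bits.
    # Assumption: results wrap to 20 bits, as on the original 8086.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(linear_address(0x8000, 0x0250)))  # prints 0x80250
print(hex(linear_address(0x8025, 0x0000)))  # prints 0x80250
```

The second call shows that the mapping is not unique: many different segment:offset pairs refer to the same byte of memory.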
Protected Mode (continued)
  A typical program has three segments:
  - Code (CS)
  - Data (DS)
  - Stack (SS)

Virtual-8086 Mode
  Allows simultaneous execution of two or more programs designed to work in real mode, each program having up to 1 MB of independent memory.

Registers
  A register is a special high-speed storage area within the CPU. The x86 processors have several registers available for the application programmer, grouped as follows:
  - General-purpose data registers.
  - Segment registers.
  - Status and control registers (EIP and EFLAGS registers).

General-Purpose Data Registers
  These eight 32-bit registers are available for holding the following data items:
  - Integer operands for logical and arithmetic operations.
  - Pointers (memory addresses).

General-Purpose Data Registers (continued)
  32-bit         16-bit        8-bit          8-bit         Name
  (bits 31-0)    (bits 15-0)   (bits 15-8)    (bits 7-0)
  eax            ax            ah             al            Accumulator
  ebx            bx            bh             bl            Base
  ecx            cx            ch             cl            Count
  edx            dx            dh             dl            Data

General-Purpose Data Registers (continued)
  32-bit         16-bit        Name
  (bits 31-0)    (bits 15-0)
  esp            sp            Stack Pointer
  ebp            bp            Base Pointer
  esi            si            Source Index
  edi            di            Destination Index

Segment Registers
  - The six segment registers hold 16-bit segment selectors.
  - A segment selector points to a special structure in memory called a segment descriptor.
  - Several segment descriptors are grouped together into a descriptor table.
  - A segment descriptor contains addressing and control information which is used to control how a 32-bit linear address is generated.

Segment Registers (continued)
  Register (16 bits)   Name
  cs                   Code Segment
  ds                   Data Segment
  es                   Extra Segment
  fs                   Extra Segment
  gs                   Extra Segment
  ss                   Stack Segment

Segment Registers (continued)
  [Diagram: the segment register holds a segment selector; the selector picks one segment descriptor out of the descriptor table; the descriptor describes the memory segment.]
  Segment information held in a segment descriptor:
  - Base address
  - Size
  - Privilege level:
      - private OS function
      - OS service
      - device driver
      - application program
  - Type:
      - read-only
      - read/write
      - execute-only
      - execute/read
      ...
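
The way ax, ah and al overlap eax in the general-purpose register layout can be modelled with bit masks and shifts. This is only a sketch in Python (the helper names are hypothetical); on the CPU the narrower registers are simply names for parts of the same 32 bits.

```python
def sub_registers(eax):
    """Return the (ax, ah, al) views of a 32-bit eax value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF        # bits 15-0
    ah = (eax >> 8) & 0xFF   # bits 15-8
    al = eax & 0xFF          # bits 7-0
    return ax, ah, al


def write_al(eax, value):
    """Model 'mov al, value': only the low byte of eax changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = sub_registers(0x12345678)
print(hex(ax), hex(ah), hex(al))         # prints 0x5678 0x56 0x78
print(hex(write_al(0x12345678, 0xFF)))   # prints 0x123456ff
```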
Instruction Pointer Register
  The instruction pointer (EIP) is a 32-bit register (bits 31-0) that contains the offset in the current code segment for the next instruction to be executed.

Instruction Pointer Register (continued)
  - It is advanced from one instruction boundary to the next in straight-line code, or it is moved ahead or backwards by a number of instructions when executing flow control instructions such as jumps or subroutine calls.
  - It cannot be accessed directly by software.

Flags Register
  - This 32-bit register is a collection of individual status and control bits called flags.
  - Each flag is usually manipulated independently and not as a set.

Flags Register (continued)
  Bit in eflags   Flag   Name
  11              OF     Overflow flag
  10              DF     Direction flag
   7              SF     Sign flag
   6              ZF     Zero flag
   4              AF     Auxiliary flag
   2              PF     Parity flag
   0              CF     Carry flag

Flags Register (continued)
  - Carry Flag: is set if the result of an arithmetic operation involving unsigned numbers overflows.
  - Overflow Flag: is set if the result of an arithmetic operation involving signed numbers overflows.
  - Sign Flag: is set if the result of an arithmetic or logical operation is negative.
  - Zero Flag: is set if the result of an arithmetic or logical operation is zero.

Flags Register (continued)
  - Parity Flag: is set if the result of an arithmetic or logical operation has an even number of 1 bits in its 8 least significant bits.
  - Auxiliary Flag: is set if the result of an arithmetic operation has a carry out from the low-order nibble. Used in binary-coded decimal (BCD) operations.
  - Direction Flag: is explicitly set or cleared by the programmer in order to modify the behavior of some special string operations.

Memory Organization
  - The memory that the processor addresses on its bus is called physical memory.
  - Physical memory is organized as a sequence of 8-bit bytes.
  - Each byte is assigned a unique address, called a physical address.
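
The flag definitions from the Flags Register slides can be made concrete by computing them for an 8-bit addition. This is a sketch rather than a model of the real hardware: the function name and the dictionary of flags are hypothetical, and the direction flag is left out because arithmetic does not change it.

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) following the slide definitions."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # Unsigned overflow: the true sum does not fit in 8 bits.
        "CF": total > 0xFF,
        # Signed overflow: both operands have the same sign, the result does not.
        "OF": ((a ^ result) & (b ^ result) & 0x80) != 0,
        # The result is negative when its top bit is set.
        "SF": (result & 0x80) != 0,
        "ZF": result == 0,
        # Even number of 1 bits in the 8 least significant bits.
        "PF": bin(result).count("1") % 2 == 0,
        # Carry out of the low-order nibble.
        "AF": (a & 0x0F) + (b & 0x0F) > 0x0F,
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), sorted(name for name, is_set in flags.items() if is_set))
# prints 0x80 ['AF', 'OF', 'SF']
```

Adding 1 to 7Fh is fine as an unsigned operation (127 + 1 = 128, so CF stays clear) but overflows as a signed one (the result reads as -128, so OF is set).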
Memory Organization (continued)
  - The physical address space ranges from zero to a maximum of 2^32 - 1 (4 GB).
  - When employing the processor's memory management facilities, programs DO NOT directly address physical memory. Instead, they access memory using a memory model.

Flat Memory Model
  - Memory appears to a program as a single, continuous address space, called a linear address space.
  - All code and data are contained in this address space.

Flat Memory Model (continued)
  The linear address space is byte addressable, with addresses running contiguously from 0 (0x00000000) to 2^32 - 1 (0xFFFFFFFF).

Paging
  - The x86 supports translation of linear (virtual) addresses into physical addresses through paging.
  - Special tables map portions of the virtual addresses into physical memory locations.
  - Physical memory is divided into page frames, each 4 KB in size.
  - The operating system copies a certain number of pages from your storage device to main memory.

Paging (continued)
  [Diagram: the address space (virtual memory) is backed partly by physical memory and partly by the disk drive.]
  - When a program needs a page that is not in main memory, the operating system copies the required page into memory and copies another page back to the disk.
  - Each time a page is needed that is not currently in memory, a page fault occurs.
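
With 4 KB pages, the low 12 bits of a linear address are the offset inside the page and the remaining upper bits select the page. The sketch below illustrates that split and a lookup that raises a page fault for a missing page. The names are hypothetical, and the page table is modelled as a plain dictionary from page number to frame number, which is a simplification of the real x86 table format.

```python
PAGE_SIZE = 4096  # 4 KB page frames, as in the slides


class PageFault(Exception):
    """Raised when the requested page is not currently in main memory."""


def split_linear_address(address):
    """Split a 32-bit linear address into (page number, offset within page)."""
    address &= 0xFFFFFFFF
    return address // PAGE_SIZE, address % PAGE_SIZE


def translate(address, page_table):
    """Translate a linear address using a hypothetical {page: frame} mapping."""
    page, offset = split_linear_address(address)
    if page not in page_table:
        raise PageFault(f"page {page:#x} is not in memory")
    return page_table[page] * PAGE_SIZE + offset


page_table = {0x403: 0x12}                      # page 0x403 lives in frame 0x12
print(hex(translate(0x00403A10, page_table)))   # prints 0x12a10
```

In a real system the operating system would handle the PageFault by loading the page from disk, updating the table and retrying the access.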
Generating a Physical Address
  Logical address = 16-bit selector + 32-bit offset
  16-bit selector  ->  segment descriptor
  segment descriptor + 32-bit offset  ->  32-bit linear address
  32-bit linear address  ->  paging  ->  32-bit physical address

32-bit Offset
  32-bit offset = base register + (index register x scale factor) + displacement
  - 32-bit base register: eax, ebx, ecx, edx, esi, edi, ebp, esp
  - 32-bit index register: eax, ebx, ecx, edx, esi, edi, ebp
  - Scale factor: 1, 2, 4, 8
  - Displacement: 8-bit, 32-bit

32-bit Offset Example
      MOV EAX, [ESI + ECX * 4 + 12]
  - Base register: ESI
  - Index register: ECX
  - Scale factor: 4
  - Displacement: 12

Byte Order
  When a value is stored in memory in multiple bytes, two distinct byte orders may be used:
  - Big-Endian
  - Little-Endian

Byte Order (continued)
  - In big-endian architectures, the leftmost bytes (those with a lower address) are most significant.
  - In little-endian architectures, the rightmost bytes (those with a higher address) are most significant.
  - The terms big-endian and little-endian are derived from the Lilliputians of Jonathan Swift's Gulliver's Travels, whose major political issue was whether soft-boiled eggs should be opened on the big side or the little side.

Byte Order (continued)
  - Intel x86 and DEC VAX systems store multibyte values in little-endian order.
  - HP, IBM and Motorola 68K systems store multibyte values in big-endian order.
  - The PowerPC is a bi-endian processor: it supports both big and little-endian byte ordering.

Byte Order Example
  The byte ordering for the number 1025 stored in 4 bytes is:
      1025 = 00000000 00000000 00000100 00000001b

  Address   little-endian   big-endian
  03        00000000b       00000001b
  02        00000000b       00000100b
  01        00000100b       00000000b
  00        00000001b       00000000b

CHAPTER 3
The Linux Operating System

Operating System
  - Software that makes hardware usable.
  - Manages such things as: memory, screen display, keyboard input, disk files and printer output.
  [Diagram: layers from top to bottom: User, Application Programs, Operating System, Hardware.]

UNIX
  - Operating system developed at Bell Labs in the early 1970s by Ken Thompson and Dennis Ritchie.
  - One of the first operating systems to be written in a high-level programming language, namely C.
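
Looking back at the byte order example from the end of Chapter 2, the two layouts of the number 1025 can be reproduced with Python's built-in int.to_bytes. The helper name memory_layout is hypothetical; it simply pairs each byte with its address.

```python
def memory_layout(value, size, byteorder):
    """Return {address: byte} for `value` stored in `size` bytes starting at address 0."""
    data = value.to_bytes(size, byteorder)
    return {address: byte for address, byte in enumerate(data)}


for order in ("little", "big"):
    layout = memory_layout(1025, 4, order)
    print(order, {address: f"{byte:08b}b" for address, byte in layout.items()})

# Reading the bytes back with the matching byte order recovers the value.
print(int.from_bytes(bytes([0x01, 0x04, 0x00, 0x00]), "little"))  # prints 1025
```

Reading little-endian data as if it were big-endian (or the reverse) silently produces a different number, which is why file formats and network protocols have to fix a byte order.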
UNIX (continued)
  - The name UNIX was intended as a pun on a previous OS called MULTICS (and was written UNICS at first: UNiplexed Information and Computing System).
  - Leading operating system for workstations.

Linux
  - Free UNIX-type operating system originally created by Linus Torvalds at the University of Helsinki in Finland.
  - Developed under the GNU General Public License, the source code for Linux is freely available to everyone.

Linux (continued)
  - Linux is an independent POSIX (Portable Operating System Interface for UNIX) implementation and includes: multitasking, multi-user, multiprocessing, virtual memory, shared libraries and TCP/IP networking.
  - Currently implemented on a wide range of platforms, including: x86, Alpha, SPARC, 68K and PowerPC.

GNU Project
  - Short for GNU's Not UNIX.
  - A UNIX-compatible software system developed by the Free Software Foundation (FSF).
  - The philosophy behind GNU is to produce software that is non-proprietary. Anyone can download, modify and redistribute GNU software. The only restriction is that they cannot limit further redistribution.
  - The GNU project was started in 1983 by Richard Stallman at MIT.

POSIX
  - Acronym for Portable Operating System Interface for UNIX.
  - Set of IEEE and ISO standards that define an interface between programs and operating systems.
  - Supported by most UNIX systems and Windows NT.

Multitasking
  - The ability to execute more than one task (program) at the same time.
  - The CPU switches from one program to another so quickly that it gives the appearance of executing all of the programs at the same time.

Multitasking (continued)
  There are two basic types of multitasking:
  - Preemptive multitasking: the operating system assigns CPU time slices to each program.
  - Cooperative multitasking: each program can control the CPU for as long as it needs it. If a program is not using the CPU, however, it can allow another program to use it temporarily.
  Linux supports preemptive multitasking.
Multi-user
  - Computer systems that support two or more simultaneous users.
  - All mainframes and minicomputers and most workstations are multi-user systems.

Multiprocessing
  - Since version 2.0, Linux has the ability to run on multiprocessor architectures.
  - The OS can distribute several applications in true parallel fashion across several CPUs.

Virtual Memory
      If it's there and you can see it, it's real.
      If it's not there and you can see it, it's virtual.
      If it's there and you can't see it, it's transparent.
      If it's not there and you can't see it, you erased it!
  (IBM poster explaining virtual memory, circa 1978.)

Virtual Memory (continued)
  - Technique that increases the amount of apparent memory available on a system.
  - A swap space is an area on disk in which the OS stores images of running programs when memory is tight.
  - The Linux virtual memory system uses a swap space to implement paging.

Shared Libraries
  - A library is a collection of precompiled routines that a program can use.
  - In a static library, all library functions that a program requires are made part of an executable, which can make it rather large.
  - In a shared library, function code is not directly included in an executable file. Instead, the OS dynamically links a running program to the required routines contained in the shared library.

Shared Libraries (continued)
  Shared libraries have two important advantages:
  - Small executable files.
  - Several programs running at the same time can share a single copy of the library code.

TCP/IP Networking
  - Acronym for Transmission Control Protocol/Internet Protocol.
  - Consists of a suite of communications protocols used to connect hosts on the Internet.
  - Allows services such as: e-mail, telnet, ftp and http.

CHAPTER 4
The Netwide Assembly Language

nasm: The Netwide Assembler
  - Free and portable x86 assembler originally developed by Simon Tatham and Julian Hall.
  - It supports a range of object file formats, including Linux ELF, NetBSD/FreeBSD, COFF, Microsoft 16-bit OBJ and Win32.

Development Cycle
  editor  ->  assembly language file (*.asm)  ->  nasm  ->  object file (*.o)  ->  ld (linker)  ->  ELF executable file

ld: The Linker
  An object file isn't directly executable; it first needs to be fed into a linker (also known as link-loader or link-editor). The linker does the following tasks:
  - identifies the initial program entry point (_start label)
  - binds symbolic references to memory addresses
  - unites all the object and library files
  - produces an executable ELF file

ELF File
  - The Executable and Linkable Format was designed by the UNIX System Laboratories.
  - Used by contemporary Linux implementations as its standard executable file format.
  - Supports shared libraries (dynamic linking).

a.out File
  - a.out is the default file name given to executable files by UNIX linkers.
  - It means "assembly output", in spite of being linker output!
  - On the PDP-7 computer, there was no linker. Executable programs were created directly by the assembler.
  - The name stuck, even when linkers started to appear on newer machines.

Building a Program
  Editing:
      $ vi test.asm
      $ ls
      test.asm
  Assembly:
      $ nasm -f elf test.asm
      $ ls
      test.asm test.o
  Linking:
      $ ld -s -o test test.o
      $ ls
      test test.asm test.o
  Execution (the ./ prefix is needed because a bare "test" runs the shell's built-in test command):
      $ ./test

Linux-NASM Program Skeleton
      bits 32                 ; -- 32 bit program

      section .data           ; -- Start data segment
          ; put initialized data here

      section .bss            ; -- Start bss segment
          ; put non-initialized data here

      section .text           ; -- Start code segment
      global _start           ; -- Export "_start" label
      _start:                 ; -- Define "_start" label
          ; put program code here
          mov eax, 1          ; -- Exit system call
          mov ebx, 0          ;    exit code #0
          int 0x80

Segments
  A segment on UNIX is a section of related stuff in a binary.
  ELF files have three segments:
  - TEXT for storing code
  - DATA for storing initialized data
  - BSS for non-initialized data

NASM Source Code
  Every NASM program source line has the following four fields:
      label: instruction operands ; comment
  - Every field is optional.
  - The number of operands depends on the instruction.

Instructions and Pseudo-Instructions
  Instructions:
  - Mnemonics that represent x86 opcodes.
  - Generate code that produces actions at run time.
  Pseudo-instructions:
  - Not real x86 instructions (they don't produce any actions at run time).
  - Are used in the instruction field because that's the most convenient place to put them.

Directives
  - Statements that allow us to control how a program is assembled.
  - They only work at assembly time (they don't directly produce any machine code).

bits Directive
  - Specifies whether NASM must produce code that will run in 16 or 32-bit mode.
  - ELF files only support 32-bit mode:
        bits 32
  - May be omitted for ELF files.

section .data Directive
  - States the beginning of the initialized data segment.
  - An image of this segment's data is physically stored in the executable file.
  - This segment contains read/write data.

Pseudo-Instructions for the Data Segment
  Pseudo-instruction   Meaning               Size (bits)
  db                   Define byte           8
  dw                   Define word           16
  dd                   Define double word    32
  dq                   Define quadword       64
  dt                   Define ten bytes      80

section .bss Directive
  - States the beginning of the non-initialized data segment.
  - Only the size of the data is stored in the executable file.
  - Once the program is loaded into memory, all the data in this section is set to zero.
  - This segment contains read/write data.
  - BSS means "Block Started by Symbol", a pseudo-instruction from the old IBM 704 assembler, carried over into UNIX.
Pseudo-Instructions for the BSS Segment
  Pseudo-instruction   Meaning                Size (bits)
  resb                 Reserve byte           8
  resw                 Reserve word           16
  resd                 Reserve double word    32
  resq                 Reserve quadword       64
  rest                 Reserve ten bytes      80

section .text Directive
  - States the beginning of the segment that contains the program's executable instructions.
  - This segment is read-only.

System Calls
  - Processes access kernel facilities via the system call interface.
  - System calls are the only way a program can communicate with the outside world.
  - In assembly language, interrupt 0x80 is used to make system calls.

System Calls (continued)
  [Diagram: a process makes system calls with INT 0x80 to the Linux kernel, which controls the I/O devices (display, keyboard, mouse, disks, printer, etc.).]

sys_exit
  Terminate current process, return exit code to caller.
      EAX   1
      EBX   exit code

sys_read
  Read a number of bytes from a given input device.
      EAX   3
      EBX   file descriptor (0 = stdin)
      ECX   buffer address
      EDX   number of bytes to read
      INT 0x80

sys_write
  Write a number of bytes to a given output device.
      EAX   4
      EBX   file descriptor (1 = stdout)
      ECX   buffer address
      EDX   number of bytes to write
      INT 0x80

CHAPTER 5
x86 Integer Instructions

Condition Codes
  Suffix         Meaning                                     Flags
  O              Overflow                                    OF=1
  NO             No Overflow                                 OF=0
  C, B, NAE      Carry, Below, Not Above nor Equal           CF=1
  NC, NB, AE     No Carry, Not Below, Above or Equal         CF=0

Condition Codes (continued)
  Suffix         Meaning                                     Flags
  Z, E           Zero, Equal                                 ZF=1
  NZ, NE         Not Zero, Not Equal                         ZF=0
  BE, NA         Below or Equal, Not Above                   CF=1 OR ZF=1
  A, NBE         Above, Not Below nor Equal                  CF=0 AND ZF=0

Condition Codes (continued)
  Suffix         Meaning                                     Flags
  S              Sign                                        SF=1
  NS             Not Sign                                    SF=0
  P, PE          Parity, Parity Even                         PF=1
  NP, PO         Not Parity, Parity Odd                      PF=0

Condition Codes (continued)
  Suffix         Meaning                                     Flags
  L, NGE         Less, Not Greater nor Equal                 SF<>OF
  GE, NL         Greater or Equal, Not Less                  SF=OF
  LE, NG         Less or Equal, Not Greater                  ZF=1 OR SF<>OF
  G, NLE         Greater, Not Less nor Equal                 ZF=0 AND SF=OF

Condition Codes (continued)
  - Above and Below are used for unsigned integer comparisons.
  - Greater and Less are used for signed integer comparisons.

Flow Control Instructions
  - JMP
  - Jcc
  - CALL
  - RET

JMP: jump
  Syntax: JMP dest
  Operation (absolute jump): EIP ← dest
  Operation (relative jump): EIP ← EIP + dest
  Flags (of df sf zf af pf cf): - - - - - - -

Unconditional Jumps
      jmp statement_label
  - There are two types of jumps: intersegment and intrasegment.
  - The address can be in a register, a variable or a label.

Unconditional Jumps (continued)
  Example:
      Start:
          mov ax, 0
          inc ax
          jmp Start

Jcc: short jump conditional
  Syntax: Jcc dest
  Operation: if(cc) EIP ← EIP + dest endif
  Notes:
  - cc is any of the condition codes.
  - dest must be within a signed 8-bit range (-128 to 127).
  Flags (of df sf zf af pf cf): - - - - - - -

Jcc: near jump conditional
  Syntax: Jcc NEAR dest
  Operation: if(cc) EIP ← EIP + dest endif
  Notes:
  - cc is any of the condition codes.
  - dest must be within a signed 32-bit range.
  Flags (of df sf zf af pf cf): - - - - - - -

Conditional Jumps
  - Dependent on condition codes.
  - Example: JZ jumps if the zero flag is set.

Conditional Codes Examples
  Code the following routine using assembly language instructions.
      Add a value to x;
      If x < 0 Then
          ... (body for negative condition)
      Else if x = 0
          ... (body for zero condition)
      Else
          ... (body for positive condition)
      End if

Conditional Codes Solution
          add x, eax          ; add a value to x
          jns elseIfZero      ; jump if x is not negative
          ...                 ; code for negative condition
          jmp endCheck
      elseIfZero:
          jnz elsePos         ; jump if x is not zero
          ...                 ; code for zero condition
          jmp endCheck
      elsePos:
          ...                 ; code for positive condition
      endCheck:

Comparing Instructions
      CMP op1, op2
  - This instruction computes op1 - op2 in the same way as a SUB instruction, but it does not modify the operands; it only modifies the flags register.
  - The resulting flag values are then tested, usually by a conditional jump.
  - We have to decide whether the operands are to be treated as signed or unsigned values, because that determines which condition codes apply.
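
To see how one CMP can answer both the signed and the unsigned question, the sketch below computes the flags for an 8-bit comparison and then evaluates a few condition codes with the flag formulas from the Condition Codes tables. The operand values come from the Compare Examples slides in this chapter; the function and dictionary names are hypothetical.

```python
def cmp8(op1, op2):
    """Flags produced by CMP on two 8-bit operands (computes op1 - op2)."""
    op1 &= 0xFF
    op2 &= 0xFF
    result = (op1 - op2) & 0xFF
    return {
        "CF": op1 < op2,                                   # borrow was needed
        "ZF": result == 0,
        "SF": (result & 0x80) != 0,
        # Signed overflow: operand signs differ and the result's sign differs from op1's.
        "OF": ((op1 ^ op2) & (op1 ^ result) & 0x80) != 0,
    }


# A few condition codes, written exactly as in the Condition Codes tables.
CONDITIONS = {
    "e": lambda f: f["ZF"],                              # equal
    "b": lambda f: f["CF"],                              # below (unsigned)
    "a": lambda f: not f["CF"] and not f["ZF"],          # above (unsigned)
    "l": lambda f: f["SF"] != f["OF"],                   # less (signed)
    "g": lambda f: not f["ZF"] and f["SF"] == f["OF"],   # greater (signed)
}


def condition_holds(cc, op1, op2):
    """Would Jcc be taken right after 'cmp op1, op2'?"""
    return CONDITIONS[cc](cmp8(op1, op2))


# 15h is 21 either way; F6h is 246 unsigned but -10 signed.
print(condition_holds("b", 0x15, 0xF6))  # prints True  (21 < 246)
print(condition_holds("g", 0x15, 0xF6))  # prints True  (21 > -10)
```

The same bit patterns give "below" and "greater" at once; choosing JB or JG is how the programmer tells the CPU which interpretation is meant.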
Compare Examples (values in hexadecimal)
  op1   op2   op1 - op2   Flags                          Signed       Unsigned
  3B    3B    00          ZF=1, CF=OF=SF=0               op1 = op2    op1 = op2
  3B    15    26          CF=OF=SF=ZF=0                  op1 > op2    op1 > op2
  15    F6    1F          CF=1 (borrow), SF=OF=ZF=0      op1 > op2    op1 < op2

Compare Examples (continued)
  Legal examples:
      cmp eax, 356
      cmp value, 03dh
      cmp bh, '$'
  Illegal example:
      cmp 1000, total

Compare Programming Ex.
  Code the following routine in assembly language.
      If val < 10 Then
          add 1 to xcount;
      Else
          add 1 to ycount;
      End if;

Compare Programming Ex. Solution
          cmp ebx, 10         ; value < 10 ?
          jnl elseY
          inc xcount          ; add 1 to xcount
          jmp endVal
      elseY:
          inc ycount          ; add 1 to ycount
      endVal:

Programming Ex #2
  Code the following routine in assembly language:
      If (total is greater than or equal to 100) or (count = 10) Then
          add value to total;
      End if

Programming Ex #2 Solution
          cmp total, 100
          jge addValue
          cmp cx, 10
          jne endAddCheck
      addValue:
          mov ebx, value
          add total, ebx
      endAddCheck:

While Loops
      While continuation condition loop
          ... {body}
      end while;
  The continuation condition is a boolean expression.

While Loops Exercise
  Design an assembly language module to implement the following high level language instructions.
      While (sum < 1000) loop
          ... {body, increment sum}
      End while;

While Loops Exercise 2
  Design an assembly language module to implement the following high level language instructions.
      x := 1;
      twoTox := 1;
      While twoTox < number
          multiply twoTox by 2;
      End while;
      Subtract 1 from x;

Homework

CALL: call subroutine
  Syntax: CALL dest
  Operation (absolute call):
      ESP ← ESP - 4
      [ESP] ← EIP
      EIP ← dest
  Operation (relative call):
      ESP ← ESP - 4
      [ESP] ← EIP
      EIP ← EIP + dest
  Flags (of df sf zf af pf cf): - - - - - - -

RET: return from subroutine
  Syntax: RET
  Operation:
      EIP ← [ESP]
      ESP ← ESP + 4
  Flags (of df sf zf af pf cf): - - - - - - -

Data Transfer Instructions
  - MOV
  - CMOVcc
  - SETcc
  - XCHG
  - XLATB
  - PUSH
  - POP
  - PUSHF
  - POPF
  - PUSHA
  - POPA

MOV: move data
  Syntax: MOV dest, orig
  Operation: dest ← orig
  Flags (of df sf zf af pf cf): - - - - - - -

CMOVcc: conditional move
  Syntax: CMOVcc dest, orig
  Operation: if(cc) dest ← orig endif
  Notes: cc is any of the condition codes.
  Flags (of df sf zf af pf cf): - - - - - - -

SETcc: set conditional
  Syntax: SETcc dest
  Operation: if(cc) dest ← 1 else dest ← 0 endif
  Notes: cc is any of the condition codes.
  Flags (of df sf zf af pf cf): - - - - - - -

XCHG: exchange data
  Syntax: XCHG op1, op2
  Operation:
      temp ← op1
      op1 ← op2
      op2 ← temp
  Flags (of df sf zf af pf cf): - - - - - - -

XLATB: translate byte
  Syntax: XLATB
  Operation: AL ← [EBX + AL]
  Notes: AL is treated as an unsigned byte.
  Flags (of df sf zf af pf cf): - - - - - - -

PUSH: push data on stack
  Syntax: PUSH op
  Operation:
      ESP ← ESP - 4
      [ESP] ← op
  Flags (of df sf zf af pf cf): - - - - - - -

POP: pop data from stack
  Syntax: POP dest
  Operation:
      dest ← [ESP]
      ESP ← ESP + 4
  Flags (of df sf zf af pf cf): - - - - - - -

PUSHF: push flags register
  Syntax: PUSHF
  Operation:
      ESP ← ESP - 4
      [ESP] ← EFLAGS
  Flags (of df sf zf af pf cf): - - - - - - -

POPF: pop flags register
  Syntax: POPF
  Operation:
      EFLAGS ← [ESP]
      ESP ← ESP + 4
  Flags (of df sf zf af pf cf): X X X X X X X

PUSHA: push all registers
  Syntax: PUSHA
  Operation:
      temp ← ESP
      ESP ← ESP - 0x20
      [ESP + 0x1C] ← EAX
      [ESP + 0x18] ← ECX
      [ESP + 0x14] ← EDX
      [ESP + 0x10] ← EBX
      [ESP + 0x0C] ← temp
      [ESP + 0x08] ← EBP
      [ESP + 0x04] ← ESI
      [ESP + 0x00] ← EDI
  Flags (of df sf zf af pf cf): - - - - - - -

POPA: pop all registers
  Syntax: POPA
  Operation:
      EDI ← [ESP + 0x00]
      ESI ← [ESP + 0x04]
      EBP ← [ESP + 0x08]
      EBX ← [ESP + 0x10]
      EDX ← [ESP + 0x14]
      ECX ← [ESP + 0x18]
      EAX ← [ESP + 0x1C]
      ESP ← ESP + 0x20
  Flags (of df sf zf af pf cf): - - - - - - -
Flags (of df sf zf af pf cf): - - - - - - -

CALL: call subroutine
Syntax: CALL dest
Operation (absolute call):
    ESP ← ESP - 4
    [ESP] ← EIP
    EIP ← dest
Operation (relative call):
    ESP ← ESP - 4
    [ESP] ← EIP
    EIP ← EIP + dest
Flags (of df sf zf af pf cf): - - - - - - -

RET: return from subroutine
Syntax: RET
Operation:
    EIP ← [ESP]
    ESP ← ESP + 4
Flags (of df sf zf af pf cf): - - - - - - -

Arithmetic Instructions
CLC STC CMC ADD ADC INC SUB SBB DEC NEG CMP MUL IMUL DIV IDIV CBW CWD CDQ CWDE MOVSX MOVZX

CLC: clear carry flag
Syntax: CLC
Operation:
    CF ← 0
Flags (of df sf zf af pf cf): - - - - - - 0

STC: set carry flag
Syntax: STC
Operation:
    CF ← 1
Flags (of df sf zf af pf cf): - - - - - - 1

CMC: complement carry flag
Syntax: CMC
Operation:
    CF ← ~CF
Flags (of df sf zf af pf cf): - - - - - - X

ADD: add integers
Syntax: ADD dest, orig
Operation:
    dest ← dest + orig
Flags (of df sf zf af pf cf): X - X X X X X

ADD examples
    AX: 0075  CX: 01A2
    add ax, cx
    Results: AX: 0217  CX: 01A2  SF=ZF=CF=OF=0

ADD examples
    AX: 77AC  CX: 4B35
    add ax, cx
    Results: AX: C2E1  CX: 4B35  SF=OF=1; ZF=CF=0

ADC: add with carry
Syntax: ADC dest, orig
Operation:
    dest ← dest + orig + CF
Flags (of df sf zf af pf cf): X - X X X X X

INC: increment integer
Syntax: INC dest
Operation:
    dest ← dest + 1
Flags (of df sf zf af pf cf): X - X X X X -

INC examples
    ECX: 00 00 01 A2
    inc ecx
    Results: ECX: 00 00 01 A3  SF=ZF=OF=0

INC examples
    EDX: 7F FF FF FF
    inc edx
    Results: EDX: 80 00 00 00  SF=OF=1; ZF=0

SUB: subtract integers
Syntax: SUB dest, orig
Operation:
    dest ← dest - orig
Flags (of df sf zf af pf cf): X - X X X X X

SUB examples
    EAX: 00 00 00 75  ECX: 00 00 01 A2
    sub eax, ecx
    Results: EAX: FF FF FE D3  ECX: 00 00 01 A2  SF=CF=1; ZF=OF=0
    (CF is set because the unsigned subtraction needs a borrow.)

SUB examples
    DX: FF 20  Word at Value: FF 20
    sub dx, Value
    Results: DX: 00 00  Value: FF 20  ZF=1, PF=1; the rest are zero.
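The flag results in the ADD and SUB examples can be checked by hand, but a small C model makes the rules explicit. The sketch below is only an illustration: the names `flags_t`, `add16`, `sub32` and `demo_flags` are hypothetical helpers, not part of any library, and only CF, OF, SF and ZF are modeled (AF and PF are left out).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the four flags reported in the examples. */
typedef struct {
    unsigned cf, of, sf, zf;
} flags_t;

/* Models a 16-bit ADD: returns dest + orig and fills in the flags. */
uint16_t add16(uint16_t dest, uint16_t orig, flags_t *f)
{
    uint32_t wide = (uint32_t)dest + orig;   /* bit 16 holds the carry out */
    uint16_t res = (uint16_t)wide;

    f->cf = (wide >> 16) & 1u;               /* unsigned overflow */
    /* Signed overflow: both operands have the same sign and the
       result has the opposite sign. */
    f->of = ((~(dest ^ orig) & (dest ^ res)) >> 15) & 1u;
    f->sf = (res >> 15) & 1u;                /* copy of the sign bit */
    f->zf = (res == 0);
    return res;
}

/* Models a 32-bit SUB: returns dest - orig and fills in the flags. */
uint32_t sub32(uint32_t dest, uint32_t orig, flags_t *f)
{
    uint32_t res = dest - orig;

    f->cf = (dest < orig);                   /* a borrow was needed */
    /* Signed overflow: operands differ in sign and the result's
       sign differs from dest. */
    f->of = (((dest ^ orig) & (dest ^ res)) >> 31) & 1u;
    f->sf = (res >> 31) & 1u;
    f->zf = (res == 0);
    return res;
}

/* Replays three of the slide examples. */
void demo_flags(void)
{
    flags_t f;

    assert(add16(0x0075, 0x01A2, &f) == 0x0217);
    assert(!f.sf && !f.zf && !f.cf && !f.of);

    assert(add16(0x77AC, 0x4B35, &f) == 0xC2E1);
    assert(f.sf && f.of && !f.zf && !f.cf);

    assert(sub32(0x00000075u, 0x000001A2u, &f) == 0xFFFFFED3u);
    assert(f.sf && f.cf && !f.zf && !f.of);
}
```

Calling `demo_flags()` runs without failing an assertion. In the last case the borrow sets CF because 0x75 is smaller than 0x1A2 when both are read as unsigned numbers.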
SBB: subtract with borrow
Syntax: SBB dest, orig
Operation:
    dest ← dest - orig - CF
Flags (of df sf zf af pf cf): X - X X X X X

DEC: decrement integer
Syntax: DEC dest
Operation:
    dest ← dest - 1
Flags (of df sf zf af pf cf): X - X X X X -

DEC examples
    BX: 00 01
    dec bx
    Results: BX: 00 00  ZF=1; SF=OF=0

DEC examples
    AL: F5
    dec al
    Results: AL: F4  SF=1; OF=ZF=0

NEG: negate
Syntax: NEG dest
Operation:
    dest ← -dest
Notes: Sets CF, unless dest is zero, in which case CF is cleared.
Flags (of df sf zf af pf cf): X - X X X X X

NEG examples
    BX: 01 A2
    neg bx
    Results: BX: FE 5E  SF=1; ZF=0

NEG examples
    DH: F5
    neg dh
    Results: DH: 0B  SF=ZF=0

NEG examples
    EAX: 00 00 00 00
    neg eax
    Results: EAX: 00 00 00 00  SF=0; ZF=1

CMP: compare integers
Syntax: CMP op1, op2
Operation:
    NULL ← op1 - op2
Flags (of df sf zf af pf cf): X - X X X X X

MUL: unsigned integer multiply
Syntax: MUL orig
Operation:
    case(size(orig))
        8:  AX ← AL * orig
        16: DX:AX ← AX * orig
        32: EDX:EAX ← EAX * orig
    endcase
Notes: CF and OF are cleared if the high-order half of the result is zero. orig cannot be an immediate value.
Flags (of df sf zf af pf cf): X - ? ? ? ? X

MUL examples
    AX: 00 05  BX: 00 02  DX: ?? ??
    mul bx
    Results: DX: 00 00  AX: 00 0A  CF=OF=0

MUL examples
    AL: 05  Byte at Factor: FF
    mul Factor
    Results: AX: 04 FB  CF=OF=1

IMUL: signed integer multiply
Syntax #1: IMUL orig
Operation:
    case(size(orig))
        8:  AX ← AL * orig
        16: DX:AX ← AX * orig
        32: EDX:EAX ← EAX * orig
    endcase

IMUL examples
    AX: 00 05  BX: 00 02  DX: ?? ??
    imul bx
    Results: DX: 00 00  AX: 00 0A  CF=OF=0

IMUL examples
    AL: 05  Byte at Factor: FF
    imul Factor
    Results: AX: FF FB  CF=OF=0
    (As a signed byte FF is -1, so the product is -5, which still fits in AL.)

IMUL: signed integer multiply (continued)
Syntax #2: IMUL dest, orig
Operation:
    dest ← dest * orig
Flags (of df sf zf af pf cf): X - ? ? ? ? X

IMUL examples
    EBX: 00 00 00 0A
    imul ebx, 10
    *Note: the source may be immediate.
    Results: EBX: 00 00 00 64  CF=OF=0

IMUL: signed integer multiply (continued)
Syntax #3: IMUL dest, orig, const
Operation:
    dest ← orig * const
Notes: CF and OF are cleared if the result is the same size as the multiplicand.
Flags (of df sf zf af pf cf): X - ? ? ? ? X

IMUL examples
    Word at Value: 08 F2  BX: ?? ??
    imul bx, Value, 1000
    Results: BX: F1 50  CF=OF=1

DIV: unsigned integer divide
Syntax: DIV orig
Operation:
    case(size(orig))
        8:  AL ← AX / orig
            AH ← AX % orig
        16: AX ← DX:AX / orig
            DX ← DX:AX % orig
        32: EAX ← EDX:EAX / orig
            EDX ← EDX:EAX % orig
    endcase
Flags (of df sf zf af pf cf): ? - ? ? ? ? ?

DIV
    Source (divisor)   Other (dividend)   Quotient   Remainder
    byte               AX                 AL         AH
    word               DX:AX              AX         DX
    double word        EDX:EAX            EAX        EDX

DIV examples
    EDX: 00 00 00 00  EAX: 00 00 00 64  EBX: 00 00 00 0D  (100/13)
    div ebx
    Results: EDX: 00 00 00 09  EAX: 00 00 00 07

IDIV: signed integer divide
Syntax: IDIV orig
Operation:
    case(size(orig))
        8:  AL ← AX / orig
            AH ← AX % orig
        16: AX ← DX:AX / orig
            DX ← DX:AX % orig
        32: EAX ← EDX:EAX / orig
            EDX ← EDX:EAX % orig
    endcase
Flags (of df sf zf af pf cf): ? - ? ? ? ? ?

CBW: convert byte to word
Syntax: CBW
Operation:
    AX ← SignExtend(AL)
Flags (of df sf zf af pf cf): - - - - - - -

CBW examples
    AL: 53
    cbw
    Results: AX: 00 53

CBW examples
    AL: C6
    cbw
    Results: AX: FF C6

CWD: convert word to dword
Syntax: CWD
Operation:
    DX:AX ← SignExtend(AX)
Flags (of df sf zf af pf cf): - - - - - - -

CWD example
    AX: 07 0D  DX: ?? ??
    cwd
    Results: DX: 00 00  AX: 07 0D

CDQ: convert dword to qword
Syntax: CDQ
Operation:
    EDX:EAX ← SignExtend(EAX)
Flags (of df sf zf af pf cf): - - - - - - -

CDQ example
    EAX: FF FF FA 13  EDX: ?? ?? ?? ??
    cdq
    Results: EDX: FF FF FF FF  EAX: FF FF FA 13

CWDE: convert word to dword extended
Syntax: CWDE
Operation:
    EAX ← SignExtend(AX)
Flags (of df sf zf af pf cf): - - - - - - -

CWDE example
    AX: FF 2A
    cwde
    Results: EAX: FF FF FF 2A

MOVSX: move data with sign extend
Syntax: MOVSX dest, orig
Operation:
    dest ← SignExtend(orig)
Notes: orig must be smaller than dest.
Flags (of df sf zf af pf cf): - - - - - - -

MOVSX examples
    Word at value: 07 0D
    movsx ecx, value
    Results: ECX: 00 00 07 0D

MOVSX examples
    Word at value: F7 0D
    movsx ecx, value
    Results: ECX: FF FF F7 0D

MOVZX: move data with zero extend
Syntax: MOVZX dest, orig
Operation:
    dest ← ZeroExtend(orig)
Notes: orig must be smaller than dest.
Flags (of df sf zf af pf cf): - - - - - - -

MOVZX examples
    Word at value: 07 0D
    movzx ecx, value
    Results: ECX: 00 00 07 0D

MOVZX examples
    Word at value: F7 0D
    movzx ecx, value
    Results: ECX: 00 00 F7 0D

Logical and Bitwise Instructions
AND OR XOR NOT TEST SHL SHR SAR ROL ROR RCL RCR

AND: bitwise and
Syntax: AND dest, orig
Operation:
    dest ← dest & orig
Notes: 0&0=0  0&1=0  1&0=0  1&1=1
Flags (of df sf zf af pf cf): 0 - X X ? X 0

OR: bitwise or
Syntax: OR dest, orig
Operation:
    dest ← dest | orig
Notes: 0|0=0  0|1=1  1|0=1  1|1=1
Flags (of df sf zf af pf cf): 0 - X X ? X 0

XOR: bitwise xor
Syntax: XOR dest, orig
Operation:
    dest ← dest ^ orig
Notes: 0^0=0  0^1=1  1^0=1  1^1=0
Flags (of df sf zf af pf cf): 0 - X X ? X 0

NOT: bitwise not
Syntax: NOT dest
Operation:
    dest ← ~dest
Notes: ~0 = 1  ~1 = 0. Unlike AND, OR and XOR, NOT does not change any flag.
Flags (of df sf zf af pf cf): - - - - - - -

TEST: test bits
Syntax: TEST op1, op2
Operation:
    NULL ← op1 & op2
Flags (of df sf zf af pf cf): 0 - X X ? X 0

SHL: shift left
Syntax: SHL dest, count
Operation:
    cf ← msb ... lsb ← 0
    (every bit moves one position to the left, the msb goes to cf and a 0 enters at the lsb; repeated count times)
Flags (of df sf zf af pf cf): ? - X X ? X X

SHR: shift right
Syntax: SHR dest, count
Operation:
    0 → msb ... lsb → cf
    (every bit moves one position to the right, the lsb goes to cf and a 0 enters at the msb; repeated count times)
Flags (of df sf zf af pf cf): ? - X X ? X X

SAR: shift arithmetic right
Syntax: SAR dest, count
Operation:
    msb → msb ... lsb → cf
    (every bit moves one position to the right, the lsb goes to cf and the msb keeps its value, preserving the sign; repeated count times)
Flags (of df sf zf af pf cf): ? - X X ? X X

ROL: rotate left
Syntax: ROL dest, count
Operation:
    cf ← msb ... lsb ← msb
    (every bit moves one position to the left; the msb goes both to cf and to the lsb; repeated count times)
Notes: Rotate instructions change only OF and CF; OF is defined only for single-bit rotates.
Flags (of df sf zf af pf cf): ? - - - - - X

ROR: rotate right
Syntax: ROR dest, count
Operation:
    lsb → msb ... lsb → cf
    (every bit moves one position to the right; the lsb goes both to cf and to the msb; repeated count times)
Flags (of df sf zf af pf cf): ? - - - - - X

RCL: rotate through carry left
Syntax: RCL dest, count
Operation:
    cf ← msb ... lsb ← cf
    (cf acts as an extra bit of the rotation: the msb goes to cf and the old cf enters at the lsb; repeated count times)
Flags (of df sf zf af pf cf): ? - - - - - X

RCR: rotate through carry right
Syntax: RCR dest, count
Operation:
    cf → msb ... lsb → cf
    (cf acts as an extra bit of the rotation: the lsb goes to cf and the old cf enters at the msb; repeated count times)
Flags (of df sf zf af pf cf): ? - - - - - X

String Instructions
CLD STD REP STOSB REP STOSW REP STOSD REP MOVSB REP MOVSW REP MOVSD

CLD: clear direction flag
Syntax: CLD
Operation:
    DF ← 0
Flags (of df sf zf af pf cf): - 0 - - - - -

STD: set direction flag
Syntax: STD
Operation:
    DF ← 1
Flags (of df sf zf af pf cf): - 1 - - - - -

REP STOSB: repeat store string byte
Syntax: REP STOSB
Operation:
    while(ECX <> 0)
        [EDI] ← AL
        if(DF = 0)
            EDI ← EDI + 1
        else
            EDI ← EDI - 1
        endif
        ECX ← ECX - 1
    endwhile
Flags (of df sf zf af pf cf): - - - - - - -

REP STOSW: repeat store string word
Syntax: REP STOSW
Operation:
    while(ECX <> 0)
        [EDI] ← AX
        if(DF = 0)
            EDI ← EDI + 2
        else
            EDI ← EDI - 2
        endif
        ECX ← ECX - 1
    endwhile
Flags (of df sf zf af pf cf): - - - - - - -

REP STOSD: repeat store string dword
Syntax: REP STOSD
Operation:
    while(ECX <> 0)
        [EDI] ← EAX
        if(DF = 0)
            EDI ← EDI + 4
        else
            EDI ← EDI - 4
        endif
        ECX ← ECX - 1
    endwhile
Flags (of df sf zf af pf cf): - - - - - - -

REP MOVSB: repeat move string byte
Syntax: REP MOVSB
Operation:
    while(ECX <> 0)
        BYTE [EDI] ← BYTE [ESI]
        if(DF = 0)
            ESI ← ESI + 1
            EDI ← EDI + 1
        else
            ESI ← ESI - 1
            EDI ← EDI - 1
        endif
        ECX ← ECX - 1
    endwhile
Flags (of df sf zf af pf cf): - - - - - - -

REP MOVSW: repeat move string word
Syntax: REP MOVSW
Operation:
    while(ECX <> 0)
        WORD [EDI] ← WORD [ESI]
        if(DF = 0)
            ESI ← ESI + 2
            EDI ← EDI + 2
        else
            ESI ← ESI - 2
            EDI ← EDI - 2
        endif
        ECX ← ECX - 1
    endwhile
Flags (of df sf zf af pf cf): - - - - - - -

REP MOVSD: repeat move string dword
Syntax: REP MOVSD
Operation:
    while(ECX <> 0)
        DWORD [EDI] ← DWORD [ESI]
        if(DF = 0)
            ESI ← ESI + 4
            EDI ← EDI + 4
        else
            ESI ← ESI - 4
            EDI ← EDI - 4
        endif
        ECX ← ECX - 1
    endwhile
Flags (of df sf zf af pf cf): - - - - - - -

CHAPTER 6
Mixing C and Assembly Language

Modularization
Most programs consist of a number of separate parts, called modules.
Source modules are separately edited and compiled or assembled in order to produce the corresponding object modules.
All the object modules are linked together to produce an executable program.

Modularization (continued)
[Figure: source modules (*.asm, *.c) are processed by nasm or gcc into object modules (*.o); ld (the linker) combines the object modules with the standard C library and the start file crt0.o to produce the ELF executable file.]

Exporting & Importing Names in Assembly Language
Any assembly language label may be exported to other modules using the global directive.

Exporting & Importing Names in Assembly Language (continued)
The global directive must appear before the definition of the corresponding symbol.
If a module exports a certain label, any other module may import it.
To import a label, the extern directive must be used.
A label cannot be both defined and declared extern in the same module.

Assembly Export/Import Example

module1.asm
    bits 32
    section .data
    global alpha            ; exported: defined in this module
    extern beta             ; imported: defined in module2.asm
    alpha   dd 500
    section .text
    global _start
    extern func
    _start:
        inc dword [alpha]
        inc byte [beta]
        call func
        mov eax, 1          ; Linux system call number 1: exit
        mov ebx, 0          ; exit status
        int 0x80

module2.asm
    bits 32
    section .data
    global beta             ; exported: defined in this module
    extern alpha            ; imported: defined in module1.asm
    beta    db 10
    section .text
    global func
    func:
        xor eax, eax
        mov al, [beta]
        add [alpha], eax
        ret

Assembly Export/Import Example (continued)
Building the program:
    $ nasm -f elf module1.asm
    $ nasm -f elf module2.asm
    $ ld -s module1.o module2.o -o program
    $ ls
    module1.asm module2.asm module1.o module2.o program

Exporting & Importing Names in ANSI C
By default, all function names and global variables are exportable to other modules.
If a name should be kept local to a module, it must be declared static.
Exporting & Importing Names in ANSI C (continued)
To indicate that a name is probably declared in some other module, the extern modifier must be used in the variable or function prototype declaration.
The extern modifier is optional in function prototype declarations.
It is not an error to declare a name extern and to have it defined as well in the same module.

ANSI C Export/Import Example
    int x;              /* defines an exportable variable */
    static int y;       /* defines a local module variable */

    /* import x if not defined in this module */
    extern int x;

    /* import h if not defined in this module */
    extern int h(int, int);

    int f(int a, int b) /* defines an exportable function */
    {
        return a + b;
    }

    static int g(int c) /* defines a local module function */
    {
        return c + c;
    }

x86 and GCC Data types
    GCC Data Type   Size in bytes   Assembly Language Equivalent
    char            1               byte
    short           2               word
    int             4               dword
    long            4               dword
    long long       8               qword
    float           4               dword
    double          8               qword
    long double     10              tword
    void *          4               dword

Register Usage
Functions return their values in the following registers:
    AL for char
    AX for short
    EAX for int, long and void *
    EDX:EAX for long long
    ST0 for floating point

Register Usage (continued)
Registers EAX, ECX and EDX (not EBX) may be changed by the function; all other registers must be saved and restored.
Flags may be changed by the procedure with the following restriction:
    The direction flag is 0 by default.
    The direction flag may be set temporarily, but must be cleared before any call or return.

Passing Parameters
The parameters received by a C function, or a C-callable assembly language subroutine, are passed through the stack.
Parameters are pushed onto the stack in reverse order, that is, from right to left.
This means that the first parameter is always the nearest to the top of the stack.

Passing Parameters (continued)
After the parameters are pushed onto the stack, a CALL instruction to the desired function or subroutine is executed.
When the function or subroutine returns, the parameters are still on the stack and must be removed by the caller.
This may be done using POP instructions or by adjusting the ESP register directly through an ADD instruction.

Subroutine Prologue
The first two instructions in a C-callable subroutine that receives arguments should be:
    push ebp
    mov ebp, esp
This saves the EBP value, so that EBP can now point to the current top of the stack.

Subroutine Prologue (continued)
After this prologue, the stack has the following layout:
    EBP+n ... EBP+8     Subroutine parameters
    EBP+4               CALL return address
    EBP (= ESP)         Original value of EBP

Subroutine Epilogue
In order to undo the subroutine prologue, the following instructions must be the last in a C-callable subroutine:
    pop ebp
    ret

CHAPTER 8
Floating Point Instructions

FPU: Floating Point Unit
The FPU (Intel x87) is used for mathematical computations that require floating point numbers.
Uses the IEEE 754 standard for floating point numbers.
Works in parallel with the other x86 units.

FPU Registers
The CPU and the FPU have separate sets of registers, mutually inaccessible.
The FPU has a stack of eight 80-bit registers.
The register at the top of the stack is called ST0, the one below is ST1 and so on.
All values in the FPU registers are stored as real extended numbers (80-bit). All computations take place using this precision.

FPU Registers (continued)
[Figure: registers st0 to st7, each 80 bits wide (bits 79 to 0), divided into sign (bit 79), exponent (bits 78 to 64) and mantissa (bits 63 to 0).]

x87 Data Types
The values contained in the FPU registers may be converted to and from the following data types:
    x87 Data Type             Number of Bytes   NASM Type   ANSI C Type
    word integer              2                 word        short
    short integer             4                 dword       int
    long integer              8                 qword       long long
    packed BCD integer        10                tword       not available
    single precision real     4                 dword       float
    double precision real     8                 qword       double
    extended precision real   10                tword       long double
The long long type is a GCC extension to ANSI C.
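To make the 80-bit register layout concrete, the following C sketch splits a 10-byte extended precision value (a NASM tword, stored little-endian) into its sign, exponent and mantissa fields and converts it to a double. It is an illustration under stated assumptions: the names `x87_fields`, `x87_unpack`, `x87_to_double` and `demo_x87` are hypothetical, the exponent bias of 16383 and the explicit integer bit in mantissa bit 63 come from the extended precision format rather than from the slides, and special values (infinities, NaNs, denormals) are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the three fields of one 80-bit value. */
typedef struct {
    unsigned sign;      /* bit 79 */
    unsigned exponent;  /* bits 78..64, biased by 16383 */
    uint64_t mantissa;  /* bits 63..0, integer bit stored in bit 63 */
} x87_fields;

/* Splits a 10-byte little-endian extended precision value. */
x87_fields x87_unpack(const uint8_t raw[10])
{
    x87_fields f;
    int i;

    f.mantissa = 0;
    for (i = 7; i >= 0; i--)            /* bytes 0..7 hold the mantissa */
        f.mantissa = (f.mantissa << 8) | raw[i];
    f.exponent = ((unsigned)(raw[9] & 0x7F) << 8) | raw[8];
    f.sign = raw[9] >> 7;
    return f;
}

/* Converts ordinary finite values to double (may lose precision). */
double x87_to_double(x87_fields f)
{
    double v = (double)f.mantissa;          /* mantissa read as an integer */
    int e = (int)f.exponent - 16383 - 63;   /* remove bias and 63 fraction bits */

    while (e > 0) { v *= 2.0; e--; }
    while (e < 0) { v /= 2.0; e++; }
    return f.sign ? -v : v;
}

/* Decodes the encodings of 1.0 and -2.5. */
void demo_x87(void)
{
    const uint8_t one[10]  = {0, 0, 0, 0, 0, 0, 0, 0x80, 0xFF, 0x3F};
    const uint8_t neg25[10] = {0, 0, 0, 0, 0, 0, 0, 0xA0, 0x00, 0xC0};

    assert(x87_to_double(x87_unpack(one)) == 1.0);
    assert(x87_to_double(x87_unpack(neg25)) == -2.5);
}
```

The scaling uses repeated multiplication and division by two instead of a math library call, so the sketch needs nothing beyond the standard headers shown.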
FPU Operations
Most FPU operations involve pushing and popping values to and from the register stack.
When a value is pushed onto the stack, register ST0 becomes ST1, ST1 becomes ST2 and so on, thus making space in ST0 for the pushed value.

FPU Operations (continued)
The opposite occurs when the stack is popped: ST1 becomes ST0, ST2 becomes ST1 and so on.
Instructions that refer to memory usually require a size prefix: word, dword, qword or tword.

Using FPU Instructions
1. Reset FPU (FINIT).
2. Copy data from memory into FPU registers.
3. Process data.
4. Copy data from FPU registers back into memory.

Types of FPU Operations
    Real Transfers
    Integer Transfers
    Packed BCD Transfers
    Loading Constants
    Addition
    Normal Subtraction
    Reversed Subtraction
    Multiplication
    Normal Division
    Reversed Division
    Transcendental Instructions
    Comparisons
    Miscellaneous Operations

Types of FPU Operations (continued)
Descriptions of most FPU operations can be consulted in the FPU Operation Tables.

CHAPTER 9
SIMD Instructions

Data Transfer Instructions
MOVD MOVQ

MOVD: move dword
Syntax: MOVD dest, orig
Operation:
    dest ← orig
Notes: dest and orig may be MMX registers, memory locations or 32-bit integer registers.
When the destination operand is an MMX register, the 32-bit source value is written to the low-order 32 bits of the 64-bit MMX register and zero-extended to 64 bits.
When the source operand is an MMX register, the low-order 32 bits of the MMX register are written to the 32-bit integer register or 32-bit memory location selected with the destination operand.

MOVQ: move qword
Syntax: MOVQ dest, orig
Operation:
    dest ← orig
Notes: orig and dest can be either an MMX register or a memory location; however, data cannot be transferred from one memory location to another memory location.
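The two directions of MOVD are easy to confuse, so here is a minimal C model of them. It assumes that a 64-bit MMX register can be represented as a `uint64_t`; the function names `movd_to_mmx`, `movd_from_mmx` and `demo_movd` are hypothetical and only illustrate the data movement, not the instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* MOVD with an MMX destination: the 32-bit source fills the low-order
   half of the register and the high-order half becomes zero. */
uint64_t movd_to_mmx(uint32_t orig)
{
    return (uint64_t)orig;
}

/* MOVD with an MMX source: only the low-order 32 bits are copied to
   the 32-bit destination; the high-order half is ignored. */
uint32_t movd_from_mmx(uint64_t mm)
{
    return (uint32_t)(mm & 0xFFFFFFFFu);
}

/* Shows both directions on sample values. */
void demo_movd(void)
{
    assert(movd_to_mmx(0xDEADBEEFu) == 0x00000000DEADBEEFULL);
    assert(movd_from_mmx(0x1122334455667788ULL) == 0x55667788u);
}
```

MOVQ needs no such model: it copies all 64 bits unchanged.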
Arithmetic Instructions
PADDB PADDW PADDD PADDSB PADDSW PADDUSB PADDUSW PSUBB PSUBW PSUBD PSUBSB PSUBSW PSUBUSB PSUBUSW

Arithmetic Instructions (continued)
PMULLW PMULHW PMADDWD

Data Range Limits for Saturation
    Data Type       Lower Limit              Upper Limit
                    Decimal   Hexadecimal    Decimal   Hexadecimal
    Signed Byte     -128      0x80           127       0x7F
    Signed Word     -32,768   0x8000         32,767    0x7FFF
    Unsigned Byte   0         0x00           255       0xFF
    Unsigned Word   0         0x0000         65,535    0xFFFF

PADDB: packed truncated byte addition
Syntax: PADDB dest, orig
Operation: dest ← dest + orig, computed separately on each of the 8 packed bytes.

PADDW: packed truncated word addition
Syntax: PADDW dest, orig
Operation: dest ← dest + orig, computed separately on each of the 4 packed words.

PADDD: packed truncated dword addition
Syntax: PADDD dest, orig
Operation: dest ← dest + orig, computed separately on each of the 2 packed dwords.

PADDSB: packed signed saturated byte addition
Syntax: PADDSB dest, orig
Operation: dest ← dest + orig, computed separately on each of the 8 packed bytes.

PADDSW: packed signed saturated word addition
Syntax: PADDSW dest, orig
Operation: dest ← dest + orig, computed separately on each of the 4 packed words.

PADDUSB: packed unsigned saturated byte addition
Syntax: PADDUSB dest, orig
Operation: dest ← dest + orig, computed separately on each of the 8 packed bytes.

PADDUSW: packed unsigned saturated word addition
Syntax: PADDUSW dest, orig
Operation: dest ← dest + orig, computed separately on each of the 4 packed words.

PSUBB: packed truncated byte subtraction
Syntax: PSUBB dest, orig
Operation: dest ← dest - orig, computed separately on each of the 8 packed bytes.

PSUBW: packed truncated word subtraction
Syntax: PSUBW dest, orig
Operation: dest ← dest - orig, computed separately on each of the 4 packed words.

PSUBD: packed truncated dword subtraction
Syntax: PSUBD dest, orig
Operation: dest ← dest - orig, computed separately on each of the 2 packed dwords.

PSUBSB: packed signed saturated byte subtraction
Syntax: PSUBSB dest, orig
Operation: dest ← dest - orig, computed separately on each of the 8 packed bytes.

PSUBSW: packed signed saturated word subtraction
Syntax: PSUBSW dest, orig
Operation: dest ← dest - orig, computed separately on each of the 4 packed words.

PSUBUSB: packed unsigned saturated byte subtraction
Syntax: PSUBUSB dest, orig
Operation: dest ← dest - orig, computed separately on each of the 8 packed bytes.
PSUBUSW: packed unsigned saturated word subtraction
Syntax: PSUBUSW dest, orig
Operation: dest ← dest - orig, computed separately on each of the 4 packed words.

PMULLW: packed multiply low word (signed)
Syntax: PMULLW dest, orig
Operation: dest ← low order word of (dest * orig), computed separately on each of the 4 packed words.

PMULHW: packed multiply high word (signed)
Syntax: PMULHW dest, orig
Operation: dest ← high order word of (dest * orig), computed separately on each of the 4 packed words.

PMADDWD: packed multiply and add (signed)
Syntax: PMADDWD dest, orig
Operation: the 4 packed words of dest are multiplied by the 4 packed words of orig; each pair of adjacent products is added and the 2 sums are stored in dest as packed dwords.

Logical Instructions
PAND POR PXOR PANDN

PAND: bitwise qword and
Syntax: PAND dest, orig
Operation:
    dest ← dest & orig

POR: bitwise qword or
Syntax: POR dest, orig
Operation:
    dest ← dest | orig

PXOR: bitwise qword xor
Syntax: PXOR dest, orig
Operation:
    dest ← dest ^ orig

PANDN: bitwise qword and/not
Syntax: PANDN dest, orig
Operation:
    dest ← ~dest & orig

Shift Instructions
PSLLW PSLLD PSLLQ PSRLW PSRLD PSRLQ PSRAW PSRAD

PSLLW: packed word logical shift left
Syntax: PSLLW dest, orig
Operation: dest ← dest << orig, computed separately on each of the 4 packed words.

PSLLD: packed dword logical shift left
Syntax: PSLLD dest, orig
Operation: dest ← dest << orig, computed separately on each of the 2 packed dwords.

PSLLQ: packed qword logical shift left
Syntax: PSLLQ dest, orig
Operation:
    dest ← dest << orig

PSRLW: packed word logical (unsigned) shift right
Syntax: PSRLW dest, orig
Operation: dest ← dest >> orig, computed separately on each of the 4 packed words.

PSRLD: packed dword logical (unsigned) shift right
Syntax: PSRLD dest, orig
Operation: dest ← dest >> orig, computed separately on each of the 2 packed dwords.

PSRLQ: packed qword logical (unsigned) shift right
Syntax: PSRLQ dest, orig
Operation:
    dest ← dest >> orig

PSRAW: packed word arithmetic (signed) shift right
Syntax: PSRAW dest, orig
Operation: dest ← dest >> orig, computed separately on each of the 4 packed words.

PSRAD: packed dword arithmetic (signed) shift right
Syntax: PSRAD dest, orig
Operation: dest ← dest >> orig, computed separately on each of the 2 packed dwords.

Comparison Instructions
PCMPEQB PCMPEQW PCMPEQD
PCMPGTB PCMPGTW PCMPGTD

PCMPEQB: packed compare for equal bytes
Syntax: PCMPEQB dest, orig
Operation: dest ← (dest == orig), computed separately on each of the 8 packed bytes.
All ones if true, all zeros if false.

PCMPEQW: packed compare for equal words
Syntax: PCMPEQW dest, orig
Operation: dest ← (dest == orig), computed separately on each of the 4 packed words.
All ones if true, all zeros if false.

PCMPEQD: packed compare for equal dwords
Syntax: PCMPEQD dest, orig
Operation: dest ← (dest == orig), computed separately on each of the 2 packed dwords.
All ones if true, all zeros if false.

PCMPGTB: packed compare for greater than bytes (signed)
Syntax: PCMPGTB dest, orig
Operation: dest ← (dest > orig), computed separately on each of the 8 packed bytes.
All ones if true, all zeros if false.

PCMPGTW: packed compare for greater than words (signed)
Syntax: PCMPGTW dest, orig
Operation: dest ← (dest > orig), computed separately on each of the 4 packed words.
All ones if true, all zeros if false.

PCMPGTD: packed compare for greater than dwords (signed)
Syntax: PCMPGTD dest, orig
Operation: dest ← (dest > orig), computed separately on each of the 2 packed dwords.
All ones if true, all zeros if false.
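The packed arithmetic and comparison instructions above all follow the same pattern: the 64-bit operands are cut into independent elements and the operation is applied to each one. The C sketch below models three byte-sized cases on a `uint64_t` standing in for an MMX register. All function names (`sbyte`, `paddusb`, `paddsb`, `pcmpgtb`, `demo_packed`) are hypothetical helpers chosen to mirror the mnemonics; real code would use the instructions themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Returns byte i (0 = lowest) of v as a signed value in -128..127. */
static int sbyte(uint64_t v, int i)
{
    int b = (int)((v >> (8 * i)) & 0xFF);
    return b > 127 ? b - 256 : b;
}

/* PADDUSB model: eight unsigned additions, each clamped to 0xFF. */
uint64_t paddusb(uint64_t dest, uint64_t orig)
{
    uint64_t out = 0;
    int i;
    for (i = 0; i < 8; i++) {
        unsigned sum = (unsigned)((dest >> (8 * i)) & 0xFF)
                     + (unsigned)((orig >> (8 * i)) & 0xFF);
        if (sum > 0xFFu)
            sum = 0xFFu;                    /* unsigned upper limit */
        out |= (uint64_t)sum << (8 * i);
    }
    return out;
}

/* PADDSB model: eight signed additions, each clamped to -128..127. */
uint64_t paddsb(uint64_t dest, uint64_t orig)
{
    uint64_t out = 0;
    int i;
    for (i = 0; i < 8; i++) {
        int sum = sbyte(dest, i) + sbyte(orig, i);
        if (sum > 127)
            sum = 127;                      /* signed upper limit, 0x7F */
        if (sum < -128)
            sum = -128;                     /* signed lower limit, 0x80 */
        out |= (uint64_t)((unsigned)sum & 0xFFu) << (8 * i);
    }
    return out;
}

/* PCMPGTB model: each byte becomes 0xFF if dest > orig (signed), else 0. */
uint64_t pcmpgtb(uint64_t dest, uint64_t orig)
{
    uint64_t out = 0;
    int i;
    for (i = 0; i < 8; i++)
        if (sbyte(dest, i) > sbyte(orig, i))
            out |= (uint64_t)0xFF << (8 * i);
    return out;
}

/* Exercises the two lowest bytes; the other six bytes are zero. */
void demo_packed(void)
{
    assert(paddusb(0x80FFULL, 0x8001ULL) == 0xFFFFULL); /* both bytes saturate */
    assert(paddsb(0x80FFULL, 0x8001ULL) == 0x8000ULL);  /* -1+1=0, -128-128 clamps */
    assert(paddsb(0x7FULL, 0x01ULL) == 0x7FULL);        /* 127+1 clamps to 127 */
    assert(pcmpgtb(0x7F01ULL, 0x80FFULL) == 0xFFFFULL); /* 1 > -1, 127 > -128 */
}
```

The comparison result is a mask, which is why it combines naturally with PAND, PANDN and POR to select elements without branching.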
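Conversion Instructions
PACKSSWB PACKSSDW PACKUSWB PUNPCKLBW PUNPCKLWD PUNPCKLDQ PUNPCKHBW PUNPCKHWD PUNPCKHDQ

PACKSSWB: pack words into bytes with signed saturation
Syntax: PACKSSWB dest, orig
Operation: the 4 words of dest and the 4 words of orig are each converted to a byte with signed saturation; the bytes from dest fill the low half of dest and the bytes from orig fill the high half.

PACKSSDW: pack dwords into words with signed saturation
Syntax: PACKSSDW dest, orig
Operation: the 2 dwords of dest and the 2 dwords of orig are each converted to a word with signed saturation; the words from dest fill the low half of dest and the words from orig fill the high half.

PACKUSWB: pack words into bytes with unsigned saturation
Syntax: PACKUSWB dest, orig
Operation: the 4 words of dest and the 4 words of orig are each converted to a byte with unsigned saturation; the bytes from dest fill the low half of dest and the bytes from orig fill the high half.

PUNPCKLBW: unpack low packed bytes
Syntax: PUNPCKLBW dest, orig
Operation: the 4 low bytes of dest and the 4 low bytes of orig are interleaved into dest, starting with the lowest byte of dest.

PUNPCKLWD: unpack low packed words
Syntax: PUNPCKLWD dest, orig
Operation: the 2 low words of dest and the 2 low words of orig are interleaved into dest, starting with the lowest word of dest.

PUNPCKLDQ: unpack low packed dwords
Syntax: PUNPCKLDQ dest, orig
Operation: the low dword of dest stays in the low half of dest and the low dword of orig is placed in the high half.

PUNPCKHBW: unpack high packed bytes
Syntax: PUNPCKHBW dest, orig
Operation: the 4 high bytes of dest and the 4 high bytes of orig are interleaved into dest, starting with the lowest of the high bytes of dest.

PUNPCKHWD: unpack high packed words
Syntax: PUNPCKHWD dest, orig
Operation: the 2 high words of dest and the 2 high words of orig are interleaved into dest, starting with the lower of the high words of dest.

PUNPCKHDQ: unpack high packed dwords
Syntax: PUNPCKHDQ dest, orig
Operation: the high dword of dest is moved to the low half of dest and the high dword of orig is placed in the high half.

Empty MMX State Instruction
EMMS

EMMS: empty MMX state
Syntax: EMMS
Notes: Should be used at the end of a sequence of MMX instructions in order to allow subsequent FPU instructions.

CHAPTER 10
Interrupt Handling

Interrupting Program Execution
An interrupt is an asynchronous event that is typically triggered by hardware (an I/O device).
An exception is a synchronous event that is generated when the processor detects one or more predefined conditions while executing an instruction.

Interrupting Program Execution (continued)
When an interrupt or exception is signaled, the processor halts execution of the current task and switches to a handler procedure that has been written specifically to handle the interrupt or exception condition.

Interrupting Program Execution (continued)
The processor accesses the handler procedure through an entry in the interrupt descriptor table (IDT).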
When the handler has completed handling the interrupt or exception, program control is returned to the interrupted task.

Interrupt Descriptor Table
The IDT comprises up to 256 8-byte gate descriptors.
A gate is the mechanism that allows a task to execute code at a different privilege level.
Each gate descriptor contains the segment selector, offset and privilege level of its corresponding handler procedure.
The address and size of the IDT are stored in the 48-bit Interrupt Descriptor Table Register (IDTR).

Interrupt Descriptor Table Register
[Figure: the IDTR holds the IDT base address (32 bits) in bits 47 to 16 and the IDT limit in bits 15 to 0. The IDT may begin at any address in physical memory. It contains the 8-byte gate descriptors for interrupts 0 to n, and each gate leads to the handler procedure code for its interrupt.]

SIDT: store IDTR
Syntax: SIDT dest
Operation:
    dest ← IDTR
Flags (of df sf zf af pf cf): - - - - - - -

Hardware Interrupts
The x86 processor has two pins that can be attached to external interrupt-generating devices.
These pins, or input lines, are:
    INTR   Maskable interrupts
    NMI    Nonmaskable interrupts

Interrupt Flag
The interrupt flag IF is contained in the EFLAGS register.
The INTR input line may be enabled or disabled through software (running at the correct privilege level) with the use of the STI (set IF) and CLI (clear IF) instructions. This means that INTR may be masked (disabled).
The NMI input line is nonmaskable, which means it may not be disabled.

The PIC 8259
The 8259 Programmable Interrupt Controller (PIC) chip accepts interrupts from up to eight different devices.
If any one of the devices requests service, the 8259 will toggle the CPU's INTR input line and pass an interrupt vector number to the CPU's data bus.
Several PICs may be cascaded in order to support up to 64 different devices.
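As an illustration of the Interrupt Descriptor Table slides, the following C sketch packs and unpacks one 8-byte gate descriptor and reads the limit and base out of a 6-byte IDTR image such as the one SIDT stores. The exact byte positions are an assumption based on the standard 32-bit protected mode format; the slides only name the fields. The names `gate_fields`, `gate_encode`, `gate_decode`, `idt_entry_count`, `idt_base` and `demo_idt` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the decoded fields of one gate descriptor. */
typedef struct {
    uint32_t offset;    /* entry point of the handler procedure */
    uint16_t selector;  /* segment selector of the handler's code segment */
    unsigned dpl;       /* privilege level, 0 (most privileged) to 3 */
    unsigned present;   /* 1 if the gate is valid */
    unsigned type;      /* 0xE = 32-bit interrupt gate, 0xF = 32-bit trap gate */
} gate_fields;

/* Packs the fields into the 8-byte in-memory form (assumed layout). */
void gate_encode(uint8_t raw[8], gate_fields g)
{
    raw[0] = (uint8_t)(g.offset & 0xFF);            /* offset bits 7..0   */
    raw[1] = (uint8_t)((g.offset >> 8) & 0xFF);     /* offset bits 15..8  */
    raw[2] = (uint8_t)(g.selector & 0xFF);
    raw[3] = (uint8_t)(g.selector >> 8);
    raw[4] = 0;                                     /* reserved           */
    raw[5] = (uint8_t)(((g.present & 1u) << 7) | ((g.dpl & 3u) << 5)
                       | (g.type & 0xFu));
    raw[6] = (uint8_t)((g.offset >> 16) & 0xFF);    /* offset bits 23..16 */
    raw[7] = (uint8_t)((g.offset >> 24) & 0xFF);    /* offset bits 31..24 */
}

/* Recovers the fields from the 8-byte form. */
gate_fields gate_decode(const uint8_t raw[8])
{
    gate_fields g;
    g.offset = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8)
             | ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24);
    g.selector = (uint16_t)(raw[2] | (raw[3] << 8));
    g.present = raw[5] >> 7;
    g.dpl = (raw[5] >> 5) & 3u;
    g.type = raw[5] & 0xFu;
    return g;
}

/* IDTR image: 16-bit limit (table size in bytes minus 1), then 32-bit base. */
unsigned idt_entry_count(const uint8_t idtr[6])
{
    unsigned limit = idtr[0] | ((unsigned)idtr[1] << 8);
    return (limit + 1u) / 8u;                       /* 8 bytes per gate */
}

uint32_t idt_base(const uint8_t idtr[6])
{
    return (uint32_t)idtr[2] | ((uint32_t)idtr[3] << 8)
         | ((uint32_t)idtr[4] << 16) | ((uint32_t)idtr[5] << 24);
}

/* Round-trips one gate and reads a full 256-entry IDTR image. */
void demo_idt(void)
{
    const uint8_t idtr[6] = {0xFF, 0x07, 0x00, 0x00, 0x10, 0x00};
    gate_fields in = {0x12345678u, 0x0008, 0, 1, 0xE};
    gate_fields out;
    uint8_t raw[8];

    gate_encode(raw, in);
    assert(raw[5] == 0x8E);                 /* present, DPL 0, interrupt gate */
    out = gate_decode(raw);
    assert(out.offset == 0x12345678u && out.selector == 0x0008);
    assert(idt_entry_count(idtr) == 256);
    assert(idt_base(idtr) == 0x00100000u);
}
```

Working on plain byte arrays avoids any dependence on compiler-specific structure packing, at the cost of a little extra shifting.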
The PIC 8259 (continued)
A typical PC uses two PICs to provide 15 interrupt inputs (seven on the master PIC, with its eighth input coming from the slave PIC to process its eight inputs).
In modern motherboards, the 8259 is usually incorporated into a larger chip as part of the chipset.

PIC and CPU Connections
[Figure: a slave PIC 8259 (inputs IRQ0 to IRQ7) feeds one input of the master PIC 8259 (inputs IRQ0 to IRQ7); the master drives the INTR line of the x86 CPU and places the vector number on data bus lines D0 to D7.]

PIC Inputs for a PC (Real Mode)
    Vector Number   8259 Pin    Device
    0x08            IRQ 0       Timer chip
    0x09            IRQ 1       Keyboard
    0x0A            IRQ 2       Cascade for slave controller (IRQ 8-15)
    0x0B            IRQ 3       Serial port 2
    0x0C            IRQ 4       Serial port 1
    0x0D            IRQ 5       Parallel port 2 in AT, reserved in PS/2 systems
    0x0E            IRQ 6       Diskette drive
    0x0F            IRQ 7       Parallel port 1
    0x70            IRQ 8/0     Real-time clock
    0x71            IRQ 9/1     CGA vertical retrace (and other IRQ 2 devices)
    0x72            IRQ 10/2    Reserved
    0x73            IRQ 11/3    Reserved
    0x74            IRQ 12/4    Reserved in AT, auxiliary device on PS/2 systems
    0x75            IRQ 13/5    FPU interrupt
    0x76            IRQ 14/6    Hard disk controller
    0x77            IRQ 15/7    Reserved

Interrupts and Exceptions (Protected Mode)
    Vector Number   Description             Source
    0               Divide error            DIV and IDIV instructions
    1               Debug                   Any code or data reference
    2               NMI interrupt           Nonmaskable external interrupt
    3               Breakpoint              INT 3 instruction
    4               Overflow                INTO instruction
    5               Bound range exceeded    BOUND instruction
    6               Invalid opcode          UD2 instruction or reserved opcode
    7               Device not available    No math coprocessor
    8               Double fault            Any instruction that can generate an exception, an NMI, or an INTR
    9               Reserved
    10              Invalid TSS             Task switch or TSS access

Interrupts and Exceptions (continued)
    Vector Number   Description             Source
    11              Segment Not Present     Loading segment registers or accessing system segments
    12              Stack Segment Fault     Stack operations and SS register loads
    13              General Protection      Any memory reference and other protection checks
    14              Page Fault              Any memory reference
    15              Reserved
    16              Floating-Point Error    Floating-point or WAIT/FWAIT instruction
    17              Alignment Check         Any data reference in memory
    18              Machine Check           Model dependent
    19-31           Reserved
    32-255          Maskable Interrupts     External interrupt from INTR pin or INT n instruction

Signals
Linux traps all interrupts and exceptions that are generated by the system.
Under some circumstances, the operating system will send a signal to a running process informing it that an exceptional situation has occurred.

Signals (continued)
Some signals report errors such as references to invalid memory addresses; others report asynchronous events, such as disconnection of a phone line.

Hardware Interrupts & Signals
[Figure: Process, PIC 8259, CPU x86 and OS kernel, connected by the following steps.]
1. A device generates a hardware interrupt.
2. The CPU calls the handler procedure provided by the OS kernel.
3. If required, the OS kernel sends a signal to a process.

Software Exceptions & Signals
[Figure: Process, CPU x86 and OS kernel, connected by the following steps.]
1. The process generates a software exception.
2. The CPU calls the handler procedure provided by the OS kernel.
3. The OS kernel sends a signal to the offending process.

Signal Handling
A programmer may arrange for a particular signal to be ignored or to be processed by a special piece of code called a signal handler.

Signal Handling (continued)
In the latter case, the process that receives the signal suspends its current flow of control, executes the signal handler, and then resumes the original flow of control when the signal handler finishes.

Predefined Signals
There are 31 different signals defined for UNIX.
A programmer may choose one of the following actions for a particular signal:
    Trigger a user-supplied signal handler
    Trigger the default kernel-supplied handler
    Ignore it

Default Signal Handlers
    DUMP: terminate the process and generate a core (memory) image file
    QUIT: terminate the process without generating a core image file
    IGNORE: ignore and discard the signal
    SUSPEND: suspend the process

List of Signals
    Macro       Signal Number   Default Action   Description
    SIGHUP      1               quit             Hangup
    SIGINT      2               quit             Interrupt
    SIGQUIT     3               dump             Quit
    SIGILL      4               dump             Illegal instruction
    SIGTRAP     5               dump             Trace trap (for debugging)
    SIGIOT      6               dump             IO Trap instruction
    SIGBUS      7               dump             Bus error
    SIGFPE      8               dump             Floating Point Exception
    SIGKILL     9               quit             Kill (cannot be caught, blocked or ignored)
    SIGUSR1     10              quit             User defined signal 1

List of Signals (continued)
    Macro       Signal Number   Default Action   Description
    SIGSEGV     11              dump             Segmentation violation
    SIGUSR2     12              quit             User defined signal 2
    SIGPIPE     13              quit             Write on a pipe with no one to read it
    SIGALRM     14              quit             Alarm clock
    SIGTERM     15              quit             Software termination signal
    SIGCHLD     17              ignore           Child status has changed
    SIGCONT     18              ignore           Continue after stop
    SIGSTOP     19              suspend          Stop (cannot be caught, blocked or ignored)
    SIGTSTP     20              suspend          Stop signal generated from keyboard

List of Signals (continued)
    Macro       Signal Number   Default Action   Description
    SIGTTIN     21              suspend          Background read attempted from control terminal
    SIGTTOU     22              suspend          Background write attempted to control terminal
    SIGURG      23              ignore           Urgent condition present on socket
    SIGXCPU     24              quit             CPU time limit exceeded
    SIGXFSZ     25              quit             File size limit exceeded
    SIGVTALRM   26              quit             Virtual time alarm
    SIGPROF     27              quit             Profiling timer alarm
    SIGWINCH    28              ignore           Window size changed
    SIGLOST     29              quit             Resource lost

Setting a Signal Handler
The signal system call allows a process to specify the action that it will take when a particular signal is received.
Setting a Signal Handler (continued)
It takes two parameters (from left to right):
1. The code number of the signal to be reprogrammed.
2. The address of a user defined function, which will be executed when the specified signal arrives, or zero (SIG_DFL) to use the default handler, or one (SIG_IGN) to ignore the signal.
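A short C sketch of the two parameters in use follows. It relies on the standard `signal` and `raise` functions from `<signal.h>`; the names `on_usr1`, `handle_usr1_once`, `ignore_usr1_once` and `demo_signal` are hypothetical. Each function sends the signal only once after installing the action, because on some systems a handler set with `signal` is reset to the default after its first delivery.

```c
#include <assert.h>
#include <signal.h>

/* Counts deliveries; sig_atomic_t is safe to modify inside a handler. */
static volatile sig_atomic_t usr1_count = 0;

/* User defined handler: receives the signal number as its argument. */
static void on_usr1(int signum)
{
    (void)signum;
    usr1_count++;
}

/* Installs on_usr1 for SIGUSR1, sends the signal to this process once,
   restores the previous action and returns the number of deliveries. */
int handle_usr1_once(void)
{
    void (*previous)(int) = signal(SIGUSR1, on_usr1);

    if (previous == SIG_ERR)
        return -1;
    usr1_count = 0;
    raise(SIGUSR1);                 /* handler runs before raise returns */
    signal(SIGUSR1, previous);
    return (int)usr1_count;
}

/* Ignores SIGUSR1, sends it once, and returns the number of deliveries
   seen by the handler (none, because the signal is discarded). */
int ignore_usr1_once(void)
{
    void (*previous)(int) = signal(SIGUSR1, SIG_IGN);

    if (previous == SIG_ERR)
        return -1;
    usr1_count = 0;
    raise(SIGUSR1);                 /* discarded instead of killing the process */
    signal(SIGUSR1, previous);
    return (int)usr1_count;
}

/* Shows the user handler and the ignore action side by side. */
void demo_signal(void)
{
    assert(handle_usr1_once() == 1);
    assert(ignore_usr1_once() == 0);
}
```

Without either call the default action for SIGUSR1 applies, which is to terminate the process, so the second function also shows why SIG_IGN is useful.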