Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CS2422 Assembly Language and System Programming IA-32 Processor Architecture Department of Computer Science National Tsing Hua University Assembly Language for IntelBased Computers, 5th Edition CS2422 Assembly Language and System Programming Kip Irvine Chapter 2: IA-32 Processor Architecture Slides prepared by the author Revision date: June 4, 2006 (c) Pearson Education, 2006-2007. All rights reserved. You may modify and copy this slide show for your personal use, or for use in the classroom, as long as this copyright statement, the author's name, and the title are not changed. Chapter Overview Goal: Understand IA-32 architecture Basic Concepts of Computer Organization Instruction execution cycle Basic computer organization Data storage in memory How programs run IA-32 Processor Architecture IA-32 Memory Management Components of an IA-32 Microcomputer Input-Output System 2 Recall: Computer Model for ASM MOV AX, a ADD AX, b MOV x, AX … Register CPU PC AX BX xa+b Memory a 010100110010101 b 110010110001010 x 000000000010010 ... ALU + 3 Meanings of the Code (assumed) Assembly code MOV AX, a (Take the data stored in memory address ‘a’, and move it to register AX) ADD AX, b (Take the data stored in memory address ‘b’, and add it to register AX) MOV x, AX Machine code 01 0000001 1000010 MOV register address AX memory address a 11 0000001 1000110 ADD 01 1000011 0000001 (Take the data stored in register AX, and move it to memory address ‘x’) 4 Another Computer Model for ASM Processor AX BX … x a data 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 ALU IR Memory Stored program architecture Register b MOV AX, a ADD AX, b MOV x, AX … PC address PC: program counter IR: instruction register 5 Step 1: Fetch (MOV AX, a) Register AX BX … Memory x a data 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 ALU b MOV AX, a ADD AX, b MOV x, AX … IR PC 01 0000001 1000010 0000111 address 6 Step 2: Decode (MOV AX,a) Register AX BX … Memory x a data 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 Controller ALU b MOV AX, a ADD AX, b MOV x, AX … clock IR PC 01 0000001 1000010 0000111 address 7 Step 3: Execute (MOV AX,a) Register AX BX Memory 00000000 00000001 … data x 00000000 000000001 a 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 Controller ALU b MOV AX, a ADD AX, b MOV x, AX … clock IR PC 01 0000001 1000010 0000111 address 8 Step 1: Fetch (ADD AX,b) Register AX BX Memory 00000000 00000001 … x a data 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 ALU b MOV AX, a ADD AX, b MOV x, AX … IR PC 11 0000001 1000110 0001000 address 9 Step 2: Decode (ADD AX,b) Register AX BX Memory 00000000 00000001 … x a data 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 Controller ALU b MOV AX, a ADD AX, b MOV x, AX … clock IR PC 11 0000001 1000110 0001000 address 10 Step 3a: Execute (ADD AX,b) Register AX BX Memory 00000000 00000001 … x a data 00000000 00000010 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 Controller + ALU b MOV AX, a ADD AX, b MOV x, AX … 00000000 00000011 clock IR PC 11 0000001 1000110 0001000 address 11 Step 3b: Write Back (ADD AX,b) Register AX BX Memory 00000000 00000000 00000011 00000001 … x a data 01 0000001 1000010 11 0000001 1000110 01 1000011 0000001 Controller ALU b MOV AX, a ADD AX, b MOV x, AX … 00000000 00000011 clock IR PC 11 0000001 1000110 0001000 address 12 Basic Computer Organization Clock synchronizes CPU operations Control unit (CU) coordinates execution sequence ALU performs arithmetic and bitwise processing data bus registers Central Processor Unit (CPU) ALU CU Memory Storage Unit I/O Device #1 I/O Device #2 clock control bus address bus 13 Clock Operations in a computer are triggered and thus synchronized by a clock Clock tells “when”: (no need to ask each other!!) When to put data on output lines When to read data from input lines Clock cycle measures time of a single operation Must long enough to allow signal propagation one cycle 1 0 14 Instruction/Data for Operations Where are the instructions needed for computer operations from? Stored-program architecture: The whole program is stored in main memory, including program instructions (code) and data CPU loads the instructions and data from memory for execution Don’t worry about the disk for now Where are the data needed for execution? Registers (inside the CPU, discussed later) Memory Constant encoded in the instructions 15 Memory Organized like mailboxes, numbered 0, 1, 2, 3,…, 2n-1. Each box can hold 8 bits (1 byte) So it is called byte-addressing Address of mailboxes: 16-bit address is enough for up to 64K 20-bit for 1M 32-bit for 4G Most servers need more than 4G!! That’s why we need 64-bit CPUs like Alpha (DEC/Compaq/HP) or Merced (Intel) … 16 Storing Data in Memory Character String: So how are strings like “Hello, World!” are stored in memory? ASCII Code! (or Unicode…etc.) Each character is stored as a byte Review: how is “1234” stored in memory? Integer: A byte can hold an integer number: ‒ between 0 and 255 (unsigned) or ‒ between –128 and 127 (2’s complement) How to store a bigger number? Review: how is 1234 stored in memory? 17 Big or Little Endian? Example: 1234 is stored in 2 bytes. = 100 1101 0010 in binary = 04 D2 in hexadecimal Do you store 04 or D2 first? Big Endian: 04 first Little Endian: D2 first Intel’s choice Reason: more consistent for variable length (e.g., 2 bytes, 4 bytes, 8 bytes…etc.) 18 Cache Memory High-speed expensive static RAM both inside and outside the CPU. Level-1 cache: inside the CPU chip Level-2 cache: often outside the CPU chip Cache hit: when data to be read is already in cache memory Cache miss: when data to be read is not in cache memory 19 How a Program Runs? User sends program name to Operating system gets starting cluster from searches for program in returns to System path loads and starts Directory entry Current directory Program 20 Load and Execute Process OS searches for program’s filename in current directory and then in directory path If found, OS reads information from directory OS loads file into memory from disk OS allocates memory for program information OS executes a branch to cause CPU to execute the program. A running program is called a process Process runs by itself. OS tracks execution and responds to requests for resources When the process ends, its handle is removed and memory is released How? OS is only a program! 21 Multitasking OS can run multiple programs at same time Multiple threads of execution within the same program Scheduler utility assigns a given amount of CPU time to each running program Rapid switching of tasks Gives illusion that all programs are running at the same time Processor must support task switching What supports are needed from hardware? 22 What's Next General Concepts IA-32 Processor Architecture Modes of operation Basic execution environment Floating-point unit Intel microprocessor history IA-32 Memory Management Components of an IA-32 Microcomputer Input-Output System 23 Modes of Operation Protected mode Real-address mode native mode (Windows, Linux) Programs are given separate memory areas named segments native MS-DOS System management mode power management, system security, diagnostics • Virtual-8086 mode hybrid of Protected each program has its own 8086 computer 24 Basic Execution Environment Address space: Protected mode 4 GB 32-bit address Real-address and Virtual-8086 modes 1 MB space 20-bit address 25 Basic Execution Environment Program execution registers: named storage locations inside the CPU, optimized for speed 32-bit General-Purpose Registers EAX EBP EBX ESP ECX ESIRegister EDX EDI Memory 16-bit Segment ALU Registers Controller EFLAGS CS SS clock EIP DS N Z IR ES PCFS GS 26 General Purpose Registers Used for arithmetic and data movement Addressing: AX, BX, CX, DX: 16 bits Split into H and L parts, 8 bits each Extended into E?X to become 32-bit register (i.e., EAX, EBX,…etc.) 27 Index and Base Registers Some registers have only a 16-bit name for their lower half: 28 Some Specialized Register Uses General purpose registers EAX: accumulator, automatically used by multiplication and division instructions ECX: loop counter ESP: stack pointer ESI, EDI: index registers (source, destination) for memory transfer, e.g. a[i,j] EBP: frame pointer to reference function parameters and local variables on stack EIP: instruction pointer (i.e. program counter) 29 Some Specialized Register Uses Segment registers In real-address mode: indicate base addresses of preassigned memory areas named segments In protected mode: hold pointers to segment descriptor tables CS: code segment DS: data segment SS: stack segment ES, FS, GS: additional segments EFLAGS Status and control flags (single binary bits) Control the operation of the CPU or reflect the outcome of some CPU operation 30 Status Flags (EFLAGS) Reflect the outcomes of arithmetic and logical operations performed by the CPU Carry: unsigned arithmetic out of range Overflow: signed arithmetic out of range Sign: result is negative Zero: result is zero Auxiliary Carry: carry from bit 3 to bit 4 Register Parity: sum of 1 bits is an even number Memory Controller clock ALU N Z IR PC 31 System Registers Application programs cannot access system registers IDTR (Interrupt Descriptor Table Register) GDTR (Global Descriptor Table Register) LDTR (Local Descriptor Table Register) Task Register Debug Registers Control registers CR0, CR2, CR3, CR4 Model-Specific Registers 32 Floating-Point, MMX, XMM Reg. Eight 80-bit floating-point data registers ST(0), ST(1), . . . , ST(7) 80-bit Data Registers arranged in a stack ST(0) used for all floating-point arithmetic Eight 64-bit MMX registers Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations ST(1) ST(2) ST(3) ST(4) ST(5) ST(6) ST(7) 33 Opcode Register Intel Microprocessors Early microprocessors: Intel 8080: ‒ 64K addressable RAM, 8-bit registers ‒ CP/M operating system ‒ S-100 BUS architecture ‒ 8-inch floppy disks! Intel 8086/8088 ‒ IBM-PC used 8088 ‒ 1 MB addressable RAM, 16-bit registers ‒ 16-bit data bus (8-bit for 8088) ‒ separate floating-point unit (8087) This is where “real-address mode” comes from! 34 Intel Microprocessors The IBM-AT Intel 80286 ‒ 16 MB addressable RAM ‒ Protected memory ‒ Introduced IDE bus architecture ‒ 80287 floating point unit Intel IA-32 Family Intel386: 4 GB addressable RAM, 32-bit registers, paging (virtual memory) Intel486: instruction pipelining Pentium: superscalar, 32-bit address bus, 64-bit internal data path 35 Intel Microprocessors Intel P6 Family Pentium Pro: advanced optimization techniques in microcode Pentium II: MMX (multimedia) instruction set Pentium III: SIMD (streaming extensions) instructions Pentium 4 and Xeon: Intel NetBurst microarchitecture, tuned for multimedia 36 What’s Next General Concepts of Computer Architecture IA-32 Processor Architecture IA-32 Memory Management Real-address mode Calculating linear addresses Understand it from Protected mode the view point of the processor Multi-segment model Paging Components of an IA-32 Microcomputer Input-Output System 37 Real-Address Mode Programs have 1 MB RAM maximum addressable with 20-bit addresses Application programs can access any area of the 1MB memory Single tasking: one program at a time, but CPU can momentarily interrupt that program to process requests (called interrupts) from peripherals MS-DOS runs in real-address mode 38 Ancient History IBM PC XT (Intel 8088/8086) is a so-called 16-bit machine Each register has 16 bits 216 = 65536 = 64K Want to use more memory (640K, 1M)… How to hold 20-bit addresses with 16-bit registers in the 8086 processor? Solution: segmented memory All of memory is divided into 64KB units called segments 16 segments in total One 16-bit register for segment value and another for 16-bit offset within the segment 39 Segmented Memory Segmented memory addressing: absolute (linear) address is a combination of a 16-bit segment value (in CS, DS, SS, or ES) added to a 16-bit offset F0000 E0000 8000:FFFF D0000 C0000 B0000 A0000 90000 80000 one segment 70000 60000 50000 40000 30000 represented as 8000:0250 0250 8000:0000 20000 8000 0000 10000 seg segment value ofs offset 00000 40 Calculating Linear Addresses Given a segment address, multiply it by 16 (add a hexadecimal zero), and add it to the offset all done by the processor Example: convert 08F1:0100 to a linear address Adjusted Segment value: 0 8 F 1 0 Add the offset: 0 1 0 0 Linear address: 0 9 0 1 0 41 Protected Mode Designed for multitasking Each process (running program) is assigned a total of 4GB of addressable RAM Two parts: Segmentation: provides a mechanism of isolating individual code, data, and stack so that multiple programs can run without interfering one another Paging: provides demand-paged virtual memory where sections of a program’s execution environ. are moved into physical memory as needed Give segmentation the illusion that it has 4GB of physical memory 42 Segmentation in Protected Mode Segment: a logical unit of storage (not the same as the “segment” in real-address mode) e.g., code/data/stack of a program, system data structures Variable size Processor hardware provides protection All segments in the system are in the processor’s linear address space (physical space if without paging) Need to specify: base address, size, type, … segment descriptor & descriptor table linear address = base address + offset 43 Flat Segment Model Use a single global descriptor table (GDT) All segments (at least 1 code and 1 data) mapped to entire 32-bit address space not used Segment descriptor, in the Global Descriptor Table FFFFFFFF (4GB) 00040000 limit access 00000000 00040 ---- physical RAM base address 00000000 44 Multi-Segment Model Local descriptor table (LDT) for each program One descriptor for each segment located in a system segment of LDT type RAM Local Descriptor Table 26000 base limit 00026000 0010 00008000 000A 00003000 0002 access 8000 300045 Segmentation Addressing Program references a memory location with a logical address: segment selector + offset Segment selector: provides an offset into the descriptor table CS/DS/SS points to descriptor table for code/data/stack segment 46 Convert Logical to Linear Address Segment selector points to a segment descriptor, which contains base address of the segment. The 32-bit offset from the logical address is added to the segment’s base address, generating a 32-bit linear address Logical address Selector Offset Descriptor table Segment Descriptor + GDTR/LDTR Linear address (contains base address of descriptor table) 47 Paging Supported directly by the processor Divides each segment into 4096-byte blocks called pages Part of running program is in memory, part is on disk Sum of all programs can be larger than physical memory Virtual memory manager (VMM): An OS utility that manages loading and unloading of pages Page fault: issued by processor when a page must be loaded from disk 48 What's Next General Concepts IA-32 Processor Architecture IA-32 Memory Management Components of an IA-32 Microcomputer Skipped … Input-Output System 49 What's Next General Concepts IA-32 Processor Architecture IA-32 Memory Management Components of an IA-32 Microcomputer Input-Output System How to access I/O systems? 50 Different Access Levels of I/O Call a HLL library function (C++, Java) Call an operating system function specific to one OS; device-independent medium performance Call a BIOS (basic input-output system) function easy to do; abstracted from hardware slowest performance may produce different results on different systems knowledge of hardware required usually good performance Communicate directly with the hardware May not be allowed by some operating systems 51 Displaying a String of Characters When a HLL program displays a string of characters, the following steps take place: Calls an HLL library function to write the string to standard output Library function (Level 3) calls an OS function, passing a string pointer OS function (Level 2) calls a BIOS subroutine, passing ASCII code and color of each character BIOS subroutine (Level 1) maps the character to a system font, and sends it to a hardware port attached to the video controller card Video controller card (Level 0) generates timed hardware signals to video display to display pixels 52 Summary Central Processing Unit (CPU) Arithmetic Logic Unit (ALU) Instruction execution cycle Multitasking Floating Point Unit (FPU) Complex Instruction Set Real mode and Protected mode Motherboard components Memory types Input/Output and access levels 53