Central Processing Unit Architecture (36113741), Semester A, 5763
March 2007
Hugo Guterman ([email protected])
Web site: http://www.ee.bgu.ac.il/~cpuarch
Arch. CPU L3 ISA. 1 Guterman March 2007 ©BGU

What is "Computer Architecture"
Computer Architecture = Instruction Set Architecture + Machine Organization

Outline
° ISA and Assembly Language
° Instruction Set Definition (MIPS)
° Registers and Memory
° Arithmetic Instructions
° Load/Store Instructions
° Instruction Formats
° DLX Architecture and ISA

Instruction Set Architecture (ISA)

Software Layers

Levels of Representation (Intro. to Comp. Review)
High Level Language Program:
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
        ↓ Compiler
Assembly Language Program:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
        ↓ Assembler
Machine Language Program:
    0000 1010 1100 0101 1001 1111 0110 1000
    1100 0101 1010 0000 0110 1000 1111 1001
    1010 0000 0101 1100 1111 1001 1000 0110
    0101 1100 0000 1010 1000 0110 1001 1111
        ↓ Machine Interpretation
Control Signal Specification:
    ALUOP[0:3] <= InstReg[9:11] & MASK

Basic ISA Classes
° Memory to Memory Machines
    • But we need storage for temporaries
    • Memory is slow
    • Memory is big (lots of address bits)
° Architectural Registers
    • registers can hold temporary variables
    • registers are faster than memory
    • memory traffic is reduced, so the program is sped up (since registers are faster than memory)
    • code density improves (since a register is named with fewer bits than a memory location)

Basic ISA Classes (cont.)
Accumulator
    • 1 address      add A          acc ← acc + mem[A]
    • 1+x address    addx A         acc ← acc + mem[A + x]
General Purpose Register File (Register-Memory)
    • 2 address      add A B        EA(A) ← EA(A) + EA(B)
    • 3 address      add A B C      EA(A) ← EA(B) + EA(C)
General Purpose Register File (Load/Store)
    • 3 address      add Ra Rb Rc   Ra ← Rb + Rc
                     load Ra Rb     Ra ← mem[Rb]
                     store Ra Rb    mem[Rb] ← Ra
Stack (not a register file but an operand stack)
    • 0 address      add            tos ← tos + next
Comparison:
    • Bytes per instruction? Number of instructions? Cycles per instruction?

Comparing Number of Instructions

Generic Examples of Instruction Format Widths
[Figure: variable, fixed, and hybrid instruction format widths]

Top 10 80x86 Instructions
Rank   Instruction               Integer average (% of total executed)
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                        8%
6      and                        6%
7      sub                        5%
8      move register-register     4%
9      call                       1%
10     return                     1%
       Total                     96%
° Simple instructions dominate instruction frequency

Typical Operations (little change since 1960)
Data Movement           Load (from memory), Store (to memory), memory-to-memory move, register-to-register move, input (from I/O device), output (to I/O device), push, pop (to/from stack)
Arithmetic              integer (binary + decimal) or FP: Add, Subtract, Multiply, Divide
Shift                   shift left/right, rotate left/right
Logical                 not, and, or, set, clear
Control (Jump/Branch)   unconditional, conditional
Subroutine Linkage      call, return
Interrupt               trap, return
Synchronization         test & set (atomic r-m-w)
String                  search, translate
Graphics (MMX)          parallel subword ops (4 16-bit adds)
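To make the comparison questions concrete, the following minimal Python sketch evaluates C = A + B on toy models of the four classes listed above (stack, accumulator, register-memory, load/store) and counts instructions. The code sequences follow the common textbook example for C = A + B and are assumed here, not taken from the slides; the mnemonics, the tuple format and the helper `run` are hypothetical simplifications.

```python
# Toy models of the four ISA classes evaluating C = A + B.
# Mnemonics and operand conventions are illustrative assumptions.

def run(program, mem):
    """Execute a list of (op, *args) tuples; return (memory, instruction count)."""
    mem = dict(mem)
    regs = {}        # general-purpose registers (register-memory, load/store)
    acc = 0          # accumulator
    stack = []       # operand stack
    for op, *args in program:
        if op == "push":                 # stack: push mem[X]
            stack.append(mem[args[0]])
        elif op == "pop":                # stack: mem[X] <- tos
            mem[args[0]] = stack.pop()
        elif op == "sadd":               # stack: tos <- tos + next
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "aload":              # accumulator: acc <- mem[X]
            acc = mem[args[0]]
        elif op == "aadd":               # accumulator: acc <- acc + mem[X]
            acc += mem[args[0]]
        elif op == "astore":             # accumulator: mem[X] <- acc
            mem[args[0]] = acc
        elif op == "load":               # Ra <- mem[X]
            regs[args[0]] = mem[args[1]]
        elif op == "store":              # mem[X] <- Ra
            mem[args[1]] = regs[args[0]]
        elif op == "addm":               # register-memory: Rd <- Rs + mem[X]
            regs[args[0]] = regs[args[1]] + mem[args[2]]
        elif op == "add":                # load/store: Rd <- Rs + Rt
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown op {op}")
    return mem, len(program)

PROGRAMS = {
    "stack": [("push", "A"), ("push", "B"), ("sadd",), ("pop", "C")],
    "accumulator": [("aload", "A"), ("aadd", "B"), ("astore", "C")],
    "register-memory": [("load", "R1", "A"), ("addm", "R3", "R1", "B"),
                        ("store", "R3", "C")],
    "load/store": [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add", "R3", "R1", "R2"), ("store", "R3", "C")],
}

if __name__ == "__main__":
    for name, prog in PROGRAMS.items():
        mem, count = run(prog, {"A": 2, "B": 3})
        print(f"{name:16s} C = {mem['C']}  instructions = {count}")
```

The instruction count answers only one of the three questions: the accumulator and register-memory sequences are shorter, but their instructions each carry a memory address, which affects bytes per instruction and, typically, cycles per instruction.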
Compilers and Instruction Set Architectures
• Ease of compilation
    ° orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
    ° completeness: support for a wide range of operations and target applications
    ° regularity: no overloading for the meanings of instruction fields
    ° streamlined: resource needs easily determined
• Register assignment is critical too
    ° Easier if there are lots of registers

Addressing Mode Usage? (ignore register mode)
3 programs measured on a machine with all address modes (VAX):
    --- Displacement: 42% avg, 32% to 55%
    --- Immediate: 33% avg, 17% to 43%
    --- Register deferred (indirect): 13% avg, 3% to 24%
    --- Scaled: 7% avg, 0% to 16%
    --- Memory indirect: 3% avg, 1% to 6%
    --- Misc: 2% avg, 0% to 3%
75%: displacement & immediate
88%: displacement, immediate & register indirect

Instruction Format
• If there are many memory operands per instruction and many addressing modes => an address specifier per operand
• If it is a load-store machine with 1 address per instruction and one or two addressing modes => encode the addressing mode in the opcode

MIPS R3000 Instruction Set Architecture (Summary)
° Instruction Categories
    • Load/Store
    • Computational
    • Jump and Branch
    • Floating Point (coprocessor)
    • Memory Management
    • Special
° Registers: R0 - R31, PC, HI, LO
° 3 Instruction Formats: all 32 bits wide
    OP | rs | rt | rd | sa | funct
    OP | rs | rt | immediate
    OP | jump target

MIPS I Registers
° Programmable storage
    • 2^32 bytes of memory
    • 31 x 32-bit GPRs (R0 = 0)
    • 32 x 32-bit FP regs (paired DP)
    • HI, LO, PC
[Figure: registers r0, r1, ..., r31, plus PC, lo, hi]
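The three 32-bit formats in the MIPS R3000 summary can be illustrated with a small field packer. This Python sketch assumes the standard MIPS field widths (R: op 6, rs 5, rt 5, rd 5, sa 5, funct 6; I: op 6, rs 5, rt 5, immediate 16; J: op 6, target 26), which the slide lists by name only; the functions `field`, `r_type`, `i_type` and `j_type` are hypothetical helpers.

```python
# Pack MIPS instruction fields into a 32-bit word.
# Field widths follow the standard MIPS R/I/J formats (assumed, see text).

def field(value, width, signed=False):
    """Return value masked to `width` bits, rejecting values that do not fit."""
    lo = -(1 << (width - 1)) if signed else 0
    hi = (1 << width) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} bits")
    return value & hi

def r_type(op, rs, rt, rd, sa, funct):
    # op(6) | rs(5) | rt(5) | rd(5) | sa(5) | funct(6)
    return (field(op, 6) << 26 | field(rs, 5) << 21 | field(rt, 5) << 16
            | field(rd, 5) << 11 | field(sa, 5) << 6 | field(funct, 6))

def i_type(op, rs, rt, imm):
    # op(6) | rs(5) | rt(5) | immediate(16); negative values stored in two's complement
    return (field(op, 6) << 26 | field(rs, 5) << 21 | field(rt, 5) << 16
            | field(imm, 16, signed=True))

def j_type(op, target):
    # op(6) | jump target(26), a word address
    return field(op, 6) << 26 | field(target, 26)

if __name__ == "__main__":
    # add $8, $17, $18  (opcode 0, funct 0x20)
    print(f"{r_type(0, 17, 18, 8, 0, 0x20):08x}")   # 02324020
    # lw $15, 0($2)  (opcode 0x23), first instruction of the swap example
    print(f"{i_type(0x23, 2, 15, 0):08x}")          # 8c4f0000
```

Because all three formats keep the opcode in the same six bits, hardware can read the opcode before knowing which format applies; this regularity is one practical payoff of fixed-length instructions.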
MIPS Addressing Modes/Instruction Formats
• All instructions 32 bits wide
    Register (direct):   op | rs | rt | rd       operand in register
    Immediate:           op | rs | rt | immed
    Base+index:          op | rs | rt | immed    Memory[register + immed]
    PC-relative:         op | rs | rt | immed    Memory[PC + immed]
• Register Indirect?

Example: MIPS Assembly Language Notation

Instruction Set Definition (programming model)

Registers and Memory (MIPS)

Memory Organization

Addressing Objects: Endianness and Alignment
° Big Endian: address of most significant byte = word address (xx00 = Big End of word)
    • IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
° Little Endian: address of least significant byte = word address (xx00 = Little End of word)
    • Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
[Figure: byte numbering within a word, from msb to lsb: little endian 3 2 1 0; big endian 0 1 2 3]
° Alignment: require that objects fall on an address that is a multiple of their size.
[Figure: aligned and not aligned objects]

Instruction Cycle (execution model)

Executing an Assembly Instruction

Register File Program Execution

Another Example

Accessing Data

Memory Operation - Loads

Memory Operations - Store

Memory Operation - Loads
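Displacement (base + index) addressing, alignment and byte order all meet in a single word load. The Python sketch below models a byte-addressed memory and a load that forms the effective address as register + sign-extended 16-bit immediate, rejects unaligned word addresses, and stores the four bytes in big-endian order (as on MIPS) or little-endian order (as on 80x86). The `Memory` class, `effective_address` and `sign_extend16` are hypothetical illustrations, not a real simulator API.

```python
import struct

def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

class Memory:
    """Byte-addressed memory holding 32-bit words in a chosen byte order."""

    def __init__(self, size, big_endian=True):
        self.data = bytearray(size)
        self.fmt = ">I" if big_endian else "<I"   # struct format for one word

    def store_word(self, addr, value):
        self._check_aligned(addr)
        struct.pack_into(self.fmt, self.data, addr, value & 0xFFFFFFFF)

    def load_word(self, addr):
        self._check_aligned(addr)
        return struct.unpack_from(self.fmt, self.data, addr)[0]

    @staticmethod
    def _check_aligned(addr, size=4):
        # Alignment: a word must start at an address that is a multiple of 4.
        if addr % size:
            raise ValueError(f"unaligned word address {addr:#x}")

def effective_address(base_reg_value, imm16):
    """Displacement addressing: EA = register + sign-extended immediate."""
    return (base_reg_value + sign_extend16(imm16)) & 0xFFFFFFFF

if __name__ == "__main__":
    for big in (True, False):
        mem = Memory(16, big_endian=big)
        mem.store_word(8, 0x01020304)
        # Byte at the word address is 0x01 (big end) or 0x04 (little end)
        print("big" if big else "little", mem.data[8:12].hex())
        ea = effective_address(12, 0xFFFC)   # 12 + (-4) = 8
        print(hex(mem.load_word(ea)))        # 0x1020304 in both cases
```

The byte layout differs between the two orders, but a word stored and loaded on the same machine reads back unchanged; byte order only becomes visible when the same bytes are read at a different width or exchanged between machines.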
Memory Operation - Loads - cont'

Instruction Format

Instruction Formats

Constants

Loading Immediate Values

MIPS Machine Language

Summary
° If code size is most important, use variable-length instructions
° If performance is most important, use fixed-length instructions
° Recent embedded machines (ARM, MIPS) have an optional mode to execute a subset of 16-bit-wide instructions (Thumb, MIPS16); per procedure, decide whether performance or density is more important

Summary (cont')
° "Simple" computations, movements of data, etc., are not "simple" in terms of a single, obvious assembly instruction
    • Often requires a sequence of even more primitive instructions
    • One option is to try to "anticipate" every such computation, and try to provide an assembly instruction for it
        PRO: assembly programs are easier to write by hand
        CON: hardware gets really, really complicated by instructions used very rarely; compilers might be harder to write
    • The other option is to provide a small set of essential primitive instructions
        CON: anything in a high-level language turns into LOTS of instructions in assembly language
        PRO: hardware and compiler become easier to design, cleaner, and easier to optimize for speed and performance
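Constants show why a "simple" operation can need a sequence of primitive instructions: a fixed 32-bit instruction has only a 16-bit immediate field, so a full 32-bit constant cannot fit in one instruction. The Python sketch below assumes the usual MIPS idiom, `lui` (load upper immediate) followed by `ori` (OR with a zero-extended immediate), and models both on a plain integer register value; the helper names are hypothetical.

```python
def lui(imm16):
    """lui rt, imm: place the 16-bit immediate in the upper half, zero the lower half."""
    return (imm16 & 0xFFFF) << 16

def ori(rs_value, imm16):
    """ori rt, rs, imm: bitwise OR with the zero-extended 16-bit immediate."""
    return (rs_value | (imm16 & 0xFFFF)) & 0xFFFFFFFF

def load_constant(value):
    """Return the (upper, lower) immediates and the register value after lui + ori."""
    upper, lower = (value >> 16) & 0xFFFF, value & 0xFFFF
    reg = lui(upper)        # first instruction
    reg = ori(reg, lower)   # second instruction
    return upper, lower, reg

if __name__ == "__main__":
    upper, lower, reg = load_constant(0x003D0900)   # 4,000,000
    print(f"lui $t0, {upper:#06x}")                 # lui $t0, 0x003d
    print(f"ori $t0, $t0, {lower:#06x}")            # ori $t0, $t0, 0x0900
    assert reg == 4_000_000
```

The sketch uses `ori` rather than an add-immediate because `ori` zero-extends its operand; a sign-extending add would turn a lower half such as 0x8000 into a negative value and disturb the upper half.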
DLX ("Deluxe") Architecture
[Figure: DLX datapath divided into five stages. IF: instruction fetch; ID: instruction decode / register file read; EX: execute / address calculation; MEM: memory access; WB: write back]

Multicycle Approach
° Break up the instructions into steps; each step takes one cycle
    • balance the amount of work to be done
    • restrict each cycle to use only one major functional unit
° At the end of a cycle
    • store values for use in later cycles (easiest thing to do)
    • introduce additional "internal" registers
[Figure: multicycle datapath with internal registers: Instruction register, Memory data register, A, B, ALUOut]

Five Execution Steps
° Instruction Fetch
° Instruction Decode and Register Fetch
° Execution, Memory Address Computation, or Branch Completion
° Memory Access or R-type instruction completion
° Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

DLX Instruction Execution
° Every DLX instruction can be implemented in at most 5 clock cycles (CC):
• Instruction Fetch (and PC increment) cycle (IF)
    - IR ← Mem[PC]
    - NPC ← PC + 4
• Instruction Decode / Register Fetch cycle (ID)
    - A ← Regs[IR6..10]; B ← Regs[IR11..15]
    - Imm ← (IR16)^16 ## IR16..31   (sign-extended immediate)
• Execution / effective address cycle (EX): performs one of four operations, depending on the DLX instruction type
    1. Memory reference: ALUOutput ← A + Imm
    2. Register-register ALU instruction: ALUOutput ← A func B
    3. Register-immediate ALU instruction: ALUOutput ← A op Imm
    4. Branch: ALUOutput ← NPC + Imm; Cond ← (A op 0)

DLX Instruction Execution (cont')
• Memory access / branch completion cycle (MEM)
    1. Memory reference: LMD ← Mem[ALUOutput] or Mem[ALUOutput] ← B
    2. Branch: if (Cond) PC ← ALUOutput else PC ← NPC
• Write-back cycle (WB)
    1. Register-register ALU instruction: Regs[IR16..20] ← ALUOutput
    2. Register-immediate ALU instruction: Regs[IR11..15] ← ALUOutput
    3. Load instruction: Regs[IR11..15] ← LMD
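The register transfers of the five DLX steps can be traced with a small Python model. It is a sketch under stated assumptions: instructions are kept pre-decoded as hypothetical `Instr` records rather than 32-bit words (so bit-field selections such as IR6..10 become named fields `rs1`, `rs2`, `rd`), data memory is a dict keyed by byte address, and only a hypothetical subset is modeled: ADD, ADDI, LW, SW and BEQZ. As a further modeling assumption, ALU instructions pass through MEM without a memory access, while branches and stores finish at MEM.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # hypothetical subset: "ADD", "ADDI", "LW", "SW", "BEQZ"
    rs1: int = 0   # source register field (IR6..10)
    rs2: int = 0   # second source, or I-type destination (IR11..15)
    rd: int = 0    # register-register destination (IR16..20)
    imm: int = 0   # immediate, assumed already sign-extended

def execute(pc, imem, dmem, regs):
    """Run one instruction through the DLX steps; return (next_pc, steps taken)."""
    steps = []
    # IF: IR <- Mem[PC]; NPC <- PC + 4
    ir, npc = imem[pc], pc + 4
    steps.append("IF")
    # ID: A <- Regs[rs1]; B <- Regs[rs2]; Imm <- sign-extended immediate
    a, b, imm = regs[ir.rs1], regs[ir.rs2], ir.imm
    steps.append("ID")
    # EX: one of four operations, chosen by instruction type
    if ir.op in ("LW", "SW", "ADDI"):
        alu_out = a + imm            # memory reference or register-immediate ALU
    elif ir.op == "ADD":
        alu_out = a + b              # register-register ALU (func = add)
    elif ir.op == "BEQZ":
        alu_out = npc + imm          # branch target address
        cond = (a == 0)              # Cond <- (A == 0)
    else:
        raise ValueError(f"unsupported op {ir.op}")
    steps.append("EX")
    # MEM: memory access or branch completion (idle for ALU instructions)
    steps.append("MEM")
    if ir.op == "BEQZ":
        return (alu_out if cond else npc), steps
    if ir.op == "SW":
        dmem[alu_out] = b            # Mem[ALUOutput] <- B
        return npc, steps
    lmd = dmem[alu_out] if ir.op == "LW" else None   # LMD <- Mem[ALUOutput]
    # WB: Regs[rd] <- ALUOutput (R-R); Regs[rs2] <- ALUOutput or LMD (I-type)
    dest = ir.rd if ir.op == "ADD" else ir.rs2
    if dest != 0:                    # R0 is hard-wired to zero
        regs[dest] = lmd if ir.op == "LW" else alu_out
    steps.append("WB")
    return npc, steps

if __name__ == "__main__":
    regs = [0] * 32
    dmem = {100: 7}
    imem = {
        0: Instr("LW", rs1=0, rs2=1, imm=100),    # R1 <- Mem[R0 + 100]
        4: Instr("ADDI", rs1=1, rs2=2, imm=5),    # R2 <- R1 + 5
        8: Instr("ADD", rs1=1, rs2=2, rd=3),      # R3 <- R1 + R2
        12: Instr("SW", rs1=0, rs2=3, imm=104),   # Mem[R0 + 104] <- R3
        16: Instr("BEQZ", rs1=0, imm=-20),        # R0 == 0, so branch to 0
    }
    pc = 0
    for _ in range(5):
        instr = imem[pc]
        pc, steps = execute(pc, imem, dmem, regs)
        print(f"{instr.op:5s} {'/'.join(steps):18s} next PC = {pc}")
    print(regs[1], regs[2], regs[3], dmem[104])   # 7 12 19 19
```

Printing the recorded steps makes the "at most 5 CC" bound visible: in this model loads and ALU instructions use all five steps, while stores and branches stop after MEM.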