COMP541
Datapaths I
Montek Singh
Mar 28, 2012

Topics
Over the next 2 classes: datapaths
    – How ALUs are designed
    – How data is stored in a register file
Lab 9: Start building a datapath!

What is computer architecture?

Architecture (ISA)
Jumping up a few levels of abstraction.
Architecture: the programmer’s view of the computer
    – Defined by instructions (operations) and operand locations
Microarchitecture: how to implement an architecture in hardware

Abstraction layer       Examples
Application Software    programs
Operating Systems       device drivers
Architecture            instructions, registers
Microarchitecture       datapaths, controllers
Logic                   adders, memories
Digital Circuits        AND gates, NOT gates
Analog Circuits         amplifiers, filters
Devices                 transistors, diodes
Physics                 electrons

MIPS Machine Language
Three instruction formats:
    – R-Type: register operands
    – I-Type: immediate operand
    – J-Type: for jumps

R-Type instructions
Register-type
3 register operands:
    – rs, rt: source registers
    – rd: destination register
Other fields:
    – op: the operation code or opcode (0 for R-type instructions)
    – funct: the function; together, op and funct tell the computer which operation to perform
    – shamt: the shift amount for shift instructions, otherwise it is 0

R-Type
op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

R-Type Examples
Field Values
Assembly Code         op      rs      rt      rd      shamt   funct
add $s0, $s1, $s2     0       17      18      16      0       32
sub $t0, $t3, $t5     0       11      13      8       0       34
                      6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

Machine Code
Assembly Code         op      rs      rt      rd      shamt   funct   Hex
add $s0, $s1, $s2     000000  10001   10010   10000   00000   100000  (0x02328020)
sub $t0, $t3, $t5     000000  01011   01101   01000   00000   100010  (0x016D4022)
                      6 bits  5 bits  5 bits  5 bits  5 bits  6 bits

Note the order of registers in the assembly code: add rd, rs, rt

I-Type instructions
Immediate-type
3 operands:
    – rs, rt: register operands
    – imm: 16-bit two’s complement immediate
Other field:
    – op: the opcode

I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits

I-Type Examples
Field Values
Assembly Code         op      rs      rt      imm
addi $s0, $s1, 5      8       17      16      5
addi $t0, $s3, -12    8       19      8       -12
lw   $t2, 32($0)      35      0       10      32
sw   $s1, 4($t1)      43      9       17      4
                      6 bits  5 bits  5 bits  16 bits

Machine Code
Assembly Code         op      rs      rt      imm                  Hex
addi $s0, $s1, 5      001000  10001   10000   0000 0000 0000 0101  (0x22300005)
addi $t0, $s3, -12    001000  10011   01000   1111 1111 1111 0100  (0x2268FFF4)
lw   $t2, 32($0)      100011  00000   01010   0000 0000 0010 0000  (0x8C0A0020)
sw   $s1, 4($t1)      101011  01001   10001   0000 0000 0000 0100  (0xAD310004)
                      6 bits  5 bits  5 bits  16 bits

Note the differing order of registers in the assembly and machine codes:
    – addi rt, rs, imm
    – lw rt, imm(rs)
    – sw rt, imm(rs)

J-Type instructions
Jump-type
26-bit address operand (addr)
Used for jump instructions (j)

J-Type
op      addr
6 bits  26 bits

Review: Instruction Formats
R-Type
op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
I-Type
op      rs      rt      imm
6 bits  5 bits  5 bits  16 bits
J-Type
op      addr
6 bits  26 bits

Microarchitecture
Microarchitecture: how to implement an architecture in hardware
    – This is sometimes just called implementation
Processor:
    – Datapath: functional blocks
    – Control: control signals

Parts of CPUs
Datapath
    – The registers and logic to perform operations on them
Control unit
    – Generates signals to control datapath

Memory and I/O
Memories are connected to the data/control in and out lines
    – Example: register to memory ops
Will discuss I/O arrangements later

Basic Datapath
Basic components of the CPU datapath
    – PC, Instruction Memory, Register File, ALU, Data Memory
[Figure: datapath state elements. PC register (input PC', output PC, 32 bits); Instruction Memory (address A, read data RD); Register File (5-bit addresses A1, A2, A3; write data WD3; write enable WE3; read data RD1, RD2); Data Memory (address A, write data WD, write enable WE, read data RD); clocked elements are driven by CLK.]

First: A “lightweight” ALU
Arithmetic Logic Unit = ALU

Lightweight ALU
A lightweight ALU from textbook:
    – 3-bit function select (7 functions)
[Figure: ALU symbol with N-bit inputs A and B, 3-bit function select F, and N-bit output Y.]

F2:0    Function
000     A & B
001     A | B
010     A + B
011     not used
100     A & ~B
101     A | ~B
110     A - B
111     SLT

Lightweight ALU:
Internals (light-weight version)
[Figure: lightweight ALU internals. N-bit inputs A and B; a 2:1 multiplexer controlled by F2; an adder (+) with carry out Cout and sum S; a zero-extend block fed by S[N-1]; and a 4:1 output multiplexer selected by F1:0 that produces Y. The function table is the same as on the previous slide.]

Set Less Than (SLT) Example
Configure a 32-bit ALU for the set if less than (SLT) operation.
Suppose A = 25 and B = 32.
    – A is less than B, so we expect Y to be the 32-bit representation of 1 (0x00000001).
    – For SLT, F2:0 = 111.
    – F2 = 1 configures the adder unit as a subtracter. So 25 - 32 = -7.
    – The two’s complement representation of -7 has a 1 in the most significant bit, so S31 = 1.
    – With F1:0 = 11, the final multiplexer selects Y = S31 (zero extended) = 0x00000001.

Next: A “full-feature” ALU

Arithmetic Logic Unit (ALU)
Full-feature ALU from COMP411:
[Figure: ALU block diagram. Inputs A and B and a 5-bit ALUFN control (Sub, Bool, Shft, Math) feed an Add/Sub unit, a Bidirectional Barrel Shifter, and a Boolean unit; multiplexers select the result R; flags N, V, C, and Z are generated.]

Sub  Bool  Shft  Math  OP
0    XX    0     1     A + B
1    XX    0     1     A - B
X    X0    1     1     0
X    X1    1     1     1
X    00    1     0     B << A
X    10    1     0     B >> A
X    11    1     0     B >>> A
X    00    0     0     A & B
X    01    0     0     A | B
X    10    0     0     A ^ B
X    11    0     0     ~(A | B)

Shifting Logic
Shifting is a common operation
    – applied to groups of bits
    – used for alignment
    – used for “short cut” arithmetic operations
        X << 1 is often the same as 2*X
        X >> 1 can be the same as X/2
For example:
    – X = 20₁₀ = 00010100₂
    – Left Shift: (X << 1) = 00101000₂ = 40₁₀
    – Right Shift: (X >> 1) = 00001010₂ = 10₁₀
    – Signed or “Arithmetic” Right Shift: (-X >>> 1) = (11101100₂ >>> 1) = 11110110₂ = -10₁₀
[Figure: 8-bit shift-left-by-1 circuit. A row of 2-input multiplexers, all selected by SHL1, produces R7..R0 from X7..X0, with a “0” shifted into the lowest position.]

Shifting Logic
How do you shift by more than 1 position?
    – feed other bits into the multiplexer
    – e.g., left-shift-by-2 multiplexer for Rk receives input from Xk-2
How do you allow the shift amount to be specified dynamically?
    – need a bigger multiplexer
    – shift amount is applied as the select input
    – will design in class and lab

Boolean Operations
It will also be useful to perform logical operations on groups of bits. Which ones?
ANDing is useful for “masking” off groups of bits.
    ex. 10101110 & 00001111 = 00001110 (mask selects last 4 bits)
ANDing is also useful for “clearing” groups of bits.
    ex. 10101110 & 00001111 = 00001110 (0’s clear first 4 bits)
ORing is useful for “setting” groups of bits.
    ex. 10101110 | 00001111 = 10101111 (1’s set last 4 bits)
XORing is useful for “complementing” groups of bits.
    ex. 10101110 ^ 00001111 = 10100001 (1’s invert last 4 bits)
NORing is useful for.. uhm…
    ex. 10101110 # 00001111 = 01010000 (# denotes NOR; 0’s invert, 1’s clear)

Boolean Unit
It is simple to build up a Boolean unit using primitive gates and a mux to select the function.
Since there is no interconnection between bits, this unit can be simply replicated at each position.
The cost is about 7 gates per bit: one for each primitive function, and approx 3 for the 4-input mux.
[Figure: one bit slice of the Boolean unit. Inputs Ai and Bi feed the primitive gates; a 4-input mux (inputs 00, 01, 10, 11) selected by Bool produces Qi. This logic block is repeated for each bit (i.e. 32 times).]

An ALU at last!
Full-feature ALU from COMP411 (same block diagram and function table as on the earlier “Arithmetic Logic Unit (ALU)” slide).

Which one do we implement?
We will use the full-feature one!
    – slightly more challenging … I will help you!
    – … but a lot more fun to use
    – supports a much more useful set of instructions for your final programming project

Processor Architecture
Rather, “microarchitecture” or implementation

Microarchitectures
Multiple implementations for a single architecture:
    – Single-cycle: each instruction executes in a single cycle
    – Multicycle: each instruction is broken up into a series of shorter steps
    – Pipelined: each instruction is broken up into a series of steps, and multiple instructions execute at once
The choice of implementation directly impacts the performance obtained

Processor Performance
Program execution time
    Execution Time = (# instructions) × (cycles/instruction) × (seconds/cycle)
Definitions:
    – Cycles/instruction = CPI
    – Seconds/cycle = clock period
    – 1/CPI = Instructions/cycle = IPC
Challenge is to satisfy constraints of:
    – Cost
    – Power
    – Performance

MIPS Processor
We will consider a subset of MIPS instructions (in book & lab):
    – R-type instructions: and, or, add, sub, slt, …
    – Memory instructions: lw, sw, …
    – Branch instructions: beq, …
    – Some immediate instructions too: addi, …
    – Jumps as well: j, …

Next
Next class:
    – We’ll look at single cycle MIPS
    – Then the more complex versions
Lab Friday (March 30)
    – Demo your graphics displays (Lab 8)
    – Start on Lab 9 (will post on website by Fri)
    – start building the datapath!
        – ALU
        – Registers
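
Before starting on the ALU in Lab 9, it may help to check the lightweight ALU function table from these slides against a small software model. The sketch below is only an illustration in Python, not course code: the name light_alu, the default width of 32 bits, and the choice to raise an error for the unused code F = 011 are all assumptions made here. It mirrors the structure described in the SLT example: F2 selects B or ~B and turns the adder into a subtracter, and F1:0 selects the output.

```python
def light_alu(a: int, b: int, f: int, n: int = 32) -> int:
    """Behavioral model of the lightweight ALU function table (hypothetical helper).

    a, b: operands, treated as n-bit two's complement values
    f:    3-bit function select F2:0
    n:    operand width in bits (32 is an assumed default)
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask

    f2 = (f >> 2) & 1      # F2: use ~B and add 1 (i.e., subtract)
    sel = f & 0b11         # F1:0: output multiplexer select

    bb = (~b & mask) if f2 else b     # B or ~B
    s = (a + bb + f2) & mask          # adder output S (A+B or A-B)

    if sel == 0b00:
        return a & bb                 # A & B   or  A & ~B
    if sel == 0b01:
        return a | bb                 # A | B   or  A | ~B
    if sel == 0b10:
        return s                      # A + B   or  A - B
    if not f2:
        # F = 011 is "not used" in the table; rejecting it is an assumption.
        raise ValueError("F = 011 is not used")
    # F = 111 (SLT): zero-extended most significant bit of A - B.
    # As in the slide's design, overflow in A - B is not accounted for.
    return (s >> (n - 1)) & 1


if __name__ == "__main__":
    # SLT example from the slides: A = 25, B = 32
    print(hex(light_alu(25, 32, 0b110)))  # A - B as a 32-bit pattern
    print(light_alu(25, 32, 0b111))       # SLT result
```

Running the SLT example through the model gives S = 0xFFFFFFF9 (the 32-bit pattern for -7) for F = 110 and Y = 1 for F = 111, matching the walkthrough on the SLT slide.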