1 ALU for Computers (MIPS)
• design a fast ALU for the MIPS ISA
• requirements?
  – support the arithmetic/logic operations: add, addi, addiu, sub, subu, and, or, andi, ori, xor, xori, slt, slti, sltu, sltiu
• design a multiplier
• design a divider

2 Review Digital Logic
Gates: Combinational Logic

3 Review Digital Logic
PLA: AND array, OR array

4 Review Digital Logic

5 A D latch implemented with NOR gates. A D flip-flop with a falling-edge trigger.

6 Review Digital Logic
[Figure: D flip-flop with inputs D and CLK and output Q]
Value of D is sampled on positive clock edge. Q outputs sampled value for rest of cycle.

7 Review: Edge-Triggering in Verilog
Module code has two bugs. Where?
    module ff(D, Q, CLK);
    input D, CLK;
    output Q;
    always @ (CLK)
      Q <= D;
    endmodule
Correct?
    module ff(D, Q, CLK);
    input D, CLK;
    output Q;
    reg Q;
    always @ (posedge CLK)
      Q <= D;
    endmodule

8 [Figure: traffic light controller with inputs CLK, Change, Rst and outputs R (red), Y (yellow), G (green)]
If Change == 1 on positive CLK edge, traffic light changes.
If Rst == 1 on positive CLK edge, RYG = 100.

9 [State diagram: Rst == 1 enters RYG 100; each Change == 1 advances RYG 100 -> 001 -> 010 -> 100]

10 [State diagram as on slide 9, with a timing diagram: on successive Change pulses RYG = 100, 001, 010, 100]

11 [State diagram as on slide 9]
“One-Hot Encoding”: one D flip-flop per output (R, G, Y)

12 [State diagram as on slide 9; block diagram: Rst and Change feed the Next State Combinational Logic, which drives the D inputs of flip-flops R, G, Y]

13 State Elements: Traffic Light Controller
[Figure: D flip-flops R, G, Y]
    wire next_R, next_Y, next_G;
    output R, Y, G;
    ???

14 [Figure: D flip-flop with inputs D and CLK and output Q]
Value of D is sampled on positive clock edge. Q outputs sampled value for rest of cycle.
    module ff(Q, D, CLK);
    input D, CLK;
    output Q;
    reg Q;
    always @ (posedge CLK)
      Q <= D;
    endmodule

15 State Elements: Traffic Light Controller
[Figure: D flip-flops R, G, Y]
    wire next_R, next_Y, next_G;
    output R, Y, G;
    ff ff_R(R, next_R, CLK);
    ff ff_Y(Y, next_Y, CLK);
    ff ff_G(G, next_G, CLK);

16 Next State Logic: Traffic Light Controller
[Figure: Rst, Change, R, G, Y feed the Next State Combinational Logic, which produces next_R, next_G, next_Y]
    wire next_R, next_Y, next_G;
    // RYG sequence in the state diagram is 100 -> 001 -> 010 -> 100
    assign next_R = rst ? 1'b1 : (change ? Y : R);  // red follows yellow
    assign next_Y = rst ? 1'b0 : (change ? G : Y);  // yellow follows green
    assign next_G = rst ? 1'b0 : (change ? R : G);  // green follows red

17
    wire next_R, next_Y, next_G;
    output R, Y, G;
    assign next_R = rst ? 1'b1 : (change ? Y : R);
    assign next_Y = rst ? 1'b0 : (change ? G : Y);
    assign next_G = rst ? 1'b0 : (change ? R : G);
    ff ff_R(R, next_R, CLK);
    ff ff_Y(Y, next_Y, CLK);
    ff ff_G(G, next_G, CLK);

18 Logic Diagram: Traffic Light Controller
[Figure: state diagram as on slide 9, with the Next State Combinational Logic driving D flip-flops R, G, Y]

19 ALU for MIPS ISA
• design a 1-bit ALU using an AND gate, an OR gate, a full adder, and a mux

20 ALU for MIPS ISA
• design a 32-bit ALU by cascading 32 1-bit ALUs

21 ALU for MIPS
• a 1-bit ALU performing AND, OR, addition and subtraction
If we set Binvert = Carryin = 1 then we can perform a - b, because a - b = a + ~b + 1 in 2’s complement.

22 [Figure only]

23 ALU for MIPS
• include a “less” input for set-on-less-than (slt)

24 ALU for MIPS
• design the most significant bit ALU
• the most significant bit needs to do more work (detect overflow; the MSB can be used for slt)
• how to detect an overflow
    overflow = carryin{MSB} xor carryout{MSB}
    overflow = 1 ; means overflow
    overflow = 0 ; means no overflow
• set-on-less-than
    slt $1, $2, $3 ; if $2 < $3 then $1 = 1, else $1 = 0
                   ; if MSB of $2 - $3 is 1, then $1 = 1 (provided the subtraction does not overflow)
                   ; 2’s comp. MSB of a negative no. is 1

25 ALU for MIPS
• a 1-bit ALU for the MSB
Overflow = Carryin XOR Carryout

26 A 32-bit ALU constructed from 32 1-bit ALUs

27 A 32-bit ALU with zero detector

28 [Figure only]

29 A Verilog behavioral definition of a MIPS ALU.
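The figure for slide 29 did not survive in this transcript. As a stand-in, here is a minimal sketch of what a behavioral MIPS ALU might look like in Verilog-2001. The module name `mips_alu`, the port names, and the 4-bit control encoding are all assumptions made for illustration and are not taken from the slides. Overflow is computed by comparing operand and result signs, which for add and subtract is equivalent to the carry-in XOR carry-out test on the MSB described on slide 24.

```verilog
// Behavioral sketch of a 32-bit MIPS-style ALU (Verilog-2001).
// ASSUMPTION: module name, port names, and the alu_ctl encoding are
// illustrative only; the original slide's figure is not available.
module mips_alu (
    input  wire [3:0]  alu_ctl,   // operation select (assumed encoding)
    input  wire [31:0] a,
    input  wire [31:0] b,
    output reg  [31:0] alu_out,
    output wire        zero,      // 1 when alu_out is all zeros
    output reg         overflow   // signed overflow, add and sub only
);
    wire [31:0] sum  = a + b;
    wire [31:0] diff = a - b;     // same as a + ~b + 1

    assign zero = (alu_out == 32'b0);

    always @(*) begin
        overflow = 1'b0;
        case (alu_ctl)
            4'b0000: alu_out = a & b;
            4'b0001: alu_out = a | b;
            4'b0010: begin        // add
                alu_out  = sum;
                // operands share a sign that the sum does not
                overflow = (a[31] == b[31]) && (sum[31] != a[31]);
            end
            4'b0110: begin        // sub
                alu_out  = diff;
                // operand signs differ and the result sign differs from a
                overflow = (a[31] != b[31]) && (diff[31] != a[31]);
            end
            4'b0111: alu_out = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0; // slt
            4'b1100: alu_out = ~(a | b);                                  // nor
            4'b1101: alu_out = a ^ b;                                     // xor
            default: alu_out = 32'b0;
        endcase
    end
endmodule
```

The slt case uses a signed comparison instead of the sign bit of a - b, so it stays correct even when the subtraction overflows. A gate-level design built from 1-bit slices would need the overflow signal to correct the sign bit in that case.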

30 ALU for MIPS
• Critical path of a 32-bit ripple carry adder is 32 x carry propagation delay
• How to solve this problem
  – design trick: use more hardware
  – design trick: look ahead, peek
  – carry lookahead adder (CLA)
• CLA
    a  b  cout
    0  0  0     nothing happens
    0  1  cin   propagate cin
    1  0  cin   propagate cin
    1  1  1     generate
    propagate = a + b; generate = ab

31 ALU for MIPS
• CLA using 4-bit as an example
• two 4-bit numbers: a3a2a1a0, b3b2b1b0
• p0 = a0 + b0; g0 = a0b0 (and likewise for bits 1 to 3)
    c1 = g0 + p0c0
    c2 = g1 + p1c1
    c3 = g2 + p2c2
    c4 = g3 + p3c3
  Substituting each carry into the next, e.g. c2 = g1 + p1g0 + p1p0c0, makes every carry depend only on the g’s, the p’s, and c0, so all carries can be computed in parallel.
• larger CLA adders can be constructed by cascading 4-bit CLA adders
• other adders: carry select adder, carry skip adder

32 Design Process
• Divide and Conquer
  – using simple components
  – glue simple components together
  – work on the things you know how to do. The unknown will become obvious as you make progress
• Successive Refinement
  – multiplier design
  – divider design

33 Multiplier
• paper and pencil method
    multiplicand       0110
    multiplier       x 1001
                     ------
                       0110
                      0000
                     0000
                    0110
                    -------
    product         0110110
n bits x m bits = m+n bits
binary: 0 -> place 0; 1 -> place a copy of the multiplicand

34 Multiply Hardware Version 1
32 bits x 32 bits, using a 64-bit multiplicand reg., 64-bit ALU, 64-bit product reg., 32-bit multiplier reg.
[Figure: 64-bit multiplicand register (shift left) and 64-bit product register (write) feed a 64-bit ALU (ADD); 32-bit multiplier register (shift right); control]
Control provides four control signals.
Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand.

35 Multiply Algorithm Version 1
1. test multiplier0 (i.e., bit 0 of the multiplier)
   1.a if multiplier0 = 1, add multiplicand to product and place result in product register
2. shift the multiplicand left 1 bit
3. shift the multiplier right 1 bit
4. 32nd repetition? if yes, done; if no, go to 1.

36 Multiply Algorithm Version 1 Example
0010 x 0101 = 0000 1010
    iter.  step     multiplier  multiplicand  product
    0      initial  0101        0000 0010     0000 0000
    1      1.a      0101        0000 0010     0000 0010
           2        0101        0000 0100     0000 0010
           3        0010        0000 0100     0000 0010
    2      2        0010        0000 1000     0000 0010
           3        0001        0000 1000     0000 0010
    3      1.a      0001        0000 1000     0000 1010
           2        0001        0001 0000     0000 1010
           3        0000        0001 0000     0000 1010
    4      2        0000        0010 0000     0000 1010
           3        0000        0010 0000     0000 1010

37 Multiplier Algorithm Version 1
• observations from version 1
• 1/2 of the bits in the multiplicand are always 0
• a 64-bit adder is wasted (for 32 bit x 32 bit)
• 0’s are inserted into the multiplicand as it is shifted left; the least significant bits of the product do not change once formed
• 3 steps per bit
• shift product to right instead of shifting multiplicand to left? (by adding to the left half of the product register)

38 Multiply Hardware Version 2
32-bit multiplicand reg., 32-bit ALU, 64-bit product reg., 32-bit multiplier reg.
[Figure: 32-bit multiplicand register and the left 32 bits of the product register feed a 32-bit ALU (ADD); product register (shift right, write); multiplier register (shift right); control]
Write into the left half of the product register.
Check the rightmost bit of the multiplier to decide whether to add 0 or the multiplicand.

39 Multiply Algorithm Version 2
1. test multiplier0 (i.e., bit 0 of the multiplier)
   1.a if multiplier0 = 1, add multiplicand to the left half of product and place the result in the left half of product register
2. shift product reg. right 1 bit
3. shift multiplier reg. right 1 bit
4. 32nd repetition? if yes, done; if no, go to 1.

40 Multiply Algorithm Version 2 Example
0010 x 0011 = 0000 0110
    iter.  step     multiplier  multiplicand  product
    0      initial  0011        0010          0000 0000
    1      1.a      0011        0010          0010 0000
           2        0011        0010          0001 0000
           3        0001        0010          0001 0000
    2      1.a      0001        0010          0011 0000
           2        0001        0010          0001 1000
           3        0000        0010          0001 1000
    3      2        0000        0010          0000 1100
           3        0000        0010          0000 1100
    4      2        0000        0010          0000 0110
           3        0000        0010          0000 0110

41 Multiply Version 2
• Observations
  – product reg. wastes space that exactly matches the size of the multiplier
  – 3 steps per bit
  – combine multiplier register and product register

42 Multiply Hardware Version 3
• 32-bit multiplicand register, 32-bit ALU, 64-bit product register; the multiplier reg. is part of the product register
[Figure: multiplicand register and the left half of the product register feed a 32-bit ALU (ADD), which writes into the left half; product (multiplier) register shifts right; control]

43 Multiply Algorithm Version 3
1. test product0 (multiplier is in the right half of product register)
   1.a if product0 = 1, add multiplicand to the left half of product and place the result in the left half of product register
2. shift product register right 1 bit
3. 32nd repetition? if yes, done; if no, go to 1.

44 Multiply Algorithm Version 3 Example
1110 x 1011 = 1001 1010 (14 x 11 = 154)
    iter.  step     multiplicand  product
    0      initial  1110          0000 1011
    1      1.a      1110          1110 1011
           2        1110          0111 0101
    2      1.a      1110        1 0101 0101   (need to save the carry)
           2        1110          1010 1010
    3      2        1110          0101 0101
    4      1.a      1110        1 0011 0101   (need to save the carry)
           2        1110          1001 1010

45 Multiply Algorithm Version 3
• Observations
• 2 steps per bit: because the multiplier and product share one register, one right shift per iteration suffices (rather than two in versions 1 and 2)
• MIPS registers Hi and Lo correspond to the left and right halves of the product
• MIPS has instruction multu
• How about signed numbers in multiplication?
  – method 1: keep the sign of both numbers and use the magnitudes for multiplication; after 32 repetitions, change the product to the appropriate sign.
  – method 2: Booth’s algorithm
  – Booth’s algorithm is more elegant in signed number multiplications
  – Booth’s algorithm uses the same hardware as version 3

46 Booth’s Algorithm
• Motivation for Booth’s Algorithm is speed
example: 2 x 6 = 0010 x 0110
[Figure: partial products for 0010 x 0110 under the normal approach and under Booth’s approach]
Booth’s approach: replace a string of 1s in the multiplier by two actions
    action 1: beginning of a string of 1s, subtract multiplicand
    action 2: end of a string of 1s, add multiplicand

47 Booth’s Algorithm
[Figure: multiplier 011111111111111111110 with the end of run, middle of run, and beginning of run marked]
    current bit  bit to the right (previous bit)  explanation               action
    1            0                                beginning of a run of 1s  sub. mult’d from left half of product
    1            1                                middle of a run of 1s     no arithmetic operation
    0            1                                end of a run of 1s        add mult’d to left half of product
    0            0                                middle of a run of 0s     no arithmetic operation

48 Booth’s Algorithm Example
-2 x 7 = -14; in signed binary, 1110 x 0111 = 1111 0010
To begin with, we put the multiplier in the right half of the product register.
    iteration  step         multiplicand  product    previous bit
    0          initial      1110          0000 0111  0
    1          sub.         1110          0010 0111  0
               shift right  1110          0001 0011  1
    2          shift right  1110          0000 1001  1
    3          shift right  1110          0000 0100  1
    4          add          1110          1110 0100  1
               shift right  1110          1111 0010  0

49 Divide Algorithm
Paper and pencil
[Figure: long division of dividend 1010101010 by divisor 1011, labelling the quotient and the remainder (modulo)]

50 Divide Hardware Version 1
• 64-bit divisor reg., 64-bit ALU, 32-bit quotient reg., 64-bit remainder register
[Figure: divisor register (shift right) and remainder register (write) feed a 64-bit ALU; quotient register (shift left); control]
put the dividend in the remainder register initially

51 Divide Algorithm Version 1
start: place dividend in remainder
1. sub. divisor from the remainder and place the result in remainder
2. test remainder
   2a. if remainder >= 0, shift quotient to left, setting the new rightmost bit to 1
   2b. if remainder < 0, restore the original value by adding divisor to remainder, and place the sum in remainder. shift quotient to left, setting the new least significant bit to 0
3. shift divisor right 1 bit
4. n+1 repetitions? if yes, done; if no, go to 1.

52 Divide Algorithm Version 1 Example
    iter.  step     quotient  divisor    remainder
    0      initial  0000      0010 0000  0000 0111
    1      1        0000      0010 0000  1110 0111
           2b       0000      0010 0000  0000 0111
           3        0000      0001 0000  0000 0111
    2      1        0000      0001 0000  1111 0111
           2b       0000      0001 0000  0000 0111
           3        0000      0000 1000  0000 0111
    3      1        0000      0000 1000  1111 1111
           2b       0000      0000 1000  0000 0111
           3        0000      0000 0100  0000 0111
    4      1        0000      0000 0100  0000 0011
           2a       0001      0000 0100  0000 0011
           3        0001      0000 0010  0000 0011
    5      1        0001      0000 0010  0000 0001
           2a       0011      0000 0010  0000 0001
           3        0011      0000 0001  0000 0001

53 Divide Algorithm Version 1
Observations
  – 1/2 of the bits in the divisor are always 0
  – 1/2 of the divisor register is wasted
  – 1/2 of the 64-bit ALU is wasted
Possible improvement
  – instead of shifting the divisor to the right, shift the remainder to the left?
  – the first step cannot produce a 1 in the quotient, so switch the order to shift first and then subtract. This can save one iteration

54 Divide Hardware Version 2
32-bit divisor reg., 32-bit ALU, 32-bit quotient reg., 64-bit remainder reg.
[Figure: divisor register and the left half of the remainder register feed a 32-bit ALU; remainder register (shift left); quotient register (shift left); control]

55 Divide Algorithm Version 2
start: place dividend in remainder
1. shift remainder left 1 bit
2. sub. divisor from the left half of remainder and place the result in the left half of remainder
3. test remainder
   3a. if remainder >= 0, shift quotient to left, setting the new rightmost bit to 1
   3b. if remainder < 0, restore the original value by adding divisor to the left half of remainder, and place the sum in the left half of the remainder. also shift quotient to left, setting the new least significant bit to 0
4. n repetitions? if yes, done; if no, go to 1.

56 Divide Algorithm Version 2 Example
    iter.  step     quotient  divisor  remainder
    0      initial  0000      0011     0000 1111
    1      1        0000      0011     0001 1110
           2        0000      0011     1110 1110
           3b       0000      0011     0001 1110
    2      1        0000      0011     0011 1100
           2        0000      0011     0000 1100
           3a       0001      0011     0000 1100
    3      1        0001      0011     0001 1000
           2        0001      0011     1110 1000
           3b       0010      0011     0001 1000
    4      1        0010      0011     0011 0000
           2        0010      0011     0000 0000
           3a       0101      0011     0000 0000

57 Divide Algorithm Version 2
• Observations
  – 3 steps (shift remainder left, subtract, shift quotient left)
• Further improvement (version 3)
  – eliminate the quotient register by combining it with the remainder register as it is shifted left
  – the loop then contains only two steps, because one shift moves the remainder in the left half and the quotient in the right half at the same time
  – a consequence of combining the two registers is that the remainder is shifted one time too many at the last iteration
  – final correction step: shift back the remainder in the left half of the remainder register (i.e., shift only the left half right 1 bit)

58 Divide Hardware Version 3
32-bit divisor register, 32-bit ALU, 64-bit remainder register, 0-bit quotient register (quotient bits shift into the remainder register as the remainder register shifts left)
[Figure: 32-bit divisor register and the left half of the 64-bit remainder/quotient register feed a 32-bit ALU; remainder register (shift left, write); control]

59 Divide Algorithm Version 3
start: place dividend in remainder
1. shift remainder left 1 bit
2. sub. divisor from the left half of remainder and place the result in the left half of remainder
3. test remainder
   3a. if remainder >= 0, shift remainder to left, setting the new rightmost bit to 1
   3b. if remainder < 0, restore the original value by adding divisor to the left half of remainder, and place the sum in the left half of the remainder. also shift remainder to left, setting the new least significant bit to 0
4. n repetitions? if yes, done (then apply the correction step: shift the left half of remainder right 1 bit); if no, go to 2.

60 Divide Algorithm Version 3 Example
    iter.  step        divisor  remainder
    0      initial     0101     0000 1110
           1           0101     0001 1100
    1      2           0101     1100 1100
           3b          0101     0011 1000
    2      2           0101     1110 1000
           3b          0101     0111 0000
    3      2           0101     0010 0000
           3a          0101     0100 0001
    4      2           0101     1111 0001
           3b          0101     1000 0010
           correction           0100 0010
correction step: shift the left half of remainder right 1 bit. The remainder (0100) is then in the left half and the quotient (0010) in the right half.

61 Divide Algorithm Version 3
• Observations
  – same hardware as multiply: need a 32-bit ALU to add and subtract and a 64-bit register to shift left and right
  – divide algorithm version 3 is called the restoring division algorithm for unsigned numbers
• Signed numbers divide
  – simplest method
    » remember signs of dividend and divisor, make positive, and finally complement quotient and remainder as necessary
    » dividend and remainder must have the same sign
    » quotient is negative if dividend sign and divisor sign disagree
  – SRT (named after three persons) method
    » an efficient algorithm

62 Floating Point Numbers
• What can be represented in N bits?
    unsigned         0 to 2^N - 1
    2’s complement   -2^{N-1} to 2^{N-1} - 1
    1’s complement   -2^{N-1} + 1 to 2^{N-1} - 1
    BCD              0 to 10^{N/4} - 1
How about very small numbers; very large numbers; rationals, such as 2/3; irrationals, such as sqrt(2); transcendentals, such as e, pi?

63 Floating Point Numbers
• Mantissa (aka Significand), Exponent (using radix of 10)
    6.12 x 10^23
[Figure: word layout with fields S, E, M]
IEEE standard F.P.: 1.M x 2^{E-127}
single precision: S (1 bit), E (8 bits), M (23 bits)
mantissa = sign + magnitude; magnitude is normalized with hidden integer bit: 1.M
exponent = E - 127 (excess 127), 0 < E < 255
a FP number N = (-1)^S x 2^{E-127} x (1.M)
    0    = 0 00000000 00000000000000000000000
    -1.5 = 1 01111111 10000000000000000000000

64 Floating Point Numbers
• Single Precision FP numbers
    -0.75 = __________________________________
    -5.0  = __________________________________
    7     = __________________________________
    -0.75 = -0.11b = -1.1 x 2^{-1}, E = 126: 1 01111110 10000.......0
    -5.0 = -101.0b = -1.01 x 2^2, E = 129
    7 = 111b = 1.11 x 2^2, E = 129

65 Floating Point Numbers
• Single precision FP number
What is the smallest (normalized) number in magnitude?
    (1.0) x 2^{-126}
What is the largest number in magnitude?
    (1.11111111111111111111111)binary x 2^127 = (2 - 2^{-23}) x 2^127

66 Floating Point Numbers
single precision FP numbers
    Exponent   Significand  Object represented
    0          0            0
    0          nonzero      denormalized numbers
    1 to 254   anything     floating point numbers
    255        0            infinity
    255        nonzero      NaN (Not A Number)
other topics in FP numbers
1. extra bits for rounding
2. guard bit, sticky bit
3. algorithms for FP numbers

67 Floating Point Numbers
• Double precision
  – 64 bits total
    » 52-bit significand
    » 11-bit exponent (excess 1023 bias)
  – Number is: (-1)^S x (1.M) x 2^{E-1023}

68 Basic Addition Algorithm
• Steps for Y + X, assuming Y >= X
1. Align binary points (denormalize smaller number)
   a. compute Diff = Exp(Y) - Exp(X); Exp = Exp(Y)
   b. Sig(X) = Sig(X) >> Diff
2. Add the aligned components
   Sig = Sig(X) + Sig(Y)
3. Normalize the sum
   1. shift Sig right/left until the leading bit is 1, incrementing/decrementing Exp accordingly
   2. check for overflow in Exp
   3. round
   4. repeat step 3 if no longer normalized

69 Addition Example
• 4-bit significand
    1.0110 x 2^3 + 1.1000 x 2^2
• align binary points (denormalize smaller number)
    1.0110 x 2^3
    0.1100 x 2^3
• Add the aligned components
    10.0010 x 2^3
• Normalize the sum
    1.0001 x 2^4
No overflow, no rounding

70 Another Addition Example
• 1.0001 x 2^3 - 1.1110 x 2^1
  – 4-bit significand; extra bit needed for accuracy
1. Align binary point:
    1.0001 x 2^3
  - 0.01111 x 2^3
2. Subtract the aligned components
    0.10011 x 2^3
3. Normalize
    1.0011 x 2^2 = 4.75
Without the extra bit, the result would be 0.1001 x 2^3 = 100.1b = 4.5, which is off by 0.25. This is too much!

71 Accuracy and Rounding
• Want arithmetic to be fully precise
  – IEEE 754 keeps two extra digits on the right during intermediate calculations (guard digit, round digit)
• Alignment step can cause data to be discarded (shifted out on right)
    2.56 x 10^0 + 2.34 x 10^2
      2.3400 x 10^2
    + 0.0256 x 10^2
      2.3656 x 10^2   (guard digit = 5, round digit = 6)
We have two digits to round: 0 to 49 round down, 51 to 99 round up.
Answer = 2.37 x 10^2
Without using Guard and Round digits, Answer would be 2.36 x 10^2
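
The rounding decision in the decimal example above can be written out step by step. The derivation below is a sketch that only restates the slide's numbers with three significant digits kept. It assumes the `amsmath` package for the `align*` environment. The final line shows the case where the guard and round digits are discarded during alignment.

```latex
\begin{align*}
2.56 \times 10^{0} &= 0.0256 \times 10^{2}
  && \text{(align to the larger exponent)} \\
2.3400 \times 10^{2} + 0.0256 \times 10^{2} &= 2.3656 \times 10^{2}
  && \text{(guard digit 5, round digit 6)} \\
0.0056 \times 10^{2} &> 0.0050 \times 10^{2}
  && \text{(more than half a unit in the last kept place)} \\
2.3656 \times 10^{2} &\approx 2.37 \times 10^{2}
  && \text{(round up)} \\[4pt]
2.34 \times 10^{2} + 0.02 \times 10^{2} &= 2.36 \times 10^{2}
  && \text{(digits 56 dropped before the add)}
\end{align*}
```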