Problem 5 -- Sequential circuits

[Figure: sequential circuit clocked by CLK and CLK2]

a) Would the sequential circuit from the figure above be considered a latch, a master-slave latch pair or a pulse-triggered latch? Briefly explain your answer.

This is a pulse-triggered latch. The data is sampled only while both CLK=1 and CLK2=1, a window that lasts just 3*tinv.

b) All transistors in this circuit are unit-sized, with equivalent resistances R and gate capacitances C (ignore diffusion capacitances). Calculate the propagation delay tClk-Q for high-to-low and low-to-high transitions. The load on the output Q is equal to 12C. Ignore the signal slopes in the delay calculation.

For LH: 3R*2C + R*CQ = 6RC + 12RC = 18RC
For HL: 2R*CQ = 2R*12C = 24RC

c) This circuit does not strictly follow the rules for designing sequential logic discussed in class. List three major problems in the operation of this circuit.

(1) There is no latch at the output to hold the Q value when CLK=0.
(2) There is a feedforward path from CLK to Q, which may cause overshoot at the output.
(3) When CLK switches from 0 to 1, no matter what value D is, Q will be pulled down first before the first stage settles.

Functionality? Why is this circuit good? Main reason for inverter I1?
Flip-flop / C2MOS (no skew) / to avoid loading the state node (buffer)

Also, review:
1. Brief overview on timing.
2. Question 5 on http://bwrc.eecs.berkeley.edu/classes/icdesign/ee141_f05/Exams/Final-f05-sol.pdf
3. Problem 3 on http://bwrc.eecs.berkeley.edu/classes/icdesign/ee141_s06/Homeworks/ee141_hw10_sp06.pdf

Problem 2 – Variable Block Carry Bypass Adder

Consider a 24-bit, 6 stage carry-bypass adder with the following delays: tsetup=4, tcarry=1, tsum=4, tbypass=2

[Figure: 24-bit carry-bypass adder with six stages, each with a Setup block and a Sum block, bits 0 to 23, carry input Cin and carry output Cout]

a) Identify the critical path through the adder. List the delays for each block along the critical path and give the total delay. Assume that each stage bypasses the same number of bits.
The critical path is through the setup of the first stage (specifically, the first bit of the first stage), through all four bits of the first stage’s carry chain, through the first five bypass multiplexers, through three bits of the last stage’s carry chain, then through the final stage’s sum. The last carry bit of the final stage does not affect the sum, only the carry out. The carry out through the final multiplexer is not on the critical path because the sum is slower.

The delays for these components are tsetup, 4tcarry, 3tbypass + 2tbypass (five bypass multiplexers in total), 3tcarry, and tsum respectively. These add up to the expression given in equation 11.9 in the text, and substituting the given delays gives a total of 4 + 4 + 10 + 3 + 4 = 25. The critical path is grayed in Figure 11-14.

b) Consider the setup delay and carry propagation of the second and third stages. These are not on the critical path and can be made slower without affecting performance. If we allow each stage to handle a different number of bits, what is the relationship between the number of bits per stage and the respective carry propagation delay? How many bits would you assign to each of the first three stages to minimize the delay from inputs to the carry output for the first 12 bits of the adder?

The worst-case delay from the first stage’s inputs through the setup, carry propagation and bypass to the start of the fourth stage is
tsetup + M0*tcarry + 3tbypass
where M0 is the number of bits in the first stage. The delays for the second and third stages are similarly
tsetup + M1*tcarry + 2tbypass
and
tsetup + M2*tcarry + tbypass

Making all of these equal, we get M1 - M0 = M2 - M1 = tbypass/tcarry = 2. Thus, the first, second, and third stages should add 2, 4 and 6 bits respectively. The original critical path is now 2 carries shorter, for a total delay of 23 at the final sum output (it is also acceptable to just give the delay to the end of the third stage carry, which is now 12 instead of 14).
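The delay expressions for parts a and b can be checked numerically. The following is a minimal Python sketch, assuming the path decomposition described above; the function names are illustrative and do not come from the course material.

```python
# Delay values given in the problem statement (arbitrary time units).
T_SETUP, T_CARRY, T_SUM, T_BYPASS = 4, 1, 4, 2


def uniform_critical_path(bits_per_stage, stages):
    """Part a: setup, ripple through the whole first stage, bypass the
    remaining stages, ripple through all but the last bit of the final
    stage, then the final sum."""
    return (T_SETUP
            + bits_per_stage * T_CARRY
            + (stages - 1) * T_BYPASS
            + (bits_per_stage - 1) * T_CARRY
            + T_SUM)


def delay_to_fourth_stage(stage_bits):
    """Part b: for each of the first three stages, the delay from its own
    inputs to the carry arriving at the start of the fourth stage."""
    n = len(stage_bits)
    return [T_SETUP + m * T_CARRY + (n - i) * T_BYPASS
            for i, m in enumerate(stage_bits)]


print(uniform_critical_path(4, 6))       # 25
print(delay_to_fourth_stage([4, 4, 4]))  # [14, 12, 10]
print(delay_to_fourth_stage([2, 4, 6]))  # [12, 12, 12]
```

With equal 4-bit stages only the first stage sets the 14-unit delay; with widths 2, 4 and 6 all three paths balance at 12.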
The second and third stages are now also critical paths, with the same delay.

c) How many bits would you assign to each stage in the second half of the adder? What is/are the delays along the critical path(s) now?

Same approach as for part b, except the critical paths are now from the carry in from the third stage to the sum outputs. Delays for each path are:
2tbypass + (M5-1)tcarry + tsum
tbypass + (M4-1)tcarry + tsum
(M3-1)tcarry + tsum

Making all of these equal, we get M4 - M5 = M3 - M4 = tbypass/tcarry = 2. Thus, the fourth, fifth and sixth stages should add 6, 4 and 2 bits respectively. The critical path to the final stage sum output is now another 2 carries shorter, for a total delay of 21. The fourth and fifth stage outputs are now also critical paths.

Note: parts a and c assumed that the sum logic for a bit has a delay of tsum from its carry in to the sum out. From the structure of the mirror adder, one might consider tsum to refer to the delay from the bit’s own carry to its output, in which case the critical path delays would have one extra tcarry (M instead of M-1). This does not affect the choice of stage widths and is acceptable for the answers.

Problem 1 – Inverter Sizing

Consider a standard CMOS inverter shown above driving a capacitive load CL = 80 fF with a relatively fast step at its input. Assume that a minimum size “unit” inverter has symmetric high and low drive strength Req,u = 20 kΩ, intrinsic output capacitance Cint,u = 3 fF, and input capacitance Cin,u = 4 fF. Also assume that inverter resistances and capacitances scale linearly with size.

a) What is the shortest tp that can possibly be attained for the above circuit by sizing the inverter, and how would it be sized? Call this delay tpmin.

Soln: The delay is given by tp = 0.69⋅Req⋅(Cint+CL), with Req = Req,u/k and Cint = Cint,u⋅k, where k is the size of the inverter relative to the unit inverter.
From this equation it is seen that tp is minimized by letting k → ∞ (making the inverter infinitely large), at which point the delay is just the intrinsic self-loading delay, tpmin = 0.69⋅Req,u⋅Cint,u = 41.4 ps.

b) What size should the inverter be, relative to the unit inverter, to obtain tp = 1.3tpmin? What are the input and intrinsic output capacitances of this inverter?

Soln: tp = 0.69⋅(Req,u/k)⋅(Cint,u⋅k+CL) = 1.3 × 41.4 ps. Since tp = tpmin⋅(1 + CL/(Cint,u⋅k)), this requires CL/(Cint,u⋅k) = 0.3, so k = 80/0.9 ≈ 89 times larger than a unit inverter. The capacitances are Cin = 89⋅Cin,u = 356 fF and Cint = 89⋅Cint,u = 267 fF.

c) Now consider the dynamic energy consumed driving Cin, Cint, and CL over a complete input cycle (one logic transition in each direction). What inverter size minimizes the energy delay product of this circuit? How do the inverter capacitances compare to CL in this case?

Soln: For constant supply voltage, energy is simply proportional to the total capacitance that is being charged, in this case E ∝ (Cin,u+Cint,u)⋅k+CL, where k is the inverter size relative to the unit inverter. The delay is given by tp = 0.69⋅(Req,u/k)⋅(Cint,u⋅k+CL) = tpmin + 0.69⋅Req,u⋅CL/k. Thus, we minimize the energy delay product by minimizing [(Cin,u+Cint,u)⋅k+CL] × [tpmin + 0.69⋅Req,u⋅CL/k] over k. This is done by setting the derivative with respect to k equal to zero (to be complete, we should also show that this finds a minimum, not a maximum, which can be seen by considering extreme values of k). First we simplify the above expression by expanding the products, combining terms, and dropping terms that are constant with respect to k (i.e., terms that don’t affect the derivative).
This results in the simplified expression:

$$U(k) = (C_{in,u}+C_{int,u}) \cdot k \cdot t_{pmin} + \frac{0.69 \cdot R_{eq,u} \cdot C_L^2}{k}$$

U(k) has the same minimizer as the full energy delay product, written E(k) here: argmin{U(k)} = argmin{E(k)}.

$$\frac{dU}{dk} = (C_{in,u}+C_{int,u}) \cdot t_{pmin} - \frac{0.69 \cdot R_{eq,u} \cdot C_L^2}{k^2} = 0$$

$$k^2 = \frac{0.69 \cdot R_{eq,u} \cdot C_L^2}{t_{pmin} \cdot (C_{in,u}+C_{int,u})} = \frac{0.69 \cdot R_{eq,u} \cdot C_L^2}{0.69 \cdot R_{eq,u} \cdot C_{int,u} \cdot (C_{in,u}+C_{int,u})}$$

$$k = \frac{C_L}{\sqrt{C_{int,u}\,(C_{in,u}+C_{int,u})}} = \frac{80}{\sqrt{3 \cdot 7}} \approx 17.5$$

so the inverter is 17.5 times larger than the unit inverter. An inverter this size has Cin = 70 fF and Cint = 52.5 fF. When optimizing the energy delay product, these capacitances are comparable to the load capacitance itself. If we had only considered Cint and neglected Cin, the solution would have been to set Cint = CL, i.e., to make self loading and external loading equal.

Problem 2 – CMOS Scaling

A microprocessor consumes 0.3 mW/MHz when fabricated using a 0.13 µm process. The area of the processor is 0.7 mm². Assume a 200 MHz clock frequency and a 1.2 V power supply. Its leakage power is 0.1 mW. Assume short channel devices, but ignore second order effects like mobility degradation, series resistance, etc.

(a) If the supply voltage of the microprocessor scaled to 90 nm is reduced to 1.0 V, what will the area, frequency, power consumption, and power density be?

(b) If the threshold voltage in the 0.13 µm process is 0.35 V, what should the threshold voltage be in 90 nm? Assuming an 80 mV/dec subthreshold slope, what would be the leakage power of the new processor?

1. Analysis Using the Unified Model

Below is another I-V transfer curve for a different NMOS transistor operating under slightly different conditions (see next page). In this problem, the objective is to use a transfer curve like the one above to obtain the transistor parameters. The transistor has (W/L) = (20/1). You may also assume that velocity saturation does not play a role in this example. Also assume -2ΦF = -0.6 V.

From the figure on the next page, determine the following parameters: the threshold voltage VT0, the body effect parameter γ, and the channel length modulation parameter λ.
Hint: Depending on your choice of curves, you might get unreasonable values for VT0. Therefore, use the curves with the two lowest Vgs values (1 V, 1.5 V) for the determination of VT0, and explain why using curves with higher Vgs doesn’t give you sensible answers.

[Figure: Ids (A) versus Vds (V) for Vgs = 1 V, 1.5 V, 2 V and 2.5 V, each at Vbs = 0 V and Vbs = -1 V]

Solution

VT0

This one should immediately signal you to look at curves that don’t have body effect. That means VBS = 0 V. Pick two points, each from a different curve that satisfies the no-body-effect condition. Make sure they’re in the same operating region too!

Point   VGS     VDS     ID        Operating Region
A       1.5 V   2 V     1 mA      saturation
B       1.0 V   2 V     0.36 mA   saturation

$$\frac{I_{D,A}}{I_{D,B}} = \frac{\frac{1}{2} k_p \frac{W}{L} (V_{GS,A} - V_{T0})^2 (1 + \lambda V_{DS,A})}{\frac{1}{2} k_p \frac{W}{L} (V_{GS,B} - V_{T0})^2 (1 + \lambda V_{DS,B})}$$

$$\frac{1}{0.36} = \frac{(1.5 - V_{T0})^2}{(1.0 - V_{T0})^2}$$

VT0 = 0.25 V

As you can see, working with two points at the same VDS cancels as many variables as possible, so the equation can be solved. We choose the curves with the lower Vgs values because the velocity saturation effect is less prominent at these values, so the quadratic equations can still provide a reasonable fit to the curves.

λ

We can use the same methodology as above. This time, we want to keep VGS constant.

Point   VGS   VDS     ID        Operating Region
A       1 V   2.0 V   0.36 mA   saturation
B       1 V   1.0 V   0.32 mA   saturation

$$\frac{I_{D,A}}{I_{D,B}} = \frac{\frac{1}{2} k_p \frac{W}{L} (V_{GS,A} - V_T)^2 (1 + \lambda V_{DS,A})}{\frac{1}{2} k_p \frac{W}{L} (V_{GS,B} - V_T)^2 (1 + \lambda V_{DS,B})}$$

$$\frac{0.36}{0.32} = \frac{1 + 2.0\,\lambda}{1 + 1.0\,\lambda}$$

λ = 0.143 V^-1

γ

It shouldn’t be a surprise that this leaves us keeping almost everything constant except for VSB.
Point   VSB     VGS     VDS   ID        Operating Region
A       1.0 V   1.0 V   2 V   0.16 mA   saturation
B       0.0 V   1.0 V   2 V   0.36 mA   saturation

$$\frac{I_{D,A}}{I_{D,B}} = \frac{\frac{1}{2} k_p \frac{W}{L} (V_{GS,A} - V_T)^2 (1 + \lambda V_{DS,A})}{\frac{1}{2} k_p \frac{W}{L} (V_{GS,B} - V_{T0})^2 (1 + \lambda V_{DS,B})}$$

$$\frac{0.16}{0.36} = \frac{(1.0 - V_T)^2}{(1.0 - 0.25)^2}$$

VT = 0.5 V

Now solve for γ using the following equation:

$$V_T = V_{T0} + \gamma \left( \sqrt{|{-2\Phi_F}| + V_{SB}} - \sqrt{|{-2\Phi_F}|} \right)$$

$$0.5 = 0.25 + \gamma \left( \sqrt{1 + 0.6} - \sqrt{0.6} \right)$$

γ = 0.51 V^(1/2)

2. Inverter delay analysis and power
First order analysis again, like problem 3 in HW3. Warm-up exercises for buffer sizing. Difference between EVDD and EC.

3. Capacitance
What is the Miller effect? Try the KCL method at the output node and find Ceq = 2(CgdN+CgdP).

Problem 3. CMOS Gate Design and Implementation

a) Design F = AB + AC + BC in static CMOS using the least number of devices. Draw the logic graphs corresponding to the circuit and identify the Euler paths.

b) Using the Euler paths you found, draw the stick diagram for the implementation. Try to use the appropriate colors to make your diagram clear.
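As a starting point for part a, one common way to reduce device count is to factor the expression before drawing the networks. The sketch below is not the official solution: it takes the function to be F = AB + AC + BC, and the factored form A(B + C) + BC and all function names are assumptions introduced here. It checks by exhaustive enumeration that the factored form (five literals instead of six) computes the same function, and that swapping series and parallel connections leaves the function unchanged.

```python
from itertools import product


def f_sum_of_products(a, b, c):
    # F taken as three two-input product terms (6 literals).
    return (a and b) or (a and c) or (b and c)


def f_factored(a, b, c):
    # Assumed factored form A(B + C) + BC (5 literals).
    return (a and (b or c)) or (b and c)


def dual_of_factored(a, b, c):
    # Dual network: every series (AND) connection becomes parallel (OR)
    # and vice versa.
    return (a or (b and c)) and (b or c)


inputs = list(product([False, True], repeat=3))

# The factored form matches the sum-of-products form on all 8 inputs.
same_function = all(f_sum_of_products(*v) == f_factored(*v) for v in inputs)

# The dual network computes the same function, i.e. the function is self-dual.
self_dual = all(f_factored(*v) == dual_of_factored(*v) for v in inputs)

print(same_function, self_dual)
```

In a complementary gate each literal corresponds to one NMOS and one PMOS device, so fewer literals means fewer transistors. Because the function is self-dual, the pull-up network may reuse the pull-down topology, which tends to make a shared Euler path easier to find.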