
Brad Bridgeman
Yanqing Zhang
ECE 6332
October 8, 2009

ECE 6332 Design Review 1

1. Summaries:

Variation Driven Device Sizing for minimum energy Sub-threshold circuits

This paper discusses methods for choosing a device size and supply voltage that keep the system functional. In addition to discussing the effects of modifying the supply voltage and the size of the devices, the paper discusses the idea of designing a circuit around a chosen failure rate. By calculating optimal widths and voltages, you can design your circuit to have a high success rate, and you can still adjust the values to minimize the variability constraints imposed on the design. The goal is that a design should use the smallest possible sizes and voltages that satisfy the design constraints. One major problem that was discussed was functionality error. If the NMOS is strengthened enough, it will introduce a leakage current strong enough that the PMOS will not provide a strong active high, which causes both a functionality error and a leakage problem. To solve this problem, the paper recommends using a butterfly plot to ensure proper noise margins for each device, rather than using a set standard.

Advantages in ultra-low voltage design

This paper discusses various topics within the field of ultra-low power systems, and introduces several projects that have met recent success within the field. The major point discussed was the importance of optimizing Vdd and Vt as a method of optimizing system performance. As you lower Vdd, the delay increases exponentially and the leakage energy increases; as you increase Vdd, the energy drawn increases and circuit performance increases. This is the tradeoff. Two key challenges were mentioned, relating to Ion and to the ratio Ion/Ioff, both coming from the fact that Ion is small compared to a strong-inversion current.
This causes leakage current to affect the output logic, as well as the output voltage values. A second topic, variation, was also discussed in this paper. Variation affects the strength of the devices, for example by producing a strong or a weak MOSFET. Looking specifically at subthreshold, the main variation comes from random dopant fluctuations, which cause shifts in Vt. These shifts in Vt exponentially influence the resulting current. In addition to causing unknown currents, variation also introduces uncertainty in the delay (this is where a Monte Carlo simulation will come in handy). Other topics that were mentioned include using a butterfly plot to determine stability, using a read buffer to assist reading hold times, introducing redundancy into a circuit for managing variation, and finally using a sensor mechanism to keep the supply voltage constantly at the minimum energy point (this can be done using the switched capacitor DC-DC converter).

Nanometer MOSFET Variation in Minimum Energy Subthreshold Circuits

This paper discusses a wide assortment of topics dealing with the design and optimization of subthreshold circuits. The paper begins by introducing the primary uses of a low power IC, and proceeds to discuss the common points of failure in low power operation. The main problem that is introduced comes from the SRAM cells and their implications for the rest of the design. The SRAM in a low power design is typically the first set of devices to fail; additionally, it generally requires the largest power. One additional question that was introduced was what happens to the circuit when you reduce Vdd. The answer is that the switching speed decreases, which increases the leakage energy per operation because the leakage current flows for a longer delay; additionally, the DIBL effect is reduced, due to a smaller value of Vds.
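The Vdd question above can be made concrete with a small numerical sketch. It assumes a simplified first-order sub-threshold model in which Ion/Ioff = exp(Vdd / (n * vT)) with DIBL ignored, gate delay is proportional to C * Vdd / Ion, and the cycle time is the logic depth times the gate delay. The function names and every parameter value (gate count, activity factor, logic depth, gate capacitance, slope factor n) are assumptions chosen for illustration; none of them come from the papers summarized here.

```python
import math

THERMAL_VOLTAGE = 0.026  # kT/q near room temperature, in volts


def energy_per_op(vdd, n_gates=1000, activity=0.1, logic_depth=20,
                  c_gate=1e-15, slope_n=1.5, k_delay=1.0):
    """Return (dynamic, leakage, total) energy per operation in joules.

    All default parameter values are illustrative assumptions.
    """
    # Sub-threshold on/off current ratio, ignoring DIBL.
    on_off_ratio = math.exp(vdd / (slope_n * THERMAL_VOLTAGE))
    # Switching energy: activity * total capacitance * Vdd^2.
    e_dyn = activity * n_gates * c_gate * vdd ** 2
    # Leakage energy: Ioff * Vdd * cycle time, where the cycle time is
    # logic_depth * k_delay * C * Vdd / Ion. The currents reduce to the ratio.
    e_leak = n_gates * logic_depth * k_delay * c_gate * vdd ** 2 / on_off_ratio
    return e_dyn, e_leak, e_dyn + e_leak


def find_minimum_energy_point(v_start=0.10, v_stop=0.60, step=0.005):
    """Sweep Vdd on a fixed grid and return (vdd, energy) at the minimum."""
    best_vdd, best_energy = None, float("inf")
    steps = int(round((v_stop - v_start) / step))
    for i in range(steps + 1):
        vdd = v_start + i * step
        total = energy_per_op(vdd)[2]
        if total < best_energy:
            best_vdd, best_energy = vdd, total
    return best_vdd, best_energy


if __name__ == "__main__":
    vdd, energy = find_minimum_energy_point()
    print(f"minimum energy point: Vdd = {vdd:.3f} V, E = {energy:.3e} J")
```

With these made-up parameters the minimum falls between 0.2 V and 0.3 V: below it, the longer cycle time makes leakage energy grow faster than the switching energy shrinks. The sweep starts at 0.1 V because the model says nothing about functional failure at very low supply voltages.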
One of the major solutions proposed to improve the SRAM was to introduce a buffer, more specifically a read buffer. This buffer eliminates the limitation imposed by the read static noise margin, which improves the overall operation of the device. The problem with this comes from the size of the buffer that must be introduced, and the additional leakage current that comes along with it. Overall the read buffer was determined to improve the operation of the circuit; however, there is room for improvement in the design if this buffer could be reduced or eliminated. In addition to SRAM, other topics discussed include variation, guidelines for circuit design, and possible testing schemes that can be incorporated.

A 65nm Sub-Vt Microcontroller with Integrated SRAM and Switched Capacitor DC-DC Converter

This paper introduces several key topics related to Sub-Vt operation. The major topics discussed include the effects of logic depth, as well as the results of modifying the supply voltage and the width of the devices. When you are analyzing a circuit in Sub-Vt, one major factor in the performance is the output and the external connections being made to the device, or in other words, the fan-out. The larger the fan-out of a system, the slower the system will perform. Additionally, when you are operating in the Sub-Vt region, the delay of the system will have a large variability. Two methods were introduced to reduce the delay variability in sub-Vt operation: increasing the width of the devices, and increasing the supply voltage. Both of these, however, will increase the amount of energy consumed by the device. This brings up the topic of the optimum voltage. For every design, there will be an optimum supply voltage that provides the minimum energy requirement based on the resulting performance to energy consumption ratio.
This is where the DC-DC converter is introduced; it was designed to act as a control system that maintains the supply voltage at the optimum level for minimum energy. Other topics introduced in this paper include low-power suspended states, hold static noise margins, usage of clock buffers, the effect of fan-in, and usage of read buffers.

A 65 nm Sub-Vt Microcontroller with Integrated SRAM and Switched Capacitor DC-DC Converter

In this paper, a 65-nm Sub-Vt microcontroller design is presented, along with crucial design problems in the sub-threshold regime and methods of solving them. Methods to optimize energy and power consumption are given. More importantly for our project, the paper raises concerns over delay modeling in sub-Vt and gives a method of good approximation. Important factors to consider during synthesis, including fan-in and constructing a custom library for sub-Vt operation, are also described. Points that are very valuable to us include constructing a process-variation-aware library that works in sub-Vt, considering statistical timing analysis with respect to variation, and deriving delay as a function of logic depth and device sizes. On the energy efficiency side, it is shown that SRAM energy consumption is huge, and the design should take this into account.

Energy Efficient Design for Subthreshold Supply Voltage Operation

In this paper, an analysis method for minimum energy is given. We are able to see the energy scenarios for different low power approaches. More specifically, the paper compares energy consumption with and without clock and power gating. A specific model of variation impact wraps up the paper, where we learn an analytical way of determining the ideal Vmin, taking logic depth and variation into account.

Modeling Energy of the Clock Generation and Distribution Circuitry

In this paper, equations for calculating the clock load are given. We learn that a large part of the clock load is used for cache and register files.
Once we know what technology we will be using, we can easily approximate the clock load with the model derived in this paper. (Note: I do not know how useful this will be to us.)

Energy-Efficient Subthreshold Processor Design

In this paper, architectural and circuit level considerations are given for designing an energy efficient processor. The point is made that variation will make it hard to ensure we have the correct Vdd,min and operating frequency. The paper goes on to show that variation has more effect on operating frequency than on Vdd, due to the exponential relationship between Vt and operating frequency. This is important to consider because, as it is presented, processors have the tendency to operate at higher frequencies than are expected. Therefore, we can save energy by finding the optimal Vdd and lowering the operating frequency. However, the tradeoff is that this will induce greater variation, which in turn will complicate timing closure. Once again, random dopant variation is the main culprit. Monte Carlo simulations in the paper support this conclusion by fixing Vdd at the optimal voltage and varying the operating period t, and comparing the result with simulations where t is fixed and Vdd is varied. The variation of t is much more substantial than that of Vdd. We learn that we will have to pay close attention to this spread in frequency to ensure proper operation when making our two-phase clock.

2. Synthesis Flow and Corresponding Schematics

Since we have not really determined what we will be designing our two-phase clock for, here we will describe the synthesis flow that we are now familiar with, and what the important steps will mean for our project.
a) Synthesis flow begins with the VHDL architecture. Here a 16-bit accumulator is used for simplicity:

    LIBRARY ieee;
    USE ieee.std_logic_1164.ALL;

    ENTITY Acc IS
      GENERIC(width : POSITIVE := 16);
      PORT(
        data_in  : IN  std_logic_vector(width - 1 DOWNTO 0);
        data_out : OUT std_logic_vector(width - 1 DOWNTO 0);
        clock, control : IN std_logic);
    END ENTITY Acc;
    …
      FOR ALL : Reg USE ENTITY work.Reg;
      FOR ALL : Adder USE ENTITY work.Adder;
    BEGIN
      RX : Reg GENERIC MAP (width => width)
        PORT MAP (D => data_in, Q => x, clock => clock, enable => '1', reset => control);
      RY : Reg GENERIC MAP (width => width)
        PORT MAP (D => z, Q => y, clock => clock, enable => '1', reset => control);
      AD : Adder GENERIC MAP (width => width)
        PORT MAP (x => x, y => y, z => z);
      data_out <= z;
    END ARCHITECTURE structure;

b) The next step is to choose a technology to implement the design. We will use Cadence's RTL Compiler to generate a Verilog netlist of our design mapped to the technology. This is a script version (there is also a GUI for RTL Compiler) to accomplish this task. Important comments are in bold:

    #-----------------------------------------
    # synthesis scripts
    # This script is by Zhenyu(Jerry) QI ([email protected])
    #-----------------------------------------
    #You may source other .tcl files here

    #######################################
    # Set global variables and directories
    #######################################
    set TOPMODULE Acc

    #######################################
    # Technology Library Info
    #######################################
    set_attribute library /app/lib/hcmos9gp_920/CORE9GPLL_SNPS_AVT_4.1/SIGNOFF/bc_1.32V_m40C_wc_1.08V_125C/PT_LIB/CORE9GPLL_Worst.lib
    #We'll use the ST130 technology, considering that the design we will ultimately be using was implemented in this way

    #######################################
    # don't ignore DRC for timing
    #######################################
    set_attribute drc_first true

    #######################################
    # Dont-Use Cells
    #Perhaps we will be adding items in this category depending on our implementation
    #######################################
    # Global Attributes Settings
    #######################################
    …

    ###########################################
    # Elaborating Design and Generate Object
    ###########################################
    read_hdl -vhdl Reg.vhd
    read_hdl -vhdl Adder16.vhd
    read_hdl -vhdl Acc.vhd
    elaborate ${TOPMODULE}

    #######################################
    # Timing Constraints
    #######################################
    #read_sdc -stop_on_errors ${TOPMODULE}.sdc

    ##########################################
    # More Attributes on the Design Object
    ##########################################
    set_attribute max_leakage_power 0.0 ${TOPMODULE}
    #This field will perhaps be changed
    set_attribute lp_optimize_dynamic_power_first true ${TOPMODULE}
    set_attribute lp_power_analysis_effort low

    ########################################
    …
    # Synthesize to Technology Mapped Gates
    #########################################
    synthesize -to_mapped -eff low
    ###write -m > {TOPMODULE}_mapped.v
    write -m > Acc_mapped.v
    #This is the outputted file

c) This mapped Verilog file should be simulated in a digital simulator, for example ModelSim.

d) Next, we will be using Cadence's Encounter to do our Place and Route steps. First, we will set up the tool using Standard Cell Files, Common Timing Libraries, and Timing Constraint Files, with the suffixes .lef, .tlf (or .lib), and .sdc respectively. After setup, we are able to check the standard cells used and the schematic block diagram for our design, which may prove quite valuable:

[Figure: Schematic View]
[Figure: Standard Cell Lookup List]

e) The next step is Placement, where we specify the core size and the ground and power supplies, and place our standard cells within the core. We can see the results of placement in two ways.
There is an overall architecture view, where we can see specifically where our modules are placed, and a standard cell view, where we can see the outline of individual cells:

[Figure: Architectural View]
[Figure: Standard Cell View]

f) Our final step before implementing the clock is Routing:

[Figure: Cell View After Routing]

g) These steps are actually the preamble to the research we will be looking into. Once we have done enough research and have decided how we will implement our two-phase clock, we will go on to place and route that clock within the design. We will then simulate Timing Closure. Chances are that we will not be successful the first time, so we will have to be able to extract timing reports and figure out a way to meet timing requirements from them. After being successful, we will find a way to extract power and performance (speed) parameters from our finished circuit. How we optimize those will also be based on our reading and understanding of Sub-vt operation (which has been summarized in the first part of this design review).

3. Conclusion of Progress and Looking Ahead (Remaining Tasks)

With our reading (so that's why it's called Reading Days), we have established some very important concepts about design in the Sub-vt regime that will ultimately guide us in our design. Specifically, these are the method of finding the optimal Vdd, the horrendous effects of variation at such a low supply voltage, and what that means for meeting timing constraints. We will pay attention to a statistical method of approximating delay based on logic depth and device sizing. We also note that recent successful Sub-vt designs always verify functionality, because of leakage effects in sub-threshold. We have also familiarized ourselves with the tools that we will be using for our project. We have been able to 'test run' our flow with a simple 16-bit accumulator design.
Namely, we are now conscious of what happens at every stage of the synthesis flow, so that we will know where to turn when we encounter a problem. For example, if we find that energy and power can be optimized by not using gates with fan-ins greater than 3, we know that we will have to specify this in the RTL Compiler. We also have a conceptual idea of what a chip finally looks like, with supply and ground rails and rings visible around the core, and our cells placed in stacked rows in the core. We also notice that the same module in our schematic may be placed across the core, even for such a simple design as an accumulator! This conceptual understanding reinforces how important it is to synthesize a quality clock that guarantees correct timing.

The tasks that remain for our design prior to the proposal primarily relate to research on the topic. As of now we have a strong basis when it comes to the introduction of variability into our design; however, we still have room to discover more options for reducing the effects of that variation. We currently know that the project we are working on has the goal of reducing the energy and area spent on a device, while ideally operating at the same speed as or faster than the current model. The question now becomes how we can convert our knowledge into something feasible. We will also need to explore ideas for how to synthesize our clock. One suggestion that has been made is the classic clock tree style, but with a separate tree for each of our two phases. Another is adding huge buffers at the sides, or the top and bottom, of our design that can fully drive the two phases. Yet another is clock buffering, and then optimizing the buffering for two phases.
As found in our research thus far, the buffers that are introduced in an SRAM bitcell account for up to 40% of the area of that bitcell. They also add a leakage current that must be addressed; on the other hand, they are currently one of the major building blocks of a working design. We must continue our research in an attempt to find alternative methods that have been used to reduce the need for these buffers. A 'microscopic' optimization method that has been suggested is to take long paths with bad variation from the timing report, and route our two-phase clock specifically to those places that need it, whilst feeding the 'normal' clock to other places that already meet timing requirements. Whichever option we decide on, another good idea would be to fiddle further with the Place and Route tool, namely to learn how to get our timing report, how to place our clock, and how to get simulation data for power and speed from the tool.

Looking at the upcoming proposal deadline, we will continue our simulation efforts, trying various synthesis options to familiarize ourselves with the system; additionally, we will continue researching subthreshold design, looking at delays, variations, and buffer alternatives. Once we have sufficient knowledge of the subject, we will begin determining the best design for testing paths of various lengths as a way to measure the effects of clock skew on the overall delay of the circuit.

Looking ahead beyond the proposal, once we have familiarized ourselves with synthesizing and have chosen a valid device to test, we will begin weighing our options on which methods we should incorporate. Based on the synthesis method we take, we know we will begin by finding the value of Vdd and the minimum width that would provide the optimum performance; however, beyond that is where our 'real' research begins to take hold.
Once the background information has been decided, we will create a design of experiments (DOE), which will serve as a way to measure the results of each independent change. After that, the next step is incorporating our design choices into the current model we choose, and reporting the data.

List of Documented Notes:

- If the NMOS is strengthened, then the PMOS will not drive VDD, due to leakage currents from the PDN.
- VOH and VOL must be sufficient to prevent functionality errors.
- A good idea is to use a butterfly plot as a guideline for designing the VOH and VOL parameters.
- A good way to test design variability is to perform a Monte Carlo simulation; it will provide data showing the average, best, and worst case scenarios.
- As you increase the width of a device, the variability will decrease.
- As you decrease the voltage, the delay of the circuit will increase exponentially.
- Devices should always be made as small as possible, while still abiding by the design constraints imposed by yield and variability.
- When looking at stacked devices, the output swing is more important than current variability.
- Given a desired yield, you can calculate the optimal Vdd and the width of the minimum sized device.
- As you increase the size of devices, you will offset the effect of variations due to lower voltages; however, you will also increase the required energy due to an increase in leakage current.
- One way to model the delay variation of a system is to analyze the delay distribution of each stage.
- Using the optimum Vdd and Vt will minimize the energy consumption of the design.
- The optimum Vdd is not always the lowest operating voltage; there are also performance factors.
- As you lower Vdd, the delay increases exponentially, and the leakage energy also increases; around 300 mV, the leakage current dominates the circuit.
- Two key challenges to ultra low voltage designs that impact circuit functionality:
  o Ion is << I-strong inversion; this can allow the leakage current to affect the on-state.
  o Ion/Ioff is reduced; this weakens the output values on strong devices, that is, VOH = VDD - delta and VOL = VSS + delta.
- To address this, models were derived for output and input requirements, as well as minimum operating voltages.
- Variation affects the strengths of devices, one major type being a strong PMOS/weak NMOS, or vice versa.
- The main variation in sub-Vt is from random dopant fluctuation (RDF), where dopants in the channel shift Vt, causing exponential effects on Ion/Ioff.
- Process variation increases the uncertainty in delay; at a low voltage, an increased sensitivity to local variation causes the distribution of delay to widen.
- SRAMs typically are the dominant portion of the area and power of a system.
- Use a butterfly plot to determine if a bitcell is stable. The static noise margin (SNM) indicates the maximum noise that can be applied before a state is destroyed; if variation causes both the PMOS and NMOS to shift out of the SNM, then the device fails.
- As VDD decreases, the read and hold SNM decrease. This introduces the need for a read buffer, which can be added using an additional 2T on the SRAM.
- Redundancy is good for managing variation.
- The MEP changes often, so a detector system can be used to ensure the supply voltage is constantly working at the MEP; this relates back to the DC-DC converter that was discussed in another paper.
- There are three main types of systems that benefit from voltage scaling to the sub-threshold level: low-speed systems, dynamic speed systems, and constant high speed systems; through the use of parallel networks, most speed requirements can be satisfied.
- In low voltage designs, SRAMs require the most power, and typically are the first devices to fail.
- In modern designs, active switching and leakage are the dominant sources of energy consumption.
- When you reduce the Vdd of a circuit, it will impede the switching speed; additionally, it will also affect the charge on the capacitor. E = C * Vdd^2; while Vdd is decreasing, the total time to charge the capacitor is increasing, thus the integral form of this equation will show that the leakage energy actually increases.
- In sub-threshold designs, looking at the short-channel model, VDS will be decreased, which results in a reduced DIBL effect. Decreasing the effect of DIBL significantly decreases the leakage current.
- Random dopant fluctuation (RDF) and processing variation are dominating effects in modern designs; both result in a change in Vt on the devices. As Vt changes, the drain current changes, which results in overpowering.
- Geometric variation is also a problem, and it has a stronger effect on the channel length, resulting in problems involving drift and short-channel effects such as DIBL.
- RDF is the dominating source of variability in sub-Vt systems.
- Another problem is the degradation in drain current due to variation.
- Due to leakage paths, active nodes may provide weaker logic levels than expected (VOH = Vdd - delta, VOL = Vss + delta).
- The worst case noise margins can be analyzed through NAND and NOR gates.
- Gates with more than two inputs should be avoided in sub-Vt designs.
- Variation can compromise logic functionality, which shows the tradeoff:
  o Increase width -> lower Vt variation, increased yield, and increased energy consumption.
  o Increase Vdd -> increased energy, increased signal levels, decreased variability.
- Failures in SRAM were discussed, along with possible solutions and their tradeoffs, one of which uses a read buffer, giving an 8T design instead of 6T.
- The read buffer eliminates the read SNM limitation, which improves the operating margin; the margin is then set by the hold SNM and the write margin.
- The problem, however, is that this buffer increases the size of the cell by 25-40% and adds an additional leakage path.
- At the end of the paper, it does discuss possible solutions to the sub-Vt read current degradation.
- Bigger fan-in gates can decrease transistor area.
- We will consider statistical timing analysis, with delay dependent upon logic depth and device sizes.
- Dynamic frequency optimization is more important than dynamic voltage optimization.
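Several of the notes above (Monte Carlo simulation, width versus variability, and modeling delay variation stage by stage) can be tied together in a short sketch. It is only an illustration under stated assumptions: sigma(Vt) is taken to scale as 1/sqrt(W*L) (a Pelgrom-style rule), each stage delay is taken to scale as exp(delta_Vt / (n * vT)) in sub-threshold, and a path delay is the sum of independent stage delays. The matching coefficient, channel length, slope factor, and all function names are hypothetical values chosen for the example, not numbers from the papers or from our technology library.

```python
import math
import random
import statistics

THERMAL_VOLTAGE = 0.026  # kT/q near room temperature, in volts
SLOPE_N = 1.5            # assumed sub-threshold slope factor


def sigma_vt(width_um, length_um=0.065, a_vt_mv_um=3.0):
    """Assumed Pelgrom-style estimate of sigma(Vt) in volts."""
    return a_vt_mv_um * 1e-3 / math.sqrt(width_um * length_um)


def path_delay_samples(width_um, logic_depth, runs=2000, seed=1):
    """Monte Carlo samples of normalized path delay for one device width."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sigma = sigma_vt(width_um)
    samples = []
    for _ in range(runs):
        total = 0.0
        for _ in range(logic_depth):
            # Each stage sees an independent random Vt shift.
            delta_vt = rng.gauss(0.0, sigma)
            # Sub-threshold delay depends exponentially on the Vt shift.
            total += math.exp(delta_vt / (SLOPE_N * THERMAL_VOLTAGE))
        samples.append(total)
    return samples


def relative_spread(samples):
    """Standard deviation divided by mean (coefficient of variation)."""
    return statistics.pstdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    for width in (0.2, 0.4, 0.8):
        for depth in (1, 20):
            spread = relative_spread(path_delay_samples(width, depth))
            print(f"W = {width} um, depth = {depth}: sigma/mean = {spread:.3f}")
```

In this model, wider devices give a smaller relative delay spread, and deeper paths average out part of the per-stage variation. That averaging relies on the stage delays being independent, which is itself an assumption.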
