Mariam Hoseini
Advisor: Dr. Chao You
Supervisor: Dr. Mark Pavicic
Committee members: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, Dr. Deying Li
North Dakota State University
April 2009

• Conformal Computing
• Asynchronous circuit design
  – Handshake protocols
  – Data encodings
  – Signaling protocol
  – Asynchronous design methodologies
  – Asynchronous primitives
• Constructing an array of cells
• PCC cell design and simulations
• Conclusion

• Computers are typically rigid boards or boxes with a fixed computational capability.
• Available computers may have an undesirable size or shape, or less computing capability than is needed.
• The program investigates a more flexible form of computer that easily conforms to the physical and computational needs of an application.
• Potential applications:
  – Sorting, cryptography, cellular neural nets, etc.
  – The computational material can be integrated with arrays of sensors and/or actuators

• Potential problems:
  – Easily changing the physical shape of the computer
  – Adjusting the computational capability
  – Propagation delays, synchronization, power distribution, and heat dissipation
• One approach is to form extensible arrays of simple reconfigurable computing elements (cells) into thin wallpaper-like sheets:
  – Long signal wires are eliminated.
  – Communications are local and synchronized with cell-to-cell pulses.
• This research presents a cell design called a pulsed conformal computer cell (PCC cell).

• The PCC cell has significant similarities to cellular automata (CA):
  – Simple fine-grained elements
  – Integration of processing and storage
  – Local communication
• CA can model the elements of digital computers, using patterns of cells to perform the functions of wires, logic, and registers
  – The same model is used in the PCC cell design
• The function and connections of a PCC cell are reconfigurable, similar to FPGAs.
  – FPGAs are not as fine-grained
  – FPGAs are not as regular
  – The PCC cell array uses only short-range wires that connect adjacent cells

• Two major styles of circuit design: synchronous and asynchronous
• Advantages of asynchronous design, in terms of:
  – Clock skew
  – Speed
  – Metastability
  – Modularity
  – Power
• Disadvantages of asynchronous design:
  – It is more difficult to design for hazard-free behavior and a correct ordering of operations.
  – Additional hardware is needed to initiate, advance, and indicate the completion of operations.
• Asynchronous systems are specified by a handshake protocol, a data encoding, and an underlying delay model.

• Handshaking is the alternative to clocking in asynchronous systems.
• Data transfer between two processes is synchronized with signals generated by those same processes.
• Asynchronous operation can also be done without handshaking.
  – Handshaking is used to separate successive uses of a component.
  – It may not be necessary to separate the uses of a component, or the separation can be done by delaying the operations.
• Handshaking can be done at higher levels in an asynchronous system.

• Bundled data:
  – Normal Boolean levels encode data values
  – Separate request and acknowledge wires are used
• Dual rail:
  – Two wires are used to carry a single bit
  – The request signal is encoded in the dual-rail data wires
  – Dual-rail data encoding is used in the PCC cell design

  Dual rail encoding   Meaning
  00                   No data
  01                   0
  10                   1
  11                   Forbidden

• Pulse signaling:
  – Each request and acknowledge is a pulse
  – Simple, with a short cycle, like transition signaling
  – Deals with levels, like level signaling
  – Better noise immunity than single-track signaling
  – Potential problem: robustness of sending pulses over long wires
  – Pulse signaling is used in the PCC cell design, where there are no long wires, so this problem does not arise.
  [Figure: one cycle of pulse signaling, labeled: start event, Request, event done, Acknowledge]
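
The dual-rail table above can be sketched in a few lines of Python. This is only an illustrative model, not the cell's circuit: the names `encode`, `decode`, and `has_data` are hypothetical, and treating the first digit of each codeword as the rail that signals a 1 is an assumption read off the table.

```python
from typing import Optional, Tuple

Rails = Tuple[int, int]

# Codewords copied from the table: 00 = no data, 01 = 0, 10 = 1, 11 = forbidden.
# Assumption: the first digit is the rail that goes high for a 1.
NO_DATA: Rails = (0, 0)


def encode(bit: Optional[int]) -> Rails:
    """Map a bit (or None for 'no data') to its dual-rail codeword."""
    if bit is None:
        return NO_DATA
    if bit not in (0, 1):
        raise ValueError("bit must be 0, 1, or None")
    return (1, 0) if bit == 1 else (0, 1)


def has_data(rails: Rails) -> bool:
    """The request is implicit: data is present when exactly one rail is high."""
    return rails in ((0, 1), (1, 0))


def decode(rails: Rails) -> Optional[int]:
    """Recover the bit, returning None when the rails carry no data."""
    if rails == NO_DATA:
        return None
    if rails == (0, 1):
        return 0
    if rails == (1, 0):
        return 1
    raise ValueError("11 is a forbidden codeword")
```

Because a valid codeword announces itself, a receiver needs no separate request wire: `has_data` plays the role that the request signal plays in bundled data.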
• Bounded delay
  – Simplest model
  – Delays of circuit elements and wires are assumed to be known or bounded.
• Delay insensitive (DI)
  – Both gates and wires have unbounded and unknown delays.
  – A completion detection mechanism is needed at the receiver.
• Quasi delay insensitive (QDI)
  – DI + isochronic forks = QDI
  – Isochronic forks are capable of indication
  – All input transitions should be indicated by an output signal transition
  [Figure: fork with labels A, B, C and delays d1, d2, d3]

• In an asynchronous system, interfaces and the insides of modules can be designed with different timing models.
• In the PCC cell design, for timing management:
  – The interior of a cell is governed by a bounded delay model
  – Communication between cells uses a QDI model

• In synchronous systems, Boolean circuits can be constructed from a primitive such as a NAND gate.
• Logic gates provide only logic functionality, not timing functionality, so they are not sufficient to build asynchronous circuits.
• Asynchronous systems can be built from a set of primitives.
• The set of primitives must provide both universal logic and timing functionality.
• Different sets of primitives have been introduced, such as Keller’s, Patra’s, and Lee’s.

The set of primitives used in a PCC cell:
• Wire
  – Transfers the output of one component to the input of another.
• Fork
  – The output of one component is the input to several components.
• Merge
  – Sends one of its inputs to the output.
• Join
  – Data from several independent components needs to be synchronized.
[Figure: symbols of the primitives, with ports labeled I, I1, I2, O, O1, O2]
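
The four primitives listed above can be illustrated with a token-level behavioral sketch. It is not the cell's circuit and ignores all timing: the names `wire`, `fork`, `merge`, and `Join` are chosen here for illustration, and the rule that Merge sees at most one active input at a time is an assumption, not something the slides state.

```python
from typing import List, Optional, Tuple


def wire(token: int) -> int:
    """Wire: pass a token from one component's output to another's input."""
    return token


def fork(token: int, fanout: int) -> List[int]:
    """Fork: copy one output token to several component inputs."""
    return [token] * fanout


def merge(a: Optional[int], b: Optional[int]) -> Optional[int]:
    """Merge: forward whichever input carries a token.

    Assumption: at most one input is active at a time.
    """
    if a is not None and b is not None:
        raise ValueError("merge inputs must not arrive together")
    return a if a is not None else b


class Join:
    """Join: wait until every input has arrived, then release them together."""

    def __init__(self, n_inputs: int) -> None:
        self._slots: List[Optional[int]] = [None] * n_inputs

    def arrive(self, index: int, token: int) -> Optional[Tuple[int, ...]]:
        """Record one arrival; return all tokens once the set is complete."""
        self._slots[index] = token
        if any(slot is None for slot in self._slots):
            return None  # still waiting for the other inputs
        ready = tuple(self._slots)
        self._slots = [None] * len(self._slots)  # reset for the next round
        return ready
```

Join is the only primitive here that holds state, which is why it can supply the timing (synchronization) role that plain logic gates lack.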
• An array of cells, each having a simple one-bit processing unit
• Von Neumann neighborhood for local connections
• A routing problem occurs: [figure]
• A possible solution: [figure]

• Another approach is to combine every two cells into a double cell
  – The same routing capability with fewer neighboring connections
• A further step is to group 4 cells together into a quad cell
  – The same routing capability with simple connections to the 4 nearest neighbors

• Logic Unit Design
• Synchronization
• Pulse Regenerator
• Top Level Design
• Configuration Circuitry
• PCC Cell Simulations
  – One-bit full adder
  – Ring oscillator
  – Shift register
• Implementing Pipelines

• There is a logic unit (LU) and an output register in each quarter
• Each LU has two inputs and one output

• Dual-rail inputs
• Dual-rail outputs
• Switches should be set before the inputs arrive
• 8 switches define a function
• 16 functions
• Pull-down resistors avoid floating nodes

• AND function
• D, E, F, G are “0001”

  A  B  Z
  0  0  0
  0  1  0
  1  0  0
  1  1  1

• Wire: one output pulse triggers the LU inputs of the neighbor cell in the same direction.
• Merge is realized by 2:1 muxes; pulses make right (90-degree) turns.
• Fork: each turn triggers a neighbor quarter and also a neighbor cell
  – A single computation forks into multiple parallel computations

Join
• A completion detection circuit
• All the participating quarters should have their LU outputs ready
• Complements a fork by combining multiple parallel computations into a single computation.
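
The configurable logic unit described earlier (two inputs, one output, 16 possible functions) behaves like a four-entry lookup table. The sketch below is a functional model only. It assumes that D, E, F, G hold the output for inputs AB = 00, 01, 10, 11 in that order, which is consistent with the AND example "0001" but is not stated on the slides; the helper names are hypothetical, and the relation between the 8 physical switches and these four values is not modeled.

```python
from itertools import product
from typing import Callable, List


def make_lu(defg: str) -> Callable[[int, int], int]:
    """Build a two-input logic function from a 4-character string of 0s and 1s.

    Assumption: the characters give the output Z for AB = 00, 01, 10, 11.
    """
    if len(defg) != 4 or any(c not in "01" for c in defg):
        raise ValueError("configuration must be four characters of 0 or 1")
    table = [int(c) for c in defg]

    def lu(a: int, b: int) -> int:
        return table[2 * a + b]  # index the row selected by inputs A and B

    return lu


def all_configurations() -> List[str]:
    """Enumerate every 4-bit configuration: one per two-input function."""
    return ["".join(bits) for bits in product("01", repeat=4)]


and_lu = make_lu("0001")  # the AND configuration from the slide
```

Four configuration bits give 2^4 = 16 distinct truth tables, which matches the 16 functions listed for the LU.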
• QDI Communications
• Fork1
  – Only when a pulse turns
  – The LU should use only the turned pulse
• Fork2 & Fork4
  – No timing assumptions
• Fork3 & Fork5
  – Bounded delay model

• When a pulse travels through many cells, its width may increase or decrease.
• A pulse that is too short may not be detected at all; a pulse that is too long may catch up with other pulses.
• A pulse regenerator (PRG) produces an output pulse of constant width, independent of the width of the input pulse.
• D1 is the delay by which the input pulse is stretched
• D2 determines the width of the output pulse
[Figure: PRG circuit with delays D1, D2 and nodes A, B, C, D, E]

In a PCC cell: (W/L)p / (W/L)n ≈ 1.6
In an inverter:
• Equivalent resistance of a MOS transistor: R ∝ L/W
• To match PMOS and NMOS resistances: (W/L)p / (W/L)n = 3 to 3.5
• tpHL = 0.69 Rn CL and tpLH = 0.69 Rp CL; if Rn = Rp, then tpHL = tpLH
• A bigger PMOS improves tpLH by increasing the charging current.
• A bigger PMOS degrades tpHL by causing a larger parasitic capacitance.
• With matched resistances, tp = (tpHL + tpLH)/2 is not minimal.
• The ratio for optimal speed equals √(Rp/Rn)
• The device can be sped up by reducing the size of the PMOS.

• Configuration bits should be loaded: 16 bits for LU switches, 8 bits for Merge muxes, and 4 bits for Join, a total of 28 bits.
• Only some parts of the array may need to be configured.
• One solution is to make one long chain of shift registers through all the cells and configure all of them.
• A better solution is to form the chain of shift registers only from the cells that need to be configured.
• In each cell, a controller:
  – decides whether or not the cell is to be configured
  – directs the bit flow to one of the cell's neighbors
  – stops the shift registers once all the intended cells are configured

[Figure: configuration circuitry with Decoder, OR, and Controller blocks and clock and data lines for the four neighbors (clk-N, clk-W, clk-E, clk-S, data-N, data-W, data-E, data-S)]
Annotations on the controller in the figure:
  – Shows that the shift register is filled
  – Shows that the cell is the last one in the chain of shift registers
  – Determines whether or not the cell should be configured
  – Defines the neighbor to which the bits should be forwarded

• The PCC cell was implemented in TSMC 250 nm CMOS using S-Edit.
• The simulation was done in PSpice.
• The supply voltage is 5 V.
• Input pulse widths are 400 ps.
• Propagation delay through a cell is 480 ps to 500 ps.
• For better speed: slope ≤ gate propagation delay
• The slope of the external inputs is 12 ps.
• No overshoots or undershoots

For 20 pulses:
• Voltage source = 5 V
• Average current = 6 mA for 1.4 ns and 17 mA for 8.6 ns
• Energy = (5 V × 6 mA × 1.4 ns) + (5 V × 17 mA × 8.6 ns) = 773 pJ

For 1 pulse (a 1-bit operation):
• Voltage source = 5 V, average current = 5 mA: Energy = 5 V × 5 mA × 1.5 ns = 37.5 pJ
• Voltage source = 3.3 V, average current = 3 mA: Energy = 3.3 V × 3 mA × 1.8 ns = 17.8 pJ

• Sum = A ⊕ B ⊕ C; for A = B = C = 1: 1 ⊕ 1 ⊕ 1 = 1
• Carry = AB + BC + AC = AB + (A + B)C; for A = B = C = 1: 1·1 + (1 + 1)·1 = 1
• The sum and carry outputs are ready after 0.5 ns and 1.8 ns.

• Loops are important for many circuits, such as sequential circuits, iterative computations, and For, If, and While constructs.
• The ring oscillator demonstrates two capabilities of the PCC cell:
  – A loop can be controlled externally (started and stopped)
  – Using Join of pulses, communications can be QDI
[Figure: ring oscillator started by a ‘0’ start pulse; the output is always a ‘1’]
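
The full-adder equations from the simulation slides can be checked exhaustively. The sketch below reads the sum as a three-input XOR, which is the standard full-adder sum and agrees with the slide's example of inputs 1, 1, 1 giving 1. It models only the Boolean functions, not the PCC cell mapping or its timing, and the function names are illustrative.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, c: int) -> Tuple[int, int]:
    """Return (sum, carry) using Carry = AB + BC + AC."""
    total = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return total, carry


def carry_factored(a: int, b: int, c: int) -> int:
    """The factored carry from the slide: AB + (A + B)C."""
    return (a & b) | ((a | b) & c)


def forms_agree() -> bool:
    """Check that both carry forms match on all eight input combinations."""
    return all(
        full_adder(a, b, c)[1] == carry_factored(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```

The factored form needs one fewer two-input AND than the three-term form, which may matter when each logic unit accepts only two inputs.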
Ring oscillator implemented in an array of PCC cells
[Figure: array of cells configured as One, One, XOR, WR, Nand, Pass, WR, One, Pass]
• ‘0’ pulses are shown in blue; ‘1’ pulses are shown in red.
• The input mux is configured to receive a ‘0’ pulse only from outside the first cell and a ‘1’ pulse only from a turn.

Simulation results: [figure]

An input bit stream of “1010” is used.

  Cell 1  Cell 2  Cell 3  Cell 4
  D1      x       x       x
  D2      D1      x       x
  D3      D2      D1      x
  D4      D3      D2      D1

• If handshaking is done for every component, the components can form a pipeline.
• Each component should supply an Ack to indicate that it is available for re-use.
[Figure: chain of LUs with an Ack returned between stages]
Delay(1) = 3X + (n - 2)5X + 3X = (5n - 4)X

• Some cells don’t handshake and are cascaded. The cascaded cells form one unit of a pipeline, so handshaking is done only at a higher level.
[Figure: cascaded LUs grouped into pipeline units, with an Ack per unit]
Delay(2) = 3X + (n - 2)2X + 3X = (2n + 2)X
Delay(2)/Delay(1) = (2n + 2)X / (5n - 4)X ≈ 2/5 for large n

  PCC Cell
  Technology                                TSMC 250 nm
  Voltage source                            5 V (3.3 V)
  Transistor count                          760
  Propagation delay                         500 ps (600 ps)
  Minimum input pulse width                 400 ps
  Energy consumption for a 1-bit operation  37.5 pJ (17.8 pJ)
  Routing capability                        Data can be routed in 4 directions
  QDI communications                        Yes, by performing Join
  Performance: speed                        Very good
  Performance: energy                       Good
  Performance: area                         Average
  Implementing comb/seq circuits            Yes
  Controlling a loop externally             Yes
  Implementing pipelines                    Yes

• Contribution:
  – Utilizing asynchrony, reconfigurability, and the properties of CA to make an extensible array with more regular and finer-grained cells than those of FPGAs.
• Future work:
  – Improving the performance of the cell in terms of area and thermal management

• I express my deepest gratitude to my supervisors, Dr.
Mark Pavicic and Dr. Chao You.
• Gratitude is also due to my graduate committee: Dr. Rajendra Katti, Dr. Subbaraya Yuvarajan, and Dr. Deying Li.
• I express my love and gratitude to my beloved spouse, Hamed.