SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing
Bryan S. Goda
Rensselaer Polytechnic Institute, Troy, New York

Agenda
• Introduction
• BiCMOS FPGA history
• SiGe HBT BiCMOS process
• Current mode logic
• Xilinx 6200 FPGA design
• Configuration memory
• Performance results
• Conclusions and future work

Current Role of SiGe
• "More Zip per Chip"
• Wireless phones -> watch-sized phones
• Direct broadcast satellite
• Fiber-optic lines, switches, and routers

Programmable Bipolar Logic
1983: Fairchild ECL Field Programmable Logic Array
• Fuse based
• 4 ns cycle rate
• High power
• Scaling problems
1990: Algotronix 1.2 µm 256-cell Configurable Logic Array
• fT of 6 GHz, 200 ps gate delay
• 4-transistor static RAM memory cells
• ASIC emulation and signal processing
• Forerunner of the XC6200

US Patent: CMOS Switchable 2-Input Multiplexer
[Figure: schematic with supply rails V+ and V-, outputs Y1 and Y2, inputs a, reference Vref, and enables EN1 and EN2]

SiGe Heterojunction Bipolar Transistor
• Selectively introduces Ge into the base of a Si BJT
• The smaller base bandgap increases electron injection, giving a higher beta (about 100)
• The higher beta allows a more heavily doped base, lowering the base resistance RB (125 Ω)
• The graded bandgap decreases the base transit time, raising fT

SiGe HBT
• 50 GHz process; 100 GHz process expected within a year (30 µA at 50 GHz)
• 5 layers of metal
• Used in the RPI VLSI class
• Co-integrated with a CMOS process
– HBT logic can be combined with CMOS memory
– Low power and high speed

[Figure: fT curves for various emitter lengths]
[Figure: SiGe HBT layout showing emitter, base, collector, and sub-collector]

Band Diagram
[Figure: band diagram (EC, EV) of the n+ Si emitter, graded p-SiGe base with its Ge-induced drift field, and n- Si collector; electrons (e-) and holes (h+) indicated]
The Ge grading across the base width Wb sets the bandgap grade:
$\Delta E_{g,\mathrm{Ge}}(\mathrm{grade}) = \Delta E_{g,\mathrm{Ge}}(x = W_b) - \Delta E_{g,\mathrm{Ge}}(x = 0) = 0.031\ \mathrm{eV}$
Dielectric constants: Si = 11.7, Ge = 16.2, SiGe (7.5% Ge) = 12.03

CML Branch Current vs. Differential DC Voltage
[Figure: CML branch current plotted against differential DC input voltage]
[Figure: IBM SiGe and CMOS load gate delays on M1, M2, and LM wiring]

Current Steering Logic
• Vcc: 0 V
• Level 1: 0 to -250 mV; fastest logic level, limited drive capability
• Level 2: -950 mV to -1.2 V; inter-block signal level, good fan-out (10)
• Level 3: -1.90 V to -2.15 V; clock signal, slowest level
• Level 4: possible
• Vee: -4.5 V

Current Steering Logic in SiGe
• 13 ps transistor switching time (75 GHz); 6 ps process expected next year
• Small voltage swings (250 mV vs. 3.3 or 5 V)
– Less power
– Smaller swing = faster switching
• "Steer" currents and use differential logic
– Less switching noise
• Fewer transistors needed, since the complement signal is always present
• Flip-flops and multiplexers are easy to implement

CML XOR Logic Schematic
[Figure: two-level CML current tree with Vcc = 0 V, input A on Level 1 (0 to -0.25 V), input B on Level 2 (-0.95 to -1.2 V), Vref, and Vee = -4.5 V; differential outputs A XOR B and its complement]
A B | A XOR B
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0

General FPGA Structure
• Logic cells
• I/O cells
• Routing network
• Configuration memory

High Speed FPGA Applications
• Real-time image processing
– Radar
– Pattern recognition
• Digital networks
– Mobile subscriber equipment
– Command information systems
– High-speed switching nodes
• Control systems
– Guidance systems
– Reprogrammable survivability
• Satellite systems

Image Correlation
[Figure: search image and desired image]
1. The desired image is programmed into the chip (1 pixel = 1 CLB).
2. Load a section of the search image.
3. If enough pixels match, turn the found bit on.
4. Load another section, or reprogram with a new desired image.

Samples from XC6200 CAD Tools
[Figure: CAD tool views showing IO blocks, CLBs, and pins]

FPGA Drawbacks
• Slowdown
– 200 MHz internal speed drops to 30-60 MHz externally
– A pass transistor acts as a low-pass filter
• Limited bandwidth
• Relatively long configuration times (seconds)
• Vendor-guarded information
• More expensive than a comparable ASIC

Pass Transistor Interconnect Modeling
[Figure: interconnect nodes 1-4 joined by pass transistors M1-M4 whose gates are driven by configuration memory, and the pass-transistor equivalent circuit from node 3 to node 2]

Field Programmable Gate Arrays (FPGA)
• Hierarchical organization (sea of gates)
– Simple cells (configurable logic blocks)
– 4x4, 16x16, and 64x64 groupings
– Hierarchy of routing resources at each level
– I/O blocks (external interface)

Design Parameters
• Logic swing levels
– Based on differential pair switching
– Current levels
• Redesign of the configurable logic block
– Take advantage of differential wiring
– Which parts can be turned off when not used?
• Supply levels
– How many levels of logic?
• Routing resources
• CMOS voltage levels
– Integrate CMOS into the bipolar current tree

Current Tree with CMOS Routing
[Figure: current tree with Vcc = 0 V; Level 1 (0 to -0.25 V) inputs a and b driving the differential output OUT; Level 2 (-0.95 to -1.2 V) and Level 3 (-1.9 to -2.15 V) select pairs S1 and S2; the Vref-biased current source is replaced with CMOS device d; Vee = -3.4 V]

Bipolar vs. Bipolar/CMOS Current Trees
[Figure: outputs of the CMOS and bipolar current-tree versions for pulse widths of 50 ps, 60 ps, 70 ps, and 100 ps]

4:1 Multiplexer
[Figure: bipolar and CMOS versions of a 4:1 multiplexer with Level 1 inputs and outputs and Level 2 and Level 3 select inputs; CMOS version W/L = 5:1]

Sample Logic Using Multiplexers
A AND B (X1 := a, X2 := b, X3 := a)
• If a = 1, select Y2: output = b
• If a = 0, select Y3: output = 0
A OR B (X1 := a, X2 := a, X3 := b)
• If a = 1, select Y2: output = 1
• If a = 0, select Y3: output = b

Redesign of XC6200 Logic
Original XC6200 design
• Inversions have to be tracked (inverted output)
Revised design
• Use differential pair logic
• Eliminate the XC6200 fast logic
• No inversion tracking (non-inverted output)
[Figure: 2:1 multiplexer selecting between Y2 and Y3 in the original and revised designs]

Original XC6200 Architecture vs. Redesigned Architecture
[Figure: CLB with inputs X1, X2, X3, CS and RP multiplexers, 2:1 multiplexer (Y2, Y3), D flip-flop with Clk and Clr, and output F; shown for the original XC6200 and for the redesigned bipolar architecture with CMOS routing]

Switchable 10 GHz Three-CLB Simulation

CLB Layout
[Figure: CLB layout showing the 4:1 mux (can be switched off), master/slave latch (can be switched off), CMOS control, 4:1 mux, 2:1 mux, and high-speed logic with CMOS control (can be switched off)]

Sample CLB Test Circuit
[Figure: test circuit with an 8:1 mux, the CLB, buffers, a divide-by-8 stage, pad drivers, and Vref inputs]

Actual Fabricated Test Circuit
[Figure: fabricated test chip; pads are 110 µm x 110 µm]

Outgoing and Incoming CLB Routing
[Figure: CLB with inputs X1, X2, X3 and output F; each side has nearest-neighbor (N, S, E, W) and length-4 (N4, S4, E4, W4) connections]

4x4 Block Boundary Routing
[Figure: N, S, E, and W boundary switches surrounding local routing and magic routing]
• Length 4 FastLane (4x4)
• Length 16 FastLane (16x16)
• Chip-length FastLane (64x64)

Local CLB Routing
• Nearest-neighbor routing
• Output (F) or local through-routing
[Figure: CLB outputs Nout, Eout, Sout, and Wout, each selected from F and the neighboring N, S, E, W, N4, S4, E4, W4 signals]
Example: Route East Signal Through to the Next CLB
[Figure: a signal passes through the CLB to Eout; outputs Nout, Sout, Wout, and input X3 shown]
• Note: a signal cannot be routed back to its origin at this level

Normal CMOS Memory-CML Interface
[Figure: new configuration data written to SRAM bits in memory planes; a CMOS-to-CML buffer (VSS, VREF, VEE, decode) drives the CLB multiplexer inputs]

Memory Design
[Figure: master/slave D latch (40 transistors), master/slave D latch (18 transistors), and 6-transistor RAM cell with data, word, and out lines; parallel load]

3-D Chip Stacking
[Figure: memory planes stacked over CLBs]
• Shorter wires
• More CLBs per area
• Optimized memory

CLB Select
[Figure: CLB with routing and two RAM banks (RAM1, RAM2) feeding multiplexers whose selects choose the active configuration]

Layout of Configurable Logic Block with 2 Sets of RAM
[Figure: layout showing RAM, 2:1 mux, 8:1 mux (routing), CMOS selects, CLB (logic), and master/slave latch (memory)]
Circuit elements: 240 nFETs, 122 pFETs, 36 resistors, 98 npn1 HBTs, 16 npnhb1 HBTs

SiGe Performance (propagation delay)
• Buffer: 17 ps
• CML MUX and XOR/AND/OR gates: 22-25 ps and 23-26 ps
• CLB: 100 ps

Power Decreasing Ideas (power consumption per CLB)
• Dec 98, original CLB: 73 mW
• June 99, CLB Redesign I: 34 mW
• Aug 99, CLB Redesign II: 24 mW
• Dec 99, Widlar current mirror with CMOS control, CMOS routing: 10.8 mW
• Mar 00, supply voltage 4.5 -> 3.3 V: 7 mW
• Dec 00*, 7HP process: 0.3 mW
* Projected power levels for the 7HP process: at 50 GHz and 30 µA, a 20x+ reduction in power

Multiplexer Performance vs. Temperature
[Figure: multiplexer waveforms at -50 °C, 25 °C, and 125 °C, normal 250 mV swing and 200 mV minimum swing; (b) current tree voltage turn-off, voltage (0 to -250 mV) vs. time (0-5 ns) for Vcc and input; (c) current tree voltage turn-on with the Widlar current mirror and CMOS control, voltage (0 to -250 mV) vs. time (0-0.6 ns) for Vref and Vee]

XC6200 Design Improvements
• XC6200 developed in Scotland (Algotronix, Edinburgh)
• Signal inversion at every CLB
– Handled by the differential pair wiring
• No pass transistors; multiplexers are used for routing
• Unused parts can be turned off with a CMOS-controlled current mirror
• No CMOS-CML conversion circuits needed; CMOS sits in the current trees
• Handcrafted, dense layouts
• Context switching

Power Delay Product
[Figure: power-delay product (µW/gate/MHz, log scale, 0.001 to 1) vs. year (1998-2002) for CMOS high, CMOS low, and BiCMOS, with the 5HP, 7HP, and 8HP processes marked]

Data Dependent Switching
• Differential logic has its complement switching in the opposite direction
• Bit line twisting could vary signals by up to 30%
• Result: setup time violations
[Figure: adjacent differential lines A, B, C and their complements, showing slow and fast transitions]

Future Work
• Testing
• Overall FPGA architecture
• Scaling
• Integrate with other systems
• Projected graduation May 2001; work to continue at USMA
• Power reduction (7HP process)

CLB Context Switch Example
• Pattern 1: 0001100100
• Pattern 2: 1011011100
• Select alternates between AND and OR contexts (70 ps each, ~7.1 GHz)
• AND output: 0001000100
• OR output: 1011111100

Redesigned CLB Cell with Routing and Memory (2x)
• 2x24-bit RAM
• Three 8:1 input muxes
• CLB
• Four 4:1 output muxes
[Figure: CLB row with memory planes M1-M4, 4x1 memory bus lines, switch, and N/S input and output]
Circuit elements: 1520 nFETs, 792 pFETs, 260 resistors, 140 NPN1 HB, 576 NPN1

XC6200 Device Family
• XC6209: 9-13K gates, 2304 cells, 192 I/O blocks, 48x48 rows x columns
• XC6216: 16-24K gates, 4096 cells, 256 I/O blocks, 64x64
• XC6236: 36-55K gates, 9216 cells, 384 I/O blocks, 96x96
• XC6264: 64-100K gates, 16384 cells, 512 I/O blocks, 128x128

Typical Routing Delays (XC6200 vs. SiGe redesign)
• TNN, route nearest neighbor: 1 ns vs. 23 ps
• Tmagic, route X2/X3 to magic out: 1.5 ns vs. 47 ps
• TL4, length 4 FastLane: 1.5 ns vs. 47 ps
• TL16, length 16 FastLane: 2 ns vs. 70 ps
• TCL64, chip-length (64) delay: 3 ns vs. 94 ps
• ~31x improvement

4x4 CLB Layout Cell
• Largest basic block
• Over 13,000 transistors
• Commercial product size is a 4x4 array of this cell

5-Stage Ring Oscillator (speed, relative to schematic, current)
• Schematic: 6.36 GHz, n/a, 8.4 mA
• With parasitics: 5.71 GHz, 89%, 8.6 mA
• 50 °C: 5.26 GHz, 82%, 8.85 mA
• 75 °C: 4.87 GHz, 76%, 9.1 mA
• 100 °C: 4.16 GHz, 65%, 9.34 mA
• 125 °C: 3.12 GHz, 49%, 9.5 mA
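The SiGe dielectric constant quoted on the Band Diagram slide (12.03 for 7.5% Ge) lies close to a straight-line interpolation between the Si (11.7) and Ge (16.2) values. The sketch below assumes that linear interpolation; the slides do not say how 12.03 was obtained.

```python
# Sketch: relative permittivity of Si(1-x)Ge(x), ASSUMING simple linear
# interpolation between the Si and Ge values quoted on the slide.
EPS_SI = 11.7  # silicon (from the slide)
EPS_GE = 16.2  # germanium (from the slide)


def sige_permittivity(ge_fraction: float) -> float:
    """Linearly interpolate the permittivity for a Ge mole fraction in [0, 1]."""
    if not 0.0 <= ge_fraction <= 1.0:
        raise ValueError("Ge fraction must lie between 0 and 1")
    return EPS_SI + ge_fraction * (EPS_GE - EPS_SI)


if __name__ == "__main__":
    # 7.5% Ge, the composition quoted on the slide
    print(f"SiGe (7.5% Ge): {sige_permittivity(0.075):.4f}")
```

The interpolated value, about 12.04, is within 0.01 of the slide's 12.03.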
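The CML Branch Current vs. Differential DC Voltage plot and the 250 mV logic swing follow from the standard large-signal equation of a bipolar differential pair, $I_{C1} = I_{EE}/(1 + e^{-V_d/V_T})$. The sketch below evaluates that textbook relation; the 1 mA tail current and the 25.85 mV thermal voltage are assumed values, not figures from the slides.

```python
import math

# Hypothetical values for illustration (not from the slides).
V_T = 0.02585  # thermal voltage near 300 K, volts
I_TAIL = 1e-3  # tail current, amperes


def branch_currents(v_diff: float, i_tail: float = I_TAIL, v_t: float = V_T):
    """Collector currents of an ideal bipolar differential pair.

    Assumes ideal exponential devices with no emitter degeneration and
    negligible base current. Returns (i_c1, i_c2), which sum to i_tail.
    """
    i_c1 = i_tail / (1.0 + math.exp(-v_diff / v_t))
    return i_c1, i_tail - i_c1


if __name__ == "__main__":
    for mv in (0, 50, 100, 250):
        i1, _ = branch_currents(mv / 1000)
        print(f"{mv:4d} mV: {i1 / I_TAIL:.4%} of the tail current in branch 1")
```

Under these assumptions a 250 mV differential input leaves only about 0.006% of the tail current in the off branch, which is why a swing of a few hundred millivolts is enough to steer the current fully.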
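The CML XOR Logic Schematic builds XOR from a two-level current tree: B drives the lower (Level 2) pair and A drives the upper (Level 1) pairs. The behavioral sketch below models logic only, with each differential signal as a true/complement pair, and reproduces the truth table. The straight-wired/cross-wired description of the upper pairs is a simplified reading of a standard series-gated XOR, not the exact transistor netlist from the slide.

```python
from itertools import product
from typing import Tuple

# A differential CML signal is modeled as a (true, complement) pair of bits.
Diff = Tuple[int, int]


def diff(bit: int) -> Diff:
    """Encode a logic bit as a differential pair."""
    return (bit, 1 - bit)


def cml_xor(a: Diff, b: Diff) -> Diff:
    """Behavioral model of a two-level CML XOR current tree.

    The Level 2 pair (driven by B) steers the tail current into one of two
    Level 1 pairs (driven by A). One upper pair is wired straight to the
    output loads and the other is cross-wired, so no separate inverter is
    needed: the complement of A is already present.
    """
    a_true, a_comp = a
    b_true, b_comp = b
    if a_true == a_comp or b_true == b_comp:
        raise ValueError("differential inputs must have complementary rails")
    if b_true:
        return (a_comp, a_true)  # cross-wired upper pair: output = NOT A
    return (a_true, a_comp)      # straight upper pair: output = A


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, cml_xor(diff(a), diff(b))[0])
```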
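The four Image Correlation steps amount to template matching with a match-count threshold. The sketch below runs those steps in software on binary pixels, sliding the section one pixel at a time. The binary pixels, the exhaustive sliding order, and the `threshold` parameter are assumptions, since the slides only say "if enough pixels match". On the FPGA, each CLB would hold one desired pixel and all comparisons would happen in parallel rather than in a loop.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # rows of binary pixels (assumption)


def count_matches(desired: Image, section: Image) -> int:
    """Count pixel positions where the section equals the desired image."""
    if len(desired) != len(section) or any(
        len(d) != len(s) for d, s in zip(desired, section)
    ):
        raise ValueError("section must have the same shape as the desired image")
    return sum(d == s for drow, srow in zip(desired, section) for d, s in zip(drow, srow))


def correlate(desired: Image, search: Image, threshold: int) -> List[Tuple[int, int]]:
    """Return the (row, col) offsets at which the found bit would be set."""
    h, w = len(desired), len(desired[0])
    found = []
    for r in range(len(search) - h + 1):
        for c in range(len(search[0]) - w + 1):
            # Step 2: load a section of the search image
            section = [row[c:c + w] for row in search[r:r + h]]
            # Step 3: set the found bit if enough pixels match
            if count_matches(desired, section) >= threshold:
                found.append((r, c))
    return found


if __name__ == "__main__":
    desired = [[1, 0], [0, 1]]
    search = [[0, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(correlate(desired, search, threshold=4))
```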
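The FPGA Drawbacks slide notes that a pass transistor acts as a low-pass filter. A common first-order way to see why chained pass-transistor routing slows down is the Elmore delay of an RC ladder: each on-transistor is a resistance R and each routing node a capacitance C, so the far-end delay is $RC\,N(N+1)/2$, quadratic in the number of switches N. This model and the parameter values below are assumptions for illustration, not the author's extracted interconnect model.

```python
def elmore_delay(n_switches: int, r_on: float, c_node: float) -> float:
    """Elmore delay at the far end of a chain of identical pass transistors.

    Each on-transistor is modeled as a resistor r_on (ohms) and each node it
    drives as a grounded capacitor c_node (farads). Node k sees the k
    upstream resistances, so the delay is the sum of k * r_on * c_node.
    """
    if n_switches < 0:
        raise ValueError("n_switches must be non-negative")
    return sum(k * r_on * c_node for k in range(1, n_switches + 1))


if __name__ == "__main__":
    R_ON = 1e3       # hypothetical 1 kOhm on-resistance
    C_NODE = 50e-15  # hypothetical 50 fF per routing node
    for n in (1, 2, 4, 8):
        print(f"{n} switches: {elmore_delay(n, R_ON, C_NODE) * 1e12:.0f} ps")
```

Doubling the number of series switches roughly quadruples the delay in this model, which is the motivation for replacing pass-transistor routing with multiplexers.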
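The Sample Logic Using Multiplexers slide builds AND and OR from a single 2:1 multiplexer by choosing what drives X1 (the select), X2 (which reaches Y2), and X3 (which reaches Y3). The sketch below encodes those assignments directly and checks them against Python's bitwise operators; `mux2` is a hypothetical helper name, not an XC6200 primitive.

```python
from itertools import product


def mux2(select: int, y2: int, y3: int) -> int:
    """2:1 multiplexer: pass Y2 when select = 1, otherwise Y3."""
    return y2 if select else y3


def and_via_mux(a: int, b: int) -> int:
    # X1 := a (select), X2 := b (to Y2), X3 := a (to Y3)
    return mux2(a, b, a)


def or_via_mux(a: int, b: int) -> int:
    # X1 := a (select), X2 := a (to Y2), X3 := b (to Y3)
    return mux2(a, a, b)


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, and_via_mux(a, b), or_via_mux(a, b))
```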
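The CLB Context Switch Example applies two stored configurations, AND and OR, to the same pair of input patterns. The sketch below recomputes the slide's output words from Pattern 1 and Pattern 2 as a consistency check. Modeling each context as a function chosen by a select string is an illustrative assumption, not how the configuration RAM is organized.

```python
PATTERN_1 = "0001100100"
PATTERN_2 = "1011011100"

# Hypothetical context table: each configuration is a bitwise operation.
CONTEXTS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
}


def clb_context(op: str, a: str, b: str) -> str:
    """Apply the selected CLB configuration bit by bit to two bit strings."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    fn = CONTEXTS[op]
    return "".join(str(fn(int(x), int(y))) for x, y in zip(a, b))


if __name__ == "__main__":
    for context in ("AND", "OR"):
        print(context, clb_context(context, PATTERN_1, PATTERN_2))
```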
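In the XC6200 Device Family table, each part is an n x n array, and the listed figures are consistent with cells = n² and I/O blocks = 4n. The short check below confirms that pattern for all four devices. Reading the 4n count as one I/O block per cell along each chip edge is an interpretation of the numbers, not a statement from the slides.

```python
# (array edge n, number of cells, I/O blocks) from the XC6200 Device Family table
XC6200_FAMILY = {
    "XC6209": (48, 2304, 192),
    "XC6216": (64, 4096, 256),
    "XC6236": (96, 9216, 384),
    "XC6264": (128, 16384, 512),
}


def pattern_mismatches(family: dict) -> list:
    """Return devices whose counts break the cells = n*n, IOBs = 4*n pattern."""
    return [
        name
        for name, (n, cells, iobs) in family.items()
        if cells != n * n or iobs != 4 * n
    ]


if __name__ == "__main__":
    print(pattern_mismatches(XC6200_FAMILY) or "all devices follow the pattern")
```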
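The Typical Routing Delays table summarizes its comparison as "~31x improvement". The sketch below computes the per-path ratio of XC6200 delay to SiGe delay. TL4 is taken as 1.5 ns vs. 47 ps, the value pair left orphaned at the end of the extracted table; that is a reading of the garbled source rather than a confirmed value.

```python
from statistics import median

# Delays in picoseconds: (XC6200, SiGe redesign).
# TL4 values are a reconstruction (assumption), see the lead-in.
ROUTING_DELAYS_PS = {
    "TNN": (1000, 23),
    "Tmagic": (1500, 47),
    "TL4": (1500, 47),
    "TL16": (2000, 70),
    "TCL64": (3000, 94),
}


def speedups(table: dict) -> dict:
    """Ratio of XC6200 delay to SiGe delay for each routing path."""
    return {name: xc / sige for name, (xc, sige) in table.items()}


if __name__ == "__main__":
    ratios = speedups(ROUTING_DELAYS_PS)
    for name, ratio in ratios.items():
        print(f"{name:7s} {ratio:5.1f}x")
    print(f"median  {median(ratios.values()):5.1f}x")
```

The ratios run from about 29x (TL16) to about 43x (TNN), with a median near 32x, in line with the slide's summary.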
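For an N-stage ring oscillator, the standard relation $f = 1/(2 N t_{pd})$ converts the 5-Stage Ring Oscillator frequencies into a per-stage delay. The sketch below applies it, assuming every stage is identical. The schematic value works out to about 15.7 ps per stage, in the same range as the 17 ps buffer delay on the SiGe Performance slide.

```python
# Oscillation frequencies in GHz from the 5-Stage Ring Oscillator table.
RING_OSC_GHZ = {
    "schematic": 6.36,
    "parasitics": 5.71,
    "50 C": 5.26,
    "75 C": 4.87,
    "100 C": 4.16,
    "125 C": 3.12,
}


def stage_delay_ps(freq_ghz: float, stages: int = 5) -> float:
    """Per-stage delay in ps from f = 1 / (2 * N * t_pd), assuming identical stages."""
    if freq_ghz <= 0 or stages <= 0:
        raise ValueError("frequency and stage count must be positive")
    return 1e3 / (2 * stages * freq_ghz)


if __name__ == "__main__":
    base = RING_OSC_GHZ["schematic"]
    for cond, f in RING_OSC_GHZ.items():
        print(f"{cond:10s} {f:4.2f} GHz  {stage_delay_ps(f):5.1f} ps/stage  "
              f"{f / base:4.0%} of schematic")
```

The slide's relative-speed percentages appear to be truncated rather than rounded (5.71/6.36 ≈ 0.898 is listed as 89%), so the rounded percentages printed here can differ by one point.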