Introduction to CPLD/FPGA Technology, Devices and Tools
Theerayod Wiangtong
Electronic Department, Mahanakorn University of Technology

Outline
• Programmable Logic
  – CPLD
  – FPGA
• Architecture: Basic & Advanced
• Examples
• Features
• Vendors and Devices
• Design Tools

World of Integrated Circuits
[Figure: taxonomy of integrated circuits: Full-Custom ASICs, Semi-Custom ASICs, User Programmable (PLD, FPGA)]

ASIC
• ASIC: Application-Specific Integrated Circuit
• Designs must be sent to a semiconductor foundry for expensive and time-consuming fabrication
• Designed all the way from behavioral description to physical layout

CPLD/FPGA
• CPLD: Complex Programmable Logic Device
• FPGA: Field-Programmable Gate Array
• Small development overhead
• No NRE (non-recurring engineering) costs
• Quick time to market
• No minimum order quantity
• Reprogrammable

Which Way to Go?
ASICs: high performance; low power; low cost in high volumes
CPLD/FPGAs: off-the-shelf; low development cost; short time to market; reconfigurability

Other Advantages
• The ASIC manufacturing cycle is very costly and lengthy and requires a lot of manpower
  – Mistakes not detected at design time have a large impact on development time and cost
  – FPGAs are ideal for rapid prototyping of digital circuits
• Easy upgrades, as with software
• Unique applications
  – Reconfigurable computing

Programmable Logic: CPLD/FPGA

Programmable Logic
• Programmable digital integrated circuit
• Standard off-the-shelf parts
• Desired functionality is implemented by configuring on-chip logic blocks and interconnections
• Types of programmable logic:
  – Complex PLDs (CPLD)
  – Field-Programmable Gate Arrays (FPGA)

PLD - Sum of Products
• Programmable AND array followed by fixed fan-in OR gates
[Figure: inputs A, B, C enter the AND plane through programmable switches or fuses; OR gates produce outputs f1 and f2]

PLD - Macrocell
• Can implement combinational or sequential logic
[Figure: AND-plane output f1 feeds a D flip-flop (Clock); a MUX driven by Select chooses the registered or combinational signal; Enable controls the output]

CPLD Structure
• Integration of several PLD blocks with a programmable interconnect on a single chip
[Figure: four PLD blocks, each with its own I/O block, connected through a central interconnection matrix]

CPLD Example - Altera MAX7000
[Figure: EPM7000 series block diagram]

CPLD Example - Altera MAX7000
[Figure: EPM7000 series device macrocell]

FPGA Architecture

FPGA - Generic Structure
FPGA building blocks:
• Programmable logic blocks
  – Implement combinatorial and sequential logic
• Programmable interconnect
  – Wires to connect inputs and outputs to logic blocks
• Programmable I/O blocks
  – Special logic blocks at the periphery of the device for external connections
[Figure: array of logic blocks surrounded by I/O blocks and joined by interconnection switches]

FPGA – Basic Logic Element
• LUT to implement combinatorial logic
• Register for sequential circuits
• Additional logic (not shown):
  – Carry logic for arithmetic functions
  – Expansion logic for functions requiring more than 4 inputs
[Figure: 4-input LUT (inputs A, B, C, D) feeding a D flip-flop (Clock); Select chooses the LUT or flip-flop output as Out]

Look-Up Tables (LUT)
• A look-up table with N inputs can implement any combinatorial function of N inputs
• The LUT is programmed with the function's truth table
Truth table (rows as shown on the slide):
A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 1
0 1 0 0 | 0
0 1 0 1 | 1
0 1 1 0 | 1
0 1 1 1 | 1
1 0 0 0 | 0
1 0 0 1 | 1
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
[Figure: the same function as a LUT implementation (inputs A-D, output Z) and as a gate implementation]

LUT Implementation
• Example: 3-input LUT
• Based on multiplexers (pass transistors)
• LUT entries stored in configuration memory cells
[Figure: eight configuration memory cells (0/1) feed a multiplexer tree selected by X1, X2, X3, producing output F]

Programmable Interconnect
• Interconnect hierarchy (not shown)
  – Fast local interconnect
  – Horizontal and vertical lines of various lengths
[Figure: logic elements (LE) connected through switch matrices]

Switch Matrix Operation
• 6 pass transistors per switch matrix interconnect point
• Pass transistors act as programmable switches
• Pass transistor gates are driven by configuration memory cells
[Figure: switch matrix before and after programming]

Configuration Storage Elements
• Static Random Access Memory (SRAM)
  – Each switch is a pass transistor controlled by the state of an SRAM bit
  – The FPGA needs to be configured at power-on
• Flash Erasable Programmable ROM (Flash)
  – Each switch is a floating-gate transistor that can be turned off by injecting charge onto its gate; the FPGA itself holds the program
  – Reprogrammable, even in-circuit
• Fusible links ("antifuse")
  – Forms a low-resistance path when electrically programmed
  – One-time programmable in a special programming machine
  – Radiation tolerant

FPGA Technology Roadmap
Year | Technology | Transistor count
1995 | 0.6 µm     | 3.5M
1996 | 0.35 µm    | 12M
1997 | 0.25 µm    | 23M
2000 | 0.18 µm    | 75M
2003 | 0.13 µm    | 430M
2004 | 0.09 µm    | 1B

Special Features
• Clock management
  – PLL, DLL
  – Eliminate clock skew between the external clock input and the on-chip clock
  – Low-skew global clock distribution network
• Embedded memory blocks
• Support for various interface standards
• High-speed serial I/Os
• Embedded processor cores
• DSP blocks

FPGA Vendors & Device Families
• Xilinx
  – Virtex-II/Virtex-4: feature-packed, high-performance SRAM-based FPGAs
  – Spartan-3: low-cost, feature-reduced version
  – CoolRunner: CPLDs
• Altera
  – Stratix/Stratix-II: high-performance SRAM-based FPGAs
  – Cyclone/Cyclone-II: low-cost, feature-reduced version for cost-critical applications
  – MAX3000/7000: CPLDs
  – MAX-II: flash-based FPGA
• Actel
  – Antifuse-based FPGAs (radiation tolerant)
  – Flash-based FPGAs
• Lattice
  – Flash-based FPGAs
  – CPLDs (EEPROM)
• QuickLogic
  – ViaLink-based FPGAs

State of the Art in FPGAs
• 90 nm process on 300 mm wafers
  – Lower cost per function (LUT + register)
  – Smaller and faster transistors: higher speed
• System speed up to 500 MHz
  – Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
  – Integrated transceivers running at 10 Gbit/s
• More logic and better features:
  – >100,000 LUTs and flip-flops
  – >200 embedded RAMs, and the same number of 18 x 18 multipliers
• 1156 pins (balls) with >800 general-purpose I/Os
  – 50 I/O standards, including LVDS with internal termination
• 16 low-skew global clock lines
  – Multiple clock management circuits
• On-chip microprocessor(s) and multi-Gbit/s transceivers

Latest Devices: Capacity & Features
Xilinx Virtex-4:
• 90 nm process
• Up to 960 I/Os
• >200,000 logic cells
• Up to 552 18-kbit block RAMs (~10 Mbit RAM)
• 192 DSP slices (18x18 multiplier-accumulator)
• 20 digital clock managers (DCM)
• 24 high-speed serial transceivers (622 Mbit/s to 11.1 Gbit/s)
• Up to two PowerPC 405 cores
Altera Stratix-II:
• 90 nm process
• Up to 1170 I/Os
• 179,000 logic elements
• 9.6 Mbit embedded RAM
• 96 DSP blocks: 384 18x18 multipliers
• 12 PLLs
• Serial I/O up to 1 Gbit/s
• No hard processor cores

ALTERA

Device Families & Tools

Device Roadmap

Technology

Logic Density

Pricing Roadmap

FLEX10K Basic Architecture

Logic Array Block: FLEX10K

Logic Element of FLEX10K

Advanced Altera Architecture

Stratix Device

Stratix Device Family

Altera: Embedded DSP Blocks
• Two DSP block columns per device
• Number varies with the height of the column
• Can implement:
  – Eight 9x9 multipliers
  – Four 18x18 multipliers
  – One 36x36 multiplier
• Contains adder/subtractor/accumulator
• Registered inputs can become a shift register

Altera: Embedded DSP Block

Embedded RAM
Dual-port RAM:
  – M512: 512 x 1
  – M4K: 4096 x 1
  – M-RAM: 64K x 8

Embedded RAM Block

ALTERA High-Speed I/O

Embedded Processor
• Soft processor: Nios, 32-bit @ 150 MHz
• Hard processor: ARM922T, 32-bit RISC @ 200 MHz (Excalibur devices)
• Additional features
  – Communication controller
  – Integrated MMU (memory management unit)
  – High-speed memory interface
  – C-level simulation
  – Multiprocessor support

Nios II Family

MAX II Device

Xilinx

Product Overview
[Figure: Xilinx product overview; labels: High Volume, Low Cost, CPLD, ROM-based, High Performance, High Density, Low Power, Low Cost]

Xilinx FPGA Families
• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5 µm, 0.35 µm and 0.25 µm technology; not recommended for modern designs
• High-performance families
  – Virtex (0.22 µm)
  – Virtex-E, Virtex-EM (0.18 µm)
  – Virtex-II, Virtex-II Pro (0.13 µm)
• Low-cost families
  – Spartan/XL: derived from XC4000
  – Spartan-II: derived from Virtex
  – Spartan-IIE: derived from Virtex-E
  – Spartan-3

Basic FPGA Architecture: Spartan-II

CLB Structure
[Figure: a CLB made of two slices; each slice has two look-up tables (inputs G1-G4 and F1-F4), carry and control logic (CIN, COUT), and two D flip-flops with clock (CLK), clock enable (CE) and set/reset (SR)]
• Contains 2 slices
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

CLB Slice Structure
• Each slice contains two sets of the following:
  – Four-input LUT
    • Any 4-input logic function
    • or 16-bit x 1 synchronous RAM
    • or 16-bit shift register
  – Carry & control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  – Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Synchronous or asynchronous control

Example: 5-Input Functions Implemented Using Two LUTs
Truth table for Y, rows in binary order of X5 X4 X3 X2 X1:
X5 = 0, X4 X3 X2 X1 = 0000 to 1111: Y = 0 1 0 0 1 1 0 0 1 0 0 1 1 1 1 1
X5 = 1, X4 X3 X2 X1 = 0000 to 1111: Y = 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0
[Figure: two LUTs (each usable as LUT, ROM or RAM) share inputs F1-F4 (A1-A4); BX selects between their outputs in the F5 multiplexer]

Dedicated Expansion Multiplexers
• MUXF5 combines 2 LUTs to create
  – Any 5-input function (LUT5)
  – Or selected functions of up to 9 inputs
  – Or a 4x1 multiplexer
• MUXF6 combines 2 slices to form
  – Any 6-input function (LUT6)
  – Or selected functions of up to 19 inputs
  – Or an 8x1 multiplexer
[Figure: CLB with two slices; the two LUTs of each slice feed a MUXF5, and the two MUXF5 outputs feed a MUXF6]

Distributed RAM
• CLB LUT configurable as distributed RAM
  – One LUT equals a 16x1 RAM
  – Implements single- and dual-port RAMs
  – Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/asynchronous read
  – Accompanying flip-flops used for synchronous read
[Figure: primitives RAM16X1S (one LUT), RAM32X1S (two LUTs, address A0-A4), RAM16X2S (data D0-D1, outputs O0-O1) and RAM16X1D (dual port: address A0-A3 with output SPO, read address DPRA0-DPRA3 with output DPO); all use D, WE and WCLK]

Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum and carry signals
  – Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of the normal logic and routing resources
[Figure: dedicated carry chain running from LSB to MSB]

Basic I/O Block Structure
• Each IOB can work as a unidirectional or bidirectional I/O
• Outputs can be forced into high impedance
• Inputs and outputs can be registered
• Inputs can be delayed
[Figure: IOB with three flip-flops (each with clock enable and set/reset): one for the three-state control, one on the output path and one on the input path, plus a direct input]

Advanced Xilinx Architecture

Virtex-II Pro
• 130 nm CMOS, copper, low-k
• 1200 I/Os, 1696-pin package
• 125,000 logic cells
• 10 Mbit of RAM
• 556 XtremeDSP multipliers
• 16 3.125 Gbit/s transceivers
• 4 PowerPC CPUs

Virtex-II Pro
• PowerPC 405
• Dedicated multipliers and memory
• Digital Clock Management (DCM) provides
  – 16 independent clock domains
  – Clock divide, multiply, phase shift
  – Enhanced phase-locked loops (PLLs)
• Routing resources (90%)

Block RAM
• Most efficient memory implementation
  – Dedicated blocks of memory
• Ideal for most memory requirements
  – 4 to 14 memory blocks
    • 4096 bits per block
  – Use multiple blocks for larger memories
• Builds both single-port and true dual-port RAMs
[Figure: Spartan-II true dual-port block RAM with ports A and B]

Dual-Port Bus Flexibility
[Figure: RAMB4_S4_S16 primitive. Port A: depth 1K, ADDRA[9:0], DIA[3:0], output DOA[3:0] (4-bit width). Port B: depth 256, ADDRB[7:0], DIB[15:0], output DOB[15:0] (16-bit width). Each port has its own WE, EN, RST and CLK]
• Each port can be configured with a different data bus width
• Provides easy data width conversion without any additional logic

Two Independent Single-Port RAMs
[Figure: RAMB4_S1_S1 primitive. Port A: ADDRA = {VCC, ADDR[10:0]}, DIA[0], DOA[0], depth 2K, 1-bit width. Port B: ADDRB = {GND, ADDR[10:0]}, DIB[0], DOB[0], depth 2K, 1-bit width]
• Can split a dual-port 4K RAM into two single-port 2K RAMs
  – Simultaneous independent access to each RAM
• To access the lower RAM
  – Tie the MSB address bit to logic low
• To access the upper RAM
  – Tie the MSB address bit to logic high

Rocket I/O
• From 4 to 24 RocketIO MGTs per Virtex-II Pro™ device
• Continuous operating range 622 Mbit/s to 3.125 Gbit/s
• Virtex-4: up to 11.1 Gbit/s
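Both block RAM tricks above (width conversion with RAMB4_S4_S16 and splitting with RAMB4_S1_S1) come down to two ports indexing one shared 4096-bit array. The Python sketch below is a behavioral model only, not vendor code: the names DualPortBlockRAM and tied_msb_address are hypothetical, the model ignores clocking (real block RAM reads are synchronous), and it assumes that word addr of a port with width w occupies bits addr·w to addr·w + w - 1, LSB first. The exact bit mapping of the RAMB4 primitives should be checked in the device data sheet.

```python
class DualPortBlockRAM:
    """Hypothetical behavioral sketch of a 4096-bit true dual-port block RAM
    whose ports A and B may use different data widths."""

    TOTAL_BITS = 4096

    def __init__(self, width_a: int, width_b: int) -> None:
        for w in (width_a, width_b):
            if self.TOTAL_BITS % w:
                raise ValueError(f"width {w} must divide {self.TOTAL_BITS}")
        self.width = {"A": width_a, "B": width_b}
        self.bits = [0] * self.TOTAL_BITS  # one storage array shared by both ports

    def depth(self, port: str) -> int:
        # A narrower port sees more words: depth = 4096 / width.
        return self.TOTAL_BITS // self.width[port]

    def _span(self, port: str, addr: int) -> range:
        if not 0 <= addr < self.depth(port):
            raise IndexError(f"port {port}: address {addr} out of range")
        w = self.width[port]
        # Assumed mapping: word `addr` holds bits addr*w .. addr*w + w - 1, LSB first.
        return range(addr * w, addr * w + w)

    def write(self, port: str, addr: int, data: int) -> None:
        for i, bit in enumerate(self._span(port, addr)):
            self.bits[bit] = (data >> i) & 1

    def read(self, port: str, addr: int) -> int:
        return sum(self.bits[bit] << i for i, bit in enumerate(self._span(port, addr)))


def tied_msb_address(msb: int, addr: int) -> int:
    """12-bit address of a 4096 x 1 port whose MSB is tied to GND (0) or VCC (1)
    while the user drives the remaining 11 bits, ADDR[10:0]."""
    if msb not in (0, 1) or not 0 <= addr < 2048:
        raise ValueError("msb must be 0 or 1 and addr must fit in 11 bits")
    return (msb << 11) | addr


# Width conversion, as in RAMB4_S4_S16: 1024 x 4 on port A, 256 x 16 on port B.
ram = DualPortBlockRAM(width_a=4, width_b=16)
for a, nibble in enumerate([0x1, 0x2, 0x3, 0x4]):
    ram.write("A", a, nibble)  # four narrow writes on port A ...
print(hex(ram.read("B", 0)))  # ... read back as one wide word on port B: 0x4321

# Splitting, as in RAMB4_S1_S1: port A (MSB = VCC) and port B (MSB = GND)
# address two independent 2K x 1 halves of the same block.
split = DualPortBlockRAM(width_a=1, width_b=1)
split.write("A", tied_msb_address(1, 5), 1)  # upper RAM, location 5
split.write("B", tied_msb_address(0, 5), 0)  # lower RAM, location 5
print(split.read("A", tied_msb_address(1, 5)), split.read("B", tied_msb_address(0, 5)))  # 1 0
```

Because both ports index the same storage, the width conversion needs no extra logic: the port configuration only changes how addresses are decoded, which is the point the Dual-Port Bus Flexibility slide makes.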
Embedded Processor
• Soft processor: MicroBlaze, 32-bit @ 150 MHz
• Hard processor: IBM PowerPC 405, 32-bit RISC @ 300 MHz (in Virtex-II Pro)
  – Low power consumption: 0.9 mW/MHz
  – Five-stage data path pipeline
  – Hardware multiply/divide unit
  – Thirty-two 32-bit general-purpose registers
  – Memory management unit (MMU)
  – Dedicated on-chip memory (OCM) interface
  – Supports IBM CoreConnect™ bus architecture
  – Debug and trace support

FPGA Design Tools

Design Process (1)
Specification (lab experiments):
"Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds….."
VHDL description (your source files):

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input              : in  std_logic_vector(31 downto 0);
    data_output             : out std_logic_vector(31 downto 0);
    out_full                : in  std_logic;
    key_input               : in  std_logic_vector(31 downto 0);
    key_read                : out std_logic  -- no semicolon after the last port
  );
end RC5_core;

Flow: Specification → VHDL description → Functional simulation → Synthesis → Post-synthesis simulation

Design Process (2)
Flow: Implementation → Timing simulation → Configuration → On-chip testing

Active-HDL

Simulation and Synthesis Tools

Logic Synthesis
VHDL description → circuit netlist

architecture MLU_DATAFLOW of MLU is
  signal A1 : STD_LOGIC;
  signal B1 : STD_LOGIC;
  signal Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
begin
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

[Figure: synthesized circuit netlist]

Features of Synthesis Tools
• Interpret RTL code
• Produce a synthesized circuit netlist in the standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to the EDIF netlist

Implementation
• After synthesis, the entire implementation process is performed by FPGA vendor tools
  – Xilinx ISE Foundation 11.1i
  – Altera Quartus II 9.2
  – Third-party tools for the Alliance version

Circuit Compilation
1. Technology mapping: map the logic onto LUTs
2. Placement: assign each logical LUT to a physical location
3. Routing: select wire segments and switches for interconnection

Routing Example
[Figure: routed FPGA showing programmable connections]

Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
  – This file is called a bit stream or configuration file
• The BIT file can be downloaded directly to the FPGA, or converted into a PROM file that stores the programming information

QUESTIONS?
THANK YOU
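Technology mapping ultimately turns each piece of logic into LUT contents, which the configuration bit stream then loads into the LUT's configuration memory cells. The minimal Python sketch below assumes a LUT's truth table is packed into one integer with input combination i stored at bit i (argument k taken from bit k of i); this packing and the helper names lut_init, lut_eval, from_column and muxf5 are hypothetical modeling choices, not a vendor file format. It also rebuilds the 5-input function Y from the Spartan-II example by splitting its truth table on X5 (a Shannon expansion) into two 4-input LUTs combined by a MUXF5-style multiplexer.

```python
from typing import Callable, Sequence


def lut_init(func: Callable[..., int], n_inputs: int = 4) -> int:
    """Map a Boolean function onto an n-input LUT: evaluate it for every input
    combination i and store the result at bit i (argument k = bit k of i)."""
    init = 0
    for i in range(1 << n_inputs):
        args = [(i >> k) & 1 for k in range(n_inputs)]
        init |= (func(*args) & 1) << i
    return init


def lut_eval(init: int, inputs: Sequence[int]) -> int:
    """A LUT is a small memory: the inputs form the address of one stored bit."""
    addr = sum((b & 1) << k for k, b in enumerate(inputs))
    return (init >> addr) & 1


def from_column(column: Sequence[int]) -> int:
    """Pack a truth-table output column (row 0 first) into one integer."""
    return sum((b & 1) << i for i, b in enumerate(column))


# Any 4-input function fits in one LUT, e.g. 4-input XOR (parity).
xor4 = lut_init(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(xor4))  # 0x6996

# Y column of the 5-input example, rows X5 X4 X3 X2 X1 = 00000 .. 11111.
Y = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]
lut_x5_0 = from_column(Y[:16])  # LUT holding the X5 = 0 half
lut_x5_1 = from_column(Y[16:])  # LUT holding the X5 = 1 half


def muxf5(x5: int, x4: int, x3: int, x2: int, x1: int) -> int:
    """Both LUTs see X4..X1; X5 drives the select of the F5 multiplexer."""
    low = [x1, x2, x3, x4]  # X1 is the least significant address bit
    return lut_eval(lut_x5_1, low) if x5 else lut_eval(lut_x5_0, low)


# The two-LUT + MUXF5 circuit reproduces every row of the original table.
rows_ok = all(
    muxf5(*[(i >> k) & 1 for k in (4, 3, 2, 1, 0)]) == Y[i] for i in range(32)
)
print(rows_ok)  # True
```

Which physical LUT (F or G) receives which half of the table is left open here; the sketch only shows why two LUTs plus one 2:1 multiplexer cover any 5-input function. Applying the same split once more on a sixth input is what MUXF6 does with two slices.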