DLLs

The Need for Clock Management
As system speeds increase, we can no longer ignore clock skew and noise problems
— A 2 ns clock skew matters more with a 6 ns clock than it does with a 20 ns clock
Need a way to control clock skew and decrease the effect of noise on the clock

Ways to Manage the Clock
DLLs
— All digital
— Triggered by incoming clock edge
— Creates output jitter less than 50 ps
— Less susceptible to analog noise
— Easily transferable from one process technology to another
PLLs
— Uses analog VCO
— Can suppress incoming clock jitter
— Adds undefined output jitter
— Susceptible to analog noise
— Not easily transferable from one process technology to another

DLL Basics
A DLL works by inserting delay on the clock net until the next clock input rising edge is in phase with the clock feedback rising edge.
Requires a well designed low-skew clock distribution network so that the clock edges arrive simultaneously everywhere in the part.
[Diagram: CLKIN, chain of four Delay elements, CLKOUT, Clock Distribution Network, CLKFB, Phase Delay Control]

DLL Functions
Speedup Tc2o
— Zero-delay internal clock buffer
Clock Phase Synthesis
— For use internally or externally
Clock Multiplication & Division
— For use internally or externally
Clock Mirror
— Zero-delay board clock buffer

DLL Tclock-to-out Speedup
[Diagram: CLKext, DLL, CLKint, output flip-flop (D, Q), OUT; Tclock = 0 ns; Tc2q + Tout = Tc2o]
Nullify clock delay - fast Tc2o on XCV1000
— External CLKext pin and internal CLKint pin are aligned
— 2.5 ns setup / 0.0 ns hold and 3.5 ns Tc2o on all devices
Optional duty cycle correction
— 50/50 duty cycle correction applied when specified

DLL Multiplication
[Diagram: Data, Buffer, IO and Internal Logic blocks with 16-bit and 32-bit paths; CLK feeds a DLL that produces a 2x clock]
Generate 2x & 4x clocks
— Reduce board EMI and trace concerns by routing low frequency clocks externally and multiplying internally
Cross clock domains without worry
— Multiplied & divided clocks have synchronized edges
— No external clock drift & minimal external clock skew

DLL Division
Selectable division values
— 1.5, 2, 2.5, 3, 4, 5, 8, or 16
— 50/50 duty cycle correction available
— Use DLL pair to combine functions
[Diagram: a DLL pair driven by a 30 MHz input, producing 30 MHz (180° shift), 15 MHz (divide by 2) and 60 MHz (multiply by 2) outputs; caption: 180° Phase Shift - Clock Multiply & Clock Divide]

System Synchronization
Synchronize all devices
— Eliminate board clock skew
— Nullifies clock input & board delay in addition to internal distribution delay
— Removes chip to chip race conditions
— Increases chip to chip interface speed - 240 MHz for Virtex-E
[Diagram: one CLK distributed to FPGA 1 through FPGA N, each using DLLs]

DLL Applications
Clock to out speedup
— High speed memory interfaces
— High speed chip to chip requirements
Clock multiplication/division
— Multiply the clock internally so that the external clock is slower, which reduces signal integrity problems on the board
Clock phase shift and duty cycle correction
— Double Data Rate applications
— Generation of multiple clocks
Clock mirroring
— Generate extra external clocks for fanout issues
— Board level clock management

Virtex-E DLL Modes
Low Frequency
— Input frequency range - 25 MHz to 160 MHz
— Maximum output frequency - 320 MHz
— Minimum high/low time - 2.0 ns*
— All 6 outputs available for use internally & externally
  – CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV
High Frequency
— Input frequency range - 60 MHz to 320 MHz
— Maximum output frequency - 320 MHz
— Minimum high/low time - 1.3 ns*
— 3 outputs available for use internally & externally
  – CLK0, CLK180 & CLKDV
Both modes supported with simple design primitives
— VHDL & Verilog simulation support available
* Varies with frequency

DLL Software Support
Use the BUFGDLL macro for common clock usage
Build complex structures using the CLKDLL primitive
[Diagram: BUFGDLL (0 ns) and its equivalent structure: PAD, IBUFG, CLKDLL (CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED), BUFG to the distributed clock network, with DLL feedback]

What happens if the CLKIN phase shifts?
The outputs will phase shift 1-4 clock edges after the CLKIN shifts.
— Because of this delay, inter-chip communication could have problems since the clock sources are not aligned.
LOCKED will stay asserted and the control logic will remain at the previous setting
Advice: Keep the phase shift to a longer LOW pulse.

What happens if the CLKIN changes frequency?
The control logic may not be able to track period changes of 1.0 ns or more
The outputs may start to destabilize as the control logic tries to adjust the delay lines to compensate.
What to do: Make sure that a change of frequency is followed by a reset of the CLKDLL.

What happens if the operating temperature changes?
The DLL will automatically adjust for temperature variance
DLL specs are guaranteed for chip temperatures between 0 °C and 85 °C

Why can’t I mux the CLKIN line?
The CLKIN input must come from an IBUFG, a BUFG driven from another CLKDLL, or DLLIOB
If a LUT or other route is placed in the circuit, the CLKDLL cannot adjust for this unknown delay
What to do: Route the net out of the chip and into an IBUFG or DLLIOB

DLL Information
XAPP132: Using the Virtex DLL
XAPP400: DLL usage in Software
http://www.xilinx.com/apps/virtexapp.htm

Differential Signaling
LVDS, LVPECL, BusLVDS

Moore’s Law at Work
Blasting Thru the 100M Transistor Barrier
[Chart, 1998 to 2000: XCV1000, 75M transistors; XCV2000E, 125M transistors; XCV3200E, 211M transistors]

I/O Bandwidth Trends
[Chart: bandwidth (MB/s, log scale from 10 to 10,000) versus year (1986 to 2002) for Ethernet, SCSI and Internet Backbone]

I/O Signaling
Single-ended I/O signaling
— TTL, HSTL, SSTL
Differential I/O signaling
— LVDS, BLVDS, LVPECL

The Problem
As the process shrinks, the absolute I/O noise margin shrinks as well
[Chart: Logic 1 and Logic 0 voltage ranges for 5V CMOS, 3.3V CMOS and 1.8V CMOS on a 0 V to 5 V scale; marked levels include 1.6 V, 1.0 V and 0.86 V]

Differential Signaling: The Solution
Differential I/O signaling has a higher noise immunity
The data is transmitted in the voltage difference of two lines
The noise affects both lines, but the voltage difference stays about the same, which means that the data is not affected by the noise

Differential Signaling: The Benefits
The benefits:
— High noise immunity… huge benefit
— Low power
— High speed I/O transfer
— Low EMI
  – Noise due to switching cancels between the two lines, since both lines switch at the same time in opposite directions

Differential Configurations
Multidrop
Point to Point
Multi-Point

Signal Interconnect Classification
Dual-pin differential
Point-to-Point: LVDS, LVPECL
— 50 Ω transmission lines
Multi-Drop: Bus LVDS, LVPECL
— Typically found in backplanes
— 30 Ω transmission lines
Multi-Point: Bus LVDS, LVPECL
— Typically found in backplanes
— 30 Ω transmission lines

VIRTEX-E as a Differential Receiver
Point-to-point configuration
[Diagram: LVDS/LVPECL line driver (Data in, Q, QB), two Zo = 50 Ω lines, termination Rt, Virtex-E FPGA inputs IN and INX, Data out]
VIRTEX-E can be driven by any standard LVDS or LVPECL driver
VIRTEX-E receiver complies with the LVDS or LVPECL specs

VIRTEX-E as a Differential Driver
Point-to-point configuration
[Diagram: Virtex-E FPGA outputs OUT and OUTX (Data in, Q, QB) with resistors Rs and Rdiv, two Zo = 50 Ω lines, termination Rt, standard LVDS or LVPECL receiver or VIRTEX-E LVDS or LVPECL receiver, Data out]
Capable of driving any standard LVDS or LVPECL receiver

LVDS
LVDS stands for:
— Low Voltage Differential Signaling
— It is a way of communicating using a low voltage swing (~350 mV) over two differential connections.
The main motivation for developing LVDS is the need for noise immunity in board-to-board communication

BLVDS
BLVDS stands for:
— Bus LVDS
Bidirectional LVDS
— The device can transmit and receive LVDS signals through the same pins
Requires different termination than LVDS

Virtex-E LVDS Signaling
±175 mV swing at a 1.25 V midpoint
[Waveform: Q and QB on a 0.0 V to 1.5 V scale, with the computed differential signal 2 x (Q-QB)]

LVDS Standards
Parameter                  RS-422       PECL               LVDS
Driver output voltage      ~2 - 5 V     ~600 - 1,000 mV    ~250 - 450 mV
Receiver input threshold   ~200 mV      ~200 - 300 mV      ~100 mV
Data rate                  <30 Mbps     >400 Mbps          >400 Mbps
Dynamic power              Low          High               Low
Noise                      Low          Low                Low
Cost                       Medium       High               Low

LVDS Characteristics
Termination
— The transmission medium must be terminated with 100 Ω ± 20 Ω.
— The resistor is placed across the differential inputs
— With this termination an LVDS driver can drive signals over several meters at speeds in excess of 155.5 Mbps (77.7 MHz).
— The real limitation of speed is:
  – How fast data can be delivered to the driver.
  – Bandwidth performance of the selected media.
— The simple LVDS termination is easy to implement
— ECL and PECL require more complex termination schemes.

LVDS Advantages: Saving Power
— LVDS technology saves power in several important ways.
— Power dissipation at the terminator is ~1.2 mW
  – An RS-422 driver delivers 3 V across a termination of 100 Ω, for 90 mW power consumption... 75 times more than LVDS!
— Due to the current mode driver design, the frequency component of Icc is greatly reduced.
  – Compare TTL / CMOS transceivers, where the dynamic power consumption increases with frequency.

LVDS Advantages: Save Money
— High performance can be achieved using off-the-shelf FPGAs
— LVDS consumes less power, so one can use cheaper power supplies or fewer fans
— LVDS is low noise, so no more EMI headaches (save time).
— Since LVDS is much faster than CMOS / TTL, LVDS signals can be serialized. This results in smaller packages, simpler connectors, etc.

LVPECL
LVPECL stands for
— Low Voltage Positive Emitter Coupled Logic
Well known industry standard for fast clocking
Voltage swing (~750 mV) over two differential connections.
Virtex-E offers easy interface with other standard LVPECL chips

LVPECL Clocking
TTL is not the most desired clocking technique for clock frequencies higher than 150 MHz
[Chart: system clock speed, with TTL and LVPECL regions and a 150 MHz marker]

Clock Sources
TTL oscillator
— TTL/CMOS, up to ~135 MHz
Generic LVPECL oscillator (example: Saronix SEL3400 Series)
— LVPECL, up to ~250 MHz
Quartz crystal (16 MHz nominal) with LVPECL clock synthesizer (examples: Motorola MC12429, Synergy SY89429V)
— LVPECL, up to ~400 MHz

Virtex-E 300+ MHz LVPECL Clocking
No LVPECL-TTL translator
Equal-length point-to-point LVPECL PCB clock traces
[Diagram: LVPECL clock source, LVPECL clock distributor (example devices: Motorola MC10/100E111, Synergy SY10E111LE), differential pairs to Virtex-E 1 through Virtex-E n]
Typical discrete solution: PECL-to-TTL
— Motorola MC100EPT23 dual differential PECL-to-TTL translator, TPD = 2.0 ns & skew
Virtex-E eliminates converters
— Eliminates 2 ns delay

Virtex-E LVPECL Clock Conversion
Receive and convert high speed clocks with zero delay
Zero-delay local clock generation to any of the Virtex-E I/O standards
[Diagram: LVPECL clock into Virtex-E; two DLLs drive TTL and SSTL clocks to external RAM, etc.]

Putting it All Together ...
No LVPECL-TTL translator
Equal-length point-to-point LVPECL PCB clock traces
[Diagram: LVPECL clock source, LVPECL clock distributor (example devices: Motorola MC10/100E111, Synergy SY10E111LE), differential pairs to Virtex-E 1 through Virtex-E n, each connected to external devices]

Designing With LVDS and LVPECL
Some Facts
— Impedance matching is VERY important
— Discontinuities in impedance WILL create reflections.
— Reflections degrade signals and show up as common mode noise.
— Common mode noise cancels the magnetic shield effect of differential lines and radiates as EMI.
— Do not make sharp turns since this causes impedance discontinuities.
— Keep stubs and uncontrolled tracks < 10 mm.

Designing With LVDS and LVPECL (Continued)
PCB guidelines:
— Use at least 4 PCB layers (LVDS signals, ground, power, TTL/CMOS signals)
— Separate TTL/CMOS signals from the LVDS signals
— Keep LVDS driver/receiver connections as close to the connectors as possible.
— Decouple the power supply as well as possible.
— Connect all the VCC and ground pins of the component.
— Make power and ground tracks as wide as possible.
— Connect to power and ground tracks with multiple vias.

Designing With LVDS and LVPECL (Continued)
PCB guidelines
— Match the tracks to the impedance of your transmission medium and termination resistor.
— Run differential tracks as close together as possible as soon as they leave the IC
— Use microstrip or stripline for tracks
— Match electrical length of tracks to reduce skew.
— Keep the spacing between the two tracks of a pair as constant as possible to avoid discontinuities in impedance.

Designing With LVDS and LVPECL (Continued)
PCB guidelines
— Use a good matching termination resistor.
  – LVDS will not work without resistor termination.
— Typically a single resistor at the receiver is OK.
— Surface mount resistors are best.
  – Stubs are short.
  – Distance between receiver and termination is short.
  – No component leads.
— At extra cost you can use the center tap capacitance termination scheme.
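
The termination resistor discussed above is also what sets the power figures quoted under LVDS Advantages, and those can be checked with P = V²/R. The Python sketch below is illustrative only: the function name is made up for this example, and it assumes the full ~350 mV LVDS swing and the 3 V RS-422 level each appear across a 100 Ω terminator.

```python
def termination_power_mw(volts_across, resistance_ohms):
    """Power dissipated in a termination resistor, in milliwatts (P = V^2 / R)."""
    return volts_across ** 2 / resistance_ohms * 1000.0


# Assumed inputs, taken from the slides: ~350 mV LVDS swing and a 3 V RS-422
# level, each across a 100 ohm termination resistor.
lvds_mw = termination_power_mw(0.350, 100.0)
rs422_mw = termination_power_mw(3.0, 100.0)

print(f"LVDS terminator:   {lvds_mw:.2f} mW")
print(f"RS-422 terminator: {rs422_mw:.1f} mW")
print(f"Ratio: about {rs422_mw / lvds_mw:.0f}x")
```

With these inputs the LVDS figure comes out near the quoted ~1.2 mW and the ratio lands in the low seventies. Dividing 90 mW by the rounded 1.2 mW gives the 75 times stated on the slide.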
[Diagram: single termination resistor R, and center-tap termination with two R/2 resistors and capacitor C]

More LVDS and LVPECL Info
At the Xilinx website: http://www.xilinx.com/apps/xapp.htm
Look at AppNotes XAPP230, XAPP231, XAPP232

Memory Interfaces
ZBT RAM, SDRAM, DDR SDRAM

Virtex-E and High Speed Memory Interfaces
Features needed for interface to high speed memory
— Fast I/Os
— Clock management capabilities
Virtex-E has both:
— SSTL2, HSTL, LVDS, LVPECL and many more
— 8 on-chip DLLs - use for clk-to-out speed up, clock deskew, clock multiplication/division

Benefits of using an FPGA for the Memory Interface
Easy to implement
Can add functionality in the future easily
— ASIC is a one-time deal
Combine multiple discrete devices into the FPGA
— Save space, money, and power

High Speed Memory Interfaces
ZBT RAM Interface
SDRAM Interface
DDR SDRAM Interface

Zero Bus Turn-around SRAM
Extremely high bandwidth
— Other non-cache applications in telecom, test equipment, DSP and embedded memory applications
ZBT stands for “Zero Bus Turnaround”
— No idle cycles between read-to-write and write-to-read
— 100% bus use
— Previous architectures had a turnaround cycle
Completely deterministic timing - simplifies system design
— Any cycle can perform any operation

ZBT SRAM Parameters
Densities                  2, 4 and 8 Mbits
Data bus widths            18, 32, and 36-bit
IO voltage and standards   2.5V, 3.3V, LVTTL
Flow-through speed         8, 10 ns (clock cycle time)
Pipeline speed             5, 6, 7.5 ns (clock cycle time)

ZBT Flow-Through Timing
Read operation - data available after single clock latency
Write operation - “Late Write”: data to be written is presented on the next clock
[Timing diagrams for read and write: Clk (cycles 1, 2), Control, Address, Data]

ZBT Pipelined Timing
Read operation - data available after two clock latency
Write operation - “Late Write”: data is written 2 cycles later
[Timing diagrams for read and write: Clk (cycles 1, 2, 3), Control, Address, Data]

ZBT 100% Bus Use
Write/Write/Read/Write/Read/Burst Read
[Timing diagram, clock cycles T1 to T8:
 Command: Write1, Write2, Read1, Write3, Read2, RdBrst
 Address: AddW1, AddW2, AddR1, AddW3, AddR2
 DQ: Dout W1, Dout W2, Din R1, Dout W3, Din R2, Din R2+1]
Pipelined part’s timing is illustrated above

Virtex-E ZBT Bandwidth
800 Mbytes/sec @ 32 bits wide
Device                   Frequency  Cycle Time  MAX* Bandwidth  READ/WRITE Cycle  READ/WRITE Burst of 4
                         (MHz)      (ns)        (MByte/sec)     Bandwidth         Bandwidth
ZBT Pipelined            200        5           800             800               800
ZBT Pipelined            166        6           666             666               666
ZBT Pipelined            143        7           572             572               572
ZBT Pipelined            133        7.5         533             533               533
SyncBurst Pipelined      133        7.5         533             267               426
ZBT Flow-Through         100        10          400             400               400
SyncBurst Flow-Through   83         12          332             221               295
Very high performance synchronous, static memory
NOTE: The bandwidth figures presented in this table are for a 32-bit data path; the raw bandwidth is 12.5% higher if a 36-bit data path is used.

ZBT Interface Reference Design
[Diagram: XCV300-E containing DLL 1 (CLKin), DLL 2 (Clk2x), Controller and Tester, with signals Reset, Error, Addr, Data in, Data out; connected to the ZBT SRAM through Addr, Data and RW#]

ZBT Interface Application Note
• 7.2 Giga-bits/s @ 36 bits wide
• 200 MHz synthesisable HDL controller design
• XCV300-E, -6 speed grade
ZBT controller interface with tester, resource utilisation: 93 Logic Cells, 502 Flip Flops, 71 IO
Part        Logic Cell    Total available  Flip Flop     Total available  IO            Total available
            Utilisation   Logic Cells      Utilisation   Flip Flops       Utilisation   IO
XCV50-E     5.38%         1,728            32.68%        1,536            39.44%        180
XCV100-E    3.44%         2,700            20.92%        2,400            39.44%        180
XCV200-E    1.76%         5,292            10.67%        4,704            25.00%        284
XCV300-E    1.35%         6,912            8.17%         6,144            22.47%        316
XCV400-E    0.86%         10,800           5.23%         9,600            17.57%        404
XCV600-E    0.60%         15,552           3.63%         13,824           13.87%        512
XCV1000-E   0.34%         27,648           2.04%         24,576           13.87%        512

ZBT Bus Contention - Real World
[Scope shot: 143 MHz Clock, R/W, Address [0], Data [0]]
Scope shot taken directly from the ZBT controller reference board.
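
The headline ZBT numbers above follow from one multiplication: with 100% bus use, one word moves on every clock, so peak bandwidth is clock rate times bus width. The Python sketch below works through that arithmetic; the function name is illustrative and not taken from any Xilinx tool or application note.

```python
def peak_bandwidth_mbytes(clock_mhz, bus_bits):
    """Peak bandwidth in MByte/s when one word moves on every clock cycle."""
    return clock_mhz * bus_bits / 8


# Clock rates from the pipelined ZBT rows of the bandwidth table, 32-bit path.
for clock_mhz in (200, 166, 143, 133):
    mbytes = peak_bandwidth_mbytes(clock_mhz, 32)
    print(f"{clock_mhz} MHz x 32 bits: {mbytes:.0f} MByte/s")

# The reference design runs a 36-bit data path at 200 MHz.
wide_mbytes = peak_bandwidth_mbytes(200, 36)
print(f"200 MHz x 36 bits: {wide_mbytes * 8 / 1000:.1f} Gbit/s")
print(f"36-bit vs 32-bit:  {wide_mbytes / peak_bandwidth_mbytes(200, 32):.3f}x")
```

The 166 MHz and 133 MHz rows print 664 and 532 rather than the table's 666 and 533. The table values are presumably derived from the 6 ns and 7.5 ns cycle times rather than from the rounded MHz figures.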
Virtex-E High Speed SDRAM Interface
SDRAM Overview
— Features
Virtex-E SDRAM controller
— Features
— Block diagram
— Timing

SDRAM
Features:
— Synchronous interface (free system from wait states)
— Burst mode access (reduce CAS access time)
— Multiple banks (parallel processing: access one bank, precharge/refresh the other)
— LVTTL, 3.3V
— Programmable burst length, CAS latency
[Timing diagram: Clock, Command (READ), Address (Col), DQ (D1 to D4); CAS latency = 2, burst length = 4]

SDRAM Controller Application Note
Synthesizable Verilog/VHDL
Programmable burst length (1, 2, 4, 8)
Programmable CAS latency (2, 3)
Automatically issues refresh commands
Supports LOAD_MR, AUTO_REFRESH, PRECHARGE, ACT_ROW, READA, WRITEA, BURST_STOP, NOP
Interfaces with SDRAM at 125 MHz (Virtex-E, -6 speed)
Uses 2 DLLs and 165 CLB slices (5% of XCV300E)

SDRAM controller
[Block diagram: system side with a 62.5 MHz clock, controls, data_addr_n and a 32-bit AD bus; XCV300-E -6; SDRAM side with a 125 MHz clock, controls, 11-bit addr and 32-bit data; SDRAM 16M (x16)]

SDRAM controller
[Block diagram: Controller]

SDRAM controller IO timing
Read cycle is the critical timing:
— SDRAM -8 clk-to-out = 6.0 ns
— Virtex -6 setup = 1.7 ns
— 125 MHz operation (8 ns cycle), 300 ps left for board routing on data lines
Write cycle:
— Virtex -6 clk-to-out = 3.9 ns
— SDRAM -8 setup = 2.0 ns
— 125 MHz operation (8 ns cycle), 2.1 ns left for board routing

Virtex-E DDR-SDRAM Interface
DDR SDRAM Overview
— Features
— Differences from SDRAM
Virtex-E SDRAM controller
— Features
— Block diagram
— Timing
— Board layout guideline

DDR SDRAM
Features:
— Next generation SDRAM
— DDR data I/O (twice the bandwidth at the same clock frequency as SDRAM)
— Peak bandwidth: 1.6 GBytes/s (64-bit @ 100 MHz)
— 2.5V, SSTL2, 100/133 MHz
— Advantages over RDRAM: cost, package, open industry spec, compatible with existing spec
— Supported by major vendors: Micron, Samsung, IBM, Fujitsu, Hitachi, Hyundai, Toshiba,...

DDR SDRAM
Differences compared to standard SDRAM:
— All IOs are SSTL2, 2.5V (reduce power and noise)
— Differential clock (CLK and CLKB). The positive clock edge is the crossing of CLK going high and CLKB going low.
— Bidirectional data strobe (clock-to-data skew is eliminated)
— Double Data Rate data transfer

Write Cycle
[Timing diagrams. SDRAM: clk; cmd (ACT, NOP, WRITE); addr (ROW, COL); data (D1 to D4). DDR SDRAM: clk, clkb; cmd (ACT, NOP, WRITE); addr (ROW, COL); dqs; data (D1 to D4)]

Read Cycle
[Timing diagrams. SDRAM: clk; cmd (ACT, READ, NOP); addr (ROW, COL); data (D1 to D4). DDR SDRAM: clk, clkb; cmd (ACT, NOP, READ); addr (ROW, COL); dqs; data (D1 to D4)]

DDR SDRAM controller Application Note
Synthesizable Verilog
Virtex-E, -6 speed grade: 100 MHz clk
— 200 MHz data rate
— 1.6 Giga-Bytes/s bandwidth @ 64 bits wide
Programmable CAS latency, burst length
2 DLLs, 474 slices (15% of XCV300-E)
Uses “Logic Accessible Clock” technique
Uses clock to latch read data, instead of DQS

DDR SDRAM controller
[Block diagram: Virtex-E]

DDR SDRAM IO timing: Data Lines
Read cycle data lines
— Read cycle is critical. Data is strobed by clk, instead of DQS
[Timing diagram: ddr_clk, minimum DDR clk-out, minimum Virtex-E hold time; marked values -0.8 ns and -0.4 ns]
Minimum trace delay on data = 0.8 ns - 0.4 ns - clock skew between ddr_clk & fpga_clk = 0.4 ns - clock skew

DDR SDRAM IO timing: Addr/Cntrl Lines
Address and control lines are generated on the negative edge of the clock, to guarantee DDR hold time
[Timing diagram: ddr_clk with a 5 ns window; 2.4 ns Virtex-E clk_out (max); 1.2 ns DDR setup time]
Maximum trace delay on Addr/Cntrl = 5 ns - 2.4 ns - 1.2 ns - clock skew = 1.4 ns - clock skew

DDR SDRAM IO timing: Summary
The I/O spec for DDR is very tight
Carefully calculate data and address trace delays to guarantee setup and hold times
The minimum trace delay on the data lines can be eliminated by delaying the ddr_clk
— Since DDR has negative tAC(min), delaying the ddr_clk helps meet Virtex-E’s hold time requirement

Board Layout Guideline
All high speed memory interfaces
— Virtex device and the memory chips must be placed close to each other
— Consider/simulate board level signal integrity and timing; pay particular attention to clocks
— Use matched impedance traces
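
The setup-side budgets on the SDRAM and DDR IO timing slides all use the same subtraction, which is handy to script when checking trace lengths against these layout guidelines. The Python sketch below reproduces the three quoted results; the function and variable names are illustrative, and clock skew is left at zero because the slides give no value for it.

```python
def trace_delay_budget_ns(window_ns, clk_to_out_ns, setup_ns, clock_skew_ns=0.0):
    """Time left for board trace delay after clock-to-out, setup, and skew."""
    return window_ns - clk_to_out_ns - setup_ns - clock_skew_ns


# SDRAM at 125 MHz (8 ns cycle), values from the SDRAM controller IO timing slide.
sdram_read = trace_delay_budget_ns(8.0, clk_to_out_ns=6.0, setup_ns=1.7)
sdram_write = trace_delay_budget_ns(8.0, clk_to_out_ns=3.9, setup_ns=2.0)

# DDR address/control lines launch on the negative edge, so the window is 5 ns.
ddr_addr = trace_delay_budget_ns(5.0, clk_to_out_ns=2.4, setup_ns=1.2)

for name, value in [("SDRAM read", sdram_read),
                    ("SDRAM write", sdram_write),
                    ("DDR addr/ctrl", ddr_addr)]:
    print(f"{name}: {value:.1f} ns before clock skew")
```

Passing a non-zero clock_skew_ns shows how quickly the 300 ps SDRAM read margin disappears, which is why the read cycle is called the critical timing.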
DDR
— All bi-directional signals (data & data strobes) use IOBUF_SSTL2_II; other output signals use OBUF_SSTL2_I
— DQ lines must be closely matched, and kept short to minimize cross talk
— DQS trace lengths should match DQ
— CLK and CLKB delays and loads should match (CLKB can also be routed back to an unused IOB near the feedback pin)

Memory Interface Application Notes
ZBT RAM: XAPP136
SDRAM: XAPP134
DDR SDRAM: XAPP200
http://www.xilinx.com/apps/virtexapp.htm

CAM in Virtex-E

CAM Overview
Content Addressable Memory
Storage array (like RAM)
Find the location of a particular stored value
Compare input against data in memory
– If a match is found, output the address
– Maximum performance if the match takes a single clock cycle

CAM Overview
Simple RAM and CAM compared
[Diagram: RAM (1024 x 8) with Add[9:0] and Dout[7:0]; CAM (1024 x 8) with Din[7:0], Add[9:0] and Match]

CAM Applications
Telecommunications
Networking
Ethernet
ATM
Protocol

CAM Overview
CAM features:
— Word size (width)
— Number of words (depth)
— Match or compare time (read)
— Significance of write speed
— Clock frequency
— Masks
— Decoded and/or encoded address (outputs)

CAMs in Virtex-E
Depth   Width   Size       Match        Device     Logic
32      8       256 bits   4.5 ns       XCV50-6    BRAM
256     8       2 Kbits    8.5 ns       XCV50E-6   BRAM
32      16      512 bits   8 ns         XCV50-6    SRL16
128     40      5 Kbits    12 ns        XCV300-6   SRL16
4096    16      64 Kbits   16 x 20 ns   XCV400-6   RAM16x1
Flexible CAM designs in Virtex and Virtex-E
— CAM implemented in a LUT
— CAM implemented in a Block SelectRAM

Designing CAM in Virtex slices
XAPP203: “Designing Flexible, Fast CAMs with Virtex Family FPGAs”:
— VHDL and Verilog reference designs available
Features
— 4 bits per LUT
— 16-word x 4-bit organization
— Match in one clock cycle
— 16 write clock cycles
— Decoded address output
— Generic word width from 4 bits up to any multiple of 4
— Generic number of 16-word CAM blocks
— Cascadable
— Address encoder in logic or tri-state buffers (TBUF)

CAM in a LUT: Match Operation
[Diagram: 8-bit DATA_IN split into two 4-bit addresses A[0:3] of two SRL16 LUTs; their outputs and a constant “1” feed a wide AND producing MATCH_SIGNAL, with flip-flops (D, Q) clocked by CLK; 1 slice; caption: Reconfigurable 8-bit Word Comparator]

Match Waveforms for CAM in a LUT
[Waveforms: DATA_IN (“…1001”), MATCH_ENABLE, MATCH (“xxxx xxxx xxxx xxxx” then “0000 0000 0000 0100”), R_MATCH_ADDR (“xxxx” then “0010”), R_MATCH_OK, CLK; a match cycle followed by an encode cycle]
[Block diagram: DATA_IN and MATCH_ENABLE into CAM 16WORDS, MATCH into ENCODE, outputs R_MATCH_ADDR and R_MATCH_OK]

CAM in a LUT: Write Operation
[Diagram: 8-bit DATA_IN split into MSB and LSB nibbles, each entering a 4-bit compare together with a 4-bit Counter; the compare outputs drive the D inputs of the MSB and LSB SRL16 LUTs (A[0:3], Q); 1 slice; caption: Reconfigurable 8-bit Word Comparator]

Cascading CAMs in LUTs
CAM match path (1 CLK) & encode (1 CLK)
[Diagram: 8-bit DATA_IN into an array of N x CAM_16WORDS blocks with CLK and MATCH_ENABLE; each block has 16 match lines and an encoder for the 4 LSBs, with a further encoder for the MSBs; 16 FFs; registered outputs MATCH_ADDR and MATCH_OK]

CAM in Block SelectRAM
XAPP204: “Using Block SelectRAM+ for High-Performance Read/Write CAMs”:
— VHDL and Verilog reference designs available
Features
— 128 bits per Block SelectRAM+
— 16-word x 8-bit organization
— Match in one clock cycle
— Write in one clock cycle (and erase in one clock cycle)
— Decoded address output
— Fully synchronous match and write ports (independent)
— Cascadable
— Address encoder in logic or tri-state buffers (TBUF)

CAM in a Block SelectRAM+
CAM 16x8 macro in 1 Block SelectRAM+
[Diagram: RAMB4_S1_S16. Port A: DIA[0], ADDRA[11:0], WEA, ENA, RSTA, CLKA, DOA (N.C.), with signals ERASE_WRITE, DATA_WRITE[7:0], ADDR[3:0], WRITE_ENABLE, “0” and CLK_WRITE. Port B: DIB[15:0], ADDRB[7:0], WEB, ENB, RSTB, CLKB, DOB[15:0], with signals “0000….0000”, DATA_MATCH[7:0], “0”, MATCH_ENABLE, MATCH_RST, CLK_MATCH and MATCH[15:0]]

Cascading Block SelectRAM+ CAMs for bigger depth
CAM 64-word x 8-bit in read mode
[Diagram: DATA_MATCH[7:0] fans out to four CAM (16x8) blocks clocked by CLK_MATCH; their 16-bit outputs form MATCH[63:0] as [15:0], [31:16], [47:32] and [63:48]]

Cascading Block SelectRAM+ CAMs for higher width
CAM 16-word x 16-bit in read mode
[Diagram: DATA_MATCH[15:0] split into [15:8] and [7:0], each driving a CAM (16x8) clocked by CLK_MATCH; corresponding output bits [15] to [0] of the two blocks are combined into MATCH[15:0]]

CAM in Block SelectRAM+: The final picture
CAM16x8 macro
— Match flag and encoded outputs
[Diagram: write port A (4096 x 1) with DATA[7:0]; read port B (256 x 16) with ADDRB[7:0], DOB[15:0] and CLKB driven by CLK_MATCH; decoded address MATCH[15:0] into ENCODE; flip-flops clocked by CLK_MATCH; outputs MATCH_ADDR[3:0] and MATCH_SIGNAL]

CAM in Virtex FPGAs
Basic decoder/comparator block designed using:
— Virtex slices configured as 16-bit shift registers (8 bits per slice)
— Virtex dual port block SelectRAM+ (128 bits per block)
Use an array of basic blocks to implement a CAM
[Chart for XCV2000E: CAM width in bits (0 to 300) versus CAM depth in words (128 to 15360) for BRAM 16x8b and Slice 1x8b blocks; labels Size = 20,480 bits and Size = 122,880 bits]

XILINX CAMs comparison
Device                VIRTEX & VIRTEX-E        VIRTEX & VIRTEX-E        VIRTEX & VIRTEX-E
Implementation        Slices, RAM16x1 based    Slices, SRL16 based      Block SelectRAM
Min. CAM size         10 bits per LUT          4 bits per LUT           128 bits per Block
Max CAM size          ~500 Kbits (XCV3200E)    ~200 Kbits (XCV3200E)    26 Kbits (XCV3200E)
MATCH (# of clock)    16 cycles                1 cycle                  1 cycle
WRITE (# of clock)    1 cycle                  16 cycles                1 cycle (+1 erase cycle)
Min. CAM width        1 bit                    4 bits                   8 bits
Min. CAM depth        16 words                 1 word                   16 words
Max. CAM depth        ~64 K 8-bit words        ~25 K 8-bit words        3,328 8-bit words
Fastest Match         16 x 12 ns               7.5 ns                   4.5 ns
Decoded Address       yes (by 16)              yes                      yes
Design                Ref. Design 202          Ref. Design 203          Ref. Design 204

SRL16

SelectShift LUT
Dynamically addressable shift registers, implemented in one LUT
[Diagram: chain of 16 flip-flop stages (0 to 15) with IN, CE and CLK; ADDR[3:0] selects the stage that drives OUT. A CLB contains two slices, each with two LUTs]

SelectShift Features
Serial in, serial out
Does not require an address counter
Programmable cycle delay from 1 to 16
— Addr[3:0] specifies the desired delay
Cascade for cycle delays greater than 16
CLB flip-flops can be used to add depth

Software Support
Primitives available in software
— SRL16: 16-bit Shift Register Look-Up-Table (pins D, CLK, A0 to A3, Q)
— SRL16E: 16-bit Shift Register Look-Up-Table with Clock Enable (pins D, CE, CLK, A0 to A3, Q)
Positive or negative clock edge triggered
Clock Enable optional
Available for VHDL or Verilog instantiations

SRL16 Applications
Shift registers
Delayed signal generation
Linear Feedback Shift Registers (LFSRs)
CRC circuits

Virtex-E Configuration

Agenda
Review of configuration modes
— Serial, Parallel, JTAG
Startup sequence
XC1800 PROM interfacing
Daisy chaining
Tips in debugging configuration issues
JTAG configuration

Operation Flow
[Flow: POWER UP, then CONFIGURATION (Serial Mode, Parallel Mode, JTAG), then Device Operational]

Configuration
Data stored in a PROM or downloaded through a cable
Configuration time depends on
— device size
— type of configuration
— clock speed

Configuration Modes
Serial modes
— Master
— Slave
Parallel mode
— SelectMAP
JTAG

Serial Mode Configuration
[Diagram: Master Serial configuration mode; Virtex-E pins CCLK, DIN, DONE and /INIT connected to PROM pins CLK, DATA, /CE and /RESET/OE]
Serial configuration
— Master mode: the Virtex-E device initiates the configuration
— Slave mode: the Virtex-E device waits for some other device to start the configuration

Serial Mode Configuration
Data is loaded serially - one bit per CCLK
A Virtex-E device in Master Serial mode produces its own CCLK
— CCLK rate is controllable in software
— Mode used with a PROM
In Slave Serial mode, the Virtex-E device needs a CCLK provided by another device
— All download cables do this

Parallel Mode Configuration: SelectMAP
[Diagram: Microprocessor connected to Virtex-E D0-D7, /CS, /WRITE, CCLK, PROG and DONE]
One byte loaded per CCLK
Designed to be driven by another logic device
— Another FPGA or CPLD
— Processor
— Microcontroller
— MultiLinx cable

Important Signals in SelectMAP
Data (D0-D7) - bi-directional data bus
— D0 is the MSB
/WRITE - direction of data on the bus
— Low for configuration (write)
— High for readback
/CS - enable for the data bus
— While it is High, CCLK transitions are ignored
BUSY - output that indicates when data can be received
— Not needed for CCLK < 50 MHz

SelectMAP - Things to Know
Initialization needed after /INIT goes high
— 3 CCLKs needed
— If /CS and /WRITE are asserted early, no data will be transferred on the first CCLK
To strobe data, use /CS, not /WRITE
— If a CCLK rising edge occurs when /CS is asserted and /WRITE is de-asserted, an ABORT will occur
  – Need to reload the Sync Word and redo the last packet

Virtex-E Bitstream Format
10 internal configuration registers
Bitstream is actually a set sequence of writes into those registers
Configuration data still broken into frames
All data is encapsulated into packets - Type I and Type II
When migrating from Virtex to Virtex-E a new bitstream is needed

Configuration Registers
Register Symbol   Register Name/Description
CMD               Command Register - executes commands to control read/write, CRC, etc.
FLR               Frame Length - indicates frame size (available in XAPP138)
COR               Configuration Option Register - some user selected options from Bitgen
MASK              Mask Register - masks out bits of CTL register for security
CTL               Control Register - handles internal functions like Port Persistence
FAR               Frame Address Register - sets the starting frame address
FDRI              Frame Data Input - pipelined input register that receives frame data
CRC               Cyclic Redundancy Check - loaded with CRC value that checks for errors
FDRO              Frame Data Output - pipelined output register for reading frame data
LOUT              Legacy Data Output - pipelines data to the DOUT pin
Each register has a 5-bit address
Detailed information in XAPP138

Configuration Startup Sequence
Four signals to control
— GWE (Global Write Enable)
— GSR (Global Set/Reset)
— GTS (Global 3-State)
— DONE (External Done Pin)
Six phases to select assertion/de-assertion (1-6)
Sequencer will wait in the DONE phase until DONE goes high
Can create “Sync-To-Done” behavior by setting GTS, GSR, and GWE to the same cycle as DONE

Startup Sequence
[Timing diagram: StartupClk phases 0 to 7 with DONE, GTS, GSR and GWE; default phase shown in bold]

Virtex-E and XC1800 PROMs
Can program via serial or SelectMAP mode
— serial vs. parallel controlled in software

Daisy Chaining
[Diagram: PROM feeding DIN of Virtex-E #1 (Master); DOUT to DIN of Virtex-E #2 (Slave); DOUT to DIN of Virtex/4kX #3 (Slave)]
Available only in Serial or JTAG mode
Concatenation of bitstreams does not work
— Use the software to generate the necessary bitstreams (PROMGen)

Debugging Tips and Info
What causes /INIT to go low?
— CRC check fails
— Internal error, e.g. data loaded too fast
When will an error stay undetected?
— A bit is missed or added - this will misalign the instructions, and the CRC check won’t happen
Mode pin considerations
— Internal pullups are guaranteed
— Make sure pulldown is strong enough (4.7k)

JTAG Configuration

What is JTAG?
JTAG - Joint Test Action Group
— Developed as standard testing interface
— Boundary Scan, IEEE STD 1149.1
Four dedicated pins required:
— TDI, TDO, TMS, and TCK
— TRST is an optional 5th pin that Xilinx does not use

JTAG Standard
JTAG Standard - 16-state state machine
— TAP (Test Access Port)
— IR (Instruction Register)
— DR (Data Register)
— Bypass Register

JTAG TAP Controller
[State diagram with transitions labeled by the TMS value (0 or 1): Test-Logic-Reset, Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR, Select-IR-Scan, Capture-IR, Shift-IR, Exit1-IR, Pause-IR, Exit2-IR, Update-IR]

JTAG TAP Controller: Architecture
[Diagram: TAP controller architecture]

BSDL Files
Boundary Scan Description Language
BSDL files define the hardware
— Description of the die, with pins and scan chain order
— Information about the size of the various chip specific registers (e.g. instruction register length)
Unconfigured BSDL files are provided
— Assumes all I/Os are bidirectional

BSDL Availability
Files on the web are continuously updated
— Current software does not always have the most recent BSDL file
HTTP://support.xilinx.com -> Software

JTAG Programmer Software Support for Virtex-E
JTAG software support in M2.1i SP3
— Non invasive: Idcode, Bypass, Usercode
— SVF file generation
Stay current with the download tools
— Service packs
— Web Pack (PC only)
Foundation or Alliance software updates at: http://support.xilinx.com/support/techsup/sw_updates/
JTAG Programmer at: http://www.xilinx.com/sxpresso/webpack.htm

Cables Provided by Xilinx
MultiLinx
— Supported in 2.1i SP2 JTAG Programmer
— USB or serial ports
— Win 98 only
Parallel Cable III
XChecker

Cables: JTAG Connections
[Diagram: JTAG cable connections]
* If there is a TRST trace on the board, it should be tied high

JTAG Debugging Tips
Debug Chain software tool (Logic Probe)
/TRST pin should be tied high on 3rd party chips
Noise or bad parallel port
ISP Checklist app note XAPP104
Know all devices in the chain and their order
Virtex-E does not tolerate 5V signals directly

Good References
Virtex-E Datasheet - basic information on configuration modes
XAPP138 - Configuration modes, packets and readback
XAPP151 - Detailed bitwise explanation of configuration registers, partial reconfiguration hints and advanced concepts in readback
XAPP139 - Detailed information on JTAG configuration and readback for VIRTEX devices
XAPP153 - Status and Control register information for partial reconfiguration
http://www.xilinx.com/apps/virtexapp.htm
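
As a supplement to the JTAG TAP Controller slide, the 16-state machine can be written down as a lookup table and stepped with TMS values. The Python sketch below is a minimal model using the transitions defined by IEEE Std 1149.1; the names TAP_NEXT and walk are illustrative and do not come from any Xilinx tool.

```python
# Next state for each TAP state, indexed by the TMS value sampled on rising TCK:
# entry [0] is taken when TMS = 0, entry [1] when TMS = 1.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply a sequence of TMS values (0 or 1) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


print(walk("Test-Logic-Reset", [0, 1, 0, 0]))     # path into Shift-DR
print(walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))  # path into Shift-IR
# Holding TMS high for five clocks returns any state to Test-Logic-Reset.
print(all(walk(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT))
```

The last line checks the property that makes the TRST pin optional: the controller can always be reset through TMS and TCK alone.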