Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ECE 697F Reconfigurable Computing Lecture 1 Course Introduction Prof. Russell Tessier Lecture 1: Course Introduction September 8, 2004 What is Reconfigurable Computing? • Computation using hardware that can adapt at the logic level to solve specific problems ° Why is this interesting? • Some applications are poorly suited to microprocessor. • VLSI “explosion” provides increasing resources. • Hardware/Software • Relatively new research area. ° Acknowledgement: Wolf text Lecture 1: Course Introduction September 8, 2004 Background needed • Basic VLSI – transistors, delay models. • Basic algorithms – graph algorithms, seaches • Computer Architecture – ALU, microprocessor • Digital Design – adder, counter, etc. Topic self-contained! Lecture 1: Course Introduction September 8, 2004 Course Organization • Two lecture a week (about 27 overall) • 3 Homework assignments (20%) • Final project (35%) • Transcription assignment (20%) • Mid-term (25%) • Required text – FPGA-based System Design (Wolf) Lecture 1: Course Introduction September 8, 2004 Microprocessor-based Systems Data Storage (Register File) A B C ALU 64 • Generalized to perform many functions well. • Operates on fixed data sizes. • Inherently sequential. Lecture 1: Course Introduction September 8, 2004 Reconfigurable Computing A B If (A > B) { H = A; Functional L = B; Unit } Else { H L H = B; L = A; • Create specialized hardware for each application. } • Functional units optimized to perform a special task. Lecture 1: Course Introduction September 8, 2004 Example: Bubblesort A B A B H L H L A B A B H L H L A B H L Smallest Largest • Adapt interconnect to problem. • Take advantage of parallelism. Lecture 1: Course Introduction September 8, 2004 Implementation Spectrum Microprocessor Reconfigurable Hardware ASIC • ASIC gives high performance at cost of inflexibility. • Processor is very flexible but not tuned to the application. • Reconfigurable hardware is a nice compromise. What does it look like? Lecture 1: Course Introduction September 8, 2004 Moore’s Law ° Gordon Moore: co-founder of Intel. ° Predicted that number of transistors per chip would grow exponentially (double every 18 months). ° Exponential improvement in technology is a natural trend: steam engines, dynamos, automobiles. Lecture 1: Course Introduction September 8, 2004 Moore’s Law plot Lecture 1: Course Introduction September 8, 2004 The cost of fabrication ° Current cost: $2-3 billion. ° Typical fab line occupies about 1 city block, employs a few hundred people. ° New fabrication processes require 6-8 month turnaround. ° Most profitable period is first 18 months-2 years. Lecture 1: Course Introduction September 8, 2004 Reconfigurable Hardware Logic Element A B C D A Out B C D = out • Each logic element operates on four one-bit inputs. • Output is one data bit. • Can perform any boolean function of four inputs 2 2 4 = 64K functions! Lecture 1: Course Introduction September 8, 2004 Field-Programmable Gate Array Tracks Logic Element LE LE LE LE LE LE LE LE LE LE LE LE • Each logic element outputs one data bit. • Interconnect programmable between elements. • Interconnect tracks grouped into channels. Lecture 1: Course Introduction September 8, 2004 FPGA Architecture Issues Logic Element • • • • Need to explore architectural issues. How much functionality should go in a logic element? How many routing tracks per channel? Switch “population”? Lecture 1: Course Introduction September 8, 2004 Real World Physical Issues S S Wires have real cost • • • • Modelling FPGA delay. Improving performance through buffering/segmentation. Technology dependent. The cost of reconfigurability. Lecture 1: Course Introduction September 8, 2004 Translating a Design to an FPGA C program . . C = A+B . Circuit A B + Array C • CAD to translate circuit from text description to physical implementation well understood. • CAD to translate from C program to circuit not well understood. • Very difficult for application designers to successfully write highperformance applications Need for design automation! Lecture 1: Course Introduction September 8, 2004 High-level Compilers • Difficult to estimate hardware resources. • Some parts of program more appropriate for processor (hardware/software codesign). • Compiler must parallelize computation across many resources. • Engineers like to write in C rather than pushing little blocks around. A C = A+B B + C Lecture 1: Course Introduction for (i = 0; i<n, i++) { . . } September 8, 2004 Design abstractions English Executable program Sequential machines function Logic gates Lecture 1: Course Introduction specification behavior Throughput, design time registertransfer Function units, clock cycles logic Literals, logic depth transistors circuit nanoseconds rectangles layout microns September 8, 2004 cost Circuit Compilation 1. Technology Mapping LUT 2. Placement LUT ? Assign a logical LUT to a physical location. 3. Routing Select wire segments And switches for Interconnection. Lecture 1: Course Introduction September 8, 2004 Two Bit Adder Made of Full Adders A B Co FA A+B = D Ci S Logic synthesis tool reduces circuit to SOP form S = ABCi + ABCi + ABCi + ABCi A B Ci LUT Co A B Ci LUT Co = ABCi + ABCi + ABCi + ABCi Lecture 1: Course Introduction September 8, 2004 S FPGA design ° FPGA manufacturer creates an FPGA fabric; system designer uses the fabric. ° FPGA fabric design issues: • Study sample user designs. • Select interconnect topology. • Create logic element structures. • Design circuits, layout. Lecture 1: Course Introduction September 8, 2004 Why do we care about layout? ° We won’t design layout. ° Layout determines: • Logic delay. • Interconnect delay. • Energy consumption. ° We want to understand sources of FPGA characteristics. Lecture 1: Course Introduction September 8, 2004 Design validation ° Must check at every step that errors haven’t been introduced-the longer an error remains, the more expensive it becomes to remove it. ° Forward checking: compare results of less- and more-abstract stages. ° Back annotation: copy performance numbers to earlier stages. Lecture 1: Course Introduction September 8, 2004 Processor + FPGA Three possibilities daughtercard Proc FPGA chip Backplane bus (e.g. PCI) 1. FPGA serves as coprocessor for data intensive applications – possible project. Proc FPGA chip 2. FPGA serves as embedded computer for low latency transfer. “Reconfigurable Functional Unit” Lecture 1: Course Introduction September 8, 2004 Processor + FPGA (cont..) 3. Processor integration Processor RF ALU FPGA • FPGA logic embedded inside processor. • A number of problems with 2 and 3. - Process technology an issue. - ALU much faster than FPGA generally. - FPGA much faster than the entire processor. Lecture 1: Course Introduction September 8, 2004 Multi-FPGA Systems F F F F F F F F F • Most applications don’t fit on one device. • Create need for partitioning designs across many devices. • Effectively a “netlist computer” Each FPGA is a logic processor interconnected in a given topology. Lecture 1: Course Introduction September 8, 2004 Dynamic Reconfiguration L L L L • What if I want to exchange part of the design in the device with another piece? • Need to create architectures and software to incrementally change designs. • Effectively a “configuration cache” Examples: encryption, filtering. Lecture 1: Course Introduction September 8, 2004 Research Areas • Storing configuration info inside device. • Architecture evaluation. - Size and performance tradeoff. • Layout of a new logic element. • Algorithm for place and route. • Apply an application to FPGA logic. Lecture 1: Course Introduction September 8, 2004 Versatile Place and Route • Written by Vaughn Betz at the University of Toronto • Performs FPGA placement and routing. • Written in C • Runs on Suns, Alphas, Linux • Estimates device sizes and performance. Lecture 1: Course Introduction September 8, 2004 Xilinx XC4000 Cell Lecture 1: Course Introduction • 2 4-input look-up tables • 1 3-input look-up table • 2 D flip flops September 8, 2004 Xilinx XC4000 Routing Lecture 1: Course Introduction 25 September 8, 2004 Altera Flex10K Lecture 1: Course Introduction September 8, 2004 Altera Flex10K Lecture 1: Course Introduction September 8, 2004 Xilinx Virtex-II Pro Lecture 1: Course Introduction September 8, 2004 Altera Stratix Lecture 1: Course Introduction September 8, 2004 Xilinx Virtex CLB Lecture 1: Course Introduction September 8, 2004 Embedded RAM ° Xilinx – Block SelectRAM • 18Kb dual-port RAM arranged in columns ° Altera – TriMatrix Dual-Port RAM • M512 – 512 x 1 • M4K – 4096 x 1 • M-RAM – 64K x 8 Lecture 1: Course Introduction September 8, 2004 Xilinx: Embedded Multipliers Lecture 1: Course Introduction September 8, 2004 Summary ° Reconfigurable computing relies heavily on new VLSI technology ° Device architectures maturing ° Application development progressing at rapid pace ° Integration of hardware and software a difficult challenge ° Active area of research at UMass. Lecture 1: Course Introduction September 8, 2004