Download Digital Image Processing - Electrical & Computer Engineering

Document related concepts
no text concepts found
Transcript
EE4OI4
Engineering Design
Programmable Logic Technology
Evolution of Silicon Chip
• We often measure the size of an IC by the number of logic
gates or the number of transistors that the IC contains.
• Example: 100k-gate IC: contains equivalent of 100,000 twoinput NAND gates.
• Small-scale integration (SSI) ICs: contains a few (1 to 10)
logic gates (often simple gates NANA, NOT, AND)
• Medium-scale integration (MSI): increased the range to
counters and similar larger scale logic functions
• Large-scale integration (LSI): packed even larger logic
functions such as the first microprocessor into a single chip
• Very large scale integration (VLSI): 64 bit microprocessors
with cache memory and floating point arithmetic units (over a
million transistor on a single silicon)
2
Evolution of Silicon Chip
• Some digital logic ICs are standard parts.
• These ICs can be selected from catalog and data books and
bought and used in different systems
• With the advent of VLSI in the 1980s engineers began to
realize the advantages of designing an IC that was customized
or tailored to a particular system or application rather than
using standard ICs.
3
Digital logic technology
• ICs are made on a thin silicon wafer
• The transistors and wiring are made from many layers
(between 10 to 15) built on top of one another
• The first half-dozen or so layers define the logic cells (AND ,
OR, Flip-flop). The last half-dozen or so define the wires
between the logic cells (mask layer or interconnect)
4
Digital logic technologies
Digital
Logic
Standard
Logic
TTL
74xx
CMOS
4xxx
Progammable
Logic (FPLDs)
PLDs
FPGAs
Full
Custom
ASICs
Microproce ssor
& RAM
CPLDs
Gate
Arrays
Standard
Cell
Digital logic technologies.
5
Digital logic technology
• In a full-custom IC some (or all) logic cells are customized
and all the mask layers are also customized
• Example: a microprocessor is a full custom
• The designer does not use pre-tested, pre-characterized cells
• Why?
– No suitable entity cell library available (not fast enough, not small
enough, consumes too much power) or no cell library is available (new
application)
• Full custom ICs are the most expensive to design and
manufacture
• Design time is long
• Fewer and fewer full-custom ICs are being designed because
of the above problems
6
Digital logic technology
• Traditional integrated circuits chips: perform a fixed operation
defined by device manufacturer
• Internal functional operation is defined by user:
– Application Specific Integrated Circuits (ASIC)
– Field Programmable Programmable Logic Devices (FPLD)
7
Digital logic technology
• ASIC:
– Gate arrays
– Standard cells
• Gate array: an array of pre-manufactured logic cells
– A final manufacturing step is required to interconnect the logic cells in
a pattern created by the designer to implement a particular design
• Standard cell: no fixed internal structure
– The manufacturer builds the chip based on the user’s selection of
devices from the manufacturer’s standard cell library
8
Digital logic technology
• Programmable logic devices (PLDs) are standard ICs that
may be configured or programmed to create a part customized
to a specific application
• Features:
– No customized layers or cells
– Fast design time
9
Digital logic technology
Full custom
Semi-custom
Programmable
Customized
Pre-designed
Pre-designed
Programmed
by the user
Customized
Pre-designed
Programmed
by the user
Logic cell
Mask Layers
Customized
10
Ful Custom
V
LS
I Design
S
peed,
Densi ty,
Complexi ty,
Market
V
olume
needed for
P
roduct
A
S
ICs
CP
LDs
FP
GA
s
P
LDs
E
ngn
i eering Cost, Tm
i e to Deveo
lpP
roduct
Digital logic technology tradeoffs.
11
Programmable Logic Technology
• Simple programmable logic devices (PLDs) such as
programmable logic array (PLA) and programmable array
logic (PAL) have been in use for over 20 years.
• PLA: the idea is that logic functions can be realized in sum-of
products form
12
x1 x2
xn
Input buffers
and
inverters
x1 x1
xn xn
P1
OR plane
AND plane
Pk
General structure of a PLA
f1
fm
13
x1
x2
x3
Programmable
connections
OR plane
P1
P2
P3
P4
AND plane
Gate-level diagram of a PLA
f1
f2
14
x1
x2
x3
OR plane
P1
P2
P3
P4
AND plane
Customary schematic of a PLA
f1
f2
15
x1
x2
x3
P1
f1
P2
P3
f2
P4
AND plane
An example of a PLA
16
Programmable Logic Technology
• Programmable connections (switches) are difficult to fabricate
and reduce the speed of circuit
• In PALs the AND plane is programmable but the OR plane is
fixed.
• To compensate for reduced flexibility, PALs are manufactured
in a range
17
Programmable Logic Technology
• On many PLAs and PALs the output of the OR gate is
connected to a flip flop whose output can then be feedback as
an input into the AND gate array.
• This way simple state machines are implemented
18
Select
Enable
f1
Flip-flop
D
Q
Clock
To AND plane
Output circuitry
19
FPLD
• CPLDs and FPGAs are the highest density and most advanced
programmable logic devices.
• These devices are collectively called field programmable
logic devices (FPLD).
• Characteristics:
– None of the mask layers are customized
– The core is a regular array of programmable logic cells that can
implement combinational as well as sequential logic
– A matrix of programmable interconnect surrounds the basic logic cells
– Programmable I/O cells
• For all but the most time critical design applications, CPLDs
and FPGAs have adequate speed (clock range 50-400 MHz)
20
FPLD
• CPLDs and FPGAs typically contain multiple copies of a
basic programmable logic element (LE) or logic cell (LC).
• Logic element: can implement a network of several logic
gates that feed into 1 or 2 flip-flops
• Logic elements are arranged in a column or matrix on the chip
21
FPLD
• To perform complex operations, logic elements are connected
using a programmable interconnection network
• Interconnection network contains row and/or column chipwide interconnections.
• Interconnection network often contains shorter and faster
programmable interconnects limited only to neighboring logic
elements
22
FPLD
•
1.
2.
3.
FPLDs contain:
Programmable logic cells
Programmable interconnection
Programmable I/O cells
23
I/O block
Cell
I/O block
Cell
Cell
Cell
I/O block
I/O block
Interconnection wires
Structure of a CPLD
24
PAL-like block (details not shown)
PAL-like block
D Q
D Q
A section of a CPLD
D Q
25
Interconnection switches
I/O block
I/O block
I/O block
Logic block
I/O block
Structure of an FPGA
26
FPLD
• In large FPLDs the clock arrives at different times at different
flip flops if it is routed through the chip like a normal signal
• The situation in which the clock signal arrives at different
times at different flip flops is known as clock skew.
• Clock signals in large FPLDs are normally distributed using
an internal high speed bus (global clock line)
• Using global clock line, clock is distributed to all flip-flops in
the device at the same time.
27
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
ff
Clock
Figure 10.44
An H tree clock distribution network
28
UP2
29
UP3
30
Altera MAX7000
• MAX7000 is a CPLD family with 600 to 20000 gates.
• Configured by an internal electrically erasable programmable
read only memory (EEPROM)
• Configuration is retained when power is removed
• The 7000 family contains from 32 to 256 macrocells.
• An individual macrocell contains five programmable AND
gates.
• The AND/OR network is designed to implement Boolean
equations expressed in sum-of-product form.
31
LAB Local Array
Parallel Logic
Expanders
(from other
macrocells)
Global
Clear
This respresents a
multiplexer
controlled by the
configuration
program
Global
Clock
Programmable
Register
PRN
ProductTerm
Select
Matrix
D
Clock/
Enable
Select
To I/O
Control
Block
Q
ENA
CLRN
Clear VCC
Select
Shared Logic
Expanders
36 Signals
from PIA
To PIA
16 Expander
Product
MAX 7000 macrocell.
32
Altera MAX7000
• Macrocells are combined into groups of 16 called logic array
block (LAB)
• Input to the AND gates include product terms from other
macrocells in the same block or signals from the chip-wide
programmable interconnect array (PIA)
33
Altera MAX7000
• Each I/O pins contains a programmable tri-state output buffer.
• An I/O pin can be programmed as input, output, output with a
tri-state driver and tri-state bi-directional.
34
Altera MAX7000
•
If more than five product terms are required, additional
product terms are generated using the following methods:
1. Parallel expander: product terms can be shared between
macrocells. A macrocell can borrow up to 15 product terms
from its neighbors
2. Shared expander: one of the product terms in a macrocell is
inverted and fed back to the shared pool of product term.
• The inputs to this product term are used in complement form
and using DeMorgan’s theorem a sum term is produced.
• Since there are 16 macrocells in an LAB, shared logic
expander pool has up to 16 terms
35
Input/GCLK1
Input/OE2/GCLK2
Input/OE1
Input/GCLRn
6 Output
6-16
I/O Pins
6 Output
6- LAB A
16
I/O
Macrocells
Control
1-16
Block 6-
3
16
Macrocells
17-32
1
616
I/O
Control
6- Block
6-16
I/O
Control
Block 6-
LAB C
Macrocells
33-48
3
16
6-16
3
1
6-16
6-16
I/O Pins
6
PIA
6-
6
3
6-16
6
6-16
I/O Pins
LAB B
LAB D
6-
Macrocells
49-64
I/O
Control
6- Block
6-16
I/O Pins
6
MAX 7000 CPLD architecture.
36
FLEX 10K
• Flex 10K: an FPGA family with 10,000 to 250,000 gates.
• Configured by loading internal static random access memory
(SRAM).
• The configuration is lost whenever power is removed
• Gate logic is implemented using a look-up table (LUT)
37
FLEX 10K
• LUT is a high-speed 16 by 1 SRAM.
• Four inputs are used to address the LUT’s memory
• The truth table for the desired gate network is loaded into the
LUT’s SRAM.
• A single LUT can model any network of gates with 4 inputs
and one output.
38
A
B
C
D
4 Input
LUT
(16 x 1 RAM)
F
A
B
F
C
A
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
RAM Contents
Address
Data
B C D
F
0 0 0
0
0 0 1
0
0 1 0
1
0 1 1
0
1 0 0
0
1 0 1
0
1 1 0
1
1 1 1
0
0 0 0
0
0 0 1
0
0 1 0
1
0 1 1
0
1 0 0
1
1 0 1
1
1 1 0
1
1 1 1
1
D
Using a lookup table (LUT) to model a gate network.
39
DATA1
DATA2
DATA3
DATA4
Look-Up
Table
(LUT)
Carry
In
Cascade
Register Bypass
In
Carry
Chain
Cascade
Chain
Programmable Register
PRN
D
Q
To FastTrack
Interconnect
ENA
CLRN
To LAB Local
Interconnect
LABCTRL1
LABCTRL2
Clear/Preset
Logic
Chip-Wide
Reset
Clock Select
LABCTRL3
LABCTRL4
Carry
Out
Cascade
Out
FLEX 10K Logic Element (LE).
40
FLEX 10K
• Two dedicated high speed paths are provided in FLEX 10K:
carry chain and cascade chain
• They both connect adjacent LEs without using general
purpose interconnect path
• Carry chain: supports high speed adders and counters (carry
forward function between LEs)
• Cascade chain can implement functions with a more than 4
inputs.
• Adjacent LUTs compute portions of the function in parallel
and the cascade chain serially connects the intermediate
values
• Cascade chain uses logic AND or OR to connect the outputs
of adjacent LEs.
41
Carry chain
42
Cascade chain
43
Dedicated Inputs &
Global Signals
Row Interconnect
6
4
4
Logic
Block
Array
(LAB)
16
Carry-In and
Cascade-In
2
4
LE1
4
LE2
4
LE3
4
LE4
4
LE5
4
LE6
4
LE7
4
LE8
8
2
8
24
4
Column-to-Row
Interconnect
16
8
Carry-Out and Cascade-Out
FLEX 10K Logic Array Block (LAB).
44
FLEX 10K CPLD architecture.
45
FLEX 10K
• The chip also contains embedded array blocks (EAB).
• EABs are SRAM blocks that can be configured to provide
memory blocks of various aspect ratios.
• An EAB contains 2048 SRAM cells which can be used to
provide memory blocks with a range of aspect ratios: 256x8,
512x4, 1024x2, 2048x1.
46
FLEX 10K
47
Cyclone
• Cyclone: Configured by loading internal static random access
memory (SRAM).
• The configuration is lost whenever power is removed
• Cyclone’s logic array consists of LABs, with 10 Logic
Elements (LEs) in each LAB.
• An LE is a small unit of logic providing efficient
implementation of user logic functions
• Cyclone had between 2,910 to 20,060 LEs
48
Cyclone
• RAM blocks are embedded in Cyclone devices
• These blocks are dual-port memory blocks with 4K bits of
memory plus parity (4,608)
• These blocks provide dual-port or single port memory from 1
to 36 bits wide at up to 200 MHz.
• These blocks are grouped into columns across the device in
between certain LABs
• The Cyclone EP1C6 and EP1C12 contain 92 and 239K bits of
embedded RAM
49
Cyclone
50
Cyclone
51
Cyclone
Register Chain
Routing from
Previous LE
LAB-Wide
Synchronous
Load LAB-Wide
Synchronous
Clear
LAB Carry In
Carry In1
Carry In0
Register Bypass
Programmable
Register
Packed
Registered
Select
Addnsub
DATA1
DATA2
DATA3
Look-Up
Carry
Table
Chain
(LUT)
Synchronous
Load and
Clear Logic
D
LUT Chain
Routing to
Next LE
Q
Row, Column,
and Direct
Link Routing
Register
Feedback
Row, Column,
and Direct
Link Routing
PRNALD
ADATA
DATA4
ENA
CLRN
LABCTRL1
LABCTRL2
LABPRE/ALOAD
Local Routing
Asynchronous
Clear/Preset/
Load Logic
Register Chain
Output
Chip-Wide Reset
Clock & Clock
Enable Select
LABCTRL1
LABCTRL2
LABCLKENA1
LABCLKENA2
LAB Carry In
Carry In1
Carry In0
52
Cyclone
•
•
•
•
Gate logic is implemented using a look-up table (LUT)
The LUT is a high-speed 16 by 1 SRAM
Four inputs are used to address the LUT’s memory
The truth table for the desired gate network is loaded into the
LUT’s SRAM during programming
53
Cyclone
• The output of LUT can be fed into a D flip-flop and then to
the interconnection network.
• More complex gate networks require interconnection with
neighboring logic elements.
• A logic array block (LAB) is composed of ten logic elements
(LE)
• Both programmable local LAB and chip-wide row and
column interconnects are available
• Carry chain are also provided to support faster addition
operation
54
Cyclone
Row Interconnect
Direct Link
Interconnect
from
Adjacent
Block
Direct Link
Interconnect
from
Adjacent
Block
Direct Link
Interconnect
to Adjacent
Block
Direct Link
Interconnect
to Adjacent
Block
Local
Interconnect
LAB
Local
Interconnect
LAB
55