DSP Lecture Series
DSP Memory Architecture
Dr. E.W. Hu
Nov. 28, 2000
Computer Architecture and VLSI
Technology Pioneer: Lynn Conway
In the 1960s, while working at IBM, Lynn Conway conceived the idea of multi-issue processors, the forerunner of today's VLIW processors.
Fixed-point DSP datapath
What is memory architecture?
- The organization of memory and its interconnection with the processor's datapath is called the memory architecture.
- Memory architecture determines the memory bandwidth, which is a critical factor affecting the performance of a DSP.
Memory bandwidth
- In general, bandwidth is defined as the rate at which words can be written to (stored in) or read from memory.
- For a DSP, it is convenient to think of how many instruction cycles are needed to complete a read or write operation. All else being equal, the fewer instruction cycles per access, the higher the bandwidth.
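A minimal formula sketch (the symbols and the 100 MHz figure below are ours, for illustration only):

    \text{bandwidth (words/s)} = \frac{f_{\text{clk}}}{\text{cycles per access}}

For example, at a hypothetical 100 MHz instruction rate, a memory that completes one word access per instruction cycle sustains 100 Mwords/s; if each access takes two cycles, the sustained bandwidth halves to 50 Mwords/s.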
Why DSP applications need large memory bandwidth
- A high-performance datapath is only part of a high-performance processor.
- DSP applications are typically computation-intensive, requiring large amounts of data to be moved quickly to and from memory (between the datapath(s) and the memory module(s)), as described in the next slide.
Typical DSP applications: the FIR or finite impulse response filter
At each “tap”, four memory accesses are needed for an FIR application (see the sketch after this list):
- Fetch the MAC instruction from memory
- Read the data value from memory (a “sample” from the signal)
- Read the appropriate coefficient from memory (a known constant for a particular filter)
- Write the data value to memory (the next location in the delay line)
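A minimal C sketch of one FIR output computation (the 4-tap moving-average filter and all names here are illustrative, not from the slides); the comments map the loop's memory traffic to the four accesses listed above:

    #include <stdio.h>

    #define NTAPS 4

    /* One output sample of an FIR filter: y = sum over k of h[k] * delay[k].
     * Per tap, the processor performs one instruction fetch (the MAC itself),
     * one data read (the sample), one coefficient read, and one data write
     * (advancing the delay line). */
    float fir_step(float x, const float h[NTAPS], float delay[NTAPS])
    {
        float y = 0.0f;
        for (int k = NTAPS - 1; k > 0; k--)   /* delay-line write per tap */
            delay[k] = delay[k - 1];
        delay[0] = x;
        for (int k = 0; k < NTAPS; k++)       /* sample + coefficient read per tap */
            y += h[k] * delay[k];
        return y;
    }

    int main(void)
    {
        const float h[NTAPS] = {0.25f, 0.25f, 0.25f, 0.25f};  /* moving average */
        float delay[NTAPS] = {0};
        const float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        for (int n = 0; n < 8; n++)
            printf("y[%d] = %.2f\n", n, fir_step(x[n], h, delay));
        return 0;
    }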
The Von Neumann architecture for general-purpose processors
The Harvard architecture: design basis for most DSPs; two memory accesses per cycle
Variations of the Harvard architecture allow still more memory accesses per instruction cycle
Typical DSPs with two or three independent memory banks
- Analog Devices ADSP-21xx
- AT&T DSP16xx
- Zilog Z893xx
- Motorola DSP5600x, DSP563xx, DSP96002
Other approaches to achieve multiple memory accesses per cycle
Examples of some other approaches:
- Multiple, sequential accesses per instruction cycle over a single set of buses (meaning each access takes less than one instruction cycle), e.g., Zoran ZR3800.
- Multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses (Fig. 5.4), e.g., AT&T DSP32xx.
- Allowing a read and a write to proceed at the same time under restricted circumstances, e.g., AT&T DSP16xx.
Using cache memory to reduce memory accesses
- An on-chip program cache reduces memory accesses.
- There are many different implementations of program caches:
  - A single-instruction repeat buffer
  - A multiple-instruction cache (e.g., one that stores a block of 16 instructions)
  - A single-sector instruction cache that stores some number of the most recently used instructions
Using the “modulo addressing” technique to reduce memory accesses
- To be discussed in the next seminar: memory addressing modes. A brief preview sketch follows.
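As a minimal preview in C (ours, not from the slides): modulo addressing lets an address register wrap around a circular buffer automatically, so a filter's delay line can be updated in place instead of being shifted, saving one memory write per tap. In C the wrap must be written out explicitly:

    #include <stdio.h>

    #define BUFLEN 4

    /* Software emulation of modulo (circular) addressing: 'head' wraps
     * around the buffer, so the newest sample simply overwrites the
     * oldest one and no data is physically shifted. */
    int main(void)
    {
        float buf[BUFLEN] = {0};
        int head = 0;
        const float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};

        for (int n = 0; n < 8; n++) {
            buf[head] = x[n];               /* one write, no shifting */
            printf("x[%d] stored at buf[%d]\n", n, head);
            head = (head + 1) % BUFLEN;     /* the wrap a DSP does in hardware */
        }
        return 0;
    }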
Using “algorithmic approaches” to reduce memory accesses
- Algorithms can be structured to exploit data locality and thereby reduce memory accesses.
- DSP algorithms that operate on blocks of input data often fetch the same data from memory multiple times during execution, as in the case of FIR filter computation.
- In the example that follows, the filter operates on a block of two input samples. Instead of computing output samples one at a time, the filter computes two output samples at a time, allowing it to reuse previously fetched data. This effectively reduces the required memory bandwidth from one instruction fetch and two data fetches per instruction cycle to one instruction fetch and one data fetch per instruction cycle.
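A minimal C sketch of this block-processing idea (function and variable names are ours): each coefficient is fetched once and applied to two output samples, and the sample fetched for y[n] is reused for y[n+1] on the next tap:

    /* Compute two FIR outputs per pass; 'x' must point at least
     * ntaps-1 samples into a valid buffer so x[n-k] has history. */
    void fir_block2(const float *x, const float *h, float *y,
                    int ntaps, int nout /* must be even */)
    {
        for (int n = 0; n < nout; n += 2) {
            float y0 = 0.0f, y1 = 0.0f;
            float xnext = x[n + 1];          /* newest sample, held in a register */
            for (int k = 0; k < ntaps; k++) {
                float c = h[k];              /* one coefficient fetch per tap ... */
                float xcur = x[n - k];       /* ... and one data fetch per tap    */
                y1 += c * xnext;             /* reuses the previous tap's sample  */
                y0 += c * xcur;
                xnext = xcur;                /* x[n-k] == x[(n+1)-(k+1)]          */
            }
            y[n] = y0;
            y[n + 1] = y1;
        }
    }

Per pair of outputs, the inner loop performs ntaps coefficient fetches and ntaps + 1 sample fetches, versus 2 × ntaps of each when the two outputs are computed independently.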
Illustration of algorithmic approach
Memory wait states
- Wait states are states in which the processor cannot execute its program because it is waiting for access to memory, due to, for example:
  - Slow memory
  - Bus sharing
On-chip ROM for low-cost embedded applications
- On-chip ROM (usually small, 256 to 36K words) is used to store small application programs and constant data for low-cost embedded applications.
External memory interfaces
External memory interfaces: manual caching
- If a section of often-used program code is stored in a slow, off-chip memory, it is the programmer's responsibility to move that code to faster on-chip RAM, either at system start-up or when that section of the program is needed.
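A minimal C sketch of this manual-caching step (the symbols below are hypothetical linker-defined addresses, not from the slides): at start-up the code image is copied from slow external memory into on-chip RAM, and the on-chip copy is called from then on:

    #include <string.h>

    /* Hypothetical linker-defined symbols: the hot routine's image in
     * slow external memory, and its destination in fast on-chip RAM. */
    extern const unsigned char hot_code_start[];
    extern const unsigned char hot_code_end[];
    extern unsigned char onchip_ram[];

    typedef int (*hot_fn)(int);

    /* Copy the often-used code section on-chip once, at system start-up,
     * then return a pointer for calling the fast copy. (Casting a data
     * pointer to a function pointer is an embedded-platform idiom, not
     * portable standard C.) */
    hot_fn manual_cache_init(void)
    {
        memcpy(onchip_ram, hot_code_start,
               (size_t)(hot_code_end - hot_code_start));
        return (hot_fn)(void *)onchip_ram;
    }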
Dynamic memory
- Most DSPs use static RAM, which is faster and easier to interface, but more expensive.
- For a low-cost, high-volume product, the designer might need to consider dynamic RAM, especially static-column DRAM.
Direct memory access
- DMA allows data transfers to take place (to or from the processor's memory) without the involvement of the processor itself.
- It is typically used to improve I/O performance.
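A minimal ping-pong (double-buffer) C sketch of a common DMA usage pattern (dma_start, dma_wait, and process_block are hypothetical, not a real driver API): the DMA controller fills one buffer while the processor works on the other:

    #include <stddef.h>

    #define BLOCK 256

    /* Hypothetical driver calls: start a background transfer from an
     * I/O device into 'dst', and block until the last transfer ends. */
    void dma_start(float *dst, size_t nwords);
    void dma_wait(void);
    void process_block(const float *x, size_t nwords);  /* the DSP work */

    void dma_ping_pong_loop(void)
    {
        static float bufA[BLOCK], bufB[BLOCK];
        float *fill = bufA, *work = bufB;

        dma_start(fill, BLOCK);         /* prime the first transfer */
        for (;;) {
            dma_wait();                 /* 'fill' is now complete */
            float *tmp = work;          /* swap buffer roles */
            work = fill;
            fill = tmp;
            dma_start(fill, BLOCK);     /* DMA refills in the background ... */
            process_block(work, BLOCK); /* ... while the datapath computes  */
        }
    }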
Customization
- Some vendors are flexible enough to customize their chip designs for their customers (customizable DSPs).