CLOCKLESS CHIPS
1. Introduction:
Over the years, the designers of microprocessors have resorted to all sorts of tricks to
make their products run faster. Modern chips, for example, queue up several
instructions in a “pipeline” and analyze them to see if switching the order in which
they are executed can produce the correct result, only more quickly.
After a point, cranking up the clock speed becomes an exercise in diminishing returns.
That's why a one-gigahertz chip doesn't run twice as fast as a 500-megahertz chip.
The clock, through the work it must do to coordinate millions of transistors on a chip,
generates its own overhead. The faster the clock, the greater the overhead becomes.
The clock in a state-of-the-art microprocessor can consume up to 30 percent of the chip's
computing capability, with that percentage increasing at an ever faster rate as clock
speeds increase.
Faced with these diminishing returns, chip designers are dusting off two
technologies, multi-threading and asynchronous logic, that were both
invented decades ago. At the time, neither was competitive with conventional designs,
but important uses have since emerged for each of them. Multi-threading can increase
the performance of database- and web-servers, while asynchronous logic is ideal for
wireless devices and smart cards.
The term asynchronous logic is used to describe a variety of design styles, which use
different assumptions about circuit properties. These vary from the bundled delay
model, which uses conventional data processing elements with completion
indicated by a logically generated delay model, to delay-insensitive design, where
arbitrary delay through circuit elements can be accommodated.
BKEC, BASAVAKALYAN
Dept Of CSE
2. Problems with Synchronous Approach:
The synchronous approach predominated, largely because it is easier to design chips
in which things happen only when the clock ticks.
As chips get bigger, faster and more complicated, distributing the clock signal around
the chip becomes harder. Another drawback with clocked designs is that they waste a
lot of energy, since even inactive parts of the chip have to respond to every clock tick.
Clocked chips also produce electromagnetic emissions at their clock frequency, which
can cause radio interference.
Each tick must be long enough for signals to traverse even a chip’s longest wires in
one cycle. However, the tasks performed on parts of a chip that are close together
finish well before the cycle ends but can’t move on until the next tick. As chips get bigger
and more complex, it becomes more difficult for ticks to reach all elements,
particularly as clocks get faster.
In today's chips, the clock remains the key part of the action. As a microprocessor
performs a given operation, electronic signals travel along microscopic strips of metal,
forking, intersecting again, and encountering logic gates, until they finally deposit the
results of the computation in a temporary memory bank called a register.
Let's say you want to multiply 4 by 6. If you could slow down the chip and peek into
the register as this calculation was being completed, you might see the value changing
many times, say, from 4 to 12 to 8, before finally settling down into the correct
answer. That's because the signals transmitted to perform the operation travel along
many different paths before arriving at the register; only after all signals have
completed their journey is the correct value assured. The role of the clock is to
guarantee that the answer will be ready at a given time. The chip is designed so that
even the slowest path through the circuit (the path with the longest wires and the most
gates) is guaranteed to reach the register within a single clock tick.
The chip’s clock is an oscillating crystal that vibrates at a regular frequency,
depending on the voltage applied. This frequency is measured in gigahertz or
megahertz. All the chip’s work is synchronized via the clock, which sends its signals
out along all circuits and controls the registers, the data flow, and the order in which
the processor performs the necessary tasks. An advantage of synchronous chips is that
the order in which signals arrive doesn’t matter. Signals can arrive at different times,
but the register waits until the next clock tick before capturing them. As long as they
all arrive before the next tick, the system can process them in the proper order.
Designers thus don’t have to worry about related issues, such as wire lengths, when
working on chips. And it is easier to determine the maximum performance of a
clocked system. With these systems, calculating performance simply involves
counting the number of clock cycles needed to complete an operation.
Calculating performance is less well defined with asynchronous designs.
The clocks themselves consume power and produce heat. In addition, in synchronous designs,
registers use energy to switch so that they are ready to receive new data whenever the
clock ticks, whether they have inputs to process or not. In asynchronous designs, gates
switch only when they have inputs.
The job of coordinating tens of millions of transistors at a billion ticks per second
requires the consumption of a lot of energy, most of which ends up as heat. Patrick
Gelsinger, chief technology officer at Intel, referred to the problem in his keynote
speech at the International Solid-State Circuits Conference last February. Gelsinger
was only half-joking when he said that if microprocessors continue to be run by
ever-faster clocks, then by 2005 a chip will run as hot as a nuclear reactor.
Throwing out the clock changes the fundamental way that chips have organized and
executed their work. For instance, within every one-gigahertz microprocessor, there
lies an oscillating crystal ticking one billion times a second. Engineers are trained to
design chips where their first consideration is getting work done before the next
clock-tick comes around. For most chip designers, throwing out the clock is difficult
to imagine.
3. Asynchronous logic circuits (Stop the clocks):
As its name suggests, asynchronous logic does away with the cardinal rule of chip design: that
everything marches to the beat of an oscillating crystal “clock”. For a 1GHz chip, this
clock ticks one billion times a second, and all of the chip’s processing units
coordinate their actions with these ticks to ensure that they remain in step.
Asynchronous, or “clockless”, designs, in contrast, allow different bits of a chip to
work at different speeds, sending data to and from each other as and when
appropriate.
Clockless processors, also called asynchronous or self-timed, don’t use the oscillating
crystal that serves as the regularly “ticking” clock that paces the work done by
traditional synchronous processors. Rather than waiting for a clock tick,
clockless-chip elements hand off the results of their work as soon as they are finished.
Figure 1. Cycle Time of Clocked Logic and Clockless Logic
4. How clockless chips work:
There are no purely asynchronous chips yet. Instead, today’s clockless processors are
actually clocked processors with asynchronous elements. Clockless elements use
perfect clock gating, in which circuits operate only when they have work to do, not
whenever a clock ticks. Instead of clock-based synchronization, local handshaking
controls the passing of data between logic modules. The asynchronous processor
places the location of the stored data it wants to read onto the address bus and issues a
request for the information. The memory reads the address off the bus, finds the
information, and places it on the data bus. The memory then acknowledges that it has
read the data. Finally, the processor grabs the information from the data bus.
According to Jorgenson, “Data arrives at any rate and leaves at any rate. When the
arrival rate exceeds the departure rate, the circuit stalls the input until the output
catches up.”
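The read sequence described above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not a model of any particular chip: the `AsyncMemory` class, the `handshake_read` function and the wire names `req` and `ack` are hypothetical, and the final return-to-idle steps assume a four-phase protocol, which the text does not specify.

```python
class AsyncMemory:
    """Hypothetical memory that answers requests through a req/ack handshake."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.ack = 0          # acknowledge wire driven by the memory
        self.data_bus = None  # value the memory places on the data bus

    def on_request(self, address):
        # The memory reads the address, finds the data, drives the data bus,
        # and only then raises acknowledge.
        self.data_bus = self.contents[address]
        self.ack = 1

    def on_release(self):
        # Return-to-idle phase (assumed four-phase protocol).
        self.data_bus = None
        self.ack = 0


def handshake_read(memory, address):
    """Perform one read and return (data, trace of wire events)."""
    trace = []
    req = 1                          # processor issues the request
    trace.append(("req", req))
    memory.on_request(address)
    trace.append(("ack", memory.ack))
    if memory.ack != 1:
        raise RuntimeError("no acknowledge: the processor must keep waiting")
    data = memory.data_bus           # processor grabs the data from the bus
    req = 0                          # processor withdraws the request
    trace.append(("req", req))
    memory.on_release()
    trace.append(("ack", memory.ack))
    return data, trace


memory = AsyncMemory({0x10: 42})
data, trace = handshake_read(memory, 0x10)
print(data, trace)
```

No step in the trace depends on a clock edge: each side moves on only after it has seen the other side's signal change.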
The many handshakes themselves require more power than a clock’s operations.
However, clockless systems more than offset this because, unlike synchronous chips,
each circuit uses power only when it performs work.
5. Clockless advantages:
In synchronous designs, the data moves on every clock edge, causing voltage spikes.
In clockless chips, data doesn’t all move at the same time, which spreads out current
flow, thereby minimizing the strength and frequency of spikes and emitting less EMI.
Less EMI reduces both noise-related errors within circuits and interference with
nearby devices.
5.1 Power efficiency, responsiveness, and robustness:
Because asynchronous chips have no clock and each circuit powers up only when
used, asynchronous processors use less energy than synchronous chips by providing
only the voltage necessary for a particular operation.
According to Jorgenson, clockless chips are particularly energy-efficient for running
video, audio, and other streaming applications — data-intensive programs that
frequently cause synchronous processors to use considerable power. Streaming data
applications have frequent periods of dead time — such as when there is no sound or
when video frames change very little from their immediate predecessors — and little
need for running error-correction logic. During this inactive time, asynchronous
processors don’t use much power. Clockless processors activate only the circuits
needed to handle data, thus they leave unused circuits ready to respond quickly to
other demands. Asynchronous chips run cooler and have fewer and lower voltage
spikes. Therefore, they are less likely to experience temperature-related problems and
are more robust. Because they use handshaking, clockless chips give data time to
arrive and stabilize before circuits pass it on. This contributes to reliability because it
avoids the rushed data handling that central clocks sometimes necessitate, according
to University of Manchester Professor Steve Furber, who runs the Amulet project.
5.2 Simple, efficient design:
Logic modules can be developed without regard to compatibility with a central
clock frequency, which makes the design process easier. Also, because asynchronous
processors don’t need specially designed modules that all work at the same clock
frequency, they can use standard components. This enables simpler, faster design and
assembly.
However, the recent use of both domino logic and the delay-insensitive mode in
asynchronous processors has created a fast approach known as integrated pipelines
mode.
Domino logic improves performance because a system can evaluate several lines of
data at a time in one cycle, as opposed to the typical approach of handling one line in
each cycle. Domino logic is also efficient because it acts only on data that has
changed during processing, rather than acting on all data throughout the process. The
delay-insensitive mode allows an arbitrary time delay for logic blocks. “Registers
communicate at their fastest common speed. If one block is slow, the blocks that it
communicates with slow down,” said Jorgenson. This gives a system time to handle
and validate data before passing it along, thereby reducing errors.
6. Advantages of the Clockless chips:
A clocked chip can run no faster than its most slothful piece of logic; the answer isn't
guaranteed until every part completes its work. By contrast, the transistors on an
asynchronous chip can swap information independently, without needing to wait for
everything else. The result? Instead of the entire chip running at the speed of its
slowest components, it can run at the average speed of all components. At both Intel
and Sun, this approach has led to prototype chips that run two to three times faster
than comparable products using conventional circuitry.
Clockless chips draw power only when there is useful work to do, enabling a huge
savings in battery-driven devices; an asynchronous-chip-based pager marketed by
Philips Electronics, for example, runs almost twice as long as competitors' products,
which use conventional clocked chips.
Asynchronous chips use 10 percent to 50 percent less energy than synchronous chips,
in which the clocks are constantly drawing power. That makes them ideal for mobile
communications applications - which usually need low power sources - and the chips'
quiet nature also makes them more secure, as typical hacking techniques involve
listening to clock ticks.
Another advantage of clockless chips is that they give off very low levels of
electromagnetic noise. The faster the clock, the more difficult it is to prevent a device
from interfering with other devices; dispensing with the clock all but eliminates this
problem. The combination of low noise and low power consumption makes
asynchronous chips a natural choice for mobile devices. "The low-hanging fruit for
clockless chips will be in communications devices," starting with cell phones.
Asynchronous logic would offer better security than conventional chips: "The
clock is like a big signal that says, Okay, look now," says Fant. "It's like looking for
someone in a marching band. Asynchronous is more like a milling crowd. There's no
clear signal to watch. Potential hackers don't know where to begin."
Analyzing the power consumption for each clock tick can crack the encryption on
existing smart cards. This allows details of the chip’s inner workings to be deduced.
Such an attack would be far more difficult on a smartcard based on asynchronous
logic.
Asynchronous chips can perform encryption in a way that is harder to identify and to crack.
Improved encryption makes asynchronous circuits an obvious choice for smart
cards, the chip-endowed plastic cards beginning to be used for such
security-sensitive applications as storage of medical records, electronic funds exchange and
personal identification.
Ivan Sutherland of Sun Microsystems, who is regarded as the guru of the field,
believes that such chips will have twice the power of conventional designs, which will
make them ideal for use in high-performance computers. But Dr Furber suggests that
the most promising application for asynchronous chips may be in mobile wireless
devices and smart cards.
7. Different styles:
There are several styles of asynchronous design. Conventional chips represent the
zeroes and ones of binary digits (“bits”) using low and high voltages on a particular
wire.
One clockless approach, called “dual rail”, uses two wires for each bit. Sudden
voltage changes on one of the wires represent a zero, and on the other wire a one.
"Dual-rail" circuits use two wires giving the chip communications pathways, not only
to send bits, but also to send "handshake" signals to indicate when work has been
completed. Fant replaces the conventional system of digital logic with what he calls "null
convention logic," a scheme that identifies not only "yes" and "no," but also "no
answer yet", a convenient way for clockless chips to recognize when an operation
has not yet been completed.
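The dual-rail idea, with its extra "no answer yet" state, can be illustrated with a short sketch. It is only an illustration under assumptions: it models wire levels rather than the voltage transitions mentioned above, and the names `NULL`, `encode_bit`, `decode_bit` and `word_complete` are hypothetical, not taken from any real design kit.

```python
# Each bit uses two wires: (rail_for_zero, rail_for_one).
NULL = (0, 0)  # neither rail asserted: "no answer yet"


def encode_bit(bit):
    """Return the rail pair for one data bit."""
    return (0, 1) if bit else (1, 0)


def decode_bit(rails):
    """Return 0, 1, or None while the value has not arrived yet."""
    if rails == NULL:
        return None
    if rails == (1, 0):
        return 0
    if rails == (0, 1):
        return 1
    raise ValueError("both rails asserted: illegal dual-rail code")


def word_complete(word):
    """A word is complete once every bit pair has left the NULL state."""
    return all(pair != NULL for pair in word)


# A 3-bit word in which the middle bit has not been computed yet.
word = [encode_bit(1), NULL, encode_bit(0)]
print(word_complete(word))            # the receiver must keep waiting
word[1] = encode_bit(1)
print(word_complete(word), [decode_bit(p) for p in word])
```

Because the data itself announces when it is valid, the receiver needs no clock to know when to read it.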
Another approach is called “bundled data”. Low and high voltages on 32 wires are
used to represent 32 bits, and a change in voltage on a 33rd wire indicates when the
values on the other 32 wires are to be used.
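A bundled-data receiver can be sketched in the same spirit. The sketch assumes that every change of level on the 33rd wire marks new valid data (transition signalling); the class `BundledDataReceiver` and the helper `to_wires` are hypothetical names introduced for illustration.

```python
class BundledDataReceiver:
    """Samples the data wires only when the request wire changes level."""

    def __init__(self, width=32):
        self.width = width
        self.last_request = 0
        self.captured = []

    def observe(self, data_wires, request):
        if len(data_wires) != self.width:
            raise ValueError("unexpected number of data wires")
        if request == self.last_request:
            return None  # no transition: the data wires may still be changing
        self.last_request = request
        value = 0
        for i, level in enumerate(data_wires):  # wire i carries bit i
            value |= (level & 1) << i
        self.captured.append(value)
        return value


def to_wires(value, width=32):
    """Spread an integer over `width` wires, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


rx = BundledDataReceiver()
rx.observe(to_wires(5), 0)   # request wire unchanged: ignored
rx.observe(to_wires(5), 1)   # request wire toggled: 5 is captured
rx.observe(to_wires(7), 1)   # data changed but no toggle: ignored
rx.observe(to_wires(7), 0)   # toggled again: 7 is captured
print(rx.captured)
```

Compared with dual rail, this style needs only one extra wire per bundle, but it relies on the request wire changing only after the data wires have settled.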
8. Applications of Clockless Chips (more into technical details):
1. High performance.
2. Low power dissipation.
3. Low noise and low electro-magnetic emission.
4. A good match with heterogeneous system timing.
1. Asynchronous for High Performance:
In an asynchronous circuit the next computation step can start immediately after the
previous step has completed: there is no need to wait for a transition of the clock
signal. This leads, potentially, to a fundamental performance advantage for
asynchronous circuits, an advantage that increases with the variability in delays
associated with these computation steps. However, part of this advantage is canceled
by the overhead required to detect the completion of a step. Furthermore, it may be
difficult to translate local timing variability into a global system performance
advantage.
Data-dependent delays
The delay of the combinational logic circuit shown in Figure 1 depends on the current
state and the value of the primary inputs. The worst-case delay, plus some margin for
flip-flop delays and clock skew, is then a lower bound for the clock period of a
synchronous circuit. Thus, the actual delay is always less (and sometimes much less)
than the clock period.
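The relation between data-dependent delays and the clock period can be made concrete with a few lines of arithmetic. The delay values and the margin below are invented purely for illustration, and the function names are hypothetical.

```python
def clock_period_lower_bound(path_delays_ns, margin_ns):
    """Worst-case logic delay plus a margin for flip-flop delays and clock skew."""
    return max(path_delays_ns) + margin_ns


def average_idle_fraction(path_delays_ns, margin_ns):
    """Average share of each clock period in which the logic has already finished."""
    period = clock_period_lower_bound(path_delays_ns, margin_ns)
    mean_delay = sum(path_delays_ns) / len(path_delays_ns)
    return 1 - mean_delay / period


# Illustrative delays (in ns) observed for three different input patterns.
delays = [1.0, 2.0, 4.0]
period = clock_period_lower_bound(delays, margin_ns=1.0)
idle = average_idle_fraction(delays, margin_ns=1.0)
print(period, round(idle, 3))
```

With these made-up numbers the clock period must be at least 5 ns, although the logic itself needs only about 2.3 ns on average; that gap is what an asynchronous circuit tries to exploit.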
A simple example is an N-bit ripple-carry adder (Figure 2). The worst-case delay
occurs when 1 is added to 2^N - 1. Then the carry ripples from FA_1 to FA_N. In the best
case there is no carry ripple at all, as, for example, when adding 1 to 0. Assuming
random inputs, the average length of the longest carry-propagation chain is bounded
by log2(N). For a 32-bit wide ripple-carry adder the average length is therefore at most 5,
but the clock period must be about 6 times longer to cover the 32-stage worst case. On
the other hand, the average length determines the average-case delay of an
asynchronous ripple-carry adder, which we consider next.
In an asynchronous circuit this variation in delays can be exploited by
detecting the actual completion of the addition. Most practical solutions use dual-rail
encoding of the carry signal (Figure 2(b)); the addition has completed when all
internal carry signals have been computed, that is, when each pair (cf_i, ct_i) has made
a monotonic transition from (0, 0) to (0, 1) (carry = false) or to (1, 0) (carry = true).
Dual-rail encoding of the carry signal has also been applied to a carry bypass adder.
When inputs and outputs are dual-rail encoded as well, the completion can be
observed from the outputs of the adder.
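The gap between the worst-case and the average carry chain can be checked with a short experiment. This is a sketch under one assumption: the chain length is counted as the number of full-adder stages a single carry passes through, starting at the stage that generates it. The function names are hypothetical.

```python
import math
import random


def longest_carry_chain(a, b, n_bits):
    """Length of the longest run of stages that one carry ripples through."""
    longest = 0
    current = 0  # length of the chain currently rippling, 0 if there is none
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if a_i & b_i:        # generate: a new carry starts at this stage
            current = 1
        elif a_i ^ b_i:      # propagate: an incoming carry moves one stage on
            current = current + 1 if current else 0
        else:                # kill: any incoming carry stops here
            current = 0
        longest = max(longest, current)
    return longest


def average_longest_chain(n_bits, trials, seed=0):
    """Monte Carlo estimate of the mean longest chain for random operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(
            rng.getrandbits(n_bits), rng.getrandbits(n_bits), n_bits
        )
    return total / trials


n = 32
print("worst case:", longest_carry_chain(2**n - 1, 1, n))
print("best case:", longest_carry_chain(0, 1, n))
print("average:", average_longest_chain(n, trials=2000))
print("log2(N):", math.log2(n))
```

The worst case ripples through all 32 stages, while the estimated average for random operands stays below log2(32) = 5, which is the variation a completion-detecting adder can exploit.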
Elastic pipelines
In general it is not easy to translate a local asynchronous advantage in average-case
performance into a system-level performance advantage. Today's synchronous circuits
are heavily pipelined and retimed. Critical paths are nicely balanced and little room is
left to obtain an asynchronous benefit. Moreover, an asynchronous benefit of this kind
must be balanced against a possible overhead in completion signaling and
asynchronous control.
In an asynchronous (elastic) pipeline, each stage has its own controller. The controller
communicates exclusively with the controllers of the immediately
preceding and succeeding stages by means of handshake signaling, and controls the
state of the data latches (transparent or opaque). Between the request and the next
acknowledge phase the corresponding data wires must be kept stable.
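A rough feel for how such a handshaking pipeline behaves can be obtained from a timing sketch. It assumes that each stage holds one item, that an item moves on as soon as it is finished and the next stage is free, and that handshake overhead is zero (in practice, completion signaling and control overhead eat into the benefit, as noted above). All names and delay values are hypothetical.

```python
def async_pipeline_time(delays):
    """delays[i][s] is the time item i needs in stage s.

    Returns the time at which the last item leaves the last stage.
    """
    n_items = len(delays)
    n_stages = len(delays[0])
    # enter[i][s]: time at which item i enters stage s
    enter = [[0.0] * n_stages for _ in range(n_items)]
    for i in range(n_items):
        for s in range(n_stages):
            # The item is ready once it has finished in the previous stage.
            ready = enter[i][s - 1] + delays[i][s - 1] if s > 0 else 0.0
            # Stage s is free once the previous item has moved on.
            if i == 0:
                free = 0.0
            elif s + 1 < n_stages:
                free = enter[i - 1][s + 1]
            else:
                free = enter[i - 1][s] + delays[i - 1][s]
            enter[i][s] = max(ready, free)
    return enter[-1][-1] + delays[-1][-1]


def sync_pipeline_time(delays):
    """Clocked pipeline: every stage takes one worst-case period per item."""
    period = max(max(row) for row in delays)
    return (len(delays) + len(delays[0]) - 1) * period


# Three items passing through three stages with data-dependent delays.
delays = [[1, 3, 1],
          [2, 1, 1],
          [1, 1, 4]]
print(async_pipeline_time(delays))  # 10.0
print(sync_pipeline_time(delays))   # 20
```

In this made-up example the slow steps still stall their neighbours, but only for as long as they actually take, rather than for a worst-case period on every step.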
2. Asynchronous for Low Power:
Dissipating when and where active
The classic example of a low-power asynchronous
circuit is a frequency divider. A D-flip-flop with its inverted output fed back to its
input divides an incoming (clock) frequency by two (Figure 4(a)). A cascade of N
such divide-by-two elements (Figure 4(b)) divides the incoming frequency by 2^N.
The second element runs at only half the rate of the first one and hence dissipates only
half the power; the third one dissipates only a quarter, and so on. Hence, the entire
asynchronous cascade consumes, over a given period of time, slightly less than twice
the power of its head element, independent of N. That is, fixed power dissipation is
obtained.
In contrast, a similar synchronous divider would dissipate in proportion to N. A
cascade of 15 such divide-by-two elements is used in watches to convert a 32 kHz
crystal clock down to a 1 Hz clock. The potential of asynchronous for low power
depends on the application.
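The divider argument above is a geometric series, and a few lines make the numbers explicit. The sketch assumes that a stage's power is proportional to its switching rate and that every stage of the synchronous version dissipates as much as the head element; power is expressed in units of the head element's power, and the function names are hypothetical.

```python
def async_divider_power(n_stages, head_power=1.0):
    """Stage i runs at 1/2**i of the input rate and dissipates accordingly."""
    return sum(head_power / 2**i for i in range(n_stages))


def sync_divider_power(n_stages, head_power=1.0):
    """Every stage is clocked at the full input rate."""
    return n_stages * head_power


def output_frequency(input_hz, n_stages):
    """Each divide-by-two stage halves the frequency."""
    return input_hz / 2**n_stages


stages = 15
print(async_divider_power(stages))        # just under 2 head-element units
print(sync_divider_power(stages))         # 15 head-element units
print(output_frequency(32768, stages))    # 1.0 Hz from a 32,768 Hz crystal
```

The asynchronous total approaches but never reaches twice the head element's power, no matter how many stages are added, whereas the synchronous total keeps growing with N.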
For example, in a digital filter where the clock rate equals the data rate, all flip-flops
and all combinational circuits are active during each clock cycle. Then little or
nothing can be gained by implementing the filter as an asynchronous circuit.
However, in many digital-signal processing functions the clock rate exceeds the data
(signal) rate by a large factor, sometimes by several orders of magnitude. In such
circuits, only a small fraction of registers change state during a clock cycle.
Furthermore, this fraction may be highly data dependent. The clock frequency is
chosen that high to accommodate sequential algorithms that share resources over
subsequent computation steps. In such cases, one benefit of asynchronous operation is
vastly improved electrical efficiency, which leads directly to prolonged battery life.
One application for which asynchronous circuits can save power is Reed-Solomon
error correctors operating at audio rates, as demonstrated at Philips Research
Laboratories. Two different asynchronous realizations of this decoder (single-rail and
dual-rail) were compared with a synchronous (product) version. The single-rail version was
clearly superior and consumed five times less power than the synchronous version.
A second example is the infrared communications receiver IC designed at
Hewlett-Packard/Stanford. The receiver IC draws only leakage current while waiting for
incoming data, but can start up as soon as a signal arrives so that it loses no data.
Also, most modules operate well below the maximum frequency of operation.
The filter bank for a digital hearing aid was the subject of another successful
demonstration, this time by the Technical University of Denmark in cooperation with
Oticon Inc. They re-implemented an existing filter bank as a fully asynchronous
circuit. The result is a factor of five less power consumption.
A fourth application is a pager in which several power-hungry subcircuits were
redesigned as asynchronous circuits.
3. Asynchronous for Low Noise and Low Emission:
Subcircuits of a system may interact in unintended and often subtle ways. For
example, a digital subcircuit generates voltage noise on the power-supply lines or
induces currents in the silicon substrate. This noise may affect the performance of an
analog-to-digital converter that draws power from the same source or
that is integrated on the same substrate. Another example is that of a digital subcircuit
that emits electromagnetic radiation at its clock frequency (and the higher harmonic
frequencies), and a radio receiver subcircuit that mistakes this radiation for a radio
signal.
Due to the absence of a clock, asynchronous circuits may have better noise and EMC
(Electro-Magnetic Compatibility) properties than synchronous circuits. This
advantage can be appreciated by analyzing the supply current of a clocked circuit in
both the time and frequency domains.
Circuit activity of a clocked circuit is usually maximal shortly after the productive
clock edge. It gradually fades away and the circuit must become totally quiescent
before the next productive clock edge. Viewed differently, the clock signal modulates
the supply current as depicted schematically in Figure 5(a). Due to parasitic resistance
and inductance in the on-chip and off-chip supply wiring this causes noise on the
on-chip power and ground lines.
4. Heterogeneous Timing:
There are two on-going trends that affect the timing of a system-on-a-chip: the
relative increase of interconnect delays versus gate delays and the rapid growth of
design reuse. Their combined effect results in an increasingly heterogeneous
organization of system-on-a-chip timing. According to Figure 7, gate delays rapidly
decrease with each technology generation. By contrast, the delay of a piece of
interconnect of fixed modest length increases, soon leading to a dominance of
interconnect delay over gate delay. The introduction of additional interconnect layers
and new materials (copper and low dielectric constant insulators) may slow down this
trend somewhat. Nevertheless, new circuits and architectures are required to
circumvent these parasitic limitations. For
example, across-chip communication may no longer fit within a single clock period of
a processor core. Heterogeneous system timing will offer considerable design
challenge for system-level interconnect, including buses, FIFOs, switch matrices,
routers, and multi-port memories. Asynchrony makes it easier to deal with
interconnecting a variety of different clock frequencies, without worrying about
synchronization problems, differences in clock phases and frequencies, and clock
skew. Hence, new opportunities will arise for asynchronous interconnect structures
and protocols. Once asynchronous on-chip interconnect structures are accepted, the
threshold to introduce asynchronous clients to these interconnects is lowered as well.
Also, mixed synchronous-asynchronous circuits hold promise.
9. Clockless challenges:
Asynchronous chips face a couple of important challenges.
9.1 Integrating clockless and clocked solutions:
In today’s clockless chips, asynchronous and synchronous circuitry must interface.
Unlike synchronous processors, asynchronous chips don’t complete instructions at
times set by a clock. This variability can cause problems interfacing with synchronous
systems, particularly with their memory and bus systems. Clocked components
require that data bits be valid and arrive by each clock tick, whereas asynchronous
components allow validation and arrival to occur at their own pace. This requires
special circuits to align the asynchronous information with the synchronous system’s
clock.
9.2 Lack of tools and expertise:
Because most chips use synchronous technology, there is a shortage of coding and
design tools, as well as design expertise, for clockless processors. Not only is there
little opportunity for developers to gain experience with clockless chips, but colleges
also offer fewer asynchronous design courses.
10. Conclusion:
As we have seen, implementing chips with asynchronous circuits has great advantages
over clocked chips.
The main reasons are their better average-case speed, lower power
consumption, and lower heat and noise generation.
These features are in great demand in the current electronics and computing
market.
Because clockless chips have these advantages over clocked chips, they are surely
going to displace clocked chips in the market in the future.
The term asynchronous logic describes a variety of design styles, which use
different assumptions about circuit properties.
These vary from the bundled delay model, which uses conventional data processing
elements with completion indicated by a logically generated delay model, to
delay-insensitive design, where arbitrary delay through circuit elements can be
accommodated.
This is a very new area of research, design and testing, but if more scientists and
engineers dedicate themselves to it, the technology for clockless chips will surely
mature in the future.