Download A Complementary GaAs (CGaAs TM) 32

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of electric power transmission wikipedia , lookup

Electrical substation wikipedia , lookup

Switched-mode power supply wikipedia , lookup

Power engineering wikipedia , lookup

Microprocessor wikipedia , lookup

Voltage optimisation wikipedia , lookup

Transmission line loudspeaker wikipedia , lookup

Buck converter wikipedia , lookup

Transistor wikipedia , lookup

Power over Ethernet wikipedia , lookup

Electronic engineering wikipedia , lookup

Surge protector wikipedia , lookup

Alternating current wikipedia , lookup

Mains electricity wikipedia , lookup

Semiconductor device wikipedia , lookup

Flexible electronics wikipedia , lookup

Digital electronics wikipedia , lookup

Opto-isolator wikipedia , lookup

CMOS wikipedia , lookup

Transcript
A Complementary
Michael
GaAs (CGaAsT”)
J. Kelley, Matthew
A. Postiff, Timothy
Department
of Electrical
32-bit Multiply
Accumulate
Unit
D. Strong, Richard B. Brown, and Trevor N. Mudge
Engineering
University
and Computer
Science
of Michigan
1301 Beal Ave.
Ann Arbor MI, 48109
The University
Abstract
seeks to exploit
A
high
speed
accumulate
unit
(CGaAs~~)
to collect
one-cycle,
technology.
is
utilized
on
paths.
A
the
partial
description
radiation
including
logic
and their
effect
of
styles,
of
circuit
select
issues,
a study
design
is
is
architecture.
underway.
accumulate
size.
PUMA
research project
characteristics
design of the PowerPC
Several technology-proving
We chose to design
unit (MAC)
the importance
designs are also
the high
reported
in a high-
microprocessor
speed multiply
in this paper because of
of multiplication
in microprocessor
2. Technology
as a
of some
of
presented,
methodology,
on performance.
1. Introduction
CGaAs
is a complementary
provides enhancement
erostructure
GaAs
mode n-channel
insulated-gate
technology
field effect transistors
on the same substrate. A fully
complementary
lower
logic
style,
having
to other
mented in CGaAs.
much
GaAs
logic
power
families,
The rail-to-rail
which
and p-channel
Iike)
compared
(CMOSdissipation
can be imple-
signal
swings
better noise margins than those of direct-coupled
logic, the most common GaAs logic family.
these markets have need for micro-
2.1.
and digital-signal
are capable of surviving
environment.
In
addition,
pro-
Motorola’s
CGaAs
offers
the satellite
high electron
mobility
technology
inherent
communications
provides
FET
Device Structure
CGaAs
devices are created
ited by molecular
satisfies
radiation
provide
a hostile radiation
market also requires low power operation.
ments. CGaAs
het-
(HFETs)
The emerging demand for satellite communications
is
moving the market for radiation hardened devices out of
military
applications
and into the mainstream.
Both of
cessors which
and
DSP systems.
is used
technology,
is provided
CGaAs
in
adder
encoding
tree
CGaAs
Finally,
is used
(D CVSL)
hardness,
designing
GaAs
Booth
product
logic
speed, low-power
64-bit
compressors
Radix-4
switch
to the discussion.
implications
multiply,
Complementary
and a cany
result.
voltage
its inherent
background
the
reduce
cascode
cn”tical
including
products
the jinal
to
in
A tree of 4:2
the partial
used to determine
Di#erential
32-bit
is presented
of Michigan’s
these CGaAs
these require-
hardness, and its
good device transconduc-
tance and enables high speed operation at low voltages.
The newest generation
of Motorola’s
CGaAs
has
beam epitaxy
in epitaxial
(MBE).
layers
depos-
A semi-insulating
GaAs substrate is used for starting material.
A cross section
of
a pair
devices appears in Figure
the carriers to a conducting
of
nHFET
1. Heterojunctions
and
pHFET
help confine
channel formed in the InGaAs
drawn gate lengths of 0.5pm and thresholds of 0.35V. The
process uses refractory
metal gates and self aligned
layer, where they have high nobilities
because there is no
impurity scattering. The large bandgap AlGaAs layer provides gate isolation, and the top GaAs layer is used as a
source/drain
cap to prevent surface oxidation
implants.
Ohmic contacts to the source/drain
areas are formed
using a refractory
high temperature
stability
metal, which provides
and the opportunity
to use stan-
dard aluminum
metallization
for interconnect. Three levels of AICU metallization
are currently available, and a
fourth level of metal is currently under development [ 1, 2].
In most MESFET
a Schottky
of the AlGaAs.
technologies,
diode junction.
the gate is formed
In CGaAs,
by
the large bandgap
of the AlGaAs layer between the TNN
gate metal and
the channel increases the n-gate diode turn-on voltage
(where Ig=l@~m2)
to 1.75V, and the p-gate diode turn
on voltage to -2V, with Vds = OV [1, 3]. Gate leakage
0-8186-8316-3/98 $10.00 (c) 1998 IEEE
Tw4
ate at 1.3V, which is slightly
gate
/
\
lower than the technology’s
power supply rating of 1.5V. This moderates performance
somewhat but provides a significant
25d
. . . . . . . .. —-
.--
. . . . . ..—
—
.. .. . . .
The gate layer has a lower resistance than its CMOS
8MlGa&s
polysilicon
d.
I-GUS
?4 DMa m$m.g
%Cm4 ,4*3
,.AF3A9 Lulrlw
semiconductor surface, and there is a leakage current even
in oxygen implanted
field areas. Therefore,
one must
Radiation
immune
1. Cross section
CGaAs
devices
increases with higher gate voltages, larger gate areas, and
drain-source
lowering).
voltage (due to drain-induced
barrier
This leakage must be addressed in the design
phase.
The threshold
duction
voltage of the devices is set by the con-
and valence band discontinuities
InGaAs,
by a silicon
delta doping
versions of CGaAs
thresholds
(without
implant
the channel
at 0.55V. The new version
for a 0.35V
threshold
of the AIGaAs/
[1]. Earlier
implant)
set the
is being optimized
which will provide
improved
speed
voltage drop across an n-transistor
pass-
gate when passing a logic one.
Ohmic
contact
source/drain
allows
between
subsequent
provides
the first metal
areas is provided
high
temperature
a conventional
three-level
with stacked vias. A fourth
layer
by a refractory
In a CMOS
charge due to radiation,
process,
which
causes
rate was improved
by the addition
of a Low Temperature
GaAs (LTG) layer to reduce charge collection from ionizing radiation (see Figure 1). The LTG layer also improves
between
characteristics
devices.
and provides
Optimization
better isolation
of the new structure
is
ongoing.
With the immunity
of CGaAs to latchup,
a semi-insu-
GaAs substrate, and the added LTG layer, CGaAs
devices are suitable
[2].
3. Multiplier
for radiation
intensive
environments
Architecture
and the
metal which
processing.
AICU
accumulates
It is
effects because it does not
leakage between transistors and a shift in the threshold
voltages. Recently, the single event upset (SEU) soft error
lating
and the possibility of using unipolar transmission gates (npass gates). The earlier high-threshold
devices resulted in
an unacceptable
hardness of the process is excellent.
to total dose radiation
subthreshold
layer placed just below
the channel area, and by a p-channel
but it is in direct contact with the
use Si02 as a gate or field insulator.
Si02
higher
counterpart,
route as little in the Schottky gate metal layer as possible.
----
Ned to scale
Figure
in gate leak-
f 534 ,4nG&
LTG G*
.. ... .. . ... ... ... ..
reduction
age.
WA I-Gans
CGaAs
Although
independent
metallization
CGaAs
ment the MAC,
accumulate
chosen to imple-
architecture
of the technology.
for the multiply
metal level is in development
is the technology
the high-level
The overall
chip (MAC)
is essentially
design goals
included
high
but was not used in this design. The metal feature size and
spacing are coarser than those of a typical CMOS technol-
speed, low area, and low power. The main focus was to
explore the design space and to determine techniques and
ogy of comparable
circuit topologies for digital CGaAs designs.
With these objectives, the MAC was designed as a one-
Isolation
implant
of
feature size [4].
the devices
that extends
This technique
implant
minimum
into
by
an oxygen
the semi-insulating
is provided
substrate.
separates the n and p active regions.
is used to avoid the mesa etch step common
GaAs technologies,
ity. Unlike
An
in
and it results in better surface planar-
CMOS,
the substrate
is semi-insulating,
so
wells and well contacts are not necessary, and latchup
is
unit.
The
is composed of two subunits, a 32-bit multiplier
two’s
and
a 64-bit
complement
adder (see Figure
puts and internal
registers
multiply-accumulate
2). Latches on the inputs, outare included
designed in CGaAs have several important
tures. The low threshold
voltages,
full
power
rail
featransi-
tions and high transconductance
at low supply voltages
make low voltage performance good. These lower supply
reduce
both
dynamic
in a global
scan
chain for testing purposes.
The 32-bit, two’s complement
multiplier
tree is imple-
A 4:2 compressor
operates
on four partial product bits and compresses these to two
result bits. A tree layout based of 4:2 compressors has
more regular cell placement and simpler routing than a
2.2. Process Characteristics
voltages
MAC
mented using 4:2 compressors.
not possible.
Circuits
cycle,
power
dissipation
and
3:2 (full-adder)
implementation
[5].
A full tree of 4:2 compressors
takes five levels to pro-
duce the two 64-bit sum and carry results presented to the
final adder. The utilization
of Radix-4 Booth encoding
eliminates one level of compressors and reduces area [6].
power lost through the gate. The MAC is designed to oper-
0-8186-8316-3/98 $10.00 (c) 1998 IEEE
4. Circuit
Ad
.4mB
CGaAs
which
full
R--P
7-
provides
complementary,
dynamic
!
Carrwpas!
I
Figure
I
A,&rc
I
families
in
unit, including
passgate,
XOR
source-coupled,
Voltage
Switch
dual-rail
domino
“keeper”
pHFET
devices
to maintain
nodes.
HSPICE
simulations
the design pseudo-static,
charge
sharing
influencing
3.
gate with
voltages
show
device not only allows the clock
making
Logic
gate appears in Figure
dynamic
“keeper”
Fmlklder
of logic
the MAC
Cascode
An example
The gate is a standard DCVSL
I
LnE3y@
r-
variety
dynamic,
Differential
(DCVSL).
LwhTree
l-,
a wide
we could have designed
and unipolar circuits. Each of these represents different
points in the performance, size, power and noise immunity space [7]. The MAC was designed using primarily
I
[
Implementation
on the
that
this
to be stopped,
but also helps to prevent
in the evaluation
logic
from
incorrectly
the output.
2. MAC Architecture
In addition to performing
a 32-bit multiply, the 64-bit
intermediate result can be used as an accumulate value;
this technique
is often used in signal processing
The accumulate
is implemented
by folding
late (CRegister)
value into appropriate
partial
tree of the multiplier.
product
because some inputs
the multiplier.
tial products
circuits.
the accumu-
bit positions
This
to the compressors
in the
is possible
are not used by
The value is thereby combined with the parfrom the A and B registers, and is already
added to the product
mediate registers.
when the result appears at the inter-
The two 64-bit quantities output from the multiplier
are added to form the final result, which is also the next
accumulate
value of the MAC.
posed of seven 16-bit
the outputs
The 64-bit
adder is com-
adders so that multiplexors
of adders that have the correct
Figure 3. Dynamic
“keeper”
transistors
DCVSL
XOR
gate
with
select
carry results
from lower segments. Each 16-bit adder is designed
with
The DCVSL
than
a fully
style circuits
complementary
allow
higher
design
performance
because
devices
are 3 to 4 times
cient adder that requires only a few standard cells.
pHFET
devices.
The adder and multiplier
make up most of the MAC.
To facilitate testing, the MAC incorporates
several test
power. The XOR gate of Figure 3 has an evaluation
4-bit carry-lookahead.
devices.
Two registers
are included
When
This architecture
(CarryRegister
between
the multiplier
these registers
are selected,
two-cycle pipelined
registers is included
ters are included
provides
an effi-
and SumRegister)
and the final
the MAC
adder.
becomes
a
design. A full scan-chain of all the
for testing. Since the pipeline regis-
in the scan chain, the intermediate
results
The gates are efficient
for
speed and
delay
of 428ps at 189p.W for a power delay product of 8 lfJ.
While the presence of dynamic nodes might increase the
SEU sensitivity
of the design, the result is still expected to
be much
less radiation
Evaluation
of the MAC
sensitive
than a CMOS
for radiation
Booth
of the multiplier
tree are available for debug, and inputs
can be scanned directly into the final adder to aid in its
testing. The inputs to the multiplier
tree and the accumulate value are latched at the beginning and end of the unit,
dynamic
static circuits
encoding
is hidden
circuits.
circuit.
hardness will
our understanding of dynamic CGaAs circuits.
The MAC uses full complementary
circuits
Radix-4
respective y (see Figure 2).
nHFET
faster than the corresponding
add to
to perform
[9, 11, 12]. The delay of these
within
the pre-charge
Once all dynamic
time of the
nodes in the circuit
have been pre-charged, they evaluate in domino fashion.
CGaAs also improves performance
and reduces the
size of the logic for Booth encoding over DCFL designs.
Booth
encoding
provides
small
0-8186-8316-3/98 $10.00 (c) 1998 IEEE
is far more efficient
fast multiplexors.
in a technology
DCFL
designs
that
must
implement
the coding
logic
in NOR-NOR
configurations
which are larger [8].
4.1. General Design Considerations
Among
the more obvious
differences
is that CGaAs
has no wells.
and CMOS
between CGaAs
This allows
p-
and n-channel transistor drains to be abutted, prevents
Iatchup, and eliminates the need for well contacts.
Another
While
difference
the Schottky
routing,
in CGaAs
is the Schottky
layer.
layer in CGaAs can be used for local
it leaks to the substrate, so its use as an intercon-
nect layer should
dynamic
be limited
circuits.
Dynamic
charge on their internal
transistors
helps
especially
circuits
when used with
rely on the storage of
nodes. The use of small “keeper”
maintain
the charge
on the dynamic
nodes.
Figure
4. Preliminary
In addition, the n-transistors in CGaAs have a transconductance about 4 times better than the p-transistors. Due
to the better transconductance, circuits should be designed
for
primarily in the n-transistors.
very well to dynamic circuits
operated in pipelined
the n-network.
should
Complementary
be designed
avoiding
This design style lends itself
where most of the logic is in
logic designed
primarily
of NAND
the
through
73 MHz
core
critical
path
mode.
thus
5. Conclusion
large p-stacks.
Although
many
using
process
of
Verilog.
the MAC
proper functionality
described
of the characteristics
and
circuit
techniques
of
CGaAs may be different than in CMOS, the methodologies behind designing chips are very similar. The MAC is
models
simulated
in CGaAs
structures,
Physical Design
modeled
The
the design is 13.7ns, yielding
a clock speed of
in single cycle mode or about 140MHz when
The MAC
4.2.
is 239mW.
MAC Layout
Both
were
behavioral
simulated
by applying
and structural
and verified
in this paper demonstrates
of CGaAs
technology
several
in
a high
speed design. Careful
attention
to layout and circuit
details is necessary to avoid difficulties
that can be caused
by the differences between CGaAs and CMOS. However,
CGaAs
technology
power, radiation
provides
a high
hard alternative
performance,
for digital
low
designs.
for
targeted and random test
Acknowledgment
vectors.
In conjunction
with the development
els, a standard cell library
were custom
designed
of the MAC mod-
was created. The standard cells
using Mentor
Graphics
ICstation.
Each standard cell was simulated
and verified
using
HSPICE.
With a standard cell library in place, the MAC core
was
designed
EPOCH
using
compiler.
cell placement
Cascade
EPOCH
and routing.
and routed to the padframe.
layout of the MAC
ing and power
dissipation
cess once the design
obtained.
The final
60,904
[1]
Automation’s
The core was manually
placed
Figure 4 shows a preliminary
analysis on clock buffer-
needs to be completed.
using Motorola’s
is completed
[2]
The
CGaAs pro-
and a final
layout
is
transistors.
(without
clock
HSPICE-estimated
buffering)
power
contains
was supported
by DARPA
contract
DAA
[3]
J. Abrokwah, J. Huang, W. Ooms, C. Shurboff, J. Hallmark,
R. Lucero, J. Gilbert, B. Bemhardt, G. Hansell, “A
Manufacturable Complementary GaAs Process:
1993
IEEE GaAs IC Symposium Technical Digest, pp. 127-130.
M. LaMacchia, J. Abrokwah, B. Bemhardt, D. Foster, B.
Crawforth, B. Mathes, T. McGuire, T. Weatherford,
“Radiation Hardened Complementary GaAsV 1997 IEEE
GaAs IC Symposium Technical Digest, pp. 59-61.
A. Akinwande, P. Ruden, D. Gnder, J, Nohava, T. Nohava,
P.
design
work
References
was used for both standard
unit. Further
MAC unit will be fabricated
Design
This
H04-94-G-0327.
Joslyn,
Heterostructure
dissipation
0-8186-8316-3/98 $10.00 (c) 1998 IEEE
J.
Breezley,
PETs for Low
“Complementary
Power
Integrated
HI-v
Circuits:’
1990
IEEE
Technical
[4]
P. O’Neil,
G.
International
B. Bemhardt,
Hansen,
“GaAs
Motorola;
Digest,
[5]
1993
Meeting
F. Nikpourian,
IEEE
Heterojunction
C. Delta,
Circuit
GaAs
IC
Y. Abad,
Fabrication
Symposium
at
Technical
[18] P. Song,
Issues for a High
Processor,
1995
Ph.D.
Solid-State
Clock
Dissertation,
of Michigan.
Computer
Arithmetic
Algorithms,
1993 Prentice
Hall.
[7]
R. Brown,
Par&h,
B. Bemhardt,
T. Basso,
Foster, B. Crawforth,
T. McQuire,
of Complementary
“Overview
for High-Speed
VLSI
Circuits,”
J. Abrokwah,
P.
C. Gauthier,
D.
K. %rkallah,
R. Lomax,
GaAs Technology
1997 IEEE Transactions
on
Circuits.
M. Riepe, T. Huff,
M. Upton, T. Mudge,
53x53-bit
Array
Parrdlel
GaAs
DCFL;’
Engineering
[9]
S. Stetson,
T. Mudge,
VLSI
[8]
M. LaMacchia,
S. Gold,
G.
Multiplier
University
and Computer
Fast
Bewick,
Implementation,
R. Brown,
in 0.6 pm
Michigan,
Electrical
of
Science, pp. 1-9.
Multiplication:
1994
“A 7-nS
Implemented
Algorithms
Ph.D.
Dissertation,
M. Sherwin,
P. Robertson,
and
Stanford
University.
[10] A. Baca, J. Zolper,
Howard,
D.
Rieger,
Junction-Gated
[11 ] N.
Klein,
1994 IEEE
R. Shul, A.
“Complementary
Heterostructure
Technology;’
Digest,
J.
Field
GaAs
IC
GaAs
Effect
Transistor
Symposium
Technical
pp. 59-62.
Burgess,
Booth’s
“Removal
Algorithm
Letters,
August
[12] M. Roorda,
of Sign-Extension
Circuitry
Multiplier-Accumulators~
From
Electronics
1990, Vol. 26, No. 17, pp. 1413-1415.
“Method
Multiplier
That
Electronics
Letters,
to Reduce the Sign Bit Extension
Uses
the
Modified
September
Booth
1986. Vol.
in a
Algorithm;’
22, No. 20, pp.
1061-1062.
[13] M.
Mehta,
Multiplier
V,
Compressor
Computer
[14] J.
E.
C.
Shurboff,
J. Huang,
B.
“0.9
“High-Speed
M.
on
Flym,
University
R.
J.
15 ns 4K
Mukiply/Accumulator~
1994
Adder;
Blocks:
Lucero,
A
Technical
Binary
Research and Development,
Stanford
and
Symposium
Ooms,
V DSP
and a 45 ns 16-bit
[16] N. Quach,
IEEE
Counter
pp. 43-50.
IEEE GaAs IC Symposium
[15] H. Ling,
“High-Speed
Multi-Input
1991
Circuits,”
Hallmark,
Swartzlander,
Using
Arithmetic,
Abrokwah,
SRAM
Parmar,
Design
Digest,
pp 55-58.
1981 IBM
Journal
of
Vol. 25, No. 3, pp. 156-166.
“High-Speed
Technical
Addition
Report,
in CMOS;’
CSL-TR-90-415,
1990.
[17] J.
Huang,
Majerus,
Mills,
E.
Glass,
J.
Abrokwah,
B.
E. Spears, J. Parsey, D. Scheitlin,
K.
Optimization
Hawthorne,
J. Blaugh,
for a Low Voltage
“Device
Enhancement
Bemhardt,
R. Droopad,
and
Technical
G. DeMicheli,
Offs for High-Speed
Circuits,
[19] J. Rabaey, Digital
and Circuit
Floating-Point
1. Koren,
FET for Portable
GaAs IC Symposium
Integrated
Architectural
University
[6]
Devices
pp. 123-126.
T. Huff,
Rate
Electron
Digest, pp. 983-986.
M.
L.
Process
Mode Power
0-8186-8316-3/98 $10.00 (c) 1998 IEEE
Applications:
1997 IEEE
Digest, pp. 55-58.
“Circuit
and Architecture
Multiplication:
Trade-
1991 IEEE Journal
Vol. 26, No. 9, pp. 1184-1198.
Integrated
Circuits,
1996 Prentice
Hall.
of