Reliability and Failure Analysis
of Electronic Components
By
Marco Mugnaini
Design for Safety of
Electronic Components
• For VLSI Circuits to be a useful and growing technology, 2 conditions
must be satisfied:
– Can be produced in large quantities at low cost
– Can perform their functions throughout their intended lifetime
• To lower the cost of manufacturing, the optimal size of the IC should
be selected.
• The optimal size is a compromise between several competing
considerations:
– Partitioning of the system
– Yield of good circuits
– Packaging and system assembly cost
– Reliability/availability and design for safety of the complete system
• Partitioning the system into a large number of small ICs results in high yield but also high assembly cost
• To arrive at an optimal division of the system, we must be able to
predict the total system reliability as a function of the number of IC’s of
varying size
Mechanism of Yield Loss in VLSI
• Causes of low yield fall into 3 basic categories:
– Parametric processing problems
– Circuit design problems
– Random point defects in circuits due to several causes
Processing Effects
• Often a wafer is divided into regions of good chips and regions of bad chips
• This is most likely due to processing effects such as
– Variations in thickness of oxide or polysilicon layers
– Variations in resistance of implanted layers
– Variations in width of lithographically defined features
– Alignment of photomasks
• For example, polySi gate lengths are shorter in thinner polySi regions than in thicker polySi regions. This may make channel lengths so short that the transistors cannot be turned off, which leads to excessive leakage current
• Variations in the thickness of the deposited dielectric lead to variations in contact window size. This may lead to non-operative circuits if the circuits depend on a low value of contact resistance
• Variations in the doping of implanted layers also lead to variations in contact resistance
• Also, a wafer may vary in size during processing by more than 20 ppm. A 20 ppm change in a 125 mm wafer is a change in size of 2.5 µm, which may cause significant misalignment
Circuit Sensitivities
• Certain areas of a wafer have low device yield because the design of a
circuit has failed to consider expected variations in device parameters and
correlation between variations in different parameters.
Point Defects
• A 3 µm dust particle can cause a break in a metal conductor
• An isolated oxidation-induced stacking fault may cause excessive leakage current
Modeling of Yield-loss Mechanisms
• We need to model IC yield in terms of fundamental parameters independent
of particular IC and characteristics of the process and processing line
because:
– by accurately modeling the yield we can predict the cost and availability
of ICs
– once yield-modeling parameters are known, one can compare the processing quality of different process lines and indicate where improvements are required
• IC yield is expressed as
$Y = Y_0\, Y_1(D_0, A, \alpha_i)$
• $1 - Y_0$ = fraction of bad chips due to processing-related effects
• $1 - Y_1$ = remaining fraction of bad chips, which is a function of the density of point defects $D_0$
• $A$ is the chip area
• $\alpha_i$ are the parameters unique to the different models of the yield
• $Y$ = ratio of good chips to total number of chips per wafer
• All models predict that $Y$ decreases monotonically as $A$ increases
• Yield modeling can identify those processes and mechanisms that limit the yield of present ICs
• The process can then be improved or eliminated as needed
Uniform Density of Point Defects
• In those areas where yield is not degraded by either processing or circuit sensitivities, the remaining cause of chip failure is randomly distributed point defects
• Consider a grid of 24 chip sites with 10 defects randomly distributed. In this example 16 of the 24 sites have 0 defects
• Of the remaining sites, 6 have 1 defect and 2 have 2 defects; no site has more than 2 defects
• The problem of determining the yield is identical to the problem of placing n balls in N cells and then calculating the probability that a given cell contains k balls
$P_k = \frac{n!}{k!\,(n-k)!}\, \frac{(N-1)^{n-k}}{N^{n}}$
• If N and n are both large while n/N = m remains finite, $P_k$ can be approximated by the Poisson distribution
$P_k = e^{-m}\, m^{k} / k!$
• The probability that a chip contains no defects is $Y_1 = P_0 = e^{-m}$
• The probability that a chip contains 1 defect is $P_1 = m\, e^{-m}$
• If the area of the chip is A, the total chip area in the usable part of a wafer is NA
• The density of defects is $D_0 = n/(NA)$
• The average number of defects per chip, m, is $m = n/N = D_0 N A / N = D_0 A$, so that
$Y_1 = P_0 = \exp(-D_0 A)$
• This Poisson estimate was used to predict yield in the early days of IC manufacturing
• The actual yield was found to be much larger than predicted
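The Poisson model above can be checked against the 24-site example. The following is a minimal sketch, not part of the original slides; the function names are illustrative assumptions. It evaluates $P_k = e^{-m} m^k / k!$ with m = n/N = 10/24 and converts the probabilities into expected numbers of sites.

```python
import math

def poisson_defect_probability(k: int, m: float) -> float:
    """P_k = exp(-m) * m**k / k!: probability that a chip holds exactly k defects."""
    return math.exp(-m) * m**k / math.factorial(k)

def poisson_yield(defect_density: float, chip_area: float) -> float:
    """Y_1 = P_0 = exp(-D0 * A)."""
    return math.exp(-defect_density * chip_area)

# The grid from the text: n = 10 defects spread over N = 24 chip sites.
n_defects, n_sites = 10, 24
m = n_defects / n_sites  # average number of defects per chip

# Expected number of sites holding 0, 1 and 2 defects.
expected_sites = [n_sites * poisson_defect_probability(k, m) for k in range(3)]
print([round(x, 1) for x in expected_sites])  # [15.8, 6.6, 1.4]
```

The expected counts of roughly 15.8, 6.6, and 1.4 sites are close to the 16, 6, and 2 sites counted in the example grid.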
Yield Enhancement using Redundant Circuitry
• Many large MOS memory chips are designed with redundant circuitry,
which can be switched to replace defective circuit elements
• This is usually accomplished using fusible links which can be fused as
needed using laser or other techniques
• The yield is then modified to
$Y_1 = P_0 + \eta P_1$
• $P_0$ = probability that a chip contains no defects
• $P_1$ = probability that a chip contains 1 defect
• $\eta$ = probability that a chip containing 1 defect can be repaired by using a single redundant column
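As a rough illustration of how much single-defect repair can help, the sketch below combines the redundancy formula with the Poisson probabilities $P_0 = e^{-m}$ and $P_1 = m e^{-m}$. The name `eta` for the repair probability and the sample values are assumptions made for this example only.

```python
import math

def yield_with_redundancy(m: float, eta: float) -> float:
    """Y_1 = P_0 + eta * P_1 for Poisson-distributed defects.

    m   -- average number of defects per chip (D0 * A)
    eta -- assumed probability that a chip with exactly 1 defect is repairable
    """
    p0 = math.exp(-m)      # chips with no defect
    p1 = m * math.exp(-m)  # chips with exactly one defect
    return p0 + eta * p1

# Hypothetical comparison for m = 1 defect per chip on average.
for eta in (0.0, 0.5, 1.0):
    print(eta, round(yield_with_redundancy(1.0, eta), 3))
```

With eta = 0 the function falls back to the plain Poisson yield, so the same code covers chips without redundancy.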
Simple Non-uniform Distribution of D
• The discrepancy between measured and predicted yield led to the investigation of a non-uniform distribution of the defect density D across a wafer
• The yield is expressed as
$Y = \int_0^\infty \exp(-DA)\, f(D)\, dD$
• f(D) is the normalized distribution of defect density
$\int_0^\infty f(D)\, dD = 1$
• 3 different distributions f(D), each with mean $D_0$, are investigated:
– Delta function: $Y_1 = \exp(-D_0 A)$
– Triangular: $Y_2 = \{[1 - \exp(-D_0 A)]/(D_0 A)\}^2$
– Rectangular: $Y_3 = [1 - \exp(-2 D_0 A)]/(2 D_0 A)$
• For $D_0 A \gg 1$ we find that
– $Y_1 = \exp(-D_0 A)$
– $Y_2 \approx 1/(D_0 A)^2$
– $Y_3 \approx 1/(2 D_0 A)$
• $Y_3$ is found to fit the observed yield of large ICs most closely
• The above distributions do not have any physical basis; therefore, more physically based distributions need to be investigated
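To see how differently the three yield expressions behave as chips get larger, the following sketch tabulates them against the product D0·A. The function names are illustrative assumptions; the formulas are the delta, triangular, and rectangular results listed above.

```python
import math

def yield_delta(d0a: float) -> float:
    """Delta-function defect density: Y1 = exp(-D0*A)."""
    return math.exp(-d0a)

def yield_triangular(d0a: float) -> float:
    """Triangular defect density: Y2 = ((1 - exp(-D0*A)) / (D0*A))**2."""
    return ((1 - math.exp(-d0a)) / d0a) ** 2

def yield_rectangular(d0a: float) -> float:
    """Rectangular defect density: Y3 = (1 - exp(-2*D0*A)) / (2*D0*A)."""
    return (1 - math.exp(-2 * d0a)) / (2 * d0a)

print("D0*A    Y1        Y2        Y3")
for d0a in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"{d0a:<6} {yield_delta(d0a):<9.5f} "
          f"{yield_triangular(d0a):<9.5f} {yield_rectangular(d0a):<9.5f}")
```

For large D0·A the exponential form collapses far faster than the other two, which approach 1/(D0·A)² and 1/(2·D0·A) respectively.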
Gamma Distribution
• The gamma distribution is more physical
$f(D) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, D^{\alpha-1} \exp(-D/\beta)$
• $\alpha$ and $\beta$ are 2 distribution parameters and $\Gamma(\alpha)$ is the gamma function
• Average density of defects: $D_0 = \alpha\beta$
• Variance of D: $\alpha\beta^2$
• Consequently $Y_4 = 1/(1 + S D_0 A)^{1/S}$, where $S = 1/\alpha$ is the variance of D divided by $D_0^2$
• For $S \to 0$, the gamma distribution reduces to a delta function and $Y_4 \to \exp(-D_0 A)$
• Using different values of S, the gamma yield function is a good approximation of $Y_2$ and $Y_3$ over a wide range of $D_0 A$
• Gamma yield functions can be used to represent a large variation in the shape of experimental yield vs. area curves; see Fig. 4 and 5, p. 621 and 622 of Sze.
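The role of the shape parameter can be illustrated numerically. This is a minimal sketch, assuming S denotes the shape parameter in the gamma yield function $Y_4 = 1/(1 + S D_0 A)^{1/S}$; the function name and the sample values of S are hypothetical.

```python
import math

def gamma_yield(d0a: float, s: float) -> float:
    """Gamma yield function Y4 = (1 + S*D0*A)**(-1/S).

    The limit S -> 0 is the Poisson (delta-function) yield exp(-D0*A).
    """
    if s == 0:
        return math.exp(-d0a)
    return (1 + s * d0a) ** (-1 / s)

# Yield at D0*A = 2 for increasingly broad defect-density distributions.
for s in (0.0, 0.1, 0.5, 1.0):
    print(s, round(gamma_yield(2.0, s), 4))
```

A larger S (a broader spread of defect density across the wafer) gives a higher predicted yield at the same average defect density.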
• Each type of defect is characterized by
– its mean defect density $D_{n0}$
– the shape factor of its distribution $S_n$
– the portion of total chip area $A_n$ susceptible to that defect
• Using the gamma yield function
$Y_n = 1/(1 + S_n A_n D_{n0})^{1/S_n}$
• The overall yield is the product of the yields for each known type of defect
$Y = \prod_{n=1}^{N} Y_n$
• For a mature process in a well-controlled high-yield line, all of the major yield-limiting defects have probably been controlled or eliminated. The yield is a product of many terms, each approximately 1.
• This means $S_n A_n D_{n0} \ll 1$
$\ln Y = \sum_{n=1}^{N} \left[-(1/S_n) \ln(1 + S_n A_n D_{n0})\right]$
$\ln(1 + S_n A_n D_{n0}) \approx S_n A_n D_{n0}$
Thus $\ln Y \approx -\sum_{n=1}^{N} A_n D_{n0}$
$Y = \exp\left(-\sum_{n=1}^{N} A_n D_{n0}\right)$
$D^* = (1/A) \sum_{n=1}^{N} A_n D_{n0}$
$Y = \exp(-A D^*)$
• Here $A_n$ is the total chip area susceptible to the particular defect
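The small-defect approximation can be checked with a short calculation. In the sketch below the three defect types, their shape factors, susceptible areas, and densities are made-up numbers chosen only so that each product S·A·D is much smaller than 1; the function names are likewise assumptions.

```python
import math

def composite_yield(defect_types):
    """Exact product of gamma yields, Y = prod (1 + S_n*A_n*D_n0)**(-1/S_n)."""
    y = 1.0
    for s_n, a_n, d_n0 in defect_types:
        y *= (1 + s_n * a_n * d_n0) ** (-1 / s_n)
    return y

def effective_defect_density(defect_types, chip_area):
    """D* = (1/A) * sum(A_n * D_n0)."""
    return sum(a_n * d_n0 for _, a_n, d_n0 in defect_types) / chip_area

# Hypothetical defect types: (S_n, A_n in cm^2, D_n0 in defects/cm^2).
defects = [(0.5, 0.20, 0.05), (1.0, 0.10, 0.10), (0.3, 0.25, 0.02)]
chip_area = 0.25  # cm^2, also hypothetical

exact = composite_yield(defects)
d_star = effective_defect_density(defects, chip_area)
approx = math.exp(-chip_area * d_star)  # Y = exp(-A * D*)
print(round(exact, 4), round(approx, 4), round(d_star, 3))
```

For these values the exact product and the exponential approximation agree to about four decimal places, which is the regime the mature-process argument describes.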
Reliability Requirements for VLSI
• It is instructive to consider an example of the effect of device failures
– Early discrete solid-state computer systems typically consisted of $10^5$ transistors per system
– If 1 device failure per month (720 hours) is set as the minimum acceptable condition, then the failure rate per device must satisfy
$\lambda < 1/(10^5 \times 720\ \text{hrs}) \approx 14 \times 10^{-9}$ failure/device-hour
• 1 FIT $\equiv$ 1 failure per $10^9$ device-hours
• The objective for the hypothetical system is therefore $\lambda < 14$ FIT
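The FIT arithmetic above is easy to reproduce for other system sizes. A minimal sketch follows; the function name and argument names are assumptions introduced for this example.

```python
def max_failure_rate_fit(n_devices: int, failures_allowed: float, hours: float) -> float:
    """Largest per-device failure rate, in FIT, that keeps the system within budget.

    1 FIT = 1 failure per 1e9 device-hours.
    """
    per_device_hour = failures_allowed / (n_devices * hours)
    return per_device_hour * 1e9

# The example from the text: 1e5 transistors, 1 failure per 720-hour month.
print(round(max_failure_rate_fit(10**5, 1, 720), 1))  # 13.9
```

The same failure budget applied to a system with ten times as many devices would require a per-device rate ten times lower.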
MAIN FAILURE MODES for VLSI
• Printed circuit board failures
• Printed circuit boards (PCBs) are vulnerable to environmental influences; for example, the traces are corrosion-prone and may be improperly etched, leaving partial shorts, while the vias may be insufficiently plated through or filled with solder. The traces may crack or make poor contact under mechanical loads, often resulting in unreliable PCB operation.
• Residues of solder flux may facilitate corrosion; residues of other materials on PCBs can cause electrical leaks. Polar covalent compounds can attract moisture, as antistatic agents do, forming a thin layer of conductive moisture between the traces.
• Vias are a common source of unwanted series resistance.
• Mousebites are regions where the metallization has a decreased width; such defects usually do not show during electrical testing but present a major reliability risk. Increased current density in the mousebite can aggravate electromigration problems.
Semiconductor (LED and MOSFET) failures
Many failures result in the generation of hot electrons. These are observable under an optical microscope.
Examples of semiconductor failures include accumulation of charge carriers trapped in the gate oxide of MOSFETs. This introduces permanent gate biasing, influencing the transistor's threshold voltage; it may be caused by hot carrier injection, ionizing radiation, or nominal use.
Electrostatic discharge (ESD)
Electrostatic discharge is a subclass of electrical overstress and may cause immediate device failure, permanent parameter shifts, and latent damage that increases the degradation rate. The damage has at least one of three components: localized heat generation, high current density, and high electric field gradient. Prolonged presence of currents of several amperes transfers enough energy to the device structure to cause damage.
Catastrophic ESD failure modes include:
• Junction burnout, where a conductive path forms through the junction and shorts
it
• Metallization burnout, where melting or vaporizing of a part of the metal
interconnect interrupts it
• Oxide punch-through, formation of a conductive path through the insulating layer
between two conductors or semiconductors; the gate oxides are thinnest and
therefore most sensitive. The damaged transistor (MOSFET) shows a low-ohmic
junction between gate and drain terminals.
A parametric failure only shifts the device parameters and may manifest in stress testing; sometimes, the degree of damage can decrease over time.
Latent ESD failure modes occur in a delayed fashion and include:
• Insulator damage by weakening of the insulator structures.
• Junction damage by lowering minority carrier lifetimes, increasing forward-bias resistance, and increasing reverse-bias leakage.
• Metallization damage by conductor weakening.
• Catastrophic failures require the highest discharge voltages, are the easiest to test for, and are the rarest to occur. Parametric failures occur at intermediate discharge voltages and occur more often, with latent failures the most common. For each parametric failure, there are 4–10 latent ones. (Source: Crownlite Mfg. Corp., Failure modes of electronics.)
• The gate oxide of some MOSFETs can be damaged by a potential of 50 volts: the gate is isolated from the junction, and the potential accumulating on it puts extreme stress on the thin dielectric layer. A heavily stressed oxide can shatter and fail immediately. In other cases the gate oxide does not fail immediately, but its degradation is accelerated by stress-induced leakage current, and the oxide damage leads to a delayed failure after prolonged operation.
• On-chip capacitors using oxide or nitride dielectrics are also vulnerable.
• Smaller structures are more vulnerable because of their lower capacitance: the same amount of charge charges the capacitor to a higher voltage.
• All thin layers of dielectrics are vulnerable; hence, chips made by processes employing thicker oxide layers are less vulnerable.
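The capacitance argument follows from V = Q/C. The sketch below uses hypothetical charge and capacitance values, chosen only to show the scaling, and a made-up function name.

```python
def voltage_from_charge(charge_coulombs: float, capacitance_farads: float) -> float:
    """V = Q / C: voltage developed when a charge Q is placed on a capacitance C."""
    return charge_coulombs / capacitance_farads

# The same hypothetical 1 pC of charge on a 1 pF and on a 10 fF structure.
charge = 1e-12
for capacitance in (1e-12, 1e-14):
    print(capacitance, voltage_from_charge(charge, capacitance))
```

Reducing the capacitance by a factor of 100 raises the voltage by the same factor, which is why the same discharge stresses a smaller structure much harder.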
• Capacitors
• Capacitors are characterized by their capacitance, parasitic resistance in series and parallel, breakdown voltage, and dissipation factor. Structurally, capacitors consist of electrodes separated by a dielectric, connecting leads, and housing; deterioration of any of these may cause parameter shifts or failure. Shorted failures and leakage due to a decrease of parallel parasitic resistance are the most common failure modes of capacitors, followed by open failures. Common examples of capacitor failure are overvoltage and aging of the dielectric, where failure occurs when the breakdown voltage falls below the operating voltage.
• In addition to the problems listed above, electrolytic capacitors suffer from failures when power dissipation from high ripple currents in the internal resistance raises the capacitor's internal temperature beyond specifications, accelerating the deterioration rate; such capacitors usually fail short.
•Reliability of semiconductor devices can be summarized as follows:
1. Semiconductor devices are very sensitive to impurities and particles. Therefore,
to manufacture these devices it is necessary to manage many processes while
accurately controlling the level of impurities and particles.
2. The problems of micro-processes and thin films must be fully understood as they apply to metallization and wire bonding. It is also necessary to analyze surface phenomena from the aspect of thin films.
3. Reliability of semiconductor devices may depend on assembly, use, and
environmental conditions. Stress factors affecting device reliability include gas,
dust, contamination, voltage, current density, temperature, humidity, mechanical
stress, vibration, shock, radiation, pressure, and intensity of magnetic and
electrical fields.
Reliability Theory
• A useful mathematical description requires precise definition of the terms
• Definitions:
• Reliability -- probability that an item will perform a required function under stated conditions for a stated period of time
• For an IC, the required function is generally defined by a test program for an automatic test set
• Often initial test programs are not complete and the circuits are not tested under “all” required conditions
• As new device failure modes are identified, the appropriate tests are included in later test programs
• Stated conditions -- comprise the total physical environment, including mechanical, thermal, electrical, …
• Stated period of time -- the time during which satisfactory operation is required
Cumulative Distribution Function
• If the device is operational at t = 0, the probability that the device will fail at or before time t is given by the function F(t)
$F(t) = 0$ for $t < 0$
$0 \le F(t) \le F(t')$ for $0 \le t \le t'$
$F(t) \to 1$ as $t \to \infty$
Reliability Function and Probability Density Function
• The probability density function is
$f(t) = dF(t)/dt$
• The cumulative distribution function is
$F(t) = \int_0^t f(x)\, dx$
• The reliability function is
$R(t) = 1 - F(t) = \int_t^\infty f(x)\, dx$
• Thus $f(t) = -dR(t)/dt$
Failure Rate
• In many applications the quantity of most concern is the instantaneous failure rate
• This is often referred to as the hazard rate
• The fraction of devices that were good at time t and that fail by $t + \delta$ is given by
$F(t + \delta) - F(t) = R(t) - R(t + \delta)$
• The average failure rate during the time interval $\delta$ is
$\bar{\lambda}(t) = \frac{1}{\delta}\, \frac{R(t) - R(t + \delta)}{R(t)}$
• For $\delta \to 0$
$\lambda(t) = -\frac{1}{R(t)} \frac{dR(t)}{dt} = \frac{f(t)}{R(t)}$
$= \frac{f(t)}{1 - F(t)}$
$= -\frac{d[\ln R(t)]}{dt}$
• Integrating with R(0) = 1 gives
$R(t) = \exp\left[-\int_0^t \lambda(x)\, dx\right]$
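The two directions of this relationship, from hazard rate to reliability and back, can be verified numerically. The sketch below is an illustration only: the linearly increasing hazard rate and all function names are assumptions, and the integral and derivative are approximated with the trapezoidal rule and a central difference.

```python
import math

def reliability_from_hazard(hazard, t: float, steps: int = 1000) -> float:
    """R(t) = exp(-integral_0^t hazard(x) dx), integral by the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * h)
    return math.exp(-total * h)

def hazard_from_reliability(reliability, t: float, dt: float = 1e-4) -> float:
    """lambda(t) = -d ln R(t) / dt, derivative by a central difference."""
    return -(math.log(reliability(t + dt)) - math.log(reliability(t - dt))) / (2 * dt)

def example_hazard(t: float) -> float:
    """Hypothetical hazard rate that grows linearly with age (per hour)."""
    return 0.01 + 0.002 * t

def example_reliability(t: float) -> float:
    return reliability_from_hazard(example_hazard, t)

r10 = example_reliability(10.0)
print(round(r10, 5))                                         # R(10) = exp(-0.2)
print(round(hazard_from_reliability(example_reliability, 10.0), 5))
```

Recovering the original hazard rate at t = 10 from the computed R(t) confirms that the exponential form and the logarithmic derivative are inverse operations.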
Mean Time to Failure (MTTF)
and Common Distribution Functions (p. 630 Sze)
• MTTF is a common measure of reliability
$\text{MTTF} = \int_0^\infty t\, f(t)\, dt$
• It is desirable to have a single mathematical model that represents the failure rate of devices over their entire lifetime
• $\lambda(t)$ generally varies as a function of time in three periods:
A. High “early failures” or “infant mortality” due to manufacturing defects
B. “Midlife” or “steady state” period of low and generally constant failure rate
C. “Final” or “wear out” period
[Figure: failure rate $\lambda(t)$ versus time t, with regions A, B, and C]
Exponential Distribution Function
• The simplest distribution function, the exponential, is characterized by a constant failure rate over the lifetime of the device. It is useful for representing a device in which all early failure mechanisms have been eliminated
– $\lambda(t) = \lambda_0$
– $R(t) = \exp(-\lambda_0 t)$
– $F(t) = 1 - \exp(-\lambda_0 t)$
– $f(t) = \lambda_0 \exp(-\lambda_0 t)$
– $\text{MTTF} = \int_0^\infty t\, \lambda_0 \exp(-\lambda_0 t)\, dt = 1/\lambda_0$
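For a constant failure rate the MTTF integral evaluates to 1/λ0. The sketch below checks this numerically with a midpoint-rule integration; the function names are assumptions, and the failure rate of 14 × 10⁻⁹ per device-hour is borrowed from the earlier 14 FIT example.

```python
import math

def exp_reliability(t: float, lam0: float) -> float:
    """R(t) = exp(-lam0 * t) for a constant failure rate lam0."""
    return math.exp(-lam0 * t)

def exp_pdf(t: float, lam0: float) -> float:
    """f(t) = lam0 * exp(-lam0 * t)."""
    return lam0 * math.exp(-lam0 * t)

def mttf_numeric(pdf, upper: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of integral_0^upper t * pdf(t) dt."""
    h = upper / steps
    return sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(steps)) * h

lam0 = 14e-9  # failures per device-hour (14 FIT)
# Integrate out to 50 mean lifetimes, where the remaining tail is negligible.
mttf = mttf_numeric(lambda t: exp_pdf(t, lam0), upper=50 / lam0)
print(f"{mttf:.3e} hours, analytic 1/lam0 = {1 / lam0:.3e} hours")
```

At t equal to the MTTF the reliability is exp(-1), so only about 37% of devices are still working at the mean time to failure.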
Weibull Distribution
• $\lambda(t)$ varies as a power of the age of the device
$\lambda(t) = (\beta/\alpha)\, t^{\beta-1}$
where $\alpha$ and $\beta$ are constants
• For $\beta < 1$ the failure rate decreases with time and can be used to represent early failure
• For $\beta = 1$, $\lambda(t)$ is constant and can be used to represent steady state; this is the exponential distribution, a special case of the Weibull distribution
• For $\beta > 1$, $\lambda(t)$ increases with time and can be used to represent the wearout condition
$R(t) = \exp\{-(1/\alpha)\, t^{\beta}\}$
$f(t) = (\beta/\alpha)\, t^{\beta-1} \exp\{-(1/\alpha)\, t^{\beta}\}$
$\text{MTTF} = \alpha^{1/\beta}\, \Gamma(1 + 1/\beta)$, where $\Gamma$ is the gamma function (for $\beta = 1$, $\Gamma(2) = 1$ and MTTF = $\alpha$)
• Since $1 - F(t) = \exp(-t^{\beta}/\alpha)$,
$\ln \ln \left[\frac{1}{1 - F(t)}\right] = \beta \ln t - \ln \alpha$
which is linear in $\ln t$. The slope of the line is $\beta$. On this plot, the median time to failure is the time when F(t) = 0.5
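The Weibull relations can be exercised with a few lines of code. This sketch assumes the parametrization $\lambda(t) = (\beta/\alpha)\, t^{\beta-1}$, so that $R(t) = \exp(-t^{\beta}/\alpha)$; the function names and the sample values α = 1000, β = 2 are hypothetical. It also separates the mean time to failure, given by the gamma-function expression, from the median, the time at which F(t) = 0.5, since the two generally differ.

```python
import math

def weibull_hazard(t: float, alpha: float, beta: float) -> float:
    """lambda(t) = (beta / alpha) * t**(beta - 1)."""
    return (beta / alpha) * t ** (beta - 1)

def weibull_reliability(t: float, alpha: float, beta: float) -> float:
    """R(t) = exp(-t**beta / alpha)."""
    return math.exp(-(t ** beta) / alpha)

def weibull_mttf(alpha: float, beta: float) -> float:
    """Mean time to failure: alpha**(1/beta) * Gamma(1 + 1/beta)."""
    return alpha ** (1 / beta) * math.gamma(1 + 1 / beta)

def weibull_median(alpha: float, beta: float) -> float:
    """Time at which F(t) = 0.5: (alpha * ln 2)**(1/beta)."""
    return (alpha * math.log(2)) ** (1 / beta)

def weibull_plot_point(t: float, alpha: float, beta: float):
    """Coordinates (ln t, ln ln[1 / (1 - F(t))]) used on a Weibull plot."""
    f_cum = 1 - weibull_reliability(t, alpha, beta)
    return math.log(t), math.log(math.log(1 / (1 - f_cum)))

alpha, beta = 1000.0, 2.0  # hypothetical wearout-type parameters
x1, y1 = weibull_plot_point(10.0, alpha, beta)
x2, y2 = weibull_plot_point(100.0, alpha, beta)
slope = (y2 - y1) / (x2 - x1)  # should recover beta
print(round(slope, 6), round(weibull_mttf(alpha, beta), 2),
      round(weibull_median(alpha, beta), 2))
```

Fitting a straight line to measured failure data on these axes gives β from the slope and α from the intercept, which is the usual reason for plotting in this form.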