Reliability & Failure Analysis
Yeoh Lai Seng
Sr. Failure Analysis & Reliability Engineer
Fairchild Semiconductor (M) Sdn. Bhd.
Reliability
What is Reliability?
• A total of over 2,300,000 blocks of limestone and granite were used
in the construction of the Great Pyramid, with the average block weighing
2.5 tons and none weighing less than 2 tons. The large blocks used in the
ceiling of the King's Chamber weigh as much as 9 tons.
• The estimated total weight of the structure is 6.5 million tons!
• Construction date (estimated): 2589 B.C.
• Construction time (estimated): 20 years.
Questions: What is the reliability of the Great Pyramid
(a) for 4000 years?
(b) for 6000 years?
Semiconductor Reliability Engineering :
An Industry Perspective
Reliability theory developed apart from the mainstream of probability and
statistics. It was originally a tool to help nineteenth century maritime
insurance and life insurance companies compute profitable rates to charge
their customers. Today, we apply the reliability concept in the semiconductor
industry.
"Semiconductor Reliability Engineering" refers to the development of
technology, processes, and standards to ensure the reliability of
semiconductor devices during application. It encompasses a vast set of
engineering disciplines that ensure the continuous improvement in the
reliability of every device.
“Reliability” is defined as the ability of a device to conform to its electrical
and visual/mechanical specifications over a specified period of time under
specified conditions at a specified confidence level. Reliability engineering
employs a wide variety of reliability tests to achieve continuous reliability
improvement throughout the entire life cycle of the semiconductor device from design, to manufacturing, to its usage, and until after its failure.
Since it is often more difficult to improve the reliability of a semiconductor
device after it has been released, as much effort as possible must be
exerted to design units that are inherently reliable. This concept is known
as "Designing for Reliability", or DFR. This consists of following all known
design rules for making a device reliable, not only electrically, but visually
and mechanically as well. These design rules must be updated regularly, to
reflect the best known practices that ensure maximum reliability for a
device. Building reliability into a device as early as the 'design phase' is a
'must', especially now that semiconductor devices reach obsolescence
more quickly than in previous years.
Once an integrated circuit has been designed and the first silicon comes
out, reliability tests at wafer level are done to assess the reliability of the
die. This is known as wafer-level reliability testing. Any reliability issues
identified at this level must be corrected, since these will surely manifest
even at package level. Note that the possibility of encountering wafer-level
problems will be greatly minimized by diligently following the concept of
DFR.
If the new circuit passes wafer-level reliability testing, the wafer is
assembled into its intended package. The packaged device will then
undergo package-level reliability testing.
Package-level reliability testing refers to the assessment of the overall reliability
of the device in packaged form. This consists of subjecting packaged samples to
reliability tests that expose the various sample sets to different stress conditions,
after which the samples are tested for any degradation in quality after the
stress. Since reliability stresses are often destructive, only a sample population
is used for reliability testing. As such, the assessment of the reliability of the rest
of the population is essentially statistical and probabilistic in nature.
There are many industry-standard package-level reliability tests already
available. The reliability test employed is chosen based on the failure
mechanism of interest to the engineer, as different stress tests accelerate
different failure mechanisms. Nonetheless, most reliability tests utilize one or
more of the following stress factors to accelerate failure: temperature, moisture
or humidity, current, voltage, and pressure.
Prior to the official release of a new device for mass manufacturing, it must
undergo full qualification. New device qualification is operationally the same as
package-level reliability testing, except that it is systematized with the objective
of generating official reliability data that would justify the mass manufacturing of
the new device. New device qualification most often requires several sets of
samples for different reliability tests.
Once the final semiconductor device has successfully completed the
qualification process, it may be released for mass manufacturing and
consumption. To ensure that no major process deviations occur in the
manufacturing line, a regular monitoring of the reliability of the manufactured
units is performed. Reliability monitoring, as this activity is often referred to,
consists of getting finished samples from the line and subjecting these to
reliability testing. Valid reliability failures should undergo root cause analysis
for reliability improvement.
In summary, an excellent semiconductor reliability engineering system would
have all of the following components: 1) design for reliability; 2) wafer-level
reliability testing; 3) package-level reliability testing; 4) new device/process
qualification; and 5) reliability monitoring.
An Overview of Semiconductor Engineering Flows
[Flowchart. Boxes: Wafer Design; Wafer Fab; Wafer Level Reliability (Pass / Fail); Wafer Level Failure Analysis; Packaging Design; Assembly Process; Package Level Reliability (Pass / Fail); Package Level Failure Analysis; Product Release.]
Reliability Test Flow
[Flowchart. Boxes: Reliability Tests Request; E O N Request; Generate Qual Plan; Approved Qual Plan (Yes / No); Assembly, Test, Collect ALR Data; Received Materials; Generate Reliability Tests Requests; Start Reliability Tests; ATE Test; Verified Failure (Yes / No); Failure Analysis; Reliability Failure (Yes / No); 8D Analysis; Record Database; Generate Final Reliability Test Report.]
Bath Tub Curve
1. An “infant mortality” early life phase characterized by a decreasing failure
rate (Phase 1). Failure occurrence during this period is not random in time
but rather the result of substandard components with gross defects and the
lack of adequate controls in the manufacturing process. Parts fail at a high
but decreasing rate.
2. A “useful life” period where electronics have a relatively constant failure rate
caused by randomly occurring defects and stresses (Phase 2). This
corresponds to a normal wear and tear period where failures are caused by
unexpected and sudden overstress conditions. Most reliability analyses
pertaining to electronic systems are concerned with lowering the failure
frequency during this period (i.e., while λ is constant).
3. A “wear out” period where the failure rate increases due to critical parts
wearing out (Phase 3). As they wear out, it takes less stress to cause failure
and the overall system failure rate increases; accordingly, failures do not
occur randomly in time.
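The three phases can be reproduced numerically with the Weibull hazard function, whose shape parameter decides whether the failure rate falls, stays flat, or rises. The slides do not name a distribution, so the Weibull form and every parameter value below are assumptions chosen only to illustrate the trend.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) of a Weibull distribution, for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical read points (hours) and parameters, not taken from the slides.
times = [10.0, 100.0, 1000.0]
infant = [weibull_hazard(t, 0.5, 1000.0) for t in times]    # shape < 1: decreasing rate
useful = [weibull_hazard(t, 1.0, 1000.0) for t in times]    # shape = 1: constant rate
wearout = [weibull_hazard(t, 3.0, 1000.0) for t in times]   # shape > 1: increasing rate

for label, rates in (("infant mortality", infant),
                     ("useful life", useful),
                     ("wear out", wearout)):
    print(label, ["%.2e" % r for r in rates])
```

With shape = 1 the Weibull hazard reduces to the constant rate λ = 1/scale, which is the Phase 2 case described above.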
Bath Tub Curve: Example
An IC manufacturing company encountered this failure rate curve:
[Figure: failure rate versus time]
A program was implemented immediately to significantly improve
the product quality and yield.
[Figure: four candidate curves, (a) to (d)]
What will the new bathtub curve look like?
Case Study
Company X launched a new product. During product qualification,
the design engineer requested that a reliability study be performed.
The table below shows the reliability data.
Time Period   Sample Size   Number of   Cumulative   Failure Rate
(Hours)       on Stress     Failures    Failures     (failures/hour)
24            50            23          23           0.9583
48            27            8           31           0.3333
72            19            3           34           0.1250
96            16            0           34           0
168           16            1           35           0.0139
240           15            0           35           0
360           15            2           37           0.0167
480           13            3           40           0.0250
600           10            1           41           0.0083
720           9             0           41           0
840           9             0           41           0
960           9             0           41           0
1008          9             0           41           0
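The slides do not say how the Failure Rate column was obtained. The tabulated values are consistent with dividing the failures found at each read point by the hours elapsed since the previous read point, so the sketch below assumes that definition; the names `readouts` and `interval_failure_rates` are illustrative.

```python
# (read point in hours, failures found at that read point), copied from the table.
readouts = [(24, 23), (48, 8), (72, 3), (96, 0), (168, 1), (240, 0), (360, 2),
            (480, 3), (600, 1), (720, 0), (840, 0), (960, 0), (1008, 0)]

def interval_failure_rates(readouts):
    """Failures per hour within each interval between consecutive read points."""
    rates = []
    previous_hours = 0
    for hours, failures in readouts:
        rates.append(round(failures / (hours - previous_hours), 4))
        previous_hours = hours
    return rates

rates = interval_failure_rates(readouts)
for (hours, _), rate in zip(readouts, rates):
    print(f"{hours:5d} h  {rate:.4f} failures/hour")
```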
[Figure: Failure Rate versus Time (Hours)]
High infant mortality is demonstrated on the bathtub curve, where
the early life failure is relatively high but rapidly declining over time.
This is undesirable as it would reflect high failure rate at the early
application stage of the end customer.
The high infant mortality correlates with yield loss due to extrinsic
defects. Why?
Failure rate (λ)
$$\lambda = \frac{\text{Total number of failures within a population}}{\text{Total time expended by that population}}$$
$$\lambda = \frac{41}{[23(24) + 8(48) + 3(72) + 1(168) + 2(360) + 3(480) + 1(600) + 9(1008)]\ \text{hrs}} = 0.003117\ \text{failures/hour}$$
= 3,117,000 FITs, or 3,117,000 failures per 10^9 device-hours
Mean Time Between Failures (MTBF)
$$\text{MTBF} = \frac{1}{\lambda} = \frac{1}{0.003117} = 320.8\ \text{hours} = 13.37\ \text{days}$$
Note: 1 FIT (Failure unIT) = one failure in 10^9 device-hours
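The arithmetic above can be checked with a few lines of Python. This is a sketch that follows the slide's bookkeeping: each failed unit is credited with the hours at the read point where it was found, and the 9 survivors are credited with the full 1008 hours. The variable names are illustrative.

```python
# (hours accumulated, number of units) for failed units and for the 9 survivors.
unit_hours = [(24, 23), (48, 8), (72, 3), (168, 1), (360, 2),
              (480, 3), (600, 1), (1008, 9)]
total_failures = 41

device_hours = sum(hours * units for hours, units in unit_hours)
failure_rate = total_failures / device_hours   # failures per device-hour
fits = failure_rate * 1e9                      # failures per 10^9 device-hours
mtbf_hours = 1 / failure_rate

print(device_hours, round(failure_rate, 6), round(mtbf_hours, 1))
# prints: 13152 0.003117 320.8
```

The FIT figure on the slide uses the rate rounded to 0.003117; the unrounded rate gives a slightly higher value, still about 3.117 million FITs.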
Time Period, t    Reliability, R(t) = exp(-λt)    Failure, F(t) = 1 - R(t)
1 month           0.106008075                     0.893991924
2 months          0.011237712                     0.988762287
3 months          0.001191288                     0.998808711
4 months          0.000126286                     0.999873713
5 months          0.000013387                     0.999986612
6 months          0.000001419                     0.999998580
7 months          0.000000150                     0.999999849
8 months          0.000000016                     0.999999984
9 months          0.000000002                     0.999999998
10 months         0                               1
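The tabulated values can be regenerated from the constant-failure-rate model R(t) = exp(-λt). The sketch below assumes a 30-day month (720 hours), which is not stated in the slides but reproduces the one-month value of about 0.106 when the rounded rate of 0.003117 failures/hour is used.

```python
import math

FAILURE_RATE = 0.003117   # failures per hour, rounded value from the slide
HOURS_PER_MONTH = 720     # assumption: 30-day month

def reliability(months, rate=FAILURE_RATE):
    """Probability that a unit survives the given number of months."""
    return math.exp(-rate * months * HOURS_PER_MONTH)

for m in range(1, 11):
    r = reliability(m)
    print(f"{m:2d} months  R(t) = {r:.9f}  F(t) = {1 - r:.9f}")
```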
To grasp the quantitative degree of risk that the products are
exposed to, we have to study the tabulated reliability as well as the
cumulative failure probability over time.
The reliability of the devices is already low after the first month of
operation, at merely 0.11. This implies a cumulative failure probability
of 0.89.
The devices' reliability degrades drastically within a few months and
falls to effectively zero by the 10th month of operation, by which time
the devices have come to the end of life.
[Figures: Reliability Function R(t) and Failure Function F(t) versus Time Period t (Months)]
Types of Accelerating Tests
1. Preconditioning (PRECON)
The preconditioning stress sequence is performed for the purpose of evaluating
the capability of semiconductor devices to withstand the stresses imposed by a
user’s printed circuit board assembly operation. A properly designed device (i.e.
die and package combination) should survive this preconditioning sequence with
no measurable changes in electrical performance. Furthermore, preconditioning
of properly designed devices should not produce latent defects which lead to
degraded reliability during life or environmental stress tests. Changes in electrical
characteristics and both observable as well as latent physical damage during this
stress sequence result principally from mechanical and thermal stresses and from
ingress of flux and cleaning agents. Effects include die and package cracks,
fractured wire bonds, package and lead frame delamination, and corrosion of die
metallization.
Step  Stress                      Conditions
1     Initial Electrical Test     Room temperature
2     External Visual Inspection  40X magnification
3     Temperature Cycling         5 cycles at –40 °C (max) to +60 °C (min) (step is optional)
4     Bake Out                    24 hrs (min) at 125 °C
5     Moisture Soak               Per MSL rating
6     Reflow                      3 cycles per referenced profile
7     Flux Application            10 sec immersion in water-soluble flux at room temperature
8     Cleaning                    Multiple DI water rinses
9     Dry                         Room temperature
10    Final Electrical Test       Room temperature
2. Operating Life (SOPL/DOPL)
The operating life test is performed for the purpose of demonstrating the
quality and reliability of devices subjected to the specified conditions over
an extended time period. Either a static or dynamic condition may be
used, depending on the circuit type and the wafer fabrication technology.
The specified test conditions (i.e. bias conditions, loads, clock inputs,
etc.) are selected so as to represent the worst case conditions for the
device. Unless otherwise specified in the detailed test procedure, the test
is run at an ambient temperature of +125 °C or above. Many device types
are routinely run at +150 °C ambient. Ambient temperatures above +170 °C
are generally considered impractical due to the physical limitations of
circuit boards, sockets, device lead finishes, molding compound glass
transition temperatures, etc.
Stress Conditions: 125 °C – 150 °C, Bias
3. Power Cycle (PRCL)
The power cycle test is performed to determine the effects on solid state
devices of thousands of power-on/power-off operations such as would be
encountered in an automobile or a TV set. The repetitive heating/cooling
effect caused by multiple on/off cycles can lead to fatigue cracks and
other degrading thermal and/or electrical changes in the die attachment
system of those devices which generate significant internal thermal
heating under maximum load conditions (e.g. voltage regulators or high-current
drivers). This test forces junction temperature excursions at the
rate of ~30 cycles per hour (typical).
Stress Conditions:
ΔTj = 100 °C, 2-minute cycle
4. High Temperature Gate Bias Test (HTGB)
The HTGB test biases gate or other oxides of the device samples. The
devices are normally operated in a static mode at, or near, maximum-rated
oxide breakdown voltage levels. The particular bias conditions
should be determined to bias the maximum number of gates in the
device. The HTGB test is typically used for power devices.
Stress Conditions:
Tj = 150 °C, Biased
5. High Temperature Reverse Bias Test (HTRB)
The HTRB test is configured to reverse bias major power handling
junctions of the device samples. The devices are characteristically
operated in a static operating mode at, or near, maximum-rated
breakdown voltage and/or current levels. The particular bias
conditions should be determined to bias the maximum number of
solid state junctions in the device. The HTRB test is typically applied
on power devices.
Stress Conditions:
Tj = 150 °C, Biased
6. Temperature Humidity Biased Test (THBT)
The steady-state temperature-humidity-bias life test is performed for the
purpose of evaluating the reliability of non-hermetic packaged devices
operating in humid environments. It employs severe conditions of
temperature, humidity, and bias which accelerate the penetration of moisture
through the external protective material (encapsulant or seal) or along the
interface between the external protective materials and the metallic
conductors passing through it. When moisture reaches the surface of the
die, the applied potential forms an electrolytic cell, which corrodes the
aluminum, affecting DC parameters through its conduction, and eventually
causes catastrophic failure by opening the metal. The presence of
contaminants such as chlorine greatly accelerates the reaction as does
excessive phosphorus in the PSG layers (passivation, dielectric or field
oxide).
Stress Conditions:
85% RH, 85 °C
7. Highly Accelerated Stress Test (HAST)
The HAST test is performed for the purpose of evaluating the
moisture resistance of non-hermetic packaged devices operating in
high humidity environments. Bias is applied minimizing current
draw using alternating potentials wherever possible. The test
approximates a highly accelerated version of the THBT test. These
severe conditions of pressure, humidity, and temperature, together
with bias, accelerate the penetration of moisture through the
external protective material (encapsulant or seal) or along the
interface between the external protective material and the metallic
conductors passing through it. When moisture reaches the surface
of the die, the applied potential forms an electrolytic cell which
corrodes the metallization, affecting the DC parameters and
eventually causing catastrophic failure by opening the metal. The
presence of contaminants, such as chlorine, greatly accelerates the
reaction, as does excessive phosphorus in the PSG layers
(passivation, dielectric or field oxide). The use of HAST as a stress
technique should be avoided when stressing assembly packages
that have mold compound and die attach materials with Tg of less
than 130 °C, since uncharacteristic failures may result.
Stress Conditions:
130 °C, 85% RH, 19.5 psig
8. Autoclave (ACLV)
The autoclave (or pressure cooker) test is performed for the purpose of
evaluating the moisture resistance of non-hermetic packaged devices. No bias
is applied to the devices during this test. It employs severe conditions of
pressure, humidity and temperature not typical of actual operating
environments that accelerate the penetration of moisture through the external
protective material (encapsulant or seal) or along the interface between the
external protective material and the metallic conductors passing through it.
When moisture reaches the surface of the die, reactive agents cause leakage
paths on the surface of the die and corrode the die metallization, affecting DC
parameters and eventually causing catastrophic failure. Other die-related failure
mechanisms are activated by this method including mobile ionic contamination
and various temperature and moisture related phenomena. The autoclave test
is destructive and produces increasing failure rates when repetitively applied. It
is useful for short-term, comparative evaluations such as lot acceptance,
process monitors and qualifications but generates no absolute information
since accelerating factors relating the test conditions to those of the operating
environment are not well established.
In addition, the autoclave test can produce spurious failures not
representative of device reliability, due to excessive chamber
contaminants. This condition is usually evidenced by severe external
package degradation, including corroded device terminals/leads or the
formation of conducting matter between the terminals, or both. The
autoclave test is not, therefore, suitable for measurements of external
package quality or reliability. Specific device types and technologies may
be particularly sensitive to package degradation. The ACLV test should not be
included in the qualification of laminate or tape based packages, e.g. FR4,
polyimide tape, or equivalent. However, during the development stages
of a laminate or tape based package, ACLV can be used to understand
inherent weaknesses in the package. Caution must be taken when
interpreting results because failure mechanisms may be due to
exceeding the capabilities of the package, producing unrealistic material
failures.
Stress Conditions:
100% RH, 121 °C, 2 atm
9. Temperature Cycle (TMCL)
The temperature cycle test is conducted for the purpose of determining the
resistance of devices to alternating exposures at extremes of high and low
temperatures. Permanent changes in electrical characteristics and physical
damage produced during temperature cycling result principally from mechanical
stress caused by thermal expansion and contraction. Effects of temperature
cycling include cracking and delamination of packages, cracking or cratering of
die, cracking of passivation, delamination of metallization, and various other
changes in the electrical characteristics resulting from thermo-mechanically
induced damage.
Stress Conditions:
-40 °C to +125 °C or -65 °C to +150 °C
10. Board Level Temperature Cycle (BTMCL)
The BTMCL test is intended to provide fatigue-related wearout
information on the solder joint attachment of devices to circuit boards.
Daisy chain structure test devices are mounted to circuit boards and
cycled through temperature extremes typically in the range of 0 °C to
+100 °C. During stress, the solder joint resistance is continuously
monitored and a unit is considered to have failed when 5 cumulative incidences
of elevated resistance (> 1000 ohms) are detected. Ideally, testing
should continue until a cumulative 63% failure rate of the test sample has
been observed.
Stress Conditions:
0 °C to +100 °C, 2 cycles/hour
11. High Temperature Storage Life (HTSL)
The high temperature storage life test (also called the stabilization bake test) is
employed for the purpose of determining the effects of storing devices
at elevated temperatures without electrical stresses applied. Devices
under test are subjected to continuous storage in a chamber with
circulated air heated to +150 °C. At the end of the specified stress
period, the devices are removed from the chamber, allowed to cool,
and electrically tested. Interim measurements are made if specified in
the detailed test procedure.
Stress Conditions:
150 °C
12. Moisture Sensitivity (MOIS)
The purpose of this stress is to classify the sensitivity of non-hermetic solid
state Surface Mount Devices (SMDs) to moisture-induced stress so that
they can be properly packaged, stored, and handled to avoid subsequent
thermal/mechanical damage during the assembly solder reflow
attachment and/or repair operation.
Step  Stress                      Conditions
1     Initial Electrical Test     Per data sheet
2     External Visual Inspection  40X magnification
3     CSAM Inspection             Classify and measure initial delamination levels
4     Bake Out                    24 hrs (min) at 125 °C
5     Moisture Soak               Per target MSL rating
6     Reflow                      3 cycles per referenced profile
7     External Visual Inspection  40X magnification
8     Final Electrical Test       Per data sheet
9     CSAM Inspection             Classify and measure final delamination levels
10    Final Electrical Test       Room temperature
Case Study
Imagine that you are a reliability engineer in a memory IC
manufacturing firm. Your company's design team has just come out
with a new package. In this new package, solder bumps are
replaced with silver-filled adhesive, while copper wire is used
instead of gold wire. You are required to propose appropriate
accelerating tests for the new package qualification.
Reliability Modeling for Failure Mechanisms
Failure Mechanism Reliability Modeling, or reliability modeling, or
acceleration modeling, or simply modeling, is the mathematical representation
of a failure mechanism in terms of a set of algebraic or differential equations
from the perspective of its reliability implications. The term failure mechanism
refers to the actual physical phenomenon behind a failure occurrence.
Modeling is a means of determining and understanding the different variables
or factors that bring out and accelerate a failure mechanism.
Being able to model a mechanism and quantify how it is affected by various
environmental factors will allow a reliability engineer to develop appropriate
reliability tests for estimating field failure rates and predicting when failures will
begin to occur. Modeling is often expressed in the form of time to failure, or tf,
or the acceleration factor, AF.
Failure Mechanism                                      Ea (eV)
Package crack                                          1.0
Lifted/broken ball/wedge/neck                          0.45 – 0.75
Bond inter-metallic failure (bromine induced)          1.0
Bond inter-metallic failure (chlorine induced)         0.8
Die attach failure                                     0.3
Delamination between mold compound and lead frame      0.3
Lifted welds at lead frame                             0.45
Metal corrosion (halide induced)                       0.7
Metal corrosion (electrolytic)                         0.3 – 0.6
Device leakage (surface inversion)                     1.0
Dielectric breakdown                                   0.35
Hot carrier trapping in oxide                          -0.06
Electromigration in Al                                 0.5
Contact electromigration, Si in Al                     0.9
Contact electromigration (Al at sidewall)              0.8 – 1.4
Contact metal migration through barrier layer          1.8
Au-Al intermetallic growth                             1.0
1. Thermal Effects (Arrhenius Equation)
$$TTF = A_o \, e^{E_a/(kT)}$$
TTF: Time to Failure
Ao: Arbitrary Scale Factor
Ea: Activation Energy (eV)
k: Boltzmann's Constant (8.62 × 10^-5 eV/K)
T: Temperature (kelvin)
Acceleration Factor: (Ratio of TTF values, use/stress)
$$AF = \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$
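A short sketch of the Arrhenius acceleration factor, using the Boltzmann constant quoted in the slides. The function name and the example inputs (Ea = 0.7 eV, 55 °C use, 125 °C stress) are hypothetical and only show how the formula is applied.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # eV/K, value quoted in the slides

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor between use and stress temperatures given in °C."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1 / t_use_k - 1 / t_stress_k))

# Hypothetical example: Ea = 0.7 eV, 55 °C use condition, 125 °C stress condition.
af = arrhenius_af(0.7, 55, 125)
print(f"AF = {af:.1f}")
print(f"1000 stress hours represent about {1000 * af:.0f} use hours")
```

Because Ea sits in the exponent, the result is very sensitive to the activation energy chosen, so the value should match the failure mechanism of interest.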
2. Electromigration (Black's Equation)
$$TTF = A \, J^{-N} \, e^{E_a/(kT)}$$
TTF: Time to Failure
A: Prefactor with complex dependence upon grain size, line structure and
geometry, test conditions, current density, thermal history, etc.
J: Current Density
N: Empirically determined factor ranging from 1 to 14. The industry accepted
number today is N = 2 for current densities in the range of 1 to 2 × 10^6 A/cm^2.
Ea: Activation Energy (eV)
k: Boltzmann's Constant (8.62 × 10^-5 eV/K)
T: Temperature (kelvin)
Acceleration Factor:
$$AF = \left(\frac{J_{use}}{J_{stress}}\right)^{-N} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$
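The electromigration acceleration factor combines a current-density term with the Arrhenius term. The sketch below takes AF as the ratio of time to failure at use to time to failure at stress; the function name and example inputs are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # eV/K, value quoted in the slides

def black_af(j_use, j_stress, t_use_k, t_stress_k, ea_ev, n=2.0):
    """Electromigration acceleration factor, temperatures in kelvin."""
    current_term = (j_use / j_stress) ** (-n)
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV_PER_K)
                            * (1 / t_use_k - 1 / t_stress_k))
    return current_term * thermal_term

# Hypothetical case: current density doubled at the same temperature, N = 2.
af_current_only = black_af(1e6, 2e6, 358.15, 358.15, ea_ev=0.5)
print(af_current_only)  # the thermal term is 1, so only the current term remains
```

With N = 2, doubling the current density alone shortens the time to failure by a factor of four.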
3. Temperature Humidity Mechanisms (Peck's Model)
$$TTF = A_o \, RH^{-N} \, e^{E_a/(kT)}$$
TTF: Time to Failure
Ao: Arbitrary Scale Factor
RH: Relative Humidity (%)
N: An experimentally determined constant
Ea: Activation Energy (eV)
k: Boltzmann's Constant (8.62 × 10^-5 eV/K)
T: Temperature (kelvin)
Acceleration Factor: (Ratio of TTF values, use/stress)
$$AF = \left(\frac{RH_{use}}{RH_{stress}}\right)^{-N} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$
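As an illustration, the humidity model can be used to relate hours in an 85 °C / 85% RH test to hours at a milder use condition. All inputs below (the use condition, Ea = 0.7 eV, N = 3) are hypothetical placeholders, not values from the slides.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # eV/K, value quoted in the slides

def peck_af(rh_use, rh_stress, t_use_c, t_stress_c, ea_ev, n):
    """Temperature-humidity acceleration factor, RH in percent, temperatures in °C."""
    humidity_term = (rh_use / rh_stress) ** (-n)
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV_PER_K)
                            * (1 / (t_use_c + 273.15) - 1 / (t_stress_c + 273.15)))
    return humidity_term * thermal_term

# Hypothetical inputs: 85 °C / 85% RH stress versus 30 °C / 60% RH use.
af = peck_af(rh_use=60, rh_stress=85, t_use_c=30, t_stress_c=85, ea_ev=0.7, n=3.0)
equivalent_use_hours = 1000 * af  # use hours represented by 1000 stress hours
print(f"AF = {af:.0f}, 1000 stress hours ~ {equivalent_use_hours:.0f} use hours")
```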
4. Temperature and Voltage Mechanisms (Eyring Equation)
$$TTF = A_o \, V^{-N} \, e^{E_a/(kT)}$$
TTF: Time to Failure
Ao: Arbitrary Scale Factor
V: Voltage (V)
N: An experimentally determined constant
Ea: Activation Energy (eV)
k: Boltzmann's Constant (8.62 × 10^-5 eV/K)
T: Temperature (kelvin)
Acceleration Factor: (Ratio of TTF values, use/stress)
$$AF = \left(\frac{V_{use}}{V_{stress}}\right)^{-N} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$
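A sketch of the combined voltage and temperature acceleration factor, with the voltage term and the thermal term returned separately so their contributions can be compared. The function name and the example values are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.62e-5  # eV/K, value quoted in the slides

def voltage_temperature_af(v_use, v_stress, t_use_k, t_stress_k, ea_ev, n):
    """Return (voltage term, thermal term, total AF); temperatures in kelvin."""
    voltage_term = (v_use / v_stress) ** (-n)
    thermal_term = math.exp((ea_ev / BOLTZMANN_EV_PER_K)
                            * (1 / t_use_k - 1 / t_stress_k))
    return voltage_term, thermal_term, voltage_term * thermal_term

# Hypothetical case: stress voltage twice the use voltage, same temperature, N = 3.
v_term, t_term, total = voltage_temperature_af(5.0, 10.0, 300.0, 300.0, 0.7, 3.0)
print(v_term, t_term, total)
```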
5. Creep Mechanisms:
$$TTF = B_o \, (T_o - T)^{-n} \, e^{E_a/(kT)}$$
TTF: Time to Failure
Bo: Process Dependent Constant
To: Stress-free temperature for metal (~metal deposition temperature for aluminum)
T: Temperature (kelvin)
n: n = 2-3 (n usually ~5 if creep, thus implies T < Tm/2)
Ea: Activation energy (eV)
k: Boltzmann's Constant (8.62 × 10^-5 eV/K)
Acceleration Factor: (Ratio of TTF values, use/stress):
$$AF = \left(\frac{T_o - T_{stress}}{T_o - T_{use}}\right)^{n} \exp\left[\frac{E_a}{k}\left(\frac{1}{T_{use}} - \frac{1}{T_{stress}}\right)\right]$$
6. Thermo-Mechanical Mechanisms (Coffin-Manson Equation)
$$N_f = C_o \, (\Delta T)^{-n}$$
Nf: Number of cycles to failure
Co: A material dependent constant
ΔT: Entire temperature cycle range for the device
n: An experimentally determined constant
Acceleration Factor: (Ratio of Nf values, use/stress)
$$AF = \left(\frac{\Delta T_{use}}{\Delta T_{stress}}\right)^{-n}$$
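The Coffin-Manson acceleration factor depends only on the ratio of temperature swings. In the sketch below the stress swing corresponds to the -40 °C to +125 °C temperature cycle condition listed earlier, while the 50 °C use swing and n = 4 are hypothetical.

```python
def coffin_manson_af(delta_t_use, delta_t_stress, n):
    """Ratio of cycles to failure at use to cycles to failure at stress."""
    return (delta_t_use / delta_t_stress) ** (-n)

# Stress swing: -40 °C to +125 °C (165 °C). Use swing and exponent are assumed.
af = coffin_manson_af(delta_t_use=50.0, delta_t_stress=165.0, n=4.0)
equivalent_use_cycles = 1000 * af  # use cycles represented by 1000 stress cycles
print(f"AF = {af:.1f}, 1000 stress cycles ~ {equivalent_use_cycles:.0f} use cycles")
```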
Reliability Modeling: Example 1
Reliability Modeling: Example 2
Failure Analysis
Semiconductor Failure Analysis
Semiconductor Failure analysis (FA) is the process of determining how or why
a semiconductor device has failed, often performed as a series of steps known
as FA techniques. Device failure is defined as any non-conformance of the
device to its electrical and/or visual/mechanical specifications. Failure analysis is
necessary in order to understand what caused the failure and how it can be
prevented in the future.
Electrical failure can either be functional or parametric. Functional failure refers
to the inability of a device to perform its intended function. Parametric failure
refers to the inability of a device to meet the electrical specifications for a
measurable characteristic (such as leakage current) that does not directly pertain
to functionality. Thus, a parametric failure may be present even if the device is
still functional or able to perform its intended function.
For example, a DAC that can convert digital data into the correct analog voltage
but draws excessive supply current is a parametric failure, whereas one that does
not convert data at all is a functional failure. A device is said to be failing
catastrophically if it is grossly failing all parametric and functional test blocks.
Failure analysis starts with failure verification. It is important to validate the
failure of a sample prior to failure analysis in order to conserve valuable FA
resources. Failure verification is also done to characterize the failure
mode. Good characterization of the failure mode is necessary to make the FA
efficient and accurate.
After failure verification, the analyst subjects the sample to various FA techniques
step by step, collecting attributes and other observations along the way.
Non-destructive FA techniques are done before destructive ones. Also, the results of
these various FA techniques must be consistent or corroborative. Any
inconsistency in results must be resolved before proceeding to the next step.
In general, the results of the various FA techniques would collectively point to the
real failure site. The FA process is finished once there is enough information to
make a conclusion about the location of the failure site and the cause or mechanism
of failure.
FA Terminology
Failure Mode - a description of how a device is failing, usually in terms of how
much it is deviating from the specification that it is failing, e.g., excessive supply
current, excessive offset voltage, excessive bias current
Failure Mechanism - the physical phenomenon behind the failure of a device,
e.g., metal corrosion, electrostatic discharge, electrical overstress
Root Cause - the first event or condition that triggered, whether directly or
indirectly, the occurrence of the failure, e.g., improper equipment grounding that
resulted in ESD damage, a system problem that caused the usage of an
incorrect mask set
The objective of a failure analyst when conducting FA is to determine the failure
mechanism that led to the failure mode of the device. Once the failure
mechanism has been determined, the process owner or expert can work with
the failure analyst to determine the root cause of the problem. The process
owner must always address the root cause of the failure mechanism, not just
the intermediate failure causes that occurred after the root cause has already
happened.
Basic FA Capability
FA Equipment                        Purpose                    Estimated Cost (USD)
Curve Tracer and Break-out Boxes    I/V Curve Tracing          10K - 20K
Digital Multimeters                 Bench Testing              5K - 10K
Power Supplies/Signal Generators    Bench Testing              5K - 10K
Oscilloscope                        Bench Testing              10K - 20K
Low-power Microscope                Optical Inspection         5K - 10K
High-power Microscope               Optical Inspection         10K - 50K
Fume Hood and Glassware             Manual Decapsulation       5K - 15K
Vise and Assorted Hand Tools        Mechanical Decapsulation   <1K
Good FA Capability
FA Equipment                        Purpose                    Estimated Cost (USD)
Microprobing Station                Probing / Thermography     40K - 100K
Laser Cutter                        Circuit Isolation          20K - 40K
Microsectioning Equipment           Cross-sectioning           40K - 80K
High-Resolution X-ray System        X-ray Radiography          100K - 200K
Automatic Decapsulation System      Decapsulation              20K - 30K
Reactive Ion Etcher                 Die Delayering             40K - 60K
SEM/EDX System                      SEM / EDX Analysis         300K - 400K
Advanced FA Capability
FA Equipment                        Purpose                    Estimated Cost (USD)
Light Emission Microscope           Light Emission Microscopy  150K - 200K
Scanning Acoustic Microscope        Acoustic Microscopy        200K - 250K
Focused Ion Beam Machine            FIB Analysis               500K - 700K
FTIR Spectrometer                   FTIR Analysis              60K - 80K
TEM System                          TEM Inspection             400K - 500K
Auger Spectrometer                  Auger Analysis             ~1M
FA Flow (Level I Analysis)
[Flowchart. Boxes in order of appearance: Receipt of Failure Devices; Failure Verification (Curve Trace, Parametric Test, Functional Bench Test); External Visual Inspection; X-ray Analysis; External Package Cleaning & Retest; Bake Recoverability Test; SAT; Decapsulation; Internal Visual Inspection / SEM / EDX Analysis; Data Review; Failure Mechanism Determined; Failure Mechanism Undetermined; Level II Analysis; Generate Final Report.]
FA Flow (continued from Level I Analysis)
[Flowchart. Boxes in order of appearance: Internal Visual Inspection (Optical Microscope / SEM); Non-Visible Failure Mode; Visible Failure Modes; Electrical Microprobing; TIVA / LIVA Analysis; Liquid Crystal Analysis; Data Review; SEM / EDX Analysis; Cross-sectioning; Deprocessing; Internal Visual Inspection (Optical Microscope / SEM); Failure Mechanism Determined; Final Report.]
Structural Diagram of an IC Package
Common Failure Mechanisms in Microelectronics Devices
Ball Bond Lifting
Detachment of the ball bond from the silicon chip; also refers to non-sticking
of the ball bond to the bond pad.
Common Causes: contamination on the bond pad, incorrect wire bond
parameter settings, instability of the die during bonding, bond pad corrosion,
excessive bond pad probing, Kirkendall voiding, excessive thermal stress
resulting in excessive intermetallic formation, bond pad metallization/barrier
metallization lifting, cratering.
Wedge Bond Lifting
Detachment (or non-sticking) of the wedge bond from the silicon chip,
bonding post, or lead finger.
Common Causes: contamination on the bond pad or lead finger, incorrect
wire bond parameter settings, instability of the die or lead frame during
bonding, bond pad or lead finger corrosion, excessive bond pad probing.
52
Ball Bond Neck Break
Breakage of the wire at the neck of the Au ball bond.
Common Causes: incorrect wire bond parameter settings, incorrect wire
looping, die-to-package delamination, excessive wire sweeping during
molding.
Wedge Bond Heel Break
Breakage of the wire at the heel of the Al wedge bond.
Common Causes: incorrect wire bond parameter settings,
incorrect wire looping, lead finger-to-package delamination,
excessive wire sweeping during molding.
Midspan Wire Break
Breakage along the span of the wire.
Common Causes: wire nicks or damage,
wire corrosion, tight wire looping,
excessive wire sweeping, electrical overstress.
53
Bond-to-Metal Shorting
Electrical shorting between the bond and a metal line on the die.
Common Causes: incorrect wire bond parameter settings, incorrect bond
placement, insufficient bond pad-to-metal distance.
Bond-to-Bond Shorting
Electrical shorting between two bonds.
Common Causes: incorrect wire bond parameter settings, incorrect bond
placement, insufficient bond pad-to-bond pad distance.
Wire-to-Wire Shorting
Electrical shorting between two wires.
Common Causes: incorrect wire looping,
excessive wire sweeping, insufficient wire-to-wire distance.
54
Cratering
Silicon damage under the bond pad, the worst of which is when a chunk
of silicon is completely detached from the active circuit.
Common Causes: incorrect wire bond parameter settings,
excessive bond pad probing.
Die Chip-outs
Die chipping is a failure mechanism wherein a part or parts of the die
break away from the die itself.
Die Cracking
Die cracking is the occurrence of fractures in or on any part of the die.
55
Die Lifting
Die lifting is the separation or detachment of the die from its die pad or die
cavity.
Die Scratches
A die scratch is abrasion or laceration damage
on the surface of the die.
Package Cracking
Ceramic/plastic package cracking is the occurrence of
fractures anywhere in or on the package.
Plastic Package Delamination
Plastic Package Delamination refers to the disbonding between a surface of
the plastic package and that of another material. Plastic delamination may
therefore occur at an interface of the plastic and the lead frame, die, die
paddle, or die attach material.
56
Corrosion in Die and Package
Corrosion is the degradation of metals as a result of electrochemical activity.
Lead Corrosion
Lead corrosion is often due to inadequate lead finish, the presence of
contaminants on the leads, and exposure of the leads to excessive moisture. It
can be accelerated by higher temperatures and the presence of electrical bias
on the leads.
Lead Frame Corrosion
Similar to lead corrosion, but occurring on the lead frames as received from the
material supplier.
Wire Corrosion
Bond wire corrosion can, in gross cases, lead to wire breakage or even total
disintegration of the wire. It is most commonly encountered in aluminum wires
that have been contaminated by chlorine.
57
Contact Migration
Contact migration refers to the diffusion of the metal atoms of a contact
(usually Al or an alloy thereof) into the Si substrate.
Electromigration
Electromigration refers to the gradual displacement or mass transport of the
metal atoms of a conductor as a result of current flowing through that
conductor. It can lead to the formation of voids and hillocks in the metal
line, which may cause open and short circuits, respectively.
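The slides do not give a lifetime model for electromigration. One widely used empirical model, which is not part of the original slides, is Black's equation: MTTF = A * J^(-n) * exp(Ea / (k * T)), where J is the current density and T the absolute temperature. The sketch below compares two operating conditions so that the prefactor A cancels. The default exponent n = 2 and activation energy Ea = 0.7 eV are illustrative assumptions only, and mttf_ratio is a hypothetical helper name.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def mttf_ratio(j_stress, t_stress_k, j_use, t_use_k, n=2.0, ea_ev=0.7):
    """Return MTTF(use) / MTTF(stress) under Black's equation.

    j_stress, j_use: current densities in the same (arbitrary) unit.
    t_stress_k, t_use_k: absolute temperatures in kelvin.
    n, ea_ev: current-density exponent and activation energy (assumed values).
    """
    current_term = (j_stress / j_use) ** n
    thermal_term = math.exp(
        (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    )
    return current_term * thermal_term


if __name__ == "__main__":
    # Same temperature, half the current density: lifetime is 4x longer for n = 2.
    print(mttf_ratio(j_stress=2.0, t_stress_k=400.0, j_use=1.0, t_use_k=400.0))
```

The ratio shows why both current density and temperature matter: lowering either one at the use condition lengthens the expected time to failure relative to the stress condition.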
Dielectric Breakdown
Dielectric breakdown refers to the destruction of a dielectric layer, usually as
a result of excessive potential difference or voltage across it. It is usually
manifested as a short or leakage at the point of breakdown.
58
Time-Dependent Dielectric Breakdown
Early life and time-dependent dielectric breakdowns are primarily due to the
presence of weak spots within the oxide layers arising from poor
processing or uneven growth. These weak spots or dielectric defects may
be caused by mobile sodium ions in the oxide and by impurities trapped on
the Si surface prior to oxidation.
Electrical Overstress (EOS)
EOS refers to the destruction of the circuit because of excessive voltage,
current, or power. EOS damage is usually very obvious. Metal lines are
discolored, burnt or melted.
Electrostatic Discharge (ESD)
ESD can occur when a high electrostatic field develops between two objects
in close proximity. An ESD event can damage a device in many ways, e.g.,
conductor fusing, metal-resistor severing, and junction damage.
59
Ionic Contamination
Mobile ionic contamination refers to the presence of mobile ions such as Na+,
Cl-, and K+ in the device structures of an integrated circuit. These mobile ions
can come from the environment, humans, wafer processing materials, and
packaging materials.
Mobile ionic contamination is commonly observed in the gate oxide of a MOS
transistor. These ions can accumulate and cause charge build-ups that can
shift the gate threshold of the MOS transistor. Inversion channels may also form
in MOS transistors.
In bipolar devices, mobile ions can affect carrier
concentrations, changing the beta of the transistor.
Mobile ions respond to temperature and voltage, so failures due to mobile ionic
contamination can be accelerated by burn-in. Mobile ionic contamination
failures can also be made to recover by subjecting the device to unbiased bake,
since this will redistribute the ions by promoting their random movement. Thus,
a device is most likely a mobile ionic contamination failure if it fails after burn-in
but recovers after unbiased bake.
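The rule of thumb in the last sentence can be written as a small check. This is only a sketch of that screening heuristic: the function and parameter names are hypothetical, and a True result means "most likely", not a confirmed root cause.

```python
def likely_mobile_ion_failure(fails_after_burn_in, recovers_after_unbiased_bake):
    """Apply the heuristic from the slide.

    A device is most likely a mobile ionic contamination failure if it
    fails after burn-in but recovers after an unbiased bake.
    """
    return fails_after_burn_in and recovers_after_unbiased_bake


if __name__ == "__main__":
    # Fails after burn-in and recovers after bake: suspect mobile ions.
    print(likely_mobile_ion_failure(True, True))
    # Fails after burn-in but stays failed after bake: look for another mechanism.
    print(likely_mobile_ion_failure(True, False))
```

A device that fails after burn-in and does not recover after the unbiased bake points away from mobile ions and toward a permanent damage mechanism.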
60
Ionic Contamination
61
Future Challenge for Failure Analysis
62
Evolution of IC Packages
[Figure: package outlines of decreasing size, labeled 27.90 x 9.14 mm, 9.90 x 3.90 mm, 5.50 x 4.40 mm, 2.20 x 1.35 mm, and 1.45 x 1.00 mm, followed by "Future ?"]
63
As a microelectronics engineering student,
what can you become after graduating?
1. Process Engineer
2. Test Engineer
3. Product Engineer
4. Circuit Design Engineer
5. Package Development Engineer
6. R&D Engineer
7. Reliability Engineer
8. Failure Analysis Engineer
9. Others
64
Career as a Reliability Engineer
- Create qualification plans for new packages and processes, and reliability
monitoring plans for continuous improvement.
- Define internal stress/test procedures to support new/emerging technologies,
and develop/implement the required procedural specifications.
- Regularly update and maintain the reliability specifications.
- Provide technical support and training for the reliability specialists.
- Acquire new equipment to increase the reliability capability.
- Develop the necessary FMDRC (Failure Mechanism Driven Reliability
Characterization) techniques to characterize/model new technology. Work with
the modeling group to develop analytical models for potential failure mechanisms.
65
Career as a Failure Analysis Engineer
- Define failure analysis procedures to support new/emerging technologies,
and develop/implement the required procedural specifications.
- Provide technical guidance and compile the analysis specifications.
- Regularly update and maintain the failure analysis specifications.
- Provide technical support and training for the failure analysis specialists.
- Acquire new equipment to increase the failure analysis capability.
- Handle plant engineering evaluations, new qualifications, quality issues, and
customer returns.
- Provide precise and comprehensive failure analysis reports to the requestors
and assist them in finding the root cause.
66
Thank You
67