POWER CONVERSION RELIABILITY
WHITE PAPER: TW0059
December 2008
G. Mulcahy
About the Author
GARY MULCAHY
Gary Mulcahy is Chief Technology Officer of TDI Power. He
received his BE-EE from New York University followed by graduate
study at the Polytechnic Institute of New York. Mr. Mulcahy is a
recognized authority in power conversion technology and the
design, development and production of power systems for
maximum performance and reliability with minimal life cycle cost
of ownership.
Executive Summary
Power conversion equipment reliability is crucial to the success of
modern electronic systems. Best-in-class reliability assurance
practices, based on a Total Quality Management philosophy and
including HALT and HASS testing, provide power system reliability
levels well above the traditional expectations of military and
telecommunication industry standards.
The Need for Power Conversion Reliability
Electronic equipment is often fundamental to business and commerce, where loss of
function, service or data results in large monetary penalties and/or compromised safety.
Quite often, systems are designed with redundancy so that no single failure event shuts
the system down. However, even with redundancy, it is crucial that system operation
not be compromised through propagation of a single point failure to other equipment via
safety, fire, smoke, noise, or other issues. Power conversion equipment reliability is vital
to realizing this goal.
Premature failure or wear out of power conversion equipment continues to be a major
concern in various industries and applications. By its nature, power conversion
equipment is subject to unique reliability challenges. Energy density, operating
voltages, size and weight of components, thermal management, size and routing of
conductors, and relatively low production volumes on unique designs are just a few of
the issues that must be dealt with in providing a reliable and dependable power
conversion system.
Historic Approaches to Power Conversion Reliability
Historically, power conversion reliability was pursued through an “after the fact” approach of
inspection, unit-by-unit parametric testing and static burn-in. Military products often
required JAN-TX approved parts, in the hope that parts manufactured in a highly
controlled and traceable manner would result in a more reliable end product.
Eventually it became apparent that these approaches were not effective. No amount of
inspection or unit-level testing was enough to overcome a design that was either overly
complex or had inherent flaws that caused components to be used beyond their
practical capabilities.
To address these limitations, more focus was placed on
understanding the underlying problems that cause non-reliable operation and addressing
these at the design stage of the product.
Numerous study groups were formed to determine best practices toward achieving
power conversion reliability, such as the Naval Ad-hoc Committee for Power Supply
Reliability, formed in the early 1980s. The result of these efforts was the identification
of a series of best practices, such as those released in document “NAVMAT P4855A” and
eventually superseded by the current document “NAVSO P-3641A”.
Figure 1 – NAVSO P-3641A and its predecessor document NAVMAT P4855A
The NAVMAT standard includes, among other things, the following recommendations:
− A comprehensive understanding of the unit’s intended environment and a specification that adequately anticipates it
− Budgeting adequate time, monetary and personnel resources, and physical space for the power system
− Effective design reviews throughout the product’s design phase
− Design for Manufacturability (DFM) considerations
− Design Verification (i.e., does the design meet its specification?) and Validation (i.e., does the design work properly in its end application?) testing during the development phase
− Highly Accelerated Life Testing during the development phase
− Environmental Stress Screening as part of the production process
The Evolution of Reliability Assurance
These recommendations serve as the basis for an all-encompassing process of reliability
assurance. Starting from where NAVMAT left off, and based on the principles of Total
Quality Management, this process addresses the causes of unreliable operation
throughout the product’s design and fulfillment phases. As depicted in Figure 2, to
achieve the goal of product reliability, the product must be surrounded by a process that
prevents any escapes. Any gap in the process may allow an escape that can result in
unreliable operation.
Figure 2 – Comprehensive Reliability Process
Total Quality Management
Total Quality Management, or TQM, is the overriding philosophy toward achieving
reliable products. A process built around TQM looks at all facets of product realization
and optimizes these with respect to a balanced agenda of performance, cost and time.
Crucial to TQM is a cross-functional approach to product design and manufacturing.
This demands that all stakeholders in the product, such as Engineering, Manufacturing,
Quality, Materials and the end-customer, be involved at the design phase so that the
end product will adequately incorporate their concerns. In this way, issues such as
Design for Manufacturability or Design for Test can be properly incorporated.
Conservative Design Margins
Experience has shown that commercial parts can provide superb reliability if adequate
margin is provided between their worst-case operating point and their rated capabilities.
Maximum part ratings are often influenced by the competitive nature of the
marketplace, where manufacturers push the maximum ratings of their parts to the
point where reliability begins to suffer. Providing adequate design margin lowers
failure rates enough to support extremely high fielded reliability. The US Navy’s document
NAVSO P-3641A presents what many consider best-in-class recommendations for
component derating.
A general rule of thumb in most systems is that as temperature increases, reliability
decreases. The Arrhenius model is often accepted as an accurate predictor of the
reliability of semiconductors and other devices. This model covers many of the
non-mechanical (that is, non-material-fatigue) failure modes that cause electronic
equipment failure. It is particularly useful in describing failure mechanisms that depend
on chemical reactions, diffusion or migration processes. The model suggests that the rate
at which a reaction occurs is given by the following equation:
$$R(T) = A \, e^{-E_A/(\kappa T)}$$
where A is a constant, EA is the activation energy of the reaction, κ is Boltzmann’s
constant and T is the absolute temperature in kelvins. The model predicts that as
temperature increases, the failure rate increases, as depicted in Figure 3.
[Plot: Rate[(temperature + 273) K] on a logarithmic axis (0.1 to 100) versus temperature (50 to 100 °C)]
Figure 3 - Illustration of Rate to Failure versus Temperature based on the
Arrhenius Model
Once the activation energy for a specific failure mechanism is known, the effect of
increased temperature on the rate of reaction can be expressed as:
$$\frac{\text{Failure rate at } T_1}{\text{Failure rate at } T_2} = \exp\left[\frac{E_A}{\kappa}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)\right]$$
The typical activation energy for failure mechanisms of components found within
electronic power supplies is on the order of 0.5 to 1.5 electron volts. Thus, a 10 °C
increase in temperature can correlate to a two- to eight-fold increase in various
component failure rates (the exact factor also depends on the starting temperature),
with a corresponding reduction in unit reliability.
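To make the arithmetic concrete, the minimal Python sketch below evaluates the failure-rate ratio from the equation above for a 10 °C rise across the quoted range of activation energies. The 25 °C starting temperature and the function name are illustrative assumptions, not values from this paper. Because the ratio depends on the absolute temperatures as well as on EA, running it from a different baseline gives somewhat different multipliers.

```python
import math

# Boltzmann's constant expressed in electron volts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(ea_ev: float, t_low_c: float, t_high_c: float) -> float:
    """Return failure rate at t_high_c divided by failure rate at t_low_c.

    Implements exp[(EA / k) * (1/T_low - 1/T_high)], with temperatures
    converted from degrees Celsius to kelvins.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_low_k - 1.0 / t_high_k))


if __name__ == "__main__":
    # Assumed baseline of 25 C (not specified in the paper), raised by 10 C.
    for ea in (0.5, 1.0, 1.5):
        factor = arrhenius_acceleration(ea, 25.0, 35.0)
        print(f"EA = {ea:.1f} eV: failure rate multiplied by {factor:.1f}")
```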
In response to this observation, NAVSO is particularly critical of component operating
temperatures. Typical of the NAVSO guidelines is a limit of 110 °C maximum junction
temperature for semiconductor components under all conditions of line, load and ambient
temperature. Note that when the worst-case junction temperature is held to 110 °C, the
typical temperature the equipment actually experiences is very often much lower, leading
to many years of trouble-free service.
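One common way to check a part against a junction temperature limit such as the 110 °C figure above is to add the temperature rise across the part's thermal resistance path to the worst-case ambient. The Python sketch below assumes a simple series path (junction to case, case to heat sink, heat sink to ambient) and steady-state dissipation; the function names, thermal resistances and power levels in the example are hypothetical and not taken from this paper.

```python
# Junction temperature limit from the NAVSO guideline quoted above.
TJ_LIMIT_C = 110.0


def junction_temperature(ambient_c: float, power_w: float,
                         theta_jc: float, theta_cs: float, theta_sa: float) -> float:
    """Steady-state Tj = Ta + P * (theta_jc + theta_cs + theta_sa).

    Thermal resistances are in degrees C per watt. Assumes a single series
    heat path and ignores transient (pulsed) heating.
    """
    return ambient_c + power_w * (theta_jc + theta_cs + theta_sa)


def meets_tj_limit(ambient_max_c: float, power_max_w: float,
                   theta_jc: float, theta_cs: float, theta_sa: float) -> bool:
    """Check the worst-case line/load/ambient point against the limit."""
    tj = junction_temperature(ambient_max_c, power_max_w, theta_jc, theta_cs, theta_sa)
    return tj <= TJ_LIMIT_C


if __name__ == "__main__":
    # Hypothetical MOSFET: 8 W worst-case loss at 50 C maximum ambient.
    tj = junction_temperature(50.0, 8.0, theta_jc=0.5, theta_cs=0.5, theta_sa=5.0)
    print(f"Worst-case Tj = {tj:.1f} C, within limit: {tj <= TJ_LIMIT_C}")
```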
Figure 4 presents a handful of typical derating recommendations from NAVSO P-3641A.
(A complete list can be found in the NAVSO document.)
Component Type             Parameter        Recommended Derating
Resistor                   Power            50%
                           Voltage          80%
Capacitor (Electrolytic)   Voltage          70%
                           Temperature      20 °C less than rated
Transistor – MOSFET        Voltage          70%
                           Power            65%
                           Junction Temp    110 °C absolute maximum
Connector                  Voltage          20% of dielectric rating
                           Current          60%
                           Temperature      60 °C below rating
Figure 4 – Typical NAVSO P-3641A Component Derating Recommendations
Best-in-class equipment will have a comprehensive review of all the components used in
the design against these requirements. This is typically accomplished with what is
known as a Component Stress Analysis Chart, as shown in Figure 5.
[Chart excerpt: Assembly 138220, Rev. p1, Bias Supply; Schematic 138219; file Stress138220. For each reference designator (C01 to C11) the chart lists the manufacturer's part number, component description, rated values (volts, mA, temperature in °C, milliwatts), in-house derated limits, and steady-state in-circuit values at the worst-case operating point, followed by a comment; the entries in use are marked "within NAVSO derating". For example, C01, a 33 µF, 25 V electrolytic capacitor (EEVFC1E330P), has an in-house voltage limit of 17.5 V (70% of rating) and an in-circuit voltage of 15 V. Abbreviations: N/A, not applicable or not available; m, milli; k, kilo; TBD, to be determined.]
Figure 5 – Excerpt from typical Component Stress Analysis Chart
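The core check behind a Component Stress Analysis Chart can be sketched in a few lines of Python: multiply each rating by its derating factor to get the in-house limit, then compare the worst-case in-circuit value against it. This is a minimal sketch that handles only percentage-of-rating limits (such as the 70% voltage derating for electrolytic capacitors); offset-style limits such as "20 °C less than rated" would need a separate rule. The `StressEntry` class and `review` function are hypothetical; the C01 numbers follow the chart excerpt (25 V rating, 15 V in circuit), while the Q1 entry is invented to show a failing case.

```python
from dataclasses import dataclass


@dataclass
class StressEntry:
    """One row of a (hypothetical) component stress analysis chart."""
    ref_des: str        # reference designator, e.g. "C01"
    parameter: str      # stressed parameter, e.g. "voltage"
    rated: float        # manufacturer's rated value
    derating: float     # allowed fraction of the rating, e.g. 0.70
    in_circuit: float   # steady-state value at the worst-case operating point

    @property
    def in_house_limit(self) -> float:
        return self.rated * self.derating

    @property
    def within_derating(self) -> bool:
        return self.in_circuit <= self.in_house_limit


def review(entries):
    """Return (ref_des, parameter, limit, in_circuit, comment) for each entry."""
    rows = []
    for e in entries:
        comment = "within NAVSO derating" if e.within_derating else "EXCEEDS derating"
        rows.append((e.ref_des, e.parameter, e.in_house_limit, e.in_circuit, comment))
    return rows


if __name__ == "__main__":
    chart = [
        StressEntry("C01", "voltage", rated=25.0, derating=0.70, in_circuit=15.0),
        StressEntry("Q1", "voltage", rated=600.0, derating=0.70, in_circuit=450.0),
    ]
    for row in review(chart):
        print(row)
```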
Effective Design Tools
Once a decision has been made to employ conservative design margins, it’s important
that this intention be effectively carried into the design. Modern computer aided
Electronic Design Automation (EDA) tools provide the means to implement effective
layout design rules within complex designs. Likewise, there are many design analysis
tools on the market that aid the designer in assuring components are operating within
reliable limits.
Figure 6 – Modern CAD Design Tools
Latent Failure Modes
The ingredients that produce latent failure modes are in place at the time of the
product’s manufacture, but require the effects of time, temperature, humidity, vibration
and other environmental factors before they result in compromised reliability. Typical
latent failure modes in power supplies include:
− Compromised insulation due to dendrite growth or metallic migration
− Compromised insulation due to environmental effects or corona
− Semiconductor die cracking due to mechanical stress
− Degraded semiconductor performance due to humidity infusion
− Degraded semiconductor performance due to ESD exposure
− Electrolytic capacitor wear-out
Compromised Insulation
One of the more prevalent failure modes observed in power conversion equipment is
compromised voltage insulation spacing. Product safety specifications such as EN-60950
or UL-1950 provide mandated spacing requirements from energized conductors to earth
ground, along with recommended in-circuit spacing requirements for functional
insulation. (Safety agency specifications normally allow in-circuit spacing to be violated
if it can be demonstrated that compromising the spacing does not result in an unsafe
condition.) Experience has shown that this criterion is not necessarily a formula for extended
reliability. Many environments are prone to airborne contaminants, and infusion of these
contaminants is one of the leading causes of premature unit failure.
Beyond conductive particle infusion, gradual infusion of normally non-conductive dust,
along with pre-existing sources of ionic contaminants, humidity and the presence of
significant electrical fields within power conversion equipment can lead to conductive
dendrite growth. As shown in Figure 7, even in “clean” office and data processing
environments, dust and other airborne environmental contaminants can build up to
problematic levels over time.
Figure 7 - Dust build up in data processing application
Dendrites are microscopic conductive paths that form when ionic materials, in the
presence of moisture and an electric field, dissociate into negatively and positively
charged ions. Figure 8 presents a photograph of a dendrite growing between two
PC board traces.
[Photo labels: non-coated conductors on printed wiring board; conductive dendrite growing between conductors]
Figure 8 - Dendrite (magnified) growing between two traces on PC Board
Non-conductive dust provides a moisture collection medium that enables the dendrite
forming process. Pre-existing sources of ionic materials include flux, cleaning fluids, and
plating chemistry residue from the surface finish. All printed wiring boards have a
significant amount of their conductors exposed for soldering and connection purposes.
Over time, exposure to environmental contaminants can lead to undesired bridging of
insulation spacing if adequate counter-measures are not provided.
More generous spacing around critical components and circuit connections can help
combat these effects. Along with improved spacing, a proactive approach that treats
critical circuit areas with a protective coating will significantly improve long-term product
performance. (TDI Power’s standard practice is to coat printed wiring boards so as to
maximize long-term reliability.)
Latent Failure Modes in Semiconductors
Semiconductor components have shown themselves to be particularly susceptible to
latent failures such as the well documented effects of Electrostatic Discharge (ESD).
Modern power semiconductors can be very cost-effective and reliable; however, over
time the infusion of moisture through plastic package over-mold materials can be
problematic. Care must be taken that the internal construction of the device is
adequately protected so that infused moisture does not result in dendrite growth or
corrosion.
Best in class suppliers assure semiconductor reliability through a part qualification
process that includes long term testing under the conditions of high voltage,
temperature and humidity. Destructive physical analysis of the parts’ internal design
features is also critical. Oftentimes a design weakness can be spotted through a critical
review of parameters such as mask alignment, guard ring structures and clearances
from conductors.
Magnetic Component Reliability
Magnetic components, such as transformers and inductors, are often found in greater
numbers in power supplies than in other types of equipment. Safety agency
requirements provide a good basis for reliable design regarding winding insulation and
spacing. However, a number of other areas should also be considered,
including the following:
− Corona inception and deterioration of the thin sheet insulating material used to separate high-frequency switching windings. This can become a problem at voltages as low as 200 VRMS, a situation quite often found in switching power supplies. Corona is a partial breakdown of air due to high electric field intensity. Microscopic air bubbles in thin sheet insulation can provide locations for corona inception, and corona discharges can begin to eat away at the insulation, leading to premature failure of the insulator.
− The combined effects of aging and high switching flux densities on certain powdered iron core types. This can cause the binder used in the core material to degrade, ultimately leading to increased core and winding losses and potential catastrophic component failure.
− Imperfections in wire or foil terminations causing mechanical abrasion of internal insulators, ultimately causing insulation punch-through.
− Core loss characteristics whose temperature coefficient (power dissipated versus core temperature) changes from negative to positive. This can cause core permeability to drop off under extreme operating conditions, with corresponding unit failure.
Reliability Assurance Testing
Crucial to reliability assurance is testing of the design to determine its actual limits with
regard to ambient temperature, vibration stress, input voltage (both surges and steady
state), output current overload, and any other stressful parameters pertinent to the
application. Testing of this kind is generally referred to as “Highly Accelerated Life
Testing” or HALT. HALT is a destructive test that determines the margin between the
product’s intended environment and the point at which it fails.
Crucial to effective HALT testing is a well-structured plan and the proper equipment to
carry it out. As shown in Figure 9, the equipment required will include temperature
chambers, vibration tables, variable input voltage and output loading generators, and
monitoring instrumentation.
Figure 9 – HALT Test Setup
Once the point at which the test subject fails has been determined, a judgment must be
made as to whether the design margin is good enough. Figure 10 presents typical
margins that TDI believes are reflective of good design for reliability practice.
Parameter                       HALT Target
Operating temperature – High    40 °C above maximum rated operating temperature
Operating temperature – Low     20 °C below minimum rated operating temperature
Input voltage – High            30% greater than the maximum rated input voltage
Input voltage – Low             No damage or degradation with continuous input voltage below minimum rated value
Output current – High           50% greater than normal rating (internal over-current limit point disabled)
Vibration                       >10 g-rms (random tri-axial) vibration level for more than 1 hour
Dielectric withstand voltage    Minimum of 500 V greater than rated value
Input surge voltage             Minimum of 500 V greater than rated value
Figure 10 – Typical HALT Test Targets
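Because most of the targets in Figure 10 are defined relative to the unit's ratings, they can be derived mechanically from a specification. The Python sketch below does so for a hypothetical unit; the `Ratings` fields, the `halt_targets` function and the example values are illustrative assumptions. The qualitative low-input-voltage target (no damage below the minimum rated input) is not a single number and is left out.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    """Hypothetical nameplate ratings for a power supply under HALT."""
    t_max_c: float        # maximum rated operating temperature, C
    t_min_c: float        # minimum rated operating temperature, C
    vin_max: float        # maximum rated input voltage, V
    iout_max_a: float     # normal output current rating, A
    dielectric_v: float   # rated dielectric withstand voltage, V
    surge_v: float        # rated input surge voltage, V


def halt_targets(r: Ratings) -> dict:
    """Apply the typical HALT margins from Figure 10 to a set of ratings."""
    return {
        "temperature_high_c": r.t_max_c + 40.0,
        "temperature_low_c": r.t_min_c - 20.0,
        "input_voltage_high_v": r.vin_max * 1.30,
        "output_current_high_a": r.iout_max_a * 1.50,  # over-current limit disabled
        "vibration_g_rms_min": 10.0,                   # random tri-axial, > 1 hour
        "dielectric_withstand_v_min": r.dielectric_v + 500.0,
        "input_surge_v_min": r.surge_v + 500.0,
    }


if __name__ == "__main__":
    example = Ratings(t_max_c=50.0, t_min_c=0.0, vin_max=264.0,
                      iout_max_a=50.0, dielectric_v=1500.0, surge_v=2000.0)
    for name, value in halt_targets(example).items():
        print(f"{name}: {value:g}")
```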
HALT testing should be conducted at a point during the product’s development that is late
enough that the sample being tested is a reasonable representation of the final
design, but not so late that any design improvements uncovered cannot find their way
into the final design.
Production Reliability Screening Testing
Once HALT testing has identified the actual capability limits of the design, this
information is used to devise production reliability screening tests. NAVMAT
suggested Environmental Stress Screening (or “ESS”) as a means of screening
production units. In ESS, the subject unit is exposed to alternating high and low
operating temperatures, with modest transitional temperature ramps between tests.
While ESS provides a better screen than burn-in, it was ultimately determined that a
more aggressive test, which exposes the subject unit to faster temperature transitions
along with other stresses, was better at assuring unit performance. This test is
generally referred to as “Highly Accelerated Stress Screening” or HASS. As shown in
Figures 11 and 12, a typical HASS test simultaneously subjects the unit to the effects of
temperature extremes (high, low and transitional), vibration, input voltage variations
and output load cycling for a period of several hours.
Figure 11 – Production HASS Chamber
Figure 12 – Typical HASS Test Profile
A number of mathematical models are used to ascertain the effects of the stresses
imposed during HASS. Using these, the HASS profile is generally designed to present
the equivalent of 40 to 60 days of operation in the intended real-life environment. In
this way, the unit passes through the infant mortality period of operation while it is
still in the factory.
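The paper does not name the models used, so the Python sketch below only illustrates how such an equivalence might be estimated under stated assumptions: an Arrhenius acceleration factor for time spent at elevated temperature, and a simple Coffin–Manson relation, (ΔT_test / ΔT_field)^m, for thermal cycles. The activation energy, field conditions, exponent m, the profile and all function names are hypothetical inputs. The two mechanisms are reported separately because they age different failure mechanisms and are not simply additive.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann's constant, eV/K


def arrhenius_af(ea_ev: float, t_field_c: float, t_test_c: float) -> float:
    """Acceleration of a thermally activated mechanism at t_test_c vs t_field_c."""
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) *
                    (1.0 / (t_field_c + 273.15) - 1.0 / (t_test_c + 273.15)))


def coffin_manson_af(delta_t_test: float, delta_t_field: float, exponent: float) -> float:
    """Damage of one test cycle expressed in field cycles (simple Coffin-Manson)."""
    return (delta_t_test / delta_t_field) ** exponent


def hass_equivalent_days(dwells, n_cycles, delta_t_test, *, ea_ev, t_field_c,
                         delta_t_field, field_cycles_per_day, cm_exponent):
    """Equivalent days of field operation, reported per stress mechanism.

    dwells: list of (hours, component temperature in C) for each dwell segment.
    n_cycles: number of thermal cycles of swing delta_t_test in the profile.
    """
    thermal_hours = sum(h * arrhenius_af(ea_ev, t_field_c, t) for h, t in dwells)
    field_cycles = n_cycles * coffin_manson_af(delta_t_test, delta_t_field, cm_exponent)
    return {
        "temperature_dwell_days": thermal_hours / 24.0,
        "thermal_cycling_days": field_cycles / field_cycles_per_day,
    }


if __name__ == "__main__":
    # Entirely hypothetical profile and field conditions.
    profile = [(2.0, 95.0), (2.0, 95.0), (1.0, -30.0)]
    result = hass_equivalent_days(profile, n_cycles=10, delta_t_test=120.0,
                                  ea_ev=0.7, t_field_c=40.0, delta_t_field=20.0,
                                  field_cycles_per_day=1.0, cm_exponent=2.0)
    print(result)
```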
Because only a handful of field failures can corrupt a product line’s average reliability
performance at moderate production volumes (i.e., fewer than 10,000 units per year), it is
crucial that HASS be applied to each and every production unit that is manufactured.
Otherwise, crucial reliability information will be impossible to trend.
Best practice is achieved when reliability trends can be mapped back to the particular
production lot where the trend surfaced. This requires a high degree of production
control, along with traceability to the component lot level. Production procedures that
follow specific protocols in the event of uncovered trends must be put into place. These
include stopping the production line, quarantining specific units, escalating the
situation to the proper authorities, etc.
With adequate systems in place comes the opportunity to quarantine portions of
produced product for corrective actions, as appropriate. In power supplies this
capability is especially productive, as on occasion component suppliers (especially power
semiconductor manufacturers) can lose their recipe on a lot-to-lot basis. By maintaining
traceability, appropriate actions can be targeted at those units where they are required.
Production Quality Controls
Adequate controls of the manufacturing process are crucial to effective reliability
realization. Without these in place, processes such as HASS will only serve to screen out
an unacceptable number of failed units, resulting in an untenable position from both the
cost and time aspects.
Results of Effective Reliability Processes
Fielded product reliability performance is generally measured in terms of Mean Time
Between Failures (“MTBF”). In its simplest form, MTBF is the total operating hours of the
unit population divided by the total number of unit failures that occurred
during the operating time period. More failures over a given time equate to a lower MTBF.
Various methods for predicting MTBF are available, such as MIL-HDBK-217 or Bellcore.
These estimate MTBF based on a summation of failure rates by part type. Each part is
assigned a failure rate (failures per 10^6 operating hours) based on a database compiled
by the organization over many years of observation on numerous units. MTBF is the
inverse of the summation of the individual failure rates.
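The summation described above is simple enough to show directly. The Python sketch below computes a parts-count style MTBF prediction from quantities and per-part failure rates in failures per 10^6 hours; it assumes each rate already includes any quality, environmental and stress factors. The function name and bill of materials are hypothetical; the per-part rates are borrowed from the MIL-HDBK-217 and TDI columns of Figure 13 purely to illustrate the scale of the difference.

```python
def predicted_mtbf_hours(parts) -> float:
    """Parts-count MTBF: 1e6 / sum(quantity * failure rate per 1e6 hours)."""
    total_fpmh = sum(quantity * rate for quantity, rate in parts)
    if total_fpmh <= 0:
        raise ValueError("total failure rate must be positive")
    return 1e6 / total_fpmh


if __name__ == "__main__":
    # Hypothetical partial bill of materials: (quantity, failures per 1e6 hours).
    bom_mil217 = [(20, 0.0507),   # ceramic capacitors
                  (4, 0.23),      # aluminum electrolytic capacitors
                  (30, 0.018),    # fixed film resistors
                  (6, 0.2016)]    # FETs
    bom_tdi = [(20, 0.00024), (4, 0.00027), (30, 0.00001), (6, 0.00247)]
    print(f"MIL-HDBK-217 rates: {predicted_mtbf_hours(bom_mil217):,.0f} hours")
    print(f"TDI database rates: {predicted_mtbf_hours(bom_tdi):,.0f} hours")
```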
Historically, MIL-HDBK-217 heavily penalized the use of commercial parts. One of the
factors applied to the base failure rate of any part type is the “Quality Factor” (πQ).
For commercial parts, this factor quite often multiplies the base failure rate by ten
or more, presenting a poor expectation for unit reliability.
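The multiplicative structure is what makes πQ so influential: every other factor scales with it. The Python sketch below shows the general form λp = λb × πQ × (remaining π factors); which additional π factors apply depends on the part type and handbook revision, so the hypothetical `part_failure_rate` function simply accepts them as extra arguments. The ceramic capacitor values (λb = 0.0039, πQ = 10, one further factor of 1.3) reproduce the 0.0507 failures per 10^6 hours listed for that part in Figure 13; the πQ = 1 comparison is hypothetical.

```python
import math


def part_failure_rate(base_rate: float, pi_quality: float, *other_pi: float) -> float:
    """MIL-HDBK-217-style part failure rate, failures per 1e6 hours.

    base_rate: base failure rate (lambda_b).
    pi_quality: quality factor (pi_Q).
    other_pi: any further multiplicative pi factors for the part type.
    """
    return base_rate * pi_quality * math.prod(other_pi)


if __name__ == "__main__":
    commercial = part_failure_rate(0.0039, 10.0, 1.3, 1.0)
    hypothetical_hi_rel = part_failure_rate(0.0039, 1.0, 1.3, 1.0)
    print(f"commercial, pi_Q = 10: {commercial:.4f} failures per 1e6 h")
    print(f"hypothetical, pi_Q = 1: {hypothetical_hi_rel:.5f} failures per 1e6 h")
```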
It has been TDI Power’s experience that rather than being a liability to achieving reliable
operation, commercial parts, when used within the reliability oversight system described
above, provide failure rates much lower than those found in MIL-HDBK-217.
TDI has assembled a database of component type failure rates for power supplies
utilized in numerous computer and industrial applications. Figure 13 presents a
summary of these failure rates versus those contained in the military standard.
Failure rates in failures per 10^6 operating hours.

Part Type                           Generic Operating Conditions                 λb (MIL)             πQ    λp MIL     λp TDI       MIL/TDI
Capacitor, paper RFI                50% rated voltage @ 30 °C                    0.0039               7     0.0273     0.00087      31
Capacitor, metallized plastic film  50% rated voltage @ 30 °C                    0.0045               10    0.0315     0.00087      36
Capacitor, mica                     50% rated voltage @ 30 °C                    0.0019               15    0.0285     0.00087      33
Capacitor, ceramic                  50% rated voltage @ 30 °C                    0.0039               10    0.0507     0.00024      211
Capacitor, ceramic chip             50% rated voltage @ 30 °C                    0.0026               10    0.0416     0.00024      173
Capacitor, aluminum electrolytic    60% rated voltage @ 40 °C                    0.023                10    0.23       0.00027      852
Capacitor, aluminum electrolytic    80% rated voltage @ 60 °C                    0.06                 10    0.6        0.00027      2,222
Capacitor, tantalum electrolytic    50% rated voltage @ 30 °C                    0.015                10    0.24       0.00024      1,000
Monolithic bipolar IC               Non-hermetic linear, Tj = 45 °C, 8 pin       C1 0.01, C2 0.0026   20    0.23976    0.00303      79
Monolithic bipolar IC               Non-hermetic linear, Tj = 45 °C, 16 pin      C1 0.01, C2 0.0061   20    0.26636    0.00379      70
Monolithic bipolar IC (PWM)         Non-hermetic linear, Tj = 45 °C, 16 pin      C1 0.01, C2 0.0061   20    0.26636    0.00168      159
Silicon diode, low current          10% rated current stress, Tj = 30 °C         0.00021              15    0.001323   0.00002      66
Silicon diode, high current         20% rated current stress, Tj = 80 °C         0.00069              15    0.04347    0.00063      69
Zener diode                         40% rated power, Tj = 30 °C                  0.00079              30    0.0237     0.00031      76
PNP transistor, low power           10% rated power stress, Tj = 30 °C           0.00079              12    0.003      0.00015      20
NPN transistor, high power          10% rated power stress, Tj = 80 °C           0.0012               12    0.06048    0.00017      356
NPN transistor, high power          10% rated power stress, Tj = 80 °C           0.0012               12    0.06048    0.00017      356
FET transistor                      10% rated power stress, Tj = 80 °C           0.024                12    0.2016     0.00247      82
Optoelectronic device               Tj = 30 °C                                   0.0055               1     0.1045     0.00175      60
LED                                 Tj = 30 °C                                   0.00065              1     0.0124     0.00458001   3
Resistor, fixed film                50% rated power @ 40 °C                      0.0012               15    0.018      0.00001      1,800
Resistor, fixed film power          50% rated power @ 40 °C                      0.012                3     0.036      0.00210194   17
Resistor, fixed wirewound           50% rated power @ 40 °C                      0.0055               15    0.0825     0.015        6
Resistor, thermistor                50% rated power @ 40 °C                      0.065                15    0.975      0.0063       155
Inductor/transformer, high power    85 °C max operating temp on 180 °C device    0.007                8     0.056      0.00185      30
Inductor/transformer, low power     30 °C max operating temp on 180 °C device    0.0066               1.5   0.0099     0.00185      5
Relay                               30 °C operation, low cycle rate              0.0061               1     0.13       0.0606       2
Thermostat                                                                       0.0061               1     0.00366    0.00214376   2
Fuse, cartridge                                                                  0.01                 1     0.01       0.00242      4

λp MIL is the MIL-HDBK-217 resultant failure rate: λb × πQ multiplied by the remaining π factors for the part type (environment, temperature, stress and similar); for the ICs it is (C1πT + C2πE)πQπL. λp TDI is the rate from TDI Power’s database, and MIL/TDI is the ratio of the two.
Figure 13 – Comparison of Failure Rates
As shown in Figure 13, component failure rates have been observed to be many times
lower than indicated by MIL-HDBK-217, resulting in increased MTBF. It
has been TDI Power’s experience that achieved MTBF can be ten times (or more) that
suggested by MIL-HDBK-217.
When tracking fielded MTBF, it is normal for a measured MTBF on a growing population
to show growth from a low value toward its eventual long term average value as total
unit operation hours begin to accumulate. Ultimate demonstration of projected long
term MTBF is a function of how fast the fielded population of units grows and any early
life failures (i.e., infant mortality) that occur in the fielded population.
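The growth effect is easiest to see by computing the running estimate month by month: cumulative unit operating hours divided by cumulative failures. The Python sketch below assumes continuous operation (about 730 hours per month) and uses the number of units in service each month as the exposure; the `running_mtbf` function and the data are hypothetical. No point estimate is produced until the first failure occurs, and a single early failure pulls the estimate down sharply while few hours have accumulated.

```python
def running_mtbf(units_in_service, failures, hours_per_month=730.0):
    """Cumulative MTBF estimate after each month.

    units_in_service: average number of fielded units in each month.
    failures: number of field failures recorded in each month.
    Returns a list with None for months before the first failure.
    """
    unit_hours = 0.0
    total_failures = 0
    history = []
    for units, month_failures in zip(units_in_service, failures):
        unit_hours += units * hours_per_month
        total_failures += month_failures
        history.append(unit_hours / total_failures if total_failures else None)
    return history


if __name__ == "__main__":
    # Hypothetical ramp: the population grows while failures stay rare.
    population = [100, 300, 600, 1000, 1500, 2000]
    monthly_failures = [1, 0, 0, 1, 0, 1]
    for month, mtbf in enumerate(running_mtbf(population, monthly_failures), start=1):
        print(month, "no estimate yet" if mtbf is None else f"{mtbf:,.0f} h")
```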
For example, a highly complex 2600W AC-DC power supply, whose MTBF was predicted
by MIL-HDBK-217 to be on the order of 10,000 hours, had a predicted long term MTBF
of >750,000 hours using failure rates from TDI Power’s database. This unit has shown
excellent field reliability, with a currently measured fielded MTBF of 330,000 hours and a
trajectory headed toward the predicted long-term value.
Figure 14 presents a view of the unit considered in the previous example, while Figure
15 presents the fielded unit population growth from zero to a total population of >2,500
units at the end date. Figure 16 presents the measured MTBF growth of this unit.
Figure 14 – 2400W Power Supply in Example
[Plot: total number of units in service (0 to 3,000) versus date, May-03 through Oct-08]
Figure 15 – Fielded Unit Population Growth
[Plot: “Measured MTBF on 7200W AC-DC Unit”, measured MTBF in hours (0 to 400,000) versus date, May-03 through Oct-08]
Figure 16 – Measured Fielded MTBF of Growing Unit Population
Conclusion
Best-in-class reliability assurance practices, based on a Total Quality Management
philosophy and including HALT and HASS testing, provide power system reliability levels
well above the traditional expectations of military standards. The achievement of these
results requires a significant commitment by the supplier to provide the necessary
infrastructure and expertise to support the comprehensive systems required.