Stat Exposé
The d-two-star tables uncovered
Gregory F. Gruska, The Third Generation Inc
with input from the MSA workgroup
David Benham, DaimlerChrysler Corporation
Peter Cvetkovski, Ford Motors Corporation
Michael Down, General Motors Corporation
Abstract
From the beginning of the development and implementation of Statistical Quality Control
(SQC), now known as Statistical Process Control (SPC), the Range has been used to
develop an estimate of the process standard deviation. Since the Range is a biased
estimate of the standard deviation, “correction” factors have to be used to transform the
average Range to an estimate of the process standard deviation. This paper will discuss
the development of the $d_2^*$ tables, which contain the necessary “correction” factors.
ξξξξξ
Warning
This paper is rated ξ (xi) since it contains Greek letters, mathematical symbols, and
statistical terminology. Individuals with no statistical background or training should
proceed with caution. Professional statistical guidance is recommended.
Warning
ξξξξξ
This paper is intended to serve as additional guidelines for the analysis of measurement systems.
www.aiag.org/publications/quality/msa3.html
The d-two-star tables uncovered
From the beginning of the development and implementation of Statistical Quality Control
(SQC), now known as Statistical Process Control (SPC), the range has been used to
develop an estimate of the process standard deviation. Although the range provides a less
efficient estimate of the population standard deviation, it was widely used because it was
easy to calculate and because inexpensive computers and calculators capable of
calculating the standard deviation were not available during the first five decades of SPC.
Because of its wide use, the distribution of the range in random samples from a normal
distribution has been studied by noted statisticians such as David, Grubbs, Weaver,
Patnaik, Hartley, Pearson, and Duncan. The difficulty with reading their papers is that
no consistent notation is used. This paper will use the notation contained in
Quality Control and Industrial Statistics¹ by Acheson Duncan because of its prominence
in the Quality field.
The above authors have shown that the distribution of the range in random samples from
a normal distribution:
• Depends on the sample size
• Is independent of the population mean
• Is dependent on the population standard deviation
Further, the relative efficiency of the range as an estimator of the standard deviation
decreases as the sample size increases².
Unfortunately, a simple form of the exact distribution of the (mean) range cannot be
developed except for the trivial case of two samples of two observations each. However,
Patnaik (1950) did develop a useful approximation to this distribution which is utilized
here.
Approximation to the Distribution of the Mean Range
Let $x_1, x_2, \ldots, x_m$ denote a random sample of size $m$ from a normal population having
mean $\mu$ and standard deviation $\sigma$. The range of this sample is

$$\text{Range} = R_m = \max(x_1, x_2, \ldots, x_m) - \min(x_1, x_2, \ldots, x_m)$$

If there are $g$ such independent samples, each with a sample size of $m$, the mean of the $g$
ranges is denoted in this paper by $\bar{R}_{g,m}$.
¹ Quality Control and Industrial Statistics, 5th edition, by Acheson Duncan, McGraw-Hill, 1986.
² A generally acceptable rule is that the range should not be used when the sample size exceeds 20. In these
cases it is preferable to divide the sample into a number of groups and consider the average range over all
the groups. A subgroup size of seven or eight provides the most efficient estimation.

Let $W_m$ denote the range of the standardized ($z$) values. That is,

$$W_m = \max(z_1, z_2, \ldots, z_m) - \min(z_1, z_2, \ldots, z_m)$$

where $z_i = \dfrac{x_i - \mu}{\sigma}$.
Then the probability integral of $W_m$ can be expressed as

$$P(W_m) = m \int_{-\infty}^{\infty} f(z) \left\{ \int_{z}^{z+W_m} f(u)\,du \right\}^{m-1} dz$$

where $f(x)$ is the normal frequency function: $f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}$
The moments for the probability integral of Wm have been calculated to 5 decimal places
using numerical quadrature by Hartley and Pearson (1951). The following table has been
extracted from their work.
Sample Size (m)    Mean = d2    Var = Vm
       2            1.12838     0.72676
       3            1.69257     0.78922
       4            2.05875     0.77407
       5            2.32593     0.74661
       6            2.53441     0.71916
       7            2.70436     0.69424
       8            2.84720     0.67213
       9            2.97003     0.65262
      10            3.07751     0.63531
      11            3.17287     0.61984
      12            3.25846     0.60601
      13            3.33598     0.59353
      14            3.40676     0.58217
      15            3.47193     0.57186
      16            3.53198     0.56237
      17            3.58788     0.55363
      18            3.64006     0.54554
      19            3.68896     0.53802
      20            3.73495     0.53097
Table 1: Mean and Variance for
the distribution of ranges in normal samples
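The entries of Table 1 can be reproduced approximately from the probability integral of $W_m$ given earlier. The sketch below is not Hartley and Pearson's procedure. It assumes a trapezoid rule for the inner integral and Simpson's rule for the moments, using the identities $E(W) = \int_0^\infty [1 - P(w)]\,dw$ and $E(W^2) = \int_0^\infty 2w\,[1 - P(w)]\,dw$. The function names, truncation limits and step counts are illustrative choices, not values from the paper.

```python
import math

def normal_pdf(x):
    """Standard normal frequency function f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def range_cdf(w, m, lo=-8.0, hi=8.0, n=320):
    """P(W_m <= w) = m * integral of f(z) * [F(z+w) - F(z)]^(m-1) dz (trapezoid rule)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * normal_pdf(z) * (normal_cdf(z + w) - normal_cdf(z)) ** (m - 1)
    return m * h * total

def range_moments(m, upper=12.0, n=240):
    """Return (mean, variance) of W_m using Simpson's rule on [0, upper]; n must be even."""
    h = upper / n
    s1 = 0.0  # accumulates the integrand 1 - P(w)
    s2 = 0.0  # accumulates the integrand 2w * (1 - P(w))
    for j in range(n + 1):
        w = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        survival = 1.0 - range_cdf(w, m)
        s1 += weight * survival
        s2 += weight * 2.0 * w * survival
    mean = s1 * h / 3.0
    second_moment = s2 * h / 3.0
    return mean, second_moment - mean * mean

if __name__ == "__main__":
    for m in (2, 5, 8):
        d2, vm = range_moments(m)
        print(m, round(d2, 5), round(vm, 5))
```

For m = 2 the results can be checked against the closed forms $d_2 = 2/\sqrt{\pi}$ and $V_2 = 2 - 4/\pi$, which agree with the first row of the table.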
The range is a biased estimate of the standard deviation, and a “correction” factor ($d_2$) has
to be used to transform the average range to an estimate of the process standard deviation.
I.e.,

$$E(W_m) = \frac{\bar{R}_m}{\sigma} = d_2$$

So

$$\hat{\sigma} = \frac{\bar{R}_m}{d_2}$$
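As a small illustration of this correction, the sketch below computes the range of each subgroup, averages the ranges, and divides by the tabled $d_2$. The measurements and function names are hypothetical, invented only for this example. With so few subgroups the paper's later sections would use $d_2^*$ in place of $d_2$.

```python
def subgroup_range(sample):
    """Range of one subgroup: largest value minus smallest value."""
    return max(sample) - min(sample)

def mean_range(subgroups):
    """Average of the g subgroup ranges."""
    ranges = [subgroup_range(s) for s in subgroups]
    return sum(ranges) / len(ranges)

def sigma_from_mean_range(subgroups, d2):
    """Estimate the process standard deviation as (average range) / d2."""
    return mean_range(subgroups) / d2

# Hypothetical data: g = 4 subgroups, each of size m = 5.
subgroups = [
    [10.1, 9.8, 10.3, 10.0, 9.9],
    [10.2, 10.4, 9.7, 10.1, 10.0],
    [9.9, 10.0, 10.2, 9.6, 10.1],
    [10.3, 9.9, 10.0, 10.2, 9.8],
]

D2_M5 = 2.32593  # d2 for sample size 5, taken from Table 1

r_bar = mean_range(subgroups)                        # about 0.575
sigma_hat = sigma_from_mean_range(subgroups, D2_M5)  # roughly 0.247

if __name__ == "__main__":
    print(round(r_bar, 3), round(sigma_hat, 3))
```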
For the distribution of $\bar{R}_{g,m}$, things are not so simple. However, based on the work by
Pearson (1926), Patnaik selected the $\chi$-distribution as a reasonably accurate
representation of the distribution of $\bar{R}_{g,m}$.
The first two moments of $\bar{R}_{g,m}$ are related to those of $R_m$ by

$$E\!\left(\frac{\bar{R}_{g,m}}{\sigma}\right) = d_2$$

$$\operatorname{var}\!\left(\frac{\bar{R}_{g,m}}{\sigma}\right) = \frac{1}{g\sigma^2}\operatorname{var}(R_m) = \frac{1}{g}V_m = V_{g,m}$$
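Relating these two moments with those of $\dfrac{c\,\chi}{\sqrt{\nu}}$, where $\chi$ has $\nu$ degrees of freedom,
yields:

$$d_2 = c\,\sqrt{\frac{2}{\nu}}\;\frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}$$

$$V_{g,m} = \frac{c^2}{\nu}\left[\nu - 2\left(\frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)}\right)^{2}\right]$$

where $c = d_2^*$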
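The $\Gamma$-functions can be expanded by Stirling’s formula and the resulting equations
simplified and used to solve for $d_2^*$ and $\nu$.

$$\left(d_2^*\right)^2 = d_2^2 + \frac{V_m}{g}$$

$$\nu = A^{-1} + \frac{1}{4} - \frac{3}{16}A + \frac{3}{64}A^2$$

where $A = \dfrac{2V_{g,m}}{d_2^2} = \dfrac{2V_m}{g\,d_2^2}$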
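As a side check, and not necessarily the route Patnaik took, the relation for $d_2^*$ can be obtained without the Stirling expansion. Write $X = c\chi/\sqrt{\nu}$ for the matched variable, with mean $d_2$ and variance $V_{g,m} = V_m/g$, and use the fact that $\chi^2$ with $\nu$ degrees of freedom has mean $\nu$:

```latex
\begin{aligned}
d_2^2 + V_{g,m}
  &= \left[E(X)\right]^2 + \operatorname{var}(X) = E\!\left(X^2\right) \\
  &= \frac{c^2}{\nu}\,E\!\left(\chi^2\right) = \frac{c^2}{\nu}\,\nu = c^2 = \left(d_2^*\right)^2
\end{aligned}
```

Under this reading only the expression for $\nu$ depends on the series expansion.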
These are the formulae used to generate the $d_2^*$ table in Appendix C of the MSA Manual,
3rd edition.
The constant difference (C.D.) given in the table is calculated by $\dfrac{d_2^2}{2V_m}$.
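The formulae can be sketched in a few lines of code. This is an illustration rather than the procedure used to build the MSA table. It assumes that $A$ is formed from the variance of the mean range, that is $A = 2V_m/(g\,d_2^2)$, because that reading reproduces the tabled $\nu$ = 120.9 for g = 20 and m = 8. The function names and the small lookup dictionary are hypothetical.

```python
import math

# (d2, Vm) for a few subgroup sizes m, copied from Table 1.
RANGE_MOMENTS = {
    2: (1.12838, 0.72676),
    3: (1.69257, 0.78922),
    8: (2.8472, 0.67213),
}

def d2_star(g, m):
    """d2* from (d2*)^2 = d2^2 + Vm / g."""
    d2, vm = RANGE_MOMENTS[m]
    return math.sqrt(d2 * d2 + vm / g)

def constant_difference(m):
    """C.D. = d2^2 / (2 * Vm); it depends on m only."""
    d2, vm = RANGE_MOMENTS[m]
    return d2 * d2 / (2.0 * vm)

def degrees_of_freedom(g, m):
    """nu = 1/A + 1/4 - (3/16) A + (3/64) A^2, assuming A = 2 Vm / (g d2^2)."""
    d2, vm = RANGE_MOMENTS[m]
    a = 2.0 * vm / (g * d2 * d2)
    return 1.0 / a + 0.25 - 3.0 * a / 16.0 + 3.0 * a * a / 64.0

if __name__ == "__main__":
    print(round(d2_star(5, 2), 5))            # compare with the tabled 1.19105
    print(round(constant_difference(8), 4))   # compare with the tabled 6.0305
    print(round(degrees_of_freedom(20, 8), 1))  # compare with the tabled 120.9
```

Because $A^{-1} = g \times$ C.D., the leading term of $\nu$ grows by one constant difference for each additional range, which is why the table lists C.D. separately.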
Using the $d_2^*$ Table
Whither go g and m?
The thing that tends to be most confusing to people first using the $d_2^*$ table is what
the values for g and m should be. The best thing is to bring it down to basics:
How many ranges are used to calculate the average range? = g
How many pieces were used to determine each range? = m
In the MSA 3rd edition example for the Range Method, there are five ranges used to calculate the
average range; hence g = 5. Each range was the difference of samples of size two, so
m = 2. From the table, the value of $d_2^*$ for g = 5 (= number of parts) and m = 2 (= number
of appraisers) is 1.19105, or simply 1.19 (unless you are enamored with decimals).
In the GRR example with a 3, 10, 3 setup (3 appraisers, 10 parts, 3 trials), the average range
used for repeatability calculations has g = 30 (= number of parts × number of appraisers) range
values used in the calculations. Each range is based on a sample of m = 3 (= number of trials).
Going to the table, we cannot find g = 30; the largest g is 20. We make the assumption that 30 is
sufficiently large and use the $d_2$ value of 1.69257 for $d_2^*$ in the calculation of

$$K_1 = \frac{1}{d_2^*} = \frac{1}{1.69257} = 0.59081751419439077852023845394873 \approx 0.5908$$
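The size of this approximation can be gauged by evaluating $(d_2^*)^2 = d_2^2 + V_m/g$ at g = 30 directly instead of substituting $d_2$. The sketch below is only a comparison under that formula, with $d_2$ and $V_m$ for m = 3 taken from Table 1. The variable names are illustrative, and the g = 30 value is computed here rather than read from the MSA table.

```python
import math

D2_M3 = 1.69257  # d2 for m = 3, from Table 1
VM_M3 = 0.78922  # Vm for m = 3, from Table 1
G = 30           # number of ranges: 10 parts x 3 appraisers

# Large-g shortcut used in the text: replace d2* by d2.
k1_shortcut = 1.0 / D2_M3

# Direct evaluation of d2* at g = 30 from (d2*)^2 = d2^2 + Vm / g.
d2_star_g30 = math.sqrt(D2_M3 ** 2 + VM_M3 / G)
k1_direct = 1.0 / d2_star_g30

# Relative difference between the two K1 values, in percent.
relative_gap_pct = 100.0 * (k1_shortcut - k1_direct) / k1_direct

if __name__ == "__main__":
    print(round(k1_shortcut, 4))  # about 0.5908
    print(round(k1_direct, 4))    # about 0.5881
    print(round(relative_gap_pct, 2))
```

Under these assumptions the two values of $K_1$ differ by about half a percent.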
C.D.s and dfs
The constant difference term is used to determine the degrees of freedom value ($\nu$) when
the number of samples (g) exceeds the tabled values (i.e. g > 20).
Example:
Find $d_2^*$ and $\nu$ for g = 22 and m = 8.
From the table we have $d_2^*$ = 2.85310 and $\nu$ = 120.9 for g = 20, m = 8, with $d_2$ = 2.8472
and C.D. = 6.0305.
For g = 22:
take $d_2^*$ = 2.853, since 22 is closer to 20 than to infinity.
$\nu$ = 120.9 + 2 × (6.0305) = 132.961, or 133.0
Yes, there is some “fudging”, but remember these are only approximations.
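The extrapolation in this example amounts to adding one constant difference per extra range beyond the last tabled g. A minimal sketch, with a hypothetical helper name and the tabled values from the example passed in as plain numbers:

```python
def extrapolate_degrees_of_freedom(nu_tabled, g_tabled, g, constant_difference):
    """Extend nu beyond the table: add one C.D. for each range past g_tabled."""
    if g < g_tabled:
        raise ValueError("g is inside the table; read nu from the table instead")
    return nu_tabled + (g - g_tabled) * constant_difference

# Values from the example: nu = 120.9 at g = 20, m = 8, with C.D. = 6.0305.
nu_22 = extrapolate_degrees_of_freedom(120.9, 20, 22, 6.0305)

if __name__ == "__main__":
    print(round(nu_22, 3))  # 132.961, as in the example
```

The helper rejects g below the last tabled value, since the table itself should be used there.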
Gregory F. Gruska, a Fellow of the American Society for Quality (ASQ), is the principal consultant in
performance excellence for Omnex, LLC, an engineering and management services firm. Greg has been
involved in the development of theory and software and has co-authored over 60 books and papers in
statistical theory and applications and quality management.
References
Duncan, A. (1986). Quality Control and Industrial Statistics, 5th edition, McGraw-Hill,
New York.
David, H. A. (1951). “Further Applications of Range to the Analysis of Variance”,
Biometrika, 38, 393.
Florin, H. (1950). Communications of the Royal Finnish Academy (Science Series), 12, 6.
Grubbs, F. E. and Weaver, C. L. (1947). “The Best Estimate of Population Standard
Deviation Based on Group Ranges”, JASA, 42, 224.
Hartley, H. O. and Pearson, E. S. (1951). “Moment Constants for the Distribution of
Range in Normal Samples”, Biometrika, 38, 463.
Patnaik, P. B. (1950). “The Use of Mean Range as an Estimator of Variance in Statistical
Tests”, Biometrika, 37, 78.
Pearson, E. S. (1951). “Some Notes on the Use of Range”, Biometrika, 38, 88.
Pearson, E. S. and Hartley, H. O. (eds.) (1976). Biometrika Tables for Statisticians,
Griffin and Co., London.