SMU
EMIS 7364
NTU
TO-570-N
Statistical Quality Control
Dr. Jerrell T. Stracener,
SAE Fellow
Inferences About Process Quality
Updated: 2/3/04
Inferences about Process Quality
• Sampling & Sampling Distributions
• Inferences Based on Single Random Sample
• Inferences Based on Two Random Samples
• Inferences Based on More than Two Random Samples
Sampling & Sampling Distributions
Population vs. Sample
• Population
the total of all possible values (measurements,
counts, etc.) of a particular characteristic for a
specific group of objects.
• Sample
a part of a population selected according to
some rule or plan.
Why sample?
Sampling
Characteristics that distinguish one type of sample
from another:
• the manner in which the sample was obtained
• the purpose for which the sample was obtained
Simple Random Sample
The sample X1, X2, ... ,Xn is a random sample if
X1, X2, ... , Xn are independent identically
distributed random variables.
Remark: Each value in the population has an
equal and independent chance of being included
in the sample.
Generating Random Samples
using Monte Carlo Simulation
Generating Random Numbers
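[Figure: the probability density f(y) plotted against y, and the cumulative distribution F(y) plotted against y on a vertical scale from 0 to 1.0. A uniform random value ri on the F(y) axis is mapped to the corresponding value yi on the y axis.]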
Generating Random Numbers
To generate a random value y from a given probability
density function f(y) using the probability integral
transformation:
1. Generate a random value rU from a uniform
distribution over (0, 1).
2. Set rU = F(y), where F(y) is the cumulative
distribution function of f(y).
3. Solve the resulting expression for y.
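As a worked instance of the three steps, consider an exponential distribution. The choice of distribution and the symbol θ for its mean are assumptions made here for illustration only:

```latex
\begin{align*}
F(y) &= 1 - e^{-y/\theta}, \qquad y \ge 0 \\
r_U  &= 1 - e^{-y/\theta} \\
y    &= -\theta \ln(1 - r_U)
\end{align*}
```

Each uniform value rU in (0, 1) therefore yields one exponential value y.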
Generating Random Numbers with Excel
From the Tools menu, look for Data Analysis.
Generating Random Numbers with Excel
If it is not there, you must install it.
Generating Random Numbers with Excel
Once you select Data Analysis, a dialog window will
appear. Scroll down to “Random Number Generation”,
select it, then press “OK”.
Generating Random Numbers with Excel
Choose the distribution you would like. Use Uniform for an
exponential or Weibull distribution, and Normal for a normal or
lognormal distribution.
Generating Random Numbers with Excel
Uniform Distribution, U(0, 1).
Select “Uniform” under the “Distribution” menu.
Type in “1” for number of variables and “10” for number of
random numbers, then press OK. Ten random numbers from the
uniform distribution will now appear on a new worksheet.
Generating Random Numbers with Excel
Normal Distribution, N(μ, σ).
Select “Normal” under the “Distribution” menu.
Type in “1” for number of variables and “10” for number of
random numbers. Enter the values for the mean (μ) and
standard deviation (σ), then press OK. Ten random numbers from
the normal distribution will now appear on a new worksheet.
Generating Random Values from an Exponential
Distribution E(θ) with Excel
First generate n random values, r1, r2, …, rn, from U(0, 1).
Select “Uniform” under the “Distribution” menu.
Type in “1” for number of variables and “10” for number of
random numbers, then press OK. Ten random numbers from the
uniform distribution will now appear on a new worksheet.
Generating Random Values from an Exponential
Distribution E(θ) with Excel
Select the θ that you would like to use; we will use θ = 5.
In cell B1, type in the equation xi = -θ ln(1 - ri), filling in θ as 5 and ri as cell A1
(=-5*LN(1-A1)). With that cell selected, place the cursor over the
bottom right-hand corner of the cell. A cross will appear; drag this cross
down to B10. This copies the equation to the cells below. We now
have n = 10 random values from the exponential distribution with parameter θ = 5
in cells B1 - B10.
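The same calculation can be sketched outside Excel. The following is a minimal Python sketch, not part of the original slides; the function name `exponential_sample` and the fixed seed are illustrative assumptions. It applies xi = -θ ln(1 - ri) to uniform values, so θ plays the role of the mean.

```python
import math
import random

def exponential_sample(theta, n, rng):
    """Return n values from an exponential distribution with mean theta.

    Hypothetical helper: applies x_i = -theta * ln(1 - r_i) to r_i ~ U(0, 1).
    """
    # rng.random() lies in [0, 1), so 1 - r is in (0, 1] and the log is defined.
    return [-theta * math.log(1.0 - rng.random()) for _ in range(n)]

rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable
values = exponential_sample(5.0, 10, rng)
print(values)
```

With only 10 values the sample mean can differ noticeably from θ = 5; larger n brings it closer.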
Generating Random Values from a Weibull
Distribution W(θ, β) with Excel
First generate n random values, r1, r2, …, rn, from U(0, 1).
Select “Uniform” under the “Distribution” menu.
Type in “1” for number of variables and “10” for number of
random numbers, then press OK. Ten random numbers from the
uniform distribution will now appear on a new worksheet.
Generating Random Values from a Weibull
Distribution W(θ, β) with Excel
Select the θ and β that you would like to use; we will use θ = 100, β = 20.
Type in the equation xi = θ[-ln(1 - ri)]^(1/β), filling in θ as 100, β as 20, and
ri as cell A1 (=100*(-LN(1-A1))^(1/20)). Now copy that equation to the
cells below. We now have n random values from the Weibull distribution
with parameters θ = 100 and β = 20 in cells B1 - B10.
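A minimal Python sketch of the same Weibull transformation follows. It is not from the slides; the function name `weibull_sample` and the seed are illustrative assumptions.

```python
import math
import random

def weibull_sample(theta, beta, n, rng):
    """Return n Weibull values with scale theta and shape beta.

    Hypothetical helper: applies x_i = theta * (-ln(1 - r_i)) ** (1 / beta).
    """
    return [theta * (-math.log(1.0 - rng.random())) ** (1.0 / beta)
            for _ in range(n)]

rng = random.Random(42)  # fixed seed (an assumption) for repeatability
values = weibull_sample(100.0, 20.0, 10, rng)
print(values)
```

With β = 20 the values cluster fairly tightly just below θ = 100.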
Generating Random Values from a Lognormal
Distribution LN(μ, σ) with Excel
First generate n random values, r1, r2, …, rn, from N(0, 1).
Select “Normal” under the “Distribution” menu.
Type in “1” for number of variables and “10” for number of
random numbers. Enter 0 for the mean and 1 for the standard
deviation, then press OK. Ten random numbers from the standard
normal distribution will now appear on a new worksheet.
Generating Random Values from a Lognormal
Distribution LN(μ, σ) with Excel
Select the μ and σ that you would like to use; we will use μ = 2, σ = 1.
Type in the equation xi = e^(μ + ri σ), filling in μ as 2, σ as 1, and ri as cell
A1 (=EXP(2+A1*1)). Now copy that equation to the cells below. We now
have n random values from the lognormal distribution in cells B1 - B10.
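The lognormal case can be sketched the same way in Python. This is an illustration only; the function name `lognormal_sample` and the seed are assumptions, and the standard normal values come from `random.Random.gauss` rather than from Excel.

```python
import math
import random

def lognormal_sample(mu, sigma, n, rng):
    """Return n lognormal values via x_i = exp(mu + r_i * sigma), r_i ~ N(0, 1)."""
    return [math.exp(mu + rng.gauss(0.0, 1.0) * sigma) for _ in range(n)]

rng = random.Random(42)  # fixed seed (an assumption) for repeatability
values = lognormal_sample(2.0, 1.0, 10, rng)
print(values)
```

Taking the natural log of each generated value recovers a normal sample with mean μ and standard deviation σ.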
Flow Chart of Monte Carlo Simulation method
• Input 1: Statistical distribution for each component variable.
• Input 2: Relationship between component variables and system
performance.
1. Select a random value from each of these distributions.
2. Calculate the value of system performance for a system
composed of components with the values obtained in the previous step.
3. Repeat steps 1 and 2 many times.
• Output: Summarize and plot the resulting values of system
performance. This provides an approximation of the
distribution of system performance.
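The flow chart can be sketched in Python as follows. Everything specific here is a hypothetical example, not taken from the slides: a two-component series system, exponential component lives with means 100 and 200, and system life equal to the shorter of the two.

```python
import math
import random
import statistics

def simulate_system(n_trials, rng):
    """Monte Carlo sketch for a hypothetical two-component series system."""
    results = []
    for _ in range(n_trials):
        # Step 1: draw one random value per component
        # (Input 1, assumed exponential with means 100 and 200).
        life_a = -100.0 * math.log(1.0 - rng.random())
        life_b = -200.0 * math.log(1.0 - rng.random())
        # Step 2: compute system performance
        # (Input 2, assumed series system: it fails at the first failure).
        results.append(min(life_a, life_b))
    return results  # Step 3 is the loop: repeat many times

results = simulate_system(20000, random.Random(7))
# Output: summarize the simulated distribution of system life.
print("mean:", statistics.mean(results))
print("stdev:", statistics.stdev(results))
```

For this particular assumed system the exact mean life is 1 / (1/100 + 1/200), about 66.7, which gives a check on the simulated summary.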
Distribution of Sample Mean
Sampling Distribution of X̄ with known σ
If X1, X2, ..., Xn is a random sample of size n
from a normal distribution with mean μ and
known standard deviation σ, and if
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
then
$$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
and
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Central Limit Theorem
If X̄ is the mean of a random sample of size n,
X1, X2, …, Xn, from a population with mean μ and
finite standard deviation σ, then as n → ∞ the
limiting distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is the standard normal distribution.
Central Limit Theorem
Remark: The Central Limit Theorem provides the
basis for approximating the distribution of X̄ with
a normal distribution with mean μ and standard
deviation
$$\frac{\sigma}{\sqrt{n}}$$
The approximation gets better as n gets larger.
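The remark can be checked numerically. The sketch below is an illustration under assumed settings, not part of the slides: the population is exponential with mean 5 (so σ = 5), the sample size is n = 30, and 5000 sample means are simulated. The function name `sample_means` is hypothetical.

```python
import math
import random
import statistics

def sample_means(n, n_samples, rng, theta=5.0):
    """Return n_samples sample means, each from n exponential(mean=theta) draws."""
    means = []
    for _ in range(n_samples):
        draws = [-theta * math.log(1.0 - rng.random()) for _ in range(n)]
        means.append(sum(draws) / n)
    return means

means = sample_means(n=30, n_samples=5000, rng=random.Random(2004))
print("mean of sample means:    ", statistics.mean(means))
print("st. dev. of sample means:", statistics.stdev(means))
print("sigma / sqrt(n):         ", 5.0 / math.sqrt(30))
```

Even though the exponential population is strongly skewed, the simulated standard deviation of X̄ should land close to σ/√n.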
Sampling Distribution of X̄ with Unknown σ
Let X1, X2, ..., Xn be independent random variables
that have a normal distribution with mean μ and
unknown standard deviation σ. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
Then the random variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with ν = n - 1 degrees of freedom.
Distribution of Sample
Standard Deviation
Sampling Distribution of S²
If S² is the variance of a random sample of size n
taken from a normal population having the
variance σ², then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{\left(X_i - \bar{X}\right)^2}{\sigma^2}$$
has a chi-squared distribution with ν = n - 1
degrees of freedom.
Inferences Based on a
Single Random Sample
Estimation - Binomial Distribution
Estimation of a Proportion, p
• X1, X2, …, Xn is a random sample of size n from
B(n, p)
• Point estimate of p:
$$\hat{p} = \frac{f_s}{n}$$
where fs = # of successes
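Estimation - Binomial Distribution
• Approximate (1 - α) · 100% confidence interval
for p: $(p'_L, p'_U)$
where
$$p'_L = \hat{p} - \Delta p \quad\text{and}\quad p'_U = \hat{p} + \Delta p$$
where
$$\Delta p = Z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}, \qquad \hat{q} = 1 - \hat{p},$$
and $Z_{\alpha/2}$ is the value of the standard normal
random variable Z such that
$$P\left(Z > Z_{\alpha/2}\right) = \frac{\alpha}{2}$$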
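A minimal Python sketch of this interval follows. The function name `proportion_ci` and the example counts (40 successes in 100 trials) are hypothetical; the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Approximate (1 - alpha) * 100% confidence interval for a proportion."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_{alpha/2}
    delta_p = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - delta_p, p_hat + delta_p

low, high = proportion_ci(40, 100)  # hypothetical data
print(round(low, 4), round(high, 4))
```

For these assumed counts the interval is roughly 0.304 to 0.496.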
Estimation of the Mean - Normal Distribution
• X1, X2, …, Xn is a random sample of size n from
N(μ, σ), where both μ & σ are unknown.
• Point Estimate of μ
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
• (1 - α) · 100% Confidence Interval for the mean: $(\mu_L, \mu_U)$
where
$$\Delta\mu = t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},$$
$$\mu_L = \bar{X} - \Delta\mu \quad\text{and}\quad \mu_U = \bar{X} + \Delta\mu$$
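The interval can be sketched in Python as below. The standard library has no t quantile function, so the critical value is passed in as read from a t table; the function name `mean_ci`, the data, and the choice α = 0.05 are all hypothetical.

```python
import math
import statistics

def mean_ci(data, t_crit):
    """Confidence interval for the mean; t_crit is t_{alpha/2, n-1} from a table."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    delta_mu = t_crit * s / math.sqrt(n)
    return x_bar - delta_mu, x_bar + delta_mu

data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.0, 9.7, 10.3]  # hypothetical measurements
low, high = mean_ci(data, t_crit=2.365)  # t_{0.025, 7} from a t table
print(round(low, 3), round(high, 3))
```

Passing the table value explicitly keeps the sketch dependency-free; a statistics package could supply the quantile instead.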
Estimation of the Mean - Infinite Population
- Type Unknown
• X1, X2, …, Xn is a random sample of size n
• Point Estimate of μ
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
• An approximate (1 - α) · 100% Confidence Interval for the mean,
based on the Central Limit Theorem: $(\mu_L, \mu_U)$
where
$$\Delta\mu = t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},$$
$$\mu_L = \bar{X} - \Delta\mu \quad\text{and}\quad \mu_U = \bar{X} + \Delta\mu$$
Estimation of Means - Finite Populations
• X1, X2, ... , Xn is a random sample of size n from
a population of size N with unknown parameters
μ and σ
• Point Estimate of μ: $\hat{\mu} = \bar{X}$
• An approximate (1 - α) · 100% Confidence
Interval for μ is $(\hat{\mu}'_L, \hat{\mu}'_U)$
where
$$\hat{\mu}'_L = \bar{x} - \Delta\mu \quad\text{and}\quad \hat{\mu}'_U = \bar{x} + \Delta\mu,$$
where
$$\Delta\mu = t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}},$$
Estimation of Means - Finite Populations
where
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
• $t_{\alpha/2,\,n-1}$ is the value of T ~ t with n - 1 degrees of freedom for which
$$P\left(T > t_{\alpha/2,\,n-1}\right) = \frac{\alpha}{2}$$
• $\sqrt{\dfrac{N-n}{N-1}}$ is the finite population correction factor
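To show the effect of the finite population correction factor, here is a minimal Python sketch. The function name, the data, the population size N = 50, and the table value 2.365 (for α = 0.05 and 7 degrees of freedom) are hypothetical choices for illustration.

```python
import math
import statistics

def finite_population_mean_ci(data, population_size, t_crit):
    """Approximate CI for the mean of a finite population of size N.

    t_crit is t_{alpha/2, n-1}, supplied from a t table.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    # Finite population correction factor sqrt((N - n) / (N - 1)).
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    delta_mu = t_crit * s / math.sqrt(n) * fpc
    return x_bar - delta_mu, x_bar + delta_mu

data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.0, 9.7, 10.3]  # hypothetical sample
low, high = finite_population_mean_ci(data, population_size=50, t_crit=2.365)
print(round(low, 3), round(high, 3))
```

Because the factor is below 1 whenever n > 1, the interval is narrower than the one computed without the correction.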
Estimation of Lognormal Distribution
• Random sample of size n, X1, X2, ... , Xn from
LN(μ, σ)
• Let Yi = ln Xi for i = 1, 2, ..., n
• Treat Y1, Y2, ... , Yn as a random sample from
N(μ, σ)
• Estimate μ and σ using the Normal Distribution
Methods
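Estimation of Weibull Distribution
• Random sample of size n, T1, T2, …, Tn, from
W(β, θ), where both β & θ are unknown.
• Point estimates
• $\hat{\beta}$ is the solution of g(β) = 0, where
$$g(\beta) = \frac{\sum_{i=1}^{n} T_i^{\beta}\ln T_i}{\sum_{i=1}^{n} T_i^{\beta}} - \frac{1}{\beta} - \frac{1}{n}\sum_{i=1}^{n}\ln T_i$$
• $\hat{\theta}$ is given by
$$\hat{\theta} = \left(\frac{1}{n}\sum_{i=1}^{n} T_i^{\hat{\beta}}\right)^{1/\hat{\beta}}$$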
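The equation g(β) = 0 has no closed-form solution, so it is solved numerically. The sketch below uses bisection, which is one possible choice and not necessarily the method intended in the course. The function names, the starting bracket, the tolerance, and the failure times are all hypothetical. It assumes positive times that are not all equal, in which case g increases from negative to positive values as β grows.

```python
import math

def g(beta, times):
    """Left-hand side of the Weibull likelihood equation g(beta) = 0."""
    logs = [math.log(t) for t in times]
    weights = [t ** beta for t in times]
    weighted_mean_log = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
    return weighted_mean_log - 1.0 / beta - sum(logs) / len(logs)

def weibull_mle(times, lo=1e-3, hi=1.0, tol=1e-10):
    """Return (beta_hat, theta_hat) by bisection on g (assumed bracket)."""
    # Expand the upper end until g changes sign.
    while g(hi, times) < 0.0:
        hi *= 2.0
    # Bisection: keep g(lo) < 0 <= g(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid, times) < 0.0:
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)
    theta_hat = (sum(t ** beta_hat for t in times) / len(times)) ** (1.0 / beta_hat)
    return beta_hat, theta_hat

times = [72.0, 85.0, 96.0, 104.0, 110.0, 121.0]  # hypothetical failure times
beta_hat, theta_hat = weibull_mle(times)
print(beta_hat, theta_hat)
```

For very large values of Ti or β the powers Ti ** β can overflow; rescaling the times first is one way around that.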
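Estimation of Standard Deviation - Normal Distribution
• Point Estimate of σ
$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} = \sqrt{\frac{n-1}{n}}\; s$$
• (1 - α) · 100% Confidence Interval for σ is $(\sigma_L, \sigma_U)$
where
$$\sigma_L = s\sqrt{\frac{n-1}{\chi^2_{\alpha/2,\,n-1}}}$$
and
$$\sigma_U = s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2,\,n-1}}}$$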
Testing Hypotheses
There are two possible decision errors associated
with testing a statistical hypothesis:
A Type I error is made when a true hypothesis is rejected.
A Type II error is made when a false hypothesis is accepted.
                          True Situation
Decision                  H0 true         H0 false
Accept H0                 correct         Type II error
Reject H0 (Accept H1)     Type I error    correct
Testing Hypotheses
The decision risks are measured in terms of
probability.
α = P(Type I error)
= P(reject H0|H0 is true)
= Producer's risk
β = P(Type II error)
= P(accept H0|H1 is true)
= Consumer's risk
Remark: 100% · α is commonly referred to as the
significance level of a test.
Note: For fixed n, α increases as β decreases, and vice
versa; as n increases, both α and β decrease.
Power Function
Before applying a test procedure, i.e., a decision
rule, we need to analyze its discriminating power,
i.e., how good the test is. A function called the
power function enables us to make this analysis.
Power Function = P(rejecting H0|true parameter value)
OC Function = P(accepting H0|true parameter value)
= 1 - Power Function
where OC is Operating Characteristic.
Power Function
A plot of the power function vs the test parameter
value is called the power curve and 1 - power curve
is the OC curve.
[Figure: ideal power curve. Vertical axis PR(μ), from 0 to 1; horizontal axis μ, divided into the H0 and H1 regions.]
Power Function
The power function of a statistical test of
hypothesis is the probability of rejecting H0 as
a function of the true value of the parameter
being tested, say θ, i.e.,
PF(θ) = PR(θ)
= P(reject H0|θ)
= P(test statistic falls in CA|θ)
Operating Characteristic Function
The operating characteristic function of a statistical
test of hypothesis is the probability of accepting
H0 as a function of the true value of the parameter
being tested, say θ, i.e.,
OC(θ) = PA(θ)
= P(accept H0|θ)
= P(test statistic falls in CR|θ)
Tests of Proportions
Let X1, X2, . . ., Xn be a random sample of size n
from B(n, p).
Case 1: small sample sizes
To test the Null Hypothesis
H0: p = p0, a specified value, against the
appropriate Alternative Hypothesis
1. HA: p < p0 , or
2. HA: p > p0 , or
3. HA: p ≠ p0 ,
Tests of Proportions
at the 100 · α% Level of Significance, calculate
the value of the test statistic using X ~ B(n, p = p0).
Find the number of successes x and compute the
appropriate P-Value, depending upon the alternative
hypothesis, and reject H0 if P ≤ α, where
1. P = P(X ≤ x|p = p0) , or
2. P = P(X ≥ x|p = p0) , or
3. P = 2P(X ≤ x|p = p0) if x < np0, or
P = 2P(X ≥ x|p = p0) if x > np0.
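The three P-Value rules can be sketched in Python with exact binomial sums. The function names and the example (x = 2 successes in n = 10 trials with p0 = 0.5) are hypothetical. Two details are assumptions added here because the slide does not cover them: the two-sided value is capped at 1, and the case x = np0 returns 1.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def p_value(x, n, p0, alternative):
    """Exact P-Value for H0: p = p0; alternative is 'less', 'greater' or 'two-sided'."""
    lower = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # P(X <= x)
    upper = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # P(X >= x)
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if x < n * p0:
        return min(1.0, 2.0 * lower)  # cap at 1 is an added assumption
    if x > n * p0:
        return min(1.0, 2.0 * upper)
    return 1.0  # x == n * p0: not covered by the slide, assumed here

print(p_value(2, 10, 0.5, "less"))
print(p_value(2, 10, 0.5, "two-sided"))
```

With α = 0.05 neither of these two P-Values leads to rejecting H0.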
Tests of Proportions
Case 2: large sample sizes with p not extremely
close to 0 or 1.
To test the Null Hypothesis
H0: p = p0, a specified value, against the
appropriate Alternative Hypothesis
1. HA: p < p0 , or
2. HA: p > p0 , or
3. HA: p ≠ p0 ,
Tests of Proportions
Calculate the value of the test statistic
$$Z = \frac{x - np_0}{\sqrt{np_0q_0}}, \qquad q_0 = 1 - p_0,$$
and reject H0 if
1. $z < -z_{\alpha}$ , or
2. $z > z_{\alpha}$ , or
3. $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ ,
depending on the alternative hypothesis.
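A minimal Python sketch of the large-sample test follows. The function name `proportion_z_test` and the example counts are hypothetical; critical values come from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def proportion_z_test(x, n, p0, alpha, alternative):
    """Return (z, reject) for H0: p = p0 using the normal approximation."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    nd = NormalDist()
    if alternative == "less":
        reject = z < -nd.inv_cdf(1.0 - alpha)      # z < -z_alpha
    elif alternative == "greater":
        reject = z > nd.inv_cdf(1.0 - alpha)       # z > z_alpha
    else:
        reject = abs(z) > nd.inv_cdf(1.0 - alpha / 2.0)  # two-sided
    return z, reject

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5.
z, reject = proportion_z_test(60, 100, 0.5, alpha=0.05, alternative="two-sided")
print(z, reject)
```

Here z = 2.0, which exceeds the two-sided critical value of about 1.96 at α = 0.05.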
Test of Means
Let X1, …, Xn be a random sample of size n from
a normal distribution with mean μ and standard
deviation σ, both unknown.
To test the Null Hypothesis
H0: μ = μ0 , a given or specified value
against the appropriate
Alternative Hypothesis
1. HA: μ < μ0 , or
2. HA: μ > μ0 , or
3. HA: μ ≠ μ0 ,
Test of Means
at the 100 · α% level of significance, calculate the
value of the test statistic
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
Reject H0 if
1. $t < -t_{\alpha,\,n-1}$ ,
2. $t > t_{\alpha,\,n-1}$ ,
3. $t < -t_{\alpha/2,\,n-1}$ , or if $t > t_{\alpha/2,\,n-1}$ ,
depending on the Alternative Hypothesis.
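The test can be sketched in Python as below. The critical value is taken from a t table and passed in, since the standard library has no t quantile function. The function name `t_test`, the data, and μ0 = 10 are hypothetical.

```python
import math
import statistics

def t_test(data, mu0, t_crit, alternative):
    """Return (t, reject) for H0: mu = mu0.

    t_crit is the table value t_{alpha, n-1} for a one-sided alternative
    ('less' or 'greater') or t_{alpha/2, n-1} for the two-sided one.
    """
    n = len(data)
    t = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    if alternative == "less":
        return t, t < -t_crit
    if alternative == "greater":
        return t, t > t_crit
    return t, abs(t) > t_crit

data = [9.8, 10.2, 10.4, 9.9, 10.1, 10.0, 9.7, 10.3]  # hypothetical measurements
t, reject = t_test(data, mu0=10.0, t_crit=2.365, alternative="two-sided")
print(round(t, 4), reject)
```

For these assumed data t is about 0.577, well inside the acceptance region at α = 0.05.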
Test of Variances
Let X1, …, Xn be a random sample of size n from
a normal distribution with mean μ and standard
deviation σ, both unknown.
To test the Null Hypothesis
H0: σ² = σ0², a specified value
against the appropriate
Alternative Hypothesis
1. HA: σ² < σ0² , or
2. HA: σ² > σ0² , or
3. HA: σ² ≠ σ0² ,
Test of Variances
at the 100 · α% level of significance, calculate the
value of the test statistic
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
Reject H0 if
1. $\chi^2 < \chi^2_{1-\alpha,\,n-1}$ ,
2. $\chi^2 > \chi^2_{\alpha,\,n-1}$ ,
3. $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ , or if $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ ,
depending on the Alternative Hypothesis.
Inferences Based on
Two Random Samples
Estimation - Binomial Populations
Estimation of the difference between two
proportions
• Let X11, X12, …, X1n1 and X21, X22, …, X2n2
be random samples from B(n1, p1) and
B(n2, p2), respectively
• Point estimation of p1 - p2
$$\Delta\hat{p} = \hat{p}_1 - \hat{p}_2 = \bar{X}_1 - \bar{X}_2 = \frac{f_1}{n_1} - \frac{f_2}{n_2}$$
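Estimation - Binomial Populations
• Approximate (1 - α) · 100% confidence interval
for Δp = p1 - p2: $(\Delta\hat{p}_L, \Delta\hat{p}_U)$
where
$$\Delta\hat{p}_L = \Delta\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
and
$$\Delta\hat{p}_U = \Delta\hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$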
Estimation of Difference
Between Two Means - Normal Distribution
• Let X11, X12, …, X1n1 and X21, X22, …, X2n2 be
random samples from N(μ1, σ1) and N(μ2, σ2),
respectively, where μ1, σ1, μ2 and σ2 are all
unknown
• Point estimation of Δμ = μ1 - μ2
$$\Delta\hat{\mu} = \hat{\mu}_1 - \hat{\mu}_2 = \bar{X}_1 - \bar{X}_2$$
Estimation of Difference
Between Two Means - Normal Distribution
• An approximate (1 - α) · 100% Confidence
Interval for Δμ = μ1 - μ2 is $(\Delta\hat{\mu}'_L, \Delta\hat{\mu}'_U)$
where
$$\Delta\hat{\mu}'_L = \Delta\hat{\mu} - t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and
$$\Delta\hat{\mu}'_U = \Delta\hat{\mu} + t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Estimation of Difference
Between Two Means - Normal Distribution
where ν = degrees of freedom
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$
Estimation of Ratio of Two
Standard Deviations - Normal Distribution
• Let X11, X12, …, X1n1 and X21, X22, …, X2n2 be
random samples from N(μ1, σ1) and N(μ2, σ2),
respectively
• Point estimation of
$$r_\sigma = \frac{\sigma_1}{\sigma_2}$$
is
$$\hat{r}_\sigma = \frac{s_1}{s_2}$$
where
$$s_i = \sqrt{\frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_i\right)^2} \quad\text{for } i = 1, 2$$
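Estimation of Ratio of Two
Standard Deviations - Normal Distribution
• (1 - α) · 100% Confidence Interval for $r_\sigma = \sigma_1/\sigma_2$ is $(r_{\sigma L}, r_{\sigma U})$
where
$$r_{\sigma L} = \frac{s_1}{s_2}\sqrt{\frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}}}$$
and
$$r_{\sigma U} = \frac{s_1}{s_2}\sqrt{F_{\alpha/2,\,\nu_2,\,\nu_1}}$$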
Estimation of Ratio of Two
Standard Deviations - Normal Distribution
where $F_{\alpha/2,\,\nu_1,\,\nu_2}$ is the value of the F-Distribution with
ν1 = n1 - 1 and ν2 = n2 - 1 degrees of freedom for which
$$P\left(F > F_{\alpha/2,\,\nu_1,\,\nu_2}\right) = \frac{\alpha}{2}$$
Test on Two Means
Let X11, X12, …, X1n1 be a random sample of size n1 from
N(μ1, σ1) and X21, X22, …, X2n2 be a random sample of
size n2 from N(μ2, σ2), where μ1, σ1, μ2 and σ2 are all
unknown.
To test
H0: μ1 - μ2 = d0, where d0 ≥ 0,
against the appropriate alternative hypothesis
Test on Two Means
1. H1: μ1 - μ2 < d0, where d0 ≥ 0, or
2. H1: μ1 - μ2 > d0, where d0 ≥ 0, or
3. H1: μ1 - μ2 ≠ d0, where d0 ≥ 0,
at the α · 100% level of significance, calculate the
value of the test statistic
$$t' = \frac{\left(\bar{X}_1 - \bar{X}_2\right) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Test on Two Means
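Reject H0 if
1. $t' < -t_{\alpha,\,\nu}$ , or
2. $t' > t_{\alpha,\,\nu}$ , or
3. $t' < -t_{\alpha/2,\,\nu}$ or $t' > t_{\alpha/2,\,\nu}$ ,
depending on the alternative hypothesis, where
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$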
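The test statistic t' and the degrees of freedom ν can be computed together, as in this minimal Python sketch. The function name `two_sample_t` and both samples are hypothetical; the result is then compared against a t table value for ν degrees of freedom, usually after rounding ν down.

```python
import math
import statistics

def two_sample_t(sample1, sample2, d0=0.0):
    """Return (t_prime, nu) for H0: mu1 - mu2 = d0 with unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t_prime = (diff - d0) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, nu

sample1 = [10.0, 12.0, 14.0, 16.0]      # hypothetical data
sample2 = [8.0, 9.0, 10.0, 11.0, 12.0]  # hypothetical data
t_prime, nu = two_sample_t(sample1, sample2)
print(round(t_prime, 4), round(nu, 4))
```

For these assumed samples t' is about 2.038 with ν about 4.75.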
Test on Two Variances
Let X11, X12, …, X1n1 be a random sample of size n1 from
N(μ1, σ1) and X21, X22, …, X2n2 be a random sample of
size n2 from N(μ2, σ2), where μ1, σ1, μ2 and σ2 are all
unknown.
To test
H0: σ1² = σ2²
against the appropriate alternative hypothesis
Test on Two Variances
1. H1: σ1² < σ2² , or
2. H1: σ1² > σ2² , or
3. H1: σ1² ≠ σ2² ,
at the α · 100% level of significance, calculate the
value of the test statistic
$$F = \frac{S_1^2}{S_2^2}$$
Test on Two Variances
Reject H0 if
1. $F < F_{1-\alpha}(\nu_1, \nu_2)$ , or
2. $F > F_{\alpha}(\nu_1, \nu_2)$ , or
3. $F < F_{1-\alpha/2}(\nu_1, \nu_2)$ or $F > F_{\alpha/2}(\nu_1, \nu_2)$ ,
depending on the alternative hypothesis.
Inferences Based on
More than Two Random Samples
Normal Distribution - Estimation of μ
X1, X2, …, Xn is a random sample of size n from
N(μ, σ), where both μ & σ are unknown.
• Point Estimate of μ
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$
• (1 - α) · 100% Confidence Interval for μ is $(\mu_L, \mu_U)$,
where
$$\mu_L = \bar{X} - \Delta\mu \quad\text{and}\quad \mu_U = \bar{X} + \Delta\mu$$
Normal Distribution - Estimation of μ
$$\Delta\mu = t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is the value of the t-distribution with
parameter ν = n - 1 for
which $P\left(T > t_{\alpha/2,\,n-1}\right) = \alpha/2$
and may be obtained from the t-distribution table
(located in the resource section on the website).