Chapter 3
Inferences About Process Quality
Introduction to Statistical Quality Control, 4th Edition
3-1. Statistics and Sampling
Distributions
• Statistical methods are used to make
decisions about a process
– Is the process out of control?
– Is the process average you were given actually the true value?
– What is the true process variability?
3-1. Statistics and Sampling
Distributions
• Statistics are quantities calculated from a
random sample taken from a population of
interest.
• The probability distribution of a statistic is
called a sampling distribution.
3-1.1 Sampling from a Normal
Distribution
• Let X represent measurements taken from a normal distribution, $X \sim N(\mu, \sigma^2)$.
• Select a sample of size n, at random, and calculate the sample mean $\bar{x}$.
• Then $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
3-1.1 Sampling from a Normal
Distribution
• Probability Example
– The life of an automotive battery is normally
distributed with mean 900 days and standard
deviation 35 days. What is the probability that
a random sample of 25 batteries will have an
average life of more than 1000 days?
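As a check on this calculation, here is a minimal Python sketch (assuming SciPy is available; not part of the original slides) that evaluates the probability from the sampling distribution of the mean:

```python
import math
from scipy import stats

mu, sigma, n = 900, 35, 25            # battery-life parameters from the example
se = sigma / math.sqrt(n)             # standard error of the sample mean, sigma/sqrt(n)

# P(x-bar > 1000) when x-bar ~ N(mu, sigma^2/n)
prob = stats.norm.sf(1000, loc=mu, scale=se)
print(prob)                           # essentially zero: 1000 is about 14 standard errors above 900
```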
3-1.1 Sampling from a Normal
Distribution
• Chi-square (2) Distribution
– If x1, x2, …, xn are normally and independently
distributed random variables with mean zero
and variance one, then the random variable
$y = x_1^2 + x_2^2 + \cdots + x_n^2$
is distributed as chi-square with n degrees of freedom.
3-1.1 Sampling from a Normal
Distribution
• Chi-square (2) Distribution
– Furthermore, the sampling distribution of
$y = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$
is chi-square with n – 1 degrees of freedom
when sampling from a normal population.
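This sampling-distribution result can be checked by simulation. The sketch below uses hypothetical values of μ, σ, and n (not taken from the text) and compares empirical quantiles of (n − 1)S²/σ² with chi-square quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 10.0, 2.0, 8, 100_000   # hypothetical values, for illustration only

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)             # sample variances S^2
y = (n - 1) * s2 / sigma**2                  # should behave like chi-square with n-1 df

# Compare a few empirical quantiles with chi-square(n-1) quantiles
for q in (0.10, 0.50, 0.90):
    print(q, np.quantile(y, q), stats.chi2.ppf(q, df=n - 1))
```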
3-1.1 Sampling from a Normal
Distribution
– Chi-square (2)
Distribution for various
degrees of freedom.
3-1.1 Sampling from a Normal
Distribution
• t-distribution
– If x is a standard normal random variable and y is a chi-square random variable with k degrees of freedom, independent of x, then
$t = \frac{x}{\sqrt{y/k}}$
is distributed as t with k degrees of freedom.
3-1.1 Sampling from a Normal
Distribution
• F-distribution
– If w and y are two independent chi-square
random variables with u and v degrees of
freedom, respectively, then
$F = \frac{w/u}{y/v}$
is distributed as F with u numerator degrees of
freedom and v denominator degrees of freedom.
3-1.2 Sampling from a Bernoulli
Distribution
• A random variable, x, with probability
function
$p(x) = \begin{cases} p, & x = 1 \\ 1 - p = q, & x = 0 \end{cases}$
is called a Bernoulli random variable.
• The sum of a sample from a Bernoulli
process has a binomial distribution with
parameters n and p.
3-1.2 Sampling from a Bernoulli
Distribution
• x1, x2, …, xn taken from a Bernoulli process
• The sample mean is a discrete random
variable given by
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• The mean and variance of $\bar{x}$ are
$\mu_{\bar{x}} = p, \qquad \sigma^2_{\bar{x}} = \frac{p(1-p)}{n}$
3-1.3 Sampling from a Poisson
Distribution
• Consider a random sample of size n, x1, x2, …, xn, taken from a Poisson process with parameter λ.
• The sum x = x1 + x2 + … + xn is also Poisson, with parameter nλ.
• The sample mean is a discrete random variable given by
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• The mean and variance of $\bar{x}$ are $\mu_{\bar{x}} = \lambda$ and $\sigma^2_{\bar{x}} = \lambda/n$.
3-2. Point Estimation of Process
Parameters
• Parameters are values representing the population.
Ex) μ, σ² — the population mean and variance, respectively.
• Parameters in reality are often unknown and
must be estimated.
• Statistics are estimates of parameters.
Ex) x̄, S² — the sample mean and sample variance, respectively.
3-2. Point Estimation of Process
Parameters
Two properties of good point estimators
1. The point estimator should be unbiased.
2. The point estimator should have minimum
variance.
3-3. Statistical Inference for a
Single Sample
Two categories of statistical inference:
1. Parameter Estimation
2. Hypothesis Testing
3-3. Statistical Inference for a
Single Sample
• A statistical hypothesis is a statement about the values of the parameters of a probability distribution, for example
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
3-3. Statistical Inference for a
Single Sample
• Steps in Hypothesis Testing
– Identify the parameter of interest.
– State the null hypothesis, H0, and the alternative hypothesis, H1.
– Choose a significance level.
– State the appropriate test statistic.
– State the rejection region.
– Compare the value of the test statistic to the rejection region. Can the null hypothesis be rejected?
3-3. Statistical Inference for a
Single Sample
• Example: An automobile manufacturer claims a particular automobile can average 35 mpg (highway).
– Suppose we are interested in testing this claim. We will sample 25 of these particular autos and, under identical conditions, calculate the average mpg for this sample.
– Before actually collecting the data, we decide that if we get a sample average less than 33 mpg or more than 37 mpg, we will reject the maker's claim. (Critical values)
3-3. Statistical Inference for a
Single Sample
• Example (continued)
– H0: μ = 35
– H1: μ ≠ 35
– From the sample of 25 cars, the average mpg was found to be 31.5. What is your conclusion?
(Figure: rejection regions — reject H0 if x̄ < 33 or x̄ > 37; do not reject if 33 ≤ x̄ ≤ 37, centered at 35.)
3-3. Statistical Inference for a
Single Sample
Choice of Critical Values
• How are the critical values chosen?
• Wouldn’t it be easier to decide “how much room
for error you will allow” instead of finding the
exact critical values for every problem you
encounter?
OR
• Wouldn’t be easier to set the size of the rejection
region, rather than setting the critical values for
every problem?
3-3. Statistical Inference for a
Single Sample
Significance Level
• The level of significance, α, determines the size of the rejection region.
• The level of significance is a probability. It is
also known as the probability of a “Type I error”
(want this to be small)
• Type I error - rejecting the null hypothesis when
it is true.
• How small? Usually we want α ≤ 0.10.
3-3. Statistical Inference for a
Single Sample
Types of Error
• Type I error - rejecting the null hypothesis when
it is true.
• Pr(Type I error) = α. Sometimes called the producer's risk.
• Type II error - not rejecting the null hypothesis
when it is false.
• Pr(Type II error) = β. Sometimes called the consumer's risk.
3-3. Statistical Inference for a
Single Sample
An Engine Explodes
H0: An automobile engine explodes when started.
H1: An automobile engine does not explode when
started.
Which error would you take action to avoid?
Whose risk is higher, the producer’s or the
consumer’s?
3-3. Statistical Inference for a
Single Sample
Power of a Test
• The power of a test of hypothesis is given by 1 − β.
• That is, 1 − β is the probability of correctly rejecting the null hypothesis, i.e., the probability of rejecting the null hypothesis when the alternative is true.
3-3.1 Inference on the Mean of a
Population, Variance Known
Hypothesis Testing
• Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
• Test Statistic:
$Z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
• Significance Level: α
• Rejection Region: $Z_0 < -Z_{\alpha/2}$ or $Z_0 > Z_{\alpha/2}$
• If Z0 falls into either of the two regions above, reject H0.
3-3.1 Inference on the Mean of a
Population, Variance Known
Example 3-1
• Hypotheses: $H_0: \mu = 175$, $H_1: \mu > 175$
• Test Statistic:
$Z_0 = \frac{182 - 175}{10/\sqrt{25}} = 3.50$
• Significance Level: α = 0.05
• Rejection Region: $Z_0 > Z_{\alpha} = 1.645$
• Since 3.50 > 1.645, reject H0 and conclude that the lot mean pressure strength exceeds 175 psi.
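A minimal Python sketch (SciPy assumed) of this one-sided z-test, reproducing the numbers of Example 3-1; it also reports the P-value discussed in Section 3-3.2:

```python
import math
from scipy import stats

xbar, mu0, sigma, n, alpha = 182, 175, 10, 25, 0.05

z0 = (xbar - mu0) / (sigma / math.sqrt(n))   # test statistic, 3.50
z_crit = stats.norm.ppf(1 - alpha)           # one-sided critical value, 1.645
p_value = stats.norm.sf(z0)                  # P(Z > z0), about 0.00023

print(z0, z_crit, p_value)
if z0 > z_crit:
    print("Reject H0: mean exceeds 175 psi")
```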
3-3.1 Inference on the Mean of a
Population, Variance Known
Confidence Intervals
• A general 100(1 − α)% two-sided confidence interval on the true population mean, μ, is
$P[L \leq \mu \leq U] = 1 - \alpha$
• 100(1 − α)% one-sided confidence intervals are:
$P[\mu \leq U] = 1 - \alpha$ (upper)
$P[L \leq \mu] = 1 - \alpha$ (lower)
3-3.1 Inference on the Mean of a
Population, Variance Known
Confidence Interval on the Mean with
Variance Known
• Two-Sided:
$P\left[\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$
• See the text for one-sided confidence intervals.
3-3.1 Inference on the Mean of a
Population, Variance Known
Example 3-2
• Reconsider Example 3-1. Suppose a 95% two-sided confidence interval is specified. Using Equation (3-28) we compute
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$182 - 1.96\,\frac{10}{\sqrt{25}} \leq \mu \leq 182 + 1.96\,\frac{10}{\sqrt{25}}$
$178.08 \leq \mu \leq 185.92$
• Our estimate of the mean bursting strength is 182 psi ± 3.92 psi, with 95% confidence.
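The same interval, computed in Python rather than read from a normal table (SciPy assumed):

```python
import math
from scipy import stats

xbar, sigma, n, alpha = 182, 10, 25, 0.05
z = stats.norm.ppf(1 - alpha / 2)            # 1.96
half_width = z * sigma / math.sqrt(n)        # 3.92
print(xbar - half_width, xbar + half_width)  # 178.08, 185.92
```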
3-3.2 The Use of P-Values in
Hypothesis Testing
• If it is not enough to know whether your test statistic Z0 falls into a rejection region, then a measure of just how significant your test statistic is can be computed: the P-value.
• P-values are probabilities associated with the test statistic Z0.
3-3.2 The Use of P-Values in
Hypothesis Testing
Definition
• The P-value is the smallest level of significance
that would lead to rejection of the null
hypothesis H0.
3-3.2 The Use of P-Values in
Hypothesis Testing
Example
• Reconsider Example 3-1. The test statistic was
calculated to be Z0 = 3.50 for a right-tailed
hypothesis test. The P-value for this problem is
then
$P = 1 - \Phi(3.50) = 0.00023$
• Thus, H0: μ = 175 would be rejected at any level of significance α ≥ P = 0.00023.
3-3.3 Inference on the Mean of a
Population, Variance
Unknown
Hypothesis Testing
• Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
• Test Statistic:
$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
• Significance Level: α
• Rejection Region: $|t_0| > t_{\alpha/2,\,n-1}$
• Reject H0 if $|t_0| > t_{\alpha/2,\,n-1}$.
3-3.3 Inference on the Mean of a
Population, Variance
Unknown
Confidence Interval on the Mean with Variance
Unknown
• Two-Sided:
$P\left[\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right] = 1 - \alpha$
• See the text for the one-sided confidence intervals.
3-3.3 Inference on the Mean of a
Population, Variance
Unknown
Computer Output
Table 3-2. Minitab Output for Example 3-3
Welcome to Minitab, press F1 for help.
One-Sample T: Strength
Test of mu = 50 vs mu not = 50
Variable    N     Mean     StDev    SE Mean
Strength    16    49.864   1.661    0.415

Variable    95.0% CI             T        P
Strength    (48.979, 50.750)    -0.33    0.749
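The equivalent calculation can be done with scipy.stats.ttest_1samp, which performs the same one-sample t-test as the Minitab output above. The raw Strength observations are not reproduced on the slide, so the array below is a hypothetical placeholder; with the actual 16 values the T, P, and 95% CI match the table:

```python
import numpy as np
from scipy import stats

# Hypothetical placeholder data; substitute the 16 Strength observations from Example 3-3
strength = np.array([49.6, 50.2, 48.9, 51.0, 50.3, 49.1, 48.7, 50.8,
                     49.9, 50.1, 47.5, 51.3, 49.8, 50.5, 48.2, 51.9])

t_stat, p_value = stats.ttest_1samp(strength, popmean=50)   # H0: mu = 50 (two-sided)
n = len(strength)
se = strength.std(ddof=1) / np.sqrt(n)
ci = stats.t.interval(0.95, df=n - 1, loc=strength.mean(), scale=se)
print(t_stat, p_value, ci)
```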
3-3.4 Inference on the Variance of
a Normal Distribution
Hypothesis Testing
• Hypotheses: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \neq \sigma_0^2$
• Test Statistic:
$\chi_0^2 = \frac{(n-1)S^2}{\sigma_0^2}$
• Significance Level: α
• Rejection Region: $\chi_0^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi_0^2 < \chi^2_{1-\alpha/2,\,n-1}$
3-3.4 Inference on the Variance of
a Normal Distribution
Confidence Interval on the Variance
• Two-Sided:
$P\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \leq \sigma^2 \leq \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right] = 1 - \alpha$
• See the text for the one-sided confidence intervals.
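A sketch of this interval in Python (SciPy assumed); the sample size, sample variance, and α below are hypothetical inputs, not values from the text:

```python
from scipy import stats

n, s2, alpha = 20, 1.75, 0.05          # hypothetical sample size, sample variance, alpha

# chi-square upper percentage points: chi2_{alpha/2, n-1} = ppf(1 - alpha/2)
lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
print(lower, upper)                     # 100(1 - alpha)% CI on sigma^2
```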
3-3.5 Inference on a Population
Proportion
Hypothesis Testing
• Hypotheses: $H_0: p = p_0$, $H_1: p \neq p_0$
• Test Statistic:
$Z_0 = \begin{cases} \frac{(x - 0.5) - np_0}{\sqrt{np_0(1 - p_0)}}, & x > np_0 \\ \frac{(x + 0.5) - np_0}{\sqrt{np_0(1 - p_0)}}, & x < np_0 \end{cases}$
• Significance Level: α
• Rejection Region: $|Z_0| > Z_{\alpha/2}$
3-3.5 Inference on a Population
Proportion
Confidence Interval on the Population Proportion
• Two-Sided:
$P\left[\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \leq p \leq \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right] \cong 1 - \alpha$
• See the text for the one-sided confidence intervals.
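A sketch of the large-sample test (with the continuity correction above) and the two-sided interval; the counts x and n and the hypothesized p0 are hypothetical, and SciPy is assumed:

```python
import math
from scipy import stats

x, n, p0, alpha = 12, 200, 0.04, 0.05      # hypothetical: 12 nonconforming in 200, H0: p = 0.04

# Test statistic with continuity correction
if x > n * p0:
    z0 = (x - 0.5 - n * p0) / math.sqrt(n * p0 * (1 - p0))
else:
    z0 = (x + 0.5 - n * p0) / math.sqrt(n * p0 * (1 - p0))
print(z0, abs(z0) > stats.norm.ppf(1 - alpha / 2))   # reject H0?

# Two-sided confidence interval on p
phat = x / n
half = stats.norm.ppf(1 - alpha / 2) * math.sqrt(phat * (1 - phat) / n)
print(phat - half, phat + half)
```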
3-3.6 The Probability of Type II
Error
Calculation of P(Type II Error)
• Assume the test of interest is $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$.
• P(Type II Error) is found to be
$\beta = \Phi\left(Z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-Z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right)$
where δ is the shift of the true mean from the hypothesized value μ0.
• The power of the test is then 1 − β.
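The β expression is straightforward to evaluate numerically. The values of δ, σ, n, and α below are hypothetical (SciPy assumed):

```python
import math
from scipy import stats

delta, sigma, n, alpha = 2.0, 4.0, 16, 0.05    # hypothetical shift, std dev, sample size, alpha

z_half = stats.norm.ppf(1 - alpha / 2)
d = delta * math.sqrt(n) / sigma
beta = stats.norm.cdf(z_half - d) - stats.norm.cdf(-z_half - d)
power = 1 - beta
print(beta, power)
```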
3-3.6 The Probability of Type II
Error
Operating Characteristic (OC) Curves
• An operating characteristic (OC) curve is a graph representing the relationship between β, α, δ, and n.
• OC curves are useful in determining how large a sample is required to detect a specified difference with a particular probability.
3-3.6 The Probability of Type II
Error
Operating Characteristic (OC) Curves
3-3.7 Probability Plotting
• Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution, based on a subjective visual examination of the data.
• Probability plotting uses special graph paper known as probability paper. Probability paper is available for the normal, lognormal, and Weibull distributions, among others.
• A computer can also be used.
3-3.7 Probability Plotting
Example 3-8
j     x(j)    (j – 0.5)/10
1     1176    0.05
2     1183    0.15
3     1185    0.25
4     1190    0.35
5     1191    0.45
6     1192    0.55
7     1201    0.65
8     1205    0.75
9     1214    0.85
10    1220    0.95

(Figure: normal probability plot for Life — percent versus data, with ML estimates Mean = 1195.7, StDev = 13.3120.)
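A normal probability plot equivalent to the one in Example 3-8 can be produced with SciPy and Matplotlib (assumed available); the data are the ten lifetimes from the table above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

life = np.array([1176, 1183, 1185, 1190, 1191, 1192, 1201, 1205, 1214, 1220])

# probplot orders the data and plots it against normal quantiles
stats.probplot(life, dist="norm", plot=plt)
plt.title("Normal probability plot for Life")
plt.show()

print(life.mean(), life.std(ddof=0))   # ML estimates: 1195.7 and about 13.31
```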
3-4. Statistical Inference for Two
Samples
• The previous section presented hypothesis testing and confidence intervals for a single population parameter.
• These results are extended to the case of two independent populations.
• Statistical inference on the difference in population means, $\mu_1 - \mu_2$.
3-4.1 Inference For a Difference in
Means, Variances Known
Assumptions
1. X11, X12, …, X1n1 is a random sample from
population 1.
2. X21, X22, …, X2n2 is a random sample from
population 2.
3. The two populations represented by X1 and X2 are
independent
4. Both populations are normal, or if they are not
normal, the conditions of the central limit theorem
apply
3-4.1 Inference For a Difference in
Means, Variances Known
• The point estimator for $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$, where
$E[\bar{X}_1 - \bar{X}_2] = E[\bar{X}_1] - E[\bar{X}_2] = \mu_1 - \mu_2$
$V[\bar{X}_1 - \bar{X}_2] = V[\bar{X}_1] + V[\bar{X}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
3-4.1 Inference For a Difference in
Means, Variances Known
Hypothesis Tests for a Difference in Means,
Variances Known
Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test Statistic:
$Z_0 = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
3-4.1 Inference For a Difference in
Means, Variances Known
Hypothesis Tests for a Difference in Means,
Variances Known
Alternative Hypothesis                 Rejection Criterion
$H_1: \mu_1 - \mu_2 \neq \Delta_0$     $z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$
$H_1: \mu_1 - \mu_2 > \Delta_0$        $z_0 > z_{\alpha}$
$H_1: \mu_1 - \mu_2 < \Delta_0$        $z_0 < -z_{\alpha}$
3-4.1 Inference For a Difference in
Means, Variances Known
Confidence Interval on a Difference in Means,
Variances Known
The 100(1 − α)% confidence interval on the difference in means is given by
$\bar{x}_1 - \bar{x}_2 - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \;\leq\; \mu_1 - \mu_2 \;\leq\; \bar{x}_1 - \bar{x}_2 + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Hypothesis Tests for a Difference in Means,
Case I: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
• The point estimator for $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$, where
$V[\bar{X}_1 - \bar{X}_2] = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Hypothesis Tests for a Difference in Means,
Case I: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
The pooled estimate of $\sigma^2$, denoted by $S_p^2$, is defined by
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Hypothesis Tests for a Difference in Means,
Case I: $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test Statistic:
$t_0 = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Hypothesis Tests for a Difference in Means,
Variances Unknown
Alternative Hypothesis                 Rejection Criterion
$H_1: \mu_1 - \mu_2 \neq \Delta_0$     $t_0 > t_{\alpha/2,\,n_1+n_2-2}$ or $t_0 < -t_{\alpha/2,\,n_1+n_2-2}$
$H_1: \mu_1 - \mu_2 > \Delta_0$        $t_0 > t_{\alpha,\,n_1+n_2-2}$
$H_1: \mu_1 - \mu_2 < \Delta_0$        $t_0 < -t_{\alpha,\,n_1+n_2-2}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Hypothesis Tests for a Difference in Means,
Case II: $\sigma_1^2 \neq \sigma_2^2$
Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test Statistic:
$t_0^* = \frac{\bar{X}_1 - \bar{X}_2 - \Delta_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Hypothesis Tests for a Difference in Means,
Case II: $\sigma_1^2 \neq \sigma_2^2$
• The degrees of freedom for $t_0^*$ are given by
$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Confidence Intervals on a Difference in Means,
Case I: 12   22   2
The 100(1 − α)% confidence interval on the difference in means is given by
$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \;\leq\; \mu_1 - \mu_2 \;\leq\; \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
3-4.2 Inference For a Difference in
Means, Variances Unknown
Confidence Intervals on a Difference in Means,
Case II: $\sigma_1^2 \neq \sigma_2^2$
The 100(1 − α)% confidence interval on the difference in means is given by
$\bar{x}_1 - \bar{x}_2 - t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \;\leq\; \mu_1 - \mu_2 \;\leq\; \bar{x}_1 - \bar{x}_2 + t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
3-4.2 Paired Data
• Observations in an experiment are often paired to prevent extraneous factors from inflating the estimate of the variance.
• The difference is obtained for each pair of observations, dj = x1j – x2j, where j = 1, 2, …, n.
• Test the hypothesis that the mean of the difference, μd, is zero.
3-4.2 Paired Data
• The differences, dj, represent the "new" set of data, with summary statistics
$\bar{d} = \frac{1}{n}\sum_{j=1}^{n} d_j$
$S_d^2 = \frac{\sum_{j=1}^{n}(d_j - \bar{d})^2}{n - 1}$
3-4.2 Paired Data
Hypothesis Testing
• Hypotheses: $H_0: \mu_d = 0$, $H_1: \mu_d \neq 0$
• Test Statistic:
$t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}$
• Significance Level: α
• Rejection Region: $|t_0| > t_{\alpha/2,\,n-1}$
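A sketch using scipy.stats.ttest_rel, which carries out exactly this paired test; the two arrays below are hypothetical paired measurements on the same n units:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g., two gauges used on the same 8 parts)
x1 = np.array([0.265, 0.265, 0.266, 0.267, 0.267, 0.265, 0.267, 0.267])
x2 = np.array([0.264, 0.265, 0.264, 0.266, 0.267, 0.268, 0.264, 0.265])

d = x1 - x2
t0 = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))   # test statistic from the slide
t_stat, p_value = stats.ttest_rel(x1, x2)           # same t0, with a two-sided P-value
print(t0, t_stat, p_value)
```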
3-4.3 Inferences on the Variances
of Two Normal Distributions
Hypothesis Testing
• Consider testing the hypothesis that the variances of two independent normal distributions are equal:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
• Assume random samples of sizes n1 and n2 are taken from populations 1 and 2, respectively.
3-4.3 Inferences on the Variances
of Two Normal Distributions
Hypothesis Testing
• Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$
• Test Statistic:
$F_0 = \frac{S_1^2}{S_2^2}$
• Significance Level: α
• Rejection Region: $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$
3-4.3 Inferences on the Variances
of Two Normal Distributions
Alternative Hypothesis              Test Statistic                   Rejection Region
$H_1: \sigma_1^2 < \sigma_2^2$      $F_0 = \frac{S_2^2}{S_1^2}$      $F_0 > F_{\alpha,\,n_2-1,\,n_1-1}$
$H_1: \sigma_1^2 > \sigma_2^2$      $F_0 = \frac{S_1^2}{S_2^2}$      $F_0 > F_{\alpha,\,n_1-1,\,n_2-1}$
3-4.3 Inferences on the Variances
of Two Normal Distributions
Confidence Intervals on Ratio of the Variances of
Two Normal Distributions
The 100(1 − α)% two-sided confidence interval on the ratio of variances is given by
$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} \;\leq\; \frac{\sigma_1^2}{\sigma_2^2} \;\leq\; \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$
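SciPy has no single call for the two-variance F-test, so the sketch below evaluates F0 and the confidence interval on σ1²/σ2² directly from the formulas above; the sample sizes and standard deviations are hypothetical:

```python
from scipy import stats

n1, n2, s1, s2, alpha = 12, 15, 2.4, 1.8, 0.05   # hypothetical sample sizes and std devs

# Two-sided test of H0: sigma1^2 = sigma2^2
F0 = s1**2 / s2**2
f_upper = stats.f.ppf(1 - alpha / 2, dfn=n1 - 1, dfd=n2 - 1)   # F_{alpha/2, n1-1, n2-1}
f_lower = stats.f.ppf(alpha / 2, dfn=n1 - 1, dfd=n2 - 1)       # F_{1-alpha/2, n1-1, n2-1}
reject = (F0 > f_upper) or (F0 < f_lower)
print(F0, reject)

# Two-sided CI on sigma1^2 / sigma2^2
lo = (s1**2 / s2**2) * stats.f.ppf(alpha / 2, dfn=n2 - 1, dfd=n1 - 1)
hi = (s1**2 / s2**2) * stats.f.ppf(1 - alpha / 2, dfn=n2 - 1, dfd=n1 - 1)
print(lo, hi)
```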
3-4.4 Inference on Two Population
Proportions
Large-Sample Hypothesis Testing
• Hypotheses: $H_0: p_1 = p_2$, $H_1: p_1 \neq p_2$
• Test Statistic:
$Z_0 = \frac{\hat{P}_1 - \hat{P}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
• Significance Level: α
• Rejection Region: $|Z_0| > Z_{\alpha/2}$
3-4.4 Inference on Two Population
Proportions
Alternative Hypothesis    Rejection Region
$H_1: p_1 > p_2$          $z_0 > z_{\alpha}$
$H_1: p_1 < p_2$          $z_0 < -z_{\alpha}$
3-4.4 Inference on Two Population
Proportions
Confidence Interval on the Difference in Two Population
Proportions
• Two-Sided:
$\hat{p}_1 - \hat{p}_2 - Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \;\leq\; p_1 - p_2 \;\leq\; \hat{p}_1 - \hat{p}_2 + Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
• See the text for the one-sided confidence intervals.
3-5. What If We Have More Than
Two Populations?
Example
Investigating the effect of one factor (with several levels) on some response. See Table 3-5.

Hardwood           Observations
Concentration    1    2    3    4    5    6     Totals   Avg
5%               7    8    15   11   9    10    60       10.00
10%              12   17   13   18   19   15    94       15.67
15%              14   18   19   17   16   18    102      17.00
20%              19   25   22   23   18   20    127      21.17
Overall                                         383      15.96
3-5. What If We Have More Than
Two Populations?
Analysis of Variance
• It is always good practice to compare the levels of the factor using graphical methods such as boxplots.
• Comparative boxplots show the variability of the observations within a factor level and the variability between factor levels.
3-5. What If We Have More Than
Two Populations?
Figure 3-14 (a)
(Comparative boxplots of tensile strength (psi) versus hardwood concentration (%): 5, 10, 15, 20.)
3-5. What If We Have More Than
Two Populations?
• The observations yij can be modeled by
$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n$
where a = number of factor levels and n = number of replicates (number of observations per treatment, or factor, level).
3-5. What If We Have More Than
Two Populations?
• The hypotheses being tested are
$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
$H_1: \tau_i \neq 0$ for at least one i
• Total variability can be measured by the "total corrected sum of squares":
$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2$
3-5. What If We Have More Than
Two Populations?
• The sum of squares identity is
$\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$
• Notationally, this is often written as
$SS_T = SS_{Treatments} + SS_E$
3-5. What If We Have More Than
Two Populations?
• The expected value of the treatment sum of squares is
$E(SS_{Treatments}) = (a-1)\sigma^2 + n\sum_{i=1}^{a}\tau_i^2$
• If the null hypothesis is true, then
$E\left(\frac{SS_{Treatments}}{a-1}\right) = \sigma^2$
3-5. What If We Have More Than
Two Populations?
• The error mean square is
$MS_E = \frac{SS_E}{a(n-1)}$
• If the null hypothesis is true, the ratio
$F_0 = \frac{SS_{Treatments}/(a-1)}{SS_E/[a(n-1)]} = \frac{MS_{Treatments}}{MS_E}$
has an F-distribution with a – 1 and a(n – 1) degrees of freedom.
3-5. What If We Have More Than
Two Populations?
The following formulas can be used to calculate the sums of
squares.
• Total Sum of Squares (SST):
$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{an}$
• Sum of Squares for the Treatments (SSTreatments):
$SS_{Treatments} = \sum_{i=1}^{a}\frac{y_{i.}^2}{n} - \frac{y_{..}^2}{an}$
• Sum of Squares for Error (SSE):
$SS_E = SS_T - SS_{Treatments}$
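Applying these formulas to the hardwood-concentration data of Table 3-5, with scipy.stats.f_oneway as a cross-check (the results match the F and P values reported in Table 3-8):

```python
import numpy as np
from scipy import stats

# Tensile strength (psi) at 5, 10, 15, 20% hardwood concentration (Table 3-5)
y = np.array([[ 7,  8, 15, 11,  9, 10],
              [12, 17, 13, 18, 19, 15],
              [14, 18, 19, 17, 16, 18],
              [19, 25, 22, 23, 18, 20]], dtype=float)
a, n = y.shape                                   # a = 4 levels, n = 6 replicates

SST = (y**2).sum() - y.sum()**2 / (a * n)
SSTreat = (y.sum(axis=1)**2 / n).sum() - y.sum()**2 / (a * n)
SSE = SST - SSTreat

MSTreat = SSTreat / (a - 1)
MSE = SSE / (a * (n - 1))
F0 = MSTreat / MSE
p = stats.f.sf(F0, a - 1, a * (n - 1))
print(SST, SSTreat, SSE, F0, p)                  # 512.96, 382.79, 130.17, 19.6, ~0.000

# Cross-check with SciPy's one-way ANOVA
print(stats.f_oneway(*y))                        # same F statistic and P-value
```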
3-5. What If We Have More Than
Two Populations?
• Analysis of Variance Table 3-7
Source of Variation   Sum of Squares    Degrees of Freedom   Mean Square      F0
Treatments            SSTreatments      a – 1                MSTreatments     MSTreatments / MSE
Error                 SSE               a(n – 1)             MSE
Total                 SST               an – 1
3-5. What If We Have More Than
Two Populations?
• Analysis of Variance Table 3-8 (Minitab output)

Analysis of Variance
Source   DF   SS       MS       F       P
Factor   3    382.79   127.60   19.61   0.000
Error    20   130.17   6.51
Total    23   512.96

Level   N   Mean     StDev
5       6   10.000   2.828
10      6   15.667   2.805
15      6   17.000   1.789
20      6   21.167   2.639

Pooled StDev = 2.551
(Individual 95% CIs for each mean, based on the pooled StDev, are plotted alongside in the original output over the range 10.0 to 25.0.)
3-5. What If We Have More Than
Two Populations?
Residual Analysis
• Assumptions: model errors are normally
and independently distributed with equal
variance.
• Check the assumptions by looking at
residual plots.
3-5. What If We Have More Than
Two Populations?
• Residual Analysis
• Plot of residuals versus factor levels
(Figure: residuals plotted against percent hardwood at 5, 10, 15, and 20%.)
3-5. What If We Have More Than
Two Populations?
• Residual Analysis
• Normal probability plot of residuals
(Figure: normal probability plot of the residuals.)