Tue Feb 25 - Wharton Statistics

Lecture 13
• Multiple comparisons for one-way ANOVA
(Chapter 15.7)
• Analysis of Variance Experimental Designs
(Chapter 15.3)
15.7 Multiple Comparisons
• When the null hypothesis is rejected, it may be
desirable to find which mean(s) is (are) different,
and how they rank.
• Three statistical inference procedures designed
for this purpose are presented:
– Fisher’s least significant difference (LSD) method
– Bonferroni adjustment to Fisher’s LSD
– Tukey’s multiple comparison method
Example 15.1
• Sample means:
  x̄_convenience = 577.55
  x̄_price = 608.65
  x̄_quality = 653.00
Does the quality strategy have a higher mean sales than the
other two strategies?
Do the quality and price strategies have a higher mean than
the convenience strategy?
Does the price strategy have a smaller mean sales than
quality but a higher mean than convenience?
• Pairwise comparison: Are two population means different?
Fisher's Least Significant Difference (LSD) Method
• This method builds on the equal variances t-test of
the difference between two means.
• The test statistic is improved by using MSE rather
than s_p² (the pooled variance).
• We conclude that μ_i and μ_j differ (at the α
significance level) if |x̄_i − x̄_j| > LSD, where

  LSD = t_{α/2, n−k} · sqrt( MSE · (1/n_i + 1/n_j) )
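As a minimal sketch (Python, standard library only; the critical value t_{.025,57} ≈ 2.002 is read from a t table rather than computed), the LSD rule can be coded as:

```python
from math import sqrt

def fisher_lsd(t_crit, mse, n_i, n_j):
    # LSD = t_{alpha/2, n-k} * sqrt(MSE * (1/n_i + 1/n_j))
    return t_crit * sqrt(mse * (1.0 / n_i + 1.0 / n_j))

# Example 15.1 inputs: t_{.025,57} ~ 2.002 (from a t table), MSE = 8894,
# n_i = n_j = 20 observations per strategy
lsd = fisher_lsd(2.002, 8894, 20, 20)
print(round(lsd, 2))  # ~59.71, matching the slide
```

Means i and j are then declared different whenever |x̄_i − x̄_j| exceeds this value.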
Multiple Comparisons Problem
• A hypothetical study of the effect of birth control pills is
done.
• Two groups of women (one taking birth control pills, the
other not) are followed, and 20 variables are recorded for
each subject, such as blood pressure and psychological and
medical problems.
• After the study, two-sample t-tests are performed for each
variable, and one null hypothesis is rejected: women taking
birth control pills have a higher incidence of depression at
the 5% significance level (the p-value equals .02).
• Does this provide strong evidence that women taking birth
control pills are more likely to be depressed?
Experimentwise Type I error rate (α_E) versus
Comparisonwise Type I error rate
• The comparisonwise Type I error rate is the probability of
committing a Type I error for one pairwise comparison.
• The experimentwise Type I error rate (α_E) is the probability
of committing at least one Type I error when C tests are done
and all null hypotheses are true.
• For a one-way ANOVA, there are k(k-1)/2 pairwise
comparisons (k=number of populations)
• If the comparisons are not planned in advance and chosen
after looking at the data, the experimentwise Type I error rate
is the more appropriate one to look at.
Experimentwise Error Rate
• The expected number of Type I errors if C tests
are done at significance level α each is Cα.
• If C independent tests are done,
  α_E = 1 − (1 − α)^C
• The Bonferroni adjustment determines the
required Type I error probability per test (α) to
secure a pre-determined overall α_E.
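A two-line check of the experimentwise rate (plain Python; C = 3 pairwise comparisons, as in Example 15.1):

```python
def experimentwise_rate(alpha, c):
    # P(at least one Type I error) across C independent tests,
    # each run at level alpha: 1 - (1 - alpha)^C
    return 1 - (1 - alpha) ** c

# Three independent tests at alpha = .05 each
print(round(experimentwise_rate(0.05, 3), 4))      # 0.1426
# After a Bonferroni adjustment to alpha = .05/3, the rate stays under .05
print(round(experimentwise_rate(0.05 / 3, 3), 4))  # 0.0492
```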
Bonferroni Adjustment
• Suppose we carry out C tests at significance
level α.
• If the null hypothesis for each test is true,
the probability that we will falsely reject at
least one hypothesis is at most Cα.
• Thus, if we carry out C tests at significance
level α/C, the experimentwise Type I error
rate is at most C(α/C) = α.
Bonferroni Adjustment for ANOVA
• The procedure:
– Compute the number of pairwise comparisons (C)
[all pairs: C = k(k−1)/2], where k is the number of populations.
– Set α = α_E/C, where α_E is the desired bound on the
probability of making at least one Type I error (the
experimentwise Type I error rate).
– We conclude that μ_i and μ_j differ, with experimentwise
error rate at most α_E, if

  |x̄_i − x̄_j| > t_{α_E/(2C), n−k} · sqrt( MSE · (1/n_i + 1/n_j) )
Fisher and Bonferroni Methods
• Example 15.1 - continued
– Rank the effectiveness of the marketing strategies
(based on mean weekly sales).
– Use Fisher's method and the Bonferroni adjustment
method.
• Solution (Fisher's method)
– The sample mean sales were 577.55, 653.0, and 608.65.
– Then,
  |x̄1 − x̄2| = |577.55 − 653.0| = 75.45
  |x̄1 − x̄3| = |577.55 − 608.65| = 31.10
  |x̄2 − x̄3| = |653.0 − 608.65| = 44.35
  LSD = t_{.05/2, 57} · sqrt( 8894 · (1/20 + 1/20) ) = 59.71
Only |x̄1 − x̄2| exceeds the LSD, so only μ1 and μ2 are
declared different.
Fisher and Bonferroni Methods
• Solution (the Bonferroni adjustment)
– We calculate C = k(k−1)/2 = 3(2)/2 = 3.
– We set α = .05/3 = .0167, thus t_{.0167/2, 60−3} = 2.467 (Excel).
  |x̄1 − x̄2| = |577.55 − 653.0| = 75.45
  |x̄1 − x̄3| = |577.55 − 608.65| = 31.10
  |x̄2 − x̄3| = |653.0 − 608.65| = 44.35
  t_{α/2, n−k} · sqrt( MSE · (1/n_i + 1/n_j) )
    = 2.467 · sqrt( 8894 · (1/20 + 1/20) ) = 73.54
Again, the significant difference is between μ1 and μ2.
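Both decision rules can be replayed on the slide's numbers in a few lines of Python (standard library only; the two critical t values, ≈2.002 for Fisher and ≈2.467 for Bonferroni, are taken from a t table / Excel as on the slide, not recomputed):

```python
from math import sqrt

means = {"convenience": 577.55, "price": 608.65, "quality": 653.00}
mse, n = 8894, 20
half_width = sqrt(mse * (1 / n + 1 / n))  # identical for every pair (equal n)

pairs = [("convenience", "quality"), ("convenience", "price"),
         ("price", "quality")]
# Fisher uses t_{.025,57} ~ 2.002; Bonferroni uses t_{.05/6,57} ~ 2.467
for label, t_crit in [("Fisher LSD", 2.002), ("Bonferroni", 2.467)]:
    threshold = t_crit * half_width
    significant = [(a, b) for a, b in pairs
                   if abs(means[a] - means[b]) > threshold]
    print(label, round(threshold, 2), significant)
```

Both rules flag only the convenience-vs-quality pair (|577.55 − 653.00| = 75.45), in agreement with the slide.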
Tukey Multiple Comparisons
• The test procedure:
– Assumes an equal number of observations per population.
– Find a critical number ω as follows:

  ω = q_α(k, ν) · sqrt( MSE / n_g )

  k = the number of populations
  ν = degrees of freedom = n − k
  n_g = number of observations per population
  α = significance level
  q_α(k, ν) = a critical value obtained from the studentized range table (app. B17/18)
Tukey Multiple Comparisons
• Select a pair of means. Calculate the difference
between the larger and the smaller mean, x̄_max − x̄_min.
• If x̄_max − x̄_min > ω, there is sufficient evidence to
conclude that μ_max > μ_min.
• Repeat this procedure for each pair of
samples. Rank the means if possible.
If the sample sizes are not extremely different, we can use the
above procedure with n_g calculated as the harmonic mean of
the sample sizes:

  n_g = k / (1/n1 + 1/n2 + ... + 1/nk)
Tukey Multiple Comparisons
• Example 15.1 - continued. We had three
populations (three marketing strategies).
  k = 3,
  Sample sizes were equal: n1 = n2 = n3 = 20,
  ν = n − k = 60 − 3 = 57,
  MSE = 8894.
  Take q_.05(3, 60) from the table: 3.40.

  ω = q_α(k, ν) · sqrt( MSE / n_g ) = q_.05(3, 57) · sqrt( 8894 / 20 ) ≈ 71.70

Population means:
  Sales - City 1: 577.55
  Sales - City 2: 653.00
  Sales - City 3: 608.65

Compare each x̄_max − x̄_min to ω:
  City 2 vs. City 1: 653 − 577.55 = 75.45
  City 3 vs. City 1: 608.65 − 577.55 = 31.10
  City 2 vs. City 3: 653 − 608.65 = 44.35
Only the City 2 vs. City 1 difference (75.45) exceeds ω = 71.70.
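The three comparisons can be checked mechanically (plain Python; ω = 71.70 is the critical difference computed on the slide):

```python
means = {"City 1": 577.55, "City 2": 653.00, "City 3": 608.65}
omega = 71.70  # Tukey critical difference from the slide

# Larger mean minus smaller mean for each pair
pairs = [("City 2", "City 1"), ("City 3", "City 1"), ("City 2", "City 3")]
for bigger, smaller in pairs:
    diff = means[bigger] - means[smaller]
    print(bigger, "vs", smaller, round(diff, 2), "significant:", diff > omega)
```

Only City 2 vs. City 1 (75.45) clears ω, so only μ2 and μ1 can be ranked with confidence.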
15.3 Analysis of Variance
Experimental Designs
• Several elements may distinguish between
one experimental design and another:
– The number of factors (1-way, 2-way, 3-way,…
ANOVA).
– The number of factor levels.
– Independent samples vs. randomized blocks
– Fixed vs. random effects
These concepts will be explained in this lecture.
Number of factors, levels
• Example: 15.1, modified
– Methods of marketing:
price, convenience, quality
=> first factor with 3 levels
– Medium: advertise on TV vs. in newspapers
=> second factor with 2 levels
• This is a factorial experiment with two
“crossed factors” if all 6 possibilities are
sampled or experimented with.
• It will be analyzed with a “2-way ANOVA”.
(The book got this term wrong.)
One-way ANOVA (single factor): the response is recorded at
each of the factor's levels (treatments 1, 2, 3 correspond to
levels 1, 2, 3).
Two-way ANOVA (two factors): the response is recorded at each
combination of a level of Factor A (levels 1-3) with a level
of Factor B (levels 1-2).
[Diagrams of the two designs omitted.]
Randomized blocks
• This is something between 1-way and 2-way ANOVA:
a generalization of matched pairs when there are more
than 2 levels.
• Groups of matched observations are collected in
blocks, in order to remove the effects of unwanted
variability. => We improve the chances of detecting
the variability of interest.
• Blocks are like a second factor
=> 2-way ANOVA is used for analysis
• Ideally, assignment to levels within blocks is
randomized, to permit causal inference.
Randomized blocks (cont.)
• Example: expand 13.03
– Starting salaries of marketing and finance MBAs:
add accounting MBAs to the investigation.
– If 3 independent samples of each specialty are
collected (samples possibly of different sizes), we
have a 1-way ANOVA situation with 3 levels.
– If GPA brackets are formed, and if one samples
3 MBAs per bracket, one from each specialty,
then
one has a blocked design. (Note: the 3 samples will
be of equal size due to blocking.)
– Randomization is not possible here: one can’t assign
each student to a specialty, and one doesn’t know the
GPA beforehand for matching.
=> No causal inference.
Models of fixed and random effects
• Fixed effects
– If all possible levels of a factor are included in our
analysis or the levels are chosen in a nonrandom way,
we have a fixed effect ANOVA.
– The conclusion of a fixed effect ANOVA applies only
to the levels studied.
• Random effects
– If the levels included in our analysis represent a random
sample of all the possible levels, we have a random-effect ANOVA.
– The conclusion of the random-effect ANOVA applies to
all the levels (not only those studied).
Models of fixed and random effects (cont.)
Fixed and random effects - examples
– Fixed effects - The advertisement Example (15.1): All
the levels of the marketing strategies considered were
included. Inferences don’t apply to other possible
strategies such as emphasizing nutritional value.
– Random effects - To determine if there is a difference in
the production rate of 50 machines in a large factory,
four machines are randomly selected and the number of
units each produces per day for 10 days is recorded.
15.4 Randomized Blocks Analysis of
Variance
• The purpose of designing a randomized
block experiment is to reduce the within-treatments
variation, thus increasing the relative amount of
between-treatments variation.
• This helps in detecting differences between
the treatment means more easily.
Examples of Randomized Block Designs

Factor            Response             Units                      Block
Varieties of corn Yield                Plots of land              Adjoining plots
Drugs             Blood pressure       Hypertension patients      Same age, sex, overall condition
Management style  Worker productivity  Amount produced by worker  Shifts
Randomized Blocks
Block all the observations with some
commonality across treatments.
[Diagram omitted: observations arranged in a grid of blocks
(Blocks 1-3) by treatments (Treatments 1-4).]
Randomized Blocks
Block all the observations with some
commonality across treatments:

Block            Treatment 1   Treatment 2   ...   Treatment k   Block mean
1                X11           X12           ...   X1k           x̄[B]1
2                X21           X22           ...   X2k           x̄[B]2
...
b                Xb1           Xb2           ...   Xbk           x̄[B]b
Treatment mean   x̄[T]1         x̄[T]2         ...   x̄[T]k
Partitioning the total variability
• The sum of squares total is partitioned into
three sources of variation:
– Treatments
– Blocks
– Within samples (Error)
• Recall: for the independent samples design we have
  SS(Total) = SST + SSE
• For the randomized block design:
  SS(Total) = SST + SSB + SSE
  where SST = sum of squares for treatments,
        SSB = sum of squares for blocks,
        SSE = sum of squares for error.
Sums of Squares Decomposition
• X_ij = observation in the ith block, jth treatment
• x̄[B]i = mean of the ith block
• x̄[T]j = mean of the jth treatment
• X̄ = grand mean of all observations

  SS(Total) = Σ_{j=1..k} Σ_{i=1..b} (X_ij − X̄)²

  SST = Σ_{j=1..k} b(x̄[T]j − X̄)²

  SSB = Σ_{i=1..b} k(x̄[B]i − X̄)²

  SSE = SS(Total) − SST − SSB = Σ_{j=1..k} Σ_{i=1..b} (X_ij − x̄[B]i − x̄[T]j + X̄)²
Calculating the sums of squares
• Formulas for the calculation of the sums of squares:

  SS(Total) = (x11 − X̄)² + (x21 − X̄)² + ... + (x12 − X̄)² + (x22 − X̄)² + ...
            + (x1k − X̄)² + (x2k − X̄)² + ...

  SSB = k(x̄[B]1 − X̄)² + k(x̄[B]2 − X̄)² + ... + k(x̄[B]b − X̄)²

  SST = b(x̄[T]1 − X̄)² + b(x̄[T]2 − X̄)² + ... + b(x̄[T]k − X̄)²
Calculating the sums of squares (cont.)
• Formula for the sum of squares for error:

  SSE = (x11 − x̄[T]1 − x̄[B]1 + X̄)² + (x21 − x̄[T]1 − x̄[B]2 + X̄)² + ...
      + (x12 − x̄[T]2 − x̄[B]1 + X̄)² + (x22 − x̄[T]2 − x̄[B]2 + X̄)² + ...
      + (x1k − x̄[T]k − x̄[B]1 + X̄)² + (x2k − x̄[T]k − x̄[B]2 + X̄)² + ...
Mean Squares
To perform hypothesis tests for treatments and
blocks we need:
• Mean square for treatments: MST = SST / (k − 1)
• Mean square for blocks: MSB = SSB / (b − 1)
• Mean square for error: MSE = SSE / (n − k − b + 1)
Test statistics for the randomized block
design ANOVA
Test statistic for treatments:
  F = MST / MSE   (df: k − 1 and n − k − b + 1)
Test statistic for blocks:
  F = MSB / MSE   (df: b − 1 and n − k − b + 1)
The F test rejection regions
• Testing the mean responses for treatments: reject H0 if
  F > F_{α, k−1, n−k−b+1}
• Testing the mean responses for blocks: reject H0 if
  F > F_{α, b−1, n−k−b+1}
Randomized Blocks ANOVA - Example
• Example 15.2
– Are there differences in the effectiveness of
cholesterol reduction drugs?
– To answer this question the following experiment
was organized:
• 25 groups of men with high cholesterol were matched by
age and weight. Each group consisted of 4 men.
• Each person in a group received a different drug.
• The cholesterol level reduction in two months was
recorded.
– Can we infer from the data in Xm15-02 that there
are differences in mean cholesterol reduction
among the four drugs?
Randomized Blocks ANOVA - Example
• Solution
– Each drug can be considered a treatment.
– Each 4 records (per group) can be blocked,
because they are matched by age and weight.
– This procedure eliminates the variability in
cholesterol reduction related to different
combinations of age and weight.
– This helps detect differences in the mean
cholesterol reduction attributed to the
different drugs.
Randomized Blocks ANOVA - Example
ANOVA

Source of Variation    SS       df   MS       F       P-value   F crit
Rows (blocks)          3848.7   24   160.36   10.11   0.0000    1.67
Columns (treatments)   196.0    3    65.32    4.12    0.0094    2.73
Error                  1142.6   72   15.87
Total                  5187.2   99

(Blocks: df = b − 1, F = MSB / MSE. Treatments: df = k − 1, F = MST / MSE.)

Conclusion: At the 5% significance level there is sufficient evidence
to infer that the mean cholesterol reduction achieved by at least
two of the drugs differs.
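The whole randomized-block analysis can be sketched end to end in a short function (pure Python; the 3-block by 2-treatment data at the bottom are invented purely to exercise the code, not taken from Xm15-02):

```python
def rb_anova(X):
    """Randomized-block ANOVA. X[i][j] = response in block i, treatment j.
    Returns (F for treatments, F for blocks)."""
    b, k = len(X), len(X[0])
    n = b * k
    grand = sum(sum(row) for row in X) / n
    bm = [sum(row) / k for row in X]                             # block means
    tm = [sum(X[i][j] for i in range(b)) / b for j in range(k)]  # treatment means
    sst = b * sum((m - grand) ** 2 for m in tm)
    ssb = k * sum((m - grand) ** 2 for m in bm)
    sse = sum((X[i][j] - bm[i] - tm[j] + grand) ** 2
              for i in range(b) for j in range(k))
    mse = sse / (n - k - b + 1)
    return (sst / (k - 1)) / mse, (ssb / (b - 1)) / mse

# Invented toy data: 3 blocks, 2 treatments
f_treat, f_block = rb_anova([[10.0, 12.0], [11.0, 15.0], [12.0, 14.0]])
print(round(f_treat, 2), round(f_block, 2))  # 16.0 4.0
```

Each F would then be compared with the critical value F_{α, df, n−k−b+1} as on the rejection-region slide.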