Chapter 16
Analysis of Variance
Copyright ©2011 Brooks/Cole, Cengage Learning
9.1 Parameters, Statistics,
and Statistical Inference
A statistic is a numerical value computed from a
sample. Its value may differ for different samples.
e.g. sample mean x̄, sample standard deviation s,
and sample proportion p̂.
A parameter is a numerical value associated with
a population. Considered fixed and unchanging.
e.g. population mean μ, population standard
deviation σ, and population proportion p.
ANOVA
Analysis of variance: tool for analyzing
how the mean value of a quantitative
response variable is affected by one or more
categorical explanatory factors.
If one categorical variable: one-way ANOVA
If two categorical variables: two-way ANOVA
16.1 Comparing Means
with an ANOVA F-Test
H0: μ1 = μ2 = … = μk
Ha: The population means are not all equal.
F-statistic:
$F = \dfrac{\text{Variation among sample means}}{\text{Natural variation within groups}}$
$F = \dfrac{\text{Variation among sample means}}{\text{Natural variation within groups}}$
Variation among sample means is 0 if all
k sample means are equal and gets larger
the more spread out they are.
If large enough => evidence at least one
population mean is different from others
=> reject null hypothesis.
p-value found using an F-distribution (more later)
Example 16.1 Seat Location and GPA
Q: Do best students sit in the front of a classroom?
Data on seat location and GPA for n = 384 students;
88 sit in front, 218 in middle, 78 in back
Students sitting in the front
generally have slightly
higher GPAs than others.
Example 16.1 Seat Location and GPA
H0: μ1 = μ2 = μ3
Ha: The three population means are not all equal.
The F-statistic is 6.69 and the p-value is 0.001.
p-value so small => reject H0 and conclude there
are differences among the population means.
Example 16.1 Seat Location and GPA
95% Confidence Intervals for 3 population means:
Interval for “front” does not overlap with the
other two intervals => significant difference
between mean GPA for front-row sitters and
mean GPA for other students
Notation for Summary Statistics
k = number of groups
x̄i, si, and ni are the mean, standard deviation,
and sample size for the ith sample group
N = total sample size = n1 + n2 + … + nk
Example 16.2 Seat Location and GPA
Three seat locations => k = 3
n1 = 88, n2 = 218, n3 = 78; N = 88+218+78 = 384
x̄1 = 3.2029, x̄2 = 2.9853, x̄3 = 2.9194
s1 = 0.5491, s2 = 0.5577, s3 = 0.5105
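These summary statistics are enough to recompute the F-statistic. The sketch below is one way to do it, using the sums of squares defined in Section 16.2 (SS Groups from the group means, SS Error from the group standard deviations). The function name `anova_from_summary` is a hypothetical helper for illustration, not something from the text.

```python
def anova_from_summary(ns, means, sds):
    """Return (F, numerator df, denominator df) for a one-way ANOVA
    computed from group sizes, group means and group standard deviations."""
    k = len(ns)
    N = sum(ns)
    # The overall mean is the sample-size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    # SS Groups = sum over groups of n_i * (xbar_i - xbar)^2
    ss_groups = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # SS Error = sum over groups of (n_i - 1) * s_i^2
    ss_error = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    ms_groups = ss_groups / (k - 1)
    mse = ss_error / (N - k)
    return ms_groups / mse, k - 1, N - k


# Seat location and GPA summary statistics from Example 16.2
f_stat, df_num, df_den = anova_from_summary(
    ns=[88, 218, 78],
    means=[3.2029, 2.9853, 2.9194],
    sds=[0.5491, 0.5577, 0.5105],
)
print(round(f_stat, 2), df_num, df_den)  # 6.69 2 381
```

The result agrees with the F-statistic of 6.69 reported for Example 16.1.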
Assumptions for the F-Test
• Samples are independent random samples.
• Distribution of response variable is a normal curve
within each population.
• Different populations may have different means.
• All populations have same standard deviation, σ.
How k = 3 populations might look …
Conditions for Using the F-Test
• F-statistic can be used if data are not extremely
skewed, there are no extreme outliers, and group
standard deviations are not markedly different.
• Tests based on F-statistic are valid for data with
skewness or outliers if sample sizes are large.
• A rough criterion for standard deviations is that
the largest of the sample standard deviations
should not be more than twice as large as the
smallest of the sample standard deviations.
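The rough criterion for standard deviations is easy to check mechanically. The following is a minimal sketch; `sd_ratio_ok` is a hypothetical helper name, and the values are the three sample standard deviations from the seat location and GPA example.

```python
def sd_ratio_ok(sds, max_ratio=2.0):
    """Rough check: the largest sample standard deviation should be
    no more than max_ratio times the smallest."""
    return max(sds) <= max_ratio * min(sds)


gpa_sds = [0.5491, 0.5577, 0.5105]  # front, middle, back
ratio = max(gpa_sds) / min(gpa_sds)
print(round(ratio, 2))       # 1.09
print(sd_ratio_ok(gpa_sds))  # True
```

This covers only the standard deviation criterion; skewness and outliers still need to be judged from a plot of the data.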
Example 16.3 Seat Location and GPA
• The boxplot showed two outliers in the group
of students who typically sit in the middle of
a classroom, but there are 218 students in that
group so these outliers don’t have much
influence on the results.
• The standard deviations for the three groups
are nearly the same.
• Data do not appear to be skewed.
Necessary conditions for F-test seem satisfied.
The Family of F-Distributions
• Skewed distributions with minimum value of 0.
• Specific F-distribution indicated by two parameters
called degrees of freedom: numerator degrees of
freedom and denominator degrees of freedom.
• In one-way ANOVA,
numerator df = k – 1,
and
denominator df = N – k
Determining the p-Value
Statistical software reports the p-value in output.
Table A.4 provides critical values for 1% and 5%
significance levels.
• If the F-statistic is greater than the 5% critical value,
the p-value < 0.05.
• If the F-statistic is greater than the 1% critical value,
the p-value < 0.01.
• If the F-statistic is between the 1% and 5% critical
values, the p-value is between 0.01 and 0.05.
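When there are exactly three groups, the numerator df is k – 1 = 2, and the upper-tail area of the F-distribution happens to have a simple closed form, so the p-value can be sketched without tables. This shortcut is an addition for illustration and applies only when numerator df = 2; for other degrees of freedom, statistical software is needed (for example SciPy's `scipy.stats.f.sf`, assuming SciPy is installed). The helper name `f_pvalue_df2` is hypothetical.

```python
def f_pvalue_df2(f_stat, df_den):
    """P(F > f_stat) for an F-distribution with numerator df = 2 and
    denominator df = df_den. Closed form valid for numerator df = 2 only."""
    if f_stat < 0:
        raise ValueError("an F-statistic cannot be negative")
    return (1 + 2 * f_stat / df_den) ** (-df_den / 2)


# Seat location and GPA: F = 6.69 with 2 and 384 - 3 = 381 df
p_value = f_pvalue_df2(6.69, 381)
print(p_value < 0.01)  # True
```

The p-value falls well below 0.01, so the F-statistic exceeds the 1% critical value and the null hypothesis is rejected at either significance level.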
16.2 Details of One-Way
Analysis of Variance
Fundamental concept: the variation among the data
values in the overall sample can be separated into:
(1) differences between group means
(2) natural variation among observations within a group
Total variation =
Variation between groups + Variation within groups
ANOVA Table displays this information.
Measuring Variation
Between Groups
Sum of squares for groups = SS Groups
$\text{SS Groups} = \sum_{\text{groups}} n_i (\bar{x}_i - \bar{x})^2$
Numerator of F-statistic = mean square for groups
$\text{MS Groups} = \dfrac{\text{SS Groups}}{k - 1}$
Measuring Variation
within Groups
Sum of squared errors = SS Error
$\text{SS Error} = \sum_{\text{groups}} (n_i - 1) s_i^2$
Denominator of F-statistic = mean square error
$\text{MSE} = \dfrac{\text{SS Error}}{N - k}$
Pooled standard deviation: $s_p = \sqrt{\text{MSE}}$
Measuring Total Variation
Total sum of squares = SS Total = SSTO
$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2$
SS Total = SS Groups + SS Error
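General Format of a
One-Way ANOVA Table
Source           df      SS          MS                               F
Between groups   k – 1   SS Groups   MS Groups = SS Groups/(k – 1)    F = MS Groups/MSE
Error            N – k   SS Error    MSE = SS Error/(N – k)
Total            N – 1   SS Total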
Example 16.7 Analysis of Variation
among Weight Losses
x̄1 = 7
x̄2 = 9
x̄3 = 15
Program 3 appears to have the highest weight loss overall.
Example 16.8 Analysis of Variation
among Weight Losses
x̄1 = 7, x̄2 = 9, x̄3 = 15 and x̄ = 10
n1 = 4, n2 = 3, n3 = 3 and N = 10
$\text{SS Groups} = \sum_{\text{groups}} n_i (\bar{x}_i - \bar{x})^2 = 4(7 - 10)^2 + 3(9 - 10)^2 + 3(15 - 10)^2 = 114$
$\text{MS Groups} = \dfrac{\text{SS Groups}}{k - 1} = \dfrac{114}{3 - 1} = 57$
Example 16.8 Analysis of Variation
among Weight Losses
x̄1 = 7, x̄2 = 9, x̄3 = 15 and x̄ = 10
n1 = 4, n2 = 3, n3 = 3 and N = 10
$\text{SS Total} = \sum_{\text{values}} (x_{ij} - \bar{x})^2$
$= (7 - 10)^2 + (9 - 10)^2 + (5 - 10)^2 + (7 - 10)^2$
$+ (9 - 10)^2 + (11 - 10)^2 + (7 - 10)^2$
$+ (15 - 10)^2 + (12 - 10)^2 + (18 - 10)^2$
$= 148$
Example 16.8 Analysis of Variation
among Weight Losses
x̄1 = 7, x̄2 = 9, x̄3 = 15 and x̄ = 10
n1 = 4, n2 = 3, n3 = 3 and N = 10
$\text{SS Error} = \text{SS Total} - \text{SS Groups} = 148 - 114 = 34$
$\text{MSE} = \dfrac{\text{SS Error}}{N - k} = \dfrac{34}{10 - 3} = 4.857$
$F = \dfrac{\text{MS Groups}}{\text{MSE}} = \dfrac{57}{4.857} = 11.74$ with 2 and 7 df
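The hand calculation in Example 16.8 can be reproduced in a few lines. This is a sketch rather than a reference implementation: the function name `one_way_anova` is hypothetical, and the assignment of the ten weight losses to the three programs is read off the order of the terms in the SS Total expansion together with n1 = 4, n2 = 3, n3 = 3.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA quantities for a list of groups of values."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-group variation: n_i * (group mean - overall mean)^2
    ss_groups = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Total variation: every observation against the overall mean
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ss_error = ss_total - ss_groups
    ms_groups = ss_groups / (k - 1)
    mse = ss_error / (N - k)
    return {
        "SS Groups": ss_groups,
        "SS Error": ss_error,
        "SS Total": ss_total,
        "MS Groups": ms_groups,
        "MSE": mse,
        "F": ms_groups / mse,
        "df": (k - 1, N - k),
    }


# Weight losses for programs 1, 2 and 3 (grouping inferred as described above)
weight_losses = [[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]]
result = one_way_anova(weight_losses)
for name, value in result.items():
    print(name, value)
```

The printed values match the slide: SS Groups = 114, SS Total = 148, SS Error = 34, MS Groups = 57, MSE ≈ 4.857 and F ≈ 11.74 with 2 and 7 df.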
Example 16.8 Analysis of Variation
among Weight Losses
“Factor” used instead of Groups as the groups (weight-loss
programs) form an explanatory factor for the response.
Note: Pooled StDev is $s_p = \sqrt{\text{MSE}} = \sqrt{4.86} = 2.204$
Example 16.9 Top Speeds of Supercars
Data: top speeds for six runs on
each of five supercars. Kitchens (1998, p. 783)
Example 16.9
Top Speeds
• F = 25.15 and p-value is 0.000 => reject null hypothesis
that population mean speeds are same for all five cars.
• Conditions are satisfied. Data not skewed and no
extreme outliers. Largest sample std dev (5.02 Viper) not
more than twice as large as smallest std dev (2.92 Acura).
• MS Error = 14.5 is an estimate of variance of top speed for
hypothetical distribution of all possible runs with one car.
Estimated standard deviation for each car is 3.81.
• Based on sample means and CIs: Porsche and Ferrari
seem to be significantly faster than other cars.
Computation of 95% Confidence
Intervals for the Population Means
In one-way analysis of variance, a confidence
interval for a population mean μi is
$\bar{x}_i \pm t^* \left( \dfrac{s_p}{\sqrt{n_i}} \right)$
where $s_p = \sqrt{\text{MSE}}$ and
t* is such that the confidence level is the probability
between -t* and t* in a t-distribution with df = N – k.
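As an illustration, the interval can be computed for weight-loss program 1 from Example 16.8 (mean 7, n = 4, MSE = 34/7, df = N – k = 7). The sketch below takes t* as an input; the value 2.365 is the 95% multiplier for 7 df read from a standard t-table, which is an assumption added here and not a number given in the text. The helper name `anova_ci` is hypothetical.

```python
import math


def anova_ci(mean_i, n_i, mse, t_star):
    """Confidence interval for one population mean in one-way ANOVA:
    xbar_i +/- t* * s_p / sqrt(n_i), with s_p = sqrt(MSE)."""
    s_p = math.sqrt(mse)
    margin = t_star * s_p / math.sqrt(n_i)
    return mean_i - margin, mean_i + margin


# Program 1: mean 7, n = 4; MSE = 34/7; t* = 2.365 assumed from a t-table (7 df, 95%)
low, high = anova_ci(mean_i=7, n_i=4, mse=34 / 7, t_star=2.365)
print(round(low, 2), round(high, 2))  # 4.39 9.61
```

Because every group shares the pooled standard deviation, the intervals differ in width only through the group sample sizes.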
16.3 Other Methods
When data are skewed or extreme outliers are present,
it is better to analyze the median instead of the mean.
H0: Population medians are equal.
Ha: Population medians are not all equal.
Two such tests are:
1. Kruskal-Wallis Test
2. Mood’s Median Test
Also called nonparametric tests.
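To give a sense of how a rank-based test works, here is a minimal sketch of the Kruskal-Wallis H statistic with the usual correction for ties. The formula is the standard textbook one and is not spelled out in these slides. The drinks data are not reproduced here, so the weight-loss data from Example 16.8 are reused purely for illustration. The function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties.
    Assumes the pooled data contain at least two distinct values."""
    values = sorted(x for g in groups for x in g)
    N = len(values)
    rank_of = {}   # average rank for each distinct value
    tie_term = 0   # sum of (t^3 - t) over runs of t tied values
    i = 0
    while i < N:
        j = i
        while j < N and values[j] == values[i]:
            j += 1
        t = j - i
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), R_i = rank sum of group i
    rank_sum_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (N * (N + 1)) * rank_sum_term - 3 * (N + 1)
    return h / (1 - tie_term / (N ** 3 - N))


weight_losses = [[7, 9, 5, 7], [9, 11, 7], [15, 12, 18]]
h_stat = kruskal_wallis_h(weight_losses)
print(round(h_stat, 2))  # 6.71
```

Software typically converts H to a p-value by comparing it with a chi-square distribution with k – 1 degrees of freedom, an approximation that is rough for samples this small.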
Example 16.12 Drinks and Seat Location
Data: Seat location and
number of alcoholic drinks per week
Data appear skewed, sample
standard deviations differ.
Students sitting in the back report drinking more.
Example 16.12 Drinks and Seat Location
P = 0.000 => strong evidence that the population
median numbers of drinks per week are not all equal.
Example 16.13 Drinks and Seat Location
P = 0.000 => the null hypothesis of equal
population medians can be rejected.