ANALYSIS OF VARIANCE
(ANOVA)
Chapter 12
Chapter Problem
A study involved children who lived within 7 km of
a large ore smelter that emitted lead pollution.
The goal is to investigate the possible effect of lead
exposure on performance IQ scores.
Chapter Problem
Informal and subjective comparisons show that the
low group has a mean that is somewhat higher
than the means of the medium and high groups.
Formal methods:
• Section 9-3 gives a method to compare means from
samples collected from two different populations.
• But here we need to compare means from samples
collected from _____ different populations.

Need the method of _______________________.
Analysis of Variance
• Analysis of variance (ANOVA) is a method for
testing the hypothesis that three or more
population means are equal.
• It uses information from three or more samples to
check whether the means of the three or more
populations from which the samples were drawn are equal.

Hypothesis:
ANOVA requires the F-distribution
1. The F-distribution is not symmetric; it is
skewed to the right.
2. The values of F can be 0 or positive,
but they cannot be negative.
3. There is a different F-distribution for each pair of
degrees of freedom for the numerator and denominator.
Critical values of F are given in Table A-5.
Content
1. One-way ANOVA
2. Two-way ANOVA
1. One-way ANOVA
One-Way ANOVA
A method of testing the equality of three or more
population means by analyzing sample variances.
One-way analysis of variance is used with data
categorized with ___________________, which is
a characteristic that allows us to distinguish the
different populations from one another.
Requirements
1. The populations have approximately normal
distributions.
2. The populations have the same variance σ² (or
standard deviation σ).
3. The samples are simple random samples of
quantitative data.
4. The samples are independent of each other.
5. The different samples are from populations that
are categorized in only one way.
Example: Lead and Performance IQ Scores
Use the performance IQ scores listed in Table 12-1
and a significance level of α = 0.05 to test the claim
that the three samples come from populations with
means that are all equal.
Example: Lead and Performance IQ Scores
Here are summary statistics from the collected data:
Example: Requirement Check
1. The three samples appear to come from populations
that are approximately normal (normal quantile plots).
2. The three samples have standard deviations that are
not dramatically different.
3. We can treat the samples as simple random samples.
4. The samples are independent of each other and the
IQ scores are not matched in any way.
5. The three samples are categorized according to a
single factor: ________________________________
Example: Results
The hypotheses are:
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one of the means is different from the others.

Results:
Example: Procedure
Example: Lead and Performance IQ Scores
The displays all show that the P-value is 0.01951.
Because the P-value is less than the significance
level of α = 0.05, we can ___________________.
• There is sufficient evidence that the three samples
come from populations with means that are
_____________.

We cannot conclude formally that any particular
mean is different from the others, but it appears
that greater blood lead levels are associated with
lower performance IQ scores.
Class Talk

to test the claim that the three samples come from
populations with means that are all equal.
Relationship between P-value and F Test Statistic
Larger values of the test statistic result in smaller
P-values, so the ANOVA test is right-tailed.
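The right-tailed relationship can be checked numerically. The sketch below is an illustration that is not part of the slides. It assumes a numerator df of 2, for which the right-tail area of the F distribution has the closed form (1 + 2F/d)^(-d/2), where d is the denominator df. The function name, the denominator df of 9, and the listed F values are illustrative choices.

```python
def f_right_tail_df2(f_value, denom_df):
    """Right-tail area P(F > f_value) for an F distribution with
    numerator df = 2 and denominator df = denom_df.

    Assumption: this closed form holds only when the numerator df is 2.
    """
    if f_value < 0:
        raise ValueError("F cannot be negative")
    return (1.0 + 2.0 * f_value / denom_df) ** (-denom_df / 2.0)


# Illustrative F values; denominator df = 9 is an arbitrary choice.
f_values = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
p_values = [f_right_tail_df2(f, 9) for f in f_values]

for f, p in zip(f_values, p_values):
    print(f"F = {f:4.1f}  ->  P-value = {p:.4f}")
```

Each larger F value leaves less area in the right tail, which is why a large test statistic corresponds to a small P-value.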
F Test Statistic
Assuming that the populations have the same
variance σ² (as required for the test), the F test
statistic is the ratio of these two estimates of σ²:
• variation ________ samples (based on variation
among sample means)
• variation ______ samples (based on the sample
variances).
F Test Statistic: Calculations w/ Equal Sample Sizes n
$$F = \frac{\text{variance between samples}}{\text{variance within samples}}$$
Variance between samples = $n s_{\bar{x}}^2$,
where $s_{\bar{x}}^2$ = variance of the sample means
Variance within samples = $s_p^2$,
where $s_p^2$ = pooled variance
(or the mean of the sample variances)
Calculations w/ Equal Sample Sizes n
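• Variance between samples: the variance of the sample means.
  It measures how widely the sample means vary from sample
  to sample, relative to the grand mean.
• Variance within samples: the mean of the individual sample variances.
  It is the sum of the residuals that arise by chance in each sample.
  The sum of the variances measured within each sample represents the residual.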
Sample Calculations
$$s_{\bar{x}}^2 = \frac{\sum (\bar{x} - \bar{\bar{x}})^2}{k - 1} = \frac{(5.5 - 5.83)^2 + (6.0 - 5.83)^2 + (6.0 - 5.83)^2}{3 - 1} = 0.0833$$
$$n s_{\bar{x}}^2 = 4 \times 0.0833 = 0.3332$$
$$s_p^2 = \frac{3.0 + 2.0 + 2.0}{3} = 2.333$$
$$F = \frac{n s_{\bar{x}}^2}{s_p^2} = \frac{0.3332}{2.3333} = 0.1428$$
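The same arithmetic can be sketched in Python using only the standard library. The function name `f_equal_sizes` is hypothetical. The inputs are the three sample means (5.5, 6.0, 6.0), the three sample variances (3.0, 2.0, 2.0), and the common sample size n = 4 from the sample calculation.

```python
from statistics import mean, variance


def f_equal_sizes(sample_means, sample_variances, n):
    """F test statistic for k samples that all have the same size n."""
    # Variance between samples: n times the variance of the sample means.
    between = n * variance(sample_means)
    # Variance within samples: the mean of the sample variances (pooled).
    within = mean(sample_variances)
    return between / within


# Summary statistics taken from the sample calculation.
f_stat = f_equal_sizes([5.5, 6.0, 6.0], [3.0, 2.0, 2.0], n=4)
print(round(f_stat, 4))
```

Without intermediate rounding the ratio is exactly 1/7 (about 0.1429); the slide's 0.1428 reflects rounding of the intermediate values.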
Critical Value of F
• Right-tailed test
• Degrees of freedom with k samples of the same size n:
numerator df = k - 1
denominator df = k(n - 1)
where k = number of samples
n = sample size
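As a rough sketch, the degrees of freedom and a right-tailed critical value can be computed as follows. The function names are hypothetical. The critical-value formula is an assumption that applies only when the numerator df equals 2: it inverts the closed-form tail area (1 + 2F/d)^(-d/2). For other degrees of freedom, use Table A-5 or statistical software.

```python
def anova_df(k, n):
    """Degrees of freedom for k samples that each have n values."""
    return k - 1, k * (n - 1)


def f_critical_df2(alpha, denom_df):
    """Right-tailed critical value of F when the numerator df is 2.

    Assumption: inverts the closed-form tail area (1 + 2F/d)^(-d/2),
    which is valid only for numerator df = 2.
    """
    return (denom_df / 2.0) * (alpha ** (-2.0 / denom_df) - 1.0)


# k = 3 samples of size n = 4, as in the sample calculations.
num_df, den_df = anova_df(k=3, n=4)
critical = f_critical_df2(0.05, den_df)
print(num_df, den_df, round(critical, 2))
```

A test statistic larger than this critical value would fall in the right-tail rejection region at α = 0.05.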
ANOVA: Single Factor (one-way layout)
Summary table
Factor level   Count   Sum   Mean   Variance
Sample 1       4       22    5.5    3
Sample 2       4       24    6      2
Sample 3       4       24    6      2
ANOVA table
Source of variation    SS        df   MS         F ratio   P-value    F critical value
Treatment (Between)    0.66667   2    0.333333   0.14286   0.868805   4.26
Residual (Within)      21        9    2.333333
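The entries of the ANOVA table can be rebuilt from the summary table alone. The following is a minimal sketch, not the software that produced the original output. The P-value line is an assumption: it uses a closed form for the right-tail area that is valid only because the between-groups df is 2.

```python
# Summary table values (count, mean, variance) for the three samples.
counts = [4, 4, 4]
means = [5.5, 6.0, 6.0]
variances = [3.0, 2.0, 2.0]

k = len(counts)
total_n = sum(counts)
grand_mean = sum(c * m for c, m in zip(counts, means)) / total_n

# Sums of squares for the two sources of variation.
ss_between = sum(c * (m - grand_mean) ** 2 for c, m in zip(counts, means))
ss_within = sum((c - 1) * v for c, v in zip(counts, variances))

df_between = k - 1
df_within = total_n - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# Assumption: closed-form right-tail area, valid only when df_between == 2.
p_value = (1.0 + 2.0 * f_ratio / df_within) ** (-df_within / 2.0)

print(f"Between: SS={ss_between:.5f} df={df_between} MS={ms_between:.6f}")
print(f"Within:  SS={ss_within:.0f} df={df_within} MS={ms_within:.6f}")
print(f"F={f_ratio:.5f}  P-value={p_value:.6f}")
```

Because the P-value is far above 0.05 and F is far below the critical value 4.26, these three samples give no evidence against equal population means.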
FYI: Calculations with Unequal Sample Sizes
$$F = \frac{\text{variance between samples}}{\text{variance within samples}} = \frac{\left[\dfrac{\sum n_i (\bar{x}_i - \bar{\bar{x}})^2}{k - 1}\right]}{\left[\dfrac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)}\right]}$$
where $\bar{\bar{x}}$ = mean of all sample scores combined
$k$ = number of population means being compared
$n_i$ = number of values in the ith sample
$\bar{x}_i$ = mean of the values in the ith sample
$s_i^2$ = variance of the values in the ith sample
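A minimal Python sketch of the unequal-size formula, working from per-sample sizes, means, and variances. The function name `f_unequal_sizes` is hypothetical, and the second call uses made-up sample sizes purely for illustration; they are not data from the slides.

```python
def f_unequal_sizes(sizes, means, variances):
    """F test statistic from per-sample sizes, means, and variances."""
    k = len(sizes)
    # Mean of all sample values combined (weighted by sample size).
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
    # Numerator: variance between samples.
    between = sum(n * (m - grand_mean) ** 2
                  for n, m in zip(sizes, means)) / (k - 1)
    # Denominator: variance within samples (pooled variance).
    within = (sum((n - 1) * v for n, v in zip(sizes, variances))
              / sum(n - 1 for n in sizes))
    return between / within


# With equal sizes the result matches the equal-size formula.
f_equal = f_unequal_sizes([4, 4, 4], [5.5, 6.0, 6.0], [3.0, 2.0, 2.0])

# Hypothetical summary statistics with unequal sizes (not from the slides).
f_unequal = f_unequal_sizes([5, 4, 6], [5.5, 6.0, 6.0], [3.0, 2.0, 2.0])
print(round(f_equal, 4), round(f_unequal, 4))
```

The general formula weights each sample by its size, so it reduces to the equal-size calculation when every n_i is the same.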
2. Two-way ANOVA
Key Concept
• We introduce the method of two-way analysis of
variance, which is used with _________________
__________ according to ____________.
• The methods of this section require that we begin
by testing for an interaction between the two
factors.
• Then we test whether the row or column factors
have effects.
Example
The data in the table are categorized with two
factors:
1. Sex: Male or Female
2. Blood Lead Level: Low, Medium, or High
The subcategories are called _____, and the
response variable is IQ score.
Example
There is an interaction between two factors if the
effect of one of the factors changes for different
categories of the other factor.
• How many cells? _____
Example
Let's explore the IQ data in the table by
calculating the mean for each cell and constructing
an interaction graph.
Example
• An interaction effect is suggested if the line
segments are far from being parallel.
• No interaction effect is suggested if the line
segments are approximately parallel.
• For the IQ scores (the figure), it appears there is
an __________________:
  Females with high lead exposure appear to have
  lower IQ scores, while males with high lead
  exposure appear to have high IQ scores.
Requirements
1. For each cell, the sample values come from a
population with a distribution that is approximately
normal.
2. The populations have the same variance σ².
3. The samples are simple random samples.
4. The samples are independent of each other.
5. The sample values are categorized two ways.
6. All of the cells have the same number of sample
values (a balanced design – this section does not
include methods for a design that is not balanced).
Procedure for Two-way ANOVA
Step 1: Interaction Effect - test the null hypothesis
that there is ______________
Step 2: Row/Column Effects - if we conclude there is
no interaction effect, proceed with these two
hypothesis tests
Row Factor: _________________
Column Factor: ______________________
All tests use the F distribution.
Procedure
Result
Step 1: Interaction Effect: No interaction between the two factors
$$F = \frac{MS(\text{interaction})}{MS(\text{error})} = \frac{105.7333}{245.2667} = 0.4311$$
Example: Continued
Step 1: Test that there is no interaction between the
two factors.
• The test statistic is F = 0.43 and the P-value is
0.655, so we fail to reject the null hypothesis.
• It does not appear that the performance IQ scores
are affected by an interaction between sex and
blood lead level.
• There does not appear to be an interaction effect,
so we proceed to test for row and column effects.
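The interaction test statistic and its P-value can be approximately reproduced with a short sketch. Two things here are assumptions rather than facts from the slides: the error df of 24 (which would correspond to 5 values in each of the 6 cells; the slides do not state the cell size), and the closed-form tail area, which is valid only because the interaction df is 2.

```python
# Mean squares reported on the Result slide.
ms_interaction = 105.7333
ms_error = 245.2667

f_interaction = ms_interaction / ms_error

# Interaction df = (rows - 1) * (columns - 1) for 2 sexes and 3 lead levels.
df_interaction = (2 - 1) * (3 - 1)

# ASSUMPTION: 5 values per cell, so error df = 2 * 3 * (5 - 1) = 24.
# The slides do not state the cell size.
df_error = 2 * 3 * (5 - 1)

# Closed-form right-tail area, valid only because df_interaction == 2.
p_interaction = (1.0 + 2.0 * f_interaction / df_error) ** (-df_error / 2.0)

print(round(f_interaction, 4), round(p_interaction, 3))
```

Under these assumptions the computed values agree with the reported F = 0.43 and P-value = 0.655.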
Result
Step 2: Row/Column Effect
Row Factor: $$F = \frac{MS(\text{Sex})}{MS(\text{error})} = \frac{17.6333}{245.2667} = 0.0719$$
Column Factor: $$F = \frac{MS(\text{Lead Level})}{MS(\text{error})} = \frac{24.4}{245.2667} = 0.0995$$
Example: Continued
Step 2: Hypothesis tests
H0 : There are no effects from the row factor (gender).
H0 : There are no effects from the column factor (blood lead level).
• For the row factor, F = 0.0719 and the P-value is
0.791. We fail to reject the null hypothesis; there is no
evidence that IQ scores are affected by the gender of
the subject.
• For the column factor, F = 0.0995 and the P-value is
0.906. We fail to reject the null hypothesis; there is no
evidence that IQ scores are affected by the level of
lead exposure.
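The two-step procedure can be summarized as a small decision function. This is a sketch only: the function name is hypothetical, the P-values are the ones reported on the slides, and the significance level of 0.05 is an assumption carried over from the one-way example, since the slides do not state one for this test.

```python
ALPHA = 0.05  # ASSUMPTION: same significance level as the one-way example

# P-values reported on the slides.
p_interaction = 0.655
p_row = 0.791      # sex
p_column = 0.906   # blood lead level


def two_way_decisions(p_inter, p_row, p_col, alpha=ALPHA):
    """Apply the two-step procedure and return the conclusions."""
    if p_inter <= alpha:
        # An interaction is present: stop, row/column tests are not run.
        return ["interaction effect: reject H0; stop"]
    results = ["interaction effect: fail to reject H0"]
    for name, p in (("row factor", p_row), ("column factor", p_col)):
        verdict = "reject H0" if p <= alpha else "fail to reject H0"
        results.append(f"{name}: {verdict}")
    return results


decisions = two_way_decisions(p_interaction, p_row, p_column)
for line in decisions:
    print(line)
```

The early return mirrors the rule that row and column effects are tested only when no interaction effect is found.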
Example: Continued
Interpretation:
• Based on the sample data, we conclude that IQ
scores do not appear to be affected by sex or
blood lead level.
Class Talk