Tests with two+ groups
We have examined tests of means for a single group, and for a difference if we have a matched sample (as in husbands and wives). Now we consider differences of means between two or more groups.
Two sample t test
Compare means on a variable for two different groups. For example:
- Income differences between males and females
- Average SAT score for blacks and whites
- Mean time to failure for parts manufactured using two different processes
New Test - Same Logic
Find the probability that the observed difference could be due to chance factors in taking the random sample. If the probability is very low, then conclude that the difference did not happen by chance (reject the null hypothesis). If the probability is not low, we cannot reject the null hypothesis (no difference between the groups).
Sampling Distributions
[Figure: two overlapping sampling distributions, labeled Mean 1 and Mean 2.] Note that in this case each mean is not in the critical region of the other sampling distribution.
Sampling Distributions
[Figure: two well-separated sampling distributions, labeled Mean 1 and Mean 2.] Note that each mean is well into the critical region of the other sampling distribution.
Sampling Dist. of Difference
[Figure: sampling distribution of the difference of means, centered on the hypothesized zero difference, with big differences falling in the tails.]
Procedure
- Calculate the mean for each group
- Calculate the difference
- Calculate the standard error of the difference
- Test to see if the difference is bigger than "t" standard errors (small samples) or z standard errors (large samples)
t and z are taken from tables at the 95 or 99 percent confidence level.
Standard error of difference

$$ s_{\bar y_1 - \bar y_2} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \; \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

The first factor is the pooled estimate of the standard deviation; the second factor divides by the sample sizes.
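As a check on the formula above, here is a minimal Python sketch using only the standard library (the function name and sample data are illustrative, not from the slides):

```python
import math

def pooled_se(sample1, sample2):
    """Standard error of the difference of means, using the pooled
    estimate of the standard deviation (assumes equal variances)."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sample variances: sum of squared deviations over n - 1
    s1_sq = sum((y - mean1) ** 2 for y in sample1) / (n1 - 1)
    s2_sq = sum((y - mean2) ** 2 for y in sample2) / (n2 - 1)
    # Pooled variance, weighted by each group's degrees of freedom
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    # Multiply by sqrt(1/n1 + 1/n2) to divide by the sample sizes
    return math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
```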
t test

$$ t = \frac{\bar y_1 - \bar y_2}{s_{\bar y_1 - \bar y_2}} $$

The numerator is the difference of means; the denominator is the standard error of the difference of means. If t is greater than the table value of t for the 95% confidence level, reject the null hypothesis.
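Putting the two pieces together, the t statistic can be sketched in a few lines of standard-library Python (function name and data are illustrative):

```python
import math
import statistics

def two_sample_t(sample1, sample2):
    """t statistic: difference of means divided by the standard error
    of the difference (pooled, equal-variance form)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance from the two sample variances
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    # Standard error of the difference of means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / se
```

The resulting t would then be compared against the table value with n1 + n2 - 2 degrees of freedom.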
Three or more groups
If there are three or more groups, we cannot take a single difference, so we need a new test for differences among several means. This test is called ANOVA, for ANalysis Of VAriance. It can also be used if there are only two groups.
Analysis of Variance
Note that the name of the test says that we are looking at variance, or variability. The logic is to compare variability between groups (differences among the means) and variability within the groups (variability of scores around each group mean). These are called the between variance and the within variance, respectively.
The logic
If the between variance is large relative to the within variance, we conclude that there are significant differences among the means. If the between variance is not so large, we accept the null hypothesis.
Examples
[Figure: two examples of group distributions, one with a large between variance and one with a small between variance; both examples have the same within variance.]
Variance
Calculate the sum of squares and then divide by the degrees of freedom:

$$ \frac{\sum (Y - \bar Y)^2}{n - 1} $$

There are three ways to do this.
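The sum-of-squares recipe can be checked directly in Python (standard library only; the data are the nine observations used in the worked example later in the slides):

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mean = statistics.mean(scores)                 # overall mean = 5
ss = sum((y - mean) ** 2 for y in scores)      # sum of squares = 60
df = len(scores) - 1                           # degrees of freedom = 8
var = ss / df                                  # 60 / 8 = 7.5
```

The result matches the library's own `statistics.variance`, which applies the same n - 1 divisor.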
Total, Within, and Between
Total variance is the mean squared deviation of individual scores around the overall (total) mean. Within variance is the mean squared deviation of individual scores around each of the group means. Between variance is the mean squared deviation of group means around the overall (total) mean.
Total, Within, and Between

$$ SS_T = \sum (y - \bar y)^2, \qquad \text{Total} = SS_T / df_T, \qquad df_T = n - 1 $$

$$ SS_W = \sum (y - \bar y_k)^2, \qquad \text{Within} = SS_W / df_W, \qquad df_W = n - K $$

$$ SS_B = \sum (\bar y_k - \bar y)^2, \qquad \text{Between} = SS_B / df_B, \qquad df_B = K - 1 $$

Each sum runs over all n individual scores, where $\bar y_k$ is the mean of the group containing that score and $K$ is the number of groups.
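The three sums of squares can be sketched as one small Python function (standard library only; the function name is illustrative). A useful check is that the total always decomposes as SST = SSW + SSB:

```python
import statistics

def anova_sums(groups):
    """Return (SST, SSW, SSB) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    grand_mean = statistics.mean(all_scores)
    # Total: squared deviations of every score from the grand mean
    sst = sum((y - grand_mean) ** 2 for y in all_scores)
    # Within: squared deviations of every score from its own group mean
    ssw = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    # Between: squared deviations of each group mean from the grand mean,
    # counted once per score in the group
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    return sst, ssw, ssb
```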
F test for ANOVA
The F statistic has a distribution somewhat like the chi-square. It is made of the ratio of two variances. For our purpose, we will compare the between and within estimates of variance: create a ratio of the two, called an F ratio, which is the between variance divided by the within variance.
F-ratio
The table in the back of the book has critical values of the F statistic. Like the t distribution, we have to know the degrees of freedom. Unlike the t distribution, there are two different degrees of freedom we need: between (numerator) and within (denominator).
Decision
If the F-ratio for our sample is larger than the critical value, we reject the null hypothesis of no differences among the means. If the F-ratio is not so large, we accept the null hypothesis of no differences among the means.
Example (three groups)
Observations: 1 2 3, 4 5 6, 7 8 9. Overall mean is 5.

$$ TSS = (1-5)^2 + (2-5)^2 + (3-5)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2 + (8-5)^2 + (9-5)^2 = 60 $$
Example (within)
Observations: 1 2 3, 4 5 6, 7 8 9. Group means: 2, 5, 8.

$$ WSS = (1-2)^2 + (2-2)^2 + (3-2)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (7-8)^2 + (8-8)^2 + (9-8)^2 = 6 $$
Example (between)
Observations: 1 2 3, 4 5 6, 7 8 9. Group means: 2, 5, 8. Overall mean is 5.

$$ BSS = (2-5)^2 + (2-5)^2 + (2-5)^2 + (5-5)^2 + (5-5)^2 + (5-5)^2 + (8-5)^2 + (8-5)^2 + (8-5)^2 = 54 $$
F-ratio
Between variance divided by within variance:
- Between = 54 / 2 = 27 (remember k - 1 degrees of freedom, so df = 3 - 1 = 2)
- Within = 6 / 6 = 1 (remember n - k degrees of freedom, so df = 9 - 3 = 6)
The F-ratio is 27/1 = 27 with 2 and 6 df. The critical value (95%) of F is 5.14, so we reject the null hypothesis of no differences among the means.
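The whole worked example can be reproduced in a short standard-library Python sketch (variable names are illustrative):

```python
import statistics

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = sum(len(g) for g in groups)    # 9 observations
k = len(groups)                    # 3 groups
grand_mean = statistics.mean([y for g in groups for y in g])   # 5

# Between and within sums of squares, as on the previous slides
ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)  # 54
ssw = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)         # 6

between_var = ssb / (k - 1)           # 54 / 2 = 27
within_var = ssw / (n - k)            # 6 / 6 = 1
f_ratio = between_var / within_var    # 27.0, well above the 5.14 critical value
```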