Kin 304
Inferential Statistics
“Statistics means
never having to say
you're certain”
Inferential Statistics

As the name suggests, inferential statistics
allow us to make inferences about the
population, based upon the sample, with a
specified degree of confidence.
Inferential Statistics
The Scientific Method



Select a sample representative of the population.
The method of sample selection is crucial to this
process, and the sample size must be large enough
to allow appropriate probability testing.
Calculate the appropriate test statistic. The test
statistic used is determined by the hypothesis being
tested and the research design as a whole.
Test the Null hypothesis. Compare the calculated
test statistic to its critical value at the predetermined
level of acceptance.
Inferential Statistics
Setting a Probability Level for Acceptance



Prior to analysis, the researcher must decide upon their
level of acceptance.
Tests of significance are conducted at pre-selected
probability levels, symbolized by p or α.
The vast majority of the time, the probability level of
0.05 is used.
– A p of .05 means that, if the null hypothesis is true, you
would expect to find a result of this magnitude by chance only 5
in 100 times. Rejecting the null hypothesis at this level
therefore carries a 5% risk of a false positive (Type I error),
which is what is loosely meant by 95% confidence in your result. A
more stringent test would be one where p = 0.01, which
translates to 99% confidence in the result.
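The "5 in 100 by chance" idea can be checked with a small simulation. The sketch below is illustrative only: it draws two groups from the same population many times (so the null hypothesis is true by construction), computes the equal-n t statistic described later in these notes, and counts how often it exceeds the 0.05 critical value for 18 degrees of freedom (2.101, taken from the critical values of t table). The group size, number of trials and random seed are arbitrary assumptions.

```python
import random
import statistics


def t_statistic(a, b):
    """Equal-n independent t: difference in means over its standard error."""
    n = len(a)
    se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def false_positive_rate(trials=4000, n=10, t_crit=2.101, seed=1):
    """Fraction of experiments with a true null whose |t| exceeds t_crit."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Both groups come from the same normal population (no real difference).
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(a, b)) > t_crit:
            hits += 1
    return hits / trials


rate = false_positive_rate()
print(f"Proportion of 'significant' results with no real difference: {rate:.3f}")
```

The printed proportion should land close to 0.05, which is the chosen level of acceptance.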
Inferential Statistics
No Rubber Yard Sticks


Either the researcher should pre-select one level of
acceptance and stick to it, or do away with a set level
of acceptance altogether and simply report the exact
probability of each test statistic.
If, for instance, you had calculated a t statistic and it
had an associated probability of p = 0.032, you could
either say the probability is lower than the pre-set
acceptance level of 0.05, and therefore a significant
difference at the 95% level of confidence, or simply talk
about 0.032 as a percentage confidence (96.8%).
Inferential Statistics
Significance of Statistical Tests


The test statistic is calculated
The critical value of the test statistic is
determined
– based upon sample size and probability acceptance
level (found in a table at the back of a stats book or
part of the EXCEL stats report, or SPSS output)
The calculated test statistic must be greater
than the critical value of the test statistic to
accept a significant difference or relationship
Inferential Statistics
Degrees      Probability         Degrees      Probability
of Freedom   0.05     0.01       of Freedom   0.05     0.01
1            .997     1.000      24           .388     .496
2            .950     .990       25           .381     .487
3            .878     .959       26           .374     .478
4            .811     .917       27           .367     .470
5            .754     .874       28           .361     .463
6            .707     .834       29           .355     .456
7            .666     .798       30           .349     .449
8            .632     .765       35           .325     .418
9            .602     .735       40           .304     .393
10           .576     .708       45           .288     .372
11           .553     .684       50           .273     .354
12           .532     .661       60           .250     .325
13           .514     .641       70           .232     .302
14           .497     .623       80           .217     .283
15           .482     .606       90           .205     .267
16           .468     .590       100          .195     .254
17           .456     .575       125          .174     .228
18           .444     .561       150          .159     .208
19           .433     .549       200          .138     .181
20           .423     .537       300          .113     .148
21           .413     .526       400          .098     .128
22           .404     .515       500          .088     .115
23           .396     .505       1,000        .062     .081
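The slide does not say which statistic this table belongs to. The values match the standard critical values of the Pearson correlation coefficient r, so the sketch below treats it that way (an assumption). Under that reading, each entry follows from the critical t value at the same degrees of freedom through r = t / sqrt(t² + df). The function name is hypothetical and the t values are copied from the critical values of t table later in these notes.

```python
import math


def critical_r(t_crit, df):
    """Convert a two-tailed critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# (df, critical t at 0.05) pairs copied from the t table in these notes.
for df, t_crit in [(1, 12.706), (10, 2.228), (30, 2.042)]:
    print(df, round(critical_r(t_crit, df), 3))
```

These three cases agree with the tabulated .997, .576 and .349, which supports reading the table as critical values of r.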
Kin 304
Tests of Differences between Means:
t-tests
SEM
Visual test of differences
Independent t-test
Paired t-test
Comparison


Is there a difference between two or more
groups?
Test of difference between means
– t-test (only two means, small samples)
– ANOVA - Analysis of Variance (multiple means)
– ANCOVA (covariates)
t Tests
Standard Error of the Mean
$$SEM = \frac{SD}{\sqrt{n}}$$
Describes how confident you are that the
mean of the sample is the mean of the
population
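As a quick illustration of SEM = SD / √n, the sketch below computes it for nine values. The numbers are the weight-loss differences from the paired example later in these notes, reused here only as convenient data. The function name is hypothetical, and the sample (n - 1) standard deviation is assumed.

```python
import math
import statistics


def standard_error_of_mean(values):
    """SEM = SD / sqrt(n), using the sample (n - 1) standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Nine weight-loss values (kg) from the paired example in these notes.
sample = [1.5, 1.2, 1.0, 0.5, 0.5, 1.5, 1.5, 0.5, 2.0]
sem = standard_error_of_mean(sample)
print(f"mean = {statistics.mean(sample):.2f}, SEM = {sem:.3f}")
```

A larger sample with the same spread gives a smaller SEM, which is why the sample mean becomes a more trustworthy estimate of the population mean as n grows.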
t Tests
Visual Test of
Significant Difference between Means
[Figure: means of A and B, each plotted with error bars of ±1 Standard Error of the Mean]
Overlapping standard error
bars therefore no significant
difference between means of A
and B
No overlap of standard error
bars therefore a significant
difference between means of A
and B at about 95%
confidence
Independent t-test




Two independent groups compared using an independent
t-test (assuming equal variances)
– e.g. Height difference between men and women
The t statistic is calculated using the difference between
the means in relation to the variance in the two samples
A critical value of the t statistic is based upon sample
size and probability acceptance level (found in a table at
the back of a stats book or part of the EXCEL t-test
report, or SPSS output)
The calculated t based upon your data must be greater
than the critical value of t to accept a significant
difference between means at the chosen level of
probability
t Tests
t statistic quantifies
the degree of overlap of the distributions
t Tests
standard error of
the difference between means
$$s^2_{\bar{X}_1 - \bar{X}_2} = s^2_{\bar{X}_1} + s^2_{\bar{X}_2}$$
The variance of the difference between means is the
sum of the squared standard errors of the two means.
The standard error (S.E.) is then estimated by adding
the squares of the standard deviations, dividing by the
sample size and taking the square root.
$$S.E. = \sqrt{(s_1^2 + s_2^2)/n}$$
t Tests
t statistic
The t statistic is then calculated as the ratio of the
difference between sample means to the standard
error of the difference, with the degrees of freedom
being equal to 2n - 2, where n is the size of each sample.
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s_1^2 + s_2^2)/n}}$$
t Tests
Critical values of t

Hypothesis:
– There is a difference between means
Degrees of Freedom = 2n – 2
t_calc > t_crit = significant difference
Degrees of     Probability
Freedom        0.050      0.025      0.010
1              12.706     25.452     63.657
2              4.303      6.205      9.925
3              3.182      4.176      5.841
4              2.776      3.495      4.604
5              2.571      3.163      4.032
6              2.447      2.969      3.707
7              2.365      2.841      3.499
8              2.306      2.752      3.355
9              2.262      2.685      3.250
10             2.228      2.634      3.169
11             2.201      2.593      3.106
12             2.179      2.560      3.055
13             2.160      2.533      3.012
14             2.145      2.510      2.977
15             2.131      2.490      2.947
16             2.120      2.473      2.921
17             2.110      2.458      2.898
18             2.101      2.445      2.878
19             2.093      2.433      2.861
20             2.086      2.423      2.845
21             2.080      2.414      2.831
22             2.074      2.406      2.819
23             2.069      2.398      2.807
24             2.064      2.391      2.797
25             2.060      2.385      2.787
26             2.056      2.379      2.779
27             2.052      2.373      2.771
28             2.048      2.368      2.763
29             2.045      2.364      2.756
30             2.042      2.360      2.750
35             2.030      2.342      2.724
40             2.021      2.329      2.704
45             2.014      2.319      2.690
50             2.008      2.310      2.678
55             2.004      2.304      2.669
60             2.000      2.299      2.660
70             1.994      2.290      2.648
80             1.989      2.284      2.638
90             1.986      2.279      2.631
100            1.982      2.276      2.625
120            1.980      2.270      2.617
∞              1.9600     2.2414     2.5758
Table 2-7.1: Critical values of the t statistic
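The independent t-test steps can be sketched as follows, using the equal-n formula from these notes (difference between means over the square root of (s1² + s2²)/n, with 2n - 2 degrees of freedom). The heights are made-up illustrative values, not course data, and the function name is hypothetical. The critical value is read from Table 2-7.1 rather than computed.

```python
import math
import statistics


def independent_t(group1, group2):
    """Equal-n independent t statistic and its degrees of freedom (2n - 2)."""
    if len(group1) != len(group2):
        raise ValueError("this simplified formula assumes equal group sizes")
    n = len(group1)
    se = math.sqrt((statistics.variance(group1) + statistics.variance(group2)) / n)
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, 2 * n - 2


# Hypothetical heights in cm (made-up values for illustration only).
men = [178, 182, 175, 180, 185]
women = [165, 170, 160, 168, 162]

t_calc, df = independent_t(men, women)
t_crit = 2.306  # probability 0.050, df = 8, from Table 2-7.1
print(f"t = {t_calc:.3f}, df = {df}, significant: {abs(t_calc) > t_crit}")
```

With these invented numbers the calculated t is well above the critical value, so the difference between means would be accepted as significant at the 0.05 level.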
Paired Comparison
Paired t Test, sometimes called t-test for correlated
data
– “Before and After” Experiments
– Bilateral Symmetry
– Matched-pairs data
t Tests
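Paired t-test
Hypothesis:
– Is the mean of the differences between paired
observations significantly different from zero?
The calculated t statistic is evaluated in the
same way as the independent test, with n - 1 degrees
of freedom, where n is the number of pairs.
$$t = \frac{\left(\sum (X_1 - X_2)\right)/n}{\sqrt{s_D^2/n}}$$
where $s_D$ is the standard deviation of the paired differences $X_1 - X_2$.
t Tests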
9 Subjects All Lose Weight
Paired Weight Loss Data
n=9
Weight Before (kg)   Weight After (kg)   Weight Loss (kg)
89.0                 87.5                1.5
67.0                 65.8                1.2
112.0                111.0               1.0
109.0                108.5               0.5
56.0                 55.5                0.5
123.5                122.0               1.5
108.0                106.5               1.5
73.0                 72.5                0.5
83.0                 81.0                2.0
Mean of differences = +1.13
MS EXCEL
t-Test: Independent (WRONG ANALYSIS)
                               Before         After
Mean                           91.16666667    90.03333333
Variance                       537.875        531.11
Observations                   9              9
Pooled Variance                534.4925
Hypothesized Mean Difference   0
df                             16
t Stat                         0.103990367
P(T<=t) one-tail               0.459234679
t Critical one-tail            1.745884219
P(T<=t) two-tail               0.918469359
t Critical two-tail            2.119904821
MS EXCEL
t-Test: Paired (CORRECT ANALYSIS)
                               Before         After
Mean                           91.16666667    90.03333333
Variance                       537.875        531.11
Observations                   9              9
Pearson Correlation            0.999741718
Hypothesized Mean Difference   0
df                             8
t Stat                         6.23354978
P(T<=t) one-tail               0.000125066
t Critical one-tail            1.85954832
P(T<=t) two-tail               0.000250133
t Critical two-tail            2.306005626
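The contrast between the two EXCEL analyses can be reproduced from the nine before and after weights. This is a minimal sketch using only the Python standard library. The independent version treats the two columns as unrelated groups. The paired version works on the within-subject differences, whose standard deviation is far smaller than the spread between subjects.

```python
import math
import statistics

# Weights (kg) from the paired weight loss data in these notes.
before = [89.0, 67.0, 112.0, 109.0, 56.0, 123.5, 108.0, 73.0, 83.0]
after = [87.5, 65.8, 111.0, 108.5, 55.5, 122.0, 106.5, 72.5, 81.0]
n = len(before)

# Independent t-test (wrong for these data): between-subject variance dominates.
se_independent = math.sqrt((statistics.variance(before) + statistics.variance(after)) / n)
t_independent = (statistics.mean(before) - statistics.mean(after)) / se_independent

# Paired t-test (correct): analyse the nine within-subject differences.
differences = [b - a for b, a in zip(before, after)]
se_paired = statistics.stdev(differences) / math.sqrt(n)
t_paired = statistics.mean(differences) / se_paired

print(f"independent: t = {t_independent:.4f}, df = {2 * n - 2}")
print(f"paired:      t = {t_paired:.4f}, df = {n - 1}")
```

The two t values match the EXCEL reports (about 0.104 and 6.234). The same mean difference of 1.13 kg is tested in both cases, and only the standard error in the denominator changes.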
Kin 304
Tests of Differences between Means:
ANOVA – Analysis of Variance
One-way ANOVA
ANOVA – Analysis of Variance




Used for analysis of multiple group means
Similar to independent t-test, in that the
difference between means is evaluated based
upon the variance about the means.
However, multiple t-tests result in an increased
chance of Type I error.
An F (ratio) statistic is calculated and is evaluated
in comparison to the critical value of the F (ratio)
statistic
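To see why multiple t-tests inflate the chance of a Type I error, the sketch below counts the pairwise comparisons among k groups and computes the chance of at least one false positive. It treats the tests as independent, which is a simplifying assumption (pairwise tests on the same groups are not strictly independent), so the figures are approximate. The function name is hypothetical.

```python
from math import comb


def familywise_error(num_groups, alpha=0.05):
    """Number of pairwise t-tests and the chance of at least one false positive,
    assuming (as a simplification) that the tests are independent."""
    comparisons = comb(num_groups, 2)
    return comparisons, 1 - (1 - alpha) ** comparisons


for k in (3, 4, 5):
    m, rate = familywise_error(k)
    print(f"{k} groups -> {m} t-tests -> chance of a false positive ~ {rate:.3f}")
```

With three groups the chance is already about 0.14 rather than 0.05, which is the motivation for a single F test across all groups.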
Tests of Difference – ANOVA
One-way ANOVA

One grouping factor
– H0: The population means are equal
– HA: At least one group mean is different
Two or more levels of grouping factor
– Exposure = low, medium or high
– Age Groups = 7-8, 9-10, 11-12, 13-14
Tests of Difference – ANOVA
F (ratio) Statistic
The F ratio compares two
sources of variability in the
scores.
The variability among the
sample means, called
Between Group Variance, is
compared with the variability
among individual scores within
each of the samples, called
Within Group Variance.
[Diagram: TOTAL variability divided into Between Groups and Within Groups]
Tests of Difference – ANOVA
Formula for sources of variation
Tests of Difference – ANOVA
Anova Summary Table
Source            SS                          df     MS                   F
Between Groups    SS(Between)                 k-1    SS(Between)/(k-1)    MS(Between)/MS(Within)
Within Groups     SS(Within)                  N-k    SS(Within)/(N-k)
Total             SS(Within) + SS(Between)    N-1
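The summary table can be filled in directly from raw scores. The sketch below is one way to do it, with made-up scores for three exposure levels (the data and the function name are hypothetical). It uses the usual definitions: the between-groups sum of squares comes from the group means about the grand mean, and the within-groups sum of squares from the scores about their own group mean.

```python
import statistics


def one_way_anova(groups):
    """Return the entries of the ANOVA summary table for a list of groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, total_n = len(groups), len(all_scores)

    # Between groups: spread of the group means about the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each score about its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)
    return {
        "SS(Between)": ss_between, "df(Between)": k - 1, "MS(Between)": ms_between,
        "SS(Within)": ss_within, "df(Within)": total_n - k, "MS(Within)": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores for low, medium and high exposure (illustration only).
table = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
for name, value in table.items():
    print(f"{name:12s} {value}")
```

The resulting F is then compared with the critical value of F for (k - 1, N - k) degrees of freedom at the chosen probability level.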
Tests of Difference – ANOVA
Assumptions for ANOVA




The populations from which the samples were
obtained are approximately normally
distributed.
The samples are independent.
The population value for the standard deviation
between individuals is the same in each group.
If standard deviations are unequal
transformation of values may be needed.
Tests of Difference – ANOVA
CFS Kids 17 – 19 years (Boys)
Descriptives (a)
VO2MAX
                                                      95% Confidence Interval for Mean
Age      N      Mean      Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
17.00    198    5.1586    .75824           .05389       5.0523        5.2649        3.70      6.20
18.00    154    4.9896    .76877           .06195       4.8672        5.1120        3.70      6.20
19.00    121    5.0314    .79604           .07237       4.8881        5.1747        3.50      6.00
Total    473    5.0710    .77357           .03557       5.0011        5.1409        3.50      6.20
a. SEX = 1.00

ANOVA (a)
VO2MAX
                  Sum of Squares   df     Mean Square   F       Sig.
Between Groups    2.729            2      1.364         2.292   .102
Within Groups     279.724          470    .595
Total             282.453          472
a. SEX = 1.00




ANOVA
Dependent - VO2max
Grouping Factor - Age (17, 18, 19)
No Significant difference between means for VO2max (p>0.05)
CFS Kids 17 – 19 years (Girls)
Descriptives (a)
VO2MAX
                                                      95% Confidence Interval for Mean
Age      N      Mean      Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
17.00    146    3.7671    .38812           .03212       3.7036        3.8306        3.00      5.00
18.00    132    3.7174    .37610           .03274       3.6527        3.7822        3.00      5.20
19.00    152    3.6349    .33578           .02724       3.5811        3.6887        2.90      4.50
Total    430    3.7051    .37000           .01784       3.6700        3.7402        2.90      5.20
a. SEX = 2.00

ANOVA (a)
VO2MAX
                  Sum of Squares   df     Mean Square   F       Sig.
Between Groups    1.331            2      .666          4.953   .007
Within Groups     57.397           427    .134
Total             58.729           429
a. SEX = 2.00




ANOVA
Dependent - VO2max
Grouping Factor - Age (17, 18, 19)
Significant difference between means for VO2max (p<0.05)
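The F values in the two SPSS tables can be recovered from the reported sums of squares and degrees of freedom. This is a small check, not part of the SPSS output. Because the sums of squares are printed to three decimals, the recomputed F agrees with the report only to about two decimal places. The function name is hypothetical.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Mean squares and F ratio from sums of squares and degrees of freedom."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Values copied from the boys' and girls' ANOVA tables in these notes.
boys = f_ratio(2.729, 2, 279.724, 470)
girls = f_ratio(1.331, 2, 57.397, 427)

print("Boys : MS between = %.3f, MS within = %.3f, F = %.2f" % boys)
print("Girls: MS between = %.3f, MS within = %.3f, F = %.2f" % girls)
```

The girls' F is larger mainly because their within-groups mean square is much smaller, so a similar spread of group means stands out more clearly.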
Post Hoc tests




Post hoc simply means that the test is a follow-up
test done after the original ANOVA is found
to be significant.
One can do a series of comparisons, one for
each two-way comparison of interest.
E.g. Scheffe's or Tukey's tests
The Scheffe test is very conservative
Tests of Difference – ANOVA
Scheffe's – Post Hoc Test

Boys
Multiple Comparisons (a)
Dependent Variable: VO2MAX
Scheffe
                      Mean Difference                          95% Confidence Interval
(I) AGE    (J) AGE    (I-J)             Std. Error    Sig.     Lower Bound    Upper Bound
17.00      18.00      .1690             .08289        .126     -.0346         .3725
17.00      19.00      .1272             .08902        .361     -.0914         .3458
18.00      17.00      -.1690            .08289        .126     -.3725         .0346
18.00      19.00      -.0418            .09372        .905     -.2719         .1883
19.00      17.00      -.1272            .08902        .361     -.3458         .0914
19.00      18.00      .0418             .09372        .905     -.1883         .2719
a. SEX = 1.00

Girls
Multiple Comparisons (a)
Dependent Variable: VO2MAX
Scheffe
                      Mean Difference                          95% Confidence Interval
(I) AGE    (J) AGE    (I-J)             Std. Error    Sig.     Lower Bound    Upper Bound
17.00      18.00      .0497             .04403        .529     -.0585         .1579
17.00      19.00      .1323*            .04249        .008     .0279          .2366
18.00      17.00      -.0497            .04403        .529     -.1579         .0585
18.00      19.00      .0826             .04362        .168     -.0246         .1897
19.00      17.00      -.1323*           .04249        .008     -.2366         -.0279
19.00      18.00      -.0826            .04362        .168     -.1897         .0246
*. The mean difference is significant at the .05 level.
a. SEX = 2.00


Boys – no significant differences, would not run post hoc tests
Girls – VO2max at age 19 is significantly different from that at age 17
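The Std. Error column in the post hoc tables can be reproduced from the ANOVA output: it is the square root of the within-groups mean square multiplied by (1/n_i + 1/n_j) for the two age groups being compared. The sketch below checks this against the reported values. The function name is hypothetical, and the significance levels themselves are not recomputed here.

```python
import math


def pairwise_std_error(ms_within, n_i, n_j):
    """Standard error of the difference between two group means,
    based on the pooled within-groups mean square from the ANOVA."""
    return math.sqrt(ms_within * (1 / n_i + 1 / n_j))


# Within-groups mean squares from the one-way ANOVA tables (SS / df).
ms_within_boys = 279.724 / 470
ms_within_girls = 57.397 / 427

se_boys_17_18 = pairwise_std_error(ms_within_boys, 198, 154)
se_girls_17_19 = pairwise_std_error(ms_within_girls, 146, 152)

print(f"Boys  17 vs 18: SE = {se_boys_17_18:.5f}")   # table reports .08289
print(f"Girls 17 vs 19: SE = {se_girls_17_19:.5f}")  # table reports .04249
```

Because every comparison shares the same within-groups mean square, the standard errors differ only through the sizes of the two groups involved.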
ANOVA – Factorial design
Multiple factors



Test of differences between means with two or
more grouping factors, such that each factor is
adjusted for the effect of the other
Can evaluate significance of factor effects and
interactions between them
2-way ANOVA: Two factors considered
simultaneously
Tests of Difference – ANOVA
Between-Subjects Factors
                N
AGE    17.00    344
       18.00    286
       19.00    273
SEX    1.00     473
       2.00     430

Tests of Between-Subjects Effects
Dependent Variable: VO2MAX
                   Type III Sum
Source             of Squares      df     Mean Square    F            Sig.
Corrected Model    424.295 (a)     5      84.859         225.789      .000
Intercept          16946.730       1      16946.730      45091.176    .000
AGE                3.032           2      1.516          4.034        .018
SEX                403.923         1      403.923        1074.742     .000
AGE * SEX          .715            2      .358           .952         .386
Error              337.122         897    .376
Total              18407.560       903
Corrected Total    761.417         902
a. R Squared = .557 (Adjusted R Squared = .555)



Example: 2 way ANOVA
Dependent - VO2max
Grouping Factors
– AGE (17, 18, 19)
– SEX (1, 2)
Significant difference in VO2max (p<0.05) by SEX = Main effect
Significant difference in VO2max (p<0.05) by AGE = Main effect
No Significant Interaction (p>0.05) AGE * SEX
Analysis of Covariance (ANCOVA)


Taking into account a relationship of the
dependent variable with another continuous variable
(covariate) in testing the difference between
means of one or more factors
Tests significance of difference between
regression lines
Tests of Difference – ANOVA
[Figure: scatterplot of Maximum Grip Strength (lbs) against Skinfold-adjusted Forearm Girth (cm), with separate Male (♂) and Female (♀) groups; correlations shown are r = +0.78 and r = +0.75 within the two sexes and r = +0.91 for ♂+♀ combined]
Scatterplot showing correlations between
skinfold-adjusted Forearm girth and maximum
grip strength for men and women
Use of T tests for difference between means?
Group Statistics
           SEX    N     Mean      Std. Deviation    Std. Error Mean
SAFAGR     1.0    20    25.801    1.9882            .4446
           2.0    23    21.355    1.4569            .3038
GRIPR      1.0    20    52.310    7.8432            1.7538
           2.0    23    35.304    6.8536            1.4291

Independent Samples Test
Levene's Test for Equality of Variances
           F        Sig.
SAFAGR     1.713    .198
GRIPR      .525     .473

t-test for Equality of Means
                                                                    Mean          Std. Error    95% Confidence Interval of the Difference
                                        t        df        Sig. (2-tailed)   Difference    Difference    Lower       Upper
SAFAGR   Equal variances assumed        8.437    41        .000              4.446         .5270         3.3816      5.5101
         Equal variances not assumed    8.257    34.408    .000              4.446         .5385         3.3521      5.5397
GRIPR    Equal variances assumed        7.589    41        .000              17.006        2.2407        12.4804     21.5309
         Equal variances not assumed    7.517    38.101    .000              17.006        2.2623        12.4263     21.5850
Men are significantly (p<0.05) bigger than women in
skinfold-adjusted forearm girth and grip strength
ANCOVA
Dependent – Maximum Grip Strength (GRIPR)
Grouping Factor – Sex
Covariate – Skinfold-adjusted Forearm Girth (SAFAGR)
Tests of Between-Subjects Effects
Dependent Variable: GRIPR
                   Type III Sum
Source             of Squares      df    Mean Square    F         Sig.
Corrected Model    4378.670 (a)    2     2189.335       95.481    .000
Intercept          234.150         1     234.150        10.212    .003
SAFAGR             1284.985        1     1284.985       56.041    .000
SEX                25.731          1     25.731         1.122     .296
Error              917.182         40    22.930
Total              85596.020       43
Corrected Total    5295.852        42
a. R Squared = .827 (Adjusted R Squared = .818)



SAFAGR is a significant Covariate
No significant difference between sexes in Grip Strength when adjusted
for Covariate (representing muscle size)
Therefore one regression line (not two, one for each sex) fits the relationship
3-way ANOVA
For 3-way ANOVA, there will be:
- three 2-way interactions (AxB, AxC, BxC)
- one 3-way interaction (AxBxC)
If p > 0.05 for each interaction, then use main
effects results
Typically ANOVA is used only for 3 or fewer
grouping factors
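The interaction terms listed above can be enumerated for any number of grouping factors. The short sketch below does this with combinations of factor labels. The labels A, B and C and the function name are illustrative only.

```python
from itertools import combinations


def interaction_terms(factors):
    """Group every interaction term by its order (2-way, 3-way, ...)."""
    return {
        order: ["x".join(combo) for combo in combinations(factors, order)]
        for order in range(2, len(factors) + 1)
    }


terms = interaction_terms(["A", "B", "C"])
for order, names in terms.items():
    print(f"{order}-way interactions ({len(names)}): {', '.join(names)}")
```

Adding a fourth factor raises the count to six 2-way, four 3-way and one 4-way interaction, which is one reason designs with more than three grouping factors become hard to interpret.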

Tests of Difference – ANOVA
Repeated Measures ANOVA



Repeated measures design – the same variable
is measured several times over a period of time
for each subject
Pre- and post-test scores are the simplest
design – use paired t-test
Advantage - using fewer experimental units
(subjects) and providing a control for differences
(effect of variability due to differences between
subjects can be eliminated)