Outline and Review of Chapter 9 (and 11)
Introduction to the t-test
Hypothesis testing with the t statistic is based on hypothesis testing with the z statistic. When
using z, you use the population parameters, mean (μ) and standard deviation (σ), to determine the
probability of obtaining a particular score (X) or sample mean (M). There will be some times,
however, when you do not know the population parameters and other times when you want to
compare your sample mean to something other than the population mean. In these cases, you
will need to use a t-test. The t-test is the most common statistic for comparing differences
between two means. Like z, t is an inferential statistic: a statistic that is used to make an
inference, or a decision. In our case, that decision is usually whether or not to reject the null
hypothesis.
Basic t-test Theory
We previously discussed the differences between the equations for population standard deviation
(σ) and the sample standard deviation (s).
Population:

$$\sigma = \sqrt{\frac{SS}{N}}$$

Sample:

$$s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{SS}{df}}$$
The equations differ because the sample standard deviation is not really designed to describe the
sample itself (confused?). The sample standard deviation is a formula designed to estimate the
population standard deviation. It is a guess, but it's better than nothing, so if you don't know the
population standard deviation, the t-test allows you to conduct a hypothesis test using the sample
standard deviation in place of the population standard deviation. Look at the similarities between
the equation for z and the equation for t:
$$z = \frac{M - \mu}{\sigma_M} \qquad\qquad t = \frac{M - \mu}{s_M}$$
The estimated standard error of the mean (sM) is calculated from the sample standard deviation in
the same way the standard error of the mean (σM) is calculated from the population standard
deviation.

$$\sigma_M = \frac{\sigma}{\sqrt{n}} \qquad\qquad s_M = \frac{s}{\sqrt{n}}$$
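To make these formulas concrete, here is a minimal Python sketch (the tiny sample and the variable names are hypothetical, not from the text) that computes s from SS/df, the estimated standard error, and t:

```python
import math

# Hypothetical sample and hypothesized population mean
sample = [4.0, 7.0, 6.0, 5.0, 8.0]
mu = 5.0

n = len(sample)
M = sum(sample) / n                      # sample mean
SS = sum((x - M) ** 2 for x in sample)   # sum of squared deviations
s = math.sqrt(SS / (n - 1))              # sample SD: SS divided by df = n - 1
s_M = s / math.sqrt(n)                   # estimated standard error of the mean
t = (M - mu) / s_M                       # same form as z, with s_M in place of sigma_M

print(f"M = {M:.2f}, s = {s:.2f}, s_M = {s_M:.2f}, t = {t:.2f}")
```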
The sample standard deviation is an estimate of the population standard deviation, and the accuracy
of this estimate improves as sample size increases. As a result, for the t-test, the critical value
that you need to reach in order to reject the null hypothesis will be quite large if your sample size
is small, but it will decrease as sample size increases. This means that, unlike the z-test, the t-test
has different critical regions for different sample sizes (see table below). Although the critical
value for z for a two-tailed test with α = 0.05 will always be ±1.96, the critical value for t is
±4.303 if n = 3 (df = 2) and ±2.045 if n = 30 (df = 29). The critical values for t get closer and
closer to the critical values for z as sample size increases. This is because, as sample size
increases, s becomes a better estimate of σ.
In summary, t is used just like z with two important exceptions: (1) for the t-test, the sample
standard deviation is used in place of the population standard deviation, and (2) the critical values
for t depend on the sample size; small samples have larger critical values than large samples.
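You can watch the t critical values shrink toward the z critical value with a short SciPy sketch (assuming SciPy is available; the degrees of freedom shown are arbitrary examples):

```python
from scipy import stats

# Two-tailed alpha = 0.05: the upper critical value is the 97.5th percentile.
for df in [2, 9, 29, 100, 1000]:
    print(f"df = {df:>4}: t_crit = {stats.t.ppf(0.975, df):.3f}")

# For comparison, the z critical value:
print(f"z_crit = {stats.norm.ppf(0.975):.3f}")  # 1.960
```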
One-Sample t-tests
You can use a one-sample t-test to compare a sample mean to any single value, whether it is a
real population mean or any other hypothetical value that interests you. For example, imagine
that your bank informs you that they will charge a fee each time your average yearly checking
account balance drops below $100. Although you have always made sure your account was
greater than zero, you never really paid much attention to whether or not your account balance
was greater than $100. You know what your monthly balances have been and you know that the
average of last year's twelve monthly balances was greater than $100 (M = 106.25), but it would
be nice to know whether this number is significantly greater than $100 so you don't slip below this
average next year. You don’t expect your financial situation to change in the upcoming year so
you decide to take last year’s twelve monthly balances to see if the mean balance for these
months is significantly greater than $100. This will give you a good idea of your odds of slipping
below an average balance of $100 next year. Note that $100 does not represent a true population
mean, but it is an important value to which you would like to compare your average monthly
balance. For simplicity, we will still refer to this as a hypothetical population mean and
represent it with μ. Here are your twelve monthly balances from last year along with the
appropriate statistics:
Month        Balance ($)
January         107
February         93
March           105
April            97
May             100
June            113
July            105
August          106
September       112
October         119
November         99
December        119

M = 106.25, s = 8.31, sM = 2.40, n = 12
Your hypothesis test is based on the question, “Was last year’s average monthly balance
significantly greater than $100?”
H0: μ ≤ 100 (the true mean monthly balance is at most $100)
H1: μ > 100 (the true mean monthly balance is greater than $100)

Note that the hypotheses are statements about the population value, not about the sample mean
(M = 106.25) itself.
Note that we are conducting a one-tailed test. If we set α = 0.05, the critical value for t with n-1
degrees of freedom (df = 11) will be + 1.796. Make note of this: t(11)crit = 1.796.
Figure 9.1. The t distribution showing the critical region (or zone of rejection) in red for values
greater than +1.796.
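If you would rather get this critical value from software than from the table at the end of this outline, a quick SciPy check (assuming SciPy is installed) reproduces it:

```python
from scipy import stats

# One-tailed test, alpha = 0.05, df = 11: the critical value is the 95th percentile
print(round(stats.t.ppf(0.95, 11), 3))  # 1.796
```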
We have everything we need to calculate t using the formula:

$$t = \frac{M - \mu}{s_M} = \frac{106.25 - 100}{2.40} \approx 2.60$$
Since +2.60 is well within the zone of rejection, you conclude that last year's average monthly
balance was significantly greater than $100 and, provided nothing dramatic happens to your
financial situation, you will probably remain safely above $100 next year. If you wanted to
report these results formally, it would look something like this:

Last year's mean monthly balance (M = 106.25) was greater than the hypothesized mean
monthly balance (μ = 100) and this difference was statistically significant (t(11) = 2.60, p < 0.05,
one-tailed).
Notice that the degrees of freedom (df) are reported in parentheses after t and that you clearly
state that your probability of making a Type I Error (p) is less than your declared value of α
(0.05) which means that you should reject the null hypothesis. Since many researchers are now
using statistical software that can report the exact probability of making a Type I error, you may
see the results presented with an exact value of p. In our case, p = 0.012 so, if you know this,
you should tell your reader (t(11) = 2.60, p = 0.012, one-tailed). In these cases, it is up to you to
understand that p is less than the declared value of alpha and that the null hypothesis should be
rejected. It is also important to realize that statistical software will probably report p to only three
decimal places. This means that you could get a result that states p = 0.000. Understand that
your probability of making a Type I error is never zero but the software has rounded a very low
p value to the nearest three decimal places. When reporting this result, you do not know the
exact value of this small p, but you do know that it is less than 0.001, so the appropriate thing to
do is to state that p < 0.001.
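Statistical software would handle the whole test in one step. Here is a sketch using SciPy's one-sample t-test (the `alternative` keyword requires SciPy 1.6 or later):

```python
from scipy import stats

balances = [107, 93, 105, 97, 100, 113, 105, 106, 112, 119, 99, 119]

# One-sample, one-tailed t-test against the hypothetical mean of $100
result = stats.ttest_1samp(balances, popmean=100, alternative='greater')
print(f"t({len(balances) - 1}) = {result.statistic:.2f}, p = {result.pvalue:.3f}")
# Expected output (approximately): t(11) = 2.60, p = 0.012
```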
Effect Size and t-tests
Effect size, as measured by Cohen's d, is calculated just as it was when you were using z. This
time, however, instead of using the population standard deviation, you must use the sample
standard deviation. Since the sample standard deviation (s) is an estimate of the population
standard deviation (σ), the resulting value is an estimate of Cohen's d:

$$\text{estimated } d = \frac{M - \mu}{s}$$

Just as before, consider values of d below about 0.3 to be small, values near 0.5 to show a
medium effect, and values at or above 0.8 to be large.
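For the checking-account example, the estimated effect size comes out as a medium-to-large effect (a quick sketch reusing the statistics above):

```python
# Estimated Cohen's d for the checking-account example
M, mu, s = 106.25, 100, 8.31
d = (M - mu) / s
print(f"estimated d = {d:.2f}")  # about 0.75
```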
Related-Samples t-tests
A simple modification of the one sample t-test opens the door to another powerful and common
version of the t-test, the related-samples t-test, or paired-samples t-test. One of the simplest
and most common experimental strategies is the within-subjects design in which you measure
people or subjects under a treatment condition and under a control condition. For example, if
you study maze learning in rats and typically test your animals in an illuminated room, you
might like to see if rats can complete the maze equally well in the dark. In order to conduct this
experiment in an unbiased fashion, you take ten naïve rats (who have never been in your maze)
and time them under each of the two conditions. To do this properly, you should use a
counterbalanced design and randomly determine which condition to run first for each rat. This is
a repeated-measures design because you measured each subject twice. It is also a within-subjects
design because the treatment and control measurements come from the same group of subjects.
Your results appear below:
Rat    Illuminated (seconds)    Dark (seconds)
1            128                     19
2            138                    290
3            125                    120
4            133                    130
5             43                    114
6             73                    137
7             60                    218
8             70                    151
9             11                    226
10            42                    141
       M = 82.3                 M = 154.6
       SD = 45.43               SD = 74.38
There is quite a lot to consider here. You have two sets of times, each with its own mean and
standard deviation. So far, we do not have the tools to handle more than one sample mean
(although we will get to that in later chapters). However, there is another way of looking at these
data. Each rat has two scores. These are paired scores and we can turn each pair into a single
score by taking the difference between them. If we subtract the time in the illuminated condition
from the time in the dark condition we can create another column of Difference Scores. These
scores show us how much each rat’s time changed as it moved from the illuminated condition to
the dark condition or vice versa. We can only do this because each rat was tested under each
condition. (If the ten rats in the illuminated condition were not the same rats used in the dark
condition, it would be pointless to calculate difference scores since no single score in one
condition would have any special relationship to any single score in the other condition. We
will discuss how to run t-tests for independent groups in the following chapter.)
Note that some values are positive and others are negative; it is essential that you make this
distinction.
Rat    Illuminated (seconds)    Dark (seconds)    Difference Score
1            128                     19                -109
2            138                    290                 152
3            125                    120                  -5
4            133                    130                  -3
5             43                    114                  71
6             73                    137                  64
7             60                    218                 158
8             70                    151                  81
9             11                    226                 215
10            42                    141                  99
       M = 82.3                 M = 154.6           MD = 72.3
       SD = 45.43               SD = 74.38          SD = 93.95
These difference scores contain all the information that we need. They tell us how much change
occurs when we go from the illuminated condition to the dark condition. Once we have
calculated the differences, we can forget about the original data and focus on the difference
scores. From there, we can calculate the mean difference (MD), the standard deviation of the
difference scores (s), and the estimated standard error of the mean difference (sMD). The data are
simplified as follows:
Rat    Difference Score
1          -109
2           152
3            -5
4            -3
5            71
6            64
7           158
8            81
9           215
10           99

MD = 72.3, s = 93.95, sMD = 29.71, n = 10
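This bookkeeping is easy to check in Python (a sketch using the rat data above; statistics.stdev uses the n − 1 denominator, matching the sample standard deviation formula):

```python
import math
import statistics

illuminated = [128, 138, 125, 133, 43, 73, 60, 70, 11, 42]
dark = [19, 290, 120, 130, 114, 137, 218, 151, 226, 141]

# Difference scores: dark minus illuminated, one per rat
diffs = [d - i for d, i in zip(dark, illuminated)]

n = len(diffs)
M_D = statistics.mean(diffs)    # mean difference
s = statistics.stdev(diffs)     # SD of the difference scores (n - 1 denominator)
s_MD = s / math.sqrt(n)         # estimated standard error of the mean difference

print(f"M_D = {M_D:.1f}, s = {s:.2f}, s_MD = {s_MD:.2f}, n = {n}")
# Expected output (approximately): M_D = 72.3, s = 93.95, s_MD = 29.71, n = 10
```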
We will be testing the hypothesis that a difference exists. Therefore, our null hypothesis will
state that the mean difference equals zero and the alternative hypothesis will state that it does
not.
H0: μD = 0 (the true mean difference is zero)
H1: μD ≠ 0 (the true mean difference is not zero)
Calculating a Related-Samples t
The formula for t does not change for a related-samples test but we do need to think a bit about
the questions we are asking. We know the mean of the difference scores (MD) and we know how
to calculate the standard error of the difference scores (sMD), but what is our hypothetical value
for the mean of the difference scores? If we want to know whether a change has occurred between
the illuminated and dark conditions, then we are asking whether the mean difference is different
from zero. In that case, the hypothesized mean difference is zero. (Note: there may be times when
your null hypothesis does not predict a change of zero. If that is the case, plug that value in
for μD.)
Therefore, the equation for a repeated-measures t-test is:

$$t = \frac{M_D - \mu_D}{s_{M_D}}$$
and if your null hypothesis predicts no change, μD = 0 and the equation simplifies to:
$$t = \frac{M_D}{s_{M_D}} = \frac{72.3}{29.71} \approx 2.43$$
Cohen’s d is estimated as:
$$\text{estimated } d = \frac{M_D}{s} = \frac{72.3}{93.95} \approx 0.77$$
Your sample size will be the number of difference scores so, for this example, n = 10 and df = 9.
As always, the critical values for t are selected based on df and on whether you are conducting a
two-tailed test (predicting a change in either direction) or a one-tailed test (specifically
predicting either a positive or a negative change). In our case, the two-tailed critical value for t
with 9 degrees of freedom and α = 0.05 is tcrit = ±2.262. This is how you would report the
results:

We found that, compared to the illuminated condition, rats took significantly more time to solve
the maze in the dark, t(9) = 2.43, p < 0.05, two-tailed, d = 0.77.
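SciPy's paired-samples test reproduces this result (a sketch assuming SciPy is available; ttest_rel runs a two-tailed test by default):

```python
from scipy import stats
import statistics

illuminated = [128, 138, 125, 133, 43, 73, 60, 70, 11, 42]
dark = [19, 290, 120, 130, 114, 137, 218, 151, 226, 141]

# Paired-samples (related-samples) t-test, two-tailed by default
result = stats.ttest_rel(dark, illuminated)
print(f"t(9) = {result.statistic:.2f}, p = {result.pvalue:.3f}")

# Estimated Cohen's d: mean difference divided by the SD of the differences
diffs = [d - i for d, i in zip(dark, illuminated)]
d_est = statistics.mean(diffs) / statistics.stdev(diffs)
print(f"estimated d = {d_est:.2f}")
# Expected output (approximately): t(9) = 2.43, p just under 0.05, d = 0.77
```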
Summary
In these examples, we used t-tests to compare means and to make an informed statement about
whether a sample mean (M) was significantly different from a hypothetical mean (μ) or whether
the mean difference (MD) from a set of paired scores was significantly different from zero. In the
next section, we will modify the t-test formula one more time so we can compare unpaired scores.
That t-test is commonly used in between-groups studies, in which the individual scores from one
group have no special relationship with the scores in the other.
Values of t corresponding to the proportion of the distribution in one tail or in two tails
combined. Numbers in the left column are degrees of freedom (n − 1); the bottom row (df = ∞)
matches the critical values for z.

          Proportion in One Tail
df        0.10     0.05     0.025    0.01     0.005
          Proportion in Two Tails
          0.20     0.10     0.05     0.02     0.01
1         3.078    6.314   12.710   31.820   63.660
2         1.886    2.920    4.303    6.965    9.925
3         1.638    2.353    3.182    4.541    5.841
4         1.533    2.132    2.776    3.747    4.604
5         1.476    2.015    2.571    3.365    4.032
6         1.440    1.943    2.447    3.143    3.707
7         1.415    1.895    2.365    2.998    3.499
8         1.397    1.860    2.306    2.896    3.355
9         1.383    1.833    2.262    2.821    3.250
10        1.372    1.812    2.228    2.764    3.169
11        1.363    1.796    2.201    2.718    3.106
12        1.356    1.782    2.179    2.681    3.055
13        1.350    1.771    2.160    2.650    3.012
14        1.345    1.761    2.145    2.624    2.977
15        1.341    1.753    2.131    2.602    2.947
16        1.337    1.746    2.120    2.583    2.921
17        1.333    1.740    2.110    2.567    2.898
18        1.330    1.734    2.101    2.552    2.878
19        1.328    1.729    2.093    2.539    2.861
20        1.325    1.725    2.086    2.528    2.845
21        1.323    1.721    2.080    2.518    2.831
22        1.321    1.717    2.074    2.508    2.819
23        1.319    1.714    2.069    2.500    2.807
24        1.318    1.711    2.064    2.492    2.797
25        1.316    1.708    2.060    2.485    2.787
26        1.315    1.706    2.056    2.479    2.779
27        1.314    1.703    2.052    2.473    2.771
28        1.313    1.701    2.048    2.467    2.763
29        1.311    1.699    2.045    2.462    2.756
30        1.310    1.697    2.042    2.457    2.750
40        1.303    1.684    2.021    2.423    2.704
50        1.299    1.676    2.009    2.403    2.678
60        1.296    1.671    2.000    2.390    2.660
80        1.292    1.664    1.990    2.374    2.639
100       1.290    1.660    1.984    2.364    2.626
120       1.289    1.658    1.980    2.358    2.617
∞         1.282    1.645    1.960    2.326    2.576