The t-test - University of South Florida

The t-test
Inferences about Population Means
when population SD is unknown
Confidence intervals in z (Review)






Want to estimate height of students at USF.
Sampled N=100 students. Found mean =68 in
and SD = 6 in.
Best guess for population mean is 68 inches plus
or minus some.
$\sigma_{\bar{X}} = \sigma_X / \sqrt{N}$
$95\%\,CI = \bar{X} \pm z_{.05}\,\sigma_{\bar{X}}$
95%CI=68±(1.96)[6/sqrt(100)]
68 ±1.96(.6) = 68 ±1.18
Interval is 66.82 to 69.18. Intervals built this way will
contain the population mean 95% of the time.
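The arithmetic on this slide can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `z_confidence_interval` is an illustrative choice, not something from the slides.

```python
import math
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for a confidence interval around a mean using z."""
    # Two-tailed critical value of z, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the mean: SD divided by the square root of N.
    standard_error = sd / math.sqrt(n)
    margin = z_crit * standard_error
    return mean - margin, mean + margin


# Height example from the slide: N = 100, mean = 68 in, SD = 6 in.
low, high = z_confidence_interval(mean=68, sd=6, n=100)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Rounded to two decimals, the result agrees with the 66.82 to 69.18 interval worked out by hand.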
Problem with z



Formulas so far use the population SD, and they are
correct when it is known, but the population SD is
usually unknown, so we have to estimate it from the sample.
The estimate will be off a bit; it would be nice to account
for this.
The statistic called 't' adjusts for error in the estimate of
SD. The estimate of SD gets better as sample size
increases, so t changes with N. The values of t are
basically the same as z when N is large, but t spreads out
more and more as the sample size gets smaller.
The t Distribution
We use t when the population variance is unknown (the
usual case) and sample size is small (N<100, the usual
case). If you use a stat package for testing hypotheses
about means, you will use t.
The t distribution is a short, fat relative of the normal: lower at the peak and
heavier in the tails. The shape of t depends on its df. As N becomes infinitely
large, t becomes normal.
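To make "short and fat" concrete, the sketch below evaluates the standard textbook density formulas for t and for the normal at the center (0) and in the tail (3). The helper names `t_pdf` and `normal_pdf` are illustrative, and the df values were chosen only for demonstration.

```python
import math


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    # Work on the log scale so that large df do not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


for df in (5, 25, 100):
    print(f"df={df:>3}: height at 0 = {t_pdf(0, df):.4f}, "
          f"height at 3 = {t_pdf(3, df):.4f}")
print(f"normal: height at 0 = {normal_pdf(0):.4f}, "
      f"height at 3 = {normal_pdf(3):.4f}")
```

For small df the t density is lower than the normal at 0 and higher at 3; as df grows the two sets of values converge.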
Example values from t and z
[t changes with df (N)]

Area beyond value    z       t (df=100)   t (df=25)
.50                  0       0            0
.25                  .67     .68          .68
.025                 1.96    1.98         2.06
.005                 2.58    2.63         2.79
Degrees of Freedom
For the t distribution, degrees of freedom are always a
simple function of the sample size, e.g., (N-1).
One way of explaining df is that if we know the total or
mean, and all but one score, the last score is not free to
vary, so only (N-1) scores are free. The last score is fixed
by the others. For example, if 4+3+2+X = 10, then X = 1.
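The same point in code: once the total and N-1 of the scores are known, the remaining score can be computed rather than chosen. A minimal sketch, with `last_score` as an illustrative helper name:

```python
def last_score(total, known_scores):
    """Return the one score that is fixed by the total and the other scores."""
    return total - sum(known_scores)


# Slide example: 4 + 3 + 2 + X = 10.
x = last_score(10, [4, 3, 2])
print(x)  # 1
```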
t table
Confidence Intervals in t
Want to estimate height of students at USF. Sampled
N=100 students. Found mean =68 in and SD = 6 in.
Best guess for population mean is 68 inches plus or
minus some.
( X  X )2

95%CI =
X  t.05 s X
95%CI=68±(1.98)[6/sqrt(100)]
sX 
sX

N
N 1
N
t.05  t( .05, 2tails,df 99)  1.98
68 ±1.98(.6) = 68 ±1.19
Interval is 66.81 to 69.19. Intervals built this way will contain
the population mean 95% of the time.
Note this is virtually the same as in z, where the interval was 66.82 to 69.18.
The difference between t and z matters more when N is small.
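The estimated SD and standard error used here divide by N-1 rather than N. The sketch below computes both from raw scores; the four scores are the ones from the degrees-of-freedom slide (4, 3, 2, 1), reused purely for illustration since the height data themselves are not given.

```python
import math
import statistics

# Scores from the degrees-of-freedom slide, reused here for illustration only.
scores = [4, 3, 2, 1]
n = len(scores)
mean = sum(scores) / n

# Estimated population SD: sum of squared deviations divided by N - 1.
sum_sq = sum((x - mean) ** 2 for x in scores)
s_x = math.sqrt(sum_sq / (n - 1))

# Estimated standard error of the mean: s_X divided by sqrt(N).
s_xbar = s_x / math.sqrt(n)

# The standard library's stdev uses the same N - 1 divisor.
same_as_stdlib = math.isclose(s_x, statistics.stdev(scores))
print(f"s_X = {s_x:.3f}, s_Xbar = {s_xbar:.3f}, matches stdev: {same_as_stdlib}")
```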
CI in t, Example 2

Suppose we want to estimate mean curiosity score
for psychology students. Sample N = 25 people,
Mean = 52, SD = 10.
ˆ  52; ˆ  s X  10; ˆ X  s X 
sX
10

2
N
25
t(.05)  t(.05, 2tail,df  24)  2.064
95%CI  X  t.05 s X  52  2.064(2)
95%CI  47.872 to 56.128
Note: this is same as CI in
z, except we use t instead
of z. The value of t comes
from a table. Tabled value
depends on df.
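Both confidence-interval examples can be reproduced with one small helper. This is a sketch under two assumptions: the critical values are typed in from the slides (1.98 and 2.064) rather than computed, and the function name `t_confidence_interval` is illustrative. The same helper is called with z = 1.96 to show how much the choice of t versus z matters at each sample size.

```python
import math


def t_confidence_interval(mean, sd, n, crit):
    """Return (lower, upper) given a tabled critical value for df = n - 1."""
    standard_error = sd / math.sqrt(n)
    margin = crit * standard_error
    return mean - margin, mean + margin


Z_CRIT = 1.96  # two-tailed z at alpha = .05

# (label, mean, SD, N, tabled t) taken from the two examples in the slides.
examples = [
    ("height, df = 99", 68, 6, 100, 1.98),
    ("curiosity, df = 24", 52, 10, 25, 2.064),
]

for label, mean, sd, n, t_crit in examples:
    t_low, t_high = t_confidence_interval(mean, sd, n, t_crit)
    z_low, z_high = t_confidence_interval(mean, sd, n, Z_CRIT)
    print(f"{label}: t gives {t_low:.3f} to {t_high:.3f}, "
          f"z gives {z_low:.3f} to {z_high:.3f}")
```

With N = 100 the two intervals differ only in the second decimal place; with N = 25 the t interval is noticeably wider.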
One-sample t-test
We can use a confidence interval to “test” or decide
whether a population mean has a given value. For
example, suppose we want to test whether the mean
height of women at USF is equal to 68 inches.
Suppose we randomly sample 50 women students
at USF. We find that their mean height is 63.05
inches. The SD of height in the sample is 5.75
inches. Then we find the standard error of the
mean by dividing SD by sqrt(N) = 5.75/sqrt(50) =
.81. The critical value of t with (50-1) df is
2.01 (find this in a t-table). Our confidence interval
is, therefore, 63.05 plus/minus 1.63, or 61.42 to 64.68,
which does not contain 68. See the graph.
One-sample t Example 1
[Figure: One sample t test, confidence interval view. Histogram of Sample Height (Frequency vs. Height in Inches). N = 50, M = 63.05, SD = 5.75, $S_{\bar{X}} = .81$, t = 2.01, $ci = \bar{X} \pm 1.63$, Pop Mean = 68.]
Take a sample, set a confidence interval around the sample mean.
Does the interval contain the hypothesized value?
Conventional Steps (Cookbook)
1. Choose alpha (.05)
2. State null and alternative hypotheses (H0: pop mean is 68; Ha: pop mean is not 68)
3. Calculate observed stat (t = ?)
4. Find critical value (tcrit = value in table)
5. State decision rule (if |tobs| > tcrit, reject null)
6. State conclusion (pop mean is not 68)
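The cookbook steps can be sketched as a small function, applied here to the height example (N = 50, M = 63.05, SD = 5.75). The function name `one_sample_t_test` is illustrative, and the critical value 2.01 is taken from the slides rather than computed.

```python
import math


def one_sample_t_test(sample_mean, sd, n, mu0, t_crit):
    """Return (t_obs, reject_null) for a two-tailed one-sample t test."""
    # Step 3: observed t is the distance from the null value in standard errors.
    standard_error = sd / math.sqrt(n)
    t_obs = (sample_mean - mu0) / standard_error
    # Step 5: decision rule, reject the null if |t_obs| exceeds the tabled value.
    reject_null = abs(t_obs) > t_crit
    return t_obs, reject_null


# Step 1: alpha = .05. Step 2: H0 says the population mean is 68.
# Step 4: tabled critical value for df = 49, two tails, as quoted in the slides.
t_obs, reject_null = one_sample_t_test(
    sample_mean=63.05, sd=5.75, n=50, mu0=68, t_crit=2.01
)

# Step 6: state the conclusion.
print(f"t_obs = {t_obs:.2f}; reject H0: {reject_null}")
```

Without rounding the standard error, the observed t comes out near -6.09; the slides round the standard error to .81 first, which gives 6.11 in absolute value. The decision is the same either way.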
One sample t test
t distribution view
[Figure: t distribution (Frequency vs. Height in Inches). $\mu = 68$, $\bar{X} = 63.05$, $S_{\bar{X}} = .81$, $\bar{X} - \mu = -4.95$, $t = (\bar{X} - \mu) / S_{\bar{X}} = -4.95 / .81 = -6.11$.]
The sample mean is roughly six standard errors below the hypothesized
population mean. If the population mean is really 68 inches, it is very, very unlikely
that we would find a sample with a mean as small as 63.05 inches.
One-sample t, Example 2
X  25, s X  2.52

Over the years,
smokers at M’s
treatment center report
smoking an average of
30 cigs per day. New
treatment Smoke-BGon pills given to N=25
new clients. Did it
help?
tobs 
sX 
X 
sX
sX
2.52

 .50
N
25
X   25  30
tobs 

 10
sX
.5
tcrit  t( .05, 2tails,df  24)  2.064
|tobs| > tcrit. Reject null. Result is significant.
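As a check on this example, the sketch below computes the observed t and also the matching 95% confidence interval, to show that the two approaches described in these slides reach the same decision. Variable names are illustrative, and the critical value 2.064 is copied from the slide rather than computed.

```python
import math

mean, sd, n = 25, 2.52, 25  # sample values from the slide
mu0 = 30                    # long-run average, used as the null value
t_crit = 2.064              # tabled t: alpha = .05, two tails, df = 24

standard_error = sd / math.sqrt(n)
t_obs = (mean - mu0) / standard_error

# Confidence-interval view of the same test.
ci_low = mean - t_crit * standard_error
ci_high = mean + t_crit * standard_error

reject_by_t = abs(t_obs) > t_crit
reject_by_ci = not (ci_low <= mu0 <= ci_high)

print(f"t_obs = {t_obs:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
print(f"reject by t: {reject_by_t}, reject by CI: {reject_by_ci}")
```

The unrounded standard error is .504, so the observed t is about -9.92 rather than the -10 obtained with the rounded value of .5. Both routes reject the null, since 30 lies well outside the interval.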
Application

We prefer to use the t test instead of the z
test when the _____ is small.
1 mind
2 sample size
3 standard error
4 type II error
Definition

The t test adjusts for error in estimating the
population ____ during hypothesis testing.
1 mean
2 median
3 range
4 standard deviation
Application

We compute a one-sample t test and find an
obtained value of t of 2.5. The critical
(tabled) value of t given the null hypothesis
turns out to be 2.01. What do we decide?
1 the result is significant
2 the result is not significant
3 we made a type I error
4 we made a type II error