T-scores, hypothesis testing

Dr. Sinn, PSYC 301
Unit 2: z, t, hyp, 2t
p. 1
Overview of t-scores
• Very similar to z-scores
– Provides way of judging how extreme a sample mean is
– A bunch of t-scores form a t-distribution
• Done when σ is unknown
• Used for hypothesis testing:
– Ex: You wonder if college students really get 8 hours of sleep
• Ho: μ = 8 (College students do get eight hours of sleep)
• Ha: μ ≠ 8 (College students don’t get eight hours of sleep)
• t-distribution provides foundation for t-test
– can do by hand with table
– can do on SPSS
• Key difference: t-test done when σ is unknown
Review: Different Measures of Stand. Dev.
* Calculate differently based on available information
• Have all the scores in a population
– $\sigma_x = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
– E.g., SAT scores (ETS has every single score).
• Have only scores in a sample, want to estimate variability in population
– $\hat{s}_x = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$
– E.g., hours of sleep students in this class slept last night
– (Need to adjust because you’ve only got sample data.)
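The two standard deviations on this slide differ only in the divisor. A minimal sketch in Python, assuming the computational (sum-of-squares) form of each formula and borrowing the nine critical-thinking scores from the homework problem later in this unit; the function names are illustrative and not from the slides.

```python
import math
import statistics

def sum_of_squares(scores):
    """Sum of squared deviations via the computational formula."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n

def population_sd(scores):
    """sigma_x: divide by N because every score in the population is in hand."""
    return math.sqrt(sum_of_squares(scores) / len(scores))

def sample_sd(scores):
    """s-hat_x: divide by n - 1 to estimate population variability from a sample."""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))

scores = [35, 45, 30, 50, 60, 55, 60, 45, 40]
print(round(population_sd(scores), 4))  # 10.0
print(round(sample_sd(scores), 4))      # 10.6066

# Cross-check against the standard library's definitions.
assert math.isclose(population_sd(scores), statistics.pstdev(scores))
assert math.isclose(sample_sd(scores), statistics.stdev(scores))
```

Treating the same nine scores as a sample rather than a whole population raises the estimate from 10.0 to 10.6066, which is the adjustment the slide refers to.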
Different Measures of Sampling Error
• If σx is known, do z-test
– Use σx to get measure of sampling error in distribution.
– $\sigma_{\bar{x}} = \dfrac{\sigma_x}{\sqrt{n}}$, so... $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
• If σx is not known, do t-test
– Use ŝx to get measure of sampling error in distribution.
– $\hat{s}_{\bar{x}} = \dfrac{\hat{s}_x}{\sqrt{n}}$, so... $t = \dfrac{\bar{x} - \mu}{\hat{s}_{\bar{x}}}$
t-distributions vs. z-distributions
Comparing Frequency & Sampling Distributions (T1)
                 | Frequency D – z | Sampling D – z | Sampling D – t
Have             | x’s | xbars | xbars
Compare          | x vs. μ | xbar vs. μ | xbar vs. μ
Amt. of Variab.  |  |  | +
Meas. of Variab. | σx | σxbar | ŝxbar
Formula          | $z = \dfrac{x - \mu}{\sigma_x}$ | $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ | $t = \dfrac{\bar{x} - \mu}{\hat{s}_{\bar{x}}}$
Practice Problem: Calculating t-test
• Do college students sleep 8 hours per night?
• Follow hypothesis testing steps:
1. State type of comparison
2. State null (H0) and alternative (HA)
3. Set standards:
a. State type of test (& critical values if doing by hand)
• E.g., t-critical (get from table in back of book)
b. Significance level you require (e.g., α = .05)
c. 1 vs. 2 tailed test (we’ll always do 2-tailed tests, which are more conservative)
4. Calculate statistic (e.g. get t-obtained)
5. State decision and explain in English.
Finding t-critical
Homework Problem
• College graduates score 35, 45, 30, 50, 60, 55,
60, 45, 40 on a critical thinking test.
• If normal people score 45 on the test, do college
graduates score significantly better?
• Do hypothesis testing steps
HW: Standard Deviation Calculation
HW: T-Calculation
• SD = 10.6066
• SE = 3.536
• t = (46.67 - 45) / 3.536 = .471
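The homework numbers can be reproduced from the nine scores. A minimal sketch using only Python's standard library; the variable names are illustrative. Carrying full precision gives t ≈ .471, the value reported in the hypothesis testing steps.

```python
import math
import statistics

scores = [35, 45, 30, 50, 60, 55, 60, 45, 40]  # critical thinking scores from the homework
mu = 45                                         # comparison value stated in the problem

mean = statistics.mean(scores)        # sample mean (M)
sd = statistics.stdev(scores)         # s-hat_x, divides by n - 1
se = sd / math.sqrt(len(scores))      # s-hat_xbar, standard error of the mean
t_obt = (mean - mu) / se              # t-obtained

print(f"M = {mean:.2f}, SD = {sd:.4f}, SE = {se:.3f}, t = {t_obt:.3f}")
# M = 46.67, SD = 10.6066, SE = 3.536, t = 0.471
```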
HW: Hypothesis testing steps
1. Compare xbar and μ
2. Ho: μ = 45 Ha: μ ≠ 45
3. α = .05, df = n-1 = 8, two-tailed test.
tcritical = 2.306
4. tobt = .471
5. Retain Ho. The hypothesis was not
supported. College graduates did not
score significantly better (M=46.67) on
critical thinking (μ =45), t(8) = .471, n.s.
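Step 5 comes down to comparing the size of tobt with tcritical. A minimal sketch of that decision rule for a two-tailed test; the function name and the wording of the returned string are illustrative, and the "p < .05" label assumes the critical value was looked up for α = .05.

```python
def decide(t_obt, t_crit, df):
    """Two-tailed decision: reject Ho only if |t-obtained| exceeds t-critical."""
    if abs(t_obt) > t_crit:
        return f"Reject Ho, t({df}) = {t_obt:.3f}, p < .05"
    return f"Retain Ho, t({df}) = {t_obt:.3f}, n.s."

print(decide(t_obt=0.471, t_crit=2.306, df=8))
# Retain Ho, t(8) = 0.471, n.s.
```

Using the absolute value is what makes the rule two-tailed: a tobt of -2.5 would lead to the same decision as +2.5.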
T-test Example: Speed
• The government claims cars traveling in front
of your house average 55 mph. You think this
is a load of…. That is, you think cars travel
faster than this.
• You steal a police radar gun and clock nine
cars, obtaining the following speeds:
• 45, 60, 65, 55, 65, 60, 50, 70, 60
• What’s μ?
SPSS Steps
• Enter the speeds of cars you clocked.
• Go to Compare Means.
• Pick variable.
• Set Test Value to μ.
Output part #1
One-Sample Statistics
      | N | Mean  | Std. Deviation | Std. Error Mean
SPEED | 9 | 58.89 | 7.817          | 2.606
• N: Number of cars you measured (sample size).
• Mean (xbar): Average speed of these cars (sample mean).
• Std. Deviation (ŝx): Standard deviation of these speeds.
• Std. Error Mean (ŝxbar): Standard error of the mean – the typical difference we’d expect sampling error to cause.
Output part #2

One-Sample Test (Test Value = 55)
      | t     | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference (Lower, Upper)
SPEED | 1.492 | 8  | .174            | 3.89            | -2.12, 9.90
• t: tobtained. By hand, it’s $t = \dfrac{\bar{x} - \mu}{\hat{s}_{\bar{x}}}$
• Sig. (2-tailed): pobt. Proportion of time you’d see a difference of this size or larger simply because of sampling error.
• This value must fall below .05 to say we have a significant difference.
• Note: There’s no tcritical when done with SPSS.
Hypothesis Testing Steps
1. Compare xbar and μ
2. Ho: μ = 55 Ha: μ ≠ 55
3. α = .05, df = n-1 = 8, two-tailed test.
4. tobt = 1.492, pobt = .174
5. Retain Ho. Average car speed (M=58.89)
does not differ significantly from 55 mph
speed limit, t(8) = 1.492, n.s.
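The SPSS figures for the speed example, including the 95% confidence interval of the difference, can be checked by hand. A minimal sketch in standard-library Python; it assumes the interval is the mean difference plus or minus tcritical times the standard error, with tcritical = 2.306 taken from the table value for df = 8 used in the homework. Variable names are illustrative.

```python
import math
import statistics

speeds = [45, 60, 65, 55, 65, 60, 50, 70, 60]  # the nine clocked cars
test_value = 55                                 # mu under Ho
t_crit = 2.306                                  # df = 8, alpha = .05, two-tailed

mean = statistics.mean(speeds)
se = statistics.stdev(speeds) / math.sqrt(len(speeds))  # standard error of the mean
mean_diff = mean - test_value
t_obt = mean_diff / se

# Confidence interval of the difference: mean difference +/- t-critical * SE
lower = mean_diff - t_crit * se
upper = mean_diff + t_crit * se

print(f"t = {t_obt:.3f}, mean difference = {mean_diff:.2f}")
print(f"95% CI of the difference: [{lower:.2f}, {upper:.2f}]")
# t = 1.492, mean difference = 3.89
# 95% CI of the difference: [-2.12, 9.90]
```

The interval includes 0, which agrees with the decision to retain Ho.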
Same test, different outcome
• What if we had measured slightly different speeds?
• 50, 60, 65, 55, 65, 60, 55, 75, 65
• In this case, we’d reject the Ho.
• Speeds appear to exceed 55 mph, t(8) = 2.475, p < .05
• What happens to μ? xbar?
One-Sample Statistics
      | N | Mean  | Std. Deviation | Std. Error Mean
SPEED | 9 | 61.11 | 7.407          | 2.469
One-Sample Test (Test Value = 55)
      | t     | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference (Lower, Upper)
SPEED | 2.475 | 8  | .038            | 6.11            | .42, 11.80
Learning Check
1. As tobt increases, we become more likely to ___ Ho.
2. If the sample size increases tobt will _____ and tcritical
will ______
3. If the difference between xbar and μ increases
a. sampling error will ______
b. tcritical will _______
c. tobtained will _______
d. ŝxbar will _______
e. you become _____ likely to reject the Ho
Learning Check
1. A researcher compares the number of workdays missed
for employees who are depressed versus the company-wide average of 6 days per year.
a. Rejecting the Ho would mean what about depressed employees?
b. Would you be more likely to reject Ho with a sample mean of 8 or 10?
c. Would you be more likely to reject Ho with a ŝx of 1.5 or 3?
Decision Errors
• Educated guesses can be wrong.
• Def: Drawing a false conclusion from a hypothesis test
– Never know for sure if a difference is due just to sampling error
or if it truly reflects a treatment effect.
• Two Types
– Type I: Falsely rejecting null
• Seeing something that’s not there. False positive.
– Type II: Falsely retaining null
• Missing something that is there. False negative.
Decision Errors – Example #1
“Is that a burglar or am I hearing things?”
• You hear a noise in your house and wonder if it means
there’s a burglar in the house. The problem is that it
could just be regular background noise (___________)
or it really could mean something’s going on
(____________). You’d make a mistake if you…
a. decide there’s a burglar when there is not. Type I Error
b. decide there’s no burglar when there is. Type II Error
Decision Errors – Example #2
• “Did the training work or is this group of
people just more talented than usual?”
• You implement a training program to improve job performance,
and then compare the performance of trainees to average
performance. You’d make a mistake if you…
a. Conclude participants don’t differ from average, but in
reality the training DOES improve performance. Type II error
b. Conclude participants do better than average, but in reality
the training does NOT improve performance. Type I error
Graph of Type I Error – α
When rejecting Ho, you may commit a Type I error.
(Wrongly concluding cars DO NOT average 55 mph.)
[Figure: sampling distributions for Ho: μ=55 (“But this is actually true.”) and Ha: μ>55 (“You guess this.”), with α marked at each tcrit.]
So α is the chance of making a Type I error.
Graph of Type II Error – β
When retaining Ho, you may commit a Type II error.
(In this case, assuming cars DO average 55 mph.)
[Figure: sampling distributions for Ho: μ=55 (“You guess this…”) and Ha: μ>55 (“…but this is actually true.”), with β marked at tcrit.]
So β is the chance of making a Type II error.
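A small simulation can make α and β concrete. The sketch below assumes speeds are normally distributed with a hypothetical population SD of 8 mph and, for the Type II case, a hypothetical true mean of 60 mph; neither number comes from the slides. It reuses the two-tailed tcritical of 2.306 for df = 8, and all names are illustrative.

```python
import math
import random
import statistics

T_CRIT = 2.306  # two-tailed critical value for df = 8, alpha = .05
N = 9           # cars per sample, as in the speed example
MU_0 = 55       # value stated in Ho
SIGMA = 8       # hypothetical population SD (an assumption for this sketch)

def rejects_null(sample):
    """True when |t-obtained| exceeds t-critical for this sample."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    t_obt = (statistics.mean(sample) - MU_0) / se
    return abs(t_obt) > T_CRIT

def rejection_rate(true_mean, trials=5000, seed=301):
    """Proportion of simulated samples that lead to rejecting Ho."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
        if rejects_null(sample):
            rejections += 1
    return rejections / trials

# Ho is true (mu really is 55): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=55)
# Ho is false (mu is really 60): every retention is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=60)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land close to the chosen α of .05, while the Type II rate depends on how far the true mean is from 55, on the SD, and on the sample size.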
Effect-size statistic: d
• Statistical vs. Practical Significance
– Statistical Sig: Decides if difference is reliable (e.g., t-test)
– Practical Sig: Decides if difference is big enough to be
practically important
– So, only do tests for practical significance if you get statistical
significance first (i.e., if you reject the H0)
• Effect size (d)
– Def: Impact of IV on DV in terms of standard deviation units.
– So, d=1 means the IV “raises” scores 1 full standard deviation.
– d = .2+ small effect size
– d = .5+ moderate effect size
– d = .8+ large effect size
– $d = \dfrac{\bar{x} - \mu}{\hat{s}_x}$ (This is standard deviation, not standard error)
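As a worked illustration, d can be computed for the second speed sample, the one that produced a significant result. A minimal sketch in standard-library Python; the function names and the wording of the labels are illustrative, and the cutoffs are the ones listed on this slide.

```python
import statistics

def effect_size_d(scores, mu):
    """Mean difference in standard deviation (not standard error) units."""
    return (statistics.mean(scores) - mu) / statistics.stdev(scores)

def size_label(d):
    """Label d using the slide's cutoffs: .2 small, .5 moderate, .8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below the small cutoff"

speeds = [50, 60, 65, 55, 65, 60, 55, 75, 65]  # sample with t(8) = 2.475
d = effect_size_d(speeds, mu=55)
print(f"d = {d:.2f} ({size_label(d)})")
# d = 0.83 (large)
```

Dividing by the standard deviation rather than the standard error keeps d from growing just because the sample is larger.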
Practice: Meditation
• You suspect the anxiety level of people in your
meditation class will differ from a score of 3 on a 1-5
anxiety self-assessment scale.
• #1: Do an SPSS analysis and then fill-in the
following information:
x: 2, 3, 4, 3, 2, 2, 2, 1
μ =
σ =
ŝx =
ŝxbar =
M =
Mean Difference =
tcrit =
tobt =
pobt =
d =
One-Sample Statistics
        | N | Mean | Std. Deviation | Std. Error Mean
anxiety | 8 | 2.38 | .916           | .324
One-Sample Test (Test Value = 3)
        | t      | df | Sig. (2-tailed) | Mean Difference
anxiety | -1.930 | 7  | .095            | -.625
Practice: Meditation
• You suspect the anxiety level of people in your
meditation class will differ from a score of 3 on a 1-5
anxiety self-assessment scale.
• #1: Do an SPSS analysis and then fill-in the
following information:
x: 2, 3, 4, 3, 2, 2, 2, 1
μ = 3
σ = ???
ŝx = .916
ŝxbar = .324
M = 2.38
Mean Diff. = -.625
tcrit = ±2.365
tobt = -1.930
pobt = .095
d = inappropriate
#2: Hypothesis Testing Steps
1. Cf. M and μ.
2. Ho: μ = 3
Ha: μ ≠ 3
3. 2-tailed, α = .05, df=7
4. tobt = -1.930, pobt = .095
5. Retain Ho. The hypothesis was not supported.
The anxiety of those meditating (M=2.38) did
not differ significantly from average anxiety
(μ=3), t(7) = -1.930, n.s.
• #3 Sketch the distribution, including regions of
rejection, tcritical and tobtained.
• #4 What type of decision error is possible here?
• #5 Pretend you had a significant result – calculate d.