Hypothesis testing
(The One-Sample Tests)
Today:
Paired-Samples t-Test
Independent Samples t-Test
The H0 Hypothesis
The logic of statistical hypothesis testing is based on indirect proof:
rejecting or retaining (keeping) the H0 hypothesis based on a test statistic!
H1: µestimated < µ0
H0: µestimated = µ0
H2: µestimated > µ0
We assume NO difference (H0) UNTIL we COLLECT ENOUGH EVIDENCE to conclude that
there is a significant DIFFERENCE.
Level of significance
and possible errors of the decision
When rejecting H0:
- Possible error: rejecting H0 when it’s actually true!
- Name of error: Type I. error
- Probability = the level of significance (α)
(getting statistically significant results when you shouldn’t – erroneous conclusion)
When retaining (keeping) H0:
- Possible error: a false H0 is retained!
- Name of error: Type II. error
- Probability = usually unknown (β)
(failure to discover existing differences – erroneous conclusion)
By reducing the probability of the Type I. error you raise the probability of the Type II. error (for a given sample size)!
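The claim that the probability of a Type I. error equals the level of significance can be checked with a small simulation. The sketch below is only an illustration: the function name, the sample size of 25, the number of trials and the random seed are all assumptions, and it uses a two-sided z test with known σ = 1 on samples for which H0 really is true.

```python
import random
import statistics


def type_one_error_rate(n_trials=10_000, n=25, z_crit=1.96, seed=1):
    """Share of samples drawn under a TRUE H0 for which the z test rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # H0 is true here: the population mean really is 0, and sigma is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (statistics.fmean(sample) - 0.0) / (1.0 / n ** 0.5)
        if abs(z) >= z_crit:
            rejections += 1  # a Type I. error: a true H0 was rejected
    return rejections / n_trials


rate = type_one_error_rate()
print(f"Share of false rejections: {rate:.3f}")  # should land near 0.05
```

With the critical value 1.96 (the 5% level), roughly one sample in twenty is expected to give a "significant" result even though there is no difference.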
CHOOSE YOUR LEVEL OF SIGNIFICANCE
PRIOR TO ANALYSES!
If I choose…
p<0.05 for the level of significance for my statistical decision,
the probability of the Type I. error is (5%) or smaller!
The confidence for my decision in this case is 100% − 5% = 95%.
If I choose p<0.01 for the level of significance for my statistical
decision, the probability of the Type I. error is (1%) or smaller!
The p<0.1 level is called TENDENCY.
P values above 0.05 are NOT SIGNIFICANT!
Most statistical programs (e.g. SPSS) provide the exact level of
significance for a given test, e.g. p=0.003 (this is significant), or
p=0.23 (not significant); p=0.062 (tendency); p=0.9 (not
significant), etc.
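The labelling rules above can be written down as a small helper. This is a minimal sketch: the function name is hypothetical, and the cut-offs are simply the 0.05 and 0.1 levels named on the slide.

```python
def describe_p_value(p):
    """Label an exact p value with the thresholds used in this lecture."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p < 0.05:
        return "significant"
    if p < 0.1:
        return "tendency"
    return "not significant"


# The four example values quoted above
for p in (0.003, 0.23, 0.062, 0.9):
    print(f"p={p}: {describe_p_value(p)}")
```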
Base Your conclusions on relevant
statistics, and report:
• number and most important characteristics of
your observational units
• the statistical test you’ve used
and its parameters (level of sig., test value,
degrees of freedom, hypothesized mean, etc. …)
• your precise conclusion (answer the research
question by justifying / rejecting the research
statement or report no significant difference)
Comparing the Means of
DEPENDENT Samples:
The Paired-Samples t Test
Chapter 15. (pg. 275-276, 278-285)
• What are “dependent samples”?
• The “direct difference” method
• df (degrees of freedom – revisited)
• Type I. & type II. errors
What are “dependent samples”?
The observations from one sample are
related in some way to those from another!
• Repeated-measures design: samples come from the same individuals
• Matched-subjects design: selecting pairs based on certain criteria
• In any other case when the observations in the samples are NOT
independent (e.g. husband – wife, mother – son, etc.)
E.g. Are these samples dependent or independent?
• IQ scores of twins
• Females and males of a school
• Students who sit together in the classroom
• Pulse rate before and after chemotherapy
Case-variable matrix for dependent samples:
Subjects:   Condition 1   Condition 2   Difference score
1.          5             10            5
2.          6             9             3
3.          5             6             1
4.          9             12            3
• Calculating difference scores for each unit of observation
• Calculating average difference score for the sample…
• Calculating SD of difference scores…
t tests work with these values: this is the “direct difference” method!
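As a rough illustration of these three calculations, the sketch below uses the four subjects from the case-variable matrix. The variable names are assumptions; the SD is the sample SD (divisor n − 1).

```python
import statistics

# Scores of the four subjects from the case-variable matrix
condition_1 = [5, 6, 5, 9]
condition_2 = [10, 9, 6, 12]

# Difference score for each unit of observation: D = Y - X
differences = [y - x for x, y in zip(condition_1, condition_2)]

mean_difference = statistics.fmean(differences)
sd_difference = statistics.stdev(differences)  # sample SD, divisor n - 1

print(differences)              # [5, 3, 1, 3]
print(mean_difference)          # 3.0
print(round(sd_difference, 3))  # 1.633
```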
What to use: z Test, or one of the t Tests?
Testing Statistical Hypotheses About µ:
IF σ is known: z Test
$z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
IF σ is not known: t Test
$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
Comparing the Means of DEPENDENT Samples: t Test
$t = \frac{\bar{D}}{s_D / \sqrt{n}}$
Eg.
Do patients have a higher pulse rate before chemotherapy than after?
Do mothers in Malaysia have their first pregnancy earlier than the
world average, which is 22 (±3) years?
Is memory better in the morning, or in the evening?
The “direct difference” method
Comparing means of DEPENDENT samples,
and testing hypothesis for the populations of these samples
t Test: $t = \frac{\bar{D}}{s_D / \sqrt{n}}$ (n = number of pairs)
D = Y – X is the “difference” variable, with mean $\bar{D}$ and St Dev $s_D$.
Is there a difference between the characteristics of
• husbands and wives,
• the IQ of twins,
• memory performance in the morning and in the afternoon, etc.?
Criteria: quantitative, normally distributed variables, the SDs of the samples are similar.
Steps:
• calculating difference scores for each pair: D1, D2, etc.
• calculating mean and SD of D1, D2, etc.
• calculating the t Test
• determining df (NUMBER of PAIRS minus 1!!!)
• checking from the table the corresponding t0.05 value
Conclusions are made for the couple, the twins, and the time of day, etc.
Generalization of Hypothesis testing with t-tests
sample → t-test → result: t
• t ≤ −t0.05: region of rejection, H1: µ < µ0
• |t| < t0.05: region of retention, H0. In this case we cannot say anything certain about the estimated population mean!
• t ≥ +t0.05: region of rejection, H2: µ > µ0
For each problem FIND the t0.05 critical value for the appropriate df in the
statistical tables of the Student’s t Distribution (df = n-1)
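The decision rule above can be sketched in a few lines. The dictionary and function names are hypothetical, and the table holds only a handful of two-sided 5% critical values copied from a standard Student's t table; other df values would have to be looked up and added.

```python
# Two-sided critical values t0.05 for a few df, from a standard t table
T_CRITICAL_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    10: 2.228, 15: 2.131, 20: 2.086, 30: 2.042,
}


def decide(t, df, table=T_CRITICAL_05):
    """Return the hypothesis supported by t, given df = n - 1."""
    t_crit = table[df]  # raises KeyError if this df is not in the small table
    if t <= -t_crit:
        return "H1: µ < µ0"
    if t >= t_crit:
        return "H2: µ > µ0"
    return "H0 retained"


# The t and df values reported for the two worked examples of this lecture
print(decide(2.5, 15))    # H2: µ > µ0
print(decide(1.363, 3))   # H0 retained
```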
Do you think it is more difficult
to recall words in English that
start with vowels or
consonants???
• Design an experiment with a repeated
measures design to be performed in class!
• Draft the case-variable matrix to be used!
• Formulate the research hypothesis!
• What will be the null hypothesis?
Is it more difficult to recall words in
English starting with vowels or
consonants?
Statement:
It is easier to recall words in English that start with…!
Experimental design: units of observation, variables…
Research hypothesis: estimated population mean
of recalling English words with consonants will be
significantly higher/lower than the estimated population
mean of recalling English words with vowels.
Statistical hypothesis: e.g: H1: µconsonants > µvowels
Experiment:
Half of the group: please recall as many
words as you can in English, starting with
consonants – you have 1 minute…
Half of the group: please recall as many
words as you can in English, starting with
vowels – you have 1 minute…
Switch task…
Calculate the difference score for Your raw data!
The case-variable data matrix,
demonstrating what SPSS will calculate…
StatisticsLecturesTopic06demoVowelConsonantExample.xls
Subjects:   vowels   consonants   difference
1           8        17           9
2           11       18           7
3           6        7            1
4           9        12           3
5           7        8            1
6           8        6            -2
7           10       5            -5
8           12       8            -4
9           7        9            2
10          8        10           2
11          9        7            -2
12          6        12           6
13          6        10           4
14          9        16           7
15          13       19           6
16          10       17           7
mean        8.7      11.3         2.625 (mean of difference scores)
StDev       2.1      4.7          4.3 (StDev of difference scores)
t Test: $t = \frac{\bar{D}}{s_D / \sqrt{n}}$, where D = Y – X is the “difference” variable, with mean $\bar{D}$ and St Dev $s_D$
t = 2.5
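The figures in this data matrix can be reproduced with the direct difference method. The sketch below is not the SPSS procedure itself, only the same arithmetic in plain Python; the variable names are assumptions.

```python
import math
import statistics

# Words recalled in one minute by the 16 subjects
vowels = [8, 11, 6, 9, 7, 8, 10, 12, 7, 8, 9, 6, 6, 9, 13, 10]
consonants = [17, 18, 7, 12, 8, 6, 5, 8, 9, 10, 7, 12, 10, 16, 19, 17]

# Direct difference method: D = Y - X for each subject
differences = [c - v for v, c in zip(vowels, consonants)]

n = len(differences)                  # number of pairs
mean_d = statistics.fmean(differences)
sd_d = statistics.stdev(differences)  # sample SD, divisor n - 1
t = mean_d / (sd_d / math.sqrt(n))
df = n - 1

print(mean_d)          # 2.625
print(round(sd_d, 1))  # 4.3
print(round(t, 1))     # 2.5
print(df)              # 15
```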
Decision is based on generalized hypothesis testing:
Draw a conclusion from the statistical test results below:
t = 2.5
df =15
t0.05 = 2.131
StatisticalTablesWithCriticalValuesInExcel.xls
Average recall number of English words in one minute
starting with vowels = 8.7 … with consonants = 11.3
Foreign language students recall English words starting with consonants
more easily based on the performance of 16 students: they could recall
significantly more words with consonants (11.3±4.7) as compared to
their performance on words with vowels (8.7±2.1) based on the
dependent samples t-test: t(15) = 2.5 (p < 0.05).
Testing the H0 Hypothesis
Random sample
Test statistic: a mathematical
rule for decision:
• Calculate statistical value based on the
formula of the appropriate statistical test
• Determine critical values
• Test if the calculated statistical value falls
in the area of retention (keeping the H0) or
in the area of rejection (accepting H1 or H2)
[Figure: distribution with the level of confidence (e.g. 95%) between the critical value (−) and the critical value (+). Region of rejection (H1) | Region of retention (H0) | Region of rejection (H2)]
H1: µNew Jersey < µUSA
H0: µNew Jersey = µUSA
H2: µNew Jersey > µUSA
Level of significance
and possible errors of the decision
E.g.
We did not report significant differences between the language proficiency of class
A and class B. What type of error might we have made?
Type II. error!
The one-sample z-test (rarely used)
Testing a statistical hypothesis about µ when the Standard
Deviation of the population (σ) is known:
sample → z-test: $z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
• z ≤ -1.96: region of rejection, H1: µ < µ0
• |z| < 1.96: region of retention, H0. In this case we cannot say anything certain about the estimated population mean!
• z ≥ 1.96: region of rejection, H2: µ > µ0
Criteria: quantitative,
normally distributed
variables, σ is known,
µ0 is hypothesized, the
SD of the sample is
similar to σ.
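A minimal sketch of the one-sample z-test follows. The function name is hypothetical and the numbers in the usage lines (sample mean 104, µ0 = 100, σ = 10, n = 25) are made up purely for illustration; they do not come from the lecture.

```python
import math


def one_sample_z(sample_mean, mu_0, sigma, n, z_crit=1.96):
    """z statistic and decision for a two-sided test at the 5% level."""
    z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
    if z <= -z_crit:
        decision = "H1: µ < µ0"
    elif z >= z_crit:
        decision = "H2: µ > µ0"
    else:
        decision = "H0 retained"
    return z, decision


# Invented example values: (104 - 100) / (10 / sqrt(25)) = 2.0
z, decision = one_sample_z(sample_mean=104, mu_0=100, sigma=10, n=25)
print(z, decision)  # 2.0 H2: µ > µ0
```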
What if we DO NOT KNOW the
Standard deviation of the
population?!
Testing Statistical Hypotheses About µ0:
IF σ is known: z Test
$z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
IF σ is not known: t Test
$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
In both tests the mean of the sample
is being compared to a hypothesized: µ0 (e.g. national standard)
We are testing if the estimated population mean (based on data
from the sample) is significantly higher/lower than the µ0
Student's t-distribution
William Sealy Gosset
(published under the
pseudonym „Student”)
For estimating the mean of a normally
distributed population in situations
where the sample size is small and
population standard deviation is
unknown.
The t-distribution is different for each sample size,
and the larger the sample, the more the
distribution resembles a normal distribution.
The normal distribution describes the full population;
t-distributions, on the other hand, describe samples
drawn from a full population.
https://onlinecourses.science.psu.edu/stat414/node/175
The one-sample t-test for testing a
hypothesis about the population mean
sample → t-test: $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
• t ≤ −t0.05: region of rejection, H1: µ < µ0
• |t| < t0.05: region of retention, H0. In this case we cannot say anything certain about the estimated population mean!
• t ≥ +t0.05: region of rejection, H2: µ > µ0
Criteria: quantitative,
normally distributed
variables, σ is not known
(the SD of the sample, s,
is used instead),
µ0 is hypothesized.
The degrees of freedom
[Figure: the one-sample t-test decision diagram, with the critical values t-0.05 and t+0.05 marked “?”]
For each problem there
is a different critical
value based on
df = n – 1 in the
statistical tables of the
Student’s t Distribution
Eg.
Practice task:
Teachers think that the students in their
High School have an outstanding IQ, based
on their sample: XIQ = (126, 139, 89, 106).
What do you think?
• Students show “well above” average IQ
based on a sample of four: 115 points, the
standard error of the mean is 11 points.
IS THE AVERAGE IQ SIGNIFICANTLY
HIGHER THAN 100 IN THIS SCHOOL?
Eg.
SPSS analysis:
• Students seem „well above” average IQ
based on a sample of four: 115 points,
however, based on the one sample t-test
they are NOT significantly different
from the 100 point average
t(3)=1.363 (p > 0.1).
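The reported result can be checked by hand with the one-sample t formula. This is a sketch in plain Python rather than the SPSS output; the variable names are assumptions, and the critical value 3.182 is the two-sided t0.05 for df = 3 from a standard Student's t table.

```python
import math
import statistics

iq_sample = [126, 139, 89, 106]
mu_0 = 100  # the hypothesized population mean

n = len(iq_sample)
mean = statistics.fmean(iq_sample)
s = statistics.stdev(iq_sample)  # sample SD, divisor n - 1
standard_error = s / math.sqrt(n)
t = (mean - mu_0) / standard_error
df = n - 1

t_crit = 3.182  # two-sided t0.05 for df = 3 (standard t table)
significant = abs(t) >= t_crit

print(mean)                      # 115.0
print(round(standard_error, 1))  # 11.0
print(round(t, 3))               # 1.363
print(df, significant)           # 3 False
```

Because |t| = 1.363 is below the critical value for df = 3, H0 is retained, which matches the conclusion above.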