
Summary from last week


Introducing the free experiment (week 10)
Parametric statistics
T-test

Exercises

 The t-test in SPSS
 Group problem solving on t-tests


Changes to the course plan – new course book uploaded
Changes to provide more time for parametric statistics

Compendium is in the press – should be available early next week
from the book store at KUA

Emilie guest teacher next 2 weeks while I am on paternity leave

For week 12, regression and correlation, 100+ pages in
compendium: No need to read all of it – read the introductions to
each chapter, get the feel for the first simple examples
– multiple regression and –correlation is for future reference

The purpose of this exercise is to give you the chance to design your
own experiment, run it, analyze the results and write a report about your
experience

You can experiment with anything! Pick something you care about,
something you wonder about:
 What is the best way to make popcorn?
 Why do my plants die?
 Is chat program X faster than program Y?
 How long should rice boil for maximum tenderness?
 Does my daughter have a favorite toy?
 Which beer tastes the best?
 Are iron or aluminium pans best for boiling water?
 What is the ideal speed for walking with a coffee-cup?
 Etc. etc.

Example: Designing a 2×2×2 factorial experiment (three
variables, two levels each) to determine the effect of the
variables on the amount of popcorn produced
 Variables: Brand of popcorn (Netto, Irma), size of batch
(100 g, 200 g), popcorn-to-oil ratio (low, high)
Looking into e.g. whether more expensive popcorn is
worth the price in terms of amount produced
 What combination of variables produces the best
result in terms of volume?


Having picked your topic, sit down and define your
hypothesis


Consider: IV and DV? How many levels to the IV?
What is the expected causal relationship?
Directional or non-directional hypothesis? Argument to
support this?


Then design an experiment to test your hypothesis

Choose something that is easy to re-test – you may
experience that the test design needs to be redesigned

Design experiments in your groups - on the website is a list
of 101 examples of student experiment projects as
inspiration

You have 4 weeks to complete the experiment and write a
report about what was done and what was learned,
problems encountered, how to improve the experiment, etc.

Experiments can be the ”home type” or the ”laboratory
type”

As you progress and encounter questions, raise them in the
class where they can be discussed, or contact me/Emilie

The only formal requirement is that you MUST use
statistics in your analysis!
 This should be a piece of cake – just identify the type of
experiment you do, then use the appropriate statistical
test ...

Similar to the Mousepad experiment – but with
more statistics as appropriate
 E.g.: 2 groups with 2 levels of IV? Use t-test
 Etc.




Week 10: Prepare topic
Week 11+12: Run experiment
Week 13: Prepare report and presentation of experiment
Week 14: Present experiment and results (5 minutes)

There is no time set aside for this work in the exercise hours
(there may be some time in exercises but do not plan for it)

No page limit on the report – you are expected to use the
standard report template and add content as appropriate.
Use the textbook + Mousepad experiment reports as a
guide.

1. Sample means vary, and hence differences between sample means
and the population mean vary.

2. Small differences are likely to occur by chance, large differences are
not (but can occasionally do so).

3. Small difference -> retain null hypothesis (difference has occurred by
chance). Large difference –> reject null hypothesis in favour of
experimental hypothesis (difference has not occurred by chance).

4. "Large" is a difference that is likely to occur by chance only 5% of the
time or less (p < .05) - a compromise between Type 1 and Type 2 errors.

5. Directional hypothesis versus non-directional hypothesis.

Different types of statistics

Descriptive statistics: Describing a single
sample and the population it came from
Inferential statistics: To answer research
questions – inference about the world


Parametric statistics = inferential statistical
testing methods

Parametric statistics work on the mean -> All
data must be interval or ratio level data

Parametric tests also make assumptions
about the variance between groups or
conditions

For independent-measures (between groups),
we assume that variance in one condition is the
same as the other: Homogeneity of variance
 The spread of scores in each sample should be roughly
similar
 Tested using Levene´s test

For repeated-measures (within subjects), we
operate with the sphericity assumption,
 Tested using Mauchly´s test
 Basically the same thing: homogeneity of variance

We also assume our data come from a population with a
normal distribution

We can test how closely a distribution resembles the normal
distribution using the Kolmogorov-Smirnov test (the vodka
test) and the Shapiro-Wilk test
 The tests compare the set of scores in the sample to a normally
distributed set of scores with the same mean and standard deviation
 If the test is non-significant (p>0.05) the distribution of the sample is
NOT significantly different from a normal distribution (i.e. we can
treat it as approximately normal)
 If p<0.05, the distribution of the sample is significantly different from
normal (e.g. positively or negatively skewed).

We can run Kolmogorov-Smirnov and Shapiro-Wilk tests in
SPSS

The most important is the Kolmogorov-Smirnov test (K-S test)

SPSS produces an output that includes the test statistic itself
(D), the degrees of freedom (df) (= the sample size) and the
significance value of the test (sig.).

If the significance of the K-S-test is less than .05, the
distribution deviates significantly from the normal

We have run an experiment with two groups (e.g.
control and experiment groups)

We have sample data, and we can use descriptive
statistics to calculate the means, SDs etc.

But how do we find out if the two samples are
significantly different? I.e.:
 Was our experiment a success?
 Did our manipulation of the IV cause a variation in the DV larger
than the random variance?

The simplest experimental design is to have two conditions: an
"experimental" condition in which subjects receive some kind of
treatment, and a "control" condition in which they do not.

We want to compare performance in the two conditions.

We use a t-test to help us to decide whether the difference
between the conditions is "real" or whether it is due merely to
chance fluctuations

The t-test enables us to decide whether the mean of one
condition is really different from the mean of another
condition

We use the t-test in the simplest experimental
condition: 2 groups to compare
 Sample-sample (or sample-population)

The test statistic is called ”t” – it has its own
frequency distribution which varies with sample size

There are two types of t-test
 Independent t-test: 2 groups with different participants
[independent measures design/between-groups]
 Dependent t-test: 2 groups with same participants
[repeated measures design/within-subject]

In both cases, we have one independent
variable
 The thing we manipulate in our experiment, with
two levels (the two different conditions of our
experiment).
▪ Small mouse pad or big mouse pad

We have one dependent variable
 The thing we actually measure.
▪ Task completion time in seconds

1) Differences between extraverts and introverts in
performance on a memory test.
 The independent variable (I.V.) is "personality type", with two levels -
introversion and extraversion - and the dependent variable (D.V.) is
the memory test score
 An independent t-test would be appropriate here

2) The effects of alcohol on reaction-time performance.
 The I.V. is "alcohol consumption", with two levels - drunk and sober -
and the D.V. is reaction-time performance
 A dependent t-test could be used here; each subject's reaction time
could be measured twice, once while they were drunk and once while
they were sober

There are some considerations underlying the t-test which we need to
be aware of to avoid using the test blindly

Understanding how statistical tests operate is important – we need to
know how tests operate in order to use them correctly

Rationale of the t-test:

1) We have two sample means – they differ to some extent


Given two sample means, $\bar{X}_1$ and $\bar{X}_2$, we want to find out if the sample
means come from two populations with the same mean (same
population), or from two populations with different means.

2) Under the null hypothesis, the population means are identical; under the
experimental hypothesis, they differ

[Figure: two panels. Interpretation under the null hypothesis: the samples come from the same population, so the mean of sample 1 and the mean of sample 2 both lie around a single population mean. Interpretation under the experimental hypothesis: the samples come from different populations, so the mean of sample 1 lies around the mean of population 1 and the mean of sample 2 around the mean of population 2.]

3) In the t-test, we compare the difference we have
obtained with the difference we would expect under the
null hypothesis (no difference)

If we find a big difference between the means, we have
either
 1) atypical samples [by random chance, we got two dissimilar
samples]
 2) the samples are from different populations because their means
are different [our experiment had an effect]

The bigger the difference in sample means, the bigger the
chance of the null hypothesis being rejected

4) Because samples can be different by random chance, we
cannot just work with the difference of the means

We need some way of calculating the odds of two samples
being dissimilar by random chance

We can then “compare” our sample means difference with
the chance of this difference occurring

I.e., we need to know the frequency distribution of sample
mean differences

For example, say the difference in our two sample means is
“243”. We need to know how likely a difference of this size is in
our population.

The frequency distribution of the sample means difference
can tell us how likely it is that “243” is the difference between
two sample means – e.g. “X%”

If the chance of the difference occurring is small, there is a
good chance the difference in sample means is significant.




Recall: Sample means from a population will be normally distributed:
-> higher chance of sample means being similar than not
However, sometimes samples do not have similar means:
-> large difference in sample means by chance alone
We need to account for this when figuring out if samples are different!

[Figure: frequency distribution of sample means (M = 8 to M = 12) drawn from a population with μ = 10. Mean of the sample means = 10, SD = 1.22. x-axis: Sample Mean (6 to 14); y-axis: Frequency (0 to 4).]
Sampling distribution of differences between means: A new type of distribution

[Figure: repeated samples are drawn from Population I (mean μ1) and Population II (mean μ2). Each pair of sample means gives a difference $D = \bar{X}_1 - \bar{X}_2$. x-axis: values of $\bar{X}_1 - \bar{X}_2$; y-axis: frequency of D.]
 Note: we want to figure out if Pop. 1 and 2 are the same

I.e. We take all possible sample means and subtract
all possible sample means, and map the distribution

The distribution is of course normally distributed

The SD of this distribution = SE of differences [SE
because we are dealing with the distribution of
sample means – we call it SD
when we have just one sample]
[Figure: frequency distribution of sample means. Mean = 10, SD = 1.22. x-axis: Sample Mean (6 to 14); y-axis: Frequency (0 to 4).]


-> small SE means most pairs of samples
from a population will generally have similar
means (difference between sample means is
small)
-> large SE means that sample means can
deviate a lot from population mean, and
differences between pairs of samples can
be large by chance alone
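
The idea of a sampling distribution of differences can be tried out with a small simulation. This is only a sketch under assumed values: the population SD of 3 and the sample size of 6 are not from the slides (they were picked so that the SE of a single sample mean comes out near the 1.22 shown in the figure), and the seed and number of pairs are arbitrary. Both samples in each pair come from the same population, i.e. the null hypothesis is true.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 10.0, 3.0  # assumed population (SD is not given in the slides)
N = 6                         # assumed size of each sample
PAIRS = 5000                  # number of sample pairs to draw


def sample_mean(n):
    """Mean of one random sample of size n from the population."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))


# Difference between the means of two samples from the SAME population
differences = [sample_mean(N) - sample_mean(N) for _ in range(PAIRS)]

mean_of_differences = statistics.fmean(differences)
se_of_differences = statistics.stdev(differences)  # SD of this distribution = SE
expected_se = POP_SD * (2 / N) ** 0.5              # theory: sigma * sqrt(1/n + 1/n)

print(f"mean of differences: {mean_of_differences:.3f}")
print(f"SE of differences:   {se_of_differences:.3f} (theory: {expected_se:.3f})")
```

Raising `POP_SD` or lowering `N` widens the distribution: large differences between sample means then become common by chance alone.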

The SE of the sample means difference frequency
distribution gives us an estimate of the extent to which we
would expect sample means to be different by chance alone
 A measure of unsystematic variance in our experiment

The t-test is simply the difference between means as a function of
the degree to which those means would differ by chance
alone

Note: If large differences are COMMON in the means of
samples from a population, because the normal distribution
of sample means is flat, the difference between the samples
needs to be correspondingly larger to be significant
$$
t = \frac{(\text{observed difference between sample means}) - (\text{difference between means under the null hypothesis})}{\text{estimate of the standard error (SE) of the difference between the two sample means}}
$$

The denominator is our estimate of the unsystematic variance.

Recall: Two types of t-test
 Independent t-test: 2 groups with different
participants [independent measures
design/between-groups]
 Dependent t-test: 2 groups with same
participants [repeated measures design/within-subject]

The dependent t-test is used when the same
participants are used in both experimental
conditions
Repeated measures experiment. To examine
the effect of variable A on variable B:
1. N subjects are selected from the population
2. The subjects are first given Level 1 of the independent variable A
3. Subjects are measured on dependent variable B
($\bar{X}_1$ and $s_1$ are computed from these data)
4. The same subjects are then given Level 2 of the independent variable A
5. Subjects are measured on dependent variable B
($\bar{X}_2$ and $s_2$ are computed from these data)
6. Statistics ($\bar{D}$ and $S_{\bar{D}}$) are computed and a hypothesis
test is carried out to decide if the difference
between $\bar{X}_1$ and $\bar{X}_2$ is due to sampling variability
or to the effect of A on B

Test your hypothesis
 H0: μ1 – μ2 = 0
 H1: μ1 – μ2 ≠ 0

Experiment on the effects of alcohol on task
performance (time in seconds).

Measure time taken to perform the task for subjects
when drunk, and when (same subjects are) sober.

Null hypothesis: Alcohol has no effect on time
taken: Variation between the drunk sample mean
and the sober sample mean is due to sampling
variation alone.

i.e. The drunk and sober performance times are
samples from the same population.
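Quick reminder: Sampling distribution of differences between means

[Figure: population level 1 of A (with alcohol, mean μ1) and population level 2 of A (without alcohol, mean μ2). Repeated samples give means $\bar{X}_1$ and $\bar{X}_2$ and differences $D = \bar{X}_1 - \bar{X}_2$, whose frequency distribution has mean μD. x-axis: values of $\bar{X}_1 - \bar{X}_2$; y-axis: frequency of D.]
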
Times (in seconds) taken by participants to complete a motor coordination task

Participant   With alcohol            Without alcohol
              (Condition A, Level 1)  (Condition A, Level 2)
 1            12.4                    10.0
 2            15.5                    14.2
 3            17.9                    18.0
 4             9.7                    10.1
 5            19.6                    14.2
 6            16.5                    12.1
 7            15.1                    15.1
 8            16.3                    12.4
 9            13.3                    12.7
10            11.6                    13.1

$$
t_{\text{observed}} = \frac{\bar{D} - \mu_D(\text{hypothesized})}{S_{\bar{D}}}
$$

 $\bar{D}$: the mean difference between scores in our two
samples (should be close to zero if there is no
difference between the two conditions)
 $\mu_D$ (hypothesized): the predicted average
difference between scores in our two samples (usually
zero, since we assume the two samples do not differ)
 $S_{\bar{D}}$: estimated standard error of the mean difference
(a measure of how much the mean difference might
vary from one occasion to the next randomly)

Participant   With alcohol            Without alcohol          Diff.
              (Condition A, Level 1)  (Condition A, Level 2)   (D)
 1            12.4                    10.0                      2.4
 2            15.5                    14.2                      1.3
 3            17.9                    18.0                     -0.1
 4             9.7                    10.1                     -0.4
 5            19.6                    14.2                      5.4
 6            16.5                    12.1                      4.4
 7            15.1                    15.1                      0.0
 8            16.3                    12.4                      3.9
 9            13.3                    12.7                      0.6
10            11.6                    13.1                     -1.5

$\sum D = 16.0$

For an independent t-test
(2 groups of different
subjects), we just take the
difference between the
two sample means
1. Add up the differences:
$\sum D = 16$
2. Find the mean difference:
$\bar{D} = \frac{\sum D}{N} = \frac{16}{10} = 1.6$
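
Steps 1 and 2 can be checked with a few lines of Python. This is a sketch using the times from the table above; the variable names are made up for illustration, and the rounding to one decimal is only there to hide floating-point noise.

```python
# Times in seconds from the table above (same 10 participants in both conditions)
with_alcohol = [12.4, 15.5, 17.9, 9.7, 19.6, 16.5, 15.1, 16.3, 13.3, 11.6]
without_alcohol = [10.0, 14.2, 18.0, 10.1, 14.2, 12.1, 15.1, 12.4, 12.7, 13.1]

# One difference score D per participant
differences = [round(a - b, 1) for a, b in zip(with_alcohol, without_alcohol)]

sum_d = round(sum(differences), 1)  # step 1: add up the differences
mean_d = sum_d / len(differences)   # step 2: mean difference

print(differences)
print("sum of D =", sum_d, " mean of D =", mean_d)
```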

3a. Estimate of the population standard deviation
We need this to calculate the standard error of the
mean differences

Standard deviation of the differences:
$$
S_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N - 1}}
$$

Standard error of the sample mean differences:
$$
S_{\bar{D}} = \frac{S_D}{\sqrt{N}}
$$
Breaking this calculation down in steps:

Participant   With alcohol   Without alcohol   Diff. (D)   $D - \bar{D}$   $(D - \bar{D})^2$
 1            12.4           10.0               2.4          0.8            0.64
 2            15.5           14.2               1.3         -0.3            0.09
 3            17.9           18.0              -0.1         -1.7            2.89
 4             9.7           10.1              -0.4         -2.0            4.00
 5            19.6           14.2               5.4          3.8           14.44
 6            16.5           12.1               4.4          2.8            7.84
 7            15.1           15.1               0.0         -1.6            2.56
 8            16.3           12.4               3.9          2.3            5.29
 9            13.3           12.7               0.6         -1.0            1.00
10            11.6           13.1              -1.5         -3.1            9.61

$\sum D = 16.0$
$\bar{D} = \frac{16}{10} = 1.6$
$\sum (D - \bar{D})^2 = 48.36$


3b. Estimate of the population standard deviation
$$
S_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N - 1}} = \sqrt{\frac{48.36}{9}} = 2.318
$$
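
The step 3 calculation can be sketched as follows, starting from the difference scores in the table. The variable names are made up for illustration; `statistics.stdev` from the Python standard library uses the same N − 1 formula, so it serves as a cross-check.

```python
import math
import statistics

# Difference scores D from the table (with alcohol minus without alcohol)
differences = [2.4, 1.3, -0.1, -0.4, 5.4, 4.4, 0.0, 3.9, 0.6, -1.5]
n = len(differences)
mean_d = sum(differences) / n

deviations = [d - mean_d for d in differences]      # D minus mean of D
squared_deviations = [dev ** 2 for dev in deviations]
sum_of_squares = sum(squared_deviations)            # 48.36 in the slides

sd = math.sqrt(sum_of_squares / (n - 1))            # 2.318 in the slides

print("sum of squared deviations:", round(sum_of_squares, 2))
print("SD of the differences:", round(sd, 3))
print("cross-check with statistics.stdev:", round(statistics.stdev(differences), 3))
```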
4. Estimate of the population standard error (the SE of the
population of differences between means of samples)
Recall: The SE is the SD of sample means
(here it is the standard error of the differences between two
sample means – our difference frequency distribution):
$$
S_{\bar{D}} = \frac{S_D}{\sqrt{N}} = \frac{2.318}{\sqrt{10}} = 0.733
$$
5. Hypothesised difference between the sample means
Our null hypothesis is usually that there is no difference
between the two sample means. (In statistical terms, that
they have come from two identical populations):
$\mu_D$ (hypothesised) = 0
6. Work out t:
$$
t_{\text{observed}} = \frac{1.6 - 0}{0.733} = 2.183
$$
7. "Degrees of freedom" (df) are the number of subjects
minus one: df = N - 1 = 10 - 1 = 9
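
Steps 4 to 7 put together, as a minimal sketch (variable names are made up for illustration). Because the code does not round the intermediate values, the last decimals can differ slightly from the hand calculation.

```python
import math

# Difference scores D from the table (with alcohol minus without alcohol)
differences = [2.4, 1.3, -0.1, -0.4, 5.4, 4.4, 0.0, 3.9, 0.6, -1.5]
n = len(differences)

mean_d = sum(differences) / n
sd = math.sqrt(sum((d - mean_d) ** 2 for d in differences) / (n - 1))

standard_error = sd / math.sqrt(n)   # step 4: SE of the mean difference
hypothesised_difference = 0.0        # step 5: null hypothesis, no difference
t_observed = (mean_d - hypothesised_difference) / standard_error  # step 6
df = n - 1                           # step 7: degrees of freedom

print(f"SE = {standard_error:.3f}, t({df}) = {t_observed:.3f}")
```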
8. Find the critical value of t from a table (at the back of statistics books;
also on the course website).
(a) “Two-tailed test”: If we are
predicting a difference between
Level 1 and 2; find the critical
value of t for a "two-tailed" test.
With df = 9, critical value = 2.26.
TWO-Tailed: t-observed (2.183) is
smaller than t-critical (2.26)
“There is no significant difference
between the times taken to
complete the task with or without
alcohol”
t(9) = 2.183, p > 0.05
(b) “One-tailed test”: If we are
predicting that Level 1 is bigger
than 2, (or 1 is smaller than 2),
find the critical value of t for a
"one-tailed" test. For df = 9, critical
value = 1.83.
ONE-Tailed: t-observed (2.183) is
larger than t-critical (1.83)
“The time taken to complete the
task is significantly longer with
alcohol than without”
t(9) = 2.183, p < 0.05

[Figure: t-distributions for df = 9. Two-tailed: rejection regions of 0.025 in each tail, beyond the lower t-critical value -2.26 and the upper t-critical value 2.26; t-observed (2.183) falls just inside the upper critical value. One-tailed: a single rejection region of 0.05 beyond the t-critical value 1.83; t-observed (2.183) falls beyond it.]
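
The decision rule in step 8 can be written down as a small sketch. The critical values are the table values quoted above for df = 9; the function name `is_significant` is made up for illustration, and the one-tailed branch assumes the predicted direction is "Level 1 larger than Level 2".

```python
T_OBSERVED = 2.183
CRITICAL_TWO_TAILED = 2.26   # table value for df = 9, p = .05, two-tailed
CRITICAL_ONE_TAILED = 1.83   # table value for df = 9, p = .05, one-tailed


def is_significant(t_observed, t_critical, two_tailed):
    """Compare the observed t with the critical t from the table."""
    if two_tailed:
        # A difference in either direction counts
        return abs(t_observed) > t_critical
    # Only a difference in the predicted (positive) direction counts
    return t_observed > t_critical


print("two-tailed:", is_significant(T_OBSERVED, CRITICAL_TWO_TAILED, two_tailed=True))
print("one-tailed:", is_significant(T_OBSERVED, CRITICAL_ONE_TAILED, two_tailed=False))
```

The same t-value leads to different conclusions, which is why the choice between a directional and a non-directional hypothesis has to be made (and argued for) before looking at the data.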

Using SPSS to do a dependent t-test
[Screenshots: data entry; running the repeated measures t-test in SPSS]

Interpreting SPSS output (repeated measures t-test)
 Sample means
 Sample sizes
 Correlation strength (effect size) + significance of correlation
 Mean difference of samples
 SD of the difference between the means
 SE of differences between scores
 Test results

Degrees of freedom is sample size minus 1 in
repeated measures (here: 10 – 1 = 9)

SPSS uses the df to calculate the odds that the t-value could occur by chance
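
How a p-value can be obtained from t and df is sketched below. The slides do not say how SPSS does this internally; the sketch simply integrates the density of the t-distribution numerically (Simpson's rule), and the function names are made up for illustration.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_probability(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the area under the curve
    between 0 and t, computed with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_one_tailed = upper_tail_probability(2.183, 9)
p_two_tailed = 2 * p_one_tailed  # both tails of the symmetric distribution

print(f"one-tailed p = {p_one_tailed:.3f}, two-tailed p = {p_two_tailed:.3f}")
```

For the dependent t-test example this agrees with the table look-up: the two-tailed p is just above .05 and the one-tailed p is below it.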
Independent measures experiment. To
examine the effect of variable A on variable B:
1. N subjects are selected from the population and split into two
groups, n1 and n2 (n1 + n2 = N); random factors determine which
group each subject ends up in
2. Subjects in each group receive identical treatment, except that
different levels of independent variable A are given to each group:
Level 1 to all subjects in Group 1 (n1), Level 2 to all subjects in
Group 2 (n2)
3. Subjects in each group are measured in the same way
on the dependent variable B
4. $\bar{X}_1$ and error term 1 are computed for Group 1;
$\bar{X}_2$ and error term 2 are computed for Group 2
5. Statistics are computed and a hypothesis
test is carried out to decide if the difference
between $\bar{X}_1$ and $\bar{X}_2$ is due to sampling variability
or to the effect of A on B

Test your hypothesis
 H0: μ1 – μ2 = 0
 H1: μ1 – μ2 ≠ 0

Experiment on the effects of alcohol on task performance
(time in seconds).

Measure time taken to perform the task for one set of
subjects when drunk, and a different set of subjects when
sober.

Null hypothesis: Alcohol has no effect on time taken:
variation between the drunk sample mean and the sober
sample mean is due to sampling variation.

i.e. The drunk and sober performance times are samples
from the same population.
Subject group 1: Drunk ($X_1$)      Subject group 2: Sober ($X_2$)
Participant  1    13.0              Participant  1    11.1
Participant  2    16.5              Participant  2    13.5
Participant  3    16.9              Participant  3    11.0
Participant  4    19.7              Participant  4     9.1
Participant  5    17.6              Participant  5    13.3
Participant  6    17.5              Participant  6    11.7
Participant  7    18.1              Participant  7    14.3
Participant  8    17.3              Participant  8    10.8
Participant  9    14.5              Participant  9    12.6
Participant 10    13.3              Participant 10    11.2

$$
t_{\text{observed}} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{\text{hypothesized}}}{\text{estimated } \sigma_{\bar{X}_1 - \bar{X}_2}}
$$

 $\bar{X}_1 - \bar{X}_2$: the difference between
sample means (should
be close to zero if there is
no difference between the
two conditions)
 $(\mu_1 - \mu_2)$ hypothesized: the predicted average
difference between scores
in our two samples (usually
zero, since we assume the
two samples don’t differ)
 estimated $\sigma_{\bar{X}_1 - \bar{X}_2}$: estimated standard error of the difference
between the means (a measure of how much the
difference between means might vary from one
occasion to the next).
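
The independent t-test on the drunk/sober data can be sketched by hand as well. The slides do not spell out how the standard error of the difference is estimated; the sketch assumes the pooled-variance (equal variances) form, which reproduces the mean difference of 4.58 and the standard error of 0.84359 shown in the SPSS output. Variable names are made up for illustration.

```python
import math
import statistics

# Times in seconds: two different groups of 10 participants
drunk = [13.0, 16.5, 16.9, 19.7, 17.6, 17.5, 18.1, 17.3, 14.5, 13.3]
sober = [11.1, 13.5, 11.0, 9.1, 13.3, 11.7, 14.3, 10.8, 12.6, 11.2]

n1, n2 = len(drunk), len(sober)
mean_difference = statistics.fmean(drunk) - statistics.fmean(sober)

# Pooled variance: both sample variances weighted by their degrees of freedom
# (assumption: equal variances in the two populations)
pooled_variance = (
    (n1 - 1) * statistics.variance(drunk) + (n2 - 1) * statistics.variance(sober)
) / (n1 + n2 - 2)

# Estimated standard error of the difference between the two means
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

hypothesised_difference = 0.0  # null hypothesis: no difference
t_observed = (mean_difference - hypothesised_difference) / standard_error
df = n1 + n2 - 2

print(f"difference = {mean_difference:.2f}, SE = {standard_error:.5f}")
print(f"t({df}) = {t_observed:.3f}")
```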

Using SPSS to do an independent t-test
[Screenshots: data entry (note: 2 samples); running the independent measures t-test in SPSS; defining who belongs to which sample]

SPSS output (independent measures t-test)
 t is calculated by dividing the difference in means by the
standard error: 4.58/0.84359
 Sig. is less than .05, so there is a
significant difference between
alcohol/no alcohol on performance
 The left of row 1 shows the result of Levene´s test, which tests the hypothesis that the variance in the two
samples is equal. If Levene´s test is significant at p<0.05, the assumption of
homogeneity of variance in the samples has been violated (this is annoying).
If not, we assume equal variance (use row 1)

Degrees of freedom is the sum of the sample sizes
minus the number of samples (here 10+10-2 = 18)

SPSS uses the df to find out which t-distribution to
use, and thus to calculate the odds that the t-value
could occur by chance
 t-distributions vary by sample size

We use the t-test when, given two sample means,
$\bar{X}_1$ and $\bar{X}_2$, we want to find out if the sample means come from
two populations with the same mean, or from two
populations with different means.


t-statistic = the difference between means
divided by an estimate of the standard error of the frequency
distribution of the difference of means

The t-test is simply the difference between the means of two
samples/groups as a function of the degree to which those
means would differ by chance alone
 The t-test informs us if the difference between the sample means of two conditions is large
enough not to be a chance result




1) When to use a t-test
2) Which t-test to use (dependent,
independent)
3) That the t-test compares the difference of
sample means, as a function of this difference
occurring by random chance
4) How to use SPSS to run a t-test and
interpret the result

Calculating effect sizes

Is the effect we have found substantive or
small?

We can convert the t-statistic into an effect size
["r"]

For our independent t-test example (t=5.429;
df=18)
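
The slides do not show the conversion itself. A commonly used formula is $r = \sqrt{t^2 / (t^2 + df)}$; the sketch below assumes that this is the conversion meant here, and the function name is made up for illustration.

```python
import math


def effect_size_r(t, df):
    """Convert a t-statistic into the effect size r.
    Assumes the common conversion r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


r_independent = effect_size_r(5.429, 18)  # independent t-test example
r_dependent = effect_size_r(2.183, 9)     # dependent t-test example

print(f"independent example: r = {r_independent:.3f}")
print(f"dependent example:   r = {r_dependent:.3f}")
```

With this formula the independent example gives an r of about 0.79.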

See ya in 2 weeks!