Department of Psychology, Illinois State University

Statistics
Psych 231: Research Methods in Psychology


Quiz 10 (chapter 7) is due on Nov. 13th at midnight.

Journal Summary 2 assignment
• Due in class NEXT week (Wednesday, Nov. 18th) <- moved due date

Group projects
• Plan to have your analyses done before Thanksgiving break; GAs will be available during lab times to help
• Poster sessions are the last lab sections of the semester (last week of classes), so start thinking about your posters. I will lecture about poster presentations on the Monday before Thanksgiving break.
Reminders

2 General kinds of Statistics

Descriptive statistics
• Used to describe, simplify, & organize data sets
• Describing distributions of scores

Inferential statistics
• Used to test claims about the population, based on data gathered from samples
• Takes sampling error into account. Are the results above and beyond what you’d expect by random chance?

[Figure: inferential statistics are used to generalize from the sample back to the population.]
Samples and Populations

Properties: Shape, Center, and Spread (variability)

Shape
• Symmetric v. asymmetric (skew)
• Unimodal v. multimodal

Center
• Where most of the data in the distribution are
• Mean, Median, Mode

Spread (variability)
• How similar/dissimilar are the scores in the distribution?
• Standard deviation (variance), Range
Describing Distributions

Properties: Shape, Center, and Spread (variability)

Visual descriptions - A picture of the distribution is usually helpful

Numerical descriptions of distributions, e.g. a frequency table of ratings:

Rating      f     %
1 (hate)   200   20
2          100   10
3          200   20
4          200   20
5 (love)   300   30
Describing Distributions
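As a quick check on the table, the numerical descriptions can be computed straight from the frequencies. A minimal Python sketch, using the values tabled above:

```python
# Compute the mean of the rating distribution from its frequency table.
# The 1-5 rating values and frequencies come from the table above.
values = [1, 2, 3, 4, 5]
freqs  = [200, 100, 200, 200, 300]

n = sum(freqs)                                    # total number of scores
mean = sum(v * f for v, f in zip(values, freqs)) / n
percents = [100 * f / n for f in freqs]

print(n)         # 1000
print(mean)      # 3.3
print(percents)  # [20.0, 10.0, 20.0, 20.0, 30.0]
```

The percent column is just each frequency divided by the total, matching the table.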

The mean (mathematical average) is the most popular
and most important measure of center.

The formula for the population mean (a parameter):
μ = ΣX / N
(add up all of the X’s, then divide by the total number in the population)

The formula for the sample mean (a statistic):
X̄ = ΣX / n
(add up all of the X’s, then divide by the total number in the sample)
Mean & Standard deviation

The mean (mathematical average) is the most popular
and most important measure of center. Others include
the median and mode.

The standard deviation is the most popular and
important measure of variability.

The standard deviation measures how far off all of the
individuals in the distribution are from a standard, where that
standard is the mean of the distribution.
• Essentially, the average of the deviations.

stdev = √( Σ(X − mean)² / N )
Mean & Standard deviation

Working your way through the formula:

standard deviation = σ = √( Σ(X − μ)² / N )

Step 1: Compute deviation scores
Step 2: Compute the SS
Step 3: Determine the variance
• Take the average of the squared deviations
• Divide the SS by N

Step 4: Determine the standard deviation
• Take the square root of the variance
An Example: Computing Standard Deviation (population)

Main difference for the sample standard deviation:

s = √( Σ(X − X̄)² / (n − 1) )

The n − 1 is used because samples are biased to be less variable
than the population. This “correction factor” will increase the
sample’s SD (making it a better estimate of the population’s SD).

Step 1: Compute deviation scores
Step 2: Compute the SS
Step 3: Determine the variance
• Take the average of the squared deviations
• Divide the SS by n − 1

Step 4: Determine the standard deviation
• Take the square root of the variance
An Example: Computing Standard Deviation (sample)
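The two computations differ only in the divisor (N for the population formula, n − 1 for the sample formula). A minimal Python sketch of the four steps, using made-up scores:

```python
import math

def population_sd(scores):
    """sigma = sqrt(sum((X - mu)^2) / N): divide the SS by N."""
    mu = sum(scores) / len(scores)               # Step 1 uses the mean
    ss = sum((x - mu) ** 2 for x in scores)      # Step 2: the SS
    return math.sqrt(ss / len(scores))           # Steps 3-4

def sample_sd(scores):
    """s = sqrt(sum((X - X_bar)^2) / (n - 1)): the n-1 correction."""
    x_bar = sum(scores) / len(scores)
    ss = sum((x - x_bar) ** 2 for x in scores)
    return math.sqrt(ss / (len(scores) - 1))     # divide the SS by n-1

scores = [2, 4, 4, 4, 5, 5, 7, 9]                # hypothetical data
print(population_sd(scores))   # 2.0
print(sample_sd(scores))       # ~2.14
```

Note the sample SD always comes out at least as large as the population formula would give, which is exactly what the correction factor is for.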


Purpose: To make claims about populations
based on data collected from samples

What’s the big deal?
Example Experiment:
 Group A - gets treatment to improve memory
 Group B - gets no treatment (control)
 After the treatment period, test both groups for memory
 Results:
 Group A’s average memory score is 80%
 Group B’s is 76%

Is the 4% difference a “real” difference
(statistically significant) or is it just sampling
error?

[Figure: Sample A (Treatment), X̄ = 80%; Sample B (No Treatment), X̄ = 76%; both drawn from the population.]
Inferential Statistics





Step 1: State your hypotheses
Step 2: Set your decision criteria
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
Step 5: Make a decision about your null hypothesis


“Reject H0”
“Fail to reject H0”
Testing Hypotheses

Step 1: State your hypotheses

Null hypothesis (H0) <- this is the hypothesis that you are testing
• “There are no differences (effects)”

Alternative hypothesis(es)
• Generally, “not all groups are equal”

You aren’t out to prove the alternative hypothesis
(although it feels like this is what you want to do)

If you reject the null hypothesis, then you’re left
with support for the alternative(s) (NOT proof!)
Testing Hypotheses

Step 1: State your hypotheses

In our memory example experiment:

Null hypothesis, H0: mean of Group A = mean of Group B
Alternative hypothesis, HA: mean of Group A ≠ mean of Group B
 (Or more precisely: Group A > Group B)

It seems like our theory is that the treatment should
improve memory. That’s the alternative hypothesis.
That’s NOT the one we’ll test with inferential statistics.
Instead, we test the H0.
Testing Hypotheses


Step 1: State your hypotheses
Step 2: Set your decision criteria

Your alpha level will be your guide for when to:
• “reject the null hypothesis”
• “fail to reject the null hypothesis”

Either of these could be the correct conclusion or the incorrect conclusion.
• Two different ways to go wrong:
• Type I error: saying that there is a difference when there really
isn’t one (the probability of making this error is the “alpha level”)
• Type II error: saying that there is not a difference when there
really is one
Testing Hypotheses
                      Real world (‘truth’)
                      H0 is correct       H0 is wrong
Reject H0             Type I error (α)    correct decision
Fail to reject H0     correct decision    Type II error (β)
(Rows: experimenter’s conclusions)
Error types
                      Real world (‘truth’)
                      Defendant is innocent   Defendant is guilty
Find guilty           Type I error            correct decision
Find not guilty       correct decision        Type II error
(Rows: jury’s decision)
Error types: Courtroom analogy

Type I error: concluding that there is an effect (a difference
between groups) when there really isn’t.
• Alpha is sometimes called the “significance level”
• We try to minimize this (keep it low): pick a low level of alpha
• Psychology: 0.05 and 0.01 are most common
• For Step 5, we compare the “p-value” of our test to the alpha level to
decide whether to “reject” or “fail to reject” the H0

Type II error: concluding that there isn’t an effect, when there
really is.
• Related to the Statistical Power of a test (power = 1 − β)
• How likely you are to detect a difference if it is there
Error types
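The meaning of alpha can be made concrete with a simulation: when H0 is really true, a test run at alpha = 0.05 should wrongly "reject H0" about 5% of the time. A hypothetical sketch using a simple two-tailed z-test (known sigma = 1, two groups of n = 50; the numbers are illustrative, not from class):

```python
import random
import statistics

# Simulate the Type I error rate: both groups are drawn from the SAME
# population, so H0 is true and every rejection is a Type I error.
random.seed(0)

def one_experiment(n=50):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]   # same population: H0 true
    # z = (mean difference) / (standard error of the difference)
    z = (statistics.mean(a) - statistics.mean(b)) / (2 / n) ** 0.5
    return abs(z) > 1.96                         # True = "reject H0" = Type I error here

trials = 5000
type1_rate = sum(one_experiment() for _ in range(trials)) / trials
print(type1_rate)   # close to alpha = 0.05
```

Raising or lowering the critical value (1.96 here) trades off Type I against Type II errors.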




Step 1: State your hypotheses
Step 2: Set your decision criteria
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
• Descriptive statistics (means, standard deviations, etc.)
• Inferential statistics (t-tests, ANOVAs, etc.)

Step 5: Make a decision about your null hypothesis
• Reject H0: “statistically significant differences”
• Fail to reject H0: “not statistically significant differences”

Make this decision by comparing your test’s “p-value”
against the alpha level that you picked in Step 2.
Testing Hypotheses

Consider the results of our class experiment:
• Main effect of cell phone ✓
• Main effect of site type ✓
• An interaction between cell phone and site type

Factorial designs
Resource: Dr. Kahn’s reporting stats page

“Statistically significant differences”

When you “reject your null hypothesis”
• Essentially this means that the observed difference is
above what you’d expect by chance
• “Chance” is determined by estimating how much
sampling error there is
• Factors affecting “chance”
• Sample size
• Population variability
Statistical significance
Sampling error = (population mean − sample mean)

[Figure: a population distribution with n = 1; the sampling error is the distance from the population mean to the single sampled score x.]

[Figure: the same population with n = 2; the sample mean of the two scores falls closer to the population mean.]

Generally, as the sample size increases, the sampling error decreases.

[Figure: with n = 10, the sampled scores cluster around the population mean and the sample mean falls very close to it.]
Sampling error

Typically, the narrower the population distribution, the
narrower the range of possible samples, and the smaller the
“chance” (sampling error).

[Figure: small population variability produces small sampling error; large population variability produces large sampling error.]

These two factors combine to impact the distribution of
sample means.

The distribution of sample means is a distribution of all possible
sample means of a particular sample size that can be drawn
from the population.

[Figure: samples of size n drawn from the population yield means X̄A, X̄B, X̄C, X̄D, which form the distribution of sample means; its average sampling error is the “chance”.]
Sampling error
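The claim that sampling error shrinks as n grows is easy to demonstrate by simulation. A sketch with a hypothetical population (mean 100, SD 15): draw many samples of each size and average the distance between the population mean and the sample mean.

```python
import random
import statistics

# Average sampling error (|pop mean - sample mean|) for several sample
# sizes, drawn from a hypothetical normal population.
random.seed(1)
POP_MEAN, POP_SD = 100, 15

def avg_sampling_error(n, reps=2000):
    errors = []
    for _ in range(reps):
        sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        errors.append(abs(POP_MEAN - statistics.mean(sample)))
    return statistics.mean(errors)

for n in (1, 2, 10, 100):
    print(n, round(avg_sampling_error(n), 2))
# the average sampling error decreases as the sample size increases
```

With n = 1 the average error is close to the population SD; by n = 100 it is an order of magnitude smaller.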

“A statistically significant difference” means:
• the researcher is concluding that there is a difference above
and beyond chance
• with the probability of making a Type I error at 5% (assuming an
alpha level = 0.05)

Note: “statistical significance” is not the same thing as
theoretical significance.
• Only means that there is a statistical difference
• Doesn’t mean that it is an important difference
Significance

Failing to reject the null hypothesis
• Generally, we are not interested in “accepting the null hypothesis”
(remember: we can’t prove things, only disprove them)
• Usually check to see if you made a Type II error (failed to
detect a difference that is really there)
  • Check the statistical power of your test
  • Sample size may be too small
  • Effects that you’re looking for may be really small
  • Check your controls; maybe there is too much variability
Non-Significance

Example Experiment (from last time):
 Group A - gets treatment to improve memory
 Group B - gets no treatment (control)
 After the treatment period, test both groups for memory
 Results:
 Group A’s average memory score is 80% (X̄A)
 Group B’s is 76% (X̄B)
 Is the 4% difference a “real” difference (statistically
significant) or is it just sampling error?

The hypotheses are about populations:
H0: μA = μB (there is no difference between Grp A and Grp B)

                      Real world (‘truth’)
                      H0 is correct       H0 is wrong
Reject H0             Type I error (α)    correct decision
Fail to reject H0     correct decision    Type II error (β)
From last time

Tests the question:
Are there differences between groups due to a treatment?

Two possibilities in the “real world”:

H0 is true (no treatment effect): there is one population, and the
two sample distributions (X̄A = 80%, X̄B = 76%) differ only by
sampling error.
“Generic” statistical test

Tests the question:
Are there differences between groups due to a treatment?

Two possibilities in the “real world”:

H0 is true (no treatment effect): one population; the sample
means (X̄A = 80%, X̄B = 76%) differ only by sampling error.

H0 is false (there is a treatment effect): two populations.
People who get the treatment change; they form a new
population (the “treatment population”).
“Generic” statistical test
Why might the samples (X̄A vs. X̄B) be different?
(What is the source of the variability between groups?)
 ER: Random sampling error
 ID: Individual differences (if a between-subjects factor)
 TR: The effect of a treatment
“Generic” statistical test
The generic test statistic is a ratio of sources of variability:

                            Observed difference      TR + ID + ER
 Computed test statistic = ----------------------- = --------------
                           Difference from chance       ID + ER

 ER: Random sampling error
 ID: Individual differences (if a between-subjects factor)
 TR: The effect of a treatment
“Generic” statistical test

The distribution of sample means is a distribution of all
possible sample means of a particular sample size that can be
drawn from the population.

[Figure: samples of size n drawn from the population form the distribution of sample means; its average sampling error is the “chance” term.]

The generic test statistic distribution

To reject the H0, you want a computed test statistic that is large
• reflecting a large Treatment Effect (TR)

What’s large enough? The alpha level gives us the decision criterion.

[Figure: the distribution of the test statistic, (TR + ID + ER)/(ID + ER), built from the distribution of sample means; the α-level determines where the decision boundaries go.]
“Generic” statistical test

The generic test statistic distribution

[Figure: the test statistic distribution with the rejection region(s) marked “Reject H0” and the middle region marked “Fail to reject H0”.]

“One tailed test”: sometimes you know to expect a particular
difference (e.g., “improve memory performance”); then the
entire rejection region sits in one tail.
“Generic” statistical test

Things that affect the computed test statistic

Size of the treatment effect
• The bigger the effect, the bigger the computed test statistic

Difference expected by chance (sampling error)
• Sample size
• Variability in the population
“Generic” statistical test


1 factor with two groups
• T-tests
  • Between groups: 2 independent samples
  • Within groups: repeated measures samples (matched, related)

1 factor with more than two groups
• Analysis of Variance (ANOVA) (either between groups or
repeated measures)

Multi-factorial
• Factorial ANOVA
Some inferential statistical tests

Design
• 2 separate experimental conditions
• Degrees of freedom
  • Based on the size of the sample and the kind of t-test

Formula:
       Observed difference         X̄1 − X̄2
 t = ------------------------ = ---------------------------
      Difference by chance       (based on sampling error)

(The computation of the chance term differs for between-
and within-subjects t-tests.)
T-test

Reporting your results
• The observed difference between conditions
• Kind of t-test
• Computed t-statistic
• Degrees of freedom for the test
• The “p-value” of the test

“The mean of the treatment group was 12 points higher than the
control group. An independent samples t-test yielded a significant
difference, t(24) = 5.67, p < 0.05.”

“The mean score of the post-test was 12 points higher than the
pre-test. A repeated measures t-test demonstrated that this
difference was significant, t(12) = 5.67, p < 0.05.”
T-test
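A sketch of the computation behind reports like these, for the independent-samples case. The memory scores below are hypothetical (not the class data); the standard error uses the usual pooled-variance formula.

```python
import math
import statistics

# Independent-samples t: t = (X1_bar - X2_bar) / (difference expected by
# chance), where the chance term is the standard error of the difference
# based on the pooled variance.
def independent_t(group1, group2):
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance uses the n-1 correction, so (n-1)*variance = SS
    sp2 = ((n1 - 1) * statistics.variance(group1) +
           (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))      # difference expected by chance
    df = n1 + n2 - 2                             # degrees of freedom
    return (m1 - m2) / se, df

treatment = [82, 79, 88, 74, 81, 85]             # hypothetical memory scores
control   = [71, 75, 78, 69, 74, 77]

t, df = independent_t(treatment, control)
print(f"t({df}) = {t:.2f}")                      # t(10) = 3.08
```

The resulting t and df are then compared against the t distribution to get the p-value reported in the write-up.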

Designs
• More than two groups (X̄A, X̄B, X̄C)
  • 1 Factor ANOVA, Factorial ANOVA
  • Both within- and between-groups factors

Test statistic is an F-ratio

Degrees of freedom
• Several to keep track of
• The number of them depends on the design
Analysis of Variance
More than two groups (X̄A, X̄B, X̄C)
• Now we can’t just compute a simple difference score, since
there is more than one difference
• So we use variance instead of simply the difference
  • Variance is essentially an average difference

            Observed variance
 F-ratio = ----------------------
            Variance from chance
Analysis of Variance
1 Factor, with more than two levels (X̄A, X̄B, X̄C)
• Now we can’t just compute a simple difference score, since
there is more than one difference
  • A - B, B - C, & A - C
1 factor ANOVA
Null hypothesis:
H0: all the groups are equal (XA = XB = XC)
  <- The ANOVA tests this one!!

Alternative hypotheses:
HA: not all the groups are equal
  XA ≠ XB ≠ XC
  XA = XB ≠ XC
  XA ≠ XB = XC
  XA = XC ≠ XB
  <- Do further tests to pick between these

Planned contrasts and post-hoc tests:
- Further tests used to rule out the different alternative
hypotheses, e.g.:
  Test 1: A ≠ B
  Test 2: A ≠ C
  Test 3: B = C
(this pattern of outcomes would pick out XA ≠ XB = XC)
1 factor ANOVA

Reporting your results
• The observed differences
• Kind of test
• Computed F-ratio
• Degrees of freedom for the test
• The “p-value” of the test
• Any post-hoc or planned comparison results

“The mean score of Group A was 12, Group B was 25, and
Group C was 27. A 1-way ANOVA was conducted and the
results yielded a significant difference, F(2,25) = 5.67, p < 0.05.
Post hoc tests revealed that the differences between groups A
and B and A and C were statistically reliable (respectively t(1) =
5.67, p < 0.05 & t(1) = 6.02, p < 0.05). Groups B and C did not
differ significantly from one another.”
1 factor ANOVA
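The F-ratio behind a report like this can be computed by hand. A minimal sketch of a 1-factor between-groups ANOVA with hypothetical scores (group means chosen to echo the 12/25/27 pattern above; the actual class data would differ):

```python
import statistics

# One-way ANOVA: F = between-groups variance (observed) /
# within-groups variance (chance).
def one_way_anova(*groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # SS between: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # SS within: deviations of scores from their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

a = [12, 10, 14, 11, 13]                         # hypothetical scores
b = [25, 27, 23, 26, 24]
c = [27, 29, 26, 28, 25]

f, df1, df2 = one_way_anova(a, b, c)
print(f"F({df1},{df2}) = {f:.2f}")               # F(2,12) = 132.67
```

A significant F only says "not all groups are equal"; the planned contrasts or post-hoc tests above are what pick apart which groups differ.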


We covered much of this in our experimental design lecture
More than one factor



Factors may be within or between
Overall design may be entirely within, entirely between, or mixed
Many F-ratios may be computed


An F-ratio is computed to test the main effect of each factor
An F-ratio is computed to test each of the potential interactions
between the factors
Factorial ANOVAs

Reporting your results

• The observed differences
  • Because there may be a lot of these, you may present them in a
table instead of directly in the text
• Kind of design
  • e.g. “2 x 2 completely between factorial design”
• Computed F-ratios
  • May see separate paragraphs for each factor, and for interactions
• Degrees of freedom for the test
  • Each F-ratio will have its own set of df’s
• The “p-value” of the test
  • May want to just say “all tests were tested with an alpha level of
0.05”
• Any post-hoc or planned comparison results
  • Typically only the theoretically interesting comparisons are
presented
Factorial ANOVAs
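To make "one F-ratio per main effect plus one per interaction" concrete, here is a hand-rolled sketch for a balanced 2 x 2 completely between-subjects design with hypothetical cell data (a real analysis would use a stats package; this only illustrates where each F comes from):

```python
import statistics

# Balanced 2 x 2 between-subjects ANOVA: F-ratios for main effect A,
# main effect B, and the A x B interaction, all over the same error term.
def two_by_two_anova(cells):
    """cells[(a, b)] = list of scores for that condition; a, b in {0, 1}."""
    n = len(next(iter(cells.values())))          # equal cell sizes assumed
    cell_means = {k: statistics.mean(v) for k, v in cells.items()}
    grand = statistics.mean([x for v in cells.values() for x in v])
    a_means = [statistics.mean([cell_means[(a, 0)], cell_means[(a, 1)]])
               for a in (0, 1)]
    b_means = [statistics.mean([cell_means[(0, b)], cell_means[(1, b)]])
               for b in (0, 1)]
    ss_a = 2 * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = 2 * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_means.values())
    ss_ab = ss_cells - ss_a - ss_b               # interaction SS: what's left
    ss_error = sum((x - cell_means[k]) ** 2 for k, v in cells.items() for x in v)
    df_error = 4 * (n - 1)
    ms_error = ss_error / df_error
    # each effect has df = 1 in a 2 x 2 design, so MS_effect = SS_effect
    return {"A": ss_a / ms_error, "B": ss_b / ms_error,
            "AxB": ss_ab / ms_error, "df_error": df_error}

cells = {(0, 0): [3, 4, 5],  (0, 1): [5, 6, 7],     # hypothetical scores
         (1, 0): [6, 7, 8],  (1, 1): [12, 13, 14]}
print(two_by_two_anova(cells))
```

Each returned F would be reported with its own df's, e.g. F(1, 8), exactly as the reporting checklist above describes.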