Download 25.statistics - Illinois State University Department of Psychology

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Resampling (statistics) wikipedia , lookup

Transcript
Inferential Statistics
Psych 231: Research
Methods in Psychology

Purpose: To make claims about populations
based on data collected from samples


What’s the big deal?
Example Experiment:
 Group A - gets treatment to improve memory
 Group B - gets no treatment (control)


After treatment period test both groups for memory
Results:
 Group A’s average memory score is 80%
 Group B’s is 76%

Is the 4% difference a “real” difference (statistically
significant) or is it just sampling error?
Inferential Statistics

Step 1: State your hypotheses





Null hypothesis
Alternative hypothesis
Step 2: Set your decision criteria


This is the
hypothesis that
you are testing
Alpha level (prob of Type I error)
Reject
H0
Experimenter’s
conclusions
Fail to
Reject
H0
Real world (‘truth’)
H0 is
correct
H0 is
wrong
Type I
error

Type II
error

Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
Step 5: Make a decision about your null hypothesis


Reject H0
Fail to reject H0
“statistically significant differences”
“not statistically significant differences”
Testing Hypotheses

Example Experiment:
 Group A - gets treatment to improve memory
 Group B - gets no treatment (control)
 After treatment period test both groups for memory
 Results:
 Group A’s average memory score is 80%
 Group B’s average memory score is 76%
 Is the 4% difference a “real” difference (statistically
significant) or is it just sampling error?
Two sample
distributions
About populations
H0: μA = μB
H0: there is no difference
between Grp A and Grp B
Real world (‘truth’)
H0 is
correct
Reject
H0
XB
XA
76%
80%
Experimenter’s
conclusions
Fail to
Reject
H0
Testing Hypotheses
H0 is
wrong
Type
I error

Type II
error


Tests the question:

Are there differences between groups
due to a treatment?
Real world (‘truth’)
H0 is
correct
Reject
H0
Experimenter’s
conclusions
Fail to
Reject
H0
Two possibilities in the “real world”
H0 is true (no treatment effect)
One
population
Two sample
distributions
XB
XA
76%
80%
“Generic” statistical test
H0 is
wrong
Type I
error

Type II
error


Tests the question:

Real world (‘truth’)
H0 is
correct
Are there differences between groups
due to a treatment?
Reject
H0
Experimenter’s
conclusions
Fail to
Reject
H0
Two possibilities in the “real world”
H0 is true (no treatment effect)
H0 is
wrong
Type I
error

Type II
error

H0 is false (is a treatment effect)
Two
populations
XB
XA
XB
XA
76%
80%
76%
80%
People who get the treatment change,
they form a new population
(the “treatment population)
“Generic” statistical test
XB




XA
ER: Random sampling error
ID: Individual differences (if between subjects factor)
TR: The effect of a treatment
Why might the samples be different?
(What is the source of the variability between groups)?
“Generic” statistical test
XB




XA
ER: Random sampling error
ID: Individual differences (if between subjects factor)
TR: The effect of a treatment
The generic test statistic - is a ratio of sources of
variability
Computed
Observed difference
TR + ID + ER
=
=
test statistic
Difference from chance
ID + ER
“Generic” statistical test
Observed difference
Difference from chance

“Statistically significant differences”

When you “reject your null hypothesis”
• Essentially this means that the observed difference is
above what you’d expect by chance
• “Chance” is determined by estimating how much
sampling error there is
• Factors affecting “chance” (sampling error)
• Sample size
• Population variability
Estimating Sampling Error
Population mean
Population
Distribution
x
n=1
Sampling error
(Pop mean - sample mean)
Estimating Sampling Error: Sample size
Population mean
Population
Distribution
Sample mean
x
n=2
x
Sampling error
(Pop mean - sample mean)
Estimating Sampling Error: Sample size
 Generally,
as the
sample
Population
mean
size increases, the sampling
error decreases
Sample mean
Population
Distribution
x
x
n = 10
x
x
x x
x
x xx
Sampling error
(Pop mean - sample mean)
Estimating Sampling Error: Sample size

Typically the narrower the population distribution, the
narrower the range of possible samples, and the smaller the
“chance”
Small population variability
Large population variability
Estimating Sampling Error: Pop. Variance

These two factors combine to impact the distribution of
sample means.

The distribution of sample means is a distribution of all possible
sample means of a particular sample size that can be drawn
from the population
Population
Distribution of
sample means
Samples
of size = n
XA XB XC XD
“chance”
Avg. Sampling
error
Estimating Sampling Error: Standard Error

The generic test statistic distribution

A transformation of the distribution of sample means into a standardized
distribution (e.g., z-distribution, t-distribution, F-distribution)
Observed difference
Difference from chance
Test
statistic
Distribution of
sample means
“Generic” statistical test
Distribution of
the test statistic

Step 1: State your hypotheses



Step 2: Set your decision criteria




Null hypothesis
Alternative hypothesis
Alpha level (prob of Type I error)
Step 3: Collect your data from your sample(s)
Step 4: Compute your test statistics
Step 5: Make a decision about your null hypothesis


Reject H0
Fail to reject H0
“statistically significant differences”
“not statistically significant differences”
Testing Hypotheses

The generic test statistic distribution

To reject the H0, you want a computed test statistics that is large
• reflecting a large Treatment Effect (TR)

What’s large enough to reject H0? The alpha level gives us the decision criterion
TR + ID + ER
ID + ER
Distribution of
the test statistic
Test
statistic
Distribution of
sample means
α-level determines where
these boundaries go
“Generic” statistical test

The generic test statistic distribution

To reject the H0, you want a computed test statistics that is large
• reflecting a large Treatment Effect (TR)

What’s large enough to reject H0? The alpha level gives us the decision criterion
Distribution of
the test statistic
Reject H0
Fail to reject H0
“Generic” statistical test

The generic test statistic distribution

To reject the H0, you want a computed test statistics that is large
• reflecting a large Treatment Effect (TR)

What’s large enough to reject H0? The alpha level gives us the decision criterion
Distribution of
the test statistic
Reject H0
“One tailed test”: sometimes you
know to expect a particular
difference (e.g., “improve memory
performance”)
Fail to reject H0
“Generic” statistical test
Computed
Observed difference
TR + ID + ER
=
=
test statistic
Difference from chance
ID + ER

Things that affect the computed test statistic

• Sample size
• Variability in the population
R

NR
exp
Difference expected by chance (sample error)
Size of the treatment effect
• The bigger the effect, the bigger the computed test
statistic
“Generic” statistical test

“A statistically significant difference” means:



the researcher is concluding that there is a difference above
and beyond chance
with the probability of making a type I error at 5% (assuming an
alpha level = 0.05)
Note “statistical significance” is not the same thing as
theoretical significance.


Only means that there is a statistical difference
Doesn’t mean that it is an important difference
Significance

Failing to reject the null hypothesis


Generally, not interested in “accepting the null hypothesis”
(remember we can’t prove things only disprove them)
Usually check to see if you made a Type II error (failed to
detect a difference that is really there)
• Check the statistical power of your test
• Sample size is too small
• Effects that you’re looking for are really small
• Check your controls, maybe too much variability
Non-Significance

1 factor with two groups

T-tests
• Between groups: 2-independent samples
• Within groups: Repeated measures samples (matched, related)

1 factor with more than two groups


Analysis of Variance (ANOVA) (either between groups or
repeated measures)
Multi-factorial

Factorial ANOVA
Some inferential statistical tests

Design


2 separate experimental conditions
Degrees of freedom
• Based on the size of the sample and the kind of t-test

Formula:
Observed difference
T=
X1 - X2
Diff by chance
Computation differs for
between and within t-tests
T-test
Based on sample error

Reporting your results





The observed difference between conditions
Kind of t-test
Computed T-statistic
Degrees of freedom for the test
The “p-value” of the test

“The mean of the treatment group was 12 points higher than the
control group. An independent samples t-test yielded a significant
difference, t(24) = 5.67, p < 0.05.”

“The mean score of the post-test was 12 points higher than the
pre-test. A repeated measures t-test demonstrated that this
difference was significant significant, t(12) = 5.67, p < 0.05.”
T-test

Designs

XA
XB
XC
More than two groups
• 1 Factor ANOVA, Factorial ANOVA
• Both Within and Between Groups Factors


Test statistic is an F-ratio
Degrees of freedom


Several to keep track of
The number of them depends on the design
Analysis of Variance
XA

XB
XC
More than two groups


Now we can’t just compute a simple difference score since
there are more than one difference
So we use variance instead of simply the difference
• Variance is essentially an average difference
Observed variance
F-ratio =
Variance from chance
Analysis of Variance
XA

XB
XC
1 Factor, with more than two levels

Now we can’t just compute a simple difference score since
there are more than one difference
• A - B, B - C, & A - C
1 factor ANOVA
Null hypothesis:
XA
XB
XC
H0: all the groups are equal
XA = XB = XC
Alternative hypotheses
HA: not all the groups are equal
XA ≠ XB ≠ XC
XA = XB ≠ XC
1 factor ANOVA
The ANOVA
tests this one!!
Do further tests to
pick between these
XA ≠ XB = XC
XA = XC ≠ XB
Planned contrasts and post-hoc tests:
- Further tests used to rule out the different
Alternative hypotheses
XA ≠ XB ≠ XC
Test 1: A ≠ B
Test 2: A ≠ C
Test 3: B = C
XA = XB ≠ XC
XA ≠ XB = XC
XA = XC ≠ XB
1 factor ANOVA

Reporting your results
The observed differences
 Kind of test
 Computed F-ratio
 Degrees of freedom for the test
 The “p-value” of the test
 Any post-hoc or planned comparison results
“The mean score of Group A was 12, Group B was 25, and
Group C was 27. A 1-way ANOVA was conducted and the
results yielded a significant difference, F(2,25) = 5.67, p < 0.05.
Post hoc tests revealed that the differences between groups A
and B and A and C were statistically reliable (respectively t(1) =
5.67, p < 0.05 & t(1) = 6.02, p <0.05). Groups B and C did not
differ significantly from one another”


1 factor ANOVA


We covered much of this in our experimental design lecture
More than one factor



Factors may be within or between
Overall design may be entirely within, entirely between, or mixed
Many F-ratios may be computed


An F-ratio is computed to test the main effect of each factor
An F-ratio is computed to test each of the potential interactions
between the factors
Factorial ANOVAs

Reporting your results

The observed differences
• Because there may be a lot of these, may present them in a table
instead of directly in the text

Kind of design
• e.g. “2 x 2 completely between factorial design”

Computed F-ratios
• May see separate paragraphs for each factor, and for interactions

Degrees of freedom for the test
• Each F-ratio will have its own set of df’s

The “p-value” of the test
• May want to just say “all tests were tested with an alpha level of
0.05”

Any post-hoc or planned comparison results
• Typically only the theoretically interesting comparisons are
presented
Factorial ANOVAs

Example: Suppose that you notice that the
more you study for an exam, the better your
score typically is.


This suggests that there is a relationship
between study time and test performance.
We call this relationship a correlation.
Relationships between variables

Properties of a correlation




Form (linear or non-linear)
Direction (positive or negative)
Strength (none, weak, strong, perfect)
To examine this relationship you should:


Make a scatterplot
Compute the Correlation Coefficient
Relationships between variables


Plots one variable against the other
Useful for “seeing” the relationship



Form, Direction, and Strength
Each point corresponds to a different individual
Imagine a line through the data points
Scatterplot
Y
6
Hours
study
Exam
perf.
X
6
1
Y
6
2
5
5
6
2
3
4
1
3
2
Scatterplot
4
3
1
2
3
4
5
6 X



A numerical description of the relationship between two
variables
For relationship between two continuous variables we
use Pearson’s r
It basically tells us how much our two variables vary
together

As X goes up, what does Y typically do
• X, Y
• X, Y
• X, Y
Correlation Coefficient
Linear
Form
Non-linear
Negative
Positive
Y
Y
X
X
• As X goes up, Y goes up
• As X goes up, Y goes down
• X & Y vary in the same
direction
• X & Y vary in opposite
directions
• Positive Pearson’s r
• Negative Pearson’s r
Direction

Zero means “no relationship”.


The farther the r is from zero, the stronger the
relationship
The strength of the relationship

Spread around the line (note the axis scales)
Strength
r = -1.0
“perfect negative corr.”
-1.0
r = 0.0
“no relationship”
r = 1.0
“perfect positive corr.”
0.0
The farther from zero, the stronger the relationship
Strength
+1.0
Rel A
r = -0.8
Rel B
r = 0.5
-.8
-1.0
.5
0.0
Which relationship is stronger?
Rel A, -0.8 is stronger than +0.5
Strength
+1.0

Compute the equation for the line that best
fits the data points
Y
6
5
Y = (X)(slope) + (intercept)
4
3
2
1
0.5
Change in Y
1
2
3
Regression
4
5
6 X
Change in X
2.0
= slope

4.5
Can make specific predictions about Y
based on X
Y
6
5
X=5
Y = (X)(.5) + (2.0)
Y=?
Y = (5)(.5) + (2.0)
Y = 2.5 + 2 = 4.5
4
3
2
1
1
2
3
Regression
4
5
6 X

Also need a measure of error
Y = X(.5) + (2.0) + error
Y = X(.5) + (2.0) + error
• Same line, but different relationships (strength difference)
Y
6
5
Y
6
5
4
3
2
1
4
3
2
1
1
2
3
4
5
Regression
6 X
1
2
3
4
5
6 X



Don’t make causal claims
Don’t extrapolate
Extreme scores (outliers) can strongly
influence the calculated relationship
Cautions with correlation & regression