Week 4
Associational research
 Looks at the relationship between two variables
 Usually continuous variables
 No manipulation of IV
 Correlation coefficient shows relationship between 2
variables
 Regression: equation used to predict outcome value
based on predictor value
 Multiple regression: same, but uses more than 1
predictor
What is a correlation?
 Know that statistical model is:
𝑜𝑢𝑡𝑐𝑜𝑚𝑒𝑖 = 𝑚𝑜𝑑𝑒𝑙 + 𝑒𝑟𝑟𝑜𝑟𝑖
 For correlation, this can be expressed as:
𝑜𝑢𝑡𝑐𝑜𝑚𝑒𝑖 = 𝑏𝑥𝑖 + 𝑒𝑟𝑟𝑜𝑟𝑖
 Simplified: outcome is predicted from predictor
variable and some error
 b = Pearson product-moment correlation, or r
Covariance
 Covariance: extent to which 2 variables covary with
one another
Cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (N − 1)
 Shows how much deviation with one variable is
associated with deviation in the second variable
Covariance example
cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (N − 1)
= [(−0.4)(−3) + (−1.4)(−2) + (−1.4)(−1) + (0.6)(2) + (2.6)(4)] / 4
= (1.2 + 2.8 + 1.4 + 1.2 + 10.4) / 4
= 17 / 4
= 4.25
Covariance
 Positive covariance: As one variable deviates from
mean, other variable deviates in same direction
 Negative covariance: As one variable deviates from
mean, other variable deviates in opposite direction
 Problem with covariance: depends on scales variables
measured on
 Can’t be compared across measures
 Need standardized covariance to compare across
measures
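As a concrete illustration, here is a minimal Python sketch of the covariance calculation; the raw scores are hypothetical, chosen so their deviations from the mean match the worked example above:

```python
import numpy as np

# Hypothetical scores whose deviations from the mean match the worked example
x = np.array([5, 4, 4, 6, 8])      # mean 5.4 -> deviations -0.4, -1.4, -1.4, 0.6, 2.6
y = np.array([8, 9, 10, 13, 15])   # mean 11  -> deviations -3, -2, -1, 2, 4

# Sample covariance: sum of cross-products of deviations, divided by N - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
print(cov_xy)                       # 4.25
print(np.cov(x, y, ddof=1)[0, 1])   # same value from NumPy's covariance matrix
```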
Correlation
 Standardized measure of covariance
r = Cov_xy / (s_x s_y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / [(N − 1) s_x s_y]
 Known as Pearson’s product-moment correlation, r
Correlation example
 From previous table:
r = Cov_xy / (s_x s_y) = 4.25 / (1.67 × 2.92) = .87
Correlation
 Values range from -1 to +1
 +1: perfect positive correlation: as one variable
increases, other increases by proportionate amount
 -1: perfect negative correlation: as one variable
increases, other decreases by proportionate amount
 0: no relationship. As one variable changes, other stays
the same
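A quick sketch that reproduces the r ≈ .87 from the worked example, using the same hypothetical scores as the covariance sketch above:

```python
import numpy as np

x = np.array([5, 4, 4, 6, 8])
y = np.array([8, 9, 10, 13, 15])

# Standardize the covariance by the product of the sample standard deviations
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))
print(round(r, 2))               # 0.87
print(np.corrcoef(x, y)[0, 1])   # same value computed directly
```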
Positive correlation
[Scatterplot: Appreciation of Dimmu Borgir (y-axis) increases with Age (x-axis)]
Negative correlation
[Scatterplot: Appreciation of Dimmu Borgir (y-axis) decreases with Age (x-axis)]
Small correlation
[Scatterplot: Appreciation of Dimmu Borgir vs. Age, showing only a weak relationship]
Correlation significance
 Significance tested using t-statistic
t_r = r√(N − 2) / √(1 − r²)
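A small sketch of that test in Python; the r and N are hypothetical, carried over from the running example:

```python
import math
from scipy import stats

r, N = 0.87, 5                                   # hypothetical values
t = r * math.sqrt(N - 2) / math.sqrt(1 - r**2)   # t-statistic for the correlation
p = 2 * stats.t.sf(abs(t), df=N - 2)             # two-tailed p-value on N - 2 df
print(round(t, 2), round(p, 3))
```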
Correlation and causality
 Correlation DOES NOT imply causality!!!
 Only shows us that 2 variables are related to one
another
 Why correlation doesn’t show causality:
 3rd variable problem: some other variable (not
measured) responsible for observed relationship
 No way to determine directionality: does a cause b, or
does b cause a?
Before running a correlation…
Bivariate correlation in SPSS
Note on pairwise & listwise
deletion
 Pairwise deletion: removes cases from analysis on an
analysis-by-analysis basis
 3 variables: A, B, & C



Correlation matrix between A, B, & C
Case 3 is missing data on variable B, but not on A or C
Case 3 will be excluded from correlation between B & C, and A
& B, but not from correlation between A & C
 Advantage: keep more of your data
 Disadvantage: not all analyses will include the same
cases: can bias results
Note on pairwise & listwise
deletion
 Listwise deletion: removes cases from analysis if they
are missing data on any variable under consideration
 3 variables: A, B, & C



Correlation matrix between A, B, & C
Case 3 is missing data on variable B, but not on A or C
Case 3 will be excluded from correlation between B & C, A &
B, and A & C
 Advantage: less prone to bias
 Disadvantage: don’t get to keep as much data
 Usually a better option than pairwise
Correlation output
Interpreting correlations
 Look at statistical significance
 Also, look at size of correlation:
 +/- .10: small correlation
 +/- .30: medium correlation
 +/- .50: large correlation
Coefficient of determination, R²
 Amount of variance in one variable shared by other
variable
 Example: pretend R2 between cognitive ability and job
performance is .25
 Interpretation: 25% of variance in cognitive ability
shared by variance in job performance
 Slightly incorrect but easier way to think of it: 25% of the
variance in job performance is accounted for by
cognitive ability
Spearman’s correlation coefficient
 Also called Spearman’s rho (ρ)
 Non-parametric
 Based on ranked, not interval or ratio, data
 Good for minimizing effect of outliers and getting
around normality issues
 Ranks data (lowest to highest score)
 Then, uses Pearson’s r formula on ranked data
Kendall’s tau (τ)
 Non-parametric correlation
 Also ranks data
 Better than Spearman’s rho if:
 Small data set
 Large number of tied ranks
 More accurate representation of correlation in
population than Spearman’s rho
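A minimal sketch of both rank-based coefficients using SciPy; the data are hypothetical and include one outlier that would pull Pearson's r around:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 40])   # hypothetical data with an outlier
y = np.array([2, 1, 4, 3, 6, 5, 8, 9])

rho, p_rho = stats.spearmanr(x, y)    # Pearson's r applied to the ranked data
tau, p_tau = stats.kendalltau(x, y)   # preferable with small N or many tied ranks
print(rho, tau)
```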
Point-biserial correlations
 Used when one of the two variables is a truly
dichotomous variable (male/female, dead/alive)
 In SPSS:
 Code one category of dichotomous variable as 0, and the
other as 1
 Run normal Pearson’s r
 Example: point-biserial correlation of .25 between
species (0=cat & 1=dog) and time spent on the couch
 Interpretation: a one unit increase in the category (i.e.,
from cats to dogs) is associated with a .25 unit increase
in time spent on couch
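A minimal sketch of the cats/dogs example with hypothetical data; coding the dichotomy as 0/1 and running Pearson's r gives the same value as SciPy's dedicated point-biserial function:

```python
import numpy as np
from scipy import stats

# Hypothetical data: species coded 0 = cat, 1 = dog; hours per day spent on the couch
species = np.array([0, 0, 0, 0, 1, 1, 1, 1])
couch_hours = np.array([6.0, 5.5, 7.0, 6.5, 7.5, 6.0, 8.0, 7.0])

r_pb, p = stats.pearsonr(species, couch_hours)   # Pearson's r with a 0/1 predictor
print(r_pb, p)
print(stats.pointbiserialr(species, couch_hours).correlation)  # identical value
```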
Biserial correlation
 Used when one variable is a “continuous dichotomy”
 Example: passing exam vs. failing exam
 Knowledge of subject is continuous variable: some people
pass exam with higher grade than others
 Formula to convert point-biserial to biserial: r_b = (r_pb × √(p₁p₂)) / y
 P1 = proportion of cases in category 1
 P2 = proportion of cases in category 2
 y is from z-table: find value roughly equivalent to split between largest and smallest proportion
 See table on p. 887 in book
Biserial correlation
 Example:
 Correlation between time spent studying for medical
boards and outcome of test (pass/fail) was .35. 70% of
test takers passed.
r_b = (.35 × √(.30 × .70)) / .3485 = .46
Partial correlation
 Correlation between two variables when the effect of a
third variable has been held constant
 Controls for effect of third variable on both variables
 Rationale: if third variable correlated (shares variance)
with 2 variables of interest, correlation between these
2 variables won’t be accurate unless effect of 3rd
variable is controlled for
Partial correlation
 Obtain by going to Analyze-correlate-Partial
 Choose variables of interest to correlate
 Choose variable to control
Semi-partial (part) correlations
 Partial correlation: control for effect that 3rd variable
has on both variables
 Semi-partial correlation: control for effect that 3rd
variable has on one variable
 Useful for predicting outcome using combination of
predictors
Calculating effect size
 Can square Pearson’s correlation to get R2: proportion
of variance shared by variables
 Can also square Spearman’s rho to get R2s: proportion
of variance in ranks shared by variables
 Can’t square Kendall’s tau to get proportion of variance
shared by variables
Regression
 Used to predict value of one variable (outcome) from value
of another variable (predictor)
 Linear relationship
Yᵢ = (b₀ + b₁xᵢ) + eᵢ
 Yᵢ = outcome
 b₀ = intercept: value of outcome (Y) when predictor (X) = 0
 b₁ = slope of line: shows direction & strength of relationship
 xᵢ = value of predictor (x)
 eᵢ = deviation of predicted outcome from actual outcome
Regression
 𝑏𝑜 and 𝑏1 are regression coefficients
 Negative 𝑏1 : negative relationship between predictor
and criterion
 Positive 𝑏1 : positive relationship between predictor and
criterion
 Will sometimes see β𝑜 and β1 instead: these are
standardized regression coefficients
 Put values in standard deviation units
Regression
Regression
 Regression example:
 Pretend we have the following regression equation:
 Exam grade (Y) = 45 + 3.5(Hours spent studying) + error
 If we know that someone spends 10 hours studying for the test, what is the best prediction of their exam grade we can make?
 Exam grade = 45 + (3.5*10) = 80
Estimating model
 Difference between actual outcome and outcome
predicted by data
Estimating model
 Total error in model = Σ(observedᵢ − modelᵢ)²
 Called sum of squared residuals (SSR)
 Large SSR: Model not a good fit to data; small = good
fit
 Ordinary least squares (OLS) regression: used to
define model that minimizes sum of squared residuals
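A minimal sketch of what OLS does, using hypothetical studying-hours data (not the equation from the earlier example): the fitted b0 and b1 are the pair that makes the sum of squared residuals as small as possible.

```python
import numpy as np

# Hypothetical data: hours spent studying and exam grades
hours = np.array([2, 5, 8, 10, 12, 15], dtype=float)
grade = np.array([50, 62, 73, 80, 85, 97], dtype=float)

b1, b0 = np.polyfit(hours, grade, deg=1)        # OLS slope and intercept
predicted = b0 + b1 * hours
ss_residual = np.sum((grade - predicted) ** 2)  # sum of squared residuals (SSR)
print(b0, b1, ss_residual)
```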
Estimating model
 Total sum of squares (SST): Total sum of squared
differences between observed data and mean value of
Y
 Model sum of squares (SSM): Improvement in
prediction as result of using regression model rather
than mean
Estimating model
 Proportion of improvement due to use of model rather
than mean:
R² = SS_M / SS_T
 Also is indicator of variance shared by predictor and outcome
 F-ratio: statistical test for determining whether model describes data significantly better than mean
F = MS_M / MS_R
Individual predictors
 b should be significantly different from 0
 0 would indicate that for every 1 unit change in x, y
wouldn’t change
 Can test difference between b and null hypothesis (b =
0) using t-test
t = b_observed / SE_b
Week 5
Outliers in regression
 Outlier can affect regression coefficient
Outliers in regression
 Residual: difference between actual value of outcome
and predicted value of outcome
 Large residuals: poorly-fitting regression model
 Small residuals: regression model good fit
 Unstandardized residual: difference between actual
and predicted outcome value, measured in same units
as outcome
 Standardized residual: Residuals converted to z-scores
 Studentized residual: unstandardized residual divided
by estimate of standard deviation
Influential cases
 Influential case: value that strongly influences
regression model parameter estimates
 Cook’s distance: measure of overall influence of case
on the model
 Values larger than 1 = problem
 Leverage: shows influence of observed value of
outcome variable over predicted values of outcome
 Average leverage = (k + 1)/n, where k is number of
predictors and n is sample size
 Problematic values: (3(k + 1)/n)
Influential cases
 DFBETA: compares regression coefficient when case is
excluded from the model to regression coefficient
when case is included in the model
 Problematic if standardized values larger than 2/√n
 Mahalanobis distance: measures distance of case from
mean of predictor variable(s)
 Chi square distribution with degrees of freedom equal to
number of predictors
 Significant value = problem
Independent errors
 Durbin-Watson test: tests whether adjacent residuals
are correlated
 Value of 2: residuals uncorrelated
 Value larger than 2: negative correlation between
residuals
 Value smaller than 2: positive correlation between
residuals
 Values greater than 3 or less than 1 problematic
Assumptions of linear regression
models
 Additivity and linearity: outcome linearly related to additive combination of predictors
 Independent errors: uncorrelated residuals
 Homoscedasticity: at all levels of predictor, should be equal variance of residuals
 Normally distributed errors (residuals)
 Predictors uncorrelated with external variables: external variables = variables not included in model that influence outcome variable
Assumptions of linear regression
models
 Predictors must be quantitative, or categorical with
only 2 categories
 Can dummy-code variables if more than 2 categories
 Outcomes quantitative and continuous
 No perfect multicollinearity: No perfect linear
relationship between predictor pairs
 Non-zero variance: predictors need to vary
Multiple regression
 Incorporates multiple predictors into regression model
 Predictors should be chosen based on theory/previous
research
 Not useful to chuck lots of random predictors into
model to see what happens
Yᵢ = b₀ + b₁X₁ + b₂X₂ + ⋯ + bₙXₙ + εᵢ
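The slides run these models in SPSS; as a rough equivalent outside SPSS, here is a minimal statsmodels sketch with made-up data and predictor names:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: an outcome predicted from two made-up predictors
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 2 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2]))  # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.params)      # b0, b1, b2
print(model.summary())   # coefficients, t-tests, R-squared, F-test
```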
Semi-partial correlation
 Foundation of multiple regression
 Measures relationship between predictor and
outcome, controlling for relationship between that
predictor and other predictors in the model
 Shows unique contribution of predictor in explaining
variance in outcome
Reasons for multiple regression
 Want to explain greater amount of variance in
outcome
 “What factors influence adolescent drug use? Can we
predict it better?”
 Want to look at set of predictors in relation to outcome
 Very useful: human behavior rarely determined by just
one thing
 “How much do recruiter characteristics and procedural
justice predict job satisfaction once hired?”
Reasons for multiple regression
 Want to see if adding another predictor (or set of
predictors) will improve prediction above and beyond
known set of predictors
 “Will adding a job knowledge test to current battery of
selection tests improve prediction of job performance?”
 Want to see if predictor(s) significantly related to
outcome after controlling for effect of other predictors
 “Is need for cognition related to educational attainment,
after controlling for socioeconomic status?”
Entering predictors
 Hierarchical regression
 Known predictors entered into model first
 New/untested predictors added into models next
 Good for assessing incremental validity
 Forced entry
 All predictors forced into model at same time
 Stepwise
 DON’T USE IT!
 Adds predictors based upon amount of variance explained
 Atheoretical & capitalizes on error/chance variation
Multicollinearity
 Perfect collinearity: one predictor has perfect
correlation with another predictor
 Can’t get unique estimates of regression coefficients:
both variables share same variance
 Lower levels of multicollinearity common
Multicollinearity
 Problems with multicollinearity:
 Untrustworthy bs due to increase in standard error: coefficients become more variable across samples
 Limits R: If two variables highly correlated, they share a lot of variance. Each will then account for very little unique variance in the outcome
 Adding predictor to model that's correlated strongly with existing predictor won't increase R by much, even if on its own it's strongly related to outcome
 Can't determine importance of predictors: since variance shared between predictors, which accounts for more variance in outcome?
Multicollinearity
 Example: You’re trying to predict social anxiety using
emotional intelligence and number of friends as
predictors
 What if emotional intelligence and number of friends
are related?
Multicollinearity
[Venn diagram: emotional intelligence and number of friends overlap with each other and with social anxiety; both predictors explain the same portion of variance in the outcome]
Multicollinearity
 Could have high R accompanied by very small bs
 Variance inflation factor (VIF): evaluates linear
relationship between predictor and other predictors
 Largest VIF greater than 10: problem
 Average VIF greater than 1: problem

Calculate this by adding up VIF values across predictors, and
then dividing by number of predictors
 Tolerance: reciprocal of VIF (1/VIF)
 Below .10: major problem
 Below .20: potential problem
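A minimal sketch of checking VIF and tolerance outside SPSS, using hypothetical predictors where x2 is deliberately built to overlap with x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors; x2 is constructed to be highly correlated with x1
rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=200)
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF for each predictor (the constant is skipped); tolerance is just 1/VIF
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
print(vifs)
print("average VIF:", sum(vifs.values()) / len(vifs))
print("tolerances:", {col: 1 / v for col, v in vifs.items()})
```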
Multicollinearity
 Many psychological variables are slightly correlated
 Likely to run into big multicollinearity problems if you
include 2 predictors measuring the same, or very
similar, constructs
 Examples:




Cognitive ability and problem-solving
2 different conscientiousness measures
Job knowledge and a situational interview
Scores on 2 different anxiety measures
Homoscedasticity
 Can plot zpred (standardized predicted values of DV
based on model) against zresid (standardized
residuals)
Homoscedasticity
 Should look like a random scatter of values
Multiple regression in SPSS
Multiple regression in SPSS
Multiple regression in SPSS
Regression output
 R: Correlation between actual outcome values, and
values predicted by regression model
 R2: Proportion of variance in outcome predicted by
model
 Adjusted R2: estimate of value in population (adjusted
for shrinkage that tends to occur in cross-validated
model due to sampling error)
Regression output
 F-test: compares variance explained by model to
variance unaccounted for by model (error)
 Shows whether predictions based on model are more
accurate than predictions made using mean
Regression output
 Beta (b) values: change in outcome associated with a
one-unit change in a predictor
 Standardized beta (β) values: beta values expressed in standard deviation units
Practice time!
 The following tables show the results of a regression
model predicting Excel training performance using 5
variables: self-efficacy (Setotal), Excel use (Rexceluse),
Excel formula use (Rformulause), cognitive ability
(WPTQ), and task-switching IAT score (TSA_score)
Interpret this…
And this…
And finally this
Week 6 and 7
Categorical variables
 When categorical variable has 2 categories
(male/female, dead/alive, employed/not employed),
can put it directly into regression
 When categorical variable has more than 2 categories
(freshman/sophomore/junior/senior, entry level/first
line supervisor/manager), can’t input it directly into
regression model
 Have to dummy code categorical variable
Categorical variables
 Dummy variables: represent group membership using
zeroes and ones
 Have to create a series of new variables
 Number of variables=number of categories - 1
 Example: freshman/sophomore/junior/senior
Categorical variables
 Eight steps in creating and using dummy coded
variables in regression:
 1. Count number of groups in variable and subtract 1
 2. Create as many new variables as needed based on step
1
 3. Choose one of groups as baseline to compare all other
groups against

Usually this will be the control group or the majority group
 4. Assign values of 0 to all members of baseline group
for all dummy variables
Categorical variables
 5. For first dummy variable, assign 1 to members of the
first group that you want to compare against baseline
group. Members of all other groups get a 0.
 6. For second dummy variable, assign 1 to all members of
second group you want to compare against baseline
group. Members of all other groups get a 0.
 7. Repeat this for all dummy variables.
 8. When running regression, put all dummy variables in
same block
Categorical variables
 Example: One variable with 4 categories: Freshman,
sophomore, junior, senior.
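The slides build these variables by hand in SPSS; as an illustration, here is a minimal pandas sketch with hypothetical students, treating freshman as the baseline group (so it gets no dummy column):

```python
import pandas as pd

# Hypothetical class standing for eight students
df = pd.DataFrame({"standing": ["freshman", "sophomore", "junior", "senior",
                                "freshman", "junior", "senior", "sophomore"]})

# 4 categories -> 4 - 1 = 3 dummy variables; freshman (baseline) is all zeros
dummies = pd.get_dummies(df["standing"], prefix="standing", dtype=int)
dummies = dummies.drop(columns="standing_freshman")
print(pd.concat([df, dummies], axis=1))
```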
Categorical variables
Categorical variables
Categorical variables
Categorical variables
Categorical variables
 Each dummy variable is included in the regression
output
 Regression coefficient for each dummy variable shows
change in outcome that results when moving from
baseline (0) to category being compared (1): difference
in outcome between baseline group and other group
 Example: Compared to freshmen, seniors’ attitudes
towards college scores are 1.94 points higher
 Significant t-value: group coded as 1 for that dummy
variable significantly different on outcome than
baseline group
Moderation
 Relationship between 2 variables depends on the level
of a third variable
 Interaction between predictors in model
Moderation
 Many research questions deal with moderation!
 Example: In I/O psychology, moderation important for
evaluating predictive invariance

Does the relationship between a selection measure and job
performance vary depending on demographic group (Male vs.
female, White vs. Black, etc.)?
 Example: In clinical/counseling, moderation important
for evaluating risk for mental illness

Does the relationship between exposure to a stressful
situation and subsequent mental illness diagnosis vary
depending on the individual’s social support network?
Moderation
Moderation
 𝑌𝑖 = 𝑏𝑜 + 𝑏1 𝐴𝑖 + 𝑏2 𝐵𝑖 + 𝑏3 𝐴𝐵𝑖 + 𝑒𝑖
 Basic regression equation with minor change: 𝐴𝐵𝑖
 Outcome depends on
 Intercept (𝑏𝑜 )
 Score on variable A (𝑏1 𝐴𝑖 ), and relationship between
variable A and Y
 Score on variable B (𝑏2 𝐵𝑖 ), and relationship between
variable B and Y
 Interaction (multiplication) between scores on variables
A and B (𝑏3 𝐴𝐵𝑖 ), and relationship between AB and Y
Moderation
 Moderator variables can be either categorical (low
conscientiousness/high conscientiousness; male vs.
female, etc.) or continuous (conscientiousness scores
from 1-7)
 Categorical: can visualize interaction as two different
regression lines, one for each group, which vary in
slope (and possibly in intercept)
Moderation
Moderation
 Continuous moderator: visualize in 3-dimensional
space: more complex relationship between moderator
and predictor variable
 Slope of one predictor changes as values of moderator
change
 Pick a few values of moderator and generate graphs for
easier interpretation
Moderation
 Prior to analysis, need to grand-mean center predictors
 Doing so makes interactions easier to interpret (this is why we center):
 Regression coefficients show relationship between predictor and criterion when other predictor equals 0
 Not all variables have meaningful 0 in context of study: age, intelligence, etc.
 Could end up trying to interpret effects based on non-existing score (such as the level of job performance for a person with an intelligence score of 0)
 Once interactions are factored in, interpretation becomes increasingly problematic
 Also reduces nonessential multicollinearity (i.e., correlations due to the way that the variables were scaled)
Moderation
 Grand mean centering: subtract mean of variable from
all scores on that variable
 Centered variables used to calculate interaction term

Creates interaction variable
 Don’t center categorical predictors

Just make sure it is scaled 0 and 1
 Don’t center outcome/dependent variable

Centering only applies to predictors
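A minimal sketch of grand-mean centering and building the interaction term; the variable names echo the simple-slopes example coming up, but the numbers are hypothetical:

```python
import pandas as pd

# Hypothetical data: video game hours, callous-unemotional traits, aggression
df = pd.DataFrame({"video":      [1, 3, 5, 7, 9, 11],
                   "callous":    [10, 14, 12, 20, 18, 25],
                   "aggression": [30, 34, 33, 45, 41, 52]})

# Grand-mean center each continuous predictor (never the outcome)
df["video_c"] = df["video"] - df["video"].mean()
df["callous_c"] = df["callous"] - df["callous"].mean()

# The interaction term is the product of the centered predictors
df["video_x_callous"] = df["video_c"] * df["callous_c"]
print(df)
```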
Moderation
 For centered variable, value of 0 represents the mean
value on the predictor
 Since transformation is linear, doesn’t change regression
model substantially
 Interpretation of regression coefficients easier


Without centering: interaction = how outcome changes with
one-unit increase in moderator when predictor = 0
With centering: interaction = how outcome changes with
one-unit increase in moderator when predictor = mean
Grand mean centering
Moderation
 Steps for moderation in SPSS:
 1. Grand-mean center continuous predictor(s)
 2. Enter both predictor variables into first block
 3. Enter interaction term in second block

Doing it this way makes it easier to look at R2 change
 4. Run regression and look at results
 5. If interaction term significant:


Categorical predictor: Line graph between predictor and DV,
with a different line for each category
Continuous predictor: Simple slopes analysis
Simple slopes analysis
 Basic idea: values of outcome (Y) calculated for
different levels of predictor and moderator: low,
medium, and high
 Usually defined as -1 SD, mean, + 1 SD
 Recommend using online calculator for these (can be
done by hand, but it’s a pain)
 http://www.jeremydawson.co.uk/slopes.htm
 http://quantpsy.org/interact/mlr2.htm
Simple slopes analysis
 Example:
 Aggression = 39.97 + (.17*video) + (.76*callous) + (.027*(video*callous))
 For 1 SD below on video games at low levels of callous unemotionality:
39.97 + (.17*−6.9622) + (.76*−9.6177) + (.027*(−6.9622*−9.6177)) = 33.29
 Would do this 8 more times so that you had values of aggression at low, medium, and high levels of callous unemotionality and video game playing (see the sketch below)
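A small sketch of that arithmetic; the coefficients and the −1 SD values are taken from the example above, and the sketch assumes the high values are simply the mirror image of the low values around the centered mean of zero:

```python
# Coefficients from the example regression equation
b0, b_video, b_callous, b_inter = 39.97, 0.17, 0.76, 0.027

# -1 SD, mean, +1 SD for each centered predictor (high assumed symmetric with low)
levels_video = {"low": -6.9622, "mean": 0.0, "high": 6.9622}
levels_callous = {"low": -9.6177, "mean": 0.0, "high": 9.6177}

for v_label, v in levels_video.items():
    for c_label, c in levels_callous.items():
        aggression = b0 + b_video * v + b_callous * c + b_inter * (v * c)
        print(f"video {v_label:>4}, callous {c_label:>4}: {aggression:.2f}")
```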
Simple slopes analysis
Creating interaction term
Entering variables
Entering variables
Output
Simple slopes analysis
[Line graph: dependent variable plotted at low vs. high attractiveness, with separate lines for husbands and wives (low callous condition)]
Research Designs
Comparing Groups
Week 8
Quasi-experimental designs
Quasi-experiments
 No random assignment
 Goal is still to investigate relationship between proposed causal variable and an outcome
 What they have:
 Manipulation of cause to force it to happen before outcome
 Assess covariation of cause and effect
 What they don't have:
 Limited in ability to rule out alternative explanations
 But design features can improve this
One group posttest only
design
X   O1
 Problems:
 No pretest: did anything change?
 No control group: what would have happened if IV not manipulated?
 Doesn't control for threats to internal validity
One group posttest only
design

Example: An organization implemented a new pay-for-performance system, which replaced its previous pay-by-seniority system. A researcher was brought in after this
implementation to administer a job satisfaction survey
One group pretest-posttest
design
O1
X
O2

Adding pretest allows assessment of whether change
occurred

Major threats to internal validity:

Maturation: change of participants due to natural causes

History: change due to historical event (recession, etc.)

Testing: desensitizing participants to the test, using the
same pretest for posttest
One group pretest-posttest
design

Example: An organization wanted to implement a new
pay-for-performance system to replace its pay-by-seniority system. A researcher was brought in to
administer a job satisfaction questionnaire before the
pay system change, and again after the pay system
change
Removed treatment design
O1   X   O2   O3   X   O4

Treatment given, and then removed

4 measurements of DV: 2 pretests, and 2 posttests

If treatment affects DV, DV should go back to its pretreatment level after treatment removed

Unlikely that threat to validity would follow this same
pattern

Problem: assumes that treatment can be removed with no
lingering effects

May not be possible or ethical (i.e., ethical conundrum: taking away schizophrenic patients' medication; possibility conundrum: therapy for depression, benefits would still be experienced)
Removed treatment design

Example: A researcher wanted to evaluate whether
exposure to TV reduced memory capacity. Participants
first completed a memory recall task, then completed
the same task while a TV plays a sitcom in the
background. After a break, participants again complete
the memory task while the TV plays in the background,
then complete it again with the TV turned off.
Repeated treatment design
O1   X   O2   X   O3   X   O4

Treatment introduced, removed, and then re-introduced

Threat to validity would have to follow same schedule of
introduction and removal-very unlikely

Problem: treatment effects may not go away
immediately
Repeated treatment design

Example: A researcher wanted to investigate whether
piped-in classical music decreased employee stress. She
administered a stress survey, and then piped in music.
One week later, stress was measured again. The music
was then removed, and stress was measured again one
week later. The music was then piped in again, and
stress was measured a final time one week later.
Posttest-only with
nonequivalent groups
NR   X   O1
NR        O2

Participants not randomly assigned to groups

One group receives treatment, one does not

DV measured for both groups

Big validity threat: selection
Posttest-only with
nonequivalent groups

Example: An organization wants to implement a policy
against checking email after 6pm in an effort to reduce
work-related stress. The organization assigns their
software development department to implement the
new policy, while the sales department does not
implement the new policy. After 2 months, employees
in both departments complete a work stress scale.
Untreated control group with
pretest and posttest
NR   O1   X   O2
NR   O1        O2

Pretest and posttest data gathered on same
experimental units

Pretest allows for assessment of selection bias

Also allows for examination of attrition
Untreated control group with
pretest and posttest

Example: A community is experimenting with a new
outpatient treatment program for meth addicts. Current
treatment recipients had the option to participate
(experimental group) or not participate (control group).
Current daily use of meth was collected for all
individuals. Those in the experimental group completed
the new program, while those in the control group did
not. Following the program, participants in both groups
were asked to provide estimates of their current daily
use of meth.
Switching replications
NR   O1   X   O2        O3
NR   O1        O2   X   O3

Treatment eventually administered to group that
originally served as control

Problems:

May not be possible to remove treatment from one group

Can lead to compensatory rivalry
Switching replications

Example: An organization implemented a new reward
program to reduce absences. After a month of no
absences, employees were…The manufacturing
organization from the previous scenario removed the
reward program from the Ohio plant, and implemented it
in the Michigan plant. Absences were gathered and
compared 1 month later.
Reversed-treatment control
group
NR   O1   X+   O2
NR   O1   X−   O2

Control group given treatment that should have opposite
effect of that given to treatment group

Rules out many potential validity threats

Problems: may not be feasible (pay/performance, what’s
the opposite?) or ethical
Reversed-treatment control
group

Example: A researcher wanted to investigate the effect
of mood on academic test performance. All participants
took a pre-test of critical reading ability. The treatment
group was put in a setting which stimulated positive
mood (calming music, lavender scent, tasty snacks) while
the control group was put in a setting which stimulated
negative mood (annoying children’s show music, sulfur
scent, no snacks). Participants then completed the
critical reading test again in their respective settings.
Randomized experimental
designs
Randomized experimental
designs

Participants randomly assigned to groups

Random assignment: any procedure that assigns units to
conditions based on chance alone, where each unit has a
nonzero probability of being assigned to any condition

NOT random sampling!

Random sampling concerns how sample obtained

Random assignment concerns how sample assigned to
different experimental conditions
Why random assignment?


Researchers in natural sciences can rigorously control
extraneous variables

People are tricky. Social scientists can’t exert much
control.

Can’t mandate specific level of cognitive ability, exposure
to violent TV in childhood, attitude towards women, etc.
Random assignment to conditions reduces chances that
some unmeasured third variable led to observed
covariation between presumed cause and effect
Why random assignment?

Example: what if you assigned all participants who signed
up in the morning to be in the experimental group for a
memory study, and all those who signed up in the
afternoon to be in the control group?

And those who signed up in the morning had an average age
of 55 and those who signed up in the afternoon had an
average age of 27?

Could difference between experimental and control groups
be attributed to manipulation?
Random assignment

Since participants randomly assigned to conditions,
expectation that groups are equal prior to experimental
manipulations


Any observed difference attributable to experimental
manipulation, not third variable
Doesn’t prevent all threats to validity

Just ensures they’re distributed equally across conditions so
they aren’t confounded with treatment
Random assignment

Doesn’t ensure groups are equal

Just ensures expectation that they are equal

No obvious reason why they should differ

But they still could

Example: By random chance, average age of control group
may be higher than average age of experimental group
Random assignment

Random assignment guarantees equality of groups, on
average, over many experiments

Does not guarantee that any one experiment which uses
random assignment will have equivalent groups

Within any one study, groups likely to differ due to sampling
error

But, if random assignment process was conducted over
infinite number of groups, average of all means for
treatment and control groups would be equal
Random assignment

If groups do differ despite random assignment, those
differences will affect results of study

But, any differences due to chance, not to way in which
individuals assigned to conditions

Confounding variables unlikely to correlate with
treatment condition
Posttest-only control group
design
R   X   O
R        O

Random assignment to conditions (R)

Experimental group given treatment/IV manipulation (X)

Outcome measured for both groups (O)
Posttest-only control group
design

Example:

Participants assigned to control group (no healthy eating
seminar) or treatment group (90 minute healthy eating
seminar)

6 months later, participants given questionnaire assessing
healthy eating habits

Scores on questionnaire compared for control group and
treatment group
Problems with posttest-only
control group design

No pretest

If attrition occurs, can’t see if those who left were any
different than those who completed study

No pretest makes it difficult to assess change on
outcome
Pretest-posttest control group
design
R   P   X   O
R   P        O

Randomly assigned to conditions

Given pretest (P) measuring outcome variable

One group given treatment/IV manipulation

Outcome measured for both groups

Variation: can randomly assign after pretest
Pretest-posttest control group
design

Example:

Randomly assign undergraduate student participants to
control group and treatment group

Give pretest on attitude towards in-state tuition for
undocumented students

Control group watches video about history of higher
education for 20 minutes, while treatment group watches
video explaining challenges faced by undocumented
students in obtaining college degree

Give posttest on attitude towards in-state tuition for
undocumented students
Factorial designs

Have 2 or more independent variables


Naming logic: # of levels in IV1 x # of levels in IV2 x …# of
levels in IV X
3 advantages:

Require fewer participants since each participant receives
treatment related to 2 or more IVs

Treatment combinations can be evaluated

Interactions can be tested
Factorial designs
R   XA1B1   O
R   XA1B2   O
R   XA2B1   O
R   XA2B2   O
 For 2x2 design:

Randomly assign to conditions (there are 4)

Each condition represents 1 of 4 possible IV combinations

Measure outcome
Factorial designs

Example:

2 IVs of interest: room temperature (cool/hot) and noise
level (quiet/noisy)

DV = number of mistakes made in basic math calculations

Randomly assign to 1 of 4 groups:

Quiet/cool

Quiet/hot

Noisy/cool

Noisy/hot

Measure number of mistakes made in math calculations

Compare means across groups using factorial ANOVA
Factorial designs

2 things we can look for with these designs:

Main effects: average effects of IV across treatment levels
of other IV

Did participants do worse in the noisy than quiet conditions?

Did participants do worse in the hot than cool conditions

Main effect can be misleading if there is a moderator
variable

Interaction: Relationship between one IV and DV depends
on level of other IV

Noise level positively related to number of errors made, but
only if room hot
Within-subjects randomized
experimental design
R   Order 1:   Condition 1 (O1)   Condition 2 (O2)
R   Order 2:   Condition 2 (O1)   Condition 1 (O2)

Participants randomly assigned to either order 1 or order 2

Participants in order 1 receive condition 1, then condition 2

Participants in order 2 receive condition 2, then condition 1

Having different orders prevents order effects

Having participants in more than 1 condition reduces error
variance
Within-subjects randomized
experimental design

Example:

Participants randomly assigned to order 1 or order 2

Participants in order 1 reviewed resumes with the
applicant’s picture attached and made hiring
recommendations. They then reviewed resumes without
pictures and made hiring recommendations.

Participants in order 2 reviewed resumes without pictures
and made hiring recommendations. They then reviewed
resumes with the applicant’s picture attached and made
hiring recommendations.
Data analysis
With 2 groups

Need to compare 2 group means to determine if they are
significantly different from one another

If groups independent, use independent samples t-test


If participants in one group are different from the
participants in the other group
If repeated measures design, use repeated measures t-test
With 3 or more groups

Still need to compare group means to determine if they
are significantly different

If only 1 IV, use a one-way ANOVA

If 2 or more IVs, use a factorial ANOVA

If groups are not independent, use repeated measures
ANOVA
Design practice

Research question:


Does answering work-related communication (emails, phone
calls) after normal working hours affect work-life balance?
Design BOTH a randomized experiment AND a quasi-experiment to evaluate your research question

For each design (random and quasi):

Operationalize variables and develop a hypothesis(es)

Name and explain the experimental design as it will be used to
test your hypothesis(es)

Name and explain one threat to internal validity in your design
Week 9
Comparing means
 2 primary ways to evaluate mean differences between
groups:
 t-tests
 ANOVAs
 Which one you use will depend on how many groups
you want to compare, and how many IVs you have
 2 groups, 1 IV, 1 DV: t-test
 3 or more groups, 1 or more IVs, 1 DV: ANOVA


One-way ANOVA if only 1 IV
Factorial ANOVA if 2 or more IVs
t-tests
 Used to compare means on one DV between 2 groups
 Do men and women differ in their levels of job
autonomy?
 Do students who take a class online and students who
take the same class face-to-face have different scores on
the final test?
 Do individuals report higher levels of positive affect in
the morning than they report in the evening?
 Do individuals given a new anti-anxiety medication
report different levels of anxiety than individuals given a
placebo?
t-tests
 2 different options for t-tests:
 Independent samples t-test: individuals in group 1 are
not the same as individuals in group 2

Do self-reported organizational citizenship behaviors differ
between men and women?
 Repeated measures t-test: individuals in group 1 are the
same as individuals in group 2

Do individuals report different levels of job satisfaction when
surveyed on Friday than they do when surveyed on Monday?
A note on creating groups
 Beware of dichotomizing a continuous variable in
order to make 2 groups
 Example: everyone who scored a 50% or below on a test
goes in group 1, and everyone who scored 51% or higher
goes in group 2
 Causes several problems
 People with very similar scores around cut point may
end up in separate groups
 Reduces statistical power
 Increases chances of spurious effects
t-tests and the linear model
 t-test is just linear model with one binary predictor
variable
𝑌𝑖 = 𝑏0 + 𝑏1 𝑥1 + 𝑒𝑖
 Predictor has 2 categories (male/female,
control/experimental)



Dummy variable: 0=baseline group, 1 =
experimental/comparison group
𝑏0 is equal to mean of group coded 0
𝑏1 is equal to difference between group means
Rationale for t-test
 2 sample means collected-need to see how much they
differ
 If samples from same population, expect means to be
roughly equivalent
 Large differences unlikely to occur due to chance
 When we do a t-test, we compare difference between
sample means to difference we would expect if null
hypothesis was true (difference = 0)
Rationale for t-test
 Standard error = gauge of differences between means
likely to occur due to chance alone


Small standard error: expect similar means if both samples
from same population
Large standard error: expect somewhat different means even if
both samples from same population
 t-test evaluates whether observed difference between
means is larger than would be expected, based on
standard error, if samples from same population
Rationale for t-test
 Top half of equation = model
 Bottom half of equation = error
Independent samples t-test
 Use when each sample contains different individuals
 Look at ratio of between-group difference in means to
estimate of total standard error for both groups
 Variance sum law: variance of difference between 2
independent variables = sum of their variances
 Use sample standard deviations to calculate standard
error for each population’s sampling distribution
Independent samples t-test
 Assuming that sample sizes are equal:
t = (X̄₁ − X̄₂) / √(s₁²/N₁ + s₂²/N₂)
 Top half: difference between means
 Bottom half: each sample’s variance divided by its
sample size
Independent samples t-test
 If sample sizes are not equal, need to use pooled
variance, which weights variance for each sample to
account for sample size differences
 Pooled variance:
s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)
Independent samples t-test
 Equation for independent samples t-test with different
sample sizes:
t = (X̄₁ − X̄₂) / √(s_p²/n₁ + s_p²/n₂)
Numerator: differences between groups. Denominator: error.
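A minimal sketch of the pooled-variance version alongside SciPy's built-in test; the skipped-items data here are hypothetical, not the values from the SPSS example later on:

```python
import numpy as np
from scipy import stats

# Hypothetical data: items skipped in unproctored vs. proctored testing
unproctored = np.array([1, 0, 2, 0, 1, 3, 0, 1], dtype=float)
proctored = np.array([0, 0, 1, 0, 0, 1, 0, 0], dtype=float)

t, p = stats.ttest_ind(unproctored, proctored, equal_var=True)  # pooled-variance t-test
print(t, p)

# The same t from the pooled-variance formula
n1, n2 = len(unproctored), len(proctored)
sp2 = ((n1 - 1) * unproctored.var(ddof=1) + (n2 - 1) * proctored.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (unproctored.mean() - proctored.mean()) / np.sqrt(sp2 / n1 + sp2 / n2)
print(t_manual)
```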
Paired samples/repeated measures
t-test
 Use when same people are in both samples
 Average difference between scores at measurement 1
and measurement 2: 𝐷
 Shows systematic variation between measurements
 Difference that we would expect between
measurements if null hypothesis true: 𝜇𝐷
 Since null hypothesis says that difference = 0, this
cancels out
 Measure of error = standard error of differences: s_D/√N
Paired samples/repeated measures
t-test
t = (D̄ − μ_D) / (s_D/√N)
μ_D = 0 and cancels out (what we would expect to see if the null were true)
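And a matching sketch for the repeated measures case, again with hypothetical scores; the manual version works directly from the difference scores:

```python
import numpy as np
from scipy import stats

# Hypothetical data: the same six people measured on Monday and on Friday
monday = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7])
friday = np.array([4.6, 4.0, 5.3, 4.9, 4.1, 5.0])

t, p = stats.ttest_rel(monday, friday)   # paired / repeated measures t-test
print(t, p)

# The same t from the difference scores: mean difference over its standard error
d = monday - friday
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print(t_manual)
```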
Assumptions of t-tests
 Both types of t-tests are parametric and assume
normality of sampling distribution
 For repeated measures, refers to sampling distribution
of differences
 Data on DV have to be measured at interval level
 Can’t be nominal or ordinal
 Independent samples t-test assumes variances of each
population equivalent (homogeneity of variance)
 Also assumes scores in each sample independent of
scores in other sample
Assumptions of t-tests
 Independent samples t-tests will automatically do
Levene’s test for you
 If Levene’s not significant, homogeneity of variance
assumption met: interpret first line of output (equal
variances assumed)
 If Levene’s is significant, homogeneity of variance
assumption not met: interpret second line of output
(equal variances not assumed)
Independent samples t-test
example
 DV = Number of items skipped on ability test
 Group 1: Took test in unproctored setting
 Group 2: took test in proctored setting
Independent samples t-test
example
Independent samples t-test
example
Independent samples t-test
Independent samples t-test
 Need to report effect size
 Can convert to r:
 r = √(7.65² / (7.65² + 1642.492)) = .184
Values taken from Slide 21 Independent samples t-test (proctored v. unproctored)
Independent samples t-test
 More commonly use d:
d = (X̄₁ − X̄₂) / s₂
 d = (.56 − .23)/1.431 = 0.23
Values taken from Slide 21
Independent samples t-test
(proctored v. unproctored)
 Note on d: Book shows d calculation using only 1 sd
 In practice, more common to use pooled standard
deviation
 Interpretation (Cohen, 1988): .20 = small, .50 =
medium, .80 = large

Negative d means that 𝑋2 larger than 𝑋1
Repeated measures t-test example
 DV = Perceptions of procedural justice
 Measurement 1: Participants took one type of Implicit
Association Test (task-switching ability)
 Measurement 2: Participants took traditional cognitive
ability test (WPT-Q)
Repeated measures t-test example
Repeated measures t-test example
Repeated measures t-test example
Repeated measures t-test effect
sizes
 Still need to calculate effect sizes
 Problem with r in repeated measures t-test: tends to
over-estimate effect size
 Better off using d with repeated measures designs:
better estimate of effect size
 Formula for repeated measures d = (D – μD)/S
Comparing the t-tests
 If you have the same people in both groups, ALWAYS
use repeated measures t-test (or you violate one of the
assumptions of the independent t-test)
 Non-independence of errors violates assumptions of
independent samples t-test
 Power is higher in repeated measures t-test

Reduces error variance by quite a bit since same participants
are in both samples
One-way ANOVA
 ANOVA = analysis of variance
 One-way ANOVA allows us to compare means on a
single DV across more than 2 groups
Why we need ANOVA
 Doing multiple t-tests (control vs. group 1, control vs. group 2, etc.) on data inflates the Type I error rate beyond acceptable levels
 Familywise error rate assuming α = .05 for each test: 1 − (.95)ⁿ
 n = number of comparisons being made
 So, with 3 comparisons, overall α = .143
 With 4 comparisons, overall α = .185
ANOVA and the linear model
 Mathematically, ANOVA and regression are the same thing!
 ANOVA output: F-ratio: comparison of systematic to
unsystematic variance
 Same as F ratio in regression: shows improvement in
prediction of outcome gained by using model as compared to
just using mean
 Only difference between ANOVA and regression: predictor
is categorical variable with more than 2 categories
 Exactly the same as using dummy variables in regression
 Linear model with # of predictors equal to number of groups
-1
ANOVA and the linear model
 Intercept (b0) will be equal to the mean of the baseline group (group coded as 0 in all dummy variables)
 Regression coefficient b1 will be equal to the difference
in means between baseline group and group 1
 Regression coefficient b2 will be equal to the difference
in means between baseline group and group 2
F ratio
𝐹=
𝑠𝑦𝑠𝑡𝑒𝑚𝑎𝑡𝑖𝑐 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒
𝑢𝑛𝑠𝑦𝑠𝑡𝑒𝑚𝑎𝑡𝑖𝑐 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 (𝑒𝑟𝑟𝑜𝑟)
 Systematic variance, in ANOVA, is mean differences
between groups
 Null hypothesis: group means are same
 In this case, systematic variance would be small
 Thus, F would be small
ANOVA logic
 Simplest model we can fit to data is grand mean (of DV)
 We try to improve on this prediction by creating a more
complex model
 Parameters include intercept (b0) and one or more regression
coefficients (b1, b2, etc.)
 Bigger regression coefficients = bigger differences between
groups
 If between group differences large, model better fit to data
than grand mean
 If model fit is better than grand mean, then between-group
differences are significant
Total sum of squares (SST)
SS_T = Σ(xᵢ − x̄_grand)²
 This shows the total amount of variation within the
data
 Grand mean on DV subtracted from each observation’s
value on DV
 Total degrees of freedom for SST: N-1
Model sum of squares (SSM)
SS_M = Σ nᵢ(x̄ᵢ − x̄_grand)²
 This shows how much variance the linear model explains
 Calculate difference between mean of each group and
grand mean, square this value (each value), then multiply
it by the number of participants in the group
 Add the values for each group together
 Degrees of freedom: k – 1, where k is number of groups
Residual sum of squares (SSR)
SS_R = Σ(xᵢ − x̄ᵢ)²
 This shows differences in scores that aren’t explained by
model (i.e., aren’t explained by between-group differences)
 Calculated by subtracting the group mean from each score,
squaring this value, and then adding all of the values
together
 Degrees of freedom = N – k, where k = number of groups
and N is overall sample size
Mean squares
 To get a mean square value, divide sum of squares
value by its degrees of freedom
 Mean square model (MSM) = SSM/(k − 1)
 Mean square residual (MSR) = SSR/(N − k)
F ratio
 Calculated using mean square values: F = MSM/MSR
 Degrees of freedom for F: (k-1), (N – k)
 If F is statistically significant, group means differ by
more than they would if null hypothesis were true
 F is omnibus test: only tells you whether group means
differ significantly: there’s a difference somewhere
 Doesn’t tell you which means differ from one another
 Need post-hoc tests to determine this
Post-hoc tests
 Pairwise comparisons to compare all groups to one
another
 All incorporate correction so that Type I error rate is
controlled (at about .05)
 Example: Bonferroni correction (very conservative): use significance level α/n (α usually .05), where n is number of comparisons
 So, if we have 3 groups and we want to keep α at .05
across all comparisons, each comparison will have α =
.017
Post-hoc tests
 Lots of options for post hoc tests in SPSS
 Some notes on the more common ones:
 Least significant difference (LSD): doesn’t control Type I error
very well
 Bonferroni’s and Tukey’s: control Type I error rate, but lack
statistical power (too conservative)
 REGWQ: controls Type I error and has high power, but only
works if sample sizes equal across groups
 Games-Howell: less control of Type I error, but good for
unequal sample sizes and unequal variance across groups
 Dunnett’s T3: good control of Type I error, works if unequal
variance across groups
Assumptions of ANOVA
 Homogeneity of variance: can check with Levene’s test
 If Levene’s significant and homogeneity of variance
assumption violated, need to use corrected F ratio


Brown-Forsyth F
Welch’s F
 Provided group sizes equal, ANOVA works ok if normality
assumption violated somewhat
 If group sizes not equal, ANOVA biased if data non-normal
 Non-parametric alternative to ANOVA: Kruskal-Wallis test
(book covers in detail)
Steps for doing ANOVA
Effect sizes for ANOVA
 R2: SSM/SST
 When applied to ANOVA, value called eta squared, η2
 Somewhat biased because it’s based on sample only:
doesn’t adjust for looking at effect size in population
 SPSS reports partial eta squared, but only for factorial ANOVA: SS_B/(SS_B + SS_E)
 Better effect size measure for ANOVA: omega-squared (ω²; SPSS will not calculate it for you)
ω² = [SS_M − (df_M)(MS_R)] / (SS_T + MS_R)
One-way ANOVA in SPSS
 IV: Counterproductive work behavior (CWB) scale that
varied in its response anchors: control, infrequent, &
frequent
 DV: self-reported CWB
One-way ANOVA in SPSS
One-way ANOVA in SPSS
One-way ANOVA in SPSS
One-way ANOVA in SPSS
One-way ANOVA in SPSS
 Calculating omega-squared:
ω² = (21.49 − (2)(2.29)) / (664.996 + 2.29) = .025
 Suggestions for interpreting ω²:
 .01 = small
 .06 = medium
 .14 = large
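To pull the week together, here is a minimal sketch that runs a one-way ANOVA and computes η² and ω² by hand; the three groups echo the CWB example in spirit, but every number is hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical self-reported CWB under three response-anchor conditions
control = np.array([2.1, 1.8, 2.5, 2.0, 1.9])
infrequent = np.array([1.6, 1.4, 1.8, 1.5, 1.7])
frequent = np.array([2.6, 2.9, 2.4, 2.8, 2.7])
groups = [control, infrequent, frequent]

F, p = stats.f_oneway(*groups)   # omnibus one-way ANOVA
print(F, p)

# Sums of squares by hand, following the formulas above
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_t = np.sum((all_scores - grand_mean) ** 2)
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_r = ss_t - ss_m
df_m, df_r = len(groups) - 1, len(all_scores) - len(groups)
ms_r = ss_r / df_r

eta_sq = ss_m / ss_t                             # R-squared / eta squared
omega_sq = (ss_m - df_m * ms_r) / (ss_t + ms_r)  # omega squared
print(eta_sq, omega_sq)
```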