Week 4

Associational research
- Looks at the relationship between two variables, usually continuous variables.
- No manipulation of the IV.
- Correlation coefficient: shows the relationship between 2 variables.
- Regression: an equation used to predict an outcome value based on a predictor value.
- Multiple regression: same, but uses more than 1 predictor.

What is a correlation?
- The general statistical model is: $\text{outcome}_i = \text{model} + \text{error}_i$
- For correlation, this can be expressed as: $\text{outcome}_i = b x_i + \text{error}_i$
- Simplified: the outcome is predicted from the predictor variable plus some error.
- b is the Pearson product-moment correlation, r.

Covariance
- Covariance: the extent to which 2 variables covary with one another:
  $\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
- Shows how much deviation in one variable is associated with deviation in the second variable.

Covariance example
- $\text{cov}(x, y) = \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4} = \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$
- Positive covariance: as one variable deviates from its mean, the other variable deviates in the same direction.
- Negative covariance: as one variable deviates from its mean, the other variable deviates in the opposite direction.
- Problem with covariance: it depends on the scales the variables are measured on, so it can't be compared across measures. A standardized covariance is needed to compare across measures.

Correlation
- A standardized measure of covariance:
  $r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1) s_x s_y}$
- Known as Pearson's product-moment correlation, r.

Correlation example
- From the previous table: $r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{4.25}{1.67 \times 2.92} = .87$

Correlation values
- Values range from -1 to +1.
- +1: perfect positive correlation; as one variable increases, the other increases by a proportionate amount.
- -1: perfect negative correlation; as one variable increases, the other decreases by a proportionate amount.
- 0: no relationship; as one variable changes, the other stays the same.

[Scatterplots of appreciation of Dimmu Borgir against age, illustrating a positive correlation, a negative correlation, and a small correlation.]

Correlation significance
- Significance is tested using a t-statistic:
  $t_r = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$

Correlation and causality
- Correlation DOES NOT imply causality! It only shows us that 2 variables are related to one another.
- Why correlation doesn't show causality:
  - Third-variable problem: some other variable (not measured) may be responsible for the observed relationship.
  - No way to determine directionality: does a cause b, or does b cause a?
- A code sketch of these covariance and correlation calculations follows.
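To make the arithmetic above concrete, here is a minimal Python sketch that reproduces the covariance, Pearson's r, and the t-test for r. The raw scores are hypothetical values chosen only because their deviations from the mean match the worked example; they are not from the original slides.

```python
import numpy as np
from scipy import stats

# Hypothetical raw scores whose deviations match the worked example above
x = np.array([5, 4, 4, 6, 8])      # deviations: -0.4, -1.4, -1.4, 0.6, 2.6
y = np.array([8, 9, 10, 13, 15])   # deviations: -3, -2, -1, 2, 4

n = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # 4.25
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))                 # ~.87 (ddof=1 gives the N-1 sample SD)

# Significance test: t = r * sqrt(N - 2) / sqrt(1 - r^2)
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p = 2 * stats.t.sf(abs(t), df=n - 2)

print(cov_xy, r, t, p)
print(stats.pearsonr(x, y))   # same r and p directly from SciPy
```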
Before running a correlation…

Bivariate correlation in SPSS (screenshots on the slides).

Note on pairwise & listwise deletion
- Pairwise deletion: removes cases from analysis on an analysis-by-analysis basis.
  - Example: 3 variables (A, B, & C) in a correlation matrix. Case 3 is missing data on variable B, but not on A or C. Case 3 will be excluded from the correlations between B & C and between A & B, but not from the correlation between A & C.
  - Advantage: you keep more of your data.
  - Disadvantage: not all analyses will include the same cases, which can bias results.
- Listwise deletion: removes cases from analysis if they are missing data on any variable under consideration.
  - Same example: Case 3 will be excluded from the correlations between B & C, A & B, and A & C.
  - Advantage: less prone to bias.
  - Disadvantage: you don't get to keep as much data.
  - Usually a better option than pairwise.

Correlation output (SPSS screenshot).

Interpreting correlations
- Look at statistical significance.
- Also look at the size of the correlation:
  - +/- .10: small correlation
  - +/- .30: medium correlation
  - +/- .50: large correlation

Coefficient of determination, R²
- The amount of variance in one variable shared by the other variable.
- Example: pretend R² between cognitive ability and job performance is .25. Interpretation: 25% of the variance in cognitive ability is shared with the variance in job performance.
- Slightly incorrect but easier way to think of it: 25% of the variance in job performance is accounted for by cognitive ability.

Spearman's correlation coefficient
- Also called Spearman's rho (ρ). Non-parametric: based on ranked, not interval or ratio, data.
- Good for minimizing the effect of outliers and getting around normality issues.
- Ranks the data (lowest to highest score), then uses Pearson's r formula on the ranked data.

Kendall's tau (τ)
- A non-parametric correlation that also ranks the data.
- Better than Spearman's rho if the data set is small or there are a large number of tied ranks.
- A more accurate representation of the correlation in the population than Spearman's rho.

Point-biserial correlations
- Used when one of the two variables is a truly dichotomous variable (male/female, dead/alive).
- In SPSS: code one category of the dichotomous variable as 0 and the other as 1, then run a normal Pearson's r.
- Example: a point-biserial correlation of .25 between species (0 = cat, 1 = dog) and time spent on the couch. Interpretation: a one-unit increase in the category (i.e., from cats to dogs) is associated with a .25 unit increase in time spent on the couch.

Biserial correlation
- Used when one variable is a "continuous dichotomy". Example: passing vs. failing an exam. Knowledge of the subject is a continuous variable: some people pass the exam with a higher grade than others.
- Formula to convert a point-biserial correlation to a biserial correlation:
  $r_b = \frac{r_{pb}\sqrt{p_1 p_2}}{y}$
  where p1 = proportion of cases in category 1, p2 = proportion of cases in category 2, and y is from the z-table: the height of the normal curve at the point that splits the distribution into the largest and smallest proportions (see the table on p. 887 in the book).
- Example: the correlation between time spent studying for medical boards and the outcome of the test (pass/fail) was .35, and 70% of test takers passed. A code sketch of this conversion follows; the hand calculation appears after it.
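A minimal sketch of this point-biserial-to-biserial conversion in Python, using SciPy's normal distribution to look up the ordinate y instead of the table in the book. The function name `biserial_from_point_biserial` is just an illustrative choice, not something from the slides or from SciPy.

```python
from scipy import stats

def biserial_from_point_biserial(r_pb, p1, p2):
    """Convert a point-biserial r to a biserial r: r_b = r_pb * sqrt(p1*p2) / y."""
    # z-value that splits the normal distribution into proportions p1 and p2
    z = stats.norm.ppf(max(p1, p2))
    # y = ordinate (height) of the standard normal curve at that split point
    y = stats.norm.pdf(z)
    return r_pb * (p1 * p2) ** 0.5 / y

# Worked example from the slides: r_pb = .35, 70% passed, 30% failed
print(biserial_from_point_biserial(0.35, 0.70, 0.30))  # ~0.46
```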
$r_b = \frac{.35\sqrt{.30 \times .70}}{.3485} = .46$

Partial correlation
- The correlation between two variables when the effect of a third variable has been held constant; it controls for the effect of the third variable on both variables.
- Rationale: if the third variable is correlated with (shares variance with) the 2 variables of interest, the correlation between those 2 variables won't be accurate unless the effect of the 3rd variable is controlled for.
- Obtain by going to Analyze > Correlate > Partial, then choose the variables of interest to correlate and the variable to control.

Semi-partial (part) correlations
- Partial correlation: controls for the effect that the 3rd variable has on both variables.
- Semi-partial correlation: controls for the effect that the 3rd variable has on one variable.
- Useful for predicting an outcome using a combination of predictors.

Calculating effect size
- You can square Pearson's correlation to get R²: the proportion of variance shared by the variables.
- You can also square Spearman's rho to get R²s: the proportion of variance in the ranks shared by the variables.
- You can't square Kendall's tau to get the proportion of variance shared by the variables.

Regression
- Used to predict the value of one variable (the outcome) from the value of another variable (the predictor), assuming a linear relationship:
  $Y_i = (b_0 + b_1 X_i) + e_i$
  - $Y_i$ = outcome
  - $b_0$ = intercept: the value of the outcome (Y) when the predictor (X) = 0
  - $b_1$ = slope of the line: shows the direction and strength of the relationship
  - $X_i$ = value of the predictor
  - $e_i$ = deviation of the predicted outcome from the actual outcome
- $b_0$ and $b_1$ are regression coefficients. A negative $b_1$ means a negative relationship between the predictor and the criterion; a positive $b_1$ means a positive relationship.
- You will sometimes see $\beta_0$ and $\beta_1$ instead: these are standardized regression coefficients, which put values in standard-deviation units.

Regression example
- Pretend we have the following regression equation: Exam grade (Y) = 45 + 3.5(Hours spent studying) + error.
- If we know that someone spends 10 hours studying for the test, what is the best prediction of their exam grade we can make? A quick sketch of the calculation follows; the answer is worked out after it.
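A minimal sketch of turning that equation into a prediction. The coefficient values are the ones given in the example; the function name is just illustrative.

```python
def predict_exam_grade(hours_studied, b0=45.0, b1=3.5):
    """Best prediction from the fitted line: Y-hat = b0 + b1 * X."""
    return b0 + b1 * hours_studied

print(predict_exam_grade(10))  # 80.0
```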
Exam grade = 45 + (3.5 × 10) = 80

Estimating the model
- Residual: the difference between the actual outcome and the outcome predicted by the model.
- Total error in the model = $\sum (\text{observed}_i - \text{model}_i)^2$, called the sum of squared residuals (SSR). A large SSR means the model is not a good fit to the data; a small SSR means a good fit.
- Ordinary least squares (OLS) regression: used to define the model that minimizes the sum of squared residuals.
- Total sum of squares (SST): the total sum of squared differences between the observed data and the mean value of Y.
- Model sum of squares (SSM): the improvement in prediction as a result of using the regression model rather than the mean.
- Proportion of improvement due to using the model rather than the mean:
  $R^2 = \frac{SS_M}{SS_T}$
  This is also an indicator of the variance shared by the predictor and the outcome.
- F-ratio: a statistical test for determining whether the model describes the data significantly better than the mean:
  $F = \frac{MS_M}{MS_R}$

Individual predictors
- b should be significantly different from 0; b = 0 would indicate that for every 1-unit change in x, y wouldn't change.
- Can test the difference between b and the null hypothesis (b = 0) using a t-test:
  $t = \frac{b_{\text{observed}}}{SE_b}$

Week 5

Outliers in regression
- An outlier can affect the regression coefficients.
- Residual: the difference between the actual value of the outcome and the predicted value. Large residuals indicate a poorly fitting regression model; small residuals indicate a good fit.
- Unstandardized residual: the difference between the actual and predicted outcome value, measured in the same units as the outcome.
- Standardized residual: residuals converted to z-scores.
- Studentized residual: the unstandardized residual divided by an estimate of its standard deviation.

Influential cases
- Influential case: a value that strongly influences the regression model's parameter estimates.
- Cook's distance: a measure of the overall influence of a case on the model. Values larger than 1 indicate a problem.
- Leverage: shows the influence of the observed value of the outcome variable over the predicted values of the outcome. Average leverage = (k + 1)/n, where k is the number of predictors and n is the sample size. Problematic values: greater than 3(k + 1)/n.
- DFBETA: compares the regression coefficient when a case is excluded from the model to the regression coefficient when the case is included. Standardized values larger than 2/√n are problematic.
- Mahalanobis distance: measures the distance of a case from the mean of the predictor variable(s). Follows a chi-square distribution with degrees of freedom equal to the number of predictors; a significant value indicates a problem.

Independent errors
- Durbin-Watson test: tests whether adjacent residuals are correlated.
- A value of 2 means the residuals are uncorrelated; values larger than 2 indicate a negative correlation between residuals, and values smaller than 2 indicate a positive correlation.
- Values greater than 3 or less than 1 are problematic.

Assumptions of linear regression models
- Additivity and linearity: the outcome is linearly related to an additive combination of the predictors.
- Independent errors: uncorrelated residuals.
- Homoscedasticity: at all levels of the predictor, there should be equal variance of the residuals.
- Normally distributed errors (residuals).
- Predictors uncorrelated with external variables (external variables = variables not included in the model that influence the outcome variable).
- Predictors must be quantitative, or categorical with only 2 categories (you can dummy-code variables with more than 2 categories).
- Outcomes must be quantitative and continuous.
- No perfect multicollinearity: no perfect linear relationship between pairs of predictors.
- Non-zero variance: predictors need to vary.
- A sketch of checking several of these diagnostics in code follows.
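The diagnostics above (residuals, Cook's distance, leverage, DFBETA, Durbin-Watson) are what SPSS reports; a rough equivalent in Python, assuming the statsmodels package, is sketched below. The data here are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulated data: one predictor, one outcome (illustration only)
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

X = sm.add_constant(x)               # adds the intercept column
model = sm.OLS(y, X).fit()

print(model.rsquared)                # R^2 = SSM / SST
print(model.fvalue, model.f_pvalue)  # F-ratio: MSM / MSR
print(model.params, model.bse)       # b coefficients and their standard errors

# Case-level diagnostics
infl = model.get_influence()
cooks_d = infl.cooks_distance[0]           # flag values > 1
leverage = infl.hat_matrix_diag            # compare to 3*(k+1)/n
dfbetas = infl.dfbetas                     # flag |values| > 2/sqrt(n)
stud_resid = infl.resid_studentized_internal

# Independence of errors
print(durbin_watson(model.resid))          # ~2 means uncorrelated residuals
```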
Multiple regression
- Incorporates multiple predictors into the regression model:
  $Y_i = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + \varepsilon_i$
- Predictors should be chosen based on theory and previous research. It is not useful to chuck lots of random predictors into the model to see what happens.

Semi-partial correlation
- The foundation of multiple regression.
- Measures the relationship between a predictor and the outcome, controlling for the relationship between that predictor and the other predictors in the model.
- Shows the unique contribution of a predictor in explaining variance in the outcome.

Reasons for multiple regression
- You want to explain a greater amount of variance in the outcome: "What factors influence adolescent drug use? Can we predict it better?"
- You want to look at a set of predictors in relation to an outcome. Very useful: human behavior is rarely determined by just one thing. "How much do recruiter characteristics and procedural justice predict job satisfaction once hired?"
- You want to see if adding another predictor (or set of predictors) improves prediction above and beyond a known set of predictors: "Will adding a job knowledge test to the current battery of selection tests improve prediction of job performance?"
- You want to see if predictor(s) are significantly related to the outcome after controlling for the effect of other predictors: "Is need for cognition related to educational attainment, after controlling for socioeconomic status?"

Entering predictors
- Hierarchical regression: known predictors are entered into the model first; new/untested predictors are added next. Good for assessing incremental validity.
- Forced entry: all predictors are forced into the model at the same time.
- Stepwise: DON'T USE IT! It adds predictors based on the amount of variance explained; it is atheoretical and capitalizes on error/chance variation.

Multicollinearity
- Perfect collinearity: one predictor has a perfect correlation with another predictor. You can't get unique estimates of the regression coefficients, because both variables share the same variance.
- Lower levels of multicollinearity are common.
- Problems with multicollinearity:
  - Untrustworthy bs, due to an increase in their standard errors: the coefficients become more variable across samples.
  - Limits R: if two variables are highly correlated, they share a lot of variance, so each will account for very little unique variance in the outcome. Adding a predictor that is strongly correlated with an existing predictor won't increase R by much, even if on its own it is strongly related to the outcome.
  - Can't determine the importance of predictors: since variance is shared between predictors, which one accounts for more variance in the outcome?
- Example: you're trying to predict social anxiety using emotional intelligence and number of friends as predictors. What if emotional intelligence and number of friends are related? The sketch below simulates this situation.
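A minimal simulation of that situation, assuming statsmodels. The variable names mirror the example but the data are made up; the point is only to show how the standard errors of the b coefficients inflate when the two predictors are strongly correlated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200

def fit_social_anxiety(predictor_correlation):
    """Simulate two predictors with a given correlation and report b and SE(b)."""
    emotional_intelligence = rng.normal(size=n)
    noise = rng.normal(size=n)
    # number_of_friends is built from emotional_intelligence to control their correlation
    number_of_friends = (predictor_correlation * emotional_intelligence
                         + np.sqrt(1 - predictor_correlation**2) * noise)
    social_anxiety = (-0.4 * emotional_intelligence - 0.4 * number_of_friends
                      + rng.normal(size=n))

    X = sm.add_constant(np.column_stack([emotional_intelligence, number_of_friends]))
    fit = sm.OLS(social_anxiety, X).fit()
    return fit.params[1:], fit.bse[1:]   # drop the intercept

print(fit_social_anxiety(0.10))   # nearly uncorrelated predictors: small SEs
print(fit_social_anxiety(0.95))   # highly correlated predictors: much larger SEs
```

With the same underlying effects, the second fit's coefficients bounce around far more from sample to sample, which is the "untrustworthy bs" problem described above.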
[Venn diagram: social anxiety, emotional intelligence, and number of friends, with both predictors explaining the same variance in the outcome.]

- When predictors overlap like this, you could have a high R accompanied by very small bs.
- Variance inflation factor (VIF): evaluates the linear relationship between a predictor and the other predictors.
  - Largest VIF greater than 10: problem.
  - Average VIF greater than 1: problem. Calculate this by adding up the VIF values across predictors and dividing by the number of predictors.
- Tolerance: the reciprocal of VIF (1/VIF).
  - Below .10: major problem.
  - Below .20: potential problem.
- Many psychological variables are slightly correlated. You're likely to run into big multicollinearity problems if you include 2 predictors measuring the same, or very similar, constructs. Examples:
  - Cognitive ability and problem-solving
  - 2 different conscientiousness measures
  - Job knowledge and a situational interview
  - Scores on 2 different anxiety measures

Homoscedasticity
- Can plot zpred (standardized predicted values of the DV based on the model) against zresid (standardized residuals).
- The plot should look like a random scatter of values.

Multiple regression in SPSS (screenshots of the dialogs on the slides).

Regression output
- R: the correlation between the actual outcome values and the values predicted by the regression model.
- R²: the proportion of variance in the outcome predicted by the model.
- Adjusted R²: an estimate of the value in the population (adjusted for the shrinkage that tends to occur in a cross-validated model due to sampling error).
- F-test: compares the variance explained by the model to the variance unaccounted for by the model (error). Shows whether predictions based on the model are more accurate than predictions made using the mean.
- Beta (b) values: the change in the outcome associated with a one-unit change in a predictor.
- Standardized beta (β) values: beta values expressed in standard deviations.

Practice time!
- The following tables show the results of a regression model predicting Excel training performance using 5 variables: self-efficacy (Setotal), Excel use (Rexceluse), Excel formula use (Rformulause), cognitive ability (WPTQ), and task-switching IAT score (TSA_score).
- Interpret this… and this… and finally this (output tables on the slides).

Week 6 and 7

Categorical variables
- When a categorical variable has 2 categories (male/female, dead/alive, employed/not employed), you can put it directly into a regression.
- When a categorical variable has more than 2 categories (freshman/sophomore/junior/senior, entry level/first-line supervisor/manager), you can't input it directly into the regression model; you have to dummy code it.
- Dummy variables: represent group membership using zeroes and ones. You have to create a series of new variables; the number of variables = number of categories - 1.
- Example: freshman/sophomore/junior/senior.
- Eight steps in creating and using dummy-coded variables in regression:
  1. Count the number of groups in the variable and subtract 1.
  2. Create as many new variables as needed based on step 1.
  3. Choose one of the groups as the baseline to compare all other groups against. Usually this will be the control group or the majority group.
  4. Assign values of 0 to all members of the baseline group for all dummy variables.
  5. For the first dummy variable, assign 1 to members of the first group that you want to compare against the baseline group. Members of all other groups get a 0.
  6. For the second dummy variable, assign 1 to all members of the second group you want to compare against the baseline group. Members of all other groups get a 0.
  7. Repeat this for all dummy variables.
  8. When running the regression, put all of the dummy variables in the same block.

Categorical variables: example
- One variable with 4 categories: freshman, sophomore, junior, senior (coding tables shown on the slides).
- Each dummy variable is included in the regression output.
- The regression coefficient for each dummy variable shows the change in the outcome when moving from the baseline (0) to the category being compared (1): the difference in the outcome between the baseline group and that group.
- Example: compared to freshmen, seniors' attitude-towards-college scores are 1.94 points higher.
- A significant t-value means the group coded as 1 for that dummy variable differs significantly on the outcome from the baseline group.
- A code sketch of this coding scheme follows.
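A minimal sketch of dummy coding the class-standing example, assuming pandas. Treating freshman as the baseline group is an arbitrary choice for illustration; the slides only say to pick the control or majority group.

```python
import pandas as pd

df = pd.DataFrame({
    "class_standing": ["freshman", "sophomore", "junior", "senior",
                       "freshman", "junior", "senior", "sophomore"],
})

# k = 4 categories -> k - 1 = 3 dummy variables; the dropped category is the baseline
dummies = pd.get_dummies(df["class_standing"], prefix="is", dtype=int)
dummies = dummies.drop(columns="is_freshman")   # freshman = baseline (all zeros)

print(pd.concat([df, dummies], axis=1))
# In a regression, the coefficient for is_sophomore / is_junior / is_senior estimates
# the difference in the outcome between that group and freshmen.
```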
Moderation
- The relationship between 2 variables depends on the level of a third variable: an interaction between predictors in the model.
- Many research questions deal with moderation!
  - Example: in I/O psychology, moderation is important for evaluating predictive invariance. Does the relationship between a selection measure and job performance vary depending on demographic group (male vs. female, White vs. Black, etc.)?
  - Example: in clinical/counseling psychology, moderation is important for evaluating risk for mental illness. Does the relationship between exposure to a stressful situation and a subsequent mental illness diagnosis vary depending on the individual's social support network?
- The moderation model is the basic regression equation with a minor change, the $AB_i$ term:
  $Y_i = b_0 + b_1 A_i + b_2 B_i + b_3 AB_i + e_i$
- The outcome depends on:
  - the intercept ($b_0$),
  - the score on variable A ($b_1 A_i$) and the relationship between A and Y,
  - the score on variable B ($b_2 B_i$) and the relationship between B and Y,
  - the interaction (multiplication) between the scores on A and B ($b_3 AB_i$) and the relationship between AB and Y.
- Moderator variables can be either categorical (low vs. high conscientiousness, male vs. female, etc.) or continuous (conscientiousness scores from 1-7).
  - Categorical moderator: you can visualize the interaction as two different regression lines, one for each group, which vary in slope (and possibly in intercept).
  - Continuous moderator: visualize in 3-dimensional space; a more complex relationship between the moderator and the predictor. The slope of one predictor changes as values of the moderator change. Pick a few values of the moderator and generate graphs for easier interpretation.

Centering
- Prior to analysis, you need to grand-mean center the predictors. Doing so makes interactions easier to interpret (which is why we center).
- Regression coefficients show the relationship between a predictor and the criterion when the other predictor equals 0, but not all variables have a meaningful 0 in the context of the study (age, intelligence, etc.). You could end up trying to interpret effects based on a non-existing score (such as the level of job performance for a person with an intelligence score of 0). Once interactions are factored in, interpretation becomes increasingly problematic.
- Centering also reduces nonessential multicollinearity (i.e., correlations due to the way the variables were scaled).
- Grand mean centering: subtract the mean of a variable from all scores on that variable. The centered variables are then used to calculate the interaction term.
- Don't center categorical predictors; just make sure they are scaled 0 and 1. Don't center the outcome/dependent variable: centering only applies to predictors.
- For a centered variable, a value of 0 represents the mean value of the predictor. Since the transformation is linear, it doesn't change the regression model substantially, but interpretation of the regression coefficients is easier.
- Without centering: a predictor's coefficient in the interaction model shows how the outcome changes with a one-unit increase in that predictor when the other predictor = 0. With centering: it shows how the outcome changes with a one-unit increase when the other predictor is at its mean.

Moderation in SPSS
Steps for moderation in SPSS:
1. Grand-mean center the continuous predictor(s).
2. Enter both predictor variables into the first block.
3. Enter the interaction term in the second block (doing it this way makes it easier to look at the R² change).
4. Run the regression and look at the results.
5. If the interaction term is significant:
   - Categorical predictor: line graph between the predictor and the DV, with a different line for each category.
   - Continuous predictor: simple slopes analysis.

Simple slopes analysis
- Basic idea: values of the outcome (Y) are calculated for different levels of the predictor and moderator: low, medium, and high, usually defined as -1 SD, the mean, and +1 SD.
- Recommend using an online calculator for these (they can be done by hand, but it's a pain):
  http://www.jeremydawson.co.uk/slopes.htm
  http://quantpsy.org/interact/mlr2.htm
- Example: Aggression = 39.97 + (.17 × video) + (.76 × callous) + (.027 × (video × callous)).
  For 1 SD below the mean on video games at low levels of callous unemotionality:
  39.97 + (.17 × -6.9622) + (.76 × -9.6177) + (.027 × (-6.9622 × -9.6177)) = 33.29
  You would do this 8 more times so that you had values of aggression at low, medium, and high levels of callous unemotionality and video game playing.
- (SPSS screenshots on the slides: creating the interaction term, entering the variables, and the output.)

[Simple slopes plots: e.g., the dependent variable for husbands and wives at low vs. high attractiveness.]

A code sketch of centering, building the interaction term, and computing simple slopes follows.
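A rough Python sketch of the same workflow, assuming statsmodels: grand-mean center the predictors, build the interaction term, fit the model, and evaluate simple slopes at -1 SD, the mean, and +1 SD of the moderator. The data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({"video": rng.normal(20, 7, n), "callous": rng.normal(25, 10, n)})
df["aggression"] = (40 + 0.2 * df["video"] + 0.8 * df["callous"]
                    + 0.03 * df["video"] * df["callous"] + rng.normal(0, 5, n))

# 1. Grand-mean center the continuous predictors
df["video_c"] = df["video"] - df["video"].mean()
df["callous_c"] = df["callous"] - df["callous"].mean()

# 2-4. Fit the model with main effects and the interaction of the centered predictors
model = smf.ols("aggression ~ video_c * callous_c", data=df).fit()
print(model.params)

# 5. Simple slopes of video at low / mean / high callous unemotionality (-1 SD, mean, +1 SD)
b = model.params
for label, m in [("low", -df["callous_c"].std()),
                 ("mean", 0.0),
                 ("high", df["callous_c"].std())]:
    slope = b["video_c"] + b["video_c:callous_c"] * m
    print(label, round(slope, 3))
```

The simple slope of the predictor at a given moderator value is $b_1 + b_3 \times \text{moderator}$, which is what the loop computes.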
Week 8: Research Designs Comparing Groups

Quasi-experimental designs
- No random assignment, but the goal is still to investigate the relationship between a proposed causal variable and an outcome.
- What they have: manipulation of the cause to force it to happen before the outcome; assessment of the covariation of cause and effect.
- What they don't have: they are limited in their ability to rule out alternative explanations, but design features can improve this.

One-group posttest-only design: X O1
- Problems: no pretest (did anything change?); no control group (what would have happened if the IV had not been manipulated?); doesn't control for threats to internal validity.
- Example: an organization implemented a new pay-for-performance system, which replaced its previous pay-by-seniority system. A researcher was brought in after this implementation to administer a job satisfaction survey.

One-group pretest-posttest design: O1 X O2
- Adding a pretest allows assessment of whether change occurred.
- Major threats to internal validity: maturation (change in participants due to natural causes), history (change due to a historical event, such as a recession), and testing (desensitizing participants to the test by using the same pretest as the posttest).
- Example: an organization wanted to implement a new pay-for-performance system to replace its pay-by-seniority system. A researcher was brought in to administer a job satisfaction questionnaire before the pay system change, and again after the change.

Removed-treatment design: O1 X O2 O3 [X removed] O4
- The treatment is given and then removed; there are 4 measurements of the DV (2 pretests and 2 posttests).
- If the treatment affects the DV, the DV should go back to its pre-treatment level after the treatment is removed. It is unlikely that a threat to validity would follow this same pattern.
- Problem: assumes the treatment can be removed with no lingering effects, which may not be possible or ethical (ethical conundrum: taking away schizophrenic patients' medication; possibility conundrum: therapy for depression, whose benefits would still be experienced).
- Example: a researcher wanted to evaluate whether exposure to TV reduced memory capacity. Participants first completed a memory recall task, then completed the same task while a TV played a sitcom in the background. After a break, participants again completed the memory task while the TV played in the background, then completed it again with the TV turned off.

Repeated-treatment design: O1 X O2 [X removed] O3 X O4
- The treatment is introduced, removed, and then re-introduced. A threat to validity would have to follow the same schedule of introduction and removal, which is very unlikely.
- Problem: treatment effects may not go away immediately.
- Example: a researcher wanted to investigate whether piped-in classical music decreased employee stress. She administered a stress survey, and then piped in music. One week later, stress was measured again. The music was then removed, and stress was measured again one week later. The music was then piped in again, and stress was measured a final time one week later.

Posttest-only design with nonequivalent groups:
NR X O1
NR   O2
- Participants are not randomly assigned to groups; one group receives the treatment, one does not; the DV is measured for both groups.
- Big validity threat: selection.
- Example: an organization wants to implement a policy against checking email after 6pm in an effort to reduce work-related stress. The organization assigns its software development department to implement the new policy, while the sales department does not. After 2 months, employees in both departments complete a work stress scale.

Untreated control group with pretest and posttest:
NR O1 X O2
NR O1   O2
- Pretest and posttest data are gathered on the same experimental units.
- The pretest allows for assessment of selection bias and also allows for examination of attrition.
- Example: a community is experimenting with a new outpatient treatment program for meth addicts. Current treatment recipients had the option to participate (experimental group) or not participate (control group). Current daily use of meth was collected for all individuals. Those in the experimental group completed the new program, while those in the control group did not.
Following the program, participants in both groups were asked to provide estimates of their current daily use of meth.

Switching replications:
NR O1 X O2   O3
NR O1   O2 X O3
- The treatment is eventually administered to the group that originally served as the control.
- Problems: it may not be possible to remove the treatment from one group, and it can lead to compensatory rivalry.
- Example: an organization implemented a new reward program to reduce absences. After a month of no absences, employees were… The manufacturing organization from the previous scenario removed the reward program from the Ohio plant and implemented it in the Michigan plant. Absences were gathered and compared 1 month later.

Reversed-treatment control group:
NR O1 X+ O2
NR O1 X- O2
- The control group is given a treatment that should have the opposite effect of the one given to the treatment group.
- Rules out many potential validity threats.
- Problems: may not be feasible (pay for performance: what's the opposite?) or ethical.
- Example: a researcher wanted to investigate the effect of mood on academic test performance. All participants took a pre-test of critical reading ability. The treatment group was put in a setting which stimulated positive mood (calming music, lavender scent, tasty snacks), while the control group was put in a setting which stimulated negative mood (annoying children's show music, sulfur scent, no snacks). Participants then completed the critical reading test again in their respective settings.

Randomized experimental designs
- Participants are randomly assigned to groups.
- Random assignment: any procedure that assigns units to conditions based on chance alone, where each unit has a nonzero probability of being assigned to any condition.
- NOT random sampling! Random sampling concerns how the sample is obtained; random assignment concerns how the sample is assigned to different experimental conditions.

Why random assignment?
- Researchers in the natural sciences can rigorously control extraneous variables. People are tricky: social scientists can't exert much control. We can't mandate a specific level of cognitive ability, exposure to violent TV in childhood, attitude towards women, etc.
- Random assignment to conditions reduces the chance that some unmeasured third variable led to the observed covariation between the presumed cause and effect.
- Example: what if you assigned all participants who signed up in the morning to the experimental group for a memory study, and all those who signed up in the afternoon to the control group? And those who signed up in the morning had an average age of 55 and those who signed up in the afternoon had an average age of 27? Could a difference between the experimental and control groups be attributed to the manipulation?
Random assignment
- Since participants are randomly assigned to conditions, the expectation is that the groups are equal prior to the experimental manipulations, so any observed difference is attributable to the experimental manipulation, not a third variable.
- It doesn't prevent all threats to validity; it just ensures they're distributed equally across conditions so they aren't confounded with the treatment.
- It doesn't ensure the groups are equal, just the expectation that they are equal: there is no obvious reason why they should differ, but they still could. Example: by random chance, the average age of the control group may be higher than the average age of the experimental group.
- Random assignment guarantees equality of groups, on average, over many experiments. It does not guarantee that any one experiment which uses random assignment will have equivalent groups. Within any one study, groups are likely to differ due to sampling error; but if the random assignment process were conducted over an infinite number of groups, the average of all means for the treatment and control groups would be equal.
- If groups do differ despite random assignment, those differences will affect the results of the study. But any differences are due to chance, not to the way in which individuals were assigned to conditions, so confounding variables are unlikely to correlate with treatment condition.

Posttest-only control group design:
R X O
R   O
- Random assignment to conditions (R); the experimental group is given the treatment/IV manipulation (X); the outcome is measured for both groups (O).
- Example: participants are assigned to a control group (no healthy eating seminar) or a treatment group (90-minute healthy eating seminar). 6 months later, participants are given a questionnaire assessing healthy eating habits, and scores are compared between the control and treatment groups.
- Problems: no pretest. If attrition occurs, you can't see whether those who left were any different from those who completed the study, and the lack of a pretest makes it difficult to assess change on the outcome.

Pretest-posttest control group design:
R P X O
R P   O
- Participants are randomly assigned to conditions and given a pretest (P) measuring the outcome variable; one group is given the treatment/IV manipulation; the outcome is measured for both groups.
- Variation: you can randomly assign after the pretest.
- Example: randomly assign undergraduate student participants to a control group and a treatment group. Give a pretest on attitudes towards in-state tuition for undocumented students. The control group watches a video about the history of higher education for 20 minutes, while the treatment group watches a video explaining the challenges faced by undocumented students in obtaining a college degree. Give a posttest on attitudes towards in-state tuition for undocumented students.

Factorial designs
- Have 2 or more independent variables.
- Naming logic: (# of levels in IV1) x (# of levels in IV2) x … x (# of levels in IVx).
- 3 advantages: they require fewer participants, since each participant receives treatment related to 2 or more IVs; treatment combinations can be evaluated; and interactions can be tested.
- For a 2x2 design:
  R X_A1B1 O
  R X_A1B2 O
  R X_A2B1 O
  R X_A2B2 O
  Randomly assign to conditions (there are 4), each representing 1 of the 4 possible IV combinations, and measure the outcome.
- Example: 2 IVs of interest: room temperature (cool/hot) and noise level (quiet/noisy). DV = number of mistakes made in basic math calculations. Randomly assign to 1 of 4 groups (quiet/cool, quiet/hot, noisy/cool, noisy/hot) and measure the number of mistakes made in the math calculations.
- Compare the means across groups using a factorial ANOVA.
- 2 things we can look for with these designs:
  - Main effects: the average effect of an IV across the treatment levels of the other IV. Did participants do worse in the noisy than the quiet conditions? Did participants do worse in the hot than the cool conditions? A main effect can be misleading if there is a moderator variable.
  - Interaction: the relationship between one IV and the DV depends on the level of the other IV. For example, noise level is positively related to the number of errors made, but only if the room is hot.

Within-subjects randomized experimental design:
R Order 1: Condition 1 (O1), then Condition 2 (O2)
R Order 2: Condition 2 (O1), then Condition 1 (O2)
- Participants are randomly assigned to either order 1 or order 2. Participants in order 1 receive condition 1, then condition 2; participants in order 2 receive condition 2, then condition 1.
- Having different orders prevents order effects; having participants in more than 1 condition reduces error variance.
- Example: participants are randomly assigned to order 1 or order 2. Participants in order 1 reviewed resumes with the applicant's picture attached and made hiring recommendations, then reviewed resumes without pictures and made hiring recommendations. Participants in order 2 reviewed resumes without pictures first, then reviewed resumes with the applicant's picture attached.

Data analysis
- With 2 groups: compare the 2 group means to determine if they are significantly different from one another.
  - If the groups are independent (the participants in one group are different from the participants in the other group), use an independent samples t-test.
  - If it is a repeated measures design, use a repeated measures t-test.
- With 3 or more groups: still compare the group means to determine if they are significantly different.
  - If there is only 1 IV, use a one-way ANOVA.
  - If there are 2 or more IVs, use a factorial ANOVA.
  - If the groups are not independent, use a repeated measures ANOVA.

Design practice
- Research question: does answering work-related communication (emails, phone calls) after normal working hours affect work-life balance?
- Design BOTH a randomized experiment AND a quasi-experiment to evaluate your research question. For each design (random and quasi):
  - Operationalize the variables and develop a hypothesis (or hypotheses).
  - Name and explain the experimental design as it will be used to test your hypothesis(es).
  - Name and explain one threat to internal validity in your design.

Week 9

Comparing means
- 2 primary ways to evaluate mean differences between groups: t-tests and ANOVAs. Which one you use depends on how many groups you want to compare and how many IVs you have.
  - 2 groups, 1 IV, 1 DV: t-test.
  - 3 or more groups, 1 or more IVs, 1 DV: ANOVA (one-way ANOVA if only 1 IV, factorial ANOVA if 2 or more IVs).

t-tests
- Used to compare means on one DV between 2 groups.
  - Do men and women differ in their levels of job autonomy?
  - Do students who take a class online and students who take the same class face-to-face have different scores on the final test?
  - Do individuals report higher levels of positive affect in the morning than they report in the evening?
  - Do individuals given a new anti-anxiety medication report different levels of anxiety than individuals given a placebo?
- 2 different options for t-tests:
  - Independent samples t-test: the individuals in group 1 are not the same as the individuals in group 2. (Do self-reported organizational citizenship behaviors differ between men and women?)
  - Repeated measures t-test: the individuals in group 1 are the same as the individuals in group 2. (Do individuals report different levels of job satisfaction when surveyed on Friday than when surveyed on Monday?)

A note on creating groups
- Beware of dichotomizing a continuous variable in order to make 2 groups (e.g., everyone who scored 50% or below on a test goes in group 1, and everyone who scored 51% or higher goes in group 2).
- This causes several problems: people with very similar scores around the cut point may end up in separate groups, it reduces statistical power, and it increases the chances of spurious effects.

t-tests and the linear model
- A t-test is just a linear model with one binary predictor variable:
  $Y_i = b_0 + b_1 x_{1i} + e_i$
- The predictor has 2 categories (male/female, control/experimental), coded as a dummy variable: 0 = baseline group, 1 = experimental/comparison group.
- $b_0$ is equal to the mean of the group coded 0; $b_1$ is equal to the difference between the group means.

Rationale for the t-test
- 2 sample means are collected, and we need to see how much they differ. If the samples come from the same population, we expect the means to be roughly equivalent; large differences are unlikely to occur due to chance.
- When we do a t-test, we compare the difference between the sample means to the difference we would expect if the null hypothesis were true (difference = 0).
- The standard error is a gauge of the differences between means that are likely to occur due to chance alone. Small standard error: we expect similar means if both samples come from the same population. Large standard error: we expect somewhat different means even if both samples come from the same population.
- The t-test evaluates whether the observed difference between the means is larger than would be expected, based on the standard error, if the samples came from the same population.
- The top half of the equation is the model; the bottom half is the error.

Independent samples t-test
- Use when each sample contains different individuals.
- Looks at the ratio of the between-group difference in means to an estimate of the total standard error for both groups.
- Variance sum law: the variance of the difference between 2 independent variables equals the sum of their variances. We use the sample standard deviations to calculate the standard error of each population's sampling distribution.
- Assuming the sample sizes are equal:
  $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$
  Top half: the difference between the means. Bottom half: each sample's variance divided by its sample size.
- If the sample sizes are not equal, we need to use the pooled variance, which weights the variance of each sample to account for the sample-size differences:
  $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
- Equation for the independent samples t-test with different sample sizes (differences between groups over error):
  $t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$

Paired samples/repeated measures t-test
- Use when the same people are in both samples.
- $\bar{D}$: the average difference between scores at measurement 1 and measurement 2; this shows the systematic variation between measurements.
- $\mu_D$: the difference we would expect between measurements if the null hypothesis were true. Since the null hypothesis says the difference = 0, this cancels out.
- Measure of error = the standard error of the differences: $s_D / \sqrt{N}$
- $t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{N}}$, where $\mu_D$ = 0 and cancels out (what we would expect to see if the null were true).

Assumptions of t-tests
- Both types of t-tests are parametric and assume normality of the sampling distribution (for repeated measures, this refers to the sampling distribution of the differences).
- Data on the DV have to be measured at the interval level; they can't be nominal or ordinal.
- The independent samples t-test assumes the variances of each population are equivalent (homogeneity of variance). It also assumes that scores in each sample are independent of scores in the other sample.
- Independent samples t-tests in SPSS will automatically do Levene's test for you. If Levene's test is not significant, the homogeneity of variance assumption is met: interpret the first line of output (equal variances assumed). If Levene's test is significant, the assumption is not met: interpret the second line of output (equal variances not assumed). A code sketch of this decision follows.
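A minimal sketch of the same decision in Python, assuming SciPy. Levene's test guides whether to use the pooled-variance (equal_var=True) or Welch (equal_var=False) version of the independent samples t-test, and a paired test is shown for the repeated measures case. The data are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(50, 10, size=40)   # e.g., unproctored condition (made-up data)
group2 = rng.normal(45, 10, size=40)   # e.g., proctored condition

# Levene's test for homogeneity of variance
lev_stat, lev_p = stats.levene(group1, group2)
equal_var = lev_p > .05                # not significant -> "equal variances assumed"

t, p = stats.ttest_ind(group1, group2, equal_var=equal_var)
print(t, p)

# Repeated measures: the same people measured twice
time1 = rng.normal(30, 5, size=40)
time2 = time1 + rng.normal(2, 3, size=40)
print(stats.ttest_rel(time1, time2))
```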
Independent samples t-test example
- DV = number of items skipped on an ability test. Group 1 took the test in an unproctored setting; Group 2 took the test in a proctored setting. (SPSS output shown on the slides.)

Effect sizes for the independent samples t-test
- You need to report an effect size. You can convert t to r:
  $r = \sqrt{\frac{t^2}{t^2 + df}} = \sqrt{\frac{7.65^2}{7.65^2 + 1642.492}} = .184$
  (values taken from the proctored vs. unproctored independent samples t-test output above)
- More commonly, d is used:
  $d = \frac{\bar{X}_1 - \bar{X}_2}{s_2}$
  d = (.56 - .23)/1.431 = 0.23 (values taken from the proctored vs. unproctored output above)
- Note on d: the book shows the d calculation using only 1 standard deviation; in practice, it is more common to use the pooled standard deviation.
- Interpretation (Cohen, 1988): .20 = small, .50 = medium, .80 = large. A negative d means that $\bar{X}_2$ is larger than $\bar{X}_1$.

Repeated measures t-test example
- DV = perceptions of procedural justice. Measurement 1: participants took one type of Implicit Association Test (task-switching ability). Measurement 2: participants took a traditional cognitive ability test (WPT-Q). (SPSS output shown on the slides.)

Repeated measures t-test effect sizes
- You still need to calculate effect sizes.
- Problem with r in the repeated measures t-test: it tends to over-estimate the effect size. You're better off using d with repeated measures designs: it's a better estimate of the effect size.
- Formula for repeated measures d: $d = \frac{\bar{D} - \mu_D}{s}$
- A code sketch of these effect size calculations follows.
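A small sketch of those effect size conversions, using the t, df, means, and standard deviation reported in the output above; the helper functions are illustrative names, not SPSS or SciPy functions.

```python
import math

def r_from_t(t, df):
    """Convert a t statistic to the effect size r: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t**2 / (t**2 + df))

def cohens_d(mean1, mean2, sd):
    """Cohen's d using a single (or pooled) standard deviation."""
    return (mean1 - mean2) / sd

print(r_from_t(7.65, 1642.492))     # ~.184
print(cohens_d(0.56, 0.23, 1.431))  # ~0.23
```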
Comparing the t-tests
- If you have the same people in both groups, ALWAYS use the repeated measures t-test (or you violate one of the assumptions of the independent t-test): non-independence of errors violates the assumptions of the independent samples t-test.
- Power is higher in the repeated measures t-test: it reduces error variance by quite a bit, since the same participants are in both samples.

One-way ANOVA
- ANOVA = analysis of variance. A one-way ANOVA allows us to compare means on a single DV across more than 2 groups.

Why we need ANOVA
- Doing multiple t-tests (control vs. group 1, control vs. group 2, etc.) on the data inflates the Type I error rate beyond acceptable levels.
- Familywise error rate, assuming α = .05 for each test: $1 - (.95)^n$, where n is the number of comparisons being made.
- So with 3 comparisons the overall α = .143, and with 4 comparisons the overall α = .185.

ANOVA and the linear model
- Mathematically, ANOVA and regression are the same thing!
- ANOVA output: the F-ratio is a comparison of systematic to unsystematic variance, the same as the F ratio in regression: it shows the improvement in prediction of the outcome gained by using the model as compared to just using the mean.
- The only difference between ANOVA and regression is that the predictor is a categorical variable with more than 2 categories. This is exactly the same as using dummy variables in regression: a linear model with a number of predictors equal to the number of groups minus 1.
- The intercept ($b_0$) will be equal to the mean of the baseline group (the group coded as 0 in all dummy variables). Regression coefficient $b_1$ will be equal to the difference in means between the baseline group and group 1; $b_2$ will be equal to the difference in means between the baseline group and group 2.

F ratio
- $F = \frac{\text{systematic variance}}{\text{unsystematic variance (error)}}$
- Systematic variance, in ANOVA, is the mean differences between groups.
- Null hypothesis: the group means are the same. In that case, the systematic variance would be small, and thus F would be small.

ANOVA logic
- The simplest model we can fit to the data is the grand mean (of the DV). We try to improve on this prediction by creating a more complex model whose parameters include an intercept ($b_0$) and one or more regression coefficients ($b_1$, $b_2$, etc.).
- Bigger regression coefficients = bigger differences between groups. If the between-group differences are large, the model is a better fit to the data than the grand mean; and if the model fits better than the grand mean, then the between-group differences are significant.

Total sum of squares (SST)
- $SS_T = \sum (x_i - \bar{x}_{\text{grand}})^2$
- This shows the total amount of variation within the data: the grand mean of the DV is subtracted from each observation's value on the DV.
- Total degrees of freedom for SST: N - 1.

Model sum of squares (SSM)
- $SS_M = \sum n_k (\bar{x}_k - \bar{x}_{\text{grand}})^2$
- This shows how much variance the linear model explains. Calculate the difference between the mean of each group and the grand mean, square each value, multiply it by the number of participants in the group, then add the values for each group together.
- Degrees of freedom: k - 1, where k is the number of groups.

Residual sum of squares (SSR)
- $SS_R = \sum (x_i - \bar{x}_k)^2$
- This shows the differences in scores that aren't explained by the model (i.e., aren't explained by between-group differences). Calculated by subtracting the group mean from each score, squaring the value, and then adding all of the values together.
- Degrees of freedom: N - k, where k is the number of groups and N is the overall sample size.

Mean squares and the F ratio
- To get a mean square value, divide a sum of squares by its degrees of freedom: $MS_M = SS_M / (k - 1)$ and $MS_R = SS_R / (N - k)$.
- The F ratio is calculated using the mean square values, with degrees of freedom (k - 1), (N - k).
- If F is statistically significant, the group means differ by more than they would if the null hypothesis were true.
- F is an omnibus test: it only tells you whether the group means differ significantly (there's a difference somewhere); it doesn't tell you which means differ from one another. You need post-hoc tests to determine this. A code sketch of these sums of squares and the F ratio follows.
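A minimal sketch of computing these sums of squares and the F ratio directly, with SciPy's one-way ANOVA as a cross-check. The three groups are made up for illustration.

```python
import numpy as np
from scipy import stats

# Three made-up groups (e.g., three conditions on a single DV)
groups = [np.array([2.0, 3.0, 4.0, 3.5, 2.5]),
          np.array([4.0, 5.0, 4.5, 5.5, 6.0]),
          np.array([3.0, 3.5, 2.5, 4.0, 3.0])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), len(all_scores)

ss_t = np.sum((all_scores - grand_mean) ** 2)                       # total variation
ss_m = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # explained by group means
ss_r = sum(np.sum((g - g.mean()) ** 2) for g in groups)             # within-group (residual)

ms_m = ss_m / (k - 1)
ms_r = ss_r / (N - k)
F = ms_m / ms_r
p = stats.f.sf(F, k - 1, N - k)

print(ss_t, ss_m + ss_r)         # SST = SSM + SSR
print(F, p)
print(stats.f_oneway(*groups))   # same F and p from SciPy
```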
Post-hoc tests
- Pairwise comparisons that compare all groups to one another. All incorporate a correction so that the Type I error rate is controlled (at about .05).
- Example: the Bonferroni correction (very conservative) uses a significance level of α/n, where α is the usual significance level (typically .05) and n is the number of comparisons. So, if we have 3 groups and want to keep α at .05 across all comparisons, each comparison will have α = .017.
- There are lots of options for post hoc tests in SPSS. Some notes on the more common ones:
  - Least significant difference (LSD): doesn't control Type I error very well.
  - Bonferroni's and Tukey's: control the Type I error rate, but lack statistical power (too conservative).
  - REGWQ: controls Type I error and has high power, but only works if sample sizes are equal across groups.
  - Games-Howell: less control of Type I error, but good for unequal sample sizes and unequal variances across groups.
  - Dunnett's T3: good control of Type I error; works with unequal variances across groups.

Assumptions of ANOVA
- Homogeneity of variance: can be checked with Levene's test. If Levene's test is significant and the homogeneity of variance assumption is violated, you need to use a corrected F ratio: the Brown-Forsythe F or Welch's F.
- Provided group sizes are equal, ANOVA works OK if the normality assumption is violated somewhat; if group sizes are not equal, ANOVA is biased when the data are non-normal.
- Non-parametric alternative to ANOVA: the Kruskal-Wallis test (the book covers it in detail).

Steps for doing ANOVA (shown on the slides).

Effect sizes for ANOVA
- $R^2 = SS_M / SS_T$. When applied to ANOVA, this value is called eta squared, η².
- It is somewhat biased because it is based on the sample only: it doesn't adjust for estimating the effect size in the population.
- SPSS reports partial eta squared, but only for factorial ANOVA: $SS_B / (SS_B + SS_E)$.
- A better effect size measure for ANOVA is omega squared (ω²; SPSS will not compute it for you):
  $\omega^2 = \frac{SS_M - (df_M) MS_R}{SS_T + MS_R}$

One-way ANOVA in SPSS
- IV: a counterproductive work behavior (CWB) scale that varied in its response anchors: control, infrequent, and frequent. DV: self-reported CWB. (SPSS dialogs and output shown on the slides.)
- Calculating omega squared from the output:
  $\omega^2 = \frac{21.49 - 2(2.29)}{664.996 + 2.29} = .025$
- Suggestions for interpreting ω²: .01 = small, .06 = medium, .14 = large.
- A code sketch of η² and ω² from the same quantities follows.
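A tiny sketch of those effect size formulas in Python, using the SS and MS values quoted above; `eta_squared` and `omega_squared` are illustrative helper names, not SPSS output fields.

```python
def eta_squared(ss_m, ss_t):
    """Eta squared: proportion of total variance explained by the model (SSM / SST)."""
    return ss_m / ss_t

def omega_squared(ss_m, df_m, ms_r, ss_t):
    """Omega squared: less biased ANOVA effect size, (SSM - dfM*MSR) / (SST + MSR)."""
    return (ss_m - df_m * ms_r) / (ss_t + ms_r)

# Values from the one-way ANOVA output above
print(omega_squared(ss_m=21.49, df_m=2, ms_r=2.29, ss_t=664.996))  # ~.025
```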