Week 4

Associational research
• Looks at the relationship between two variables
• Usually continuous variables
• No manipulation of IV
• Correlation coefficient shows relationship between 2 variables
• Regression: equation used to predict outcome value based on predictor value
• Multiple regression: same, but uses more than 1 predictor

What is a correlation?
• Know that statistical model is: $outcome_i = model + error_i$
• For correlation, this can be expressed as: $outcome_i = b x_i + error_i$
• Simplified: outcome is predicted from predictor variable and some error
• b = Pearson product-moment correlation, or r (when both variables are standardized)

Covariance
• Covariance: extent to which 2 variables covary with one another
  $cov(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
• Shows how much deviation in one variable is associated with deviation in the second variable

Covariance example
  $cov(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$
  $= \frac{(-0.4)(-3) + (-1.4)(-2) + (-1.4)(-1) + (0.6)(2) + (2.6)(4)}{4}$
  $= \frac{1.2 + 2.8 + 1.4 + 1.2 + 10.4}{4} = \frac{17}{4} = 4.25$

Covariance
• Positive covariance: As one variable deviates from mean, other variable deviates in same direction
• Negative covariance: As one variable deviates from mean, other variable deviates in opposite direction
• Problem with covariance: depends on scales variables measured on
  • Can’t be compared across measures
  • Need standardized covariance to compare across measures

Correlation
• Standardized measure of covariance
  $r = \frac{cov_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1) s_x s_y}$
• Known as Pearson’s product-moment correlation, r

Correlation example
• From previous table: $r = \frac{cov_{xy}}{s_x s_y} = \frac{4.25}{1.67 \times 2.92} = .87$

Correlation
• Values range from -1 to +1
• +1: perfect positive correlation: as one variable increases, other increases by proportionate amount
• -1: perfect negative correlation: as one variable increases, other decreases by proportionate amount
• 0: no linear relationship.
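As a rough check on the covariance and correlation formulas, the following minimal sketch computes both by hand. The raw scores are hypothetical (they do not appear in the slides); they were chosen only because they reproduce the deviations used in the covariance example, and the function names are illustrative.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: summed cross-products of deviations over N - 1."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance standardized by the two sample standard deviations."""
    sx = sqrt(covariance(x, x))  # covariance of a variable with itself = variance
    sy = sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Hypothetical raw scores (assumption): they yield the deviations
# -0.4, -1.4, -1.4, 0.6, 2.6 and -3, -2, -1, 2, 4 from the worked example
x = [5, 4, 4, 6, 8]
y = [8, 9, 10, 13, 15]

cov_xy = covariance(x, y)
r_xy = pearson_r(x, y)
print(round(cov_xy, 2), round(r_xy, 2))  # 4.25 0.87
```

Rescaling either variable (for example, multiplying x by 100) changes the covariance but leaves r unchanged, which is the reason for standardizing.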
As one variable changes, other stays the same

[Figure: Positive correlation; appreciation of Dimmu Borgir (y-axis) plotted against age (x-axis)]
[Figure: Negative correlation; appreciation of Dimmu Borgir (y-axis) plotted against age (x-axis)]
[Figure: Small correlation; appreciation of Dimmu Borgir (y-axis) plotted against age (x-axis)]

Correlation significance
• Significance tested using t-statistic: $t_r = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$

Correlation and causality
• Correlation DOES NOT imply causality!!!
• Only shows us that 2 variables are related to one another
• Why correlation doesn’t show causality:
  • 3rd variable problem: some other variable (not measured) responsible for observed relationship
  • No way to determine directionality: does a cause b, or does b cause a?

Before running a correlation…

Bivariate correlation in SPSS

Note on pairwise & listwise deletion
• Pairwise deletion: removes cases from analysis on an analysis-by-analysis basis
• 3 variables: A, B, & C
  • Correlation matrix between A, B, & C
  • Case 3 is missing data on variable B, but not on A or C
  • Case 3 will be excluded from correlation between B & C, and A & B, but not from correlation between A & C
• Advantage: keep more of your data
• Disadvantage: not all analyses will include the same cases: can bias results

Note on pairwise & listwise deletion
• Listwise deletion: removes cases from analysis if they are missing data on any variable under consideration
• 3 variables: A, B, & C
  • Correlation matrix between A, B, & C
  • Case 3 is missing data on variable B, but not on A or C
  • Case 3 will be excluded from correlation between B & C, A & B, and A & C
• Advantage: less prone to bias
• Disadvantage: don’t get to keep as much data
• Usually a better option than pairwise

Correlation output

Interpreting correlations
• Look at statistical significance
• Also, look at size of correlation:
  • +/- .10: small correlation
  • +/- .30: medium correlation
  • +/- .50: large correlation

Coefficient of determination, R²
• Amount of variance in one variable shared by other variable
• Example: pretend R² between cognitive ability and job performance is .25
• Interpretation: 25% of variance in cognitive ability shared by variance in job performance
• Slightly incorrect but easier way to think of it: 25% of the variance in job performance is accounted for by cognitive ability

Spearman’s correlation coefficient
• Also called Spearman’s rho (ρ)
• Non-parametric
• Based on ranked, not interval or ratio, data
• Good for minimizing effect of outliers and getting around normality issues
• Ranks data (lowest to highest score)
• Then, uses Pearson’s r formula on ranked data

Kendall’s tau (τ)
• Non-parametric correlation
• Also ranks data
• Better than Spearman’s rho if:
  • Small data set
  • Large number of tied ranks
• More accurate representation of correlation in population than Spearman’s rho

Point-biserial correlations
• Used when one of the two variables is a truly dichotomous variable (male/female, dead/alive)
• In SPSS:
  • Code one category of dichotomous variable as 0, and the other as 1
  • Run normal Pearson’s r
• Example: point-biserial correlation of .25 between species (0=cat & 1=dog) and time spent on the couch
• Interpretation: the positive sign means the group coded 1 (dogs) tends to spend more time on the couch than the group coded 0 (cats); .25 is the standardized strength of that relationship, not a raw .25-unit difference in couch time

Biserial correlation
• Used when one variable is a “continuous dichotomy”
• Example: passing exam vs. failing exam
  • Knowledge of subject is continuous variable: some people pass exam with higher grade than others
• Formula to convert point-biserial to biserial: $r_b = \frac{r_{pb}\sqrt{P_1 P_2}}{y}$
  • P1 = proportion of cases in category 1
  • P2 = proportion of cases in category 2
  • y is from z-table: ordinate of the normal distribution at the point that splits it into the larger and smaller proportions
  • See table on p. 887 in book

Biserial correlation
• Example:
  • Correlation between time spent studying for medical boards and outcome of test (pass/fail) was .35. 70% of test takers passed.
  • $r_b = \frac{.35\sqrt{.30 \times .70}}{.3485} = .46$

Partial correlation
• Correlation between two variables when the effect of a third variable has been held constant
• Controls for effect of third variable on both variables
• Rationale: if third variable correlated (shares variance) with 2 variables of interest, correlation between these 2 variables won’t be accurate unless effect of 3rd variable is controlled for

Partial correlation
• Obtain by going to Analyze > Correlate > Partial
• Choose variables of interest to correlate
• Choose variable to control

Semi-partial (part) correlations
• Partial correlation: control for effect that 3rd variable has on both variables
• Semi-partial correlation: control for effect that 3rd variable has on one variable
• Useful for predicting outcome using combination of predictors

Calculating effect size
• Can square Pearson’s correlation to get R²: proportion of variance shared by variables
• Can also square Spearman’s rho to get $R_s^2$: proportion of variance in ranks shared by variables
• Can’t square Kendall’s tau to get proportion of variance shared by variables

Regression
• Used to predict value of one variable (outcome) from value of another variable (predictor)
• Linear relationship: $Y_i = (b_0 + b_1 X_i) + e_i$
  • $Y_i$ = outcome
  • $b_0$ = intercept: value of outcome (Y) when predictor (X) = 0
  • $b_1$ = slope of line: shows direction & strength of relationship
  • $X_i$ = value of predictor
  • $e_i$ = deviation of predicted outcome from actual outcome

Regression
• $b_0$ and $b_1$ are regression coefficients
• Negative $b_1$: negative relationship between predictor and criterion
• Positive $b_1$: positive relationship between predictor and criterion
• Will sometimes see $\beta_0$ and $\beta_1$ instead: these are standardized regression coefficients
  • Put values in standard deviation units

Regression
• Regression example:
  • Pretend we have the following regression equation: Exam grade (Y) = 45 + .35(Hours spent studying) + error
  • If we know that someone spends 10 hours studying for the test, what is the best prediction of their exam
grade we can make? Exam grade = 45 + (.35*10) = 48.5

Estimating model
• Residual: difference between actual outcome and outcome predicted by the model

Estimating model
• Total error in model = $\sum (observed_i - model_i)^2$
• Called sum of squared residuals ($SS_R$)
• Large $SS_R$: model not a good fit to data; small = good fit
• Ordinary least squares (OLS) regression: used to define model that minimizes sum of squared residuals

Estimating model
• Total sum of squares ($SS_T$): total sum of squared differences between observed data and mean value of Y
• Model sum of squares ($SS_M$): improvement in prediction as result of using regression model rather than mean

Estimating model
• Proportion of improvement due to use of model rather than mean: $R^2 = \frac{SS_M}{SS_T}$
  • Also is indicator of variance shared by predictor and outcome
• F-ratio: statistical test for determining whether model describes data significantly better than mean: $F = \frac{MS_M}{MS_R}$

Individual predictors
• b should be significantly different from 0
  • 0 would indicate that for every 1 unit change in x, y wouldn’t change
• Can test difference between b and null hypothesis (b = 0) using t-test: $t = \frac{b_{observed}}{SE_b}$

Week 5

Outliers in regression
• Outlier can affect regression coefficient

Outliers in regression
• Residual: difference between actual value of outcome and predicted value of outcome
  • Large residuals: poorly-fitting regression model
  • Small residuals: regression model good fit
• Unstandardized residual: difference between actual and predicted outcome value, measured in same units as outcome
• Standardized residual: residuals converted to z-scores
• Studentized residual: unstandardized residual divided by estimate of its standard deviation

Influential cases
• Influential case: value that strongly influences regression model parameter estimates
• Cook’s distance: measure of overall influence of case on the model
  • Values larger than 1 = problem
• Leverage: shows influence of observed value of outcome variable over predicted values of outcome
  • Average leverage = (k + 1)/n, where
k is number of predictors and n is sample size
  • Problematic values: greater than 3(k + 1)/n

Influential cases
• DFBETA: compares regression coefficient when case is excluded from the model to regression coefficient when case is included in the model
  • Problematic if standardized values larger than 2/√n
• Mahalanobis distance: measures distance of case from mean of predictor variable(s)
  • Chi square distribution with degrees of freedom equal to number of predictors
  • Significant value = problem

Independent errors
• Durbin-Watson test: tests whether adjacent residuals are correlated
  • Value of 2: residuals uncorrelated
  • Value larger than 2: negative correlation between residuals
  • Value smaller than 2: positive correlation between residuals
  • Values greater than 3 or less than 1 problematic

Assumptions of linear regression models
• Additivity and linearity: outcome linearly related to additive combination of predictors
• Independent errors: uncorrelated residuals
• Homoscedasticity: at all levels of predictor, should be equal variance of residuals
• Normally distributed errors (residuals)
• Predictors uncorrelated with external variables: external variables = variables not included in model that influence outcome variable

Assumptions of linear regression models
• Predictors must be quantitative, or categorical with only 2 categories
  • Can dummy-code variables if more than 2 categories
• Outcomes quantitative and continuous
• No perfect multicollinearity: no perfect linear relationship between predictor pairs
• Non-zero variance: predictors need to vary

Multiple regression
• Incorporates multiple predictors into regression model
• Predictors should be chosen based on theory/previous research
• Not useful to chuck lots of random predictors into model to see what happens
  $y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_n X_{ni} + \varepsilon_i$

Semi-partial correlation
• Foundation of multiple regression
• Measures relationship between predictor and outcome, controlling for relationship between that predictor and other predictors in the model
• Shows unique
contribution of predictor in explaining variance in outcome

Reasons for multiple regression
• Want to explain greater amount of variance in outcome
  • “What factors influence adolescent drug use? Can we predict it better?”
• Want to look at set of predictors in relation to outcome
  • Very useful: human behavior rarely determined by just one thing
  • “How much do recruiter characteristics and procedural justice predict job satisfaction once hired?”

Reasons for multiple regression
• Want to see if adding another predictor (or set of predictors) will improve prediction above and beyond known set of predictors
  • “Will adding a job knowledge test to current battery of selection tests improve prediction of job performance?”
• Want to see if predictor(s) significantly related to outcome after controlling for effect of other predictors
  • “Is need for cognition related to educational attainment, after controlling for socioeconomic status?”

Entering predictors
• Hierarchical regression
  • Known predictors entered into model first
  • New/untested predictors added into models next
  • Good for assessing incremental validity
• Forced entry
  • All predictors forced into model at same time
• Stepwise
  • DON’T USE IT!
  • Adds predictors based upon amount of variance explained
  • Atheoretical & capitalizes on error/chance variation

Multicollinearity
• Perfect collinearity: one predictor has perfect correlation with another predictor
  • Can’t get unique estimates of regression coefficients: both variables share same variance
• Lower levels of multicollinearity common

Multicollinearity
• Problems with multicollinearity:
  • Untrustworthy bs: standard errors increase, so bs are more variable across samples
  • Limits R: If two variables highly correlated, they share a lot of variance.
Each will then account for very little unique variance in the outcome
    • Adding predictor to model that’s correlated strongly with existing predictor won’t increase R by much even if on its own it’s strongly related to outcome
  • Can’t determine importance of predictors: since variance shared between predictors, which accounts for more variance in outcome?

Multicollinearity
• Example: You’re trying to predict social anxiety using emotional intelligence and number of friends as predictors
• What if emotional intelligence and number of friends are related?

[Figure: Multicollinearity; overlap among social anxiety, emotional intelligence, and number of friends. Both predictors explain the same variance in outcome]

Multicollinearity
• Could have high R accompanied by very small bs
• Variance inflation factor (VIF): evaluates linear relationship between predictor and other predictors
  • Largest VIF greater than 10: problem
  • Average VIF substantially greater than 1: problem
    • Calculate this by adding up VIF values across predictors, and then dividing by number of predictors
• Tolerance: reciprocal of VIF (1/VIF)
  • Below .10: major problem
  • Below .20: potential problem

Multicollinearity
• Many psychological variables are slightly correlated
• Likely to run into big multicollinearity problems if you include 2 predictors measuring the same, or very similar, constructs
• Examples:
  • Cognitive ability and problem-solving
  • 2 different conscientiousness measures
  • Job knowledge and a situational interview
  • Scores on 2 different anxiety measures

Homoscedasticity
• Can plot zpred (standardized predicted values of DV based on model) against zresid (standardized residuals)

Homoscedasticity
• Should look like a random scatter of values

Multiple regression in SPSS

Regression output
• R: correlation between actual outcome values and values predicted by regression model
• R²: proportion of variance in outcome predicted by model
• Adjusted R²: estimate of value in population (adjusted for shrinkage that tends to
occur in cross-validated model due to sampling error)

Regression output
• F-test: compares variance explained by model to variance unaccounted for by model (error)
  • Shows whether predictions based on model are more accurate than predictions made using mean

Regression output
• Beta (b) values: change in outcome associated with a one-unit change in a predictor
• Standardized beta (β) values: beta values expressed as standard deviations

Practice time!
• The following tables show the results of a regression model predicting Excel training performance using 5 variables: self-efficacy (Setotal), Excel use (Rexceluse), Excel formula use (Rformulause), cognitive ability (WPTQ), and task-switching IAT score (TSA_score)

Interpret this…
And this…
And finally this

Week 6 and 7

Categorical variables
• When categorical variable has 2 categories (male/female, dead/alive, employed/not employed), can put it directly into regression
• When categorical variable has more than 2 categories (freshman/sophomore/junior/senior, entry level/first line supervisor/manager), can’t input it directly into regression model
  • Have to dummy code categorical variable

Categorical variables
• Dummy variables: represent group membership using zeroes and ones
• Have to create a series of new variables
  • Number of variables = number of categories - 1
• Example: freshman/sophomore/junior/senior

Categorical variables
• Eight steps in creating and using dummy coded variables in regression:
  1. Count number of groups in variable and subtract 1
  2. Create as many new variables as needed based on step 1
  3. Choose one of groups as baseline to compare all other groups against
     • Usually this will be the control group or the majority group
  4. Assign values of 0 to all members of baseline group for all dummy variables
  5. For first dummy variable, assign 1 to members of the first group that you want to compare against baseline group. Members of all other groups get a 0.
  6. For second dummy variable, assign 1 to all members of second group you want to compare against baseline group. Members of all other groups get a 0.
  7. Repeat this for all dummy variables.
  8. When running regression, put all dummy variables in same block

Categorical variables
• Example: One variable with 4 categories: freshman, sophomore, junior, senior.

Categorical variables
• Each dummy variable is included in the regression output
• Regression coefficient for each dummy variable shows change in outcome that results when moving from baseline (0) to category being compared (1): difference in outcome between baseline group and other group
  • Example: Compared to freshmen, seniors’ attitudes towards college scores are 1.94 points higher
• Significant t-value: group coded as 1 for that dummy variable significantly different on outcome than baseline group

Moderation
• Relationship between 2 variables depends on the level of a third variable
• Interaction between predictors in model

Moderation
• Many research questions deal with moderation!
• Example: In I/O psychology, moderation important for evaluating predictive invariance
  • Does the relationship between a selection measure and job performance vary depending on demographic group (male vs. female, White vs. Black, etc.)?
• Example: In clinical/counseling, moderation important for evaluating risk for mental illness
  • Does the relationship between exposure to a stressful situation and subsequent mental illness diagnosis vary depending on the individual’s social support network?
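Returning to the dummy-coding steps listed under Categorical variables earlier, a minimal sketch of steps 1 through 7 for the freshman/sophomore/junior/senior example might look like the following. The function name, the `is_` column prefix, and the six data values are all hypothetical; in SPSS the same result is usually produced with recode commands rather than code like this.

```python
def dummy_code(values, baseline):
    """Return one 0/1 column per non-baseline category (categories - 1 columns)."""
    # Steps 1-3: every category except the chosen baseline gets a dummy variable
    categories = [c for c in dict.fromkeys(values) if c != baseline]
    columns = {}
    for category in categories:
        # Steps 4-7: 1 for members of this group, 0 for everyone else,
        # so baseline members are 0 on every dummy variable
        columns[f"is_{category}"] = [1 if v == category else 0 for v in values]
    return columns

# Hypothetical data: class standing for six students, freshmen as baseline
standing = ["freshman", "sophomore", "junior", "senior", "freshman", "senior"]
dummies = dummy_code(standing, baseline="freshman")

for name, column in dummies.items():
    print(name, column)
# is_sophomore [0, 1, 0, 0, 0, 0]
# is_junior [0, 0, 1, 0, 0, 0]
# is_senior [0, 0, 0, 1, 0, 1]
```

Four categories give three dummy variables, and the two freshmen are 0 on all of them, which is what makes freshmen the comparison group for every coefficient.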
Moderation
• $Y_i = b_0 + b_1 A_i + b_2 B_i + b_3 AB_i + e_i$
• Basic regression equation with minor change: $AB_i$
• Outcome depends on
  • Intercept ($b_0$)
  • Score on variable A ($b_1 A_i$), and relationship between variable A and Y
  • Score on variable B ($b_2 B_i$), and relationship between variable B and Y
  • Interaction (multiplication) between scores on variables A and B ($b_3 AB_i$), and relationship between AB and Y

Moderation
• Moderator variables can be either categorical (low conscientiousness/high conscientiousness; male vs. female, etc.) or continuous (conscientiousness scores from 1-7)
• Categorical: can visualize interaction as two different regression lines, one for each group, which vary in slope (and possibly in intercept)

Moderation
• Continuous moderator: visualize in 3-dimensional space: more complex relationship between moderator and predictor variable
  • Slope of one predictor changes as values of moderator change
  • Pick a few values of moderator and generate graphs for easier interpretation

Moderation
• Prior to analysis, need to grand-mean center predictors
• Doing so makes interactions easier to interpret (why we center)
  • Regression coefficients show relationship between predictor and criterion when other predictor equals 0
  • Not all variables have meaningful 0 in context of study: age, intelligence, etc.
  • Could end up trying to interpret effects based on non-existing score (such as the level of job performance for person with intelligence score of 0)
  • Once interactions are factored in, interpretation becomes increasingly problematic
• Also reduces nonessential multicollinearity (i.e., correlations due to the way that the variables were scaled)

Moderation
• Grand mean centering: subtract mean of variable from all scores on that variable
• Centered variables used to calculate interaction term
  • Creates interaction variable
• Don’t center categorical predictors
  • Just make sure it is scaled 0 and 1
• Don’t center outcome/dependent variable
  • Centering only applies to predictors

Moderation
• For centered variable, value of 0 represents the mean value on the predictor
• Since transformation is linear, doesn’t change regression model substantially
• Interpretation of regression coefficients easier
  • Without centering: coefficient for moderator = how outcome changes with one-unit increase in moderator when predictor = 0
  • With centering: coefficient for moderator = how outcome changes with one-unit increase in moderator when predictor = mean

Grand mean centering

Moderation
• Steps for moderation in SPSS:
  1. Grand-mean center continuous predictor(s)
  2. Enter both predictor variables into first block
  3. Enter interaction term in second block
     • Doing it this way makes it easier to look at R² change
  4. Run regression and look at results
  5. If interaction term significant:
     • Categorical predictor: line graph between predictor and DV, with a different line for each category
     • Continuous predictor: simple slopes analysis

Simple slopes analysis
• Basic idea: values of outcome (Y) calculated for different levels of predictor and moderator: low, medium, and high
  • Usually defined as -1 SD, mean, +1 SD
• Recommend using online calculator for these (can be done by hand, but it’s a pain)
  • http://www.jeremydawson.co.uk/slopes.htm
  • http://quantpsy.org/interact/mlr2.htm

Simple slopes analysis
• Example:
  • Aggression = 39.97 + (.17*video) + (.76*callous) + (.027*(video*callous))
  • For 1 SD below the mean on video games at low levels (1 SD below the mean) of callous unemotionality: 39.97 + (.17*-6.9622) + (.76*-9.6177) + (.027*(-6.9622*-9.6177)) ≈ 33.29
  • Would do this 8 more times so that you had values of aggression at low, medium, and high levels of callous unemotionality and video game playing

Simple slopes analysis

Creating interaction term

Entering variables

Output

Simple slopes analysis

[Figure: simple slopes plots; surviving labels: Low Callous; Dependent variable (y-axis, 0 to 5); Husbands, Wives (lines); Low Attractiveness, High Attractiveness (x-axis)]

Research Designs Comparing Groups

Week 8

Quasi-experimental designs

Quasi-experiments
• No random assignment
• Goal is still to investigate relationship between proposed causal variable and an outcome
• What they have:
  • Manipulation of cause to force it to happen before outcome
  • Assess covariation of cause and effect
• What they don’t have:
  • Limited in ability to rule out alternative explanations
  • But design features can improve this

One group posttest only design
  X  O1
• Problems:
  • No pretest: did anything change?
  • No control group: what would have happened if IV not manipulated?
  • Doesn’t control for threats to internal validity

One group posttest only design
• Example: An organization implemented a new pay-for-performance system, which replaced its previous pay-by-seniority system.
A researcher was brought in after this implementation to administer a job satisfaction survey

One group pretest-posttest design
  O1  X  O2
• Adding pretest allows assessment of whether change occurred
• Major threats to internal validity:
  • Maturation: change of participants due to natural causes
  • History: change due to historical event (recession, etc.)
  • Testing: desensitizing participants to the test, using the same pretest for posttest

One group pretest-posttest design
• Example: An organization wanted to implement a new pay-for-performance system to replace its pay-by-seniority system. A researcher was brought in to administer a job satisfaction questionnaire before the pay system change, and again after the pay system change

Removed treatment design
  O1  X  O2  O3  [X removed]  O4
• Treatment given, and then removed
• 4 measurements of DV: 2 pretests, and 2 posttests
• If treatment affects DV, DV should go back to its pre-treatment level after treatment removed
  • Unlikely that threat to validity would follow this same pattern
• Problem: assumes that treatment can be removed with no lingering effects
  • May not be possible or ethical (e.g., ethical conundrum: taking away schizophrenic patients’ medicine treatment; possibility conundrum: therapy for depression, benefits would still be experienced)

Removed treatment design
• Example: A researcher wanted to evaluate whether exposure to TV reduced memory capacity. Participants first completed a memory recall task, then completed the same task while a TV played a sitcom in the background. After a break, participants again completed the memory task while the TV played in the background, then completed it again with the TV turned off.
Repeated treatment design
  O1  X  O2  [X removed]  O3  X  O4
• Treatment introduced, removed, and then re-introduced
• Threat to validity would have to follow same schedule of introduction and removal: very unlikely
• Problem: treatment effects may not go away immediately

Repeated treatment design
• Example: A researcher wanted to investigate whether piped-in classical music decreased employee stress. She administered a stress survey, and then piped in music. One week later, stress was measured again. The music was then removed, and stress was measured again one week later. The music was then piped in again, and stress was measured a final time one week later.

Posttest-only with nonequivalent groups
  NR  X  O1
  NR      O2
• Participants not randomly assigned to groups
• One group receives treatment, one does not
• DV measured for both groups
• Big validity threat: selection

Posttest-only with nonequivalent groups
• Example: An organization wants to implement a policy against checking email after 6pm in an effort to reduce work-related stress. The organization assigns their software development department to implement the new policy, while the sales department does not implement the new policy. After 2 months, employees in both departments complete a work stress scale.

Untreated control group with pretest and posttest
  NR  O1  X  O2
  NR  O1      O2
• Pretest and posttest data gathered on same experimental units
• Pretest allows for assessment of selection bias
• Also allows for examination of attrition

Untreated control group with pretest and posttest
• Example: A community is experimenting with a new outpatient treatment program for meth addicts. Current treatment recipients had the option to participate (experimental group) or not participate (control group). Current daily use of meth was collected for all individuals. Those in the experimental group completed the new program, while those in the control group did not.
Following the program, participants in both groups were asked to provide estimates of their current daily use of meth.

Switching replications
  NR  O1  X  O2      O3
  NR  O1      O2  X  O3
• Treatment eventually administered to group that originally served as control
• Problems:
  • May not be possible to remove treatment from one group
  • Can lead to compensatory rivalry

Switching replications
• Example: An organization implemented a new reward program to reduce absences. After a month of no absences, employees were… The manufacturing organization from the previous scenario removed the reward program from the Ohio plant, and implemented it in the Michigan plant. Absences were gathered and compared 1 month later.

Reversed-treatment control group
  NR  O1  X+  O2
  NR  O1  X-  O2
• Control group given treatment that should have opposite effect of that given to treatment group
• Rules out many potential validity threats
• Problems: may not be feasible (pay/performance, what’s the opposite?) or ethical

Reversed-treatment control group
• Example: A researcher wanted to investigate the effect of mood on academic test performance. All participants took a pre-test of critical reading ability. The treatment group was put in a setting which stimulated positive mood (calming music, lavender scent, tasty snacks) while the control group was put in a setting which stimulated negative mood (annoying children’s show music, sulfur scent, no snacks). Participants then completed the critical reading test again in their respective settings.

Randomized experimental designs
• Participants randomly assigned to groups
• Random assignment: any procedure that assigns units to conditions based on chance alone, where each unit has a nonzero probability of being assigned to any condition
• NOT random sampling!
  • Random sampling concerns how sample obtained
  • Random assignment concerns how sample assigned to different experimental conditions

Why random assignment?
• Researchers in natural sciences can rigorously control extraneous variables
• People are tricky. Social scientists can’t exert much control.
  • Can’t mandate specific level of cognitive ability, exposure to violent TV in childhood, attitude towards women, etc.
• Random assignment to conditions reduces chances that some unmeasured third variable led to observed covariation between presumed cause and effect

Why random assignment?
• Example: what if you assigned all participants who signed up in the morning to be in the experimental group for a memory study, and all those who signed up in the afternoon to be in the control group?
  • And those who signed up in the morning had an average age of 55 and those who signed up in the afternoon had an average age of 27?
  • Could difference between experimental and control groups be attributed to manipulation?

Random assignment
• Since participants randomly assigned to conditions, expectation that groups are equal prior to experimental manipulations
  • Any observed difference attributable to experimental manipulation, not third variable
• Doesn’t prevent all threats to validity
  • Just ensures they’re distributed equally across conditions so they aren’t confounded with treatment

Random assignment
• Doesn’t ensure groups are equal
  • Just ensures expectation that they are equal
• No obvious reason why they should differ
  • But they still could
  • Example: By random chance, average age of control group may be higher than average age of experimental group

Random assignment
• Random assignment guarantees equality of groups, on average, over many experiments
  • Does not guarantee that any one experiment which uses random assignment will have equivalent groups
• Within any one study, groups likely to differ due to sampling error
  • But, if random assignment process was conducted over infinite number of groups, average of all means for treatment and control groups would be equal

Random assignment
• If groups do differ despite random assignment, those differences
will affect results of study
• But any differences are due to chance, not to the way in which individuals were assigned to conditions
• Confounding variables unlikely to correlate with treatment condition

Posttest-only control group design
R  X  O
R     O
• Random assignment to conditions (R)
• Experimental group given treatment/IV manipulation (X)
• Outcome measured for both groups (O)

Posttest-only control group design
• Example:
  • Participants assigned to control group (no healthy eating seminar) or treatment group (90-minute healthy eating seminar)
  • 6 months later, participants given questionnaire assessing healthy eating habits
  • Scores on questionnaire compared for control group and treatment group

Problems with posttest-only control group design
• No pretest
  • If attrition occurs, can't see if those who left were any different than those who completed study
  • No pretest makes it difficult to assess change on outcome

Pretest-posttest control group design
R  P  X  O
R  P     O
• Randomly assigned to conditions
• Given pretest (P) measuring outcome variable
• One group given treatment/IV manipulation
• Outcome measured for both groups
• Variation: can randomly assign after pretest

Pretest-posttest control group design
• Example:
  • Randomly assign undergraduate student participants to control group and treatment group
  • Give pretest on attitude towards in-state tuition for undocumented students
  • Control group watches video about history of higher education for 20 minutes, while treatment group watches video explaining challenges faced by undocumented students in obtaining college degree
  • Give posttest on attitude towards in-state tuition for undocumented students

Factorial designs
• Have 2 or more independent variables
• Naming logic: # of levels in IV1 x # of levels in IV2 x … # of levels in IV X
• 3 advantages:
  • Require fewer participants since each participant receives treatment related to 2 or more IVs
  • Treatment combinations can be evaluated
  • Interactions can be tested

Factorial designs
R  XA1B1  O
R  XA1B2  O
R  XA2B1  O
R  XA2B2  O
• For 2x2 design:
  • Randomly assign to conditions (there are 4)
  • Each condition represents 1 of 4 possible IV combinations
  • Measure outcome

Factorial designs
• Example:
  • 2 IVs of interest: room temperature (cool/hot) and noise level (quiet/noisy)
  • DV = number of mistakes made in basic math calculations
  • Randomly assign to 1 of 4 groups:
    • Quiet/cool
    • Quiet/hot
    • Noisy/cool
    • Noisy/hot
  • Measure number of mistakes made in math calculations
  • Compare means across groups using factorial ANOVA

Factorial designs
• 2 things we can look for with these designs:
  • Main effects: average effects of IV across treatment levels of other IV
    • Did participants do worse in the noisy than quiet conditions?
    • Did participants do worse in the hot than cool conditions?
    • Main effect can be misleading if there is a moderator variable
  • Interaction: relationship between one IV and DV depends on level of other IV
    • Noise level positively related to number of errors made, but only if room hot

Within-subjects randomized experimental design
R  Order 1  Condition 1  O1  Condition 2  O2
R  Order 2  Condition 2  O1  Condition 1  O2
• Participants randomly assigned to either order 1 or order 2
• Participants in order 1 receive condition 1, then condition 2
• Participants in order 2 receive condition 2, then condition 1
• Having different orders prevents order effects
• Having participants in more than 1 condition reduces error variance

Within-subjects randomized experimental design
• Example:
  • Participants randomly assigned to order 1 or order 2
  • Participants in order 1 reviewed resumes with the applicant's picture attached and made hiring recommendations. They then reviewed resumes without pictures and made hiring recommendations.
  • Participants in order 2 reviewed resumes without pictures and made hiring recommendations. They then reviewed resumes with the applicant's picture attached and made hiring recommendations.
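The random assignment to orders described in the within-subjects design above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `assign_orders`, the labels, and the choice to split participants into two equal-sized halves are assumptions made for the example, not part of the course material.

```python
import random


def assign_orders(participant_ids, seed=None):
    """Randomly assign participants to one of two condition orders.

    Hypothetical helper: shuffles the participants, then places the first
    half in order 1 and the remainder in order 2.
    """
    rng = random.Random(seed)  # a seed only makes the sketch reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)

    # Order 1 receives condition 1 then condition 2; order 2 is the reverse.
    sequences = {
        "order 1": ("condition 1", "condition 2"),
        "order 2": ("condition 2", "condition 1"),
    }

    half = len(shuffled) // 2
    assignment = {}
    for position, participant in enumerate(shuffled):
        order = "order 1" if position < half else "order 2"
        assignment[participant] = (order, sequences[order])
    return assignment
```

For example, `assign_orders(range(1, 9), seed=1)` returns a dictionary mapping each of 8 participants to an order label and the sequence of conditions that participant receives, with 4 participants in each order.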
Data analysis

With 2 groups
• Need to compare 2 group means to determine if they are significantly different from one another
• If groups independent, use independent samples t-test
  • That is, if participants in one group are different from the participants in the other group
• If repeated measures design, use repeated measures t-test

With 3 or more groups
• Still need to compare group means to determine if they are significantly different
• If only 1 IV, use a one-way ANOVA
• If 2 or more IVs, use a factorial ANOVA
• If groups are not independent, use repeated measures ANOVA

Design practice
• Research question:
  • Does answering work-related communication (emails, phone calls) after normal working hours affect work-life balance?
• Design BOTH a randomized experiment AND a quasi-experiment to evaluate your research question
• For each design (random and quasi):
  • Operationalize variables and develop a hypothesis(es)
  • Name and explain the experimental design as it will be used to test your hypothesis(es)
  • Name and explain one threat to internal validity in your design

Week 9

Comparing means
• 2 primary ways to evaluate mean differences between groups:
  • t-tests
  • ANOVAs
• Which one you use will depend on how many groups you want to compare, and how many IVs you have
  • 2 groups, 1 IV, 1 DV: t-test
  • 3 or more groups, 1 or more IVs, 1 DV: ANOVA
    • One-way ANOVA if only 1 IV
    • Factorial ANOVA if 2 or more IVs

t-tests
• Used to compare means on one DV between 2 groups
  • Do men and women differ in their levels of job autonomy?
  • Do students who take a class online and students who take the same class face-to-face have different scores on the final test?
  • Do individuals report higher levels of positive affect in the morning than they report in the evening?
  • Do individuals given a new anti-anxiety medication report different levels of anxiety than individuals given a placebo?
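The decision rules from the data analysis and comparing means slides above can be written as a small lookup function. This is a rough sketch only: `choose_test` and its parameters are hypothetical names, and the handling of cases the slides do not mention (fewer than 2 groups, or 2 groups with more than 1 IV) is an assumption.

```python
def choose_test(n_groups, n_ivs=1, repeated_measures=False):
    """Return the name of the mean-comparison test suggested by the slides.

    Hypothetical helper. Assumes a single DV throughout.
    """
    if n_groups < 2 or n_ivs < 1:
        raise ValueError("need at least 2 groups and at least 1 IV")

    # 2 groups, 1 IV, 1 DV: t-test
    if n_groups == 2 and n_ivs == 1:
        if repeated_measures:
            return "repeated measures t-test"
        return "independent samples t-test"

    # Otherwise an ANOVA; groups that are not independent
    # call for the repeated measures version.
    if repeated_measures:
        return "repeated measures ANOVA"
    if n_ivs == 1:
        return "one-way ANOVA"
    return "factorial ANOVA"
```

For instance, comparing job satisfaction for the same people on Friday and on Monday corresponds to `choose_test(2, repeated_measures=True)`, while the 2x2 temperature-by-noise study corresponds to `choose_test(4, n_ivs=2)`.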
t-tests
• 2 different options for t-tests:
  • Independent samples t-test: individuals in group 1 are not the same as individuals in group 2
    • Do self-reported organizational citizenship behaviors differ between men and women?
  • Repeated measures t-test: individuals in group 1 are the same as individuals in group 2
    • Do individuals report different levels of job satisfaction when surveyed on Friday than they do when surveyed on Monday?

A note on creating groups
• Beware of dichotomizing a continuous variable in order to make 2 groups
  • Example: everyone who scored a 50% or below on a test goes in group 1, and everyone who scored 51% or higher goes in group 2
• Causes several problems
  • People with very similar scores around cut point may end up in separate groups
  • Reduces statistical power
  • Increases chances of spurious effects

t-tests and the linear model
• t-test is just linear model with one binary predictor variable:
$$Y_i = b_0 + b_1 x_{1i} + e_i$$
• Predictor has 2 categories (male/female, control/experimental)
  • Dummy variable: 0 = baseline group, 1 = experimental/comparison group
  • $b_0$ is equal to mean of group coded 0
  • $b_1$ is equal to difference between group means

Rationale for t-test
• 2 sample means collected: need to see how much they differ
• If samples from same population, expect means to be roughly equivalent
  • Large differences unlikely to occur due to chance
• When we do a t-test, we compare difference between sample means to difference we would expect if null hypothesis was true (difference = 0)

Rationale for t-test
• Standard error = gauge of differences between means likely to occur due to chance alone
  • Small standard error: expect similar means if both samples from same population
  • Large standard error: expect somewhat different means even if both samples from same population
• t-test evaluates whether observed difference between means is larger than would be expected, based on standard error, if samples from same population

Rationale for t-test
• Top half of equation = model
• Bottom half of equation = error

Independent samples t-test
• Use when each sample contains different individuals
• Look at ratio of between-group difference in means to estimate of total standard error for both groups
• Variance sum law: variance of difference between 2 independent variables = sum of their variances
• Use sample standard deviations to calculate standard error for each population's sampling distribution

Independent samples t-test
• Assuming that sample sizes are equal:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$$
• Top half: difference between means
• Bottom half: square root of the sum of each sample's variance divided by its sample size

Independent samples t-test
• If sample sizes are not equal, need to use pooled variance, which weights variance for each sample to account for sample size differences
• Pooled variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Independent samples t-test
• Equation for independent samples t-test with different sample sizes:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$$
• Top half: differences between groups
• Bottom half: error

Paired samples/repeated measures t-test
• Use when same people are in both samples
• Average difference between scores at measurement 1 and measurement 2: $\bar{D}$
  • Shows systematic variation between measurements
• Difference that we would expect between measurements if null hypothesis true: $\mu_D$
  • Since null hypothesis says that difference = 0, this cancels out
• Measure of error = standard error of differences: $s_D / \sqrt{N}$

Paired samples/repeated measures t-test
$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{N}}$$
• $\mu_D = 0$ and cancels out (what we would expect to see if the null were true)

Assumptions of t-tests
• Both types of t-tests are parametric and assume normality of sampling distribution
  • For repeated measures, refers to sampling distribution of differences
• Data on DV have to be measured at interval level
  • Can't be nominal or ordinal
• Independent samples t-test assumes variances of each population equivalent (homogeneity of variance)
  • Also assumes scores in each sample independent of scores in other sample

Assumptions of t-tests
• In SPSS, the independent samples t-test output automatically includes Levene's test
  • If Levene's not significant, homogeneity of variance assumption met: interpret first line of output (equal variances assumed)
  • If Levene's is significant, homogeneity of variance assumption not met: interpret second line of output (equal variances not assumed)

Independent samples t-test example
• DV = number of items skipped on ability test
• Group 1: took test in unproctored setting
• Group 2: took test in proctored setting

Independent samples t-test
• Need to report effect size
• Can convert to r:
$$r = \sqrt{\frac{7.65^2}{7.65^2 + 1642.492}} = .184$$
  • Values taken from Slide 21, Independent samples t-test (proctored v. unproctored)

Independent samples t-test
• More commonly use d:
$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_2}$$
• d = (.56 - .23)/1.431 = 0.23
  • Values taken from Slide 21, Independent samples t-test (proctored v. unproctored)
• Note on d: book shows d calculation using only 1 sd
  • In practice, more common to use pooled standard deviation
• Interpretation (Cohen, 1988): .20 = small, .50 = medium, .80 = large
• Negative d means that $\bar{X}_2$ is larger than $\bar{X}_1$

Repeated measures t-test example
• DV = perceptions of procedural justice
• Measurement 1: participants took one type of Implicit Association Test (task-switching ability)
• Measurement 2: participants took traditional cognitive ability test (WPT-Q)

Repeated measures t-test effect sizes
• Still need to calculate effect sizes
• Problem with r in repeated measures t-test: tends to over-estimate effect size
• Better off using d with repeated measures designs: better estimate of effect size
• Formula for repeated measures d: $d = (\bar{D} - \mu_D)/s$

Comparing the t-tests
• If you have the same people in both groups, ALWAYS use repeated measures t-test (or you violate one of the assumptions of the independent t-test)
  • Non-independence of errors violates assumptions of independent samples t-test
• Power is higher in repeated measures t-test
  • Reduces error variance by quite a bit since same participants are in both samples

One-way ANOVA
• ANOVA = analysis of variance
• One-way ANOVA allows us to compare means on a single DV across more than 2 groups

Why we need ANOVA
• Doing multiple t-tests (control vs. group 1, control vs. group 2, etc.) on data inflates the Type I error rate beyond acceptable levels
• Familywise error rate assuming α = .05 for each test: $1 - (.95)^n$
  • n = number of comparisons being made
  • So, with 3 comparisons, overall α = .143
  • With 4 comparisons, overall α = .185

ANOVA and the linear model
• Mathematically, ANOVA and regression are the same thing!
• ANOVA output: F-ratio: comparison of systematic to unsystematic variance
  • Same as F ratio in regression: shows improvement in prediction of outcome gained by using model as compared to just using mean
• Only difference between ANOVA and regression: predictor is categorical variable with more than 2 categories
  • Exactly the same as using dummy variables in regression
  • Linear model with # of predictors equal to number of groups - 1

ANOVA and the linear model
• Intercept ($b_0$) will be equal to the mean of the baseline group (group coded as 0 in all dummy variables)
• Regression coefficient $b_1$ will be equal to the difference in means between baseline group and group 1
• Regression coefficient $b_2$ will be equal to the difference in means between baseline group and group 2

F ratio
$$F = \frac{\text{systematic variance}}{\text{unsystematic variance (error)}}$$
• Systematic variance, in ANOVA, is mean differences between groups
• Null hypothesis: group means are same
  • In this case, systematic variance would be small
  • Thus, F would be small

ANOVA logic
• Simplest model we can fit to data is grand mean (of DV)
• We try to improve on this prediction by creating a more complex model
  • Parameters include intercept ($b_0$) and one or more regression coefficients ($b_1$, $b_2$, etc.)
  • Bigger regression coefficients = bigger differences between groups
• If between-group differences large, model better fit to data than grand mean
• If model fit is better than grand mean, then between-group differences are significant

Total sum of squares (SST)
$$SS_T = \sum_i (x_i - \bar{x}_{grand})^2$$
• This shows the total amount of variation within the data
• Grand mean on DV subtracted from each observation's value on DV
• Total degrees of freedom for SST: N - 1

Model sum of squares (SSM)
$$SS_M = \sum_g n_g (\bar{x}_g - \bar{x}_{grand})^2$$
• This shows how much variance the linear model explains
• Calculate difference between mean of each group ($\bar{x}_g$) and grand mean, square this value (each value), then multiply it by the number of participants in the group ($n_g$)
• Add the values for each group together
• Degrees of freedom: k - 1, where k is number of groups

Residual sum of squares (SSR)
$$SS_R = \sum_g \sum_i (x_{ig} - \bar{x}_g)^2$$
• This shows differences in scores that aren't explained by model (i.e., aren't explained by between-group differences)
• Calculated by subtracting the group mean from each score, squaring this value, and then adding all of the values together
• Degrees of freedom = N - k, where k = number of groups and N is overall sample size

Mean squares
• To get a mean square value, divide sum of squares value by its degrees of freedom
• Mean square model (MSM) = SSM/(k - 1)
• Mean square residual (MSR) = SSR/(N - k)

F ratio
• Calculated using mean square values: $F = MS_M / MS_R$
• Degrees of freedom for F: (k - 1), (N - k)
• If F is statistically significant, group means differ by more than they would if null hypothesis were true
• F is omnibus test: only tells you whether group means differ significantly: there's a difference somewhere
  • Doesn't tell you which means differ from one another
  • Need post-hoc tests to determine this

Post-hoc tests
• Pairwise comparisons to compare all groups to one another
• All incorporate correction so that Type I error rate is controlled (at about .05)
• Example: Bonferroni correction (very conservative): divide significance level (usually .05) by number of comparisons, giving α/n, where n is number of comparisons
  • So, if we have 3 groups (3 pairwise comparisons) and we want to keep α at .05 across all comparisons, each comparison will have α = .05/3 = .017

Post-hoc tests
• Lots of options for post hoc tests in SPSS
• Some notes on the more common ones:
  • Least significant difference (LSD): doesn't control Type I error very well
  • Bonferroni's and Tukey's: control Type I error rate, but lack statistical power (too conservative)
  • REGWQ: controls Type I error and has high power, but only works if sample sizes equal across groups
  • Games-Howell: less control of Type I error, but good for unequal sample sizes and unequal variance across groups
  • Dunnett's T3: good control of Type I error, works if unequal variance across groups

Assumptions of ANOVA
• Homogeneity of variance: can check with Levene's test
  • If Levene's significant and homogeneity of variance assumption violated, need to use corrected F ratio
    • Brown-Forsythe F
    • Welch's F
• Provided group sizes equal, ANOVA works ok if normality assumption violated somewhat
• If group sizes not equal, ANOVA biased if data non-normal
• Non-parametric alternative to ANOVA: Kruskal-Wallis test (book covers in detail)

Steps for doing ANOVA

Effect sizes for ANOVA
• $R^2$: SSM/SST
  • When applied to ANOVA, value called eta squared, $\eta^2$
  • Somewhat biased because it's based on sample only: doesn't adjust for looking at effect size in population
• SPSS reports partial eta squared, but only for factorial ANOVA: SSB/(SSB + SSE)
• Better effect size measure for ANOVA: omega-squared ($\omega^2$; SPSS will not calculate it for you)
$$\omega^2 = \frac{SS_M - (df_M) MS_R}{SS_T + MS_R}$$

One-way ANOVA in SPSS
• IV: Counterproductive work behavior (CWB) scale that varied in its response anchors: control, infrequent, & frequent
• DV: self-reported CWB

One-way ANOVA in SPSS
• Calculating omega-squared:
$$\omega^2 = \frac{21.49 - 2(2.29)}{664.996 + 2.29} = .025$$
• Suggestions for interpreting $\omega^2$:
  • .01 = small
  • .06 = medium
  • .14 = large
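The sums of squares, mean squares, F ratio, and effect sizes defined above can be computed directly from raw scores. The sketch below is a minimal pure-Python illustration, not a replacement for SPSS output: the function names `one_way_anova` and `omega_squared` and the dictionary keys are assumptions made for the example, and no p-value is computed.

```python
def omega_squared(ss_model, df_model, ms_residual, ss_total):
    """Omega-squared: (SSM - dfM * MSR) / (SST + MSR)."""
    return (ss_model - df_model * ms_residual) / (ss_total + ms_residual)


def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a list of groups of scores.

    Hypothetical helper: `groups` is a list of lists, one list per group.
    """
    scores = [x for group in groups for x in group]
    n_total = len(scores)              # N, overall sample size
    k = len(groups)                    # number of groups
    grand_mean = sum(scores) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # SST: each score minus the grand mean, squared and summed
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    # SSM: group size times squared (group mean - grand mean), summed
    ss_model = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # SSR: each score minus its own group mean, squared and summed
    ss_residual = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )

    df_model = k - 1
    df_residual = n_total - k
    ms_model = ss_model / df_model
    ms_residual = ss_residual / df_residual

    return {
        "ss_total": ss_total,
        "ss_model": ss_model,
        "ss_residual": ss_residual,
        "df_model": df_model,
        "df_residual": df_residual,
        "f_ratio": ms_model / ms_residual,
        "eta_squared": ss_model / ss_total,
        "omega_squared": omega_squared(ss_model, df_model, ms_residual, ss_total),
    }
```

Plugging the values from the slide into the helper, `omega_squared(21.49, 2, 2.29, 664.996)`, gives approximately .025, matching the hand calculation. For a made-up data set such as `one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])`, SSM and SSR add up to SST, as the decomposition requires.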