Ten Difference Score Myths
By Jeffery R. Edwards
Presented by Chelsea Hutto

Difference Scores (DS)
- Typically used to represent the similarity between two constructs
- Widely used in studies of person-job fit, similarity between employee and organizational values, match between employee expectations and experiences, and agreement between performance ratings
- Suffer from many methodological problems

Polynomial Regression Analysis (PRA)
- These problems can be avoided by using PRA
- Uses the components of difference scores along with higher-order terms to represent relationships of interest in congruence research
- Treats difference scores as statements of hypotheses to be tested empirically
- Also supported by Cafri et al. (2009)

Misconceptions Regarding Problems with DS

Myth 1: The Problem with Difference Scores Is Low Reliability
- Low internal consistency reliability has been viewed as the only serious problem with DS
- Reliability of any measure is ultimately an empirical matter
- The problem is not only whether DS are reliable in an absolute sense but also whether they are more reliable than the alternatives
- Even adequate reliability does not solve the other issues

Myth 2: Difference Scores Provide Conservative Statistical Tests
- Statistical tests based on DS are labeled as conservative
- Sometimes seen as appropriate for exploratory research
- DS are also likely to invite false-positive conclusions, such that statistical tests effectively become liberal
- Such tests have not been scrutinized with PRA
- Conservatism usually corresponds to effect sizes that are biased downward and Type I error rates that are minimized at the cost of Type II error
- Need a balance between liberal and conservative

Alternatives to DS That Are Themselves Problematic

Myth 3: Measures That Elicit Direct Comparisons Avoid Problems with Difference Scores
- Merely shifts the responsibility for creating a DS from the researcher to the respondent, who must calculate the response, which introduces error
- A direct comparison is double-barreled: it combines two distinct concepts into a single score
- Construct validity of direct comparisons is questionable

Myth 4: Categorized Comparisons Avoid Problems with DS
- Subgroups are created based on the congruence between two component measures to avoid problems with DS
- Some researchers even say it could solve reliability issues
- Creates an illusion
- Accentuates the loss of information and the reduction in explained variance
- Just makes things worse

Myth 5: Product Terms Are Viable Substitutes for DS
- Some turn to product terms tested hierarchically in multiple regression analysis as a last resort
- A product term captures the interaction between two variables
- Does not represent the effects of congruence for continuous measures

Myth 6: Hierarchical Analysis Provides Conservative Tests of DS
- Some studies statistically control for component measures before estimating the effects of DS
- Characterized as conservative
- Components are controlled when testing interactions using product terms
- Does not yield conservative tests of DS; instead it alters the relationships DS are intended to capture

Misunderstandings or Misguided Criticisms of PRA

Myth 7: PRA Is an Exploratory, Empirically Driven Procedure
- Claimed that polynomial regression capitalizes on sample-specific variance to maximize the amount of variance explained
- Primary goal of PRA is to test hypotheses derived from theories of congruence
- Also provides an explicit test of this hypothesis, whereas using an algebraic difference score incorporates this hypothesis as an untested assumption
- DS allow congruence hypotheses to evade empirical scrutiny
- The result is a lack of the evidence necessary to confirm or reject hypotheses
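The point under Myth 7, that an algebraic difference score builds a hypothesis in as an untested assumption, can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not Edwards's own procedure: the simulated data, the variable names, and the hand-rolled least-squares helper `ols` are all hypothetical. Regressing an outcome Z on the difference X - Y is equivalent to regressing Z on X and Y with the two coefficients forced to be equal in size and opposite in sign, so fitting both models lets that constraint be tested rather than assumed.

```python
import random


def ols(predictors, outcome):
    """Least-squares fit with an intercept, solved via the normal equations.

    predictors: list of equal-length lists, one per predictor.
    Returns (coefficients with the intercept first, residual sum of squares).
    """
    n = len(outcome)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'z].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * z for r, z in zip(rows, outcome))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    rss = sum(
        (z - sum(b * v for b, v in zip(coefs, r))) ** 2
        for r, z in zip(rows, outcome)
    )
    return coefs, rss


# Hypothetical data: the true effects of X and Y are NOT equal and opposite,
# so the difference-score constraint (b_X = -b_Y) is false by construction.
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [1.0 * xi - 0.2 * yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

# Constrained model: Z regressed on the algebraic difference score X - Y.
diff = [xi - yi for xi, yi in zip(x, y)]
b_con, rss_con = ols([diff], z)

# Unconstrained model: Z regressed on the components X and Y.
b_unc, rss_unc = ols([x, y], z)

# F statistic for the single constraint b_X = -b_Y
# (1 numerator df, n - 3 denominator df).
f_stat = (rss_con - rss_unc) / (rss_unc / (n - 3))

print("difference-score slope:", round(b_con[1], 2))
print("component slopes (X, Y):", round(b_unc[1], 2), round(b_unc[2], 2))
print("F for the constraint:", round(f_stat, 1))
```

A large F statistic here means the constraint implied by the difference score is rejected for these simulated data. The difference-score model on its own would never reveal this, because it reports a single slope and leaves the constraint untested.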
Myth 8: Polynomial Regression Suffers from Multicollinearity
- Concerns about multicollinearity between lower-order and higher-order terms are unfounded

Myth 9: Higher-Order Terms Do Not Enhance the Understanding of Congruence
- Interpretation of higher-order terms can be difficult, but such difficulties arise from attempts to interpret the coefficients on higher-order terms individually
- Can be avoided by using response surfaces as the intermediary between congruence hypotheses and polynomial regression coefficients

Myth 10: PR Eliminates the Concept of Congruence
- Comes from the assumption that a DS represents a concept that is distinct from its components
- Argued that DS and their component measures are not conceptually interchangeable
- Because a DS is calculated from its components, it cannot represent a construct that is conceptually or operationally distinct from them

Assumptions
- All can be tested empirically, so why argue?
- PRA has its limitations
- Still more comprehensive and conclusive than the information obtained from difference scores

Things I Have Learned (So Far)
By Jacob Cohen

Some Things You Learn Aren't So
- Proper sample size of 30 cases per group when comparing groups
- Any lower than 30 required specialized handling with "small sample statistics"
- Versus the critical-ratio approach
- Can lead to only a fifty-fifty chance of getting significant results

Less Is More
- Should be studying few IVs and even fewer DVs
- Which DVs are real and which are due to chance?
- As the number of IVs increases, the chance of their redundancy with regard to criterion relevance also increases

Reporting Numerical Results
- What does r = .12345 really mean?
- Extra decimal places serve as a distraction from the meaningful leading digits

Simple Is Better
- Reporting of data and representation
- Reports do not usually make it possible for most of us, or for consumers of products, to actually see and understand the distribution
- Need for graphic representation
- Computers and statistical packages
- Loss of contact with the data
- Idea that knowledge of statistics isn't necessary to use them

Compositing of Values
- Beta weights vs. unit weights
- Beta weights generate a higher correlation than any other weights
- The catch: they are only guaranteed to be better than unit weights for the sample on which they were determined
- Circumstances in which beta weights are better are very rare
- Unit weights are usually more practical (+1 for positively related predictors, -1 for negatively related predictors, and 0 for poorly related predictors)
- Work well outside of multiple regression when we have criterion data
- Better on standardized scores for our purposes than those generated by a program

The Fisherian Legacy
- Based on the principle that science proceeds only through inductive inference, which is achieved by rejecting the null hypothesis, usually at the .05 level
- Misinterpretation of the yes/no decision feature
- Research is frequently designed to produce decisions, although things are not always so clearly decision oriented
- Null hypothesis: any statement about a state of affairs in a population, usually the value of a parameter, frequently zero
- It is called a null hypothesis because the strategy is to nullify it or because it means "nothing doing"

The Dreaded .05 Level
- Basis for decision: the cutoff level
- Can lead to anything from data fudging, to massively altering data, to dropping cases where there "must have been errors"

The Null Hypothesis Tests Us
- Results do not tell us the truth of the null hypothesis; for that we must turn to Bayesian statistics, in which probability is not a relative frequency but a degree of belief
- What it does tell us is the probability of the data given the truth of the null, which is NOT THE SAME THING

p Value
- Because the p value does not tell us the probability that the null is true, it cannot tell us the probability that the research hypothesis is true
- Rejection of the null gives us no basis for estimating the probability that a replication of the research will again result in rejecting the null
- True meaning of statistical significance: the effect is not nil, and nothing more
- Temptation

Problems with NH
- If the NH is almost always false, what's the big deal about rejecting it?
- Also supported by Trafimow and Rice (2009)
- If the test exceeded the critical value, you could conclude that the null is false, but if you fell short of that value you couldn't conclude it was true
- Reality: can't conclude anything
- If the null is false, it has to be false to some degree

Power Analysis
- Based on four parameters:
  - Alpha significance criterion
  - Sample size
  - Population effect size
  - Power of the test
- Made it possible to "prove" null hypotheses by showing that the effect is of no more than negligible or trivial size
- Must consider the magnitude of effects

How To Use Statistics
- Use graphic and numerical analyses in ways in which we can understand them
- Plan the research
- Must have a credible set of specifications or discover that the research is not possible
- Use effect size measures, which include mean differences, correlations, and squared correlations of all kinds
- All of which will lead you to a sample effect size

How To Use Statistics (continued)
- After finding the sample effect size, attach a p value or, better, a confidence interval
- Most important rule: the judgment of the scientist

Take Home Message
- A single piece of research doesn't settle an issue once and for all
- Only a successful future replication in the same and different settings provides an approach to settling the issue
- .05 should not be a cliff, but a reference point along the possibility-probability continuum
- Things take time
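The four power-analysis parameters listed earlier (alpha, sample size, population effect size, and power) are tied together so that any one of them follows from the other three. The sketch below is a minimal illustration, assuming a two-sided comparison of two equal-sized groups and a normal approximation in place of the exact t distribution. The function names and the standardized effect size d = 0.5 are illustrative choices, not taken from Cohen's paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two group means.

    d is the standardized population mean difference. Uses a normal
    approximation, so results differ slightly from exact t-test power.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    upper = 1 - _STD_NORMAL.cdf(z_crit - shift)  # reject in the upper tail
    lower = _STD_NORMAL.cdf(-z_crit - shift)     # reject in the lower tail
    return upper + lower


def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while power_two_groups(d, n, alpha) < target:
        n += 1
    return n


# Power with the traditional 30 cases per group and a hypothetical d = 0.5.
print(round(power_two_groups(0.5, 30), 2))

# Per-group sample size needed for power of .80 at the same effect size.
print(n_for_power(0.5))
```

Under these assumptions, 30 cases per group gives power of roughly one half, which is consistent with the "fifty-fifty chance" noted under Some Things You Learn Aren't So. Reaching power of .80 takes about twice as many cases per group.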
The Earth Is Round (p < .05)
By Jacob Cohen

Problems with Null Hypothesis
- Does not tell us what we want to know
- What we want to know: "Given these data, what is the probability that the NH is true?"
- What it really says: "Given that the NH is true, what is the probability of these (or more extreme) data?"

The Permanent Illusion
- Misapplication of deductive syllogistic reasoning
- Invalid Bayesian interpretation
- Mistaken belief that the level of significance at which the NH is rejected (.05) is the probability that it is correct, or at least that it is of low probability

Why P(D|Ho) ≠ P(Ho|D)
- P(D|Ho): when Ho is tested, the probability that the data could have arisen if Ho were true
- The real issue is P(Ho|D), the inverse probability
- The probability that Ho is true given the data
- Reason why we conduct statistical tests: to be able to reject Ho because of its unlikelihood

Posterior Probability
- Available only through Bayes's theorem
- Have to know the probability of the NH before the experiment, the "prior" probability P(Ho)
- Problem: we do not normally know this
- Can be done through Bayesian statistics by positing a prior probability or a distribution of probabilities
- Extremely unreliable because of the use of different prior probabilities (Huysamen, 2005)

Illusion of Attaining Improbability
- Also known as the Bayesian Id's Wishful Thinking Error
- Extremely easy to make
- Made by 68 out of 70 academic psychologists studied by Oakes (1986, pp. 79-82)
- Problem: belief that after a successful rejection of Ho it is highly probable that replications will also result in rejection of Ho
- Could not be farther from the truth
- Just because Ho is rejected does not mean that the theory is established
- Remember: the purpose of a scientific experiment is not to make decisions but to make adjustments to the degree of belief
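The gap between P(D|Ho) and P(Ho|D) can be made concrete with Bayes's theorem. The sketch below is a simplified illustration with hypothetical numbers, not an example from Cohen's paper. It treats D as "the test came out significant", so P(D|Ho) is alpha and P(D|H1) is the power of the test, and it assumes Ho and a single alternative H1 are the only possibilities. The function name `posterior_h0` is made up for this example.

```python
def posterior_h0(prior_h0, p_data_given_h0, p_data_given_h1):
    """P(Ho | D) from Bayes's theorem, assuming Ho and H1 are exhaustive."""
    joint_h0 = p_data_given_h0 * prior_h0
    joint_h1 = p_data_given_h1 * (1 - prior_h0)
    return joint_h0 / (joint_h0 + joint_h1)


# Hypothetical inputs: D is "the test came out significant".
alpha = 0.05  # P(D | Ho)
power = 0.50  # P(D | H1)

# The same significance level gives very different posteriors
# depending on the prior probability assigned to Ho.
for prior in (0.1, 0.5, 0.9):
    print(prior, round(posterior_h0(prior, alpha, power), 3))
```

With P(D|Ho) fixed at .05, P(Ho|D) runs from about .01 to about .47 as the prior moves from .1 to .9. This is why the significance level cannot be read as the probability that Ho is true, and why conclusions that rest on different priors can disagree.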
The Nil Hypothesis
- The null in Ho is taken to mean nil or zero
- That is, Ho is taken to state that the effect size is 0: that the population mean difference, correlation, or rater reliability is 0 (an Ho that can almost always be rejected, even with a small sample)
- Criticism: its use may be valid only for true experiments involving randomization (e.g., controlled clinical trials) or when any departure from pure chance is meaningful (e.g., laboratory experiments on clairvoyance)

What To Do
- Do not look for an alternative to NHST
- Must understand and improve our data before we can generalize from them
- Report effect sizes (ES) with confidence intervals
- Improve our measurement by reducing the unreliable and invalid parts of the variance in our measures
- Use informed judgment when using theories

Discussion Questions
- Why do you think many researchers still support NHST as it stands?
- Has psychology as a field become more focused on getting significant results than on completing the proper process of an experiment?
- Do you think this is more prominent in other fields?
- How can we as psychologists eliminate confusion about and misuse of NHST?

References
Cafri, G., Van den Berg, P., & Brannick, M. T. (2009). What have the difference scores not been telling us? A critique of the use of self-ideal discrepancy in the assessment of body image and evaluation of an alternative data-analytic framework. Assessment, 17(3), 361-376.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997-1003.
Edwards, J. R. (2001). Ten difference score myths. Organizational Research Methods, 4(3), 265-287.
Huysamen, G. K. (2005). Null hypothesis significance testing: Ramifications, ruminations, and recommendations. South African Journal of Psychology, 35(1), 1-20.
Trafimow, D., & Rice, S. (2009). A test of the null hypothesis significance testing procedure correlation argument. The Journal of General Psychology, 136(3), 261-269.