White Paper on Measurement 2.0

Table of Contents
1. Introduction
2. Background Knowledgebase
   a. Psychometrics
   b. Therapist Variability in Clinical Outcomes
   c. Benchmarking
3. Components
   a. Patient Self-Report Items and Questionnaires
   b. Technology
   c. Data Standardization and Analytical Methods
   d. Clinician Feedback System
4. Clinician Participation
5. Future Directions

1. Introduction

This white paper represents a collaboration of individuals from a wide range of public and private organizations and academic institutions. These individuals share a common commitment to improving treatment outcomes in behavioral health care through the routine use of patient-completed outcome questionnaires concurrently with treatment. The use of routine outcome assessment, combined with ongoing feedback to the treating clinicians, is often referred to as outcomes-informed care. The technology involved in outcomes-informed care illustrated in this paper is collectively referred to as "Measurement 2.0," indicating accumulated innovations that are a substantial departure from 20th century practices. Back in the 20th century, in "Measurement 1.0," symptom ratings were sold in identical forms to all users. Large-scale users, such as behavioral health organizations, had to buy copyrighted commercial test forms at prices from $1 to as high as $10 per administration, or pay for expensive remote scoring services. Test authors placed idiosyncratic limitations on how a test could be used, sometimes charging different prices for academic and commercial users. By the end of the 20th century, there were so many proprietary mental-health symptom checklists that it was hard to believe any single one was the best. The new approach to measurement, "Measurement 2.0," recognizes that client ratings of general distress are no longer rocket science.
Well-known psychometric procedures can now be applied inexpensively to any questionnaire to assess whether all items work in the way they were intended. If organizations need to create or revise items, the new items can be evaluated in less than a day of statistical analysis once the data are collected. Therefore, dependence on proprietary measures is no longer necessary. As an outcomes-informed care system, Measurement 2.0 comprises the following components: (a) patient self-report items and questionnaires, (b) data collection technology and procedures, (c) data standardization and analytical methods, and (d) a clinician feedback system. The general flow is as follows: the participating clinician asks the patient in therapy to complete a brief self-report questionnaire covering common and specific psychological symptoms and patient-therapist treatment alliance indicators. The self-report forms completed by the patients are then faxed to a central server that scans the information into a central database. Based on the overall collected data, the patients' responses are standardized so that they can be compared to one another. The standardized outcomes are then communicated back to the clinicians via a web-based toolkit, allowing the clinicians to track their clients' progress as well as reevaluate the current course of treatment with each patient. Measurement 2.0 is thus a compilation of several recent advancements in behavioral health assessment, notably psychometrics, research on therapist variability in clinical outcomes, and treatment effectiveness benchmarking methodology. This outcomes-informed care technology has several advantages over previous and other current outcomes-informed care technologies that have been investigated.
First, rather than using a questionnaire with a fixed set of items, it can flexibly incorporate items that are individually tailored to the specific needs of the organization, the therapist, and/or the individual client. Second, although such flexibility often makes comparisons across individual clients and organizations difficult, use of item response theory allows for comparisons across clients regardless of their individualized assessment. The individualized questionnaires that clients fill out therefore retain their utility at all levels: client, therapist, and organization.

2. Background Knowledgebase

a. Psychometrics

In the past, most outcomes-informed care initiatives in behavioral health were based on normed, copyrighted questionnaires originating in academic research. From the standpoint of organizations interested in implementing outcomes-informed care initiatives, relying on these measures was advantageous only if the organization had no access to large quantities of normative data. However, large-scale commercial outcomes-informed care initiatives generate data at rates and in amounts far surpassing any research study, so the size, quality, and relevance of the normative data quickly exceed the data available for any research-based instrument. Copyrighted questionnaires have additional disadvantages in large-scale outcomes-informed care initiatives. Aside from the obvious issue of cost, the end user (e.g., clients, therapists, organizations) has no control over the measured items. Test publishers are generally reluctant to permit modification of their questionnaires, including rewording items, reducing the number of items, and adding new items. Such prohibitions severely limit the end user's capacity to optimize the questionnaires to their measurement needs. These limitations of copyrighted questionnaires led larger managed care organizations to devise their own outcome measures.
With the aid of in-house and consulting statisticians, organizations can create items and questionnaires that are as sophisticated as research-based measures yet are flexible and fit their measurement needs. The Wellness Assessment (Adult and Youth versions) employed by Optum Health is an example of such a questionnaire. In addition, other organizations represented by the authors of this article continue to develop items and questionnaires based on sophisticated analyses incorporating both classical test theory and item response theory (IRT). This psychometric approach, which the authors sometimes refer to as the "psychometric torture test," was established at Vanderbilt University and was applied to develop the Peabody Battery (Bickman et al., 2006), a set of 10 free measures for providing feedback to clinicians about the progress of child clients. The purpose of the approach was to make item evaluation objective and public, so there is a clear evidence-based rationale when items are retained, revised, or dropped. The "torture test" is ecumenical. It views the three main approaches to psychometrics (classical test theory, factor analysis, and item response theory) as useful tools rather than as competing religions. Classical test theory includes a number of useful tools, such as Cronbach's alpha (Cronbach, 1951; Cronbach & Shavelson, 2004), item-total correlations, and the Spearman-Brown (Brown, 1910; Spearman, 1910) approximation. Factor analysis may be exploratory and/or confirmatory, and it offers tools such as scree plots, factor loadings, and, most importantly, model-fit statistics (Yu, 2002). Item response theory has two broad streams: one-parameter Rasch models (Bond & Fox, 2001; Rasch, 1980), which offer many tools for practical test construction, and multiple-parameter IRT (Embretson & Hershberger, 1999; Embretson & Reise, 2000), which offers great depth of features for model-based measurement research.
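As a concrete illustration of how these tools interoperate, here is a minimal sketch in Python. The function names and all numbers are our own illustrations, not part of any published battery: Cronbach's alpha and corrected item-total correlations from classical test theory, plus a maximum-likelihood Rasch ability estimate showing how pre-calibrated item difficulties place responses on a common scale.

```python
import math
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    totals = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], totals - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

def rasch_ability(responses, difficulties, iters=25):
    """MLE of person ability under the one-parameter Rasch model, given
    0/1 responses to items with known (pre-calibrated) difficulties.
    Undefined for all-0 or all-1 response patterns."""
    theta = 0.0
    for _ in range(iters):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, probs))  # observed - expected score
        hess = -sum(p * (1.0 - p) for p in probs)
        theta -= grad / hess  # Newton-Raphson step
    return theta
```

Because the Rasch estimate depends only on the difficulties of the items actually administered, two forms with different but jointly calibrated items yield abilities on the same scale, which is the property that makes individualized questionnaires comparable.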
The torture test provides the test developer with exhaustive figures of merit for every test item and other figures of merit for the test as a whole. All statistics are flagged by the statistician as good or bad. After a brief explanation, test developers who are not statisticians have been able to make evidence-based decisions on item revision and retention, and to make reasonable decisions about the trade-off between test length and reliability. Readers interested in the technical details will find examples in the free manual for the Peabody Battery (Bickman et al., 2006) or in two articles (Greco, Lambert, & Baer, 2008; Walker, Beck, Garber, & Lambert, 2008). The Psychological Assessment article contains the most technical detail.

b. Therapist Variability in Clinical Outcomes

The idea that clinicians differ in clinical competence and effectiveness is a hypothesis that has been entertained since perhaps the inception of psychotherapy itself (e.g., Luborsky, 1952; Rosenzweig, 1936). However, 20th century results were obscure until the statistical method of hierarchical linear modeling (HLM; Raudenbush & Bryk, 2002) became widely available to psychologists. This approach goes by various names, such as random regression (Hedeker & Gibbons, 2006), multilevel modeling (Snijders & Bosker, 1999), and mixed modeling (Pinheiro & Bates, 2000). By the late 1990s dependable software was available, e.g., from SAS (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006), HLM (Raudenbush & Bryk, 2002), SPSS (Bickel, 2007), Stata (Rabe-Hesketh & Skrondal, 2005), and S+ or R (Pinheiro & Bates, 2000). Behavioral health data are well suited to HLM because multiple patients are seen by the same clinician. HLM supports estimates of variance at both the client and therapist levels; a third level, e.g., clinic, may be modeled simultaneously as well.
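The variance decomposition that HLM provides can be illustrated with a simpler method-of-moments estimate on simulated nested data. Every number below is invented for illustration and is not drawn from the Measurement 2.0 database; a full HLM additionally handles unbalanced caseloads and covariates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated nested data: 50 therapists, 30 patients each.
# Therapist "effects" have SD 0.3; patient-level noise has SD 1.0.
n_therapists, n_patients = 50, 30
therapist_effect = rng.normal(0.0, 0.3, n_therapists)
outcomes = therapist_effect[:, None] + rng.normal(0.0, 1.0, (n_therapists, n_patients))

# One-way random-effects ANOVA estimator for a balanced design:
group_means = outcomes.mean(axis=1)
grand_mean = outcomes.mean()
ms_between = n_patients * ((group_means - grand_mean) ** 2).sum() / (n_therapists - 1)
ms_within = ((outcomes - group_means[:, None]) ** 2).sum() / (n_therapists * (n_patients - 1))

# Therapist-level variance component and its share of total variance (ICC):
var_therapist = max((ms_between - ms_within) / n_patients, 0.0)
icc = var_therapist / (var_therapist + ms_within)
```

The point is only that nesting patients within therapists lets the therapist-level variance be separated from patient-level noise; dedicated mixed-model software (e.g., MixedLM in statsmodels, or the SAS, SPSS, Stata, and R procedures cited above) generalizes this to unbalanced data.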
Before user-friendly software was available for HLM, clinical trials and treatment evaluation studies ignored the problem of nested data. As a result, they likely overestimated the effect of the type of treatment relative to the effect of who administered it (Wampold & Serlin, 2000; Wampold, 2001; Kim, Wampold, & Bolt, 2006). Recent analyses have since revealed that the percentage of variance in treatment outcomes attributable to therapists far exceeds the variance due to the type of treatment (Luborsky et al., 1986; Crits-Christoph et al., 1991; Crits-Christoph & Mintz, 1991; Wampold & Serlin, 2000; Wampold, 2001; Kim, Wampold, & Bolt, 2006). In addition, almost three decades of randomized clinical trials and meta-analyses of these trials have consistently supported the "dodo bird effect": all bona fide psychological therapies appear to have similar treatment effects (Smith & Glass, 1977; Smith & Glass, 1980; Shapiro & Shapiro, 1982; Robinson, Berman, & Neimeyer, 1990; Wampold, Mondin, Moody, & Ahn, 1997; Ahn & Wampold, 2001; Wampold, 2001; Luborsky et al., 2002; Lambert & Ogles, 2004). Further, such effects have been found in studies investigating the effects of antidepressants (Kim, Wampold, & Bolt, 2006; Wampold & Brown, 2005). In particular, Wampold and Brown found that for patients receiving medications and psychotherapy concurrently, the effect of medications was influenced by the clinician's competence in providing psychotherapy. These findings suggest that the contributions made by individual therapists far exceed the effects of the type of treatment. The current Measurement 2.0 database, which contains cases from multiple organizations, shows that therapists do indeed vary in their clinical outcomes. In addition, these differences appear to be stable over time. When patients are referred to clinicians who have had above-average outcomes in the past, these patients are more likely to report better-than-average outcomes.
In addition, the data have shown that clinicians who support and actively participate in outcomes-informed care initiatives report, on average, better results than those who participate sparsely. Therefore, in addition to the research evidence, it is in the best interest of effective clinicians to participate in outcomes-informed care initiatives, because participation would likely provide favorable evidence of their clinical effectiveness. On the other hand, for clinicians who are getting below-average results, efforts to improve their outcomes through use of the feedback tool would also likely lead to favorable results. Likewise, it is in the interest of employers and payers of mental health services to identify effective clinicians in order to increase patient access to them. Conversely, it is also in the interest of payers to identify clinicians whose clinical competence is less than optimal so as to implement strategies to improve their competence. Most importantly, outcomes-informed care initiatives are in the patients' best interest. Outcomes-informed care can help connect clients with the most effective therapists, and it can improve outcomes through the effects of outcome monitoring itself. What is good for clients can be a common ground for therapists, clinics, and insurers, whose existence is justified—or should be—by what is good for the client.

c. Benchmarking

To say whether particular outcomes are "good," we ask the question, "Compared to what?" Variability among therapists' effectiveness need not be a problem if even the least effective therapists are still effective enough based on some criterion. Conversely, variability among therapists matters little if even those considered relatively highly effective are ineffective based on some criterion. Therefore, the question of the criterion becomes central.
Evaluations that compare data obtained from outcomes-informed care initiatives and other real-world clinical settings against such criteria are called benchmarking. Over the past decade, a growing number of research articles have addressed methods of benchmarking behavioral health treatment outcomes (Azocar et al., 2007; Brown et al., 2002; Brown et al., 2005; Matumoto, Jones, & Brown, 2003; Minami et al., 2007; Minami et al., 2008). Traditionally, benchmarks established from research have been considered more valid than treatment outcome data based on real-world outcomes-informed care initiatives. Accordingly, a recent study compared data from outcomes-informed care initiatives against benchmarks derived from clinical trials and found that behavioral health services provided in a managed care context have treatment effects similar to those of well-controlled clinical trials (Minami et al., 2008). However, as previously mentioned, the data accumulated from large-scale outcomes-informed care initiatives such as Measurement 2.0 now far exceed the amount of data obtained from any clinical trial, and therefore it is expected that future benchmarks will be derived from these outcomes-informed care initiatives rather than from research. An additional consideration favoring benchmarks established from large-scale outcomes-informed care initiatives is that, unlike randomized clinical trials, real-world clinical data include patients who would be considered unfit for inclusion in clinical trials (e.g., low initial severity, co-morbid conditions). Whereas clinical trials pursue homogeneity in their patients, clinics in the real world do not choose the patients who come through their door. Therefore, the benchmarks currently established from randomized clinical trials are clearly not suited for patients in these clinical settings, and any benchmarks established need to accommodate this heterogeneity in the patient population.
Consequently, Measurement 2.0 extends the established benchmarking method to incorporate this heterogeneity in the patient population. We refer to the resulting indicator of treatment effect as the Severity Adjusted Effect Size (SAES), which is explained in more detail under Data Standardization and Analytical Methods.

3. Components

a. Patient Self-Report Items and Questionnaires

Although there have always been criticisms within academic institutions of patient self-report questionnaires and items as unreliable and invalid, such criticisms are difficult to substantiate on two grounds. First, there has been consistent evidence of larger treatment effects when outcomes are rated by clinicians (Lambert, Hatch, Kingston, & Edwards, 1986; Minami et al., 2007; Smith, Glass, & Miller, 1980), even when the evaluating clinician is not the one providing treatment. It is therefore conceivable that clinicians consider their patients to be doing better than the patients subjectively perceive themselves to be. Second, there has been a historical shift in behavioral health with regard to who has the "authority" to determine whether the patient improved. Whereas traditionally clinicians had sole authority, in recent years patients have been empowered to consider their own sense of distress (or lack thereof) an important indicator of improvement. From these perspectives, it is unreasonable to assume that patients' self-reported clinical symptoms are invalid. A more practical issue, from the perspective of implementing outcomes-informed care initiatives, is that it is untenable to establish a system based on clinicians' reports of treatment progress.
There are three main reasons: (a) clinicians, overwhelmed with administrative responsibilities as it is (Rupert & Baird, 2004), are most likely unwilling to take on further administrative tasks; (b) clinicians, especially when treating the patient, are strongly motivated to find "progress" that may or may not be important from the patient's perspective; and (c) cost would prohibit implementation of any system that requires an independent clinician (i.e., not the treating clinician) to assess patient outcomes. These reasons also argue against outcomes-informed care initiatives that center on clinicians' clinical evaluations of patients. This is not to say, however, that clinicians' input is unnecessary or unimportant in outcomes-informed care initiatives. Feedback from communities of clinicians, including professional organizations such as the American Psychological Association and the American Psychiatric Association, has been thoroughly incorporated throughout the development of Measurement 2.0. In particular, based on this extensive feedback, Measurement 2.0 has clearly moved away from assessing progress with a particular set of items on a single questionnaire, regardless of how global those items may be. Rather, numerous items have been, and continue to be, added based on input from communities of clinicians so as to better reflect their patients' issues. By incorporating IRT to discern how each item behaves relative to other items, standardization has been possible at the item level, so that meaningful comparisons can still be made even if two forms do not have identical items.

b. Technology

It is a common misunderstanding that the newest technology is always the best for all systems.
The same could be said about outcomes-informed care initiatives that aim to be too technically "advanced." Based on over two decades of outcomes-informed care implementation, it is now firmly established that Measurement 2.0 cannot be widely implemented if data collection is done purely online. The main reasons are all related to cost. First, as compared to traditional paper-and-pencil methods, the cost becomes prohibitive if clinicians intend to collect data from more than one patient simultaneously, which requires multiple computers. Second, computers require much more space than paper. Third, whereas paper-and-pencil questionnaires require no instructions, computer-administered assessments require instructions given by a staff member. Fourth, it is more costly to protect patients' confidentiality when they are filling out questionnaires electronically than when using paper. Therefore, the technology involved in large-scale outcomes-informed care initiatives must be low-cost and available anywhere: traditional paper-and-pencil questionnaires optimized for optical character and mark recognition software. These forms can easily be faxed to a central data server that scans the forms and electronically records the patients' responses.

c. Data Standardization and Analytical Methods

The SAES is derived through three statistical steps: (a) calculation of the observed effect size, (b) calculation of a predicted effect size based on case mix variables, and (c) adjustment of the observed effect size based on the difference between the observed and predicted effect sizes (i.e., the residual). The observed effect size is calculated simply by dividing the difference between the intake score and the score at the last treatment by the standard deviation of the intake scores.
Although this statistic provides the absolute magnitude of the effect, in outcomes-informed care initiatives with patient heterogeneity, observed effect sizes cannot be used to compare across patients because of case mix. Therefore, in the second step, the effect of the case mix is evaluated using multiple regression. Through regression, the impact of each case mix variable is estimated, and a prediction is made of the expected effect size given a particular case mix. In the third step, the SAES is calculated by adding the difference between the observed effect size and the expected effect size (i.e., the residual) to the observed mean effect size of the outcomes-informed care population. In this way, effect sizes are standardized across patients, allowing for meaningful comparisons across patients, clinicians, and organizations regardless of differences in case mix. Among the numerous case mix variables, the initial severity of the patient's distress, as indicated by the intake score, has invariably been the strongest predictor of change. Specifically, because greater distress has consistently led to greater observed effect sizes, the SAES adjusts the observed effect size based on the patient's initial severity. As a related issue, outcomes-informed care data have consistently revealed that a significant percentage (25-35%) of patients enter treatment with intake scores indicating low levels of distress, comparable to individuals in the community who have never sought behavioral health services. This group of patients, although typically satisfied with the services received, on average does not show improvement on the outcome questionnaires and rather tends to trend upward in the course of treatment (i.e., to become "worse").
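The three-step SAES computation described under Data Standardization and Analytical Methods can be sketched as follows. This is a simplified illustration: the function name, the use of ordinary least squares, and a single case-mix variable (intake severity) are our assumptions, not the production algorithm.

```python
import numpy as np

def severity_adjusted_effect_size(intake, last, case_mix):
    """Illustrative SAES: observed effect size, predicted effect size
    from case-mix variables, then residual added back to the mean."""
    intake = np.asarray(intake, dtype=float)
    last = np.asarray(last, dtype=float)
    # (a) observed effect size: pre-post change over SD of intake scores
    observed = (intake - last) / intake.std(ddof=1)
    # (b) predicted effect size from case-mix variables via least squares
    X = np.column_stack([np.ones(len(intake)), case_mix])  # add intercept
    coefs, *_ = np.linalg.lstsq(X, observed, rcond=None)
    predicted = X @ coefs
    # (c) residual (observed - predicted) added to the population mean
    return (observed - predicted) + observed.mean()

# Simulated scores in which more distressed patients change more:
rng = np.random.default_rng(1)
intake = rng.normal(70.0, 10.0, 200)
last = intake - (0.5 * (intake - 70.0) + rng.normal(8.0, 5.0, 200))
saes = severity_adjusted_effect_size(intake, last, case_mix=intake[:, None])
```

Because the regression includes an intercept, the residuals average to zero, so the mean SAES equals the mean observed effect size; only the standing of patients with different case mixes changes relative to one another.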
Currently, patients who initially score at this low level of distress are excluded from the analysis because (a) including this group would invalidate any comparisons with clinical trial benchmarks and (b) the general clinical symptom measure as currently developed does not appear to be appropriate for measuring treatment progress in this population.

d. Clinician Feedback System

Arguably the most crucial aspect of any outcomes-informed care is the feedback given to clinicians with regard to their patients; without it, it is questionable whether patients will ever benefit from any initiative. In Measurement 2.0, a web-based application (i.e., the Clinician's Toolkit) provides clinicians access to their data, allowing them to view both aggregated results and individual patient-level data with session-by-session test scores. As clinicians differ in what they would like to see, the Clinician's Toolkit allows for customization of the reported data. The interactive aspects of the Toolkit help engage clinicians in monitoring their own outcomes, becoming what we refer to as "outcomes-informed clinicians."

4. Clinician Participation

The psychometric and technological challenges addressed in this article are trivial compared to the difficulty of engaging clinicians in the process. Not surprisingly, the message "I'm from the managed care company and I'm here to measure you" has not been met with enthusiasm by the majority of clinicians. Lack of interest in outcomes-informed care initiatives takes many forms. Most common are the understandable complaints about the additional workload on clinicians and staff. There is no getting around the problem that even the simplest questionnaires will require work for someone, and the frequent initial reaction from providers is to view measurement efforts as yet another administrative burden that might serve some agenda of the managed care organization but have no benefit to patients or clinicians.
This reaction is entirely understandable. Furthermore, there is often a knowledge gap between the providers in the field and the organizations encouraging the measurement initiatives. While senior executives in the organizations have no choice but to be familiar with the research basis for outcomes-informed care (not to mention having access to internal data), most providers rarely have the time to familiarize themselves with research findings. In addition, providers often regard any initiative from a managed care organization as a further pursuit of profit. After all, managed care organizations are businesses. Clinicians' participation rates vary even when there are clear incentives, such as an increase in referrals and/or higher reimbursement rates. The reasons take many forms, often having to do with concerns about measurement methods and, most importantly, case mix adjustment. Providers often express the concern that because their cases are qualitatively different and/or more difficult than those of other providers, they would be unfairly penalized by any measurement effort. Many providers also express anxiety about measuring their own outcomes and making the results available to potential patients. A close look at actual outcome data may provide some clues as to the source of this anxiety. The fact is that outcomes are highly variable, even for the most effective clinicians. Some patients get significantly worse, some experience rapid improvement, and most are somewhere in between. Such high variability makes it cognitively impossible for clinicians to "guesstimate" their average outcomes. In addition, for any given case, clinicians likely have a number of different hypotheses as to why treatment was or was not effective.
Therefore, the anxiety is likely rooted in the fact that clinicians, except for those actively participating in outcomes-informed care initiatives, have difficulty objectively assessing what their actual outcomes are or how they compare to their peers'. It is natural to be anxious under these circumstances. At the same time, there is a subset of clinicians who do appear willing to participate without much encouragement. While currently a clear minority, our data suggest that these clinicians tend to have exceptionally good outcomes. Therefore, it is clearly to the advantage of potential patients, employers, and payers that these clinicians be recognized and every effort made to steer referrals to them. Likewise, strong evidence of effectiveness could be used to justify increased reimbursement, regardless of level of training. In summary, it appears in the best interest of employers, health plans, managed care organizations, and, most importantly, patients to support these clinicians in their willingness to participate in outcomes-informed care initiatives. Transparency in how data are analyzed and reported, combined with forthrightness as to the intended use of the data, serves to lessen clinicians' anxiety. Once clinicians see that their results are actually very good and that they are recognized and rewarded for their efforts, ongoing (even enthusiastic) participation is likely. It does little good to argue with clinicians who are convinced that measurement is unnecessary. Limited resources are much better devoted to working with providers who are willing to collaborate in gathering practice-based evidence of effectiveness. Similarly, there is no reason for any organization to do anything that might be harmful to a provider's reputation or ability to earn a living.
However, the unwillingness to say anything negative about a provider should not prevent organizations from saying something positive about a clinician with strong evidence. Clearly, patients are better served if managed care organizations make an effort to steer referrals toward clinicians with a track record of high-level effectiveness with a variety of patients.

5. Future Directions

We have described the evolution of outcomes-informed care from the perspective of advances in measurement methodology and the benefits of collaboration across multiple organizations. As we have shown, the measurement methodology and information technology needed to measure treatment outcomes systematically and cost-effectively now exist. Many questions remain, but the rapid accumulation of data provides the necessary conditions for further knowledge. Unanswered questions include: What is the relationship between clinicians' use of feedback systems such as the Clinician's Toolkit and treatment outcomes? Does feedback lead to increased clinician effectiveness? Which clinicians are most likely to benefit from feedback? While we believe we have a solid foundation for measuring treatment outcomes, the ethos of Measurement 2.0 encourages continuous experimentation and variation. Exploration of new variables to improve case mix models or provide clinicians with useful clinical information, combined with the open sharing of information, results in best measurement practices being propagated across organizations. In our view, it is critical to support those clinicians willing to implement outcomes-informed care. The various organizations represented by the authorship of this paper are reaching out to these clinicians and finding ways to facilitate the collection of data at as little cost to the clinicians as possible.
By centralizing many of the processes for data capture and warehousing, the workload on the clinicians is reduced to the effort of giving the questionnaire to the patient and later faxing the form to a toll-free number. Making it easy to collect data is not enough. We need to support clinicians in the use of data to improve outcomes by providing ongoing training and consultation services designed to increase providers' sophistication and comfort with the methodology of outcomes-informed care. We need to recognize and reward highly effective clinicians. Certainly we can do all in our power to increase referral rates. Cost-benefit analyses can be performed to justify preferential reimbursement rates. Unfortunately, in the near term the practicalities of provider contracting, not to mention variations in state laws and regulations, make pay-for-performance strategies difficult to implement. However, this should not keep us from thinking about it, and from planning for a future when many more clinicians are ready to join us in using outcome data for the benefit of themselves, and ultimately, for the patients.

References

Ahn, H., & Wampold, B. E. (2001). Where oh where are the specific ingredients? A meta-analysis of component studies in counseling and psychotherapy. Journal of Counseling Psychology, 48, 251-257.
Azocar, F., Cuffel, B., McCulloch, J., McCabe, J., Tani, S., & Brodey, B. (2007). Monitoring patient improvement and treatment outcomes in managed behavioral health. March/April 2007. http://www.nahq.org/journal/ce/article.html?article_id=275
Bickel, R. (2007). Multilevel analysis for applied research: It's just regression! New York: Guilford Press.
Bickman, L., Breda, C., Dew, S. E., Kelley, S. D., Lambert, E. W., Pinkard, T. J., et al. (2006). Peabody Treatment Progress Battery manual (electronic version). Retrieved 09/01/06, from http://peabody.vanderbilt.edu/ptpb/
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences.
Mahwah, NJ: Lawrence Erlbaum.

Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.

Brown, G. S., Burlingame, G. M., Lambert, M. J., Jones, E., & Vacarro, J. (2001). Pushing the quality envelope: A new outcomes management system. Psychiatric Services, 52(7), 925-934.

Brown, G. S., Jones, E., Lambert, M. J., & Minami, T. (2005). Evaluating the effectiveness of psychotherapists in a managed care environment. American Journal of Managed Care, 2(8), 513-520.

Crits-Christoph, P., Baranackie, K., Carroll, K., Luborsky, L., McLellan, T., Woody, G., Thompson, L., Gallagher, D., & Zitrin, C. (1991). Meta-analysis of therapist effects in psychotherapy outcome studies. Psychotherapy Research, 1, 81-91.

Crits-Christoph, P., & Mintz, J. (1991). Implications of therapist effects for the design and analysis of comparative studies of psychotherapies. Journal of Consulting and Clinical Psychology, 59, 20-26.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.

Embretson, S. E., & Hershberger, S. L. (Eds.). (1999). The new rules of measurement: What every psychologist and educator should know. Mahwah, NJ: Lawrence Erlbaum Associates.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

Greco, L. A., Lambert, W., & Baer, R. A. (2008). Psychological inflexibility in childhood and adolescence: Development and evaluation of the Avoidance and Fusion Questionnaire for Youth. Psychological Assessment, 20(2), 93-102.

Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: Wiley-Interscience.

Kim, D., Wampold, B. E., & Bolt, D. M. (2006).
Therapist effects in psychotherapy: A random-effects modeling of the National Institute of Mental Health Treatment of Depression Collaborative Research Program data. Psychotherapy Research, 16, 161-172.

Lambert, M. J., & Ogles, B. M. (2004). The efficacy and effectiveness of psychotherapy. In M. Lambert (Ed.), Bergin and Garfield's handbook of psychotherapy and behavior change (5th ed., pp. 139-193). New York, NY: John Wiley and Sons.

Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenberger, O. (2006). SAS system for mixed models (2nd ed.). Cary, NC: SAS Institute.

Luborsky, L. (1952). The personality of the psychotherapist. Menninger Quarterly, 6, 1-6.

Luborsky, L., Rosenthal, R., Diguer, L., Andrusyna, T., Berman, J. S., Levitt, J. T., Seligman, D. A., & Krause, E. D. (2002). The dodo bird verdict is alive and well—mostly. Clinical Psychology: Science and Practice, 9, 2-12.

Luborsky, L., Crits-Christoph, P., McLellan, T., Woody, G., Piper, W., Liberman, B., Imber, S., & Pilkonis, P. (1986). Do therapists vary much in their success? Findings from four outcome studies. American Journal of Orthopsychiatry, 56, 501-512.

Matumoto, K., Jones, E., & Brown, J. (2003). Using clinical informatics to improve outcomes: A new approach to managing behavioral healthcare. Journal of Information Technology in Health Care, 1, 135-150.

McKay, K. M., Imel, Z. E., & Wampold, B. E. (2006). Psychiatrist effects in the psychopharmacological treatment of depression. Journal of Affective Disorders, 92, 287-290.

Minami, T., Wampold, B. E., Serlin, R. C., Kircher, J. C., & Brown, G. S. (2007). Benchmarks for psychotherapy efficacy in adult major depression. Journal of Consulting and Clinical Psychology, 75, 232-243.

Minami, T., Wampold, B. E., Serlin, R. C., Hamilton, E. G., Brown, G. S., & Kircher, J. C. (2008). Benchmarking the effectiveness of psychotherapy treatment for adult depression in a managed care environment. Journal of Consulting and Clinical Psychology, 76, 116-124.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. New York: Springer.

Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and longitudinal modeling using Stata (2nd ed.). College Station, TX: StataCorp.

Rasch, G. (1980). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago: University of Chicago Press.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Newbury Park, CA: Sage Publications.

Robinson, L. A., Berman, J. S., & Neimeyer, R. A. (1990). Psychotherapy for the treatment of depression: A comprehensive review of controlled outcome research. Psychological Bulletin, 108, 30-49.

Rosenzweig, S. (1936). Some implicit common factors in diverse methods of psychotherapy. American Journal of Orthopsychiatry, 6, 412-415.

Shapiro, D. A., & Shapiro, D. (1982). Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 92, 581-604.

Shapiro, A. K., & Shapiro, E. S. (1997). The powerful placebo: From ancient priest to modern medicine. Baltimore, MD: Johns Hopkins University Press.

Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752-760.

Smith, M. L., & Glass, G. V. (1980). The benefits of psychotherapy. Baltimore: The Johns Hopkins University Press.

Snijders, T., & Bosker, R. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage Publications.

Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 171-195.

Walker, L. S., Beck, J. E., Garber, J., & Lambert, W. (2008). Children's Somatization Inventory: Psychometric properties of the revised form (CSI-24). Journal of Pediatric Psychology.

Wampold, B. E. (2001). The great psychotherapy debate: Models, methods, and findings.
Mahwah, NJ: Lawrence Erlbaum Associates.

Wampold, B. E., & Brown, G. S. (2005). Estimating therapist variability: A naturalistic study of outcomes in private practice. Journal of Consulting and Clinical Psychology, 75, 914-923.

Wampold, B. E., Mondin, G. W., Moody, M., & Ahn, H. (1997). A meta-analysis of outcome studies comparing bona fide psychotherapies: Empirically, "All must have prizes." Psychological Bulletin, 122, 203-215.

Wampold, B. E., & Serlin, R. C. (2000). The consequences of ignoring a nested factor on measures of effect size in analysis of variance designs. Psychological Methods, 4, 425-433.

Yu, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Unpublished doctoral dissertation, UCLA, Los Angeles.