White Paper on Measurement 2.0
Table of Contents
1. Introduction
2. Background Knowledgebase
a. Psychometrics
b. Therapist Variability in Clinical Outcomes
c. Benchmarking
3. Components
a. Patient Self-Report Items and Questionnaires
b. Technology
c. Data Standardization and Analytical Methods
d. Clinician Feedback System
4. Clinician Participation
5. Future Directions
1. Introduction
This white paper represents a collaboration of individuals from a wide range of public and
private organizations and academic institutions. These individuals share a common commitment
to improving treatment outcomes in behavioral health care through the routine use of patient-completed outcome questionnaires concurrently with treatment. The use of routine outcome assessment, combined with ongoing feedback to the treating clinicians, is often referred to as outcomes-informed care. The technology for outcomes-informed care illustrated in this paper is collectively referred to as "Measurement 2.0," indicating accumulated innovations that represent a substantial departure from 20th-century practices.
Back in the 20th century, in “Measurement 1.0,” symptom ratings were sold in identical forms to
all users. Large-scale users, such as behavioral health organizations, had to buy copyrighted
commercial test forms and pay prices from $1 to as high as $10 per administration, or pay for
expensive remote scoring services. Test authors placed idiosyncratic limitations on how the test
could be used, sometimes charging different prices for academic and commercial users. By the
end of the 20th century, there were so many proprietary mental-health symptom checklists that it
was hard to believe any single one was the best.
The new approach to measurement, "Measurement 2.0," recognizes that client ratings of general distress are no longer rocket science. Well-known psychometric procedures can now be applied
inexpensively to any questionnaire to assess whether or not all items work in the way they were
intended. If organizations need to create or revise items, the new items can be evaluated in less
than a day of statistical analysis once the data are collected. Therefore, dependence on
proprietary measures is no longer necessary.
As an outcomes-informed care system, Measurement 2.0 comprises the following components: (a) patient self-report items and questionnaires, (b) data collection technology and procedures, (c) data standardization and analytical methods, and (d) a clinician feedback system. In the general flow, the participating clinician asks the patient in therapy to complete a brief self-report questionnaire covering common and specific psychological symptoms and indicators of the patient-therapist treatment alliance. The completed self-report forms are then faxed to a central server that scans the information into a central database. Based on the overall collected data, the patients' responses are standardized so that they can be compared to one another. The standardized outcomes are then communicated back to the clinicians via a web-based toolkit, allowing clinicians to keep track of their clients' progress and to reevaluate the current course of treatment with each of their patients.
Measurement 2.0 is thus a compilation of several recent advancements in behavioral health
assessments, notably psychometrics, research on therapist variability in clinical outcomes, and
treatment effectiveness benchmarking methodology. This outcomes-informed care technology has several advantages over previous and other current outcomes-informed care technologies that have been investigated. First, rather than relying on a questionnaire with a fixed set of items, it can flexibly incorporate items that are individually tailored to the specific needs of the organization, the therapist, and/or the individual client. Second, although such flexibility often makes it difficult to compare across individual clients and organizations, the use of item response theory allows comparisons across clients regardless of their individualized assessment. Therefore, the individualized questionnaires that clients fill out retain their utility at all levels, including those of the clients, therapists, and organizations.
2. Background Knowledgebase
a. Psychometrics
In the past, most outcomes-informed care initiatives in behavioral health were based on normed, copyrighted questionnaires originating in academic research. From the standpoint of organizations interested in implementing outcomes-informed care initiatives, however, relying on these measures was advantageous only if the organization had no access to large quantities of normative data. Large-scale commercial outcomes-informed care initiatives generate data at rates and in amounts far surpassing any research study, so the size, quality, and relevance of the normative data quickly exceed the data available for any research-based instrument.
Copyrighted questionnaires have additional disadvantages in large-scale outcomes-informed care
initiatives. Aside from the obvious issue of cost, the end user (e.g., clients, therapists,
organizations) has no control over the measured items. Test publishers are generally reluctant to
permit the modification of their questionnaires, including rewording items, reducing the number
of items, and adding new items. Such prohibitions severely limit the end user’s capacity to
optimize the questionnaires based on their measurement needs.
These limitations of copyrighted questionnaires led larger managed care organizations to devise
their own outcome measures. With the aid of in-house and consulting statisticians, organizations
can create items and questionnaires that are as sophisticated as research-based measures yet are
flexible and fit their measurement needs. The Wellness Assessment (Adult and Youth versions)
employed by Optum Health is an example of such a questionnaire. In addition, other
organizations represented by the authors of this article continue to develop items and
questionnaires based on sophisticated analyses incorporating both classical test theory and item
response theory (IRT).
This psychometric approach, which the authors sometimes refer to as the “psychometric torture
test,” was established at Vanderbilt University and was applied to develop the Peabody Battery
(Bickman et al., 2006), which is a set of 10 free measures for providing feedback to clinicians
about the progress of child clients. The purpose of the approach was to make item evaluation
objective and public, so there is a clear evidence-based rationale when items are retained, revised,
or dropped.
The “torture test” is ecumenical. It views the three main approaches to psychometrics (classical
test theory, factor analysis, and item response theory) as useful tools rather than as competing
religions. Classical test theory includes a number of useful tools, such as Cronbach's alpha (Cronbach, 1951; Cronbach & Shavelson, 2004), item-total correlations, and the Spearman-Brown (Brown, 1910; Spearman, 1910) approximation. Factor analysis may be exploratory and/or confirmatory, and it offers tools such as scree plots, factor loadings, and, most importantly, model-fit statistics (Yu, 2002). Item response theory (IRT) has two broad streams: one-parameter Rasch models (Bond & Fox, 2001; Rasch, 1980), which offer many tools for practical test construction, and multiple-parameter IRT (Embretson & Hershberger, 1999; Embretson & Reise, 2000), which offers a great depth of features for model-based measurement research.
The torture test provides the test developer with exhaustive figures of merit for every test item
and other figures of merit for the test as a whole. All statistics are flagged by the statistician as
good or bad. After a brief explanation, test developers who are not statisticians have been able to
make evidence-based decisions on item revision and retention, and make reasonable decisions
about the tradeoff between test length and reliability. Readers interested in the technical details
will find examples in the free manual for the Peabody Battery (Bickman et al., 2006), or in two
articles (Greco, Lambert, & Baer, 2008; Walker, Beck, Garber, & Lambert, 2008). The
Psychological Assessment article contains the most technical detail.
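For readers who want a concrete sense of the classical-test-theory portion of these figures of merit, the following Python sketch (a minimal illustration with hypothetical ratings and hypothetical flagging thresholds, not the torture test itself) computes Cronbach's alpha and corrected item-total correlations for a small response matrix and flags items that fall below the chosen floors.

import numpy as np

def classical_item_stats(responses, alpha_floor=0.80, item_total_floor=0.30):
    """Compute Cronbach's alpha and corrected item-total correlations for a
    respondents-by-items matrix of numeric ratings. The flagging thresholds
    are illustrative choices, not prescribed cutoffs."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = responses.sum(axis=1).var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1.0 - item_vars.sum() / total_var)

    item_flags = []
    for j in range(n_items):
        rest = responses.sum(axis=1) - responses[:, j]      # total score excluding item j
        r_it = np.corrcoef(responses[:, j], rest)[0, 1]     # corrected item-total correlation
        item_flags.append((j, r_it, "ok" if r_it >= item_total_floor else "review"))
    return alpha, ("ok" if alpha >= alpha_floor else "review"), item_flags

# Hypothetical 5-point ratings from six respondents on four items.
data = [[1, 2, 1, 4], [2, 3, 2, 1], [3, 3, 3, 2], [4, 5, 4, 5], [2, 2, 3, 3], [5, 4, 5, 2]]
alpha, alpha_flag, flags = classical_item_stats(data)
print(f"alpha = {alpha:.2f} ({alpha_flag})")
for j, r_it, flag in flags:
    print(f"item {j}: corrected item-total r = {r_it:.2f} ({flag})")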
b. Therapist Variability in Client Outcomes
The idea that clinicians differ in clinical competence and effectiveness is a hypothesis that has
been entertained since perhaps the inception of psychotherapy itself (e.g., Luborsky, 1952;
Rosenzweig, 1936). However, 20th century results were obscure until the statistical method of
hierarchical linear modeling (HLM; Raudenbush & Bryk, 2002) became widely available to
psychologists. This approach has various names, such as random regression (Hedeker &
Gibbons, 2006), multilevel modeling (Snijders & Bosker, 1999), and mixed modeling (Pinheiro
& Bates, 2000). By the late 1990s, dependable software was available, e.g., SAS (Littell, Milliken, Stroup, Wolfinger, & Schabenger, 2006), HLM (Raudenbush & Bryk, 2002), SPSS (Bickel, 2007), Stata (Rabe-Hesketh & Skrondal, 2005), and S+ or R (Pinheiro & Bates, 2000).
Behavioral health data are well suited to HLM because multiple patients are seen by the same clinician. HLM supports estimates of variance at both the client and therapist levels; a third level (e.g., clinic) may be modeled simultaneously as well.
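As a hedged illustration of this kind of two-level analysis (not the specific models used in Measurement 2.0, and with simulated data and hypothetical variable names), the following Python sketch uses the statsmodels library to fit a random-intercept model with patients nested within therapists and reports the share of residual outcome variance attributable to therapists.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated data: 40 therapists with 25 patients each and a true therapist effect.
n_therapists, n_per = 40, 25
therapist_id = np.repeat(np.arange(n_therapists), n_per)
therapist_effect = rng.normal(0.0, 0.3, n_therapists)[therapist_id]   # level-2 variation
intake = rng.normal(0.0, 1.0, n_therapists * n_per)                   # case-mix covariate
change = 0.8 * intake + therapist_effect + rng.normal(0.0, 1.0, n_therapists * n_per)
df = pd.DataFrame({"change": change, "intake": intake, "therapist": therapist_id})

# Random intercept for therapist; fixed effect for intake severity.
model = smf.mixedlm("change ~ intake", data=df, groups=df["therapist"])
result = model.fit()
print(result.summary())

# Share of remaining variance attributable to therapists (intraclass correlation).
therapist_var = float(result.cov_re.iloc[0, 0])
residual_var = float(result.scale)
print("therapist ICC:", therapist_var / (therapist_var + residual_var))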
Before user-friendly software was available for HLM, clinical trials and treatment evaluation studies ignored the problem of nested data. As such, they likely overestimated the effect of the type of treatment relative to the effect of who administered the treatment (Wampold & Serlin, 2000; Wampold, 2001; Kim, Wampold & Bolt, 2006). Recent analyses have since revealed that the percentage of variance in treatment outcomes attributed to therapists far exceeds the variance due to the type of treatment (Luborsky et al., 1986; Crits-Christoph et al., 1991; Crits-Christoph & Mintz, 1991; Wampold & Serlin, 2000; Wampold, 2001; Kim, Wampold & Bolt, 2006). In
addition, almost three decades of randomized clinical trials and meta-analyses of these trials have
consistently supported the “dodo bird effect,” that all bona fide psychological therapies appear to
have similar treatment effects (Smith & Glass, 1977; Smith & Glass, 1980; Shapiro & Shapiro,
1982; Robinson, Berman, & Neimeyer, 1990; Wampold, Mondin, Moody, & Ahn, 1997; Ahn &
Wampold, 2001; Wampold, 2001; Luborsky et al., 2002; Lambert & Ogles, 2004). Further, such
effects have been found in studies investigating the effects of antidepressants (Kim, Wampold, &
Bolt, 2006; Wampold & Brown, 2005). In particular, Wampold and Brown found that for
patients who are receiving medications and psychotherapy concurrently, the effect of
medications was impacted by the clinician’s competence in providing psychotherapy. These
findings suggest that the contributions made by individual therapists far exceed the effects of the
type of treatment.
The current Measurement 2.0 database, which contains cases from multiple organizations, shows
that therapists do indeed vary in their clinical outcomes. In addition, these differences appear to
be stable over time. When patients are referred to clinicians who have had above-average outcomes in the past, these patients are more likely to report better-than-average outcomes. In addition, the data show that clinicians who support and actively participate in outcomes-informed care initiatives report, on average, better results than those who participate sparsely.
Therefore, in addition to the research evidence, it is in the best interest of effective clinicians to
participate in outcomes-informed care initiatives because participation would likely provide
favorable evidence of their clinical effectiveness. On the other hand, for clinicians who are
getting below-average results, efforts to improve their outcomes by use of the feedback tool
would also likely lead to favorable results. Likewise, it is in the interest of employers and payers of mental health services to identify these effective clinicians in order to increase patient access to them. Conversely, it is also in the interest of payers to identify clinicians whose clinical competence is less than optimal, so as to implement strategies to improve it. Most importantly, outcomes-informed care initiatives are in the patients' best interest. Outcomes-informed care can help connect clients with the most effective therapists, and it can improve outcomes through the effects of outcome monitoring itself.
What is good for clients can be a common ground for therapists, clinics, and insurers, whose existence is justified, or should be, by what is good for the client.
c. Benchmarking
To say whether particular outcomes are "good," we must ask the question, "Compared to what?" That there is variability among therapists' effectiveness need not be a problem if even the least effective therapists are still effective enough by some criterion. Conversely, variability among therapists also becomes less of an issue, if an issue at all, if even those who are considered relatively highly effective are ineffective by some criterion. The question of the criterion therefore becomes central. Evaluations that compare data obtained from outcomes-informed care initiatives and other real-world clinical settings against such a criterion are called benchmarking. Over the past decade, a
growing number of research articles have addressed methods of benchmarking behavioral health
treatment outcomes (Azocar et al, 2007; Brown et al, 2002; Brown et al, 2005; Matumoto, Jones,
& Brown, 2003; Minami et al., 2007; Minami et al., 2008).
Traditionally, benchmarks established from research have been considered more valid than treatment outcome data based on real-world outcomes-informed care initiatives. Accordingly, a recent study compared data from outcomes-informed care initiatives against benchmarks derived from clinical trials and found that behavioral health services provided in a managed care context have treatment effects similar to those of well-controlled clinical trials (Minami et al., 2008). However, as previously mentioned, the data accumulated from large-scale outcomes-informed care initiatives such as Measurement 2.0 now far exceed the amount of data obtained from any clinical trial, and it is therefore expected that future benchmarks will be derived from these outcomes-informed care initiatives rather than from research.
An additional reason to establish future benchmarks from large-scale outcomes-informed care initiatives is that, unlike randomized clinical trials, real-world clinical data include patients who would be considered ineligible for clinical trials (e.g., because of low initial severity or co-morbid conditions). Whereas clinical trials pursue homogeneity in their patients, clinics in the real world do not choose the patients who come through their door. Therefore, the benchmarks currently established from randomized clinical trials are clearly not suited for patients in these clinical settings, and any benchmarks established need to accommodate this heterogeneity in the patient population. Consequently, Measurement 2.0 extends the established benchmarking method to incorporate this heterogeneity. We refer to the resulting indicator of treatment effect as the Severity Adjusted Effect Size (SAES), which is explained in more detail under Data Standardization and Analytical Methods.
3. Components
a. Patient Self-Report Items and Questionnaires
Although there have always been criticisms within academic institutions of patient self-report questionnaires and items as being unreliable and invalid, such criticisms are difficult to substantiate from two viewpoints. First, there is consistent evidence of larger treatment effects when outcomes are measured by clinicians (Lambert, Hatch, Kingston, & Edwards, 1986; Minami et al., 2007; Smith, Glass, & Miller, 1980), even when the evaluating clinician is not the one providing treatment. Therefore, it is conceivable that clinicians consider their patients to be doing better than the patients themselves perceive. Second, there has been a historical shift in behavioral health with regard to who has the "authority" to determine whether or not the patient improved. Whereas traditionally clinicians had sole authority, in recent years patients have been empowered to consider their own sense of distress (or lack thereof) as an important indicator of improvement. From these perspectives, it is unreasonable to assume that patients' self-reported clinical symptoms are invalid.
A more practical issue from the perspective of implementing outcomes-informed care initiatives is that it is untenable to establish a system based on clinicians' reports of treatment progress. The three main reasons are that (a) clinicians, already overwhelmed with administrative responsibilities (Rupert & Baird, 2004), are most likely unwilling to take on further administrative tasks, (b) clinicians, especially if treating the patient, are strongly motivated to find "progress" that may or may not be important from the patient's perspective, and (c) cost would prohibit implementation of any system that requires an independent clinician (i.e., not the treating clinician) to assess patient outcomes. These reasons also argue against outcomes-informed care initiatives that center around clinicians' clinical evaluations of patients.
However, this is not to say that clinicians' input is unnecessary or unimportant in outcomes-informed care initiatives. On the contrary, feedback from communities of clinicians, including professional organizations such as the American Psychological Association and the American Psychiatric Association, has been thoroughly incorporated throughout the development of Measurement 2.0. In particular, based on this feedback, Measurement 2.0 has clearly moved away from assessing progress with a particular set of items on a single questionnaire, regardless of how global those items may be. Rather, numerous items have been, and continue to be, developed based on input from communities of clinicians so as to better reflect their patients' issues. By incorporating IRT to discern how each item behaves relative to other items, standardization has been possible at the item level, so that meaningful comparisons can still be made even if two forms do not have identical items.
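The sketch below illustrates that idea with a minimal dichotomous Rasch model; the actual symptom items would typically be polytomous, and the item bank, difficulty values, and response patterns here are hypothetical. Because each client's latent severity is estimated from whichever calibrated items they answered, scores from non-identical forms land on a common scale.

import numpy as np

def rasch_theta(responses, difficulties, n_iter=25):
    """Estimate a respondent's latent severity (theta) by maximum likelihood
    under a dichotomous Rasch model, given known item difficulties from a
    calibrated item bank. Theta is on the same scale regardless of which
    items were administered."""
    theta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))  # P(item endorsed)
        gradient = np.sum(responses - p)                    # score function
        information = np.sum(p * (1.0 - p))                 # Fisher information
        if information < 1e-8:
            break
        theta += gradient / information                     # Newton-Raphson step
    return theta

# Hypothetical calibrated item bank (item: difficulty).
bank = {"item_a": -1.0, "item_b": 0.0, "item_c": 0.8, "item_d": 1.5}

# Two clients answering different item subsets are still scored on a common scale.
client_1 = rasch_theta(np.array([1, 1, 0]), np.array([bank["item_a"], bank["item_b"], bank["item_c"]]))
client_2 = rasch_theta(np.array([1, 0, 0]), np.array([bank["item_b"], bank["item_c"], bank["item_d"]]))
print(round(client_1, 2), round(client_2, 2))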
b. Technology
It is a common misunderstanding that the newest technology is always best for every system, and outcomes-informed care initiatives often err by aiming to be too technically "advanced." Based on over two decades of outcomes-informed care implementation,
it is now firmly established that Measurement 2.0 cannot be widely implemented if data
collection is done purely online. The main reasons are all related to cost. First, as compared to
traditional paper-and-pencil methods, the cost becomes prohibitive if the clinicians intend to
collect data from more than one patient simultaneously, which then requires multiple computers.
Second, computers require much more space than paper. Third, whereas paper-and-pencil
questionnaires require no instructions, computer-administered assessments require instructions
given by a staff member. Fourth, it is more costly to take measures to protect patients’
confidentiality when they are filling out the questionnaires electronically than when using paper.
Therefore, the technology involved with large-scale outcomes-informed care initiatives must be
one that is low-cost and available anywhere—traditional paper-and-pencil questionnaires that are
optimized for optical character and mark recognition software. These forms can easily be faxed
to a central data server that scans the forms and electronically records the patients’ responses.
c. Data Standardization and Analytical Methods
SAES is derived through three statistical steps: (a) calculation of observed effect size, (b)
calculation of predicted effect size based on case mix variables, and (c) adjusting the observed
effect size based on the difference between the observed effect size and the predicted effect size
(i.e., the residual). The observed effect size is calculated simply by dividing the difference between the intake score and the score at the last treatment session by the standard deviation of the intake scores.
Although this statistic provides the absolute magnitude of the effect, in outcomes-informed care
initiatives with patient heterogeneity, the observed effect sizes cannot be used to compare across
patients due to case mix. Therefore, in the second step, the effect of the case mix is evaluated
using multiple regression. Through the use of regression, the impact of each of the case mix
variables is estimated, and a prediction is made with regards to the expected effect size given a
particular case mix. In the third step, SAES is then calculated by adding the residual (i.e., the difference between the observed and the expected effect size) to the observed mean effect size of the outcomes-informed care population. In this way, effect sizes are standardized across patients, allowing for meaningful comparisons across patients, clinicians, and organizations regardless of differences in case mix.
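The following Python sketch walks through the three steps with simulated data and a single case-mix variable (intake severity); the variable names, values, and regression specification are illustrative assumptions, not the actual Measurement 2.0 case mix model.

import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical case-level data: intake and last-session scores on a symptom scale.
intake = rng.normal(60.0, 10.0, n)
last = intake - (0.5 * (intake - 50.0) + rng.normal(0.0, 8.0, n))  # greater distress, greater change

# Step (a): observed effect size = (intake - last score) / SD of intake scores.
observed_es = (intake - last) / intake.std(ddof=1)

# Step (b): predict the effect size from case-mix variables (here, intake severity only)
# with ordinary least squares regression.
X = np.column_stack([np.ones(n), intake])
beta, *_ = np.linalg.lstsq(X, observed_es, rcond=None)
predicted_es = X @ beta

# Step (c): SAES = residual (observed minus predicted) + mean observed effect size
# of the outcomes-informed care population.
saes = (observed_es - predicted_es) + observed_es.mean()

print("mean observed ES:", round(float(observed_es.mean()), 3))
print("mean SAES:", round(float(saes.mean()), 3))                     # equals the population mean by construction
print("corr(SAES, intake):", round(float(np.corrcoef(saes, intake)[0, 1]), 3))  # approximately zero after adjustment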
Among numerous case mix variables, the initial severity of the patient’s distress as indicated by
their intake score has invariably been the strongest predictor of change. Specifically, as greater
distress has consistently led to greater observed effect size, SAES adjusts the observed effect size
based on the patient’s initial severity.
As a related issue, data from outcomes-informed care initiatives have consistently revealed that a significant percentage (25% to 35%) of patients enter treatment with intake scores indicating low levels of distress, comparable to individuals in the community who have never sought behavioral health services. This group of patients, although typically satisfied with the services received, on average does not show improvement on the outcome questionnaires and instead tends to trend upward over the course of treatment (i.e., become "worse"). Currently, patients who initially score at this low level of distress are excluded from the analysis because (a) including this group would make any comparisons with clinical trial benchmarks invalid and (b) the general clinical symptom measure as currently developed does not appear to be appropriate for measuring treatment progress in this population.
d. Clinician Feedback System
Arguably the most crucial aspect of any outcomes-informed care is the feedback given to clinicians with regard to their patients; without it, it is questionable whether patients will ever benefit from any initiative. In Measurement 2.0, a web-based application (the Clinician's Toolkit) provides clinicians access to their data so that they can view both aggregated results and individual patient-level data with session-by-session test scores. As clinicians
differ in what they would like to see, the Clinician’s Toolkit allows for customization of the
reported data. The interactive aspects of the Toolkit help engage clinicians to monitor their own
outcomes, becoming what we refer to as “outcomes-informed clinicians.”
4. Clinician Participation
The psychometric and technological challenges addressed in this article are trivial compared to
the difficulty of engaging clinicians in the process. Not surprisingly, the message "I'm from the
managed care company and I'm here to measure you" has not been met with enthusiasm by the majority of clinicians.
Lack of interest in outcomes-informed care initiatives takes many forms. Most common are the
understandable complaints about the additional workload on clinicians and staff. There is no
getting around the problem that even the simplest questionnaires will require work for someone,
and the frequent initial reaction from providers is to view measurement efforts as yet another
administrative burden that might serve some agenda of the managed care organization but has no benefit to patients or clinicians.
This reaction is entirely understandable. Furthermore, there is often a knowledge gap between
the providers in the field and the organizations encouraging the measurement initiatives. While
senior executives in the organizations have no choice but to be familiar with the research basis
for outcomes-informed care (not to mention have access to internal data), most providers rarely
have the time to familiarize themselves with research findings. In addition, providers often
regard any initiatives from the managed care organizations as further pursuit of profit. After all,
managed care organizations are businesses.
Clinicians’ participation rates vary even when there are clear incentives, such as an increase in
referrals and/or higher reimbursement rates. The reasons take many forms, often having to do
with concerns about measurement methods and most importantly, case mix adjustment.
Providers often express the concern that because their cases are qualitatively different and/or
more difficult than those of other providers, they would be unfairly penalized by any
measurement efforts. In short, many providers express anxiety about measuring their own outcomes and making these results available to potential patients.
A close look at actual outcome data may provide some clues regarding the source of anxiety. The
fact is that outcomes are highly variable, even for the most effective clinicians. Some patients get
significantly worse, some experience rapid improvement, and most are somewhere in between.
The simple fact of such high variability makes it cognitively impossible for clinicians to "guesstimate" what their average outcome is. In addition, for any given case, clinicians likely have a number of
different hypotheses as to why treatment was or was not effective for their clients. Therefore, the
anxiety is likely rooted in the fact that clinicians, except for those who are actively participating
in outcomes-informed care initiatives, have difficulty in objectively assessing what their actual
outcomes are or how they might compare to their peers. It is natural to be anxious under these
circumstances.
At the same time, there is a subset of clinicians who do appear willing to participate without too
much encouragement. While currently a clear minority, our data suggest that these clinicians tend to have exceptionally good outcomes. Therefore, it is clearly to the advantage of potential patients, employers, and payers that these clinicians be recognized and every effort made to steer referrals to them. Likewise, strong evidence of effectiveness could be used to justify increased reimbursement, regardless of the clinician's level of training.
Therefore, in summary, it appears in the best interest of employers, health plans, managed care
organizations, and most importantly, patients, to support these clinicians in their willingness to
participate in outcomes-informed care initiatives. Transparency in how data are analyzed and reported, combined with forthrightness as to the intended use of the data, serves to lessen clinicians' anxiety. Once clinicians see that their results are actually very good and that they are recognized and rewarded for their efforts, ongoing (even enthusiastic) participation is likely.
It does little good to argue against clinicians who are convinced that measurements are
unnecessary. Limited resources are much better devoted to working with providers who are
willing to collaborate in gathering practice-based evidence of effectiveness.
Similarly, there is no reason for any organization to do anything that might be harmful to a
provider's reputation or to their ability to earn a living. However, the unwillingness to say
something negative about any provider should not prevent organizations from saying something
positive about a clinician for whom there is strong evidence. Clearly, patients are better served if managed care organizations make an effort to steer referrals toward clinicians with a track record of high-level effectiveness with a variety of patients.
5. Future Directions
We have described the evolution of outcomes-informed care from the perspective of advances in
measurement methodology and the benefits of collaboration across multiple organizations. As
we have shown, the measurement methodology and information technology needed to measure
treatment outcomes systematically and cost effectively now exists.
Many questions remain, but the rapid accumulation of data provides the necessary conditions for
further knowledge. Unanswered questions include: What is the relationship between clinician’s
use of feedback systems such as the Clinician’s Toolkit and treatment outcomes? Does feedback
lead to increased clinician effectiveness? Which clinicians are most likely to benefit from
feedback?
While we believe we have a solid foundation for measuring treatment outcomes, the ethos of
Measurement 2.0 encourages continuous experimentation and variation. Exploration of new variables to improve case mix models or to provide clinicians with useful clinical information, combined with the open sharing of information, results in best measurement practices being propagated across organizations.
In our view, it is critical to support those clinicians willing to implement outcomes-informed care.
The various organizations represented by the authorship of this paper are reaching out to these
clinicians and finding ways to facilitate the collection of data with as little cost to the clinicians
as possible. By centralizing many of the processes for data capture and warehousing, the workload on clinicians is reduced to giving the questionnaire to the patient and later faxing the form to a toll-free number.
Making it easy to collect data is not enough. We need to support clinicians in the use of data to
improve outcomes by providing ongoing training and consultation services designed to increase
providers' sophistication and comfort with the methodology of outcomes-informed care.
We need to recognize and reward highly effective clinicians. Certainly we can do all in our
power to increase their referral rates. Cost-benefit analyses can be performed to justify preferential reimbursement rates. Unfortunately, in the near term, the practicalities of provider contracting,
not to mention variations in state laws and regulations, make pay-for-performance strategies
difficult to implement. However, this should not keep us from thinking about such strategies, and from planning for a future when many more clinicians are ready to join us in using outcome data for their own benefit and, ultimately, for that of their patients.
References
Ahn, H., & Wampold, B. E. (2001). Where oh where are the specific ingredients? A meta-analysis of component studies in counseling and psychotherapy. Journal of Counseling Psychology, 48, 251-257.
Azocar, F., Cuffel, B., McCulloch, J., McCabe, J., Tani, S., & Brodey, B. (2007). Monitoring patient improvement and treatment outcomes in managed behavioral health, March/April 2007. http://www.nahq.org/journal/ce/article.html?article_id=275
Bickel, R. (2007). Multilevel analysis for applied research : it's just regression! New York:
Guilford Press.
Bickman, L., Breda, C., Dew, S.E., Kelley, S.D., Lambert, E.W., Pinkard, T.J., et al. (2006).
Peabody Treatment Progress Battery Manual Electronic version. Retrieved 09/01/06, from
http://peabody.vanderbilt.edu/ptpb/
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: L. Erlbaum.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.
Brown, G.S., Burlingame, G.M., Lambert, M.J., Jones, E. & Vacarro, J. (2001) Pushing the
quality envelope: A new outcomes management system. Psychiatric Services, 52 (7), 925-934.
Brown, G.S., Jones, E., Lambert, M.J., & Minami, T. (2005) Evaluating the effectiveness of
psychotherapist in a managed care environment. American Journal of Managed Care. 2(8) pp
513-520.
Crits-Christoph, P., Baranackie, K., Carrilm, K., Luborsky, L., McLellan, T., Woody, G.,
Thompson, L., Gallagier, D., & Zitrin, C. (1991). Meta-analysis of therapist effects in
psychotherapy outcome studies. Psychotherapy Research, 1, 81-91.
Crits-Christoph, P., & Mintz, J. (1991). Implications of therapist effects for the design and analysis of comparative studies of psychotherapies. Journal of Consulting and Clinical Psychology, 59, 20-26.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16,
297-334.
Cronbach, L.J., & Shavelson, R.J. (2004). My Current thoughts on Coefficient Alpha and
Successor Procedures. Educational and Psychological Measurement, 64(3), 391-418.
Embretson, S. E., & Hershberger, S. L. (Eds.). (1999). The new rules of measurement: What
every psychologist and educator should know: Mahwah, NJ, US: Lawrence Erlbaum Associates.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, N.J.:
L. Erlbaum Associates.
Greco, L. A., Lambert, W., & Baer, R. A. (2008). Psychological inflexibility in childhood and
adolescence: Development and evaluation of the Avoidance and Fusion Questionnaire for Youth.
Psychological Assessment, 20(2), 93-102.
Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hoboken, NJ: Wiley-Interscience.
Kim, D., Wampold, B. E., & Bolt, D. M. (2006). Therapist effects in psychotherapy: A random-effects modeling of the National Institute of Mental Health Treatment of Depression Collaborative Research Program data. Psychotherapy Research, 16, 161-172.
Lambert, M. J., & Ogles, B. M. (2004). The Efficacy and Effectiveness of Psychotherapy. In M.
Lambert (Ed.), Bergin and Garfield’s Handbook of Psychotherapy and Behavior Change (5th ed.,
pp. 139-193). New York, NY: John Wiley and Sons.
Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., & Schabenger, O. (2006). SAS
System for Mixed Models, Second Edition. Cary, N.C.: SAS Institute.
Luborsky, L. (1952). The personality of the psychotherapist. Menninger Quarterly, 6, 1-6.
Luborsky L, Rosenthal, R, Diguer L, Andrusyna, T., Berman, J.S., Levitt, J. T., Seligman, D.A.,
& Krause, E. D. (2002). The dodo bird verdict is alive and well—mostly. Clinical Psychology
Science Practice, 9, 2-12.
Luborsky, L., Crits-Christoph, P., McLellan, T., Woody, G., Piper, W., Liverman, B., Imber, S., & Pilkonis, P. (1986). Do therapists vary much in their success? Findings from four outcome studies. American Journal of Orthopsychiatry, 56, 501-512.
Matumoto, K., Jones, E., & Brown, J. (2003) Using clinical informatics to improve outcomes: A
new approach to managing behavioral healthcare. Journal of Information Technology in Health
Care, 1, 135-150.
McKay, K. M., Imel, Z. E., & Wampold, B. E. (2006). Psychiatrist effects in the
psychopharmacological treatment of depression. Journal of Affective Disorders, 92, 287-290.
Minami, T., Wampold, B. E., Serlin, R. C., Kircher, J. C., & Brown, G. S. (2007). Benchmarks
for psychotherapy efficacy in adult major depression, Journal of Consulting and Clinical
Psychology, 75, 232-243.
Minami, T., Wampold, B. E., Serlin, R. C., Hamilton, E. G., Brown, G. S., & Kircher, J. C.
(2008). Benchmarking the effectiveness of psychotherapy treatment for adult depression in a
managed care environment. Journal of Consulting and Clinical Psychology, 76, 116-124.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. New York:
Springer.
Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling Using Stata,
Second Edition. College Station, TX: StataCorp.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data
analysis methods (2nd edition) (Second ed.). Newbury Park, CA, US: Sage Publications, Inc.
Rasch, G. (1980). Studies in mathematical psychology: I. Probabilistic models for some
intelligence and attainment tests (Expanded ed.). Chicago: University of Chicago Press.
Robinson, L.A., Berman, J.S., & Neimeyer, R. A. (1990). Psychotherapy for treatment of
depression: A comprehensive review of controlled outcome research. Psychological Bulletin,
108, 30-49.
Rosenzweig, S. (1936). Some implicit common factors in diverse methods of psychotherapy.
American Journal of Orthopsychiatry, 6, 412-415.
Shapiro, D. A., & Shapiro, D. (1982). Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 92, 581-604.
Shapiro, A. K., & Shapiro, E. S. (1997). The powerful placebo: From ancient priest to modern
medicine. Baltimore, MD: Johns Hopkins University Press.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcomes studies.
American Psychologist, 32, 752-760.
Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore, MD: The Johns Hopkins University Press.
Snijders, T., & Bosker, R. (1999). Multilevel Analysis: An introduction to basic and advanced
multilevel modeling (2000 ed.). Thousand Oaks CA: Sage Publications.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3,
171-195.
Walker, L. S., Beck, J. E., Garber, J., & Lambert, W. (2008). Children's Somatization Inventory:
Psychometric Properties of the Revised Form (CSI-24). J Pediatr Psychol.
Wampold, B. E. (2001). The great psychotherapy debate: Models, methods, and findings. Mahwah, NJ: Lawrence Erlbaum Associates.
Wampold, B.E., & Brown, G.S. (2005) Estimating therapist variability: A naturalistic study of
outcomes in private practice. Journal of Consulting and Clinical Psychology 75, 914-923.
Wampold, B. E., Mondin, G.W., Moody, M., & Ahn, H. (1997). A meta-analysis of outcome
studies comparing bona fide psychotherapies: Empirically, “All must have prizes.” Psychological
Bulletin, 122, 203-15.
Wampold, B. E., & Serlin, R. C. (2000). The consequences of ignoring a nested factor on measures of effect size in analysis of variance designs. Psychological Methods, 4, 425-433.
Yu, C.-Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with
binary and continuous outcomes. Unpublished Doctoral dissertation, UCLA, Los Angeles