Healthcare: The Journal of Delivery Science and Innovation
Volume 1, Issues 1–2
Amsterdam–Boston–London–New York–Oxford–Paris–Philadelphia–San Diego–St. Louis
© 2013 Elsevier Inc. All rights reserved.
This journal and the individual contributions contained in it are protected under copyright by Elsevier Inc., and the following terms and
conditions apply to their use:
Photocopying
Single photocopies of single articles may be made for personal use as allowed by national copyright laws. Permission of the Publisher and
payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional
purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for
non-profit educational classroom use.
Permissions may be sought directly from Elsevier’s Rights Department in Oxford, UK: phone: (+44) 1865 843830; fax: (+44) 1865 853333;
e-mail: [email protected]. Requests may also be completed on-line via the Elsevier homepage (http://www.elsevier.com/locate/
permissions). In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive,
Danvers, MA 01923, USA; phone: (+1) (978) 7508400; fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid
Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 20 7631 5555; fax: (+44) 20 7631 5500. Other
countries may have a local reprographic rights agency for payments.
Derivative Works
Subscribers may reproduce tables of contents or prepare lists of articles including abstracts for internal circulation within their institutions.
Permission of the Publisher is required for resale or distribution outside the institution. Permission of the Publisher is required for all other
derivative works, including compilations and translations.
Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this journal, including any article or part of
an article.
Except as outlined above, no part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any
means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions
requests to: Elsevier Rights Department, at the fax and e-mail addresses noted above.
Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability,
negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because
of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made. Although all
advertising material is expected to conform to ethical (medical) standards, inclusion in this publication does not constitute a guarantee or
endorsement of the quality or value of such product or of the claims made of it by its manufacturer.
Publication information: Healthcare: The Journal of Delivery Science and Innovation (ISSN 2213-0764). For 2013, volume 1 is scheduled for publication by Elsevier (Radarweg 29,
1043 NX Amsterdam, The Netherlands) and distributed by Elsevier, 360 Park Avenue South, New York, NY 10010-1710, USA. Subscription
prices are available upon request from the Publisher or from the Regional Sales Office nearest you or from this journal’s website (http://www.
elsevier.com/locate/hjdsi). Further information is available on this journal and other Elsevier products through Elsevier’s website: (http://
www.elsevier.com). Subscriptions are accepted on a prepaid basis only and are entered on a calendar year basis. Issues are sent by standard
mail (surface within Europe, air delivery outside Europe). Priority rates are available upon request. Claims for missing issues should be made
within six months of the date of dispatch.
Orders, claims, and journal enquiries: please contact the Customer Service Department at the Regional Sales Office nearest you:
Americas: Elsevier, Customer Service Department, 6277 Sea Harbor Drive, Orlando, FL 32887-4800, USA; phone: (+1) (877) 8397126 [toll free
number for US customers], or (+1) (407) 3454020 [customers outside US]; fax: (+1) (407) 3631354; e-mail: [email protected]
Europe: Elsevier, Customer Service Department, PO Box 211, 1000 AE Amsterdam, The Netherlands; phone: (+31) (20) 4853757; fax: (+31) (20)
4853432; e-mail: [email protected]
Tokyo: Elsevier, Customer Service Department, 4F Higashi-Azabu, 1-Chome Bldg, 1-9-15 Higashi-Azabu, Minato-ku, Tokyo 106-0044, Japan;
phone: (+81) (3) 5561 5037; fax: (+81) (3) 5561 5047; e-mail: [email protected]
Asia: Elsevier, Customer Service Department, 3 Killiney Road, #08-01 Winsland House I, Singapore 239519; phone: (+65) 63490222; fax: (+65)
67331510; e-mail: [email protected]
Advertising information: If you are interested in advertising or other commercial opportunities please e-mail [email protected]
and your enquiry will be passed to the correct person who will respond to you within 48 hours.
USA mailing notice: Healthcare: The Journal of Delivery Science and Innovation (ISSN 2213-0764) is published 4 times a year by Elsevier Inc. (P.O. Box
211, 1000 AE Amsterdam, The Netherlands). Periodical postage paid at Rahway, NJ and additional mailing offices.
USA POSTMASTER: Send address changes to Healthcare, Elsevier Customer Service Department,
3251 Riverport Lane, Maryland Heights, MO 63043, USA. AIRFREIGHT AND MAILING in the USA by Mercury International Limited, 365 Blair Road, Avenel, NJ 07001.
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper)
Printed in the United States of America
Healthcare 1 (2013) 1
Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi
Introducing you to Healthcare: The Journal of Delivery Science and Innovation
Amol Navathe*, Sachin Jain, Ashish Jha, Arnold Milstein, Richard Shannon
17 Worcester Street, Unit 7, Boston, MA 02118, United States
Dear Readers,
It is with great pleasure that we introduce you to Healthcare:
The Journal of Delivery Science and Innovation.
As our journal moves forward, we hope you will find on its pages
descriptions of leading strategies to implement new payment models
and improve care delivery processes; creative approaches to organizing care around patients and their conditions; and novel applications
of information technology to enhance health system performance.
With the rapid pace of health system transformation, there exists a
tremendous opportunity to learn what works and what does not to
improve patient care.
We hope this journal will occupy a unique space: a hub for the growing community of practitioners to share critical ideas and knowledge about healthcare delivery with one another. More importantly, we hope that decision-makers will find the evidence, strategies, and methods needed to support translation of first-rate ideas into broader practice. We will highlight the successes and challenges of care delivery innovation and work to close the knowledge gaps that exist between academics, front-line practitioners, industry, and policy-makers.
Our first issue focuses on these issues as they relate to payment reform—a key emerging driver of health system transformation.
We hope you will consider contributing your work to Healthcare: The Journal of Delivery Science and Innovation. Only if we work together and share our experiences, knowledge, and insights can we unlock the true potential that exists within our systems of care.
* Corresponding author. Tel.: +1 267 975 8833.
E-mail addresses: [email protected], [email protected] (A. Navathe).
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.05.004
Healthcare 1 (2013) 2–2
Editorial
Introduction to Healthcare: The Journal of Delivery Science and Innovation
It's hard to imagine a more exciting era in US health care. This is
a time of great challenge – and even fretfulness – for America, but
also a time of great opportunity. We are dealing with a system too
often fragmented, impersonal, and ineffective, that is at the same
time so expensive as to have become unsustainable. We spend
almost twice as much per person on healthcare as any other
country on earth, but our lifespan is not even in the top 20. The
main problem lies in our delivery system, which is outdated,
unreliable, unsafe, fragmented, and full of waste. We need to
change that system, and bring the focus back to where it should
always have been: on the patient. And we need a system that
thrives and celebrates when people stay well, not one that fixes its
sights so thoroughly downstream on burdens of illness, injury, and
disability that could be prevented upstream.
Changing the delivery system, however, is not going to happen
overnight. And, it is not going to be done by one person, one group, or
even one arm of society. To make truly meaningful change, we need
to bring together the best minds across academia, medicine, government, the community, and more. Healthcare: The Journal of Delivery
Science and Innovation will help do just that. Its aim is to enable new
ideas to arise, grow, mature, become visible, and ultimately reach the
people who can turn the idea into wide-scale impact.
America could in time have the best health care system in the
world. We have all the elements, the latest technologies, the best
parts, and committed people, but we cannot reach that broad
goal if the parts of care persist in fragmentation and fail to work
together. The vast tectonic shifts of the past few years, including,
but by no means limited to, the Affordable Care Act, set the stage for something truly marvelous to happen with the care we all want
and need. I am confident that Healthcare: The Journal of Delivery
Science and Innovation will provide a venue for sharing and
shaping ideas that will help meet that challenge: to improve the
quality, justice, and sustainability of our health care system, and
ensure that health care is a right for all Americans.
Donald M. Berwick, MD, MPP*
President Emeritus and Senior Fellow
20 University Road, 7th Floor, Cambridge, MA 02138, USA
E-mail address: [email protected]
* Tel.: +1 440 364 8872.
Received 22 April 2013
Available online 4 May 2013
2213-0764/$ - see front matter © 2013 Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.010
Healthcare 1 (2013) 3
Introduction to Healthcare: The Journal of Delivery Science and Innovation
I am pleased to introduce this first issue of Healthcare: The Journal of Delivery Science and Innovation.
Much of my work has focused on building the science of
delivery. Thus, it is particularly gratifying to see the first journal
dedicated to strengthening this science in the field of health care.
While engaged in development work in many countries, I have
been struck by the fact that when it comes to our most cherished
social goals—health, education, social protection, environmental
sustainability—we have an inexplicably high tolerance for poor
execution. Downplaying execution has sometimes been seen as
proof that our minds are focused on high ideals, not bogged down
in the mundane mechanics of implementation. This confusion
has compromised health care in the United States and many other
places—and it is no longer acceptable.
Over the past several years, delivery pioneers around the world
have begun considering how to harness systems engineering,
operations science, managerial science, leadership, and strategy
to improve execution in health care and other fields that are
critical for human wellbeing. Failure to incorporate multidisciplinary perspectives into health care delivery has hampered progress
towards the goals that we all want to reach: better health
outcomes and reduced health care costs.
How can we capture and disseminate insights from successful
practice? What are the possibilities and limits of dialog with other
disciplines? How can we learn deeply from failures and continuously stimulate innovation to build a science of health care
delivery?
These are the fundamental questions the editors and authors of
this journal are asking. The work in these pages brings the early
seeds of leadership and transformation that we so urgently need:
in health care delivery and across the social goals that define our
highest aspirations for human progress.
Jim Yong Kim, MD, PhD
President, The World Bank Group
Received 22 April 2013
Available online 4 May 2013
2213-0764/$ - see front matter © 2013 Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.011
Healthcare 1 (2013) 4–7
Making the RCT more useful for innovation with evidence-based
evolutionary testing
Kevin G. Volpp a,b,c,d,e,*, C. Terwiesch c,d, A.B. Troxel b,c, S. Mehta b,c,e, D.A. Asch a,b,c,d,e
a Center for Health Equity Research and Promotion, Philadelphia VA Medical Center, USA
b Leonard Davis Institute Center for Health Incentives and Behavioral Economics, USA
c University of Pennsylvania, School of Medicine, USA
d The Wharton School, USA
e Penn Medicine Center for Innovation, USA
ARTICLE INFO
Article history:
Received 9 February 2013
Received in revised form 18 April 2013
Accepted 20 April 2013
Available online 9 May 2013
Keywords:
Innovation
Evidence
Randomized controlled trials

ABSTRACT
We propose a new innovation model designed to accelerate the rate of learning from provider payment reform initiatives. Drawing on themes from operations research, we describe a new approach that balances speed and rigor to more quickly build evidence on what works in delivery system redesign. While randomized controlled trials provide “gold standard” evidence on efficacy, traditional RCTs tend to be static and provide information too slowly given the CMMI tagline of “We can't wait.” Our approach speaks to broader needs within health financing and delivery reform for testing that, while rigorous, recognizes the urgency of the challenges we face.
Published by Elsevier Inc.
The Center for Medicare and Medicaid Innovation (CMMI)
launched the Health Care Innovation Challenge in November of
2011 and called on teams from around the United States to design
and evaluate innovative programs that would accomplish the
triple aim of improving health, improving health care, and reducing costs. The creation of CMMI more broadly provides an
opportunity to think about how to catalyze and evaluate innovations in provider payment, health care financing, and health care
delivery. The Health Care Innovation Challenge was branded with
the tagline, “We can't wait”—a reference both to the urgency of
meeting the triple aim, and to the delays that traditional prospective evaluations often create. Innovation is slowed by evaluation,
but innovation without evaluation limits learning. As a result,
accelerating evaluation is a central challenge in health care
innovation.
Building on the literature of innovation management and
design thinking, we propose a more flexible evaluation system
that could be used by organizations in developing, testing, and
evaluating provider payment and delivery system initiatives. This
new approach emphasizes the fluid and dynamic nature of the
innovation process. We argue that this added flexibility and
increased speed sacrifice only a small amount of the rigor of traditionally conceived randomized controlled trials.
* Correspondence to: University of Pennsylvania, School of Medicine, 1120 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021, USA. Tel.: +1 215 573 0270; fax: +1 215 573 8778.
E-mail addresses: [email protected], [email protected] (K.G. Volpp).
2213-0764/$ - see front matter © 2013 Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.007
Experimentation and testing are at the heart of innovation.
Ideally, new products or services are created and then empirically
tested and improved upon through experiments.1 In assessing
health care interventions, the typical approach to experimentation
is either a pre-post study (not a true experiment) or a randomized
controlled trial (RCT). In a pre-post study, an outcome is measured
before the intervention and then again after the intervention and
the change in outcome is then assumed to represent the impact of
the intervention. Such one-arm demonstration projects allow only
weak conclusions since observed changes over time may be due to
the intervention or any number of measured or unmeasured
confounders.2 A common problem is regression to the mean,
because interventions often target individuals whose condition
at enrollment is extreme (more often hospitalized, worse diabetes
control) with the aim of putting them closer to the mean
(hospitalized an average amount, average diabetes control). With
time, simple regression to the mean can masquerade as “improvements” falsely attributed to the intervention.
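To make this pitfall concrete, here is a minimal simulation with entirely hypothetical rates and thresholds (not data from any study). No intervention is ever applied, yet a pre-post comparison of patients enrolled because of an extreme year-1 observation shows an apparent "improvement":

```python
import random

random.seed(7)

# Each person has a fixed true hospitalization rate; yearly observations
# are noisy. Because no intervention is applied, any "improvement" in the
# pre-post comparison below is pure regression to the mean.
true_rates = [random.gauss(5.0, 1.5) for _ in range(10_000)]
year1 = [t + random.gauss(0, 1.0) for t in true_rates]  # "pre" measurement
year2 = [t + random.gauss(0, 1.0) for t in true_rates]  # "post" measurement

# Enroll only patients who looked extreme in year 1 (as many programs do).
idx = [i for i, obs in enumerate(year1) if obs > 7.0]
pre = sum(year1[i] for i in idx) / len(idx)
post = sum(year2[i] for i in idx) / len(idx)

print(f"pre = {pre:.2f}, post = {post:.2f}, apparent 'effect' = {pre - post:.2f}")
```

Because enrollment was conditioned on an extreme noisy observation, the follow-up year drifts back toward each patient's true rate even though nothing was done, and a one-arm design misreads that drift as a treatment effect.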
Randomized trials are often implicitly passed over as ways to
evaluate innovations in care delivery or financing because of a
misguided view that even though all new drugs require evaluation
of their safety and effectiveness, policy or delivery system innovations do not require evaluation so long as they are thoughtful and
well-conceived. In a standard randomized trial, participants are
assigned randomly between the current care process and one
(or multiple) treatment arms that reflect a new care process or
financing approach and the experimental evaluation of the intervention is simply the difference in outcome between the treatment and control groups. Randomized trials provide strong causal
evidence of effects, but tend to be lengthy and narrowly focused
on the effect of a precisely specified intervention. That precision
may be less relevant in highly naturalized and complex settings
like clinics and hospitals where the effects of a policy innovation
are likely to be substantially modified by the particular people and
context of the setting.
A weakness of the traditional RCT is that the organization or
researcher must commit to one new care or financing model
ex ante. Study teams passively observe how this model performs over
time. Although RCTs sometimes require interim analyses and
intermediate stopping rules, few opportunities exist to improve
the trial while the experiments are unfolding, as teams typically wait until the end of the three- or four-year trial for the results to
come in. Those are good rules when we want high confidence and
are not constrained by the need for rapid results. But they are too
strict if each approach is mostly meant to inform the next one in a
rapid cycle innovation approach. In those cases we should be
willing to make faster changes in the middle of a trial and tolerate a much higher Type I error rate (i.e., the chance of concluding a
significant improvement when none really exists).
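The speed that a looser Type I error rate buys can be sketched with the standard two-sample normal approximation for sample size; the effect size, power, and alpha values below are illustrative assumptions, not figures from any proposal:

```python
from statistics import NormalDist

def n_per_arm(alpha: float, power: float, effect_size: float) -> float:
    """Patients per arm needed to detect a standardized mean difference
    `effect_size` with a two-sided test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    return 2 * ((z_a + z_b) / effect_size) ** 2

# Conventional confirmatory standard vs. a looser rapid-cycle screen.
strict = n_per_arm(alpha=0.05, power=0.80, effect_size=0.2)
loose = n_per_arm(alpha=0.20, power=0.80, effect_size=0.2)

print(f"alpha=0.05 -> {strict:.0f} per arm; alpha=0.20 -> {loose:.0f} per arm")
```

Relaxing alpha from 0.05 to 0.20 shrinks required enrollment per arm by roughly 40% in this example, which is the tradeoff rapid-cycle iteration deliberately accepts: faster answers at the cost of more false positives.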
Evaluation of innovation in health care often rests at one of the
following two extremes: (1) no evaluation at all or pre-post
designs with limited ability to draw causal inferences; or
(2) lengthy and expensive randomized trials whose results are
too context-dependent to justify the long trial times they require.
The design theorist and architect Christopher Alexander
emphasized that the design of new systems requires iterating
between the abstract, mental world, where hypotheses are formulated, and the actual world, where they are tested against empirical reality. If there are explicit milestones at which
the researcher switches from the mental world to implementation,
the process of innovation resembles the pathway shown in Fig. 1.
In the innovation management literature, this process is commonly referred to as the stage-gate process for innovation.
A vast body of research in the areas of product development
and innovation has shown mathematically3 as well as empirically4–6 that the static processes of the stage-gate model are best
suited for incremental innovations in low uncertainty environments.7,8 However, the challenge of transformative health care
innovation is not to develop incremental change but to create a
radically new care process for patients with high rates of poor
outcomes and high costs.4–6 What CMMI seeks is a substantially
new care process for patients, to shift the focus of health care from
providing care through a visit-based model in which patients see
providers when they are sick to one focused on improving
population health. Such transformations require innovation simultaneously in (a) the operations of the care model; (b) the technological infrastructure and workforce; and (c) the mindsets and
engagement of both patients and providers and the incentives
they face. An open question is whether population health is
defined as focusing on clinical aspects of population health or
more broadly social and environmental determinants of health.9 In
either case, an approach to design, testing, and evaluation is needed
that can provide rigorous but timely feedback on effectiveness.
The process model that the innovation literature recommends
for such radical, high uncertainty innovation challenges is one of
rapid iteration and validation. In “The Sciences of the Artificial,”
Nobel laureate Herbert Simon emphasizes that certain problems,
especially those of high complexity, are best solved by iteration.
Simon's work had a major impact on the fields of design and
innovation and the current literature in these fields strongly
embraces rapid iteration between the mental world and empirical
reality (see Fig. 2). In these settings, the point in time at which the
definition of the new product or process is finalized is deliberately
kept open for longer, creating an opportunity for learning and
flexibility for future adjustment in response to new data.4–6 This
type of innovation research is inspired by models of software
development.7,8
For example, a 3-year evaluation could be sequenced into
multiple learning cycles, each cycle refining the previous one
(Fig. 2). This iterative model reflects the fact that learning about
implementation comes during a trial, even if comparative outcomes
do not arrive until the end, and knowledge of both implementation and outcomes can be used in refining the approach that is
being tested.
Version 2 (V2) reflects the knowledge gained from implementing Version 1 (V1), but may also be informed by parallel experiments of narrower components performed alongside a larger
intervention, such as testing different ways of incenting patients
or providers, different technologies, or different assessment or
data collection techniques—any element of the central intervention that includes embedded alternatives that can be compared.
These side experiments can be carried out with a different set of
patients or providers to avoid interfering with the main study
population.
Both of these sources of knowledge allow organizations (or
researchers) to improve and refine the care model. The intervention can be dynamically adjusted to the new data obtained from
the clinical implementation instead of “being stuck” with the initial
hypothesis. This form of rapid cycle innovation might be called
evidence-based evolutionary design.
A common critique of rapid cycle innovation is that it appears
chaotic compared to the carefully crafted project management
plans of the more traditional innovation process with its distinct
phases of design, build, implementation, and evaluation following
the logic in Fig. 1.2 But the approach derives value from the
recognition that even the most conceptually grounded and
evidence-based interventions can always be improved. While a
traditional RCT asks the question, “Which among these fixed
alternative approaches is the best?” evidence-based evolutionary
design asks the question, “How can I make my processes better?.”
A fundamental appeal of the second approach is that it has no
limit and supports the notion of continuous improvement.
Fig. 1. Problem solving model.14
Fig. 2. Flexible model of iterative innovation.
Fig. 3. Evidence-based evolutionary testing.
An example of our approach, in the context of our own
applications to the CMMI 2012 Innovation Challenge, is summarized in Fig. 3; we present both a sequence of interventions related
to development of an approach that could be used in population-based financing of providers and side experiments. We use as a
foundation the concept of “automated hovering” in which
advances in behavioral economics, changes in health care financing to emphasize population-based health improvement as
opposed to reimbursement for visits or procedures, and the
proliferation of wireless devices and social media come together
to create new possibilities to improve health engagement among
patients.9 In the evidence-based evolutionary designs approach,
the traditional RCT is adapted to better accommodate ongoing
improvement in care processes but still retain the appealing RCT
features that allow definitive evaluation. All interventions will be
compared in a randomized design against patients receiving usual
care as a control. In that way, the design remains true to the
principles of the RCT.
Rather than deploy a single intervention during this 3 year
period, the investigative team has begun with a current state-of-the-art standard (Version 1.0), which will be modified based on
new experience and early evidence, with an enhanced Version
2.0 deployed about midway through the trial period. Randomization to either active treatment will occur at a 2:1 ratio relative to
controls, so that there are equally sized groups receiving usual
care, Version 1.0, and Version 2.0, optimizing statistical power for
all comparisons. Insights from these two models could then be
used to develop a larger-scale model to be deployed with insurer
partners in months 34–36. The result is that at the end of the
3 year period a team of investigators will have [1] deployed a
state-of-the-art approach to achieve the triple aim in these
patients based on the best available evidence; [2] compared
Version 1.0 to a randomly assigned usual care control; [3] run a
series of side experiments that will inform subsequent modifications; [4] deployed and tested an enhanced Version 2.0 against a
randomly assigned usual care control. Results can be analyzed
comparing any intervention (Version 1.0 or 2.0) to control—a
comparison that spans the entire period; [5] deployed a version
for scaled up implementation (Version 3.0) that has gone through
two cycles of product development. Because of the unbalanced
randomization, it will also be possible to compare Version 1.0 and
Version 2.0 directly, although we acknowledge potential confounding by time or other cohort effects since this comparison
does not rest on randomized groups. While we think it is cleaner
to test Version 1.0 and Version 2.0 in separate cohorts, this could
also be done using a single cohort. Note that this illustration
contains only three cycles because of the limited time period of
the funding mechanism; we would suggest that this type of
iteration based on feedback and evaluation be continuous and
ongoing.
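One way to see why the staggered 2:1 allocation yields three equally sized groups is a quick simulation (the patient count and the `allocate` helper are our own illustrative assumptions, not the study protocol):

```python
import random

random.seed(42)

def allocate(n_patients: int) -> dict:
    """Staggered 2:1 allocation: in the first half of enrollment patients
    are randomized 2:1 to Version 1.0 vs. usual care; in the second half,
    2:1 to Version 2.0 vs. usual care. Controls accrue across the whole
    period, so the three groups end up roughly equal in size."""
    counts = {"usual care": 0, "v1": 0, "v2": 0}
    for i in range(n_patients):
        active = "v1" if i < n_patients // 2 else "v2"
        arm = random.choices([active, "usual care"], weights=[2, 1])[0]
        counts[arm] += 1
    return counts

print(allocate(30_000))
```

Each half contributes two-thirds of its patients to its active arm and one-third to usual care, so over the full period each of the three groups receives about one-third of enrollment, which is what equalizes power across the comparisons.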
This approach to testing ideas using sequential randomized
trials is likely to advance knowledge on the effectiveness and cost
effectiveness of provider payment or other care delivery initiatives
more rapidly than: (1) implementation or “scaling up” of programs that have some limited supportive evidence; (2) a pre-post design; or (3) a traditional RCT, because such approaches either do not facilitate rigorous evaluation or do not expedite incorporation of intervention improvements.
We frame this approach in terms of two alternative extremes in how organizations can design, test, and evaluate new ideas3: (a) development and improvement or (b) Darwinian selection. To the extent that there exists a significant amount of ex ante
knowledge about the innovation under development, innovation
can be “rationally designed and executed” using development and
improvement. Careful plans are developed, milestones are defined,
and the plans are executed. The strength of the approach lies in its
predictability—you get what you plan for.
The problem with the development and improvement
approach is that it does not do well under high degrees of
uncertainty/ambiguity (low amounts of ex ante knowledge and/or a fast-moving world).5 In such settings, problem solving is a much
more organic and evolutionary process.4 Many alternatives have to
be considered and since there exists insufficient ex ante information about an evaluation of these opportunities, many of them will
be launched and explored in parallel.10 This is the basis for
considering an approach more akin to Darwinian selection.
To categorize all alternative development approaches/methodologies, it is thus helpful to lay out a continuum ranging from
“high amount of prior knowledge” to “low amount of prior knowledge.” In health care delivery settings, high amounts of prior
knowledge imply that standard RCTs can be an effective approach
in which the alternatives are defined, the sample sizes computed,
and the trials implemented and evaluated. When prior knowledge
levels are low, an entirely organic process of Darwinian selection
makes more sense: start 10 parallel alternatives and judge at the
end.11
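The Darwinian-selection extreme can be sketched in a few lines; all effect sizes and noise levels here are made up for illustration, not drawn from any trial:

```python
import random

random.seed(1)

def pilot_outcome(true_effect: float, n: int = 200) -> float:
    """Noisy measured improvement from one small pilot of n patients."""
    noise = random.gauss(0, 1.0 / n ** 0.5)
    return true_effect + noise

# Under low prior knowledge: launch 10 parallel alternatives whose true
# effects are unknown, then judge at the end by measured performance.
true_effects = [random.uniform(-0.1, 0.3) for _ in range(10)]  # hidden
measured = [pilot_outcome(t) for t in true_effects]

best = max(range(10), key=lambda i: measured[i])
print(f"selected alternative {best}: measured {measured[best]:.3f}, "
      f"true {true_effects[best]:.3f}")
```

Note the caveat this sketch makes visible: because selection is on noisy measurements, the winning arm's measured effect tends to overstate its true effect (a winner's-curse bias), which is one reason purely parallel selection wastes information relative to the hybrid approach described below.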
We view our proposed methodology as a hybrid which reflects
that we have an intermediate amount of prior knowledge. We
build in the flexibility of adjusting/responding to new information
as the process unfolds. This is analogous to the process articulated
by MacCormack, Verganti, and Iansiti in their influential work in
Management Science and work by Christensen and others that
suggest using a learning mindset to continue to refine and
improve new models as they are built, rather than pursuing a
focused approach that attempts to prove/disprove (in the case of a
study) or scale (in the case of a business) a model that has not yet
been validated and fully refined.12,13 These approaches share the
notion that too much is uncertain and unknown at the outset of a
new venture to deploy a static model of evaluation. However, we
do not engage in a parallel exploration of alternatives. At any given
time, we define the best alternative based on the limited knowledge that we have. As new knowledge comes in, we refine the
alternative.
The approach we describe here is intended to be a framework
that can be broadly adopted as a way of continuously improving
and building evidence as opposed to a single intervention that
would be “approved.” Part of the concept is that improvement is always possible, and that deploying new interventions in a manner that forces both identification of relevant outcomes and assessment of how well two or more alternatives perform in achieving those predetermined outcomes will facilitate continuous improvement based on evidence as opposed to presupposition.
Evaluation of complex care initiatives is inherently difficult, with substantial cross-sectional variability and many confounding influences, but this approach could help separate signal from noise. There are tradeoffs inherent in our model. A
principal feature relative to static designs is that the intervention
groups continually change over the time period of evaluation, such
that the “effect” is the sum of the impact of all the various
iterations throughout the course of the study. This means that
the results reflect the evolution and improvement of the model
over time and makes it difficult to isolate the impact of specific
interventions. This provides a test of an approach, which while it
can be clearly described is more difficult to exactly replicate than a
static intervention. Another limitation is that if multiple versions
are to be explicitly tested, sample size requirements increase
accordingly. We do not think premature evaluation of underdeveloped concepts is the fundamental problem; the problem is
more one of inadequate evaluation of concepts that have shown
some initial promise but which then are not subjected to ongoing
evaluation, assessment, and improvement as they move toward
wider scale adoption.
Efficiently facilitating innovation will be important for the
CMMI and health care delivery organizations as new models of
payment and health care delivery are developed. This will be a key
ingredient to maximizing the impact of the resources allocated to
new initiatives in the least possible time, with the overall goal of
improving the value of health care delivered in the United States
quickly while building an evidence base and a process that will allow
us to continue to do better.
References
1. Simon HA. The Sciences of the Artificial. 3rd ed. Cambridge, MA: The MIT Press;
1996.
2. Campbell DT, Stanley JC. Experimental and Quasi-Experimental Designs for
Research. Dallas: Houghton Mifflin Company; 1963.
3. Sommer S, Loch C. Selectionism and learning in projects with complexity and
unforeseeable uncertainty. Management Science. 2004;50:1334–1347.
4. MacCormack A, Verganti R, Iansiti M. Developing products on internet time:
the anatomy of a flexible development process. Management Science. 2001;47:
133–150.
5. Eisenhardt K, Tabrizi B. Accelerating adaptive processes: product innovation in
the global computer industry. Administrative Science Quarterly 1995;40:84–110.
6. Terwiesch C, Loch C. Measuring the effectiveness of overlapping development
activities. Management Science. 1999;45:455–465.
7. Boehm B. A spiral model of software development and enhancement. Computer.
1988;21:61–72.
8. Connell J, Shafer L. Structured Rapid Prototyping—An Evolutionary Approach to
Software Development. Englewood Cliffs, NJ: Prentice Hall; 1989.
9. Noble DJ, Casalino LP. Can accountable care organizations improve population
health? Should they try? JAMA: The Journal of the American Medical Association.
2013;309:1119–1120.
10. Loch C, Terwiesch C, Thomke S. Parallel and sequential testing of design
alternatives. Management Science. 2001;47:663–678.
11. MacCormack A. Management lessons from Mars. Harvard Business Review.
2004;32:18–19.
12. Christensen CM, Raynor ME. The Innovator's Solution: Creating and Sustaining
Successful Growth. Boston, MA: Harvard Business School Press; 2003.
13. McGrath RG, MacMillan IC. Discovery Driven Planning. Boston, MA: Harvard
Business School Press; 1995.
14. Alexander C; 1964.
Healthcare 1 (2013) 8–11
Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi
Perspectives Paper
Will new care delivery solve the primary care physician shortage?: A call for
more rigorous evaluation
Clese E. Erikson
Center for Workforce Studies, Association of American Medical Colleges, 2450 N Street, NW, Washington, DC 20037, USA
article info
abstract
Article history:
Received 1 March 2013
Received in revised form
12 April 2013
Accepted 18 April 2013
Available online 9 May 2013
Transformations in care delivery and payment models that make care more efficient are leading some to
question whether there will really be a shortage of primary care physicians. While it is encouraging to see
numerous federal and state policy levers in place to support greater accountability and coordination of
care, it is too early to know whether these efforts will change current and future primary care physician
workforce needs. More research is needed to inform whether efforts to reduce cost and improve quality
of care and population health will help alleviate or further exacerbate expected primary care physician
shortages.
© 2013 Elsevier Inc. All rights reserved.
Keywords:
Primary care shortages
Team-based care
Triple aim
Accountable care organizations
Payment reform
Workforce
1. Introduction
Health care in the United States is undergoing major transformations in care delivery and payment models that are bringing us
closer to the goals of the IHI “triple aim”1 of improving both the
patient experience of care and population health while simultaneously reducing cost. New payment models that reward value
over volume such as Accountable Care Organizations (ACOs) and
increased payments for practices recognized as Patient-Centered
Medical Homes (PCMHs) are providing incentives for primary care
practices to redesign care delivery, leading them to integrate novel
team members and new technology to facilitate highly-coordinated, team-based care. While much of the innovation at the
national level is spurred by the Affordable Care Act,2,3 states4 and
even private insurance companies5,6 are also experimenting by
paying more for care coordination in the hopes of reducing
duplication of effort and avoidable hospitalizations and emergency
room visits.
While there is a growing body of evidence that such transformations in care delivery and payment models hold significant
potential for helping to advance the triple aim,7–10 there is very
little empirical data available to assess what effect these changes
will have on workforce needs. New models of primary care are
increasingly incorporating team members such as nurse
Tel.: +1 202 828 0587.
E-mail address: [email protected]
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.008
practitioners (NPs), physician assistants (PAs), medical assistants
(MAs), health coaches, care coordinators, and community health
workers who facilitate task delegation and shared decision-making, thus potentially reducing the need for more physicians.
Furthermore, advances in technology such as the adoption of
electronic medical records (EMRs) and telemedicine facilitate a
shift away from in-person office visits, which could also mitigate
physician shortages. Thus, in an era of expected primary care
physician shortages,11–14 these transformations in care delivery
and payment models offer potential solutions for offsetting primary care shortages by incentivizing a more productive and
efficient health care workforce.15–18 On the other hand, the fact
that these new models rely upon a robust primary care workforce
to provide an increased level of services and attention to quality
outcomes may mean that we will actually need more primary care
physicians than has been projected. It is also possible that
advances in technology will not reduce the need for physicians
to the extent that has been hypothesized, and could actually drive
up demand for services if physicians become more accessible to
patients through shared records and alternative visits through
video or email.
Given the heightened focus on ensuring we have an adequate
primary care physician workforce, this paper first describes what is
known about the workforce implications of new primary care
models that incorporate team-based care, before going on to
illuminate several additional factors which must be taken into
account when considering the effect that the use of teams will
have on primary care capacity. The paper then turns to the role
that technology may play in easing physician shortages and raises
some additional points to consider. It concludes by calling for
policy makers, researchers and funding agencies to incorporate
workforce implications into evaluations of the new models of care
currently being tested. The evaluation plans that have been
proposed thus far do not include an explicit focus on measuring
and evaluating the impact these transformations have on provider
capacity or demand for primary care services.19–21 Furthermore,
patient panel size—a common metric used to discuss provider
capacity—is fraught with ambiguity.22 Developing reliable metrics
for evaluating primary care capacity and funding large scale
workforce evaluations will be essential to help policy makers
and health care providers identify and adopt the most efficient
models of care and better inform understanding of the workforce
needed to support the triple aim.
2. Teams offer potential for increasing primary care capacity
Numerous studies11–14 project primary care physician
shortages; those that take into account both the aging population
and the expansion of insurance coverage under the Affordable
Care Act generally estimate shortages of 35,000–50,000 full-time-equivalent primary care physicians by 2025. However, as a recent
report by the Bipartisan Policy Center and Deloitte Center for
Health Solutions points out, current models do not adequately
account for new delivery models or the role that other professionals are playing in care delivery.23 In order to model the effects
of care redesign on primary care capacity, researchers at the
Columbia Business School and the University of Pennsylvania’s
Wharton School recently designed a theoretical model showing
that increasing patient panels to 3400, through open-access
scheduling, team-based approaches to care, and increased use of
electronic visits, could potentially offset primary care physician
shortages.16 This is a significant increase over the estimated 2300
patients per provider thought to be the average among primary
care providers in the United States,24 or the figure of 2500 patients
per provider used in some projection estimates.16
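The gap between these panel sizes can be made concrete with a little arithmetic. The following sketch is purely illustrative, using only the figures cited above (average of 2300, projection figure of 2500, and the theoretical model's 3400):

```python
# Panel sizes cited in the text: estimated US average (~2300 patients per
# provider), a figure used in some projection estimates (2500), and the
# redesigned-practice panel from the theoretical model (3400).
average_panel = 2300
projection_panel = 2500
redesigned_panel = 3400

# Relative capacity increase implied by moving to the redesigned panel.
increase_vs_average = (redesigned_panel - average_panel) / average_panel
increase_vs_projection = (redesigned_panel - projection_panel) / projection_panel

print(f"vs. estimated average: +{increase_vs_average:.0%}")     # about +48%
print(f"vs. projection figure: +{increase_vs_projection:.0%}")  # +36%
```

In other words, the theoretical model assumes each primary care physician can absorb roughly a third to a half more patients than current averages, which is why the empirical examples discussed below matter.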
A new study25 examining innovative primary care practices
also points to the potential of team-based care to lead to greater
efficiencies, demonstrating the way some practices are streamlining visits to reduce physician time on clerical work and other tasks
that can be delegated through enhanced protocols and standing
orders. Having lab tests ordered prior to the visit and improving
team functioning through co-location, team meetings, and workflow mapping were also cited as examples for improving efficiency. While this study did not focus on panel sizes, the
researchers found these efforts to redesign primary care can
increase job satisfaction and, in some cases, lead to increased
capacity. Cleveland Clinic Strongsville, for example, was able to
increase the number of patient visits per day from 21 to 28 by
using RNs and MAs in expanded roles without hiring additional
physicians. Beyond helping physicians make the most efficient use
of their day, this research shows potential for new care models to
reduce burn-out and thus increase workforce retention. Furthermore, if medical students are exposed to these professionally
satisfying models of care during clerkships and residencies, the
number electing to become primary care physicians could increase
as well.
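As a trivial check on the Strongsville figure (the visit counts come from the text; the percentage is ours):

```python
# Cleveland Clinic Strongsville: daily patient visits per physician before
# and after expanding RN and MA roles, as reported in the study cited above.
visits_before, visits_after = 21, 28
gain = (visits_after - visits_before) / visits_before
print(f"capacity gain per physician-day: +{gain:.0%}")  # +33%
```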
3. Several missing factors must be taken into account
Despite the potential offered by team-based care, several
additional factors must be taken into account. The following three
sections address the issues that should also be considered when
evaluating the effect of team-based care on primary care physician
capacity.
3.1. Improved care coordination and disease management may
require smaller patient-to-provider ratios
The theoretical model proposed by Green and her colleagues
does not take into account changes in levels of service that often
accompany a team-based approach to care, such as improved care
coordination and disease management, medication reconciliation,
greater attention to behavioral health and other specialist access,
and an increased focus on patient education.26,27 Other analysts
suggest much more modest panel sizes of 1400–1900 are needed
in order to accommodate the increased level of services in a team-based approach to care, more in line with what is found in
practice.28 For example, Group Health Cooperative of Puget Sound
reduced patient panel sizes from 2300 to 1800 in order to increase
visit length and provide higher quality of care when they moved to
a team-based model.29
Another factor to consider is that not all panels are created
equal. Panels with high percentages of chronic care patients are
often smaller, as demonstrated by the Special Care Center in
Atlantic City, founded by Rushika Fernandopulle.30 The clinic,
which focuses treatment on the casino workers who are sickest
and thus most expensive to treat, has approximately 600 patients
per MD for the two physicians in that practice, plus two NPs, eight
health coaches, and one full-time social worker. The discrepancies
between what is possible under theoretical models and what has
been achieved in practices such as Group Health and the Special
Care Center—both leaders in providing high quality care under
innovative delivery models—point to the need for more systematic
research to determine how panel size is changing as a result of
innovations in payment models and care delivery. Accountable
Care Organizations (ACOs), patient-centered medical homes
(PCMHs), and other new delivery models feature a more intensive
population-based approach to health along with increased care
management efforts, which requires a robust primary care workforce.31 As we continue to collect data on how capacity changes
under new models of care, it will be important to better understand the extent to which increased visit capacity translates to
increased panel size versus improved access and continuity of care
for existing patients.
3.2. Changes in the supply and scope of midlevel providers
As more practices come to rely upon team-based care, the
number of NPs32 and PAs33 in training is increasing rapidly.
Recently, momentum for expanding the scope of practice for NPs
has been growing in response to projected demand for primary
care,34 and some have argued that more states should follow the
16 that already allow NPs to practice completely independently of
physician oversight so that they could step in to ease the shortage
of primary care physicians. Meta-analyses of numerous pilots and
research studies have found that the quality of care delivered by
NPs specializing in primary care is at least as high as that of
physicians,35 and patients also exhibit high levels of satisfaction
with NPs.36 However, given the vast variation in states’ scope-of-practice laws,34 there is no guarantee that consensus will be
reached across all 50 states, which could limit the extent to which
NPs will be able to take on a greater share of the primary care
workload. Even if they are allowed to practice independently, it is
worth noting that not all NPs choose to specialize in primary care.
Recent estimates suggest that only a little over half of NPs
currently practice in that field,37 limiting the numbers who could
be used to alleviate primary care physician shortages.
3.3. Growing numbers of part-time providers also impact projected need
The growing interest among practitioners in working part-time
is another factor that complicates primary care physician supply
calculations and, in fact, team-based care is highly compatible
with that preference. In a well-structured, team-based model,
physicians do not need to be available all the time because the
practice is deliberately designed to make sure that all team
members can access patient information and share in the care.
While there is no good source of national data on the percent
working part-time, a recent survey of primary care physicians in
Washington found that nearly one out of three work part-time and
on average have 16% smaller patient panels than their full-time
peers (1481 and 1764 respectively).38 Physician shortage projections do take account of part-time work, but may underestimate
the extent of this growing trend. Furthermore, the projected need
is for additional FTE physicians; accounting for those who work
part-time translates into a far greater total number of physicians,
further complicating models that suggest we do not need any
additional doctors.
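The headcount arithmetic behind this point can be sketched as follows. The panel figures come from the Washington survey cited above; the 45,000-FTE shortage is the mid-range of the projections cited earlier, and the one-third part-time share and 0.6 FTE weight per part-time physician are assumptions for illustration only, not figures from the text:

```python
# Washington survey figures cited in the text: average panel sizes for
# part-time vs. full-time primary care physicians.
part_time_panel, full_time_panel = 1481, 1764
panel_gap = 1 - part_time_panel / full_time_panel
print(f"part-time panels are about {panel_gap:.0%} smaller")  # ~16%

# Hypothetical conversion from an FTE shortage to a headcount: assume a
# 45,000-FTE shortage, a workforce that is 1/3 part-time, and 0.6 FTE per
# part-time physician (an assumed weight, not a figure from the text).
fte_shortage = 45_000
avg_fte_per_physician = (2 / 3) * 1.0 + (1 / 3) * 0.6
headcount = fte_shortage / avg_fte_per_physician
print(f"individual physicians needed: about {headcount:,.0f}")
```

Under these assumed parameters the required headcount exceeds the FTE shortage by several thousand physicians, which is the sense in which part-time work complicates the supply models.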
4. The use of technology to address workforce needs
In addition to the use of new team members, technology has
been heralded as a key component of new care delivery models,
with the potential to increase efficiency and drive
down health care utilization,39 in turn saving costs and potentially
decreasing the number of physicians necessary to care for a given
population. For example, EMRs allow multiple providers access to
all information about a single patient, thus reducing duplication of
testing, increasing the speed of diagnosis, and facilitating co-management of patients. Other technological innovations, including videoconferencing, secure messaging through patient portals,
email or phone visits, and electronic monitoring of patients at
home, have also been shown to improve patient access to health
care and clinical outcomes while reducing emergency room
utilization and health care costs.40,41 A striking example of the
use of technology to reduce the need for physicians is Virtuwell, an
online “clinic” created by HealthPartners in Minneapolis, MN,
available to patients 24 hours a day, 7 days a week for 40 conditions.
Patients are effectively triaged by the system by answering a series
of questions about symptoms and history, and, within 30 min,
receive a treatment plan and have a prescription transmitted to
their chosen pharmacy if necessary. Patients and providers alike
are pleased with the system, which also has documented cost
savings.42
Despite their promise, recent studies suggest that a number of
challenges—including concerns about privacy and security, the
lack of interoperability between different vendors, and reluctance
on the part of hospitals and physicians—must be overcome before
EHRs can realize their full potential,43 and in fact a recent review
concluded that EMRs to date have had little impact on efficiency or
health care costs.44 While they still hold great promise, it remains
to be seen whether these challenges will be overcome and EMRs
will alter our workforce needs. It is also unclear whether the shift
away from in-person office visits that is facilitated by technology
will necessarily result in reduced need for physicians. A recent
study45 from Kaiser Permanente Colorado, a leader in leveraging
technology to improve patient access to care and efficient use of
physician time, suggests that use of clinical services—including in-person office visits—may actually increase when patients have
access to their doctors through electronic portals, casting some
doubt on whether use of technology will completely eliminate
physician shortages in primary care. Taken together, these factors
raise caveats that must be further evaluated by studying the use of
technology in practices and collecting more information about the
impact on workforce needs.
5. Conclusion
As we enter a new era of health care delivery driven by the
triple aim and its approach to improving the quality of care while
cutting costs, we must not lose sight of health workforce needs
and should ask whether these efforts will help match physician
supply to demand. It is true that the medical home model is
gaining increasing traction and that technology is being used in
innovative ways but that does not de facto mean we will need
fewer primary care physicians. In evaluating the effects on the
future physician workforce, it is especially important to consider
the increased level of primary care service that is inherent in
supporting the success of these transformation efforts. While
some have proposed that innovations in care delivery, payment
models, and the use of technology will eliminate physician
shortages, we must evaluate evidence from practices that have
adopted these measures before we draw any conclusions. To do
this will require debate and analysis of the best metrics for
capturing workforce data, and then collection of those data from
practices that have adopted new models of care delivery. Agencies
that are funding demonstration projects for new models of care
should incorporate workforce analyses into their evaluation process. Metrics such as panel size and visit volume, which are
commonly referred to in analyses of workforce but not always
explained, must be clearly defined and used consistently in order
to perform evaluations and make comparisons. Unless further
attention is granted to evaluating the way new care delivery and
payment models affect workforce needs, we run the risk of falling
short of the workforce required to sustain these innovative efforts.
Including these considerations in the assessment of new care
delivery models will help us identify the most efficient approaches
and help those following in the footsteps of early pioneers to
design programs that make the most of our health workforce.
Acknowledgments
I want to thank my colleagues Scott Shipman, MD, Director of
Primary Care Initiatives and Workforce Research, and Stacie
Harbuck, MA, Research Analyst, who conducted site visits to
team-based practices, for their insights on the impact on productivity and efficiency. I would also like to acknowledge Shana
Sandberg, PhD, Research Writer, for her substantial contributions
in reviewing and editing this commentary.
References
1. Institute for Healthcare Improvement. The IHI Triple Aim. 〈http://www.ihi.org/
offerings/Initiatives/TripleAim/Pages/default.aspx〉. Accessed 5.04.2013.
2. Iglehart JK. Primary care update–light at the end of the tunnel? The New
England Journal of Medicine. 2012;366(23):2144–2146.
3. Center for Medicare and Medicaid Innovation. Where innovation is happening.
〈http://innovation.cms.gov/initiatives/map/index.html〉. Accessed 26.02. 2013.
4. National Academy for State Health Policy. Medical home and patient centered
care map. 〈http://nashp.org/med-home-map〉. Accessed 26.02.2013.
5. Dentzer S. One payer's attempt to spur primary care doctors to form new
medical homes. Health Affairs. 2012;31:341–349.
6. Raskas RS, Latts LM, Hummel JR, Wenners D, Levine H, Nussbaum SR. Early
results show WellPoint's patient-centered medical home pilots have met some
goals for costs, utilization, and quality. Health Affairs. 2012;31(9):2002–2009.
7. Gilfillan RJ, Tomcavage J, Rosenthal MB, et al. Value and the medical home:
effects of transformed primary care. American Journal of Managed Care. 2010;16
(8):607–614.
8. Maeng DD, Graf TR, Davis DE, Tomcavage J, Bloom FJ. Can a patient-centered
medical home lead to better patient outcomes? The quality implications of
Geisinger's ProvenHealth Navigator. American Journal of Medical Quality. 2012;27(3):210–216.
9. Milstein A, Gilbertson E. American medical home runs. Health Affairs. 2009;28(5):1317–1326.
10. Rosenberg CN, Peele P, Keyser D, McAnallen S, Holder D. Results from a patient-centered medical home pilot at UPMC health plan hold lessons for broader adoption of the model. Health Affairs. 2012;31(11):2423–2431.
11. Kirch DG, Henderson MK, Dill MJ. Physician workforce projections in an era of health care reform. Annual Review of Medicine. 2012;63:435–445.
12. Colwill JM, Cultice JM, Kruse RL. Will generalist physician supply meet demands of an increasing and aging population? Health Affairs. 2008;27(3):W232–W241.
13. Petterson SM, Liaw WR, Phillips Jr. RL, Rabin DL, Meyers DS, Bazemore AW. Projecting US primary care physician workforce needs: 2010–2025. Annals of Family Medicine. 2012;10(6):503–509.
14. Huang ES, Finegold K. Seven million Americans live in areas where demand for primary care may exceed supply by more than 10 percent. Health Affairs. 2013;32(3):1–8.
15. Dentzer S. It's past time to get serious about transforming care. Health Affairs. 2013;32(1):6.
16. Green LV, Savin S, Lu Y. Primary care physician shortages could be eliminated through use of teams, nonphysicians, and electronic communication. Health Affairs. 2013;32(1):11–19.
17. Health Policy Brief: Nurse Practitioners and Primary Care. Health Affairs. 〈http://www.healthaffairs.org/healthpolicybriefs/brief.php?brief_id=79〉; October 25, 2012. Accessed 1.03.2013.
18. Ubel P. Why the primary care physician shortage is overblown. 〈http://www.kevinmd.com/blog/2013/03/primary-care-physician-shortage-overblown.html〉. Accessed 3.04.2013.
19. Fisher ES, Shortell SM, Kreindler SA, Van Citters AD, Larson BK. A framework for evaluating formation, implementation, and performance of accountable care organizations. Health Affairs. 2012;31(11):2368–2378.
20. Shrank W. The Center for Medicare and Medicaid Innovation's blueprint for rapid-cycle evaluation of new care and payment models. Health Affairs. 〈http://content.healthaffairs.org/content/early/2013/03/21/hlthaff.2013.0216〉; 2013 March 27. Accessed 3.04.2013 [Epub ahead of print].
21. Nielson M, Langer B, Zema C, Hacker T, Grundy P. Benefits of implementing the primary care patient-centered medical home: a review of cost and quality results. Washington, DC: Patient Centered Primary Care Collaborative. 〈http://www.pcpcc.net/sites/default/files/media/benefits_of_implementing_the_primary_care_pcmh.pdf〉; 2012. Accessed 4.04.2013.
22. Murry M, Davies M, Boushon B. Panel size: how many patients can one doctor manage? Family Practice Management. 2007;14(4):44–51.
23. Deloitte Center for Health Solutions and Bipartisan Policy Center. The complexities of national health care workforce planning. 〈http://bipartisanpolicy.org/sites/default/files/BPC%20DCHS%20Workforce%20Supply%20Paper%20Feb%202013%20final.pdf〉; 2013. Accessed 25.02.2013.
24. Alexander GC, Kurlander J, Wynia MK. Physicians in retainer (“concierge”) practice. A national survey of physician, patient, and practice characteristics. Journal of General Internal Medicine. 2005;20(12):1079–1083.
25. Sinsky CA, Willard R, Schutzbank AM, Sinsky TA, Margolius D, Bodenheimer T. In search of joy in practice: a report of twenty-three high functioning primary care practices. Annals of Family Medicine. 2013 [in press].
26. Bodenheimer T. Primary care—will it survive? New England Journal of Medicine. 2006;355(9):861–864.
27. Grumbach K, Bodenheimer T. A primary care home for Americans: putting the
house in order. JAMA. 2002;288(7):889–893.
28. Altschuler J, Margolius D, Bodenheimer T, Grumbach K. Estimating a reasonable
patient panel size for primary care physicians with team-based task delegation.
Annals of Family Medicine. 2012;10(5):396–400.
29. Medical home features small panels, long visits, outreach, and caregiver
collaboration, leading to less staff burnout, better access and quality, and lower
utilization. AHRQ health care innovations exchange. 〈http://www.innovations.
ahrq.gov/content.aspx?id=2703〉; 2010. Accessed 25.02.2013.
30. Gawande A. The hot spotters. The New Yorker. 〈http://www.newyorker.com/
reporting/2011/01/24/110124fa_fact_gawande〉; 2011 January 24, Accessed
25.02.2013.
31. Rittenhouse DR, Shatell SM, Fisher ES. Primary care and accountable care—two
essential elements of delivery-system reform. New England Journal of Medicine.
2009;361(24):2301–2303.
32. Auerbach DI. Will the NP workforce grow in the future? New forecasts and
implications for healthcare delivery. Medical Care. 2012;50(7):606–610.
33. Hooker RA, Cawley JF, Everett CM. Predictive modeling the physician assistant
supply: 2010–2025. Public Health Reports. 2011;126:708–716.
34. Schiff M. The role of nurse practitioners in meeting increasing demand for primary
care. Washington, DC: National Governors Association; 2012.
35. Naylor M, Kurtzman E. The role of nurse practitioners in reinventing primary
care. Health Affairs. 2010;29(5):893–899.
36. Knudtson N. Patient satisfaction with nurse practitioner service in a rural
setting. Journal of the American Academy of Nurse Practitioners. 2000;12(10):405–412.
37. Agency for Healthcare Research and Quality. The number of nurse practitioners
and physician assistants practicing primary care in the United States. 〈http://
www.ahrq.gov/research/findings/factsheets/primary/pcwork2/index.html〉;
2010. Accessed 12.04.2013.
38. Skillman SM, Fordyce MA, Yen W, Mounts T. Washington state primary care
providers survey, 2011–2012: Summary of findings. 〈http://depts.washington.
edu/uwrhrc/uploads/OFM_Report_Skillman.pdf〉. Accessed 25.02.2013.
39. Hillestad R, et al. Can electronic medical record systems transform health care?
Potential health benefits, savings, and costs. Health Affairs. 2005;24(5):1103–1117.
40. Vo A, Brooks GB, Farr R, Raimer B. Benefits of telemedicine in remote
communities and use of mobile and wireless platforms in healthcare. The
UTMB Center for Telehealth Research and Policy.〈http://telehealth.utmb.edu/
presentations/Benefits_Of_Telemedicine.pdf〉. Accessed 3.04.2013.
41. Sequist TD, Cullen T, Acton KJ. Indian Health Service Innovations Have Helped
Reduce Health Disparities Affecting American Indian And Alaska Native People.
Health Affairs. 2011;30(10):1965–1973.
42. Courneya PT, Palattao KJ, Gallagher JM. HealthPartners' online clinic for simple
conditions delivers savings of $88 per episode and high patient approval.
Health Affairs. 2013;32(2):385–392.
43. Adler-Milstein J, Jha AK. Sharing clinical data electronically: a critical challenge
for fixing the health care system. Journal of the American Medical Association.
2012;307(16):1695–1696.
44. Kellermann AL, Jones SS. What it will take to achieve the as-yet-unfulfilled
promises of health information technology. Health Affairs. 2013;32(1):63–68.
45. Palen TE, Ross C, Powers JD, Xu S. Association of online patient access to
clinicians and medical records with use of clinical services. Journal of the
American Medical Association. 2012;308(19):2012–2019.
Healthcare 1 (2013) 12–14
Perspectives Paper
Commentary on the spread of new payment models
Michael E. Chernew, Johan S. Hong
Department of Health Care Policy, Harvard Medical School, 180 Longwood Avenue, Boston, MA 02115, USA
article info
Article history:
Received 12 March 2013
Received in revised form
25 April 2013
Accepted 30 April 2013
Available online 9 May 2013
Keywords:
Health economics
Payment model innovation
1. Introduction
Although there have long been many settings where providers
receive a global payment or budget from payers, such payment has
been the exception, not the norm. There are many signs that the
health care system is moving more rapidly away from fee for
service payment towards payment models that bundle payments
either globally or across episodes of care (bundled payments).
For example, in the public sector, the Medicare program has set up
a model of payment for Accountable Care Organizations (ACOs).
Organizations participating in the Pioneer ACO program accept a
global population-based budget and face both upside and downside risk for all medical care delivered to Medicare beneficiaries
that receive most of their care from the ACO. Organizations in the
Medicare Shared Savings ACO program also receive a global
budget to care for their Medicare population and initially only
share in upside savings, but over time will transition to face
downside risk. Interest in the ACO programs has been substantial.1
Since the initial selection of 27 ACOs for the Medicare Shared Savings
Program in April 2012, the number of ACOs has grown with each
successive round, rising from 88 selected in July 2012 to 106
selected in January 2013.2
States are also encouraging movement away from fee-for-service (FFS). For example, in Arkansas, Medicaid and private
insurers will pay physicians episode-based bundled payments
with upside and downside risk.3 Oregon has implemented a
coordinated care organization model in Medicaid that is similar
ⁿ Corresponding author. Tel.: +1 617 432 0174; fax: +1 617 432 0173.
E-mail addresses: [email protected],
[email protected] (M.E. Chernew).
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.009
to the ACO model and uses global payments that, by 2014, will
combine payments for physical, behavioral and dental health into
a single global payment.4 In 2009, the Massachusetts Special
Commission on the Health Care Payment System unanimously
voted to transition the state from an FFS to global payment system
by 2014, though legislation is not that aggressive.5,6 In addition,
eight states that are participants in the Multi-payer Advanced
Primary Care Practice (MAPCP) demonstration by the Center for
Medicare & Medicaid Innovation (CMMI) are implementing global
budgets and quality incentives in addition to an FFS system.7
Efforts in the private sector have also grown. Blue Cross Blue
Shield of Massachusetts implemented the alternative quality
contract (AQC) that brought provider groups into global budget
contracts with quality-based performance bonuses.5,8 Reports in
early 2012 suggested that one in five Massachusetts residents was
enrolled in coverage linked to global budgets.6,9 CareFirst in the
Washington, D.C. area adopted the Patient-Centered Medical
Home model that makes per-member-per-month global payments
to participating practices with a shared-savings component.10
These are just a few examples, but provider interest in new
payment models seems to be growing. In Massachusetts, the AQC system grew from only a small number of providers accepting significant risk in 2008 to 86 percent in 2013. This experience indicates that
the transition to new payment systems may happen much more
rapidly than some observers may envision. The opinions expressed
in this commentary are informed largely by this experience in
Massachusetts. Our focus is on global payment (including global
budget) models, though many concerns apply to models that pay
for episodes of care using bundled rates.
Providers may have many motivations for participating, but the
prospect of flat or falling reimbursement rates is likely a significant
factor. Unlike in FFS, under the new payment models, providers can
capture savings if they can deliver care more efficiently. When reimbursement tightens, providers want more control over how the money is spent.
Existing evidence, though limited, suggests providers can be
successful. Relative to groups receiving FFS payment, the AQC was
associated with a $15.51 (1.9%) decrease in average quarterly
spending per enrollee and increases in quality of chronic care
management and pediatric care.8 Over 2 years, the subgroup of
providers who entered the AQC from a fee-for-service contract
saved $60.75 (8.2%) in average quarterly spending and demonstrated an improvement in adult preventive care.11 These savings largely reflect referrals to less expensive providers,
particularly in the first year, and utilization reductions. Participating providers largely capture these savings.12 Other experiences
with fixed payments to providers, accumulated over decades,
show that when providers were given fixed payments for patients, they reduced spending.13–15 In fact, experience with fixed payments under California's managed care systems showed a reduction in hospital cost growth, largely through reductions in
utilization.16
Some observers have expressed concern that while the global
payments have been shown to alter provider behavior, the
associated savings are captured by the providers themselves, not
by the payers and ultimately beneficiaries or taxpayers.17 This
should not be a major concern. We should not worry much about providers earning profits if those profits are achieved through greater efficiency (as opposed to excessive pricing). These systems are
designed to provide incentives for providers to be more efficient.
The incentives inherently mean that providers will capture some
of the savings from efficiency—the greater the shared savings
percent, the stronger the incentives. Yet, ultimately, a good
payment system requires providers to be able to succeed in the
new environment. If they achieve savings, and thus earn profits,
the system will be able to survive with lower payment updates.
This slower payment trajectory is the real metric of success
because long-run growth rates will swamp short-term savings. If the profits are captured by payers, incentives for efficiency diminish, and without such efficiencies the system will not be fiscally sustainable.
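The shared-savings arithmetic behind this argument can be sketched with purely hypothetical numbers (the `settle` helper, budget, spending, and share values below are illustrative, not drawn from any actual contract):

```python
def settle(budget, actual_spending, provider_share):
    """Split savings (or losses) between provider and payer under a
    two-sided-risk global budget; provider_share is the provider's
    fraction of any surplus or deficit."""
    savings = budget - actual_spending
    provider_gain = savings * provider_share
    payer_gain = savings - provider_gain
    return provider_gain, payer_gain

# A group with a $100M budget that spends $95M generates $5M in savings.
# The larger the provider's share, the stronger the efficiency incentive:
low_share, _ = settle(100e6, 95e6, 0.25)   # provider keeps $1.25M
high_share, _ = settle(100e6, 95e6, 0.75)  # provider keeps $3.75M

# Two-sided risk means overspending produces a symmetric loss:
downside, _ = settle(100e6, 103e6, 0.75)   # provider owes $2.25M
```

The tension described in the text is visible here: the payer's residual share shrinks exactly as the provider's incentive strengthens.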
Yet despite the incentives, the success of global payments
systems is not a certainty. Some of the concerns are obvious. For
example, will organizations beyond the few success stories be
able to transform themselves to meet their objectives? Optimizing
the workforce in the new environment will be important. The
cultural changes are significant. Coordination across different institutions, and the associated need for information technology, is also important. If providers integrate, which seems like a reasonable expectation, we must be concerned that the resulting market power will lead to increased prices in the private sector.18–20
Moreover, in integrated models, incentive systems to allocate
funds across different groups in the integrated system will be
needed. Those systems may be FFS, but operate with managerial
controls that may distinguish them from existing FFS systems. For
example, CareFirst's patient-centered medical home builds upon
the existing FFS system with participation enhancements and
outcomes-based incentives that allow for participating primary
care physicians to earn fees above the base fee schedule.21 It is
certainly possible that the cost of running the systems will exceed
the savings.22 Finally, designing effective regulation will be a
necessity but a lot of work needs to be done in this area.
Even if we transition to payment models in which providers
can reduce spending (or lower the rate of spending growth), there
is a concern about quality of care. While global payments provide
incentives to reduce costs, efforts to control utilization, in theory
could lead to denial of, and thus lower quality of, care. Most global
payment models implemented today include incentives to
improve quality, and in fact, evidence suggests they have been
successful.11,23 Yet, quality measures are incomplete and concerns
that systems facing incentives to provide less care will not provide
optimal care are genuine. Several factors, however, balance out this
concern. Specifically, in a global payment model, providers have
incentives to coordinate care across all settings which could
improve quality. Similarly, providers face the cost of adverse
events so they have incentives to minimize bad outcomes and
improve population health. In an FFS system, providers do not get
paid for many quality enhancing activities, but a global model can
create incentives for such initiatives. More broadly, it is important
to recognize that in the FFS system, the incentives are to provide
more care, which could lead to poor quality, and in fact, there is ample evidence of overuse, which could lead to poor outcomes.24 For example, FFS encourages use of services such as back surgeries, which evidence suggests are often not needed.25–27 Continued
monitoring and research will be needed in these new systems to
support quality improvement and guard against stinting on care.
Other less obvious concerns will also arise. For example, one
concern is the ability of private sector innovators to appropriate
the value of their innovations. In a global payment model,
providers capture some of the savings associated with more
efficient care. However, that incentive exists only for the beneficiaries of the insurer who implements the program. Yet, many
providers serve patients from multiple plans. They may practice
similarly for all their patients.28 As a result, other insurers (or self-insured employers) may benefit from the spillover. This
diminishes the incentives for insurers to adopt these programs
and over time may diminish their spread, not for lack of provider
interest, but because insurers may not be able to capture savings
over time. Insurers can address this issue in several ways, including limiting their provider networks (though beneficiaries may
want choice) or adopting services that complement the payment
model but are limited to their own beneficiaries. These may
include data support services that help target their beneficiaries
or targeted case management. Policymakers can minimize the free
rider/appropriability problem by encouraging all-payer payment
systems. Moreover, provider preferences may further mitigate this
concern if delivery systems seek out such contracts from all of
their payers. In fact, the five AQC groups in Massachusetts opted to
participate in the Pioneer program, presumably in part because
they want to have a single set of incentives across their patient
population—not live in a world with FFS incentives for some
patients and global budget/quality incentives for others. It is not
clear if this experience will generalize.
2. Conclusion
For decades, the health care system has been characterized
largely by an FFS system which, given the prices that prevailed,
generated incentives for providers to provide more and not
necessarily better care. This system has resulted in an inefficient
health care system with rising spending. Bundled or global
payment models provide incentives to lower spending and can
thus be an important part of efforts to address our fiscal concerns.
As many observers have noted, there will be challenges to
implementing a global payment model that lowers spending
growth, allows all stakeholders to benefit from the captured
savings, and maintains quality of care. Monitoring the progress
of new adopters of bundled and global payment systems will be
important for policymakers. While it is too soon to say with any degree of certainty whether global payments will succeed, they provide an alternative to the currently unsustainable FFS system, with the potential to significantly slow health care spending growth.
References
1. Muhlestein D. Continued growth of public and private accountable care
organizations. Health Affairs Blog. 2013.
2. Centers for Medicare & Medicaid Services. Shared savings program: program news and announcements; 2013.
3. Emanuel EJ. The Arkansas innovation. The New York Times; 2012.
4. Stecker EC. The Oregon ACO experiment—bold design, challenging execution.
New England Journal of Medicine. 2013.
5. Song Z, Landon BE. Controlling health care spending—The Massachusetts
experiment. New England Journal of Medicine. 2012;366(17):1560–1561.
6. Mechanic RE, Altman SH, McDonough JE. The new era of payment reform,
spending targets, and cost containment in Massachusetts: early lessons for the
nation. Health Affairs. 2012.
7. Takach M. Reinventing Medicaid: State innovations to qualify and pay for
patient-centered medical homes show promising results. Health Affairs.
2011;30(7):1325–1334.
8. Song Z, Safran DG, Landon BE, He Y, Ellis RP, Mechanic RE, et al. Health care
spending and quality in year 1 of the alternative quality contract. New England
Journal of Medicine. 2011;365(10):909–918.
9. Kowalczyk L. Cost-controlled health coverage gaining ground in Mass. Boston
Globe. 2012:30.
10. Dentzer S. One payer's attempt to spur primary care doctors to form new
medical homes. Health Affairs. 2012;31(2):341–349.
11. Song Z, Safran DG, Landon BE, Landrum MB, He Y, Mechanic RE, et al. The 'alternative quality contract,' based on a global budget, lowered medical spending and improved quality. Health Affairs. 2012;31(8):1885–1894.
12. Chernew ME, Mechanic RE, Landon BE, Safran DG. Private-payer innovation in Massachusetts: the 'alternative quality contract'. Health Affairs. 2011;30(1):51–61.
13. Landon BE, Reschovsky JD, O’Malley AJ, Pham HH, Hadley J. The relationship
between physician compensation strategies and the intensity of care delivered
to Medicare beneficiaries. Health Services Research. 2011;46(6pt1):1863–1882.
14. Liu C-F, Chapko MK, Perkins MW, Fortney J, Maciejewski ML. The impact of contract primary care on health care expenditures and quality of care. Medical Care Research and Review. 2008;65(3):300–314.
15. Wieland D, Kinosian B, Stallard E, Boland R. Does Medicaid pay more to a program of all-inclusive care for the elderly (PACE) than for fee-for-service long-term care? The Journals of Gerontology Series A: Biological Sciences and Medical Sciences. 2013;68(1):47–55.
16. Zwanziger J, Melnick GA, Bamezai A. Costs and price competition in California hospitals, 1980–1990. Health Affairs. 1994;13(4):118–126.
17. Nardin R, Himmelstein D, Woolhandler S. Medical spending and global budgets. Health Affairs. 2012;31(11):2592.
18. Melnick GA, Zwanziger J, Bamezai A, Pattison R. The effects of market structure and bargaining position on hospital prices. Journal of Health Economics. 1992;11(3):217–233.
19. Vogt WB, Town R. How has hospital consolidation affected the price and quality of hospital care?; 2006.
20. Capps C, Dranove D. Hospital consolidation and negotiated PPO prices. Health Affairs. 2004;23(2):175–181.
21. CareFirst BlueCross BlueShield. Patient-centered medical home program: program description and guidelines; 2011.
22. Bielaszka-DuVernay C. Vermont's blueprint for medical homes, community health teams, and better health at lower cost. Health Affairs. 2011;30(3):383–386.
23. Song Z, Safran DG, Landon BE, He Y, Ellis RP, Mechanic RE, et al. Health care spending and quality in year 1 of the alternative quality contract. New England Journal of Medicine. 2011;365(10):909–918.
24. Korenstein D, Falk R, Howell EA, Bishop T, Keyhani S. Overuse of health care services in the United States: an understudied problem. Archives of Internal Medicine. 2012;172(2):171.
25. Shreibati JB, Baker LC. The relationship between low back magnetic resonance imaging, surgery, and spending: impact of physician self-referral status. Health Services Research. 2011;46(5):1362–1381.
26. Deyo RA, Mirza SK, Martin BI, Kreuter W, Goodman DC, Jarvik JG. Trends, major medical complications, and charges associated with surgery for lumbar spinal stenosis in older adults. The Journal of the American Medical Association. 2010;303(13):1259–1265.
27. Rau J. Hospitals have got your back, maybe a little too quickly. Shots: Health News from NPR; 2011.
28. Glied S, Zivin JG. How do doctors behave when some (but not all) of their patients are in managed care? Journal of Health Economics. 2002;21(2):337–353.
Healthcare 1 (2013) 15–21
Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi
Global budgets and technology-intensive medical services☆
Zirui Song a,b,n, A. Mark Fendrick c,d, Dana Gelb Safran e,f, Bruce E. Landon a,g, Michael E. Chernew a,b
a Department of Health Care Policy, Harvard Medical School, United States
b National Bureau of Economic Research, United States
c Department of Internal Medicine, Center for Value-Based Insurance Design, University of Michigan, United States
d Department of Health Management & Policy, Center for Value-Based Insurance Design, University of Michigan, United States
e Blue Cross Blue Shield of Massachusetts, United States
f Department of Medicine, Tufts University School of Medicine, United States
g Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Deaconess Medical Center, United States
Article info
Abstract
Article history:
Received 18 January 2013
Accepted 10 April 2013
Available online 9 May 2013
Background: In 2009–2010, Blue Cross Blue Shield of Massachusetts entered into global payment contracts
(the Alternative Quality Contract, AQC) with 11 provider organizations. We evaluated the impact of the AQC
on spending and utilization of several categories of medical technologies, including one considered high
value (colonoscopies) and three that include services that may be overused in some situations (cardiovascular, imaging, and orthopedic services).
Methods: Approximately 420,000 unique enrollees in 2009 and 180,000 in 2010 were linked to primary care
physicians whose organizations joined the AQC. Using three years of pre-intervention data and a large
control group, we analyzed changes in utilization and spending associated with the AQC with a propensity-weighted difference-in-differences approach adjusting for enrollee demographics, health status, secular
trends, and cost-sharing.
Results: In the 2009 AQC cohort, total volume of colonoscopies increased 5.2 percent (p = 0.04) in the first
two years of the contract relative to control. The contract was associated with varied changes in volume for
cardiovascular and imaging services, but total spending on cardiovascular services in the first two years
decreased by 7.4% (p = 0.02) while total spending on imaging services decreased by 6.1% (p < 0.001) relative
to control. In addition to lower utilization of higher-priced services, these decreases were also attributable to
shifting care to lower-priced providers. No effect was found in orthopedics.
Conclusions: As one example of a large-scale global payment initiative, the AQC was associated with higher
use of colonoscopies. Among several categories of services whose value may be controversial, the contract
generally shifted volume to lower-priced facilities or services.
© 2013 Elsevier Inc. All rights reserved.
Keywords:
Global payment
Medical technologies
1. Introduction
The growth of health care spending is a major policy concern.1–4
Over the past few years, insurers have begun to adopt new
payment strategies centered on moving away from fee-for-service towards bundled payments.5 Global budgets, the most
inclusive form of bundled payment in which groups of physicians
and hospitals—increasingly in the form of accountable care
organizations (ACOs)—receive a fixed amount for all of a patient’s
care over a defined time period, are currently being implemented
by Medicare and private insurers across the country.6,7 ACOs share
savings if total medical spending for their patient population
☆ Funded by the Institute for Health Technology Studies, the Commonwealth Fund, and the National Institute on Aging.
ⁿ Corresponding author at: Department of Health Care Policy, Harvard Medical School, 180 Longwood Avenue, Boston, MA 02115, United States.
E-mail address: [email protected] (Z. Song).
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.003
comes in under the budget and may share deficits when spending
exceeds the budget. This latter financial risk gives physician groups
a strong incentive to contain spending.8
Economists have long concluded that medical technology is the
dominant driver of health care spending growth.9–12 Over the past
half century, rapid growth in medical technology has dramatically
increased treatment options for many acute and chronic conditions.
For example, cardiac catheterization has expanded medicine’s ability
to respond to ischemic heart disease. Imaging advancements such
as computed tomography (CT) and magnetic resonance imaging
(MRI) have revolutionized the speed and accuracy of diagnosis.
New pharmaceuticals have introduced treatment options against diseases like cancer and rheumatoid arthritis, turning once-fatal diagnoses into livable chronic conditions. Advancements in surgery and image-guided interventions have broadened
the scope of treatment for many conditions.
As medical technologies flourished, the appropriateness of
their use became a subject of debate. In many situations, the
appropriateness of an intervention is well accepted and backed by
formal practice guidelines. For example, therapies for secondary
prevention after a heart attack are generally both clinically
effective and low-cost.13 Colonoscopy screening for colorectal
cancer is also recognized as highly beneficial, especially in older
populations.14–16 In other clinical scenarios, technologies may be
inappropriately used in some circumstances. For example, percutaneous coronary intervention is frequently performed for non-acute indications across U.S. hospitals, giving rise to overuse.17,18
The potential overuse of diagnostic imaging in recent decades has
also come under scrutiny, such as CT or MRI for patients with low
back pain without neurological symptoms or risk factors indicating
a need for imaging.19,20 In the case of low back pain, reported
health and functioning have not improved as spending on such
imaging accrued, supporting the notion that both cost savings and
clinical benefit (through avoiding unnecessary radiation exposure)
could be achieved through lower use.21,22 Inappropriate imaging
generated through self-referrals has been identified as an especially important problem, as physicians have increasingly owned
their own diagnostic imaging equipment or facilities.23 Even
preventive care is susceptible to overuse in certain populations,
such as women who receive unnecessary screening for cervical
cancer after undergoing hysterectomy for benign causes.24 Such
scenarios are among the lists of low-value services produced by
the American Board of Internal Medicine’s “Choosing Wisely”
campaign in partnership with 17 specialty societies.25,26
For many reasons, a determination of appropriateness or
“value” for any clinical service is inherently difficult. The appropriateness of any individual service depends on a variety of factors,
some of which are difficult if not impossible to measure. They
include its timing in a patient’s trajectory of care (removing a
blood clot from the brain goes from high to low value in a matter
of minutes), location of delivery (the same service can cost much less in a clinic than in a hospital), and the complex clinical situation
within which the treatment decision is made (patients with
multiple co-morbidities). This clinical nuance extends to areas of medicine in which the appropriateness of interventions depends
on a patient’s particular risk factors. Other inputs include various
dimensions of patient preferences and supply-side (physician or
hospital) factors, such as capacity, practice volume, and specialization, which further complicate the value equation.27 Research
suggests that in heart attack treatments, the value of any treatment strategy depends on the patient’s fitness for the treatment
as well as the expertise a provider has with the technology.28
Physician expertise through the volume-outcomes relationship is
also sure to play a role.29,30 Thus, the value of any service may be
quite heterogeneous across a population, with some sure to have a
wider distribution around the average than others.31
Nevertheless, appropriateness of care has become a focus of
health policy, with major legislative efforts such as the Affordable
Care Act motivated, in part, by regional variations in care suggesting
that perhaps one-third of U.S. health care spending is wasteful.32,33
As the nation experiments with global budgets, understanding their
effects on technology-intensive services whose appropriateness can
often be unclear is important. We studied the impact of a widespread
global budget initiative in Massachusetts on several such categories
of services: those thought to be beneficial (colonoscopies), those
thought to be overused in some cases (cardiovascular and imaging),
and those thought to be driven substantially by patient preferences
(orthopedics).
1.1. The Alternative Quality Contract
In 2009, Blue Cross Blue Shield of Massachusetts (BCBS)
entered into global budget contracts with 7 physician organizations in Massachusetts.34 Four additional organizations joined in
2010, and 15 groups were participating as of 2012. The “Alternative
Quality Contract” (AQC) is a 5-year contract that pays organizations a global budget for the entire continuum of care for a
population of enrollees in health maintenance organization plans.
Enrollees designate a primary care physician (PCP) each year, and
budgets are allocated to the PCP’s organization. More than 1600
PCPs and 3200 specialists practice in organizations participating in
the AQC, ranging from multi-specialty practices to large tertiary
care systems. The AQC also includes pay-for-performance bonuses
of up to 10% of the organization’s budget, based on both inpatient
and outpatient quality measures. To support AQC organizations,
BCBS provides technical assistance including periodic reports that
compare an organization’s spending and quality to those of others.
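The bonus structure can be illustrated with a toy calculation. The 10% cap comes from the description above, but the `quality_bonus` helper and the linear scaling from a composite quality score are hypothetical simplifications of the AQC's actual measure-based formula:

```python
def quality_bonus(budget, quality_score, max_bonus_rate=0.10):
    """Hypothetical pay-for-performance payout: a composite quality
    score in [0, 1] scales a bonus capped at 10% of the global budget."""
    if not 0.0 <= quality_score <= 1.0:
        raise ValueError("quality_score must be in [0, 1]")
    return budget * max_bonus_rate * quality_score

# For a $50M global budget (illustrative number):
full = quality_bonus(50e6, 1.0)   # $5M at top performance
half = quality_bonus(50e6, 0.5)   # $2.5M at midpoint performance
```

The point of the cap is that quality payments rise with performance but cannot offset unlimited overspending against the budget.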
Previous work found that the AQC was associated with a 1.9%
reduction in spending and modest quality improvements in
the first year.35 This grew to a 3.3% reduction in the second year
with larger improvements in quality.36 On the whole, savings were
achieved through lower prices in the first year, consistent with the organizations' focus on shifting referrals to less expensive providers.37 By the second year, reductions in utilization also contributed to savings.
2. Methods
2.1. Study design
The population included enrollees from January 2006 through
December 2010 who were continuously enrolled for at least one
calendar year. The 2009 intervention cohort consisted of 428,892
enrollees whose PCPs’ organizations joined the AQC in 2009,
and the 2010 cohort consisted of an additional 183,655 enrollees
of organizations that joined in 2010. About 1.3 million enrollees
whose PCPs did not participate in the AQC served as controls.
Characteristics of the population have been described elsewhere.
Enrollees averaged 35 years of age, and 50% were female. Average cost-sharing was about 15%. These characteristics were stable across the study period.
We used a difference-in-differences approach to characterize the
treatment effect. Our primary analysis consists of the 2009 cohort,
for whom the pre-intervention period was 2006–2008 and the post-intervention period was 2009–2010. Within this cohort, we pre-specified
2 subgroups. The “prior-risk” subgroup consisted of 4 organizations
that had prior experience with risk-based contracts from BCBS (88%
of the cohort), and the “no-prior-risk” subgroup consisted of 3 organizations that entered the contract without BCBS risk-contracting
experience (12% of the cohort). Prior-risk organizations tended to be
larger and more established in the marketplace, whereas the latter
included physician groups that were formed in preparation for
entering the risk contract. We also analyzed the 2010 AQC cohort,
comprising 4 organizations without prior risk contracting, analogous to the no-prior-risk subgroup.
We identified colonoscopy, cardiovascular, imaging, and orthopedic services using Current Procedural Terminology codes and
the 2010 Berenson–Eggers Type of Service (BETOS) classification
system from the Centers for Medicare and Medicaid Services.39
2.2. Statistical analysis
The dependent variable was spending (including patient cost
sharing) in 2010 dollars or utilization. Spending was computed
from claims payments made within the global budget, which
reflects negotiated fee-for-service prices. Utilization was computed
by counting services. For ease of interpretation, we scaled utilization data to volume per thousand enrollees.
We used a multivariate linear model at the enrollee-quarter
level. We controlled for age categories, interactions between age
and sex, and enrollee risk score. Risk scores were calculated by
BCBS from current year demographics and diagnoses grouped by
episodes of care, similar to the Hierarchical Coexisting Conditions
(HCC) risk adjustment system used by the Centers for Medicare
and Medicaid Services to adjust payments to Medicare Advantage
plans.40 The risk score comes from a statistical model relating
current year spending to current year diagnoses and demographic
information. Higher scores denote greater expected spending.
In our base model, we also included indicators for intervention
status, quarter, quarter-intervention interactions, the post-intervention period, and the interaction between intervention and the
post-intervention period, which produced our estimate of the
policy effect. To balance the sample on observable traits, the
model used propensity weights calculated from age, sex, risk
score, and cost-sharing. Given the possibility of unobserved
demand-side incentives, the model included indicators for each
specific benefit design within BCBS HMO plans. Consistent with
our prior work, the model was not logarithmically transformed
because the risk score is designed to predict dollar spending and
linear models have been shown to better predict health spending
than more complex functional forms.41–44 Standard errors were
clustered at the practice level.45,46 Results are reported with 2-tailed p-values.
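The core of this estimation strategy can be sketched in a few lines. The data below are simulated, the uniform weights stand in for the propensity weights, and the real model's covariates (age, sex, risk score, benefit-design indicators) and clustered standard errors are omitted:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated enrollee-quarter observations (illustrative only).
n = 8000
aqc = rng.integers(0, 2, n)      # 1 if enrollee's PCP joined the AQC
post = rng.integers(0, 2, n)     # 1 if quarter falls in the post period
true_effect = -15.0              # built-in "policy effect" in dollars
spend = (800 + 40 * aqc + 25 * post
         + true_effect * aqc * post + rng.normal(0, 50, n))

# Placeholder weights; propensity weights would balance observables.
w = np.ones(n)

# Weighted least squares: beta = (X'WX)^{-1} X'Wy. The coefficient on
# the aqc*post interaction is the difference-in-differences estimate.
X = np.column_stack([np.ones(n), aqc, post, aqc * post])
XtW = X.T * w
beta = np.linalg.solve(XtW @ X, XtW @ spend)
did_estimate = beta[3]  # should recover roughly the -15 built in above
```

The interaction coefficient isolates the change in the treatment group over and above the secular change in the control group, which is exactly the quantity the paper reports as the policy effect.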
We tested the model for differences in pre-intervention trends
between treatment and control enrollees. The lack of a difference
in pre-intervention trends supports our identification strategy
in the difference-in-differences framework. Resulting changes
in spending associated with the AQC can be explained by changes
Fig. 1. Colonoscopy utilization. For all figures, AQC denotes the 2009 intervention cohort, and Non-AQC denotes the control. The x-axis represents 2006–2010 in quarters,
with the vertical line placed at the start of 2009 when the AQC was implemented.
Table 1
Changes in utilization in treatment and control groups (volume per 1000 enrollees per quarter).

                       Treatment enrollees (AQC),     Control enrollees (Non-AQC),   Between-group difference:
                       N = 428,892                    N = 1,339,798                  average 2-year effect (2009–10)
Category               Pre-AQC    Post-AQC   Change   Pre-AQC    Post-AQC   Change   Unadjusted  Adjusted  p-Value
                       (2006–08)  (2009–10)           (2006–08)  (2009–10)

Colonoscopy
  All enrollees         13.53      14.60      1.07     12.62      12.36     −0.26      1.33        0.70     0.04
  Over 50               40.76      40.95      0.19     35.09      33.11     −1.98      2.17        1.90     0.13
Cardiovascular
  CABG                   0.14       0.15      0.01      0.17       0.15     −0.02      0.03        0.01     0.60
  Aneurysm repair        0.01       0.02      0.01      0.02       0.03      0.01      0.00        0.00     0.80
  Endarterectomy         0.04       0.03     −0.01      0.03       0.04      0.01     −0.02       −0.01     0.07
  Angioplasty            0.58       0.49     −0.09      0.62       0.60     −0.02     −0.07       −0.12     0.02
  Pacemaker              0.14       0.15      0.01      0.18       0.19      0.01      0.00       −0.02     0.41
  Other                  3.71       2.85     −0.86      4.26       3.00     −1.26      0.40        0.15     0.46
Imaging
  Standard imaging     251.99     269.85     17.86    265.21     271.10      5.89     11.97        6.19     0.045
  CT                    38.23      39.21      0.98     41.30      39.59     −1.71      2.69        1.16     0.07
  MRI                   49.18      48.50     −0.68     48.00      46.95     −1.05      0.37       −1.06     0.30
  Ultrasound/echo       99.56      95.80     −3.76     96.22      89.00     −7.22      3.46        0.47     0.72
  Imaging procedures    13.26      15.67      2.41     14.28      16.03      1.75      0.66        0.07     0.87
Orthopedics
  Hip replacement        0.15       0.22      0.07      0.18       0.22      0.04      0.03        0.02     0.29
  Knee replacement       0.23       0.31      0.08      0.29       0.35      0.06      0.02        0.01     0.63
in utilization (quantity) or changes in prices. We decomposed
spending results by category into price and quantity components
by standardizing the prices for each service to its median price across
all providers in 2006–2010. Differences in spending from repriced
claims reflect differences in utilization. We further assessed whether
the price effect was due to differential changes in negotiated fees or
differential changes in referral patterns (referring patients to less
expensive physicians or hospitals). We used our models of utilization
to directly analyze the relationship between the AQC and quantity of
specific services within each category.
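The decomposition just described, repricing each claim at the service's median price so that spending differences on repriced claims reflect utilization alone, can be sketched on toy data (the groups, services, and prices below are invented for illustration):

```python
from statistics import median

# Toy claims: (group, service, price paid). In this example the two
# groups use identical volumes but the AQC group pays lower prices.
claims = [
    ("AQC", "MRI", 900), ("AQC", "MRI", 700),
    ("AQC", "CT", 500), ("AQC", "CT", 400),
    ("control", "MRI", 1100), ("control", "MRI", 1000),
    ("control", "CT", 600), ("control", "CT", 600),
]

# Standardize every claim to the service's median price across all
# providers, so repriced spending varies only with quantity.
services = {s for _, s, _ in claims}
median_price = {s: median(p for _, sv, p in claims if sv == s)
                for s in services}

def spending(group, repriced=False):
    return sum(median_price[s] if repriced else p
               for g, s, p in claims if g == group)

total_diff = spending("AQC") - spending("control")      # -800
quantity_diff = (spending("AQC", repriced=True)
                 - spending("control", repriced=True))  # 0: equal volumes
price_diff = total_diff - quantity_diff                 # -800: all price
```

Because volumes are identical here, the entire spending gap survives as a price effect after repricing, which is the signature of savings achieved by shifting care to lower-priced providers rather than by reducing utilization.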
Under a global budget, there may be incentives to upcode for
increased payments, which would make patients seem sicker and
thus spending adjusted for health status seem lower relative to
the control group. In prior work, we explored this concern and
showed that risk score changes associated with the AQC explain only
a nominal share of the spending difference. We used Stata software,
version 11. The study was approved by Harvard Medical School.
3. Results

3.1. Colonoscopies

Over the first two years of the AQC, the 2009 cohort saw an increase of 0.7 (p = 0.04) colonoscopies per 1000 enrollees per quarter relative to control (Fig. 1), amounting to a 5.2% rise in the volume of colonoscopies (Table 1). Because colonoscopies are clinically indicated as a screening tool in patients 50 years or older, consistent with the higher baseline volumes shown in Table 1, we repeated our analysis in this subpopulation. Estimates show a statistically insignificant increase of 1.9 (p = 0.13) colonoscopies per 1000 enrollees per quarter, or about 4.7%, relative to control. In the 2010 cohort, no changes in colonoscopy utilization associated with the AQC were found in its first year (−0.3, p = 0.48).

Spending on colonoscopies increased by 5.7% over the first 2 years in the 2009 cohort ($0.54 per enrollee per quarter). As Table 2 shows, this increase was statistically significant in year-1 (p = 0.005) but not in year-2 (p = 0.22). The 2010 cohort saw no changes in colonoscopy spending relative to control in its first year ($0.10, p = 0.88). Overall, changes in colonoscopy spending were driven by the changes in volume, as we did not find significant changes in colonoscopy prices or location of care associated with the AQC.

3.2. Cardiovascular services

Spending on cardiovascular services was higher in the post-AQC period relative to pre-AQC for both intervention and control subjects, but the difference was smaller in AQC subjects. The 2009 AQC cohort spent on average 7.4% less (−$1.47 per member per quarter, p = 0.02) relative to control. By year, these savings were 6.8% (p = 0.001) in year-1 and 7.7% (p = 0.04) in year-2 (Table 2). The greater difference in year-2 was unrelated to the removal of the 2010 cohort from the control group in year-2 (the 2010 cohort belonged to the control group in year-1). Although selection into the AQC was non-random, an interaction of the secular trend with the AQC indicator demonstrated no significant spending trend differences between AQC and non-AQC groups prior to the intervention. This suggests that the AQC effect is not explained by differential underlying trends in spending between the groups. The 2010 cohort spent 11.1% less (−$2.54, p = 0.004) on cardiovascular services relative to control in its first year.

Fig. 2. Cardiovascular utilization.
Table 2
Changes in spending in treatment and control groups ($ per enrollee per quarter).

Category | Treatment enrollees (AQC, N = 428,892) | Control enrollees (non-AQC, N = 1,339,798) | Between-group difference
         | Pre-AQC | Post-AQC | Change | Pre-AQC | Post-AQC | Change | Unadj. | Adj. (avg 2-yr effect) | P | Year-1 effect | P | Year-2 effect | P

Colonoscopy
  All enrollees | 9.42  | 11.64  | 2.22  | 9.39   | 10.65  | 1.26  | 0.96  | 0.54  | 0.16  | 0.55  | 0.005  | 0.56  | 0.22
  Over 50 yrs   | 27.75 | 31.73  | 3.98  | 25.39  | 27.78  | 2.39  | 1.59  | 1.37  | 0.24  | 1.58  | 0.01   | 1.24  | 0.38
Cardiovascular  | 20.03 | 22.17  | 2.14  | 22.40  | 25.40  | 3.00  | −0.86 | −1.47 | 0.02  | −1.37 | 0.001  | −1.54 | 0.04
Imaging         | 93.10 | 106.54 | 13.44 | 102.31 | 118.80 | 16.49 | −3.05 | −5.67 | 0.001 | −4.40 | <0.001 | −6.86 | 0.001
Orthopedics     | 4.12  | 5.68   | 1.56  | 4.88   | 6.30   | 1.42  | 0.14  | 0.04  | 0.83  | −0.01 | 0.95   | 0.07  | 0.77
For the 2009 cohort, about one-third of the savings was explained by lower utilization of services and two-thirds by lower prices. In our decomposition of the price effect, we found no
differences in price trends between AQC and non-AQC providers.
Instead, spending reductions due to prices were explained by
patients receiving care in outpatient facilities with less expensive
fees, consistent with our prior work. Direct analyses of cardiovascular services (coronary artery bypass grafts, aneurysm repairs,
endarterectomies, angioplasties, and pacemaker insertions) are
reported in Table 1 and plotted in Fig. 2. Results suggest that a
decrease in angioplasty volume likely drove the differences in
spending. Other cardiovascular services did not undergo changes
in volume, consistent with Fig. 2.
Spending reductions were larger in the no-prior-risk subgroup, which had an average reduction of 27.2% (−$6.06, p < 0.001) in cardiovascular spending relative to control in the first two years. In contrast, the prior-risk subgroup experienced smaller reductions of 4.1% (−$0.85, p = 0.17). Sensitivity analyses supported our main
results.
3.3. Imaging
The 2009 AQC cohort spent less on imaging relative to control in the first 2 years. The AQC was associated with a 6.1% decrease (−$5.67 per member per quarter, p = 0.001) in imaging spending. The year-1 reduction was 4.7% (−$4.40, p < 0.001) and the year-2 reduction was 7.4% (−$6.86, p = 0.001) (Table 2). There were no significant differences in pre-intervention spending trends on imaging between the AQC and non-AQC groups.

Spending reductions were found in the outpatient setting rather than the hospital. In addition, outpatient facility spending (the portion of the fee that goes to the facility) accounted for almost all of the savings, suggesting that imaging services were referred to less expensive settings (such as non-hospital-based facilities), giving rise to a large price effect that drove the findings. Consistent with this and Fig. 3, direct analyses of utilization found no statistically significant decreases in the volume of CTs, MRIs, ultrasounds/echocardiograms, or imaging procedures (Table 1). Standard imaging, composed mostly of X-rays, actually saw an increase in volume of 2.5% (p = 0.045) over the first 2 years (6.19 per 1000 enrollees per quarter). This further implies a substantial price effect that overcame some volume increases to produce savings.

Similar to cardiovascular services, savings were larger in the no-prior-risk subgroup. In this subgroup, the AQC was associated with a reduction of 11.9% in imaging spending (−$10.54 per enrollee per quarter, p = 0.02) after two years. The 2010 cohort, made up entirely of no-prior-risk groups, saw a similar reduction of 11.0% (−$10.90, p < 0.001) in its first year. In the prior-risk subgroup, savings were smaller: the average reduction after 2 years was 4.7% (−$4.51, p < 0.001).
3.4. Orthopedics
The AQC was not associated with any changes in spending on orthopedic services, nor were there any associated changes in utilization. The average 2-year effect of the AQC on total orthopedic spending was an insignificant 0.9% ($0.04 per enrollee per quarter, p = 0.83). Neither year-1 nor year-2 saw significant effects (Table 2). This was consistent across prior-risk and no-prior-risk groups in the 2009 cohort. In addition, neither facility nor non-facility components of orthopedic spending were affected. The 2010 cohort saw a large though statistically insignificant reduction in orthopedic spending of 12.7% (−$0.70, p = 0.09).

Fig. 3. Imaging utilization.

Consistent with Fig. 4, direct analyses of utilization demonstrated that the AQC was not associated with changes in the volume of hip or knee replacements (Table 1).
4. Discussion
The AQC was associated with increased spending on colonoscopies due to increased volume. It was associated with reductions in spending on cardiovascular and imaging services, but not on orthopedic services. Reductions in spending were larger in year-2 than in year-1, and were explained both by decreased utilization and by lower prices achieved through referring patients to less expensive providers. Consistent with qualitative evidence gathered from organizational leaders, shifting referral patterns to less expensive providers was the low-hanging fruit by which organizations in the AQC aimed to achieve savings in the early years.48 However, among both cardiovascular and imaging services, about one-third of the reductions in spending, concentrated in year-2 and in groups that entered the contract from fee-for-service, was explained by lower volume. This suggests that even in the early years, organizations were able to achieve reductions in volume in some services.
Under bundled and global payment systems, there can be a
tradeoff between controlling spending and supporting innovation.
Innovations in medicine often lead to technologies that are high
cost, such as new devices or diagnostic modalities, some of which
may be high or low value depending on the clinical situation. In
cases such as angioplasty and imaging, extending these innovations to patients in certain situations, such as a lack of clinical indications for use or purely elective use, has been questioned on the grounds of appropriateness.

Fig. 4. Orthopedic utilization.

One way that systems operating
under global payments have to reduce spending while maintaining quality is to lower spending on low-value services. Thus, a crucial question surrounding our findings is whether the foregone spending was for high- or low-value services. While our analysis of claims data does not include any conclusive assessment of the effect on value, several pieces of supporting evidence suggest that the AQC did not adversely affect the quality of care and may have successfully targeted services that are overused in some situations.
First, in other research we found that the AQC did not negatively affect the pay-for-performance measures of quality used in the AQC program. On the contrary, we found that the AQC was associated with improvements in quality in chronic care management, adult preventive care, and pediatric care. Second, the AQC was associated with increased use of colonoscopies, a service generally supported by practice guidelines. Third, our findings on imaging (Fig. 3) are consistent with broader trends in the slowdown of imaging, which is posited to be driven largely by reductions in use in lower-value situations.50 Finally, our cardiology results are also consistent with a focus on value. Our results merely provide suggestions, rather than conclusive evidence, about value. We must emphasize that value depends on the clinical situation.
Our study has several limitations. First, the population was drawn from a large commercially insured HMO population in Massachusetts, so results may not generalize to Medicare or enrollees in other types of health plans. Second, we did not observe details of each AQC contract or provider risk contracting with other payers, which became more prevalent in 2010. Third, using administrative data, we cannot assess the appropriateness of any specific instance of utilization. Fourth, our service categories do not capture the broad array of medical technologies that patients and doctors use; we merely explored a small mixture of services. In addition, beliefs about appropriateness may be in flux; for example, recent practice guidelines for colonoscopies have also called into question their presumed high value.51 Finally, we caution that multiple comparisons within categories of services may produce spurious results.
Slowing the growth of health care spending without hurting
quality is a central goal.52,53 As payment reform and ACOs
gain momentum across the country, and physician organizations
adapt to a more constrained environment, avoiding the blunting of
technological innovations in medicine is an important priority.54
Different medical technologies will likely be affected differently
by global budgets. This will depend on providers’ and patients’
treatment choices, but it will also depend on the ability of device
makers and manufacturers of tools, scanners, and other technologies to focus on innovations that are high value. As lessons from early global or bundled payment systems amass,55,56 an understanding of the mechanisms by which spending is reduced will be as important to policymakers as the fact that it can be reduced at all.
Acknowledgment
Supported by a grant from the Institute for Health Technology
Studies and a grant from the Commonwealth Fund (to Dr. Chernew).
Dr. Song is supported by a predoctoral M.D./Ph.D. National Research
Service Award (F30-AG039175) from the National Institute on Aging
and a predoctoral Fellowship in Aging and Health Economics (T32-AG000186) from the National Bureau of Economic Research. None of
the funders were involved in the design and conduct of the study;
collection, management, analysis, and interpretation of the data; or
preparation, review, or approval of the manuscript. The content of this
article is solely the responsibility of the authors and does not
necessarily represent the official views of the National Institute on
Aging or the National Institutes of Health. The authors thank Yanmei
Liu for programming assistance and Johan Hong for help with
manuscript editing and preparation.
References
1. Emanuel E, Tanden N, Altman S, et al. A systemic approach to containing health
care spending. New England Journal of Medicine. 2012;367(10):949–954.
2. Antos JR, Pauly MV, Wilensky GR. Bending the cost curve through market-based incentives. New England Journal of Medicine. 2012;367(10):954–958.
3. Chernew ME, Baicker K, Hsu J. The specter of financial armageddon—health
care and federal debt in the United States. New England Journal of Medicine.
2010;362(13):1166–1168.
4. Aaron HJ. The central question for health policy in deficit reduction. New
England Journal of Medicine. 2011;365:1655–1657.
5. Fisher ES, McClellan MB, Safran DG. Building the path to accountable care. New
England Journal of Medicine. 2011;365:2445–2447.
6. Frakt AB, Mayes R. Beyond capitation: how new payment experiments seek to
find the 'sweet spot' in amount of risk providers and payers bear. Health Affairs
(Millwood). 2012;31(9):1951–1958.
7. Meyer H. Many accountable care organizations are now up and running, if not
off to the races. Health Affairs (Millwood). 2012;31(11):2363–2367.
8. Engleberg Center for Health Care Reform. Bending the Curve: Effective Steps to
Address Long-Term Health Care Spending Growth. The Brookings Institution;
August 2009. 〈http://www.brookings.edu/reports/2009/0901_btc.aspx〉.
9. Newhouse JP. Medical care costs: how much welfare loss? Journal of Economic
Perspectives. 1992;6(3):3–21.
10. Garber AM, Skinner J. Is American health care uniquely inefficient? Journal of
Economic Perspectives. 2008;22(4):27–50.
11. Chernew ME, Newhouse JP. Health care spending growth. In: Pauly MV, McGuire TG, Barros PP, editors. Handbook of Health Economics, Vol. 2. North Holland: Elsevier Science; 2012. p. 1–43.
12. Cutler DM. Your Money or Your Life: Strong Medicine for America's Health Care
System; 2005.
13. Smith Jr. SC, Benjamin EJ, Bonow RO, et al. AHA/ACCF secondary prevention
and risk reduction therapy for patients with coronary and other atherosclerotic
vascular disease: 2011 update: a guideline from the American Heart Association and American College of Cardiology Foundation. Circulation. 2011;124:
2458–2473.
14. Sonnenberg A, Delco F, Inadomi JM. Cost-effectiveness of colonoscopy in
screening for colorectal cancer. Annals of Internal Medicine. 2000;133:573–584.
15. Frazier AL, Colditz GA, Fuchs CS, et al. Cost-effectiveness of screening for
colorectal cancer in the general population. Journal of the American Medical
Association. 2000;284:1954–1961.
16. Winawer S, Fletcher R, Rex D, et al. Colorectal cancer screening and
surveillance: clinical guidelines and rationale—update based on new evidence.
Gastroenterology. 2003;124:544–560.
17. Chan PS, Patel MR, Klein LW, et al. Appropriateness of percutaneous coronary
intervention. Journal of the American Medical Association. 2011;306(1):53–61.
18. Bradley SM, Maynard C, Bryson CL. Appropriateness of percutaneous coronary
interventions in Washington State. Circulation: Cardiovascular Quality and
Outcomes. 2012;5(4):445–453.
19. Chou R, Deyo RA, Jarvik JG. Appropriate use of lumbar imaging for evaluation of
low back pain. Radiologic Clinics of North America. 2012;50(4):569–585.
20. Webster BS, Courtney TK, Huang YH, Matz S, Christiani DC. Physicians' initial
management of acute low back pain versus evidence-based guidelines. Influence of sciatica. Journal of General Internal Medicine. 2005;20:1132–1135.
21. Martin BI, Deyo RA, Mirza SK, et al. Expenditures and health status among
adults with back and neck problems. Journal of the American Medical Association. 2008;299:656–664.
22. Chou R, Qaseem A, Owens DK, et al. Diagnostic imaging for low back pain:
advice for high-value health care from the American College of Physicians.
Annals of Internal Medicine. 2011;154(3):181–189.
23. Paxton BE, Lungren MP, Srinivasan RC, et al. Physician self-referral of lumbar
spine MRI with comparative analysis of negative study rates as a marker of
utilization appropriateness. American Journal of Roentgenology. 2012;198
(6):1375–1379.
24. Morbidity and Mortality Weekly Report (MMWR). Cervical cancer screening
among women by hysterectomy status and among women aged ≥65 years—
United States, 2000–2010. Morbidity and Mortality Weekly Report (MMWR).
2013;61(51):1043–1047.
25. Choosing Wisely: An initiative of the American Board of Internal Medicine. www.
choosingwisely.org.
26. Rao VM, Levin DC. The overuse of diagnostic imaging and the Choosing Wisely
initiative. Annals of Internal Medicine. 2012;157(8):574–576.
27. Chandra A, Cutler D, Song Z. Who ordered that? The economics of treatment
choices in medical care In: Pauly MV, McGuire TG, Barros PP, editors. Handbook
of Health Economics, Vol. 2. North Holland: Elsevier Science; 2012. p. 397–432.
28. Chandra A, Staiger DO. Productivity spillovers in health care: evidence from the
treatment of heart attacks. Journal of Political Economy. 2007;115(1):103–140.
29. Birkmeyer JD, Siewers AE, Finlayson EV, et al. Hospital volume and surgical
mortality in the United States. New England Journal of Medicine. 2002;346:
1128–1137.
30. Van Heek NT, Kuhlmann KF, Scholten, et al. Hospital volume and mortality
after pancreatic resection: a systematic review and an evaluation of intervention in the Netherlands. Annals of Surgery. 2005;242(6):781–788.
31. Chandra A, Skinner J. Technology growth and expenditure growth in health care. Journal of Economic Literature. 2012;50(3):645–680.
32. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Annals of Internal Medicine. 2003;138(4):273–287.
33. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Annals of Internal Medicine. 2003;138(4):288–298.
34. Chernew ME, Mechanic RE, Landon BE, Safran DG. Private-payer innovation in
Massachusetts: the ‘Alternative Quality Contract’. Health Affairs (Millwood).
2011;30(1):51–61.
35. Song Z, Safran DG, Landon BE, et al. Health care spending and quality in year
1 of the Alternative Quality Contract. New England Journal of Medicine. 2011;365(10):
909–918.
36. Song Z, Safran DG, Landon BE, et al. The ‘Alternative Quality Contract,’ based on
a global budget, lowered medical spending and improved quality. Health Affairs
(Millwood). 2012;31(8):1885–1894.
37. Mechanic RE, Santos P, Landon BE, Chernew ME. Medical group responses to
global payment: early lessons from the ‘Alternative Quality Contract’ in
Massachusetts. Health Affairs (Millwood). 2011;30(9):1734–1742.
39. Centers for Medicare and Medicaid Services. Berenson-Eggers Type of Service;
2010. Available from: 〈https://www.cms.gov/HCPCSReleaseCodeSets/20_BETOS.asp〉.
40. Pope GC, Kautter J, Ellis RP, et al. Risk adjustment of medicare capitation
payments using the CMS-HCC model. Health Care Financing Review. 2004;25
(4):119–141.
41. Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk
adjustment of skewed outcomes data. Journal of Health Economics. 2005;24(3):
465–488.
42. Zaslavsky AM, Buntin MB. Too much ado about two-part models and transformation? Comparing methods of modeling Medicare expenditures. Journal of Health Economics. 2004;23:525–542.
43. Ai C, Norton EC. Interaction terms in logit and probit models. Economics Letters.
2003;80:123–129.
44. Ellis RP, McGuire TG. Predictability and predictiveness in health care spending.
Journal of Health Economics. 2007;26(1):25–48.
45. White H. A heteroskedasticity-consistent covariance matrix estimator and a
direct test for heteroskedasticity. Econometrica. 1980;48(4):817–830.
46. Goldberger AS. A Course in Econometrics. Cambridge, MA: Harvard University
Press; 1991.
48. Mechanic RE, Santos P, Landon BE, Chernew ME. Medical group responses to
global payment: early lessons from the ‘Alternative Quality Contract’ in
Massachusetts. Health Affairs (Millwood). 2011;30(9):1734–1742.
50. Lee DW, Levy F. The sharp slowdown in growth of medical imaging: an early
analysis suggests combination of policies was the cause. Health Affairs (Millwood).
2012;31(8):1876–1884.
51. Austin GL, Fennimore B, Ahnen DJ. Can colonoscopy remain cost-effective for colorectal cancer screening? The impact of practice patterns and the Will Rogers phenomenon on costs. American Journal of Gastroenterology. 2013;108(3):296–301.
52. Berwick DM. Making good on ACOs' promise—the final rule for the Medicare Shared Savings Program. New England Journal of Medicine. 2011;365:1753–1756.
53. McClellan M, McKethan AN, Lewis JL, Roski J, Fisher ES. A national strategy
to put accountable care into practice. Health Affairs (Millwood). 2010;29(5):
982–990.
54. Altman SH. The lessons of Medicare's prospective payment system show that
the bundled payment program faces challenges. Health Affairs (Millwood).
2012;31(9):1923–1930.
55. Weissman JS, Bailit M, D'Andrea G, Rosenthal MB. The design and application of
shared savings programs: lessons from early adopters. Health Affairs (Millwood).
2012;31(9):1959–1968.
56. Fisher ES, Shortell SM, Kreindler SA, Van Citters AD, Larson BK. A framework for
evaluating the formation, implementation, and performance of accountable
care organizations. Health Affairs (Millwood). 2012;31(11):2368–2378.
Healthcare 1 (2013) 22–29
Reliability of utilization measures for primary care physician profiling
Hao Yu a,*, Ateev Mehrotra b,1, John Adams c,2
a RAND Corporation, 4570 Fifth Avenue, Pittsburgh, PA 15213, USA
b RAND Pittsburgh, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
c RAND Corporation, 1776 Main Street, Santa Monica, CA 90401-3208, USA
Article info

Article history: Received 15 January 2013; received in revised form 12 April 2013; accepted 15 April 2013; available online 9 May 2013.

Keywords: Physician profiling; Utilization measures; Reliability; Claims data

Abstract

Background: Given rising health care costs, there has been a renewed interest in using utilization measures to profile physicians. Despite the measures' common use, few studies have examined their reliability and whether they capture true differences among physicians.

Methods: A local health improvement organization in New York State used 2008–2010 claims data to create 11 utilization measures for feedback to primary care physicians (PCPs). The sample consists of 2938 PCPs in 1546 practices who serve 853,187 patients. We used these data to measure the reliability of these utilization measures using two methods (hierarchical model versus test–retest). For each PCP and each practice, we estimate each utilization measure's reliability, ranging from 0 to 1, with 0 indicating that all differences in utilization are due to random noise and 1 indicating that all differences are due to real variation among physicians.

Results: Reliability varies significantly across the measures. For 4 utilization measures (PCP visits, specialty visits, PCP lab tests (blood and urine), and PCP radiology and other tests), reliability was high (mean > 0.85) at both the physician and the practice level. For the other 7 measures (professional therapeutic visits, emergency room visits, hospital admissions, readmissions, skilled nursing facility days, skilled home care visits, and custodial home care services), there was lower reliability, indicating more substantial measurement error.

Conclusions: The results illustrate that some utilization measures are suitable for PCP and practice profiling, while caution should be used when using other utilization measures for efforts such as public reporting or pay-for-performance incentives.

© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Given the growing concerns about rising health care costs, there
is a push to profile providers on costs of care. Utilization measures
are commonly used to reflect costs. Utilization measures capture the
use of health services within a given patient population and, together
with medical prices, drive total costs. Common utilization measures
include the number of specialty visits referred among a primary care
physician’s (PCP) patient panel or the number of emergency room
visits among all the patients at a practice. Since these types of
measures are readily obtained from health plan claims,1 they have
been used in many ways, including confidentially providing feedback
to physicians,2,3 as a basis for financial incentives for physicians,4,5
and public profiling of physicians.6 For example, Centers for Medicare
and Medicaid Services (CMS) has two new physician profiling efforts.
The Quality and Resource Use Reports will provide confidential feedback reports to tens of thousands of individual physicians on utilization measures such as emergency visits, lab tests, and days in a skilled nursing facility.7 The Physician Value-Based Payment Modifier, which was called for by the Affordable Care Act to link physician performance in quality and cost with Medicare Part B payments,8 is expected to expand over the next 2 years to cover all physicians for performance-based payment. There are also state and local initiatives, such as the California Quality Collaborative, which publicly profiles over 35,000 physicians on utilization measures (e.g. emergency room visits).6

* Corresponding author. Tel.: +1 412 683 2300x4460; fax: +1 412 683 2800. E-mail addresses: [email protected] (H. Yu), [email protected] (A. Mehrotra).
1 Tel.: +1 412 683 2300x4894; fax: +1 412 802 4972.
2 Tel.: +1 310 393 0411; fax: +1 310 393 4818.

2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.002
Despite the common use of utilization measures, there is little
research on the reliability of these measures.9 Reliability indicates
a measure’s capability to distinguish true differences in providers'
performance from random variation.10,11 Prior studies have examined reliability of health care quality measures, such as HEDIS,12
cost measures,13 and composite measures.14 Many of these studies have found inadequate reliability for the measures studied; that is, the measure captures significant random noise. However, to our
knowledge, there is no published research on reliability of utilization measures at the individual physician and practice level.
A critical issue of any such evaluation is how to measure the
reliability of utilization measures. Conceptually, reliability is the
squared correlation between a measure and the true value.15 Because
the true value is rarely known, researchers often estimate the lower
bound of reliability.16 For example, the use of test–retest reliability is a
common method for estimating reliability of survey instruments. In
prior work, Adams and colleagues have used simple hierarchical
models to help estimate reliability in cross-sectional data.13 In such
models, the observed variability is divided into two components:
variability of scores across all providers and variability of scores for
individual providers. The provider-to-provider variance is combined
with the provider-specific error variance to calculate the reliability of
the profile for an individual provider.
In this study we use these methods to assess the reliability of
utilization measures. Our data are from a local consortium that
created the utilization measures for physician feedback. As discussed below, our data come from multiple health plans, reflecting ongoing efforts at the state and local levels to create multiple-payer claims databases for health care improvement.17 We look at
two different ways of measuring reliability (hierarchical model
versus test–retest) and discuss the differences between them.
2. Methods
2.1. Data
We obtained data from THINC, Inc., a nonprofit health improvement organization serving 9 counties in the Hudson Valley of New
York State (Putnam, Rockland, Westchester, Dutchess, Orange,
Ulster, Sullivan, Columbia, and Greene). In an effort to improve
the value of the health care provided in the region, THINC, Inc. has
taken a series of steps to construct and monitor utilization
measures for PCPs. First, it pooled enrollment and claims data
from five major health plans operating in the Hudson Valley,
including four commercial plans (Aetna, United Healthcare, MVP
Healthcare, and Capital District Physicians' Health Plan), and one
Medicaid HMO (Hudson Health Plan) that together cover approximately 60% of the insured population in the Hudson Valley.
Second, it used the methods depicted in Fig. A1 in Appendix A to
attribute the insured people to PCPs, who are defined as those
physicians holding an MD or DO and practicing in one of the
specialties of Family Practice, General Practice, Internal Medicine,
and Pediatrics. A patient can be attributed to one PCP only. PCPs were aggregated into practices, defined as the individual physicians at a single geographic site.
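The attribution rules themselves are given in Fig. A1, which is not reproduced here. As an illustration only, the sketch below assumes a plurality-of-visits rule (each patient is attributed to the PCP seen most often); the rule and all names are assumptions, not THINC's documented algorithm:

```python
# Illustrative patient-to-PCP attribution and practice aggregation.
# The plurality-of-visits rule and the names here are hypothetical.
from collections import Counter

# patient -> list of PCPs seen (one entry per visit)
visits = {
    "pt1": ["dr_smith", "dr_smith", "dr_jones"],
    "pt2": ["dr_jones"],
    "pt3": ["dr_smith"],
}
practice_of = {"dr_smith": "practice_A", "dr_jones": "practice_A"}

# Each patient is attributed to exactly one PCP (most visits wins).
attributed = {pt: Counter(pcps).most_common(1)[0][0] for pt, pcps in visits.items()}

# Panel sizes at the PCP level, then rolled up to the practice level
# (a practice being the individual physicians at one geographic site).
panel_size = Counter(attributed.values())
practice_panel = Counter(practice_of[pcp] for pcp in attributed.values())
print(attributed["pt1"], panel_size["dr_smith"], practice_panel["practice_A"])
```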
Third, it created annual utilization measures for each PCP during the 3-year period of 2008–2010. The 11 utilization measures (see Appendix B for detailed definitions), which include annual PCP visits, specialty visits, PCP lab tests (blood and urine), PCP radiology and other tests, professional therapeutic visits, emergency room visits, hospital admissions, readmissions, skilled nursing facility days, skilled home care visits, and custodial home care services, provide a comprehensive assessment of health services provided to a PCP's patient panel.

In addition to data on the utilization measures, we also obtained the following data from THINC, Inc.: (1) physicians' characteristics, such as degree, age, gender, and clinical specialty; (2) characteristics of each PCP's patient panel, including average age and proportion of male patients; (3) linkage files that can be used to link practices, physicians, and patients; (4) the average co-morbidity score for a PCP's patient panel; and (5) the average co-morbidity score for patients enrolled in a practice. Co-morbidities were captured using a diagnosis-based model developed by DxCG, Inc. The patient co-morbidity score available to us was a single value for the entire patient panel. We did not have access to co-morbidity data or risk scores for individual patients. It is common for health plans to profile physicians and practices using just the average risk scores across a patient panel.

The study sample consists of 2938 PCPs in 1546 practices, serving 853,187 patients in the Hudson Valley in 2008–2010.

2.2. Two methods of assessing reliability

We use two methods for assessing reliability. Our primary method builds on the hierarchical model method used by Adams and colleagues.13 We used the formula below to calculate reliability at the individual physician level:

Reliability = Variance_across_physicians / (Variance_across_physicians + Variance_within_physicians)

For a given patient population, the utilization by patients is best viewed as count data. To calculate the variance in a given utilization measure among physicians, we first estimated a Poisson model with a random effect for each of the utilization measures. Each model controls for both physician and patient characteristics, such as the physician's age, gender, and degree, the patients' average age and average risk adjustment scores, and the proportion of male patients. Then, the variance among physicians is calculated as the product of the model's covariance and the average utilization rate among physicians for a given utilization measure. For example, the rate of specialist visits by a physician's patient panel is the panel's total number of specialist visits divided by the panel size, with the average rate of specialist visits being the mean rate across all the physicians. This calculation is based on the delta method to transform the log scale in the Poisson model into a rate.

To calculate the variance of a given utilization measure within a physician's patient panel, we divided the panel's rate of that utilization measure by the panel size. For example, the variance in specialist visits within a physician's patient panel is the panel's rate of specialist visits (described above) divided by the panel size.

The calculated reliability ranges from 0 to 1, with 1 suggesting that all the variation results from real differences in performance, and 0 meaning that all the variation in utilization is caused by measurement error. There is no gold standard for what constitutes adequate reliability for a measure of a physician's performance, although cut-offs of 0.7 and 0.9 have been used in prior studies.18,19

As a second method, we applied the simple test–retest method, which, shown conceptually, is

Test_Retest_Reliability = Variance_across_physicians / (Variance_across_physicians + Variance_of_time_change + Variance_within_physicians)
For each physician and practice, we calculated the number of
annual visits by types of service utilization per 100 patients and
estimated the correlation between the 2008, 2009, and 2010
utilization measures. The calculation is based on the assumption
that there is no difference from year to year in a physician’s
practice pattern. Given there is likely some shift in practice across
years,20 we must consider the test–retest reliability as the lower
bound of any reliability estimate. In comparison, the above
hierarchical model reflects cross-sectional reliability. We will
examine how the results differ between the two methods.
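The two estimators can be contrasted in a small simulation. This is an illustrative sketch, not the authors' code: the panel sizes and visit rates are invented, and the hierarchical step is reduced to a method-of-moments estimate of the between-physician variance rather than a fitted Poisson random-effects model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented panels: each physician has a true annual visit rate per patient;
# we observe one year of Poisson visit counts from a panel of n patients.
n_phys = 500
panel_size = rng.integers(20, 300, size=n_phys)
true_rate = rng.gamma(shape=8.0, scale=0.4, size=n_phys)  # ~3.2 visits/patient

def observed_rate(rates, sizes):
    """Panel-level rate: total visits across the panel divided by panel size."""
    return rng.poisson(rates * sizes) / sizes

year1 = observed_rate(true_rate, panel_size)
year2 = observed_rate(true_rate, panel_size)

# Method 1 (hierarchical-style): reliability = between / (between + within).
# As in the paper, the within-physician variance of a panel's rate is
# approximated by rate / panel_size; the between-physician variance is
# estimated here by subtracting the average noise from the total variance.
within = year1.mean() / panel_size
between = year1.var() - within.mean()
reliability = between / (between + within)        # one value per physician

# Method 2 (test-retest): correlation of the same measure across two years.
test_retest = np.corrcoef(year1, year2)[0, 1]
print(np.median(reliability), test_retest)
```

Because the simulated practice patterns are stable across years, the test–retest correlation here approaches the cross-sectional reliability; adding year-to-year drift to `true_rate` would push it below, which is why the paper treats test–retest as a lower bound.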
2.3. Sensitivity analyses
To test the effect of different model specifications for calculating reliability, we conducted a number of sensitivity analyses,
including (1) one analysis that included only physicians'
H. Yu et al. / Healthcare 1 (2013) 22–29

characteristics in the Poisson model, (2) one analysis that included only patients' characteristics in the model, (3) one analysis that examined reliability by physicians' patient panel size, (4) one analysis with extreme values Winsorized, and (5) one analysis that looked at test–retest reliability using different combinations of years (2008 vs. 2009, 2009 vs. 2010, and 2008 vs. 2010).

3. Results

3.1. Description of PCPs and their practices

As shown in Table 1, the mean age of the 2938 study PCPs is 51 years. Males make up a greater proportion of the PCPs (61%). The vast majority of PCPs hold an M.D. (92%). The average patient panel size per PCP is 290. The mean patient age is 38 years, and 45% of the patients are male. The mean risk adjustment score among patients is 4.25. Most PCPs are solo practitioners (66.3%). About one-fifth of the practices have 50 patients or fewer; over 45% of the practices have 250 patients or more.

Table 1
Characteristics of primary care physicians and their practices.

PCP level (N = 2938)
  Age of physician (%)
    35 years old or lower                             5.5
    36–50 years old                                  60.7
    51 years old and over                            33.8
  Gender of physician (%)
    Male                                             61.1
    Female                                           38.9
  Degree of physician (%)
    DO                                                8.2
    MD                                               91.8
  Number of patients in physician panel (%)
    0–50                                             16.5
    51–100                                            9.9
    101–250                                          29.1
    250+                                             44.4
  Average age of physician's panel (years)           37.9 (19.4)
  Average proportion of male patients in panel (%)   45.2 (14.7)
  Average patient risk score in panel (a)            4.25 (5.07)

Practice level (N = 1546)
  Number of providers in the practice (%)
    1                                                66.3
    2                                                12.5
    3–5                                               9.4
    6–10                                              5.1
    11–15                                             2.1
    16+                                               4.6
  Number of patients in the practice (%)
    0–50                                             21.7
    51–100                                           11.0
    101–250                                          22.1
    250+                                             45.2
  Average age of patients in the practice (years)    40.6 (18.7)
  Average proportion of male patients (%)            47.7 (11.9)
  Average patient risk score in the practice (a)     4.31 (3.91)

(a) As described in the text, we do not have risk scores on individual patients. Rather, we were provided with a single mean score for each physician's panel and each practice's panel.

3.2. Variation in performance on utilization measures across PCPs

There is substantial variation in mean utilization across the measures (Table 2). The highest mean utilization was for PCP lab tests (1186.6 lab tests per 100 patients per year), while the lowest mean utilization was for custodial home visits (1.0 custodial home visit per 100 patients per year). On average there were 321.1 PCP visits and 289.2 specialty visits per 100 patients per year.

Across PCPs, there was wide variation in utilization (Table 2). For example, the 5th percentile of PCP visits per 100 patients in a year is 150.0, about half of the median and less than one-third of the 95th percentile.

3.3. Comparison of two methods of measuring reliability at PCP level

Table 3 summarizes results from the two methods. Using the hierarchical model, we found that (1) there is large variation in mean reliability among the 11 types of visits, with the highest mean for PCP lab visits (0.98) and the lowest mean for custodial home care services (0.24); (2) six of the 11 types of visits had a mean reliability over 0.70, including PCP visits, specialist visits, PCP lab visits, PCP other visits, emergency visits, and skilled home care visits, while the other utilization measures have a mean reliability of 0.70 or below; and (3) most utilization measures had wide variation in reliability across PCPs. For example, across the PCPs in the sample, the reliability for skilled nursing facility days ranged from 0.06 (5th percentile) to 0.99 (95th percentile). Some measures had less variation across PCPs. For example, the reliability for PCP lab visits ranged from 0.92 (5th percentile) to 0.99 (95th percentile).

Using the test–retest method for measuring reliability, most utilization measures have lower reliability. The reduction in reliability between the two methods is substantial for four utilization measures: professional therapeutic visits, emergency room visits, hospital admissions, and skilled home care visits (Table 3). In comparison, there are two utilization measures, PCP lab visits and readmissions, whose reliability does not change as much between the two methods.

We also found that the test–retest method had similar results across the different combinations of years we examined (2008 vs. 2009, 2009 vs. 2010, and 2008 vs. 2010), with two exceptions: for professional therapeutic visits and custodial home care services, the test–retest results did vary across years (Table 3).

Overall, we found that when using either of the two methods, four types of utilization had relatively high reliability (>0.7): PCP visits, specialty visits, PCP lab visits, and PCP other visits.

3.4. Sensitivity analyses at PCP level

One sensitivity analysis showed that the reliability of utilization measures varies with the PCPs' patient panel size. Fig. 1A shows that the reliability of PCP visits increases with panel size. It increases sharply for panel sizes below 20, and it flattens for panel sizes over 60. A similar pattern appears for two other utilization measures: specialist visits and PCP lab visits.

In sensitivity analyses at the PCP level, we included different covariates in the Poisson model. If the model includes only PCP characteristics, the reliability of most utilization measures is slightly higher than in the model with only patient characteristics. When the model includes both the PCP and patient characteristics, the results are virtually the same as in the model using only the patient characteristics. When we Winsorized extreme values, there was no substantial difference in reliability (results not shown).

3.5. Variation in performance on utilization measures at practice level

The lower part of Table 2 summarizes utilization measures across practices. The highest mean utilization was for PCP labs
Table 2
Variation in PCP and practice performance on annual number of visits per 100 patients.

PCP level
Type of visit                       Mean     5th pct  25th pct  Median   75th pct  95th pct
PCP visits                          321.1    150.0    243.8     301.3    368.9     515.1
Specialty visits                    289.2     62.2    130.6     266.1    373.5     589.8
PCP lab tests (blood and urine)    1186.6    326.1    625.4    1114.7   1533.3    2294.3
PCP radiology and other tests       198.6     34.9     67.7     183.3    249.0     403.1
Professional therapeutic visits       0.9      0.0      0.0       0.0      0.6       3.1
Emergency room visits                25.7      2.3     10.8      17.1     29.3      72.7
Hospital admissions                   9.4      0.0      3.4       6.3     10.0      23.8
Readmissions                          2.2      0.0      0.0       0.6      1.7       6.1
Skilled nursing facility days        44.5      0.0      0.0       0.0      4.6      65.3
Skilled home care visits             22.5      0.0      3.4      13.0     25.3      74.1
Custodial home care services          1.03     0        0         0        0         1.04

Practice level
Type of visit                       Mean     5th pct  25th pct  Median   75th pct  95th pct
PCP visits                          317.7     12.0    251.7     319.8    387.7     549.1
Specialty visits                    324.8     73.2    182.6     289.8    404.4     691.1
PCP lab tests (blood and urine)    1306.4    343.8    750.6    1183.7   1599.2    2501.0
PCP radiology and other tests       208.2     38.5    105.7     200.0    268.3     434.5
Professional therapeutic visits       0.8      0.0      0.0       0.0      0.5       3.0
Emergency room visits                24.7      0.0     10.7      17.9     31.1      63.2
Hospital admissions                   9.4      0.0      3.3       6.5     10.6      25.7
Readmissions                          2.0      0.0      0.0       0.7      1.9       8.2
Skilled nursing facility days        23.0      0.0      0.0       0.0      6.2      96.7
Skilled home care visits             24.1      0.0      3.8      13.5     25.8      74.5
Custodial home care services          0.8      0.0      0.0       0.0      0.0       1.1
Table 3
Provider-level reliability estimates by two methods.a

                                   Reliability measured using simple             Reliability measured using test–retest
                                   hierarchical models                           method (correlation between years)
Utilization measure                Mean   5th    25th   Median  75th   95th      2008–2009  2008–2010  2009–2010
PCP visits                         0.94   0.63   0.96   0.98    0.99   1.00      0.68       0.70       0.83
Specialty visits                   0.93   0.64   0.95   0.98    0.99   1.00      0.75       0.68       0.84
PCP lab tests (blood and urine)    0.98   0.92   0.99   1.00    1.00   1.00      0.93       0.84       0.90
PCP radiology and other tests      0.88   0.41   0.89   0.96    0.98   0.99      0.79       0.55       0.76
Professional therapeutic visits    0.50   0.04   0.35   0.55    0.68   0.81      0.04       0.06       0.32
Emergency room visits              0.87   0.37   0.88   0.94    0.97   0.98      0.23       0.31       0.39
Hospital admissions                0.69   0.12   0.61   0.78    0.86   0.92      0.25       0.41       0.31
Readmissions                       0.49   0.04   0.30   0.53    0.68   0.82      0.53       0.52       0.48
Skilled nursing facility days      0.70   0.06   0.37   0.92    0.98   0.99      0.48       0.52       0.85
Skilled home care visits           0.92   0.55   0.94   0.97    0.99   0.99      0.12       0.11       0.45
Custodial home care services       0.24   0.00   0.04   0.19    0.41   0.65      0.71       0.37       0.36

a See Section 2 for details of the two methods. Hierarchical-model columns are the mean and the 5th, 25th, 50th, 75th, and 95th percentiles of reliability across PCPs.
(1306.4 tests per 100 patients per year) while both professional
therapeutic visits and custodial home care services had the
smallest mean (0.8 visits per 100 patients per year). Table 2 also
shows a wide range for each of the utilization measures at the
practice level. For example, the 5th percentile of PCP visits was 12
visits per 100 patients per year, less than 5% of the median or 2% of
the 95th percentile.
3.6. Comparison of two methods of measuring reliability at practice
level
Table 4 compares practice-level reliability estimates from the
two methods. Using the hierarchical model, only 3 out of the
11 types of utilization had a mean reliability below 0.70, including
professional therapeutic visits (0.59), readmissions (0.59), and
custodial home visits (0.33). These three utilization measures also
had a larger variation in reliability than other measures.
Using the test–retest method, we found that (1) the reliability measures were markedly lower for most types of visits and (2) the reliability measures were consistent across the years, with one exception: custodial home visits, which had unusually high reliability between 2008 and 2009.

Putting together the results from the two methods, we found that four utilization measures had relatively high reliability (>0.7) across the two methods: PCP visits, specialty visits, PCP lab visits, and PCP other visits.
We also conducted sensitivity analyses at the practice level. As Fig. 1B shows, there is a positive relationship between the reliability of PCP visits and a practice's patient panel size. The relationship is strong for panel sizes below 20, and the reliability flattens for panel sizes over 60. A similar pattern appears for three other utilization measures: specialist visits, PCP lab visits, and PCP other visits.

There are few differences among the three specifications of the Poisson model: the model with PCP characteristics only, the model with patient characteristics only, and the model with both PCP and patient characteristics.
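The flattening pattern in Fig. 1 follows directly from the reliability formula: the within-physician noise shrinks as rate divided by panel size, so reliability rises steeply for small panels and then saturates. A sketch with invented numbers (the between-physician variance and mean rate below are assumptions for illustration, not estimates from the study data):

```python
# reliability = between / (between + within), with within = mean_rate / n,
# mirroring the formulas in Section 2.2. Both constants are invented.
BETWEEN_VAR = 0.5   # assumed variance of true rates across physicians
MEAN_RATE = 3.0     # assumed mean visits per patient per year

def reliability(n):
    """Reliability of a panel-level rate for a panel of n patients."""
    within_var = MEAN_RATE / n
    return BETWEEN_VAR / (BETWEEN_VAR + within_var)

for n in (5, 20, 60, 200):
    print(n, round(reliability(n), 3))
# 5 0.455, 20 0.769, 60 0.909, 200 0.971: steep below 20, nearly flat past 60
```

The curve's exact position depends on the assumed constants, but its shape (a sharp rise followed by a plateau) holds for any measure whose noise variance scales inversely with panel size.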
4. Conclusion and discussion
Fig. 1. (A) Relationship between panel size and physician-level reliability estimate of PCP visits. (B) Relationship between panel size and practice-level reliability estimate of PCP visits.
In an effort to reduce costs, commercial health plans and
government payers use utilization measures to profile individual
physicians. Such efforts are only useful if these utilization measures are reliable. If they are unreliable and therefore primarily
capture random noise, then these efforts may be misguided. In this
study we used two methods of estimating reliability to look at
utilization measures.
We found variations in reliability across the measures. For 4 of
the 11 measures (PCP visits, specialty visits, PCP lab visits, and PCP
other visits), we found that both methods demonstrated high
reliability at both the physician and the practice level. This would imply that the differences observed in these measures reflect, to a large extent, real differences between physicians. For the other 7 measures, there were concerns about reliability, since the two methods generated discrepant results. The test–retest method generated low reliability for each of the 7 measures, while some of these measures, such as readmissions, skilled nursing facility days, and admissions, still had relatively high reliability in the hierarchical model. We also found that most utilization measures had wide variation in reliability across PCPs. Overall, our findings illustrate the importance of measuring reliability on a regular basis in physician profiling efforts.
The high reliability we found for these 4 measures was contrary to our expectations, which were based on the prior literature on the reliability of other measures. In the one prior study looking at the reliability of utilization measures, Hofer and colleagues (1999) reported that the median reliability of physician visits by diabetic patients was 0.41.1 In related work, we examined the reliability of physician cost profiles and found that 59% of physicians had cost-profile scores with reliabilities of less than 0.70, a commonly used cut-off.13 Most other work has focused on the reliability of physician quality measures. One recent study found that a composite measure of diabetes care had a physician-level reliability of 0.70
Table 4
Practice-level reliability estimates by two methods.a

                                   Reliability measured using simple             Reliability measured using test–retest
                                   hierarchical models                           method (correlation between years)
Utilization measure                Mean   5th    25th   Median  75th   95th      2008–2009  2008–2010  2009–2010
PCP visits                         0.97   0.90   0.98   0.99    0.99   0.99      0.69       0.64       0.67
Specialty visits                   0.98   0.94   0.99   0.99    1.00   0.99      0.79       0.79       0.90
PCP lab tests (blood and urine)    0.99   0.98   1.00   1.00    1.00   0.99      0.86       0.80       0.91
PCP radiology and other tests      0.97   0.89   0.97   0.99    0.99   0.99      0.78       0.67       0.74
Professional therapeutic visits    0.59   0.13   0.43   0.62    0.77   0.92      0.04       0.09       0.53
Emergency room visits              0.91   0.68   0.91   0.95    0.98   0.99      0.38       0.33       0.77
Hospital admissions                0.76   0.33   0.68   0.81    0.89   0.96      0.18       0.33       0.33
Readmissions                       0.58   0.10   0.42   0.61    0.77   0.91      0.11       0.30       0.28
Skilled nursing facility days      0.89   0.44   0.89   0.96    0.99   0.99      0.20       0.13       0.27
Skilled home care visits           0.94   0.79   0.95   0.97    0.99   0.99      0.47       0.26       0.60
Custodial home care services       0.33   0.02   0.14   0.30    0.51   0.77      0.94       0.24       0.22

a See Section 2 for details of the two methods. Hierarchical-model columns are the mean and the 5th, 25th, 50th, 75th, and 95th percentiles of reliability across practices.
with a sample of 25 patients.14 More recently, a study of primary
care performance found that physician level reliability varied
widely from 0.42 for HbA1c testing to 0.95 for cervical cancer
screening, and that the reliability was below 0.80 for 10 out of the
13 study measures.12 The high reliability we found for these four
measures might be due to our much larger sample size. We focus
on general utilization measures while the published studies have
focused on specific medical procedures for certain medical conditions, such as diabetes, and therefore their sample sizes are much
smaller.
While our findings of high or low reliability are based on the full patient panel size, not surprisingly our sensitivity analysis shows that the reliabilities are closely related to patient panel size. For example, although the utilization measure of PCP visits, on average, has high reliability, physicians with a small panel size (e.g., below 20) typically still have low reliability estimates, as demonstrated in Fig. 1A. The sensitivity analysis also indicates that the reliability flattens for panel sizes over 60, suggesting that the 7 utilization measures with low reliability are unlikely to achieve higher reliability with larger panel sizes.
It is not surprising to see the discrepancy in our reliability estimates for the 7 measures between the two methods. This is inherent to some degree, given the different formulas embedded in the two methods. While the reliability formulation for the hierarchical model helps one understand the extent to which we can distinguish physicians' performance within a particular year or aggregation of years, the test–retest reliability, a simple correlation coefficient between years, represents the lower bound of the reliability estimate; because it accounts for possible changes over time, the test–retest reliability should never exceed cross-sectional reliability. Contrasting the two reliabilities clearly shows that they have different meanings and cannot be substituted for each other. The results from the two methods indicate that a measure can be reliable for detecting differences in physician performance in historical data while having far less reliability for predicting future physician performance.
One important consideration when assessing reliability is the
relationship between reliability and validity. Validity indicates
how well a measure represents the phenomenon of interest;
reliability indicates the proportion of variability in a measure that
is due to real differences in performance. Classically, validity and
reliability are considered two separate characteristics of a measure
with little relationship between them. If a physician has high
reliability on a given measure, it does not mean that the physician
performed well on the measure or that the measure is a valid
measure of physician performance. Since reliability is difficult to
interpret in the absence of validity, it is also important to consider
the possible effects of the most common threat to validity,
inadequate case-mix adjustment. Although a precise quantification of the effects of failed case-mix adjustment is beyond the
scope of this paper, the intuition can be seen in a conceptual
extension of the reliability formula:
Calculated_Reliability = (Variance_across_physicians + Squared_bias) / (Variance_across_physicians + Variance_within_physicians + Squared_bias)
We distinguish here between calculated reliability and the
reliability you would obtain if the bias was removed from the
performance measures. The presence of the squared bias in both
the numerator and the denominator increases the calculated
reliability but does not do so in a way that is relevant for physician
profiling. Consider a physician whose panel is healthy beyond
what the measured case-mix adjusters can capture. He/she has a
structural advantage when compared to his/her peers. Indeed this
advantage will likely persist over time and influence calculated
test–retest reliability in a similar way.
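The effect of the squared-bias term is easy to see numerically. The variance components below are invented for illustration; removing the bias term shows how much of the calculated reliability is an artifact of unadjusted case mix.

```python
# Conceptual extension of the reliability formula with a case-mix bias term.
# All three values are invented for illustration.
var_across = 1.0   # real performance variance across physicians
var_within = 1.0   # measurement noise within a physician's panel
sq_bias = 1.0      # squared bias from inadequate case-mix adjustment

calculated = (var_across + sq_bias) / (var_across + var_within + sq_bias)
bias_free = var_across / (var_across + var_within)
print(round(calculated, 3), bias_free)  # 0.667 vs 0.5
```

Here a third of the calculated reliability reflects the persistent case-mix advantage rather than real performance differences, which is precisely why high calculated reliability cannot rescue an invalid measure.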
Careful consideration of the adequacy of case-mix adjustment
should inform the interpretation of reliability. Unfortunately there is
often little analysis that can be done with the available data to inform
this important question. This study was not able to address this issue
because we had access only to health plans' administrative data and
did not have access to information about chronic conditions or risk
adjustment scores at the individual patient level. Although it is common for health plans to profile physicians and practices using just the average risk scores across a patient panel, including more patient-level information for risk adjustment to improve reliability analysis could provide an avenue for future inquiry.
Our study has some limitations in terms of generalizability. Our physician sample was drawn from the five major health plans in the Hudson Valley, four of which are commercial health plans. The results may not generalize to other regions or to physicians who primarily serve Medicare patients. Our results may also not be generalizable given that over 66% of the study physicians are solo practitioners, a proportion higher than the national average. Our study could be expanded by using representative samples or data from large-scale policy interventions. For example, it remains an interesting topic for future studies to assess the reliability of the measures included in the CMS' QRUR system, which contains performance information on 28 quality indicators used to provide confidential feedback reports to physicians in nine states (California, Illinois, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, and Wisconsin).7
Our study findings have important policy implications. The high and stable reliability of the four utilization measures suggests that these measures may be reliably used for PCP profiling by health plans and other health care organizations. In fact, some of the measures (e.g., PCP-prescribed radiology visits) have already been used by organizations such as the California Quality Collaborative to monitor resource use by PCPs.6 Our results concerning the other utilization measures suggest that they should be used with caution for provider profiling. Our finding that reliability increases with patient panel size before flattening for panel sizes over 60 suggests that physician profiling programs should not include physicians with a relatively small panel size (e.g., below 60).
Our results also highlight the importance of assessing reliability
as a standard aspect of any profiling effort. Utilization measures
have been in use for many years, yet this is one of the first studies
to assess their reliability. Hospital readmission measures have
become a critical aspect of the health care system, yet, to our
knowledge, no one has assessed their reliability before. Given the
substantial variation we found across utilization measures, we encourage organizations that use utilization measures to profile physicians to evaluate the reliability of those measures. We also emphasize that reliability varies across providers. Even if a given measure has high reliability on average, it does not mean that all providers should be profiled on that measure. One option is to profile only those providers for whom the reliability of the measure meets a threshold (for example, 0.70). We feel this would be fairer to providers than simply using volume cut-offs (for example, a patient panel of 60), because our results underscore that even above a volume cut-off (as demonstrated in our analysis of the full patient panel), reliability may not be sufficiently high for some utilization measures.
Acknowledgment
This study was funded by THINC, Inc.
Appendix A
See Fig. A1.
Enrollment Method: Did the patient specify a primary care physician to the health plan upon enrollment during the measurement year? If yes, select the primary care physician specified most recently by the patient in the measurement year. Did the patient see that primary care physician for at least one visit (preventive care or E&M) in the measurement year? If no, exclude the patient. If yes, check whether that primary care physician is included in the study sample: if no, exclude the patient; if yes, attribute the patient to that primary care physician (Enrollment Method).

Imputation Method: If the patient did not specify a primary care physician, did the patient see any primary care physician from the study sample at least once in the measurement year? If no, exclude the patient. If yes, select the primary care physician with the most visits for that patient during the measurement year; if there is a tie between two primary care physicians, select the physician with the most recent visit. Did the patient see that primary care physician for at least one preventive care or E&M visit in the measurement year? If no, exclude the patient. If yes, attribute the patient to that primary care physician (Imputation Method).
Fig. A1. Methods of attributing a patient to a PCP.
Appendix B. Definitions of utilization measures.

PRIMARY CARE PHYSICIAN (PCP) VISITS: Count of patient–provider encounters with primary care providers. A visit is defined by a distinct Consistent Member ID, Rendering Provider ID, and Date of Service where the Rendering Provider is a PCP (pediatrician, family practice, internal medicine, or general practice, as determined by submitted taxonomy codes).

SPECIALIST VISITS: Count of patient–provider encounters with specialist providers. A visit is defined by a distinct Consistent Member ID, Rendering Provider ID, and Date of Service where the Rendering Provider meets all of the following criteria: is not a pediatrician, family practice, internal medicine, or general practice (as determined by submitted taxonomy codes); and is a physician.

PCP LABORATORY TESTS (BLOOD AND URINE): Count of lines for any claim where the service code has a BETOS code for a lab test, or the revenue code is for a lab test.

PCP RADIOLOGY AND OTHER TESTS: Count of lines for any claim where the service code has a BETOS code for radiology or a diagnostic test, or the revenue code is for radiology or a diagnostic test.
PROFESSIONAL THERAPEUTIC SERVICES: Count of occupational, physical, and speech therapy services, using the corresponding service codes.

EMERGENCY DEPARTMENT VISITS: Count of claims where the CPT code, revenue code, or CPT/Place of Service code indicates an Emergency Department (ED) visit. Multiple ED visits on the same date of service are counted as one visit. ED visits that also have Room and Board are not counted (this means the patient was admitted to the hospital).

HOSPITAL ADMISSIONS: Based on fields such as claim ID, admit and discharge dates, Type of Bill, revenue code, and DRG code, all facility claims are aggregated into hospital admissions. This field is a count of those admissions.

READMISSIONS: If all of the following conditions are met, count the admission as a readmission: the number of days between the current admission and a previous admission is 30 days or less; and the admission has the same Member ID and Product ID as the previous admission. If an admission is within 30 days of more than one preceding admission, count it only once.

SKILLED NURSING FACILITY DAYS: Sum of the lengths of stay (LOS) from admissions to a Skilled Nursing Facility (SNF). An SNF admission meets the following criteria: the Type of Bill (TOB) code is "Skilled Nursing Facility - Inpatient" or "Skilled Nursing Facility - Swing Bed", or the revenue code on the admission indicates SNF.

SKILLED HOME CARE VISITS: Count of distinct patient, provider, and service dates where the Place of Service code or BETOS code indicates a home care visit.

CUSTODIAL HOME CARE SERVICES: Count of claim lines where the service code indicates home care services.
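The readmission rule above translates directly to code. The admission-record format here is hypothetical (member ID, product ID, admission day); the logic implements the stated conditions: within 30 days of a previous admission for the same member and product, counted at most once even when several prior admissions qualify.

```python
def count_readmissions(admissions):
    """admissions: iterable of (member_id, product_id, admit_day) tuples."""
    ordered = sorted(admissions, key=lambda a: a[2])   # chronological order
    readmits = 0
    for i, (m, p, day) in enumerate(ordered):
        for m2, p2, day2 in ordered[:i]:
            # Same member and product, and 30 days or less since a previous
            # admission: count this admission as a readmission, only once.
            if m2 == m and p2 == p and 0 <= day - day2 <= 30:
                readmits += 1
                break
    return readmits
```

For example, a member admitted on days 0, 20, and 45 yields two readmissions: day 20 is within 30 days of day 0, and day 45 is within 30 days of day 20, each counted once.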
References
1. Hofer TP, Hayward RA, Greenfield S, Wagner EH, Kaplan SH, Manning WG. The
unreliability of individual physician report cards for assessing the costs and
quality of care of a chronic disease. JAMA. 1999;281(22):2098–2105.
2. Fung CH, Lim YW, Mattke S, Damberg C, Shekelle PG. Systematic review: the
evidence that publishing patient care performance data improves quality of
care. Annals of Internal Medicine. 2008;148(2):111–123.
3. Faber M, Bosch M, Wollersheim H, Leatherman S, Grol R. Public reporting in health care: how do consumers use quality-of-care information? A systematic review. Medical Care. 2009;47(1):1–8.
4. Rosenthal MB, Landon BE, Normand S-LT, Frank RG, Epstein AM. Pay for
performance in commercial HMOs. New England Journal of Medicine.
2006;355(18):1895–1902.
5. Milstein A, Lee TH. Comparing physicians on efficiency. New England Journal of
Medicine. 2007;357(26):2649–2652.
6. California Quality Collaborative. Actionable reports that show physician level
performance compared to peer group for specified measures. 〈http://www.
calquality.org/programs/costefficiency/resources/〉, and 〈http://www.calquality.
org/about/〉; 2012 Accessed 18.05.12.
7. Centers for Medicare & Medicaid Services. QRUR template for individual
physicians. 〈http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/
PhysicianFeedbackProgram/Downloads/QRURs_for_Individual_Physicians.pdf〉;
2012 Accessed 18.05.12.
8. VanLare JM, Blum JD, Conway PH. Linking performance with payment: implementing the physician value-based payment modifier. JAMA. 2012;308(20):2089–2090.
9. Fischer E. Paying for performance—risks and recommendations. New England
Journal of Medicine. 2006;355:1845–1847.
10. Shahian DM, Normand S-L, Torchiana DF, et al. Cardiac surgery report cards:
comprehensive review and statistical critique. The Annals of Thoracic Surgery.
2001;7(6):2155–2168.
11. Landon B, Normand S, Blumenthal D, Daley J. Physician clinical performance
assessment: prospects and barriers. JAMA. 2003;290(9):1183–1189.
12. Sequist TD, Schneider EC, Li A, Rogers WH, Safran DG. Reliability of medical
group and physician performance measurement in the primary care setting.
Medical Care. 2011;49(2):126–131.
13. Adams JL, Mehrotra A, Thomas JW, McGlynn EA. Physician cost profiling—
reliability and risk of misclassification. New England Journal of Medicine.
2010;362(11):1014–1021.
14. Kaplan SH, Griffith JL, Price LL, Pawlson LG, Greenfield S. Improving the
reliability of physician performance assessment identifying the physician effect
on quality and creating composite measures. Medical Care. 2009;47
(4):378–387.
15. Bland J, Altman D. Statistical methods for assessing agreement between two
methods of clinical measurement. Lancet. 1986;1(8476):307–310.
16. Fleiss J, Levin B, Paik M. Statistical Methods for Rates and Proportions. Indianapolis, IN: Wiley-Interscience; 2003.
17. Love D, Custer W, Miller P. All-Payer Claims Databases: State Initiatives to
Improve Health Care Transparency, Common Wealth Fund Publication No. 1439,
vol. 99. New York: Commonwealth Fund; 2010.
18. Safran DG, Karp M, Coltin K, et al. Measuring patients’ experiences with
individual primary care physicians. Results of a statewide demonstration
project. Journal of General Internal Medicine. 2006;21(1):13–21.
19. Hays RD, Revicki D. Reliability and validity (including responsiveness). In:
Fayers P, Hays R, editors. Assessing Quality of Life in Clinical Trials. New York:
Oxford University Press Inc.; 2005.
20. Phelps C. Health Economics. 3rd ed. Boston: Addison-Wesley Educational
Publishers; 2008.
Healthcare 1 (2013) 30–36
Contributors to variation in hospital spending for critically ill patients
with sepsis
Tara Lagu a,b,c,*, Michael B. Rothberg d, Brian H. Nathanson e, Nicholas S. Hannon a, Jay S. Steingrub a,c,f, Peter K. Lindenauer a,b,c

a Center for Quality of Care Research, Baystate Medical Center, Springfield, MA, USA
b Division of General Internal Medicine, Baystate Medical Center, Springfield, MA, USA
c Department of Medicine, Tufts University School of Medicine, Boston, MA, USA
d Department of Medicine, Medicine Institute, Cleveland Clinic, Cleveland, OH, USA
e OptiStatim, LLC, Longmeadow, MA, USA
f Division of Critical Care Medicine, Baystate Medical Center, Springfield, MA, USA
Article info

Article history:
Received 17 January 2013
Received in revised form 24 April 2013
Accepted 26 April 2013
Available online 9 May 2013

Abstract
Background: Costs of severe sepsis in the US exceeded $24 billion in 2007. Identifying the relative
contributions of patient, hospital, and physician factors to the variation in hospital costs of sepsis could
help target efforts to improve the value of care.
Methods: We identified adults with a principal or secondary diagnosis of sepsis who received care
between June 1, 2004 and June 30, 2006 at one of the hospitals participating in a multi-institutional
database. We constructed a regression model to predict mean hospital costs that included patient
characteristics, hospital mission and environment (e.g., teaching status, percentage of low-income
patients), hospital fixed costs, and risk-adjusted length of stay, which encompasses hospital throughput,
the incidence of complications, and other aspects of physician practice. To determine the contribution to
cost variance by each predictor, we calculated the R2.
Results: At 189 hospitals, we identified 40,265 adults with sepsis who met inclusion criteria. The median
cost of a hospitalization was $20,216. The model explained 69% of the hospital-level variation in the
costs of hospitalization. Of explained variation, differences in patients' ages, comorbidities, and severity
accounted for 20%; hospital mission and environment represented 16%; differences in hospital fixed
costs, including acquisition costs and overhead, accounted for 19%; and wage index explained an
additional 12%. Risk-adjusted length of stay comprised the final one-third of explained variation.
Conclusion: A large proportion of variation in the cost of caring for critically ill patients with sepsis across
hospitals is related to differences in patient characteristics and immutable hospital characteristics, while
nearly one-third is the result of differences in risk-adjusted length of stay.
Implications: Efforts to reduce spending on the critically ill should aim to understand determinants of
practice style but should also focus on hospital throughput, overhead, acquisition, and labor costs.
© 2013 Published by Elsevier Inc.
Keywords:
Sepsis
Critical illness
Costs
Introduction
There were more than 700,000 cases of severe sepsis in the
United States in 2007, with a combined cost exceeding $24 billion.1
Recent growth in the number of annual sepsis cases suggests that the
costs of sepsis care in the US will only continue to increase.1,2 Efforts
to reduce health care spending have focused on variations in practice,
because physicians and hospitals differ in the testing and treatment
they provide.3–7 In addition to differences in the intensity of care
* Correspondence to: Center for Quality of Care Research, Baystate Medical Center, 280 Chestnut St., 3rd Floor, Springfield, MA 01199, USA. Tel.: +1 413 505 9173; fax: +1 413 794 8866. E-mail addresses: [email protected], [email protected] (T. Lagu).
2213-0764/$ - see front matter © 2013 Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.005
provided to patients with sepsis, however, there are patient and
hospital factors that influence costs at the hospital level.8 Sepsis
patients can present with a broad range of signs and symptoms, from
mildly deranged vital signs to fulminant disease with organ system
failure.9–12 A sicker patient population requires more monitoring,
testing, and treatment. Additionally, each hospital's mission and
environment, such as its role in the training of young physicians,
its provision of care for low income patients, its location (region of
the country, urban or rural), or its ability to negotiate for lower costs
of drugs, devices, and supplies may influence its spending.8,13–17
For example, higher wages are related to geographic location and
increase the cost of caring for each patient.14
Identifying the relative contributions of patient, hospital, and
physician factors to the variation in spending on sepsis observed
between hospitals could help target efforts to improve the value of
sepsis care, identify the extent to which efforts to change physician practice could result in cost savings, and highlight the need to
reduce fixed and acquisition costs.18 In this study, we aimed to
describe sources of variation across hospitals in the average cost of
care for critically ill patients with sepsis in order to reduce some of
the mystery surrounding variation in hospital spending and to
identify real-world strategies (e.g., limiting utilization vs. improving throughput vs. holding off on large capital investments vs.
altering purchasing strategies) most likely to help reduce some of
this variation over time.
Methods
Setting and subjects
Using data from hospitals that participated in the Perspective
database (Premier Healthcare Informatics, Charlotte, NC), we conducted a cohort study of medical patients with sepsis who were
treated in the ICU between June 1, 2004 and June 30, 2006.
Perspective contains a date-stamped log of all items and services
charged to the patient or insurer (such as medications, laboratory
tests, and therapeutic services) in addition to the elements found in
hospital claims derived from the uniform billing 04 (UB-04) form.
Participating hospitals closely approximate the makeup of acute care
hospitals nationwide, and the database includes roughly 15% of US
hospitalizations. They represent all regions of the United States and are
predominantly small- to mid-sized non-teaching facilities that serve a
largely urban population. The
detailed nature of the database is a notable advantage, particularly its
cost data.18,19 Approximately 75% of hospitals that participate in
Perspective submit item-level information on actual hospital costs,
taken from internal cost accounting systems. The remaining 25%
provide cost estimates based on Medicare cost-to-charge ratios.
Permission to conduct the study was obtained from the Institutional
Review Board at Baystate Medical Center.
Patients with a principal or secondary diagnosis of sepsis (Appendix 1) were included if they were 18 years of age or older and, by the
second hospital day, were admitted to the ICU, treated with antibiotics,
and had blood cultures drawn. Diagnostic information was assessed
using International Classification of Diseases, Ninth Revision, Clinical
Modification (ICD-9-CM) codes. We restricted the analysis to medical
(nonsurgical) patients for whom treatment was initiated within the
first 2 days of hospitalization in order to focus our investigation on the
care of patients who presented with sepsis (rather than those who
developed it later during the hospitalization). We used the first 2 days
of hospitalization (rather than just the first day) because in administrative datasets the duration of the first hospital day includes partial
days that can vary in length.
We limited the sample to hospitals that admitted at least 100
critically ill sepsis patients during the 2-year study period. This
prevented hospital-level results from being distorted by hospitals
with few patients. We also excluded patients who were transferred
from or to another acute care facility because we could not accurately
determine the clinical outcomes or cost of the hospitalization. To
lessen the possibility of including patients in the study who may
have had a working diagnosis of sepsis that was never confirmed, we
limited the analysis to patients who were treated with antibiotics for
at least 3 consecutive days (except in the case of death).
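Taken together, the inclusion criteria above amount to a simple per-patient filter. The following is an illustrative sketch only (the field names are hypothetical assumptions, not the Premier database schema, and the study's analyses were performed in Stata):

```python
def include(p: dict) -> bool:
    """Sketch of the cohort inclusion criteria described above.

    All field names are illustrative assumptions, not the actual schema.
    """
    return (
        p["age"] >= 18                       # adults only
        and p["sepsis_dx"]                   # principal or secondary sepsis code
        and p["icu_day"] <= 2                # admitted to the ICU by hospital day 2
        and p["abx_day"] <= 2                # antibiotics started by day 2
        and p["blood_cx_day"] <= 2           # blood cultures drawn by day 2
        and not p["surgical"]                # medical (nonsurgical) patients only
        and not p["transferred"]             # no transfers from/to another facility
        and (p["abx_consecutive_days"] >= 3 or p["died"])  # confirmed course
    )
```

The final clause mirrors the paper's exception for patients who died before completing 3 days of antibiotics.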
Contributors to hospital cost
For each hospital, we recorded size, teaching status, geographic
region, and whether it served an urban or rural population.
We then constructed additional variables comprising patient and
hospital factors that may contribute to a hospital's average cost
for caring for patients with sepsis. We grouped these into four
categories: patient factors, hospital mission and environment,
fixed costs, and risk-adjusted length of stay.
Patient factors
For each patient, we recorded age, gender, marital status,
insurance status, race, and ethnicity (as recorded by admission
or triage staff of participating hospitals using hospital-defined
options). We used software provided by the Healthcare Costs and
Utilization Project of the Agency for Healthcare Research and
Quality20 to assess the presence of 25 comorbid conditions. We
used diagnosis codes to identify the source (lung, abdomen,
urinary tract, blood, other) and type of infection (gram positive,
gram negative, mixed, anaerobic, fungal). We also recorded use of
initial critical care therapies (e.g., mechanical ventilation, vasopressors) which can serve as useful proxies for severity of illness in
the absence of physiologic data. These therapies are included in a
mortality prediction model with similar discrimination and calibration to clinical ICU risk-adjustment models (e.g., MPM III).21,22
Hospital mission and environment
We included two variables to account for the impact of hospital
mission on cost variation. The first, resident-to-bed ratio, represents the presence of teaching and the teaching burden; a higher
ratio indicates a greater teaching burden. The second, disproportionate share of low-income patients, accounts for hospital differences in the proportion of low-income Medicare patients and the
proportion of patient-days for which Medicaid is the primary
payer. Disproportionate share of low-income patients may contribute to variation because low-income patients tend to have
complex health care needs and may require additional staffing and
services like translators and social workers.23
Fixed and acquisition costs
Hospitals also pay different prices to acquire goods (e.g., a dose
of antibiotic has a different cost at different hospitals) and have
different fixed costs, such as overhead, labor, or infrastructure
investments (e.g., electronic records, new buildings, or expensive
radiological or surgical equipment).14,15 Fixed costs and acquisition
costs vary across hospitals. One important caveat to identifying the
contribution of fixed and acquisition costs to cost variation is that
these costs can be difficult to reduce. They are often considered to
be the cost of “keeping the lights on” and thus are not often the
focus of cost cutting measures. Still, there has been discussion in
the literature as to the ways that the fixed and acquisition costs of
critical care can be reduced.14 Some of these can be implemented
with relative ease, such as renegotiating purchasing contracts and
holding off on capital purchases and infrastructure investments.
Others are more painful (e.g., reducing the labor force in areas that
are “overstaffed” and eliminating ICU beds that are underutilized).
Typically, hospitals use cost accounting systems to add fixed
and acquisition costs proportionally to all billable items.18,19 A day
of ICU room and board is one high-cost item that was included in
each patient's hospital bill because of our inclusion criteria. It
includes the fixed and acquisition costs of a day of critical care,
including nursing care, but does not include the costs of associated
diagnostic tests or treatments for patients in the ICU. In a prior
analysis, we reported that room and board either on the floor or in
the ICU represents about half of the costs of a patient stay for
sepsis patients,18 and others have reported that ICU room and
board drives critical care costs.24 We therefore created a fixed cost
index by dividing the median cost of an ICU day at each hospital by
the overall median cost of an ICU day for all hospitals. A hospital
with median ICU room and board costs would have an indexed
value of 1.
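The index construction can be made concrete with a minimal sketch (hypothetical cost data, and Python rather than the Stata used in the study; "overall median" is read here as the median over all pooled ICU days):

```python
from statistics import median

# Hypothetical ICU day costs (US $) for three hospitals; illustrative only.
icu_day_costs = {
    "Hospital A": [1800, 2000, 2200],
    "Hospital B": [2900, 3100, 3300],
    "Hospital C": [1400, 1500, 1600],
}

# Median cost of an ICU day at each hospital.
hospital_median = {h: median(days) for h, days in icu_day_costs.items()}

# Overall median cost of an ICU day across all hospitals (pooled days).
overall_median = median(c for days in icu_day_costs.values() for c in days)

# Fixed cost index: a hospital with median ICU room and board costs
# has an indexed value of 1.
fixed_cost_index = {h: m / overall_median for h, m in hospital_median.items()}
```

A hospital with an index above 1 charges more than the typical hospital for a day of ICU room and board before any differences in practice are considered.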
Because wages may be an important contributor to variation in
hospital costs, we accounted for differences in labor costs using a
wage index. The wage index is calculated using relative hospital
wage levels in the geographic area of the hospital compared to the
national average hospital wage level.
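Read literally, the wage index is a simple ratio (the wage figures below are hypothetical, not CMS data):

```python
# Hypothetical average hourly hospital wages; illustrative only.
area_average_wage = 31.50      # hospital's geographic area
national_average_wage = 30.00  # national average hospital wage level

# A wage index above 1 means labor in this market costs more than average.
wage_index = area_average_wage / national_average_wage
print(wage_index)  # 1.05
```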
Length of stay index
Because room and board accounts for the largest percentage of
hospital costs,18,19 decisions concerning the timing of discharge, as
reflected in length of stay (LOS), have a large impact on overall
hospital costs. ICU LOS is determined by factors both within and
beyond the control of the physician. A factor which may be beyond
the control of the physician is throughput, which is determined
partly by the availability of beds on the floor for ICU patients.
Factors affecting LOS that are in the control of the physician are the
decision to keep the patient in the ICU (or hospital) for another
day for observation or the pursuit of additional diagnostic testing
that might lead to an extra day in the hospital.25,26 We therefore
modeled each hospital's risk-adjusted length of ICU stay to
encompass these factors. We calculated a predicted length of ICU
stay at the patient level with a generalized linear model that
adjusted for demographics, comorbidities, early organ supportive
therapies, and hospital characteristics. By calculating predicted ICU
LOS and comparing it to observed ICU LOS, we adjusted for patient
severity on presentation (because a patient with greater illness
severity on presentation would be expected to have a longer ICU
length of stay).
We derived Pearson residuals at the patient level in order to
calculate average Pearson residuals at the hospital level. For
example, a hospital that keeps patients, on average, 2 days longer
in the ICU than would be expected based on their patient's
presenting severity and comorbidities would have an average
Pearson residual value of 2. We then standardized the hospital-level Pearson residuals to have a mean of 0 and a standard
deviation of 1 (hospitals with longer than expected ICU LOS values
have LOS indices greater than 0). We defined this number as the
“Length of Stay (LOS) Index.”
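The hospital-level standardization can be sketched as follows (hypothetical observed and predicted ICU LOS values; for simplicity, raw observed-minus-predicted residuals stand in for the GLM Pearson residuals used in the study):

```python
from statistics import mean, pstdev

# Hypothetical (observed, predicted) ICU LOS pairs for three hospitals.
patients = {
    "Hospital A": [(6, 4.0), (5, 4.5), (9, 6.5)],
    "Hospital B": [(3, 4.0), (4, 5.0), (5, 5.5)],
    "Hospital C": [(7, 5.0), (6, 6.0), (8, 6.5)],
}

# Average residual per hospital: positive means longer stays than expected.
hosp_resid = {h: mean(o - p for o, p in pairs) for h, pairs in patients.items()}

# Standardize to mean 0, SD 1 across hospitals: the LOS index.
mu = mean(hosp_resid.values())
sd = pstdev(hosp_resid.values())
los_index = {h: (r - mu) / sd for h, r in hosp_resid.items()}
```

Hospitals that keep patients longer than expected (Hospital A here) receive an index above 0; those that discharge sooner than expected (Hospital B) receive a negative index.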
Outcomes

Our outcome of interest was each hospital's median and mean cost of a hospitalization for a medical patient with sepsis who was treated in the ICU.

Analysis

We calculated patient-level summary statistics using frequencies for binary variables and medians and interquartile percentiles for continuous variables. We then created a hospital-level regression model to predict mean hospital costs. The independent variables in this model were patient demographics, comorbidities, and early organ supportive therapies averaged at the hospital level, and the hospital-level variables defined above (hospital environment variables, fixed cost and length of stay indices). We chose variables for inclusion based on a bootstrapping algorithm with stepwise regression.27 This was based on two iterations of 100 bootstrapped samples each. The stepwise algorithm was backward selecting with p = 0.05 to enter the model and p = 0.1 to remain in the model. We also re-analyzed the data using Least Angle Regression (LAR), a sequential model-building algorithm that considers parsimony as well as prediction accuracy, to see how including more factors would affect our results.28

We then examined the Partial (Type III) Sum of Squares in the hospital-level model. We calculated the correlation coefficient (r) and squared it (R2) to calculate the percent variance explained by each predictor.29 This is equivalent to creating individual regression models in which each variable is the sole predictor and hospital-level patient costs are the dependent variable. Finally, we examined standardized coefficients of the regression model to see how a one standard deviation change in each predictor affects average hospital costs. All analyses were carried out using Stata/SE 10.1 (StataCorp, College Station, TX).

Results

We identified 272,785 adults with sepsis who were admitted to a Premier hospital between June 1, 2004 and June 30, 2006. Of these, 40,265 were medical patients who received 3 consecutive days of antibiotics and at least one blood culture (Table 1). Almost all hospitals (91%) had more than 200 beds, and half (49%) had more than 400 beds. Nearly half (47%) participated in teaching activities and most (90%) were located in urban locations.

Patient and hospital factors

The median age of patients was 69 years (Table 2); half (50%) were women. The majority (62%) were insured by traditional Medicare plans. Most (61%) were white (Table 2). Hypertension (33%), diabetes (33%), and anemia (31%) were the most common comorbid conditions. More than one-third of patients (36%) received mechanical ventilation by hospital day 2 and more than half (55%) received vasopressors. Median LOS was 8 days and median ICU LOS was 4 days. The median cost of a hospitalization was $20,216, with an interquartile range of $16,997–$24,129. Observed mean LOS and patient costs at the hospital level were correlated (r = 0.67, p < 0.001), and no hospital in the highest quintile of patient costs was in the lowest quintile of LOS.

Table 1
Characteristics of hospitals treating medical patients with sepsis in the intensive care unit.
                                             Hospitals, n (%)     Records, n (%)
Total                                        189 (100)            40,265 (100)
Number of beds
  <200                                       16 (8.5)             2837 (7.0)
  200–400                                    80 (42.3)            14,201 (35.3)
  >400                                       93 (49.2)            23,227 (57.7)
Teaching                                     89 (47.1)            18,744 (46.6)
Geographical location
  South                                      93 (49.2)            20,656 (51.3)
  North                                      33 (17.5)            6744 (16.7)
  Midwest                                    38 (20.1)            8415 (20.9)
  West                                       25 (13.2)            4450 (11.0)
Urban (vs. rural)                            171 (90.5)           37,524 (93.2)

                                             Median [IQR]
DSH patient percentage (3-year average)*     23.2% [15.4%, 31.5%]
Wage index (3-year average)                  0.96 [0.92, 1.08]
Ratio of residents to beds (3-year average)  3.0% [0%, 15.9%]

* DSH: disproportionate share of low-income patients.
Table 2
Characteristics, treatments, and outcomes of patients with medical sepsis treated in the intensive care unit at 189 US hospitals.

                                             Overall median [IQR]       Median [IQR]
                                             or n (%) across patients   across hospitals
Total                                        40,265 (100)               189 (100)

Demographics
Age (in years)                               69 [56, 80]                66.4 [64.3, 68.9]
Female                                       20,119 (50.0)              49.6% [47.0, 52.5]

Race/ethnicity
  White                                      24,599 (61.1)              69.6% [43.4, 85.0]
  Black                                      7795 (19.4)                10.7% [3.0, 27.6]
  Hispanic                                   2091 (5.2)                 0.7% [0, 3.2]
  Other/unknown                              5780 (14.4)                3.8% [1.4, 12.3]

Marital status
  Married                                    14,866 (36.9)              42.0% [32.8, 46.5]
  Widowed                                    7700 (19.1)                20.5% [15.8, 24.4]
  Single                                     8197 (20.4)                20.2% [14.9, 25.4]
  Separated/divorced                         3952 (9.8)                 10.0% [6.5, 13.2]
  Other                                      1833 (4.6)                 1.4% [0, 3.4]
  Not recorded                               3717 (9.2)                 0% [0, 0]

Insurance
  Medicare traditional                       24,962 (62.0)              63.9% [55.6, 70.3]
  Managed care                               5581 (13.9)                12.1% [8.5, 18.0]
  Medicaid traditional                       3327 (8.3)                 6.8% [3.6, 11.3]
  Self pay/Other                             2484 (6.2)                 5.1% [2.8, 7.6]
  Medicare managed                           2068 (5.1)                 0.6% [0, 8.9]
  Commercial/private                         1293 (3.2)                 1.9% [0.7, 4.7]
  Medicaid managed                           550 (1.4)                  0% [0, 1.6]

Clinical characteristics
Comorbidities
  Hypertension                               13,111 (32.6)              32.8% [25.9, 38.6]
  Diabetes (with or without complications)   13,444 (33.4)              32.8% [28.3, 38.8]
  Anemia (chronic and acute)                 12,810 (31.8)              30.1% [23.8, 39.6]
  Chronic pulmonary disease                  12,266 (30.5)              30.1% [25.0, 36.1]
  Congestive heart failure                   12,732 (31.6)              31.8% [26.9, 35.5]
  Neurological disorders                     8839 (22.0)                21.2% [17.6, 24.9]
  Renal failure                              7746 (19.2)                18.2% [13.5, 24.1]
  Weight loss                                5901 (14.7)                13.8% [8.8, 21.0]
  Solid tumor w/out metastasis               3850 (9.6)                 9.0% [6.4, 12.2]
  Hypothyroidism                             4070 (10.1)                9.6% [6.8, 13.4]
  Depression                                 2968 (7.4)                 6.4% [3.8, 9.8]
  Peripheral vascular disease                3252 (8.1)                 7.6% [4.9, 10.6]
  Valvular disease                           3461 (8.6)                 8.0% [5.4, 10.8]
  Paralysis                                  2572 (6.4)                 6.1% [4.2, 8.1]
  Obesity                                    2505 (6.2)                 5.7% [3.7, 8.4]
  Metastatic cancer                          2039 (5.1)                 4.7% [3.4, 6.5]
  Liver disease                              2382 (5.9)                 5.7% [4.1, 7.7]
  Psychoses                                  1776 (4.4)                 4.1% [2.8, 5.7]
  Alcohol abuse                              2346 (5.8)                 5.6% [3.9, 7.7]
  Rheumatoid arthritis/collagen disease      1411 (3.5)                 3.4% [2.3, 4.5]
  Lymphoma                                   1183 (2.9)                 2.7% [1.7, 3.9]
  Drug abuse                                 1274 (3.2)                 2.8% [1.5, 4.4]
  Pulmonary circulation disease              1352 (3.4)                 2.8% [1.4, 4.9]
  Peptic ulcer disease                       645 (1.6)                  1.2% [0.6, 2.3]
  AIDS                                       110 (0.3)                  0% [0, 0.4]

Site of infection
  Urinary (primary or secondary)             13,946 (34.6)              33.6% [30.1, 37.7]
  Lung (primary or secondary)                16,626 (41.3)              40.9% [35.6, 46.8]
  Abdominal (primary or secondary)           5493 (13.6)                13.3% [9.9, 16.2]
  Blood (and not abdominal, lung, or urinary) 973 (2.4)                 1.8% [0.9, 3.3]
  Other (e.g., skin, bone)/unknown           10,887 (27.0)              26.6% [23.8, 30.8]

Infection type
  Gram positive                              6824 (17.0)                16.6% [13.1, 20.4]
  Gram negative                              7002 (17.4)                17.6% [13.8, 21.1]
  Other (fungal, viral, anaerobic, mixed)    544 (1.4)                  1.2% [0.7, 2.0]
  Unknown                                    25,895 (64.3)              64.7% [58.5, 70.8]

Primary diagnosis of sepsis                  22,272 (55.3)              56.0% [48.2, 62.2]

Treatments (by day 2)
  Mechanical ventilation                     14,402 (35.8)              34.1% [28.3, 42.3]
  Vasopressors                               22,334 (55.5)              54.9% [47.8, 62.8]

Outcomes
  LOS (days)                                 8 [5, 14]                  10.2 [9.3, 11.4]
  ICU LOS (days)                             4 [2, 7]                   5.1 [4.5, 6.0]
  Patient costs (US $)                       $15,321 [$8978, $26,359]   $20,216 [$16,997, $24,129]
  In-hospital mortality                      13,277 (33.0)              33.6% [28.0, 38.0]
  Readmission (for survivors only)           5394 (20.0)                19.5% [16.3, 23.3]

Fixed cost index

The fixed cost index varied widely across hospitals (Fig. 1): 17% of hospitals had an ICU day cost more than 1.25 times the median, and 13% had an ICU day cost less than 0.75 times the median.

Length of stay index

The ICU LOS index also varied considerably across hospitals (Fig. 1). By definition, the mean and standard deviation (SD) of this index were 0 and 1, respectively. Nine hospitals (5%) had a length of stay index more than 2 SDs from the mean, and 50 hospitals (26%) had LOS indices more extreme than 1 SD from the mean.

Model examining contribution of patient and hospital factors to cost variation

Our final model contained 11 variables and had an adjusted R2 of 0.67 (Table 3). Overall, our model explained 69% (the unadjusted R2) of the hospital-level variation in the average cost of a hospitalization for a critically ill sepsis patient (Fig. 2). Of explained variation, differences in patient characteristics (including age, comorbidities, and severity variables) explained 20% of the variation in cost across hospitals. Teaching burden (i.e., the resident-to-bed ratio) represented 16% of explained variation, wage index represented 12%, and fixed costs accounted for 19%. The LOS index accounted for 33% of explained variation. The significant variables identified with the LAR method were very similar to those included in our final model via the stepwise bootstrapping algorithm.

When the coefficients were standardized so that each one represented a 1 SD increase in its value, we found that a 1 SD increase in the LOS index (equal to 1 day longer, on average, in the ICU) caused an increase of approximately $2300 in average patient costs. Similarly, a 1 SD increase in the fixed cost index resulted in an increase of $1951 in average patient costs, where 1 SD equaled an amount 33% above the median value (Fig. 3).

Discussion
The cost of intensive care in the United States was estimated to
be $81 billion in 2005, representing more than 13% of all hospital
costs and almost 1% of the US gross domestic product.30 Despite
this large and growing expenditure, the most effective method for
Fig. 1. Distribution of fixed cost index and practice pattern index. (Histograms: percent of hospitals by fixed cost index and by length of stay index.)
Table 3
Model examining contribution of patient and hospital factors to cost variation.

Variable                          Coefficient   95% CI              p        Standardized coefficient
                                                                            (change in mean hospital
                                                                            costs per 1 SD change in X)
LOS index                         2318          1815 to 2822        <0.001   2318
Fixed cost index                  5992          4288 to 7695        <0.001   1950
Wage index                        12,745        8156 to 17,333      <0.001   1766
Residents to bed ratio            10,735        7338 to 14,132      <0.001   1671
Vasopressors by day 2             9073          4046 to 14,100      <0.001   976
Mechanical ventilation by day 2   7640          2838 to 12,442      0.002    903
Valvular disease                  17,288        6605 to 27,972      0.002    817
Albumin by day 2                  5287          1550 to 9024        0.006    722
Fungal infection type             17,541        3208 to 31,874      0.017    606
Percent age 85+                   −6034         −16,194 to 4127     0.253    −374
Widowed marital status            −5132         −10,868 to 605      0.079    −452
Constant term                     −8380         −13,281 to −3479    0.001    NA
reducing the cost of critical care remains a matter of
debate.15,16,24,31–33 In a large cohort of patients with sepsis who
were treated in the ICU, we observed significant variation in costs
of care across hospitals. To better understand the reasons for this
variation, we used patient factors, hospital mission, and fixed cost
and length of stay indices to create a model that accounted for
nearly 70% of the variation in costs that we observed. Of this
explained variation in spending, approximately one-third was
related to the length of stay index. The remainder was related to
patient comorbidities and disease severity (20%), hospital mission
(16%), wage index (12%), and fixed costs (19%). These findings
indicate that although opportunities exist to reduce the cost of
critical care by changing physician practice, the majority (more
than 66%) of the explained variation in spending was the result of
factors that were outside of a physician's control.
These findings have important implications for hospitals that
are seeking to reduce their expenditures on critical care. A hospital
that is 1 SD above the mean for both ICU LOS and fixed costs
would save a roughly equivalent amount by either reducing ICU
LOS by one day or by reducing fixed costs by 33%. Given the
emphasis on the need for all hospitals to reduce spending in
upcoming years, it is likely that such a hospital will need to both
reduce ICU length of stay and reduce fixed costs. Reducing ICU
length of stay can be achieved by improving throughput (from the
emergency room to the ICU, the ICU to the medical floors, and
from inpatient to rehabilitation beds); training physicians to
identify and triage stable patients to medical beds; early focus
on ventilator weaning and physical therapy; reduction of complications such as deep vein thromboses, ventilator-associated pneumonia, and catheter-associated bloodstream infections; and
collaborative, team-based care that is focused on reducing length
of stay.34–40 Reducing fixed costs is potentially more difficult, but is
accomplished by reducing infrastructure, labor, or acquisition costs
(e.g., renegotiating purchasing or labor contracts, identifying
areas where the hospital is “overstaffed,” holding off on future
infrastructure investments such as large equipment purchases or
new buildings, or eliminating underutilized ICU beds).14,15,17
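A back-of-envelope reading of this trade-off, using the standardized coefficients from Table 3 and a hypothetical volume of 200 ICU sepsis admissions per year (the volume is an assumption for illustration, not a figure from the study):

```python
def annual_savings(per_patient_saving: float, admissions: int) -> float:
    """Illustrative arithmetic: per-patient saving scaled by annual volume."""
    return per_patient_saving * admissions

# Standardized coefficients from Table 3 (US $ per patient per 1 SD).
los_1sd_saving = 2318    # roughly one fewer ICU day on average
fixed_1sd_saving = 1950  # fixed cost index about 33% lower

# For a hypothetical hospital with 200 ICU sepsis admissions per year:
print(annual_savings(los_1sd_saving, 200))    # 463600
print(annual_savings(fixed_1sd_saving, 200))  # 390000
```

The two levers are of comparable size per patient, which is why the text treats a 1-day LOS reduction and a 33% fixed cost reduction as roughly equivalent.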
Fig. 2. Contributors to variation in hospital spending.
Fig. 3. Change in mean hospital costs with a change in 1 SD.
Our findings generally support other studies on the cost of
critical care. In a retrospective cohort study at a single academic
center, Kahn et al. concluded that less than 20% of total costs were
related to care intensity.17 In another single-center study, Roberts
et al. reported that fixed costs of hospitalizations (related to
overhead and labor) represented more than 80% of total costs.14
Others have suggested that room, board, and labor (and not
mechanical ventilation, procedures, or medications) were the most
significant contributors to the cost of critical care18,24 and that
costs across the entire healthcare system are more tied to price
than to intensity.41 Taken with our study's findings, these data
suggest that initiatives that focus exclusively on intensity of care
may garner only modest cost reductions.
A strength of this study is that we examined a broad sample of
hospitals in both academic and community settings across all
geographic regions of the US. Although our methods were different from those used in prior studies (which calculated fixed vs.
variable costs in a single hospital), our conclusions are similar.14,17
We were able to add to the literature by quantifying the contributions of hospital mission (e.g., teaching or care for low-income
patients). In contrast to prior work that has reported on the
association between resource utilization and patient factors, we
found that only 20% of explained variation was attributable to
patient demographics, comorbidities, and severity of illness at the
time of hospital admission.42 Future work should consider the
production function of health care at the entire hospital level,
across more diagnoses, with a target of optimizing care while
improving efficiency.
This study has limitations as well. Perhaps most importantly, 30%
of variation in costs across hospitals remained unexplained. This may
be because we were not able to account for some contributors to
variation and because some of our variables were estimates of the
contributor in question. The fixed cost index was calculated using the
relative cost of a day of ICU but did not assess the cost of other items.
Hospitals with low room and board costs but high costs for other
items might therefore have an artificially low index. In prior studies
using this database, however, we have found that fixed and acquisition costs were added proportionally to all items.18,19 ICU length of
stay encompasses some aspects of practice pattern but does not
include other markers of care intensity such as procedures, tests, or
medications (some of these variables were modeled as separate
predictors). This might lead us to under- or over-estimate the percent
of cost variation that is in the control of the physicians. Because of
this, we considered using additional methods for estimating practice
pattern, such as use of a single test or procedure or a combination of
tests and procedures, but decided that because ICU length of stay is a
primary determinant of hospital cost18 and represents a synthesis of
the hundreds of medical decisions physicians make in a day, it was
our best option for simultaneously estimating practice pattern and
aspects of hospital throughput. However, we acknowledge that ICU
length of stay does not capture all aspects of practice and we are
unable to separately examine how much of length of stay is in the
physicians' immediate control and how much is due to broader
throughput factors.
Another limitation is that our database does not contain clinical
data, so unmeasured severity of illness could explain some of the
unexplained variation in cost. Notably, we did adjust for severity
using variables from a validated sepsis severity adjustment
method, so unmeasured severity is less likely to be a significant
contributor to variation than it would be without such adjustment. We
also did not have access to data on the number of ICU beds in each
hospital, number of intensivists, or information on nurse staffing,
which are other potentially important contributors to variation in
costs. We limited our study to hospitals that admitted 100 sepsis
patients to the ICU over a 2-year period, and we excluded
transferred patients.
Our findings may not be generalizable to all hospitals. Although
our study utilizes data from 189 hospitals throughout the United
States, we do not know how spending on sepsis varies among
hospitals not in our database. Spending patterns on sepsis patients
may not be generalizable to all critically ill patients: patient
demographics, comorbidities, and acuity may explain spending
variation differently in a more diverse ICU population.
T. Lagu et al. / Healthcare 1 (2013) 30–36

In summary, a large proportion of the hospital-level variation in spending on critically ill sepsis patients is related to differences in patient characteristics and immutable hospital characteristics, while nearly one-third is the result of differences in risk-adjusted length of stay. Efforts to reduce spending on the critically ill should aim to understand determinants of practice style but should also focus on hospital throughput, overhead, acquisition, and labor costs.
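The attribution above (most variation tied to patient and hospital characteristics, nearly one-third to risk-adjusted length of stay, and a substantial share unexplained) can be illustrated with a toy variance decomposition. This is purely an illustrative sketch, not the authors' statistical method: the component names, the magnitudes, and the independence assumption are all invented for the example.

```python
import random
import statistics as st

random.seed(0)
n = 189  # number of hospitals, matching the study sample

# Toy model of hospital-level mean cost as a sum of independent
# components; the standard deviations are invented for illustration.
case_mix   = [random.gauss(0, 1.0) for _ in range(n)]  # patient characteristics
hosp_chars = [random.gauss(0, 0.8) for _ in range(n)]  # immutable hospital traits
adj_los    = [random.gauss(0, 0.9) for _ in range(n)]  # risk-adjusted ICU LOS
residual   = [random.gauss(0, 1.0) for _ in range(n)]  # everything unmeasured
cost = [sum(parts) for parts in zip(case_mix, hosp_chars, adj_los, residual)]

total_var = st.variance(cost)

def share(component):
    """Approximate share of cost variance due to one independent component."""
    return st.variance(component) / total_var

explained = {
    "patient characteristics": share(case_mix),
    "hospital characteristics": share(hosp_chars),
    "risk-adjusted ICU LOS": share(adj_los),
}
unexplained = 1 - sum(explained.values())

for name, s in explained.items():
    print(f"{name}: {s:.0%} of hospital-level cost variance")
print(f"unexplained: {unexplained:.0%}")
```

Because the components here are independent by construction, each one's variance share approximates the incremental R² it would contribute in a nested regression; with correlated predictors the attribution would depend on the order in which they enter the model.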
Acknowledgments
The study was conducted with funding from the Division of Critical Care Medicine and the Center for Quality of Care Research at Baystate Medical Center, Springfield, MA. Premier Healthcare Informatics, Charlotte, NC, provided the data used to conduct this study but had no role in its design, conduct, analysis, interpretation of data, or the preparation, review, or approval of the manuscript.

Drs. Lagu and Lindenauer had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs. Lagu, Lindenauer, Steingrub, and Rothberg conceived of the study. Dr. Lindenauer acquired the data. Drs. Lagu, Lindenauer, Rothberg, Steingrub, and Nathanson and Mr. Hannon analyzed and interpreted the data. Dr. Lagu drafted the manuscript. Drs. Lindenauer, Rothberg, Nathanson, and Steingrub and Mr. Hannon critically reviewed the manuscript for important intellectual content. Dr. Nathanson carried out the statistical analyses.
References
1. Lagu T, Rothberg MB, Shieh M-S, et al. Hospitalizations, costs, and outcomes of severe sepsis in the United States 2003 to 2007. Critical Care Medicine. 2012;40(3):754–761.
2. Rothberg MB, Cohen J, Lindenauer P, Maselli J, Auerbach A. Little evidence of correlation between growth in health care spending and reduced mortality. Health Affairs (Millwood). 2010;29(8):1523–1531.
3. Fisher ES, Bynum JP, Skinner JS. Slowing the growth of health care costs—lessons from regional variation. New England Journal of Medicine. 2009;360(9):849–852.
4. Fisher ES, Wennberg DE, Stukel TA, et al. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Annals of Internal Medicine. 2003;138(4):288–298.
5. Shortell SM. Increasing value: a research agenda for addressing the managerial and organizational challenges facing health care delivery in the United States. Medical Care Research and Review. 2004;61(suppl 3):12S–30S.
6. Esserman L, Belkora J, Lenert L. Potentially ineffective care. A new outcome to assess the limits of critical care. JAMA. 1995;274(19):1544–1551.
7. Garland A, Shaman Z, Baron J, Connors AF. Physician-attributable differences in intensive care unit costs: a single-center study. American Journal of Respiratory and Critical Care Medicine. 2006;174(11):1206–1210.
8. Cutler DM, Ly DP. The (paper)work of medicine: understanding international medical costs. Journal of Economic Perspectives. 2011;25(2):3–25.
9. Abraham E, Singer M. Mechanisms of sepsis-induced organ dysfunction. Critical Care Medicine. 2007;35(10):2408.
10. Angus DC, Wax RS. Epidemiology of sepsis: an update. Critical Care Medicine. 2001;29(suppl 7):S109–S116.
11. Dellinger RP, Levy MM, Carlet JM, et al. Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock: 2008. Critical Care Medicine. 2008;36(1):296–327.
12. Knaus WA, Wagner DP, Zimmerman JE, Draper EA. Variations in mortality and length of stay in intensive care units. Annals of Internal Medicine. 1993;118(10):753–761.
13. Lipscomb J, Yabroff KR, Brown ML, Lawrence W, Barnett PG. Health care costing: data, methods, current applications. Medical Care. 2009;47(7 suppl 1):S1–S6.
14. Roberts RR, Frutos PW, Ciavarella GG, et al. Distribution of variable vs fixed costs of hospital care. JAMA. 1999;281(7):644–649.
15. Kahn JM. Understanding economic outcomes in critical care. Current Opinion in Critical Care. 2006;12(5):399–404.
16. Kahn JM, Angus DC. Reducing the cost of critical care: new challenges, new solutions. American Journal of Respiratory and Critical Care Medicine. 2006;174(11):1167–1168.
17. Kahn JM, Rubenfeld GD, Rohrbach J, Fuchs BD. Cost savings attributable to reductions in intensive care unit length of stay for mechanically ventilated patients. Medical Care. 2008;46(12):1226–1233.
18. Lagu T, Rothberg MB, Nathanson BH, et al. The relationship between hospital spending and mortality in patients with sepsis. Archives of Internal Medicine. 2011;171(4):292–299.
19. Chen SI, Dharmarajan K, Kim N, et al. Procedure intensity and the cost of care. Circulation: Cardiovascular Quality and Outcomes. 2012;5(3):308–313.
20. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Medical Care. 1998;36(1):8–27.
21. Lagu T, Lindenauer PK, Rothberg MB, et al. Development and validation of a model that uses enhanced administrative data to predict mortality in patients with sepsis. Critical Care Medicine. 2011;39(11):2425–2430.
22. Lagu T, Rothberg MB, Nathanson BH, Steingrub JS, Lindenauer PK. Incorporating initial treatments improves performance of a mortality prediction model for patients with sepsis. Pharmacoepidemiology and Drug Safety. 2012;21(suppl 2):44–52.
23. Wynn B, Coughlin T, Bondarenko S, Bruen B. Analysis of the Joint Distribution of Disproportionate Share Hospital Payments. Prepared for the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services, by RAND under contract with the Urban Institute. 〈http://aspe.hhs.gov/health/reports/02/DSH/〉, 2002 (accessed 15.04.13).
24. Pastores SM, Dakwar J, Halpern NA. Costs of critical care medicine. Critical Care Clinics. 2012;28(1):1–10.
25. Southern WN, Bellin EY, Arnsten JH. Longer lengths of stay and higher risk of mortality among inpatients of physicians with more years in practice. American Journal of Medicine. 2011;124(9):868–874.
26. Southern WN, Berger MA, Bellin EY, Hailpern SM, Arnsten JH. Hospitalist care and length of stay in patients requiring complex discharge planning and close clinical monitoring. Archives of Internal Medicine. 2007;167(17):1869–1874.
27. Austin PC, Tu JV. Bootstrap methods for developing predictive models. The American Statistician. 2004;58(2):131–137.
28. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Annals of Statistics. 2004;32(2):407–499.
29. Bradley EH, Herrin J, Curry L, et al. Variation in hospital mortality rates for patients with acute myocardial infarction. American Journal of Cardiology. 2010;106(8):1108–1112.
30. Halpern NA, Pastores SM. Critical care medicine in the United States 2000–2005: an analysis of bed numbers, occupancy rates, payer mix, and costs. Critical Care Medicine. 2010;38(1):65–71.
31. Luce JM, Rubenfeld GD. Can health care costs be reduced by limiting intensive care at the end of life? American Journal of Respiratory and Critical Care Medicine. 2002;165(6):750–754.
32. Orszag PR, Ellis P. Addressing rising health care costs—a view from the Congressional Budget Office. New England Journal of Medicine. 2007;357(19):1885–1887.
33. Stockwell DC, Slonim AD. Intensive care unit costs: to infinity and beyond or not? Critical Care Medicine. 2008;36(9):2676–2678.
34. Rivers E, Nguyen B, Havstad S, et al. Early goal-directed therapy in the treatment of severe sepsis and septic shock. New England Journal of Medicine. 2001;345(19):1368–1377.
35. Nguyen HB, Corbett SW, Steele R, et al. Implementation of a bundle of quality indicators for the early management of severe sepsis and septic shock is associated with decreased mortality. Critical Care Medicine. 2007;35(4):1105–1112.
36. Pronovost P, Needham D, Berenholtz S, et al. An intervention to decrease catheter-related bloodstream infections in the ICU. New England Journal of Medicine. 2006;355(26):2725–2732.
37. Pronovost PJ, Thompson DA, Holzmueller CG, Dorman T, Morlock LL. The organization of intensive care unit physician services. Critical Care Medicine. 2007;35(10):2256–2261.
38. Bucknall TK, Manias E, Presneill JJ. A randomized trial of protocol-directed sedation management for mechanical ventilation in an Australian intensive care unit. Critical Care Medicine. 2008;36(5):1444–1450.
39. Gao F, Melody T, Daniels DF, Giles S, Fox S. The impact of compliance with 6-hour and 24-hour sepsis bundles on hospital mortality in patients with severe sepsis: a prospective observational study. Critical Care. 2005;9(6):R764–R770.
40. Resar R, Pronovost P, Haraden C, et al. Using a bundle approach to improve ventilator care processes and reduce ventilator-associated pneumonia. Joint Commission Journal on Quality and Patient Safety. 2005;31(5):243–248.
41. Anderson GF, Reinhardt UE, Hussey PS, Petrosyan V. It's the prices, stupid: why the United States is so different from other countries. Health Affairs (Millwood). 2003;22(3):89–105.
42. Odetola FO, Gebremariam A, Freed GL. Patient and hospital correlates of clinical outcomes and resource utilization in severe pediatric sepsis. Pediatrics. 2007;119(3):487–494.
Healthcare 1 (2013) 37–41
Innovating in health delivery: The Penn Medicine innovation tournament
Christian Terwiesch a,b,c, Shivan J. Mehta a,c,d,*, Kevin G. Volpp a,c,d,e
a Penn Medicine Center for Innovation, United States
b Department of Operations and Information Management, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, United States
c Leonard Davis Institute of Health Economics, Center for Health Incentives and Behavioral Economics, United States
d Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
e Center for Health Equity Research & Promotion, Philadelphia Veterans Affairs Medical Center, United States
Article history:
Received 11 February 2013
Received in revised form
19 April 2013
Accepted 5 May 2013
Available online 13 May 2013
Background: Innovation tournaments can drive engagement and value generation by shifting problem-solving
towards the end user. In health care, where the frontline workers have the most intimate understanding of
patients' experience and the delivery process, encouraging them to generate and develop new approaches is
critical to improving health care delivery.
Problem: In many health care organizations, senior managers and clinicians retain control of innovation.
Frontline workers need to be engaged in the innovation process.
Goals: Penn Medicine launched a system-wide innovation tournament with the goal of improving the patient
experience. We set a quantitative goal of receiving 500 ideas and getting at least 1000 employees to participate
in the tournament. A secondary goal was to involve various groups of the care process (doctors, nurses, clerical
staff, transporters).
Strategy: The tournament was broken into three phases. During Phase 1, employees were encouraged to submit ideas. Submissions were judged by an expert panel and by crowdsourcing, based on their potential to improve the patient experience and their ability to be implemented within 6 months. During Phase 2, the best 200 ideas were pitched during a series of 5 workshops, and ten finalists were selected. During Phase 3, the best 10 ideas were presented to and judged by an audience of about 200 interested employees and a judging panel of 15 administrators. Two winners were selected.
Results: A total of 1739 ideas were submitted and over 5000 employees participated in the innovation
tournament. Patient convenience/amenities (21%) was the top category of submission, with other popular
areas including technology optimization (11%), assistance with navigation within UPHS (10%), and improving
patient/family centered care (9%) and care delivery models/transitions (9%). A combination of winning and
submitted ideas were implemented.
Implications: Innovation tournaments can successfully engage a large portion of the employee population. Innovation tournaments represent a “bottom-up” approach to health care innovation and a method by which innovation can be democratized from the control of administrators and executives. Further research is needed to test, evaluate, and improve innovation tournaments.
© 2013 Elsevier Inc. All rights reserved.

Keywords: Health delivery; Innovation; Operations improvement
* Correspondence to: Perelman School of Medicine, University of Pennsylvania, 1137 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104, United States. Tel.: +1 215 898 9807. E-mail address: [email protected] (S.J. Mehta).
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.hjdsi.2013.05.003

1. Introduction

Innovation can be defined as creating a novel match between a solution and a need in order to create value. Innovation in medicine has a long history but is often confused with invention. Invention hinges only on the concept of novelty, while innovation requires both novelty and value creation. Value is created if the novel match between solution (a treatment) and need (disease or injury) leads to better patient outcomes.
The process of innovation begins with the discovery of novel
needs and solutions or with a recombination of existing solution and
needs. At the initial stage though, value generation is merely a
hypothesis and the front-end of the innovation process really consists
of opportunity identification. Well-designed innovation is heavily
grounded in the scientific method, with opportunity identification
and hypothesis testing being key steps in the process. As economic
pressures on the health care system have risen, new approaches to
increase the value of health services provided have gained in
importance. For example, the triple aim describes improving the
patient experience of care, improving the health of populations, and
reducing the per capita cost of health care.1
In many health care and non-health care organizations, senior managers and clinicians retain control of innovation—they create new opportunities and select the ones to be implemented. However, a top-down approach to innovation might be ill suited for areas that require a more intimate understanding of patients' experience and the delivery process. In this article, we describe a 'bottom-up' approach to improving the patient experience that engages a large group of innovators, including front-line caregivers, administrative support staff, and hospital support functions. Specifically, we invited all 18,000 employees within Penn Medicine to participate. This broader innovator base called for significant modifications to the traditional innovation process and led us to conduct an innovation tournament.
2. Innovation tournaments
Innovation tournaments are very similar to the request for
proposals that underlies many grant-funded initiatives. The host of a
tournament issues a call for ideas (opportunities) in a particular area of
interest; in our case, ideas on how to improve the patient experience.
The innovator community, in our case the entire population of
employees at Penn Medicine, is invited to submit opportunities. These
opportunities then go through multiple steps of screening and
evaluation (including crowd-sourcing and peer review). A few ideas
emerge as winners while many others do not advance.
This process is common among all tournaments, including
examples of the various Xprizes (such as the search for private
space travel technology and the development of highly fuel
efficient vehicles) and DARPA challenges (including the search
for a driverless car technology). Innovation tournaments have
been advocated in the academic literature as a broader movement
to “Democratize Innovation”.2–4 The locus of creative problem
solving is shifted towards the end user, in our case the front line
caregiver. A basic principle in running such tournaments is to generate a high number of submissions from a large number of employees, since knowledge is “sticky” and distributed.5,6 Both the number and the diversity of the participants increase the quality of the best ideas of an innovation process.3
3. Setting the stage for the Penn Medicine innovation tournament
Many aspects of the care delivery process that are important to
the patient are under the influence of nurses, patient transporters,
or administrative support staff. Involving these groups can help
identify new opportunities that would be overlooked if innovation
were left to physicians and executives alone. We set an initial goal
of 500 ideas and aspired to get at least 1000 employees involved in
the process, either by identifying an opportunity or by commenting on opportunities identified by others. All employees who
submitted an idea were allowed to vote on the ideas of others.
Penn Medicine is the umbrella organization that includes the
Perelman School of Medicine at the University of Pennsylvania and
the University of Pennsylvania Health System (UPHS). The goal of
the innovation tournament was to engage the broader community
to identify and select new ideas that generate value for the
patients of the health system. Understanding the complex organization and communicating with its many different employee groups was a major challenge identified early in the planning process. This tournament offered the potential to unite the diverse employee base with a common purpose.
The Chief Executive Officer of UPHS endorsed the concept of a
tournament in November 2011. A steering committee was formed,
including UPHS executives and faculty from the medical and business schools with expertise in health services. The tournament challenged the employee base to come up with ideas that
would significantly improve the patient experience because this
theme transcended all employees across the organization. The
steering committee enlisted a working group from across the
health system to operationalize the logistics and leverage existing
communication channels in the organization.
4. Running the tournament
The tournament was broken up into three phases (Fig. 1). Phase 1
took place during January–February 2012 with the goal of encouraging the submission of ideas from across the health system. Beyond
the quantitative goal of receiving 500 ideas, a secondary goal was to
obtain a representative participation from the various groups
involved in the care process (doctors, nurses, clerical staff, transporters) and from the multiple sites that constitute the health system.
An active communication program was launched to reach the health
system's 18,000 employees, including flyers and posters, a strong
online presence, as well as several email memos from the CEO. The
communication channels directed participants to a specially
designed website that allowed for idea submission, viewing of ideas,
and rating of ideas on a scale of 1–5 through crowd-sourcing. There
were no monetary prized for winning or moving forward because it
was felt that it would not increase creativity in a mission-based
organization, and large cash prizes may hinder the free and open
sharing of ideas.
The number of submissions and the overall participation
substantially exceeded the previously set objectives. A total of
1739 ideas were submitted and over 5000 employees participated
by commenting and rating ideas. A total of 200 ideas were moved
from Phase 1 to Phase 2. This selection was based on a combination of expert rating and crowdsourcing, each on a scale of 1–5. A panel of 29 experts from various parts of the health system was asked to evaluate a set of at least 400 ideas presented to them in random order. In addition, each idea was rated by “the crowd” (any employee who wanted to evaluate an idea). The two criteria suggested for rating ideas were the potential to improve the patient experience and whether the idea could be implemented within 6 months. Given the uncertainty about what would be submitted, the decision was made to keep the criteria deliberately broad. Ideas were moved forward if they had a high mean score, but also if they generated significant excitement, reflected in a large percentage of high scores (4–5).
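As described, the advancement rule combined two signals: a high mean rating, or "excitement" in the form of a large share of top (4–5) scores. A minimal sketch of such a rule follows; the function name and both thresholds are hypothetical, since the article does not report the exact cutoffs used.

```python
def advances(ratings, mean_cutoff=4.0, top_share_cutoff=0.5):
    """Decide whether an idea moves from Phase 1 to Phase 2.

    ratings: combined expert and crowd scores, each on a 1-5 scale.
    An idea advances on a high mean score OR on "excitement":
    a large share of top (4-5) ratings, even when a few low
    scores drag the mean down.
    """
    if not ratings:
        return False
    mean = sum(ratings) / len(ratings)
    top_share = sum(r >= 4 for r in ratings) / len(ratings)
    return mean >= mean_cutoff or top_share >= top_share_cutoff

print(advances([5, 5, 4, 1, 1]))  # polarizing but exciting -> True
print(advances([3, 3, 3, 3]))     # uniformly lukewarm -> False
```

The OR structure is what lets polarizing ideas survive: an idea rated [5, 5, 4, 1, 1] has a mean of only 3.2 but a 60% share of top scores, so it still advances.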
The second phase of the tournament consisted of a set of
5 workshops representing the best 200 ideas from Phase 1. Each
of the workshops was structured to allow for the voting on ideas
and collaboration among newly formed teams. After some opening
comments and instructions from the moderator, participants were
asked to present (“pitch”) their idea. Participants were asked to
prepare a short poster summarizing their idea and present in less
than 90 s. Voting was conducted using simple stickers, and teams
of five were formed around the best ideas. Teams were then given
1 h to refine and improve the idea. After that, the best 6–10 ideas were presented again, and two ideas from each workshop were selected to advance to Phase 3.
Fig. 1. Overall ideas by phase.

Fig. 2. Top ten finalist ideas.

Fig. 3. Categories of solutions, with example ideas.

Convenience/amenities (21%)
• Restaurant pagers/buzzers in waiting areas for patients to be called for clinic appointments
• Artwork for patients, families, and coworkers
• Flexibility to order food online at any time in the hospital

Technology optimization (11%)
• Electronic consent form iPad application
• Enhanced patient ID cards with online forms and records
• Online payment for patient bills

Navigation (10%)
• iPhone app to help patients and families navigate the hospital
• ED waiting room patient queue screen
• QR codes on signs in the hospital to launch directions to specific places

Improving patient/family centered care (9%)
• Iconography cards for non-English speaking patients
• Daily summary of care plan for patients in the hospital
• Speedy pre-check-in process the night before an appointment

Care delivery models/transitions (9%)
• Cellphone app for adherence monitoring and intervention
• Designated Care Manager just for readmitted patients to improve transitions
• 24/7 clinical triage line to assist health system patients

Phase 3 of the tournament consisted of a presentation of the ten best ideas (5 workshops, 2 ideas per workshop; Fig. 2) to an audience of about 200 interested employees and a judging panel of 15, which included the Chief Executive Officer, the Chief Medical Officer, and a representation of leadership at UPHS and the School of Medicine. Between Phase 2 and Phase 3, the corresponding teams were provided with time and resources to create a
10 min presentation that addressed the problem, solution, metrics
for success, and resources necessary for implementation. Each
of the Phase 3 finalists had their picture taken with the CEO and
the head of HR and received a trophy. The overall winners of the
tournament were chosen based on the votes of the judges as well
as an audience participation system. The final judges were instructed to evaluate “How new or novel is this idea for health care
delivery?” and “What is the potential for this idea to improve the
patient experience at Penn Medicine?” Two winning ideas were selected to move forward with the appropriate resources necessary for implementation. The winning team members were given the opportunity to actively participate in these development teams.
5. Post-event assessment

Patient experience was a deliberately broad topic, and the ideas submitted transcended all aspects of patient experience, including convenience, care delivery, environment, and education (Fig. 3). Patient convenience/amenities (21%) was the top category of submission, with other popular areas including technology optimization (11%), assistance with navigation within UPHS (10%), improving patient/family centered care (9%), and care delivery models/transitions (9%).

Given the complex organization of an academic medical center, a variety of “solvers” proposed ideas, ranging from physicians and nurses to managers and other staff. A stated goal of the initiative was to generate ideas from all types of workers across the organization, and all groups were represented in the submissions. The role and background of the “solvers” is of particular interest, since it provides some insight for future efforts to promote innovation across the organization. The “solvers” were typically clinical staff, including nurses and physicians. This could be a result of their close interaction with patients, or of their deep expertise in fields adjacent to the many non-clinical patient experience solutions.

Several important takeaways emerged. The tournament was highly successful at engaging a large portion of the employee population: more than half of the organization participated by submitting ideas, voting, or visiting the tournament website. For this to happen, it required the support of the health system executives as well as excitement from across the organization, a combination of top-down and bottom-up approaches. Ten ideas emerged as finalists, including a patient check-in kiosk and an online scheduling tool for booking appointments with providers. It lies in the nature of an innovation tournament that these ideas are not “shovel ready” for execution; they still require a careful process leading to a staged implementation, which started in the summer of 2012. Another useful insight was that many solutions were based not on the professional experiences of the employees but on their experiences as patients. In fact, the health system is moving forward with one of the top ideas: an innovation tournament with patients as the participants.
6. Conclusion
The success of a tournament can be measured on two dimensions.
The main and obvious success measure is to evaluate the output of the
tournament process, i.e., the quality of the ideas generated. Other
factors to be considered on this first dimension are the quality of the
team that was formed over the course of the tournament and the level
of organizational support and legitimacy the idea has received by
going through this fair and transparent process. The second dimension
of an evaluative framework looks at the impact the tournament had
on the culture of innovation in the organization as well as the employees' engagement and commitment.
There is certainly a balance between having a broad or narrow
area of focus for the innovation tournament. A broad problem
could potentially increase the number of participants and allow for
more creativity in the problem statement. However, there is sometimes value in identifying an important, more specific problem, which may push the participants to think of more divergent solutions in key areas. We believe that this depends
on the priorities of the organization. Our tournament had a
deliberately broad focus area in order to maximize participation
from a wide range of employees across the system. In thinking
about innovation tournaments, it is helpful to consider the depth
of expertise and scale of investment required.3 However, the
challenges of health delivery are multi-faceted and less capital
intensive, so they could be better “distributed” to the frontline
workers.
There is a challenge to having only two “winners”, so it is
important to realize that the innovation process does not end with
the final session. Winning ideas need to be implemented to learn
more about the solutions, but also about the problem being
addressed. Meanwhile, organizations can increase the number of “winners” by facilitating the development and testing of ideas that did not emerge as winners from the tournament. Submitted ideas in many
cases provided quick fixes to previously unaddressed problems—
they provided useful insights from the front line, yet were too
small or too specific to attract a large number of votes in the
tournament. While not chosen as tournament winners, these ideas
were valuable and were fed back to the responsible department or
team leaders for implementation. Some of them were implemented within weeks of the tournament closing.
Beyond the ideas themselves, the tournament succeeded at
creating the sense among employees that their ideas and participation are valued. For a large health care organization, such an
increase in employee engagement and the associated change in
organizational culture might be as valuable as the actual winning
ideas themselves in bringing about subsequent improvements in
the patient experience. Such tournaments source and prioritize a large number of opportunities at a relatively low cost (measured either as the cost per idea or the cost per selected idea). As such, they do create value and are a more effective approach than using outside consultants or a centralized organizational unit that “owns” the
innovation task. Future areas of research could involve studying
how engagement of employees in innovation tournaments could
alter their future activities outside of the tournament, and testing different ways to evaluate and advance tournament ideas.
References
1. Berwick DM, Nolan TW, Whittington J. The triple aim: care, health, and cost. Health Affairs. 2008;27(3):759–769.
2. von Hippel E. Democratizing Innovation. Cambridge, Massachusetts: MIT Press.
3. Terwiesch C, Ulrich KT. Innovation Tournaments: Creating and Selecting Exceptional Opportunities. Boston, Massachusetts: Harvard Business Press.
4. Terwiesch C, Xu Y. Innovation contests, open innovation, and multiagent problem solving. Management Science. 2008;54(9):1529–1543.
5. Lakhani KR, Panetta JA. The principles of distributed innovation. Innovations. 2007;2(3):97–112.
6. Lakhani KR, Jeppesen LB, Lohse PA, Panetta JA. The value of openness in scientific problem solving. Harvard Business School Working Paper No. 07-050. Available from: 〈http://www.hbs.edu/research/pdf/07-050.pdf〉; 2007 [accessed 4.05.2007].
Healthcare 1 (2013) 42–49
Review
What can the past of pay-for-performance tell us about the future
of Value-Based Purchasing in Medicare?
Andrew M. Ryan a,*, Cheryl L. Damberg b
a Department of Public Health, Weill Cornell Medical College, 402 East 67th Street, New York, NY, USA
b RAND Corporation and Pardee RAND Graduate School, Santa Monica, CA, USA
Article history:
Received 8 March 2013
Received in revised form
15 April 2013
Accepted 21 April 2013
Available online 9 May 2013
The Medicare program has implemented pay-for-performance (P4P), or Value-Based Purchasing, for
inpatient care and for Medicare Advantage plans, and plans to implement a program for physicians in
2015. In this paper, we review evidence on the effectiveness of P4P and identify design criteria deemed to
be best practice in P4P. We then assess the extent to which Medicare's existing and planned Value-Based
Purchasing programs align with these best practices. Of the seven identified best practices in P4P
program design, the Hospital Value-Based Purchasing program is strongly aligned with two of the best
practices, moderately aligned with three, weakly aligned with one, and has unclear alignment with one
best practice. The Physician Value-Based Purchasing Modifier is strongly aligned with two of the best
practices, moderately aligned with one, weakly aligned with three, and has unclear alignment with one
of the best practices. The Medicare Advantage Quality Bonus Program is strongly aligned with four of the
best practices, moderately aligned with two, and weakly aligned with one of the best practices. We
identify enduring gaps in the P4P literature as it relates to Medicare's plans for Value-Based Purchasing and
discuss important issues in the future of these implementations in Medicare.
© 2013 Elsevier Inc. All rights reserved.
Keywords:
Pay-for-performance
Value-based purchasing
Payment
Medicare
Contents
1. Background ................................................................. 42
2. Summary of research on hospital and physician P4P ......................... 43
3. Precursors to nationwide implementation of Value-Based Purchasing in Medicare ... 43
4. Medicare's Value-Based Purchasing programs ................................ 43
5. Comparison of Medicare's direction in Value-Based Purchasing and best practices ... 44
6. Considerations for the future of Value-Based Purchasing in Medicare ....... 46
Conflict of interest ......................................................... 47
Funding ...................................................................... 47
Appendix A. Identifying best practices for the design of P4P programs ........ 47
References ................................................................... 48
1. Background
The concept of pay-for-performance (P4P) – that payers should
explicitly link provider reimbursement with performance on
quality measures – is compelling. Because patients have a limited
ability to observe the quality of care that they receive,1 providers
have lacked the incentive to provide sufficiently high quality care,
resulting in suboptimal quality across the health care system.2
⁎ Corresponding author. Tel.: +1 646 962 8077. E-mail address: [email protected] (A.M. Ryan).
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.hjdsi.2013.04.006
In response, both public and private payers have attempted to
incentivize the delivery of high quality care through the payment
system by initiating P4P programs. P4P has now been implemented nationally by Medicare for inpatient care3 and for Medicare
Advantage (MA) plans, and starting in 2015 will be implemented
for physicians as part of the Physician Value-Based Payment
Modifier.4
However, despite the best efforts of researchers, the answer to the question "Does pay-for-performance improve quality in health care?" remains frustratingly elusive. Even after widespread implementation of P4P in the United States, international P4P efforts,5 and accumulating research,
much is still unknown about the conditions under which P4P is most
effective and whether P4P has the potential to be a cost-effective
means of improving quality.6 Further, research has little to say about
extensions of P4P “version 1.0,” such as how P4P can improve the value
of care,7 not just quality. While policymakers would like to know
how a specific incentive targeted towards a specific provider can be
expected to impact a specific quality measure, research to date can, at
best, guide policymakers towards general principles of implementation. Nonetheless, the existing literature on P4P can provide some
guidance to policymakers about how we should form expectations as
to the likely effectiveness of future programs, the key design features
that will influence the success of these programs, and how we can
understand the risk and reward trade-off between more highly
powered incentives and the potential for unintended consequences.
In this article, we draw on the recent literature in P4P to
identify key insights that are relevant to national P4P implementation efforts. We focus on how the literature may inform P4P, now
referred to as Value-Based Purchasing, implementations in Medicare: the Hospital Value-Based Purchasing (HVBP) program, the
Physician Value-Based Payment Modifier (PVBPM), and the Medicare Advantage Quality Bonus Program (QBP). We then discuss
additional considerations for the design of these programs and
how research can support these efforts.
2. Summary of research on hospital and physician P4P
By 2004, 37 separate P4P programs had been implemented in
the United States, almost exclusively by private payers in the
outpatient setting,8 and by 2006 more than half of the HMOs used
pay-for-performance9 and most state Medicaid programs were
using some form of P4P.10 National estimates indicate that, by
2007, approximately half of physician practices had been exposed
to P4P from private payers or Medicaid.11,12
A number of influential articles have assessed the extent of P4P
implementation and the evidence on payment-for-quality programs
implemented in the previous two decades.6,13–15 While equivocal,
reviews of the early published studies suggested that financial
incentives for quality could generate improvement under some
circumstances.13 More recent reviews of the literature have
painted a more mixed picture of the overall effectiveness of P4P,
and have also begun to identify conditions under which P4P could
be more effective. A review by Flodgren et al.16 found that financial
incentives were generally effective in improving processes of care
(improving 41/57 measures from 19 studies) but generally ineffective in improving compliance with a pre-specified population
quality target (improvement observed for 5/17 measures from five
studies). Other systematic reviews found insufficient evidence to support (or refute) the use of financial
incentives for quality of care in primary care and for individual
physicians.17,18 This work also suggested that more methodologically rigorous studies were less likely to find positive effects of
P4P.18 In addition, a review by Van Herck et al.19 found that P4P
programs tended to show greater improvement on process measures compared to outcomes, that the positive effect of incentives
was generally greater for initially low performers compared to
higher performers, that it was unclear how the magnitude of
incentives impacted the effectiveness of P4P programs, and that
programs aimed at the individual-provider level and/or team level
generally reported positive results.
However, most of the programs evaluated in these reviews were
small scale P4P experiments, initiated either by a single payer
within a health care market or for a select group of providers. The
extent to which results from these programs would generalize to
national, mandatory implementations of P4P is unclear.
3. Precursors to nationwide implementation of Value-Based
Purchasing in Medicare
For hospitals, the research that is most relevant to Medicare's
implementation of HVBP comes from the Premier Hospital Quality
Incentive Demonstration (HQID). Under this demonstration, 266
hospitals, all subscribers to Premier's “Perspective” hospital performance benchmarking service, agreed to collect and report data
on a set of quality measures and make their performance subject
to financial incentives. Implementation of the HQID occurred
in two phases. Results from initial studies of the phase 1 HQID
implementation appeared promising: two studies reported that
participating hospitals experienced modestly greater rates of
quality improvement for process of care measures compared with
comparison hospitals for each of the incentivized diagnoses
examined in the first three years of the program.20,21 However,
subsequent studies on the HQID raised doubts that the program
improved quality performance.22–24 Even more discouraging
were results from phase 2 of the HQID, which found that
changes in program design did not generate additional quality
improvement.25,26 Detailed re-analyses of the initial data
suggested that the early program success may have been due
to selection of stronger hospitals into the demonstration and that the later slowdown in improvement may have resulted from many
of the incentivized performance measures becoming “topped
out”.26 Other research found that the HQID did not appear
to improve mortality outcomes across both phases of
implementation.27
On the physician side, while numerous P4P programs have
been implemented by private payers and state Medicaid programs,
there are few large-scale programs from which to predict what might occur
under a Medicare-implemented physician Value-Based Purchasing
program. The most substantial private payer implementations of
P4P have been for physician group practices in California including
the PacifiCare Quality Incentive Program (QIP) and the California
Integrated Healthcare Association's (IHA) P4P program. Evaluation
of the QIP found very modest results28 while the IHA initiative was
found to have changed the behavior of the physician organizations, leading to an increased organizational focus on quality
and IT adoption.29 Less optimistically, evidence from a major
P4P initiative implemented by five large commercial payers in
Massachusetts found that the program did not improve performance on a series of incentivized HEDIS quality measures.30
In a few cases, P4P programs have been implemented among
physician practices that focus on improving both cost and quality
performance, using a shared savings incentive model. The Medicare Physician Group Practice (PGP) demonstration offered 10 physician practices shared savings contingent on the groups demonstrating improvement in quality on clinical measures and reductions in beneficiary costs. Another program, the Blue Cross
Alternative Quality Contract (AQC), uses quality bonuses along
with a shared savings model based on global budgets that includes
both upside and downside risk for 11 participating provider
organizations in Massachusetts. Compared to a comparison group
of non-participating practices, quality improved more for AQC
practices for chronic care management, adult preventive care, and
pediatric care in the second year of the program, and the contract has also produced modest reductions in spending trends.31
4. Medicare's Value-Based Purchasing programs
Following the experience of the HQID, the Affordable Care Act
(ACA) enacted Hospital Value-Based Purchasing (HVBP) for all
acute care hospitals in the United States, making HVBP the first national implementation of P4P in the country. Under
HVBP, acute care hospitals – those paid under Medicare's Inpatient
Prospective Payment System (IPPS) – received payment
adjustments starting in October of 2012 based on their performance on 12 clinical process and 9 patient experience measures
from July 1, 2011 through March of 2012. Based on their performance on the clinical process and patient experience domains, hospitals receive a total performance score. In FY 2013, the score is weighted
so that 70% is clinical process and 30% is patient experience.
HVBP is budget neutral, redistributing payment "withholds" equal to 1% of hospitals' payments from Diagnosis Related Groups (DRGs) from "losing" to "winning" hospitals.
Incentive payments in HVBP are based on an approach that
incorporates both quality attainment and quality improvement,
incentivizing hospitals for incremental improvements and foregoing the all-or-nothing threshold design of other programs.
HVBP will also evolve over time. The magnitude of hospital
payments at risk in HVBP increases by 0.25% per year, from 1%
of hospital payments from DRGs in FY 2013 up to 2% in 2017 and
beyond. The measures incentivized in the program will also
change: in FY 2014, the program will add performance measures
for 30-day mortality for heart attack, heart failure, and pneumonia
and in FY 2015 patient safety indicators and the first efficiency
measure (Medicare spending per beneficiary) will be incentivized
as well.32
The PVBPM, the first national P4P initiative for physicians in
fee-for-service Medicare, began quality measurement on January
1, 2013 and will affect payments beginning in 2015. The measure
reporting and feedback components of the PVBPM are based on
the voluntary Physician Quality Reporting Initiative (PQRS), which
was implemented in 2007. For the PVBPM, physicians in practices
of 100 or more eligible professionals will be subject to a value-based payment modifier in 2015, while all physicians in fee-for-service Medicare will be subject to the payment modifier in
2017. In 2013, physicians in all practices must report to the PQRS
in order to avoid a payment penalty in 2015.33 Practices can report
relevant data through several mechanisms, including through a
web interface or through a CMS registry, or request that
CMS calculate the measure performance through administrative
claims. Large practices that are subject to the PVBPM can then
either elect to be subject to quality tiering, which may result in an upward or downward adjustment in payment, or elect to
not be subject to quality tiering, resulting in no payment adjustment. Large practices that do not report to PQRS will face a
1% value modifier penalty, and may face further penalties from
non-compliance with the PQRS. Practices will be evaluated on
measures related to all cause readmission, acute prevention
indicators, and chronic indicators and will also be evaluated on
measures on total per-capita costs, as well as for costs for patients
with specific chronic diseases. Standardized measures for the
cost and quality domains, based only on levels of performance,
will then be determined, and practices electing to face quality
tiering will face payment adjustments based on their combination of quality and cost performance. The details about how the
PVBPM will be expanded to smaller practices have not yet been
determined.34
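As a rough illustration of how standardized quality and cost measures could feed a tiering decision, the sketch below z-scores hypothetical practice composites against the population and maps quality/cost tiers to payment adjustments. The ±1 standard deviation cutoffs and the adjustment grid are hypothetical assumptions, not the CMS rule.

```python
import statistics

def standardize(values):
    """Z-score each practice's composite against the population."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def tier(z, cutoff=1.0):
    """Classify a standardized score as high, average, or low."""
    if z > cutoff:
        return "high"
    if z < -cutoff:
        return "low"
    return "average"

# Hypothetical grid: (quality tier, cost tier) -> payment adjustment (%).
ADJUSTMENT = {
    ("high", "low"): +2.0,      # high quality, low cost: bonus
    ("high", "average"): +1.0,
    ("average", "average"): 0.0,
    ("low", "high"): -1.0,      # low quality, high cost: penalty
}

quality_z = standardize([0.62, 0.75, 0.91, 0.80, 0.70])
```

Standardizing "relative to the population of providers" means a practice's tier, and hence its adjustment, depends on how everyone else performs, not on a fixed target.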
The ACA also initiated P4P in Medicare Advantage plans under
the Quality Bonus Program (QBP). Starting in 2015, Medicare
Advantage plans can earn substantial quality bonus payments
based on their performance on measures of clinical performance,
patient experience, patient-reported outcomes, and customer
complaints and other service indicators. These plans receive a star
rating based on their performance, and plans that receive 4 stars
or higher (out of 5) are eligible for bonus payments.35 Medicare is
currently running a three-year demonstration of this program
(2012–2014), with bonuses paid to plans that earn 3 stars or
higher.
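The bonus eligibility rule described above reduces to a single star threshold that differs between the statutory program and the demonstration; this one-line sketch states it directly:

```python
# Bonus eligibility for Medicare Advantage plans: 4+ stars (of 5) under
# the statutory QBP; 3+ stars under the 2012-2014 demonstration.

def bonus_eligible(star_rating, demonstration=False):
    """Return True if the plan's star rating qualifies for a bonus."""
    threshold = 3.0 if demonstration else 4.0
    return star_rating >= threshold
```

A 3.5-star plan, for example, qualifies under the demonstration but would not under the statutory rules.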
5. Comparison of Medicare's direction in Value-Based
Purchasing and best practices
After reviewing the literature, we derived the following seven
criteria reflecting “best practice” in the design of P4P.
1. Choose measures that have room for improvement.
2. Promote widespread awareness of the program.
3. Coordinate program design (performance measures and payout
criteria) across payers (public and private payers, different public
payer initiatives).
4. Incentivize both quality attainment and quality improvement.
5. Adjust programs dynamically to recalibrate measures and
payment thresholds.
6. Pay incentives that are sufficiently large to motivate a behavioral response.
7. Provide technical assistance to participating providers.
The process we used to arrive at the seven best practice criteria
is found in Appendix A. We mapped these criteria against the
ongoing and planned P4P program design features in Medicare to
identify the extent to which the current design of Medicare's P4P
programs are aligned with best practice in P4P. We assessed the
degree of alignment between HVBP, the PVBPM, and the Medicare
Advantage QBP and our identified best practices, determining
whether each program has weak (i.e., little or no alignment
between program design and best practice), moderate (i.e., some,
but not complete, alignment between program design and best
practice), strong (i.e., complete alignment between program
design and best practice), or unclear alignment (i.e., program
design is not sufficiently detailed to determine alignment or
degree of alignment cannot be determined) with each best
practice (Table 1).
The current design of HVBP is moderately or strongly aligned
with criteria #1 (measures have room for improvement), #2
(promote widespread awareness of the program), #4 (incentivize
quality attainment and quality improvement), #5 (adjust programs dynamically), and #7 (provide technical assistance to
participants); however, the program does not appear to be aligned
with criterion #3 (coordinate program design across payers) and it
is uncertain whether the design of HVBP is aligned with criterion
#6 (pay sufficiently large incentives), particularly in light of
recently published data from CMS indicating that approximately
$150 million will be redistributed based on payment updates in
the first year of HVBP, resulting in an average absolute impact of
about $50 thousand per hospital.36 Also, while its dual criteria
of achievement and improvement are noted as a best practice in
P4P design, this approach is similar to the design of Phase 2 of the
HQID, which failed to maintain quality improvement among
participating hospitals and also did not generate quality improvement for initially lower performing hospitals.26 In addition, the
MassHealth hospital P4P program that based its incentive design
specifically on that of HVBP also failed to improve quality.37
For the PVBPM, the details of implementation for all physicians
remain to be determined. However, for large practices, the PVBPM
appears to be strongly aligned with criteria #1 (measures have
room for improvement) and #5 (adjust programs dynamically)
and moderately aligned with criterion #2 (promote widespread
awareness of the program). The PVBPM appears to be weakly
aligned with criteria #3 (coordinate program design across payers),
#4 (incentivize quality attainment and quality improvement), and #7
(provide technical assistance to participants), and it is uncertain
whether the PVBPM is aligned with criterion #6 (pay sufficiently
large incentives).
Beyond these criteria, a number of important considerations remain in the implementation of the PVBPM. First, because it appears
Table 1
Alignment between pay-for-performance best practices and Medicare's pay-for-performance initiatives.
Programs: Hospital Value-Based Purchasing (HVBP); Physician Value-Based Payment Modifier (PVBPM); Medicare Advantage Quality Bonus Program (MA QBP).

1. Choose measures that have room for improvement
HVBP: Moderate. Clinical process measure selection was based on measures not being "topped off." Future measures, such as patient safety indicators, have low prevalence and little room for improvement for many hospitals.
PVBPM: Strong. Existing measures reported for PQRS generally have room for improvement.33
MA QBP: Strong. Clinical process (HEDIS), patient satisfaction, and patient-reported outcomes measures have room for improvement.48

2. Promote widespread awareness of the program
HVBP: Strong. Quality Improvement Organizations (QIOs) are intended to work with participating hospitals and inform them about the program rules.
PVBPM: Moderate. Promotion of program through specialty societies and other means. Relatively small number of large practices makes promotion of initial implementation easier.
MA QBP: Strong. Large bonuses at stake in the program have created widespread awareness. QIOs are also intended to engage plans.

3. Coordinate program design (performance measures and payout criteria) across payers
HVBP: Weak. There is little to no coordination across private payers or other Medicare programs currently, although CMS is beginning to take steps to coordinate across its own programs and to harmonize with private plans.
PVBPM: Weak. Same as Hospital Value-Based Purchasing.
MA QBP: Weak. Same as Hospital Value-Based Purchasing. However, historical use of HEDIS measures in managed care profiling creates a common measure set for P4P across payers.

4. Incentivize both quality attainment and quality improvement
HVBP: Strong. HVBP gives equal weight to quality achievement and improvement.
PVBPM: Weak. Incentivizes only achievement.
MA QBP: Strong. Plans receive "stars" based on quality performance, one of which is based on quality improvement.

5. Adjust programs dynamically to recalibrate measures and payment thresholds
HVBP: Moderate. CMS has some ability to change performance measures in future years. However, the regulatory process introduces long lag times in changing the program design.
PVBPM: Strong. Measure performance is standardized relative to the population of providers.
MA QBP: Moderate. Same as Hospital Value-Based Purchasing.

6. Pay incentives that are sufficiently large to motivate a behavioral response
HVBP: Unclear. Incentive payments are relatively small in the first program year and increase gradually over time. Program builds on the high-profile Hospital Compare public reporting program, which may reduce the need for large incentives.
PVBPM: Unclear. Eligible practices are subject to substantial penalties (up to 2.5%) for not participating. Incentives for performing better on value tiering appear small.
MA QBP: Strong. $8 billion in payouts during the demonstration.48

7. Provide technical assistance to participating providers
HVBP: Moderate. Quality Improvement Organizations are intended to assist providers. QIOs, historically, have stronger relationships with hospitals than other providers.
PVBPM: Weak. Does not appear to be an explicit mechanism to provide technical assistance to providers. There exists the potential for QIOs to assist physicians and practices.
MA QBP: Moderate. QIOs are intended to assist providers.

Number of best practice criteria with moderate or strong alignment: Hospital Value-Based Purchasing—5; Physician Value-Based Payment Modifier—3; Medicare Advantage Quality Bonus Program—6.
that the PVBPM applies only to physicians in practices of 100 or
more in its first year of implementation, this severely limits
participation at the outset of the program. Only 6% of US
physicians work in practices of 51 or greater,38 and far fewer
practice in groups of 100 or more. Even if program administration
is successfully initiated for these large practices, implementation
issues will differ substantially for smaller practices, and the
lessons learned from the initial stage of implementation may not
be applicable across all practices. Second, when rollout does occur
for smaller practices, the small sample sizes will reduce the
reliability of the performance estimates on individual measures.
Improving the reliability of performance estimates will be important to guard against paying on the basis of noisy data. Third, it
remains to be seen if and how a shared savings model would be
implemented for the PVBPM: What benchmark will be used to
determine savings (e.g., savings relative to a practice's historical
performance or relative to comparable practices)? Will PVBPM
reward quality achievement and improvement? Also, what plans
are in place to allow the design of the PVBPM to evolve over time,
for instance to set new thresholds for payment, retire old quality
measures, and integrate new measures? Finally, physicians participating in Medicare Shared Savings Accountable Care Organizations (ACOs), Pioneer ACOs, or the Comprehensive Primary Care
Initiative will not initially be subject to the PVBPM, and it is
unclear how the PVBPM will integrate with ACO incentives as both
programs expand.
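The small-numbers concern raised above can be quantified with the usual signal-to-noise definition of measure reliability; the variance figures below are hypothetical.

```python
# reliability = between-practice variance /
#               (between-practice variance + within-practice variance / n),
# where n is the number of a practice's patients eligible for the measure.

def reliability(var_between, var_within, n):
    """Share of observed score variation that reflects true practice
    differences rather than sampling noise."""
    return var_between / (var_between + var_within / n)

# Hypothetical measure with modest practice-level signal:
small_panel = reliability(var_between=0.01, var_within=0.25, n=25)
large_panel = reliability(var_between=0.01, var_within=0.25, n=500)
```

With 25 eligible patients the estimate is half noise (reliability 0.5), while with 500 it exceeds 0.95, which is why extending the PVBPM to small practices raises the risk of paying on the basis of noisy data.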
The design of the Medicare Advantage QBP has the strongest
alignment with the best practice criteria. The QBP is moderately
or strongly aligned with criteria #1 (measures have room for
improvement), #2 (promote widespread awareness of the program), #4 (incentivize quality attainment and quality improvement), #5 (adjust programs dynamically), #6 (pay sufficiently
large incentives), and #7 (provide technical assistance to participants), and does not appear to be aligned with #3 (coordinate
program design across payers). While the Medicare Advantage
QBP program will reward both quality achievement and improvement, it will be important to monitor how plans attempt to meet
these objectives as the threshold for receipt of the bonus payment
is raised in 2015 so as to avoid plans selecting patients on the basis
of how likely they are to meet criteria for incentive payments,
further contributing to historical selection issues in Medicare
Advantage. Another issue is how the Medicare Advantage QBP
measures will align with those in the PVBPM. Medicare's entry
into P4P was supposed to overcome the problem of numerous
small-bore P4P programs competing for providers' attention, but maintaining different P4P programs for physicians who care for both Medicare fee-for-service and Medicare Advantage patients is contrary to
this objective. Coordination across the various federal and private
payer P4P programs is critical so as to minimize burden on
providers. CMS has begun to take action to harmonize these
efforts, but it remains a work in progress.39 An additional challenge will be for CMS to synchronize performance measures
between the Medicare Advantage program and those reported
through the quality reporting system required by state insurance
exchanges in the ACA. As insurers, Medicare Advantage plans must
also find effective ways to coordinate efforts with providers to
improve quality performance, as few plans are vertically integrated with providers.
6. Considerations for the future of Value-Based Purchasing in
Medicare
The ACA has institutionalized P4P for hospitals, physicians, and private plans, and will advance testing of P4P in other settings,
such as skilled nursing and home health. The attempt to improve
value in how Medicare purchases services from health care providers is a welcome development. Nonetheless, designing these
programs to achieve improvements in value remains a challenge,
one that must balance the interests of payers and providers
to create meaningful and fair incentives (i.e., high reliability,
low misclassification, appropriate setting of any thresholds and
weighting of measures, and selection of clinically meaningful
measures) that appeal to the values of the provider community
and achieve buy-in.
Our assessment of the alignment between P4P programs in
Medicare and best practices in P4P program design varies across
the programs we examined: HVBP is moderately or strongly
aligned with five of the seven best practices, the PVBPM is aligned
with three of the best practices, and the Medicare Advantage QBP
is aligned with six of the best practices. These assessments
involved judgment calls that could have been made otherwise: for instance, we indicated a "moderate" degree of alignment between Medicare's programs and criterion #5 (adjust
programs dynamically), although, depending on the program, it
may take several years for performance measures and other design
features to be modified. Additionally, the PVBPM has not yet been
fully defined, leading to uncertainty about the alignment between many of its design features and best practice in P4P. Nonetheless,
our analysis provides an overview of the concordance between
Medicare's implementations of P4P and how existing evidence
suggests programs should be structured, helping to identify
elements of these programs that may merit redesign in the future.
What remains uncertain, however, is which of the identified
P4P design elements are necessary, which are sufficient, and
which are optional for a program to successfully motivate quality
improvement. Further, the scope of our best practice criteria is quite limited, reflecting the fact that the literature is lacking in
many areas that could guide design, such as the best approach for funding performance bonuses and incentivizing
reductions in the rate of cost growth in these programs. The initial
hope of P4P was that higher quality care could result in healthier
patients and lower costs, but, with programs focusing largely on measures of underuse, this hope has failed to materialize in the first generation of P4P programs.23,40 Recognizing this, many of the second
generation programs are explicitly incorporating costs of care into
payout criteria. One approach, embodied in the Medicare ACO
program, is to pay providers bonuses for quality from a pool of
health care cost “savings” relative to some benchmark. If there are
no savings, there are no bonuses. Alternatively, as implemented in
HVBP, bonus payments for winning hospitals are funded from
revenue reductions from losing hospitals, and cost reductions are
directly incentivized as a performance measure: the Medicare
Spending per Beneficiary measure will be incentivized starting in
FY 2015. It is unclear if either of these approaches will encourage
value improvement in Medicare.
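The difference between the two funding approaches can be sketched directly; the dollar figures and the 50% shared-savings rate are hypothetical.

```python
# ACO-style pool: bonuses exist only if spending beats the benchmark.
def aco_bonus_pool(benchmark, actual_spending, shared_rate=0.5):
    """Shared savings: if there are no savings, there are no bonuses."""
    savings = benchmark - actual_spending
    return max(0.0, savings) * shared_rate

# HVBP-style pool: a fixed withhold is redistributed among hospitals
# regardless of whether aggregate costs fell.
def hvbp_pool(drg_payments, withhold=0.01):
    """Budget-neutral withhold: the pool is fixed by payment volume."""
    return withhold * sum(drg_payments)
```

The contrast is that the ACO pool is contingent on realized savings, while the HVBP pool is guaranteed to be paid out, only its distribution among hospitals varies.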
Research has also just begun to evaluate the relationship
between P4P design features and unintended consequences. On
one hand, the relatively small magnitude of financial incentives in
many P4P programs has been noted as one of the reasons why
these programs have not generated substantial effects.13 On the
other hand, larger incentives, in the context of budget neutrality,
mean that more money will be redistributed, potentially hurting
certain classes of providers and the patients they serve, and
potentially resulting in undesirable behaviors from providers (as
seen in the recent cheating scandals in high stakes educational
testing). One recent study found that incentive payments in the
HQID accrued disproportionately to hospitals that cared for the
least disadvantaged patients when payouts were based on high
quality attainment alone, but that the change in incentives to
reward improvement led to greater payments accruing to hospitals that cared for more disadvantaged patients.41 While related
research suggests that the HQID did not have a direct effect on
health care disparities by leading to the avoidance of minority
patients,42 research on the distribution of bonus payments suggests that design of P4P programs can affect financial performance
across the gradient of socioeconomic disadvantage. This finding
has also been observed in P4P programs implemented for physician practices.43 The redistribution of income across providers may
have downstream implications for disparities in care, particularly
over the longer term.
Another emerging issue in the design of P4P programs concerns the pros and cons of complexity. The current design of HVBP,
with incentives for both quality attainment and improvement
based on a continuous points system using a linear exchange
function, is conceptually appealing but complex. While proposals
have been developed to modify the existing incentive criteria to
encourage equity in hospital P4P,44 it is possible that the complexity of the current design does not give hospitals specific quality
targets and, as a result, may be a barrier to quality improvement
activities. Alternatively, the Quality and Outcomes Framework in
the United Kingdom uses a very simple design in which providers
are rewarded for every “right” service they provide and are
incentivized through a linear exchange function with a minimum
and maximum score. The challenge with implementing such a
design in Medicare is that budget neutrality, which is currently
embedded in Value-Based Purchasing, requires that either the
performance criteria or the magnitude of the incentive must
remain flexible, and thus unknown, to providers during the
evaluation period.
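A QOF-style linear exchange function with fixed minimum and maximum thresholds can be sketched as follows. The threshold values and payment amount below are invented for illustration, not the actual QOF parameters:

```python
def linear_exchange_payment(score, minimum, maximum, max_payment):
    """Linear exchange function with fixed thresholds: nothing below the
    minimum, full payment at or above the maximum, linear in between."""
    if score <= minimum:
        return 0.0
    if score >= maximum:
        return float(max_payment)
    return max_payment * (score - minimum) / (maximum - minimum)

# Hypothetical thresholds: a provider knows in advance that a score of
# 0.25 or below earns nothing and 0.90 or above earns the full amount.
print(linear_exchange_payment(0.575, 0.25, 0.90, 1000))
```

The appeal of this design is that both thresholds and the payment magnitude are fixed and known to providers in advance, which is exactly what budget neutrality in Medicare's Value-Based Purchasing makes difficult.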
Conceptually, much of the complexity associated with the
design of P4P stems from the fact that a single set of incentives is
intended to maximize quality improvement while minimizing unintended consequences across heterogeneous providers that differ in their levels of quality
attainment, their trajectories of improvement, their patient populations,
A.M. Ryan, C.L. Damberg / Healthcare 1 (2013) 42–49
and their organizational and financial capacity for continued
change.45 An alternative design strategy would be to create a
relatively simple design (e.g., a linear exchange function based on
quality attainment alone in which the median is the threshold for
a bonus or penalty) but implement the design among homogenous
competition pools, defined by region, levels of quality, rural or
urban location, size, or other criteria. If providers in the same
competition pool had relatively similar levels of quality and the
same ability to improve, then a simple attainment-only design
would likely be sufficient to meet the program aims while
avoiding the complexity of a one-size-fits-all design.
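A minimal sketch of this alternative follows, under stated assumptions: the pool definitions, provider names, scores, and the ±1% payment adjustment are all hypothetical, chosen only to show how a median threshold behaves within versus across pools.

```python
from statistics import median

def pooled_attainment_adjustments(pools, incentive=0.01):
    """Attainment-only design applied within homogeneous competition
    pools: providers above their pool's median quality score receive a
    bonus, those below receive a penalty, and ties get no adjustment."""
    adjustments = {}
    for pool_scores in pools.values():
        m = median(pool_scores.values())
        for provider, score in pool_scores.items():
            if score > m:
                adjustments[provider] = incentive
            elif score < m:
                adjustments[provider] = -incentive
            else:
                adjustments[provider] = 0.0
    return adjustments

# Hypothetical pools defined by rural vs. urban location.
pools = {
    "rural": {"R1": 0.60, "R2": 0.70, "R3": 0.80},
    "urban": {"U1": 0.85, "U2": 0.90, "U3": 0.95},
}
adj = pooled_attainment_adjustments(pools)
# R3 earns a bonus within the rural pool, although its 0.80 score would
# fall below the median (0.825) of a single combined national pool.
```

The design choice here is that competition pools, not a national distribution, define the benchmark, so a rural hospital competes only against peers with similar quality levels and capacity to improve.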
The core elements of HVBP have been statutorily defined, and,
notwithstanding major patient safety concerns that are tied to
HVBP, CMS has limited flexibility to alter the program design in
the short term. The primary area in which CMS can modify
the design of HVBP is in the choice of performance measures,
arguably the most crucial element of P4P programs. Research
should therefore focus on impact analysis, and whether different
measure domains, and perhaps individual measures, may be more
responsive to the program. In addition, research should pay special
attention to the potential unintended consequences of HVBP,
particularly the potential for revenue distribution away from
hospitals caring for disadvantaged patients and the potential for
hospitals to avoid caring for certain classes of patients as HVBP
begins to reward performance on outcome measures.
For the PVBPM, the design is still unfolding and can be
informed by the evidence to date from physician P4P program
implementations in the private sector. The major issues going
forward are deciding which types of physician practices to target,
how large incentive payments should be, what criteria are used for
incentive payments, and how to align the PVBPM with Medicare
Advantage QBP. In Medicare Advantage, the large magnitude of
incentive payments and the reasonable incentive design emphasizing both quality achievement and quality improvement are
good signs. An important issue for this program will be avoiding
patient cherry-picking among plans seeking to increase their likelihood of
obtaining the quality bonus payments.
Moving forward with Value-Based Purchasing in Medicare, an
approach more appealing than having different programs for
different providers (hospitals, physicians, skilled nursing) would
be to develop integrated programs across providers that share
accountability for patients. A single integrated Value-Based Purchasing program covering all domains of care implemented for
Accountable Care Organizations may be the best way to simplify
Value-Based Purchasing and integrate across care settings to
improve value. However, this presumes that ACOs will diffuse
broadly enough to substantially impact care nationally, which
remains to be seen. In the meantime, policymakers need to look
to ways to strengthen existing programs to achieve the goal of
value improvement in Medicare.
Finally, continuous monitoring and evaluation of Value-Based
Purchasing programs in Medicare is crucial as these programs
evolve. There is a large social science literature that challenges the
use of financial incentives to change behavior, suggesting that
financial rewards for intrinsically valuable activities can be detrimental.46 To the extent possible, research should focus on how to
modify Value-Based Purchasing programs within their statutory
constraints to achieve best practice in design and align the
programs with emerging behavioral economics literature around
financial incentives and behavior.47 This research has the potential
to result in program designs that are better aligned with the
objectives of CMS to improve value. While the new generation of
P4P experiments may yield stronger evidence than previous
programs, Value-Based Purchasing is only one of many possible
levers to use to improve value in health care. Continued emphasis
on broader payment reform is critical as we progress towards a
better understanding of how to effectively implement value-based
payment reform in Medicare.
Conflict of interest
The authors have no conflicts of interest regarding this work.
Funding
Cheryl Damberg currently is funded to conduct work on two
federal contracts, one for the Assistant Secretary for Planning and
Evaluation in the US Department of Health and Human Services
(TO 12-233-SOL-00418) on “Value-based Purchasing: Measuring
Success” and the other for the Centers for Medicare and Medicaid
Services (HHSM-500-2005-00028I, TO # HHSM-500-T0005) on
“Analyses Related to Medicare Advantage Plan Ratings for Quality
Bonus Payments.” Her work on this paper was not supported by
either of these projects. For Andrew Ryan, this project was
supported by a grant from the Agency for Healthcare Research
and Quality (K01 HS018546-01). Andrew Ryan is also funded by a
grant from the Robert Wood Johnson Foundation to study the
effects of Hospital Value-Based Purchasing on quality of care in its
first period of implementation. The views expressed in this paper
are solely those of the authors. The authors would like to
acknowledge Jordan VanLare and William Borden for helpful
comments on the manuscript.
Appendix A. Identifying best practices for the design of P4P
programs
Various attempts have been made to identify the “best practices” for the design of P4P in health care.49 Probably the most
thorough attempt to identify P4P design elements that are
supported by the literature is a recent paper by Van Herck
et al.19 Based on a review of 128 studies of the effects of financial
incentives on quality in health care, the authors identify a number
of P4P design elements that appear to be supported by the
literature: choose measures that have room for improvement
(i.e., are not topped out); choose process or intermediate outcome
measures rather than longer term outcomes; engage providers
and stakeholders throughout implementation; coordinate program design across payers; incentivize both quality attainment
and quality improvement; and make incentive payments at the
individual or team level rather than higher organizational levels
(e.g. hospitals or physician practices). The authors further note
several P4P design features that are supported by theory, but for
which there is inconclusive evidence supporting their use in
health care: adjust programs dynamically to recalibrate measures
and payment thresholds; pay incentives that are sufficiently large
to motivate a behavioral response; and provide technical assistance to participating providers.50
Using the Van Herck et al. review as a starting point, we make
several modifications to these criteria based both on studies that
have been published since the review as well as our own interpretation of the evidence. Our assessment of best practices considers the
strength of the evidence both for design elements that lead to
stronger incentives to improve quality and for elements that lead
to fairness in the programs (i.e., the elements that do not result in
systematic advantages or disadvantages across different types of
participating providers).
First, we eliminate the criterion “Choose process or intermediate
outcome measures rather than longer term outcomes.” Research
from social psychology and behavioral economics suggests that
financial incentives may be more effective at meeting simple performance criteria (e.g. process improvement) compared to more
complex performance criteria (e.g. outcome improvement),51 and
process improvement is certainly easier than outcome improvement,
which may explain the apparent advantage of incentivizing process
measures or intermediate outcomes in P4P programs. However,
recent literature on the process and outcome relationship suggests
that incentivizing process improvement, by itself, is insufficient to
improve outcome performance.27,52–54 This calls into question the clinical
importance of process improvement by itself, leading us to eliminate
this best practice criterion. Second, we modify the criterion “Engage
providers and stakeholders throughout implementation” to “Promote
widespread awareness of the program.” While evidence suggests that
provider awareness of incentive programs is critical to their success,
provider engagement – particularly related to the choice of measures
and performance criteria – may be more reflective of provider
capture of P4P programs, rather than best practice. For example,
physicians in the United Kingdom have succeeded in avoiding
increases to maximum thresholds for incentive payments, even as
the vast majority of physicians have exceeded these thresholds and
have no financial incentives for additional incremental improvements in quality.55
Finally, we eliminate the criterion “Make incentive payments at
the individual or team level rather than higher organizational
levels,” because we feel that this criterion is specific to the type of
performance measure, and not a general concept. For instance,
for simple process measures over which a physician may have a
large degree of control (e.g. measuring HbA1c levels for diabetic
patients), incentives may be optimally effective at the physician
level, whereas incentives for measures involving more complex
processes and coordination across providers (e.g. reducing readmissions) are unlikely to be optimally effective at the physician level.
While the criterion “Incentivize both quality attainment and
quality improvement” may in fact lead to stronger incentives for
improvement, it was not supported by the literature synthesized
by the Van Herck et al. review, and recent evaluations of P4P
programs that have incentivized both quality attainment and
improvement have not found this approach to improve quality.25,26,37 However, we chose to maintain this criterion because
research has found that this design feature leads to a more equal
distribution of payouts across providers with initially higher
and lower levels of quality, and across those that care for patients with
greater or lesser socioeconomic disadvantage.41
References
1 Arrow K. Uncertainty and the welfare economics of medical care. American
Economic Review. 1963;53(5):941–969.
2 Committee on Quality of Health Care in America, Institute of Medicine. Crossing
the Quality Chasm: A New Health System for the 21st Century. The National
Academies Press; 2001.
3 Ryan A, Blustein J. Making the best of hospital pay for performance. New
England Journal of Medicine. 2012;366(17):1557–1559.
4 Centers for Medicare and Medicaid Services. Report to Congress: Plan to
Implement a Medicare Hospital Value-Based Purchasing Program. Washington,
DC: US Department of Health and Human Services; November 21, 2007.
5 Doran T, Fullwood C, Gravelle H, et al. Pay-for-performance programs in family
practices in the United Kingdom. New England Journal of Medicine. 2006;355(4):
375–384.
6 Town R, Kane R, Johnson P, Butler M. Economic incentives and physicians’
delivery of preventive care: a systematic review. American Journal of Preventive
Medicine. 2005;28(2):234–240.
7 Tompkins CP, Higgins AR, Ritter GA. Measuring outcomes and efficiency in
Medicare value-based purchasing. Health Affairs (Millwood). 2009;28(2):
w251–w261.
8 Rosenthal MB, Fernandopulle R, Song HR, Landon BE. Paying for quality:
providers’ incentives for quality improvement. Health Affairs. 2004;23(2):
127–141.
9 Rosenthal MB, Landon BE, Normand SL, Frank RG, Epstein AM. Pay for performance in commercial HMOs. New England Journal of Medicine. 2006;355(18):
1895–1902.
10 Kuhmerker K, Hartman T. Pay-for-performance in State Medicaid Programs: a
survey of State Medicaid Directors and Programs. IPRO; April 2007: 1018.
11 Robinson JC, Shortell SM, Rittenhouse DR, Fernandes-Taylor S, Gillies RR,
Casalino LP. Quality-based payment for medical groups and individual physicians. Inquiry. 2009;46(2):172–181.
12 Alexander JA, Maeng D, Casalino LP, Rittenhouse D. Use of care management
practices in small- and medium-sized physician groups: do public reporting of
physician quality and financial incentives matter? Health Serv Res. 2013;
48(2 Pt 1):376–397.
13 Petersen LA, Woodard LD, Urech T, Daw C, Sookanan S. Does pay-for-performance improve the quality of health care? Annals of Internal Medicine.
2006;145(4):265–272.
14 Rosenthal MB, Frank RG. What is the empirical basis for paying for quality in
medical care? Medical Care Research and Review. 2006;63(2):135–157.
15 Mehrotra A, Damberg CL, Sorbero ME, Teleki SS. Pay for performance in the
hospital setting: what is the state of the evidence? American Journal of Medical
Quality: The Official Journal of the American College of Medical Quality. 2009;24(1):
19–28.
16 Flodgren G, Eccles MP, Shepperd S, Scott A, Parmelli E, Beyer FR. An overview of
reviews evaluating the effectiveness of financial incentives in changing healthcare professional behaviours and patient outcomes. Cochrane Database of
Systematic Reviews. 2011(7):CD009255.
17 Scott A, Sivey P, Ait Ouakrim D, et al. The effect of financial incentives on the
quality of health care provided by primary care physicians. Cochrane Database of
Systematic Reviews. 2011(9):CD008451.
18 Houle SK, McAlister FA, Jackevicius CA, Chuck AW, Tsuyuki RT. Does
performance-based remuneration for individual health care practitioners affect
patient care? A systematic review. Annals of Internal Medicine. 2012;157(12):
889–899.
19 Van Herck P, De Smedt D, Annemans L, Remmen R, Rosenthal MB, Sermeus W.
Systematic review: effects, design choices, and context of pay-for-performance
in health care. BMC Health Services Research. 2010;10:247.
20 Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for
performance in hospital quality improvement. New England Journal of Medicine.
2007;356(5):486–496.
21 Grossbart SR. What's the return? Assessing the effect of pay-for-performance
initiatives on the quality of care delivery. Medical Care Research and Review.
2006;63(1 Suppl):29S–48S.
22 Glickman SW, Ou FS, DeLong ER, et al. Pay for performance, quality of care, and
outcomes in acute myocardial infarction. JAMA. 2007;297(21):2373–2380.
23 Ryan AM. Effects of the Premier Hospital Quality Incentive Demonstration on
Medicare patient mortality and cost. Health Services Research. 2009;44(3):
821–842.
24 Bhattacharyya T, Freiberg AA, Mehta P, Katz JN, Ferris T. Measuring the report
card: the validity of pay-for-performance metrics in orthopedic surgery. Health
Affairs (Millwood). 2009;28(2):526–532.
25 Werner RM, Kolstad JT, Stuart EA, Polsky D. The effect of pay-for-performance in
hospitals: lessons for quality improvement. Health Affairs (Millwood). 2011;30(4):
690–698.
26 Ryan AM, Blustein J, Casalino LP. Medicare's flagship test of pay-for-performance did not spur more rapid quality improvement among low-performing hospitals. Health Affairs (Millwood). 2012;31(4):797–805.
27 Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of Premier pay for
performance on patient outcomes. New England Journal of Medicine. 2012;366
(17):1606–1615.
28 Rosenthal MB, Frank RG, Li Z, Epstein AM. Early experience with pay-for-performance: from concept to practice. JAMA. 2005;294(14):1788–1793.
29 Damberg CL, Raube K, Teleki SS, dela Cruz E. Taking stock of pay-for-performance: a candid assessment from the front lines. Health Affairs. 2009;28(2):
517–525.
30 Pearson SD, Schneider EC, Kleinman KP, Coltin KL, Singer JA. The impact of pay-for-performance on health care quality in Massachusetts, 2001–2003. Health
Affairs. 2008;27(4):1167–1176.
31 Song Z, Safran DG, Landon BE, et al. The ‘Alternative Quality Contract,’ based on
a global budget, lowered medical spending and improved quality. Health Affairs.
2012;31(8):1885–1894.
32 Centers for Medicare & Medicaid Services. Medicare program; hospital inpatient
value-based purchasing program. Final rule. Federal Register. 2011;76(88):
26490–26547.
33 cms.gov. Physician quality reporting system: 2010 reporting experience. 〈http://
www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/
PQRS/index.html〉; 2013 Accessed 27.02.13.
34 Medicare Program; Revisions to payment policies under the physician fee schedule,
DME face-to-face encounters, elimination of the requirement for termination of
non-random prepayment complex medical review and other revisions to Part B for
CY 2013. In: Centers for Medicare & Medicaid Services, ed. 77 FR 44721. vol. 42
CFR Parts 410, 414, 415, 421, 423, 425, 486, and 495: Federal Register; 2012:
68891–69380.
35 Cotton P, Datu B, Thomas S. Early evidence suggests Medicare Advantage pay for
performance may be getting results. Health Affairs Blog; 2012 [healthaffairs.org].
36 Ryan A, Damberg C. Table 16. Actual Hospital Value-Based Purchasing Program (VBP) Adjustment Factors for FY 2013. Washington, DC:
Centers for Medicare & Medicaid Services; 2013.
37 Ryan AM, Blustein J. The effect of the MassHealth hospital pay-for-performance
program on quality. Health Services Research. 2011;46(3):712–728.
38 Boukus ER, Cassil A, O'Malley AS. A Snapshot of U.S. Physicians: Key Findings from
the 2008 Health Tracking Physician Survey. vol. 35. Center for Studying Health
System Change; 2009.
39 Damberg C. Efforts to Reform Physician Payment: Tying Payment to Performance.
Santa Monica, CA: RAND Corporation; 2013.
40 Kruse GB, Polsky D, Stuart EA, Werner RM. The impact of hospital pay-for-performance on hospital and Medicare costs. Health Services Research. 2012;47(6):
2118–2136.
41 Ryan AM, Blustein J, Doran T, Michelow MD, Casalino LP. The effect of Phase 2 of
the Premier Hospital Quality Incentive Demonstration on incentive payments to
hospitals caring for disadvantaged patients. Health Services Research. 2012;47(4):
1418–1436.
42 Ryan AM. Has pay-for-performance decreased access for minority patients?
Health Services Research. 2010;45(1):6–23.
43 Chien AT, Wroblewski K, Damberg C, et al. Do physician organizations located
in lower socioeconomic status areas score lower on pay-for-performance
measures? Journal of General Internal Medicine. 2012;27(5):548–554.
44 Borden WB, Blustein J. Valuing improvement in value-based purchasing.
Circulation: Cardiovascular Quality and Outcomes. 2012;5(2):163–170.
45 Blustein J, Borden WB, Valentine M. Hospital performance, the local economy,
and the local workforce: findings from a US National Longitudinal Study. PLoS
Medicine. 2010;7(6):e1000297.
46 Deci EL, Koestner R, Ryan RM. A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological
Bulletin. 1999;125(6):627–668 [discussion 692–700].
47 Woolhandler S, Ariely D. Will pay for performance backfire? Insights from
behavioral economics. Health Affairs Blog; 2012 [healthaffairs.org].
48 The Henry J. Kaiser Family Foundation. Medicare Advantage Plan Star Ratings and
Bonus Payments in 2012. Menlo Park, CA: Kaiser Family Foundation; 2011.
49 Institute of Medicine. Rewarding Provider Performance: Aligning Incentives in
Medicare. Washington, DC: The National Academies Press; 2006.
50 Damberg C, Sorbero ME, Mehrotra A, Teleki SS, Lovejoy S, Bradley L. An
Environmental Scan of Pay for Performance in the Hospital Setting: Final Report.
Washington, DC: Department of Health and Human Services; 2007.
51 Osterloh M, Frey B. Does pay for performance really motivate employees? In:
Business Performance Measurement: Unifying Theory and Integrating Practice. 2nd
ed. Cambridge University Press; 433–448.
52 Werner RM, Dudley RA. Medicare's new hospital value-based purchasing
program is likely to have only a small impact on hospital payments. Health
Affairs (Millwood). 2012;31:1932–1940.
53 Ryan AM, Burgess Jr. JF, Tompkins CP, Wallack SS. The relationship between
Medicare's process of care quality measures and mortality. Inquiry. 2009;46:
274–290.
54 Ryan AM, Nallamothu BK, Dimick JB. Medicare's public reporting initiative
on hospital quality had modest or no impact on mortality from three key
conditions. Health Affairs (Millwood). 2012;31:585–592.
55 British Medical Association. General Medical Services—Contractual Changes
2013/14: BMA GPC Response. British Medical Association; 2013.
Healthcare 1 (2013) 50–51
Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi
Interview with Mark B. McClellan, MD, Ph.D.
Interviewed by Brian W. Powers
A medical doctor and economist, Mark McClellan works on
promoting high-quality, innovative and affordable health care.
Once commissioner of the Food and Drug Administration (FDA)
and administrator of the Centers for Medicare & Medicaid Services
(CMS), Dr. McClellan is currently Director of the Engelberg Center for Health
Care Reform at the Brookings Institution, where he holds the Leonard D. Schaeffer
Chair in Health Policy Studies. McClellan holds an MD
from the Harvard University–Massachusetts Institute of Technology
(MIT) Division of Health Sciences and Technology, a Ph.D. in
economics from MIT, an MPA from Harvard University, and a BA
from the University of Texas at Austin. He completed his residency
training in internal medicine at Boston's Brigham and Women's
Hospital, is board-certified in Internal Medicine, and has been a
practicing internist during his career.
Brian Powers: You held several senior positions in the federal
government before coming to Brookings. When transitioning from
federal service why did you choose Brookings?
Mark McClellan: I thought about several options, but my roots
are in academics. I did not want to move far away from that, but I
had really enjoyed working on implementation issues with people
in government and the private sector. Brookings seemed like a
good balance of the two. I can continue to do policy analysis and
research, but also stay closely connected to political developments
and support public-private collaborations.
BP: Do you ever miss being in an implementation role?
MM: In some ways we are not too far from it at Brookings. For
example, a lot of our work relates to having impact on current
legislative and regulatory issues such as Accountable Care Organization (ACO) implementation, physician payment reform, drug
development and surveillance.
One of the hardest things about being in an implementation
role was the consuming nature of the work. In policy analysis and
research you do not have to get into all of the depths, complexities,
and practical issues of implementing new programs. At the beginning, it was nice to have more control over my schedule and less of
that stress. But one of the most rewarding things about being
involved in implementation is actually seeing your ideas make a
real and direct difference. So I do miss some of that.
BP: How did your time at CMS and FDA shape the portfolio of
activities you lead at Brookings?
MM: Our focus at FDA was on turning ideas from the lab into
safe, effective, and reliable treatments that can make an impact on
the lives of patients. I viewed my time at CMS in a similar way. In
my first address to the staff at CMS I highlighted that I thought
CMS was the nation's largest public health agency in terms of
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.05.001
financing, since how we pay for care has such an impact on
effective treatment. At Brookings we have tried to put these ideas
together. Whether it is better approaches to determining which
treatments work for particular patients or better approaches to
improving the delivery system, all of our work maintains a focus
on protecting and promoting the health of the public.
BP: Are there any recent reports or activities from Brookings
that you would like to highlight?
MM: Last month we released the Bending the Curve report,
which brought together a number of experts to determine the best
policy directions for bringing costs down. The emphasis in the
report is on the fact that the best way to effectively reduce costs is
to focus on improving quality and on getting care right. There are
recommendations about financing and regulation, but the basic
theme is that better care is the pathway to higher value.
BP: You were recently part of a debate in the Wall Street
Journal over the promise of Accountable Care Organizations. What
is the source of the disagreement?
MM: I do not think there is any disagreement about the need
for disruptive innovation in health care. The question is how to
make that happen as quickly as possible. One of the reasons we
have not seen more disruptive innovation is because our financing is
not aligned. In most other industries you are rewarded when you
develop a product that costs less or does a better job. But under
traditional financing and regulatory systems in health care you are
often punished. Strategies like intervening early and targeted use of
interventions lead to less revenue. The promise of ACOs is making
disruptive innovation pay off. ACOs align valuable innovation and
health care financing systems so it is no longer the case that if you
come up with an innovation in care that reduces costs while
maintaining or improving quality you lose money. There are certainly
other ways to align incentives for innovation and improvement, such
as value-based insurance design and steps to make regulations
focused more on results and value rather than on structural and
process issues.
BP: What early lessons are you starting to see from your work
with the ACO community?
MM: I do not want to oversimplify based on limited case study
results, but ACOs that seem to have the most impact are ones
where there is an organizational commitment to real cultural change
creating a systematic focus on getting to better care at a lower cost.
Many of the ACOs we work with have emphasized that it's not a
matter of implementing one initiative and thinking you’re done.
Changing care delivery enough to have a systematic impact on health
outcomes and costs is a long and sustained process. Similarly, viewing
the ACO financing model as a transitional process seems to lead to
more of an impact over time. Successful organizations use ACO
contracts as a starter phase to help them get data systems in place and
identify a clear set of opportunities for improving care and lowering
costs. They can then move on to additional changes in financing such
as two-sided risk and capitation contracts.
We have seen a lot of encouraging results so far, mostly in the
private sector. But it is still early and we are looking for more
systematic evidence on when and how ACOs work best.
BP: How do ACOs fit into the larger environment of delivery
and finance reform?
MM: The key thing about ACOs is the focus on better health
outcomes at lower costs accompanied by aligned financial incentives. ACOs are not the only financing reform that can achieve this
goal. Also very helpful, even instrumental, could be case-based
payments in primary care medical homes and bundled payments
for episodes of care. In fact, a lot of ACOs are implementing these
kinds of payment reforms together in a complementary and
reinforcing way. But it is not just about provider payment reforms.
There needs to be corresponding changes in benefit design so that
patients can share in the savings when they take steps to get more
effective care at a lower cost. Some private sector ACOs are now
implementing their provider payment changes along with changes
in benefit design that lower premiums or co-pays when patients
use more effective, lower cost services.
BP: You mentioned that among ACOs that have made an impact,
there is a cultural commitment to high-value care. Do you think
that the process of sitting down and negotiating an ACO contract
motivates culture change?
MM: The focus on ACOs has been helpful since it puts the core
goals of better health and avoiding unnecessary costs at the center of
an organization's activities. Even if an organization does not implement an ACO, or adopts other payment reforms instead, this focus is a great way to
reorient the discussion around innovative approaches to get
more value.
BP: On the topic of shifting payment from volume to value,
what can CMS do to best facilitate that transition?
MM: I think it would be helpful for CMS to view each of their
payment reform activities as potentially reinforcing elements of
a comprehensive improvement strategy rather than standalone
solutions. Many of these programs were started as pilots, and
evaluation is focused on how each may have an impact in isolation. I
think making these steps part of a systematic strategy is more
important for driving health care reform. Instead of figuring out if
we can get costs down and quality up with a particular medical
home or bundled payment model, it may be more important to
implement them all together and look for the cumulative impact.
This requires a systematic approach to thinking about payment
reform as well as a systematic infrastructure for measurement and
data support. And I think we still have a way to go before CMS
is there.
BP: To conclude, what major gaps do you see in the literature
on health care delivery science and improvement? What role can a
journal like HJDSI play in filling these gaps?
MM: Health care delivery is complex, dynamic, and uncertain.
A lot of organizations are trying to get better results for patients at
a lower cost, but are not sure how to do it. Success depends on
current organizational form, policy options for reimbursement and
benefit design, regulation, and being able to make solid inferences
from actual practice data. Better science for dealing with this
complexity is badly needed and I think the Journal could play an
important role in its development.
Healthcare 1 (2013) 52–54
Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi
Short communication
Instant replay
David I. Rosenthal
Yale School of Medicine, New Haven, CT, USA
Homeless PACT, VA Connecticut, 950 Campbell Avenue, New Haven, CT, USA
article info
abstract
Article history:
Received 11 November 2012
Received in revised form
27 March 2013
Accepted 20 April 2013
Available online 9 May 2013
With widespread adoption of electronic health records (EHRs) and electronic clinical documentation,
health care organizations now have greater faculty to review clinical data and evaluate the efficacy of
quality improvement efforts. Unfortunately, I believe there is a fundamental gap between actual health
care delivery and what we document in the current EHR systems. This process of capturing the patient
encounter, which I'll refer to as transcription, is prone to significant data loss due to inadequate methods
of data capture, multiple points of view, and bias and subjectivity in the transcriptional process. Our
current EHR, text-based clinical documentation systems are lossy abstractions − one sided accounts of
what take place between patients and providers. Our clinical notes contain the breadcrumbs of
relationships, conversations, physical exams, and procedures but often lack the ability to capture the
form, the emotions, the images, the nonverbal communication, and the actual narrative of interactions
between human beings. I believe that a video record, in conjunction with objective transcriptional
services and other forms of data capture, may provide a closer approximation to the truth of health care
delivery and may be a valuable tool for healthcare improvement.
& 2013 Elsevier Inc. All rights reserved.
Keywords:
Quality improvement
Electronic health records
Video technology
Graduate medical education
“The information you have is not the information you want. The
information you want is not the information you need. The
information you need is not what you can get or is not known.
The information that is known can't be found in time.”
Finagle's law.1
Sunday, 3pm. Code Blue. Our code team races to converge on a
patient on a Neurology floor. When we all arrive, the patient is
found unconscious in her bed. The primary team communicates
important information to us and confirms that she has an Advance
Directive with instructions to not resuscitate. She is DNR. Within
moments the patient loses her pulse, has an asystolic arrest and
dies. While somewhat mundane and limited in its scope compared
to other code situations during my residency training, it plays over
and over again in my memory. It was particularly memorable
because the event was recorded on video while the patient was
undergoing Video Electroencephalography (Video EEG) monitoring.
Prior to my career in medicine, I worked as a filmmaker and a
media technician. Sitting down at the computer screen to review
the 10-min video reminded me of my days in the editing room and
provided me with a rare omniscient glimpse of my own and our
team's performance. It provided extraordinary insight and sparked
my imagination for further uses of video technology as a tool in
healthcare settings.
* Correspondence address: Yale School of Medicine, New Haven, CT, USA. Tel.: +1 203 479 8077.
E-mail addresses: [email protected], [email protected]
http://dx.doi.org/10.1016/j.hjdsi.2013.04.004
With widespread adoption of electronic health records (EHRs)
and electronic clinical documentation, health care organizations
now have greater facility to review clinical data and evaluate the
efficacy of quality improvement efforts. Unfortunately, I believe
there is a fundamental gap between actual health care delivery
and what we document in the current EHR systems. This process
of capturing the patient encounter, which I'll refer to as transcription, is prone to significant data loss due to inadequate methods of
data capture, multiple points of view, and bias and subjectivity in
the transcriptional process. Our current text-based documentation
systems are lossy abstractions—one-sided, incomplete accounts of
encounters between patients and providers. Our notes contain the
breadcrumbs of relationships, conversations, physical exams, and
procedures but lack the form, the emotions, the images, the
nonverbal communication, and the actual narrative of interactions
between human beings. Anyone who has performed chart review
understands these inherent challenges.2,3 I believe that a video
record, in conjunction with objective transcriptional services and
other forms of data capture, may provide a closer approximation
to the truth of health care delivery and may be a valuable tool for
healthcare improvement.
There is a growing evidence base for using video data as a
means to review and audit the quality of care delivery. Video has
been used in healthcare settings for myriad clinical and non-clinical purposes: medical education with standardized patients,
hospital security, trauma care, long-term video electroencephalography (EEG), surgical procedures in operating rooms, sleep
studies, and more recently remote patient encounters, tele-ICU
monitoring, and even remote monitoring of hospital hand hygiene.4 Much of the early literature on video auditing focused
on emergency and trauma care,5–7 tele-ICU settings, and
standardized patients in medical education.8 Such short-interval, discrete encounters lend themselves well to auditing, and
there are some data to support various technologies. Video recordings have been shown to be more sensitive than traditional
methods at identifying management errors during trauma resuscitation.9 In Europe in the 1990s, with far less sophisticated
technology, pilots of video observation of general practitioners in
outpatient clinics were shown to be feasible to implement, valid
for assessing practitioner communication skills, and to have no
significant effect on consultation length or the number of problems
dealt with in the patient encounter.10,11 In spite of this, valid legal
concerns about patient privacy, consent, and discoverability have
led many trauma centers, hospitals, and birthing centers to
prohibit all videotaping practices.12,13
As a primary care physician who sees patients both in the
outpatient clinic and in the inpatient hospital setting, I believe that
video may offer more objectivity in the transcription of the clinical
care services actually delivered, as well as the ability to provide
meaningful, rapid oversight of clinical care. As with most medical
innovations, new data collection modalities spur unimagined
applications. Since Einthoven captured and characterized electrical
waves on the skin, skin-based electrical signal capture technology has blossomed far beyond the electrocardiogram: monitoring in telemetry units, outpatient loop recorders,
electrophysiology labs, electroencephalography (EEG) units, automated external defibrillators, and newer experimental devices built
into exercise equipment, iPhone apps, and car seats. In the same
vein, there are numerous possible applications of using video data
in healthcare—most of which can be thought of as utilizing one of
two core functions: (1) live monitoring of video feeds, e.g.
instantaneous review as in tele-ICUs, or (2) saving and banking
video data for retrospective analysis. I believe there are valuable
applications of both functions for key stakeholders in both inpatient and outpatient settings.
Tele-ICUs, which provide instantaneous review for remote monitoring of
higher-acuity patients, offer some current insights and
some conflicting quality data. Some studies show no clinical benefit14; other reviews suggest an association with lower ICU mortality
and length of stay but not with lower in-hospital mortality or
hospital length of stay15; and still others do: a single academic center
that focused on re-engineering the process of implementation showed
significant improvements in adherence to best practices, with associated mortality reduction and decreased hospital length of stay.16
Staff acceptance of such programs shows initial ambivalence and
resistance, but after implementation, overall acceptance is high, and
perceived improvement in quality of care is higher as well.17
While I anticipate inflammatory messages in my inbox from many
nervous colleagues and privacy advocates, I would
welcome video oversight into my office if I believed it were in the
patients' best interests and if patients felt that such oversight
could provide a tangible benefit to them. With the correct system
in place, as a primary care physician I could send a video link of an
encounter to a specialist with a clinical question for e-consultation
or to a master clinician for a second opinion. For the patient or
caregiver, access to a secure video link or a transcription of an
encounter would allow them to review important information, and perhaps even submit it for a second opinion. I would welcome
automatic audio transcriptional services to capture much of the
documentation needed for accurate coding and billing to better
allow me to engage fully with patients—eyes and fingers away
from the EHR.
The applications of video documentation are numerous and
limited only by our collective imaginations. It is not hard to
imagine the hospitals of the future with control rooms with
highly-trained clinical and operational staff, like air traffic control
towers, remotely monitoring care delivery, communicating with
nursing staff, and assisting with bed flow management or early
detection of clinical decompensation in all areas of the hospital.
Such systems may also empower outpatient providers with
established relationships to patients, allowing them to remotely
care for patients or participate more fully with inpatient teams to
prevent fragmentation. From a management and operations perspective, clinical leadership could utilize video review as a part of
peer review and coaching. Adverse events could be reviewed by
re-design teams to help understand process flows in hospitals and
minimize future events. Archived video data could provide fundamentally new learning modalities for medical education, such as the
ability to audit and review housestaff performance, and to watch
medical student patient encounters in real time without the
Hawthorne effect of an attending's physical presence. Clinical
teaching cases that transpire over a course of days or weeks could
be compressed into condensed, memorable vignettes for review in
short lectures.
Yet with all the hype, for many valid reasons, namely privacy,
transparency, liability, ownership, and cost, neither the
providers nor the consumers of healthcare have fully embraced
video as a form of electronic documentation.18–21 While medico-legal concerns have caused significant trepidation in advancing
this technology in healthcare, I believe that obfuscation behind a
biased, abstracted medical record system should not be a collective
goal of health professionals. It is possible that, if discoverable, such
video documentation may actually serve to protect defendants
against frivolous lawsuits and to help the judicial system identify
actual malpractice. However, it is likely that without legal protections, providers and care organizations will not be willing to take
the legal risk of implementation. Furthermore, from a cost perspective, such systems require significant investment: current tele-ICU
systems cost roughly $2–5 million to install, with annual operating
costs of about $1.5 million,22,23 warranting cost-effectiveness studies. Other
critics of video in healthcare contend that the technology could
produce too much data, thereby generating too much noise relative
to actual signal. I think the same argument could also be made about
genetic information, high resolution imaging scans, telemetry, and
even our current text-based EHR systems. As with these other
sources, well-designed tools for monitoring, rapid analysis and
insight should be further developed for video systems.
Despite all of these limitations, my personal experience from
watching the replay of the code situation was profoundly moving
and stands out as a highlight of many years of medical training. In
my humble opinion, the ability to capture and monitor patient
care interactions on video is a modality that may better represent
the visual and auditory world that surrounds patient care and may
provide important insights into variations in practice and how, as
officials in this imperfect game of medical practice, we can all
strive to make the right calls. I long for medical instant replay.
References
1. Gillam S, Yates J, Badrinath P. Essential Public Health: Theory and Practice.
Cambridge: Cambridge University Press; 2007.
2. Wu L, Ashton CM. Chart review. A need for reappraisal. Evaluation and the
Health Professions. 1997;20:146–163.
3. Allison JJ, Wall TC, Spettell CM, Calhoun J, Fargason Jr. CA, Kobylinski RW, et al.
The art and science of chart review. Joint Commission Journal on Quality
Improvement. 2000;26:115–136.
4. Rosenberg T. An Electronic Eye on Hospital Hand-Washing. New York Times;
24 November 2011 〈http://opinionator.blogs.nytimes.com/2011/11/24/an-electronic-eye-on-hospital-hand-washing/?scp=1&sq=hand%20hygeine&st=cse〉.
5. Fitzgerald M, Gocentas R, Dziukas L, Cameron P, Mackenzie C, Farrow N. Using
video audit to improve trauma resuscitation: time for a new approach. Canadian
Journal of Surgery. 2006;49(3):208–211.
6. Lubbert PH, Kaasschieter EG, Hoorntje LE, Leenen LPH. Video registration of
trauma team performance in the emergency department: the results of a 2-year analysis in a level 1 trauma center. Journal of Trauma. 2009;67:1412–1420.
7. Mackenzie CF, Xiao Y, Hu FM, Seagull FJ, Fitzgerald M. Video as a tool for
improving tracheal intubation tasks for emergency medical and trauma care.
Annals of Emergency Medicine. 2007;50:436–442.
8. Zick A, Granieri M, Makoul G. First-year medical students' assessment of their
own communication skills: a video-based, open-ended approach. Patient
Education and Counseling. 2007;68:161–166.
9. Oakley E, Stocker S, Staubli G, Young S. Video recording to identify management
errors in pediatric trauma resuscitation. Pediatrics. 2006;117(3):658–664.
10. Ram P, Grol R, Rethans JJ, Schouten B, Van der Vleuten C, Kester A. Assessment
of general practitioners by video observation of communicative and medical
performance in daily practice: issues of validity, reliability and feasibility.
Medical Education. 1999;33:447–454.
11. Pringle M, Stewart-Evans C. Does awareness of being video recorded affect
doctor's consultation behavior? British Journal of General Practice. 1990;40:
45–48.
12. Eitel DR, Yankowitz J, Ely JW. Legal implications of birth videos. Journal of
Family Practice. 1998:251–256.
13. Campbell S, Sosa JA, Rabinovici R, Frankel H. Do not roll the videotape: effects of
the health insurance portability and accountability act and the law on trauma
videotaping practices. American Journal of Surgery. 2006:183–190.
14. Thomas EJ, Lucke JF, Wueste L, Weavind L, Patel B. Association of telemedicine
for remote monitoring of intensive care patients with mortality, complications,
and length of stay. Journal of the American Medical Association. 2009;302(24):
2671–2678.
15. Young LB, Chan PS, Lu X, Nallamothu BK, Sasson C, Cram PM. Impact of
telemedicine intensive care unit coverage on patient outcomes: a systematic
review and meta-analysis. Archives of Internal Medicine. 2011;171:498–506.
16. Lilly CM, Cody S, Zhao H, et al. Hospital mortality, length of stay, and
preventable complications among critically ill patients before and after tele-ICU reengineering of critical care processes. Journal of the American Medical
Association. 2011;305(21):2175–2183.
17. Young LB, Chan PS, Cram P. Staff acceptance of tele-ICU coverage: a systematic
review. Chest. 2011;139:279–288.
18. Eitel DR, Yankowitz J, Ely JW. Legal implications of birth videos. Journal of
Family Practice. 1998:251–256.
19. Campbell S, Sosa JA, Rabinovici R, Frankel H. Do not roll the videotape: effects of
the health insurance portability and accountability act and the law on trauma
videotaping practices. American Journal of Surgery. 2006:183–190.
20. Bain JE, Mackay NSD. Videotaping general practice consultations (letter to the
editor). British Medical Journal. 1993;307:504.
21. Servant JB, Mathieson JAB. Video recording in general practice: the patients do
mind. Journal of the Royal College of General Practitioners. 1986;36:555–556.
22. New England Healthcare Institute. Tele-ICUs: Remote Management in Intensive
Care Units. Cambridge, MA: Massachusetts Technology Collaborative and Health
Technology Center; 2007.
23. Breslow M. Remote ICU care programs: current status. Journal of Critical Care.
2007;22:66–76.
Healthcare 1 (2013) 55
Book Review
Medical Licensing and Discipline in America: A History of the
Federation of State Medical Boards, David A. Johnson,
Humayun J. Chaudhry. Published by Lexington Books and the
Federation of State Medical Boards. Lanham, MD (2012). 390
pp., ISBN-10: 0739174398, ISBN-13: 978-0739174395.
During the early centuries of American history, the organization of the medical profession was in a state of complete disarray.
Unscrupulous, unqualified, self-professed healers practiced medicine without medical degrees or licenses or readily purchased
degrees at diploma mills. Societies representing homeopathic,
eclectic, Thomsonian, osteopathic and allopathic medicine endorsed
conflicting health care agendas, and medical schools teaching
disparate curricula sprouted up across the country. Throughout
all of this, the American public, largely skeptical of the value of
medical licenses and medical degrees in the first place, ignored
all documentation, disregarded safety and just visited whichever
health care provider they wanted to see. It is against this complicated backdrop that authors David A. Johnson and Humayun J.
Chaudhry begin their examination of the development and evolution of medical regulation in the United States.
Medical Licensing and Discipline in America is a comprehensive
historical account documenting the unique transformation of
medical regulation in the country. Filled with engaging anecdotes,
their work aims to explore the establishment of the Federation of
State Medical Boards (FSMB), weaving the organization's history
into the larger fabric of medical regulation and discipline from
colonial times to the present day. It details the challenges
confronted by efforts to regulate medical practice, the circumstances that led to the founding of the FSMB in 1912, and the
organization's role throughout the ensuing decades amidst a state-based regulatory system. The book is information-rich, salient in
the way it systematically embeds the story of medical regulation
and discipline within the broader context of the many political,
cultural, and medical changes in the country.
The book is divided into eight chapters and also includes
opening notes from US Surgeon General Regina Benjamin and
FSMB Chairman Lance Talmage. The chapters are arranged chronologically, beginning with the “Birth of Medical Regulation in
America,” in the 17th century—a time when regulation was
piecemeal at best and rife with fraud. The second chapter “The
Roots of the Federation of State Medical Boards” chronicles the
founding of the American Confederation and the National Confederation, predecessors of the FSMB, and their eventual merger
into the FSMB as we know it today. The third and fourth chapters,
“Beginnings, Growth, and Challenges” and “Stasis and Resurgence,”
capture changes in medical licensure and discipline from 1912 to
1929 and from 1930 to 1959, respectively. During these decades, there
was a concentrated effort to standardize medical licensure
requirements and exams, promote interstate endorsement of
licensure, and publish an informational bulletin. The fifth chapter
focuses on the early leadership of the Federation, highlighting in
particular Walter Bierring's many contributions to the growth of
the FSMB. Finally, the last three chapters focus on periods
between 1961 and 1979, 1980 and 2001, and 2001 until the
present day. Key developments featured within these pages
include the push for greater transparency of board disciplinary
actions, public accountability, a rigorous medical licensure examination and the expansion of the Federation as a source of data.
The final chapter brings readers to the state of a 21st-century
medical licensing board—a still-relevant, now international
organization focused on international data sharing, clinical competence, and physician accountability.
Throughout the book, authors Johnson and Chaudhry do an
excellent job of outlining the complexity of medical regulation and
how it is intimately tied to politics, government policy, and the
changing attitudes of the American public. As a result, readers
have the opportunity to not only learn about the formation of the
FSMB, but also about the influence critical periods or events
throughout American history – such as the Jacksonian Era, American
Civil War, and the expansion of Medicare and Medicaid – had
on the formation of regulatory policy. Medical Licensing and
Discipline in America is a valuable resource for those interested
in better understanding the history of state medical licensing
boards—or more broadly, the evolution of medical education and
practice in the United States. Johnson and Chaudhry's work
represents an important contribution to the existing literature on
the medical regulatory system: with so much discussion about
patient safety and malpractice in today's healthcare environment,
Medical Licensing and Discipline in America is a timely book that
prompts reflection on the role of the medical licensing board in
the ongoing conversation.
Khin-Kyemon Aung*
19 Quincy Mail Center, Cambridge, MA 02138, United States
E-mail address: [email protected]
* Tel.: +1 440 364 8872.
Received 22 April 2013
Available online 4 May 2013
http://dx.doi.org/10.1016/j.hjdsi.2013.04.001