Healthcare: The Journal of Delivery Science and Innovation (ISSN 2213-0764), Volume 1, Issues 1–2. © 2013 Elsevier Inc. All rights reserved. Published by Elsevier, Radarweg 29, 1043 NX Amsterdam, The Netherlands.
The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in the United States of America.

Healthcare 1 (2013) 1. Contents lists available at SciVerse ScienceDirect. Journal homepage: www.elsevier.com/locate/hjdsi

Introducing you to Healthcare: The Journal of Delivery Science and Innovation

Amol Navathe*, Sachin Jain, Ashish Jha, Arnold Milstein, Richard Shannon
17 Worcester Street, Unit 7, Boston, MA 02118, United States

Dear Readers,

It is with great pleasure that we introduce you to Healthcare: The Journal of Delivery Science and Innovation. As our journal moves forward, we hope you will find in its pages descriptions of leading strategies to implement new payment models and improve care delivery processes; creative approaches to organizing care around patients and their conditions; and novel applications of information technology to enhance health system performance. With the rapid pace of health system transformation, there is a tremendous opportunity to learn what works and what does not in improving patient care. We hope this journal will occupy a unique space: a hub where the growing community of practitioners can share critical ideas and knowledge about healthcare delivery with one another. More importantly, we hope that decision-makers will find the evidence, strategies, and methods needed to support translation of first-rate ideas into broader practice. We will highlight the successes and challenges of care delivery innovation and work to close the knowledge gaps that exist between academics, front-line practitioners, industry, and policy-makers.

* Corresponding author. Tel.: +1 267 975 8833. E-mail addresses: [email protected], [email protected] (A. Navathe).
2213-0764 © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.hjdsi.2013.05.004
Our first issue focuses on these themes as they relate to payment reform—a key emerging driver of health system transformation. We hope you will consider contributing your work to Healthcare: The Journal of Delivery Science and Innovation. Only if we work together and share our experiences, knowledge, and insights can we unlock the true potential that exists within our systems of care.

Healthcare 1 (2013) 2

Editorial: Introduction to Healthcare: The Journal of Delivery Science and Innovation

It's hard to imagine a more exciting era in US health care. This is a time of great challenge – and even fretfulness – for America, but also a time of great opportunity. We are dealing with a system that is too often fragmented, impersonal, and ineffective, and at the same time so expensive as to have become unsustainable. We spend almost twice as much per person on healthcare as any other country on earth, but our lifespan is not even in the top 20. The main problem lies in our delivery system, which is outdated, unreliable, unsafe, fragmented, and full of waste. We need to change that system, and bring the focus back to where it should always have been: on the patient. And we need a system that thrives and celebrates when people stay well, not one that fixes its sights so thoroughly downstream on burdens of illness, injury, and disability that could be prevented upstream. Changing the delivery system, however, is not going to happen overnight. And it is not going to be done by one person, one group, or even one arm of society. To make truly meaningful change, we need to bring together the best minds across academia, medicine, government, the community, and more. Healthcare: The Journal of Delivery Science and Innovation will help do just that.
Its aim is to enable new ideas to arise, grow, mature, become visible, and ultimately reach the people who can turn an idea into wide-scale impact. America could in time have the best health care system in the world. We have all the elements: the latest technologies, the best parts, and committed people. But we cannot reach that broad goal if the parts of care persist in fragmentation and fail to work together. The vast tectonic shifts of the past few years, including, but by no means limited to, the Affordable Care Act, set the stage for something truly marvelous to happen with the care we all want and need. I am confident that Healthcare: The Journal of Delivery Science and Innovation will provide a venue for sharing and shaping ideas that will help meet that challenge: to improve the quality, justice, and sustainability of our health care system, and ensure that health care is a right for all Americans.

Donald M. Berwick, MD, MPP
President Emeritus and Senior Fellow
20 University Road, 7th Floor, Cambridge, MA 02138, USA
E-mail address: [email protected]. Tel.: +1 440 364 8872.
Received 22 April 2013; available online 4 May 2013.
2213-0764 © 2013 Published by Elsevier Inc. http://dx.doi.org/10.1016/j.hjdsi.2013.04.010

Healthcare 1 (2013) 3

Introduction to Healthcare: The Journal of Delivery Science and Innovation

I am pleased to introduce this first issue of Healthcare: The Journal of Delivery Science and Innovation. Much of my work has focused on building the science of delivery. Thus, it is particularly gratifying to see the first journal dedicated to strengthening this science in the field of health care.
While engaged in development work in many countries, I have been struck by the fact that when it comes to our most cherished social goals—health, education, social protection, environmental sustainability—we have an inexplicably high tolerance for poor execution. Downplaying execution has sometimes been seen as proof that our minds are focused on high ideals, not bogged down in the mundane mechanics of implementation. This confusion has compromised health care in the United States and many other places—and it is no longer acceptable. Over the past several years, delivery pioneers around the world have begun considering how to harness systems engineering, operations science, managerial science, leadership, and strategy to improve execution in health care and other fields that are critical for human wellbeing. Failure to incorporate multidisciplinary perspectives into health care delivery has hampered progress towards the goals that we all want to reach: better health outcomes and reduced health care costs. How can we capture and disseminate insights from successful practice? What are the possibilities and limits of dialog with other disciplines? How can we learn deeply from failures and continuously stimulate innovation to build a science of health care delivery? These are the fundamental questions the editors and authors of this journal are asking. The work in these pages brings the early seeds of leadership and transformation that we so urgently need: in health care delivery and across the social goals that define our highest aspirations for human progress.

Jim Yong Kim, MD, PhD
President, The World Bank Group
Received 22 April 2013; available online 4 May 2013.
2213-0764 © 2013 Published by Elsevier Inc.
http://dx.doi.org/10.1016/j.hjdsi.2013.04.011

Healthcare 1 (2013) 4–7

Making the RCT more useful for innovation with evidence-based evolutionary testing

Kevin G. Volpp (a,b,c,d,e,*), C. Terwiesch (c,d), A.B. Troxel (b,c), S. Mehta (b,c,e), D.A. Asch (a,b,c,d,e)
(a) Center for Health Equity Research and Promotion, Philadelphia VA Medical Center, USA
(b) Leonard Davis Institute Center for Health Incentives and Behavioral Economics, USA
(c) University of Pennsylvania, School of Medicine, USA
(d) The Wharton School, USA
(e) Penn Medicine Center for Innovation, USA

Article history: Received 9 February 2013; received in revised form 18 April 2013; accepted 20 April 2013; available online 9 May 2013.

Abstract: We propose a new innovation model designed to accelerate the rate of learning from provider payment reform initiatives. Drawing on themes from operations research, we describe a new approach that balances speed and rigor to more quickly build evidence on what works in delivery system redesign. While randomized controlled trials provide "gold standard" evidence on efficacy, traditional RCTs tend to be static and provide information too slowly given the CMMI tagline of "We can't wait." Our approach speaks to broader needs within health financing and delivery reform for testing that, while rigorous, recognizes the urgency of the challenges we face. Published by Elsevier Inc.

Keywords: Innovation; Evidence; Randomized controlled trials

The Center for Medicare and Medicaid Innovation (CMMI) launched the Health Care Innovation Challenge in November of 2011 and called on teams from around the United States to design and evaluate innovative programs that would accomplish the triple aim of improving health, improving health care, and reducing costs.
The creation of CMMI more broadly provides an opportunity to think about how to catalyze and evaluate innovations in provider payment, health care financing, and health care delivery. The Health Care Innovation Challenge was branded with the tagline, "We can't wait"—a reference both to the urgency of meeting the triple aim and to the delays that traditional prospective evaluations often create. Innovation is slowed by evaluation, but innovation without evaluation limits learning. As a result, accelerating evaluation is a central challenge in health care innovation. Building on the literature of innovation management and design thinking, we propose a more flexible evaluation system that could be used by organizations in developing, testing, and evaluating provider payment and delivery system initiatives. This new approach emphasizes the fluid and dynamic nature of the innovation process. We argue that this added flexibility and increased speed sacrifice only a small amount of the rigor of traditionally conceived randomized controlled trials. Experimentation and testing are at the heart of innovation. Ideally, new products or services are created and then empirically tested and improved upon through experiments.1 In assessing health care interventions, the typical approach to experimentation is either a pre-post study (not a true experiment) or a randomized controlled trial (RCT). In a pre-post study, an outcome is measured before the intervention and then again after the intervention, and the change in outcome is then assumed to represent the impact of the intervention.

* Correspondence to: University of Pennsylvania, School of Medicine, 1120 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021, USA. Tel.: +1 215 573 0270; fax: +1 215 573 8778. E-mail addresses: [email protected], [email protected] (K.G. Volpp).
2213-0764 Published by Elsevier Inc. http://dx.doi.org/10.1016/j.hjdsi.2013.04.007
Such one-arm demonstration projects allow only weak conclusions, since observed changes over time may be due to the intervention or to any number of measured or unmeasured confounders.2 A common problem is regression to the mean, because interventions often target individuals whose condition at enrollment is extreme (more often hospitalized, worse diabetes control) with the aim of putting them closer to the mean (hospitalized an average amount, average diabetes control). With time, simple regression to the mean can masquerade as "improvements" falsely attributed to the intervention. Randomized trials are often implicitly passed over as ways to evaluate innovations in care delivery or financing because of a misguided view that even though all new drugs require evaluation of their safety and effectiveness, policy or delivery system innovations do not require evaluation so long as they are thoughtful and well-conceived. In a standard randomized trial, participants are assigned randomly between the current care process and one (or multiple) treatment arms that reflect a new care process or financing approach, and the experimental evaluation of the intervention is simply the difference in outcome between the treatment and control groups. Randomized trials provide strong causal evidence of effects, but tend to be lengthy and narrowly focused on the effect of a precisely specified intervention. That precision may be less relevant in highly naturalized and complex settings like clinics and hospitals, where the effects of a policy innovation are likely to be substantially modified by the particular people and context of the setting. A weakness of the traditional RCT is that the organization or researcher must commit to one new care or financing model ex ante. Study teams passively observe how this model does over time.
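The regression-to-the-mean pitfall described above is easy to demonstrate with a short simulation (a sketch under purely illustrative assumptions: each patient has a stable true score, every measurement adds independent noise, and enrollment requires an extreme noisy baseline; none of these numbers come from the article). With no intervention at all, the enrolled cohort still appears to "improve" at follow-up:

```python
import random
import statistics

random.seed(0)

# Illustrative model: each patient's true risk score is stable, but any
# single measurement adds independent noise (e.g., day-to-day variation).
N = 100_000
TRUE_MEAN, TRUE_SD, NOISE_SD = 50.0, 10.0, 10.0
ENROLL_CUTOFF = 70.0  # enroll only patients whose *measured* baseline is extreme

pairs = []
for _ in range(N):
    true_score = random.gauss(TRUE_MEAN, TRUE_SD)
    measured_baseline = true_score + random.gauss(0.0, NOISE_SD)
    measured_followup = true_score + random.gauss(0.0, NOISE_SD)  # no treatment given
    if measured_baseline >= ENROLL_CUTOFF:
        pairs.append((measured_baseline, measured_followup))

pre = statistics.mean(b for b, _ in pairs)
post = statistics.mean(f for _, f in pairs)

# The enrolled cohort "improves" with no intervention, purely because the
# extreme baseline measurements were partly noise.
print(f"enrolled: {len(pairs)}")
print(f"mean at enrollment: {pre:.1f}")
print(f"mean at follow-up:  {post:.1f} (apparent improvement, no treatment)")
```

A concurrent randomized control group would show exactly the same drift, which is why the authors prefer randomized comparisons over pre-post designs.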
Although RCTs sometimes require interim analyses and intermediate stopping rules, few opportunities exist to improve the trial while the experiments are unfolding, as teams typically wait until the end of the three- or four-year trial for the results to come in. Those are good rules when we want high confidence and are not constrained by the need for rapid results. But they are too strict if each approach is mostly meant to inform the next one in a rapid cycle innovation approach. In those cases we should be willing to make faster changes in the middle of a trial and tolerate a much higher Type I error rate (i.e., the chance of concluding a significant improvement when none really exists). Evaluation of innovation in health care often rests at one of two extremes: (1) no evaluation at all, or pre-post designs with limited ability to draw causal inferences; or (2) lengthy and expensive randomized trials whose results are too context-dependent to justify the long trial times they require. The design theorist and architect Christopher Alexander emphasized that the design of new systems requires iterating between the abstract, mental world, in which hypotheses are formulated, and the actual world, in which they are tested against empirical reality. If there are explicit milestones at which the researcher switches from the mental world to implementation, the process of innovation resembles the pathway shown in Fig. 1. In the innovation management literature, this process is commonly referred to as the stage-gate process for innovation.
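The cost of making faster mid-trial changes, a higher Type I error rate, can be made concrete with a simulation (illustrative parameters, not the authors' method): a two-arm comparison with no true effect is tested with an unadjusted significance test after each enrollment batch, and the trial "stops" at the first nominally significant look. The false-positive rate rises well above the nominal 5%:

```python
import random
import statistics

random.seed(1)

ALPHA_Z = 1.96          # nominal two-sided 5% threshold at every look
LOOKS, BATCH = 5, 40    # interim analyses, and patients per arm per batch
TRIALS = 2000

def trial_stops_early() -> bool:
    """Simulate one null trial (no true effect); return True if any
    unadjusted interim look crosses the nominal significance threshold."""
    treat, control = [], []
    for _ in range(LOOKS):
        treat += [random.gauss(0, 1) for _ in range(BATCH)]
        control += [random.gauss(0, 1) for _ in range(BATCH)]
        n = len(treat)
        se = (statistics.pvariance(treat) / n + statistics.pvariance(control) / n) ** 0.5
        z = (statistics.mean(treat) - statistics.mean(control)) / se
        if abs(z) > ALPHA_Z:
            return True   # falsely declare a significant difference
    return False

rate = sum(trial_stops_early() for _ in range(TRIALS)) / TRIALS
print(f"false-positive rate with {LOOKS} unadjusted looks: {rate:.3f} (nominal 0.05)")
```

This is the tradeoff the authors accept deliberately: when each version mainly informs the next cycle rather than serving as a definitive verdict, an inflated per-look error rate can be a reasonable price for speed.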
A vast body of research in the areas of product development and innovation has shown mathematically3 as well as empirically4–6 that the static processes of the stage-gate model are best suited for incremental innovations in low-uncertainty environments.7,8 However, the challenge of transformative health care innovation is not to develop incremental change but to create a radically new care process for patients with high rates of poor outcomes and high costs.4–6 What CMMI seeks is a substantially new care process for patients, to shift the focus of health care from providing care through a visit-based model in which patients see providers when they are sick to one focused on improving population health. Such transformations require innovation simultaneously in (a) the operations of the care model; (b) the technological infrastructure and workforce; and (c) the mindsets and engagement of both patients and providers and the incentives they face. An open question is whether population health is defined as focusing on clinical aspects of population health or more broadly on social and environmental determinants of health.9 In either case, an approach to design, testing, and evaluation is needed that can provide rigorous but timely feedback on effectiveness. The process model that the innovation literature recommends for such radical, high-uncertainty innovation challenges is one of rapid iteration and validation. In The Sciences of the Artificial, Nobel laureate Herbert Simon emphasizes that certain problems, especially those of high complexity, are best solved by iteration. Simon's work had a major impact on the fields of design and innovation, and the current literature in these fields strongly embraces rapid iteration between the mental world and empirical reality (see Fig. 2).
In these settings, the point in time at which the definition of the new product or process is finalized is deliberately kept open for longer, creating an opportunity for learning and flexibility for future adjustment in response to new data.4–6 This type of innovation research is inspired by models of software development.7,8 For example, a 3-year evaluation could be sequenced into multiple learning cycles, each cycle refining the previous one (Fig. 2). This iterative model reflects the fact that learning about implementation comes during a trial, even if comparative outcomes do not arrive until the end, and knowledge of both implementation and outcomes can be used in refining the approach that is being tested. Version 2 (V2) reflects the knowledge gained from implementing Version 1 (V1), but may also be informed by parallel experiments on narrower components performed alongside the larger intervention, such as testing different ways of incenting patients or providers, different technologies, or different assessment or data collection techniques—any element of the central intervention that includes embedded alternatives that can be compared. These side experiments can be carried out with a different set of patients or providers to avoid interfering with the main study population. Both of these sources of knowledge allow organizations (or researchers) to improve and refine the care model. The intervention can be dynamically adjusted in response to the new data obtained from the clinical implementation instead of "being stuck" with the initial hypothesis. This form of rapid cycle innovation might be called evidence-based evolutionary design. A common critique of rapid cycle innovation is that it appears chaotic compared to the carefully crafted project management plans of the more traditional innovation process, with its distinct phases of design, build, implementation, and evaluation following the logic in Fig. 1.2 But the approach derives value from the recognition that even the most conceptually grounded and evidence-based interventions can always be improved. While a traditional RCT asks the question, "Which among these fixed alternative approaches is the best?" evidence-based evolutionary design asks the question, "How can I make my processes better?" A fundamental appeal of the second approach is that it has no limit and supports the notion of continuous improvement.

[Fig. 1. Problem-solving model.14]
[Fig. 2. Flexible model of iterative innovation.]
[Fig. 3. Evidence-based evolutionary testing.]

An example of our approach, in the context of our own applications to the CMMI 2012 Innovation Challenge, is summarized in Fig. 3; we present both a sequence of interventions related to development of an approach that could be used in population-based financing of providers and the associated side experiments. We use as a foundation the concept of "automated hovering," in which advances in behavioral economics, changes in health care financing that emphasize population-based health improvement as opposed to reimbursement for visits or procedures, and the proliferation of wireless devices and social media come together to create new possibilities for improving health engagement among patients.9 In the evidence-based evolutionary design approach, the traditional RCT is adapted to better accommodate ongoing improvement in care processes while still retaining the appealing RCT features that allow definitive evaluation. All interventions will be compared in a randomized design against patients receiving usual care as a control. In that way, the design remains true to the principles of the RCT.
Rather than deploy a single intervention during this 3-year period, the investigative team has begun with a current state-of-the-art standard (Version 1.0), which will be modified based on new experience and early evidence, with an enhanced Version 2.0 deployed about midway through the trial period. Randomization to either active treatment will occur at a 2:1 ratio relative to controls, so that there are equally sized groups receiving usual care, Version 1.0, and Version 2.0, optimizing statistical power for all comparisons. Insights from these two models could then be used to develop a larger-scale model to be deployed with insurer partners in months 34–36. The result is that at the end of the 3-year period a team of investigators will have (1) deployed a state-of-the-art approach to achieve the triple aim in these patients based on the best available evidence; (2) compared Version 1.0 to a randomly assigned usual care control; (3) run a series of side experiments that will inform subsequent modifications; (4) deployed and tested an enhanced Version 2.0 against a randomly assigned usual care control, with results analyzable comparing any intervention (Version 1.0 or 2.0) to control—a comparison that spans the entire period; and (5) deployed a version for scaled-up implementation (Version 3.0) that has gone through two cycles of product development. Because of the unbalanced randomization, it will also be possible to compare Version 1.0 and Version 2.0 directly, although we acknowledge potential confounding by time or other cohort effects since this comparison does not rest on randomized groups. While we think it is cleaner to test Version 1.0 and Version 2.0 in separate cohorts, this could also be done using a single cohort. Note that this illustration contains only three cycles because of the limited time period of the funding mechanism; we would suggest that this type of iteration based on feedback and evaluation be continuous and ongoing.
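The 2:1 allocation described above can be sketched in a few lines (a hypothetical illustration of the design, not the study's actual randomization code; the 36-month horizon is from the text, while the steady enrollment stream and month arithmetic are assumptions). Randomizing 2:1 to whichever version is in force versus usual care, with Version 1.0 active in the first half and Version 2.0 in the second, yields three roughly equal groups, with the pooled control spanning the entire period:

```python
import random

random.seed(2)

def assign(month: int, total_months: int = 36) -> str:
    """Randomize 2:1 to the active version in force vs. usual care.
    Version 1.0 runs in the first half of the trial, Version 2.0 in the second."""
    active = "V1" if month <= total_months // 2 else "V2"
    return active if random.random() < 2 / 3 else "control"

# Enroll a steady stream of patients across the 36-month period.
counts = {"V1": 0, "V2": 0, "control": 0}
for i in range(6000):
    month = 1 + (i * 36) // 6000   # uniform enrollment over months 1..36
    counts[assign(month)] += 1

print(counts)  # roughly equal thirds: V1, V2, and a control spanning both halves
```

The design choice is visible in the counts: each version only enrolls for half the period, so the 2:1 ratio leaves it with about the same number of patients as the always-enrolling control arm, which is what balances power across the three pairwise comparisons.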
This approach to testing ideas using sequential randomized trials is likely to advance knowledge on the effectiveness and cost-effectiveness of provider payment or other care delivery initiatives more rapidly than (1) implementation or "scaling up" of programs that have some limited supportive evidence; (2) a pre-post design; or (3) a traditional RCT, because such approaches either do not permit rigorous evaluation or do not expedite the incorporation of intervention improvements. We frame this in terms of two alternative extremes for how organizations can design, test, and evaluate new ideas3: (a) development and improvement, or (b) Darwinian selection. To the extent that there exists a significant amount of ex ante knowledge about the innovation under development, innovation can be "rationally designed and executed" using development and improvement. Careful plans are developed, milestones are defined, and the plans are executed. The strength of the approach lies in its predictability—you get what you plan for. The problem with the development and improvement approach is that it does not do well under high degrees of uncertainty or ambiguity (a low amount of ex ante knowledge and/or a fast-moving world).5 In such settings, problem solving is a much more organic and evolutionary process.4 Many alternatives have to be considered, and since there exists insufficient ex ante information for an evaluation of these opportunities, many of them will be launched and explored in parallel.10 This is the basis for considering an approach more akin to Darwinian selection.
To categorize all alternative development approaches and methodologies, it is thus helpful to lay out a continuum ranging from "high amount of prior knowledge" to "low amount of prior knowledge." In health care delivery settings, high amounts of prior knowledge imply that standard RCTs can be an effective approach: the alternatives are defined, the sample sizes computed, and the trials implemented and evaluated. When prior knowledge levels are low, an entirely organic process of Darwinian selection makes more sense: start 10 parallel alternatives and judge at the end.11 We view our proposed methodology as a hybrid, reflecting that we have an intermediate amount of prior knowledge. We build in the flexibility of adjusting and responding to new information as the process unfolds. This is analogous to the process articulated by MacCormack, Verganti, and Iansiti in their influential work in Management Science, and to work by Christensen and others suggesting the use of a learning mindset to continue to refine and improve new models as they are built, rather than pursuing a focused approach that attempts to prove or disprove (in the case of a study) or scale (in the case of a business) a model that has not yet been validated and fully refined.12,13 These approaches share the notion that too much is uncertain and unknown at the outset of a new venture to deploy a static model of evaluation. However, we do not engage in a parallel exploration of alternatives. At any given time, we define the best alternative based on the limited knowledge that we have. As new knowledge comes in, we refine the alternative.
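The two ends of this continuum can be caricatured in code (purely illustrative assumptions: each candidate care model has an unknown true effect, and an evaluation returns a noisy estimate of it). Darwinian selection launches many alternatives in parallel and keeps the best observed one; the hybrid the authors propose starts from a single best guess and keeps only those changes whose evaluations beat the incumbent:

```python
import random

random.seed(3)

def evaluate(true_effect: float, noise: float = 0.5) -> float:
    """A noisy trial-based estimate of a candidate model's true effect."""
    return true_effect + random.gauss(0.0, noise)

# Darwinian selection: launch 10 alternatives in parallel, keep the winner.
candidates = [random.uniform(0.0, 1.0) for _ in range(10)]  # unknown true effects
winner = max(candidates, key=lambda c: evaluate(c))

# Hybrid refinement: start with one alternative, and keep each proposed
# change only when its evaluation beats the incumbent (rapid-cycle iteration).
current = random.uniform(0.0, 1.0)
current_score = evaluate(current)
for _ in range(10):
    proposal = min(1.0, max(0.0, current + random.gauss(0.0, 0.2)))
    score = evaluate(proposal)
    if score > current_score:
        current, current_score = proposal, score

print(f"Darwinian winner (true effect): {winner:.2f}")
print(f"Refined model (true effect):    {current:.2f}")
```

Both strategies spend roughly the same evaluation budget; the difference is whether it is spread across many parallel candidates or concentrated on successive refinements of one, which is exactly the prior-knowledge tradeoff described above.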
The approach we describe here is intended to be a framework that can be broadly adopted as a way of continuously improving and building evidence, as opposed to a single intervention that would be "approved." Part of the concept is that improvement is always possible, and that deploying new interventions in a manner that forces both identification of relevant outcomes and assessment of how well two or more alternatives achieve those predetermined outcomes will facilitate continuous improvement based on evidence rather than presupposition. While evaluation of complex care initiatives is inherently difficult, with substantial cross-sectional variability and many confounding influences, this approach could help separate signal from noise. There are tradeoffs inherent in our model. A principal feature relative to static designs is that the intervention groups continually change over the time period of evaluation, such that the "effect" is the sum of the impact of all the various iterations throughout the course of the study. This means that the results reflect the evolution and improvement of the model over time and makes it difficult to isolate the impact of specific interventions. The result is a test of an approach which, while it can be clearly described, is more difficult to replicate exactly than a static intervention. Another limitation is that if multiple versions are to be explicitly tested, sample size requirements increase accordingly. We do not think premature evaluation of underdeveloped concepts is the fundamental problem; the problem is more one of inadequate evaluation of concepts that have shown some initial promise but are then not subjected to ongoing evaluation, assessment, and improvement as they move toward wider-scale adoption.
Efficiently facilitating innovation will be important for the CMMI and health care delivery organizations as new models of payment and health care delivery are developed. This will be a key ingredient to maximizing the impact of the resources allocated to new initiatives in the least possible time, with the overall goal of improving the value of health care delivered in the United States quickly while building an evidence base and a process that will allow us to continue to do better.

References

1. Simon HA. The Sciences of the Artificial. 3rd ed. Cambridge, MA: The MIT Press; 1996.
2. Campbell DT, Stanley JC. Experimental and Quasi-Experimental Designs for Research. Dallas: Houghton Mifflin Company; 1963.
3. Sommer S, Loch C. Selectionism and learning in projects with complexity and unforeseeable uncertainty. Management Science. 2004;50:1334–1347.
4. MacCormack A, Verganti R, Iansiti M. Developing products on internet time: the anatomy of a flexible development process. Management Science. 2001;47:133–150.
5. Eisenhardt K, Tabrizi B. Accelerating adaptive processes: product innovation in the global computer industry. Administrative Science Quarterly. 1995;40:84–110.
6. Terwiesch C, Loch C. Measuring the effectiveness of overlapping development activities. Management Science. 1999;45:455–465.
7. Boehm B. A spiral model of software development and enhancement. Computer. 1988;21:61–72.
8. Connell J, Shafer L. Structured Rapid Prototyping—An Evolutionary Approach to Software Development. Englewood Cliffs, NJ: Prentice Hall; 1989.
9. Noble DJ, Casalino LP. Can accountable care organizations improve population health?: should they try? JAMA: The Journal of the American Medical Association. 2013;309:1119–1120.
10. Loch C, Terwiesch C, Thomke S. Parallel and sequential testing of design alternatives. Management Science. 2001;47:663–678.
11. MacCormack A. Management lessons from Mars. Harvard Business Review. 2004;32:18–19.
12. Christensen CM, Raynor ME.
The Innovator's Solution: Creating and Sustaining Successful Growth. Boston, MA: Harvard Business School Press; 2003.
13. McGrath RG, MacMillan IC. Discovery Driven Planning. Boston, MA: Harvard Business School Press; 1995.
14. Alexander C; 1964.

Healthcare 1 (2013) 8–11

Perspectives Paper

Will new care delivery solve the primary care physician shortage?: A call for more rigorous evaluation

Clese E. Erikson, Center for Workforce Studies, Association of American Medical Colleges, 2450 N Street, NW, Washington, DC 20037, USA

Article history: Received 1 March 2013; Received in revised form 12 April 2013; Accepted 18 April 2013; Available online 9 May 2013

Abstract: Transformations in care delivery and payment models that make care more efficient are leading some to question whether there will really be a shortage of primary care physicians. While it is encouraging to see numerous federal and state policy levers in place to support greater accountability and coordination of care, it is too early to know whether these efforts will change current and future primary care physician workforce needs. More research is needed to inform whether efforts to reduce cost and improve quality of care and population health will help alleviate or further exacerbate expected primary care physician shortages.

Keywords: Primary care shortages; Team-based care; Triple aim; Accountable care organizations; Payment reform; Workforce

1. Introduction

Health care in the United States is undergoing major transformations in care delivery and payment models that are bringing us closer to the goals of the IHI “triple aim”1 of improving both the patient experience of care and population health while simultaneously reducing cost.
New payment models that reward value over volume, such as Accountable Care Organizations (ACOs) and increased payments for practices recognized as Patient-Centered Medical Homes (PCMHs), are providing incentives for primary care practices to redesign care delivery, leading them to integrate novel team members and new technology to facilitate highly-coordinated, team-based care. While much of the innovation at the national level is spurred by the Affordable Care Act,2,3 states4 and even private insurance companies5,6 are also experimenting by paying more for care coordination in the hopes of reducing duplication of effort and avoidable hospitalizations and emergency room visits. While there is a growing body of evidence that such transformations in care delivery and payment models hold significant potential for helping to advance the triple aim,7–10 there is very little empirical evidence available to assess what effect these changes will have on workforce needs. New models of primary care are increasingly incorporating team members such as nurse practitioners (NPs), physician assistants (PAs), medical assistants (MAs), health coaches, care coordinators, and community health workers who facilitate task delegation and shared decision-making, thus potentially reducing the need for more physicians. Furthermore, advances in technology such as the adoption of electronic medical records (EMRs) and telemedicine facilitate a shift away from in-person office visits, which could also mitigate physician shortages.
Thus, in an era of expected primary care physician shortages,11–14 these transformations in care delivery and payment models offer potential solutions for offsetting primary care shortages by incentivizing a more productive and efficient health care workforce.15–18 On the other hand, the fact that these new models rely upon a robust primary care workforce to provide an increased level of services and attention to quality outcomes may mean that we will actually need more primary care physicians than has been projected. It is also possible that advances in technology will not reduce the need for physicians to the extent that has been hypothesized, and could actually drive up demand for services if physicians become more accessible to patients through shared records and alternative visits through video or email. Given the heightened focus on ensuring we have an adequate primary care physician workforce, this paper first describes what is known about the workforce implications of new primary care models that incorporate team-based care, before going on to illuminate several additional factors which must be taken into account when considering the effect that the use of teams will have on primary care capacity. The paper then turns to the role that technology may play in easing physician shortages and raises some additional points to consider. It concludes by calling for policy makers, researchers, and funding agencies to incorporate workforce implications into evaluations of the new models of care currently being tested.
The evaluation plans that have been proposed thus far do not include an explicit focus on measuring and evaluating the impact these transformations have on provider capacity or demand for primary care services.19–21 Furthermore, patient panel size—a common metric used to discuss provider capacity—is fraught with ambiguity.22 Developing reliable metrics for evaluating primary care capacity and funding large-scale workforce evaluations will be essential to help policy makers and health care providers identify and adopt the most efficient models of care and better inform understanding of the workforce needed to support the triple aim.

2. Teams offer potential for increasing primary care capacity

Numerous studies11–14 project primary care physician shortages; those that take into account both the aging population and the expansion of insurance coverage under the Affordable Care Act generally estimate shortages of 35,000–50,000 full-time-equivalent primary care physicians by 2025. However, as a recent report by the Bipartisan Policy Center and Deloitte Center for Health Solutions points out, current models do not adequately account for new delivery models or the role that other professionals are playing in care delivery.23 In order to model the effects of care redesign on primary care capacity, researchers at the Columbia Business School and the University of Pennsylvania's Wharton School recently designed a theoretical model showing that increasing patient panels to 3400 as a result of open-access scheduling, team-based approaches to care, and increased use of electronic visits could potentially offset primary care physician shortages.16 This is a significant increase over the estimated 2300 patients per provider thought to be the average among primary care providers in the United States,24 or the figure of 2500 patients per provider used in some projection estimates.16 A new study25 examining innovative primary care practices also points to the potential of team-based
care to lead to greater efficiencies, demonstrating the way some practices are streamlining visits to reduce physician time on clerical work and other tasks that can be delegated through enhanced protocols and standing orders. Having lab tests ordered prior to the visit and improving team functioning through co-location, team meetings, and workflow mapping were also cited as examples for improving efficiency. While this study did not focus on panel sizes, the researchers found that these efforts to redesign primary care can increase job satisfaction and, in some cases, lead to increased capacity. Cleveland Clinic Strongsville, for example, was able to increase the number of patient visits per day from 21 to 28 by using RNs and MAs in expanded roles without hiring additional physicians. Beyond helping physicians make the most efficient use of their day, this research shows potential for new care models to reduce burn-out and thus increase workforce retention. Furthermore, if medical students are exposed to these professionally satisfying models of care during clerkships and residencies, the number electing to become primary care physicians could increase as well.

3. Several missing factors must be taken into account

Despite the potential offered by team-based care, several additional factors must be taken into account. The following three sections address the issues that should also be considered when evaluating the effect of team-based care on primary care physician capacity.

3.1.
Improved care coordination and disease management may require smaller patient-to-provider ratios

The theoretical model proposed by Green and her colleagues does not take into account changes in levels of service that often accompany a team-based approach to care, such as improved care coordination and disease management, medication reconciliation, greater attention to behavioral health and other specialist access, and an increased focus on patient education.26,27 Other analysts suggest much more modest panel sizes of 1400–1900 are needed in order to accommodate the increased level of services in a team-based approach to care, more in line with what is found in practice.28 For example, Group Health Cooperative of Puget Sound reduced patient panel sizes from 2300 to 1800 in order to increase visit length and provide higher quality of care when they moved to a team-based model.29 Another factor to consider is that not all panels are created equal. Panels with high percentages of chronic care patients are often smaller, as demonstrated by the Special Care Center in Atlantic City, founded by Rushika Fernandopulle.30 The clinic, which focuses treatment on the casino workers who are sickest and thus most expensive to treat, has approximately 600 patients per MD for the two physicians in that practice, plus two NPs, eight health coaches, and one full-time social worker. The discrepancies between what is possible under theoretical models and what has been achieved in practices such as Group Health and the Special Care Center—both leaders in providing high quality care under innovative delivery models—point to the need for more systematic research to determine how panel size is changing as a result of innovations in payment models and care delivery.
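The sensitivity of workforce projections to assumed panel size can be made concrete with a back-of-envelope sketch. This is an illustration only, not the Green et al. model or any of the cited projections: the population figure is a hypothetical placeholder, while the panel sizes (the theoretical 3400, the 2500 used in some projections, and the 1800 observed at Group Health) come from the discussion above.

```python
# Hypothetical sketch: physicians required to cover a population
# at a given average panel size. Only the panel sizes are taken
# from the text; the population served is an assumed placeholder.

def physicians_needed(population: int, panel_size: int) -> int:
    """Ceiling division: each physician covers at most panel_size patients."""
    return -(-population // panel_size)

population = 300_000_000  # assumed, for illustration only

for panel in (1800, 2500, 3400):
    print(panel, physicians_needed(population, panel))
```

Under these assumptions, moving from 2500- to 3400-patient panels cuts the requirement by roughly 31,800 physicians, on the order of the projected shortages, while the 1800-patient panels observed in practice push the requirement sharply in the other direction. The point is not the specific numbers but how strongly the answer depends on the assumed panel size.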
Accountable Care Organizations (ACOs), patient-centered medical homes (PCMHs), and other new delivery models feature a more intensive population-based approach to health along with increased care management efforts, which requires a robust primary care workforce.31 As we continue to collect data on how capacity changes under new models of care, it will be important to better understand the extent to which increased visit capacity translates to increased panel size versus improved access and continuity of care for existing patients.

3.2. Changes in the supply and scope of midlevel providers

As more practices come to rely upon team-based care, the number of NPs32 and PAs33 in training is increasing rapidly. Recently, momentum for expanding the scope of practice for NPs has been growing in response to projected demand for primary care,34 and some have argued that more states should follow the 16 that already allow NPs to practice completely independently of physician oversight so that they could step in to ease the shortage of primary care physicians. Meta-analyses of numerous pilots and research studies have found that the quality of care delivered by NPs specializing in primary care is at least as high as that of physicians,35 and patients also exhibit high levels of satisfaction with NPs.36 However, given the vast variation in states' scope-of-practice laws,34 there is no guarantee that consensus will be reached across all 50 states, which could limit the extent to which NPs will be able to take on a greater share of the primary care workload. Even if they are allowed to practice independently, it is worth noting that not all NPs choose to specialize in primary care. Recent estimates suggest that only a little over half of NPs currently practice in that field,37 limiting the numbers who could be used to alleviate primary care physician shortages.

3.3.
Growing numbers of part-time providers also impact projected need

The growing interest among practitioners in working part-time is another factor that complicates primary care physician supply calculations; in fact, team-based care is highly compatible with that preference. In a well-structured, team-based model, physicians do not need to be available all the time because the practice is deliberately designed to make sure that all team members can access patient information and share in the care. While there is no good source of national data on the percent working part-time, a recent survey of primary care physicians in Washington found that nearly one out of three work part-time and on average have 16% smaller patient panels than their full-time peers (1481 vs. 1764, respectively).38 Physician shortage projections do take account of part-time work, but may underestimate the extent of this growing trend. Furthermore, the projected need is for additional FTE physicians; because some of those physicians will work part-time, the total headcount required is far greater, further complicating models that suggest we do not need any additional doctors.

4. The use of technology to address workforce needs

In addition to the use of new team members, technology has been heralded as a key component of new care delivery models that has the potential to increase efficiency and potentially drive down health care utilization,39 in turn saving costs and potentially decreasing the number of physicians necessary to care for a given population. For example, EMRs allow multiple providers access to all information about a single patient, thus reducing duplication of testing, increasing the speed of diagnosis, and facilitating co-management of patients.
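The FTE-versus-headcount distinction in Section 3.3 above can be sketched in a few lines. This is an illustrative calculation only: the one-in-three part-time share comes from the Washington survey cited in the text, but the average FTE fraction worked by part-time physicians is an assumed placeholder, not a figure from the article.

```python
# Hypothetical sketch: converting a projected FTE shortage into a
# physician headcount under an assumed prevalence of part-time work.
# The part-time FTE fraction (0.5) is an assumption for illustration.

def headcount_needed(fte_shortage: float, part_time_share: float,
                     part_time_fte: float = 0.5) -> float:
    """Physicians needed so their combined FTEs cover the shortage."""
    avg_fte = (1 - part_time_share) * 1.0 + part_time_share * part_time_fte
    return fte_shortage / avg_fte

# With everyone full-time, FTEs equal headcount.
print(headcount_needed(35_000, 0.0))  # 35000.0
# With one in three physicians part-time at an assumed half-time load:
print(round(headcount_needed(35_000, 1/3)))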
Other technological innovations, including videoconferencing, secure messaging through patient portals, email or phone visits, and electronic monitoring of patients at home, have also been shown to improve patient access to health care and clinical outcomes while reducing emergency room utilization and health care costs.40,41 A striking example of the use of technology to reduce the need for physicians is Virtuwell, an online “clinic” created by HealthPartners in Minneapolis, MN, available to patients 24 hours a day, 7 days a week for 40 conditions. Patients are effectively triaged by the system by answering a series of questions about symptoms and history, and, within 30 minutes, receive a treatment plan and have a prescription transmitted to their chosen pharmacy if necessary. Patients and providers alike are pleased with the system, which also has documented cost savings.42 Despite their promise, recent studies suggest that a number of challenges—including concerns about privacy and security, the lack of interoperability between different vendors, and reluctance on the part of hospitals and physicians—must be overcome before EMRs can realize their full potential,43 and in fact a recent review concluded that EMRs to date have had little impact on efficiency or health care costs.44 While they still hold great promise, it remains to be seen whether these challenges will be overcome and EMRs will alter our workforce needs. It is also unclear whether the shift away from in-person office visits that is facilitated by technology will necessarily result in reduced need for physicians.
A recent study45 from Kaiser Permanente Colorado, a leader in leveraging technology to improve patient access to care and efficient use of physician time, suggests that use of clinical services—including in-person office visits—may actually increase when patients have access to their doctors through electronic portals, casting some doubt on whether use of technology will completely eliminate physician shortages in primary care. Taken together, these factors raise caveats that must be further evaluated by studying the use of technology in practices and collecting more information about the impact on workforce needs.

5. Conclusion

As we enter a new era of health care delivery driven by the triple aim and its approach to improving the quality of care while cutting costs, we must not lose sight of health workforce needs and should ask whether these efforts will help match physician supply to demand. It is true that the medical home model is gaining traction and that technology is being used in innovative ways, but that does not de facto mean we will need fewer primary care physicians. In evaluating the effects on the future physician workforce, it is especially important to consider the increased level of primary care service that is inherent in supporting the success of these transformation efforts. While some have proposed that innovations in care delivery, payment models, and the use of technology will eliminate physician shortages, we must evaluate evidence from practices that have adopted these measures before we draw any conclusions. Doing so will require debate and analysis of the best metrics for capturing workforce data, followed by collection of those data from practices that have adopted new models of care delivery. Agencies that are funding demonstration projects for new models of care should incorporate workforce analyses into their evaluation process.
Metrics such as panel size and visit volume, which are commonly referred to in analyses of workforce but not always explained, must be clearly defined and used consistently in order to perform evaluations and make comparisons. Unless further attention is granted to evaluating the way new care delivery and payment models affect workforce needs, we run the risk of falling short of the workforce required to sustain these innovative efforts. Including these considerations in the assessment of new care delivery models will help us identify the most efficient approaches and help those following in the footsteps of early pioneers to design programs that make the most of our health workforce.

Acknowledgments

I want to thank my colleagues Scott Shipman, MD, Director of Primary Care Initiatives and Workforce Research, and Stacie Harbuck, MA, Research Analyst, who conducted site visits to team-based practices, for their insights on the impact on productivity and efficiency. I would also like to acknowledge Shana Sandberg, PhD, Research Writer, for her substantial contributions in reviewing and editing this commentary.

References

1. Institute for Healthcare Improvement. The IHI triple aim. 〈http://www.ihi.org/offerings/Initiatives/TripleAim/Pages/default.aspx〉. Accessed 5.04.2013.
2. Iglehart JK. Primary care update–light at the end of the tunnel? The New England Journal of Medicine. 2012;366(23):2144–2146.
3. Center for Medicare and Medicaid Innovation. Where innovation is happening. 〈http://innovation.cms.gov/initiatives/map/index.html〉. Accessed 26.02.2013.
4. National Academy for State Health Policy. Medical home and patient centered care map. 〈http://nashp.org/med-home-map〉. Accessed 26.02.2013.
5. Dentzer S. One payer's attempt to spur primary care doctors to form new medical homes. Health Affairs. 2012;31:341–349.
6. Raskas RS, Latts LM, Hummel JR, Wenners D, Levine H, Nussbaum SR.
Early results show WellPoint's patient-centered medical home pilots have met some goals for costs, utilization, and quality. Health Affairs. 2012;31(9):2002–2009.
7. Gilfillan RJ, Tomcavage J, Rosenthal MB, et al. Value and the medical home: effects of transformed primary care. American Journal of Managed Care. 2010;16(8):607–614.
8. Maeng DD, Graf TR, Davis DE, Tomcavage J, Bloom FJ. Can a patient-centered medical home lead to better patient outcomes? The quality implications of Geisinger's ProvenHealth Navigator. American Journal of Medical Quality. 2012;27(3):210–216.
9. Milstein A, Gilbertson E. American medical home runs. Health Affairs. 2009;28(5):1317–1326.
10. Rosenberg CN, Peele P, Keyser D, McAnallen S, Holder D. Results from a patient-centered medical home pilot at UPMC health plan hold lessons for broader adoption of the model. Health Affairs. 2012;31(11):2423–2431.
11. Kirch DG, Henderson MK, Dill MJ. Physician workforce projections in an era of health care reform. Annual Review of Medicine. 2012;63:435–445.
12. Colwill JM, Cultice JM, Kruse RL. Will generalist physician supply meet demands of an increasing and aging population? Health Affairs. 2008;27(3):W232–W241.
13. Petterson SM, Liaw WR, Phillips RL Jr., Rabin DL, Meyers DS, Bazemore AW. Projecting US primary care physician workforce needs: 2010–2025. Annals of Family Medicine. 2012;10(6):503–509.
14. Huang ES, Finegold K. Seven million Americans live in areas where demand for primary care may exceed supply by more than 10 percent. Health Affairs. 2013;32(3):1–8.
15. Dentzer S. It's past time to get serious about transforming care. Health Affairs. 2013;32(1):6.
16. Green LV, Savin S, Lu Y. Primary care physician shortages could be eliminated through use of teams, nonphysicians, and electronic communication. Health Affairs. 2013;32(1):11–19.
17. Health Policy Brief: Nurse Practitioners and Primary Care. Health Affairs.
〈http://www.healthaffairs.org/healthpolicybriefs/brief.php?brief_id=79〉; October 25, 2012. Accessed 1.03.2013.
18. Ubel P. Why the primary care physician shortage is overblown. 〈http://www.kevinmd.com/blog/2013/03/primary-care-physician-shortage-overblown.html〉. Accessed 3.04.2013.
19. Fisher ES, Shortell SM, Kreindler SA, Van Citters AD, Larson BK. A framework for evaluating formation, implementation, and performance of accountable care organizations. Health Affairs. 2012;31(11):2368–2378.
20. Shrank W. The center for medicare and medicaid innovation's blueprint for rapid-cycle evaluation of new care and payment models. Health Affairs. 〈http://content.healthaffairs.org/content/early/2013/03/21/hlthaff.2013.0216〉; 2013 March 27. Accessed April 3, 2013 [Epub ahead of print].
21. Nielson M, Langer B, Zema C, Hacker T, Grundy P. Benefits of implementing the primary care patient-centered medical home: a review of cost and quality results. Washington, DC: Patient Centered Primary Care Collaborative. 〈http://www.pcpcc.net/sites/default/files/media/benefits_of_implementing_the_primary_care_pcmh.pdf〉; 2012. Accessed 4.04.2013.
22. Murray M, Davies M, Boushon B. Panel size: how many patients can one doctor manage? Fam Pract Manag. 2007;14(4):44–51.
23. Deloitte Center for Health Solutions and Bipartisan Policy Center. The complexities of national health care workforce planning. 〈http://bipartisanpolicy.org/sites/default/files/BPC%20DCHS%20Workforce%20Supply%20Paper%20Feb%202013%20final.pdf〉; 2013. Accessed 25.02.2013.
24. Alexander GC, Kurlander J, Wynia MK. Physicians in retainer (“concierge”) practice. A national survey of physician, patient, and practice characteristics. Journal of General Internal Medicine. 2005;20(12):1079–1083.
25. Sinsky CA, Willard R, Schutzbank AM, Sinsky TA, Margolius D, Bodenheimer T. In search of joy in practice: a report of twenty-three high functioning primary care practices. Annals of Family Medicine. 2013 [in press].
26. Bodenheimer T. Primary care—will it survive?
New England Journal of Medicine. 2006;355(9):861–864.
27. Grumbach K, Bodenheimer T. A primary care home for Americans: putting the house in order. JAMA. 2002;288(7):889–893.
28. Altschuler J, Margolius D, Bodenheimer T, Grumbach K. Estimating a reasonable patient panel size for primary care physicians with team-based task delegation. Annals of Family Medicine. 2012;10(5):396–400.
29. Medical home features small panels, long visits, outreach, and caregiver collaboration, leading to less staff burnout, better access and quality, and lower utilization. AHRQ Health Care Innovations Exchange. 〈http://www.innovations.ahrq.gov/content.aspx?id=2703〉; 2010. Accessed 25.02.2013.
30. Gawande A. The hot spotters. The New Yorker. 〈http://www.newyorker.com/reporting/2011/01/24/110124fa_fact_gawande〉; 2011 January 24. Accessed 25.02.2013.
31. Rittenhouse DR, Shortell SM, Fisher ES. Primary care and accountable care—two essential elements of delivery-system reform. New England Journal of Medicine. 2009;361(24):2301–2303.
32. Auerbach DI. Will the NP workforce grow in the future? New forecasts and implications for healthcare delivery. Medical Care. 2012;50(7):606–610.
33. Hooker RA, Cawley JF, Everett CM. Predictive modeling the physician assistant supply: 2010–2025. Public Health Reports. 2011;126:708–716.
34. Schiff M. The role of nurse practitioners in meeting increasing demand for primary care. Washington, DC: National Governors Association; 2012.
35. Naylor M, Kurtzman E. The role of nurse practitioners in reinventing primary care. Health Affairs. 2010;29(5):893–899.
36. Knudtson N. Patient satisfaction with nurse practitioner service in a rural setting. Journal of the American Academy of Nurse Practitioners. 2000;12(10):405–412.
37. Agency for Healthcare Research and Quality. The number of nurse practitioners and physician assistants practicing primary care in the United States. 〈http://www.ahrq.gov/research/findings/factsheets/primary/pcwork2/index.html〉; 2010.
Accessed 12.04.2013.
38. Skillman SM, Fordyce MA, Yen W, Mounts T. Washington state primary care providers survey, 2011–2012: summary of findings. 〈http://depts.washington.edu/uwrhrc/uploads/OFM_Report_Skillman.pdf〉. Accessed 25.02.2013.
39. Hillestad R, et al. Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Affairs. 2005;24(5):1103–1117.
40. Vo A, Brooks GB, Farr R, Raimer B. Benefits of telemedicine in remote communities and use of mobile and wireless platforms in healthcare. The UTMB Center for Telehealth Research and Policy. 〈http://telehealth.utmb.edu/presentations/Benefits_Of_Telemedicine.pdf〉. Accessed 3.04.2013.
41. Sequist TD, Cullen T, Acton KJ. Indian Health Service innovations have helped reduce health disparities affecting American Indian and Alaska Native people. Health Affairs. 2011;30(10):1965–1973.
42. Courneya PT, Palattao KJ, Gallagher JM. HealthPartners' online clinic for simple conditions delivers savings of $88 per episode and high patient approval. Health Affairs. 2013;32(2):385–392.
43. Adler-Milstein J, Jha AK. Sharing clinical data electronically: a critical challenge for fixing the health care system. Journal of the American Medical Association. 2012;307(16):1695–1696.
44. Kellermann AL, Jones SS. What it will take to achieve the as-yet-unfulfilled promises of health information technology. Health Affairs. 2013;32(1):63–68.
45. Palen TE, Ross C, Powers JD, Xu S. Association of online patient access to clinicians and medical records with use of clinical services. Journal of the American Medical Association. 2012;308(19):2012–2019.

Healthcare 1 (2013) 12–14

Perspectives Paper

Commentary on the spread of new payment models

Michael E. Chernew, Johan S.
Hong
Department of Health Care Policy, Harvard Medical School, 180 Longwood Avenue, Boston, MA 02115, USA

Article history: Received 12 March 2013; Received in revised form 25 April 2013; Accepted 30 April 2013; Available online 9 May 2013

Keywords: Health economics; Payment model innovation

1. Introduction

Although there have long been many settings where providers receive a global payment or budget from payers, such payment has been the exception, not the norm. There are many signs that the health care system is moving more rapidly away from fee-for-service payment towards payment models that bundle payments either globally or across episodes of care (bundled payments). For example, in the public sector, the Medicare program has set up a model of payment for Accountable Care Organizations (ACOs). Organizations participating in the Pioneer ACO program accept a global population-based budget and face both upside and downside risk for all medical care delivered to Medicare beneficiaries who receive most of their care from the ACO. Organizations in the Medicare Shared Savings ACO program also receive a global budget to care for their Medicare population and initially share only in upside savings, but over time will transition to face downside risk. Interest in the ACO programs has been substantial.1 Since the first selection of 27 ACOs for the Medicare Shared Savings Program in April 2012, the number of ACOs has grown with each successive round, from 88 selected in July 2012 to 106 selected in January 2013.2 States are also encouraging movement away from fee-for-service (FFS). For example, in Arkansas, Medicaid and private insurers will pay physicians episode-based bundled payments with upside and downside risk.3 Oregon has implemented a coordinated care organization model in Medicaid that is similar
to the ACO model and uses global payments that, by 2014, will combine payments for physical, behavioral, and dental health into a single global payment.4 In 2009, the Massachusetts Special Commission on the Health Care Payment System unanimously voted to transition the state from an FFS to a global payment system by 2014, though legislation is not that aggressive.5,6 In addition, eight states that are participants in the Multi-payer Advanced Primary Care Practice (MAPCP) demonstration by the Center for Medicare & Medicaid Innovation (CMMI) are implementing global budgets and quality incentives in addition to an FFS system.7 Efforts in the private sector have also grown. Blue Cross Blue Shield of Massachusetts implemented the alternative quality contract (AQC) that brought provider groups into global budget contracts with quality-based performance bonuses.5,8 Reports in early 2012 suggested one in five Massachusetts residents were enrolled in coverage linked to global budgets.6,9 CareFirst in the Washington, D.C. area adopted the Patient-Centered Medical Home model that makes per-member-per-month global payments to participating practices with a shared-savings component.10 These are just a few examples, but provider interest in new payment models seems to be growing. In Massachusetts, the AQC system grew from only a small number of providers accepting significant risk in 2008 to 86 percent in 2013. This experience indicates that the transition to new payment systems may happen much more rapidly than some observers may envision. The opinions expressed in this commentary are informed largely by this experience in Massachusetts.
Our focus is on global payment (including global budget) models, though many concerns apply to models that pay for episodes of care using bundled rates. Providers may have many motivations for participating, but the prospect of flat or falling reimbursement rates is likely a significant factor. Unlike in FFS, under the new payment models providers can capture savings if they can deliver care more efficiently. When the money gets tight, providers want more control over the money.

Existing evidence, though limited, suggests providers can be successful. Relative to groups receiving FFS payment, the AQC was associated with a $15.51 (1.9%) decrease in average quarterly spending per enrollee and increases in quality of chronic care management and pediatric care.8 Over 2 years, the subgroup of providers who entered the AQC from a fee-for-service contract saved $60.75 (8.2%) in average quarterly spending and demonstrated an improvement in adult preventive care.11 These savings largely reflect referrals to less expensive providers, particularly in the first year, and utilization reductions. Participating providers largely capture these savings.12 Other experience with fixed payments to providers, accumulated over decades, shows that when providers were given fixed payments for patients, they reduced spending.13–15 In fact, experience with fixed payments under California's managed care systems showed a reduction in hospital cost growth, largely through reductions in utilization.16

Some observers have expressed concern that while global payments have been shown to alter provider behavior, the associated savings are captured by the providers themselves, not by the payers and ultimately beneficiaries or taxpayers.17 This should not be a major concern. We should not worry very much about providers making profits if those profits are achieved through greater efficiency (as opposed to excessive pricing).
These systems are designed to give providers incentives to be more efficient. The incentives inherently mean that providers will capture some of the savings from efficiency—the greater the shared-savings percentage, the stronger the incentives. Yet, ultimately, a good payment system requires providers to be able to succeed in the new environment. If they achieve savings, and thus earn profits, the system will be able to survive with lower payment updates. This slower payment trajectory is the real metric of success, because long-run growth rates will swamp short-term savings. If the profits are captured by payers, incentives for efficiencies diminish, and without such efficiencies the system will not be fiscally sustainable in the future.

Yet despite the incentives, the success of global payment systems is not a certainty. Some of the concerns are obvious. For example, will organizations beyond the few success stories be able to transform themselves to meet their objectives? Optimizing the workforce in the new environment will be important. The cultural changes are significant. Coordination across different institutions, and the associated need for information technology, is also important. If providers integrate, which seems like a reasonable expectation, we must be concerned that the resulting market power will lead them to increase prices in the private sector.18–20 Moreover, in integrated models, incentive systems to allocate funds across different groups in the integrated system will be needed. Those systems may be FFS, but operate with managerial controls that may distinguish them from existing FFS systems.
For example, CareFirst's patient-centered medical home builds upon the existing FFS system with participation enhancements and outcomes-based incentives that allow participating primary care physicians to earn fees above the base fee schedule.21 It is certainly possible that the cost of running these systems will exceed the savings.22 Finally, designing effective regulation will be a necessity, and much work remains to be done in this area.

Even if we transition to payment models in which providers can reduce spending (or lower the rate of spending growth), there is a concern about quality of care. While global payments provide incentives to reduce costs, efforts to control utilization could, in theory, lead to denial of care and thus lower quality. Most global payment models implemented today include incentives to improve quality, and in fact, evidence suggests they have been successful.11,23 Yet quality measures are incomplete, and concerns that systems facing incentives to provide less care will not provide optimal care are genuine.

Several factors balance out this concern. Specifically, in a global payment model, providers have incentives to coordinate care across all settings, which could improve quality. Similarly, providers bear the cost of adverse events, so they have incentives to minimize bad outcomes and improve population health. In an FFS system, providers do not get paid for many quality-enhancing activities, but a global model can create incentives for such initiatives.
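The basic arithmetic of a global budget with two-sided risk and a quality bonus, as discussed here, can be illustrated with a stylized settlement calculation. This is a minimal sketch with hypothetical figures, not any payer's actual contract terms; the function name and parameters are invented for illustration.

```python
def settle(budget, actual_spend, shared_savings_pct, quality_bonus_pct=0.0):
    """Stylized two-sided global-budget settlement (illustrative only).

    Providers keep a share of any surplus against the budget and owe
    the same share of any deficit; a quality bonus is paid as a
    percentage of the budget, independent of the spending result.
    """
    surplus = budget - actual_spend        # positive = savings vs. budget
    shared = shared_savings_pct * surplus  # two-sided risk: share of surplus or deficit
    return shared + quality_bonus_pct * budget

# The larger the shared-savings percentage, the stronger the incentive:
weak = settle(1_000_000, 950_000, shared_savings_pct=0.3)      # keeps $15,000
strong = settle(1_000_000, 950_000, shared_savings_pct=0.8)    # keeps $40,000
overrun = settle(1_000_000, 1_050_000, shared_savings_pct=0.5) # owes $25,000
```

The same $50,000 of savings yields a larger provider payout under a higher shared-savings percentage, which is the sense in which a greater shared-savings share strengthens the efficiency incentive.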
More broadly, it is important to recognize that in the FFS system the incentives are to provide more care, which could lead to poor quality; in fact, there is ample evidence of overuse, which could lead to poor outcomes.24 For example, FFS encourages use of services such as back surgeries, which evidence suggests are often not needed.25–27 Continued monitoring and research will be needed in these new systems to support quality improvement and guard against stinting on care.

Other, less obvious concerns will also arise. For example, one concern is the ability of private-sector innovators to appropriate the value of their innovations. In a global payment model, providers capture some of the savings associated with more efficient care. However, that incentive exists only for the beneficiaries of the insurer who implements the program. Yet many providers serve patients from multiple plans, and they may practice similarly for all their patients.28 As a result, other insurers (or self-insured employers) may benefit from the spillover. This diminishes the incentives for insurers to adopt these programs and over time may diminish their spread—not for lack of provider interest, but because insurers may not be able to capture savings over time. Insurers can address this issue in several ways, including limiting their provider networks (though beneficiaries may want choice) or adopting services that complement the payment model but are limited to their own beneficiaries. These may include data support services that help target their beneficiaries, or targeted case management. Policymakers can minimize the free-rider/appropriability problem by encouraging all-payer payment systems. Moreover, provider preferences may further mitigate this concern if delivery systems seek out such contracts from all of their payers.
In fact, the five AQC groups in Massachusetts opted to participate in the Pioneer program, presumably in part because they want a single set of incentives across their patient population—not a world with FFS incentives for some patients and global budget/quality incentives for others. It is not clear whether this experience will generalize.

2. Conclusion

For decades, the health care system has been characterized largely by FFS payment which, given the prices that prevailed, generated incentives for providers to deliver more, and not necessarily better, care. The result has been an inefficient health care system with rising spending. Bundled or global payment models provide incentives to lower spending and can thus be an important part of efforts to address our fiscal concerns. As many observers have noted, there will be challenges to implementing a global payment model that lowers spending growth, allows all stakeholders to benefit from the captured savings, and maintains quality of care. Monitoring the progress of new adopters of bundled and global payment systems will be important for policymakers. While it is too soon to say with any degree of certainty whether global payments will succeed, global payment models provide an alternative to the currently unsustainable FFS system with the potential to significantly slow health care spending growth.

References

1. Muhlestein D. Continued growth of public and private accountable care organizations. Health Affairs Blog; 2013.
2. Centers for Medicare & Medicaid Services. Shared Savings Program: program news and announcements; 2013.
3. Emanuel EJ. The Arkansas innovation. The New York Times; 2012.
4. Stecker EC. The Oregon ACO experiment—bold design, challenging execution. New England Journal of Medicine; 2013.
5. Song Z, Landon BE. Controlling health care spending—the Massachusetts experiment. New England Journal of Medicine.
2012;366(17):1560–1561.
6. Mechanic RE, Altman SH, McDonough JE. The new era of payment reform, spending targets, and cost containment in Massachusetts: early lessons for the nation. Health Affairs. 2012.
7. Takach M. Reinventing Medicaid: state innovations to qualify and pay for patient-centered medical homes show promising results. Health Affairs. 2011;30(7):1325–1334.
8. Song Z, Safran DG, Landon BE, He Y, Ellis RP, Mechanic RE, et al. Health care spending and quality in year 1 of the alternative quality contract. New England Journal of Medicine. 2011;365(10):909–918.
9. Kowalczyk L. Cost-controlled health coverage gaining ground in Mass. Boston Globe. 2012.
10. Dentzer S. One payer's attempt to spur primary care doctors to form new medical homes. Health Affairs. 2012;31(2):341–349.
11. Song Z, Safran DG, Landon BE, Landrum MB, He Y, Mechanic RE, et al. The 'alternative quality contract,' based on a global budget, lowered medical spending and improved quality. Health Affairs. 2012;31(8):1885–1894.
12. Chernew ME, Mechanic RE, Landon BE, Safran DG. Private-payer innovation in Massachusetts: the 'alternative quality contract'. Health Affairs. 2011;30(1):51–61.
13. Landon BE, Reschovsky JD, O'Malley AJ, Pham HH, Hadley J. The relationship between physician compensation strategies and the intensity of care delivered to Medicare beneficiaries. Health Services Research. 2011;46(6 pt 1):1863–1882.
14. Liu C-F, Chapko MK, Perkins MW, Fortney J, Maciejewski ML. The impact of contract primary care on health care expenditures and quality of care. Medical Care Research and Review. 2008;65(3):300–314.
15. Wieland D, Kinosian B, Stallard E, Boland R. Does Medicaid pay more to a Program of All-Inclusive Care for the Elderly (PACE) than for fee-for-service long-term care? The Journals of Gerontology Series A: Biological Sciences and Medical Sciences. 2013;68(1):47–55.
16. Zwanziger J, Melnick GA, Bamezai A. Costs and price competition in California hospitals, 1980–1990. Health Affairs. 1994;13(4):118–126.
17. Nardin R, Himmelstein D, Woolhandler S. Medical spending and global budgets. Health Affairs. 2012;31(11):2592.
18. Melnick GA, Zwanziger J, Bamezai A, Pattison R. The effects of market structure and bargaining position on hospital prices. Journal of Health Economics. 1992;11(3):217–233.
19. Vogt WB, Town R. How has hospital consolidation affected the price and quality of hospital care? 2006.
20. Capps C, Dranove D. Hospital consolidation and negotiated PPO prices. Health Affairs. 2004;23(2):175–181.
21. CareFirst BlueCross BlueShield. Patient-centered medical home program: program description and guidelines; 2011.
22. Bielaszka-DuVernay C. Vermont's blueprint for medical homes, community health teams, and better health at lower cost. Health Affairs. 2011;30(3):383–386.
23. Song Z, Safran DG, Landon BE, He Y, Ellis RP, Mechanic RE, et al. Health care spending and quality in year 1 of the alternative quality contract. New England Journal of Medicine. 2011;365(10):909–918.
24. Korenstein D, Falk R, Howell EA, Bishop T, Keyhani S. Overuse of health care services in the United States: an understudied problem. Archives of Internal Medicine. 2012;172(2):171.
25. Shreibati JB, Baker LC. The relationship between low back magnetic resonance imaging, surgery, and spending: impact of physician self-referral status. Health Services Research. 2011;46(5):1362–1381.
26. Deyo RA, Mirza SK, Martin BI, Kreuter W, Goodman DC, Jarvik JG. Trends, major medical complications, and charges associated with surgery for lumbar spinal stenosis in older adults. The Journal of the American Medical Association. 2010;303(13):1259–1265.
27. Rau J. Hospitals have got your back, maybe a little too quickly. Shots: Health News from NPR; 2011.
28. Glied S, Zivin JG. How do doctors behave when some (but not all) of their patients are in managed care? Journal of Health Economics. 2002;21(2):337–353.
Healthcare 1 (2013) 15–21

Global budgets and technology-intensive medical services

Zirui Song a,b,*, A. Mark Fendrick c,d, Dana Gelb Safran e,f, Bruce E. Landon a,g, Michael E. Chernew a,b

a Department of Health Care Policy, Harvard Medical School, United States
b National Bureau of Economic Research, United States
c Department of Internal Medicine, Center for Value-Based Insurance Design, University of Michigan, United States
d Department of Health Management & Policy, Center for Value-Based Insurance Design, University of Michigan, United States
e Blue Cross Blue Shield of Massachusetts, United States
f Department of Medicine, Tufts University School of Medicine, United States
g Division of General Medicine and Primary Care, Department of Medicine, Beth Israel Deaconess Medical Center, United States

Article history: Received 18 January 2013; accepted 10 April 2013; available online 9 May 2013.

Abstract

Background: In 2009–2010, Blue Cross Blue Shield of Massachusetts entered into global payment contracts (the Alternative Quality Contract, AQC) with 11 provider organizations. We evaluated the impact of the AQC on spending and utilization of several categories of medical technologies, including one considered high value (colonoscopies) and three that include services that may be overused in some situations (cardiovascular, imaging, and orthopedic services).

Methods: Approximately 420,000 unique enrollees in 2009 and 180,000 in 2010 were linked to primary care physicians whose organizations joined the AQC. Using three years of pre-intervention data and a large control group, we analyzed changes in utilization and spending associated with the AQC with a propensity-weighted difference-in-differences approach adjusting for enrollee demographics, health status, secular trends, and cost-sharing.
Results: In the 2009 AQC cohort, total volume of colonoscopies increased 5.2 percent (p = 0.04) in the first two years of the contract relative to control. The contract was associated with varied changes in volume for cardiovascular and imaging services, but total spending on cardiovascular services in the first two years decreased by 7.4% (p = 0.02) while total spending on imaging services decreased by 6.1% (p < 0.001) relative to control. In addition to lower utilization of higher-priced services, these decreases were also attributable to shifting care to lower-priced providers. No effect was found in orthopedics.

Conclusions: As one example of a large-scale global payment initiative, the AQC was associated with higher use of colonoscopies. Among several categories of services whose value may be controversial, the contract generally shifted volume to lower-priced facilities or services.

Keywords: Global payment; Medical technologies

☆ Funded by the Institute for Health Technology Studies, the Commonwealth Fund, and the National Institute on Aging.
* Corresponding author at: Department of Health Care Policy, Harvard Medical School, 180 Longwood Avenue, Boston, MA 02115, United States. E-mail address: [email protected] (Z. Song).

1. Introduction

The growth of health care spending is a major policy concern.1–4 Over the past few years, insurers have begun to adopt new payment strategies centered on moving away from fee-for-service towards bundled payments.5 Global budgets, the most inclusive form of bundled payment in which groups of physicians and hospitals—increasingly in the form of accountable care organizations (ACOs)—receive a fixed amount for all of a patient's care over a defined time period, are currently being implemented by Medicare and private insurers across the country.6,7 ACOs share savings if total medical spending for their patient population
comes in under the budget and may share deficits when spending exceeds the budget. This latter financial risk gives physician groups a strong incentive to contain spending.8

Economists have long concluded that medical technology is the dominant driver of health care spending growth.9–12 Over the past half century, rapid growth in medical technology has dramatically increased treatment options for many acute and chronic conditions. For example, cardiac catheterization has expanded medicine's ability to respond to ischemic heart disease. Imaging advancements such as computed tomography (CT) and magnetic resonance imaging (MRI) have revolutionized the speed and accuracy of diagnosis. New pharmaceuticals have introduced treatment options against diseases like cancer and rheumatoid arthritis, turning once-fatal diagnoses into livable chronic conditions. Advancements in surgery and image-guided interventions have broadened the scope of treatment for many conditions.

As medical technologies flourished, the appropriateness of their use became a subject of debate. In many situations, the appropriateness of an intervention is well accepted and backed by formal practice guidelines. For example, therapies for secondary prevention after a heart attack are generally both clinically effective and low-cost.13 Colonoscopy screening for colorectal cancer is also recognized as highly beneficial, especially in older populations.14–16 In other clinical scenarios, technologies may be inappropriately used in some circumstances. For example, percutaneous coronary intervention is frequently performed for nonacute indications across U.S.
hospitals, giving rise to overuse.17,18 The potential overuse of diagnostic imaging in recent decades has also come under scrutiny, such as CT or MRI for patients with low back pain without neurological symptoms or risk factors indicating a need for imaging.19,20 In the case of low back pain, reported health and functioning have not improved as spending on such imaging accrued, supporting the notion that both cost savings and clinical benefit (through avoiding unnecessary radiation exposure) could be achieved through lower use.21,22 Inappropriate imaging generated through self-referrals has been identified as an especially important problem, as physicians have increasingly owned their own diagnostic imaging equipment or facilities.23 Even preventive care is susceptible to overuse in certain populations, such as women who receive unnecessary screening for cervical cancer after undergoing hysterectomy for benign causes.24 Such scenarios are among the lists of low-value services produced by the American Board of Internal Medicine's "Choosing Wisely" campaign in partnership with 17 specialty societies.25,26

For many reasons, a determination of appropriateness or "value" for any clinical service is inherently difficult. The appropriateness of any individual service depends on a variety of factors, some of which are difficult if not impossible to measure. They include its timing in a patient's trajectory of care (removing a blood clot from the brain goes from high to low value in a matter of minutes), location of delivery (the same service delivered in a hospital can cost much less in a clinic), and the complex clinical situation within which the treatment decision is made (patients with multiple co-morbidities). This clinical nuance extends to areas of medicine in which the appropriateness of interventions depends on a patient's particular risk factors.
Other inputs include various dimensions of patient preferences and supply-side (physician or hospital) factors, such as capacity, practice volume, and specialization, which further complicate the value equation.27 Research suggests that in heart attack treatments, the value of any treatment strategy depends on the patient's fitness for the treatment as well as the expertise a provider has with the technology.28 Physician expertise through the volume-outcomes relationship is also sure to play a role.29,30 Thus, the value of any service may be quite heterogeneous across a population, with some services sure to have a wider distribution around the average than others.31 Nevertheless, appropriateness of care has become a focus of health policy, with major legislative efforts such as the Affordable Care Act motivated, in part, by regional variations in care suggesting that perhaps one-third of U.S. health care spending is wasteful.32,33 As the nation experiments with global budgets, it is important to understand their effects on technology-intensive services whose appropriateness can often be unclear. We studied the impact of a widespread global budget initiative in Massachusetts on several such categories of services: those thought to be beneficial (colonoscopies), those thought to be overused in some cases (cardiovascular and imaging), and those thought to be driven substantially by patient preferences (orthopedics).

1.1. The Alternative Quality Contract

In 2009, Blue Cross Blue Shield of Massachusetts (BCBS) entered into global budget contracts with 7 physician organizations in Massachusetts.34 Four additional organizations joined in 2010, and 15 groups were participating as of 2012. The "Alternative Quality Contract" (AQC) is a 5-year contract that pays organizations a global budget for the entire continuum of care for a population of enrollees in health maintenance organization plans.
Enrollees designate a primary care physician (PCP) each year, and budgets are allocated to the PCP's organization. More than 1600 PCPs and 3200 specialists practice in organizations participating in the AQC, ranging from multi-specialty practices to large tertiary care systems. The AQC also includes pay-for-performance bonuses of up to 10% of the organization's budget, based on both inpatient and outpatient quality measures. To support AQC organizations, BCBS provides technical assistance, including periodic reports that compare an organization's spending and quality to those of others.

Previous work found that the AQC was associated with a 1.9% reduction in spending and modest quality improvements in the first year.35 This grew to a 3.3% reduction in the second year with larger improvements in quality.36 On the whole, savings were achieved through lower prices in the first year, consistent with the organizations' focus on shifting referrals to less expensive providers.37 By the second year, reductions in utilization also contributed to savings.

2. Methods

2.1. Study design

The population included enrollees from January 2006 through December 2010 who were continuously enrolled for at least one calendar year. The 2009 intervention cohort consisted of 428,892 enrollees whose PCPs' organizations joined the AQC in 2009, and the 2010 cohort consisted of an additional 183,655 enrollees of organizations that joined in 2010. About 1.3 million enrollees whose PCPs did not participate in the AQC served as controls. Characteristics of the population have been described elsewhere. Enrollees averaged 35 years in age, with 50% female. Average cost-sharing was about 15%. These characteristics were stable across the study period.

We used a difference-in-differences approach to characterize the treatment effect. Our primary analysis consists of the 2009 cohort, for whom the pre-intervention period was 2006–2008 and the post-intervention period was 2009–2010. Within this cohort, we pre-specified 2 subgroups.
The "prior-risk" subgroup consisted of 4 organizations that had prior experience with risk-based contracts from BCBS (88% of the cohort), and the "no-prior-risk" subgroup consisted of 3 organizations that entered the contract without BCBS risk-contracting experience (12% of the cohort). Prior-risk organizations tended to be larger and more established in the marketplace, whereas the latter included physician groups formed in preparation for entering the risk contract. We also analyzed the 2010 AQC cohort, comprising 4 organizations without prior risk contracting, analogous to the no-prior-risk subgroup. We identified colonoscopy, cardiovascular, imaging, and orthopedic services using Current Procedural Terminology codes and the 2010 Berenson–Eggers Type of Service (BETOS) classification system from the Centers for Medicare and Medicaid Services.39

2.2. Statistical analysis

The dependent variable was spending (including patient cost sharing) in 2010 dollars, or utilization. Spending was computed from claims payments made within the global budget, which reflect negotiated fee-for-service prices. Utilization was computed by counting services. For ease of interpretation, we scaled utilization data to volume per thousand enrollees.

We used a multivariate linear model at the enrollee-quarter level. We controlled for age categories, interactions between age and sex, and enrollee risk score. Risk scores were calculated by BCBS from current-year demographics and diagnoses grouped by episodes of care, similar to the Hierarchical Coexisting Conditions (HCC) risk adjustment system used by the Centers for Medicare and Medicaid Services to adjust payments to Medicare Advantage plans.40 The risk score comes from a statistical model relating current-year spending to current-year diagnoses and demographic information. Higher scores denote greater expected spending.
In our base model, we also included indicators for intervention status, quarter, quarter-intervention interactions, the post-intervention period, and the interaction between intervention and the post-intervention period, which produced our estimate of the policy effect. To balance the sample on observable traits, the model used propensity weights calculated from age, sex, risk score, and cost-sharing. Given the possibility of unobserved demand-side incentives, the model included indicators for each specific benefit design within BCBS HMO plans. Consistent with our prior work, the model was not log-transformed, because the risk score is designed to predict dollar spending and linear models have been shown to better predict health spending than more complex functional forms.41–44 Standard errors were clustered at the practice level.45,46 Results are reported with 2-tailed p-values.

We tested the model for differences in pre-intervention trends between treatment and control enrollees. The lack of a difference in pre-intervention trends supports our identification strategy in the difference-in-differences framework. Resulting changes in spending associated with the AQC can be explained by changes in utilization (quantity) or changes in prices. We decomposed spending results by category into price and quantity components by standardizing the price for each service to its median price across all providers in 2006–2010. Differences in spending from repriced claims reflect differences in utilization. We further assessed whether the price effect was due to differential changes in negotiated fees or differential changes in referral patterns (referring patients to less expensive physicians or hospitals). We used our models of utilization to directly analyze the relationship between the AQC and quantity of specific services within each category.

Fig. 1. Colonoscopy utilization. For all figures, AQC denotes the 2009 intervention cohort, and Non-AQC denotes the control. The x-axis represents 2006–2010 in quarters, with the vertical line placed at the start of 2009 when the AQC was implemented.

Table 1. Changes in utilization in treatment and control groups (volume per 1000 enrollees per quarter). Treatment (AQC) N = 428,892; control (non-AQC) N = 1,339,798. The last three columns give the unadjusted and adjusted average 2-year effects (2009–10) and the p-value for the adjusted effect.

Category | AQC pre (2006–08) | AQC post (2009–10) | Change | Control pre | Control post | Change | Unadj. | Adj. | p
Colonoscopy, all enrollees | 13.53 | 14.60 | 1.07 | 12.62 | 12.36 | −0.26 | 1.33 | 0.70 | 0.04
Colonoscopy, over 50 | 40.76 | 40.95 | 0.19 | 35.09 | 33.11 | −1.98 | 2.17 | 1.90 | 0.13
CABG | 0.14 | 0.15 | 0.01 | 0.17 | 0.15 | −0.02 | 0.03 | 0.01 | 0.60
Aneurysm repair | 0.01 | 0.02 | 0.01 | 0.02 | 0.03 | 0.01 | 0.00 | 0.00 | 0.80
Endarterectomy | 0.04 | 0.03 | −0.01 | 0.03 | 0.04 | 0.01 | −0.02 | −0.01 | 0.07
Angioplasty | 0.58 | 0.49 | −0.09 | 0.62 | 0.60 | −0.02 | −0.07 | −0.12 | 0.02
Pacemaker | 0.14 | 0.15 | 0.01 | 0.18 | 0.19 | 0.01 | 0.00 | −0.02 | 0.41
Other cardiovascular | 3.71 | 2.85 | −0.86 | 4.26 | 3.00 | −1.26 | 0.40 | 0.15 | 0.46
Standard imaging | 251.99 | 269.85 | 17.86 | 265.21 | 271.10 | 5.89 | 11.97 | 6.19 | 0.045
CT | 38.23 | 39.21 | 0.98 | 41.30 | 39.59 | −1.71 | 2.69 | 1.16 | 0.07
MRI | 49.18 | 48.50 | −0.68 | 48.00 | 46.95 | −1.05 | 0.37 | −1.06 | 0.30
Ultrasound/echo | 99.56 | 95.80 | −3.76 | 96.22 | 89.00 | −7.22 | 3.46 | 0.47 | 0.72
Imaging procedures | 13.26 | 15.67 | 2.41 | 14.28 | 16.03 | 1.75 | 0.66 | 0.07 | 0.87
Hip replacement | 0.15 | 0.22 | 0.07 | 0.18 | 0.22 | 0.04 | 0.03 | 0.02 | 0.29
Knee replacement | 0.23 | 0.31 | 0.08 | 0.29 | 0.35 | 0.06 | 0.02 | 0.01 | 0.63
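The core of the estimation strategy, a propensity-weighted difference-in-differences regression in which the intervention-by-post-period interaction is the policy effect, can be sketched in a few lines. This is a minimal numpy illustration on synthetic data, not the authors' Stata specification: the covariates, benefit-design indicators, and practice-clustered standard errors described above are omitted, and the variable names are invented.

```python
import numpy as np

def weighted_did(y, treat, post, w):
    """Weighted least-squares difference-in-differences.

    Solves the normal equations (X'WX) b = X'Wy for a design with an
    intercept, treatment and post-period indicators, and their
    interaction; returns the interaction coefficient (the policy effect).
    """
    X = np.column_stack([np.ones_like(y), treat, post, treat * post])
    Xw = X * w[:, None]  # fold the diagonal weight matrix into X
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return beta[3]

# Synthetic enrollee-level data with a true policy effect of -5.
rng = np.random.default_rng(0)
n = 4000
treat = rng.integers(0, 2, n).astype(float)  # intervention vs. control
post = rng.integers(0, 2, n).astype(float)   # pre vs. post period
y = 100 + 2 * treat + 3 * post - 5 * treat * post + rng.normal(0, 1, n)
w = np.ones(n)  # propensity weights from a separate model would go here
effect = weighted_did(y, treat, post, w)  # close to the true effect of -5
```

With uniform weights this reduces to ordinary least squares; replacing `w` with inverse-propensity weights reweights the control group to resemble the treated group on observables, as in the paper's balancing step.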
Under a global budget, there may be incentives to upcode for increased payments, which would make patients seem sicker and thus make spending adjusted for health status seem lower relative to the control group. In prior work, we explored this concern and showed that risk score changes associated with the AQC explain only a nominal share of the spending difference. We used STATA software, version 11. The study was approved by Harvard Medical School.

3. Results

3.1. Colonoscopies

Over the first two years of the AQC, the 2009 cohort saw an increase of 0.7 (p = 0.04) colonoscopies per 1000 enrollees per quarter relative to control (Fig. 1), amounting to a 5.2% rise in the volume of colonoscopies (Table 1). Because colonoscopies are clinically indicated as a screening tool in patients 50 years or older, consistent with higher baseline volumes shown in Table 1, we repeated our analysis in this subpopulation. Estimates show a statistically insignificant increase of 1.9 (p = 0.13) colonoscopies per 1000 enrollees per quarter, or about 4.7%, relative to control. In the 2010 cohort, no changes in colonoscopy utilization associated with the AQC were found in its first year (−0.3, p = 0.48).

Spending on colonoscopies increased by 5.7% over the first 2 years in the 2009 cohort ($0.54 per enrollee per quarter). As Table 2 shows, this increase was statistically significant in year-1 (p = 0.005) but not in year-2 (p = 0.22). The 2010 cohort saw no changes in colonoscopy spending relative to control in its first year ($0.10, p = 0.88). Overall, changes in colonoscopy spending were driven by the changes in volume, as we did not find significant changes in colonoscopy prices or location of care associated with the AQC.

3.2. Cardiovascular services

Spending on cardiovascular services was higher in the post-AQC period relative to pre-AQC for both intervention and control subjects, but the difference was smaller in AQC subjects. The 2009 AQC cohort spent on average 7.4% less (−$1.47 per member per quarter, p = 0.02) relative to control. By year, these savings were 6.8% (p = 0.001) in year-1 and 7.7% (p = 0.04) in year-2 (Table 2). The greater difference in year-2 was unrelated to the removal of the 2010 cohort from the control group in year-2 (the 2010 cohort belonged to the control group in year-1). Although selection into the AQC was non-random, an interaction of the secular trend with the AQC indicator demonstrated no significant spending trend differences between AQC and non-AQC groups prior to the intervention. This suggests that the AQC effect is not explained by differential underlying trends in spending between the groups. The 2010 cohort spent 11.1% less (−$2.54, p = 0.004) on cardiovascular services relative to control in its first year.

Fig. 2. Cardiovascular utilization.

Table 2. Changes in spending in treatment and control groups ($ per enrollee per quarter). Treatment (AQC) N = 428,892; control (non-AQC) N = 1,339,798. The final columns give the unadjusted and adjusted average 2-year effects with the p-value for the adjusted effect, followed by the year-1 and year-2 effects with their p-values.

Category | AQC pre | AQC post | Change | Control pre | Control post | Change | Unadj. | Adj. | p | Year-1 (p) | Year-2 (p)
Colonoscopy, all enrollees | 9.42 | 11.64 | 2.22 | 9.39 | 10.65 | 1.26 | 0.96 | 0.54 | 0.16 | 0.55 (0.005) | 0.56 (0.22)
Colonoscopy, over 50 yrs | 27.75 | 31.73 | 3.98 | 25.39 | 27.78 | 2.39 | 1.59 | 1.37 | 0.24 | 1.58 (0.01) | 1.24 (0.38)
Cardiovascular | 20.03 | 22.17 | 2.14 | 22.40 | 25.40 | 3.00 | −0.86 | −1.47 | 0.02 | −1.37 (0.001) | −1.54 (0.04)
Imaging | 93.10 | 106.54 | 13.44 | 102.31 | 118.80 | 16.49 | −3.05 | −5.67 | 0.001 | −4.40 (<0.001) | −6.86 (0.001)
Orthopedics | 4.12 | 5.68 | 1.56 | 4.88 | 6.30 | 1.42 | 0.14 | 0.04 | 0.83 | −0.01 (0.95) | 0.07 (0.77)

For the 2009 cohort, about one-third of the savings was explained by lower utilization of services, and two-thirds by lower prices. In our decomposition of the price effect, we found no differences in price trends between AQC and non-AQC providers.
Instead, spending reductions due to prices were explained by patients receiving care in outpatient facilities with less expensive fees, consistent with our prior work. Direct analyses of cardiovascular services (coronary artery bypass grafts, aneurysm repairs, endarterectomies, angioplasties, and pacemaker insertions) are reported in Table 1 and plotted in Fig. 2. Results suggest that a decrease in angioplasty volume likely drove the differences in spending. Other cardiovascular services did not undergo changes in volume, consistent with Fig. 2. Spending reductions were larger in the no-prior-risk subgroup, which had an average reduction of 27.2% (−$6.06, p < 0.001) in cardiovascular spending relative to control in the first two years. In contrast, the prior-risk subgroup experienced smaller reductions of 4.1% (−$0.85, p = 0.17). Sensitivity analyses supported our main results.

3.3. Imaging

The 2009 AQC cohort spent less on imaging relative to control in the first 2 years. The AQC was associated with a 6.1% decrease (−$5.67 per member per quarter, p = 0.001) in imaging spending. The year-1 reduction was 4.7% (−$4.40, p < 0.001) and the year-2 reduction was 7.4% (−$6.86, p = 0.001) (Table 2). There were no significant differences in pre-intervention spending trends on imaging between the AQC and non-AQC groups. Spending reductions were found in the outpatient setting rather than the hospital setting. In addition, outpatient facility spending (the portion of the fee that goes to the facility) accounted for almost all of the savings, suggesting that imaging services were referred to less expensive (such as non-hospital-based) settings, giving rise to a large price effect that drove the findings. Consistent with this and Fig. 3, direct analyses of utilization found no statistically significant decreases in the volume of CTs, MRIs, ultrasounds/echocardiograms, or imaging procedures (Table 1).
Standard imaging, composed mostly of X-rays, actually saw an increase in volume of 2.5% (p = 0.045) over the first 2 years (6.19 per 1000 enrollees per quarter). This further implies a substantial price effect, one large enough to overcome some volume increases and still produce savings. Similar to cardiovascular services, savings were larger in the no-prior-risk subgroup. In this subgroup, the AQC was associated with an 11.9% reduction in imaging spending (−$10.54 per enrollee per quarter, p = 0.02) after two years. The 2010 cohort, composed entirely of no-prior-risk groups, saw a similar reduction of 11.0% (−$10.90, p < 0.001) in its first year. In the prior-risk subgroup, savings were smaller: the average reduction after 2 years was 4.7% (−$4.51, p < 0.001).

3.4. Orthopedics

The AQC was not associated with any changes in spending on orthopedic services, nor with any changes in utilization. The average 2-year effect of the AQC on total orthopedic spending was an insignificant 0.9% ($0.04 per enrollee per quarter, p = 0.83). Neither year-1 nor year-2 saw significant effects (Table 2). This was consistent across prior-risk and no-prior-risk groups in the 2009 cohort. In addition, neither facility nor non-facility components of orthopedic spending were affected. The 2010 cohort saw a large though statistically insignificant reduction in orthopedic spending of 12.7% (−$0.70, p = 0.09).

Fig. 3. Imaging utilization.

Consistent with Fig. 4, direct analyses of utilization demonstrated that the AQC was not associated with changes in the volume of hip or knee replacements (Table 1).

4. Discussion

The AQC was associated with increased spending on colonoscopies due to increased volume. It was associated with reductions in spending on cardiovascular and imaging services, but not on orthopedic services.
Reductions in spending were larger in year-2 than in year-1, and were explained both by decreased utilization and by lower prices achieved through referring patients to less expensive providers. Consistent with qualitative evidence gathered from organizational leaders, shifting referral patterns to less expensive providers was the low-hanging fruit by which organizations in the AQC aimed to achieve savings in the early years.48 However, among both cardiovascular and imaging services, about one-third of the reductions in spending, concentrated in year-2 and in groups that entered the contract from fee-for-service, was explained by lower volume. This suggests that even in the early years, organizations were able to achieve reductions in volume for some services.

Under bundled and global payment systems, there can be a tradeoff between controlling spending and supporting innovation. Innovations in medicine often lead to technologies that are high cost, such as new devices or diagnostic modalities, some of which may be high or low value depending on the clinical situation. In cases such as angioplasty and imaging, extending these innovations to patients in certain situations, such as use without clinical indications or purely elective use, has been questioned on the grounds of appropriateness.

Fig. 4. Orthopedic utilization.

One way for systems operating under global payments to reduce spending while maintaining quality is to lower spending on low-value services. Thus, a crucial question surrounding our findings is whether the foregone spending was for high- or low-value services. While our analysis of claims data does not include any conclusive assessment of the effect on value, several pieces of supporting evidence suggest that the AQC did not adversely affect the quality of care and may have successfully targeted services that may be overused in some situations.
First, in other research we found that the AQC did not negatively affect the pay-for-performance measures of quality used in the AQC program. On the contrary, we found that the AQC was associated with improvements in quality in chronic care management, adult preventive care, and pediatric care. Second, the AQC was associated with increased use of colonoscopies, a service generally supported by practice guidelines. Third, our findings on imaging (Fig. 3) are consistent with the broader slowdown in imaging, which is posited to be driven largely by reductions in use in lower-value situations.50 Finally, our cardiology results are also consistent with a focus on value. Our results merely provide suggestions, rather than conclusive evidence, about value. We must emphasize that value depends on the clinical situation.

Our study has several limitations. First, the population was drawn from a large commercially insured HMO population in Massachusetts, so results may not generalize to Medicare or to enrollees in other types of health plans. Second, we did not observe details of each AQC contract or provider risk contracting with other payers, which became more prevalent in 2010. Third, using administrative data, we cannot assess the appropriateness of any specific instance of utilization. Fourth, our service categories do not capture the broad array of medical technologies that patients and doctors use; we explored only a small mixture of services. In addition, beliefs about appropriateness may be in flux; for example, recent practice guidelines for colonoscopies have also called their presumed high value into question.51 Finally, we caution that multiple comparisons within categories of services may produce spurious results.
Slowing the growth of health care spending without hurting quality is a central goal.52,53 As payment reform and ACOs gain momentum across the country, and physician organizations adapt to a more constrained environment, avoiding the blunting of technological innovation in medicine is an important priority.54 Different medical technologies will likely be affected differently by global budgets. This will depend on providers' and patients' treatment choices, but it will also depend on the ability of device makers and manufacturers of tools, scanners, and other technologies to focus on innovations that are high value. As lessons from early global or bundled payment systems amass,55,56 an understanding of the mechanisms by which spending is reduced will be as important to policymakers as the fact that it can be reduced at all.

Acknowledgment

Supported by a grant from the Institute for Health Technology Studies and a grant from the Commonwealth Fund (to Dr. Chernew). Dr. Song is supported by a predoctoral M.D./Ph.D. National Research Service Award (F30-AG039175) from the National Institute on Aging and a predoctoral Fellowship in Aging and Health Economics (T32AG000186) from the National Bureau of Economic Research. None of the funders were involved in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; or the preparation, review, or approval of the manuscript. The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health. The authors thank Yanmei Liu for programming assistance and Johan Hong for help with manuscript editing and preparation.

References

1. Emanuel E, Tanden N, Altman S, et al. A systemic approach to containing health care spending. New England Journal of Medicine. 2012;367(10):949–954.
2. Antos JR, Pauly MV, Wilensky GR. Bending the cost curve through market-based incentives.
New England Journal of Medicine. 2012;367(10):954–958.
3. Chernew ME, Baicker K, Hsu J. The specter of financial armageddon—health care and federal debt in the United States. New England Journal of Medicine. 2010;362(13):1166–1168.
4. Aaron HJ. The central question for health policy in deficit reduction. New England Journal of Medicine. 2011;365:1655–1657.
5. Fisher ES, McClellan MB, Safran DG. Building the path to accountable care. New England Journal of Medicine. 2011;365:2445–2447.
6. Frakt AB, Mayes R. Beyond capitation: how new payment experiments seek to find the 'sweet spot' in amount of risk providers and payers bear. Health Affairs (Millwood). 2012;31(9):1951–1958.
7. Meyer H. Many accountable care organizations are now up and running, if not off to the races. Health Affairs (Millwood). 2012;31(11):2363–2367.
8. Engelberg Center for Health Care Reform. Bending the Curve: Effective Steps to Address Long-Term Health Care Spending Growth. The Brookings Institution; August 2009. 〈http://www.brookings.edu/reports/2009/0901_btc.aspx〉.
9. Newhouse JP. Medical care costs: how much welfare loss? Journal of Economic Perspectives. 1992;6(3):3–21.
10. Garber AM, Skinner J. Is American health care uniquely inefficient? Journal of Economic Perspectives. 2008;22(4):27–50.
11. Chernew ME, Newhouse JP. Health care spending growth. In: Pauly MV, McGuire TG, Barros PP, editors. Handbook of Health Economics, Vol. 2. North Holland: Elsevier Science; 2012. p. 1–43.
12. Cutler DM. Your Money or Your Life: Strong Medicine for America's Health Care System; 2005.
13. Smith SC Jr., Benjamin EJ, Bonow RO, et al. AHA/ACCF secondary prevention and risk reduction therapy for patients with coronary and other atherosclerotic vascular disease: 2011 update: a guideline from the American Heart Association and American College of Cardiology Foundation. Circulation. 2011;124:2458–2473.
14. Sonnenberg A, Delco F, Inadomi JM.
Cost-effectiveness of colonoscopy in screening for colorectal cancer. Annals of Internal Medicine. 2000;133:573–584.
15. Frazier AL, Colditz GA, Fuchs CS, et al. Cost-effectiveness of screening for colorectal cancer in the general population. Journal of the American Medical Association. 2000;284:1954–1961.
16. Winawer S, Fletcher R, Rex D, et al. Colorectal cancer screening and surveillance: clinical guidelines and rationale—update based on new evidence. Gastroenterology. 2003;124:544–560.
17. Chan PS, Patel MR, Klein LW, et al. Appropriateness of percutaneous coronary intervention. Journal of the American Medical Association. 2011;306(1):53–61.
18. Bradley SM, Maynard C, Bryson CL. Appropriateness of percutaneous coronary interventions in Washington State. Circulation: Cardiovascular Quality and Outcomes. 2012;5(4):445–453.
19. Chou R, Deyo RA, Jarvik JG. Appropriate use of lumbar imaging for evaluation of low back pain. Radiologic Clinics of North America. 2012;50(4):569–585.
20. Webster BS, Courtney TK, Huang YH, Matz S, Christiani DC. Physicians' initial management of acute low back pain versus evidence-based guidelines: influence of sciatica. Journal of General Internal Medicine. 2005;20:1132–1135.
21. Martin BI, Deyo RA, Mirza SK, et al. Expenditures and health status among adults with back and neck problems. Journal of the American Medical Association. 2008;299:656–664.
22. Chou R, Qaseem A, Owens DK, et al. Diagnostic imaging for low back pain: advice for high-value health care from the American College of Physicians. Annals of Internal Medicine. 2011;154(3):181–189.
23. Paxton BE, Lungren MP, Srinivasan RC, et al. Physician self-referral of lumbar spine MRI with comparative analysis of negative study rates as a marker of utilization appropriateness. American Journal of Roentgenology. 2012;198(6):1375–1379.
24. Morbidity and Mortality Weekly Report (MMWR).
Cervical cancer screening among women by hysterectomy status and among women aged ≥65 years—United States, 2000–2010. Morbidity and Mortality Weekly Report (MMWR). 2013;61(51):1043–1047.
25. Choosing Wisely: an initiative of the American Board of Internal Medicine. www.choosingwisely.org.
26. Rao VM, Levin DC. The overuse of diagnostic imaging and the Choosing Wisely initiative. Annals of Internal Medicine. 2012;157(8):574–576.
27. Chandra A, Cutler D, Song Z. Who ordered that? The economics of treatment choices in medical care. In: Pauly MV, McGuire TG, Barros PP, editors. Handbook of Health Economics, Vol. 2. North Holland: Elsevier Science; 2012. p. 397–432.
28. Chandra A, Staiger DO. Productivity spillovers in health care: evidence from the treatment of heart attacks. Journal of Political Economy. 2007;115(1):103–140.
29. Birkmeyer JD, Siewers AE, Finlayson EV, et al. Hospital volume and surgical mortality in the United States. New England Journal of Medicine. 2002;346:1128–1137.
30. Van Heek NT, Kuhlmann KF, Scholten, et al. Hospital volume and mortality after pancreatic resection: a systematic review and an evaluation of intervention in the Netherlands. Annals of Surgery. 2005;242(6):781–788.
31. Chandra A, Skinner J. Technology growth and expenditure growth in health care. Journal of Economic Literature. 2012;50(3):645–680.
32. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 1: the content, quality, and accessibility of care. Annals of Internal Medicine. 2003;138(4):273–287.
33. Fisher ES, Wennberg DE, Stukel TA, Gottlieb DJ, Lucas FL, Pinder EL. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Annals of Internal Medicine. 2003;138(4):288–298.
34. Chernew ME, Mechanic RE, Landon BE, Safran DG.
Private-payer innovation in Massachusetts: the 'Alternative Quality Contract'. Health Affairs (Millwood). 2011;30(1):51–61.
35. Song Z, Safran DG, Landon BE, et al. Health care spending and quality in year 1 of the Alternative Quality Contract. New England Journal of Medicine. 2011;365(10):909–918.
36. Song Z, Safran DG, Landon BE, et al. The 'Alternative Quality Contract,' based on a global budget, lowered medical spending and improved quality. Health Affairs (Millwood). 2012;31(8):1885–1894.
37. Mechanic RE, Santos P, Landon BE, Chernew ME. Medical group responses to global payment: early lessons from the 'Alternative Quality Contract' in Massachusetts. Health Affairs (Millwood). 2011;30(9):1734–1742.
39. Centers for Medicare and Medicaid Services. Berenson-Eggers Type of Service; 2010. Available from: 〈https://www.cms.gov/HCPCSReleaseCodeSets/20_BETOS.asp〉.
40. Pope GC, Kautter J, Ellis RP, et al. Risk adjustment of Medicare capitation payments using the CMS-HCC model. Health Care Financing Review. 2004;25(4):119–141.
41. Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk adjustment of skewed outcomes data. Journal of Health Economics. 2005;24(3):465–488.
42. Zaslavsky AM, Buntin MB. Too much ado about two-part models and transformation? Comparing methods of modeling Medicare expenditures. Journal of Health Economics. 2004;23:525–542.
43. Ai C, Norton EC. Interaction terms in logit and probit models. Economics Letters. 2003;80:123–129.
44. Ellis RP, McGuire TG. Predictability and predictiveness in health care spending. Journal of Health Economics. 2007;26(1):25–48.
45. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica. 1980;48(4):817–830.
46. Goldberger AS. A Course in Econometrics. Cambridge, MA: Harvard University Press; 1991.
48. Mechanic RE, Santos P, Landon BE, Chernew ME.
Medical group responses to global payment: early lessons from the 'Alternative Quality Contract' in Massachusetts. Health Affairs (Millwood). 2011;30(9):1734–1742.
50. Lee DW, Levy F. The sharp slowdown in growth of medical imaging: an early analysis suggests combination of policies was the cause. Health Affairs (Millwood). 2012;31(8):1876–1884.
51. Austin GL, Fennimore B, Ahnen DJ. Can colonoscopy remain cost-effective for colorectal cancer screening? The impact of practice patterns and the Will Rogers phenomenon on costs. American Journal of Gastroenterology. 2013;108(3):296–301.
52. Berwick DM. Making good on ACOs' promise—the final rule for the Medicare shared savings program. New England Journal of Medicine. 2011;365:1753–1756.
53. McClellan M, McKethan AN, Lewis JL, Roski J, Fisher ES. A national strategy to put accountable care into practice. Health Affairs (Millwood). 2010;29(5):982–990.
54. Altman SH. The lessons of Medicare's prospective payment system show that the bundled payment program faces challenges. Health Affairs (Millwood). 2012;31(9):1923–1930.
55. Weissman JS, Bailit M, D'Andrea G, Rosenthal MB. The design and application of shared savings programs: lessons from early adopters. Health Affairs (Millwood). 2012;31(9):1959–1968.
56. Fisher ES, Shortell SM, Kreindler SA, Van Citters AD, Larson BK. A framework for evaluating the formation, implementation, and performance of accountable care organizations. Health Affairs (Millwood). 2012;31(11):2368–2378.
Healthcare 1 (2013) 22–29

Contents lists available at SciVerse ScienceDirect: Healthcare. Journal homepage: www.elsevier.com/locate/hjdsi

Reliability of utilization measures for primary care physician profiling

Hao Yu a,*, Ateev Mehrotra b, John Adams c

a RAND Corporation, 4570 Fifth Avenue, Pittsburgh, PA 15213, USA
b RAND Pittsburgh, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
c RAND Corporation, 1776 Main Street, Santa Monica, CA 90401-3208, USA

Article history: Received 15 January 2013; received in revised form 12 April 2013; accepted 15 April 2013; available online 9 May 2013.

Abstract

Background: Given rising health care costs, there has been a renewed interest in using utilization measures to profile physicians. Despite the measures' common use, few studies have examined their reliability and whether they capture true differences among physicians.

Methods: A local health improvement organization in New York State used 2008–2010 claims data to create 11 utilization measures for feedback to primary care physicians (PCPs). The sample consists of 2938 PCPs in 1546 practices who serve 853,187 patients. We used these data to measure the reliability of these utilization measures using two methods (hierarchical model versus test–retest). For each PCP and each practice, we estimate each utilization measure's reliability, ranging from 0 to 1, with 0 indicating that all differences in utilization are due to random noise and 1 indicating that all differences are due to real variation among physicians.

Results: Reliability varies significantly across the measures. For 4 utilization measures (PCP visits, specialty visits, PCP lab tests (blood and urine), and PCP radiology and other tests), reliability was high (mean > 0.85) at both the physician and the practice level.
For the other 7 measures (professional therapeutic visits, emergency room visits, hospital admissions, readmissions, skilled nursing facility days, skilled home care visits, and custodial home care services), reliability was lower, indicating more substantial measurement error.

Conclusions: The results illustrate that some utilization measures are suitable for PCP and practice profiling, while caution should be used when using other utilization measures for efforts such as public reporting or pay-for-performance incentives. © 2013 Elsevier Inc. All rights reserved.

Keywords: Physician profiling; Utilization measures; Reliability; Claims data

* Corresponding author. E-mail addresses: [email protected] (H. Yu), [email protected] (A. Mehrotra).

1. Introduction

Given the growing concerns about rising health care costs, there is a push to profile providers on costs of care. Utilization measures are commonly used to reflect costs. Utilization measures capture the use of health services within a given patient population and, together with medical prices, drive total costs. Common utilization measures include the number of specialty visits referred among a primary care physician's (PCP) patient panel or the number of emergency room visits among all the patients at a practice. Since these types of measures are readily obtained from health plan claims,1 they have been used in many ways, including confidentially providing feedback to physicians,2,3 serving as a basis for financial incentives for physicians,4,5 and publicly profiling physicians.6 For example, the Centers for Medicare and Medicaid Services (CMS) has two new physician profiling efforts. The Quality and Resource Use Reports will provide confidential
http://dx.doi.org/10.1016/j.hjdsi.2013.04.002

feedback reports to tens of thousands of individual physicians on utilization measures such as emergency visits, lab tests, and days in a skilled nursing facility.7 The Physician Value-Based Payment Modifier, which was called for by the Affordable Care Act to link physician performance on quality and cost with Medicare Part B payments,8 is expected to expand over the next 2 years to cover all physicians for performance-based payment. There are also state and local initiatives, such as the California Quality Collaborative, which publicly profiles over 35,000 physicians on utilization measures (e.g., emergency room visits).6

Despite the common use of utilization measures, there is little research on their reliability.9 Reliability indicates a measure's capability to distinguish true differences in providers' performance from random variation.10,11 Prior studies have examined the reliability of health care quality measures, such as HEDIS,12 cost measures,13 and composite measures.14 Many of these studies have found inadequate reliability for the studied measures—the measure is capturing significant random noise. However, to our knowledge, there is no published research on the reliability of utilization measures at the individual physician and practice level.

A critical issue for any such evaluation is how to measure the reliability of utilization measures. Conceptually, reliability is the squared correlation between a measure and the true value.15 Because the true value is rarely known, researchers often estimate the lower bound of reliability.16 For example, the use of test–retest reliability is a common method for estimating reliability of survey instruments.
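Test–retest reliability of this kind is simply the correlation of a measure with itself across two measurement occasions. A minimal sketch with hypothetical two-year utilization rates (names and values are illustrative, not drawn from the study's data):

```python
from math import sqrt

# Hypothetical specialist-visit rates per 100 patients for five PCPs in two
# consecutive years (illustrative numbers only).
rates_2008 = [250.0, 310.0, 180.0, 420.0, 295.0]
rates_2009 = [240.0, 325.0, 190.0, 405.0, 280.0]

def pearson(x, y):
    """Pearson correlation: the test-retest reliability estimate when the
    same measure is computed on the same physicians in two periods."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(rates_2008, rates_2009)
print(round(r, 3))
```

Because genuine year-to-year change in a physician's practice also lowers this correlation, the result is a lower bound on reliability, which is the point made in the text.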
In prior work, Adams and colleagues used simple hierarchical models to estimate reliability in cross-sectional data.13 In such models, the observed variability is divided into two components: variability of scores across all providers and variability of scores for individual providers. The provider-to-provider variance is combined with the provider-specific error variance to calculate the reliability of the profile for an individual provider. In this study we use these methods to assess the reliability of utilization measures. Our data are from a local consortium that created the utilization measures for physician feedback. As discussed below, our data come from multiple health plans, reflecting ongoing efforts at the state and local levels to create multiple-payer claims databases for health care improvement.17 We look at two different ways of measuring reliability (hierarchical model versus test–retest) and discuss the differences between them.

2. Methods

2.1. Data

We obtained data from THINC, Inc., a nonprofit health improvement organization serving 9 counties in the Hudson Valley of New York State (Putnam, Rockland, Westchester, Dutchess, Orange, Ulster, Sullivan, Columbia, and Greene). In an effort to improve the value of the health care provided in the region, THINC, Inc. has taken a series of steps to construct and monitor utilization measures for PCPs. First, it pooled enrollment and claims data from five major health plans operating in the Hudson Valley, including four commercial plans (Aetna, United Healthcare, MVP Healthcare, and Capital District Physicians' Health Plan) and one Medicaid HMO (Hudson Health Plan), which together cover approximately 60% of the insured population in the Hudson Valley. Second, it used the methods depicted in Fig.
A1 in Appendix A to attribute the insured people to PCPs, who are defined as those physicians holding an MD or DO and practicing in one of the specialties of Family Practice, General Practice, Internal Medicine, and Pediatrics. A patient can be attributed to one PCP only. PCPs were aggregated into practices, defined as the individual physicians at a single geographic site. Third, it created annual utilization measures for each PCP during the 3-year period 2008–2010. The 11 utilization measures (see Appendix B for detailed definitions), which include annual PCP visits, specialty visits, PCP lab tests (blood and urine), PCP radiology and other tests, professional therapeutic visits, emergency room visits, hospital admissions, readmissions, skilled nursing facility days, skilled home care visits, and custodial home care services, provide a comprehensive assessment of the health services provided to a PCP's patient panel.

In addition to data on the utilization measures, we also obtained the following data from THINC, Inc.: (1) physicians' characteristics, such as degree, age, gender, and clinical specialty; (2) characteristics of each PCP's patient panel, including average age and proportion of male patients; (3) linkage files that can be used to link practices, physicians, and patients; (4) the average co-morbidity score for a PCP's patient panel; and (5) the average co-morbidity score for patients enrolled in a practice. Co-morbidities were captured using a diagnosis-based model developed by DxCG, Inc. The patient co-morbidity score available to us was a single value for the entire patient panel. We did not have access to co-morbidity data or risk scores for individual patients. It is common for health plans to profile physicians and practices using just the average risk scores across a patient panel. The study sample consists of 2938 PCPs in 1546 practices, serving 853,187 patients in the Hudson Valley in 2008–2010.

2.2. Two methods of assessing reliability

We use two methods for assessing reliability. Our primary method builds on the hierarchical model method used by Adams and colleagues.13 We used the formula below to calculate reliability at the individual physician level:

Reliability = Variance_across_physicians / (Variance_across_physicians + Variance_within_physicians)

For a given patient population, the utilization by patients is best viewed as count data. To calculate the variance in a given utilization measure among physicians, we first estimated a Poisson model with a random effect for each of the utilization measures. Each model controls for both physician and patient characteristics, such as the physician's age, gender, and degree, the patient panel's average age and average risk-adjustment score, and the proportion of male patients. Then, the variance among physicians is calculated as the product of the model's covariance and the average utilization rate among physicians for a given utilization measure. For example, the rate of specialist visits by a physician's patient panel is the panel's total number of specialist visits divided by the panel size, with the average rate of specialist visits being the mean rate across all the physicians. This calculation is based on the delta method to transform the log scale in the Poisson model into a rate. To calculate the variance of a given utilization measure within a physician's patient panel, we divided the panel's rate of that utilization measure by the panel size. For example, the variance in specialist visits within a physician's patient panel is the panel's rate of specialist visits (described above) divided by the panel size. The calculated reliability ranges from 0 to 1, with 1 suggesting that all the variation results from real differences in performance, and 0 meaning that all the variation in utilization is caused by measurement error. There is no gold standard for what constitutes adequate reliability for a measure of a physician's performance, although cut-offs of 0.7 and 0.9 have been used in prior studies.18,19

As a second method, we applied the simple test–retest method, which, shown conceptually, is

Test_Retest_Reliability = Variance_across_physicians / (Variance_across_physicians + Variance_of_time_change + Variance_within_physicians)

For each physician and practice, we calculated the number of annual visits by type of service utilization per 100 patients and estimated the correlation between the 2008, 2009, and 2010 utilization measures. The calculation is based on the assumption that there is no difference from year to year in a physician's practice pattern. Given there is likely some shift in practice across years,20 we must consider the test–retest reliability to be the lower bound of any reliability estimate. In comparison, the hierarchical model above reflects cross-sectional reliability. We will examine how the results differ between the two methods.

2.3. Sensitivity analyses

To test the effect of different model specifications for calculating reliability, we conducted a number of sensitivity analyses, including (1) one analysis that included only physicians' characteristics in the Poisson model, (2) one analysis that included only patients' characteristics in the model, (3) one analysis that examined reliability by physicians' patient panel size, (4) one analysis that had extreme values Winsorized, and (5) one analysis that looked at test–retest reliability using different combinations of years (2008 vs. 2009, 2009 vs. 2010, 2008 vs. 2010).

3. Results

3.1. Description of PCPs and their practices

As shown in Table 1, the mean age of the 2938 study PCPs is 51 years. Males make up a greater proportion of the PCPs (61%). The vast majority of PCPs hold an M.D. (92%). The average patient panel size per PCP is 290. The mean patient age is 38 years, and 45% of the patients are male. The mean risk adjustment score among patients is 4.25. Most PCPs are solo practitioners (66.3%). About one-fifth of the practices have 50 patients or fewer; over 45% of the practices have 250 patients or more.

Table 1. Characteristics of primary care physicians and their practices (number or %, with standard deviation in parentheses).

PCP level (N = 2938):
Age of physician (%): 35 years old or lower, 5.5; 31–50 years old, 60.7; 51 years old and over, 33.8
Gender of physician (%): male, 61.1; female, 38.9
Degree of physician (%): DO, 8.2; MD, 91.8
Number of patients in physician panel (%): 0–50, 16.5; 51–100, 9.9; 101–250, 29.1; 250+, 44.4
Average age of physician's panel (years): 37.9 (19.4)
Average proportion of male patients in physician's panel (%): 45.2 (14.7)
Average patient risk score in physician's panel (a): 4.25 (5.07)

Practice level (N = 1546):
Number of providers in the practice (%): 1, 66.3; 2, 12.5; 3–5, 9.4; 6–10, 5.1; 11–15, 2.1; 16+, 4.6
Number of patients in the practice (%): 0–50, 21.7; 51–100, 11.0; 101–250, 22.1; 250+, 45.2
Average age of patients in the practice (years): 40.6 (18.7)
Average proportion of male patients in the practice (%): 47.7 (11.9)
Average patient risk score in the practice (a): 4.31 (3.91)

(a) As described in the text, we do not have risk scores on individual patients. Rather, we were provided with a single mean score for each physician's panel and each practice's panel.

3.2. Variation in performance on utilization measures across PCPs

There is substantial variation in mean utilization across the measures (Table 2). The highest mean utilization was for PCP lab tests (1186.6 lab tests per 100 patients per year), while the lowest mean utilization was for custodial home visits (1.0 custodial home visits per 100 patients per year). On average there were 321.1 PCP visits and 289.2 specialty visits per 100 patients per year. Across PCPs, there was wide variation in utilization (Table 2). For example, the 5th percentile of PCP visits by 100 patients in a year is 150.0, about half of the median and less than one-third of the 95th percentile.

3.3. Comparison of two methods of measuring reliability at PCP level

Table 3 summarizes results from the two methods. Using the hierarchical model, we found that (1) there is large variation in mean reliability among the 11 types of visits, with the highest mean for PCP lab visits (0.98) and the lowest mean for custodial home care services (0.24); (2) six of the 11 types of visits had a mean reliability over 0.70 (PCP visits, specialist visits, PCP lab visits, PCP other visits, emergency visits, and skilled home care visits), while the other utilization measures have a mean reliability at 0.70 or below; and (3) most utilization measures had wide variation in reliability across PCPs. For example, across the PCPs in the sample, the reliability for skilled nursing facility days ranged from 0.06 (5th percentile) to 0.99 (95th percentile). Some measures had less variation across PCPs. For example, the reliability for PCP lab visits ranged from 0.92 (5th percentile) to 0.99 (95th percentile).

Using the test–retest method, most utilization measures have lower reliability. The reduction in reliability between the two methods is substantial for four utilization measures: professional therapeutic visits, emergency room visits, hospital admissions, and skilled nurse home visits (Table 3). In comparison, there are two utilization measures, PCP lab visits and readmissions, whose reliability does not change as much between the two methods. We also found that the test–retest method had similar results across the different combinations of years we examined (2008 vs. 2009, 2009 vs. 2010, and 2008 vs. 2010), with two exceptions: for professional therapeutic visits and custodial home care services, the test–retest results did vary across years (Table 3). Overall, we found that when using either of the two methods, four types of utilization had relatively high reliability (>0.7): PCP visits, specialty visits, PCP lab visits, and PCP other visits.

3.4. Sensitivity analyses at PCP level

One sensitivity analysis showed that reliability of utilization measures varies by the PCPs' patient panel size. Fig. 1A shows that the reliability of PCP visits increases with panel size: it increases sharply for panel sizes below 20 and flattens for panel sizes over 60. A similar pattern appears for other utilization measures, such as specialist visits and PCP lab visits. In further sensitivity analyses at the PCP level, we included different covariates in the Poisson model. If the model includes only PCP characteristics, the reliability of most utilization measures is slightly higher than in the model with only patient characteristics. When the model includes both the PCP and patient characteristics, the results are virtually the same as in the model using only the patient characteristics. When we Winsorized extreme values, there was no substantial difference in reliability (results not shown).

3.5. Variation in performance on utilization measures at practice level

The lower part of Table 2 summarizes utilization measures across practices.
The highest mean utilization was for PCP lab tests (1306.4 tests per 100 patients per year), while professional therapeutic visits and custodial home care services both had the smallest mean (0.8 visits per 100 patients per year). Table 2 also shows a wide range for each of the utilization measures at the practice level. For example, the 5th percentile of PCP visits was 12 visits per 100 patients per year, less than 5% of the median and about 2% of the 95th percentile.

Table 2
Variation in PCP and practice performance on annual number of visits per 100 patients (mean; 5th, 25th, 50th, 75th, 95th percentiles).

PCP level
  PCP visits: 321.1; 150.0, 243.8, 301.3, 368.9, 515.1
  Specialty visits: 289.2; 62.2, 130.6, 266.1, 373.5, 589.8
  PCP lab tests (blood and urine): 1186.6; 326.1, 625.4, 1114.7, 1533.3, 2294.3
  PCP radiology and other tests: 198.6; 34.9, 67.7, 183.3, 249.0, 403.1
  Professional therapeutic visits: 0.9; 0.0, 0.0, 0.0, 0.6, 3.1
  Emergency room visits: 25.7; 2.3, 10.8, 17.1, 29.3, 72.7
  Hospital admissions: 9.4; 0.0, 3.4, 6.3, 10.0, 23.8
  Readmissions: 2.2; 0.0, 0.0, 0.6, 1.7, 6.1
  Skilled nursing facility days: 44.5; 0.0, 0.0, 0.0, 4.6, 65.3
  Skilled home care visits: 22.5; 0.0, 3.4, 13.0, 25.3, 74.1
  Custodial home care services: 1.03; 0, 0, 0, 0, 1.04
Practice level
  PCP visits: 317.7; 12.0, 251.7, 319.8, 387.7, 549.1
  Specialty visits: 324.8; 73.2, 182.6, 289.8, 404.4, 691.1
  PCP lab tests (blood and urine): 1306.4; 343.8, 750.6, 1183.7, 1599.2, 2501.0
  PCP radiology and other tests: 208.2; 38.5, 105.7, 200.0, 268.3, 434.5
  Professional therapeutic visits: 0.8; 0.0, 0.0, 0.0, 0.5, 3.0
  Emergency room visits: 24.7; 0.0, 10.7, 17.9, 31.1, 63.2
  Hospital admissions: 9.4; 0.0, 3.3, 6.5, 10.6, 25.7
  Readmissions: 2.0; 0.0, 0.0, 0.7, 1.9, 8.2
  Skilled nursing facility days: 23.0; 0.0, 0.0, 0.0, 6.2, 96.7
  Skilled home care visits: 24.1; 0.0, 3.8, 13.5, 25.8, 74.5
  Custodial home care services: 0.8; 0.0, 0.0, 0.0, 0.0, 1.1

Table 3
Provider-level reliability estimates by the two methods.a
Columns: reliability measured using simple hierarchical models (mean; 5th, 25th, 50th, 75th, 95th percentiles) and reliability measured using the test–retest method (correlations between 2008 and 2009, 2008 and 2010, and 2009 and 2010).
Rows: PCP visits; specialty visits; PCP lab tests (blood and urine); PCP radiology and other tests; professional therapeutic visits; emergency room visits; hospital admissions; readmissions; skilled nursing facility days; skilled home care visits; custodial home care services.
Entries: 0.94 0.93 0.98 0.63 0.64 0.92 0.96 0.95 0.99 0.98 0.98 1.00 0.99 0.99 1.00 1.00 1.00 1.00 0.68 0.75 0.93 0.70 0.68 0.84 0.83 0.84 0.90 0.88 0.41 0.89 0.96 0.98 0.99 0.79 0.55 0.76 0.50 0.04 0.35 0.55 0.68 0.81 0.04 0.06 0.32 0.87 0.37 0.88 0.94 0.97 0.98 0.23 0.31 0.39 0.69 0.12 0.61 0.78 0.86 0.92 0.25 0.41 0.31 0.49 0.70 0.04 0.06 0.30 0.37 0.53 0.92 0.68 0.98 0.82 0.99 0.53 0.48 0.52 0.52 0.48 0.85 0.92 0.55 0.94 0.97 0.99 0.99 0.12 0.11 0.45 0.24 0.00 0.04 0.19 0.41 0.65 0.71 0.37 0.36
a See Section 2 for details of the two methods.

3.6. Comparison of two methods of measuring reliability at practice level

Table 4 compares practice-level reliability estimates from the two methods. Using the hierarchical model, only 3 of the 11 types of utilization had a mean reliability below 0.70: professional therapeutic visits (0.59), readmissions (0.59), and custodial home visits (0.33). These three utilization measures also had a larger variation in reliability than the other measures.
Using the test–retest method, we found that (1) the reliability measures were markedly reduced for most types of visits and (2) the reliability measures were consistent across years, with one exception (custodial home visits), which had unusually high reliability between 2008 and 2009.

Putting together the results from the two methods, we found that four utilization measures had relatively high reliability (>0.7) across the two methods: PCP visits, specialty visits, PCP lab visits, and PCP other visits.

We also conducted sensitivity analyses at the practice level. As Fig. 1B shows, there is a positive relationship between the reliability of PCP visits and a practice's patient panel size. The relationship is strong for panel sizes below 20, and the reliability flattens out for panel sizes over 60. A similar pattern appears for three other utilization measures: specialist visits, PCP lab visits, and PCP other visits. There are few differences among the three specifications of the Poisson model: the model with PCP characteristics only, the model with patient characteristics only, and the model with both PCP and patient characteristics.

Fig. 1. (A) Relationship between panel size and physician-level reliability estimate of PCP visits. (B) Relationship between panel size and practice-level reliability estimate of PCP visits.

4. Conclusion and discussion

In an effort to reduce costs, commercial health plans and government payers use utilization measures to profile individual physicians. Such efforts are only useful if these utilization measures are reliable. If they are unreliable and therefore primarily capture random noise, then these efforts may be misguided. In this study we used two methods of estimating reliability to examine utilization measures. We found variations in reliability across the measures.
For 4 of the 11 measures (PCP visits, specialty visits, PCP lab visits, and PCP other visits), both methods demonstrated high reliability at both the physician and the practice level. This implies that the differences observed in these measures reflect, to a large extent, real differences between physicians. For the other 7 measures, there were concerns about reliability because the two methods generated discrepant results. The test–retest method generated low reliability for each of the 7 measures, while some of these measures, such as readmissions, skilled nursing facility days, and admissions, still had relatively high reliability in the hierarchical model. We also found that most utilization measures had a wide variation in reliability across PCPs. Overall, our findings illustrate the importance of measuring reliability on a regular basis in physician profiling efforts.

The high reliability we found for these 4 measures was contrary to our expectations, which were based on the prior literature on the reliability of other measures. In the one prior study looking at the reliability of utilization measures, Hofer and colleagues (1999) reported that the median reliability of physician visits by diabetic patients was 0.41.1 In related work, we examined the reliability of physician cost profiles and found that 59% of physicians had cost-profile scores with reliabilities of less than 0.70, a commonly used cut-off.13 Most other work has focused on the reliability of physician quality measures. One recent study found that a composite measure of diabetes care had a physician-level reliability of 0.70 with a sample of 25 patients.14 More recently, a study of primary care performance found that physician-level reliability varied widely, from 0.42 for HbA1c testing to 0.95 for cervical cancer screening, and that reliability was below 0.80 for 10 of the 13 study measures.12 The high reliability we found for these four measures might be due to our much larger sample size: we focus on general utilization measures, while the published studies have focused on specific medical procedures for certain medical conditions, such as diabetes, and therefore their sample sizes are much smaller.

Table 4
Practice-level reliability estimates by the two methods.a
Columns: reliability measured using simple hierarchical models (mean; 5th, 25th, 50th, 75th, 95th percentiles) and reliability measured using the test–retest method (correlations between 2008 and 2009, 2008 and 2010, and 2009 and 2010).
Rows: PCP visits; specialty visits; PCP lab tests (blood and urine); PCP radiology and other tests; professional therapeutic visits; emergency room visits; hospital admissions; readmissions; skilled nursing facility days; skilled home care visits; custodial home care services.
Entries: 0.97 0.98 0.99 0.90 0.94 0.98 0.98 0.99 1.00 0.99 0.99 1.00 0.99 1.00 1.00 0.99 0.99 0.99 0.69 0.79 0.86 0.64 0.79 0.80 0.67 0.90 0.91 0.97 0.89 0.97 0.99 0.99 0.99 0.78 0.67 0.74 0.59 0.13 0.43 0.62 0.77 0.92 0.04 0.09 0.53 0.91 0.68 0.91 0.95 0.98 0.99 0.38 0.33 0.77 0.76 0.33 0.68 0.81 0.89 0.96 0.18 0.33 0.33 0.58 0.89 0.10 0.44 0.42 0.89 0.61 0.96 0.77 0.99 0.91 0.99 0.11 0.20 0.30 0.13 0.28 0.27 0.94 0.79 0.95 0.97 0.99 0.99 0.47 0.26 0.60 0.33 0.02 0.14 0.30 0.51 0.77 0.94 0.24 0.22
a See Section 2 for details of the two methods.

While our findings of high or low reliability are based on the full patient panel size, not surprisingly our sensitivity analysis shows that the reliabilities are closely related to patient panel size.
For example, although the utilization measure of PCP visits has high reliability on average, physicians with small panel sizes (e.g., below 20) typically still have low reliability estimates, as demonstrated in Fig. 1A. The sensitivity analysis also indicates that reliability flattens out for panel sizes over 60, suggesting that the 7 utilization measures with low reliability are not likely to reach higher reliability at larger panel sizes.

It is not surprising to see the discrepancy between the two methods in our reliability estimates for the 7 measures. This is inherent to some degree given the different formulas embedded in the two methods. While the reliability formulation for the hierarchical model helps one understand the extent to which we can distinguish physicians' performance within a particular year or aggregation of years, the test–retest reliability, a simple correlation coefficient between years, represents the lower bound of the reliability estimate; because it accounts for possible changes over time, the test–retest reliability should never be larger than the cross-sectional reliability. Contrasting the two reliabilities clearly shows that they have different meanings and cannot be substituted for each other. The results from the two methods indicate that a measure can be reliable for detecting differences in physician performance in historical data while being far less reliable for predicting future physician performance.

One important consideration when assessing reliability is the relationship between reliability and validity. Validity indicates how well a measure represents the phenomenon of interest; reliability indicates the proportion of variability in a measure that is due to real differences in performance. Classically, validity and reliability are considered two separate characteristics of a measure with little relationship between them.
If a physician has high reliability on a given measure, it does not mean that the physician performed well on the measure or that the measure is a valid measure of physician performance. Since reliability is difficult to interpret in the absence of validity, it is also important to consider the possible effects of the most common threat to validity, inadequate case-mix adjustment. Although a precise quantification of the effects of failed case-mix adjustment is beyond the scope of this paper, the intuition can be seen in a conceptual extension of the reliability formula:

Calculated reliability = (variance across physicians + squared bias) / (variance across physicians + variance within physicians + squared bias)

We distinguish here between calculated reliability and the reliability one would obtain if the bias were removed from the performance measures. The presence of the squared bias in both the numerator and the denominator increases the calculated reliability, but not in a way that is relevant for physician profiling. Consider a physician whose panel is healthy beyond what the measured case-mix adjusters can capture. He or she has a structural advantage when compared with his or her peers. Indeed, this advantage will likely persist over time and influence the calculated test–retest reliability in a similar way. Careful consideration of the adequacy of case-mix adjustment should inform the interpretation of reliability. Unfortunately, there is often little analysis that can be done with the available data to inform this important question. This study was not able to address this issue because we had access only to health plans' administrative data and did not have access to information about chronic conditions or risk adjustment scores at the individual patient level.
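The effect of the squared-bias term in the reliability formula can be made concrete with a small numeric example; the variance values below are hypothetical, chosen only to illustrate the arithmetic:

```python
# Hypothetical variance components illustrating how unmeasured case-mix
# bias inflates calculated reliability (all numbers are made up).
var_across = 4.0   # true variance across physicians
var_within = 6.0   # sampling variance within physicians
sq_bias = 5.0      # squared bias from unmeasured case-mix differences

unbiased = var_across / (var_across + var_within)
calculated = (var_across + sq_bias) / (var_across + var_within + sq_bias)

print(round(unbiased, 2))    # 0.4: reliability with the bias removed
print(round(calculated, 2))  # 0.6: inflated "calculated" reliability
```

The bias term raises the calculated reliability (here from 0.40 to 0.60) without making the measure any more useful for distinguishing true physician performance.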
Although it is common for health plans to profile physicians and practices using just the average risk scores across a patient panel, including more patient-level information for risk adjustment to improve reliability analysis could provide an avenue for future inquiry.

Our study has some limitations in terms of generalizability. Our physician sample was drawn from the five major health plans in the Hudson Valley, four of which are commercial health plans. The results may not generalize to other regions or to physicians who primarily serve Medicare patients. Our results may also not be generalizable because over 66% of the study physicians are solo practitioners, a proportion higher than the national average. Our study could be expanded by using representative samples or data from large-scale policy interventions. For example, it remains an interesting topic for future studies to assess the reliability of the measures included in CMS' QRUR system, which contains performance information on 28 quality indicators used to provide confidential feedback reports to physicians in nine states (California, Illinois, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, and Wisconsin).7

Our study findings have important policy implications. The high and stable reliability of the four utilization measures suggests that these measures may be reliably used for PCP profiling by health plans and other health care organizations. In fact, some of the measures (e.g., PCP-prescribed radiology visits) have already been used by organizations such as the California Quality Collaborative to monitor resource use by PCPs.6 Our results concerning the other utilization measures suggest that they should be used with caution for provider profiling. Our finding that reliability increases with patient panel size before flattening out for panel sizes over 60 suggests that physician profiling programs should not include physicians with relatively small panel sizes (e.g., below 60).
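The panel-size dependence behind this cutoff follows from a standard variance-ratio view of provider-level reliability. The following sketch uses that generic form with hypothetical variance components; the paper's actual estimates come from a hierarchical Poisson model:

```python
# Illustrative only: reliability as a function of panel size n under a
# simple variance-ratio formulation, reliability(n) = vb / (vb + vw / n).
# vb and vw are hypothetical between- and within-physician variances.
def reliability(n, vb=1.0, vw=20.0):
    return vb / (vb + vw / n)

for n in (5, 20, 60, 120, 240):
    print(n, round(reliability(n), 2))
```

Reliability rises steeply at small panel sizes and flattens as the panel grows, matching the pattern in Fig. 1; a measure with a poor between-to-within variance ratio stays unreliable even at large panel sizes.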
Our results also highlight the importance of assessing reliability as a standard aspect of any profiling effort. Utilization measures have been in use for many years, yet this is one of the first studies to assess their reliability. Hospital readmission measures have become a critical aspect of the health care system, yet, to our knowledge, no one has assessed their reliability before. Given the substantial variation we found across utilization measures, we encourage organizations that use utilization measures to profile physicians to evaluate the reliability of those measures. We also emphasize that reliability varies across providers. Even if a given measure has high reliability on average, it does not mean that all providers should be profiled on that measure. One option is to profile only providers for whom the reliability of that measure meets a threshold (for example, 0.70). We feel this would be fairer to providers than simply using volume cut-offs (for example, a patient panel of 60) because our results underscore that even above a volume cut-off (as demonstrated in our analysis of the full patient panel), reliability may not be sufficiently high for some utilization measures.

Acknowledgment

This study was funded by THINC, Inc.

Appendix A

See Fig. A1.

Fig. A1. Methods of attributing a patient to a PCP. The flowchart applies the following logic in the measurement year:
1. Did the patient specify a primary care physician to the health plan upon enrollment during the measurement year? If yes, select the primary care physician specified most recently by the patient in the measurement year; if that physician is not included in the study sample, exclude the patient. If the patient saw that physician for at least one visit (preventive care and E&M) in the measurement year, attribute the patient to that physician (Enrollment Method); otherwise, exclude.
2. If no physician was specified: did the patient see any primary care physician from the study sample at least once in the measurement year? If no, exclude. If yes, select the primary care physician with the most visits for that patient during the measurement year; if there is a tie between two primary care physicians, select the physician with the most recent visit. If the patient saw that physician for at least one preventive care or E&M visit in the measurement year, attribute the patient to that physician (Imputation Method); otherwise, exclude.

Appendix B. Definitions of utilization measures

PRIMARY CARE PHYSICIAN (PCP) VISITS: Count of patient–provider encounters for primary care providers. A visit is defined by a distinct Consistent Member ID, Rendering Provider ID, and Date of Service where the Rendering Provider is a PCP provider (pediatrician, family practice, internal medicine, or general practice, as determined by submitted taxonomy codes).

SPECIALIST VISITS: Count of patient–provider encounters for specialist providers. A visit is defined by a distinct Consistent Member ID, Rendering Provider ID, and Date of Service where the Rendering Provider meets all of the following criteria: is a physician; is not a pediatrician, family practice, internal medicine, or general practice, as determined by submitted taxonomy codes.

PCP LABORATORY TESTS (BLOOD AND URINE): Count of lines for any claim where the service code has a BETOS code for a lab test or the revenue code is for a lab test.

PCP RADIOLOGY AND OTHER TESTS: Count of lines for any claim where the service code has a BETOS code for a radiology or diagnostic test or the revenue code is for a radiology or diagnostic test.

PROFESSIONAL THERAPEUTIC SERVICES: Count of occupational, physical, and speech therapy services, using the corresponding service codes.

EMERGENCY DEPARTMENT VISITS: Count of claims where the CPT code, revenue code, or CPT/place-of-service code indicates an Emergency Department (ED) visit.
Multiple ED visits on the same date of service are counted as one visit. ED visits that also have room and board are not counted (this means the patient was admitted to the hospital).

HOSPITAL ADMISSIONS: Based on fields such as claim ID, admit and discharge dates, Type of Bill, revenue code, and DRG code, all facility claims are aggregated into hospital admissions. This field is a count of those admissions.

READMISSIONS: An admission is counted as a readmission if all of the following conditions are met: the number of days between the current admission and a previous admission is 30 days or less, and the admission has the same Member ID and Product ID as the previous admission. If an admission is within 30 days of more than one preceding admission, it is counted only once.

SKILLED NURSING FACILITY DAYS: Sum of the lengths of stay (LOS) from admissions to a Skilled Nursing Facility (SNF). An SNF admission meets the following criteria: the Type of Bill (TOB) code is "Skilled Nursing facility—Inpatient" or "Skilled Nursing facility—Swing Bed", or a revenue code on the admission indicates SNF.

SKILLED HOME CARE VISITS: Count of distinct patient, provider, and service-date combinations where the place-of-service code or BETOS code indicates a home care visit.

CUSTODIAL HOME CARE SERVICES: Count of claim lines where the service code indicates home care services.

References

1. Hofer TP, Hayward RA, Greenfield S, Wagner EH, Kaplan SH, Manning WG. The unreliability of individual physician report cards for assessing the costs and quality of care of a chronic disease. JAMA. 1999;281(22):2098–2105.
2. Fung CH, Lim YW, Mattke S, Damberg C, Shekelle PG. Systematic review: the evidence that publishing patient care performance data improves quality of care. Annals of Internal Medicine. 2008;148(2):111–123.
3. Faber M, Bosch M, Wollersheim H, Leatherman S, Grol R. Public reporting in health care: how do consumers use quality-of-care information? A systematic review. Medical Care. 2009;47(1):1–8.
4.
Rosenthal MB, Landon BE, Normand S-LT, Frank RG, Epstein AM. Pay for performance in commercial HMOs. New England Journal of Medicine. 2006;355(18):1895–1902.
5. Milstein A, Lee TH. Comparing physicians on efficiency. New England Journal of Medicine. 2007;357(26):2649–2652.
6. California Quality Collaborative. Actionable reports that show physician-level performance compared to peer group for specified measures. 〈http://www.calquality.org/programs/costefficiency/resources/〉 and 〈http://www.calquality.org/about/〉; 2012. Accessed 18.05.12.
7. Centers for Medicare & Medicaid Services. QRUR template for individual physicians. 〈http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/PhysicianFeedbackProgram/Downloads/QRURs_for_Individual_Physicians.pdf〉; 2012. Accessed 18.05.12.
8. VanLare JM, Blum JD, Conway PH. Linking performance with payment: implementing the physician value-based payment modifier. JAMA. 2012;308(20):2089–2090.
9. Fischer E. Paying for performance—risks and recommendations. New England Journal of Medicine. 2006;355:1845–1847.
10. Shahian DM, Normand S-L, Torchiana DF, et al. Cardiac surgery report cards: comprehensive review and statistical critique. The Annals of Thoracic Surgery. 2001;7(6):2155–2168.
11. Landon B, Normand S, Blumenthal D, Daley J. Physician clinical performance assessment: prospects and barriers. JAMA. 2003;290(9):1183–1189.
12. Sequist TD, Schneider EC, Li A, Rogers WH, Safran DG. Reliability of medical group and physician performance measurement in the primary care setting. Medical Care. 2011;49(2):126–131.
13. Adams JL, Mehrotra A, Thomas JW, McGlynn EA. Physician cost profiling—reliability and risk of misclassification. New England Journal of Medicine. 2010;362(11):1014–1021.
14. Kaplan SH, Griffith JL, Price LL, Pawlson LG, Greenfield S. Improving the reliability of physician performance assessment: identifying the physician effect on quality and creating composite measures. Medical Care. 2009;47(4):378–387.
15.
Bland J, Altman D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1(8476):307–310.
16. Fleiss J, Levin B, Paik M. Statistical Methods for Rates & Proportions. Indianapolis, IN: Wiley-Interscience; 2003.
17. Love D, Custer W, Miller P. All-Payer Claims Databases: State Initiatives to Improve Health Care Transparency. Commonwealth Fund Publication No. 1439, vol. 99. New York: Commonwealth Fund; 2010.
18. Safran DG, Karp M, Coltin K, et al. Measuring patients' experiences with individual primary care physicians: results of a statewide demonstration project. Journal of General Internal Medicine. 2006;21(1):13–21.
19. Hays RD, Revicki D. Reliability and validity (including responsiveness). In: Fayers P, Hays R, editors. Assessing Quality of Life in Clinical Trials. New York: Oxford University Press; 2005.
20. Phelps C. Health Economics. 3rd ed. Boston: Addison-Wesley Educational Publishers; 2008.

Healthcare 1 (2013) 30–36

Contributors to variation in hospital spending for critically ill patients with sepsis

Tara Lagu a,b,c,*, Michael B. Rothberg d, Brian H. Nathanson e, Nicholas S. Hannon a, Jay S. Steingrub a,c,f, Peter K. Lindenauer a,b,c

a Center for Quality of Care Research, Baystate Medical Center, Springfield, MA, USA
b Division of General Internal Medicine, Baystate Medical Center, Springfield, MA, USA
c Department of Medicine, Tufts University School of Medicine, Boston, MA, USA
d Department of Medicine, Medicine Institute, Cleveland Clinic, Cleveland, OH, USA
e OptiStatim, LLC, Longmeadow, MA, USA
f Division of Critical Care Medicine, Baystate Medical Center, Springfield, MA, USA

Article history: Received 17 January 2013; received in revised form 24 April 2013; accepted 26 April 2013; available online 9 May 2013.

Abstract

Background: Costs of severe sepsis in the US exceeded $24 billion in 2007. Identifying the relative contributions of patient, hospital, and physician factors to the variation in hospital costs of sepsis could help target efforts to improve the value of care.

Methods: We identified adults with a principal or secondary diagnosis of sepsis who received care between June 1, 2004 and June 30, 2006 at one of the hospitals participating in a multi-institutional database. We constructed a regression model to predict mean hospital costs that included patient characteristics, hospital mission and environment (e.g., teaching status, percentage of low-income patients), hospital fixed costs, and risk-adjusted length of stay, which encompasses hospital throughput, the incidence of complications, and other aspects of physician practice. To determine the contribution to cost variance by each predictor, we calculated the R².

Results: At 189 hospitals, we identified 40,265 adults with sepsis who met inclusion criteria. The median cost of a hospitalization was $20,216. The model explained 69% of the hospital-level variation in the costs of hospitalization.
Of the explained variation, differences in patients' ages, comorbidities, and severity accounted for 20%; hospital mission and environment represented 16%; differences in hospital fixed costs, including acquisition costs and overhead, accounted for 19%; and the wage index explained an additional 12%. Risk-adjusted length of stay comprised the final one-third of the explained variation.

Conclusion: A large proportion of the variation across hospitals in the cost of caring for critically ill patients with sepsis is related to differences in patient characteristics and immutable hospital characteristics, while nearly one-third is the result of differences in risk-adjusted length of stay.

Implications: Efforts to reduce spending on the critically ill should aim to understand determinants of practice style but should also focus on hospital throughput, overhead, acquisition, and labor costs.

© 2013 Published by Elsevier Inc.

Keywords: Sepsis; Critical illness; Costs

Introduction

There were more than 700,000 cases of severe sepsis in the United States in 2007, with a combined cost exceeding $24 billion.1 Recent growth in the number of annual sepsis cases suggests that the costs of sepsis care in the US will only continue to increase.1,2 Efforts to reduce health care spending have focused on variations in practice, because physicians and hospitals differ in the testing and treatment they provide.3–7 In addition to differences in the intensity of care provided to patients with sepsis, however, there are patient and hospital factors that influence costs at the hospital level.8 Sepsis patients can present with a broad range of signs and symptoms, from mildly deranged vital signs to fulminant disease with organ-system failure.9–12 A sicker patient population requires more monitoring, testing, and treatment. Additionally, each hospital's mission and environment, such as its role in the training of young physicians, its provision of care for low-income patients, its location (region of the country, urban or rural), or its ability to negotiate for lower costs of drugs, devices, and supplies, may influence its spending.8,13–17 For example, higher wages are related to geographic location and increase the cost of caring for each patient.14 Identifying the relative contributions of patient, hospital, and physician factors to the variation in spending on sepsis observed between hospitals could help target efforts to improve the value of sepsis care, identify the extent to which efforts to change physician practice could result in cost savings, and highlight the need to reduce fixed and acquisition costs.18 In this study, we aimed to describe sources of variation across hospitals in the average cost of care for critically ill patients with sepsis, in order to reduce some of the mystery surrounding variation in hospital spending and to identify the real-world strategies (e.g., limiting utilization vs. improving throughput vs. holding off on large capital investments vs. altering purchasing strategies) most likely to help reduce some of this variation over time.

* Correspondence to: Center for Quality of Care Research, Baystate Medical Center, 280 Chestnut St., 3rd Floor, Springfield, MA 01199, USA. Tel.: +1 413 505 9173; fax: +1 413 794 8866. E-mail addresses: [email protected], [email protected] (T. Lagu). http://dx.doi.org/10.1016/j.hjdsi.2013.04.005

Methods

Setting and subjects

Using data from hospitals that participated in the Perspective database (Premier Healthcare Informatics, Charlotte, NC), we conducted a cohort study of medical patients with sepsis who were treated in the ICU between June 1, 2004 and June 30, 2006.
Perspective contains a date-stamped log of all items and services charged to the patient or insurer (such as medications, laboratory tests, and therapeutic services) in addition to the elements found in hospital claims derived from the uniform billing 04 (UB-04) form. Participating hospitals closely approximate the makeup of acute care hospitals nationwide, and the database includes roughly 15% of US hospitalizations. The hospitals represent all regions of the United States and are predominantly small- to mid-sized non-teaching facilities that serve a largely urban population. The detailed nature of the database, particularly its cost data, is a notable advantage.18,19 Approximately 75% of hospitals that participate in Perspective submit item-level information on actual hospital costs, taken from internal cost accounting systems. The remaining 25% provide cost estimates based on Medicare cost-to-charge ratios. Permission to conduct the study was obtained from the Institutional Review Board at Baystate Medical Center. Patients with a principal or secondary diagnosis of sepsis (Appendix 1) were included if they were 18 years of age or older and, by the second hospital day, were admitted to the ICU, treated with antibiotics, and had blood cultures drawn. Diagnostic information was assessed using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. We restricted the analysis to medical (nonsurgical) patients for whom treatment was initiated within the first 2 days of hospitalization in order to focus our investigation on the care of patients who presented with sepsis (rather than those who developed it later during the hospitalization). We used the first 2 days of hospitalization (rather than just the first day) because in administrative datasets the duration of the first hospital day includes partial days that can vary in length.
We limited the sample to hospitals that admitted at least 100 critically ill sepsis patients during the 2-year study period. This prevented hospital-level results from being distorted by hospitals with few patients. We also excluded patients who were transferred from or to another acute care facility because we could not accurately determine the clinical outcomes or cost of those hospitalizations. To lessen the possibility of including patients who may have had a working diagnosis of sepsis that was never confirmed, we limited the analysis to patients who were treated with antibiotics for at least 3 consecutive days (except in the case of death).

Contributors to hospital cost

For each hospital, we recorded size, teaching status, geographic region, and whether it served an urban or rural population. We then constructed additional variables comprising patient and hospital factors that may contribute to a hospital's average cost of caring for patients with sepsis. We grouped these into four categories: patient factors, hospital mission and environment, fixed costs, and risk-adjusted length of stay.

Patient factors

For each patient, we recorded age, gender, marital status, insurance status, race, and ethnicity (as recorded by admission or triage staff of participating hospitals using hospital-defined options). We used software provided by the Healthcare Cost and Utilization Project of the Agency for Healthcare Research and Quality20 to assess the presence of 25 comorbid conditions. We used diagnosis codes to identify the source (lung, abdomen, urinary tract, blood, other) and type of infection (gram positive, gram negative, mixed, anaerobic, fungal). We also recorded use of initial critical care therapies (e.g., mechanical ventilation, vasopressors), which can serve as useful proxies for severity of illness in the absence of physiologic data.
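The inclusion and exclusion criteria above amount to a record-level filter followed by a hospital-level volume threshold. A minimal sketch follows; the field names (`icu_by_day2`, `consecutive_abx_days`, etc.) are hypothetical stand-ins for what the study derived from ICD-9-CM codes and date-stamped billing data.

```python
# Sketch of the cohort-selection logic described above. All record field
# names are hypothetical; the study derived them from claims and charge data.

def eligible(patient):
    """Apply the study's inclusion/exclusion criteria to one record."""
    return (
        patient["age"] >= 18
        and patient["sepsis_diagnosis"]        # principal or secondary
        and patient["icu_by_day2"]             # in the ICU by hospital day 2
        and patient["antibiotics_by_day2"]
        and patient["blood_culture_by_day2"]
        and not patient["surgical"]            # medical patients only
        and not patient["transfer"]            # no transfers from/to another facility
        # at least 3 consecutive antibiotic days, unless the patient died
        and (patient["consecutive_abx_days"] >= 3 or patient["died"])
    )

def study_hospitals(records, min_cases=100):
    """Keep hospitals with at least `min_cases` eligible patients."""
    by_hospital = {}
    for r in filter(eligible, records):
        by_hospital.setdefault(r["hospital"], []).append(r)
    return {h: rs for h, rs in by_hospital.items() if len(rs) >= min_cases}

def make_patient(**overrides):
    """Hypothetical record whose defaults satisfy every criterion."""
    p = dict(age=70, sepsis_diagnosis=True, icu_by_day2=True,
             antibiotics_by_day2=True, blood_culture_by_day2=True,
             surgical=False, transfer=False,
             consecutive_abx_days=5, died=False, hospital="A")
    p.update(overrides)
    return p
```

The "death" exception in `eligible` mirrors the paper's caveat that patients who died before completing 3 antibiotic days were retained.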
These therapies are included in a mortality prediction model with discrimination and calibration similar to clinical ICU risk-adjustment models (e.g., MPM III).21,22

Hospital mission and environment

We included two variables to account for the impact of hospital mission on cost variation. The first, the resident-to-bed ratio, represents the presence of teaching and the teaching burden; a higher ratio indicates a greater teaching burden. The second, disproportionate share of low-income patients, accounts for hospital differences in the proportion of low-income Medicare patients and the proportion of patient-days for which Medicaid is the primary payer. Disproportionate share of low-income patients may contribute to variation because low-income patients tend to have complex health care needs and may require additional staffing and services such as translators and social workers.23

Fixed and acquisition costs

Hospitals pay different prices to acquire goods (e.g., a dose of antibiotic has a different cost at different hospitals) and have different fixed costs, such as overhead, labor, or infrastructure investments (e.g., electronic records, new buildings, or expensive radiological or surgical equipment).14,15 One important caveat to identifying the contribution of fixed and acquisition costs to cost variation is that these costs can be difficult to reduce. They are often considered the cost of "keeping the lights on" and thus are not often the focus of cost-cutting measures. Still, there has been discussion in the literature of ways that the fixed and acquisition costs of critical care can be reduced.14 Some of these can be implemented with relative ease, such as renegotiating purchasing contracts and holding off on capital purchases and infrastructure investments. Others are more painful (e.g., reducing the labor force in areas that are "overstaffed" and eliminating ICU beds that are underutilized).
Typically, hospitals use cost accounting systems to add fixed and acquisition costs proportionally to all billable items.18,19 A day of ICU room and board is one high-cost item that was included in each patient's hospital bill because of our inclusion criteria. It includes the fixed and acquisition costs of a day of critical care, including nursing care, but does not include the costs of associated diagnostic tests or treatments for patients in the ICU. In a prior analysis, we reported that room and board, either on the floor or in the ICU, represents about half of the costs of a patient stay for sepsis patients,18 and others have reported that ICU room and board drives critical care costs.24 We therefore created a fixed cost index by dividing the median cost of an ICU day at each hospital by the overall median cost of an ICU day for all hospitals. A hospital with median ICU room and board costs would have an indexed value of 1. Because wages may be an important contributor to variation in hospital costs, we accounted for differences in labor costs using a wage index. The wage index is calculated using relative hospital wage levels in the geographic area of the hospital compared to the national average hospital wage level.

Length of stay index

Because room and board accounts for the largest percentage of hospital costs,18,19 decisions concerning the timing of discharge, as reflected in length of stay (LOS), have a large impact on overall hospital costs. ICU LOS is determined by factors both within and beyond the control of the physician. A factor that may be beyond the control of the physician is throughput, which is determined partly by the availability of beds on the floor for ICU patients.
Factors affecting LOS that are in the control of the physician include the decision to keep the patient in the ICU (or hospital) for another day of observation and the pursuit of additional diagnostic testing that might lead to an extra day in the hospital.25,26 We therefore modeled each hospital's risk-adjusted length of ICU stay to encompass these factors. We calculated a predicted length of ICU stay at the patient level with a generalized linear model that adjusted for demographics, comorbidities, early organ supportive therapies, and hospital characteristics. By calculating predicted ICU LOS and comparing it to observed ICU LOS, we adjusted for patient severity on presentation (because a patient with greater illness severity on presentation would be expected to have a longer ICU length of stay). We derived Pearson residuals at the patient level in order to calculate average Pearson residuals at the hospital level. For example, a hospital that keeps patients, on average, 2 days longer in the ICU than would be expected based on their patients' presenting severity and comorbidities would have an average Pearson residual value of 2. We then standardized the hospital-level Pearson residuals to have a mean of 0 and a standard deviation of 1 (hospitals with longer than expected ICU LOS have LOS indices greater than 0). We defined this number as the "Length of Stay (LOS) Index."

Outcomes

Our outcome of interest was each hospital's median and mean cost of a hospitalization for a medical patient with sepsis who was treated in the ICU.

Analysis

We calculated patient-level summary statistics using frequencies for binary variables and medians and interquartile percentiles for continuous variables. We then created a hospital-level regression model to predict mean hospital costs. The independent variables in this model were patient demographics, comorbidities, and early organ supportive therapies averaged at the hospital level, and the hospital-level variables defined above (hospital environment variables and the fixed cost and length of stay indices). We chose variables for inclusion based on a bootstrapping algorithm with stepwise regression.27 This was based on two iterations of 100 bootstrapped samples each. The stepwise algorithm was backward selecting with p = 0.05 to enter the model and p = 0.1 to remain in the model. We also re-analyzed the data using Least Angle Regression to see how including more factors would affect our results.28 Least Angle Regression (LAR) is a sequential model-building algorithm that considers parsimony as well as prediction accuracy. We then examined Partial (Type III) Sums of Squares in the hospital-level model. We calculated the correlation coefficient (r) for each predictor and squared it (R2) to calculate the percent variance explained.29 This is equivalent to creating individual regression models with each variable as the sole predictor and hospital-level patient costs as the dependent variable. Finally, we examined standardized coefficients of the regression model to see how a one standard deviation change in each predictor affects average hospital costs. All analyses were carried out using STATA/SE 10.1 (StataCorp, College Station, TX).

Results

We identified 272,785 adults with sepsis who were admitted to a Premier hospital between June 1, 2004 and June 30, 2006. Of these, 40,265 were medical patients who received 3 consecutive days of antibiotics and at least one blood culture (Table 1). Almost all hospitals (91%) had more than 200 beds and half (49%) had more than 400 beds. Nearly half (47%) participated in teaching activities and most (90%) were located in urban locations.

Table 1
Characteristics of hospitals treating medical patients with sepsis in the intensive care unit.

                                              Hospitals, n (%)    Records, n (%)
Total                                         189 (100)           40,265 (100)
Number of beds
  < 200                                       16 (8.5)            2837 (7.0)
  200–400                                     80 (42.3)           14,201 (35.3)
  > 400                                       93 (49.2)           23,227 (57.7)
Teaching                                      89 (47.1)           18,744 (46.6)
Geographical location
  South                                       93 (49.2)           20,656 (51.3)
  North                                       33 (17.5)           6744 (16.7)
  Midwest                                     38 (20.1)           8415 (20.9)
  West                                        25 (13.2)           4450 (11.0)
Urban (vs. rural)                             171 (90.5)          37,524 (93.2)

                                              Median [IQR]
DSH patient percentage (3-year average)a      23.2% [15.4%, 31.5%]
Wage index (3-year average)                   0.96 [0.92, 1.08]
Ratio of residents to beds (3-year average)   3.0% [0%, 15.9%]
a DSH: disproportionate share of low-income patients.

Patient and hospital factors

The median age of patients was 69 years (Table 2); half (50%) were women. The majority (62%) were insured by traditional Medicare plans. Most (61%) were white (Table 2). Hypertension (33%), diabetes (33%), and anemia (31%) were the most common comorbid conditions. More than one-third of patients (36%) received mechanical ventilation by hospital day 2 and more than half (55%) received vasopressors. Median LOS was 8 days and median ICU LOS was 4 days. The median hospital-level cost of a hospitalization was $20,216, with an interquartile range of $16,997–$24,129. Observed mean LOS and patient costs at the hospital level were correlated (r = 0.67, p < 0.001), and we found no hospital in the highest quintile of patient costs that was in the lowest quintile of LOS.

Table 2
Characteristics, treatments, and outcomes of patients with medical sepsis treated in the intensive care unit at 189 US hospitals.
                                              Across patients,            Across hospitals,
                                              median [IQR] or n (%)       median [IQR]
Total                                         40,265 (100)                189 (100)
Demographics
  Age (years)                                 69 [56, 80]                 66.4 [64.3, 68.9]
  Female                                      20,119 (50.0)               49.6% [47.0, 52.5]
Race/ethnicity
  White                                       24,599 (61.1)               69.6% [43.4, 85.0]
  Black                                       7795 (19.4)                 10.7% [3.0, 27.6]
  Hispanic                                    2091 (5.2)                  0.7% [0, 3.2]
  Other/unknown                               5780 (14.4)                 3.8% [1.4, 12.3]
Marital status
  Married                                     14,866 (36.9)               42.0% [32.8, 46.5]
  Widowed                                     7700 (19.1)                 20.5% [15.8, 24.4]
  Single                                      8197 (20.4)                 20.2% [14.9, 25.4]
  Separated/divorced                          3952 (9.8)                  10.0% [6.5, 13.2]
  Other                                       1833 (4.6)                  1.4% [0, 3.4]
  Not recorded                                3717 (9.2)                  0% [0, 0]
Insurance
  Medicare traditional                        24,962 (62.0)               63.9% [55.6, 70.3]
  Managed care                                5581 (13.9)                 12.1% [8.5, 18.0]
  Medicaid traditional                        3327 (8.3)                  6.8% [3.6, 11.3]
  Self pay/other                              2484 (6.2)                  5.1% [2.8, 7.6]
  Medicare managed                            2068 (5.1)                  0.6% [0, 8.9]
  Commercial/private                          1293 (3.2)                  1.9% [0.7, 4.7]
  Medicaid managed                            550 (1.4)                   0% [0, 1.6]
Comorbidities
  Hypertension                                13,111 (32.6)               32.8% [25.9, 38.6]
  Diabetes (with or without complications)    13,444 (33.4)               32.8% [28.3, 38.8]
  Anemia (chronic and acute)                  12,810 (31.8)               30.1% [23.8, 39.6]
  Chronic pulmonary disease                   12,266 (30.5)               30.1% [25.0, 36.1]
  Congestive heart failure                    12,732 (31.6)               31.8% [26.9, 35.5]
  Neurological disorders                      8839 (22.0)                 21.2% [17.6, 24.9]
  Renal failure                               7746 (19.2)                 18.2% [13.5, 24.1]
  Weight loss                                 5901 (14.7)                 13.8% [8.8, 21.0]
  Solid tumor w/out metastasis                3850 (9.6)                  9.0% [6.4, 12.2]
  Hypothyroidism                              4070 (10.1)                 9.6% [6.8, 13.4]
  Depression                                  2968 (7.4)                  6.4% [3.8, 9.8]
  Peripheral vascular disease                 3252 (8.1)                  7.6% [4.9, 10.6]
  Valvular disease                            3461 (8.6)                  8.0% [5.4, 10.8]
  Paralysis                                   2572 (6.4)                  6.1% [4.2, 8.1]
  Obesity                                     2505 (6.2)                  5.7% [3.7, 8.4]
  Metastatic cancer                           2039 (5.1)                  4.7% [3.4, 6.5]
  Liver disease                               2382 (5.9)                  5.7% [4.1, 7.7]
  Psychoses                                   1776 (4.4)                  4.1% [2.8, 5.7]
  Alcohol abuse                               2346 (5.8)                  5.6% [3.9, 7.7]
  Rheumatoid arthritis/collagen disease       1411 (3.5)                  3.4% [2.3, 4.5]
  Lymphoma                                    1183 (2.9)                  2.7% [1.7, 3.9]
  Drug abuse                                  1274 (3.2)                  2.8% [1.5, 4.4]
  Pulmonary circulation disease               1352 (3.4)                  2.8% [1.4, 4.9]
  Peptic ulcer disease                        645 (1.6)                   1.2% [0.6, 2.3]
  AIDS                                        110 (0.3)                   0% [0, 0.4]
Site of infection
  Urinary (primary or secondary)              13,946 (34.6)               33.6% [30.1, 37.7]
  Lung (primary or secondary)                 16,626 (41.3)               40.9% [35.6, 46.8]
  Abdominal (primary or secondary)            5493 (13.6)                 13.3% [9.9, 16.2]
  Blood (and not abdominal, lung, or urinary) 973 (2.4)                   1.8% [0.9, 3.3]
  Other (e.g., skin, bone)/unknown            10,887 (27.0)               26.6% [23.8, 30.8]
Infection type
  Gram positive                               6824 (17.0)                 16.6% [13.1, 20.4]
  Gram negative                               7002 (17.4)                 17.6% [13.8, 21.1]
  Other (fungal, viral, anaerobic, mixed)     544 (1.4)                   1.2% [0.7, 2.0]
  Unknown                                     25,895 (64.3)               64.7% [58.5, 70.8]
Primary diagnosis of sepsis                   22,272 (55.3)               56.0% [48.2, 62.2]
Treatments (by day 2)
  Mechanical ventilation                      14,402 (35.8)               34.1% [28.3, 42.3]
  Vasopressors                                22,334 (55.5)               54.9% [47.8, 62.8]
Outcomes
  LOS (days)                                  8 [5, 14]                   10.2 [9.3, 11.4]
  ICU LOS (days)                              4 [2, 7]                    5.1 [4.5, 6.0]
  Patient costs (US $)                        $15,321 [$8978, $26,359]    $20,216 [$16,997, $24,129]
  In-hospital mortality                       13,277 (33.0)               33.6% [28.0, 38.0]
  Readmission (for survivors only)            5394 (20.0)                 19.5% [16.3, 23.3]

Fixed cost index

The fixed cost index varied widely across hospitals (Fig. 1), with 17% of hospitals having an ICU-day cost more than 1.25 times the median and 13% of hospitals having an ICU-day cost less than 0.75 times the median.

Length of stay index

The ICU LOS index also varied considerably across hospitals (Fig. 1). By definition, the mean and standard deviation (SD) of this index were 0 and 1, respectively. There were nine hospitals (5%) with a length of stay index more than 2 SDs from the mean, and 50 hospitals (26%) had LOS indices more extreme than 1 SD from the mean.

Fig. 1. Distribution of fixed cost index and practice pattern index.

Model examining contribution of patient and hospital factors to cost variation

Our final model contained 11 variables and had an adjusted R2 of 0.67 (Table 3). Overall, our model explained 69% (the unadjusted R2) of the hospital-level variation in the average cost of a hospitalization for a critically ill sepsis patient (Fig. 2). Of explained variation, differences in patient characteristics (including age, comorbidities, and severity variables) explained 20% of the variation in cost across hospitals. Teaching burden (i.e., the resident-to-bed ratio) represented 16% of explained variation, wage index represented 12%, and fixed costs accounted for 19%. The LOS index accounted for 33% of explained variation. We found that the significant variables identified with the LAR method were very similar to those included in our final model via the stepwise bootstrapping algorithm. When the coefficients were standardized so that each one represented a 1 SD increase in its value, we found that a 1 SD increase in the LOS index (equal to 1 day longer, on average, in the ICU) caused an increase of approximately $2300 in average patient costs. Similarly, a 1 SD increase in the fixed cost index resulted in an increase of $1951 in average patient costs, where 1 SD equaled an amount 33% above the median value (Fig. 3).

Table 3
Model examining contribution of patient and hospital factors to cost variation.

Variable                           Coefficient   95% CI                p         Standardized coefficient
                                   ($)                                           (change in mean hospital
                                                                                 costs per 1 SD of X)
LOS index                          2318          (1815 to 2822)        < 0.001   2318
Fixed cost index                   5992          (4288 to 7695)        < 0.001   1950
Wage index                         12,745        (8156 to 17,333)      < 0.001   1766
Residents-to-bed ratio             10,735        (7338 to 14,132)      < 0.001   1671
Vasopressors by day 2              9073          (4046 to 14,100)      < 0.001   976
Mechanical ventilation by day 2    7640          (2838 to 12,442)      0.002     903
Valvular disease                   17,288        (6605 to 27,972)      0.002     817
Albumin by day 2                   5287          (1550 to 9024)        0.006     722
Fungal infection type              17,541        (3208 to 31,874)      0.017     606
Percent age 85+                    −6034         (−16,194 to 4127)     0.253     −374
Widowed marital status             −5132         (−10,868 to 605)      0.079     −452
Constant term                      −8380         (−13,281 to −3479)    0.001     NA

Discussion

The cost of intensive care in the United States was estimated to be $81 billion in 2005, representing more than 13% of all hospital costs and almost 1% of the US gross domestic product.30 Despite this large and growing expenditure, the most effective method for reducing the cost of critical care remains a matter of debate.15,16,24,31–33 In a large cohort of patients with sepsis who were treated in the ICU, we observed significant variation in costs of care across hospitals. To better understand the reasons for this variation, we used patient factors, hospital mission, and fixed cost and length of stay indices to create a model that accounted for nearly 70% of the variation in costs that we observed. Of this explained variation in spending, approximately one-third was related to the length of stay index. The remainder was related to patient comorbidities and disease severity (20%), hospital mission (16%), wage index (12%), and fixed costs (19%). These findings indicate that although opportunities exist to reduce the cost of critical care by changing physician practice, the majority (more than 66%) of the explained variation in spending was the result of factors that were outside of a physician's control. These findings have important implications for hospitals that are seeking to reduce their expenditures on critical care. A hospital that is 1 SD above the mean for both ICU LOS and fixed costs would save a roughly equivalent amount by either reducing ICU LOS by one day or by reducing fixed costs by 33%.
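The rough equivalence above follows directly from how the indices and standardized coefficients are built. A minimal sketch, with hypothetical residual values; the study derived its patient-level Pearson residuals from a generalized linear model of ICU length of stay, and the predictor SDs below are implied by the reported figures rather than stated in the paper:

```python
from statistics import mean, pstdev

def los_index(residuals_by_hospital):
    """Average patient-level Pearson residuals per hospital, then
    standardize the hospital-level averages to mean 0, SD 1."""
    avg = {h: mean(r) for h, r in residuals_by_hospital.items()}
    mu, sd = mean(avg.values()), pstdev(avg.values())
    return {h: (a - mu) / sd for h, a in avg.items()}

# Standardized coefficient = raw coefficient x 1 SD of the predictor,
# i.e., the change in mean hospital cost per 1-SD increase. The LOS index
# has SD = 1 by construction, so its raw and standardized coefficients
# coincide ($2318 per extra expected ICU day); a fixed cost index SD of
# about 0.33 (33% of the median) roughly reproduces the reported $1951.
cost_per_sd_los = 2318 * 1.0
cost_per_sd_fixed = 5992 * 0.33
```

This makes the trade-off concrete: one expected ICU day and a 33% swing in the fixed cost index move average per-patient costs by comparable dollar amounts.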
Given the emphasis on the need for all hospitals to reduce spending in upcoming years, it is likely that such a hospital will need to both reduce ICU length of stay and reduce fixed costs. Reducing ICU length of stay can be achieved by improving throughput (from the emergency room to the ICU, from the ICU to the medical floors, and from inpatient to rehabilitation beds); training physicians to identify and triage stable patients to medical beds; focusing early on ventilator weaning and physical therapy; reducing complications such as deep vein thromboses, ventilator-associated pneumonia, and catheter-associated bloodstream infections; and providing collaborative, team-based care that is focused on reducing length of stay.34–40 Reducing fixed costs is potentially more difficult, but can be accomplished by reducing infrastructure, labor, or acquisition costs (e.g., renegotiating purchasing or labor contracts, identifying areas where the hospital is "overstaffed," holding off on future infrastructure investments such as large equipment purchases or new buildings, or eliminating underutilized ICU beds).14,15,17

Fig. 2. Contributors to variation in hospital spending.
Fig. 3. Change in mean hospital costs with a change in 1 SD.

Our findings generally support other studies on the cost of critical care. In a retrospective cohort study at a single academic center, Kahn et al. concluded that less than 20% of total costs were related to care intensity.17 In another single-center study, Roberts et al.
reported that fixed costs of hospitalizations (related to overhead and labor) represented more than 80% of total costs.14 Others have suggested that room, board, and labor (and not mechanical ventilation, procedures, or medications) were the most significant contributors to the cost of critical care18,24 and that costs across the entire healthcare system are more tied to price than to intensity.41 Taken with our study's findings, these data suggest that initiatives that focus exclusively on intensity of care may garner only modest cost reductions. A strength of this study is that we examined a broad sample of hospitals in both academic and community settings across all geographic regions of the US. Although our methods were different from those used in prior studies (which calculated fixed vs. variable costs in a single hospital), our conclusions are similar.14,17 We were able to add to the literature by quantifying the contributions of hospital mission (e.g., teaching or care for low-income patients). In contrast to prior work that has reported on the association between resource utilization and patient factors, we found that only 20% of explained variation was attributable to patient demographics, comorbidities, and severity of illness at the time of hospital admission.42 Future work should consider the production function of health care at the entire hospital level, across more diagnoses, with a target of optimizing care while improving efficiency. This study has limitations as well. Perhaps most importantly, 30% of variation in costs across hospitals remained unexplained. This may be because we were not able to account for some contributors to variation and because some of our variables were estimates of the contributor in question. The fixed cost index was calculated using the relative cost of a day of ICU care but did not assess the cost of other items.
Hospitals with low room and board costs but high costs for other items might therefore have an artificially low index. In prior studies using this database, however, we have found that fixed and acquisition costs were added proportionally to all items.18,19 ICU length of stay encompasses some aspects of practice pattern but does not include other markers of care intensity such as procedures, tests, or medications (some of these variables were modeled as separate predictors). This might lead us to under- or over-estimate the percent of cost variation that is in the control of the physicians. Because of this, we considered using additional methods for estimating practice pattern, such as use of a single test or procedure or a combination of tests and procedures, but decided that because ICU length of stay is a primary determinant of hospital cost18 and represents a synthesis of the hundreds of medical decisions physicians make in a day, it was our best option for simultaneously estimating practice pattern and aspects of hospital throughput. However, we acknowledge that ICU length of stay does not capture all aspects of practice and we are unable to separately examine how much of length of stay is in the physicians' immediate control and how much is due to broader throughput factors. Another limitation is that our database does not contain clinical data, so unmeasured severity of illness could explain some of the unexplained variation in cost. Notably, we did adjust for severity using variables from a validated sepsis severity adjustment method, so unmeasured severity is less likely to be a significant contributor to variation than if we did not use such a measure. We also did not have access to data on the number of ICU beds in each hospital, number of intensivists, or information on nurse staffing, which are other potentially important contributors to variation in costs. 
We limited our study to hospitals that admitted at least 100 sepsis patients to the ICU over a 2-year period, and we excluded transferred patients. Our findings may not be generalizable to all hospitals. Although our study utilizes data from 189 hospitals throughout the United States, we do not know how spending on sepsis varies among hospitals not in our database. Spending patterns on sepsis patients may not be generalizable to all critically ill patients: patient demographics, comorbidities, and acuity may explain spending variation differently in a more diverse ICU population. In summary, a large proportion of the hospital-level variation in spending on critically ill sepsis patients is related to differences in patient characteristics and immutable hospital characteristics, while nearly one-third is the result of differences in risk-adjusted length of stay. Efforts to reduce spending on the critically ill should aim to understand determinants of practice style but should also focus on hospital throughput, overhead, acquisition, and labor costs.

Acknowledgments

The study was conducted with funding from the Division of Critical Care Medicine and the Center for Quality of Care Research at Baystate Medical Center, Springfield, MA. Premier Healthcare Informatics, Charlotte, NC, provided the data used to conduct this study but had no role in its design, conduct, analysis, interpretation of data, or the preparation, review, or approval of the manuscript. Drs. Lagu and Lindenauer had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Drs. Lagu, Lindenauer, Steingrub, and Rothberg conceived of the study. Dr. Lindenauer acquired the data. Drs. Lagu, Lindenauer, Rothberg, Steingrub, Nathanson, and Mr. Hannon analyzed and interpreted the data. Dr. Lagu drafted the manuscript. Drs. Lindenauer, Rothberg, Nathanson, Steingrub and Mr.
Hannon critically reviewed the manuscript for important intellectual content. Dr. Nathanson carried out the statistical analyses.

References

1. Lagu T, Rothberg MB, Shieh M-S, et al. Hospitalizations, costs, and outcomes of severe sepsis in the United States 2003 to 2007. Critical Care Medicine. 2012;40(3):754–761.
2. Rothberg MB, Cohen J, Lindenauer P, Maselli J, Auerbach A. Little evidence of correlation between growth in health care spending and reduced mortality. Health Affairs (Millwood). 2010;29(8):1523–1531.
3. Fisher ES, Bynum JP, Skinner JS. Slowing the growth of health care costs—lessons from regional variation. New England Journal of Medicine. 2009;360(9):849–852.
4. Fisher ES, Wennberg DE, Stukel TA, et al. The implications of regional variations in Medicare spending. Part 2: health outcomes and satisfaction with care. Annals of Internal Medicine. 2003;138(4):288–298.
5. Shortell SM. Increasing value: a research agenda for addressing the managerial and organizational challenges facing health care delivery in the United States. Medical Care Research and Review. 2004;61(suppl 3):12S–30S.
6. Esserman L, Belkora J, Lenert L. Potentially ineffective care. A new outcome to assess the limits of critical care. JAMA. 1995;274(19):1544–1551.
7. Garland A, Shaman Z, Baron J, Connors AF. Physician-attributable differences in intensive care unit costs: a single-center study. American Journal of Respiratory and Critical Care Medicine. 2006;174(11):1206–1210.
8. Cutler DM, Ly DP. The (paper)work of medicine: understanding international medical costs. Journal of Economic Perspectives. 2011;25(2):3–25.
9. Abraham E, Singer M. Mechanisms of sepsis-induced organ dysfunction. Critical Care Medicine. 2007;35(10):2408.
10. Angus DC, Wax RS. Epidemiology of sepsis: an update. Critical Care Medicine. 2001;29(suppl 7):S109–S116.
11. Dellinger RP, Levy MM, Carlet JM, et al. Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock: 2008. Critical Care Medicine. 2008;36(1):296–327.
12. Knaus WA, Wagner DP, Zimmerman JE, Draper EA. Variations in mortality and length of stay in intensive care units. Annals of Internal Medicine. 1993;118(10):753–761.
13. Lipscomb J, Yabroff KR, Brown ML, Lawrence W, Barnett PG. Health care costing: data, methods, current applications. Medical Care. 2009;47(7 suppl 1):S1–S6.
14. Roberts RR, Frutos PW, Ciavarella GG, et al. Distribution of variable vs fixed costs of hospital care. JAMA. 1999;281(7):644–649.
15. Kahn JM. Understanding economic outcomes in critical care. Current Opinion in Critical Care. 2006;12(5):399–404.
16. Kahn JM, Angus DC. Reducing the cost of critical care: new challenges, new solutions. American Journal of Respiratory and Critical Care Medicine. 2006;174(11):1167–1168.
17. Kahn JM, Rubenfeld GD, Rohrbach J, Fuchs BD. Cost savings attributable to reductions in intensive care unit length of stay for mechanically ventilated patients. Medical Care. 2008;46(12):1226–1233.
18. Lagu T, Rothberg MB, Nathanson BH, et al. The relationship between hospital spending and mortality in patients with sepsis. Archives of Internal Medicine. 2011;171(4):292–299.
19. Chen SI, Dharmarajan K, Kim N, et al. Procedure intensity and the cost of care. Circulation: Cardiovascular Quality and Outcomes. 2012;5(3):308–313.
20. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Medical Care. 1998;36(1):8–27.
21. Lagu T, Lindenauer PK, Rothberg MB, et al. Development and validation of a model that uses enhanced administrative data to predict mortality in patients with sepsis. Critical Care Medicine. 2011;39(11):2425–2430.
22. Lagu T, Rothberg MB, Nathanson BH, Steingrub JS, Lindenauer PK. Incorporating initial treatments improves performance of a mortality prediction model for patients with sepsis. Pharmacoepidemiology and Drug Safety. 2012;21(suppl 2):44–52.
23. Wynn B, Coughlin T, Bondarenko S, Bruen B. Analysis of the Joint Distribution of Disproportionate Share Hospital Payments. Prepared for the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services, by RAND under contract with the Urban Institute. 〈http://aspe.hhs.gov/health/reports/02/DSH/〉, 2002 (Accessed 15.04.13).
24. Pastores SM, Dakwar J, Halpern NA. Costs of critical care medicine. Critical Care Clinics. 2012;28(1):1–10.
25. Southern WN, Bellin EY, Arnsten JH. Longer lengths of stay and higher risk of mortality among inpatients of physicians with more years in practice. American Journal of Medicine. 2011;124(9):868–874.
26. Southern WN, Berger MA, Bellin EY, Hailpern SM, Arnsten JH. Hospitalist care and length of stay in patients requiring complex discharge planning and close clinical monitoring. Archives of Internal Medicine. 2007;167(17):1869–1874.
27. Austin PC, Tu JV. Bootstrap methods for developing predictive models. The American Statistician. 2004;58(2):131–137.
28. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Annals of Statistics. 2004;32(2):407–499.
29. Bradley EH, Herrin J, Curry L, et al. Variation in hospital mortality rates for patients with acute myocardial infarction. American Journal of Cardiology. 2010;106(8):1108–1112.
30. Halpern NA, Pastores SM. Critical care medicine in the United States 2000–2005: an analysis of bed numbers, occupancy rates, payer mix, and costs. Critical Care Medicine. 2010;38(1):65–71.
31. Luce JM, Rubenfeld GD. Can health care costs be reduced by limiting intensive care at the end of life? American Journal of Respiratory and Critical Care Medicine. 2002;165(6):750–754.
32. Orszag PR, Ellis P. Addressing rising health care costs—a view from the Congressional Budget Office. New England Journal of Medicine. 2007;357(19):1885–1887.
33. Stockwell DC, Slonim AD. Intensive care unit costs: to infinity and beyond or not? Critical Care Medicine. 2008;36(9):2676–2678.
34. Rivers E, Nguyen B, Havstad S, et al. Early goal-directed therapy in the treatment of severe sepsis and septic shock. New England Journal of Medicine. 2001;345(19):1368–1377.
35. Nguyen HB, Corbett SW, Steele R, et al. Implementation of a bundle of quality indicators for the early management of severe sepsis and septic shock is associated with decreased mortality. Critical Care Medicine. 2007;35(4):1105–1112.
36. Pronovost P, Needham D, Berenholtz S, et al. An intervention to decrease catheter-related bloodstream infections in the ICU. New England Journal of Medicine. 2006;355(26):2725–2732.
37. Pronovost PJ, Thompson DA, Holzmueller CG, Dorman T, Morlock LL. The organization of intensive care unit physician services. Critical Care Medicine. 2007;35(10):2256–2261.
38. Bucknall TK, Manias E, Presneill JJ. A randomized trial of protocol-directed sedation management for mechanical ventilation in an Australian intensive care unit. Critical Care Medicine. 2008;36(5):1444–1450.
39. Gao F, Melody T, Daniels DF, Giles S, Fox S. The impact of compliance with 6-hour and 24-hour sepsis bundles on hospital mortality in patients with severe sepsis: a prospective observational study. Critical Care. 2005;9(6):R764–R770.
40. Resar R, Pronovost P, Haraden C, et al. Using a bundle approach to improve ventilator care processes and reduce ventilator-associated pneumonia. Joint Commission Journal on Quality and Patient Safety. 2005;31(5):243–248.
41. Anderson GF, Reinhardt UE, Hussey PS, Petrosyan V. It's the prices, stupid: why the United States is so different from other countries. Health Affairs (Millwood). 2003;22(3):89–105.
42. Odetola FO, Gebremariam A, Freed GL. Patient and hospital correlates of clinical outcomes and resource utilization in severe pediatric sepsis. Pediatrics. 2007;119(3):487–494.
Healthcare 1 (2013) 37–41

Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi

Innovating in health delivery: The Penn Medicine innovation tournament

Christian Terwiesch a,b,c, Shivan J. Mehta a,c,d,*, Kevin G. Volpp a,c,d,e

a Penn Medicine Center for Innovation, United States
b Department of Operations and Information Management, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, United States
c Leonard Davis Institute of Health Economics, Center for Health Incentives and Behavioral Economics, United States
d Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, United States
e Center for Health Equity Research & Promotion, Philadelphia Veterans Affairs Medical Center, United States

Article history: Received 11 February 2013; Received in revised form 19 April 2013; Accepted 5 May 2013; Available online 13 May 2013

Abstract

Background: Innovation tournaments can drive engagement and value generation by shifting problem-solving towards the end user. In health care, where frontline workers have the most intimate understanding of patients' experience and the delivery process, encouraging them to generate and develop new approaches is critical to improving health care delivery.
Problem: In many health care organizations, senior managers and clinicians retain control of innovation. Frontline workers need to be engaged in the innovation process.
Goals: Penn Medicine launched a system-wide innovation tournament with the goal of improving the patient experience. We set a quantitative goal of receiving 500 ideas and getting at least 1000 employees to participate in the tournament. A secondary goal was to involve the various groups in the care process (doctors, nurses, clerical staff, transporters).
Strategy: The tournament was broken up into three phases. During Phase 1, employees were encouraged to submit ideas.
Submissions were judged by an expert panel and crowd-sourcing based on their potential to improve the patient experience and their ability to be implemented within 6 months. During Phase 2, the best 200 ideas were pitched during a series of 5 workshops, and ten finalists were selected. During Phase 3, the best 10 ideas were presented to and judged by an audience of about 200 interested employees and a judging panel of 15 administrators. Two winners were selected.
Results: A total of 1739 ideas were submitted and over 5000 employees participated in the innovation tournament. Patient convenience/amenities (21%) was the top category of submission, with other popular areas including technology optimization (11%), assistance with navigation within UPHS (10%), improving patient/family centered care (9%), and care delivery models/transitions (9%). A combination of winning and submitted ideas were implemented.
Implications: Innovation tournaments can successfully engage a large portion of the employee population. Innovation tournaments represent a "bottom-up" approach to health care innovation and a method by which innovation can be democratized from the control of administrators and executives. Further research is needed to test, evaluate, and improve innovation tournaments.
© 2013 Elsevier Inc. All rights reserved.

Keywords: Health delivery; Innovation; Operations improvement

1. Introduction

Innovation can be defined as creating a novel match between a solution and a need in order to create value. Innovation in medicine has a long history but is often confused with invention. Invention hinges only on the concept of novelty, while innovation requires both novelty and value creation. Value is created if the novel match between solution (a treatment) and need (disease or injury) leads to better patient outcomes. The process of innovation begins with the discovery of novel needs and solutions or with a recombination of existing solutions and needs. At the initial stage, though, value generation is merely a hypothesis, and the front end of the innovation process really consists of opportunity identification. Well-designed innovation is heavily grounded in the scientific method, with opportunity identification and hypothesis testing being key steps in the process. As economic pressures on the health care system have risen, new approaches to increase the value of health services provided have gained in importance. For example, the triple aim describes improving the patient experience of care, improving the health of populations, and reducing the per capita cost of health care.1

In many health care and non-health care organizations, senior managers and clinicians retain control of innovation: they create new opportunities and select the ones to be implemented. However, a top-down approach to innovation might be ill suited for areas that require a more intimate understanding of patients' experience and the delivery process. In this article, we describe a 'bottom-up' approach to improving the patient experience which engages a large group of innovators, including front-line caregivers, administrative support staff, and hospital support functions. Specifically, we invited all 18,000 employees within Penn Medicine to participate. This broader innovator base called for significant modifications to the traditional innovation process and led us to conduct an innovation tournament.

* Correspondence to: Perelman School of Medicine, University of Pennsylvania, 1137 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104, United States. Tel.: +1 215 898 9807. E-mail address: [email protected] (S.J. Mehta).
2213-0764/$ - see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.hjdsi.2013.05.003

2. Innovation tournaments

Innovation tournaments are very similar to the requests for proposals that underlie many grant-funded initiatives.
The host of a tournament issues a call for ideas (opportunities) in a particular area of interest; in our case, ideas on how to improve the patient experience. The innovator community, in our case the entire population of employees at Penn Medicine, is invited to submit opportunities. These opportunities then go through multiple steps of screening and evaluation (including crowd-sourcing and peer review). A few ideas emerge as winners while many others do not advance. This process is common to all tournaments, including the various X Prizes (such as the search for private space travel technology and the development of highly fuel-efficient vehicles) and DARPA challenges (including the search for driverless car technology). Innovation tournaments have been advocated in the academic literature as part of a broader movement to "democratize innovation".2–4 The locus of creative problem solving is shifted towards the end user, in our case the front-line caregiver. A basic principle in running such tournaments is to generate a high number of submissions from a large number of employees, since knowledge is "sticky" and distributed.5,6 Both the number and the diversity of the participants increase the quality of the best ideas of an innovation process.3

3. Setting the stage for the Penn Medicine innovation tournament

Many aspects of the care delivery process that are important to the patient are under the influence of nurses, patient transporters, or administrative support staff. Involving these groups can help identify new opportunities that would be overlooked if innovation were left to physicians and executives alone. We set an initial goal of 500 ideas and aspired to get at least 1000 employees involved in the process, either by identifying an opportunity or by commenting on opportunities identified by others. All employees who submitted an idea were allowed to vote on the ideas of others.
Penn Medicine is the umbrella organization that includes the Perelman School of Medicine at the University of Pennsylvania and the University of Pennsylvania Health System (UPHS). The goal of the innovation tournament was to engage the broader community to identify and select new ideas that generate value for the patients of the health system. Understanding this complex organization and communicating with its different employees was a major challenge identified early in the planning process. The tournament offered the potential to unite this diverse employee base with a common purpose. The Chief Executive Officer of UPHS endorsed the concept of a tournament in November 2011. A steering committee was formed, including UPHS executives and faculty from the medical and business schools with expertise in health services. The tournament challenged the employee base to come up with ideas that would significantly improve the patient experience, because this theme transcended all employees across the organization. The steering committee enlisted a working group from across the health system to operationalize the logistics and leverage existing communication channels in the organization.

4. Running the tournament

The tournament was broken up into three phases (Fig. 1). Phase 1 took place during January–February 2012 with the goal of encouraging the submission of ideas from across the health system. Beyond the quantitative goal of receiving 500 ideas, a secondary goal was to obtain representative participation from the various groups involved in the care process (doctors, nurses, clerical staff, transporters) and from the multiple sites that constitute the health system. An active communication program was launched to reach the health system's 18,000 employees, including flyers and posters, a strong online presence, and several email memos from the CEO.
The communication channels directed participants to a specially designed website that allowed for idea submission, viewing of ideas, and rating of ideas on a scale of 1–5 through crowd-sourcing. There were no monetary prizes for winning or moving forward, because it was felt that monetary prizes would not increase creativity in a mission-based organization and that large cash prizes might hinder the free and open sharing of ideas. The number of submissions and the overall participation substantially exceeded the previously set objectives. A total of 1739 ideas were submitted, and over 5000 employees participated by commenting on and rating ideas.

A total of 200 ideas were moved from Phase 1 to Phase 2, based on a combination of expert rating and crowd-sourcing on a scale of 1–5. A panel of 29 experts from various parts of the health system was asked to evaluate a set of at least 400 ideas presented to them in random order. Moreover, each idea was rated by "the crowd" (any employee who wanted to evaluate an idea). The two criteria suggested for rating ideas were the potential to improve the patient experience and whether the idea could be implemented within 6 months. Given the uncertainty about the results, the decision was made to keep the criteria deliberately broad. Ideas were moved forward if they had a high mean score, but also if they generated significant excitement, reflected in a large percentage of high scores (4–5).

The second phase of the tournament consisted of a set of 5 workshops representing the best 200 ideas from Phase 1. Each workshop was structured to allow for voting on ideas and collaboration among newly formed teams. After some opening comments and instructions from the moderator, participants were asked to present ("pitch") their ideas. Participants were asked to prepare a short poster summarizing their idea and to present it in less than 90 s. Voting was conducted using simple stickers, and teams of five were formed around the best ideas.
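The two-track advancement rule described above (move an idea forward for a high mean rating, or for strong enthusiasm reflected in a large share of 4–5 ratings) can be sketched as follows. This is a minimal illustration, not the tournament's actual code; the threshold values and idea names are assumptions chosen for the example.

```python
def select_ideas(ratings_by_idea, n_advance=200, mean_cut=4.0, top_frac_cut=0.5):
    """Advance ideas with a high mean rating OR a large share of
    high (4-5) ratings on the 1-5 scale. Thresholds are illustrative."""
    def mean(rs):
        return sum(rs) / len(rs)

    def top_frac(rs):
        # Share of ratings that are "high" (4 or 5).
        return sum(1 for r in rs if r >= 4) / len(rs)

    qualifying = [idea for idea, rs in ratings_by_idea.items()
                  if mean(rs) >= mean_cut or top_frac(rs) >= top_frac_cut]
    # If more ideas qualify than there are slots, favor higher mean scores.
    qualifying.sort(key=lambda idea: mean(ratings_by_idea[idea]), reverse=True)
    return qualifying[:n_advance]
```

The second track matters for polarizing ideas: an idea rated 5 by half the raters and 1 by the other half has a mediocre mean, yet signals exactly the kind of excitement the organizers wanted to capture.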
Teams were then given 1 h to refine and improve the idea. After that, the best 6–10 ideas were presented again, and two ideas from each workshop were selected to advance to Phase 3.

Phase 3 of the tournament consisted of a presentation of the ten best ideas (5 workshops, 2 ideas per workshop; Fig. 2) to an audience of about 200 interested employees and a judging panel of 15, which included the Chief Executive Officer, the Chief Medical Officer, and other representatives of leadership at UPHS and the School of Medicine.

Fig. 1. Overall ideas by phase.
Fig. 2. Top ten finalist ideas.
Fig. 3. Categories of solutions, with example ideas:
- Convenience/amenities (21%): restaurant pagers/buzzers in waiting areas for patients to be called for clinic appointments; artwork for patients, families, and coworkers; flexibility to order food online at any time in the hospital.
- Technology optimization (11%): electronic consent form iPad application; enhanced patient ID cards with online forms and records; online payment for patient bills.
- Navigation (10%): iPhone app to help patients and families navigate the hospital; ED waiting room patient queue screen; QR codes on signs in the hospital to launch directions to specific places.
- Improving patient/family centered care (9%): iconography cards for non-English speaking patients; daily summary of care plan for patients in the hospital; speedy pre-check-in process the night before an appointment.
- Care delivery models/transitions (9%): cellphone app for adherence monitoring and intervention; a Care Manager assigned just for readmitted patients to improve transitions; 24/7 clinical triage line to assist health system patients.

Between Phase 2 and Phase 3, the corresponding teams were provided with time and resources to create a
10-minute presentation that addressed the problem, solution, metrics for success, and resources necessary for implementation. Each of the Phase 3 finalists had their picture taken with the CEO and the head of HR and received a trophy. The overall winners of the tournament were chosen based on the votes of the judges as well as an audience participation system. The final judges were instructed to evaluate "How new or novel is this idea for health care delivery?" and "What is the potential for this idea to improve the patient experience at Penn Medicine?" Two winning ideas were selected to move forward with the appropriate resources necessary for implementation. The winning team members were given the opportunity to actively participate in these development teams.

5. Post event assessment

Patient experience was a deliberately broad topic, and the ideas submitted transcended all aspects of patient experience, including convenience, care delivery, environment, and education (Fig. 3). Patient convenience/amenities (21%) was the top category of submission, with other popular areas including technology optimization (11%), assistance with navigation within UPHS (10%), improving patient/family centered care (9%), and care delivery models/transitions (9%). Given the complex organization of an academic medical center, a variety of "solvers" proposed ideas, ranging from physicians and nurses to managers and additional staff. A stated goal of the initiative was to generate ideas from all types of workers across the organization, and all groups were represented in the submissions. The role and background of the "solvers" is of particular interest, since it provides some insight into future efforts to promote innovation across the organization. These "solvers" were typically clinical staff, including nurses and physicians. This could be a result of their close interaction with patients, or of their deep expertise in fields adjacent to the many non-clinical patient experience solutions.

Several important takeaways emerged. The tournament was highly successful at engaging a large portion of the employee population, with more than half of the organization participating as idea submitters, voters, or visitors to the tournament website. For this to happen, it required the support of the health system executives as well as excitement from across the organization: a combination of top-down and bottom-up approaches. Ten ideas emerged as finalists. These ideas included a patient kiosk for check-in as well as an online scheduling tool to book appointments with providers. It lies in the nature of an innovation tournament that these ideas are not "shovel ready" for execution; they still require a careful process leading to a staged implementation, which started in the summer of 2012. Another useful insight was that many solutions were based not on the professional experiences of the employees but on their experience as patients. In fact, one of the top ideas the health system is moving forward with is to hold an innovation tournament with patients as the participants.

6. Conclusion

The success of a tournament can be measured on two dimensions. The main and obvious success measure is to evaluate the output of the tournament process, i.e., the quality of the ideas generated. Other factors to be considered on this first dimension are the quality of the teams that were formed over the course of the tournament and the level of organizational support and legitimacy an idea receives by going through this fair and transparent process.
The second dimension of an evaluative framework looks at the impact the tournament had on the culture of innovation in the organization, as well as employees' engagement and commitment.

There is certainly a balance between having a broad or a narrow area of focus for an innovation tournament. A broad problem can potentially increase the number of participants and allow for more creativity in the problem statement. However, there is sometimes value in identifying an important, more specific problem, which may constrain the participants to think of more divergent solutions in key areas. We believe that this depends on the priorities of the organization. Our tournament had a deliberately broad focus area in order to maximize participation from a wide range of employees across the system. In thinking about innovation tournaments, it is helpful to consider the depth of expertise and scale of investment required.3 However, the challenges of health delivery are multi-faceted and less capital intensive, so they can be better "distributed" to the frontline workers.

There is a challenge to having only two "winners", so it is important to realize that the innovation process does not end with the final session. Winning ideas need to be implemented to learn more about the solutions, but also about the problems being addressed. Meanwhile, organizations can increase the number of "winners" by facilitating the development and testing of ideas that did not emerge through the tournament. Submitted ideas in many cases provided quick fixes to previously unaddressed problems: they provided useful insights from the front line, yet were too small or too specific to attract a large number of votes in the tournament. While not chosen as tournament winners, these ideas were valuable and were fed back to the responsible department or team leaders for implementation. Some of them were implemented within weeks of the tournament closing.
Beyond the ideas themselves, the tournament succeeded at creating the sense among employees that their ideas and participation are valued. For a large health care organization, such an increase in employee engagement and the associated change in organizational culture might be as valuable as the actual winning ideas themselves in bringing about subsequent improvements in the patient experience. Innovation tournaments source and prioritize a large number of opportunities at a relatively low cost (measured either as the cost per idea or as the cost per selected idea). As such, they do create value and are a more effective approach than using outside consultants or a centralized organizational unit that "owns" the innovation task. Future areas of research could involve studying how engagement of employees in innovation tournaments alters their future activities outside of the tournament, and testing different ways to evaluate and move forward tournament ideas.

References
1. Berwick DM, Nolan TW, Whittington J. The triple aim: care, health, and cost. Health Affairs. 2008;27(3):759–769.
2. von Hippel E. Democratizing Innovation. Cambridge, MA: MIT Press; 2005.
3. Terwiesch C, Ulrich KT. Innovation Tournaments: Creating and Selecting Exceptional Opportunities. Boston, MA: Harvard Business Press; 2009.
4. Terwiesch C, Xu Y. Innovation contests, open innovation, and multiagent problem solving. Management Science. 2008;54(9):1529–1543.
5. Lakhani KR, Panetta JA. The principles of distributed innovation. Innovations. 2007;2(3):97–112.
6. Lakhani KR, Jeppesen LB, Lohse PA, Panetta JA. The value of openness in scientific problem solving. Harvard Business School Working Paper No. 07-050. 〈http://www.hbs.edu/research/pdf/07-050.pdf〉; 2007 [accessed 4.05.2007].
Healthcare 1 (2013) 42–49

Contents lists available at SciVerse ScienceDirect
Healthcare
journal homepage: www.elsevier.com/locate/hjdsi

Review

What can the past of pay-for-performance tell us about the future of Value-Based Purchasing in Medicare?

Andrew M. Ryan a,*, Cheryl L. Damberg b

a Department of Public Health, Weill Cornell Medical College, 402 East 67th Street, New York, NY, USA
b RAND Corporation and Pardee RAND Graduate School, Santa Monica, CA, USA

Article history: Received 8 March 2013; Received in revised form 15 April 2013; Accepted 21 April 2013; Available online 9 May 2013

Abstract

The Medicare program has implemented pay-for-performance (P4P), or Value-Based Purchasing, for inpatient care and for Medicare Advantage plans, and plans to implement a program for physicians in 2015. In this paper, we review evidence on the effectiveness of P4P and identify design criteria deemed to be best practice in P4P. We then assess the extent to which Medicare's existing and planned Value-Based Purchasing programs align with these best practices. Of the seven identified best practices in P4P program design, the Hospital Value-Based Purchasing program is strongly aligned with two of the best practices, moderately aligned with three, weakly aligned with one, and has unclear alignment with one. The Physician Value-Based Payment Modifier is strongly aligned with two of the best practices, moderately aligned with one, weakly aligned with three, and has unclear alignment with one. The Medicare Advantage Quality Bonus Program is strongly aligned with four of the best practices, moderately aligned with two, and weakly aligned with one. We identify enduring gaps in the P4P literature as it relates to Medicare's plans for Value-Based Purchasing and discuss important issues in the future of these implementations in Medicare.
© 2013 Elsevier Inc. All rights reserved.
Keywords: Pay-for-performance; Value-based purchasing; Payment; Medicare

Contents
1. Background ........................................................................ 42
2. Summary of research on hospital and physician P4P .............................. 43
3. Precursors to nationwide implementation of Value-Based Purchasing in Medicare .. 43
4. Medicare's Value-Based Purchasing programs ..................................... 43
5. Comparison of Medicare's direction in Value-Based Purchasing and best practices . 44
6. Considerations for the future of Value-Based Purchasing in Medicare ............ 46
Conflict of interest ................................................................ 47
Funding ............................................................................ 47
Appendix A. Identifying best practices for the design of P4P programs ............. 47
References ......................................................................... 48

1. Background

The concept of pay-for-performance (P4P) – that payers should explicitly link provider reimbursement with performance on quality measures – is compelling. Because patients have a limited ability to observe the quality of care that they receive,1 providers have lacked the incentive to provide sufficiently high quality care, resulting in suboptimal quality across the health care system.2 In response, both public and private payers have attempted to incentivize the delivery of high quality care through the payment system by initiating P4P programs. P4P has now been implemented nationally by Medicare for inpatient care3 and for Medicare Advantage (MA) plans, and starting in 2015 will be implemented for physicians as part of the Physician Value-Based Payment Modifier.4 However, despite the best efforts of researchers, the answer to the question "Does pay-for-performance improve quality in health care?" remains frustratingly elusive. Even after widespread implementation of P4P in the United States, international P4P efforts,5 and accumulating research, much is still unknown about the conditions under which P4P is most effective and whether P4P has the potential to be a cost-effective means of improving quality.6 Further, research has little to say about extensions of P4P "version 1.0," such as how P4P can improve the value of care,7 not just quality.

* Corresponding author. Tel.: +1 646 962 8077. E-mail address: [email protected] (A.M. Ryan).
http://dx.doi.org/10.1016/j.hjdsi.2013.04.006
While policymakers would like to know how a specific incentive targeted towards a specific provider can be expected to impact a specific quality measure, research to date can, at best, guide policymakers towards general principles of implementation. Nonetheless, the existing literature on P4P can provide some guidance to policymakers about how we should form expectations as to the likely effectiveness of future programs, the key design features that will influence the success of these programs, and how we can understand the risk and reward trade-off between more highly powered incentives and the potential for unintended consequences. In this article, we draw on the recent literature on P4P to identify key insights that are relevant to national P4P implementation efforts. We focus on how the literature may inform P4P, now referred to as Value-Based Purchasing, implementations in Medicare: the Hospital Value-Based Purchasing (HVBP) program, the Physician Value-Based Payment Modifier (PVBPM), and the Medicare Advantage Quality Bonus Program (QBP). We then discuss additional considerations for the design of these programs and how research can support these efforts.

2. Summary of research on hospital and physician P4P

By 2004, 37 separate P4P programs had been implemented in the United States, almost exclusively by private payers in the outpatient setting,8 and by 2006 more than half of HMOs used pay-for-performance9 and most state Medicaid programs were using some form of P4P.10 National estimates indicate that, by 2007, approximately half of physician practices had been exposed to P4P from private payers or Medicaid.11,12 A number of influential articles have assessed the extent of P4P implementation and the evidence on payment-for-quality programs implemented in the previous two decades.6,13–15 While equivocal, reviews of the early published studies suggested that financial incentives for quality could generate improvement under some circumstances.13 More recent reviews of the literature have painted a more mixed picture of the overall effectiveness of P4P, and have also begun to identify conditions under which P4P could be more effective. A review by Flodgren et al.16 found that financial incentives were generally effective in improving processes of care (improving 41/57 measures from 19 studies) but generally ineffective in improving compliance with a pre-specified population quality target (improvement observed for 5/17 measures from five studies).
Other systematic reviews of the evidence found insufficient evidence to support (or not support) the use of financial incentives for quality of care in primary care and for individual physicians.17,18 This work also suggested that more methodologically rigorous studies were less likely to find positive effects of P4P.18 In addition, a review by Van Herck et al.19 found that P4P programs tended to show greater improvement on process measures compared to outcomes, that the positive effect of incentives was generally greater for initially low performers compared to higher performers, that it was unclear how the magnitude of incentives impacted the effectiveness of P4P programs, and that programs aimed at the individual-provider level and/or team level generally reported positive results. However, most of the programs evaluated in these reviews were small-scale P4P experiments, initiated either by a single payer within a health care market or for a select group of providers. The extent to which results from these programs would generalize to national, mandatory implementations of P4P is unclear.

3. Precursors to nationwide implementation of Value-Based Purchasing in Medicare

For hospitals, the research that is most relevant to Medicare's implementation of HVBP comes from the Premier Hospital Quality Incentive Demonstration (HQID). Under this demonstration, 266 hospitals, all subscribers to Premier's "Perspective" hospital performance benchmarking service, agreed to collect and report data on a set of quality measures and to make their performance subject to financial incentives. Implementation of the HQID occurred in two phases.
Results from initial studies of the phase 1 HQID implementation appeared promising: two studies reported that participating hospitals experienced modestly greater rates of quality improvement for process of care measures compared with comparison hospitals for each of the incentivized diagnoses examined in the first three years of the program.20,21 However, subsequent studies on the HQID raised doubts that the program improved quality performance.22–24 Even more discouraging were results from phase 2 of the HQID, which found that changes in program design did not generate additional quality improvement.25,26 Detailed re-analyses of the initial data suggested that the early program success may have been due to selection of stronger hospitals into the demonstration and that the later slowdown in improvement may have resulted from many of the incentivized performance measures becoming "topped out".26 Other research found that the HQID did not appear to improve mortality outcomes across both phases of implementation.27 On the physician side, while numerous P4P programs have been implemented by private payers and state Medicaid programs, there are few large-scale programs from which to predict what might occur under a Medicare-implemented physician Value-Based Purchasing program. The most substantial private payer implementations of P4P have been for physician group practices in California, including the PacifiCare Quality Incentive Program (QIP) and the California Integrated Healthcare Association's (IHA) P4P program.
Evaluation of the QIP found very modest results,28 while the IHA initiative was found to have changed the behavior of the physician organizations, leading to an increased organizational focus on quality and IT adoption.29 Less optimistically, evidence from a major P4P initiative implemented by five large commercial payers in Massachusetts found that the program did not improve performance on a series of incentivized HEDIS quality measures.30 In a few cases, P4P programs have been implemented among physician practices that focus on improving both cost and quality performance, using a shared savings incentive model. The Medicare Physician Group Practice (PGP) demonstration offered 10 physician practices shared savings contingent on the groups demonstrating improvement in quality on clinical measures and reductions in beneficiary costs. Another program, the Blue Cross Alternative Quality Contract (AQC), uses quality bonuses along with a shared savings model based on global budgets that includes both upside and downside risk for 11 participating provider organizations in Massachusetts. Compared to a comparison group of non-participating practices, quality improved more for AQC practices in chronic care management, adult preventive care, and pediatric care in the second year of the program, and the program has resulted in modest reductions in spending trends.31

4. Medicare's Value-Based Purchasing programs

A.M. Ryan, C.L. Damberg / Healthcare 1 (2013) 42–49

Following the experience of the HQID, the Affordable Care Act (ACA) enacted Hospital Value-Based Purchasing (HVBP) for all acute care hospitals in the United States, making HVBP the first national implementation of P4P in the United States. Under HVBP, acute care hospitals – those paid under Medicare's Inpatient Prospective Payment System (IPPS) – received payment adjustments starting in October 2012 based on their performance on 12 clinical process and 9 patient experience measures from July 1, 2011 through March 31, 2012. Based on their performance on the clinical process and patient experience domains, hospitals receive a total performance score. In FY 2013, the score is weighted so that 70% reflects clinical process and 30% reflects patient experience. HVBP is budget neutral, redistributing hospital payment "withholds" – equal to 1% of hospital payments from Diagnosis Related Groups (DRGs) – from "losing" to "winning" hospitals. Incentive payments in HVBP are based on an approach that incorporates both quality attainment and quality improvement, rewarding hospitals for incremental improvements and foregoing the all-or-nothing threshold design of other programs. HVBP will also evolve over time. The magnitude of hospital payments at risk in HVBP increases by 0.25% per year, from 1% of hospital payments from DRGs in FY 2013 up to 2% in FY 2017 and beyond. The measures incentivized in the program will also change: in FY 2014, the program will add performance measures for 30-day mortality for heart attack, heart failure, and pneumonia, and in FY 2015 patient safety indicators and the first efficiency measure (Medicare spending per beneficiary) will be incentivized as well.32 The PVBPM, the first national P4P initiative for physicians in fee-for-service Medicare, began quality measurement on January 1, 2013 and will affect payments beginning in 2015. The measure reporting and feedback components of the PVBPM are based on the voluntary Physician Quality Reporting System (PQRS), which was implemented in 2007.
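The HVBP payment arithmetic described above (a 70/30 domain weighting and a budget-neutral 1% withhold redistributed from "losing" to "winning" hospitals) can be sketched in a few lines. The domain weights and withhold rate come from the text; the score-proportional payout rule and all dollar figures below are hypothetical simplifications, not CMS's actual formula:

```python
# Simplified sketch of HVBP-style scoring and budget-neutral redistribution.
# The 70%/30% domain weights and the 1% DRG withhold follow the text; the
# score-proportional payout rule below is a hypothetical simplification.

def total_performance_score(clinical_process, patient_experience):
    """Weighted total performance score (each domain scored 0-100)."""
    return 0.70 * clinical_process + 0.30 * patient_experience

def redistribute(drg_payments, scores, withhold=0.01):
    """Return each hospital's incentive payment from the withheld pool,
    allocated in proportion to payment-weighted performance scores."""
    pool = withhold * sum(drg_payments)                  # total withheld
    weights = [p * s for p, s in zip(drg_payments, scores)]
    total = sum(weights)
    return [w / total * pool for w in weights]

# Two hypothetical hospitals with equal DRG payments but unequal scores:
score_a = total_performance_score(80, 60)   # ~74
score_b = total_performance_score(50, 50)   # ~50
payouts = redistribute([100e6, 100e6], [score_a, score_b])
```

With equal DRG payments, hospital A earns back 74/124 of the $2 million pool (about $1.19 million against a $1 million withhold) while hospital B receives about $0.81 million; the transfers net to zero, mirroring the program's budget neutrality.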
For the PVBPM, physicians in practices of 100 or more eligible professionals will be subject to a value-based payment modifier in 2015, while all physicians in fee-for-service Medicare will be subject to the payment modifier in 2017. In 2013, physicians in all practices must report to the PQRS in order to avoid a payment penalty in 2015.33 Practices can report relevant data through several mechanisms, including a web interface or a CMS registry, or can request that CMS calculate measure performance through administrative claims. Large practices that are subject to the PVBPM can then either elect to be subject to quality tiering, which may result in an upward or downward adjustment in payment, or elect not to be subject to quality tiering, resulting in no payment adjustment. Large practices that do not report to the PQRS will face a 1% value modifier penalty, and may face further penalties for non-compliance with the PQRS. Practices will be evaluated on measures related to all-cause readmission, acute prevention indicators, and chronic indicators, and will also be evaluated on measures of total per-capita costs, as well as costs for patients with specific chronic diseases. Standardized measures for the cost and quality domains, based only on levels of performance, will then be determined, and practices electing to face quality tiering will face payment adjustments based on their combination of quality and cost performance. The details of how the PVBPM will be expanded to smaller practices have not yet been determined.34 The ACA also initiated P4P in Medicare Advantage plans under the Quality Bonus Program (QBP). Starting in 2015, Medicare Advantage plans can earn substantial quality bonus payments based on their performance on measures of clinical performance, patient experience, patient-reported outcomes, and customer complaints and other service indicators.
These plans receive a star rating based on their performance, and plans that receive 4 stars or higher (out of 5) are eligible for bonus payments.35 Medicare is currently running a three-year demonstration of this program (2012–2014), with bonuses paid to plans that earn 3 stars or higher.

5. Comparison of Medicare's direction in Value-Based Purchasing and best practices

After reviewing the literature, we derived the following seven criteria reflecting "best practice" in the design of P4P:

1. Choose measures that have room for improvement.
2. Promote widespread awareness of the program.
3. Coordinate program design (performance measures and payout criteria) across payers (public and private payers, different public payer initiatives).
4. Incentivize both quality attainment and quality improvement.
5. Adjust programs dynamically to recalibrate measures and payment thresholds.
6. Pay incentives that are sufficiently large to motivate a behavioral response.
7. Provide technical assistance to participating providers.

The process we used to arrive at the seven best practice criteria is found in Appendix A. We mapped these criteria against the ongoing and planned P4P program design features in Medicare to identify the extent to which the current design of Medicare's P4P programs is aligned with best practice in P4P. We assessed the degree of alignment between HVBP, the PVBPM, and the Medicare Advantage QBP and our identified best practices, determining whether each program has weak (i.e., little or no alignment between program design and best practice), moderate (i.e., some, but not complete, alignment), strong (i.e., complete alignment), or unclear alignment (i.e., program design is not sufficiently detailed to determine alignment, or the degree of alignment cannot be determined) with each best practice (Table 1).
The current design of HVBP is moderately or strongly aligned with criteria #1 (measures have room for improvement), #2 (promote widespread awareness of the program), #4 (incentivize quality attainment and quality improvement), #5 (adjust programs dynamically), and #7 (provide technical assistance to participants); however, the program does not appear to be aligned with criterion #3 (coordinate program design across payers), and it is uncertain whether the design of HVBP is aligned with criterion #6 (pay sufficiently large incentives), particularly in light of recently published data from CMS indicating that approximately $150 million will be redistributed based on payment updates in the first year of HVBP, resulting in an average absolute impact of about $50,000 per hospital.36 Also, while its dual criteria of achievement and improvement are noted as a best practice in P4P design, this approach is similar to the design of phase 2 of the HQID, which failed to maintain quality improvement among participating hospitals and also did not generate quality improvement for initially lower-performing hospitals.26 In addition, the MassHealth hospital P4P program, which based its incentive design specifically on that of HVBP, also failed to improve quality.37 For the PVBPM, the details of implementation for all physicians remain to be determined. However, for large practices, the PVBPM appears to be strongly aligned with criteria #1 (measures have room for improvement) and #5 (adjust programs dynamically) and moderately aligned with criterion #2 (promote widespread awareness of the program). The PVBPM appears to be weakly aligned with criteria #3 (coordinate program design across payers), #4 (incentivize quality attainment and quality improvement), and #7 (provide technical assistance to participants), and it is uncertain whether the PVBPM is aligned with criterion #6 (pay sufficiently large incentives).
Table 1. Alignment between pay-for-performance best practices and Medicare's pay-for-performance initiatives.

1. Choose measures that have room for improvement
- Hospital Value-Based Purchasing: Strong. Clinical process measure selection was based on measures not being "topped off." Future measures, such as patient safety indicators, have low prevalence and little room for improvement for many hospitals.
- Physician Value-Based Payment Modifier: Strong. Existing measures reported for PQRS generally have room for improvement.33
- Medicare Advantage Quality Bonus Program: Strong. Clinical process (HEDIS), patient satisfaction, and patient-reported outcomes measures have room for improvement.48

2. Promote widespread awareness of the program
- HVBP: Moderate. Quality Improvement Organizations (QIOs) are intended to work with participating hospitals and inform them about the program rules.
- PVBPM: Moderate. Promotion of the program through specialty societies and other means. The relatively small number of large practices makes promotion of the initial implementation easier.
- QBP: Strong. Large bonuses at stake in the program have created widespread awareness. QIOs are also intended to engage plans.

3. Coordinate program design (performance measures and payout criteria) across payers
- HVBP: Weak. There is little to no coordination across private payers or other Medicare programs currently, although CMS is beginning to take steps to coordinate across its own programs and to harmonize with private plans.
- PVBPM: Weak. Same as Hospital Value-Based Purchasing.
- QBP: Weak. Same as Hospital Value-Based Purchasing. However, the historical use of HEDIS measures in managed care profiling creates a common measure set for P4P across payers.

4. Incentivize both quality attainment and quality improvement
- HVBP: Strong. HVBP gives equal weight to quality achievement and improvement.
- PVBPM: Weak. Incentivizes only achievement.
- QBP: Strong. Plans receive "stars" based on quality performance, one of which is based on quality improvement.

5. Adjust programs dynamically to recalibrate measures and payment thresholds
- HVBP: Moderate. CMS has some ability to change performance measures in future years. However, the regulatory process introduces long lag times in changing the program design.
- PVBPM: Strong. Measure performance is standardized relative to the population of providers.
- QBP: Moderate. Same as Hospital Value-Based Purchasing.

6. Pay incentives that are sufficiently large to motivate a behavioral response
- HVBP: Unclear. Incentive payments are relatively small in the first program year and increase gradually over time. The program builds on the high-profile Hospital Compare public reporting program, which may reduce the need for large incentives.
- PVBPM: Unclear. Eligible practices are subject to substantial penalties (up to 2.5%) for not participating. Incentives for performing better on value tiering appear small.
- QBP: Strong. $8 billion in payouts during the demonstration.48

7. Provide technical assistance to participating providers
- HVBP: Moderate. QIOs are intended to assist providers. QIOs, historically, have stronger relationships with hospitals than with other providers.
- PVBPM: Weak. There does not appear to be an explicit mechanism to provide technical assistance to providers. There exists the potential for QIOs to assist physicians and practices.
- QBP: Moderate. QIOs are intended to assist providers.

Number of best practice criteria with moderate or strong alignment: Hospital Value-Based Purchasing—5; Physician Value-Based Payment Modifier—3; Medicare Advantage Quality Bonus Program—6.

Beyond these criteria, a number of important considerations remain in the implementation of the PVBPM. First, because it appears
that the PVBPM applies only to physicians in practices of 100 or more in its first year of implementation, this severely limits participation at the outset of the program. Only 6% of US physicians work in practices of 51 or more,38 and far fewer practice in groups of 100 or more. Even if program administration is successfully initiated for these large practices, implementation issues will differ substantially for smaller practices, and the lessons learned from the initial stage of implementation may not be applicable across all practices. Second, when rollout does occur for smaller practices, small sample sizes will reduce the reliability of the performance estimates on individual measures. Improving the reliability of performance estimates will be important to guard against paying on the basis of noisy data. Third, it remains to be seen if and how a shared savings model would be implemented for the PVBPM: What benchmark will be used to determine savings (e.g., savings relative to a practice's historical performance or relative to comparable practices)? Will the PVBPM reward quality achievement and improvement? Also, what plans are in place to allow the design of the PVBPM to evolve over time, for instance to set new thresholds for payment, retire old quality measures, and integrate new measures? Finally, physicians participating in Medicare Shared Savings Accountable Care Organizations (ACOs), Pioneer ACOs, or the Comprehensive Primary Care Initiative will not initially be subject to the PVBPM, and it is unclear how the PVBPM will integrate with ACO incentives as both programs expand. The design of the Medicare Advantage QBP has the strongest alignment with the best practice criteria.
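The small-practice reliability concern raised above can be made concrete with a standard signal-to-noise formulation often used in provider profiling (the formulation and all variance components here are illustrative, not from the text): reliability is the share of variance in a practice's score attributable to true between-practice differences, and the noise term shrinks only as the patient count grows.

```python
# Reliability of a practice-level performance estimate (hypothetical values).
# reliability = sigma2_between / (sigma2_between + sigma2_within / n):
# the error variance of a practice mean falls with n, so small panels are noisy.

def reliability(sigma2_between, sigma2_within, n):
    """Share of a practice estimate's variance that is true signal."""
    return sigma2_between / (sigma2_between + sigma2_within / n)

between, within = 0.004, 0.2   # hypothetical variance components

for n in (10, 50, 400):
    print(n, round(reliability(between, within, n), 2))
# Reliability climbs from ~0.17 at n=10 to ~0.89 at n=400.
```

Under these hypothetical components, a 10-patient panel yields an estimate that is mostly noise, which is exactly why paying small practices on individual measures risks rewarding chance rather than quality.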
The QBP is moderately or strongly aligned with criteria #1 (measures have room for improvement), #2 (promote widespread awareness of the program), #4 (incentivize quality attainment and quality improvement), #5 (adjust programs dynamically), #6 (pay sufficiently large incentives), and #7 (provide technical assistance to participants), and does not appear to be aligned with #3 (coordinate program design across payers). While the Medicare Advantage QBP program will reward both quality achievement and improvement, it will be important to monitor how plans attempt to meet these objectives as the threshold for receipt of the bonus payment is raised in 2015, so as to avoid plans selecting patients on the basis of how likely they are to meet criteria for incentive payments, further contributing to historical selection issues in Medicare Advantage. Another issue is how the Medicare Advantage QBP measures will align with those in the PVBPM. Medicare's entry into P4P was supposed to overcome the problem of numerous small-bore P4P programs competing for providers' attention, but maintaining different P4P programs for physicians caring for both Medicare fee-for-service and Medicare Advantage patients runs contrary to this objective. Coordination across the various federal and private payer P4P programs is critical to minimize the burden on providers. CMS has begun to take action to harmonize these efforts, but it remains a work in progress.39 An additional challenge will be for CMS to synchronize performance measures between the Medicare Advantage program and those reported through the quality reporting system required by state insurance exchanges under the ACA. As insurers, Medicare Advantage plans must also find effective ways to coordinate efforts with providers to improve quality performance, as few plans are vertically integrated with providers.

6. Considerations for the future of Value-Based Purchasing in Medicare

The ACA has institutionalized P4P for hospitals, physicians, and private plans, and will advance testing of P4P in other settings, such as skilled nursing and home health. The attempt to improve value in how Medicare purchases services from health care providers is a welcome development. Nonetheless, designing these programs to achieve improvements in value remains a challenge, one that must balance the interests of payers and providers to create meaningful and fair incentives (i.e., high reliability, low misclassification, appropriate setting of thresholds and weighting of measures, and selection of clinically meaningful measures) that appeal to the values of the provider community and achieve buy-in. Our assessment of the alignment between P4P programs in Medicare and best practices in P4P program design varies across the programs we examined: HVBP is moderately or strongly aligned with five of the seven best practices, the PVBPM is aligned with three of the best practices, and the Medicare Advantage QBP is aligned with six of the best practices. These assessments involved judgments that could have been construed otherwise: for instance, we indicated a "moderate" degree of alignment between each of the Medicare programs and criterion #5 (adjust programs dynamically), although, depending on the program, it may take several years for performance measures and other design features to be modified. Additionally, the PVBPM has not yet been fully defined, leading to uncertainty about the alignment between many of its design features and best practice in P4P. Nonetheless, our analysis provides an overview of the concordance between Medicare's implementations of P4P and how the existing evidence suggests programs should be structured, helping to identify elements of these programs that may merit redesign in the future.
What remains uncertain, however, is which of the identified P4P design elements are necessary, which are sufficient, and which are optional for a program to successfully motivate quality improvement. Further, the scope of our best practice criteria is quite limited, reflecting the fact that the literature is lacking in many areas that could guide design, such as the best approach for funding performance bonuses and for incentivizing reductions in the rate of cost growth in these programs. The initial hope of P4P was that higher quality care could result in healthier patients and lower costs, but, because programs focused largely on measures of underuse, this hope has largely failed to materialize in the first generation of P4P programs.23,40 Recognizing this, many of the second-generation programs are explicitly incorporating costs of care into payout criteria. One approach, embodied in the Medicare ACO program, is to pay providers bonuses for quality from a pool of health care cost "savings" relative to some benchmark. If there are no savings, there are no bonuses. Alternatively, as implemented in HVBP, bonus payments for winning hospitals are funded from revenue reductions for losing hospitals, and cost reductions are directly incentivized as a performance measure: the Medicare Spending per Beneficiary measure will be incentivized starting in FY 2015. It is unclear whether either of these approaches will encourage value improvement in Medicare. Research has also just begun to evaluate the relationship between P4P design features and unintended consequences.
On one hand, the relatively small magnitude of financial incentives in many P4P programs has been noted as one of the reasons why these programs have not generated substantial effects.13 On the other hand, larger incentives, in the context of budget neutrality, mean that more money will be redistributed, potentially hurting certain classes of providers and the patients they serve, and potentially resulting in undesirable behaviors from providers (as seen in the recent cheating scandals in high-stakes educational testing). One recent study found that incentive payments in the HQID accrued disproportionately to hospitals that cared for the least disadvantaged patients when payouts were based on high quality attainment alone, but that the change in incentives to reward improvement led to greater payments accruing to hospitals that cared for more disadvantaged patients.41 While related research suggests that the HQID did not have a direct effect on health care disparities by leading to the avoidance of minority patients,42 research on the distribution of bonus payments suggests that the design of P4P programs can affect financial performance across the gradient of socioeconomic disadvantage. This finding has also been observed in P4P programs implemented for physician practices.43 The redistribution of income across providers may have downstream implications for disparities in care, particularly over the longer term. Another emerging issue in the design of P4P programs concerns the pros and cons of complexity. The current design of HVBP, with incentives for both quality attainment and improvement based on a continuous points system using a linear exchange function, is conceptually appealing but complex.
While proposals have been developed to modify the existing incentive criteria to encourage equity in hospital P4P,44 it is possible that the complexity of the current design does not give hospitals specific quality targets and, as a result, may be a barrier to quality improvement activities. Alternatively, the Quality and Outcomes Framework in the United Kingdom uses a very simple design in which providers are rewarded for every "right" service they provide and are incentivized through a linear exchange function with a minimum and maximum score. The challenge with implementing such a design in Medicare is that budget neutrality, which is currently embedded in Value-Based Purchasing, requires that either the performance criteria or the magnitude of the incentive remain flexible, and thus unknown, to providers during the evaluation period. Conceptually, much of the complexity associated with the design of P4P stems from the fact that a single set of incentives is intended to maximize quality improvement while minimizing unintended consequences across heterogeneous providers that differ with respect to their levels of quality attainment, their trajectories of improvement, their patient populations, and their organizational and financial capacity for continued change.45 An alternative design strategy would be to create a relatively simple design (e.g., a linear exchange function based on quality attainment alone in which the median is the threshold for a bonus or penalty) but implement the design among homogeneous competition pools, defined by region, level of quality, rural or urban location, size, or other criteria. If providers in the same competition pool had relatively similar levels of quality and the same ability to improve, then a simple attainment-only design would likely be sufficient to meet the program aims while avoiding the complexity of a one-size-fits-all design.
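The simple pooled design described above (attainment only, with the pool median as the bonus/penalty threshold and a linear exchange within each pool) can be sketched as follows. The pool definitions, scores, and the ±1% adjustment cap are all hypothetical:

```python
# Attainment-only linear exchange applied within homogeneous competition
# pools (hypothetical sketch; pools, scores, and the 1% cap are invented).
from statistics import median

def linear_exchange(scores, max_adjust=0.01):
    """Payment adjustment per provider, linear in distance from the pool
    median and capped at +/- max_adjust at the pool extremes."""
    m = median(scores)
    spread = max(abs(s - m) for s in scores) or 1.0  # avoid divide-by-zero
    return [max_adjust * (s - m) / spread for s in scores]

# Providers are judged only against peers in their own pool (e.g., rural
# hospitals against rural hospitals), not the national distribution:
rural = linear_exchange([62, 70, 74])   # pool median (70) gets no adjustment
urban = linear_exchange([85, 88, 95])   # pool median (88) gets no adjustment
```

Because each pool's median is its own threshold, a rural provider scoring 70 is held harmless even though it sits well below the urban pool's median; whether such a scheme remains budget neutral depends on the score distribution within each pool.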
The core elements of HVBP have been statutorily defined, and, notwithstanding major patient safety concerns that are tied to HVBP, CMS has limited flexibility to alter the program design in the short term. The primary area in which CMS can modify the design of HVBP is in the choice of performance measures, arguably the most crucial element of P4P programs. Research should therefore focus on impact analysis and on whether different measure domains, and perhaps individual measures, may be more responsive to the program. In addition, research should pay special attention to the potential unintended consequences of HVBP, particularly the potential for revenue redistribution away from hospitals caring for disadvantaged patients and the potential for hospitals to avoid caring for certain classes of patients as HVBP begins to reward performance on outcome measures. For the PVBPM, the design is still unfolding and can be informed by the evidence to date from physician P4P program implementations in the private sector. The major issues going forward are deciding which types of physician practices to target, how large incentive payments should be, what criteria are used for incentive payments, and how to align the PVBPM with the Medicare Advantage QBP. In Medicare Advantage, the large magnitude of incentive payments and the reasonable incentive design emphasizing both quality achievement and quality improvement are good signs. An important issue for this program will be to prevent plans from cherry-picking patients to increase the likelihood of obtaining quality bonus payments. Moving forward with Value-Based Purchasing in Medicare, an approach more appealing than having different programs for different providers (hospitals, physicians, skilled nursing) would be to develop integrated programs across providers that share accountability for patients.
A single integrated Value-Based Purchasing program covering all domains of care, implemented for Accountable Care Organizations, may be the best way to simplify Value-Based Purchasing and integrate across care settings to improve value. However, this presumes that ACOs will diffuse broadly enough to substantially impact care nationally, which remains to be seen. In the meantime, policymakers need to look for ways to strengthen existing programs to achieve the goal of value improvement in Medicare. Finally, continuous monitoring and evaluation of Value-Based Purchasing programs in Medicare is crucial as these programs evolve. There is a large social science literature that challenges the use of financial incentives to change behavior, suggesting that financial rewards for intrinsically valuable activities can be detrimental.46 To the extent possible, research should focus on how to modify Value-Based Purchasing programs within their statutory constraints to achieve best practice in design and align the programs with the emerging behavioral economics literature on financial incentives and behavior.47 This research has the potential to result in program designs that are better aligned with the objectives of CMS to improve value. While the new generation of P4P experiments may yield stronger evidence than previous programs, Value-Based Purchasing is only one of many possible levers for improving value in health care. Continued emphasis on broader payment reform is critical as we progress towards a better understanding of how to effectively implement value-based payment reform in Medicare.

Conflict of interest

The authors have no conflicts of interest regarding this work.
Funding

Cheryl Damberg currently is funded to conduct work on two federal contracts, one for the Assistant Secretary for Planning and Evaluation in the US Department of Health and Human Services (TO 12-233-SOL-00418) on "Value-based Purchasing: Measuring Success" and the other for the Centers for Medicare and Medicaid Services (HHSM-500-2005-00028I, TO # HHSM-500-T0005) on "Analyses Related to Medicare Advantage Plan Ratings for Quality Bonus Payments." Her work on this paper was not supported by either of these projects. For Andrew Ryan, this project was supported by a grant from the Agency for Healthcare Research and Quality (K01 HS018546-01). Andrew Ryan is also funded by a grant from the Robert Wood Johnson Foundation to study the effects of Hospital Value-Based Purchasing on quality of care in its first period of implementation. The views expressed in this paper are solely those of the authors. The authors would like to acknowledge Jordan VanLare and William Borden for helpful comments on the manuscript.

Appendix A. Identifying best practices for the design of P4P programs

Various attempts have been made to identify the "best practices" for the design of P4P in health care.49 Probably the most thorough attempt to identify P4P design elements that are supported by the literature is a recent paper by Van Herck et al.19 Based on a review of 128 studies of the effects of financial incentives on quality in health care, the authors identify a number of P4P design elements that appear to be supported by the literature: choose measures that have room for improvement (i.e., are not topped out); choose process or intermediate outcome measures rather than longer-term outcomes; engage providers and stakeholders throughout implementation; coordinate program design across payers; incentivize both quality attainment and quality improvement; and make incentive payments at the individual or team level rather than at higher organizational levels (e.g.
hospitals or physician practices). The authors further note several P4P design features that are supported by theory, but for which there is inconclusive evidence supporting their use in health care: adjust programs dynamically to recalibrate measures and payment thresholds; pay incentives that are sufficiently large to motivate a behavioral response; and provide technical assistance to participating providers.50 Using the Van Herck et al. review as a starting point, we make several modifications to these criteria based both on studies that have been published since the review and on our own interpretation of the evidence. Our assessment of best practices considers the strength of the evidence both for design elements that lead to stronger incentives to improve quality and for elements that lead to fairness in the programs (i.e., elements that do not result in systematic advantages or disadvantages across different types of participating providers). First, we eliminate the criterion "Choose process or intermediate outcome measures rather than longer term outcomes." Research from social psychology and behavioral economics suggests that financial incentives may be more effective in meeting simple performance criteria (e.g. process improvement) compared to more complex performance criteria (e.g. outcome improvement),51 and process improvement is certainly easier than outcome improvement, which may explain the apparent advantage of incentivizing process measures or intermediate outcomes in P4P programs. However, recent literature on the process–outcome relationship suggests that incentivizing process improvement, by itself, is insufficient to improve outcome performance.27,52–54 This calls into question the clinical importance of process improvement by itself, leading us to eliminate this best practice criterion.
Second, we modify the criterion "Engage providers and stakeholders throughout implementation" to "Promote widespread awareness of the program." While evidence suggests that provider awareness of incentive programs is critical to their success, provider engagement – particularly related to the choice of measures and performance criteria – may be more reflective of provider capture of P4P programs than of best practice. For example, physicians in the United Kingdom have succeeded in avoiding increases to maximum thresholds for incentive payments, even as the vast majority of physicians have exceeded these thresholds and face no financial incentive for additional incremental improvements in quality.55 Finally, we eliminate the criterion "Make incentive payments at the individual or team level rather than higher organizational levels," because we feel that this criterion is specific to the type of performance measure, not a general concept. For instance, for simple process measures over which a physician may have a large degree of control (e.g. measuring HbA1c levels for diabetic patients), incentives may be optimally effective at the physician level, whereas incentives for measures involving more complex processes and coordination across providers (e.g. reducing readmissions) are surely not optimally effective at the physician level. While the criterion "Incentivize both quality attainment and quality improvement" may in fact lead to stronger incentives for improvement, it was not supported by the literature synthesized by the Van Herck et al.
review, and recent evaluations of P4P programs that have incentivized both quality attainment and improvement have not found this approach to improve quality.25,26,37 However, we chose to maintain this criterion because research has found that this design feature leads to a more equal distribution of payouts across providers with initially higher and lower levels of quality, and across those that care for patients with greater or lesser socioeconomic disadvantage.41

References

1 Arrow K. Uncertainty and the welfare economics of medical care. American Economic Review. 1963;53(5):941–969. 2 Committee on Quality of Health Care in America, Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. The National Academies Press; 2001. 3 Ryan A, Blustein J. Making the best of hospital pay for performance. New England Journal of Medicine. 2012;366(17):1557–1559. 4 Centers for Medicare and Medicaid Services. Report to Congress: Plan to Implement a Medicare Hospital Value-Based Purchasing Program. Washington, DC: US Department of Health and Human Services; November 21, 2007. 5 Doran T, Fullwood C, Gravelle H, et al. Pay-for-performance programs in family practices in the United Kingdom. New England Journal of Medicine. 2006;355(4):375–384. 6 Town R, Kane R, Johnson P, Butler M. Economic incentives and physicians' delivery of preventive care: a systematic review. American Journal of Preventive Medicine. 2005;28(2):234–240. 7 Tompkins CP, Higgins AR, Ritter GA. Measuring outcomes and efficiency in medicare value-based purchasing. Health Affairs (Millwood). 2009;28(2):w251–w261. 8 Rosenthal MB, Fernandopulle R, Song HR, Landon BE. Paying for quality: providers' incentives for quality improvement. Health Affairs. 2004;23(2):127–141. 9 Rosenthal MB, Landon BE, Normand SL, Frank RG, Epstein AM. Pay for performance in commercial HMOs. New England Journal of Medicine. 2006;355(18):1895–1902. 10 Kuhmerker K, Hartman T.
Pay-for-performance in State Medicaid Programs: a survey of State Medicaid Directors and Programs. IPRO; April 2007: 1018. 11 Robinson JC, Shortell SM, Rittenhouse DR, Fernandes-Taylor S, Gillies RR, Casalino LP. Quality-based payment for medical groups and individual physicians. Inquiry. 2009;46(2):172–181. 12 Alexander JA, Maeng D, Casalino LP, Rittenhouse D. Use of care management practices in small- and medium-sized physician groups: do public reporting of physician quality and financial incentives matter? Health Services Research. 2013;48(2 Pt 1):376–397. 13 Petersen LA, Woodard LD, Urech T, Daw C, Sookanan S. Does pay-for-performance improve the quality of health care? Annals of Internal Medicine. 2006;145(4):265–272. 14 Rosenthal MB, Frank RG. What is the empirical basis for paying for quality in medical care? Medical Care Research and Review. 2006;63(2):135–157. 15 Mehrotra A, Damberg CL, Sorbero ME, Teleki SS. Pay for performance in the hospital setting: what is the state of the evidence? American Journal of Medical Quality: The Official Journal of the American College of Medical Quality. 2009;24(1):19–28. 16 Flodgren G, Eccles MP, Shepperd S, Scott A, Parmelli E, Beyer FR. An overview of reviews evaluating the effectiveness of financial incentives in changing healthcare professional behaviours and patient outcomes. Cochrane Database of Systematic Reviews. 2011(7):CD009255. 17 Scott A, Sivey P, Ait Ouakrim D, et al. The effect of financial incentives on the quality of health care provided by primary care physicians. Cochrane Database of Systematic Reviews. 2011(9):CD008451. 18 Houle SK, McAlister FA, Jackevicius CA, Chuck AW, Tsuyuki RT. Does performance-based remuneration for individual health care practitioners affect patient care? A systematic review. Annals of Internal Medicine. 2012;157(12):889–899. 19 Van Herck P, De Smedt D, Annemans L, Remmen R, Rosenthal MB, Sermeus W.
Systematic review: effects, design choices, and context of pay-for-performance in health care. BMC Health Services Research. 2010;10:247. 20 Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. New England Journal of Medicine. 2007;356(5):486–496. 21 Grossbart SR. What's the return? Assessing the effect of pay-for-performance initiatives on the quality of care delivery. Medical Care Research and Review. 2006;63(1 Suppl):29S–48S. 22 Glickman SW, Ou FS, DeLong ER, et al. Pay for performance, quality of care, and outcomes in acute myocardial infarction. JAMA. 2007;297(21):2373–2380. 23 Ryan AM. Effects of the premier hospital quality incentive demonstration on medicare patient mortality and cost. Health Services Research. 2009;44(3):821–842. 24 Bhattacharyya T, Freiberg AA, Mehta P, Katz JN, Ferris T. Measuring the report card: the validity of pay-for-performance metrics in orthopedic surgery. Health Affairs (Millwood). 2009;28(2):526–532. 25 Werner RM, Kolstad JT, Stuart EA, Polsky D. The effect of pay-for-performance in hospitals: lessons for quality improvement. Health Affairs (Millwood). 2011;30(4):690–698. 26 Ryan AM, Blustein J, Casalino LP. Medicare's flagship test of pay-for-performance did not spur more rapid quality improvement among low-performing hospitals. Health Affairs (Millwood). 2012;31(4):797–805. 27 Jha AK, Joynt KE, Orav EJ, Epstein AM. The long-term effect of premier pay for performance on patient outcomes. New England Journal of Medicine. 2012;366(17):1606–1615. 28 Rosenthal MB, Frank RG, Li Z, Epstein AM. Early experience with pay-for-performance: from concept to practice. JAMA. 2005;294(14):1788–1793. 29 Damberg CL, Raube K, Teleki SS, dela Cruz E. Taking stock of pay-for-performance: a candid assessment from the front lines. Health Affairs. 2009;28(2):517–525. 30 Pearson SD, Schneider EC, Kleinman KP, Coltin KL, Singer JA.
The impact of pay-for-performance on health care quality in Massachusetts, 2001–2003. Health Affairs. 2008;27(4):1167–1176. 31 Song Z, Safran DG, Landon BE, et al. The ‘Alternative Quality Contract,’ based on a global budget, lowered medical spending and improved quality. Health Affairs. 2012;31(8):1885–1894. 32 Centers for Medicare & Medicaid Services. Medicare program; hospital inpatient value-based purchasing program. Final rule. Federal Register. 2011;76(88):26490–26547. 33 cms.gov. Physician quality reporting system: 2010 reporting experience. 〈http://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/PQRS/index.html〉; 2013. Accessed 27.02.13. 34 Medicare Program; Revisions to payment policies under the physician fee schedule, DME face-to-face encounters, elimination of the requirement for termination of non-random prepayment complex medical review and other revisions to Part B for CY 2013. In: Centers for Medicare & Medicaid Services, ed. 77 FR 44721. vol. 42 CFR Parts 410, 414, 415, 421, 423, 425, 486, and 495: Federal Register; 2012: 68891–69380. 35 Cotton P, Datu B, Thomas S. Early evidence suggests medicare advantage pay for performance may be getting results. Health Affairs Blog; 2012 [healthaffairs.org: Health Affairs]. 36 Ryan A, Damberg C. Table 16. Actual Hospital Value Based Purchasing Program (VBP) Adjustment Factors for FY 2013. Washington, DC: Centers for Medicare & Medicaid Services; 2013. 37 Ryan AM, Blustein J. The effect of the MassHealth hospital pay-for-performance program on quality. Health Services Research. 2011;46(3):712–728. 38 Boukus ER, Cassil A, O'Malley AS. A Snapshot of U.S. Physicians: Key Findings from the 2008 Health Tracking Physician Survey. vol. 35. Center for Studying Health System Change; 2009. 39 Damberg C. Efforts to Reform Physician Payment: Tying Payment to Performance. Santa Monica, CA: RAND Corporation; 2013.
40 Kruse GB, Polsky D, Stuart EA, Werner RM. The impact of hospital pay-for-performance on hospital and Medicare costs. Health Services Research. 2012;47(6):2118–2136. 41 Ryan AM, Blustein J, Doran T, Michelow MD, Casalino LP. The effect of Phase 2 of the Premier Hospital Quality Incentive Demonstration on incentive payments to hospitals caring for disadvantaged patients. Health Services Research. 2012;47(4):1418–1436. 42 Ryan AM. Has pay-for-performance decreased access for minority patients? Health Services Research. 2010;45(1):6–23. 43 Chien AT, Wroblewski K, Damberg C, et al. Do physician organizations located in lower socioeconomic status areas score lower on pay-for-performance measures? Journal of General Internal Medicine. 2012;27(5):548–554. 44 Borden WB, Blustein J. Valuing improvement in value-based purchasing. Circulation: Cardiovascular Quality and Outcomes. 2012;5(2):163–170. 45 Blustein J, Borden WB, Valentine M. Hospital performance, the local economy, and the local workforce: findings from a US National Longitudinal Study. PLoS Medicine. 2010;7(6):e1000297. 46 Deci EL, Koestner R, Ryan RM. A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin. 1999;125(6):627–668 [discussion 692–700]. 47 Woolhandler S, Ariely D. Will pay for performance backfire? Insights from behavioral economics. Health Affairs Blog; 2012 [healthaffairs.org: Health Affairs]. 48 The Henry J. Kaiser Family Foundation. Medicare Advantage Plan Star Ratings and Bonus Payments in 2012. Menlo Park, CA: Kaiser Family Foundation; 2011. 49 Institute of Medicine. Rewarding Provider Performance: Aligning Incentives in Medicare. Washington, DC; 2006. 50 Damberg C, Sorbero ME, Mehrotra A, Teleki SS, Lovejoy S, Bradley L. An Environmental Scan of Pay for Performance in the Hospital Setting: Final Report. Washington, DC: Department of Health and Human Services; 2007. 51 Osterloh M, Frey B.
Does pay for performance really motivate employees? In: Business Performance Measurement: Unifying Theory and Integrating Practice. 2nd ed. Cambridge University Press; 433–448. 52 Werner RM, Dudley RA. Medicare's new hospital value-based purchasing program is likely to have only a small impact on hospital payments. Health Affairs (Millwood). 2012;31:1932–1940. 53 Ryan AM, Burgess Jr. JF, Tompkins CP, Wallack SS. The relationship between Medicare's process of care quality measures and mortality. Inquiry. 2009;46:274–290. 54 Ryan AM, Nallamothu BK, Dimick JB. Medicare's public reporting initiative on hospital quality had modest or no impact on mortality from three key conditions. Health Affairs (Millwood). 2012;31:585–592. 55 British Medical Association. General Medical Services—Contractual Changes 2013/14: BMA GPC Response. British Medical Association; 2013.

Healthcare 1 (2013) 50–51

Interview with Mark B. McClellan, MD, Ph.D. Interviewed by Brian W. Powers

A medical doctor and economist, Mark McClellan works on promoting high-quality, innovative and affordable health care. Formerly commissioner of the Food and Drug Administration (FDA) and administrator of the Centers for Medicare & Medicaid Services (CMS), Dr. McClellan is currently Director of the Engelberg Center for Health Care Reform and Leonard D. Schaeffer Chair in Health Policy Studies at the Brookings Institution. McClellan holds an MD from the Harvard University–Massachusetts Institute of Technology (MIT) Division of Health Sciences and Technology, a Ph.D. in economics from MIT, an MPA from Harvard University, and a BA from the University of Texas at Austin. He completed his residency training in internal medicine at Boston's Brigham and Women's Hospital, is board-certified in Internal Medicine, and has been a practicing internist during his career.
Brian Powers: You held several senior positions in the federal government before coming to Brookings. When transitioning from federal service, why did you choose Brookings?

Mark McClellan: I thought about several options, but my roots are in academics. I did not want to move far away from that, but I had really enjoyed working on implementation issues with people in government and the private sector. Brookings seemed like a good balance of the two. I can continue to do policy analysis and research, but also stay closely connected to political developments and support public-private collaborations.

BP: Do you ever miss being in an implementation role?

MM: In some ways we are not too far from it at Brookings. For example, a lot of our work relates to having impact on current legislative and regulatory issues such as Accountable Care Organization (ACO) implementation, physician payment reform, and drug development and surveillance. One of the hardest things about being in an implementation role was the consuming nature of the work. In policy analysis and research you do not have to get into all of the depths, complexities, and practical issues of implementing new programs. At the beginning, it was nice to have more control over my schedule and less of that stress. But one of the most rewarding things about being involved in implementation is actually seeing your ideas make a real and direct difference. So I do miss some of that.

BP: How did your time at CMS and FDA shape the portfolio of activities you lead at Brookings?

MM: Our focus at FDA was on turning ideas from the lab into safe, effective, and reliable treatments that can make an impact on the lives of patients. I viewed my time at CMS in a similar way. In my first address to the staff at CMS I highlighted that I thought CMS was the nation's largest public health agency in terms of
financing, since how we pay for care has such an impact on effective treatment. At Brookings we have tried to put these ideas together. Whether it is better approaches to determining which treatments work for particular patients or better approaches to improving the delivery system, all of our work maintains a focus on protecting and promoting the health of the public.

BP: Are there any recent reports or activities from Brookings that you would like to highlight?

MM: Last month we released the Bending the Curve report, which brought together a number of experts to determine the best policy directions for bringing costs down. The emphasis in the report is on the fact that the best way to effectively reduce costs is to focus on improving quality and on getting care right. There are recommendations about financing and regulation, but the basic theme is better care as the pathway to higher value.

BP: You were recently part of a debate in the Wall Street Journal over the promise of Accountable Care Organizations. What is the source of the disagreement?

MM: I do not think there is any disagreement about the need for disruptive innovation in health care. The question is how to make that happen as quickly as possible. One of the reasons we have not seen more disruptive innovation is that our financing is not aligned. In most other industries you are rewarded when you develop a product that costs less or does a better job. But under traditional financing and regulatory systems in health care you are often punished. Strategies like intervening early and targeted use of interventions lead to less revenue. The promise of ACOs is making disruptive innovation pay off. ACOs align valuable innovation and health care financing systems so it is no longer the case that if you come up with an innovation in care that reduces costs while maintaining or improving quality you lose money.
There are certainly other ways to align incentives for innovation and improvement, such as value-based insurance design and steps to make regulations focus more on results and value rather than on structural and process issues.

BP: What early lessons are you starting to see from your work with the ACO community?

MM: I do not want to oversimplify based on limited case study results, but the ACOs that seem to have the most impact are ones where there is an organizational commitment to real cultural change, creating a systematic focus on getting to better care at a lower cost. Many of the ACOs we work with have emphasized that it's not a matter of implementing one initiative and thinking you're done. Changing care delivery enough to have a systematic impact on health outcomes and costs is a long and sustained process. Similarly, viewing the ACO financing model as a transitional process seems to lead to more of an impact over time. Successful organizations use ACO contracts as a starter phase to help them get data systems in place and identify a clear set of opportunities for improving care and lowering costs. They can then move on to additional changes in financing such as two-sided risk and capitation contracts. We have seen a lot of encouraging results so far, mostly in the private sector. But it is still early and we are looking for more systematic evidence on when and how ACOs work best.

BP: How do ACOs fit into the larger environment of delivery and finance reform?

MM: The key thing about ACOs is the focus on better health outcomes at lower costs, accompanied by aligned financial incentives. ACOs are not the only financing reform that can achieve this goal. Also very helpful, even instrumental, could be case-based payments in primary care medical homes and bundled payments for episodes of care.
In fact, a lot of ACOs are implementing these kinds of payment reforms together in a complementary and reinforcing way. But it is not just about provider payment reforms. There need to be corresponding changes in benefit design so that patients can share in the savings when they take steps to get more effective care at a lower cost. Some private sector ACOs are now implementing their provider payment changes along with changes in benefit design that lower premiums or co-pays when patients use more effective, lower cost services.

BP: You mentioned that among ACOs that have made an impact, there is a cultural commitment to high-value care. Do you think that the process of sitting down and negotiating an ACO contract motivates culture change?

MM: The focus on ACOs has been helpful since it puts the core goals of better health and avoiding unnecessary costs at the center of an organization's activities. Even if an organization does not ultimately implement an ACO, or adopts other payment reforms instead, the process is a great way to reorient the discussion around innovative approaches to getting more value.

BP: On the topic of shifting payment from volume to value, what can CMS do to best facilitate that transition?

MM: I think it would be helpful for CMS to view each of their payment reform activities as potentially reinforcing elements of a comprehensive improvement strategy rather than as standalone solutions. Many of these programs were started as pilots, and evaluation is focused on how each may have an impact in isolation. I think making these steps part of a systematic strategy is more important for driving health care reform. Instead of figuring out if we can get costs down and quality up with a particular medical home or bundled payment model, it may be more important to implement them all together and look for the cumulative impact. This requires a systematic approach to thinking about payment reform as well as a systematic infrastructure for measurement and data support.
And I think we still have a way to go before CMS is there.

BP: To conclude, what major gaps do you see in the literature on health care delivery science and improvement? What role can a journal like HJDSI play in filling these gaps?

MM: Health care delivery is complex, dynamic, and uncertain. A lot of organizations are trying to get better results for patients at a lower cost, but are not sure how to do it. Success depends on current organizational form, policy options for reimbursement and benefit design, regulation, and being able to make solid inferences from actual practice data. Better science for dealing with this complexity is badly needed, and I think the Journal could play an important role in its development.

Healthcare 1 (2013) 52–54

Short communication: Instant replay

David I. Rosenthal, Yale School of Medicine, New Haven, CT, USA; Homeless PACT, VA Connecticut, 950 Campbell Avenue, New Haven, CT, USA

Article history: Received 11 November 2012; received in revised form 27 March 2013; accepted 20 April 2013; available online 9 May 2013.

Abstract: With widespread adoption of electronic health records (EHRs) and electronic clinical documentation, health care organizations now have greater facility to review clinical data and evaluate the efficacy of quality improvement efforts. Unfortunately, I believe there is a fundamental gap between actual health care delivery and what we document in current EHR systems. This process of capturing the patient encounter, which I'll refer to as transcription, is prone to significant data loss due to inadequate methods of data capture, multiple points of view, and bias and subjectivity in the transcriptional process. Our current text-based EHR clinical documentation systems are lossy abstractions: one-sided accounts of what takes place between patients and providers.
Our clinical notes contain the breadcrumbs of relationships, conversations, physical exams, and procedures but often lack the ability to capture the form, the emotions, the images, the nonverbal communication, and the actual narrative of interactions between human beings. I believe that a video record, in conjunction with objective transcriptional services and other forms of data capture, may provide a closer approximation to the truth of health care delivery and may be a valuable tool for healthcare improvement.

Keywords: quality improvement; electronic health records; video technology; graduate medical education

“The information you have is not the information you want. The information you want is not the information you need. The information you need is not what you can get or is not known. The information that is known can't be found in time.” Finagle's law.1

Sunday, 3 pm. Code Blue. Our code team races to converge on a patient on a Neurology floor. When we all arrive, the patient is found unconscious in her bed. The primary team communicates important information to us and confirms that she has an Advance Directive with instructions not to resuscitate. She is DNR. Within moments the patient loses her pulse, has an asystolic arrest, and dies. While somewhat mundane and limited in scope compared to other code situations during my residency training, it plays over and over again in my memory. It was particularly memorable because the event was recorded on video while the patient was undergoing Video Electroencephalography (Video EEG) monitoring. Prior to my career in medicine, I worked as a filmmaker and a media technician. Sitting down at the computer screen to review the 10-min video reminded me of my days in the editing room and provided me with a rare omniscient glimpse of my own and our team's performance.
It provided extraordinary insight and sparked my imagination for further uses of video technology as a tool in healthcare settings.

Correspondence address: Yale School of Medicine, New Haven, CT, USA. Tel.: +1 203 479 8077. E-mail addresses: [email protected], [email protected]. http://dx.doi.org/10.1016/j.hjdsi.2013.04.004

With widespread adoption of electronic health records (EHRs) and electronic clinical documentation, health care organizations now have greater facility to review clinical data and evaluate the efficacy of quality improvement efforts. Unfortunately, I believe there is a fundamental gap between actual health care delivery and what we document in current EHR systems. This process of capturing the patient encounter, which I'll refer to as transcription, is prone to significant data loss due to inadequate methods of data capture, multiple points of view, and bias and subjectivity in the transcriptional process. Our current text-based documentation systems are lossy abstractions: one-sided, incomplete accounts of encounters between patients and providers. Our notes contain the breadcrumbs of relationships, conversations, physical exams, and procedures but lack the form, the emotions, the images, the nonverbal communication, and the actual narrative of interactions between human beings. Anyone who has performed chart review understands these inherent challenges.2,3 I believe that a video record, in conjunction with objective transcriptional services and other forms of data capture, may provide a closer approximation to the truth of health care delivery and may be a valuable tool for healthcare improvement. There is a growing evidence base for using video data as a means to review and audit the quality of care delivery.
Video has been used in healthcare settings for myriad clinical and nonclinical purposes: medical education with standardized patients, hospital security, trauma care, long-term video electroencephalography (EEG), surgical procedures in operating rooms, sleep studies, and more recently remote patient encounters, tele-ICU monitoring, and even remote monitoring of hospital hand hygiene.4 Much of the early literature on video auditing focused on emergency and trauma care,5–7 tele-ICU settings, and standardized patients in medical education.8 Such short-interval, discrete encounters lend themselves well to auditing, and there are some data to support various technologies. Video recordings have been shown to be more sensitive than traditional methods at identifying management errors during trauma resuscitation.9 In Europe in the 1990s, with far less sophisticated technology, pilots of video observation of general practitioners in outpatient clinics were shown to be feasible to implement, valid for assessing practitioner communication skills, and to have no significant effect on consultation length or the number of problems dealt with in the patient encounter.10,11 In spite of this, valid legal concerns about patient privacy, consent, and discoverability have led many trauma centers, hospitals, and birthing centers to prohibit all videotaping practices.12,13 As a primary care physician who sees patients both in the outpatient clinic and in the inpatient hospital setting, I believe that video may offer more objectivity in the transcription of actual clinical care services delivered, and the ability to provide meaningful, rapid oversight of clinical care. As with most medical innovations, new data collection modalities spur unimagined applications.
Since Einthoven captured and characterized electrical waves on the skin, skin-based electrical signal capture has blossomed far beyond the electrocardiogram, to monitoring in telemetry units, outpatient loop recorders, electrophysiology labs, electroencephalography (EEG) units, automatic external defibrillators, and newer experimental devices built into exercise equipment, iPhone apps, and car seats. In the same vein, there are numerous possible applications of video data in healthcare, most of which can be thought of as utilizing one of two core functions: (1) live monitoring of video feeds, e.g. instantaneous review such as in tele-ICUs, or (2) saving and banking video data for retrospective analysis. I believe there are valuable applications of both functions for key stakeholders in both inpatient and outpatient settings. Tele-ICUs, which provide instantaneous review for remote monitoring of higher-acuity patients, provide some current insights and some conflicting quality data. Some studies show no clinical benefits14; other reviews suggest an association with lower ICU mortality and length of stay but not with lower in-hospital mortality or hospital length of stay15; and in still another, a single academic center that focused on re-engineering the implementation process showed significant improvements in adherence to best practices, with an associated mortality reduction and decreased hospital length of stay.16 Staff acceptance of such programs shows initial ambivalence and resistance, but after implementation overall acceptance is high, and perceived improvement in quality of care is higher as well.17 While I anticipate many nervous colleagues and privacy advocates sending inflammatory messages to my inbox, I would welcome video oversight into my office if I believed it were in the patients' best interests and if patients felt that such oversight could provide a tangible benefit to them.
With the correct system in place, as a primary care physician I could send a video link of an encounter to a specialist with a clinical question for e-consultation, or to a master clinician for a second opinion. For the patient or caregiver, access to a secure video link or a transcription of an encounter allows reviewing of important information, and perhaps even submitting it for a second opinion. I would welcome automatic audio transcriptional services to capture much of the documentation needed for accurate coding and billing, to better allow me to engage fully with patients, eyes and fingers away from the EHR. The applications of video documentation are numerous and limited only by our collective imaginations. It is not hard to imagine the hospitals of the future with control rooms staffed by highly trained clinical and operational personnel, like air traffic control towers, remotely monitoring care delivery, communicating with nursing staff, and assisting with bed flow management or early detection of clinical decompensation in all areas of the hospital. Such systems may also empower outpatient providers with established relationships to patients, allowing them to remotely care for patients or participate more fully with inpatient teams to prevent fragmentation. From a management and operations perspective, clinical leadership could utilize video review as a part of peer review and coaching. Adverse events could be reviewed by re-design teams to help understand process flows in hospitals and minimize future events. Archived video data could provide fundamentally new learning modalities for medical education, such as the ability to audit and review housestaff performance and to watch medical student patient encounters in real time without the Hawthorne effect of an attending's physical presence. Clinical teaching cases that transpire over a course of days or weeks could be compressed into condensed, memorable vignettes for review in short lectures.
Yet with all the hype, for many valid reasons—namely privacy, transparency, liability, ownership, and cost—both the providers and consumers of healthcare have not fully embraced video as a form of electronic documentation.18–21 While medicolegal concerns have caused significant trepidation about advancing this technology in healthcare, I believe that obfuscation behind a biased, abstracted medical record system should not be a collective goal of health professionals. It is possible that, if discoverable, such video documentation may actually serve to protect defendants against frivolous lawsuits and to help the judicial system identify actual malpractice. It is likely, however, that without legal protections, providers and care organizations will not be willing to take the legal risk of implementation. Furthermore, from a cost perspective, such systems require significant investment: for comparison, current tele-ICU systems cost roughly between $2 and $5 million to install, with annual operating costs of about $1.5 million,22,23 warranting studies of cost-effectiveness. Other critics of video in healthcare contend that the technology could produce too much data, thereby generating too much noise relative to actual signal. I think the same argument could also be made about genetic information, high-resolution imaging scans, telemetry, and even our current text-based EHR systems. As with these other sources, well-designed tools for monitoring, rapid analysis, and insight should be further developed for video systems. Despite all of these limitations, my personal experience of watching the replay of the code situation was profoundly moving and stands out as a highlight of many years of medical training. 
In my humble opinion, the ability to capture and monitor patient care interactions on video is a modality that may better represent the visual and auditory world that surrounds patient care, and may provide important insights into variations in practice and how, as officials in this imperfect game of medical practice, we can all strive to make the right calls. I long for medical instant replay.

References

1. Gillam S, Yates J, Badrinath P. Essential Public Health: Theory and Practice. Cambridge: Cambridge University Press; 2007.
2. Wu L, Ashton CM. Chart review. A need for reappraisal. Evaluation and the Health Professions. 1997;20:146–163.
3. Allison JJ, Wall TC, Spettell CM, Calhoun J, Fargason CA Jr., Kobylinski RW, et al. The art and science of chart review. Joint Commission Journal on Quality Improvement. 2000;26:115–136.
4. Rosenberg T. An Electronic Eye on Hospital Hand-Washing. New York Times; 24 November 2011. 〈http://opinionator.blogs.nytimes.com/2011/11/24/an-electronic-eye-on-hospital-hand-washing/?scp=1&sq=hand%20hygeine&st=cse〉.
5. Fitzgerald M, Gocentas R, Dziukas L, Cameron P, Mackenzie C, Farrow N. Using video audit to improve trauma resuscitation—time for a new approach. Canadian Journal of Surgery. 2006;49(3):208–211.
6. Lubbert PH, Kaasschieter EG, Hoorntje LE, Leenen LPH. Video registration of trauma team performance in the emergency department: the results of a 2-year analysis in a level 1 trauma center. Journal of Trauma. 2009;67:1412–1420.
7. Mackenzie CF, Xiao Y, Hu FM, Seagull FJ, Fitzgerald M. Video as a tool for improving tracheal intubation tasks for emergency medical and trauma care. Annals of Emergency Medicine. 2007;50:436–442.
8. Zick A, Granieri M, Makoul G. First-year medical students' assessment of their own communication skills: a video-based, open-ended approach. Patient Education and Counseling. 2007;68:161–166.
9. Oakley E, Stocker S, Staubli G, Young S. 
Video recording to identify management errors in pediatric trauma resuscitation. Pediatrics. 2006;117(3):658–664.
10. Ram P, Grol R, Rethans JJ, Schouten B, Van der Vleuten C, Kester A. Assessment of general practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and feasibility. Medical Education. 1999;33:447–454.
11. Pringle M, Stewart-Evans C. Does awareness of being video recorded affect doctors' consultation behavior? British Journal of General Practice. 1990;40:45–48.
12. Eitel DR, Yankowitz J, Ely JW. Legal implications of birth videos. Journal of Family Practice. 1998:251–256.
13. Campbell S, Sosa JA, Rabinovici R, Frankel H. Do not roll the videotape: effects of the Health Insurance Portability and Accountability Act and the law on trauma videotaping practices. American Journal of Surgery. 2006:183–190.
14. Thomas EJ, Lucke JF, Wueste L, Weavind L, Patel B. Association of telemedicine for remote monitoring of intensive care patients with mortality, complications, and length of stay. Journal of the American Medical Association. 2009;302(24):2671–2678.
15. Young LB, Chan PS, Lu X, Nallamothu BK, Sasson C, Cram PM. Impact of telemedicine intensive care unit coverage on patient outcomes: a systematic review and meta-analysis. Archives of Internal Medicine. 2011;171:498–506.
16. Lilly CM, Cody S, Zhao H, et al. Hospital mortality, length of stay, and preventable complications among critically ill patients before and after tele-ICU reengineering of critical care processes. Journal of the American Medical Association. 2011;305(21):2175–2183.
17. Young LB, Chan PS, Cram P. Staff acceptance of tele-ICU coverage: a systematic review. Chest. 2011;139:279–288.
18. Eitel DR, Yankowitz J, Ely JW. Legal implications of birth videos. Journal of Family Practice. 1998:251–256.
19. Campbell S, Sosa JA, Rabinovici R, Frankel H. 
Do not roll the videotape: effects of the Health Insurance Portability and Accountability Act and the law on trauma videotaping practices. American Journal of Surgery. 2006:183–190.
20. Bain JE, Mackay NSD. Videotaping general practice consultations (letter to the editor). British Medical Journal. 1993;307:504.
21. Servant JB, Mathieson JAB. Video recording in general practice: the patients do mind. Journal of the Royal College of General Practitioners. 1986;36:555–556.
22. New England Healthcare Institute. Tele-ICUs: Remote Management in Intensive Care Units. Cambridge, MA: Massachusetts Technology Collaborative and Health Technology Center; 2007.
23. Breslow M. Remote ICU care programs: current status. Journal of Critical Care. 2007;22:66–76.

Book Review

Medical Licensing and Discipline in America: A History of the Federation of State Medical Boards, David A. Johnson, Humayun J. Chaudhry. Published by Lexington Books and the Federation of State Medical Boards, Lanham, MD (2012). 390 pp., ISBN-10: 0739174398, ISBN-13: 978-0739174395.

During the early centuries of American history, the organization of the medical profession was in a state of complete disarray. Unscrupulous, unqualified, self-professed healers practiced medicine without medical degrees or licenses, or readily purchased degrees at diploma mills. Societies representing homeopathic, eclectic, Thomsonian, osteopathic, and allopathic medicine endorsed conflicting health care agendas, and medical schools teaching disparate curricula sprouted up across the country. Throughout all of this, the American public, largely skeptical of the value of medical licenses and medical degrees in the first place, ignored all such documentation, disregarded safety, and simply visited whichever health care provider they wanted to see. It is against this complicated backdrop that authors David A. 
Johnson and Humayun J. Chaudhry begin their examination of the development and evolution of medical regulation in the United States. Medical Licensing and Discipline in America is a comprehensive historical account documenting the unique transformation of medical regulation in the country. Filled with engaging anecdotes, their work aims to explore the establishment of the Federation of State Medical Boards (FSMB), weaving the organization's history into the larger fabric of medical regulation and discipline from colonial times to the present day. It details the challenges confronted by efforts to regulate medical practice, the circumstances that led to the founding of the FSMB in 1912, and the organization's role throughout the ensuing decades amidst a state-based regulatory system. The book is information-rich, salient in the way it systematically embeds the story of medical regulation and discipline within the broader context of the many political, cultural, and medical changes in the country. The book is divided into eight chapters and also includes opening notes from US Surgeon General Regina Benjamin and FSMB Chairman Lance Talmage. The chapters are arranged chronologically, beginning with the “Birth of Medical Regulation in America” in the 17th century—a time when regulation was piecemeal at best and rife with fraud. The second chapter, “The Roots of the Federation of State Medical Boards,” chronicles the founding of the American Confederation and the National Confederation, predecessors of the FSMB, and their eventual merger into the FSMB as we know it today. The third and fourth chapters, “Beginnings, Growth, and Challenges” and “Stasis and Resurgence,” capture changes in medical licensure and discipline from 1912 to 1929 and 1930 to 1959, respectively. During these decades, there was a concentrated effort to standardize medical licensure requirements and exams, promote interstate endorsement of licensure, and publish an informational bulletin. 
The fifth chapter focuses on the early leadership of the Federation, highlighting in particular Walter Bierring's many contributions to the growth of the FSMB. Finally, the last three chapters cover the periods between 1961 and 1979, 1980 and 2001, and 2001 until the present day. Key developments featured within these pages include the push for greater transparency of board disciplinary actions, public accountability, a rigorous medical licensure examination, and the expansion of the Federation as a source of data. The final chapter brings readers to the state of a 21st-century medical licensing board—a still relevant and now international organization focused on international data sharing, clinical competence, and physician accountability. Throughout the book, authors Johnson and Chaudhry do an excellent job of outlining the complexity of medical regulation and how it is intimately tied to politics, government policy, and the changing attitudes of the American public. As a result, readers have the opportunity not only to learn about the formation of the FSMB, but also about the influence that critical periods and events throughout American history – such as the Jacksonian Era, the American Civil War, and the expansion of Medicare and Medicaid – had on the formation of regulatory policy. Medical Licensing and Discipline in America is a valuable resource for those interested in better understanding the history of state medical licensing boards—or, more broadly, the evolution of medical education and practice in the United States. Johnson's and Chaudhry's work represents an important contribution to the existing literature on the medical regulatory system: with so much discussion about patient safety and malpractice in today's healthcare environment, Medical Licensing and Discipline in America is a timely book that prompts reflection on the role of the medical licensing board in the ongoing conversation. 
Khin-Kyemon Aung
19 Quincy Mail Center, Cambridge, MA 02138, United States
E-mail address: [email protected]
Tel.: +1 440 364 8872.
Received 22 April 2013
Available online 4 May 2013
http://dx.doi.org/10.1016/j.hjdsi.2013.04.001