While insurance coverage and access dominate policy conversations now, issues of the cost and quality of care are not far from the surface. To achieve value for the health care dollars that we spend and to make coverage and care more affordable, policy makers and payers have sponsored a broad range of activity around payment reforms to support new models of care. So much activity in the public and private sectors raises a fundamental question: With so many diverse reforms taking place, how is it possible to generate the evidence needed for informed decision making about payment reforms for the future?
A Bumper Crop Of Payment Reforms
After a decade of experiments with pay-for-performance that typically focused on clinical process measures and showed only modest behavioral impacts, payers began to tackle value-based purchasing with “core” health care dollars instead of marginal reward payments. Early public- and private-sector initiatives that focused on patient-centered medical homes, episode-based payments, and accountable care organization (ACO)-like models, such as Massachusetts’ Alternative Quality Contract (AQC) and Medicare’s Physician Group Practice Demonstration, engaged leading providers in a handful of markets. Their influence was plain in key provisions of the 2010 Affordable Care Act, which established the Medicare Shared Savings Program to bring the Medicare ACO concept to scale and created the Center for Medicare and Medicaid Innovation (Innovation Center) to test and scale other payment and care delivery initiatives.
Since then, the number of Medicare ACOs has grown from 120 in 2012 to more than 520 this year. The Innovation Center has announced or launched nearly 40 initiatives, involving providers in every state and more than 18 million people, with commercial and Medicaid initiatives also proliferating. The Medicare Access and CHIP Reauthorization Act of 2015 and delivery system reform goals promoted by the Department of Health and Human Services (HHS) in the last administration only added fuel to the fire.
The ubiquity of payment and delivery system reform initiatives has generated tremendous interest in these issues in the trade press and at trade meetings. A burgeoning industry of consultants and tool developers seeks to serve providers engaged in such reforms. While it’s difficult to estimate the precise percentage of practicing clinicians currently participating in reforms, we know that they represent every state, many medical specialties, and diverse health care facilities such as home health agencies, acute care hospitals, and skilled nursing facilities.
It’s difficult to imagine that this level of market awareness and activity does not also change the behavior of providers who are not explicitly participating in a payment initiative. Many of the aforementioned consultants and tool developers also serve providers considering or preparing to participate in reforms. The Centers for Medicare and Medicaid Services’ (CMS’s) Transforming Clinical Practice Initiative explicitly aims to improve providers’ readiness for reforms. Having participating peers and competitors nearby adds further pressure on other providers to respond.
Spillover effects can thus take several forms:
- from one payer’s beneficiary population to another payer’s population within the same physician organization;
- from a provider participating in a reform to nonparticipating providers in the same market; and
- from the signals payers and policy makers disseminate nationally to diffuse markets.
While we do not have extensive evidence on the magnitude of reform spillover effects, we can predict their directional impact on the evaluation of reforms. First, the ubiquity of reforms makes it increasingly challenging for evaluators to identify valid comparison populations; it also complicates the strategies evaluators must use to collect data and to adjust for participation in reforms other than the one under analysis. Second, spillover effects will tend to improve the measured performance of even validly drawn comparison groups, shrinking the measured “difference in differences”: the change in performance from the baseline to the intervention period for the intervention group, minus the same change for the comparison group (Note 1). As a result, a given reform could look less effective than it actually is.
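To make the second point concrete, here is a minimal Python sketch with invented numbers; nothing in it comes from an actual evaluation. It assumes a reform that truly lowers per-beneficiary spending by $400, a background trend that both groups share, and a hypothetical `spillover_share`, the fraction of the reform’s effect that leaks to the comparison group. Under these assumptions the measured estimate shrinks by exactly the spilled-over amount.

```python
# Hypothetical illustration only: every number below is invented to show the
# mechanics of spillover; none comes from an actual evaluation.

def did(treat_pre: float, treat_post: float,
        comp_pre: float, comp_post: float) -> float:
    """Difference in differences: intervention-group change minus comparison-group change."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


def simulate(true_effect: float, trend: float, spillover_share: float,
             baseline: float = 10_000.0) -> float:
    """Return the measured DiD when the comparison group picks up part of the effect.

    baseline        -- hypothetical per-beneficiary annual spending in both groups ($)
    trend           -- change both groups would have seen anyway (e.g., price growth)
    true_effect     -- change caused by the reform in the intervention group
    spillover_share -- assumed fraction of true_effect that leaks to the comparison group
    """
    treat_post = baseline + trend + true_effect
    comp_post = baseline + trend + spillover_share * true_effect
    return did(baseline, treat_post, baseline, comp_post)


# A reform that truly saves $400 per beneficiary, against a +$200 shared trend.
print(simulate(true_effect=-400.0, trend=200.0, spillover_share=0.0))   # -400.0
print(simulate(true_effect=-400.0, trend=200.0, spillover_share=0.25))  # -300.0
```

The shared trend cancels out, as it should in a difference-in-differences design, but spillover does not: with 25 percent leakage, a $400 saving registers as $300. For a reform whose true effect is already modest, that kind of attenuation is what could push an estimate below statistical significance.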
Risks Of Spillover Effects For Evaluation
Why are spillover effects a significant policy concern? Because while decision makers differ in their appetite for conducting rigorous evaluations or publicly releasing the results, they all want to base their decisions about continuing a reform on evidence that it produces measurable and meaningful impact. Every decision maker applies his or her own actuarial thresholds for concluding whether the operational investments necessary to implement a reform are worthwhile, and hence whether to continue that reform.
In the case of Medicare payment reforms piloted through the Innovation Center, those thresholds are set out in statute, which grants the HHS secretary the authority to expand a model if it either reduces spending while at least maintaining quality or improves quality without increasing spending. The Office of the Actuary currently interprets these thresholds to require a projected statistically significant difference in performance between the intervention and “usual care” over the period of the intervention. The Office of the Actuary also considers projected behavioral responses but typically relies heavily on the results of rigorous evaluations premised on valid comparison groups.
Spillover effects could make CMS less likely to conclude that reforms with modest impact had any statistically significant impact at all. This would in turn make the Office of the Actuary less likely to project that those reforms would generate future savings or be cost neutral, and thus qualify for expansion. In the long term, the result would be a vicious cycle: implementing ever more new reforms without being able to demonstrate that most are worth expanding.
Reforms that address particularly high-cost populations, such as patients needing end-of-life care, oncology care, or end-stage renal disease care, are more likely to produce large improvements, but we expect most payment reforms to have only modest impacts, if any. This is consistent with findings for reforms that have had early positive results, such as the AQC or the Pioneer ACO Model. Moreover, those successes came early, involving leading provider organizations at a time when spillover effects may have been minimal. Whether those models would produce even modestly positive results if implemented in the current climate is an open question.
Mitigating The Risks To Sound Decision Making
What viable options do decision makers have for addressing spillover effects? What’s done can’t easily be undone. Few decision makers would have an appetite for unwinding whole reforms (except those that result in harm) or for drastically limiting participation in reforms already announced. Doing so would likely trigger disproportionate, negative market responses and undermine the momentum for providers to engage in reforms. As importantly, decision makers would find it challenging to distinguish reforms that have modest impact from those that are truly ineffective.
Increasing Private Payer Participation And Reporting
But this does not mean that we have no options available to address the problem of spillover. First, policy conclusions would be more robust if, as some of us have written previously, private payers were more willing to participate in, and release results of, evaluations. More evaluations (including qualitative data) would provide valuable information for some of the strategies we list below, such as the creation of proxy measures of spillover.
Second, decision makers could revisit the definition of “success,” which currently rests on a difference-in-differences comparison of intervention and comparison groups, with the baseline defined as the period immediately before the reform begins. They could also consider results relative to an alternative baseline period, one that predates the explosive growth in the number and scope of reforms. They could identify this inflection point by plotting the rise in the number and scope of reforms over time, in a market or nationally, using available data from CMS, the National Association of Medicaid Directors, and private organizations such as Catalyst for Payment Reform or the Blue Cross Blue Shield Association.
Of course, such an approach would still need to account for other important temporal factors, such as macroeconomic trends or the introduction of disruptive new medical technology, which may play as great a confounding role as spillover. In the case of the Innovation Center, we believe the statutory language gives the Office of the Actuary flexibility to consider alternative interpretations of the baseline period.
Measuring And Adjusting For Spillover Effects
Evaluators could also try to measure and adjust for spillover effects directly. They might consider proxy measures such as the number of reforms in an area, the aggregate penetration (in number of providers or patients) of all reforms, or the penetration of reforms targeting similar providers or patients. Alternatively, they could conduct targeted surveys of intervention and comparison providers to assess the timing of their adoption of key practice capabilities or care approaches, which might indicate a spillover effect.
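The three proxy measures above are straightforward to compute once participation data are assembled. The following is a minimal Python sketch under assumed inputs: the `Provider` record, the market labels, and reform names such as `reform_x` are hypothetical, and penetration here is counted in providers rather than patients (a patient-weighted version would need panel sizes, which this sketch does not model).

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Provider:
    """Hypothetical record: a provider, its market, and the reforms it participates in."""
    provider_id: str
    market: str
    reforms: frozenset = field(default_factory=frozenset)


def reform_count(providers: Iterable[Provider], market: str) -> int:
    """Proxy 1: number of distinct reforms with at least one participant in the market."""
    return len({r for p in providers if p.market == market for r in p.reforms})


def penetration(providers: Iterable[Provider], market: str,
                target_reforms: Optional[set] = None) -> float:
    """Proxies 2 and 3: share of the market's providers in any reform, or, if
    target_reforms is given, only in reforms aimed at similar providers or patients."""
    in_market = [p for p in providers if p.market == market]
    if not in_market:
        return 0.0

    def participates(p: Provider) -> bool:
        relevant = p.reforms if target_reforms is None else p.reforms & target_reforms
        return bool(relevant)

    return sum(participates(p) for p in in_market) / len(in_market)


# Hypothetical example data: four providers in market "A", one in market "B".
providers = [
    Provider("p1", "A", frozenset({"reform_x"})),
    Provider("p2", "A", frozenset({"reform_x", "reform_y"})),
    Provider("p3", "A"),
    Provider("p4", "A"),
    Provider("p5", "B", frozenset({"reform_z"})),
]

print(reform_count(providers, "A"))               # 2
print(penetration(providers, "A"))                # 0.5
print(penetration(providers, "A", {"reform_y"}))  # 0.25
```

Tracked by market and year, values like these could serve as the “unit of potential spillover” against which evaluators calibrate any adjustment, or simply be reported alongside the raw findings.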
Because such post hoc measures are imperfect, evaluators and actuaries would need to make judgments about how much to adjust estimates of program impact per unit of potential spillover. Alternatively, they could simply present quantitative information on potential spillover alongside their raw evaluation or actuarial findings and let decision makers judge whether accounting for spillover would lead them to conclude that the reform was successful. In the case of the Innovation Center, this may require a statutory change to the requirements for model expansion.
The Right And Wrong Ways To Evaluate Combinations Of Reforms
Finally, we recognize that in some circumstances payers may consider it important to test combinations of reforms to measure their aggregate impact. We would recommend, however, that they do so by deliberately layering multiple interventions in a market or with specific providers, and by designing such a test with credible evaluation in mind, rather than backing into post hoc analyses driven by where providers happen to participate in multiple reforms.
The real risk of spillover effects is that they could engender a nihilistic response; decision makers could conclude that it is impossible to generate meaningful evaluation results and thus that evaluations shouldn’t drive decision making. We strongly recommend resisting such a posture, as it would untether decision making from objective evidence.
The palpable energy and momentum generated by the national activity on payment and delivery system reform are a boon to efforts to improve the value of US health care. But if we are to fully leverage these opportunities for lasting change, we need to acknowledge and address the risks this energy and momentum pose to future decision making.
Note 1. A difference-in-differences analysis measures the change in performance for an intervention group over a period of time and compares it with the analogous change in performance for a comparison group, usually over the same period; hence the term “difference in differences.”
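The definition above can also be written as a formula. The notation is introduced here for illustration: Ȳ is average performance, superscripts I and C mark the intervention and comparison groups, δ is a trend both groups share, τ is the reform’s true effect, and S is the shift that spillover produces in the comparison group. Assuming the intervention group changes by δ + τ and the comparison group by δ + S, the estimate reduces to τ − S, so spillover in the same direction as the reform shrinks the measured effect.

```latex
\widehat{\mathrm{DiD}}
  = \bigl(\bar{Y}^{I}_{\mathrm{post}} - \bar{Y}^{I}_{\mathrm{pre}}\bigr)
  - \bigl(\bar{Y}^{C}_{\mathrm{post}} - \bar{Y}^{C}_{\mathrm{pre}}\bigr)
  = (\delta + \tau) - (\delta + S)
  = \tau - S
```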