Introduction

Despite the advantages of rigorous clinical trial designs in providing unbiased estimates of treatment outcomes, these designs can also yield somewhat paradoxical findings. For example, a treatment may have an unequivocal benefit on objective outcomes but a less clear impact on more subjective outcomes. Research on response-shift effects offers a theory-driven and empirically testable path toward understanding such paradoxes. “Response shift” refers to the idea that when individuals experience a change in health status, they may change their internal standards, values, or conceptualization of a target construct such as “quality of life” (QOL) [1, 2]. Over the past two decades, research in a broad range of therapeutic areas has shown that response-shift effects can influence clinical research findings and can reflect both positive and negative adaptation [3,4,5,6,7,8,9,10,11,12]. Although response-shift effects are generally small, they can alter study conclusions and are thus not inconsequential [3, 4].

Current methods for detecting response-shift effects [1, 13, 14] rest on the premise that unexpected levels of QOL scores reflect adaptation [9, 11, 12, 15,16,17,18,19,20]. For example, if a clinician-assessed outcome does not agree with a patient-reported outcome (PRO), this “discrepancy” may signal changes in patients’ internal standards, values, or conceptualization of the target construct (e.g., QOL) [17]. Rather than implying that either the clinician-assessed or the patient-reported outcome is flawed or biased, the discrepancy suggests that there is ‘more than meets the eye’ and that a deeper investigation is warranted. A recent study of people with spinal cord injury (SCI) reported that while objective measures of motor and cognitive function had stabilized one to five years post-injury, patient-reported outcomes reflected recalibration and reconceptualization response shifts [5]. Specifically, patients reported improvements in physical functioning over time, driven primarily by improvements in physical role performance (recalibration) [5]. They also appeared to change their conceptualization of QOL: by the long-term follow-up, people with SCI no longer considered their SCI per se as part of their general health, and instead considered only SCI sequelae as part of it [5]. Such response-shift effects may be important for understanding the full range of dynamics that matter for QOL, for example when interpreting measures of clinical significance in other patient populations, such as people with multiple sclerosis or spinal disorders [21, 22].

Researchers have long posited that response-shift effects may obscure treatment effects. A substantial number of articles have discussed the importance of response-shift effects in clinical trials (e.g., [23, 24]), and several studies have tested for such effects in clinical trials [6, 11, 25,26,27,28,29]. Several of these studies used the then-test method, which is prone to recall bias and lacks specificity, complicating interpretation [30,31,32]. One study used the relative-importance method to evaluate reprioritization response shifts [11], and two studies either combined the then-test with the individualized Schedule for the Evaluation of Individualized QOL (SEIQOL) [27] or used another individualized measure, the Patient-Generated Index (PGI) [29]. These latter two studies thus provide a fuller, qualitative context for respondents’ changes in priorities and conceptualizations of QOL. Individualized measures are, however, difficult to translate into quantitative metrics that help interpret trial outcomes relative to analyses that ignore response-shift effects.

The present work investigated possible response-shift effects in a recent clinical trial (n = 143) testing a new treatment for Neuromyelitis Optica Spectrum Disorder (NMOSD) [33]. This uncommon but severe demyelinating disease is a relapsing, autoimmune, inflammatory disorder that typically affects the optic nerves and spinal cord and can lead to blindness and paralysis [34]. Often initially misdiagnosed with multiple sclerosis, people with NMOSD face a frightening trajectory of severe relapses that leave residual neurologic disability, along with the threat of unpredictable and disabling future attacks [35].

This NMOSD clinical trial provided strong evidence that Eculizumab prevents relapse (the primary outcome). It also supported Eculizumab on the more objective secondary outcomes, namely clinician-assessed indicators and the EQ-5D utility measure of health state [36]. Support was, however, weaker for more subjective or evaluative indicators (e.g., the EQ-5D visual analogue scale [EQ-5D VAS] of global health; evaluative physical functioning), and results were null for the evaluative self-report measure of mental functioning [37]. This pattern of results led us to hypothesize that response-shift effects were present.

Response-shift methods for detecting effects in secondary analyses often require relatively large samples [38]. For example, the abovementioned SCI study used Oort’s Structural Equation Modeling (SEM) approach [10, 39], a well-vetted method that has been used in a number of secondary analyses of observational data [4, 10, 40,41,42]. This approach provides an ordered series of steps that test for response-shift effects, with later steps conducted only if earlier steps pass muster. The sample for the present study is too small for an Oort SEM analysis. Instead, we devised a “de-constructed” approach better suited to small samples. This method begins by testing an omnibus response-shift hypothesis and then implements a series of analyses aimed at elucidating what the omnibus test uncovers. We first investigated Treatment Arm as a possible ‘catalyst’ of response shift, given the substantial health-state changes that differentiated the two groups [2, 17]. We then examined Relapse Group as a catalyst to better understand these findings.

Methods

Sample and trial procedure

This secondary analysis used data from a randomized, double-blind, time-to-event trial evaluating the impact of Eculizumab in preventing relapses in 143 people with NMOSD. Eligible participants were aged 18 years or older and had a diagnosis of neuromyelitis optica (NMO) or NMOSD. This international trial recruited participants from 80 sites on four continents. Figure 1 shows the timing of clinician- and patient-reported outcome collection over the course of the trial. (For complete details on trial inclusion and exclusion criteria and procedures, see reference [33].) The trial was conducted in accordance with the provisions of the Declaration of Helsinki, the International Conference on Harmonization guidelines for Good Clinical Practice, and applicable regulatory requirements. The trial was approved by the institutional review board at each participating institution. All patients provided written informed consent before participation.

Fig. 1

Timing of PRO measurement collection in the clinical trial design. This study schema provides the timing of clinician- and patient-reported outcome collection over the course of the trial

Measures

For the present analysis, we included information about treatment arm (i.e., Eculizumab vs. Placebo) as well as the following clinician- and patient-reported outcome data and information about relapse.

Clinician-assessed outcomes. Clinicians who were blinded to trial-group assignment rated patients’ disability on the Kurtzke Expanded Disability Status Scale (EDSS) [43]. This standard neurological outcome tool assigns scores based on eight Kurtzke Functional Systems (KFS) that capture signs of disability (pyramidal, cerebellar, brainstem, sensory, bowel/bladder control, visual, cerebral, and other). EDSS scores range from 0 (no disability) to 10 (death). Treating clinicians or appropriately trained staff members evaluated patients using the modified Rankin Scale (MRS) [44], which assesses the degree of dependence in daily activities; scores range from 0 (no disability) to 6 (death). The Hauser Ambulation Index (HAI) [45] focuses on mobility disability by assessing the time and degree of assistance needed to walk 25 feet. Its scores range from 0 to 9, with higher scores indicating greater impairment.

PROs. Patients completed the European Quality of Life 5-Dimension 3-Level (EQ-5D-3L) questionnaire [36]. For the purpose of this study, we included the EQ-5D VAS item, a subjective global rating of self-reported health ranging from 0 (worst imaginable health) to 100 (best imaginable health). The Short Form-36 version 2 (SF-36v2™) [46] is a generic evaluative measure of functional health that includes eight domain scores (general health, physical functioning, physical role performance, social functioning, emotional role performance, mental health, pain, vitality), which are summarized by the Physical Component Score (PCS) and Mental Component Score (MCS). Norm-based SF-36™ scores range from 0 to 100, with a normative mean of 50 and standard deviation of 10. Higher scores indicate better functional health.
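The norm-based scoring described above is a linear T-score transformation. As a worked equation, where x_d is a respondent's 0–100 transformed score on domain d and μ_d and σ_d are the general-population mean and standard deviation for that domain (published with the instrument's scoring materials and not reproduced here):

```latex
T_d = 50 + 10 \times \frac{x_d - \mu_d}{\sigma_d}
```

A score one normative standard deviation below the population mean therefore maps to T_d = 40, and a score at the population mean maps to 50.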

Information about relapse. In the clinical trial, time to relapse was the primary endpoint. The Opticospinal Impairment Score (OSIS) [34] evaluated relapse severity in four domains: Visual Acuity, Motor Function, Sensory Function, and Sphincter Function. Scores range from 0 to 8 for Visual Acuity and from 0 to 5 for the other domains. Higher scores indicate worse functioning.

On-trial relapse was defined as new onset of neurologic symptoms, or worsening of existing neurologic symptoms, that persisted for more than 24 h, was attributed to NMO, and was preceded by at least 30 days of clinical stability. On the basis of the neurological exam and the OSIS, the treating clinician and a blinded examining clinician judged the severity of the relapse (“Clinician-Assessed Relapse”). A ‘major’ relapse was defined as an increase of 2 or 3 points in OSIS Visual Acuity (for patients starting with a score of 2–7 or 0–1, respectively) and an increase of 2 or 3 points on the OSIS Motor Subscale (for patients starting with a score of 2–6 or 0–1, respectively). Any loss of proprioception on the OSIS Sensory Subscale was considered ‘major.’
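To make these decision rules concrete, the following minimal Python sketch encodes them as functions. It is illustrative only: the names (`Event`, `is_on_trial_relapse`, `is_major_relapse`) are hypothetical, the point thresholds are read as minimum increases, and the domain criteria are treated as alternatives (meeting any one makes a relapse 'major', since proprioception loss alone is called 'major'). That reading is our assumption, not the protocol's wording, and adjudication by the expert panel is not modeled.

```python
"""Hypothetical encoding of the relapse rules described in the text.

Assumptions (ours, not the protocol's): point thresholds are minimum
increases, and any single domain criterion makes a relapse 'major'.
"""
from dataclasses import dataclass


@dataclass
class Event:
    hours_persisting: float   # duration of new or worsened neurologic symptoms
    attributed_to_nmo: bool   # symptoms attributed to NMO by the clinician
    days_stable_before: int   # days of clinical stability before onset


def is_on_trial_relapse(e: Event) -> bool:
    """Persisting > 24 h, attributed to NMO, preceded by >= 30 stable days."""
    return e.hours_persisting > 24 and e.attributed_to_nmo and e.days_stable_before >= 30


def required_increase(baseline: int, upper_range: range) -> int:
    """3 points from a baseline of 0-1; 2 points from a baseline in upper_range."""
    if baseline in (0, 1):
        return 3
    if baseline in upper_range:
        return 2
    raise ValueError(f"baseline {baseline} is outside the ranges given in the text")


def is_major_relapse(visual_base: int, visual_now: int, motor_base: int,
                     motor_now: int, lost_proprioception: bool) -> bool:
    visual = visual_now - visual_base >= required_increase(visual_base, range(2, 8))  # 2-7
    motor = motor_now - motor_base >= required_increase(motor_base, range(2, 7))  # 2-6
    return visual or motor or lost_proprioception


print(is_on_trial_relapse(Event(hours_persisting=48, attributed_to_nmo=True, days_stable_before=45)))  # True
print(is_major_relapse(visual_base=1, visual_now=4, motor_base=0, motor_now=0,
                       lost_proprioception=False))  # True
```

Encoding the rules this way makes the baseline dependence explicit: the same 2-point worsening counts as 'major' for a patient starting at Visual Acuity 4 but not for one starting at 1.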

An independent panel of three experts (two neurologists and one neuro-ophthalmologist) who were blinded to treatment assignment then adjudicated each relapse, considering information from the Clinician-Assessed Relapse, Magnetic Resonance Imaging data, Optical Coherence Tomography imaging data, and the recorded exam. This adjudication process was intended to strengthen the robustness of the trial’s primary endpoint by reducing error variance due to (a) geographic differences in standards of care and (b) a potential bias toward over-reporting a neurologic event as an on-trial relapse to mitigate potential long-term sequelae of a missed relapse. Of the 43 patients with Clinician-Assessed Relapses, 22 were adjudicated positively (i.e., categorized in the Adjudicated-Relapse group).

Statistical analysis

This secondary analysis of NMOSD trial data examined evidence of response-shift effects in trial outcomes. We began by focusing on differences by treatment arm (Eculizumab vs. Placebo) and then examined differences by relapse status. Relapse status was defined as a three-level variable (No Relapse, Clinician-Assessed Relapse, Adjudicated Relapse). Compared with a simple contrast of the Adjudicated-Relapse and No-Relapse groups, this variable afforded more statistical power because it retained a larger sample, and it distinguished more subjective indicators of signal (i.e., Clinician-Assessed) from more objective ones (i.e., Adjudicated).
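The three-level variable can be derived from two patient-level flags. A minimal sketch, assuming each patient is classified once according to whether any clinician-assessed relapse was adjudicated positively (the enum and function names are hypothetical):

```python
from collections import Counter
from enum import Enum


class RelapseStatus(Enum):
    NO_RELAPSE = "No Relapse"
    CLINICIAN_ASSESSED = "Clinician-Assessed Relapse"  # not confirmed by the panel
    ADJUDICATED = "Adjudicated Relapse"                # confirmed by the panel


def relapse_status(clinician_assessed: bool, adjudicated: bool) -> RelapseStatus:
    """Collapse two patient-level flags into the three-level catalyst variable."""
    if adjudicated and not clinician_assessed:
        # The panel only reviews clinician-assessed relapses.
        raise ValueError("an adjudicated relapse must also be clinician-assessed")
    if adjudicated:
        return RelapseStatus.ADJUDICATED
    if clinician_assessed:
        return RelapseStatus.CLINICIAN_ASSESSED
    return RelapseStatus.NO_RELAPSE


# Counts implied by the text if each of the 143 patients is classified once:
# 43 clinician-assessed relapses, of which 22 were adjudicated positively.
flags = [(False, False)] * 100 + [(True, False)] * 21 + [(True, True)] * 22
counts = Counter(relapse_status(c, a) for c, a in flags)
print({status.value: n for status, n in counts.items()})
```

Raising an error for an adjudicated-but-not-clinician-assessed combination guards against coding mistakes, since the adjudication step only reviews events that clinicians already flagged.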

These analyses aimed to “de-construct” different aspects of measurement invariance in the context of a small sample to characterize recalibration, reprioritization, and reconceptualization response shifts. We proceeded in four steps.

Step 1:

Hypothesis-driven group differences in expected–observed discrepancy scores. This step tested the ‘omnibus response-shift hypothesis’ that differences between expected and observed QOL scores (“discrepancy scores”) vary as a function of the hypothesized response-shift catalyst (i.e., treatment arm and then whether the person ultimately had a relapse). If this omnibus test did not support a response-shift effect, the subsequent analytic steps would not be implemented.

To examine discrepancy-score differences by catalyst group, we used the Rapkin and Schwartz residual-modeling approach [17, 47]. We began by computing a principal component from the PRO scores, including the EQ-5D VAS and the eight SF-36™ domain scores at all time points (Supplementary Table 1). This analysis summarized the PRO scores in a single component score using data from all time points (see Results for details). Had this analysis not supported the existence of one dominant component, we would have reduced the set of scores until they were well represented by a unidimensional component score. We then used this component score as the dependent variable in a random-effects model [48] that included the following demographic and clinical predictors at all available time points: gender, race, country, ethnicity, age, number of years since diagnosis, number of years since NMOSD-presenting symptoms, body mass index, and treatment compliance; and scores on the MRS, HAI, EDSS, and KFS. We saved the residuals from this model (i.e., scores capturing the discrepancy between expected and observed outcomes) and then tested models predicting these residuals with the hypothesized response-shift catalyst group as the independent variable. Paneled histograms illustrate catalyst-group differences in the discrepancy scores.
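The residual-modeling logic of Step 1 can be sketched in dependency-free Python. This is a simplified illustration under stated assumptions, not the authors' code: the first principal component is found by power iteration on the correlation matrix, an ordinary least-squares 'standard model' stands in for the random-effects model used in the study, and the toy data and all names are invented.

```python
"""Step 1 sketch: component score -> 'standard model' -> discrepancy scores.

Assumptions: OLS replaces the study's random-effects model; data are toys.
"""
from statistics import mean, pstdev


def standardize(col):
    m, s = mean(col), pstdev(col)
    return [(x - m) / s for x in col]


def first_component_scores(rows, iters=500):
    """Scores on the first principal component of the columns' correlation matrix."""
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, p = len(rows), len(cols)
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n for j in range(p)]
            for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):  # power iteration converges to the leading eigenvector
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(cols[j][k] * v[j] for j in range(p)) for k in range(n)]


def ols_residuals(y, X):
    """Observed minus fitted y, regressing y on the columns of X plus an intercept."""
    A = [[1.0] + list(r) for r in X]
    k = len(A[0])
    # Augmented normal equations [A'A | A'y], solved by Gauss-Jordan elimination
    M = [[sum(a[i] * a[j] for a in A) for j in range(k)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    return [t - sum(b * a for b, a in zip(beta, row)) for t, row in zip(y, A)]


# Toy data: three PROs per observation, one clinical predictor (an EDSS-like
# score), and a hypothetical catalyst-group label.
pros = [[60, 55, 58], [40, 42, 39], [70, 72, 69], [50, 47, 52], [30, 35, 33], [65, 60, 66]]
edss = [[2.0], [4.5], [1.5], [3.0], [6.0], [2.5]]
group = ["Eculizumab", "Placebo", "Eculizumab", "Placebo", "Placebo", "Eculizumab"]

component = first_component_scores(pros)
discrepancy = ols_residuals(component, edss)  # observed minus expected
for g in ("Eculizumab", "Placebo"):
    print(g, round(mean(d for d, lab in zip(discrepancy, group) if lab == g), 3))
```

The discrepancy scores are what the omnibus test then compares across catalyst groups; because they are residuals, they sum to zero overall, so any group difference shows up as one group sitting systematically below or above zero.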

If these results suggested response-shift effects, Steps 2–4 would then examine evidence of recalibration, reprioritization, and reconceptualization response shifts, respectively. Random-effects models [48], specifically random-intercept models, were used to examine longitudinal differences in patterns of emphasis by catalyst group: whether PCS and MCS differed by treatment arm or relapse group in their ability to explain EQ-5D VAS scores, and whether such dynamics changed over time. (We worked with the SF-36™ summary [component] scores rather than the domain scores because the latter were collinear [e.g., average r at baseline = 0.40, range = 0.14–0.66], whereas the PCS and MCS scoring algorithm yields nearly uncorrelated component scores [r at baseline = −0.13].)

Step 2:

Group differences in patterns of emphasis. This analysis focused on characterizing a recalibration response shift. It examined whether the catalyst groups evinced different patterns of emphasis, that is, a different relationship between PCS or MCS and the EQ-5D VAS. This pattern would be indicated by significant two-way interactions (catalyst group-by-PCS; catalyst group-by-MCS) in a random-effects model predicting the EQ-5D VAS (dependent variable) from MCS, PCS, catalyst group, and Time (Weeks in Study). In an Oort SEM context [39], this is similar to asking whether the intercept of the slope relating the QOL component to the global EQ-5D VAS differs by catalyst group.

Step 3:

Group differences over time in patterns of emphasis. This analysis focused on characterizing a reprioritization response shift. Step 3 expanded on the prior model to investigate whether these patterns of emphasis changed differently by catalyst group over time. It tested the three-way interactions among catalyst group, Weeks in Study, and PCS or MCS scores, after adjusting for main effects and the other interactions.
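The fixed-effects structure of the Step 2 and Step 3 models can be written in Wilkinson (R/Patsy-style) formula notation. The sketch below is a hypothetical specification with invented variable names, plus a small helper that expands `*` into main effects and interactions; the expansion makes explicit that Step 3 adjusts for every lower-order term while including no PCS-by-MCS term. In Python, such a formula could be passed to `statsmodels.formula.api.mixedlm(formula, data, groups=data["patient_id"])` to fit a random-intercept model; that call is mentioned only here so the block runs without third-party packages.

```python
"""Hypothetical fixed-effects specifications for Steps 2 and 3.

Variable names are invented; 'group' is the catalyst (treatment arm or
relapse status) and 'weeks' is Weeks in Study.
"""
from itertools import combinations

STEP2 = "eq5d_vas ~ group * pcs + group * mcs + weeks"
STEP3 = "eq5d_vas ~ group * weeks * pcs + group * weeks * mcs"


def expand_terms(formula):
    """Expand 'a * b' into a, b, a:b (handles only '+' and '*', no parentheses)."""
    rhs = formula.split("~", 1)[1]
    terms = []
    for chunk in rhs.split("+"):
        factors = [f.strip() for f in chunk.split("*")]
        for size in range(1, len(factors) + 1):
            for combo in combinations(factors, size):
                term = ":".join(combo)
                if term not in terms:  # keep each term once, in first-seen order
                    terms.append(term)
    return terms


for name, formula in (("Step 2", STEP2), ("Step 3", STEP3)):
    print(name, expand_terms(formula))
```

In Step 2 the terms of interest are `group:pcs` and `group:mcs`; in Step 3 they are `group:weeks:pcs` and `group:weeks:mcs`, estimated alongside all the main effects and two-way interactions that the expansion lists.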

Step 4:

We then tested for a reconceptualization response shift by implementing a series of random-effects models, each predicting one QOL domain from catalyst group after adjusting for the other eight domains. Step 4 examined whether certain measures captured unique aspects of QOL that distinguished catalyst groups, by characterizing how each QOL domain’s relationship with catalyst-group status changed when isolated from (i.e., adjusted for) the other QOL domains. A reconceptualization response shift can be inferred from how much catalyst-group variance is uniquely accounted for by each QOL domain.
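Step 4 amounts to nine models, one per QOL domain, each with the catalyst group plus the other eight domains as predictors; the catalyst-group coefficient in each model reflects that domain's unique association with group status. A hypothetical sketch that generates those specifications (the domain names are invented shorthand for the EQ-5D VAS and the eight SF-36™ domains, and each formula would be fitted as a random-intercept model):

```python
"""Hypothetical Step 4 specifications: one model per QOL domain."""
DOMAINS = [
    "eq5d_vas", "general_health", "physical_functioning", "role_physical",
    "social_functioning", "role_emotional", "mental_health", "pain", "vitality",
]


def isolation_formulas(domains, group="group"):
    """Each domain regressed on the catalyst group plus all other domains."""
    formulas = {}
    for outcome in domains:
        covariates = [d for d in domains if d != outcome]
        formulas[outcome] = f"{outcome} ~ {group} + " + " + ".join(covariates)
    return formulas


for formula in isolation_formulas(DOMAINS).values():
    print(formula)
```

Generating the formulas programmatically guarantees that every model adjusts for exactly the same set of other domains, which is what makes the group coefficients comparable across the nine models.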

Handling of missing data

There was very little missing data in this data set, and the variables we used in our modeling had no missing data.

Results

Sample

The study sample included 143 people, of whom 107 had Definitive Neuromyelitis Optica and 36 had NMO Spectrum Disorder (Table 1). Two-thirds of the sample received Eculizumab and one-third placebo, and treatment adherence was high. The sample had a mean age of 44 and a mean age at diagnosis of 41, and was predominantly female. Each patient had between 3 and 23 clinician visits during the trial and spent between 2 and 30 months under study.

Table 1 Descriptive statistics of study sample (N = 143 patients)

Table 2 displays the descriptive statistics of baseline scores on the clinician- and patient-reported outcomes. On average, the sample had ‘slight disability’ on the MRS, and scores on the HAI and EDSS consistent with some gait abnormalities but not enough to prevent independent walking. The sample’s average SF-36™ PCS score was substantially below norm-based means, whereas the MCS score was slightly but not significantly below them. The largest decrements among the SF-36™ domain scores were in physical functioning and physical role performance. On the EQ-5D VAS, mean scores reflected substantial health impairment. The Self-Care domain of the EQ-5D showed the greatest decrement. Figure 2 shows the mean change from baseline on the SF-36™ domains by treatment arm. The Eculizumab group showed larger changes in the SF-36™ physical domains than the Placebo group, which showed larger changes in the mental domains.

Table 2 Descriptive statistics of scores at baseline (N = 143)
Fig. 2

Mean change from baseline to study end in SF-36™ domain scores and EQ-5D VAS by treatment arm. The Eculizumab group showed larger changes in the SF-36™ physical domains than the Placebo group, which showed larger changes in the mental domains

Component score used for creating discrepancy scores

Supplementary Table 1 shows the loadings of each PRO used in the principal component analysis (PCA). The PRO data from all time points were effectively captured by one component score (successive eigenvalues = 4.95, 0.96, and 0.85; successive variance explained = 55%, 10.7%, and 9.4%). Figure 3 shows the distribution of the discrepancy scores in the entire sample. The distribution was centered around zero and slightly left-skewed.
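The reported percentages are consistent with a PCA of the correlation matrix of the nine PROs (the EQ-5D VAS and the eight SF-36™ domains), in which the eigenvalues sum to the number of variables. That the correlation matrix was used is our reading, but the arithmetic checks:

```latex
\text{variance explained}_m = \frac{\lambda_m}{\sum_{i=1}^{9} \lambda_i} = \frac{\lambda_m}{9},
\qquad \frac{4.95}{9} \approx 0.550, \quad \frac{0.96}{9} \approx 0.107, \quad \frac{0.85}{9} \approx 0.094
```

Only the first eigenvalue exceeds 1, and it is about five times the second, which is the pattern one would expect when a single dominant component summarizes the PROs.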

Fig. 3

Discrepancy Score Distributions for the whole sample. The distribution was centered around zero, and slightly left-skewed

Treatment arm as catalyst

Step 1: Treatment arm differences in expected-versus-observed discrepancy scores

The Kruskal–Wallis non-parametric test revealed differences in the central tendencies of the discrepancy-score distributions by treatment arm (test statistic = 108.40, df = 1, p < 0.0005). The Placebo group had a systematically lower median (Fig. 4), whereas for Eculizumab patients the discrepancy score was generally close to zero. A sensitivity analysis omitting one low-scoring outlier in the Placebo group yielded essentially unchanged results.
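The omnibus test compares discrepancy-score distributions with the Kruskal–Wallis H statistic, which is referred to a chi-square distribution with (number of groups minus 1) degrees of freedom. The following dependency-free sketch applies the usual tie correction but, to stay short, implements the chi-square tail probability only for df = 1 and df = 2, the two cases in this study. The function names are our own.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def chi2_sf(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise NotImplementedError("this sketch covers df = 1 and df = 2 only")


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its asymptotic chi-square p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # correction for tied values
    return h, chi2_sf(h, len(groups) - 1)


h, p = kruskal_wallis([1, 2, 3], [4, 5, 6])
print(round(h, 3), p)
```

For two completely separated groups of three, H = 27/7 (about 3.857) with p just under 0.05. With df = 2, the relapse-group statistic reported below (14.87) gives p = exp(-7.435), about 0.0006, consistent with the reported p = 0.001 after rounding to three decimals. In the study, the inputs would be the discrepancy scores split by catalyst group.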

Fig. 4

Discrepancy Score Distributions by Treatment Arm. The distributions of discrepancy scores are different by treatment arm, with the largest and more-negative discrepancies found in the Placebo group, as compared to the Eculizumab group. One low-scoring outlier was not shown in the graph but was retained in calculations

Step 2: Treatment arm differences in patterns of emphasis

Table 3 shows the results of random-effects models assessing differences in patterns of emphasis among the trial participants. There were significant two-way interactions between treatment arm and both PCS and MCS scores: compared with Eculizumab patients, Placebo patients placed greater emphasis on PCS and less emphasis on MCS in their EQ-5D VAS scores.

Table 3 Random effects models testing treatment-arm-related differences in emphasis

Step 3: Treatment arm differences in changes over time in patterns of emphasis

There were no significant three-way interactions of Treatment Arm with time and PCS or MCS (Table 3), suggesting that differences in patterns of emphasis did not change over time. Residuals, overall and within each treatment arm, were non-normal (p < 0.0005 for each arm) owing to skewness (−0.57 and −0.39 for Placebo and Eculizumab, respectively).
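The skewness values quoted above summarize the asymmetry of the residual distributions. The text does not say which estimator or normality test was used, so the following is a hedged sketch of one common definition, the moment-based g1; some packages report a bias-adjusted version that differs slightly in small samples. The residuals in the example are invented.

```python
def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical residuals with a long left tail (one large negative value)
residuals = [0.4, 0.3, 0.5, 0.2, 0.1, 0.3, -0.1, -1.8, 0.2, -0.1]
print(round(skewness(residuals), 2))  # negative, i.e. left-skewed
```

A negative g1, as for both treatment arms here, means the residuals have a longer tail of worse-than-expected scores than of better-than-expected ones.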

Step 4: Group differences in conceptualization of QOL

Table 4 shows results of the series of random-effects models clarifying how each domain’s relationship with Treatment Arm varied across models after adjusting for all the other domains. These models suggested that the Placebo group was associated with substantially worse-than-expected EQ-5D VAS and Vitality scores. None of the other seven SF-36™ domain scores had statistically significant relationships with Treatment Arm after adjusting for the other QOL domain scores.

Table 4 Isolating QOL associations by treatment arm

Because Eculizumab was highly effective at preventing relapse, we hypothesized that the response-shift effects related to treatment arm overwhelmingly reflected the impact of relapse on patients. We thus investigated response-shift effects by relapse status using the same series of analyses.

Relapse group as catalyst

Step 1: Relapse-group differences in expected-versus-observed discrepancy scores

The Kruskal–Wallis non-parametric test indicated relapse-group differences in the discrepancy-score distributions (test statistic = 14.87, df = 2, p = 0.001). Figure 5 shows the distribution of discrepancy scores by relapse group. For No-Relapse patients, the discrepancy score was generally close to zero, whereas for the Clinician-Assessed and Adjudicated-Relapse groups, the scores varied more widely. Post hoc pairwise comparisons revealed that the Adjudicated-Relapse group had substantially larger and more-negative discrepancy scores than the other groups (K-W test statistics = −35.99 versus −14.97 and −21.02, respectively; p < 0.0001 versus p = 0.125 and 0.09, respectively). A sensitivity analysis omitting the low-scoring outlier in the No-Relapse group yielded essentially unchanged results.
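Many statistics packages follow a significant Kruskal–Wallis test with Dunn's pairwise procedure, whose test statistic is a difference in mean ranks. Assuming the pairwise statistics above are of this kind (the text does not say, nor whether the p-values were adjusted for multiple comparisons), a dependency-free sketch looks like this; the Bonferroni option and the toy data are illustrative only.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def dunn_pair(groups, i, j, n_comparisons=1):
    """Mean-rank difference, z, and two-sided p for groups i vs j (Dunn's method).

    n_comparisons > 1 applies a Bonferroni adjustment (illustrative only).
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)
    # Variance of a mean-rank difference, with the standard tie correction
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (12 * (n - 1))
    var = (n * (n + 1) / 12 - tie_term) * (1 / len(groups[i]) + 1 / len(groups[j]))
    diff = mean_ranks[i] - mean_ranks[j]
    z = diff / math.sqrt(var)
    p = min(1.0, n_comparisons * math.erfc(abs(z) / math.sqrt(2)))
    return diff, z, p


# Toy discrepancy scores for three hypothetical catalyst groups
no_relapse = [0.1, -0.2, 0.3, 0.0, 0.2]
clinician = [-0.4, 0.1, -0.6]
adjudicated = [-1.2, -0.9, -1.5]
groups = [no_relapse, clinician, adjudicated]
for a, b in [(2, 0), (2, 1), (1, 0)]:
    print(a, b, dunn_pair(groups, a, b, n_comparisons=3))
```

A negative mean-rank difference means the first group tends to have lower (more-negative) discrepancy scores, which is the direction reported above for the Adjudicated-Relapse group.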

Fig. 5

Discrepancy Score Distributions by Relapse Group. The distributions of discrepancy scores are different by relapse group, with the largest and more-negative discrepancies found in the Adjudicated-Relapse as compared to the Clinician-Assessed and No-Relapse Groups. One low-scoring outlier was not shown in the graph but was retained in calculations

Step 2: Relapse-group differences in patterns of emphasis

Table 5 shows results of random-effects models assessing differences in patterns of emphasis among the trial participants. There were significant two-way interactions between Adjudicated Relapse and PCS, and between Clinician-Assessed Relapse and MCS. Thus, compared with No-Relapse patients, Adjudicated-Relapse patients placed greater emphasis on PCS in their EQ-5D VAS scores, and Clinician-Assessed-Relapse patients placed greater emphasis on MCS.

Step 3: Relapse-group differences in changes over time in patterns of emphasis

There were significant three-way interactions for Relapse-by-time-by-PCS and Relapse-by-time-by-MCS (b = −0.01 in both cases; p = 0.02 and 0.01, respectively), after adjusting for main effects and two-way interactions (Table 5). These results suggest that although PCS and MCS are more important in accounting for EQ-5D VAS scores among people who had an adjudicated relapse than among people with no relapse, this difference attenuates over time. Residuals, overall and especially for the No-Relapse group, were non-normal (p < 0.0005 and 0.0005, respectively) owing to skewness (−0.48 and −0.52, respectively). For the Adjudicated- and Clinician-Assessed-Relapse groups, normality of the residuals was not rejected (p = 0.05 and 0.50, respectively).
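One way to read the three-way interaction is through the slope of the EQ-5D VAS on PCS that the model implies for each group. As a sketch, writing Adj for an Adjudicated-Relapse indicator and t for Weeks in Study (the coefficient labels are ours, not those of Table 5, and the analogous Clinician-Assessed terms are omitted):

```latex
\frac{\partial\,\widehat{\mathrm{VAS}}}{\partial\,\mathrm{PCS}}
  = \beta_{\mathrm{PCS}} + \beta_{\mathrm{PCS}\times t}\,t
  + \mathrm{Adj}\left(\beta_{\mathrm{PCS}\times\mathrm{Adj}}
  + \beta_{\mathrm{PCS}\times\mathrm{Adj}\times t}\,t\right)
```

The Adjudicated-versus-No-Relapse difference in the PCS slope is therefore the bracketed term. With the reported three-way coefficient of b = −0.01 and t measured in weeks, that difference would shrink by about 0.01 × 52 = 0.52 EQ-5D VAS points per PCS point over a year of follow-up, which is the attenuation described above.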

Table 5 Random effects models testing relapse-related differences in emphasis by relapse group and over time

Step 4: Relapse-group differences in conceptualization of QOL

Table 6 shows results of the series of random-effects models clarifying how each domain’s relationship with relapse status varied across models after adjusting for all the other domains. These models suggested that both Clinician-Assessed and Adjudicated Relapse were associated with substantially worse-than-expected EQ-5D VAS scores after adjusting for the eight SF-36™ domain scores. In other words, unlike the SF-36™ domain scores, EQ-5D VAS scores uniquely discriminated Relapse-Group deficits. By contrast, people who had a Clinician-Assessed Relapse had slightly better-than-expected Social-Function scores; that is, Social-Function scores uniquely revealed a strength of this group. None of the other seven SF-36™ domain scores had statistically significant relationships with relapse status after adjusting for the other QOL domain scores.

Table 6 Isolating QOL associations with relapse status

Discussion

This secondary analysis of clinical trial data revealed that not receiving active treatment and, more specifically, experiencing a relapse changed how people thought about QOL (see summary in Table 7). The implications of such changes for interpreting treatment effects may be substantial. Our results suggest that the QOL impacts of placebo and relapse were underestimated by the usual analyses, and thus that the benefit of Eculizumab is likely even greater than documented in the pivotal clinical trial [33] and extends to subjective outcomes.

Table 7 Summary of response-shift analyses and interpretation

Of note, the whole study sample started the trial with close-to-normal MCS scores, despite decidedly low PCS and EQ-5D VAS scores. Thus, despite having dealt with the vicissitudes of NMOSD for an average of 4 years, participants had maintained a relatively normal level of mental-health functioning before the trial. They also maintained this stability over the course of the trial, regardless of treatment arm. This paradox is consistent with response-shift theory, which posits that changes in internal standards, values, and conceptualizations of health allow individuals to maintain QOL homeostasis in the face of changing health circumstances [2, 17].

Our findings likely reflect the ‘shadow’ of response shift, inferred from the behavior of the examined interactions and the unique variance explained rather than characterized directly. People on placebo and/or people who had a relapse appear to think differently about health because of their experiences. What matters after relapse appears to be captured less and less by the generic functional-health indicators of the SF-36™, so additional constructs would need to be assessed to delineate exactly what ‘health’ means after relapse. For example, ‘health’ may have more to do with purpose in life or meaningful social connections, concepts measured by the Ryff Psychological Well-Being scale [49, 50]. Including measures of cognitive appraisal [51, 52] would also allow more direct characterization of response-shift effects. Nevertheless, in the absence of such measures, the EQ-5D VAS proved clearly valuable in this study.

The present study is a response-shift investigation of clinical trial data using accepted analytic methods. Prompted by unexpectedly non-significant treatment differences in the more subjective domains related to mental health, we pursued a series of analyses to explicate these patterns. These analytic steps begin by testing an omnibus response-shift hypothesis that examines the distribution of discrepancy scores by catalyst group. If this test does not support response shift, no further analyses are done. In our companion paper [53], we provide a method that builds on these findings to estimate how response shift affects measured outcomes.

It should be noted that the residual-modeling approach specified in our analyses is distinct from Mayo’s 2008 method [7]. While Mayo’s 2008 method also works with residuals, the Rapkin and Schwartz method [17] explicitly computes a ‘standard model’ that includes all available antecedents, and saves the residuals (i.e., discrepancies), which are then used as the dependent variable in hypothesis-driven analyses. Once the response-shift omnibus hypothesis is supported (i.e., the aforementioned discrepancies differ by catalyst group), the method presented in this article then implements a series of random-effects models to test response-shift effects operationalized in ways similar to the Oort SEM method. If measures of appraisal had been collected in the trial data, the Rapkin and Schwartz method would also examine main effects and interactions of appraisal and change in appraisal in conjunction with catalyst (i.e., treatment arm or relapse status) main effects and interactions. In contrast, the Mayo method creates residuals based on a short list of antecedents (i.e., disease severity, age, sex, and comorbidity), and then creates residual-trajectory scores which are then modeled using latent class analysis. Both methods utilize residuals to test response-shift hypotheses in interesting and informative ways, but the method used in our work is correctly identified as the Rapkin and Schwartz (2004) method [17].

The study has a number of strengths, including high-quality data on relapse, the inclusion of subjective and objective indicators, and longitudinal follow-up with low attrition. Its limitations must, however, be acknowledged. Our results likely underestimate response-shift effects for several reasons. First, relatively few patients ultimately had an adjudicated relapse, affording statistical power to detect only large effects [54]. Accordingly, some models may be over-fitted. Reaching significance despite low power arguably means more than doing so with high power. The small sample also prevented the application of well-codified SEM-based response-shift analyses, which would have enabled us to work with collinear domain scores (using residual correlations) and to model moderation and mediation effects more robustly. The residuals from the two random-effects models were also not always normally distributed, which violates a random-effects model assumption [48, 55]; random-effects models appear, however, to be robust to such violations [56, 57]. Further, the study did not include measures of certain relevant constructs, such as well-being, or of the cognitive processes underlying patient self-report. Measures of QOL appraisal processes [51, 52] would enable a more narrative and nuanced description of how the relapse groups differed in their frames of reference, standards of comparison, experience sampling, and patterns of emphasis [58, 59]. Future research might include such cognitive-appraisal and well-being scales [49, 50] in prospective clinical trials of new treatments to ensure that the patients’ experience is captured over the course of the trial. Finally, most data from those who ultimately had a relapse were collected before that relapse, so the study design afforded little opportunity to detect Relapse-Group differences. Despite these odds, we found such differences, perhaps suggesting that relapse patients experience sub-clinical, early warning signs of a relapse. Further investigation into these early warning signs might enable interventions to delay the ‘tipping point’ to full relapse [60, 61].

In summary, this study of the Eculizumab clinical trial suggests response-shift effects by treatment arm and relapse status. Using a series of analytic steps aimed at detecting the ‘shadow’ or reflection of response shift, we found that among Placebo patients, and as relapse criteria became more specific and rigorous, commonly accepted clinical and demographic indicators explained patients’ QOL ratings less well. The idea of ‘health’ among Placebo patients and among those who eventually relapsed reflected different patterns of emphasis, and for relapse patients these emphases changed over time relative to the No-Relapse Group, even though these differences were “watered down” by the inclusion of pre-relapse data. We conclude that other aspects of QOL, not well captured by the SF-36™ and/or EQ-5D VAS, become more important when one experiences a relapse. This ‘shadow’ of response shift may take a more definite shape when more relevant constructs are included in a study well powered to explicate the relapse experience.