The effects of a treatment or an intervention on a count outcome are often of interest in applied research. When controlling for additional covariates, a negative binomial regression model is usually applied to estimate conditional expectations of the count outcome. The difference in conditional expectations under treatment and under control is then defined as the (conditional) treatment effect. While traditionally aggregates of these conditional treatment effects (e.g., average treatment effects) are computed by averaging over the empirical distribution, a recently proposed moment-based approach allows for computing aggregate effects as a function of distribution parameters. The moment-based approach makes it possible to control for (latent) multivariate normally distributed covariates and provides more reliable inferences under certain conditions. In this paper we propose three different ways to account for non-normally distributed continuous covariates in this approach: an alternative, known non-normal distribution; a plausible factorization of the joint distribution; and an approximation using finite Gaussian mixtures. A saturated model is used for categorical covariates, making a distributional assumption obsolete. We further extend the moment-based approach to allow for multiple treatment conditions and the computation of conditional effects for categorical covariates. An illustrative example highlighting the key features of our extension is provided.

1 Introduction

The evaluation of treatment effects on count outcomes is quite common in the social and health sciences. Often, covariates are included in the analysis using a negative binomial regression model (Garrett et al., 2018; Hittner, Owens, & Swickert, 2016; Jobe et al., 2001; Mazerolle et al., 2019; Nusser & Weinert, 2017; Schaumberg & Flynn, 2017; Sridharan, Shoda, Heffner, & Bricker, 2019), that is, the conditional expectation for the count outcome is logarithmically linked to the treatment variable and the covariates. The difference in conditional expectations under treatment and under control is then defined as the (conditional) treatment effect. Traditionally, aggregates of conditional treatment effects (e.g., the average treatment effect) are computed by averaging over the empirical (joint) distribution of the covariates (Greene, 2007).

Recently, a moment-based approach was proposed by Kiefer and Mayer (2019, 2020) enabling aggregated effects to be computed using parameters of the (joint) distribution of the covariates (e.g., means, variances, and covariances). Instead of computing conditional effects given the observed covariate values and averaging over their empirical distribution, the conditional effects are integrated over all possible values of the covariates weighted by their likelihood given by the (assumed) joint density. While this procedure would usually have to be carried out by numerically solving an improper integral (e.g., an integral over conditional effects weighted by a multivariate normal density), the improper integral is replaced with a moment-generating function in the moment-based approach. Thus, as most moment-generating functions have closed-form solutions, the moment-based approach allows for a fast and analytical computation of the aggregated treatment effects.

The moment-based approach has two major advantages over the traditional empirical distribution approach. First, the covariate side of the effect calculation is stochastic instead of fixed, which enables more accurate statistical inferences about conditional and average treatment effects. In contrast, the traditional approach treats the observed covariate values as fixed by design (i.e., not varying between samples), which can lead to an underestimation of standard errors for the aggregated effects. Second, it allows common factor models for the covariates, meaning that not directly observed (latent) covariates can be accounted for as well. As the empirical distribution of latent variables is not observed, the traditional approach cannot deal with latent variables and would require fallible substitutes (i.e., sum scores, factor scores).

However, the extended moment-based approach as suggested by Kiefer and Mayer (2020) has an Achilles' heel: covariates are assumed to be multivariate normally distributed within the treatment groups. In practical settings, this assumption is often violated, as observed variables in real data sets in the social and health sciences frequently deviate from the normal distribution (Bono, Blanca, Arnau, & Gómez-Benito, 2017; Micceri, 1989). For example, Blanca, Arnau, Lopez-Montiel, Bono, and Bendayan (2013) examined 693 distributions from real psychological data and found that 74.4% presented slight or moderate deviations from the normal distribution, while 20% exhibited more extreme deviation. In a simulation study, Lei and Lomax (2005) investigated parameter and standard error bias in structural equation models when non-normal variables are introduced into normal distribution-based maximum likelihood estimation. They conclude that parameter estimation is sensitive to non-normal variables, but the bias introduced by slight non-normality is moderate. However, severe non-normality leads to substantial bias in loadings and structural parameters. Similarly, Kiefer and Mayer (2019) found in their examination of the univariate moment-based approach that missspecification of the covariates' distribution can introduce bias into the average effect estimation.

Thus, in this paper, we propose several extensions of the moment-based approach. These extensions aim to make the approach more flexible with regard to its distributional assumptions as well as to provide applied researchers with a more nuanced effect analysis. First, to properly account for non-normal covariates, we suggest differentiating between categorical covariates (e.g., gender, ethnicity) and non-normal continuous (or metric) covariates (e.g., pre-test count variables). We then describe four different possibilities to account for non-normal joint distributions. Second, we define and compute average and conditional treatment effects of interest, for example, the conditional effect given a treatment condition or given a gender. These kinds of finer-grained effects (as opposed to the average treatment effect) enable researchers to examine treatment efficacy under different conditions in a more nuanced way. Third, we consider the case in which multiple treatment conditions are to be examined, that is, treatment effects can be modelled and examined for more than one treatment condition (in comparison to a control group).

This paper is structured as follows. First, we provide general definitions of treatment effects on count outcomes based on multiple categorical and continuous covariates. Specifically, we introduce the average treatment effect as well as several conditional treatment effects. Second, we derive how the moment-based approach can be utilized to compute these effects, differentiating between a solution for categorical covariates and three possible approaches for non-normal continuous covariates. Third, we briefly present the negative binomial multi-group structural equation model as the statistical framework for simultaneously estimating the parameters of the negative binomial regression and the covariates' distribution. Finally, the moment-based approach is applied to a real data sample to highlight the key features of our proposed extensions.

2 Definition and terminology of treatment effects

We start by introducing definitions of average and conditional treatment effects on a count outcome variable Y. Let the discrete treatment variable X have p+1 levels denoted with values x = 0,1,…, p. We will take X = 0 as reference or control group, and thus t = 1,…,p are the values of the remaining treatment conditions.1 Furthermore, we will consider a single (unfolded) categorical variable K with j+1 levels and values k = 0,1,…, j, and a vector of (latent) variables $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0007$ with z = 1,…, q denoting the (z+1) th element of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0008$ . We use $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0009$ to denote values of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0010$ . For this set of variables, the following parameterization of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0011$ always holds:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0012$ (1)

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0013$ (2)

In equation (1) the conditional expectation of Y is decomposed into an intercept function g₀ and a conditional effect function g_t for treatment condition t (cf. Mayer, Dietzfelbinger, Rosseel, & Steyer, 2016; Steyer & Nagel, 2017). As Y is a count variable, the conditional expectation $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0014$ can always be presented as in equation (2). Note that we use a regression with a logarithmic link function for each combination of X = x and K = k, that is, the function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0015$ is group-specific:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0016$

we use group-specific functions $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0017$ , because the effect computations later in this paper will be based on a multi-group model. The term $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0018$ denotes a partial conditional expectation. For its definition and further details, see Steyer and Nagel (2017, Section 14.4).

2.1 Conditional effects function

The intercept function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0019$ denotes the expected values for the control group X = 0,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0020$

and the p effect functions $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0021$ represent the expected increase/decrease in Y due to treatment t, that is, the difference between the conditional expectation of Y under treatment t compared to the control group:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0022$

The effect function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0023$ gives a conditional treatment effect for any value of the covariates K and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0024$ . The conditional expectations $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0025$ reflect the expected outcome given a treatment condition t. For example, $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0026$ denotes the expected count of Y given K = 1 for a person who receives training X = 2 and has a pre-test score of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0027$ .

In practical settings, a parameterization is needed for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0028$ , which we will discuss at the end of this section. For a causal interpretation of the average and conditional effects, it is crucial that the effect functions $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0029$ be causally unbiased. Steyer, Mayer, and Fiege (2014) provide an overview of causality conditions ensuring unbiasedness of the effect function, one of which is the independence of the treatment variable from all potential confounders (e.g., created by randomized treatment assignment as in the ACTIVE study; see Section 6).

2.2 Average treatment effect

The average treatment effect of treatment X = t compared to control group X = 0 is defined as the unconditional expectation of the effect function,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0030$

where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0031$ denotes the density of the joint distribution of K and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0032$ , and therefore reflects the average over the conditional effects given all values of the covariates. The average effect AE_t₀ gives the expected effect for a randomly sampled person assigned to treatment condition X = t compared to the control group X = 0.

2.3 Conditional effects given X and K

In addition to the average effect, finer-grained aggregates of the effect function also exist. For example, the treatment effect for female participants can be examined. This is represented as a conditional effect given a value k of the categorical covariate K and averages over the (K = k)-conditional distribution of the continuous covariates,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0033$

For example, if K represents a participant's gender, then $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0034$ provides the aggregated conditional effect of treatment X = t for all persons with gender K = k.

In a similar vein, the conditional effect given a treatment condition is defined by

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0035$

These effects are of interest when the (X = x)-conditional covariate distribution $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0036$ differs among treatment groups, as is the case, for example, in observational studies (i.e., without randomization) or in so-called broken experiments (Sagarin et al., 2014). For an overview, see Geneletti and Dawid (2011). In properly randomized trials, however, we would expect the conditional effects given a treatment condition to be equivalent to the corresponding average treatment effect.

Finally, the conditional effects above can also be combined to give the conditional effect given a value k of K and a treatment condition x of X:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0037$

Again, for a randomized controlled trial, we would expect these effects to be equivalent to the corresponding treatment effects given a value of the categorical covariate.

2.4 Regression parameterization and factorization of the covariates' distribution

Up to this point, we have presented non-parametrical definitions of average and conditional effects on count outcomes. In order to derive empirically estimable quantities, we need to introduce a parameterization for the $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0038$ functions and an applicable factorization of the mixed joint density of the categorical covariate K and the continuous covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0039$ .

We choose a linear parameterization of the function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0040$ ,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0041$ (3)

where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0042$ is a real-valued vector of regression coefficients with length q+1. This corresponds to the parameterization widely used in count regression models, for example, Poisson or negative binomial regression models. Hence, a count regression is specified for each combination of (X = x, K = k).

As can be seen in the previous subsection, the various average and conditional treatment effects require unconditional, but also (X = x)-, (K = k)-, and (X = x, K = k)-conditional distributions of the covariates K and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0043$ . Hence, we provide a factorization of the mixed joint distribution $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0044$ allowing us to easily derive all of the aforementioned unconditional and conditional distributions:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0045$ (4)

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0046$ (5)

In equation (4), the joint distribution of the covariates K and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0047$ is defined as the marginal distribution of the joint distribution of the treatment variable X, the categorical covariate K, and the continuous covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0048$ . In equation (5), the joint distribution is decomposed into a categorical group part P(X = x, K = k) and a group-conditional density $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0049$ of the continuous variables. This factorization serves two purposes. First, it allows us to identify the conditional densities required for the computation of the aforementioned average and conditional effects, namely $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0050$ , $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0051$ , $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0052$ , and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0053$ . Second, we can use a multi-group approach, namely a multi-group structural equation model, to estimate parameters both of the negative binomial regression and of the distribution of the covariates.

The choice of a factorization is without loss of generality for our approach, because an alternative factorization can always be transformed into our factorization. We provide more information on this aspect in the discussion and in Appendix 1.

Following the linear parameterization of the function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0054$ and factorization of the joint distribution $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0055$ , the average treatment effect, for example, can be computed as

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0056$ (6)

Note that the parameterization and factorization are chosen with regard to the statistical framework for parameter estimation we will discuss later on, that is, negative binomial regression and multi-group structural equation models. As stated before, for a causal interpretation of the treatment effects, the effect function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0057$ must be causally unbiased. Causal unbiasedness goes beyond unbiasedness of the estimated parameters and means that there are no unobserved confounders of the treatment effects. It can be achieved, for example, by an (unconditional or conditional) randomized assignment of persons to the treatment groups or by controlling for all confounders. Tools developed in the causal inference literature such as propensity scores (Rosenbaum & Rubin, 1983) can be helpful to meet the ‘no unobserved confounders’ condition. For an overview of conditions under which causal unbiasedness is achieved, see Steyer et al. (2014). Thus, applied researchers should carefully evaluate whether the parameterization and factorization described here fit their hypotheses and assumptions.

3 Moment-based approach assuming multivariate normality

In the previous section we introduced a distinction between categorical and continuous covariates. Such a distinction has been previously proposed for treatment effect computation, for example, in the EffectLiteR approach by Mayer et al. (2016). A saturated model is used, that is, the probability of each occurring value of the categorical covariate is examined. Thus, no further distributional assumption for the categorical covariate is required.

However, including continuous covariates requires specification of the (X = x, K = k)-conditional density $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0058$ of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0059$ . In their extension of the moment-based approach, Kiefer and Mayer (2020) suggest assuming (X = x)-conditional multivariate normality. Adapting this notion to our distinction between categorical covariates K and continuous covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0060$ and assuming (X = x, K = k)-conditional normality, that is, $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0061$ , the integration part of the average treatment effect in equation (6) can be substituted with moment-generating functions:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0062$

Sometimes it might suffice to account for the case of categorical covariates in addition to normally distributed continuous covariates to obtain unbiased parameter and effect estimates.

4 Accounting for non-normal continuous covariates

In the following subsections we present three different ways to incorporate non-normal continuous covariates into the moment-based approach. All three ways follow the basic idea of the moment-based approach, that is, (at least partly) substituting improper integrals with moment-generating functions. The suggested solutions depend on the level of information available for the non-normal continuous variables: (1) an alternative, non-normal joint distribution is known; (2) a plausible factorization of the joint distribution can be constructed; and (3) an approximation of the covariates' joint distribution is required. Note that we do not suggest that one of these approaches is generally preferable to the others. Rather, which approach to use depends on the concrete data situation one is confronted with.

All of the aforementioned approaches have in common that we use the joint density of the covariates in a maximum likelihood framework to estimate parameters and treatment effects. As these models contain negative binomial regressions, multi-group parts, measurement models and parameters of the non-normal distributions, they can involve many parameters. Thus, large sample sizes may be required for a solution to converge (Jackson, 2003). For smaller samples or cases in which none of the three proposed approaches is feasible, Bayesian modelling and estimation can be an alternative. We discuss Bayesian alternatives in the Discussion section.

4.1 Case 1: Known non-normal joint distribution

The first and probably simplest case is a known alternative non-normal (X = x, K = k)-conditional joint distribution for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0063$ . In this case, we can substitute the density and moment-generating functions for effect computation with the ones from the alternative distribution. For example, if we want to account for a slight skew in our continuous variables, the multivariate skew-normal distribution (Azzalini & Valle, 1996) is a viable alternative to the normal distribution. Admittedly, the case of knowing a number of suitable non-normal multivariate distributions and their moment-generating function might not be very common, especially for applied researchers.

However, another important scenario for known alternative distributions is (X = x, K = k)-conditional independence of the covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0064$ because then the product of univariate distributions corresponds to the marginal distribution. While the (X = x, K = k)-conditional independence of the covariates might be a strong assumption, it is helpful to see that the joint moment-generating function can be decomposed into the product of the univariate moment-generating functions in this case. For example, if we consider two continuous covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0065$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0066$ with (X = x, K = k)-conditional independence $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0067$ , the corresponding density can be decomposed $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0068$ , and thus the moment-generating function can be written as the product of univariate moment-generating functions,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0069$

where t₁,t₂ are the evaluation points of the moment-generating functions. See Kiefer and Mayer (2019) for an overview of univariate moment-generating functions and their performance within the moment-based approach.

4.2 Case 2: Factorization of joint distribution

Sometimes, it might be impossible to find a suitable joint distribution, and the assumption of conditional independence might be too strong. A notable example of this case was chosen for our illustrative example: a continuous latent variable (i.e., depression) and a discrete, yet non-categorical count variable (i.e., baseline count of correctly answered items). To our knowledge, no joint distributions for count and continuous variables have been proposed to date. However, when examining count outcome variables, researchers often wish to account for the respective baseline count as well.

We suggest a practical workaround for these cases: finding a plausible factorization of the joint distribution. For example, this might be decomposing the joint distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0070$ into a marginal and a conditional distribution, that is,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0071$

where we specify the marginal distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0072$ with index m_i identifying a subset of covariates, and the conditional distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0073$ given $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0074$ with index c_i identifying the remaining covariates in $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0075$ . In general, this factorization makes it possible to simplify the integration part of the average effect in equation (6) as

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0076$

The factorization approach is challenging to integrate into the moment-based approach, because it does not yield a comprehensive moment-generating function for the factorized joint distribution. Consequently, a combination of moment-generating functions and numerical integration is required for effect estimation. In our illustrative example, we will further delineate these computations for the factorization approach.

In practical terms, the construction of a conditional distribution for some covariates can be achieved using a regression approach, that is, a parameterization of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0077$ . In our illustrative example, this relation will be estimated using a regression with a logarithmic link function. From a causal perspective, such a regression can be problematic, because baseline variables do not necessarily have a temporal order, and we do not suggest that one baseline variable causes another baseline variable. Thus, the factorization approach is a technical workaround and its regression parameters are not necessarily of interest.

4.3 Case 3: Approximation with finite Gaussian mixtures

Finally, when no alternative joint distribution or plausible factorization can be found, it is possible to approximate the joint distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0078$ . One possible approach is to approximate the non-normal joint distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0079$ by using a finite mixture of M multivariate normal distributions, that is, $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0080$ where w_m are weights and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0081$ . In practice, this approximation of the distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0082$ can be estimated using a finite mixture or latent class approach (McLachlan & Peel, 2000). Researchers can control the degree of approximation by specifying the number M of latent classes, where only the expected values $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0083$ and the variance $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0084$ are allowed to vary among latent classes. This approach has previously been applied in the computation of average and conditional effects in a nonlinear regression setting by Mayer, Umbach, Flunger, and Kelava (2017) and can be estimated using nonlinear structural equation mixture models (Kelava & Brandt, 2014; Kelava, Nagengast, & Brandt, 2014). However, it is notable that an increasing number of latent classes M can lead to convergence problems in maximum likelihood estimation.

With respect to effect estimation, the finite-mixture approach yields a comprehensive (X = x, K = k)-conditional moment-generating function for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0085$ , as we can use a weighted sum of normal moment-generating functions for the integration part from equation (6),

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0086$

which is similar to the original extended moment-based approach by Kiefer and Mayer (2020), with the exception of the summation over the latent classes.

5 Negative binomial multi-group structural equation model

In this section, we introduce the negative binomial multi-group structural equation model (NB-MG-SEM) as a statistical framework for parameter and effect estimation. The NB-MG-SEM provides maximum likelihood estimation. The model involves the count outcome variable Y, the categorical treatment variable X with p levels, the (unfolded) categorical covariate K with j levels, and the vector of continuous covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0087$ containing latent covariates measured by observed variables $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0088$ . The NB-MG-SEM consists of (at least) the following parts:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0089$ (7)

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0090$ (8)

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0091$ (9)

where v is a vector of measurement intercepts; $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0092$ is a matrix of loadings; ɛ is a vector of measurement error variables with mean zero and covariance matrix $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0093$ ; µ_Y is the conditional expectation of the count outcome; α_xk is a vector of regression coefficients; and κ_xk is a parameter for the log-transformed expected group frequency n_xk of group (X = x, K = k). The probability for a group (X = x, K = k) can be computed with $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0094$

The NB-MG-SEM presented in equations 7-9 is a least common denominator with regard to the cases of non-normally distributed covariates identified in the previous section and can be extended with respect to the given case at hand. In our illustrative example, we will add another regression with a logarithmic link function specifying the relation between two covariates. We provide detailed information on the maximum likelihood estimation of the NB-MG-SEM for our illustrative example in Appendix 2. For brevity, we present the four main parts and assumptions of the likelihood functions constituting the joint distribution of the variables considered:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0095$

The joint distribution of the covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0096$ depends on which of the three aforementioned cases is applied. In our illustrative example, we will factorize the distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0097$ into a marginal and a conditional distribution (i.e., case 2 scenario), where the conditional distribution depends on a $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0098$ and the parameters of the marginal distribution are implied by the measurement model. Illustration of the other cases is provided in the online Appendix S1.

6 Illustrative example

To illustrate the use of our extension to the moment-based approach, we use data from the Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE) study (Ball et al., 2002; Jobe et al., 2001; Tennstedt et al., 2005). The ACTIVE study is a large randomized controlled trial designed to examine the effectiveness of cognitive interventions among older adults. We will investigate the effects of these interventions while controlling for the (non-normal) count pre-test score, (latent) baseline depression, and participants' gender. As count outcome Y, we investigate post-test performance on an inductive reasoning assessment (i.e., letter sets). Letter sets evaluate how well an individual can recognize a pattern among several sets of letters. Thus, Y reflects the count of correctly answered items. The ACTIVE data subset analysed is publicly available; a link is provided in the online Appendix S1.

In this paper we do not present a comprehensive analysis of the ACTIVE study. Instead, the primary goal of this paper is to illustrate how non-normally distributed covariates can be integrated into the moment-based approach when estimating average and conditional treatment effects. The NB-MG-SEM and treatment effects were estimated using R (R Core Team, 2018) and the CountEffects package. The CountEffects package is an implementation of the moment-based approach assuming (X = x, K = k)-conditionally multivariate normally distributed covariates, which we extended with functions to estimate all three proposed approaches to deal with non-normal covariates for our illustrative example. In the online Appendix S1, we provide information on how to install CountEffects and how to estimate treatment effects based on the multivariate normal assumption and all three cases. Our findings indicate that the three non-normal cases yield similar results for the illustrative example, while the normal case yields differing estimates for some effects. However, in this section we will only describe the estimation and results of the case 2 scenario. We used listwise deletion in our analyses in order to keep the supplementary R code accessible for interested readers. However, we also ran a full-information maximum likelihood estimation for our model (using Mplus; Muthén & Muthén, 1998–2015) which yielded similar results.

6.1 Sample

The total sample size for our analysis was N = 2,363. The participants were randomly assigned to one of four conditions: a no-contact control group (X = 0, N = 592), a memory training (X = 1, N = 589), a reasoning training (X = 2, N = 590), and a speed of processing training condition (X = 3, N = 606). Each treatment condition consisted of a ten-session training intervention. In a baseline assessment, the participants took several cognitive functioning tests and completed a self-report questionnaire of psychological measures. The post-test assessment was conducted in the first 10 days after the last training session. The participants were predominantly female (75.8%, K = 1).

6.2 Measures

6.2.1 Depression

Depressive symptoms were assessed using the 12-item version of the Center for Epidemiological Studies Depression Scale (CESD-12; Radloff, 1977). Participants rated the frequency of several depressive symptoms (e.g., feeling sad) during the last week on a four-point scale from 0 = never to 3 = 5–7 days. We modelled baseline depression as latent variable $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0099$ , measured by three parcels (i.e., sum scores of four items each; Z₁₁, Z₁₂, Z₁₃). We used random item parcels for simplicity in our illustrative example. For warnings about the use of parcels, see Marsh, Lüdtke, Nagengast, Morin, and von Davier (2013) and Sterba and Rights (2016).

6.2.2 Letter sets

The count of correctly answered items on a letter sets task was used as outcome Y (i.e., post-test score) and covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0100$ (i.e., pre-test score) in our analysis. Letter sets evaluate how well an individual can find rules or patterns that make different sets of letters alike in some way. Each problem had five sets of letters with four letters in each set. Participants were given 15 different problems and had 7 min to complete the task.

6.3 Model

Our effect analysis was based on a multi-group structural equation model with a group-invariant linear measurement model and a structural model with two linear predictors, logarithmic link functions, and a negative binomial distribution with overdispersion parameter ϕ fixed to zero (i.e., a Poisson distribution) for the respective dependent variable.2 The measurement model was chosen to be τ-congeneric, as we had no a priori assumption about the true scores (e.g., τ-equivalence). The first indicator was chosen as reference indicator. The measurement model is expressed as:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0101$

The latent variable $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0102$ was assumed to have an (X = x, K = k)-conditional normal distribution

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0103$

Our structural model with two linear predictors, logarithmic link functions, and Poisson distribution for the respective dependent variable was

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0104$

Note that the second regression $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0105$ is specified for technical reasons, that is, to model a factorized joint distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0106$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0107$ . We do not assume a causal relation, where the values of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0108$ are predetermined by $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0109$ . Hence, it would also be possible to factorize the joint distribution with $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0110$ . We chose the first factorization, because it yields some facilitative properties for the numerical integration procedures required in effect estimation (i.e., Gauss–Hermite quadrature; for more information, see Appendix 3).

Finally, the model for group sizes n_xk was expressed as

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0111$

The whole model is displayed in Figure 1. Note that regressions with a logarithmic link function are indicated by curved arrows, in contrast to straight lines for linear regressions.

Details are in the caption following the image — **Figure 1**
Open in figure viewer PowerPoint

Path diagram for our illustrative example depicting the group invariant measurement models for latent baseline depression $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0112$ and the baseline count of correctly answered items $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0113$ , the structural model specifying the regressions with a logarithmic link function (indicated by curved arrows) of the post-test count of correctly answered items Y on the baseline variables $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0114$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0115$ in each of the eight groups.

**Figure 1**
Open in figure viewer PowerPoint

Path diagram for our illustrative example depicting the group invariant measurement models for latent baseline depression $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0112$ and the baseline count of correctly answered items $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0113$ , the structural model specifying the regressions with a logarithmic link function (indicated by curved arrows) of the post-test count of correctly answered items Y on the baseline variables $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0114$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0115$ in each of the eight groups.

6.4 Average and conditional treatment effects

With regard to our three proposed scenarios of non-normal continuous covariates, the specification of a baseline count variable and a continuous variable falls under case 2: as no suitable joint distribution is available, a factorization of the joint distribution into a known marginal and a known conditional distribution can be specified. In the previous section, we stated that we assumed latent baseline depression is (X = x, K = k)-conditionally marginally normally distributed $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0116$ and is related to the baseline count of correctly answered items via a Poisson regression with $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0117$ with $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0118$ . In this scenario, the average treatment effect can be computed as

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0119$

where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0120$ and a_h, w_h are Gauss–Hermite quadrature points and weights. A detailed derivation of this formula is provided in Appendix 3. The moment-generating function for the Poisson distribution is:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0121$

The conditional effects are then computed analogously to our definitions above. For example, the conditional effect given a value of the gender covariate K is computed as

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0122$

where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0123$ are integration points at which the function is evaluated and a_h,w_h are Gauss–Hermite quadrature points and weights.

Standard errors for the estimated treatment effects can be derived using the delta method (cf. Boos & Stefanski, 2013, p. 237).3 In our analysis, we computed symmetric confidence intervals based on the standard errors, because the estimated treatment effects are asymptotically normally distributed and our sample was comparatively large. However, in smaller samples confidence intervals based on, for example, a bootstrap approach are recommended.

6.5 Results

It is currently not possible to evaluate the complete model fit using a χ² test or other fit statistics, as these are based on the χ² value of the model implied covariance matrix for the observed variables. For models with a logarithmic link function, the implied covariance matrix is not an appropriate description of the dependencies among the variables due to the nonlinearity.

6.5.1 Model parameters

The maximum likelihood estimates for the model parameters are given in Table 1. We will not discuss them in detail here, but rather present some notable findings to illustrate the interpretation of these parameters.

Table 1. Maximum likelihood results for group-specific model parameters

Parameter	Estimate	SE	p	Parameter	Estimate	SE	p
X = 0, K = 0				X = 0, K = 0
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0124$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0125$
α₀₀₀	1.324	0.103	<.001	α₀₁₀	1.218	0.061	<.001
α₀₀₁	−0.048	0.041	.241	α₀₁₁	-0.034	0.018	.061
α₀₀₂	0.099	0.011	<.001	α₀₁₂	0.103	0.007	<.001
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0126$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0127$
γ₀₀₀	1.970	0.059	<.001	γ₀₁₀	1.904	0.035	<.001
γ₀₀₁	0.148	0.049	.002	γ₀₁₁	-0.109	0.020	<.001
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0128$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0129$
µ₀₀	1.113	0.087	<.001	µ₀₁	1.571	0.072	<.001
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0130$	0.792	0.128	<.001	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0131$	1.629	0.149	<.001
X = 1, K = 0				X = 1, K = 1
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0132$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0133$
α₁₀₀	1.141	0.110	<.001	α₁₁₀	1.256	0.057	<.001
α₁₀₁	0.016	0.031	.602	α₁₁₁	-0.037	0.016	.019
α₁₀₂	0.111	0.013	<.001	α₁₁₂	0.102	0.007	<.001
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0134$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0135$
γ₁₀₀	1.938	0.053	<.001	γ₁₁₀	1.890	0.030	<.001
γ₁₀₁	-0.057	0.034	.094	γ₁₁₁	-0.077	0.017	<.001
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0136$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0137$
µ₁₀	1.296	0.109	<.001	µ₁₁	1.440	0.074	<.001
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0138$	1.298	0.201	<.001	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0139$	1.955	0.176	<.001
X = 2, K = 0				X = 2, K = 1
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0140$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0141$
α₂₀₀	1.335	0.103	<.001	α₂₁₀	1.340	0.053	<.001
α₂₀₁	0.004	0.026	.890	α₂₁₁	-0.013	0.014	.345
α₂₀₂	0.102	0.011	<.001	α₂₁₂	0.099	0.006	<.001
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0142$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0143$
γ₂₀₀	2.002	0.052	<.001	γ₂₁₀	1.785	0.033	<.001
γ₂₀₁	-0.098	0.031	.001	γ₂₁₁	-0.024	0.016	.145
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0144$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0145$
µ₂₀	1.411	0.127	<.001	µ₂₁	1.641	0.074	<.001
α₂₀	1.879	0.303	<.001	α₂₁	1.879	0.171	<.001
X = 3, K = 0				X = 3, K = 1
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0146$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0147$
α₃₀₀	1.274	0.101	<.001	α₃₁₀	1.210	0.057	<.001
α₃₀₁	0.016	0.026	.532	α₃₁₁	-0.009	0.017	.581
α₃₀₂	0.095	0.012	<.001	α₃₁₂	0.107	0.007	<.001
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0148$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0149$
γ₃₀₀	1.936	0.051	<.001	γ₃₁₀	1.823	0.033	<.001
γ₃₀₁	-0.081	0.030	.007	γ₃₁₁	-0.052	0.018	.005
Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0150$				Estimates for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0151$
µ₃₀	1.416	0.126	<.001	µ₃₁	1.526	0.067	<.001
α₃₀	1.861	0.277	<.001	α₃₁	1.515	0.137	<.001

Pre-test depression had no significant effect on the post-test count of correctly answered items in either group (e.g., α₀₀₁ = 0.048, 95% confidence interval (CI) [−0.129, 0.033]), except for female participants receiving the memory training (α₁₁₁ = −0.037, 95% CI [−0.069, 0.006]). Furthermore, pre-test depression was significantly negatively related to the pre-test count of correctly answered items in most groups (e.g., γ₀₁₁ = −0.109, 95% CI [−0.148, 0.070], γ₂₀₁ = −0.098, 95% CI [−0.758, 0.038], and γ₃₀₁ = −0.081, 95% CI [−0.139, 0.022]). For example, for male participants in the reasoning training, each one-unit change in pre-test depression was linked to a 9% decrease in correctly answered items at baseline (i.e., γ₂₀₁, exp(−0.098) = −0.91). In two groups, this relationship was not significant (γ₁₀₁ = −0.057, 95% CI [−0.123, 0.010] and γ₂₁₁ = −0.024, 95% CI [−0.056, 0.008]).

Pre-test count of correctly answered items was a significant positive predictor of post-test count of correctly answered items in all groups. For example, for female participants in the memory group, each additional correctly answered item at baseline was linked with an 11% increase in correctly answered items at post-test (i.e., α₁₁₂, exp(0.102) = 1.11). The intercept coefficients reflect the expected post-test count of correctly answered items for a male person with depression score ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0152$ ) and zero correctly answered items at baseline ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0153$ ). For example, in the reasoning training (i.e., α₂₀₀ = 1.335), the expected count is $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0154$ .

6.5.2 Average effects

An overview of all average and conditional treatment effects estimated for our illustrative example is given in Table 2. Remember that the letter sets test was part of a cognitive assessment measuring reasoning performance. Hence, we would expect the reasoning training in particular to have a significant effect, while the memory and speed of processing training might not necessarily affect reasoning performance. In line with this, the average treatment effect of the memory training ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0155$ , 95% CI [−0.203, 0.372], ES = 0.030) and the average treatment effect of the speed of processing training ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0156$ , 95% CI [−0.052, 0.525], ES = 0.083) on the count of correctly answered items were not significant. The average treatment effect of the reasoning training, however, was significant ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0157$ , 95% CI [0.487, 1.080], ES = 0.275), that is, participants in this condition correctly answered 0.849 items more on average compared to baseline.

Table 2. Estimated average and conditional effects of memory training (X = 1 versus X = 0), reasoning training (X = 2 versus X = 0), and speed of processing training (X = 3 versus X = 0) compared to the control group

Effect	Estimate	SE	ES	Effect	Estimate	SE	ES
Estimated average and conditional effects of memory training
Average effect $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0158$				Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0159$
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0160$	0.084	0.147	0.030	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0161$	−0.238	0.302	−0.084
Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0162$				$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0163$	0.161	0.166	0.056
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0164$	0.053	0.147	0.019	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0165$	−0.130	0.314	−0.046
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0166$	0.094	0.151	0.033	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0167$	0.167	0.171	0.059
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0168$	0.093	0.147	0.033	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0169$	−0.109	0.321	−0.038
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0170$	0.095	0.147	0.034	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0171$	0.157	0.167	0.055
Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0172$				$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0173$	−0.119	0.309	−0.042
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0174$	−0.152	0.305	−0.053	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0175$	0.160	0.167	0.056
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0176$	0.161	0.167	0.057
Estimated average and conditional effects of reasoning training
Average effect $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0177$				Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0178$
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0179$	0.784	0.151	0.275	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0180$	0.598	0.317	0.210
Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0181$				$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0182$	0.814	0.171	0.286
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0183$	0.756	0.152	0.266	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0184$	0.697	0.325	0.245
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0185$	0.786	0.156	0.276	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0186$	0.814	0.178	0.286
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0187$	0.797	0.152	0.280	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0188$	0.715	0.329	0.251
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0189$	0.789	0.151	0.277	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0190$	0.831	0.171	0.292
Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0191$				$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0192$	0.697	0.319	0.245
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0193$	0.674	0.318	0.237	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0194$	0.817	0.171	0.287
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0195$	0.819	0.172	0.288
Estimated average and conditional effects of speed of processing training
Average effect $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0196$				Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0197$
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0198$	0.237	0.147	0.083	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0199$	−0.102	0.308	−0.036
Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0200$				$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0201$	0.321	0.166	0.113
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0202$	0.207	0.147	0.073	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0203$	−0.025	0.320	−0.009
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0204$	0.234	0.152	0.082	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0205$	0.317	0.172	0.111
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0206$	0.254	0.148	0.089	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0207$	−0.015	0.326	−0.005
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0208$	0.250	0.147	0.088	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0209$	0.340	0.167	0.120
Conditional effects $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0210$				$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0211$	0.011	0.314	0.004
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0212$	−0.035	0.310	−0.012	$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0213$	0.322	0.166	0.113
$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0214$	0.325	0.167	0.114

6.5.3 Conditional effects given (K = k)

The conditional treatment effects given a gender are given in detail in Table 2. Here, we examine differential treatment effects depending on the participants' gender.

For the memory training, we found a slightly negative treatment effect for male participants ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0215$ , 95% CI [−0.750, 0.446], ES = −0.053) and a positive treatment effect for female participants K = 1 ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0216$ , 95% CI [−0.167, 0.489], ES = 0.057). While the effect sizes differed in direction and magnitude, both effects were not statistically significant.

The reasoning training had significant effects for both men and women. The conditional effect for female participants ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0217$ , 95% CI [0.482, 1.156], ES = 0.288) was higher than the conditional effect for male participants ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0218$ , 95% CI [0.052, 1.297], ES = 0.237).

While the average treatment effect of the speed of processing training was not significant, we found a slightly non-significant conditional effect for female participants ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0219$ , 95% CI [−0.002, 0.652], ES = 0.114). Conversely, for male participants, the speed of processing training did not yield a significant effect ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0220$ , 95% CI [−0.750, 0.446], ES = −0.053). This differential effect illustrates why examining conditional effects can be crucial when evaluating a treatment.

6.5.4 Conditional effects given (X = x) and (X = x, K = k)

The conditional treatment effects given a treatment condition (and gender) are given in detail in Table 2. As the ACTIVE study was a randomized controlled trial, the (X = x, K = k)-conditional distributions of depression $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0221$ and baseline test scores $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0222$ should not differ across levels of X. Thus, the conditional effects given a treatment condition are expected to be close to the corresponding average treatment effects. The same applies to conditional effects given treatment and gender. We will not discuss all of these effects in detail here, but rather illustrate their interpretation using one example.

For example, the conditional treatment effect of the memory training for the non-treated ( $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0223$ , 95% CI [−0.234, 0.340], ES = 0.019) is similar to the average effect of the memory training. This refers to the expected effect if participants who were assigned to the no-contact control group had instead been assigned to the memory training. These effects being close to the average treatment effect $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0224$ reflects that differences in baseline variables are small across groups. Note that for observational studies or broken experiments, these effects would not be necessarily close to each other due to possible baseline differences.

7 Summary and conclusions

In this paper we presented and illustrated a new method of accounting for non-normal covariates when estimating average and conditional treatment effects for count outcomes. We extended the moment-based approach by Kiefer and Mayer (2019, 2020) in three respects. First, we presented four ways to account for non-normal covariates: (1) for categorical covariates, applying a saturated model; for continuous covariates, we suggested either (2) the use of an alternative, known non-normal joint distribution (e.g., skew-normal distribution), (3) a plausible factorization into marginal and conditional distributions, or (4) approximation via a finite Gaussian mixture distribution. Second, we extended the effect analysis to multiple treatment conditions, making it possible to evaluate the effectiveness of several treatments simultaneously. Third, we introduced conditional effects given a treatment condition and/or values of the categorical covariates into the moment-based approach, allowing for a finer-grained effect analysis. Finally, we provided an illustrative example to show how our extensions of the moment-based approach can be applied to real data. In our example, three cognitive training conditions were compared to a no-contact control group regarding the count of correctly answered items in a cognitive reasoning test. We considered as covariates the baseline count of correctly answered items (non-normal covariate), baseline depression (latent covariate), and gender (categorical covariate). The corresponding negative binomial multi-group structural equation model and the average and conditional treatment effect estimations were carried out in R. We have made these functions conveniently accessible for applied researchers in an R package.

The aforementioned advancements bring the benefits of the moment-based approach for statistical inference to a broader range of applied scenarios in psychological, social, and health sciences. In contrast to earlier approaches, the moment-based approach treats observed group sizes and covariate values as random and not predetermined by the experimenter. Ignoring this randomness would lead to underestimation of standard errors and, thus, inflated Type I error rates and decreased power, which has previously been shown for stochastic covariates (Li, McLouth, & Delaney, 2020; Liu, West, Levy, & Aiken, 2017) and stochastic group sizes (Mayer & Thoemmes, 2019). In addition, the moment-based approach allows accounting for measurement error in covariates. For an overview of consequences of measurement error in nonlinear regression models, see Carroll, Ruppert, Stefanski, and Crainiceanu (2010). However, in the social and health sciences, observed variables in real data sets often deviate from the normal distribution (Bono et al., 2017; Micceri, 1989). While Kiefer and Mayer (2020) proposed a moment-based approach assuming strictly multivariate normally distributed covariates, we relaxed this assumption in this paper, offering several alternatives for non-normally distributed variables. Thus, the moment-based approach with our extension should better fit typical applications in social and health sciences.

7.1 Limitations and further research

There are some aspects not covered in this paper that could provide starting points for further research and refinements of the approach:

In equation (3), we assumed a linear parameterization for the predictor function $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0225$ . While this is the standard parameterization of, for example, generalized linear models (McCullagh & Nelder, 1996) and for effect analysis with linear link function (e.g., Mayer et al., 2016), sometimes nonlinear predictor functions might be of substantial interest. For example, Mayer et al. (2017) investigated the effectiveness of autonomy support by ninth-grade teachers in reducing students' state of boredom. They hypothesized a quadratic relationship between boredom and self-efficacy, which means that the treatment could be most effective for medium values of self-efficacy and less for both extremes. Similarly, Liu and West (2015) use trigonometric terms to predict cyclic patterns in daily report (count) data. The inclusion of quadratic or other nonlinear terms might also be of interest for count outcomes and regression models with a logarithmic link function. We presume that the moment-based approach would not yield analytical effect formulas for nonlinear predictor functions, but effect computation using numerical integration would be an alternative.

Throughout the paper, we factorized the joint distribution of the covariates into a marginal distribution of the categorical variables (i.e., treatment X and covariates K) and a (X = x, K = k)-conditional distribution of the continuous covariates. Our choice can seem restrictive, as alternative factorizations might be more suitable for some cases. For example, Kiefer and Mayer (2019) used a converse factorization, that is, a marginal distributed continuous covariate Z and a (Z = z)-conditional probability of the binary treatment X. However, we would argue that for a given joint distribution multiple equivalent factorizations exist. In the aforementioned example, the joint distribution can be equivalently rewritten in our factorization, that is, a marginal distribution of the binary treatment and an (X = x)-conditional distribution of Z (for a derivation, see Appendix 1).

We used maximum likelihood estimation to obtain parameter estimates for the negative binomial multi-group structural equation model, but in applied settings alternative statistical frameworks might be more suitable. Maximum likelihood estimation can suffer from non-convergence issues in complex models (Deng, Yang, & Marcoulides, 2018). We also exhibited this phenomenon in our illustrative example, where we had to exclude overdispersion parameters from the model in order to achieve convergence. Thus, we recommend three solutions to address non-convergence issues. First, a pragmatic workaround would be to specify a more parsimonious model, because non-convergence might be a consequence of overparameterization (Jackson, 2001, 2003). Second, a Bayesian structural equation modelling framework can be applied. The Bayesian approach can typically handle complex models much easier than maximum likelihood estimation (Merkle & Rosseel, 2015). For an overview of Bayesian estimation of structural equation models based on finite-mixture distributions or distributions based on the exponential family, see Lee (2007, Chapters 11 and 13). Third, recent factor score approaches (e.g., Devlieger & Rosseel, 2017) might also help achieving convergence by simplifying the measurement model and, thereby, the numerical integration part of the maximum likelihood estimation. However, if the factor scores depend on a distributional assumption, it would be wise to use the same distributional assumption within the moment-based approach.

In our illustrative example, we examined the effectiveness of cognitive training using a pre–post design. The ACTIVE study also contains several follow-up measurements allowing long-term effectiveness of the cognitive training to be investigated. In future developments, it might be fruitful to incorporate this longitudinal information into effect analyses. We suggest two ways for doing so. First, when examining long-term effects using a follow-up measurement, earlier measurements (e.g., post-test) could be included as mediators. In this scenario, total, direct, and indirect effects of the treatment can be distinguished (Mayer, Thoemmes, Rose, Steyer, & West, 2014). This can help researchers understand, for example, which post-test characteristics foster a long-term effect of the cognitive training. Second, several measurement occasions can be summarized or contrasted, for example, using a growth curve model (McArdle & Epstein, 1987) or growth component model (Kiefer, Rosseel, Wiese, & Mayer, 2018; Mayer, Steyer, & Mueller, 2012) among follow-up measurements. In this case, the outcome of interest would be a latent slope or growth component instead of the count outcome.

In non-randomized observational studies the treatment effects discussed in this paper are not necessarily causal effects. However, causality theories, such as Rubin's causal model (Rubin, 2005) or the stochastic theory of causal effects (Steyer et al., 2014) provide causality conditions under which causal effects can be estimated. Usually, these conditions require a careful selection of covariates, which then are controlled for with regression adjustment (e.g., Mayer, 2019; Mayer et al., 2016) or propensity score methods (Rosenbaum & Rubin, 1983). For an overview of design and analysis in quasi-experimental settings, see also Reichhardt (2019).

Appendix 1 Factorization of the joint distribution

In equations (4) and (5), we claimed that our factorization of the joint distribution was without loss of generality. This can be shown using two properties of distribution functions. Let us consider two random variables A and B with joint distribution f_A_,_B(a,b). Then

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0226$

If the marginal distribution f_B(b) and the conditional distribution f_A_|_B₌_b(a) are known, the corresponding distributions f_A(a) and f_B_|_A₌_a(b) can directly be computed with

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0227$

Consequently, if one version or factorization of the joint distribution is known, equivalent factorizations for the same joint distribution can be derived.

Let us now consider a simple example of the moment-based approach, with a single covariate Z and a binary treatment variable X. In contrast to our factorization in equations (4) and (5), we now consider

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0228$

as the known factorization of the joint distribution, where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0229$ and the conditional probability of X = x is given by

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0230$

that is, as a logistic function of the covariate Z. This kind of joint distribution can be useful if self-selection of participants into treatment X = 1 or control X = 0 depending on the covariate Z is assumed.

Nevertheless, our factorization from equation (5) can be obtained from these known marginal and conditional distributions by computing

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0231$

Note that these transformations are generally required for effect computation only. The maximum likelihood estimation would, nevertheless, be based on f_Z(Z) and P(X = x|Z = z) in this example. In contrast, when trying to estimate the likelihood function based on P(X = x) and f_Z|X₌_x, it might be tempting to estimate P(X = x) directly instead of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0232$ . However, introducing a parameterization for P(X = x) would lead to redundancies, because P(X = x) is already determined by µ_Z, $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0233$ , $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0234$ , and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0235$ . Consequently, additional estimation of P(X = x) would lead to a non-invertible covariance matrix of the parameter estimates and, thus, standard errors could not be computed. The redundant parameter would remain at its starting value, and thus would not yield a trustworthy estimate. The likelihood function should only contain parameters µ_Z, $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0236$ , $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0237$ , and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0238$ . Corresponding parameters from our factorization and their standard errors can then be computed for effect estimation using the delta method.

Appendix 2 Model estimation

In this section we present details on the maximum likelihood estimation of the NB-MG-SEM for our illustrative example. Let us consider a count outcome variable Y, a treatment variable X with four levels, a binary categorical covariate K, and a vector of continuous covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0239$ containing a latent covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0240$ measured by three observed indicator variables z = (Z₁,Z₂,Z₃), and a count covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0241$ . We want to estimate a statistical model as presented in equations 7-9 using maximum likelihood techniques. Assuming a sample of N independently and identically distributed observations, the complete-data likelihood is given by

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0242$

where y_i, x_i, k_i, z_i $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0243$ are the ith observed values of the aforementioned variables. As we do not have observed values $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0244$ i.e., latent covariate), we actually have to optimize the observed-data likelihood:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0245$

Note that marginalizing over the joint density of the complete data is only one way to optimize this likelihood. It would also be possible to use an EM algorithm (Dempster, Laird, & Rubin, 1977) to solve the complete data likelihood, but this is beyond the scope of this paper.

The joint density $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0246$ can be factorized into five conditional densities:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0247$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0248$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0249$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0250$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0251$

In the last step, we decomposed the parameter vector $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0252$ into four parts corresponding to the four conditional densities. The parameters to estimate in the respective densities are:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0253$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0254$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0255$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0256$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0257$

In the NB-MG-SEM, the conditional densities are defined by the model assumptions. Here, $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0258$ reflects the model for group sizes $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0259$ , which are assumed to be Poisson distributed. The group-specific Poisson regression of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0260$ on $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0261$ is given by $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0262$ , the group-specific and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0263$ -conditional distribution of the observed indicators $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0264$ is given by $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0265$ , the group-specific Poisson regression of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0266$ on $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0267$ is given by $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0268$ , and the group-specific marginal distribution of the latent variable $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0269$ is given by $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0270$ . Thus, the conditional densities in our case are

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0271$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0272$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0273$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0274$

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0275$

As the group weights do not depend on the latent covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0276$ , we can simplify the observed-data density as follows:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0277$

Hence, the corresponding log-likelihood is

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0278$

As there is no closed-form solution for $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0279$ , numerical integration is required. In our implementation, we use Gauss–Hermite quadrature which approximates the integral over $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0280$ with a finite sum

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0281$

where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0282$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0283$ are Gauss–Hermite quadrature points and weights.

Appendix 3 Effect estimation

In this section we provide details on the derivation of the average and conditional effect formulas used in our illustrative example. Let us consider a count outcome variable $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0284$ , a treatment variable $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0285$ with four levels, a binary categorical covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0286$ , and a vector of continuous covariates $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0287$ containing a latent covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0288$ measured by three observed indicator variables $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0289$ , and a count covariate $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0290$ . In the model section, we already stated that we assume latent baseline depression is $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0291$ -conditionally marginally normally distributed $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0292$ and is related to the baseline count of correctly answered items via a Poisson regression with $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0293$ where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0294$ .

Beginning with the general definition of the average treatment effect for these variables,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0295$

the integration part can be rewritten as a difference of integrals,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0296$

where the decomposition of the joint distribution of $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0297$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0298$ into a marginal and a conditional distribution enables the formation of an inner and an outer integral,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0299$

The inner integral resembles a univariate (conditional) expectation and can therefore be substituted with a univariate moment-generating function:

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0300$

Consequently, the computation of the average treatment effect simplifies to

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0301$

and as both outer integrals have the same domain, we can go back to a single integral,

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0302$

where we integrate over the density of the $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0303$ -conditionally normally distributed $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0304$ , which allows us to approximate the integral using a Gauss–Hermite quadrature

$urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0305$

where $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0306$ and $urn:x-wiley:00071102:media:bmsp12237:bmsp12237-math-0307$ are Gauss–Hermite quadrature points and weights.

Supporting Information

References

Azzalini, A., & Valle, A. D. (1996). The multivariate skew-normal distribution. Biometrika, 83, 715–726. https://doi.org/10.1093/biomet/83.4.715
10.1093/biomet/83.4.715
Web of Science®Google Scholar
Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., … for the ACTIVE Study Group (2002). Effects of cognitive training interventions with older adults. Journal of the American Medical Association, 288, 2271. https://doi.org/10.1001/jama.288.18.2271
10.1001/jama.288.18.2271
PubMedWeb of Science®Google Scholar
Blanca, M. J., Arnau, J., Lopez-Montiel, D., Bono, R., & Bendayan, R. (2013). Skewness and kurtosis in real data samples. Methodology, 9(2), 78–84.
10.1027/1614-2241/a000057
Google Scholar
Bono, R., Blanca, M. J., Arnau, J., & Gómez-Benito, J. (2017). Non-normal distributions commonly used in health, education, and social sciences: A systematic review. Frontiers in Psychology, 8, 1602. https://doi.org/10.3389/fpsyg.2017.01602
10.3389/fpsyg.2017.01602
PubMedWeb of Science®Google Scholar
Boos, D. D., & Stefanski, L. A. (2013). Essential statistical inference: Theory and methods. New York, NY: Springer.
10.1007/978-1-4614-4818-1
Google Scholar
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2010). Measurement error in nonlinear models: A modern perspective. Boca Raton, FL: Chapman and Hall/CRC.

Google Scholar
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
10.1111/j.2517-6161.1977.tb01600.x
Google Scholar
Deng, L., Yang, M., & Marcoulides, K. M. (2018). Structural equation modeling with many variables: A systematic review of issues and developments. Frontiers in Psychology, 9, 580. https://doi.org/10.3389/fpsyg.2018.00580
10.3389/fpsyg.2018.00580
PubMedWeb of Science®Google Scholar
Devlieger, I., & Rosseel, Y. (2017). Factor score path analysis. Methodology, 13, 31–38. https://doi.org/10.1027/1614-2241/a000130
10.1027/1614‐2241/a000130
Google Scholar
Garrett, S. B., Doyle, S. R., Peavy, K. M., Wells, E. A., Owens, M. D., Shores-Wilson, K., … Donovan, D. M. (2018). Age differences in outcomes among patients in the “Stimulant Abuser Groups to Engage in 12-Step” (STAGE-12) intervention. Journal of Substance Abuse Treatment, 84, 21–29. https://doi.org/10.1016/j.jsat.2017.10.012
10.1016/j.jsat.2017.10.012
PubMedWeb of Science®Google Scholar
Geneletti, S., & Dawid, A. P. (2011). Defining and identifying the effect of treatment on the treated. Causality in the sciences (pp. 728–749). Oxford, UK: Oxford University Press.
10.1093/acprof:oso/9780199574131.003.0034
Google Scholar
Greene, W. (2007). Econometric analysis. Upper Saddle River, NJ: Prentice Hall.

Google Scholar
Hittner, J. B., Owens, E. C., & Swickert, R. J. (2016). Influence of social settings on risky sexual behavior. SAGE Open, 6(1), 215824401662918. https://doi.org/10.1177/2158244016629187
10.1177/2158244016629187
Web of Science®Google Scholar
Jackson, D. L. (2001). Sample size and number of parameter estimates in maximum likelihood confirmatory factor analysis: A Monte Carlo investigation. Structural Equation Modeling: A Multidisciplinary Journal, 8, 205–223. https://doi.org/10.1207/s15328007sem0802_3
10.1207/s15328007sem0802_3
Web of Science®Google Scholar
Jackson, D. L. (2003). Revisiting sample size and number of parameter estimates: Some support for the N:q hypothesis. Structural Equation Modeling, 10(1), 128–141.
10.1207/S15328007SEM1001_6
Web of Science®Google Scholar
Jobe, J. B., Smith, D. M., Ball, K., Tennstedt, S. L., Marsiske, M., Willis, S. L., … Kleinman, K. (2001). ACTIVE. Controlled Clinical Trials, 22, 453–479. https://doi.org/10.1016/S0197-2456(01)00139-8
10.1016/S0197?2456(01)00139?8
CASPubMedWeb of Science®Google Scholar
Kelava, A., & Brandt, H. (2014). A general nonlinear multilevel structural equation mixture model. Frontiers in Psychology, 5(748), 748. https://doi.org/10.3389/fpsyg.2014.00748
10.3389/fpsyg.2014.00748
PubMedWeb of Science®Google Scholar
Kelava, A., Nagengast, B., & Brandt, H. (2014). A nonlinear structural equation mixture modeling approach for nonnormally distributed latent predictor variables. Structural Equation Modeling, 21, 468–481. https://doi.org/10.1080/10705511.2014.915379
10.1080/10705511.2014.915379
Web of Science®Google Scholar
Kiefer, C., & Mayer, A. (2019). Average effects based on regressions with a logarithmic link function: A new approach with stochastic covariates. Psychometrika, 84, 422–446. https://doi.org/10.1007/s11336-018-09654-1
10.1007/s11336?018?09654?1
PubMedWeb of Science®Google Scholar
Kiefer, C., & Mayer, A. (2020). Accounting for latent covariates in average effects from count regressions. Multivariate Behavorial Research, https://doi.org/10.1080/00273171.2020.1751027
10.1080/00273171.2020.1751027
PubMedWeb of Science®Google Scholar
Kiefer, C., Rosseel, Y., Wiese, B. S., & Mayer, A. (2018). Modeling and predicting non-linear changes in educational trajectories: The multilevel latent growth components approach. Psychological Test and Assessment Modeling, 60, 189–221.

Google Scholar
Lee, S.-Y. (2007). Structural equation modeling: A Bayesian approach. Chichester, UK: Wiley.
10.1002/9780470024737
Google Scholar
Lei, M., & Lomax, R. G. (2005). The effect of varying degrees of nonnormality in structural equation modeling. Structural Equation Modeling, 12, 1–27.
10.1207/s15328007sem1201_1
Web of Science®Google Scholar
Li, L., McLouth, C. J., & Delaney, H. D. (2020). Analysis of covariance in randomized experiments with heterogeneity of regression and a random covariate: The variance of the estimated treatment effect at selected covariate values. Multivariate Behavioral Research, 55, 926–940. https://doi.org/10.1080/00273171.2019.1693953
10.1080/00273171.2019.1693953
PubMedWeb of Science®Google Scholar
Liu, Y., & West, S. G. (2015). Weekly cycles in daily report data: An overlooked issue. Journal of Personality, 84, 560–579. https://doi.org/10.1111/jopy.12182
10.1111/jopy.12182
PubMedWeb of Science®Google Scholar
Liu, Y., West, S. G., Levy, R., & Aiken, L. S. (2017). Tests of simple slopes in multiple regression models with an interaction: Comparison of four approaches. Multivariate Behavioral Research, 52(4), 1–20. https://doi.org/10.1080/00273171.2017.1309261
10.1080/00273171.2017.1309261
PubMedWeb of Science®Google Scholar
Marsh, H. W., Lüdtke, O., Nagengast, B., Morin, A. J. S., & von Davier, M. (2013). Why item parcels are (almost) never appropriate: Two wrongs do not make a right –Camouflaging misspecification with item parcels in CFA models. Psychological Methods, 18, 257–284. https://doi.org/10.1037/a0032773
10.1037/a0032773
PubMedWeb of Science®Google Scholar
Mayer, A. (2019). Causal effects based on latent variable models. Methodology, 15, 15–28. https://doi.org/10.1027/1614-2241/a000174
10.1027/1614?2241/a000174
Google Scholar
Mayer, A., Dietzfelbinger, L., Rosseel, Y., & Steyer, R. (2016). The EffectLiteR approach for analyzing average and conditional effects. Multivariate Behavioral Research, 51, 374–391. https://doi.org/10.1080/00273171.2016.1151334
10.1080/00273171.2016.1151334
PubMedWeb of Science®Google Scholar
Mayer, A., Steyer, R., & Mueller, H. (2012). A general approach to defining latent growth components. Structural Equation Modeling, 19, 513–533. https://doi.org/10.1080/10705511.2012.713242
10.1080/10705511.2012.713242
Web of Science®Google Scholar
Mayer, A., & Thoemmes, F. (2019). Analysis of variance models with stochastic group weights. Multivariate Behavioral Research, 54, 542–554. https://doi.org/10.1080/00273171.2018.1548960
10.1080/00273171.2018.1548960
PubMedWeb of Science®Google Scholar
Mayer, A., Thoemmes, F., Rose, N., Steyer, R., & West, S. G. (2014). Theory and analysis of total, direct and indirect causal effects. Multivariate Behavioral Research, 49, 425–442. https://doi.org/10.1080/00273171.2014.931797
10.1080/00273171.2014.931797
PubMedWeb of Science®Google Scholar
Mayer, A., Umbach, N., Flunger, B., & Kelava, A. (2017). Effect analysis using nonlinear structural equation mixture modeling. Structural Equation Modeling: A Multidisciplinary Journal, 24, 556–570. https://doi.org/10.1080/10705511.2016.1273780
10.1080/10705511.2016.1273780
Web of Science®Google Scholar
Mazerolle, L., Bennett, S., Antrobus, E., Cardwell, S. M., Eggins, E., & Piquero, A. R. (2019). Disrupting the pathway from truancy to delinquency: A randomized field trial test of the longitudinal impact of a school engagement program. Journal of Quantitative Criminology, 35, 663–689. https://doi.org/10.1007/s10940-018-9395-8
10.1007/s10940?018?9395?8
Web of Science®Google Scholar
McArdle, J. J., & Epstein, D. (1987). Latent growth curves within developmental structural equation models. Child Development, 58, 110–133.
10.2307/1130295
CASPubMedWeb of Science®Google Scholar
McCullagh, P., & Nelder, J. A. (1996). Generalized linear models ( 2nd ed.). London, UK: Chapman & Hall.

Google Scholar
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York, NY: John Wiley & Sons.
10.1002/0471721182
Google Scholar
Merkle, E. C., & Rosseel, Y. (2015). blavaan: Bayesian structural equation models via parameter expansion. Journal of Statistical Software, 85(4), 1-30. https://doi.org/10.18637/jss.v085.i04
10.18637/jss.v085.i04
Web of Science®Google Scholar
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156–166. https://doi.org/10.1037/0033-2909.105.1.156
10.1037/0033‐2909.105.1.156
Web of Science®Google Scholar
Muthén, L. K., & Muthén, B. O. (1998–2015). Mplus user’s guide ( 7th edn). Los Angeles, CA: Author.
10.1146/annurev.ecolsys.28.1.341
Web of Science®Google Scholar
Nusser, L., & Weinert, S. (2017). Appropriate test-taking instructions for students with special educational needs. Journal of Cognitive Education and Psychology, 16, 227–240. https://doi.org/10.1891/1945-8959.16.3.227
10.1891/1945?8959.16.3.227
Web of Science®Google Scholar
R Core Team (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/

Google Scholar
Radloff, L. S. (1977). The CES-D Scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401.
10.1177/014662167700100306
PubMedWeb of Science®Google Scholar
Reichhardt, C. (2019). Quasi-experimentation: A guide to design and analysis. New York, NY: Guilford.

Google Scholar
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
10.1093/biomet/70.1.41
Web of Science®Google Scholar
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100, 322–331.
10.1198/016214504000001880
CASWeb of Science®Google Scholar
Sagarin, B. J., West, S. G., Ratnikov, A., Homan, W. K., Ritchie, T. D., & Hansen, E. J. (2014). Treatment noncompliance in randomized experiments: Statistical approaches and design issues. Psychological Methods, 19, 317–333. https://doi.org/10.1037/met0000013
10.1037/met0000013
PubMedWeb of Science®Google Scholar
Schaumberg, R. L., & Flynn, F. J. (2017). Clarifying the link between job satisfaction and absenteeism: The role of guilt proneness. Journal of Applied Psychology, 102, 982. https://doi.org/10.1037/apl0000208
10.1037/apl0000208
PubMedWeb of Science®Google Scholar
Sridharan, V., Shoda, Y., Heffner, J., & Bricker, J. (2019). A pilot randomized controlled trial of a web-based growth mindset intervention to enhance the effectiveness of a smartphone app for smoking cessation. JMIR mHealth and uHealth, 7(7), e14602. https://doi.org/10.2196/14602
10.2196/14602
PubMedWeb of Science®Google Scholar
Sterba, S. K., & Rights, J. D. (2016). Accounting for parcel-allocation variability in practice: Combining sources of uncertainty and choosing the number of allocations. Multivariate Behavioral Research, 51, 296–313. https://doi.org/10.1080/00273171.2016.1144502
10.1080/00273171.2016.1144502
PubMedWeb of Science®Google Scholar
Steyer, R., Mayer, A., & Fiege, C. (2014). Causal inference on total, direct, and indirect effects. In A. C. Michalos (Ed.), Encyclopedia of quality of life and well-being research (pp. 606–631). Dordrecht, The Netherlands: Springer. https://doi.org/10.1007/978-94-007-0753-5_295
10.1007/978-94-007-0753-5_295
Google Scholar
Steyer, R., & Nagel, W. (2017). Probability and conditional expectation. Chichester, UK: John Wiley & Sons.

Google Scholar
Tennstedt, S., Morris, J., Unverzagt, F., Rebok, G., Willis, S., Ball, K., & Marsiske, M. (2005). ACTIVE (Advanced Cognitive Training for Independent and Vital Elderly), United States, 1999–2001. Interuniversity Consortium for Political and Social Research, https://doi.org/10.3886/ICPSR04248.V3
10.3886/ICPSR04248.V3
Google Scholar
Ver Hoef, J. M. (2012). Who invented the delta method? American Statistician, 66, 124–127. https://doi.org/10.1080/00031305.2012.687494
10.1080/00031305.2012.687494
Web of Science®Google Scholar