No one would have invented coefficient alpha today

statistics

psychometrics

Author

Jonas Moss

Published

Oct 3, 2022

Coefficient alpha is the most famous coefficient in psychometrics – Cronbach’s paper Coefficient alpha and the internal structure of tests has been cited around \(60,00\) times after all. It’s supposed to measure reliability. What does that mean? Intuitively, a psychometric scale is supposed to measure some kind of psychological construct, such as intelligence, in a reliable way. You don’t want it to be noisy. You don’t want two intelligence tests administered at slightly different times to give widely different results. You also want the test to actually measure intelligence, and not something else, such emotionality. But that’s validity, not reliability.

In classical test theory they think about psychometric tests in the context of a true score \(T\) and observed score \(X\). Then they write \(X = T +\epsilon\) for some error term \(\epsilon\). (This is always possible). For instance, if \(X\) is the observed score on an IQ test, \(T\) is the true, underlying intelligence.

Now we’re ready to define classical reliability! The reliability is defined as \(R=\text{Cor}(X, T)^2\), the squared correlation between the true score and the observed score. That’s a pretty reasonable definition! But the problem is that you don’t know the true score, so you can’t estimate this correlation directly.

And that’s where coefficient alpha comes in. Suppose that \(X\) is a sum score, i.e., on the form \(X = X_1 + X_2 + \ldots + X_k\). Then assume that \[X_i = \mu_i + \lambda_i T + \sigma_i \epsilon_i \tag{1}\] for some error terms with variance equal to \(1\). In addition, assume that \(\text{Cov}(\epsilon_i, \epsilon_j) = 0\) when \(i = j\), i.e., the error terms are uncorrelated. Then \(X\) follows the congeneric measurement model. Now define coefficient alpha as \[\alpha=\frac{k}{k-1}\left(1-\frac{\text{tr}\Sigma}{{1}^{T}\Sigma{1}}\right),\] where \(\Sigma\) is the covariane matrix of the \(X_i\)s. This alpha has two important properties:

When the congeneric measurement model holds, \(R\geq \alpha\).In other words, \(\alpha\) is a lower bound for the true reliability.

If, in addition, if the true model is \(\tau\)-equivalent, then \(R = \alpha\).

The condition of \(\tau\)-equivalence means that all the factor loadings \(\lambda_i\) are equal, i.e., the congeneric model eq. 1 is reduced to \[X_i = \mu_i + \lambda T + \sigma_i \epsilon_i. \tag{2}\] There is widespread agreement that \(\tau\)-equivalence never holds.

Alpha is also the mean of all possible split-half reliabilities, but no one seem to care much about that today. It might have been important earlier on though. For more details see (Cho 2021).

Why no one would have invented alpha today

Take a look at the congeneric model (eq. 1) again. This is a linear one-factor model with no correlation among the errors. In the early 20th century, there was no feasible way to estimate such models, at least not for ordinary psychometric researchers. But that’s not the case anymore. Anyone able to install R and run the easiest lavaan script can estimate such a model. For example, we can estimate the parameters for the agreeableness part of the psychTools::bfi data.

The std.lv = TRUE argument forces the latent variable \(T\) to have variance equal to \(1\). This option makes it easier to interpret the remaining parameters. Using the estimated parameters of the lavaan object we can estimate the reliabiltiy \(R\) with little problems. Straight-forward calculations show that, when we assume the congeneric measurement model, \[R = \text{Cor}^{2}(X_1 + X_2+\cdots+X_k,T)=\frac{k\overline{\lambda}^{2}}{k\overline{\lambda}^{2}+\overline{\sigma^{2}}}, \tag{3}\] where \(\overline{x}\) denotes the mean of the vector \(x\).

Using this formulation of the reliability in terms of \(\lambda\) and \(\sigma\), the natural estimator if the reliability is the plug-in estimator\[\hat{R} = \frac{k\overline{\hat{\lambda}}^{2}}{k\overline{\hat{\lambda}}^{2}+\overline{\hat{\sigma}^{2}}},\] where \(\hat{\lambda}\) and \(\hat{\sigma}\) are estimators of your choice. If you’re using lavaan, you would calculate it using something like the following function.

#' Estimate the reliability from a `lavaan` object.#' #' Estimate the reliability assuming a congeneric measurement model using the#' plug-in estimator.#' @param obj A `lavaan` object.#' @return The estimated reliability coefficient.reliability <-function(obj) { params <- lavaan::lavInspect(obj, what ="coef") lambda <- params$lambda sigma2 <-diag(params$theta) k <-length(lambda) k *mean(lambda) ^2/ (k *mean(lambda) ^2+mean(sigma2))}

For the agreeableness data of psychTools::bfi, our reliability becomes

reliability(obj)

[1] 0.712129

It’s not hard to do inference for the reliability coefficient either – it is merely an application of the delta method. Moreover, by the invariance principle of maximum likelihood estimation, it is the maximum likelihood estimator of the reliability provided \(\lambda\) and \(\sigma\) are estimated using maximum likelihood. In this sense the estimator is natural.

To sum up:

There exists a natural estimator of the reliability under the congeneric measurement model.

It is easy to estimate and it’s efficient under normality.

Its asymptotic theory is well-understood. That’s just a by-product of the immense amount of research on asymptotic theory for general structural equation models.

I doubt anyone would have wanted to investigate alternatives to this 100% reasonable, simple, and efficient estimator. For why would they? It’s like finding an alternative estimator of the \(R^2\) that requires a little less computation power but is only consistent for the \(R^2\) under extraordinarily unlikely assumptions.

Maybe someone would have uncovered coefficient alpha today, but its discovery would be treated as a novelty, not something of widespread importance, worth \(60,000\) citations.

References

Cho, Eunseong. 2021. “Neither Cronbach’s Alpha nor McDonald’s Omega: A Commentary on Sijtsma and Pfadt.”Psychometrika 86 (4): 877–86. https://doi.org/10.1007/s11336-021-09801-1.