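n = 1000000
s_x = sqrt(3)*rexp(n)   # known measurement-error SDs for x
s_y = 3*rexp(n)         # known measurement-error SDs for y
s_0 = 1                 # residual SD of the true regression
x_0 = rnorm(n, 1, 2)
x_1 = x_0 + s_x * rnorm(n)
y_0 = 0.8 + 0.5 * x_0 + s_0*rnorm(n)
y_1 = y_0 + s_y * rnorm(n)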
Problem and solution
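Suppose we wish to estimate the regression coefficients $\alpha_0$ and $\beta_0$ for the model $y_0 = \alpha_0 + \beta_0 x_0 + \sigma_0\epsilon_0$.
However, we do not observe $(x_0, y_0)$ directly. We observe the noisy measurements $x_1 = x_0 + s_x\epsilon_x$ and $y_1 = y_0 + s_y\epsilon_y$, where the measurement-error standard deviations $s_x$ and $s_y$ are known (they may differ between observations) and the $\epsilon$ terms have mean 0 and variance 1 and are independent of everything else.
As is well known, the regression coefficient $\beta_0$ equals $\operatorname{Cov}(x_0, y_0)/\operatorname{Var}(x_0)$, and $\alpha_0 = E(y_0) - \beta_0 E(x_0)$.
Define the regression model for the observed variables, $y_1 = \alpha_1 + \beta_1 x_1 + \sigma_1\epsilon_1$, so that $\beta_1 = \operatorname{Cov}(x_1, y_1)/\operatorname{Var}(x_1)$.
It follows that
$$\operatorname{Cov}(x_1, y_1) = \operatorname{Cov}(x_0, y_0), \qquad \operatorname{Var}(x_1) = \operatorname{Var}(x_0) + E(s_x^2), \qquad E(x_1) = E(x_0), \qquad E(y_1) = E(y_0).$$
Notice that this gives
$$\beta_0 = \frac{\operatorname{Cov}(x_1, y_1)}{\operatorname{Var}(x_1) - E(s_x^2)}, \qquad \alpha_0 = E(y_1) - \beta_0 E(x_1),$$
so only the measurement error in $x$ needs to be corrected for; $s_y$ does not enter.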
Verification
Let’s simulate a bunch of values from the model.
The calculated coefficients are
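# 2*var(s_x) estimates E[s_x^2] here because s_x is exponential
# (squared mean equals variance); mean(s_x^2) works in general.
beta0_hat = cov(y_1, x_1)/(var(x_1) - 2*var(s_x))
alpha0_hat = mean(y_1) - beta0_hat * mean(x_1)
c(alpha0_hat, beta0_hat)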
[1] 0.7964205 0.4951465
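The same calculation can be wrapped in a small function. This is only a sketch: `eiv_coef` is a hypothetical helper name, not something from a package, and it assumes that `s_x` holds the known measurement-error standard deviation of each `x` observation. It uses `mean(s_x^2)` for the correction term, which does not rely on the error standard deviations being exponentially distributed. The simulated data below reuses the settings from this post with a smaller `n`.

```r
# Hypothetical helper: moment-corrected errors-in-variables estimates
# when the measurement-error SDs of x are known.
eiv_coef <- function(x, y, s_x) {
  # Var(x_1) = Var(x_0) + E[s_x^2], so subtract the mean squared error SD.
  beta <- cov(y, x) / (var(x) - mean(s_x^2))
  alpha <- mean(y) - beta * mean(x)
  c(alpha = alpha, beta = beta)
}

# Simulated data with the same settings as above, smaller n for speed.
set.seed(1)
n <- 1e5
s_x <- sqrt(3) * rexp(n)
s_y <- 3 * rexp(n)
x_0 <- rnorm(n, 1, 2)
x_1 <- x_0 + s_x * rnorm(n)
y_0 <- 0.8 + 0.5 * x_0 + rnorm(n)
y_1 <- y_0 + s_y * rnorm(n)

est <- eiv_coef(x_1, y_1, s_x)
print(est)
```

The estimates should land near the true values 0.8 and 0.5, with more sampling noise than in the run with a million observations.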
But the naive regression of y_1 on x_1 gives a slope that is strongly attenuated toward zero:
lm(y_1 ~ x_1)
Call:
lm(formula = y_1 ~ x_1)
Coefficients:
(Intercept) x_1
1.0927 0.1988
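These numbers are what the simulation settings predict. As a rough check, the population values of the naive coefficients can be computed by hand; the variable names below are mine, and the calculation assumes the settings used in this post (x_0 normal with mean 1 and standard deviation 2, s_x equal to sqrt(3) times a standard exponential, true intercept 0.8 and slope 0.5).

```r
# Population quantities implied by the simulation settings.
var_x0 <- 2^2      # Var(x_0) for x_0 ~ N(1, 2^2)
ms_sx  <- 3 * 2    # E[s_x^2] = 3 * E[Exp(1)^2] = 3 * 2
alpha  <- 0.8
beta   <- 0.5
mean_x <- 1        # E(x_0) = E(x_1)

# Share of Var(x_1) that comes from the true x_0.
reliability <- var_x0 / (var_x0 + ms_sx)

# Naive slope and intercept in the population.
beta_naive  <- beta * reliability
alpha_naive <- (alpha + beta * mean_x) - beta_naive * mean_x

print(c(alpha_naive, beta_naive))
```

This gives an intercept of 1.1 and a slope of 0.2, in line with the fitted 1.0927 and 0.1988.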
On the other hand, the correct (but unobserved) regression yields
lm(y_0~x_0)
Call:
lm(formula = y_0 ~ x_0)
Coefficients:
(Intercept) x_0
0.7987 0.4999
Inference and literature
To do inference with this method, use the delta method and large-sample theory (together with the studentized bootstrap), or perhaps the bias-corrected and accelerated bootstrap (BCa). The delta method should be fairly easy to derive using the formulation of the “covariance of the covariance” found in e.g. Magnus and Neudecker’s Matrix Differential Calculus.
There is a sizable literature on errors-in-variables models, and inference for this simple model has probably been worked out, but a very rudimentary search yielded nothing for me. I think it’s uncommon to know the variances of the measurement errors. I doubt that structural equation modeling software (such as lavaan) will help, because you don’t know the item variances in a typical application of structural equation models.
A final option is to assume bivariate normality and use maximum likelihood. This is also likely to be possible using an R package, but I’m not sure the estimates would be consistent. Probably you’d have to use a sandwich matrix for correct standard errors.
To make things easy on yourself, if you’re faced with a problem of this kind, I would suggest just going with BCa together with the equations above. The equations are trivial to compute and BCa will be fairly simple as well; it might be possible to calculate using packages such as bootstrap. Do something else only if the reviewers demand it.
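For concreteness, here is a minimal sketch of a BCa interval for the corrected slope, written in base R so that no package interface has to be assumed. The helper names `eiv_slope` and `bca_ci` are hypothetical. The sketch assumes that observations are independent and that each triple (x, y, s_x) is resampled together, so the known error standard deviation stays attached to its observation. The simulated data uses the settings from this post with a much smaller sample.

```r
# Moment-corrected slope; d$s_x holds the known error SDs of x.
eiv_slope <- function(d) cov(d$y, d$x) / (var(d$x) - mean(d$s_x^2))

# Hand-rolled BCa interval (hypothetical helper, not a package function).
bca_ci <- function(d, stat, B = 2000, level = 0.95) {
  n <- nrow(d)
  theta_hat <- stat(d)
  # Nonparametric bootstrap: resample whole rows with replacement.
  reps <- replicate(B, stat(d[sample.int(n, n, replace = TRUE), , drop = FALSE]))
  # Bias correction from the share of replicates below the estimate.
  z0 <- qnorm(mean(reps < theta_hat))
  # Acceleration from leave-one-out (jackknife) estimates.
  jack <- vapply(seq_len(n), function(i) stat(d[-i, , drop = FALSE]), numeric(1))
  dev <- mean(jack) - jack
  a <- sum(dev^3) / (6 * sum(dev^2)^1.5)
  # Adjusted quantile levels for the lower and upper limits.
  z <- qnorm(c((1 - level) / 2, (1 + level) / 2))
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  quantile(reps, adj, names = FALSE)
}

set.seed(1)
n <- 5000
d <- data.frame(s_x = sqrt(3) * rexp(n))
x_0 <- rnorm(n, 1, 2)
d$x <- x_0 + d$s_x * rnorm(n)
d$y <- 0.8 + 0.5 * x_0 + rnorm(n) + 3 * rexp(n) * rnorm(n)

ci <- bca_ci(d, eiv_slope, B = 1000)
print(c(estimate = eiv_slope(d), lower = ci[1], upper = ci[2]))
```

With small samples the denominator `var(x) - mean(s_x^2)` can come close to zero in some resamples, which produces wild replicates, so the interval is only worth trusting when the sample is large relative to the measurement error.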