### Some tips for the master thesis

statistics

education

This is a collection of general tips for my master’s students in business analytics and data science. It’s far from exhaustive. In particular, it contains no tips about how to produce impressive content for a thesis, i.e., actually doing the research.

### Deriving distributions from quantiles

effective altruism

statistics

fermi estimates

While doing Fermi estimation (“guesstimation”) you often want to construct a distribution from quantile knowledge. Dealing with two quantiles is quite easy, as there are plenty of distribution families that are easy to fit. Location-scale families, such as the normal, logistic, Cauchy, or shifted exponential distributions, are particularly easy to fit. Moreover, a monotonically transformed location-scale distribution is equally easy to work with, e.g., the log-normal and log-logistic distributions.
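As a rough illustration of the two-quantile case, here is a minimal Python sketch that solves for the location and scale of a normal distribution from two quantiles, and reuses the same step on the log scale for the log-normal. The function names `normal_from_quantiles` and `lognormal_params_from_quantiles` are made up for this example; only the standard library's `statistics.NormalDist` is assumed.

```python
import math
from statistics import NormalDist


def normal_from_quantiles(p1, q1, p2, q2):
    """Return the normal distribution whose p1-quantile is q1 and p2-quantile is q2.

    Hypothetical helper for illustration. Requires 0 < p1 < p2 < 1 and q1 < q2.
    """
    if not (0 < p1 < p2 < 1) or not (q1 < q2):
        raise ValueError("need 0 < p1 < p2 < 1 and q1 < q2")
    # Standard normal quantiles corresponding to the two probabilities.
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    # Solve q_i = mu + sigma * z_i for the two unknowns mu and sigma.
    sigma = (q2 - q1) / (z2 - z1)
    mu = q1 - sigma * z1
    return NormalDist(mu, sigma)


def lognormal_params_from_quantiles(p1, q1, p2, q2):
    """Return (mu, sigma) of log X for a log-normal X with the given quantiles.

    The log is monotone, so the quantiles of log X are the logs of the
    quantiles of X, and the normal fit can be reused as is.
    """
    fitted = normal_from_quantiles(p1, math.log(q1), p2, math.log(q2))
    return fitted.mean, fitted.stdev


# Example: "I am 90% sure the value lies between 10 and 30."
dist = normal_from_quantiles(0.05, 10.0, 0.95, 30.0)
log_mu, log_sigma = lognormal_params_from_quantiles(0.05, 1.0, 0.95, 100.0)
```

The same two-equation solve works for any location-scale family once you swap in that family's standard quantile function.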

### A peek at pairwise preference estimation in economics, marketing, and statistics

effective altruism

statistics

psychometrics

economics

marketing

I had a peek at value estimation in economics and marketing. There is a sizable literature here, and more work is needed to figure out what exactly is relevant for effective altruists. Discrete choice models are applied a lot in economics, but these models are not able to estimate the scaling of the values. Marketing researchers prefer graded pairwise comparisons, which are equivalent to the pairwise method used here, but with limits on how much you can prefer one choice to another.

### Inference for correlations corrected for attenuation

statistics

psychometrics

You have two psychometric instruments, \(\hat{Z_1}\) and \(\hat{Z_2}\), measuring the true scores \(Z_1\) and \(Z_2\) with error. The estimators are linear in \(Z_1,Z_2\) with independent error terms, i.e. \(\hat{Z_1} = Z_1 + \epsilon_1\) and \(\hat{Z_2} = Z_2 + \epsilon_2\). You only observe the correlation between the measurements \(\hat{Z_1}\) and \(\hat{Z_2}\), but you’re interested in the correlation between the true scores \(Z_1\) and \(Z_2\). What should you do? The Spearman (Spearman 1904) attenuation formula states that \[\operatorname{Cor}(Z_1, Z_2) = \frac{\operatorname{Cor}(\hat{Z_1}, \hat{Z_2})}{\operatorname{Cor}(Z_1,\hat{Z_1})\operatorname{Cor}(Z_2,\hat{Z_2})}\] A lot has been written about correction for attenuation. For instance, many people care about the easily verifiable and veritable *horror* that the sample disattenuated correlation may be greater than \(1\)! But not much has been written about inference. This is a very short review of what I’ve read.

### Estimating value from pairwise comparisons

effective altruism

statistics

psychometrics

How can you estimate the value of research output? You could use pairwise comparisons, e.g., to ask specialists how much more valuable Darwin’s *On the Origin of Species* is than Dembski’s *Intelligent Design*. Then you can use these relative valuations to estimate absolute valuations.

### No one would have invented coefficient alpha today

statistics

psychometrics

Coefficient alpha is the most famous coefficient in psychometrics; Cronbach’s paper *Coefficient alpha and the internal structure of tests* has been cited around \(60,000\) times, after all. It’s supposed to measure *reliability*. What does that mean? Intuitively, a psychometric scale is supposed to measure some kind of psychological construct, such as intelligence, in a reliable way. You don’t want it to be noisy. You don’t want two intelligence tests administered at slightly different times to give widely different results. You also want the test to actually measure intelligence, and not something else, such as emotionality. But that’s validity, not reliability.
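For readers who have not seen the coefficient itself, here is a minimal Python sketch of the usual sample formula, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\), where \(s_i^2\) is the sample variance of item \(i\) and \(s_T^2\) is the sample variance of the sum score. The formula is the textbook one rather than something stated in the excerpt above, and the function name `cronbach_alpha` and the input layout (one list of scores per item) are assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(items):
    """Sample coefficient alpha for k items scored on the same n respondents.

    `items` is a list of k lists, one per item, each holding n scores
    (a hypothetical layout chosen for this sketch).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all items must have the same number of respondents")
    # Sum score for each respondent across the k items.
    totals = [sum(row) for row in zip(*items)]
    # Sum of the item variances, compared against the sum-score variance.
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Three identical items are perfectly consistent, so alpha equals 1.
alpha_identical = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```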