Some tips for the master thesis
statistics
education
This is a collection of general tips for my master students in business analytics and data science. It’s far from exhaustive. In particular, it contains no tips about how to produce impressive content for a thesis, i.e., actually doing the research.
Deriving distributions from quantiles
effective altruism
statistics
fermi estimates
While doing Fermi estimation (“guesstimation”), you often want to construct a distribution from quantile knowledge. Dealing with two quantiles is quite easy, as you have plenty of distribution families that are easy to fit. Location-scale families, such as the normal distribution, logistic distribution, Cauchy distribution, or the shifted exponential distribution, are particularly easy to fit. Moreover, a monotonically transformed location-scale distribution is equally easy to work with, e.g., the log-normal and log-logistic distributions.
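As a rough illustration of why two quantiles suffice for a location-scale family, here is a minimal sketch for the normal and log-normal cases. The function names `fit_normal` and `fit_lognormal` and the example numbers are made up for this illustration and are not taken from the post.

```python
from math import log
from statistics import NormalDist


def fit_normal(p1, q1, p2, q2):
    """Return (mu, sigma) of the normal distribution whose p1-quantile is q1
    and whose p2-quantile is q2 (requires p1 != p2)."""
    # Quantiles of the standard normal at the two probability levels.
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    # For a location-scale family, q = mu + sigma * z, so two quantiles
    # give two linear equations in (mu, sigma).
    sigma = (q2 - q1) / (z2 - z1)
    mu = q1 - sigma * z1
    return mu, sigma


def fit_lognormal(p1, q1, p2, q2):
    """Return (mu, sigma) on the log scale for a log-normal distribution,
    using the fact that quantiles commute with the monotone log transform."""
    return fit_normal(p1, log(q1), p2, log(q2))


# Hypothetical input: "I am 90% sure the quantity lies between 10 and 50."
mu, sigma = fit_normal(0.05, 10.0, 0.95, 50.0)
fitted = NormalDist(mu, sigma)

# The same interval, but for a strictly positive quantity.
log_mu, log_sigma = fit_lognormal(0.05, 10.0, 0.95, 50.0)
```

The same two-equation trick should work for any location-scale family once the standard quantile function is swapped in; only the normal case is shown here.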
A peek at pairwise preference estimation in economics, marketing, and statistics
effective altruism
statistics
psychometrics
economics
marketing
I had a peek at value estimation in economics and marketing. There is a sizable literature here, and more work is needed to figure out what exactly is relevant for effective altruists. Discrete choice models are applied a lot in economics, but these models are not able to estimate the scaling of the values. Marketing researchers prefer graded pairwise comparisons, which are equivalent to the pairwise method used here, but with limits on how much you can prefer one choice to another.
Inference for correlations corrected for attenuation
statistics
psychometrics
You have two psychometric instruments, \(\hat{Z_1}\) and \(\hat{Z_2}\), measuring the true scores \(Z_1\) and \(Z_2\) with error. The estimators are linear in \(Z_1,Z_2\) with independent error terms, i.e. \(\hat{Z_1} = Z_1 + \epsilon_1\) and \(\hat{Z_2} = Z_2 + \epsilon_2\). You only observe the correlation between the measurements \(\hat{Z_1}\) and \(\hat{Z_2}\), but you’re interested in the correlation between the true scores \(Z_1\) and \(Z_2\). What should you do? The Spearman (Spearman 1904) attenuation formula states that \[\operatorname{Cor}(Z_1, Z_2) = \frac{\operatorname{Cor}(\hat{Z_1}, \hat{Z_2})}{\operatorname{Cor}(Z_1,\hat{Z_1})\operatorname{Cor}(Z_2,\hat{Z_2})}\] A lot has been written about correction for attenuation. For instance, many people care about the easily verifiable and veritable horror that the sample disattenuated correlation may be greater than \(1\)! But there’s not much written about inference. This is a very short review of what I’ve read.
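The attenuation formula is simple enough to sketch in a few lines. The function name `disattenuate` and the numbers below are hypothetical and only serve to illustrate the arithmetic, including how plugging in sample estimates can push the result above \(1\).

```python
def disattenuate(cor_observed, cor_true_1, cor_true_2):
    """Spearman's correction for attenuation.

    cor_observed: correlation between the two measurements.
    cor_true_1:   correlation between the first true score and its measurement.
    cor_true_2:   correlation between the second true score and its measurement.
    """
    if cor_true_1 == 0 or cor_true_2 == 0:
        raise ValueError("correlations with the true scores must be non-zero")
    return cor_observed / (cor_true_1 * cor_true_2)


# Hypothetical values: the corrected correlation is larger than the observed one.
corrected = disattenuate(0.42, 0.7, 0.8)

# With noisy sample estimates, nothing keeps the corrected value below 1.
too_large = disattenuate(0.60, 0.7, 0.8)
```

In practice the two correlations in the denominator are not observed directly. Under the additive error model above, each one equals the square root of the corresponding reliability, so a reliability estimate is typically plugged in instead.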
Estimating value from pairwise comparisons
effective altruism
statistics
psychometrics
How can you estimate the value of research output? You could use pairwise comparisons, e.g., to ask specialists how much more valuable Darwin’s On the Origin of Species is than Dembski’s Intelligent Design. Then you can use these relative valuations to estimate absolute valuations.
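One way to go from relative to absolute valuations can be sketched as a least-squares fit on the log scale, with one item fixed as the unit of value. This is an assumption made for illustration and not necessarily the model used in the post. The function name `estimate_values`, the data layout, and the example ratios are all hypothetical.

```python
from math import exp, log


def estimate_values(n_items, comparisons, reference=0, sweeps=200):
    """Estimate item values from ratio judgments.

    comparisons: list of (i, j, ratio), meaning item i was judged `ratio`
                 times as valuable as item j.
    reference:   index of the item whose value is fixed at 1.
    Minimizes the sum of squared errors in log(v_i) - log(v_j) = log(ratio)
    by repeatedly updating one log-value at a time.
    """
    log_v = [0.0] * n_items
    for _ in range(sweeps):
        for k in range(n_items):
            if k == reference:
                continue
            # Each comparison involving item k implies a value for log_v[k].
            implied = []
            for i, j, ratio in comparisons:
                if i == k:
                    implied.append(log_v[j] + log(ratio))
                elif j == k:
                    implied.append(log_v[i] - log(ratio))
            if implied:
                # The least-squares update for one coordinate is the mean.
                log_v[k] = sum(implied) / len(implied)
    return [exp(x) for x in log_v]


# Hypothetical judgments: item 1 is worth 2x item 0, item 2 is worth 5x item 1,
# and item 2 is worth 10x item 0.
comparisons = [(1, 0, 2.0), (2, 1, 5.0), (2, 0, 10.0)]
values = estimate_values(3, comparisons)
```

Working on the log scale turns ratios into differences, which is why a linear method is enough here. The sketch assumes every item is connected to the reference item through some chain of comparisons.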
No one would have invented coefficient alpha today
statistics
psychometrics
Coefficient alpha is the most famous coefficient in psychometrics – Cronbach’s paper Coefficient alpha and the internal structure of tests has been cited around \(60,000\) times, after all. It’s supposed to measure reliability. What does that mean? Intuitively, a psychometric scale is supposed to measure some kind of psychological construct, such as intelligence, in a reliable way. You don’t want it to be noisy. You don’t want two intelligence tests administered at slightly different times to give widely different results. You also want the test to actually measure intelligence, and not something else, such as emotionality. But that’s validity, not reliability.