Sometimes people talk about p-values without reference to any alternative hypothesis. This is wrong-headed, because every p-value comes with a set of implied alternatives. Take any p-value \(U\). By definition, \(U\) is uniform under the null hypothesis \(H_0\) that the true probability measure is \(P\). So far, so good. Now assume that \(Q\) is the true probability measure and that the distribution function \(Q(U \leq u)\) looks like this:
Let \(f\left(x\right)\) be a density and \(\pi\left(x\right)\) be a function satisfying \(0\leq\pi\left(x\right)\leq1\). In other words, \(\pi\left(x\right)\) is a probability for every \(x\). Then \(g\left(x\right)=f\left(x\right)\pi\left(x\right)/\rho\), where \(\rho=\int f\left(x\right)\pi\left(x\right)dx\), is a density whenever \(\rho>0\); the normalizing constant is automatically finite, since \(\pi\leq1\) implies \(\rho\leq1\). This is an example of a weighted density, a class of models introduced by Rao (1965). Since \(\pi\left(x\right)\) is a probability, we can call this a probability weighted density. This note views rejection sampling (Neumann 1951) as sampling from a particular sort of probability weighted density.
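The correspondence can be sketched in a few lines of Python: propose \(X\) from \(f\), then keep it with probability \(\pi(X)\). The accepted draws follow \(g\propto f\pi\), and the long-run acceptance rate estimates \(\rho\). The concrete choices here are assumptions for illustration only: \(f\) is the standard normal density and \(\pi(x)=\exp(-x^2/2)\); the function names are hypothetical. Ordinary rejection sampling for a target \(h\) with envelope \(h\leq Mf\) is the special case \(\pi(x)=h(x)/(Mf(x))\).

```python
import math
import random


def f_sample(rng):
    """Draw from the base density f, assumed here to be a standard normal."""
    return rng.gauss(0.0, 1.0)


def pi(x):
    """Acceptance probability pi(x) = exp(-x^2 / 2), which lies in [0, 1]."""
    return math.exp(-0.5 * x * x)


def sample_weighted(n, seed=0):
    """Draw n samples from g(x) proportional to f(x) * pi(x).

    Each proposal X ~ f is kept with probability pi(X). Returns the accepted
    samples and the empirical acceptance rate, which estimates rho.
    """
    rng = random.Random(seed)
    accepted, proposals = [], 0
    while len(accepted) < n:
        x = f_sample(rng)
        proposals += 1
        if rng.random() < pi(x):  # accept with probability pi(x)
            accepted.append(x)
    return accepted, len(accepted) / proposals


if __name__ == "__main__":
    xs, rate = sample_weighted(10000, seed=1)
    print("acceptance rate:", rate)
    print("sample variance:", sum(x * x for x in xs) / len(xs))
```

With these choices \(f(x)\pi(x)\propto\exp(-x^2)\), so \(g\) is the normal density with mean 0 and variance 1/2, and \(\rho=1/\sqrt{2}\approx0.707\). The acceptance rate and sample variance returned by the sketch should land close to those two values.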
Statistical Methods for Research Workers (1924, henceforth SMRW) was Fisher’s first book. It’s a textbook for practicing scientists, and probably the most important book on practical statistics ever published. It went through 14 editions from 1924 to 1970. The book is obviously of historical interest, and reading Fisher’s own perspectives should be illuminating in itself. One interesting use of SMRW is to track how Fisher’s thinking evolved by comparing its successive editions.
Synthese is a generalist philosophy journal. It is usually ranked among the 20 best, typically near the lower end of that group. At least some of its focus is on themes I care about, including decision theory, interpretations of probability, probability paradoxes such as the Sleeping Beauty problem, and, of course, the philosophy of statistics. The first issue of the 36th volume of Synthese was devoted to the philosophy of statistics. The occasion was Allan Birnbaum’s death the year before, and the issue is built around his last submission to the journal.
This paper is old, and it shows! He starts off with the following: “There was a time when we did not talk about tests of significance; we simply did them. We tested whether certain quantities were significant in the light of their standard errors, without inquiring as to just what was involved in the procedure, or attempting to generalize it.” Sounds like the golden age of statistics! But that age had long since passed: by the time he wrote this paper, statistics, in his words, “consists almost entirely of tests of significance”.
In the Psychological Methods Discussion group, Ben Ambridge asked the following question: “Hi everyone - I was wondering (don’t worry, I haven’t actually done this!) what would be wrong statistically speaking with an approach where you run a frequentist t-test (or whatever) after adding each participant and stop testing participants when the p value has remained below 0.05 (or 0.001 or whatever) for - say - each of the last 20 participants.”
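One way to probe the question is a small Monte Carlo sketch that runs the proposed stopping rule many times and counts how often it declares an effect. Several details are assumptions of this sketch, not part of the question: outcomes are \(N(\mu, 1)\); the t-test is replaced by a two-sided z-test with known variance 1 so that only the Python standard library is needed; testing starts at the second participant; and recruitment is capped at a hypothetical maximum of 200 participants. The function names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the z-test p-value


def sequential_rule_rejects(mu, rng, window=20, n_min=2, n_max=200, alpha=0.05):
    """Simulate one study under the proposed stopping rule.

    Participants are added one at a time with outcomes ~ N(mu, 1). After each
    participant (from n_min onwards) a two-sided z-test of H0: mu = 0 is run,
    assuming the variance is known to be 1 (a stand-in for the t-test).
    Returns True if p < alpha held for `window` consecutive looks before
    n_max participants were reached, i.e. the rule "finds" an effect.
    """
    total = 0.0
    streak = 0  # number of consecutive looks with p < alpha
    for n in range(1, n_max + 1):
        total += rng.gauss(mu, 1.0)
        if n < n_min:
            continue
        z = total / math.sqrt(n)  # equals mean * sqrt(n) when sigma = 1
        p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
        streak = streak + 1 if p < alpha else 0
        if streak >= window:
            return True
    return False


def rejection_rate(mu, reps=2000, seed=1):
    """Monte Carlo estimate of how often the stopping rule declares an effect."""
    rng = random.Random(seed)
    hits = sum(sequential_rule_rejects(mu, rng) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    # Under the null (mu = 0), a valid level-0.05 procedure would declare an
    # effect in about 5% of studies; compare the estimate with that level.
    print("estimated false positive rate:", rejection_rate(0.0))
```

Comparing `rejection_rate(0.0)` with the nominal 0.05 shows whether the rule keeps its advertised error rate, while a nonzero `mu` estimates its power instead. Changing `window` or `n_max` shows how sensitive the answer is to those choices.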