# A proof of rejection sampling correctness

Let \(f\left(x\right)\) be a density and \(\pi\left(x\right)\) be a function satisfying \(0\leq\pi\left(x\right)\leq1\). In other words, \(\pi\left(x\right)\) is a probability for every \(x\). Then \(g\left(x\right)\propto f\left(x\right)\pi\left(x\right)\) is density, since \(\rho=\int f\left(x\right)\pi\left(x\right)dx<1\). This is an example of a , a class of models introduced by Rao (1965). Since \(p\left(x\right)\) is a probability, we can call this a . This note views rejection sampling (Neumann 1951) as sampling from a particular sort of probability weighted density. The intuitively correct way to sample from a probability weighted density is by using a natural form of rejection sampling. The sampling consists of two steps: First sample a proposal \(y\) from \(f\), then accept it with probability \(\pi\left(y\right)\).

### Proposition

The following methods gives samples from the probability weighted density \(g=\rho^{-1}f\left(x\right)\pi\left(x\right)\): i.) Sample a proposal \(y\) from \(f\), ii.) accept it with probability \(\pi\left(y\right)\). Moreover, if \(T\) is the number of proposals \(y\) until success, then \(T\) is geometric with probability parameter \(\rho\).

#### Proof

Let \(\left\{ Y_{i}\right\} _{i=1}^{\infty}\) be a sequence of proposals sampled from \(f\), let \(\left\{ Z_{i}\right\} _{i=1}^{\infty}\) be conditionally \(\textrm{Bernoulli}\left(\pi\left(Y_{i}\right)\right)\), and let \(T\) be the first time \(Z_{i}=1\). Letting \(V=Y_{T}\), the mathematical formalization of the algorithm above is to let \(V\) be our sample. Our goal is to prove that \(p\left(v\right)=g\left(v\right)\), where \(p\left(v\right)\) is the density of \(V\). First observe that \(p\left(T=1,v\right)=\pi\left(v\right)f\left(v\right)\). Using this observation we see that \(p\left(T=1\right)=\int f\left(y\right)\pi\left(y\right)dy=\rho\), and since the attempts are independent, the distribution of \(T\) is geometric. As \(V\) is independent of \(T\), \(p\left(v\mid T=1\right)=p\left(v\right)\), and since \(p\left(v\mid T=1\right)=p\left(T=1,v\right)\rho^{-1},\)we get that \(p\left(v\right)=\rho^{-1}\pi\left(v\right)f\left(v\right)\) as claimed.

### Connection to rejection sampling

Assume \(f,g\) are densities such that \(f\left(x\right)\leq Mg\left(x\right)\) for every \(x\). Then \(f\) is equal to the probability weighted model \(f\left(x\right)\propto g\left(x\right)p\left(x\right)\), where \(p\left(x\right)=\frac{f\left(x\right)}{Mg\left(x\right)}\). The usual rejection sampling methodology follows, together with the common remark (in e.g. Robert and Casella (2013)) that \(M\) is the mean number of proposals before rejection.

## References

Neumann, John von. 1951. “Various Techniques Used in Connection with Random Digits.” *Applied Math Series* 12 (36-38): 1.

Rao, C Radhakrishna. 1965. “On Discrete Distributions Arising Out of Methods of Ascertainment.” *Sankhyā: The Indian Journal of Statistics, Series A*, 311–24.

Robert, Christian, and George Casella. 2013. *Monte Carlo Statistical Methods*. Springer Science & Business Media.