Hypergeometric distribution

Imagine we have a finite population containing \(m\) “success” cases and \(n\) “failure” cases, for a total population size of \(m+n\). If we sample \(k\) cases from the population without replacement, then a hypergeometric random variable counts the number of sampled cases that are successes.
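To make the sampling story concrete, here is a minimal simulation sketch in base R. The values of m, n, and k and the names population, one_draw, and sims are made up for illustration; they are not from the problem set.

```r
# Hypothetical example values: 7 successes, 5 failures, sample of 4
m <- 7
n <- 5
k <- 4

# Represent the population as 1s (successes) and 0s (failures)
population <- c(rep(1, m), rep(0, n))

set.seed(42)

# One realization: sample k cases without replacement, count the successes
one_draw <- sum(sample(population, size = k, replace = FALSE))

# Repeat many times to see the distribution of the count
sims <- replicate(5000, sum(sample(population, size = k, replace = FALSE)))
table(sims) / length(sims)  # empirical frequencies of each count
```

Each entry of sims is one draw of a hypergeometric random variable, so the empirical frequencies should roughly track the true probabilities.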

You first met this distribution on Problem Set 4 when studying contested elections.

Basic properties

Notation \(X\sim\text{HG}(m, n, k)\)
Range \(\{\max(0,\,k-n),\,...,\,\min(k,\,m)\}\), which equals \(\{0,\,1,\,2,\,3,\,...,\,k-1,\,k\}\) when \(k\leq m\) and \(k\leq n\)
PMF \(P(X = x) = \binom{m}{x} \binom{n}{k-x}/\binom{m+n}{k}\)
Expectation \(km/(m+n)\)
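As a sanity check on these properties, the sketch below evaluates the PMF formula directly with choose() and compares it to R's built-in dhyper(). The values of m, n, and k are hypothetical, chosen only for illustration.

```r
# Hypothetical example values: 7 successes, 5 failures, sample of 4
m <- 7
n <- 5
k <- 4
x <- 0:k

# PMF computed directly from the formula
pmf_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

# PMF from R's built-in function
pmf_builtin <- dhyper(x, m, n, k)

# The two should agree up to floating-point error
all.equal(pmf_formula, pmf_builtin)

# The probabilities should sum to one
sum(pmf_formula)

# The mean computed from the PMF should match km/(m+n)
sum(x * pmf_formula)
k * m / (m + n)
```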

R commands

R provides a suite of commands for working with the hypergeometric distribution; their documentation is available by running ?dhyper:

dhyper(x, m, n, k)     # PMF: P(X = x)
phyper(q, m, n, k)     # CDF: P(X <= q)
qhyper(p, m, n, k)     # quantile function (inverse CDF)
rhyper(ndraw, m, n, k) # random numbers
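Here is a short usage sketch of the four commands with concrete numbers. The parameter values and the names p_eq, p_le, med, and draws are hypothetical, picked only to illustrate the calls.

```r
# Hypothetical example values: 7 successes, 5 failures, sample of 4
m <- 7
n <- 5
k <- 4

# P(X = 2)
p_eq <- dhyper(2, m, n, k)

# P(X <= 2), which equals the sum of the PMF over 0, 1, 2
p_le <- phyper(2, m, n, k)
sum(dhyper(0:2, m, n, k))

# Median: the smallest x such that P(X <= x) >= 0.5
med <- qhyper(0.5, m, n, k)

# 10,000 random draws; their average should be close to km/(m+n)
set.seed(1)
draws <- rhyper(10000, m, n, k)
mean(draws)
k * m / (m + n)
```

Note that the first argument of rhyper is the number of draws to generate, not the sample size \(k\).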