Normal distribution
I usually have a little blurb up here describing how the distribution arises or when you might use it, but the normal distribution is so central to probability and statistics that words fail. It shows up everywhere.
Basic properties
| Notation | \(X\sim\text{N}(\mu,\,\sigma^2)\) |
| Range | \(\mathbb{R}=(-\infty,\,\infty)\) |
| Parameter space | \(\begin{matrix}-\infty<\mu<\infty\\\sigma^2>0\end{matrix}\) |
| PDF | \(f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)\) |
| CDF | \(F(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^x\exp\left(-\frac{1}{2}\frac{(t-\mu)^2}{\sigma^2}\right)\,\text{d}t\) |
| MGF | \(M(t) = \exp\left(\mu t+\frac{\sigma^2}{2}t^2\right)\), \(t\in\mathbb{R}\) |
| Expectation | \(\mu\) |
| Variance | \(\sigma^2\) |
Fun facts:
- This is also known as the Gaussian distribution;
- N(0, 1) is called the standard normal;
- The density of the normal is the familiar bell curve;
- We do not have a closed form for the CDF. We can approximate it arbitrarily well on a computer, but we can't actually simplify the integral and get a neat formula. Bummer!
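To make that last point concrete, here is a minimal sketch that approximates the CDF by numerically integrating the density and compares the answer to R's built-in pnorm. The helper name normal_cdf_numeric is made up for this illustration, and integrate is just one general-purpose way to do the approximation; it is not necessarily how pnorm works internally.

```r
# Hypothetical helper: approximate F(x) for N(mu, sigma^2) by numerically
# integrating the density from -Inf up to x.
normal_cdf_numeric <- function(x, mu = 0, sigma = 1) {
  integrate(dnorm, lower = -Inf, upper = x, mean = mu, sd = sigma)$value
}

approx_val  <- normal_cdf_numeric(1.5, mu = 0, sigma = 1)
builtin_val <- pnorm(1.5, mean = 0, sd = 1)

# The two values should agree up to the integrator's tolerance
abs(approx_val - builtin_val)
```

The difference should be tiny, which is the sense in which the lack of a closed form is an inconvenience rather than an obstacle.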
R commands
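The suite of commands that let you work with the normal distribution in R is dnorm (density), pnorm (CDF), qnorm (quantile function), and rnorm (random draws). All four are documented on the same help page, which you can open with ?dnorm.
You will get burned by this at least once, I guarantee it, so please take note: these functions are parameterized by the standard deviation \(\sigma\), not the variance \(\sigma^2\). If you want to compute \(P(X\leq -2.4)\) for \(X\sim\text{N}(4, 9)\), you call pnorm(-2.4, 4, 3), not pnorm(-2.4, 4, 9), because \(\text{sd}(X)=\sqrt{\text{var}(X)}=3\).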
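Here is a small sketch of that exact example, showing the correct call next to the mistaken one. The variable names are only for illustration. The third line double-checks the correct answer by standardizing, using \(P(X\leq x)=P\left(Z\leq\frac{x-\mu}{\sigma}\right)\) for \(Z\sim\text{N}(0, 1)\).

```r
# X ~ N(4, 9): mean 4, variance 9, so the standard deviation is 3
mu <- 4
sigma2 <- 9

p_right <- pnorm(-2.4, mean = mu, sd = sqrt(sigma2))  # correct: pass the sd
p_wrong <- pnorm(-2.4, mean = mu, sd = sigma2)        # the mistake: passing the variance

# Sanity check by standardizing: P(X <= x) = P(Z <= (x - mu) / sigma)
p_std <- pnorm((-2.4 - mu) / sqrt(sigma2))

c(right = p_right, wrong = p_wrong, standardized = p_std)
```

The mistaken call runs without any warning and returns a probability more than ten times too large, which is why this error is so easy to miss.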
Play around!
As we have seen, the parameters \(\mu\) and \(\sigma\) have multiple interpretations. From a probabilistic point of view, they are the mean \(\mu=E(X)\) and the standard deviation \(\sigma=\sqrt{\text{var}(X)}\). From a purely geometric point of view, they control the shape of the density curve: \(\mu\) is the location of the peak, and \(\sigma\) is the distance from \(\mu\) to each of the two inflection points, which sit at \(\mu\pm\sigma\). You worked that out on Problem Set 0.
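If you want to check the geometry numerically rather than by calculus, here is a rough sketch. The values of mu and sigma are arbitrary choices, and second_deriv is a hypothetical helper that approximates the second derivative of the density with a central finite difference. The inflection points are where that second derivative crosses zero, and the peak is where the density is largest.

```r
mu <- 1
sigma <- 2

# Hypothetical helper: central finite-difference approximation to f''(x)
second_deriv <- function(x, h = 1e-3) {
  (dnorm(x + h, mu, sigma) - 2 * dnorm(x, mu, sigma) + dnorm(x - h, mu, sigma)) / h^2
}

# f'' is negative at mu and positive far out in either tail, so it has a
# root on each side of mu; those roots are the inflection points
inflect_right <- uniroot(second_deriv, interval = c(mu, mu + 3 * sigma))$root
inflect_left  <- uniroot(second_deriv, interval = c(mu - 3 * sigma, mu))$root

# Location of the peak: maximize the density over a wide interval
peak <- optimize(function(x) dnorm(x, mu, sigma),
                 interval = c(-10, 10), maximum = TRUE)$maximum

c(left = inflect_left, peak = peak, right = inflect_right)
```

Up to numerical error, the three numbers should come out as \(\mu-\sigma\), \(\mu\), and \(\mu+\sigma\).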
#| standalone: true
#| viewerHeight: 700
library(shiny)
ui <- fluidPage(
  titlePanel("Normal distribution CDF and PDF"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("mu", "Mean (μ):",
                  min = -5, max = 5, value = 0, step = 0.1),
      sliderInput("sigma", "Standard Deviation (σ):",
                  min = 0.5, max = 3, value = 1, step = 0.1)
    ),
    mainPanel(
      plotOutput("distPlot", height = "600px")
    )
  )
)
server <- function(input, output) {
  output$distPlot <- renderPlot({
    mu <- input$mu
    sigma <- input$sigma
    # Fixed x range
    x <- seq(-10, 10, length.out = 1000)
    # Compute values
    pdf_vals <- dnorm(x, mean = mu, sd = sigma)
    cdf_vals <- pnorm(x, mean = mu, sd = sigma)
    # Inflection points at mu ± sigma
    inflect_left <- mu - sigma
    inflect_right <- mu + sigma
    # Fixed y limits
    pdf_ylim <- c(0, 0.8)
    par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
    # --- CDF Plot ---
    plot(x, cdf_vals, type = "l", lwd = 2, col = "blue",
         xlim = c(-10, 10), ylim = c(0, 1),
         main = "Cumulative Distribution Function (CDF)",
         xlab = "", ylab = "F(x)")
    abline(h = c(0, 1), col = "gray80", lty = 2)
    abline(v = mu, col = "gray60", lty = 3)
    # --- PDF Plot ---
    plot(x, pdf_vals, type = "l", lwd = 2, col = "darkred",
         xlim = c(-10, 10), ylim = pdf_ylim,
         main = "Probability Density Function (PDF)",
         xlab = "x", ylab = "f(x)")
    # Vertical lines at mean and inflection points
    abline(v = mu, col = "gray60", lty = 3)
    abline(v = c(inflect_left, inflect_right), col = "gray70", lty = 2)
    # Arrows showing sigma distance
    y_arrow <- 0.05
    arrows(mu, y_arrow, inflect_right, y_arrow, code = 3, angle = 10, length = 0.1)
    arrows(mu, y_arrow, inflect_left, y_arrow, code = 3, angle = 10, length = 0.1)
    # Label σ between mean and inflection points
    text(mu + sigma / 2, y_arrow + 0.03, expression(sigma), cex = 1.1)
    text(mu - sigma / 2, y_arrow + 0.03, expression(sigma), cex = 1.1)
    # Label μ in the margin below the x-axis
    mtext(expression(mu), side = 1, line = 2.2, at = mu, cex = 1.2)
  })
}
shinyApp(ui = ui, server = server)
Derivations
\[ \begin{aligned} M(t)&=E[e^{tX}] && \text{definition}\\ &=\int_{-\infty}^\infty e^{tx}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\,\textrm{d}x && \text{LOTUS}\\ &=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2+tx}\,\textrm{d}x && \text{combine base-$e$ terms}\\ &=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x^2-2tx)}\,\textrm{d}x && \text{factor out -1/2}\\ &=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x^2-2tx+t^2-t^2)}\,\textrm{d}x && \text{add/subtract $t^2$}\\ &=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}[(x-t)^2-t^2]}\,\textrm{d}x && \text{factor first three terms}\\ &=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-t)^2+\frac{1}{2}t^2}\,\textrm{d}x && \text{distribute -1/2}\\ &=\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-t)^2}e^{\frac{1}{2}t^2}\,\textrm{d}x\\ &=e^{\frac{1}{2}t^2}\int_{-\infty}^\infty\underbrace{\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(x-t)^2}}_{\textrm{N}(t,\,1)\text{ PDF}}\,\textrm{d}x&&\text{pull constant out of integral}\\ &=e^{\frac{1}{2}t^2}\cdot 1&&\text{PDFs integrate to 1}\\ &=e^{\frac{1}{2}t^2}. \end{aligned} \]
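This works for any \(t\in\mathbb{R}\). Notice that the derivation uses the \(\text{N}(0,\,1)\) density, so what it gives is the MGF of the standard normal. The general formula follows by writing \(X=\mu+\sigma Z\) with \(Z\sim\text{N}(0,\,1)\): then \(M_X(t)=E[e^{t(\mu+\sigma Z)}]=e^{\mu t}M_Z(\sigma t)=\exp\left(\mu t+\frac{\sigma^2}{2}t^2\right)\).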
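If you would like some reassurance that the algebra is right, here is a minimal numerical check. It computes \(E[e^{tX}]\) by brute-force integration and compares it with the MGF formula from the table of basic properties. The helper names mgf_numeric and mgf_formula are made up for this sketch, and the values of \(t\), \(\mu\), and \(\sigma\) are arbitrary. The integrand is assembled on the log scale so that \(e^{tx}\) cannot overflow far out in the tail.

```r
# Hypothetical helper: E[exp(tX)] for X ~ N(mu, sigma^2) by numerical integration.
# exp(t*x) * f(x) is computed as exp(t*x + log f(x)) to avoid Inf * 0 in the tails.
mgf_numeric <- function(t, mu = 0, sigma = 1) {
  integrand <- function(x) exp(t * x + dnorm(x, mean = mu, sd = sigma, log = TRUE))
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Closed form from the table of basic properties
mgf_formula <- function(t, mu = 0, sigma = 1) {
  exp(mu * t + sigma^2 * t^2 / 2)
}

# Standard normal at t = 0.7, then N(4, 9) at t = 0.3
std_check <- c(numeric = mgf_numeric(0.7), formula = mgf_formula(0.7))
gen_check <- c(numeric = mgf_numeric(0.3, mu = 4, sigma = 3),
               formula = mgf_formula(0.3, mu = 4, sigma = 3))
std_check
gen_check
```

In each pair the two entries should match to several decimal places. Of course a numerical check at a couple of points is not a proof, but it is a cheap way to catch a sign error.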
