Problem Set 7

Due Monday December 8 at 5PM

Final exam prep

The final exam includes one problem on maximum likelihood estimation, and you can expect it to look similar to Problems 3 and 4 below. The final exam closes with one problem on Bayesian statistics, and you can expect it to look similar to Problems 5 and 6 below.

Problem 0

Doodle a cute character that will cheer you on during this assignment.

Problem 1

Let \(Z\sim\text{N}(0,\,1)\), and recall what that means:

\[ \begin{aligned} f_Z(z)&=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z^2\right), && -\infty<z<\infty\\ M_Z(t)&=\exp\left(\frac{1}{2}t^2\right), && -\infty<t<\infty. \end{aligned} \]

Next, let \(X=\mu+\sigma Z\) for some constants \(\mu\in\mathbb{R}\) and \(\sigma>0\).

  1. Use the change-of-variables formula to derive the density of \(X\). What is its distribution?
  2. What is the moment-generating function of \(X\)?
  3. What are the mean and variance of \(X\)? Make sure you justify your answer with some type of derivation.
  4. Consider \(X_1,\,X_2,\,...,\,X_n\overset{\text{iid}}{\sim}\text{N}(\mu,\,\sigma^2)\) and derive the distribution of their sum and their average.
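
If you want to sanity-check your answers to parts 3 and 4 numerically, here is an optional little simulation sketch in Python (the particular values of \(\mu\), \(\sigma\), and the sample sizes are arbitrary). It is not part of the required solution; it just lets you compare sample summaries with whatever you derive.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n = 2.0, 1.5, 100_000      # arbitrary values chosen for illustration

# Build X = mu + sigma * Z from standard normal draws Z ~ N(0, 1)
z = rng.standard_normal(n)
x = mu + sigma * z

print("sample mean of X:    ", x.mean())    # compare with your answer for E(X)
print("sample variance of X:", x.var())     # compare with your answer for Var(X)

# Part 4 check: variance of the average of 50 iid N(mu, sigma^2) draws,
# estimated from 10,000 replications of the experiment
xbar = (mu + sigma * rng.standard_normal((10_000, 50))).mean(axis=1)
print("variance of the sample average (n = 50):", xbar.var())
```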

Problem 2

If the discrete random variable \(X\) has the funky-ass distribution (FAS), then its range is \(\text{Range}(X)=\{0,\,1,\,2,\,...\}\) and its pmf is

\[ P(X=k)=\binom{k+r-1}{k}(1-p)^kp^r,\quad k=0,\,1,\,2,\,3,\,..., \]

where the parameters \(r\in\mathbb{N}\) and \(0<p<1\) are constants. We denote this \(X\sim\text{FAS}(r,\,p)\).

  1. Use what you know about probability to find the value of this infinite series:

\[ \sum\limits_{k=0}^\infty \binom{k+r-1}{k}(1-p)^k. \]

  2. Compute the MGF of \(X\);
  3. Use the definition of the expected value to compute the mean of \(X\);
  4. Use the MGF to compute the mean of \(X\) and verify that you get the same answer you got before;
  5. Consider an iid collection \(X_1,\,X_2,\,...,\,X_n\overset{\text{iid}}{\sim}\text{FAS}(r,\,p)\) and derive the distribution of the sum \(S_n=\sum_{i=1}^nX_i\).
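
As an optional numerical check on parts 1, 3, and 4, you can evaluate a large partial sum of the pmf directly. The sketch below uses arbitrary values of \(r\), \(p\), and the truncation point \(K\); compare the printed numbers with your closed-form answers.

```python
import numpy as np
from scipy.special import comb

r, p = 4, 0.3          # arbitrary parameter values for illustration
K = 2_000              # truncation point; the tail beyond K is negligible here

k = np.arange(K + 1)
pmf = comb(k + r - 1, k) * (1 - p) ** k * p ** r

print("sum of the pmf over k = 0..K:    ", pmf.sum())   # should be very close to 1
print("truncated series from part 1:    ", (comb(k + r - 1, k) * (1 - p) ** k).sum())
print("numerical mean for parts 3 and 4:", (k * pmf).sum())
```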

Problem 3

Consider these data:

\[ X_1,\,X_2,\,...,\,X_n\overset{\text{iid}}{\sim}\text{N}(0,\,\theta). \]

  1. What is the maximum likelihood estimator of \(\theta>0\)?
  2. What is the sampling distribution of the estimator?
  3. What is the MSE of the estimator?
  4. Based on the MSE, what are the statistical properties of this estimator?
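
A numerical optimizer is a handy way to double-check an MLE derivation without knowing the closed form in advance. The optional sketch below (Python with SciPy, interpreting \(\theta\) as the variance, on an arbitrary simulated dataset) maximizes the log-likelihood numerically; the result should agree with whatever formula you derive in part 1.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
theta_true = 2.5                       # arbitrary "true" variance for the simulation
x = rng.normal(loc=0.0, scale=np.sqrt(theta_true), size=500)

def neg_log_lik(theta):
    # Negative log-likelihood of N(0, theta), treating theta as the variance
    return -norm.logpdf(x, loc=0.0, scale=np.sqrt(theta)).sum()

result = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print("numerical MLE of theta:", result.x)   # compare with your closed-form answer
```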

Problem 4

Let \(X_1\), \(X_2\), …, \(X_n\) be iid from some member of this parametric family:

\[ f(x\,|\,\theta) = \frac{1}{2\theta}\exp\left(-\frac{|x|}{\theta}\right), \quad -\infty<x<\infty. \]

  1. What is the maximum likelihood estimator of \(\theta>0\)?
  2. What is the sampling distribution of the estimator?
  3. What is the MSE of the estimator?
  4. Based on the MSE, what are the statistical properties of this estimator?
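
The same numerical trick works here, and a small Monte Carlo study speaks to parts 2 and 3: repeat the experiment many times and look at the spread of the resulting estimates. The optional sketch below uses arbitrary values for the true \(\theta\), the sample size, and the number of replications; compare the empirical bias and MSE with your analytical answers.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
theta_true, n, n_reps = 2.0, 200, 2_000    # arbitrary settings for the simulation

def numerical_mle(x):
    # Minimize the negative log-likelihood of f(x | theta) = exp(-|x| / theta) / (2 * theta)
    nll = lambda theta: len(x) * np.log(2 * theta) + np.abs(x).sum() / theta
    return minimize_scalar(nll, bounds=(1e-6, 50.0), method="bounded").x

estimates = np.array([
    numerical_mle(rng.laplace(loc=0.0, scale=theta_true, size=n))
    for _ in range(n_reps)
])

print("average of the estimates:", estimates.mean())                       # bias check
print("Monte Carlo MSE:         ", ((estimates - theta_true) ** 2).mean())
```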

Problem 5

Imagine that I quit my job and open a factory that manufactures bow ties and light bulbs (Zito’s Bows and Bulbs). The ties are alright, but the bulbs suck. They burn out real quick. Each bulb is slightly different, and you can’t perfectly predict how long it will last, so the time (in hours) until a bulb dies is a random variable \(X\). Let’s assume \(X\sim\text{Exponential}(\lambda)\), where \(\lambda>0\) is unknown. Recall that \(E(X)=1/\lambda\), so the larger the rate, the sooner the bulb burns out, on average.

I want to estimate \(\lambda\) to get a sense of how bad my light bulbs are, so I sample \(n\) bulbs and record their burnout times:

\[ X_1,\,X_2,\,...,\,X_n\overset{\text{iid}}{\sim}\text{Exponential}(\lambda). \]

At this point, I could just use the method of maximum likelihood to estimate \(\lambda\) like we did in lecture, but before I do, I go and consult Great Grandma Zito. She’s been making bad light bulbs for decades and taught me everything that I know. She says that in her experience, \(\lambda\) is in the ballpark of 1 (meaning our bulbs burn out in an hour, on average), but there’s uncertainty about that. In her opinion, the probability that \(\lambda>3\) is about 1.7%.

I want to incorporate my grandmother’s prior knowledge into my analysis, so I decide to be Bayesian:

\[ \begin{aligned} \lambda &\sim\text{Gamma}(\alpha_0,\,\beta_0) && \text{(prior)}\\ X_1,\,X_2,\,...,\,X_n\,|\,\lambda&\overset{\text{iid}}{\sim}\text{Exponential}(\lambda) && \text{(data model)}. \end{aligned} \]

\(\lambda \sim\text{Gamma}(\alpha_0,\,\beta_0)\) is my prior distribution for the unknown parameter, and \(\alpha_0,\,\beta_0>0\) are hyperparameters that I will tune in order to encode the prior knowledge about \(\lambda\) that my grandmother described. I chose the gamma family simply because it is convenient and familiar to me, and I know that \(\lambda\) is a continuous numerical quantity that must be positive.

  1. Show that the posterior distribution for \(\lambda\) in this model is

    \[ \lambda\,|\,X_{1:n}=x_{1:n} \sim \text{Gamma}(\alpha_n,\,\beta_n). \]

    After we see some data, what are the revised hyperparameters \(\alpha_n,\,\beta_n\) equal to?

Pay attention to the notation here

Before I see any data, \(\text{Gamma}(\alpha_0,\,\beta_0)\) summarizes my beliefs about \(\lambda\). After I see some data, \(\text{Gamma}(\alpha_n,\,\beta_n)\) summarizes my beliefs about \(\lambda\). \(\alpha_0\) and \(\beta_0\) are the prior hyperparameters, and \(\alpha_n\) and \(\beta_n\) are the posterior hyperparameters. The subscript indicates how much data my beliefs are based on.

  2. Show that the posterior mean has the form

    \[ E(\lambda\,|\,X_1,\,X_2,\,...,\,X_n)=w_n\hat{\lambda}_n^{(\text{MLE})} + (1-w_n)\underbrace{E(\lambda)}_{\text{prior mean}}, \]

    where \(w_n\in(0,\,1)\) might depend on the data. This means that the posterior mean is a shrinkage estimator. We shrink the MLE toward our prior estimate of the parameter.

  3. How should the prior hyperparameters \(\alpha_0\) and \(\beta_0\) be set so that the prior distribution captures my grandmother’s beliefs about \(\lambda\)? There’s not a tremendous amount of math here. It’s just trial-and-error until you find numbers that work (see the optional sketch below if you want to script the search).
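
Since the tuning in part 3 is trial-and-error anyway, it helps to script the check. The optional sketch below (Python with SciPy) prints the prior mean and \(P(\lambda>3)\) for candidate hyperparameters; the \(a_0,\,b_0\) values shown are placeholders, not the answer. It assumes \(\beta_0\) is a rate parameter, so SciPy’s scale is \(1/\beta_0\); if your parametrization uses a scale, drop the reciprocal.

```python
from scipy.stats import gamma

def check_prior(a0, b0):
    # Gamma(a0, b0) with b0 a *rate*, so SciPy's scale parameter is 1 / b0
    prior = gamma(a=a0, scale=1.0 / b0)
    print(f"a0 = {a0}, b0 = {b0}: prior mean = {prior.mean():.3f}, "
          f"P(lambda > 3) = {prior.sf(3.0):.4f}")

# Placeholder values below, not the answer -- adjust until the prior mean is
# near 1 and P(lambda > 3) is near 0.017
check_prior(1.0, 1.0)
check_prior(3.0, 3.0)
```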

Problem 6

Consider the following Bayesian model:

\[ \begin{aligned} p&\sim\text{Beta}(a_0,\,b_0)&& \text{(prior)}\\ X_1,\,X_2,\,...,\,X_n\,|\,p&\overset{\text{iid}}{\sim}\text{Geometric}(p) && \text{(likelihood)}. \end{aligned} \]

  1. What is the posterior distribution?
  2. Compute the posterior mean and show that it is a weighted average of the prior mean and the maximum likelihood estimator.
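
If you want to check your part 1 answer numerically, a grid evaluation of prior \(\times\) likelihood works without knowing the conjugate form in advance. The optional sketch below uses arbitrary settings and SciPy’s geom, whose pmf is \((1-p)^{x-1}p\) for \(x=1,\,2,\,...\); if your Geometric convention counts failures starting at 0, adjust the likelihood accordingly.

```python
import numpy as np
from scipy.stats import beta as beta_dist, geom

rng = np.random.default_rng(2)
a0, b0, p_true, n = 2.0, 2.0, 0.3, 50            # arbitrary settings for the check
x = geom.rvs(p_true, size=n, random_state=rng)   # pmf (1 - p)^(x - 1) * p, x = 1, 2, ...

# Unnormalized log-posterior on a grid: log prior density plus log-likelihood
grid = np.linspace(1e-4, 1 - 1e-4, 5_000)
log_post = beta_dist.logpdf(grid, a0, b0) + sum(geom.logpmf(xi, grid) for xi in x)
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])         # normalize numerically

# Compare this curve with the Beta(a_n, b_n) density you derive in part 1,
# e.g. by overlaying beta_dist.pdf(grid, a_n, b_n) on a plot of (grid, post).
print("grid-based posterior mean:", (grid * post).sum() * (grid[1] - grid[0]))
```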

Submission

You are free to compose your solutions for this problem set however you wish (scan or photograph written work, handwriting capture on a tablet device, LaTeX, Quarto, whatever) as long as the final product is a single PDF file. You must upload this to Gradescope and mark the pages associated with each problem.

Do not forget to include the following:

  • For each problem, please acknowledge your collaborators;
  • If a problem required you to code something, please include both the code and the output. “Including the code” can be as crude as a screenshot, but you might also use Quarto to get a nice lil’ PDF that you can merge with the rest of your submission.