class: center, middle, inverse, title-slide

# STA 610L: Module 2.4
## Random effects ANOVA (Bayesian estimation I)
### Dr. Olanrewaju Michael Akande

---

## Introduction

Bayesian estimation is usually the approach of choice for fitting hierarchical models.

Two major advantages include

- more tractable estimation and computation, particularly in complex, highly structured, or generalized linear models; and

- straightforward uncertainty quantification.

---

## Hierarchical normal model

Recall our random effects ANOVA model:
`\begin{eqnarray*}
y_{ij}&=&\mu_j+\varepsilon_{ij}
\end{eqnarray*}`

where

+ `\(\mu_j=\mu +\alpha_j\)`, and

+ `\(\alpha_j \overset{iid}{\sim} N\left(0,\tau^2\right) \perp \varepsilon_{ij} \overset{iid}{\sim} N\left(0,\sigma^2\right)\)`,

so that `\(\mu_j \overset{iid}{\sim} N\left(\mu,\tau^2\right)\)`.

To do Bayesian estimation, we also need to specify a prior distribution for `\((\mu,\tau^2,\sigma^2)\)`, which we will write as `\(p(\theta)=p(\mu,\tau^2,\sigma^2)\)`.

**Note**: this module should be a recap of derivations you should have covered in STA 360/601/602. Some of the notation might be different, so pay attention to those differences.

---

## Bayesian specification of the model

We can start with a .hlight[default semi-conjugate] prior specification given by
`$$p(\mu,\tau^2,\sigma^2)=p(\mu)p(\tau^2)p(\sigma^2),$$`

where
.block[
.small[
$$
`\begin{split}
\pi(\mu) & = \mathcal{N}\left(\mu_0, \gamma^2_0\right)\\
\pi(\tau^2) & = \mathcal{IG} \left(\dfrac{\eta_0}{2}, \dfrac{\eta_0\tau_0^2}{2}\right)\\
\pi(\sigma^2) & = \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right).\\
\end{split}`
$$
]
]

---

## Bayesian specification of the model

With this default prior specification, we have nice interpretations of the prior parameters.

- For `\(\mu\)`,
  + `\(\mu_0\)`: best guess of the average of the group averages
  + `\(\gamma^2_0\)`: set based on plausible ranges of values of `\(\mu\)`

- For `\(\tau^2\)`,
  + `\(\tau_0^2\)`: best guess of the variance of the group averages
  + `\(\eta_0\)`: set based on how tight the prior for `\(\tau^2\)` is around `\(\tau_0^2\)`

- For `\(\sigma^2\)`,
  + `\(\sigma_0^2\)`: best guess of the variance of individual responses around their respective group means
  + `\(\nu_0\)`: set based on how tight the prior for `\(\sigma^2\)` is around `\(\sigma_0^2\)`.

---

## Quick review: inverse-gamma distribution

If `\(\theta \sim \mathcal{IG}(a,b)\)`, then the pdf is
.block[
`$$p(\theta) = \frac{b^a}{\Gamma(a)} \theta^{-(a+1)} e^{-\frac{b}{\theta} }\ \ \ \textrm{for} \ \ \ a, b > 0,$$`
]

with

+ `\(\mathbb{E}[\theta] = \frac{b}{a - 1} \ \ \textrm{for} \ \ a > 1\)`;

+ `\(\mathbb{V}[\theta] = \frac{b^2}{(a - 1)^2(a-2)} \ \ \textrm{for} \ \ a > 2\)`;

+ `\(\textrm{Mode}[\theta] = \frac{b}{a+1}\)`.

---

## Implications of the priors

Using an `\(\mathcal{IG} \left(\dfrac{\eta_0}{2}, \dfrac{\eta_0\tau_0^2}{2}\right)\)` prior for `\(\tau^2\)`, we see that our best guess of the variance of the group averages, `\(\tau_0^2\)`, is somewhere in the "center" of the distribution (between the mode `\(\frac{\eta_0\tau_0^2}{\eta_0+2}\)` and the mean `\(\frac{\eta_0\tau_0^2}{\eta_0-2}\)`).

As the "prior sample size" or "prior degrees of freedom" `\(\eta_0\)` increases, the difference between these quantities goes to 0.

We have similar implications for the prior `\(\pi(\sigma^2) = \mathcal{IG} \left(\dfrac{\nu_0}{2}, \dfrac{\nu_0\sigma_0^2}{2}\right)\)`.
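---

## Implications of the priors

As a quick numerical check of these implications, here is a minimal sketch in R (not part of the original derivations): it draws from an `\(\mathcal{IG}(\eta_0/2, \eta_0\tau_0^2/2)\)` distribution as the reciprocal of a gamma random variable and shows the draws concentrating around `\(\tau_0^2\)` as `\(\eta_0\)` grows. The values of `tau0_sq` and `eta0` below are arbitrary illustrative choices.

```r
set.seed(610)

# draw from IG(a, b) as 1 / Gamma(shape = a, rate = b)
rinvgamma <- function(n, a, b) 1 / rgamma(n, shape = a, rate = b)

tau0_sq <- 4  # prior best guess of the across-group variance (illustrative)

for (eta0 in c(2, 10, 100)) {
  draws <- rinvgamma(1e5, a = eta0 / 2, b = eta0 * tau0_sq / 2)
  cat("eta0 =", eta0,
      "| mode =", round(eta0 * tau0_sq / (eta0 + 2), 2),
      "| empirical median =", round(median(draws), 2), "\n")
}
```

As `\(\eta_0\)` increases, the mode, median, and mean all approach `\(\tau_0^2\)`, matching the interpretation of `\(\eta_0\)` as a prior sample size.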
---

## Fully specified model

We have now fully specified our model with the following components.

1. Unknown parameters `\((\mu, \tau^2, \sigma^2, \mu_1, \ldots, \mu_J)\)`

2. Prior distributions, specified in terms of the prior guesses `\((\mu_0,\tau_0^2,\sigma_0^2)\)`, the prior variance `\(\gamma^2_0\)`, and the prior sample sizes or prior degrees of freedom `\((\eta_0,\nu_0)\)`

3. Data from our groups.

We can then interrogate the posterior distribution of the parameters using Gibbs sampling, as the full conditional distributions have closed forms.

---

## Full conditionals

- For the full conditionals we will derive here, we will take advantage of results from the regular univariate normal model (from STA 360/601/602). For a refresher, see [here](https://sta-602l-s21.github.io/Course-Website/slides/3-5-normal-joint-inference.html#1).

--

- Recall that if we assume
.block[
.small[
`$$y_i \sim \mathcal{N}(\mu, \sigma^2), \ \ i=1, \ldots, n,$$`
]
]

--

  and set our priors to be
.block[
.small[
$$
`\begin{split}
\pi(\mu) & = \mathcal{N}\left(\mu_0, \gamma_0^2\right),\\
\pi(\sigma^2) & = \mathcal{IG}\left(\dfrac{\nu_0}{2}, \dfrac{\nu_0 \sigma_0^2}{2}\right),\\
\end{split}`
$$
]
]

--

  then we have
.block[
.small[
$$
`\begin{split}
\pi(\mu, \sigma^2 | Y) & \boldsymbol{\propto} \left\{ \prod_{i=1}^{n} p(y_{i} | \mu, \sigma^2 ) \right\} \cdot \pi(\mu) \cdot \pi(\sigma^2). \\
\end{split}`
$$
]
]

---

## Full conditionals

- We have
.block[
.small[
$$
`\begin{split}
\pi(\mu | \sigma^2, Y) & = \mathcal{N}\left(\mu_n, \gamma_n^2\right),\\
\end{split}`
$$
]
]

  where
.block[
.small[
$$
`\begin{split}
\gamma^2_n = \dfrac{1}{ \dfrac{n}{\sigma^2} + \dfrac{1}{\gamma_0^2} }; \ \ \ \ \ \ \ \ \mu_n & = \gamma^2_n \left[ \dfrac{n}{\sigma^2} \bar{y} + \dfrac{1}{\gamma_0^2} \mu_0 \right],
\end{split}`
$$
]
]

--

- and
.block[
.small[
$$
`\begin{split}
\pi(\sigma^2 | \mu,Y) \boldsymbol{=} \mathcal{IG}\left(\dfrac{\nu_n}{2}, \dfrac{\nu_n\sigma_n^2}{2}\right),
\end{split}`
$$
]
]

  where
.block[
.small[
$$
`\begin{split}
\nu_n & = \nu_0 + n; \ \ \ \ \ \ \ \sigma_n^2 = \dfrac{1}{\nu_n} \left[ \nu_0 \sigma_0^2 + \sum^n_{i=1} (y_i - \mu)^2 \right].\\
\end{split}`
$$
]
]
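---

## Full conditionals

To make the one-sample results concrete before we move to the hierarchical model, here is a minimal sketch in R (not from the original derivations) of a single Gibbs scan that draws `\(\mu\)` and then `\(\sigma^2\)` from the two full conditionals above. The data `y`, the prior settings, and the starting values are all arbitrary illustrative choices.

```r
set.seed(610)
y <- rnorm(20, mean = 50, sd = 10)   # illustrative data
n <- length(y)
mu0 <- 50; g0_sq <- 25               # prior for mu: N(mu0, g0_sq)
nu0 <- 1; s0_sq <- 100               # prior for sigma^2: IG(nu0/2, nu0*s0_sq/2)

mu <- mean(y); sigma_sq <- var(y)    # starting values

# draw mu | sigma^2, y
gn_sq <- 1 / (n / sigma_sq + 1 / g0_sq)
mun   <- gn_sq * (n * mean(y) / sigma_sq + mu0 / g0_sq)
mu    <- rnorm(1, mean = mun, sd = sqrt(gn_sq))

# draw sigma^2 | mu, y
nun   <- nu0 + n
sn_sq <- (nu0 * s0_sq + sum((y - mu)^2)) / nun
sigma_sq <- 1 / rgamma(1, shape = nun / 2, rate = nun * sn_sq / 2)
```

Repeating these two draws many times (and saving each pair) yields a Markov chain whose stationary distribution is the joint posterior `\(\pi(\mu, \sigma^2 | Y)\)`.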
---

## Posterior inference

- Our hierarchical model can be written as
.block[
.small[
$$
`\begin{split}
y_{ij} | \mu_j, \sigma^2 & \sim \mathcal{N} \left( \mu_j, \sigma^2\right); \ \ \ i = 1,\ldots, n_j\\
\mu_j | \mu, \tau^2 & \sim \mathcal{N} \left( \mu, \tau^2 \right); \ \ \ j = 1, \ldots, J.
\end{split}`
$$
]
]

--

- Under our prior specification, we can factor the posterior as follows:
.block[
.small[
$$
`\begin{split}
\pi(\mu_1, \ldots, \mu_J, \mu, \sigma^2,\tau^2 | Y) & \boldsymbol{\propto} p(y | \mu_1, \ldots, \mu_J, \mu, \sigma^2,\tau^2)\\
& \ \ \ \ \times p(\mu_1, \ldots, \mu_J | \mu, \sigma^2,\tau^2)\\
& \ \ \ \ \times \pi(\mu, \sigma^2,\tau^2)\\
\\
& \boldsymbol{=} p(y | \mu_1, \ldots, \mu_J, \sigma^2 )\\
& \ \ \ \ \times p(\mu_1, \ldots, \mu_J | \mu,\tau^2)\\
& \ \ \ \ \times \pi(\mu) \cdot \pi(\sigma^2) \cdot \pi(\tau^2)\\
\\
& \boldsymbol{=} \left\{ \prod_{j=1}^{J} \prod_{i=1}^{n_j} p(y_{ij} | \mu_j, \sigma^2 ) \right\}\\
& \ \ \ \ \times \left\{ \prod_{j=1}^{J} p(\mu_j | \mu,\tau^2) \right\}\\
& \ \ \ \ \times\pi(\mu) \cdot \pi(\sigma^2) \cdot \pi(\tau^2)\\
\end{split}`
$$
]
]

---

## Full conditional for grand mean

- The full conditional distribution of `\(\mu\)` is proportional to the part of the joint posterior `\(\pi(\mu_1, \ldots, \mu_J, \mu, \sigma^2,\tau^2 | Y)\)` that involves `\(\mu\)`.

--

- That is,
.block[
.small[
$$
`\begin{split}
\pi(\mu | \mu_1, \ldots, \mu_J, \sigma^2,\tau^2, Y) & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\mu_j | \mu,\tau^2) \right\} \cdot \pi(\mu).
\end{split}`
$$
]
]

--

- This looks like the full conditional distribution from the one-sample normal case, so you can show that
.block[
.small[
$$
`\begin{split}
\pi(\mu | \mu_1, \ldots, \mu_J, \sigma^2,\tau^2, Y) & = \mathcal{N}\left(\mu_n, \gamma^2_n \right) \ \ \ \ \textrm{where}\\
\\
\gamma^2_n = \dfrac{1}{ \dfrac{J}{\tau^2} + \dfrac{1}{\gamma_0^2} } ; \ \ \ \ \ \ \ \ \mu_n = \gamma^2_n \left[ \dfrac{J}{\tau^2} \bar{\theta} + \dfrac{1}{\gamma_0^2} \mu_0 \right]
\end{split}`
$$
]
]

  and `\(\bar{\theta} = \frac{1}{J} \sum\limits^J_{j=1} \mu_j\)`.

---

## Full conditionals for group means

- Similarly, the full conditional distribution of each `\(\mu_j\)` is proportional to the part of the joint posterior `\(\pi(\mu_1, \ldots, \mu_J, \mu, \sigma^2,\tau^2 | Y)\)` that involves `\(\mu_j\)`.

--

- That is,
.block[
.small[
$$
`\begin{split}
\pi(\mu_j | \mu, \sigma^2,\tau^2, Y) & \boldsymbol{\propto} \left\{ \prod_{i=1}^{n_j} p(y_{ij} | \mu_j, \sigma^2 ) \right\} \cdot p(\mu_j | \mu,\tau^2) \\
\end{split}`
$$
]
]

--

- Those terms include a normal density for `\(\mu_j\)` multiplied by a product of normal densities in which `\(\mu_j\)` is the mean, again mirroring the one-sample case, so you can show that
.block[
.small[
$$
`\begin{split}
\pi(\mu_j | \mu, \sigma^2,\tau^2, Y) & = \mathcal{N}\left(\mu_j^\star, \nu_j^\star \right) \ \ \ \ \textrm{where}\\
\\
\nu_j^\star & = \dfrac{1}{ \dfrac{n_j}{\sigma^2} + \dfrac{1}{\tau^2} } ; \ \ \ \ \ \ \ \mu_j^\star = \nu_j^\star \left[ \dfrac{n_j}{\sigma^2} \bar{y}_j + \dfrac{1}{\tau^2} \mu \right]
\end{split}`
$$
]
]

---

## Full conditionals for group means

- Our estimate for each `\(\mu_j\)` is a weighted average of `\(\bar{y}_j\)` and `\(\mu\)`, ensuring that we are borrowing information across all levels through `\(\mu\)` and `\(\tau^2\)`.

--

- The weights in the weighted average are determined by the relative precisions from the data and from the second-level model.

--

- Groups with smaller `\(n_j\)` have estimated `\(\mu_j^\star\)` closer to `\(\mu\)` than groups with larger `\(n_j\)`.

--

- Thus, the degree of shrinkage of `\(\mu_j\)` depends on the ratio of the within-group variance to the between-group variance.

---

## Full conditionals for across-group variance

- The full conditional distribution of `\(\tau^2\)` is proportional to the part of the joint posterior `\(\pi(\mu_1, \ldots, \mu_J, \mu, \sigma^2,\tau^2 | Y)\)` that involves `\(\tau^2\)`.

--

- That is,
.block[
.small[
$$
`\begin{split}
\pi(\tau^2 | \mu_1, \ldots, \mu_J, \mu, \sigma^2, Y) & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} p(\mu_j | \mu,\tau^2) \right\} \cdot \pi(\tau^2)\\
\end{split}`
$$
]
]

--

- As in the case for `\(\mu\)`, this looks like the one-sample normal problem, and our full conditional posterior is
.block[
.small[
$$
`\begin{split}
\pi(\tau^2 | \mu_1, \ldots, \mu_J, \mu, \sigma^2, Y) & = \mathcal{IG} \left(\dfrac{\eta_n}{2}, \dfrac{\eta_n\tau_n^2}{2}\right) \ \ \ \ \textrm{where}\\
\\
\eta_n = \eta_0 + J ; \ \ \ \ \ \ \ \tau_n^2 & = \dfrac{1}{\eta_n} \left[\eta_0\tau_0^2 + \sum\limits_{j=1}^{J} (\mu_j - \mu)^2 \right].\\
\end{split}`
$$
]
]

---

## Full conditionals for within-group variance

- Finally, the full conditional distribution of `\(\sigma^2\)` is proportional to the part of the joint posterior `\(\pi(\mu_1, \ldots, \mu_J, \mu, \sigma^2,\tau^2 | Y)\)` that involves `\(\sigma^2\)`.

--

- That is,
.block[
.small[
$$
`\begin{split}
\pi(\sigma^2 | \mu_1, \ldots, \mu_J, \mu, \tau^2, Y) & \boldsymbol{\propto} \left\{ \prod_{j=1}^{J} \prod_{i=1}^{n_j} p(y_{ij} | \mu_j, \sigma^2 ) \right\} \cdot \pi(\sigma^2)\\
\end{split}`
$$
]
]

--

- We can again take advantage of the one-sample normal problem, so that our full conditional posterior (homework) is
.block[
.small[
$$
`\begin{split}
\pi(\sigma^2 | \mu_1, \ldots, \mu_J, \mu, \tau^2, Y) & = \mathcal{IG} \left(\dfrac{\nu_n}{2}, \dfrac{\nu_n\sigma_n^2}{2}\right) \ \ \ \ \textrm{where}\\
\\
\nu_n = \nu_0 + \sum\limits_{j=1}^{J} n_j ; \ \ \ \ \ \ \ \sigma_n^2 & = \dfrac{1}{\nu_n} \left[\nu_0\sigma_0^2 + \sum\limits_{j=1}^{J}\sum\limits_{i=1}^{n_j} (y_{ij} - \mu_j)^2 \right].\\
\end{split}`
$$
]
]
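---

## Putting the full conditionals together

Since all four full conditionals have closed forms, Gibbs sampling is straightforward. Below is a minimal sketch in R of one full Gibbs scan (not from the original slides); the object names (`y_groups`, `theta`, and the prior settings) are illustrative assumptions, with `y_groups` a list containing one numeric vector of responses per group and `theta` the current vector of group means `\(\mu_j\)`.

```r
gibbs_scan <- function(y_groups, theta, mu, tau_sq, sigma_sq,
                       mu0, g0_sq, eta0, t0_sq, nu0, s0_sq) {
  J    <- length(y_groups)
  nj   <- sapply(y_groups, length)
  ybar <- sapply(y_groups, mean)

  # 1. group means mu_j | rest
  v_star <- 1 / (nj / sigma_sq + 1 / tau_sq)
  m_star <- v_star * (nj * ybar / sigma_sq + mu / tau_sq)
  theta  <- rnorm(J, mean = m_star, sd = sqrt(v_star))

  # 2. grand mean mu | rest
  gn_sq <- 1 / (J / tau_sq + 1 / g0_sq)
  mun   <- gn_sq * (J * mean(theta) / tau_sq + mu0 / g0_sq)
  mu    <- rnorm(1, mean = mun, sd = sqrt(gn_sq))

  # 3. across-group variance tau^2 | rest
  eta_n  <- eta0 + J
  tn_sq  <- (eta0 * t0_sq + sum((theta - mu)^2)) / eta_n
  tau_sq <- 1 / rgamma(1, shape = eta_n / 2, rate = eta_n * tn_sq / 2)

  # 4. within-group variance sigma^2 | rest
  nu_n  <- nu0 + sum(nj)
  ss    <- sum(sapply(seq_len(J), function(j) sum((y_groups[[j]] - theta[j])^2)))
  sn_sq <- (nu0 * s0_sq + ss) / nu_n
  sigma_sq <- 1 / rgamma(1, shape = nu_n / 2, rate = nu_n * sn_sq / 2)

  list(theta = theta, mu = mu, tau_sq = tau_sq, sigma_sq = sigma_sq)
}
```

Looping over `gibbs_scan`, feeding each iteration's output back in as the next iteration's current values and storing the results, gives (after burn-in) approximate draws from the joint posterior of `\((\mu_1, \ldots, \mu_J, \mu, \tau^2, \sigma^2)\)`.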
---
class: center, middle

# What's next?

### Move on to the readings for the next module!