class: center, middle, inverse, title-slide

# STA 610L: Module 2.9

## Random effects ANCOVA (holistic analysis)

### Dr. Olanrewaju Michael Akande

---

## NELS data: taking a step back

Until now, we have used the NELS data to illustrate different aspects of model fitting for the multilevel model.

Now let's step back and think about model selection for the data more holistically, as if we're seeing them for the first time (for the most part).

---

## NELS variables

Here are our variables of interest in the NELS:

- Math score (individual-level outcome)
- SES (individual-level socio-economic status)
- FLP (school-level % of kids eligible for free or reduced-price lunch -- think of this as school-level SES)
  - 1: 0-5% eligible
  - 2: 5-30% eligible
  - 3: >30% eligible
- Enrollment (school-level # of kids in 10th grade, rounded and measured in hundreds, so 0 = <100, 1 = around 100, ..., 5 = around 500)
- Public (school-level indicator that takes value 1 if public school and 0 if private school)
- Urbanicity (school-level factor with levels rural, suburban, and urban)

---

## Model selection

As we think about our model selection process, we'll keep in mind a couple of methods for comparison.

- Likelihood ratio test for nested models
  - For tests involving fixed effects only, we can use a `\(\chi^2_d\)` test of whether `\(d\)` fixed effects all equal 0 (valid under ML, not REML)
  - For tests involving random effects only, we can use a 50-50 mixture of `\(\chi^2_{p-1}\)` and `\(\chi^2_p\)`, where `\(p\)` is the number of random effect variances in the larger model (we sketch this calculation in R later in the module)
  - For non-nested models, or for testing fixed and random effects simultaneously, things are not so simple.

---

## Model selection

One option is to rely on the metrics we already often use for model comparison, like AIC, BIC, etc.

For BIC in particular:

- smaller is better
- it already adjusts for model complexity
- it approximates the posterior model probability
- model selection based on it is consistent
- nesting between models is not required

---

## NELS data

```r
load('data/nels.Rdata')
avmscore.schools <- tapply(nels$mscore,nels$school,mean,na.rm=TRUE)
id.schools <- names(avmscore.schools)
m <- length(id.schools)
nels$sesstd <- nels$ses/sd(nels$ses)
nels$enroll <- factor(nels$enroll)
nels$flp <- factor(nels$flp)
nels$public <- factor(nels$public)
nels$urbanicity <- factor(nels$urbanicity)
```

---

## Descriptive statistics

<img src="2-9-random-effects-ancova-holistic-analysis_files/figure-html/boxplots-1.png" style="display: block; margin: auto;" />

---

## What's wrong with ANOVA?

Suppose I don't really care about school effects one way or the other. Why not just use ANOVA (or another fixed effects model) here?

Under a fixed effects model,

<br>

`$$\text{Cov}(y_j)=\begin{pmatrix} \sigma^2 & 0 & \ldots & 0 \\ 0 & \sigma^2 & \ldots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & \sigma^2 \end{pmatrix}$$`

---

## What's wrong with ANOVA?

Under a random intercepts model,

<br>

`$$\text{Cov}(y_j)=\begin{pmatrix} \sigma^2 +\tau^2 & \tau^2 & \ldots & \tau^2 \\ \tau^2 & \sigma^2 + \tau^2 & \ldots & \tau^2 \\ \vdots & & \ddots & \vdots \\ \tau^2 & \tau^2 & \ldots & \sigma^2 + \tau^2 \end{pmatrix},$$`

and `\(\text{Corr}(y_{ij},y_{i'j})=\frac{\tau^2}{\tau^2+\sigma^2}\)`.

We generally don't believe independence within the same school environment holds. This type of covariance structure is often called *exchangeable* or *compound symmetric*.
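---

## What's wrong with ANOVA?

As a quick numerical illustration of compound symmetry, here is a minimal sketch that builds the covariance matrix implied by a random intercepts model for a school with four students. The values of `\(\tau^2\)` and `\(\sigma^2\)` are hypothetical, chosen only to be near the magnitudes we estimate later.

```r
# Minimal sketch; tau2 and sigma2 are made-up values, not NELS estimates
tau2 <- 24; sigma2 <- 74; nj <- 4
Sigma <- matrix(tau2, nj, nj) + diag(sigma2, nj)
Sigma          # sigma2 + tau2 on the diagonal, tau2 off the diagonal
cov2cor(Sigma) # every off-diagonal entry is tau2/(tau2 + sigma2)
```

Every pair of students in the same school gets the same correlation, `\(\tau^2/(\tau^2+\sigma^2)\)` -- the intraclass correlation we estimate from fitted models below.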
---

## Other considerations

Why not treat school as a fixed effect? That should handle the school heterogeneity.

```r
m10 <- lm(mscore~school+enroll+flp+public+
            urbanicity, data=nels)
#summary(m10)
coef(m10)[(length(coef(m10))-30):length(coef(m10))]
```

---

## Other considerations

.large[

```
##         school4513         school4521         school4522         school4531 
##           1.771737           3.085000           4.330590          -5.556333 
##         school4532         school4541         school4542         school4551 
##           1.619069           1.912625           4.158000           1.240200 
##         school4552         school4553         school4561         school4562 
##           2.027769           7.574857           8.552385           1.357000 
##         school4571         school4572         school4582         school4591 
##          -3.348000           4.821000           9.443250           6.169727 
##         school4592         school4601         school4602         school4611 
##          12.405182         -13.559667           3.622333           5.820846 
##         school4612            enroll1            enroll2            enroll3 
##          -7.980692                 NA                 NA                 NA 
##            enroll4            enroll5               flp2               flp3 
##                 NA                 NA                 NA                 NA 
##            public1 urbanicitysuburban    urbanicityurban 
##                 NA                 NA                 NA
```

]

What happened to the estimates for enrollment, eligibility for free lunch, public/private status, and urbanicity?

---

## Other considerations

The school-specific fixed effects explain essentially *all* heterogeneity in means across schools, leaving no room for the other factors (which we care more about in terms of learning about patterns in the data) to explain any heterogeneity. In fact, because every school has a single value of enrollment, flp, public, and urbanicity, the school indicators are exactly collinear with the school-level predictors, which is why `lm` reports `NA` for those coefficients.

So this approach does not allow us to evaluate school-level predictors, and it is also very expensive in terms of degrees of freedom (we estimate a lot of parameters). This is a relatively common phenomenon when dealing with categorical group-level predictors.

---

## Heterogeneity across schools

Let's take a more detailed look at the heterogeneity across schools and how much of it can be explained by measured school-level factors, including urbanicity, public/private status, free lunch percentage, and school size.

In a model with only a random intercept, let's calculate the intraclass correlation -- the correlation between two kids in the same school.

`$$y_{ij}=\beta_{0j}+\varepsilon_{ij}, ~~ \beta_{0j}\overset{iid}{\sim} N(\beta_0,\tau^2) \perp \varepsilon_{ij}\overset{iid}{\sim} N(0, \sigma^2)$$`

```r
library(lme4) # assumed loaded once for all models in this module
fit0 <- lmer(mscore~(1|school),data=nels, REML=FALSE)
sigma2hat <- sigma(fit0)*sigma(fit0) #pick off estimate of sigma2
tau2hat <- as.numeric(VarCorr(fit0)$school) #pick off est of tau2
c(sigma2hat,tau2hat,tau2hat/(tau2hat+sigma2hat)) #show vars and correlation
```

```
## [1] 73.7084447 23.6341046  0.2427932
```

---

## Heterogeneity across schools

How much of the heterogeneity across schools is explained by enrollment?

```r
fit1 <- lmer(mscore~enroll+(1|school),data=nels, REML=FALSE)
sigma2hat <- sigma(fit1)*sigma(fit1) #pick off estimate of sigma2
tau2hat <- as.numeric(VarCorr(fit1)$school) #pick off est of tau2
c(sigma2hat,tau2hat,tau2hat/(tau2hat+sigma2hat)) #show vars and correlation
```

```
## [1] 73.7202995 23.0948621  0.2385459
```

Not much!

---

## Heterogeneity across schools

How much of the remaining heterogeneity across schools is explained by the percentage of kids eligible for free or reduced-price lunch?

```r
fit2=lmer(mscore~enroll+flp+(1|school),
          data=nels, REML=FALSE)
sigma2hat <- sigma(fit2)*sigma(fit2) #pick off estimate of sigma2
tau2hat <- as.numeric(VarCorr(fit2)$school) #pick off est of tau2
c(sigma2hat,tau2hat,tau2hat/(tau2hat+sigma2hat)) #show vars and correlation
```

```
## [1] 73.7655328 13.5097521  0.1547947
```

Wow, "school-level SES" explained a lot of that heterogeneity.
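---

## Heterogeneity across schools

Since we repeat the same three lines for every fit, here is a small convenience helper (our own addition, not part of the original analysis) that wraps the calculation for any model whose only random effect is a school intercept.

```r
# Hypothetical helper: extract sigma^2-hat, tau^2-hat, and the estimated
# intraclass correlation from an lmer fit with a single random intercept
icc_hat <- function(fit, group = "school") {
  sigma2 <- sigma(fit)^2
  tau2 <- as.numeric(VarCorr(fit)[[group]])
  c(sigma2 = sigma2, tau2 = tau2, icc = tau2/(tau2 + sigma2))
}
icc_hat(fit2) # same three numbers as in the previous chunk
```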
---

## Heterogeneity across schools

What if we add public/private status?

```r
fit3=lmer(mscore~enroll+flp+public+
            (1|school),data=nels, REML=FALSE)
sigma2hat=sigma(fit3)*sigma(fit3) #pick off estimate of sigma2
tau2hat=as.numeric(VarCorr(fit3)$school) #pick off est of tau2
c(sigma2hat,tau2hat,tau2hat/(tau2hat+sigma2hat)) #show vars and correlation
```

```
## [1] 73.7749179 13.2366759  0.1521254
```

---

## Heterogeneity across schools

Now we add urbanicity.

```r
fit4=lmer(mscore~enroll+flp+public+
            urbanicity+(1|school),data=nels, REML=FALSE)
sigma2hat=sigma(fit4)*sigma(fit4) #pick off estimate of sigma2
tau2hat=as.numeric(VarCorr(fit4)$school) #pick off est of tau2
c(sigma2hat,tau2hat,tau2hat/(tau2hat+sigma2hat)) #show vars and correlation
```

```
## [1] 73.779034 12.908770  0.148911
```

---

## Summary

As we add more group-level predictors,

- `\(\widehat{\tau}^2\)` decreases
- `\(\widehat{\sigma}^2\)` stays about the same
- the within-group correlation is nonincreasing (and with the addition of some variables decreases substantially)

---

## NELS Data

Let's return to our data from a data analysis perspective (rather than just illustrating aspects of the multilevel model), considering the hypotheses regarding the role of school-specific and individual-specific factors in math test scores.

We'll start with a simple model and build from there, using the BIC as our primary selection criterion.

`$$y_{ij}=\beta_{0j}+\beta_{1j}\text{ses}_{ij}+ \varepsilon_{ij}, ~~ \beta_{0j}=\beta_0+b_{0j}, ~~~ \beta_{1j}=\beta_1+b_{1j}$$`

<br>

`$$\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix} \sim N \left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix}\tau_{11} & \tau_{12} \\ \tau_{12} & \tau_{22} \end{pmatrix}\right), ~~~ \varepsilon_{ij}\sim N(0,\sigma^2)$$`

This model allows random intercepts and slopes across schools.

---

## NELS Data

We saw previously that the random slope did explain additional heterogeneity in a model without school-level predictors. We'll come back to that question once we add a few school-level predictors to the model.

Let's first compare our starting model to models that add enrollment to the mix, so that

`$$\beta_{0j}=\beta_0+\sum_{k=1}^{5}\alpha_{0k}I(\text{enroll}_j=k)+b_{0j}$$`

`$$\beta_{1j}=\beta_1+\sum_{k=1}^{5}\alpha_{1k}I(\text{enroll}_j=k)+b_{1j}.$$`

We'll use ML estimation because we may wish to consider likelihood ratio tests of the mean parameters.

---

## NELS Data

First, check out the base model.

.large[

```r
mod1=lmer(mscore~sesstd+(sesstd|school),data=nels, REML=FALSE)
summary(mod1)
```

```
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mscore ~ sesstd + (sesstd | school)
##    Data: nels
## 
##      AIC      BIC   logLik deviance df.resid 
##  92553.1  92597.9 -46270.5  92541.1    12968 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8910 -0.6382  0.0179  0.6669  4.4613 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept) 12.2231  3.4961       
##           sesstd       0.8562  0.9253   0.11
##  Residual             67.3451  8.2064       
## Number of obs: 12974, groups:  school, 684
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 50.67670    0.15511   326.7
## sesstd       3.27708    0.09256    35.4
## 
## Correlation of Fixed Effects:
##        (Intr)
## sesstd 0.007
```

]
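---

## NELS Data

As an aside, here is the promised sketch of the 50-50 mixture likelihood ratio test for a random effect, applied to the random slope in `mod1`. The intercept-only fit `mod1b` is our own (hypothetical) name, not part of the original analysis.

```r
# Compare mod1 to a random-intercept-only fit; mod1 has p = 2 random
# effect variances, so we mix chi^2_1 and chi^2_2 in equal parts
mod1b <- lmer(mscore~sesstd+(1|school), data=nels, REML=FALSE)
lrt <- as.numeric(2*(logLik(mod1) - logLik(mod1b)))
0.5*pchisq(lrt, df=1, lower.tail=FALSE) +
  0.5*pchisq(lrt, df=2, lower.tail=FALSE)
```

A small p-value here favors keeping the random slope, consistent with what we saw previously in a model without school-level predictors.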
---

## NELS Data

.large[

```r
mod2a=lmer(mscore~enroll+sesstd+(sesstd|school),
           data=nels, REML=FALSE)
mod2b=lmer(mscore~enroll+sesstd+enroll:sesstd+
             (sesstd|school),data=nels, REML=FALSE)
anova(mod2b,mod2a)
```

```
## Data: nels
## Models:
## mod2a: mscore ~ enroll + sesstd + (sesstd | school)
## mod2b: mscore ~ enroll + sesstd + enroll:sesstd + (sesstd | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod2a   11 92557 92639 -46267    92535                     
## mod2b   16 92559 92678 -46263    92527 7.9798  5     0.1574
```

```r
anova(mod2a,mod1)
```

```
## Data: nels
## Models:
## mod1: mscore ~ sesstd + (sesstd | school)
## mod2a: mscore ~ enroll + sesstd + (sesstd | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod1     6 92553 92598 -46271    92541                     
## mod2a   11 92557 92639 -46267    92535 6.1315  5     0.2936
```

]

Here we don't see much evidence that enrollment is useful, so we leave it out of the model.

---

## NELS Data

Next we consider eligibility for free and reduced-price lunch. Here we'll explore a variety of models: a model that adds the flp main effect and its interaction with SES, a model without that interaction, a model that keeps the flp main effect but drops the SES random slope, and a model that drops all the school random effects `\((\tau=0)\)` given that flp is in the model.

```r
mod3a=lmer(mscore~flp+sesstd+(sesstd|school),
           data=nels, REML=FALSE)
mod3b=lmer(mscore~flp+sesstd+flp*sesstd+
             (sesstd|school),data=nels, REML=FALSE)
mod3c=lmer(mscore~flp+sesstd+(1|school),
           data=nels, REML=FALSE)
mod3d=lm(mscore~flp+sesstd,data=nels)
anova(mod3b,mod3a)
anova(mod3a,mod1)
anova(mod3c,mod3a) #just look at BIC here
BIC(mod3d) #check if random intercept needed by comparing to BIC from 3c
```

---

## Model selection

```r
mod3a=lmer(mscore~flp+sesstd+(sesstd|school),
           data=nels, REML=FALSE)
mod3b=lmer(mscore~flp+sesstd+flp*sesstd+(sesstd|school),
           data=nels, REML=FALSE)
anova(mod3b,mod3a)
```

```
## Data: nels
## Models:
## mod3a: mscore ~ flp + sesstd + (sesstd | school)
## mod3b: mscore ~ flp + sesstd + flp * sesstd + (sesstd | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod3a    8 92395 92454 -46189    92379                     
## mod3b   10 92384 92459 -46182    92364 14.384  2  0.0007525
```

---

## Model selection

```r
anova(mod3a,mod1)
```

```
## Data: nels
## Models:
## mod1: mscore ~ sesstd + (sesstd | school)
## mod3a: mscore ~ flp + sesstd + (sesstd | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod1     6 92553 92598 -46271    92541                     
## mod3a    8 92395 92454 -46189    92379 162.51  2  < 2.2e-16
```

---

## Model selection

```r
mod3c=lmer(mscore~flp+sesstd+(1|school),
           data=nels, REML=FALSE)
anova(mod3c,mod3a) #just look at BIC here
```

```
## Data: nels
## Models:
## mod3c: mscore ~ flp + sesstd + (1 | school)
## mod3a: mscore ~ flp + sesstd + (sesstd | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod3c    6 92404 92449 -46196    92392                     
## mod3a    8 92395 92454 -46189    92379 13.371  2   0.001249
```

```r
mod3d=lm(mscore~flp+sesstd,data=nels)
anova(mod3c,mod3d) #check if random intercept needed by comparing to BIC from 3c
```

```
## Data: nels
## Models:
## mod3d: mscore ~ flp + sesstd
## mod3c: mscore ~ flp + sesstd + (1 | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod3d    5 93115 93152 -46552    93105                     
## mod3c    6 92404 92449 -46196    92392 712.63  1  < 2.2e-16
```

```r
BIC(mod3d) #also use BIC
```

```
## [1] 93151.9
```
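---

## Model selection

To see all of the candidates on one scale, a quick bit of bookkeeping (our own addition, not from the original analysis) lines up their BIC values; smaller is better.

```r
# BIC is generic, so this works for the lmer fits and the plain lm fit alike
sapply(list(mod1=mod1, mod3a=mod3a, mod3b=mod3b,
            mod3c=mod3c, mod3d=mod3d), BIC)
```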
---

## Model selection

Note that BIC now likes the model without a random slope -- we evaluated that model because we suspected that, after introducing a school-level SES variable, the importance of the individual-level SES variable might change.

BIC also prefers a model without an interaction between individual-level SES and school-level SES (measured by flp).

---

## Model selection

Our working model is now `mod3c`.

.large[

```r
summary(mod3c)
```

```
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mscore ~ flp + sesstd + (1 | school)
##    Data: nels
## 
##      AIC      BIC   logLik deviance df.resid 
##  92403.9  92448.7 -46196.0  92391.9    12968 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9560 -0.6434  0.0178  0.6710  4.4906 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept)  9.004   3.001   
##  Residual             67.959   8.244   
## Number of obs: 12974, groups:  school, 684
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 52.84307    0.24462 216.020
## flp2        -1.87992    0.33042  -5.689
## flp3        -4.79607    0.36150 -13.267
## sesstd       3.10819    0.08578  36.233
## 
## Correlation of Fixed Effects:
##        (Intr) flp2   flp3  
## flp2   -0.739              
## flp3   -0.697  0.514       
## sesstd -0.202  0.143  0.237
```

]

---

## Model selection

The more students eligible for the free and reduced-price lunch program, the lower the math scores. In addition, the coefficient on individual-level SES did not change much in magnitude -- so SES operates both on the school level and the individual level.

Let's now add the public school indicator.

```r
mod4a=lmer(mscore~flp+public+sesstd+
             (1|school),data=nels, REML=FALSE)
mod4b=lmer(mscore~flp+public+sesstd+
             public*sesstd+(1|school),
           data=nels, REML=FALSE)
anova(mod4b,mod4a)
anova(mod4b,mod3c)
#summary(mod4b)
```

---

## Model selection

```r
mod4a=lmer(mscore~flp+public+sesstd+(1|school),
           data=nels, REML=FALSE)
mod4b=lmer(mscore~flp+public+sesstd+
             public*sesstd+(1|school),data=nels, REML=FALSE)
anova(mod4b,mod4a)
```

```
## Data: nels
## Models:
## mod4a: mscore ~ flp + public + sesstd + (1 | school)
## mod4b: mscore ~ flp + public + sesstd + public * sesstd + (1 | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod4a    7 92406 92458 -46196    92392                     
## mod4b    8 92396 92456 -46190    92380 11.948  1   0.000547
```

---

## Model selection

```r
anova(mod4b,mod3c)
```

```
## Data: nels
## Models:
## mod3c: mscore ~ flp + sesstd + (1 | school)
## mod4b: mscore ~ flp + public + sesstd + public * sesstd + (1 | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod3c    6 92404 92449 -46196    92392                     
## mod4b    8 92396 92456 -46190    92380 12.081  2   0.002381
```

The BIC suggests leaving public/private status out of the model.
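---

## Model selection

Sticking with `mod3c` as the working model, here is a quick sketch (our own check, not part of the original analysis) of approximate 95% confidence intervals for its fixed effects. We use Wald intervals for speed; the profile method is `confint`'s slower default.

```r
# Wald confidence intervals for the fixed effects of mod3c
confint(mod3c, parm="beta_", method="Wald")
```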
---

## Model selection

Now let's consider urban/suburban/rural status.

```r
mod5a=lmer(mscore~flp+urbanicity+sesstd+(1|school),
           data=nels, REML=FALSE)
mod5b=lmer(mscore~flp+urbanicity+sesstd+
             urbanicity*sesstd+(1|school),
           data=nels, REML=FALSE)
anova(mod5b,mod5a)
anova(mod5a,mod3c)
summary(mod5b)
```

---

## Model selection

```r
mod5a=lmer(mscore~flp+urbanicity+sesstd+(1|school),
           data=nels, REML=FALSE)
mod5b=lmer(mscore~flp+urbanicity+sesstd+
             urbanicity*sesstd+(1|school),
           data=nels, REML=FALSE)
anova(mod5b,mod5a)
```

```
## Data: nels
## Models:
## mod5a: mscore ~ flp + urbanicity + sesstd + (1 | school)
## mod5b: mscore ~ flp + urbanicity + sesstd + urbanicity * sesstd + (1 | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod5a    8 92400 92460 -46192    92384                     
## mod5b   10 92390 92465 -46185    92370 13.373  2   0.001248
```

---

## Model selection

```r
anova(mod5a,mod3c)
```

```
## Data: nels
## Models:
## mod3c: mscore ~ flp + sesstd + (1 | school)
## mod5a: mscore ~ flp + urbanicity + sesstd + (1 | school)
##       npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)
## mod3c    6 92404 92449 -46196    92392                     
## mod5a    8 92400 92460 -46192    92384 8.0578  2    0.01779
```

BIC suggests leaving urbanicity out of the model.

---

## Summary of Selection using BIC

- Enrollment, urbanicity, and public/private status did not add much to our model using the BIC as our selection criterion
- The lower the SES status of the whole school (measured by the percent eligible for free and reduced-price lunch), the lower the math scores on average
- Having higher individual-level SES was associated with higher math scores regardless of the school environment
- A random intercept for school explained significant variability across schools and controlled for lack of independence within schools

`$$y_{ij}=\beta_{0j}+\beta_{1}\text{ses}_{ij}+ \varepsilon_{ij}$$`

`$$\beta_{0j}=\beta_0+\sum_{l=2}^{3}\psi_{0l}I(\text{flp}_j=l)+b_{0j}$$`

`$$b_{0j} \sim N \left( 0 ,\tau^2 \right), ~~~ \varepsilon_{ij}\sim N(0,\sigma^2)$$`

---

## Final model again

.large[

```r
summary(mod3c)
```

```
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: mscore ~ flp + sesstd + (1 | school)
##    Data: nels
## 
##      AIC      BIC   logLik deviance df.resid 
##  92403.9  92448.7 -46196.0  92391.9    12968 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9560 -0.6434  0.0178  0.6710  4.4906 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  school   (Intercept)  9.004   3.001   
##  Residual             67.959   8.244   
## Number of obs: 12974, groups:  school, 684
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 52.84307    0.24462 216.020
## flp2        -1.87992    0.33042  -5.689
## flp3        -4.79607    0.36150 -13.267
## sesstd       3.10819    0.08578  36.233
## 
## Correlation of Fixed Effects:
##        (Intr) flp2   flp3  
## flp2   -0.739              
## flp3   -0.697  0.514       
## sesstd -0.202  0.143  0.237
```

]

---

class: center, middle

# What's next?

### Move on to the readings for the next module!