class: center, middle, inverse, title-slide

# STA 610L: Module 3.5
## Logistic mixed effects model (Part I)
### Dr. Olanrewaju Michael Akande

---

## Generalized linear mixed effects model (GLMM)

As we continue to generalize the concepts we have covered, let's think about incorporating random effects into the standard representation of generalized linear models.

The basic idea is that we assume there is natural heterogeneity across groups in a subset of the regression coefficients.

These coefficients are assumed to vary across groups according to some distribution.

Conditional on the random effects, we then assume the responses for a single subject are independent observations from a distribution in the exponential family.

---

## GLMM

Note: when we look at longitudinal data, we will group by `\(i\)`; otherwise, we will group by `\(j\)`.

.hlight[Generalized linear mixed effects models (GLMMs)] for longitudinal data usually take the form

`$$g(E[Y_{ij} \mid {\bf X}_{ij}, {\bf Z}_{ij}, \boldsymbol{\beta}, {\bf b}_i])={\bf X}_{ij}'\boldsymbol{\beta}+{\bf Z}_{ij}'{\bf b}_i,$$`

so that we assume the conditional distribution of each `\(Y_{ij}\)`, conditional on `\({\bf b}_i\)` (and everything else), belongs to the exponential family with the conditional mean given above, where `\(g(\cdot)\)` is a known link function.

From here on, I will often write `\(E[Y_{ij} \mid {\bf X}_{ij}, {\bf Z}_{ij}, \boldsymbol{\beta}, {\bf b}_i]\)` as `\(E[Y_{ij} \mid {\bf b}_i]\)` for brevity.

We assume the `\({\bf b}_i\)` are independent across subjects with `\({\bf b}_i \overset{iid}\sim N({\bf 0}, {\bf D})\)`.

We also assume that given `\({\bf b}_i\)`, the responses `\(Y_{i1}, \ldots, Y_{in_i}\)` are mutually independent.

---

## Example: multilevel linear regression with random intercepts

`$$Y_{ij}={\bf X}_{ij}'\boldsymbol{\beta}+b_i + \varepsilon_{ij},$$`

where

`$$b_i\overset{iid}\sim N(0,\sigma_b^2) \perp \varepsilon_{ij} \overset{iid}\sim N(0,\sigma_e^2)$$`

and

`$$E(Y_{ij} \mid b_i)={\bf X}_{ij}'\boldsymbol{\beta}+b_i.$$`

---

## Example: multilevel logistic model with random intercepts

`$$\text{logit}(E(Y_{ij} \mid b_i))={\bf X}_{ij}'\boldsymbol{\beta}+b_i,$$`

where

`$$b_i\sim N(0,\sigma^2).$$`

<br>

Question: what happened to `\(\varepsilon_{ij}\)`?

---

## Example: multilevel Poisson model

`$$\log(E(Y_{ij} \mid {\bf b}_i))={\bf X}_{ij}'\boldsymbol{\beta}+{\bf Z}_{ij}'{\bf b}_i.$$`

We could write

`$${\bf X}_{ij}={\bf Z}_{ij}=[1,t_{ij}],$$`

so that we have random intercepts and slopes, and then assume

`$${\bf b}_i \sim N({\bf 0}, {\bf D}).$$`

---

## Interpretation of GLMM estimates

In the model

`$$\text{logit}(E[Y_{ij} \mid b_i])={\bf X}_{ij}'\boldsymbol{\beta}+b_i,$$`

with `\(b_i \sim N(0,\sigma^2)\)`, each element of `\(\boldsymbol{\beta}\)` measures the change in the log odds of a 'positive' response per unit change in the respective covariate, within a specific group whose underlying propensity to respond positively is given by `\(b_i\)`.

<br>

That is, we need to hold the group membership constant when interpreting `\(\beta_k\)`, just as we would hold the values of `\({\bf x}_{ij,(-k)}\)` constant when interpreting `\(\beta_k\)`.

---

## Caution

Note also that with a non-linear link function, the average of a non-linear function is not equal to the non-linear function of the average, so the parameters do not in general have population-average interpretations (as they would in a linear mixed effects model, which has the identity link).

So while in the LMM

`$$E(Y_{ij} \mid {\bf X}_{ij}, {\bf b}_i)={\bf X}_{ij}'\boldsymbol{\beta}+{\bf Z}_{ij}'{\bf b}_i$$`

implies that `\(E(Y_{ij} \mid {\bf X}_{ij})={\bf X}_{ij}'\boldsymbol{\beta}\)` (since the `\({\bf b}_i\)` have mean `\({\bf 0}\)`), when `\(g(\cdot)\)` is non-linear (say the logit), the same relationship does not hold, and in general

`$$g(E(Y_{ij} \mid {\bf X}_{ij}))\neq {\bf X}_{ij}'\boldsymbol{\beta}$$`

once we average over the distribution of the random effects.
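---

## Caution

To see this numerically, here is a small simulation sketch in base `R`; the parameter values (`beta0`, `beta1`, `sigma`) are made up purely for illustration. We compare the marginal probability `\(E(Y_{ij} \mid X_{ij})\)`, obtained by averaging the conditional probabilities over draws of the random intercept, to the inverse logit of the fixed part of the linear predictor alone.

```r
set.seed(610)
beta0 <- 0.5; beta1 <- 1; sigma <- 2 #assumed values, for illustration only
x <- 1
b <- rnorm(1e5, 0, sigma) #draws from the random intercept distribution

inv_logit <- function(eta) 1/(1 + exp(-eta))

#marginal probability: average the conditional probabilities over b
mean(inv_logit(beta0 + beta1*x + b)) #roughly 0.73

#inverse logit of the fixed part alone (i.e., conditional on b_i = 0)
inv_logit(beta0 + beta1*x) #roughly 0.82
```

The marginal probability is pulled toward 0.5 relative to the conditional probability at `\(b_i = 0\)`; this attenuation is exactly why the fixed effects do not have population-average interpretations under a non-linear link.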
---

## Intraclass correlation

Let's focus on binary responses for the rest of this module. That is, each `\(Y_{ij} \in \{0,1\}\)`.

Now consider an unobserved continuous variable `\(W_{ij}\)`.

`\(W_{ij}\)` is related to `\(Y_{ij}\)` in the following manner: `\(Y_{ij}=1\)` if `\(W_{ij}<c\)`, and `\(Y_{ij}=0\)` otherwise.

<br>

The location of `\(c\)` and the distribution of `\(W\)` govern the probability that `\(Y=1\)`.

---

## Intraclass correlation

A useful way of thinking about the model (though not an essential assumption) is

`$$W_{ij}= {\bf X}_{ij}'\boldsymbol{\beta}+b_i+\varepsilon_{ij},$$`

where

- `\(\varepsilon_{ij} \sim N(0,1)\)`: probit regression
- `\(\varepsilon_{ij} \sim\)` standard logistic (mean 0, variance `\(\frac{\pi^2}{3}\)`): logistic regression

<br>

We can use this framework to calculate ICCs:

- `\(ICC=\frac{\sigma^2}{\sigma^2+1} ~~~ \text{for probit}\)`
- `\(ICC=\frac{\sigma^2}{\sigma^2+\frac{\pi^2}{3}} ~~~ \text{for logistic}\)`
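---

## Intraclass correlation

As a quick worked example, the latent-variable ICCs are simple to compute once we have an estimate of the random intercept variance. The value of `\(\sigma^2\)` below is assumed for illustration (it is close to the state-level variance we estimate later in this module); for a fitted `glmer` model `fit`, the same quantity can be pulled out with `as.data.frame(VarCorr(fit))$vcov[1]`.

```r
sigma2 <- 0.17 #assumed random intercept variance, for illustration

sigma2/(sigma2 + 1)      #probit ICC:   roughly 0.145
sigma2/(sigma2 + pi^2/3) #logistic ICC: roughly 0.049
```

Note how much smaller the logistic ICC is for the same `\(\sigma^2\)`: the residual variance `\(\pi^2/3 \approx 3.29\)` on the latent scale is much larger than 1.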
---

## Estimation using ML

The joint probability density function is given by

`$$f({\bf Y}_i \mid {\bf X}_i, {\bf b}_i)f({\bf b}_i).$$`

However, recall that the `\({\bf b}_i\)` are unobserved. How then do we handle the `\({\bf b}_i\)` in the maximization?

<br>

Typically, we base frequentist inferences on the marginal (integrated) likelihood function, given by

`$$\prod_{i=1}^N \int f({\bf Y}_i \mid {\bf X}_i, {\bf b}_i)f({\bf b}_i) d{\bf b}_i.$$`

Estimation using maximum likelihood then involves a two-step procedure.

---

## ML estimation steps

.hlight[Step 1]: Obtain ML estimates of `\(\boldsymbol{\beta}\)` and `\({\bf D}\)` based on the marginal likelihood of the data.

While this may sound simple, numerical or Monte Carlo integration techniques are typically required, and the techniques used may have substantial impacts on resulting inferences.

.hlight[Step 2]: Given estimates of `\(\boldsymbol{\beta}\)` and `\({\bf D}\)`, predict the random effects as

`$$\widehat{{\bf b}}_i=E({\bf b}_i \mid {\bf Y}_i, \widehat{\boldsymbol{\beta}}, \widehat{{\bf D}}).$$`

Again, simple analytic solutions are rarely available.

<br>

Even when the burden of integration is not that great, the optimization problem may be difficult to solve.
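---

## ML estimation steps

To make the integral concrete, here is a minimal sketch for a single group under a random intercept logistic model; the data and parameter values are made up for illustration. With a single random intercept, the integral is one-dimensional, so base `R`'s `integrate()` suffices.

```r
y <- c(1, 0, 1, 1)          #binary responses for group i (illustrative)
x <- c(0.2, -1.0, 0.5, 1.3) #a single covariate (illustrative)
beta0 <- 0.4; beta1 <- 0.8; sigma <- 0.6 #assumed parameter values

inv_logit <- function(eta) 1/(1 + exp(-eta))

#integrand: f(Y_i | b_i) * f(b_i), as a function of the random intercept
integrand <- function(b) {
  sapply(b, function(bi) {
    p <- inv_logit(beta0 + beta1*x + bi)
    prod(p^y * (1 - p)^(1 - y)) * dnorm(bi, mean = 0, sd = sigma)
  })
}

#marginal likelihood contribution of group i
integrate(integrand, lower = -Inf, upper = Inf)$value
```

`lme4::glmer` approximates this integral with the Laplace approximation by default (adaptive Gauss-Hermite quadrature is available through the `nAGQ` argument) and maximizes the resulting marginal likelihood over `\(\boldsymbol{\beta}\)` and `\({\bf D}\)`.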
---

## Random effects logistic regression

Inverse logit functions for a random intercepts logistic model with a single predictor.

<img src="3-5-logistic-mixed-effects-model-I_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Random effects logistic regression

Inverse logit functions for a random slopes logistic model with a single predictor.

<img src="3-5-logistic-mixed-effects-model-I_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

## Random effects logistic regression

Inverse logit functions for a random intercepts and random slopes logistic model with a single predictor.

<img src="3-5-logistic-mixed-effects-model-I_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

## 1988 elections analysis

To illustrate how to fit and interpret the results of random effects logistic models, we will use sample data on election polls.

National opinion polls are conducted by a variety of organizations (e.g., media, polling organizations, campaigns) leading up to elections.

While many of the best opinion polls are conducted at a national level, it is often also interesting to estimate voting opinions and preferences at the state or even local level.

Well-designed polls are generally based on national random samples with corrections for nonresponse based on a variety of demographic factors (e.g., sex, ethnicity, race, age, education).

The data is from CBS News surveys conducted during the week before the 1988 election.

Respondents were asked about their preferences for either the Republican candidate (Bush Sr.) or the Democratic candidate (Dukakis).

---

## 1988 elections analysis

The dataset includes 2193 observations from one of the eight surveys in the original full data (the most recent CBS News survey right before the election).

.small[
Variable | Description
:------------- | :------------
org | cbsnyt = CBS/NYT
.hlight[bush] | .hlight[1 = preference for Bush Sr., 0 = otherwise]
state | 1-51: the 50 states plus DC (number 9)
edu | education: 1=No HS, 2=HS, 3=Some College, 4=College Grad
age | 1=18-29, 2=30-44, 3=45-64, 4=65+
female | 1=female, 0=male
black | 1=black, 0=otherwise
region | 1=NE, 2=S, 3=N, 4=W, 5=DC
v_prev | average Republican vote share in the three previous elections (adjusted for home-state and home-region effects in the previous elections)
]

Given that the data has a natural multilevel structure (through `state` and `region`), it makes sense to explore hierarchical models for this data.

---

## 1988 elections analysis

Both voting turnout and preferences often depend on a complex combination of demographic factors.

In our example dataset, we have demographic factors such as biological sex, race, age, and education, all of which we may want to look at by state, resulting in `\(2 \times 2 \times 4 \times 4 \times 51 = 3264\)` potential categories of respondents.

We may even want to control for `region`, adding to the number of categories.

Clearly, without a very large survey (most political surveys poll around 1000 people), we will need to make assumptions in order to even obtain estimates in each category.

We usually cannot include all interactions; we should therefore select those to explore (through EDA and background knowledge).

The data is in the file `polls_subset.txt` on Sakai.

---

## 1988 elections analysis

```r
###### Load the data
polls_subset <- read.table("data/polls_subset.txt",header=TRUE)
str(polls_subset)
```

```
## 'data.frame':    2193 obs. of  10 variables:
##  $ org   : chr  "cbsnyt" "cbsnyt" "cbsnyt" "cbsnyt" ...
##  $ survey: int  9158 9158 9158 9158 9158 9158 9158 9158 9158 9158 ...
##  $ bush  : int  NA 1 0 0 1 1 1 1 0 0 ...
##  $ state : int  7 39 31 7 33 33 39 20 33 40 ...
##  $ edu   : int  3 4 2 3 2 4 2 2 4 1 ...
##  $ age   : int  1 2 4 1 2 4 2 4 3 3 ...
##  $ female: int  1 1 1 1 1 1 0 1 0 0 ...
##  $ black : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ region: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ v_prev: num  0.567 0.527 0.564 0.567 0.524 ...
```

```r
head(polls_subset)
```

```
##      org survey bush state edu age female black region    v_prev
## 1 cbsnyt   9158   NA     7   3   1      1     0      1 0.5666333
## 2 cbsnyt   9158    1    39   4   2      1     0      1 0.5265667
## 3 cbsnyt   9158    0    31   2   4      1     0      1 0.5641667
## 4 cbsnyt   9158    0     7   3   1      1     0      1 0.5666333
## 5 cbsnyt   9158    1    33   2   2      1     0      1 0.5243666
## 6 cbsnyt   9158    1    33   4   4      1     0      1 0.5243666
```

---

## 1988 elections analysis

```r
summary(polls_subset)
```

```
##      org                survey          bush            state      
##  Length:2193        Min.   :9158   Min.   :0.0000   Min.   : 1.00  
##  Class :character   1st Qu.:9158   1st Qu.:0.0000   1st Qu.:14.00  
##  Mode  :character   Median :9158   Median :1.0000   Median :26.00  
##                     Mean   :9158   Mean   :0.5578   Mean   :26.11  
##                     3rd Qu.:9158   3rd Qu.:1.0000   3rd Qu.:39.00  
##                     Max.   :9158   Max.   :1.0000   Max.   :51.00  
##                                    NA's   :178                     
##       edu             age            female           black        
##  Min.   :1.000   Min.   :1.000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:2.000   1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :2.000   Median :2.000   Median :1.0000   Median :0.00000  
##  Mean   :2.653   Mean   :2.289   Mean   :0.5887   Mean   :0.07615  
##  3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :4.000   Max.   :4.000   Max.   :1.0000   Max.   :1.00000  
##                                                                    
##      region          v_prev      
##  Min.   :1.000   Min.   :0.1530  
##  1st Qu.:2.000   1st Qu.:0.5278  
##  Median :2.000   Median :0.5481  
##  Mean   :2.431   Mean   :0.5550  
##  3rd Qu.:3.000   3rd Qu.:0.5830  
##  Max.   :5.000   Max.   :0.6927  
## 
```

---

## 1988 elections analysis

```r
polls_subset$v_prev <- polls_subset$v_prev*100 #rescale
polls_subset$region_label <- factor(polls_subset$region,levels=1:5,
                                    labels=c("NE","S","N","W","DC"))
#we consider DC as a separate region due to its distinctive voting patterns
polls_subset$edu_label <- factor(polls_subset$edu,levels=1:4,
                                 labels=c("No HS","HS","Some College","College Grad"))
polls_subset$age_label <- factor(polls_subset$age,levels=1:4,
                                 labels=c("18-29","30-44","45-64","65+"))

#the data includes states but without the names, which we will need,
#so let's grab that from R datasets
data(state)
#"state" is an R data file (type ?state from the R command window for info)
state.abb #does not include DC, so we will create ours
```

```
##  [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA"
## [16] "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ"
## [31] "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT"
## [46] "VA" "WA" "WV" "WI" "WY"
```

```r
#In the polls data, DC is the 9th "state" in alphabetical order
state_abbr <- c(state.abb[1:8], "DC", state.abb[9:50])
polls_subset$state_label <- factor(polls_subset$state,levels=1:51,labels=state_abbr)
rm(list = ls(pattern = "state")) #remove unnecessary values in the environment
```

---

## 1988 elections analysis

```r
###### View properties of the data
head(polls_subset)
```

```
##      org survey bush state edu age female black region   v_prev region_label
## 1 cbsnyt   9158   NA     7   3   1      1     0      1 56.66333           NE
## 2 cbsnyt   9158    1    39   4   2      1     0      1 52.65667           NE
## 3 cbsnyt   9158    0    31   2   4      1     0      1 56.41667           NE
## 4 cbsnyt   9158    0     7   3   1      1     0      1 56.66333           NE
## 5 cbsnyt   9158    1    33   2   2      1     0      1 52.43666           NE
## 6 cbsnyt   9158    1    33   4   4      1     0      1 52.43666           NE
##      edu_label age_label state_label
## 1 Some College     18-29          CT
## 2 College Grad     30-44          PA
## 3           HS       65+          NJ
## 4 Some College     18-29          CT
## 5           HS     30-44          NY
## 6 College Grad       65+          NY
```

```r
dim(polls_subset)
```

```
## [1] 2193   14
```

---

## 1988 elections analysis

```r
###### View properties of the data
str(polls_subset)
```

```
## 'data.frame':    2193 obs. of  14 variables:
##  $ org         : chr  "cbsnyt" "cbsnyt" "cbsnyt" "cbsnyt" ...
##  $ survey      : int  9158 9158 9158 9158 9158 9158 9158 9158 9158 9158 ...
##  $ bush        : int  NA 1 0 0 1 1 1 1 0 0 ...
##  $ state       : int  7 39 31 7 33 33 39 20 33 40 ...
##  $ edu         : int  3 4 2 3 2 4 2 2 4 1 ...
##  $ age         : int  1 2 4 1 2 4 2 4 3 3 ...
##  $ female      : int  1 1 1 1 1 1 0 1 0 0 ...
##  $ black       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ region      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ v_prev      : num  56.7 52.7 56.4 56.7 52.4 ...
##  $ region_label: Factor w/ 5 levels "NE","S","N","W",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ edu_label   : Factor w/ 4 levels "No HS","HS","Some College",..: 3 4 2 3 2 4 2 2 4 1 ...
##  $ age_label   : Factor w/ 4 levels "18-29","30-44",..: 1 2 4 1 2 4 2 4 3 3 ...
##  $ state_label : Factor w/ 51 levels "AL","AK","AZ",..: 7 39 31 7 33 33 39 20 33 40 ...
```

---

## 1988 elections analysis

I will not do any meaningful EDA here. I expect you to be able to do this yourself.

Let's just take a look at the amount of data we have for "bush" and the age:edu interaction.

```r
###### Exploratory data analysis
table(polls_subset$bush) #reasonably well split between the two values
```

```
## 
##    0    1 
##  891 1124
```

```r
table(polls_subset$edu,polls_subset$age)
```

```
##    
##       1   2   3   4
##   1  44  42  67  96
##   2 232 283 223 116
##   3 141 205  99  54
##   4 119 285 125  62
```
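---

## 1988 elections analysis

That said, here is a sketch of one natural EDA direction you might start with (assuming the `dplyr` package is available): raw support for Bush by state, ignoring covariates. States with few respondents will produce noisy proportions, which is part of why a hierarchical model makes sense here.

```r
library(dplyr)

polls_subset %>%
  filter(!is.na(bush)) %>% #drop respondents with a missing outcome
  group_by(state_label) %>%
  summarise(n = n(), prop_bush = mean(bush)) %>%
  arrange(desc(prop_bush)) %>%
  head(10) #ten states with the highest raw support
```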
---

## 1988 elections analysis

As a start, we will consider a simple model with fixed effects of race and sex and a random effect for state (50 states plus the District of Columbia).

$$
`\begin{split}
\textrm{bush}_{ij} | \boldsymbol{x}_{ij} & \sim \textrm{Bernoulli}(\pi_{ij}); \ \ \ i = 1, \ldots, n_j; \ \ \ j = 1, \ldots, J=51; \\
\textrm{log}\left(\dfrac{\pi_{ij}}{1-\pi_{ij}}\right) & = \beta_{0} + b_{0j} + \beta_1 \textrm{female}_{ij} + \beta_2 \textrm{black}_{ij}; \\
b_{0j} & \sim N(0, \sigma^2).
\end{split}`
$$

In `R`, we have

```r
#library(lme4)
model1 <- glmer(bush ~ black+female+(1|state_label),
                family=binomial(link="logit"),
                data=polls_subset)
summary(model1)
```

---

## 1988 elections analysis

```
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: bush ~ black + female + (1 | state_label)
##    Data: polls_subset
## 
##      AIC      BIC   logLik deviance df.resid 
##   2666.7   2689.1  -1329.3   2658.7     2011 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.7276 -1.0871  0.6673  0.8422  2.5271 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  state_label (Intercept) 0.1692   0.4113  
## Number of obs: 2015, groups:  state_label, 49
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.44523    0.10139   4.391 1.13e-05
## black       -1.74161    0.20954  -8.312  < 2e-16
## female      -0.09705    0.09511  -1.020    0.308
## 
## Correlation of Fixed Effects:
##        (Intr) black 
## black  -0.119       
## female -0.551 -0.005
```

---

## 1988 elections analysis

Looks like we dropped some NAs.

```r
c(sum(complete.cases(polls_subset)),sum(!complete.cases(polls_subset)))
```

```
## [1] 2015  178
```

Not ideal; we'll learn about methods for dealing with missing data soon.

Interpretation of results:

+ For a fixed state (or across all states), a non-black male respondent has odds of `\(e^{0.45}=1.57\)` of supporting Bush.
+ For a fixed state and sex, a black respondent has `\(e^{-1.74}=0.18\)` times (an 82% decrease) the odds of supporting Bush compared to a non-black respondent; you are much less likely to support Bush if you are black than if you are non-black.
+ For a given state and race, a female respondent has `\(e^{-0.10}=0.91\)` times (a 9% decrease) the odds of supporting Bush compared to a male respondent. However, this effect is not actually statistically significant!

---

## 1988 elections analysis

The state-level standard deviation is estimated at 0.41, so the states do vary somewhat, but not by much.

I expect that you will be able to interpret the corresponding confidence intervals.

```
## Computing profile confidence intervals ...
```

```
##                  2.5 %      97.5 %
## .sig01       0.2608567  0.60403428
## (Intercept)  0.2452467  0.64871247
## black       -2.1666001 -1.34322366
## female      -0.2837100  0.08919986
```
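---

## 1988 elections analysis

Since the fixed effects are on the log odds scale, it is often convenient to report the profile intervals above on the odds ratio scale instead. A minimal sketch (this recomputes the profile, so it takes a moment):

```r
#exponentiate the profile CI bounds for the fixed effects only
exp(confint(model1, parm = "beta_"))
#(Intercept): roughly (1.28, 1.91)
#black:       roughly (0.11, 0.26) -- clearly below 1
#female:      roughly (0.75, 1.09) -- includes 1, consistent with the
#             non-significant z value earlier
```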
---

class: center, middle

# What's next?

### Move on to the readings for the next module!