Lab 6: Do more beautiful professors get higher evaluations? (Part II)

Multilevel and hierarchical models

Due: 11:59pm, Thursday, October 7


You MUST submit both your .Rmd and .pdf files to the course site on Gradescope here: Do NOT create a zipped folder with the two documents but instead, upload them as two separate documents. Also, be sure to knit to pdf and NOT html; ask the TA about knitting to pdf if you cannot figure it out. Be sure to submit under the right assignment entry. Finally, when submitting your files, please select the corresponding pages for each exercise.


This is an individual lab and each student must submit a separate solution by the due date.

This lab is based on Exercise 6 of Section 12.11 of Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman A., and Hill, J. The data is from the following article:
Hamermesh, D. S. and Parker, A. (2005), “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity”, Economics of Education Review, v. 24 (4), pp. 369-376.

The data contains information about student evaluations of instructor’s beauty and teaching quality for several courses at the University of Texas from 2000 to 2002. Evaluations were carried out during the last 3 weeks of the 15-week semester. Students administer the evaluation instrument while the instructor is absent from the classroom. The beauty judgements were made later using each instructor’s pictures by six undergraduate students (3 women and 3 men) who had not attended the classes and were not aware of the course evaluations. The sample contains a total of 94 professors across 463 classes, with the number of classes taught by each professor ranging from 1 to 13. Underlying the 463 observations are 16,957 completed evaluations from 25,547 registered students. The data you will use for this exercise excludes some variables in the original dataset.

Read the article (available via Duke library) for more information about the problem.

The Data

Download the data here. The code book is given below:

Variable Description
profnumber Id for each professor
beauty Average of 6 standardized beauty ratings
eval Average course rating
CourseID Id for 30 individual courses. The remaining classes were not identified in the original data, so that they have value 0.
tenured Is the professor tenured? 1 = yes, 0 = no
minority Is the professor from a minority group? 1 = yes, 0 = no
age Professor’s age
didevaluation Number of students filling out evaluations
female 1 = female, 0 = male
formal Did the instructor dress formally (that is, wears tie–jacket/blouse) in the pictures used for the beauty ratings? 1 = yes, 0 = no
lower Lower division course? 1 = yes, 0 = no
multipleclass 1 = more than one professor teaching sections in course in sample, 0 = otherwise
nonenglish Did the Professor receive an undergraduate education from a non-English speaking country? 1 = yes, 0 = no
onecredit 1 = one-credit course, 0 = otherwise
percentevaluating float didevaluation/students
profevaluation Average instructor rating
students Class enrollment
tenuretrack Is the professor tenure track faculty? 1 = yes, 0 = no


You do not need to write a full report for the exercises below. You can simply answer/address each question directly. You should be able to leverage ideas and techniques from the TA’s session for the previous lab.

Treat the variable eval (or a transformed version if needed) as the response variable and the other variables as potential predictors.

  1. Write down a Bayesian random intercepts model for eval (or the transformed version) as the response, grouped by profnumber, with beauty as the only predictor. Write down a version you can fit using the brms package (meaning for example that you won’t be able to specify a multivariate normal prior for your vector of fixed effects).

  2. Fit the model using the default prior options in brms and interpret the results in the context of the question.

  3. Using the model from question 2 above, how does the variation in average ratings across professors compare to the variation in ratings for the same professor? Which is higher, and what does that mean in context of the question? You should factor in the posterior distribution of each corresponding parameter.

  4. Identify three instructor-level predictors excluding beauty (with profnumber being the instructor-level identifier) that you think we should control for based on EDA. Write down and fit a Bayesian varying/random intercepts model for the data (using the default prior options in brms) grouped by profnumber, and include beauty, plus the other three variables you have identified as potential predictors.

  5. Interpret the results of the new model in the context of the question. Factor in the posterior distribution of the parameters.

  6. For the model from question 4, try out a few other choices for the default hyper-priors. That is, fit the model again using your own choices for the parameters of the prior distributions and report your findings. Did any of the estimates change? If yes, which and how much?

  7. We will cover how to incorporate additional grouping variables in more detail soon but here you will start with a relatively simple introduction. Extend the model from question 4 to allow for varying/random intercepts by CourseID and varying/random slopes for beauty by CourseID. To implement this in R, just add “+ (beauty | CourseID)” to the formula from your existing model.

  8. Fit the model using the default prior options in brms. Did any of the results for the fixed effects change? If yes, comment on the changes.


Total: 20 points.