Full TA Session
This lab is based on Exercise 6 of Section 12.11 of Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman, A. and Hill, J. The data is from the following article:
Hamermesh, D. S. and Parker, A. (2005), “Beauty in the classroom: instructors’ pulchritude and putative pedagogical productivity”, Economics of Education Review, v. 24 (4), pp. 369-376.
The data contains information about instructors' beauty ratings and student evaluations of teaching quality for several courses at the University of Texas from 2000 to 2002. Evaluations were carried out during the last 3 weeks of the 15-week semester. Students administered the evaluation instrument while the instructor was absent from the classroom. The beauty judgements were made later, from each instructor's pictures, by six undergraduate students (3 women and 3 men) who had not attended the classes and were not aware of the course evaluations. The sample contains a total of 94 professors across 463 classes, with the number of classes taught by each professor ranging from 1 to 13. Underlying the 463 observations are 16,957 completed evaluations from 25,547 registered students. The data you will use for this exercise excludes some variables in the original dataset.
Read the article (available via Duke library) for more information about the problem.
Download the data here. The code book is given below:
|Variable|Description|
|---|---|
|profnumber|ID for each professor|
|beauty|Average of 6 standardized beauty ratings|
|eval|Average course rating|
|CourseID|ID for 30 individual courses. The remaining classes were not identified in the original data, so they have value 0.|
|tenured|Is the professor tenured? 1 = yes, 0 = no|
|minority|Is the professor from a minority group? 1 = yes, 0 = no|
|didevaluation|Number of students filling out evaluations|
|female|1 = female, 0 = male|
|formal|Did the instructor dress formally (that is, wear a tie–jacket/blouse) in the pictures used for the beauty ratings? 1 = yes, 0 = no|
|lower|Lower division course? 1 = yes, 0 = no|
|multipleclass|1 = more than one professor in the sample teaching sections of the course, 0 = otherwise|
|nonenglish|Did the professor receive an undergraduate education in a non-English-speaking country? 1 = yes, 0 = no|
|onecredit|1 = one-credit course, 0 = otherwise|
|profevaluation|Average instructor rating|
|tenuretrack|Is the professor tenure-track faculty? 1 = yes, 0 = no|
The TA will walk you through how one would approach each of the following.
Treat the variable eval as the response variable and the other variables as potential predictors.
1. Is the marginal distribution of eval visibly skewed? If it is, try the log transformation. Does that look more “normal”? If you think the log transformation does not help, what other transformation(s) do you think might work? Examine and describe the distribution(s) for the transformation(s).
Note that for linear models and linear mixed effects models, the normality assumption applies to the response conditional on the predictors (equivalently, to the errors), not to the marginal distribution of the response, so we should really be assessing the assumption via the residuals. That said, whenever the marginal distribution of your response is quite visibly skewed, the skew will often carry over to the errors anyway, so we can start to think about potential transformations during EDA. Please note: best practice is to rely on the residual plots to decide whether any transformation is needed.
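As a rough numerical companion to the histograms, sample skewness can be compared before and after a candidate transformation. The sketch below is only an illustration: the ratings are synthetic (drawn from a left-skewed beta distribution rescaled to 1-5), not the lab data, and the names `sample_skewness` and `fake_eval` are made up for this example.

```python
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic stand-in for eval (an assumption, NOT the real data):
# 463 ratings on a 1-5 scale that pile up near the top, i.e. left-skewed.
rng = random.Random(610)
fake_eval = [1 + 4 * rng.betavariate(5, 2) for _ in range(463)]

skew_raw = sample_skewness(fake_eval)
skew_log = sample_skewness([math.log(v) for v in fake_eval])
skew_sq = sample_skewness([v ** 2 for v in fake_eval])

print(f"raw: {skew_raw:.2f}  log: {skew_log:.2f}  squared: {skew_sq:.2f}")
```

On left-skewed data like this, the log pulls in the right tail and so makes the skew worse, while a convex transformation such as squaring moves it toward zero. Whether the real eval behaves this way is something to check on the actual data.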
2. Describe the overall relationship between eval (or the transformed version) and beauty.
3. Examine the relationship between eval (or the transformed version) and beauty by CourseID. Are there any courses for which the relationship between eval and beauty looks potentially different than others?
There are 31 levels of CourseID in all, which might be tough to explore graphically, so it might be a good idea to look at a random sample of, say, 7 classes plus class CourseID == 0, making a total of 8 classes. Note that level CourseID == 0 actually pools many other classes that were not identified in the data. For the purposes of this lab, we treat it as one single popular class.
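The sampling step described above could be sketched as follows. This is a minimal illustration, assuming the 30 identified courses are coded 1 through 30 (the code book does not state the coding) and using a tiny hypothetical list of rows in place of the real data.

```python
import random

rng = random.Random(2005)  # fixed seed so the same courses are drawn on every run

# Assumption: the 30 identified courses are coded 1..30; check the real CourseID values.
identified_courses = list(range(1, 31))
sampled_courses = sorted(rng.sample(identified_courses, 7))  # 7 distinct courses
courses_to_plot = [0] + sampled_courses                      # plus the pooled class 0

# Hypothetical rows standing in for the real data set.
classes = [{"CourseID": cid} for cid in [0, 0, 3, 17, 30, 0, 8]]
keep = set(courses_to_plot)
subset = [row for row in classes if row["CourseID"] in keep]

print(courses_to_plot)
print(len(subset), "rows kept")
```

Fixing the seed keeps the plots reproducible between runs; drawing a second sample with a different seed is a cheap check that any odd-looking course is not just an artifact of one draw.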
4. Do you think it is meaningful to fit a multilevel linear model that includes varying slopes for beauty by profnumber? Why or why not?
5. Now, explore the relationship between eval and the other potential predictors, excluding CourseID. Describe the most interesting relationships. Note that we should not include profevaluation as a predictor for eval. Why is that?
6. Write down a frequentist random intercepts model for eval (or the transformed version) as the response, grouped by profnumber, with beauty as the only predictor. Fit the model and interpret the results in the context of the question.
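One common way to write such a model is given here as a notational sketch rather than the required answer. The symbols are assumptions of this sketch, not notation taken from the lab or the textbook: $y_{ij}$ is eval (or its transformed version) for class $i$ taught by professor $j$, and $x_j$ is the beauty score of professor $j$.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_j + \gamma_j + \varepsilon_{ij}, \\
\gamma_j &\sim \mathcal{N}(0, \tau^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \qquad
\gamma_j \perp \varepsilon_{ij}.
\end{aligned}
```

In this notation $\tau^2$ is the variance of the professor-specific intercepts (between professors) and $\sigma^2$ is the residual variance (between classes taught by the same professor).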
7. Using the model from question 6 above, how does the variation in average ratings across professors compare to the variation in ratings for the same professor? Which is higher, and what does that mean in the context of the question?
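One way to summarize this comparison is the intraclass correlation: the between-professor variance divided by the sum of the between-professor and within-professor variances. The sketch below illustrates the idea with a one-way ANOVA (method-of-moments) estimator on synthetic data. It is not the fitted model from question 6 (it ignores beauty and does not use maximum likelihood), and the standard deviations 0.5 and 0.3 are arbitrary choices for the simulation, not estimates from the lab data.

```python
import random


def anova_variance_components(groups):
    """One-way ANOVA (method-of-moments) variance components.

    groups maps a group label to the list of responses in that group;
    group sizes may differ. Returns (between_var, within_var), with
    between_var truncated at 0.
    """
    k = len(groups)
    sizes = [len(values) for values in groups.values()]
    n_total = sum(sizes)
    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    ss_within = 0.0
    ss_between = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_within += sum((y - group_mean) ** 2 for y in values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2

    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)
    # Effective group size for unbalanced groups.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    between_var = max(0.0, (ms_between - ms_within) / n0)
    return between_var, ms_within


def intraclass_correlation(between_var, within_var):
    """Share of total variance that lies between groups."""
    return between_var / (between_var + within_var)


# Synthetic data (an assumption, NOT the lab data): 94 hypothetical
# professors, each teaching between 1 and 13 classes.
rng = random.Random(12)
true_between_sd, true_within_sd = 0.5, 0.3
ratings = {}
for prof in range(1, 95):
    prof_effect = rng.gauss(0, true_between_sd)
    n_classes = rng.randint(1, 13)
    ratings[prof] = [4.0 + prof_effect + rng.gauss(0, true_within_sd)
                     for _ in range(n_classes)]

between_var, within_var = anova_variance_components(ratings)
icc = intraclass_correlation(between_var, within_var)
print(f"between: {between_var:.3f}  within: {within_var:.3f}  ICC: {icc:.2f}")
```

An intraclass correlation near 1 would mean most of the variation in ratings is between professors; a value near 0 would mean classes taught by the same professor differ about as much as classes taught by different professors.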
8. Identify three instructor-level predictors excluding beauty (with profnumber being the instructor-level identifier) that you think we should control for based on the EDA. Write down and fit a varying-intercept model for the data grouped by profnumber, and include beauty, plus the other three variables you have identified as potential predictors. Interpret the results in the context of the question.
No submissions are required at the end of this lab.