Suppose that, for one semester, you can collect the following data on a random
sample of college juniors and seniors for each class taken: a standardized
final exam score, percentage of lectures attended, a dummy variable indicating
whether the class is within the student's major, cumulative grade point
average prior to the start of the semester, and SAT score.
i. Why would you classify this data set as a cluster sample? Roughly, how many
observations would you expect for the typical student?
ii. Write a model, similar to equation (14.18), that explains final exam
performance in terms of attendance and the other characteristics. Use s to
subscript student and c to subscript class. Which variables do not change
within a student?
iii. If you pool all of the data and use OLS, what are you assuming about
unobserved student characteristics that affect performance and attendance
rate? What roles do SAT score and prior GPA play in this regard?
iv. If you think SAT score and prior GPA do not adequately capture student
ability, how would you estimate the effect of attendance on final exam
performance?