Problem 3

Let \(math10\) denote the percentage of students at a Michigan high school receiving a passing score on a standardized math test (see also Example 4.2). We are interested in estimating the effect of per-student spending on math performance. A simple model is

$$math10=\beta_{0}+\beta_{1} \log (expend)+\beta_{2} \log (enroll)+\beta_{3}\, poverty+u,$$

where \(poverty\) is the percentage of students living in poverty.

i. The variable \(lnchprg\) is the percentage of students eligible for the federally funded school lunch program. Why is this a sensible proxy variable for \(poverty\)?

ii. The table that follows contains OLS estimates, with and without \(lnchprg\) as an explanatory variable.

$$\begin{array}{lcc}
\hline
\text{Independent Variables} & (1) & (2) \\
\hline
\log(expend) & 11.13 & 7.75 \\
 & (3.30) & (3.04) \\
\log(enroll) & .022 & -1.26 \\
 & (.615) & (.58) \\
lnchprg & \text{-} & -.324 \\
 & & (.036) \\
\text{intercept} & -69.24 & -23.14 \\
 & (26.72) & (24.99) \\
\text{Observations} & 428 & 428 \\
R\text{-squared} & .0297 & .1893 \\
\hline
\end{array}$$

Explain why the effect of expenditures on \(math10\) is lower in column (2) than in column (1). Is the effect in column (2) still statistically greater than zero?

iii. Does it appear that pass rates are lower at larger schools, other factors being equal? Explain.

iv. Interpret the coefficient on \(lnchprg\) in column (2).

v. What do you make of the substantial increase in \(R^{2}\) from column (1) to column (2)?
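As a starting point for parts ii to iv, the \(t\) statistics for column (2) can be computed from the table. The sketch below assumes the numbers in parentheses are standard errors; the dictionary and function names are illustrative only. With 428 observations, a one-sided 5% test can reasonably use the standard normal critical value of about 1.645.

```python
# Coefficients and (assumed) standard errors copied from column (2) of the table.
column_2 = {
    "log(expend)": (7.75, 3.04),
    "log(enroll)": (-1.26, 0.58),
    "lnchprg": (-0.324, 0.036),
}


def t_statistic(coef, se):
    """Ratio of an estimate to its standard error (tests H0: coefficient = 0)."""
    return coef / se


# One t statistic per explanatory variable in column (2).
t_stats = {name: t_statistic(coef, se) for name, (coef, se) in column_2.items()}

for name, t in t_stats.items():
    print(f"{name:12s} t = {t:6.2f}")
```

Comparing each ratio with the chosen critical value is left to the reader; the code only does the arithmetic.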

Problem 4

The following equation explains weekly hours of television viewing by a child in terms of the child's age, mother's education, father's education, and number of siblings:

$$tvhours^{*}=\beta_{0}+\beta_{1}\, age+\beta_{2}\, age^{2}+\beta_{3}\, motheduc+\beta_{4}\, fatheduc+\beta_{5}\, sibs+u$$

We are worried that \(tvhours^{*}\) is measured with error in our survey. Let \(tvhours\) denote the reported hours of television viewing per week.

i. What do the classical errors-in-variables (CEV) assumptions require in this application?

ii. Do you think the CEV assumptions are likely to hold? Explain.

Problem 5

In Example 4.4, we estimated a model relating number of campus crimes to student enrollment for a sample of colleges. The sample we used was not a random sample of colleges in the United States, because many schools in 1992 did not report campus crimes. Do you think that college failure to report crimes can be viewed as exogenous sample selection? Explain.

Problem 7

Consider the simple regression model with classical measurement error, \(y=\beta_{0}+\beta_{1} x^{*}+u\), where we have \(m\) measures on \(x^{*}\). Write these as \(z_{h}=x^{*}+e_{h}, h=1, \ldots, m\). Assume that \(x^{*}\) is uncorrelated with \(u, e_{1}, \ldots, e_{m}\), that the measurement errors are pairwise uncorrelated, and that they have the same variance, \(\sigma_{e}^{2}\). Let \(w=\left(z_{1}+\ldots+z_{m}\right) / m\) be the average of the measures on \(x^{*}\), so that, for each observation \(i\), \(w_{i}=\left(z_{i 1}+\ldots+z_{i m}\right) / m\) is the average of the \(m\) measures. Let \(\bar{\beta}_{1}\) be the OLS estimator from the simple regression of \(y_{i}\) on \(1, w_{i}, i=1, \ldots, n\), using a random sample of data.

i. Show that

$$\operatorname{plim}\left(\bar{\beta}_{1}\right)=\beta_{1}\left\{\frac{\sigma_{x^{*}}^{2}}{\left[\sigma_{x^{*}}^{2}+\left(\sigma_{e}^{2} / m\right)\right]}\right\}.$$

[Hint: The plim of \(\bar{\beta}_{1}\) is \(\operatorname{Cov}(w, y) / \operatorname{Var}(w)\).]

ii. How does the inconsistency in \(\bar{\beta}_{1}\) compare with that when only a single measure is available (that is, \(m=1\))? What happens as \(m\) grows? Comment.
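The attenuation formula in part i can be checked numerically before attempting the proof. The following is a minimal simulation sketch, not part of the exercise: it assumes normally distributed \(x^{*}\), \(u\), and \(e_{h}\), and the parameter values (\(\beta_{0}=1\), \(\beta_{1}=2\), unit variances) and function names are arbitrary illustrative choices.

```python
import random


def ols_slope(x, y):
    """Slope from the simple OLS regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(m, n=20000, beta0=1.0, beta1=2.0, sd_x=1.0, sd_e=1.0, seed=0):
    """Regress y on w, the average of m noisy measures of x*, in one simulated sample."""
    rng = random.Random(seed)
    w, y = [], []
    for _ in range(n):
        x_star = rng.gauss(0.0, sd_x)
        # m measures z_h = x* + e_h with uncorrelated, equal-variance errors.
        measures = [x_star + rng.gauss(0.0, sd_e) for _ in range(m)]
        w.append(sum(measures) / m)
        y.append(beta0 + beta1 * x_star + rng.gauss(0.0, 1.0))
    return ols_slope(w, y)


def plim_slope(m, beta1=2.0, sd_x=1.0, sd_e=1.0):
    """Probability limit stated in part i: beta1 * var(x*) / (var(x*) + var(e) / m)."""
    return beta1 * sd_x**2 / (sd_x**2 + sd_e**2 / m)


for m in (1, 4, 16):
    print(m, round(simulate_slope(m), 3), round(plim_slope(m), 3))
```

Under these assumptions, the simulated slope should sit close to the stated probability limit for each \(m\), which gives a quick way to see how the bias changes as more measures are averaged.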

Problem 8

The point of this exercise is to show that tests for functional form cannot be relied on as a general test for omitted variables. Suppose that, conditional on the explanatory variables \(x_{1}\) and \(x_{2}\), a linear model relating \(y\) to \(x_{1}\) and \(x_{2}\) satisfies the Gauss-Markov assumptions:

$$\begin{aligned}
y &=\beta_{0}+\beta_{1} x_{1}+\beta_{2} x_{2}+u \\
\mathrm{E}\left(u | x_{1}, x_{2}\right) &=0 \\
\operatorname{Var}\left(u | x_{1}, x_{2}\right) &=\sigma^{2}.
\end{aligned}$$

To make the question interesting, assume \(\beta_{2} \neq 0\). Suppose further that \(x_{2}\) has a simple linear relationship with \(x_{1}\):

$$\begin{aligned}
x_{2} &=\delta_{0}+\delta_{1} x_{1}+r \\
\mathrm{E}\left(r | x_{1}\right) &=0 \\
\operatorname{Var}\left(r | x_{1}\right) &=\tau^{2}.
\end{aligned}$$

i. Show that

$$\mathrm{E}\left(y | x_{1}\right)=\left(\beta_{0}+\beta_{2} \delta_{0}\right)+\left(\beta_{1}+\beta_{2} \delta_{1}\right) x_{1}.$$

Under random sampling, what is the probability limit of the OLS estimator from the simple regression of \(y\) on \(x_{1}\)? Is the simple regression estimator generally consistent for \(\beta_{1}\)?

ii. If you run the regression of \(y\) on \(x_{1}, x_{1}^{2}\), what will be the probability limit of the OLS estimator of the coefficient on \(x_{1}^{2}\)? Explain.

iii. Using substitution, show that we can write

$$y=\left(\beta_{0}+\beta_{2} \delta_{0}\right)+\left(\beta_{1}+\beta_{2} \delta_{1}\right) x_{1}+u+\beta_{2} r.$$

It can be shown that, if we define \(v=u+\beta_{2} r\), then \(\mathrm{E}\left(v | x_{1}\right)=0\) and \(\operatorname{Var}\left(v | x_{1}\right)=\sigma^{2}+\beta_{2}^{2} \tau^{2}\). What consequences does this have for the \(t\) statistic on \(x_{1}^{2}\) from the regression in part (ii)?

iv. What do you conclude about adding a nonlinear function of \(x_{1}\), in particular \(x_{1}^{2}\), in an attempt to detect omission of \(x_{2}\)?
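The setup in parts i and ii can be explored with a small simulation. This is a sketch under assumptions that are not in the exercise: normal draws for \(x_{1}\), \(r\), and \(u\), and arbitrary illustrative values \(\beta=(1,2,3)\), \(\delta=(0.5,-0.5)\), \(\tau=\sigma=1\), so that \(\beta_{1}+\beta_{2}\delta_{1}=0.5\). The helper names are hypothetical.

```python
import random


def demean(values):
    """Subtract the sample mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def two_regressor_ols(y, a, b):
    """Slopes on a and b from OLS of y on (1, a, b), via demeaned normal equations."""
    y, a, b = demean(y), demean(a), demean(b)
    saa = sum(p * p for p in a)
    sbb = sum(q * q for q in b)
    sab = sum(p * q for p, q in zip(a, b))
    say = sum(p * r for p, r in zip(a, y))
    sby = sum(q * r for q, r in zip(b, y))
    det = saa * sbb - sab**2
    return (sbb * say - sab * sby) / det, (saa * sby - sab * say) / det


def simulate(n=20000, beta=(1.0, 2.0, 3.0), delta=(0.5, -0.5), tau=1.0, seed=0):
    """Omit x2 and regress y on x1 and x1^2; return both slope estimates."""
    rng = random.Random(seed)
    beta0, beta1, beta2 = beta
    delta0, delta1 = delta
    x1, x1_sq, y = [], [], []
    for _ in range(n):
        x = rng.gauss(1.0, 1.0)
        x2 = delta0 + delta1 * x + rng.gauss(0.0, tau)  # x2 is linear in x1 plus r
        x1.append(x)
        x1_sq.append(x * x)
        y.append(beta0 + beta1 * x + beta2 * x2 + rng.gauss(0.0, 1.0))
    return two_regressor_ols(y, x1, x1_sq)


coef_x1, coef_x1_sq = simulate()
print("coefficient on x1:  ", round(coef_x1, 3))
print("coefficient on x1^2:", round(coef_x1_sq, 3))
```

In this design the coefficient on \(x_{1}\) should settle near \(\beta_{1}+\beta_{2}\delta_{1}\) rather than \(\beta_{1}\), which is the behavior parts i and ii ask the reader to establish analytically.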

Problem 10

This exercise shows that in a simple regression model, adding a dummy variable for missing data on the explanatory variable produces a consistent estimator of the slope coefficient if the "missingness" is unrelated to both the unobservable and observable factors affecting \(y\). Let \(m\) be a variable such that \(m=1\) if we do not observe \(x\) and \(m=0\) if we observe \(x\). We assume that \(y\) is always observed. The population model is

$$\begin{aligned}
y &=\beta_{0}+\beta_{1} x+u \\
\mathrm{E}(u | x) &=0.
\end{aligned}$$

i. Provide an interpretation of the stronger assumption

$$\mathrm{E}(u | x, m)=0.$$

In particular, what kind of missing data schemes would cause this assumption to fail?

ii. Show that we can always write

$$y=\beta_{0}+\beta_{1}(1-m) x+\beta_{1} m x+u.$$

iii. Let \(\left\{\left(x_{i}, y_{i}, m_{i}\right): i=1, \ldots, n\right\}\) be random draws from the population, where \(x_{i}\) is missing when \(m_{i}=1\). Explain the nature of the variable \(z_{i}=\left(1-m_{i}\right) x_{i}\). In particular, what does this variable equal when \(x_{i}\) is missing?

iv. Let \(\rho=\mathrm{P}(m=1)\) and assume that \(m\) and \(x\) are independent. Show that

$$\operatorname{Cov}[(1-m) x, m x]=-\rho(1-\rho) \mu_{x}^{2},$$

where \(\mu_{x}=\mathrm{E}(x)\). What does this imply about estimating \(\beta_{1}\) from the regression of \(y_{i}\) on \(z_{i}, i=1, \ldots, n\)?

v. If \(m\) and \(x\) are independent, it can be shown that

$$m x=\delta_{0}+\delta_{1} m+v,$$

where \(v\) is uncorrelated with \(m\) and \(z=(1-m) x\). Explain why this makes \(m\) a suitable proxy variable for \(m x\). What does this mean about the coefficient on \(z_{i}\) in the regression of \(y_{i}\) on \(z_{i}, m_{i}, i=1, \ldots, n\)?

vi. Suppose for a population of children, \(y\) is a standardized test score, obtained from school records, and \(x\) is family income, which is reported voluntarily by families (and so some families do not report their income). Is it realistic to assume \(m\) and \(x\) are independent? Explain.
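Parts iv and v can be illustrated with a simulation in which \(m\) is drawn independently of \(x\) and \(u\). The sketch below is an assumption-laden illustration, not part of the exercise: it uses normal draws and arbitrary values \(\beta_{0}=1\), \(\beta_{1}=2\), \(\mu_{x}=2\), \(\operatorname{Var}(x)=1\), \(\rho=0.3\), and hypothetical helper names. Because \((1-m) m=0\), direct calculation gives \(\operatorname{Cov}[(1-m) x, m x]=-\rho(1-\rho) \mu_{x}^{2}\), which is \(-0.84\) for these values.

```python
import random


def cov(a, b):
    """Sample covariance (dividing by n) between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / n


def two_regressor_ols(y, a, b):
    """Slopes on a and b from OLS of y on (1, a, b)."""
    det = cov(a, a) * cov(b, b) - cov(a, b) ** 2
    slope_a = (cov(b, b) * cov(a, y) - cov(a, b) * cov(b, y)) / det
    slope_b = (cov(a, a) * cov(b, y) - cov(a, b) * cov(a, y)) / det
    return slope_a, slope_b


def simulate(n=20000, beta0=1.0, beta1=2.0, mu_x=2.0, rho=0.3, seed=0):
    """Compare y on z alone with y on (z, m) when missingness is independent of x and u."""
    rng = random.Random(seed)
    y, z, m, mx = [], [], [], []
    for _ in range(n):
        x = rng.gauss(mu_x, 1.0)
        missing = 1.0 if rng.random() < rho else 0.0  # m = 1 means x is unobserved
        y.append(beta0 + beta1 * x + rng.gauss(0.0, 1.0))
        z.append((1.0 - missing) * x)  # z is x when observed and 0 when missing
        m.append(missing)
        mx.append(missing * x)  # used only to check the covariance formula
    return {
        "z_only": cov(z, y) / cov(z, z),
        "z_and_m": two_regressor_ols(y, z, m)[0],
        "cov_z_mx": cov(z, mx),
    }


results = simulate()
for name, value in results.items():
    print(f"{name:9s} {value: .3f}")
```

With a nonzero \(\mu_{x}\), the slope from the regression on \(z\) alone should differ noticeably from \(\beta_{1}\), while adding the dummy \(m\) should bring the coefficient on \(z\) close to \(\beta_{1}\) in this design.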
