Question: Household food consumption. The data in the table below were collected for a random sample of 26 households in Washington, D.C. An economist wants to relate household food consumption, y, to household income, x1, and household size, x2, with the first-order model.

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

  1. Fit the model to the data. Do you detect any signs of multicollinearity in the data? Explain.
  2. Is there visual evidence (from a residual plot) that a second-order model may be more appropriate for predicting household food consumption? Explain.
  3. Comment on the assumption of constant error variance, using a residual plot. Does it appear to be satisfied?
  4. Are there any outliers in the data? If so, identify them.
  5. Based on a graph of the residuals, does the assumption of normal errors appear to be reasonably satisfied? Explain.

Short Answer

Answers

  1. The estimated coefficient of household income is negative, although logically food consumption should increase as income increases. A coefficient with the wrong sign like this may indicate multicollinearity.
  2. The residual plot suggests that a second-order model is more appropriate for the data.
  3. The error variance does not look constant in the residual plot: the residuals are close together for the early observations, while the spread increases for the later observations.
  4. Observation 26 is an outlier; its residual value is 2.789.
  5. The assumption of normal errors does not appear to be satisfied, because the residual plot shows that the error variance is not constant.

Step by step solution

01

Given information  

The sample consists of 26 households, and the first-order model is given as $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.

02

Model fitting 

a.

The question gives data for 26 households on food consumption, y, household income, x1, and household size, x2. The Excel summary output is attached below. In that output, the estimated coefficient of household income is negative, although logically food consumption should increase as income increases. A coefficient with the wrong sign like this may indicate multicollinearity.

The model can be fitted with the Data Analysis tool on the Data tab in Excel. Take the values of y, x1, and x2 from the Excel table and run the Regression option. After the ranges for the dependent variable, y, and the independent variables, x1 and x2, are supplied, the tool produces the summary output of the model automatically.
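The same fit can be sketched outside Excel. The example below is a minimal illustration in plain Python that solves the normal equations for the first-order model and also reports the correlation between x1 and x2 as a rough multicollinearity check. The data are hypothetical and are not the 26 households from the exercise; the helper names `solve`, `fit_first_order`, and `correlation` are made up for this sketch.

```python
# Hypothetical data for illustration only; NOT the 26 households from the exercise.
income = [20.0, 35.0, 50.0, 65.0, 80.0, 95.0]  # x1
size = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]          # x2
# y is built exactly from 1 + 0.05*x1 + 0.5*x2 so the fit is easy to check.
food = [1.0 + 0.05 * a + 0.5 * b for a, b in zip(income, size)]


def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_first_order(x1, x2, y):
    """Return least-squares estimates [b0, b1, b2] from X'X b = X'y."""
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(3)]
    return solve(XtX, Xty)


def correlation(u, v):
    """Pearson correlation between two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


b0, b1, b2 = fit_first_order(income, size, food)
r12 = correlation(income, size)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}, corr(x1, x2)={r12:.3f}")
```

A correlation between x1 and x2 close to 1 in absolute value, together with a coefficient whose sign contradicts expectations, is the kind of evidence part a points to.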

For the ANOVA table, first calculate the mean of the dependent variable, then calculate SSR, SSE, and SST. After that, calculate the degrees of freedom, the mean squares, and the F statistic.

SSR is calculated as $\sum (\hat{y}_i - \bar{y})^2$, and SSE as $\sum (y_i - \hat{y}_i)^2$, that is, by squaring each residual and adding them all. SST is the sum of SSR and SSE. MS regression is calculated by dividing SSR by the regression degrees of freedom (k, the number of predictors), and MS residual is calculated by dividing SSE by the residual degrees of freedom (n - k - 1). F is calculated by dividing MS regression by MS residual.
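The ANOVA arithmetic can be sketched in a few lines of Python. The numbers below are hypothetical: y holds four made-up observations, and y_hat holds the least-squares fitted values from the straight line -0.5 + 1.4x at x = 1, 2, 3, 4, so k = 1 here. For the household model, k would be 2 and the fitted values would come from the two-predictor fit. The function name `anova_table` is an assumption of this sketch.

```python
def anova_table(y, y_hat, k):
    """Sums of squares, mean squares, and F for a least-squares fit with k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in y_hat)             # regression sum of squares
    sse = sum((obs - f) ** 2 for obs, f in zip(y, y_hat))  # residual sum of squares
    sst = sum((obs - y_bar) ** 2 for obs in y)             # total sum of squares
    df_reg, df_res = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_res
    return {"SSR": ssr, "SSE": sse, "SST": sst,
            "df_reg": df_reg, "df_res": df_res,
            "MSR": msr, "MSE": mse, "F": msr / mse}


# Hypothetical observations and their least-squares fitted values (one predictor).
y = [1.0, 3.0, 2.0, 6.0]
y_hat = [0.9, 2.3, 3.7, 5.1]

table = anova_table(y, y_hat, k=1)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

Because the fitted values come from a least-squares fit, SSR and SSE add up to SST, which is a quick check on the arithmetic.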

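For a model with a single predictor x, the slope is calculated as $\frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$, whereas the intercept is calculated as $\frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}$. With two predictors, as in this model, the coefficients come from solving the normal equations, which Excel does automatically.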
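As a small worked check of the single-predictor formulas, the sketch below applies them to four hypothetical (x, y) pairs. The data and the function name `simple_fit` are assumptions made for illustration only.

```python
def simple_fit(x, y):
    """Slope and intercept of the least-squares line from the summation formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    denom = n * sxx - sx ** 2
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy * sxx - sx * sxy) / denom
    return slope, intercept


# Hypothetical data, not taken from the exercise.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 6.0]

slope, intercept = simple_fit(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Here n = 4, Σx = 10, Σy = 12, Σxy = 37, and Σx² = 30, so the slope is (148 - 120)/20 = 1.4 and the intercept is (360 - 370)/20 = -0.5.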

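The standard error of a sample mean is calculated by dividing the standard deviation by the square root of the sample size. The standard errors of the regression coefficients in the Excel output are instead based on the residual standard deviation, $s = \sqrt{SSE/(n-k-1)}$.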

The Excel summary output is attached here.

03

Residual plot

b.

Residual plots are used to check the following assumptions:

  • Mean ε = 0: A residual plot can detect a model in which the hypothesized relationship between E(y) and an independent variable x is misspecified. The assumption of a mean error of 0 is violated in such models.
  • Constant error variance: Residual plots can also be used to detect violations of the assumption of constant error variance.
  • Errors normally distributed: Several graphical methods are available for assessing whether the random error ε has an approximately normal distribution. If the assumption of normally distributed errors is satisfied, then we expect approximately 95% of the residuals to fall within 2 standard deviations of the mean of 0, and almost all of the residuals to lie within 3 standard deviations of the mean of 0.
  • Errors independent: The assumption of independent errors is violated when successive errors are correlated.

From the residual plot, it can be seen that a second-order model is more appropriate for the data.

The graph can be drawn by plotting the residuals, calculated as $y - \hat{y}$, on the y-axis against the observation number on the x-axis. After plotting the individual points, a line can be drawn through them to show the pattern in the residuals.
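To show what a curved residual pattern looks like numerically, the sketch below fits a straight line to hypothetical data that are exactly quadratic (y = x²) and lists the residuals y - ŷ. The data and the function name `line_residuals` are assumptions for illustration; they are not the household data.

```python
def line_residuals(x, y):
    """Residuals y - y_hat from the least-squares straight line through (x, y)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    denom = n * sxx - sx ** 2
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy * sxx - sx * sxy) / denom
    return [obs - (intercept + slope * a) for a, obs in zip(x, y)]


# Hypothetical data with a purely quadratic relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [a ** 2 for a in x]

residuals = line_residuals(x, y)
for a, e in zip(x, residuals):
    print(f"x={a:.0f}  residual={e:+.1f}")
```

The residuals are positive at both ends and negative in the middle. A U-shaped pattern like this in a residual plot is the visual evidence that a first-order model is missing curvature.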

04

Constant error variance assumption

c.

The error variance does not look constant in the residual plot: the residuals are close together for the early observations, while the spread increases for the later observations.
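A rough numeric companion to this visual check is to compare the average squared residual in the early observations with that in the later ones. The sketch below does this for hypothetical residuals; it is an informal comparison under that assumption, not a formal test, and the function name `spread_ratio` is made up here.

```python
def spread_ratio(residuals):
    """Mean squared residual of the later half divided by that of the early half."""
    half = len(residuals) // 2
    early, late = residuals[:half], residuals[half:]
    ms_early = sum(e * e for e in early) / len(early)
    ms_late = sum(e * e for e in late) / len(late)
    return ms_late / ms_early


# Hypothetical residuals in observation order: tight early, spread out later.
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

ratio = spread_ratio(residuals)
print(f"late/early spread ratio: {ratio:.1f}")
```

A ratio near 1 is consistent with constant error variance; a ratio far above 1, as in this made-up example, matches the fanning-out pattern described above.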

05

Outlier

d.

Observation 26 is an outlier: its residual value is 2.789, and the outlier is also visible in the graph.
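One common way to screen for outliers is to divide each residual by the residual standard deviation s = sqrt(SSE/(n - k - 1)) and flag the large values. The sketch below assumes hypothetical residuals from a two-predictor fit and a cutoff of 2 standard deviations; both the cutoff and the function name `flag_outliers` are choices made for this illustration, and a stricter cutoff of 3 is also widely used.

```python
import math


def flag_outliers(residuals, k, cutoff=2.0):
    """Return (observation number, standardized residual) pairs beyond the cutoff."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    s = math.sqrt(sse / (n - k - 1))  # residual standard deviation
    return [(i, e / s) for i, e in enumerate(residuals, start=1)
            if abs(e / s) > cutoff]


# Hypothetical residuals (they sum to zero, as least-squares residuals do).
residuals = [-0.4] * 10 + [4.0]

flagged = flag_outliers(residuals, k=2)
for obs, z in flagged:
    print(f"observation {obs}: standardized residual {z:.3f}")
```

With these made-up numbers only the last observation is flagged, in the same way that a single large residual singles out one observation in the exercise.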

06

Assumption of normal errors 

e.

If the assumption of normally distributed errors is satisfied, then we expect approximately 95% of the residuals to fall within 2 standard deviations of the mean of 0, and almost all of the residuals to lie within 3 standard deviations of the mean of 0.

Here the assumption of normal errors does not appear to be satisfied, because the graph shows that the error variance is not constant. Some residuals are close to zero, meaning the observed y-values lie close to the fitted values. Others are far from zero, meaning a large gap between the observed y-values and the fitted y-values. This indicates that the error variance is not the same across observations.
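The 2 and 3 standard deviation rule of thumb can be checked by counting. The sketch below computes the share of standardized residuals within a given number of standard deviations of 0; the values are hypothetical and the function name `share_within` is an assumption of this example.

```python
def share_within(standardized, k):
    """Fraction of standardized residuals with absolute value at most k."""
    inside = sum(1 for z in standardized if abs(z) <= k)
    return inside / len(standardized)


# Hypothetical standardized residuals (residual divided by s).
z_values = [-2.5, -1.5, -1.0, -0.5, -0.2, 0.1, 0.3, 0.6, 1.2, 1.8]

within_2 = share_within(z_values, 2)
within_3 = share_within(z_values, 3)
print(f"within 2 sd: {within_2:.0%}, within 3 sd: {within_3:.0%}")
```

Shares far below roughly 95% within 2 standard deviations, or several residuals beyond 3, would cast doubt on the normality assumption. A histogram or normal probability plot of the residuals gives a fuller picture than these two counts.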

Most popular questions from this chapter

Can money spent on gifts buy love? Refer to the Journal of Experimental Social Psychology (Vol. 45, 2009) study of whether buying gifts truly buys love, Exercise 9.9 (p. 529). Recall that study participants were randomly assigned to play the role of gift-giver or gift-receiver. Gift-receivers were asked to provide the level of appreciation (measured on a 7-point scale where 1 = “not at all” and 7 = “to a great extent”) they had for the last birthday gift they received from a loved one. Gift-givers were asked to recall the last birthday gift they gave to a loved one and to provide the level of appreciation the loved one had for the gift.

  1. Write a dummy variable regression model that will allow the researchers to compare the average level of appreciation for birthday gift-givers with the average for birthday gift-receivers.
  2. Express each of the model’s β parameters in terms of the mean appreciation levels for gift-givers and gift-receivers.
  3. The researchers hypothesize that the average level of appreciation is higher for birthday gift-givers than for birthday gift-receivers. Explain how to test this hypothesis using the regression model.

Personality traits and job performance. When attempting to predict job performance using personality traits, researchers typically assume that the relationship is linear. A study published in the Journal of Applied Psychology (Jan. 2011) investigated a curvilinear relationship between job task performance and a specific personality trait—conscientiousness. Using data collected for 602 employees of a large public organization, task performance was measured on a 30-point scale (where higher scores indicate better performance) and conscientiousness was measured on a scale of -3 to +3 (where higher scores indicate a higher level of conscientiousness).

a. The coefficient of correlation relating task performance score to conscientiousness score was reported as r = 0.18. Explain why the researchers should not use this statistic to investigate the curvilinear relationship between task performance and conscientiousness.

b. Give the equation of a curvilinear (quadratic) model relating task performance score (y) to conscientiousness score (x).

c. The researchers theorized that task performance increases as level of conscientiousness increases, but at a decreasing rate. Draw a sketch of this relationship.

d. If the theory in part c is supported, what is the expected sign of β2 in the model, part b?

e. The researchers reported β̂2 = 0.32 with an associated p-value of less than 0.05. Use this information to test the researchers’ theory at α = 0.05.

Question: Risk management performance. An article in the International Journal of Production Economics (Vol. 171, 2016) investigated the factors associated with a firm’s supply chain risk management performance (y). Five potential independent variables (all measured quantitatively) were considered: (1) firm size, (2) supplier orientation, (3) supplier dependency, (4) customer orientation, and (5) systemic purchasing. Consider running a stepwise regression to find the best subset of predictors for risk management performance.

a. How many 1-variable models are fit in step 1 of the stepwise regression?

b. Assume supplier orientation is selected in step 1. How many 2-variable models are fit in step 2 of the stepwise regression?

c. Assume systemic purchasing is selected in step 2. How many 3-variable models are fit in step 3 of the stepwise regression?

d. Assume customer orientation is selected in step 3. How many 4-variable models are fit in step 4 of the stepwise regression?

e. Through the first 4 steps of the stepwise regression, determine the total number of t-tests performed. Assuming each test uses an α = .05 level of significance, give an estimate of the probability of at least one Type I error in the stepwise regression.

Suppose you have developed a regression model to explain the relationship between y and x1, x2, and x3. The ranges of the variables you observed were as follows: 10 ≤ y ≤ 100, 5 ≤ x1 ≤ 55, 0.5 ≤ x2 ≤ 1, and 1,000 ≤ x3 ≤ 2,000. Will the error of prediction be smaller when you use the least squares equation to predict y when x1 = 30, x2 = 0.6, and x3 = 1,300, or when x1 = 60, x2 = 0.4, and x3 = 900? Why?

Question: Suppose you fit the regression model $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$ to n = 30 data points and wish to test H0: β3 = β4 = β5 = 0.

a. State the alternative hypothesis Ha.

b. Give the reduced model appropriate for conducting the test.

c. What are the numerator and denominator degrees of freedom associated with the F-statistic?

d. Suppose the SSE’s for the reduced and complete models are SSER = 1,250.2 and SSEC = 1,125.2. Conduct the hypothesis test and interpret the results of your test. Test using α = .05.
