Question: Accuracy of software effort estimates. Periodically, software engineers must provide estimates of their effort in developing new software. In the Journal of Empirical Software Engineering (Vol. 9, 2004), multiple regression was used to predict the accuracy of these effort estimates. The dependent variable, defined as the relative error in estimating effort, y = (Actual effort − Estimated effort)/(Actual effort), was determined for each task in a sample of n = 49 software development tasks. Eight independent variables were evaluated as potential predictors of relative error using stepwise regression. Each of these was formulated as a dummy variable, as shown in the table.

Company role of estimator: x1 = 1 if developer, 0 if project leader

Task complexity: x2 = 1 if low, 0 if medium/high

Contract type: x3 = 1 if fixed price, 0 if hourly rate

Customer importance: x4 = 1 if high, 0 if low/medium

Customer priority: x5 = 1 if time of delivery, 0 if cost or quality

Level of knowledge: x6 = 1 if high, 0 if low/medium

Participation: x7 = 1 if estimator participates in work, 0 if not

Previous accuracy: x8 = 1 if more than 20% accurate, 0 if less than 20% accurate

a. In step 1 of the stepwise regression, how many different one-variable models are fit to the data?

b. In step 1, the variable x1 is selected as the best one-variable predictor. How is this determined?

c. In step 2 of the stepwise regression, how many different two-variable models (where x1 is one of the variables) are fit to the data?

d. The only two variables selected for entry into the stepwise regression model were x1 and x8. The stepwise regression yielded the following prediction equation:

ŷ = β̂0 − 0.28x1 + 0.27x8

Give a practical interpretation of the β estimates multiplied by x1 and x8.

e. Why should a researcher be wary of using the model, part d, as the final model for predicting effort (y)?

Short Answer

a. Each of the 8 independent variables is fit on its own, so 8 one-variable models are fit to the data.

b. For each one-variable model, the t-statistic for testing H0: βi = 0 is computed. The variable with the largest absolute t-value (equivalently, the smallest p-value) is selected; here that variable is x1.

c. 7 two-variable models are fit: x1 paired with each of the 7 remaining variables.

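d. The estimates are β̂1 = −0.28 and β̂8 = 0.27. Holding x8 fixed, developers have an estimated mean relative error 0.28 lower than project leaders. Holding x1 fixed, estimators whose previous estimates were more than 20% accurate have an estimated mean relative error 0.27 higher than those with less than 20% accuracy.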

e. First, an extremely large number of t-tests have been conducted, leading to a high probability of making one or more Type I or Type II errors (keeping unimportant variables or dropping important ones). Second, the stepwise model does not include any higher-order or interaction terms.

Step by step solution

01

One-variable models

In step 1, each of the 8 candidate variables x1, x2, …, x8 is used by itself in a model of the form E(y) = β0 + βixi. Since there are 8 independent variables, 8 one-variable models are fit to the data.

02

Best predictor variable

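For each one-variable model, the test of H0: βi = 0 uses the statistic t = β̂i / s(β̂i). The variable with the largest absolute t-value (equivalently, the smallest p-value) is selected, provided that it is significant at the chosen entry level α. In this study, that variable was x1.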
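A minimal Python sketch of the step 1 selection rule, assuming simple least squares fits and using hypothetical toy data (not the study's 49 tasks) with only three of the eight dummies for brevity. The function names `slope_t` and `stepwise_step_one` are illustrative, not from any statistics package.

```python
import math

def slope_t(x, y):
    """t-statistic for H0: beta = 0 in the one-variable model E(y) = b0 + beta*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx            # least squares slope
    sse = syy - b * sxy      # residual sum of squares
    s2 = sse / (n - 2)       # estimated error variance
    return b / math.sqrt(s2 / sxx)

def stepwise_step_one(candidates, y):
    """Fit one model per candidate variable; return the one with the largest |t|."""
    t_values = {name: slope_t(x, y) for name, x in candidates.items()}
    best = max(t_values, key=lambda name: abs(t_values[name]))
    return best, t_values

# Hypothetical relative errors and dummy variables (illustration only).
y = [0.10, -0.20, 0.30, -0.10, 0.05, -0.25, 0.40, -0.15]
candidates = {
    "x1": [0, 1, 0, 1, 0, 1, 0, 1],
    "x2": [1, 1, 0, 0, 1, 1, 0, 0],
    "x3": [0, 0, 1, 1, 0, 1, 1, 0],
}
best, t_values = stepwise_step_one(candidates, y)
print(best, {name: round(t, 2) for name, t in t_values.items()})
```

The sketch only ranks the candidates by |t|; a full stepwise routine would also compare the winner's p-value with the entry level α before letting it enter.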

03

Two-variable models

In step 2, x1 is already in the model, so each of the remaining k − 1 = 8 − 1 = 7 variables is paired with x1 in a model of the form E(y) = β0 + β1x1 + βixi.

So, 7 two-variable models are fit.
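The counting in steps 1 and 2 can be checked with a short Python sketch, assuming each entry step tries every variable not yet in the model alongside those already entered. The helper `candidate_models` is hypothetical, and it ignores the removal check that stepwise regression performs after each new entry.

```python
# Candidate dummy variables x1, ..., x8 from the table.
ALL_VARS = tuple(f"x{i}" for i in range(1, 9))

def candidate_models(entered, all_vars=ALL_VARS):
    """Return the models tried at the next entry step: the variables already
    entered plus one variable not yet in the model."""
    return [list(entered) + [v] for v in all_vars if v not in entered]

step1 = candidate_models([])             # one-variable models
step2 = candidate_models(["x1"])         # x1 plus one other variable
step3 = candidate_models(["x1", "x8"])   # tried after x8 enters
print(len(step1), len(step2), len(step3))  # 8 7 6
```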

04

Interpretation of β estimates

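The β estimates of x1 and x8 are −0.28 and 0.27. Because x1 and x8 are dummy variables, each estimate is the estimated difference in mean relative error between the two levels of that variable, with the other variable held fixed.

β̂1 = −0.28: developers (x1 = 1) have an estimated mean relative error 0.28 lower than project leaders (x1 = 0).

β̂8 = 0.27: estimators whose previous estimates were more than 20% accurate (x8 = 1) have an estimated mean relative error 0.27 higher than those with less than 20% accuracy (x8 = 0).

Since y = (Actual effort − Estimated effort)/(Actual effort) is positive when effort is underestimated, a higher value of y indicates more underestimation relative to actual effort.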
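A small Python sketch of why each dummy-variable coefficient is a difference in predicted means. The intercept estimate is not used here; the values in `B0_VALUES` are hypothetical placeholders, chosen to show that the differences do not depend on the intercept or on the level of the other variable.

```python
B1, B8 = -0.28, 0.27   # estimated coefficients of x1 and x8 (part d)

def y_hat(x1, x8, b0):
    """Prediction equation from part d; b0 stands in for the intercept estimate."""
    return b0 + B1 * x1 + B8 * x8

# Hypothetical intercepts: the gaps below do not depend on which one is used.
B0_VALUES = (-0.5, 0.0, 0.5)

# Developer (x1 = 1) minus project leader (x1 = 0), at each level of x8.
role_gaps = [y_hat(1, x8, b0) - y_hat(0, x8, b0) for b0 in B0_VALUES for x8 in (0, 1)]
# Previous accuracy > 20% (x8 = 1) minus < 20% (x8 = 0), at each level of x1.
accuracy_gaps = [y_hat(x1, 1, b0) - y_hat(x1, 0, b0) for b0 in B0_VALUES for x1 in (0, 1)]

print([round(g, 2) for g in role_gaps])      # every entry is -0.28
print([round(g, 2) for g in accuracy_gaps])  # every entry is 0.27
```

Because the model has no x1x8 interaction term, the gap for one variable is the same at both levels of the other.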

05

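Precautions while using the stepwise model

First, an extremely large number of t-tests have been conducted (8 in step 1, 7 in step 2, and so on), leading to a high probability of making one or more Type I errors (keeping unimportant variables) or Type II errors (dropping important ones). Second, the stepwise model does not include any higher-order or interaction terms, so an interaction such as x1x8 was never considered. Stepwise regression is better used as a variable-screening tool than as a way to choose the final model.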
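As a rough illustration of how quickly the chance of a Type I error grows, the Python sketch below counts the entry t-tests (8, then 7, then 6 at the step where nothing entered) and computes the probability of at least one false entry. It assumes α = .05 for every test, which the source does not report, and treats the tests as independent, which they are not; the result is only a ballpark figure.

```python
K = 8          # candidate dummy variables x1..x8
ENTERED = 2    # x1 and x8 entered; the search stopped at the next step
ALPHA = 0.05   # assumed entry level (not reported in the source)

# Entry t-tests: 8 at step 1, 7 at step 2, 6 at step 3 (where nothing entered).
n_tests = sum(K - step for step in range(ENTERED + 1))

# P(at least one Type I error) if the tests were independent.
fwer = 1 - (1 - ALPHA) ** n_tests
print(n_tests, round(fwer, 2))  # 21 0.66
```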
