Question: Adverse effects of hot-water runoff. The Environmental Protection Agency (EPA) wants to determine whether the hot-water runoff from a particular power plant located near a large gulf is having an adverse effect on the marine life in the area. The goal is to acquire a prediction equation for the number of marine animals located at certain designated areas, or stations, in the gulf. Based on past experience, the EPA considered the following environmental factors as predictors for the number of animals at a particular station:

x1 = Temperature of water (TEMP)

x2 = Salinity of water (SAL)

x3 = Dissolved oxygen content of water (DO)

x4 = Turbidity index, a measure of the turbidity of the water (TI)

x5 = Depth of the water at the station (ST_DEPTH)

x6 = Total weight of sea grasses in sampled area (TGRSWT)

As a preliminary step in the construction of this model, the EPA used a stepwise regression procedure to identify the most important of these six variables. A total of 716 samples were taken at different stations in the gulf, producing the SPSS printout shown below. (The response measured was y, the logarithm of the number of marine animals found in the sampled area.)

a. According to the SPSS printout, which of the six independent variables should be used in the model? (Use α = .10.)

b. Are we able to assume that the EPA has identified all the important independent variables for the prediction of y? Why?

c. Using the variables identified in part a, write the first-order model with interaction that may be used to predict y.

d. How would the EPA determine whether the model specified in part c is better than the first-order model?

e. Note the small value of R². What action might the EPA take to improve the model?

Short Answer

Answer

a. The variables which should be used in the model are ST_DEPTH, TGRSWT, and TI.

b. No. The EPA should not assume that it has identified all the important independent variables. The stepwise procedure performs a large number of t-tests, which inflates the overall probability of a Type I error, and it does not consider higher-order terms (e.g., interactions and squared terms), so important predictors may be missing from the final model.

c. Using the variables identified in part a, the first-order model with interaction can be written as E(y) = β0 + β1(ST_DEPTH) + β2(TGRSWT) + β3(TI) + β4(ST_DEPTH)(TGRSWT) + β5(TGRSWT)(TI) + β6(ST_DEPTH)(TI).

d. To determine whether the model in part c is better than the first-order model, compare the two nested models with a partial (nested-model) F-test of H0: β4 = β5 = β6 = 0 against the alternative that at least one interaction coefficient is nonzero. Rejecting H0 indicates that the interaction terms improve the model.

e. The R² values for the three models are 0.122, 0.182, and 0.187. These values are very low and indicate that the model does not fit the data well. To improve the model, the EPA can try different sets of variables that explain more of the variation in the data.

Step by step solution

01

Variable selection

From the SPSS printout, the p-values for ST_DEPTH, TGRSWT, and TI are all less than 0.050. For each variable, the null hypothesis H0: β = 0 (the variable does not contribute to the prediction of y) is rejected when p-value < α. At α = .10, all three p-values are below α, so all three β parameters are statistically significant.

The variables which should be used in the model are ST_DEPTH, TGRSWT, and TI.

02

Drawbacks of stepwise regression model

The EPA should not assume that it has identified all the important independent variables for prediction, for two reasons. First, the stepwise procedure performs a large number of t-tests, which inflates the overall probability of a Type I error, so a variable may enter the model by chance. Second, the procedure does not automatically include higher-order terms (e.g., interactions and squared terms), so the final model may omit terms that are important for prediction.
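To get a feel for how quickly repeated testing inflates the error rate, the sketch below computes the chance of at least one false positive across m tests run at level α. It assumes the tests are independent, which is not true of the tests inside a stepwise procedure, so treat the numbers as a rough illustration only. The function name and the values of m are illustrative choices, not quantities taken from the SPSS printout.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one Type I error) across m independent tests at level alpha.

    Assumption: the m tests are independent. Stepwise t-tests are not,
    so this is only a rough guide to the size of the inflation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if m < 0:
        raise ValueError("m must be non-negative")
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    alpha = 0.10  # significance level used in part a
    for m in (1, 3, 6, 12):  # illustrative numbers of tests
        print(f"m = {m:2d} tests -> {familywise_error_rate(alpha, m):.3f}")
```

With α = .10, six independent tests already give close to a 47% chance of at least one false positive, which is why a variable selected by stepwise regression should be treated as a candidate rather than as a confirmed predictor.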

03

First-order model with interaction

Using the variables identified in part a, the first-order model with interaction contains the three main-effect terms plus all three pairwise interaction terms: E(y) = β0 + β1(ST_DEPTH) + β2(TGRSWT) + β3(TI) + β4(ST_DEPTH)(TGRSWT) + β5(TGRSWT)(TI) + β6(ST_DEPTH)(TI).
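The term list above can also be generated mechanically, which helps when more variables are involved. The following is a minimal sketch; the function names are hypothetical, and the interaction terms come out in the order produced by itertools.combinations, so the β labels on the last two terms are swapped relative to the equation above. The labels are arbitrary, and the model is the same.

```python
from itertools import combinations


def interaction_model_terms(variables):
    """Return main-effect terms followed by all pairwise interaction terms."""
    main_effects = [(name,) for name in variables]
    interactions = list(combinations(variables, 2))
    return main_effects + interactions


def format_model(variables):
    """Render E(y) as text, numbering coefficients in the order terms appear."""
    pieces = ["β0"]
    for i, term in enumerate(interaction_model_terms(variables), start=1):
        pieces.append(f"β{i}" + "".join(f"({name})" for name in term))
    return "E(y) = " + " + ".join(pieces)


if __name__ == "__main__":
    # Variables selected in part a of the solution.
    print(format_model(["ST_DEPTH", "TGRSWT", "TI"]))
```

For k selected variables this produces k main effects and k(k − 1)/2 interaction terms, so the three variables here give six terms plus the intercept.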

04

Significance of interaction term

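To determine whether the model in part c is better than the first-order model, fit both models and compare them with a nested-model (partial) F-test. The first-order model is the reduced model and the interaction model is the complete model. The hypotheses are H0: β4 = β5 = β6 = 0 versus Ha: at least one of β4, β5, β6 is nonzero. If H0 is rejected at the chosen α, at least one interaction term contributes to the prediction of y and the interaction model is preferred; otherwise the simpler first-order model is adequate. Separate t-tests on the individual interaction terms are not a substitute, because running several of them inflates the overall Type I error rate.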
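One common way to compare two nested regression models is the partial F statistic, which can be computed from the two R² values alone. The sketch below is a minimal illustration under stated assumptions: the reduced-model R² of 0.187 is assumed to belong to the three-variable first-order model, the complete-model R² of 0.20 is a made-up placeholder (the interaction model has not been fitted in this solution), and the function name is hypothetical.

```python
def nested_f_statistic(r2_reduced, r2_complete, n, k_complete, k_reduced):
    """Partial F statistic for a reduced model nested in a complete model.

    k_complete and k_reduced count predictors, excluding the intercept.
    Returns (F, numerator df, denominator df).
    """
    if not k_complete > k_reduced >= 0:
        raise ValueError("complete model must have more terms than reduced")
    df_num = k_complete - k_reduced   # number of betas tested in H0
    df_den = n - (k_complete + 1)     # error df of the complete model
    if df_den <= 0:
        raise ValueError("not enough observations for the complete model")
    numerator = (r2_complete - r2_reduced) / df_num
    denominator = (1.0 - r2_complete) / df_den
    return numerator / denominator, df_num, df_den


if __name__ == "__main__":
    f_stat, df1, df2 = nested_f_statistic(
        r2_reduced=0.187,   # assumed R² of the first-order model
        r2_complete=0.20,   # HYPOTHETICAL R² for the interaction model
        n=716,              # number of samples stated in the question
        k_complete=6,       # 3 main effects + 3 interactions
        k_reduced=3,        # 3 main effects
    )
    print(f"F = {f_stat:.3f} with ({df1}, {df2}) degrees of freedom")
```

The resulting F value would then be compared with the critical value of the F distribution with the printed degrees of freedom, or converted to a p-value with statistical software.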

05

Interpretation of R²

The R² values for the three models are 0.122, 0.182, and 0.187. These values are very low: even the largest, 0.187, means the model explains only about 18.7% of the sample variation in y, so the model does not fit the data well. To improve the model, the EPA can try different sets of variables that explain more of the variation, including higher-order terms such as the interactions in part c.
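A quick way to check whether each added variable earns its place is to look at adjusted R², which penalizes R² for the number of predictors. The sketch below is an illustration only: it assumes the three R² values correspond to models with one, two, and three predictors (one variable added per stepwise step), which the solution does not state explicitly, and the function name is hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² for a model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    n = 716  # number of samples stated in the question
    # (number of predictors, R²); ASSUMES one variable enters per step.
    steps = [(1, 0.122), (2, 0.182), (3, 0.187)]
    for k, r2 in steps:
        print(f"k = {k}: R² = {r2:.3f}, adjusted R² = {adjusted_r2(r2, n, k):.4f}")
```

Because n = 716 is large relative to the number of predictors, the adjustment is small (each adjusted value stays within about 0.004 of the raw R²), so the conclusion is unchanged: the fit is weak at every step, and the third variable adds very little.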

Most popular questions from this chapter

Question: Shared leadership in airplane crews. Refer to the Human Factors (March 2014) study of shared leadership by the cockpit and cabin crews of a commercial airplane, Exercise 8.14 (p. 466). Recall that simulated flights were taken by 84 six-person crews, where each crew consisted of a 2-person cockpit (captain and first officer) and a 4-person cabin team (three flight attendants and a purser). During the simulation, smoke appeared in the cabin and the reactions of the crew were monitored for teamwork. One key variable in the study was the team goal attainment score, measured on a 0 to 60-point scale. Multiple regression analysis was used to model team goal attainment (y) as a function of the independent variables job experience of purser (x1), job experience of head flight attendant (x2), gender of purser (x3), gender of head flight attendant (x4), leadership score of purser (x5), and leadership score of head flight attendant (x6).

a. Write a complete, first-order model for E(y) as a function of the six independent variables.

b. Consider a test of whether the leadership score of either the purser or the head flight attendant (or both) is statistically useful for predicting team goal attainment. Give the null and alternative hypotheses as well as the reduced model for this test.

c. The two models were fit to the data for the n = 60 successful cabin crews with the following results: R² = .02 for reduced model, R² = .25 for complete model. On the basis of this information only, give your opinion regarding the null hypothesis for successful cabin crews.

d. The p-value of the subset F-test for comparing the two models for successful cabin crews was reported in the article as p < .05. Formally test the null hypothesis using α = .05. What do you conclude?

e. The two models were also fit to the data for the n = 24 unsuccessful cabin crews with the following results: R² = .14 for reduced model, R² = .15 for complete model. On the basis of this information only, give your opinion regarding the null hypothesis for unsuccessful cabin crews.

f. The p-value of the subset F-test for comparing the two models for unsuccessful cabin crews was reported in the article as p < .10. Formally test the null hypothesis using α = .05. What do you conclude?

Question: After-death album sales. When a popular music artist dies, sales of the artist’s albums often increase dramatically. A study of the effect of after-death publicity on album sales was published in Marketing Letters (March 2016). The following data were collected weekly for each of 446 albums of artists who died a natural death: album publicity (measured as the total number of printed articles in which the album was mentioned at least once during the week), artist death status (before or after death), and album sales (dollars). Suppose you want to use the data to model weekly album sales (y) as a function of album publicity and artist death status. Do you recommend using stepwise regression to find the “best” model for predicting y? Explain. If not, outline a strategy for finding the best model.

Suppose the mean value E(y) of a response y is related to the quantitative independent variables x1 and x2 by

E(y) = 2 + x1 - 3x2 - x1x2

a) Identify and interpret the slope for x2.

b) Plot the linear relationship between E(y) and x2 for x1 = 0, 1, 2, where 1 ≤ x2 ≤ 3.

c) How would you interpret the estimated slopes?

d) Use the lines you plotted in part b to determine the changes in E(y) for each x1 = 0, 1, 2.

e) Use your graph from part b to determine how much E(y) changes when 3 ≤ x1 ≤ 5 and 1 ≤ x2 ≤ 3.

Question: Orange juice demand study. A chilled orange juice warehousing operation in New York City was experiencing too many out-of-stock situations with its 96-ounce containers. To better understand current and future demand for this product, the company examined the last 40 days of sales, which are shown in the table below. One of the company’s objectives is to model demand, y, as a function of sale day, x (where x = 1, 2, 3, …, 40).

  a. Construct a scatterplot for these data.
  b. Does it appear that a second-order model might better explain the variation in demand than a first-order model? Explain.
  c. Fit a first-order model to these data.
  d. Fit a second-order model to these data.
  e. Compare the results in parts c and d and decide which model better explains variation in demand. Justify your choice.


Factors that impact an auditor’s judgment. A study was conducted to determine the effects of linguistic delivery style and client credibility on auditors’ judgments (Advances in Accounting and Behavioural Research, 2004). Two hundred auditors from Big 5 accounting firms were each asked to perform an analytical review of a fictitious client’s financial statement. The researchers gave the auditors different information on the client’s credibility and linguistic delivery style of the client’s explanation. Each auditor then provided an assessment of the likelihood that the client-provided explanation accounted for the fluctuation in the financial statement. The three variables of interest, credibility (x1), linguistic delivery style (x2), and likelihood (y), were all measured on a numerical scale. Regression analysis was used to fit the interaction model, y = β0 + β1x1 + β2x2 + β3x1x2 + ε. The results are summarized in the table at the bottom of the page.

a) Interpret the phrase client credibility and linguistic delivery style interact in the words of the problem.

b) Give the null and alternative hypotheses for testing the overall adequacy of the model.

c) Conduct the test, part b, using the information in the table.

d) Give the null and alternative hypotheses for testing whether client credibility and linguistic delivery style interact.

e) Conduct the test, part d, using the information in the table.

f) The researchers estimated the slope of the likelihood–linguistic delivery style line at a low level of client credibility (x1 = 22). Obtain this estimate and interpret it in the words of the problem.

g) The researchers also estimated the slope of the likelihood–linguistic delivery style line at a high level of client credibility (x1 = 46). Obtain this estimate and interpret it in the words of the problem.
