Effects of an Outlier. Refer to the Minitab-generated scatterplot given in Exercise 11 of Section 10-1 on page 485.

a. Using the pairs of values for all 10 points, find the equation of the regression line.

b. After removing the point with coordinates (10, 10), use the pairs of values for the remaining 9 points and find the equation of the regression line.

c. Compare the results from parts (a) and (b).

Short Answer

a. The regression equation is \(\hat y = 0.264 + 0.906x\).

b. The regression equation excluding the pair (10, 10) is \(\hat y = 2.00 + 0.00x\).

c. The regression equations obtained in parts (a) and (b) differ markedly. The presence of the outlier (10, 10) affects the regression equation significantly.

Step by step solution

01

Given information

A set of 10 pairs of values is considered.

02

Regression equation using all values

a.

The regression equation of y on x has the following notation:

\(\hat y = {b_0} + {b_1}x\), where

\({b_0}\) is the intercept term, and

\({b_1}\) is the slope coefficient.

The calculations for the 10 data points give the following sums: \(n = 10\), \(\sum x = 28\), \(\sum y = 28\), \(\sum {x^2} = 142\), and \(\sum xy = 136\).

The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {28} \right)\left( {142} \right) - \left( {28} \right)\left( {136} \right)}}{{10\left( {142} \right) - {{\left( {28} \right)}^2}}}\\ = 0.264\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {10} \right)\left( {136} \right) - \left( {28} \right)\left( {28} \right)}}{{10\left( {142} \right) - {{\left( {28} \right)}^2}}}\\ = 0.906\end{array}\).

Thus, the regression equation becomes

\(\hat y = 0.264 + 0.906x\).
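As a quick check of the arithmetic in part (a), the two formulas can be evaluated directly from the sums that appear in the computations above. This is a minimal Python sketch; the function name `regression_from_sums` is a hypothetical helper introduced for illustration, not part of the original solution.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (b0, b1) for y-hat = b0 + b1*x, computed from summary sums."""
    denom = n * sum_x2 - sum_x ** 2
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # y-intercept
    b1 = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return b0, b1


# Sums read off the part (a) computations:
# n = 10, sum of x = 28, sum of y = 28, sum of x^2 = 142, sum of xy = 136
b0_all, b1_all = regression_from_sums(10, 28, 28, 142, 136)
print(f"y-hat = {b0_all:.3f} + {b1_all:.3f}x")
```

Because the slope comes out positive, the fitted line for all 10 points rises with x.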

03

Regression equation excluding the pair (10, 10)

b.

The calculations for the remaining 9 data points give the following sums: \(n = 9\), \(\sum x = 18\), \(\sum y = 18\), \(\sum {x^2} = 42\), and \(\sum xy = 36\).

The value of the y-intercept is computed below.

\(\begin{array}{c}{b_0} = \frac{{\left( {\sum y } \right)\left( {\sum {{x^2}} } \right) - \left( {\sum x } \right)\left( {\sum {xy} } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( {18} \right)\left( {42} \right) - \left( {18} \right)\left( {36} \right)}}{{9\left( {42} \right) - {{\left( {18} \right)}^2}}}\\ = 2.000\end{array}\).

The value of the slope coefficient is computed below.

\(\begin{array}{c}{b_1} = \frac{{n\left( {\sum {xy} } \right) - \left( {\sum x } \right)\left( {\sum y } \right)}}{{n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}}}\\ = \frac{{\left( 9 \right)\left( {36} \right) - \left( {18} \right)\left( {18} \right)}}{{9\left( {42} \right) - {{\left( {18} \right)}^2}}}\\ = 0.000\end{array}\).

Thus, the regression equation becomes

\(\hat y = 2.00 + 0.00x\).
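The sums used in part (b) can be obtained from the part (a) sums by subtracting the contribution of the removed point (10, 10). The sketch below, in Python, illustrates that bookkeeping and then re-evaluates the same two formulas. The variable names are chosen here for illustration only.

```python
# Sums for all 10 points, as used in the part (a) computations.
n, sum_x, sum_y, sum_x2, sum_xy = 10, 28, 28, 142, 136

# Subtract the contribution of the removed point (10, 10) from each sum.
x_out, y_out = 10, 10
n9 = n - 1
sum_x9 = sum_x - x_out
sum_y9 = sum_y - y_out
sum_x2_9 = sum_x2 - x_out ** 2
sum_xy9 = sum_xy - x_out * y_out

# Same intercept and slope formulas as in part (a), applied to 9 points.
denom = n9 * sum_x2_9 - sum_x9 ** 2
b0_9 = (sum_y9 * sum_x2_9 - sum_x9 * sum_xy9) / denom
b1_9 = (n9 * sum_xy9 - sum_x9 * sum_y9) / denom
print(f"y-hat = {b0_9:.3f} + {b1_9:.3f}x")
```

The slope numerator is 9(36) - (18)(18) = 0, so the fitted line for the 9 remaining points is horizontal at y = 2.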

04

Comparison

c.

The regression equations obtained in parts (a) and (b) differ markedly: the slope falls from 0.906 to 0.000, and the intercept rises from 0.264 to 2.000.

Thus, the presence of an extreme data pair (10, 10) can greatly influence the regression equation.
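One way to see the size of the effect is to compare what the two fitted lines predict at the same x values. The Python sketch below uses the rounded coefficients reported in parts (a) and (b). The helper `predict` and the chosen x values are illustrative assumptions; x = 10 is included because it is the x-coordinate of the removed point.

```python
def predict(b0, b1, x):
    """Predicted y for a line y-hat = b0 + b1*x."""
    return b0 + b1 * x


line_all = (0.264, 0.906)      # part (a): all 10 points
line_without = (2.000, 0.000)  # part (b): point (10, 10) removed

# x values picked only for illustration.
for x in (1, 2, 3, 10):
    y_a = predict(*line_all, x)
    y_b = predict(*line_without, x)
    print(f"x = {x:2d}: part (a) {y_a:.3f}, part (b) {y_b:.3f}")
```

At x = 10, the part (a) line predicts about 9.32, while the part (b) line predicts 2.00 for every x. The single point (10, 10) is what pulls the line in part (a) upward.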

Most popular questions from this chapter

Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of \(\alpha = 0.05\). Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.)

CSI Statistics. Use the paired foot length and height data from the preceding exercise. Is there sufficient evidence to conclude that there is a linear correlation between foot lengths and heights of males? Based on these results, does it appear that police can use foot length to estimate the height of a male?

Shoe print (cm) | 29.7 | 29.7 | 31.4 | 31.8 | 27.6
Foot length (cm) | 25.7 | 25.4 | 27.9 | 26.7 | 25.1
Height (cm) | 175.3 | 177.8 | 185.4 | 175.3 | 172.7

Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of \(\alpha = 0.05\). Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.)

Internet and Nobel Laureates. Listed below are numbers of Internet users per 100 people and numbers of Nobel Laureates per 10 million people (from Data Set 16 “Nobel Laureates and Chocolate” in Appendix B) for different countries. Is there sufficient evidence to conclude that there is a linear correlation between Internet users and Nobel Laureates?

Internet Users | 79.5 | 79.6 | 56.8 | 67.6 | 77.9 | 38.3
Nobel Laureates | 5.5 | 9 | 3.3 | 1.7 | 10.8 | 0.1

Stocks and Sunspots. Listed below are annual high values of the Dow Jones Industrial Average (DJIA) and annual mean sunspot numbers for eight recent years. Use the data for Exercises 1–5. A sunspot number is a measure of sunspots or groups of sunspots on the surface of the sun. The DJIA is a commonly used index that is a weighted mean calculated from different stock values.

DJIA | 14,198 | 13,338 | 10,606 | 11,625 | 12,929 | 13,589 | 16,577 | 18,054
Sunspot Number | 7.5 | 2.9 | 3.1 | 16.5 | 55.7 | 57.6 | 64.7 | 79.3

1. Data Analysis Use only the sunspot numbers for the following.

a. Find the mean, median, range, standard deviation, and variance.

b. Are the sunspot numbers categorical data or quantitative data?

c. What is the level of measurement of the data? (nominal, ordinal, interval, ratio)

Testing for a Linear Correlation. In Exercises 13–28, construct a scatterplot, and find the value of the linear correlation coefficient r. Also find the P-value or the critical values of r from Table A-6. Use a significance level of \(\alpha = 0.05\). Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.)

Lemons and Car Crashes. Listed below are annual data for various years. The data are weights (metric tons) of lemons imported from Mexico and U.S. car crash fatality rates per 100,000 population (based on data from “The Trouble with QSAR (or How I Learned to Stop Worrying and Embrace Fallacy),” by Stephen Johnson, Journal of Chemical Information and Modeling, Vol. 48, No. 1). Is there sufficient evidence to conclude that there is a linear correlation between weights of lemon imports from Mexico and U.S. car fatality rates? Do the results suggest that imported lemons cause car fatalities?

Lemon Imports | 230 | 265 | 358 | 480 | 530
Crash Fatality Rate | 15.9 | 15.7 | 15.4 | 15.3 | 14.9

Exercises 13–28 use the same data sets as Exercises 13–28 in Section 10-1. In each case, find the regression equation, letting the first variable be the predictor (x) variable. Find the indicated predicted value by following the prediction procedure summarized in Figure 10-5 on page 493.

Using the listed duration and interval after times, find the best predicted “interval after” time for an eruption with a duration of 253 seconds. How does it compare to an actual eruption with a duration of 253 seconds and an interval after time of 83 minutes?
