
Outlier Refer to the accompanying Minitab-generated scatterplot. a. Examine the pattern of all 10 points and subjectively determine whether there appears to be a correlation between x and y. b. After identifying the 10 pairs of coordinates corresponding to the 10 points, find the value of the correlation coefficient r and determine whether there is a linear correlation. c. Now remove the point with coordinates (10, 10) and repeat parts (a) and (b). d. What do you conclude about the possible effect from a single pair of values?

Short Answer


a. The pattern indicates an upward trend. Hence, a correlation can be expected between the variables x and y.

b. The values are recorded as shown below:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |
| 10 | 10 |

The correlation coefficient is 0.906.

Also, there is sufficient evidence to support the existence of a linear correlation between the two variables.

c. The correlation coefficient is 0, and there is insufficient evidence to support the claim that there is a linear correlation between the two variables.

d. A single pair of values has a substantial effect on the correlation measure.

Step by step solution

01

Given information

The scatterplot generated on Minitab is given.

02

Analyze the scatterplot

a.

A scatterplot is a two-dimensional graph that represents pairs of values for two variables.

Here, the scatterplot shows an overall upward pattern, which means the values of one variable tend to increase with the other.

Due to this pattern and moderately close observations, it can be expected that there exists a linear correlation between the two variables.

03

Compute the measure of the correlation coefficient

b.

The observations obtained from the scatterplot are as follows:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |
| 10 | 10 |

The formula for the correlation coefficient is shown below:

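$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\,\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$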

The values needed for the formula are in the table below:

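| x | y | x² | y² | xy |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 1 | 9 | 1 | 3 |
| 1 | 2 | 1 | 4 | 2 |
| 1 | 3 | 1 | 9 | 3 |
| 2 | 2 | 4 | 4 | 4 |
| 2 | 3 | 4 | 9 | 6 |
| 3 | 2 | 9 | 4 | 6 |
| 3 | 3 | 9 | 9 | 9 |
| 10 | 10 | 100 | 100 | 100 |

$$\sum x = 28, \quad \sum y = 28, \quad \sum x^2 = 142, \quad \sum y^2 = 142, \quad \sum xy = 136$$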

Substitute the values to obtain the correlation coefficient.

$$r = \frac{10(136) - (28)(28)}{\sqrt{10(142) - (28)^2}\,\sqrt{10(142) - (28)^2}} = \frac{576}{636} \approx 0.906$$

Thus, the correlation coefficient is 0.906.
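The arithmetic above can be checked with a short script. This is a minimal sketch, not part of the original solution: the function name `pearson_r` and the list `points` are illustrative, and the coordinates are the ten pairs read from the scatterplot.

```python
from math import sqrt

def pearson_r(pairs):
    """Pearson r from raw sums, using the computational formula in the text."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# The ten (x, y) pairs read from the scatterplot.
points = [(1, 1), (2, 1), (3, 1), (1, 2), (1, 3),
          (2, 2), (2, 3), (3, 2), (3, 3), (10, 10)]

r_all = pearson_r(points)
print(round(r_all, 3))  # 0.906
```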

04

Conduct a hypothesis test for correlation

Let $\rho$ be the true (population) correlation coefficient.

Form the hypotheses as shown:

$$H_0: \rho = 0 \qquad H_a: \rho \neq 0$$

The sample size is n = 10.

The test statistic is computed as follows:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.906}{\sqrt{\dfrac{1 - 0.906^2}{10 - 2}}} = 6.054$$

Thus, the test statistic is 6.054.

The degrees of freedom are computed below:

$$df = n - 2 = 10 - 2 = 8$$

The p-value is computed from the t-distribution with 8 degrees of freedom.

$$p\text{-value} = 2P(T > t) = 2P(T > 6.054) = 2\bigl(1 - P(T < 6.054)\bigr) = 0.0003$$

As the p-value is less than 0.05, the null hypothesis is rejected.

Therefore, there is sufficient evidence to support the existence of a linear correlation between the two variables.
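The test statistic and p-value can be reproduced without a t-table. The sketch below is an illustration under stated assumptions rather than the method used in the solution: it replaces the table lookup with numerical integration (Simpson's rule) of the Student's t density, and the helper names `t_pdf`, `two_sided_p`, and `correlation_t` are hypothetical.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=20000):
    """Two-sided p-value 2*P(T > |t|), integrating the density on [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # By symmetry P(T > |t|) = 0.5 - P(0 < T < |t|).
    return 2 * (0.5 - area_0_to_t)

def correlation_t(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / sqrt((1 - r * r) / (n - 2))

t_stat = correlation_t(0.906, 10)
p_value = two_sided_p(t_stat, df=10 - 2)
print(round(t_stat, 3), round(p_value, 4))  # 6.054 0.0003
```

Calling `two_sided_p(0, 7)` returns 1.0, which corresponds to the case r = 0 with n = 9.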

05

Analyze the scatterplot after removing the coordinates (10,10)

c.

The data without the point (10, 10) are as follows:

| x | y |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |

In the scatterplot of the remaining nine points, the points form a 3-by-3 grid with no upward or downward trend.

Thus, there does not appear to be a linear correlation between the two variables.

06

Compute the correlation coefficient

The values are in the table below:

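| x | y | x² | y² | xy |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 1 | 9 | 1 | 3 |
| 1 | 2 | 1 | 4 | 2 |
| 1 | 3 | 1 | 9 | 3 |
| 2 | 2 | 4 | 4 | 4 |
| 2 | 3 | 4 | 9 | 6 |
| 3 | 2 | 9 | 4 | 6 |
| 3 | 3 | 9 | 9 | 9 |

$$\sum x = 18, \quad \sum y = 18, \quad \sum x^2 = 42, \quad \sum y^2 = 42, \quad \sum xy = 36$$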

Substitute the values to obtain the correlation coefficient:

$$r = \frac{9(36) - (18)(18)}{\sqrt{9(42) - (18)^2}\,\sqrt{9(42) - (18)^2}} = \frac{0}{54} = 0$$

Thus, the correlation coefficient is 0.

07

Conduct a hypothesis test for correlation

Let $\rho$ denote the true (population) correlation coefficient.

The hypotheses are formulated as shown below:

$$H_0: \rho = 0 \qquad H_a: \rho \neq 0$$

The sample size is n = 9.

The test statistic is computed as follows:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0}{\sqrt{\dfrac{1 - 0^2}{9 - 2}}} = 0$$

Thus, the test statistic is 0.

The degrees of freedom are computed below:

$$df = n - 2 = 9 - 2 = 7$$

The p-value is computed from the t-distribution with 7 degrees of freedom.

$$p\text{-value} = 2P(T > 0) = 2\bigl(1 - P(T < 0)\bigr) = 1$$

As the p-value is greater than 0.05, the null hypothesis is not rejected.

Therefore, there is not sufficient evidence to support the claim that there is a linear correlation between the two variables.

08

Discuss the effect of a single pair of values

d. Removing the single pair (10, 10) changes the correlation coefficient from 0.906 to 0 and reverses the conclusion of the hypothesis test. A single pair of values can therefore have a very large effect on the correlation coefficient.
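One way to see how unusual the point (10, 10) is: recompute r with each point left out in turn. The sketch below is an illustration only. It uses the deviation form of the coefficient, which is algebraically equivalent to the sum formula used earlier, and the names `pearson_r`, `points`, and `leave_one_out` are hypothetical.

```python
from math import sqrt

def pearson_r(pairs):
    """Pearson r in deviation form: S_xy / sqrt(S_xx * S_yy)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return s_xy / sqrt(s_xx * s_yy)

# The ten (x, y) pairs read from the scatterplot.
points = [(1, 1), (2, 1), (3, 1), (1, 2), (1, 3),
          (2, 2), (2, 3), (3, 2), (3, 3), (10, 10)]

# r with each point left out in turn (all points are distinct,
# so each dropped point can serve as a dictionary key).
leave_one_out = {
    dropped: pearson_r(points[:i] + points[i + 1:])
    for i, dropped in enumerate(points)
}

print(f"all points: r = {pearson_r(points):.3f}")
for dropped, r in leave_one_out.items():
    print(f"without {dropped}: r = {r:.3f}")
```

Dropping any one of the nine clustered points leaves r at 0.9 or higher, while dropping (10, 10) takes it to 0, so that single point accounts for essentially all of the observed correlation.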


