
Explore! Exercises 9 and 10 provide two data sets from “Graphs in Statistical Analysis,” by F. J. Anscombe, The American Statistician, Vol. 27. For each exercise,

a. Construct a scatterplot.

b. Find the value of the linear correlation coefficient r, then determine whether there is sufficient evidence to support the claim of a linear correlation between the two variables.

c. Identify the feature of the data that would be missed if part (b) was completed without constructing the scatterplot.

x    10    8     13    9     11    14    6     4     12    7     5
y    9.14  8.14  8.74  8.77  9.26  8.10  6.13  3.10  9.13  7.26  4.74

Short Answer


a. The scatter plot is shown below:

b. The correlation coefficient is 0.8162. There is enough evidence to support the claim that there is a linear correlation between the two variables.

c. The scatterplot shows that the data follow a non-linear (curved) pattern, a feature that the correlation analysis in part (b) does not reveal.

Step by step solution

01

Given information

The paired data for the two variables are recorded below.

x    10    8     13    9     11    14    6     4     12    7     5
y    9.14  8.14  8.74  8.77  9.26  8.10  6.13  3.10  9.13  7.26  4.74

02

Sketch a scatterplot

a.

A scatterplot is a graph that represents observations for a paired set of data.

Steps to sketch a scatterplot:

  1. Define the x and y axes for the two variables. The horizontal axis is the x-axis, and the vertical axis is the y-axis.
  2. Plot each (x, y) pair as a point at the corresponding position on the axes.
  3. The resulting set of points is the scatterplot for the paired data.
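The plotting steps above can be sketched in a few lines of Python. This is only a rough text-based illustration using the standard library; the function name `text_scatter` and the choice of 12 rows are assumptions made for this sketch, not part of the original solution.

```python
# Paired data from the exercise.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def text_scatter(xs, ys, rows=12):
    """Return a crude text scatterplot: one column per integer x, `rows` bins for y."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width = x_max - x_min + 1
    grid = [[" "] * width for _ in range(rows)]
    for xv, yv in zip(xs, ys):
        col = xv - x_min  # horizontal position: the x value
        # Vertical position: scale y into a row index; row 0 is the top of the plot.
        row = round((y_max - yv) / (y_max - y_min) * (rows - 1))
        grid[row][col] = "*"
    return "\n".join("|" + " ".join(line) for line in grid)

plot = text_scatter(x, y)
print(plot)
```

Even in this coarse rendering, the points rise, level off, and then fall as x increases, rather than following a straight line.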

03

Compute the measure of the correlation coefficient

b.

The correlation coefficient is computed below:

\(r = \frac{{n\sum {xy} - \left( {\sum x } \right)\left( {\sum y } \right)}}{{\sqrt {n\left( {\sum {{x^2}} } \right) - {{\left( {\sum x } \right)}^2}} \sqrt {n\left( {\sum {{y^2}} } \right) - {{\left( {\sum y } \right)}^2}} }}\)

The values needed for the formula are listed in the table below:

x     y      \({x^2}\)   \({y^2}\)   \(xy\)
10    9.14   100         83.5396     91.4
8     8.14   64          66.2596     65.12
13    8.74   169         76.3876     113.62
9     8.77   81          76.9129     78.93
11    9.26   121         85.7476     101.86
14    8.1    196         65.61       113.4
6     6.13   36          37.5769     36.78
4     3.1    16          9.61        12.4
12    9.13   144         83.3569     109.56
7     7.26   49          52.7076     50.82
5     4.74   25          22.4676     23.7

\(\sum x = 99\)

\(\sum y = 82.51\)

\(\sum {{x^2}} = 1001\)

\(\sum {{y^2}} = 660.1763\)

\(\sum {xy} = 797.59\)

Substitute the values in the formula:

\(\begin{aligned} r &= \frac{{11\left( {797.59} \right) - \left( {99} \right)\left( {82.51} \right)}}{{\sqrt {11\left( {1001} \right) - {{\left( {99} \right)}^2}} \sqrt {11\left( {660.1763} \right) - {{\left( {82.51} \right)}^2}} }}\\ &= \frac{{605}}{{\sqrt {1210} \sqrt {454.0392} }}\\ &= 0.8162\end{aligned}\)

Thus, the correlation coefficient is 0.8162.
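As a check on the arithmetic, the same formula can be evaluated directly from the raw data. The following is a minimal sketch in Python using only the standard library; the variable names are chosen here for illustration.

```python
import math

# Paired data from the exercise.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*Sxy - Sx*Sy) / (sqrt(n*Sx2 - Sx^2) * sqrt(n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(round(r, 4))
```

The printed value should agree with the hand computation of 0.8162, and the intermediate sums should match the column totals listed above.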

04

Conduct a hypothesis test for correlation

Let \(\rho \) be the true (population) correlation coefficient for the paired variables.

For testing the claim, form the hypotheses as shown below:

\(\begin{array}{l}{H_0}:\rho = 0\\{H_a}:\rho \ne 0\end{array}\)

The sample size is \(n = 11\).

The test statistic is computed as follows:

\(\begin{aligned} t &= \frac{r}{{\sqrt {\frac{{1 - {r^2}}}{{n - 2}}} }}\\ &= \frac{{0.8162}}{{\sqrt {\frac{{1 - {{0.8162}^2}}}{{11 - 2}}} }}\\ &= 4.238\end{aligned}\)

Thus, the test statistic is 4.238.

The degrees of freedom are computed below:

\(\begin{aligned} df &= n - 2\\ &= 11 - 2\\ &= 9\end{aligned}\)

The p-value is computed using the t-distribution table.

\(\begin{aligned} p{\rm{ - value}} &= 2P\left( {T > t} \right)\\ &= 2P\left( {T > 4.238} \right)\\ &= 2\left( {1 - P\left( {T < 4.238} \right)} \right)\\ &= 0.002\end{aligned}\)

Thus, the p-value is 0.002.

Since the p-value is less than the 0.05 significance level, the null hypothesis is rejected.

Therefore, there is sufficient evidence to conclude that variables x and y have a linear correlation between them.
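The test statistic and p-value can also be reproduced numerically instead of reading a t-table. The sketch below uses only the Python standard library and approximates the tail area by integrating the t density with Simpson's rule; the helper names `t_pdf` and `upper_tail` and the step count are assumptions made for this illustration, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so P(T > t) = 0.5 - P(0 < T < t).
    return 0.5 - total * h / 3

r, n = 0.8162, 11
df = n - 2
t_stat = r / math.sqrt((1 - r ** 2) / df)
p_value = 2 * upper_tail(t_stat, df)  # two-tailed test

print(round(t_stat, 3), round(p_value, 3))
```

The result should be close to the test statistic of 4.238 and the p-value of 0.002 found above.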

05

Analyze the importance of the scatterplot

c.

The scatterplot reveals that the data follows a strong non-linear pattern. It means that the observations do not align on a straight line.

This characteristic of the data would be missed if part (b) were completed without sketching the scatterplot.
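One way to see the curvature numerically is to look at how the points deviate from a straight line. The sketch below fits a least-squares line and lists the residuals in order of x. The line fit is not part of the original solution; it is used here only as an assumed reference line for illustration.

```python
# Paired data from the exercise.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept of the reference line.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the line, sorted by x.
residuals = sorted((a, b - (intercept + slope * a)) for a, b in zip(x, y))
for a, res in residuals:
    print(a, round(res, 2))
```

The residuals are negative at the smallest and largest x values and positive in between. That systematic arch is the non-linear pattern which the value of r alone does not show.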


