
The data points given in Exercises \(6-7\) were formed by reversing the slope of the lines in Exercises \(4-5\). Plot the points on graph paper and calculate \(r\) and \(r^{2}\). Notice the change in the sign of \(r\) and the relationship between the values of \(r^{2}\) compared to Exercises \(4-5\). By what percentage was the sum of squares of deviations reduced by using the least-squares predictor \(\hat{y}=a+bx\) rather than \(\bar{y}\) as a predictor of \(y\)? $$\begin{array}{l|llllll}x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline y & 0 & 2 & 3 & 5 & 5 & 7\end{array}$$

Short Answer

Expert verified
Answer: The sum of squares of deviations was reduced by approximately \(96.5\%\) when using the least-squares predictor \(\hat{y}=a+bx\) rather than \(\bar{y}\) as a predictor of \(y\); this percentage equals \(r^{2}\times 100\%\), with \(r \approx 0.982\).

Step by step solution

01

Calculate the mean of x and y values

To determine the mean, we will add up all the values for x and y and divide by the number of values. We have the following data points: $$\begin{array}{l|llllll}x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline y & 0 & 2 & 3 & 5 & 5 & 7\end{array}$$ We have 6 data points, so \(\bar{x}=\frac{1+2+3+4+5+6}{6}=\frac{21}{6}=3.5\) and \(\bar{y}=\frac{0+2+3+5+5+7}{6}=\frac{22}{6}=3.\overline{6}\).
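The two means can be verified with a short Python snippet using nothing but plain lists:

```python
# Data points from the exercise
x = [1, 2, 3, 4, 5, 6]
y = [0, 2, 3, 5, 5, 7]

# Sample mean: sum of the values divided by the number of values
x_bar = sum(x) / len(x)   # 21/6 = 3.5
y_bar = sum(y) / len(y)   # 22/6 ≈ 3.6667
```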
02

Calculate the covariance of x and y and the variance of x

Covariance quantifies the relationship between the x and y values. We will use the following formula to calculate the covariance: $$Cov(x,y)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n}$$ The variance of x is calculated as: $$Var(x)=\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}$$ Here, \(Cov(x,y)=\frac{(1-3.5)(0-3.\overline{6})+\dots+(6-3.5)(7-3.\overline{6})}{6}=\frac{23}{6}\approx 3.8\overline{3}\) and \(Var(x)=\frac{(1-3.5)^2+\dots+(6-3.5)^2}{6}=\frac{17.5}{6}\approx 2.91\overline{6}\).
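Recomputing both quantities directly from the data in Python (dividing by \(n\), as in the formulas above; dividing both by \(n-1\) instead would leave the slope \(b\) unchanged, since it is their ratio):

```python
# Data points from the exercise
x = [1, 2, 3, 4, 5, 6]
y = [0, 2, 3, 5, 5, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Covariance: average product of the deviations from the two means
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n  # 23/6 ≈ 3.8333

# Variance of x: average squared deviation from the mean of x
var_x = sum((xi - x_bar) ** 2 for xi in x) / n                         # 17.5/6 ≈ 2.9167
```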
03

Determine the values of a and b

Using the formulas for the regression coefficients a and b, we can now calculate their values: $$b=\frac{Cov(x,y)}{Var(x)}$$ $$a=\bar{y}-b\bar{x}$$ Using the covariance and variance calculated in Step 2, we have $$b=\frac{23/6}{17.5/6}=\frac{23}{17.5}\approx 1.3143$$ $$a=3.\overline{6}-1.3143\cdot 3.5\approx -0.9333$$
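The same two coefficients, computed in Python from the covariance and variance of the previous step:

```python
# Data points from the exercise
x = [1, 2, 3, 4, 5, 6]
y = [0, 2, 3, 5, 5, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
var_x = sum((xi - x_bar) ** 2 for xi in x) / n

# Slope and intercept of the least-squares line y-hat = a + b*x
b = cov_xy / var_x     # 23/17.5 ≈ 1.3143
a = y_bar - b * x_bar  # ≈ -0.9333
```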
04

Calculate the least-squares predictor for each x value

Now we can calculate the least-squares predictor \(\hat{y}=a+bx\) for each x value: $$\hat{y}\approx -0.9333+1.3143x$$ Using this equation, we obtain the predicted y values for the given x values: $$\begin{array}{l|llllll}x & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \hat{y} & 0.381 & 1.695 & 3.010 & 4.324 & 5.638 & 6.952\end{array}$$
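The fitted values can be generated in one line once \(a\) and \(b\) are known:

```python
x = [1, 2, 3, 4, 5, 6]
b = 23 / 17.5             # slope from the previous step
a = 22 / 6 - b * 3.5      # intercept from the previous step, ≈ -0.9333

# Predicted value y-hat = a + b*x for each observed x
y_hat = [a + b * xi for xi in x]  # ≈ [0.381, 1.695, 3.010, 4.324, 5.638, 6.952]
```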
05

Calculate the sum of squares of deviations using the least-squares predictor and \(\bar{y}\) as a predictor

We will now compute the sum of squares of deviations using the least-squares predictor and \(\bar{y}\) as a predictor: $$SSD(\hat{y})=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$ $$SSD(\bar{y})=\sum_{i=1}^{n}(y_i-\bar{y})^2$$ Using the observed and predicted y values, we calculate the sum of squares of deviations for both predictors: $$SSD(\hat{y})\approx(0-0.381)^2+(2-1.695)^2+\dots+(7-6.952)^2\approx 1.105$$ $$SSD(\bar{y})=(0-3.\overline{6})^2+(2-3.\overline{6})^2+\dots+(7-3.\overline{6})^2\approx 31.333$$
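Both sums of squares, checked numerically in Python:

```python
x = [1, 2, 3, 4, 5, 6]
y = [0, 2, 3, 5, 5, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b = 23 / 17.5             # slope from Step 3
a = y_bar - b * x_bar     # intercept from Step 3
y_hat = [a + b * xi for xi in x]

# Sum of squared deviations about the fitted line and about the mean
ssd_line = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # ≈ 1.105
ssd_mean = sum((yi - y_bar) ** 2 for yi in y)               # ≈ 31.333
```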
06

Determine the percentage reduction in the sum of squares of deviations

Finally, we can calculate the percentage reduction in the sum of squares of deviations obtained by using the least-squares predictor \(\hat{y}=a+bx\) rather than \(\bar{y}\) as a predictor of \(y\): $$\text{Percentage Reduction}=\frac{SSD(\bar{y})-SSD(\hat{y})}{SSD(\bar{y})}\times 100$$ Using the calculated values from Step 5, we obtain: $$\text{Percentage Reduction}\approx\frac{31.333-1.105}{31.333}\times 100\approx 96.5\%$$ The sum of squares of deviations was reduced by approximately \(96.5\%\) when using the least-squares predictor \(\hat{y}=a+bx\) rather than \(\bar{y}\) as a predictor of \(y\). Note that this reduction is exactly \(r^{2}\times 100\%\): here \(r=\frac{23}{\sqrt{17.5\times 31.333}}\approx 0.982\) (positive, since the slope was reversed relative to Exercises 4-5) and \(r^{2}\approx 0.965\).
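The whole calculation, start to finish, fits in a short Python script, which also confirms that the percentage reduction equals \(r^{2}\) expressed as a percentage:

```python
x = [1, 2, 3, 4, 5, 6]
y = [0, 2, 3, 5, 5, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Least-squares line and its residual sum of squares
b = s_xy / s_xx
a = y_bar - b * x_bar
ssd_line = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

reduction = (s_yy - ssd_line) / s_yy * 100  # ≈ 96.47%
r = s_xy / (s_xx * s_yy) ** 0.5             # ≈ 0.982; reduction == r**2 * 100
```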


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Sum of Squares of Deviations
The sum of squares of deviations (SSD) is a measure that captures the total deviation of data points from their expected values. Essentially, it indicates how much the observed data spread around a given line, such as the least-squares regression line. In mathematical terms, if we have observed values of a variable, say \(y_i\), and a set of predicted values \(\hat{y}_i\), the SSD is calculated as:
\[ SSD = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \]
This formula squares the differences between the predicted and actual values, with the squaring process giving more weight to larger deviations. A lower SSD indicates that the model has a better fit to the data since the observations are closer to the line. The exercise's solution compared SSD using the least-squares predictor against using the mean of \(y\) as a predictor, demonstrating a significant reduction in the SSD and, thereby, showing the effectiveness of the regression model.
Correlation Coefficient (r)
The correlation coefficient, represented by \(r\), is a statistical metric expressing the degree of linear relationship between two variables. It ranges between -1 and +1, where +1 indicates a perfect positive linear correlation, -1 indicates a perfect negative linear correlation, and 0 signifies no linear correlation. A change in the sign of \(r\) implies a change in the direction of the relationship between the variables. In our context, calculating \(r\) helps in understanding how well the regression line represents the relationship in the data. The square of the correlation coefficient, \(r^2\), represents the proportion of variability in one variable that can be explained by its linear relationship with the other variable. In the given exercise, the relationship between the change in \(r\) and \(r^2\) compared to earlier exercises provides insights into the nature and strength of the linear association between the variables.
Variance and Covariance
Variance and covariance are concepts that describe how data points vary around the mean. Variance is a measure of the spread of a set of numbers. If we denote the mean of the observations as \(\bar{x}\), the variance of these observations \(Var(x)\) is calculated as:
\[ Var(x) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n} \]
Covariance, on the other hand, extends this idea to two variables, showing how both variables vary with each other. For two sets of numbers \(x\) and \(y\), with their means being \(\bar{x}\) and \(\bar{y}\), respectively, covariance \(Cov(x,y)\) is calculated using the formula:
\[ Cov(x,y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n} \]
These two metrics together are the backbone of linear regression analysis as they are used to calculate the regression line, capturing the relationship between the dependent variable \(y\) and independent variable \(x\).
Linear Regression Analysis
Linear regression analysis is used to model the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship between the variables, characterized by the equation of the line \(\hat{y} = a + bx\), where \(a\) is the intercept, \(b\) is the slope, and \(\hat{y}\) are the predicted values. The goal is to find the line that best fits the data by minimizing the sum of squares of deviations. In the provided exercise, we looked at each step needed to construct this model from calculating the mean values, variance, and covariance, through to determining the regression coefficients and finally, the best-fit line. The resulting model's effectiveness was illustrated by a significant percentage reduction in the SSD when comparing predictions using the least-squares regression line versus predictions using the mean value of \(y\). The essence of linear regression is therefore to provide the most accurate predictions of the dependent variable, given the values of independent variables, harnessing the power of both variance and covariance.


