
Suppose that we wish to construct the likelihood ratio statistic for comparison of the two linear models \(y=X_{1} \beta_{1}+\varepsilon\) and \(y=X_{1} \beta_{1}+X_{2} \beta_{2}+\varepsilon\), where the components of \(\varepsilon\) are independent normal variables with mean zero and variance \(\sigma^{2} ;\) call the corresponding residual sums of squares \(S S_{1}\) and \(S S\) on \(v_{1}\) and \(v\) degrees of freedom. (a) Show that the maximum value of the log likelihood is \(-\frac{1}{2} n(\log S S+1-\log n)\) for a model whose residual sum of squares is \(S S\), and deduce that the likelihood ratio statistic for comparison of the models above is \(W=n \log \left(S S_{1} / S S\right)\). (b) By writing \(S S_{1}=S S+\left(S S_{1}-S S\right)\), show that \(W\) is a monotonic function of the \(F\) statistic for comparison of the models. (c) Show that \(W \doteq\left(v_{1}-v\right) F\) when \(n\) is large and \(v\) is close to \(n\), and say why \(F\) would usually be preferred to \(W\).

Short Answer

Expert verified
The likelihood ratio statistic \( W = n \log(SS_1 / SS) \) is a monotonic function of the \( F \) statistic for comparing the two models, and \( W \doteq (v_1 - v)F \) when \( n \) is large and \( v \) is close to \( n \); \( F \) is usually preferred because its null distribution is exact under normal errors, whereas that of \( W \) is only a large-sample approximation.

Step by step solution

Step 01: Understanding the Likelihood Function

For the normal linear model the likelihood is \( L(\beta,\sigma^2) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\mathrm{RSS}(\beta)\right\} \), where \( \mathrm{RSS}(\beta) = (y - X\beta)^{\mathrm{T}}(y - X\beta) \) is the residual sum of squares. To compare the two models we maximize this likelihood over the parameters of each model; maximizing over \( \beta \) gives the least squares estimate, whose residual sum of squares is the model's \( SS \).
Step 02: Finding the Maximum Log Likelihood

For a model with residual sum of squares \( SS \), maximizing over \( \sigma^2 \) with \( \beta \) fixed at its least squares estimate gives \( \widehat{\sigma}^2 = SS/n \). Substituting this into the log likelihood yields \[ \log L = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\left(\frac{SS}{n}\right) - \frac{n}{2}, \] and dropping the additive constant \( -\frac{n}{2}\log(2\pi) \), which is the same for every model and so cancels in any comparison, leaves the maximized log likelihood \( -\frac{1}{2}n(\log SS + 1 - \log n) \), as required.
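As a quick sanity check, the sketch below (a minimal example on synthetic data; variable names and numbers are illustrative, and numpy is assumed available) evaluates the normal log likelihood over a grid of \( \sigma^2 \) values and confirms that the maximum is attained near \( \sigma^2 = SS/n \) and matches the closed form above once the constant \( -\frac{n}{2}\log(2\pi) \) is put back.

```python
# Numerical check of Step 02: the normal log likelihood, profiled over sigma^2,
# is maximised at sigma^2 = SS/n, where it equals -(n/2)(log SS + 1 - log n)
# up to the constant -(n/2) log(2*pi).  Synthetic data; names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
SS = np.sum((y - X @ beta_hat) ** 2)            # residual sum of squares

def log_lik(sigma2):
    """Normal log likelihood with beta fixed at its least squares estimate."""
    return -0.5 * n * np.log(2 * np.pi * sigma2) - SS / (2 * sigma2)

sigma2_grid = np.linspace(0.2, 5.0, 10_000)
values = log_lik(sigma2_grid)

print("argmax over grid        :", sigma2_grid[np.argmax(values)])
print("SS / n                  :", SS / n)
print("max log likelihood      :", values.max())
print("closed form (with 2*pi) :",
      -0.5 * n * (np.log(SS) + 1 - np.log(n)) - 0.5 * n * np.log(2 * np.pi))
```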
Step 03: Deriving the Likelihood Ratio Statistic W

The likelihood ratio statistic is twice the difference between the maximized log likelihoods of the larger and the smaller model. Using the result of Step 02, \[ W = 2\left\{-\tfrac{1}{2}n(\log SS + 1 - \log n)\right\} - 2\left\{-\tfrac{1}{2}n(\log SS_1 + 1 - \log n)\right\} = n(\log SS_1 - \log SS) = n \log(SS_1 / SS). \]
Step 04: Expressing SS1 in Terms of SS and F

Write the residual sum of squares of the smaller model as \( SS_1 = SS + (SS_1 - SS) \), so that \( SS_1/SS = 1 + (SS_1 - SS)/SS \). The difference \( SS_1 - SS \) is the extra sum of squares explained by \( X_2 \), and it is exactly the numerator of the \( F \) statistic for comparing the two models.
Step 05: Showing the Monotonicity in Relation to F

The \( F \) statistic for comparing the models is \[ F = \frac{(SS_1 - SS)/(v_1 - v)}{SS/v}, \] so \( (SS_1 - SS)/SS = (v_1 - v)F/v \) and therefore \[ W = n \log\left(1 + \frac{(v_1 - v)F}{v}\right). \] Since \( \log(1 + x) \) is strictly increasing in \( x \) and \( (v_1 - v)/v > 0 \), \( W \) is a strictly increasing, hence monotonic, function of \( F \).
Step 06: Approximating W with the F Statistic

When \( n \) is large and \( v \) is close to \( n \), the quantity \( (v_1 - v)F/v \) is small unless \( F \) is very large, so the expansion \( \log(1 + x) \approx x \) gives \[ W \approx \frac{n(v_1 - v)F}{v} \approx (v_1 - v)F, \] using \( n/v \approx 1 \). Hence \( W \doteq (v_1 - v)F \).
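The sketch below (synthetic nested models, illustrative names, numpy assumed available) checks Steps 04 to 06 numerically: the identity \( W = n\log\{1 + (v_1 - v)F/v\} \) holds exactly, and \( (v_1 - v)F \) is close to \( W \) when \( n \) is large and \( v \) is close to \( n \).

```python
# Numerical check of Steps 04-06: W = n*log(SS1/SS) equals n*log(1 + (v1-v)*F/v)
# exactly, and is close to (v1-v)*F when n is large and v is close to n.
# Synthetic nested models; all names and coefficient values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, p1, p2 = 500, 3, 2                      # n large, few parameters, so v is close to n
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p1 - 1))])
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, 0.5, -0.5]) + X2 @ np.array([0.2, 0.0]) + rng.normal(size=n)

def rss(M):
    """Residual sum of squares of the least squares fit of y on M."""
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ b) ** 2)

SS1, SS = rss(X1), rss(X)                  # smaller model, larger model
v1, v = n - p1, n - (p1 + p2)              # residual degrees of freedom

F = ((SS1 - SS) / (v1 - v)) / (SS / v)
W = n * np.log(SS1 / SS)

print("W                        :", W)
print("n*log(1 + (v1-v)*F/v)    :", n * np.log(1 + (v1 - v) * F / v))  # identical to W
print("(v1 - v)*F approximation :", (v1 - v) * F)
```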
Step 07: Preference for F over W

\( F \) would usually be preferred to \( W \) because, under normal errors, \( F \) has an exact \( F_{v_1 - v,\, v} \) null distribution whatever the sample size, and it fits directly into the analysis of variance (ANOVA) framework; the \( \chi^2_{v_1 - v} \) reference distribution for \( W \) is only a large-sample approximation, which can be poor when \( n \) is small.
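To illustrate the point, here is a minimal sketch (synthetic data generated under the null hypothesis \( \beta_2 = 0 \), with a deliberately small sample; numpy and scipy are assumed available, and all names are illustrative) comparing the exact \( F \) p-value with the \( \chi^2_{v_1 - v} \) approximation applied to \( W \); the two can differ noticeably when \( n \) is small.

```python
# Illustration of Step 07: under normal errors F has an exact F(v1-v, v) null
# distribution, while W is only approximately chi-squared on v1-v degrees of
# freedom, so the two p-values can disagree in small samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p1, p2 = 15, 2, 3                       # deliberately small sample
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, p1 - 1))])
X2 = rng.normal(size=(n, p2))
X = np.hstack([X1, X2])
y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)    # beta_2 = 0: null is true

def rss(M):
    b, *_ = np.linalg.lstsq(M, y, rcond=None)
    return np.sum((y - M @ b) ** 2)

SS1, SS = rss(X1), rss(X)
v1, v = n - p1, n - (p1 + p2)

F = ((SS1 - SS) / (v1 - v)) / (SS / v)
W = n * np.log(SS1 / SS)

print("exact F test p-value      :", stats.f.sf(F, v1 - v, v))
print("chi-squared approx. for W :", stats.chi2.sf(W, v1 - v))
```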


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Linear Models
Linear Models are fundamental in statistical analysis for understanding relationships between variables. They express the response variable, often denoted as \( y \), as a linear combination of predictor variables, plus an error term \( \varepsilon \). In the exercise at hand, we compare two such models where:
  • Model 1: \( y = X_1 \beta_1 + \varepsilon \)
  • Model 2: \( y = X_1 \beta_1 + X_2 \beta_2 + \varepsilon \)
These models assess how additional variables may improve the explanation of the response variable. The error term represents the part of \( y \) not explained by the predictors and is assumed to be normally distributed with mean zero and variance \( \sigma^2 \). Understanding these models helps in gauging the effect of different predictors on the outcome variable.
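As an illustration only (synthetic data and made-up coefficient values), the snippet below fits the two nested models by least squares with numpy.linalg.lstsq and prints their residual sums of squares, which are the \( SS_1 \) and \( SS \) used throughout the exercise.

```python
# Fitting the two models of the exercise by least squares on synthetic data.
# Model 1 uses X1 only; Model 2 uses X1 and X2.  Names are illustrative.
import numpy as np

rng = np.random.default_rng(3)
n = 40
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 1))
y = X1 @ np.array([2.0, 1.0]) + 0.5 * X2[:, 0] + rng.normal(size=n)

b1, *_ = np.linalg.lstsq(X1, y, rcond=None)                    # Model 1 fit
b, *_ = np.linalg.lstsq(np.hstack([X1, X2]), y, rcond=None)    # Model 2 fit

SS1 = np.sum((y - X1 @ b1) ** 2)                 # residual sum of squares, Model 1
SS = np.sum((y - np.hstack([X1, X2]) @ b) ** 2)  # residual sum of squares, Model 2
print("SS1 (Model 1):", SS1)
print("SS  (Model 2):", SS)
```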
Likelihood Function
The Likelihood Function is a core concept in parameter estimation, especially in the context of statistical modeling like linear models. It quantifies how likely it is to observe the given data under specific parameter values.
For models with normally distributed errors, the likelihood function is defined as:\[ L(\beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\text{RSS}\right) \]where \( \text{RSS} \) is the residual sum of squares.

The goal is to maximize this likelihood with respect to the parameters \( \beta \) and \( \sigma^2 \). Doing so provides maximum likelihood estimates (MLEs), which are parameter values that make the observed data most probable. In the exercise's context, the maximum log likelihood for a model is expressed as:

\[ -\frac{1}{2} n( \log SS + 1 - \log n) \]where \( SS \) is the residual sum of squares for the model. This formulation simplifies the computation and comparison of models, leading us to the likelihood ratio test.
F Statistic
The F Statistic is a fundamental tool for comparing statistical models. Particularly in linear models, it helps determine if the addition of new predictors significantly improves the model.
  • Calculated as: \[ F = \frac{(SS_1 - SS)/(v_1 - v)}{SS/v}, \] where \( SS_1 \) and \( SS \) are the residual sums of squares of the models being compared, and \( v_1 \) and \( v \) are their residual degrees of freedom.
  • The F statistic measures whether the model with more predictors fits significantly better than the simpler model.
  • A large F statistic indicates that the additional predictors substantially reduce the residual sum of squares, suggesting a genuine improvement in fit.
This measure is often preferred over the likelihood ratio because it fits into the Analysis of Variance (ANOVA) framework and, under normal errors, has an exact \( F_{v_1 - v,\, v} \) null distribution, providing a direct route to hypothesis testing.
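For concreteness, a minimal sketch of the calculation in the first bullet, using made-up residual sums of squares and degrees of freedom (scipy assumed available for the tail probability):

```python
# F statistic for two nested models from given residual sums of squares.
# The numbers below are hypothetical, purely for illustration.
from scipy import stats

SS1, SS = 130.0, 100.0      # RSS of the smaller and the larger model (hypothetical)
v1, v = 28, 25              # their residual degrees of freedom (hypothetical)

F = ((SS1 - SS) / (v1 - v)) / (SS / v)
p_value = stats.f.sf(F, v1 - v, v)     # upper-tail probability of F(v1-v, v)
print("F =", F, " p-value =", p_value)
```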
Residual Sum of Squares
Residual Sum of Squares (RSS) plays a crucial role in assessing the fit of a linear model. It measures the total of squared differences between observed values and the values predicted by the model. The RSS is calculated as:
\[ RSS = \sum (y_i - \hat{y}_i)^2 \]where \( y_i \) are the observed values and \( \hat{y}_i \) are the predicted values from the model.

A smaller RSS indicates a better fit, as it signifies that the model's predictions are closer to actual observations. In model comparison, choosing a model with a lower RSS typically means a better overall fit.
  • For Model 1, the RSS is \( SS_1 \), and for Model 2, it is \( SS \).
  • These RSS values are pivotal in calculating the likelihood function and, subsequently, the likelihood ratio statistic.
The comparison between different models often hinges on analyzing these RSS values to determine improvement in model fit when additional variables are included.
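A minimal illustration of this definition, using made-up observed and fitted values:

```python
# RSS as defined above: the sum of squared differences between observed and
# fitted values.  Toy numbers for illustration only.
import numpy as np

y = np.array([3.1, 4.0, 5.2, 6.8])       # observed values (hypothetical)
y_hat = np.array([3.0, 4.2, 5.0, 7.0])   # fitted values (hypothetical)

rss = np.sum((y - y_hat) ** 2)
print(rss)   # approximately 0.13
```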


Most popular questions from this chapter

Over a period of \(2 m+1\) years the quarterly gas consumption of a particular household may be represented by the model $$ Y_{i j}=\beta_{i}+\gamma j+\varepsilon_{i j}, \quad i=1, \ldots, 4, \quad j=-m,-m+1, \ldots, m-1, m, $$ where the parameters \(\beta_{i}\) and \(\gamma\) are unknown, and \(\varepsilon_{i j} \stackrel{\text { iid }}{\sim} N\left(0, \sigma^{2}\right) .\) Find the least squares estimators and show that they are independent with variances \((2 m+1)^{-1} \sigma^{2}\) and \(\sigma^{2} /\left(8 \sum_{i=1}^{m} i^{2}\right) .\) Show also that $$ (8 m-1)^{-1}\left[\sum_{i=1}^{4} \sum_{j=-m}^{m} Y_{i j}^{2}-(2 m+1) \sum_{i=1}^{4} \bar{Y}_{i}^{2}-\frac{2\left(\sum_{j=-m}^{m} j \bar{Y}_{. j}\right)^{2}}{\sum_{i=1}^{m} i^{2}}\right] $$ is unbiased for \(\sigma^{2}\), where \(\bar{Y}_{i}=(2 m+1)^{-1} \sum_{j=-m}^{m} Y_{i j}\) and \(\bar{Y}_{. j}=\frac{1}{4} \sum_{i=1}^{4} Y_{i j}\).

The angles of the triangle \(\mathrm{ABC}\) are measured with \(\mathrm{A}\) and \(\mathrm{B}\) each measured twice and \(\mathrm{C}\) three times. All the measurements are independent and unbiased with common variance \(\sigma^{2}\). Find the least squares estimates of the angles \(\mathrm{A}\) and \(\mathrm{B}\) based on the seven measurements and calculate the variance of these estimates.

Write down the linear model corresponding to a simple random sample \(y_{1}, \ldots, y_{n}\) from the \(N\left(\mu, \sigma^{2}\right)\) distribution, and find the design matrix. Verify that $$ \widehat{\mu}=\left(X^{\mathrm{T}} X\right)^{-1} X^{\mathrm{T}} y=\bar{y}, \quad s^{2}=S S(\widehat{\beta}) /(n-p)=(n-1)^{-1} \sum\left(y_{j}-\bar{y}\right)^{2} $$

Suppose that random variables \(Y_{g j}, j=1, \ldots, n_{g}, g=1, \ldots, G\), are independent and that they satisfy the normal linear model \(Y_{g j}=x_{g}^{\mathrm{T}} \beta+\varepsilon_{g j}\). Write down the covariate matrix for this model, and show that the least squares estimates can be written as \(\left(X_{1}^{\mathrm{T}} W X_{1}\right)^{-1} X_{1}^{\mathrm{T}} W Z\), where \(W=\operatorname{diag}\left\{n_{1}, \ldots, n_{G}\right\}\), and the \(g\) th element of \(Z\) is \(n_{g}^{-1} \sum_{j} Y_{g j} .\) Hence show that weighted least squares based on \(Z\) and unweighted least squares based on \(Y\) give the same parameter estimates and confidence intervals, when \(\sigma^{2}\) is known. Why do they differ if \(\sigma^{2}\) is unknown, unless \(n_{g} \equiv 1 ?\) Discuss how the residuals for the two setups differ, and say which is preferable for model checking.

Consider the straight-line regression model \(y_{j}=\alpha+\beta x_{j}+\sigma \varepsilon_{j}, j=1, \ldots, n\). Suppose that \(\sum x_{j}=0\) and that the \(\varepsilon_{j}\) are independent with means zero, variances \(v\), and common density \(f(\cdot)\). (a) Write down the variance of the least squares estimate of \(\beta\). (b) Show that if \(\sigma\) is known, the log likelihood for the data is $$ \ell(\alpha, \beta)=-n \log \sigma+\sum_{j=1}^{n} \log f\left(\frac{y_{j}-\alpha-\beta x_{j}}{\sigma}\right), $$ derive the expected information matrix for \(\alpha\) and \(\beta\), and show that the asymptotic variance of the maximum likelihood estimate of \(\beta\) can be written as \(\sigma^{2} /\left(i \sum x_{j}^{2}\right)\), where $$ i=\mathrm{E}\left\{-\frac{d^{2} \log f(\varepsilon)}{d \varepsilon^{2}}\right\} . $$ Hence show that the least squares estimate of \(\beta\) has asymptotic relative efficiency \((i v)^{-1} \times 100 \%\). (c) Show that the cumulant-generating function of the Gumbel distribution, \(f(u)=\exp \{-u-\exp (-u)\},-\infty
