Consider the nonlinear least squares problem of minimizing $$ \phi(\mathbf{x})=\frac{1}{2}\|\mathbf{g}(\mathbf{x})-\mathbf{b}\|^{2}. $$ (a) Show that $$ \nabla \phi(\mathbf{x})=A(\mathbf{x})^{T}(\mathbf{g}(\mathbf{x})-\mathbf{b}), $$ where \(A\) is the \(m \times n\) Jacobian matrix of \(\mathbf{g}\). (b) Show that $$ \nabla^{2} \phi(\mathbf{x})=A(\mathbf{x})^{T} A(\mathbf{x})+L(\mathbf{x}), $$ where \(L\) is an \(n \times n\) matrix with elements $$ L_{i, j}=\sum_{k=1}^{m} \frac{\partial^{2} g_{k}}{\partial x_{i} \partial x_{j}}\left(g_{k}-b_{k}\right). $$ [You may want to check first what \(\frac{\partial \phi}{\partial x_{i}}\) looks like for a fixed \(i\); later on, look at \(\frac{\partial^{2} \phi}{\partial x_{i} \partial x_{j}}\) for fixed \(i\) and \(j\).]

Short Answer

Expert verified
Question: Describe how to find the gradient and Hessian of the function \(\phi(\mathbf{x}) = \frac{1}{2}\|\mathbf{g}(\mathbf{x})-\mathbf{b}\|^2\) using the Jacobian matrix of \(\mathbf{g}\), \(A(\mathbf{x})\), and an additional matrix, \(L(\mathbf{x})\). Answer: To find the gradient of \(\phi(\mathbf{x})\), compute the partial derivative of \(\phi\) with respect to each \(x_i\) and stack these partial derivatives into a column vector. This vector can be written compactly as \(\nabla \phi(\mathbf{x}) = A(\mathbf{x})^T(\mathbf{g}(\mathbf{x}) - \mathbf{b})\). To find the Hessian of \(\phi(\mathbf{x})\), compute the second-order partial derivatives of \(\phi\) with respect to every pair \(x_i, x_j\) and express them using the Jacobian matrix \(A\) and the matrix \(L\) with entries \(L_{i, j} = \sum_{k=1}^m \frac{\partial^2 g_k}{\partial x_i \partial x_j}(g_k - b_k)\). The Hessian can then be written as \(\nabla^2 \phi(\mathbf{x}) = A(\mathbf{x})^T A(\mathbf{x}) + L(\mathbf{x})\).

Step by step solution

01

Find the partial derivative of \(\phi\) with respect to \(x_i\)

Let's first find the expression for the partial derivative of \(\phi\) with respect to any \(x_i\): $$ \frac{\partial \phi}{\partial x_i} = \frac{\partial}{\partial x_i}\left(\frac{1}{2}\|\mathbf{g}(\mathbf{x})-\mathbf{b}\|^2\right) $$ Now, applying the chain rule: $$ \frac{\partial \phi}{\partial x_i} = (\mathbf{g}(\mathbf{x})-\mathbf{b})^T\left(\frac{\partial \mathbf{g}(\mathbf{x})}{\partial x_i}\right) $$
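
Written out componentwise, the same result reads $$ \frac{\partial \phi}{\partial x_i} = \frac{\partial}{\partial x_i}\left(\frac{1}{2}\sum_{k=1}^{m}\left(g_k(\mathbf{x})-b_k\right)^2\right) = \sum_{k=1}^{m}\left(g_k(\mathbf{x})-b_k\right)\frac{\partial g_k}{\partial x_i}, $$ which is the form we will match to the Jacobian in the next step.
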
02

Express partial derivative using Jacobian matrix

Now, let's denote by \(A_{i, j}\) the entry of the Jacobian matrix \(A\) in the \(i\)-th row and \(j\)-th column, so that \(A_{i, j} = \frac{\partial g_i}{\partial x_j}\) and \(A\) is \(m \times n\). Its transpose is $$ A^T = \begin{bmatrix}\frac{\partial g_1}{\partial x_1} & \frac{\partial g_2}{\partial x_1} & \cdots & \frac{\partial g_m}{\partial x_1} \\ \frac{\partial g_1}{\partial x_2} & \frac{\partial g_2}{\partial x_2} & \cdots & \frac{\partial g_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_1}{\partial x_n} & \frac{\partial g_2}{\partial x_n} & \cdots & \frac{\partial g_m}{\partial x_n}\end{bmatrix}. $$ The gradient collects the partial derivatives from Step 1: $$ \nabla \phi(\mathbf{x}) = \begin{bmatrix}\frac{\partial \phi}{\partial x_1} \\ \frac{\partial \phi}{\partial x_2} \\ \vdots \\ \frac{\partial \phi}{\partial x_n}\end{bmatrix}, \qquad \frac{\partial \phi}{\partial x_i} = \sum_{k=1}^{m}\left(g_k - b_k\right)\frac{\partial g_k}{\partial x_i}. $$ Each entry \(\frac{\partial \phi}{\partial x_i}\) is exactly the product of the \(i\)-th row of \(A^T\) with the residual vector \(\mathbf{g}(\mathbf{x}) - \mathbf{b}\). Therefore, $$ \nabla \phi(\mathbf{x}) = A(\mathbf{x})^T(\mathbf{g}(\mathbf{x}) - \mathbf{b}). $$
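
As a quick numerical sanity check, the identity \(\nabla \phi(\mathbf{x}) = A(\mathbf{x})^T(\mathbf{g}(\mathbf{x}) - \mathbf{b})\) can be compared against a finite-difference gradient for any concrete residual. The sketch below is illustrative only: the residual \(g(\mathbf{x}) = (x_1^2,\; x_1 x_2)\), the data vector \(\mathbf{b}\), and the test point are made-up assumptions, not part of the exercise.

```python
import numpy as np

# Hypothetical residual g: R^2 -> R^2, used only to illustrate the identity.
def g(x):
    return np.array([x[0] ** 2, x[0] * x[1]])

# Jacobian A(x) of g, with A[k, j] = d g_k / d x_j.
def jacobian(x):
    return np.array([[2.0 * x[0], 0.0],
                     [x[1],       x[0]]])

b = np.array([1.0, 2.0])          # made-up data vector

def phi(x):
    r = g(x) - b
    return 0.5 * r @ r            # 0.5 * ||g(x) - b||^2

x = np.array([0.7, -1.3])         # arbitrary test point

# Analytic gradient from part (a): A(x)^T (g(x) - b).
grad_analytic = jacobian(x).T @ (g(x) - b)

# Central finite-difference gradient for comparison.
h = 1e-6
grad_fd = np.array([(phi(x + h * e) - phi(x - h * e)) / (2.0 * h)
                    for e in np.eye(2)])

print(np.allclose(grad_analytic, grad_fd, atol=1e-6))   # expected: True
```
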
03

Find second-order partial derivatives of \(\phi\)

Now let's compute the second-order partial derivatives of \(\phi\) with respect to any pair \(x_i, x_j\) (assuming \(\mathbf{g}\) is twice continuously differentiable, so the order of differentiation does not matter): $$ \frac{\partial^2 \phi}{\partial x_j \partial x_i} = \frac{\partial}{\partial x_j}\left(\frac{\partial \phi}{\partial x_i}\right) = \frac{\partial}{\partial x_j}\left[(\mathbf{g}(\mathbf{x})-\mathbf{b})^T\left(\frac{\partial \mathbf{g}(\mathbf{x})}{\partial x_i}\right)\right]. $$ Applying the product rule (together with the chain rule), we get: $$ \frac{\partial^2 \phi}{\partial x_j \partial x_i} = \left(\frac{\partial \mathbf{g}(\mathbf{x})}{\partial x_j}\right)^T \frac{\partial \mathbf{g}(\mathbf{x})}{\partial x_i} + (\mathbf{g}(\mathbf{x})-\mathbf{b})^T \frac{\partial^2 \mathbf{g}(\mathbf{x})}{\partial x_i \partial x_j}. $$
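
Written componentwise, the two terms are $$ \frac{\partial^2 \phi}{\partial x_j \partial x_i} = \sum_{k=1}^{m} \frac{\partial g_k}{\partial x_j}\frac{\partial g_k}{\partial x_i} + \sum_{k=1}^{m}\left(g_k - b_k\right)\frac{\partial^2 g_k}{\partial x_i \partial x_j}, $$ which is exactly the form matched to \(A^T A\) and \(L\) in the next step.
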
04

Express second-order partial derivatives using Jacobian matrix and matrix \(L\)

Now, let's express the second-order partial derivatives using the Jacobian matrix \(A\) and the matrix \(L\). Since \(A_{k,i} = \frac{\partial g_k}{\partial x_i}\), the first sum in Step 3 equals \(\sum_{k=1}^m A_{k,i} A_{k,j} = [A^T A]_{i,j}\), so the \((i,j)\) element of the Hessian matrix \(\nabla^2 \phi(\mathbf{x})\) has the form: $$ [\nabla^2 \phi(\mathbf{x})]_{i,j} = \sum_{k=1}^{m} A_{k,i} A_{k,j} + L_{i,j} = [A^T A]_{i,j} + L_{i,j}, $$ where $$ L_{i, j} = \sum_{k=1}^m \frac{\partial^2 g_k}{\partial x_i \partial x_j}(g_k - b_k). $$ Since every element of \(\nabla^2\phi(\mathbf{x})\) can be expressed in this way, we conclude that: $$ \nabla^2 \phi(\mathbf{x}) = A(\mathbf{x})^T A(\mathbf{x}) + L(\mathbf{x}). $$
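
As with the gradient, the decomposition \(\nabla^2 \phi(\mathbf{x}) = A(\mathbf{x})^T A(\mathbf{x}) + L(\mathbf{x})\) can be verified numerically. The sketch below reuses the made-up residual \(g(\mathbf{x}) = (x_1^2,\; x_1 x_2)\) from the earlier gradient check; the residual, data vector, test point, and tolerance are illustrative assumptions only.

```python
import numpy as np

# Same made-up residual as in the gradient check: g(x) = (x1^2, x1*x2).
def g(x):
    return np.array([x[0] ** 2, x[0] * x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 0.0],
                     [x[1],       x[0]]])

def g_hessians(x):
    # Hessians of the components of g: g_1 = x1^2 and g_2 = x1*x2.
    H1 = np.array([[2.0, 0.0], [0.0, 0.0]])
    H2 = np.array([[0.0, 1.0], [1.0, 0.0]])
    return [H1, H2]

b = np.array([1.0, 2.0])
x = np.array([0.7, -1.3])

def phi(x):
    r = g(x) - b
    return 0.5 * r @ r

A = jacobian(x)
r = g(x) - b

# L(x) from part (b): sum over k of (g_k - b_k) times the Hessian of g_k.
L = sum(rk * Hk for rk, Hk in zip(r, g_hessians(x)))

hess_analytic = A.T @ A + L

# Central finite-difference Hessian of phi for comparison.
h = 1e-4
I = np.eye(2)
hess_fd = np.array([[(phi(x + h*I[i] + h*I[j]) - phi(x + h*I[i] - h*I[j])
                      - phi(x - h*I[i] + h*I[j]) + phi(x - h*I[i] - h*I[j]))
                     / (4.0 * h * h)
                     for j in range(2)] for i in range(2)])

print(np.allclose(hess_analytic, hess_fd, atol=1e-6))    # expected: True
```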

Key Concepts

These are the key concepts you need to understand to accurately answer the question.

Understanding the Jacobian Matrix in Nonlinear Least Squares
The Jacobian matrix is a key component when solving nonlinear least squares problems. In such problems, we seek to minimize a function, \( \phi(\mathbf{x}) \), that measures the size of the residual between a predicted vector, \( \mathbf{g}(\mathbf{x}) \), and an observed vector, \( \mathbf{b} \).

The Jacobian matrix, denoted \( A(\mathbf{x}) \), collects the first-order partial derivatives of a vector-valued function, which in our case is \( \mathbf{g}(\mathbf{x}) \). More specifically, each entry \( A_{i,j} \) is the partial derivative of the i-th component of \( \mathbf{g} \) with respect to the j-th variable, \( x_j \).

Knowing the Jacobian is crucial because it provides the gradient of \( \phi(\mathbf{x}) \), which is given by \( \nabla \phi(\mathbf{x}) = A(\mathbf{x})^{T}(\mathbf{g}(\mathbf{x}) - \mathbf{b}) \). This gradient is the direction in which \( \phi \) increases the fastest and is used to iteratively refine our estimates of \( \mathbf{x} \) to minimize the least squares objective.

Understanding how the Jacobian matrix is constructed and used makes it easier to follow optimization algorithms such as the Gauss-Newton and Levenberg-Marquardt methods, which are commonly employed to solve nonlinear least squares problems; a small sketch of a Gauss-Newton iteration follows below.
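
For intuition, here is a minimal Gauss-Newton iteration: it approximates the Hessian by \(A^T A\) (dropping \(L\)) and solves \((A^T A)\,\mathbf{p} = -A^T(\mathbf{g}(\mathbf{x}) - \mathbf{b})\) for each step. The residual \(g(\mathbf{x}) = (x_1^2,\; x_1 x_2)\), the data vector, and the starting point are made-up choices for illustration, not part of the textbook problem.

```python
import numpy as np

# A minimal Gauss-Newton loop on the made-up residual g(x) = (x1^2, x1*x2).
# With b = (1, 2), a zero-residual solution of g(x) = b is x = (1, 2).
def g(x):
    return np.array([x[0] ** 2, x[0] * x[1]])

def jacobian(x):
    return np.array([[2.0 * x[0], 0.0],
                     [x[1],       x[0]]])

b = np.array([1.0, 2.0])
x = np.array([2.0, 3.0])          # arbitrary starting guess

for _ in range(10):
    A = jacobian(x)
    r = g(x) - b
    # Gauss-Newton: approximate the Hessian by A^T A (dropping L)
    # and solve (A^T A) p = -A^T r for the step p.
    p = np.linalg.solve(A.T @ A, -A.T @ r)
    x = x + p

print(x, np.linalg.norm(g(x) - b))   # expect x close to (1, 2), residual near 0
```
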
The Role of Gradient and Hessian Matrices in Optimization
The gradient and Hessian matrices play pivotal roles in optimization, particularly in the context of nonlinear least squares. The gradient of a function, expressed as \( \nabla \phi(\mathbf{x}) \), provides the direction of steepest ascent, which is essential in gradient-based optimization methods. In our discussion, it describes how rapidly the objective \( \phi(\mathbf{x}) \) changes with respect to \( \mathbf{x} \).

The Hessian matrix, which is the square matrix of second-order partial derivatives, offers deeper insight into the curvature of \( \phi(\mathbf{x}) \). In other words, it tells us how the gradient itself changes, guiding how to adjust our steps when searching for the function's minimum. Here it takes the form \( \nabla^2 \phi(\mathbf{x}) = A(\mathbf{x})^{T} A(\mathbf{x}) + L(\mathbf{x}) \), where \( L \) is a matrix whose entries are sums of second derivatives of \( \mathbf{g} \) weighted by the residuals.
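
Concretely, a full Newton step for this objective uses both pieces of the Hessian, solving $$ \left(A(\mathbf{x})^{T} A(\mathbf{x}) + L(\mathbf{x})\right)\mathbf{p} = -A(\mathbf{x})^{T}\left(\mathbf{g}(\mathbf{x}) - \mathbf{b}\right) $$ and updating \(\mathbf{x} \leftarrow \mathbf{x} + \mathbf{p}\); Gauss-Newton simply drops \(L(\mathbf{x})\), which is a good approximation when the residuals are small.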

Having an understanding of these matrices enables students to better appreciate iterative optimization procedures and to approach the nonlinear least squares problems with greater confidence and clarity.
Applying the Chain Rule in Differentiation to Solve Nonlinear Problems
The chain rule is an essential tool in calculus, especially when dealing with complex functions. It enables us to differentiate composite functions by breaking them down into their individual components. In the realm of optimization, particularly for the nonlinear least squares problem we're discussing, the chain rule allows us to find the derivative of our objective function \( \phi(\mathbf{x}) \) with respect to the variables \( \mathbf{x} \).

For instance, to derive the gradient \( \nabla \phi(\mathbf{x}) \), we apply the chain rule to differentiate \( \phi \) with respect to each variable \( x_i \). We employ it again, together with the product rule, when computing the Hessian matrix from the second-order partial derivatives.
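
A single residual term shows the pattern: by the chain rule, $$ \frac{\partial}{\partial x_i}\left[\frac{1}{2}\left(g_k(\mathbf{x}) - b_k\right)^2\right] = \left(g_k(\mathbf{x}) - b_k\right)\frac{\partial g_k}{\partial x_i}, $$ and summing this over \(k = 1, \ldots, m\) gives the \(i\)-th component of \( \nabla \phi(\mathbf{x}) \).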

Understanding the application of the chain rule can significantly aid in grasping more complex concepts in optimization and is an invaluable skill in the toolbox of anyone working with nonlinear least squares problems. It assists in elucidating the relationship between the variables and the overall behavior of the objective function, thereby enhancing problem-solving efficiency and efficacy.
