Current course topic: Modern Optimization Methods (English) > Chapter 4. Trust Region Methods > 4.3 Solving Subproblem
Hi, everyone.
Welcome to modern optimization methods.
In this part,
we will introduce the third part of chapter four,
the trust region method.
We will discuss how we solve the subproblem.
We will mainly discuss the Cauchy point method
as well as the dogleg method.
Now recall that
we try to find an approximate solution
to the trust region subproblem.
So the subproblem is like this.
We try to minimize a quadratic function subject
to a trust region radius.
There are three ways
to compute an approximate solution.
One is the Cauchy point.
The second one is the dogleg method.
And the third one
is the two-dimensional subspace minimization method.
Let's discuss the first method,
the Cauchy point.
Now the idea of the Cauchy point is this.
We take the steepest descent direction
and ask what point we reach
if we walk along this direction
within the trust region.
So the resulting point is called the Cauchy point.
Now the way we generate the Cauchy point
is as follows.
In the first step,
we minimize the model function
keeping only the linear term,
subject to the trust region radius.
In other words,
the first step solves
a linear model optimization problem.
Of course,
we will see shortly that
this direction is the same as the steepest descent direction.
After we get the direction pks in the first step,
we assume that the solution of the quadratic model
takes the form of a scalar τ multiplied by pks.
Then we try to find the optimal τk,
which plays the role of a step length.
In other words,
in the second step of the Cauchy point,
we minimize the quadratic model function
by substituting τ·pks for pk.
So it is a function of only one variable, τ.
Finally,
the Cauchy point is set as the τk multiplied by
the direction pks.
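The two steps just described can be written out as follows (notation as in the lecture: gk is the gradient at xk, mk the quadratic model, Δk the radius):

```latex
% Step 1: minimize the linear model inside the trust region
p_k^s = \arg\min_{p}\; f_k + g_k^T p
\quad \text{s.t.}\ \|p\| \le \Delta_k
\qquad\Longrightarrow\qquad
p_k^s = -\frac{\Delta_k}{\|g_k\|}\, g_k .

% Step 2: minimize the quadratic model along that direction
\tau_k = \arg\min_{\tau > 0}\; m_k(\tau p_k^s)
\quad \text{s.t.}\ \|\tau p_k^s\| \le \Delta_k ,
\qquad
p_k^C = \tau_k\, p_k^s .
```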
Now let's see how we can derive the Cauchy direction.
From the first step,
we can easily see that the Cauchy point
lies along the steepest descent direction
with a special step length.
After that,
let's substitute it into the model function.
A simple calculation shows
that the model function,
viewed as a function of τ,
is quadratic.
To minimize this quadratic function,
recall that we also have a constraint:
τ lies between zero and one.
The way we do this is
to first examine the coefficient of the quadratic term.
If the coefficient is negative,
then together with the negative linear term,
the model function in this case
is a concave quadratic with the following shape.
Recall that we are minimizing
within the interval from zero to one.
So in this case,
τ equal to one is the minimizer.
This corresponds to case one.
In case two,
the coefficient is greater than zero,
so we get a convex quadratic function
with a shape like this.
In this case,
to find the minimizer,
we take the smaller of
the unconstrained minimizer and one.
Whichever we reach first
is the minimizer of the quadratic function.
This gives the solution in the two cases.
And finally,
we summarize the result on this slide.
So this is the formula for the Cauchy point.
In other words,
the Cauchy point has an explicit closed form.
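The explicit formula translates directly into code. Below is a minimal sketch (the function and variable names are my own; g is the gradient, B the model Hessian, delta the radius):

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point for  min g^T p + 0.5 p^T B p  s.t. ||p|| <= delta."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0                  # concave along -g: stop at the boundary
    else:
        # unconstrained 1-D minimizer, clipped at the boundary (tau = 1)
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g
```

For example, with B the identity, the Cauchy point coincides with the unconstrained minimizer -g whenever that point lies inside the region.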
Now we illustrate the Cauchy point with a figure.
In this figure,
we are standing at the current iterate xk,
and the contours of the model function
are the ellipses here.
The Cauchy point lies along the steepest descent direction,
but with a restricted step length,
because we are looking for the minimizer
along this direction.
The smallest model value
we can reach along this direction is here.
This gives us the Cauchy point.
Now let's look at some remarks
about the Cauchy point.
The advantage of the Cauchy point
is that it can be computed in explicit form.
It is also crucial in deciding
whether an approximate solution provides
a good reduction,
and whether we can get global convergence.
So the Cauchy point is a kind of bridge
connecting the trust region method
to its convergence results.
We will see this later in the convergence part.
However,
we may ask:
is there any way to improve the Cauchy point?
In the Cauchy point,
we only make use of gradient information,
which may slow down the process.
There are actually two ways to improve it.
The first one is the dogleg method.
The second one
is the two-dimensional subspace minimization method.
In this part,
we first take a look at the dogleg method.
Now the idea of the dogleg method is as follows.
We first look at the subproblem,
which minimizes a quadratic function
subject to a trust region constraint.
For this subproblem,
we treat Δ as a changing variable,
and we denote the optimal solution
of the subproblem by p*(Δ).
As Δ changes,
the optimal solution changes as well.
So p* is a function of the trust region radius.
From this viewpoint,
we can see that if the matrix B is positive definite,
then the direction pB, the Newton direction,
is the minimizer of the unconstrained problem.
That is, if we drop the trust region restriction
and minimize the model function directly,
the result is the Newton direction pB.
Keep pB in mind:
it is the full Newton step.
Now let's take a further look at different cases of Δ.
If Δ is greater than the length of pB,
in other words,
if the Newton step lies
within the trust region,
then the trust region problem
reduces to an unconstrained problem,
because we do not have to worry
about the trust region radius.
In this case,
p*(Δ) is simply pB, the full Newton step.
However,
if we make Δ very small,
then pB may fall outside the trust region.
In this case,
the trust region constraint does take effect.
It ensures that
the quadratic term in the model function
has little effect on the solution p*.
Why?
Because the trust region radius
is small compared to the scale
on which the model function curves.
In other words,
if we restrict the trust region
to a very small area,
then the curvature information
can be neglected.
So we drop the second order term
and keep only the linear term of the model function.
In this case,
p* is approximately a multiple of the negative gradient.
In other words,
when the radius is very small,
we just use the linear approximation of the model function.
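In symbols, for a very small radius the solution is approximately the scaled steepest descent step:

```latex
p^*(\Delta) \;\approx\; -\,\Delta\, \frac{g}{\|g\|}
\qquad \text{when } \Delta \text{ is small.}
```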
For intermediate Δ,
between very small and very large,
p*(Δ) follows a curved trajectory.
So let's illustrate this by the following figure.
Now let's take a look at this figure.
We put a dotted line here
to indicate the trust region,
and we are currently sitting at the point xk.
Notice that at this moment,
we are varying the trust region radius Δ.
For each Δ,
we plot its optimal solution p*(Δ).
Collecting them gives this dotted curve,
which is the trajectory of p*(Δ).
For example,
if I draw a trust region circle here,
its intersection with this curve
represents the optimal solution
of the subproblem.
If we take a bigger trust region,
we get an intersection here instead,
and that point represents the optimal solution
of the subproblem for that radius.
In this way,
we obtain a dotted curve of optimal solutions
with respect to the trust region radius.
Then what is the dogleg method?
The idea of the dogleg method
is that when Δ is very small,
we use the steepest descent direction
as the approximation.
For example,
when the trust region is very small like this,
we just take the steepest descent direction,
which is very close to the optimal trajectory.
If the trust region is very large,
we just take the full Newton step.
And between the two sizes,
we use a dogleg shape,
that is, two line segments,
to approximate the optimal trajectory.
This is the dogleg method.
The path bends like a dog's leg,
which is why we call it the dogleg method.
Now let's state the dogleg method mathematically.
We define two directions.
The first one, pU,
is the steepest descent direction
scaled by a coefficient:
the unconstrained minimizer along the negative gradient.
It represents the case when the radius is very small:
we move along this direction.
When the radius is very large,
we use the second line segment,
which starts from pU
and moves toward the full Newton direction.
This is the second line segment, from pU to pB.
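In formulas, the two endpoints and the resulting dogleg path are:

```latex
p^U = -\,\frac{g^T g}{\,g^T B g\,}\, g ,
\qquad
p^B = -\,B^{-1} g ,
\qquad
\tilde p(\tau) =
\begin{cases}
\tau\, p^U , & 0 \le \tau \le 1 ,\\[2pt]
p^U + (\tau - 1)\,(p^B - p^U) , & 1 \le \tau \le 2 .
\end{cases}
```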
Then the solution of the dogleg method
is obtained as follows.
We construct a piecewise-linear function p̃(τ)
to approximate the optimal trajectory;
this is our dogleg path.
The intersection of this dogleg path
with the trust region boundary
gives the approximate solution of our subproblem.
In other words,
in each iteration,
we solve the following problem:
with the dogleg path p̃(τ) as defined,
where τ is the variable,
we minimize the model function along p̃(τ).
This gives the dogleg method.
In other words,
the dogleg method is also a one-variable method.
However,
it does not move along a single straight line;
it has two line segments,
which gives a better approximation
of the optimal trajectory.
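Putting the pieces together, here is a minimal sketch of one dogleg step (function and variable names are my own; it assumes B is positive definite, as in the lecture):

```python
import numpy as np

def dogleg_step(g, B, delta):
    """One dogleg step for  min g^T p + 0.5 p^T B p  s.t. ||p|| <= delta.
    Assumes B is positive definite."""
    pB = -np.linalg.solve(B, g)          # full Newton step
    if np.linalg.norm(pB) <= delta:
        return pB                        # Newton step lies inside the region
    pU = -(g @ g) / (g @ B @ g) * g      # minimizer along the negative gradient
    if np.linalg.norm(pU) >= delta:
        # Even the first segment leaves the region: scaled steepest descent.
        return -(delta / np.linalg.norm(g)) * g
    # Second segment: find t in [0, 1] with ||pU + t (pB - pU)|| = delta,
    # i.e. intersect the second leg with the trust region boundary.
    d = pB - pU
    a, b, c = d @ d, 2.0 * (pU @ d), pU @ pU - delta**2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return pU + t * d
```

When the Newton step is outside the region but pU is inside, the returned step lies exactly on the trust region boundary, which matches the intersection picture in the figure.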
Now let's make a brief summary about this part.
In this part,
we mainly discuss how we can solve the subproblem.
We discuss the Cauchy point,
which has an explicit form to calculate,
as well as the dogleg method.
We discussed how the dogleg method is designed
and what subproblem it solves.
So we stop here and see you next time.