Current course topic: Modern Optimization Methods (in English) > Chapter 4. Trust Region Methods > 4.4 Solving Subproblem II
Hi.
Welcome to modern optimization methods.
Today we will talk about the fourth part of chapter four,
how we solve the subproblem.
We will mainly introduce the dogleg method,
as well as the two-dimensional subspace minimization.
Actually, we introduced the dogleg method in the previous part. Recall that the dogleg method takes two line segments to approximate the optimal trajectory of the subproblem. The first line segment runs along the direction of pU, the minimizer of the model along the steepest descent direction; the second line segment connects pU with the full Newton step pB. With these two line segments, the dogleg method solves the following subproblem: we substitute the two-segment path into the model function and minimize over the single parameter τ. This is the dogleg method.
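To make the two-segment path concrete, here is a minimal sketch in Python. The function name `dogleg_path` and the NumPy setup are my own illustration, not code from the course; it assumes B is positive definite.

```python
import numpy as np

def dogleg_path(tau, g, B):
    """Evaluate the dogleg path p~(tau) for tau in [0, 2].

    g: gradient, B: positive definite model Hessian.
    pU is the minimizer of the model along -g; pB is the full Newton step.
    """
    pU = -(g @ g) / (g @ B @ g) * g      # unconstrained steepest-descent minimizer
    pB = -np.linalg.solve(B, g)          # full Newton step
    if tau <= 1.0:
        return tau * pU                  # first segment: 0 -> pU
    return pU + (tau - 1.0) * (pB - pU)  # second segment: pU -> pB
```

At τ = 0 the path starts at the origin, reaches pU at τ = 1, and ends at pB at τ = 2.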
Let's first talk about the properties of the dogleg method. We have two properties. First, the length of p̃(τ) is an increasing function of τ; in other words, as τ increases, the length of p̃(τ) increases as well. This matches the intuition from our earlier analysis. Second, the model function evaluated at p̃(τ) is a decreasing function of τ; in other words, as τ grows, the model function value decreases.
Now let's prove these results, starting with the first one. When τ is between zero and one, the result is straightforward, so we omit that case and focus on τ between one and two. To show that the length of p̃(τ) is increasing, we define a function h(α) as the squared norm of p̃(τ), and at the same time we change variables: we replace τ ∈ [1, 2] by 1 + α with α ∈ [0, 1], so that h(α) is defined on the interval between zero and one.
If we expand h(α), we obtain the following: a constant term involving the norm of pU, a term linear in α, and a quadratic term in α whose coefficient is the squared norm of pB − pU. To show that h(α) is increasing, notice that h(α) is a quadratic function of α, so we can examine its derivative. Let's take h'(α), the derivative with respect to α, and try to prove that it is nonnegative; this establishes that h is increasing.
Now let's see whether h'(α) is nonnegative. Taking the derivative, we get a constant term and a term linear in α. In this expression, the α term is nonnegative, so h'(α) is bounded below by the constant term; this step is straightforward. Next, we substitute the definitions of pU and pB to obtain the equality here, rearrange a little, and arrive at this expression: a coefficient multiplying the part highlighted in red. The red part is nonnegative, so the whole expression is greater than or equal to zero.

Overall, we have shown that h'(α) is nonnegative. Therefore h(α) is an increasing function of α, which in turn shows that the length of p̃(τ) is an increasing function of τ. We leave it as homework for you to show that the red part is indeed nonnegative.
Now let's look at the second claim: the model function, as a function of τ, is decreasing. To show this, we again use the derivative: if we can show that the derivative of m with respect to τ is nonpositive, the result follows. As before, we replace τ with 1 + α, with α between zero and one, and define a new function ĥ(α) accordingly. We expand ĥ(α) and take its derivative. Since ĥ(α) is already a quadratic function of α, differentiating directly gives the following result, which involves pB and pU as well as some terms in α.
Notice that B is a positive definite matrix, so the quadratic-form term here is nonnegative. Combined with the fact that α is between zero and one, the second term is bounded above by its value with α replaced by one. We then enlarge the bound by this term, factor out pB − pU, and combine the remaining parts. After a simple calculation, we can see that the resulting term is actually zero: substituting pB = −B⁻¹g makes the bound vanish. In this way, we show that the derivative of ĥ(α) is nonpositive; in other words, ĥ'(α) ≤ 0. Therefore, the model function is a decreasing function of α, and hence of τ. This gives our result.
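The two monotonicity properties can also be checked numerically. The sketch below is my own illustration (a random positive definite B, not an example from the lecture): it evaluates the dogleg path on a grid of τ values and verifies that the path length never decreases while the model value never increases.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = A @ A.T + 4 * np.eye(4)          # positive definite model Hessian
g = rng.standard_normal(4)

pU = -(g @ g) / (g @ B @ g) * g      # steepest-descent minimizer
pB = -np.linalg.solve(B, g)          # full Newton step

def p_tilde(tau):
    return tau * pU if tau <= 1 else pU + (tau - 1) * (pB - pU)

def m(p):                            # model: g^T p + 0.5 p^T B p
    return g @ p + 0.5 * p @ B @ p

taus = np.linspace(0, 2, 201)
lengths = [np.linalg.norm(p_tilde(t)) for t in taus]
values = [m(p_tilde(t)) for t in taus]

# Property 1: ||p~(tau)|| is nondecreasing in tau.
assert all(l2 >= l1 - 1e-12 for l1, l2 in zip(lengths, lengths[1:]))
# Property 2: m(p~(tau)) is nonincreasing in tau.
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
```

Such a check is no substitute for the proof, but it is a quick sanity test for any implementation of the path.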
With the two properties above, we can now solve the dogleg subproblem. In the dogleg method we minimize the model function with respect to τ, and we already know the properties of m(τ). So how do we solve it? We can use the following observations. Because the path length is increasing, the path p̃(τ) intersects the trust region boundary at most once, so we look for the intersection between the boundary and the dogleg path. First we check whether pB lies inside the trust region: if the full Newton step is already inside, then the optimal solution is exactly the full Newton step. Otherwise, p̃(τ) reaches the boundary, and we directly intersect the line segment with the trust region radius. The resulting equation in τ is a quadratic equation, and its root gives us the solution for τ. However, remember that τ must lie between zero and two; it cannot lie outside this range.
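The boundary computation just described can be sketched as follows. The function name `dogleg_tau` is my own; the quadratic comes from expanding ‖pU + (τ − 1)(pB − pU)‖² = Δ² in the unknown τ − 1.

```python
import numpy as np

def dogleg_tau(pU, pB, Delta):
    """Parameter tau in [0, 2] at which the dogleg path meets ||p|| = Delta,
    or tau = 2 when the full Newton step pB is already inside the region."""
    if np.linalg.norm(pB) <= Delta:
        return 2.0                         # Newton step inside: take pB
    if np.linalg.norm(pU) >= Delta:
        return Delta / np.linalg.norm(pU)  # crossing on the first segment
    # Crossing on the second segment: solve a(t)^2 + 2b t + c = 0, t = tau - 1.
    d = pB - pU
    a = d @ d
    b = pU @ d
    c = pU @ pU - Delta**2                 # c < 0 here, so the +sqrt root is valid
    return 1.0 + (-b + np.sqrt(b * b - a * c)) / a
```

The discriminant is positive because c < 0 inside the region, so the positive root always exists and lies in [1, 2].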
Now, as for the choice of the second-order matrix Bk, we have some comments. The first choice is to use the Hessian matrix at the current iterate: if the Hessian is available and positive definite, we can take Bk to be the Hessian. Otherwise, we need alternatives, and we may approximate the Hessian ourselves: for example, by some approximation strategy, by a quasi-Newton method to estimate Bk, or by using partial information from the Hessian matrix. In such cases, Bk is a modified Hessian matrix; it is a kind of modification, because we do not know all the information in the Hessian.
Now let's look at the two-dimensional subspace minimization technique. As the name suggests, the subspace involved has two dimensions. Assume that Bk is positive definite, and consider minimizing the model function as usual, but with a slightly different constraint: besides the trust region constraint, we also require that p lies in the subspace spanned by the gradient g and the Newton direction B⁻¹g. This is where the two dimensions come from: p is a linear combination of the gradient direction and the Newton direction.

You may think back to the dogleg method. As we mentioned, the dogleg path consists of two line segments: the first along the steepest descent direction, the second moving from the steepest descent point toward the full Newton step. So the dogleg method is actually a special case of the subspace method: it is a linear combination of the two directions, but a restricted one. Here, in two-dimensional subspace minimization, we allow full freedom in the coefficients of the two directions; p is any combination of the two parts, with no restriction on the coefficients. This allows even smaller model values than the dogleg method.
If we compare the two-dimensional subspace minimization method with the dogleg method, the subspace method always returns a model function value no larger than that of the dogleg method. This is a crucial property when we prove the convergence results for trust region methods.
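As a rough illustration of the reduced two-dimensional problem, here is my own sketch (not the quartic-equation solver the lecture alludes to): it solves the interior case exactly and handles the boundary case by a dense angular scan over the two coefficients. All names are mine, and the Gram matrix is assumed nonsingular (g and B⁻¹g not parallel).

```python
import numpy as np

def two_dim_subspace_step(g, B, Delta, n_theta=2000):
    """Minimize m(p) = g^T p + 0.5 p^T B p over ||p|| <= Delta,
    with p restricted to span{g, B^{-1} g} and B positive definite."""
    S = np.column_stack([g, np.linalg.solve(B, g)])  # subspace basis
    G = S.T @ S                                      # Gram matrix: ||S u||^2 = u^T G u
    c = S.T @ g                                      # reduced gradient
    M = S.T @ B @ S                                  # reduced Hessian
    u = -np.linalg.solve(M, c)                       # unconstrained 2D minimizer
    if u @ G @ u <= Delta**2:
        return S @ u                                 # interior: the Newton step
    # Boundary: write u = L^{-T} w with G = L L^T, so u^T G u = ||w||^2.
    # Scanning w over the circle of radius Delta covers the boundary exactly.
    L = np.linalg.cholesky(G)
    best, best_val = None, np.inf
    for theta in np.linspace(0, 2 * np.pi, n_theta, endpoint=False):
        w = Delta * np.array([np.cos(theta), np.sin(theta)])
        u = np.linalg.solve(L.T, w)
        val = c @ u + 0.5 * u @ M @ u
        if val < best_val:
            best, best_val = u, val
    return S @ best
```

A production implementation would instead solve the boundary case via the roots of the fourth-degree equation mentioned below; the scan merely makes the geometry of the reduced problem visible.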
We also have some remarks about two-dimensional subspace minimization. First, the subproblem is computationally inexpensive, because it involves only two directions: it is a two-dimensional problem, and its solution can be reduced to finding the roots of a fourth-degree equation. Recall that, by contrast, the dogleg method reduces to the roots of a quadratic equation. By the relation between the dogleg method and the subspace method, we can guarantee global convergence for both methods. The subspace method is, as just mentioned, an extension of the dogleg method. Finally, a special feature of two-dimensional subspace minimization is that it can be adapted to the case where B is not positive definite, so it is often used when the Hessian matrix is indefinite; it is very useful in that case.
Now let's make a brief summary of this part. We have introduced the dogleg method and how to solve it, as well as the two-dimensional subspace minimization method. In the next part, we will introduce the convergence of trust region methods. We stop here; see you next time.