Course topic: Modern Optimization Methods (English) > Chapter 4. Trust Region Methods > 4.4 Solving Subproblem II


4.4 Solving Subproblem II — Lecture Notes, Key Points, and Subtitles

Hi.

Welcome to modern optimization methods.

Today we will talk about the fourth part of Chapter 4:

how to solve the subproblem.

We will mainly introduce the dogleg method,

as well as the two-dimensional subspace minimization.

Actually,

we have introduced the dogleg method

in the previous part.

Recall that the dogleg method basically

takes two line segments

to approximate the optimal trajectory

of the subproblem.

The first line segment runs along the direction pU.

The second line segment is connecting pU

with the full Newton step pB.

With these two line segments, the dogleg method

basically solves the following subproblem:

we substitute the two-segment path

into the model function

and minimize over the single parameter τ.

This is the dogleg method.
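The path just described can be sketched in a few lines of code. This is a minimal illustration, not the lecture's own code: `dogleg_path` and `model` are our names, pU is the unconstrained minimizer along the steepest descent direction, and pB is the full Newton step, with B assumed positive definite.

```python
import numpy as np

def dogleg_path(tau, g, B):
    """Dogleg path p~(tau) for tau in [0, 2]."""
    pU = -(g @ g) / (g @ B @ g) * g        # Cauchy-type step along -g
    pB = -np.linalg.solve(B, g)            # full Newton step
    if tau <= 1.0:
        return tau * pU                    # first segment: 0 -> pU
    return pU + (tau - 1.0) * (pB - pU)    # second segment: pU -> pB

def model(p, g, B):
    """Quadratic model m(p) = g^T p + 0.5 p^T B p (constant f term dropped)."""
    return g @ p + 0.5 * p @ B @ p
```

At τ = 1 the path sits exactly at pU, and at τ = 2 it reaches the full Newton step pB.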

Let's first talk about the properties

of dogleg method.

So we have the two properties

about the dogleg method.

The first is that the length of p tilde (τ)

is an increasing function of τ.

In other words,

as τ increases,

the length of p tilde (τ) will increase as well.

This matches the intuition

from our earlier analysis.

The second result is that the model function

evaluated at p tilde (τ) is a decreasing function of τ.

In other words,

as τ grows bigger and bigger,

the model function decreases.

Now to prove the result,

let's first take a look at the first part.

For the first part,

when τ is between zero and one,

the result is trivial,

so we do not present that part.

We only focus on τ between one and two.

To show that the length of p tilde (τ)

is an increasing function,

we define the function h(α)

as the squared norm of p tilde (τ).

Meanwhile,

we do a change of variable:

we replace τ between one and two

by 1 + α with α between zero and one,

so that 1 + α is the original τ.

Now for h(α) the interval

is between zero and one.

If we expand h(α),

we get the following expression:

a constant term, the squared norm of pU,

a term linear in α like this,

and a quadratic term in α

whose coefficient is the squared norm of pB minus pU.

To show that

h(α) is an increasing function,

notice that h(α) is a quadratic function of α.

So we can examine its derivative to show

the increasing property.

Now let's take h'(α),

the derivative with respect to α.

Then we try to prove that it is nonnegative.

By doing this,

we can show the increasing property about h.

Now let's see whether h'(α) is nonnegative or not.

Differentiating gives

a constant term

and a term linear in α.

In this expression,

the second term is nonnegative for α between zero and one,

so h'(α) is bounded below by the first term.

This step is straightforward.

Next we substitute the definitions of pU and pB

to get the equality here;

rearranging this part a little bit

gives the equality here.

We obtain some coefficient and then the red part.

Actually,

the red part is nonnegative,

so the whole expression is greater than

or equal to zero.

Overall,

we have proved that h'(α) is nonnegative.

Therefore,

the original h(α) is an increasing function of α,

which then shows that the length of p tilde (τ)

is an increasing function of τ.

We leave it as homework to show that

the red part is nonnegative.

Now let's take a look at the second comment.

The second property is that the model function

as a function of τ is decreasing.

To show the decreasing property,

we again use the derivative.

If we can show that the derivative of m

with respect to τ is nonpositive,

then we have shown the result.

Now let's take a look at this one.

We again replace τ with 1 + α

and take α between zero and one.

We define a new function h hat of α like this,

expand h hat (α),

and take the derivative of h hat (α).

Because h hat (α) is already a quadratic function,

we can differentiate it directly

and get the following result,

which involves pB and pU

as well as some terms in α.

Notice that B here

is a positive definite matrix,

and for any positive definite matrix,

this term here is nonnegative.

Since

α is between zero and one,

the second term

is smaller than or equal to the term with

α replaced by one.

After that,

we bound the expression by this term,

factor out the pB minus pU part,

and then simplify the remaining part.

After a simple calculation,

we can see that the resulting bound is exactly zero.

In this way,

we have shown that the derivative of h hat (α)

is nonpositive.

In other words,

h hat '(α) is smaller than or equal to zero.

Therefore,

the original model function with respect to α

is a decreasing function.

This gives our result.
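The two monotonicity properties just proved can also be checked numerically. The following sketch, with a randomly generated positive definite B of our own choosing (a sanity check, not a proof), traces the dogleg path and verifies that the length never decreases and the model value never increases:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = A @ A.T + 4 * np.eye(4)               # random positive definite B
g = rng.standard_normal(4)

pU = -(g @ g) / (g @ B @ g) * g           # steepest-descent minimizer
pB = -np.linalg.solve(B, g)               # full Newton step

def p_tilde(tau):
    """Dogleg path for tau in [0, 2]."""
    return tau * pU if tau <= 1 else pU + (tau - 1) * (pB - pU)

def m(p):
    """Quadratic model (constant term dropped)."""
    return g @ p + 0.5 * p @ B @ p

taus = np.linspace(0.0, 2.0, 201)
lengths = [np.linalg.norm(p_tilde(t)) for t in taus]
models = [m(p_tilde(t)) for t in taus]

# ||p~(tau)|| never decreases and m(p~(tau)) never increases along the path
assert all(l2 >= l1 - 1e-12 for l1, l2 in zip(lengths, lengths[1:]))
assert all(m2 <= m1 + 1e-12 for m1, m2 in zip(models, models[1:]))
```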

With the two properties above,

we can now carry out the dogleg method.

In the dogleg method,

we minimize the model function with respect to τ,

and we already know the properties of m(τ).

So how do we solve this problem?

We can use the following observations.

When the path p tilde (τ) reaches the boundary,

the solution is the intersection

of the trust region boundary

with the dogleg path.

First we need to check

whether pB reaches the boundary or not.

If the full Newton step

is already inside the trust region,

then the optimal solution is exactly the full Newton step.

Otherwise,

p tilde (τ) reaches the boundary exactly once,

and we can directly solve for the point where the line segment

intersects the trust region boundary.

Setting the norm of p tilde (τ) equal to the radius

gives a quadratic equation in τ,

and the appropriate root of this equation

gives us the solution for τ.

However,

do remember that τ must lie between zero and two;

it cannot lie outside of this interval.
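Putting the case analysis together, a minimal sketch of the full dogleg step might look as follows (assuming B is positive definite; `dogleg_step` is our own name for illustration):

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg solution of  min g^T p + 0.5 p^T B p  s.t. ||p|| <= delta,
    for positive definite B (a sketch of the standard case analysis)."""
    pB = -np.linalg.solve(B, g)
    if np.linalg.norm(pB) <= delta:
        return pB                               # Newton step already inside
    pU = -(g @ g) / (g @ B @ g) * g
    if np.linalg.norm(pU) >= delta:
        return delta * pU / np.linalg.norm(pU)  # truncated steepest descent
    # otherwise the second segment crosses the boundary:
    # solve ||pU + s (pB - pU)||^2 = delta^2 for s in (0, 1], with tau = 1 + s
    d = pB - pU
    a, b, c = d @ d, 2 * pU @ d, pU @ pU - delta**2
    s = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)  # positive root
    return pU + s * d
```

The quadratic always has exactly one root in (0, 1] in this branch, because the path length is increasing in τ, so only the positive root is kept.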

Now, as for the choice of the second-order matrix Bk,

we have some comments.

The first choice of Bk

is the Hessian matrix at the current iteration.

If this Hessian matrix is available and positive definite,

we can take Bk to be the Hessian matrix.

Otherwise,

we need other choices:

we may approximate the Hessian matrix ourselves.

For example,

we may use some trust region strategy,

use a quasi-Newton method to estimate Bk,

or use partial information of the Hessian matrix.

In these cases,

Bk is a modified Hessian matrix.

It is a kind of modification,

because we cannot know all the information of the Hessian matrix.

Now let's take a look at the two-dimensional subspace

minimization technique.

As the name suggests,

there are two dimensions in the subspace method.

Assume that Bk is positive definite,

and we solve the model function like this,

which is the same as before.

However,

the constraint changes a little bit.

Besides the trust region constraint,

we also require that p lies in the subspace

spanned by the gradient g and the Newton direction (B^-1)g.

This is where the two-dimensional subspace comes from:

p is a linear combination of the gradient direction

and the Newton direction.

Now think back to the dogleg method.

As we mentioned before,

the dogleg path consists of two line segments:

the first along the steepest descent direction,

and the second moving from the steepest descent step

toward the full Newton step.

So the dogleg method is actually a special case of the subspace method:

each point on the path is a linear combination of the two directions,

but with a specific, restricted combination.

Here in two-dimensional subspace minimization,

however,

we allow more freedom in the coefficients

of the two directions.

We only require that p

is a combination of the two parts;

there is no restriction on the coefficients

beyond the trust region constraint.

So it can achieve even smaller model values

than the dogleg method.

If we compare the

two-dimensional subspace minimization method

with the dogleg method,

the two-dimensional subspace method

always returns a model function value no larger

than that of the dogleg method.

This is a crucial property

when we try to prove the convergence result

for trust region method.
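One simple way to sketch the two-dimensional subspace step is to build an orthonormal basis of span{g, (B^-1)g}, solve the reduced two-variable problem exactly in the interior, and fall back to a fine grid search on the boundary circle. This is our own illustrative scheme, assuming B positive definite; production codes would instead find the roots of a fourth-degree equation.

```python
import numpy as np

def two_dim_subspace_step(g, B, delta, n_angles=3600):
    """Minimize g^T p + 0.5 p^T B p over span{g, B^{-1} g} intersected
    with ||p|| <= delta (sketch: exact interior step, grid on boundary)."""
    # orthonormal basis Q of the two-dimensional subspace
    V = np.column_stack([g, np.linalg.solve(B, g)])
    Q, _ = np.linalg.qr(V)
    Bq = Q.T @ B @ Q                       # reduced 2x2 Hessian
    gq = Q.T @ g                           # reduced gradient
    y = -np.linalg.solve(Bq, gq)           # unconstrained reduced minimizer
    if np.linalg.norm(y) > delta:
        # minimizer lies on the boundary: search over ||y|| = delta
        theta = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
        ys = delta * np.vstack([np.cos(theta), np.sin(theta)])
        vals = gq @ ys + 0.5 * np.sum(ys * (Bq @ ys), axis=0)
        y = ys[:, np.argmin(vals)]
    return Q @ y
```

When the full Newton step fits inside the region, the interior branch recovers it exactly, since (B^-1)g lies in the subspace; otherwise the boundary search can reach points the fixed dogleg combination cannot, which is why its model value is never worse.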

We also have some remarks about

the two-dimensional subspace minimization.

Firstly,

this subproblem has inexpensive computational cost,

because it only involves two directions

and is a two-dimensional problem.

The subproblem

can be reduced to finding the roots of a fourth-degree equation.

Recall that the dogleg method

reduces to finding the root of a quadratic equation.

Through the relation between the dogleg method

and the subspace method,

we can guarantee global convergence

for both minimization methods.

Moreover,

the two-dimensional subspace method is an extension of the dogleg method,

as I just mentioned.

Finally,

a very special feature of

two-dimensional subspace minimization

is that it can be adapted to the case

where B is not positive definite.

So it can be used in cases

where the Hessian matrix is not positive definite,

and it is very useful there.

Now let's make a brief summary about this part.

We have introduced the dogleg method,

how to solve it,

as well as the two-dimensional subspace minimization method.

In the next part,

we will introduce the convergence of the trust region method.

We stop here and see you next time.
