Modern Optimization Methods (English), Chapter 4. Trust Region Methods

4.3 Solving Subproblem (lecture transcript)


Hi, everyone.

Welcome to modern optimization methods.

In this part,

we will introduce the third part of chapter four,

the trust region method.

We will discuss how we solve the subproblem.

We will mainly discuss the Cauchy point method

as well as the dogleg method.

Now recall that

we are trying to find an approximate solution

of the trust region subproblem.

The subproblem looks like this:

we minimize a quadratic model function

subject to the trust region constraint.

There are three ways

to compute an approximate solution.

One is the Cauchy point.

The second one is the dogleg method.

And the third one

is the two-dimensional subspace minimization method.

Let's discuss the first method,

the Cauchy point.

Now the idea of the Cauchy point is like this.

We take the steepest descent direction,

and then ask:

if we walk along the steepest descent direction

within the trust region,

which point gives the smallest model value?

The resulting point is called the Cauchy point.

Now the way we generate the Cauchy point

is like this.

In the first step,

we minimize the model function

keeping only its linear term,

subject to the trust region constraint.

In other words,

the first step solves a linear model problem.

As we will see shortly,

the resulting direction is the steepest descent direction.

After we get the direction p_k^s in the first step,

we assume that the solution of the quadratic model

has the form τ p_k^s for a scalar τ.

Then we try to find the optimal τ_k,

which plays the role of a step length.

In other words,

in the second step of the Cauchy point computation,

we minimize the quadratic model function

after substituting p_k = τ p_k^s,

so it becomes a function of the single variable τ.

Finally,

the Cauchy point is set to p_k^C = τ_k p_k^s,

the step length τ_k times the direction p_k^s.
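In symbols (using the usual notation f_k, g_k, B_k for the function value, gradient, and model Hessian at x_k, m_k for the model, and Δ_k for the radius), the two steps can be written as

$$
p_k^s = \arg\min_{p}\; f_k + g_k^T p \quad \text{s.t.}\ \|p\| \le \Delta_k,
$$

$$
\tau_k = \arg\min_{\tau > 0}\; m_k(\tau p_k^s) \quad \text{s.t.}\ \|\tau p_k^s\| \le \Delta_k,
\qquad
p_k^C = \tau_k\, p_k^s .
$$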

Now let's see how we can derive the Cauchy point.

For the first step,

we can easily find that the direction p_k^s

is the steepest descent direction

scaled to reach the trust region boundary,

that is, p_k^s = -(Δ_k / ||g_k||) g_k.

After that,

let's substitute it into the model function.

A simple calculation shows

that the model, as a function of τ,

is a quadratic function.
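Carrying out this substitution with p_k^s = -(Δ_k / ||g_k||) g_k, the one-variable quadratic reads

$$
m_k(\tau p_k^s)
= f_k - \tau\,\Delta_k \|g_k\|
+ \tfrac{1}{2}\,\tau^2\, \frac{\Delta_k^2}{\|g_k\|^2}\, g_k^T B_k g_k ,
\qquad 0 < \tau \le 1 .
$$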

So for this quadratic function,

we try to find the minimizer,

recalling that we also have the constraint

0 < τ ≤ 1.

In other words,

we minimize this one-variable quadratic

over the interval (0, 1].

The way we proceed is

to discuss the sign of the coefficient of the quadratic term.

If the coefficient is negative,

then together with the negative linear term,

the model function in this case

is a concave quadratic in τ,

decreasing over the whole interval.

Recall that we are looking for the minimizer

within the interval (0, 1].

So in this case,

τ equal to one is the minimizer.

This corresponds to case one.

In case two,

the coefficient is greater than zero,

and the curve is a convex quadratic

with a shape like this.

In this case,

to find the minimizer over the interval,

we only need to take the smaller of

the unconstrained minimizer of the quadratic and one.

Whichever is reached first

is the minimizer

of the quadratic function over (0, 1].

So this gives the solution in the two cases.

And finally,

we summarize the result on this slide.

This is the formula for the Cauchy point.

In other words,

the Cauchy point has an explicit form.
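Written out (the standard closed form obtained from the two cases above), the Cauchy point is

$$
p_k^C = -\tau_k\, \frac{\Delta_k}{\|g_k\|}\, g_k ,
\qquad
\tau_k =
\begin{cases}
1, & g_k^T B_k g_k \le 0, \\[4pt]
\min\!\left( \dfrac{\|g_k\|^3}{\Delta_k\, g_k^T B_k g_k},\ 1 \right), & g_k^T B_k g_k > 0 .
\end{cases}
$$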

Now we illustrate Cauchy point by a figure.

Now in this figure,

we are standing at the current iterate x_k,

and the contours of the model function

are the ellipses here.

The Cauchy point takes the steepest descent direction,

but with a restricted step length,

because we are looking for the minimizer

along this direction.

The smallest model value

we can get along this direction is here.

So this gives us the Cauchy point.
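Computationally, the Cauchy point is only a few lines of code. Here is a minimal sketch in Python (the function name and the use of numpy are illustrative, not part of the lecture):

```python
import numpy as np

def cauchy_point(g, B, delta):
    # g: gradient g_k, B: model Hessian B_k, delta: radius Delta_k
    g_norm = np.linalg.norm(g)
    # Step 1: minimizer of the linear model on the ball, i.e. the steepest
    # descent direction scaled to the trust region boundary.
    p_s = -(delta / g_norm) * g
    # Step 2: minimize the quadratic model along tau * p_s for 0 < tau <= 1.
    gBg = g @ B @ g
    if gBg <= 0:
        tau = 1.0  # concave (or linear) case: the boundary is optimal
    else:
        tau = min(g_norm**3 / (delta * gBg), 1.0)
    return tau * p_s
```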

Now let's take a look at some remarks

about Cauchy point.

So the advantage of the Cauchy point

is that it can be computed

in explicit form.

It is also crucial in deciding

whether an approximate solution provides

a sufficient reduction of the model,

and hence whether we can get global convergence.

So the Cauchy point is a kind of bridge

for establishing the convergence results

of the trust region method.

We will see this later in the convergence part.

However,

we may think about the following issue:

is there any way to improve the Cauchy point?

Because the Cauchy point

only makes use of gradient information,

it may slow down the convergence.

So is there any way to improve it?

So there are actually two ways to improve it.

The first one is the dogleg method.

The second one

is the two-dimensional subspace minimization method.

In this part,

we first take a look at the dogleg method.

Now the idea of the dogleg method is as follows.

So we first take a look at the subproblem

which is minimizing a quadratic function

subject to some trust region constraint.

For this subproblem,

we now regard Δ as a varying parameter.

We denote the optimal solution

of the subproblem by p*(Δ).

As Δ changes,

the optimal solution changes as well,

so p* is a function of the trust region radius.

From this point of view,

we can see that if the matrix B_k is positive definite,

then the direction p^B is defined as the Newton direction;

it is the minimizer of the unconstrained model problem.

So if we forget about the trust region restriction

and minimize the model function directly,

the result is p^B, the Newton direction.

So keep in mind the p^B here:

p^B means the full Newton step.
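In formulas (standard notation), the full Newton step is

$$
p^B = -B_k^{-1} g_k ,
$$

the unconstrained minimizer of the model m_k when B_k is positive definite.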

Now let's take a further look at different cases of Δ.

If Δ is at least the length of p^B,

in other words,

the Newton step lies

within the trust region,

then the trust region problem

reduces to an unconstrained problem,

because we do not have to worry about the trust region radius:

the Newton step is reachable

within the trust region.

So in this case,

p*(Δ) is simply the full Newton step p^B.

However,

if Δ is very small,

then p^B may lie outside of the trust region.

In this case,

the trust region constraint does take effect.

When the radius is this small,

the quadratic term in the model function

has little effect on the solution p*(Δ).

Why?

Because within a small trust region,

the quadratic term is small

compared to the linear term.

Or in other words,

if we restrict the trust region

to a very small area,

then the curvature information

can be neglected.

So we drop the second order term,

and we only keep the linear term of the model function.

In this case,

p*(Δ) is approximately a step

along the negative gradient direction.

In other words,

when the radius is very small,

we just use the linear approximation for the model function.

For intermediate values of Δ,

between these two extremes,

p*(Δ) follows a curved trajectory.
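Written out, the two limiting cases described above are

$$
p^*(\Delta) = p^B \quad \text{when } \Delta \ge \|p^B\| ,
\qquad
p^*(\Delta) \approx -\Delta\, \frac{g_k}{\|g_k\|} \quad \text{when } \Delta \text{ is small} .
$$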

So let's illustrate this by the following figure.

Now let's take a look at this figure.

We put a dotted line here

to demonstrate the trust region.

And we are currently sitting at the point xk,

so notice that at this moment,

we are trying to change the trust region radius Δ.

So for each Δ,

we plot its optimal solution p*(Δ).

Collecting them gives this dotted curve here,

so this dotted curve is exactly p*(Δ).

For example,

if I draw a trust region circle here,

the intersection of this curve with the circle

represents the optimal solution

of the subproblem.

If we take a bigger trust region,

we get an intersection here instead,

and that point represents the optimal solution

of the subproblem for that radius.

In this way,

we obtain a dotted curve of optimal solutions

with respect to the trust region radius.

Then what is the dogleg method?

The idea of the dogleg method

is that when Δ is very small,

we use the steepest descent direction

as the approximation.

For example,

when Δ is like this,

a very small trust region,

then we just take the steepest descent direction.

It is very similar to the optimal trajectory here.

And if the trust region Δ is very large,

then we just take the full Newton step.

Between these two sizes,

we use a dogleg shape

or we use two line segments

to approximate the optimal trajectory.

This will become the dogleg method.

The resulting path is shaped like a dog's leg,

which is why we call it the dogleg method.

Now let's put the dogleg method

in a mathematical way.

We define two directions.

The first one, p^U, is a line search type direction:

it is the steepest descent direction

scaled by a coefficient involving the gradient.

It represents where we move

when the radius is very small.

When the radius is larger,

we use the second line segment.

The second line segment starts from p^U

and moves toward the full Newton direction.

So this becomes the second line segment, from p^U to p^B.

Then the solution produced by the dogleg method

is obtained as follows.

We construct a piecewise linear function p̃(τ)

to approximate the optimal trajectory.

This is our dogleg path.

The intersection of this dogleg path

with the trust region boundary

will be our approximate solution of the subproblem.
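In formulas (standard dogleg notation), the two directions and the piecewise linear path are

$$
p^U = -\frac{g_k^T g_k}{g_k^T B_k g_k}\, g_k ,
\qquad
p^B = -B_k^{-1} g_k ,
$$

$$
\tilde p(\tau) =
\begin{cases}
\tau\, p^U, & 0 \le \tau \le 1, \\
p^U + (\tau - 1)\,(p^B - p^U), & 1 \le \tau \le 2 .
\end{cases}
$$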

So in other words,

in each iteration,

we actually solve the following problem.

We have defined the dogleg path p̃(τ),

where τ is the variable.

We minimize the model function along p̃(τ),

subject to the trust region constraint.

This gives us the dogleg method.

In other words,

the dogleg method is also a one-variable method.

However,

it does not move along a single straight line:

it has two line segments,

which provides more flexibility

in finding a good approximate minimizer.
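As a computational illustration, here is a minimal Python sketch of one dogleg step (assuming B_k is positive definite; the function name and the use of numpy are illustrative). It returns the point where the dogleg path meets the trust region boundary, which is the standard way of realizing the one-variable minimization above, since the model value decreases along the path:

```python
import numpy as np

def dogleg_step(g, B, delta):
    # g: gradient g_k, B: positive definite model Hessian B_k, delta: radius
    p_b = -np.linalg.solve(B, g)            # full Newton step p^B
    if np.linalg.norm(p_b) <= delta:
        return p_b                           # Newton step already inside the region

    p_u = -(g @ g) / (g @ B @ g) * g         # minimizer of the model along -g
    if np.linalg.norm(p_u) >= delta:
        # Even the first segment leaves the region: scale it to the boundary.
        return (delta / np.linalg.norm(p_u)) * p_u

    # The boundary is crossed on the second segment p_u + s * (p_b - p_u):
    # solve ||p_u + s * (p_b - p_u)||^2 = delta^2 for s in [0, 1].
    d = p_b - p_u
    a, b, c = d @ d, 2 * (p_u @ d), p_u @ p_u - delta**2
    s = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return p_u + s * d
```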

Now let's make a brief summary about this part.

In this part,

we mainly discuss how we can solve the subproblem.

We discuss the Cauchy point,

which can be calculated in explicit form,

as well as the dogleg method:

how the dogleg method is designed,

and what subproblem it solves.

So we stop here and see you next time.
