Modern Optimization Methods (in English) > Chapter 4. Trust Region Methods > 4.5 Convergence
Hi, everyone.
Welcome to modern optimization methods.
In this part,
we will introduce the last part of chapter four,
the convergence of the trust region method.
This part consists of three sections.
First,
we will discuss the reduction obtained by the Cauchy point.
Then we will discuss the global convergence.
Finally, we will discuss some extensions.
Now let's first take a look at the Cauchy point.
So recall that in previous parts,
we have derived the Cauchy point.
So it's basically
a kind of steepest descent method
with a special step length.
And we have the following result:
the Cauchy point satisfies the condition below.
If we take a look at the condition,
it basically says that the decrease in the model function
provided by the Cauchy point
is bounded below by the right-hand side of the inequality.
The right-hand side involves the norm of the gradient,
as well as a minimum of two terms.
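Although the slide itself is not reproduced here, the standard form of this Cauchy-point bound from the trust region literature (with $m_k$ the model function, $g_k$ the gradient, $B_k$ the model Hessian, and $\Delta_k$ the radius) reads:

```latex
m_k(0) - m_k(p_k^C) \;\ge\; \frac{1}{2}\,\|g_k\|\,
\min\!\left(\Delta_k,\ \frac{\|g_k\|}{\|B_k\|}\right)
```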
So why is this Cauchy reduction important?
This result sets up a bridge
between the trust region method
and its global convergence.
We will show this result on the next page.
So with the sufficient reduction by the Cauchy point,
we can prove the following theorem.
Any point pk
obtained by approximately solving the subproblem,
provided it satisfies the following condition,
namely that its model reduction is
at least a fixed proportion of the Cauchy reduction,
also achieves a sufficient reduction.
That is, the reduction provided by this pk
is also bounded below
by the term here.
The term is similar
to the lower bound for the Cauchy reduction,
differing only in the constant c1.
It means that
once we have set up the bridge
of the Cauchy reduction,
any trust region iteration
that achieves at least a fraction of the Cauchy reduction
inherits the same kind of lower bound.
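In symbols, the condition and the conclusion just described take the following standard form (with a constant $c_1 \in (0,1]$, notation as before):

```latex
m_k(0) - m_k(p_k) \;\ge\; c_1\,\bigl(m_k(0) - m_k(p_k^C)\bigr)
\quad\Longrightarrow\quad
m_k(0) - m_k(p_k) \;\ge\; \frac{c_1}{2}\,\|g_k\|\,
\min\!\left(\Delta_k,\ \frac{\|g_k\|}{\|B_k\|}\right)
```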
The proof is actually very simple.
By assumption, the model reduction of the current step pk
is at least a proportion c1 of the reduction at the Cauchy point,
and the Cauchy point itself provides the reduction bounded as above.
Chaining the first and the last parts of this inequality
gives the final result as required.
So the Cauchy point is a bridge,
playing a role similar to the line search setting:
there, the Zoutendijk theorem is the
bridge between the line search
strategy and the convergence result.
Now we have some remarks
about the dogleg method
and the two-dimensional subspace minimization method.
As we may recall,
the dogleg method
and the two-dimensional subspace minimization method
always provide a model reduction
at least as good as the Cauchy point.
Therefore,
they also satisfy the
lower bound mentioned on the previous page,
and thus enjoy
the same sufficient reduction property.
So for these two ways of solving the subproblem,
the required model reduction is always satisfied.
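As a concrete illustration, here is a minimal sketch, not taken from the lecture slides, of how the Cauchy point can be computed for a given gradient `g`, model Hessian `B`, and radius `delta`; the function name and interface are my own.

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Cauchy point of the trust-region subproblem
        min_p  g @ p + 0.5 * p @ B @ p   s.t.  ||p|| <= delta.
    It minimizes the model along the steepest descent direction -g,
    restricted to the trust region."""
    gnorm = np.linalg.norm(g)
    gBg = g @ B @ g
    if gBg <= 0:
        # Model is non-convex along -g: step all the way to the boundary.
        tau = 1.0
    else:
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    return -tau * (delta / gnorm) * g
```

For any `g`, `B`, `delta`, the returned step stays inside the trust region and achieves the sufficient model reduction discussed above.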
Now let's present the first global convergence
theorem for trust region method.
In this part,
we consider the case that η is zero.
Recall in trust region algorithms,
we have a parameter η.
This η is chosen
in the interval from zero (inclusive)
up to 1/4 (exclusive).
In this case,
we only consider η equal to zero.
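To recall the role of η in the trust region algorithm, here is a hedged sketch (the function name is mine, not from the slides): a step is accepted when the ratio of the actual reduction to the reduction predicted by the model exceeds η.

```python
def accept_step(f_old, f_new, model_reduction, eta=0.0):
    """Trust-region acceptance test: rho_k is the ratio of the actual
    reduction f_old - f_new to the reduction predicted by the model.
    The step is accepted when rho_k exceeds eta, with eta in [0, 1/4)."""
    rho = (f_old - f_new) / model_reduction
    return rho > eta
```

With η = 0, as in this theorem, any step that decreases the objective at all is accepted.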
And we have the following result.
The result, stated
at the bottom of the theorem,
basically says that
the iterates xk
have a weak global convergence property.
Notice that
the statement is a lim inf:
it means that
we can find a subsequence
of the gradient norms
whose limit is zero.
So it is a weak
global convergence result.
Now let's take a look at
the settings as well as the assumptions.
The settings are given above.
We assume that in the model function,
the Hessian approximation Bk
is uniformly bounded:
the norm of Bk
is bounded above by a constant β.
Meanwhile,
we assume that
the function f is bounded below on the level set.
This is reasonable because we are looking for
the minimum of this function,
so it should be bounded below.
We also assume that the function f
is Lipschitz continuously differentiable
in the neighborhood S(R0).
This S(R0) consists of points
within distance R0 of some y in the level set,
and the gradient of f
is Lipschitz continuous
in this small local area.
So it is a weaker form of Lipschitz continuity,
namely local Lipschitz continuity.
Further,
we assume that
all approximate solutions of the subproblem
satisfy the inequalities.
The inequalities from the previous page say that
the solution of the
trust region subproblem provides
a sufficient reduction in the model function.
In addition,
we assume that the length of the step pk
is bounded above by a constant multiple
of the trust region radius.
Then we have the weak convergence result.
So it seems that we need quite a few assumptions and settings.
However,
all of them are necessary
to get the weak global convergence result.
We will omit the proof of this theorem.
And we go on to discuss the second case
of the global convergence.
The second case of the global convergence
is actually the case that η
lies strictly between zero and 1/4.
For this case,
we actually need another way
to prove the result.
So we put it in a separate theorem.
The settings and assumptions
are the same as before.
And also the result
here is the same as before.
We still get weak global convergence result.
In a word,
for the trust region method,
under certain conditions,
we have a global convergence result.
However,
it is only weak global convergence,
because only a subsequence
of the gradient norms converges to zero.
Now,
in terms of the local convergence result,
as we may see,
locally speaking,
the trust region method
is very similar to Newton's method
or quasi-Newton methods,
because in the local sense,
the trust region method also builds a quadratic
approximation of the problem,
and Newton's method and quasi-Newton methods
also build quadratic approximations in a local area.
So from this point of view,
the behavior
of the trust region method
in the local sense should be similar
to that of Newton-type methods.
Actually,
theoretically speaking,
we can indeed prove this asymptotically
Newton-like behavior.
The settings
are summarized on this page.
We have some assumptions on f:
f is twice continuously differentiable
in a neighborhood of the optimal solution.
At the optimal solution,
we require that
the second-order sufficient conditions are satisfied,
which means the gradient is zero
and the Hessian matrix is positive definite.
We also assume that
the sequence converges to x*.
This is a very standard assumption as well.
For all k sufficiently large,
the trust region method
solves the subproblem
with Bk equal to the exact Hessian matrix,
and we choose the step pk
such that
the Cauchy-point
sufficient reduction in the model function
is satisfied.
In this case,
we also require that
the steps are
asymptotically similar to the Newton step.
In mathematical terms,
we assume that
the current step pk
behaves very similarly to the Newton direction:
the difference between the two
is of higher order than the Newton direction.
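In symbols, this asymptotic similarity condition is usually written as follows (with $p_k^N$ the Newton step):

```latex
p_k^N = -\nabla^2 f(x_k)^{-1} \nabla f(x_k),
\qquad
\|p_k - p_k^N\| = o\bigl(\|p_k^N\|\bigr)
```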
After that,
under all these settings,
we can present the local convergence result
for trust region methods.
Now with the settings above,
we can show that
the trust region bound Δk eventually becomes inactive.
So what does inactive mean?
It means that
as the trust region iteration goes on,
the trust region radius Δk
eventually stops constraining the step;
in other words,
the subproblem reduces to an unconstrained problem.
In this case,
xk converges superlinearly to x*.
Because locally speaking,
it indeed behaves
similarly to Newton's method.
So this is the local convergence
behavior of the trust region method.
We need more assumptions in this theorem;
however,
these assumptions are indeed satisfied in realistic situations.
Now let's talk about some extensions.
The first extension is the scaling.
Scaling means that in some situations,
we need to rescale the variables
in order to make the problem well conditioned.
Notice that in the trust region method,
we implicitly assume that
the variables behave similarly in each component.
However, if some components
vary over a large range
while other components
vary only a little,
then
we need the scaling technique.
So how do we do the scaling?
Basically,
we introduce a
diagonal matrix D
into the trust region constraint.
Originally we
restrict the step length p to be smaller than
or equal to Δ.
Now we put the diagonal matrix D
before p,
meaning that we require the rescaled length of p
to lie within this region.
With this rescaling,
the problem is better conditioned.
With this diagonal matrix,
if we take a look at the model function
and the subproblem again,
we get the extra diagonal matrix applied to p there.
When f is highly sensitive to the values of some components
and not so sensitive to others,
we can introduce such a diagonal matrix D
to represent the sensitivity of the components.
The diagonal matrix
can be derived from the Hessian matrix of the problem.
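To make the scaling concrete, the following sketch (names and interface are my own, assuming D is diagonal with positive entries) computes a Cauchy-type point under the scaled constraint ‖Dp‖ ≤ Δ via the change of variables q = Dp:

```python
import numpy as np

def scaled_cauchy_point(g, B, D, delta):
    """Cauchy-type point under the scaled constraint ||D p|| <= delta,
    where D is diagonal with positive entries. The change of variables
    q = D p turns the subproblem into an ordinary one with
    g_hat = D^{-1} g and B_hat = D^{-1} B D^{-1}."""
    Dinv = np.diag(1.0 / np.diag(D))
    g_hat = Dinv @ g
    B_hat = Dinv @ B @ Dinv
    gnorm = np.linalg.norm(g_hat)
    gBg = g_hat @ B_hat @ g_hat
    tau = 1.0 if gBg <= 0 else min(gnorm**3 / (delta * gBg), 1.0)
    q = -tau * (delta / gnorm) * g_hat   # ordinary Cauchy step in q
    return Dinv @ q                      # back to the original variables
```

With D equal to the identity this reduces to the ordinary Cauchy point.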
Introducing the diagonal matrix
poses some new problems.
For example,
for the Cauchy point calculation,
with the scaled restriction
we need to derive the Cauchy point again,
and we can actually obtain results similar
to those for the ordinary Cauchy point.
We call this
the generalized Cauchy point calculation.
In other words,
the model function is the same;
however,
the constraint changes a little bit,
so we get different formulae.
We present the resulting formulae here,
but we omit the details of the derivation.
Now,
the last extension is that
we can also define the trust region using other norms.
For example,
we naturally use the Euclidean norm.
However,
in practice,
when needed, you can choose the L1 norm
or the L∞ norm
to measure the length of the vector.
In that sense,
it is called the trust region method in the L1 norm
or the trust region method in the L∞ norm.
We can also use some extensions,
for example,
the scaled L1 norm or the scaled L∞ norm.
All of them are acceptable.
For different practical situations,
we may choose different extensions
based on the application and the algorithm.
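A tiny numerical illustration (the values are chosen by me for illustration only) of how the same candidate step can lie inside one norm's trust region but outside another's:

```python
import numpy as np

# A candidate step and radius (illustrative values only).
p = np.array([0.3, -0.35])
delta = 0.5

in_l2 = np.linalg.norm(p, 2) <= delta         # ||p||_2 ~ 0.46 -> inside
in_l1 = np.linalg.norm(p, 1) <= delta         # ||p||_1 = 0.65 -> outside
in_linf = np.linalg.norm(p, np.inf) <= delta  # ||p||_inf = 0.35 -> inside
```

So the choice of norm changes which steps the subproblem is allowed to return, even for the same radius Δ.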
Now let's make a brief summary about this part.
In this part,
we actually introduce the global convergence
of trust region method,
as well as its convergence rate.
We did not provide detailed proofs of the theorems here;
I leave it to you as homework to derive the results.
Moreover,
we discussed some extensions,
including scaling and other norms for trust region methods.
Now let's make a brief summary about chapter four.
Chapter four mainly discusses the trust region methods,
including the main idea,
the algorithms,
as well as how we solve the subproblem.
Finally,
we also consider the convergence performance of the methods.
We stop here and see you next time.