Current course topic: Modern Optimization Methods (in English) > Chapter 3. Line Search Methods > 3.4 Convergence of Line Search Methods
Hi, everyone.
Welcome to modern optimization methods.
Today we will introduce the fourth part of chapter three,
the convergence of line search methods.
In this part, we will first talk about Zoutendijk's Theorem,
which is the key tool in proving global convergence.
Then we will discuss the global convergence result.
Now let's recall the line search methods first.
So for line search methods,
we always update the iterate like this.
Standing at xk, we first choose a direction pk.
After that, we move along this direction with a step length αk.
Notice that this direction must be a descent direction.
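In symbols, the update just described reads:

```latex
x_{k+1} = x_k + \alpha_k p_k, \qquad \nabla f(x_k)^{T} p_k < 0 .
```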
To measure the global convergence, we need some notations.
Firstly, let's use cos(θk) to denote the cosine of the angle θk
between the negative gradient and the direction pk.
So cos(θk) measures their normalized inner product.
This is a standard definition.
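Written out, with the shorthand ∇fk := ∇f(xk), the definition is:

```latex
\cos\theta_k \;=\; \frac{-\nabla f_k^{T} p_k}{\|\nabla f_k\|\,\|p_k\|} .
```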
Now we have another notation called the level set.
So the level set is defined as the collection of all x
whose function value is smaller than or equal
to that of the initial starting point x0.
So actually, we are looking for
the minimizer of this function inside this level set,
because along the iterations the function value will always be smaller than f(x0).
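In symbols, the level set is:

```latex
\mathcal{L} \;=\; \{\, x \;:\; f(x) \le f(x_0) \,\} .
```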
Now let's recall the line search update,
as well as the other settings for Zoutendijk's Theorem.
So the line search update takes the following form,
as we just mentioned.
And we use a descent direction pk,
which means that the strictly negative condition ∇f(xk)ᵀpk < 0 must hold.
Then thirdly, we require that the step length αk
satisfies the Wolfe conditions.
Recall that we introduced the Wolfe conditions
in the second part of chapter three,
and we detail them here
with parameters c1 and c2 between zero and one.
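Written out, the Wolfe conditions require, for constants 0 < c1 < c2 < 1:

```latex
\begin{aligned}
f(x_k + \alpha_k p_k) &\le f(x_k) + c_1 \alpha_k \nabla f_k^{T} p_k
  && \text{(sufficient decrease)}\\[2pt]
\nabla f(x_k + \alpha_k p_k)^{T} p_k &\ge c_2\, \nabla f_k^{T} p_k
  && \text{(curvature)}
\end{aligned}
```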
So this is the setting for Zoutendijk's Theorem.
We also have some assumptions for this theorem.
For example, the bounded-below assumption:
we assume that the function whose minimizer we are looking for
is bounded below.
Otherwise, we cannot find a minimizer.
So this assumption is a reasonable one.
Another one is that we require f to be continuously differentiable
in an open set N which contains the level set.
So in other words, this function
must be continuously differentiable over the whole level set.
The third one is about the gradient.
We require the gradient to be Lipschitz continuous.
In other words, there exists a constant L such that
‖∇f(x) − ∇f(x̃)‖ ≤ L‖x − x̃‖ for any points x and x̃ in N.
So these are the three assumptions in Zoutendijk's Theorem.
Now with the above settings and assumptions,
we have the following result.
So the result of Zoutendijk's Theorem basically says that the series
whose terms are (cos(θk))² multiplied
by the squared norm of the gradient is convergent.
This result is the key to proving
the global convergence of line search methods.
The result is due to Zoutendijk,
so we refer to this theorem as Zoutendijk's Theorem,
and we call this condition the Zoutendijk condition.
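In symbols, the Zoutendijk condition states:

```latex
\sum_{k \ge 0} \cos^2\theta_k \,\|\nabla f_k\|^2 \;<\; \infty .
```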
Now let's take a look at the implications of this result.
But before that,
let's first present how to prove Zoutendijk's Theorem.
To show Zoutendijk's Theorem,
the first step is to prove the following condition.
This condition shows that
the decrease between two consecutive function values
has a certain lower bound.
So to reach this aim, let's go through the steps
one by one. In step one,
we use the curvature condition, which is
the second of the two Wolfe conditions.
We subtract the gradient term ∇f(xk)ᵀpk from both sides,
and we reach this term.
So this term contains the difference between two gradients.
Using the Lipschitz condition,
we can reformulate the left-hand side
of the inequality, and we reach the following condition:
the inner product of the gradient difference with pk
is upper bounded by this term.
Therefore, combining the two inequalities,
we can find a lower bound for αk.
Now we substitute this lower bound of αk into the first Wolfe condition,
which is the sufficient decrease condition.
Notice that pk is a descent direction,
so in the second term the coefficient of αk is negative.
Therefore, when we substitute the lower bound of αk into this term,
we obtain the following result.
So this is the first aim we want to prove.
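The chain of inequalities in this step can be written out as follows, where L is the Lipschitz constant of the gradient and ∇fk := ∇f(xk):

```latex
\begin{aligned}
\text{curvature condition:}\quad &
(\nabla f_{k+1} - \nabla f_k)^{T} p_k \ge (c_2 - 1)\,\nabla f_k^{T} p_k,\\
\text{Lipschitz continuity:}\quad &
(\nabla f_{k+1} - \nabla f_k)^{T} p_k \le \alpha_k L \,\|p_k\|^2,\\
\text{hence:}\quad &
\alpha_k \ge \frac{c_2 - 1}{L}\,\frac{\nabla f_k^{T} p_k}{\|p_k\|^2},\\
\text{sufficient decrease:}\quad &
f_{k+1} \le f_k - \frac{c_1 (1 - c_2)}{L}\,
  \frac{(\nabla f_k^{T} p_k)^2}{\|p_k\|^2}.
\end{aligned}
```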
After that, with this condition,
the rest of the proof is a bit easier, because the only thing
we need to do is to replace this squared term
using the definition of cos(θk).
The numerator here is exactly this term,
so we replace the numerator by cos(θk) multiplied
by the denominator. Then we get this result.
So this result is related to (cos(θk))² multiplied
by the squared norm of the gradient.
Then to reach the final Zoutendijk result,
we need to do the summation.
So we sum this inequality from index zero up to k.
We get several inequalities,
and when we add up the left-hand sides and
the right-hand sides,
some of the terms cancel out.
For example, f(x1) up to f(xk) cancel
because they appear on both the left-hand side and the right-hand side.
Then what is left is the following:
the difference of the two function values
is smaller than or equal to this right-hand side.
Also notice that we assumed the function is bounded below.
In other words, the left-hand side is bounded below.
On the right-hand side, if we let k go to infinity,
then the sum becomes a series.
So in other words, this series must be convergent;
otherwise the lower bound would fail.
So we have the desired Zoutendijk result.
Now the proof is finished.
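In symbols, writing c = c1(1 − c2)/L, the telescoped inequality and the conclusion read:

```latex
f(x_{k+1}) \;\le\; f(x_0) \;-\; c \sum_{j=0}^{k} \cos^2\theta_j\,\|\nabla f_j\|^2 ,
\qquad\text{so}\qquad
\sum_{j \ge 0} \cos^2\theta_j\,\|\nabla f_j\|^2 \;\le\; \frac{f(x_0) - \inf f}{c} \;<\; \infty .
```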
So next, let's take a look at what
Zoutendijk's Theorem implies.
Now for Zoutendijk's Theorem, I copy the result here.
To get Zoutendijk's Theorem,
we assumed that the step length
satisfies the Wolfe conditions,
and we used the Wolfe conditions in our proof.
Actually, if we replace the Wolfe conditions
by the Goldstein conditions or the strong Wolfe conditions,
the result also holds, because the proof is quite similar.
However, if we try to prove the result
using backtracking to get the step length αk,
then we need extra assumptions. So that is a different proof.
Now let's take a look at the implications
of Zoutendijk's Theorem. We have that
it actually implies that each term of the summation
tends to zero in the limit.
This is a standard result in the theory of series.
To use this result,
let's make a further assumption:
suppose cos(θk) has a lower bound
which is a strictly positive constant.
Then, combining the terms tending to zero
with the nonzero lower bound of cos(θk),
we can get the following result.
The limit of the norm of the gradient is zero.
This is exactly the definition of global convergence.
So in other words, if we can show that
cos(θk) has a nonzero lower bound,
then we can prove global convergence.
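Written out, the argument is:

```latex
\begin{gathered}
\sum_{k \ge 0} \cos^2\theta_k \,\|\nabla f_k\|^2 < \infty
\;\Longrightarrow\;
\cos^2\theta_k \,\|\nabla f_k\|^2 \to 0 ,\\
\cos\theta_k \ge \delta > 0 \;\;\forall k
\;\Longrightarrow\;
\|\nabla f_k\|^2 \le \frac{1}{\delta^2}\,\cos^2\theta_k \,\|\nabla f_k\|^2 \to 0 .
\end{gathered}
```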
So from this page,
we can see the role of Zoutendijk's Theorem.
It is actually a bridge connecting line search methods
with global convergence.
So the next question is:
can we indeed prove that cos(θk)
has a nonzero lower bound?
We need to discuss the nonzero lower bound
method by method,
because the direction pk may be defined in different ways,
for example,
the steepest descent direction or some other directions.
Now let's take a look at global convergence formally.
So we say that an algorithm is globally convergent
if this limit condition holds.
This is the definition of global convergence.
And for line search methods,
this is the best result we can achieve.
So in other words, if we take a look at this limit,
we can see that it actually provides us a stationary point
rather than a local minimizer.
So recall that a stationary point of
a smooth function is defined as a point where the gradient is zero.
So actually,
global convergence only returns us a stationary point,
and this is the best result that a line search method can get.
To guarantee convergence to a local minimum,
we may need further extra information.
For example, we may need information
about the Hessian matrix at each iteration.
We will discuss this in later chapters.
Now let's take a look at the steepest descent method.
As we mentioned before,
the direction may be determined in different ways,
so we start with the steepest descent method.
And for the steepest descent method, cos(θk) is always one.
Why? Because if we take a look at the definition of cos(θk),
it measures the cosine of the angle between the negative gradient
and the search direction,
normalized by their norms.
For the steepest descent method,
the direction pk is exactly the negative gradient −∇f(xk).
So θk is always zero,
because the two directions coincide,
and therefore cos(θk) is always one.
Therefore, coming back to Zoutendijk's Theorem,
we have indeed found a nonzero constant,
namely one, as the lower bound of cos(θk).
Therefore,
global convergence always holds.
So the global convergence of the steepest descent method
with the Wolfe, strong Wolfe,
or Goldstein conditions is already proved.
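As a small numerical illustration (not from the lecture), the sketch below runs steepest descent on a convex quadratic. For simplicity it uses a backtracking (Armijo) step rather than a full Wolfe line search, which, as noted above, needs extra assumptions in the theory; numerically we still observe the gradient norm driven toward zero, i.e. convergence to a stationary point. The particular f, A, and b are hypothetical choices for the demo.

```python
import numpy as np

# Illustration: steepest descent on f(x) = 0.5 x^T A x - b^T x.
# For pk = -grad f(xk), cos(theta_k) = 1, so Zoutendijk's condition
# forces ||grad f(xk)|| -> 0, which we observe numerically.

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

def steepest_descent(x0, c1=1e-4, tol=1e-8, max_iter=500):
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g                      # steepest descent direction
        alpha = 1.0
        # backtrack until the sufficient-decrease (Armijo) condition holds
        while f(x + alpha * p) > f(x) + c1 * alpha * g @ p:
            alpha *= 0.5
        x = x + alpha * p
    return x

x_star = steepest_descent(np.array([5.0, -3.0]))
print(np.linalg.norm(grad(x_star)))  # close to zero: a stationary point
```

For this strongly convex quadratic the stationary point is the unique minimizer, the solution of Ax = b.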
Now let's take a look at Newton-type methods.
For Newton-like methods,
the direction takes the following form:
pk equals minus Bk inverse multiplied by the gradient ∇f(xk).
By choosing different Bk,
we obtain Newton's method or quasi-Newton methods.
To measure the angle between pk
and the steepest descent direction,
we need some assumptions on Bk.
So we require that the condition number
of Bk has a uniform upper bound.
Then under this condition, we can prove that
cos(θk) does have a nonzero lower bound.
With this condition, we have global convergence.
So for Newton-like methods,
we need to impose some conditions
on the update of the Hessian matrix or its approximation Bk.
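The bound behind this claim can be written out as follows, assuming each Bk is symmetric positive definite with condition number ‖Bk‖·‖Bk⁻¹‖ ≤ M, so that ∇fk = −Bk pk:

```latex
\cos\theta_k
= \frac{-\nabla f_k^{T} p_k}{\|\nabla f_k\|\,\|p_k\|}
= \frac{p_k^{T} B_k p_k}{\|B_k p_k\|\,\|p_k\|}
\;\ge\; \frac{\|p_k\|^2 / \|B_k^{-1}\|}{\|B_k\|\,\|p_k\|\,\|p_k\|}
\;=\; \frac{1}{\|B_k\|\,\|B_k^{-1}\|}
\;\ge\; \frac{1}{M} .
```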
Now we stop here for this part,
and let's make a brief summary.
In this part, we introduced Zoutendijk's Theorem, the bridge
to prove the global convergence of line search methods.
And we particularly discussed global convergence
for the steepest descent method, as well as Newton-like methods.
In the next part,
we will discuss other types of line search methods.
So see you next time. Bye.