Current course topic: Modern Optimization Methods (in English) > Chapter 3. Line Search Methods > 3.4 Convergence of Line Search Methods
Hi, everyone.
Welcome to modern optimization methods.
Today we will introduce the fourth part of chapter three,
the convergence of line search methods.
In this part, we will first talk about Zoutendijk's Theorem,
which is the key tool in proving global convergence.
Then we will discuss the global convergence result.
Now let's recall the line search methods first.
So for line search methods,
we always update the iterate like this.
Standing at xk, we first choose a direction pk.
After that, we move along this direction with a step length αk.
Notice that this direction must be a descent direction.
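In symbols, the update just described reads:

```latex
x_{k+1} = x_k + \alpha_k p_k, \qquad \nabla f(x_k)^{T} p_k < 0 .
```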
To measure the global convergence, we need some notations.
Firstly, let's use cos(θk) to denote the cosine of the angle θk
between the negative gradient and the direction pk.
So cos(θk) measures their normalized inner product.
This is a standard definition.
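Written out, with the shorthand ∇fk := ∇f(xk), the definition is:

```latex
\cos\theta_k \;=\; \frac{-\nabla f_k^{T} p_k}{\|\nabla f_k\|\,\|p_k\|} .
```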
Now we have another notation called the level set.
So the level set is defined as the collection of all x
whose function value is smaller than or equal
to that of the initial starting point x0.
So actually, we are looking for
the minimizer of this function inside this level set,
because along the iterations the function value will always be smaller than f(x0).
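In symbols, the level set is:

```latex
\mathcal{L} \;=\; \{\, x \;:\; f(x) \le f(x_0) \,\} .
```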
Now let's recall the line search update,
as well as the other settings for Zoutendijk's Theorem.
So the line search update takes the following form,
as we just mentioned.
And we use a descent direction pk,
which means that the strictly negative condition ∇f(xk)ᵀpk < 0 must hold.
Then thirdly, we require that the step length αk
satisfies the Wolfe conditions.
Recall that we introduced the Wolfe conditions
in the second part of chapter three,
and we detail them here
with parameters c1 and c2 between zero and one.
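Written out, the Wolfe conditions require, for constants 0 < c1 < c2 < 1:

```latex
\begin{aligned}
f(x_k + \alpha_k p_k) &\le f(x_k) + c_1 \alpha_k \nabla f_k^{T} p_k
  && \text{(sufficient decrease)}\\[2pt]
\nabla f(x_k + \alpha_k p_k)^{T} p_k &\ge c_2\, \nabla f_k^{T} p_k
  && \text{(curvature)}
\end{aligned}
```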
So this is the setting for Zoutendijk's Theorem.
We also have some assumptions for this theorem.
For example, the bounded-below assumption:
we assume that the function whose minimizer we are looking for
is bounded below.
Otherwise, we cannot find a minimizer.
So this assumption is a reasonable one.
Another one is that we require f to be continuously differentiable
in an open set N which contains the level set.
So in other words, this function
must be continuously differentiable over the whole level set.
The third one is about the gradient.
We require the gradient to be Lipschitz continuous.
In other words, there exists a constant L such that
‖∇f(x) − ∇f(x̃)‖ ≤ L‖x − x̃‖ for any points x and x̃ in N.
So these are the three assumptions in Zoutendijk's Theorem.
Now with the above settings and assumptions,
we have the following result.
So the result of Zoutendijk's Theorem basically says that the series
whose terms are (cos(θk))² multiplied
by the squared norm of the gradient is convergent.
This result is the key to proving
the global convergence of line search methods.
The result is due to Zoutendijk,
so we refer to this theorem as Zoutendijk's Theorem,
and we call this condition the Zoutendijk condition.
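In symbols, the Zoutendijk condition states:

```latex
\sum_{k \ge 0} \cos^2\theta_k \,\|\nabla f_k\|^2 \;<\; \infty .
```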
Now let's take a look at the implications of this result.
But before that,
let's first present how to prove Zoutendijk's Theorem.
To show Zoutendijk's Theorem,
the first step is to prove the following condition.
This condition shows that
the decrease between two consecutive function values
has a certain lower bound.
So to reach this aim, let's go through the steps
one by one. In step one,
we use the curvature condition, which is
the second of the two Wolfe conditions.
We subtract the gradient term ∇f(xk)ᵀpk from both sides,
and we reach this term.
So this term contains the difference between two gradients.
Using the Lipschitz condition,
we can reformulate the left-hand side
of the inequality, and we reach the following condition:
the inner product of the gradient difference with pk
is upper bounded by this term.
Therefore, combining the two inequalities,
we can find a lower bound for αk.
Now we substitute this lower bound of αk into the first Wolfe condition,
which is the sufficient decrease condition.
Notice that pk is a descent direction,
so in the second term the coefficient of αk is negative.
Therefore, when we substitute the lower bound of αk into this term,
we obtain the following result.
So this is the first aim we want to prove.
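The chain of inequalities in this step can be written out as follows, where L is the Lipschitz constant of the gradient and ∇fk := ∇f(xk):

```latex
\begin{aligned}
\text{curvature condition:}\quad &
(\nabla f_{k+1} - \nabla f_k)^{T} p_k \ge (c_2 - 1)\,\nabla f_k^{T} p_k,\\
\text{Lipschitz continuity:}\quad &
(\nabla f_{k+1} - \nabla f_k)^{T} p_k \le \alpha_k L \,\|p_k\|^2,\\
\text{hence:}\quad &
\alpha_k \ge \frac{c_2 - 1}{L}\,\frac{\nabla f_k^{T} p_k}{\|p_k\|^2},\\
\text{sufficient decrease:}\quad &
f_{k+1} \le f_k - \frac{c_1 (1 - c_2)}{L}\,
  \frac{(\nabla f_k^{T} p_k)^2}{\|p_k\|^2}.
\end{aligned}
```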
After that, with this condition,
the rest of the proof is a bit easier, because the only thing
we need to do is to replace this squared term
using the definition of cos(θk).
The numerator here is exactly this term,
so we replace the numerator by cos(θk) multiplied
by the denominator. Then we get this result.
So this result is related to (cos(θk))² multiplied
by the squared norm of the gradient.
Then to reach the final Zoutendijk result,
we need to do the summation.
So we sum this inequality from index zero up to k.
We get several inequalities,
and when we add up the left-hand sides and
the right-hand sides,
some of the terms cancel out.
For example, f(x1) up to f(xk) cancel
because they appear on both the left-hand side and the right-hand side.
Then what is left is the following:
the difference of the two function values
is smaller than or equal to this right-hand side.
Also notice that we assumed the function is bounded below.
In other words, the left-hand side is bounded below.
On the right-hand side, if we let k go to infinity,
then the sum becomes a series.
So in other words, this series must be convergent;
otherwise the lower bound would fail.
So we have the desired Zoutendijk result.
Now the proof is finished.
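In symbols, writing c = c1(1 − c2)/L, the telescoped inequality and the conclusion read:

```latex
f(x_{k+1}) \;\le\; f(x_0) \;-\; c \sum_{j=0}^{k} \cos^2\theta_j\,\|\nabla f_j\|^2 ,
\qquad\text{so}\qquad
\sum_{j \ge 0} \cos^2\theta_j\,\|\nabla f_j\|^2 \;\le\; \frac{f(x_0) - \inf f}{c} \;<\; \infty .
```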
So next, let's take a look at what
Zoutendijk's Theorem implies.
Now for Zoutendijk's Theorem, I copy the result here.
To get Zoutendijk's Theorem,
we assumed that the step length
satisfies the Wolfe conditions,
and we used the Wolfe conditions in our proof.
Actually, if we replace the Wolfe conditions
by the Goldstein conditions or the strong Wolfe conditions,
the result also holds, because the proof is quite similar.
However, if we try to prove the result
using backtracking to get the step length αk,
then we need extra assumptions. So that is a different proof.
Now let's take a look at the implications
of Zoutendijk's Theorem. We have that
it actually implies that each term of the summation
tends to zero in the limit.
This is a standard result in the theory of series.
To use this result,
let's make a further assumption:
suppose cos(θk) has a lower bound
which is a strictly positive constant.
Then, combining the terms tending to zero
with the nonzero lower bound of cos(θk),
we can get the following result.
The limit of the norm of the gradient is zero.
This is exactly the definition of global convergence.
So in other words, if we can show that
cos(θk) has a nonzero lower bound,
then we can prove global convergence.
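Written out, the argument is:

```latex
\begin{gathered}
\sum_{k \ge 0} \cos^2\theta_k \,\|\nabla f_k\|^2 < \infty
\;\Longrightarrow\;
\cos^2\theta_k \,\|\nabla f_k\|^2 \to 0 ,\\
\cos\theta_k \ge \delta > 0 \;\;\forall k
\;\Longrightarrow\;
\|\nabla f_k\|^2 \le \frac{1}{\delta^2}\,\cos^2\theta_k \,\|\nabla f_k\|^2 \to 0 .
\end{gathered}
```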
So from this page,
we can see the role of Zoutendijk's Theorem.
It is actually a bridge connecting line search methods
with global convergence.
So the next question is:
can we indeed prove that cos(θk)
has a nonzero lower bound?
We need to discuss the nonzero lower bound
method by method,
because the direction pk may be defined in different ways,
for example,
the steepest descent direction or some other directions.
Now let's take a look at global convergence formally.
So we say that an algorithm is globally convergent
if this limit condition holds.
This is the definition of global convergence.
And for line search methods,
this is the best result we can achieve.
So in other words, if we take a look at this limit,
we can see that it actually provides us a stationary point
rather than a local minimizer.
So recall that a stationary point of
a smooth function is defined as a point where the gradient is zero.
So actually,
global convergence only returns us a stationary point,
and this is the best result that a line search method can get.
To guarantee convergence to a local minimum,
we may need further extra information.
For example, we may need information
about the Hessian matrix at each iteration.
We will discuss this in later chapters.
Now let's take a look at the steepest descent method.
As we mentioned before,
the direction may be determined in different ways,
so we start with the steepest descent method.
And for the steepest descent method, cos(θk) is always one.
Why? Because if we take a look at the definition of cos(θk),
it measures the cosine of the angle between the negative gradient
and the search direction,
normalized by their norms.
For the steepest descent method,
the direction pk is exactly the negative gradient −∇f(xk).
So θk is always zero,
because the two directions coincide,
and therefore cos(θk) is always one.
Therefore, coming back to Zoutendijk's Theorem,
we have indeed found a nonzero constant,
namely one, as the lower bound of cos(θk).
Therefore,
global convergence always holds.
So the global convergence of the steepest descent method
with the Wolfe, strong Wolfe,
or Goldstein conditions is already proved.
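As a small numerical illustration (not from the lecture), the sketch below runs steepest descent on a convex quadratic. For simplicity it uses a backtracking (Armijo) step rather than a full Wolfe line search, which, as noted above, needs extra assumptions in the theory; numerically we still observe the gradient norm driven toward zero, i.e. convergence to a stationary point. The particular f, A, and b are hypothetical choices for the demo.

```python
import numpy as np

# Illustration: steepest descent on f(x) = 0.5 x^T A x - b^T x.
# For pk = -grad f(xk), cos(theta_k) = 1, so Zoutendijk's condition
# forces ||grad f(xk)|| -> 0, which we observe numerically.

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

def steepest_descent(x0, c1=1e-4, tol=1e-8, max_iter=500):
    x = x0.copy()
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -g                      # steepest descent direction
        alpha = 1.0
        # backtrack until the sufficient-decrease (Armijo) condition holds
        while f(x + alpha * p) > f(x) + c1 * alpha * g @ p:
            alpha *= 0.5
        x = x + alpha * p
    return x

x_star = steepest_descent(np.array([5.0, -3.0]))
print(np.linalg.norm(grad(x_star)))  # close to zero: a stationary point
```

For this strongly convex quadratic the stationary point is the unique minimizer, the solution of Ax = b.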
Now let's take a look at Newton-type methods.
For Newton-like methods,
the direction takes the following form:
pk equals minus Bk inverse multiplied by the gradient ∇f(xk).
By choosing different Bk,
we obtain Newton's method or quasi-Newton methods.
To measure the angle between pk
and the steepest descent direction,
we need some assumptions on Bk.
So we require that the condition number
of Bk has a uniform upper bound.
Then under this condition, we can prove that
cos(θk) does have a nonzero lower bound.
With this condition, we have global convergence.
So for Newton-like methods,
we need to impose some conditions
on the update of the Hessian matrix or its approximation Bk.
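The bound behind this claim can be written out as follows, assuming each Bk is symmetric positive definite with condition number ‖Bk‖·‖Bk⁻¹‖ ≤ M, so that ∇fk = −Bk pk:

```latex
\cos\theta_k
= \frac{-\nabla f_k^{T} p_k}{\|\nabla f_k\|\,\|p_k\|}
= \frac{p_k^{T} B_k p_k}{\|B_k p_k\|\,\|p_k\|}
\;\ge\; \frac{\|p_k\|^2 / \|B_k^{-1}\|}{\|B_k\|\,\|p_k\|\,\|p_k\|}
\;=\; \frac{1}{\|B_k\|\,\|B_k^{-1}\|}
\;\ge\; \frac{1}{M} .
```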
Now we stop here for this part,
and let's make a brief summary.
In this part, we introduced Zoutendijk's Theorem, the bridge
to prove the global convergence of line search methods.
And we particularly discussed global convergence
for the steepest descent method, as well as Newton-like methods.
In the next part,
we will discuss other types of line search methods.
So see you next time. Bye.