Current course topic: Modern Optimization Methods (in English) > Chapter 3. Line Search Methods > 3.2 The Wolfe conditions
Hi, everyone.
Welcome to modern optimization methods.
Today we will introduce the second part of chapter three,
which is the Wolfe conditions.
Now let's first recall the line search strategy.
As we mentioned before,
the line search strategy actually takes two steps.
So at the current iterate,
we first choose a descent direction pk.
Then we determine a step length αk,
and we move to the next iterate.
The question is,
how can we choose a reasonably good step length αk?
As we demonstrated in the previous lecture,
choosing a good inexact step length
is essential.
The idea is to find a sufficient decrease condition
that guarantees the convergence of f to a local minimizer.
Now let's first introduce the first
of the Wolfe conditions,
which is the sufficient decrease condition.
So the sufficient decrease condition
is illustrated by this inequality here.
If we look at this inequality,
the left-hand side is the function value
at the next iterate,
and the right-hand side is a sum of two terms.
The first term is the current function value f(xk),
and the second term is
the inner product of the gradient of f at xk
with the direction pk,
multiplied by the step length α and a constant c1.
This constant controls how much decrease we demand of the step.
Notice that the second term
is negative, since pk is a descent direction.
So the condition requires the next function value to be
strictly smaller than the current one.
Now what does this inequality mean?
It means that the reduction
in the function f should be proportional
to both the step length α
and the directional derivative ∇f(xk)ᵀpk.
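Written out explicitly, the inequality described above is the standard sufficient decrease (Armijo) condition:

```latex
f(x_k + \alpha p_k) \;\le\; f(x_k) + c_1\, \alpha\, \nabla f(x_k)^{T} p_k,
\qquad c_1 \in (0, 1).
```

Since pk is a descent direction, the term \(\nabla f(x_k)^{T} p_k\) is negative, so the right-hand side sits strictly below f(xk) for every α > 0.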
Now let's illustrate it geometrically by a figure.
Before we present the figure,
let's first do some reformulation.
So what is the reformulation?
We transfer this inequality in terms of Φ(α)
and a line l(α). Now recall that Φ(α)
is a function of the step length.
If we take the step length to be zero,
then we are standing at the current iterate,
so Φ(0) is the same as f(xk).
Similarly, Φ(αk) is the next function value f(xk + αk pk).
Now if we differentiate Φ with respect to α
using the chain rule,
we get that Φ' at zero takes this value,
which is exactly the gradient of f at xk transposed times pk.
So this value is the slope of the tangent
of the function Φ at zero.
So try to remember this. Keep it in mind.
Now for Φ'(α) at a general α,
we have the expression like this:
Φ'(α) is the gradient of f
at xk + αpk multiplied by pk.
So this takes care of the left-hand side f(xk + αpk),
which we have transferred into Φ(α).
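Collecting these relations, with the standard definition of Φ as the objective restricted to the search ray:

```latex
\phi(\alpha) = f(x_k + \alpha p_k), \qquad
\phi(0) = f(x_k), \qquad
\phi'(\alpha) = \nabla f(x_k + \alpha p_k)^{T} p_k, \qquad
\phi'(0) = \nabla f(x_k)^{T} p_k.
```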
For the right-hand side,
if we take it as a function of α,
it is actually a linear function.
We denote this linear function by l,
so l(α) is the right-hand side.
The coefficient of α in l(α)
is what we call the slope.
The slope of l is c1 times ∇f(xk)ᵀpk,
so it is also negative.
However, the slope of l(α) is
a little different from the slope of Φ at zero,
because they differ by the factor c1.
So now, after presenting these relations,
we can transfer the condition into this form:
we require that Φ(α)
lies below the line l(α).
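In symbols, with l(α) denoting the right-hand side of the sufficient decrease inequality as a linear function of α:

```latex
\ell(\alpha) = f(x_k) + c_1\, \alpha\, \nabla f(x_k)^{T} p_k
            = \phi(0) + c_1\, \alpha\, \phi'(0),
\qquad\text{and the condition reads}\qquad
\phi(\alpha) \le \ell(\alpha).
```

Note that the slope of l is c1 Φ'(0), a less steep (closer to horizontal) line than the tangent of Φ at zero, since 0 < c1 < 1.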
So this is the figure we want to illustrate.
We still have the function curve about Φ(α) here.
And then we plot out the straight line l(α) in dotted.
You can see that if we look for the part where
Φ(α) lies below l(α),
then we get this interval, which we can accept,
and also this other interval, which we can accept as well.
The two intervals give us
the acceptable step lengths.
So actually, quite a wide range of step lengths
can satisfy the first condition.
However, you may notice that
there is an acceptable interval
very close to zero.
So in a very bad situation,
if we choose an α very close to zero,
it still satisfies the sufficient decrease condition.
In the practical update, however,
it means that we move forward only a tiny bit, just like this.
So it will slow down the iteration process.
Therefore, the next of the Wolfe conditions
will rule out such excessively small α.
And then we come to the second condition
called the curvature condition.
In mathematical form,
this condition is presented like this:
we require that the gradient at the next iterate transposed
times pk is greater than or equal to a constant c2
times the gradient at the current iterate transposed times pk.
Similarly, if we transfer this condition into the form of Φ,
then we are actually requiring that
the slope Φ'(αk) at the new point
is greater than
or equal to c2 times the slope Φ'(0).
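Explicitly, this is the standard curvature condition:

```latex
\nabla f(x_k + \alpha_k p_k)^{T} p_k \;\ge\; c_2\, \nabla f(x_k)^{T} p_k,
\qquad\text{equivalently}\qquad
\phi'(\alpha_k) \ge c_2\, \phi'(0),
```

with the constant c2 taken in (c1, 1). Since Φ'(0) < 0, this rules out points where Φ is still decreasing steeply, which is exactly what happens near α = 0.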
Let's present this in the figure as well.
Recall that we have translated the second condition
into the form of Φ,
and we are looking for points αk
whose slope is at least c2 times Φ'(0).
If we look at the figure, we can see two slopes.
The tangent at zero has slope Φ'(0),
and the slightly flatter line
has slope c2 Φ'(0), the right-hand side.
The α we are looking for must have a slope greater
than or equal to this flatter one. So finally,
we reach the acceptable intervals of α
for the second condition:
this interval here, up to here,
and another interval over there.
And another interval is over there.
So you can see that the second condition
rules out the very small values of α:
those α which are very close to zero
are not selected.
So taking the intersection of the two conditions
gives us the final acceptable intervals of α.
Any point lying in these intervals
is a candidate for our inexact line search.
We just pick one of them
as our inexact step length,
and then we move forward.
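The two checks can be sketched in code. This is a minimal illustration, not from the lecture; the function names and the test problem f(x) = x², xk = 1, pk = −1 are all illustrative choices:

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions for a trial step length alpha."""
    phi0 = f(x)               # phi(0)  = f(x_k)
    dphi0 = grad(x) @ p       # phi'(0) = grad f(x_k)^T p_k, negative for descent
    x_new = x + alpha * p
    sufficient = f(x_new) <= phi0 + c1 * alpha * dphi0   # sufficient decrease
    curvature = grad(x_new) @ p >= c2 * dphi0            # curvature condition
    return sufficient and curvature

# Example: f(x) = x^2 at x_k = 1 with descent direction p_k = -1.
f = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([1.0])
p = np.array([-1.0])
print(satisfies_wolfe(f, grad, x, p, alpha=1.0))   # True: both conditions hold
print(satisfies_wolfe(f, grad, x, p, alpha=0.05))  # False: step too small, curvature fails
```

The too-small step α = 0.05 passes sufficient decrease but is rejected by the curvature condition, which is exactly its intended role.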
Now you may wonder
whether such a step length exists for every function.
We have a theorem to guarantee that
if the parameters c1 and c2 are chosen as specified,
with 0 < c1 < c2 < 1,
then such a step length always exists.
Mathematically speaking,
we assume the function to be continuously differentiable.
Moreover, we take pk to be a descent direction
at the current xk, and we require
that the function is bounded below along this ray.
That means that along this search direction,
the function has a finite infimum.
Otherwise, the problem is unbounded
and not well posed.
So the assumptions here are all reasonable.
If c1 and c2 satisfy 0 < c1 < c2 < 1,
then we can guarantee
that such step length intervals always exist.
So this is a very good and beautiful result,
which guarantees
that the Wolfe conditions can always be satisfied.
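As a quick sanity check of this existence result, take f(x) = x², which is bounded below along the ray from xk = 1 in direction pk = −1 (an illustrative choice, not an example from the lecture), so Φ(α) = (1 − α)². A grid scan finds a whole interval of acceptable α:

```python
import numpy as np

# phi(alpha) = f(x_k + alpha p_k) with f(x) = x^2, x_k = 1, p_k = -1,
# so phi(alpha) = (1 - alpha)^2 and phi'(alpha) = -2 (1 - alpha).
phi = lambda a: (1 - a) ** 2
dphi = lambda a: -2 * (1 - a)

c1, c2 = 1e-4, 0.9
alphas = np.linspace(0.01, 3.0, 300)
ok = [a for a in alphas
      if phi(a) <= phi(0) + c1 * a * dphi(0)   # sufficient decrease
      and dphi(a) >= c2 * dphi(0)]             # curvature
print(min(ok), max(ok))   # a nonempty interval, roughly [0.1, 2.0]
```

Solving the two inequalities by hand gives the exact interval: curvature requires α ≥ (1 − c2) = 0.1, while sufficient decrease requires α ≤ 2(1 − c1) ≈ 2, matching the scan.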
Now let's make a brief summary about the Wolfe conditions.
Basically, there are two conditions in the Wolfe conditions.
The first one will guarantee that we have sufficient decrease.
The second one rules out the very small α,
which prevents slow progress.
So in the next part,
we will discuss more details
about other sufficient decrease conditions.
So we stop here and see you next time.