Current course topic: Modern Optimization Methods (in English) > Chapter 3. Line Search Methods > 3.2 The Wolfe conditions
Hi, everyone.
Welcome to modern optimization methods.
Today we will introduce the second part of chapter three,
which is the Wolfe conditions.
Now let's first recall the line search strategy.
As we mentioned before,
the line search strategy actually takes two steps.
So at the current iterate,
we first choose a descent direction pk.
Then we determine a step length αk,
and we move to the next iterate.
The question is,
how can we choose a reasonably good step length αk?
As we demonstrated in the previous lecture,
choosing a good inexact step length
is essential.
The idea is to find a sufficient decrease condition
that guarantees the convergence of f to a local minimizer.
Now let's first introduce the first
of the Wolfe conditions,
which is the sufficient decrease condition.
So the sufficient decrease condition
is illustrated by this inequality here.
If we look at this inequality,
the left-hand side is the function value
at the next iterate,
and the right-hand side is a sum of two terms.
The first term is the current function value f(xk),
and the second term is
the inner product of the gradient of f at xk
with the direction pk,
multiplied by the step length α and a constant c1.
This constant controls how much decrease we demand of the step.
Notice that the second term
is negative, since pk is a descent direction.
So the condition requires the next function value to be
strictly smaller than the current one.
Now what does this inequality mean?
It means that the reduction
in the function f should be proportional
to both the step length α
and the directional derivative ∇f(xk)ᵀpk.
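Written out explicitly, the inequality described above is the standard sufficient decrease (Armijo) condition:

```latex
f(x_k + \alpha p_k) \;\le\; f(x_k) + c_1\, \alpha\, \nabla f(x_k)^{T} p_k,
\qquad c_1 \in (0, 1).
```

Since pk is a descent direction, the term \(\nabla f(x_k)^{T} p_k\) is negative, so the right-hand side sits strictly below f(xk) for every α > 0.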
Now let's illustrate it geometrically by a figure.
Before we present the figure,
let's first do some reformulation.
So what is the reformulation?
We transfer this inequality in terms of Φ(α)
and a line l(α). Now recall that Φ(α)
is a function of the step length.
If we take the step length to be zero,
then we are standing at the current iterate,
so Φ(0) is the same as f(xk).
Similarly, Φ(αk) is the next function value f(xk + αk pk).
Now if we differentiate Φ with respect to α
using the chain rule,
we get that Φ' at zero takes this value,
which is exactly the gradient of f at xk transposed times pk.
So this value is the slope of the tangent
of the function Φ at zero.
So try to remember this. Keep it in mind.
Now for Φ'(α) at a general α,
we have the expression like this:
Φ'(α) is the gradient of f
at xk + αpk multiplied by pk.
So this takes care of the left-hand side f(xk + αpk),
which we have transferred into Φ(α).
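Collecting these relations, with the standard definition of Φ as the objective restricted to the search ray:

```latex
\phi(\alpha) = f(x_k + \alpha p_k), \qquad
\phi(0) = f(x_k), \qquad
\phi'(\alpha) = \nabla f(x_k + \alpha p_k)^{T} p_k, \qquad
\phi'(0) = \nabla f(x_k)^{T} p_k.
```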
For the right-hand side,
if we take it as a function of α,
it is actually a linear function.
We denote this linear function by l,
so l(α) is the right-hand side.
The coefficient of α in l(α)
is what we call the slope.
The slope of l is c1 times ∇f(xk)ᵀpk,
so it is also negative.
However, the slope of l(α) is
a little different from the slope of Φ at zero,
because they differ by the factor c1.
So now, after presenting these relations,
we can transfer the condition into this form:
we require that Φ(α)
lies below the line l(α).
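In symbols, with l(α) denoting the right-hand side of the sufficient decrease inequality as a linear function of α:

```latex
\ell(\alpha) = f(x_k) + c_1\, \alpha\, \nabla f(x_k)^{T} p_k
            = \phi(0) + c_1\, \alpha\, \phi'(0),
\qquad\text{and the condition reads}\qquad
\phi(\alpha) \le \ell(\alpha).
```

Note that the slope of l is c1 Φ'(0), a less steep (closer to horizontal) line than the tangent of Φ at zero, since 0 < c1 < 1.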
So this is the figure we want to illustrate.
We still have the function curve about Φ(α) here.
And then we plot out the straight line l(α) in dotted.
You can see that if we look for the part where
Φ(α) lies below l(α),
then we get this interval, which we can accept,
and also this other interval, which we can accept as well.
The two intervals give us
the acceptable step lengths.
So actually, quite a wide range of step lengths
can satisfy the first condition.
However, you may notice that
there is an acceptable interval
very close to zero.
So in a very bad situation,
if we choose an α very close to zero,
it still satisfies the sufficient decrease condition.
In the practical update, however,
it means that we move forward only a tiny bit, just like this.
So it will slow down the iteration process.
Therefore, the next of the Wolfe conditions
will rule out such excessively small α.
And then we come to the second condition
called the curvature condition.
In mathematical form,
this condition is presented like this:
we require that the gradient at the next iterate transposed
times pk is greater than or equal to a constant c2
times the gradient at the current iterate transposed times pk.
Similarly, if we transfer this condition into the form of Φ,
then we are actually requiring that
the slope Φ'(αk) at the new point
is greater than
or equal to c2 times the slope Φ'(0).
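Explicitly, this is the standard curvature condition:

```latex
\nabla f(x_k + \alpha_k p_k)^{T} p_k \;\ge\; c_2\, \nabla f(x_k)^{T} p_k,
\qquad\text{equivalently}\qquad
\phi'(\alpha_k) \ge c_2\, \phi'(0),
```

with the constant c2 taken in (c1, 1). Since Φ'(0) < 0, this rules out points where Φ is still decreasing steeply, which is exactly what happens near α = 0.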
Let's present this in the figure as well.
Recall that we have translated the second condition
into the form of Φ,
and we are looking for points αk
whose slope is at least c2 times Φ'(0).
If we look at the figure, we can see two slopes.
The tangent at zero has slope Φ'(0),
and the slightly flatter line
has slope c2 Φ'(0), the right-hand side.
The α we are looking for must have a slope greater
than or equal to this flatter one. So finally,
we reach the acceptable intervals of α
for the second condition:
this interval here, up to here,
and another interval over there.
And another interval is over there.
So you can see that the second condition
rules out the very small values of α:
those α which are very close to zero
are not selected.
So taking the intersection of the two conditions
gives us the final acceptable intervals of α.
Any point lying in these intervals
is a candidate for our inexact line search.
We just pick one of them
as our inexact step length,
and then we move forward.
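The two checks can be sketched in code. This is a minimal illustration, not from the lecture; the function names and the test problem f(x) = x², xk = 1, pk = −1 are all illustrative choices:

```python
import numpy as np

def satisfies_wolfe(f, grad, x, p, alpha, c1=1e-4, c2=0.9):
    """Check both Wolfe conditions for a trial step length alpha."""
    phi0 = f(x)               # phi(0)  = f(x_k)
    dphi0 = grad(x) @ p       # phi'(0) = grad f(x_k)^T p_k, negative for descent
    x_new = x + alpha * p
    sufficient = f(x_new) <= phi0 + c1 * alpha * dphi0   # sufficient decrease
    curvature = grad(x_new) @ p >= c2 * dphi0            # curvature condition
    return sufficient and curvature

# Example: f(x) = x^2 at x_k = 1 with descent direction p_k = -1.
f = lambda x: float(x @ x)
grad = lambda x: 2 * x
x = np.array([1.0])
p = np.array([-1.0])
print(satisfies_wolfe(f, grad, x, p, alpha=1.0))   # True: both conditions hold
print(satisfies_wolfe(f, grad, x, p, alpha=0.05))  # False: step too small, curvature fails
```

The too-small step α = 0.05 passes sufficient decrease but is rejected by the curvature condition, which is exactly its intended role.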
Now you may wonder
whether such a step length exists for every function.
We have a theorem to guarantee that
if the parameters c1 and c2 are chosen as specified,
with 0 < c1 < c2 < 1,
then such a step length always exists.
Mathematically speaking,
we assume the function to be continuously differentiable.
Moreover, we take pk to be a descent direction
at the current xk, and we require
that the function is bounded below along this ray.
That means that along this search direction,
the function has a finite infimum.
Otherwise, the problem is unbounded
and not well posed.
So the assumptions here are all reasonable.
If c1 and c2 satisfy 0 < c1 < c2 < 1,
then we can guarantee
that such step length intervals always exist.
So this is a very good and beautiful result,
which guarantees
that the Wolfe conditions can always be satisfied.
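As a quick sanity check of this existence result, take f(x) = x², which is bounded below along the ray from xk = 1 in direction pk = −1 (an illustrative choice, not an example from the lecture), so Φ(α) = (1 − α)². A grid scan finds a whole interval of acceptable α:

```python
import numpy as np

# phi(alpha) = f(x_k + alpha p_k) with f(x) = x^2, x_k = 1, p_k = -1,
# so phi(alpha) = (1 - alpha)^2 and phi'(alpha) = -2 (1 - alpha).
phi = lambda a: (1 - a) ** 2
dphi = lambda a: -2 * (1 - a)

c1, c2 = 1e-4, 0.9
alphas = np.linspace(0.01, 3.0, 300)
ok = [a for a in alphas
      if phi(a) <= phi(0) + c1 * a * dphi(0)   # sufficient decrease
      and dphi(a) >= c2 * dphi(0)]             # curvature
print(min(ok), max(ok))   # a nonempty interval, roughly [0.1, 2.0]
```

Solving the two inequalities by hand gives the exact interval: curvature requires α ≥ (1 − c2) = 0.1, while sufficient decrease requires α ≤ 2(1 − c1) ≈ 2, matching the scan.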
Now let's make a brief summary about the Wolfe conditions.
Basically, there are two conditions in the Wolfe conditions.
The first one will guarantee that we have sufficient decrease.
The second one rules out the very small α,
which prevents slow progress.
So in the next part,
we will discuss more details
about other sufficient decrease conditions.
So we stop here and see you next time.