Current course topic: Modern Optimization Methods (English) > Chapter 2. Fundamentals of Unconstrained Optimization > 2.5 Search Direction Ⅱ
Hello, everyone.
Welcome to modern optimization methods.
In this part,
we will introduce the fifth part of chapter two,
the search directions.
This part covers three kinds of directions:
Newton's direction, quasi-Newton directions,
and conjugate gradient directions.
Now firstly, let's talk about Newton's direction.
So Newton's direction is
given by the following formula.
We write it as pkN,
where the superscript N stands for Newton.
It is obtained by taking the inverse of the Hessian matrix at xk,
multiplying it by the gradient of f at xk,
and then taking the negative of the result:
pkN = −[∇²f(xk)]⁻¹ ∇f(xk).
So this is Newton's direction.
It seems that the calculation is a bit complicated.
So let's first introduce the intuition
why we can have this formula.
The intuition comes from the Taylor expansion of f(xk + p) at xk.
We expand the function value of f to second order,
so we basically approximate the function f
by a quadratic function.
If we take p as our variable and expand,
then the first term is f(xk) as before,
the second term is the gradient term ∇f(xk)ᵀp,
and the third term is the Hessian term ½pᵀ∇²f(xk)p.
So we write the right-hand side
of this expansion as mk(p):
mk(p) = f(xk) + ∇f(xk)ᵀp + ½pᵀ∇²f(xk)p.
It is a function of the direction p that we are looking for.
So if we minimize this mk(p) over p,
then we will get Newton's direction.
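As a small illustration (my own sketch, not from the lecture slides), the code below computes the Newton direction with NumPy. For a quadratic f(x) = ½xᵀQx − bᵀx the model mk(p) is f itself, so a single unit Newton step from any starting point lands exactly on the minimizer Q⁻¹b. The data and function names here are illustrative choices.

```python
import numpy as np

# Illustrative sketch: the Newton direction p_k^N = -[Hess f(x_k)]^{-1} grad f(x_k)
# for the quadratic f(x) = 0.5 x^T Q x - b^T x. For a quadratic, the model
# m_k(p) equals f, so one unit Newton step is exact.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # positive definite Hessian
b = np.array([1.0, 2.0])

def grad(x):
    return Q @ x - b                # gradient of f at x

def hess(x):
    return Q                        # Hessian is constant for a quadratic

xk = np.array([5.0, -7.0])          # arbitrary starting point
pk = -np.linalg.solve(hess(xk), grad(xk))   # Newton direction (solve, don't invert)
x_next = xk + pk                    # unit step length

# x_next coincides with the exact minimizer Q^{-1} b
```

Note the use of `np.linalg.solve` rather than forming the inverse explicitly; solving the linear system is the standard, cheaper, and numerically safer way to apply the inverse Hessian.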
So if I draw a picture geometrically,
it will give you the following result.
So we assume that we got a function here.
This is f.
And we assume we get xk here.
Then basically Newton's direction says,
we start at xk and we approximate this function curve
by a quadratic function.
So this is the quadratic function.
And xk is here.
So instead of minimizing f(x),
we minimize this quadratic function.
Its minimizer gives us
a new point,
x(k+1), here.
So the direction from xk to x(k+1)
will be Newton's direction.
So this gives us the intuition of Newton's direction.
Geometrically, it means that we approximate the function
at the point xk by a quadratic function,
and we solve this quadratic model for its minimizer.
The direction from xk to x(k+1)
is then Newton's direction.
Now we have several remarks about Newton's direction.
As we can see, we approximate this function
by a quadratic one in a neighborhood of xk.
So if this approximation is used within a small region,
then it is very reliable.
This corresponds to the first remark:
Newton's direction is reliable when the curve of f
and the quadratic curve
approximate each other very well.
The second remark basically means that
Newton's direction is a descent direction under some conditions.
So recall that we mentioned
what a descent direction is in the previous class.
It says that a direction p is a descent direction
if its inner product with the gradient
is strictly smaller than zero, that is, pᵀ∇f(xk) < 0.
If we take the inner product of Newton's direction with the gradient,
then under some conditions,
this inner product is smaller than zero.
So what is the condition?
So we have several conditions, actually.
Firstly, the Hessian matrix must be invertible,
so that Newton's direction exists.
Secondly, the Hessian matrix must be positive definite,
which guarantees the descent property.
So these are the two assumptions that
make Newton's direction a descent direction.
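As a quick numerical check (my own example, not the lecture's), with a symmetric positive definite Hessian the Newton direction indeed makes a strictly negative inner product with the gradient:

```python
import numpy as np

# Illustrative check: when the Hessian H is symmetric positive definite,
# the Newton direction p = -H^{-1} g satisfies p^T g = -g^T H^{-1} g < 0,
# so it is a descent direction.
H = np.array([[2.0, 0.5],
              [0.5, 3.0]])    # symmetric positive definite
g = np.array([1.0, -1.0])     # a nonzero gradient

p = -np.linalg.solve(H, g)    # Newton direction
inner = p @ g                 # equals -g^T H^{-1} g, strictly negative
```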
The third remark is that in this case,
we actually take the step length to be unit step length.
And we can also modify it to a nonunit step length,
meaning that we can add a line search strategy.
We will talk about the details later in chapter three.
Now, we can show that
quadratic convergence actually holds.
So what is quadratic convergence?
We will introduce this in the last part of chapter two.
However, a disadvantage of Newton's direction
is that it may have a heavy computational cost.
Why?
Because it involves calculating the Hessian matrix.
Recall that the Hessian is an n-by-n matrix,
meaning that you have to calculate
n² elements of this matrix.
So the computational cost involved can be very heavy.
Now let's take a look at the quasi-Newton directions.
Given the fact that
Newton's direction has quite strict requirements,
for example, it requires that the inverse of the Hessian matrix exists,
and also that
the Hessian matrix is positive definite,
can we modify the method a little bit
to get around these conditions?
This is where quasi-Newton methods come in.
In quasi-Newton methods,
we do not use the Hessian matrix directly.
Instead, we use the matrix Bk
to approximate the Hessian matrix.
This gives the quasi-Newton direction
pk = −Bk⁻¹ ∇f(xk),
where we replace the Hessian matrix
by the approximating matrix Bk.
So the advantages of
quasi-Newton methods are obvious.
Firstly, they do not require the
calculation of the Hessian matrix,
because we use Bk instead.
Meanwhile,
they still keep a superlinear convergence rate,
which is fast.
Finally, you may ask: how can we obtain this Bk?
In what way
can we approximate the Hessian matrix well?
This is the key question
in quasi-Newton methods.
So the main idea has two parts.
The first is that
we approximate the Hessian matrix by Bk.
The second is how
we can get the next approximation,
that is, B(k+1), from the current Bk.
This is very important.
In this page,
we will mainly discuss
how we can derive B(k+1) based on Bk.
So actually B(k+1) has to satisfy some conditions.
The first condition is
the following secant condition:
B(k+1) sk = yk.
So what are sk and yk?
sk is the difference of two successive iterates, sk = x(k+1) − xk,
and yk is the difference of the corresponding gradients, yk = ∇f(x(k+1)) − ∇f(xk).
So the secant equation basically says that
multiplying B(k+1) by the direction sk
must give exactly yk.
This is one condition that
we can use to determine the next B(k+1).
The other condition is that B(k+1) must be symmetric.
And also we must make sure that
when we generate B(k+1) from Bk,
the computational cost is small,
which we achieve by requiring that the difference
between the two matrices is of low rank.
In other words,
we make only a small modification to Bk to get B(k+1).
Based on these rules,
we propose a class of quasi-Newton directions.
Notice that the above three conditions
do not lead to a unique B(k+1);
they lead to a class of B(k+1).
That is why we speak of quasi-Newton
methods, in the plural, rather than a single method.
So there are two most popular updating formulae,
which I list here.
The first one is called
the symmetric rank-one update, SR1 for short.
It modifies Bk by a rank-one term
to reach B(k+1):
B(k+1) = Bk + (yk − Bk sk)(yk − Bk sk)ᵀ / [(yk − Bk sk)ᵀ sk].
If we take a look at the second term here,
it is actually a rank-one matrix.
Why?
Because if we take yk − Bk sk as a vector,
then this vector multiplied by its own transpose
becomes a rank-one matrix,
and the denominator here is just a scalar.
So altogether, this update of Bk is a rank-one update.
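A small sketch of the SR1 update (the data here is my own illustrative choice), checking that the new matrix satisfies the secant equation, stays symmetric, and differs from Bk by a rank-one matrix:

```python
import numpy as np

def sr1_update(B, s, y):
    """Symmetric rank-one (SR1) update: B+ = B + r r^T / (r^T s), r = y - B s."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)   # assumes r^T s != 0

# Illustrative data (my own choice); with B0 = I, r^T s = -1 here, so SR1 is defined.
B0 = np.eye(3)
s = np.array([1.0, 0.0, 2.0])     # s_k = x_{k+1} - x_k
y = np.array([2.0, 1.0, 1.0])     # y_k = grad f(x_{k+1}) - grad f(x_k)

B1 = sr1_update(B0, s, y)
# B1 satisfies the secant equation B1 s = y, is symmetric,
# and B1 - B0 has rank one.
```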
Let's take a look at the
second update formula, the BFGS formula.
It was originally proposed by Broyden,
Fletcher,
Goldfarb and Shanno.
It is actually a rank-two modification:
B(k+1) = Bk − (Bk sk)(Bk sk)ᵀ / (skᵀ Bk sk) + yk ykᵀ / (ykᵀ sk).
So why is this modification rank two?
If we take a look at this formula,
we find there are two terms added after Bk.
Each term is actually a rank-one matrix,
so it is a rank-two modification.
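A matching sketch of the BFGS update (again with illustrative data of my own), verifying the secant equation and that the change to Bk has rank two:

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update: B+ = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Illustrative data (my own choice) with y^T s = 4 > 0,
# which keeps B1 positive definite when B0 is.
B0 = np.eye(3)
s = np.array([1.0, 0.0, 2.0])
y = np.array([2.0, 1.0, 1.0])

B1 = bfgs_update(B0, s, y)
# B1 satisfies B1 s = y, is symmetric, and B1 - B0 has rank two.
```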
Now, let's talk about the inverse of Bk.
Recall that in the quasi-Newton direction,
Bk is the approximation of the Hessian matrix,
so to compute the direction,
we actually need the inverse of Bk.
Therefore, is it possible to
derive an update formula directly for the inverse of Bk?
The answer is yes.
We use Hk to denote the inverse of Bk.
Then we can get the
following corresponding update formulae for Hk.
So Hk is the inverse of Bk, and we can similarly
derive the two updating formulae for it.
For SR1, H(k+1) is again obtained by a rank-one modification.
And for the BFGS formula,
the update for Hk looks a little more complicated;
however, it is still a rank-two update.
So this is the update formula for quasi-Newton methods.
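As a sketch of why updating Hk = Bk⁻¹ directly works, the code below (my own illustrative check) applies the standard inverse-BFGS formula H+ = (I − ρ s yᵀ) H (I − ρ y sᵀ) + ρ s sᵀ with ρ = 1/(yᵀs), and confirms that it matches the inverse of the BFGS-updated B:

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update of the Hessian approximation B."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

def bfgs_inverse_update(H, s, y):
    """Rank-two BFGS update applied directly to H = B^{-1}:
    H+ = (I - rho*s*y^T) H (I - rho*y*s^T) + rho*s*s^T, rho = 1/(y^T s)."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Illustrative data (my own choice) with y^T s > 0
B0 = np.eye(3)
H0 = np.eye(3)                    # H0 = B0^{-1}
s = np.array([1.0, 0.0, 2.0])
y = np.array([2.0, 1.0, 1.0])

H1 = bfgs_inverse_update(H0, s, y)
B1 = bfgs_update(B0, s, y)
# H1 equals B1^{-1}, so no matrix inversion is ever needed in practice.
```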
Now let's take a look at the third direction,
conjugate gradient direction.
From the name of the conjugate gradient direction,
you can see that it is related to the gradient,
and there is also some conjugacy involved.
Conjugacy will be introduced in chapter five,
so we do not introduce the notion here.
I just give the formula for the conjugate gradient direction:
pk = −∇f(xk) + βk p(k−1).
It has the form of a sum of two vectors.
The first vector is the steepest descent direction,
which is the negative gradient.
The second vector is a multiple of p(k−1),
which is the previous search direction.
So this is the update for conjugate gradient direction.
It is a kind of combination
of the previous search direction
and the current gradient direction.
What is the intuition for conjugate gradient direction?
You may ask this question.
Actually, conjugate gradient directions
were originally proposed for solving linear systems
Ax = b, where A is symmetric positive definite.
And actually, we can show that
solving this linear system is equivalent
to solving a quadratic optimization problem,
namely minimizing φ(x) = ½xᵀAx − bᵀx.
So this is the connection between solving linear system
as well as solving optimization problem.
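The following sketch (my own minimal implementation, with notation close to the lecture's) runs the linear conjugate gradient iteration on a small system Ax = b; each new direction is the current negative gradient (the residual) plus a multiple βk of the previous direction:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Linear CG for A x = b with A symmetric positive definite.
    Minimizes phi(x) = 0.5 x^T A x - b^T x; r = b - A x = -grad phi(x)."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()                          # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)        # exact step length along p
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)  # coefficient beta_k
        p = r_new + beta * p              # new direction: -gradient + beta * previous
        r = r_new
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)              # solves A x = b
```

Note that the loop stores only a few vectors and never forms a matrix beyond the products with A, which is exactly the cheap-storage advantage discussed next.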
Now, what is the benefit
of the conjugate gradient directions?
As we can see from the formula,
it only involves the gradient
and the previous direction.
So the computational cost is very cheap.
This is one advantage.
On the other hand,
it needs less storage,
because it only involves vectors, not matrices.
Moreover, it is much more effective
than the steepest descent direction:
compared with the steepest descent method,
it is more efficient.
Actually, it is a very powerful direction
to solve the optimization problems.
We will demonstrate more details in chapter five.
Now let's make a brief summary about this part.
In this part,
we mainly talk about three search directions.
All of them are descent directions under some conditions.
So the first one is Newton's direction.
The second one is quasi-Newton direction.
And finally the conjugate gradient direction.
So together with the steepest descent direction,
these are the search directions
used under the line search strategy.
So in the next part,
we will introduce the trust region strategy,
which is another way to update the next iterate.
We stop here and see you next time.