Current course topic: Modern Optimization Methods (in English) > Chapter 2. Fundamentals of Unconstrained Optimization > 2.4 Line search strategy
Hi, everyone.
Welcome to modern optimization methods.
In this part,
we will look at 2.4,
the fourth part of chapter two,
line search strategy.
In this part,
we will mainly introduce the overview of algorithms
and the line search strategy.
Now let's take a brief look
at numerical algorithms.
As we mentioned in chapter one,
a numerical algorithm actually starts
from an initial point x0
and then iterates
until it reaches some stopping criterion.
So it will generate a sequence {xk}.
And the stopping criterion may be set as follows.
If there is no progress,
then we will stop the iteration.
Or if we reach an approximate solution,
then we will also stop the iteration.
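The two stopping criteria above can be sketched in code. This is a minimal illustration, not the course's specific test; the function name `should_stop` and the tolerance `tol` are my own assumptions.

```python
import numpy as np

def should_stop(x_new, x_old, grad, tol=1e-6):
    """Hypothetical stopping test combining the two criteria above:
    stop when there is no progress between iterates, or when the
    current iterate is an approximate solution (small gradient)."""
    no_progress = np.linalg.norm(x_new - x_old) < tol   # no progress
    approx_solution = np.linalg.norm(grad) < tol        # approximate stationary point
    return no_progress or approx_solution
```

In practice an algorithm checks such a test at the end of every iteration and also caps the total number of iterations.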
Now,
as we mentioned before,
the key point is
how we move from the current iterate xk
to the next iterate x(k+1).
As long as we figure out this process,
then we can design numerical algorithms.
Now naturally
there are two ways to generate the next iterate.
We will detail this in the next slide.
Now the current information we may use
includes the current function value f(xk),
or even earlier function values,
from f(x(k-1)) back to f(x0).
Also,
if the function is differentiable,
we may also use the gradient information.
The main aim in this case is that
we try to reach a smaller function value
than the current iterate.
So this is our objective.
Now let's take a look at
how we move from the current iterate
to the next iterate.
There are basically two strategies.
The first one is the line search strategy.
The second one is the trust region strategy.
The two strategies use different ideas
to generate the next iterate.
Now in this part,
we first take a look at line search strategy.
For line search methods,
the main idea is like this.
We stand at the current iterate.
And we choose a direction,
for example,
we choose the direction here.
Then we move along this direction
at a step size αk.
So this will be the next iterate.
Mathematically speaking,
the next iterate takes the following form:
x(k+1) = xk + αk·pk.
Now there are several definitions.
The pk which we choose
is called the direction.
So this is the first step
we choose the direction pk.
After we get this pk
then we need to choose a step size αk.
Basically speaking,
the main idea of the line search strategy
is divided into two steps.
The first one is
we choose the direction.
The second one is
we choose a step length.
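The two steps above can be sketched as a generic iteration. This is a minimal sketch assuming the steepest descent direction for pk and a fixed step size as a placeholder for a real step-length rule; the function name and parameters are my own.

```python
import numpy as np

def line_search_method(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Generic line search iteration x_{k+1} = x_k + alpha_k * p_k.
    Here p_k is the steepest descent direction -grad(x_k), and
    alpha_k is a fixed step size (a stand-in for an exact or
    inexact line search)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        p = -grad(x)                 # step 1: choose the direction p_k
        if np.linalg.norm(p) < tol:  # stop at an approximate solution
            break
        x = x + alpha * p            # step 2: move with step length alpha_k
    return x

# minimize f(x) = x1^2 + x2^2; gradient is 2x, minimizer at the origin
x_star = line_search_method(lambda x: 2 * x, [1.0, -1.0])
```

On this simple quadratic the iterates shrink geometrically toward the origin; the rest of this section is about choosing αk and pk more carefully.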
Now let's take a further look
at the step length.
How do we determine the step length?
In other words,
how far do we move along this direction?
So the idea is that
we try to minimize
the one-dimensional function Φ(α) = f(xk + α·pk).
By minimizing Φ(α),
we reach a smaller function value along pk,
and the returned minimizer α is taken as the step size.
So we have two ways to find this step length.
The first way is that
we do exact line search.
That means
we solve this minimization problem
to reach the global minimizer.
Then this global minimizer
will be the exact line search step.
Another way is the inexact line search,
which means that
the step length may not be necessarily
the global minimizer of the problem.
However, we just require that
the step length achieves an adequate, sufficient decrease
in terms of the function value.
So there are different concerns
for the two kinds of step size.
For the exact line search,
the benefit is that
we can reach a global minimizer of this Φ(α).
In other words,
we get the largest possible decrease in function value
along the direction pk.
However,
to achieve this goal,
we need a lot of function evaluations,
because we need to find the global minimizer.
So it may involve a lot of computational cost
and a lot of inner iterations.
Another way is the inexact line search,
where the decrease in function value
is only required to be an adequate reduction.
As long as there is some adequate reduction
in the function value, the step is accepted.
Such an αk can be obtained at a much lower cost in function evaluations.
In other words,
the inexact line search can save some computational cost.
Meanwhile,
it can reach some adequate function reduction.
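One common inexact line search is backtracking with a sufficient-decrease (Armijo) test. The sketch below is an illustration of the idea, not necessarily the specific rule used later in this course; the parameter names `rho` and `c` are conventional choices.

```python
import numpy as np

def backtracking(f, grad_fk, xk, pk, alpha0=1.0, rho=0.5, c=1e-4):
    """Inexact line search: shrink alpha until the sufficient-decrease
    (Armijo) condition  f(xk + a*pk) <= f(xk) + c*a*(grad_fk^T pk)
    holds. Only a few function evaluations are needed, in contrast
    to an exact line search."""
    alpha = alpha0
    fk = f(xk)
    slope = grad_fk @ pk           # negative for a descent direction
    while f(xk + alpha * pk) > fk + c * alpha * slope:
        alpha *= rho               # backtrack: reduce the step
    return alpha

# f(x) = x^2 at xk = 1 with pk = -grad f(xk) = -2
xk = np.array([1.0])
pk = np.array([-2.0])
alpha_k = backtracking(lambda x: float(x @ x), np.array([2.0]), xk, pk)
```

Here the full step α = 1 overshoots, so one backtracking halving gives α = 0.5, which happens to land exactly on the minimizer of this one-dimensional quadratic.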
Now in terms of line search methods,
there are different ways to determine the search direction pk.
So here I list some of the different ways
to determine the directions.
For example,
if the direction is determined
by steepest descent direction,
then the resulting method is called
steepest descent method.
Similarly,
we can also determine the direction
by Newton's direction
or by quasi-Newton directions
or by conjugate gradient directions.
So there are different ways
to determine the directions.
Now let's first take a look
at the steepest descent direction.
So from the name,
you can see that
the steepest descent direction
actually reduces the function value
in a very fast way.
So let's take a look at what is its form.
In this case,
the direction pk is taken as the negative of the gradient.
In other words,
it is the negative gradient direction, pk = −∇f(xk).
Let's recall that
the gradient of a function
points in the direction along which
the function value grows fastest.
So if we take the negative of the gradient,
it points in the direction along which
the function value decreases fastest.
So recall that
we are trying to minimize a function.
So we try to find the fastest way that
function value decreases.
So this direction actually gives us
the steepest descent direction.
Now we have a simple result,
which explains
why this direction is called steepest descent direction.
So the result basically says that
among all the directions,
the steepest descent direction is the one
along which the function value decreases most rapidly.
So this is a result.
We put it as a theorem.
However, here we do not give the proof.
I leave the proof to you as homework.
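While the proof is left as homework, the claim is easy to check numerically: among unit directions p, the directional derivative ∇f(xk)ᵀp is smallest (most negative) at the normalized negative gradient. This is only a sanity check under an example gradient of my choosing, not a proof.

```python
import numpy as np

# The instantaneous rate of change of f at x along a unit direction p
# is grad(x)^T p; by Cauchy-Schwarz it is minimized (equals -||grad||)
# at p = -grad / ||grad||, the steepest descent direction.
rng = np.random.default_rng(0)
g = np.array([3.0, -4.0])                  # a sample gradient, ||g|| = 5
p_sd = -g / np.linalg.norm(g)              # steepest descent direction
best = g @ p_sd                            # equals -||g|| = -5
for _ in range(1000):
    p = rng.standard_normal(2)
    p /= np.linalg.norm(p)                 # random unit direction
    assert g @ p >= best - 1e-12           # never steeper than -||g||
```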
After giving the steepest descent direction,
let's take a look at a graph to show
how steepest descent direction works.
Now look at this figure.
We have some curves,
or some circles, here.
Recall that
these circles represent the contours of the function.
So suppose we are standing at xk here
and we try to choose a steepest descent direction.
So it is actually orthogonal
to the tangent line here.
So this will be the
steepest descent direction
pointing to the minimizer,
pointing inside.
So the steepest descent direction is actually chosen this way.
And then if we move along pk
at some step size αk
then we jump to a point x(k+1).
Then at the next iterate,
we still choose the steepest descent way
and the iteration goes on.
So this actually shows that
the steepest descent direction is orthogonal
to the contours of f;
for circular contours like these, it even points directly at the minimizer.
So for the steepest descent direction,
we have some remarks about this.
The first remark is about
the advantage of this direction.
As you can see,
it requires only the gradient information,
which means that
the computational cost is not that expensive.
So we just need the gradient.
The disadvantage is that
it may be extremely slow on some particular examples.
In some very bad situations,
if you move along the steepest descent directions,
the convergence rate will be very slow.
We will discuss the convergence rate
later in chapter three.
A further extension is:
what if we choose other directions,
not the steepest descent direction?
What kinds of directions can we choose?
So we give a definition here that
any direction pk satisfying the condition here
will be a descent direction.
So what is this condition?
This condition basically means that
the inner product of pk with the gradient of f at xk,
pkᵀ∇f(xk), is strictly smaller than zero.
In this case,
the direction pk will be a descent direction.
So why, under such a condition,
will it be a descent direction?
We use Taylor's expansion here.
We move from xk along pk
with a very small step size ε,
and obtain the Taylor expansion f(xk + ε·pk) = f(xk) + ε·pkᵀ∇f(xk) + O(ε²).
And we can see that
when this ε is small enough,
as long as
the term pkᵀ∇f(xk) is negative,
the function value of f at xk + ε·pk
will be strictly smaller than f(xk).
So this somehow explains the intuition
why pk satisfying this condition
will be a descent direction.
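This intuition can be verified numerically. A small sketch, assuming the example function f(x) = x1² + x2² and a candidate direction of my own choosing:

```python
import numpy as np

f = lambda x: x @ x                 # f(x) = x1^2 + x2^2
x = np.array([1.0, 2.0])
grad = 2 * x                        # gradient of f at x
p = np.array([-1.0, 0.0])           # a candidate direction
assert p @ grad < 0                 # descent condition p^T grad f(x) < 0
eps = 1e-3                          # a very small step along p
assert f(x + eps * p) < f(x)        # function value strictly decreases
```

Any direction with a negative inner product against the gradient passes the same test, which is exactly why the class of descent directions is much larger than the steepest descent direction alone.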
So after we give the descent directions here,
then the next question is:
how can we choose other descent directions
in order to get around the slow convergence
of the steepest descent direction?
We will talk about this later
in Newton's direction,
quasi-Newton direction
and also conjugate gradient directions.
Now for this part,
let's make a brief summary.
In this part,
we have introduced the line search strategy.
So there are two steps
in line search strategy.
The first is the step length:
how do we determine a step length?
The second is the search direction:
how do we determine the search direction?
Here we only introduce the first search direction,
the steepest descent direction.
And for other directions
we will talk about more later in the next part.
Until then,
see you next time.