Current course topic: Modern Optimization Methods (in English) > Chapter 2. Fundamentals of Unconstrained Optimization > 2.4 Line search strategy
Hi, everyone.
Welcome to modern optimization methods.
In this part,
we will look at 2.4,
the fourth part of chapter two,
line search strategy.
In this part,
we will mainly introduce the overview of algorithms
and the line search strategy.
Now let's take a brief look
at numerical algorithms.
As we mentioned in chapter one,
a numerical algorithm actually starts
from an initial point x0
and then iterates
until it reaches some stopping criterion.
So it will generate a sequence {xk}.
And the stopping criterion may be set as follows.
If there is no progress,
then we will stop the iteration.
Or if we reach an approximate solution,
then we will also stop the iteration.
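The two stopping criteria above can be sketched in code. This is a minimal illustration, not the course's specific test; the function name `should_stop` and the tolerance `tol` are my own assumptions.

```python
import numpy as np

def should_stop(x_new, x_old, grad, tol=1e-6):
    """Hypothetical stopping test combining the two criteria above:
    stop when there is no progress between iterates, or when the
    current iterate is an approximate solution (small gradient)."""
    no_progress = np.linalg.norm(x_new - x_old) < tol   # no progress
    approx_solution = np.linalg.norm(grad) < tol        # approximate stationary point
    return no_progress or approx_solution
```

In practice an algorithm checks such a test at the end of every iteration and also caps the total number of iterations.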
Now,
as we mentioned before,
the key point is
how we move from the current iterate xk
to the next iterate x(k+1).
As long as we figure out this process,
then we can design numerical algorithms.
Now naturally
there are two ways to generate the next iterate.
We will detail this in the next slide.
Now the current information we may use
includes the current function value f(xk),
or even earlier function values,
from f(x(k-1)) back to f(x0).
Also,
if the function is differentiable,
we may also use the gradient information.
The main aim in this case is that
we try to reach a smaller function value
than the current iterate.
So this is our objective.
Now let's take a look at
how we move from the current iterate
to the next iterate.
There are basically two strategies.
The first one is the line search strategy.
The second one is the trust region strategy.
The two strategies use different ideas
to generate the next iterate.
Now in this part,
we first take a look at line search strategy.
For line search methods,
the main idea is like this.
We stand at the current iterate.
And we choose a direction,
for example,
we choose the direction here.
Then we move along this direction
at a step size αk.
So this will be the next iterate.
Mathematically speaking,
the next iterate takes the following form:
x(k+1) = xk + αk·pk.
Now there are several definitions.
The pk which we choose
is called the direction.
So this is the first step
we choose the direction pk.
After we get this pk
then we need to choose a step size αk.
Basically speaking,
the main idea of the line search strategy
is divided into two steps.
The first one is
we choose the direction.
The second one is
we choose a step length.
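The two steps above can be sketched as a generic iteration. This is a minimal sketch assuming the steepest descent direction for pk and a fixed step size as a placeholder for a real step-length rule; the function name and parameters are my own.

```python
import numpy as np

def line_search_method(grad, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    """Generic line search iteration x_{k+1} = x_k + alpha_k * p_k.
    Here p_k is the steepest descent direction -grad(x_k), and
    alpha_k is a fixed step size (a stand-in for an exact or
    inexact line search)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        p = -grad(x)                 # step 1: choose the direction p_k
        if np.linalg.norm(p) < tol:  # stop at an approximate solution
            break
        x = x + alpha * p            # step 2: move with step length alpha_k
    return x

# minimize f(x) = x1^2 + x2^2; gradient is 2x, minimizer at the origin
x_star = line_search_method(lambda x: 2 * x, [1.0, -1.0])
```

On this simple quadratic the iterates shrink geometrically toward the origin; the rest of this section is about choosing αk and pk more carefully.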
Now let's take a further look
at the step length.
How do we determine the step length?
In other words,
how far do we move along this direction?
So the idea is that
we try to minimize
the one-dimensional function Φ(α) = f(xk + α·pk).
By minimizing Φ(α),
we reach a smaller function value along pk,
and the returned minimizer α is taken as the step size.
So we have two ways to find this step length.
The first way is that
we do exact line search.
That means
we solve this minimization problem
to reach the global minimizer.
Then this global minimizer
will be the exact line search step.
Another way is the inexact line search,
which means that
the step length may not be necessarily
the global minimizer of the problem.
However, we just require that
the step length achieves an adequate, sufficient decrease
in terms of the function value.
So there are different concerns
for the two kinds of step size.
For the exact line search,
the benefit is that
we can reach a global minimizer of this Φ(α).
In other words,
we get the largest possible decrease in function value
along the direction pk.
However,
to achieve this goal,
we need a lot of function evaluations,
because we need to find the global minimizer.
So it may involve a lot of computational cost
and a lot of inner iterations.
Another way is the inexact line search,
where the decrease in function value
is only required to be an adequate reduction.
As long as there is some adequate reduction
in the function value, the step is accepted.
Such an αk can be obtained at a much lower cost in function evaluations.
In other words,
the inexact line search can save some computational cost.
Meanwhile,
it can reach some adequate function reduction.
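One common inexact line search is backtracking with a sufficient-decrease (Armijo) test. The sketch below is an illustration of the idea, not necessarily the specific rule used later in this course; the parameter names `rho` and `c` are conventional choices.

```python
import numpy as np

def backtracking(f, grad_fk, xk, pk, alpha0=1.0, rho=0.5, c=1e-4):
    """Inexact line search: shrink alpha until the sufficient-decrease
    (Armijo) condition  f(xk + a*pk) <= f(xk) + c*a*(grad_fk^T pk)
    holds. Only a few function evaluations are needed, in contrast
    to an exact line search."""
    alpha = alpha0
    fk = f(xk)
    slope = grad_fk @ pk           # negative for a descent direction
    while f(xk + alpha * pk) > fk + c * alpha * slope:
        alpha *= rho               # backtrack: reduce the step
    return alpha

# f(x) = x^2 at xk = 1 with pk = -grad f(xk) = -2
xk = np.array([1.0])
pk = np.array([-2.0])
alpha_k = backtracking(lambda x: float(x @ x), np.array([2.0]), xk, pk)
```

Here the full step α = 1 overshoots, so one backtracking halving gives α = 0.5, which happens to land exactly on the minimizer of this one-dimensional quadratic.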
Now in terms of line search methods,
there are different ways to determine the search direction pk.
So here I list some of the different ways
to determine the directions.
For example,
if the direction is determined
by steepest descent direction,
then the resulting method is called
steepest descent method.
Similarly,
we can also determine the direction
by Newton's direction
or by quasi-Newton directions
or by conjugate gradient directions.
So there are different ways
to determine the directions.
Now let's first take a look
at the steepest descent direction.
So from the name,
you can see that
the steepest descent direction
actually reduces the function value
in a very fast way.
So let's take a look at what is its form.
In this case,
the direction pk is taken as the negative of the gradient.
In other words,
it is the negative gradient direction, pk = −∇f(xk).
Let's recall that
the gradient of a function
points in the direction along which
the function value grows fastest.
So if we take the negative of the gradient,
it points in the direction along which
the function value decreases fastest.
So recall that
we are trying to minimize a function.
So we try to find the fastest way that
function value decreases.
So this direction actually gives us
the steepest descent direction.
Now we have a simple result,
which explains
why this direction is called steepest descent direction.
So the result basically says that
among all the directions,
the steepest descent direction is the one
along which the function value decreases most rapidly.
So this is a result.
We put it as a theorem.
However, here we do not give the proof.
I leave the proof to you as homework.
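While the proof is left as homework, the claim is easy to check numerically: among unit directions p, the directional derivative ∇f(xk)ᵀp is smallest (most negative) at the normalized negative gradient. This is only a sanity check under an example gradient of my choosing, not a proof.

```python
import numpy as np

# The instantaneous rate of change of f at x along a unit direction p
# is grad(x)^T p; by Cauchy-Schwarz it is minimized (equals -||grad||)
# at p = -grad / ||grad||, the steepest descent direction.
rng = np.random.default_rng(0)
g = np.array([3.0, -4.0])                  # a sample gradient, ||g|| = 5
p_sd = -g / np.linalg.norm(g)              # steepest descent direction
best = g @ p_sd                            # equals -||g|| = -5
for _ in range(1000):
    p = rng.standard_normal(2)
    p /= np.linalg.norm(p)                 # random unit direction
    assert g @ p >= best - 1e-12           # never steeper than -||g||
```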
After giving the steepest descent direction,
let's take a look at a graph to show
how steepest descent direction works.
Now look at this figure.
We have some curves,
or some circles, here.
Recall that
these circles represent the contours of the function.
So suppose we are standing at xk here
and we try to choose a steepest descent direction.
So it is actually orthogonal
to the tangent line here.
So this will be the
steepest descent direction
pointing to the minimizer,
pointing inside.
So the steepest descent direction is actually chosen this way.
And then if we move along pk
at some step size αk
then we jump to a point x(k+1).
Then at the next iterate,
we still choose the steepest descent way
and the iteration goes on.
So this actually shows that
the steepest descent direction is orthogonal
to the contours of f;
for circular contours like these, it even points directly at the minimizer.
So for the steepest descent direction,
we have some remarks about this.
The first remark is about
the advantage of this direction.
As you can see,
it requires only the gradient information,
which means that
the computational cost is not that expensive.
So we just need the gradient.
The disadvantage is that
it may be extremely slow on some particular examples.
In some very bad situations,
if you move along the steepest descent directions,
the convergence rate will be very slow.
We will discuss the convergence rate
later in chapter three.
A further extension is:
what if we choose other directions,
not the steepest descent direction?
What kinds of directions can we choose?
So we give a definition here that
any direction pk satisfying the condition here
will be a descent direction.
So what is this condition?
This condition basically means that
the inner product of pk with the gradient of f at xk,
pkᵀ∇f(xk), is strictly smaller than zero.
In this case,
the direction pk will be a descent direction.
So why, under such a condition,
will it be a descent direction?
We use Taylor's expansion here.
We move from xk along pk
with a very small step size ε,
and obtain the Taylor expansion f(xk + ε·pk) = f(xk) + ε·pkᵀ∇f(xk) + O(ε²).
And we can see that
when this ε is small enough,
as long as
the term pkᵀ∇f(xk) is negative,
the function value of f at xk + ε·pk
will be strictly smaller than f(xk).
So this somehow explains the intuition
why pk satisfying this condition
will be a descent direction.
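This intuition can be verified numerically. A small sketch, assuming the example function f(x) = x1² + x2² and a candidate direction of my own choosing:

```python
import numpy as np

f = lambda x: x @ x                 # f(x) = x1^2 + x2^2
x = np.array([1.0, 2.0])
grad = 2 * x                        # gradient of f at x
p = np.array([-1.0, 0.0])           # a candidate direction
assert p @ grad < 0                 # descent condition p^T grad f(x) < 0
eps = 1e-3                          # a very small step along p
assert f(x + eps * p) < f(x)        # function value strictly decreases
```

Any direction with a negative inner product against the gradient passes the same test, which is exactly why the class of descent directions is much larger than the steepest descent direction alone.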
So after we give the descent directions here,
then the next question is:
how can we choose other descent directions
in order to get around the slow convergence
of the steepest descent direction?
We will talk about this later
in Newton's direction,
quasi-Newton direction
and also conjugate gradient directions.
Now for this part,
let's make a brief summary.
In this part,
we have introduced the line search strategy.
So there are two steps
in line search strategy.
The first is the step length:
how do we determine a step length?
The second is the search direction:
how do we determine the search direction?
Here we only introduce the first search direction,
the steepest descent direction.
And for other directions
we will talk about more later in the next part.
Until then,
see you next time.