Current course topic: Modern Optimization Methods > Chapter 2. Fundamentals of Unconstrained Optimization > 2.2 Optimality Conditions Ⅰ
Hi, everyone.
Welcome to modern optimization methods.
In this part,
we will introduce the second part of Chapter 2,
on the optimality conditions.
We will look at the first-order necessary condition,
the second-order necessary condition,
as well as the second-order sufficient condition.
Before that, let me ask you a question.
So we already introduced the definition of local minimizers
and global minimizers.
So if I give you a point of a function,
how do you verify whether this point is a local minimizer
or a global minimizer?
You may answer: I just check it by the definition.
That is one way.
However, it is only a theoretical way,
because in practice you can never examine all the points
in a neighborhood of the point.
So the definition serves a theoretical purpose.
In practice, how do we recognize a local minimizer?
We need some theoretical tools to characterize it.
So what kind of theoretical tools?
We actually use the information of the gradient
as well as the Hessian matrix.
As we mentioned before,
in continuous optimization,
we can use neighborhood information
to estimate the status of a point.
We will see this in the following analysis.
Now,
let's first introduce the first-order necessary condition.
You may ask: what is a necessary condition?
A necessary condition is derived
by assuming that a point is a minimizer.
In other words,
if we know that this point is a minimizer,
what conditions can we deduce?
That is the question a necessary condition answers.
Now, what is the first-order necessary condition?
It is related to the first-order information of the function.
Stated mathematically,
we first assume that x* is a local minimizer
(this is what makes the condition necessary),
and we further assume that
f is continuously differentiable
in an open neighborhood of x*,
so that the gradient exists.
Under these two assumptions,
the gradient of f at x* is zero.
This is the first-order necessary condition.
It is very simple,
and it is widely used in many applications.
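Stated formally, in standard notation, the theorem reads:

```latex
\textbf{Theorem (first-order necessary condition).}
If $x^*$ is a local minimizer and $f$ is continuously
differentiable in an open neighborhood of $x^*$, then
\[
  \nabla f(x^*) = 0 .
\]
```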
Now, how can we prove this theorem?
This is the first theorem
for which we show how to prove the result.
There are actually many ways to prove it.
Here we argue by contradiction.
In the first step,
we assume, for contradiction,
that the gradient of f at x* is not zero;
that is, we assume the conclusion does not hold.
Then we try to derive a contradiction.
Since the gradient is not zero,
we can define a direction p
as the negative of the gradient at x*.
Taking the inner product
of p with the gradient of f at x*,
we get the negative squared norm of the gradient at x*.
Since the gradient is nonzero,
this quantity is strictly smaller than zero.
Now, in the third step,
we use our assumption that
f is continuously differentiable
in an open neighborhood of x*.
Because the gradient is continuous,
p transpose times the gradient of f
is a continuous function as well.
So if we perturb x* a little,
moving it to x* + tp along the direction p,
this term keeps its strict negativity
for all sufficiently small t, say all t in [0, T].
This is the key point.
Then, for any t-bar in the interval (0, T],
Taylor's expansion gives the following result.
Taylor's expansion expresses
the function value of f at x* + t-bar p
in terms of the value at x*:
it equals the function value of f at x*
plus a second term,
t-bar times p transpose times the gradient of f at x* + tp.
This second term carries the first-order information,
and it is very similar to the term above.
The difference lies in that
this t is between zero and t-bar.
Recall that t-bar is between zero and T,
so this t is even smaller than t-bar.
Therefore the strict negativity still holds.
Therefore, we have proved that
the function value of f at x* + t-bar p
is strictly smaller than f at x*,
because the second term is strictly negative.
In other words,
we have found a direction p such that
if we start from x*
and move a little along this direction p,
we reach an even smaller function value than f(x*).
This violates the claim that x* is a local minimizer.
In this way,
we derive a contradiction,
and the result is proved.
This is the proof by contradiction.
Of course, you may find other ways to prove the result.
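The contradiction argument above can be condensed into a few lines:

```latex
% Step 1--2: assume \nabla f(x^*) \neq 0 and set
p := -\nabla f(x^*), \qquad
p^{\top}\nabla f(x^*) = -\|\nabla f(x^*)\|^{2} < 0 .
% Step 3: by continuity of \nabla f, there is T > 0 with
p^{\top}\nabla f(x^* + t p) < 0 \quad \text{for all } t \in [0, T].
% Step 4: Taylor's theorem gives, for any \bar{t} \in (0, T]
% and some t \in (0, \bar{t}),
f(x^* + \bar{t}\, p)
  = f(x^*) + \bar{t}\, p^{\top}\nabla f(x^* + t p)
  < f(x^*),
% contradicting that x^* is a local minimizer.
```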
Motivated by the first-order necessary condition,
let me introduce a definition: the stationary point.
We call x* a stationary point if the gradient of f at x* is zero.
In other words, as long as the gradient vanishes,
the point is a stationary point.
So, as you can see,
any local minimizer of a smooth function
is a stationary point,
because for a smooth function
the gradient exists,
and it satisfies the first-order necessary condition.
Therefore, any local minimizer
must be a stationary point of a smooth function.
However, if the function is not smooth,
a local minimizer may not be a stationary point of f.
For example, I draw a picture here:
the absolute value function,
which is not smooth.
For this nonsmooth function,
we know that x = 0 is a minimizer;
in particular, it is a local minimizer.
However, this local minimizer
does not satisfy the condition that the gradient is zero.
Why?
Because there is no gradient at all:
f is not differentiable at zero,
so the gradient does not exist,
and the condition cannot even be stated.
So smoothness is essential here:
if the function is not smooth,
we cannot draw this conclusion.
Note also that the converse fails even for smooth functions:
a stationary point is not necessarily a local minimizer.
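As a small numerical illustration of this last point, here is a sketch using finite differences; the function x³ is my own choice of example, not one from the lecture:

```python
def grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 3

# x = 0 is a stationary point: the gradient vanishes there ...
print(abs(grad(f, 0.0)) < 1e-8)   # True

# ... yet it is not a local minimizer: f takes smaller values nearby.
print(f(-0.1) < f(0.0))           # True
```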
Now let me talk about the second-order necessary condition.
From the name,
we know that
the second-order necessary condition must assume that x*
is already a local minimizer.
Another piece of information we can infer is that
the second-order condition involves the Hessian matrix,
so we must assume that the Hessian matrix exists.
Here is the theorem.
We assume that x* is a local minimizer,
and moreover that the Hessian matrix exists
and is continuous
in an open neighborhood of x*.
Under these two assumptions,
the first-order optimality condition holds:
the gradient must be zero.
Moreover, the Hessian matrix at x*
is positive semidefinite.
This is the second-order necessary condition.
So basically,
it means that the Hessian matrix at x*
is positive semidefinite.
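In symbols, the two conclusions of the theorem are:

```latex
\nabla f(x^*) = 0
\qquad\text{and}\qquad
p^{\top}\,\nabla^{2} f(x^*)\,p \;\ge\; 0
\quad\text{for all } p \in \mathbb{R}^{n},
```

the second of which is written compactly as ∇²f(x*) ⪰ 0 (positive semidefinite).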
Now let's take a simple example.
Suppose that in n-dimensional space
we have a quadratic function of the form
(1/2) x transpose Q x plus b transpose x,
with Q symmetric
(the 1/2 factor is what makes the gradient below come out as Qx + b).
If x* is a local minimizer of this function,
then the gradient must be zero.
The gradient takes the form Qx* + b,
and the Hessian matrix must be positive semidefinite.
Taking the Hessian,
you can see that
the Hessian of f is the constant matrix Q,
so we require Q to be positive semidefinite.
This is the second-order necessary condition
for a quadratic function.
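We can verify both conditions numerically for a small instance. This is a sketch with made-up data (the matrix Q and vector b below are my own illustrative choices), assuming the convention f(x) = ½ xᵀQx + bᵀx with Q symmetric:

```python
import numpy as np

# Illustrative data: Q symmetric positive definite, so the stationary
# point is in fact the unique global minimizer.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([-1.0, 2.0])

# f(x) = 0.5 x^T Q x + b^T x  =>  grad f(x) = Q x + b,  Hessian = Q.
x_star = np.linalg.solve(Q, -b)   # solve the stationary equation Q x + b = 0

grad = Q @ x_star + b             # first-order condition: should be ~0
eigs = np.linalg.eigvalsh(Q)      # Hessian eigenvalues: should all be >= 0

print(np.allclose(grad, 0.0))     # True
print((eigs >= 0).all())          # True
```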
Now, let us think about how to prove this result.
Again, we argue by contradiction.
We assume, for contradiction,
that the Hessian matrix at x* is not positive semidefinite.
In other words,
we can find a vector p such that
p transpose times the Hessian matrix at x*
times p is strictly negative;
this is exactly what it means
for the Hessian matrix
not to be positive semidefinite.
So this is the condition we obtain from the assumption.
Then we use the continuity of the Hessian matrix.
The continuity of the Hessian implies that
if we take a small perturbation of x*,
changing it to x* + tp,
then p transpose times the Hessian at x* + tp times p
remains strictly negative
throughout this small neighborhood of x*.
To make use of this condition,
we need Taylor's expansion.
In this case,
Taylor's expansion must be carried out to second order.
We expand the function value at x* + t-bar p around x*,
and we get three terms.
The first term is the function value at x*.
The second term carries the first-order information,
and the third term carries the second-order information.
By the first-order necessary condition,
the gradient at x* is zero,
so the second term is cancelled out,
and we are left with two terms.
Moreover, by step two,
the second-order term is strictly negative,
meaning that
the function value of f at x* + t-bar p is strictly smaller than
the function value of f at x*.
In other words,
in the neighborhood of x*,
we have found a direction p such that
if we move a little along this direction p,
we reach a function value strictly smaller than f(x*).
This violates the fact that x* is a local minimizer.
So in this way,
we again derive a contradiction,
and we have proved that
the second-order necessary condition holds.
This is how we prove the result.
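Condensed, the second-order argument runs as follows:

```latex
% Assume for contradiction there is p with
p^{\top}\,\nabla^{2} f(x^*)\,p < 0 .
% By continuity of the Hessian, there is T > 0 with
p^{\top}\,\nabla^{2} f(x^* + t p)\,p < 0
  \quad \text{for all } t \in [0, T].
% Second-order Taylor expansion, using \nabla f(x^*) = 0:
f(x^* + \bar{t}\, p)
  = f(x^*) + \bar{t}\, p^{\top}\nabla f(x^*)
    + \tfrac{1}{2}\bar{t}^{\,2}\, p^{\top}\nabla^{2} f(x^* + t p)\,p
  < f(x^*) ,
% for some t \in (0, \bar{t}), contradicting local minimality.
```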
Now let's give a brief summary of this lecture.
In this part,
we introduced the first-order necessary condition
and the second-order necessary condition.
Moreover, we gave a definition called the stationary point,
which is closely related to the local minimizer;
however, the two are not equivalent.
In the next part,
we will further introduce the second-order sufficient condition.
Until then, see you next time.
Bye.