Current course topic: Modern Optimization Methods > Chapter 7. Theory of Constrained Optimization > 7.6 Second Order Necessary Condition
Hi. Welcome to modern optimization methods.
Today we will talk about the sixth part of chapter seven,
the second order optimality conditions.
Recall that we are currently discussing constrained optimization problems with both equality and inequality constraints.
The KKT conditions we introduced before are as follows: the gradient of the Lagrangian function with respect to x equals zero, together with the feasibility conditions and the complementarity conditions. Recall also that when we introduced the constraint qualifications, we used the set of linearized feasible directions F at a point x, which approximates, and sometimes exactly represents, the tangent cone of Ω at x.
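As a reminder, in the standard notation (assumed here, since the slides are not reproduced in the transcript: E and I index the equality and inequality constraints, and A(x) denotes the active set at x), the KKT conditions and the linearized feasible directions read:

```latex
% KKT conditions at (x^*, \lambda^*):
\nabla_x \mathcal{L}(x^*,\lambda^*)
  = \nabla f(x^*) - \sum_{i \in E \cup I} \lambda_i^* \nabla c_i(x^*) = 0,
\qquad
c_i(x^*) = 0 \ (i \in E), \quad c_i(x^*) \ge 0 \ (i \in I),
\qquad
\lambda_i^* \ge 0,\ \ \lambda_i^* c_i(x^*) = 0 \ (i \in I).

% Linearized feasible directions at a feasible point x:
\mathcal{F}(x) = \bigl\{\, d \;:\; d^{\top}\nabla c_i(x) = 0 \ (i \in E),\ \
  d^{\top}\nabla c_i(x) \ge 0 \ (i \in A(x)\cap I) \,\bigr\}.
```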
Now let's look at the properties of the directions in the set of linearized feasible directions. Take the inner product of a direction d with the gradient of the Lagrangian function with respect to x. Because the gradient of the Lagrangian vanishes at the optimal solution x*, this inner product is zero as well. If we expand the Lagrangian function, we find that two terms must be equal to each other, because their difference is zero. In other words, the inner product of d with the gradient of f at x* can be expressed through the second term, which we rearrange. Since the index i runs over the active set, we separate the sum into two parts: the equality constraints and the active inequality constraints.
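In symbols, the manipulation just described is the following (a sketch, assuming the usual Lagrangian L(x, λ) = f(x) − Σ λᵢ cᵢ(x), where complementarity removes the inactive inequality terms):

```latex
0 = d^{\top}\nabla_x \mathcal{L}(x^*,\lambda^*)
  = d^{\top}\nabla f(x^*)
    - \sum_{i \in E} \lambda_i^*\, d^{\top}\nabla c_i(x^*)
    - \sum_{i \in A(x^*)\cap I} \lambda_i^*\, d^{\top}\nabla c_i(x^*),
```

so the inner product of d with the gradient of f equals the sum of the two constraint terms.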
So why do we carry out this analysis? We answer this question on the next page.
Now, if we take a further look at this equation, it says that the inner product of d with the gradient of f can be expressed as two terms. By the definition of linearized feasible directions, the first term is actually zero, because every linearized feasible direction d must have zero inner product with the gradient of each equality constraint. Also by that definition, the second term is nonnegative, because each λi is nonnegative and each inner product is nonnegative. So overall, the sum of the two terms is nonnegative, meaning that along a linearized feasible direction d the function value of f is nondecreasing to first order: the inner product of d with the gradient of f is greater than or equal to zero.
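Putting the two observations together, for any d in F(x*) we get (notation as above):

```latex
d^{\top}\nabla f(x^*)
 = \underbrace{\sum_{i\in E} \lambda_i^*\, d^{\top}\nabla c_i(x^*)}_{=\,0}
 \;+\; \underbrace{\sum_{i\in A(x^*)\cap I} \lambda_i^*\, d^{\top}\nabla c_i(x^*)}_{\ge\,0}
 \;\ge\; 0.
```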
Now let's look at the result. To summarize, every linearized feasible direction d satisfies one of two cases: either d is an increasing direction, with the inner product of d and the gradient of f strictly positive, or that inner product is zero, meaning d is orthogonal to the gradient. In the first case, d increases the function value to first order. In the second case, because d is orthogonal to the gradient, first-order information cannot tell us whether the function value will increase or not. The second order conditions take care of exactly these directions d: they have to control and characterize such undecided directions in order to give a second order optimality condition.
With this in mind, we will define the critical cone, which collects exactly such undecided directions. To present the critical cone, let's make some assumptions. We assume that the function f and the constraints are twice continuously differentiable, because we need the second order information of these functions. We also fix the setting: x* is a local minimizer, F(x*) is the set of linearized feasible directions at x*, and λ* is a Lagrange multiplier satisfying the KKT conditions. In other words, when we talk about second order information, we already assume that x* is a local minimizer with a corresponding Lagrange multiplier satisfying the KKT conditions.
Now we give the definition of the critical cone. The critical cone is defined as the intersection of the set of linearized feasible directions with another set. Which set? It consists of the directions w that are orthogonal to the gradient of every active inequality constraint whose Lagrange multiplier λi is strictly positive. In other words, the critical cone, denoted C(x*, λ*), is attached to a local optimal solution x* together with its corresponding Lagrange multiplier λ*. Given x* and λ*, we take the linearized feasible directions and pick out those that also satisfy this orthogonality condition; the intersection of the two sets makes up the critical cone.

Equivalently, the critical cone is described by three conditions. First, for all equality constraints, the direction must be orthogonal to the gradient. Then the active inequality constraints are separated into two cases: if the multiplier λi is strictly positive, the direction must be orthogonal to the gradient of that constraint; otherwise, if λi is zero, we only require a nonnegative inner product. These are the directions in the critical cone.
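Written out in the standard notation (assumed, as the slides' formulas are not in the transcript), the definition and its equivalent three-case description are:

```latex
\mathcal{C}(x^*,\lambda^*) = \bigl\{\, w \in \mathcal{F}(x^*) \;:\;
  \nabla c_i(x^*)^{\top} w = 0 \ \text{for all } i \in A(x^*)\cap I
  \text{ with } \lambda_i^* > 0 \,\bigr\},

% equivalently:
w \in \mathcal{C}(x^*,\lambda^*) \iff
\begin{cases}
\nabla c_i(x^*)^{\top} w = 0, & i \in E,\\[2pt]
\nabla c_i(x^*)^{\top} w = 0, & i \in A(x^*)\cap I,\ \lambda_i^* > 0,\\[2pt]
\nabla c_i(x^*)^{\top} w \ge 0, & i \in A(x^*)\cap I,\ \lambda_i^* = 0.
\end{cases}
```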
You may wonder how such a strange-looking set of directions arises; we explain that on this page. As mentioned before, the critical cone characterizes the undecided directions for the function value of f. Let's see why. We have the following remark: if λi is zero because constraint i is an inactive inequality constraint, then the corresponding term, λi times the inner product, is exactly zero. Now take a direction in the critical cone and form its inner product with the gradient of f. By the first order optimality condition, this inner product can be written as a sum over the equality constraints, the active inequality constraints, and the inactive inequality constraints. For the inactive inequality constraints the term vanishes because λi is zero; for the equality constraints the inner product is zero by the definition of linearized feasible directions; and for the active inequality constraints with positive λi the inner product is zero by the definition of the critical cone. In other words, every direction in the critical cone is orthogonal to the gradient of f, so from first-order information alone we cannot tell whether the function value increases along it. This is exactly the sense in which the critical cone characterizes the undecided directions.
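The remark can be condensed into one line (same notation as before):

```latex
w \in \mathcal{C}(x^*,\lambda^*) \;\Longrightarrow\;
w^{\top}\nabla f(x^*)
  = \sum_{i \in E \cup I} \lambda_i^*\, w^{\top}\nabla c_i(x^*) = 0,
```

since each term vanishes: λᵢ* = 0 on inactive constraints, and w has zero inner product with ∇cᵢ(x*) whenever λᵢ* > 0 or i is an equality constraint.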
Now let's look at an example of a critical cone. We minimize a very simple function f, just x1, subject to two inequality constraints. You can sketch this problem in two-dimensional space and find the optimal solution, which is zero. At this point we identify the active inequality constraints: both constraints are active. With this, we can solve the KKT conditions and find the Lagrange multiplier. We can also verify the LICQ condition by forming the active set and checking the gradients of the inequality constraints; LICQ holds, so the Lagrange multiplier is in fact unique. Next we write down the linearized feasible set by computing the gradients of the two constraints and imposing the corresponding conditions. Based on this, we can calculate the critical cone. The critical cone picks out directions from this set, and we look further for the active inequality constraints with positive λi, which here is the second constraint. For the second constraint we need the direction w to be orthogonal to its gradient, so w transpose times the gradient of c2 must be exactly zero. A simple calculation then shows that w can only take the form (0, d2), where d2 is any real number. Intersecting with the linearized feasible directions, the final critical cone has first component zero, and the second component must be nonnegative because of the corresponding linearized feasibility condition.
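The transcript does not show the two constraints from the slides, so here is a small numerical sketch under an assumed reconstruction that reproduces the stated outcome: minimize f(x) = x1 subject to c1(x) = x2 ≥ 0 and c2(x) = x1 ≥ 0, with minimizer x* = (0, 0), where both constraints are active, LICQ holds, the unique multiplier has λ1 = 0 and λ2 > 0, and the critical cone is {(0, w2) : w2 ≥ 0}.

```python
import numpy as np

# Hypothetical reconstruction of the lecture's 2-D example (the actual
# slide constraints are not in the transcript):
#   minimize f(x) = x1  s.t.  c1(x) = x2 >= 0,  c2(x) = x1 >= 0,
# with minimizer x* = (0, 0), where both constraints are active.
grad_f  = np.array([1.0, 0.0])   # gradient of f(x) = x1
grad_c1 = np.array([0.0, 1.0])   # gradient of c1(x) = x2
grad_c2 = np.array([1.0, 0.0])   # gradient of c2(x) = x1

# LICQ: the active-constraint gradients are linearly independent.
A = np.column_stack([grad_c1, grad_c2])
assert np.linalg.matrix_rank(A) == 2

# KKT: solve grad_f = lambda1 * grad_c1 + lambda2 * grad_c2.
lam = np.linalg.solve(A, grad_f)
print(lam)  # unique multiplier: lambda1 = 0, lambda2 = 1

# Critical cone: lambda2 > 0 forces grad_c2^T w = 0, i.e. w1 = 0;
# lambda1 = 0 only requires grad_c1^T w >= 0, i.e. w2 >= 0.
# Hence C(x*, lambda*) = {(0, w2) : w2 >= 0}.
def in_critical_cone(w, tol=1e-12):
    return abs(grad_c2 @ w) <= tol and grad_c1 @ w >= -tol

print(in_critical_cone(np.array([0.0, 3.0])))  # True:  on the cone
print(in_critical_cone(np.array([1.0, 1.0])))  # False: not orthogonal to grad_c2
```

The check mirrors the lecture's steps: verify LICQ by rank, recover the unique multiplier from the KKT system, then apply the critical-cone conditions direction by direction.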
With the critical cone in hand, we can state the second order necessary condition. Let's make the following assumptions: x* is a local minimizer of the constrained problem, LICQ holds at x*, and λ* is the corresponding Lagrange multiplier satisfying the KKT conditions. Then the second order necessary condition says that the Hessian matrix of the Lagrangian function with respect to x is positive semidefinite over the critical cone. In other words, for every direction w in the critical cone, the quadratic form of the Hessian of the Lagrangian at w is nonnegative. So this is the second order necessary condition: along all the undecided directions, the Lagrangian function has nonnegative curvature, that is, these are nondecreasing directions for the Lagrangian to second order.
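Stated compactly (same notation as before):

```latex
% Second Order Necessary Condition:
% if x^* is a local minimizer, LICQ holds at x^*, and \lambda^* satisfies
% the KKT conditions, then
w^{\top}\, \nabla_{xx}^2 \mathcal{L}(x^*,\lambda^*)\, w \;\ge\; 0
\qquad \text{for all } w \in \mathcal{C}(x^*,\lambda^*).
```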
Now we add a remark. This theorem means that if x* is a local solution, then the Hessian of the Lagrangian function has nonnegative curvature along all the undecided directions, in other words, along all the critical directions.
Now let's make a brief summary of this part. We have introduced the second order necessary condition. To derive it, we introduced a very important definition, the critical cone, which characterizes the undecided directions for the function f. The second order necessary condition is precisely a requirement on the undecided directions in the critical cone. Okay, we stop here for this part, and see you next time.