Modern Optimization Methods > Chapter 7. Theory of Constrained Optimization > 7.2 Examples One
Hi, everyone. Welcome to Modern Optimization Methods.
In this part, we will introduce the second part of Chapter Seven: the first-order necessary condition. We begin with an example.
Recall that in this chapter we are considering the constrained optimization problem, which takes the following form: we minimize a function f(x), where x lies in n-dimensional space, subject to some equality constraints as well as some inequality constraints. This is the problem we are considering.
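Written out, the problem takes this standard form, using E and I to denote the equality and inequality index sets:

```latex
\min_{x \in \mathbb{R}^n} f(x)
\quad \text{subject to} \quad
c_i(x) = 0,\ i \in \mathcal{E},
\qquad
c_i(x) \ge 0,\ i \in \mathcal{I}.
```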
Now we will introduce the first-order optimality condition for the constrained optimization problem.
We begin with an example, and let's first define the so-called active set. The active set is defined at a given point x: it collects the indices of the constraints that hold with equality at x. Formally, it is composed of all the equality constraint indices, together with the indices of those inequality constraints that hold with equality at x. This is the active set. In other words, given a point x, the active constraints are exactly the ones satisfied with exact equality, which means that the point lies exactly on the boundary curve of those constraints.
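In symbols, the active set at a feasible point x can be written as:

```latex
\mathcal{A}(x) = \mathcal{E} \cup \{\, i \in \mathcal{I} : c_i(x) = 0 \,\}.
```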
We also have the notions of active and inactive for inequality constraints. Suppose we have an inequality constraint c_i(x) ≥ 0; its boundary is the curve c_i(x) = 0, and the feasible side is where c_i(x) > 0. If x lies exactly on the boundary, then this inequality is active at x; in other words, you cannot move freely because of this constraint. However, if x lies in the interior of this constraint, then in a small neighborhood of x we can move freely, without taking this inequality constraint into account. This is what active and inactive mean for x with respect to a constraint c_i.
Now let's give an example. We minimize the linear function x1 + x2 subject to a single equality constraint, which is actually a circle: x1^2 + x2^2 = 2. For this problem there are no inequality constraints, and we can reformulate the constraint as c1(x) = x1^2 + x2^2 - 2 = 0. With this reformulation, we illustrate the problem in a figure.
Look at the figure here. Our feasible region is the circle; all points on the circle form the feasible set of this problem. On this circle, we are trying to find the point whose function value is the smallest. Recall that our objective function is x1 + x2. If we write down the gradient of this function, it is a constant vector, which means the function value increases fastest along that constant direction. Looking at the circle, think about this problem: which point will be the optimal solution?
Let's take a look at some cases. Suppose we are currently standing at this point on the circle. The gradient of f points in the direction in which the function value increases fastest, and the gradient of the constraint is orthogonal to the tangent line of the circle at that point. You can see that from this point, moving along the circle in one direction will actually decrease the function value, so this point cannot be the optimal solution. Now suppose we are standing here, where the gradient of the constraint, ∇c1, points in the direction opposite to the gradient of f. At this point we cannot move any further to reach a smaller function value: this is the smallest function value on the circle that we can achieve. And at the opposite point, where the two gradients point in the same direction, we reach the largest function value on the circle.
If we look at the relationship at the optimal point, you can see that the gradient of the constraint lies on the same line as the gradient of f; in fact, here the two vectors point in opposite directions. In other words, if x* is an optimal solution, the gradient of the constraint and the gradient of f are parallel. That is, we can find a constant λ1, which may be positive or negative, such that the two vectors are parallel. This is the first explanation of the optimality condition.
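As a quick numerical sketch of this parallelism, we can check the gradients of f(x) = x1 + x2 and c1(x) = x1^2 + x2^2 - 2 at a few points on the circle (the point names below are illustrative, not from the lecture):

```python
import numpy as np

def grad_f(x):
    # gradient of f(x) = x1 + x2 is the constant vector (1, 1)
    return np.array([1.0, 1.0])

def grad_c1(x):
    # gradient of c1(x) = x1^2 + x2^2 - 2 is (2*x1, 2*x2)
    return np.array([2.0 * x[0], 2.0 * x[1]])

def is_parallel(u, v, tol=1e-12):
    # two 2-D vectors are parallel iff their cross product vanishes
    return abs(u[0] * v[1] - u[1] * v[0]) < tol

x_min = np.array([-1.0, -1.0])            # minimizer: gradients opposite
x_max = np.array([1.0, 1.0])              # maximizer: gradients aligned
x_other = np.array([np.sqrt(2.0), 0.0])   # a non-optimal feasible point

print(is_parallel(grad_f(x_min), grad_c1(x_min)))      # True
print(is_parallel(grad_f(x_max), grad_c1(x_max)))      # True
print(is_parallel(grad_f(x_other), grad_c1(x_other)))  # False
```

At both candidate extremes the two gradients are parallel, while at an arbitrary feasible point they are not.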
Now let's take a look at a further interpretation of why the gradient of f and the gradient of the constraint need to be parallel. Previously we just demonstrated it by a figure; now we analyze it with some mathematical formulae. We assume that x is feasible; in other words, the equality constraint c1(x) = 0 holds (we have only one equality constraint). To maintain feasibility when we move by a step s, the next point x + s has to be feasible as well, meaning that c1(x + s) needs to be zero. To derive what a feasible direction must satisfy, we apply Taylor's expansion to this function: expanding at x by a linear term and dropping the higher-order terms, c1(x + s) is approximately equal to c1(x) plus the gradient of the constraint at x transposed with the direction s. Recall that x is a feasible point, so c1(x) is zero. Hence the approximation reduces to the requirement that the gradient of c1 at x transposed with s is zero. In other words, if we try to move along a feasible direction, that direction has to approximately satisfy that the gradient of the constraint transposed with s is zero. This is the feasibility condition.
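Written as one chain, assuming x is feasible:

```latex
0 = c_1(x+s) \approx c_1(x) + \nabla c_1(x)^{\top} s = \nabla c_1(x)^{\top} s,
```

so to first order a feasible direction must satisfy \(\nabla c_1(x)^{\top} s = 0\).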
Now let's take a look at the second part. When we move along a feasible direction, we also need to consider the reduction in the function value f. So what condition must the direction satisfy if it is to produce a decrease? As we mentioned before, a reduction in f means that the gradient of f transposed with s is strictly negative; this condition has to be satisfied. Therefore, to summarize, if we try to move to the next iterate, the direction s needs to satisfy both the reduction condition and the feasibility condition. The feasibility condition says that s must be orthogonal to the gradient of the constraint, and the reduction condition says that s multiplied by the gradient of f must be strictly negative.
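For the concrete circle example, these two conditions can be tested numerically. On the circle, the (linearized) feasible directions at a point x are the tangent directions ±t, and we ask whether either one is also a descent direction for f. This is a sketch, assuming the example problem f(x) = x1 + x2, c1(x) = x1^2 + x2^2 - 2:

```python
import numpy as np

grad_f = np.array([1.0, 1.0])  # gradient of f(x) = x1 + x2

def tangent(x):
    # a direction orthogonal to grad c1(x) = (2*x1, 2*x2):
    # rotate the normal by 90 degrees
    return np.array([-x[1], x[0]])

def has_descent_tangent(x):
    # both +t and -t satisfy the linearized feasibility condition;
    # check whether either one is also a strict descent direction for f
    t = tangent(x)
    return min(grad_f @ t, grad_f @ (-t)) < -1e-12

print(has_descent_tangent(np.array([np.sqrt(2.0), 0.0])))  # True: not optimal
print(has_descent_tangent(np.array([-1.0, -1.0])))         # False: minimizer
```

At a non-optimal feasible point a feasible descent direction exists; at the minimizer every feasible direction gives zero first-order change, so no such s exists.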
So what does the existence of such a direction s mean? If we can find a direction s satisfying both conditions, it means that x is not an optimal solution, because we can still move to get a better function value while staying feasible. Conversely, if no direction s satisfies the two conditions, then x* is a candidate for a local solution. In other words, if we know that x* is a local solution, we cannot move to reach a better solution; that is, there is no direction s satisfying both the feasibility condition and the reduction condition. This is a very important fact. And this situation occurs exactly when the gradient of f and the gradient of c1 are parallel to each other. As we mentioned before, if x* is already a local minimizer, then we cannot find such an s satisfying the reduction condition and the feasibility condition; only when the two vectors are parallel does no such direction exist.
Therefore, for the one-equality-constraint case, to derive the first-order optimality condition, let's mathematically define the Lagrangian function. For one equality constraint, the Lagrangian function is the objective function f(x) minus a constant λ1 multiplied by c1(x). The parameter λ1 here is exactly the constant that appears when the two gradients are parallel.
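In symbols, for the one-equality-constraint case:

```latex
\mathcal{L}(x, \lambda_1) = f(x) - \lambda_1 c_1(x),
\qquad
\nabla_x \mathcal{L}(x, \lambda_1) = \nabla f(x) - \lambda_1 \nabla c_1(x).
```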
After we introduce the Lagrangian function, then at a local solution x* we can find a scalar λ1* such that the gradient of the Lagrangian function is zero. If we take the gradient of the Lagrangian with respect to x, it is exactly the gradient of the objective function f minus λ1 times the gradient of c1. Setting it to zero means exactly that the gradient of f is parallel to the gradient of the equality constraint. This tells us the first-order optimality condition for the constrained problem. Here we also require the equality constraint to hold; in other words, there is a second, implicit requirement that x be a feasible point. So this gives the first-order optimality condition for the one-equality-constraint case.
Now let's present it mathematically. If x* is a local minimizer, then there exists a scalar λ1* such that the following two conditions hold. The first is that the gradient of the Lagrangian function with respect to x is zero, meaning that the two gradient vectors are parallel. The second says that x* must be a feasible point. This is the first-order optimality condition.
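We can verify both conditions numerically on the circle example. For f(x) = x1 + x2 and c1(x) = x1^2 + x2^2 - 2, the minimizer is x* = (-1, -1), and solving 1 = 2 λ1 x1 with x1 = -1 gives λ1* = -1/2 (these concrete values come from this particular example):

```python
import numpy as np

def grad_lagrangian(x, lam):
    # grad_x L(x, lam) = grad f(x) - lam * grad c1(x)
    # for f = x1 + x2 and c1 = x1^2 + x2^2 - 2
    return np.array([1.0, 1.0]) - lam * np.array([2.0 * x[0], 2.0 * x[1]])

x_star = np.array([-1.0, -1.0])
lam_star = -0.5

print(grad_lagrangian(x_star, lam_star))  # stationarity: gradient is zero
print(x_star @ x_star - 2.0)              # feasibility: c1(x*) = 0
```

Both the stationarity condition and the feasibility condition hold at (x*, λ1*).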
We have several remarks about this first-order optimality condition. Notice that it is only a necessary condition, meaning that if x* is a local minimizer, then this relationship holds. Conversely, if the relationship holds, we cannot guarantee that x* is a local minimizer. This is an important fact, because we will show that a local maximizer also satisfies this condition, which confirms that it is merely necessary. Another important fact is that the scalar λ1 may be positive or negative; there is no sign restriction on λ1 for an equality constraint. We only require the two vectors to be parallel.
Now let's make a brief summary of this part. We investigated the first-order optimality condition in the case where there is only one equality constraint, and we also introduced the definition of the active set, which will be used later in this chapter. We stop here and see you next time.