Current course topic: Modern Optimization Methods (in English) > Chapter 9. Penalty and Augmented Lagrangian Methods > Homework > 6.2 Semismooth Newton's Method
Hi, welcome to modern optimization methods.
Today we will introduce the second part of chapter six,
semismooth Newton's method.
Now let's first recall the definition for semismoothness.
As we introduced in the previous part of this chapter,
semismoothness needs to satisfy two conditions.
The first one is that we require the vector function Φ
to be directionally differentiable.
The second is that for any V in the generalized Jacobian of Φ at x+h,
we need Φ(x+h) − Φ(x) − Vh = o(‖h‖) as h → 0.
Furthermore,
if this left-hand side is of the same order as ‖h‖^(1+γ),
that is, Φ(x+h) − Φ(x) − Vh = O(‖h‖^(1+γ)),
then we say Φ is γ-order semismooth.
Now let's present the semismooth Newton's method.
Actually,
let's first make a brief review of its history.
As we mentioned before,
the semismoothness definition was originally proposed
by Mifflin in 1977.
Then it was extended to Newton's method by Qi and Sun in 1993.
After that,
it was widely used for vector-valued problems.
The first extension to matrix problems
was proposed by Qi and Sun in 2006, for the nearest correlation matrix problem.
After that,
semismooth Newton's method received great attention
from the optimization community.
To this day,
it serves as an efficient subsolver
for quadratic semidefinite programming problems,
and also for nonlinear programming problems.
Now let's take a look at the nonsmooth version
of Newton's method.
Actually,
we consider solving a nonlinear equation
instead of solving optimization problem.
We have this Φ,
as we defined before,
mapping from n-dimensional space to n-dimensional space.
And Φ(x) equals zero is the system we are trying to solve,
so the classical Newton's method to solve the system
will update xk like this.
In each iteration,
the next iterate x(k+1) will be generated
by xk minus the Newton direction,
where the Newton direction is Vk^(-1) multiplied by Φ(xk).
Here,
Vk is the Jacobian of Φ at xk,
so in this case,
Newton's method actually assumes that Φ is differentiable.
However,
for nonsmooth Newton's method,
it does not require Φ to be continuously differentiable.
We only assume that Φ is a locally Lipschitz function
and satisfies suitable semismoothness properties,
so in this case,
we get the nonsmooth Newton's method,
meaning that when we update the Newton's direction,
we choose this matrix Vk
from the Clarke's generalized Jacobian set.
The ∂Φ(xk) is what we have defined in the previous part,
it is the Clarke's generalized Jacobian,
which is normally a set.
This is the nonsmooth Newton's method,
in other words,
if at this point xk,
Φ is differentiable,
then this Vk is a unique matrix,
because the generalized Jacobian reduces to the singleton
containing the ordinary Jacobian.
However,
if Φ is not differentiable at xk,
we need to choose Vk from a set,
and any element of this set works for the update.
This is the definition for nonsmooth Newton's method.
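As a concrete sketch (the toy function Φ, its root, and the starting point below are our own illustrative assumptions, not from the lecture), the update xk − Vk^(-1)Φ(xk) with Vk chosen from the generalized Jacobian looks like this in code:

```python
def phi(x):
    # A semismooth (piecewise linear) scalar equation: Phi(x) = |x| + 2x - 1.
    # Its unique root is x = 1/3.
    return abs(x) + 2.0 * x - 1.0

def jac_element(x):
    # One element of Clarke's generalized Jacobian of Phi at x.
    # For x != 0 it is the ordinary derivative (3 or 1); at x = 0 the
    # generalized Jacobian is the interval [1, 3] and we may pick any value.
    return 3.0 if x >= 0 else 1.0

def nonsmooth_newton(x0, tol=1e-12, max_iter=50):
    x = x0
    for k in range(max_iter):
        f = phi(x)
        if abs(f) <= tol:
            return x, k
        V = jac_element(x)      # pick any V_k in dPhi(x_k)
        x = x - f / V           # x_{k+1} = x_k - V_k^{-1} Phi(x_k)
    return x, max_iter

root, iters = nonsmooth_newton(-1.0)
print(root, iters)   # root is 1/3; two updates suffice here
```

Because Φ here is piecewise linear, the iteration lands on the root after finitely many steps; the choice of V at a kink does not break the update.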
And the relation between the two is that,
when Φ is differentiable, then the nonsmooth version
will reduce to the classical Newton's method,
otherwise they are different.
Now let's take a look at the local convergence rate
of nonsmooth version of Newton's method.
We assume that every matrix in the generalized Jacobian set at x bar is nonsingular
and x0 is sufficiently close to x bar,
where this x bar is the solution of the nonlinear system.
If Φ is semismooth at x bar,
then we can analyze the local convergence rate
in the following way.
We reformulate the distance from the next iterate
to x bar.
Writing x(k+1) − x bar = xk − Vk^(-1)Φ(xk) − x bar
and pulling out Vk^(-1),
this equals −Vk^(-1)(Φ(xk) − Φ(x bar) − Vk(xk − x bar)).
Here we may insert the term Φ(x bar),
because Φ(x bar) is zero.
Now take a further look at this expression.
By our assumption,
the inverse of Vk is bounded.
And by the semismoothness property,
the second factor Φ(xk) − Φ(x bar) − Vk(xk − x bar)
is of higher order, namely o(‖xk − x bar‖).
So the distance from x(k+1) to x bar
is of higher order than the distance from xk to x bar.
In other words,
if we take the ratio of the distance at the next iterate
to the distance at the current iterate
and take the limit,
we can see that, by definition,
the convergence rate is superlinear,
because of the higher-order relationship between the two terms.
And furthermore,
if Φ is γ-order semismooth,
then the higher-order term can be sharpened to the (1+γ) power
of the distance from xk to x bar,
so the convergence order is 1+γ.
Recall strong semismoothness,
which corresponds to γ equal to one.
Then we can see that if Φ is strongly semismooth,
then the order here will be two,
in other words,
the convergence rate will be quadratic.
This is a beautiful result.
Now we have finished the local convergence analysis.
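This quadratic rate can be checked numerically. The sketch below uses a strongly semismooth scalar function of our own choosing, Φ(x) = x|x| + x − 2 (piecewise C², root at x bar = 1), and monitors the error ratios:

```python
def phi(x):
    # Phi(x) = x|x| + x - 2 is strongly semismooth; its root is x_bar = 1.
    return x * abs(x) + x - 2.0

def jac(x):
    # The derivative 2|x| + 1 exists for all x and is never zero, so every
    # generalized Jacobian element near the root is nonsingular.
    return 2.0 * abs(x) + 1.0

x, x_bar = 3.0, 1.0
errors = []
for _ in range(8):
    errors.append(abs(x - x_bar))
    x = x - phi(x) / jac(x)          # nonsmooth Newton update

# For gamma = 1 (strong semismoothness) the theory predicts
# ||x_{k+1} - x_bar|| = O(||x_k - x_bar||^2): the ratios below stay bounded.
ratios = [errors[k + 1] / errors[k] ** 2
          for k in range(len(errors) - 1) if errors[k] > 1e-8]
print(ratios)
```

The bounded ratios e_{k+1}/e_k^2 are exactly the footprint of quadratic convergence; a merely superlinear method would show e_{k+1}/e_k tending to zero, but these squared-error ratios would blow up.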
Let us make a brief summary.
If we try to get the local convergence result,
we need two essential conditions.
The first one is the semismoothness,
or strong semismoothness, of the function Φ.
Another condition is that
we need the nonsingularity of all the matrices V
in the generalized Jacobian set.
Why do we need all the matrices V to be nonsingular?
Because in the nonsmooth Newton update,
we may pick any V, so we cannot specify in advance
which V will be chosen.
We have to assume that
all matrices V in the generalized Jacobian set are nonsingular.
With the two conditions,
then we can guarantee the superlinear or quadratic convergence rate.
Now let's take a look at the global convergence result
for semismooth Newton's method.
And we consider solving the minimization problem.
We try to minimize function f,
which we assume that function f
is continuously differentiable,
so it is equivalent to solving the equation: gradient of f equal to zero.
Then we can see that
this gradient equation is exactly the nonlinear system
that we are going to solve by Newton's method,
or semismooth Newton's method.
We apply semismooth Newton's method directly
to solve this first order optimality conditions,
and this is the update.
We assume that the gradient f is semismooth,
and also the generalized Jacobian set is nonsingular,
then the local convergence can be guaranteed.
So how we can get global convergence?
To reach global convergence,
as we recall in chapter two and three,
we need a step length,
chosen by line search,
so we modify the update as follows.
We update the next iterate
by taking the semismooth Newton direction
and scaling it by a step length αk from the line search.
How do we do the line search?
Next,
we present a globalized version of semismooth Newton's method
using the backtracking line search strategy.
Now the algorithm is presented here,
where we replace the variable x by ω.
The settings are as follows.
We choose an arbitrary starting point ω0,
a parameter σ between zero and one,
a ratio ρ between zero and one,
and constants δ,
η0 and η1.
The first step is that we check whether we need to stop.
If the norm of the gradient of f is small enough,
we can regard the optimality condition as reached,
and we stop.
Otherwise,
we update and we go to the second step.
The second step is that
we choose any element from the generalized Jacobian set,
and we solve the Newton's direction by conjugate gradient method,
so that's why we introduce conjugate gradient method
in chapter five.
It is very important to apply CG to solve this linear system;
in other words,
we do not have to form the inverse
to get the exact Newton direction,
we just compute an inexact direction.
This is the stopping criterion for solving the linear system,
and it means that we only require the residual
of the linear system
to be bounded by a constant times the norm of the gradient
at the current iterate.
By doing this,
we can already show the global convergence.
And the third one is the step length that we use.
We apply the backtracking strategy to do the step length.
That means we start with mk equal to zero,
and increase mk one by one
until the following condition holds.
Then we set αk = ρ^mk.
After that,
then we can get the next iteration in the following form,
and the cycle goes on and on.
By extending the nonsmooth Newton's method
to this globalized version,
it can be successfully used
to solve unconstrained optimization problems.
And we do not have to worry about
how to choose the initial point.
This is a very typical version
for globalized semismooth Newton's method.
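To make the three steps concrete, here is a runnable sketch on a small test problem. The objective f, its semismooth gradient, and all parameter values are our own illustrative assumptions; the structure (optimality check on the gradient norm, CG on an element of the generalized Jacobian, backtracking with ratio ρ) follows the algorithm just described:

```python
import numpy as np

# Illustrative test objective (our own choice): f has the semismooth gradient
#   grad f(w) = max(w, 0) + (w - a),
# so grad f = 0 is a semismooth nonlinear system.
def f(w, a):
    return 0.5 * np.sum(np.maximum(w, 0.0) ** 2) + 0.5 * np.sum((w - a) ** 2)

def grad(w, a):
    return np.maximum(w, 0.0) + (w - a)

def jac_element(w):
    # One element of the generalized Jacobian of grad f at w:
    # Diag(1_{w > 0}) + I, which is always nonsingular here.
    return np.diag((w > 0).astype(float)) + np.eye(w.size)

def cg(V, b, tol):
    # Minimal conjugate gradient for V d = b; we stop once the residual
    # norm is below tol, i.e. we only solve the system inexactly.
    d = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(10 * b.size):
        if np.sqrt(rs) <= tol:
            break
        Vp = V @ p
        alpha = rs / (p @ Vp)
        d += alpha * p
        r -= alpha * Vp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def globalized_ssn(w0, a, sigma=1e-4, rho=0.5, eta=0.1, tol=1e-10, max_iter=100):
    w = w0.copy()
    for _ in range(max_iter):
        g = grad(w, a)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:                    # step 1: optimality check
            break
        V = jac_element(w)                  # step 2: pick V, solve V d = -g
        d = cg(V, -g, tol=eta * gnorm)      #         inexactly by CG
        m = 0                               # step 3: backtracking line search
        while (m < 50 and
               f(w + rho**m * d, a) > f(w, a) + sigma * rho**m * (g @ d)):
            m += 1
        w = w + rho**m * d                  # alpha_k = rho**m
    return w

a = np.array([2.0, -1.0, 0.5])
w_star = globalized_ssn(np.zeros(3), a)
print(w_star)   # converges to [1, -1, 0.25]
```

Note that the CG tolerance is tied to the current gradient norm, exactly in the spirit of the residual condition above: far from the solution we solve the linear system only crudely, and the accuracy tightens automatically as the gradient shrinks.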
Now,
for the global convergence result:
after adding the step length strategy,
we can guarantee global convergence
for the globalized version
of semismooth Newton's method presented above.
To make a brief summary,
the globalized version
of semismooth Newton's method presented above
maintains both the global convergence result
and the fast local convergence rate,
so it is a very good algorithm.
Now let's make a brief summary about this part.
We have introduced the nonsmooth version of Newton's method
and discussed the local convergence rate,
as well as the globalized version of Newton's method.
We stop here and see you next time.