Current course topic: Modern Optimization Methods (English) > Chapter 6. Semismooth Newton's Method > 6.1 Semismoothness
Hi, welcome to modern optimization methods.
Today we will introduce chapter six,
semismooth Newton's method.
This is the outline of this chapter.
We will start with the definition of semismoothness.
Then we introduce the semismooth Newton's method.
After that,
we will discuss the support vector machine problem.
And we will apply the semismooth Newton's method
to that SVM problem.
Finally,
we will present some numerical results by
exploring the sparsity of the problem.
Now we start with the first part,
the semismoothness.
Semismoothness was originally proposed by Mifflin.
Before that,
we need some preparation in terms of definitions.
For example,
we assume that Φ, a mapping from the m-dimensional space
to the l-dimensional space, is a locally Lipschitz continuous function.
And according to Rademacher's theorem,
Φ is differentiable almost everywhere.
Based on this assumption on Φ,
let us define the set where Φ is differentiable as DΦ,
so DΦ represents those points
at which Φ is differentiable.
We collect all the differentiable points
and put them in DΦ.
Let Φ'(x) denote the Jacobian of Φ
at any differentiable point x in DΦ.
These are some preliminary notations that
we will use later.
Based on the notations here,
we make further definitions.
We introduce the Bouligand subdifferential of Φ at x.
It is defined through the accumulation points of Φ'(xk),
where xk is a sequence of differentiable points of Φ
converging to x.
Then Φ'(xk) converges, along a subsequence, to an accumulation point,
and we denote such a point by V.
The collection of all these points V
is defined as the Bouligand subdifferential of Φ at x.
We use ∂BΦ(x) to denote this Bouligand subdifferential.
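In symbols, the construction just described reads:

```latex
\partial_B \Phi(x) := \Bigl\{ V \;:\; V = \lim_{k \to \infty} \Phi'(x_k),
\;\; x_k \to x,\;\; x_k \in D_\Phi \Bigr\}.
```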
So let's take a look at a simple example.
The function φ is the maximum of zero and x,
that is, φ(x) = max(0, x).
Then from this definition,
we can see that
the set of points where φ is differentiable
is (-∞, 0) together with (0, +∞).
In other words,
except at zero,
φ is continuously differentiable
at every point of R.
The point zero
is the main case we need to think about.
For points x that are not zero,
we can easily calculate φ'(x),
the Jacobian.
For x greater than zero,
it is one.
For x smaller than zero,
it is zero.
Therefore,
at the point zero,
we can calculate the Bouligand subdifferential
according to the definition above.
We simply take a negative sequence
and a positive sequence tending to zero,
and take the limits of φ'.
Then we get that the Bouligand subdifferential
at zero is the two-point set {0, 1}.
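As a concrete illustration, here is a minimal Python sketch (the helper name phi_prime is ours, not from the lecture) that reproduces the two derivative limits just computed:

```python
# phi(x) = max(0, x); at every differentiable point x != 0
# its derivative is 1 for x > 0 and 0 for x < 0.
def phi_prime(x):
    return 1.0 if x > 0 else 0.0

# A positive sequence x_k -> 0: phi'(x_k) = 1 for every k.
right = [phi_prime(10.0 ** (-k)) for k in range(1, 6)]
# A negative sequence x_k -> 0: phi'(x_k) = 0 for every k.
left = [phi_prime(-(10.0 ** (-k))) for k in range(1, 6)]

# The accumulation points of these derivative sequences form
# the Bouligand subdifferential at 0: the two-point set {0, 1}.
bouligand_at_zero = {right[-1], left[-1]}
print(bouligand_at_zero)  # the set {0.0, 1.0}
```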
Now let's introduce the convex hull of a set.
The convex hull of a set is defined
as the set of all convex combinations of points in the set.
Recall that we introduced convex combinations
in chapter one,
which means that all the coefficients of the points xi
lie between zero and one,
and the coefficients sum to one.
In that case,
the result is called a convex combination
of the points x1 to xn.
For a set A, we take any elements from A
and form all possible convex combinations;
the resulting set is denoted as the convex hull of A.
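In symbols, the convex hull of a set A can be written as:

```latex
\mathrm{conv}(A) := \Bigl\{ \sum_{i=1}^{n} \lambda_i x_i \;:\;
x_i \in A,\;\; \lambda_i \ge 0,\;\; \sum_{i=1}^{n} \lambda_i = 1,
\;\; n \in \mathbb{N} \Bigr\}.
```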
Coming back to our simple example,
if A is the two-point set {0, 1}
and we take all convex combinations,
then the resulting set is the interval between zero and one,
so this is the convex hull of A.
Now after that,
we are ready to give the Clarke's generalized Jacobian.
The Clarke's generalized Jacobian at a point x
is simply the convex hull of the Bouligand subdifferential.
In other words,
the Clarke's generalized Jacobian,
which is denoted as ∂Φ(x),
is the convex hull of the Bouligand subdifferential ∂BΦ(x).
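In symbols:

```latex
\partial \Phi(x) := \mathrm{conv}\, \partial_B \Phi(x).
```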
Now coming back to our example,
the generalized Jacobian of the max function
is given as follows.
It depends on the value of t.
If t is zero,
then by the definition,
as we mentioned,
it is the convex hull of {0, 1}.
Therefore,
it is the set of values v with v between zero and one,
so it is a multi-valued map at this point.
If t is not zero,
then φ is differentiable there.
Therefore,
the Clarke's generalized Jacobian coincides
with the Jacobian of φ at t.
Therefore,
if t is greater than zero,
then the Clarke's generalized Jacobian is one.
If t is smaller than zero,
then it is zero.
This is a simple example to demonstrate
what the Clarke's generalized Jacobian is.
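Collecting the cases above, the Clarke's generalized Jacobian of φ(t) = max(0, t) is:

```latex
\partial \varphi(t) =
\begin{cases}
\{1\}, & t > 0,\\
[0,1], & t = 0,\\
\{0\}, & t < 0.
\end{cases}
```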
Now this figure shows us
how the generalized Jacobian works for the max function.
This is the curve of the function max(0, t).
We can see that this curve
is differentiable everywhere except at zero,
so in other words,
except at t equals zero,
all the other points have a unique Clarke's generalized Jacobian.
If t is zero,
then we have to calculate
the generalized Jacobian
by first calculating the Bouligand subdifferential
and then taking the convex hull.
The Bouligand subdifferential consists of the limit points of the gradient,
so we take a sequence from the left-hand side
approaching zero,
along which φ' will be zero,
and another sequence from the right-hand side tending to zero,
along which φ' is one.
The Bouligand subdifferential is the two-point set {0, 1}.
Then we take the convex hull to get the interval [0, 1].
Now, having given the generalized Jacobian,
we are ready to give the definition of semismoothness.
The concept of semismoothness was introduced
by Mifflin, first for functionals.
It was then extended to vector-valued functions
by Qi and Sun.
The definition of semismoothness has two conditions.
The first one is that Φ is directionally differentiable at x.
The other one is that for any matrix V
in the generalized Jacobian of Φ
at x+h, we have this expansion.
If we take a detailed look at this expansion,
it basically says that the function value
Φ(x+h) minus the function value Φ(x)
minus V multiplied by h
is of higher order than the norm of h.
is a bit similar to the Taylor's expansion.
In fact,
as we compare this equation with Taylor's expansion.
The difference we can find that it lies in this V.
In Taylor's expansion,
if we do Taylor's expansion of this function Φ at x,
then the V here should be the subdifferential,
or the Jacobian of Φ at x.
However,
for the definition of semismoothness,
such Φ actually comes from the subdifferential
or generalized Jacobian of Φ at x+h.
It's at this point rather than x,
so in other words,
it actually means that if this h comes from the left hand side,
then this formula basically means that
I do expansion along the left hand side,
then I can get the high order approximation.
The V actually depends on the coming direction h
or the changing direction h.
If the second formula holds together with the first one,
then the Φ is said to be semismooth.
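The expansion discussed above can be written compactly as:

```latex
\Phi(x+h) - \Phi(x) - Vh = o(\|h\|),
\qquad \forall\, V \in \partial \Phi(x+h),\ \ h \to 0.
```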
Now we also have a very special definition
called γ-order semismoothness.
It means that
Φ is semismooth,
and in addition the following equation holds.
This equation is similar to the one in the semismoothness definition.
However,
the difference lies in the order of h.
If we take a look at the order of h,
we can see that
the left-hand side is of the order
of the norm of h to the power 1+γ,
where γ is a positive number,
so this is called γ-order semismoothness.
Up to here,
we have defined two notions of semismoothness.
One is general semismoothness.
The second is called γ-order semismoothness.
There is a very special case of γ-order semismoothness:
the case where γ equals one.
Then the equation reduces to saying that
the expansion residual of Φ at x+h is of the order
of the squared norm of h.
In this case,
Φ is called strongly semismooth.
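The γ-order condition referenced above can be written as:

```latex
\Phi(x+h) - \Phi(x) - Vh = O(\|h\|^{1+\gamma}),
\qquad \forall\, V \in \partial \Phi(x+h),\ \ h \to 0,
```

and taking γ = 1 gives the O(‖h‖²) residual of the strongly semismooth case.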
We will talk about strongly semismooth later
when we apply our semismooth Newton's method
to support vector machine.
Now again,
we use this figure to illustrate semismoothness.
Actually,
the max function
is strongly semismooth.
As we can see,
strong semismoothness basically says that
if we do a linear expansion along the incoming direction h,
we get a high-order approximation.
The max function is basically a piecewise linear function;
in other words,
if we approach zero from the left-hand side,
the curve is a straight line.
In this expansion,
we can calculate that the residual term is exactly zero,
because there is only a linear term,
and this linear term is fully captured by Vh.
The residual will be zero;
in other words,
the max function is a strongly semismooth function.
As a little extension,
any piecewise linear function
is strongly semismooth,
so this is the result.
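To make this tangible, here is a small Python sketch (the helper names phi and clarke_jacobian_element are ours, not from the lecture) checking that the semismoothness residual of max(0, t) at t = 0 is exactly zero, which in particular is O(|h|²):

```python
def phi(t):
    """phi(t) = max(0, t), a piecewise linear function."""
    return max(0.0, t)

def clarke_jacobian_element(t):
    """One element V of the Clarke generalized Jacobian of phi at t.
    For t != 0 it is the unique derivative; at t = 0 any V in [0, 1]
    would do, and we simply pick V = 0."""
    return 1.0 if t > 0 else 0.0

for h in [1e-1, 1e-4, -1e-1, -1e-4]:
    # Note V is taken at the point x + h (here x = 0), as the
    # definition of semismoothness requires.
    V = clarke_jacobian_element(0.0 + h)
    residual = phi(0.0 + h) - phi(0.0) - V * h
    # Piecewise linearity makes the residual exactly zero,
    # which is certainly O(|h|^2): strong semismoothness.
    print(h, residual)
```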
As for other results about semismooth functions,
we can see that
any composition of semismooth functions is semismooth,
and any composition of strongly semismooth functions
is also strongly semismooth.
These are some existing results about semismoothness;
in other words,
if you try to verify whether a function is semismooth,
one way is by definition,
and another way is by existing properties like these.
Now let's make a brief summary about this part.
In this part,
we mainly introduce the definition of semismoothness.
To reach the definition of semismoothness,
we first introduce the Bouligand subdifferential,
then the convex hull and the Clarke's generalized Jacobian,
and finally we reach the semismoothness.
We stop here and see you next time.