Current course topic: Modern Optimization Methods (English) > Chapter 6. Semismooth Newton's Method > 6.1 Semismoothness
Hi, welcome to modern optimization methods.
Today we will introduce chapter six,
semismooth Newton's method.
This is the outline of this chapter.
We will start with the definition of semismoothness.
Then we introduce the semismooth Newton's method.
After that,
we will discuss the support vector machine problem.
And we will apply the semismooth Newton's method
to that SVM problem.
Finally,
we will present some numerical results by
exploring the sparsity of the problem.
Now we start with the first part,
the semismoothness.
Semismoothness was originally proposed by Mifflin.
Before that,
we need some preparation in terms of definitions.
For example,
we assume that Φ, a mapping from the m-dimensional space
to the l-dimensional space, is a locally Lipschitz continuous function.
And according to Rademacher's theorem,
Φ is differentiable almost everywhere.
Based on this assumption on Φ,
let us define the set where Φ is differentiable as DΦ,
so DΦ represents those points
at which Φ is differentiable.
We collect all the differentiable points
and put them in DΦ.
Let Φ'(x) denote the Jacobian of Φ
at any differentiable point x in DΦ.
These are some preliminary notations that
we will use later.
Based on the notations here,
we make further definitions.
We introduce the Bouligand subdifferential of Φ at x.
It is defined through the accumulation points of Φ'(xk),
where xk is a sequence of differentiable points of Φ
converging to x.
Then Φ'(xk) converges, along a subsequence, to an accumulation point,
and we denote such a point by V.
The collection of all these points V
is defined as the Bouligand subdifferential of Φ at x.
We use ∂BΦ(x) to denote this Bouligand subdifferential.
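In symbols, the construction just described reads:

```latex
\partial_B \Phi(x) := \Bigl\{ V \;:\; V = \lim_{k \to \infty} \Phi'(x_k),
\;\; x_k \to x,\;\; x_k \in D_\Phi \Bigr\}.
```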
So let's take a look at a simple example.
The function φ is the maximum of zero and x,
that is, φ(x) = max(0, x).
Then from this definition,
we can see that
the set of points where φ is differentiable
is (-∞, 0) together with (0, +∞).
In other words,
except at zero,
φ is continuously differentiable
at every point of R.
The point zero
is the main case we need to think about.
For points x that are not zero,
we can easily calculate φ'(x),
the Jacobian.
For x greater than zero,
it is one.
For x smaller than zero,
it is zero.
Therefore,
at the point zero,
we can calculate the Bouligand subdifferential
according to the definition above.
We simply take a negative sequence
and a positive sequence tending to zero,
and take the limits of φ'.
Then we get that the Bouligand subdifferential
at zero is the two-point set {0, 1}.
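As a concrete illustration, here is a minimal Python sketch (the helper name phi_prime is ours, not from the lecture) that reproduces the two derivative limits just computed:

```python
# phi(x) = max(0, x); at every differentiable point x != 0
# its derivative is 1 for x > 0 and 0 for x < 0.
def phi_prime(x):
    return 1.0 if x > 0 else 0.0

# A positive sequence x_k -> 0: phi'(x_k) = 1 for every k.
right = [phi_prime(10.0 ** (-k)) for k in range(1, 6)]
# A negative sequence x_k -> 0: phi'(x_k) = 0 for every k.
left = [phi_prime(-(10.0 ** (-k))) for k in range(1, 6)]

# The accumulation points of these derivative sequences form
# the Bouligand subdifferential at 0: the two-point set {0, 1}.
bouligand_at_zero = {right[-1], left[-1]}
print(bouligand_at_zero)  # the set {0.0, 1.0}
```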
Now let's introduce the convex hull of a set.
The convex hull of a set is defined
as the set of all convex combinations of points in the set.
Recall that we introduced convex combinations
in chapter one,
which means that all the coefficients of the points xi
lie between zero and one,
and the coefficients sum to one.
In that case,
the result is called a convex combination
of the points x1 to xn.
For a set A, we take any elements from A
and form all possible convex combinations;
the resulting set is denoted as the convex hull of A.
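In symbols, the convex hull of a set A can be written as:

```latex
\mathrm{conv}(A) := \Bigl\{ \sum_{i=1}^{n} \lambda_i x_i \;:\;
x_i \in A,\;\; \lambda_i \ge 0,\;\; \sum_{i=1}^{n} \lambda_i = 1,
\;\; n \in \mathbb{N} \Bigr\}.
```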
Coming back to our simple example,
if A is the two-point set {0, 1}
and we take all convex combinations,
then the resulting set is the interval between zero and one,
so this is the convex hull of A.
Now after that,
we are ready to give the Clarke's generalized Jacobian.
The Clarke's generalized Jacobian at a point x
is simply the convex hull of the Bouligand subdifferential.
In other words,
the Clarke's generalized Jacobian,
which is denoted as ∂Φ(x),
is the convex hull of the Bouligand subdifferential ∂BΦ(x).
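In symbols:

```latex
\partial \Phi(x) := \mathrm{conv}\, \partial_B \Phi(x).
```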
Now coming back to our example,
the generalized Jacobian of the max function
is given as follows.
It depends on the value of t.
If t is zero,
then by the definition,
as we mentioned,
it is the convex hull of {0, 1}.
Therefore,
it is the set of values v with v between zero and one,
so it is a multi-valued map at this point.
If t is not zero,
then φ is differentiable there.
Therefore,
the Clarke's generalized Jacobian coincides
with the Jacobian of φ at t.
Therefore,
if t is greater than zero,
then the Clarke's generalized Jacobian is one.
If t is smaller than zero,
then it is zero.
This is a simple example to demonstrate
what the Clarke's generalized Jacobian is.
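Collecting the cases above, the Clarke's generalized Jacobian of φ(t) = max(0, t) is:

```latex
\partial \varphi(t) =
\begin{cases}
\{1\}, & t > 0,\\
[0,1], & t = 0,\\
\{0\}, & t < 0.
\end{cases}
```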
Now this figure shows us
how the generalized Jacobian works for the max function.
This is the curve of the function max(0, t).
We can see that this curve
is differentiable everywhere except at zero,
so in other words,
except at t equals zero,
all the other points have a unique Clarke's generalized Jacobian.
If t is zero,
then we have to calculate
the generalized Jacobian
by first calculating the Bouligand subdifferential
and then taking the convex hull.
The Bouligand subdifferential consists of the limit points of the gradient,
so we take a sequence from the left-hand side
approaching zero,
along which φ' will be zero,
and another sequence from the right-hand side tending to zero,
along which φ' is one.
The Bouligand subdifferential is the two-point set {0, 1}.
Then we take the convex hull to get the interval [0, 1].
Now, having given the generalized Jacobian,
we are ready to give the definition of semismoothness.
The concept of semismoothness was introduced
by Mifflin, first for functionals.
It was then extended to vector-valued functions
by Qi and Sun.
The definition of semismoothness has two conditions.
The first one is that Φ is directionally differentiable at x.
The other one is that for any matrix V
in the generalized Jacobian of Φ
at x+h, we have this expansion.
If we take a detailed look at this expansion,
it basically says that the function value
Φ(x+h) minus the function value Φ(x)
minus V multiplied by h
is of higher order than the norm of h.
is a bit similar to the Taylor's expansion.
In fact,
as we compare this equation with Taylor's expansion.
The difference we can find that it lies in this V.
In Taylor's expansion,
if we do Taylor's expansion of this function Φ at x,
then the V here should be the subdifferential,
or the Jacobian of Φ at x.
However,
for the definition of semismoothness,
such Φ actually comes from the subdifferential
or generalized Jacobian of Φ at x+h.
It's at this point rather than x,
so in other words,
it actually means that if this h comes from the left hand side,
then this formula basically means that
I do expansion along the left hand side,
then I can get the high order approximation.
The V actually depends on the coming direction h
or the changing direction h.
If the second formula holds together with the first one,
then the Φ is said to be semismooth.
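The expansion discussed above can be written compactly as:

```latex
\Phi(x+h) - \Phi(x) - Vh = o(\|h\|),
\qquad \forall\, V \in \partial \Phi(x+h),\ \ h \to 0.
```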
Now we also have a very special definition
called γ-order semismoothness.
It means that
Φ is semismooth,
and in addition the following equation holds.
This equation is similar to the one in the semismoothness definition.
However,
the difference lies in the order of h.
If we take a look at the order of h,
we can see that
the left-hand side is of the order
of the norm of h to the power 1+γ,
where γ is a positive number,
so this is called γ-order semismoothness.
Up to here,
we have defined two notions of semismoothness.
One is general semismoothness.
The second is called γ-order semismoothness.
There is a very special case of γ-order semismoothness:
the case where γ equals one.
Then the equation reduces to saying that
the expansion residual of Φ at x+h is of the order
of the squared norm of h.
In this case,
Φ is called strongly semismooth.
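The γ-order condition referenced above can be written as:

```latex
\Phi(x+h) - \Phi(x) - Vh = O(\|h\|^{1+\gamma}),
\qquad \forall\, V \in \partial \Phi(x+h),\ \ h \to 0,
```

and taking γ = 1 gives the O(‖h‖²) residual of the strongly semismooth case.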
We will talk about strongly semismooth later
when we apply our semismooth Newton's method
to support vector machine.
Now again,
we use this figure to illustrate semismoothness.
Actually,
the max function
is strongly semismooth.
As we can see,
strong semismoothness basically says that
if we do a linear expansion along the incoming direction h,
we get a high-order approximation.
The max function is basically a piecewise linear function;
in other words,
if we approach zero from the left-hand side,
the curve is a straight line.
In this expansion,
we can calculate that the residual term is exactly zero,
because there is only a linear term,
and this linear term is fully captured by Vh.
The residual will be zero;
in other words,
the max function is a strongly semismooth function.
As a little extension,
any piecewise linear function
is strongly semismooth,
so this is the result.
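To make this tangible, here is a small Python sketch (the helper names phi and clarke_jacobian_element are ours, not from the lecture) checking that the semismoothness residual of max(0, t) at t = 0 is exactly zero, which in particular is O(|h|²):

```python
def phi(t):
    """phi(t) = max(0, t), a piecewise linear function."""
    return max(0.0, t)

def clarke_jacobian_element(t):
    """One element V of the Clarke generalized Jacobian of phi at t.
    For t != 0 it is the unique derivative; at t = 0 any V in [0, 1]
    would do, and we simply pick V = 0."""
    return 1.0 if t > 0 else 0.0

for h in [1e-1, 1e-4, -1e-1, -1e-4]:
    # Note V is taken at the point x + h (here x = 0), as the
    # definition of semismoothness requires.
    V = clarke_jacobian_element(0.0 + h)
    residual = phi(0.0 + h) - phi(0.0) - V * h
    # Piecewise linearity makes the residual exactly zero,
    # which is certainly O(|h|^2): strong semismoothness.
    print(h, residual)
```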
As for other results about semismooth functions,
we can see that
any composition of semismooth functions is semismooth,
and any composition of strongly semismooth functions
is also strongly semismooth.
These are some existing results about semismoothness;
in other words,
if you try to verify whether a function is semismooth,
one way is by definition,
and another way is by existing properties like these.
Now let's make a brief summary about this part.
In this part,
we mainly introduce the definition of semismoothness.
To reach the definition of semismoothness,
we first introduce the Bouligand subdifferential,
then the convex hull and the Clarke's generalized Jacobian,
and finally we reach the semismoothness.
We stop here and see you next time.