Current course topic: Modern Optimization Methods (in English) > Chapter 9. Penalty and Augmented Lagrangian Methods > Homework > 6.2 Semismooth Newton's Method
Hi, welcome to modern optimization methods.
Today we will introduce the second part of chapter six,
semismooth Newton's method.
Now let's first recall the definition for semismoothness.
As we introduced in the previous part of this chapter,
semismoothness needs to satisfy two conditions.
The first one is that we require the vector function Φ
to be directionally differentiable.
The second is that for any V in the generalized Jacobian of Φ at x+h,
we need Φ(x+h) − Φ(x) − Vh = o(‖h‖) as h → 0.
Furthermore,
if this left-hand side is of the same order as ‖h‖^(1+γ),
that is, Φ(x+h) − Φ(x) − Vh = O(‖h‖^(1+γ)),
then we say Φ is γ-order semismooth.
Now let's present the semismooth Newton's method.
Actually,
let's first make a brief review of its history.
As we mentioned before,
the semismoothness definition was originally proposed
by Mifflin in 1977.
Then it was extended to Newton's method by Qi and Sun in 1993.
After that,
it was widely used for vector-valued problems.
The first extension to matrix problems
was proposed by Qi and Sun in 2006, for the nearest correlation matrix problem.
After that,
semismooth Newton's method received great attention
from the optimization community.
To this day,
it serves as an efficient subsolver
for quadratic semidefinite programming problems,
and also for nonlinear programming problems.
Now let's take a look at the nonsmooth version
of Newton's method.
Actually,
we consider solving a nonlinear equation
instead of solving optimization problem.
We have this Φ,
as we defined before,
mapping from n-dimensional space to n-dimensional space.
And Φ(x) equals zero is the system we are trying to solve,
so the classical Newton's method to solve the system
will update xk like this.
In each iteration,
the next iterate x(k+1) will be generated
by xk minus the Newton direction,
where the Newton direction is Vk^(-1) multiplied by Φ(xk).
Here,
Vk is the Jacobian of Φ at xk,
so in this case,
Newton's method actually assumes that Φ is differentiable.
However,
for nonsmooth Newton's method,
it does not require Φ to be continuously differentiable.
We only assume that Φ is a locally Lipschitz function
and satisfies suitable semismoothness properties,
so in this case,
we get the nonsmooth Newton's method,
meaning that when we update the Newton's direction,
we choose this matrix Vk
from the Clarke's generalized Jacobian set.
The ∂Φ(xk) is what we have defined in the previous part,
it is the Clarke's generalized Jacobian,
which is normally a set.
This is the nonsmooth Newton's method,
in other words,
if at this point xk,
Φ is differentiable,
then this Vk is a unique matrix,
because the generalized Jacobian reduces to the singleton
containing the ordinary Jacobian.
However,
if Φ is not differentiable at xk,
we need to choose Vk from a set,
and any element of this set works for the update.
This is the definition for nonsmooth Newton's method.
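As a concrete sketch (the toy function Φ, its root, and the starting point below are our own illustrative assumptions, not from the lecture), the update xk − Vk^(-1)Φ(xk) with Vk chosen from the generalized Jacobian looks like this in code:

```python
def phi(x):
    # A semismooth (piecewise linear) scalar equation: Phi(x) = |x| + 2x - 1.
    # Its unique root is x = 1/3.
    return abs(x) + 2.0 * x - 1.0

def jac_element(x):
    # One element of Clarke's generalized Jacobian of Phi at x.
    # For x != 0 it is the ordinary derivative (3 or 1); at x = 0 the
    # generalized Jacobian is the interval [1, 3] and we may pick any value.
    return 3.0 if x >= 0 else 1.0

def nonsmooth_newton(x0, tol=1e-12, max_iter=50):
    x = x0
    for k in range(max_iter):
        f = phi(x)
        if abs(f) <= tol:
            return x, k
        V = jac_element(x)      # pick any V_k in dPhi(x_k)
        x = x - f / V           # x_{k+1} = x_k - V_k^{-1} Phi(x_k)
    return x, max_iter

root, iters = nonsmooth_newton(-1.0)
print(root, iters)   # root is 1/3; two updates suffice here
```

Because Φ here is piecewise linear, the iteration lands on the root after finitely many steps; the choice of V at a kink does not break the update.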
And the relation between the two is that,
when Φ is differentiable, then the nonsmooth version
will reduce to the classical Newton's method,
otherwise they are different.
Now let's take a look at the local convergence rate
of nonsmooth version of Newton's method.
We assume that every matrix in the generalized Jacobian set at x bar is nonsingular
and x0 is sufficiently close to x bar,
where this x bar is the solution of the nonlinear system.
If Φ is semismooth at x bar,
then we can analyze the local convergence rate
in the following way.
We reformulate the distance from the next iterate
to x bar.
Writing x(k+1) − x bar = xk − Vk^(-1)Φ(xk) − x bar
and pulling out Vk^(-1),
this equals −Vk^(-1)(Φ(xk) − Φ(x bar) − Vk(xk − x bar)).
Here we may insert the term Φ(x bar),
because Φ(x bar) is zero.
Now take a further look at this expression.
By our assumption,
the inverse of Vk is bounded.
And by the semismoothness property,
the second factor Φ(xk) − Φ(x bar) − Vk(xk − x bar)
is of higher order, namely o(‖xk − x bar‖).
So the distance from x(k+1) to x bar
is of higher order than the distance from xk to x bar.
In other words,
if we take the ratio of the distance at the next iterate
to the distance at the current iterate
and take the limit,
we can see that, by definition,
the convergence rate is superlinear,
because of the higher-order relationship between the two terms.
And furthermore,
if Φ is γ-order semismooth,
then the higher-order term can be sharpened to the (1+γ) power
of the distance from xk to x bar,
so the convergence order is 1+γ.
Recall strong semismoothness,
which corresponds to γ equal to one.
Then we can see that if Φ is strongly semismooth,
then the order here will be two,
in other words,
the convergence rate will be quadratic.
This is a beautiful result.
Now we have finished the local convergence analysis.
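This quadratic rate can be checked numerically. The sketch below uses a strongly semismooth scalar function of our own choosing, Φ(x) = x|x| + x − 2 (piecewise C², root at x bar = 1), and monitors the error ratios:

```python
def phi(x):
    # Phi(x) = x|x| + x - 2 is strongly semismooth; its root is x_bar = 1.
    return x * abs(x) + x - 2.0

def jac(x):
    # The derivative 2|x| + 1 exists for all x and is never zero, so every
    # generalized Jacobian element near the root is nonsingular.
    return 2.0 * abs(x) + 1.0

x, x_bar = 3.0, 1.0
errors = []
for _ in range(8):
    errors.append(abs(x - x_bar))
    x = x - phi(x) / jac(x)          # nonsmooth Newton update

# For gamma = 1 (strong semismoothness) the theory predicts
# ||x_{k+1} - x_bar|| = O(||x_k - x_bar||^2): the ratios below stay bounded.
ratios = [errors[k + 1] / errors[k] ** 2
          for k in range(len(errors) - 1) if errors[k] > 1e-8]
print(ratios)
```

The bounded ratios e_{k+1}/e_k^2 are exactly the footprint of quadratic convergence; a merely superlinear method would show e_{k+1}/e_k tending to zero, but these squared-error ratios would blow up.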
Let us make a brief summary.
If we try to get the local convergence result,
we need two essential conditions.
The first one is the semismoothness,
or strong semismoothness, of the function Φ.
Another condition is that
we need the nonsingularity of all the matrices V
in the generalized Jacobian set.
Why do we need all the matrices V to be nonsingular?
Because in the nonsmooth Newton update,
we may pick any V, so we cannot specify in advance
which V will be chosen.
We have to assume that
all matrices V in the generalized Jacobian set are nonsingular.
With the two conditions,
then we can guarantee the superlinear or quadratic convergence rate.
Now let's take a look at the global convergence result
for semismooth Newton's method.
And we consider solving the minimization problem.
We try to minimize function f,
which we assume that function f
is continuously differentiable,
so it is equivalent to solving the equation: gradient of f equal to zero.
Then we can see that
this gradient equation is exactly the nonlinear system
that we are going to solve by Newton's method,
or semismooth Newton's method.
We apply semismooth Newton's method directly
to solve this first order optimality conditions,
and this is the update.
We assume that the gradient f is semismooth,
and also the generalized Jacobian set is nonsingular,
then the local convergence can be guaranteed.
So how we can get global convergence?
To reach global convergence,
as we recall in chapter two and three,
we need a step length,
chosen by line search,
so we modify the update as follows.
We update the next iterate
by taking the semismooth Newton direction
and scaling it by a step length αk from the line search.
How do we do the line search?
Next,
we present a globalized version of semismooth Newton's method
using the backtracking line search strategy.
Now the algorithm is presented here,
where we replace the variable x by ω.
The settings are as follows.
We choose an arbitrary starting point ω0,
a parameter σ between zero and one,
a ratio ρ between zero and one,
and constants δ,
η0 and η1.
The first step is that we check whether we need to stop.
If the norm of the gradient of f is small enough,
we can regard the optimality condition as reached,
and we stop.
Otherwise,
we update and we go to the second step.
The second step is that
we choose any element from the generalized Jacobian set,
and we solve the Newton's direction by conjugate gradient method,
so that's why we introduce conjugate gradient method
in chapter five.
It is very important to apply CG to solve this linear system;
in other words,
we do not have to form the inverse
to get the exact Newton direction,
we just compute an inexact direction.
This is the stopping criterion for solving the linear system,
and it means that we only require the residual
of the linear system
to be bounded by a constant times the norm of the gradient
at the current iterate.
By doing this,
we can already show the global convergence.
And the third one is the step length that we use.
We apply the backtracking strategy to do the step length.
That means we start with mk equal to zero,
and increase mk one by one
until the following condition holds.
Then we set αk = ρ^mk.
After that,
then we can get the next iteration in the following form,
and the cycle goes on and on.
By extending the nonsmooth Newton's method
to this globalized version,
it can be successfully used
to solve unconstrained optimization problems.
And we do not have to worry about
how to choose the initial point.
This is a very typical version
for globalized semismooth Newton's method.
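To make the three steps concrete, here is a runnable sketch on a small test problem. The objective f, its semismooth gradient, and all parameter values are our own illustrative assumptions; the structure (optimality check on the gradient norm, CG on an element of the generalized Jacobian, backtracking with ratio ρ) follows the algorithm just described:

```python
import numpy as np

# Illustrative test objective (our own choice): f has the semismooth gradient
#   grad f(w) = max(w, 0) + (w - a),
# so grad f = 0 is a semismooth nonlinear system.
def f(w, a):
    return 0.5 * np.sum(np.maximum(w, 0.0) ** 2) + 0.5 * np.sum((w - a) ** 2)

def grad(w, a):
    return np.maximum(w, 0.0) + (w - a)

def jac_element(w):
    # One element of the generalized Jacobian of grad f at w:
    # Diag(1_{w > 0}) + I, which is always nonsingular here.
    return np.diag((w > 0).astype(float)) + np.eye(w.size)

def cg(V, b, tol):
    # Minimal conjugate gradient for V d = b; we stop once the residual
    # norm is below tol, i.e. we only solve the system inexactly.
    d = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(10 * b.size):
        if np.sqrt(rs) <= tol:
            break
        Vp = V @ p
        alpha = rs / (p @ Vp)
        d += alpha * p
        r -= alpha * Vp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

def globalized_ssn(w0, a, sigma=1e-4, rho=0.5, eta=0.1, tol=1e-10, max_iter=100):
    w = w0.copy()
    for _ in range(max_iter):
        g = grad(w, a)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:                    # step 1: optimality check
            break
        V = jac_element(w)                  # step 2: pick V, solve V d = -g
        d = cg(V, -g, tol=eta * gnorm)      #         inexactly by CG
        m = 0                               # step 3: backtracking line search
        while (m < 50 and
               f(w + rho**m * d, a) > f(w, a) + sigma * rho**m * (g @ d)):
            m += 1
        w = w + rho**m * d                  # alpha_k = rho**m
    return w

a = np.array([2.0, -1.0, 0.5])
w_star = globalized_ssn(np.zeros(3), a)
print(w_star)   # converges to [1, -1, 0.25]
```

Note that the CG tolerance is tied to the current gradient norm, exactly in the spirit of the residual condition above: far from the solution we solve the linear system only crudely, and the accuracy tightens automatically as the gradient shrinks.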
Now,
for the global convergence result:
after adding the step length strategy,
we can guarantee global convergence
for the globalized version
of semismooth Newton's method presented above.
To make a brief summary,
the globalized version
of semismooth Newton's method presented above
maintains both the global convergence result
and the fast local convergence rate,
so it is a very good algorithm.
Now let's make a brief summary about this part.
We have introduced the nonsmooth version of Newton's method
and discussed the local convergence rate,
as well as the globalized version of Newton's method.
We stop here and see you next time.