Current course topic: Modern Optimization Methods (in English) > Chapter 9. Penalty and Augmented Lagrangian Methods > 9.7 Augmented Lagrangian Method for SVM: Ⅱ
Hi. Welcome to modern optimization methods.
Today we will introduce the last part of chapter nine,
applying augmented Lagrangian method
to solve L1-Loss SVM.
Now recall that
we are considering the L1-Loss SVM.
We try to minimize
the squared norm of ω
plus a nonsmooth function p(s).
Meanwhile,
we have some equality constraints here.
To solve this problem,
we apply the augmented Lagrangian method.
For the augmented Lagrangian method,
recall that the main focus
in the method is how to solve the subproblem.
We listed three key points
when solving the subproblem.
In the previous part,
we talked about the key points
in solving this subproblem.
Next, we will address the key points one
by one to show how we tackle them.
Now let's take a look at the first one,
how to calculate the proximal mapping.
Recall that the function p(s)
takes the following form:
it is basically a sum of
max functions.
So the proximal mapping
is the optimal solution
of this minimization problem.
This minimization is
called the Moreau-Yosida regularization:
given a vector z,
we minimize
the squared distance between z and s,
plus the function p(s).
For this problem,
in order to get the optimal solution,
we can decompose this
m-dimensional problem
into m one-dimensional problems.
Each of them is a quadratic
minimization problem in one variable.
Then, by discussing
the different cases,
we can get the different optimal solution values.
This leads to the proximal mapping
formula for this function.
With that, we have tackled the first key point.
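To make the case analysis concrete, here is a minimal sketch of the componentwise proximal mapping, assuming for illustration that each term of p is t ↦ c·max(t, 0) with a single constant c (the function name and the constant are our own illustrative choices, not taken from the slides):

```python
import numpy as np

def prox_max(z, c):
    """Componentwise proximal mapping of p(s) = c * sum(max(s_i, 0)).

    Each coordinate solves the one-dimensional quadratic problem
        min_s  0.5 * (s - z_i)^2 + c * max(s, 0),
    whose solution splits into three cases:
        z_i >  c        : s = z_i - c   (shifted down by c)
        0 <= z_i <= c   : s = 0         (thresholded to zero)
        z_i <  0        : s = z_i       (unchanged)
    """
    z = np.asarray(z, dtype=float)
    return np.where(z > c, z - c, np.where(z < 0.0, z, 0.0))
```

Note how this mirrors the decomposition on the slide: the m-dimensional problem is solved coordinate by coordinate, each in closed form.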
Now let's take a look at the subdifferential
of this proximal mapping.
This is in preparation for deriving the generalized Jacobian.
The subdifferential of
this proximal mapping is some matrix,
namely a diagonal matrix.
The diagonal element μi
is a sort of generalized
gradient of the max function.
You can see that
if zi is greater than some constant,
then μi is one;
if zi is between zero and this constant,
then μi is zero;
and if zi is smaller than zero,
it is again one.
When zi reaches a boundary point,
say zi equals zero or the constant,
then the subdifferential
is an interval, and μi can take any value in it.
We demonstrate the figure
of the proximal mapping on the left-hand side.
Compared with the classical l1-norm
proximal mapping,
it is a kind of shifted version
of the l1 mapping.
So this is the demonstration
of the proximal mapping,
as well as how we derive the subdifferential
of the proximal mapping.
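The diagonal elements μi just described can be computed with a few lines of code; this is a sketch under the same illustrative assumption as before (terms of the form c·max(t, 0)):

```python
import numpy as np

def prox_jacobian_diag(z, c):
    """Diagonal of one generalized Jacobian element of the proximal
    mapping of c * sum(max(s_i, 0)) at the point z.

    mu_i = 1 where the prox is locally the identity or a shift
    (z_i < 0 or z_i > c), and mu_i = 0 on the flat piece (0 < z_i < c).
    At the kink points z_i = 0 and z_i = c the subdifferential is the
    whole interval [0, 1]; here we pick mu_i = 1, which puts as many
    ones on the diagonal as possible (useful later for the cheap V*h
    product).
    """
    z = np.asarray(z, dtype=float)
    return np.where((z <= 0.0) | (z >= c), 1.0, 0.0)
```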
Now let's talk about
how we can get the generalized
Jacobian of the gradient φ.
As we mentioned in chapter 6,
the generalized Jacobian
is difficult to obtain
for very complicated functions.
However, by the chain rule,
we can identify
a larger set containing the generalized Jacobian.
In other words,
when multiplied by a vector h,
the generalized Jacobian
of φ can be characterized through this larger set,
which we denote with a hat.
For this larger set,
the formula is as follows.
We have the identity matrix,
plus a constant times another matrix.
The third part is a set,
because it contains the subdifferential
of the proximal mapping,
which we have discussed before.
In this way,
we can characterize
the generalized Jacobian as well.
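Based on the description above, the larger set can be sketched in the following form; the scalar σ and the matrix B are assumptions read off from the slide, since the transcript does not name them:

```latex
% A sketch of the larger set containing the generalized Jacobian of phi:
\hat{\partial}\varphi(z)
  \;=\; \bigl\{\, I \;+\; \sigma\, B^{\top} (I - M)\, B
        \;:\; M \in \partial \,\mathrm{prox}_{\sigma p}(z) \,\bigr\},
% where each M is one of the diagonal matrices Diag(mu)
% described on the previous slide.
```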
Now, we give the globalized
semismooth Newton method.
We introduced
this globalized version in chapter six.
We simply copy the algorithm here
and apply it to solve our subproblem.
The basic idea is that
we apply the line search strategy,
where the step αk
is provided by the backtracking strategy.
The search direction dk
is obtained by semismooth
Newton's method here.
When we apply semismooth
Newton's method to get the direction,
we have two issues to pay attention to.
The first one is that
the matrix V must be selected
from the larger set of the generalized Jacobian.
Recall that we use the hat
to denote this larger set
for the generalized Jacobian of φ.
So we choose V from this set.
The second issue is that
when we solve for the direction d,
we use the conjugate gradient method.
We do not solve the linear system directly;
instead, we apply the iterative
conjugate gradient solver
in order to save computational cost.
We will come back to this issue shortly.
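The globalized method just described can be sketched as follows. The function names, the backtracking test, and the tolerances are illustrative choices of ours, not the exact algorithm from chapter six:

```python
import numpy as np

def conjugate_gradient(Amul, b, tol=1e-10, max_iter=200):
    """Plain conjugate gradient for A x = b, where Amul(x) computes A @ x
    and A is assumed symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - Amul(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Ap = Amul(p)
        step = rs / (p @ Ap)
        x = x + step * p
        r = r - step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def semismooth_newton(x0, grad, pick_V, tol=1e-8, max_iter=100,
                      rho=0.5, delta=1e-4):
    """Sketch of a globalized semismooth Newton method.

    grad(x)   : the (possibly nonsmooth) gradient phi whose root we seek.
    pick_V(x) : returns a function h -> V @ h, with V chosen from the
                larger generalized Jacobian set at x.
    The direction d_k solves V d = -grad(x) inexactly by CG; the step
    alpha_k comes from a backtracking line search.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        gnorm = np.linalg.norm(g)
        if gnorm <= tol:
            break
        Vmul = pick_V(x)
        d = conjugate_gradient(Vmul, -g)
        # backtracking: shrink alpha until the residual norm decreases enough
        alpha = 1.0
        while (np.linalg.norm(grad(x + alpha * d)) > (1 - delta * alpha) * gnorm
               and alpha > 1e-12):
            alpha *= rho
        x = x + alpha * d
    return x
```

On a smooth quadratic toy problem this reduces to ordinary Newton's method and converges in one step; the semismooth machinery only matters because V may come from a set rather than being a classical Jacobian.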
Now, we take a look
at how we apply the conjugate
gradient method to solve this linear system.
The aim is to reduce the computational cost.
Notice that this V takes the following form.
It is one element
from the generalized Jacobian set.
If we stored this V explicitly,
we would need a lot of memory
as well as computational cost,
because there is a matrix multiplication here.
Therefore, the practical way is
to calculate the product Vh directly,
instead of storing V and then doing the multiplication.
So if we calculate Vh by this formula,
then we can see that the main
part we need to worry about is this one:
an identity matrix
minus the diagonal matrix.
Notice that each μi
is either zero or one.
So in order to save computational cost,
we need to find a way
to make sure that this part
contains as many zeros as possible.
So how can we choose this?
For each μi,
if it is one,
then the corresponding entry of the difference will be zero.
Therefore, we choose μi to be one at as many indices as possible.
This is where the index set I(z) comes in:
I(z) denotes the indices where zi lies
between zero and this constant.
In other words, if zi lives in this interval,
then μi will be zero.
Consequently, taking the difference leaves
only the rows of B indexed by I(z),
multiplied together with the transpose of those same rows.
So we can measure how small I(z) is:
if it contains fewer elements,
then we can save a lot of computational cost.
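A sketch of the matrix-free product, assuming (as the slide suggests, though the transcript does not state it exactly) that V = I + σ Bᵀ(I − Diag(μ))B; the function name and argument layout are our own:

```python
import numpy as np

def apply_V(h, B, mu, sigma):
    """Matrix-free product V @ h for (assumed form)
        V = I + sigma * B.T @ (I - diag(mu)) @ B.

    Since each mu_i is 0 or 1, the factor (I - diag(mu)) keeps only
    the rows of B with mu_i = 0, i.e. the index set I(z).  Restricting
    B to those rows before multiplying avoids forming V explicitly,
    and the cost scales with |I(z)| instead of the full row count.
    """
    idx = (np.asarray(mu) == 0.0)   # the index set I(z)
    B_I = B[idx, :]                 # only the rows that contribute
    return h + sigma * (B_I.T @ (B_I @ h))
```

This is exactly the "third way" in the cost table that follows: the smaller I(z) is, the fewer rows of B enter the two small matrix-vector products.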
Now, in this table,
we summarize the computational cost
of the different approaches.
The third one is what
we use in our implementation.
We can see that the computational
cost is significantly
reduced from the previous ones.
So if I(z) is smaller,
then we save more computational cost.
Now finally,
let's discuss the convergence result.
For the global convergence result,
we use the theorem in this paper.
And we can see that,
under the settings here,
the sequence
of (ωk, sk) is bounded
and converges to the unique optimal
solution of the original problem.
This is the global convergence result.
Next, to present the local convergence result,
we need some stopping criteria.
Here I present different types of stopping criteria.
However, I will not go into the details
of where they come from and how we implement them.
We use this stopping criterion
to get the following convergence
result for the augmented Lagrangian method.
So the first result is that
the sequence λk converges to λ*,
which is one of the augmented Lagrange multipliers.
Moreover, along the iterations,
the distance of λ(k+1) to the optimal set
is bounded by βk times
the distance of the previous iterate to that set.
And this constant βk
tends to a limit which is smaller than one.
In other words,
the augmented Lagrange multipliers
enjoy a linear convergence rate.
If we impose more conditions on the stopping criteria,
then the updated iterate ωk
will also enjoy a linear convergence rate.
Here βk is a constant
which depends on k, and it tends to a limit
which is hopefully smaller than one.
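As a sketch, the two rate statements above can be written as follows; here Ω denotes the set of augmented Lagrange multipliers, and the exact constants are in the cited paper:

```latex
% Linear rate of the multiplier sequence:
\operatorname{dist}\bigl(\lambda^{k+1}, \Omega\bigr)
  \;\le\; \beta_k \, \operatorname{dist}\bigl(\lambda^{k}, \Omega\bigr),
\qquad \beta_k \to \beta_\infty < 1 .
% Under the extra conditions on the stopping criteria,
% the primal iterates inherit the same type of rate:
\bigl\| \omega^{k+1} - \omega^{*} \bigr\|
  \;\le\; \beta_k' \, \bigl\| \omega^{k} - \omega^{*} \bigr\| .
```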
Now finally, let me present some numerical results.
Here we also compare our method
with the state-of-the-art solver LIBLINEAR.
And we compare with the DCD method,
dual coordinate descent method.
So these are the CPU times.
These are the accuracy.
As you can see,
our algorithm is a bit
faster than the DCD method.
In many cases,
it also returns a bit higher accuracy than the DCD method.
In particular, there are several examples
showing that the augmented Lagrangian method
can significantly improve on
the DCD method, since it reaches much higher accuracy
than DCD does.
Now finally, let me conclude this part.
We introduced the augmented Lagrangian method
to solve the L1-loss SVC problem.
If you're interested in this part,
you can download the reference here to see more details.
Finally, we conclude this chapter as follows.
We introduced the quadratic penalty method,
the exact penalty method,
as well as the augmented Lagrangian method
to solve constrained optimization problems.
We also presented two examples to show how
we can apply efficient methods to solve modern
optimization problems.
So this is the end of this course.
Thank you. Goodbye.