Current course topic: Modern Optimization Methods > Chapter 9. Penalty and Augmented Lagrangian Methods > 9.6 Augmented Lagrangian Method for SVM
Hi. Welcome to Modern Optimization Methods.
Today we will introduce the sixth part
of Chapter 9:
the augmented Lagrangian method
applied to support vector machines.
This is the outline of this part.
We will first introduce the support vector machine problem.
Then we apply the augmented Lagrangian method
to solve the L1-loss SVC.
Finally, we will discuss some theoretical properties
and present some numerical results.
Now let's first recall the support vector machine.
Support vector machines are divided
into support vector classification (SVC)
and support vector regression (SVR).
In this part,
we mainly focus on the support vector classification problem,
but our method can also be applied
to support vector regression problems.
Recall that in support vector classification,
we search for a hyperplane
such that the two types of data are separated
by this hyperplane as well as possible.
When the data cannot be separated strictly,
we use a loss function
to measure the misclassification.
This leads to the L1-loss SVC.
In the L1-loss SVC model,
we measure the misclassification
by the L1 function,
which leads to the following problem,
and this is the problem we mainly focus on in this part.
So firstly,
to study this problem,
let's reformulate it
by augmenting each training point x
with one extra component,
and ω with one extra component as well.
The problem then reduces to the following L1-loss SVC
without the bias term.
The difference is that the bias term b
no longer appears explicitly:
it is absorbed into ω and the x_i.
So the L1-loss SVC model without the bias term is as follows.
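For reference, the unbiased L1-loss SVC model can be written as follows. This is the standard formulation; C > 0 is the penalty parameter, and the exact scaling may differ from the slides:

```latex
\min_{\omega \in \mathbb{R}^n} \;
\frac{1}{2}\|\omega\|^2
\;+\;
C \sum_{i=1}^{m} \max\bigl(0,\; 1 - y_i\,\omega^{\top} x_i\bigr)
```

Here each x_i already carries the extra constant component, so no separate bias b appears.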
Now we try to solve this problem
using the augmented Lagrangian method.
How can we achieve this goal?
We first write this problem
in matrix-vector form.
The sum of max terms
can be reformulated as the L1 norm
of a max function,
whose inner part is Bω+d.
The matrix B is made up of the training data
x_i together with the labels,
so you can easily write down the form of B.
And d is a vector of all ones.
After that,
we introduce an extra variable s,
replace the second term by a function p(s),
and require s to equal Bω+d.
By doing this,
the problem is reformulated as a nonsmooth part p(s)
plus a smooth part,
and s carries an equality constraint.
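As a quick sanity check, here is a minimal sketch assuming B has rows -y_i x_i^T and d is the all-ones vector (the lecture's exact convention may differ); the hinge-loss sum and the matrix-vector form agree:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.standard_normal((m, n))          # training points x_i (rows)
y = rng.choice([-1.0, 1.0], size=m)      # binary labels
w = rng.standard_normal(n)

# Direct sum of hinge losses: sum_i max(0, 1 - y_i <x_i, w>)
hinge = np.maximum(0.0, 1.0 - y * (X @ w)).sum()

# Matrix-vector form: ||max(B w + d, 0)||_1 with B = -diag(y) X, d = 1
B = -y[:, None] * X
d = np.ones(m)
hinge_mat = np.abs(np.maximum(B @ w + d, 0.0)).sum()

assert np.isclose(hinge, hinge_mat)
```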
Now for this problem,
the variables are ω and s,
and we have equality constraints.
For equality-constrained problems,
we can then apply the augmented Lagrangian method.
The augmented Lagrangian function
is written as follows.
We take the original objective function f,
add the Lagrangian term for the equality constraint,
and then add the quadratic penalty term.
This gives the augmented Lagrangian function.
Here λ is the multiplier
corresponding to the equality constraint,
and σ is the penalty parameter.
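As an illustration, the augmented Lagrangian function can be evaluated as below. This is a sketch under the assumptions p(s) = C·Σ max(s_i, 0) and a "+λᵀ(s − Bω − d)" sign convention, which may differ from the slides:

```python
import numpy as np

def aug_lagrangian(w, s, lam, B, d, C=1.0, sigma=1.0):
    """Augmented Lagrangian for  min 0.5*||w||^2 + p(s)  s.t.  s = B w + d,
    with p(s) = C * sum(max(s_i, 0)) as the nonsmooth L1-hinge part."""
    r = s - B @ w - d                       # equality-constraint residual
    p = C * np.maximum(s, 0.0).sum()        # nonsmooth part p(s)
    return 0.5 * (w @ w) + p + lam @ r + 0.5 * sigma * (r @ r)

# At a feasible point (s = B w + d) the value reduces to the objective itself.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3)); d = np.ones(4)
w = rng.standard_normal(3); lam = rng.standard_normal(4)
s = B @ w + d
val = aug_lagrangian(w, s, lam, B, d, C=2.0, sigma=5.0)
obj = 0.5 * (w @ w) + 2.0 * np.maximum(s, 0.0).sum()
assert np.isclose(val, obj)
```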
The augmented Lagrangian method
then takes the following update.
Recall that the primal variables are ω and s,
so there are two blocks of variables,
and we need to minimize over ω and s
jointly to get the minimizer.
After that,
we update λ by the following formula,
which is the traditional multiplier update.
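To see this update pattern concretely, here is the same outer loop on a toy equality-constrained problem, not the SVC problem itself, chosen so the subproblem has a closed form; the iterates converge to the minimum-norm solution:

```python
import numpy as np

# Toy problem: min 0.5*||w||^2  s.t.  A w = b.  The ALM subproblem
#   argmin_w 0.5*||w||^2 + lam @ (A w - b) + 0.5*sigma*||A w - b||^2
# has the closed form  (I + sigma*A.T @ A) w = A.T @ (sigma*b - lam).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 4.0])

lam = np.zeros(2)        # multiplier for the equality constraint
sigma = 10.0             # penalty parameter
for _ in range(100):
    # primal step: minimize the augmented Lagrangian over w
    w = np.linalg.solve(np.eye(4) + sigma * A.T @ A,
                        A.T @ (sigma * b - lam))
    # dual step: classical multiplier update
    lam = lam + sigma * (A @ w - b)

w_star = A.T @ np.linalg.solve(A @ A.T, b)   # minimum-norm solution
assert np.allclose(w, w_star, atol=1e-8)
```

The subproblem solver is the part that changes from problem to problem; the multiplier update stays the same.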
So the main focus is how
we solve this subproblem with respect to ω and s.
The key points,
as we mentioned just now,
are these:
the first is how to solve the subproblem;
the second is,
if we can manage to solve the subproblem,
what is the convergence result
for the augmented Lagrangian method?
These are the two questions
we are going to consider.
Now let's take a look at some preliminaries.
To answer question one,
we need some important tools.
The first tool is the Moreau-Yosida regularization,
a basic tool in designing
nonsmooth optimization methods.
To introduce the Moreau-Yosida regularization,
we need some basic settings.
Let the capital X
be a real finite-dimensional space
equipped with the Euclidean inner product,
and denote its induced norm
in the usual way.
We then assume that q
is a closed convex function
on the space X, taking extended real values.
With these settings,
the Moreau-Yosida regularization of q
at a point x is defined in the following way.
If we take a detailed look at this definition,
θq(x) is the minimal value
of the right-hand side:
a least-squares distance between y and x,
plus the function value q(y).
The minimization looks for the y
that makes this objective smallest.
In other words,
given x,
we solve this minimization problem
to get the optimal solution y,
and the optimal value is θq(x).
If x changes,
the optimal y changes,
and the resulting optimal value θq(x)
changes as well.
This is called the
Moreau-Yosida regularization.
For the optimal solution
of this minimization problem,
we introduce the notation Proxq(x).
This is the optimal solution y*,
and we call it the proximal point of x
associated with the function q.
This page is very important
in the following analysis.
Now we have a very important proposition
about the Moreau-Yosida regularization.
It says the following:
let q be a closed convex function,
let θq be the Moreau-Yosida regularization of q,
and denote the proximal point as above.
Then, first,
θq is continuously differentiable.
Although an explicit form of θq may not be available,
we can still prove that θq
is continuously differentiable.
Second, we can write down its gradient explicitly
as the difference between x
and its proximal point.
In other words,
for the function θq,
we can calculate the gradient from x itself
together with the proximal point.
This will be used in solving the subproblems in our case.
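To make the proposition concrete, take the scalar example q(y) = |y|, whose proximal point is the well-known soft-thresholding operator; a finite-difference check confirms that the gradient of θq at x is x minus its proximal point:

```python
import numpy as np

# Moreau-Yosida regularization of q(y) = |y| (scalar case):
#   theta(x) = min_y 0.5*(y - x)^2 + |y|,  minimizer = soft-thresholding.
def prox_abs(x):
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def theta(x):
    y = prox_abs(x)
    return 0.5 * (y - x) ** 2 + np.abs(y)

# Proposition: theta is C^1 with gradient x - prox(x).
# Check it against a central finite difference.
x = 1.7
h = 1e-6
fd = (theta(x + h) - theta(x - h)) / (2.0 * h)
assert np.isclose(fd, x - prox_abs(x), atol=1e-5)
```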
Now getting back to the subproblem in our case:
we need to minimize
the augmented Lagrangian function with respect to ω and s
at the same time.
The idea for solving this subproblem is as follows.
We first minimize the
function with respect to s
to express s as a formula in ω,
then we solve the minimization problem
in terms of ω.
Now let's explain this in more detail.
Firstly,
we minimize the augmented Lagrangian function
with respect to s.
Given ω and λ,
we rewrite the augmented Lagrangian
as a term that is constant with respect to s
plus a part that depends on s.
Minimizing that part is equivalent to minimizing
a least-squares term
plus the function p.
Therefore,
if you look at this part,
it is exactly of the form
of a Moreau-Yosida regularization.
So we denote the optimal value
of this part by σθp(z(ω)),
where z(ω) is the term shown here,
which you can easily calculate.
Assuming the proximal point has an explicit formula,
writing s in this form
eliminates the variable s.
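As an illustration of this elimination step, here is a sketch assuming p(s) = C·Σ max(s_i, 0) and z(ω) = Bω + d − λ/σ, one common sign convention that may differ from the slides; minimizing the augmented Lagrangian over a single component of s matches the closed-form proximal point:

```python
import numpy as np

# For fixed w, minimizing the augmented Lagrangian over one component of s,
#   min_s C*max(s, 0) + lam*(s - t) + 0.5*sigma*(s - t)^2   with t = (B w + d)_i,
# reduces (by completing the square) to  s* = Prox_{p/sigma}(t - lam/sigma).
def prox_hinge(z, tau):
    """Componentwise proximal point of tau * max(., 0) at z."""
    return np.where(z > tau, z - tau, np.where(z < 0.0, z, 0.0))

C, sigma, lam, t = 2.0, 3.0, 0.7, 1.4
s_formula = prox_hinge(t - lam / sigma, C / sigma)

# Brute-force check: minimize the same one-dimensional function on a fine grid.
grid = np.linspace(-5.0, 5.0, 200001)
vals = C * np.maximum(grid, 0.0) + lam * (grid - t) + 0.5 * sigma * (grid - t) ** 2
s_grid = grid[np.argmin(vals)]
assert abs(s_formula - s_grid) < 1e-3
```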
The next step is
to solve the minimization of Φ
in terms of ω.
Moreover,
if we take the gradient of Φ with respect to ω,
then by the Moreau-Yosida properties,
the gradient of Φ
equals ω plus the gradient of the θp term,
and by the Moreau-Yosida decomposition,
that gradient of θp is available through the proximal mapping.
In other words,
we have the gradient of Φ in explicit form.
This provides us
a practical way to solve the subproblem.
Now, based on the analysis above,
we solve the minimization problem in the following way.
Firstly,
we solve the minimization of Φ in terms of ω
and find the optimal solution ω*.
After that,
s* is just the proximal point of z(ω*),
so s* is obtained
by a direct calculation.
The main focus is therefore
how to find ω*.
Solving for ω* is equivalent
to finding a stationary point,
i.e., solving the first-order optimality condition.
Now if we take a look at the formula
for the gradient of Φ,
it takes the following form.
In this formula,
we have s*(ω),
which is the proximal point of z(ω).
Therefore,
to solve for a zero of the gradient,
we need to settle three questions.
The first is
to derive the formula of the proximal mapping,
because s* appears in the gradient.
The second is:
what is the generalized Jacobian of the gradient of Φ?
We need it because we will use the semismoothness property.
The third is:
how can we reduce the computational cost?
We need to make sure
the algorithm is fast.
Let's make a brief summary of this part.
In this part,
we investigated the augmented Lagrangian method
for solving the L1-loss SVC,
and we derived the framework
of the augmented Lagrangian method for it.
We also used an important tool
called the Moreau-Yosida regularization.
In the next part,
we will talk about
how to settle the three key points
and design an efficient algorithm.
We stop here, and see you next time.