
6.5 Exploring Sparsity in SVM

Hi. Welcome to Modern Optimization Methods.

Today we will introduce the last part of Chapter Six: exploring sparsity in SVM.

Now recall that when we apply the semismooth Newton's method to solve the support vector classification and support vector regression problems, we need to solve a linear system in order to get the semismooth Newton direction.

Now we take a detailed look at how we solve this linear system, and we will show how to explore the sparsity of the optimal solution to significantly reduce the computational cost.

Now first let's take a look at the conjugate gradient method.

We are trying to solve linear equations of the following form, written with a matrix V.

In the semismooth Newton's method, at each iteration we solve a linear system of this kind in order to obtain the Newton direction d.

Here V is selected from the generalized Jacobian set, and h is a given right-hand side.

Now let's recall the dimension of V: since ω lies in n-dimensional space, the matrix V is an n-by-n matrix.

The question is how we solve this linear equation to get d.

We apply the conjugate gradient method. As you may guess, when we use the conjugate gradient method, the most time-consuming part is calculating the matrix multiplied by a vector.

So how do we calculate this matrix-vector product?
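As a minimal sketch, a matrix-free conjugate gradient solver only touches V through a user-supplied matrix-vector product, which is why that product dominates the cost. The function name, tolerance, and iteration limit below are illustrative assumptions, not the lecture's exact implementation.

```python
import numpy as np

def conjugate_gradient(matvec, rhs, tol=1e-8, max_iter=200):
    """Solve V d = rhs, accessing V only through matvec(v) = V @ v.

    Standard CG sketch; it assumes V is symmetric positive definite,
    which holds for a matrix of the form I + c * sum_i h_i x_i x_i^T
    with h_i >= 0.
    """
    d = np.zeros_like(rhs, dtype=float)
    r = rhs - matvec(d)              # initial residual
    p = r.copy()                     # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Vp = matvec(p)               # the only place V enters
        alpha = rs_old / (p @ Vp)
        d += alpha * p
        r -= alpha * Vp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d
```

Since every iteration performs exactly one call to matvec, speeding up that single routine speeds up the whole solve.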

Here we do some further analysis.

First, we take a look at the formula of V. Recall that an element of the set V1 takes the following form: it is composed of two parts.

The first part is the identity matrix. The second part is a sum of matrices, where each term is h_i multiplied by x_i x_i^T.

In other words, each term is a rank-one matrix, and depending on the value of h_i, the term differs slightly.

Now let's take a look at how we carry out the matrix-vector product with Δω.

Here Δω is a vector in n-dimensional space, and V is an n-by-n matrix.

If we carry out the multiplication naively, it costs on the order of O(n^2) operations.

Can we reduce this further to save some computational cost? This is the question we ask, and below we do some analysis.
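For contrast, a naive sketch that forms V explicitly might look like the following; the names X, h, and the constant c in front of the sum are illustrative assumptions, with X storing the samples x_i as rows.

```python
import numpy as np

def build_V_explicitly(X, h, c=1.0):
    """Form V = I + c * sum_i h_i x_i x_i^T as a dense n-by-n matrix.

    Forming V this way already costs O(l * n^2) for l samples, and
    every product V @ delta_w afterwards costs another O(n^2).
    """
    n = X.shape[1]
    V = np.eye(n)
    for x_i, h_i in zip(X, h):
        V += c * h_i * np.outer(x_i, x_i)   # dense rank-one update
    return V
```

This is exactly the cost we want to avoid.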

And below we will do some analysis.

Now let's take a look at how we can calculate this multiplication more cheaply.

If we substitute the form of V into the product, we get Δω itself, which is a vector, plus the sum term.

For the sum term, multiplying each rank-one matrix by Δω first gives the scalar x_i^T Δω, and this scalar then multiplies the vector x_i.

In other words, if we do not compute the rank-one terms directly, but instead accumulate this right-hand side as a sum of scaled vectors, we can save computational cost.
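A sketch of this rank-one trick, under the same assumed names as above, could be passed to the conjugate gradient routine as its matvec callback.

```python
import numpy as np

def matvec_rank_one(delta_w, X, h, c=1.0):
    """Compute (I + c * sum_i h_i x_i x_i^T) @ delta_w without forming V.

    Each term contributes h_i * (x_i^T delta_w) * x_i, i.e. one inner
    product and one scaled vector addition per sample.  A vectorized
    equivalent is: delta_w + c * X.T @ (h * (X @ delta_w)).
    """
    result = delta_w.astype(float)
    for x_i, h_i in zip(X, h):
        result += c * h_i * (x_i @ delta_w) * x_i
    return result
```

For example, one could call conjugate_gradient(lambda v: matvec_rank_one(v, X, h), rhs) to obtain the Newton direction without ever storing an n-by-n matrix.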

So how can we save even more?

Now let's take a look at this h.

Recall that h_i is an element of the generalized Jacobian of the max function.

Then we can see that h_i is either zero or one, or a value between zero and one.

In other words, we do not have to multiply by h_i all the time; we only pick up the nonzero h_i.

If z_i(ω) is positive, then h_i is one, and we keep that term.

If z_i(ω) is negative, then h_i is zero, so we drop that term and do not consider it.

If z_i(ω) is exactly zero, then h_i may be any value between zero and one.

In this situation, what value can we choose to reduce the computational cost?

You can see that the answer is very simple: we can choose this h_i to be zero.

In that case, by setting h_i to zero, we do not have to consider those two cases at all.

What is left are the indices with z_i(ω) > 0.

We define an index set I_k that collects the indices i for which z_i(ω^k) is positive.

We collect those indices, and only over those indices do we carry out the summation.

The remaining terms are zero, so we do not have to compute them.

By doing this, the computational cost is reduced: the dropped terms cost nothing, and for the remaining term we only need to perform part of the vector multiplications.
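A sketch of this active-set version, again with the assumed names X, z, and c from the earlier sketches, could look like the following.

```python
import numpy as np

def matvec_active_set(delta_w, X, z, c=1.0):
    """Compute V @ delta_w using only the active indices I_k = {i : z_i > 0}.

    For z_i > 0 we have h_i = 1; for z_i < 0 we have h_i = 0; and when
    z_i = 0 we are free to choose h_i = 0, so only the active rows of X
    contribute.  The cost is roughly 2 * |I_k| * n operations.
    """
    active = np.flatnonzero(z > 0)       # the index set I_k
    X_active = X[active]                 # the samples x_i with i in I_k
    # V @ delta_w = delta_w + c * sum_{i in I_k} (x_i^T delta_w) * x_i
    return delta_w + c * X_active.T @ (X_active @ delta_w)
```

Handing this routine to the conjugate gradient sketch above gives a Newton direction solve whose per-product cost scales with |I_k| rather than with the full sample size.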

We summarize the computational cost of the original version and the improved version in this table.

You can see that the original version takes on the order of the number of samples times n operations, while the reduced version only takes on the order of 2|I_k| times n operations.

Here |I_k| denotes the number of elements in I_k.

You may wonder how large I_k is.

If there are many elements in I_k, then we do not gain much.

Below I will show with a figure how big or how small I_k actually is.

This is an example over all iterations. The x-axis is the iteration number k; for example, at k equal to two, the plotted value is the size of I_k at that iteration.

The right-hand figure plots the ratio of the size of I_k over the total number of samples.

The second figure indicates that as k goes on, the ratio first goes up and then decreases.

Finally, the ratio settles at around 10%, which means the remaining computational cost is only about 10% of the original one, a significant reduction.

Now, in the next figure, we demonstrate the quadratic convergence rate on some examples.

Here I list two examples to show the convergence rate. The x-axis is the number of iterations, and the y-axis is the logarithm of the norm of the gradient.

If we observe a decreasing curve like this, it indicates that the convergence rate is quadratic.

As we can see, for the different parameters here, all runs of the algorithm have a quadratic convergence rate. The same holds for the other example.

We also observe that for the two examples here, the algorithm terminates in fewer than 10 iterations.

In other words, the algorithm takes a very small number of iterations; it converges very fast.

This is the demonstration of the convergence rate.

Now, let's see some numerical results.

These are numerical results in which we compare with LIBLINEAR. Remember that LIBLINEAR is the state-of-the-art solver for SVM problems with a linear kernel.

We compare our semismooth Newton's method with DCD and TRON: A1 is DCD, A2 is TRON, and A3 is our semismooth Newton's method.

We report the CPU time t in seconds, as well as the accuracy in percentage.

Here are the names of the different data sets.

As you can see, the computational times of the three methods are similar to each other.

In terms of accuracy, I marked the winner in bold; you can see that the three methods are comparable with each other.

If we take a look at the support vector regression problem, the results are quite different.

As you can see, B1 is DCD, B2 is TRON, and B3 is our semismooth Newton's method.

The CPU time is again comparable across the methods.

However, in terms of MSE, which is a measure of error, the lower the MSE, the better the performance.

You can see that, compared with TRON and DCD, the semismooth Newton's method performs significantly better on the data here.

For some data sets, such as the housing data and the eunite2001 data, it performs very much better than TRON and DCD.

In other words, when applying the semismooth Newton's method to the support vector regression problem, its performance is much better than the existing state-of-the-art methods.

Now, if you are interested in the semismooth Newton's method for SVM, you can find more details in the publication here.

Let's make a summary of this part.

We applied the semismooth Newton's method to solve the support vector machine problem, specifically the L2-loss related models.

We obtain global convergence as well as a quadratic local convergence rate.

In particular, when applying the semismooth Newton's method to the support vector regression model, the performance is much better than the existing state-of-the-art solvers.

Here are some related references for this chapter; if you are interested, you can download them from the website.

Now let's make a brief summary of this chapter.

In this chapter, we introduced the semismooth Newton's method, including the basic algorithm, the globalized version of the algorithm, as well as the local and global convergence results.

Moreover, we connected the semismooth Newton's method with classification methods and applied it to solve the SVC and SVR models.

The resulting performance is very good.

This is the end of Chapter Six.

See you next time.
