Current course topic: Modern Optimization Methods > Chapter 2. Fundamentals of Unconstrained Optimization > 2.2 Optimality Conditions Ⅰ
Hi, everyone.
Welcome to modern optimization methods.
In this part,
we will introduce the second part of Chapter 2,
on the optimality conditions.
We will look at the first-order necessary condition,
the second-order necessary condition,
as well as the second-order sufficient condition.
Before that, let me ask you a question.
So we already introduced the definition of local minimizers
and global minimizers.
So if I give you a point of a function,
how do you verify whether this point is a local minimizer
or a global minimizer?
You may answer: I just check it by the definition.
That is one way.
However, it is only a theoretical way,
because in practice you can never examine all the points
in a neighborhood of the point.
So the definition serves a theoretical purpose.
In practice, how do we recognize a local minimizer?
We need some theoretical tools to characterize it.
So what kind of theoretical tools?
We actually use the information of the gradient
as well as the Hessian matrix.
As we mentioned before,
in continuous optimization,
we can use neighborhood information
to estimate the status of a point.
We will see this in the following analysis.
Now,
let's first introduce the first-order necessary condition.
You may ask: what is a necessary condition?
A necessary condition is derived
by assuming that a point is a minimizer.
In other words,
if we know that this point is a minimizer,
what conditions can we deduce?
That is the question a necessary condition answers.
Now, what is the first-order necessary condition?
It is related to the first-order information of the function.
Stated mathematically,
we first assume that x* is a local minimizer
(this is what makes the condition necessary),
and we further assume that
f is continuously differentiable
in an open neighborhood of x*,
so that the gradient exists.
Under these two assumptions,
the gradient of f at x* is zero.
This is the first-order necessary condition.
It is very simple,
and it is widely used in many applications.
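Stated formally, in standard notation, the theorem reads:

```latex
\textbf{Theorem (first-order necessary condition).}
If $x^*$ is a local minimizer and $f$ is continuously
differentiable in an open neighborhood of $x^*$, then
\[
  \nabla f(x^*) = 0 .
\]
```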
Now, how can we prove this theorem?
This is the first theorem
for which we show how to prove the result.
There are actually many ways to prove it.
Here we argue by contradiction.
In the first step,
we assume, for contradiction,
that the gradient of f at x* is not zero;
that is, we assume the conclusion does not hold.
Then we try to derive a contradiction.
Since the gradient is not zero,
we can define a direction p
as the negative of the gradient at x*.
Taking the inner product
of p with the gradient of f at x*,
we get the negative squared norm of the gradient at x*.
Since the gradient is nonzero,
this quantity is strictly smaller than zero.
Now, in the third step,
we use our assumption that
f is continuously differentiable
in an open neighborhood of x*.
Because the gradient is continuous,
p transpose times the gradient of f
is a continuous function as well.
So if we perturb x* a little,
moving it to x* + tp along the direction p,
this term keeps its strict negativity
for all sufficiently small t, say all t in [0, T].
This is the key point.
Then, for any t-bar in the interval (0, T],
Taylor's expansion gives the following result.
Taylor's expansion expresses
the function value of f at x* + t-bar p
in terms of the value at x*:
it equals the function value of f at x*
plus a second term,
t-bar times p transpose times the gradient of f at x* + tp.
This second term carries the first-order information,
and it is very similar to the term above.
The difference lies in that
this t is between zero and t-bar.
Recall that t-bar is between zero and T,
so this t is even smaller than t-bar.
Therefore the strict negativity still holds.
Therefore, we have proved that
the function value of f at x* + t-bar p
is strictly smaller than f at x*,
because the second term is strictly negative.
In other words,
we have found a direction p such that
if we start from x*
and move a little along this direction p,
we reach an even smaller function value than f(x*).
This violates the claim that x* is a local minimizer.
In this way,
we derive a contradiction,
and the result is proved.
This is the proof by contradiction.
Of course, you may find other ways to prove the result.
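The contradiction argument above can be condensed into a few lines:

```latex
% Step 1--2: assume \nabla f(x^*) \neq 0 and set
p := -\nabla f(x^*), \qquad
p^{\top}\nabla f(x^*) = -\|\nabla f(x^*)\|^{2} < 0 .
% Step 3: by continuity of \nabla f, there is T > 0 with
p^{\top}\nabla f(x^* + t p) < 0 \quad \text{for all } t \in [0, T].
% Step 4: Taylor's theorem gives, for any \bar{t} \in (0, T]
% and some t \in (0, \bar{t}),
f(x^* + \bar{t}\, p)
  = f(x^*) + \bar{t}\, p^{\top}\nabla f(x^* + t p)
  < f(x^*),
% contradicting that x^* is a local minimizer.
```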
Motivated by the first-order necessary condition,
let me introduce a definition: the stationary point.
We call x* a stationary point if the gradient of f at x* is zero.
In other words, as long as the gradient vanishes,
the point is a stationary point.
So, as you can see,
any local minimizer of a smooth function
is a stationary point,
because for a smooth function
the gradient exists,
and it satisfies the first-order necessary condition.
Therefore, any local minimizer
must be a stationary point of a smooth function.
However, if the function is not smooth,
a local minimizer may not be a stationary point of f.
For example, I draw a picture here:
the absolute value function,
which is not smooth.
For this nonsmooth function,
we know that x = 0 is a minimizer;
in particular, it is a local minimizer.
However, this local minimizer
does not satisfy the condition that the gradient is zero.
Why?
Because there is no gradient at all:
f is not differentiable at zero,
so the gradient does not exist,
and the condition cannot even be stated.
So smoothness is essential here:
if the function is not smooth,
we cannot draw this conclusion.
Note also that the converse fails even for smooth functions:
a stationary point is not necessarily a local minimizer.
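As a small numerical illustration of this last point, here is a sketch using finite differences; the function x³ is my own choice of example, not one from the lecture:

```python
def grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 3

# x = 0 is a stationary point: the gradient vanishes there ...
print(abs(grad(f, 0.0)) < 1e-8)   # True

# ... yet it is not a local minimizer: f takes smaller values nearby.
print(f(-0.1) < f(0.0))           # True
```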
Now let me talk about the second-order necessary condition.
From the name,
we know that
the second-order necessary condition must assume that x*
is already a local minimizer.
Another piece of information we can infer is that
the second-order condition involves the Hessian matrix,
so we must assume that the Hessian matrix exists.
Here is the theorem.
We assume that x* is a local minimizer,
and moreover that the Hessian matrix exists
and is continuous
in an open neighborhood of x*.
Under these two assumptions,
the first-order optimality condition holds:
the gradient must be zero.
Moreover, the Hessian matrix at x*
is positive semidefinite.
This is the second-order necessary condition.
So basically,
it means that the Hessian matrix at x*
is positive semidefinite.
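In symbols, the two conclusions of the theorem are:

```latex
\nabla f(x^*) = 0
\qquad\text{and}\qquad
p^{\top}\,\nabla^{2} f(x^*)\,p \;\ge\; 0
\quad\text{for all } p \in \mathbb{R}^{n},
```

the second of which is written compactly as ∇²f(x*) ⪰ 0 (positive semidefinite).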
Now let's take a simple example.
Suppose that in n-dimensional space
we have a quadratic function of the form
(1/2) x transpose Q x plus b transpose x,
with Q symmetric
(the 1/2 factor is what makes the gradient below come out as Qx + b).
If x* is a local minimizer of this function,
then the gradient must be zero.
The gradient takes the form Qx* + b,
and the Hessian matrix must be positive semidefinite.
Taking the Hessian,
you can see that
the Hessian of f is the constant matrix Q,
so we require Q to be positive semidefinite.
This is the second-order necessary condition
for a quadratic function.
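We can verify both conditions numerically for a small instance. This is a sketch with made-up data (the matrix Q and vector b below are my own illustrative choices), assuming the convention f(x) = ½ xᵀQx + bᵀx with Q symmetric:

```python
import numpy as np

# Illustrative data: Q symmetric positive definite, so the stationary
# point is in fact the unique global minimizer.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
b = np.array([-1.0, 2.0])

# f(x) = 0.5 x^T Q x + b^T x  =>  grad f(x) = Q x + b,  Hessian = Q.
x_star = np.linalg.solve(Q, -b)   # solve the stationary equation Q x + b = 0

grad = Q @ x_star + b             # first-order condition: should be ~0
eigs = np.linalg.eigvalsh(Q)      # Hessian eigenvalues: should all be >= 0

print(np.allclose(grad, 0.0))     # True
print((eigs >= 0).all())          # True
```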
Now, let us think about how to prove this result.
Again, we argue by contradiction.
We assume, for contradiction,
that the Hessian matrix at x* is not positive semidefinite.
In other words,
we can find a vector p such that
p transpose times the Hessian matrix at x*
times p is strictly negative;
this is exactly what it means
for the Hessian matrix
not to be positive semidefinite.
So this is the condition we obtain from the assumption.
Then we use the continuity of the Hessian matrix.
The continuity of the Hessian implies that
if we take a small perturbation of x*,
changing it to x* + tp,
then p transpose times the Hessian at x* + tp times p
remains strictly negative
throughout this small neighborhood of x*.
To make use of this condition,
we need Taylor's expansion.
In this case,
Taylor's expansion must be carried out to second order.
We expand the function value at x* + t-bar p around x*,
and we get three terms.
The first term is the function value at x*.
The second term carries the first-order information,
and the third term carries the second-order information.
By the first-order necessary condition,
the gradient at x* is zero,
so the second term is cancelled out,
and we are left with two terms.
Moreover, by step two,
the second-order term is strictly negative,
meaning that
the function value of f at x* + t-bar p is strictly smaller than
the function value of f at x*.
In other words,
in the neighborhood of x*,
we have found a direction p such that
if we move a little along this direction p,
we reach a function value strictly smaller than f(x*).
This violates the fact that x* is a local minimizer.
So in this way,
we again derive a contradiction,
and we have proved that
the second-order necessary condition holds.
This is how we prove the result.
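Condensed, the second-order argument runs as follows:

```latex
% Assume for contradiction there is p with
p^{\top}\,\nabla^{2} f(x^*)\,p < 0 .
% By continuity of the Hessian, there is T > 0 with
p^{\top}\,\nabla^{2} f(x^* + t p)\,p < 0
  \quad \text{for all } t \in [0, T].
% Second-order Taylor expansion, using \nabla f(x^*) = 0:
f(x^* + \bar{t}\, p)
  = f(x^*) + \bar{t}\, p^{\top}\nabla f(x^*)
    + \tfrac{1}{2}\bar{t}^{\,2}\, p^{\top}\nabla^{2} f(x^* + t p)\,p
  < f(x^*) ,
% for some t \in (0, \bar{t}), contradicting local minimality.
```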
Now let's give a brief summary of this lecture.
In this part,
we introduced the first-order necessary condition
and the second-order necessary condition.
Moreover, we gave a definition called the stationary point,
which is closely related to the local minimizer;
however, the two are not equivalent.
In the next part,
we will further introduce the second-order sufficient condition.
Until then, see you next time.
Bye.