Current course topic: Modern Optimization Methods > Chapter 4. Trust Region Methods > 4.1 Main Idea of Trust Region Methods
Hi. Welcome to modern optimization methods.
I'm Qingna Li from Beijing Institute of Technology.
Today we will introduce the first part of chapter four,
the trust region methods.
Here is the outline of this chapter.
We will start with the main idea of trust region methods,
then go through the trust region algorithm.
Next,
we will discuss how to solve the subproblem.
Finally, we discuss the convergence
of trust region methods.
Now let's start with the first part of chapter four,
the main idea of trust region methods.
It includes three parts:
the size of the trust region,
then the model function,
and finally the step.
Now let's take a look at
the rough idea of trust region methods.
As we mentioned before,
the trust region method
actually works as follows.
We first choose a region,
for example a circle,
called a trust region.
Then within this trust region,
we try to find a minimizer.
Whose minimizer?
The model function's minimizer.
After that,
we jump directly to that minimizer.
This is the trust region method.
So from the statement above,
we can find that there are actually many key points
in the trust region methods.
Now I summarize the key points in the following.
So the first one is the size of the trust region.
How big or how small
a region we choose
is a key factor in the method.
The second one is
what kind of approximation function we will use.
Naturally,
we will use a quadratic function,
so we refer to it as the quadratic model function.
And the third factor is the step pk.
The step means that
we jump from the current iterate
directly to the next iterate.
So the difference between the two iterates
is called the step.
In contrast,
for line search methods,
we first determine a direction,
then we determine a step length.
The direction multiplied by the step length
gives the step.
So the idea of trust region methods is totally
different from that of line search methods.
Now let's take a look at the first factor,
the size of the trust region.
Why is it important?
Think about a situation like this.
Suppose we have a very small trust region;
for example, in two-dimensional space,
we draw a very small circle around the current iterate.
What kind of influence does this have?
You will probably find that
if the circle around the current iterate is very small,
then we cannot move
very far from the current iterate,
and this will slow down the iteration process.
Therefore,
we cannot afford the trust region to be too small.
So what is the effect
if we choose a very large trust region?
A very large trust region
will not prevent our progress
toward the final accumulation point.
However, recall that we need to approximate
the function value within the trust region.
If the trust region is too large,
the approximation may have a large error.
So we need to balance the trust region size
between small and large,
and choose a proper one.
That is why the trust region size
is important.
Now how do we choose
the size of the trust region?
We have to find some rule
to measure whether it is good.
The measurement actually
uses the previous iteration.
Suppose we are standing at the current iterate
and we have some approximation at the moment.
We can use this approximation to estimate
what will happen next
in the trust region iteration.
If the performance in the previous
iteration was very good,
it means that the current trust region
is very good,
so we may enlarge it a little bit
in the next iteration.
If the performance of the current iteration
was not that good,
it means that
the trust region size
may not be very good,
so we have to shrink it.
We repeat this process
in order to get a very good approximation.
This is the rough idea
for the radius of the trust region.
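The enlarge-or-shrink rule just described can be sketched in code. This is only a minimal illustration: the agreement ratio rho (actual reduction of f divided by the reduction predicted by the model) and the threshold constants 0.25 and 0.75 are common illustrative choices, not yet the precise algorithm, which comes in the next section.

```python
def update_radius(rho, delta, step_norm, delta_max=10.0):
    """Adjust the trust region radius based on the agreement ratio rho.

    rho = (actual reduction in f) / (reduction predicted by the model).
    The thresholds 0.25 and 0.75 are illustrative choices.
    """
    if rho < 0.25:
        # poor agreement: the model was not trustworthy, shrink the region
        return 0.25 * delta
    if rho > 0.75 and abs(step_norm - delta) < 1e-12:
        # very good agreement and the step hit the boundary: enlarge
        return min(2.0 * delta, delta_max)
    # otherwise keep the radius unchanged
    return delta
```

For example, a ratio of 0.1 shrinks a radius of 1.0 to 0.25, while a ratio of 0.9 with a boundary-touching step doubles it to 2.0.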
Now let me show you a figure to demonstrate
the difference between the trust region method
and the line search methods.
If you look at the figure,
we are standing at the center of this circle,
which is the current iterate.
We have a trust region here,
which is the big circle.
The solid lines here
are the contours of the function f,
and the dashed ellipses
are the contours of the model function.
The model function
is the approximation of the original function.
In the trust region method,
we look for the minimizer of the model function
within the trust region.
Remember that
we can only move inside the trust region.
Within this trust region,
the minimizer we can get is this point,
so we jump here.
In contrast,
the line search method looks for a
minimizer of the function f along a direction;
it actually moves along the steepest descent direction here.
You can see that the two directions
are different.
The line search direction points
toward the minimizer here,
and the method moves along that direction.
So the two steps differ,
and the underlying ideas are also different.
Now let's take a look at the second key factor,
the model function.
As we mentioned before,
the model function is the approximation
of the original function within a reasonable region.
What kind of form
can we choose for the model function?
Normally we choose a quadratic function
as the model function.
The intuition is that
if we do a Taylor expansion around xk,
we get the second-order expansion here.
We replace the Hessian matrix
by a matrix Bk.
So Bk is an approximation of the Hessian matrix,
and this becomes the model function.
We require Bk to be symmetric.
Moreover,
we must guarantee that the difference
between the two functions
is of higher order in the norm of p.
Otherwise, we may get large errors.
That is also why we cannot make the
trust region size too large:
if the size is too large,
then the error will be large as well.
So this is the second key factor,
the model function.
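The quadratic model just described can be written out concretely. Here is a minimal sketch in plain Python (using nested lists rather than a linear algebra library), where f_k, g_k, and B_k stand for the function value, gradient, and symmetric Hessian approximation at the current iterate:

```python
def model(f_k, g_k, B_k, p):
    """Evaluate the quadratic model m_k(p) = f_k + g_k^T p + 0.5 p^T B_k p."""
    n = len(p)
    # linear term: inner product of the gradient with the step
    linear = sum(g_k[i] * p[i] for i in range(n))
    # quadratic term: p^T B_k p
    quad = sum(p[i] * B_k[i][j] * p[j] for i in range(n) for j in range(n))
    return f_k + linear + 0.5 * quad
```

Note that at p = 0 the model reproduces the current function value f_k exactly, as a first-order Taylor-type approximation should.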
Now in each step,
we need to minimize the model function,
because we are looking for the minimizer
of the model function instead.
Moreover, we have a trust region restriction:
the step cannot be too large;
it should stay within the trust region.
Here Δ is the trust region radius,
and the length of p must
not exceed Δ.
Another notational point is that
we use a norm
to measure the length of this vector.
The natural choice is the Euclidean norm,
which is used very often,
but we can choose other norms.
We will talk about this later.
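Written out in the standard notation, with g_k denoting the gradient of f at xk, the subproblem described above is:

```latex
\min_{p \in \mathbb{R}^n} \; m_k(p) = f(x_k) + g_k^T p + \tfrac{1}{2}\, p^T B_k p
\quad \text{subject to} \quad \|p\| \le \Delta_k .
```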
As you can see,
if we drop the trust region restriction,
we can directly solve for the minimizer,
which is minus Bk inverse multiplied by gk.
In other words,
if the length of this direction is
smaller than or equal to the radius,
then we automatically
get the trust region solution,
which is this one.
In this case,
the unconstrained minimization subproblem coincides
with the trust region subproblem.
However,
if the length of this vector exceeds the radius,
then we have to solve the constrained trust region problem,
and it will be a more complicated subproblem.
We will discuss this later in more detail.
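This check can be sketched in code for the two-dimensional case. The sketch assumes Bk is invertible (so the 2x2 system can be solved by Cramer's rule) and uses the Euclidean norm; returning None when the unconstrained step leaves the region is just a way of signalling that the harder constrained subproblem must then be solved.

```python
import math

def newton_step_2d(B, g):
    """Solve B p = -g for a 2x2 matrix B via Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    p0 = (-g[0] * B[1][1] + g[1] * B[0][1]) / det
    p1 = (-g[1] * B[0][0] + g[0] * B[1][0]) / det
    return [p0, p1]

def trust_region_step_interior(B, g, delta):
    """Return the unconstrained minimizer -B^{-1} g if it lies inside the
    trust region of radius delta; otherwise return None, meaning the
    constrained subproblem (minimizer on the boundary) must be solved."""
    p = newton_step_2d(B, g)
    if math.hypot(p[0], p[1]) <= delta:
        return p
    return None
```

For instance, with B = 2I and g = (2, 0), the unconstrained step is (-1, 0): it is accepted when Δ = 2 but rejected when Δ = 0.5.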
Now let's make a brief summary of this part.
In this part,
we mainly introduced the main idea
of the trust region method,
including its three key factors,
why they are important in the method,
and how we deal with them.
So we stop here and see you next time.