9.5.1 Regression analysis: model evaluation 回归分析：模型评价 (Learn Statistics with Ease)

好那么这节我们来说
Well, in this section let’s talk about

回归方程的解释力
the explanatory power of the regression equation

就像我们刚才
Just as we said

在上一节里面我们说的
in the previous section

我们可以把这条回归线画出来
we can draw out the regression line

而且我们知道
Moreover, we know

我们画出来的这条回归线
the regression line we draw

是x和y它们之间关系的
is the line of best fit

最好的一条拟合线
for the relation between x and y

可是这个x到底能够解释y多少
However, exactly how much can x explain y

其实我们还是不知道的
We still don’t know

因为对于任何一个x和y
This is because for any x and y

我们都可以画出一条这样的线
we can always draw one such line

那么不同的x
but the explanatory power over y is different

其实对于y的解释力是不同的
across different x

那么有的x可以解释y的多一点
Some x can explain y a bit more

有的x可以解释的可能就要少一点
while some x may explain y a bit less

就像我们在前面
Just as we previously

我们讨论过一个例子
discussed in an example

我们说智商
we said IQ

然后还有你的高考当天的
and the traffic condition and weather condition

交通的状况天气的状况
on the very day of your college entrance examination

那这些都可以预测你的高考成绩
might predict your score in the examination

那么究竟是哪个自变量
So exactly which independent variable

能够解释的好呢
explains y better

我们就要比较这些x的解释力的大小
We need to compare the explanatory power of these x

那我们怎么才能够计算到
So how can we figure out

这个x的解释力的大小呢
the magnitude of the explanatory power of some x

我们之前在介绍回归方程的时候
While introducing the regression equation before

我们说观测值和均值的距离
we said we could break down

可以对它进行分解
the distance between the observed value and mean

可以分解成两块
into two parts

一块是观测值和预测值的距离
One is the distance between the observed values and the predicted values

另外一块是预测值和均值的距离
the other being the distance between the predicted values and the mean

那么在最左边的这块
For the leftmost part

他们的这个平方和
the sum of squares

也就是观测值和均值的差的平方和
namely the sum of squares of differences between the observed values and mean

我们把它叫做SST
is called SST

叫做和方总的和方
or the total sum of squares

那么在后面这个式子
In the expression that follows

当中中间这块
the component in the middle

也就是说观测值和预测值
namely the sum of squares of differences

差的平方和
between the observed values and the predicted values

我们把它叫做残差的平方和
is called the residual sum of squares

也就是SSE
namely SSE

然后预测值和均值差的平方和
Next, the sum of squares of differences between the predicted values and the mean

我们把它叫做SSR
is called SSR

那么就叫回归平方和
namely the regression sum of squares

其实这个式子来说
Actually

留心一点的同学可能会发现
careful students may find

如果说我们把这个式子
if we square this formula

尤其是等式右端的式子
particularly the expression

做了平方以后
on the right side of the equality

那如果把它的展开以后
and after expanding it

其实还会有一个乘积项
there will actually be a product term

也就是观测值减去预测值
namely the observed value minus the predicted value

再乘以预测值减去均值
times the predicted value minus the mean

那这个已经证明过
This has been proved

说这个乘积项为零
and because this product term sums to zero over all observations

所以去分解起来就会更方便
it would be more convenient to break it down

那我们就可以看到
Then we can see

那么就可以得到
and thus derive

总的平方和
the total sum of squares

等于残差平方和
equals the residual sum of squares

加上回归平方和这样一个式子
plus the regression sum of squares
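The decomposition just described can be checked numerically. Below is a minimal sketch in Python, assuming a small made-up data set (the numbers are not from the lecture) and a simple least-squares line with an intercept; the variable names are chosen here for illustration.

```python
# Toy data (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b1 and intercept b0.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # predicted values

# Total, residual, and regression sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)

# Cross term: (observed - predicted) * (predicted - mean), summed.
cross = sum((yi - fi) * (fi - y_bar) for yi, fi in zip(y, y_hat))

print(f"SST={sst:.4f}  SSE={sse:.4f}  SSR={ssr:.4f}")
print(f"SSE+SSR={sse + ssr:.4f}  cross term={abs(cross):.4f}")
```

On this data the cross term vanishes up to floating-point rounding, which is why SST splits cleanly into SSE and SSR.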

那么既然SSE+SSR
since SSE+SSR

合起来等于SST
equals SST

那么SSE越小
the smaller the value of SSE

SSR的值就会越大
the greater the value of SSR

那么SSE越大
the greater the value of SSE

相反SSR的值就会越小
conversely, the smaller the value of SSR

我们是希望说
What we hope is that

如果这个x对y的解释力比较好的话
if this x has better explanatory power over y

SSR的值会比较大
the value of SSR would be greater

那么我们怎么找到一个这样的指标
So how can we find an index

来标记说SSR究竟有多大呢
that tells us how large SSR actually is

那么我们可以直接用SSR
We can obtain an index by

除以SST得到了一个指标
dividing SSR directly by SST

叫R的平方
which is called the square of R

R的平方在不同的教材当中的叫法
the square of R may be called differently

可能会有差别
in different textbooks

比如说有的书叫判定系数
For example, some Chinese textbooks call it 判定系数

也有的书叫确定系数
while others call it 确定系数; in English both are the coefficient of determination

那么他们都指的是同样的一个内容
They all refer to the same thing

指的是R的平方
namely the square of R

那么R的平方的值
The value of the square of R

是在零和一之间的
ranges between 0 and 1

R方越大
The greater the value of the square of R

表示x对y的解释能力也就会越强
the greater the explanatory power of x over y

我们可以把这个R方的式子
We can also expand

把它展开来看一看
the formula for the square of R and have a look

那么R方等于SSR除以SST
the square of R equals SSR divided by SST

那么就是我们现在看到
as shown by the fraction

这个分式的样子
we are seeing now

如果我们把这个式子
If we expand this formula

一层一层的展开
layer by layer

那么最后我们可以展开成什么呢
what will it look like finally

就是我们在最后一行
The last row

它等于括号的平方
equals the square of the bracketed term

括号里边是什么呢
What is in the brackets

括号里边是个分式
It is a fraction

分式的分子是（公式如上）
whose numerator is (the formula as above)

也就是x和y各自的离差乘积和
namely the sum of products of the respective deviations of x and y

而分母上
and whose denominator

是x和y的总的和方再开根号
is the square root of the product of the total sums of squares of x and y

而这个式子就是在它的平方里边
The expression inside the brackets,

括号里边的这个式子
the one that is being squared,

它的公式
is actually

其实就是r的公式
the formula for r

也就是我们说的相关系数的公式
namely the formula for the coefficient of correlation we have mentioned
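The slide with the layer-by-layer expansion is not reproduced in the transcript. A sketch of the expansion for the one-predictor case is given below, assuming the least-squares slope $b_1 = \sum_i (x_i-\bar x)(y_i-\bar y) / \sum_i (x_i-\bar x)^2$, so that $\hat y_i - \bar y = b_1 (x_i - \bar x)$. The notation is chosen here for illustration and may differ from the lecturer's slide.

```latex
\begin{aligned}
R^2 &= \frac{SSR}{SST}
     = \frac{\sum_i (\hat y_i - \bar y)^2}{\sum_i (y_i - \bar y)^2}
     = \frac{b_1^2 \sum_i (x_i - \bar x)^2}{\sum_i (y_i - \bar y)^2} \\[4pt]
    &= \left(
         \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}
              {\sqrt{\sum_i (x_i - \bar x)^2 \, \sum_i (y_i - \bar y)^2}}
       \right)^{2}
     = r^2
\end{aligned}
```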

所以r的平方就等于
So the square of R equals

相关系数的平方
the square of the correlation coefficient
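To see this equality on actual numbers, here is a small Python sketch. It assumes a made-up data set and one predictor; `r_squared` and `pearson_r` are helper names introduced here for illustration.

```python
import math

# Toy data (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def dev_products(u, v):
    """Sum of products of deviations of u and v from their means."""
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))

def pearson_r(u, v):
    """Correlation coefficient r between u and v."""
    return dev_products(u, v) / math.sqrt(dev_products(u, u) * dev_products(v, v))

def r_squared(u, v):
    """R squared = SSR / SST for the least-squares line of v on u."""
    b1 = dev_products(u, v) / dev_products(u, u)
    b0 = sum(v) / len(v) - b1 * sum(u) / len(u)
    v_bar = sum(v) / len(v)
    ssr = sum((b0 + b1 * ui - v_bar) ** 2 for ui in u)
    sst = dev_products(v, v)
    return ssr / sst

r2 = r_squared(x, y)
r = pearson_r(x, y)
print(f"R^2 = {r2:.4f}, r = {r:.4f}, r^2 = {r * r:.4f}")
```

The two routes (sums of squares versus the correlation formula) give the same value of R squared on this data.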

那么这个地方
Here

我们要稍微的做一点点澄清
we need to clarify one small point

就是如果说
If we

我们把那个大R的平方的
take the square root

平方去掉的话
of R squared

那么这个R
then R

其实有一个更正式的名字
actually has a more formal name:

叫做复相关系数
coefficient of multiple correlation

复相关系数其实指的是说
The coefficient of multiple correlation is

这个值是y的观测值
the correlation between the observed value

和预测值之间的相关
and the predicted value of y

如果有多个自变量的话
In the case of multiple independent variables

这个预测值就变成了
the predicted value becomes

多个x的一种线性组合
a linear combination of multiple x

那么当只有一个自变量的时候
In the case of only one independent variable

那么这个y的观测值
the coefficient of correlation between the observed value

和预测值的相关系数
and the predicted value of y

也就是这个大R的值
namely the value of R

也就等于y和x的相关系数
equals the coefficient of correlation between y and x (up to sign, since R is never negative)

也就等于这个小r
and thus equals r

也就是说
In other words

那么只有当
only when

只有一个自变量的时候
there is only one independent variable

大R才等于小r
does R equal r

如果有多个自变量的话
If there are multiple independent variables

大R就不等于小r了
R no longer equals r

因为这个时候
because

会存在着多个小r
there will be multiple r values

因为有几个自变量
After all, the number of coefficients of correlation between independent variables and y

就存在着几个自变量跟y的相关系数
is equal to the number of these independent variables

所以这个地方大家注意到就可以了
Just keep this point in mind
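The distinction between R and the individual r values can be sketched in Python. This is only an illustration under assumptions: the data are made up, there are exactly two predictors, and the two slopes are obtained by solving the 2x2 normal equations on centered data with Cramer's rule.

```python
import math

# Hypothetical data: two predictors x1, x2 and an outcome y.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 4, 8, 6, 12, 10]

def mean(v):
    return sum(v) / len(v)

def dev_products(u, v):
    """Sum of products of deviations of u and v from their means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))

def corr(u, v):
    return dev_products(u, v) / math.sqrt(dev_products(u, u) * dev_products(v, v))

# Normal equations for the two slopes (centered data, Cramer's rule).
s11, s22, s12 = dev_products(x1, x1), dev_products(x2, x2), dev_products(x1, x2)
s1y, s2y = dev_products(x1, y), dev_products(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Predicted values are a linear combination of x1 and x2.
y_hat = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]

big_r = corr(y, y_hat)            # coefficient of multiple correlation
r1, r2 = corr(x1, y), corr(x2, y)  # one simple r per predictor

y_bar = mean(y)
ssr = sum((f - y_bar) ** 2 for f in y_hat)
sst = dev_products(y, y)

print(f"R = {big_r:.4f}, R^2 = {big_r ** 2:.4f}, SSR/SST = {ssr / sst:.4f}")
print(f"r(x1, y) = {r1:.4f}, r(x2, y) = {r2:.4f}")
```

With two predictors there are two simple correlations, and R is at least as large as either of them in absolute value.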

那么R的平方我们还可以继续展开
We can continue to expand the square of R

我们看到R的平方可以等于b的平方
to find that the square of R can equal the square of b

乘以S_x的平方除以S_y的平方
times the square of S_x divided by the square of S_y

也就是说R的平方根回归系数
In other words, the square of R bears some relation to

也有一定的关系
the coefficient of regression

或者说复相关系数
Put differently, the coefficient of multiple correlation

等于回归系数乘以S_x比上S_y
equals the coefficient of regression times S_x over S_y (in absolute value)

S_x是x的标准差
where S_x is the standard deviation of x

S_y是y的标准差
and S_y is the standard deviation of y
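As a quick check of this relation between R squared and the regression coefficient, here is a Python sketch on made-up data. It assumes one predictor and uses the sample standard deviation from the standard library; any consistent definition of the standard deviation gives the same ratio.

```python
import statistics

# Toy data (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Least-squares slope b (the regression coefficient) and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# R squared from the sums of squares.
ssr = sum((a + b * xi - y_bar) ** 2 for xi in x)
sst = sum((yi - y_bar) ** 2 for yi in y)
r2_from_ss = ssr / sst

# R squared from the slope and the two standard deviations.
s_x = statistics.stdev(x)
s_y = statistics.stdev(y)
r2_from_slope = b ** 2 * s_x ** 2 / s_y ** 2

print(f"SSR/SST = {r2_from_ss:.4f}, b^2 * S_x^2 / S_y^2 = {r2_from_slope:.4f}")
```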

好那么知道了R方的计算
Well, now that we know how to calculate the square of R

那么我们也计算到了b_0和b_1的值
and have figured out the values of b_0 and b_1

那么我们就可以计算到SSR SSE
we can figure out SSR and SSE

当然首先我们也可以计算到
Of course, we can first figure out

SST的值
the value of SST

这样的话我们就可以回到我们
This way we can return to

回归的方差分析表的部分
the ANOVA table for regression

当我们可以看到
We can see

在回归的方差分析表当中
that the ANOVA table for regression

那么跟方差的分析表是很像的
is very similar to the table used in analysis of variance

同样的是有和方自由度
in that both contain the sum of squares, degrees of freedom,

均方和F这么几个值
mean square, and F

那么SSR
What about SSR

对我们再重复一遍
Well, let’s reiterate

SSR是预测值和均值的差的平方和
SSR is the sum of squares of differences between the predicted values and the mean

SSE是观测值和预测值的
whereas SSE is the sum of squares of differences

差的平方和
between the observed values and the predicted values

那么SSE越大就说明
The greater the value of SSE

x对y的解释力越差
the lower the explanatory power of x over y

那么SSE的值越小
A smaller value of SSE

那么就说明说
means

x对y的解释力就会越大
the greater explanatory power of x over y

因为SSR的值就会越大
since the value of SSR is greater

那么SST是观测值
SST is the sum of squares of differences between

和均值的差的平方和
the observed values and mean

这个值在数据一定的时候
When data are fixed

SST是一个恒定的一个值
the value of SST is a constant

那么第二列是它的自由度
In the second column are degrees of freedom

那么SSR的自由度
The degrees of freedom of SSR,

也就是回归的自由度是1
namely the regression degrees of freedom, equal 1

为什么是1呢
Why is it 1

你看我们有两个参数
You see, we have two parameters:

一个是a 一个是b
One is a and the other is b

那么a和b这两个参数
Given both parameters a and b

知道了b就知道了a
once b is known a is also known

所以能够自由的变来变去的值
How many parameters are there

参数的个数是几个呢
whose value can vary freely

只有一个
Only one

那么残差是n-2
The residual degrees of freedom are n - 2

那为什么是n-2呢
Why is it n - 2

因为我们有a和b这么两个值
Because we have two values, a and b

基本上有了a和b这么两个值以后
and basically, with these two values of a and b

那么残差可以自由变来变化的个数
the number of residuals that can vary freely

就变成了是n减去它的参数个数个
becomes n minus the number of parameters

那么最后是SST
Finally, there is SST

因为我们在计算SST的时候
When we calculate SST

需要用到总体的均值
we need to use the overall mean of y

那么总体均值确定以后
Once the overall mean has been determined

那么可以自由变来变去的
the number of observed values

观测值的个数就变成了n-1了
that can vary freely becomes n - 1

那么均方也就是MS
Next is the mean square,

那么均方的计算
namely MS

还是跟我们之前
It is calculated the same way as we previously

在方差分析里面介绍的是一样的
introduced in analysis of variance

那么均方就等于和方除以自由度
The mean square equals the sum of squares divided by its degrees of freedom

所以MSR等于SSR除以一
so MSR equals SSR divided by 1

那么MSE等于SSE除以n-2
and MSE equals SSE divided by n - 2

那么这个是均方
This is mean square
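The first three columns of the regression ANOVA table described above can be assembled as follows. This is a sketch on made-up data with one predictor; `anova_table` is a helper name introduced here, not something from the lecture.

```python
# Toy data (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def anova_table(x, y):
    """Return {source: (sum of squares, df, mean square)} for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)

    return {
        "regression": (ssr, 1, ssr / 1),          # MSR = SSR / 1
        "residual": (sse, n - 2, sse / (n - 2)),  # MSE = SSE / (n - 2)
        "total": (sst, n - 1, None),              # no mean square for the total row
    }

table = anova_table(x, y)
print(f"{'source':<12}{'SS':>8}{'df':>4}{'MS':>8}")
for source, (ss, df, ms) in table.items():
    ms_text = "" if ms is None else f"{ms:.4f}"
    print(f"{source:<12}{ss:>8.4f}{df:>4}{ms_text:>8}")
```

Note that the degrees of freedom add up the same way the sums of squares do: 1 + (n - 2) = n - 1.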

其实这里的均方MSR跟MSE
Actually, the mean squares MSR and MSE

都是对于总体方差的无偏估计值
are both unbiased estimators of the population variance when H0 is true

那么如果说这条回归线
If the regression line

是有用的一条回归线
is a useful one

那么这里的x可以预测y
then here x can predict y

那么y的值会随着x的值
and the value of y

变化而变化的话
varies with the value of x

而是这些预测值就会各不相同
so the predicted values will differ from one another

预测值之间的差距就会比较大
and the differences between them will be large

那么这个时候MSR的值
Thus at this moment the value of MSR

就会比较大
will be large

那么MSR的值越大
A greater value of MSR

就越说明这条回归线
is more indicative of the fact that

它的b_1的值应该不为零
the value of b_1 of the regression line is not zero

那么我们用一个指标来标记它们
So we use an index to capture this:

用F F就等于MSR比上MSE
F, where F = MSR / MSE

如果H0为真的时候
If H0 is true

F的值应该是在1的左右
the value of F should be around 1

F的值越大
The greater the value of F

就越说明x对y的撬动的力量
the stronger the influence

会比较大
of x on y

那么这条回归线就越不会
and the less likely the regression line is

是一条水平的线
to be a horizontal line

那么怎么去检验这个F的值大小呢
So how do we test whether the value of F is large enough

我们之前在介绍方差分析的时候
Actually, we have discussed this problem before

其实已经讨论过这个问题
while introducing variance analysis

那么我们要去查F的表
We look up the F table

F的表它有两个自由度
which is indexed by two degrees of freedom

那么在阿尔法等于005的时候
When α=0.05

我们就可以查到
we can look up,

对应的两个自由度
for the two corresponding degrees of freedom,

那么它的临界值是多少
what the critical value is

如果F值大于这个临界值
If the F value is greater than the critical value

那我们就去拒绝H0
then we reject H0

说x可以预测y
and say that x can predict y

说这条回归线是有用的
and that the regression line is useful

那么如果说小于这个F的临界值
if it is smaller than the critical value of F

我们就接受H0
then we do not reject H0

说那这条回归线其实可能
and say that the regression line may actually

对于y没有什么很明显的预测作用
have no significant predictive effect on y

那我们建立这个回归方程
Perhaps the regression equation we set up

可能说明不了什么问题
may not tell us anything

我们没有发现x和y之间的关系
as we have not found any relation between x and y
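The whole test can be sketched in a few lines of Python. The data are made up, and the critical value 10.13 is the tabulated F value for α = 0.05 with degrees of freedom (1, 3), entered by hand the way one would read it from an F table; with a different sample size the value must be looked up again.

```python
# Toy data (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def f_statistic(x, y):
    """Return (F, df_regression, df_residual) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

    msr = ssr / 1        # regression mean square
    mse = sse / (n - 2)  # residual mean square
    return msr / mse, 1, n - 2

# Assumption: critical value read from an F table for alpha = 0.05, df = (1, 3).
F_CRITICAL_1_3 = 10.13

f_value, df1, df2 = f_statistic(x, y)
reject_h0 = f_value > F_CRITICAL_1_3

print(f"F({df1}, {df2}) = {f_value:.4f}, critical value = {F_CRITICAL_1_3}")
print("reject H0" if reject_h0 else "do not reject H0")
```

On this small sample F falls below the critical value, so H0 is not rejected even though R squared is 0.6; with only five observations the test has little power.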

好那我们这节就介绍到这里
Well, so much for this section
