Learn Statistics with Ease > Chapter 9 Correlation and Regression Analysis > 9.5 The application of regression equation > 9.5.1 Regression analysis: model evaluation 回归分析:模型评价

好 那么这节我们来说
Well, in this section let’s talk about

回归方程的解释力
the explanatory power of the regression equation

就像我们刚才
Just as we said

在上一节里面我们说的
in the previous section

我们可以把这条回归线画出来
we can draw out the regression line

而且我们知道
Moreover, we know

我们画出来的这条回归线
the regression line we draw

是x和y它们之间关系的
is the line of best fit

最好的一条拟合线
for the relation between x and y

可是这个x到底能够解释y多少
However, how much of y this x can actually explain

其实我们还是不知道的
we still don’t know

因为对于任何一个x和y
This is because for any x and y

我们都可以画出一条这样的线
we can always draw one such line

那么不同的x
but the explanatory power over y is different

其实对于y的解释力是不同的
across different x

那么有的x可以解释y的多一点
Some x can explain y a bit more

有的x可以解释的可能就要少一点
while some x may explain y a bit less

就像我们在前面
Just as we previously

我们讨论过一个例子
discussed in an example

我们说智商
we said IQ

然后还有你的高考当天的
and the traffic condition and weather condition

交通的状况 天气的状况
on the very day of your college entrance examination

那这些都可以预测你的高考成绩
might predict your score in the examination

那么究竟是哪个自变量
So exactly which independent variable

能够解释的好呢
explains the score best

我们就要比较这些x的解释力的大小
We need to compare the explanatory power of these x variables

那我们怎么才能够计算到
So how can we figure out

这个x的解释力的大小呢
the magnitude of the explanatory power of some x

我们之前在介绍回归方程的时候
While introducing the regression equation before

我们说观测值和均值的距离
we said we could break down

可以对它进行分解
the distance between the observed value and the mean

可以分解成两块
into two parts

一块是观测值和预测值的距离
One is the distance between the observed value and the predicted value

另外一块是预测值和均值的距离
and the other is the distance between the predicted value and the mean

那么在最左边的这块
For the leftmost part

他们的这个平方和
the sum of squares

也就是观测值和均值的差的平方和
namely the sum of squared differences between the observed values and the mean

我们把它叫做SST
is called SST

叫做和方 总的和方
the total sum of squares

那么在后面这个式子
In the expression that follows

当中中间这块
the component in the middle

也就是说观测值和预测值
namely the sum of squared differences

差的平方和
between the observed values and the predicted values

我们把它叫做残差的平方和
is called the residual sum of squares

也就是SSE
namely SSE

然后预测值和均值差的平方和
Next, the sum of squared differences between the predicted values and the mean

我们把它叫做SSR
is called SSR

那么就叫回归平方和
namely the regression sum of squares

其实这个式子来说
Actually

留心一点的同学可能会发现
careful students may find

如果说我们把这个式子
if we square this formula

尤其是等式右端的式子
particularly the expression

做了平方以后
on the right side of the equality

那如果把它的展开以后
and after expanding it

其实还会有一个乘积项
there will actually be a product term

也就是观测值减去预测值
namely the observed value minus the predicted value

再乘以预测值减去均值
times the predicted value minus the mean

那这个已经证明过
It has already been proved

说这个乘积项为零
that this product term sums to zero

所以去分解起来就会更方便
so the decomposition becomes simpler

那我们就可以看到
Then we can see

那么就可以得到
and thus derive

总的平方和
the total sum of squares

等于残差平方和
equals the residual sum of squares

加上回归平方和这样一个式子
plus the regression sum of squares

那么既然SSE+SSR
Since SSE + SSR

合起来等于SST
equals SST

那么SSE越小
the smaller the value of SSE

SSR的值就会越大
the greater the value of SSR

那么SSE越大
the greater the value of SSE

相反SSR的值就会越小
conversely, the smaller the value of SSR
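The decomposition described above can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set (the values of x and y are hypothetical and do not come from the lecture) and the usual least-squares formulas for the intercept and slope. It computes SST, SSE, SSR and the cross-product term.

```python
# Hypothetical toy data, chosen only for illustration (not from the lecture).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for the line y_hat = b0 + b1 * x.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares from the decomposition.
sst = sum((yi - y_bar) ** 2 for yi in y)               # observed vs mean
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # observed vs predicted
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # predicted vs mean

# Cross-product term that appears when the right-hand side is squared.
cross = sum((yi - fi) * (fi - y_bar) for yi, fi in zip(y, y_hat))

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"SSE + SSR = {sse + ssr:.4f}, |cross term| = {abs(cross):.4f}")
```

For this toy data the sums come out to roughly SST = 6, SSE = 2.4 and SSR = 3.6, and the cross-product term vanishes up to floating-point rounding, which is why SST splits cleanly into the two parts.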

我们是希望说
What we expect is that

如果这个x对y的解释力比较好的话
if this x has better explanatory power over y

SSR的值会比较大
the value of SSR would be greater

那么我们怎么找到一个这样的指标
So how can we find such an index

来标记说SSR究竟有多大呢
of the exact magnitude of SSR

那么我们可以直接用SSR
We can obtain an index by

除以SST得到了一个指标
dividing SSR directly by SST

叫R的平方
which is called the square of R

R的平方在不同的教材当中的叫法
the square of R may be called differently

可能会有差别
in different textbooks

比如说有的书叫判定系数
For example, some books call it 判定系数

也有的书叫确定系数
while others call it 确定系数; both are rendered in English as the coefficient of determination

那么他们都指的是同样的一个内容
But they point to the same object

指的是R的平方
namely the square of R

那么R的平方的值
The value of the square of R

是在零和一之间的
ranges between 0 and 1

R方越大
The greater the value of the square of R

表示x对y的解释能力也就会越强
the greater the explanatory power of x over y

我们可以把这个R方的式子
We can also expand

把它展开来看一看
the formula for the square of R and have a look

那么R方等于SSR除以SST
the square of R equals SSR divided by SST

那么就是我们现在看到
as shown by the fraction

这个分式的样子
we are seeing now

如果我们把这个式子
If we expand this formula

一层一层的展开
layer by layer

那么最后我们可以展开成什么呢
what will it look like finally

就是我们在最后一行
The last row

它等于括号的平方
equals the square of the bracketed term

括号里边是什么呢
What is in the brackets

括号里边是个分式
It is a fraction

分式的分子是(公式如上)
whose numerator is (the formula as above)

也就是x和y各自的离差乘积和
namely the sum of products of the respective deviations of x and y

而分母上
and whose denominator

是x和y的总的和方再开根号
is the square root of the product of the total sums of squares of x and y

而这个式子就是在它的平方里边
The expression being squared,

括号里边的这个式子
namely the one inside the brackets,

它的公式
is actually

其实就是r的公式
the formula for r

也就是我们说的相关系数的公式
namely the formula for the coefficient of correlation we have mentioned

所以r的平方就等于
So the square of R equals

相关系数的平方
the square of the coefficient of correlation
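Since the slide formula is not reproduced in the transcript, the expansion can be sketched as follows for the case of a single independent variable. The notation (b_0 and b_1 for the intercept and slope, bars for sample means) is an assumption made here for illustration.

```latex
% Simple linear regression: \hat{y}_i = b_0 + b_1 x_i, with b_0 = \bar{y} - b_1 \bar{x}
\begin{aligned}
\hat{y}_i - \bar{y} &= b_1 (x_i - \bar{x}), \qquad
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \\[4pt]
R^2 = \frac{SSR}{SST}
  &= \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2}
   = \frac{b_1^2 \sum (x_i - \bar{x})^2}{\sum (y_i - \bar{y})^2} \\[4pt]
  &= \left( \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}
                 {\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} \right)^{2}
   = r^2
\end{aligned}
```

The last step substitutes the expression for b_1, after which one factor of the sum of squares of x cancels.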

那么这个地方
Here

我们要稍微的做一点点澄清
we shall make a slight clarification

就是如果说
If we

我们把那个大R的平方的
take the square root

平方去掉的话
of the square of R

那么这个R
then R

其实有一个更正式的名字
actually has a more formal name:

叫做复相关系数
the coefficient of multiple correlation

复相关系数其实指的是说
The coefficient of multiple correlation is

这个值是y的观测值
the correlation between the observed values

和预测值之间的相关
and the predicted values of y

如果有多个自变量的话
In the case of multiple independent variables

这个预测值就变成了
the predicted value becomes

多个x的一种线性组合
a linear combination of the x variables

那么当只有一个自变量的时候
In the case of only one independent variable

那么这个y的观测值
the coefficient of correlation between the observed values

和预测值的相关系数
and the predicted values of y

也就是这个大R的值
namely the value of R

也就等于y和x的相关系数
equals the coefficient of correlation between y and x

也就等于这个小r
and thus equals r (more precisely |r|, since R cannot be negative)

也就是说
In other words

那么只有当
only when

只有一个自变量的时候
there is a single independent variable

大R才等于小r
does R equal r

如果有多个自变量的话
When there are multiple independent variables

大R就不等于小r了
R no longer equals r

因为这个时候
because

会存在着多个小r
there are then several r values

因为有几个自变量
After all, there are as many coefficients of correlation with y

就存在着几个自变量跟y的相关系数
as there are independent variables

所以这个地方大家注意到就可以了
Just keep this point in mind
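The relation between R and r for a single independent variable can be illustrated with a short sketch. It assumes made-up data with a downward trend (hypothetical values, not from the lecture) and two helper functions, `pearson` and `fitted_values`, written here only for the illustration. R is computed as the correlation between the observed and predicted values of y, and r as the correlation between x and y.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def fitted_values(x, y):
    """Predicted values from the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    intercept = y_bar - slope * x_bar
    return [intercept + slope * a for a in x]

# Hypothetical toy data in which y tends to fall as x rises.
x = [1, 2, 3, 4, 5]
y = [5, 4, 5, 4, 2]

r = pearson(x, y)                    # correlation between x and y
R = pearson(y, fitted_values(x, y))  # correlation between y and its predictions

print(f"r = {r:.4f}, R = {R:.4f}, R^2 = {R ** 2:.4f}, r^2 = {r ** 2:.4f}")
```

Because the trend is downward, r is negative while R is positive with the same magnitude, so the two squares agree. With several independent variables the predicted values would come from a multiple regression, which this sketch does not cover.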

那么R的平方我们还可以继续展开
We can continue to expand the square of R

我们看到R的平方可以等于b的平方
to find that the square of R equals the square of b

乘以S_x的平方除以S_y的平方
times the square of S_x divided by the square of S_y

也就是说R的平方根回归系数
In other words, the square of R bears some relation to

也有一定的关系
the coefficient of regression

或者说复相关系数
Put differently, the coefficient of multiple correlation

等于回归系数乘以S_x比上S_y
equals the coefficient of regression times S_x over S_y

S_x是x的标准差
where S_x is the standard deviation of x

S_y是y的标准差
and S_y is the standard deviation of y
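A quick numerical check of this relation, as a sketch on made-up data (the values are hypothetical and not from the lecture). It uses `statistics.stdev`, which is the sample standard deviation with n - 1 in the denominator. The same denominator appears in both S_x and S_y, so it cancels in the ratio.

```python
import math
import statistics

# Hypothetical toy data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx                    # regression coefficient (slope)
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient between x and y

s_x = statistics.stdev(x)        # sample standard deviation of x
s_y = statistics.stdev(y)        # sample standard deviation of y

scaled_slope = b * s_x / s_y
print(f"b * S_x / S_y = {scaled_slope:.4f}, r = {r:.4f}")
print(f"(b * S_x / S_y)^2 = {scaled_slope ** 2:.4f}, r^2 = {r ** 2:.4f}")
```

In this example the slope is positive, so the rescaled slope matches r directly. With a negative slope it would match r in sign, and R would be its absolute value.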

好 那么知道了R方的计算
Well, now that we know how to calculate the square of R

那么我们也计算到了b_0和b_1的值
and have worked out the values of b_0 and b_1

那么我们就可以计算到SSR SSE
we can work out SSR and SSE

当然首先我们也可以计算到
Of course, we can first figure out

SST的值
the value of SST

这样的话我们就可以回到我们
This way we can return to

回归的方差分析表的部分
the ANOVA table for regression

当我们可以看到
We can see

在回归的方差分析表当中
that the ANOVA table for regression

那么跟方差的分析表是很像的
is very similar to the ordinary ANOVA table

同样的是有和方 自由度
in that both have the sum of squares, degrees of freedom,

均方和F这么几个值
mean square, and F

那么SSR
What about SSR

对 我们再重复一遍
Well, let’s reiterate

SSR是预测值和均值的差的平方和
SSR is the sum of squared differences between the predicted values and the mean

SSE是观测值和预测值的
whereas SSE is the sum of squared differences

差的平方和
between the observed values and the predicted values

那么SSE越大就说明
The greater the value of SSE

x对y的解释力越差
the lower the explanatory power of x over y

那么SSE的值越小
A smaller value of SSE

那么就说明说
means

x对y的解释力就会越大
greater explanatory power of x over y

因为SSR的值就会越大
since the value of SSR is then greater

那么SST是观测值
SST is the sum of squared differences between

和均值的差的平方和
the observed values and the mean

这个值在数据一定的时候
When data are fixed

SST是一个恒定的一个值
the value of SST is a constant

那么第二列是它的自由度
In the second column are degrees of freedom

那么SSR的自由度
The degree of freedom of SSR

也就是回归的自由度是1
namely the degree of freedom of regression, is 1

为什么是1呢
Why is it 1

你看 我们有两个参数
You see, we have two parameters:

一个是a 一个是b
One is a and the other is b

那么a和b这两个参数
Given both parameters a and b

知道了b就知道了a
once b is known a is also known

所以能够自由的变来变去的值
How many parameters are there

参数的个数是几个呢
whose value can vary freely

只有一个
Only one

那么残差是n-2
The residual degrees of freedom are n - 2

那为什么是n-2呢
Why is it n - 2

因为我们有a和b这么两个值
Because we have the two values a and b

基本上有了a和b这么两个值以后
and once these two values have been estimated

那么残差可以自由变来变化的个数
the number of residuals that are free to vary

就变成了是n减去它的参数个数个
becomes n minus the number of parameters

那么最后是SST
Finally, SST

因为我们在计算SST的时候
When we calculate SST

需要用到总体的均值
we need to use the overall mean of y

那么总体均值确定以后
Once the overall mean has been determined

那么可以自由变来变去的
the number of observed values

观测值的个数就变成了n-1了
that are free to vary becomes n - 1

那么均方 也就是MS
As for the mean square

那么均方的计算
namely MS

还是跟我们之前
it is calculated the same way as we previously

在方差分析里面介绍的是一样的
introduced in analysis of variance

那么均方就等于和方除以自由度
The mean square equals the sum of squares divided by its degrees of freedom

所以MSR等于SSR除以一
so MSR equals SSR divided by 1

那么MSE等于SSE除以n-2
and MSE equals SSE divided by n - 2

那么这个是均方
This is mean square
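The columns of the table described so far (sum of squares, degrees of freedom, mean square) can be laid out with a short sketch. It assumes made-up data (hypothetical values, not from the lecture) and a single independent variable, so the regression has 1 degree of freedom and the residual has n - 2.

```python
# Hypothetical toy data, used only to illustrate the layout of the table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares intercept a and slope b.
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # regression
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual
sst = sum((yi - y_bar) ** 2 for yi in y)               # total

# Degrees of freedom for one independent variable.
df_reg, df_res, df_tot = 1, n - 2, n - 1

# Mean square = sum of squares / degrees of freedom.
msr = ssr / df_reg
mse = sse / df_res

print(f"{'Source':<12}{'SS':>8}{'df':>4}{'MS':>8}")
print(f"{'Regression':<12}{ssr:>8.3f}{df_reg:>4}{msr:>8.3f}")
print(f"{'Residual':<12}{sse:>8.3f}{df_res:>4}{mse:>8.3f}")
print(f"{'Total':<12}{sst:>8.3f}{df_tot:>4}")
```

Note that both the sums of squares and the degrees of freedom add up: the regression and residual rows sum to the total row.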

其实这里的均方MSR跟MSE
Actually, the mean squares MSR and MSE

都是对于总体方差的无偏估计值
both estimate the population variance, though MSR is unbiased only when the true slope is zero

那么如果说这条回归线
If the regression line

是有用的一条回归线
is a useful one

那么这里的x可以预测y
so that x can predict y

那么y的值会随着x的值
and the value of y

变化而变化的话
varies with the value of x

而是这些预测值就会各不相同
then the predicted values will differ from one another

预测值之间的差距就会比较大
and the differences between them will be large

那么这个时候MSR的值
In that case the value of MSR

就会比较大
will be large

那么MSR的值越大
The greater the value of MSR

就越说明这条回归线
the stronger the indication that

它的b_1的值应该不为零
the value of b_1 of the regression line is not zero

那么我们用一个指标来标记它们
So we capture this with an index

用F F就等于MSR比上MSE
F, where F = MSR/MSE

如果H0为真的时候
If H0 is true

F的值应该是在1的左右
the value of F should be 1 or so

F的值越大
The greater the value of F

就越说明x对y的撬动的力量
the greater the power of leverage

会比较大
of x on y

那么这条回归线就越不会
and the less likely the regression line

是一条水平的线
is to be a horizontal line

那么怎么去检验这个F的值大小呢
So how do we test whether the value of F is large enough

我们之前在介绍方差分析的时候
Actually, we have discussed this problem before

其实已经讨论过这个问题
while introducing variance analysis

那么我们要去查F的表
We look up the F table

F的表它有两个自由度
which is indexed by two degrees of freedom

那么在阿尔法等于0.05的时候
When α = 0.05

我们就可以查到
we look up the entry

对应的两个自由度
for the two corresponding degrees of freedom

那么它的临界值是多少
and read off the critical value

如果F值大于这个临界值
If the F value is greater than the critical value

那我们就去拒绝H0
then we reject H0

说x可以预测y
and say that x can predict y

说这条回归线是有用的
and that the regression line is useful

那么如果说小于这个F的临界值
If it is smaller than the critical value of F

我们就接受H0
then we accept H0, or more precisely fail to reject it

说那这条回归线其实可能
and say that the regression line may actually

对于y没有什么很明显的预测作用
have no significant predictive effect on y

那我们建立这个回归方程
Perhaps the regression equation we set up

可能说明不了什么问题
may not tell us anything

我们没有发现x和y之间的关系
as we have not found any relation between x and y
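The decision rule just described can be sketched as follows. The data are made up (hypothetical values, not from the lecture), and the critical value 10.13 is the standard F-table entry for α = 0.05 with degrees of freedom 1 and 3. It is hard-coded here as an assumption tied to this sample size of five, and a different n would need a different table entry.

```python
# Hypothetical toy data with n = 5, so the F test uses df = (1, 3).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares fit with one independent variable.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

msr = ssr / 1          # regression mean square, df = 1
mse = sse / (n - 2)    # residual mean square, df = n - 2
f_stat = msr / mse

# Tabulated critical value for alpha = 0.05 and df = (1, 3); valid for n = 5 only.
F_CRIT_05_1_3 = 10.13

reject_h0 = f_stat > F_CRIT_05_1_3
print(f"F = {f_stat:.3f}, critical value = {F_CRIT_05_1_3}")
print("reject H0: x predicts y" if reject_h0 else "do not reject H0")
```

With these five points F comes out at about 4.5, below the critical value, so H0 is not rejected even though the fitted line explains a fair share of the variation. Small samples make the test conservative.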

好 那我们这节就介绍到这里
Well, so much for this section
