Learn Statistics with Ease > Chapter 9 Correlation and Regression Analysis > 9.3 Fit the regression equation > 9.3.1 Regression analysis: equation establishment 回归分析:方程建立
好 各位同学
All right, fellow students
我们这节来介绍一些
In this section, we will present some
关于回归分析的一些基本的概念
basic concepts on regression analysis
那么回归分析一般来说
Generally speaking, regression analysis
至少会涉及到两个变量之间的关系
involves the relation between at least two variables
那我们先看一个例子
Let’s examine an example first
在这个例子里面大家可以看到
In this example, everyone can notice
我们有一个坐标轴
we have a pair of coordinate axes
在这个坐标轴上
of which
Y轴表示的是中国各省市的平均寿命
the Y-axis denotes the average lifespan in all municipalities and provinces of China
X轴表示的是每万人床位数
whereas the X-axis denotes the number of sickbeds per 10,000 people
那么这个数据就是说
These data suggest
我们至少要有两个变量
we need at least two variables
一个变量我们要搜集到
one being the average lifespan we shall collect
中国31个省市的平均寿命
in the 31 municipalities and provinces of China
而第二列变量是相对应的
the other being the corresponding
每个省市的每万人床位数
number of sickbeds per 10,000 people per municipality/province
那么这二者之间
So what kind of relation
是一个什么样的关系呢
lies between both
如果我们把这两个变量的值
If we put the values of both variables
放到同样的一个坐标轴上
onto a same pair of coordinate axes
那就像我们现在看到的这样
as we are seeing now
它们就可以画成一条
they can be drawn as a
画成散点图的样子
scatter plot
有的同学可能会想
Perhaps some students may think
那么我们为什么要考虑
why we should consider
这两个变量之间的关系呢
the relation between these two variables
那大家可以想一想
Think about
每万人床位数是个什么概念
what the number of sickbeds per 10,000 people represents
每万人床位数也就意味着
The number of sickbeds per 10,000 people reflects
政府在公共卫生上的开支
the government's spending on public health
建立一家好一点的医院
After all, building a well-performing hospital
需要的投入是很高的
calls for very high investments
比如说像一个三甲的医院
for example, a Grade III Class A (top-tier) hospital
那么它需要有各种各样医疗的设备
calls for a wide variety of medical equipment
需要有病房大楼
It calls for a ward building
需要有门诊的大楼
and an outpatient building
那么需要有一些基建的开支
thereby some expenditures on infrastructures
而且这样的医院
Moreover, such a hospital
还需要很多的医生和护士
also calls for many doctors and nurses
所以我们可以想一想
So we can imagine
政府去建设这样的医院的话
it would actually take a huge investment
其实是需要很大的投入的
for the government to build such a hospital
那么建设这样的医院
So can building such hospitals
那对于我们的人均寿命
really extend our lifespan per capita
是不是真的能够延长我们的寿命呢
所以在这个例子当中主要想讨论
In this example the foremost thing to discuss
那么这两者之间的关系
is the relation between both
也就是说政府在公共医疗上的投入
namely government’s investment in public medical service
和对人们寿命的影响
versus its effect on people’s lifespan
那么只是说在公共医疗上的投入
It is just that, for investment in public medical services,
我们选了一个非常具体的一个变量
we chose a very concrete variable,
选择了每万人的床位数
namely the number of sickbeds per 10,000 people
如果我们观测这样一个散点图的话
When observing such a scatter plot
有一条线
we find there is a line
可以从左下角
that can faintly be traced from the bottom left corner
隐隐约约的可以画到右上角去
to the top right corner
而如果我们画出来这条线的话
When this line is drawn out
大家可能就会想起来
perhaps everyone would recall
你们在高中的时候学过的一个知识
something you learned in senior high school
叫最小二乘法
called the least squares method
那么最小二乘法就可以帮助我们
The least squares method can help us
去建立起来这样的一条回归线
set up such a regression line
那对于这样一条回归线来说
Such a regression line
它有两个非常重要的参数
includes two highly important parameters
一个参数是它的截距
One parameter is its intercept
截距意味着什么呢
What does intercept mean
意味着是说当X为零的时候
It means the Y value
那么这条回归线在Y轴上的
at the point where the regression line crosses the Y-axis
交叉的那个点的Y值
when X is zero
那么另外一个非常重要的参数
The other very important parameter
是它的斜率
is its slope
斜率是说当X每变化一个单位
The slope indicates by how many units Y varies
Y变化多少单位
for each unit X varies by
那么在回归分析当中
In regression analysis
经常的我们其实不是特别的
most of the time we actually pay little
去重视截距的含义
attention to the connotation of intercept
因为截距有的时候它的含义
since it is sometimes
并不是特别清楚
not very clear
而我们实际上
While in fact, what we
特别关注的是两个变量之间的关系
pay particular attention to is the relation between two variables
也就是说我们会特别关心斜率
Namely, we would pay special attention to the slope
如果说斜率比较大的话
A large slope
那就好像是一个杠杆一样
is like a lever
我们可以看到X对Y的撬动的力量
whereby the leverage of X on Y
就会比较大
is large
如果说这斜率比较小
A small slope
那就意味着X对Y撬动的力量
means the leverage of X on Y
就比较小
is small
那如果说
If expressing it
我们用一个符号来表示的话
in a symbol
我们一般会写成
we would generally write it as
(公式如上)
$\hat{Y} = \beta_0 + \beta_1 X$ (the formula shown on the slide)
$\beta_0$就是说的叫截距
where $\beta_0$ is the intercept
$\beta_1$是斜率
and $\beta_1$ is the slope
而Y hat
while Y hat,
上面加了一个尖
written as Y with a caret on top,
那Y hat指的是
refers to
Y的预测值
the predicted value of Y
那为什么是Y hat
Why do we use Y hat
而不是Y呢
instead of Y
对 我们待会会再讨论这个问题
Well, we will discuss this question in a while
那么对于回归分析来讲
For regression analysis
我们其实就是要建立起来
what we actually need to do is set up
这样的一个回归方程
such a regression equation
就是(公式如上)
namely $\hat{Y} = \beta_0 + \beta_1 X$ (the formula shown on the slide)
我们需要能够估计到
We need to be able to estimate
$\beta_0$和$\beta_1$的值
the values of $\beta_0$ and $\beta_1$
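As a minimal sketch of how the intercept and slope can be estimated by least squares, the Python below applies the standard closed-form formulas. The data are invented for illustration (they are not the provincial data from the lecture), and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
print(f"Y hat = {b0:.3f} + {b1:.3f} X")
```

The slope is computed first because the intercept depends on it; in practice a statistics package would normally be used, and the closed form is shown here only to make the two parameters concrete.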
那么通过这样一个回归方程
With such a regression equation
那么我们就可以实现一些目的
we can achieve some goals
第一个目的我们可以实现
The first goal we can achieve
把X和Y的关系
is to express the relation between X and Y
用一种量化的方式来表达
in a quantitative way
很多时候我们知道说
Most of the time we know
当X增加了时候 Y也会增加
as X increases Y increases
但是我们并不知道说
but we do not know
当X增加的时候 Y增加多少
how much Y increases as X increases
或者是说反过来
or, conversely,
当X增加的时候 Y减少多少
how much Y decreases as X increases
那么如果我们想要
If we want to
得到一个非常确定的值
get a precise value
我们就需要用回归方程
we need to use the regression equation
用回归分析来实现
to achieve this through regression analysis
另外我们可以检验有关
Besides, we can test the theories on
X和Y之间关系的理论
the relation between X and Y
就像在上一节当中
Just like in the previous section
我们讨论了资源理论
we discussed the resource theory
和好基因理论
and the good gene theory
那么究竟哪个理论
Exactly which theory
会更符合实际的情况呢
would conform to the real situation
那我们需要用回归方程的方法
We need to use the method of the regression equation
来实现对于这个相关理论的
to implement some tests
一些检验
on these relevant theories
那么第三我们可以测量
Third, we can measure
X和Y之间关系的强度
the strength of the relation between X and Y
X和Y之间的关系
Among the relations between X and Y
有的关系可能是比较弱的
some may be weak
有的关系可能是比较强的
and some may be strong
比如说我们每个同学
For instance, every student
可能都参加过高考
may have taken the college entrance examination
那你参加高考的时候
When you took the examination
影响你高考成绩的因素会有很多
there might be a myriad of factors influencing your score
比如说智商
such as IQ
比如说你高考当天的身体的状况
your physical condition on the very day of the examination
比如说你高考当天的气温
the air temperature of the very day of the examination
比如说高考当天的交通堵塞的情况
and the condition of traffic congestion on the very day of the examination
那么这些都有可能会影响到
All these could make a difference to
你的高考的成绩
your score in the college entrance examination
如果我们把这些变量都考虑进去
If we take all these variables into consideration
我们可能会发现说
we may tell
在这个里边
which of those variables
能够对你的高考成绩
can have the strongest influence
影响力最强的那个变量是什么
on your score in the college entrance examination
那这样的话我们就可以实现说
That way we can conclude
这些变量他们和高考成绩之间的
which relation between these variables and the score of college entrance examination
关系的强度
is stronger
究竟是哪个更强哪个更弱
and which one is weaker
那么这个也是回归分析
This is also another function
可以实现的一个功能
regression analysis can implement
最后我们可以实现预测
Finally, we can achieve
就是在已知X值的条件下
predicting the value of Y
对Y来实现预测
under the condition that the value of X is known
而这个预测
This
预测值就是这个Y hat
predicted value is Y hat
比如说我们之前在介绍
For instance, while introducing
时间序列分析的时候
time series analysis previously
老师会介绍一种方法
the instructor would introduce a method
叫做趋势方程法
called trend equation method
趋势方程它的自变量是时间
The independent variable in a trend equation is time
而因变量是一些经济运行的指标
whereas the dependent variables are some indices of economic operation
如果我们通过趋势方程法
If, by the trend equation method,
建立了某一个经济运行的指标
we have set up a regression equation describing how
随着时间的变化而变化的
some indicator of economic operation
这样的一个回归方程的话
varies with time
那么我们就可以预测到下一年
then we can predict
或者到下两年
what value this index of economic operation
那么这个经济运行的指标
would probably reach
大概会达到一个什么样的值
by next year or by the year after
这是回归分析的几个目的
The above are several goals of regression analysis
那么在我们进一步的介绍之前
Before making further introduction
我们想介绍两个基本的概念
we want to introduce two basic concepts:
一个概念叫确定模型
One is called the deterministic model
确定模型也是用函数的形式
A deterministic model is also expressed
来表示的
in the form of a function
那么在这种确定模型里面
In such a deterministic model
每一个X值都对应着一个单一的Y值
every value of X corresponds to a single value of Y
比如说大家看到这个例子
Look at this example
某个实验室打算采购一批计算机
A laboratory plans to purchase a batch of computers
一台是6500块钱
The unit price is 6500 yuan
X是计算机的台数
If X represents the number of computers
那Y就是总花费
then Y is the total cost
那么在计算机的台数
It follows that the relation between the number of computers
和总花费之间的关系
and the total cost
就是Y等于6500X
is given by Y=6500X
那这样的话我们就可以看到
Thus we can notice
X有一个值
for each value of X
Y就一定有一个确定的值
Y must have a definite value
所以二者之间是一个
So there is a one-to-one correspondence
一一的对应关系
between both
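The computer purchase example can be sketched in a few lines of Python. The unit price of 6500 yuan comes from the lecture; the function and constant names are hypothetical. The point is that the function contains no random term, so the same X always returns the same Y.

```python
UNIT_PRICE = 6500  # yuan per computer, as stated in the lecture example


def total_cost(n_computers):
    """Deterministic model Y = 6500 * X: one X gives exactly one Y."""
    return UNIT_PRICE * n_computers


for x in (1, 2, 10):
    print(x, total_cost(x))
```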
那么在这个里边
Here
更常见的例子是
a more common example is
比如说不同的度量单位之间的转换
the conversion between different units of measurement
比如说对温度来说华氏和摄氏
say Fahrenheit and Celsius for temperature
那么对于一些质量的单位来说
and kilogram and pound
比如说公斤和镑
as units of mass
那么我们现在在PPT上看到的一个
What we are seeing on the PPT is a
是在华氏和摄氏之间的关系
relation between Fahrenheit and Celsius
那么华氏等于什么呢
What does Fahrenheit equal
等于9/5的摄氏加上32
It equals 9/5 of the Celsius value plus 32
那摄氏和华氏之间的关系
So the Celsius-Fahrenheit relation
也是一一对应的
is also a one-to-one correspondence
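For reference, the standard conversion is F = (9/5) C + 32, and its inverse is C = (5/9)(F - 32). The sketch below, with hypothetical function names, shows that the two conversions undo each other, which is what a one-to-one correspondence means here.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: F = 9/5 * C + 32."""
    return c * 9 / 5 + 32


def fahrenheit_to_celsius(f):
    """Inverse conversion: C = 5/9 * (F - 32)."""
    return (f - 32) * 5 / 9


boiling_f = celsius_to_fahrenheit(100)
freezing_f = celsius_to_fahrenheit(0)
round_trip = fahrenheit_to_celsius(celsius_to_fahrenheit(37))
print(boiling_f, freezing_f, round_trip)
```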
这是确定模型
This is a deterministic model
那么确定模型
A deterministic model
其实是不需要去估计的
does not need to be estimated
因为确定模型基本上来说
since a deterministic model is, basically,
就是我们在确定它的时候
exact
是比较准确的
once we have specified it
那么回归分析
Then what regression analysis
其实它要解决的
actually solves
还不是确定模型这样的问题
are not such problems of the deterministic model
那么回归分析要解决是
but
概率模型的问题
problems of the probabilistic model
概率模型指的是说
The probabilistic model means
当X取某个值的时候
when X takes some value
Y的值是不确定的
the value of Y is not definite
而是服从某一个概率分布
but follows a certain probability distribution
那么这个时候X和Y之间关系
The relation between X and Y at this moment
就叫概率模型
is called the probabilistic model
那么Y的值不确定
That the value of Y is uncertain
并不意味着是说
does not mean that
Y的值是随意的
the value of Y is arbitrary
而是说Y的值是服从某个概率分布
but that the value of Y follows a certain probability distribution
而这个概率分布就有它的期望
which has its expectation
还有它的方差
and variance
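A small simulation may make the idea concrete. It holds X fixed and draws many Y values whose expectation lies on a line. The parameter values and the choice of a normal distribution are assumptions made only for this sketch; the lecture does not specify them.

```python
import random
import statistics


def simulate_y(x, b0, b1, sigma, n, rng):
    """Draw n values of Y at a fixed x: Y = b0 + b1 * x + random noise."""
    return [b0 + b1 * x + rng.gauss(0.0, sigma) for _ in range(n)]


rng = random.Random(0)          # fixed seed so the run is repeatable
b0, b1, sigma = 2.0, 0.5, 1.0   # hypothetical parameters, not from the lecture

ys = simulate_y(10.0, b0, b1, sigma, 10_000, rng)
mean_y = statistics.mean(ys)     # should be close to b0 + b1 * 10 = 7.0
var_y = statistics.variance(ys)  # should be close to sigma ** 2 = 1.0
print(round(mean_y, 2), round(var_y, 2))
```

Individual Y values differ from draw to draw, yet their average and spread are stable, which is the sense in which Y is uncertain but not arbitrary.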
比如说
In the example above
像我们刚才介绍的这个例子里边
we have just introduced
当X是每万人病床数
where X denotes the number of sickbeds per 10,000 people
Y是平均寿命
and Y denotes the average lifespan
我们在散点图里面其实也可以看到
as can actually be seen in the scatter plot
那么当X取一个值的时候
when X takes a value
Y是有多个值和它对应
several values of Y may correspond to it
那么如果我们
If we set up
建立起来一条回归线的话
a regression line
就像我们刚才做出的
like the one
这条回归线一样
we created just now
那我们可以得到$\beta_0$和$\beta_1$的值
then we can obtain the values of $\beta_0$ and $\beta_1$
Y hat等于68.065加上0.1641X
Y hat equals 68.065 plus 0.1641X
那这个时候其实你会看到
At this moment you will notice
这条线只是隐隐约约的存在着
this line is only faintly visible
或者说即使我们把这条线
Or rather, even if we draw this line
用实线的形式画出来
as a solid line
那么也并没有任意的一个点
no point would,
或者说大部分的点
or rather, most points would not,
都不是正好落在这条回归线上的
fall exactly on this regression line
很多点都是
Many points would
在这条回归线的上下波动
fluctuate above and below this regression line
那就会跟确定模型有很大的不同
That is very different from the deterministic model
我们刚才说了
We have just said
说在这条回归线上
the regression equation we have set up
我们建立起来的回归方程
for this regression line
是Y hat等于68.065加上0.1641X
is Y hat = 68.065 + 0.1641X
那如果对于任意的一个Y
So for any Y
这个方程要怎么写呢
how shall this equation be written
任意的Y我们就要后面
For an arbitrary Y we need to add
再加上一个残差项
a residual term at the end
也就是说$Y_i$
namely $Y_i$
等于68.065加上0.1641X
equals 68.065 plus 0.1641X
再加是一个残差
plus a residual
这个残差是一个随机变量
where the residual is a random variable
那也就是说我们通过这个回归方程
That is to say, we can obtain a predicted value
可以得到一个预测值
by this regression equation
而真实的观测值
while the actual observed value
是在这个预测值的上下波动的
fluctuates above or below this predicted value
而究竟波动多少
We use $e_i$ to measure
我们是用$e_i$来测量的
the exact amount of the fluctuation
$e_i$是一个随机变量
$e_i$ is a random variable
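The split of an observation into a predicted value and a residual can be sketched as follows. The coefficients are read as 68.065 and 0.1641, on the assumption that the decimal points were lost in the transcript; the observation itself (40 beds per 10,000 people, average lifespan 75.5 years) is hypothetical.

```python
B0 = 68.065   # intercept, assumed reading of the transcript's "68065"
B1 = 0.1641   # slope, assumed reading of the transcript's "01641"


def predict(x):
    """Predicted value on the regression line: Y hat = B0 + B1 * x."""
    return B0 + B1 * x


def residual(x, y_observed):
    """Residual e_i: observed value minus predicted value."""
    return y_observed - predict(x)


# Hypothetical observation, invented for illustration only.
x_i, y_i = 40.0, 75.5
y_hat = predict(x_i)
e_i = residual(x_i, y_i)
print(round(y_hat, 3), round(e_i, 3))
```

By construction the observed value equals the predicted value plus the residual, so a positive residual means the point lies above the line and a negative one means it lies below.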
那为什么对于回归分析
Why must there be an $e_i$
或者说对于概率模型来讲
in regression analysis
一定会存在着一个$e_i$呢
or in a probabilistic model
因为我们要知道
Because we should recognize that
对于整个的现实世界来讲
for the entire real world
任意一个的原因
any cause
都会产生多个结果
would beget multiple outcomes
而任意的一个结果
while any outcome
也都是由多个原因共同作用
has come about
所产生的
under the joint effect of multiple causes
所以当我们在回归方程当中
So when we put one independent variable
我们放入了一个自变量
into the regression equation
其实也就意味着
it actually means
我们放弃了很多很多其他的自变量
we have left out a great many other independent variables
那这些其他自变量
Have these other independent variables
是不是就放弃了他们对Y的影响呢
thereby lost their effect on Y
对 那当然是没有的
Of course they have not
因为只是说
It is only that,
我们从认识世界的角度来看
from the perspective of understanding the world,
我们会尽量的希望
we hope, as far as possible,
能够获得一个既简洁
to obtain a way of describing the world
又有利的关于世界的途径
that is both concise and useful
但是其实世界真实的运行关系
but the real operational relations in the world
是非常复杂的
are actually very convoluted
那么有些变量对于因变量
The effect of some variables on the dependent variable
或者是对于Y的影响
or on Y
或者是非常的琐碎或者是非常的小
is either very trivial or very small
或者是说在我们目前的测量
or has not been taken into account
或者是对变量的考察当中
in our current measurement
没有把它考虑进去
or in our examination of the variables
但是它仍然在发挥着作用
But they still play their role
那么这样的作用
and this role
就构成了就是$e_i$的来源
is a source of $e_i$
这是$e_i$最大的一个来源
This is the largest source of $e_i$
那么还会有一些别的来源
Still there are some other sources
比如说如果我们去测量一些
For example, if we measure
动物的行为
animals' behavior,
那么动物的行为
that behavior
有的时候会有一些随机性
sometimes has some randomness
那并不一定完全的
It does not necessarily
去服从某一个规律的分布
follow a regular pattern completely
另外在测量的过程当中
Furthermore, the measuring process
也会有测量误差
is also open to measuring errors
因为我们要用仪器去实现这个测量
Because we carry out the measurement with instruments,
测量就是会有
the measured value
或者跟那个变量真实的值
will differ somewhat
会有一定的差异
from the true value of the variable
这也是$e_i$的来源之一
This is also one of the sources of $e_i$
那我们再来举一个例子
Let’s take another example
我们就会更深入的
in order to understand the probabilistic model
去理解这个概率模型
in more depth
比如说我们现在看到这个例子
The example we are now looking at
是一个人他的每周的收入
is the relation between an individual’s weekly income
和看电影开支之间的关系
and spending on movies
现在有很多同学就非常喜欢的
Nowadays many students enjoy
去电影院看电影
seeing a movie at the cinema
那么去看电影的话
To see a movie
现在电影票有的也不是特别便宜
you have to pay for the cinema ticket, which is not very cheap
如果你收入高的话
If you have a high income
可能就是支付
you may have no problem
这个电影的开支是没有问题的
paying the expense for the movie
那么如果是收入低的话
But if you have a low income
去支付电影的开支
you may sometimes feel the pinch
有的时候也会觉得有点心痛
when paying for a movie
那么收入对于电影开支的影响
So what is the exact effect
究竟是什么呢
of income on movie spending
那我们搜集了一些数据
We have collected some data
在这个数据当中
Among these data
我们也画了一个散点图
we have graphed a scatter plot
而且我们也做出了一条回归线
as well as a regression line
我们可以看到这样一条回归线
We can see such a regression line
那么在这条回归线当中
in which
Y的预测值等于
the predicted value of Y equals
13.92加上0.076X
13.92 plus 0.076X
也就是说它的截距是13.92
meaning its intercept is 13.92
而斜率是0.076
whereas its slope is 0.076
我们刚才其实说过一点
Actually we have just now mentioned
说截距很多时候
that the intercept often
是没有很明确的意义的
has no clear meaning
那这个地方你就可以看到
Here you can notice
那么它的截距等于13.92
the intercept equals 13.92
意味着什么呢
What does this mean
意味着当X为零的时候
It means that when X is zero
它还会在看电影上花13.92元
the person would still spend 13.92 yuan on movies
这个就很难理解了
which is difficult to interpret
所以它的意义其实不明确的
Hence its meaning is unclear
那我们其实真正关心的是什么呢
So what do we really care about
真正关心的是0.076
What we care about is 0.076
0.076意味着什么呢
What does 0.076 mean
意味着说当你的收入
It means that for each additional yuan
每增加一块钱
of income
你的看电影的开支就会增加七分钱
your spending on movies increases by about 7 fen (0.076 yuan)
那如果说按照一张电影票
If one cinema ticket
30块钱来算的话
costs 30 yuan
大家考虑一下
think about
那你的收入要增加多少
how much your income has to increase
你才会多去看一场电影呢
before you would see one more movie
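The question can be worked out directly from the slope. The intercept 13.92, the slope 0.076 and the ticket price of 30 yuan are from the lecture; the function name and the base income of 600 yuan used for the check are assumptions for this sketch.

```python
B0 = 13.92           # intercept from the lecture
B1 = 0.076           # slope: extra movie spending per extra yuan of income
TICKET_PRICE = 30.0  # yuan per ticket, as assumed in the lecture


def predicted_spending(income):
    """Predicted movie spending for a given weekly income."""
    return B0 + B1 * income


# Income increase needed for predicted spending to rise by one ticket.
extra_income = TICKET_PRICE / B1

# Check against the equation, using a hypothetical base income of 600 yuan.
base_income = 600.0
spending_gain = predicted_spending(base_income + extra_income) - predicted_spending(base_income)
print(round(extra_income, 2), round(spending_gain, 2))
```

Under this equation, income would have to rise by roughly 395 yuan before predicted movie spending increases by the price of one ticket.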
好 我们再过来看一下这个表
Well, let’s look back at this table
在这个表当中你可以看到
From this table, you can see
我们左边的一列是收入的数据
the left column is the data on the income
那么有的收入的数据
Some of the data
是九百 有的是八百
are nine hundred, some are eight hundred
有的是六百
some are six hundred
也有的六百五
and still some are six hundred and fifty
那么对应着具体的某一个收入
Corresponding to a specific income
Y都有多个值和它对应
Y has multiple values
那么这种就是我们说的概率模型
Such is the probabilistic model we are talking about
当X取值固定的时候
When the value of X is fixed
Y的取值并不固定
the value of Y is not fixed
Y的取值是服从某一个概率分布
but follows a certain probability distribution
而这个概率分布是有期望的
which has its expectation
这个期望就是我们在这个表的
as seen in the last column
最后一列看到的
of this table
说当给定X的时候
When X is given
Y的期望是多少
what is the expectation of Y
好 那么我们在回归线上的
Well, in regression analysis
在回归分析当中其实是这个
the idea is actually this:
在给定X的时候
when X is given,
Y的期望是正好落在一条直线上
the expectation of Y falls exactly on a straight line
而观测值和这个期望之间的距离
The distance between the observed value and the expectation
就是我们说的残差
is the so-called residual
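The table described above can be imitated with a small sketch: group the observations by income, take the mean of Y within each group as an estimate of the expectation of Y given X, and measure how far each observation lies from that mean. The numbers below are invented for illustration and are not the values shown in the lecture's table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (weekly income, movie spending) pairs, invented for illustration.
data = [
    (600, 55), (600, 62), (600, 60),
    (800, 70), (800, 78), (800, 74),
    (900, 80), (900, 86),
]

# Collect all observed Y values for each distinct X.
groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# Estimated expectation of Y for each given X.
cond_mean = {x: mean(ys) for x, ys in groups.items()}

# Distance of each observation from the expectation at its own X.
deviations = [(x, y, y - cond_mean[x]) for x, y in data]

for x in sorted(cond_mean):
    print(x, cond_mean[x])
```

Each income level has several spending values but a single mean, and the deviations within a group cancel out, which matches the picture of observations scattered around their expectation.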