Current topic: Learn Statistics with Ease > Chapter 3 Descriptive Statistics: Numerical Methods > 3.8 Coefficient of Variation > 3.8.1 Variance and standard deviation (1): Commonly used indicators of deviation from the center 方差与标准差(一):离中趋势之常用指标
下面我们讲 离中趋势指标
In this lecture, we are going to learn about dispersion tendency indicators
离中趋势指标的话 它说明的情况
The dispersion tendency indicators describe a situation
跟集中趋势指标刚好反的
exactly opposite to that of the central tendency indicators
它在反映总体里面
They reflect the degree of difference among
各个变量值之间的差异程度
the variable values in the population
因为这些差异程度将导致
These differences affect
我们对算术平均
how we understand the arithmetic mean
就是对集中趋势的理解
that is, the central tendency
也就是可能能够说明
In other words, they can show
集中趋势指标的代表性
whether the representativeness of the central tendency indicator
高和低的有关问题
is high or low
离中趋势指标
The dispersion tendency indicators
它的计算的作用非常大
are very useful to calculate
一 说明平均指标它的代表性高低
First, they show how representative the mean is
第二 还能说明我们的事物发展
Second, they show whether the development of something
是否均衡或者是否稳定
is balanced or stable
当然这个均衡 稳定
Of course, balance and stability
都是时间和空间上
both over time and across space
都要进行测算
have to be measured
比如说
For example
我们的新产品的研制
in the development of a new product
它就要进行测定
this has to be measured
比如说某种新产品 农产品
For a new product, say an agricultural product
它的试值
in its trials
它的推广
and its rollout
你就要测定它
you have to determine
在时间上 它会不会变异
whether it varies over time
在空间上 会不会变异
and whether it varies across space
但农产品相对还好一些
But agricultural products are relatively forgiving
如果出现了问题
If a problem arises
它的方差太大 出现了问题
because the variance is too large
弥补起来
it can be remedied
社会成本也不多
without much social cost
但是 如果是药品
But if it is a pharmaceutical product
它推广起来 那出现的就是人命
a problem in its rollout costs human lives
所以在这里面
So in promoting pharmaceutical products
我们大家都很重视这个指标
dispersion tendency is highly valued
这类指标 它的计算
The calculation of this indicator
它反映的情况就是
represents the variation
所有的总体单位的变量值
among all the variable values of population units
在总体里面它们的差异程度
in the population
到底有多大
and how large those differences are
那差异程度到底有多大
So how large are the differences
我们大家首先来想
Let us think about it
首先就会想到第一个
The first thing that comes to mind is
它最大的差异有多大
how large the biggest difference is
那就第一个指标 全距 也叫极差
So the first indicator is the range, also called the extreme difference
它由R表示 它是用最大变量值
It is denoted by R and equals the maximum variable value
减掉最小变量值
minus the minimum variable value
用这个数字来表示
We use this number to express it
比如说 我们工人生产的产品
For example, take the products our workers make
比如说这个产品
Take one such product
用尺寸表示它的大小
whose size is given by a dimension
它最大的生产这个产品的尺寸
Take the largest dimension produced
最大尺寸用最小尺寸相减
and subtract the smallest dimension
它就它的极差是多大
to get the range
那说明它的产品质量
As a statement about product quality
极差越大 产品质量越差
the larger the range, the worse the quality
这是我们讲的第一个
This is the first indicator we have learned
反映离散程度的指标
to represent dispersion tendency
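As a minimal sketch of the range just described, assuming a hypothetical list of product dimensions (the numbers and the name `value_range` are illustrative, not from the lecture):

```python
# Hypothetical product dimensions in millimetres (illustrative values only).
sizes = [98, 101, 100, 104, 96]


def value_range(values):
    """Range R = maximum variable value minus minimum variable value."""
    return max(values) - min(values)


R = value_range(sizes)
print("R =", R)  # the largest dimension minus the smallest dimension
```

Only the two extreme values enter the result; changing any of the values in between leaves R unchanged.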
下面的反映离散程度指标
The following indicators representing dispersion tendency
肯定就是从第一个指标开始
originate from this first one
你发现第一个指标它的优点很明确
The first indicator, we can see, has a clear advantage
就是直观 告诉你
It tells us in an obvious way
这个变量值的离散程度有多大
the dispersion degree of the variable values
但是它的缺点
But it also
也出来了
has disadvantages
它只是讲了极端
It only tells us about the extremes
中间所有变量值它怎么变
It fails to show how all the values in between vary
差别有多大反映不出来
or how much they differ
那这样的话我们就把x_i
In that case, we want x_i
它的所有变动都反映出来
to have all of its variations reflected
那怎么反应 大家想
How do we reflect them? Think about it
那就要给一个对比
We need a comparison
要找一个标准
so we need to find a standard
那就是x_i-a
That is, x_i - a
都跟a对比
every value is compared with a
a就是标准
a is the standard
那这个标准怎么确定呢
How can we establish the standard
那我们就可以取最大值
We could take the maximum value
也可以取最小值
or the minimum value
也可以取中间值
or a middle value
也可以取它本身
or the value itself
我们通过不断地试验
Through repeated trials
不断地证明
and repeated proofs
最后确定了
it was finally settled
我们每一个变量值x_i
that each variable value x_i
与算术平均数相比
is compared with the arithmetic mean
比完了以后我们把它
After that, we take the
比出来的差额来进行平均
average of the resulting differences
来说明它们
to describe
总体的变量值的离散程度
the degree of dispersion of the variable values in the population
但这里就有一个问题
But there is a problem.
我们知道
As we know
算术平均数它有一个特点
the arithmetic mean has a property
(公式如上)它永远等于0
the sum of the deviations from it, ∑(x - x bar), is always equal to 0
也就是说 小的数字减平均数
That is, a value smaller than the mean, minus the mean
它是为负 大的数字减平均数
is negative, and a value larger than the mean, minus the mean
它为正 它正负抵消等于0
is positive, and the positives and negatives cancel to 0
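The cancellation just described can be checked with a small sketch; the data below are hypothetical and chosen only so the arithmetic is easy to follow:

```python
# Hypothetical variable values (illustrative only).
values = [2, 4, 6, 8, 10]

mean = sum(values) / len(values)           # arithmetic mean
deviations = [x - mean for x in values]    # each value minus the mean
total = sum(deviations)                    # negatives and positives cancel

print("mean:", mean)
print("deviations:", deviations)
print("sum of deviations:", total)
```

Values below the mean give negative deviations and values above it give positive ones, so the plain sum cannot serve as a measure of dispersion.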
为了让它正负不抵消 怎么办呢
To keep the positives and negatives from cancelling, what can we do
我们就是说它的离差
For these deviations
我们是第一步
the first step
第一种方法去处理的时候
the first method of handling them
就取绝对值
is to take the absolute value
正的保持正
Positive values stay positive
负的变为正
and negative values become positive
那就把它的离差加起来
Then we add the deviations together
就离差和加起来
take the sum of the deviations
除以它的总体单位数
and divide it by the number of population units
这样算出来叫平均差
The result is called the average deviation
也就是平均离差
or the mean deviation
我们这个上面写的 就是
As we have written here
(公式如上)
(see the formula above)
当然这是简单的 加权的话
Of course, this is the simple form. For the weighted form
加上f除以底下的∑f
multiply each absolute deviation by its frequency f and divide by ∑f
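A minimal sketch of the simple and weighted mean deviation as described above. The function names and all data are hypothetical; in the weighted version each x is assumed to be a class mid-point with frequency f:

```python
def mean_deviation(values):
    """Simple mean deviation: sum of |x - mean| divided by the number of units."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


def weighted_mean_deviation(midpoints, freqs):
    """Weighted mean deviation: sum of |x - mean| * f divided by the sum of f."""
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / total_f
    return sum(abs(x - mean) * f for x, f in zip(midpoints, freqs)) / total_f


# Hypothetical ungrouped data.
md = mean_deviation([2, 4, 6, 8, 10])

# Hypothetical grouped data: class mid-points and their frequencies.
wmd = weighted_mean_deviation([10, 20, 30], [1, 2, 1])

print("mean deviation:", md)
print("weighted mean deviation:", wmd)
```

The weighted version reduces to the simple one when every frequency equals 1.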
下面我们试试用加权平均差的方法
Now let us try to calculate the weighted mean deviation
对二妞班上
by analyzing
这次统计学考试成绩进行分析
the Statistics test scores in Er Niu’s class
在第四讲中
In our fourth lecture
我们已经通过
we have already learned
加权算术平均数的方法
how to calculate weighted arithmetic mean
得到了二妞班
We know
这次统计学考试的平均成绩
the mean of Statistics test scores in Er Niu’s class
也就是72.6分
is 72.6.
那么有了这个平均成绩
With this average grade
下面我们可以进一步计算
we can further calculate
其加权平均差
the weighted mean deviation
前面我们已经知道
Earlier
加权平均差的计算公式
we learned the formula for the weighted mean deviation
(公式如上)
(see the formula above)
由此我们可以知道
From this formula we know
要计算出二妞班上
that to calculate the weighted mean deviation
这次考试的加权平均差
of test scores in Er Niu’s class
我们就需要得到
we need
(公式如上)的数据
data (see the formula above)
以及∑f的数据
and ∑f
大家下面看到的这张频数分布表上
Let us look at this frequency distribution table
已经将(公式如上)以及
it shows the values of (see the formula above)
(公式如上)的值
and (see the formula above)
计算出来了
already calculated
这里要注意的是(公式如上)中的x
Note that the x in (see the formula above)
它实际上表示的是每一组的组中值
actually denotes the mid-point of each class
而x拔是我们在第四讲中
and x bar is the value we calculated in the fourth lecture
已经计算出来的 72.6
namely 72.6
如 50到60这一组
Take the class from 50 to 60 as an example
所计算出来的X-X拔
the calculated X - X bar
应该等于55-72.6=-17.6
should be 55 - 72.6 = -17.6
通过这张频数分布表我们已经
From the frequency distribution table
可以看到
we can see
(公式如上)的加和是等于389.6
that the sum of (see the formula above) is equal to 389.6
而∑f等于50
and ∑f is equal to 50
将这两个数据分别代入得到
Substituting these two values, we get
A.D=7.792
A.D. = 389.6 / 50 = 7.792
也就是我们要计算得到的
which is the weighted mean deviation
加权平均差
we set out to calculate
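The arithmetic quoted in this example can be re-checked with a short sketch. It uses only the figures stated in the lecture (the class 50 to 60, the mean 72.6, the total 389.6 and ∑f = 50); the variable names are illustrative:

```python
# Figures quoted in the lecture.
mean_score = 72.6          # weighted arithmetic mean from the fourth lecture
class_lower, class_upper = 50, 60
sum_abs_dev_times_f = 389.6
sum_f = 50

# Class mid-point: (lower limit + upper limit) / 2.
midpoint = (class_lower + class_upper) / 2

# Deviation of that mid-point from the mean, rounded to one decimal place.
deviation = round(midpoint - mean_score, 1)

# Weighted mean deviation: total of |x - mean| * f divided by the sum of f.
ad = sum_abs_dev_times_f / sum_f

print("mid-point:", midpoint)
print("deviation:", deviation)
print("A.D. =", round(ad, 3))
```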
这里需要特别说明的是(公式如上)
It is worth noting that (see the formula above)
它表示的是单个离差
represents a single deviation
而(公式如上)它表示的是组离差
while (see the formula above) represents a class deviation
而(公式如上)
and (see the formula above)
则表示的是总离差
represents the total deviation
还有一种处理方法
There is another way to handle this
让它的负数变为正数
turning negative values into positive ones
正数保持正数
while keeping positive values positive
就是什么呢
What is it
它的离差 x-x拔的离差呢
Take the deviation x - x bar
括号进行平方
and square the bracket
平方了以后呢
After squaring
它就都保持正的
none of the terms is negative
把它平方和加起来
Sum the squared deviations
除以它的总体单位数
and divide by the number of population units
那得出的这个指标 得出的结论
The resulting indicator
得出的结果
the result of this computation
就是我们平时讲的方差
is what we usually call the variance
这个指标用的特别多
It is a widely used indicator
它就(公式如上)
It is (see the formula above)
这个指标开根号以后就是标准差
Its square root is the standard deviation
这个指标 在我们后面会使用非常多
We will use this indicator a great deal in later lectures
这也是统计里面用得比较多的
It is also a widely used indicator
一个指标
in statistics
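A minimal sketch of the variance and standard deviation as just defined, dividing by the number of population units. The function name and the data are hypothetical; the optional `freqs` argument gives the weighted form with frequencies f:

```python
import math


def population_variance(values, freqs=None):
    """Sum of squared deviations from the mean, divided by the number of units.

    With freqs, each squared deviation is weighted by its frequency f
    and the total is divided by the sum of f.
    """
    if freqs is None:
        freqs = [1] * len(values)
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total_f
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total_f


# Hypothetical ungrouped data.
variance = population_variance([2, 4, 6, 8, 10])
std_dev = math.sqrt(variance)  # the standard deviation is the square root

# Hypothetical grouped data: class mid-points and frequencies.
weighted_variance = population_variance([10, 20, 30], [1, 2, 1])

print("variance:", variance)
print("standard deviation:", std_dev)
print("weighted variance:", weighted_variance)
```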
下面我们试试用加权方差的方法
Now let us try to calculate the weighted variance
对二妞班上这次统计学考试成绩
by analyzing the Statistics test scores
进行分析
in Er Niu’s class
首先根据方差的基本公式
First, according to the formula of variance
(公式如上)
(see the formula above)
(公式如上)
(see the formula above)
可以知道
we can see
要求其加权方差
that to calculate the weighted variance
就必须计算出
we first have to calculate
(公式如上)
(see the formula above)
∑f的值
and the value of ∑f
下面我已经将所需计算的数值
Here all the values we need to calculate
显示在这张频数分布表上
are displayed on this frequency distribution table
我们只需要将它们
All we need is to substitute them
代入方差公式即可
into the formula of variance
但是 在这里要注意的是
But we must keep in mind
其中的x实际上是每一组的组中值
the x here is actually the mid-point of each class
它并不是每一组数据的真实值
not the actual values of the data in each class
也就是说
That is to say
在组距式数列中
in a class-interval series
其每一组的组中值表示的是
the mid-point of each class
这一组数据的平均值
stands in for the mean of the data in that class
而不是这一组数据的真实值
rather than the actual values of the data
由此我们进而可以知道
From this, we can know that
通过加权方差所得到的σ平方
the σ² obtained from the weighted variance formula
它并不是总体的真实方差
is not the true variance of the population
而真实方差我们就需要通过计算
To get the true variance, we need to calculate
它的简单方差
the simple variance
(公式如上)
(see the formula above)
(公式如上)
(see the formula above)
我们在这里就不给大家进行计算了
I will skip the detailed computation here
有兴趣的同学
If you are interested
可以自己计算一下
you are welcome to do the computation by yourself
它的简单方差
The simple variance
那么肯定是与我们
will certainly differ from
刚刚所计算得到的加权方差的结果
the weighted variance we have just calculated
是不一样的
the two results are not the same
那方差在计算的时候
So when calculating the variance
在实际过程中计算的时候
in actual practice
要注意了
we need to be careful
实际过程中
In practice
比如我们举的例子里面
as in the example we just used
用的是组距数列计算的时候
when the calculation uses a class-interval series
这个方差它的组距数列用的是
the variance is computed from
组中值
the class mid-points
也就是我们刚上面讲的
That is, as we just discussed
上限加下限除以2的组中值
the mid-point, the upper limit plus the lower limit divided by 2
来代表这一组的组平均数
stands in for the mean of each class
计算出来的方差
The variance calculated this way
它不是这个总体的真实方差
is not the true variance of the population
总体的真实方差 其实
The true variance of the population, actually
就用刚才原始的
uses the original formula
(∑x-x拔)平方除以n
∑(x - x bar)² divided by n
这才是它真实方差
This is the true variance
当然这是未分组资料
Of course, only ungrouped data
才有真实的方差
give the true variance
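The gap between the mid-point variance and the true variance can be seen in a small sketch. The raw scores and class limits below are hypothetical (they are not the class data from the lecture), and the function name is illustrative:

```python
def population_variance(values, freqs=None):
    """Population variance; with freqs, each value is weighted by its frequency."""
    if freqs is None:
        freqs = [1] * len(values)
    total_f = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total_f
    return sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total_f


# Hypothetical raw scores and class intervals [lower, upper).
raw_scores = [51, 53, 62, 68, 69, 71, 79]
classes = [(50, 60), (60, 70), (70, 80)]

# Class mid-point = (upper limit + lower limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]
# Frequency = number of raw scores falling in each class.
freqs = [sum(lo <= x < hi for x in raw_scores) for lo, hi in classes]

grouped_variance = population_variance(midpoints, freqs)  # uses mid-points
true_variance = population_variance(raw_scores)           # uses the raw data

print("mid-points:", midpoints, "frequencies:", freqs)
print("variance from mid-points:", round(grouped_variance, 2))
print("variance from raw scores:", round(true_variance, 2))
```

The two figures differ here because each mid-point only approximates the scores inside its class.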
那分了组的时候
After the data is grouped
那怎么办
what can we do
那我们就会下面有一个
For that, we have
方差的一个数学性质
a mathematical property of the variance
大家注意一下
Please pay attention to it
那下面我们来讲方差的数学性质
Now let us talk about the mathematical property of the variance
方差的数学性质就是
This mathematical property
方差加法定理
is the variance addition theorem
它就把未分组的资料
It takes ungrouped data
分成几个组
and divides them into several groups
比如我们PPT里面讲的例子里面
as in the example on our slides
它就分成三个组
where the data are divided into three groups
这三个组里面
With these three groups
我们马上就能看出来
we can see immediately
首先它没有分之前
first, that before the data are grouped
它们自己有一个总的离差
they have a total deviation of their own
也就是总方差
namely the total variance
已经算出来了
which has already been calculated
第二 首先三个组之间就有差别
Second, the three groups differ from one another
那这样的话
In that case
我们可以计算三个组之间的差别
we can calculate the difference among the three groups
算出来的方差 叫组间方差
The resulting variance is called the between-group variance
以什么来代表来算
What represents each group in this calculation
以各个组的组平均数作为代表
Each group's mean serves as its representative
进行加权计算
in a weighted calculation
大家看一下这个公式
Let us look at this formula
第二 组里面它的这些单位
Then, the units within each group
你看一下 它们之间也有差别
if you look, also differ from one another
那这个差别我们也可以算出方差
From these differences we can also calculate a variance
那这个方差是组内方差
This variance is the within-group variance
那有三个组 就是三个组内方差
With three groups, there are three within-group variances
那我们现在算出三个组内方差
Now we calculate the three within-group variances
大家用这个公式算出来以后
After calculating them with this formula
再算一个总的组内平均方差
we then calculate an overall average within-group variance
就用这个公式
with this formula
用各组的组内方差进行加权平均
taking the weighted mean of the within-group variances
算出组内平均方差
to get the average within-group variance
大家发现通过这个计算
We can see from this calculation
总方差会等于组内平均方差
that the total variance equals the average within-group variance
加上组间方差
plus the between-group variance
这个就是我们讲的
This is what we call
方差的加法定理
the variance addition theorem
刚才我们讲这个例子
In the example we discussed earlier
用组距数列来计算方差 有一点像
calculating the variance from a class-interval series looks a bit like
我们用组平均数
using group means
计算的组间方差
to calculate the between-group variance
但是这两者
But the two are
本质是不相同的
different in nature
但是形式上相同
despite their similarity in form
这是方差的加法定理
This is the variance addition theorem
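A minimal sketch of the variance addition theorem stated above, assuming three hypothetical groups (the slide data are not reproduced here) and weighting by the number of units in each group:

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: mean squared deviation from the group's own mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


# Three hypothetical groups of variable values.
groups = [[2, 4, 6], [5, 7, 9, 11], [10, 12, 14]]

all_values = [x for g in groups for x in g]
n = len(all_values)
grand_mean = mean(all_values)

# Total variance of the ungrouped data.
total_variance = variance(all_values)

# Average within-group variance: within-group variances weighted by group size.
avg_within = sum(variance(g) * len(g) for g in groups) / n

# Between-group variance: squared deviations of group means from the
# overall mean, weighted by group size.
between = sum((mean(g) - grand_mean) ** 2 * len(g) for g in groups) / n

print("total variance:", round(total_variance, 4))
print("average within-group + between-group:", round(avg_within + between, 4))
```

The two printed figures agree for any grouping of the data, which is what the theorem asserts.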
方差的加法定理的用途特别多
The variance addition theorem has many uses
我们可以用来进行一系列的
We can use it for a whole series of
分析和检验
analyses and tests
这个等到大家去看
You will see this when you get to
方差分析里面就有
analysis of variance
举个比较简单的例子
Take a fairly simple example
比如说化妆品
cosmetics, for instance
它的广告效果到底用
For its advertising
哪种广告媒体做 它的效果好
which advertising medium works best
你就可以用这种方法来进行测定
You can test this by using this method
这是方差的有关问题
These are the issues concerning variance
方差开平方以后就是标准差
The square root of the variance is the standard deviation
标准差它的单位
The unit of the standard deviation
跟平均差的单位
the unit of the mean deviation
和全距的单位
the unit of the range
与总体单位的变量值的单位
and the unit of the variable values of the population units
是一样的
are all the same
哦 原来方差和标准差这么有用呀
Well, variance and standard deviation are truly useful tools
以后我就用它们来帮助老师
In the future, I will use them to help the teacher
选择最稳定的学生
choose the most stable students
来担任辅导差生的任务啦
to take on the task of tutoring weaker students
方差和标准差可不是万能的
But variance and standard deviation are not a panacea
如果遇到均值不同
When data sets differ in their means
而标准差也不同的数据组
and also in their standard deviations
它们是很难比较的
they are hard to compare
必须对它们进行一定的转换
They must first be transformed in some way