7.6.1 Determination of sample size: a prelude to sampling (1) 样本容量的确定：抽样的前奏（一）

大家好
Hello, everyone

欢迎回到轻松学统计课堂
Welcome back to the Learn Statistics with Ease class

在这一章前面的几讲里面
In the foregoing lectures of this chapter

我们学习了点估计
we learned about point estimation

以及点估计量评选的优良标准
and the criteria for judging good point estimators

还有我们介绍了
And we introduced

总体均值总体比例
interval estimation for the population mean, the population proportion,

以及总体方差的区间估计的方法
and the population variance

接下来为大家介绍的是
Next, we are going to introduce

样本容量的确定
sample size determination

其实从抽样的环节来看
Actually, from the sampling point of view

样本容量的确定
sample size determination

应该是先于总体均值或者比例
should come before the interval estimation of the population mean, the proportion,

或者是方差的区间估计的环节的
or the variance

我们在实际调查数据之前
Before we conduct the actual survey

就应该先要知道
we should first know

我们要抽查多少规模的样本
what size is the sample we are going to select

所以在抽样的环节来看
So in terms of the sampling workflow,

样本容量应该是要放在前面的
sample size determination should come first

那为什么在这一章
Then why, in this chapter, do we

我们把样本容量的确定
put sample size determination

放到后面呢
at the end,

采用了倒叙的方法
telling the story in reverse order?

通过后面的学习
As we go along with the study

我想大家就能够明白
I think you'll understand

我们为什么要这么做了
why we're doing it this way

在这一讲里面
In this lecture,

我们为大家介绍
we are going to introduce

总体均值估计时
how to determine the sample size

样本容量的确定
when estimating a population mean

以及总体比例估计时
and how to determine the sample size

样本容量的确定
when estimating a population proportion

这两个内容
these two topics

首先我们来分析一下
Let's analyze first

有哪些因素会影响到
what factors will affect

样本容量的大小呢
the size of sample

样本容量的确定
Determining the sample size

对于我们来讲
is, for us,

是一个非常关键的问题
a very critical issue

因为如果样本容量大的话
Because if the sample size is large

当然理论上来讲
theoretically,

我们调查的结果会更准确
the results of our survey will be more accurate

但是这样就不容易显示
But then it is hard to realize

抽样调查的优越性
the advantages of a sample survey

而如果样本容量太小的话
And if the sample size is too small

有可能产生一个比较大的误差
it is possible to produce a relatively large error,

又无法满足我们对准确性的要求
making it unable to meet our requirements for accuracy

所以样本容量的大小
So the sample size

是一个非常关键的问题
is a very critical issue

那么接下来
Then,

我们首先来分析
let's analyze first

有哪些因素会影响到
what factors will affect

样本容量的大小呢
the size of samples

根据前面的学习
According to the previous study

我来帮同学们
let me help you

稍微地总结一下
to summarize a little bit

哪些因素会影响到
what factors will affect

样本容量的大小
the size of samples

首先第一个
The first factor

可能影响样本容量大小的因素
that may influence sample size

可能是极限误差
is the limiting error (the margin of error),

也就是我们能接受的
that is, the maximum error range

最大的误差范围
acceptable to us

如果我们能接受的
If the maximum error range that we can accept

最大的误差范围放大一点
is a little bit larger,

那么我们允许的样本容量的大小
the sample size we need

就可以稍微地少一点
can be a little smaller

但是如果我们对极限误差
But if we're strict about

要求比较严格
the limiting error,

也就是我们要求的准确度
that is, if we require

比较高的话
a high degree of accuracy,

那么这个时候
in this case,

我们就可能需要一个较大的样本
we might need a larger sample

来支撑我们一个准确度高的结果
to support our high accuracy results

这是极限误差
This is the effect of the limiting error

对于样本容量的影响
on the sample size

第二个
The second

可能对样本容量
factor that may affect

带来影响的因素是
the sample size is

总体的方差
the variance of population

也就是说
That is

总体内部各单位的变异情况
variation of units in the population

因为如果总体各单位变异大的话
Because if the units of the population vary a lot

那么为了得到一个
then to get an estimate

结果相同的估计
of the same quality,

我们需要一个
we need

较大的样本来支撑
a larger sample to support it

而如果总体内部各单位本身
and if the units within the population themselves

差异比较小
differ less from one another,

也就是总体方差比较小的情况下
that is, when the population variance is small,

这个时候你可以稍微少抽一点
you can draw somewhat fewer

样本单位出来
sample units

也可以达到一个相似的准确程度
and still reach a similar degree of accuracy

这是第二个因素
That's the second factor

第三个因素的话
The third factor

就是置信水平1-α
is the confidence level, 1-α

根据前面的分析
According to the previous analysis

置信水平
confidence level

它会影响到极限误差
will affect the limiting error

由于极限误差
As limiting error

它是影响我们样本容量的
is a very crucial factor that

一个非常关键的因素
can affect the sample size

因此置信水平
so, confidence level

也会通过这个传导机制
may, through this transmission mechanism

对样本容量产生影响
affect the sample size

那通常情况下
Generally,

如果我们要求比较高的
if we want a high

置信水平的话
confidence level,

那一般情况下
in general

样本容量相对地也要大一些
the sample size will be relatively large

而如果你的把握程度
If you do not want a

不要求那么高的话
high degree of assurance,

那么样本容量
the sample size

稍微地也可以少一点
can be a bit smaller

这是第三个因素
This is the third factor

那从前面三个因素来看的话
The foregoing three factors

其实它们都可能是
actually can,

通过极限误差
through limiting error

也就是我们对于准确程度的要求
that is, our requirement for accuracy,

来影响样本容量
affect the sample size

产生的误差越小
The smaller the error

我们对于样本容量的要求就越高
the higher the sample size requirement

产生的误差越大
the larger the error

我们对于容量的要求就没有那么高
the lower the size requirement

那么还有哪一些因素
What are the other factors

可能会对误差的大小产生影响呢
that might influence the error

通过回顾我们前面所学过的知识点
Recalling what we learned earlier

在抽样与抽样分布这一章里面
in the chapter of sampling and sampling distribution

我们知道抽样的组织形式
we know that the organization form of sampling

以及抽样的方法
and the sampling methods

也会对抽样所产生的误差
may also have a certain impact on

带来一定的影响
the sampling error

比如我们知道
For example we know that

分层抽样的情形下面
in stratified sampling

它的误差相对来说是比较小的
the error is relatively small

那如果我们采用的是分层抽样
so if we use stratified sampling

而不是简单随机抽样的话
instead of simple random sampling

样本容量
can the sample size

是不是可以稍稍地小一些呢
be a little smaller

那如果我们讨论的是
If the sampling methods under discussion are

重复抽样和不重复抽样
repeated sampling (sampling with replacement) and

这样的抽样方法的话
non-repeated sampling (sampling without replacement),

我们通过前面的学习也知道
we’ve known from the previous study that

重复抽样所带来的误差
the error in repeated sampling

相对来说会比较大
will be relatively large

而不重复抽样呢
while in the non-repeated sampling,

它的误差相对来说
the error will be relatively

要小一些
small

因此方法的不同
Therefore, sampling method

也会对容量产生一定的影响
may have a certain impact on the sample size

因此稍微总结一下
Let’s summarize a little

刚才前面提过的五个因素
The five factors mentioned earlier

都有可能会对样本容量的大小
may all have an impact

产生影响
on the sample size

其实它们最主要的一个传导机制
In fact, their primary transmission mechanism

就是我们对准确程度的要求
is our requirement for accuracy

准确程度越高
The higher the required accuracy,

样本容量就要相应地扩大
the larger the sample size should be

准确程度如果不是那么高
If the required accuracy is not so high,

那么样本容量可以允许稍微小一些
the sample size may be a little smaller

那接下来我们就先以
Next, let’s first take as example

总体均值估计作为例子
the population mean estimation

来介绍样本容量确定的方法
to introduce the method of determination of sample size

之后再把它推广给总体的比例
and then extend it to the population proportion

样本容量确定的过程
The process of determining the sample size

主要其实就是在
mainly comes down to

极限误差计算的式子上
taking the formula for the limiting error

来进行变形
and rearranging it

回顾前面总体均值的
Let’s recall the formula for limiting error used

区间估计的时候
in the earlier interval estimation

我们极限误差计算的式子
of the population mean:

（公式如上）
$E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

这个式子
In the previous computing process

在前面的极限误差的计算过程里边
for limiting error,

我们如果要计算极限误差
to compute the limiting error

是先需要知道（公式如上）
you first need to know $z_{\alpha/2}$,

知道σ和n的
σ, and n

那如果置信系数1-α
If the confidence coefficient 1-α

在研究之前就已经给定
is given before study,

那么我们很快就可以通过查表
then we can quickly find the corresponding critical value $z_{\alpha/2}$

得到相应的临界值（公式如上）
by looking up the table

在已知σ和（公式如上）之后
Once we know σ and $z_{\alpha/2}$,

我们可以求出极限误差
we can find the sample size n

为任何数值时
that corresponds to any given value

所对应的样本容量n
of the limiting error

因此就把极限误差的式子
So by squaring both sides of the limiting-error formula

两边平方再做移项整理
and rearranging the terms,

我们就可以得到
we get

样本容量小n的计算方法
the formula for the sample size, lowercase n

那小n的计算方法
That formula for lowercase n,

经过整理以后
after rearranging,

就会等于（公式如上）
is $n = \dfrac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$

那这个里边的E平方
The E squared in the formula

依然代表的是极限误差
still represents limiting error
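
The rearranged formula n = z²σ²/E² can be sketched in a few lines of Python. This is only an illustration: the function name `required_n` and the example values of σ and E are assumptions made here, not figures from the lecture, and the critical value is taken from the standard normal distribution in Python's standard library rather than from a printed table.

```python
from statistics import NormalDist


def required_n(sigma, margin, confidence):
    """Unrounded sample size n = (z * sigma / E)^2 for repeated sampling."""
    alpha = 1 - confidence
    # Two-sided critical value z_{alpha/2} of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (z * sigma / margin) ** 2


# Hypothetical inputs: sigma = 10, confidence level 95%.
n_wide = required_n(sigma=10.0, margin=2.0, confidence=0.95)
n_tight = required_n(sigma=10.0, margin=1.0, confidence=0.95)

# Because E is squared in the denominator, halving E quadruples n.
print(n_wide, n_tight, n_tight / n_wide)
```

The ratio printed at the end shows why a stricter limiting error is expensive: the required sample size grows with the square of the tightening.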

这个地方
Here,

我想大家可能会
you may

产生了一个疑惑
have a doubt

老师我要计算极限误差的时候
Teacher, when I want to compute the limiting error

要先有n 式子里边
I first need n in the formula

它不是在分母吗
It's in the denominator,

对不对
isn't it?

那现在我要计算小n的时候
But now, when I want to compute lowercase n,

你又先要有E
I need to have E first

那到底是先有鸡还是先有蛋呢
So which comes first, the chicken or the egg?

我应该怎么算呢
How am I supposed to calculate it?

这个地方我要给大家解释一下
I should explain this to you

这个E通常情况下
Generally, the E

是期望的极限误差
is the expected limiting error

而不是你在刚刚前面
rather than the actual limiting error

区间估计的过程里边
computed in the process

所计算的实际的极限误差
of interval estimation

所以期望的是指
"Expected" here means

我们在进行实验之前
the target we set, before running the experiment,

对极限误差的一个目标
for the limiting error

比如如果大家还有印象
For example, do you remember that

还记得我们前面在猜
in the game we played

我今天包里带了多少钱的
where you guessed how much money

游戏里边
I had in my wallet,

我曾经给出过一个这样的描述
I gave a description, in which

我说正负50块都算你对
I said any guess within plus or minus 50 yuan would count as correct

这就是我期待的极限误差
That was the limiting error I expected

实际上经过抽样
In fact, by sampling

经过估计
by estimation

你给出来的误差
the error you gave

可能只有10块钱
could be only 10 yuan,

可能在正负10之内波动
fluctuating within minus and plus 10 yuan

所以这个E来自于研究者的期待
So this E comes from the expectation of the researcher

一般事先可以根据经验来给出
In general, it can be given in advance according to experience

这是在重复抽样的情形下面
This is its computation method

它的计算方法
in the case of repeated sampling

如果我们采用了
If we use the method of

不重复抽样的方法
non-repeated sampling

那么样本容量的计算公式
the formula for the sample size

会在刚才的计算公式上面
will be slightly adjusted

略微做一些调整
based on the foregoing formula

这一次它会变成（公式如上）
This time it becomes $n = \dfrac{N\,z_{\alpha/2}^{2}\,\sigma^{2}}{N E^{2} + z_{\alpha/2}^{2}\,\sigma^{2}}$

如果我们仔细观察的话
By careful observation

就发现实际上
we can see that, in fact,

就是在刚才那个式子的基础上
it is built on the previous formula:

先分子分母都乘了一个大N
first multiply both the numerator and the denominator by uppercase N (the population size),

然后再把原来的分子
then add the original numerator

加到现在的分母里边
to the new denominator

这就得到了
and we get the

不重复抽样的情况下
formula for lowercase n

小n的计算公式
under non-repeated sampling

通过这个计算公式
In this computing formula

我们很快也可以发现
we can quickly find that

如果z σ以及E都相同的话
if z, σ and E are the same,

不重复抽样的情形下面的样本容量
the sample size in the non-repeated sampling

会比重复抽样的情形
will be a little smaller than

下面的样本容量
the sample size

要略微地少一点
in repeated sampling

这就是我们刚才分析的
This was analyzed just now

由于不重复抽样
In the case of non-repeated sampling

它的误差要更小一些
its error is smaller

如果你要求达到相同的准确程度
So to reach the same accuracy,

那么它的样本容量
its sample size

就可以稍微地少一点
can be a little smaller

这个通过公式
Through the formula

也可以非常迅速地能够分析出来
this can also be quickly found by analysis
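
As a rough numerical check of this comparison, the two formulas can be evaluated side by side. The sketch below assumes the formulas n = z²σ²/E² (repeated sampling) and n = Nz²σ²/(NE² + z²σ²) (non-repeated sampling) described in the lecture; the function names and all input values, including the population size N = 2000, are hypothetical and chosen only for illustration.

```python
def n_repeated(z, sigma, margin):
    """Sample size under repeated sampling (with replacement)."""
    return (z * sigma) ** 2 / margin ** 2


def n_non_repeated(z, sigma, margin, population):
    """Sample size under non-repeated sampling (without replacement)."""
    numerator = population * (z * sigma) ** 2
    denominator = population * margin ** 2 + (z * sigma) ** 2
    return numerator / denominator


# Hypothetical inputs: z = 1.96, sigma = 10, E = 1, N = 2000.
z, sigma, margin, population = 1.96, 10.0, 1.0, 2000

n_rep = n_repeated(z, sigma, margin)
n_norep = n_non_repeated(z, sigma, margin, population)
print(n_rep, n_norep)

# With a very large population the two results nearly coincide.
n_norep_large = n_non_repeated(z, sigma, margin, 10_000_000)
print(n_norep_large)
```

With the same z, σ and E, the non-repeated result is always below the repeated one, and the gap shrinks as N grows.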

有同学可能就会反问了
Some of you might ask

老师这两个公式可以体现
Teacher, these two formulas reflect

抽样方法的不同
the impact of different sampling methods

对样本容量的影响
on the sample size

那抽样组织形式的影响
Where can the impact of the sampling organization forms

在哪里可以体现呢
be reflected

这个问题问得非常的好
That's a very good question

如果你要看出来
If you want to see

抽样组织形式
the impact of sampling organization form

对于样本容量的影响的话
on the sample size,

你得回忆起方差加法定理
you have to remember the variance addition theorem

以及它在抽样组织形式里面的应用
and its application to the sampling organization forms

前面应该已经学习过
I think we learned before

简单随机抽样的时候
that when you do a simple random sampling

我们的抽样误差的来源
the sampling error comes from

是总体的总方差
the total variance of the population

也就是我们现在的符号σ平方
or what we call σ squared

而总体的总方差
The total variance of the population

它是可以被分解的
can be broken down

在分组的情况下
In the case of grouping,

它是分解为组间方差
it is broken down into the sum of the inter-group (between-group) variance

和组内平均方差的和的
and the mean intra-group (within-group) variance
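
The variance addition theorem mentioned here can be checked numerically. The following is a minimal sketch on made-up data: the three groups and their values are hypothetical, population variances (dividing by the count, not count minus one) are used throughout, and both components are weighted by group size.

```python
from statistics import fmean, pvariance

# Hypothetical population split into three groups (strata).
groups = {
    "A": [2.0, 4.0, 6.0],
    "B": [10.0, 12.0],
    "C": [20.0, 21.0, 22.0, 23.0],
}

all_values = [x for values in groups.values() for x in values]
total_n = len(all_values)
grand_mean = fmean(all_values)

# Total variance of the whole population.
total_var = pvariance(all_values)

# Mean within-group variance: group variances weighted by group size.
within_var = sum(len(v) * pvariance(v) for v in groups.values()) / total_n

# Between-group variance: spread of the group means around the grand mean.
between_var = sum(
    len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values()
) / total_n

print(total_var, within_var + between_var)
```

The two printed numbers agree up to floating-point rounding, which is the decomposition of total variance into between-group and mean within-group parts.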

那在分层抽样的情况下
In the case of stratified sampling,

它的误差
its error,

抽样误差的来源
the sampling error comes only from

仅仅来自于平均组内方差
intra-group mean variance

而在整群抽样的情况下
In the case of cluster sampling,

抽样误差的来源
the sampling error comes only

它仅仅来自于群间方差
from the inter-cluster (between-cluster) variance

因此在我们这个样本容量的
Therefore, in the computing formula

计算式子里边
of sample size,

σ平方它可以被置换为
σ squared can be replaced by

组间方差（符号如上）
the inter-group variance (symbol as above)

或者是被置换为平均组内方差
or by the mean intra-group variance

（公式如上）来表示
(Formula as above)

这就体现了
This reflects

不同的抽样组织形式
the impact of

给它带来的影响
sampling organization forms

当然我们接下来的例子
Our following examples

主要是以简单随机抽样为例子
will mainly use

来介绍的
simple random sampling

接下来我们看一个例子
Let's see an example

在第三讲的例二里边
In the second example cited in the third lecture

大妞的妈妈了解到
the girl’s mother got to know that

24寸ABS+PC材质的拉杆箱
the mean load-bearing capacity of

承重量的平均水平是35.36公斤
24-inch ABS+PC trolley cases was 35.36 kg

在对其进行置信水平为90%的
In the interval estimation

区间估计的时候
with a 90% confidence,

极限误差是5.76公斤
the limiting error was 5.76 kg

大妞的爸爸看了以后
The girl’s father thought

觉得这个误差水平偏高
this error level was high

建议再抽一次
and suggested drawing another sample

假定大妞爸爸认为
Assume that the girl’s father holds

误差水平不超过2.5公斤
the error level not exceeding 2.5 kg

是可以接受的
is acceptable

假定置信水平不变
suppose the confidence level stays the same

那么这一次
then, this time

我们应该抽多少只拉杆箱
how many trolley cases should we select

才能够实现这个目标呢
to realize this target

根据题目里边的意思
From the problem statement,

我们有以下一些信息
we have the following information

1-α=90%
1-α=90%

有这个信息
With this information

我们就可以推算出
We can deduce

（公式如上）
$z_{\alpha/2} = z_{0.05} = 1.645$

σ它来自于哪里呢
Where does σ come from

σ来自于我们前面例题里边
It comes from the data of sampling made by

大妞妈妈的抽样的数据
the girl’s mother in our previous example

我们前面计算的结果是17.9公斤
Our previous computing result was 17.9 kg

那么它就是属于
So it falls under

根据以往的调查显示
what "a previous survey showed",

同样的内容了
the same kind of information

所以在前面它是作为样本信息
Previously, it was taken as sample information

而在我们这个题目里边
In our problem now,

它则是作为总体标准差的信息
it is taken as the information of population standard deviation

17.9公斤
17.9 kg

那大妞爸爸认为
The girl’s father thinks

误差水平不超过2.5公斤
an error level not exceeding 2.5 kg

是可以接受的
is acceptable

也就是说这一次估计的极限误差
In other words, the limiting error of this estimate

是2.5公斤
is 2.5 kg

有了这些信息以后
Now, substitute such information

把它们代入到n的计算式里边
into the formula for n calculation

就可以把必要的样本容量计算出来
and we can get the necessary sample size

那n是等于（公式如上）
n is equal to $\dfrac{1.645^{2} \times 17.9^{2}}{2.5^{2}}$, and

计算结果为138.7
the calculation result is 138.7

那么我们能抽138.7只拉杆箱吗
Then can we sample 138.7 trolley cases

显然不能
Obviously impossible

那大家想一想
So think about it

我是抽138只就可以了呢
Is it ok that I sample 138 cases or

还是一定要抽139只才可以呢
must I sample 139 cases

我想大家应该都已经找到了
I think you’ve already found

这个结论
the conclusion

就是我们一定要取139只拉杆箱
That is we must sample 139 trolley cases

才能够满足
in order to satisfy

大妞爸爸提出的误差水平
the error level raised by the girl's father

因为如果你只抽138只
If you only sample 138 cases

那还少了一点点
the sample falls a little short

那么你的误差水平
and your error level

就会比大妞爸爸
will be, compared with what the girl's father

设想的2.5公斤
had in mind, 2.5 kg,

还要来得更大一些
slightly larger

因此在计算必要的
So when computing the necessary

样本容量的时候
sample size,

不论你后面剩余的小数是多少
no matter what the remaining decimal is

都一定要往下
you must always round

进一个整数
up to the next integer

这里就不使用四舍五入的准则
The usual round-to-nearest rule does not apply here

都要取下一个整数了
We should take the next integer
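
The trolley-case calculation and the round-up rule can be reproduced as follows. This is a small sketch using the figures quoted in the example (σ = 17.9 kg, E = 2.5 kg, 90% confidence); the critical value 1.645 is the standard table value for a 90% two-sided interval and is written in by hand here as an assumption, and the variable names are illustrative.

```python
import math

z = 1.645      # z_{0.05}, standard table value for a 90% confidence level
sigma = 17.9   # kg, standard deviation from the earlier sample
margin = 2.5   # kg, the expected limiting error E

# Unrounded sample size from n = (z * sigma / E)^2.
n_exact = (z * sigma / margin) ** 2

# Always round up: any fractional remainder means one more unit is needed.
n_required = math.ceil(n_exact)

print(round(n_exact, 1), n_required)  # prints: 138.7 139
```

Using `math.ceil` rather than `round` encodes the rule directly: a result such as 138.1 would also become 139.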

那经过这个运算的话
After the operation,

我们就了解到
we now know,

在均值估计的时候
when estimating a mean,

如何来推算样本容量
how to work out the sample size

当然这个样本容量
Of course, this sample size

是必要的样本容量
is a necessary sample size

就说你最少要抽这么多
that is, you must sample at least this many

而如果你的时间允许
If your time permits

你的成本允许
your cost permits

你的精力允许
your energy permits,

你愿意抽150只拉杆箱
and you are willing to sample 150 trolley cases,

当然也可以
of course that is fine

因为你如果抽150只拉杆箱
Because if you select 150 cases

你的误差水平
your error level

肯定会比2.5公斤
will be sure to be smaller

要来得更小一些
than 2.5 kg

那当然更可以接受
It, of course, will be more acceptable

对不对
Right
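
To see why 138 is not enough while 139 and 150 both meet the target, the limiting error E = zσ/√n can be evaluated for each candidate sample size. This sketch reuses the example's σ = 17.9 kg and the standard table value z = 1.645 for 90% confidence (written in by hand as an assumption); the function name `limiting_error` is illustrative.

```python
import math


def limiting_error(z, sigma, n):
    """Limiting error E = z * sigma / sqrt(n) under repeated sampling."""
    return z * sigma / math.sqrt(n)


z, sigma, target = 1.645, 17.9, 2.5

errors = {n: limiting_error(z, sigma, n) for n in (138, 139, 150)}
for n, e in errors.items():
    status = "meets" if e <= target else "misses"
    print(f"n = {n}: E = {e:.3f} kg, {status} the 2.5 kg target")
```

With 138 cases the error sits just above 2.5 kg, with 139 it drops just below, and with 150 it is smaller still.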

这里我们要说明一下
We should explain here that

由于总体的标准差
as the population standard deviation

σ在大多数的情况下
σ, in most cases

都是未知的
is unknown,

我们有以下一些方法
we have the following methods

来取得σ的值
to find the value of σ

第一种方法
The first method is to

使用有同样或者类似单元的
use the standard deviation of the previous sample

以前样本的样本标准差
with the same or similar units

在我们刚刚的例题里边
In the foregoing example

用的就是这种方法
this method was used

前面大妞妈妈
Since the girl's mother

已经抽过一次了
had already drawn one sample,

那再抽一次
when we sample again,

原来的抽样（标准差）就可以作为
the standard deviation of the earlier sample can serve as

我的标准差的值
our value for the standard deviation

第二个
The second method is to

抽取一个预备样本
draw a pilot sample

进行试验性研究
for a pilot study

用试验性样本的标准差
Then the standard deviation of the pilot sample

作为σ的估计值
can be used as the estimate of σ

这是第二种方法
This is the second method

第三种方法
The third method is to use

运用对σ值的判断
a judgment about the value of σ

或者是最好的猜测
or a best guess

例如在国外的一些文献里边
For example, in some foreign literature,

通常可以用全距的1/4
1/4 of the range is commonly used

作为σ的近似值
as an approximation of σ

这个其实是根据经验法则
This is actually a simple calculation

来进行这个简单的推算的
based on the empirical rule

如果是不能假定
If it cannot be assumed that

我们的总体
the population

服从正态分布的话
obeys a normal distribution

那么就要用切比雪夫不等式
we need to use Chebyshev's inequality

来帮我们进行推测
to help us make the estimate

这个时候的话
In this case,

可能是用全距的1/6
1/6 of the range may be used

来作为σ的近似值
as an approximation of σ

所以我们可能
So we can use

有这样的一些方法
these methods

去帮助我们获得σ的值
to help us get the value of σ

这是一个简单的说明
This is a simple description
