Sampling Theory
I. Purpose of Sampling Theory
   A. Statistics (sample proportions, sample means, sample standard deviations, etc.) are used to make inferences about parameters (population proportion, population mean, population standard deviation).
      1. I.e., statistics are used to estimate parameters,
      2. and statistics are used to test hypotheses about parameters.
   B. But statistics are random variables: their values vary from one random sample to the next.
   C. Therefore, like all random variables, statistics have a probability distribution.
      1. The probability distribution of a statistic is called a sampling distribution.
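The idea that a statistic is a random variable can be illustrated by simulation. The following is a minimal sketch, not part of the original notes. It assumes a normal population with mean 30 and standard deviation 10 (the values used in the DBH example of these notes); the function name `simulate_sample_means`, the seed, and the number of repetitions are hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from Normal(mu, sigma)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed population: mean 30, SD 10; five samples of n = 25 each.
means = simulate_sample_means(mu=30, sigma=10, n=25, reps=5)
for m in means:
    print(round(m, 2))  # each sample yields a different sample mean
```

Each repetition gives a different value of the sample mean; the collection of such values over many repetitions approximates the sampling distribution.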
Document1
1
4/30/2017
Table 1. Sampling Theory of the Sample Mean

| Sampling theory | Underlying distribution of the (underlying) quantitative random variable (r.v.) of interest | Sampling distribution of the sample mean |
|---|---|---|
| I.e., the distribution of the... | $Y$ = quantitative r.v. of interest (continuous or discrete) | $\bar{Y}$ = sample mean for samples of size $n$ |
| Example: the distribution of the... | $Y_i$ = DBH (cm) of the $i$-th randomly sampled tree from Region 1 of the MNF, $i = 1, 2, \ldots, 25$ | $\bar{Y}$ = (sample) mean DBH (cm) of trees in a sample of size $n = 25$ trees from Region 1 of the MNF |
| Location Theorem | Expected value of r.v. of interest = population mean: $E(Y) = \mu$ | Expected value of sample mean = population mean: $E(\bar{Y}) = E(Y) = \mu$ |
| Example | $E(Y) = 30$ cm | $E(\bar{Y}) = 30$ cm |
| Dispersion Theorem | Population standard deviation: $SD = \sigma$ | Standard error of the mean: $SE = \sigma_{\bar{Y}} = \sigma/\sqrt{n}$ |
| Example | $\sigma = 10$ cm | $SE = 10/\sqrt{25} = 10/5 = 2$ cm |
| Shape Theorem 1 | If the underlying distribution is normal, | then the sampling distribution is normal (regardless of sample size). |
| Example | If DBH is normally distributed, | then the sample mean DBH is normally distributed. |
| Shape Theorem 2, the Central Limit Theorem (CLT) | (Regardless of the underlying distribution) | If the sample size is large, then the sampling distribution is approximately normal. (The larger the sample, the closer the approximation.) |
| Example | Even if the DBH is not normally distributed, | for a sample of size 25 (or larger), the sample mean DBH is approximately normal. |
| Standardization | $Z = \dfrac{Y - \mu}{\sigma}$ | $Z = \dfrac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ |
| Example | $Z = \dfrac{Y - 30}{10}$ | $Z = \dfrac{\bar{Y} - 30}{2}$ |
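The Dispersion Theorem and Standardization entries of Table 1 amount to two short formulas. The following is a minimal sketch of them, using the table's example values (σ = 10 cm, n = 25). The function names and the observed sample mean of 32 cm are hypothetical, chosen only for illustration.

```python
import math

def standard_error(sigma, n):
    """Dispersion Theorem: SE of the sample mean = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def standardize_mean(ybar, mu, sigma, n):
    """Standardization of a sample mean: Z = (ybar - mu) / (sigma / sqrt(n))."""
    return (ybar - mu) / standard_error(sigma, n)

se = standard_error(sigma=10, n=25)
print(se)  # 2.0

# Hypothetical observed sample mean of 32 cm (not a value from the notes).
z = standardize_mean(ybar=32, mu=30, sigma=10, n=25)
print(z)  # 1.0
```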
Example
Let $Y_i$ = the DBH (cm) of the $i$-th randomly sampled tree from Region 1 of the MNF be the underlying random variable of interest. Assume that $Y_i$ is normally distributed with a mean of 30 cm and a standard deviation of 10 cm.
Symbolically, we have
$$Y_i \sim \text{Normal}(\mu = 30,\ \sigma = 10) \tag{1}$$
1. Compute the probability that a randomly sampled tree from MNF Region 1 has a DBH between 28 and 32 cm.
$$P(28 < Y_i < 32) = P\left(\frac{28 - 30}{10} < Z_i < \frac{32 - 30}{10}\right) = P(-0.2 < Z_i < 0.2) = 0.1585 \tag{2}$$
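The probability in (2) can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Underlying distribution assumed in equation (1): Normal(mu = 30, sigma = 10).
dbh = NormalDist(mu=30, sigma=10)

# P(28 < Y < 32) as a difference of cumulative probabilities.
p_single = dbh.cdf(32) - dbh.cdf(28)
print(round(p_single, 4))  # 0.1585
```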
2. Compute the probability that a random sample of n = 4 trees from MNF Region 1 has a sample mean DBH between 28 and 32 cm.
Now, in addition to (1), we have, for samples of size n = 4,
$$\bar{Y} \sim \text{Normal}\left(\mu_{\bar{Y}} = 30,\ \sigma_{\bar{Y}} = \frac{10}{\sqrt{4}} = 5.00\right) \tag{3}$$
and, for samples of size n = 4,
$$P(28 < \bar{Y} < 32) = P\left(\frac{28 - 30}{5} < Z < \frac{32 - 30}{5}\right) = P(-0.4 < Z < 0.4) = 0.3108 \tag{4}$$
3. Compute the probability that a random sample of n = 16 trees from MNF Region 1 has a sample mean DBH between 28 and 32 cm.
Now, in addition to (1) and (3), we have, for n = 16,
$$\bar{Y} \sim \text{Normal}\left(\mu_{\bar{Y}} = 30,\ \sigma_{\bar{Y}} = \frac{10}{\sqrt{16}} = 2.5\right) \tag{5}$$
and, for samples of size n = 16,
$$P(28 < \bar{Y} < 32) = P\left(\frac{28 - 30}{2.5} < Z < \frac{32 - 30}{2.5}\right) = P(-0.8 < Z < 0.8) = 0.5763 \tag{6}$$
4. Compute the probability that a random sample of n = 64 trees from MNF Region 1 has a sample mean DBH between 28 and 32 cm.
Now, in addition to (1), (3), and (5), we have, for n = 64,
$$\bar{Y} \sim \text{Normal}\left(\mu_{\bar{Y}} = 30,\ \sigma_{\bar{Y}} = \frac{10}{\sqrt{64}} = 1.25\right) \tag{7}$$
and, for samples of size n = 64,
$$P(28 < \bar{Y} < 32) = P\left(\frac{28 - 30}{1.25} < Z < \frac{32 - 30}{1.25}\right) = P(-1.6 < Z < 1.6) = 0.8904 \tag{8}$$
The underlying population distribution and the sampling distribution of the mean for samples of size n = 1, 4, and 16 are shown in Figure 2.
5. Now tabulate our results: the standard error, $\sigma/\sqrt{n}$, and the probability that the sample mean, $\bar{Y}$, is within a certain distance, in this case 2 cm, from the population mean, $\mu$.
Table 2. The effect of sample size on SE and on the difference between the sample mean and the population mean, in terms of probability.

| Sample size, $n$ | Standard error of the mean, $SE = \sigma/\sqrt{n}$ (cm) | Probability $\bar{Y}$ within ±2 cm of the population mean $\mu$, $P(28 < \bar{Y} < 32) = P(\lvert \bar{Y} - \mu \rvert < 2)$ |
|---|---|---|
| 1 | 10.00 | 0.1585 |
| 4 | 5.00 | 0.3108 |
| 16 | 2.50 | 0.5763 |
| 64 | 1.25 | 0.8904 |
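The rows of Table 2 can be regenerated from the two formulas used in problems 1 through 4. The following is a minimal sketch under the example's assumptions (normal DBH, μ = 30 cm, σ = 10 cm); the helper name `prob_mean_within` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_within(d, mu, sigma, n):
    """Return (SE, P(mu - d < Ybar < mu + d)) for Ybar ~ Normal(mu, sigma / sqrt(n))."""
    se = sigma / sqrt(n)
    ybar = NormalDist(mu=mu, sigma=se)  # sampling distribution of the mean
    return se, ybar.cdf(mu + d) - ybar.cdf(mu - d)

rows = []
for n in (1, 4, 16, 64):
    se, p = prob_mean_within(d=2, mu=30, sigma=10, n=n)
    rows.append((n, se, p))
    print(f"{n:>3}  {se:5.2f}  {p:.4f}")
```

The case n = 1 reproduces the single-tree probability from problem 1, since a sample mean of one observation is the observation itself.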
6. Graph the relationships.
Figure 1. (i) The standard error of the mean and (ii) the probability that the sample mean is near the population mean, both as a function of sample size. [Panel (i): SEM against n. Panel (ii): P{28 < Sample Mean < 32} against n.]
7. Describe how the standard error of the sample mean changes as the sample size increases.
Answer: We can see the following by examining the equations, especially the equations of the Dispersion Theorem and of Standardization in Table 1, or, equivalently, by examining the examples of those equations in Table 2, or the graphs of the relationships in Figure 1.

As the sample size increases,
o the standard error of the mean decreases, and
o the difference between the sample mean and population mean decreases
probabilistically. I.e., the probability that the difference is small increases,
i.e., the probability that the difference is large decreases (no matter how
you define small and large). This is because
28  Y  32    2  Y    2   Y    2
(9)
and therefore


P 28  Y  32  P   2  Y    2  P Y    2
(10)
Moreover, the difference of 2 cm could be replaced by any other fixed
difference that is considered small or large, without changing the nature of
the conclusions.

Law of diminishing returns. Cutting the standard error in half requires not doubling but quadrupling the sample size, because the standard error, σ/√n, is inversely proportional to the square root of n. This is how the law of diminishing
returns manifests in sampling (i.e., experimenting or observing).
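Both points in the answer above can be checked numerically: the probability of being within a fixed distance of the population mean rises with n for any choice of distance, and halving the standard error takes four times the sample size. The following is a minimal sketch assuming a normal sampling distribution with σ = 10 cm. The function names and the alternative distances of 1 and 5 cm are hypothetical illustrations, not values from the notes.

```python
from math import ceil, sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """SE of the sample mean = sigma / sqrt(n)."""
    return sigma / sqrt(n)

def prob_within(d, sigma, n):
    """P(|Ybar - mu| < d) for a normal sampling distribution (independent of mu)."""
    z = d / standard_error(sigma, n)
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

def sample_size_for_se(sigma, target_se):
    """Smallest n for which sigma / sqrt(n) <= target_se."""
    return ceil((sigma / target_se) ** 2)

# For each distance d, the probability increases with the sample size.
for d in (1, 2, 5):
    probs = [prob_within(d, sigma=10, n=n) for n in (4, 16, 64)]
    print(d, [round(p, 4) for p in probs])

# Each halving of the target SE quadruples the required sample size.
sizes = [sample_size_for_se(10, se) for se in (5, 2.5, 1.25)]
print(sizes)  # [4, 16, 64]
```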
Figure 2. [Plot of the probability density (PDF) against Y, with the horizontal axis running from 0 to 50.]
As the sample size increases, the SEM decreases, the dispersion of the sampling
distribution of the (sample) mean decreases, and the probability of being within any
fixed distance from the population mean increases, approaching 1.
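The behavior summarized in this caption can also be observed empirically. The following is a minimal simulation sketch, assuming the example's normal population (μ = 30 cm, σ = 10 cm). The function name `fraction_within`, the seed, and the number of repetitions are hypothetical choices; the simulated fractions are estimates and will differ slightly from the exact probabilities in Table 2.

```python
import random
import statistics

def fraction_within(d, mu, sigma, n, reps, rng):
    """Empirical fraction of simulated sample means that fall within d of mu."""
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) < d:
            hits += 1
    return hits / reps

rng = random.Random(1)  # fixed seed so the sketch is reproducible
fractions = {
    n: fraction_within(d=2, mu=30, sigma=10, n=n, reps=4000, rng=rng)
    for n in (4, 16, 64)
}
for n, frac in fractions.items():
    print(n, frac)  # should be close to 0.3108, 0.5763, 0.8904 respectively
```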
Figures in this document were made with SamplingNormal(30,10)n=4,16.JMP.
Golde I. Holtzman, Department of Statistics, College of Arts and Sciences, Virginia Tech (VPI)
Last updated: March 1, 2010 © Golde I. Holtzman, all rights reserved.
URL: ../STAT5605/sampling.html