Sampling Theory

I. Purpose of Sampling Theory
   A. Statistics (sample proportions, sample means, sample standard deviations, etc.) are used to make inferences about parameters (population proportion, population mean, population standard deviation).
      1. I.e., statistics are used to estimate parameters,
      2. and statistics are used to test hypotheses about parameters.
   B. But statistics are random variables.
   C. Therefore, like all random variables, statistics have a probability distribution.
      1. The probability distribution of a statistic is called a sampling distribution.

Document1, 4/30/2017

Table 1. Sampling Theory of the Sample Mean

The distribution of the...
   Underlying distribution of the quantitative random variable (r.v.) of interest: $Y$ = quantitative r.v. of interest (continuous or discrete).
   Sampling distribution of the sample mean: $\bar{Y}$ = sample mean for samples of size $n$.
   Example (underlying): $Y_i$ = DBH (cm) of the i-th randomly sampled tree from Region 1 of the MNF, i = 1, 2, ..., 25.
   Example (sampling): $\bar{Y}$ = (sample) mean DBH (cm) of trees in a sample of size n = 25 trees from Region 1 of the MNF.

Location Theorem
   Underlying: expected value of the r.v. of interest = population mean, $E(Y) = \mu$.
   Sampling: expected value of the sample mean = population mean, $E(\bar{Y}) = E(Y) = \mu$.
   Example: $E(Y) = 30$ cm, so $E(\bar{Y}) = 30$ cm.

Dispersion Theorem
   Underlying: population standard deviation, $SD = \sigma$.
   Sampling: standard error of the mean, $SE = \sigma_{\bar{Y}} = \sigma / \sqrt{n}$.
   Example: $\sigma = 10$ cm, so $SE = 10/\sqrt{25} = 10/5 = 2$ cm.

Shape Theorem 1
   If the underlying distribution is normal, then the sampling distribution is normal (regardless of sample size).
   Example: If DBH is normally distributed, then the sample mean DBH is normally distributed.

Shape Theorem 2, the Central Limit Theorem (CLT)
   Regardless of the underlying distribution, if the sample size is large, then the sampling distribution is approximately normal. (The larger the sample, the closer the approximation.)
   Example: Even if DBH is not normally distributed,
   for a sample of size 25 (or larger), the sample mean DBH is approximately normal.

Standardization
   Underlying: $Z = (Y - \mu) / \sigma$. Example: $Z = (Y - 30) / 10$.
   Sampling: $Z = (\bar{Y} - \mu) / (\sigma / \sqrt{n})$. Example: $Z = (\bar{Y} - 30) / 2$.

Example

Let $Y_i$ = the DBH (cm) of the i-th randomly sampled tree from Region 1 of the MNF be the underlying random variable of interest. Assume that $Y_i$ is normally distributed with a mean of 30 and a standard deviation of 10. Symbolically, we have

   $Y_i \sim \mathrm{Normal}(\mu = 30,\ \sigma = 10)$    (1)

1. Compute the probability that a randomly sampled tree from MNF Region 1 has a DBH between 28 and 32 cm.

   $P(28 < Y_i < 32) = P\left(\frac{28 - 30}{10} < Z_i < \frac{32 - 30}{10}\right) = P(-0.2 < Z_i < 0.2) = 0.1585$    (2)

2. Compute the probability that a random sample of n = 4 trees from MNF Region 1 has a sample mean DBH between 28 and 32 cm. Now, in addition to (1), we have, for samples of size n = 4,

   $\bar{Y} \sim \mathrm{Normal}\left(\mu_{\bar{Y}} = 30,\ \sigma_{\bar{Y}} = \frac{10}{\sqrt{4}} = 5.00\right)$    (3)

   and, for samples of size n = 4,

   $P(28 < \bar{Y} < 32) = P\left(\frac{28 - 30}{5} < Z < \frac{32 - 30}{5}\right) = P(-0.4 < Z < 0.4) = 0.3108$    (4)

3. Compute the probability that a random sample of n = 16 trees from MNF Region 1 has a sample mean DBH between 28 and 32 cm. Now, in addition to (1) and (3), we have, for n = 16,

   $\bar{Y} \sim \mathrm{Normal}\left(\mu_{\bar{Y}} = 30,\ \sigma_{\bar{Y}} = \frac{10}{\sqrt{16}} = 2.5\right)$    (5)

   and, for samples of size n = 16,

   $P(28 < \bar{Y} < 32) = P\left(\frac{28 - 30}{2.5} < Z < \frac{32 - 30}{2.5}\right) = P(-0.8 < Z < 0.8) = 0.5763$    (6)

4. Compute the probability that a random sample of n = 64 trees from MNF Region 1 has a sample mean DBH between 28 and 32 cm. Now, in addition to (1), (3), and (5), we have, for n = 64,

   $\bar{Y} \sim \mathrm{Normal}\left(\mu_{\bar{Y}} = 30,\ \sigma_{\bar{Y}} = \frac{10}{\sqrt{64}} = 1.25\right)$    (7)

   and, for samples of size n = 64,

   $P(28 < \bar{Y} < 32) = P\left(\frac{28 - 30}{1.25} < Z < \frac{32 - 30}{1.25}\right) = P(-1.6 < Z < 1.6) = 0.8904$    (8)

   The underlying population distribution and the sampling distribution of the mean for samples of size n = 1, 4, and 16 are shown in Figure 2.

   [Figure: PDF plotted against X = Mean + (Z)(SD); horizontal axis Y from 0 to 50.]

5. Now tabulate our results: the standard error, $\sigma / \sqrt{n}$, and the probability that the sample mean, $\bar{Y}$, is within a certain distance, in this case 2 cm, of the population mean, $\mu$.

Table 2. The effect of sample size on the SE and on the difference between the sample mean and the population mean, in terms of probability.
Columns: sample size, $n$; standard error of the mean, $SE = \sigma / \sqrt{n}$ (cm); probability that $\bar{Y}$ is within ±2 cm of the population mean $\mu$, $P(28 < \bar{Y} < 32) = P(|\bar{Y} - \mu| < 2)$.

      n    SE (cm)    P(28 < Ybar < 32)
      1    10.00      0.1585
      4     5.00      0.3108
     16     2.50      0.5763
     64     1.25      0.8904

6. Graph the relationships.

   [Figure: panel (i) SEM versus n; panel (ii) P{28 < Sample Mean < 32} versus n.]

   Figure 1. (i) The standard error of the mean and (ii) the probability that the sample mean is near the population mean, both as a function of sample size.

7. Describe how the standard error of the sample mean changes as the sample size increases.

   Answer: We can see the following by examining the equations, especially the equations of the Dispersion Theorem and of Standardization in Table 1, or, equivalently, by examining the examples of those equations in Table 2, or the graphs of the relationships in Figure 1.

   As the sample size increases,
   o the standard error of the mean decreases, and
   o the difference between the sample mean and the population mean decreases probabilistically. I.e., the probability that the difference is small increases, i.e., the probability that the difference is large decreases (no matter how you define small and large).

   This is because

   $28 < \bar{Y} < 32 \iff -2 < \bar{Y} - \mu < 2 \iff |\bar{Y} - \mu| < 2$    (9)

   and therefore

   $P(28 < \bar{Y} < 32) = P(-2 < \bar{Y} - \mu < 2) = P(|\bar{Y} - \mu| < 2)$    (10)

   Moreover, the difference of 2 cm could be replaced by any other fixed difference that is considered small or large, without changing the nature of the conclusions.

   Law of diminishing returns. To cut the standard error in half requires (not doubling, but) quadrupling the sample size. This is how the law of diminishing returns manifests in sampling (i.e., experimenting or observing).

   [Figure: PDF plotted against X = Mean + (Z)(SD); horizontal axis Y from 0 to 50.]

   Figure 2. As the sample size increases, the SEM decreases, the dispersion of the sampling distribution of the (sample) mean decreases, and the probability of being within any fixed distance from the population mean increases, approaching 1.
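The Table 2 values and the quadrupling rule can be checked numerically. The following is a minimal sketch in Python, not part of the original notes (whose figures were made with JMP). It assumes the Normal(30, 10) model of the example, and the names standard_error, prob_within, and table2 are illustrative choices, not anything defined in the handout.

```python
from math import sqrt
from statistics import NormalDist

MU = 30.0     # population mean DBH (cm), taken from the example
SIGMA = 10.0  # population standard deviation (cm), taken from the example


def standard_error(sigma: float, n: int) -> float:
    """Dispersion Theorem: standard error of the mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)


def prob_within(mu: float, sigma: float, n: int, distance: float) -> float:
    """P(|Ybar - mu| < distance), assuming Ybar ~ Normal(mu, sigma / sqrt(n))."""
    sampling_dist = NormalDist(mu, standard_error(sigma, n))
    return sampling_dist.cdf(mu + distance) - sampling_dist.cdf(mu - distance)


def table2(sizes=(1, 4, 16, 64), distance=2.0):
    """Rows of (n, SE, probability), rounded to the precision used in Table 2."""
    return [
        (
            n,
            round(standard_error(SIGMA, n), 2),
            round(prob_within(MU, SIGMA, n, distance), 4),
        )
        for n in sizes
    ]


if __name__ == "__main__":
    for n, se, p in table2():
        print(f"n = {n:3d}   SE = {se:5.2f} cm   P(28 < Ybar < 32) = {p:.4f}")
```

Each sample size in the table is four times the previous one, so each SE is half the previous one, which is the law of diminishing returns stated above. Passing a different distance to prob_within shows that the same pattern holds for any fixed distance, not only 2 cm.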
Figures in this document were made with SamplingNormal(30,10)n=4,16.JMP.

Golde I. Holtzman, Department of Statistics, College of Arts and Sciences, Virginia Tech (VPI)
Last updated: March 1, 2010
© Golde I. Holtzman, all rights reserved.
URL: ../STAT5605/sampling.html