Statistics for Business (Env)
Chapter 7: Sampling Distributions

Sampling Distributions
7.1 The Sampling Distribution of the Sample Mean
7.2 Central Limit Theorem
7.3 Standard Error and Statistical Inference

The sampling process
A sample should be representative of the entire population, yet it is not expected to be identical to the population.

Sampling distribution
• Suppose that we draw all possible samples of size n from a given population.
• Suppose further that we compute a statistic (e.g., a mean, IQR, or standard deviation) for each sample.
• The probability distribution of this statistic is called a sampling distribution.

Sampling error is the discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.
The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.

The sampling distribution
[Figure: the sampling distribution]

Two different questions
• Data distribution: $P(X > 70)$, the probability that a single observation exceeds 70.
• Distribution of sample means: $P(\bar{X} > 70)$, the probability that the mean of a sample exceeds 70.

Variability of a Sampling Distribution
• The variability of a sampling distribution depends on three factors:
– N: the number of objects in the population.
– n: the number of objects in the sample.
– The way that the random sample is chosen.

Sample without replacement
• If the population size is much larger than the sample size, the sampling distribution has roughly the same sampling error whether we sample with or without replacement (without replacement, a population element can be selected only once).
• On the other hand, if the sample represents a significant fraction (say, 1/10) of the population, the sampling error will be noticeably smaller when we sample without replacement.
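The effect of the sampling scheme can be checked by brute force on a tiny population. The sketch below is only an illustration: the four scores and the sample size are assumptions chosen so that the sample is a large fraction (n/N = 1/2) of the population, and the variable names are hypothetical.

```python
from itertools import combinations, product
from statistics import mean, pstdev

# Illustrative population and sample size (assumptions, not data from a study).
population = [2, 4, 6, 8]
n = 2

# With replacement: every ordered pair, repeats allowed (4 * 4 = 16 samples).
with_repl = [mean(s) for s in product(population, repeat=n)]

# Without replacement: every unordered pair of distinct elements (4C2 = 6 samples).
without_repl = [mean(s) for s in combinations(population, n)]

# Spread of the sample means under each scheme.
sd_with = pstdev(with_repl)
sd_without = pstdev(without_repl)

print(f"with replacement:    {len(with_repl)} samples, SD of means = {sd_with:.4f}")
print(f"without replacement: {len(without_repl)} samples, SD of means = {sd_without:.4f}")
```

Both sets of sample means are centered on the population mean; only the spread differs, and it is smaller without replacement because the sample uses up half of the population.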
Methods of Probability Sampling
The sampling error is the difference between a sample statistic (e.g., $\bar{X}$) and its corresponding population parameter (e.g., $\mu$).
The sampling distribution of the sample mean is the probability distribution of the population of the sample means obtainable from all possible samples of size n from a population of size N.

Example 1
A population consists of only 4 scores: 2, 4, 6, 8. Mean = 5.

All the possible samples of n = 2 (Table 7.1)
Notice that the table lists random samples. This requires sampling with replacement, so it is possible to select the same score twice.

FIGURE 7.2 The distribution of sample means for n = 2. The distribution shows the 16 sample means from Table 7.1. Mean of the sample means = 5.

Sampling without replacement: $_4C_2 = \frac{4!}{2!\,2!} = 6$

Sample   #1   #2   Sample mean
1        2    4    3
2        2    6    4
3        2    8    5
4        4    6    5
5        4    8    6
6        6    8    7

Mean of the sample means = 5.
[Figure: frequency histogram of the six sample means (3, 4, 5, 6, 7)]

Example 2
A law firm has five partners. At their weekly partners meeting each reported the number of hours they billed clients for their services last week.

Partner    Hours
Dunn       22
Hardy      26
Kiers      30
Malory     26
Tillman    22

If two partners are selected randomly, how many different samples are possible?

Sampling without replacement: 5 objects taken 2 at a time.
$_5C_2 = \frac{5!}{2!\,(5 - 2)!} = 10$

Partners   Total   Mean
1,2        48      24
1,3        52      26
1,4        48      24
1,5        44      22
2,3        56      28
2,4        52      26
2,5        48      24
3,4        56      28
3,5        52      26
4,5        48      24

A total of 10 different samples.

As a sampling distribution

Sample mean   Frequency   Probability
22            1           1/10
24            4           4/10
26            3           3/10
28            2           2/10

Compute the mean of the sample means. Compare it with the population mean.

The mean of the sample means:
$E(\bar{X}) = \mu_{\bar{X}} = \frac{22(1) + 24(4) + 26(3) + 28(2)}{10} = 25.2$

The population mean:
$\mu = \frac{22 + 26 + 30 + 26 + 22}{5} = 25.2$

Notice that the mean of the sample means is exactly equal to the population mean.
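The law-firm example can be reproduced by listing every pair of partners. The following is a minimal sketch using only the hours given above; the variable names are hypothetical, and exact fractions are used so that the comparison of the two means is not affected by rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hours billed last week, as given in the example.
hours = {"Dunn": 22, "Hardy": 26, "Kiers": 30, "Malory": 26, "Tillman": 22}

# Every sample of 2 partners drawn without replacement (5C2 = 10 samples).
sample_means = [Fraction(a + b, 2) for a, b in combinations(hours.values(), 2)]

# Sampling distribution: each distinct sample mean with its probability.
freq = Counter(sample_means)
sampling_dist = {m: Fraction(c, len(sample_means)) for m, c in sorted(freq.items())}

# Mean of the sample means versus the population mean.
mean_of_means = sum(m * p for m, p in sampling_dist.items())
population_mean = Fraction(sum(hours.values()), len(hours))

for m, p in sampling_dist.items():
    print(f"sample mean {float(m):.0f}: frequency {freq[m]}, probability {float(p):.1f}")
print("mean of sample means:", float(mean_of_means))
print("population mean:     ", float(population_mean))
```

The same enumeration approach works for any small population, but the number of samples grows quickly with N and n, which is why the general results that follow are needed.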
Example 3
• Take another population: 3, 6, 9, 12, 15.
• Population size N = 5, sample size n = 2, mean = 9, variance = 18, SD = 4.2426.
• The number of possible samples which can be drawn without replacement is $_5C_2 = 10$.
• Variance of the 10 sample means = 6.75. This agrees with the finite-population formula $\frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} = \frac{18}{2} \cdot \frac{3}{4} = 6.75$.

Example 4: Sampling All Stocks
• Population of returns of all 1,815 stocks listed on NYSE for 1987
– See the figure on the next slide
– The mean rate of return $\mu$ was -3.5% with a standard deviation $\sigma$ of 26%
• Draw all possible random samples of size n = 5 and calculate the sample mean return of each
– Sample with a computer
– See the figure on the next slide

Example: Sampling All Stocks
[Figure: histograms of the individual stock returns and of the sample mean returns]

Results from Sampling All Stocks
• Observations
– Both histograms appear to be bell-shaped and centered over the same mean of -3.5%
– The histogram of the sample mean returns looks less spread out than that of the individual returns
• Statistics
– Mean of all sample means: $\mu_{\bar{X}} = \mu = -3.5\%$
– Standard deviation of all possible means: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{26}{\sqrt{5}} = 11.63\%$

The examples above demonstrate the construction of the distribution of sample means for relatively simple, specific situations. In most cases, however, it will not be possible to list all the samples and compute all the possible sample means. Therefore, it is necessary to develop the general characteristics of the distribution of sample means that can be applied in any situation. Fortunately, these characteristics are specified in a mathematical proposition known as the central limit theorem. This important and useful theorem serves as a cornerstone for much of inferential statistics.

General Conclusions
1. If the population of individual items is normal, then the population of all sample means is also normal.
2. Even if the population of individual items is not normal, there are circumstances when the population of all sample means is approximately normal (Central Limit Theorem).

Central Limit Theorem
For any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size n will have a mean of $\mu$ and a standard deviation of $\sigma / \sqrt{n}$, and will approach a normal distribution as n becomes sufficiently large.

The value of this theorem comes from two simple facts. First, it describes the distribution of sample means for any population, no matter what shape, or mean, or standard deviation. Second, the distribution of sample means "approaches" a normal distribution very rapidly. As a rule of thumb, by the time the sample size reaches about n = 30, the distribution is close to normal.

Central Limit Theorem
If the sample size is large enough ($n \ge 30$), then we can consider that the sample mean approximately follows a normal distribution: $\bar{X} \sim N(\mu, \sigma^2 / n)$.
The variance of the sample mean is the population variance divided by n. This part holds for any n when observations are drawn independently; only the normal shape requires a large n.
$\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$
Averages are less variable than individual observations.

Sample Means
Sample means follow the normal distribution under two conditions: the population itself follows the normal distribution, OR the sample size is large enough ($n \ge 30$).

[Figure: the distribution of the data (normal, standard deviation $\sigma$) compared with the distribution of all possible sample means (standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n}$). The distribution of sample means is less spread out.]

The standard deviation of the distribution of sample means is called the standard error, $\sigma_{\bar{X}}$. The standard error measures the standard amount of difference between $\bar{X}$ and $\mu$ that is reasonable to expect simply by chance.

It should be intuitively reasonable that the size of a sample should influence how accurately the sample represents its population. Specifically, a large sample should be more accurate than a small sample. In general, as the sample size increases, the error between the sample mean and the population mean should decrease. This rule is also known as the law of large numbers.

The law of large numbers states that the larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.
The standard error provides a way to measure the "average" or standard distance between a sample mean and the population mean.

[Figure: the distribution of sample means for random samples as the size n increases]

Example 5
The population of scores on the SAT forms a normal distribution with mean = 500 and sd = 100. If you take a random sample of n = 25 students, what is the probability that the sample mean would be greater than $\bar{X} = 540$?
You can restate this probability question as: out of all the possible sample means, what proportion has values greater than 540?
We need to determine the distribution of the sample mean with n = 25. We know:
1. The distribution is normal because the population of SAT scores is normal.
2. The distribution has a mean of 500 because the population mean is 500.
3. The distribution has a standard error of $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$.

[Figure: the distribution of sample means for n = 25. Samples were selected from a normal population with mean = 500 and sd = 100.]
The next step is to use a z-score to locate the exact position of $\bar{X} = 540$ in the distribution.

The value 540 is located above the mean by 40 points, which is exactly 2 standard deviations (in this case, exactly 2 standard errors). Thus, the z-score for 540 is 2.00. Because this distribution of sample means is normal, you can use the unit normal table to find the probability associated with z > 2.00. The table indicates that 0.0228 of the distribution is located in the tail beyond z = 2.00. Our conclusion is that it is very unlikely, p = 0.0228 (2.28%), to obtain a random sample of n = 25 students with an average SAT score greater than 540.
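The table lookup in Example 5 can be cross-checked numerically. This is a small sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 5: normal SAT population, sample of 25 students.
mu, sigma, n = 500, 100, 25
threshold = 540

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# z-score of the threshold within the distribution of sample means.
z = (threshold - mu) / standard_error

# Upper-tail probability P(Z > z) under the standard normal distribution.
p_tail = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > {threshold}) = {p_tail:.4f}")
```

Changing `mu`, `sigma`, `n`, and `threshold` gives the same calculation for any problem where the sample mean can be treated as normal.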
Example 6
Suppose the mean selling price of a gallon of gasoline in the U.S. is $1.30. Further, assume the population standard deviation $\sigma$ is $0.28. What is the probability that the mean of a sample of 35 gasoline stations is between $1.22 and $1.38?

Example 6 cont.
The z-values corresponding to $1.22 and $1.38 are -1.69 and 1.69:
$z = \frac{1.38 - 1.30}{0.28 / \sqrt{35}} = 1.69$
From the table for the standard normal distribution:
$P(-1.69 \le Z \le 1.69) = 2(0.4545) = 0.9090$
We would expect about 91% of the sample means to be within $0.08 of the population mean.

Example 7
Assume that a school district has 10,000 sixth graders. In this district, the average weight of a sixth grader is 80 pounds, with a standard deviation of 20 pounds. Suppose you draw a random sample of 50 students. What is the probability that the average weight of the sampled students will be less than 75 pounds?

Example 7 cont.
• The standard deviation of the sampling distribution can be computed using the following formula: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
• $\sigma_{\bar{X}} = 20 \times \sqrt{1/50} = 20 \times 0.1414 = 2.828$
• The sampling distribution of the mean is normally distributed with a mean of 80 and a standard deviation of 2.83.
• From the table: $P(z < (75 - 80)/2.83) = P(z < -1.77) = 0.038$

The Central Limit Theorem
[Figure: a random sample $(x_1, x_2, \ldots, x_n)$ is drawn from a right-skewed population distribution $(\mu, \sigma)$. As n grows large, the sampling distribution of the sample mean $(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}} = \sigma / \sqrt{n})$ becomes nearly normal.]

Example: Central Limit Theorem Simulation
Histograms of a bimodal population and of sampling distributions drawn from it:

Distribution                   Size                  Mean    Std dev
Population (bimodal)           population = 16,000   5.002   4.242
Sampling distribution, n = 2   4000 samples          4.977   3.017
Sampling distribution, n = 3   4000 samples          4.946   2.425
Sampling distribution, n = 30  4000 samples          5.032   0.722

STANDARD ERROR AND STATISTICAL INFERENCE: Standard error as a measure of chance
Most inferential statistics are used in the context of a research study. Typically, the researcher begins with a general question about how a treatment will affect the individuals in a population. For example: Will the drug affect blood pressure? Will the hormone affect growth? Will the special training affect students' reading scores?

The question for the researcher is how to interpret the 4-point difference. Specifically, there are two possible explanations:
1. The treatment may have caused the scores in the sample to be 4 points higher.
2. The 4-point difference may be sampling error. Remember, a sample mean is not expected to be exactly the same as the population mean. Perhaps the treatment has no effect at all, and the 4-point difference has occurred just by chance.
The standard error can help the researcher decide between these two alternatives. In particular, the standard error indicates how much difference is reasonable to expect just by chance. For example, if the standard error is only 1 point, then the researcher could conclude that the observed difference (4 points, or 4 standard errors) is much larger than would be expected by chance. In this case, it would be reasonable to conclude that the treatment has caused the difference.

The standard error is reported in scientific journals in two ways. It may be reported in a table along with the sample means (see Table 7.2). Alternatively, the standard error may be reported in graphs.

Figure 7.8 illustrates the use of a bar graph to display information about the sample mean and the standard error. Note that the mean is represented by the height of the bar, and the standard error is depicted on the graph by brackets at the top of each bar. Each bracket extends 1 standard error above and 1 standard error below the sample mean.

Figure 7.9 shows how sample means and standard error are displayed on a line graph.

Summary: Sampling Methods and the Central Limit Theorem
ONE Explain why sampling is sometimes the only feasible way to learn about a population.
TWO Define and construct a sampling distribution of the sample mean.
THREE Explain and apply the central limit theorem.
FOUR Explain the standard error and its role in statistical inference.
