Sample Distribution of the Mean and the Proportion
Daniel Royer
Geneva Business School
April 2016

1 Sampling Distribution of the Mean

The sample mean is unbiased because the mean of all the possible sample means (of a given sample size, n) is equal to the population mean, µ. The standard deviation of all possible sample means, called the standard error of the mean, expresses how the sample means vary from sample to sample. The following equation defines the standard error of the mean when sampling with replacement or sampling without replacement from large or infinite populations.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

where σ is the population standard deviation and n is the sample size.

• If the population's distribution is Normal, with mean µ and standard deviation σ, then regardless of the sample size n the sampling distribution of the mean is normally distributed with mean $\mu_{\bar{X}} = \mu$ and standard error of the mean $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
• However, in many instances either you know that the population is not normally distributed or it is unrealistic to assume that the population is normally distributed.
• An important theorem in statistics, the Central Limit Theorem, deals with this situation.

Example

Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the plant operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with package labeling, boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, causing some boxes to be underfilled and others overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams to be acceptable. Because weighing every single box is too time-consuming, costly, and inefficient, you must take a sample of boxes. For each sample you select, you plan to weigh the individual boxes and calculate a sample mean.
You need to determine the probability that such a sample mean could have been randomly selected from a population whose mean is 368 grams. Based on your analysis, you will have to decide whether to maintain, alter, or shut down the cereal-filling process.

If you randomly select a sample of 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains far less than 5% of the population, so the formula for sampling from a large population applies. Given that the standard deviation of the cereal-filling process is 15 grams, the standard error of the mean is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3$$

How can you determine the probability that the sample of 25 boxes will have a mean below 365 grams? To find the area below 365 grams, you compute

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{15/\sqrt{25}} = \frac{-3}{3} = -1.00$$

The area corresponding to Z = -1.00 is 0.1587. Therefore, 15.87% of all possible samples of 25 boxes have a sample mean below 365 grams.

If you select a sample of 100 boxes, what is the probability that the sample mean is below 365 grams?

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{15/\sqrt{100}} = \frac{-3}{1.5} = -2.00$$

The area less than Z = -2.00 is 0.0228. Therefore, 2.28% of the samples of 100 boxes have means below 365 grams, as compared with 15.87% for samples of 25 boxes.

Sometimes you need to find the interval that contains a fixed proportion of the sample means. You need to determine a distance below and above the population mean containing a specific area of the normal curve.

$$\bar{X} = \mu + Z\,\frac{\sigma}{\sqrt{n}}$$

In our example, find an interval symmetrically distributed around the population mean that will include 95% of the sample means, based on samples of 25 boxes. If 95% of the sample means are in the interval, then 5% are outside the interval. Divide the 5% into two equal parts of 2.5%. The value of Z corresponding to an area of 0.0250 in the lower tail of the normal curve is -1.96, and the value of Z corresponding to a cumulative area of 0.9750 (i.e., 0.0250 in the upper tail of the normal curve) is +1.96.
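The probabilities and Z values quoted so far in the cereal example can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `prob_mean_below` and the variable names are illustrative assumptions, not part of the original notes.

```python
from math import sqrt
from statistics import NormalDist

MU = 368.0     # population mean in grams (from the example)
SIGMA = 15.0   # population standard deviation in grams (from the example)

def prob_mean_below(x_bar, n, mu=MU, sigma=SIGMA):
    """P(sample mean < x_bar) for samples of size n,
    assuming the sampling distribution of the mean is normal."""
    std_error = sigma / sqrt(n)       # standard error of the mean
    z = (x_bar - mu) / std_error      # standardized value
    return NormalDist().cdf(z)        # area under the standard normal curve below z

p_25 = prob_mean_below(365, 25)       # samples of 25 boxes
p_100 = prob_mean_below(365, 100)     # samples of 100 boxes

# Z values that cut off 2.5% in each tail of the standard normal curve
z_lower = NormalDist().inv_cdf(0.025)
z_upper = NormalDist().inv_cdf(0.975)

print(round(p_25, 4))                          # 0.1587
print(round(p_100, 4))                         # 0.0228
print(round(z_lower, 2), round(z_upper, 2))    # -1.96 1.96
```

Quadrupling the sample size from 25 to 100 halves the standard error (from 3 to 1.5 grams), which is why the probability of a sample mean below 365 grams drops so sharply.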
The lower value of X̄ (called $\bar{X}_L$) and the upper value of X̄ (called $\bar{X}_U$) are

$$\bar{X}_L = 368 + (-1.96)\,\frac{15}{\sqrt{25}} = 362.12$$

$$\bar{X}_U = 368 + (1.96)\,\frac{15}{\sqrt{25}} = 373.88$$

Therefore, 95% of all sample means, based on samples of 25 boxes, are between 362.12 and 373.88 grams.

1.1 Sampling from Non-Normally Distributed Populations : The Central Limit Theorem

The Central Limit Theorem states that as the sample size (i.e. the number of values in each sample) gets large enough, the sampling distribution of the mean is approximately normally distributed. This is true regardless of the shape of the distribution of the individual values in the population.

Figure 1 : Sampling distribution of the mean for different populations for samples of n = 2, 5, and 30

2 Sampling Distribution of the Proportion

Consider a categorical variable that has only two categories, such as the customer prefers your brand or the customer prefers the competitor's brand. You are interested in the proportion of items belonging to one of the categories, for example, the proportion of customers that prefer your brand. The population proportion, represented by π, is the proportion of items in the entire population with the characteristic of interest. The sample proportion, represented by p, is the proportion of items in the sample with the characteristic of interest. The sample proportion, a statistic, is used to estimate the population proportion, a parameter.

To calculate the sample proportion, you assign one of two possible values, 1 or 0, to represent the presence or absence of the characteristic. You then sum all the 1 and 0 values and divide by n, the sample size. For example, if, in a sample of five customers, three preferred your brand and two did not, you have three 1s and two 0s. Summing the three 1s and two 0s and dividing by the sample size of 5 results in a sample proportion of 0.60.
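The 1/0 coding described above amounts to taking the mean of the coded values. A minimal sketch in Python, where the list `responses` is a hypothetical encoding of the five-customer example:

```python
# 1 = customer prefers your brand, 0 = customer does not
# (hypothetical encoding of the five-customer example)
responses = [1, 1, 1, 0, 0]

x = sum(responses)   # number of items having the characteristic
n = len(responses)   # sample size
p = x / n            # sample proportion

print(p)  # 0.6
```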
2.1 Sample Proportion

$$p = \frac{X}{n} = \frac{\text{Number of items having the characteristic of interest}}{\text{Sample size}}$$

The sample proportion, p, takes on values between 0 and 1. If all items have the characteristic, you assign each a score of 1, and p is equal to 1. If half the items have the characteristic, you assign half a score of 1 and assign the other half a score of 0, and p is equal to 0.5. If none of the items have the characteristic, you assign each a score of 0, and p is equal to 0. The statistic p is an unbiased estimator of the population proportion, π.

2.2 Standard Error Of The Proportion

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

In most cases in which inferences are made about the proportion, the sample size is substantial enough to meet the conditions for using the normal approximation. Therefore, in many instances, you can use the normal distribution to estimate the sampling distribution of the proportion.

2.3 Finding Z For The Sampling Distribution Of The Proportion

Substituting p for X̄, π for µ, and $\sqrt{\pi(1-\pi)/n}$ for $\sigma/\sqrt{n}$, we get

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} \qquad (1)$$

2.4 Example

To illustrate the sampling distribution of the proportion, suppose that the manager of the local branch of a bank determines that 40% of all depositors have multiple accounts at the bank. If you select a random sample of 200 depositors, the sample size is large enough to assume that the sampling distribution of the proportion is approximately normally distributed. Then, you can calculate the probability that the sample proportion of depositors with multiple accounts is less than 0.30 by using equation (1):

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\dfrac{0.24}{200}}} = \frac{-0.10}{0.0346} = -2.89$$

Using normdist or a table we find that the area under the normal curve less than -2.89 is 0.0019. Therefore, if the true proportion of items of interest in the population is 0.40, then only 0.19% of samples of size n = 200 would be expected to have sample proportions less than 0.30.
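The bank example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the normal approximation holds as stated in the example; `prob_proportion_below` is a hypothetical helper name, and `statistics.NormalDist` (Python 3.8 or later) stands in for normdist or a table lookup.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_below(p, pi, n):
    """Return (Z, P(sample proportion < p)) using equation (1),
    assuming the normal approximation is appropriate."""
    std_error = sqrt(pi * (1 - pi) / n)   # standard error of the proportion
    z = (p - pi) / std_error              # equation (1)
    return z, NormalDist().cdf(z)         # area below z under the standard normal curve

z, prob = prob_proportion_below(p=0.30, pi=0.40, n=200)

print(round(z, 2))   # -2.89
print(prob)          # area below Z; compare with the 0.0019 table value
```

Because the code keeps the unrounded Z (about -2.887) rather than -2.89, the computed area can differ from the table value in the later decimal places.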
2.5 Summary of the Key Equations

Figure 2 : Key Equations