STA301 – Statistics and Probability

Lecture No. 32:
• Sampling Distribution of $\hat{p}$
• Sampling Distribution of $\bar{X}_1 - \bar{X}_2$

In the last lecture, we discussed the mean and the standard deviation of the sampling distribution of $\bar{X}$, and, towards the end of that lecture, we considered the very important theorem known as the Central Limit Theorem. Let us now consider a real-life application of this concept with the help of an example:

EXAMPLE:
A construction company has 310 employees who have an average annual salary of Rs. 24,000. The standard deviation of annual salaries is Rs. 5,000. Suppose that the employees of this company demand that the government institute a law by which their average salary should be at least Rs. 24,500, and suppose that the government decides to check the validity of this demand by drawing a random sample of 100 employees of this company and acquiring information regarding their present salaries. What is the probability that, in a random sample of 100 employees, the average salary will exceed Rs. 24,500, so that the government concludes that the demand of the employees is unfounded and hence ignores it (although, in reality, it was justified)?

SOLUTION:
The sample size (n = 100) is large enough to assume that the sampling distribution of $\bar{X}$ is approximately normal with mean

$$\mu_{\bar{x}} = \mu = \text{Rs. } 24{,}000$$

and standard deviation

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{5000}{\sqrt{100}}\sqrt{\frac{310-100}{310-1}} = \text{Rs. } 412.20.$$

NOTE: Here we have used the finite population correction factor (fpc), because the sample size n = 100 is greater than 5 percent of the population size N = 310.

Since $\bar{X}$ is approximately N(24000, 412.20), therefore

$$Z = \frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{X} - 24000}{412.20}$$

is approximately N(0, 1). We are required to evaluate $P(\bar{X} > 24{,}500)$. At $\bar{x} = 24{,}500$, we find that

$$z = \frac{24500 - 24000}{412.20} = 1.21.$$

[Figure: normal curve with the $\bar{X}$-axis marked at 24000 and 24500, corresponding to z = 0 and z = 1.21 on the Z-axis.]

Using the table of areas under the standard normal curve, we find that the area between z = 0 and z = 1.21 is 0.3869.
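As a numerical check on the solution so far, the sketch below recomputes the standard error with the finite population correction, the z-value, and the upper-tail probability. It is a minimal Python sketch using only the standard library; the helper names `standard_error_of_mean` and `normal_upper_tail` are illustrative, and the tail area comes from `math.erfc` rather than from a printed table.

```python
import math


def normal_upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def standard_error_of_mean(sigma: float, n: int, N: int, use_fpc: bool) -> float:
    """sigma / sqrt(n), multiplied by sqrt((N - n) / (N - 1)) when use_fpc is True."""
    se = sigma / math.sqrt(n)
    if use_fpc:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Figures from the construction-company example
N, n = 310, 100                # population size and sample size
mu, sigma = 24_000, 5_000      # population mean and standard deviation (Rs.)
threshold = 24_500             # salary level named in the employees' demand (Rs.)

use_fpc = n / N > 0.05         # rule from the NOTE: n exceeds 5 percent of N
se = standard_error_of_mean(sigma, n, N, use_fpc)   # about 412.2
z = (threshold - mu) / se                           # about 1.21
prob = normal_upper_tail(z)                         # P(X-bar > 24,500)

print(f"fpc used: {use_fpc}, SE = {se:.2f}, z = {z:.3f}, P = {prob:.4f}")
```

Because the script does not round z to 1.21 before finding the area, its probability comes out slightly smaller than the tabled 0.1131; both correspond to about 11%.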
Virtual University of Pakistan Page 253

Hence,

$$P(\bar{X} > 24{,}500) = P(Z > 1.21) = 0.5 - P(0 < Z < 1.21) = 0.5 - 0.3869 = 0.1131.$$

[Figure: standard normal curve showing the area 0.3869 between z = 0 and z = 1.21, and the upper-tail area 0.1131 beyond z = 1.21 ($\bar{x}$ = 24500).]

Hence, the chances are only 11% that, in a random sample of 100 employees from this particular construction company, the average salary will exceed Rs. 24,500. In other words, the chances are 89% that, in such a sample, the average salary will not exceed Rs. 24,500. Hence, the chances are considerably high that the government will pay attention to the employees' demand.

SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION:
In this regard, the first point to be noted is that, whenever the elements of a population can be classified into two categories, technically called "success" and "failure", we may be interested in the proportion of successes in the population. If X denotes the number of successes in the population, then the proportion of successes in the population is given by

$$p = \frac{X}{N}.$$

Similarly, if we draw a sample of size n from the population, the proportion of successes in the sample is given by

$$\hat{p} = \frac{X}{n},$$

where X represents the number of successes in the sample. It is interesting to note that X is a binomial random variable (strictly, when sampling is with replacement or the population is very large), and the binomial parameter p is here being called the proportion of successes.

The sample proportion has different values in different samples. It is obviously a random variable and has a probability distribution. This probability distribution of the proportions of successes in all possible random samples of size n is called the sampling distribution of $\hat{p}$.

We illustrate this sampling distribution with the help of the following examples:

EXAMPLE-1:
A population consists of the six values 1, 3, 6, 8, 9 and 12. Draw all possible samples of size n = 3 without replacement from the population and find the proportion of even numbers in each sample.
Construct the sampling distribution of sample proportions and verify that
i) $\mu_{\hat{p}} = p$
ii) $\operatorname{Var}(\hat{p}) = \dfrac{pq}{n}\cdot\dfrac{N-n}{N-1}$

SOLUTION:
The number of possible samples of size n = 3 that could be selected without replacement from a population of size N = 6 is $\binom{6}{3} = 20$.

Let $\hat{p}$ represent the proportion of even numbers in the sample. Then the 20 possible samples and the proportions of even numbers are given as follows:

Sample No.   Sample Data   Sample Proportion p̂
    1         1, 3, 6            1/3
    2         1, 3, 8            1/3
    3         1, 3, 9            0
    4         1, 3, 12           1/3
    5         1, 6, 8            2/3
    6         1, 6, 9            1/3
    7         1, 6, 12           2/3
    8         1, 8, 9            1/3
    9         1, 8, 12           2/3
   10         1, 9, 12           1/3
   11         3, 6, 8            2/3
   12         3, 6, 9            1/3
   13         3, 6, 12           2/3
   14         3, 8, 9            1/3
   15         3, 8, 12           2/3
   16         3, 9, 12           1/3
   17         6, 8, 9            2/3
   18         6, 8, 12           1
   19         6, 9, 12           2/3
   20         8, 9, 12           2/3

The sampling distribution of the sample proportion is given below:

Sampling Distribution of p̂:

  p̂      No. of Samples   Probability f(p̂)   p̂ f(p̂)   p̂² f(p̂)
  0            1              1/20              0         0
  1/3          9              9/20              3/20      1/20
  2/3          9              9/20              6/20      4/20
  1            1              1/20              1/20      1/20
  Total       20              1                 10/20     6/20

Now

$$\mu_{\hat{p}} = \sum \hat{p}\, f(\hat{p}) = \frac{10}{20} = 0.5,$$

and

$$\sigma^2_{\hat{p}} = \sum \hat{p}^2 f(\hat{p}) - \left[\sum \hat{p}\, f(\hat{p})\right]^2 = \frac{6}{20} - \left(\frac{10}{20}\right)^2 = \frac{1}{20} = 0.05.$$

To verify the given relations, we first calculate the population proportion p. Thus:

$$p = \frac{X}{N},$$

where X represents the number of even numbers in the population. In other words,

$$p = \frac{3}{6} = 0.5.$$

Hence, we find that $\mu_{\hat{p}} = 0.5 = p$, and

$$\frac{pq}{n}\cdot\frac{N-n}{N-1} = \frac{0.25}{3}\cdot\frac{6-3}{6-1} = \frac{0.25}{5} = 0.05 = \operatorname{Var}(\hat{p}).$$

Hence, both properties of the sampling distribution of $\hat{p}$ are verified.

The sampling distribution of $\hat{p}$ has the following important properties:

PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\hat{p}$:

Property No. 1:
The mean of the sampling distribution of proportions, denoted by $\mu_{\hat{p}}$, is equal to the population proportion p, that is

$$\mu_{\hat{p}} = p.$$

Property No. 2:
The standard deviation of the sampling distribution of proportions, called the standard error of $\hat{p}$ and denoted by $\sigma_{\hat{p}}$, is given as:
a) $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$, when the sampling is performed with replacement;
b) $\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}\sqrt{\dfrac{N-n}{N-1}}$, when sampling is done without replacement from a finite population.
(As in the case of the sampling distribution of $\bar{X}$, $\sqrt{\frac{N-n}{N-1}}$ is known as the finite population correction factor (fpc).)

Property No. 3: SHAPE OF THE DISTRIBUTION:
The sampling distribution of $\hat{p}$ is the binomial distribution (more precisely, $n\hat{p} = X$ is a binomial random variable when sampling is with replacement). However, for sufficiently large sample sizes, the sampling distribution of $\hat{p}$ is approximately normal. As $n \to \infty$, the sampling distribution of $\hat{p}$ approaches normality with

$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}.$$

As a rule of thumb, the sampling distribution of $\hat{p}$ will be approximately normal whenever both np and nq are equal to or greater than 5.

Let us apply this concept to a real-world situation:

EXAMPLE-2:
Ten percent of the 1-kilogram boxes of sugar in a large warehouse are underweight. Suppose a retailer buys a random sample of 144 of these boxes. What is the probability that at least 5 percent of the sample boxes will be underweight?

SOLUTION:
Here the statistic is the sample proportion $\hat{p}$. The sample size (n = 144) is large enough (np = 14.4 and nq = 129.6 both exceed 5) to assume that the sample proportion is approximately normally distributed with

Mean of the sampling distribution of $\hat{p}$: $\mu_{\hat{p}} = p = 0.10$, and

Standard error of $\hat{p}$:

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.10)(0.90)}{144}} = \frac{0.3}{12} = 0.025.$$

Therefore, the sampling distribution of $\hat{p}$ is approximately N(0.10, 0.025), and hence

$$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{\hat{p} - 0.10}{0.025}$$

is approximately N(0, 1).

We are required to find the probability that the proportion of underweight boxes in the sample is equal to or greater than 5%, i.e., we require $P(\hat{p} \ge 0.05)$.
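Both worked examples on $\hat{p}$ can be checked with a short script. The first part enumerates the 20 samples of Example-1 with `itertools.combinations` and compares the exact mean and variance of $\hat{p}$ (kept as `Fraction`s, so there is no rounding) with $p$ and $\frac{pq}{n}\cdot\frac{N-n}{N-1}$. The second part evaluates the normal approximation of Example-2, applying the $\frac{1}{2n}$ continuity correction that the solution introduces next. This is a minimal standard-library sketch; the names `proportion_even` and `normal_cdf` are illustrative.

```python
import math
from fractions import Fraction
from itertools import combinations

# ---- Example-1: every sample of size 3 drawn without replacement ----
population = [1, 3, 6, 8, 9, 12]
N, n = len(population), 3


def proportion_even(values) -> Fraction:
    """Exact proportion of even numbers among the given values."""
    return Fraction(sum(1 for v in values if v % 2 == 0), len(values))


p_hats = [proportion_even(s) for s in combinations(population, n)]  # 20 samples
mean_p_hat = sum(p_hats) / len(p_hats)
var_p_hat = sum(ph ** 2 for ph in p_hats) / len(p_hats) - mean_p_hat ** 2

p = proportion_even(population)                   # population proportion
q = 1 - p
var_formula = p * q / n * Fraction(N - n, N - 1)  # pq/n times the fpc

print(len(p_hats), mean_p_hat, p, var_p_hat, var_formula)  # 20 1/2 1/2 1/20 1/20


# ---- Example-2: normal approximation with the 1/(2n) continuity correction ----
def normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


n_boxes, p_under = 144, 0.10
# Rule of thumb from Property No. 3: np >= 5 and nq >= 5
rule_of_thumb_ok = n_boxes * p_under >= 5 and n_boxes * (1 - p_under) >= 5
se_p_hat = math.sqrt(p_under * (1 - p_under) / n_boxes)  # 0.025
cutoff = 0.05 - 1 / (2 * n_boxes)                         # 0.05 - 1/288
z_cut = (cutoff - p_under) / se_p_hat                      # about -2.14
prob_at_least_5pct = 1 - normal_cdf(z_cut)                 # about 0.98

print(f"SE = {se_p_hat:.3f}, z = {z_cut:.2f}, P = {prob_at_least_5pct:.4f}")
```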
In this regard, a very important point to be noted is that, just as we use a continuity correction of ±½ whenever we consider the normal approximation to the binomially distributed random variable X, in this situation, since

$$\hat{p} = \frac{X}{n},$$

we need to use a continuity correction of $\pm\frac{1}{2n}$ in the case of the sampling distribution of $\hat{p}$.

Applying the continuity correction in this problem, we have:

$$P(\hat{p} \ge 0.05) = P\left(\hat{p} \ge 0.05 - \frac{1}{2(144)}\right) = P\left(\hat{p} \ge 0.05 - \frac{1}{288}\right)$$

$$= P\left(\frac{\hat{p} - 0.10}{0.025} \ge \frac{0.05 - 1/288 - 0.10}{0.025}\right) = P(Z \ge -2.14)$$

$$= P(-2.14 \le Z \le 0) + P(Z > 0) = 0.4838 + 0.5 = 0.9838.$$

[Figure: standard normal curve with the $\hat{p}$-axis centred at 0.10; the area 0.4838 lies between z = -2.14 and z = 0, and the area 0.5 lies to the right of z = 0.]

Hence, the probability that at least 5% of the sample boxes are underweight is as high as 98%!

The sampling distributions of $\bar{X}$ and $\hat{p}$ pertain to the situation when we are drawing all possible samples of a particular size from one particular population. Next, we will discuss the case when we are dealing with all possible samples drawn from two populations, such that the samples from the two populations are independent. In this regard, we will consider the sampling distributions of $\bar{X}_1 - \bar{X}_2$ and $\hat{p}_1 - \hat{p}_2$. We begin with the sampling distribution of $\bar{X}_1 - \bar{X}_2$:

SAMPLING DISTRIBUTION OF DIFFERENCES BETWEEN MEANS

Suppose we have two distinct populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Let independent random samples of sizes $n_1$ and $n_2$ be selected from the respective populations, and let the differences $\bar{x}_1 - \bar{x}_2$ between the means of all possible pairs of samples be computed. Then, a probability distribution of the differences $\bar{X}_1 - \bar{X}_2$ can be obtained. Such a distribution is called the sampling distribution of the differences of sample means $\bar{X}_1 - \bar{X}_2$.

We illustrate the sampling distribution of $\bar{X}_1 - \bar{X}_2$ with the help of the following example:

EXAMPLE:
Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population consisting of 4, 6, 8. Similarly, draw all possible random samples of size $n_2 = 2$ with replacement from another finite population consisting of 1, 2, 3.
a) Find the possible differences between the sample means of the two populations.
b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance.
c) Verify that $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.

SOLUTION:
Whenever we are sampling with replacement from a finite population, the total number of possible samples is $N^n$ (where N is the population size and n is the sample size). Hence, in this example, there are $3^2 = 9$ possible samples which can be drawn with replacement from each population. These two sets of samples and their means are given below:

         From Population 1                     From Population 2
Sample No.   Sample Value   x̄1       Sample No.   Sample Value   x̄2
    1           4, 4         4            1           1, 1        1.0
    2           4, 6         5            2           1, 2        1.5
    3           4, 8         6            3           1, 3        2.0
    4           6, 4         5            4           2, 1        1.5
    5           6, 6         6            5           2, 2        2.0
    6           6, 8         7            6           2, 3        2.5
    7           8, 4         6            7           3, 1        2.0
    8           8, 6         7            8           3, 2        2.5
    9           8, 8         8            9           3, 3        3.0

a) Since there are 9 samples from the first population as well as 9 from the second, there are 81 possible combinations of $\bar{x}_1$ and $\bar{x}_2$.
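The enumeration in this solution is mechanical, so it can be reproduced with a short script. Below is a minimal Python sketch, assuming exact `Fraction` arithmetic so that the results match the hand computation without rounding; the helper names `all_sample_means` and `mean_and_variance` are illustrative. It builds the 9 sample means from each population with `itertools.product`, forms all 81 differences, tallies them with a `Counter`, and compares their mean and variance with $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$ (part c).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

pop1, pop2 = [4, 6, 8], [1, 2, 3]
n1, n2 = 2, 2


def all_sample_means(pop, n):
    """Exact means of all len(pop)**n ordered samples drawn with replacement."""
    return [Fraction(sum(s), n) for s in product(pop, repeat=n)]


def mean_and_variance(values):
    """Mean and divide-by-count variance of a list of numbers, as exact fractions."""
    m = Fraction(sum(values), len(values))
    v = sum((Fraction(x) - m) ** 2 for x in values) / len(values)
    return m, v


means1 = all_sample_means(pop1, n1)                  # 9 values of x-bar 1
means2 = all_sample_means(pop2, n2)                  # 9 values of x-bar 2
diffs = [a - b for a, b in product(means1, means2)]  # 81 differences

freq = Counter(diffs)                                # frequency distribution of d
mean_d, var_d = mean_and_variance(diffs)
mu1, var1 = mean_and_variance(pop1)
mu2, var2 = mean_and_variance(pop2)

print(mean_d, mu1 - mu2)             # 4 4
print(var_d, var1 / n1 + var2 / n2)  # 5/3 5/3
```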
The 81 possible differences $\bar{x}_1 - \bar{x}_2$ are presented in the following table (rows: $\bar{x}_2$; columns: $\bar{x}_1$):

x̄2 \ x̄1    4     5     6     5     6     7     6     7     8
  1.0      3.0   4.0   5.0   4.0   5.0   6.0   5.0   6.0   7.0
  1.5      2.5   3.5   4.5   3.5   4.5   5.5   4.5   5.5   6.5
  2.0      2.0   3.0   4.0   3.0   4.0   5.0   4.0   5.0   6.0
  1.5      2.5   3.5   4.5   3.5   4.5   5.5   4.5   5.5   6.5
  2.0      2.0   3.0   4.0   3.0   4.0   5.0   4.0   5.0   6.0
  2.5      1.5   2.5   3.5   2.5   3.5   4.5   3.5   4.5   5.5
  2.0      2.0   3.0   4.0   3.0   4.0   5.0   4.0   5.0   6.0
  2.5      1.5   2.5   3.5   2.5   3.5   4.5   3.5   4.5   5.5
  3.0      1.0   2.0   3.0   2.0   3.0   4.0   3.0   4.0   5.0

b) The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is as follows:

d = x̄1 − x̄2    Frequency f    Probability f(d)    d f(d)      d² f(d)
    1.0             1              1/81            1.0/81      1.0/81
    1.5             2              2/81            3.0/81      4.5/81
    2.0             5              5/81           10.0/81     20.0/81
    2.5             6              6/81           15.0/81     37.5/81
    3.0            10             10/81           30.0/81     90.0/81
    3.5            10             10/81           35.0/81    122.5/81
    4.0            13             13/81           52.0/81    208.0/81
    4.5            10             10/81           45.0/81    202.5/81
    5.0            10             10/81           50.0/81    250.0/81
    5.5             6              6/81           33.0/81    181.5/81
    6.0             5              5/81           30.0/81    180.0/81
    6.5             2              2/81           13.0/81     84.5/81
    7.0             1              1/81            7.0/81     49.0/81
   Total           81              1             324/81     1431/81

Thus the mean and the variance are

$$\mu_{\bar{x}_1 - \bar{x}_2} = \sum d\, f(d) = \frac{324}{81} = 4,$$

and

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sum d^2 f(d) - \left[\sum d\, f(d)\right]^2 = \frac{1431}{81} - (4)^2 = \frac{53}{3} - 16 = \frac{5}{3} = 1.67.$$

c) In order to verify the properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, we first need to compute the mean and variance of each population. The mean and variance of the first population are:

$$\mu_1 = \frac{4 + 6 + 8}{3} = 6, \qquad \sigma_1^2 = \frac{(4-6)^2 + (6-6)^2 + (8-6)^2}{3} = \frac{8}{3}.$$

The mean and variance of the second population are:

$$\mu_2 = \frac{1 + 2 + 3}{3} = 2, \qquad \sigma_2^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}.$$

Now

$$\mu_{\bar{x}_1 - \bar{x}_2} = 4 = 6 - 2 = \mu_1 - \mu_2,$$

and

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{8/3}{2} + \frac{2/3}{2} = \frac{4}{3} + \frac{1}{3} = \frac{5}{3} = 1.67 = \sigma^2_{\bar{x}_1 - \bar{x}_2}.$$

Hence, both properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ are satisfied.

The sampling distribution of the differences $\bar{X}_1 - \bar{X}_2$ has the following properties:

PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\bar{X}_1 - \bar{X}_2$:

Property No. 1:
The mean of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, denoted by $\mu_{\bar{X}_1 - \bar{X}_2}$, is equal to the difference between the population means, that is

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2.$$

Property No. 2:
In the case of sampling with or without replacement from two infinite populations, the standard deviation of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ (i.e. the standard error of $\bar{X}_1 - \bar{X}_2$), denoted by $\sigma_{\bar{X}_1 - \bar{X}_2}$, is given by

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

The above expression for the standard error of $\bar{X}_1 - \bar{X}_2$ also holds for finite populations when sampling is performed with replacement. In the case of sampling without replacement from a finite population, the formula for the standard error of $\bar{X}_1 - \bar{X}_2$ will be suitably modified.

Property No. 3: Shape of the distribution:
a) If the POPULATIONS are normally distributed, the sampling distribution of $\bar{X}_1 - \bar{X}_2$, regardless of sample sizes, will be normal with mean $\mu_1 - \mu_2$ and variance $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$. In other words, the variable

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is normally distributed with zero mean and unit variance.

b) If the POPULATIONS are non-normal and if both sample sizes are large (i.e., greater than or equal to 30), then the sampling distribution of the differences between means is approximately a normal distribution by the Central Limit Theorem. In this case too, the variable

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

will be approximately normally distributed with mean zero and variance one.
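Property No. 3 reduces a probability statement about $\bar{X}_1 - \bar{X}_2$ to a standard normal calculation. The sketch below assumes, as the properties do, that the population variances are known. The function names and the numbers in the illustration (means 50 and 45, variances 100 and 64, sample sizes 40 and 32) are hypothetical, chosen only to exercise the formula; they do not come from the lecture.

```python
import math


def normal_model_applies(pops_normal: bool, n1: int, n2: int) -> bool:
    """Property No. 3: exact if both populations are normal, else need n1, n2 >= 30."""
    return pops_normal or (n1 >= 30 and n2 >= 30)


def z_for_difference(diff, mu1, mu2, var1, var2, n1, n2):
    """Standardize a value of X-bar 1 minus X-bar 2 using the Z formula above."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    return (diff - (mu1 - mu2)) / se


def prob_difference_exceeds(diff, mu1, mu2, var1, var2, n1, n2):
    """P(X-bar 1 minus X-bar 2 > diff) under the (approximate) normal model."""
    z = z_for_difference(diff, mu1, mu2, var1, var2, n1, n2)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical illustration: non-normal populations, but both samples are >= 30
ok = normal_model_applies(False, 40, 32)
p_gt_5 = prob_difference_exceeds(5, 50, 45, 100, 64, 40, 32)  # 0.5 by symmetry
z_8 = z_for_difference(8, 50, 45, 100, 64, 40, 32)
p_gt_8 = prob_difference_exceeds(8, 50, 45, 100, 64, 40, 32)
print(ok, p_gt_5, round(z_8, 4), round(p_gt_8, 4))
```

Asking for the probability that the difference exceeds $\mu_1 - \mu_2$ itself returns 0.5, which is a quick sanity check that the standardization is centred correctly.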