STP231 Brief Class Notes
Instructor: Ela Jackiewicz

CHAPTER 5
THE SAMPLING DISTRIBUTION OF THE SAMPLE MEAN

Sampling Variability

We view our data as a random sample from a population with unknown parameter μ. The sample mean Ȳ is intended to estimate the population mean μ. Each sample we take gives a different estimate. This variability among samples is called sampling variability. The probability distribution of the Ȳ values from all possible samples from the population is called the sampling distribution of Ȳ.

In general:

Sampling distribution of a statistic = the distribution of all possible values of the statistic for samples of a given size.

Sampling error = the error resulting from using a sample characteristic (statistic) to estimate a population characteristic (parameter).

We will observe the behavior of Ȳ for an unrealistically small population of size N = 5.

Example 1. Weights of a certain breed of dog.

Below is an unrealistically small population of 5 dogs and their weights in pounds.

Dog    Weight (lb)
A      42
B      48
C      52
D      58
E      60

The population mean weight is

μ = (∑ x)/N = (42 + 48 + 52 + 58 + 60)/5 = 52 pounds,

and the population standard deviation is σ = 6.57 pounds.

The possible samples of size two and their means are summarized in the following table.

Sample    Weights    Ȳ
A, B      42, 48     45
A, C      42, 52     47
A, D      42, 58     50
A, E      42, 60     51
B, C      48, 52     50
B, D      48, 58     53
B, E      48, 60     54
C, D      52, 58     55
C, E      52, 60     56
D, E      58, 60     59

Any sample of size 2 we take will be one of the above 10 possible samples, and each sample is equally likely, with probability 1/10. How confident can we be that the mean of a sample of size two will estimate the population mean weight to within 2 pounds? In other words, what is P(50 ≤ Ȳ ≤ 54)?
Since 5 samples ({A, D}, {A, E}, {B, C}, {B, D} and {B, E}) have means within 2 pounds of the population mean 52, P(50 ≤ Ȳ ≤ 54) = 5/10 = 50%.

We can also compute the mean and standard deviation of all Ȳ values: μȲ = 52 and σȲ = 4.02.

Notice that the mean of all Ȳ values equals the population mean, and that their standard deviation is smaller than the population standard deviation.

Now we repeat the example for samples of size 4. The possible samples of size four and their means are summarized in the following table.

Sample        Weights           Ȳ
A, B, C, D    42, 48, 52, 58    50
A, B, C, E    42, 48, 52, 60    50.5
A, B, D, E    42, 48, 58, 60    52
A, C, D, E    42, 52, 58, 60    53
B, C, D, E    48, 52, 58, 60    54.5

Any sample of size 4 we take will be one of the above 5 possible samples, and each sample is equally likely, with probability 1/5. This time P(50 ≤ Ȳ ≤ 54) = 4/5 = 80%.

We can also compute the mean and standard deviation of all Ȳ values: μȲ = 52 and σȲ = 1.64.

In this example, the mean of the distribution of Ȳ equals the population mean regardless of the sample size, while the standard deviation of that distribution decreases as n increases.

Sample Size and Sampling Error

As the sample size increases, the sample means cluster more closely around the population mean, and the sampling error in estimating μ by Ȳ tends to be smaller.

The Mean and Standard Deviation of Ȳ

We use the sampling distribution of the sample mean to make inferences about a population mean based on the mean of a sample from the population. But generally we do not know the exact distribution of the sample mean (the sampling distribution). Under certain conditions, we can approximate the sampling distribution of the sample mean Ȳ by a normal distribution. A normal distribution is determined by its mean and standard deviation, so we denote the mean of Ȳ by μȲ and its standard deviation by σȲ.
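As a check on Example 1, the following sketch enumerates every possible sample of size 2 and size 4 from the five-dog population and recomputes the mean of Ȳ, the standard deviation of Ȳ, and P(50 ≤ Ȳ ≤ 54). The function and variable names are illustrative choices, not part of the notes, and the standard deviations use the divisor N because every possible value is listed.

```python
from itertools import combinations
from math import sqrt

# Population from Example 1: dog -> weight in pounds.
weights = {"A": 42, "B": 48, "C": 52, "D": 58, "E": 60}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def population_sd(values):
    """Standard deviation with divisor N (all values are listed, none estimated)."""
    values = list(values)
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def sampling_distribution(n):
    """Sample mean of every possible sample of n distinct dogs (order ignored)."""
    return [mean(weights[d] for d in dogs) for dogs in combinations(weights, n)]


def summarize(n, low=50, high=54):
    """Mean and SD of all sample means, and the share of them in [low, high]."""
    means = sampling_distribution(n)
    inside = sum(low <= m <= high for m in means)
    return {
        "samples": len(means),
        "mean": mean(means),
        "sd": round(population_sd(means), 2),
        "prob": inside / len(means),
    }


print("population:", mean(weights.values()), round(population_sd(weights.values()), 2))
for n in (2, 4):
    print("n =", n, summarize(n))
```

For n = 2 and n = 4 the summaries should agree with the values quoted in the two tables of Example 1.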
Mean of the variable Ȳ

For samples of size n, the mean of the variable Ȳ equals the mean of the variable Y under consideration, i.e. the mean of all possible sample means equals the population mean:

μȲ = μ

Standard Deviation of the variable Ȳ

For samples of size n, the standard deviation of the variable Ȳ equals the standard deviation of the variable under consideration divided by the square root of the sample size, i.e. the standard deviation of all possible sample means equals the population standard deviation divided by the square root of the sample size:

σȲ = σ/√n

(This formula assumes independent observations, as when sampling from a very large population. The five-dog population of Example 1 was sampled without replacement, which is why its σȲ values, 4.02 and 1.64, are smaller than σ/√n.)

Sample Size and Sampling Error

1. The larger the sample size, the smaller the standard deviation of Ȳ.
2. The smaller the standard deviation of Ȳ, the more closely its possible values cluster around the mean of Y.

NOTE: The standard deviation of Ȳ determines the amount of sampling error to be expected when a population mean is estimated by a sample mean.

The Shape of the Sampling Distribution of the Sample Mean Ȳ

• If the variable Y of a population is normally distributed with mean μ and standard deviation σ, then, for any sample size n ≥ 1, the variable Ȳ is also normally distributed, with mean μ and standard deviation σ/√n.

So what is the shape of the distribution if Y is not normally distributed? We can then apply the following theorem:

• The Central Limit Theorem (CLT), one of the most important theorems in statistics: For a relatively large sample size (n ≥ 30), the variable Ȳ is approximately normally distributed, regardless of the distribution of the variable Y under consideration. The approximation becomes better and better with increasing sample size.

Example. Let Y be the height of males in a certain population. Assume Y has an approximately normal distribution with mean μ = 69.7 inches and SD σ = 2.8 inches.
a) Suppose we randomly select one individual from that population. What is the probability that his height will exceed 72 inches?

P(Y > 72) = P(Z > 0.82) = 0.2061, where Z = (72 − 69.7)/2.8 ≈ 0.82, and the probability equals the area under the N(69.7, 2.8) curve to the right of 72.

b) Suppose we randomly select a sample of 4 individuals from that population. What is the probability that their average height Ȳ will estimate the population mean with an error of no more than 1 inch?

P(68.7 ≤ Ȳ ≤ 70.7) = P(−0.71 ≤ Z ≤ 0.71) = 0.5223, where

−0.71 ≈ (68.7 − 69.7)/(2.8/√4) and 0.71 ≈ (70.7 − 69.7)/(2.8/√4),

and the probability equals the area under the N(69.7, 2.8/√4 = 1.4) curve between 68.7 and 70.7.

c) Without computations, will the answer in part b change or remain the same if we take a sample of size 16?

If n = 16, the distribution of Ȳ will be N(69.7, 2.8/√16 = 0.7), so it will be narrower than the distribution for n = 4. The percentage of all Ȳ values within 1 inch of the mean will be larger, so the probability will increase.

d) Suppose our population was not normal, but severely left skewed. What would be the answers to questions b and c?

If the population is not normal, we need to use the Central Limit Theorem, but that requires a sample of size at least 30. With samples as small as 4 and 16, we can't assume anything about the distribution of Ȳ, so we can't answer either question.
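The probabilities in parts a to c can also be computed without a Z table. The sketch below is one way to do it, assuming the normal model stated in the example; the helper names normal_cdf and prob_within are hypothetical, and the normal CDF is obtained from the standard library's error function.

```python
from math import erf, sqrt

MU, SIGMA = 69.7, 2.8  # population mean and SD from the example, in inches


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for X ~ N(mean, sd), computed from the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def prob_within(error, n):
    """P(|Ybar - MU| <= error) for the mean of n observations from N(MU, SIGMA)."""
    se = SIGMA / sqrt(n)  # standard deviation of Ybar
    return normal_cdf(MU + error, MU, se) - normal_cdf(MU - error, MU, se)


p_a = 1.0 - normal_cdf(72, MU, SIGMA)  # part a: one individual taller than 72
p_b = prob_within(1.0, 4)              # part b: n = 4
p_c = prob_within(1.0, 16)             # part c: n = 16

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Because the code does not round Z to two decimals, its results differ slightly from the table-based values above (roughly 0.206, 0.525 and 0.847). The value for n = 16 is larger than the value for n = 4, in line with the reasoning in part c. Part d is not covered, since the code relies on the normality assumption.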