Statistics 11/Economics 40
Lecture 13
Distribution of the Sample Mean (5.2)

1. Introduction

Conceptually, Chapter 5.2 says the same things as Chapter 5.1; we are just working with means now instead of counts and proportions.

2. Sampling Distributions for Means (5.2)

Suppose we draw a simple random sample of size n from a large population. Call the observed values X1, X2, ..., Xn.

An example might be: draw a simple random sample (SRS) of 25 stocks from the 7,497 currently traded stocks on the NYSE, AMEX & NASD. Measure the average percentage change from the sample of 25 and compare it to the population average.

Some Stata output for the sample:

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
    chgprice |      25     17.3368   23.54894      -13.8      76.01

And from the population from which it was drawn:

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
    chgprice |    7497    12.27294     28.133     -89.36        525

A statistic: the mean of the sample of 25 is 17.3368, and it is just the familiar x̄ (from Chapter 1.2). You could define x̄ = (X1 + X2 + ... + Xn)/n.

x̄ can be thought of as the mean from a single sample selected at random from all possible samples that could have been generated from the population. It could also be thought of as a random variable: it is an outcome of a random experiment (the sample).

The expected value of x̄ is µ, the mean of the population from which the sample is drawn. In other words, the mean of all sample means should be equal to the population mean. We can check this using a simulation. If I were to draw 10,000 samples of size 25 (with replacement) from our population of 7,497 stocks, this is the result:

    . summ

    Variable |     Obs        Mean   Std. Dev.       Min        Max
    ---------+-----------------------------------------------------
    chgprice |   10000     12.2264   5.588577    -4.1552    45.1392

This is the overall average of 10,000 sample means.
We got 12.2264, which is very close to µ = 12.27294.

x̄ is considered an unbiased estimator of µ when it comes from a random sample. If your samples are not random, this relationship will not hold. For our sample of 25 stocks, the mean of the sample is 17.3368 and the mean of the population is 12.27294.

The theoretical standard deviation of all possible x̄'s from all possible samples is σ/√n, where σ is the standard deviation of the population. In our data here, σ is 28.133, so the theoretical standard deviation for a distribution of means from samples of size 25 should be 28.133/√25 = 5.6266.

We can check whether this holds true by examining the results of a simulation (you will do this in Lab #3; see the handout). From the simulation output for the 10,000 samples, the standard deviation of the 10,000 sample means (from samples of size 25) is 5.58857, again very close to what we would expect in theory.

So note:

A sample has a mean x̄ and a standard deviation s.
A population has a mean µ and a standard deviation σ.
A sampling distribution (a distribution of all possible sample statistics, in this case means) also has mean µ, but its standard deviation is σ/√n.

Your sample is just one realization of all possible samples from a population.

The standard deviation σ/√n of the SAMPLE MEAN will be smaller than the standard deviation for individual measurements (as in Chapter 1.3). In other words, it is easier to predict the mean of many observations than it is to predict the value of a single observation (or to predict the average of small samples). What is causing this? Examine the formula for the standard deviation of the sampling distribution and note the effect of sample size on the standard deviation.

Some things to consider

How close is x̄ to µ? In other words, how accurate will our guesses be?
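The two checks described in this section (the mean of the sample means is close to µ, and their standard deviation is close to σ/√n) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real stock data are not reproduced here, so `population` is a hypothetical, right-skewed stand-in of the same size, and the helper name `simulate_sample_means` is made up for this sketch. It is not the Lab #3 procedure.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# ASSUMPTION: a hypothetical right-skewed population of 7,497 values.
# This is a stand-in, NOT the actual chgprice data from the lecture.
population = [random.expovariate(1 / 30) - 18 for _ in range(7497)]
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation


def simulate_sample_means(pop, n, reps):
    """Return the means of `reps` samples of size n, drawn with replacement."""
    return [statistics.fmean(random.choices(pop, k=n)) for _ in range(reps)]


n, reps = 25, 10_000
means = simulate_sample_means(population, n, reps)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_sd = sigma / math.sqrt(n)

print(f"population mean      {mu:.4f}   mean of sample means {mean_of_means:.4f}")
print(f"theoretical sigma/sqrt(n) {theoretical_sd:.4f}   SD of sample means {sd_of_means:.4f}")

# Effect of sample size on the standard deviation of the sample mean.
for size in (4, 25, 100, 400):
    print(f"n = {size:3d}: sigma/sqrt(n) = {sigma / math.sqrt(size):.4f}")
```

The loop at the end prints σ/√n for a few sample sizes, which makes the shrinking spread of x̄ with larger n easy to see.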
To judge how close x̄ is likely to be to µ, you will need to know the standard deviation of the population σ and the sample size n. Note how the standard deviation of the sampling distribution changes with sample size: for big samples the standard deviation of the sample mean will be small, and for small samples it will be large.

3. The Central Limit Theorem and the Normal Distribution

Given a simple random sample of size n from a population having mean µ and standard deviation σ, the sample mean x̄ will come from a sampling distribution of means with mean µ and standard deviation σ/√n.

A. Basic Distributional Result

If the original population has a normal distribution, then the sample mean will also be normally distributed. This is good, because it means we can use the normal table (Table A) to make inferences with a statement of probability or chance.

Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. A sample of 25 persons is drawn.

How likely is it to get a sample average of 108 or more? Here z = (108 − 100)/(15/√25) = 2.67, so the chance is about 0.38%.
How likely is it for the first score to be 108 or more? Here z = (108 − 100)/15 = 0.53, so the chance is about 29.8%.

B. The Central Limit Theorem (p. 401)

No matter what the distribution of the original population, if the sample size is "large" (your textbook believes that samples greater than or equal to 15 are large), the distribution of the possible sample means will be close to the normal distribution. It is a very powerful theorem, and it is the reason why the normal distribution is so well studied.

C. Summary

Take a simple random sample from a population with mean µ and standard deviation σ. Let x̄ be the average of the sampled values. If either the original population is normally distributed OR the sample size n is sufficiently large, then x̄ will be (at least approximately) normally distributed with expected value µ and standard deviation σ/√n.
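The IQ example in part A can be checked without a table. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are chosen for illustration only. Because it does not round z to two decimals the way a Table A look-up does, the single-score probability may differ from the table-based figure in the last digit.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 25

# Distribution of one individual IQ score.
single_score = NormalDist(mu, sigma)

# Sampling distribution of the mean of 25 scores: same mean, SD = sigma / sqrt(n).
sample_mean = NormalDist(mu, sigma / sqrt(n))

# Upper-tail probabilities: chance of 108 or more.
p_mean = 1 - sample_mean.cdf(108)
p_single = 1 - single_score.cdf(108)

print(f"P(sample mean >= 108)  = {p_mean:.4f}")
print(f"P(single score >= 108) = {p_single:.4f}")
```

The same value of 108 is unremarkable for one person but very unusual for an average of 25 people, which is the practical meaning of the smaller standard deviation σ/√n.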
If the histogram for the population follows a normal curve, or if the sample size is large enough each time, then the histogram for the possible values of x̄ will follow a normal curve that has a mean of µ and a standard deviation of σ/√n. Thus, about 68% of the x̄'s will be within one standard deviation of µ, about 95% will be within two standard deviations, and about 99.7% will be within three standard deviations.

Let's go back to our first sample of 25 with its mean of 17.3368. The chance of getting a mean that large or larger is found as follows: first calculate

    Z = (17.3368 − 12.27294)/(28.133/√25) = 5.06386/5.6266, or about .90

then look up .90 in Table A to get .8159, and then take 1 − .8159 to get .1841. So the chance of drawing a sample of size 25 with an average of 17.33 or higher was a little over 18%.

NOTE: The Central Limit Theorem only applies to the distribution of possible sample averages (i.e. the sampling distribution). It says nothing about the distribution of individual scores in either the sample or the population.

D. Example

You are interested in valuing an e-brokerage for your employers. The owners claim clients have an average brokerage account of $19,000 with a standard deviation of $10,000 (clearly not normal). They allow you to draw a random sample of 150 clients from their database, and you get a sample average of $16,500. If the claim is truthful, how likely is it to get a sample average of $16,500 or less?

Well, the standard deviation of all sample means is 10000/√150 = 816.497 dollars, so a single sample mean of $16,500 or less has z = (16500 − 19000)/816.497 = −3.06. The chance of that is about 0.11%, or something like 1 in 1,000 samples.

Things to note

The e-brokerage clients' accounts need not be normally distributed, given the Central Limit Theorem.
The CLT also lets you use normal calculations to figure out the chance of getting an average of $16,500 if the claim is an average of $19,000.
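Both worked calculations above (the stock sample and the e-brokerage claim) follow the same recipe: standardize the sample mean using σ/√n, then read off a normal tail area. A minimal sketch of that recipe, using Python's standard-library `statistics.NormalDist` in place of Table A, is given here; the helper name `z_for_mean` is hypothetical and introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_for_mean(xbar, mu, sigma, n):
    """z-score of a sample mean under a sampling distribution with SD sigma/sqrt(n)."""
    return (xbar - mu) / (sigma / sqrt(n))


# Stock sample: chance of a sample mean of 17.3368 or larger (upper tail).
z_stock = z_for_mean(17.3368, 12.27294, 28.133, 25)
p_stock = 1 - Z.cdf(z_stock)

# E-brokerage: chance of a sample mean of $16,500 or less (lower tail),
# assuming the owners' claim (mean $19,000, SD $10,000) is true.
z_broker = z_for_mean(16_500, 19_000, 10_000, 150)
p_broker = Z.cdf(z_broker)

print(f"stocks:    z = {z_stock:.2f}, upper-tail chance = {p_stock:.4f}")
print(f"brokerage: z = {z_broker:.2f}, lower-tail chance = {p_broker:.4f}")
```

Which tail to use depends on the question: "that large or larger" is an upper tail (1 minus the table value), while "$16,500 or less" is a lower tail (the table value itself).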