Probability and Statistics
LECTURE 6: SAMPLING DISTRIBUTIONS

Outline
• The importance of sampling distribution
• Repeated sampling
• Sampling distribution of sample mean
• Using simulation to understand sampling distribution
• Central limit theorem

Adapted from http://www.prenhall.com/mcclave

Statistical Methods

Importance of sampling distribution
• The basis of statistical inference
• Basis for understanding hypothesis testing, estimation, etc.

Repeated sampling
Example
• We wish to estimate the population mean
• Select a random sample
• Find the sample mean (e.g. $\bar{x} = 20$) and use it as an estimate
• If other people select different samples and find markedly different sample means, would we trust our estimate?

Repeated sampling
• The same problem, but now everyone else selects different samples and their results are close to our result
• The sampling distribution of the sample mean gives an idea of how sample means vary between samples
• The sample mean is just one particular sample statistic

Example of sampling distribution
• Given a population of salaries of 5 employees: 2, 5, 7, 8, 10 (in hundred dollars/month)
• Imagine the population mean is unknown; we wish to estimate the population mean salary
• We select a random sample of 3 salaries

Example of sampling distribution
• Denote the mean of the random sample by $\bar{x}$
• Before sample selection: does $\bar{x}$ represent a fixed value or a random variable?

Example of sampling distribution
• If $\bar{x}$ represents a variable that can change in value, how many possible values can it take?
• What is the probability of each value?
• What if we use a sample size of n = 4?

Sampling distribution of sample mean
• The probability distribution of all of the possible values of the sample mean for a given size sample selected from a population
• What if we change the sample size?

Example of sampling distribution of variance

Questions
• Is there a sampling distribution of the median?
• Is there a sampling distribution of the variance?
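The salary example is small enough to enumerate completely. The sketch below lists every possible sample of 3 salaries and tabulates the sampling distribution of the mean, the median, and the variance. It assumes sampling without replacement with all samples equally likely (the slides do not say which scheme is intended), and the helper names `sampling_distribution`, `sample_variance`, and `expected_value` are illustrative, not part of the lecture.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from statistics import median

population = [2, 5, 7, 8, 10]  # salaries in hundred dollars/month


def mean(sample):
    """Exact sample mean as a fraction (avoids float rounding when grouping)."""
    return Fraction(sum(sample), len(sample))


def sample_variance(sample):
    """Exact sample variance with the usual n - 1 divisor."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


def sampling_distribution(pop, n, statistic):
    """Map each possible value of `statistic` to its probability.

    Assumption: samples are drawn without replacement and every
    combination of n items is equally likely.
    """
    samples = list(combinations(pop, n))
    counts = Counter(statistic(s) for s in samples)
    return {v: Fraction(c, len(samples)) for v, c in sorted(counts.items())}


def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())


mean_dist = sampling_distribution(population, 3, mean)
median_dist = sampling_distribution(population, 3, median)
variance_dist = sampling_distribution(population, 3, sample_variance)

for value, prob in mean_dist.items():
    print(f"sample mean = {float(value):.3f}   probability = {prob}")
print("mean of all sample means =", float(expected_value(mean_dist)))
print("population mean          =", sum(population) / len(population))
```

Under these assumptions there are 10 possible samples, and the mean of all sample means equals the population mean of 6.4. Calling `sampling_distribution(population, 4, mean)` answers the n = 4 question in the same way.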
In general
• A sampling distribution is a probability distribution of all of the possible values of a sample statistic for a given size sample selected from a population

Activity: exploring sampling distributions via simulation
• Use the applet on the webpage: http://www.rossmanchance.com/applets/OneSample.html
• 1st population: math scores of 15892 high school students
• Let's observe the histogram, the mean, and the SD of the population

Activity
• Now we will develop the sampling distribution of sample means (for example, by selecting 10000 samples or more) for n = 2, 10, 30, 100

Observations
Let's write down our observations:
• There are many sampling distributions (one for each n)
• Shape of the sampling distribution
• Mean of the sampling distribution (compare it with the mean of the population)
• SD of the sampling distribution (compare it with the SD of the population)
• The difference between the sampling distribution and the population

Activity
• Now let's choose a different population (a non-normal population) provided by the website
• Repeat what we have done
• Write down our observations
• When does the sampling distribution become approximately normal?

Activity
• Clearly distinguish between population and sampling distribution
• Homework: experiment with other populations on the website to deepen your understanding of sampling distributions
• Question: is there a sampling distribution of another statistic?
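The applet activity above can also be approximated offline. The sketch below is a rough stand-in, not the applet itself: it builds a hypothetical right-skewed population (an exponential distribution with mean 50, chosen only for illustration), draws 2000 random samples for each n, and compares the mean and SD of the sample means with the population values. The population, the seed, the number of repetitions, and the function name `simulate_sample_means` are all assumptions.

```python
import random
import statistics

rng = random.Random(6)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population (a stand-in, not the applet's data).
population = [rng.expovariate(1 / 50) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def simulate_sample_means(pop, n, repetitions=2_000):
    """Draw `repetitions` random samples of size n and return their means."""
    return [statistics.fmean(rng.sample(pop, n)) for _ in range(repetitions)]


results = {}
print(f"population: mean = {pop_mean:.2f}, SD = {pop_sd:.2f}")
for n in (2, 10, 30, 100):
    means = simulate_sample_means(population, n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    mean_of_means, sd_of_means = results[n]
    print(
        f"n = {n:3d}: mean of sample means = {mean_of_means:6.2f}, "
        f"SD of sample means = {sd_of_means:5.2f}, "
        f"population SD / sqrt(n) = {pop_sd / n ** 0.5:5.2f}"
    )
```

In a run like this the mean of the sample means should stay close to the population mean for every n, while the SD of the sample means shrinks roughly like the population SD divided by the square root of n. Plotting a histogram of `means` for each n would show the shape becoming more symmetric as n grows.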
Theorem I
• If a random sample is selected from a normal population, the sampling distribution of the sample mean is normal
• Demonstrated by the applet with the population of math scores

Theorem II: Central Limit Theorem
• If a random sample is selected from a non-normal population, the sampling distribution of the sample mean is approximately normal for large sample sizes
• Demonstrated by the applet with a skewed population

Theorem II: Central Limit Theorem
Practical guideline:
• If the population is nearly normal, then a sample of size n = 5 will probably be large enough to ensure that the sampling distribution of $\bar{x}$ is approximately normal.
• If the population is symmetric, then a sample of size n = 20 to 25 is enough for the Central Limit Theorem (CLT) to hold.
• For most moderately skewed distributions, a sample size of around 30 is traditionally thought to be sufficiently large for the CLT to hold. This is a rule of thumb, not a definitive number.
• For very skewed distributions or distributions with outliers, the sample size required for the CLT to hold may be much larger than 30.

Properties of sampling distribution of mean
• The relationship between the mean of the population and the mean of all sample means: $\mu_{\bar{x}} = \mu$
• The relationship between the SD of the population and the SD of all sample means: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

Sampling error
• Difference between a sample statistic and the parameter
• Important when making inference about the population

Standard error of mean
• The SD of the sample means
• Represents (approx.) the average deviation of sample means from the center
• The center = population mean
• Represents (approx.) the average error when using the sample mean to estimate the population mean
• Called the standard error of the mean: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ (if n/N ≤ 0.05)

Finite population correction factor
• In cases where n/N > 0.05, the standard error of the mean is: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

Finding probability of sample mean
• First, check that the sampling distribution of the sample mean is normal or nearly so
• If so, convert to Z to find the probability: $Z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Exercise 1
You're an operations analyst for AT&T. Long-distance telephone calls are normally distributed with $\mu = 8$ min. and $\sigma = 2$ min. If you select a random sample of 25 calls, what is the probability that the sample mean would be between 7.8 and 8.2 minutes?

Exercise 2
You're an operations analyst for company A. The distribution of long-distance telephone calls is symmetric but non-normal with $\mu = 8$ min. and $\sigma = 2$ min. If you select a random sample of 30 calls, what is the probability that the sample mean lies between 7.8 and 8.2 minutes?

Conclusion
• The importance of sampling distribution
• Repeated sampling
• Sampling distribution of sample mean
• Using simulation to understand sampling distribution
• Central limit theorem
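As a check on Exercises 1 and 2, the standard error and Z-conversion steps can be sketched in a few lines. This is one possible way to carry out the calculation, not the solution from the original slides. The function names `standard_error` and `prob_mean_between` are illustrative. For Exercise 2 the normal model is only an approximation, justified by the practical guideline for symmetric populations with n = 30.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n, population_size=None):
    """Standard error of the mean, sigma / sqrt(n).

    If a population size N is given and n/N > 0.05, the finite
    population correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / sqrt(n)
    if population_size is not None and n / population_size > 0.05:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


def prob_mean_between(low, high, mu, sigma, n):
    """P(low < sample mean < high), assuming the sampling distribution
    of the sample mean is normal (or nearly so)."""
    z = NormalDist()  # standard normal
    se = standard_error(sigma, n)
    z_low = (low - mu) / se
    z_high = (high - mu) / se
    return z.cdf(z_high) - z.cdf(z_low)


# Exercise 1: normal population, n = 25, so the sample mean is exactly normal.
p1 = prob_mean_between(7.8, 8.2, mu=8, sigma=2, n=25)

# Exercise 2: symmetric non-normal population, n = 30, normal approximation.
p2 = prob_mean_between(7.8, 8.2, mu=8, sigma=2, n=30)

print(f"Exercise 1: SE = {standard_error(2, 25):.3f}, P = {p1:.4f}")
print(f"Exercise 2: SE = {standard_error(2, 30):.3f}, P = {p2:.4f}")
```

For Exercise 1 the standard error is 0.4, the limits convert to Z = ±0.5, and the probability is about 0.38. For Exercise 2 the larger sample gives a slightly smaller standard error and a probability of about 0.42.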