Random Samples

A random sample is taken from a probability model that describes a population or process of (single) units.

Sample size: n (Population size is generally not a factor.)

Statistics (values obtained from a sample)
    Sample mean: $\bar{x}$
    Sample standard deviation: S

Parameters (values for an entire population or process)
    Population mean: $\mu$
    Population standard deviation: $\sigma$

Statistics are used to estimate parameters. Generally speaking, the sample mean is the best way to estimate the population mean.

We care about the distribution of sample means.
    (Super) Units: Samples of size n
    (Super) Variable: Mean (of sample)

Two types of random sampling

For both types the following is true: The probability of selecting a particular unit at any point is given by the probability model. If this model describes a population, then every unit in the population is equally likely to be chosen on any selection. (Keep in mind that the probability model describes how many units share any particular variable value.)

For a random sample drawn from a probability model describing a population or process (that could go on indefinitely): After sampling any particular set of units, the probabilities of the variable values for the next unit are exactly the same as they were for the first unit, and for every other unit. This means that random sampling is done with replacement.

For a simple random sample drawn from a probability model describing a finite population: After sampling any particular set of units, that set of units is not replaced, and subsequent units are chosen from the remaining unsampled units. This means a simple random sample is done without replacement. (With a simple random sample, every possible sample of n different units is equally likely to be the chosen sample.)

Distributional Results

For both types of sampling, the mean of the distribution of (all possible) sample means is equal to the mean for the probability model describing individual units.
$\mu_{\bar{X}} = \mu$

The formula for the standard deviation of the sample mean technically depends upon the type of sampling used. For a random sample (drawn with replacement) the formula shown works. This formula also works for almost all practical situations involving a simple random sample.

$\sigma_{\bar{X}} = \sigma / \sqrt{n}$

20 Times Rule (also known as the 5% rule)

If the population size is at least 20 times as large as the sample size n, the two types of sampling can be treated the same way. (Restated: If the sample size n is no more than 5% of the population size…) Consequently, the standard deviation of the sample mean is virtually the same as that shown above.

The two expressions above are generally appropriate to handle the distribution of sample means.

Normally distributed data

If the distribution of the variable over individual units in the population is Normal, and the sampling is random, then the sample mean has a Normal distribution.

Central Limit Theorem

For a variable that does not have a Normal distribution: As the sample size grows, the distribution of (all possible) sample means becomes closer and closer to a Normal distribution.

How large is large enough to use a Normal distribution to determine probabilities for a sample mean?
    If the variable is Normal, n = 1 is large enough. See above.
    For a variable with a symmetric distribution, n = 10 or so might be fine, unless the distribution is the sort that has some large outliers.
    For somewhat skewed distributions without large outliers, n = 30 is generally fine.
    For highly skewed distributions, or distributions that have outliers, rather large samples are required. These situations require expertise beyond the intro stats level.
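The distributional results above can be checked with a small simulation. The sketch below is illustrative only: the skewed population (2000 exponential values), the seed, the sample size n = 50, and the helper names `sample_means` and `share_within_one_sd` are all assumptions made for this example, not part of the original notes. It draws many samples both with and without replacement from a population 40 times larger than n, then compares the mean and standard deviation of the sample means with $\mu$ and $\sigma / \sqrt{n}$.

```python
import random
import statistics


def sample_means(population, n, reps, replace, rng):
    """Draw `reps` samples of size n and return the list of their means."""
    means = []
    for _ in range(reps):
        if replace:
            sample = rng.choices(population, k=n)  # random sample (with replacement)
        else:
            sample = rng.sample(population, n)     # simple random sample (without replacement)
        means.append(statistics.fmean(sample))
    return means


def share_within_one_sd(means, center, sd):
    """Fraction of sample means lying within one `sd` of `center`."""
    return sum(abs(m - center) <= sd for m in means) / len(means)


rng = random.Random(0)  # arbitrary seed, for repeatability only

# Hypothetical right-skewed population of N = 2000 units.
population = [rng.expovariate(1.0) for _ in range(2000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 50                                 # N / n = 40, so the 20 Times Rule applies
predicted_sd = sigma / n ** 0.5        # sigma / sqrt(n)

with_repl = sample_means(population, n, 5000, True, rng)
without_repl = sample_means(population, n, 5000, False, rng)

print("population mean:", mu)
print("mean of sample means (with replacement):", statistics.fmean(with_repl))
print("mean of sample means (without replacement):", statistics.fmean(without_repl))
print("sigma / sqrt(n):", predicted_sd)
print("SD of sample means (with replacement):", statistics.pstdev(with_repl))
print("SD of sample means (without replacement):", statistics.pstdev(without_repl))
print("share within one SD:", share_within_one_sd(with_repl, mu, predicted_sd))
```

If the results above hold, both means of sample means should land close to the population mean, both standard deviations should be close to $\sigma / \sqrt{n}$, and roughly 68% of the sample means should fall within one standard deviation of $\mu$, as a Normal distribution would predict, even though the individual values are strongly skewed.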