Chapter 7 Probability and Samples: The Distribution of Sample Means

Samples and Sampling Error
The scores we have looked at thus far are z-scores and probabilities where the sample consists of a single score.
This chapter extends the concepts of z-scores and probability to cover situations with larger samples.
Ex: a z-score for an entire sample.

Z-scores (review)
A z-score describes exactly where a score is located in the distribution.
Ex: a z-score of +2.00 is extreme.

Figure 6.4 The normal distribution following a z-score transformation. Labels: central, representative sample; extreme sample.
Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the Wadsworth Group, a division of Thomson Learning.

Probability (review)
If the distribution of scores is normal, we can determine a probability value for each score.
A score at or beyond a z-score of +2.00 has a probability of only p = .0228.

Z-Scores
So far we have been limited to situations where the sample consists of a single score.
Most studies use larger samples.
We will now extend the concepts of z-scores and probability to cover situations with larger samples.
A z-score near zero indicates a central, representative sample.
A z-score beyond ±2.00 indicates an extreme sample.
It will be possible to determine exact probabilities for a sample.

Difficulties with using samples
Samples provide an incomplete picture of the population.
Any statistic computed from a sample will not be identical to the corresponding parameter for the entire population.
Ex: the mean IQ for a sample of 25 students will differ from the mean IQ of the whole population.
The difference is called sampling error.

Sampling Error
Sampling error is the discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.

Questions
How can you tell which sample is giving the best description of the population?
Can you predict how a sample will describe its population?
What is the probability of selecting a sample that has a certain sample mean?
We can answer these, but we need to set rules that relate samples to populations.

Distribution of Sample Means
Different samples give different results.
Even so, the huge set of possible samples forms a relatively simple, orderly, and predictable pattern, which makes it possible to predict the characteristics of a sample with some accuracy.

Distribution of Sample Means (cont.)
The ability to predict sample characteristics is based on the distribution of sample means.
The distribution of sample means is the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.

Distribution of Sample Means (cont.)
It is necessary to have all the possible values in order to compute probabilities.
If a set has 100 samples, the probability of obtaining any specific sample is 1 out of 100, or p = 1/100.
Before, we discussed only scores; now we are discussing statistics (sample means). Because statistics are obtained from samples, a distribution of statistics is referred to as a sampling distribution.

Sampling Distribution
A sampling distribution is a distribution of statistics obtained by selecting all the possible samples of a specific size from a population.
To construct the distribution of sample means: take a sample, compute its mean, replace the scores, and repeat until you have obtained every possible sample.
Look at Ex. 7.1: 4 scores, n = 2, 16 sample means. See the histogram on p. 147.

Sample Means
Note that the sample means tend to pile up around the population mean, $\mu = 5$.

Sample Means (cont.)
Samples are supposed to be representative of the population.
Therefore, the sample means tend to approximate the population mean.

Sample Means (cont.)
The distribution of sample means is approximately normal in shape.
We can use the distribution of sample means to answer probability questions about sample means.

Figure 7.1 Frequency distribution for a population of four scores: 2, 4, 6, 8
Table 7.1 The possible samples of n = 2 scores from the population in Figure 7.1

Ex: if you take a sample of n = 2 scores from the original population, what is the probability of obtaining a sample mean greater than 7? $P(\bar{X} > 7) = ?$
Because probability is equivalent to proportion, the probability question can be restated as follows: Of all the possible sample means, what proportion has values greater than 7?
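The proportion question can also be checked by brute force. The following is a minimal Python sketch, not part of the original slides, that lists every possible sample of n = 2 scores (sampling with replacement, as in Table 7.1) from the population 2, 4, 6, 8 and counts the sample means greater than 7. The variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]   # the four scores from Figure 7.1
n = 2                       # sample size

# Sampling with replacement: every ordered pair of scores is a possible sample.
samples = list(product(population, repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Probability = proportion of all possible sample means that exceed 7.
count_above_7 = sum(1 for m in sample_means if m > 7)
p_greater_than_7 = Fraction(count_above_7, len(sample_means))

# The mean of all the sample means should equal the population mean.
mean_of_means = sum(sample_means) / len(sample_means)

print(len(samples))        # 16
print(p_greater_than_7)    # 1/16
print(mean_of_means)       # 5
```

Only the sample (8, 8) has a mean above 7, since the samples (6, 8) and (8, 6) have a mean of exactly 7.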
In Figure 7.2, all the possible sample means are pictured, and only 1 of the 16 means has a value greater than 7.
Answer: 1 out of 16, or p = 1/16.

Figure 7.2 The distribution of sample means for n = 2

The Central Limit Theorem
It might not be possible to list all the samples and compute all the possible sample means: as n increases, the number of possible samples increases too.
Therefore, it is necessary to develop the general characteristics of the distribution of sample means that can be applied in any situation.
These characteristics are specified in the Central Limit Theorem, a cornerstone for much of inferential statistics.

Central Limit Theorem
For any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size n will have a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$, and will approach a normal distribution as n approaches infinity.

Central Limit Theorem
It describes the distribution of sample means for any population, no matter what shape, mean, or standard deviation.
The distribution of sample means "approaches" a normal distribution very rapidly.
It describes the distribution of sample means by identifying the three basic characteristics that describe any distribution: shape, central tendency, and variability.

Shape of the Distribution of Means
The distribution of sample means tends to be a normal distribution.
It is almost perfectly normal if either of the following holds:
The population from which the samples are selected is a normal distribution.
The number of scores (n) in each sample is relatively large, around 30 or more.

Mean of the Distribution of Means
The mean of the distribution of sample means is equal to $\mu$ (the population mean) and is called the expected value of $\bar{X}$.

Standard Error of $\bar{X}$
We have considered the shape and the central tendency of the distribution of sample means.
To completely describe this distribution, we need one more characteristic: variability.

Standard Error of $\bar{X}$
We will be working with the standard deviation of the distribution of sample means, called the standard error of $\bar{X}$.
The standard error defines the standard, or typical, distance from the mean.
Remember, a sample is not expected to provide a perfectly accurate reflection of its population.
There will be some error between the sample and the population.

Standard Error of $\bar{X}$
The standard error measures the standard amount of difference between $\bar{X}$ and $\mu$ due to chance.

Standard Error of $\bar{X}$
Standard error = $\sigma_{\bar{X}}$ = standard distance between $\bar{X}$ and $\mu$
The symbol $\sigma$ indicates that we are measuring a standard deviation, or a standard distance from the mean.
The subscript $\bar{X}$ indicates that we are measuring the standard deviation for a distribution of sample means.

Standard Error
The standard error is valuable because it specifies precisely how well a sample mean estimates its population mean.
It tells how much error you should expect, on average, when you use the sample mean as an estimate of the population mean.

Standard Error
Its magnitude is determined by two factors:
The size of the sample. The larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.
The standard deviation of the population from which the sample is selected.
standard error = $\sigma_{\bar{X}} = \sigma/\sqrt{n}$

Standard Error
When the sample size increases, the standard error decreases.
As n decreases, the error increases.

Probability and the Distribution of Sample Means
The primary use of the distribution of sample means is to find the probability associated with any specific sample.
Remember, probability is equivalent to proportion.
Because the distribution of sample means presents the entire set of all possible $\bar{X}$ values, we can use proportions of this distribution to determine probabilities.
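As a quick check on the standard error formula given above, here is a minimal Python sketch that evaluates $\sigma/\sqrt{n}$ for several sample sizes. The helper name `standard_error` and the value `sigma = 100` are illustrative assumptions, not taken from the slides at this point.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 100  # illustrative population standard deviation (an assumption)

# The standard error shrinks as the sample size grows.
for n in (1, 4, 25, 100):
    print(n, standard_error(sigma, n))
# 1 100.0
# 4 50.0
# 25 20.0
# 100 10.0
```

With n = 1 the standard error equals the population standard deviation, which matches the single-score case reviewed earlier. Quadrupling the sample size only halves the error.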
Example 7.2
Population of SAT scores: $\mu = 500$, $\sigma = 100$.
If you take a random sample of n = 25 students, what is the probability that the sample mean would be greater than $\bar{X} = 540$?

Restate the probability question as a proportion question:
Out of all the possible sample means, what proportion has values greater than 540?
"All the possible sample means" is the distribution of sample means.
The problem is to find a specific portion of this distribution.

What we know
The distribution is normal because the population of SAT scores is normal.
The distribution has a mean of 500 because the population mean is $\mu = 500$.
The distribution has a standard error of $\sigma_{\bar{X}} = 20$:
$\sigma_{\bar{X}} = \sigma/\sqrt{n} = 100/\sqrt{25} = 100/5 = 20$

Figure 7.3 A distribution of sample means

We are interested in sample means greater than 540, the shaded area.
Next, find the z-score value that defines the exact location of $\bar{X} = 540$.
The value of 540 is located above the mean by 40 points.
This is 2 standard deviations (in this case, 2 standard errors) above the mean.
The z-score for $\bar{X} = 540$ is $z = (\bar{X} - \mu)/\sigma_{\bar{X}} = (540 - 500)/20 = +2.00$.
Because this distribution of sample means is normal, you can use the unit normal table to find the probability associated with z = +2.00.
The table indicates that 0.0228 of the distribution is located in the tail beyond z = +2.00.
Conclusion: it is very unlikely, p = 0.0228 (2.28%), to obtain a random sample of n = 25 students with an average SAT score greater than 540.

Z-scores
It is possible to use a z-score to describe the position of any specific sample within the distribution of sample means.
The z-score tells exactly where a specific sample is located in relation to all the other possible samples that could have been obtained.

Figure 7.8 Showing standard error in a graph
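The steps of Example 7.2 can be reproduced numerically. This is a minimal Python sketch, not part of the original slides. It uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the unit normal table, and the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25   # population mean, population SD, sample size
sample_mean = 540             # the sample mean of interest

# Step 1: standard error of the sample mean.
standard_error = sigma / math.sqrt(n)

# Step 2: z-score locating the sample mean in the distribution of sample means.
z = (sample_mean - mu) / standard_error

# Step 3: proportion of the standard normal distribution in the tail beyond z.
p = 1 - NormalDist().cdf(z)

print(standard_error)   # 20.0
print(z)                # 2.0
print(round(p, 4))      # 0.0228
```

The result agrees with the table value of 0.0228 quoted in the example.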