Commentary on Distribution of Sample Means Class Notes

First & foremost: read your text! Reading Chapter 7 will make all of your notes & commentary make much better sense. Just as a refresher, the first couple of rows in your Class Notes contain material already covered, but helpful as we move ahead & tackle the subject of sample means.

Before we start performing hypothesis testing (inferential statistics, making statements about populations based upon sample data), we are kicking it up a notch & moving closer to obtaining the most accurate data necessary for making those statements. So far, the original data we have been working with since the beginning has been individual values or scores. First, we worked with basic probabilities to learn not just how known probabilities are obtained, but to discover how others use information & probabilities to predict the unknown. In our last assignment, we learned to convert raw scores to standard scores for the following reasons:

1) Converting raw scores to standard scores, or z-scores, gives every data set the same mean, or average (0), & the same standard deviation (1.00). The standard deviation is the average distance of all the scores from the mean.
2) Converting raw scores to standard scores allows us to compare dissimilar data sets or values, & allows us to identify exactly where on the distribution each score lies.

Then we brought in the Unit Normal Table to combine probability & standard scores to, again, make predictions of where on a distribution certain scores may fall. Do you see the pattern (hint: each step ends with making predictions)? The next step now is to work with larger data sets so that we can make even more informed predictions about a population based on sample data. This is where sample means come into play.
As we are making statements about population data using sample means, it is assumed that the values are equal to population data (review the Central Limit Theorem). The two most important descriptive values we use in statistics are the mean & standard deviation, so this is where we begin. I will be using a realistic example, but to keep the display simple, I am reducing the number of scores just so you can “see” the concept of obtaining sample means.

I want to evaluate the personality scores, using the MMPI, of schizophrenics who are in in-patient facilities throughout the state. Instead of gathering all of the personality scores in one pool, I gather the scores from each facility in the state. (Hypothetically) there are 5 in-patient facilities, each currently with 5 schizophrenic patients to be tested. Their results are as follows (N = 5 facilities, n = 5 patients per facility):

Facility      MMPI scores (n = 5)        ∑X     Mean (M = ∑X/n)
Facility 1    45, 50, 66, 79, 100        340    M₁ = 340/5 = 68
Facility 2    33, 38, 44, 59, 65         239    M₂ = 239/5 = 47.8
Facility 3    55, 63, 72, 90, 112        392    M₃ = 392/5 = 78.4
Facility 4    58, 62, 65, 88, 90         363    M₄ = 363/5 = 72.6
Facility 5    40, 60, 87, 100, 120       407    M₅ = 407/5 = 81.4

Mean of the sample means: µM = ∑M/N = (68 + 47.8 + 78.4 + 72.6 + 81.4)/5 = 348.2/5 = 69.64

Notice that the formula used for computing a regular sample mean is exactly the same as the one we use to compute the mean of the sample means (i.e., the average of all the sample means). What is different is the notation. Since we are now working with a set of means instead of individual scores, we need new symbols & formulas representing sample means instead of regular single X-values if we are to calculate the probability of obtaining certain population values. Each sample mean is represented as “M.” “N” represents the number of sample means that you are working with.
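As a quick check on the arithmetic in the facility example, the two-stage averaging can be sketched in Python. The scores are the ones from the example; the names `facilities`, `mean`, `sample_means` and `mean_of_means` are illustrative choices, not part of the class notes.

```python
# MMPI scores from the worked example, one list per facility (n = 5 each).
facilities = {
    "Facility 1": [45, 50, 66, 79, 100],
    "Facility 2": [33, 38, 44, 59, 65],
    "Facility 3": [55, 63, 72, 90, 112],
    "Facility 4": [58, 62, 65, 88, 90],
    "Facility 5": [40, 60, 87, 100, 120],
}


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Step 1: one sample mean (M) per facility.
sample_means = {name: mean(scores) for name, scores in facilities.items()}

# Step 2: the same formula applied to the N = 5 sample means.
mean_of_means = mean(list(sample_means.values()))

for name, m in sample_means.items():
    print(f"{name}: M = {m:.1f}")
print(f"Mean of the sample means = {mean_of_means:.2f}")
```

The same `mean` function is used at both stages, which mirrors the point that only the notation changes, not the formula.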
Remember that “N” represents population numbers where ‘n’ represents sample numbers. Since our means are pulled from the full population that we are working with, we use the capital ‘N’ as the value representing the number of sample means. Above, we obtained 5 sample means, so our ‘N’ value is 5. (Also, each sample set had 5 values in it, so each sample set from each facility would be n = 5.)

Now we have what is called the “mean of the sample means,” since we took each sample mean & then used the mean formula with changes in our symbols (∑M/N) to obtain the “mean” of the sample means. This is also called the expected value of M.

To obtain the standard deviation of our new set of sample means, we also see modifications in formulas & symbols. When we calculate the standard deviation of sample means, it is called the standard error of M. First, we compute the population standard deviation (a formula you already know), since our data represents the full population. But we add an extra step. After we calculate the population standard deviation, we divide the result by the square root of N to obtain our standard error of M, or the standard distance of all sample means from the center (the average; the mean). Review your class notes, as well as the table at the end of your class notes, for the new symbols & formulas associated with sample means. Using the example above, the calculation looks like this:

M        M²
68       4624
47.8     2284.84
78.4     6146.56
72.6     5270.76
81.4     6625.96

∑M = 348.2    ∑M² = 24952.12    (∑M)² = 121243.24

SS = ∑M² − (∑M)²/N = 24952.12 − 121243.24/5 = 24952.12 − 24248.648 = 703.472
σ = √(SS/N) = √(703.472/5) = √140.6944 = 11.861467

We have our population standard deviation value = 11.861467. Since we are working with sample means & not regular X-values, we need to add an extra step. We need to divide our SD value by the square root of N.
11.861467/√5 = 11.861467/2.2360679 = 5.304609… Our standard error of M, or σM, = 5.304609…

Since our sample sets have different means & different standard deviations, our next step is to convert these values to standard scores. The steps for converting to z-scores are the same as last week, but the symbols have changed since we are now working with sample means. We now have the values we need for converting our sample means to standard scores, or z-scores, using the formula ZM = (M − µM)/σM:

M        ZM = (M − µM)/σM
68       (68 − 69.64)/5.304609 = −1.64/5.304609 = −0.309165
47.8     (47.8 − 69.64)/5.304609 = −21.84/5.304609 = −4.117174
78.4     (78.4 − 69.64)/5.304609 = 8.76/5.304609 = 1.651394
72.6     (72.6 − 69.64)/5.304609 = 2.96/5.304609 = 0.558005
81.4     (81.4 − 69.64)/5.304609 = 11.76/5.304609 = 2.216940

Before moving on with our next step, let’s look at the z-scores listed above in comparison to our standard curve. (Distributing each sample mean from a population will result in a normal or near-normal curve due to the rules regarding the distribution of sample means. This is also represented in figure 7.7 on page 215 of the 9th edition.)

More than +/- 2.00 standard deviations from the mean would be considered significant. All of our z-scores above fall within the average range with the exception of two. Our M = 81.4 corresponds to a z-score of 2.216940, just slightly past the +2.00 cutoff on the positive side. This mean comes from Facility 5. If you add up the percentages under the normal curve beyond +2.216940, we have 1.7 + .05 + .01 = 1.76%, meaning that there is a 1.76% chance of obtaining a mean of 81.4, which is rare. But we also have an outlier. Facility 2 reports a mean of 47.8, which has a standard score of −4.117174. On the distribution, the probability of obtaining a mean score of 47.8 is less than 0.1%, far beyond what would be considered normal or average. Do you see how all of this is coming together now?
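The standard error and z-score steps can be reproduced with a short Python sketch. It follows the procedure as laid out in this commentary (population SD of the five sample means, then divided by √N); the variable names are illustrative assumptions rather than anything defined in the class notes.

```python
import math

# The five sample means from the facility example.
sample_means = [68.0, 47.8, 78.4, 72.6, 81.4]
N = len(sample_means)

# Mean of the sample means (expected value of M).
mu_M = sum(sample_means) / N

# Sum of squares via the computational formula: SS = sum(M^2) - (sum(M))^2 / N.
ss = sum(m ** 2 for m in sample_means) - sum(sample_means) ** 2 / N

# Population standard deviation of the sample means.
sd = math.sqrt(ss / N)

# Extra step used in this commentary: divide by the square root of N.
se = sd / math.sqrt(N)

# z-score for each sample mean: (M - mu_M) / standard error.
z_scores = [(m - mu_M) / se for m in sample_means]

print(f"SD = {sd:.6f}, standard error of M = {se:.6f}")
for m, z in zip(sample_means, z_scores):
    print(f"M = {m:5.1f}  z = {z:+.6f}")
```

Carrying full precision through every step, rather than rounding along the way, keeps the later z-scores from drifting.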
Keep in mind that the values we have used in the example above are minimal in comparison to a more valid study, where far greater masses of data are evaluated. The law of large numbers indicates that the larger the samples used, the less variability there will be between sample & population data (makes sense, right?). As the n value (the number of scores in each sample) reaches 30 or more, the distribution of sample means is considered normal. This does not mean that outliers will not present themselves, but it does mean that the probability of outliers will be reduced. The mass of the values will gather around the mean & then thin out as they move away from the mean (as you can see in the normal curve).

We use sample means data to make predictions. So far, our steps involve:

1. Obtain sample values from the population to be studied.
2. Calculate the mean from each sample set.
3. Obtain the mean of the sample means (expected value of M).
4. Calculate the standard error of M (the standard distance of all sample means from the mean of the sample means).
5. Convert sample means to standard scores (since each sample set has a different mean & SD, we must convert scores so that they have the same mean & same SD to compare these values).
6. These values are assumed to be equal to population data. Obtaining the requisite number of values allows us to make predictions based upon the normal curve.

This is where the last part of each one of your assignment problems comes in. You already worked on using standard scores, or z-scores, to obtain probabilities in the previous assignment. So, using the values in your assignment, you will again be asked to obtain the probability of obtaining a certain score. Using our example above: “What is the probability of obtaining a sample mean of 78.4 or greater?” We already calculated the z-score of 1.651394 for this particular sample mean. So we go to our z-score table in the back of our text & locate the z-score of 1.65.
The z-score is positive & the question uses the term “or greater,” which means we look to the right of the score. The area to the right of a positive value on the distribution is less than 50% of the distribution. So this tells us that we are looking for the proportion in the tail, which in our table gives us .0495, or 4.95%. This tells us that there is a 4.95% chance of obtaining a sample mean of 78.4 or greater.

Since the values are assumed to be equal to a population, we can still make a prediction based upon a value that is related to our subject matter (in this case, personality scores) even though the value may not be in our original sample mean list. For instance, what if we wanted to determine the probability of obtaining a sample mean of 75 or less? 75 is not in our sample mean set. All we would need to do in this case is convert our sample mean M = 75 to a standard score by plugging it into the z-score formula:

(75 − 69.64)/5.304609 = 5.36/5.304609 = 1.010442

We find the z-score 1.01 in our Unit Normal Table. This value falls on the positive side of the distribution. Since the question asks “or less,” we know we are looking to the left of the value. Therefore, we are working with all of the negative side plus 1.01 SD on the positive side, giving us more than 50% of the distribution covered. Thus, we want the proportion in the body. This gives us a proportion of .8438. So there is an 84.38% chance that we will obtain a sample mean value of 75 or less. It is this logic that scientists, educators, politicians, sports organizations, etc. use to make predictions about the future.

Notice how often we were working with values in between whole numbers. Our sample means involved values to the right of the decimal. Our expected value of M was a decimal value. Our standard error of M is a decimal value. All of our z-scores are decimal values.
If you do not carry the values to the right of the decimal all the way through, it throws off subsequent calculations to the point where your outcome will be slightly off, or even wrong. So mind your decimals, please. And remember we are working with values in between whole numbers!!

The rules regarding the distribution of sample means are listed in your class notes. First, see the ‘application’ section in the row for the distribution of sample means.

1. The anchor, or cornerstone, of the concept of the distribution of sample means is the Central Limit Theorem. It provides a precise description of the distribution that would be obtained if you selected every possible sample, calculated every sample mean, & constructed the distribution of the sample means. For any population with mean µ & standard deviation σ, the distribution of sample means for sample size n will have a mean of µ & a standard deviation of σ/√n, & will approach a normal distribution as n approaches infinity. The value of the central limit theorem comes from 2 different facts:
   1. It describes the distribution of sample means for any population, no matter what shape, mean or standard deviation.
   2. The distribution of sample means approaches a normal distribution very rapidly. So, by the time n = 30, the distribution is almost perfectly normal.
   The Central Limit Theorem addresses the 3 basic characteristics that describe any distribution: shape, central tendency & variability.

2. The next important rule is the shape of the distribution of sample means. It will be almost perfectly normal if either of the following two conditions is satisfied:
   1. The population from which the samples are selected is a normal distribution.
   2. The number of scores in each sample is relatively large, around 30 or more. However, increasing the sample size beyond 30 does not produce much additional improvement in how well the sample represents the population.

3. Third, there is the law of large numbers. It states that the larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.
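The central limit theorem's σ/√n rule and the law of large numbers can be checked with a small simulation. This is only a sketch under stated assumptions: the population is an exponential distribution with σ = 1 (chosen because it is clearly not normal), the seed and the number of replications are arbitrary, and the function name `sd_of_sample_means` is made up for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def sd_of_sample_means(n, replications=5000):
    """Draw many samples of size n and return the SD of their sample means."""
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replications)
    ]
    grand_mean = sum(means) / replications
    variance = sum((m - grand_mean) ** 2 for m in means) / replications
    return math.sqrt(variance)


sigma = 1.0  # an exponential population with rate 1 has standard deviation 1

# Compare the simulated spread of sample means with sigma / sqrt(n).
results = {n: sd_of_sample_means(n) for n in (1, 4, 25)}
for n, observed in results.items():
    predicted = sigma / math.sqrt(n)
    print(f"n = {n:2d}: observed SD of means = {observed:.3f}, "
          f"sigma/sqrt(n) = {predicted:.3f}")
```

With n = 1 the spread of the "sample means" matches the population standard deviation, and it shrinks as n grows, so larger samples land closer to the population mean.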
There is an inverse relationship between sample size & standard error. Bigger samples = smaller error; smaller samples = bigger error. Because of this rule, if you have n = 1, then the standard error & standard deviation are the same (σM = σ). So, the standard error formula (σM = σ/√n) satisfies the following 2 requirements: 1) As sample size (n) increases (↑), standard error decreases (↓). 2) When the sample consists of a single score (n = 1), the standard error is the same as the standard deviation (σM = σ).

Changes from then to now:

1) Distribution of Sample Means. When engaging in calculations, so far your X-value has represented a single score or a single value in a data set. For the distribution of sample means, your X-value now becomes a mean from a sample pulled from the population. To distinguish a single score (X) from a sample mean (M), we change that symbol to “M” as indicated. However, because the symbol “X” is used so flexibly, do not be surprised if you still see “X” when referring to a sample mean. Just make sure that you review the material surrounding this value so you know whether you are working with a single score or a sample mean.

2) You will start to see “subscripts” here. So far, we have been working with “superscripts,” which are the small values listed above & next to the value. For example, when we square a value, it is represented as 8². The small “2” at the top tells us that we are multiplying the value by itself. If there were a “3” in that place, that would tell us to use the value as a factor 3 times (8 × 8 × 8), etc. Subscripts are listed lower & next to the value instead of upper & next. Subscripts are most often used to represent a particular group the value belongs to. They serve no mathematical purpose. For example, if you see M₁ = 131, that tells you that 131 is the sample mean belonging to sample group 1.
M₂ = 119 tells you that 119 is the sample mean belonging to sample group 2, M₃ = 138 is the sample mean associated with sample group 3, etc.

3) Your next step would then be to calculate the mean of this group of sample means (which you would calculate exactly as you have with descriptive data): MM = ∑M / N. Notice the new symbol representing the ‘mean of the sample means’ (also called the ‘expected value of M’): MM, an M with a subscript M. As this value would be the same as if you were to calculate the mean from every value in the full population, the ‘mean of the sample means’ is also represented as a population mean: µM. Although we are using the new symbol µM to represent the mean of the sample means, due to the central limit theorem we can assume that µM = µ, the population mean.

4) Your next step is the standard error of M, which is the standard deviation of the distribution of sample means. You still need to first calculate the population standard deviation (population instead of sample, since our group of sample means is representative of the full population), but then you add an extra step. You divide σ (population SD) by √N. Keep in mind that in this case, N represents the number of sample means you are working with, not the total count of every number in your data set. See your class notes for an example of calculating the mean of the sample means & then the standard error of M.

5) Look at the change in your z-score formula in your class notes. Now, your numerator is the sample mean minus the population mean (or the mean of the sample means), & it is divided by the standard error of M, or the standard deviation of the sample means.
Since the mean of the sample means would be the same as the population mean, you will often see these symbols used interchangeably at this point (µM = µ):

ZM = (M − µ) / σM

6) Now that we have the z-scores associated with our set of sample means & we are working with a normal curve, we can follow the same process for obtaining probabilities as we did when we were working with a set of single scores. Then, we were making probability statements based upon a sample set. This time, we are making probability statements regarding the full population. Review your notes & previous assignments regarding obtaining probabilities & using the Unit Normal Table.
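The Unit Normal Table lookups for the worked examples in this commentary (a sample mean of 78.4 or greater, and a sample mean of 75 or less) can be approximated in Python. This is a minimal sketch: `normal_cdf` is a hypothetical helper built on the standard library's `math.erf`, and `mu_M` and `sigma_M` are the values computed in the facility example.

```python
import math


def normal_cdf(z):
    """Proportion of the standard normal curve at or below z (the 'body' for z > 0)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values taken from the worked example.
mu_M = 69.64        # mean of the sample means (expected value of M)
sigma_M = 5.304609  # standard error of M


def z_for_mean(m):
    """Convert a sample mean to a z-score: (M - mu_M) / sigma_M."""
    return (m - mu_M) / sigma_M


# "78.4 or greater": proportion in the tail to the right of z.
p_greater = 1.0 - normal_cdf(z_for_mean(78.4))

# "75 or less": proportion in the body to the left of z.
p_less = normal_cdf(z_for_mean(75))

print(f"P(M >= 78.4) is about {p_greater:.4f}")
print(f"P(M <= 75)   is about {p_less:.4f}")
```

The results land very close to the table values of .0495 and .8438. Small differences are expected because the table rounds z to two decimal places, while the code uses the unrounded z-score.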