BIOL/STAT 335 Lab02: Please remember to write your name on the back only

1. Take ten samples of size 36 from a normal population with mean = 100, SD = 16.

To generate 10 samples of size 36 from this population, click Calc > Random Data > Normal; after “Generate” type “36” to get 36 rows of data (each column is a sample of 36); after “Store in Column(s)” type “c1-c10”; type “100” after “Mean” and “16” after “StDev”; then hit “OK.” This is a simulation of taking 10 random samples of 36 people from the general population and then measuring their IQs, except that IQs are whole numbers and this simulation gives decimal values. Recall that when we generated 200 samples from an exponentially distributed population as part of the first Minitab lab assignment, each of the 200 rows was a sample. Here each of the 10 columns is a sample.

Version 12 or 13 of Minitab: For each sample use Stat > Basic Statistics > Display Descriptive Statistics. Click Graphs and specify that you want the Graphical Summary. This command computes all the usual descriptive statistics and displays a histogram with a normal curve superimposed. This command also performs a statistical test of normality called the Anderson-Darling test.

Version 14 of Minitab: For each sample use Stat > Basic Statistics > Normality Test. Be sure that the Anderson-Darling test is selected. This command shows a Normal Probability Plot and performs a statistical test of normality called the Anderson-Darling test.

As part of this test there is reference to a statistic called A-squared and a P-value. P-values are used in all tests, not just in normality tests; we will study P-values in more detail in Chapter 7. In a test for normality, if the P-value is very small, then it’s unlikely that the sample came from a normal population. As a convenient rule, if the P-value is less than .05, then we have evidence to reject the hypothesis that the sample came from a normal population.
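For anyone who wants to see the same experiment outside Minitab, here is a rough Python sketch of the steps above: ten samples of size 36 from a normal population with mean 100 and SD 16, each followed by an Anderson-Darling test. The function name `anderson_darling_normal` is made up for this illustration. The P-value uses a commonly quoted approximation based on an adjusted A-squared; that choice is an assumption here, and the numbers may not match Minitab's output exactly.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Return (A-squared, approximate P-value) for a test of normality.

    The mean and SD are estimated from the sample itself. The P-value
    formulas below are an approximation (an assumption of this sketch),
    not necessarily the routine Minitab uses.
    """
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    std_normal = NormalDist()
    eps = 1e-12  # keeps log() away from 0 for extreme observations
    # Normal CDF of each sorted observation after converting to standard units
    u = [min(max(std_normal.cdf((x - m) / s), eps), 1 - eps)
         for x in sorted(sample)]
    total = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    # Small-sample adjustment applied before looking up the P-value
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj >= 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2, min(1.0, max(0.0, p))


random.seed(335)  # arbitrary seed so the run can be repeated
# Ten "columns" c1-c10, each a sample of 36 simulated IQ scores
samples = [[random.gauss(100, 16) for _ in range(36)] for _ in range(10)]
p_values = [anderson_darling_normal(col)[1] for col in samples]
for k, p in enumerate(p_values, start=1):
    print(f"c{k}: P = {p:.3f}")
```

Changing the seed gives a fresh set of ten samples, which is a quick way to see how much the P-values vary even though every sample comes from a normal population.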
List the ten P-values you get: __________ __________ __________ __________ __________ __________ __________ __________ __________ _________

Although all of our 10 samples actually came from a normal population, note the variety of histograms you get. (Version 14: Use Stat > Basic Statistics > Display Descriptive Statistics. Click Graphs and specify that you want Histogram of data, with normal curve.) Most of the ten P-values of the Anderson-Darling test should not be small, but some may be (actually, since we know the population is normal, in the long run about 5% of all the P-values should be under .05, 10% of them should be under .10, etc.). State in your own words what you observe.

Now do a normal probability plot (Graph > Probability Plot) for each of the 10 samples. (With Version 14, you already have this. With versions of Minitab earlier than Version 12, before doing a normal probability plot, you had to convert scores to standard units (Calc > Standardize; store in columns C11-C20) and then use Graph > Probability Plot for each of C11 to C20. However, for the current Release 12 or 13, this is no longer necessary.) Different software packages use different methods of plotting such a graph. In all of them, a perfect normal distribution will be displayed as a straight line. Note that the Minitab x and y axes are reversed from those in the Samuels/Witmer textbook. State in your own words what you observe.

2. Use the serum creatine phosphokinase data of Exercise 2.50 (pages 49-50) to answer the following.

a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?

b. Use your observations from Part 1 above to answer the following question. Do the histogram and normal probability plot of these 36 data points support the claim that serum creatine phosphokinase data of this sort follow the normal curve? _______ Why/why not?

c.
Use the “Basic Statistics” display (such as “Mean”, “StDev” and “SE Mean”) to answer the following:

Your best guess for the true average serum CK level μ of all healthy men is __________.
Your best guess for the accuracy of your estimate is ±______________.
Now include both to give your conclusion μ = _________±______________.
If you measured the serum CK level of a 37th man, the reading would be about __________, give or take ________, or so.
If you took another sample of 36 men, the average serum CK level of these 36 men would be about ____________, give or take ___________, or so.
What is the relationship between the standard deviation and the standard error of the mean for this data set?

3. Consider the Lentil Growth data of Example 4.9 (page 138). This data is on your Samuels/Witmer data disk.

a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?

b. Use your observations from Part 1 above to answer the following question. Do the histogram and normal probability plot of these 47 data points support the claim that growth rates of lentil plants follow the normal curve? _______ Why/why not?

c. Now take a log transform of the 47 data points. Repeat the above two questions for the log-transformed data.

Now read the Note on the Normal Probability Plot (NPP) below. If, after reading the Note, you change your mind about any of the questions above, please feel free to do so.

Note on the Normal Probability Plot (NPP).

Here is some more information about the lab, in particular about the Lentil Growth data. The Lentil Growth data has a distribution that is far from normal. If you look at the probability plot, you notice the lack of a left tail. A perfectly normal distribution will follow the solid straight line on the Normal Probability Plot (NPP). The NPP also shows two 95% confidence curves, drawn as dashed curves. Notice that the Lentil Growth NPP drops abruptly.
This is because the normal curve has a long narrow left tail, whereas the lentil growth data stops abruptly at x = 0. Looking at the histogram of the Lentil Growth data (Figure 4.31a), notice the two outliers on the right, one at x = 2.00 and the other at x = 1.70. On the NPP, they are represented by dots away from the normal line (both dip below the lower 95% curve). This means that the Lentil Growth data has a fatter right tail than the corresponding normal curve.

The NPP is a standard graphical display that can be used to see deviations from normality. All of this is discussed in section 4.4 of Samuels/Witmer, except that their plots have the x and y axes interchanged. So their Figure 4.31b is the same as your Minitab NPP, provided you flip the axes (their plots don’t have the 95% confidence curves).

The Anderson-Darling normality test gives you a number that summarizes the comparison between the Lentil Growth data and the normal curve. Actually, there are two numbers. The first one is the statistic A-squared. It would take us too far afield to get into a discussion of its meaning. The second is the P-value. A P-value is a measure of how likely it would be to get such a sample if the parent population satisfied the null hypothesis. In this case the null hypothesis is that the population is normal. At this point, let me just say that, if the P-value is less than .05, then we have reason to doubt that the data came from a normal population. For the Lentil Growth data, the P-value is 0.00, which means that it is highly unlikely the data came from a normal population.

The question of importance is whether the **population** from which the data came is normal. In item 1 of the Lab, you took 10 samples from a population that we **know** is normal. In that part of the Lab you explore the way normal plots look and the way the Anderson-Darling test behaves for such samples.
In items 2 and 3 you use your observations from item 1 to answer questions about samples for which we don’t **know** whether the parent population is normal. We can never be 100% sure, but we should be able to say whether we are **confident** in some conclusion. If you need to look at more samples from a normal population, repeat Part 1 of the Lab to look at more. You should agree that the Lentil Growth data comes from a population that is not normal: not with 100% certainty, but with a very high level of confidence.
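To make the idea of "a straight line on the NPP" concrete, here is a small Python sketch that builds the points of a normal probability plot by hand and then measures how straight they are. Everything in it is an assumption made for illustration: the helper names, the (i - 0.5)/n plotting positions, the use of a correlation coefficient as a rough "straightness" score (this is not the Anderson-Darling test), and above all the data, which are 47 made-up right-skewed values and not the Lentil Growth measurements.

```python
import math
from statistics import NormalDist, mean


def normal_scores(n):
    """Standard-normal quantiles for ranks 1..n, using (i - 0.5)/n positions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def npp_points(sample):
    """Pair each sorted observation (x) with its normal score (y)."""
    return list(zip(sorted(sample), normal_scores(len(sample))))


def straightness(sample):
    """Correlation between data and normal scores; 1.0 is a perfect line."""
    xs, ys = zip(*npp_points(sample))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical right-skewed sample of 47 positive values (NOT the lentil data):
# built so that its logarithms are exactly normal scores.
skewed = [math.exp(z) for z in normal_scores(47)]
logged = [math.log(x) for x in skewed]

print(f"raw data:        straightness = {straightness(skewed):.3f}")
print(f"log-transformed: straightness = {straightness(logged):.3f}")
```

In this constructed example the raw values bend away from a line because of the long right tail, while the log-transformed values fall on a line. Real data, including the Lentil Growth data, need not straighten out so neatly after a log transform; that is what item 3c asks you to check.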