Lab02: Second Computer Assignment
BIOL/STAT 335
Write your name on the back
1. Take ten samples of size 36 from a normal population with mean = 100, SD = 16. To generate 10 samples of size 36 from this
population click Calc > Random Data > Normal; after “Generate” type “36” to get 36 rows of data (each column is a sample of 36);
after “Store in Column(s),” type “c1-c10”; type “100” after “Mean” and “16” after “StDev”, then hit “OK.” This is a simulation of
taking 10 random samples of 36 people from the general population and then measuring their IQs, except that IQs are whole numbers
and this simulation gives decimal values. Recall that when we generated 200 samples from an exponentially distributed population as
part of the first Minitab lab assignment, each of the 200 rows was a sample. Here each of the 10 columns is a sample.
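For readers who want to see the same simulation outside Minitab, here is a minimal sketch in Python. It is not part of the assignment; the seed and the variable names (columns, N_ROWS, N_SAMPLES) are arbitrary choices made for this illustration.

```python
import random
import statistics

random.seed(335)  # arbitrary seed, used only so the run is repeatable

N_ROWS, N_SAMPLES = 36, 10  # 36 rows and 10 columns, as in the Minitab dialog
MEAN, SD = 100, 16

# columns[j] plays the role of Minitab column c(j+1): one sample of 36 values.
columns = [[random.gauss(MEAN, SD) for _ in range(N_ROWS)]
           for _ in range(N_SAMPLES)]

# Each column is one sample, so summarize column by column.
for j, sample in enumerate(columns, start=1):
    print(f"c{j}: mean = {statistics.mean(sample):6.2f}, "
          f"SD = {statistics.stdev(sample):5.2f}")
```

The sample means and SDs vary from column to column even though every column comes from the same population.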
For each sample use Stat > Basic Statistics > Normality Test. Be sure that the Anderson-Darling test is selected. This command
shows a Normal Probability Plot and performs a statistical test of normality called the Anderson-Darling test.
As part of this test there is reference to a statistic called A-squared and a P-value. P-values are used in all tests, not just for normality
tests; we will study P-values in more detail in Chapter 7. In a test for normality, if the P-value is very small, then it’s unlikely that the
sample came from a normal population. As a convenient rule, if the P-value is less than .05, then we have evidence to reject the
hypothesis that the sample came from a normal population. List the ten P-values you get:
__________
__________
__________
__________
__________
__________
__________
__________
__________
_________
Although all of our 10 samples actually came from a normal population, note the variety of histograms you get. Use
Stat > Basic Statistics > Display Descriptive Statistics. Click Graphs and specify that you want “Histogram of data, with normal
curve.” Most of the ten P-values of the Anderson-Darling test should not be small, but some may be (actually, since we know the
population is normal, in the long run about 5% of all the P-values should be under .05, about 10% of them should be under .10, etc.).
State in your own words what you observe.
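The long-run claim about P-values can be checked by simulation. The sketch below is a rough Python illustration, not Minitab's own code. It computes the A-squared statistic from its usual formula and converts it to an approximate P-value with the published D'Agostino-Stephens approximation for the case where the mean and SD are estimated from the sample. Whether Minitab uses exactly these formulas is an assumption, so small differences from Minitab's output should be expected. The function names and the seed are made up for this example.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling(sample):
    """Return (A-squared, approximate P-value) for a test of normality,
    with the mean and SD estimated from the sample."""
    n = len(sample)
    xs = sorted(sample)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    # Fitted normal CDF at each ordered value, kept away from 0 and 1
    # so that the logarithms below are always defined.
    eps = 1e-12
    cdf = [min(max(normal_cdf((x - mean) / sd), eps), 1.0 - eps) for x in xs]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    # Small-sample adjustment, then the piecewise P-value approximation.
    a = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    if a >= 0.6:
        p = math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)
    elif a > 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    elif a > 0.2:
        p = 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    else:
        p = 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    return a2, min(max(p, 0.0), 1.0)

random.seed(335)  # arbitrary seed so the run is repeatable

# Ten samples of size 36, as in item 1.
samples = [[random.gauss(100, 16) for _ in range(36)] for _ in range(10)]
p_values = [anderson_darling(s)[1] for s in samples]
print("Ten P-values:", [round(p, 3) for p in p_values])

# Long-run behavior: repeat many times and count the small P-values.
n_trials = 2000
below = sum(
    anderson_darling([random.gauss(100, 16) for _ in range(36)])[1] < 0.05
    for _ in range(n_trials)
)
fraction_below = below / n_trials
print(f"Fraction of P-values under .05: {fraction_below:.3f}")
```

With only ten samples you may see no P-value under .05, or you may see one or two; the 5% figure only shows up over many repetitions.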
Now do a normal probability plot (Graph > Probability Plot) for each of the 10 samples. Different software packages use different
methods of plotting such a graph. In all of them, a perfect normal distribution will be displayed as a straight line. Note that the
Minitab x and y axes are reversed from those in the Samuels/Witmer textbook. State in your own words what you observe.
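To see what goes into a normal probability plot, the sketch below pairs each ordered data value with a theoretical normal score and measures how close the pairs are to a straight line. It is an illustration in Python, not a description of Minitab's method: the plotting positions (i - 0.5)/n are one common choice and are an assumption here, and the helper names are made up.

```python
import random
from statistics import NormalDist, mean

def normal_scores(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

random.seed(2)  # arbitrary seed so the run is repeatable

# One simulated sample of 36 from the normal population of item 1, sorted.
sample = sorted(random.gauss(100, 16) for _ in range(36))
scores = normal_scores(len(sample))

# The plot is the set of points (normal score, ordered data value).
# Swapping the two coordinates gives the other axis convention.
for z, x in list(zip(scores, sample))[:5]:
    print(f"normal score {z:6.2f}   data value {x:7.2f}")

r = correlation(scores, sample)
print(f"Correlation between normal scores and ordered data: {r:.3f}")
```

A correlation close to 1 corresponds to points that fall near a straight line on the plot.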
2. Use the serum creatine phosphokinase data of Exercise 2.6.1 on page 68 (or download) to answer the following.
a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?__________
b. Use your observations from part 1 above to answer the following question. Do the histogram and normal probability plot of these
36 data points support the claim that serum creatine phosphokinase data of this sort follow the normal curve? _____________
Why/why not?
c. Use the “Basic Statistics” display (such as “Mean”, “StDev” and “SE Mean”) to answer the following:
Your best guess for the true average serum CK level μ of all healthy men is __________.
Your best guess for the accuracy of your estimate is ±______________.
Now include both to give your conclusion μ = _________±______________.
If you measured the serum CK level of a 37th man, the reading would be about __________, give or take ________, or so.
If you took another sample of 36 men, the average serum CK level of these 36 men would be about ____________, give or take
___________, or so. What is the relationship between the standard deviation and the standard error of the mean for this data set?
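The two "give or take" numbers in part c come from different quantities: the SD describes how far a single reading tends to be from the mean, while the SE of the mean describes how far the average of a whole sample tends to be from μ. A small Python sketch of the arithmetic follows. The 36 values are simulated placeholders, not the textbook's CK data, and the population mean of 100 and SD of 40 are arbitrary choices for the illustration.

```python
import math
import random
import statistics

random.seed(7)  # arbitrary seed so the run is repeatable

# Simulated stand-in for 36 measurements (NOT the textbook's CK values).
sample = [random.gauss(100, 40) for _ in range(36)]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)   # spread of individual readings
se = sd / math.sqrt(n)          # spread of the mean of n readings

print(f"n = {n}, mean = {mean:.1f}")
print(f"A single new reading: about {mean:.1f}, give or take {sd:.1f}")
print(f"The mean of another sample of {n}: about {mean:.1f}, give or take {se:.1f}")
```

With n = 36 the SE is the SD divided by 6.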
3. Consider the Lentil Growth data of Example 4.4.5 on page 138. This data can be downloaded from our class web page.
a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?
____________________
b. Use your observations from part 1 above to answer the following question. Do the histogram and normal probability plot of these
47 data points support the claim that growth rates of lentil plants follow the normal curve?
_______________ Why/why not?
c. Now take a log transform of the 47 data points. Repeat the above two questions for the log transformed data.
4. Consider the Moisture Content data of Example 4.4.2 on page 133. This data can be downloaded from our class web page.
a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?
____________________
b. Use your observations from part 1 above to answer the following question. Do the histogram and normal probability plot of these
83 data points support the claim that the moisture content of freshwater fruit follows the normal curve?
_______________ Why/why not?
c. Now take a transform by taking the fourth power of the 83 data points. Repeat the above two questions for the transformed data.
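The effect of the two transforms in items 3c and 4c can be previewed on simulated data. The sketch below is a Python illustration under stated assumptions: the samples are synthetic (a lognormal sample standing in for right-skewed data and a beta sample standing in for left-skewed data), not the Lentil Growth or Moisture Content data, and the sample size of 5000 is chosen only to make the skewness estimates stable.

```python
import math
import random

def skewness(xs):
    """Moment-based sample skewness: 0 for a symmetric distribution,
    positive for a long right tail, negative for a long left tail."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(4)  # arbitrary seed so the run is repeatable

# Synthetic samples, not the lab's data sets.
right_skewed = [random.lognormvariate(0.0, 0.6) for _ in range(5000)]
left_skewed = [random.betavariate(8, 2) for _ in range(5000)]

# A log transform pulls in a long right tail (values must be positive).
logged = [math.log(x) for x in right_skewed]
# Raising to the fourth power stretches the right side of left-skewed data.
fourth = [x ** 4 for x in left_skewed]

print(f"Right-skewed: skewness {skewness(right_skewed):6.2f} "
      f"-> after log {skewness(logged):6.2f}")
print(f"Left-skewed:  skewness {skewness(left_skewed):6.2f} "
      f"-> after 4th power {skewness(fourth):6.2f}")
```

A skewness nearer 0 after the transform suggests a more symmetric shape, but it does not by itself show that the transformed data are normal; the histogram, NPP, and Anderson-Darling test are still needed.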
Now read the Note on the Normal Probability Plot (NPP) below. If, after reading the Note, you change your mind about any of the
questions above, please feel free to do so.
Note on the Normal Probability Plot (NPP).
Here is some more information about the lab, in particular about the Lentil Growth data.
The Lentil Growth data has a distribution that is far from normal. If you look at the probability plot, you notice the lack of a left tail.
A perfectly normal distribution will follow the solid straight line on the Normal Probability Plot (NPP). The NPP also shows two
dashed curves, which are 95% confidence curves.
Notice that the Lentil Growth NPP drops abruptly. This is because the normal curve has a long narrow left tail, whereas the lentil
growth data stops abruptly at x = 0.
Looking at the histogram of the Lentil Growth (Figure 4.31a), notice the two outliers on the right, one at x = 2.00 and the other at x =
1.70. On the NPP, they are represented by dots away from the normal (both dip below the lower 95% curve). This means that the
Lentil Growth data has a fatter right tail than the corresponding normal curve.
The NPP is a standard graphical display that can be used to see deviations from normality. All of this is discussed in section 4.4 of
Samuels/Witmer, except that their plots have the x and y axes interchanged. So their Figure 4.31b is the same as your Minitab NPP,
provided you flip the axes (their plots don’t have the 95% confidence curves).
The Anderson-Darling Normality test gives you a number that summarizes the comparison between the Lentil Growth data and the
normal curve. Well, there are two numbers. The first one is the statistic A-squared. It would take us too far afield to get into a
discussion of its meaning. The second is the P-value.
A P-value is a measure of how likely one would be to get a sample that departs at least this much from the null hypothesis if the
parent population satisfied the null hypothesis.
In this case the null hypothesis is that the population is normal. At this point, let me just say that, if the P-value is less than .05, then
we have reason to doubt that the data came from a normal population.
For the Lentil Growth data, the P-value is 0.00, which means that it is highly unlikely the data came from a normal population.
The question of importance is whether the **population** from which the data came is normal. In item 1 of the Lab, you took 10
samples from a population that we **know** is normal. In that part of the Lab you explore the way normal plots look and the way
that the Anderson-Darling test behaves for such samples. In items 2 through 4 you use your observations from item 1 to answer
questions about samples for which we don’t **know** whether the parent population is normal. We can never be 100% sure, but we
should be able to say whether we are **confident** in some conclusion. If you need to look at more samples from a normal
population, repeat part 1 of the Lab to look at more.
You should agree that the Lentil Growth data comes from a population that is not normal: not with 100% certainty, but with a very
high level of confidence.