BIOL/STAT 335
Lab02: Please remember to write your name on the back only
1. Take ten samples of size 36 from a normal population with mean = 100, SD = 16. To generate 10 samples of size 36 from this
population click Calc > Random Data > Normal; after “Generate” type “36” to get 36 rows of data (each column is a sample of 36);
after “Store in Column(s),” type “c1-c10”; type “100” after “Mean” and “16” after “StDev,” then hit “OK.” This is a simulation of
taking 10 random samples of 36 people from the general population and then measuring their IQs, except that IQs are whole numbers
and this simulation gives decimal values. Recall that when we generated 200 samples from an exponentially distributed population as
part of the first Minitab lab assignment, each of the 200 rows was a sample. Here each of the 10 columns is a sample.
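For readers without Minitab, the same simulation can be sketched in Python. This is only an illustration, not part of the Minitab procedure: the seed 335 and the variable names are assumptions, and the random numbers will differ from the ones Minitab produces.

```python
import random
import statistics

# Assumed seed so the run is repeatable; Minitab draws its own random numbers.
rng = random.Random(335)

N_SAMPLES = 10     # plays the role of columns c1-c10
SAMPLE_SIZE = 36   # rows per column
MEAN, SD = 100.0, 16.0

# samples[j] corresponds to one Minitab column: a single sample of 36 values.
samples = [[rng.gauss(MEAN, SD) for _ in range(SAMPLE_SIZE)]
           for _ in range(N_SAMPLES)]

for j, sample in enumerate(samples, start=1):
    print(f"c{j}: mean = {statistics.mean(sample):6.2f}, "
          f"SD = {statistics.stdev(sample):5.2f}")
```

Each printed line summarizes one column. The sample means and SDs vary from column to column even though every column comes from the same population.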
Version 12 or 13 of Minitab: For each sample use Stat > Basic Statistics > Display Descriptive Statistics. Click Graphs and
specify that you want the Graphical Summary. This command computes all the usual descriptive statistics and displays a histogram
with a normal curve superimposed. This command also performs a statistical test of normality called the Anderson-Darling test.
Version 14 of Minitab: For each sample use Stat > Basic Statistics > Normality Test. Be sure that the Anderson-Darling test is
selected. This command shows a Normal Probability Plot and performs a statistical test of normality called the Anderson-Darling test.
The output of this test includes a statistic called A-squared and a P-value. P-values are used in all tests, not just for normality
tests; we will study P-values in more detail in Chapter 7. In a test for normality, if the P-value is very small, then it’s unlikely that the
sample came from a normal population. As a convenient rule, if the P-value is less than .05, then we have evidence to reject the
hypothesis that the sample came from a normal population. List the ten P-values you get:
__________
__________
__________
__________
__________
__________
__________
__________
__________
_________
Although all of our 10 samples actually came from a normal population, note the variety of histograms you get (Version 14: use
Stat > Basic Statistics > Display Descriptive Statistics; click Graphs and specify that you want Histogram of data, with normal
curve). Most of the ten P-values of the Anderson-Darling test should not be small, but some may be (actually, since we know the
population is normal, in the long run about 5% of all the P-values should be under .05, 10% of them should be under .10, etc.).
State in your own words what you observe.
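The claim about 5% and 10% can be checked by simulation. The sketch below is an illustration in Python, not Minitab's own routine. It computes A-squared from its standard formula and converts it to a P-value with a widely used published approximation for the case where the mean and SD are estimated from the sample. The function name anderson_darling_p, the seed, and the choice of 2000 repetitions are all assumptions, and the P-values may differ slightly from Minitab's.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def anderson_darling_p(data):
    """Return (A-squared, approximate P-value) for a test of normality."""
    n = len(data)
    m, s = mean(data), stdev(data)
    std_normal = NormalDist()
    # Normal CDF at each standardized observation, in sorted order.
    # Values are clamped away from 0 and 1 so the logarithms stay finite.
    cdf = [min(max(std_normal.cdf((x - m) / s), 1e-12), 1 - 1e-12)
           for x in sorted(data)]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    # Small-sample adjustment, then a piecewise approximation to the P-value.
    a_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a_adj < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a_adj - 223.73 * a_adj ** 2)
    elif a_adj < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a_adj - 59.938 * a_adj ** 2)
    elif a_adj < 0.6:
        p = math.exp(0.9177 - 4.279 * a_adj - 1.38 * a_adj ** 2)
    elif a_adj < 10:
        p = math.exp(1.2937 - 5.709 * a_adj + 0.0186 * a_adj ** 2)
    else:
        p = 0.0  # effectively zero; the approximation is not meant for this range
    return a2, p

rng = random.Random(335)  # assumed seed, for repeatability only

def normal_sample(n=36, mu=100.0, sigma=16.0):
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Ten samples, as in the lab.
p_values = [anderson_darling_p(normal_sample())[1] for _ in range(10)]
print("Ten P-values:", [round(p, 3) for p in p_values])

# Many more samples, to see the long-run fraction of small P-values.
many = [anderson_darling_p(normal_sample())[1] for _ in range(2000)]
frac_05 = sum(p < 0.05 for p in many) / len(many)
frac_10 = sum(p < 0.10 for p in many) / len(many)
print(f"Fraction under .05: {frac_05:.3f}")
print(f"Fraction under .10: {frac_10:.3f}")
```

With only ten samples the count of small P-values jumps around from run to run. With a few thousand samples the two fractions should settle near .05 and .10.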
Now do a normal probability plot (Graph > Probability Plot) for each of the 10 samples. (With Version 14, you already have this.
With versions of Minitab earlier than Version 12, before doing a normal probability plot, you had to convert scores to standard units
(Calc > Standardize; store in columns C11-C20) and then use Graph > Probability Plot for each of C11 to C20. However, for the current
Release 12 or 13, this is no longer necessary.) Different software packages use different methods of plotting such a graph. In all of
them, a perfect normal distribution will be displayed as a straight line. Note that the Minitab x and y axes are reversed from those in the
Samuels/Witmer textbook. State in your own words what you observe.
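To see what goes into such a plot, the points can be computed by hand. The Python sketch below is one common construction, given under assumptions: the plotting position (i - 0.375) / (n + 0.25) is just one of several conventions and may not be the one Minitab uses, and the names npp_points and correlation are made up for this illustration. Which quantity goes on which axis is only a display choice.

```python
import math
import random
from statistics import NormalDist, mean

def npp_points(data):
    """Pair each sorted observation with the normal score for its rank."""
    n = len(data)
    std_normal = NormalDist()
    # Assumed plotting position; different software uses different ones.
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
              for i in range(1, n + 1)]
    return list(zip(scores, sorted(data)))

def correlation(pairs):
    """Pearson correlation of the (normal score, data value) pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(335)  # assumed seed
sample = [rng.gauss(100.0, 16.0) for _ in range(36)]

points = npp_points(sample)
for score, value in points[:3] + points[-3:]:
    print(f"normal score {score:6.2f}   data value {value:7.2f}")

r = correlation(points)
print(f"Correlation between normal scores and data: {r:.3f}")
```

A correlation close to 1 means the points lie close to a straight line. Samples from a normal population usually give a value near 1, though not exactly 1.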
2. Use the serum creatine phosphokinase data of Exercise 2.50 (pages 49-50) to answer the following.
a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?
b. Use your observations from part 1 above to answer the following question. Do the histogram and normal probability plot of these
36 data points support the claim that serum creatine phosphokinase data of this sort follow the normal curve? _______
Why/why not?
c. Use the “Basic Statistics” display (such as “Mean”, “StDev” and “SE Mean”) to answer the following:
Your best guess for the true average serum CK level μ of all healthy men is __________.
Your best guess for the accuracy of your estimate is ±______________.
Now include both to give your conclusion μ = _________±______________.
If you measured the serum CK level of a 37th man, the reading would be about __________, give or take ________, or so.
If you took another sample of 36 men, the average serum CK level of these 36 men would be about ____________, give or take
___________, or so. What is the relationship between the standard deviation and the standard error of the mean for this data set?
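The three quantities in the “Basic Statistics” display are linked by a simple computation, sketched below in Python. The readings here are simulated placeholders, NOT the Exercise 2.50 data; the population values 100 and 40 and the seed are arbitrary assumptions chosen only so that the code runs.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(335)  # assumed seed

# Hypothetical readings standing in for 36 measurements (not the textbook data).
readings = [rng.gauss(100.0, 40.0) for _ in range(36)]

n = len(readings)
y_bar = mean(readings)      # "Mean" in the Minitab display
sd = stdev(readings)        # "StDev": spread of individual readings
se = sd / math.sqrt(n)      # "SE Mean": spread of the sample mean

print(f"n = {n}")
print(f"Mean    = {y_bar:.2f}")
print(f"StDev   = {sd:.2f}")
print(f"SE Mean = {se:.2f}")
print(f"StDev / SE Mean = {sd / se:.2f}")
```

The SD describes how far a single new reading tends to fall from the mean, while the SE describes how far the mean of another sample of the same size tends to fall from the true mean.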
3. Consider the Lentil Growth data of Example 4.9 (page 138). This data is on your Samuels/Witmer data disk.
a. Make a histogram and a normal probability plot as in Part 1. What is the P-value of the Anderson-Darling normality test?
b. Use your observations from part 1 above to answer the following question. Do the histogram and normal probability plot of these
47 data points support the claim that growth rates of lentil plants follow the normal curve? _______ Why/why not?
c. Now take a log transform of the 47 data points. Repeat the above two questions for the log transformed data.
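The mechanics of a log transform can be sketched in Python. The 47 values below are simulated right-skewed numbers, NOT the Lentil Growth data, so the output says nothing about what you will find for the real data. The lognormal parameters, the seed, and the helper name skewness are assumptions made for this illustration.

```python
import math
import random
from statistics import mean

def skewness(values):
    """Sample skewness: third central moment over SD cubed (divisor n)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(335)  # assumed seed

# Hypothetical positive, right-skewed measurements (not the textbook data).
growth = [rng.lognormvariate(-0.5, 0.6) for _ in range(47)]

# The transform itself: replace every value by its natural logarithm.
log_growth = [math.log(g) for g in growth]

skew_raw = skewness(growth)
skew_log = skewness(log_growth)
print(f"Skewness before the log transform: {skew_raw:.2f}")
print(f"Skewness after the log transform:  {skew_log:.2f}")
```

Taking logs pulls in a long right tail, so the skewness drops. Whether the transformed values then look normal is a separate question that the histogram, the normal probability plot, and the Anderson-Darling test have to answer.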
Now read the Note on the Normal Probability Plot (NPP) below. If, after reading the Note, you change your mind about any of the
questions above, please feel free to do so.
Note on the Normal Probability Plot (NPP).
Here is some more information about the lab, in particular about the Lentil Growth data.
The Lentil Growth data has a distribution that is far from normal. If you look at the probability plot, you notice the lack of a left tail.
A perfectly normal distribution will follow the solid straight line on the Normal Probability Plot (NPP). The NPP also shows two 95%
confidence curves, drawn as dashed curves.
Notice that the Lentil Growth NPP drops abruptly. This is because the normal curve has a long narrow left tail, whereas the lentil
growth data stops abruptly at x = 0.
Looking at the histogram of the Lentil Growth (Figure 4.31a), notice the two outliers on the right, one at x = 2.00 and the other at x =
1.70. On the NPP, they are represented by dots away from the normal (both dip below the lower 95% curve). This means that the
Lentil Growth data has a fatter right tail than the corresponding normal curve.
The NPP is a standard graphical display that can be used to see deviations from normality. All of this is discussed in section 4.4 of
Samuels/Witmer, except that their plots have the x and y axes interchanged. So their Figure 4.31b is the same as your Minitab NPP,
provided you flip the axes (their plots don’t have the 95% confidence curves).
The Anderson-Darling Normality test gives you a number that summarizes the comparison between the Lentil Growth data and the
normal curve. Well, there are two numbers. The first one is the statistic A-squared. It would take us too far afield to get into a
discussion of its meaning. The second is the P-value.
A P-value is a measure of how likely it is that one would get a sample at least this far from what the null hypothesis predicts, if the
parent population satisfied the null hypothesis.
In this case the null hypothesis is that the population is normal. At this point, let me just say that, if the P-value is less than .05, then
we have reason to doubt that the data came from a normal population.
For the Lentil Growth data, the P-value is 0.00, which means that a sample like this would be highly unlikely if the data came from a
normal population.
The question of importance is whether the **population** from which the data came is normal. In item 1 of the Lab, you took 10
samples from a population that we **know** is normal. In that part of the Lab you explore the way normal plots look and the way
that the Anderson-Darling test behaves. In items 2 and 3 you use your observations from item 1 to answer questions about samples for
which we don’t **know** whether the parent population is normal. We can never be 100% sure, but we should be able to say whether
we are **confident** in some conclusion. If you need to look at more samples from a normal population, repeat part 1 of the Lab to
look at more.
You should agree that the Lentil Growth data comes from a population that is not normal: not with 100% certainty, but with a very
high level of confidence.