A worksheet on testing for equality of the means of two independent samples

Worksheet on Two Samples
Comparing two means
Simulated Data
Unknown Variances/Equal Variances
The following summary was produced by simulating two normal samples with equal variances:
                  Variable 1-1   Variable 1-2
Sample Mean       3.04576        3.79085
Sample Variance   0.85380        1.18353
Observations      25             25
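For the equal-variance case, the usual tool is the pooled two-sample t statistic. The following is a minimal sketch in Python (standard library only) that computes it from the summary above. The function name `pooled_t` and its argument names are ours, chosen for illustration, and are not part of the worksheet.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Pooled two-sample t statistic and its degrees of freedom.

    Assumes independent normal samples with a common (unknown) variance.
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

# Summary statistics for Variable 1-1 and Variable 1-2.
t_stat, df = pooled_t(3.04576, 0.85380, 25, 3.79085, 1.18353, 25)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The resulting statistic would then be compared with the critical value of Student's t distribution with 48 degrees of freedom at the chosen significance level.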
Unknown Variances/Unequal Variances
The following summary was produced by simulating two normal samples with different variances:
                  Variable 2-1   Variable 2-2
Sample Mean       3.51379        3.03036
Sample Variance   0.93372        0.30450
Observations      25             25
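When the variances are not assumed equal, a common choice is Welch's t statistic, with degrees of freedom given by the Welch-Satterthwaite approximation. A short Python sketch under that assumption follows; `welch_t` is a name we introduce here, not one from the worksheet.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch t statistic with Welch-Satterthwaite degrees of freedom."""
    a, b = var1 / n1, var2 / n2  # squared standard errors of each mean
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics for Variable 2-1 and Variable 2-2.
welch_stat, welch_df = welch_t(3.51379, 0.93372, 25, 3.03036, 0.30450, 25)
print(f"t = {welch_stat:.3f}, approximate df = {welch_df:.1f}")
```

The approximate degrees of freedom are generally not an integer and never exceed n1 + n2 - 2. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` should give the same statistic together with a p-value (note that it takes standard deviations, not variances).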
Data Files from Triola
The following are several examples from the data sets in Triola's book (Appendix B). We will work on some of them in
teams, comparing the means of two data sets.
Weight of M&M's
Several types of M&M candies are measured, but we concentrate on the two colors with the largest sample sizes. The
summary statistics are
                     Blue M&M's   Orange M&M's
Mean                 0.8578       0.85604
Standard Error       0.01002      0.00808
Standard Deviation   0.05010      0.04199
Sample Variance      0.00251      0.00176
Count                25           27
Is the difference in the sample means significant?
Note: even if the variances are unknown, we could venture to assume that they are equal, since the manufacturing process is
presumably the same for both. Also, it is not unreasonable to assume that a theoretical normal distribution model is
appropriate, since the variation is probably due to many small and, hopefully, independent factors. Since we are assuming
that both are simple random samples, we might extend this to assume that the two samples are independent of each other.
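Under the assumptions in the note (normal, independent samples with equal variances), the question can be answered with a pooled t test. The sketch below, in plain Python, computes the statistic from the blue and orange summaries and obtains a two-sided p-value by numerically integrating the Student t density with Simpson's rule. The helper names `t_pdf` and `two_sided_p` are ours, and the numerical integration is only a stand-in for a statistical table or library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # P(|T| > |t|) = 1 - 2 * P(0 < T < |t|)
    return 1 - 2 * (h / 3) * total

# Blue vs. orange M&M's, using the standard deviations from the table.
n_blue, n_orange = 25, 27
df = n_blue + n_orange - 2
pooled_var = ((n_blue - 1) * 0.05010 ** 2 + (n_orange - 1) * 0.04199 ** 2) / df
std_err = math.sqrt(pooled_var * (1 / n_blue + 1 / n_orange))
t_stat = (0.8578 - 0.85604) / std_err
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p = {p_value:.2f}")
```

On these numbers the statistic is well below 1 in absolute value, so the difference in sample means is far from significant at any conventional level.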
Cigarette Smoke
Data set #4 reports measurements of nicotine, tar and carbon monoxide (CO) for several brands of cigarettes. This is a
typical data set that you could find on the web. The main obstacle to doing statistics on such a data set is that the data
have already been manipulated (presumably, the numbers are averages over, hopefully, proper samples of the various brands) and
do not meet the requirements for our methods to be applicable (i.e., that they come from a simple random sample). Having
stipulated that any conclusion we may draw is poorly substantiated, we may boldly try to use our methods to compare, for
example, CO output between menthol and filtered non-menthol cigarettes. Without further knowledge of the manufacturing process
it is also hard to guess whether the two samples (menthol and filtered cigarettes) are independent of each other. All the
modeling assumptions behind our tools are, to be kind, on very shaky ground, but, oh, well. There seems to be no special
reason to assume that the variances for the two samples should be the same, so we won't assume that. The summary
statistics are the following:
                     Menthol CO   Filtered CO
Mean                 14.96        14.8
Standard Error       0.83363      0.84656
Standard Deviation   4.16813      4.23281
Sample Variance      17.3733      17.9167
Count                25           25
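Since the table already reports the standard error of each mean, the unequal-variance statistic can be formed directly from those two numbers. The Python sketch below does this and also evaluates the Welch-Satterthwaite degrees of freedom. The variable names are ours, and the calculation inherits all the shaky assumptions discussed above.

```python
import math

# Standard errors of the two sample means, as reported in the table.
se_menthol, se_filtered = 0.83363, 0.84656
n_menthol, n_filtered = 25, 25

# The variance of a difference of independent means is the sum of the
# squared standard errors; no pooling of variances is involved.
se_diff = math.sqrt(se_menthol ** 2 + se_filtered ** 2)
t_stat = (14.96 - 14.8) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df = se_diff ** 4 / (se_menthol ** 4 / (n_menthol - 1)
                           + se_filtered ** 4 / (n_filtered - 1))
print(f"difference = {14.96 - 14.8:.2f}, SE = {se_diff:.3f}, "
      f"t = {t_stat:.3f}, df = {welch_df:.1f}")
```

The observed difference in means is small compared with its standard error, which is what the nearly identical sample means would lead one to expect.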
Content of Cola Cans
Data set #17 lists measurements of weight and volume of cans of different varieties of Cola beverages (regular and diet
Coke and Pepsi). As usual, we assume that these measures are the output of a simple random sample for each category. It
would be a stretch to assume that the variance could be the same for Coke and Pepsi products – after all, they come from
different environments – but it can be argued that they could be the same for products from the same manufacturer (then,
again, we would need to know much more about the samples: for example, do they come from the same factory, or not, are
the production lines similar or different, and so on).
If you go and look at the actual data, you will be happy to notice that the volume of each type is consistently higher than the
statutory 12 oz (just one data point, a regular Coke, is lower). Thus, testing whether the true mean of this measure is 12 vs.
less than 12 would yield an obvious result (we cannot reject the hypothesis that the average volume is 12 oz.). On the other
hand, just looking at the data, you can guess that the average weight of regular soda (for both brands) should test as higher
than the weight of diet. In both cases, it's not so much a matter of eyeballing as of making an educated guess as to what a
standard test would produce (you should, as extra credit, find out what the p-value of a test “H0: the average weights are the
same vs. H1: regular Cola weighs more than Diet” turns out to be).
As an example, we will compare two products: in this case, the volume of regular Coke vs. the volume of regular
Pepsi. As mentioned, it is a stretch to assume that the variances should be equal. On the other hand, we will have to assume
that the data come from normal distributions and that they represent simple random
samples. The summary statistics for the samples are the following:
                     Coke regular   Pepsi regular
Mean                 12.1944        12.2917
Standard Error       0.01908        0.01511
Standard Deviation   0.11450        0.09063
Sample Variance      0.01311        0.00821
Count                36             36
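Because both samples contain 36 cans, the pooled and the unequal-variance (Welch) t statistics coincide algebraically, and only the degrees of freedom differ. The Python sketch below computes both versions from the table to illustrate this. The variable names are ours, and the Welch-Satterthwaite formula is used for the unequal-variance degrees of freedom.

```python
import math

n = 36                                # both samples have 36 cans
mean_coke, sd_coke = 12.1944, 0.11450
mean_pepsi, sd_pepsi = 12.2917, 0.09063
diff = mean_coke - mean_pepsi

# Welch (unequal variances) standard error and degrees of freedom.
a, b = sd_coke ** 2 / n, sd_pepsi ** 2 / n
welch_t = diff / math.sqrt(a + b)
welch_df = (a + b) ** 2 / ((a ** 2 + b ** 2) / (n - 1))

# Pooled (equal variances) version, for comparison.
pooled_var = (sd_coke ** 2 + sd_pepsi ** 2) / 2
pooled_t = diff / math.sqrt(pooled_var * 2 / n)
pooled_df = 2 * n - 2

print(f"Welch:  t = {welch_t:.3f}, df = {welch_df:.1f}")
print(f"pooled: t = {pooled_t:.3f}, df = {pooled_df}")
```

With equal sample sizes, the decision about assuming equal variances therefore affects only which t distribution the statistic is compared against, not the statistic itself.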