Worksheet on Two Samples

Comparing two means

Simulated Data

Unknown Variances/Equal Variances

The following summary was produced by simulating two normal samples with equal variances:

                   Variable 1-1    Variable 1-2
Sample Mean        3.04576         3.79085
Sample Variance    0.85380         1.18353
Observations       25              25

Unknown Variances/Unequal Variances

The following summary was produced by simulating two normal samples with different variances:

                   Variable 2-1    Variable 2-2
Sample Mean        3.51379         3.03036
Sample Variance    0.93372         0.30450
Observations       25              25

Data Files from Triola

The following are several examples from the data sets in Triola's book (Appendix B). We will work on some of them as teams, comparing the means from two data sets.

Weight of M&M's

Several types of M&M candies are measured, but we concentrate on the two colors with the largest sample sizes. The summary statistics are

                      Blue M&M's    Orange M&M's
Mean                  0.8578        0.85604
Standard Error        0.01002       0.00808
Standard Deviation    0.05010       0.04199
Sample Variance       0.00251       0.00176
Count                 25            27

Is the difference in the sample means significant?

Note: even if the variances are unknown, we could venture to assume that they are equal, since the manufacturing process is presumably the same for both. Also, it is not unreasonable to assume that a theoretical normal distribution model is appropriate, since the variation is probably due to many small and, hopefully, independent factors. Since we are assuming that both are simple random samples, we might extend this to assume that the two samples are independent of each other.

Cigarette Smoke

Data set #4 reports some measurements of nicotine, tar and carbon monoxide for several brands of cigarettes. This is a typical data set that you could find on the web.
The main obstacle to doing some statistics on such a data set is that the data have already been processed (presumably, the numbers are averages over, hopefully, proper samples of the various brands) and do not conform to the requirements for our methods to be applicable (i.e., to come from a simple random sample). Having stipulated that any conclusion we may draw is poorly substantiated, we may boldly try to use our methods to compare, for example, CO output between menthol and filtered non-menthol cigarettes. Without further knowledge of the manufacturing process it is also hard to guess whether the two samples (menthol and filtered cigarettes) are independent of each other. All the modeling assumptions behind our tools are, to be kind, on very shaky ground, but, oh, well. There seems to be no special reason to assume that the variances for the two samples should be the same, so we won't assume that. The summary statistics are the following:

                      Menthol CO    Filtered CO
Mean                  14.96         14.8
Standard Error        0.83363       0.84656
Standard Deviation    4.16813       4.23281
Sample Variance       17.3733       17.9167
Count                 25            25

Content of Cola Cans

Data set #17 lists measurements of weight and volume of cans of different varieties of Cola beverages (regular and diet Coke and Pepsi). As usual, we assume that these measures are the output of a simple random sample for each category. It would be a stretch to assume that the variance could be the same for Coke and Pepsi products (after all, they come from different environments), but it can be argued that the variances could be the same for products from the same manufacturer. Then again, we would need to know much more about the samples: for example, do they come from the same factory or not, are the production lines similar or different, and so on.
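The choice made for each data set in this worksheet, between assuming equal variances and not assuming them, corresponds to two different test statistics. As a reminder, and in standard textbook notation that the worksheet itself does not spell out (sample means \bar{x}_i, sample variances s_i^2 and sample sizes n_i are symbols introduced here), the two cases can be sketched as follows. The second degrees-of-freedom formula is the usual Welch-Satterthwaite approximation; some textbooks use the simpler, more conservative min(n_1 - 1, n_2 - 1) for hand calculations.

```latex
% Equal variances assumed (pooled two-sample t test)
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2

% Equal variances not assumed (Welch's t test)
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\mathrm{df} \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2 / n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^{2}}{n_2 - 1}}
```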
If you go and look at the actual data, you will be happy to notice that the volume of each type is consistently higher than the statutory 12 oz (just one data point, a regular Coke, is lower). Thus, testing the hypothesis that the true mean volume is 12 oz against the alternative that it is less would yield an obvious result (we cannot reject the hypothesis that the average volume is 12 oz). On the other hand, just looking at the data, you can guess that the average weight of regular soda (for both brands) should test as higher than the weight of diet. In both cases, it's not so much a matter of eyeballing as of making an educated guess as to what a standard test would produce (you should, as extra credit, find out what the p-value of a test “H0: the average weights are the same vs. H1: regular Cola weighs more than Diet” turns out to be).

As an example, we will compare two products, in this case the volume of regular Coke vs. the volume of regular Pepsi. As mentioned, it is a stretch to assume that the variances should be equal. On the other hand, we will have to assume that the data come from normal distributions, and that they represent simple random samples. The summary statistics for the two volume samples are the following:

                      Coke regular    Pepsi regular
Mean                  12.1944         12.2917
Standard Error        0.01908         0.01511
Standard Deviation    0.11450         0.09063
Sample Variance       0.01311         0.00821
Count                 36              36
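
As a rough check on the summary tables, the test statistics can be computed directly from the means, variances and counts. The sketch below is only an illustration and is not part of the original worksheet: the function names pooled_t and welch_t are made up here, the pooled version is applied to the M&M's (where equal variances were assumed) and the Welch version to regular Coke vs. regular Pepsi (where they were not). It uses only the Python standard library and stops at the t statistic and the degrees of freedom.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and df for two independent samples, equal variances assumed."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and approximate df, equal variances NOT assumed."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Blue vs. orange M&M's (summary statistics from the worksheet).
t_mm, df_mm = pooled_t(0.8578, 0.00251, 25, 0.85604, 0.00176, 27)

# Regular Coke vs. regular Pepsi volumes (summary statistics from the worksheet).
t_cola, df_cola = welch_t(12.1944, 0.01311, 36, 12.2917, 0.00821, 36)

print(f"M&M's: t = {t_mm:.3f}, df = {df_mm}")
print(f"Cola:  t = {t_cola:.3f}, df = {df_cola:.1f}")
```

With these inputs the M&M statistic comes out at about 0.14 on 50 degrees of freedom, and the cola statistic at about -4.0 on roughly 66 degrees of freedom. Either pair can then be compared with a t table, or passed to statistical software, to obtain a p-value. The same two functions can be applied to the simulated data and to the cigarette table.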