“Real” Data Examples from a previous edition of our book

All of the following examples are taken from the DVD that came with the previous edition of our book. Some are in the current edition as well, and some are not. For space reasons, the detailed data sets are not reported here. We should calculate descriptive statistics for all these data sets. Various inference problems can be defined depending on the details.

Confidence Intervals

1. Time spent watching online videos

Summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 205.45 | 30 | 1503.20 |

Descriptive statistics

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 6.84833 | 0.33255 | 6.55 | 1.82145 | 3.31767 | 6.4 | 3.8 | 10.2 |

Confidence interval

| 95% CI for the Mean, from | to |
| --- | --- |
| 6.16819 | 7.52847 |

2. Grams of carbohydrates in fast food sandwiches

Summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 1259 | 30 | 56855 |

Descriptive statistics

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 41.9667 | 2.14930 | 38.5 | 11.7722 | 138.585 | 37 | 26 | 63 |

Confidence interval

| 95% CI for the Mean, from | to |
| --- | --- |
| 37.5708 | 46.3625 |

3. MPG for sports cars

Summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 548 | 25 | 12300 |

Descriptive statistics

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 21.92 | 0.69263 | 22 | 3.46314 | 11.9933 | 14 | 13 | 27 |

Confidence interval

| 95% CI for the Mean, from | to |
| --- | --- |
| 20.4905 | 23.3495 |

Statistical Tests

In the following, besides computing various descriptive statistics (whether or not they are needed for testing purposes), we can always compute confidence intervals as well. Incidentally, a confidence interval at confidence level 1 − α is equivalent to a two-tailed test at significance level α: the test is significant if and only if the Null Hypothesis value is outside the corresponding confidence interval.

Testing for the mean

1. Pay for advertising executives in Denver

Summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 2,330,734 | 35 | 155,402,469,208 |

Descriptive statistics and confidence interval

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum | Sum | Count | Sum of Squares |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 66592.4 | 403.033 | 66150 | 2384.38 | 5.7E+06 | 7956 | 62419 | 70375 | 2.3E+06 | 35 | 1.6E+11 |

| 95% CI for the Mean, from | to |
| --- | --- |
| 65773.3 | 67411.5 |

Is it higher than the national average of $66,200?

| t-score | p-value |
| --- | --- |
| 0.97362 | 0.16856 |

2. Life of fluorescent bulbs

Summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 306588 | 32 | 2,921,601,102 |

Descriptive statistics and confidence interval

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 9580.88 | 304.476 | 10002.5 | 1722.38 | 3E+06 | 6891 | 6110 | 13001 |

| 95% CI for the Mean, from | to |
| --- | --- |
| 8959.89 | 10201.9 |

Do they last at least 10,000 hours?

| t-score | p-value |
| --- | --- |
| −1.377 | 0.08926 |

3. Evacuation times

Summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 2450 | 50 | 142726 |

Descriptive statistics and confidence interval

| Mean | Standard Error | Mode | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 49 | 3.04229 | 43 | 21.5122 | 462.776 | 95 | 7 | 102 |

| 95% CI for the Mean, from | to |
| --- | --- |
| 42.8863 | 55.1137 |

Is the average less than 60 seconds?

| t-score | p-value |
| --- | --- |
| −3.616 | 0.00035 |

Paired Samples

As we noted, paired samples refer to two (paired) sets of measurements, but the testing is done on the differences; hence it is, in practice, a one-sample test. Since we have two data sets, we could also compute descriptive statistics and confidence intervals for each set separately (not reported here, for brevity), but we should not use the latter in place of the proper paired-sample test to check whether the second set is essentially unchanged from the first.

4. Does a finance seminar help improve credit scores?
Difference summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 614 | 12 | 44844 |

Difference descriptive statistics and confidence interval

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 51.1667 | 10.0859 | 46.5 | 34.9385 | 1220.7 | 112 | −6 | 106 |

| 95% CI for the Mean, from | to |
| --- | --- |
| 28.9678 | 73.3655 |

Testing statistics

| | Before | After |
| --- | --- | --- |
| Mean | 638.417 | 689.583 |
| Variance | 597.72 | 1626.27 |
| Observations | 12 | 12 |
| Pearson Correlation | 0.5088 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −51.17 | |
| Variance of the Differences | 1220.7 | |
| df | 11 | |
| t Stat | −5.073 | |
| P(T<=t) one-tail | 0.00018 | |
| t Critical one-tail | 1.79588 | |
| P(T<=t) two-tail | 0.00036 | |
| t Critical two-tail | 2.20099 | |

5. Does a herbal medicine help people sleep?

Difference summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 6.1 | 14 | 8.6 |

Difference descriptive statistics and confidence interval

| Mean | Standard Error | Median | Mode | Standard Deviation | Sample Variance | Kurtosis | Skewness | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.43571 | 0.18084 | 0.2 | −0.1 | 0.67665 | 0.45786 | 1.73148 | 1.62583 | 2 | −0.1 | 1.9 |

| 95% CI for the Mean, from | to |
| --- | --- |
| 0.04503 | 0.8264 |

Testing statistics

| | No medicine | With medicine |
| --- | --- | --- |
| Mean | 4.13571 | 4.57143 |
| Variance | 2.24093 | 1.14374 |
| Observations | 14 | 14 |
| Pearson Correlation | 0.91409 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −0.436 | |
| Variance of the Differences | 0.45786 | |
| df | 13 | |
| t Stat | −2.409 | |
| P(T<=t) one-tail | 0.01576 | |
| t Critical one-tail | 1.77093 | |
| P(T<=t) two-tail | 0.03153 | |
| t Critical two-tail | 2.16037 | |

6. Does a SAT preparation course help improve scores?
Difference summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 599 | 10 | 42359.0 |

Difference descriptive statistics and confidence interval

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 59.9 | 8.48456 | 57.5 | 26.8305 | 719.878 | 83 | 29 | 112 |

| 95% CI for the Mean, from | to |
| --- | --- |
| 40.7066 | 79.0934 |

Testing statistics

| | Before course | After course |
| --- | --- | --- |
| Mean | 385.6 | 445.5 |
| Variance | 3878.27 | 4941.61 |
| Observations | 10 | 10 |
| Pearson Correlation | 0.92513 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −59.9 | |
| Variance of the Differences | 719.878 | |
| df | 9 | |
| t Stat | −7.060 | |
| P(T<=t) one-tail | 3E−05 | |
| t Critical one-tail | 1.83311 | |
| P(T<=t) two-tail | 5.9E−05 | |
| t Critical two-tail | 2.26216 | |

7. Do scoring statistics improve from the rookie to the sophomore years in basketball?

Difference summaries

| Sum | Count | Sum of Squares |
| --- | --- | --- |
| 12.8 | 10 | 100.16 |

Difference descriptive statistics and confidence interval

| Mean | Standard Error | Median | Standard Deviation | Sample Variance | Range | Minimum | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1.28 | 0.9648 | 1.2 | 3.05097 | 9.30844 | 11.4 | −5.5 | 5.9 |

| 95% CI for the Mean, from | to |
| --- | --- |
| −0.903 | 3.46254 |

Testing statistics

| | Rookie | Sophomore |
| --- | --- | --- |
| Mean | 13.95 | 15.23 |
| Variance | 6.805 | 15.3579 |
| Observations | 10 | 10 |
| Pearson Correlation | 0.6287 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −1.28 | |
| Variance of the Differences | 9.30844 | |
| df | 9 | |
| t Stat | −1.327 | |
| P(T<=t) one-tail | 0.10864 | |
| t Critical one-tail | 1.83311 | |
| P(T<=t) two-tail | 0.21728 | |
| t Critical two-tail | 2.26216 | |

Independent Samples

We have to decide whether or not to assume that the two samples come from populations with the same variance. In the following we apply both methods, and we also work out the descriptive statistics and confidence intervals for the two samples separately.
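To make the difference between the two methods concrete, here is a minimal Python sketch that computes the t statistic and the degrees of freedom under each assumption, starting only from each sample's mean, variance, and size. The function names `welch_t` and `pooled_t` are our own, chosen for illustration, and the inputs are the Air and Helium football figures reported further on. The sketch uses the standard Welch-Satterthwaite formula for the unequal-variances degrees of freedom; we assume, but cannot confirm, that this is what the spreadsheet output is based on.

```python
import math
from typing import NamedTuple


class TwoSampleResult(NamedTuple):
    t_stat: float
    df: float


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variances t statistic with Welch-Satterthwaite degrees of freedom."""
    a, b = var1 / n1, var2 / n2          # squared standard errors of each mean
    t_stat = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return TwoSampleResult(t_stat, df)


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Equal-variances t statistic based on the pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return TwoSampleResult(t_stat, float(df))


# Air vs. Helium footballs: mean, sample variance, count for each sample.
air = (26.7931, 15.3128, 29)
helium = (27.5172, 40.1872, 29)

welch = welch_t(*air, *helium)
pooled = pooled_t(*air, *helium)
print(f"Unequal variances: t = {welch.t_stat:.4f}, df = {welch.df:.4f}")
print(f"Equal variances:   t = {pooled.t_stat:.4f}, df = {pooled.df:.0f}")
```

Because the two football samples have the same size, the two t statistics coincide and only the degrees of freedom differ; with unequal sample sizes, as in the steel bar example, the statistics differ as well.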
Note that the confidence intervals should not be used instead of the proper test to check whether it is reasonable to assume that the two means are equal (early on, the book has a problem on proportions where it suggests you do just that).

Distance Traveled by Air- and Helium-filled Footballs

Summaries for both samples

| | Air | Helium |
| --- | --- | --- |
| Sum | 777 | 798 |
| Count | 29 | 29 |
| Sum of Squares | 21247 | 23084 |

Descriptive statistics and confidence intervals for the two samples

| | Air | Helium |
| --- | --- | --- |
| Mean | 26.7931 | 27.5172 |
| Standard Error | 0.72666 | 1.17719 |
| Median | 27 | 29 |
| Standard Deviation | 3.91316 | 6.33934 |
| Sample Variance | 15.3128 | 40.1872 |
| Range | 15 | 28 |
| Minimum | 19 | 11 |
| Maximum | 34 | 39 |
| 95% CI for the Mean, from | 25.3046 | 25.1059 |
| 95% CI for the Mean, to | 28.2816 | 29.9286 |

Testing statistics

Unequal Variances

| | Air | Helium |
| --- | --- | --- |
| Mean | 26.7931 | 27.5172 |
| Variance | 15.3128 | 40.1872 |
| Observations | 29 | 29 |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −0.7241 | |
| df | 46.6328 | |
| t Stat | −0.5234 | |
| P(T<=t) one-tail | 0.30157 | |
| t Critical one-tail | 1.67819 | |
| P(T<=t) two-tail | 0.60314 | |
| t Critical two-tail | 2.01216 | |

Equal Variances

| | Air | Helium |
| --- | --- | --- |
| Mean | 26.7931 | 27.5172 |
| Variance | 15.3128 | 40.1872 |
| Observations | 29 | 29 |
| Pooled Variance | 27.75 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −0.7241 | |
| df | 56 | |
| t Stat | −0.5234 | |
| P(T<=t) one-tail | 0.30136 | |
| t Critical one-tail | 1.67252 | |
| P(T<=t) two-tail | 0.60273 | |
| t Critical two-tail | 2.00324 | |

Body Temperatures of Men and Women

Summaries for both samples

| | Men | Women |
| --- | --- | --- |
| Sum | 6376.8 | 6395.6 |
| Count | 65 | 65 |
| Sum of Squares | 625625 | 629323 |

Descriptive statistics and confidence intervals for the two samples

| | Men | Women |
| --- | --- | --- |
| Mean | 98.1046 | 98.3938 |
| Standard Error | 0.08667 | 0.09222 |
| Median | 98.1 | 98.4 |
| Standard Deviation | 0.69876 | 0.74349 |
| Sample Variance | 0.48826 | 0.55277 |
| Range | 3.2 | 4.4 |
| Minimum | 96.3 | 96.4 |
| Maximum | 99.5 | 100.8 |
| 95% CI for the Mean, from | 97.9315 | 98.2096 |
| 95% CI for the Mean, to | 98.2778 | 98.5781 |

Testing statistics

Unequal Variances

| | Men | Women |
| --- | --- | --- |
| Mean | 98.1046 | 98.3938 |
| Variance | 0.48826 | 0.55277 |
| Observations | 65 | 65 |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −0.2892 | |
| df | 127.510 | |
| t Stat | −2.2854 | |
| P(T<=t) one-tail | 0.01197 | |
| t Critical one-tail | 1.65689 | |
| P(T<=t) two-tail | 0.02394 | |
| t Critical two-tail | 1.97874 | |

Equal Variances

| | Men | Women |
| --- | --- | --- |
| Mean | 98.1046 | 98.3938 |
| Variance | 0.48826 | 0.55277 |
| Observations | 65 | 65 |
| Pooled Variance | 0.52052 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −0.2892 | |
| df | 128 | |
| t Stat | −2.2854 | |
| P(T<=t) one-tail | 0.01197 | |
| t Critical one-tail | 1.65685 | |
| P(T<=t) two-tail | 0.02393 | |
| t Critical two-tail | 1.97867 | |

Time Devoted to Study in 1981 and Now

Summaries for both samples

| | 1981 | Now |
| --- | --- | --- |
| Sum | 1138.3 | 1679.6 |
| Count | 35 | 35 |
| Sum of Squares | 37702.3 | 81337.2 |

Descriptive statistics and confidence intervals for the two samples

| | 1981 | Now |
| --- | --- | --- |
| Mean | 32.5229 | 47.9886 |
| Standard Error | 0.75679 | 0.78620 |
| Median | 33 | 47.9 |
| Standard Deviation | 4.47720 | 4.65123 |
| Sample Variance | 20.0453 | 21.6340 |
| Range | 19.2 | 19 |
| Minimum | 21.9 | 39 |
| Maximum | 41.1 | 58 |
| 95% CI for the Mean, from | 30.9849 | 46.3908 |
| 95% CI for the Mean, to | 34.0608 | 49.5863 |

Testing statistics

Unequal Variances

| | 1981 | Now |
| --- | --- | --- |
| Mean | 32.5229 | 47.9886 |
| Variance | 20.0453 | 21.6340 |
| Observations | 35 | 35 |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −15.466 | |
| df | 67.9014 | |
| t Stat | −14.172 | |
| P(T<=t) one-tail | 2.9E−22 | |
| t Critical one-tail | 1.66761 | |
| P(T<=t) two-tail | 5.8E−22 | |
| t Critical two-tail | 1.99552 | |

Equal Variances

| | 1981 | Now |
| --- | --- | --- |
| Mean | 32.5229 | 47.9886 |
| Variance | 20.0453 | 21.6340 |
| Observations | 35 | 35 |
| Pooled Variance | 20.8397 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | −15.466 | |
| df | 68 | |
| t Stat | −14.172 | |
| P(T<=t) one-tail | 2.8E−22 | |
| t Critical one-tail | 1.66757 | |
| P(T<=t) two-tail | 5.6E−22 | |
| t Critical two-tail | 1.99547 | |

Steel Bar Resilience for Two Different Manufacturing Methods

Summaries for both samples

| | New | Old |
| --- | --- | --- |
| Sum | 6847 | 5376 |
| Count | 17 | 14 |
| Sum of Squares | 2759789 | 2068456 |

Descriptive statistics and confidence intervals for both samples

| | New | Old |
| --- | --- | --- |
| Mean | 402.765 | 384 |
| Standard Error | 2.75138 | 4.73008 |
| Median | 402 | 382.5 |
| Standard Deviation | 11.3442 | 17.6983 |
| Sample Variance | 128.691 | 313.231 |
| Range | 36 | 67 |
| Minimum | 386 | 352 |
| Maximum | 422 | 419 |
| 95% CI for the Mean, from | 396.932 | 373.781 |
| 95% CI for the Mean, to | 408.597 | 394.219 |

Testing statistics

Unequal Variances

| | New | Old |
| --- | --- | --- |
| Mean | 402.765 | 384 |
| Variance | 128.691 | 313.231 |
| Observations | 17 | 14 |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | 18.7647 | |
| df | 21.3037 | |
| t Stat | 3.42917 | |
| P(T<=t) one-tail | 0.00124 | |
| t Critical one-tail | 1.71961 | |
| P(T<=t) two-tail | 0.00248 | |
| t Critical two-tail | 2.07781 | |

Same Variances

| | New | Old |
| --- | --- | --- |
| Mean | 402.765 | 384 |
| Variance | 128.691 | 313.231 |
| Observations | 17 | 14 |
| Pooled Variance | 211.416 | |
| Hypothesized Mean Difference | 0 | |
| Observed Mean Difference | 18.7647 | |
| df | 29 | |
| t Stat | 3.57586 | |
| P(T<=t) one-tail | 0.00062 | |
| t Critical one-tail | 1.69913 | |
| P(T<=t) two-tail | 0.00125 | |
| t Critical two-tail | 2.04523 | |
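Since the detailed data sets are not reported, it may help to see that the three summaries (Sum, Count, Sum of Squares) are enough to recover the mean, sample variance, standard error, and confidence interval. The following is a minimal Python sketch using the steel bar summaries; the function names are our own, and the critical value 2.1199 (two-tailed 95%, 16 degrees of freedom) is taken from a standard t table rather than from the handout.

```python
import math


def describe_from_summaries(total, count, sum_sq):
    """Recover mean, sample variance, standard deviation and standard error
    from the Sum, Count and Sum of Squares of a sample."""
    mean = total / count
    # Sample variance via the shortcut formula: (sum of squares - n * mean^2) / (n - 1)
    variance = (sum_sq - total ** 2 / count) / (count - 1)
    std_dev = math.sqrt(variance)
    std_err = std_dev / math.sqrt(count)
    return {"mean": mean, "variance": variance,
            "std_dev": std_dev, "std_err": std_err}


def confidence_interval(mean, std_err, t_crit):
    """Two-sided confidence interval: mean +/- t_crit * standard error."""
    half_width = t_crit * std_err
    return mean - half_width, mean + half_width


# Assumption: two-tailed 95% critical value for df = 17 - 1 = 16, from a t table.
T_CRIT_DF16 = 2.1199

new_bars = describe_from_summaries(6847, 17, 2759789)
low, high = confidence_interval(new_bars["mean"], new_bars["std_err"], T_CRIT_DF16)

print(f"Mean = {new_bars['mean']:.3f}, variance = {new_bars['variance']:.3f}, "
      f"standard error = {new_bars['std_err']:.5f}")
print(f"95% CI for the mean: {low:.3f} to {high:.3f}")
```

The shortcut variance formula can lose precision when the sum of squares and the squared sum are both very large and nearly equal, so for raw data a two-pass computation is usually the safer choice.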