A selection of book data sets from the previous edition to illustrate various tools that we have studied in this quarter

“Real” Data Examples from a previous edition of our book
All of the following examples are taken from the DVD that came with the previous edition of our book. Some are in the current
edition as well, and some are not. For space reasons, the detailed data sets are not reported here. We should calculate descriptive
statistics for all these data sets. Various inference problems can be defined, depending on the details.
Confidence Intervals
1. Time spent watching online videos
Summaries:
Sum                    205.45
Count                  30
Sum of Squares         1503.20
Descriptive statistics
Mean                   6.84833
Standard Error         0.33255
Median                 6.55
Standard Deviation     1.82145
Sample Variance        3.31767
Range                  6.4
Minimum                3.8
Maximum                10.2
Confidence Interval
95% CI for the Mean    from 6.16819 to 7.52847
2. Grams of Carbohydrates in fast food sandwiches
Summaries:
Sum                    1259
Count                  30
Sum of Squares         56855
Descriptive Statistics
Mean                   41.9667
Standard Error         2.14930
Median                 38.5
Standard Deviation     11.7722
Sample Variance        138.585
Range                  37
Minimum                26
Maximum                63
Confidence Interval
95% CI for the Mean    from 37.5708 to 46.3625
3. MPG for Sports Cars
Summaries
Sum                    548
Count                  25
Sum of Squares         12300
Descriptive Statistics
Mean                   21.92
Standard Error         0.69263
Median                 22
Standard Deviation     3.46314
Sample Variance        11.9933
Range                  14
Minimum                13
Maximum                27
Confidence Interval
95% CI for the Mean    from 20.4905 to 23.3495
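The summaries reported for each data set (sum, count, sum of squares) are enough to recover the mean, the sample variance, the standard error and the confidence interval. The following Python sketch shows the arithmetic for the sports car data. The function names are illustrative only, and the critical value 2.0639 (95% confidence, 24 degrees of freedom) is an assumption taken from a standard t table, not from the data set.

```python
import math

def describe_from_sums(total, n, sum_sq):
    """Recover mean, sample variance, SD and standard error from
    the sum, the count and the sum of squares of a sample."""
    mean = total / n
    variance = (sum_sq - total ** 2 / n) / (n - 1)  # sample variance
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)
    return mean, variance, sd, se

def confidence_interval(mean, se, t_crit):
    """Two-sided confidence interval for the mean: mean -/+ t_crit * se."""
    margin = t_crit * se
    return mean - margin, mean + margin

# MPG for sports cars: Sum = 548, Count = 25, Sum of Squares = 12300.
mean, variance, sd, se = describe_from_sums(548, 25, 12300)
# Assumed critical value for 95% confidence and df = 24 (standard t table).
low, high = confidence_interval(mean, se, t_crit=2.0639)
print(f"mean={mean:.2f} variance={variance:.4f} sd={sd:.5f} se={se:.5f}")
print(f"95% CI: {low:.4f} to {high:.4f}")
```

The printed values should agree with the sports car table up to rounding. The same two functions apply to the other data sets once the critical value for the appropriate degrees of freedom is supplied.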
Statistical Tests
In the following, besides computing various descriptive statistics, whether necessary for testing purposes or not, we can always
also compute confidence intervals. Incidentally, a confidence interval at confidence level 1 − α is equivalent to a two-tailed test at
significance level α (the test is significant if and only if the Null Hypothesis value is outside the corresponding confidence
interval).
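The equivalence between the confidence interval and the two-tailed test can be checked numerically. The sketch below is a minimal Python illustration using the sports car mean and standard error from the previous section. The null values 20, 22 and 24 are hypothetical, and the critical value 2.0639 (95% confidence, 24 degrees of freedom) is assumed from a standard t table.

```python
def two_tailed_rejects(mean, se, mu0, t_crit):
    """Two-tailed t test at level alpha: reject when |t| exceeds t_crit."""
    return abs((mean - mu0) / se) > t_crit

def outside_interval(mean, se, mu0, t_crit):
    """True when mu0 falls outside the (1 - alpha) confidence interval."""
    low, high = mean - t_crit * se, mean + t_crit * se
    return mu0 < low or mu0 > high

# Sports car data; t_crit is an assumed table value for df = 24.
mean, se, t_crit = 21.92, 0.69263, 2.0639
results = []
for mu0 in (20, 22, 24):  # hypothetical Null Hypothesis values
    reject = two_tailed_rejects(mean, se, mu0, t_crit)
    outside = outside_interval(mean, se, mu0, t_crit)
    results.append((mu0, reject, outside))
    print(mu0, reject, outside)  # the two answers agree for every mu0
```

Both functions compare the same distance, |mean − mu0|, with the same margin, t_crit × se, which is why they cannot disagree.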
Testing for the mean
1. Pay for advertising executives in Denver
Summaries
Sum                    2,330,734
Count                  35
Sum of Squares         155,402,469,208
Descriptive Statistics and Confidence Interval
Mean                   66592.4
Standard Error         403.033
Median                 66150
Standard Deviation     2384.38
Sample Variance        5.7E+06
Range                  7956
Minimum                62419
Maximum                70375
Sum                    2.3E+06
Count                  35
Sum of Squares         1.6E+11
95% CI for the Mean    from 65773.3 to 67411.5
Is it higher than the national average of $66,200?
t-score                0.97362
p-value                0.16856
2. Life of fluorescent bulbs
Summaries
Sum                    306588
Count                  32
Sum of Squares         2,921,601,102
Descriptive statistics and confidence interval
Mean                   9580.88
Standard Error         304.476
Median                 10002.5
Standard Deviation     1722.38
Sample Variance        3E+06
Range                  6891
Minimum                6110
Maximum                13001
95% CI for the Mean    from 8959.89 to 10201.9
Do they last at least 10,000 hours?
t-score                −1.377
p-value                0.08926
3. Evacuation times
Summaries
Sum                    2450
Count                  50
Sum of Squares         142726
Descriptive Statistics and Confidence Interval
Mean                   49
Standard Error         3.04229
Mode                   43
Standard Deviation     21.5122
Sample Variance        462.776
Range                  95
Minimum                7
Maximum                102
95% CI for the Mean    from 42.8863 to 55.1137
Is the average less than 60 seconds?
t-score                −3.616
p-value                0.00035
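The t-scores and one-tailed p-values in this section follow from the mean, the standard error and the hypothesized value. The Python sketch below reproduces them with the standard library only. The function names are illustrative, and the t distribution function is approximated by Simpson's rule applied to the density, a numerical shortcut chosen here to avoid external dependencies and not necessarily the way the tables were produced.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def one_sample_t(mean, se, mu0, n):
    """Return the t-score with its lower-tail and upper-tail p-values."""
    t = (mean - mu0) / se
    lower = t_cdf(t, n - 1)
    return t, lower, 1.0 - lower

# Denver pay: is the mean higher than 66,200? (upper-tail p-value)
t_pay, _, p_pay = one_sample_t(66592.4, 403.033, 66200, 35)
# Fluorescent bulbs: hypothesized mean 10,000 hours (lower-tail p-value)
t_bulb, p_bulb, _ = one_sample_t(9580.88, 304.476, 10000, 32)
# Evacuation times: is the mean less than 60 seconds? (lower-tail p-value)
t_evac, p_evac, _ = one_sample_t(49, 3.04229, 60, 50)

for name, t, p in [("pay", t_pay, p_pay), ("bulbs", t_bulb, p_bulb),
                   ("evacuation", t_evac, p_evac)]:
    print(f"{name}: t = {t:.5f}, p = {p:.5f}")
```

Which tail is reported depends on the direction of the question being asked, so the function returns both and the caller picks the relevant one.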
Paired Samples
As we noted, paired samples refer to two (paired) sets of measurements, but the testing is done on the differences; hence it is, in
practice, a one-sample test. Since we have two data sets, we could also compute descriptive statistics and confidence intervals
for each separately (not reported here, for brevity), but we should not use the latter in place of the proper paired sample test to
check whether the second set is essentially unchanged from the first.
4. Does a finance seminar help improve credit scores?
Difference summaries
Sum                    614
Count                  12
Sum of Squares         44844
Difference descriptive statistics and confidence interval
Mean                   51.1667
Standard Error         10.0859
Median                 46.5
Standard Deviation     34.9385
Sample Variance        1220.7
Range                  112
Minimum                −6
Maximum                106
95% CI for the Mean    from 28.9678 to 73.3655
Testing statistics
                                Before      After
Mean                            638.417     689.583
Variance                        597.72      1626.27
Observations                    12          12
Pearson Correlation             0.5088
Hypothesized Mean Difference    0
Observed Mean Difference        −51.17
Variance of the Differences     1220.7
df                              11
t Stat                          −5.073
P (T<=t) one-tail               0.00018
t Critical one-tail             1.79588
P (T<=t) two-tail               0.00036
t Critical two-tail             2.20099
5. Does a herbal medicine help people sleep?
Difference summaries
Sum                    6.1
Count                  14
Sum of Squares         8.6
Difference descriptive statistics and confidence interval
Mean                   0.43571
Standard Error         0.18084
Median                 0.2
Mode                   −0.1
Standard Deviation     0.67665
Sample Variance        0.45786
Kurtosis               1.73148
Skewness               1.62583
Range                  2
Minimum                −0.1
Maximum                1.9
95% CI for the Mean    from 0.04503 to 0.8264
Testing statistics
                                No medicine    With medicine
Mean                            4.13571        4.57143
Variance                        2.24093        1.14374
Observations                    14             14
Pearson Correlation             0.91409
Hypothesized Mean Difference    0
Observed Mean Difference        −0.436
Variance of the Differences     0.45786
df                              13
t Stat                          −2.409
P (T<=t) one-tail               0.01576
t Critical one-tail             1.77093
P (T<=t) two-tail               0.03153
t Critical two-tail             2.16037
6. Does a SAT preparation course help improve scores?
Difference summaries
Sum                    599
Count                  10
Sum of Squares         42359.0
Difference descriptive statistics and confidence interval
Mean                   59.9
Standard Error         8.48456
Median                 57.5
Standard Deviation     26.8305
Sample Variance        719.878
Range                  83
Minimum                29
Maximum                112
95% CI for the Mean    from 40.7066 to 79.0934
Testing statistics
                                Before course    After course
Mean                            385.6            445.5
Variance                        3878.27          4941.61
Observations                    10               10
Pearson Correlation             0.92513
Hypothesized Mean Difference    0
Observed Mean Difference        −59.9
Variance of the Differences     719.878
df                              9
t Stat                          −7.060
P (T<=t) one-tail               3E−05
t Critical one-tail             1.83311
P (T<=t) two-tail               5.9E−05
t Critical two-tail             2.26216
7. Do scoring statistics improve from the rookie to the sophomore years in basketball?
Difference summaries
Sum                    12.8
Count                  10
Sum of Squares         100.16
Difference descriptive statistics and confidence interval
Mean                   1.28
Standard Error         0.9648
Median                 1.2
Standard Deviation     3.05097
Sample Variance        9.30844
Range                  11.4
Minimum                −5.5
Maximum                5.9
95% CI for the Mean    from −0.903 to 3.46254
Testing statistics
                                Rookie      Sophomore
Mean                            13.95       15.23
Variance                        6.805       15.3579
Observations                    10          10
Pearson Correlation             0.6287
Hypothesized Mean Difference    0
Observed Mean Difference        −1.28
Variance of the Differences     9.30844
df                              9
t Stat                          −1.327
P (T<=t) one-tail               0.10864
t Critical one-tail             1.83311
P (T<=t) two-tail               0.21728
t Critical two-tail             2.26216
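Two points about paired samples can be verified from the figures reported for the SAT preparation example: the test is a one-sample t test on the differences, and the variance of the differences is tied to the two sample variances through their correlation. The Python sketch below checks both, and also computes, for comparison only, the statistic that would result from ignoring the pairing. The function names are illustrative.

```python
import math

def paired_t_from_differences(sum_d, n, sum_sq_d):
    """One-sample t statistic (null mean 0) from the sum, count and
    sum of squares of the paired differences."""
    mean_d = sum_d / n
    var_d = (sum_sq_d - sum_d ** 2 / n) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return mean_d, var_d, t

def variance_of_differences(var_a, var_b, r):
    """Var(A - B) = Var(A) + Var(B) - 2 r SD(A) SD(B)."""
    return var_a + var_b - 2 * r * math.sqrt(var_a * var_b)

# SAT course, differences After - Before: Sum = 599, Count = 10, Sum of Squares = 42359.
mean_d, var_d, t_paired = paired_t_from_differences(599, 10, 42359.0)

# The same variance from the Before/After variances and their Pearson correlation.
var_check = variance_of_differences(3878.27, 4941.61, 0.92513)

# For comparison only: the statistic obtained by ignoring the pairing.
t_unpaired = mean_d / math.sqrt((3878.27 + 4941.61) / 10)

print(f"mean difference = {mean_d:.1f}, variance = {var_d:.3f}, paired t = {t_paired:.3f}")
print(f"variance from correlation = {var_check:.2f}")
print(f"t ignoring the pairing = {t_unpaired:.3f}")
```

The sign of the paired statistic is the opposite of the one in the table because the table subtracts After from Before. The paired statistic is roughly 7.06, while ignoring the pairing gives only about 2.02: the strong positive correlation between the two sets of scores shrinks the variance of the differences, and the separate-sample analysis discards that information.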
Independent Samples
We have to decide whether we should assume that the two samples come from populations with the same variance or not. In the
following we apply both methods, and also work out the descriptive statistics and confidence intervals for the two samples
separately. Note that the confidence intervals should not be used instead of the proper test to check whether it is reasonable to
assume that the two means are equal (early on, the book has a problem on proportions where it suggests you do just that).
Distance Traveled By Air- and Helium-filled footballs
Summaries for both samples
                  Air        Helium
Sum               777        798
Count             29         29
Sum of Squares    21247      23084
Descriptive statistics and confidence intervals for the two samples
                            Air        Helium
Mean                        26.7931    27.5172
Standard Error              0.72666    1.17719
Median                      27         29
Standard Deviation          3.91316    6.33934
Sample Variance             15.3128    40.1872
Range                       15         28
Minimum                     19         11
Maximum                     34         39
95% CI for the Mean from    25.3046    25.1059
to                          28.2816    29.9286
Testing statistics
Unequal Variances
                                Air        Helium
Mean                            26.7931    27.5172
Variance                        15.3128    40.1872
Observations                    29         29
Hypothesized Mean Difference    0
Observed Mean Difference        −0.7241
df                              46.6328
t Stat                          −0.5234
P (T<=t) one-tail               0.30157
t Critical one-tail             1.67819
P (T<=t) two-tail               0.60314
t Critical two-tail             2.01216
Equal Variances
                                Air        Helium
Mean                            26.7931    27.5172
Variance                        15.3128    40.1872
Observations                    29         29
Pooled Variance                 27.75
Hypothesized Mean Difference    0
Observed Mean Difference        −0.7241
df                              56
t Stat                          −0.5234
P (T<=t) one-tail               0.30136
t Critical one-tail             1.67252
P (T<=t) two-tail               0.60273
t Critical two-tail             2.00324
Body Temperatures of Men and Women
Summaries for both samples
                  Men        Women
Sum               6376.8     6395.6
Count             65         65
Sum of Squares    625625     629323
Descriptive statistics and confidence intervals for the two samples
                            Men        Women
Mean                        98.1046    98.3938
Standard Error              0.08667    0.09222
Median                      98.1       98.4
Standard Deviation          0.69876    0.74349
Sample Variance             0.48826    0.55277
Range                       3.2        4.4
Minimum                     96.3       96.4
Maximum                     99.5       100.8
95% CI for the Mean from    97.9315    98.2096
to                          98.2778    98.5781
Testing statistics
Unequal Variances
                                Men        Women
Mean                            98.1046    98.3938
Variance                        0.48826    0.55277
Observations                    65         65
Hypothesized Mean Difference    0
Observed Mean Difference        −0.2892
df                              127.510
t Stat                          −2.2854
P (T<=t) one-tail               0.01197
t Critical one-tail             1.65689
P (T<=t) two-tail               0.02394
t Critical two-tail             1.97874
Equal Variances
                                Men        Women
Mean                            98.1046    98.3938
Variance                        0.48826    0.55277
Observations                    65         65
Pooled Variance                 0.52052
Hypothesized Mean Difference    0
Observed Mean Difference        −0.2892
df                              128
t Stat                          −2.2854
P (T<=t) one-tail               0.01197
t Critical one-tail             1.65685
P (T<=t) two-tail               0.02393
t Critical two-tail             1.97867
Time devoted to study in 1981 and now
Summaries for both samples
                  1981       Now
Sum               1138.3     1679.6
Count             35         35
Sum of Squares    37702.3    81337.2
Descriptive statistics and confidence intervals for the two samples
                            1981       Now
Mean                        32.5229    47.9886
Standard Error              0.75679    0.78620
Median                      33         47.9
Standard Deviation          4.47720    4.65123
Sample Variance             20.0453    21.6340
Range                       19.2       19
Minimum                     21.9       39
Maximum                     41.1       58
95% CI for the Mean from    30.9849    46.3908
to                          34.0608    49.5863
Testing statistics
Unequal Variances
                                1981       Now
Mean                            32.5229    47.9886
Variance                        20.0453    21.6340
Observations                    35         35
Hypothesized Mean Difference    0
Observed Mean Difference        −15.466
df                              67.9014
t Stat                          −14.172
P (T<=t) one-tail               2.9E−22
t Critical one-tail             1.66761
P (T<=t) two-tail               5.8E−22
t Critical two-tail             1.99552
Equal Variances
                                1981       Now
Mean                            32.5229    47.9886
Variance                        20.0453    21.6340
Observations                    35         35
Pooled Variance                 20.8397
Hypothesized Mean Difference    0
Observed Mean Difference        −15.466
df                              68
t Stat                          −14.172
P (T<=t) one-tail               2.8E−22
t Critical one-tail             1.66757
P (T<=t) two-tail               5.6E−22
t Critical two-tail             1.99547
Steel bar resilience for two different manufacturing methods
Summaries for both samples
                  New        Old
Sum               6847       5376
Count             17         14
Sum of Squares    2759789    2068456
Descriptive statistics and confidence intervals for both samples
                            New        Old
Mean                        402.765    384
Standard Error              2.75138    4.73008
Median                      402        382.5
Standard Deviation          11.3442    17.6983
Sample Variance             128.691    313.231
Range                       36         67
Minimum                     386        352
Maximum                     422        419
95% CI for the Mean from    396.932    373.781
to                          408.597    394.219
Testing statistics
Unequal Variances
                                New        Old
Mean                            402.765    384
Variance                        128.691    313.231
Observations                    17         14
Hypothesized Mean Difference    0
Observed Mean Difference        18.7647
df                              21.3037
t Stat                          3.42917
P (T<=t) one-tail               0.00124
t Critical one-tail             1.71961
P (T<=t) two-tail               0.00248
t Critical two-tail             2.07781
Same Variances
                                New        Old
Mean                            402.765    384
Variance                        128.691    313.231
Observations                    17         14
Pooled Variance                 211.416
Hypothesized Mean Difference    0
Observed Mean Difference        18.7647
df                              29
t Stat                          3.57586
P (T<=t) one-tail               0.00062
t Critical one-tail             1.69913
P (T<=t) two-tail               0.00125
t Critical two-tail             2.04523
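The two methods applied in this section differ only in how the standard error and the degrees of freedom are computed. The Python sketch below works both out for the steel bar data from the reported sums, counts and sample variances. The function names are illustrative, and the unequal-variance degrees of freedom use the usual Welch-Satterthwaite approximation, which is assumed here because it matches the fractional df in the tables.

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic and Welch-Satterthwaite df, variances not assumed equal."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic, df and pooled variance, assuming equal variances."""
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df, pooled

# Steel bars: (mean, sample variance, count); means recomputed from the
# sums to avoid carrying rounding error from the tables.
new = (6847 / 17, 128.691, 17)
old = (5376 / 14, 313.231, 14)

t_w, df_w = welch_t(*new, *old)
t_p, df_p, pooled = pooled_t(*new, *old)
print(f"unequal variances: t = {t_w:.5f}, df = {df_w:.4f}")
print(f"equal variances:   t = {t_p:.5f}, df = {df_p}, pooled variance = {pooled:.3f}")
```

Both statistics exceed the corresponding two-tail critical values reported in the tables (2.07781 and 2.04523), so for this data set the conclusion does not depend on the variance assumption.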