Chapter 12 Section 1
Homework Set B
12.23 The importance of recreational sports to college satisfaction. The National Intramural-Recreational
Sports Association (NIRSA) performed a survey to look at the value of recreational sports on
college campuses. One of the questions asked each student to rate the importance of recreational sports to
college satisfaction and success. Responses were on a 10-point scale, with 1 indicating total lack of
importance and 10 indicating very high importance. The following table summarizes these results:
Class       n     x-bar
Freshman    724   7.6
Sophomore   536   7.6
Junior      593   7.5
Senior      437   7.3
(a) To compare the mean scores across classes, what are the degrees of freedom for the ANOVA F
statistic?
Numerator d.f. = 4 – 1 = 3
Denominator d.f. = 724 + 536 + 593 + 437 – 4 = 2286
(b) The MSG = 11.806. If Sp = 2.16, what is the F statistic?
$F = \dfrac{\text{MSG}}{s_p^2} = \dfrac{11.806}{2.16^2} = 2.53$
Recall that $s_p^2 = \text{MSE}$.
(c) Give the P-value by using Excel, Fdist(f, df num, df denom), or the F-calculator found at
http://www.stat.tamu.edu/~west/applets/fdemo.html. What do you conclude?
Fdist(2.53, 3, 2286)
P(F > 2.53) = 0.05556. This suggests that we should consider the possibility that at least one of the classes
scored differently; mainly, it looks like the seniors have a different mean. Notice that the differences in means
are tiny, and the pooled standard deviation suggests considerable overlap between the data values. But the
sample sizes are very large, so even a small deviation from the null hypothesis (no difference in means) can be
detected. However, I have a feeling something is wrong here: I calculated x-bar (the mean of all the data) to be
7.51, and calculating MSG from it I get 9.810, which produces a P-value of 0.0978.
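The check described in part (c) can be reproduced from the summary table alone. The sketch below is only an illustration, not the method used in the answer key: the helper `f_upper_tail` is a hypothetical name, and it approximates the F tail area by numerically integrating the equivalent Beta density with Simpson's rule (assuming the denominator d.f. is larger than 2), as a stand-in for Excel's Fdist.

```python
from math import exp, lgamma, log


def f_upper_tail(f, df1, df2, steps=20000):
    """Approximate P(F > f) with Simpson's rule (steps must be even).

    Uses the fact that df1*F / (df1*F + df2) follows a Beta(df1/2, df2/2)
    distribution. Assumes df2 > 2 so the density is 0 at the right endpoint.
    """
    a, b = df1 / 2, df2 / 2
    x = df1 * f / (df1 * f + df2)
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return exp(log_norm + (a - 1) * log(t) + (b - 1) * log(1 - t))

    h = (1.0 - x) / steps
    total = density(x) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(x + i * h)
    return total * h / 3


# Summary statistics from the table in 12.23
sizes = [724, 536, 593, 437]
means = [7.6, 7.6, 7.5, 7.3]
s_pooled = 2.16

df_num = len(sizes) - 1
df_den = sum(sizes) - len(sizes)
mse = s_pooled ** 2  # pooled variance

# MSG recomputed from the group means, as the answer describes
grand_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
msg_check = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means)) / df_num

p_given = f_upper_tail(11.806 / mse, df_num, df_den)
p_check = f_upper_tail(msg_check / mse, df_num, df_den)

print("d.f.:", df_num, df_den)
print("given MSG 11.806 -> F =", round(11.806 / mse, 2), "P =", round(p_given, 4))
print("recomputed MSG", round(msg_check, 3), "-> F =", round(msg_check / mse, 2),
      "P =", round(p_check, 4))
```

Both P-values can then be compared with the 0.05556 and 0.0978 quoted in the answer.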
12.45 How long should an infant be breast-fed?
Recommendations regarding how long infants in developing countries should be
breast-fed are controversial. If the nutritional quality of the breast milk is inadequate
because the mothers are malnourished, then there is risk of inadequate nutrition for the
infant. On the other hand, the introduction of other foods carries the risk of infection
from contamination.
Further complicating the situation is the fact that companies that produce infant
formulas and other foods benefit when these foods are consumed by large numbers of
customers. One question related to this controversy concerns the amount of energy
intake for infants who have other foods introduced into the diet at different ages. Part
of one study compared the energy intakes, measured in kilocalories per day (kcal/d),
for infants who were breast-fed exclusively for 4, 5, or 6 months. Here are the data:
(a) Make a table, using the data already typed into Excel, giving the sample size, mean, and
standard deviation (= sqrt(variance)) for each group of infants. Is it reasonable to pool
the variances? Write down the table or use the copy-and-paste feature of your computer.
Energy intake (kcal/d) by group:
BF4: 499, 620, 469, 485, 660, 588, 675, 517, 649, 209, 404, 738, 628, 609, 617, 704, 558, 653, 548
BF5: 490, 395, 402, 177, 475, 617, 616, 587, 528, 518, 370, 431, 518, 639, 368, 538, 519, 506
BF6: 585, 647, 477, 445, 485, 703, 528, 465

SUMMARY
Groups   Count   Sum     Average   Variance      s
BF4      19      10830   570       15118.55556   122.9575
BF5      18      8694    483       12757.29412   112.9482
BF6      8       4335    541.875   8828.982143   93.96266

We do meet the rule that 2(93.96) > 122.95: the largest standard deviation is less than twice the smallest.
Thus it is not unreasonable to assume equal standard deviations.
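As a cross-check on the Excel summary, the same quantities can be computed with Python's standard `statistics` module. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the original worksheet.

```python
from statistics import mean, stdev, variance

# Energy intake (kcal/d) for infants breast-fed exclusively for 4, 5, or 6 months
groups = {
    "BF4": [499, 620, 469, 485, 660, 588, 675, 517, 649, 209,
            404, 738, 628, 609, 617, 704, 558, 653, 548],
    "BF5": [490, 395, 402, 177, 475, 617, 616, 587, 528,
            518, 370, 431, 518, 639, 368, 538, 519, 506],
    "BF6": [585, 647, 477, 445, 485, 703, 528, 465],
}

# (count, sum, mean, sample variance, sample standard deviation) per group
summary = {
    name: (len(v), sum(v), mean(v), variance(v), stdev(v))
    for name, v in groups.items()
}

for name, (n, total, avg, var, sd) in summary.items():
    print(f"{name}: n={n} sum={total} mean={avg:.3f} var={var:.3f} s={sd:.4f}")

# Rule of thumb used in the answer: pooling is reasonable when the
# largest standard deviation is less than twice the smallest.
sds = [row[4] for row in summary.values()]
pooling_ok = max(sds) < 2 * min(sds)
print("pooling reasonable:", pooling_ok)
```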
(b) Make a Normal quantile plot (use CrunchIt!; upload the data using file 12_45 at spot) for the data in each
of the three treatment groups. Summarize the information in the plots and draw a conclusion regarding the
Normality of these data. Make a copy of the plots.
[Figure: Normal quantile plot for BF5. Vertical axis: data (0 to 700); horizontal axis: expected z-score (-2.5 to 2.5).]
The plot of BF6 is closest to being perfectly straight, indicating that the population that we are sampling
from is close to a normal distribution. The other two are also close to straight except for the endpoints to the
left. Since there is no pattern before the dip occurs that suggests a serious move away from a straight line, we
will say that there is no strong evidence that the distributions we are sampling from are not close to normal.
(c) Make a dotplot of the data using Excel as done in class. Does the dotplot indicate that the means of all
the groups are equal or does it indicate that at least one of the means is not equal?
Notice that the dot plots clearly show the two extreme points from BF4 and BF5. A researcher would look at
those two data values more carefully and try to understand why they are so much lower than the rest of the
data points. I see that the means are very close together and that the spreads of the groups are close to being
equal. The amount of overlap in the dot plots, along with the proximity of the means and the relatively small
sample sizes, would indicate to me that the P-value will be high, leading to no evidence that the
population means are not the same.
[Figure: dotplot of BF4, BF5, and BF6; vertical axis from 100.00 to 800.00.]
(d) Run the analysis of variance using Excel. Report the F statistic with its degrees of freedom and P-value.
What do you conclude? Copy the table below.
ANOVA
Source of Variation   SS           df   MS         F             P-value    F crit
Between Groups        71288.325    2    35644.16   2.717910798   0.077625   3.219938
Within Groups         550810.875   42   13114.54
Total                 622099.2     44
The p-value of 0.0776 shows that my evidence is not strong against the null hypothesis. It would be interesting
to see what would happen if I removed those two extreme points and ran the test again.
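The sums of squares in the Excel table can be rebuilt directly from the raw data. The following is a minimal sketch of the one-way ANOVA arithmetic under the usual definitions (SSG between groups, SSE within groups); the variable names are illustrative, and the P-value itself is left to Excel or an F calculator.

```python
# Energy intake (kcal/d) by group, as listed in part (a)
groups = {
    "BF4": [499, 620, 469, 485, 660, 588, 675, 517, 649, 209,
            404, 738, 628, 609, 617, 704, 558, 653, 548],
    "BF5": [490, 395, 402, 177, 475, 617, 616, 587, 528,
            518, 370, 431, 518, 639, 368, 538, 519, 506],
    "BF6": [585, 647, 477, 445, 485, 703, 528, 465],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Between-groups sum of squares: n_i * (group mean - grand mean)^2
ssg = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())

# Within-groups sum of squares: squared deviations from each group's own mean
sse = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)

df_group = len(groups) - 1
df_error = len(all_values) - len(groups)
msg = ssg / df_group
mse = sse / df_error
f_stat = msg / mse

print(f"Between: SS={ssg:.3f} df={df_group} MS={msg:.2f}")
print(f"Within:  SS={sse:.3f} df={df_error} MS={mse:.2f}")
print(f"F = {f_stat:.4f}")
```

Rerunning the same sketch after deleting the two low values (209 in BF4 and 177 in BF5) is one way to try the follow-up suggested in the answer.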
1. Do we experience emotions differently? Do people from different cultures experience emotions differently?
One study designed to examine this question collected data from 416 college students from five different
cultures. The participants were asked to record, on a 1 (never) to 7 (always) scale, how much of the time they
typically felt eight specific emotions. These were averaged to produce the global emotion score for each
participant. Here is a summary of this measure:
Culture             n     Mean   s
European American   16    4.39   1.03
Asian American      33    4.35   1.18
Japanese            91    4.72   1.13
Indian              160   4.34   1.26
Hispanic American   80    5.04   1.16
(a) Is it reasonable to use a pooled standard deviation for these data? Why or why not?
Yes, since 2(1.03) > 1.26: the largest standard deviation is less than twice the smallest.
(b) Draw a rough sketch denoting the location of the mean for
each group and use the value of the sample standard
deviation of each group to indicate how spread the data is.
I used roughly three standard deviations away from each
sample mean.
(c) From the information given (and your sketch in (b), which lets you visualize the information), do you
think that we need to be concerned that a possible lack of Normality in the data will invalidate the
conclusions that we might draw using ANOVA to analyze the data? Give reasons for your answer.
The data are not Normal because the measurements are discrete (the only possible numbers are the integers 1
through 7), similar to the chapter 8 situation when dealing with proportions. Also, the means hover around 4
and the standard deviations around 1, so if you go three standard deviations away from the mean to
capture about 99.7% of the data, you reach the ends of the possible values of our measurements. Thus,
you hope the sample sizes are large enough to overcome the measurement type. The sample sizes of 16 and 33
are the more worrisome of the five.
(d) Fill out the table given below. Sketch a picture of the F distribution (using CrunchIt!) that illustrates
the P-value. What do you conclude? Show your work.
How to calculate the sample mean of the entire data set regardless of group:
$\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_I \bar{x}_I}{n_1 + n_2 + \cdots + n_I}$
$\bar{x} = \dfrac{16(4.39) + 33(4.35) + 91(4.72) + 160(4.34) + 80(5.04)}{16 + 33 + 91 + 160 + 80} = 4.58$
$\text{SSG} = 16(4.39 - 4.58)^2 + 33(4.35 - 4.58)^2 + 91(4.72 - 4.58)^2 + 160(4.34 - 4.58)^2 + 80(5.04 - 4.58)^2 = 30.25$
$\text{SSE} = 15(1.03)^2 + 32(1.18)^2 + 90(1.13)^2 + 159(1.26)^2 + 79(1.16)^2 = 534.12$
N = 16 + 33 + 91 + 160 + 80 = 380, so the error d.f. = 380 - 5 = 375.
Source   d.f.   SS       MS     F      P
Group    4      30.25    7.56   5.31   0.000361
Error    375    534.12   1.42
P(F > 5.31) = 0.000361. The result says that at least one of the means is different.
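When only group sizes, means, and standard deviations are available, the ANOVA table entries follow from the formulas used above. Here is a minimal sketch; `anova_from_summary` is a hypothetical helper name, and the P-value is not computed (it would still come from CrunchIt!, Excel, or an F calculator).

```python
def anova_from_summary(sizes, means, sds):
    """One-way ANOVA entries from group sizes, means, and standard deviations."""
    total_n = sum(sizes)
    k = len(sizes)
    # Weighted mean of the group means = mean of the entire data set
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    df_group, df_error = k - 1, total_n - k
    msg, mse = ssg / df_group, sse / df_error
    return {
        "grand_mean": grand_mean,
        "df_group": df_group, "df_error": df_error,
        "ssg": ssg, "sse": sse,
        "msg": msg, "mse": mse,
        "f": msg / mse,
    }


# Global emotion score summary (European American, Asian American,
# Japanese, Indian, Hispanic American)
table = anova_from_summary(
    sizes=[16, 33, 91, 160, 80],
    means=[4.39, 4.35, 4.72, 4.34, 5.04],
    sds=[1.03, 1.18, 1.13, 1.26, 1.16],
)

for key, value in table.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

Small differences from the hand calculation are expected because the code does not round the overall mean to 4.58 before squaring.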
(e) Without doing any additional formal analysis, describe the pattern in the means that appears to be
responsible for your conclusion in part (d). Are there pairs of means that are quite similar?
The Hispanic American group has the largest mean and the second is the Japanese group.
2. If a supermarket product is offered at a reduced price frequently, do customers expect the price of the
product to be lower in the future? This question was examined by researchers in a study conducted on
students enrolled in an introductory management course at a large Midwestern University. For 10 weeks
subjects received information about the products. The treatment conditions corresponded to the number of
promotions (1, 2, 3, or 4) that were described during this 10-week period. Students were randomly
assigned to four groups. Below are three possible outcomes of this study. Which one do you think
produces the smallest p-value and why?
Column   n    Mean
1        40   4.224
2        40   4.06275
3        40   3.759
4        40   3.54875

Column   n    Mean
1        20   4.1405
2        20   4.027
3        20   3.828
4        20   3.583

Column   n    Mean
1        7    4.257143
2        7    4.04
3        7    3.7042856
4        7    3.602857
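The outcome with the largest sample size (n = 40 per group) should produce the smallest P-value. Why? Notice
that the group means are about the same in the three situations, so the deviations of the group means from the
overall mean are about the same. SSE is built from the standard deviation of each group (s), and if those are
also about the same across the situations, then MSE is about the same. But SSG multiplies each squared
deviation by the sample size, and the larger samples also give more error degrees of freedom, so the larger
sample size produces a larger F statistic and a smaller P-value.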
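The effect of sample size on the numerator of F can be seen from the three tables alone. The sketch below computes MSG for each outcome; the group standard deviations are not given in the problem, so MSE and the P-values are not computed, and the variable names are illustrative.

```python
def mean_square_groups(n_per_group, means):
    """MSG for equal group sizes: n * sum((mean_i - grand mean)^2) / (k - 1)."""
    k = len(means)
    grand_mean = sum(means) / k  # equal n, so the plain average of the means
    ssg = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    return ssg / (k - 1)


outcomes = {
    40: [4.224, 4.06275, 3.759, 3.54875],
    20: [4.1405, 4.027, 3.828, 3.583],
    7: [4.257143, 4.04, 3.7042856, 3.602857],
}

msg_by_n = {n: mean_square_groups(n, means) for n, means in outcomes.items()}

for n, msg in msg_by_n.items():
    print(f"n = {n:2d} per group: MSG = {msg:.3f}")
```

If MSE is roughly the same in all three outcomes, the ordering of MSG carries over to the F statistics.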
3. Is there a relationship between the amount of time a battery lasts measured in minutes and the battery
manufacturer? Four different manufacturers of batteries were tested under the same conditions.
Manufacturer     n    Mean     s
Manufacturer 1   10   265.31   5.32
Manufacturer 2   10   277.2    4.18
Manufacturer 3   10   268.2    5.13
Manufacturer 4   10   275.03   4.26
(a) Is it reasonable to use a pooled standard deviation for these data? Why or why not?
(b) Fill out the table given below. Sketch a picture of the F distribution (using CrunchIt!) that illustrates
the P-value. What do you conclude? Show your work.
How to calculate the sample mean of the entire data set regardless of group:
$\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_I \bar{x}_I}{n_1 + n_2 + \cdots + n_I}$
$\bar{x} = \dfrac{10(265.31) + 10(277.2) + 10(268.2) + 10(275.03)}{40} = 271.435$
Source   d.f.   SS   MS   F   P
Group
Error
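The answer key leaves this table blank. One possible way to fill in the d.f., SS, MS, and F columns from the summary statistics is sketched below, using the same formulas as in problem 1; the layout and names are illustrative, and the P column is left to CrunchIt! or an F calculator.

```python
# Battery life summary (minutes), 10 batteries per manufacturer
sizes = [10, 10, 10, 10]
means = [265.31, 277.2, 268.2, 275.03]
sds = [5.32, 4.18, 5.13, 4.26]

total_n = sum(sizes)
grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n

# Pooling check: largest standard deviation less than twice the smallest
pooling_ok = max(sds) < 2 * min(sds)

ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
df_group = len(sizes) - 1
df_error = total_n - len(sizes)
msg = ssg / df_group
mse = sse / df_error
f_stat = msg / mse

print(f"overall mean = {grand_mean:.3f}, pooling reasonable: {pooling_ok}")
print(f"{'Source':<8}{'d.f.':>6}{'SS':>12}{'MS':>12}{'F':>10}")
print(f"{'Group':<8}{df_group:>6}{ssg:>12.3f}{msg:>12.3f}{f_stat:>10.2f}")
print(f"{'Error':<8}{df_error:>6}{sse:>12.3f}{mse:>12.3f}")
```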
4. Is there a relationship between the amount of time a battery lasts measured in minutes and the battery
manufacturer? Four different manufacturers of batteries were tested under the same conditions.
Manufacturer     n     Mean     s
Manufacturer 1   100   265.31   5.32
Manufacturer 2   100   277.2    4.18
Manufacturer 3   100   268.2    5.13
Manufacturer 4   100   275.03   4.26
(a) Is it reasonable to use a pooled standard deviation for these data? Why or why not?
(b) Fill out the table given below. Sketch a picture of the F distribution (using CrunchIt!) that illustrates
the P-value. What do you conclude? Show your work.
How to calculate the sample mean of the entire data set regardless of group:
$\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_I \bar{x}_I}{n_1 + n_2 + \cdots + n_I}$
Source   d.f.   SS   MS   F   P
Group
Error
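Problems 3 and 4 use the same means and standard deviations and differ only in the sample size. The sketch below compares the two F statistics under that setup; `f_from_summary` is a hypothetical helper that assumes equal group sizes, and it is offered as an illustration rather than as the worked answer.

```python
def f_from_summary(n_per_group, means, sds):
    """Return (MSG, MSE, F) for a one-way ANOVA with equal group sizes."""
    k = len(means)
    grand_mean = sum(means) / k
    msg = n_per_group * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # With equal n, MSE is the average of the group variances
    mse = sum((n_per_group - 1) * s ** 2 for s in sds) / (k * (n_per_group - 1))
    return msg, mse, msg / mse


means = [265.31, 277.2, 268.2, 275.03]
sds = [5.32, 4.18, 5.13, 4.26]

msg10, mse10, f10 = f_from_summary(10, means, sds)
msg100, mse100, f100 = f_from_summary(100, means, sds)

print(f"n = 10:  MSG = {msg10:.2f}, MSE = {mse10:.2f}, F = {f10:.2f}")
print(f"n = 100: MSG = {msg100:.2f}, MSE = {mse100:.2f}, F = {f100:.2f}")
print(f"ratio of F statistics = {f100 / f10:.1f}")
```

MSE is unchanged because the standard deviations are the same, while MSG grows in proportion to the group size, so the larger samples give a much larger F statistic.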