SOLUTIONS ACTIVITY SET 14
Activity 14.1 For each of the following research questions, does the situation involve
independent samples or paired data?
a. Twenty-five people have their cholesterol measured before eating a Big Mac and again after
eating a Big Mac. On average, does eating a Big Mac increase cholesterol?
Paired data – the measurements will be taken twice on the same 25 subjects
b. What is the difference in average ages at which teachers and plumbers retire?
Two independent samples
c. What is the difference in average salaries for high school graduates and college graduates?
Two independent samples
d. In fifty married couples, the husband and wife each separately take the same test of marital
satisfaction. Is there a difference, on average, between the scores of husbands and wives?
Paired data – spousal data is often analyzed as paired data.
Activity 14.2 In the Datasets folder click the link for the “GSS Dataset” to open Minitab with the
data in place. The data are from the 2002 General Social Survey, a federally funded national
survey done every other year by the University of Chicago. The variable marital indicates
whether the respondent is presently married or not. We’ll compare the mean amount of television
watching per typical day (tvhours is the variable) for those who are married versus those who are
not.
a. In words, write a null hypothesis for this situation. We’re comparing two means (television
watching for married people versus unmarried people).
Null: no difference in mean television watching for married people and unmarried people
b. Using statistical notation for means write null and alternative hypotheses for this problem.
H0: μ1 – μ2 = 0 or equivalently H0: μ1 = μ2
Ha: μ1 – μ2 ≠ 0 or equivalently Ha: μ1 ≠ μ2
c. Recall from the lecture notes that when doing a two-sample t-test one consideration is whether
the two standard deviations (or variances) are equal. To check, go to Stat > Basic Statistics >
Display Descriptive Statistics and enter tvhours in the Variables window and marital in the By
Variables window.
i. What are the two standard deviations?
Variable  marital         N   N*    Mean  SE Mean   StDev  Minimum      Q1
tvhours   1_NotMarried  496    1   3.276    0.121   2.693    0.000   2.000
          2_Married     387    1  2.6072   0.0927  1.8241   0.0000  1.0000
ii. Is the larger standard deviation more than twice the smaller standard deviation?
No, 2.693 is not more than twice 1.8241
iii. If your answer to part ii is “Yes” then we will use the unpooled method for calculating
the standard error. If your answer was “No” then we can use the pooled method. Which method
should we use? Pooled
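The rule of thumb in parts ii and iii can be checked with a few lines of Python. This is only a sketch: the two standard deviations are copied from the Minitab output above, and the function name sd_ratio is made up for illustration.

```python
# Standard deviations read from the Minitab descriptive statistics output.
sd_not_married = 2.693
sd_married = 1.8241


def sd_ratio(sd_a, sd_b):
    """Return the larger standard deviation divided by the smaller one."""
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    return larger / smaller


ratio = sd_ratio(sd_not_married, sd_married)
# Rule of thumb from part iii: use pooled unless the ratio is more than 2.
method = "unpooled" if ratio > 2 else "pooled"
print(f"ratio = {ratio:.3f}, method = {method}")
```

The ratio is well under 2, which agrees with the choice of the pooled method.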
d. The two-sample t-test is used to compare means when data is from two independent
samples (as it is here). Use Stat>Basic Statistics>2-sample t. At the top of the dialog
box, enter tvhours in the “Samples” box and marital in the “Subscripts” box. If your
answer to part iii above is to use pooled then click the box for “Assume Equal
Variances”. Read the output to find the values of the t-statistic and the p-value.
t = 4.19
p-value = 0.000
e. State a conclusion about the hypotheses and about the “real world” situation.
Reject the null. Conclude population mean tv hours differs for the two groups. It looks like
the mean is higher for unmarried.
f. The formula for the pooled t-statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$. Give values for each of the elements
in the formula.
$\bar{x}_1 = 3.28$, $\bar{x}_2 = 2.61$, $s_p = 2.3524$, $n_1 = 496$, $n_2 = 387$
g. The output includes a 95% confidence interval for the difference between means. Write a
sentence that interprets this interval in terms of how much difference there is in mean television
watching for the two groups.
We are 95% confident that the difference in means for the two groups is somewhere between 0.36
and 0.98 hours per day (with not married having the higher mean).
h. Refer again to the 95% confidence interval of the previous part. Explain why it is evidence
that makes it reasonable to conclude that the population means differ.
The interval does not include 0 so we can reject no difference as a possibility.
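The interval in parts g and h can be reproduced approximately from the summary values. This sketch takes s_p = 2.3524 from the output. It uses the standard normal critical value in place of the t critical value with 881 degrees of freedom, an assumption that changes the endpoints only in the third decimal place. The function name pooled_ci is made up for illustration.

```python
import math
from statistics import NormalDist


def pooled_ci(mean1, mean2, sp, n1, n2, level=0.95):
    """Approximate confidence interval for mu1 - mu2 using the pooled SE."""
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    # Normal critical value, used here as an approximation to the t value.
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean1 - mean2
    return diff - crit * se, diff + crit * se


low, high = pooled_ci(3.276, 2.6072, 2.3524, 496, 387)
print(f"95% CI for the difference: ({low:.2f}, {high:.2f})")
print("0 inside interval:", low <= 0 <= high)
```

Both endpoints are positive, so 0 is not a plausible value for the difference. This is the same conclusion as rejecting the null hypothesis in part e.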
Activity 14.3 In a national survey of 12th graders, 254 of 1356 boys said they never or rarely wear
a seatbelt when driving. Among 1168 girls, 97 said they never or rarely wear a seatbelt when
driving.
a. Let p1 = population proportion that never or rarely wears a seatbelt for boys and p2 = the
corresponding proportion for girls. Write null and alternative hypotheses about p1 and p2.
H0: p1 – p2 = 0 or equivalently H0: p1 = p2
Ha: p1 – p2 ≠ 0 or equivalently Ha: p1 ≠ p2. It's also okay to use the one-sided Ha: p1 – p2 > 0,
implying we think girls are less likely than boys to never or rarely wear a seatbelt.
b. Start Minitab (you can use the Start menu to do this). Use Stat>Basic Stats> 2 proportions.
Click on Summarized data. Use the boys as the first sample and girls as the second sample.
“Number of trials” means sample size and “Number of events” means number rarely or never
wearing a seatbelt. Use the output to give values for the following:
For boys, sample proportion = p̂1 = .187
For girls, sample proportion = p̂2 = .083
The difference between the sample proportions is p̂1 – p̂2 = .104
Value of z-statistic = 7.83
p-value = 0.000
c. Explain whether we can say there is a difference between the population proportions in this
situation.
We can conclude that there is a difference (p-value =0.000 is less than 0.05)
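The Minitab values in part b can be checked from the counts alone. In the sketch below, the unpooled standard error (each sample's own proportion in its variance term) appears to reproduce the reported z = 7.83. A pooled standard error is also shown for comparison and gives a somewhat smaller z. The function name two_prop_z is made up for illustration.

```python
import math


def two_prop_z(x1, n1, x2, n2, pooled=False):
    """Return (p1_hat, p2_hat, z, two-sided p-value) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # Common proportion estimated from both samples combined.
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # Two-sided normal p-value: 2 * P(Z > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# Boys are sample 1 (254 of 1356), girls are sample 2 (97 of 1168).
p1_hat, p2_hat, z, p_value = two_prop_z(254, 1356, 97, 1168)
_, _, z_pooled, _ = two_prop_z(254, 1356, 97, 1168, pooled=True)
print(f"p1 = {p1_hat:.3f}, p2 = {p2_hat:.3f}, diff = {p1_hat - p2_hat:.3f}")
print(f"z (unpooled) = {z:.2f}, z (pooled) = {z_pooled:.2f}, p-value = {p_value:.3g}")
```

Either version of the standard error gives a p-value far below 0.05, so the conclusion in part c does not depend on that choice.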
Activity 14.4
DEFINITIONS
A type 1 error occurs if we incorrectly pick the alternative hypothesis (pick it when null
is really the truth)
A type 2 error occurs if we incorrectly pick the null hypothesis (pick it when the
alternative is really the truth).
a. Refer back to part a of Activity 14.2. Explain what the type 1 and type 2 errors are in this
situation. Do this in terms of the "real world" situation.
Type 1 = deciding that the mean number of television hours watched differs between
married and unmarried people in the population when in reality no such difference
exists.
Type 2 = deciding that the mean number of television hours watched does not differ
between married and unmarried people in the population when in reality a difference
does exist.
b. Refer back to part a of Activity 14.3. Explain what the type 1 and type 2 errors are in
this situation. Do this in terms of the "real world" situation.
Type 1 = deciding that the proportion of girls who never or rarely wear a seatbelt differs
from the proportion of boys who never or rarely wear one when in reality no such
difference exists.
Type 2 = deciding that the proportion of girls who never or rarely wear a seatbelt does not
differ from the proportion of boys who never or rarely wear one when in reality a
difference does exist.
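The type 1 error definition can be illustrated with a small simulation. This is only a sketch and does not use the GSS data. Both samples are drawn from the same made-up normal population, so the null hypothesis is true by construction and every rejection is a type 1 error. The population mean and standard deviation, the sample size, the cutoff of 1.96, and all function names are assumptions chosen for illustration.

```python
import math
import random


def mean_and_variance(values):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def rejects_null(sample_a, sample_b, critical=1.96):
    """Two-sided test of equal means using an unpooled standard error."""
    mean_a, var_a = mean_and_variance(sample_a)
    mean_b, var_b = mean_and_variance(sample_b)
    se = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return abs((mean_a - mean_b) / se) > critical


def type1_rate(reps=2000, n=100, seed=1):
    """Fraction of simulated studies that reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Same hypothetical population for both groups: H0 is true.
        a = [rng.gauss(3.0, 2.0) for _ in range(n)]
        b = [rng.gauss(3.0, 2.0) for _ in range(n)]
        rejections += rejects_null(a, b)
    return rejections / reps


rate = type1_rate()
print(f"Estimated type 1 error rate: {rate:.3f}")
```

With a cutoff of 1.96, the estimated rate should land near 0.05, the significance level used throughout these activities. A type 2 error could be explored in the same way by giving the two groups different population means and counting how often the test fails to reject.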