Solutions to Practice Problems: Comparing Two Groups
1. A random sample of 60 Pennsylvania high school seniors was taken. They provided
information on their sex and on whether or not they support affirmative action in college
admissions. The sample consisted of 31 females, 26 of whom support affirmative action, and 29
males, 17 of whom support affirmative action. Is there a difference between the sexes in their
support of affirmative action? Also, estimate this difference at a 90% level of confidence.
Check conditions: For the female group there are 26 “successes” and 5 “failures”, and for
the males there are 17 “successes” and 12 “failures”. All are at least 5 so we can use the z-method.
Ho: pf – pm = 0 Ha: pf – pm ≠ 0 and α = 0.10 (note the 90% confidence.)
In Minitab, using Stat > Basic Statistics > 2 Proportions to estimate the difference Female –
Male, we need to enter the female information as Sample 1 and the male information as
Sample 2. Click Options and change the confidence level to 90, set the Hypothesized difference
to 0, and set the Alternative to “Difference ≠ hypothesized difference”. Because the
hypothesized difference is 0, we use the pooled method for the test.
Minitab output:
Test and CI for Two Proportions
Event = y
Sex   X   N  Sample p
f    26  31  0.838710
m    17  29  0.586207
Difference = p (f) - p (m)
Estimate for difference: 0.252503
90% CI for difference: (0.0669318, 0.438074)
Test for difference = 0 (vs ≠ 0): Z = 2.17 P-Value = 0.030
With the p-value of 0.030 being less than alpha of 0.10, we reject the null hypothesis. We
conclude that there is a difference between the sexes of Pennsylvania high school seniors in
their support of affirmative action for college admissions. Furthermore, we estimate that the
proportion of females who support affirmative action exceeds the proportion of males who
support it by between 0.067 and 0.438, that is, by 6.7 to 43.8 percentage points. With the
difference calculated as Females minus Males and the entire interval being positive, this
suggests that female Pennsylvania high school seniors are more likely to support affirmative
action than their male counterparts.
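The Minitab figures for Problem 1 can be reproduced by hand. The following is a minimal sketch in Python (not part of the original solution), assuming the usual convention that the test statistic uses the pooled proportion while the confidence interval uses the unpooled standard error. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts taken from the problem statement.
x_f, n_f = 26, 31   # females who support / females sampled
x_m, n_m = 17, 29   # males who support / males sampled

p_f, p_m = x_f / n_f, x_m / n_m
diff = p_f - p_m

# Test statistic: pooled proportion under Ho: pf - pm = 0.
p_pool = (x_f + x_m) / (n_f + n_m)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))
z = diff / se_pool
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative

# 90% confidence interval: unpooled standard error (assumed convention).
se_unpooled = sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
z_star = NormalDist().inv_cdf(0.95)            # 5% in each tail
ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)

print(f"difference = {diff:.6f}")
print(f"Z = {z:.2f}, P-value = {p_value:.3f}")
print(f"90% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

The printed values should agree with the Minitab output above to the displayed precision.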
2. A mining company is interested in the pH levels of soil before and after mining is completed.
The pH levels from 15 areas of the mining land are taken before mining begins. At the
conclusion of mining, the land is restored and another set of pH readings is taken on the same
15 grids. Are these data paired or independent?
The data are paired; observations are taken before and after at the same location. Two
measurements are taken from each subject, i.e. location.
3. A random sample of 8 new SUVs and 8 mid-size cars is tested for front impact resistance.
The amounts of damage (in hundreds of dollars) to the vehicles when crashed at 20 mph head on
into a stationary barrier are recorded. Investigators want to know if the SUVs are sturdier and
thus suffer less damage. Is this paired data or independent data?
The data are independent. There is no paired structure between the SUVs sampled and the
mid-size cars sampled. All vehicles were crashed under the same conditions. Only one
measurement is taken from each subject, i.e. vehicle.
4. According to the publication High School Profile Report, in past years college-bound males
have out-performed college-bound females on the mathematics portion of tests given by the ACT
Program. Samples of this year’s scores yield the following data (ACT_scores.txt). Does it appear
that college-bound males are, on the average, still outperforming college-bound females on the
mathematics portion of ACT tests? Set up the hypotheses. Indicate whether you are going to use
a 2-sample t-test or a paired t-test for the problem. If you use a 2-sample t-test, will you use
pooled or non-pooled variances? What conditions will you check before you perform the test?
Males  Females
34     18
34     33
30     27
18     11
24     23
16     23
15     18
26     20
13     26
21     10
24     20
11     22
11     14
15     21
We have two independent samples because there is no paired relationship between the
scores for males and females. Note that only one score measurement is taken from each
subject. Because the sample sizes are not large enough, we need to check the normality
assumption for both sets of data using a probability plot (in Minitab: Graph > Probability
Plot > Multiple). From the graph we do not notice any departures from normality for
either group, so we assume the condition is met.
We will set the hypotheses as follows:
Ho: μmale – μfemale = 0 versus Ha: μmale – μfemale > 0, and alpha will be the default value of
0.05 since none was provided.
Next, since we are conducting a test of two independent means, we need to check whether the
variances of the two groups are equal. We will first show the rule of thumb method
followed by the test of two variances method.
Rule of Thumb: Is the ratio of the two sample standard deviations between 0.5 and 2? In
Minitab, using Stat > Basic Statistics > Display Descriptive Statistics, we enter both the
male and female columns of data into the variables box. From the output we get a ratio of
8/6.2 = 1.29. We can consider the sample standard deviations “close enough” to assume the
population variances are equal. We will use the pooled method.
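As a quick check of the rule of thumb, the sketch below recomputes the two sample standard deviations in Python. It assumes the ACT data are read as Male/Female pairs in the order listed in the problem; the variable names are illustrative only.

```python
from statistics import stdev

# ACT math scores from the problem (assumed reading order: male, female, ...).
males = [34, 34, 30, 18, 24, 16, 15, 26, 13, 21, 24, 11, 11, 15]
females = [18, 33, 27, 11, 23, 23, 18, 20, 26, 10, 20, 22, 14, 21]

sd_m, sd_f = stdev(males), stdev(females)   # sample standard deviations
ratio = sd_m / sd_f

# Rule of thumb: treat the variances as equal when 0.5 <= ratio <= 2.
rule_ok = 0.5 <= ratio <= 2

print(f"StDev males = {sd_m:.2f}, StDev females = {sd_f:.2f}")
print(f"ratio = {ratio:.2f}, within [0.5, 2]: {rule_ok}")
```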
Hypothesis Test of Two Variances: More reliable than the Rule of Thumb method is a
statistical test for equal variances. This is done under Stat > Basic Statistics > 2 Variances.
After selecting the data columns click Options and check the box for “use test and
confidence intervals based on normal distribution”. We will use this since we are assuming
the data is normal. The hypotheses for this two variance test are:
Ho: the ratio of the two variances = 1 vs. Ha: the ratio of the two variances ≠ 1
With the p-value for this test being greater than alpha of 0.05, we fail to reject the null
hypothesis and proceed as though the ratio is 1, i.e. the variances are equal.
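The two-variance test based on the normal distribution is an F-test on the ratio of sample variances. The sketch below is one way to approximate it using only the Python standard library; the helper `f_upper_tail` is a hypothetical name that integrates the F density numerically, and doubling the smaller tail for the two-sided p-value is an assumed convention. Minitab's own computation may differ in detail.

```python
from math import gamma
from statistics import variance

def f_upper_tail(f, d1, d2, steps=4000):
    """Approximate P(F > f) with Simpson's rule (steps even, d1 >= 2)."""
    beta = gamma(d1 / 2) * gamma(d2 / 2) / gamma((d1 + d2) / 2)
    const = (d1 / d2) ** (d1 / 2) / beta

    def pdf(x):
        return const * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

    h = f / steps
    total = pdf(0.0) + pdf(f)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - total * h / 3

# ACT math scores from the problem (assumed reading order: male, female, ...).
males = [34, 34, 30, 18, 24, 16, 15, 26, 13, 21, 24, 11, 11, 15]
females = [18, 33, 27, 11, 23, 23, 18, 20, 26, 10, 20, 22, 14, 21]

f_stat = variance(males) / variance(females)
d1, d2 = len(males) - 1, len(females) - 1

upper = f_upper_tail(f_stat, d1, d2)
p_two_sided = 2 * min(upper, 1 - upper)   # Ha: ratio of variances != 1

print(f"F = {f_stat:.3f} on ({d1}, {d2}) df, two-sided p = {p_two_sided:.3f}")
print("fail to reject equal variances" if p_two_sided > 0.05 else "reject equal variances")
```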
Using Stat > Basic Statistics > 2-Sample t, we select “each sample in its own column” and
enter “Males” as sample 1 and “Females” as sample 2. Click Options and make sure the
confidence level, hypothesized difference, and alternative hypothesis are correct (95, 0, and
“difference > hypothesized value”), and that the check box for Equal Variances is checked.
Two-Sample T-Test and CI: Males, Females
Two-sample T for Males vs Females
          N   Mean  StDev  SE Mean
Males    14  20.86   8.00      2.1
Females  14  20.43   6.20      1.7
Difference = μ (Males) - μ (Females)
Estimate for difference: 0.43
95% lower bound for difference: -4.18
T-Test of difference = 0 (vs >): T-Value = 0.16 P-Value = 0.438 DF = 26
Both use Pooled StDev = 7.1553
The p-value for our test is 0.438 which exceeds our 0.05 level of significance. We will
therefore fail to reject the null hypothesis. Based on the data, there is not enough evidence
to conclude that, on average, college-bound males score higher on the ACT Math compared
to college-bound females.
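The pooled two-sample t statistic in the output above can be recomputed directly from the data. This is a minimal sketch in standard-library Python, again assuming the data are read as Male/Female pairs; `t_upper_tail` is a hypothetical helper that approximates the upper-tail probability by numerical integration rather than calling a statistics package.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on the t density (steps even)."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# ACT math scores from the problem (assumed reading order: male, female, ...).
males = [34, 34, 30, 18, 24, 16, 15, 26, 13, 21, 24, 11, 11, 15]
females = [18, 33, 27, 11, 23, 23, 18, 20, 26, 10, 20, 22, 14, 21]

n1, n2 = len(males), len(females)
df = n1 + n2 - 2

# Pooled standard deviation: variances weighted by their degrees of freedom.
sp = sqrt(((n1 - 1) * variance(males) + (n2 - 1) * variance(females)) / df)
se = sp * sqrt(1 / n1 + 1 / n2)

t_stat = (mean(males) - mean(females)) / se
p_value = t_upper_tail(t_stat, df)   # Ha: mu_male - mu_female > 0

print(f"Pooled StDev = {sp:.4f}, T = {t_stat:.2f}, DF = {df}, P-value = {p_value:.3f}")
```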
5. Eleven tires were each measured for tread wear by two methods, one based on weight and the
other on groove wear. Here are the data (tire_data.txt) in thousands of miles:
Weight  Groove
30.5    28.7
30.9    25.9
31.9    23.3
30.4    23.1
27.3    23.7
20.4    20.9
24.5    16.1
20.9    19.9
18.9    15.2
13.7    11.5
11.4    11.2
Does it appear the two methods, on the average, give different results? Set up the hypotheses.
Indicate whether you are going to use 2-sample t-test or paired t-test for the problem. If you use
2-sample t-test, will you use pooled or non-pooled variances? What conditions will you check
before you perform the test?
The data are paired since the two methods are measured on the same tire (i.e. we have two
measurements from each subject: a tire). This design is preferred since we would
anticipate that tires with a high measurement on one of the methods will also have a high
measurement from the other method and vice versa.
Having only 11 paired observations means we only have 11 differences, which is not large
enough (we need at least 30 differences) to assume normality. We will need to check the
differences using a probability plot.
To compute these differences in Minitab, go to Calc > Calculator, name the column that will
store the result, and enter Weight minus Groove as the expression. The name “diff” is not
required, although you must enter something in this field; Minitab uses it to name the output
column. Click OK and you will find a column in Minitab named ‘diff’ (or whatever you called
it) that contains the Weight method column minus the Groove method column.
We then go to Graph > Probability Plot > Single and graph the column of these
differences. From the plot we do not notice any departures from normality. All 11 of the
differences fall within the confidence bands.
Set up hypotheses as:
Ho: μd = 0 versus Ha: μd ≠ 0, where ‘d’ represents the difference “Weight – Groove”.
Using Stat > Basic Statistics > Paired t, we select “data in separate columns” and enter
“Weight” as sample 1 and “Groove” as sample 2. Click Options and make sure the
confidence level, hypothesized difference, and alternative hypothesis are correct (95, 0, and
“difference ≠ hypothesized value”). The output is:
Paired T-Test and CI: Weight, Groove
Paired T for Weight - Groove
             N   Mean  StDev  SE Mean
Weight      11  23.71   7.19     2.17
Groove      11  19.95   5.77     1.74
Difference  11  3.755  3.221    0.971
95% CI for mean difference: (1.590, 5.919)
T-Test of mean difference = 0 (vs ≠ 0): T-Value = 3.87 P-Value = 0.003
With the p-value of 0.003 being less than our alpha value of 0.05, we reject the null hypothesis.
There is statistical evidence to conclude that there is a difference in average tread wear
between the weight and groove methods. Furthermore, with the entire confidence interval being
positive (1.590 to 5.919) and the difference calculated as Weight – Groove, we can also
conclude that the average tread wear using the weight method is larger than the mean
tread wear when using the groove method.
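A paired t-test is a one-sample t-test on the differences, so the output above can be checked with a few lines of Python. The sketch below assumes the tire data are read as Weight/Groove pairs in the order listed; `t_upper_tail` is a hypothetical helper that integrates the t density numerically.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on the t density (steps even)."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Tread wear in thousands of miles (assumed reading order: weight, groove, ...).
weight = [30.5, 30.9, 31.9, 30.4, 27.3, 20.4, 24.5, 20.9, 18.9, 13.7, 11.4]
groove = [28.7, 25.9, 23.3, 23.1, 23.7, 20.9, 16.1, 19.9, 15.2, 11.5, 11.2]

# One difference per tire: Weight - Groove.
diffs = [w - g for w, g in zip(weight, groove)]
n = len(diffs)

mean_d, sd_d = mean(diffs), stdev(diffs)
se_d = sd_d / sqrt(n)
t_stat = mean_d / se_d
p_value = 2 * t_upper_tail(abs(t_stat), n - 1)   # Ha: mu_d != 0

print(f"mean diff = {mean_d:.3f}, StDev = {sd_d:.3f}, SE = {se_d:.3f}")
print(f"T = {t_stat:.2f}, DF = {n - 1}, P-value = {p_value:.3f}")
```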
SPECIAL NOTE: If you were to run this as an independent means test with equal
variances, the p-value comes to 0.096, which would lead to a non-rejection at the 5% level of
significance. This is why determining the correct data structure is vital: not doing so can
lead to an incorrect decision. Depending on the stakes of the study (think, for instance, of a
drug trial where the incorrect decision could be fatal), such an error can have costly
ramifications.
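To see what treating the tire data as independent samples does, the sketch below runs the pooled two-sample calculation on the same numbers. It is illustrative only. The quoted 0.096 appears to correspond to the upper-tail alternative, which is an inference from the arithmetic and not something the original states; the two-sided p-value is twice as large. Either way the result is not significant at the 5% level. `t_upper_tail` is a hypothetical helper.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) with Simpson's rule on the t density (steps even)."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3

# Tread wear in thousands of miles (assumed reading order: weight, groove, ...).
weight = [30.5, 30.9, 31.9, 30.4, 27.3, 20.4, 24.5, 20.9, 18.9, 13.7, 11.4]
groove = [28.7, 25.9, 23.3, 23.1, 23.7, 20.9, 16.1, 19.9, 15.2, 11.5, 11.2]

n1, n2 = len(weight), len(groove)
df = n1 + n2 - 2

# Incorrect analysis for this design: ignores the pairing and pools the variances.
sp = sqrt(((n1 - 1) * variance(weight) + (n2 - 1) * variance(groove)) / df)
t_ind = (mean(weight) - mean(groove)) / (sp * sqrt(1 / n1 + 1 / n2))

p_upper = t_upper_tail(t_ind, df)   # one-sided, upper tail
p_two_sided = 2 * p_upper

print(f"T = {t_ind:.2f}, DF = {df}")
print(f"one-sided p = {p_upper:.3f}, two-sided p = {p_two_sided:.3f}")
```

The same mean difference that gives T = 3.87 in the paired analysis gives a much smaller statistic here, because the tire-to-tire variation is left in the standard error.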
6. The risk of an investment is measured in terms of the variance in the return that could be
observed. Random samples of 10 yearly returns were obtained from two different portfolios. The
data are given (in thousands of dollars) below (investment.txt). Does Portfolio 2 appear to have a
higher risk than Portfolio 1? Which test did you use and why?
Portfolio 1
130 135 135 131 129 135 126 136 127 132
Portfolio 2
154 144 147 150 155 153 149 139 140 141
The hypotheses set up would be: Ho: σ²P2/σ²P1 = 1 vs. Ha: σ²P2/σ²P1 > 1, where σ²P1 and σ²P2
are the variances of the returns for Portfolio 1 and Portfolio 2.
With a probability plot of both samples not showing any departures from normality, the F-test
for equal variances would be the better choice.
To test the hypotheses, we use Stat > Basic Statistics > 2 Variances. Here we have to be
careful! Because we are interested in testing a specific direction (i.e. a one-sided test), we
have to carefully select the data to fit the default methods of the software. In this case,
Minitab computes the ratio as Sample 1 over Sample 2. Since our alternative hypothesis is that
Portfolio2 over Portfolio1 is greater than 1, we need to enter Portfolio2 as sample 1 and
Portfolio1 as sample 2. Then click Options, set the hypothesized ratio to 1, choose the
“greater than” alternative, and use the test based on the normal distribution.
After we choose everything correctly and run the test, we get a p-value of 0.074, which is
larger than α of 0.05, leading us to not reject the null hypothesis. At a 5% level of
significance, the data do not support the claim that the variance of Portfolio2 exceeds that
of Portfolio1. We cannot conclude that Portfolio2 is riskier than Portfolio1.
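The one-sided F-test can be sketched in standard-library Python as a check on the reported p-value. As in the Minitab setup, Portfolio 2 goes in the numerator so that the upper tail matches Ha. The helper `f_upper_tail` is a hypothetical name that approximates the tail area by numerical integration of the F density.

```python
from math import gamma
from statistics import variance

def f_upper_tail(f, d1, d2, steps=4000):
    """Approximate P(F > f) with Simpson's rule (steps even, d1 >= 2)."""
    beta = gamma(d1 / 2) * gamma(d2 / 2) / gamma((d1 + d2) / 2)
    const = (d1 / d2) ** (d1 / 2) / beta

    def pdf(x):
        return const * x ** (d1 / 2 - 1) * (1 + d1 * x / d2) ** (-(d1 + d2) / 2)

    h = f / steps
    total = pdf(0.0) + pdf(f)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - total * h / 3

# Yearly returns in thousands of dollars, from the problem statement.
portfolio1 = [130, 135, 135, 131, 129, 135, 126, 136, 127, 132]
portfolio2 = [154, 144, 147, 150, 155, 153, 149, 139, 140, 141]

# Ratio ordered to match Ha: var(P2) / var(P1) > 1.
f_stat = variance(portfolio2) / variance(portfolio1)
d1, d2 = len(portfolio2) - 1, len(portfolio1) - 1
p_value = f_upper_tail(f_stat, d1, d2)

print(f"F = {f_stat:.3f} on ({d1}, {d2}) df, one-sided p = {p_value:.3f}")
```

Swapping the two lists would put the test in the wrong tail, which is the same pitfall the solution warns about when entering Sample 1 and Sample 2 in Minitab.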