PubH 6414 Worksheet 9: Inference for Proportions from One Group
Example 1: Population Proportion
Table 3-16 in the text describes the number of men and women with and without hematuria
among a sample of patients with acquired hemophilia. Data for the sample of 15 men are
represented below as ‘Yes’ for hematuria and ‘No’ for no hematuria.
Reference: Dawson B, Trapp RG. (2004). Chapter 3: Summarizing Data & Presenting Data in
Tables and Graphs. In: Basic & Clinical Biostatistics. 4th ed. New York: McGraw-Hill.
1. We are interested in the proportion of men with hematuria. Code this binary variable as ‘1’
for ‘Yes’ and ‘0’ for ‘No’ and sum the coded outcomes.
Hematuria status    X: Coded as ‘1’ or ‘0’
Yes                 1
No                  0
No                  0
Yes                 1
Yes                 1
No                  0
Yes                 1
Yes                 1
Yes                 1
No                  0
No                  0
Yes                 1
Yes                 1
No                  0
Yes                 1
Sum                 9
2. The proportion of men with hematuria in this sample = the sum of the coded variable divided
by the number of men (n = 15).
$p = \frac{\sum x_i}{n} = \frac{9}{15} = 0.60$
When the sample proportion is calculated as the sum of the coded ‘1’s and ‘0’s divided by
the number of trials, the sample proportion resembles a sample mean. By the Central Limit
Theorem, the sample mean is approximately normally distributed regardless of the population
distribution if the sample size is large enough. Likewise, the sample proportion is
approximately normally distributed when n*π > 5 and n*(1-π) > 5, where π is the population
proportion.
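The calculation in Example 1 can be reproduced in R. This is a minimal sketch, assuming the 15 coded outcomes are typed in by hand in the order listed in the table; the variable names are illustrative and not part of the worksheet.

```r
# Hematuria status for the 15 men, coded 1 = Yes and 0 = No (table order)
x <- c(1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1)

n <- length(x)     # number of men in the sample
p <- sum(x) / n    # sample proportion: sum of the coded values divided by n

# With 0/1 coding the sample proportion is the same quantity as the sample mean
p_as_mean <- mean(x)

print(c(n = n, sum = sum(x), p = p, mean = p_as_mean))
```

Using mean(x) directly is the usual shortcut once a binary variable is coded as 0 and 1.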
Example 2: Confidence Interval of a Population Proportion
Data as presented in the section slides: 202 of 900 randomly selected metro area youth reported
that they currently smoked in 2005.
a. Construct a 95% confidence interval for the population proportion of smokers among metro
area youth in 2005. Does the 95% confidence interval provide evidence that the smoking rate
among metro area youth is significantly different from the projected rate of 25% in 2005? As
you answer the questions above, you should include in your answer the proportion of
smokers, the standard error, the calculation of the confidence interval, and the interpretation
of the confidence interval.
i. Calculate the sample proportion of smokers among metro area youth (p):
p = 202/900 = 0.224
ii. Calculate the SE of the sample proportion. Use the sample proportion to calculate the
SE(p):
$se = \sqrt{\frac{0.224(1 - 0.224)}{900}} = 0.014$
iii. Confidence coefficient for the 95% CI of the population proportion: 1.96
Rcmdr: Distributions > Continuous Distributions > Normal Distribution > Normal
Quantiles
Probabilities = 0.975; select Lower tail
R script: z=qnorm(0.975)
iv. Compute the lower limit of the confidence interval.
p - z*se = 0.224 - 1.96*0.014 = 0.197
v. Compute the upper limit of the confidence interval.
p + z*se = 0.224 + 1.96*0.014 = 0.251
vi. Interpretation of confidence interval – is there evidence that the metro area smoking
rate is significantly different from 25%? Why or why not?
We have 95% confidence that the interval from 19.7% to 25.1% contains the true
smoking rate of metro area youth. Since the 95% confidence interval contains
the projected value of 25%, there is not sufficient evidence to conclude, at the 0.05 alpha
level, that the rate in the metro area is significantly different from the projected rate.
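Steps i to vi can be sketched as a short R script. The variable names are assumptions made for illustration. Because R carries unrounded values for p and se, the limits can differ from the hand calculation in the third decimal place.

```r
# Example 2: 202 smokers among 900 randomly selected metro area youth
x <- 202
n <- 900

p_hat <- x / n                        # sample proportion
se <- sqrt(p_hat * (1 - p_hat) / n)   # SE based on the observed proportion
z <- qnorm(0.975)                     # confidence coefficient for a 95% CI

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
print(round(ci, 3))

# Does the interval contain the projected rate of 25%?
projected <- 0.25
contains_projected <- ci[["lower"]] <= projected && projected <= ci[["upper"]]
print(contains_projected)
```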
b. Compare the confidence interval results to the hypothesis test results. What are some
similarities and differences between these two methods of inference?
Similar: the confidence coefficient for the 95% confidence interval is the same as the critical
value for the hypothesis test with alpha = 0.05. Conclusions from the confidence interval and
the hypothesis test are the same.
Different: The SE of the confidence interval is calculated using the observed sample
proportion. The SE for the hypothesis test is calculated using the hypothesized value since
the hypothesis test is conducted under the assumption that the null hypothesis is true.
Because of the different calculations of SE it’s possible that confidence intervals and
hypothesis tests of proportion from one group might give different results.
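The difference between the two standard errors can be seen numerically with the Example 2 data. The hypothesis test itself appears in the section slides rather than in this worksheet, so the test below is a reconstruction that assumes a two-sided z-test of Ho: π = 0.25; the variable names are illustrative.

```r
# Example 2 data and the projected (hypothesized) smoking rate
x <- 202
n <- 900
pi0 <- 0.25

p_hat <- x / n

# SE for the confidence interval uses the observed sample proportion
se_ci <- sqrt(p_hat * (1 - p_hat) / n)

# SE for the hypothesis test uses the hypothesized proportion
se_test <- sqrt(pi0 * (1 - pi0) / n)

# Two-sided z-test computed under the null hypothesis
z_stat <- (p_hat - pi0) / se_test
p_value <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

print(round(c(se_ci = se_ci, se_test = se_test, z = z_stat, p_value = p_value), 4))
```

Here the two standard errors differ only slightly, so the interval and the test lead to the same conclusion. When the sample proportion is far from the hypothesized value, the gap between them is larger.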
Example 3: One Sample Hypothesis Test of a Proportion
From a random sample of 100 infants born to mothers who smoked during pregnancy, 16 had
low birth weight (LBW).
a. Use the low birth weight data to conduct a hypothesis test with significance level 0.05 to
investigate whether the LBW rate for infants born to mothers who smoke is different from the
national average LBW rate of 0.077. In your answer you
should include the null and alternative hypotheses, the appropriate type of test statistic,
critical value(s), the test statistic and p-value, and your conclusion.
i. Proportion of LBW infants.
p = 16/100 = 0.16
ii. State the null and alternative hypotheses.
Hypothesized value (π0) = the national rate = 0.077
Ho: π = 0.077
HA: π ≠ 0.077
This is a two-tailed alternative.
iii. Identify the appropriate test statistic.
The appropriate test statistic is the z-statistic because the sample proportion has an
approximately normal distribution when n*π > 5 and n*(1-π) > 5.
iv. Determine the critical value(s) for the hypothesis test.
The critical values are from the standard normal distribution with a rejection region of
0.025 in each tail: -1.96 and 1.96
v. Calculate the test statistic and p-value.
$SE(\pi_0) = \sqrt{\frac{0.077(1 - 0.077)}{100}} = 0.027$
$z = \frac{p - \pi_0}{SE(\pi_0)} = \frac{0.16 - 0.077}{0.027} = 3.11$ (computed with the unrounded SE of about 0.0267)
p-value = 2 * 0.0009247942 = 0.001849588
The test statistic is 3.11 with p-value 0.00185.
Since the test statistic is positive, use Rcmdr to get the upper tail probability, then
multiply by 2 in the script window to obtain the two-tailed probability.
Rcmdr: Distributions > Continuous Distributions > Normal Distribution > Normal
Probabilities
Variable Value = z; select Upper tail
R script: 2*pnorm(z, lower.tail=FALSE)
or, equivalently: 2*(1-pnorm(z))
vi. State the conclusion of the test. Use both the critical value method and the p-value
method to make the conclusion.
The test statistic is in the upper tail rejection region because 3.11 > 1.96.
The p-value of the test (p = 0.00185) is less than the significance level 0.05.
By both criteria the null hypothesis is rejected and we can conclude that the LBW rate
for infants whose mothers smoked is significantly greater than the LBW rate in the
general population.
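The whole test in part a can be sketched in R as follows. The variable names are illustrative assumptions. The normal-approximation check uses the hypothesized value 0.077 in place of π, since the test is carried out under the null hypothesis.

```r
# Example 3: 16 LBW infants among 100 born to mothers who smoked
x <- 16
n <- 100
pi0 <- 0.077    # hypothesized value: national LBW rate
alpha <- 0.05

p_hat <- x / n

# Normal approximation check, evaluated at the hypothesized proportion
normal_ok <- n * pi0 > 5 && n * (1 - pi0) > 5

# SE under the null hypothesis, then the z-statistic
se0 <- sqrt(pi0 * (1 - pi0) / n)
z_stat <- (p_hat - pi0) / se0

# Critical value and two-tailed p-value
z_crit <- qnorm(1 - alpha / 2)
p_value <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

# Decision by the critical value method and by the p-value method
reject_by_critical_value <- abs(z_stat) > z_crit
reject_by_p_value <- p_value < alpha

print(round(c(se0 = se0, z = z_stat, z_crit = z_crit, p_value = p_value), 5))
print(c(normal_ok = normal_ok,
        reject_cv = reject_by_critical_value,
        reject_p = reject_by_p_value))
```

Taking abs(z_stat) before calling pnorm gives the two-tailed p-value whether the statistic is positive or negative, so the script does not depend on the sign of z.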
b. Compare the results of the 95% confidence interval in the lesson and the z-test of one
proportion. What are the similarities and differences?
Similarities: the confidence coefficient for a 95% confidence interval is the same as the
critical value for a hypothesis test with alpha level = 0.05. The conclusions are the same for
both the confidence interval and the hypothesis test.
Differences: The SE for the confidence interval is calculated using the observed sample
proportion. The SE for the hypothesis test is calculated using the hypothesized proportion.
The confidence interval provides information about the precision of the estimate. The
hypothesis test provides a p-value.
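The comparison can be illustrated with the Example 3 data. The lesson's confidence interval is not reproduced in this worksheet, so the sketch below assumes it has the same form as the interval in Example 2 (p ± z*se, with the SE based on the sample proportion); the variable names are illustrative.

```r
# Example 3 data
x <- 16
n <- 100
pi0 <- 0.077

p_hat <- x / n
z <- qnorm(0.975)   # same value serves as CI coefficient and test critical value

# SE for the confidence interval: based on the observed proportion
se_ci <- sqrt(p_hat * (1 - p_hat) / n)
ci <- c(lower = p_hat - z * se_ci, upper = p_hat + z * se_ci)

# SE for the hypothesis test: based on the hypothesized proportion
se_test <- sqrt(pi0 * (1 - pi0) / n)

print(round(c(se_ci = se_ci, se_test = se_test), 4))
print(round(ci, 3))

# The interval excludes the national rate, in line with rejecting Ho
excludes_pi0 <- pi0 < ci[["lower"]] || pi0 > ci[["upper"]]
print(excludes_pi0)
```

The width of the interval conveys the precision of the estimate, which the p-value alone does not.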