Soci708 – Statistics for Sociologists
Module 8 – Inference for Means & Proportions¹
François Nielsen
University of North Carolina, Chapel Hill
Fall 2008

¹ Adapted from slides for the course Quantitative Methods in Sociology (Sociology 6Z3) taught at McMaster University by Robert Andersen (now at University of Toronto)
Goals of Today's Lecture

- What can we do when we don't know the standard deviation of the population?
- Practical application of inference for means
  - Standard errors
  - Significance tests and confidence intervals based on the t statistic
  - One-sample tests (including matched pairs), two-sample tests
- Practical application of inference for proportions
  - Significance tests and confidence intervals based on z-tests
  - One-sample tests (including matched pairs), two-sample tests
Sampling Distribution and σ

- Recall the three characteristics of the sampling distribution of a sample mean (x̄):
  1. It is close to normally distributed (central limit theorem)
  2. The mean of all the sample means (i.e., the mean of x̄) equals the population mean µ
  3. The standard deviation of x̄ is

     \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

- The problem for real applications is that we do not know the population standard deviation σ and hence cannot determine the standard deviation of the sampling distribution
- Recall that last week we proceeded as if we knew σ
Statistical Significance for Means when σ is known

- Recall that we can standardize x̄ if we know σ, resulting in the one-sample z statistic:

  z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}

  where z tells us how far the observed sample mean x̄ is from µ in units of the standard deviation of x̄.
- Following from the z statistic, we can solve for a confidence interval for µ:

  \mu = \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}

  where z* represents the critical value of z – i.e., the value of z that marks off the specified area under the normal curve that is of interest
Calculating the confidence interval for a mean when σ is known

1. Calculate the point estimate for the sample mean:

   \bar{x} = \frac{\sum x_i}{n} = 12.5

2. Find the critical value z* that corresponds to the desired level of confidence
   - For example, a 95% CI requires the middle 95% of the area. Therefore, we need z* for the areas (1 − .95)/2 = .025 and 1 − .05/2 = .975. We see from Table A that z* ≈ 1.96
3. Substitute the known values into the formula:

   \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}
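As a quick added sketch (not part of the original slides), the same calculation in R, borrowing the values used later in the lecture (x̄ = 12.5, n = 1000) and treating 4.5 as a known σ:

# in R (added sketch; sigma is assumed known here for illustration)
xbar  <- 12.5
sigma <- 4.5
n     <- 1000
zstar <- qnorm(0.975)             # critical value for a 95% CI
me    <- zstar * sigma / sqrt(n)  # margin of error
c(lower = xbar - me, upper = xbar + me)   # about 12.221 to 12.779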
Calculating a significance test for a mean when σ is known

1. Start with an alternative hypothesis:

   Ha : µ > µ0

   Remember, any direction can be specified.
2. The null hypothesis is then stated as the complementary event to Ha:

   H0 : µ ≤ µ0

3. Carry out a test of significance of x̄ − µ0:

   z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}

- High z-scores (small P-values) are evidence against the null hypothesis. We can specify an α level in advance, or simply state the significance level (P-value)
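An added sketch of the one-sided z test in R, again treating σ = 4.5 as known and testing H0: µ ≤ 12 with the education-example numbers:

# in R (added sketch)
xbar <- 12.5; mu0 <- 12; sigma <- 4.5; n <- 1000
z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- pnorm(z, lower.tail = FALSE)   # area to the right of z
c(z = z, p = p_value)                     # z about 3.51, p about .0002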
What do we do when σ is not known?

- In order to use this statistical inference method on a population for which σ is not known, we need some way of estimating the standard deviation of x̄
- We do this by using information from the sample
  - More specifically, we substitute σ with s, the sample standard deviation of X
  - This results in replacing the standard deviation of x̄ with its standard error:

    SE_{\bar{x}} = \frac{s}{\sqrt{n}}

  - When we use the SE to estimate the standard deviation of x̄, the resulting test statistic is no longer normally distributed. It follows a Student t distribution
"Student" Strikes Again!

- William Gosset published the t distribution under the pseudonym Student while employed by the Guinness brewery in Dublin
- A t(k)-distributed variable is equivalent to

  t(k) = \frac{Z}{\sqrt{\chi^2(k)/k}}

  where Z and χ²(k) are independent and k is the degrees of freedom
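A small simulation sketch (added, not in the original slides) illustrating this construction: dividing a standard normal by the square root of an independent χ²(k)/k reproduces the t(k) distribution.

# in R (added sketch)
set.seed(1)
k     <- 5
z     <- rnorm(1e5)            # standard normal draws
chi2  <- rchisq(1e5, df = k)   # independent chi-squared(k) draws
t_sim <- z / sqrt(chi2 / k)    # should behave like t(k)
quantile(t_sim, c(0.025, 0.975))   # close to the exact t(5) quantiles
qt(c(0.025, 0.975), df = k)        # -2.571, 2.571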
More on "Student"
Origin of the t-test in industrial quality control²

The t statistic was introduced in 1908 by William Sealy Gosset, a statistician working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name). Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of beer. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was known to fellow statisticians.

² From the Wikipedia article Student's t-test, accessed 30 Oct 2008
The One-Sample t-Statistic

- The one-sample t statistic is as follows:

  t = \frac{\bar{x} - \mu}{s / \sqrt{n}}

- Notice that the only thing different here from the z-test is that we use s in place of σ
- Since each sample will have a different standard deviation s, we incorporate this added uncertainty into our estimation of the test statistic.
  - The t statistic has a different degrees of freedom for each sample size, resulting in greater confidence as the sample size increases. The degrees of freedom for t are n − 1.
  - There is a separate distribution for each n
  - We write the t distribution with k degrees of freedom as t(k). See Table C in Moore et al. (2009) for critical values of t
Degrees of Freedom (1)

- The degrees of freedom is usually denoted by df
- This is the number of values (observations) that are allowed to vary when computing a statistic
  - In other words, it is the number of independent pieces of information that are used in calculating the statistic
  - In practical terms, it is almost synonymous with sample size
  - For example, consider the calculation of the sample variance s², which has n − 1 df:

    s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}
Degrees of Freedom (2)

- To calculate the variance s², we start by calculating the mean (Σxᵢ/n). Next we find the sum of the squared deviations from that mean:

  \sum (x_i - \bar{x})^2

- There are n squared deviations, but only (n − 1) of them are free to take on any value whatsoever
  - The last squared deviation is determined by the others: it involves the one value of X that ensures Σxᵢ/n equals the mean of the sample.
  - For these reasons, the variance s² is said to have only (n − 1) degrees of freedom
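A quick illustrative check (added here, not in the slides): R's var() divides by n − 1, matching the formula above.

# in R (added sketch)
x <- c(3, 7, 8, 12)
n <- length(x)
sum((x - mean(x))^2) / (n - 1)   # by-hand sample variance: 13.67
var(x)                           # identical result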
The t distributions

- The t distributions are similar in shape to the normal distribution – they are symmetric, have a mean of zero, and are single-peaked and bell-shaped
- The t distributions differ from the normal distribution in terms of spread, however
  - Because we substitute s for the fixed parameter σ, we add more variation into the statistic (in other words, less certainty). This results in the standard deviation of the t distribution being larger than the standard deviation of the normal distribution
  - As the sample size n (and degrees of freedom, df = n − 1) increases, the t distribution approaches the N(0, 1)
    - In other words, with large sample sizes, t and z give nearly identical results
    - This can be seen in Moore et al. (2009) Table C (see the bottom row)
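A short added sketch showing the convergence: the upper .025 critical values of t approach the normal value 1.96 as df grows.

# in R (added sketch)
df <- c(5, 10, 30, 100, 1000)
round(qt(0.975, df), 3)   # 2.571 2.228 2.042 1.984 1.962
qnorm(0.975)              # 1.960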
The t-distributions (2)
[figure not included in transcript]
Confidence Intervals for a Population Mean using the t-statistic

- Procedures for confidence intervals and significance tests for a population mean follow exactly the same structure as when using the normal distribution
- All we do now, then, is use s in place of σ and the t-statistic instead of the z-statistic
- So, if we want a confidence interval for a population mean we use the following formula:

  \bar{x} \pm t^* \frac{s}{\sqrt{n}} = \bar{x} \pm t^* \times SE_{\bar{x}}

  where SE_{\bar{x}} = s / \sqrt{n}
- Here t* is the upper (1 − C)/2 critical value for the t(n − 1) distribution; recall that C is the level of confidence we choose
Hypothesis Tests for a Population Mean and the t-distribution

- For a significance test, we again specify the alternative hypothesis and the null hypothesis in the same manner as for the z-statistic
- Assume the null hypothesis H0 : µ = µ0
- The one-sample t-statistic with n − 1 df is:

  t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}

- Just as for the significance tests based on the z-statistic, we can specify one of three different alternative hypotheses (the corresponding p-value computations are sketched in R after this slide):
  - Ha : µ > µ0 (we look at the area to the right of t)
  - Ha : µ < µ0 (we look at the area to the left of t)
  - Ha : µ ≠ µ0 (we look at the area more extreme than |t|)
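A hedged added sketch of how each alternative maps to a p-value in R using pt(); the observed t and df here are placeholders, not values from the slides.

# in R (added sketch; t_obs and df are hypothetical)
t_obs <- 2.1; df <- 24
pt(t_obs, df, lower.tail = FALSE)            # Ha: mu > mu0
pt(t_obs, df)                                # Ha: mu < mu0
2 * pt(abs(t_obs), df, lower.tail = FALSE)   # Ha: mu != mu0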
Hypothesis Tests for a Population Mean and the t-distribution (2)
[figure not included in transcript]
Assumptions and Statistical Inference using the t-distribution

- The data are from a simple random sample
  - Typically we relax this assumption to include samples that approximate a simple random sample
- The population has a normal distribution with mean µ and standard deviation σ
  - We relax this assumption and assume only that the distribution is symmetric, single-peaked, and does not have serious outliers (like the mean, t-procedures are strongly influenced by outliers).
  - The larger the sample, the more we can relax the normality requirement:
    - Small samples (n < 15): sample data should be close to normally distributed and have no serious outliers
    - Samples of 15 ≤ n < 40: use if there are no outliers and no strong skewness (normality not necessary)
    - Large samples (n ≥ 40): t procedures can be used even if the sample distribution is skewed
Calculating the confidence interval for a mean using the t-distribution

- Recalling the example from last week, assume that we are interested in finding a 95% confidence interval for average years of education
  - This means that we want the middle 95% of the area of the sampling distribution for the mean of education
- We are given the following information:

  n = 1000     (sample size)
  s = 4.5      (standard deviation of education)
  x̄ = 12.5     (mean of education)
  t* = 1.962   (critical value of t from Table C)
Calculating the confidence interval for a mean using the t-distribution (2)

- Calculating the confidence interval is now as simple as substituting the known values (n = 1000, s = 4.5, x̄ = 12.5, t* = 1.962) into the equation:

  \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 12.5 \pm 1.962 \frac{4.5}{\sqrt{1000}} = 12.5 \pm .279 = 12.221 \text{ to } 12.779

- In conclusion, we are 95% confident that the true education mean lies between 12.221 and 12.779 years
- Remember: We have used a method that works 95% of the time. For a given sample we are either completely right or completely wrong – we are not 95% correct! We can't know whether we are correct or not.
Calculating the confidence interval for a mean using the t-distribution (3)
Calculations for example in R and in Stata

> # in R
> qt(0.975, 999)
[1] 1.962341
> # Margin of error
> qt(0.975, 999)*(4.5/sqrt(1000))
[1] 0.2792461
> # CI lower and upper bounds
> 12.5 - 0.2792461
[1] 12.22075
> 12.5 + 0.2792461
[1] 12.77925

. * in Stata
. display invttail(999, 0.025)
1.9623415
. * Margin of error
. display invttail(999, 0.025)*(4.5/sqrt(1000))
.27924609
. * CI lower and upper bounds
. display 12.5 - 0.27924609
12.220754
. display 12.5 + 0.27924609
12.779246
Normal Distribution versus the t-distribution

- For this example, a 95% CI is nearly equivalent using the t-distribution and the normal distribution
  - The critical value of t for n = 1000 is 1.962 – not much different from the critical value for z of 1.960 for a 95% CI using the normal distribution
- Let's compare the results for a 95% CI for the t distribution and the normal distribution for the same example but for a smaller sample (n = 10, df = 9):

  \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 12.5 \pm 2.262 \frac{4.5}{\sqrt{10}} = 12.5 \pm 3.219 = 9.281 \text{ to } 15.719

  \bar{x} \pm z^* \frac{s}{\sqrt{n}} = 12.5 \pm 1.96 \frac{4.5}{\sqrt{10}} = 12.5 \pm 2.789 = 9.711 \text{ to } 15.289

- We see here that the CI is larger for the t distribution
  - The difference between t* and z* gets bigger the smaller the sample size
Calculating a significance test for a mean using the t-distribution

- Recalling the education example once again, we want to test whether the average level of education in the population is greater than 12
- We know the following information:

  n = 1000     (sample size)
  s = 4.5      (standard deviation of education)
  x̄ = 12.5     (mean of education)
  t* = 1.645   (critical value of t(999) for α = .05)

1. We start by stating the alternative hypothesis, which is that average education is greater than 12 years:

   Ha : µ > 12
Significance Tests: An example (2)

2. We state the (complementary) null hypothesis, which in this case is that the average education level is 12 years (or less):

   H0 : µ ≤ 12

3. We now carry out a t-test to see what proportion of samples would give an outcome as extreme as ours (12.5) if the null hypothesis (µ = 12) is correct. In this case we are looking for the proportion of samples that would be higher than 12.5:

   t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{12.5 - 12}{4.5/\sqrt{1000}} = \frac{.5}{.142} = 3.52

4. Since t = 3.52 is much higher than t* = 1.645, we can reject the null hypothesis at α = .05. In other words, we conclude (Ha) that average education in the population is higher than 12 years
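The same test checked in R (an added sketch; pt() gives the exact upper-tail p-value rather than a Table C bound):

# in R (added sketch)
t_stat <- (12.5 - 12) / (4.5 / sqrt(1000))
t_stat                                     # 3.514
pt(t_stat, df = 999, lower.tail = FALSE)   # one-sided p-value, about .0002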
One-sample t-tests in R

> x <- c(3, 3.2, 3.2, 3.3, 2.9, 3, 3.1, 3.1, 3.4)
> mean(x)
[1] 3.133333
> t.test(x, alternative="greater", mu=3, conf.level=.95)

        One Sample t-test

data:  x
t = 2.5298, df = 8, p-value = 0.01763
alternative hypothesis: true mean is greater than 3
95 percent confidence interval:
 3.035327      Inf
sample estimates:
mean of x
 3.133333

> # other tests: alternative="less"; alternative="two.sided"
One-sample t-tests in Stata

. input var

           var
  1. 3
  2. 3.2
  3. 3.2
  4. 3.3
  5. 2.9
  6. 3
  7. 3.1
  8. 3.1
  9. 3.4
 10. end

. ttest var=3

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     var |       9    3.133333    .0527046    .1581139    3.011796     3.25487
------------------------------------------------------------------------------
    mean = mean(var)                                          t =     2.5298
Ho: mean = 3                                 degrees of freedom =          8

    Ha: mean < 3                 Ha: mean != 3                 Ha: mean > 3
 Pr(T < t) = 0.9824         Pr(|T| > |t|) = 0.0353          Pr(T > t) = 0.0176
Matched Pairs
A Special Kind of One-sample t-test

- So far we have only examined one-sample t-tests for a single population mean
- A matched pairs test is a special kind of one-sample test: it tests for a difference in a variable measured twice on the same observational unit
  - For example, a test for a difference in salary between husbands and wives would require a matched pairs design – the husbands and wives are not independent observations. In this case "married couples" are the observations
  - Another example of a matched pairs design is a test for change over time for individuals – i.e., information is gathered from people at two points in time, and we test for a difference over time
Matched Pairs Example
Incomes of husbands and wives

- Our goal is to test whether husbands have higher incomes than their wives on average
- Assume that we have the following sample of n = 6 married couples. X is husband's income − wife's income

  Obs   Husband's Income   Wife's Income       X
    1             50,000          41,000   9,000
    2             44,000          36,000   8,000
    3             51,000          45,000   6,000
    4             35,000          29,000   6,000
    5             30,000          22,000   8,000
    6             28,000          21,000   7,000

  \bar{x} = \frac{\sum x_i}{n} = \frac{44{,}000}{6} = 7{,}333 \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = 1{,}211
Matched Pairs Example (2)
Incomes of husbands and wives

- The alternative hypothesis is that husbands make more:

  Ha : µ > 0

- The null hypothesis is that there is no difference in the incomes of husbands and wives (or wives make more):

  H0 : µ ≤ 0

- We have the following information:

  n = 6
  x̄ = 7,333    (mean difference between husband and wife)
  s = 1,211    (s for difference between husband and wife)
  µ0 = 0

- The one-sample t(5) statistic for a test for a difference in means is then:

  t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
Matched Pairs Example (3)
Incomes of husbands and wives

- Substituting in the known values we get:

  t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7{,}333}{1{,}211/\sqrt{6}} = \frac{7{,}333}{494} = 14.83

- We can see from Table C that the upper-tail p-value for t(5) = 14.83 is much smaller than .0005
- In other words, we are quite confident that the null hypothesis is not correct, and thus we reject it
- We conclude, then, that husbands typically have higher incomes than their wives
Matched Pairs in R
Incomes of husbands and wives

> x <- c(9000, 8000, 6000, 6000, 8000, 7000)
> t.test(x, alternative="greater", mu=0, conf.level=.95)

        One Sample t-test

data:  x
t = 14.8324, df = 5, p-value = 1.260e-05
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 6337.067      Inf
sample estimates:
mean of x
 7333.333
Matched Pairs in Stata
Incomes of husbands and wives

. input x

             x
  1. 9000
  2. 8000
  3. 6000
  4. 6000
  5. 8000
  6. 7000
  7. end

. ttest x=0

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |       6    7333.333    494.4132     1211.06    6062.404    8604.263
------------------------------------------------------------------------------
    mean = mean(x)                                            t =    14.8324
Ho: mean = 0                                 degrees of freedom =          5

    Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
Matched Pairs in R
Aggressive behavior of dementia patients during full moon (Moore et al. 2009)

> Fullmoon <- read.table("D:/soci708/data/data_from_Moore_2009/PC-Text/ch07/ta07_002.txt", header = TRUE)
> attach(Fullmoon)
> Fullmoon
   patient aggmoon aggother aggdiff
1        1    3.33     0.27    3.06
2        2    3.67     0.59    3.08
3        3    2.67     0.32    2.35
4        4    3.33     0.19    3.14
5        5    3.33     1.26    2.07
6        6    3.67     0.11    3.56
7        7    4.67     0.30    4.37
8        8    2.67     0.40    2.27
9        9    6.00     1.59    4.41
10      10    4.33     0.60    3.73
11      11    3.33     0.65    2.68
12      12    0.67     0.69   -0.02
13      13    1.33     1.26    0.07
14      14    0.33     0.23    0.10
15      15    2.00     0.38    1.62
> xbar <- mean(aggdiff)
> xbar
[1] 2.432667
> s <- sd(aggdiff)
> s
[1] 1.460320
> n <- 15
> t <- (xbar-0)/(s/sqrt(n))
> t
[1] 6.451789
> pval <- 2*(1-pt(t,14))
> pval
[1] 1.518152e-05
> # Or use the t.test function
> t.test(aggdiff, alternative="two.sided", mu=0)
...
t = 6.4518, df = 14, p-value = 1.518e-05
Matched Pairs in Stata
Aggressive behavior of dementia patients during full moon (Moore et al. 2009)

. * open ta07_002.xls in Excel, copy data
. * open Data Editor in Stata, paste data
. * click Preserve then CLOSE Data Editor
. * to save as *.dta file use File/Save
. list ... (output not shown)

. stem aggdiff

 -0** | 02
  0** | 07,10
  0** |
  1** |
  1** | 62
  2** | 07,27,35
  2** | 68
  3** | 06,08,14
  3** | 56,73
  4** | 37,41

. * first do it by hand for practice
. su aggdiff

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     aggdiff |        15    2.432667     1.46032      -.02       4.41

. display 2.432667/(1.46032/sqrt(15))
6.4517906
. display 2*ttail(14, 6.4517906)
.00001518

. * now the easy way!
. ttest aggdiff=0
... (output not shown)
Two-sample Problems: Comparing Means

- Often we want to compare the means from two different populations or subgroups
  - For example, testing whether men have higher incomes than women. In this case, women and men constitute separate populations
- Such comparisons require two-sample t-test procedures
- A confidence interval is found as follows:

  (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

- Here the subscripts are used to denote which sample the numbers come from. So, we have two samples: one labeled "1" and the other labeled "2"
- The degrees of freedom for t* is the smaller of n1 − 1 and n2 − 1
Two-sample Problems: Comparing Means

- A two-sample t-statistic for a significance test of the hypothesis H0 : µ1 = µ2 is found as follows:

  t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

- Once again the subscripts denote which sample the numbers come from: one sample is labeled "1"; the other labeled "2"
- As was the case for the critical value of t for the confidence interval, the degrees of freedom for t is the smaller of n1 − 1 and n2 − 1³

³ Software uses an approximation formula for df; the estimated df falls between the smaller of n1 − 1 and n2 − 1 and n1 + n2 − 2, and is not necessarily an integer
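A hedged added sketch computing the two-sample statistic from summary statistics with the conservative df; the numbers anticipate the State-cabinets example that follows.

# in R (added sketch; summary statistics from the example below)
x1 <- 0.239864; s1 <- 0.135046; n1 <- 22   # Democratic-governor states
x2 <- 0.171235; s2 <- 0.08224;  n2 <- 17   # Republican-governor states
se <- sqrt(s1^2/n1 + s2^2/n2)
t_stat <- (x1 - x2) / se             # about 1.959
df <- min(n1, n2) - 1                # conservative df = 16
pt(t_stat, df, lower.tail = FALSE)   # one-sided p-value, about .034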
Assumptions for Comparing Two Means

1. Both samples should be SRSs from two distinct populations or subgroups
   - The samples must be independent of each other
   - In practice, subsamples of a single SRS satisfy this condition
2. Both populations must be normally distributed.
   - We cannot know whether the populations are normally distributed, but just as with the one-sample tests and confidence intervals for a mean, we can (and should) look at the sample distributions for signs of nonnormality
   - If the population distributions differ, larger sample sizes are needed
3. Finally, the two-sample tests are most robust when the two samples are the same size
Two-sample Problems: An example
Proportion of women in State cabinets

- Suppose we want to test whether the proportion of women in State cabinets is higher in states with Democratic governors (population 1) than in states with Republican governors (population 2)
  - The alternative hypothesis is: Ha : µ1 > µ2
  - The null hypothesis is: H0 : µ1 ≤ µ2
  - We choose the α = .05 level
  - We have the following data from the two SRSs (sets of states with cabinets):

    > Woprops[1:5,]
      state. demo womno cabno woprop
    1     NJ    0     6    20  0.300
    2     IL    0     7    27  0.259
    3     IA    0     5    20  0.250
    4     ME    0     5    20  0.250
    5     WI    0     5    20  0.250
    ...
Two-sample Problems: An example (2)
Proportion of women in State cabinets

- The following statistics were calculated from the samples:

  Democrat                Republican
  x̄1 = 0.239864           x̄2 = 0.171235
  s1 = 0.135046           s2 = 0.08224
  n1 = 22                 n2 = 17

- Substituting these values into the equation gives the t-statistic with (n2 − 1 = 17 − 1 = 16) degrees of freedom:

  t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{.239864 - .171235}{\sqrt{\frac{0.135046^2}{22} + \frac{0.08224^2}{17}}} = \frac{0.068629}{0.0350260} = 1.959372
Two-sample Problems: An example (3)
Proportion of women in State cabinets

- Since, equivalently,
  - the t-statistic 1.959 is more than 1.746 (the one-sided critical value of t(16) for α = .05), and
  - the one-sided P-value 0.0339 is less than .05,
  we can reject the null hypothesis that the proportion of women is the same in states with Democratic and Republican governors, against the alternative that it is higher in states with Democratic governors
- On the other hand, if we had carried out a two-sided test, the corresponding p-value would have been 2 × 0.0339 = 0.0677. We would conclude that we cannot reject the null hypothesis that the proportion of women is the same in Democratic- and Republican-governed states against the alternative that they are different
- In the two-sided context we can also construct a confidence interval for the difference in the proportion of women in the cabinets of Democratic and Republican governors
Two-sample Test for Mean in R
Proportion of women in State cabinets: Two-sided test

> library(foreign) # provides the read.dta procedure
> Woprops <- read.dta("D:/soci708/data/woprops.dta")
> attach(Woprops)
> t.test(woprop~demo, alternative="two.sided", conf.level=.95, var.equal=FALSE)

        Welch Two Sample t-test

data:  woprop by demo
t = -1.9594, df = 35.317, p-value = 0.058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.139712197  0.002455513
sample estimates:
mean in group 0 mean in group 1
      0.1712353       0.2398636

> # States with a Democratic governor do not have a significantly
> # different proportion of women cabinet members (2-sided)
Two-sample Test for Mean in R
Proportion of women in State cabinets: One-sided test

> library(foreign) # provides the read.dta procedure
> Woprops <- read.dta("D:/soci708/data/woprops.dta")
> attach(Woprops)
> t.test(woprop~demo, alternative="less", conf.level=.95, var.equal=FALSE)

        Welch Two Sample t-test

data:  woprop by demo
t = -1.9594, df = 35.317, p-value = 0.029
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
        -Inf -0.009463723
sample estimates:
mean in group 0 mean in group 1
      0.1712353       0.2398636

> # Now states with a Democratic governor have a significantly
> # greater proportion of women cabinet members (1-sided)
Two-sample Test for Mean in Stata
Proportion of women in State cabinets: First, look at data

. stem woprop if demo

  0** | 00
  0** | 67
  1** | 00,30
  1** | 50,67,82,88
  2** | 00,00,22,31,40
  2** | 61,67,73,86
  3** | 00,33
  3** |
  4** | 00,44
  4** |
  5** |
  5** |
  6** | 36

. stem woprop if !demo

  0** | 48
  0** | 67,71,77
  0** |
  1** | 05,05,11
  1** |
  1** |
  1** | 67
  1** | 85
  2** |
  2** | 22,22,22
  2** | 50,50,50,59
  2** |
  2** |
  3** | 00

. by demo: su woprop

-> demo = 0
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      woprop |        17    .1712353    .0822401      .048         .3

-> demo = 1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      woprop |        22    .2398636    .1350461         0       .636
Two-sample Test for Mean in Stata
Proportion of women in State cabinets: Equal variances (default)

. ttest wopro, by(demo)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      17    .1712353    .0199462    .0822401    .1289513    .2135193
       1 |      22    .2398636    .0287919    .1350461    .1799875    .2997397
---------+--------------------------------------------------------------------
combined |      39    .2099487    .0190242    .1188063    .1714362    .2484613
---------+--------------------------------------------------------------------
    diff |           -.0686283    .0372071               -.144017    .0067604
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                  t =    -1.8445
Ho: diff = 0                                 degrees of freedom =         37

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0366         Pr(|T| > |t|) = 0.0731          Pr(T > t) = 0.9634

. * contrary to the R examples, Stata assumes equal variances by default
. * df are calculated as (17 - 1) + (22 - 1) = 37
Two-Sample Means Test in Stata
Proportion of women in State cabinets: Unequal variances

. ttest wopro, by(demo) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      17    .1712353    .0199462    .0822401    .1289513    .2135193
       1 |      22    .2398636    .0287919    .1350461    .1799875    .2997397
---------+--------------------------------------------------------------------
combined |      39    .2099487    .0190242    .1188063    .1714362    .2484613
---------+--------------------------------------------------------------------
    diff |           -.0686283    .0350261              -.1397122    .0024555
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                  t =    -1.9594
Ho: diff = 0                  Satterthwaite's degrees of freedom =    35.3172

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0290         Pr(|T| > |t|) = 0.0580          Pr(T > t) = 0.9710

. * same as in the R examples, assume unequal variances (option unequal)
. * note p-values smaller because the assumption of unequal variances
. * seems more realistic in this situation (compare the sd's)
Inference for Proportions

- So far we have looked only at how we can make inferences about population means
- Similar techniques can be used to make inferences about population proportions
- Recall that a sample proportion is calculated as follows:

  \hat{p} = \frac{\text{count of successes in the sample}}{\text{total observations in the sample}}

  Here p̂ denotes a sample proportion and p is the population proportion.
- As with the case for means, we use the sampling distribution to make inferences about population proportions
Sampling Distribution of a Sample Proportion

- The sampling distribution of a sample proportion behaves in a manner similar to the sampling distribution of the sample mean
- As we saw earlier, the sampling distribution of a sample proportion has the following characteristics (a small simulation illustrating them follows below):
  1. The sampling distribution of p̂ becomes approximately normal as the sample size increases
  2. The mean of the sampling distribution of p̂ is p
  3. The standard deviation of the sampling distribution of p̂ is:

     \sqrt{\frac{p(1-p)}{n}}
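An added simulation sketch checking these three properties, assuming for illustration p = .3 and n = 900:

# in R (added sketch; p and n are hypothetical)
set.seed(2)
p <- 0.3; n <- 900
phat <- rbinom(1e5, size = n, prob = p) / n   # 100,000 sample proportions
mean(phat)                # close to p = .3
sd(phat)                  # close to the formula value below
sqrt(p * (1 - p) / n)     # 0.01528
hist(phat)                # roughly normal in shape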
Assumptions for Inference about Proportions

1. We assume a simple random sample
2. The normal approximation and the formula for the standard deviation hold only when the sample is no more than 1/10 the size of the population
3. The sample size must be sufficiently large in relation to p:
   - np and n(1 − p) must both be at least 10
   - This suggests, then, that the normal approximation is most accurate when p is close to .5 and least accurate when p is close to 0 or 1
   - When these criteria are met, we can replace the unknown standard deviation of the sampling distribution of p̂ with its standard error:

     SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
Confidence Intervals and Hypothesis Tests for Proportions

- CIs for proportions take the usual form:

  Estimate ± z* × SE

- Since we are assuming a normal distribution, we use a critical z value:

  \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}

- Here z* is the upper (1 − C)/2 standard normal critical value – i.e., we look to Table A in Moore et al. (2009)
- We test the hypothesis H0 : p = p0 by computing the z statistic:

  z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}
Example of a Confidence Interval

- Imagine that we have an SRS of 2500 Canadians. We ask whether the respondent has lived abroad for at least 1 year. We count X = 187 "Yes" answers. We want to estimate the population proportion p with 99% confidence.
- We have the following information from our sample:

  p̂ = 187/2500 = .075       n = 2500
  z* = 2.576                (for 99% CI)

- Substituting this information into the formula, we get:

  \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .075 \pm 2.576 \sqrt{\frac{(.075)(.925)}{2500}} = .075 \pm .014 = .061 \text{ to } .089

- We are 99% confident that between 6.1% and 8.9% of Canadians have lived abroad
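The same interval checked in R (an added sketch; qnorm(0.995) gives z* for 99% confidence):

# in R (added sketch)
phat  <- 187 / 2500                      # .0748, rounded to .075 on the slide
zstar <- qnorm(0.995)                    # 2.576
me <- zstar * sqrt(phat * (1 - phat) / 2500)
c(lower = phat - me, upper = phat + me)  # about .061 to .088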
Another Example of a Confidence Interval
Obama v. McCain – Gallup Poll of 22 Oct 2008

- The Gallup Poll reports Obama 51%, McCain 45%, Other or undecided 6%; n = 2788 registered voters; margin of error is ±2%
- For a 95% CI we have z* = 1.960
- Focusing on Obama support and substituting into the formula, we get:

  \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .51 \pm 1.960 \sqrt{\frac{(.51)(.49)}{2788}} = .51 \pm .019 = .491 \text{ to } .529

- We are 95% confident that Obama support is between 49% and 53% of registered voters
- Note that the declared margin of error of ±2% is slightly conservative
Example of a Significance Test

- Using an earlier example, we now test whether the percentage of Canadians who have lived abroad differs from 5%
- We choose α = .01 and thus need a critical value of z* = 2.576 (this is a two-tailed test!)

  H0 : p = .05
  Ha : p ≠ .05

- Substituting the known information into the formula we get:

  z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.075 - .05}{\sqrt{\frac{.05(1-.05)}{2500}}} = 5.735

- Since the z-statistic is much larger than the critical value z*, we can reject H0
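An added R check of this z statistic and its two-sided p-value:

# in R (added sketch)
phat <- 0.075; p0 <- 0.05; n <- 2500
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
z                    # 5.735 (software using the unrounded phat = .0748
                     # gets z = 5.6895, as in the Stata output below)
2 * pnorm(-abs(z))   # two-sided p-value, about 1e-08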
One-sample Test for Proportion in R

> # in R
> prop.test(187, 2500, p=.05, alternative="two.sided", conf.level=.99, correct=FALSE)

        1-sample proportions test without continuity correction

data:  187 out of 2500, null probability 0.05
X-squared = 32.3705, df = 1, p-value = 1.274e-08
alternative hypothesis: true p is not equal to 0.05
99 percent confidence interval:
 0.06234433 0.08950663
sample estimates:
     p
0.0748

> # Alternatives are "two.sided", "greater" (p > p0), and "less" (p < p0)
One-sample Test for Proportion in Stata

. * in Stata
. prtesti 2500 187 .05, level(99) count

One-sample test of proportion                      x: Number of obs =     2500
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [99% Conf. Interval]
-------------+----------------------------------------------------------------
           x |      .0748   .0052614                      .0612476    .0883524
------------------------------------------------------------------------------
    p = proportion(x)                                         z =     5.6895
Ho: p = 0.05

    Ha: p < 0.05                 Ha: p != 0.05                 Ha: p > 0.05
 Pr(Z < z) = 1.0000         Pr(|Z| > |z|) = 0.0000          Pr(Z > z) = 0.0000
Sample Size for Desired Margin of Error

- Just as was the case for inference for means, when collecting data it can be important to choose a sample size large enough to obtain a desired margin of error
- The margin of error is determined by:

  m = z^* \sqrt{\frac{p^*(1-p^*)}{n}}

- Since p̂ is not known before the data are collected, we must guess its value with p*. We can use either a pilot study or the conservative estimate of .5 (this will give the largest possible margin of error).
- The sample size can then be calculated as follows:

  n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)
Sample Size for Desired Margin of Error
An Example

- A public opinion firm wants to determine the sample size needed to estimate the proportion of adults in North Carolina holding a variety of opinions with 95% confidence and a margin of error of ±3%
- Since there are several opinion questions, the firm wants a sample size sufficient to ensure the desired margin of error in the worst-case scenario – i.e., when p = 0.5
- Applying the formula for n on the previous slide, the firm calculates:

  > # in R
  > (1.96/0.03)^2*0.5*(1-0.5)
  [1] 1067.111

- Thus, rounding up, the firm will need a sample of n = 1,068 respondents
Sample Size for Desired Margin of Error
Why p = .5 is used to calculate sample size when true p is unknown

. * in Stata
. twoway function y=x*(1-x), range(0 1) xtitle("p") ytitle("p*(1-p)")

- p = .5 corresponds to the maximum of the variance term p(1 − p), hence the largest (most conservative) estimate of n needed to achieve the desired margin of error
Comparing Two Proportions: Confidence Intervals

- Again, as in the case with means, it is often of interest to compare two populations
- The confidence interval for comparing two proportions is given by:

  (\hat{p}_1 - \hat{p}_2) \pm z^* SE_{\hat{p}_1 - \hat{p}_2}

  (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

  As usual, z* is the upper (1 − C)/2 standard normal critical value
- This confidence interval can be used when the populations are at least 10 times as large as the samples and the counts of successes in both samples are 5 or more
Comparing Two Proportions: Significance Tests

- Significance tests for the difference between two proportions also follow a pattern similar to the tests for a difference in means
- To test H0 : p1 = p2 we calculate the z statistic as follows:

  z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}

  Here p̂1 is for sample one; p̂2 is for sample two; p̂ (without the subscript) is the pooled sample proportion:

  \hat{p} = \frac{\text{count of successes in both samples combined}}{\text{total observations in both samples combined}}
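An added helper sketch implementing the pooled two-proportion z test; x1 and x2 are success counts, and the function name is my own, not from the slides.

# in R (added sketch)
prop_z_test <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1; p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)   # pooled proportion
  z <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
  c(z = z, p_two_sided = 2 * pnorm(-abs(z)))
}
prop_z_test(23, 96, 20, 154)   # the ridge-count example below: z about 2.24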
Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers

- A higher ridge count on the fingers of the left hand is a measure of body asymmetry
- It is perhaps related to differential development of the hemispheres of the brain – and thus may differ between men and women
- From Kimura, Doreen (1999, Figure 12.2, p. 167)
Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (2)

- Data from Kimura (1999, Table 12.2, p. 169):

  Frequency of Left-higher Ridge Count in Right-handers

           Left-higher   Not Left-higher   Total
  Women            23                73      96
  Men              20               134     154

- We have the following information from the data:

  Women (1)                     Men (2)
  n1 = 96                       n2 = 154
  X1 = 23                       X2 = 20
  p̂1 = 23/96 = .240             p̂2 = 20/154 = .130
Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (3)

- Using the formula for the confidence interval for the difference of two proportions we have:

  (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

  (.240 - .130) \pm 1.960 \sqrt{\frac{.240(1-.240)}{96} + \frac{.130(1-.130)}{154}}

  .110 \pm 1.960 \times .05133 = 0.0094 \text{ to } 0.2106

- We conclude with 95% confidence that the difference in the proportion left-higher between women and men is between 0.9% and 21.1%
- As this interval does not include zero, we can also conclude that the difference is significant at the .05 level
Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (4)

- To directly test the hypothesis H0 : p1 = p2 we calculate the z statistic as follows:

  z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.240 - .130}{\sqrt{.172(1-.172)\left(\frac{1}{96} + \frac{1}{154}\right)}} = 2.2415

- Here p̂1 is for sample one; p̂2 is for sample two; p̂ (without the subscript) is the pooled sample proportion: (23 + 20)/(96 + 154) = .172
Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (5)

- The two-sided p-value P(|Z| > 2.2415) corresponding to the alternative hypothesis Ha : p1 ≠ p2 is 0.025.
  - We conclude once again that the proportion left-higher differs significantly between women and men at the α = .05 level
- If we wanted to test the one-sided alternative Ha : p1 > p2, we would obtain the one-sided p-value P(Z > 2.2415) by dividing the two-sided p-value .025 by 2, obtaining 0.0125
  - We conclude that the one-sided alternative hypothesis Ha : p1 > p2 is also significant at the .05 level
  - Software packages often give only the two-sided p-value, which has to be divided by 2 to obtain the one-sided p-value
Two-Sample Test for Proportions in R
Left-Higher Ridge Count: Two-Sided Test

> # in R
> left.higher <- c(23, 20) # women lh, men lh
> subjects <- c(96, 154) # n women, n men
> prop.test(left.higher, subjects, correct=FALSE)

        2-sample test for equality of proportions without continuity correction

data:  left.higher out of subjects
X-squared = 4.9982, df = 1, p-value = 0.02537
alternative hypothesis: two.sided
95 percent confidence interval:
 0.009170057 0.210256350
sample estimates:
   prop 1    prop 2
0.2395833 0.1298701

> sqrt(4.9982)
[1] 2.235665
Two-Sample Test for Proportions in R
Left-Higher Ridge Count: One-Sided Test

> # in R
> left.higher <- c(23, 20) # women lh, men lh
> subjects <- c(96, 154) # n women, n men
> prop.test(left.higher, subjects, alternative="greater", correct=FALSE)

        2-sample test for equality of proportions without continuity correction

data:  left.higher out of subjects
X-squared = 4.9982, df = 1, p-value = 0.01269
alternative hypothesis: greater
95 percent confidence interval:
 0.02533473 1.00000000
sample estimates:
   prop 1    prop 2
0.2395833 0.1298701
Two-Sample Test for Proportions in Stata
Left-Higher Ridge Count: One-Sided & Two-Sided Tests

. * in Stata
. * syntax is "prtesti n1 p1 n2 p2"
. * or "prtesti n1 X1 n2 X2, count"
. prtesti 96 23 154 20, count

Two-sample test of proportion                      x: Number of obs =       96
                                                   y: Number of obs =      154
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2395833   .0435631                     .1542013    .3249654
           y |   .1298701   .0270886                     .0767775    .1829628
-------------+----------------------------------------------------------------
        diff |   .1097132   .0512985                     .0091701    .2102564
             | under Ho:    .0490742   2.24   0.025
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                              z =     2.2357
Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9873         Pr(|Z| < |z|) = 0.0254          Pr(Z > z) = 0.0127
Two-Sample Test for Proportions in Stata
Left-Higher Ridge Count: Equivalence of z Test for Proportions & χ² Test

. * in Stata
. * "prtesti 96 23 154 20, count" is equivalent to using
. * the tabi command with the original contingency table:
. tabi 23 73 \ 20 134, chi2

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |        23         73 |        96
         2 |        20        134 |       154
-----------+----------------------+----------
     Total |        43        207 |       250

          Pearson chi2(1) =   4.9982   Pr = 0.025

. * take the square root of the chi-squared
. display sqrt(4.9982)
2.2356654
. * note it is the same z as obtained with prtesti!
. * this is a glimpse of the next topic
Next week:

- Inference for crosstabs