Soci708 – Statistics for Sociologists
Module 8 – Inference for Means & Proportions
François Nielsen, University of North Carolina, Chapel Hill. Fall 2008.
(Adapted from slides for the course Quantitative Methods in Sociology (Sociology 6Z3) taught at McMaster University by Robert Andersen, now at University of Toronto.)

Goals of Today's Lecture
- What can we do when we don't know the standard deviation of the population?
- Practical application of inference for means:
  - Standard errors
  - Significance tests and confidence intervals based on the t statistic
  - One-sample tests (including matched pairs) and two-sample tests
- Practical application of inference for proportions:
  - Significance tests and confidence intervals based on z-tests
  - One-sample tests (including matched pairs) and two-sample tests

Sampling Distribution and σ
- Recall the three characteristics of the sampling distribution of a sample mean $\bar{x}$:
  1. It is close to normally distributed (central limit theorem)
  2. The mean of the sampling distribution of $\bar{x}$ equals the population mean $\mu$
  3. The standard deviation of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
- The problem in real applications is that we do not know the population standard deviation σ, and hence cannot determine the standard deviation of the sampling distribution
- Recall that last week we proceeded as if we knew σ

Statistical Significance for Means when σ is Known
- Recall that we can standardize $\bar{x}$ if we know σ, resulting in the one-sample z statistic:
  $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
  where z tells us how far the observed sample mean $\bar{x}$ is from µ in units of the standard deviation of $\bar{x}$
- From the z statistic we can solve for a confidence interval for µ:
  $\mu = \bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
  where $z^*$ is the critical value of z, i.e., the value of z that marks off the specified area of interest under the normal curve

Calculating the Confidence Interval for a Mean when σ is Known
1. Calculate the point estimate of the population mean: $\bar{x} = \sum x_i / n = 12.5$
2. Find the critical value $z^*$ corresponding to the desired level of confidence. For example, a 95% CI requires the middle 95% of the area, so we need the $z^*$ that cuts off a tail area of $(1 - .95)/2 = .025$, i.e., the value at the $1 - .05/2 = .975$ quantile. From Table A, $z^* \approx 1.96$
3. Substitute the known values into the formula $\bar{x} \pm z^* \sigma/\sqrt{n}$

Calculating a Significance Test for a Mean when σ is Known
1. Start with an alternative hypothesis, e.g. $H_a: \mu > \mu_0$ (any direction can be specified)
2. The null hypothesis is then stated as the complementary event: $H_0: \mu \le \mu_0$
3. Carry out a test of significance of $\bar{x} - \mu_0$:
  $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
- High z-scores (small P-values) are evidence against the null hypothesis. We can specify an α level in advance, or simply report the significance level (P-value)

What Do We Do when σ is Not Known?
- To use these inference methods on a population for which σ is not known, we need some way of estimating the standard deviation of $\bar{x}$
- We do this using information from the sample: we substitute for σ the sample standard deviation s
- This replaces the standard deviation of $\bar{x}$ with its standard error:
  $SE_{\bar{x}} = s/\sqrt{n}$
- When we use the SE to estimate the standard deviation of $\bar{x}$, the resulting test statistic is no longer normally distributed; it follows a Student t distribution
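Before turning to the t procedures, the known-σ interval and test above can be computed directly. A minimal R sketch: the values (x̄ = 12.5, n = 1000, µ0 = 12) mirror the education example used later in these slides, and treating σ = 4.5 as a known population value is an assumption made purely for illustration.

# z-based CI and test with sigma treated as known (illustrative values)
xbar  <- 12.5                            # sample mean
sigma <- 4.5                             # "known" population sd (assumed)
n     <- 1000                            # sample size
mu0   <- 12                              # hypothesized mean under H0

zstar <- qnorm(0.975)                    # 1.96 for a 95% CI
me    <- zstar * sigma / sqrt(n)         # margin of error
c(xbar - me, xbar + me)                  # CI bounds

z    <- (xbar - mu0) / (sigma / sqrt(n)) # one-sample z statistic
pval <- 1 - pnorm(z)                     # upper-tail P-value for Ha: mu > mu0
c(z = z, p = pval)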
"Student" Strikes Again!
- William Gosset published the t distribution under the pseudonym "Student" while employed by the Guinness brewery in Dublin
- A t(k)-distributed variable is equivalent to
  $t(k) = \frac{Z}{\sqrt{\chi^2(k)/k}}$
  where Z and $\chi^2(k)$ are independent and k is the degrees of freedom

More on "Student"
Origin of the t-test in industrial quality control (from the Wikipedia article "Student's t-test", accessed 30 Oct 2008):
The t statistic was introduced in 1908 by William Sealy Gosset, a statistician working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name). Gosset had been hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness's industrial processes. Gosset devised the t-test as a way to cheaply monitor the quality of beer. He published the test in Biometrika in 1908, but was forced to use a pen name by his employer, who regarded the fact that they were using statistics as a trade secret. In fact, Gosset's identity was known to fellow statisticians.

The One-Sample t-Statistic
- The one-sample t statistic is
  $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
- Notice that the only difference from the z-test is that we use s in place of σ
- Since each sample has a different standard deviation s, this added uncertainty is incorporated into the test statistic:
  - The t statistic has different degrees of freedom for each sample size, with the extra uncertainty shrinking as the sample size increases. The degrees of freedom for t are n − 1, so there is a separate distribution for each n
  - We write the t distribution with k degrees of freedom as t(k). See Table C in Moore et al. (2009) for critical values of t

Degrees of Freedom (1)
- The degrees of freedom is usually denoted df
- This is the number of values (observations) that are free to vary when computing a statistic
  - In other words, it is the number of independent pieces of information used in calculating the statistic
  - In practical terms, it is nearly synonymous with sample size
- For example, consider the sample variance $s^2$, which has n − 1 df:
  $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Degrees of Freedom (2)
- To calculate the variance $s^2$, we start by calculating the mean $\bar{x} = \sum x_i / n$. Next we find the sum of squared deviations from that mean, $\sum (x_i - \bar{x})^2$
- There are n squared deviations, but only n − 1 of them are free to take on any value whatsoever
  - Because deviations from the mean must sum to zero, the last deviation is determined once the other n − 1 are known
  - For this reason, the variance $s^2$ is said to have only n − 1 degrees of freedom

The t Distributions
- The t distributions are similar in shape to the normal distribution: they are symmetric, have a mean of zero, and are single-peaked and bell-shaped
- The t distributions differ from the normal distribution in spread, however
  - Because we substitute s for the fixed parameter σ, we add more variation to the statistic (in other words, less certainty). As a result, the standard deviation of a t distribution is larger than the standard deviation of the standard normal distribution
- As the sample size n (and degrees of freedom, df = n − 1) increases, the t distribution approaches N(0, 1)
  - In other words, with large sample sizes t and z give nearly identical results. This can be seen in the bottom row of Table C in Moore et al. (2009)
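Both facts, the defining ratio $Z/\sqrt{\chi^2(k)/k}$ and the convergence of t toward N(0, 1), can be checked by simulation. A minimal R sketch; the seed, the choice k = 5, and the simulation size are arbitrary illustrative settings:

# Build a t(k) variable from independent Z and chi-square(k) draws
set.seed(708)                      # arbitrary seed
k <- 5                             # degrees of freedom
z <- rnorm(1e5)                    # standard normal draws
w <- rchisq(1e5, df = k)           # independent chi-square(k) draws
t_built <- z / sqrt(w / k)         # constructed t(k) variable
quantile(t_built, c(.025, .975))   # close to...
qt(c(.025, .975), df = k)          # ...the exact t(5) quantiles

# t critical values approach the normal critical value 1.96 as df grows
qt(0.975, df = c(5, 15, 40, 100, 1000))
qnorm(0.975)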
The t Distributions (2)
[Figure: density curves of t distributions for several df compared with the standard normal curve]

Confidence Intervals for a Population Mean Using the t-Statistic
- Procedures for confidence intervals and significance tests for a population mean follow exactly the same structure as when using the normal distribution
- All we do now is use s in place of σ and the t statistic instead of the z statistic
- So, if we want a confidence interval for a population mean, we use
  $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = \bar{x} \pm t^* \times SE_{\bar{x}}$, where $SE_{\bar{x}} = s/\sqrt{n}$
- Here $t^*$ is the upper (1 − C)/2 critical value of the t(n − 1) distribution; recall that C is the chosen level of confidence

Hypothesis Tests for a Population Mean and the t-Distribution
- For a significance test, we again specify the alternative and null hypotheses in the same manner as for the z statistic
- Assume the null hypothesis $H_0: \mu = \mu_0$
- The one-sample t statistic with n − 1 df is
  $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
- Just as for significance tests based on the z statistic, we can specify one of three alternative hypotheses:
  - $H_a: \mu > \mu_0$ (we look at the area to the right of t)
  - $H_a: \mu < \mu_0$ (we look at the area to the left of t)
  - $H_a: \mu \ne \mu_0$ (we look at the area more extreme than |t|)

Hypothesis Tests for a Population Mean and the t-Distribution (2)
[Figure: areas under the t curve corresponding to the three alternative hypotheses]

Assumptions for Statistical Inference Using the t-Distribution
- The data are from a simple random sample
  - Typically we relax this assumption to include samples that approximate a simple random sample
- The population has a normal distribution with mean µ and standard deviation σ
  - We relax this assumption and require only that the distribution be symmetric and single-peaked, with no serious outliers (like the mean, t procedures are strongly influenced by outliers). The larger the sample, the more we can relax:
    - Small samples (n < 15): sample data should be close to normally distributed, with no serious outliers
    - Samples of 15 ≤ n < 40: usable if there are no outliers and no strong skewness (normality not necessary)
    - Large samples (n ≥ 40): usable even if the sample distribution is skewed

Calculating the Confidence Interval for a Mean Using the t-Distribution
- Recalling the example from last week, assume we want a 95% confidence interval for average years of education
  - This means we want the middle 95% of the area of the sampling distribution of mean education
- We are given the following information:
  n = 1000 (sample size)
  s = 4.5 (standard deviation of education)
  x̄ = 12.5 (mean of education)
  t* = 1.962 (critical value of t from Table C)

Calculating the Confidence Interval for a Mean Using the t-Distribution (2)
- Calculating the confidence interval is now as simple as substituting the known values into the equation:
  $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 12.5 \pm 1.962 \frac{4.5}{\sqrt{1000}} = 12.5 \pm .279 = 12.221 \text{ to } 12.779$
- In conclusion, we are 95% confident that the true mean of education lies between 12.221 and 12.779 years
- Remember: we have used a method that works 95% of the time. For a given sample we are either completely right or completely wrong; we are not 95% correct! We cannot know whether we are right or not.
Calculating the Confidence Interval for a Mean Using the t-Distribution (3)
Calculations for the example in R and in Stata:

> # in R
> qt(0.975, 999)
[1] 1.962341
> # Margin of error
> qt(0.975, 999)*(4.5/sqrt(1000))
[1] 0.2792461
> # CI lower and upper bounds
> 12.5 - 0.2792461
[1] 12.22075
> 12.5 + 0.2792461
[1] 12.77925

. * in Stata
. display invttail(999, 0.025)
1.9623415
. * Margin of error
. display invttail(999, 0.025)*(4.5/sqrt(1000))
.27924609
. * CI lower and upper bounds
. display 12.5 - 0.27924609
12.220754
. display 12.5 + 0.27924609
12.779246

Normal Distribution versus the t-Distribution
- For this example, a 95% CI is nearly equivalent using the t distribution and the normal distribution
  - The critical value of t for n = 1000 is 1.962, not much different from the critical value z* = 1.960 for a 95% CI using the normal distribution
- Let's compare the results of a 95% CI from the t distribution and the normal distribution for the same example but a smaller sample (n = 10, df = 9):
  $\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 12.5 \pm 2.262 \frac{4.5}{\sqrt{10}} = 12.5 \pm 3.219 = 9.281 \text{ to } 15.719$
  $\bar{x} \pm z^* \frac{s}{\sqrt{n}} = 12.5 \pm 1.960 \frac{4.5}{\sqrt{10}} = 12.5 \pm 2.789 = 9.711 \text{ to } 15.289$
- We see here that the CI is wider for the t distribution
  - The difference between t* and z* gets bigger the smaller the sample size

Calculating a Significance Test for a Mean Using the t-Distribution
- Recalling the education example once again, we want to test whether the average level of education in the population is greater than 12 years
- We know the following information:
  n = 1000 (sample size)
  s = 4.5 (standard deviation of education)
  x̄ = 12.5 (mean of education)
  t* = 1.645 (critical value of t(999) for α = .05)
1. We start by stating the alternative hypothesis, that average education is greater than 12 years: $H_a: \mu > 12$

Significance Tests: An Example (2)
2. We state the (complementary) null hypothesis, in this case that average education is 12 years (or less): $H_0: \mu \le 12$
3. We now carry out a t-test to see what proportion of samples would give an outcome as extreme as ours (12.5) if the null hypothesis (µ = 12) were correct. Here we want the proportion of sample means higher than 12.5:
  $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{12.5 - 12}{4.5/\sqrt{1000}} = \frac{.5}{.142} = 3.52$
4. Since t = 3.52 is much higher than t* = 1.645, we can reject the null hypothesis at α = .05. In other words, we conclude Ha: average education in the population is higher than 12 years
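The one-sided P-value for this test can also be computed directly. A minimal R sketch using the slide's numbers (the exact critical value comes out as about 1.646, essentially the 1.645 used above):

# Education example: H0: mu <= 12 vs Ha: mu > 12
t_stat <- (12.5 - 12) / (4.5 / sqrt(1000))  # 3.51, the slide's 3.52 after rounding
t_stat
1 - pt(t_stat, df = 999)                    # one-sided P-value, about 0.0002
qt(0.95, df = 999)                          # critical value t*, about 1.646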
One-Sample t-Tests in R

> x <- c(3, 3.2, 3.2, 3.3, 2.9, 3, 3.1, 3.1, 3.4)
> mean(x)
[1] 3.133333
> t.test(x, alternative="greater", mu=3, conf.level=.95)

        One Sample t-test

data:  x
t = 2.5298, df = 8, p-value = 0.01763
alternative hypothesis: true mean is greater than 3
95 percent confidence interval:
 3.035327      Inf
sample estimates:
mean of x
 3.133333

> # other tests: alternative="less"; alternative="two.sided"

One-Sample t-Tests in Stata

. input var
  (enter the nine values 3, 3.2, 3.2, 3.3, 2.9, 3, 3.1, 3.1, 3.4, then type end)
. ttest var=3

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
     var |       9    3.133333    .0527046    .1581139    3.011796     3.25487
------------------------------------------------------------------------------
    mean = mean(var)                                          t =   2.5298
Ho: mean = 3                                 degrees of freedom =        8

    Ha: mean < 3               Ha: mean != 3                Ha: mean > 3
 Pr(T < t) = 0.9824      Pr(|T| > |t|) = 0.0353         Pr(T > t) = 0.0176

Matched Pairs: A Special Kind of One-Sample t-Test
- So far we have only examined one-sample t-tests for a single population mean
- A matched pairs test is a special kind of one-sample test for the difference in a variable measured twice on the same observational unit
  - For example, a test for the difference in salary between husbands and wives requires a matched pairs design, because husbands and wives are not independent observations; in this case "married couples" are the observations
  - Another example of a matched pairs design is a test of change over time for individuals: information is gathered from the same people at two points in time, and we test for a difference over time

Matched Pairs Example
Incomes of husbands and wives
- Our goal is to test whether husbands have higher incomes than their wives on average
- Assume we have the following sample of n = 6 married couples, where X = husband's income − wife's income:

  Obs   Husband's income   Wife's income        X
    1             50,000          41,000    9,000
    2             44,000          36,000    8,000
    3             51,000          45,000    6,000
    4             35,000          29,000    6,000
    5             30,000          22,000    8,000
    6             28,000          21,000    7,000

  $\bar{x} = \frac{\sum x_i}{n} = \frac{44{,}000}{6} \approx 7{,}333 \qquad s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \approx 1{,}211$

Matched Pairs Example (2)
Incomes of husbands and wives
- The alternative hypothesis is that husbands make more: $H_a: \mu > 0$
- The null hypothesis is that there is no difference in the incomes of husbands and wives (or wives make more): $H_0: \mu \le 0$
- We have the following information:
  n = 6
  x̄ = 7,333 (mean difference between husband and wife)
  s = 1,211 (standard deviation of the differences)
  µ0 = 0
- The one-sample t(5) statistic for a test for a difference in means is then $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Matched Pairs Example (3)
Incomes of husbands and wives
- Substituting in the known values we get:
  $t = \frac{\bar{x} - 0}{s/\sqrt{n}} = \frac{7{,}333}{1{,}211/\sqrt{6}} = \frac{7{,}333}{494} = 14.83$
- We can see from Table C that the upper-tail p-value for t(5) = 14.83 is much smaller than .0005
- In other words, we are quite confident that the null hypothesis is not correct, and we thus reject it
- We conclude, then, that husbands typically have higher incomes than their wives
Matched Pairs in R
Incomes of husbands and wives

> x <- c(9000, 8000, 6000, 6000, 8000, 7000)
> t.test(x, alternative="greater", mu=0, conf.level=.95)

        One Sample t-test

data:  x
t = 14.8324, df = 5, p-value = 1.260e-05
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 6337.067      Inf
sample estimates:
mean of x
 7333.333

Matched Pairs in Stata
Incomes of husbands and wives

. input x
  (enter the six values 9000, 8000, 6000, 6000, 8000, 7000, then type end)
. ttest x=0

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |       6    7333.333    494.4132     1211.06    6062.404    8604.263
------------------------------------------------------------------------------
    mean = mean(x)                                            t =  14.8324
Ho: mean = 0                                 degrees of freedom =        5

    Ha: mean < 0               Ha: mean != 0                Ha: mean > 0
 Pr(T < t) = 1.0000      Pr(|T| > |t|) = 0.0000         Pr(T > t) = 0.0000

Matched Pairs in R
Aggressive behavior of dementia patients during the full moon (Moore et al. 2009)

> Fullmoon <- read.table("D:/soci708/data/data_from_Moore_2009/PC-Text/ch07/ta07_002.txt", header=TRUE)
> attach(Fullmoon)
> Fullmoon
   patient aggmoon aggother aggdiff
1        1    3.33     0.27    3.06
2        2    3.67     0.59    3.08
3        3    2.67     0.32    2.35
4        4    3.33     0.19    3.14
5        5    3.33     1.26    2.07
6        6    3.67     0.11    3.56
7        7    4.67     0.30    4.37
8        8    2.67     0.40    2.27
9        9    6.00     1.59    4.41
10      10    4.33     0.60    3.73
11      11    3.33     0.65    2.68
12      12    0.67     0.69   -0.02
13      13    1.33     1.26    0.07
14      14    0.33     0.23    0.10
15      15    2.00     0.38    1.62
> xbar <- mean(aggdiff)
> xbar
[1] 2.432667
> s <- sd(aggdiff)
> s
[1] 1.460320
> n <- 15
> t <- (xbar - 0)/(s/sqrt(n))
> t
[1] 6.451789
> pval <- 2*(1 - pt(t, 14))
> pval
[1] 1.518152e-05
> # Or use the t.test function
> t.test(aggdiff, alternative="two.sided", mu=0)
...
t = 6.4518, df = 14, p-value = 1.518e-05

Matched Pairs in Stata
Aggressive behavior of dementia patients during the full moon (Moore et al. 2009)

. * open ta07_002.xls in Excel, copy data
. * open Data Editor in Stata, paste data
. * click Preserve then CLOSE Data Editor
. * to save as a *.dta file use File/Save
. list
  (output not shown)
. stem aggdiff
  (stem-and-leaf display not shown)
. * first do it by hand for practice
. su aggdiff

    Variable |      Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     aggdiff |       15    2.432667     1.46032       -.02       4.41

. display 2.432667/(1.46032/sqrt(15))
6.4517906
. display 2*ttail(14, 6.4517906)
.00001518
. * now the easy way!
. ttest aggdiff=0
  (output not shown)

Two-Sample Problems: Comparing Means
- Often we want to compare the means of two different populations or subgroups
  - For example, testing whether men have higher incomes than women; in this case women and men constitute separate populations
- Such comparisons require two-sample t-test procedures
- A confidence interval is found as follows:
  $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- Here the subscripts denote which sample the numbers come from: one sample is labeled "1" and the other "2"
- The degrees of freedom for t* is the smaller of n1 − 1 and n2 − 1

Two-Sample Problems: Comparing Means (2)
- A two-sample t statistic for a significance test of the hypothesis $H_0: \mu_1 = \mu_2$ is found as follows (a short R sketch follows below):
  $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
- Once again the subscripts denote which sample the numbers come from
- As was the case for the confidence interval, the degrees of freedom for t is the smaller of n1 − 1 and n2 − 1. (Software instead uses an approximation formula for df; the estimated df lies between the smaller of n1 − 1 and n2 − 1 and n1 + n2 − 2, and is not necessarily an integer.)
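A minimal R sketch of these formulas with the conservative df rule from the slides; the two small samples are made-up numbers for illustration only, not course data:

# Two-sample t by hand, with conservative df = min(n1, n2) - 1
x1 <- c(12, 15, 11, 14, 13, 16)   # sample 1 (hypothetical)
x2 <- c(10, 12, 9, 13, 11)        # sample 2 (hypothetical)
n1 <- length(x1); n2 <- length(x2)
se_diff <- sqrt(var(x1)/n1 + var(x2)/n2)          # SE of the difference
t_stat  <- (mean(x1) - mean(x2)) / se_diff        # two-sample t statistic
df_cons <- min(n1, n2) - 1                        # conservative df
2 * (1 - pt(abs(t_stat), df_cons))                # two-sided P-value
(mean(x1) - mean(x2)) + c(-1, 1) * qt(0.975, df_cons) * se_diff  # 95% CI

# t.test() uses the Satterthwaite df approximation instead
t.test(x1, x2)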
Assumptions for Comparing Two Means
1. Both samples should be SRSs from two distinct populations or subgroups
  - The samples must be independent of each other
  - In practice, subsamples of a single SRS satisfy this condition
2. Both populations must be normally distributed
  - We cannot know whether the populations are normally distributed, but just as with the one-sample tests and confidence intervals for a mean, we can (and should) look at the sample distributions for signs of non-normality
  - If the population distributions differ, larger sample sizes are needed
3. Finally, the two-sample tests are most robust when the two samples are the same size

Two-Sample Problems: An Example
Proportion of women in State cabinets
- Suppose we want to test whether the proportion of women in State cabinets is higher in states with Democratic governors (population 1) than in states with Republican governors (population 2)
- The alternative hypothesis is $H_a: \mu_1 > \mu_2$
- The null hypothesis is $H_0: \mu_1 \le \mu_2$
- We choose the α = .05 level
- We have the following data from the two SRSs (sets of states with cabinets):

> Woprops[1:5,]
  state. demo womno cabno woprop
1     NJ    0     6    20  0.300
2     IL    0     7    27  0.259
3     IA    0     5    20  0.250
4     ME    0     5    20  0.250
5     WI    0     5    20  0.250
...

Two-Sample Problems: An Example (2)
Proportion of women in State cabinets
- The following statistics were calculated from the samples:
  Democrat:   x̄1 = 0.239864, s1 = 0.135046, n1 = 22
  Republican: x̄2 = 0.171235, s2 = 0.082240, n2 = 17
- Substituting these values into the equation gives the t statistic with n2 − 1 = 17 − 1 = 16 degrees of freedom:
  $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{.239864 - .171235}{\sqrt{\frac{0.135046^2}{22} + \frac{0.08224^2}{17}}} = \frac{0.068629}{0.035026} = 1.959$

Two-Sample Problems: An Example (3)
Proportion of women in State cabinets
- Since the t statistic 1.959 is greater than 1.746 (the one-sided critical value of t(16) for α = .05), or equivalently the one-sided P-value 0.0339 is less than .05, we can reject the null hypothesis that the proportion of women is the same in states with Democratic and Republican governors, in favor of the alternative that it is higher in states with Democratic governors
- On the other hand, had we carried out a two-sided test, the corresponding p-value would have been 2 × 0.0339 = 0.0677;
  we would then conclude that we cannot reject the null hypothesis that the proportion of women is the same in Democratic- and Republican-governed states against the alternative that they differ
- In the two-sided context we can also construct a confidence interval for the difference in the proportion of women in the cabinets of Democratic and Republican governors

Two-Sample Test for Means in R
Proportion of women in State cabinets: two-sided test

> library(foreign)  # provides the read.dta function
> Woprops <- read.dta("D:/soci708/data/woprops.dta")
> attach(Woprops)
> t.test(woprop ~ demo, alternative="two.sided", conf.level=.95, var.equal=FALSE)

        Welch Two Sample t-test

data:  woprop by demo
t = -1.9594, df = 35.317, p-value = 0.058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.139712197  0.002455513
sample estimates:
mean in group 0 mean in group 1
      0.1712353       0.2398636

> # States with a Democratic governor do not have a significantly
> # different proportion of women cabinet members (two-sided test)

Two-Sample Test for Means in R
Proportion of women in State cabinets: one-sided test

> t.test(woprop ~ demo, alternative="less", conf.level=.95, var.equal=FALSE)

        Welch Two Sample t-test

data:  woprop by demo
t = -1.9594, df = 35.317, p-value = 0.029
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -0.009463723
sample estimates:
mean in group 0 mean in group 1
      0.1712353       0.2398636

> # Now states with a Democratic governor have a significantly greater
> # proportion of women cabinet members (one-sided test)

Two-Sample Test for Means in Stata
Proportion of women in State cabinets: first, look at the data

. stem woprop if demo
  (stem-and-leaf display not shown)
. stem woprop if !demo
  (stem-and-leaf display not shown)
. by demo: su woprop

-> demo = 0
    Variable |      Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      woprop |       17    .1712353    .0822401       .048         .3

-> demo = 1
    Variable |      Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      woprop |       22    .2398636    .1350461          0       .636

Two-Sample Test for Means in Stata
Proportion of women in State cabinets: equal variances (default)

. ttest woprop, by(demo)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      17    .1712353    .0199462    .0822401    .1289513    .2135193
       1 |      22    .2398636    .0287919    .1350461    .1799875    .2997397
---------+--------------------------------------------------------------------
combined |      39    .2099487    .0190242    .1188063    .1714362    .2484613
---------+--------------------------------------------------------------------
    diff |           -.0686283    .0372071               -.144017     .0067604
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                  t =  -1.8445
Ho: diff = 0                                 degrees of freedom =       37

    Ha: diff < 0               Ha: diff != 0                Ha: diff > 0
 Pr(T < t) = 0.0366      Pr(|T| > |t|) = 0.0731         Pr(T > t) = 0.9634

. * contrary to the R examples, Stata assumes equal variances by default
. * df are calculated as (17 - 1) + (22 - 1) = 37

Two-Sample Test for Means in Stata
Proportion of women in State cabinets: unequal variances

. ttest woprop, by(demo) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      17    .1712353    .0199462    .0822401    .1289513    .2135193
       1 |      22    .2398636    .0287919    .1350461    .1799875    .2997397
---------+--------------------------------------------------------------------
combined |      39    .2099487    .0190242    .1188063    .1714362    .2484613
---------+--------------------------------------------------------------------
    diff |           -.0686283    .0350261              -.1397122    .0024555
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                  t =  -1.9594
Ho: diff = 0                  Satterthwaite's degrees of freedom =  35.3172

    Ha: diff < 0               Ha: diff != 0                Ha: diff > 0
 Pr(T < t) = 0.0290      Pr(|T| > |t|) = 0.0580         Pr(T > t) = 0.9710

. * same as the R examples: assume unequal variances (option unequal)
. * note the p-values are smaller; the assumption of unequal variances
. * seems more realistic in this situation (compare the sd's)

Inference for Proportions
- So far we have looked only at how to make inferences about population means
- Similar techniques can be used to make inferences about population proportions
- Recall that a sample proportion is calculated as follows:
  $\hat{p} = \frac{\text{count of successes in the sample}}{\text{total observations in the sample}}$
  Here p̂ denotes a sample proportion and p is the population proportion
- As with the case for means, we use the sampling distribution to make inferences about population proportions

Sampling Distribution of a Sample Proportion
- The sampling distribution of a sample proportion behaves in a manner similar to the sampling distribution of the sample mean
- As we saw earlier, the sampling distribution of a sample proportion has the following characteristics:
  1. The sampling distribution of p̂ becomes approximately normal as the sample size increases
  2. The mean of the sampling distribution of p̂ is p
  3. The standard deviation of the sampling distribution of p̂ is $\sqrt{\frac{p(1-p)}{n}}$

Assumptions for Inference about Proportions
1. We assume a simple random sample
2. The normal approximation and the formula for the standard deviation hold only when the sample is no more than 1/10 the size of the population
3. The sample size must be sufficiently large in relation to p:
  - np and n(1 − p) must both be at least 10
  - This implies that the normal approximation is most accurate when p is close to .5, and least accurate when p is near 0 or 1
- When these criteria are met, we can replace the unknown standard deviation of the sampling distribution of p̂ with its standard error:
  $SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
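The sampling-distribution facts above are easy to check by simulation. A minimal R sketch; the true p, sample size, and number of replications are arbitrary illustrative choices:

# Simulate 10,000 sample proportions and check mean, sd, and shape
set.seed(708)
p <- 0.3; n <- 900
phat <- rbinom(10000, size = n, prob = p) / n
mean(phat)               # close to p = 0.3
sd(phat)                 # close to the theoretical value below
sqrt(p * (1 - p) / n)    # sqrt(p(1-p)/n) = 0.0153
hist(phat)               # roughly normal in shape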
Confidence Intervals and Hypothesis Tests for Proportions
- CIs for proportions take the usual form: estimate ± z* × SE
- Since we are assuming a normal distribution, we use a critical z value:
  $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- Here z* is the upper (1 − C)/2 standard normal critical value, i.e., we look to Table A in Moore et al. (2009)
- We test the hypothesis $H_0: p = p_0$ by computing the z statistic:
  $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$

Example of a Confidence Interval
- Imagine that we have an SRS of 2500 Canadians. We ask whether the respondent has lived abroad for at least one year, and we count X = 187 "Yes" answers
- We want to estimate the population proportion p with 99% confidence. We have the following information from our sample:
  p̂ = 187/2500 = .075
  n = 2500
  z* = 2.576 (for a 99% CI)
- Substituting this information into the formula, we get:
  $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .075 \pm 2.576 \sqrt{\frac{(.075)(.925)}{2500}} = .075 \pm .014 = .061 \text{ to } .089$
- We are 99% confident that between 6.1% and 8.9% of Canadians have lived abroad

Another Example of a Confidence Interval
Obama v. McCain, Gallup Poll of 22 Oct 2008
- The Gallup Poll reports Obama 51%, McCain 45%, other or undecided 6%; n = 2788 registered voters; margin of error ±2%
- For a 95% CI we have z* = 1.960
- Focusing on Obama support and substituting into the formula, we get:
  $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .51 \pm 1.960 \sqrt{\frac{(.51)(.49)}{2788}} = .51 \pm .019 = .491 \text{ to } .529$
- We are 95% confident that Obama support is between 49% and 53% of registered voters
- Note that the declared margin of error of ±2% is slightly conservative

Example of a Significance Test
- Using the earlier example, we now test whether the percentage of Canadians who have lived abroad differs from 5%
- We choose α = .01 and thus need a critical value z* = 2.576 (this is a two-tailed test!)
  $H_0: p = .05 \qquad H_a: p \ne .05$
- Substituting the known information into the formula, we get:
  $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{.075 - .05}{\sqrt{\frac{.05(1-.05)}{2500}}} = 5.735$
- Since the z statistic is much larger than the critical value z*, we can reject H0

One-Sample Test for a Proportion in R

> # in R
> prop.test(187, 2500, p=.05, alternative="two.sided", conf.level=.99, correct=FALSE)

        1-sample proportions test without continuity correction

data:  187 out of 2500, null probability 0.05
X-squared = 32.3705, df = 1, p-value = 1.274e-08
alternative hypothesis: true p is not equal to 0.05
99 percent confidence interval:
 0.06234433 0.08950663
sample estimates:
     p
0.0748

> # alternatives are "two.sided", "greater" (p > p0), and "less" (p < p0)

One-Sample Test for a Proportion in Stata

. * in Stata
. prtesti 2500 187 .05, level(99) count

One-sample test of proportion                       x: Number of obs =   2500
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [99% Conf. Interval]
-------------+----------------------------------------------------------------
           x |      .0748   .0052614                      .0612476    .0883524
------------------------------------------------------------------------------
    p = proportion(x)                                         z =   5.6895
Ho: p = 0.05

    Ha: p < 0.05              Ha: p != 0.05                Ha: p > 0.05
 Pr(Z < z) = 1.0000      Pr(|Z| > |z|) = 0.0000         Pr(Z > z) = 0.0000
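Note that the R and Stata intervals above differ slightly: Stata's prtesti reproduces the slides' hand computation (the Wald interval), while R's prop.test reports the Wilson score interval, which inverts the test statistic rather than plugging in p̂. A minimal R sketch of the slides' own arithmetic:

# Wald interval, as computed by hand on the slides (99% confidence)
phat  <- 187 / 2500
zstar <- qnorm(0.995)                      # 2.576 for 99% confidence
se    <- sqrt(phat * (1 - phat) / 2500)    # standard error of p-hat
phat + c(-1, 1) * zstar * se               # 0.0612 to 0.0884, matching prtesti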
Sample Size for a Desired Margin of Error
- Just as was the case for inference for means, when collecting data it can be important to choose a sample size large enough to obtain a desired margin of error
- The margin of error is determined by:
  $m = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
- Since p̂ is unknown before the data are collected, we must guess its value with p*. We can use a pilot study, or use the conservative estimate p* = .5 (this gives the largest possible margin of error)
- Sample size can then be calculated as follows:
  $n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$

Sample Size for a Desired Margin of Error: An Example
- A public opinion firm wants to determine the sample size needed to estimate the proportion of adults in North Carolina holding a variety of opinions with 95% confidence and a margin of error of ±3%
- Since there are several opinion questions, the firm wants a sample size sufficient to ensure the desired margin of error in the worst case, i.e., when p = 0.5
- Applying the formula for n above, the firm calculates:

> # in R
> (1.96/0.03)^2 * 0.5 * (1 - 0.5)
[1] 1067.111

- Rounding up, the firm will need a sample of n = 1,068 respondents

Sample Size for a Desired Margin of Error
Why p = .5 is used to calculate sample size when the true p is unknown

. * in Stata
. twoway function y=x*(1-x), range(0 1) xtitle("p") ytitle("p*(1-p)")

[Figure: plot of p(1 − p) against p, peaking at p = .5]
- p = .5 corresponds to the maximum of the variance term p(1 − p), hence the largest (most conservative) estimate of n needed to achieve the desired margin of error

Comparing Two Proportions: Confidence Intervals
- Again, as in the case with means, it is often of interest to compare two populations
- The confidence interval for comparing two proportions is given by:
  $(\hat{p}_1 - \hat{p}_2) \pm z^* SE_{\hat{p}_1 - \hat{p}_2} = (\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
  As usual, z* is the upper (1 − C)/2 standard normal critical value
- This confidence interval can be used when the populations are at least 10 times as large as the samples and the counts of successes in both samples are 5 or more

Comparing Two Proportions: Significance Tests
- Significance tests for the difference between two proportions follow a pattern similar to the tests for a difference in means
- To test $H_0: p_1 = p_2$ we calculate the z statistic as follows (a short R sketch follows below):
  $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
- Here p̂1 is for sample one and p̂2 is for sample two; p̂ (without a subscript) is the pooled sample proportion:
  $\hat{p} = \frac{\text{count of successes in both samples combined}}{\text{total observations in both samples combined}}$
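A minimal R sketch wrapping these two formulas in a helper; the function name prop_z_test and the example counts in the last line are illustrative, not from the course data. With the ridge-count data from the next example, prop_z_test(23, 96, 20, 154) gives z of about 2.236, matching the software output shown later.

# Pooled two-proportion z test (hypothetical helper function)
prop_z_test <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1; p2 <- x2 / n2
  p  <- (x1 + x2) / (n1 + n2)                  # pooled proportion
  z  <- (p1 - p2) / sqrt(p * (1 - p) * (1/n1 + 1/n2))
  c(z = z, p.two.sided = 2 * (1 - pnorm(abs(z))))
}
prop_z_test(30, 120, 45, 150)                  # illustrative counts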
Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers
- A higher ridge count on the fingers of the left hand is a measure of body asymmetry
- It is perhaps related to differential development of the hemispheres of the brain, and may thus differ between men and women
- From Kimura, Doreen (1999, Figure 12.2, p. 167)

Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (2)
- Data from Kimura (1999, Table 12.2, p. 169), frequency of left-higher ridge count in right-handers:

           Left-higher   Not left-higher   Total
  Women             23                73      96
  Men               20               134     154

- We have the following information from the data:
  Women (1): n1 = 96, X1 = 23, p̂1 = 23/96 = .240
  Men (2):   n2 = 154, X2 = 20, p̂2 = 20/154 = .130

Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (3)
- Using the formula for the confidence interval for the difference of two proportions we have:
  $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
  $= (.240 - .130) \pm 1.960 \sqrt{\frac{.240(1-.240)}{96} + \frac{.130(1-.130)}{154}} = .110 \pm 1.960 \times .05133 = 0.0094 \text{ to } 0.2106$
- We conclude with 95% confidence that the difference in proportion left-higher between women and men is between 0.9% and 21.1%
- As this interval does not include zero, we can also conclude that the difference is significant at the .05 level

Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (4)
- To directly test the hypothesis H0: p1 = p2 we calculate the z statistic as follows:
  $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.240 - .130}{\sqrt{.172(1-.172)\left(\frac{1}{96} + \frac{1}{154}\right)}} = 2.2415$
- Here p̂1 is for sample one and p̂2 is for sample two; p̂ (without a subscript) is the pooled sample proportion (23 + 20)/(96 + 154) = .172

Two-Sample Test for Proportions
Example: Frequency of Left-Higher Ridge Count in Right-Handers (5)
- The two-sided p-value P(|Z| > 2.2415) corresponding to the alternative hypothesis Ha: p1 ≠ p2 is 0.025
  - We conclude once again that the proportion left-higher differs significantly between women and men at the α = .05 level
- If we wanted to test the one-sided alternative Ha: p1 > p2, we would obtain the one-sided p-value P(Z > 2.2415) by dividing the two-sided p-value .025 by 2, obtaining 0.0125
  - We conclude that the one-sided alternative hypothesis Ha: p1 > p2 is also significant at the .05 level
  - Software packages often give only the two-sided p-value, which must be divided by 2 to obtain the one-sided p-value

Two-Sample Test for Proportions in R
Left-Higher Ridge Count: Two-Sided Test

> # in R
> left.higher <- c(23, 20)   # women lh, men lh
> subjects <- c(96, 154)     # n women, n men
> prop.test(left.higher, subjects, correct=FALSE)

        2-sample test for equality of proportions without continuity correction

data:  left.higher out of subjects
X-squared = 4.9982, df = 1, p-value = 0.02537
alternative hypothesis: two.sided
95 percent confidence interval:
 0.009170057 0.210256350
sample estimates:
   prop 1    prop 2
0.2395833 0.1298701

> sqrt(4.9982)
[1] 2.235665

Two-Sample Test for Proportions in R
Left-Higher Ridge Count: One-Sided Test

> prop.test(left.higher, subjects, alternative="greater", correct=FALSE)

        2-sample test for equality of proportions without continuity correction

data:  left.higher out of subjects
X-squared = 4.9982, df = 1, p-value = 0.01269
alternative hypothesis: greater
95 percent confidence interval:
 0.02533473 1.00000000
sample estimates:
   prop 1    prop 2
0.2395833 0.1298701
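The z–χ² equivalence demonstrated in Stata below can also be checked in R. A minimal sketch using the ridge-count contingency table:

# Pearson chi-square on the 2x2 table gives X-squared equal to z^2
ridge <- matrix(c(23, 73, 20, 134), nrow = 2, byrow = TRUE)
chisq.test(ridge, correct = FALSE)   # X-squared = 4.9982, as in prop.test
sqrt(4.9982)                         # 2.2357, the z statistic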
Two-Sample Test for Proportions in Stata
Left-Higher Ridge Count: One-Sided and Two-Sided Tests

. * in Stata
. * syntax is "prtesti n1 p1 n2 p2"
. * or "prtesti n1 X1 n2 X2, count"
. prtesti 96 23 154 20, count

Two-sample test of proportion                       x: Number of obs =     96
                                                    y: Number of obs =    154
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .2395833   .0435631                      .1542013    .3249654
           y |   .1298701   .0270886                      .0767775    .1829628
-------------+----------------------------------------------------------------
        diff |   .1097132   .0512985                      .0091701    .2102564
             |  under Ho:   .0490742    2.24   0.025
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                              z =   2.2357
Ho: diff = 0

    Ha: diff < 0              Ha: diff != 0                Ha: diff > 0
 Pr(Z < z) = 0.9873      Pr(|Z| < |z|) = 0.0254         Pr(Z > z) = 0.0127

Two-Sample Test for Proportions in Stata
Left-Higher Ridge Count: Equivalence of the z Test for Proportions and the χ² Test

. * in Stata
. * "prtesti 96 23 154 20, count" is equivalent to using
. * the tabi command with the original contingency table:
. tabi 23 73 \ 20 134, chi2

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |        23         73 |        96
         2 |        20        134 |       154
-----------+----------------------+----------
     Total |        43        207 |       250

          Pearson chi2(1) =   4.9982   Pr = 0.025

. * take the square root of the chi-squared
. display sqrt(4.9982)
2.2356654
. * note it is the same z as obtained with prtesti!
. * this is a glimpse of the next topic

Next Week
- Inference for crosstabs