Statistics 312 – Dr. Uebersax
23 – Confidence Interval for Difference Between Means

1. Comparing Two Means: Dependent Samples

In the preceding lectures we considered how to test a difference of two means for independent samples. Now we look at how to do the same thing with dependent samples, specifically when observations from both samples can be matched one-for-one. This method is called the matched-pairs t-test or paired t-test.

Some example applications:

    Are student ability scores the same before vs. after a course?
    Do patients show improvement after a treatment?
    If Treatment A and Treatment B are given to the same patients, which works better?

This procedure is very simple, because it is ultimately just a test of a single mean. Let X1 and X2 be two measurements (e.g., Pre and Post scores) made on the same sample of subjects/objects. Define the new variable Difference = D = X1 – X2 for all cases.

If our scientific hypothesis is that the means of X1 and X2 are different (e.g., one treatment is better than another), our null and alternative hypotheses are simply:

    H0: μD = 0 (i.e., μ1 = μ2)
    H1: μD ≠ 0 (i.e., μ1 ≠ μ2)

where μD is the (population) mean difference of X1 and X2, equal to μ1 – μ2.

Alternatively, if we want to test whether the difference is greater than some value c:

    H0: μD = c (i.e., μ1 = μ2 + c)
    H1: μD > c (i.e., μ1 > μ2 + c)

When the null hypothesis is for no difference (H0: μD = 0), our test statistic is:

    t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}

where:

    \bar{D} is the sample mean of D = (X1 – X2).
    μD is the value of the mean difference under H0 (here 0, so the numerator is simply \bar{D}).
    n is the number of pairs.
    sD is the sample standard deviation computed for D = (X1 – X2).

As before, we then determine the probability (p) of this t value (with n – 1 degrees of freedom) and compare it to a pre-specified α (e.g., α = 0.05). If p < α, reject H0.

Credible/Confidence Intervals

To compute a credible/confidence interval for the mean difference between matched pairs, look up the critical value tcrit for the desired level of the credible/confidence interval (e.g., 95%), using n – 1 degrees of freedom. Then use the formulas:

    LL = (\bar{X}_1 - \bar{X}_2) - t_{crit} \, s_D / \sqrt{n}
    UL = (\bar{X}_1 - \bar{X}_2) + t_{crit} \, s_D / \sqrt{n}

where \bar{X}_1 - \bar{X}_2 = \bar{D} and s_D / \sqrt{n} is the standard error of the mean difference.

2. Paired t-tests in Excel and JMP

Excel

1. State H0 and H1; choose α.
2. Enter X1 and X2 values side by side in adjacent columns.
3. Make a new column for D = (X1 – X2).
4. Calculate the mean and sample standard deviation of D.
5. Compute the t statistic t = \bar{D} / (s_D / \sqrt{n}) (assuming H0: μD = 0).
6. Use the Excel function T.DIST (or T.DIST.2T for a two-tailed test) to find p = probability in the tail area(s) of the t distribution.
7. If p < α, reject H0.

[Figure 1: paired t-test worked in Excel]

JMP

1. Paste the X1 and X2 variables into two separate columns, side by side.
2. Highlight the columns.
3. Analyze > Matched Pairs.
4. In the pop-up window, designate both variables as "Y, Paired Responses", and press OK.

[Screenshots of Step 3 and Step 4]

3. Chi-Square Tests

We'll now look at how to test statistical hypotheses concerning nominal data, specifically when nominal data are summarized as tables of frequencies. The tests we will consider are generically called chi-squared (or chi-square) tests.

Each test involves computing a test statistic, and then calculating the area in the tail of a theoretical distribution called the chi-squared (χ²) distribution. The χ² distribution, like the t distribution, is actually a family of distributions, each one corresponding to a certain number of degrees of freedom.

However, in the case of the χ² distribution, we are almost always concerned with upper-tail probabilities. That is, chi-squared tests are usually 1-tailed.

Hypothetical Data
Various Outcomes to Arterial Stent Placement

    Outcome        Observed frequency (O)    Expected frequency (E)
    Rejected                  15                         7
    1 – 100 days              75                        60
    > 100 days               118                       156
    Replaced                  20                         5
    Total                    228                       228

Our observed frequencies come from data on 228 patients who received the treatment.
Our expected frequencies may come from theoretical models or from estimates of probabilities derived from some larger reference population.

Our null hypothesis is that the observed frequencies do not differ from the expected frequencies by more than is expected by chance. Or:

    H0: Our sample comes from some specified reference population.

To test the null hypothesis, we may use either of two test statistics.

Pearson X-squared statistic:

    X^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}

Likelihood ratio statistic:

    L^2 = 2 \sum_{\text{all cells}} O \ln \frac{O}{E}

Under H0, both of these test statistics approximately follow a theoretical χ² distribution. They are typically (though not always) close in value to each other.

Note that in the former case the test statistic is denoted X². This should be called "ex-squared". It is not the same as the theoretical distribution, χ² (chi-squared). Most textbooks mistakenly call the test statistic (X²) "chi-squared." That is, the name "chi-squared" test comes from the distribution used to test the hypothesis (χ² distribution), and not from the test statistic itself.

We perform our test by computing X². Our calculations for the example data are shown below:

Hypothetical Data
Various Outcomes to Arterial Stent Placement

    Outcome        Observed (O)    Expected (E)    (O – E)²    (O – E)² / E
    Rejected             15               7             64           9.14
    1 – 100 days         75              60            225           3.75
    > 100 days          118             156           1444           9.26
    Replaced             20               5            225          45
    Total               228             228                    Sum = X² = 67.15

The area of the χ² distribution (with 4 – 1 = 3 df) above 67.15 is vanishingly small (p = 1.73922E-14). Even assuming a low α (e.g., α = 0.001), p < α, so we reject the H0 which asserted that our data came from the reference population. That is, our sample comes from some other population, with probabilities of each level that are different from those of the reference population.
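The arithmetic for the stent example can be reproduced with a short script. What follows is a minimal sketch in Python using only the standard library, not part of the original lecture: the function names are illustrative, and the closed-form upper-tail probability is valid only for 3 degrees of freedom, which happens to be the case here.

```python
import math

# Observed and expected frequencies from the stent example
# (order: Rejected, 1-100 days, > 100 days, Replaced).
observed = [15, 75, 118, 20]
expected = [7, 60, 156, 5]


def pearson_x2(obs, exp):
    """Pearson X-squared: sum over all cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def likelihood_ratio_l2(obs, exp):
    """Likelihood ratio L-squared: 2 * sum over all cells of O * ln(O / E).

    Cells with O = 0 contribute nothing (the limit of O * ln(O) as O -> 0 is 0).
    """
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(obs, exp) if o > 0)


def chi2_upper_tail_3df(x):
    """Upper-tail area of the chi-squared distribution, for df = 3 ONLY.

    Assumption: uses the closed form that exists for 3 df; other df values
    need a general routine (e.g., a statistics package).
    """
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


x2 = pearson_x2(observed, expected)
l2 = likelihood_ratio_l2(observed, expected)
df = len(observed) - 1  # 4 cells - 1 = 3

print(f"X2 = {x2:.2f}, L2 = {l2:.2f}, df = {df}")
print(f"p (from X2) = {chi2_upper_tail_3df(x2):.5e}")
```

For these data L² comes out noticeably smaller than X² (roughly 45.9), although its tail area is still far below α = 0.001, so the decision is the same.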
We can check our results here: http://vassarstats.net/csfit.html

Homework 24

Work 9.29 (a) and (c) using Excel, as in Figure 1 above and class demonstration. Use data = Gasmile.xls. (Hint. First do problem in JMP to find correct results). Print results (or check with me for alternative).

Read:
http://onlinestatbook.com/2/chi_square/distribution.html
http://onlinestatbook.com/2/chi_square/one-way.html
http://onlinestatbook.com/2/chi_square/contingency.html
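As a cross-check on the spreadsheet steps in Section 2 (the same role the homework hint gives to JMP), the paired t-test and its interval can be sketched in Python. This is only an illustration under stated assumptions: the pre/post scores are made up, the function names are hypothetical, and the critical value is typed in by hand from a t table (2.262 is the two-sided 95% value for 9 df). Instead of computing p, the sketch compares |t| with tcrit, which gives the same decision as comparing a two-tailed p with α.

```python
import math
import statistics


def paired_t(x1, x2, mu0=0.0):
    """Return (d_bar, s_d, se, t, df) for the differences D = X1 - X2."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]   # step 3: difference column
    n = len(d)
    d_bar = statistics.mean(d)            # step 4: mean of D
    s_d = statistics.stdev(d)             # step 4: sample SD of D (n - 1 divisor)
    se = s_d / math.sqrt(n)               # standard error of the mean difference
    t = (d_bar - mu0) / se                # step 5: t statistic
    return d_bar, s_d, se, t, n - 1


def mean_difference_interval(d_bar, se, t_crit):
    """Lower and upper limits: d_bar -/+ t_crit * se."""
    return d_bar - t_crit * se, d_bar + t_crit * se


# Made-up scores for 10 students (illustration only, not course data).
pre = [72, 65, 80, 58, 90, 77, 69, 84, 61, 75]
post = [75, 70, 82, 63, 91, 80, 75, 85, 66, 79]

d_bar, s_d, se, t, df = paired_t(pre, post)
T_CRIT_95_DF9 = 2.262  # assumption: taken from a t table, two-sided 95%, 9 df
ll, ul = mean_difference_interval(d_bar, se, T_CRIT_95_DF9)

print(f"mean D = {d_bar:.2f}, sD = {s_d:.3f}, t = {t:.3f}, df = {df}")
print(f"95% interval for the mean difference: ({ll:.3f}, {ul:.3f})")
print("reject H0" if abs(t) > T_CRIT_95_DF9 else "do not reject H0")
```

The critical value has to match the number of pairs; with a different n, look up tcrit again for n – 1 degrees of freedom.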