Stats II, Lecture Notes, page 54

HYPOTHESIS TESTING CONTINUED

TESTS FOR DIFFERENCES

The topic in a nutshell. In statistical work, we're often interested in the proposition "A is like B," where A and B are groups (populations) which differ in some observable and indisputable way, and we wish to know whether they are alike or different in some respect which is subject to variation. Examples abound. Marketers (and politicians) often wish to know whether young people and old people buy the same or different products. Medical researchers are often concerned with whether different possible treatments provide different cure rates. Production engineers often wish to know whether one production line layout is more effective than another. Purchasing agents (and consumer advocates) may want to know whether a product from one manufacturer is more durable than a competitor's product. And so on.

The basic features in all these cases are the same:
1) the characteristic of interest is subject to variation, and so to sampling variation;
2) we can only measure sample values from the relevant populations;
3) because of variation, getting different sample means from the two populations is not conclusive proof that the population means are different.

Thus, deciding whether A is like B depends on the probability of getting the sample means we got from each population on the assumption that the population means are equal. To be concrete, let's imagine a simple case: suppose we wish to determine whether Western Carolina students and ASU students spend different amounts on entertainment each month. Necessarily, the only evidence we can have comes from samples, and here again variation creates problems for us in interpreting the results of our samples.
Suppose, for the sake of the argument, that the population standard deviations for the two schools are known and are known to be equal: let's suppose σ = $20 and that we will choose from each school a sample of 25 students and ask them to record carefully their entertainment expenditures for one month. By the principles we've seen several times, the standard error of the mean for each sample will be σ/√n = $20/5 = $4. Suppose now that we take our samples and calculate a sample mean from each school, and that x̄A = $100 while x̄W = $120. Can we conclude that Western Carolina students spend more money on entertainment? Not necessarily. Remember that sample means are not in general equal to the population mean we're trying to estimate but rather vary around the population mean. Therefore, it is possible that the population mean is the same at both schools and that we just happened to get a smallish sample mean at ASU and a largish sample mean at WCU. We can never absolutely prove whether that is the case or not; all we can do is ask: supposing the two population means are equal, how probable then is the pair of sample means we got? That is a question we can answer.

Again for the sake of the argument, suppose that µA = µW = $110. (Suppose, that is, that A is like B.) Then the probability of a sample mean as small as $100 is only about 1%, because $100 is 2.5 standard errors below the population mean; similarly, the probability of a sample mean as large as $120 is only about 1%, and the joint probability of the two samples is about .01 × .01 = .0001, or about 1 in 10,000. That's pretty unlikely, but we can think of an alternative situation which is less of a stretch to believe: suppose, in fact, that the two population means are not equal. If µA = $102, for example, then a sample mean of $100 isn't particularly improbable; if µW = $122, a sample mean from WCU of $120 isn't especially hard to believe.
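The probabilities just quoted can be verified with a short sketch. Python with scipy is an assumption here (the notes themselves use Excel); the numbers come straight from the example above.

```python
from scipy.stats import norm

mu = 110            # assumed common population mean (the "A is like B" scenario)
se = 20 / 25**0.5   # standard error of the mean: sigma/sqrt(n) = 20/5 = 4

# P(sample mean <= 100): $100 is 2.5 standard errors below mu
p_low = norm.cdf(100, loc=mu, scale=se)
# P(sample mean >= 120): $120 is 2.5 standard errors above mu
p_high = norm.sf(120, loc=mu, scale=se)

print(round(p_low, 4), round(p_high, 4))  # each about 0.0062, i.e. roughly 1%
print(p_low * p_high)                     # joint probability: on the order of 1 in 10,000 or less
```

Each tail probability is about 0.6%, which the notes round to "about 1%"; the joint probability is small enough either way that the equal-means assumption looks like a stretch.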
Thus, faced with these sample means, we may find it easier to believe that the population means are different than that we got two such unlikely sample means. In a way, that's all there is to testing for differences between populations, but the devil is in the details. To conduct our tests, we must find the sampling distributions of differences between sample means. That is: if x̄A is a statistic and x̄W is a statistic, then the difference x̄A − x̄W is a statistic, and our first concern is to find the sampling distribution of this statistic. Finding some of the standard errors is a bit tricky, but we'll find that we follow the same principles of sampling distributions we've already seen. Before we get into that, it should be easy to see one thing: under the assumption that the population means are equal, the expected value of the difference x̄A − x̄W is equal to zero, so we'll be concerned with probabilities on distributions centered on zero, and our hypothesis tests will implicitly have the form H0: µA − µW = 0 vs. H1: µA − µW ≠ 0, although we may more often use the equivalent form H0: µA = µW vs. H1: µA ≠ µW.

HYPOTHESIS TESTS FOR DIFFERENCES: t TESTS

TESTS FOR DIFFERENCES OF MEANS

• t Tests for the Difference between Two Population Means:
  - independent samples
  - equal population standard deviations

Independent samples: two distinct samples, comprising different objects drawn from different populations, where the samples are in no way dependent on one another.
  - population standard deviations NOT known

Hypotheses to be tested:

    H0: µ1 = µ2 vs. H1: µ1 ≠ µ2
or
    H0: µ1 ≥ µ2 vs. H1: µ1 < µ2

The hypothesis test should be conducted with the t distribution whenever:
  - the population standard deviations are not known
  - the population standard deviations can be assumed to be equal
  - we also require that either
    · both populations are normally distributed, or
    · n ≥ 30

• Under these circumstances the sampling distribution of x̄1 − x̄2 is given by the following:
  - the distribution is a t distribution
  - E(x̄1 − x̄2) = 0
  - the standard error is

        s(x̄1 − x̄2) = sp × √(1/n1 + 1/n2)

    where sp is the pooled standard deviation, given by

        (1)  sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]

    Here n1 is the sample size for the first sample, n2 is the sample size for the sample from the second population, and the s²'s are the respective sample variances.

• The calculated t statistic will be

        t = [ (x̄1 − x̄2) − (µ1 − µ2) ] / s(x̄1 − x̄2)

  Since, as a rule, the hypothesis is µ1 = µ2, the last term in the numerator equals 0, and we often see

        (2)  t = (x̄1 − x̄2) / s(x̄1 − x̄2)

  This is compared to a critical t value with n1 + n2 − 2 degrees of freedom, or used to find the p-value of the test.

  - Definition of sp: look carefully: its square is really just a weighted average of the two sample variances, and it represents the best estimate we can give of the unknown, but assumed equal, population standard deviation.

Example: We wish to determine whether imposing a co-payment will reduce medical insurance claims. At ART, Inc., a sample of 37 workers whose medical insurance carries a 20% co-payment had mean claims last year of $250 with a standard deviation of $100; a sample of 28 workers who had no co-payment requirement had mean claims of $400 with a standard deviation of $120. From long experience, we know that medical claims are approximately normally distributed, and we believe that the standard deviations will be equal for these two populations.
Use these data to test whether co-payments reduce claims.

In such problems, it is often helpful to begin by extracting the data from the problem and writing it down. Here, we have

    n1 = 37; x̄1 = $250; s1 = $100
    n2 = 28; x̄2 = $400; s2 = $120

1. State the hypotheses:
       H0: µ1 ≥ µ2 vs. H1: µ1 < µ2

2. Select the test statistic/identify the sampling distribution: here it's a t, calculated by the formula above, with d.f. = n1 + n2 − 2 = 37 + 28 − 2 = 63.

3. Select α and find the critical value of t. Let's take α = 0.05; then tC = −1.669.

4. Draw the samples, calculate the test statistic, etc.:

       sp = √[ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ]
          = √[ ((37 − 1)(100)² + (28 − 1)(120)²) / (37 + 28 − 2) ]
          = 109.02

       s(x̄1 − x̄2) = sp × √(1/n1 + 1/n2) = 109.02 × √(1/37 + 1/28) = 109.02 × 0.2505 = 27.31

   and the statistic to be calculated is

       t = (x̄1 − x̄2) / s(x̄1 − x̄2) = ($250 − $400) / 27.31 = −5.49

   Since −5.49 < −1.669, we can reject the null hypothesis.
   Alternatively: TDIST(5.49, 63, 1) = 3.8 × 10⁻⁷ < 0.05 ⇒ reject H0.

5. Conclude that imposing co-payments reduces medical care usage and insurance claims.

Excel Spreadsheet Tests for Differences of Two Population Means

Three ways to skin a cat:

• Calculate t as above and use TDIST to find the p-value, OR find the critical t and compare it to the calculated value.

• Spreadsheet formula: TTEST(Data Range 1, Data Range 2, Tails, Type); the result is the p-value of the test.
  - The Data Ranges are the ranges that contain the data for your two samples.
  - Tails = 1 or 2.
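The five-step co-payment example above can also be cross-checked outside Excel. The sketch below uses Python with scipy (an assumption, since the notes themselves use spreadsheet functions); it computes equations (1) and (2) directly from the summary data, then verifies the result with scipy's pooled-variance routine.

```python
from math import sqrt
from scipy.stats import t as t_dist, ttest_ind_from_stats

# Summary data from the co-payment example
n1, xbar1, s1 = 37, 250.0, 100.0   # workers with a 20% co-payment
n2, xbar2, s2 = 28, 400.0, 120.0   # workers with no co-payment

# Equation (1): pooled standard deviation
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Standard error of the difference of sample means
se = sp * sqrt(1 / n1 + 1 / n2)

# Equation (2): the t statistic, plus its one-tail p-value
t_stat = (xbar1 - xbar2) / se
df = n1 + n2 - 2
p_one_tail = t_dist.sf(abs(t_stat), df)   # the equivalent of TDIST(5.49, 63, 1)

print(round(sp, 2), round(se, 2), round(t_stat, 2))   # 109.02 27.31 -5.49

# Cross-check: scipy's pooled-variance test from summary statistics
t_check, p_two_tail = ttest_ind_from_stats(xbar1, s1, n1, xbar2, s2, n2,
                                           equal_var=True)
print(round(t_check, 2))   # -5.49 again
```

Note that `equal_var=True` corresponds to the pooled-variance test described above (Excel's Type = 2), and that scipy's routine returns a two-tailed p-value, which must be halved for a one-tailed test.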
    Note that if Tails = 1, the spreadsheet always performs an upper one-tail test, taking x̄1 as the larger of the two sample means.
  - Type = 1, 2, or 3:
    · if Type = 1, the spreadsheet does a paired difference test, which will be described in the next section
    · if Type = 2, the spreadsheet does a pooled-variance test of the type described above
    · if Type = 3, the spreadsheet does a t test assuming that the population standard deviations are not equal

• Tools / Data Analysis: t-Test: Two-Sample Assuming Equal Variances. The output:

t-Test: Two-Sample Assuming Equal Variances

                                                       Variable 1   Variable 2
Sample means            Mean                             4.666667        5.125
Sample variances        Variance                         4.666667     7.553571
Sample sizes            Observations                            6            8
Equation (1) squared    Pooled Variance                  6.350694
                        Hypothesized Mean Difference            0
                        df                                     12
Equation (2)            t Stat                           -0.33677
p-value, one-tail test  P(T<=t) one-tail                 0.371055
                        t Critical one-tail              1.782287
p-value, two-tail test  P(T<=t) two-tail                  0.74211
                        t Critical two-tail              2.178813

Notice that Excel reports the calculated t statistic; it also reports the p-values for one- and two-tailed tests, as well as the critical values for the α entered in the dialog box (the default is 0.05). The first column in the table does not appear in the Excel output; I added it for reference purposes.

Comment: A Normal Distribution test for the Difference between Two Means? An odd case, rarely encountered. In older practice, z was used with large samples and unknown σ's, appealing to the Central Limit Theorem, especially if it could not be assumed that σ1 = σ2. Today, most would choose a t test assuming unequal population variances. See above for the procedures with TTEST or the Analysis Tools.

MORE ON HYPOTHESIS TESTING

Paired Differences

• Hypothesis tests when samples are not independent

If samples are not independent, in the statistical sense, comparison of sample means is inappropriate.
Non-independence usually involves one of two cases:

  - The two "samples" are in fact the same set of objects, which have been subjected to different treatments on different occasions.

    Example: Two routes are possible from Boone to Elizabethton. A trucking company sends six of its drivers over the route through Newland on Tuesday, and on Wednesday sends them over the route around Watauga Lake. The time for each driver over each route is recorded.

  - The first sample is drawn randomly, but the second is drawn to match the characteristics of the first sample.

    Example: We wish to know whether a business degree, in and of itself, increases income. From our student records we draw a sample of BSBA holders; then we draw a second sample in which each BSBA holder is matched to a BA graduate who has the same GPA, the same number of extracurricular activities, the same sex, and so on, and who has worked in the same industry for the same length of time.

PURPOSE: to control the variation in everything but the one characteristic of interest; whatever variation is left must be due to variation in that characteristic.

• In this case we calculate the difference for each pair of observations and test whether the mean difference is equal to zero. We're interested in the sampling distribution of the average difference d̄.
The sampling distribution of this quantity is given by the following:
  - it is a t distribution with n − 1 degrees of freedom, where n is the number of pairs of observations
  - it has E(d̄) = 0
  - the standard error is given by

        s(d̄) = sd / √n

    where sd is the sample standard deviation of the differences, that is,

        sd = √[ Σ(d − d̄)² / (n − 1) ]

So, in effect, we take each pair of observations and calculate the difference, and this set of differences becomes the data used to test the null hypothesis H0: µd = 0 (or the equivalent one-tailed test, as appropriate).

Example: Whitney has invented a "smart pill." He asks 9 of his classmates to prepare for the second stats exam just as they prepared for the first, but he administers a smart pill to each a half hour before the exam. The results are:

Student #   Score 1st Exam   Score 2nd Exam   Difference
    1             82               87              5
    2             43               55             12
    3             71               69             -2
    4             68               75              7
    5             66               69              3
    6             91               95              4
    7             77               74             -3
    8             58               68             10
    9             75               78              3

The mean score on the second exam for all students who did not take smart pills was the same as the mean score on the first exam. Did Whitney's pill work?

Solution: First calculate for each student the difference between his score on the first exam and his score on the second. This yields the set of differences given in the fourth column. Notice that we must always subtract in the same direction and preserve any minus signs in our calculations. Doing our five steps, we have:

1. H0: µd ≤ 0 vs. H1: µd > 0.

2. The appropriate test is a t with n − 1 d.f. Here n = 9; that is, there are nine differences. The test statistic is

       t = (d̄ − µ0) / s(d̄)

   Since the hypothesis is that µd = 0, this is more commonly seen as t = d̄ / s(d̄).

3. Let's take α = 0.01; then tC = +2.896. Note carefully: there are 8 d.f.

4. Calculations: d̄ = (5 + 12 − 2 + 7 + 3 + 4 − 3 + 10 + 3)/9 = 4.33; sd = 4.95, and the standard error s(d̄) = 4.95/3 = 1.65. Accordingly, t = 4.33/1.65 = 2.626; TDIST(2.626, 8, 1) = 0.015183 > 0.01, so we will fail to reject H0. (Alternatively: 2.626 < 2.896.)

5.
Conclude that there is no strong evidence that Whitney's pill works.

Example: We are interested in whether, other things being equal, women's salaries are equal to men's. This is not a simple question, since, on average, women differ from men in many important employment characteristics. So, we choose a random sample of 81 men; for each man chosen we then choose a woman who has the same education level, age, and number of years of experience, and who works for a similar-sized company. The men's salaries are paired with the corresponding women's, and the paired differences are recorded. A portion of the data looks like this:

Pair   Man's Salary   Woman's Salary   Difference
  1        2500            2230            270
  2        3000            3200           -200
  3        3500            3300            200

For all pairs the average difference is $110 with a standard deviation of differences of $90. The distributions both appear to be somewhat skewed. Can we conclude that men's and women's salaries are different?

Solution:
1. H0: µd = 0 vs. H1: µd ≠ 0.
2. t test with 81 − 1 = 80 d.f.
3. For α = 0.05, tC = ±1.990.
4. s(d̄) = 90/√81 = 10, so t = $110/10 = 11; we can decisively reject H0. TDIST(11, 80, 2) ≈ 10⁻¹⁷.
5. Are there implications for legislation or policy?

Excel Spreadsheet Procedures

• Use TDIST to calculate the p-value or TINV to find the critical values.
• Use TTEST(DATA RANGE 1, DATA RANGE 2, 1, 1). The 1 in the Type position specifies that it is a paired difference test.
• Use Data Analysis: t-Test: Paired Two Sample for Means. The output is very similar to that discussed above, except that it includes a measure of the correlation between the two sets of data; we can ignore this for now.
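Both paired examples above can be reproduced outside Excel as well. The sketch below uses Python with scipy (an assumption; the notes themselves use TDIST/TTEST). The smart-pill test works from the raw scores; the salary test works from the reported summary statistics.

```python
from math import sqrt
from scipy.stats import t as t_dist

# --- Smart-pill example: raw paired scores ---
first  = [82, 43, 71, 68, 66, 91, 77, 58, 75]
second = [87, 55, 69, 75, 69, 95, 74, 68, 78]
d = [b - a for a, b in zip(first, second)]   # subtract in the same direction, keep minus signs
n = len(d)

d_bar = sum(d) / n                                        # mean difference, about 4.33
s_d = sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))    # std dev of differences, about 4.95
se = s_d / sqrt(n)                                        # standard error, about 1.65
t_pill = d_bar / se
p_pill = t_dist.sf(t_pill, n - 1)         # one-tail, the equivalent of TDIST(2.626, 8, 1)
print(round(t_pill, 3), round(p_pill, 4)) # 2.626 0.0152 -> fail to reject at alpha = 0.01

# --- Salary example: summary statistics only ---
n2, d_bar2, s_d2 = 81, 110.0, 90.0
se2 = s_d2 / sqrt(n2)                  # 90/9 = 10
t_sal = d_bar2 / se2                   # 11.0
p_sal = 2 * t_dist.sf(t_sal, n2 - 1)   # two-tail, the equivalent of TDIST(11, 80, 2)
print(t_sal, p_sal < 1e-10)            # 11.0 True -> reject decisively
```

With raw data in hand, `scipy.stats.ttest_rel(second, first)` performs the same paired test directly, just as TTEST with Type = 1 does in Excel.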