Module III, Lecture 4
Test of Hypotheses for More Than Two Independent Samples

Suppose that you recently read an article indicating that the tempo of music can affect a consumer's behavior. In particular, it is conjectured that the faster the tempo (speed) of the music, the more likely a consumer is to make a purchase. Since you are a V.P. at George Giant Food Stores, you pick a random sample of 15 stores and divide them into three sets of five. At five stores you play no music. At five other stores you play pleasant but slow music. At the remaining five stores you play fast, light music. For randomly chosen days, you measure the volume of sales (in purchases, not dollars) at the 15 stores. The resulting data is given below:

Daily Supermarket Purchases

    Music:    Slow      Fast      None
             14,140    15,029    12,874
             13,991    14,656    13,165
             14,772    16,029    13,140
             13,266    14,783    11,245
             14,040    14,700    12,400

Is there evidence that the tempo of the music has an effect?

This is an example of a situation where we are interested in whether or not groups have different means, but we have more than two groups. One approach is to just look at the groups in pairs. That is, use the results of the previous lecture and compare the groups two at a time. In our case we would look at the No Music Group versus the Slow Music Group, the No Music Group versus the Fast Music Group, and the Fast Music Group versus the Slow Music Group. The results are shown below:

t-Test: Two-Sample Assuming Unequal Variances (Slow vs None)

                                    Variable 1   Variable 2
    Mean                            14041.8      12564.8
    Variance                        286821.2     638932.7
    Observations                    5            5
    Hypothesized Mean Difference    0
    df                              7
    t Stat                          3.432557
    P(T<=t) one-tail                0.005474
    t Critical one-tail             1.894578
    P(T<=t) two-tail                0.010947
    t Critical two-tail             2.364623

Using an alpha level of .05, this would indicate that there is a significant difference between sales in stores that play no music and stores that play slow music.
When comparing the No Music Group and the Fast Music Group, one obtains:

t-Test: Two-Sample Assuming Unequal Variances (Fast vs None)

                                    Variable 1   Variable 2
    Mean                            15039.4      12564.8
    Variance                        326836.3     638932.7
    Observations                    5            5
    Hypothesized Mean Difference    0
    df                              7
    t Stat                          5.630583
    P(T<=t) one-tail                0.000395
    t Critical one-tail             1.894578
    P(T<=t) two-tail                0.00079
    t Critical two-tail             2.364623

This indicates, at the .05 alpha level, that there is a difference between the No Music Group and the Fast Music Group. Finally, we compare the Slow Music Group and the Fast Music Group, obtaining the output:

t-Test: Two-Sample Assuming Unequal Variances (Slow vs Fast)

                                    Variable 1   Variable 2
    Mean                            14041.8      15039.4
    Variance                        286821.2     326836.3
    Observations                    5            5
    Hypothesized Mean Difference    0
    df                              8
    t Stat                          -2.8476
    P(T<=t) one-tail                0.010779
    t Critical one-tail             1.859548
    P(T<=t) two-tail                0.021559
    t Critical two-tail             2.306006

Again, at the .05 level, this indicates that there is a difference between the Fast Group and the Slow Group.

Unfortunately, things are not quite so simple. As the number of groups increases, the number of pairwise comparisons goes up quite rapidly. If there are k groups, the number of pairwise comparisons which must be made is:

    k(k − 1) / 2

This number increases quite quickly, as the table below shows:

    k     Pairwise Comparisons
    2            1
    3            3
    4            6
    5           10
    6           15
    7           21
    8           28
    9           36
    10          45
    11          55
    12          66
    13          78
    14          91
    15         105

Now recall that in the logic of statistical testing, there is an α × 100% chance that we will say that there is a difference when in fact there is none. Consider performing two independent tests under circumstances where there are no differences. The correct conclusion is that both tests should accept the null hypothesis. The probability of this happening is

    (1 − α)²

Therefore the probability of making at least one error, that is, rejecting either one or both of the true null hypotheses, would be:

    1 − (1 − α)²

If α = .05, this probability is 1 − (.95)² = .0975.
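The three pairwise results above can be cross-checked from the raw data without Excel. Below is a minimal Python sketch (the function and variable names are mine, not from the lecture's spreadsheet) of the unequal-variance (Welch) t statistic and its approximate degrees of freedom:

```python
from math import sqrt
from statistics import mean, variance

slow_music = [14140, 13991, 14772, 13266, 14040]
fast_music = [15029, 14656, 16029, 14783, 14700]
no_music = [12874, 13165, 13140, 11245, 12400]

def welch_t(a, b):
    """Two-sample t statistic assuming unequal variances, with the
    Welch-Satterthwaite approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

for label, (a, b) in [("Slow vs None", (slow_music, no_music)),
                      ("Fast vs None", (fast_music, no_music)),
                      ("Slow vs Fast", (slow_music, fast_music))]:
    t, df = welch_t(a, b)
    print(f"{label}: t = {t:.6f}, df ~ {df:.1f}")
```

The t statistics agree with the Excel output (3.432557, 5.630583, -2.8476); Excel simply rounds the fractional Welch degrees of freedom to the nearest integer (7, 7, and 8 here).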
So if we do two tests at alpha level .05, we have almost a 1 in 10 chance of rejecting at least one true null hypothesis.

Let α_E be the probability of rejecting at least one true null hypothesis when we make k(k − 1)/2 independent tests, each at level α. Specifically, α_E is the probability of rejecting the null hypothesis in the following situation:

    H0: μ1 = μ2 = . . . = μk
    HA: at least one difference μi − μj is different from zero

while α represents the probability of rejecting one of the k(k − 1)/2 hypotheses of the form:

    H0: μi = μj
    HA: μi ≠ μj

Then, from the basic rules of probability, one has that:

    α_E = 1 − (1 − α)^(k(k − 1)/2)

Although the pairwise tests are not independent, the following table gives an indication of how the probability of at least one error is related to the number of groups, k, if we apply the previous results:

    Number of Groups, k    Probability of at Least One Error (alpha = .05)
    2                          0.0500
    3                          0.1426
    4                          0.2649
    5                          0.4013
    6                          0.5367
    7                          0.6594
    8                          0.7622
    9                          0.8422
    10                         0.9006
    11                         0.9405
    12                         0.9661
    13                         0.9817
    14                         0.9906
    15                         0.9954

Thus, even with three groups, as in our example, we have an unacceptably high probability of rejecting at least one true null hypothesis.

One way around this problem is to use a result due to the Italian probabilist Bonferroni. He showed that for m comparisons, whether independent or dependent, the following relationship always holds:

    α_E ≤ mα

This implies that if we pick

    α = α_E / m

we can control the overall error of rejecting at least one true null hypothesis. The procedure is as follows: 1) Pick your overall alpha level α_E of rejecting at least one true null hypothesis; I will typically choose α_E = .05. 2) If you have k groups and will be comparing the means of the groups two at a time, use (since m = k(k − 1)/2 in this case):

    α = 2(.05) / (k(k − 1))

In our case we have k = 3 and we are comparing each group to each of the others.
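The error-inflation table and the Bonferroni rule above are easy to verify numerically. This is a small sketch (the function names are my own):

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one false rejection over the
    k(k-1)/2 pairwise tests, treating them as independent."""
    m = k * (k - 1) // 2
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(k, alpha_e=0.05):
    """Per-comparison level so the overall error is at most alpha_e."""
    m = k * (k - 1) // 2
    return alpha_e / m

for k in (2, 3, 5, 10, 15):
    print(f"k = {k:2d}: error = {familywise_error(k):.4f}, "
          f"Bonferroni alpha = {bonferroni_alpha(k):.5f}")
```

For k = 3 this reproduces the 0.1426 entry in the table and the per-comparison level .01667 used in the pairwise tests.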
I would therefore use:

    α = 2(.05) / (3(3 − 1)) = .01667

Recalling the p-values of the pairwise t-tests, the table below shows the results of testing the various pairwise hypotheses using the Bonferroni approach, which gives us only a .05 chance of making any pairwise error:

    Comparison      Two-Sided p-value
    None vs Slow        0.0109
    None vs Fast        0.0008
    Slow vs Fast        0.0216

Any p-value less than .01667 would be declared significant. In this case the None vs Slow and None vs Fast comparisons are significant. This seems to support the argument that music makes a difference, but that there is no difference between slow and fast music.

For the pairwise differences which show significance using the Bonferroni approach, a confidence interval on the difference in the means is given by:

    (x̄i − x̄j) ± t_{α/2} √( si²/ni + sj²/nj )

where the t value is chosen based on the computed degrees of freedom given earlier.

First I compute the mean and standard deviation of each group using the usual EXCEL functions "average" and "stdev". This results in the table below:

Daily Supermarket Sales

    Music:       Slow         Fast         None
                14,140       15,029       12,874
                13,991       14,656       13,165
                14,772       16,029       13,140
                13,266       14,783       11,245
                14,040       14,700       12,400
    Average     14,041.80    15,039.40    12,564.80
    Stan Dev       535.56       571.70       799.33
    Size             5            5            5

Then I use the template from the EXCEL file "twosamp.xls".
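For readers without the "twosamp.xls" template, the interval can be sketched directly in Python. The Python standard library has no t quantile function, so the critical value below is hard-coded from the template's output (approximately tinv(.01667, 7) in Excel terms); everything else follows the formula above. A sketch, not the template itself:

```python
from math import sqrt

# Summary statistics from the table above (Slow Music vs No Music)
mean_slow, sd_slow, n_slow = 14041.80, 535.56, 5
mean_none, sd_none, n_none = 12564.80, 799.33, 5

# Critical t for alpha = .01667 (two-sided) with 7 approximate df;
# assumed from the Excel template output, not computed here
t_crit = 3.1274

diff = mean_slow - mean_none
halfwidth = t_crit * sqrt(sd_slow**2 / n_slow + sd_none**2 / n_none)
lower, upper = diff - halfwidth, diff + halfwidth
print(f"{lower:.4f} to {upper:.4f}")
```

This reproduces the 131.30 to 2822.70 interval shown in the template output for Slow vs No Music.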
For the Slow vs No Music comparison, the confidence interval is:

Template for Confidence Interval on Positive Difference in Means

                Slow Music     No Music      Alpha
    mean        14041.8000     12564.8000    0.01667
    sd            535.5600       799.3300
    n                    5              5

    Approx Degrees of Freedom = 7
    Confidence Interval on Positive Difference in Means: 131.3042 to 2822.6958

The comparison of Fast Music and No Music is given as:

Template for Confidence Interval on Positive Difference in Means

                Fast Music     No Music      Alpha
    mean        15039.4000     12564.8000    0.01667
    sd            571.7000       799.3300
    n                    5              5

    Approx Degrees of Freedom = 7
    Confidence Interval on Positive Difference in Means: 1100.1275 to 3849.0725

Finally, even though it was not significant, the comparison between Fast and Slow Music is:

Template for Confidence Interval on Positive Difference in Means

                Fast Music     Slow Music    Alpha
    mean        15039.4000     14041.8000    0.01667
    sd            571.7000       535.5600
    n                    5              5

    Approx Degrees of Freedom = 8
    Confidence Interval on Positive Difference in Means: -58.8741 to 2054.0741

Notice that this confidence interval contains the value of zero.

Sometimes when one has performed an analysis like the one above, you arrive at a set of seemingly conflicting conclusions. For example, you might accept the hypothesis that A = B. You might also accept the hypothesis that B = C. And then reject the hypothesis A = C! This seems to violate logic, but it violates mathematical logic, not statistical logic. Consider the three probability distributions below:

    [Figure: "Comparison of Three Groups": three overlapping frequency curves
    for Groups A, B, and C plotted against x, with A and B close together,
    B and C close together, and A and C far apart.]

Clearly Groups A and B overlap quite a bit. Groups B and C also overlap quite a bit. But Groups A and C overlap very little. With the picture in mind, what the statistical results are saying is that the data does not provide enough evidence to say that Groups A and B are different.
Similarly, the data does not provide enough evidence to say that Groups B and C are different, but the data does provide enough evidence to indicate that Groups A and C are different. The seeming inconsistency then disappears.

The Analysis of Variance

The Bonferroni procedure described in the previous section is the most general approach to the multiple sample analysis of group differences, in that it makes few assumptions about the data. If one is willing to make more assumptions about the data, other methods exist for analyzing it which, if the assumptions are valid, are more "powerful" than the Bonferroni procedure. By more powerful we mean that they have a higher chance of rejecting the null hypothesis when it is false. The most used of these alternative procedures is called the "Analysis of Variance". It is used in exactly the same situation as the Bonferroni method, except that it only applies when the standard deviations in each of the groups can be assumed to be the same!

Examining our data below, we see that although the standard deviations in the Fast Music and Slow Music groups are approximately the same, the standard deviation of the No Music group is roughly 50% higher.

    Music:       Slow        Fast        None
                14,140      15,029      12,874
                13,991      14,656      13,165
                14,772      16,029      13,140
                13,266      14,783      11,245
                14,040      14,700      12,400
    Average     14,042      15,039      12,565
    Stan Dev    535.56      571.70      799.33
    Size            5           5           5

It is somewhat debatable whether this method can be used on this data, but we will use it to illustrate the procedure. The basic data structure of this problem is given below:

                          Group 1   Group 2   . . .   Group k
                          x11       x21       . . .   xk1
                          x12       x22       . . .   xk2
                          .         .                 .
                          x1n1      x2n2      . . .   xknk
    Size                  n1        n2        . . .   nk
    Mean                  x̄1        x̄2        . . .   x̄k
    Standard Deviation    s1        s2        . . .   sk

The basic hypothesis being tested is:

    H0: μ1 = μ2 = . . . = μk
    HA: at least one difference μi − μj is different from zero

The basis of the Analysis of Variance method (ANOVA) is the fact that one can estimate the assumed common variance in two ways. For each group we can form an estimate of the standard deviation by using the formula:

    si = √[ Σ_{j=1..ni} (xij − x̄i)² / (ni − 1) ]

where

    x̄i = Σ_{j=1..ni} xij / ni

Now, since all groups are assumed to have the same variance, we can pool all k estimates of the sample variance to get one estimate of the common variance:

    σ̂_W² = Σ_{i=1..k} (ni − 1) si² / ( Σ_{i=1..k} ni − k )

We will call this the "Within Group" estimate of the common variance, since it is based on the deviations, within each group, of the observations from the group mean.

The second estimate of the common variance is based on the central limit theorem. Remember that:

    Var(x̄) = σ² / n

Since we have a mean for each group, it should be possible to estimate the variance by looking at the variability of the group means "between" the groups. One can show theoretically that:

    σ̂_B² = Σ_{i=1..k} ni (x̄i − x̄)² / (k − 1)

where

    x̄ = Σ_{i=1..k} ni x̄i / n,   with n = Σ_{i=1..k} ni

One can also show that if the null hypothesis that all the groups come from populations with the same mean is true, then

    E(σ̂_W²) = E(σ̂_B²) = σ²

On the other hand, if the null hypothesis is false, then:

    E(σ̂_W²) = σ²
    E(σ̂_B²) = σ² + Σ_{i=1..k} ni (μi − μ̄)² / (k − 1)

where

    μ̄ = Σ_{i=1..k} ni μi / Σ_{i=1..k} ni

Define

    F_obs = σ̂_B² / σ̂_W²

If the null hypothesis is true, then this F-ratio should be close to one. On the other hand, if the null hypothesis is false, then this F-ratio should be shifted upward by an amount that increases as the means of the groups differ more from one another. This forms the basis of the so-called "F-Test" of the null hypothesis that all the groups have the same mean against the alternative hypothesis that at least one pair of groups have means that differ (actually the alternative is a bit more complicated, but in practice the above alternative hypothesis will suffice).
The procedure for testing the basic null hypothesis is: a) pick α_E (usually .05); b) compute F_obs and find its one-sided p-value; c) reject the null hypothesis if the one-sided p-value is less than α_E; otherwise accept the null hypothesis that all groups have the same mean.

Fortunately, almost all of the preceding computations are done automatically in EXCEL. To perform the analysis, first label each column (group) in the cell immediately above the first number in the group. Then click on "Tools", "Data Analysis", and then "ANOVA: Single Factor". In the dialog, specify that the data (including the group labels) is in cells B6:D11, be sure to check the box "Labels in First Row", enter alpha, and finally hit "OK". The output will look something like:

Anova: Single Factor

SUMMARY
    Groups   Count   Sum     Average   Variance
    Slow     5       70209   14041.8   286821.2
    Fast     5       75197   15039.4   326836.3
    None     5       62824   12564.8   638932.7

ANOVA
    Source of Variation   SS         df   MS         F         P-value       F crit
    Between Groups        15500633    2   7750317    18.5623   0.000212468   3.88529
    Within Groups          5010361   12   417530.1
    Total                 20510994   14

The important quantities are:

    σ̂_B² = 7,750,317      σ̂_W² = 417,530.1      F_obs = 18.5623

The one-sided p-value is .000212. Since this is much smaller than .05, we reject the null hypothesis that all the groups have the same mean in favor of the alternative that at least one pair of groups have different means.

Since we suspect that there are differences between the groups, we need a procedure to find out which group means differ from which. A confidence interval for the difference in means between groups i and j is given by:

    (x̄i − x̄j) − t_{α/2} √( σ̂_W²/ni + σ̂_W²/nj )  ≤  μi − μj  ≤  (x̄i − x̄j) + t_{α/2} √( σ̂_W²/ni + σ̂_W²/nj )

using the same value of alpha that we had used to perform the F-test. The degrees of freedom for the t distribution is given in the EXCEL ANOVA output in the row labeled "Within Groups".
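The ANOVA table can be reproduced from the raw data without Excel. The sketch below (variable names are mine) computes both variance estimates and F_obs, and also the ± limit used in the pairwise confidence intervals; the critical value t_{.025} = 2.178813 with 12 degrees of freedom is hard-coded, since the standard library has no t quantile function:

```python
from math import sqrt
from statistics import mean, variance

groups = {
    "Slow": [14140, 13991, 14772, 13266, 14040],
    "Fast": [15029, 14656, 16029, 14783, 14700],
    "None": [12874, 13165, 13140, 11245, 12400],
}

k = len(groups)
n_total = sum(len(g) for g in groups.values())
grand_mean = mean(x for g in groups.values() for x in g)

# "Between" estimate: variability of the group means around the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
sigma2_B = ss_between / (k - 1)

# "Within" estimate: pooled variance of observations around their group means
ss_within = sum((len(g) - 1) * variance(g) for g in groups.values())
sigma2_W = ss_within / (n_total - k)

f_obs = sigma2_B / sigma2_W
print(f"F_obs = {f_obs:.4f}")        # compare with F crit = 3.88529

# +/- limit for the pairwise confidence intervals (equal group sizes of 5)
t_crit = 2.178813                    # tinv(.05, 12), as in the lecture
halfwidth = t_crit * sqrt(sigma2_W / 5 + sigma2_W / 5)
print(f"+/- limit = {halfwidth:.2f}")
```

This reproduces the Excel output: SS of 15,500,633 and 5,010,361, F of 18.5623, and the ±890.42 limit.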
In our case the value for the degrees of freedom is shown in the "Within Groups" row below:

ANOVA
    Source of Variation   SS         df   MS         F         P-value       F crit
    Between Groups        15500633    2   7750317    18.5623   0.000212468   3.88529
    Within Groups          5010361   12   417530.1
    Total                 20510994   14

In our case α = .05, so the appropriate value of t with 12 degrees of freedom is given by the EXCEL function "tinv" as:

    t_{.025} = tinv(.05, 12) = 2.178813

Further, in our case all the groups are the same size, so we only have to compute the +/− limits once:

    (x̄i − x̄j) ± 2.178813 √( 417530.1/5 + 417530.1/5 )

which gives:

    (x̄i − x̄j) ± 890.42

This is all summarized in the table below:

    Comparison     Difference in Means    Confidence Interval       Conclusion
    Fast vs Slow         997.60              107.18 to 1,888.02     Significant
    Slow vs None       1,477.00              586.58 to 2,367.42     Significant
    Fast vs None       2,474.60            1,584.18 to 3,365.02     Significant

The ANOVA analysis thus indicates that all groups are different from one another, leading us to maximize sales by playing fast music, with the number of sales increasing by between 1,584 and 3,365 per day. These results differ from the Bonferroni approach only in the comparison of the Fast vs Slow groups. By making an assumption that seems dubious for the data, the analysis has been changed. In this case I would stay with the Bonferroni analysis. Further study of the problem might indicate whether or not the difference between Fast and Slow Music is real.

The Binomial Distribution with Multiple Groups

The following data was collected by the marketing department at your company:

Effect of Advertising on Random Sample of Consumers

                  No Ads   TV    Paper   TV and Paper   Total
    Buy             15      25     12         30          82
    Didn't buy      85      90     53         90         318
    Total          100     115     65        120         400

Here we have four forms of advertising (groups) and a binomial response of "buy" or "didn't buy" for each group. We are interested in whether or not the probability of buying or not buying differs based on the form of advertising seen by the consumer.
Before performing a formal analysis, I computed the probability of buying or not buying for the four groups, with the results shown below:

                    Prob Buy   Prob Didn't Buy
    No Ads           0.150         0.850
    TV               0.217         0.783
    Paper            0.185         0.815
    TV and Paper     0.250         0.750
    Total            0.205         0.795

The results look encouraging, since any form of advertising is associated with a higher probability of buying. Further, it looks like advertising both on TV and in the newspaper is the most effective technique. Before rushing to judgment, however, we need to apply statistical methods to see if these effects justify major advertising expenditures.

The structure of the data in this situation is:

                 Group 1         Group 2        . . .   Group k
    Successes    x1              x2             . . .   xk
    Failures     n1 − x1         n2 − x2        . . .   nk − xk
    Total        n1              n2             . . .   nk
    Estimate     p̂1 = x1/n1      p̂2 = x2/n2     . . .   p̂k = xk/nk

The null hypothesis being tested is:

    H0: p1 = p2 = . . . = pk = p
    HA: at least one pair pi and pj not equal

Under the null hypothesis, all the groups have the same probability of success, so the natural estimate of the common value of p is:

    p̂ = Σ_{i=1..k} xi / Σ_{i=1..k} ni

Then for each group we can find the expected number of successes and the expected number of failures by the following formulae:

    Expected successes in group i if the null hypothesis is true = ni p̂
    Expected failures in group i if the null hypothesis is true = ni (1 − p̂)

Now, since ni is the total for group i (i.e., the column total), and p̂ is the total number of successes divided by the grand total of all observations, we arrive at the fact that, just as in the two-sample case:

    EXPij = (Total for Column i) × (Total for Row j) / (Grand Total)

Now, if ni p̂ ≥ 5 for all i, then we can use the Chi-Square distribution with (2 − 1) × (k − 1) degrees of freedom to test the null hypothesis.
Specifically, we would compute the test statistic:

    χ²_obs = Σ_{i=1..k} Σ_j (OBSij − EXPij)² / EXPij

Then we compute the one-sided p-value using the EXCEL function "chidist" as:

    p-value = chidist(χ²_obs, k − 1)

If the one-sided p-value is less than α, we reject the null hypothesis. Otherwise we accept the null hypothesis. If the null hypothesis is rejected, we again examine the contributions to the chi-square statistic, treating values of magnitude greater than 3.5 as the cells providing the greatest deviation of observed values from expectation.

For our data, let us work with alpha = .05. We begin with the original table:

Observed
                  No Ads   TV    Paper   TV and Paper   Total
    Buy             15      25     12         30          82
    Didn't buy      85      90     53         90         318
    Total          100     115     65        120         400

Next we compute our expected values as shown below:

Computation of Expected Entries
                  No Ads          TV              Paper          TV and Paper    Total
    Buy           100×82/400      115×82/400      65×82/400      120×82/400       82
    Didn't buy    100×318/400     115×318/400     65×318/400     120×318/400     318
    Total         100             115             65             120             400

The resultant computation gives the expected table:

Expected
                  No Ads   TV       Paper    TV and Paper   Total
    Buy            20.5    23.575   13.325       24.6         82
    Didn't buy     79.5    91.425   51.675       95.4        318
    Total         100     115       65          120          400

Next we compute, for each cell, the squared difference between the observed and expected values, divided by the expected value. This gives the "contributions" to chi-square shown below:

Contributions to Chi-Square
                  No Ads     TV         Paper      TV and Paper
    Buy           1.47561    0.086135   0.131754   1.185366
    Didn't buy    0.380503   0.022211   0.033974   0.30566

    Observed Chi-Square = 3.621213

The one-sided p-value is computed as: one-sided p-value = chidist(3.621213, 3) = .305378. Since this value exceeds .05, we would accept the null hypothesis that the probability of purchasing does not change from group to group. That is, the data is insufficient to support what seemed to be a clear pattern.
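The entire observed → expected → contributions pipeline fits in one small function. This Python sketch (the function name is mine) reproduces the chi-square statistic and degrees of freedom for the advertising table:

```python
def chi_square_table(observed):
    """Chi-square statistic and degrees of freedom for an r-by-c table
    of counts, with expected = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: Buy / Didn't buy; columns: No Ads, TV, Paper, TV and Paper
ads = [[15, 25, 12, 30],
       [85, 90, 53, 90]]
chi2, df = chi_square_table(ads)
print(f"chi-square = {chi2:.6f} on {df} df")
```

This matches the observed chi-square of 3.621213 on 3 degrees of freedom; the final p-value step still needs a chi-square upper-tail function (e.g. Excel's chidist).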
Multiple Group Structural Hypotheses

Assume that you are responsible for choosing the health care plan for your company. You wish to choose a plan that provides excellent health care for your employees while at the same time minimizing the cost to the firm. In studying various health plans, you note that the percentage of a hospital bill covered by the plan is one of your major choices. In discussing this issue with your colleagues, one of them points out an article that showed how the length of the hospital stay varied with the percentage of the hospital bill paid, for a random sample of 311 patients. The article claims that the greater the percentage of the hospital bill paid by insurance coverage, the longer patients stayed in the hospital. They presented the following data:

    Company Paid              Hospital Stay in Days
    Hospital Coverage    <5    6 to 10   11 to 15   >15   Total
    <25%                 26      30         6         5     67
    25-50%               21      30        11         7     69
    51-75%               25      25        45         9    104
    >75%                 11      32        17        11     71
    Total                83     117        79        32    311

Does this data indicate that the pattern of hospital stay changes depending on the coverage plan? Does it indicate that the greater the percentage of the hospital bill covered by the health plan, the longer patients tend to stay in the hospital?

The basic question is whether or not the proportionate distribution of patients by length of stay is the same for the four hospital coverage categories. The basic structure of the problem is:

               Category 1   Category 2   . . .   Category c
    Group 1    p11          p12          . . .   p1c
    Group 2    p21          p22          . . .   p2c
    .          .            .                    .
    Group k    pk1          pk2          . . .   pkc

The hypothesis testing situation is:

    H0: p1j = p2j = . . . = pkj = pj, for j = 1, 2, . . . , c
    HA: at least one pair pij not equal to pmj for some j

All the null hypothesis says is that the probability of falling in a category is the same for all groups. The alternative hypothesis indicates that there are at least two groups for which the probability of falling in some category differs.
Let me illustrate this for our data. Our original data gives the following probabilities of length of stay by percentage coverage:

    Company Paid              Hospital Stay in Days
    Hospital Coverage    <5        6 to 10   11 to 15   >15      Total
    <25%                 38.81%    44.78%     8.96%      7.46%   100.00%
    25-50%               30.43%    43.48%    15.94%     10.14%   100.00%
    51-75%               24.04%    24.04%    43.27%      8.65%   100.00%
    >75%                 15.49%    45.07%    23.94%     15.49%   100.00%
    Total                26.69%    37.62%    25.40%     10.29%   100.00%

Notice that in the less-than-5-days column, the percentage varies from 38.81% down to 15.49%. Overall, 26.69% of all people had stays of less than 5 days. Is the variability in this column consistent with chance, or does it indicate that the length of stay is changing in some way associated with the hospital coverage? The hypothesis simultaneously asks this question for all the columns.

The basic structure of the observed data is given below:

                   Category 1   Category 2   . . .   Category c  |  Row Total
    Group 1        x11          x12          . . .   x1c         |  x1+
    Group 2        x21          x22          . . .   x2c         |  x2+
    .              .            .                    .           |  .
    Group k        xk1          xk2          . . .   xkc         |  xk+
    -----------------------------------------------------------------------
    Column Total   x+1          x+2          . . .   x+c         |  x++

To get the expected values under the null hypothesis, consider the entry for category j and group i. Since under the null hypothesis all the groups have the same chance of falling into category j, the estimate of the expected number would be just the number in group i times the percentage of all the data in category j.
Symbolically, the formula would be:

    EXPij = (xi+)(x+j) / x++

which is the same as the formula we have used before:

    EXPij = (Total for Row i) × (Total for Column j) / (Grand Total)

Then we would compute our chi-square statistic, with (k − 1) × (c − 1) degrees of freedom, as:

    χ²_obs = Σ_i Σ_j (OBSij − EXPij)² / EXPij

We would then use EXCEL to compute the one-sided p-value using the function "chidist", in the format:

    p-value = chidist(χ²_obs, (k − 1)(c − 1))

If the p-value is less than α, then we reject the null hypothesis and look for cells with contributions to the chi-square statistic higher than 3.5. If the p-value is greater than α, we accept the null hypothesis.

Let us work at the .01 level of alpha. The observed table is:

    Observed                  Hospital Stay in Days
    Company Paid Coverage    <5    6 to 10   11 to 15   >15   Total
    <25%                     26      30         6         5     67
    25-50%                   21      30        11         7     69
    51-75%                   25      25        45         9    104
    >75%                     11      32        17        11     71
    Total                    83     117        79        32    311

I then compute the expected values to obtain the expected table:

    Expected                  Hospital Stay in Days
    Company Paid Coverage    <5       6 to 10   11 to 15   >15     Total
    <25%                     17.88     25.21     17.02      6.89     67
    25-50%                   18.41     25.96     17.53      7.10     69
    51-75%                   27.76     39.13     26.42     10.70    104
    >75%                     18.95     26.71     18.04      7.31     71
    Total                    83       117        79        32       311

This was computed using the formula on the previous page. For example, the entry for the group "<25%" and hospital stay "<5" was computed as (67) × (83) / 311 = 17.88.

The contributions to chi-square and the observed chi-square statistic are then:

    Contributions to          Hospital Stay in Days
    Chi-Square               <5      6 to 10   11 to 15   >15
    <25%                     3.69     0.91       7.13     0.52
    25-50%                   0.36     0.63       2.43     0.00
    51-75%                   0.27     5.10      13.07     0.27
    >75%                     3.33     1.05       0.06     1.87

    Chi-square = 40.70

The degrees of freedom in this problem are (4 − 1) × (4 − 1) = 9. Therefore, the one-sided p-value is given by:

    p-value = chidist(40.70, 9) = .00000567
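As a self-contained check of this arithmetic, the same expected-value formula can be written inline in a few lines of Python (a sketch; the variable names are mine):

```python
# Rows: coverage <25%, 25-50%, 51-75%, >75%;
# columns: stay <5, 6 to 10, 11 to 15, >15 days
stay = [[26, 30,  6,  5],
        [21, 30, 11,  7],
        [25, 25, 45,  9],
        [11, 32, 17, 11]]

row_totals = [sum(r) for r in stay]
col_totals = [sum(c) for c in zip(*stay)]
grand = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells,
# with expected = row total * column total / grand total
chi2 = sum((obs - row_totals[i] * col_totals[j] / grand) ** 2
           / (row_totals[i] * col_totals[j] / grand)
           for i, row in enumerate(stay) for j, obs in enumerate(row))
df = (len(stay) - 1) * (len(stay[0]) - 1)
print(f"chi-square = {chi2:.2f} on {df} df")
```

This reproduces the observed chi-square of 40.70 on 9 degrees of freedom.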
Since this value is much lower than .01, we reject the null hypothesis and say that the distribution of the length of stay changes with insurance coverage. Since we have rejected the null hypothesis, I have marked with an asterisk the cells that show the greatest deviation between the observed and expected values (contributions greater than 3.5):

    Contributions to          Hospital Stay in Days
    Chi-Square               <5       6 to 10   11 to 15   >15
    <25%                     3.69*     0.91       7.13*    0.52
    25-50%                   0.36      0.63       2.43     0.00
    51-75%                   0.27      5.10*     13.07*    0.27
    >75%                     3.33      1.05       0.06     1.87

    Chi-square = 40.70

By comparing the observed to the expected values, we arrive at the following structure:

    Observed Compared         Hospital Stay in Days
    to Expected              <5       6 to 10   11 to 15   >15
    <25%                     Higher     x        Lower      x
    25-50%                     x        x          x        x
    51-75%                     x      Lower      Higher     x
    >75%                     Lower      x          x        x

The marked cells seem to indicate that the lower the coverage, the quicker you will leave the hospital. In the "<25%" coverage category, that is reinforced by the fact that the observed number in the "11 to 15" day stay period is lower than would be expected. Interestingly, in the "51-75%" coverage group, the lower-than-expected number in the "6 to 10" day category and the much-higher-than-expected number in the "11 to 15" day category also tend to reinforce the argument that hospital coverage is related to length of hospital stay, with increasing coverage associated with increasingly long stays.