Using Simple Statistics in Assurance Services

Statistics are useful for summarizing large amounts of data into a form that can be more easily interpreted. Accountants and managers can use statistical theory and statistics to develop expectations about accounting numbers (e.g., balances or transaction amounts) or behavior; the recorded numbers or behavior can then be compared to the expectations, again using statistics. Patterns and discrepancies that are extremely difficult to observe and identify with the naked eye are often easily discernible using relatively simple statistics. Sometimes statistics can be used to evaluate whether a discrepancy is large enough to investigate further, or whether it can be ignored.

The descriptions of the tests below are brief and dense; they are meant to be a general guide only. Details on most of these tests can be found in any standard nonparametric statistics or business statistics book. We will illustrate most of the tests in class. You should apply them whenever they seem appropriate. A large part of using these techniques is identifying situations in which they can be applied.

I. Change-point test

Purpose. The change-point test is used to assess whether there has been an underlying change in the process that generates an ordered sequence of values. Assuming that the process has changed, the test also indicates at which point the process changed.

Method. Say that we have a series of N observations. We first rank the N observations from 1 to N. (The version of the change-point test that we are describing requires a continuous variable; there is a version of the test for binomial variables.) Define r_i to be the rank associated with observation i. (If there are ties, assign to each of the tied observations the average of the ranks they would have if no ties had occurred. For example, if two observations are equal and are tied for ranks 5 and 6, assign them each the rank 5.5.)
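The ranking step just described, including average ranks for ties, is easy to automate. The sketch below is illustrative only (Python; the function name is mine, not part of the handout):

```python
# Average-rank assignment: tied observations each receive the mean of the
# ranks they would occupy if untied (a sketch; function name is mine).
def average_ranks(values):
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, `average_ranks([10, 30, 20, 20])` assigns the two tied values the average rank 2.5.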
For each point j in the series, calculate W_j, the sum of the ranks through that point:

    W_j = r_1 + r_2 + ... + r_j,    j = 1, 2, ..., N - 1.

For each of the W_j values, calculate the difference W_j - j(N+1)/2. The value of j for which the absolute value of this difference is the maximum is the estimated change point in the series, and is denoted m. The number of observations after the change point is denoted n, and is equal to N - m. (Note that the ranking procedure and subsequent calculations can easily be accomplished in Excel.)

We can calculate the expected value of W, E(W), under the null hypothesis that the process has not changed. (To employ the following procedures, either m or n must be greater than 10. For smaller sample sizes, special tables are available to test the significance of W.) If the observed value of W significantly differs from E(W), then we will conclude that the process has changed. The expected value of the sum of the ranks is

    E(W) = m(N + 1)/2.

The standard deviation of W is [1]

    σ_W = sqrt[mn(N + 1)/12].

We can test the null hypothesis that the process has not changed by using the z statistic, as follows [2]:

    z = [W + h - E(W)] / σ_W = [W + h - m(N + 1)/2] / sqrt[mn(N + 1)/12],

where h = +.5 if W < m(N + 1)/2 and h = -.5 if W > m(N + 1)/2. This statistic, when the null hypothesis is true, is approximately normally distributed with mean 0 and standard deviation 1. (The factor h is a correction for continuity that improves the approximation to the normal distribution.) The significance of the observed value of z can be determined by referring to a standard normal distribution table.

Example. A certain manufacturing process is considered to be out of control when the lengths of the parts produced start systematically to exceed 10.
The accountant gathers lengths for 28 recent parts (listed in order of occurrence), as follows:

9.99, 9.88, 10.24, 9.87, 10.03, 10.01, 9.96, 9.91, 10.20, 10.02, 10.06, 10.00, 9.81, 9.79, 9.97, 10.22, 10.31, 9.94, 10.27, 9.90, 10.16, 9.93, 10.17, 10.29, 10.30, 10.33, 9.92, 10.25

In addition to determining whether the process is out of control, management also wants to determine when the process started to be out of control. This latter information is useful in planning when machines need to be adjusted. As a first step in determining whether the process is out of control, the accountant determines whether the process has changed, under the assumption that the process was in control at the start. If the process has changed, the accountant can perform follow-up analyses to determine whether the mean length for the changed process is greater than 10. The change-point test can be used as follows:

[1] If there are ties, the following formula should be used for σ_W:

    σ_W = sqrt{ [mn / (N(N - 1))] [ (N^3 - N)/12 - Σ_j (t_j^3 - t_j)/12 ] },

where the sum runs over the g groupings of different tied ranks and t_j is the number of tied ranks in the jth grouping.

[2] This method of testing the null hypothesis is used by S. Siegel and N. J. Castellan (1988, p. 68), Nonparametric Statistics for the Behavioral Sciences, 2nd ed. (McGraw-Hill). It appears, however, that the method is liberal in that the observed p is much smaller than the "true" p. Perhaps for some types of assurance services, this bias is acceptable. For a test that better approximates the true one-tailed p, use this formula:

    p ≈ exp{ -6[2(W - E(W))]^2 / (N^3 + N^2) },

which is based on Pettitt, A. N. (1979), "A non-parametric approach to the change-point problem," Applied Statistics, 126-135.
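For readers who prefer to script the procedure, the following Python sketch (standard library only; all variable names are mine) carries out the full change-point computation on the 28 lengths. It assumes no ties, which holds for this series.

```python
import math
from statistics import NormalDist

lengths = [9.99, 9.88, 10.24, 9.87, 10.03, 10.01, 9.96, 9.91, 10.20, 10.02,
           10.06, 10.00, 9.81, 9.79, 9.97, 10.22, 10.31, 9.94, 10.27, 9.90,
           10.16, 9.93, 10.17, 10.29, 10.30, 10.33, 9.92, 10.25]
N = len(lengths)

# Rank the observations from 1 to N (this series has no ties).
sorted_vals = sorted(lengths)
ranks = [sorted_vals.index(x) + 1 for x in lengths]

# Cumulative rank sums W_j and the differences W_j - j(N+1)/2, j = 1..N-1.
W = [sum(ranks[:j]) for j in range(1, N)]
diffs = [W[j - 1] - j * (N + 1) / 2 for j in range(1, N)]

# Estimated change point m: the j with the largest |W_j - j(N+1)/2|.
m = max(range(1, N), key=lambda j: abs(diffs[j - 1]))
n = N - m
W_m = W[m - 1]

expected = m * (N + 1) / 2                  # E(W)
sigma = math.sqrt(m * n * (N + 1) / 12)     # sigma_W (no-ties formula)
h = 0.5 if W_m < expected else -0.5         # continuity correction
z = (W_m + h - expected) / sigma
p = NormalDist().cdf(z)                     # one-tailed p (mean has increased)
```

Running this reproduces the worked figures: m = 15, W = 166, z ≈ -2.35, p ≈ .0094.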
    obs (j)   length   rank    W_j    W_j - j(N+1)/2
       1       9.99     12      12         -2.5
       2       9.88      4      16        -13.0
       3      10.24     22      38         -5.5
       4       9.87      3      41        -17.0
       5      10.03     16      57        -15.5
       6      10.01     14      71        -16.0
       7       9.96     10      81        -20.5
       8       9.91      6      87        -29.0
       9      10.20     20     107        -23.5
      10      10.02     15     122        -23.0
      11      10.06     17     139        -20.5
      12      10.00     13     152        -22.0
      13       9.81      2     154        -34.5
      14       9.79      1     155        -48.0
      15       9.97     11     166        -51.5  <-- maximum in absolute value
      16      10.22     21     187        -45.0
      17      10.31     27     214        -32.5
      18       9.94      9     223        -38.0
      19      10.27     24     247        -28.5
      20       9.90      5     252        -38.0
      21      10.16     18     270        -34.5
      22       9.93      8     278        -41.0
      23      10.17     19     297        -36.5
      24      10.29     25     322        -26.0
      25      10.30     26     348        -14.5
      26      10.33     28     376         -1.0
      27       9.92      7     383         -8.5
      28      10.25     23     406          0.0

For this problem, j ranges from 1 to 28 and N = 28. The flagged value in the last column is the maximum, in absolute value, of W_j - j(N+1)/2, which is -51.5. The sum of ranks, W, associated with that maximum is 166. The maximum occurs at observation 15, so m = 15 and n = N - m = 28 - 15 = 13. The expected value of W for m = 15 under the null hypothesis of no change in process is m(N + 1)/2 = 15(28 + 1)/2 = 217.5. (Note that the last column in the table above shows the difference between the observed and expected W, e.g., 166 - 217.5 = -51.5.) The standard deviation is

    σ_W = sqrt[mn(N + 1)/12] = sqrt[(15)(13)(28 + 1)/12] = 21.708.

The z statistic is

    z = [W + h - E(W)] / σ_W = (166 + .5 - 217.5) / 21.708 = -2.35.

We can employ a one-tailed significance test in this case because we are interested only in whether the process mean has increased (and a negative z-score is consistent with that direction); the standard normal table indicates that the probability of obtaining a z-score ≤ -2.35 when the null hypothesis is true is .0094. Thus, we reject the null hypothesis and conclude that the process has changed, suggesting that the mean has increased. There is evidence that the process is out of control. Further, the evidence suggests that the out-of-control state started at approximately observation 15.

II. Chi-square goodness-of-fit test

Purpose.
The chi-square goodness-of-fit test can be used when data fall into two or more categories and the accountant wants to know whether the observed (i.e., recorded or actual) frequencies in each category differ significantly from the expected frequencies in each category. For example, one could use this test to determine whether the number of processing errors varies across different accounting clerks.

Method. To use this test, one must be able to specify the expected frequencies in each category. The observed frequencies are then compared with the expected frequencies using this statistic:

    X^2 = Σ_{i=1..k} (O_i - E_i)^2 / E_i,

where O_i is the observed frequency in category i, E_i is the expected frequency in category i, and k is the number of categories. X^2, if the null hypothesis (i.e., that observed frequencies equal expected frequencies) is true, asymptotically has a chi-square distribution with df = k - 1. Thus, a chi-square table can be used to assess the significance of X^2 (see table at the end of this document). The larger the differences between the observed and expected frequencies are, the larger the statistic will be, and the more likely the null hypothesis is not true (and therefore should be rejected). For the statistic to be adequately represented by the chi-square distribution, the expected frequencies in all the categories should be ≥ 1 and the expected frequencies in at least 80% of the categories should be ≥ 5. (When there are only two categories, the expected frequencies should be ≥ 5.)

Example. The internal auditors in your company are supposed to sample transactions in proportion to the number of transactions at each of the five locations. The auditors during the past year sampled 1,000 transactions, as follows:

    Location                            1       2       3       4       5
    Number of transactions sampled     200     250     150     150     250
    Number of transactions occurring  9,500  11,000   9,000  10,000  10,500

Did the internal auditors sample the appropriate number from each location?
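A few lines of Python answer the question directly (a sketch; variable names are mine):

```python
sampled = [200, 250, 150, 150, 250]            # observed sample counts, locations 1-5
occurred = [9500, 11000, 9000, 10000, 10500]   # transactions occurring at each location

# Expected counts are proportional to each location's share of all transactions.
expected = [sum(sampled) * c / sum(occurred) for c in occurred]

chi2 = sum((o - e) ** 2 / e for o, e in zip(sampled, expected))
df = len(sampled) - 1   # k - 1 = 4
```

This yields expected counts of 190, 220, 180, 200, and 210, and X^2 ≈ 29.74, matching the hand computation that follows.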
A total of 50,000 transactions occurred, divided among the locations as follows: 19%, 22%, 18%, 20%, and 21% at locations 1, 2, 3, 4, and 5, respectively. Thus, we can compute the "expected" frequency of sampled transactions by taking the respective percentages times 1,000 (the total sample):

    Location                                   1     2     3     4     5
    Observed number of transactions sampled   200   250   150   150   250
    Expected number of transactions sampled   190   220   180   200   210

    X^2 = Σ (O_i - E_i)^2 / E_i
        = (200-190)^2/190 + (250-220)^2/220 + (150-180)^2/180 + (150-200)^2/200 + (250-210)^2/210
        = .53 + 4.09 + 5.00 + 12.50 + 7.62 = 29.74.

The statistic has df = k - 1 = 5 - 1 = 4. Reference to a chi-square table indicates that the probability of getting a statistic with a value of 29.74 or higher is substantially less than .005 (the tabled value for p = .005 is 14.86). Therefore, since this is such a rare occurrence under the assumption that the null hypothesis (i.e., that observed and expected frequencies are equal) is true, we reject the null hypothesis and conclude that the frequencies differ. We examine the five individual components making up the statistic and note that most of the difference occurs at location 4 (which contributes 12.5 to the test statistic), where too little sampling was conducted. Other locations (e.g., 2, 3, and 5) also have relatively large differences. (If you want to continue this example, assess whether the simple strategy of sampling an equal number of transactions from each location (i.e., 200) would satisfy the edict that the transactions sampled at a location be proportional to the number of transactions at that location.)

III. Chi-square contingency table test

Purpose. The chi-square goodness-of-fit test described above is used when the accountant wishes to determine whether the observed frequencies in two or more categories differ from the expected frequencies. For that test, there is one sample from a single population.
The chi-square contingency table test, on the other hand, can be used to compare relative frequencies for two or more samples. The samples are assumed to be independent.

Method. The data are cast into a two-dimensional contingency table, with cell values equal to the frequencies with which the observations fall into the category defined by the row and column. For example, assume that the accountant has two groups of individuals, and each individual is classified into one of three categories:

                        Group 1             Group 2             Totals (r_i)
    Classification 1    n11                 n12                 r1 = n11 + n12
    Classification 2    n21                 n22                 r2 = n21 + n22
    Classification 3    n31                 n32                 r3 = n31 + n32
    Totals (c_j)        c1 = n11+n21+n31    c2 = n12+n22+n32    N = r1+r2+r3 = c1+c2

In this table, there are r = 3 rows and c = 2 columns. In general, n_ij refers to the frequency in the cell identified by row i and column j; r_i is the total number of observations in row i; c_j is the total number of observations in column j; and N is the total number of observations.

The null hypothesis for this test is that the relative frequencies for the two groups are the same. We test this hypothesis by calculating what the frequencies would be under the assumption that the relative frequencies are the same for the two groups (we will call these the "expected" frequencies), and then assessing whether the observed frequencies are close to the expected frequencies. The expected frequency E_ij is equal to (r_i * c_j)/N. We again use the chi-square statistic to compare the expected and observed frequencies (the n_ij's):

    X^2 = Σ_{i=1..r} Σ_{j=1..c} (n_ij - E_ij)^2 / E_ij.

The double summation means that the sum is over all the cells in the table. This statistic, if the null hypothesis is true, asymptotically has a chi-square distribution with df = (r-1)(c-1). For the statistic to be adequately represented by the chi-square distribution, the expected frequencies in all of the cells should be ≥ 1 and the expected frequencies in at least 80% of the cells should be ≥ 5.
For 2 x 2 tables, the following formula, which incorporates a correction for continuity, is recommended:

    X^2 = N(|AD - BC| - N/2)^2 / [(A+B)(C+D)(A+C)(B+D)],

where A = n11, B = n12, C = n21, and D = n22. This statistic has df = (r-1)(c-1) = (2-1)(2-1) = 1.

Example. A company is assessing the effect of different incentive compensation plans on the performance of data-input operators. Three plans were recently implemented at three different locations: A, B, and C. Prior to the implementation of the plans, all three locations had similar levels of operator performance. Operator performance is unsatisfactory if more than .5% of the input data needs to be recoded. The accountant gathers performance data from the 200 operators at the three locations, as follows:

                    Recode ≤ .5%   Recode > .5%   Totals (r_i)
    Location A           62             11             73
    Location B           46             20             66
    Location C           42             19             61
    Totals (c_j)        150             50            200

The tabled values are frequencies, so, for example, of the 73 operators at location A, 62 performed at an acceptable level and 11 did not (more than .5% of the data had to be recoded). Is there evidence that the compensation plans are associated with different levels of data-entry performance? The chi-square contingency table test is suitable for answering this question. The table below incorporates the expected frequencies for all the cells (expected frequencies are in parentheses). For example, the expected frequency E11 is equal to (r1 * c1)/N = (73)(150)/200 = 54.75.

                    Recode ≤ .5%   Recode > .5%   Totals (r_i)
    Location A       62 (54.75)     11 (18.25)         73
    Location B       46 (49.50)     20 (16.50)         66
    Location C       42 (45.75)     19 (15.25)         61
    Totals (c_j)        150             50            200

A quick comparison of the observed and expected frequencies indicates that the compensation plan at location A may be associated with better performance than at the other locations, because there were relatively fewer instances of data requiring recoding at that location.
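The expected frequencies and the resulting statistic can be reproduced with a short Python sketch (variable names are mine):

```python
# Observed frequencies: rows = locations A, B, C; columns = recode <= .5%, > .5%.
observed = [[62, 11],
            [46, 20],
            [42, 19]]

row_totals = [sum(row) for row in observed]        # r_i
col_totals = [sum(col) for col in zip(*observed)]  # c_j
N = sum(row_totals)

# Expected frequency for each cell: E_ij = r_i * c_j / N.
expected = [[r * c / N for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(3) for j in range(2))
df = (3 - 1) * (2 - 1)
```

This gives E11 = 54.75 and X^2 ≈ 6.06 with df = 2, agreeing with the hand computation.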
To determine whether any significant differences exist, we compute the chi-square statistic, as follows:

    X^2 = Σ_i Σ_j (n_ij - E_ij)^2 / E_ij
        = (62 - 54.75)^2/54.75 + (11 - 18.25)^2/18.25 + ... + (19 - 15.25)^2/15.25 = 6.06.

The statistic has df equal to (r-1)(c-1) = (3-1)(2-1) = 2. The chi-square distribution table shows that a statistic this large has less than a .05 probability of occurring if the null hypothesis (of no differences across the locations) were true. We therefore reject the null hypothesis and conclude that the locations have different levels of operator performance. The data indicate that the operators at location A performed better than operators at the other locations, so we conclude that the compensation plan used at location A was associated with better performance. (Note that the significant overall test indicates only that there are significant differences; it does not indicate specifically where (i.e., in which cells) the significant differences are. There are follow-up tests that can be used to determine where these differences are, but we will not illustrate those procedures. If you are interested, refer to a nonparametric statistics book.)

IV. z-scores

Purpose. z-scores are useful for measuring the relative location of an observation in a frequency or probability distribution. z-scores, sometimes called standardized scores, measure the number of standard deviations an observation is from the mean of the distribution. If the distribution is approximately normal, one can refer to standard tables to make probability statements about where the observation is located (see example below).

Method. The standard deviation, s, for a sample is computed as

    s = sqrt[ Σ_{i=1..n} (X_i - X̄)^2 / (n - 1) ],

where X_i is an individual observation, X̄ is the sample mean, and n is the number of observations in the sample. (For a population, one would divide by n instead of n - 1.)
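This is the n - 1 ("sample") form of the standard deviation, which is also what Python's statistics.stdev computes. The check below uses made-up numbers purely for illustration:

```python
import math
import statistics

sample = [4.0, 7.0, 13.0, 16.0]   # illustrative data, invented for this sketch

mean = sum(sample) / len(sample)  # X-bar
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (len(sample) - 1))

s_lib = statistics.stdev(sample)  # same n - 1 formula, from the standard library
```

Here both computations give s = sqrt(30) ≈ 5.477.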
This measure of the "spread" of a distribution can be used to assess whether the same process that generated the sample or population also generated a given observation of interest (or, alternatively, whether a given observation seems to differ from the sampled observations or population). A z-score for a given observation, X, is calculated as follows:

    z = (X - X̄) / s.

For most bell-shaped distributions (of which the normal distribution is an example), with n > 30 or so, the interval X̄ ± s, which corresponds to z-scores up to ±1.00, contains approximately 68% of the observations; the interval X̄ ± 2s contains approximately 95% of the observations; and the interval X̄ ± 3s contains over 99% of the observations. (These percentages can be inferred and refined, and other intervals developed, by referring to a standard normal distribution table. For example, the table at the end of this document shows that for 1 standard deviation (i.e., when z = 1.00), .3413 of the observations will lie between X̄ (0 in the table) and X̄ + s, and .3413 of the observations will lie between X̄ and X̄ - s; thus, .3413 + .3413 = .6826 (i.e., approximately 68%) of the observations are expected to lie within s of the mean.)

Example. Assume that the industry mean and standard deviation for the inventory turnover ratio are 6.5 and 1.0, respectively. If a company's inventory turnover ratio is 8.5, the z-score is (8.5 - 6.5) / 1.0 = 2.00. If the ratios are approximately normally distributed, the accountant could state that the probability of obtaining an inventory turnover ratio of this magnitude (i.e., 2 standard deviations from the mean) or greater is approximately 2.28%. The accountant might conclude, with an observation this extreme, that the company differs in some way from the average company in the industry.

V. The t-test

Purpose. The t-test is used to evaluate differences in means from two groups or populations.
It can be used on very small sample sizes (e.g., as small as 10 or, according to some, even smaller), although larger sample sizes are preferred. It is assumed that each of the two underlying populations is normally distributed. Further, each of the underlying populations is assumed to have the same variance (the variance is equal to the square of the standard deviation); this is the homogeneity of variance assumption. Violations of the normality assumption are not important, but violations of the homogeneity of variance assumption can be problematic, unless the two sample sizes are equal. (As discussed below, the Excel function that performs the t-test has an option that appropriately "corrects" for lack of homogeneity of variance; it can be used when the variances are unequal.)

Method. With the t-test, the accountant tests the null hypothesis that the two means are the same. The t statistic is computed as follows:

    t = (X̄1 - X̄2) / sqrt{ [((n1 - 1)s1^2 + (n2 - 1)s2^2) / (n1 + n2 - 2)] (1/n1 + 1/n2) },

where X̄1 and X̄2 are the means of the samples from population (or group) 1 and 2, respectively; s1^2 and s2^2 are the sample variances (i.e., squared standard deviations) from populations 1 and 2, respectively, which serve as estimates of the common population variance; and n1 and n2 are the sizes of the samples. This statistic is distributed as t with n1 + n2 - 2 degrees of freedom (df).

The t distribution is similar to the normal distribution (i.e., symmetrical and bell-shaped), except that it has thicker tails (unless the number of degrees of freedom is large). With infinite degrees of freedom, the t and normal distributions are identical. Similar to the normal distribution, we can refer to tables to determine the probabilities associated with obtaining various magnitudes of t under the assumption that there is no difference between the means (i.e., the null hypothesis). Refer to the table at the end of this document.
To reject the null hypothesis of no difference between means at p < .05 for a two-sided test with 40 df, we would need a t statistic of at least 2.021. (This compares with a value of 1.96 for the normal distribution.) Instead of computing means and standard deviations, using the above formula, and looking up probability values in tables, you can use Excel to perform the t-test. The function TTEST returns the probability value. With this function, you can specify a one- or two-tailed test (the third argument) and whether the variances are assumed to be homogeneous or not (the fourth argument: 2 = homogeneous, 3 = heterogeneous). If you want to calculate the actual t statistic, use the TINV function.

Example. The internal auditor for E-MFE suspects that the 25 northern Montreal delivery outlets have different levels of delivery expenses than the 25 southern outlets. Before she modifies her planning regression model, she wishes to verify that there is a difference.

    NORTHERN OUTLETS: 3196, 3136, 3165, 3464, 3342, 3153, 3168, 3531, 2957, 3451, 3108, 3117, 3376, 3338, 3509, 3163, 2987, 3391, 3140, 3310, 3294, 3358, 3255, 3399, 3120
        mean = 3257.12, standard deviation = 156.93

    SOUTHERN OUTLETS: 3303, 3055, 3225, 3134, 3215, 2959, 3541, 3338, 3305, 3381, 3146, 2876, 3127, 3198, 3250, 3180, 2836, 3205, 3170, 3187, 3088, 3110, 2986, 3244, 3104
        mean = 3166.52, standard deviation = 154.56

Do these delivery-expense data indicate that northern outlets and southern outlets have different levels of expense? The standard deviations are close in magnitude, so we can assume that the homogeneity of variance assumption is satisfied. (Because the sample sizes are equal, unequal variances are not a major concern anyway.) We compute the t statistic, as follows:

    t = (3257.12 - 3166.52) / sqrt{ [((25 - 1)(156.93)^2 + (25 - 1)(154.56)^2) / (25 + 25 - 2)] (1/25 + 1/25) } = 2.057.

There are 25 + 25 - 2 = 48 df associated with the statistic. Because the internal auditor does not have reason to expect the northern or southern outlets to have the higher expenses, a two-tailed test should be employed.
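The pooled-variance t statistic for this example can be verified with the Python standard library (a sketch; variable names are mine):

```python
import math
import statistics

northern = [3196, 3136, 3165, 3464, 3342, 3153, 3168, 3531, 2957, 3451,
            3108, 3117, 3376, 3338, 3509, 3163, 2987, 3391, 3140, 3310,
            3294, 3358, 3255, 3399, 3120]
southern = [3303, 3055, 3225, 3134, 3215, 2959, 3541, 3338, 3305, 3381,
            3146, 2876, 3127, 3198, 3250, 3180, 2836, 3205, 3170, 3187,
            3088, 3110, 2986, 3244, 3104]

n1, n2 = len(northern), len(southern)
m1, m2 = statistics.mean(northern), statistics.mean(southern)
v1, v2 = statistics.variance(northern), statistics.variance(southern)  # n-1 form

# Pooled estimate of the common variance, then the t statistic.
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
```

With these data, t works out to approximately 2.057 with df = 48, matching the hand computation above.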
The table indicates that there is a probability of less than .05 (i.e., 2 x .025) that we would observe a t of 2.011 or greater in absolute value if the means were really the same. Thus, because our observed t of 2.057 is greater than the tabled value, we reject the null hypothesis and conclude that the evidence indicates that the northern outlets have different levels of delivery expense than the southern outlets. Specifically, the northern outlets' expenses are higher.

If we were to use Excel, we could obtain the probability directly by using the following function: TTEST(B2:D10,F2:H10,2,2). This assumes that the northern outlets' expenses are in cells B2:D10, the southern outlets' are in cells F2:H10, a two-tailed test is used (the "2" as the third argument), and the variances are assumed equal (the "2" as the fourth argument). The TTEST function returns a value of .04518 (which is consistent with the probability we obtained from the table). If we wanted to obtain the t statistic, we could use the function TINV(.04518,48); the first argument is the two-tailed probability value and the second is the number of df. The TINV function returns a value of 2.0566, which is the same as calculated by the formula above.

Table: Chi-Square Probabilities

Locate the appropriate degrees of freedom in the left column. The top row gives the probability, under the null hypothesis, that the observed statistic is greater than or equal to the value in the body of the table.
 df     0.995    0.99     0.975    0.95     0.90      0.10     0.05     0.025    0.01     0.005
  1      ---      ---     0.001    0.004    0.016     2.706    3.841    5.024    6.635    7.879
  2     0.010    0.020    0.051    0.103    0.211     4.605    5.991    7.378    9.210   10.597
  3     0.072    0.115    0.216    0.352    0.584     6.251    7.815    9.348   11.345   12.838
  4     0.207    0.297    0.484    0.711    1.064     7.779    9.488   11.143   13.277   14.860
  5     0.412    0.554    0.831    1.145    1.610     9.236   11.070   12.833   15.086   16.750
  6     0.676    0.872    1.237    1.635    2.204    10.645   12.592   14.449   16.812   18.548
  7     0.989    1.239    1.690    2.167    2.833    12.017   14.067   16.013   18.475   20.278
  8     1.344    1.646    2.180    2.733    3.490    13.362   15.507   17.535   20.090   21.955
  9     1.735    2.088    2.700    3.325    4.168    14.684   16.919   19.023   21.666   23.589
 10     2.156    2.558    3.247    3.940    4.865    15.987   18.307   20.483   23.209   25.188
 11     2.603    3.053    3.816    4.575    5.578    17.275   19.675   21.920   24.725   26.757
 12     3.074    3.571    4.404    5.226    6.304    18.549   21.026   23.337   26.217   28.300
 13     3.565    4.107    5.009    5.892    7.042    19.812   22.362   24.736   27.688   29.819
 14     4.075    4.660    5.629    6.571    7.790    21.064   23.685   26.119   29.141   31.319
 15     4.601    5.229    6.262    7.261    8.547    22.307   24.996   27.488   30.578   32.801
 16     5.142    5.812    6.908    7.962    9.312    23.542   26.296   28.845   32.000   34.267
 17     5.697    6.408    7.564    8.672   10.085    24.769   27.587   30.191   33.409   35.718
 18     6.265    7.015    8.231    9.390   10.865    25.989   28.869   31.526   34.805   37.156
 19     6.844    7.633    8.907   10.117   11.651    27.204   30.144   32.852   36.191   38.582
 20     7.434    8.260    9.591   10.851   12.443    28.412   31.410   34.170   37.566   39.997
 21     8.034    8.897   10.283   11.591   13.240    29.615   32.671   35.479   38.932   41.401
 22     8.643    9.542   10.982   12.338   14.041    30.813   33.924   36.781   40.289   42.796
 23     9.260   10.196   11.689   13.091   14.848    32.007   35.172   38.076   41.638   44.181
 24     9.886   10.856   12.401   13.848   15.659    33.196   36.415   39.364   42.980   45.559
 25    10.520   11.524   13.120   14.611   16.473    34.382   37.652   40.646   44.314   46.928
 26    11.160   12.198   13.844   15.379   17.292    35.563   38.885   41.923   45.642   48.290
 27    11.808   12.879   14.573   16.151   18.114    36.741   40.113   43.195   46.963   49.645
 28    12.461   13.565   15.308   16.928   18.939    37.916   41.337   44.461   48.278   50.993
 29    13.121   14.256   16.047   17.708   19.768    39.087   42.557   45.722   49.588   52.336
 30    13.787   14.953   16.791   18.493   20.599    40.256   43.773   46.979   50.892   53.672
 40    20.707   22.164   24.433   26.509   29.051    51.805   55.758   59.342   63.691   66.766
 50    27.991   29.707   32.357   34.764   37.689    63.167   67.505   71.420   76.154   79.490
 60    35.534   37.485   40.482   43.188   46.459    74.397   79.082   83.298   88.379   91.952
 70    43.275   45.442   48.758   51.739   55.329    85.527   90.531   95.023  100.425  104.215
 80    51.172   53.540   57.153   60.391   64.278    96.578  101.879  106.629  112.329  116.321
 90    59.196   61.754   65.647   69.126   73.291   107.565  113.145  118.136  124.116  128.299
100    67.328   70.065   74.222   77.929   82.358   118.498  124.342  129.561  135.807  140.169

Table: Areas under the Standard Normal Distribution

Locate the appropriate z-score by referring to the left column (for z to one decimal place) and the top row (for the second decimal place). Tabled values represent the area under the curve between 0 and the observed z. To find the area beyond the observed z, subtract the tabled amount from .50. This area is the normally reported (one-tailed) p-value. Since the distribution is symmetric, if the observed z-score is negative, use its absolute value. For a two-tailed test, double the tabled values.
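As an alternative to reading the printed table, the standard normal areas can be computed directly with Python's statistics.NormalDist (standard library, Python 3.8+); a sketch:

```python
from statistics import NormalDist

phi = NormalDist()   # standard normal: mean 0, standard deviation 1

z = 1.00
area_0_to_z = phi.cdf(z) - 0.5    # the tabled quantity: area between 0 and z
one_tailed_p = 1 - phi.cdf(z)     # area beyond z (tabled value subtracted from .50)
two_tailed_p = 2 * one_tailed_p   # double for a two-tailed test
```

For z = 1.00 this reproduces the tabled .3413; for z = 2.00 the one-tailed area is about .0228, the 2.28% used in the z-score example above.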
Area between 0 and z

 z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Table: The t Distribution

To use the following tabled values, specify a value for α (the acceptable level of significance; that is, the acceptable probability of rejecting the null hypothesis of "equal means" when the null hypothesis is really true):

1. For a two-sided test, find the column corresponding to α/2 and reject the null hypothesis if the absolute value of the test statistic is greater than the value of t_{α/2} in the table below.

2. For a one-sided test, make sure that the means are in the hypothesized direction. If so, find the column corresponding to α and reject the null hypothesis if the absolute value of the test statistic is greater than the tabled value.

Upper critical values of Student's t distribution with ν degrees of freedom (probability of exceeding the critical value):

  ν     0.10     0.05     0.025    0.01     0.005    0.001
  1    3.078    6.314   12.706   31.821   63.657  318.313
  2    1.886    2.920    4.303    6.965    9.925   22.327
  3    1.638    2.353    3.182    4.541    5.841   10.215
  4    1.533    2.132    2.776    3.747    4.604    7.173
  5    1.476    2.015    2.571    3.365    4.032    5.893
  6    1.440    1.943    2.447    3.143    3.707    5.208
  7    1.415    1.895    2.365    2.998    3.499    4.782
  8    1.397    1.860    2.306    2.896    3.355    4.499
  9    1.383    1.833    2.262    2.821    3.250    4.296
 10    1.372    1.812    2.228    2.764    3.169    4.143
 11    1.363    1.796    2.201    2.718    3.106    4.024
 12    1.356    1.782    2.179    2.681    3.055    3.929
 13    1.350    1.771    2.160    2.650    3.012    3.852
 14    1.345    1.761    2.145    2.624    2.977    3.787
 15    1.341    1.753    2.131    2.602    2.947    3.733
 16    1.337    1.746    2.120    2.583    2.921    3.686
 17    1.333    1.740    2.110    2.567    2.898    3.646
 18    1.330    1.734    2.101    2.552    2.878    3.610
 19    1.328    1.729    2.093    2.539    2.861    3.579
 20    1.325    1.725    2.086    2.528    2.845    3.552
 21    1.323    1.721    2.080    2.518    2.831    3.527
 22    1.321    1.717    2.074    2.508    2.819    3.505
 23    1.319    1.714    2.069    2.500    2.807    3.485
 24    1.318    1.711    2.064    2.492    2.797    3.467
 25    1.316    1.708    2.060    2.485    2.787    3.450
 26    1.315    1.706    2.056    2.479    2.779    3.435
 27    1.314    1.703    2.052    2.473    2.771    3.421
 28    1.313    1.701    2.048    2.467    2.763    3.408
 29    1.311    1.699    2.045    2.462    2.756    3.396
 30    1.310    1.697    2.042    2.457    2.750    3.385
 31    1.309    1.696    2.040    2.453    2.744    3.375
 32    1.309    1.694    2.037    2.449    2.738    3.365
 33    1.308    1.692    2.035    2.445    2.733    3.356
 34    1.307    1.691    2.032    2.441    2.728    3.348
 35    1.306    1.690    2.030    2.438    2.724    3.340
 36    1.306    1.688    2.028    2.434    2.719    3.333
 37    1.305    1.687    2.026    2.431    2.715    3.326
 38    1.304    1.686    2.024    2.429    2.712    3.319
 39    1.304    1.685    2.023    2.426    2.708    3.313
 40    1.303    1.684    2.021    2.423    2.704    3.307
 41    1.303    1.683    2.020    2.421    2.701    3.301
 42    1.302    1.682    2.018    2.418    2.698    3.296
 43    1.302    1.681    2.017    2.416    2.695    3.291
 44    1.301    1.680    2.015    2.414    2.692    3.286
 45    1.301    1.679    2.014    2.412    2.690    3.281
 46    1.300    1.679    2.013    2.410    2.687    3.277
 47    1.300    1.678    2.012    2.408    2.685    3.273
 48    1.299    1.677    2.011    2.407    2.682    3.269
 49    1.299    1.677    2.010    2.405    2.680    3.265
 50    1.299    1.676    2.009    2.403    2.678    3.261
 51    1.298    1.675    2.008    2.402    2.676    3.258
 52    1.298    1.675    2.007    2.400    2.674    3.255
 53    1.298    1.674    2.006    2.399    2.672    3.251
 54    1.297    1.674    2.005    2.397    2.670    3.248
 55    1.297    1.673    2.004    2.396    2.668    3.245
 56    1.297    1.673    2.003    2.395    2.667    3.242
 57    1.297    1.672    2.002    2.394    2.665    3.239
 58    1.296    1.672    2.002    2.392    2.663    3.237
 59    1.296    1.671    2.001    2.391    2.662    3.234
 60    1.296    1.671    2.000    2.390    2.660    3.232
 61    1.296    1.670    2.000    2.389    2.659    3.229
 62    1.295    1.670    1.999    2.388    2.657    3.227
 63    1.295    1.669    1.998    2.387    2.656    3.225
 64    1.295    1.669    1.998    2.386    2.655    3.223
 65    1.295    1.669    1.997    2.385    2.654    3.220
 66    1.295    1.668    1.997    2.384    2.652    3.218
 67    1.294    1.668    1.996    2.383    2.651    3.216
 68    1.294    1.668    1.995    2.382    2.650    3.214
 69    1.294    1.667    1.995    2.382    2.649    3.213
 70    1.294    1.667    1.994    2.381    2.648    3.211
 71    1.294    1.667    1.994    2.380    2.647    3.209
 72    1.293    1.666    1.993    2.379    2.646    3.207
 73    1.293    1.666    1.993    2.379    2.645    3.206
 74    1.293    1.666    1.993    2.378    2.644    3.204
 75    1.293    1.665    1.992    2.377    2.643    3.202
 76    1.293    1.665    1.992    2.376    2.642    3.201
 77    1.293    1.665    1.991    2.376    2.641    3.199
 78    1.292    1.665    1.991    2.375    2.640    3.198
 79    1.292    1.664    1.990    2.374    2.640    3.197
 80    1.292    1.664    1.990    2.374    2.639    3.195
 81    1.292    1.664    1.990    2.373    2.638    3.194
 82    1.292    1.664    1.989    2.373    2.637    3.193
 83    1.292    1.663    1.989    2.372    2.636    3.191
 84    1.292    1.663    1.989    2.372    2.636    3.190
 85    1.292    1.663    1.988    2.371    2.635    3.189
 86    1.291    1.663    1.988    2.370    2.634    3.188
 87    1.291    1.663    1.988    2.370    2.634    3.187
 88    1.291    1.662    1.987    2.369    2.633    3.185
 89    1.291    1.662    1.987    2.369    2.632    3.184
 90    1.291    1.662    1.987    2.368    2.632    3.183
 91    1.291    1.662    1.986    2.368    2.631    3.182
 92    1.291    1.662    1.986    2.368    2.630    3.181
 93    1.291    1.661    1.986    2.367    2.630    3.180
 94    1.291    1.661    1.986    2.367    2.629    3.179
 95    1.291    1.661    1.985    2.366    2.629    3.178
 96    1.290    1.661    1.985    2.366    2.628    3.177
 97    1.290    1.661    1.985    2.365    2.627    3.176
 98    1.290    1.661    1.984    2.365    2.627    3.175
 99    1.290    1.660    1.984    2.365    2.626    3.175
100    1.290    1.660    1.984    2.364    2.626    3.174
  ∞    1.282    1.645    1.960    2.326    2.576    3.090

© 2006 by Eric E. Spires. This document was prepared for use in AMIS 822 at The Ohio State University. If you have comments or questions, please contact the author at [email protected]. A primary source for the descriptions of some of the statistics was Nonparametric Statistics for the Behavioral Sciences, 2nd ed., by S. Siegel and N. J. Castellan (McGraw-Hill, 1988).