Using Simple Statistics in Assurance Services
Statistics are useful for summarizing large amounts of data into a form that can be more
easily interpreted. Accountants and managers can use statistical theory and statistics to
develop expectations about accounting numbers (e.g., balances or transaction amounts) or
behavior; the recorded numbers or behavior can then be compared to the expectations,
again using statistics. Patterns and discrepancies that are extremely difficult to observe
and identify with the naked eye are often easily discernible using relatively simple
statistics. Sometimes statistics can be used to evaluate whether a discrepancy is large
enough to investigate further, or whether it can be ignored.
The descriptions of the tests below are brief and dense; they are meant to be a
general guide only. Details on most of these tests can be found in any standard
nonparametric statistics or business statistics book. We will illustrate most of the tests in
class. You should apply them whenever they seem appropriate. A large part of using
these techniques is identifying situations in which they can be applied.
I. Change-point test
Purpose. The change-point test is used to assess whether there has been an underlying
change in the process that generates an ordered sequence of values. Assuming that the
process has changed, the test also indicates at which point the process changed.
Method. Say that we have a series of N observations. We first rank the N observations
from 1 to N. (The version of the change-point test that we are describing requires a
continuous variable; there is a version of the test for binomial variables.) Define ri to be
the rank associated with observation i. (If there are ties, assign to each of the tied
observations the average of the ranks they would have if no ties had occurred. For
example, if two observations are equal and are tied for ranks 5 and 6, assign them each
the rank 5.5.) For each point j in the series, calculate Wj, which is the sum of the ranks
through that point:

$$W_j = \sum_{i=1}^{j} r_i, \qquad j = 1, 2, \ldots, N - 1.$$
For each of the Wj values, calculate the difference Wj – j(N+1)/2. The value of j for
which the absolute value of this difference is the maximum is the estimated change point
in the series, and is denoted m. The number of observations after the change point is
denoted n, and is equal to N – m. (Note that the ranking procedure and subsequent
calculations can easily be accomplished in Excel.)
We can calculate the expected value for W, E(W), under the null hypothesis that
the process has not changed. (To employ the following procedures, either m or n must be
greater than 10. For smaller sample sizes, special tables are available to test the
significance of W.) If the observed value of W significantly differs from E(W), then we
will conclude that the process has changed. The expected value of the sum of the ranks is
E(W) = m(N + 1)/2. The standard deviation of W is¹

$$\sigma_W = \sqrt{\frac{mn(N + 1)}{12}}.$$
We can test the null hypothesis that the process has not changed by using the z statistic,
as follows²:

$$z = \frac{W + h - E(W)}{\sigma_W} = \frac{W + h - m(N + 1)/2}{\sqrt{mn(N + 1)/12}},$$
where h = +.5 if W < m(N + 1)/2 and h = -.5 if W > m(N + 1)/2. This statistic, when the
null hypothesis is true, is approximately normally distributed with mean 0 and standard
deviation 1. (The factor h is a correction for continuity that improves the approximation
to the normal distribution.) The significance of the observed value of z can be
determined by referring to a standard normal distribution table.
Example. A certain manufacturing process is considered to be out of control when the
lengths of the parts produced start systematically to exceed 10. The accountant gathers
lengths for 28 recent parts (listed in order of occurrence), as follows:
9.99, 9.88, 10.24, 9.87, 10.03, 10.01, 9.96, 9.91, 10.20, 10.02, 10.06, 10.00, 9.81, 9.79,
9.97, 10.22, 10.31, 9.94, 10.27, 9.90, 10.16, 9.93, 10.17, 10.29, 10.30, 10.33, 9.92, 10.25
In addition to determining whether the process is out of control, management also wants
to determine when the process started to be out of control. This latter information is
useful in planning when machines need to be adjusted.
As a first step in determining whether the process is out of control, the accountant
determines whether the process has changed, under the assumption that the process was
in control at the start. If the process has changed, the accountant can perform follow-up
analyses to determine whether the mean length for the changed process is greater than 10.
The change-point test can be used as follows:
observation (j)   length   rank    Wj    Wj - j(N+1)/2
      1            9.99     12     12        -2.5
      2            9.88      4     16       -13
      3           10.24     22     38        -5.5
      4            9.87      3     41       -17
      5           10.03     16     57       -15.5
      6           10.01     14     71       -16
      7            9.96     10     81       -20.5
      8            9.91      6     87       -29
      9           10.20     20    107       -23.5
     10           10.02     15    122       -23
     11           10.06     17    139       -20.5
     12           10.00     13    152       -22
     13            9.81      2    154       -34.5
     14            9.79      1    155       -48
     15            9.97     11    166       -51.5 *
     16           10.22     21    187       -45
     17           10.31     27    214       -32.5
     18            9.94      9    223       -38
     19           10.27     24    247       -28.5
     20            9.90      5    252       -38
     21           10.16     18    270       -34.5
     22            9.93      8    278       -41
     23           10.17     19    297       -36.5
     24           10.29     25    322       -26
     25           10.30     26    348       -14.5
     26           10.33     28    376        -1
     27            9.92      7    383        -8.5
     28           10.25     23    406         0
For this problem, j ranges from 1 to 28 and N = 28. The starred value in the last column
is the maximum, in absolute value, of Wj – j(N+1)/2, which is -51.5. The sum of ranks,
W, associated with that maximum is 166. The maximum occurs at observation 15, so m =
15 and n = N – m = 28 – 15 = 13. The expected value of W for m = 15 under the null
hypothesis of no change in process is m(N + 1)/2 = 15(28 + 1)/2 = 217.5. (Note that the
last column in the table above shows the difference between the observed and expected
W, e.g., 166 – 217.5 = -51.5.) The standard deviation, σW, is

$$\sigma_W = \sqrt{\frac{mn(N + 1)}{12}} = \sqrt{\frac{(15)(13)(28 + 1)}{12}} = 21.708.$$

The z statistic is (W + h – E(W))/σW = (166 + .5 – 217.5)/21.708 = -2.35. We can employ
a one-tailed significance test in this case because we are interested only in whether the
process mean has increased (and a negative z-score is consistent with that direction); the
standard normal table indicates that the probability of obtaining a z-score ≤ -2.35 when
the null hypothesis is true is .0094.
Thus, we reject the null hypothesis and conclude that the process has changed, suggesting
that the mean has increased. There is evidence that the process is out of control. Further,
the evidence suggests that the out-of-control state started at approximately observation
15.
¹ If there are ties, the following formula should be used for σW:

$$\sigma_W = \sqrt{\frac{mn}{N(N - 1)}\left[\frac{N^3 - N}{12} - \sum_{j=1}^{g}\frac{t_j^3 - t_j}{12}\right]},$$

where g is the number of groupings of different tied ranks and tj is the number of tied
ranks in the jth grouping.

² This method of testing the null hypothesis is the one used by S. Siegel and N. J.
Castellan, Nonparametric Statistics for the Behavioral Sciences, 2nd ed. (McGraw-Hill,
1988), p. 68. It appears, however, that the method is liberal in that the observed p is much
smaller than the “true” p. Perhaps for some types of assurance services, this bias is
acceptable. For a test that better approximates the true one-tailed p, use

$$p \approx \exp\left(\frac{-6\,[2(W - E(W))]^2}{N^3 + N^2}\right),$$

which is based on Pettitt, A. N. (1979), “A non-parametric approach to the change-point
problem,” Applied Statistics, 126-135.
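If you prefer to script the calculation rather than work in Excel, here is a minimal Python
sketch that reproduces the example above. It is only a sketch: the variable names are ours,
and it assumes the scipy package is available (its rankdata helper implements the
average-rank rule for ties described in the Method section).

import math
from scipy.stats import rankdata

lengths = [9.99, 9.88, 10.24, 9.87, 10.03, 10.01, 9.96, 9.91, 10.20, 10.02,
           10.06, 10.00, 9.81, 9.79, 9.97, 10.22, 10.31, 9.94, 10.27, 9.90,
           10.16, 9.93, 10.17, 10.29, 10.30, 10.33, 9.92, 10.25]

N = len(lengths)
r = rankdata(lengths)                         # ranks 1..N; ties get average ranks
W = [sum(r[:j]) for j in range(1, N)]         # W_j = sum of ranks through point j
diff = [W[j - 1] - j * (N + 1) / 2 for j in range(1, N)]

m = max(range(1, N), key=lambda j: abs(diff[j - 1]))   # estimated change point
n = N - m
Wm, EW = W[m - 1], m * (N + 1) / 2
sigma = math.sqrt(m * n * (N + 1) / 12)
h = 0.5 if Wm < EW else -0.5                  # continuity correction
z = (Wm + h - EW) / sigma
p = 0.5 * math.erfc(abs(z) / math.sqrt(2))    # one-tailed p from the normal cdf

print(m, round(z, 2), round(p, 4))            # expected output: 15 -2.35 0.0094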
II. Chi-square goodness-of-fit test
Purpose. The chi-square goodness-of-fit test can be used when data fall into two or more
categories and the accountant wants to know whether the observed (i.e., recorded or
actual) frequencies in each category differ significantly from the expected frequencies in
each category. For example, one could use this test to determine whether the number of
processing errors varies across different accounting clerks.
Method. To use this test, one must be able to specify the expected frequencies in each
category. The observed frequencies are then compared with the expected frequencies
using this statistic:
$$\chi^2 = \sum_{i=1}^{k}\frac{(O_i - E_i)^2}{E_i},$$
where Oi is the observed frequency in category i, Ei is the expected frequency in category
i, and k is the number of categories. If the null hypothesis (i.e., that observed frequencies
equal expected frequencies) is true, χ² asymptotically has a chi-square distribution with
df = k – 1. Thus, a chi-square table can be used to assess the significance of χ² (see table
at the end of this document). The larger the differences between the observed and
expected frequencies are, the larger the statistic will be, and the more likely it is that the
null hypothesis is not true (and therefore should be rejected). For the statistic to be
adequately represented by the chi-square distribution, the expected frequencies in all the
categories should be ≥ 1 and the expected frequencies in at least 80% of the categories
should be ≥ 5. (When there are only two categories, both expected frequencies should
be ≥ 5.)
Example. The internal auditors in your company are supposed to sample transactions in
proportion to the number of transactions at each of the five locations. The auditors
during the past year sampled 1,000 transactions, as follows:
Location   Number of transactions sampled   Number of transactions occurring
   1                   200                              9,500
   2                   250                             11,000
   3                   150                              9,000
   4                   150                             10,000
   5                   250                             10,500
Did the internal auditors sample the appropriate number from each location?
A total of 50,000 transactions occurred, divided among the locations as follows: 19%,
22%, 18%, 20%, and 21% at location 1, 2, 3, 4, and 5, respectively. Thus, we can
compute the “expected” frequency of sampled transactions by taking the respective
percentages times 1,000 (the total sample):
Location   Observed number of           Expected number of
           transactions sampled         transactions sampled
   1               200                          190
   2               250                          220
   3               150                          180
   4               150                          200
   5               250                          210
$$\chi^2 = \sum_{i=1}^{5}\frac{(O_i - E_i)^2}{E_i} = \frac{(200 - 190)^2}{190} + \frac{(250 - 220)^2}{220} + \frac{(150 - 180)^2}{180} + \frac{(150 - 200)^2}{200} + \frac{(250 - 210)^2}{210}$$
$$= .53 + 4.09 + 5 + 12.5 + 7.62 = 29.74.$$
The statistic has df = k – 1 = 5 – 1 = 4. Reference to a chi-square table indicates that the
probability of getting a statistic with a value of 29.74 or higher is substantially less than
.005 (the tabled value for p = .005 is 14.86). Therefore, since this is such a rare
occurrence under the assumption that the null hypothesis (i.e., that observed and expected
frequencies are equal) is true, we reject the null hypothesis and conclude that the
frequencies differ. We examine the five individual components making up the statistic
and note that most of the difference occurs in location 4 (which contributes 12.5 to the
test statistic), where too little sampling was conducted. Other locations (e.g., 2, 3, and 5)
also have relatively large differences. (If you want to continue this example, assess
whether the simple strategy of sampling an equal number of transactions from each
location (i.e., 200) would satisfy the edict that the transactions sampled at a location be
proportional to the number of transactions at a location.)
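The calculation above can also be checked in a few lines of Python. This is a sketch that
assumes the scipy package is available; its chisquare function computes the identical
statistic and looks up the p-value for you.

from scipy.stats import chisquare

observed = [200, 250, 150, 150, 250]
expected = [190, 220, 180, 200, 210]   # 19%, 22%, 18%, 20%, 21% of 1,000

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 2), p)               # prints 29.74 and a p-value far below .005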
III. Chi-square contingency table test
Purpose. The chi-square goodness-of-fit test described above is used when the
accountant wishes to determine whether the observed frequencies in two or more
categories differ from the expected frequencies. For that test, there is one sample from a
single population. The chi-square contingency table test, on the other hand, can be used
to compare relative frequencies for two or more samples. The samples are assumed to be
independent samples.
Method. The data are cast into a two-dimensional contingency table, with cell values
equal to the frequencies that the observations fall into the category defined by the row
and column. For example, assume that the accountant has two groups of individuals, and
each individual is classified into one of three categories:
                     Group 1                    Group 2                    Totals (ri)
Classification 1     n11                        n12                        r1 = n11 + n12
Classification 2     n21                        n22                        r2 = n21 + n22
Classification 3     n31                        n32                        r3 = n31 + n32
Totals (cj)          c1 = n11 + n21 + n31       c2 = n12 + n22 + n32       N = r1 + r2 + r3 = c1 + c2
In this table, there are r = 3 rows and c = 2 columns. In general, nij refers to the
frequency in the cell identified by row i and column j; ri is the total number of
observations that are in row i; cj is the total number of observations in column j; and N is
the total number of observations.
The null hypothesis for this test is that the relative frequencies for the two groups
are the same. We test this hypothesis by calculating what the relative frequencies would
be under the assumption that they are the same for the two groups (we will call these
frequencies the “expected” frequencies), and then assessing whether the observed
frequencies are close to the expected frequencies. The expected frequency, Eij, is equal
to (ri*cj)/N. We again use the chi-square statistic to compare the expected and observed
frequencies (the nij’s):
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(n_{ij} - E_{ij})^2}{E_{ij}}.$$
The double summation means that the sum is over all the cells in the table. This statistic,
if the null hypothesis is true, asymptotically has a chi-square distribution with df =
(r-1)(c-1). For the statistic to be adequately represented by the chi-square distribution,
the expected frequencies in all of the cells should be ≥ 1 and the expected frequencies in
at least 80% of the cells should be ≥ 5.
For 2 x 2 tables, the following formula, which incorporates a correction for
continuity, is recommended:
i 1
j 1
N (| AD  BC |  N / 2) 2
 
,
( A  B)(C  D)( A  C )( B  D)
2
where A = n11, B = n12, C = n21, and D = n22. This statistic has df = (r-1)(c-1) = (2-1)(2-1)
= 1.
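As an illustration (with hypothetical counts, not data from any example in this
document), scipy's chi2_contingency function applies this continuity correction
automatically for 2 x 2 tables; a sketch, assuming numpy and scipy are available:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table: A = 30, B = 10, C = 20, D = 25.
table = np.array([[30, 10],
                  [20, 25]])

# correction=True (the default) applies the Yates continuity correction,
# which is algebraically equivalent to the formula above for 2 x 2 tables.
chi2, p, dof, expected = chi2_contingency(table, correction=True)
print(round(chi2, 2), dof)             # prints 6.95 1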
Example. A company is assessing the effect of different incentive compensation plans on
the performance of data-input operators. Three plans were recently implemented at three
different locations: A, B, and C. Prior to the implementation of the plans, all three
locations had similar levels of operator performance. Operator performance is
unsatisfactory if more than .5% of the input data needs to be recoded.
The accountant gathers performance data from the 200 operators at the three
locations, as follows:
              Recode ≤ .5%   Recode > .5%   Totals (ri)
Location A         62             11             73
Location B         46             20             66
Location C         42             19             61
Totals (cj)       150             50            200
The tabled values are frequencies, so, for example, of the 73 operators at location A, 62
performed at an acceptable level and 11 did not (more than .5% of the data had to be
recoded). Is there evidence that the compensation plans are associated with different
levels of data-entry performance?
The chi-square contingency table test is suitable to employ to answer this question. The
table below incorporates the expected frequencies for all the cells (expected frequencies
are in parentheses). For example, the expected frequency E11 is equal to (r1*c1)/N =
(73)(150)/200 = 54.75.
              Recode ≤ .5%   Recode > .5%   Totals (ri)
Location A     62 (54.75)     11 (18.25)        73
Location B     46 (49.50)     20 (16.50)        66
Location C     42 (45.75)     19 (15.25)        61
Totals (cj)       150             50            200
A quick comparison of the observed and expected frequencies indicates that the
compensation plan at location A may be associated with better performance than at the
other locations, because there were relatively fewer instances of data requiring recoding
at that location. To determine whether any significant differences exist, we compute the
chi-square statistic, as follows:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(n_{ij} - E_{ij})^2}{E_{ij}} = \frac{(62 - 54.75)^2}{54.75} + \frac{(11 - 18.25)^2}{18.25} + \cdots + \frac{(19 - 15.25)^2}{15.25} = 6.06.$$
The statistic has df equal to (r-1)(c-1) = (3-1)(2-1) = 2. The chi-square distribution table
shows that a statistic this large has less than a .05 probability of occurring if the null
hypothesis (of no differences across the locations) were true. We therefore reject the null
hypothesis and conclude that the locations have different levels of operator performance.
The data indicate that the operators at location A performed better than operators at the
other locations, so we conclude that the compensation plan used at location A was
associated with better performance. (Note that the significant overall test indicates only
that there are significant differences; it does not indicate specifically where (i.e., in which
cells) the significant differences are. There are follow-up tests that can be used to
determine where these differences are, but we will not illustrate those procedures. If you
are interested, refer to a nonparametric statistics book.)
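For those who prefer to script the test, here is a sketch of the same computation using
scipy (assuming numpy and scipy are available; note that chi2_contingency applies no
continuity correction to tables larger than 2 x 2):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[62, 11],    # location A
                  [46, 20],    # location B
                  [42, 19]])   # location C

chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), dof, round(p, 3))   # prints 6.06 2 0.048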
IV. z-scores
Purpose. z-scores are useful for measuring the relative location of an observation in a
frequency or probability distribution. z-scores, sometimes called standardized scores,
measure the number of standard deviations an observation is from the mean of the
distribution. If the distribution is approximately normal, one can refer to standard tables
to make probability statements about where the observation is located (see example
below).
Method. The standard deviation, s, for a sample is computed as

$$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}},$$

where Xi is an individual observation, X̄ is the sample mean, and n is the number of
observations in the sample. (For a population, one would divide by n instead of n-1.)
This measure of the “spread” of a distribution can be used to assess
whether the same process that generated the sample or population also generated a given
observation of interest. (Or, alternatively, whether a given observation seems to differ
from the sampled observations or population). A z-score for a given observation, X, is
calculated as follows:
$$z = \frac{X - \bar{X}}{s}.$$
For most bell-shaped distributions (of which the normal distribution is an example), with
n > 30 or so, the interval (X̄ ± s), which corresponds to z-scores up to ±1.00, contains
approximately 68% of the observations; the interval (X̄ ± 2s) contains approximately
95% of the observations; and the interval (X̄ ± 3s) contains over 99% of the observations.
(These percentages can be refined, and other intervals developed, by referring to a
standard normal distribution table. For example, the table at the end of this document
shows that for 1 standard deviation (i.e., when z = 1.00), .3413 of the observations will
lie between X̄ (0 in the table) and X̄ + s, and .3413 of the observations will lie between
X̄ and X̄ – s; thus, .3413 + .3413 = .6826 (i.e., approximately 68%) of the observations
are expected to lie within one standard deviation of the mean.)
Example. Assume that the industry mean and standard deviation for inventory turnover
ratio are 6.5 and 1.0, respectively. If a company’s inventory turnover ratio is 8.5, the
z-score is (8.5 – 6.5) / 1.0 = 2.00. If the ratios are approximately normally distributed, the
accountant could state that the probability of obtaining an inventory turnover ratio of this
magnitude (i.e., 2 standard deviations from the mean) or greater is approximately 2.28%.
The accountant might conclude, with an observation this extreme, that the company
differs in some way from the average company in the industry.
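In Python, the example reduces to two lines; this sketch uses the exact normal
distribution function (math.erfc) in place of the table:

import math

z = (8.5 - 6.5) / 1.0                    # z-score for the company's ratio
p = 0.5 * math.erfc(z / math.sqrt(2))    # upper-tail area beyond z
print(z, round(p, 4))                    # prints 2.0 0.0228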
V. The t-test
Purpose. The t-test is used to evaluate differences in means from two groups or
populations. It can be used on very small sample sizes (e.g., as small as 10 or, according
to some, even smaller), although larger sample sizes are preferred. It is assumed that
each of the two underlying populations is normally distributed. Further, each of the
underlying populations is assumed to have the same variance (the variance is equal to the
square of the standard deviation); this is the homogeneity of variance assumption.
Violations of the normality assumption are generally not important, but violations of the
homogeneity of variance assumption can be problematic, unless the two sample sizes are
equal. (As discussed below, the Excel function that performs the t-test has an option that
appropriately “corrects” for lack of homogeneity of variance and can be used when the
variances are unequal.)
Method. With the t-test, the accountant tests the null hypothesis that the two means are
the same. The t statistic is computed as follows:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},$$
where X̄1 and X̄2 are the means of the samples from population (or group) 1 and 2,
respectively, s1² and s2² are the sample variances (i.e., squared standard deviations) from
population 1 and 2, respectively, which serve as estimates of the common population
variance, and n1 and n2 are the sizes of the samples. This statistic is distributed as t with
n1 + n2 – 2 degrees of freedom (df).
The t distribution is similar to the normal distribution (i.e., symmetrical and bell-shaped),
except that it has thicker tails (unless the number of degrees of freedom is large).
With infinite degrees of freedom, the t and normal distributions are identical. Similar to
the normal distribution, we can refer to tables to determine the probabilities associated
with obtaining various magnitudes of t under the assumption that there is no difference
between the means (i.e., the null hypothesis). Refer to the table at the end of this
document. To reject the null hypothesis of no difference between means at p < .05 for a
two-sided test with 40 df, we would need a t statistic of at least 2.021. (This compares
with the value of 1.96 for the normal distribution.)
Instead of computing means and standard deviations, using the above formula,
and looking up probability values in tables, you can use Excel to perform the t-test. The
function “TTEST” returns the probability value. With this function, you can specify a
one- or two-tailed test (the third argument) and whether the variances are assumed to be
homogeneous or not (the fourth argument: 2 = homogeneous, 3 = heterogeneous). If you
want to calculate the actual t statistic, use the TINV function.
Example. The internal auditor for E-MFE suspects that the 25 northern Montreal delivery
outlets have different levels of delivery expenses than the 25 southern outlets. Before she
modifies her planning regression model, she wishes to verify that there is a difference.
                      NORTHERN OUTLETS   SOUTHERN OUTLETS
                            3196               3303
                            3136               3055
                            3165               3225
                            3464               3134
                            3342               3215
                            3153               2959
                            3168               3541
                            3531               3338
                            2957               3305
                            3451               3381
                            3108               3146
                            3117               2876
                            3376               3127
                            3338               3198
                            3509               3250
                            3163               3180
                            2987               2836
                            3391               3205
                            3140               3170
                            3310               3187
                            3294               3088
                            3358               3110
                            3255               2986
                            3399               3244
                            3120               3104

mean                       3257.12            3166.52
standard deviation          156.93             154.56
Do these delivery-expense data indicate that northern outlets and southern outlets have
different levels of expense?
The standard deviations are close in magnitude, so we can assume that the homogeneity
of variance assumption is satisfied. (Because the sample sizes are equal, unequal
variances are not a major concern anyway.) We compute the t statistic, as follows:
$$t = \frac{3257.12 - 3166.52}{\sqrt{\dfrac{(25 - 1)(156.93)^2 + (25 - 1)(154.56)^2}{25 + 25 - 2}\left(\dfrac{1}{25} + \dfrac{1}{25}\right)}} = 2.057.$$
There are 25 + 25 – 2 = 48 df associated with the statistic. Because the internal auditor
does not have reason to expect the northern or southern outlets to have the higher
expenses, a two-tailed test should be employed. The table indicates that there is a
probability of less than .05 (i.e., 2 × .025) that we would observe a t with an absolute
value of 2.011 or greater if the means were really the same. Thus, because our observed t
of 2.057 is greater than the
tabled value, we reject the null hypothesis and conclude that the evidence indicates that
the northern outlets have different levels of delivery expense than the southern outlets.
Specifically, the northern outlets’ expenses are higher.
If we were to use Excel, we could obtain the probability directly by using the following
function: TTEST(B2:D10,F2:H10,2,2). This assumes that the northern outlets’ expenses
are in cells B2:D10, the southern outlets’ are in cells F2:H10, a two-tailed test is used
(the “2” as the third argument), and the variances are assumed equal (the “2” as the
fourth argument). The
TTEST function returns a value of .04518 (which is consistent with the probability we
obtained from the table). If we wanted to obtain the t statistic, we could use the function
TINV(.04518,48); the first argument is the two-tailed probability value and the second is
the number of df. The TINV function returns a value of 2.0566, which is the same as
calculated by the formula above.
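The same computation can be scripted outside Excel. Here is a sketch that assumes the
scipy package is available; its ttest_ind function with equal_var=True implements the
pooled-variance formula above and returns the two-tailed p.

from scipy.stats import ttest_ind

north = [3196, 3136, 3165, 3464, 3342, 3153, 3168, 3531, 2957, 3451,
         3108, 3117, 3376, 3338, 3509, 3163, 2987, 3391, 3140, 3310,
         3294, 3358, 3255, 3399, 3120]
south = [3303, 3055, 3225, 3134, 3215, 2959, 3541, 3338, 3305, 3381,
         3146, 2876, 3127, 3198, 3250, 3180, 2836, 3205, 3170, 3187,
         3088, 3110, 2986, 3244, 3104]

t, p = ttest_ind(north, south, equal_var=True)
print(round(t, 4), round(p, 5))   # approximately 2.0566 and .04518, as in the text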
Table: Chi-Square Probabilities
Locate the appropriate degrees of freedom in the left column. The top row gives the
probability under the null hypothesis that the observed statistic is greater than or equal to
the value in the body of the table.
df      0.995    0.99     0.975    0.95     0.90     0.10     0.05     0.025    0.01     0.005
1       ---      ---      0.001    0.004    0.016    2.706    3.841    5.024    6.635    7.879
2       0.010    0.020    0.051    0.103    0.211    4.605    5.991    7.378    9.210    10.597
3       0.072    0.115    0.216    0.352    0.584    6.251    7.815    9.348    11.345   12.838
4       0.207    0.297    0.484    0.711    1.064    7.779    9.488    11.143   13.277   14.860
5       0.412    0.554    0.831    1.145    1.610    9.236    11.070   12.833   15.086   16.750
6       0.676    0.872    1.237    1.635    2.204    10.645   12.592   14.449   16.812   18.548
7       0.989    1.239    1.690    2.167    2.833    12.017   14.067   16.013   18.475   20.278
8       1.344    1.646    2.180    2.733    3.490    13.362   15.507   17.535   20.090   21.955
9       1.735    2.088    2.700    3.325    4.168    14.684   16.919   19.023   21.666   23.589
10      2.156    2.558    3.247    3.940    4.865    15.987   18.307   20.483   23.209   25.188
11      2.603    3.053    3.816    4.575    5.578    17.275   19.675   21.920   24.725   26.757
12      3.074    3.571    4.404    5.226    6.304    18.549   21.026   23.337   26.217   28.300
13      3.565    4.107    5.009    5.892    7.042    19.812   22.362   24.736   27.688   29.819
14      4.075    4.660    5.629    6.571    7.790    21.064   23.685   26.119   29.141   31.319
15      4.601    5.229    6.262    7.261    8.547    22.307   24.996   27.488   30.578   32.801
16      5.142    5.812    6.908    7.962    9.312    23.542   26.296   28.845   32.000   34.267
17      5.697    6.408    7.564    8.672    10.085   24.769   27.587   30.191   33.409   35.718
18      6.265    7.015    8.231    9.390    10.865   25.989   28.869   31.526   34.805   37.156
19      6.844    7.633    8.907    10.117   11.651   27.204   30.144   32.852   36.191   38.582
20      7.434    8.260    9.591    10.851   12.443   28.412   31.410   34.170   37.566   39.997
21      8.034    8.897    10.283   11.591   13.240   29.615   32.671   35.479   38.932   41.401
22      8.643    9.542    10.982   12.338   14.041   30.813   33.924   36.781   40.289   42.796
23      9.260    10.196   11.689   13.091   14.848   32.007   35.172   38.076   41.638   44.181
24      9.886    10.856   12.401   13.848   15.659   33.196   36.415   39.364   42.980   45.559
25      10.520   11.524   13.120   14.611   16.473   34.382   37.652   40.646   44.314   46.928
26      11.160   12.198   13.844   15.379   17.292   35.563   38.885   41.923   45.642   48.290
27      11.808   12.879   14.573   16.151   18.114   36.741   40.113   43.195   46.963   49.645
28      12.461   13.565   15.308   16.928   18.939   37.916   41.337   44.461   48.278   50.993
29      13.121   14.256   16.047   17.708   19.768   39.087   42.557   45.722   49.588   52.336
30      13.787   14.953   16.791   18.493   20.599   40.256   43.773   46.979   50.892   53.672
40      20.707   22.164   24.433   26.509   29.051   51.805   55.758   59.342   63.691   66.766
50      27.991   29.707   32.357   34.764   37.689   63.167   67.505   71.420   76.154   79.490
60      35.534   37.485   40.482   43.188   46.459   74.397   79.082   83.298   88.379   91.952
70      43.275   45.442   48.758   51.739   55.329   85.527   90.531   95.023   100.425  104.215
80      51.172   53.540   57.153   60.391   64.278   96.578   101.879  106.629  112.329  116.321
90      59.196   61.754   65.647   69.126   73.291   107.565  113.145  118.136  124.116  128.299
100     67.328   70.065   74.222   77.929   82.358   118.498  124.342  129.561  135.807  140.169
Table: Areas under the Standard Normal Distribution
Locate the appropriate z-score by referring to the left column (for z to one decimal place)
and the top row (for the second decimal place). Tabled values represent the area under
the curve between 0 and the observed z. To find the area beyond the observed z, subtract
the tabled amount from .50. This area is the normally reported (one-tailed) p-value.
Since the distribution is symmetric, if the observed z-score is negative, use its absolute
value. For a two-tailed test, double the tabled values.
Area between 0 and z
z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Table: The t Distribution
To use the following tabled values, specify a value for α (the acceptable level
of significance; that is, the acceptable probability of rejecting the null
hypothesis of “equal means” when the null hypothesis is really true):
1. For a two-sided test, find the column corresponding to α/2 and reject
the null hypothesis if the absolute value of the test statistic is greater
than the value of tα/2 in the table below.
2. For a one-sided test, make sure that the means are in the hypothesized
direction. If so, find the column corresponding to α and reject the null
hypothesis if the absolute value of the test statistic is greater than the
tabled value.
Upper critical values of Student's t distribution with ν degrees of freedom
Probability of exceeding the critical value
ν       0.10    0.05    0.025   0.01    0.005   0.001
1       3.078   6.314   12.706  31.821  63.657  318.313
2       1.886   2.920   4.303   6.965   9.925   22.327
3       1.638   2.353   3.182   4.541   5.841   10.215
4       1.533   2.132   2.776   3.747   4.604   7.173
5       1.476   2.015   2.571   3.365   4.032   5.893
6       1.440   1.943   2.447   3.143   3.707   5.208
7       1.415   1.895   2.365   2.998   3.499   4.782
8       1.397   1.860   2.306   2.896   3.355   4.499
9       1.383   1.833   2.262   2.821   3.250   4.296
10      1.372   1.812   2.228   2.764   3.169   4.143
11      1.363   1.796   2.201   2.718   3.106   4.024
12      1.356   1.782   2.179   2.681   3.055   3.929
13      1.350   1.771   2.160   2.650   3.012   3.852
14      1.345   1.761   2.145   2.624   2.977   3.787
15      1.341   1.753   2.131   2.602   2.947   3.733
16      1.337   1.746   2.120   2.583   2.921   3.686
17      1.333   1.740   2.110   2.567   2.898   3.646
18      1.330   1.734   2.101   2.552   2.878   3.610
19      1.328   1.729   2.093   2.539   2.861   3.579
20      1.325   1.725   2.086   2.528   2.845   3.552
21      1.323   1.721   2.080   2.518   2.831   3.527
22      1.321   1.717   2.074   2.508   2.819   3.505
23      1.319   1.714   2.069   2.500   2.807   3.485
24      1.318   1.711   2.064   2.492   2.797   3.467
25      1.316   1.708   2.060   2.485   2.787   3.450
26      1.315   1.706   2.056   2.479   2.779   3.435
27      1.314   1.703   2.052   2.473   2.771   3.421
28      1.313   1.701   2.048   2.467   2.763   3.408
29      1.311   1.699   2.045   2.462   2.756   3.396
30      1.310   1.697   2.042   2.457   2.750   3.385
31      1.309   1.696   2.040   2.453   2.744   3.375
32      1.309   1.694   2.037   2.449   2.738   3.365
33      1.308   1.692   2.035   2.445   2.733   3.356
34      1.307   1.691   2.032   2.441   2.728   3.348
35      1.306   1.690   2.030   2.438   2.724   3.340
36      1.306   1.688   2.028   2.434   2.719   3.333
37      1.305   1.687   2.026   2.431   2.715   3.326
38      1.304   1.686   2.024   2.429   2.712   3.319
39      1.304   1.685   2.023   2.426   2.708   3.313
40      1.303   1.684   2.021   2.423   2.704   3.307
41      1.303   1.683   2.020   2.421   2.701   3.301
42      1.302   1.682   2.018   2.418   2.698   3.296
43      1.302   1.681   2.017   2.416   2.695   3.291
44      1.301   1.680   2.015   2.414   2.692   3.286
45      1.301   1.679   2.014   2.412   2.690   3.281
46      1.300   1.679   2.013   2.410   2.687   3.277
47      1.300   1.678   2.012   2.408   2.685   3.273
48      1.299   1.677   2.011   2.407   2.682   3.269
49      1.299   1.677   2.010   2.405   2.680   3.265
50      1.299   1.676   2.009   2.403   2.678   3.261
51      1.298   1.675   2.008   2.402   2.676   3.258
52      1.298   1.675   2.007   2.400   2.674   3.255
53      1.298   1.674   2.006   2.399   2.672   3.251
54      1.297   1.674   2.005   2.397   2.670   3.248
55      1.297   1.673   2.004   2.396   2.668   3.245
56      1.297   1.673   2.003   2.395   2.667   3.242
57      1.297   1.672   2.002   2.394   2.665   3.239
58      1.296   1.672   2.002   2.392   2.663   3.237
59      1.296   1.671   2.001   2.391   2.662   3.234
60      1.296   1.671   2.000   2.390   2.660   3.232
61      1.296   1.670   2.000   2.389   2.659   3.229
62      1.295   1.670   1.999   2.388   2.657   3.227
63      1.295   1.669   1.998   2.387   2.656   3.225
64      1.295   1.669   1.998   2.386   2.655   3.223
65      1.295   1.669   1.997   2.385   2.654   3.220
66      1.295   1.668   1.997   2.384   2.652   3.218
67      1.294   1.668   1.996   2.383   2.651   3.216
68      1.294   1.668   1.995   2.382   2.650   3.214
69      1.294   1.667   1.995   2.382   2.649   3.213
70      1.294   1.667   1.994   2.381   2.648   3.211
71      1.294   1.667   1.994   2.380   2.647   3.209
72      1.293   1.666   1.993   2.379   2.646   3.207
73      1.293   1.666   1.993   2.379   2.645   3.206
74      1.293   1.666   1.993   2.378   2.644   3.204
75      1.293   1.665   1.992   2.377   2.643   3.202
76      1.293   1.665   1.992   2.376   2.642   3.201
77      1.293   1.665   1.991   2.376   2.641   3.199
78      1.292   1.665   1.991   2.375   2.640   3.198
79      1.292   1.664   1.990   2.374   2.640   3.197
80      1.292   1.664   1.990   2.374   2.639   3.195
81      1.292   1.664   1.990   2.373   2.638   3.194
82      1.292   1.664   1.989   2.373   2.637   3.193
83      1.292   1.663   1.989   2.372   2.636   3.191
84      1.292   1.663   1.989   2.372   2.636   3.190
85      1.292   1.663   1.988   2.371   2.635   3.189
86      1.291   1.663   1.988   2.370   2.634   3.188
87      1.291   1.663   1.988   2.370   2.634   3.187
88      1.291   1.662   1.987   2.369   2.633   3.185
89      1.291   1.662   1.987   2.369   2.632   3.184
90      1.291   1.662   1.987   2.368   2.632   3.183
91      1.291   1.662   1.986   2.368   2.631   3.182
92      1.291   1.662   1.986   2.368   2.630   3.181
93      1.291   1.661   1.986   2.367   2.630   3.180
94      1.291   1.661   1.986   2.367   2.629   3.179
95      1.291   1.661   1.985   2.366   2.629   3.178
96      1.290   1.661   1.985   2.366   2.628   3.177
97      1.290   1.661   1.985   2.365   2.627   3.176
98      1.290   1.661   1.984   2.365   2.627   3.175
99      1.290   1.660   1.984   2.365   2.626   3.175
100     1.290   1.660   1.984   2.364   2.626   3.174
∞       1.282   1.645   1.960   2.326   2.576   3.090
© 2006 by Eric E. Spires. This document was prepared for use in AMIS 822 at The Ohio State University.
If you have comments or questions, please contact the author at [email protected]. A primary source for
the descriptions of some of the statistics was Nonparametric Statistics for the Behavioral Sciences, 2nd ed.,
by S. Siegel and N. J. Castellan (McGraw-Hill, 1988).