Hypothesis Testing
COMP 245 STATISTICS
Dr N A Heard

Contents

1 Hypothesis Testing
  1.1 Introduction
  1.2 Error Rates and Power of a Test
2 Testing for a population mean
  2.1 Normal Distribution with Known Variance
    2.1.1 Duality with Confidence Intervals
  2.2 Normal Distribution with Unknown Variance
3 Testing for differences in population means
  3.1 Two Sample Problems
  3.2 Normal Distributions with Known Variances
  3.3 Normal Distributions with Unknown Variances
4 Goodness of Fit
  4.1 Count Data and Chi-Square Tests
  4.2 Proportions
  4.3 Model Checking
  4.4 Independence

1 Hypothesis Testing

1.1 Introduction

Hypotheses

Suppose we are going to obtain a random i.i.d. sample $X = (X_1, \ldots, X_n)$ of a random variable $X$ with an unknown distribution $P_X$. To proceed with modelling the underlying population, we might hypothesise probability models for $P_X$ and then test whether such hypotheses seem plausible in light of the realised data $x = (x_1, \ldots, x_n)$.

Or, more specifically, we might fix upon a parametric family $P_{X|\theta}$ with unknown parameter $\theta$ and then hypothesise values for $\theta$. Generically let $\theta_0$ denote a hypothesised value for $\theta$. Then after observing the data, we wish to test whether we can indeed reasonably assume $\theta = \theta_0$.
For example, if $X \sim N(\mu, \sigma^2)$ we may wish to test whether $\mu = 0$ is plausible in light of the data $x$.

Formally, we define a null hypothesis $H_0$ as our hypothesised model of interest, and also specify an alternative hypothesis $H_1$ of rival models against which we wish to test $H_0$.

Most often we simply test $H_0: \theta = \theta_0$ against $H_1: \theta \neq \theta_0$. This is known as a two-sided test. In some situations it may be more appropriate to consider alternatives of the form $H_1: \theta > \theta_0$ or $H_1: \theta < \theta_0$, known as one-sided tests.

Rejection Region for a Test Statistic

To test the validity of $H_0$, we first choose a test statistic $T(X)$ of the data for which we can find the distribution, $P_T$, under $H_0$.

Then, we identify a rejection region $R \subset \mathbb{R}$ of low probability values of $T$ under the assumption that $H_0$ is true, so

\[ P(T \in R \mid H_0) = \alpha \]

for some small probability $\alpha$ (typically 5%).

A well chosen rejection region will have relatively high probability under $H_1$, whilst retaining low probability under $H_0$.

Finally, we calculate the observed test statistic $t(x)$ for our observed data $x$. If $t \in R$ we "reject the null hypothesis at the $100\alpha\%$ level".

p-Values

For each possible significance level $\alpha \in (0, 1)$, a hypothesis test at the $100\alpha\%$ level will result in either rejecting or not rejecting $H_0$.

As $\alpha \to 0$ it becomes less and less likely that the null hypothesis will be rejected, as the rejection region is becoming smaller and smaller. Similarly, as $\alpha \to 1$ it becomes more and more likely that the null hypothesis will be rejected.

For a given data set and resulting test statistic, we might, therefore, be interested in identifying the critical significance level which marks the threshold between us rejecting and not rejecting the null hypothesis. This is known as the p-value of the data. Smaller p-values suggest stronger evidence against $H_0$.

1.2 Error Rates and Power of a Test

Test Errors

There are two types of error in the outcome of a hypothesis test:

• Type I: Rejecting $H_0$ when in fact $H_0$ is true.
By construction, this happens with probability $\alpha$. For this reason, the significance level of a hypothesis test is also referred to as the Type I error rate.
• Type II: Not rejecting $H_0$ when in fact $H_1$ is true. The probability with which this type of error will occur depends on the unknown true value of $\theta$, so to calculate values we plug in a plausible alternative value $\theta_1 \neq \theta_0$ for $\theta$, and let $\beta = P(T \notin R \mid \theta = \theta_1)$ be the probability of a Type II error.

Power

We define the power of a hypothesis test by

\[ 1 - \beta = P(T \in R \mid \theta = \theta_1). \]

For a fixed significance level $\alpha$, a well chosen test statistic $T$ and rejection region $R$ will have high power; that is, they will maximise the probability of rejecting the null hypothesis when the alternative is true.

2 Testing for a population mean

2.1 Normal Distribution with Known Variance

$N(\mu, \sigma^2)$ - $\sigma^2$ Known

Suppose $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ with $\sigma^2$ known and $\mu$ unknown.

We may wish to test if $\mu = \mu_0$ for some specific value $\mu_0$ (e.g. $\mu_0 = 0$, $\mu_0 = 9.8$). Then we can state our null and alternative hypotheses as

\[ H_0: \mu = \mu_0; \qquad H_1: \mu \neq \mu_0. \]

Under $H_0: \mu = \mu_0$, we then know both $\mu$ and $\sigma^2$. So for the sample mean $\bar{X}$ we have a known distribution for the test statistic

\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim \Phi. \]

So if we define our rejection region $R$ to be the $100\alpha\%$ tails of the standard normal distribution,

\[ R = \left(-\infty, -z_{1-\frac{\alpha}{2}}\right) \cup \left(z_{1-\frac{\alpha}{2}}, \infty\right) \equiv \left\{ z \mid |z| > z_{1-\frac{\alpha}{2}} \right\}, \]

we have $P(Z \in R) = \alpha$ under $H_0$.

We thus reject $H_0$ at the $100\alpha\%$ significance level $\iff$ our observed test statistic

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \in R. \]

The p-value is given by $2 \times \{1 - \Phi(|z|)\}$.

[Figure: Ex. 5% rejection region $R$ for a $N(0, 1)$ statistic; density $\phi(z)$ plotted against $z$.]

2.1.1 Duality with Confidence Intervals

There is a strong connection in this context between hypothesis testing and confidence intervals.

Suppose we have constructed a $100(1 - \alpha)\%$ confidence interval for $\mu$.
Then this is precisely the set of values $\{\mu_0\}$ for which there would be insufficient evidence to reject a null hypothesis $H_0: \mu = \mu_0$ at the $100\alpha\%$ level.

Example

A company makes packets of snack foods. The bags are labelled as weighing 454g; of course they won't all be exactly 454g, and let's suppose the variance of bag weights is known to be 70g².

The following data show the mass in grams of 50 randomly sampled packets.

464, 450, 450, 456, 452, 433, 446, 446, 450, 447,
442, 438, 452, 447, 460, 450, 453, 456, 446, 433,
448, 450, 439, 452, 459, 454, 456, 454, 452, 449,
463, 449, 447, 466, 446, 447, 450, 449, 457, 464,
468, 447, 433, 464, 469, 457, 454, 451, 453, 443

Are these data consistent with the claim that the mean weight of packets is 454g?

1. We wish to test $H_0: \mu = 454$ vs. $H_1: \mu \neq 454$. So set $\mu_0 = 454$.

2. Although we have not been told that the packet weights are individually normally distributed, by the CLT we still have that the mean weight of the sample of packets is approximately normally distributed, and hence we still approximately have $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim \Phi$.

3. $\bar{x} = 451.22$ and $n = 50 \Rightarrow z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = -2.350$.

4. For a 5%-level significance test, we compare the statistic $z = -2.350$ with the rejection region $R = (-\infty, -z_{0.975}) \cup (z_{0.975}, \infty) = (-\infty, -1.96) \cup (1.96, \infty)$. Clearly we have $z \in R$, and so at the 5%-level we reject the null hypothesis that the mean packet weight is 454g.

5. At which significance levels would we have not rejected the null hypothesis?
   • For a 1%-level significance test, the rejection region would have been $R = (-\infty, -z_{0.995}) \cup (z_{0.995}, \infty) = (-\infty, -2.576) \cup (2.576, \infty)$. In which case $z \notin R$, and so at the 1%-level we would not have rejected the null hypothesis.
   • The p-value is $2 \times \{1 - \Phi(|z|)\} = 2 \times \{1 - \Phi(|-2.350|)\} \approx 2(1 - 0.9906) = 0.019$, and so we would only reject the null hypothesis for $\alpha > 1.9\%$.
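The steps of the snack food example can be retraced numerically. The following is a minimal sketch using only the Python standard library; the function names `z_test` and `confidence_interval` are illustrative choices, not part of the notes. It also checks the duality of section 2.1.1: $H_0$ is rejected at level $\alpha$ exactly when $\mu_0 = 454$ falls outside the $100(1-\alpha)\%$ confidence interval.

```python
from math import sqrt
from statistics import NormalDist, mean

# Packet weights in grams, copied from the example above.
weights = [
    464, 450, 450, 456, 452, 433, 446, 446, 450, 447,
    442, 438, 452, 447, 460, 450, 453, 456, 446, 433,
    448, 450, 439, 452, 459, 454, 456, 454, 452, 449,
    463, 449, 447, 466, 446, 447, 450, 449, 457, 464,
    468, 447, 433, 464, 469, 457, 454, 451, 453, 443,
]


def z_test(sample, mu0, sigma2):
    """Two-sided z-test of H0: mu = mu0 when the variance sigma2 is known."""
    n = len(sample)
    standard_error = sqrt(sigma2 / n)
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def confidence_interval(sample, sigma2, alpha):
    """100(1 - alpha)% confidence interval for mu when sigma2 is known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(sigma2 / len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


z, p_value = z_test(weights, mu0=454, sigma2=70)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")

for alpha in (0.05, 0.01):
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    lo, hi = confidence_interval(weights, 70, alpha)
    rejected = abs(z) > critical
    outside = not (lo <= 454 <= hi)
    # Duality: the two booleans agree at every level alpha.
    print(f"alpha = {alpha}: reject H0 = {rejected}, 454 outside CI = {outside}")
```

The sketch relies on the CLT approximation exactly as the example does; nothing in it checks normality of the individual weights.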
2.2 Normal Distribution with Unknown Variance

$N(\mu, \sigma^2)$ - $\sigma^2$ Unknown

Similarly, if $\sigma^2$ in the above example were unknown, we still have that

\[ T = \frac{\bar{X} - \mu_0}{s_{n-1}/\sqrt{n}} \sim t_{n-1}. \]

So for a test of $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ at the $\alpha$ level, the rejection region of our observed test statistic $t = \frac{\bar{x} - \mu_0}{s_{n-1}/\sqrt{n}}$ is

\[ R = \left(-\infty, -t_{n-1,1-\frac{\alpha}{2}}\right) \cup \left(t_{n-1,1-\frac{\alpha}{2}}, \infty\right) \equiv \left\{ t \mid |t| > t_{n-1,1-\frac{\alpha}{2}} \right\}. \]

[Figure: Ex. 5% rejection region $R$ for a $t_5$ statistic; density $f(t)$ plotted against $t$.]

Example 1

Consider again the snack food weights example. There, we assumed the variance of bag weights was known to be 70. Without this, we could have estimated the variance by

\[ s_{n-1}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 70.502. \]

Then the corresponding t-statistic becomes

\[ t = \frac{\bar{x} - \mu_0}{s_{n-1}/\sqrt{n}} = -2.341, \]

very similar to the z-statistic of before.

And since $n = 50$, we compare with the $t_{49}$ distribution which is approximately $N(0, 1)$. So the hypothesis test results and p-value would be practically identical.

Example 2

A particular piece of code takes a random time to run on a computer, but the average time is known to be 6 seconds. The programmer tries an alternative optimisation in compilation and wishes to know whether the mean run time has changed. To explore this, he runs the reoptimised code 16 times, obtaining a sample mean run time of 5.8 seconds and bias-corrected sample standard deviation of 1.2 seconds.

Is the code any faster?

1. We wish to test $H_0: \mu = 6$ vs. $H_1: \mu \neq 6$. So set $\mu_0 = 6$.

2. Assuming the run times are approximately normal, $T = \frac{\bar{X} - \mu_0}{s_{n-1}/\sqrt{n}} \sim t_{n-1}$. That is, $\frac{\bar{X} - 6}{s_{n-1}/\sqrt{16}} \sim t_{15}$. So we reject $H_0$ at the $100\alpha\%$ level if $|T| > t_{15,1-\alpha/2}$.

3. $\bar{x} = 5.8$, $s_{n-1} = 1.2$ and $n = 16 \Rightarrow t = \frac{\bar{x} - \mu_0}{s_{n-1}/\sqrt{n}} = -\frac{2}{3}$.

4. We have $|t| = 2/3 \ll 2.13 = t_{15,0.975}$, so we have insufficient evidence to reject $H_0$ at the 5% level.

5. In fact, the p-value for these data is 51.51%, so there is very little evidence to suggest the code is now any faster.
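Example 2 works from summary statistics alone, so it is easy to reproduce. The sketch below is a hedged illustration: `one_sample_t` is a hypothetical helper name, and the critical value is simply the $t_{15,0.975} = 2.13$ quoted in step 4 rather than something computed, because the Python standard library has no t-distribution.

```python
from math import sqrt


def one_sample_t(xbar, s, n, mu0):
    """t-statistic for H0: mu = mu0, given the sample mean xbar and the
    bias-corrected sample standard deviation s from n observations."""
    return (xbar - mu0) / (s / sqrt(n))


# Critical value t_{15, 0.975} as quoted in Example 2 (assumed, not computed).
T_15_0975 = 2.13

t = one_sample_t(xbar=5.8, s=1.2, n=16, mu0=6.0)
reject = abs(t) > T_15_0975
print(f"t = {t:.4f}, reject H0 at the 5% level: {reject}")
```

If SciPy happens to be installed, `2 * scipy.stats.t.sf(abs(t), df=15)` would be one way to obtain the two-sided p-value directly instead of relying on a tabulated quantile.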
3 Testing for differences in population means

3.1 Two Sample Problems

Samples from 2 Populations

Suppose, as before, we have a random sample $X = (X_1, \ldots, X_{n_1})$ from an unknown population distribution $P_X$.

But now, suppose we have a further random sample $Y = (Y_1, \ldots, Y_{n_2})$ from a second, different population $P_Y$.

Then we may wish to test hypotheses concerning the similarity of the two distributions $P_X$ and $P_Y$.

In particular, we are often interested in testing whether $P_X$ and $P_Y$ have equal means. That is, to test

\[ H_0: \mu_X = \mu_Y \quad \text{vs} \quad H_1: \mu_X \neq \mu_Y. \]

Paired Data

A special case is when the two samples $X$ and $Y$ are paired. That is, if $n_1 = n_2 = n$ and the data are collected as pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ so that, for each $i$, $X_i$ and $Y_i$ are possibly dependent.

For example, we might have a random sample of $n$ individuals and $X_i$ represents the heart rate of the $i$th person before light exercise and $Y_i$ the heart rate of the same person afterwards.

In this special case, for a test of equal means we can consider the sample of differences $Z_1 = X_1 - Y_1, \ldots, Z_n = X_n - Y_n$ and test $H_0: \mu_Z = 0$ using the single sample methods we have seen.

In the above example, this would test whether light exercise causes a change in heart rate.

3.2 Normal Distributions with Known Variances

$N(\mu_X, \sigma_X^2)$, $N(\mu_Y, \sigma_Y^2)$

Suppose

• $X = (X_1, \ldots, X_{n_1})$ are i.i.d. $N(\mu_X, \sigma_X^2)$ with $\mu_X$ unknown;
• $Y = (Y_1, \ldots, Y_{n_2})$ are i.i.d. $N(\mu_Y, \sigma_Y^2)$ with $\mu_Y$ unknown;
• the two samples $X$ and $Y$ are independent.

Then we still have that, independently,

\[ \bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n_1}\right), \qquad \bar{Y} \sim N\left(\mu_Y, \frac{\sigma_Y^2}{n_2}\right). \]

From this it follows that the difference in sample means,

\[ \bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y, \frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}\right), \]

and hence

\[ \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2/n_1 + \sigma_Y^2/n_2}} \sim \Phi. \]

So under the null hypothesis $H_0: \mu_X = \mu_Y$, we have

\[ Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_X^2/n_1 + \sigma_Y^2/n_2}} \sim \Phi. \]
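To make the last formula concrete, here is a small sketch that evaluates $Z$ and a two-sided p-value from summary statistics. The notes give no data for this case, so the input numbers below are entirely hypothetical and chosen only to make the arithmetic easy to follow; `two_sample_z` is likewise an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, var_x, var_y, n1, n2):
    """Z statistic and two-sided p-value for H0: mu_X = mu_Y,
    assuming independent normal samples with known variances."""
    z = (xbar - ybar) / sqrt(var_x / n1 + var_y / n2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics (not taken from the notes).
z, p_value = two_sample_z(xbar=10.0, ybar=9.0, var_x=4.0, var_y=5.0, n1=16, n2=20)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

With these made-up inputs the two variance terms are each 0.25, so the standard error is $\sqrt{0.5}$ and $z = \sqrt{2}$, which falls short of the 5% critical value 1.96.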
$N(\mu_X, \sigma_X^2)$, $N(\mu_Y, \sigma_Y^2)$ - $\sigma_X^2$, $\sigma_Y^2$ Known

So if $\sigma_X^2$ and $\sigma_Y^2$ are known, we immediately have a test statistic

\[ z = \frac{\bar{x} - \bar{y}}{\sqrt{\sigma_X^2/n_1 + \sigma_Y^2/n_2}} \]

which we can compare against the quantiles of a standard normal.

That is,

\[ R = \left\{ z \mid |z| > z_{1-\frac{\alpha}{2}} \right\} \]

gives a rejection region for a hypothesis test of $H_0: \mu_X = \mu_Y$ vs. $H_1: \mu_X \neq \mu_Y$ at the $100\alpha\%$ level.

3.3 Normal Distributions with Unknown Variances

$N(\mu_X, \sigma^2)$, $N(\mu_Y, \sigma^2)$ - $\sigma^2$ Unknown

On the other hand, suppose $\sigma_X^2$ and $\sigma_Y^2$ are unknown.

Then if we know $\sigma_X^2 = \sigma_Y^2 = \sigma^2$ but $\sigma^2$ is unknown, we can still proceed.

We have

\[ \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim \Phi, \]

and so, under $H_0: \mu_X = \mu_Y$,

\[ \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim \Phi, \]

but with $\sigma$ unknown.

Pooled Estimate of Population Variance

We need an estimator for the variance using samples from two populations with different means. Just combining the samples together into one big sample would over-estimate the variance, since some of the variability in the samples would be due to the difference in $\mu_X$ and $\mu_Y$.

So we define the bias-corrected pooled sample variance

\[ S_{n_1+n_2-2}^2 = \frac{\sum_{i=1}^{n_1} (X_i - \bar{X})^2 + \sum_{i=1}^{n_2} (Y_i - \bar{Y})^2}{n_1 + n_2 - 2}, \]

which is an unbiased estimator for $\sigma^2$.

We can immediately see that $S_{n_1+n_2-2}^2$ is indeed an unbiased estimator of $\sigma^2$ by noting

\[ S_{n_1+n_2-2}^2 = \frac{n_1 - 1}{n_1 + n_2 - 2} S_{n_1-1}^2 + \frac{n_2 - 1}{n_1 + n_2 - 2} S_{n_2-1}^2. \]

That is, $S_{n_1+n_2-2}^2$ is a weighted average of the bias-corrected sample variances for the individual samples $x$ and $y$, which are both unbiased estimates for $\sigma^2$.

Then substituting $S_{n_1+n_2-2}$ in for $\sigma$ we get

\[ \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{S_{n_1+n_2-2}\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}, \]

and so, under $H_0: \mu_X = \mu_Y$,

\[ T = \frac{\bar{X} - \bar{Y}}{S_{n_1+n_2-2}\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}. \]

So we have a rejection region for a hypothesis test of $H_0: \mu_X = \mu_Y$ vs. $H_1: \mu_X \neq \mu_Y$ at the $100\alpha\%$ level given by

\[ R = \left\{ t \mid |t| > t_{n_1+n_2-2,1-\frac{\alpha}{2}} \right\}, \]

for the statistic

\[ t = \frac{\bar{x} - \bar{y}}{s_{n_1+n_2-2}\sqrt{1/n_1 + 1/n_2}}. \]

Example

The same piece of C code was repeatedly run after compilation under two different C compilers, and the run times under each compiler were recorded. The sample mean and bias-corrected sample variance for Compiler 1 were 114s and 310s² respectively, and the corresponding figures for Compiler 2 were 94s and 290s². Both sets of data were each based on 15 runs.

Suppose that Compiler 2 is a refined version of Compiler 1, and so if $\mu_1$, $\mu_2$ are the expected run times of the code under the two compilations, we might fairly assume $\mu_2 \leq \mu_1$.

Conduct a hypothesis test of $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 > \mu_2$ at the 5% level.

Until now we have mostly considered two-sided tests. That is, tests of the form $H_0: \theta = \theta_0$ vs $H_1: \theta \neq \theta_0$.

Here we need to consider one-sided tests, which differ by the alternative hypothesis being of the form $H_1: \theta < \theta_0$ or $H_1: \theta > \theta_0$.

This presents no extra methodological challenge and requires only a slight adjustment in the construction of the rejection region.

We still use the t-statistic

\[ t = \frac{\bar{x} - \bar{y}}{s_{n_1+n_2-2}\sqrt{1/n_1 + 1/n_2}}, \]

where $\bar{x}$, $\bar{y}$ are the sample mean run times under Compilers 1 and 2 respectively. But now the one-sided rejection region becomes

\[ R = \left\{ t \mid t > t_{n_1+n_2-2,1-\alpha} \right\}. \]

First calculating the bias-corrected pooled sample variance, we get

\[ s_{n_1+n_2-2}^2 = \frac{14 \times 310 + 14 \times 290}{28} = 300. \]

(Note that since the sample sizes $n_1$ and $n_2$ are equal, the pooled estimate of the variance is the average of the individual estimates.)

So

\[ t = \frac{\bar{x} - \bar{y}}{s_{n_1+n_2-2}\sqrt{1/n_1 + 1/n_2}} = \frac{114 - 94}{\sqrt{300}\sqrt{1/15 + 1/15}} = \sqrt{10} = 3.162. \]

For a 1-sided test we compare $t = 3.162$ with $t_{28,0.95} = 1.701$ and conclude that we reject the null hypothesis at the 5% level; the second compilation is significantly faster.
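The compiler example can be checked with a few lines of code. This is a sketch under two stated assumptions: following the worked calculation above, 310 and 290 are pooled as bias-corrected sample variances, and the critical value $t_{28,0.95} = 1.701$ is taken from the text rather than computed. The function names are illustrative.

```python
from math import sqrt


def pooled_variance(var_x, var_y, n1, n2):
    """Bias-corrected pooled sample variance from the two sample variances."""
    return ((n1 - 1) * var_x + (n2 - 1) * var_y) / (n1 + n2 - 2)


def pooled_t(xbar, ybar, var_x, var_y, n1, n2):
    """Two-sample t-statistic for H0: mu_X = mu_Y under equal variances."""
    s = sqrt(pooled_variance(var_x, var_y, n1, n2))
    return (xbar - ybar) / (s * sqrt(1 / n1 + 1 / n2))


# Critical value t_{28, 0.95} as quoted in the example (assumed, not computed).
T_28_095 = 1.701

s2 = pooled_variance(310, 290, 15, 15)
t = pooled_t(114, 94, 310, 290, 15, 15)
reject = t > T_28_095  # one-sided test of H1: mu_1 > mu_2
print(f"pooled variance = {s2}, t = {t:.3f}, reject H0: {reject}")
```

Only the upper tail is checked because the alternative is one-sided; a two-sided test would compare $|t|$ with $t_{28,0.975}$ instead.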
4 Goodness of Fit

4.1 Count Data and Chi-Square Tests

Count Data

The results in the previous sections relied upon the data being either normally distributed, or at least through the CLT having the sample mean being approximately normally distributed. Tests were then developed for making inference on population means under those assumptions. These tests were very much model-based.

Another important but very different problem concerns model checking, which can be addressed through a more general consideration of count data for simple (discrete and finite) distributions.

The following ideas can then be trivially extended to infinite range discrete and continuous r.v.s by binning observed samples into a finite collection of predefined intervals.

Samples from a Simple Random Variable

Let $X$ be a simple random variable taking values in the range $\{x_1, \ldots, x_k\}$, with probability mass function $p_j = P(X = x_j)$, $j = 1, \ldots, k$.

A random sample of size $n$ from the distribution of $X$ can be summarised by the observed frequency counts $O = (O_1, \ldots, O_k)$ at the points $x_1, \ldots, x_k$ (so $\sum_{j=1}^{k} O_j = n$).

Suppose it is hypothesised that the true pmf $\{p_j\}$ is from a particular parametric model $p_j = P(X = x_j \mid \theta)$, $j = 1, \ldots, k$ for some unknown parameter p-vector $\theta$.

To test this hypothesis about the model, we first need to estimate the unknown parameters $\theta$ so that we are able to calculate the distribution of any statistic under the null hypothesis $H_0: p_j = P(X = x_j \mid \theta)$, $j = 1, \ldots, k$. Let $\hat{\theta}$ be such an estimator, obtained using the sample $O$.

Then under $H_0$ we have estimated probabilities for the pmf $\hat{p}_j = P(X = x_j \mid \hat{\theta})$, $j = 1, \ldots, k$ and so we are able to calculate estimated expected frequency counts $E = (E_1, \ldots, E_k)$ by $E_j = n\hat{p}_j$. (Note again we have $\sum_{j=1}^{k} E_j = n$.)

We then seek to compare the observed frequencies with the expected frequencies to test for goodness of fit.

Chi-Square Test

To test $H_0: p_j = P(X = x_j \mid \theta)$ vs.
$H_1: 0 \leq p_j \leq 1$, $\sum_j p_j = 1$ we use the chi-square statistic

\[ X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}. \]

If $H_0$ were true, then the statistic $X^2$ would approximately follow a chi-square distribution with $\nu = k - p - 1$ degrees of freedom.

• $k$ is the number of values (categories) the simple r.v. $X$ can take.
• $p$ is the number of parameters being estimated ($\dim(\theta)$).
• For the approximation to be valid, we should have $E_j \geq 5$ for all $j$. This may require some merging of categories.

[Figure: $\chi^2_\nu$ pdf for $\nu = 1, 2, 3, 5, 10$; density $\chi^2_\nu(x)$ plotted against $x$.]

Rejection Region

Clearly larger values of $X^2$ correspond to larger deviations from the null hypothesis model. That is, if $X^2 = 0$ the observed counts exactly match those expected under $H_0$.

For this reason, we always perform a one-sided goodness of fit test using the $\chi^2$ statistic, looking only at the upper tail of the distribution.

Hence the rejection region for a goodness of fit hypothesis test at the $100\alpha\%$ level is given by

\[ R = \left\{ x^2 \mid x^2 > \chi^2_{k-p-1,1-\alpha} \right\}. \]

4.2 Proportions

Example

Each year, around 1.3 million people in the USA suffer adverse drug effects (ADEs). A study in the Journal of the American Medical Association (July 5, 1995) gave the causes of 95 ADEs below.

| Cause | Number of ADEs |
|---|---|
| Lack of knowledge of drug | 29 |
| Rule violation | 17 |
| Faulty dose checking | 13 |
| Slips | 9 |
| Other | 27 |

Test whether the true percentages of ADEs differ across the 5 causes.

Under the null hypothesis that the 5 causes are equally likely, we would have expected counts of $\frac{95}{5} = 19$ for each cause.

So our $\chi^2$ statistic becomes

\[ x^2 = \frac{(29 - 19)^2}{19} + \frac{(17 - 19)^2}{19} + \frac{(13 - 19)^2}{19} + \frac{(9 - 19)^2}{19} + \frac{(27 - 19)^2}{19} = \frac{100}{19} + \frac{4}{19} + \frac{36}{19} + \frac{100}{19} + \frac{64}{19} = \frac{304}{19} = 16. \]

We have not estimated any parameters from the data, so we compare $x^2$ with the quantiles of the $\chi^2_{5-1} = \chi^2_4$ distribution.
Well $16 > 9.49 = \chi^2_{4,0.95}$, so we reject the null hypothesis at the 5% level; we have reason to suppose that there is a difference in the true percentages across the different causes.

4.3 Model Checking

Example - Fitting a Poisson Distribution to Data

Recall the example from the Discrete Random Variables chapter, where the number of particles emitted by a radioactive substance which reached a Geiger counter was measured for 2608 time intervals, each of length 7.5 seconds.

We fitted a Poisson($\lambda$) distribution to the data by plugging in the sample mean number of counts (3.870) for the rate parameter $\lambda$. (Which we now know to be the MLE!)

| $x$ | $O(n_x)$ | $E(n_x)$ |
|---|---|---|
| 0 | 57 | 54.4 |
| 1 | 203 | 210.5 |
| 2 | 383 | 407.4 |
| 3 | 525 | 525.5 |
| 4 | 532 | 508.4 |
| 5 | 408 | 393.5 |
| 6 | 273 | 253.8 |
| 7 | 139 | 140.3 |
| 8 | 45 | 67.9 |
| 9 | 27 | 29.2 |
| ≥10 | 16 | 17.1 |

(O = Observed, E = Expected.)

Whilst the fitted Poisson(3.87) expected frequencies looked quite convincing to the eye, at that time we had no formal method of quantitatively assessing the fit. However, we now know how to proceed.

| $x$ | $O$ | $E$ | $O - E$ | $(O - E)^2 / E$ |
|---|---|---|---|---|
| 0 | 57 | 54.4 | 2.6 | 0.124 |
| 1 | 203 | 210.5 | -7.5 | 0.267 |
| 2 | 383 | 407.4 | -24.4 | 1.461 |
| 3 | 525 | 525.5 | -0.5 | 0.000 |
| 4 | 532 | 508.4 | 23.6 | 1.096 |
| 5 | 408 | 393.5 | 14.5 | 0.534 |
| 6 | 273 | 253.8 | 19.2 | 1.452 |
| 7 | 139 | 140.3 | -1.3 | 0.012 |
| 8 | 45 | 67.9 | -22.9 | 7.723 |
| 9 | 27 | 29.2 | -2.2 | 0.166 |
| ≥10 | 16 | 17.1 | -1.1 | 0.071 |

The statistic $x^2 = \sum \frac{(O - E)^2}{E} = 12.906$ should be compared with a $\chi^2_{11-1-1} = \chi^2_9$ distribution: there are 11 categories and one estimated parameter, $\lambda$.

Well $\chi^2_{9,0.95} = 16.91$, so at the 5% level we do not reject the null hypothesis of a Poisson(3.87) model for the data.

4.4 Independence

Contingency Tables

Suppose we have two simple random variables $X$ and $Y$ which are jointly distributed with unknown probability mass function $p_{XY}$.

We are often interested in trying to ascertain whether $X$ and $Y$ are independent. That is, determine whether $p_{XY}(x, y) = p_X(x)\,p_Y(y)$.

Let the ranges of the r.v.s $X$ and $Y$ be $\{x_1, \ldots, x_k\}$ and $\{y_1, \ldots, y_\ell\}$ respectively.

Then an i.i.d.
sample of size $n$ from the joint distribution of $(X, Y)$ can be represented by a list of counts $n_{ij}$ ($1 \leq i \leq k$; $1 \leq j \leq \ell$) of the number of times we observe the pair $(x_i, y_j)$.

Tabulating these data in the following way gives what is known as a $k \times \ell$ contingency table.

| | $y_1$ | $y_2$ | ... | $y_\ell$ | |
|---|---|---|---|---|---|
| $x_1$ | $n_{11}$ | $n_{12}$ | ... | $n_{1\ell}$ | $n_{1\cdot}$ |
| $x_2$ | $n_{21}$ | $n_{22}$ | ... | $n_{2\ell}$ | $n_{2\cdot}$ |
| ... | | | | | |
| $x_k$ | $n_{k1}$ | $n_{k2}$ | ... | $n_{k\ell}$ | $n_{k\cdot}$ |
| | $n_{\cdot 1}$ | $n_{\cdot 2}$ | ... | $n_{\cdot \ell}$ | $n$ |

Note the row sums $(n_{1\cdot}, n_{2\cdot}, \ldots, n_{k\cdot})$ represent the frequencies of $x_1, x_2, \ldots, x_k$ in the sample (that is, ignoring the value of $Y$). Similarly for the column sums $(n_{\cdot 1}, n_{\cdot 2}, \ldots, n_{\cdot \ell})$ and $y_1, \ldots, y_\ell$.

Under the null hypothesis

$H_0$: $X$ and $Y$ are independent,

the expected values of the entries of the contingency table, conditional on the row and column sums, can be estimated by

\[ \hat{n}_{ij} = \frac{n_{i\cdot} \times n_{\cdot j}}{n}, \qquad 1 \leq i \leq k, \; 1 \leq j \leq \ell. \]

To see this, consider the marginal distribution of $X$; we could estimate $p_X(x_i)$ by $\hat{p}_{i\cdot} = \frac{n_{i\cdot}}{n}$. Similarly for $p_Y(y_j)$ we get $\hat{p}_{\cdot j} = \frac{n_{\cdot j}}{n}$.

Then under the null hypothesis of independence $p_{XY}(x_i, y_j) = p_X(x_i)\,p_Y(y_j)$, and so we can estimate $p_{XY}(x_i, y_j)$ by

\[ \hat{p}_{ij} = \hat{p}_{i\cdot} \times \hat{p}_{\cdot j} = \frac{n_{i\cdot} \times n_{\cdot j}}{n^2}. \]

Multiplying by $n$ gives the expected count $\hat{n}_{ij} = n\hat{p}_{ij}$ stated above.

Now that we have a set of expected frequencies to compare against our $k \times \ell$ observed frequencies, a $\chi^2$ test can be performed.

We are using both the row and column sums to estimate our probabilities, and there are $k$ and $\ell$ of these respectively. Since each set of estimated probabilities must sum to one, this amounts to $(k - 1) + (\ell - 1)$ free parameters. So we compare our calculated $x^2$ statistic against a $\chi^2$ distribution with

\[ k\ell - \{(k - 1) + (\ell - 1)\} - 1 = (k - 1)(\ell - 1) \]

degrees of freedom.

Hence the rejection region for a hypothesis test of independence in a $k \times \ell$ contingency table at the $100\alpha\%$ level is given by

\[ R = \left\{ x^2 \mid x^2 > \chi^2_{(k-1)(\ell-1),1-\alpha} \right\}. \]

Example

An article in International Journal of Sports Psychology (July-Sept 1990) evaluated the relationship between physical fitness and stress. 549 people were classified as good, average, or poor fitness, and were also tested for signs of stress (yes or no). The data are shown in the table below.
| | Stress | No stress | Total |
|---|---|---|---|
| Poor Fitness | 206 | 36 | 242 |
| Average Fitness | 184 | 28 | 212 |
| Good Fitness | 85 | 10 | 95 |
| Total | 475 | 74 | 549 |

Is there any relationship between stress and fitness?

Under independence we would estimate the expected values to be

| | Stress | No stress | Total |
|---|---|---|---|
| Poor Fitness | 209.4 | 32.6 | 242 |
| Average Fitness | 183.4 | 28.6 | 212 |
| Good Fitness | 82.2 | 12.8 | 95 |
| Total | 475 | 74 | 549 |

Hence the $\chi^2$ statistic is calculated to be

\[ X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(206 - 209.4)^2}{209.4} + \ldots + \frac{(10 - 12.8)^2}{12.8} = 1.1323. \]

This should be compared with a $\chi^2$ distribution with $(2 - 1) \times (3 - 1) = 2$ degrees of freedom.

$\chi^2_{2,0.95} = 5.99$, so we have no significant evidence to suggest there is any relationship between fitness and stress.
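The chi-square calculations of this section can be sketched in a few lines of Python. The helper names below are illustrative, and the critical value 5.99 is taken from the text. Because the code uses unrounded expected counts, its statistic for the fitness data comes out at about 1.13, marginally different from the hand calculation with counts rounded to one decimal place. The p-value line uses the fact that a $\chi^2$ distribution with 2 degrees of freedom has upper tail $e^{-x/2}$; that shortcut does not apply to other degrees of freedom.

```python
from math import exp


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_under_independence(table):
    """Expected counts n_i. * n_.j / n for a table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def flatten(table):
    """List the cells of a table row by row."""
    return [cell for row in table for cell in row]


# Rows: poor, average, good fitness; columns: stress, no stress.
observed = [[206, 36], [184, 28], [85, 10]]
expected = expected_under_independence(observed)

x2 = chi_square_statistic(flatten(observed), flatten(expected))
df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI2_2_095 = 5.99        # critical value quoted in the text (assumed)
p_value = exp(-x2 / 2)   # exact upper tail, valid only because df == 2
print(f"x2 = {x2:.4f}, df = {df}, reject H0: {x2 > CHI2_2_095}")

# The same statistic reproduces the proportions example of section 4.2.
x2_ade = chi_square_statistic([29, 17, 13, 9, 27], [19] * 5)
print(f"ADE example: x2 = {x2_ade:.1f}")
```

Keeping the statistic separate from the construction of expected counts lets the same function serve both the goodness of fit examples and the independence test; only the degrees of freedom change.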