Chapter 7  Estimation and testing

The researcher can never be certain that his observations are uncontaminated by error. No matter how careful one may be in planning and conducting a study, a multitude of influences, unintended and unwanted by the researcher, produces effects of unknown magnitude in the data. These unintended differences or biases make the interpretation of the results of research quite difficult and introduce into the interpretation a degree of uncertainty which cannot be eliminated. The answer should therefore be stated in probabilistic terms.

To estimate the value of a population parameter, one can use information from the sample in the form of an estimator. Estimators are calculated using information from the sample observations. An estimator is a rule, usually expressed as a formula, that tells us how to calculate an estimate based on the information in the sample. Estimators are used in two different ways:

Point estimator: based on sample data, a single number is calculated to estimate the population parameter.

Interval estimator: based on sample data, two numbers are calculated to form an interval within which the parameter is expected to lie with a certain probability.

The aim of statistical inference is to make certain determinations about the unknown parameters figuring in the underlying distribution. This is to be done on the basis of data, represented by the observed values of a random sample drawn from that distribution.

7.1  Sampling distributions

What exactly does the sample, an often tiny subset, tell us about the population? We can never observe the whole population, even if finite, except at enormous expense, so the population mean and variance, and indeed any aspect of the population distribution, can never be known exactly. We call these unknown population quantities parameters and use Greek letters to denote them: µ ('mu') is the symbol commonly used for the population mean and σ ('sigma') for the population standard deviation. As we have a sample of n observations we need to ask: is x̄ a 'good' estimate of µ? We know that for almost all samples x̄ is not equal to µ, but how close is it? Can we answer this question without knowing what µ is? Does x̄ get closer to µ as n increases?

We need to study the properties of the sample mean as an estimator of the population mean, and we achieve this by looking at the values x̄ can take over all possible samples: the sampling distribution. Of course we can never examine all possible samples, but the easy availability of a statistical package like S-Plus enables us to study sampling properties much more readily. We can actually illustrate the theoretical results of this chapter by conducting a simulation experiment.

A parameter is a numerical characteristic of the population of interest. Parameters are usually unknown and we make inferences about them using the sample data. (Examples: p, the probability of 'success', is a parameter of a Binomial population distribution. The rate of 'failure' λ is a parameter of a Poisson distribution and also of an exponential distribution of 'lifetimes', i.e. time to 'failure' in a Poisson process where 'failures' occur randomly in time.)

The population mean µ ('mu') is a common parameter. If the population is modelled or described by a p.d.f. (probability density function) f_X(x) for a continuous variable X, then

    µ = E[X] = ∫ x f_X(x) dx.

If however X is a discrete random variable with probability function (or p.d.f.) p_X(x) = P[X = x], then

    µ = E[X] = Σ_x x p_X(x).
Other parameters measuring location can be defined in terms of the c.d.f. F_X(x) = P[X ≤ x]: for example the population median M and the upper and lower quartiles Q3 and Q1 respectively. The population variance σ² ('sigma-squared') is a common parameter measuring variability:

    σ² = Var[X] = E[(X − µ)²] = E[X²] − µ² = ∫ x² f_X(x) dx − µ²

(for X continuous, and similarly in the discrete case).

An estimate is a statistic that we hope will be 'near' to the parameter of interest. (For example, x̄ is an estimate of µ.) An estimator is a rule for calculating an estimate from any sample, usually a random sample. A statistic is a random variable (r.v.) whose value is determined once the sample data have been observed. Thus an estimator is a r.v., but, in general, a r.v. need not estimate anything. We use upper case letters for r.v.s and the corresponding lower case letters for the values taken by the r.v.s. If X_i is the r.v. denoting the measurement of variable X on unit i in the sample (i = 1, 2, ..., n), then

    X̄ = (1/n) Σ_{i=1}^n X_i   and   S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)²

are the sample mean and variance respectively, considered as r.v.s.

The sampling distribution of a r.v. is the distribution of all possible values of the r.v. over all possible samples. The properties of the sampling distributions of these two r.v.s determine how we make inferences on the unknown µ (and σ²) from any sample. If the sample is a random sample of size n from an infinite population, then X_1, X_2, ..., X_n are independent r.v.s each with the same distribution (i.e. the same p.d.f. or probability function) as the population, so that

    E[X_i] = µ  and  Var[X_i] = σ²   (i = 1, 2, ..., n).

The main result of this section is the following theorem.

Theorem 7.1  Averaging over all random samples of size n from an arbitrary population with mean µ and variance σ², the sample mean X̄ and sample variance S² have the following three properties:

    E[X̄] = µ,          i.e. X̄ is an unbiased estimator of µ;
    Var[X̄] = σ²/n,     i.e. the variability of X̄ as an estimator decreases with n;
    E[S²] = σ²,         i.e. S² is an unbiased estimator of σ².

Thus S²/n is used as an unbiased estimator of the variability or variance of X̄ as an estimator of µ. It is vital in statistics to have such an estimate so that inferences using probability can be made.

The theorem shows that σ/√n is the standard deviation of the sampling distribution of X̄ as an estimator of µ. That is, σ/√n measures the variability of the possible estimates x̄ about the 'true' population mean µ. A sample estimate of this variability is s/√n, called the (estimated) standard error of the (sample) mean. As the sample size increases, but with the sample still random, the variability or uncertainty in our estimate of µ decreases monotonically with a limit of zero, i.e. knowledge without uncertainty as n → ∞. We can show, using some rather difficult results in probability theory, that X̄ → µ as n → ∞ with probability 1, with the intuitive interpretation that we are certain to arrive at the 'true' value as the sample size increases indefinitely. This is the Strong Law of Large Numbers and is not discussed further here. Theorem 5 allows us to prove (proof not required in this course) only that P[|X̄ − µ| < ε] → 1 as n → ∞. This means that the sequence of real numbers giving the probability that X̄ is within ε of µ has a limit of unity. This is called the Weak Law of Large Numbers.
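The simulation experiment mentioned above is easy to carry out. The following R sketch illustrates Theorem 7.1; the population (an exponential distribution with mean 2, so µ = 2 and σ² = 4), the sample size and the number of replications are arbitrary choices made purely for illustration.

```r
## Simulate the sampling distribution of X.bar and S^2 (Theorem 7.1).
set.seed(1)
n <- 25; B <- 10000                       # sample size and number of samples
xbars <- numeric(B); s2s <- numeric(B)
for (b in 1:B) {
  x <- rexp(n, rate = 1/2)                # one random sample; population mean 2, variance 4
  xbars[b] <- mean(x); s2s[b] <- var(x)   # its sample mean and sample variance
}
mean(xbars)   # close to mu = 2            (E[X.bar] = mu)
var(xbars)    # close to sigma^2/n = 0.16  (Var[X.bar] = sigma^2/n)
mean(s2s)     # close to sigma^2 = 4       (E[S^2] = sigma^2)
```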
Another important result for statistical inference with large sample sizes is the Central Limit Theorem, which says that as n → ∞ the sampling distribution of X̄ tends to a Normal distribution with the same mean and variance. We can demonstrate this empirically using R. The importance of this result is that we do not need to know the form or type of the original population distribution if our sample size is sufficiently large. We can instead use the Normal distribution for statistical inference, with the knowledge that the probabilities we calculate will be good approximations to the true (but generally unknown) probabilities. Recall the Normal approximations to the Binomial and Poisson distributions. See the following sections for tests of hypotheses and methods of estimation such as confidence intervals; these are the techniques statistical inference applies to real data.

Using the symbol '∼' to mean 'is distributed as', we write the Central Limit Theorem as

    X̄ ∼ N(µ, σ²/n)   approximately, for large n.

Then, using the properties of the Normal distribution, we can say that for large n

    P[ (X̄ − µ)/(σ/√n) > u ]

can be found approximately (using, say, NCST) for any specified value u, without knowing the original form of the population.

Suppose once again that Y = Σ_{i=1}^n X_i, where the X_i are independent r.v.s. When n is large, the central limit theorem (CLT), roughly speaking, says that if the X_i have means µ_i and variances σ_i² respectively, then

    Y ∼ N( Σ µ_i , Σ σ_i² )   approximately.

This result helps to explain the importance of the Normal distribution in statistics. In particular,

    P(Y ≤ y) ≅ Φ( (y − Σ µ_i) / (Σ σ_i²)^{1/2} ).

If we do know the form of the population and it follows a Normal distribution, then for any sample size n > 1 it can be shown that X̄ ∼ N(µ, σ²/n) exactly. Thus

    Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1)

has a sampling distribution which is standard Normal for any n. As the population standard deviation σ is often unknown, replacing it by the corresponding sample quantity s changes the sampling distribution. However, provided the underlying population is Normal, it can be shown that

    T = (X̄ − µ)/(s/√n)

has a 'Student's t-distribution' with ν 'degrees of freedom', where ν = n − 1 (named after W. S. Gossett, who took the pseudonym 'Student'). The percentiles of this distribution are given in Table 10; thus t_ν(0.25) is the 75th percentile or upper quartile, whose value for different ν is given by the third column of figures in the main body of Table 10. These percentiles will be used extensively in the next section for statistical inference on Normal populations.

Another distribution which arises from random samples of Normal populations is the 'chi-square' distribution, whose percentage points are given in Table 8. It can be shown that

    V = (n − 1)S²/σ² ∼ χ²_{n−1},

the chi-square distribution with n − 1 degrees of freedom, whatever the value of X̄. Using some rather tricky distribution theory we can derive the t-distribution mentioned above. Yet another distribution is the (Fisher) F-distribution, with percentage points in Table 12. The F- and χ²-distributions are used for statistical inference on the variances of Normal populations, as well as for wider application in goodness-of-fit tests later in this chapter.
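For readers working in R rather than from the printed tables (Tables 8, 10 and 12 refer to the course's set of statistical tables), the same percentage points are available from the built-in quantile functions; the particular arguments below are just illustrative values used elsewhere in this chapter.

```r
qnorm(0.975)                 # 1.96: upper 2.5% point of N(0,1)
qt(0.75, df = 10)            # about 0.70: t_10(0.25), the upper quartile of Student's t
qchisq(0.95, df = 19)        # about 30.1: upper 5% point of chi-square with 19 df
qf(0.90, df1 = 4, df2 = 5)   # about 3.52: upper 10% point of F with (4, 5) df
pnorm(1.6449)                # 0.95: P(Z <= 1.6449), useful for CLT-based probabilities
```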
Example 7.1
Suppose we select a random sample X_1, ..., X_n of size n from a N(µ, σ²) population and calculate the mean of the sample, X̄. Suppose σ = 70 mm and I wish to estimate the mean height in mm of a certain population on the basis of a sample of size n. What is the probability that the sample mean is within 10 mm of the population mean? How large must n be to make P(µ − 10 < X̄ ≤ µ + 10) = 0.9?

Solution
X̄ ∼ N(µ, σ²/n), so

    P(µ − 10 < X̄ ≤ µ + 10) = Φ( (µ + 10 − µ)/(70/√n) ) − Φ( (µ − 10 − µ)/(70/√n) ) = 2Φ(√n/7) − 1.

If n = 1, P(µ − 10 < X̄ ≤ µ + 10) ≈ 0.11.
If n = 10, P(µ − 10 < X̄ ≤ µ + 10) ≈ 0.35.
If n = 100, P(µ − 10 < X̄ ≤ µ + 10) ≈ 0.85.

For the required sample size we need 2Φ(√n/7) − 1 = 0.9, i.e. Φ(√n/7) = 0.95, so P(Z < √n/7) = 0.95, giving √n/7 = 1.6449 and n ≈ 133. △
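As a quick check, the calculation of Example 7.1 can be reproduced in R:

```r
## P(sample mean within 10 mm of mu) when sigma = 70, as a function of n.
p_within <- function(n) 2 * pnorm(sqrt(n) / 7) - 1
p_within(c(1, 10, 100))       # approx. 0.11, 0.35, 0.85
n_req <- (7 * qnorm(0.95))^2  # solve 2*Phi(sqrt(n)/7) - 1 = 0.9 for n
ceiling(n_req)                # 133
```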
7.2  Point estimation

One of the first tasks a statistician or an engineer undertakes when faced with data is to try to summarize or describe the data in some manner. Some of the statistics (sample mean, sample variance, etc.) we have covered can be used as descriptive measures for our sample. In this section, we look at methods to derive and to evaluate estimates of population parameters. There are several methods available for obtaining parameter estimates; we will discuss the maximum likelihood method and the method of moments.

Typically, population parameters can take on values from a subset of the real line. For example, the population mean can be any real number, −∞ < µ < ∞, and the population standard deviation can be any positive real number, σ > 0. The set of all possible values for a parameter θ is called the parameter space. The data space is defined as the set of all possible values of the random sample of size n. The estimate is calculated from the sample data as a function of the random sample. An estimator is a function or mapping from the data space to the parameter space and is denoted T = t(X_1, ..., X_n). Since an estimator is calculated using the sample alone, it is a statistic. Furthermore, if we have a random sample, then an estimator is also a random variable. This means that the value of the estimator varies from one sample to another according to its sampling distribution. In order to assess the usefulness of our estimator, we need criteria to measure its performance. We discuss the main criteria used to assess estimators: bias, mean squared error, and standard error. In this discussion, we only present the definitional aspects of these criteria.

Bias

The bias of an estimator gives a measure of how much error we have, on average, in our estimate when we use T to estimate our parameter θ. The bias is defined as

    bias(T) = E[T] − θ.

If the estimator is unbiased, then the expected value of our estimator equals the true parameter value, E[T] = θ. To determine the expected value E[T], we must know the distribution of the statistic T; in such cases the bias can be determined analytically. When the distribution of the statistic is not known, we can use special methods, such as simulation, to estimate the bias of T.

Mean squared error

Let θ denote the parameter we are estimating and T our estimate; the mean squared error (MSE) of the estimator is defined as

    MSE(T) = E[(T − θ)²].

Thus, the MSE is the expected value of the squared error. We can write this in terms of more useful quantities, namely the bias and the variance of T. If we expand the expected value on the right-hand side, we have

    MSE(T) = E[T² − 2Tθ + θ²] = E[T²] − 2θE[T] + θ².

By adding and subtracting (E[T])², we obtain

    MSE(T) = E[T²] − (E[T])² + (E[T])² − 2θE[T] + θ².

The first two terms are the variance of T, and the last three terms equal the squared bias of the estimator. Thus, we can write the mean squared error as

    MSE(T) = E[T²] − (E[T])² + (E[T] − θ)² = Var[T] + (bias(T))².

Since the mean squared error is based on the variance and the squared bias, the error will be small when the variance and the bias are both small. When T is unbiased, the mean squared error is equal to the variance alone. The concepts of bias and variance are important for assessing the performance of any estimator.

Standard error

We can get a measure of the precision of our estimator by calculating the standard error. The standard error of an estimator (or a statistic) is defined as the standard deviation of its sampling distribution:

    SE(T) = √V(T) = σ_T.

To illustrate this concept, let us use the sample mean as an example. We know that the variance of the estimator is V[X̄] = σ²/n, so the standard error is given by

    SE(X̄) = σ_X̄ = σ/√n.

If the standard deviation σ of the underlying population is unknown, then we can substitute an estimate for the parameter. In this case, we call it the estimated standard error:

    SÊ(X̄) = σ̂_X̄ = S/√n.

This estimate is also a random variable and has a probability distribution associated with it. If the bias of an estimator is small, then the variance of the estimator is approximately equal to the MSE, V(T) ≈ MSE(T). Thus, we can also use the square root of the MSE as an estimate of the standard error.
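When the sampling distribution of an estimator is not known analytically, the bias, MSE and standard error can all be estimated by simulation, as mentioned above. The sketch below applies this idea to the sample median as an estimator of the mean µ of a Normal population; the choice of estimator, population and sample size here is purely illustrative and is not taken from the text.

```r
## Monte Carlo estimates of bias, MSE and standard error of an estimator T.
set.seed(2)
mu <- 5; sigma <- 2; n <- 30; B <- 20000
T_med  <- replicate(B, median(rnorm(n, mu, sigma)))   # T = sample median
T_mean <- replicate(B, mean(rnorm(n, mu, sigma)))     # T = sample mean, for comparison
mean(T_med) - mu        # estimated bias (close to 0: the median is unbiased here)
mean((T_med - mu)^2)    # estimated MSE of the median
mean((T_mean - mu)^2)   # smaller MSE: the sample mean is the more efficient estimator
sd(T_med)               # estimated standard error of the median
```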
Maximum likelihood estimation

A maximum likelihood (ML) estimator is the value of the parameter (or parameters) that maximizes the likelihood function of the sample. The likelihood function of a random sample of size n from a density function f(x; θ) is the joint probability density function, denoted by

    L(θ; x_1, ..., x_n) = f(x_1, ..., x_n; θ).

This equation gives the likelihood that the random variables take on the particular values x_1, ..., x_n. Note that the likelihood function L is a function of the unknown parameter θ, and that we allow θ to represent a vector of parameters. If we have a random sample (independent, identically distributed random variables), then we can write the likelihood function as

    L(θ) = L(θ; x_1, ..., x_n) = ∏_{i=1}^n f(x_i; θ),

which is the product of the individual density functions evaluated at each sample point. In most cases, to find the value θ̂ that maximizes the likelihood function, we take the derivative of L, set it equal to 0 and solve for θ. Thus, we solve the likelihood equation

    dL(θ)/dθ = 0.

It can be shown that the likelihood function L(θ) and the logarithm of the likelihood function, ln L(θ), have their maxima at the same value of θ. It is often easier to find the maximum of the log-likelihood

    l(θ) = ln L(θ),

especially when the density involves an exponential function; we then solve

    dl(θ)/dθ = 0.

However, keep in mind that a solution of this equation is not necessarily a maximum; it could be a minimum. It is important to check this before using the result as a maximum likelihood estimate. When a distribution has more than one parameter, the likelihood function is a function of all the parameters of the distribution. In these situations, the maximum likelihood estimates are obtained by taking the partial derivatives of the likelihood function (or of ln L(θ)), setting them all equal to zero, and solving the resulting system of equations.

Example 7.2
Derive the maximum likelihood estimators of the parameters of the normal distribution.

Solution
We start with the likelihood function of a random sample of size n:

    L(θ) = ∏_{i=1}^n (1/√(2πσ²)) exp( −(x_i − µ)²/(2σ²) ) = (1/(2πσ²))^{n/2} exp( −(1/(2σ²)) Σ_{i=1}^n (x_i − µ)² ).

Taking logarithms,

    l(θ) = ln L(θ) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) Σ_{i=1}^n (x_i − µ)²,

with σ > 0 and −∞ < µ < ∞. The next step is to take the partial derivatives with respect to µ and σ². These derivatives are

    ∂l/∂µ = (1/σ²) Σ_{i=1}^n (x_i − µ)

and

    ∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^n (x_i − µ)².

We then set these equations equal to zero and solve for µ and σ². Solving the first equation for µ, we get the familiar sample mean as the estimator:

    (1/σ²) Σ (x_i − µ) = 0,  so  Σ x_i = nµ  and  µ̂ = (1/n) Σ x_i = x̄.

Substituting µ̂ = x̄ into the second equation and solving for the variance, we get

    −n/(2σ²) + (1/(2σ⁴)) Σ (x_i − x̄)² = 0,  so  σ̂² = (1/n) Σ_{i=1}^n (x_i − x̄)².  △

We know that E[X̄] = µ, so the sample mean is an unbiased estimator of the population mean. However, that is not the case for the maximum likelihood estimator of the variance:

    E[σ̂²] = (n − 1)σ²/n,

so the maximum likelihood estimate σ̂² of the variance is biased. If we want an unbiased estimator of the variance, we simply multiply the maximum likelihood estimator by n/(n − 1). This yields the usual statistic for the sample variance,

    s² = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)².

Method of moments

In some cases it is difficult to find the maximum of the likelihood function. The method of moments is one way round this problem. In general, we write the unknown population parameters in terms of the population moments. Let X_1, X_2, ..., X_n be a random sample from the probability distribution f(x). The kth population moment is E[X^k], k = 1, 2, .... The corresponding kth sample moment is

    (1/n) Σ_{i=1}^n X_i^k,   k = 1, 2, ....

The moment estimators are found by replacing the population moments with the corresponding sample moments and solving for the parameters.

Exercises

Exercise 7.1
In terms of a random sample of size n from the binomial B(1, θ) distribution with observed values x_1, ..., x_n, determine the MLE θ̂ = θ̂(x) of θ ∈ (0, 1), where x = (x_1, ..., x_n).

Solution
X ∼ B(1, θ), so each x_i takes the value 0 or 1, and

    f(x_i; θ) = θ^{x_i}(1 − θ)^{1−x_i},   i = 1, ..., n.

    L(θ) = ∏_{i=1}^n f(x_i; θ) = ∏_{i=1}^n θ^{x_i}(1 − θ)^{1−x_i} = θ^{Σ x_i}(1 − θ)^{n − Σ x_i}.

    l(θ) = ln L(θ) = (Σ x_i) ln θ + (n − Σ x_i) ln(1 − θ).

    dl/dθ = (Σ x_i)/θ − (n − Σ x_i)/(1 − θ) = 0,

so (Σ x_i)(1 − θ) − (n − Σ x_i)θ = 0, i.e. Σ x_i − nθ = 0, giving

    θ̂_ML = (1/n) Σ x_i = x̄.  △
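When no closed-form solution of the likelihood equations is available, the log-likelihood can be maximized numerically. The sketch below does this for the Normal model of Example 7.2 using R's general-purpose optimizer, and compares the result with the closed-form MLEs derived above; the data are simulated purely for illustration.

```r
## Numerical maximum likelihood for a Normal sample, compared with x.bar and
## the (biased) MLE of the variance, (1/n) * sum((x - x.bar)^2).
set.seed(3)
x <- rnorm(50, mean = 10, sd = 3)
negloglik <- function(par) {            # par = c(mu, log(sigma^2)), log scale keeps sigma^2 > 0
  mu <- par[1]; sigma2 <- exp(par[2])
  -sum(dnorm(x, mean = mu, sd = sqrt(sigma2), log = TRUE))
}
fit <- optim(c(mean(x), log(var(x))), negloglik)
fit$par[1]                              # numerical mu.hat, close to mean(x)
exp(fit$par[2])                         # numerical sigma2.hat, close to (n-1)/n * var(x)
c(mean(x), (length(x) - 1) / length(x) * var(x))
```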
7.3  Confidence Intervals

A confidence interval allows us to make statements about the likely range within which a population parameter (such as the mean) lies. A single statistic could be used as an estimate of a population parameter (commonly referred to as a point estimate). A single value, however, would not reflect any degree of confidence in the estimate, so instead we quote a range of values likely to contain the parameter; this range of values is referred to as the confidence interval. The confidence interval depends not only on the number of observations collected but also on the required degree of confidence in the range: if we wish to make a more confident statement, we have to make the range larger. This required degree of confidence is the confidence level at which the estimate is to be calculated. Commonly used confidence levels include 0.9, 0.95 and 0.99.

    Confidence level 1 − α            0.99    0.98    0.95    0.90    0.50
    z_{α/2} = Φ^{-1}(1 − α/2)         2.58    2.33    1.96    1.645   0.6745

The confidence limits for the population mean are given by

    x̄ ± z_{α/2} σ/√n.

In general the population standard deviation σ is unknown, so to obtain the confidence limits we use the estimate s², and the t-distribution in place of the Normal. The confidence limits for a population mean are then given by

    x̄ ± t_{n−1}(α/2) s/√n,

where t_{n−1}(α/2) is the upper α/2 point of the t-distribution with n − 1 degrees of freedom. For n > 30, z_{α/2} and t_{n−1}(α/2) are practically equal.

Suppose that the statistic is the proportion of successes in a sample of size n > 30 drawn from a binomial population in which p is the probability of success. Then the approximate confidence limits for p are given by

    p̂ ± z_{α/2} √( p̂(1 − p̂)/n ).

A confidence interval for the population variance σ² is

    ( (n − 1)s² / χ²_{n−1}(α/2) ,  (n − 1)s² / χ²_{n−1}(1 − α/2) ),

where χ²_{n−1}(α/2) and χ²_{n−1}(1 − α/2) are the upper and lower α/2 points of the chi-square distribution with n − 1 degrees of freedom.

Example 7.3
Suppose there are two political parties, A and B. An opinion poll based on 1,000 individuals finds that a proportion 0.53 voted for candidate A. Find a 95% confidence interval for the population proportion.

Solution

    p̂ ± z_{α/2} √( p̂(1 − p̂)/n ) = 0.53 ± 1.96 × 0.016 = 0.53 ± 0.031,

i.e. approximately [0.50, 0.56]. △

Exercises

Exercise 7.2
The stray-load loss (in watts) for a certain type of induction motor, when the line current is held at 10 amps for a speed of 1,500 rpm, is a r.v. X ∼ N(µ, 9).
1. Compute a 99% confidence interval for µ when n = 100 and x̄ = 58.3.
2. Determine the sample size n if the length of the 99% confidence interval is required to be 1.

Solution
X ∼ N(µ, 9), so σ = 3.
1. x̄ ± z_{α/2} σ/√n = 58.3 ± 2.58 × 3/√100 = 58.3 ± 0.774, so the 99% confidence interval is (57.526, 59.074).
2. We require z_{α/2} σ/√n = 0.5, i.e. 2.58 × 3/√n = 0.5, so √n = 15.48 and n ≈ 240. △
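The interval calculations above are easily checked in R (using the exact Normal quantile 2.576 rather than the rounded table value 2.58, which accounts for the slight difference in the sample-size answer):

```r
z <- qnorm(0.995)                                   # 2.576 (2.58 in the table)
58.3 + c(-1, 1) * z * 3 / sqrt(100)                 # approx. (57.53, 59.07)  [Exercise 7.2(1)]
ceiling((z * 3 / 0.5)^2)                            # 239 (240 with the rounded 2.58) [Exercise 7.2(2)]
0.53 + c(-1, 1) * qnorm(0.975) * sqrt(0.53 * 0.47 / 1000)   # approx. (0.50, 0.56) [Example 7.3]
```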
7.4  Hypothesis Testing

7.4.1  Introduction

Very often in practice we need to make a decision about a population based on information from a sample. For example, we may wish to decide whether a new serum cures a disease more effectively, or whether one procedure is better than another. In attempting to reach a decision, it is useful to make assumptions or guesses about the population involved. A hypothesis test determines whether the data collected support a specific claim. A statistical test of hypothesis consists of five parts:

1. The null hypothesis, denoted by H0.
2. The alternative hypothesis, denoted by H1.
3. The test statistic: a single number calculated from the sample.
4. The p-value: a probability calculated from the distribution of the test statistic.
5. The conclusion: there is, or there is no, evidence to reject H0.

The null hypothesis H0 is a claim that a particular population parameter (e.g. the mean) equals a specific value. A hypothesis test will either reject or not reject the null hypothesis using the collected data. The alternative hypothesis H1 is the conclusion that we would be interested in reaching if the null hypothesis is rejected. There are three options: not equal to, greater than, or less than. Once the null hypothesis and the alternative hypothesis have been stated, it is possible to assess the hypotheses using the data collected. First, the statistic of interest is calculated from the sample. Next, the hypothesis test looks at the p-value. A p-value is the probability of obtaining the recorded value of the test statistic, or a more extreme one. This is the area under the probability density function of the test statistic to the right of the observed value (for positive values) or to the left (for negative values). To calculate a p-value, take the score calculated from the test statistic and look it up on the appropriate reference distribution, for example the standardized normal distribution.

To interpret p-values in a consistent way, we adopt a convention which gives the following interpretations:

    p > 0.1             very weak or no evidence against the null hypothesis
    0.05 < p < 0.1      slight or weak evidence against the null hypothesis
    0.01 < p < 0.05     moderate evidence against the null hypothesis
    0.001 < p < 0.01    strong evidence against the null hypothesis
    p < 0.001           very strong or overwhelming evidence against the null hypothesis

However, the exact interpretation and the appropriate action to be taken must obviously vary according to the problem at hand.

Errors

Since a hypothesis test is based on a sample, and samples vary, there is the possibility of error. There are two potential errors:

Type I error: the null hypothesis is rejected when it really should not be. These errors are made less likely by requiring a small p-value (strong evidence) before rejecting H0.

Type II error: the null hypothesis is not rejected when it should have been. These errors are made less likely by increasing the number of observations in the sample.

7.4.2  Single sample

Suppose we have a sample which comes from a population described by a normal distribution with mean µ and variance σ². Suppose also that we know the population variance σ², or that we have a large number of observations so that the Central Limit Theorem can be applied. We are interested in whether the data from the sample support the hypothesis that the population mean µ takes a certain value µ0.

Population variance known, or sample size n ≥ 30

1. Null hypothesis: H0: µ = µ0.
2. Alternative hypothesis:
   One-tailed test: H1: µ > µ0 (or H1: µ < µ0).
   Two-tailed test: H1: µ ≠ µ0.
3. Test statistic:
   Z = (x̄ − µ0)/(σ/√n).
   If σ is unknown and n > 30, substitute the sample standard deviation s for the population standard deviation σ.
4. p-value from the normal distribution:
   One-tailed test: p = P(z ≥ Z) (or p = P(z ≤ Z)).
   Two-tailed test: p = 2 × P(z ≥ |Z|).
5. Conclusion, based on the calculated p-value.

Example*
The burning rate of a solid propellant used to power aircrew escape systems is a random variable that can be described by a normal distribution with unknown mean and standard deviation σ = 2.5 centimeters per second. Given that the mean of a sample of 10 observations is 48.5 centimeters per second, test the hypothesis that the mean burning rate is 50 centimeters per second.

Solution
H0: µ = 50; H1: µ ≠ 50. Two-tailed test.
Test statistic: Z = (x̄ − µ0)/(σ/√n) = (48.5 − 50)/(2.5/√10) = −1.90.
p = P(|z| > 1.90) = 2 × (1 − P(z < 1.90)) = 0.0574.
There is weak evidence against H0, i.e. against the claim that the mean burning rate is 50 centimeters per second. △
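The burning-rate calculation can be checked in a couple of lines of R:

```r
z <- (48.5 - 50) / (2.5 / sqrt(10))   # -1.897
2 * pnorm(-abs(z))                    # two-sided p-value, approx. 0.058
```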
Population variance unknown, n < 30

1. Null hypothesis: H0: µ = µ0.
2. Alternative hypothesis:
   One-tailed test: H1: µ > µ0 (or H1: µ < µ0).
   Two-tailed test: H1: µ ≠ µ0.
3. Test statistic:
   T = (x̄ − µ0)/(s/√n).
4. p-value from Student's t-distribution with n − 1 degrees of freedom:
   One-tailed test: p = P(t_{n−1} ≥ T) (or p = P(t_{n−1} ≤ T)).
   Two-tailed test: p = 2 × P(t_{n−1} ≥ |T|).
5. Conclusion, based on the calculated p-value.

Example 7.4
The increased availability of light materials with high strength has revolutionized the design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and very thin faces can result in much longer tee shots, especially for players of modest skills. This is due partly to the spring-like effect that the thin face imparts to the ball. Firing a golf ball at the head of the club and measuring the ratio of the outgoing velocity of the ball to the incoming velocity can quantify this spring-like effect. The ratio of velocities is called the coefficient of restitution of the club. An experiment was performed in which 15 drivers produced by a particular club maker were selected at random and their coefficients of restitution measured. In the experiment the golf balls were fired from an air cannon so that the incoming velocity and spin rate of the ball could be precisely controlled. It is of interest to determine if there is evidence to support a claim that the mean coefficient of restitution exceeds 0.82. The observations follow:

    0.8411  0.8580  0.8042  0.8191  0.8532
    0.8730  0.8182  0.8483  0.8282  0.8125
    0.8276  0.8359  0.8750  0.7983  0.8660

Solution
The sample mean and sample standard deviation are x̄ = 0.83725 and s = 0.02456. Since the objective of the experimenter is to demonstrate that the mean coefficient of restitution exceeds 0.82, a one-sided alternative hypothesis is appropriate:
H0: µ = 0.82; H1: µ > 0.82.
We want to reject H0 if the mean coefficient of restitution exceeds 0.82. The test statistic is

    T = (x̄ − µ0)/(s/√n) = (0.83725 − 0.82)/(0.02456/√15) = 2.72.

Conclusion: p = P(t_14 > 2.72) = 0.0086. There is strong evidence against the null hypothesis, i.e. the data support the claim that the mean coefficient of restitution exceeds 0.82. △

Test of hypothesis concerning a population variance

1. Null hypothesis: H0: σ² = σ0².
2. Alternative hypothesis:
   One-tailed test: H1: σ² > σ0² (or H1: σ² < σ0²).
   Two-tailed test: H1: σ² ≠ σ0².
3. Test statistic:
   X² = (n − 1)s²/σ0².
4. p-value from the chi-square distribution with n − 1 degrees of freedom:
   One-tailed test: p = P(χ²_{n−1} ≥ X²) (or p = P(χ²_{n−1} ≤ X²)).
   Two-tailed test: p = 2 × P(χ²_{n−1} ≥ X²) (or 2 × P(χ²_{n−1} ≤ X²), whichever is smaller).
5. Conclusion, based on the calculated p-value.

Example*
An automatic filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable proportion of bottles will be underfilled or overfilled. Is there evidence in the sample data to suggest that the manufacturer has a problem with underfilled or overfilled bottles?

Solution
H0: σ² = 0.01; H1: σ² > 0.01. One-tailed test.
Test statistic: X² = (n − 1)s²/σ0² = 19 × 0.0153/0.01 = 29.07.
p = P(χ²_19 > 29.07) > 0.05.
There is at most weak evidence against H0, i.e. no strong indication that the variance exceeds 0.01 (fluid ounces)². △
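With the raw data available, as in Example 7.4, R's t.test() reproduces the one-sample t-test directly; the variance test of the filling-machine example reduces to a single chi-square tail probability.

```r
## Example 7.4: one-sided t-test of H0: mu = 0.82 against H1: mu > 0.82.
cor_data <- c(0.8411, 0.8580, 0.8042, 0.8191, 0.8532, 0.8730, 0.8182, 0.8483,
              0.8282, 0.8125, 0.8276, 0.8359, 0.8750, 0.7983, 0.8660)
t.test(cor_data, mu = 0.82, alternative = "greater")     # t = 2.72, p approx. 0.008

## Filling-machine example: p-value of the one-sided variance test.
pchisq(19 * 0.0153 / 0.01, df = 19, lower.tail = FALSE)  # approx. 0.065
```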
7.4.3  Comparing two samples

Two different situations are possible: independent samples and paired samples. An important assumption is that the samples are taken from normal distributions, the first of size n_A from N(µ_A, σ_A²) and the second of size n_B from N(µ_B, σ_B²). The sample means X̄_A and X̄_B can be calculated and used to compare the population means µ_A and µ_B.

Independent samples, population variances known, or if unknown, n_A ≥ 30 and n_B ≥ 30

1. Null hypothesis: H0: (µ_A − µ_B) = µ0, where µ0 is some specific difference that you wish to test. For many tests you will hypothesize that there is no difference between µ_A and µ_B; that is, H0: µ_A = µ_B.
2. Alternative hypothesis:
   One-tailed test: H1: (µ_A − µ_B) > µ0 (or H1: (µ_A − µ_B) < µ0).
   Two-tailed test: H1: (µ_A − µ_B) ≠ µ0.
3. Test statistic:
   Z = ( (x̄_A − x̄_B) − µ0 ) / √( σ_A²/n_A + σ_B²/n_B ).
   If σ_A² and σ_B² are unknown and n_A > 30, n_B > 30, substitute the sample variances s_A² and s_B² for σ_A² and σ_B², respectively.
4. p-value from the normal distribution:
   One-tailed test: p = P(z ≥ Z) (or p = P(z ≤ Z)).
   Two-tailed test: p = 2 × P(z ≥ |Z|).
5. Conclusion, based on the calculated p-value.

Example*
A product developer is interested in reducing the drying time of a primer paint. Two formulations of the paint are tested; formulation 1 is the standard chemistry, and formulation 2 has a new drying ingredient that should reduce the drying time. From experience, it is known that the standard deviation of drying time is 8 minutes, and this inherent variability should be unaffected by the addition of the new ingredient. Ten specimens are painted with formulation 1 and another 10 specimens with formulation 2; the 20 specimens are painted in random order. The two sample average drying times are x̄_1 = 121 minutes and x̄_2 = 112 minutes, respectively. What conclusions can the product developer draw about the effectiveness of the new ingredient?

Solution
H0: µ_1 = µ_2; H1: µ_1 > µ_2. One-tailed test.
Test statistic: Z = (x̄_1 − x̄_2 − 0)/√(σ_1²/n_1 + σ_2²/n_2) = (121 − 112)/√(8²/10 + 8²/10) = 2.52.
p = P(z > 2.52) = 1 − P(z < 2.52) = 0.0059.
There is strong evidence to reject H0, i.e. the new drying ingredient reduces the drying time. △

If the independent samples have unknown population variances, but we can assume that these unknown population variances are equal, we construct the pooled variance estimator, which is a weighted average of the two unbiased estimators s_A² and s_B².

Independent samples with unknown but equal variances σ_A = σ_B = σ, with n_A ≤ 30 and n_B ≤ 30 (pooled variance)

1. Null hypothesis: H0: (µ_A − µ_B) = µ0, where µ0 is some specific difference that you wish to test. If no difference between µ_A and µ_B is hypothesized, then H0: µ_A = µ_B.
2. Alternative hypothesis:
   One-tailed test: H1: (µ_A − µ_B) > µ0 (or H1: (µ_A − µ_B) < µ0).
   Two-tailed test: H1: (µ_A − µ_B) ≠ µ0.
3. Test statistic:
   T = ( (x̄_A − x̄_B) − µ0 ) / ( s √(1/n_A + 1/n_B) ),
   where s² = ( (n_A − 1)s_A² + (n_B − 1)s_B² ) / (n_A + n_B − 2).
4. p-value from Student's t-distribution with n_A + n_B − 2 degrees of freedom:
   One-tailed test: p = P(t_{n_A+n_B−2} ≥ T) (or p = P(t_{n_A+n_B−2} ≤ T)).
   Two-tailed test: p = 2 × P(t_{n_A+n_B−2} ≥ |T|).
5. Conclusion, based on the calculated p-value.

Example*
Suppose that we have obtained x̄ = 80.02 and s_x = 0.024 from a control group of size n_x = 13, and ȳ = 79.98 and s_y = 0.031 from an experimental group of size n_y = 8, and we assume that σ_x² = σ_y². Test the hypothesis that the means of the two groups are equal.

Solution
H0: µ_x = µ_y; H1: µ_x ≠ µ_y. Two-tailed test.
s² = ( (n_x − 1)s_x² + (n_y − 1)s_y² ) / (n_x + n_y − 2) = (12 × 0.024² + 7 × 0.031²)/(13 + 8 − 2) ≈ 0.00072, so s ≈ 0.027.
Test statistic: T = (80.02 − 79.98)/(0.027 × √(1/13 + 1/8)) = 3.3.
p = 2 × P(t_19 > 3.3) = 2 × (1 − P(t_19 < 3.3)) = 2 × (1 − 0.9981) = 0.0038.
There is strong evidence against H0: the two population means are significantly different. △
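Both of the preceding examples use only summary statistics, so the calculations are simply arithmetic plus a tail probability; in R:

```r
## Paint-drying example (known sigma = 8, one-sided z-test).
z <- (121 - 112) / sqrt(8^2/10 + 8^2/10)
pnorm(z, lower.tail = FALSE)                       # approx. 0.006

## Pooled-variance t example (n_x = 13, n_y = 8, two-sided).
s2    <- (12 * 0.024^2 + 7 * 0.031^2) / 19         # pooled variance
tstat <- (80.02 - 79.98) / (sqrt(s2) * sqrt(1/13 + 1/8))
2 * pt(abs(tstat), df = 19, lower.tail = FALSE)    # approx. 0.0035 (0.0038 with T rounded to 3.3)
```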
Before performing the above test for the comparison of the means of two samples, we need to check whether there is evidence against the assumption that the variances are equal:

Test of hypothesis concerning the equality of two population variances

1. Null hypothesis: H0: σ_A² = σ_B².
2. Alternative hypothesis:
   One-tailed test: H1: σ_A² > σ_B² (or H1: σ_A² < σ_B²).
   Two-tailed test: H1: σ_A² ≠ σ_B².
3. Test statistic:
   F = s_A² / s_B².
4. p-value from the Fisher F-distribution with df_A = n_A − 1 and df_B = n_B − 1 degrees of freedom:
   One-tailed test: p = P(F_{df_A,df_B} ≥ F).
   Two-tailed test: p = 2 × P(F_{df_A,df_B} ≥ F) (or 2 × P(F_{df_B,df_A} ≥ 1/F), whichever tail is the smaller).
5. Conclusion, based on the calculated p-value.

Example*
Oxide layers on semiconductor wafers are etched in a mixture of gases to achieve the proper thickness. The variability in the thickness of these oxide layers is a critical characteristic of the wafer, and low variability is desirable for subsequent processing steps. Two different mixtures of gases are being studied to determine whether one is superior in reducing the variability of the oxide thickness. Twenty wafers are etched in each gas. The sample standard deviations of oxide thickness are s_1 = 1.96 angstroms and s_2 = 2.13 angstroms, respectively. Is there any evidence to indicate that either gas is preferable?

Solution
H0: σ_1² = σ_2²; H1: σ_1² ≠ σ_2². Two-tailed test.
Test statistic: F = s_1²/s_2² = 1.96²/2.13² = 3.84/4.54 = 0.85.
Since F < 1, p = 2 × P(F_{19,19} ≥ 1/0.85) = 2 × P(F_{19,19} ≥ 1.18). From Table 12: F_{12,19}(0.1) = 1.912 and F_{24,19}(0.1) = 1.787, so P(F_{19,19} ≥ 1.18) > 0.1 and p > 0.2.
There is no evidence against H0, i.e. nothing to indicate that either gas results in a smaller variance of oxide thickness. △

If the above test indicates that there is evidence to reject the null hypothesis of equal variances, the following strategy should be adopted:

Independent samples with unknown (possibly unequal) variances

1. Null hypothesis: H0: (µ_A − µ_B) = µ0, where µ0 is some specific difference that you wish to test. If no difference between µ_A and µ_B is hypothesized, then H0: µ_A = µ_B.
2. Alternative hypothesis:
   One-tailed test: H1: (µ_A − µ_B) > µ0 (or H1: (µ_A − µ_B) < µ0).
   Two-tailed test: H1: (µ_A − µ_B) ≠ µ0.
3. Test statistic:
   T = ( (x̄_A − x̄_B) − µ0 ) / √( s_A²/n_A + s_B²/n_B ).
4. p-value from Student's t-distribution with ν* degrees of freedom, where
   ν* = (v_A + v_B)² / ( v_A²/(n_A − 1) + v_B²/(n_B − 1) ),  with  v_A = s_A²/n_A and v_B = s_B²/n_B.
   One-tailed test: p = P(t_{ν*} ≥ T) (or p = P(t_{ν*} ≤ T)).
   Two-tailed test: p = 2 × P(t_{ν*} ≥ |T|).
5. Conclusion, based on the calculated p-value.
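In R, the oxide-thickness F-test reduces to a tail probability of the F-distribution; with raw data, var.test(x, y) performs the same comparison of variances, and t.test(x, y) uses the unequal-variance statistic with the ν* degrees of freedom above by default (t.test(x, y, var.equal = TRUE) gives the pooled-variance version).

```r
Fstat <- 1.96^2 / 2.13^2                                         # 0.847
2 * min(pf(Fstat, 19, 19), pf(Fstat, 19, 19, lower.tail = FALSE))  # approx. 0.72: no evidence
```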
7.4.4  Paired Observations

For a paired comparison, the same individuals or items from the sample are subjected to two different treatments, A and B. Paired comparison data are analysed by considering the differences between the pairs of observations. Thus, if the observations are represented by the random variables X_A and X_B, we consider the derived random variable X = X_A − X_B. It is easiest to present the results in the following table:

    i     A        B        X = X_A − X_B            X²
    1     X_A1     X_B1     X_1 = X_A1 − X_B1        X_1²
    ...   ...      ...      ...                      ...
    n     X_An     X_Bn     X_n = X_An − X_Bn        X_n²

The mean of the distribution of X is µ, which equals µ_A − µ_B. We are interested in the hypothesis that µ_A and µ_B differ by an amount µ0. Most probably the variance σ² of the differences is unknown, but we can calculate the variance s² of X from the data. Then the statistic T = (X̄ − µ0)/(s/√n) ∼ t_{n−1}.

A paired samples test

1. Null hypothesis: H0: µ = µ0, where µ = µ_A − µ_B is the mean of the paired differences.
2. Alternative hypothesis:
   One-tailed test: H1: µ > µ0 (or H1: µ < µ0).
   Two-tailed test: H1: µ ≠ µ0.
3. Test statistic:
   T = (x̄ − µ0)/(s/√n),
   where n is the number of paired differences, x̄ is the mean of the sample differences, and s is the standard deviation of the sample differences.
4. p-value from Student's t-distribution with n − 1 degrees of freedom:
   One-tailed test: p = P(t_{n−1} ≥ T) (or p = P(t_{n−1} ≤ T)).
   Two-tailed test: p = 2 × P(t_{n−1} ≥ |T|).
5. Conclusion, based on the calculated p-value.

Example*
The journal Human Factors (1962, pp. 375-380) reports a study in which n = 14 subjects were asked to parallel park two cars having very different wheel bases and turning radii. The time in seconds for each subject was recorded and is given in the table below:

    Subject   Configuration 1   Configuration 2   Difference
     1             37.0              17.8             19.2
     2             25.8              20.2              5.6
     3             16.2              16.8             -0.6
     4             24.2              41.4            -17.2
     5             22.0              21.4              0.6
     6             33.4              38.4             -5.0
     7             23.8              16.8              7.0
     8             58.2              32.2             26.0
     9             33.6              27.8              5.8
    10             24.4              23.2              1.2
    11             23.4              29.6             -6.2
    12             21.2              20.6              0.6
    13             36.2              32.2              4.0
    14             29.8              53.8            -24.0

Test whether the first configuration of wheel bases and turning radii gives a faster parking time.

Solution
H0: µ = 0; H1: µ > 0. One-tailed test.
x̄ = Σ x_i / n = (19.2 + 5.6 − 0.6 − 17.2 + 0.6 − 5.0 + 7.0 + 26.0 + 5.8 + 1.2 − 6.2 + 0.6 + 4.0 − 24.0)/14 = 1.21.
s² = Σ (x_i − x̄)² / (n − 1) = 160.78, so s = 12.68.
Test statistic: T = x̄/(s/√n) = 1.21/(12.68/√14) = 0.36.
p = P(t_13 > 0.36). From Table 10: t_13(0.4) = 0.2586 and t_13(0.3) = 0.5375, so 0.3 < p < 0.4.
There is no evidence to reject H0, i.e. the two configurations give equal mean parking times. △
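Since the raw parking times are given, the paired test can also be run directly with t.test(..., paired = TRUE):

```r
config1 <- c(37.0, 25.8, 16.2, 24.2, 22.0, 33.4, 23.8, 58.2, 33.6, 24.4, 23.4, 21.2, 36.2, 29.8)
config2 <- c(17.8, 20.2, 16.8, 41.4, 21.4, 38.4, 16.8, 32.2, 27.8, 23.2, 29.6, 20.6, 32.2, 53.8)
t.test(config1, config2, paired = TRUE, alternative = "greater")
## t approx. 0.36 on 13 df, p approx. 0.36: no evidence of a difference.
```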
Exercises

Exercise 7.3
Two random samples were independently drawn from two populations, A and B. Is there evidence in the following data to indicate a difference in the population means?

    Sample           A        B
    Size n           6        5
    Σ x_i            297      322
    Σ x_i²           16103    21978
    Mean             49.5     64.4
    Variance         280.3    310.3
    S.E. of mean     6.84     7.88

Solution
First we test the hypothesis that the variances of the two populations are equal.
H0: σ_A² = σ_B²; H1: σ_A² ≠ σ_B². Two-tailed test.
Test statistic: F = s_A²/s_B² = 280.3/310.3 = 0.90. Since F < 1, consider 1/F = s_B²/s_A² = 1.11, which under H0 follows F_{4,5}. From Table 12: F_{4,5}(0.1) = 3.520 and F_{4,5}(0.05) = 5.192, which indicates that P(F_{4,5} ≥ 1.11) > 0.1 and p > 0.2.
There is no evidence to reject H0, i.e. the variances of the two populations may be taken as equal. We can now test the hypothesis that the means of the two populations are equal (populations with unknown but equal variances).
H0: µ_A = µ_B; H1: µ_A ≠ µ_B. Two-tailed test.
s² = ( (n_A − 1)s_A² + (n_B − 1)s_B² ) / (n_A + n_B − 2) = (5 × 280.3 + 4 × 310.3)/(6 + 5 − 2) = 293.6, so s = 17.14.
Test statistic: T = (x̄_A − x̄_B) / ( s √(1/n_A + 1/n_B) ) = (49.5 − 64.4)/(17.14 × √(1/6 + 1/5)) = −1.44.
p = 2 × P(t_9 ≤ −1.44) ≈ 0.18.
There is no evidence to reject H0, i.e. the means of the two populations may well be the same. △

Exercise 7.4
The heights in inches of m male students and n female students were obtained. It is well known that males tend to be taller on average than females. However, it is of interest to estimate the difference in mean heights between the sexes. General anthropometric considerations suggest that male heights and female heights are approximately Normal and that the population variances differ slightly. In the general population the difference of mean male and female heights is 5.5 inches. Thus, for our example it is of interest to test H0: µ_M − µ_F = 5.5 against H1: µ_M − µ_F ≠ 5.5. In this case we need to modify the t-statistic since we are testing µ_M − µ_F = 5.5, rather than µ_M − µ_F = 0. Here

                 Male          Female
    Size         m = 41        n = 17
    Mean         x̄ = 69.2      ȳ = 66.7
    St. dev.     s_x = 2.66    s_y = 2.34

Solution
H0: µ_x − µ_y = 5.5; H1: µ_x − µ_y ≠ 5.5. Two-tailed test; independent samples with unknown, unequal variances.
Test statistic: T = ( (x̄ − ȳ) − µ0 ) / √( s_x²/m + s_y²/n ) = (69.2 − 66.7 − 5.5)/√(2.66²/41 + 2.34²/17) = −4.26.
v_x = s_x²/m = 2.66²/41 = 0.1726,  v_y = s_y²/n = 2.34²/17 = 0.3221.
ν* = (v_x + v_y)² / ( v_x²/(m − 1) + v_y²/(n − 1) ) = (0.1726 + 0.3221)² / (0.1726²/40 + 0.3221²/16) = 33.85.
p = 2 × P(t_34 ≤ −4.26) = 2 × P(t_34 ≥ 4.26). From Table 10: P(t_34 > 3.601) = 0.0005, so p < 0.001.
There is very strong evidence against H0, i.e. the difference of mean male and female heights is not 5.5 inches. △

Exercise 7.5
Sixteen patients sampled at random were assigned as matched pairs to two treatments, treatment A being assigned to a random member of each pair. A response was measured and the data were:

    A    14.0    5.0    8.6   11.6   12.1    5.3    8.9   10.3
    B    13.2    4.7    9.0   11.1   12.2    4.7    8.7    9.6
    X    +0.8   +0.3   −0.4   +0.5   −0.1   +0.6   +0.2   +0.7

Is there evidence for a difference in means?

Solution
H0: µ = 0; H1: µ ≠ 0, where µ is the mean of the paired differences. Two-tailed test.
x̄ = Σ x_i / n = (0.8 + 0.3 − 0.4 + 0.5 − 0.1 + 0.6 + 0.2 + 0.7)/8 = 0.325.
s² = Σ (x_i − x̄)² / (n − 1) = 0.171, so s = 0.413.
Test statistic: T = x̄/(s/√n) = 0.325/(0.413/√8) = 2.2.
p = 2 × P(t_7 > 2.2) = 2 × (1 − P(t_7 < 2.2)) = 2 × (1 − 0.9681) = 0.0638.
There is weak evidence against H0, i.e. for a difference in the means of the two treatments. △
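Exercise 7.4 illustrates a case where only summary statistics are available, so t.test() cannot be used directly; the unequal-variance statistic and its approximate degrees of freedom are easily computed by hand in R:

```r
vx    <- 2.66^2 / 41;  vy <- 2.34^2 / 17
tstat <- (69.2 - 66.7 - 5.5) / sqrt(vx + vy)         # -4.27
nu    <- (vx + vy)^2 / (vx^2 / 40 + vy^2 / 16)       # approx. 33.9
2 * pt(-abs(tstat), df = nu)                         # p well below 0.001
```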
7.5  Chi-square test

The chi-square test allows an analysis of whether there is a relationship between two categorical variables. The chi-square test is used in two similar but distinct circumstances:

• for estimating how closely an observed distribution matches an expected distribution - we refer to this as the goodness-of-fit test;
• for estimating whether two random variables are independent.

The chi-square test is always a right-tailed test.

7.5.1  The Goodness-of-Fit Test

One of the more interesting goodness-of-fit applications of the chi-square test is to examine issues of fairness and cheating in games of chance, such as cards, dice and roulette. For example, if the die being used is fair, then the chance of any particular number coming up is the same: 1 in 6. However, if the die is loaded, then certain numbers will have a greater likelihood of appearing, while others will have a lower likelihood.

Suppose that the data come from a discrete random distribution with N categories, given by the following table:

    x                       x_1    x_2    ...    x_N    Total
    Observed frequencies    O_1    O_2    ...    O_N    n

The key idea of the chi-square test is a comparison of the observed frequencies O_k and the expected frequencies E_k. We calculate the expected frequencies E_k by multiplying the total number of observations n by the probability of each category, P(x_k). The sum of the expected frequencies should be equal to the total number of elements n: we cannot compare O_k and E_k unless they both represent the same total collection of items. We can then add two rows to the table already given, containing the expected probabilities and the expected frequencies. Expected frequencies need not be integers.

    x                 x_1              x_2              ...    x_N              Total
    Observed freq.    O_1              O_2              ...    O_N              n
    Expected prob.    P(x_1)           P(x_2)           ...    P(x_N)           1
    Expected freq.    E_1 = nP(x_1)    E_2 = nP(x_2)    ...    E_N = nP(x_N)    n

If the data come from a continuous random distribution with probability density function f(x), then the classes are defined as intervals [x_k, x_{k+1}], together with the tails x ≤ x_1 and x ≥ x_N:

    Class          Observed freq.    Expected prob.                      Expected freq.
    x ≤ x_1        O_1               P_1 = ∫_{−∞}^{x_1} f(x) dx          E_1 = nP_1
    [x_1, x_2]     O_2               P_2 = ∫_{x_1}^{x_2} f(x) dx         E_2 = nP_2
    ...            ...               ...                                 ...
    x ≥ x_N        O_N               P_N = ∫_{x_N}^{∞} f(x) dx           E_N = nP_N
    Total          n                 1                                   n

The test statistic is given by

    X² = Σ_k (O_k − E_k)² / E_k ∼ χ²_ν,

which has a chi-square distribution with ν = N − 1 − (number of parameters estimated from the data) degrees of freedom; the '−1' arises from the restriction Σ_{k=1}^N E_k = n. An important rule for the chi-square test is that the values of E_k should not be allowed to fall below 5. This can be ensured by grouping together the top tail of the distribution, treating, say, x > x_l as one class, or by merging some other classes together. The number of degrees of freedom is then ν = N̂ − 1 − (number of parameters estimated), where N̂ is the number of classes after merging. We find the p-value for the calculated statistic from a standard set of tables. Obviously, in the ideal case, if the observed frequencies equal the expected frequencies, the statistic X² takes a value near 0 and the p-value is about 1, giving no evidence to reject H0.

A goodness-of-fit test with chi-square

1. Establish the null hypothesis that the frequencies follow a particular distribution with defined probabilities P(x).
2. Calculate the expected frequency for each category of the table: E_k = nP(x_k) (for a discrete r.v.) or E_k = n ∫_{x_k}^{x_{k+1}} f(x) dx (for a continuous r.v.).
3. Calculate the chi-square statistic
   X² = Σ_{k=1}^N (O_k − E_k)² / E_k.
4. Assess the p-value of the statistic from the chi-square distribution with ν = N − 1 − (number of estimated parameters) degrees of freedom: p = P(χ²_ν ≥ X²).
5. Finally, decide whether to accept or reject the null hypothesis.

Example 7.5
The number of defects in printed circuit boards is hypothesized to follow a Poisson distribution. A random sample of n = 60 printed boards has been collected, and the following numbers of defects observed.

    Number of defects      0     1     2     3
    Observed frequency    32    15     9     4

Solution
The mean of the assumed Poisson distribution in this example is unknown and must be estimated from the sample data. The estimate of the mean number of defects per board is the sample average, that is, (32×0 + 15×1 + 9×2 + 4×3)/60 = 0.75. From the Poisson distribution with parameter 0.75 we may compute p_i, the theoretical, hypothesized probability associated with the ith class. Since each class corresponds to a particular number of defects, we find the p_i as follows:

    p_1 = P(X = 0) = e^{−0.75}(0.75)^0 / 0! = 0.472
    p_2 = P(X = 1) = e^{−0.75}(0.75)^1 / 1! = 0.354
    p_3 = P(X = 2) = e^{−0.75}(0.75)^2 / 2! = 0.133
    p_4 = P(X ≥ 3) = 1 − p_1 − p_2 − p_3 = 0.041

The expected frequencies are computed by multiplying the sample size n = 60 by the probabilities p_i, that is, E_i = n p_i. The expected frequencies follow:

    Number of defects      0        1        2       ≥3
    Probability            0.472    0.354    0.133    0.041
    Expected frequency    28.32    21.24     7.98     2.46

Since the expected frequency in the last cell is less than 3, we combine the last two cells:

    Number of defects      0        1        ≥2
    Observed frequency    32       15        13
    Expected frequency    28.32    21.24    10.44

The chi-square test statistic will have k − p − 1 = 3 − 1 − 1 = 1 degree of freedom, because the mean of the Poisson distribution was estimated from the data.

    χ² = (32 − 28.32)²/28.32 + (15 − 21.24)²/21.24 + (13 − 10.44)²/10.44 = 2.94.

Conclusion: p = P(χ²_1 ≥ 2.94) ≈ 0.086. There is at most weak evidence against H0; the Poisson model is not rejected. △
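The whole of Example 7.5 can be reproduced in a few lines of R, keeping the degrees-of-freedom adjustment for the estimated mean explicit:

```r
obs    <- c(32, 15, 9, 4)                               # 0, 1, 2, 3 defects
lambda <- sum(obs * 0:3) / sum(obs)                     # 0.75, estimated from the data
p      <- c(dpois(0:1, lambda), 1 - ppois(1, lambda))   # classes 0, 1, >=2 after merging
O      <- c(32, 15, 9 + 4)
E      <- 60 * p
X2     <- sum((O - E)^2 / E)                            # approx. 2.9
pchisq(X2, df = length(O) - 1 - 1, lower.tail = FALSE)  # approx. 0.085
```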
7.5.2  Testing Independence

The other primary use of the chi-square test is to examine whether two variables are independent or not. It is important to keep in mind that the chi-square test only tests whether two variables are independent; it cannot address questions of which is greater or less. Suppose that n_c characteristics are observed on each of n members of a sample, and that each characteristic is classified into n_r types, i.e. there are n_c classes of the first variable and n_r classes of the second variable. A summary table, also called a contingency table, is drawn up, where each cell of the table gives the number of sample members having a particular characteristic and a particular type.

                     Variable I
    Variable II      C_1          ...    C_{n_c}        Total
    R_1              O_{1,1}      ...    O_{1,n_c}      r_1
    ...              ...          ...    ...            ...
    R_{n_r}          O_{n_r,1}    ...    O_{n_r,n_c}    r_{n_r}
    Total            c_1          ...    c_{n_c}        n

As with the goodness-of-fit test described earlier, the key idea of the chi-square test for independence is a comparison of observed and expected values: how many of something were expected and how many were observed? In the case of tabular data, however, we usually do not know what the distribution should look like. Rather, in this use of the chi-square test, expected values are calculated from the row and column totals of the table. The expected value for each cell of the table can be calculated using the formula

    E_{k,j} = r_k c_j / n,

where r_k is the kth row total, c_j is the jth column total and n is the total number of observations in the sample. The first step, then, in calculating the chi-square statistic in a test for independence is generating the expected value for each cell of the table. Again, these need not be integers. This gives a second table, of expected frequencies:

                     Variable I
    Variable II      C_1          ...    C_{n_c}        Total
    R_1              E_{1,1}      ...    E_{1,n_c}      r_1
    ...              ...          ...    ...            ...
    R_{n_r}          E_{n_r,1}    ...    E_{n_r,n_c}    r_{n_r}
    Total            c_1          ...    c_{n_c}        n

With these two sets of figures, we calculate the chi-square statistic

    X² = Σ_k Σ_j (O_{k,j} − E_{k,j})² / E_{k,j},

which has a chi-square distribution with ν = (n_r − 1)(n_c − 1) degrees of freedom, where n_r and n_c are the numbers of rows and columns respectively. We then find the p-value for the calculated statistic from a standard set of tables.

An independence test with chi-square

1. Establish the null hypothesis that the variables are independent.
2. Calculate the expected frequency for each cell of the table: E_{k,j} = r_k c_j / n.
3. Calculate the chi-square statistic
   X² = Σ_k Σ_j (O_{k,j} − E_{k,j})² / E_{k,j}.
4. Assess the p-value of the statistic from the chi-square distribution with ν = (n_c − 1)(n_r − 1) degrees of freedom: p = P(χ²_ν ≥ X²).
5. Finally, decide whether to accept or reject the null hypothesis.

Example*
The following contingency table relates the duration of illness in days to the vaccination status of pertussis patients. Test whether duration of illness and vaccination status are independent.

                 Illness duration in days
                 < 30    31-60    > 61    Total
    Not vacc.      45      104      77      226
    Vacc.          64       64      45      173
    Total         109      168     122      399

Solution
Expected frequencies:

    E_11 = 226×109/399 = 61.74    E_12 = 226×168/399 = 95.16    E_13 = 226×122/399 = 69.10
    E_21 = 173×109/399 = 47.26    E_22 = 173×168/399 = 72.84    E_23 = 173×122/399 = 52.90

                 Illness duration in days
                 < 30     31-60    > 61     Total
    Not vacc.    61.74    95.16    69.10    226
    Vacc.        47.26    72.84    52.90    173
    Total        109      168      122      399

Test statistic:

    X² = Σ (O_k − E_k)²/E_k = (45 − 61.74)²/61.74 + (104 − 95.16)²/95.16 + (77 − 69.10)²/69.10
         + (64 − 47.26)²/47.26 + (64 − 72.84)²/72.84 + (45 − 52.90)²/52.90 = 14.45.

Degrees of freedom: ν = (2 − 1)(3 − 1) = 2.
p = P(χ²_2 > 14.45). From Table 8: χ²_2(0.001) = 13.82 and χ²_2(0.0005) = 15.2, so 0.0005 < p < 0.001.
There is very strong evidence against H0, i.e. the duration of illness depends on the vaccination status. △
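With the observed counts entered as a matrix, R's chisq.test() computes the expected frequencies, the statistic and the p-value in one step (no continuity correction is applied for tables larger than 2×2):

```r
tab <- matrix(c(45, 104, 77,
                64,  64, 45), nrow = 2, byrow = TRUE,
              dimnames = list(c("Not vacc.", "Vacc."), c("<30", "31-60", ">61")))
chisq.test(tab)     # X-squared approx. 14.4 on 2 df, p approx. 0.0007
```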
Exercises

Exercise 7.6
In 116 randomly selected families with two children, 42 have no girls, 52 have one girl and only 22 have two girls. Assuming births of either sex are equally likely, do these data conflict with the hypothesis that the sexes of successive births are independent?

Solution
H0: the sexes of successive births are independent; H1: the sexes of successive births are not independent.
If the hypothesis is true, then the number of girls in any family of two children follows a Binomial distribution B(2, 1/2). We can construct the table of frequencies:

    Number of girls x                       0       1       2       Total
    Observed frequency O_k                 42      52      22       116
    Probability P(X = x_k)                  0.25    0.5     0.25     1
    Expected frequency E_k = nP(X = x_k)   29      58      29       116

Test statistic: X² = Σ (O_k − E_k)²/E_k = (42 − 29)²/29 + (52 − 58)²/58 + (22 − 29)²/29 = 8.14.
p = P(χ²_2 > 8.14). From Table 8: χ²_2(0.025) = 7.378 and χ²_2(0.01) = 9.210, so 0.01 < p < 0.025.
There is moderate evidence against H0, i.e. against the hypothesis that the sexes of successive births are independent. △

Exercise 7.7
The number of accidents in a month is observed over a period of ten years. Test whether the data follow a Poisson distribution. The data are:

    Number of accidents k    Observed frequency O_k    Expected probability    Expected frequency E_k
    0                         41                        0.30119                 36.14
    1                         40                        0.36144                 43.37
    2                         22                        0.21686                 26.02
    3                         10                        0.08674                 10.41
    4                          6                        0.02602                  3.12
    5                          0                        0.00625                  0.75
    6                          1                        0.00125                  0.15
    7 or more                  0                        0.00025                  0.03
    Total                    120                        1.00000                120

Solution
The expected frequencies for 3 or more accidents are small, so we merge the classes k ≥ 3 into a single class with observed frequency 10 + 6 + 0 + 1 + 0 = 17 and expected frequency 10.41 + 3.12 + 0.75 + 0.15 + 0.03 = 14.46.
Test statistic: X² = Σ (O_k − E_k)²/E_k = (41 − 36.14)²/36.14 + (40 − 43.37)²/43.37 + (22 − 26.02)²/26.02 + (17 − 14.46)²/14.46 ≈ 1.98.
Degrees of freedom: ν = 4 − 1 = 3 (the Poisson probabilities were supplied in the problem, so no parameter is counted as estimated here).
p = P(χ²_3 > 1.98). From Table 8: χ²_3(0.7) = 1.424 and χ²_3(0.5) = 2.366, so 0.5 < p < 0.7.
There is no evidence against H0, i.e. the number of accidents follows a Poisson distribution. △
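When the class probabilities are fully specified, as in Exercise 7.6, chisq.test() can be given the probability vector directly:

```r
chisq.test(c(42, 52, 22), p = c(0.25, 0.5, 0.25))   # X-squared = 8.14 on 2 df, p approx. 0.017
```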
Exercise 7.8
The times to failure (in hours) of 500 electrical components have been recorded as follows:

    Time (hours), x    0-50    50-100    100-150    150-200    200-250    250-300    300-350    350-400
    Frequency           208       112         75         40         30         18         11          6

Test whether these data follow an exponential distribution.

Solution
We need an estimate of the exponential distribution parameter, λ̂ = 1/x̄. To estimate the mean we take the mid-point of each interval:

    x̄ = (25×208 + 75×112 + 125×75 + 175×40 + 225×30 + 275×18 + 325×11 + 375×6)/500 = 47500/500 = 95,

so λ̂ = 1/95. The cumulative distribution function of the exponential distribution is F(x) = 1 − e^{−λ̂x}, so

    E_1 = 500 × P(0 < X < 50) = 500 × (F(50) − F(0)) = 500 × (e^{−0/95} − e^{−50/95}) = 500 × 0.4092 = 204.6
    E_2 = 500 × P(50 < X < 100) = 120.9
    E_3 = 500 × P(100 < X < 150) = 71.4
    E_4 = 500 × P(150 < X < 200) = 42.2
    E_5 = 500 × P(200 < X < 250) = 24.9
    E_6 = 500 × P(250 < X < 300) = 14.7
    E_7 = 500 × P(300 < X < 350) = 8.7
    E_8 = 500 × P(350 < X < 400) = 5.1
    E_9 = 500 × P(X > 400) = 500 − 204.6 − 120.9 − 71.4 − 42.2 − 24.9 − 14.7 − 8.7 − 5.1 = 7.5

    Time    0-50     50-100    100-150    150-200    200-250    250-300    300-350    350-400    >400
    O_k      208        112         75         40         30         18         11          6       0
    E_k    204.6      120.9       71.4       42.2       24.9       14.7        8.7        5.1     7.5

Test statistic: X² = Σ (O_k − E_k)²/E_k = 10.96.
Degrees of freedom: ν = 9 − 1 − 1 = 7 (because we estimated the parameter λ).
p = P(χ²_7 > 10.96). From Table 8: χ²_7(0.2) = 9.803 and χ²_7(0.1) = 12.02, so 0.1 < p < 0.2.
There is no evidence to reject H0, i.e. the times to failure follow an exponential distribution. △

Exercise 7.9
A survey of smoking habits in a sixth form sampled 50 boys and 40 girls at random, and the frequencies were noted in the following contingency table:

              Non-smokers    Light smokers    Heavy smokers    Total
    Boys              16               20               14        50
    Girls             24               10                6        40
    Total             40               30               20        90

Is there evidence of differences between the sexes? We are comparing two distributions (over smoking habits), so the test is one of similarity.

Solution
Table of expected frequencies, where for example 22.2 = 50 × 40/90:

              Non-smokers    Light smokers    Heavy smokers    Total
    Boys            22.2             16.7             11.1        50
    Girls           17.8             13.3              8.9        40
    Total           40               30                20         90

Test statistic:

    X² = Σ (O_k − E_k)²/E_k = (16 − 22.2)²/22.2 + (20 − 16.7)²/16.7 + (14 − 11.1)²/11.1
         + (24 − 17.8)²/17.8 + (10 − 13.3)²/13.3 + (6 − 8.9)²/8.9 = 7.06.

Number of degrees of freedom: ν = (2 − 1)(3 − 1) = 2.
p = P(χ²_2 > 7.06). From Table 8: χ²_2(0.05) = 5.991 and χ²_2(0.025) = 7.378, so 0.025 < p < 0.05.
There is moderate evidence to reject H0, i.e. smoking habits differ between the sexes. △
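Exercise 7.8 can also be reproduced in R; note that the degrees of freedom must be reduced by hand for the estimated parameter λ, since chisq.test() does not make that adjustment automatically:

```r
breaks <- seq(0, 400, by = 50)
p      <- diff(c(pexp(breaks, rate = 1/95), 1))    # 9 class probabilities, last class is > 400
O      <- c(208, 112, 75, 40, 30, 18, 11, 6, 0)
E      <- 500 * p
X2     <- sum((O - E)^2 / E)                       # approx. 11
pchisq(X2, df = 9 - 1 - 1, lower.tail = FALSE)     # between 0.1 and 0.2, as found from the tables
```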