YALE School of Management
EMBA MGT511- HYPOTHESIS TESTING AND REGRESSION
K. Sudhir
Sessions 1 and 2
Hypothesis Testing
1. Introduction to Hypothesis Testing
A hypothesis is a statement about the population. The hypothesis (statement about the
population) may be true or false.
Examples of Hypothesis:
1. The average salary of SOM MBA students who finished their MBAs in 2001 is
$110,000.
2. The proportion of 2001 SOM MBA graduates who had jobs at the time of graduation is
0.9.
For these hypotheses about the population (SOM MBA students who finished their
MBAs in 2001), it is easy to verify whether the statement is true by asking every student
who is graduating (i.e., take a census) their starting salaries as well as whether they had a
job at the time of graduation. In that case, we can categorically say whether the hypothesis
is true or false.
In most practical situations, however, it is not possible to conduct a census. Suppose we
had not collected the data from the students at the time of graduation, but now need this
information. We could send a survey to these alumni and ask them. It is likely that only a
fraction of the alumni would respond. Assuming we obtained a representative set of
responses, we could still use this sample to assess whether our statement about the
population is true or false. However, we have to recognize that there is likely to be some
probability with which we could make errors, because the sample mean would be
different from the population mean.
The approach of using sample data to assess whether a hypothesis is true or false is the
essence of hypothesis testing.
There are two types of hypotheses: the null and alternative (research) hypothesis.
The null hypothesis is usually the default belief about the population parameter.
The alternative hypothesis reflects a research claim.
Consider the following null and alternative hypotheses.

Null                                            Alternative
Defendant is innocent                           Defendant is guilty
Machine is “in control”                         Machine is “out of control”
(working according to specs)
The new drug is no better than a placebo        The new drug is better than a placebo
The portfolio manager’s performance is          The portfolio manager’s performance
equal to the S&P 500 performance                exceeds the S&P 500 performance
In hypothesis testing, the null hypothesis is assumed as the default unless the evidence is
strong enough to claim support for the alternative. It is a method of “proof by
contradiction.” More precisely, it offers proof of the alternative hypothesis by
contradicting the null hypothesis.
1. Defendant is assumed innocent until proven guilty
2. Machine is assumed to be “in control” until the evidence suggests otherwise.
3. A new drug is assumed to be ineffective unless it is shown to be better than a
placebo (or some other benchmark).
4. A portfolio manager is assumed to perform no better than the S&P 500 unless…
We use the weight of the evidence from the sample to see if we can reject the null
hypothesis. If the weight of the sample evidence is such that the null hypothesis is
unlikely, we reject the null.
Type I and Type II Errors
Since we use sample evidence to reject or not reject the null hypothesis, we face the
possibility that there will be some errors in the decisions due to sampling variation. There
are two types of errors: Type I and Type II Errors.
A Type I error occurs if the null hypothesis is true, but it is rejected. This is akin to
convicting an innocent defendant.
A Type II error occurs if the null hypothesis is false, but it is not rejected. This is akin to
acquitting a guilty defendant. These errors are well summarized in the table below:
                     Ho is True           Ho is False
Reject Ho            Type I error         Correct decision
Do not reject Ho     Correct decision     Type II error
We want to keep both types of errors to a minimum. However, for a constant sample size,
these errors are negatively correlated. Reducing one increases the likelihood of the other.
Carrying forward the innocent-guilty example: A low probability of convicting someone
who is innocent (Type I Error) implies a very high threshold for conviction of any
defendant. But such a high threshold implies that you are likely to acquit a guilty
defendant (Type II Error). So when you set the acceptable level of one type of error, you
automatically set the level for the other type of error, unless you change the sample size.
In practice, we typically control for Type I error. We often allow for a 5% level of Type I
error. But the level of error is in fact a managerial decision when doing hypothesis
testing. If Type I Error is more costly than Type II error, then managers will want to keep
Type I error to a minimum. If Type II error is more costly than Type I error, then managers
may accept higher Type I errors.
Practice Exercise: Think of situations where Type I errors may be costlier than Type II
errors and vice versa.
2. The Hypothesis Testing Process
Hypothesis testing involves a series of steps as shown by the following example problem.
Step 1: Defining the Null and Alternative Hypotheses
A machine is designed to fill bottles with an average content of 12 oz. and a standard
deviation of 0.1 oz. Periodically, random samples are taken to determine if the machine
might be “out of control”. Define the null and alternative hypotheses for this problem
using symbols and in words.
Note: When defining the hypotheses, it is critical to define the population precisely.
Null Hypothesis:
Ho: μ = 12 oz.
The true average content of all bottles filled by the machine during the time period of
sampling is 12 oz. (note the precise definition of the population of interest)
Alternative Hypothesis:
HA: μ ≠ 12 oz.
The true average content of all bottles filled by the machine during the time period of
sampling is not 12 oz.
Step 2: Specify the appropriate probability of Type I Error (alpha)
Since we use sample evidence to reject or not reject the null hypothesis, we face the
possibility of errors in the decisions due to sampling variation. There are two types of
errors: Type I and Type II Errors. Recall that, for any given sample size, reducing the
probability of Type I error increases the likelihood of Type II error. We typically control
(minimize) the probability of Type I error.
                     Ho is True           Ho is False
Reject Ho            Type I error         Correct decision
Do not reject Ho     Correct decision     Type II error
The implication of specifying a 5% type I error probability (alpha=0.05) is that we will
reject the null hypothesis (Ho) even when it is true 5% of the time. Thus, the sample
outcomes for which we reject Ho are determined under the assumption that Ho is true.
We make our decision to reject or not reject the null as follows. If the sample outcome is
among the 5% least likely outcomes assuming that the Null hypothesis is true, then we
reject the null. The basic logic of this decision is that the 5% least likely outcomes
under the null hypothesis are more likely to occur if Ho is false.
Note the similarity with a court procedure. A defendant is charged with a crime. The
judge and jury assume the defendant is innocent until proven otherwise. If the sample
evidence presented by the prosecutor is very unlikely to occur under the assumption that
the defendant is innocent, the decision of guilt is favored. Thus, the decision is based on
inconsistency (highly unlikely evidence to occur under the assumption of innocence)
between the evidence and the assumption of innocence.
How do we know what are the 5% least likely outcomes? The sampling distribution of
the sample statistic of interest will help us to identify the least likely (most extreme)
outcomes. So the next step is to set up an appropriate test statistic for this problem and
identify what values of the statistic cause us to reject the null hypothesis.
Step 3: Defining and Justifying the Relevant Test Statistic
In the above problem, we know the population standard deviation (σ). Given random
sampling, the sample mean Ȳ will tend to follow a normal distribution with mean equal
to the population mean (μ) and standard deviation σ/√n. It is conceivable that the
content of individual bottles is actually normally distributed. But if it is not, the Central
Limit Theorem allows us to claim that the theoretical distribution of all possible sample
means, of a given sample size, will tend to a bell-shaped curve (as the sample size
increases). Given this knowledge of the distribution, we can find out what the 5% least
likely values of Ȳ are. However, it is conventional to specify the rejection region in terms
of extreme values for the standardized test statistic. If the null hypothesis is true, then the
population mean should be μ₀. So we specify the rejection region in terms of
standardized units, i.e. we follow the usual procedure of standardizing the variable of
interest when we need to compute probabilities.

Z = (Ȳ − μ₀) / σ_Ȳ,  where σ_Ȳ = σ/√n
Step 4: Determining the Rejection Region
Having defined the test statistic, we now decide for what values of the test-statistic we
reject the null hypothesis. The selection of the rejection region depends on whether the
hypotheses are stated as one- or two-tailed tests.
[Figure: standard normal distribution with the two-tailed rejection region. 95% of values lie in the null acceptance region between −1.96 and 1.96; 2.5% of values lie in each rejection tail.]
For a two-tailed test, the five percent of the least likely values are split equally between
the two tails as in the figure above. So Z ≤ −1.96 and Z ≥ 1.96 are the 5% least likely
values if the null hypothesis is true. Therefore, we will reject Ho if the computed Z-value
based on the sample data is in this rejection region.
Steps 5 and 6: Computing the test statistics and drawing statistical and managerial
conclusions
Exercise: For the above bottling machine problem, suppose a simple random sample of
100 bottles is taken and the sample mean is 11.982 oz. Is this sample result among the
five percent least likely to occur under the null hypothesis (Ho)?
From the null hypothesis, μ₀ = 12.

Z = (Ȳ − μ₀) / σ_Ȳ = (11.982 − 12) / (0.1/√100) = −0.018 / 0.01 = −1.8

Since Z ≤ −1.96 and Z ≥ 1.96 is the rejection region with the 5% least likely values, this
computed value from the sample does not fall in the rejection region. Hence we cannot
reject the null hypothesis (statistical conclusion).
Strictly speaking, we conclude that the machine is functioning according to bottling
content specifications at the time the sample was taken (managerial conclusion).
However, one might argue that it is possible for the machine to be “out of control” but
that we have insufficient evidence to reject Ho at the 5% level. Compare a jury’s verdict
of “not guilty,” which does not establish innocence.
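To make the steps concrete, here is a minimal sketch in Python (scipy is assumed to be available; the numbers are those of the exercise above) that computes the Z statistic and checks it against the two-tailed rejection region:

```python
from scipy.stats import norm

mu_0 = 12.0      # hypothesized population mean (oz), from Ho
sigma = 0.1      # known population standard deviation (oz)
n = 100          # sample size
y_bar = 11.982   # observed sample mean (oz)

# Z = (Ybar - mu_0) / (sigma / sqrt(n))
z = (y_bar - mu_0) / (sigma / n ** 0.5)   # -1.8

# Two-tailed critical value at alpha = 0.05
z_crit = norm.ppf(1 - 0.05 / 2)           # about 1.96

if abs(z) >= z_crit:
    print(f"Z = {z:.2f}: reject Ho")
else:
    print(f"Z = {z:.2f}: cannot reject Ho")   # here |Z| = 1.8 < 1.96
```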
Summary of Steps in Hypothesis Testing
1. Identify the appropriate null and alternative hypotheses (Ho and HA). Be precise about
the interpretation of hypotheses, by carefully identifying the population of interest.
2. Choose an acceptable probability of Type I error (alpha).
3. Define a relevant test statistic. Justify it.
4. Determine the rejection region (for what values of the test statistic will Ho be
rejected?).
5. Collect the data (in practice, we need to consider the proper sample size so that the
probability of a type II error is controlled), compute sample results and calculate the
test statistic value.
6. Draw statistical and managerial conclusions.
P-Values
An interesting issue with deciding on the level of Type I error is: who should decide what
type I error probability is tolerable? What if the person conducting the test does not know
the decision maker’s tolerance?
One solution to this problem is the following. Instead of testing H0 at a specified Type I
error probability (alpha), we can report the probability of a Type I error if H0 is rejected.
This probability is called the p-value.
In the example above, what is the probability of Type I error if Ho is rejected?
We can answer this directly by looking at what fraction of the values of Z lies below −1.8
and above +1.8.
[Figure: standard normal distribution with 3.6% of values below −1.8 and 3.6% of values above 1.8, marking the two tails.]
We can do this either by looking at the normal tables in a statistics textbook or using
Excel.
In Excel, the function =NORMSDIST(Z) (Hint: this is short for the Normal Standardized
Distribution) can be used to find P(X<Z). Plugging this function into Excel tells us
that P(X<−1.8) is 0.036; i.e., 3.6% of the values lie below −1.8.
Since the rejection region includes both P(X<−1.8) and P(X>1.8), the probability of Type
I error will be 0.036 × 2 = 0.072. Hence the p-value is 0.072.
Question: What should we do to compute the p-value when Z is positive?
When Z is positive we need to compute P(X>Z). Since P(X>Z) = 1 − P(X<Z), simply
compute 1 − NORMSDIST(Z) when Z is positive and get the p-value by multiplying that
number by 2.
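Outside Excel, the same lookup can be done in Python; a minimal sketch in which scipy's norm.cdf plays the role of =NORMSDIST:

```python
from scipy.stats import norm

z = -1.8

lower_tail = norm.cdf(z)           # P(X < -1.8), about 0.036
p_value = 2 * norm.cdf(-abs(z))    # doubles the tail area; works for positive or negative z

print(f"P(X < {z}) = {lower_tail:.3f}")   # 0.036
print(f"p-value = {p_value:.3f}")         # 0.072
```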
3. One-tailed versus Two-tailed tests
Two-tailed test:
Recall: In the example in the previous section, we conducted a two-tailed test. The null
and the alternative hypotheses were:
Null Hypothesis:
Ho: μ = 12 oz.
Alternative Hypothesis:
HA: μ ≠ 12 oz.
We rejected the null if Z ≤ −1.96 or Z ≥ 1.96. This implies that the person conducting
the test wants to stop the machine if it is out of control in either direction. That is, there is
a cost associated with having an excessive amount of liquid as well as with an insufficient
amount in the bottles.
One-tailed test:
Suppose in the machine-bottling problem, we take the perspective of a distributor, who
does not care if the bottles truly have more than 12 oz. on average (or the bottles have a
maximum capacity of 12 oz). That is, the distributor is only concerned about insufficient
content on average. So the distributor takes samples out of batches of items and returns
the entire batch if the hypothesis test shows that the contents are less than 12 oz on
average. Now the null and the alternative hypotheses are:
Null Hypothesis:
Ho: μ ≥ 12 oz.
Alternative Hypothesis:
HA: μ < 12 oz.
For a one-tailed test, the alternative hypothesis expresses the values of the parameter for
which we want to reject the null hypothesis. If we create mutually exclusive and
collectively exhaustive hypotheses, the null hypothesis must then be the complement of
the alternative. Note that it is usually easier, in practice, to start with the alternative
hypothesis. It represents what management is concerned about or what a researcher might
believe based on theory. The test proceeds under the assumption that the equality case
under the null hypothesis applies. In other words, the machine is assumed to be in control,
i.e. the true average is assumed to be 12 oz. But now only the 5% extreme cases in the left
tail will result in rejection of the null hypothesis.
As before, we use the Z statistic because we know the population standard deviation. As
argued above, the Z-statistic is still computed at the boundary of the null hypothesis (12
oz).
Z = (Ȳ − μ₀) / σ_Ȳ = (11.982 − 12) / (0.1/√100) = −0.018 / 0.01 = −1.8

Since now all of the extreme values for rejection are concentrated in the left tail, the 5%
rejection region is for computed Z ≤ −1.645. (See the figures below.)
The interesting finding is that for the same test result, we cannot reject the null hypothesis
at the 5% level if the test is two-tailed but we can if it is one-tailed.
Therefore, for the one-tailed test, we reject the null hypothesis. (Statistical Conclusion)
The batch of bottles received by the distributor from this manufacturer has average
contents lower than the specified 12 oz and therefore must be returned to the
manufacturer. (Managerial Conclusion)
Note that the two-tailed test is more conservative in rejecting the null hypothesis. While
the z-score needs to be below −1.96 to reject the null in the two-tailed test, it needs to be
only below −1.645 to reject the null in the one-tailed test. In this case, the distributor thus
rejects the null hypothesis with a lower threshold of evidence than the manufacturer.
[Figure: one-tailed test with the 5% rejection region in the right tail, beyond Z = 1.645; 95% of values lie in the null acceptance region.]
[Figure: one-tailed test with the 5% rejection region in the left tail, below Z = −1.645; 95% of values lie in the null acceptance region.]
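The one- and two-tailed critical values can be compared in a few lines; a minimal sketch showing that the same Z = −1.8 falls inside the one-tailed rejection region but outside the two-tailed one:

```python
from scipy.stats import norm

z = -1.8          # computed test statistic from the sample
alpha = 0.05

z_two = norm.ppf(1 - alpha / 2)   # 1.96: alpha split across both tails
z_one = norm.ppf(1 - alpha)       # 1.645: all of alpha in the left tail here

print(f"two-tailed: reject if |Z| >= {z_two:.3f} -> {abs(z) >= z_two}")   # False
print(f"one-tailed: reject if Z <= {-z_one:.3f} -> {z <= -z_one}")        # True
```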
4. What are appropriate test-statistics for hypothesis testing?
The choice of the appropriate test-statistic is critical in hypothesis testing. In the example
above we were testing a hypothesis about a population mean. We used a Z-statistic,
because we knew the population standard deviation and we knew Ȳ was normally
distributed. Even if Y is not normally distributed, we know by the Central Limit Theorem
that Ȳ will be approximately normally distributed for sufficiently large samples. Therefore we can use
the Z-statistic. However, if the population standard deviation is not known, we need to
use a t-statistic instead to compensate for additional uncertainty due to the use of the
sample standard deviation instead of the population value. Note that we still require the
theoretical distribution of all possible sample means to be normal.
We now discuss appropriate test statistics for means and proportions under different
conditions.
Test of One Mean
Population Standard Deviation is known:
H0: μ = μ₀
HA: μ ≠ μ₀
Condition: If (1) simple random sampling,
(2) Ȳ is normally distributed (because Y is normal or the Central Limit
Theorem applies), and
(3) σ is known,
then use the Z statistic:

Z = (Ȳ − μ₀) / σ_Ȳ,  where σ_Ȳ = σ/√n

(assuming N is large so that we can ignore the finite population correction factor)
Population Standard Deviation is unknown
Condition: If (1) simple random sampling,
(2) Ȳ is normally distributed (because Y is normal or the Central Limit
Theorem applies), and
(3) σ is unknown,
then use the t statistic:

t = (Ȳ − μ₀) / s_Ȳ  with (n−1) df,  where s_Ȳ = s/√n

(assuming N is large; re: finite population correction)
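As an illustration, when σ is unknown the t-test can be run directly from raw data. A minimal sketch using scipy's ttest_1samp; the sample values below are hypothetical:

```python
from scipy.stats import ttest_1samp

# Hypothetical sample of bottle contents (oz); population sigma unknown
sample = [11.98, 12.03, 11.95, 12.01, 11.97, 12.00, 11.96, 12.02]

# Computes t = (Ybar - mu_0) / (s / sqrt(n)) with (n-1) df and a two-tailed p-value
t_stat, p_value = ttest_1samp(sample, popmean=12.0)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
```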
Test of one proportion
H0: π = π₀
HA: π ≠ π₀
Condition: If (1) simple random sampling,
(2) π̂ is approximately normally distributed (for large n),
then we can use the Z statistic:

Z = (π̂ − π₀) / σ_π̂,  where σ_π̂ = √( π₀(1 − π₀) / n )

However, this statistic can be used only if the following conditions are satisfied:
nπ₀ > 5 and n(1 − π₀) > 5
Example:
John Rowland and Bill Curry are candidates for CT Governor. We are interested in
knowing who is likely to win the election on Nov 5, 2002. A survey of 900 CT “likely
voters” on October 30, 2002 asked whom they intended to vote for on Election Day. 56% of
respondents said they intended to vote for Rowland and 44% for Bill Curry. Test the
hypothesis that one of the candidates is more likely to win the election.
Let π be the proportion of likely voters who intend to vote for Bill Curry on Nov 5, as of
October 30. (You could just as well have written the hypothesis in terms of voters
intending to vote for Rowland and done this test.)
Hypothesis testing
Ho: π = .5
(we assume, until proven otherwise, that Curry (and therefore Rowland)
has 50 percent of the vote among all likely voters)
HA: π ≠ .5
Random sampling:
E(π̂) = π,  Var(π̂) = π(1 − π)/n
π̂ is approximately normal if nπ and n(1 − π) are both > 5

z = (π̂ − π₀) / σ_π̂ = (π̂ − π₀) / √( π₀(1 − π₀) / n )
At α = .05, reject Ho if Z (calculated) ≥ 1.96 or ≤ −1.96.

Suppose n = 900 and π̂ = 0.44 (this would be 0.56, if you had written π in terms of
votes for Rowland).
Calculate the z-value:

z = (.44 − .50) / √( (.5 × .5) / 900 ) = −.06 / .0167 = −3.59
Since -3.59 < -1.96, reject Ho
Or, calculate the p-value:
Prob [Z ≤ −3.59] < .001
p-value < .001 × 2 = .002
Since the p-value is less than .05, reject Ho.
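The entire test maps to a few lines of code. A minimal sketch with the survey numbers above (scipy assumed):

```python
from math import sqrt
from scipy.stats import norm

n, pi_hat, pi_0 = 900, 0.44, 0.50

# Normal approximation requires n*pi_0 > 5 and n*(1 - pi_0) > 5
assert n * pi_0 > 5 and n * (1 - pi_0) > 5

se = sqrt(pi_0 * (1 - pi_0) / n)   # about 0.0167
z = (pi_hat - pi_0) / se           # about -3.6

p_value = 2 * norm.cdf(-abs(z))
print(f"z = {z:.2f}, p-value = {p_value:.4f}")   # well below .05: reject Ho
```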
Test of Two Samples
Independent versus paired samples
1. Suppose we wish to test which of two ads consumers like better. We can use either
independent samples or paired samples. If we use independent samples, then we can show
the two ads to two different samples of consumers and ask them to rate the ads. We can
then compare the average ratings of the two ads in performing the hypothesis test.
Alternatively, we can take one sample of consumers and show both ads to the consumers.
We can then compare the ratings of each individual for the two ads. This would be an
example of a paired sample.
2. Suppose we want to test whether married men or women are happier. Here again we
could use independent samples or paired samples. If we use independent samples, then
we can ask a sample of married men and an independent sample of married women about
their happiness. We can then compare the average happiness ratings of the two groups in
performing the hypothesis test.
Alternatively, we can pick married couples and ask both the men and the women about
their happiness. Then we can look at the difference in happiness reported by each man
and his wife. We can test whether this difference (one number for each couple) is
significantly different from zero. This would be a paired sample test.
Independent samples: each sample is randomly drawn from a separate population, and
there is no linkage between successive draws.
Paired samples: the population of interest is defined in terms of pairs in such a way that
the paired observations have something in common.
Comparing Means for Paired Samples: Population Standard Deviation is unknown
Paired samples: As discussed earlier, the population of interest is defined in terms of
pairs in such a way that the paired observations have something in common.
Note that in these examples the pairs can either be two different individuals or two
different measures on each individual.
Given random sampling, E(Ȳ₁ − Ȳ₂) = μ₁ − μ₂.
With paired samples,
Var(Ȳ₁ − Ȳ₂) = Var(Ȳ₁) + Var(Ȳ₂) − 2Cov(Ȳ₁, Ȳ₂)
Thus, if there is a positive covariance, pairing reduces the variance of the difference
between the sample means.
Instead of accommodating the covariance, we can create differences between the paired
observations; the greater the positive covariance between Y1 and Y2, the smaller the
variance of the difference (compared to the variance of Y1 and the variance of Y2).
In fact, for paired samples, we take the differences and then perform the single-sample
hypothesis test on the differences.
If Y1 and Y2 are a paired sample, then create a difference variable YD = Y1 – Y2.
Given the following null and alternative hypotheses:
Null: H0: μ_D = D₀ (true average difference in the population is D₀)
Alternative: HA: μ_D ≠ D₀
If random sampling, and if Ȳ_D is normally distributed (conditions?), then

t = (Ȳ_D − D₀) / s_(Ȳ_D)  with (n−1) df, where n is the number of pairs
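A minimal sketch of the paired procedure: scipy's ttest_rel is equivalent to taking differences and running the one-sample t-test on them. The ratings below are hypothetical, echoing the two-ads example:

```python
from scipy.stats import ttest_rel

# Hypothetical paired data: each consumer rates both ads
ad1 = [7, 5, 8, 6, 7, 9, 6, 8]
ad2 = [6, 5, 7, 4, 6, 8, 6, 7]

# Equivalent to a one-sample t-test on the differences ad1 - ad2, with (n-1) df
t_stat, p_value = ttest_rel(ad1, ad2)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
```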
Comparing Means for Independent Samples: Population Standard Deviation is known
Two samples, sizes n₁ and n₂.
We are interested in the difference between, say, two population means, μ₁ − μ₂.
Define two random variables, Ȳ₁ and Ȳ₂.
Given random sampling, E(Ȳ₁ − Ȳ₂) = μ₁ − μ₂.
With independent sampling, Var(Ȳ₁ − Ȳ₂) = Var(Ȳ₁) + Var(Ȳ₂) = σ₁²/n₁ + σ₂²/n₂.

Hence σ_(Ȳ₁−Ȳ₂) = √( σ₁²/n₁ + σ₂²/n₂ )

Given the following null and alternative hypotheses:
Null: H0: μ₁ − μ₂ = D₀
Alternative: HA: μ₁ − μ₂ ≠ D₀
If normality applies, then the test statistic is

Z = ( (Ȳ₁ − Ȳ₂) − D₀ ) / σ_(Ȳ₁−Ȳ₂)

If Ho: μ₁ − μ₂ = 0, then

Z = (Ȳ₁ − Ȳ₂) / σ_(Ȳ₁−Ȳ₂) = (Ȳ₁ − Ȳ₂) / √( σ₁²/n₁ + σ₂²/n₂ )
Independent Samples: Population Standard Deviation is unknown
Everything else is the same as above, but if σ₁ and σ₂ are unknown, replace them with s₁
and s₂ and compute the t-statistic.
Two samples, sizes n₁ and n₂.
We are interested in the difference between, say, two population means, μ₁ − μ₂.
Define two random variables, Ȳ₁ and Ȳ₂.
Given random sampling, E(Ȳ₁ − Ȳ₂) = μ₁ − μ₂.
With independent sampling, Var(Ȳ₁ − Ȳ₂) = Var(Ȳ₁) + Var(Ȳ₂), estimated by s₁²/n₁ + s₂²/n₂.

Hence s_(Ȳ₁−Ȳ₂) = √( s₁²/n₁ + s₂²/n₂ )

Given the following null and alternative hypotheses:
Null: H0: μ₁ − μ₂ = D₀
Alternative: HA: μ₁ − μ₂ ≠ D₀
If normality applies, then the test statistic is

t = ( (Ȳ₁ − Ȳ₂) − D₀ ) / s_(Ȳ₁−Ȳ₂)  with (n₁ + n₂ − 2) df

If Ho: μ₁ − μ₂ = 0, then

t = (Ȳ₁ − Ȳ₂) / s_(Ȳ₁−Ȳ₂) = (Ȳ₁ − Ȳ₂) / √( s₁²/n₁ + s₂²/n₂ )  with (n₁ + n₂ − 2) df
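For comparison, a minimal sketch of the independent-samples version with unknown standard deviations, using scipy's ttest_ind on hypothetical data. With equal_var=False, scipy uses the same standard error √(s₁²/n₁ + s₂²/n₂) as above, though it applies the Welch degrees-of-freedom correction rather than n₁ + n₂ − 2:

```python
from scipy.stats import ttest_ind

# Hypothetical independent samples: each ad shown to a different group
group1 = [7, 5, 8, 6, 7, 9, 6, 8]
group2 = [6, 4, 7, 5, 6, 7, 5, 6]

# Unpooled standard error sqrt(s1^2/n1 + s2^2/n2); Welch df rather than n1+n2-2
t_stat, p_value = ttest_ind(group1, group2, equal_var=False)
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
```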
Example:
The table below provides the salaries of MBA students at a mid-western school before
they started their MBA and after they finished their MBA. Test the hypothesis that their
salaries “after MBA” are different from their salaries “before MBA”.
            Before MBA    After MBA    After-Before    Percentage Change
                60            75            15              0.25
                40            45             5              0.13
                35            50            15              0.43
                75            90            15              0.20
                52            70            18              0.35
                35            45            10              0.29
                50            65            15              0.30
                40            55            15              0.38
                35            50            15              0.43
               140           130           -10             -0.07
                45            55            10              0.22
                35            55            20              0.57
                45            55            10              0.22
               110           110             0              0.00
               130           130             0              0.00

N               15            15            15             15
Average         62            72            10              0.25
Std Devn        36            29             8              0.18
The data represent paired samples. By taking differences, we eliminate the common
component that produces the large positive covariance. As a result, the standard deviation
of the Difference is much smaller than that of Before or After (as can be seen from the table).
If random sampling,
Ho: μ_D = 0 (true average difference in the population is zero)
HA: μ_D ≠ 0
If Ȳ_D is normally distributed (conditions?),

t = (Ȳ_D − D₀) / s_(Ȳ_D) = 10 / (8/√15) = 4.84  with (n−1) df, where n is the number of pairs
Since t(14, 0.025) = 2.14 and 4.84 > 2.14, we can reject the null.
Prob [t ≥ 4.84] < .005
So, p-value < .01
Contrast this with the result if we assumed (wrongly) that the data come from two
independent samples:

s_(Ȳ₁−Ȳ₂) = √( 36²/15 + 29²/15 ) = 11.93

t = ( Ȳ₁ − Ȳ₂ − 0 ) / s_(Ȳ₁−Ȳ₂) = 10 / 11.93 = 0.84
Since t(28, 0.025) = 2.04 and 0.84 < 2.04, with the mistaken assumption of independent
sampling we cannot reject the null. This example illustrates the importance of designing
a statistical study based on an understanding of statistical principles.
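Both calculations can be reproduced from the summary statistics in the table; a minimal sketch:

```python
from math import sqrt
from scipy.stats import t

n = 15
mean_diff, sd_diff = 10, 8     # After - Before: average and standard deviation
sd_before, sd_after = 36, 29   # per-column standard deviations

# Paired test: one-sample t on the differences, (n - 1) = 14 df
t_paired = mean_diff / (sd_diff / sqrt(n))           # about 4.84

# (Mistaken) independent-samples test, (n1 + n2 - 2) = 28 df
se_ind = sqrt(sd_before**2 / n + sd_after**2 / n)    # about 11.93
t_ind = mean_diff / se_ind                           # about 0.84

print(f"paired:      t = {t_paired:.2f} vs t(14, .025) = {t.ppf(0.975, 14):.2f}")  # reject
print(f"independent: t = {t_ind:.2f} vs t(28, .025) = {t.ppf(0.975, 28):.2f}")     # cannot reject
```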