References

Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, 3rd Edition, Addison-Wesley, Boston: Chapter 8, Sections 1, 2, and 4.
Bernard W. Lindgren, Statistical Theory, 3rd Edition, Macmillan, New York: Chapter 6, Section 1.

An Overview of Hypothesis Testing

Assume we have an estimator t(X) of an unknown parameter θ. A null hypothesis is chosen. This is simply a statement about the numerical value of the unknown parameter. The objective is to test the validity of the null hypothesis. Since the estimator is itself a random variable, its value will generally differ from the hypothesized value. This is to be expected. But when is a difference "large enough" to be construed as statistical evidence against the hypothesized value? The ex-ante probability of the observed sample statistic, t(X), is computed under the hypothesized value of θ. If this probability is "unusual" or "extreme" in the sense of falling below some threshold, then the sample value of t(X) is considered to be evidence against the null hypothesis, and the null is rejected. Otherwise, we accept the null hypothesis. Some authors replace the word "accept" in the previous sentence with the phrase "fail to reject." I do not have strong preferences regarding this terminological dispute, provided that you understand that acceptance of the null hypothesis is not proof of its validity, in the same way that rejection of the null hypothesis is not proof of its invalidity. You cannot generate proof from statistics (except for trivial problems).

The Null Hypothesis

In application, the null hypothesis is often of simple form. A simple hypothesis is one that allows only a single value of the unknown parameter. This is typically written as H₀: θ = θ₀, where θ₀ denotes some fixed value of the unknown parameter θ. In the context of classical regression, the most common simple null hypothesis is of the form β_j = 0. This hypothesis states that the regressor X_j may be omitted from the regression. If we reject the null hypothesis, we say that β_j is "significantly different than zero." Many authors truncate this phrase to "significant." In some cases, the simple null hypothesis will be of the form β_j = c for some constant c. For example, if we are estimating a demand equation, we might want to test whether the price elasticity of demand is unitary. In this example, if we reject the null hypothesis, we say that the price elasticity of demand is significantly different than one.

There are cases where the null hypothesis is composite. A composite hypothesis allows more than one value of the unknown parameter (typically an interval). There are many examples we could consider, but one such example is H₀: θ ≤ θ₀. In the context of classical regression, a composite null hypothesis is often of the form β_j ≤ 0 or β_j ≥ 0. For example, if we are estimating a demand equation, we might want to test whether the income elasticity is positive or negative. If the null hypothesis β_j ≤ 0 is rejected, we say that β_j is "significantly positive." If the null hypothesis β_j ≥ 0 is rejected, we say that β_j is "significantly negative." Note that "significantly positive" is different than "positive and significant(ly different than zero)," although many authors do not understand this distinction and use the phrases interchangeably. Likewise, "significantly negative" is different than "negative and significant(ly different than zero)." Both points are illustrated in the sketches that follow.
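To make the regression case concrete, here is a minimal Python sketch of the two-sided t test of H₀: β_1 = 0 in a simple regression. The simulated data, sample size, and 5% level are hypothetical choices for illustration and are not taken from the notes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: simple regression y = b0 + b1*x + error.
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.3 * x + rng.normal(0, 1, n)

# OLS estimates via the normal equations.
X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
s2 = resid @ resid / (n - 2)                          # error variance estimate
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())  # standard errors

# Test H0: beta_1 = 0 against the two-sided alternative at the 5% level.
t_stat = b[1] / se[1]
t_crit = stats.t.ppf(0.975, df=n - 2)
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(f"t = {t_stat:.2f}, critical value = {t_crit:.2f}, p = {p_val:.4f}")
print("reject H0" if abs(t_stat) > t_crit else "accept (fail to reject) H0")
```

Here the rejection rule {t : |t| > t_crit} is exactly a critical region in the sense discussed below: sample values of the statistic judged too unlikely under the null.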
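The distinction between "significantly positive" and "significant" can also be seen numerically. In the sketch below, the t statistic and degrees of freedom are hypothetical; the point is that the one-sided null β_j ≤ 0 can be rejected at the 5% level even when the two-sided null β_j = 0 cannot.

```python
from scipy import stats

t_stat, df = 1.80, 48   # hypothetical t statistic and degrees of freedom

# One-sided test of H0: beta_j <= 0; rejection means "significantly positive".
p_one_sided = stats.t.sf(t_stat, df)

# Two-sided test of H0: beta_j = 0; rejection means "significantly different than zero".
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

print(f"one-sided p = {p_one_sided:.3f}")   # ~0.039: reject at the 5% level
print(f"two-sided p = {p_two_sided:.3f}")   # ~0.078: cannot reject at the 5% level
```

In this case the coefficient is "significantly positive" but not "positive and significant(ly different than zero)": the two phrases answer different questions.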
Finally, the term "null space" refers to the subset of the parameter space that is specified by the null hypothesis. The "alternative space" is best defined as the complement of the null space. In this way, the null and alternative spaces partition the parameter space into disjoint subsets. Absent specific knowledge that allows us to eliminate portions of the parameter space from consideration, failure to define the alternative space in the prescribed manner can overstate the reported power of the test.

The Critical Region

The critical region, denoted C, is defined as the subset of the sample space for which the null hypothesis is rejected. That is,

C = { t(X) | reject the null }

These are values of the estimator that are judged to be "so unlikely" under the null hypothesis that we feel compelled to reject the null. The subjectivity of this statement is clear. In this course, we will not consider the question of how to find an optimal critical region. This material is available in many mathematical statistics textbooks under the topic "uniformly most powerful tests." Instead, we will focus on the intuitive relationship between the form of the hypothesis pair and the form of the corresponding critical region. Eventually, we will discuss the Likelihood Ratio critical region, which provides an intuitive method of finding a critical region for many different structures. The LR critical region typically has desirable characteristics and is often of optimal form.

The Power Function

In order to measure the performance of a critical region (or test), we must introduce the concept of the power function. The power function, π(C:θ), gives the probability that the sample statistic falls in the critical region, C, as a function of the unknown parameter, θ. That is,

π(C:θ) = P_θ[ t(X) ∈ C ]

The power function gives the probability of rejecting the null hypothesis for alternative values of the unknown parameter. Note that the form of the power function depends on the form of the critical region C and the distribution of t(X), while the numerical value of the power function depends on the value of the unknown parameter θ.

The process of hypothesis testing admits two types of errors. A Type I error occurs when the null hypothesis is rejected when it is true. A Type II error occurs when the null hypothesis cannot be rejected when it is false. The ideal power function would take the value 1 for all θ in the alternative space, and the value 0 for all θ in the null space. In this way, the probabilities of Type I and Type II errors would always be zero. Obviously, the ideal power function cannot be attained by any reasonable statistical test, since it implies a random process yielding valid conclusions with certainty.

The power function may be used to define two summary measures of the performance of a test. The α level of a test is the supremum (the least upper bound, or maximum when it exists) of the power function over the null space. That is,

α = sup_{θ ∈ H₀} π(C:θ)

The α level is the largest probability of Type I error. The β level of a test is one minus the infimum (the greatest lower bound, or minimum when it exists) of the power function over the alternative space. That is,

β = 1 − inf_{θ ∈ H_A} π(C:θ)

The β level is the largest probability of Type II error. The relationship between the states of nature and statistical decisions is summarized in the following table.

            |  H₀ True           |  H₀ False
Reject H₀   |  Type I Error (α)  |  No Error (1−β)
Accept H₀   |  No Error (1−α)    |  Type II Error (β)
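These definitions are easy to compute in a simple setting. As an illustration, the sketch below evaluates the power function of a one-sided test and recovers its α level as the supremum of the power over the null space; the setting (a normal mean with known variance, σ = 1, n = 25, 5% level) is a hypothetical choice, not an example from the notes.

```python
import numpy as np
from scipy import stats

# Hypothetical setting: X_i ~ N(theta, sigma^2) with sigma known; test
# H0: theta <= 0 against HA: theta > 0 using the sample mean t(X) = Xbar.
sigma, n, alpha = 1.0, 25, 0.05
se = sigma / np.sqrt(n)

# Critical region C = { xbar : xbar > c }, with c chosen so the test has level alpha.
c = stats.norm.ppf(1 - alpha) * se

def power(theta):
    """pi(C:theta) = P_theta[ t(X) in C ] when the true mean is theta."""
    return stats.norm.sf((c - theta) / se)

for th in np.linspace(-0.5, 1.0, 7):
    print(f"theta = {th:5.2f}   power = {power(th):.3f}")

# alpha level: supremum of the power over the null space, attained at theta = 0.
print(f"alpha level = {power(0.0):.3f}")    # 0.050
```

Because this power function is increasing in θ, its supremum over the null space occurs at the boundary value θ = 0, which is why the critical value is constructed so that π(C:0) = α.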
The power function may be used to define another important property of statistical tests. A test is unbiased if

sup_{θ ∈ H₀} π(C:θ) ≤ inf_{θ ∈ H_A} π(C:θ)

This property can be stated less precisely as follows: a test is unbiased if its power on the null space is always less than its power on the alternative space. An unbiased critical region has the intuitively pleasing property that the probability of rejecting the null is always greater when the null is false than when it is true.
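Continuing the hypothetical one-sided z test from the previous sketch, unbiasedness can be checked numerically by comparing the supremum of the power over the null space with the infimum over the alternative space. Both are approximated here on finite grids; the exact sup and inf are limits at the boundary θ = 0.

```python
import numpy as np
from scipy import stats

sigma, n, alpha = 1.0, 25, 0.05
se = sigma / np.sqrt(n)
c = stats.norm.ppf(1 - alpha) * se          # same critical region as before

def power(theta):
    return stats.norm.sf((c - theta) / se)

null_grid = np.linspace(-2.0, 0.0, 2001)    # H0: theta <= 0
alt_grid = np.linspace(1e-6, 2.0, 2001)     # HA: theta > 0

print(f"sup over null space ~ {power(null_grid).max():.4f}")   # ~0.05 = alpha
print(f"inf over alt space  ~ {power(alt_grid).min():.4f}")    # ~0.05 as well
# sup over the null space <= inf over the alternative space,
# so this one-sided z test is unbiased.
```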