Confidence Sets

For statistical confidence sets, the basic problem is to use a random sample X from an unknown distribution P to determine a random subfamily A(X) of a given family of distributions 𝒫 such that

    Pr_P(A(X) ∋ P) ≥ 1 − α  ∀P ∈ 𝒫,

for some given α. The set A(X) is called a 1 − α confidence set or confidence region. The "confidence level" is 1 − α, so we sometimes call it a "level 1 − α confidence set". Notice that α is given a priori. We call

    inf_{P ∈ 𝒫} Pr_P(A(X) ∋ P)

the confidence coefficient of A(X).

Confidence in Confidence Sets

If the confidence coefficient of A(X) is > 1 − α, then A(X) is said to be a conservative 1 − α confidence set. We generally wish to determine a region with a given confidence coefficient, rather than with a given significance level.

If the distributions are characterized by a parameter θ in a given parameter space Θ, an equivalent 1 − α confidence set for θ is a random subset S(X) such that

    Pr_θ(S(X) ∋ θ) ≥ 1 − α  ∀θ ∈ Θ.

Optimal Confidence Sets

A desirable property of a confidence set is that it be "small" in some sense. We will consider various criteria for optimality and then seek confidence sets that are optimal with respect to those criteria. As with other problems in statistical inference, it is often not possible to develop a procedure that is uniformly optimal. As with the estimation problem, we can impose restrictions, such as unbiasedness or equivariance.

We can also define optimality in terms of a global averaging over the family of distributions of interest. If the global averaging is considered to be a true probability distribution, then the resulting confidence intervals can be interpreted differently, and it can be said that the probability that the distribution of the observations is in some fixed family is some stated amount. (The HPD Bayesian credible regions can also be thought of as optimal sets that address similar applications in which confidence sets are used.)
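The coverage property that defines the confidence coefficient can be checked empirically. The sketch below (a minimal illustration, not part of the original development; all constants are hypothetical) simulates repeated sampling from a normal distribution and estimates the coverage of the usual nominal 95% t interval for the mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, alpha, reps = 5.0, 2.0, 20, 0.05, 2000  # hypothetical constants

covered = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, n)
    xbar, s = x.mean(), x.std(ddof=1)
    half = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
    covered += (xbar - half <= mu <= xbar + half)

coverage = covered / reps  # Monte Carlo estimate of Pr(A(X) contains the true mu)
```

Because the t interval is exact for the normal model, the estimated coverage should fluctuate around 1 − α; a conservative confidence set would instead show coverage above 1 − α.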
Approximate Confidence Sets

Because determining an exact 1 − α confidence set requires that we know the exact distribution of some statistic, we often have to form approximate confidence sets. We have previously discussed three general ways of approximate inference, and we will consider them in the context of confidence sets.

Construction and Properties

Our usual notion of a confidence interval relies on a frequency approach to probability, and it leads to the definition of a 1 − α confidence interval for the (scalar) parameter θ as the random interval [T_L, T_U] that has the property

    Pr(T_L ≤ θ ≤ T_U) = 1 − α.

This is also called a (1 − α)100% confidence interval. The interval [T_L, T_U] is not uniquely determined. A realization of the random interval, say [t_L, t_U], is also called a confidence interval.

Vector-Valued Parameters

The concept extends easily to vector-valued parameters. A simple extension would be merely to let T_L and T_U be vectors, and let the confidence region be the hyperrectangle defined by the cross products of the intervals. Rather than taking vectors T_L and T_U, however, we generally define other types of regions; in particular, we often take an ellipsoidal region whose shape is determined by the covariances of the estimators.

Interpretation

Although it may seem natural to state that the "probability that θ is in [t_L, t_U] is 1 − α", this statement can be misleading unless a certain underlying probability structure is assumed.

In practice, the interval is usually specified with respect to an estimator T(X) of θ. If we know the sampling distribution of T − θ, we may determine c1 and c2 such that

    Pr(c1 ≤ T − θ ≤ c2) = 1 − α,

and hence

    Pr(T − c2 ≤ θ ≤ T − c1) = 1 − α.

If either T_L or T_U is infinite or corresponds to a bound on acceptable values of θ, the confidence interval is one-sided. For two-sided confidence intervals, we may seek to make the probability on either side of T equal.
This is called an equal-tail confidence interval. We may, rather, choose to make c1 = −c2, and/or to minimize |c2 − c1| or |c1| or |c2|. This is similar in spirit to seeking an estimator with small variance.

Prediction Sets

We often want to identify a set in which a future observation on a random variable has a high probability of occurring. This kind of set is called a prediction set. For example, we may assume a given sample X1, . . . , Xn is from a N(µ, σ²) distribution, and we wish to determine a measurable set C(X) such that for a future observation X_{n+1},

    inf_{P ∈ 𝒫} Pr_P(X_{n+1} ∈ C(X)) ≥ 1 − α.

More generally, instead of X_{n+1}, we could define a prediction interval for any random variable Y. The difference between this and a confidence set for µ is that there is an additional source of variation; the prediction set will be larger, so as to account for this extra variation.

Prediction Sets and Tolerance Sets

We may want to separate the statements about Y and S(X). A tolerance set attempts to do this. Given a sample X, a measurable set S(X), and numbers δ and α in ]0, 1[, if

    inf_{P ∈ 𝒫} Pr_P( Pr_P(Y ∈ S(X) | X) ≥ δ ) ≥ 1 − α,

then S(X) is called a δ-tolerance set for Y with confidence level 1 − α.

Randomized Confidence Sets

For discrete distributions, as we have seen, sometimes to achieve a test of a specified size we had to use a randomized test. Confidence sets may have exactly the same problem, and the same solution, in forming confidence sets for parameters in discrete distributions: we form randomized confidence sets. The idea is the same as in randomized tests, and we will discuss randomized confidence sets in the context of hypothesis tests below.

Pivot Functions

A straightforward way to form a confidence interval is to use a function of the sample that also involves the parameter of interest, but that does not involve any nuisance parameters. The confidence interval is then formed by separating the parameter from the sample values.
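For the normal example above, the extra variation in a prediction set is explicit: X_{n+1} − X̄ has variance σ²(1 + 1/n), so the standard t-based prediction interval widens the confidence interval for µ by the factor √(1 + 1/n). A minimal sketch under that assumption (the data are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7])  # hypothetical sample
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)
tq = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Confidence interval for mu versus prediction interval for a future X_{n+1}
ci = (xbar - tq * s / np.sqrt(n), xbar + tq * s / np.sqrt(n))
pi = (xbar - tq * s * np.sqrt(1 + 1 / n), xbar + tq * s * np.sqrt(1 + 1 / n))
# The prediction interval is strictly wider, reflecting the extra source of variation.
```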
A class of functions that are particularly useful for forming confidence intervals are called pivotal values, or pivotal functions. A function f(T, θ) is said to be a pivotal function if its distribution does not depend on any unknown parameters. This allows exact confidence intervals to be formed for the parameter θ. We first form

    Pr(f_(α/2) ≤ f(T, θ) ≤ f_(1−α/2)) = 1 − α,

where f_(α/2) and f_(1−α/2) are quantiles of the distribution of f(T, θ); that is,

    Pr(f(T, θ) ≤ f_(π)) = π.

If, as in the case considered above, f(T, θ) = T − θ, the resulting confidence interval has the form

    Pr(T − f_(1−α/2) ≤ θ ≤ T − f_(α/2)) = 1 − α.

For example, suppose Y1, Y2, . . . , Yn is a random sample from a N(µ, σ²) distribution, and Ȳ is the sample mean. The quantity

    f(Ȳ, µ) = √(n(n − 1)) (Ȳ − µ) / √(Σ(Yi − Ȳ)²)

has a Student's t distribution with n − 1 degrees of freedom, no matter what the value of σ² is. This is one of the most commonly used pivotal values. The pivotal value can be used to form a confidence interval for µ by first writing

    Pr(t_(α/2) ≤ f(Ȳ, µ) ≤ t_(1−α/2)) = 1 − α,

where t_(π) is a quantile of the Student's t distribution with n − 1 degrees of freedom. Then, after making substitutions for f(Ȳ, µ), we form the familiar confidence interval for µ:

    [Ȳ − t_(1−α/2) s/√n,  Ȳ − t_(α/2) s/√n],

where s² is the usual sample variance, Σ(Yi − Ȳ)²/(n − 1).

Other similar pivotal values have F distributions. For example, consider the usual linear regression model in which the n-vector random variable Y has a N_n(Xβ, σ²I) distribution, where X is an n × m known matrix, and the m-vector β and the scalar σ² are unknown. A pivotal value useful in making inferences about β is

    g(β̂, β) = [ (X(β̂ − β))ᵀ X(β̂ − β) / m ] / [ (Y − Xβ̂)ᵀ(Y − Xβ̂) / (n − m) ],

where β̂ = (XᵀX)⁺ XᵀY. The random variable g(β̂, β), for any finite value of σ², has an F distribution with m and n − m degrees of freedom. For a given parameter and family of distributions there may be multiple pivotal values.
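The t pivot can be carried through numerically. A minimal sketch with hypothetical data; `scipy.stats.t.ppf` supplies the quantiles t_(π):

```python
import numpy as np
from scipy import stats

y = np.array([4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.5, 5.0])  # hypothetical sample
n, alpha = len(y), 0.05
ybar = y.mean()
s = y.std(ddof=1)   # s^2 = sum((y - ybar)^2) / (n - 1)

# Quantiles of the pivot's distribution: Student's t with n - 1 degrees of freedom
t_lo = stats.t.ppf(alpha / 2, df=n - 1)       # t_(alpha/2), negative
t_hi = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_(1-alpha/2)

# [ybar - t_(1-alpha/2) s/sqrt(n),  ybar - t_(alpha/2) s/sqrt(n)]
ci = (ybar - t_hi * s / np.sqrt(n), ybar - t_lo * s / np.sqrt(n))
```

By the symmetry of the t distribution, t_(α/2) = −t_(1−α/2), so this equal-tail interval is centered at ȳ.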
For purposes of statistical inference, such considerations as unbiasedness and minimum variance may guide the choice of a pivotal value to use.

Approximate Pivot Values

It may not be possible to identify a pivotal quantity for a particular parameter. In that case, we may seek an approximate pivot. A function is asymptotically pivotal if a sequence of linear transformations of the function is pivotal in the limit as n → ∞. If the distribution of T is known, c1 and c2 can be determined. If the distribution of T is not known, some other approach must be used. A common method is to use some numerical approximation to the distribution. Another method is to use bootstrap samples from the ECDF.

Relation to Acceptance Regions of Hypothesis Tests

A test at the α level has a very close relationship with a 1 − α level confidence set. When we test the hypothesis H0 : θ ∈ Θ_{H0} at the α level, we form a critical region for a test statistic, or a rejection region for the values of the observable X. This region is such that, under the null hypothesis, the probability that the test statistic is in it is ≤ α.

For any given θ0 ∈ Θ, consider the nonrandomized test T_{θ0} for testing the simple hypothesis H0 : θ = θ0 against some alternative H1. We let A(θ0) be the set of all x such that the test statistic is not in the critical region; that is, A(θ0) is the acceptance region. Now, for any θ and any value x in the range of X, we let

    C(x) = {θ : x ∈ A(θ)}.

For testing H0 : θ = θ0 at the α significance level, we have

    Pr(X ∉ A(θ0) | θ = θ0) ≤ α;

that is,

    1 − α ≤ Pr(X ∈ A(θ0) | θ = θ0) = Pr(C(X) ∋ θ0 | θ = θ0).

This holds for any θ0, so

    inf_{θ0 ∈ Θ} Pr(C(X) ∋ θ0 | θ = θ0) ≥ 1 − α.

Hence, C(X) is a 1 − α level confidence set for θ. If the size of the test is α, the inequalities are equalities, and so the confidence coefficient is 1 − α.

For example, suppose Y1, Y2, . . . , Yn is a random sample from a N(µ, σ²) distribution, and Ȳ is the sample mean.
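The bootstrap route mentioned above can be sketched as follows: resampling from the ECDF approximates the distribution of T − θ, and the quantiles of that approximation play the roles of c1 and c2. A minimal illustration (a "basic" bootstrap interval for a mean; the data and constants are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=30)   # skewed data: no exact pivot at hand
B, alpha = 2000, 0.05
T = x.mean()

# Bootstrap approximation to the sampling distribution of T - theta
boot = np.array([rng.choice(x, size=x.size, replace=True).mean() for _ in range(B)])
c1, c2 = np.quantile(boot - T, [alpha / 2, 1 - alpha / 2])

# Invert Pr(c1 <= T - theta <= c2) ~ 1 - alpha, exactly as in the known-distribution case
ci = (T - c2, T - c1)
```

The resulting interval is only approximate: its coverage approaches 1 − α as n grows, in the spirit of an asymptotic pivot.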
To test H0 : µ = µ0 against the universal alternative, we form the test statistic

    T(X) = √(n(n − 1)) (Ȳ − µ0) / √(Σ(Yi − Ȳ)²),

which, under the null hypothesis, has a Student's t distribution with n − 1 degrees of freedom. An acceptance region at the α level is [t_(α/2), t_(1−α/2)], and hence, putting these limits on T(X) and inverting, we get

    [Ȳ − t_(1−α/2) s/√n,  Ȳ − t_(α/2) s/√n],

which is a 1 − α level confidence interval. The test has size α, and so the confidence coefficient is 1 − α.

Randomized Confidence Sets

To form a 1 − α confidence level set, we form a nonrandomized confidence set (which may be null) with 1 − α1 confidence level, with 0 ≤ α1 ≤ α, and then we define a random experiment with some event that has a probability of α − α1.

Optimal Confidence Sets

We often evaluate a confidence set using a family of distributions that does not include the true parameter. For example, "accuracy" is the (true) probability of the set including an incorrect value. The "volume" (or "length") of a confidence set is the Lebesgue measure of the set:

    vol(C(x)) = ∫_{C(x)} dθ̃.

This may not be finite. If the volume is finite, we have (Theorem 7.6 in Shao)

    E_θ(vol(C(X))) = ∫_{θ̃ ≠ θ} Pr_θ(C(X) ∋ θ̃) dθ̃.

We see this by a simple application of Fubini's theorem to handle the integral over the product space, and then an interchange of integration. We want to minimize the volume (if appropriate; i.e., if it is finite), and we want to maximize the accuracy.

A 1 − α level set C(X) is uniformly most accurate (UMA) if Pr_θ(C(X) ∋ θ̃) is minimum among all 1 − α level sets, ∀θ̃ ≠ θ. This definition of UMA may not be so relevant in the case of a one-sided confidence interval. If Θ̃ is a subset of Θ that does not include θ, and

    Pr_θ(C(X) ∋ θ̃) ≤ Pr_θ(C1(X) ∋ θ̃)

for any 1 − α level set C1(X) and ∀θ̃ ∈ Θ̃, then C(X) is said to be Θ̃-uniformly most accurate. A confidence set formed by inverting a nonrandomized UMP test is UMA. We see this easily from the definitions of UMP and UMA. (This is Theorem 7.4 in Shao.)
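The test-inversion equivalence can be checked numerically: the set of null values µ0 that a level-α t test fails to reject coincides, up to grid resolution, with the t interval. A sketch using `scipy.stats.ttest_1samp`; the data are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(10.0, 3.0, size=15)   # hypothetical sample
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)

# The 1 - alpha confidence interval obtained from the t pivot
tq = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci = (xbar - tq * s / np.sqrt(n), xbar + tq * s / np.sqrt(n))

# Invert the test: collect every mu0 that the level-alpha t test fails to reject
grid = np.linspace(ci[0] - 0.5, ci[1] + 0.5, 4001)
step = grid[1] - grid[0]
accepted = np.array([m for m in grid if stats.ttest_1samp(x, m).pvalue >= alpha])
# accepted.min() and accepted.max() recover the interval endpoints to within the grid step
```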
Just as sometimes no UMP test exists, sometimes we cannot form a UMA confidence interval, so we add some criterion. We define unbiasedness in terms of a subset Θ̃ of Θ that does not include the true θ. A 1 − α level confidence set C(X) is said to be Θ̃-unbiased if

    Pr_θ(C(X) ∋ θ̃) ≤ 1 − α  ∀θ̃ ∈ Θ̃.

If Θ̃ = {θ}ᶜ, we call the set unbiased. A Θ̃-unbiased set that is uniformly more accurate ("more" is defined similarly to "most") than any other Θ̃-unbiased set is said to be a uniformly most accurate unbiased (UMAU) set.