Confidence Sets - George Mason University
Confidence Sets
For statistical confidence sets, the basic problem is to use a
random sample X from an unknown distribution P to determine
a random subfamily A(X) of a given family of distributions P
such that
Pr_P (A(X) ∋ P) ≥ 1 − α    ∀P ∈ P,
for some given α.
The set A(X) is called a 1−α confidence set or confidence region.
The “confidence level” is 1 − α, so we sometimes call it a “level
1 − α confidence set”.
Notice that α is given a priori.
We call
inf_{P∈P} Pr_P (A(X) ∋ P)
the confidence coefficient of A(X).
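As an illustration of the coverage requirement (a sketch of my own, not from the notes), the following Python snippet estimates Pr_P(A(X) ∋ P) by Monte Carlo for the usual t interval for a normal mean; numpy and scipy are assumed available, and the particular (µ, σ, n) values are arbitrary choices.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)

def t_interval(x, alpha=0.05):
    # Equal-tail t interval for the mean of assumed-normal data.
    n = len(x)
    q = t.ppf(1 - alpha / 2, df=n - 1)
    half = q * x.std(ddof=1) / np.sqrt(n)
    return x.mean() - half, x.mean() + half

# Empirical coverage: the fraction of intervals containing the true
# mean should be close to the nominal 1 - alpha = 0.95.
mu, n, reps = 3.0, 20, 5000
covered = 0
for _ in range(reps):
    x = rng.normal(mu, 2.0, size=n)
    lo, hi = t_interval(x)
    covered += lo <= mu <= hi
print(covered / reps)  # close to 0.95
```

Repeating the experiment with other (µ, σ) pairs leaves the coverage essentially unchanged, which is what the ∀P ∈ P requirement demands.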
Confidence in Confidence Sets
If the confidence coefficient of A(X) is > 1 − α, then A(X) is
said to be a conservative 1 − α confidence set.
We generally wish to determine a region with a given confidence
coefficient, rather than with a given significance level.
If the distributions are characterized by a parameter θ in a given
parameter space Θ an equivalent 1 − α confidence set for θ is a
random subset S(X) such that
Pr_θ (S(X) ∋ θ) ≥ 1 − α    ∀θ ∈ Θ.
Optimal Confidence Sets
A desirable property of a confidence set is that it be “small” in
some sense.
We will consider various criteria for optimality and then seek
confidence sets that are optimal with respect to those criteria.
As with other problems in statistical inference, it is often not
possible to develop a procedure that is uniformly optimal.
As with the estimation problem, we can impose restrictions, such
as unbiasedness or equivariance.
Optimal Confidence Sets
We can define optimality in terms of a global averaging over the
family of distributions of interest.
If the global averaging is considered to be a true probability
distribution, then the resulting confidence intervals can be interpreted differently, and it can be said that the probability that the
distribution of the observations is in some fixed family is some
stated amount.
(The HPD Bayesian credible regions can also be thought of as
optimal sets that address similar applications in which confidence
sets are used.)
Approximate Confidence Sets
Because determining an exact 1 − α confidence set requires that
we know the exact distribution of some statistic, we often have
to form approximate confidence sets.
We have previously discussed three general ways of approximate
inference, and we will consider them in the context of confidence
sets.
Construction and Properties
Our usual notion of a confidence interval relies on a frequency
approach to probability, and it leads to the definition of a 1 − α
confidence interval for the (scalar) parameter θ as the random
interval [TL, TU ], that has the property
Pr (TL ≤ θ ≤ TU ) = 1 − α.
This is also called a (1 − α)100% confidence interval.
The interval [TL, TU ] is not uniquely determined.
A realization of the random interval, say [tL, tU ], is also called a
confidence interval.
Vector-Valued Parameters
The concept extends easily to vector-valued parameters.
A simple extension would be merely to let TL and TU be vectors,
and let the confidence region be the hyperrectangle defined by
the cross products of the intervals.
Rather than taking vectors TL and TU , however, we generally
define other types of regions; in particular, we often take an
ellipsoidal region whose shape is determined by the covariances
of the estimators.
Interpretation
Although it may seem natural to state that the “probability that
θ is in [tL , tU ] is 1 − α”, this statement can be misleading unless
a certain underlying probability structure is assumed.
Construction and Properties
In practice, the interval is usually specified with respect to an
estimator of θ, T (X).
If we know the sampling distribution of T − θ, we may determine
c1 and c2 such that
Pr (c1 ≤ T − θ ≤ c2) = 1 − α;
and hence
Pr (T − c2 ≤ θ ≤ T − c1) = 1 − α.
If either TL or TU is infinite or corresponds to a bound on acceptable values of θ, the confidence interval is one-sided.
For two-sided confidence intervals, we may seek to make the
probability on either side of T to be equal.
This is called an equal-tail confidence interval.
We may, rather, choose to make c1 = −c2, and/or to minimize
|c2 − c1| or |c1| or |c2|.
This is similar in spirit to seeking an estimator with small variance.
Prediction Sets
We often want to identify a set in which a future observation on
a random variable has a high probability of occurring.
This kind of set is called a prediction set.
For example, we may assume a given sample X1, . . . , Xn is from
a N(µ, σ 2 ) and we wish to determine a measurable set C(X) such
that for a future observation Xn+1
inf_{P∈P} Pr_P (Xn+1 ∈ C(X)) ≥ 1 − α.
More generally, instead of Xn+1 , we could define a prediction
interval for any random variable Y .
The difference between this and a confidence set for µ is that
there is an additional source of variation.
The prediction set will be larger, so as to account for this extra
variation.
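To make the extra variation concrete, here is a small sketch (my own, assuming normal data and the standard formulas) comparing a confidence interval for µ with a prediction interval for Xn+1:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
x = rng.normal(10.0, 3.0, size=25)
n, alpha = len(x), 0.05
m, s = x.mean(), x.std(ddof=1)
q = t.ppf(1 - alpha / 2, df=n - 1)

# Confidence interval for mu: accounts only for estimation error.
ci = (m - q * s / np.sqrt(n), m + q * s / np.sqrt(n))
# Prediction interval for X_{n+1}: adds the variance of the new draw.
pi = (m - q * s * np.sqrt(1 + 1 / n), m + q * s * np.sqrt(1 + 1 / n))

print(ci, pi)  # the prediction interval is wider
```

The prediction interval's half-width uses s√(1 + 1/n) rather than s/√n, so it does not shrink to zero as n grows.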
Prediction Sets and Tolerance Sets
We may want to separate the statements about Y and S(X).
A tolerance set attempts to do this.
Given a sample X, a measurable set S(X), and numbers δ and
α in ]0, 1[, if
inf_{P∈P} Pr_P ( Pr_P (Y ∈ S(X) | X) ≥ δ ) ≥ 1 − α,
then S(X) is called a δ-tolerance set for Y with confidence level
1 − α.
Randomized Confidence Sets
For discrete distributions, as we have seen, sometimes to achieve
a test of a specified size, we had to use a randomized test.
The same problem – and the same solution – arises in forming
confidence sets for parameters of discrete distributions.
We form randomized confidence sets.
The idea is the same as in randomized tests, and we will discuss
randomized confidence sets in the context of hypothesis tests
below.
Pivot Functions
A straightforward way to form a confidence interval is to use
a function of the sample that also involves the parameter of
interest, but that does not involve any nuisance parameters.
The confidence interval is then formed by separating the parameter from the sample values.
A class of functions that are particularly useful for forming confidence intervals are called pivotal values, or pivotal functions.
A function f (T, θ) is said to be a pivotal function if its distribution
does not depend on any unknown parameters.
Pivot Functions
This allows exact confidence intervals to be formed for the parameter θ.
We first form
Pr(f(α/2) ≤ f (T, θ) ≤ f(1−α/2) ) = 1 − α,
where f(α/2) and f(1−α/2) are quantiles of the distribution of
f (T, θ); that is,
Pr(f (T, θ) ≤ f(π)) = π.
If, as in the case considered above, f (T, θ) = T − θ, the resulting
confidence interval has the form
Pr(T − f(1−α/2) ≤ θ ≤ T − f(α/2)) = 1 − α.
For example, suppose Y1, Y2, . . . , Yn is a random sample from a
N(µ, σ 2 ) distribution, and Y is the sample mean.
The quantity
f(Ȳ, µ) = √(n(n − 1)) (Ȳ − µ) / √( Σ (Y_i − Ȳ)² )
has a Student’s t distribution with n − 1 degrees of freedom, no
matter what the value of σ² is.
This is one of the most commonly-used pivotal values.
The pivotal value can be used to form a confidence interval for
µ by first writing
Pr (t(α/2) ≤ f (Y , µ) ≤ t(1−α/2) ) = 1 − α,
where t(π) is a percentile from the Student’s t distribution.
Then, after making substitutions for f (Y , µ), we form the familiar
confidence interval for µ:
[Ȳ − t_{(1−α/2)} s/√n, Ȳ − t_{(α/2)} s/√n],
where s² is the usual sample variance, Σ(Y_i − Ȳ)²/(n − 1).
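The defining property of the pivot, that its distribution is free of σ², can be checked by simulation. This is an illustrative sketch of my own (the values of µ, n, and the two σ values are arbitrary); it simulates the quantity f(Ȳ, µ) above and compares an empirical quantile with the Student's t quantile.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)

def pivot(sigma, mu=5.0, n=15, reps=20000):
    # f(Ybar, mu) = sqrt(n(n-1)) (Ybar - mu) / sqrt(sum (Y_i - Ybar)^2)
    y = rng.normal(mu, sigma, size=(reps, n))
    ybar = y.mean(axis=1)
    ss = ((y - ybar[:, None]) ** 2).sum(axis=1)
    return np.sqrt(n * (n - 1)) * (ybar - mu) / np.sqrt(ss)

# The empirical 0.9 quantile matches the t_{14} quantile,
# whatever sigma is.
for sigma in (0.5, 20.0):
    print(np.quantile(pivot(sigma), 0.9), t.ppf(0.9, df=14))
```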
Other similar pivotal values have F distributions.
For example, consider the usual linear regression model in which
the n-vector random variable Y has a Nn(Xβ, σ 2I) distribution,
where X is an n × m known matrix, and the m-vector β and the
scalar σ 2 are unknown.
A pivotal value useful in making inferences about β is
g(β̂, β) = [(X(β̂ − β))^T X(β̂ − β)/m] / [(Y − Xβ̂)^T (Y − Xβ̂)/(n − m)],
where
β̂ = (X^T X)^+ X^T Y.
For any finite value of σ², the random variable g(β̂, β) has an
F distribution with m and n − m degrees of freedom.
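A quick numerical check (my own sketch; the design matrix, β, and the error standard deviation are arbitrary choices) that the F pivot above yields 1 − α coverage for the induced confidence ellipsoid:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n, m = 50, 3
X = rng.normal(size=(n, m))
beta = np.array([1.0, -2.0, 0.5])

def g(y):
    # beta_hat = (X^T X)^+ X^T y, as in the text
    bhat = np.linalg.pinv(X.T @ X) @ X.T @ y
    r = X @ (bhat - beta)
    num = (r @ r) / m
    den = ((y - X @ bhat) @ (y - X @ bhat)) / (n - m)
    return num / den

# {beta : g(beta_hat, beta) <= F_{0.95}} is a 95% confidence
# ellipsoid; check its coverage by simulation.
q = f.ppf(0.95, m, n - m)
reps = 4000
inside = sum(g(X @ beta + rng.normal(0.0, 2.0, size=n)) <= q
             for _ in range(reps))
print(inside / reps)  # close to 0.95
```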
For a given parameter and family of distributions there may be
multiple pivotal values. For purposes of statistical inference,
such considerations as unbiasedness and minimum variance may
guide the choice of a pivotal value to use.
Approximate Pivot Values
It may not be possible to identify a pivotal quantity for a particular parameter. In that case, we may seek an approximate
pivot.
A function is asymptotically pivotal if a sequence of linear transformations of the function is pivotal in the limit as n → ∞.
If the distribution of T is known, c1 and c2 can be determined.
If the distribution of T is not known, some other approach must
be used.
A common method is to use some numerical approximation to
the distribution.
Another method is to use bootstrap samples from the ECDF.
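As a sketch of the bootstrap route (my own example; the data-generating choice, the percentile method, and B = 2000 resamples are arbitrary), one can resample from the ECDF to approximate the distribution of the sample mean:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(2.0, size=40)

B, alpha = 2000, 0.05
# Resample the ECDF: draw n points with replacement, B times.
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(B)])
# Percentile bootstrap interval for the mean.
lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
print(lo, hi)
```

More refined bootstrap intervals (e.g., bootstrap-t, which studentizes each resample) aim to come closer to an approximate pivot.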
Relation to Acceptance Regions of
Hypothesis Tests
A test at the α level has a very close relationship with a 1 − α
level confidence set.
When we test the hypothesis H0 : θ ∈ ΘH0 at the α level, we
form a critical region for a test statistic or rejection region for
the values of the observable X.
This region is such that the probability that the test statistic is
in it is ≤ α.
For any given θ0 ∈ Θ, consider the nonrandomized test Tθ0 for
testing the simple hypothesis H0 : θ = θ0, against some alternative H1 .
We let A(θ0) be the set of all x such that the test statistic is not
in the critical region; that is, A(θ0) is the acceptance region.
Now, for any θ and any value x in the range of X, we let
C(x) = {θ : x ∈ A(θ)}.
For testing H0 : θ = θ0 at the α significance level, we have
sup Pr(X ∉ A(θ0) | θ = θ0) ≤ α;
that is,
1 − α ≤ inf Pr(X ∈ A(θ0) | θ = θ0) = inf Pr(C(X) ∋ θ0 | θ = θ0).
This holds for any θ0, so
inf_{P∈P} Pr_P (C(X) ∋ θ) = inf_{θ0∈Θ} Pr_P (C(X) ∋ θ0 | θ = θ0) ≥ 1 − α.
Hence, C(X) is a 1 − α level confidence set for θ.
If the size of the test is α, the inequalities are equalities, and so
the confidence coefficient is 1 − α.
For example, suppose Y1, Y2, . . . , Yn is a random sample from a
N(µ, σ 2 ) distribution, and Y is the sample mean.
To test H0 : µ = µ0, against the universal alternative, we form
the test statistic
T(X) = √(n(n − 1)) (Ȳ − µ0) / √( Σ (Y_i − Ȳ)² ),
which, under the null hypothesis, has a Student’s t distribution
with n − 1 degrees of freedom.
An acceptance region at the α level is
[t(α/2), t(1−α/2) ],
and hence, putting these limits on T (X) and inverting, we get
[Ȳ − t_{(1−α/2)} s/√n, Ȳ − t_{(α/2)} s/√n],
which is a 1 − α level confidence interval.
The test has size α and so the confidence coefficient is 1 − α.
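The duality can be verified numerically. In this sketch (my own; it relies on scipy's two-sided one-sample t test, and the data are an arbitrary normal sample), each µ0 lies in the t interval exactly when the α-level test of H0 : µ = µ0 fails to reject:

```python
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(5)
x = rng.normal(0.3, 1.0, size=30)
n, alpha = len(x), 0.05
m, s = x.mean(), x.std(ddof=1)
q = t.ppf(1 - alpha / 2, df=n - 1)
lo, hi = m - q * s / np.sqrt(n), m + q * s / np.sqrt(n)

# mu0 is in the interval exactly when the two-sided t test
# fails to reject at level alpha.
for mu0 in np.linspace(m - 1.0, m + 1.0, 41):
    in_ci = lo <= mu0 <= hi
    accept = ttest_1samp(x, mu0).pvalue > alpha
    assert in_ci == accept
print("inversion check passed")
```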
Randomized Confidence Sets
To form a randomized confidence set with 1 − α confidence level,
we form a nonrandomized confidence set (which may be null) with
1 − α1 confidence level, with 0 ≤ α1 ≤ α, and then we define a
random experiment with some event that has probability α − α1.
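For a concrete discrete case (a sketch of my own; the Clopper–Pearson construction is a standard conservative binomial interval, not one derived in these notes), the coverage of a nonrandomized interval overshoots 1 − α, which is exactly the slack that randomization removes:

```python
import numpy as np
from scipy.stats import binom, beta

def cp_interval(k, n, alpha=0.05):
    # Clopper-Pearson (conservative) interval for a binomial proportion.
    lo = 0.0 if k == 0 else beta.ppf(alpha / 2, k, n - k + 1)
    hi = 1.0 if k == n else beta.ppf(1 - alpha / 2, k + 1, n - k)
    return lo, hi

# Exact coverage at one (n, p): sum the binomial probabilities of the
# outcomes k whose interval contains p.
n, p = 20, 0.3
cover = sum(binom.pmf(k, n, p)
            for k in range(n + 1)
            if cp_interval(k, n)[0] <= p <= cp_interval(k, n)[1])
print(cover)  # exceeds 0.95 because of discreteness
```

The printed coverage exceeds the nominal 0.95 at this (n, p); a randomized set could bring it down to exactly 1 − α.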
Optimal Confidence Sets
We often evaluate a confidence set using a family of distributions
that does not include the true parameter.
For example, “accuracy” is the (true) probability of the set including an incorrect value.
The “volume” (or “length”) of a confidence set is the Lebesgue
measure of the set:
vol(C(x)) = ∫_{C(x)} dθ̃.
This may not be finite.
If the volume is finite, we have (Theorem 7.6 in Shao)
E_θ(vol(C(x))) = ∫_{θ̃≠θ} Pr_θ (C(x) ∋ θ̃) dθ̃.
We see this by a simple application of Fubini’s theorem to handle
the integral over the product space, and then an interchange of
integration.
We want to minimize the volume (when it is finite) and to
maximize the accuracy.
Uniformly most accurate (UMA) 1 − α level set:
Pr_θ (C(X) ∋ θ̃) is minimum among all 1 − α level sets, ∀θ̃ ≠ θ.
This definition of UMA may not be so relevant in the case of a
one-sided confidence interval.
If Θ̃ is a subset of Θ that does not include θ, and
Pr_θ (C(X) ∋ θ̃) ≤ Pr_θ (C1(X) ∋ θ̃)
for any 1 − α level set C1(X) and ∀θ̃ ∈ Θ̃, then C(X) is said to
be Θ̃-uniformly most accurate.
A confidence set formed by inverting a nonrandomized UMP test
is UMA.
We see this easily from the definitions of UMP and UMA. (This
is Theorem 7.4 in Shao.)
Just as sometimes no UMP exists, sometimes we cannot form a
UMA confidence interval, so we add some criterion.
We define unbiasedness in terms of a subset Θ̃ that does not
include the true θ.
A 1 − α level confidence set C(X) is said to be Θ̃-unbiased if
Pr_θ (C(X) ∋ θ̃) ≤ 1 − α    ∀θ̃ ∈ Θ̃.
If Θ̃ = {θ}ᶜ, we call the set unbiased.
A Θ̃-unbiased set that is uniformly more accurate (“more” is
defined similarly to “most”) than any other Θ̃-unbiased set is
said to be a uniformly most accurate unbiased (UMAU) set.