SUFFICIENT STATISTICS
1. Introduction
Let $X = (X_1, \ldots, X_n)$ be a random sample from $f_\theta$, where $\theta \in \Theta$
is unknown. We are interested in using $X$ to estimate $\theta$. In the simple
case where $X_i \sim \mathrm{Bern}(p)$, we found that the sample mean was an
efficient estimator for $p$. Thus, if we observe a finite sequence of coin
flips, in order to have an efficient estimate of the probability $p$ that
heads occurs in a single flip, we need only count the number of times
we see heads (and divide by the total number of flips); we need not
worry about the order in which the heads and tails occurred. Note that
the sequences 100011 and 111000 lead to the same estimate when we
use the sample mean. In what follows, we want to study the following
question: do we get any additional information about $p$ by making use
of the order in which the heads and tails occurred? The sample mean
does not make use of the order, and it does give us an efficient estimator,
so, in short, the answer in this case is no. Thus, in this example, it
appears that we can greatly simplify and reduce the amount of data
without affecting our ability to find good estimators.
2. Sufficient statistics
Let $X = (X_1, \ldots, X_n)$ be a random sample from $f_\theta$, where $\theta \in \Theta$
is unknown. Recall that $T$ is a statistic if $T = T(X) = u(X)$ for
some deterministic function $u$. We will assume that $u$ does not depend
on $\theta$. Some examples that you are familiar with are when $T$ is the
sample mean, the sample variance, and the maximum. Let us remark
that although in many important examples $T$ is a one-dimensional point
estimator, it need not be; for example, $T(X) = X$ is a statistic. We say
that $T$ is sufficient for $\theta$ if the conditional distribution of $X$ given $T$
does not depend on $\theta$. In the case where the random variables involved
are not discrete, even this definition requires somewhat advanced mathematics,
since we might have that $P(T = t) = 0$, in which case it is not
immediate how one can make sense of
$$P(X \in \cdot \mid T = t).$$
We will first discuss the discrete case, and then we will extend our
discussion to the continuous case.
3. The discrete case
Exercise 1. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim \mathrm{Bern}(p)$.
Show that the sample sum given by $T = X_1 + \cdots + X_n$ is a
sufficient statistic for $p$.
Solution. Let $x \in \{0,1\}^n$ and $t \in \{0, 1, \ldots, n\}$. We need to show that
$P(X = x \mid T = t)$ does not depend on $p$. In fact, you already did this
computation in the first homework!
By definition,
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)}.$$
We may assume that $t = t(x) = x_1 + \cdots + x_n$; otherwise, $P(X = x, T = t) = 0$.
Thus $\{X = x\} = \{X = x\} \cap \{T = t\}$, and $P(X = x, T = t) = P(X = x)$. We have that
$$P(X = x) = L(x; p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} = p^t (1-p)^{n-t}.$$
We also know that $T \sim \mathrm{Bin}(n, p)$, so that
$$P(T = t) = \binom{n}{t} p^t (1-p)^{n-t}.$$
Hence we obtain that
$$P(X = x \mid T = t) = \frac{1}{\binom{n}{t}},$$
which does not depend on $p$.
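Since everything here is discrete, the conclusion is easy to check empirically. Below is a minimal simulation sketch in Python (not part of the original notes); the choices $n = 5$, $t = 2$, and the two values of $p$ are arbitrary. Conditioned on $T = t$, every 0-1 sequence with $t$ ones should appear with frequency close to $1/\binom{5}{2} = 0.1$, for either value of $p$.

import random
from collections import Counter
from math import comb

def conditional_frequencies(p, n=5, t=2, trials=200_000, seed=0):
    rng = random.Random(seed)
    counts = Counter()
    kept = 0
    for _ in range(trials):
        x = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(x) == t:          # keep only samples with T = t
            counts[x] += 1
            kept += 1
    return {x: c / kept for x, c in counts.items()}

for p in (0.3, 0.7):
    freqs = conditional_frequencies(p)
    print(f"p = {p}: target 1/C(5,2) = {1/comb(5, 2):.4f}, "
          f"observed range {min(freqs.values()):.4f}-{max(freqs.values()):.4f}")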
Exercise 2. Discuss why you should expect that the final answer we
obtained in Exercise 1 is
$$P(X = x \mid T = t) = \frac{1}{\binom{n}{t}}.$$
In the discrete setting, we have that $T$ is sufficient for $\theta$ if and only
if for all $x$ and $t = t(x)$, we have
$$P(X = x \mid T = t) = \frac{P(X = x, T = t)}{P(T = t)} = \frac{P(X = x)}{P(T = t)} = H(x),$$
for some function $H(x)$ which does not depend on $\theta$.
Exercise 3. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i$ is a
discrete random variable that is uniformly distributed in $\{1, 2, \ldots, \theta\}$.
Show that $M = \max\{X_1, \ldots, X_n\}$ is a sufficient statistic for $\theta$.
Solution. Let $m \in \{1, 2, \ldots, \theta\}$. Note that
$$\{M \le m\} = \bigcap_{i=1}^n \{X_i \le m\}$$
and
$$\{M = m\} = \{M \le m\} \setminus \{M \le m - 1\}.$$
Hence
$$P(M = m) = \frac{1}{\theta^n}\bigl(m^n - (m-1)^n\bigr).$$
Let $x \in \{1, 2, \ldots, \theta\}^n$ and $m = \max\{x_1, \ldots, x_n\}$. Since $m$ is the
maximum of the coordinates of $x$, we have $\{X = x\} \subseteq \{M = m\}$, so that
$P(X = x, M = m) = P(X = x)$. Thus
$$P(X = x \mid M = m) = \frac{P(X = x, M = m)}{P(M = m)} = \frac{P(X = x)}{P(M = m)} = \frac{1/\theta^n}{(m^n - (m-1)^n)/\theta^n} = \frac{1}{m^n - (m-1)^n},$$
so we are done.
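For a small check of this computation, one can enumerate the whole sample space. The following Python sketch (illustrative, not from the notes; the values of $n$, $m$, and $\theta$ are arbitrary) computes $P(X = x \mid M = m)$ exactly for every $x$ with $\max(x) = m$ and confirms that the answer is $1/(m^n - (m-1)^n)$, whatever $\theta$ is.

from fractions import Fraction
from itertools import product

def conditional_pmf_given_max(theta, n, m):
    # enumerate every sample x in {1, ..., theta}^n; each has joint pmf 1/theta^n
    support = [x for x in product(range(1, theta + 1), repeat=n) if max(x) == m]
    p_m = Fraction(len(support), theta ** n)            # P(M = m)
    return {x: Fraction(1, theta ** n) / p_m for x in support}

n, m = 3, 2
for theta in (3, 6):
    pmf = conditional_pmf_given_max(theta, n, m)
    target = Fraction(1, m ** n - (m - 1) ** n)
    assert all(v == target for v in pmf.values())
    print(f"theta = {theta}: P(X = x | M = {m}) = {target} for every such x")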
Exercise 4. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i$ is
a Poisson random variable with mean $\lambda$. Show that the sample sum
given by $T = X_1 + \cdots + X_n$ is a sufficient statistic for $\lambda$.
In order to have some more examples to discuss, recall that $X$ is a
geometric random variable with parameter $p \in (0,1)$ if
$$P(X = k) = p(1-p)^{k-1},$$
for $k = 1, 2, \ldots$. Thus $X$ is the number of Bernoulli $p$ trials required
to get a success. Here, $EX = 1/p$. Let us remark that sometimes
geometric random variables are defined so that $P(X = k) = p(1-p)^k$,
for $k = 0, 1, 2, \ldots$; in this case $X$ is the number of failures before a success,
and $EX = (1-p)/p$. Before we find a sufficient statistic for $p$, we do
a couple of preliminary exercises.
Exercise 5. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i$ is
a geometric random variable with parameter $p \in (0,1)$ and mean $1/p$.
Show that the mle for $p$ is given by $1/\bar{X}$.
Exercise 6. Referring to Exercise 5, let $T = X_1 + \cdots + X_n$. Show that
for $k = n, n+1, \ldots$, we have
$$P(T = k) = \binom{k-1}{n-1} p^n (1-p)^{k-n}.$$
Solution. Note that $T$ is the number of trials required to get $n$ successes.
By counting we obtain the required formula: the last trial, the $k$th, must be
a success, and of the first $k-1$ trials, exactly $n-1$ must be successes.
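If you want to convince yourself of the counting argument numerically, here is a short Monte Carlo sketch in Python (illustrative only; $n$, $p$, and the number of trials are arbitrary choices) comparing the empirical pmf of $T$ with the formula above.

import random
from math import comb

def sample_T(n, p, rng):
    # T = X_1 + ... + X_n, each X_i the number of trials until the first success
    total = 0
    for _ in range(n):
        k = 1
        while rng.random() >= p:
            k += 1
        total += k
    return total

n, p, trials = 3, 0.4, 200_000
rng = random.Random(1)
counts = {}
for _ in range(trials):
    t = sample_T(n, p, rng)
    counts[t] = counts.get(t, 0) + 1

for k in range(n, n + 5):
    exact = comb(k - 1, n - 1) * p ** n * (1 - p) ** (k - n)
    print(k, round(counts.get(k, 0) / trials, 4), round(exact, 4))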
Exercise 7. Referring to Exercise 5, show that the sample sum given
by $T = X_1 + \cdots + X_n$ is a sufficient statistic for $p$.
Solution. Let $x \in \{1, 2, 3, 4, \ldots\}^n$ and $t = x_1 + \cdots + x_n$. We have that
$$P(X = x \mid T = t) = \frac{P(X = x)}{P(T = t)} = \frac{\prod_{i=1}^n p(1-p)^{x_i - 1}}{\binom{t-1}{n-1} p^n (1-p)^{t-n}} = \frac{p^n (1-p)^{t-n}}{\binom{t-1}{n-1} p^n (1-p)^{t-n}} = \frac{1}{\binom{t-1}{n-1}},$$
which does not depend on $p$.
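The same cancellation can be seen by brute force: for a fixed $t$, the event $\{T = t\}$ consists of the finitely many $x$ with positive integer coordinates summing to $t$. The Python sketch below (illustrative; $t$, $n$, and the values of $p$ are arbitrary) enumerates them and checks that each has conditional probability $1/\binom{t-1}{n-1}$.

from itertools import combinations
from math import comb, isclose

def compositions(t, n):
    # all ordered n-tuples of positive integers summing to t
    for cuts in combinations(range(1, t), n - 1):
        yield tuple(b - a for a, b in zip((0,) + cuts, cuts + (t,)))

def joint_pmf(x, p):
    # P(X = x) = prod_i p (1 - p)^(x_i - 1) for independent geometric coordinates
    prob = 1.0
    for xi in x:
        prob *= p * (1 - p) ** (xi - 1)
    return prob

t, n = 7, 3
for p in (0.2, 0.8):
    joint = {x: joint_pmf(x, p) for x in compositions(t, n)}
    p_t = sum(joint.values())                    # P(T = t), summing over the event
    conditional = [v / p_t for v in joint.values()]
    assert all(isclose(c, 1 / comb(t - 1, n - 1)) for c in conditional)
    print(f"p = {p}: every sequence has conditional probability 1/{comb(t - 1, n - 1)}")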
4. The continuous case
In the continuous case, as with likelihoods, we work with the density
functions instead of the probabilities directly. Let $X = (X_1, \ldots, X_n)$
be a random sample from $f_\theta$, where $\theta \in \Theta$ is unknown.
Let $T = u(X)$ be a statistic with density function $q(t; \theta)$. Then $T$ is a
sufficient statistic for $\theta$ if for all $x$ and $t = t(x)$, we have
$$\frac{L(x; \theta)}{q(t(x); \theta)} = \frac{\prod_{i=1}^n f(x_i; \theta)}{q(t(x); \theta)} = H(x),$$
for some function $H$ which does not depend on $\theta$.
Exercise 8. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim \mathrm{Unif}(0, \theta)$,
where $\theta$ is unknown. Show that $M = \max\{X_1, \ldots, X_n\}$ is
a sufficient statistic for $\theta$.
Exercise 9. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim N(\mu, 1)$,
where $\mu$ is unknown. Show that the sample mean is a sufficient
statistic for $\mu$.
Solution. Luckily, we know the distribution of $\bar{X}$; we have that
$\bar{X} \sim N(\mu, 1/n)$. However, even with this piece of knowledge, this is a
tricky exercise. First, we need the following observation. Note that
$$\sum_{i=1}^n (x_i - \bar{x}) = 0.$$
Thus
$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x} + \bar{x} - \mu)^2
= \sum_{i=1}^n \bigl[(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)^2\bigr]
= \sum_{i=1}^n \bigl[(x_i - \bar{x})^2 + (\bar{x} - \mu)^2\bigr]
= \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2.$$
With this algebra in hand, we have that
$$\frac{L(x; \mu)}{q(\bar{x}; \mu)} = \frac{\prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-(x_i - \mu)^2/2}}{\frac{\sqrt{n}}{\sqrt{2\pi}}\, e^{-n(\bar{x} - \mu)^2/2}} = \frac{1}{\sqrt{n}\,(2\pi)^{(n-1)/2}}\, e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2},$$
which does not depend on $\mu$.
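As a numerical sanity check of this ratio, one can fix a data vector and evaluate $L(x; \mu)/q(\bar{x}; \mu)$ for several values of $\mu$; the value should not change. A minimal Python sketch follows (the data vector and the values of $\mu$ are arbitrary choices).

from math import exp, pi, sqrt

x = [0.3, -1.2, 2.5, 0.7, 1.1]              # an arbitrary data vector
n = len(x)
xbar = sum(x) / n

def likelihood(mu):
    # L(x; mu) = prod (1/sqrt(2 pi)) exp(-(x_i - mu)^2 / 2)
    L = 1.0
    for xi in x:
        L *= exp(-(xi - mu) ** 2 / 2) / sqrt(2 * pi)
    return L

def density_of_mean(mu):
    # Xbar ~ N(mu, 1/n): q(xbar; mu) = sqrt(n/(2 pi)) exp(-n (xbar - mu)^2 / 2)
    return sqrt(n / (2 * pi)) * exp(-n * (xbar - mu) ** 2 / 2)

for mu in (-2.0, 0.0, 3.5):
    print(mu, likelihood(mu) / density_of_mean(mu))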
5. Fisher-Neyman factorization
We saw in the previous exercises that proving that a statistic is
sufficient from the definition can be quite challenging. The following
factorization theorem makes life easier.
Theorem 10. Let $X = (X_1, \ldots, X_n)$ be a random sample from the pdf
$f_\theta$, where $\theta \in \Theta$ is unknown. A statistic $T$ is sufficient for $\theta$ if and
only if there exist nonnegative functions $g(t; \theta)$ and $h(x)$ (the latter not
depending on $\theta$) such that for all points $x$ and all $\theta \in \Theta$, we have
$$L(x; \theta) = \prod_{i=1}^n f(x_i; \theta) = g(t(x); \theta)\, h(x).$$
Clearly, by definition, a factorization holds if $T$ is sufficient, so one
direction of the proof is trivial. It is also immediate from Theorem 10
that a 1-1 function of a sufficient statistic is again sufficient. Let us
also remark that in Theorem 10, $g(t; \theta)$ does not have to be the density
function for $T(X)$, and in the discrete case, we do not require that
$g(t) = P(T = t)$. The factorization of Theorem 10 is not unique. The
utility of Theorem 10 lies in the fact that we do not need to identify the
distribution of $T$. Before we prove the non-trivial direction of Theorem
10, let us apply it to Exercise 9.
Exercise 11. Apply Theorem 10 to solve Exercise 9.
Solution (to Exercise 9). The difference here is that we still need
the somewhat tricky algebra, but we no longer need to know that the sum
of independent normals is again normal. We have
$$L(x; \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}}\, e^{-(x_i - \mu)^2/2} = (2\pi)^{-n/2} \cdot e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2} \cdot e^{-\frac{n}{2}(\bar{x} - \mu)^2}.$$
Thus we choose $g(\bar{x}; \mu) = e^{-\frac{n}{2}(\bar{x} - \mu)^2}$ and $h(x) = (2\pi)^{-n/2} \cdot e^{-\frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2}$.
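One can also check the factorization itself numerically. The Python sketch below (illustrative; the data vector and the values of $\mu$ are arbitrary) verifies that $g(\bar{x}; \mu)\,h(x)$ reproduces $L(x; \mu)$.

from math import exp, pi, isclose

x = [0.3, -1.2, 2.5, 0.7, 1.1]               # an arbitrary data vector
n = len(x)
xbar = sum(x) / n
ss = sum((xi - xbar) ** 2 for xi in x)        # sum of (x_i - xbar)^2

def likelihood(mu):
    L = 1.0
    for xi in x:
        L *= exp(-(xi - mu) ** 2 / 2) / (2 * pi) ** 0.5
    return L

def g(mu):
    return exp(-n * (xbar - mu) ** 2 / 2)     # g(xbar; mu)

h = (2 * pi) ** (-n / 2) * exp(-ss / 2)       # h(x), free of mu

for mu in (-1.0, 0.5, 2.0):
    assert isclose(likelihood(mu), g(mu) * h)
print("factorization L(x; mu) = g(xbar; mu) h(x) verified for all mu tried")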
Exercise 12. Let $X = (X_1, \ldots, X_n)$ be a random sample, where
$X_i \sim N(0, \theta)$, where the variance $\theta$ is unknown. Show that $T = \sum_{i=1}^n X_i^2$ is
a sufficient statistic for $\theta$.
Exercise 13. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_i \sim N(\mu, \sigma^2)$,
where both $\mu$ and $\sigma^2$ are unknown. Set $\theta = (\mu, \sigma^2)$. Let
$T = (\bar{X}, S^2)$, where $\bar{X}$ is the usual sample mean, and $S^2$ is the usual
sample variance. Show that
$$L(x; \theta) = g(t(x); \theta)\, h(x),$$
for some functions $g$ and $h$, so that $T$ is a sufficient statistic for $\theta$.
Exercise 14. Apply Theorem 10 to solve Exercise 4.
Solution. Let $x \in \{0, 1, 2, \ldots\}^n$, and $t = t(x) = x_1 + \cdots + x_n$. We have
that
$$P(X = x) = \prod_{i=1}^n e^{-\lambda}\, \frac{\lambda^{x_i}}{x_i!} = \lambda^t e^{-n\lambda} \prod_{i=1}^n \frac{1}{x_i!}.$$
Thus we choose $g(t; \lambda) = \lambda^t e^{-n\lambda}$ and $h(x) = \prod_{i=1}^n \frac{1}{x_i!}$.
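A quick numerical check of this factorization, in the same illustrative spirit as before (the vector of counts and the values of $\lambda$ are arbitrary choices):

from math import exp, factorial, isclose

x = [2, 0, 3, 1, 1]                           # an arbitrary vector of counts
n, t = len(x), sum(x)

def joint_pmf(lam):
    # P(X = x) = prod e^{-lam} lam^{x_i} / x_i!
    p = 1.0
    for xi in x:
        p *= exp(-lam) * lam ** xi / factorial(xi)
    return p

h = 1.0
for xi in x:
    h /= factorial(xi)                        # h(x) = prod 1/x_i!

for lam in (0.5, 2.0, 7.0):
    g = lam ** t * exp(-n * lam)              # g(t; lambda)
    assert isclose(joint_pmf(lam), g * h)
print("Poisson factorization verified for all lambda tried")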
Exercise 15. Let $X = (X_1, \ldots, X_n)$ be a random sample, where $X_1$ is
a real-valued continuous random variable with a pdf given by
$$f(x_1; \theta) = h(x_1)\, c(\theta)\, e^{w(\theta) u(x_1)}.$$
Show that $T = \sum_{i=1}^n u(X_i)$ is a sufficient statistic for $\theta$.
Proof of Theorem 10 (discrete case). Let $t = t(x)$. We have by assumption that
$$\frac{P(X = x)}{P(T = t)} = \frac{g(t; \theta) \cdot h(x)}{P(T = t)}.$$
Let us remark that we need not have that $g(t; \theta) = P(T = t)$. Of
course, $P(T = t) = P_\theta(T = t)$ depends on $\theta$, and the claim is that
the $\theta$'s in $g(t; \theta)$ cancel out the $\theta$'s in $P(T = t)$. To see why, let
$A := \{y : t(y) = t(x)\}$. Of course, $x \in A$, but there could be other
elements; think of $t$ as the sample sum: if $t(x) = t$, then for any
permutation $y$ of $x$, we have $t(y) = t$. Thus,
$$P(T = t) = P(A) = \sum_{y \in A} P(X = y) = g(t; \theta) \sum_{y \in A} h(y).$$
Hence
$$\frac{P(X = x)}{P(T = t)} = \frac{h(x)}{\sum_{y \in A} h(y)},$$
which does not depend on $\theta$.
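In the Poisson example the set $A$ is finite, so the key step of the proof can be checked by direct enumeration. The Python sketch below (illustrative; the fixed sample $x$ and the values of $\lambda$ are arbitrary) sums over $A$ to compute $P(T = t)$ and confirms that $P(X = x)/P(T = t) = h(x)/\sum_{y \in A} h(y)$, which is free of $\lambda$.

from math import exp, factorial, isclose

def nonneg_compositions(t, n):
    # all ordered n-tuples of nonnegative integers summing to t
    if n == 1:
        yield (t,)
        return
    for first in range(t + 1):
        for rest in nonneg_compositions(t - first, n - 1):
            yield (first,) + rest

def joint_pmf(y, lam):
    # P(X = y) for independent Poisson(lam) coordinates
    p = 1.0
    for yi in y:
        p *= exp(-lam) * lam ** yi / factorial(yi)
    return p

def h(y):
    out = 1.0
    for yi in y:
        out /= factorial(yi)                  # h(y) = prod 1/y_i!
    return out

x = (2, 0, 3)                                 # a fixed sample; here t = 5, n = 3
t, n = sum(x), len(x)
A = list(nonneg_compositions(t, n))           # A = {y : t(y) = t}

target = h(x) / sum(h(y) for y in A)          # the lambda-free answer
for lam in (0.5, 2.0):
    p_t = sum(joint_pmf(y, lam) for y in A)   # P(T = t), summing over A
    assert isclose(joint_pmf(x, lam) / p_t, target)
print("P(X = x | T = t) matches h(x)/sum h(y) and is free of lambda")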
The proof in the continuous case is more technical; your text has
a proof of a special case of the continuous case. The above proof is
similar to the proof of the following elementary fact.
Theorem 16. Let $X$ be a discrete random variable with pdf $f$. If
$g : \mathbb{R} \to \mathbb{R}$, then
$$Eg(X) = \sum_x g(x) f(x),$$
whenever the sum is absolutely convergent.
Proof. We have that
$$Eg(X) = \sum_y y\, P(g(X) = y).$$
Suppose $X$ takes values on the set $A$. Let $A_y := \{x \in A : g(x) = y\}$.
Note that the sets $A_y$ partition the set $A$. Thus
$$P(g(X) = y) = P(A_y) = \sum_{x \in A_y} f(x)$$
and
$$Eg(X) = \sum_y \sum_{x \in A_y} y f(x) = \sum_y \sum_{x \in A_y} g(x) f(x) = \sum_{x \in A} g(x) f(x).$$
End of Midterm 1 coverage