Lecture 12
Data Reduction
Suppose we summarize the data X by a statistic T = T(X). We should think about the following questions carefully before this "simplification" process:
• Is there any loss of information due to the summarization?
• How can we compare the amount of information about θ in the original data X and in T?
• Is it sufficient to consider only the "reduced data" T?
1. Sufficient Statistics
A statistic T is called sufficient if the conditional distribution of
X given T is free of θ (that is, the conditional is a completely
known distribution).
Example. Toss a coin n times, where the probability of heads is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?
Sufficiency Principle
If T is sufficient, the "extra information" carried by X is worthless as far as θ is concerned. It is then only natural to consider inference procedures that do not use this extra, irrelevant information. This leads to the Sufficiency Principle: any inference procedure should depend on the data only through sufficient statistics.
Definition: Sufficient Statistic (in terms of Conditional Probability)
(discrete case):
For any x and t, if the conditional pdf of X given T,
P(X = x | T = t),
does not depend on θ, then we say T is a sufficient statistic for θ.
Sufficient Statistic, the general definition (for both discrete and continuous variables):
Let the pdf of the data X be f(x | θ) and the pdf of T be q(t | θ). If
f(x | θ) / q(T(x) | θ) is free of θ (may depend on x)   (∗)
for all x and θ, then T is a sufficient statistic for θ.
Example: Toss a coin n times, where the probability of heads is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?
Bernoulli:
f(x | θ) = ∏_{i=1}^n θ^{x_i} (1 − θ)^{1 − x_i} = θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}.
T = ∑ X_i ~ B(n, θ):
q(t | θ) = C(n, t) θ^t (1 − θ)^{n − t}.
Thus
f(x | θ) / q(T(x) | θ) = θ^{∑ x_i} (1 − θ)^{n − ∑ x_i} / [C(n, ∑ x_i) θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}] = 1 / C(n, ∑ x_i),
which is free of θ. So by the definition, T = ∑ X_i is a sufficient statistic for θ.
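A small numerical sketch (in Python, with an arbitrary illustrative sample x; not part of the derivation) of the ratio criterion (∗) for this example: the ratio f(x | θ) / q(T(x) | θ) stays the same as θ varies.

from math import comb

x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]           # one observed 0/1 sample (illustrative)
n, t = len(x), sum(x)                        # t = T(x) = total number of heads

for theta in (0.2, 0.5, 0.8):
    f_x = theta**t * (1 - theta)**(n - t)                 # joint pmf of the sample
    q_t = comb(n, t) * theta**t * (1 - theta)**(n - t)    # B(n, theta) pmf of T at t
    print(theta, f_x / q_t)                  # always 1 / C(n, t), free of theta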
Example. X_1, …, X_n iid N(θ, 1); is T = X̄ sufficient for θ?
Remarks: The definition (*) is not always easy to apply.
• Need to guess the form of a sufficient statistic.
• Need to figure out the distribution of T.
How to find a sufficient statistic?
(Neyman-Fisher) Factorization Theorem.
T is sufficient if and only if f(x | θ) can be written as the product
f(x | θ) = g(T(x), θ) h(x),
where the first factor depends on x only through T(x) and the second factor is free of θ.
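As a quick numerical illustration of what the factorization implies (a Python sketch with made-up 0/1 data, anticipating the binomial example below): two samples with the same value of T(x) = ∑ x_i produce identical likelihood functions of θ, because h(x) does not involve θ.

import numpy as np

x1 = np.array([1, 1, 1, 0, 0, 0, 1, 0])
x2 = np.array([0, 0, 1, 1, 0, 1, 1, 0])      # different ordering, same sum = 4

def likelihood(x, theta):
    # Bernoulli joint pmf: theta^(sum x) * (1 - theta)^(n - sum x)
    return theta**x.sum() * (1 - theta)**(len(x) - x.sum())

for theta in np.linspace(0.1, 0.9, 5):
    print(theta, likelihood(x1, theta), likelihood(x2, theta))   # the two columns agree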
Example. Binomial: X_1, …, X_n iid bin(1, θ).
Solution 1:
Bernoulli:
f(x | θ) = ∏_{i=1}^n θ^{x_i} (1 − θ)^{1 − x_i} = θ^{∑ x_i} (1 − θ)^{n − ∑ x_i} = [θ^{∑ x_i} (1 − θ)^{n − ∑ x_i}] · [1] = g(∑ x_i, θ) h(x),
with g(t, θ) = θ^t (1 − θ)^{n − t} and h(x) = 1.
So according to the factorization theorem, T = ∑ X_i is a sufficient statistic for θ.
Solution 2:
f(x | θ) = ∏_{i=1}^n θ^{x_i} (1 − θ)^{1 − x_i} = θ^{∑ x_i} (1 − θ)^{n − ∑ x_i} if every x_i ∈ {0, 1}, and 0 otherwise;
that is, f(x | θ) = g(T(x), θ) h(x),
where
g(t, θ) = θ^t (1 − θ)^{n − t}
and
h(x) = 1 if every x_i ∈ {0, 1}, and 0 otherwise.
Hence T = ∑ X_i is a sufficient statistic for θ.
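A simulation sketch (Python; the settings n = 5, t = 2 are illustrative choices) of the conditional-distribution definition: given T = t, every arrangement of the heads is equally likely, no matter what θ is.

import numpy as np
rng = np.random.default_rng(0)

n, t, reps = 5, 2, 200_000
target = np.array([1, 1, 0, 0, 0], dtype=bool)       # one particular arrangement with two heads
for theta in (0.3, 0.7):
    samples = rng.random((reps, n)) < theta          # Bernoulli(theta) tosses
    keep = samples[samples.sum(axis=1) == t]         # condition on T = t
    freq = np.mean((keep == target).all(axis=1))     # conditional frequency of that arrangement
    print(theta, round(freq, 3))                     # approx 1 / C(5, 2) = 0.1 for both theta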
Example. Exp(λ).
Let X_1, …, X_n be a random sample from an exponential distribution with rate λ. Let T = ∑_{i=1}^n X_i, and let f(x | λ) be the joint density of X_1, …, X_n.
f(x | λ) = ∏_{i=1}^n λ e^{−λ x_i} = λ^n e^{−λ ∑ x_i} if every x_i > 0, and 0 otherwise;
that is, f(x | λ) = g(T(x), λ) h(x),
where
g(t, λ) = λ^n e^{−λ t}, and
h(x) = 1 if every x_i > 0, and 0 otherwise.
Hence T = ∑ X_i is a sufficient statistic for λ.
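A numerical sketch (Python, with an arbitrary positive sample) of the ratio criterion (∗) for the exponential example, using the fact that T = ∑ X_i ~ Gamma(n, rate λ):

from math import exp, factorial

x = [0.4, 1.3, 0.7, 2.1, 0.9]                # arbitrary positive sample (illustrative)
n, t = len(x), sum(x)

for lam in (0.5, 1.0, 2.0):
    f_x = lam**n * exp(-lam * t)                                   # joint density of the sample
    q_t = lam**n * t**(n - 1) * exp(-lam * t) / factorial(n - 1)   # Gamma(n, lam) density of T at t
    print(lam, f_x / q_t)                    # equals (n-1)! / t^(n-1), free of lambda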
Example. Normal. iid N(θ,1).
Please derive the sufficient statistic for θ by yourself.
When the range of X depends on θ, we should be more careful with the factorization: the indicator functions must be written out explicitly.
Example. Uniform: X_1, …, X_n iid U(0, θ).
Solution 1:
Let X_1, …, X_n be a random sample from a uniform distribution on (0, θ). Let T = X_(n) = max(X_1, …, X_n), and let f(x | θ) be the joint density of X_1, …, X_n. Then
f(x | θ) = 1/θ^n if 0 < x_i < θ for all i (equivalently, x_(1) > 0 and x_(n) < θ), and 0 otherwise;
that is, f(x | θ) = g(T(x), θ) h(x),
where
g(t, θ) = 1/θ^n if t < θ, and 0 otherwise,
and
h(x) = 1 if x_(1) > 0, and 0 otherwise.
Hence T = X_(n) is a sufficient statistic for θ.
*** I personally prefer this approach because it is most
straightforward. Alternatively, one can use the indicator
function and simplify the solution as illustrated next.
Definition: Indicator function
I_A(x) = 1 if x ∈ A, and 0 if x ∉ A.
Solution 2 (in terms of the indicator function):
Uniform:
f(x | θ) = ∏_{i=1}^n (1/θ) I_(0,θ)(x_i) = (1/θ^n) I_(0,θ)(x_(n)) I_(0,∞)(x_(1)) = [(1/θ^n) I_(0,θ)(x_(n))] · [I_(0,∞)(x_(1))].
So by the factorization theorem, T = X_(n) is a sufficient statistic for θ.
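A numerical sketch (Python, arbitrary sample) of the ratio criterion (∗) here, using the density n t^{n−1}/θ^n of T = X_(n) on (0, θ): for every θ larger than the observed maximum, the ratio equals 1/(n t^{n−1}).

x = [0.8, 2.3, 1.1, 0.4, 1.9]                # illustrative positive sample
n, t = len(x), max(x)

for theta in (3.0, 5.0, 10.0):               # any theta > max(x)
    f_x = 1 / theta**n                       # joint density of the sample on (0, theta)^n
    q_t = n * t**(n - 1) / theta**n          # density of the maximum, evaluated at t
    print(theta, f_x / q_t)                  # constant in theta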
Example: Please derive the sufficient statistics for θ, given a random sample of size n from U(θ, θ + 1).
Solution:
1. Indicator function approach:
Uniform:
f(x | θ) = ∏_{i=1}^n I_(θ, θ+1)(x_i) = I_(θ, ∞)(x_(1)) I_(−∞, θ+1)(x_(n)) = [I_(θ, ∞)(x_(1)) I_(−∞, θ+1)(x_(n))] · [1].
So, T = (X_(1), X_(n)) is a SS for θ.
2. Without the indicator function:
f(x | θ) = ∏_{i=1}^n f(x_i | θ) = 1 if θ < x_(1) and x_(n) < θ + 1, and 0 otherwise;
that is, f(x | θ) = g(T(x), θ) h(x),
where
g((t_1, t_2), θ) = 1 if θ < t_1 and t_2 < θ + 1, and 0 otherwise,
and h(x) = 1.
So T = (X_(1), X_(n)) is a SS for θ.
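A quick Python sketch under the U(θ, θ + 1) model used above (with illustrative data): the likelihood equals 1 exactly when θ < x_(1) and x_(n) < θ + 1, so as a function of θ it depends on the data only through (x_(1), x_(n)).

import numpy as np

def likelihood(x, theta):
    x = np.asarray(x)
    return float(theta < x.min() and x.max() < theta + 1)

x1 = [2.15, 2.40, 2.80, 2.95]
x2 = [2.15, 2.55, 2.60, 2.95]                # same minimum and maximum as x1

for theta in np.linspace(1.8, 2.3, 6):
    print(round(theta, 2), likelihood(x1, theta), likelihood(x2, theta))   # identical columns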
Two-dimensional Examples.
Example. Normal: X_1, …, X_n iid N(μ, σ^2) (both μ and σ^2 unknown).
Let X_1, …, X_n be a random sample from a normal distribution N(μ, σ^2). Let
x̄ = (1/n) ∑_{i=1}^n x_i  and  s^2 = (1/(n − 1)) ∑_{i=1}^n (x_i − x̄)^2,
and let f(x | μ, σ^2) be the joint density of X_1, …, X_n.
f(x | μ, σ^2) = ∏_{i=1}^n (1/(√(2π) σ)) exp(−(x_i − μ)^2 / (2σ^2)) = (2πσ^2)^{−n/2} exp(−∑ (x_i − μ)^2 / (2σ^2)).
Now
∑ (x_i − μ)^2 = ∑ (x_i − x̄ + x̄ − μ)^2 = ∑ (x_i − x̄)^2 + 2(x̄ − μ) ∑ (x_i − x̄) + n(x̄ − μ)^2 = ∑ (x_i − x̄)^2 + n(x̄ − μ)^2 = (n − 1)s^2 + n(x̄ − μ)^2.
Thus,
f(x | μ, σ^2) = (2πσ^2)^{−n/2} exp(−[(n − 1)s^2 + n(x̄ − μ)^2] / (2σ^2)) = g((x̄, s^2), (μ, σ^2)) · h(x),
where h(x) = 1.
By the factorization theorem, T = (X̄, S^2) is sufficient for (μ, σ^2). In this case we say X̄ and S^2 are jointly sufficient statistics for (μ, σ^2).
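A short numerical check (Python, simulated data) of the algebraic identity used in the factorization above: ∑ (x_i − μ)^2 = (n − 1)s^2 + n(x̄ − μ)^2 for any μ.

import numpy as np
rng = np.random.default_rng(1)

x = rng.normal(loc=3.0, scale=2.0, size=20)  # simulated sample (illustrative)
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

for mu in (0.0, 2.5, 3.0, 10.0):
    lhs = np.sum((x - mu)**2)
    rhs = (n - 1) * s2 + n * (xbar - mu)**2
    print(mu, round(lhs, 6), round(rhs, 6))  # the two columns match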