Notes 14 - Wharton Statistics
Statistics 550 Notes 14
Reading: Section 3.2
I. Computation of Bayes procedures for complex problems
For nonconjugate priors, the posterior mean (which is the
Bayes estimator under squared error loss) is not typically
available in closed form.
Example 1: $X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$, $\sigma^2$ known, and our prior on $\theta$ is a logistic$(a, b)$ distribution:
$$\pi(\theta) = \frac{1}{b} \cdot \frac{\exp\{(\theta - a)/b\}}{[1 + \exp\{(\theta - a)/b\}]^2}$$
The logistic distribution has heavier tails than the normal distribution.
Since $\bar{X}$ is sufficient, we can just compute the posterior given $\bar{X}$.
The posterior pdf is
$$p(\theta \mid \bar{X}) = \frac{p(\bar{X} \mid \theta)\, \pi(\theta)}{p(\bar{X})} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left\{ -\dfrac{(\bar{X} - \theta)^2}{2\sigma^2/n} \right\} \dfrac{e^{(\theta - a)/b}}{b\, [1 + e^{(\theta - a)/b}]^2}}{\displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left\{ -\dfrac{(\bar{X} - \theta)^2}{2\sigma^2/n} \right\} \dfrac{e^{(\theta - a)/b}}{b\, [1 + e^{(\theta - a)/b}]^2}\, d\theta}$$
The numerator is not proportional to any commonly used density function, and the denominator cannot be evaluated in closed form.
Monte Carlo methods can be used to sample from the
posterior distribution to approximate the Bayes estimator.
For discussion of Monte Carlo methods for Bayesian
inference, see Bayesian Data Analysis by Gelman, Carlin,
Stern and Rubin; these methods will be discussed in Stat
542 taught by Professor Jensen in the Spring.
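As a rough illustration of the Monte Carlo idea (a sketch, not part of the original notes), a random-walk Metropolis sampler can draw from the posterior of Example 1 and average the draws to approximate the posterior mean. The values $a = 0$, $b = 1$, the proposal standard deviation, and the chain lengths below are arbitrary illustrative choices.

```python
import math
import random

def log_posterior(theta, xbar, sigma2_n, a=0.0, b=1.0):
    # log N(xbar | theta, sigma^2/n) + log logistic(theta | a, b), up to an additive constant
    z = (theta - a) / b
    return -(xbar - theta) ** 2 / (2 * sigma2_n) + z - 2 * math.log1p(math.exp(z)) - math.log(b)

def posterior_mean_metropolis(xbar, sigma2_n, n_iter=20000, burn_in=2000, seed=0):
    # Random-walk Metropolis: propose theta' ~ N(theta, 1), accept with
    # probability min(1, posterior(theta')/posterior(theta)).
    random.seed(seed)
    theta = xbar  # start the chain at the MLE
    lp = log_posterior(theta, xbar, sigma2_n)
    total, kept = 0.0, 0
    for i in range(n_iter):
        proposal = theta + random.gauss(0.0, 1.0)
        lp_prop = log_posterior(proposal, xbar, sigma2_n)
        if math.log(random.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        if i >= burn_in:
            total += theta
            kept += 1
    return total / kept
```

For example, with $\bar{X} = 0.5$ and $\sigma^2/n = 0.25$, the approximate posterior mean is shrunk slightly from $\bar{X}$ toward the prior center $a = 0$.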
II. Improper Priors
Bayes estimators are defined with respect to a proper prior distribution $\pi(\theta)$.
Often a weighting function $\pi(\theta)$ is considered that is not a probability distribution; this is called an improper prior.
Example 2: $X \sim N(\mu, 1)$, $\pi(\mu) = 1$.
Example 3: $X \sim \text{Binomial}(p, n)$. Consider the prior $\pi(p) = p^{-1}(1 - p)^{-1}$, $0 < p < 1$, which in some sense corresponds to a Beta(0,0) distribution. However, this is not a proper distribution because
$$\int_0^1 p^{-1}(1 - p)^{-1}\, dp = \infty.$$
For an improper prior, we can still consider the "Bayes" risk of a decision procedure:
$$r(\pi) = \int R(\theta, \delta)\, \pi(\theta)\, d\theta \qquad (1.1)$$
An estimator $\delta^*(x)$ is called a generalized Bayes estimator with respect to a weighting function $\pi(\theta)$ (even if it is not a proper probability distribution) if it minimizes the "Bayes" risk (1.1) over all estimators.
We can write the "Bayes" risk (1.1) as
$$r(\pi) = \int \left[ \int l(\theta, \delta(X))\, p(X \mid \theta)\, dX \right] \pi(\theta)\, d\theta = \int \left[ \int l(\theta, \delta(X))\, p(X \mid \theta)\, \pi(\theta)\, d\theta \right] dX$$
A decision procedure $\delta(X)$ for which, for all $X$,
$$\delta(X) = \arg\min_{a \in \mathcal{A}} \int l(\theta, a)\, p(X \mid \theta)\, \pi(\theta)\, d\theta$$
is a generalized Bayes estimator (this is the analogue of Proposition 3.2.1).
Let
$$s(\theta \mid X) = \frac{p(X \mid \theta)\, \pi(\theta)}{\int p(X \mid \theta)\, \pi(\theta)\, d\theta} \qquad (1.2)$$
Note that if $\int p(X \mid \theta)\, \pi(\theta)\, d\theta < \infty$, then
$$a = \arg\min_{a \in \mathcal{A}} \int l(\theta, a)\, p(X \mid \theta)\, \pi(\theta)\, d\theta \;\text{ if and only if }\; a = \arg\min_{a \in \mathcal{A}} \int l(\theta, a)\, s(\theta \mid X)\, d\theta.$$
Sometimes $s(\theta \mid X)$ is a proper probability distribution even if $\pi(\theta)$ is not a proper probability distribution, and then we can think of $s(\theta \mid X)$ as the "posterior" density function of $\theta \mid X$ and $\int l(\theta, a)\, s(\theta \mid X)\, d\theta$ as the "posterior" risk, so that an estimator which minimizes the "posterior" risk for each $X$ is a generalized Bayes estimator.
Example 3 continued: For $X \sim \text{Binomial}(p, n)$ and the prior $\pi(p) = p^{-1}(1 - p)^{-1}$, $0 < p < 1$, consider the generalized Bayes estimator under squared error loss. Expression (1.2) equals
$$s(p \mid X) = \frac{\binom{n}{X} p^X (1 - p)^{n - X}\, p^{-1}(1 - p)^{-1}}{\int_0^1 \binom{n}{X} p^X (1 - p)^{n - X}\, p^{-1}(1 - p)^{-1}\, dp} \quad \text{for } 0 < p < 1.$$
For $1 \le X \le n - 1$, $s(p \mid X)$ is a Beta$(X, n - X)$ distribution, i.e., $s(p \mid X)$ is a proper probability distribution, and the action of the generalized Bayes estimator with respect to squared error loss is the expected value of $p$ under the distribution $s(p \mid X)$, which equals $\frac{X}{X + (n - X)} = \frac{X}{n}$. For $X = 0$ and $X = n$, the "posterior" density $s(p \mid X)$ is no longer proper, but it can be shown that
$$\frac{X}{n} = \arg\min_{a \in \mathcal{A}} \int l(p, a)\, p(X \mid p)\, \pi(p)\, dp.$$
Thus, the generalized Bayes estimator of $p$ is $\frac{X}{n}$ for $\pi(p) = p^{-1}(1 - p)^{-1}$.
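As a quick numerical check (a sketch of my own, not from the notes), the mean of $s(p \mid X) \propto p^{X-1}(1-p)^{n-X-1}$ can be approximated by a Riemann sum and compared with $X/n$; the values $x = 3$ and $n = 10$ below are illustrative.

```python
def generalized_bayes_estimate(x, n, grid=100000):
    # "Posterior" s(p|X) is proportional to p^(x-1) * (1-p)^(n-x-1) for 1 <= x <= n-1;
    # approximate its mean (the generalized Bayes estimate under squared error loss)
    # with a midpoint Riemann sum on (0, 1). The normalizing constant cancels.
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        w = p ** (x - 1) * (1 - p) ** (n - x - 1)
        num += p * w
        den += w
    return num / den
```

The sum matches the exact Beta$(x, n-x)$ mean $x/n$ to high accuracy.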
Another useful variant of Bayes estimators is limits of Bayes estimators.
A nonrandomized estimator $\delta(x)$ is a limit of Bayes estimators if there exists a sequence of proper priors $\pi_\nu$ and Bayes estimators $\delta_\nu$ with respect to these prior distributions such that $\delta_\nu(x) \rightarrow \delta(x)$ for all $x$.
Example 3 continued: For $X \sim \text{Binomial}(p, n)$, $\delta(X) = X/n$ is a limit of Bayes estimators. Consider a Beta$(r, s)$ prior (which is proper if $r > 0$, $s > 0$); the Bayes estimator is $\frac{X + r}{n + r + s}$. Consider the sequence of priors Beta(1,1), Beta(1/2,1/2), Beta(1/3,1/3), .... Since
$$\lim_{r \rightarrow 0,\, s \rightarrow 0} \frac{X + r}{n + r + s} = \frac{X}{n},$$
we have that $\delta(X) = X/n$ is a limit of Bayes estimators.
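The convergence along the sequence of Beta$(1/k, 1/k)$ priors can be traced in exact rational arithmetic (an illustrative sketch; the data $x = 3$, $n = 10$ are made up):

```python
from fractions import Fraction

def bayes_estimate(x, n, r, s):
    # Posterior under a Beta(r, s) prior for Binomial(n, p) data is Beta(x + r, n - x + s),
    # so the Bayes estimate under squared error loss (the posterior mean) is
    # (x + r) / (n + r + s).
    return (Fraction(x) + r) / (Fraction(n) + r + s)

# Along the priors Beta(1/k, 1/k), the Bayes estimates approach x/n = 3/10:
x, n = 3, 10
estimates = [bayes_estimate(x, n, Fraction(1, k), Fraction(1, k)) for k in (1, 2, 3, 100)]
```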
From a Bayesian point of view, estimators that are limits of Bayes estimators are somewhat more desirable than generalized Bayes estimators (often estimators are both limits of Bayes estimators and generalized Bayes estimators, as in Example 3). This is because, by construction, a limit of Bayes estimators must be close to a proper Bayes estimator. In contrast, a generalized Bayes estimator may not be close to any proper Bayes estimator.
III. Admissibility of Bayes rules:
In general, Bayes rules are admissible.
Theorem: Suppose that $\Theta$ is an interval, $\delta^*$ is a Bayes rule with respect to a prior density function $\pi(\theta)$ such that $\pi(\theta) > 0$ for all $\theta \in \Theta$, and $R(\theta, d)$ is a continuous function of $\theta$ for all $d$. Then $\delta^*$ is admissible.
Proof: The proof is by contradiction. Suppose that $\delta^*$ is inadmissible. There is then another estimate, $\delta$, such that $R(\theta, \delta^*) \ge R(\theta, \delta)$ for all $\theta$ and with strict inequality for some $\theta$, say $\theta_0$. Since $R(\theta, \delta^*) - R(\theta, \delta)$ is a continuous function of $\theta$, there is an $\epsilon > 0$ and an interval $\theta_0 \pm h$ such that
$$R(\theta, \delta^*) - R(\theta, \delta) \ge \epsilon \;\text{ for }\; \theta_0 - h \le \theta \le \theta_0 + h.$$
Then,
$$\int_{-\infty}^{\infty} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \ge \int_{\theta_0 - h}^{\theta_0 + h} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \ge \epsilon \int_{\theta_0 - h}^{\theta_0 + h} \pi(\theta)\, d\theta > 0.$$
But this contradicts the fact that $\delta^*$ is a Bayes rule, because a Bayes rule has the property that
$$B(\delta^*) - B(\delta) = \int_{-\infty}^{\infty} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \le 0.$$
The proof is complete.
The theorem can be regarded as both a positive and a negative result. It is positive in that it identifies a certain class of estimates as being admissible, in particular, any Bayes estimate satisfying its hypotheses. It is negative in that there are apparently very many admissible estimates, one for every prior distribution that satisfies the hypotheses of the theorem, and some of these might make little sense.
Complete class theorems characterize the class of all admissible estimators. Roughly, for most models, the class of all admissible estimators is the class of all Bayes estimators and limits of Bayes estimators.