Notes 14 - Wharton Statistics
Statistics 550 Notes 14
Reading: Section 3.2
I. Computation of Bayes procedures for complex problems
For nonconjugate priors, the posterior mean (which is the
Bayes estimator under squared error loss) is not typically
available in closed form.
Example 1: $X_1, \ldots, X_n$ iid $N(\theta, \sigma^2)$, $\sigma^2$ known, and our prior on $\theta$ is a logistic$(a, b)$ distribution:
$$\pi(\theta) = \frac{1}{b} \cdot \frac{\exp\{(\theta - a)/b\}}{[1 + \exp\{(\theta - a)/b\}]^2}$$
The logistic distribution has heavier tails than the normal distribution.
Since $\bar{X}$ is sufficient, we can just compute the posterior given $\bar{X}$.
The posterior pdf is
$$p(\theta \mid \bar{X}) = \frac{p(\bar{X} \mid \theta)\, \pi(\theta)}{p(\bar{X})} = \frac{\dfrac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left\{ -\dfrac{(\bar{X} - \theta)^2}{2\sigma^2/n} \right\} \dfrac{e^{(\theta - a)/b}}{b\, [1 + e^{(\theta - a)/b}]^2}}{\displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi\sigma^2/n}} \exp\left\{ -\dfrac{(\bar{X} - \theta)^2}{2\sigma^2/n} \right\} \dfrac{e^{(\theta - a)/b}}{b\, [1 + e^{(\theta - a)/b}]^2}\, d\theta}$$
The numerator is not proportional to any commonly used density function, and the denominator cannot be evaluated in closed form.
Monte Carlo methods can be used to sample from the
posterior distribution to approximate the Bayes estimator.
For discussion of Monte Carlo methods for Bayesian
inference, see Bayesian Data Analysis by Gelman, Carlin,
Stern and Rubin; these methods will be discussed in Stat
542 taught by Professor Jensen in the Spring.
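As a rough illustration of the Monte Carlo idea (a sketch, not part of the original notes), a random-walk Metropolis sampler can draw from the posterior of Example 1 and average the draws to approximate the posterior mean. The values $a = 0$, $b = 1$, the proposal standard deviation, and the chain lengths below are arbitrary illustrative choices.

```python
import math
import random

def log_posterior(theta, xbar, sigma2_n, a=0.0, b=1.0):
    # log N(xbar | theta, sigma^2/n) + log logistic(theta | a, b), up to an additive constant
    z = (theta - a) / b
    return -(xbar - theta) ** 2 / (2 * sigma2_n) + z - 2 * math.log1p(math.exp(z)) - math.log(b)

def posterior_mean_metropolis(xbar, sigma2_n, n_iter=20000, burn_in=2000, seed=0):
    # Random-walk Metropolis: propose theta' ~ N(theta, 1), accept with
    # probability min(1, posterior(theta')/posterior(theta)).
    random.seed(seed)
    theta = xbar  # start the chain at the MLE
    lp = log_posterior(theta, xbar, sigma2_n)
    total, kept = 0.0, 0
    for i in range(n_iter):
        proposal = theta + random.gauss(0.0, 1.0)
        lp_prop = log_posterior(proposal, xbar, sigma2_n)
        if math.log(random.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        if i >= burn_in:
            total += theta
            kept += 1
    return total / kept
```

For example, with $\bar{X} = 0.5$ and $\sigma^2/n = 0.25$, the approximate posterior mean is shrunk slightly from $\bar{X}$ toward the prior center $a = 0$.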
II. Improper Priors
Bayes estimators are defined with respect to a proper prior distribution $\pi(\theta)$.
Often a weighting function $\pi(\theta)$ is considered that is not a probability distribution; this is called an improper prior.
Example 2: $X \sim N(\mu, 1)$, $\pi(\mu) = 1$.
Example 3: $X \sim \text{Binomial}(p, n)$. Consider the prior $\pi(p) = p^{-1}(1 - p)^{-1}$, $0 < p < 1$, which in some sense corresponds to a Beta(0,0) distribution. However, this is not a proper distribution because
$$\int_0^1 p^{-1}(1 - p)^{-1}\, dp = \infty.$$
For an improper prior, we can still consider the "Bayes" risk of a decision procedure:
$$r(\pi) = \int R(\theta, \delta)\, \pi(\theta)\, d\theta \qquad (1.1)$$
An estimator $\delta^*(x)$ is called a generalized Bayes estimator with respect to a weighting function $\pi(\theta)$ (even if it is not a proper probability distribution) if it minimizes the "Bayes" risk (1.1) over all estimators.
We can write the "Bayes" risk (1.1) as
$$r(\pi) = \int \left[ \int l(\theta, \delta(X))\, p(X \mid \theta)\, dX \right] \pi(\theta)\, d\theta = \int \left[ \int l(\theta, \delta(X))\, p(X \mid \theta)\, \pi(\theta)\, d\theta \right] dX$$
A decision procedure $\delta(X)$ for which, for all $X$,
$$\delta(X) = \arg\min_{a \in \mathcal{A}} \int l(\theta, a)\, p(X \mid \theta)\, \pi(\theta)\, d\theta$$
is a generalized Bayes estimator (this is the analogue of Proposition 3.2.1).
Let
$$s(\theta \mid X) = \frac{p(X \mid \theta)\, \pi(\theta)}{\int p(X \mid \theta)\, \pi(\theta)\, d\theta} \qquad (1.2)$$
Note that if $\int p(X \mid \theta)\, \pi(\theta)\, d\theta < \infty$, then
$$a = \arg\min_{a \in \mathcal{A}} \int l(\theta, a)\, p(X \mid \theta)\, \pi(\theta)\, d\theta \;\text{ if and only if }\; a = \arg\min_{a \in \mathcal{A}} \int l(\theta, a)\, s(\theta \mid X)\, d\theta.$$
Sometimes $s(\theta \mid X)$ is a proper probability distribution even if $\pi(\theta)$ is not a proper probability distribution, and then we can think of $s(\theta \mid X)$ as the "posterior" density function of $\theta \mid X$ and $\int l(\theta, a)\, s(\theta \mid X)\, d\theta$ as the "posterior" risk, so that an estimator which minimizes the "posterior" risk for each $X$ is a generalized Bayes estimator.
Example 3 continued: For $X \sim \text{Binomial}(p, n)$ and the prior $\pi(p) = p^{-1}(1 - p)^{-1}$, $0 < p < 1$, consider the generalized Bayes estimator under squared error loss. Expression (1.2) equals
$$s(p \mid X) = \frac{\binom{n}{X} p^X (1 - p)^{n - X}\, p^{-1}(1 - p)^{-1}}{\int_0^1 \binom{n}{X} p^X (1 - p)^{n - X}\, p^{-1}(1 - p)^{-1}\, dp} \quad \text{for } 0 < p < 1.$$
For $1 \le X \le n - 1$, $s(p \mid X)$ is a Beta$(X, n - X)$ distribution, i.e., $s(p \mid X)$ is a proper probability distribution, and the action of the generalized Bayes estimator with respect to squared error loss is the expected value of $p$ under the distribution $s(p \mid X)$, which equals $\frac{X}{X + (n - X)} = \frac{X}{n}$. For $X = 0$ and $X = n$, the "posterior" density $s(p \mid X)$ is no longer proper, but it can be shown that
$$\frac{X}{n} = \arg\min_{a \in \mathcal{A}} \int l(p, a)\, p(X \mid p)\, \pi(p)\, dp.$$
Thus, the generalized Bayes estimator of $p$ is $\frac{X}{n}$ for $\pi(p) = p^{-1}(1 - p)^{-1}$.
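As a quick numerical check (a sketch of my own, not from the notes), the mean of $s(p \mid X) \propto p^{X-1}(1-p)^{n-X-1}$ can be approximated by a Riemann sum and compared with $X/n$; the values $x = 3$ and $n = 10$ below are illustrative.

```python
def generalized_bayes_estimate(x, n, grid=100000):
    # "Posterior" s(p|X) is proportional to p^(x-1) * (1-p)^(n-x-1) for 1 <= x <= n-1;
    # approximate its mean (the generalized Bayes estimate under squared error loss)
    # with a midpoint Riemann sum on (0, 1). The normalizing constant cancels.
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        w = p ** (x - 1) * (1 - p) ** (n - x - 1)
        num += p * w
        den += w
    return num / den
```

The sum matches the exact Beta$(x, n-x)$ mean $x/n$ to high accuracy.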
Another useful variant of Bayes estimators is limits of Bayes estimators.
A nonrandomized estimator $\delta(x)$ is a limit of Bayes estimators if there exists a sequence of proper priors $\pi_\nu$ and Bayes estimators $\delta_\nu$ with respect to these prior distributions such that $\delta_\nu(x) \rightarrow \delta(x)$ for all $x$.
Example 3 continued: For $X \sim \text{Binomial}(p, n)$, $\delta(X) = X/n$ is a limit of Bayes estimators. Consider a Beta$(r, s)$ prior (which is proper if $r > 0$, $s > 0$); the Bayes estimator is $\frac{X + r}{n + r + s}$. Consider the sequence of priors Beta(1,1), Beta(1/2,1/2), Beta(1/3,1/3), .... Since
$$\lim_{r \rightarrow 0,\, s \rightarrow 0} \frac{X + r}{n + r + s} = \frac{X}{n},$$
we have that $\delta(X) = X/n$ is a limit of Bayes estimators.
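The convergence along the sequence of Beta$(1/k, 1/k)$ priors can be traced in exact rational arithmetic (an illustrative sketch; the data $x = 3$, $n = 10$ are made up):

```python
from fractions import Fraction

def bayes_estimate(x, n, r, s):
    # Posterior under a Beta(r, s) prior for Binomial(n, p) data is Beta(x + r, n - x + s),
    # so the Bayes estimate under squared error loss (the posterior mean) is
    # (x + r) / (n + r + s).
    return (Fraction(x) + r) / (Fraction(n) + r + s)

# Along the priors Beta(1/k, 1/k), the Bayes estimates approach x/n = 3/10:
x, n = 3, 10
estimates = [bayes_estimate(x, n, Fraction(1, k), Fraction(1, k)) for k in (1, 2, 3, 100)]
```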
From a Bayesian point of view, estimators that are limits of Bayes estimators are somewhat more desirable than generalized Bayes estimators (often estimators are both limits of Bayes estimators and generalized Bayes estimators, as in Example 3). This is because, by construction, a limit of Bayes estimators must be close to a proper Bayes estimator. In contrast, a generalized Bayes estimator may not be close to any proper Bayes estimator.
III. Admissibility of Bayes rules:
In general, Bayes rules are admissible.
Theorem: Suppose that $\Theta$ is an interval, $\delta^*$ is a Bayes rule with respect to a prior density function $\pi(\theta)$ such that $\pi(\theta) > 0$ for all $\theta \in \Theta$, and $R(\theta, d)$ is a continuous function of $\theta$ for all $d$. Then $\delta^*$ is admissible.
Proof: The proof is by contradiction. Suppose that $\delta^*$ is inadmissible. There is then another estimate, $\delta$, such that $R(\theta, \delta^*) \ge R(\theta, \delta)$ for all $\theta$ and with strict inequality for some $\theta$, say $\theta_0$. Since $R(\theta, \delta^*) - R(\theta, \delta)$ is a continuous function of $\theta$, there is an $\epsilon > 0$ and an interval $\theta_0 \pm h$ such that
$$R(\theta, \delta^*) - R(\theta, \delta) \ge \epsilon \;\text{ for }\; \theta_0 - h \le \theta \le \theta_0 + h.$$
Then,
$$\int_{-\infty}^{\infty} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \ge \int_{\theta_0 - h}^{\theta_0 + h} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \ge \epsilon \int_{\theta_0 - h}^{\theta_0 + h} \pi(\theta)\, d\theta > 0.$$
But this contradicts the fact that $\delta^*$ is a Bayes rule, because a Bayes rule has the property that
$$B(\delta^*) - B(\delta) = \int_{-\infty}^{\infty} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \le 0.$$
The proof is complete.
The theorem can be regarded as both a positive and a negative result. It is positive in that it identifies a certain class of estimates as being admissible, in particular, any Bayes estimate satisfying its hypotheses. It is negative in that there are apparently very many admissible estimates, one for every prior distribution that satisfies the hypotheses of the theorem, and some of these might make little sense.
Complete class theorems characterize the class of all admissible estimators. Roughly, for most models, the class of all admissible estimators is the class of all Bayes estimators and limits of Bayes estimators.