Statistics 550 Notes 17
Reading: Section 3.2-3.3
I. Improper priors and Generalized Bayes estimators
Review: Bayes estimators are defined with respect to a proper prior distribution $\pi(\theta)$, where proper means that $\pi(\theta)$ is a probability distribution.

Often a weighting function $\pi(\theta)$ is considered that is not a probability distribution; this is called an improper prior.
Example 1: $X \sim \text{Binomial}(n, p)$. Consider the prior $\pi(p) = p^{-1}(1-p)^{-1}$, $0 < p < 1$, which in some sense corresponds to a Beta(0,0) distribution. However, this is not a proper distribution because
$$\int_0^1 p^{-1}(1-p)^{-1}\, dp = \infty.$$
We can still consider the “Bayes” risk of a decision procedure:
$$r(\delta) = \int R(\theta, \delta)\, \pi(\theta)\, d\theta \quad (1.1)$$
An estimator $\delta^*(x)$ is called a generalized Bayes estimator with respect to a weighting function $\pi(\theta)$ (even if it is not a proper probability distribution) if it minimizes the “Bayes” risk (1.1) over all estimators.
We can write the “Bayes” risk (1.1) as
$$r(\delta) = \int \int l(\theta, \delta(X))\, p(X \mid \theta)\, \pi(\theta)\, d\theta\, dX = \int \left[ \int l(\theta, \delta(X))\, \frac{p(X \mid \theta)\, \pi(\theta)}{\int p(X \mid \theta')\, \pi(\theta')\, d\theta'}\, d\theta \right] q(X)\, dX, \quad (1.2)$$
where $q(X) = \int p(X \mid \theta)\, \pi(\theta)\, d\theta$.
A decision procedure $\delta(X)$ which minimizes
$$\int l(\theta, \delta(X))\, \frac{p(X \mid \theta)\, \pi(\theta)}{\int p(X \mid \theta')\, \pi(\theta')\, d\theta'}\, d\theta \quad (1.3)$$
for each $X$ is a generalized Bayes estimator (this is the analogue of Proposition 3.2.1).
Sometimes
$$s(\theta \mid X) = \frac{p(X \mid \theta)\, \pi(\theta)}{\int p(X \mid \theta')\, \pi(\theta')\, d\theta'} \quad (1.4)$$
is a proper probability distribution even if $\pi(\theta)$ is not a proper probability distribution, and then we can think of $s(\theta \mid X)$ as the “posterior” density function of $\theta \mid X$ and (1.3) as the “posterior” risk.
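As a concrete illustration of this recipe, here is a minimal numerical sketch (assuming NumPy and SciPy; the particular values of $n$ and $X$ are illustrative) that minimizes the “posterior” expected loss (1.3) for the setup of Example 1 under squared error loss:

```python
import numpy as np
from scipy.integrate import quad

# Setup of Example 1: X ~ Binomial(n, p) with improper prior pi(p) = p^(-1) * (1-p)^(-1).
n, X = 10, 3

# Unnormalized "posterior" p(X | p) * pi(p); the binomial coefficient cancels in (1.3).
unnorm = lambda p: p**(X - 1) * (1 - p)**(n - X - 1)
norm_const, _ = quad(unnorm, 0, 1)   # finite here because 1 <= X <= n - 1

def posterior_loss(a):
    """Compute the "posterior" expected squared error loss (1.3) of action a."""
    val, _ = quad(lambda p: (p - a)**2 * unnorm(p), 0, 1)
    return val / norm_const

# Minimize (1.3) over a grid of candidate actions.
grid = np.linspace(0.01, 0.99, 99)
best = grid[np.argmin([posterior_loss(a) for a in grid])]
print(best, X / n)   # both are 0.3: the generalized Bayes estimate equals X/n
```

Under squared error loss the minimizer of (1.3) is the mean of $s(p \mid X)$, so the grid search simply recovers the posterior mean $X/n$.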
Example 1 continued: For $X \sim \text{Binomial}(n, p)$ and the prior $\pi(p) = p^{-1}(1-p)^{-1}$, $0 < p < 1$, consider the generalized Bayes estimator under squared error loss. (1.4) equals
$$s(p \mid X) = \frac{\binom{n}{X} p^X (1-p)^{n-X}\, p^{-1}(1-p)^{-1}}{\int_0^1 \binom{n}{X} p^X (1-p)^{n-X}\, p^{-1}(1-p)^{-1}\, dp} \quad \text{for } 0 < p < 1.$$
For $1 \le X \le n-1$, $s(p \mid X)$ is a Beta$(X, n-X)$ distribution and the generalized Bayes estimator with respect to squared error loss is the expected value of $p$ under the distribution $s(p \mid X)$, which equals $\frac{X}{X + (n-X)} = \frac{X}{n}$. For $X = 0$ and $X = n$, the “posterior” density $s(p \mid X)$ is no longer proper. However, it can be shown that the “posterior” expected loss (1.3) of $X/n$ is finite and that $X/n$ minimizes (1.3). Thus, for all $X$, $X/n$ is the generalized Bayes estimator of $p$ for $\pi(p) = p^{-1}(1-p)^{-1}$.
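The boundary cases can be probed numerically as well. The following sketch (again assuming SciPy; the truncation points are illustrative) truncates the unnormalized integral $\int (p-a)^2\, p^{X-1}(1-p)^{n-X-1}\, dp$ at $\epsilon$ for $X = 0$, suggesting that it diverges as $\epsilon \rightarrow 0$ for $a \ne 0$ but converges for $a = 0 = X/n$:

```python
from scipy.integrate import quad

n, X = 10, 0   # boundary case: s(p | X) is no longer a proper density

def truncated_loss(a, eps):
    """Unnormalized posterior expected squared error loss of action a,
    truncated at eps to expose the behavior of the integral near p = 0."""
    val, _ = quad(lambda p: (p - a)**2 * p**(X - 1) * (1 - p)**(n - X - 1),
                  eps, 1, limit=200)
    return val

for eps in [1e-2, 1e-4, 1e-6]:
    print(eps, round(truncated_loss(0.1, eps), 3), round(truncated_loss(0.0, eps), 3))
# The a = 0.1 column grows like a^2 * log(1/eps); the a = 0 = X/n column stabilizes,
# consistent with X/n minimizing the "posterior" expected loss at the boundary.
```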
Another useful variant of Bayes estimators is the limit of Bayes estimators.

A nonrandomized estimator $\delta(x)$ is a limit of Bayes estimators if there exists a sequence of proper priors $\pi_v$ and Bayes estimators $\delta_v$ with respect to these prior distributions such that $\delta_v(x) \rightarrow \delta(x)$ for all $x$.
Example 1 continued: For $X \sim \text{Binomial}(n, p)$, $\delta(X) = X/n$ is a limit of Bayes estimators. Consider a Beta$(r, s)$ prior (which is proper if $r > 0$, $s > 0$); the Bayes estimator is $\frac{X + r}{n + r + s}$. Consider the sequence of priors Beta$(1,1)$, Beta$(1/2, 1/2)$, Beta$(1/3, 1/3), \ldots$ Since
$$\lim_{\substack{r \rightarrow 0 \\ s \rightarrow 0}} \frac{X + r}{n + r + s} = \frac{X}{n},$$
we have that $\delta(X) = X/n$ is a limit of Bayes estimators.
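A quick sketch of this convergence (plain Python; the values of $n$ and $X$ are illustrative), following the prior sequence Beta$(1/v, 1/v)$ from the example:

```python
n, X = 10, 3
for v in [1, 2, 3, 10, 100, 1000]:
    r = s = 1.0 / v                    # priors Beta(1,1), Beta(1/2,1/2), Beta(1/3,1/3), ...
    bayes = (X + r) / (n + r + s)      # Bayes estimate under the Beta(r, s) prior
    print(v, bayes)                    # approaches X / n = 0.3
```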
From a Bayesian point of view, estimators that are limits of Bayes estimators are somewhat more desirable than generalized Bayes estimators (often estimators are both limits of Bayes estimators and generalized Bayes estimators, as in Example 1). This is because, by construction, a limit of Bayes estimators must be close to a proper Bayes estimator. In contrast, a generalized Bayes estimator may not be close to any proper Bayes estimator.
II. Admissibility of Bayes rules

In general, Bayes rules are admissible.

Theorem: Suppose that $\Theta$ is an interval and $\delta^*$ is a Bayes rule with respect to a prior density function $\pi(\theta)$ such that $\pi(\theta) > 0$ for all $\theta \in \Theta$, and $R(\theta, d)$ is a continuous function of $\theta$ for all $d$. Then $\delta^*$ is admissible.
Proof: The proof is by contradiction. Suppose that $\delta^*$ is inadmissible. There is then another estimator, $\delta$, such that $R(\theta, \delta^*) \ge R(\theta, \delta)$ for all $\theta$, with strict inequality for some $\theta$, say $\theta_0$. Since $R(\theta, \delta^*) - R(\theta, \delta)$ is a continuous function of $\theta$, there is an $\epsilon > 0$ and an $h > 0$ such that
$$R(\theta, \delta^*) - R(\theta, \delta) \ge \epsilon \quad \text{for } \theta_0 - h \le \theta \le \theta_0 + h.$$
Then,
$$\int_\Theta \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \ge \int_{\theta_0 - h}^{\theta_0 + h} \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \ge \epsilon \int_{\theta_0 - h}^{\theta_0 + h} \pi(\theta)\, d\theta > 0.$$
But this contradicts the fact that $\delta^*$ is a Bayes rule, because a Bayes rule has the property that
$$B(\delta^*) - B(\delta) = \int_\Theta \left[ R(\theta, \delta^*) - R(\theta, \delta) \right] \pi(\theta)\, d\theta \le 0.$$
The proof is complete. $\square$
The theorem can be regarded as both a positive and a negative result. It is positive in that it identifies a certain class of estimates as being admissible, in particular, any Bayes estimate satisfying the hypotheses of the theorem. It is negative in that there are apparently very many admissible estimates, one for every prior distribution that satisfies the hypotheses of the theorem, and some of these might make little sense (like $\delta(X) = 3$ for the normal distribution example considered previously).
Complete class theorems characterize the class of all admissible estimators. Roughly, for most models the class of all admissible estimators is the class of all Bayes estimators and limits of Bayes estimators.
III. Minimax Procedures (Section 3.3)

The minimax criterion minimizes the worst possible risk. That is, we prefer $\delta$ to $\delta'$ if and only if
$$\sup_\theta R(\theta, \delta) < \sup_\theta R(\theta, \delta').$$
A procedure $\delta^*$ is minimax (over a class of considered decision procedures) if it satisfies
$$\sup_\theta R(\theta, \delta^*) = \inf_\delta \sup_\theta R(\theta, \delta).$$
Let $\delta_\pi$ denote the Bayes estimator with respect to the prior $\pi(\theta)$ and let
$$r_\pi = E_\pi\left[ E[l(\theta, \delta_\pi(X)) \mid \theta] \right] = E_\pi[R(\theta, \delta_\pi)]$$
denote the Bayes risk of the Bayes estimator for the prior $\pi(\theta)$.

A prior distribution $\pi$ is least favorable if $r_\pi \ge r_{\pi'}$ for all prior distributions $\pi'$. This is the prior distribution which causes the statistician the greatest average loss, assuming the statistician uses the Bayes estimator.
Theorem 3.3.2 (I’ve expanded on the statement of it): Suppose that $\pi$ is a prior distribution on $\Theta$ and $\delta_\pi$ is a Bayes estimator with respect to $\pi$ such that
$$r(\delta_\pi) = \int R(\theta, \delta_\pi)\, d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi). \quad (1.5)$$
Then:
(i) $\delta_\pi$ is minimax.
(ii) If $\delta_\pi$ is the unique Bayes solution with respect to $\pi$, it is the unique minimax procedure.
(iii) $\pi$ is a least favorable prior.
Proof:
(i) Let $\delta$ be any other procedure. Then,
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\, d\pi(\theta) \ge \int R(\theta, \delta_\pi)\, d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi).$$
(ii) This follows by replacing $\ge$ by $>$ in the second inequality of the proof of (i).
(iii) Let $\pi'$ be some other prior distribution for $\theta$. Then,
$$r_{\pi'} = \int R(\theta, \delta_{\pi'})\, d\pi'(\theta) \le \int R(\theta, \delta_\pi)\, d\pi'(\theta) \le \sup_\theta R(\theta, \delta_\pi) = r_\pi.$$
Corollary: If a Bayes procedure $\delta_\pi$ has constant risk, then it is minimax.
Proof: If $\delta_\pi$ has constant risk, then (1.5) is clearly satisfied.
Example 1 (Example 3.3.1, Problem 3.3.4): Suppose $X_1, \ldots, X_n$ are iid Bernoulli$(\theta)$ and we want to estimate $\theta$. Consider the squared error loss function $l(\theta, a) = (\theta - a)^2$. For squared error loss and a Beta$(r, s)$ prior, we showed in Notes 16 that the Bayes estimator is
$$\hat{\theta}_{r,s} = \frac{r + \sum_{i=1}^n x_i}{r + s + n}.$$
We now seek to choose $r$ and $s$ so that $\hat{\theta}_{r,s}$ has constant risk. The risk of $\hat{\theta}_{r,s}$ is
$$
\begin{aligned}
R(\theta, \hat{\theta}_{r,s}) &= E_\theta\left[\left(\frac{r + \sum_{i=1}^n x_i}{r + s + n} - \theta\right)^2\right] \\
&= \mathrm{Var}_\theta\left(\frac{r + \sum_{i=1}^n x_i}{r + s + n}\right) + \left(E_\theta\left[\frac{r + \sum_{i=1}^n x_i}{r + s + n}\right] - \theta\right)^2 \\
&= \frac{n\theta(1-\theta)}{(r + s + n)^2} + \left(\frac{r + n\theta}{r + s + n} - \theta\right)^2 \\
&= \frac{n\theta(1-\theta) + \left(r - \theta(r+s)\right)^2}{(r + s + n)^2}.
\end{aligned}
$$
The coefficient on $\theta^2$ in the numerator is $(r+s)^2 - n$ and the coefficient on $\theta$ in the numerator is $n - 2r(r+s)$. We choose $r$ and $s$ so that both these coefficients are zero:
$$(r+s)^2 - n = 0, \qquad n - 2r(r+s) = 0.$$
Solving these equations gives $r = s = \frac{\sqrt{n}}{2}$.
The unique minimax estimator is
$$\hat{\theta}_{\text{minimax}} = \hat{\theta}_{\frac{\sqrt{n}}{2}, \frac{\sqrt{n}}{2}} = \frac{\frac{\sqrt{n}}{2} + \sum_{i=1}^n x_i}{\sqrt{n} + n},$$
which has constant risk $\frac{1}{4(1 + \sqrt{n})^2}$, compared to risk $\frac{\theta(1-\theta)}{n}$ for the MLE $\bar{X}$.

For small $n$, the minimax estimator is better than the MLE for a large range of $\theta$. For large $n$, the minimax estimator is better than the MLE for only a small range of $\theta$ near 0.5.
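A minimal sketch of this comparison (assuming NumPy; the sample sizes are illustrative), which evaluates both risk functions on a grid of $\theta$ and reports the fraction of the grid where the minimax rule has smaller risk:

```python
import numpy as np

thetas = np.linspace(0, 1, 1001)
for n in [5, 10000]:
    minimax_risk = np.full_like(thetas, 1.0 / (4 * (1 + np.sqrt(n))**2))  # constant risk
    mle_risk = thetas * (1 - thetas) / n                                  # risk of the MLE X-bar
    frac = np.mean(minimax_risk < mle_risk)   # fraction of theta grid where minimax wins
    print(n, round(float(frac), 3))
# For n = 5 the minimax rule beats the MLE on most of the range of theta;
# for n = 10000 it wins only on a narrow interval of theta around 0.5.
```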
Minimax as limit of Bayes rules:

If the parameter space $\Theta$ is not bounded, minimax rules are often not Bayes rules but instead can be obtained as limits of Bayes rules. To deal with such situations, we need an extension of Theorem 3.3.2.

Theorem 3.3.3: Let $\delta^*$ be a decision rule such that $\sup_\theta R(\theta, \delta^*) = r < \infty$. Let $\{\pi_k\}$ be a sequence of prior distributions and let $r_k$ be the Bayes risk of the Bayes rule with respect to the prior $\pi_k$. If $r_k \rightarrow r$ as $k \rightarrow \infty$, then $\delta^*$ is minimax.
Proof: Suppose $\delta$ is any other estimator. Then,
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\, d\pi_k(\theta) \ge r_k,$$
and this holds for every $k$. Hence, letting $k \rightarrow \infty$, $\sup_\theta R(\theta, \delta) \ge r = \sup_\theta R(\theta, \delta^*)$, and $\delta^*$ is minimax.
Note: Unlike Theorem 3.3.2, even if the Bayes estimators for the priors $\pi_k$ are unique, the theorem does not guarantee that $\delta^*$ is the unique minimax estimator.
Example 2 (Example 3.3.3): $X_1, \ldots, X_n$ iid $N(\theta, 1)$, $-\infty < \theta < \infty$. Suppose we want to estimate $\theta$ with squared error loss. We will show that $\bar{X}$ is minimax.

First, note that $\bar{X}$ has constant risk $\frac{1}{n}$. Consider the sequence of priors $\pi_k = N(0, k)$. In Notes 16, we showed that the Bayes estimator for squared error loss with respect to the prior $\pi_k$ is
$$\hat{\theta}_k = \frac{n}{n + \frac{1}{k}} \bar{X}.$$
The risk function of $\hat{\theta}_k$ is
$$R(\theta, \hat{\theta}_k) = E_\theta\left[\left(\frac{n}{n + \frac{1}{k}} \bar{X} - \theta\right)^2\right] = \frac{1}{\left(n + \frac{1}{k}\right)^2}\left(n + \frac{\theta^2}{k^2}\right).$$
The Bayes risk of $\hat{\theta}_k$ with respect to $\pi_k$ is
$$r_k = \int_{-\infty}^{\infty} \frac{n + \frac{\theta^2}{k^2}}{\left(n + \frac{1}{k}\right)^2} \cdot \frac{1}{\sqrt{2\pi k}} \exp\left(-\frac{\theta^2}{2k}\right) d\theta = \frac{n + \frac{1}{k}}{\left(n + \frac{1}{k}\right)^2} = \frac{1}{n + \frac{1}{k}},$$
using $E_{\pi_k}[\theta^2] = k$. As $k \rightarrow \infty$, $r_k \rightarrow \frac{1}{n}$, which is the constant risk of $\bar{X}$. Thus, by Theorem 3.3.3, $\bar{X}$ is minimax.
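A minimal Monte Carlo sketch of this argument (assuming NumPy; the sample sizes and seed are illustrative), estimating the Bayes risk $r_k$ by simulating $\theta \sim N(0, k)$ and then $\bar{X} \mid \theta \sim N(\theta, 1/n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

for k in [1, 10, 100, 1000]:
    theta = rng.normal(0.0, np.sqrt(k), size=reps)     # theta ~ N(0, k)
    xbar = rng.normal(theta, 1.0 / np.sqrt(n))         # X-bar | theta ~ N(theta, 1/n)
    est = n / (n + 1.0 / k) * xbar                     # Bayes estimator for the prior N(0, k)
    mc_risk = np.mean((est - theta)**2)                # Monte Carlo estimate of r_k
    print(k, round(mc_risk, 4), round(1.0 / (n + 1.0 / k), 4))
# The Monte Carlo column matches the closed form 1/(n + 1/k) and approaches 1/n = 0.1.
```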