Statistics 550 Notes 17

Reading: Sections 3.2-3.3

I. Improper priors and generalized Bayes estimators

Review: Bayes estimators are defined with respect to a proper prior distribution $\pi(\theta)$, where proper means that $\pi(\theta)$ is a probability distribution. Often a weighting function $\pi(\theta)$ that is not a probability distribution is considered; such a weighting function is called an improper prior.

Example 1: $X \sim \text{Binomial}(n, p)$. Consider the prior $\pi(p) = p^{-1}(1-p)^{-1}$, $0 < p < 1$, which in some sense corresponds to a Beta(0,0) distribution. However, this is not a proper distribution because $\int_0^1 p^{-1}(1-p)^{-1}\,dp = \infty$.

We can still consider the "Bayes" risk of a decision procedure:
$$r(\delta) = \int R(\theta, \delta)\,\pi(\theta)\,d\theta. \quad (1.1)$$
An estimator $\delta^*(x)$ is called a generalized Bayes estimator with respect to a weighting function $\pi(\theta)$ (even if $\pi(\theta)$ is not a proper probability distribution) if it minimizes the "Bayes" risk (1.1) over all estimators.

We can write the "Bayes" risk (1.1) as
$$r(\delta) = \int\!\!\int l(\theta, \delta(X))\, p(X \mid \theta)\,\pi(\theta)\,d\theta\,dX = \int \left[ \frac{\int l(\theta, \delta(X))\, p(X \mid \theta)\,\pi(\theta)\,d\theta}{\int p(X \mid \theta)\,\pi(\theta)\,d\theta} \right] q(X)\,dX, \quad (1.2)$$
where $q(X) = \int p(X \mid \theta)\,\pi(\theta)\,d\theta$. A decision procedure $\delta(X)$ that minimizes
$$\frac{\int l(\theta, \delta(X))\, p(X \mid \theta)\,\pi(\theta)\,d\theta}{\int p(X \mid \theta)\,\pi(\theta)\,d\theta} \quad (1.3)$$
for each $X$ is a generalized Bayes estimator (this is the analogue of Proposition 3.2.1).

Sometimes
$$s(\theta \mid X) = \frac{p(X \mid \theta)\,\pi(\theta)}{\int p(X \mid \theta)\,\pi(\theta)\,d\theta} \quad (1.4)$$
is a proper probability distribution even if $\pi(\theta)$ is not a proper probability distribution, and then we can think of $s(\theta \mid X)$ as the "posterior" density function of $\theta \mid X$ and of (1.3) as the "posterior" risk.

Example 1 continued: For $X \sim \text{Binomial}(n, p)$ and the prior $\pi(p) = p^{-1}(1-p)^{-1}$, $0 < p < 1$, consider the generalized Bayes estimator under squared error loss. (1.4) equals
$$s(p \mid X) = \frac{\binom{n}{X} p^X (1-p)^{n-X}\, p^{-1}(1-p)^{-1}}{\int_0^1 \binom{n}{X} p^X (1-p)^{n-X}\, p^{-1}(1-p)^{-1}\,dp} \quad \text{for } 0 < p < 1.$$
For $1 \le X \le n-1$, $s(p \mid X)$ is a Beta$(X, n-X)$ density, and the generalized Bayes estimator with respect to squared error loss is the expected value of $p$ under $s(p \mid X)$, which equals $X/n$. For $X = 0$ and $X = n$, the "posterior" density $s(p \mid X)$ is no longer proper. However, it can be shown that for $\delta(X) = X/n$, the "posterior" expected loss (1.3) is finite and $X/n$ minimizes (1.3). Thus, for all $X$, $X/n$ is the generalized Bayes estimator of $p$ for $\pi(p) = p^{-1}(1-p)^{-1}$.

Another useful variant of Bayes estimators is the limit of Bayes estimators. A nonrandomized estimator $\delta(x)$ is a limit of Bayes estimators if there exist a sequence of proper priors $\pi_v$ and Bayes estimators $\delta_v$ with respect to these prior distributions such that $\delta_v(x) \to \delta(x)$ for all $x$.

Example 1 continued: For $X \sim \text{Binomial}(n, p)$, $\delta(X) = X/n$ is a limit of Bayes estimators. Consider a Beta$(r, s)$ prior (which is proper if $r > 0$, $s > 0$); the Bayes estimator is $\frac{X + r}{n + r + s}$. Consider the sequence of priors Beta(1,1), Beta(1/2,1/2), Beta(1/3,1/3), .... Since
$$\lim_{r \to 0,\, s \to 0} \frac{X + r}{n + r + s} = \frac{X}{n},$$
we have that $\delta(X) = X/n$ is a limit of Bayes estimators.

From a Bayesian point of view, estimators that are limits of Bayes estimators are somewhat more desirable than generalized Bayes estimators (often an estimator is both a limit of Bayes estimators and a generalized Bayes estimator, as in Example 1). This is because, by construction, a limit of Bayes estimators must be close to a proper Bayes estimator. In contrast, a generalized Bayes estimator may not be close to any proper Bayes estimator.
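As a quick numerical sanity check of Example 1 (a minimal sketch, not part of the original notes; it assumes NumPy and SciPy are available), the following confirms that for $1 \le X \le n-1$ the mean of the Beta$(X, n-X)$ "posterior" equals $X/n$, and that the Beta$(1/k, 1/k)$ Bayes estimators approach $X/n$ as $k$ grows:

```python
import numpy as np
from scipy import stats

n = 10

# Generalized Bayes estimate under the improper Beta(0,0) prior:
# for 1 <= X <= n-1 the "posterior" is Beta(X, n - X), whose mean is X/n.
for X in range(1, n):
    posterior_mean = stats.beta(X, n - X).mean()  # equals X / (X + (n - X))
    assert np.isclose(posterior_mean, X / n)

# Limit of Bayes estimators: under a proper Beta(r, s) prior the Bayes
# estimator is (X + r) / (n + r + s); with r = s = 1/k it tends to X/n.
X = 3
for k in [1, 10, 100, 10000]:
    r = s = 1.0 / k
    print(k, (X + r) / (n + r + s))  # approaches X/n = 0.3
```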
II. Admissibility of Bayes rules

In general, Bayes rules are admissible.

Theorem: Suppose that $\Theta$ is an interval and $\delta^*$ is a Bayes rule with respect to a prior density function $\pi(\theta)$ such that $\pi(\theta) > 0$ for all $\theta \in \Theta$, and $R(\theta, d)$ is a continuous function of $\theta$ for all $d$. Then $\delta^*$ is admissible.

Proof: The proof is by contradiction. Suppose that $\delta^*$ is inadmissible. There is then another estimator, $\delta$, such that $R(\theta, \delta) \le R(\theta, \delta^*)$ for all $\theta$, with strict inequality for some $\theta$, say $\theta_0$. Since $R(\theta, \delta^*) - R(\theta, \delta)$ is a continuous function of $\theta$, there are an $\epsilon > 0$ and an interval $(\theta_0 - h, \theta_0 + h)$ such that $R(\theta, \delta^*) - R(\theta, \delta) \ge \epsilon$ for $\theta_0 - h \le \theta \le \theta_0 + h$. Then
$$\int_\Theta \left[R(\theta, \delta^*) - R(\theta, \delta)\right]\pi(\theta)\,d\theta \ge \int_{\theta_0 - h}^{\theta_0 + h} \left[R(\theta, \delta^*) - R(\theta, \delta)\right]\pi(\theta)\,d\theta \ge \epsilon \int_{\theta_0 - h}^{\theta_0 + h} \pi(\theta)\,d\theta > 0.$$
But this contradicts the fact that $\delta^*$ is a Bayes rule, because a Bayes rule has the property that
$$B(\delta^*) - B(\delta) = \int \left[R(\theta, \delta^*) - R(\theta, \delta)\right]\pi(\theta)\,d\theta \le 0.$$
The proof is complete.

The theorem can be regarded as both a positive and a negative result. It is positive in that it identifies a certain class of estimators as being admissible, in particular any Bayes estimator satisfying its hypotheses. It is negative in that there are apparently so many admissible estimators, one for every prior distribution that satisfies the hypotheses of the theorem, and some of these might make little sense (like $\delta(X) \equiv 3$ for the normal distribution example considered earlier).

Complete class theorems characterize the class of all admissible estimators. Roughly, for most models the class of all admissible estimators is the class of all Bayes estimators and limits of Bayes estimators.
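To make the negative side of the theorem concrete, here is a minimal sketch (not from the notes; it assumes NumPy and the $N(\theta, 1)$ model with $n$ iid observations and squared error loss) comparing the risk of the admissible but silly constant estimator $\delta(X) \equiv 3$, which is $R(\theta, 3) = (\theta - 3)^2$, with the risk $1/n$ of $\bar X$:

```python
import numpy as np

n = 25
theta_grid = np.linspace(0.0, 6.0, 13)

# Squared error risks in the N(theta, 1) model with n iid observations:
risk_const3 = (theta_grid - 3.0) ** 2          # R(theta, delta=3) = (theta - 3)^2
risk_xbar = np.full_like(theta_grid, 1.0 / n)  # R(theta, xbar) = 1/n

for t, r3, rx in zip(theta_grid, risk_const3, risk_xbar):
    better = "const 3" if r3 < rx else "xbar"
    print(f"theta={t:4.1f}  R(const 3)={r3:6.3f}  R(xbar)={rx:.3f}  better: {better}")
# delta(X) = 3 cannot be beaten at theta = 3 (its risk there is zero),
# which is why it is admissible, yet it is terrible for theta away from 3.
```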
III. Minimax Procedures (Section 3.3)

The minimax criterion minimizes the worst possible risk. That is, we prefer $\delta$ to $\delta'$ if and only if
$$\sup_\theta R(\theta, \delta) < \sup_\theta R(\theta, \delta').$$
A procedure $\delta^*$ is minimax (over a class of considered decision procedures) if it satisfies
$$\sup_\theta R(\theta, \delta^*) = \inf_\delta \sup_\theta R(\theta, \delta).$$

Let $\delta_\pi$ denote the Bayes estimator with respect to the prior $\pi(\theta)$, and let
$$r_\pi = E_\pi\!\left[E[l(\theta, \delta_\pi(X)) \mid \theta]\right] = E_\pi[R(\theta, \delta_\pi)]$$
denote the Bayes risk of the Bayes estimator for the prior $\pi(\theta)$. A prior distribution $\pi$ is least favorable if $r_\pi \ge r_{\pi'}$ for all prior distributions $\pi'$. This is the prior distribution that causes the statistician the greatest average loss, assuming the statistician uses the Bayes estimator.

Theorem 3.3.2 (the statement has been expanded): Suppose that $\pi$ is a prior distribution on $\Theta$ and $\delta_\pi$ is a Bayes estimator with respect to $\pi$ such that
$$r_\pi = \int R(\theta, \delta_\pi)\,d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi). \quad (1.5)$$
Then:
(i) $\delta_\pi$ is minimax.
(ii) If $\delta_\pi$ is the unique Bayes solution with respect to $\pi$, it is the unique minimax procedure.
(iii) $\pi$ is a least favorable prior.

Proof: (i) Let $\delta$ be any other procedure. Then
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\,d\pi(\theta) \ge \int R(\theta, \delta_\pi)\,d\pi(\theta) = \sup_\theta R(\theta, \delta_\pi).$$
(ii) This follows by replacing $\ge$ by $>$ in the second inequality of the proof of (i).
(iii) Let $\pi'$ be some other distribution of $\theta$. Then
$$r_{\pi'} = \int R(\theta, \delta_{\pi'})\,d\pi'(\theta) \le \int R(\theta, \delta_\pi)\,d\pi'(\theta) \le \sup_\theta R(\theta, \delta_\pi) = r_\pi.$$

Corollary: If a Bayes procedure has constant risk, then it is minimax.

Proof: If $\delta_\pi$ has constant risk, then (1.5) is clearly satisfied.

Example 1 (Example 3.3.1, Problem 3.3.4): Suppose $X_1, \ldots, X_n$ are iid Bernoulli$(\theta)$ and we want to estimate $\theta$. Consider the squared error loss function $l(\theta, a) = (\theta - a)^2$. For squared error loss and a Beta$(r, s)$ prior, we showed in Notes 16 that the Bayes estimator is
$$\hat\theta_{r,s} = \frac{r + \sum_{i=1}^n x_i}{r + s + n}.$$
We now seek to choose $r$ and $s$ so that $\hat\theta_{r,s}$ has constant risk. The risk of $\hat\theta_{r,s}$ is
$$R(\theta, \hat\theta_{r,s}) = E\left[\left(\frac{r + \sum_{i=1}^n X_i}{r + s + n} - \theta\right)^2\right] = \mathrm{Var}\left(\frac{r + \sum_{i=1}^n X_i}{r + s + n}\right) + \left[E\left(\frac{r + \sum_{i=1}^n X_i}{r + s + n}\right) - \theta\right]^2$$
$$= \frac{n\theta(1-\theta)}{(r+s+n)^2} + \left(\frac{r + n\theta}{r + s + n} - \theta\right)^2 = \frac{n\theta(1-\theta) + \left[r - \theta(r+s)\right]^2}{(r+s+n)^2}.$$
The coefficient on $\theta^2$ in the numerator is $(r+s)^2 - n$, and the coefficient on $\theta$ in the numerator is $n - 2r(r+s)$. We choose $r$ and $s$ so that both these coefficients are zero:
$$(r+s)^2 - n = 0, \qquad n - 2r(r+s) = 0.$$
Solving these equations gives $r = s = \sqrt{n}/2$. The unique minimax estimator is
$$\hat\theta_{\text{minimax}} = \hat\theta_{\sqrt{n}/2,\, \sqrt{n}/2} = \frac{\sum_{i=1}^n x_i + \sqrt{n}/2}{n + \sqrt{n}},$$
which has constant risk $\frac{1}{4(1 + \sqrt{n})^2}$, compared to $\frac{\theta(1-\theta)}{n}$ for the MLE $\bar X$. For small $n$, the minimax estimator is better than the MLE for a large range of $\theta$. For large $n$, the minimax estimator is better than the MLE for only a small range of $\theta$ near 0.5.

Minimax as a limit of Bayes rules: If the parameter space is not bounded, minimax rules are often not Bayes rules but instead can be obtained as limits of Bayes rules. To deal with such situations we need an extension of Theorem 3.3.2.

Theorem 3.3.3: Let $\delta^*$ be a decision rule such that $\sup_\theta R(\theta, \delta^*) = r < \infty$. Let $\{\pi_k\}$ be a sequence of prior distributions, and let $r_k$ be the Bayes risk of the Bayes rule with respect to the prior $\pi_k$. If $r_k \to r$ as $k \to \infty$, then $\delta^*$ is minimax.

Proof: Suppose $\delta$ is any other estimator. Then
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\,d\pi_k(\theta) \ge r_k,$$
and this holds for every $k$. Hence $\sup_\theta R(\theta, \delta) \ge r = \sup_\theta R(\theta, \delta^*)$, and $\delta^*$ is minimax.

Note: Unlike Theorem 3.3.2, even if the Bayes estimators for the priors $\pi_k$ are unique, the theorem does not guarantee that $\delta^*$ is the unique minimax estimator.

Example 2 (Example 3.3.3): $X_1, \ldots, X_n$ iid $N(\theta, 1)$, $\theta \in \mathbb{R}$. Suppose we want to estimate $\theta$ with squared error loss. We will show that $\bar X$ is minimax.

First, note that $\bar X$ has constant risk $\frac{1}{n}$. Consider the sequence of priors $\pi_k = N(0, k)$. In Notes 16, we showed that the Bayes estimator for squared error loss with respect to the prior $\pi_k$ is
$$\hat\theta_k = \frac{n}{n + \frac{1}{k}}\,\bar X.$$
The risk function of $\hat\theta_k$ is
$$R(\theta, \hat\theta_k) = E\left[\left(\frac{n}{n + \frac{1}{k}}\bar X - \theta\right)^2\right] = \left(\frac{n}{n + \frac{1}{k}}\right)^2 \frac{1}{n} + \left(\frac{\frac{1}{k}}{n + \frac{1}{k}}\right)^2 \theta^2.$$
The Bayes risk of $\hat\theta_k$ with respect to $\pi_k$ is
$$r_k = \int \left[\left(\frac{n}{n + \frac{1}{k}}\right)^2 \frac{1}{n} + \left(\frac{\frac{1}{k}}{n + \frac{1}{k}}\right)^2 \theta^2\right] \frac{1}{\sqrt{2\pi k}} \exp\left(-\frac{\theta^2}{2k}\right) d\theta = \left(\frac{n}{n + \frac{1}{k}}\right)^2 \frac{1}{n} + \left(\frac{\frac{1}{k}}{n + \frac{1}{k}}\right)^2 k = \frac{1}{n + \frac{1}{k}}.$$
As $k \to \infty$, $r_k \to \frac{1}{n}$, which is the constant risk of $\bar X$. Thus, by Theorem 3.3.3, $\bar X$ is minimax.
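The following sketch (not from the notes; it assumes NumPy, and the helper name `risk_beta_bayes` is illustrative) numerically confirms the two computations above: the Beta$(\sqrt{n}/2, \sqrt{n}/2)$ Bayes estimator has constant risk $\frac{1}{4(1+\sqrt{n})^2}$ and beats the MLE only on a band of $\theta$ around 0.5, and the Bayes risks $r_k = \frac{1}{n + 1/k}$ from Example 2 increase to $\frac{1}{n}$:

```python
import numpy as np

n = 20

# Example 1 check: risk of the Beta(r, s) Bayes estimator under squared
# error loss, R(theta) = [n*theta*(1-theta) + (r - theta*(r+s))^2] / (r+s+n)^2.
def risk_beta_bayes(theta, r, s, n):
    return (n * theta * (1 - theta) + (r - theta * (r + s)) ** 2) / (r + s + n) ** 2

thetas = np.linspace(0.01, 0.99, 99)
r = s = np.sqrt(n) / 2
minimax_risk = risk_beta_bayes(thetas, r, s, n)
print(np.allclose(minimax_risk, 1 / (4 * (1 + np.sqrt(n)) ** 2)))  # True: constant risk

# Compare with the MLE's risk theta*(1-theta)/n: the minimax rule wins
# only on a band of theta around 1/2 (narrow when n is large).
mle_risk = thetas * (1 - thetas) / n
print(thetas[minimax_risk < mle_risk])

# Example 2 check: Bayes risks r_k = 1/(n + 1/k) under N(0, k) priors
# increase to 1/n, the constant risk of xbar, so xbar is minimax.
for k in [1, 10, 100, 1000]:
    print(k, 1 / (n + 1 / k), "->", 1 / n)
```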