Statistics 550 Notes 4

Reading: Section 1.3

I. Bayes Criterion

The Bayesian point of view leads to a natural global criterion. Suppose a person's prior distribution for θ is π(θ) and the model is that X | θ has probability density function (or probability mass function) p(x | θ). Then the joint (subjective) pdf (or pmf) of (X, θ) is π(θ) p(x | θ). The Bayes risk of a decision procedure δ for a prior distribution π, denoted by r(δ), is the expected value of the risk over the joint distribution of (X, θ):

    r(δ) = E_θ [ E[ l(θ, δ(X)) | θ ] ] = E_θ [ R(θ, δ) ].

For a person with subjective prior probability distribution π, the decision procedure which minimizes r(δ) minimizes the person's (subjective) expected loss and is the best procedure from this person's point of view. The decision procedure which minimizes the Bayes risk for a prior π is called the Bayes rule for the prior π.

Example continued: For the prior π(θ_1) = 0.2 and π(θ_2) = 0.8, the Bayes risks r(δ) = 0.2 R(θ_1, δ) + 0.8 R(θ_2, δ) are

    Rule   1     2     3     4     5     6     7     8     9
    r(δ)   9.6   7.48  8.38  4.92  2.8   3.7   7.02  4.9   5.8

Thus, rule 5 is the Bayes rule for this prior distribution.

The Bayes rule depends on the prior. For the prior π(θ_1) = 0.5 and π(θ_2) = 0.5, the Bayes risks r(δ) = 0.5 R(θ_1, δ) + 0.5 R(θ_2, δ) are

    Rule   1     2     3     4     5     6     7     8     9
    r(δ)   6     7.3   6.55  4.2   5.5   4.75  4.95  6.25  5.5

Thus, rule 4 is the Bayes rule for this prior distribution.

A non-subjective interpretation of Bayes rules: The Bayes approach leads us to compare procedures on the basis of

    r(δ) = Σ_θ R(θ, δ) π(θ)     if θ is discrete with frequency function π,

or

    r(δ) = ∫ R(θ, δ) π(θ) dθ    if θ is continuous with density π.

Such comparisons make sense even if we do not interpret π as a prior density or frequency function, but only as a weight function that reflects the importance we place on doing well at the different possible values of θ. For example, in Example 1, if we felt that doing well at both θ_1 and θ_2 is equally important, we would set π(θ_1) = π(θ_2) = 0.5.
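As a minimal numeric check of the Bayes-risk tables above (the risk values R(θ_1, δ) and R(θ_2, δ) for the nine rules are those tabulated in the minimax section of these notes; the variable names are ours):

```python
# Bayes risks for the nine decision rules of Example 1.
# Risk values are taken from the notes' risk table.
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta_i), rules 1..9
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta_i), rules 1..9

def bayes_risks(p1, p2):
    """r(delta) = pi(theta_1) R(theta_1, delta) + pi(theta_2) R(theta_2, delta)."""
    return [p1 * a + p2 * b for a, b in zip(R1, R2)]

r_02 = bayes_risks(0.2, 0.8)   # prior pi(theta_1) = 0.2, pi(theta_2) = 0.8
r_05 = bayes_risks(0.5, 0.5)   # uniform prior

print(min(range(9), key=lambda i: r_02[i]) + 1)  # Bayes rule for 0.2/0.8 prior -> 5
print(min(range(9), key=lambda i: r_05[i]) + 1)  # Bayes rule for 0.5/0.5 prior -> 4
```

This reproduces the conclusion that rule 5 is Bayes for the 0.2/0.8 prior (r = 2.8) and rule 4 for the uniform prior (r = 4.2).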
II. Minimax Criterion

The minimax criterion minimizes the worst possible risk. That is, we prefer δ to δ′ if and only if

    sup_θ R(θ, δ) < sup_θ R(θ, δ′).

A procedure δ* is minimax (over a class of considered decision procedures) if it satisfies

    sup_θ R(θ, δ*) = inf_δ sup_θ R(θ, δ).

Among the nine decision rules considered for Example 1, rule 4 is the minimax rule:

    Rule                         1     2     3     4     5     6     7     8     9
    R(θ_1, δ)                    0     7     3.5   3     10    6.5   1.5   8.5   5
    R(θ_2, δ)                    12    7.6   9.6   5.4   1     3     8.4   4     6
    max{R(θ_1, δ), R(θ_2, δ)}    12    7.6   9.6   5.4   10    6.5   8.4   8.5   6

Game theory motivation for the minimax criterion: Suppose we play a two-person zero-sum game against Nature. Then the minimax decision procedure is the minimax strategy for the game.

Comments on the minimax criterion: The minimax criterion is very conservative. It aims to give maximum protection against the worst that can happen. The principle would be compelling if the statistician believed that Nature was a malevolent "opponent," but in fact Nature is just the inanimate state of the world. Although the minimax criterion is conservative, in many cases the principle does lead to reasonable procedures.

III. Other Global Criteria for Decision Procedures

Two compromises between the Bayes and minimax criteria that have been proposed are:

(1) Restricted risk Bayes: Suppose that M is the maximum risk of the minimax decision procedure. Then one may be willing to consider decision procedures whose maximum risk exceeds M, if the excess is controlled, say, if

    R(θ, δ) ≤ M (1 + ε) for all θ,    (0.1)

where ε is the proportional increase in risk that one is willing to tolerate. A restricted risk Bayes decision procedure for the prior π is then obtained by minimizing the Bayes risk r(δ) among all decision procedures δ that satisfy (0.1).
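The minimax comparison tabulated above can be checked numerically; a short sketch (risk values as in the notes' risk table, variable names ours):

```python
# Minimax comparison for the nine rules of Example 1: for each rule take the
# worst-case risk max{R(theta_1, delta), R(theta_2, delta)}, then minimize it.
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta_i)
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta_i)

max_risk = [max(a, b) for a, b in zip(R1, R2)]
minimax_rule = min(range(9), key=lambda i: max_risk[i]) + 1

print(max_risk)      # [12, 7.6, 9.6, 5.4, 10, 6.5, 8.4, 8.5, 6]
print(minimax_rule)  # 4, with worst-case risk 5.4
```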
For Example 1 and prior π(θ_1) = 0.2, π(θ_2) = 0.8:

    Rule       1     2     3     4     5     6     7     8     9
    r(δ)       9.6   7.48  8.38  4.92  2.8   3.7   7.02  4.9   5.8
    Max risk   12    7.6   9.6   5.4   10    6.5   8.4   8.5   6

For ε = 0.1 (maximum risk allowed = (1 + 0.1) × 5.4 = 5.94), decision rule 4 is the restricted risk Bayes procedure; for ε = 0.25 (maximum risk allowed = (1 + 0.25) × 5.4 = 6.75), decision rule 6 is the restricted risk Bayes procedure.

(2) Gamma minimaxity: Let Γ be a class of prior distributions. A decision procedure δ* is gamma-minimax (over a class of considered decision procedures) if

    sup_{π ∈ Γ} r(π, δ*) = inf_δ sup_{π ∈ Γ} r(π, δ).

Thus, the gamma-minimax procedure minimizes the maximum Bayes risk over the priors in the class Γ.

Consider the two prior distributions: (1) π_1(θ_1) = 0.2, π_1(θ_2) = 0.8; (2) π_2(θ_1) = π_2(θ_2) = 0.5. The maximum Bayes risks over these two priors, max_{i=1,2} r(π_i, δ), for the rules are

    Rule                1     2     3     4     5     6     7     8     9
    max_i r(π_i, δ)     9.6   7.48  8.38  4.92  5.5   4.75  7.02  6.25  5.8

Thus, the gamma-minimax rule is rule 6.

Computational issues: We will study further how to find Bayes and minimax point estimators in Chapter 3. The restricted risk Bayes procedure is appealing, but it is difficult to compute.

IV. Randomized Decision Procedures

A randomized decision procedure is a decision procedure which assigns to each possible outcome x of the data X a random variable Y(x), where the values of Y(x) are actions in the action space. When X = x, a draw from the distribution of Y(x) will be taken and will constitute the action taken. We will show in Chapter 3 that for any prior, there is always a nonrandomized decision procedure that has at least as small a Bayes risk as any randomized decision procedure (so we can ignore randomized decision procedures in looking for the Bayes rule). Students of game theory will realize, however, that a randomized decision procedure may lead to a lower maximum risk than a nonrandomized decision procedure.
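The restricted risk Bayes and gamma-minimax selections above can be sketched numerically (risk values as in the notes' risk table; the function name `restricted_bayes` and other identifiers are ours):

```python
# Restricted risk Bayes: among rules whose maximum risk is at most (1 + eps) * M,
# where M = 5.4 is the minimax risk, pick the smallest Bayes risk for the prior
# pi(theta_1) = 0.2, pi(theta_2) = 0.8.
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta_i)
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta_i)

bayes = [0.2 * a + 0.8 * b for a, b in zip(R1, R2)]   # r(pi_1, delta_i)
worst = [max(a, b) for a, b in zip(R1, R2)]            # max risk of each rule
M = min(worst)                                         # minimax risk, 5.4

def restricted_bayes(eps):
    ok = [i for i in range(9) if worst[i] <= (1 + eps) * M]
    return min(ok, key=lambda i: bayes[i]) + 1

print(restricted_bayes(0.10))  # rule 4: only rule with max risk <= 5.94
print(restricted_bayes(0.25))  # rule 6: smallest Bayes risk among max risk <= 6.75

# Gamma-minimax over Gamma = {pi_1 = (0.2, 0.8), pi_2 = (0.5, 0.5)}:
# minimize the larger of the two Bayes risks.
bayes2 = [0.5 * a + 0.5 * b for a, b in zip(R1, R2)]   # r(pi_2, delta_i)
gamma_max = [max(u, v) for u, v in zip(bayes, bayes2)]
print(min(range(9), key=lambda i: gamma_max[i]) + 1)   # rule 6, max Bayes risk 4.75
```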
Example: For Example 1, a randomized decision procedure is to flip a fair coin and use decision rule 4 if the coin lands heads and decision rule 6 if the coin lands tails; i.e., Y(x = 0) = a_2 with probability 1, and Y(x = 1) = a_1 with probability 0.5 and Y(x = 1) = a_3 with probability 0.5. The risk of this randomized decision procedure is

    0.5 R(θ, δ_4) + 0.5 R(θ, δ_6) = 4.75 if θ = θ_1,
                                    4.20 if θ = θ_2,

which has lower maximum risk than decision rule 4 (the minimax rule among nonrandomized decision rules).

Randomized decision procedures are somewhat impractical: it makes the statistician's inferences seem less credible if she has to explain to a scientist that she flipped a coin after observing the data to determine the inferences. We will show in Section 1.5 that a randomized decision procedure cannot lower the maximum risk if the loss function is convex.
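The risk of the coin-flip mixture above follows because the risk of a randomized mixture is the same mixture of the component risks; a quick numeric check (risk values as in the notes' risk table, variable names ours):

```python
# Risk of the randomized procedure that flips a fair coin between rules 4 and 6.
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta_i)
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta_i)

risk_theta1 = 0.5 * R1[3] + 0.5 * R1[5]  # 0.5*R(theta_1, d4) + 0.5*R(theta_1, d6) = 4.75
risk_theta2 = 0.5 * R2[3] + 0.5 * R2[5]  # 0.5*R(theta_2, d4) + 0.5*R(theta_2, d6) = 4.2
print(max(risk_theta1, risk_theta2))     # 4.75, below the nonrandomized minimax risk 5.4
```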