Statistics 550 Notes 4
Reading: Section 1.3
I. Bayes Criterion
The Bayesian point of view leads to a natural global
criterion.
Suppose a person's prior distribution about θ is π(θ) and the model is that X | θ has probability density function (or probability mass function) p(x | θ). Then the joint (subjective) pdf (or pmf) of (X, θ) is π(θ) p(x | θ).
The Bayes risk of a decision procedure δ for a prior distribution π(θ), denoted by r(δ), is the expected value of the risk over the joint distribution of (X, θ):
r(δ) = E_θ[ E[ l(θ, δ(X)) | θ ] ] = E_θ[ R(θ, δ) ].
For a person with subjective prior probability distribution π(θ), the decision procedure that minimizes r(δ) minimizes the person's (subjective) expected loss and is the best procedure from this person's point of view. The decision procedure that minimizes the Bayes risk for a prior π(θ) is called the Bayes rule for the prior π(θ).
Example continued: For the prior π(θ₁) = 0.2 and π(θ₂) = 0.8, the Bayes risks are
r(δ) = 0.2 R(θ₁, δ) + 0.8 R(θ₂, δ):

Rule:  1    2     3     4     5    6    7     8    9
r(δ):  9.6  7.48  8.38  4.92  2.8  3.7  7.02  4.9  5.8

Thus, rule 5 is the Bayes rule for this prior distribution.
The Bayes rule depends on the prior. For the prior π(θ₁) = 0.5 and π(θ₂) = 0.5, the Bayes risks are
r(δ) = 0.5 R(θ₁, δ) + 0.5 R(θ₂, δ):

Rule:  1  2    3     4    5    6     7     8     9
r(δ):  6  7.3  6.55  4.2  5.5  4.75  4.95  6.25  5.5

Thus, rule 4 is the Bayes rule for this prior distribution.
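These Bayes-risk calculations are easy to check numerically. Below is a minimal Python sketch; the lists R1 and R2 hold the risks R(θ₁, δ) and R(θ₂, δ) of the nine rules, taken from the risk table in these notes.

```python
# Risks of the nine decision rules at theta_1 and theta_2
# (values from the risk table in these notes).
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta), rules 1..9
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta), rules 1..9

def bayes_risks(p1, p2):
    """Bayes risks r(delta) = p1*R(theta_1, delta) + p2*R(theta_2, delta)."""
    return [p1 * a + p2 * b for a, b in zip(R1, R2)]

def bayes_rule(p1, p2):
    """1-based index of the rule minimizing the Bayes risk."""
    r = bayes_risks(p1, p2)
    return min(range(len(r)), key=r.__getitem__) + 1

print(bayes_rule(0.2, 0.8))  # rule 5 for the prior (0.2, 0.8)
print(bayes_rule(0.5, 0.5))  # rule 4 for the prior (0.5, 0.5)
```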
A non-subjective interpretation of Bayes rules: The Bayes approach leads us to compare procedures on the basis of
r(δ) = Σ_θ R(θ, δ) π(θ)
if θ is discrete with frequency function π(θ), or
r(δ) = ∫ R(θ, δ) π(θ) dθ
if θ is continuous with density π(θ).
Such comparisons make sense even if we do not interpret π(θ) as a prior density or frequency function, but only as a weight function that reflects the importance we place on doing well at the different possible values of θ.
For example, in Example 1, if we felt that doing well at both θ₁ and θ₂ was equally important, we would set π(θ₁) = π(θ₂) = 0.5.
II. Minimax Criterion
The minimax criterion minimizes the worst possible risk. That is, we prefer δ to δ′ if and only if
sup_θ R(θ, δ) ≤ sup_θ R(θ, δ′).
A procedure δ* is minimax (over a class of considered decision procedures) if it satisfies
sup_θ R(θ, δ*) = inf_δ sup_θ R(θ, δ).
Among the nine decision rules considered for Example 1, rule 4 is the minimax rule:

Rule:       1   2    3    4    5   6    7    8    9
R(θ₁, δ):   0   7    3.5  3    10  6.5  1.5  8.5  5
R(θ₂, δ):   12  7.6  9.6  5.4  1   3    8.4  4    6
max risk:   12  7.6  9.6  5.4  10  6.5  8.4  8.5  6
Game theory motivation for the minimax criterion: Suppose we play a two-person zero-sum game against Nature. Then the minimax decision procedure is the minimax strategy for the game.
Comments on the minimax criterion: The minimax criterion is very conservative. It aims to give maximum protection against the worst that can happen. The principle would be compelling if the statistician believed that Nature were a malevolent "opponent," but in fact Nature is just the inanimate state of the world.
Although the minimax criterion is conservative, in many
cases the principle does lead to reasonable procedures.
III. Other Global Criteria for Decision Procedures
Two compromises between Bayes and minimax criteria
that have been proposed are:
(1) Restricted risk Bayes: Suppose that M is the maximum risk of the minimax decision procedure. Then one may be willing to consider decision procedures whose maximum risk exceeds M, if the excess is controlled, say, if
R(θ, δ) ≤ M(1 + ε) for all θ ∈ Θ    (0.1)
where ε is the proportional increase in risk that one is willing to tolerate. A restricted risk Bayes decision procedure for the prior π is then obtained by minimizing the Bayes risk r(δ) among all decision procedures that satisfy (0.1).
For Example 1 and the prior π(θ₁) = 0.2, π(θ₂) = 0.8:

Rule:      1    2     3     4     5    6    7     8    9
r(δ):      9.6  7.48  8.38  4.92  2.8  3.7  7.02  4.9  5.8
Max risk:  12   7.6   9.6   5.4   10   6.5  8.4   8.5  6
For ε = 0.1 (maximum risk allowed = (1+0.1)*5.4 = 5.94), decision rule 4 is the restricted risk Bayes procedure; for ε = 0.25 (maximum risk allowed = (1+0.25)*5.4 = 6.75), decision rule 6 is the restricted risk Bayes procedure.
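The restricted risk Bayes rule can be found by keeping only the rules whose maximum risk is within the allowed excess and then minimizing the Bayes risk over that set; a sketch in Python, with risk values from the tables in these notes:

```python
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta), rules 1..9
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta), rules 1..9

bayes = [0.2 * a + 0.8 * b for a, b in zip(R1, R2)]  # r(delta), prior (0.2, 0.8)
worst = [max(a, b) for a, b in zip(R1, R2)]          # maximum risk of each rule
M = min(worst)                                       # minimax value (5.4)

def restricted_bayes_rule(eps):
    """Minimize the Bayes risk subject to max risk <= M*(1+eps); 1-based rule."""
    allowed = [i for i in range(9) if worst[i] <= M * (1 + eps)]
    return min(allowed, key=bayes.__getitem__) + 1

print(restricted_bayes_rule(0.1))   # rule 4
print(restricted_bayes_rule(0.25))  # rule 6
```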
(2) Gamma minimaxity: Let Γ be a class of prior distributions. A decision procedure δ* is Gamma-minimax (over a class of considered decision procedures) if
inf_δ sup_{π ∈ Γ} r_π(δ) = sup_{π ∈ Γ} r_π(δ*).
Thus, the procedure δ* minimizes the maximum Bayes risk over the priors in the class Γ.
Consider the two prior distributions: (1) π₁(θ₁) = 0.2, π₁(θ₂) = 0.8; (2) π₂(θ₁) = π₂(θ₂) = 0.5. The maximum Bayes risk max_{i=1,2} r_{π_i}(δ) over these two priors for each rule is

Rule:               1    2     3     4     5    6     7     8     9
max_i r_{π_i}(δ):   9.6  7.48  8.38  4.92  5.5  4.75  7.02  6.25  5.8

Thus, the Gamma-minimax rule is rule 6.
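The Gamma-minimax comparison over the two priors can be checked the same way (a sketch; risk values from the table in these notes):

```python
R1 = [0, 7, 3.5, 3, 10, 6.5, 1.5, 8.5, 5]   # R(theta_1, delta), rules 1..9
R2 = [12, 7.6, 9.6, 5.4, 1, 3, 8.4, 4, 6]   # R(theta_2, delta), rules 1..9
priors = [(0.2, 0.8), (0.5, 0.5)]           # the class Gamma of priors

# Maximum Bayes risk over the priors in Gamma, for each rule.
worst_bayes = [max(p1 * a + p2 * b for p1, p2 in priors)
               for a, b in zip(R1, R2)]
gamma_minimax_rule = min(range(9), key=worst_bayes.__getitem__) + 1
print(gamma_minimax_rule)  # rule 6
```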
Computational issues: We will study how to find Bayes and minimax point estimators in more detail in Chapter 3. The restricted risk Bayes procedure is appealing, but it is difficult to compute.
IV. Randomized Decision Procedures
A randomized decision procedure δ is a decision procedure that assigns to each possible outcome of the data X a random variable Y(X), where the values of Y(X) are actions in the action space. When X = x, a draw from the distribution of Y(x) is taken and constitutes the action taken.
We will show in Chapter 3 that for any prior, there is always a nonrandomized decision procedure that has at least as small a Bayes risk as any randomized decision procedure (so we can ignore randomized decision procedures when looking for the Bayes rule).
Students of game theory will realize that a randomized
decision procedure may lead to a lower maximum risk than
a nonrandomized decision procedure.
Example: For Example 1, a randomized decision procedure is to flip a fair coin and use decision rule 4 if the coin lands heads and decision rule 6 if it lands tails; i.e., Y(x = 0) = a₂ with probability 1, and Y(x = 1) = a₁ with probability 0.5, Y(x = 1) = a₃ with probability 0.5. The risk of this randomized decision procedure is
0.5 R(θ, δ₄) + 0.5 R(θ, δ₆) = 4.75 if θ = θ₁, and 4.20 if θ = θ₂,
which has a lower maximum risk than decision rule 4 (the minimax rule among nonrandomized decision rules).
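The risk of this coin-flip mixture of rules 4 and 6 can be verified directly, using the risks of rules 4 and 6 from the risk table in these notes:

```python
# Risks of rules 4 and 6 at theta_1 and theta_2 (from the risk table).
R4 = {"theta1": 3.0, "theta2": 5.4}
R6 = {"theta1": 6.5, "theta2": 3.0}

# Risk of the fair-coin mixture: 0.5*R(theta, delta_4) + 0.5*R(theta, delta_6).
mix = {t: 0.5 * R4[t] + 0.5 * R6[t] for t in R4}   # 4.75 at theta_1, 4.2 at theta_2

print(max(mix.values()))  # 4.75, below rule 4's maximum risk of 5.4
```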
Randomized decision procedures are somewhat impractical: they make the statistician's inferences seem less credible if she has to explain to a scientist that she flipped a coin after observing the data to determine the inferences.
We will show in Section 1.5 that a randomized decision procedure cannot lower the maximum risk if the loss function is convex.