FDR, Evidence Theory,
Robustness
Multiple testing
• The probability of rejecting a true null hypothesis at confidence level 99% is 1%.
• Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 - 0.99^100 ≈ 0.63.
• Bonferroni correction, FWE control: in order to reach significance level 1% in an experiment involving 1000 tests, each test should be checked at significance level 1/1000 % (both numbers are checked in the sketch below).
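A minimal Python sketch of the two computations above (the constants come from the slide):

    # Probability of at least one false rejection in 100 independent
    # tests of true nulls, each at significance level 1%:
    alpha = 0.01
    print(1 - (1 - alpha) ** 100)    # 1 - 0.99**100 ≈ 0.634

    # Bonferroni / FWE control: with 1000 tests, check each test at
    # alpha / 1000, i.e., 0.001% = 1/1000 %:
    print(alpha / 1000)              # 1e-05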
FDR Example - independence
Fdrex(pv, 0.05, 0)
10 signals suggested. The smallest p-value is not significant under the Bonferroni correction (0.019 vs 0.013).
FDR Example - dependency
Fdrex(pv, 0.05, 1)
The 10 signals suggested under the independence assumption all disappear with the dependency correction term (see the sketch below).
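Fdrex appears to be a course-specific helper; as a hedged sketch, here is a Python version of the Benjamini-Hochberg step-up selection it seems to perform, with the third argument assumed to toggle the Benjamini-Yekutieli correction term for dependency:

    import numpy as np

    def fdr_select(pv, q=0.05, dependent=False):
        # Benjamini-Hochberg step-up: find the largest rank k with
        # p_(k) <= q * k / (m * c) and reject ranks 1..k. With
        # dependent=True, use the Benjamini-Yekutieli correction
        # c = 1 + 1/2 + ... + 1/m; otherwise c = 1.
        pv = np.asarray(pv)
        m = len(pv)
        c = np.sum(1.0 / np.arange(1, m + 1)) if dependent else 1.0
        order = np.argsort(pv)
        below = pv[order] <= q * np.arange(1, m + 1) / (m * c)
        if not below.any():
            return np.array([], dtype=int)
        k = np.nonzero(below)[0].max()
        return order[:k + 1]           # indices of the selected signals

Under these assumptions, fdr_select(pv, 0.05, dependent=True) mirrors Fdrex(pv, 0.05, 1).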
Ed Jaynes devoted a large part of his career to promoting Bayesian inference. He also championed the use of Maximum Entropy in physics. Outside physics, he met resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world?
Zadeh’s Paradoxical Example
• Patient has headache; possible explanations are M: Meningitis, C: Concussion, T: Tumor.
• Expert 1: P(M) = 0; P(C) = 0.9; P(T) = 0.1
• Expert 2: P(M) = 0.9; P(C) = 0; P(T) = 0.1
• Parallel combination: (0, 0, 0.01)
• What is the combined conclusion? Parallel normalized: (0, 0, 1)? (See the sketch after this list.)
• Is there a paradox?
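A small Python sketch of the parallel combination (pointwise product, then normalization) that yields the paradoxical conclusion:

    # The two experts' distributions over (M, C, T), from the slide:
    p1 = {'M': 0.0, 'C': 0.9, 'T': 0.1}
    p2 = {'M': 0.9, 'C': 0.0, 'T': 0.1}

    raw = {h: p1[h] * p2[h] for h in p1}    # (0, 0, 0.01)
    z = sum(raw.values())                   # normalization constant c
    print({h: raw[h] / z for h in raw})     # (0, 0, 1): all mass on T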
Zadeh’s Paradox (ctd)
• One expert (at least) made an error.
• Experts do not know what probability zero means.
• Experts made correct inferences based on different observation sets, and T is indeed the correct answer:
f(θ | o1, o2) = c f(o1 | θ) f(o2 | θ) f(θ),
but this assumes f(o1, o2 | θ) = f(o1 | θ) f(o2 | θ), which need not be true if the granularity of θ is too coarse (not taking the variability of f(oi | θ) into account).
• One reason (among several) to look at Robust Bayes.
Robust Bayes
• Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability:
F(θ | D) ∝ F(D | θ) F(θ)
• Every member of the posterior is a 'parallel combination' of one member of the likelihood F(D | θ) and one member of the prior F(θ):
f(θ | D) ∝ f(D | θ) f(θ)
• For decision making: Jaynes recommends using the member of the posterior with maximum entropy (the Maxent estimate); a sketch follows.
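A minimal sketch, assuming the prior and likelihood sets are given as finite lists of numpy arrays over a discrete θ (a true convex set has infinitely many members; this only combines finitely many representatives, and the names posterior_set and maxent_member are illustrative, not from the slides):

    import itertools
    import numpy as np

    def posterior_set(likelihoods, priors):
        # Every posterior member is a normalized 'parallel combination'
        # f(D|theta) * f(theta) of one likelihood and one prior member.
        out = []
        for lik, pri in itertools.product(likelihoods, priors):
            post = lik * pri
            out.append(post / post.sum())
        return out

    def maxent_member(posteriors):
        # Jaynes' Maxent estimate: the posterior member of maximum entropy.
        ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
        return max(posteriors, key=ent)

    # Hypothetical numbers: two candidate priors, one likelihood.
    priors = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
    likelihoods = [np.array([0.2, 0.8])]
    print(maxent_member(posterior_set(likelihoods, priors)))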
Generalisation of Bayes/Kalman: What if:
• You have no prior?
• The likelihood is infeasible to compute (imprecision)?
• The parameter space is vague, i.e., not the same for all likelihoods (fuzziness, vagueness)?
• The parameter space has complex structure (a simple structure is, e.g., a Cartesian product of the reals, R, and some finite sets)?
Some approaches...
• Robust Bayes: replace distributions by convex sets of distributions (Berger et al.)
• Dempster/Shafer/TBM: describe imprecision with random sets
• DSm: transform the parameter space to capture vagueness (Dezert/Smarandache, controversial)
• FISST: FInite Set STatistics: generalises observation and parameter spaces to products of spaces described as random sets (Goodman, Mahler, Nguyen)
Ellsberg’s Paradox:
Ambiguity Avoidance
Urn A contains 4 white and 4 black balls, plus 4 balls of unknown color (black or white). Urn B contains 6 white and 6 black balls.
You win one krona if you draw a black ball. From which urn do you want to draw?
A precise Bayesian should first make an assumption about how the unknown balls are colored, and then answer. But a majority prefers urn B, even if black is swapped for white.
How are imprecise probabilities used?
• Expected utility for each decision alternative becomes an interval instead of a point: maximax, maximin, maxi-mean? (A small sketch follows below.)
[Figure: utility u versus alternative a, with interval-valued expected utilities and the Bayesian, pessimist, and optimist choices marked.]
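A Python sketch of the interval decision rules, using the Ellsberg urns as hypothetical data (utility 1 for drawing black, 0 for white, so urn A's expected utility is the interval [4/12, 8/12] while urn B's is the point 6/12):

    # Interval-valued expected utilities for drawing a black ball:
    intervals = {'A': (4/12, 8/12), 'B': (6/12, 6/12)}

    maximin = max(intervals, key=lambda a: intervals[a][0])  # pessimist: 'B'
    maximax = max(intervals, key=lambda a: intervals[a][1])  # optimist:  'A'
    # 'maxi-mean' read here as maximizing the interval midpoint
    # (a tie in this example: 0.5 vs 0.5).
    maximean = max(intervals, key=lambda a: sum(intervals[a]) / 2)
    print(maximin, maximax, maximean)

The maximin (pessimist) rule reproduces the majority's ambiguity-avoiding preference for urn B.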
Dempster/Shafer/Smets
• Evidence is a random set over Θ, i.e., a probability distribution over 2^Θ.
• Probability of a singleton: 'belief' allocated to one alternative, i.e., ordinary probability.
• Probability of a non-singleton: 'belief' allocated to a set of alternatives, but not to any particular part of it.
• Evidence is combined by random-set intersection conditioned to be non-empty (Dempster's rule; see the sketch below).
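A minimal Python sketch of Dempster's rule, with a bba represented as a dict from frozenset (focal set) to mass; the example numbers are hypothetical:

    def dempster(m1, m2):
        # Random-set intersection: multiply masses of all pairs of focal
        # sets, keep non-empty intersections, renormalize by 1 - conflict.
        out = {}
        for s1, w1 in m1.items():
            for s2, w2 in m2.items():
                s = s1 & s2
                if s:
                    out[s] = out.get(s, 0.0) + w1 * w2
        total = sum(out.values())      # = 1 - mass on empty intersections
        return {s: w / total for s, w in out.items()}

    m1 = {frozenset('A'): 0.1, frozenset('B'): 0.3,
          frozenset('C'): 0.1, frozenset('AB'): 0.5}
    m2 = {frozenset('BC'): 0.6, frozenset('A'): 0.4}
    print(dempster(m1, m2))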
Correspondence: DS-structure ↔ set of probability distributions
For a pdf (bba) m over 2^Θ, consider all ways of reallocating the probability mass of non-singletons to their member atoms (sketched after the table below): this gives a convex set of probability distributions over Θ. Example: Θ = {A, B, C}
bba        set of pdfs (for all x ∈ [0, 1])
A: 0.1     A: 0.1 + 0.5x
B: 0.3     B: 0.3 + 0.5(1 - x)
C: 0.1     C: 0.1
AB: 0.5
Can we regard any set of pdfs as a bba? The answer is NO!
There are more convex sets of pdfs than DS-structures.
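A sketch of the reallocation construction in Python: each extreme point of the induced pdf set sends every non-singleton focal set's mass entirely to one of its atoms (interior points, such as the 0.1 + 0.5x family above, are convex mixtures of these):

    import itertools

    def extreme_pdfs(bba):
        # One extreme point per way of choosing a receiving atom
        # for each focal set's mass.
        focal = list(bba.items())
        points = []
        for choice in itertools.product(*[sorted(s) for s, _ in focal]):
            p = {}
            for atom, (s, w) in zip(choice, focal):
                p[atom] = p.get(atom, 0.0) + w
            points.append(p)
        return points

    bba = {frozenset('A'): 0.1, frozenset('B'): 0.3,
           frozenset('C'): 0.1, frozenset('AB'): 0.5}
    for p in extreme_pdfs(bba):
        print(p)   # x = 1: {A: 0.6, B: 0.3, C: 0.1}; x = 0: {A: 0.1, B: 0.8, C: 0.1}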
Representing a probability set as a bba: 3-element universe
[Figure: black: the convex set; blue: rounded up using the lower envelope; red: rounded down via linear programming.]
Rounding is not unique! (A sketch of the rounding-up step follows.)
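One plausible reading of 'rounding up: use lower envelope', sketched in Python under that assumption: compute the lower envelope Bel(S) = min_p P(S) over all subsets, then pull back a candidate bba by Moebius inversion (when the set is not a DS-structure, some resulting masses can be negative, which is what makes a further, non-unique rounding step necessary):

    from itertools import chain, combinations

    def nonempty_subsets(universe):
        return chain.from_iterable(combinations(universe, r)
                                   for r in range(1, len(universe) + 1))

    def lower_envelope_bba(pdfs, universe='ABC'):
        # Lower envelope on every subset, then Moebius inversion:
        # m(S) = sum over T subset of S of (-1)**|S - T| * Bel(T).
        bel = {frozenset(s): min(sum(p.get(a, 0.0) for a in s) for p in pdfs)
               for s in nonempty_subsets(universe)}
        return {s: sum((-1) ** (len(s) - len(t)) * bel[t]
                       for t in bel if t <= s)
                for s in bel}

    # Hypothetical example: the two extreme pdfs from the previous slide
    # recover the bba {A: 0.1, B: 0.3, C: 0.1, AB: 0.5} (other masses zero).
    pdfs = [{'A': 0.6, 'B': 0.3, 'C': 0.1}, {'A': 0.1, 'B': 0.8, 'C': 0.1}]
    print(lower_envelope_bba(pdfs))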
Another appealing conjecture
• A precise pdf can be regarded as a (singleton) random set.
• Bayesian combination of precise pdfs corresponds to random-set intersection (conditioned on non-emptiness).
• A DS-structure corresponds to a Choquet capacity (a set of pdfs).
• Is it reasonable to combine Choquet capacities by (non-empty) random-set intersection (Dempster's rule)?
• The answer is NO!
• Counterexample: Dempster's combination cannot be obtained by combining members of the prior and the likelihood: Arnborg, JAIF vol. 1, no. 1, 2006.
Consistency of fusion operators
[Figure: axes are the probabilities P(A) and P(B) in a 3-element universe, with P(C) = 1 - P(A) - P(B). Shown: the operands (evidence), robust fusion, Dempster's rule, modified Dempster's rule, and the rounded robust, DS, and MDS results.]