HW1
Beta(p;4,10), Beta(p;9,15), Beta(p;6,6), Beta(p;1,1) likelihood
FDR, Evidence Theory, Robustness

General Dependency test
function PV=testdep(D,N)
% General dependency test TESTDEP(D,N)
% D: n by d data matrix
% N: Monte Carlo comparison sample
F=[];
[n,d]=size(D);
mv=mean(D);
D=D-repmat(mv,n,1);    % remove mean
st=std(D);
D=D./repmat(st,n,1);   % standardize variance
for i=1:d
  for j=1:i-1
    q=mean(D(:,i).*D(:,j));
    F=[F q];
  end
end

Empirical no-dependency distribution
EE=[];
for iN=1:N-1
  E=[];
  for i=1:d
    q=[];
    if i>1
      D(:,i)=D(randperm(n),i);
      for j=1:i-1
        q=mean(D(:,i).*D(:,j));
        E=[E q];
      end
    end
  end
  EE=[EE;E];
end

Computing P-value
% Sorting twice gives value ranks of EE - test statistics
EE=[F ; EE];
[EEs,iix]=sort(EE);
[EEs,iix]=sort(iix);
% p-value is proportional to value rank
PV=iix(1,:)/N;
% reshuffle to matrix
PVM(ix)=PV

Correlation coefficient
>> D=[1:100]';
>> D=[D -D D.^2 D+200*rand(size(D)) randn(size(D))];
>> [c,pv]=corrcoef(D)
c =
    1.0000   -1.0000    0.9689    0.2506   -0.0977
   -1.0000    1.0000   -0.9689   -0.2506    0.0977
    0.9689   -0.9689    1.0000    0.2959   -0.0540
    0.2506   -0.2506    0.2959    1.0000   -0.0242
   -0.0977    0.0977   -0.0540   -0.0242    1.0000

(The slide also showed the p-value matrices from corrcoef and from testdep; only fragments survived extraction: corrcoef pv entries 1.0000, 0, 0.0000 and testdep pv entries 0, 0, 1.0000, 0.0119, 0.3335, 0.9936, 0.1668. In testdep, p-values near 0 or near 1 both indicate dependency.)

Multiple testing
• The probability of rejecting a true null hypothesis at the 99% level is 1%.
• Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 0.63.
• Bonferroni correction, FWE control: to reach significance level 1% in an experiment involving 1000 tests, each test should be checked at significance 1/1000 %.

Multiple testing
• Several approaches try to verify an excess of small p-values.
• Sort the set of p-values and test whether there is an excess of small values - this is an indication of false null hypotheses.

Approaches to multiple testing
Definition of FDR, positive correlation.
(Figure panels, sorted p-values with lower envelope, FDR and FDR-corrected curves:)
• No significance.
• Some significance: one of the 15 first tests not null, on 5% significance.
• More significance - FDR: 95% of the first 3 tests not the null hypothesis.
• Even more significance: 95% of the first 14 tests not null - worth the effort to investigate all.

FDR Example - independence
Fdrex(pv,0.05,0)
10 signals suggested. Smallest p-value not significant with Bonferroni correction (0.019 vs 0.013).

FDR Example - dependency
Fdrex(pv,0.05,1)
10 signals suggested assuming independence; all disappear with the correction term.

Ed Jaynes devoted a large part of his career to promoting Bayesian inference. He also championed the use of Maximum Entropy in physics. Outside physics, he met resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world??

Generalisation of Bayes/Kalman: What if
• You have no prior?
• The likelihood is infeasible to compute (imprecision)?
• The parameter space is vague, i.e., not the same for all likelihoods (fuzziness, vagueness)?
• The parameter space has complex structure (a simple structure is, e.g., a Cartesian product of the reals R and some finite sets)?
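Fdrex appears to be course-provided code that is not reproduced on the slides. As a hedged sketch of the same idea (testing for an excess of small p-values), here is the standard Benjamini-Hochberg step-up procedure, which controls FDR at level q under independence or positive correlation; it is an illustration, not the course's Fdrex.

```python
import numpy as np

def bh_fdr(pv, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values and find the largest k with p_(k) <= q*k/m;
    an excess of small p-values (points below the line q*k/m) is the
    signature of false null hypotheses. Returns a boolean mask of
    discoveries, in the original order of pv.
    """
    pv = np.asarray(pv, dtype=float)
    m = pv.size
    order = np.argsort(pv)
    thresh = q * np.arange(1, m + 1) / m      # the BH comparison line
    below = pv[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True                     # reject the k smallest
    return mask
```

Note the contrast with Bonferroni on the slide: BH compares the k-th smallest p-value to q*k/m, while Bonferroni compares every p-value to q/m, so BH can declare discoveries whose smallest p-value fails the Bonferroni cutoff.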
Philippe Smets (1938-2005)
Developed Dempster's and Shafer's method in uncertainty management into the Transferable Belief Model, which combines imprecise 'evidence' (likelihood or prior) using Dempster's rule, and uses the pignistic transformation to get a sharp decision criterion.

Some approaches...
• Robust Bayes: replace distributions by convex sets of distributions (Berger et al.)
• Dempster/Shafer/TBM: describe imprecision with random sets
• DSm: transform the parameter space to capture vagueness (Dezert/Smarandache, controversial)
• FISST: FInite Set STatistics: generalises observation and parameter space to a product of spaces described as random sets (Goodman, Mahler, Nguyen)

Combining Evidence

Robust Bayes
• Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability:
  f(θ | D) ∝ f(D | θ) f(θ),    F(θ | D) = F(D | θ) F(θ)
• Every member of the posterior is a 'parallel combination' of one member of the likelihood and one member of the prior.
• For decision making: Jaynes recommends using the member of the posterior with maximum entropy (MaxEnt estimate).

Ellsberg's Paradox: Ambiguity Avoidance
Urn A contains 4 white and 4 black balls, and 4 of unknown colour (black or white). Urn B contains 6 white and 6 black balls. You win one krona if you draw a black ball. From which urn do you want to draw?
A precise Bayesian should first assume how the ?-balls are coloured and then answer. But a majority prefer urn B, even if black is swapped for white.

Prospect Theory: Kahneman, Tversky
• Safety belts eliminate car collision injuries at low speed completely. (I BUY IT!!!)
• Safety belts eliminate 90% of injuries in car accidents; in 10% the speed is too high. (So belts are not that good!???)

How are imprecise probabilities used?
• The expected utility of a decision alternative becomes an interval instead of a point: maximax, maximin, maxi-mean?
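The 'parallel combination' idea can be made concrete with a small sketch: represent the convex prior set by its extreme points, combine each with a fixed likelihood by Bayes' rule, and read off the interval of posterior probabilities. The Ellsberg urn A prior bounds come from the slide; the likelihood numbers are assumed for illustration.

```python
import numpy as np

def posterior(prior, lik):
    """Precise Bayes' rule: normalized pointwise product."""
    p = np.asarray(prior, dtype=float) * np.asarray(lik, dtype=float)
    return p / p.sum()

# Convex set of priors given by its extreme points: Ellsberg urn A has
# 4 black, 4 white and 4 unknown balls, so P(black) lies in [4/12, 8/12].
priors = [np.array([4/12, 8/12]),     # all unknown balls white
          np.array([8/12, 4/12])]     # all unknown balls black
lik = np.array([0.7, 0.3])            # assumed likelihood, (black, white)

posts = [posterior(pr, lik) for pr in priors]
lo = min(p[0] for p in posts)         # lower envelope of P(black | D)
hi = max(p[0] for p in posts)         # upper envelope of P(black | D)
```

Each member of the posterior set arises from one member of the prior set, so the posterior probability of 'black' is itself an interval [lo, hi] rather than a point; a decision maker then needs a rule (maximin, maximax, MaxEnt member, ...) to act on it.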
(Figure: utility u against decision alternative a, as evaluated by a Bayesian, a pessimist and an optimist.)

Dempster/Shafer/Smets
• Evidence is a random set over Θ, i.e., a probability distribution over 2^Θ.
• Probability of a singleton: 'belief' allocated to the alternative, i.e., probability.
• Probability of a non-singleton: 'belief' allocated to the set of alternatives, but not to any part of it.
• Evidences are combined by random intersection conditioned to be non-empty (Dempster's rule).

Logic of Dempster's rule
• Each observer has a private state space and assesses the posterior over it.
• Each private state can correspond to one or more global or common states (multivalued mapping).
• Observers' state spaces are assumed independent.

Correspondence DS-structure - set of probability distributions
For a pdf (bba) m over 2^Θ, consider all ways of reallocating the probability mass of non-singletons to their member atoms: this gives a convex set of probability distributions over Θ.
Example: Θ = {A,B,C}

  bba          set of pdfs
  A:  0.1      A: 0.1 + 0.5x,       for all x ∈ [0,1]
  B:  0.3      B: 0.3 + 0.5(1-x)
  C:  0.1      C: 0.1
  AB: 0.5

Can we regard any set of pdfs as a bba? The answer is NO!! There are more convex sets of pdfs than DS-structures.

Representing a probability set as a bba: 3-element universe
• Rounding up: use the lower envelope (black: convex set; blue: rounded up).
• Rounding down (red: rounded down): linear programming.
• Rounding is not unique!!

Another appealing conjecture
• A precise pdf can be regarded as a (singleton) random set.
• Bayesian combination of precise pdfs corresponds to random set intersection (conditioned on non-emptiness).
• A DS-structure corresponds to a Choquet capacity (set of pdfs).
• Is it reasonable to combine Choquet capacities by (non-empty) random set intersection (Dempster's rule)??
• The answer is NO!!
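The bba-to-pdf-set correspondence and Smets' pignistic transformation can both be checked mechanically on the slide's Θ = {A,B,C} example. The sketch below (illustrative code, not course software) represents a bba as a dict from frozensets to masses; the singleton's belief/plausibility pair is exactly the probability interval spanned by the set of pdfs.

```python
def pignistic(bba):
    """Smets' pignistic transformation: split each focal element's mass
    equally among its member atoms, giving one precise pdf."""
    bet = {}
    for focal, mass in bba.items():
        for atom in focal:
            bet[atom] = bet.get(atom, 0.0) + mass / len(focal)
    return bet

def belief_plausibility(bba, atom):
    """Probability interval of a singleton under the capacity view:
    belief = mass committed exactly to it; plausibility = total mass of
    all focal elements containing it."""
    bel = sum(m for f, m in bba.items() if f == frozenset([atom]))
    pl = sum(m for f, m in bba.items() if atom in f)
    return bel, pl

# the slide's example over Theta = {A, B, C}
bba = {frozenset('A'): 0.1, frozenset('B'): 0.3,
       frozenset('C'): 0.1, frozenset({'A', 'B'}): 0.5}
```

For this bba the pdf set is {(0.1 + 0.5x, 0.3 + 0.5(1-x), 0.1) : x ∈ [0,1]}, so P(A) ranges over [0.1, 0.6] and P(B) over [0.3, 0.8], while the pignistic pdf is the x = 0.5 member (0.35, 0.55, 0.1).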
• Counterexample: Dempster's combination cannot be obtained by combining members of the prior and likelihood. Arnborg: JAIF vol 1, no 1, 2006.

Consistency of fusion operators
(Figure: axes are the probabilities of A and B in a 3-element universe, with P(C) = 1 - P(A) - P(B); panels show the operands (evidence), robust fusion, Dempster's rule, modified Dempster's rule, the rounded robust DS rule and the MDS rule.)

Deciding target type
• Attack aircraft: small, dynamic
• Bomber aircraft: large, dynamic
• Civilian: large, slow dynamics
• Prior: (0.5, 0.4, 0.1)
• Observer 1: probably small, likelihood (0.8, 0.1, 0.1)
• Observer 2: probably fast, likelihood (0.4, 0.4, 0.2)

Estimators
(Figure, 3 states with P(C) = 1 - P(A) - P(B): centre of enclosing sphere, pignistic and MaxEnt estimators.)

What about Smets' TBM??
• TBM combines the original Dempster's rule with the pignistic transformation. This is not compatible with precise Bayesian analysis.
• However, there is nothing against claiming TBM to be some kind of Robust Bayesian scheme.
• Main problem: Dempster's rule and its motivation using multi-valued mappings goes against the dominant argumentation used in introductions and tutorials: TBM is incompatible with the capacity interpretation of DS structures.
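The target-type numbers above can be run through Dempster's rule directly. The sketch below (illustrative, not course software; the atom names are invented labels) implements the rule as random-set intersection conditioned on non-emptiness; since the prior and both likelihoods here are precise (singleton-only) bbas, the rule reduces to the normalized Bayesian product, in line with the 'precise pdf as singleton random set' bullet.

```python
from itertools import product

def dempster(m1, m2):
    """Dempster's rule: intersect focal elements of the two bbas,
    discard empty intersections, renormalize by 1 - conflict."""
    combined, conflict = {}, 0.0
    for (f1, w1), (f2, w2) in product(m1.items(), m2.items()):
        inter = f1 & f2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2
    return {f: w / (1.0 - conflict) for f, w in combined.items()}

# the slide's numbers, as singleton-only bbas over invented atom labels
atoms = ['attack', 'bomber', 'civilian']
prior = {frozenset([a]): p for a, p in zip(atoms, [0.5, 0.4, 0.1])}
obs1  = {frozenset([a]): p for a, p in zip(atoms, [0.8, 0.1, 0.1])}
obs2  = {frozenset([a]): p for a, p in zip(atoms, [0.4, 0.4, 0.2])}

post = dempster(dempster(prior, obs1), obs2)
```

On singletons the unnormalized masses are the products 0.5·0.8·0.4, 0.4·0.1·0.4 and 0.1·0.1·0.2, so the posterior strongly favours the attack aircraft; the counterexample cited above shows that this agreement with Bayes breaks down once non-singleton focal elements stand in for sets of pdfs.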