Download Bayesian Optimization Algorithm, Decision Graphs, and Occam`s

Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor Martin Pelikan, David E. Goldberg, and Kumara Sastry IlliGAL Report No. 2000020 May 2000. Abstract • The use of various scoring metrics for Bayesian networks. • The use of decision graphs in Bayesian networks to improve the performance of the BOA. • BDe metric for Bayesian networks with decision graphs. Bayesian Networks • Two basics components in Bayesian Networks – A scoring metric for discriminates the networks – A search algorithm for finding the best scoring metric value • BOA (in previous works) – The complexity of the considered models was bounded by the maximum number of incoming edges into any node. – To search the space of networks, a simple greedy algorithm was used due to its efficiency. Bayesian-Dirichlet Metric • BDe metric combines the prior knowledge about the problem and the statistical data from a given data set. p( B) p( D | B) • Bayes theorem p( B | D)  p ( D) • The higher the p(B|D), the more likely the network B is a correct model of the data.  Bayesian scoring metric, or the posterior probability – Even more, we use a fixed data set D. Bayesian-Dirichlet Metric • p(B) : prior probability of the network B p( B)  c  c : normalizat ion constant  : constant factor penalizing the network for each unmatched edge with the prior network  : symmetric difference between B and the prior network • BDe metric gives preference to simpler networks – But, it’s not enough! Bayesian-Dirichlet Metric • p(B|D) – Data is a multinomial sample – Parameters are independent 1. The parameters associated with each variable are independent (global parameter independence) 2. The parameters associated with each instance of the parents of a variable are independent (local parameter independence) – Dirichlet distribution – No missing data (complete data) Bayesian-Dirichlet Metric n 1 (m( i )) (m( i )) p( D | B)      ( m (  )  m (  )) i 0  i i 0  i ( m( i )  m( i )) i i n 1 • Often referred to K2 metric Minimum Description Length Metric length( B, D)  length( B)  length( X ,  )  length( D | B)   n    length( B)    log 2 n  log 2  i 0   |  i |  n 1 1 length( X ,  )  log 2 N  2| i | 2 i 0 n 1 n 1 length( D | B)   N  p( xi ,  i ) log 2 p( xi |  i ) i  0 xi i • Not good for using prior information Constructing a Network • Constructing a best network is NPcomplete. • Most of the commonly used metrics can be decomposed into independent terms each of which corresponds to one variable. • Empirical results show that more sophisticated search algorithms do not improve the obtained result significantly. Decision Graphs in Bayesian Networks • The use of local structures as decision trees, decision graphs, and default tables to represent equalities among parameters was proposed • The network construction algorithm takes an advantage of using decision graphs by directly manipulating the network structure through the graphs. Decision Graphs • A decision graph is an extension of a decision tree in which each non-root node can have multiple parents. Advantages of Decision Graph • Much less parents can be used to represent a model • Learning more complex class of models, called Bayesian multinets • Performs smaller and more specific steps what results in better models with respect to their likelihood. • Network complexity measure can be incorporated into the scoring metir Bayesian Score for Networks with Decision Graphs (m( xi , i, l )  m( xi , i, l )) (m(i, l )) p( D | B)      ( m ( i , l )  m ( i , l )) (m( xi , i, l )) i  0 lLi xi n 1 Li : the set of leaves in the dicision graph m( i ) : the number of instances in D which end up the traversal through t he graph m : the prior knowledge Operators on Decision Graphs split merge Constructing BN with DG 1. Initialize a decision graph Gi for each node xi to a graph containing only a single leaf. 2. Initialize the network B into an empty network. 3. Choose the best split or merge that does not result in a cycle in B. 4. If the best operator does not improve the score, finish. Constructing BN with DG 5. Execute the chosen operator 6. If the operator was a split, update the network B by adding a new edge. 7. Go to (3) Experiments • One-max • 3-deceptive • Spin-glass • Graph bisection

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Bayesian Optimization Algorithm, Decision Graphs, and Occam`s