Download Genetic-Algorithm-Based Instance and Feature Selection

Genetic-Algorithm-Based Instance and Feature Selection Instance Selection and Construction for Data Mining Ch. 6 H. Ishibuchi, T. Nakashima, and M. Nii Abstract  GA based approach for selecting a small number of instances from a given data set in a pattern classification problem.  To improve the classification ability of our nearest neighbor classifier by searching for an appropriate reference set. Genetic Algorithm  Coding  Binary string of the length (n+m) S  a1a2 an s1s2  sm  ai : inclusion or exclusion of the i-th feature  sp : the inclusion or exclusion of the p-th instance  Fitness function  Minimize |F|, minimize |P|, and maximize g(S)  |F| : number of selected feature  |P| : number of selected instance  g(S) : classification performance Genetic Algorithm  Performance measure (first one) : gA(S)  The number of correctly classified instances  Minimize |P| subject to gA(S) = m  Performance measure (second one) : gB(S)  When an instance xq was included in the reference set, xq was not selected as its own nearest neighbor. min{ d F ( x p , xq ) | x p  P}, if xq  P d F ( x p * , xq )   min{ d F ( x p , xq ) | x p  P  {xq }}, if xq  P  fitness fitness( S )  Wg  g ( S )  WF  | F | WP  | P | Genetic Algorithm Initialization 2. Genetic Operation: Iterate the following procedure Npop/2 times to generate Npop string 1. 1. Randomly select a pair of strings 2. Apply a uniform crossover 3. Apply a mutation operator Generation Update: Select the Npop best string from 2Npop 4. Termination test 3. Numerical Example Biased Mutation  For effectively decreasing the number of selected instances is to bias the mutation probability  In the biased mutation, a much larger probability is assigned to the mutation from sp = 1 to sp = 0. Data sets        2 artificial + 4 real Normal distribution with small overlap Normal distribution with large overlap Iris data Appendicitis Data Cancer Data Wine Data Parameter Specifications    Pop Size : 50 Crossover Prob. : 1.0 Mutation Prob.  Pm = 0.01 for feature selection  Pm(1  0) = 0.1 for instance selection  Pm(0  1) = 0.01 for instance selection     Stopping condition : 500 gen. Weight values : Wg = 5; WF = 1; WP = 1 Performance measure : gA(S) or gB(S) 30 trials for each data Performance on Training Data Performance on Test Data  Leaving-one-out procedure (iris & appendicitis)  10-fold cross-validation (cancer & wine) Effect of Feature Selection Effect on NN Some Variants

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Genetic-Algorithm-Based Instance and Feature Selection