Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India 1 2 Introduction to evolutionary computation Gene Expression Programming (GEP) Application of GEP for Event Selection Conclusion Liliana Teodorescu, Brunel University 3 Evolutionary computation simulates the natural evolution on a computer process leading to maintenance or increase of a population ability to survive and reproduce in a specific environment quantitatively measured by evolutionary fitness Goal of natural evolution - to generate a population of individuals with increasing fitness Goal of evolutionary computation - to generate a set of solutions (to a problem) of increasing quality Liliana Teodorescu, Brunel University 4 Individual – candidate solution to a problem decoding encoding Chromosome – representation of the candidate solution Gene – constituent entity of the chromosome Population – set of individuals/chromosomes Fitness function – representation of how good a candidate solution is Genetic operators – operators applied on chromosomes in order to create genetic variation (other chromosomes) Liliana Teodorescu, Brunel University 5 Natural evolution simulation - core of the evolutionary algorithms: Basic evolutionary algorithm Run Start Initial population creation (randomly) Fitness evaluation (of each chromosome) New generation Problem definition Encoding of the candidate solution Fitness definition Run Decoding the best fitted chromosome = solution yes Terminate? Stop no Selection of individuals (proportional with fitness) Reproduction (genetic operators) Replacement of the current population with the new one Liliana Teodorescu, Brunel University 6 Genetic Algorithms (GA) (J. H. Holland, 1975) Genetic Programming (GP) (J. R. Koza, 1992) Gene Expression Programming (GEP) (C. Ferreira, 2001) Main differences Encoding method Reproduction method Liliana Teodorescu, Brunel University 7 search for the computer program that solve the problem (as GP) works with two entities: chromosomes and expression trees Encoding Candidate solution represented by an expression tree (ET) Q ( a b) (c d ) * Q*-+abcd + a ET encoded in a chromosome: read ET left - right and top - down b c Q means sqrt d Decoding the chromosome (translates the chromosome in an ET) •first line of ET (root) – first element of the chromosome •next line of ET – as many arguments needed by the element in the previous line Liliana Teodorescu, Brunel University 8 Reproduction Genetic operators applied on chromosoms not on ET => always produce sintactically correct structures! Recombination – exchanges parts of two chromosomes Mutation – changes the value of a node Transposition – moves a part of the chromosome to another location in the same chromosome e.g. Mutation: Q replaced with * *b+a-aQab+//+b+babbabbbababbaaa * + b - a a *b+a-aQab+//+b+babbabbbababbaaa * + b - a Q a a * a b Liliana Teodorescu, Brunel University 9 Chromosome – has one or more genes of equal length Gene – head: contains both functions and terminals (length h) - tail: contains only terminals (length t) t=h(n-1)+1 n – number of arguments of the function with the highest number of arguments e.g. set of functions: Q,*,/,-,+ set of terminals: a,b n=2; h=15 (choosen) t =16 => length of gene=15+16=31 * + b - a a *b+a-aQab+//+b+babbabbbababbaaa ET ends before the end of the gene! Q a Liliana Teodorescu, Brunel University 10 GEP for event selection cuts/selection criteria finding classification problem (signal/background classification) statistical learning approach Data samples: Monte-Carlo simulation from BaBar experiment Ks production in e+e- (~10 GeV) 5000 training events (for classification rule extraction) 5000 test events (others than training events) limitations imposed by APS 3.0 S/N = 0.25, 1, 5 Software resources APS 3.0 (Automatic Problem Solver) - commercial package (Windows based) - www.gepsoft.com - function finding, classification, time series analysis Liliana Teodorescu, Brunel University 11 Functions and constants to be used in the classification rules (cut type rule) 10 functions - AND1 (x<0 and y<0 => 1 else 0), AND2 (x0 and y0 => 1 else 0) - OR1 (x<0 or y<0 => 1 else 0), OR2 (x0 or y0 => 1 else 0) - <, >, <=, >=, =, != 36 functions -previous 10 functions + common mathematical functions constants - floating point constants (-10,10) Data – variables usually used in a cut based analysis for K S selection - doca (distance of closest approach) - |cos(hel)| - Fsig (Flight Significance) - Mass (KS reconstructed mass) - RXY, |RZ| (region around interaction point) - SFL (Signed Flight Length) - Pchi (2 probability of the vertex) GEP parameters - fitness function: number of hits (events correctly classified) - gene length (head = 1-20) - no. of chromosomes per generation: 100 - no. of generations per run: 1000-20000 - genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1 Liliana Teodorescu, Brunel University 12 Data sample: S/N =0.25 10 functions No. of genes = 1, Head length =10 Fsig 5.26, Rxy < 0.19, doca <1, Pchi > 0 Classification Accuracy = 95.36% Liliana Teodorescu, Brunel University 13 Data samples: S/N =0.25, 1, 5 10 functions 95% S/N = 0.25 5% 92% S/N = 1 92% S/N = 5 8% 8% Liliana Teodorescu, Brunel University 14 Data sample: S/N =0.25 10 functions Head Acc. (%) Selection criteria 1 83.34 Fsig 9.93 2 90.76 Fsig 8.80, doca <1 3 94.88 Fsig > 3.67, Rxy Pchi 4 94.88 Fsig > 3.67, Rxy Pchi 5 95.04 Fsig 3.63, |Rz| 2.65, Rxy < Pchi 7 95.04 Fsig 3.64, Rxy < Pchi, Pchi > 0 10 95.360 Fsig 5.26, Rxy < 0.19, doca <1, Pchi > 0 20 95.50 Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0, doca > 0, Rxy Mass Liliana Teodorescu, Brunel University 15 GEP 20 Cut-based analysis 95.50 Fsig > 4.1, Rxy 0.2, SFL > 0.2, Pchi > 0, doca > 0, Rxy Mass No cut Fsig 4.0 Rxy 0.2cm SFL 0cm Pchi > 0.001 doca 0.4cm |Rz| 2.8cm Previous cuts + doca > 0 Rxy Mass Reduction S: 15% B: 98% Reduction S: 16% B: 98.3% Fsig 4.1 Rxy 0.2cm SFL 0.2cm Pchi > 0 Previous cuts + doca 0.4 | Rz| 2.8cm Liliana Teodorescu, Brunel University 16 GEP 5 Reduction S: 7.6% B: 87.8% 95.04 Fsig 3.63, |Rz| 2.65, Rxy < Pchi Fsig 3.63 |Rz| 2.65cm Reduction S: 16% B: 97.8% Fsig 3.63 |Rz| 2.65cm Rxy<Pchi Liliana Teodorescu, Brunel University 17 Data samples: S/N =0.25, 1, 5 10 functions S/N = 0.25 S/N = 5 S/N = 1 Liliana Teodorescu, Brunel University 18 Data sample: S/N =0.25 36 mathematical functions No. of genes = 1, Head length =10 Classification Accuracy = 95.00% Liliana Teodorescu, Brunel University 19 Data sample: S/N =0.25 36 functions Classification Accuracy 95% Liliana Teodorescu, Brunel University 20 GEP allows fast identification of powerful cuts signal/background separation of 92-95% accuracy for samples with S/N = 0.25, 1, 5 potential of discovering new correlations between variables large number of selection functions does not improve the classification accuracy GEP is still in the R&D phase needs software development -> underway Liliana Teodorescu, Brunel University