Download slides - Indico

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Computing in High Energy and Nuclear Physics, 13-17 February 2006, Mumbai, India
1
2




Introduction to evolutionary computation
Gene Expression Programming (GEP)
Application of GEP for Event Selection
Conclusion
Liliana Teodorescu, Brunel University
3
 Evolutionary computation simulates the natural evolution on a computer
process leading to maintenance or increase of a population
ability to survive and reproduce in a specific environment
quantitatively measured by evolutionary fitness
 Goal of natural evolution - to generate a population of individuals with
increasing fitness
 Goal of evolutionary computation - to generate a set of solutions (to a
problem) of increasing quality
Liliana Teodorescu, Brunel University
4
 Individual – candidate solution to a problem
decoding
encoding
 Chromosome – representation of the candidate solution
 Gene – constituent entity of the chromosome
 Population – set of individuals/chromosomes
 Fitness function – representation of how good a candidate solution is
 Genetic operators – operators applied on chromosomes in order
to create genetic variation (other chromosomes)
Liliana Teodorescu, Brunel University
5
Natural evolution simulation - core of the evolutionary algorithms:
Basic evolutionary algorithm
Run
Start
Initial population creation (randomly)
Fitness evaluation (of each chromosome)
New generation
 Problem definition
 Encoding of the
candidate solution
 Fitness definition
 Run
 Decoding the best fitted
chromosome = solution
yes
Terminate?
Stop
no
Selection of individuals (proportional with fitness)
Reproduction (genetic operators)
Replacement of the current population with the new one
Liliana Teodorescu, Brunel University
6
 Genetic Algorithms (GA) (J. H. Holland, 1975)
 Genetic Programming (GP) (J. R. Koza, 1992)
 Gene Expression Programming (GEP) (C. Ferreira, 2001)
Main differences
 Encoding method
 Reproduction method
Liliana Teodorescu, Brunel University
7
 search for the computer program that solve the problem (as GP)
 works with two entities: chromosomes and expression trees
Encoding
Candidate solution represented
by an expression tree (ET)
Q
( a  b)  (c  d )
*
Q*-+abcd
+
a
ET encoded in a chromosome:
read ET left - right
and top - down
b
c
Q means sqrt
d
Decoding the chromosome (translates the chromosome in an ET)
•first line of ET (root) – first element of the chromosome
•next line of ET – as many arguments needed by the element in
the previous line
Liliana Teodorescu, Brunel University
8
Reproduction
Genetic operators applied on chromosoms not on ET =>
always produce sintactically correct structures!
 Recombination – exchanges parts of two chromosomes
 Mutation – changes the value of a node
 Transposition – moves a part of the chromosome to another location
in the same chromosome
e.g. Mutation: Q replaced with *
*b+a-aQab+//+b+babbabbbababbaaa
*
+
b
-
a
a
*b+a-aQab+//+b+babbabbbababbaaa
*
+
b
-
a
Q
a
a
*
a
b
Liliana Teodorescu, Brunel University
9
Chromosome – has one or more genes of equal length
Gene – head: contains both functions and terminals (length h)
- tail: contains only terminals (length t)
t=h(n-1)+1
n – number of arguments of the function
with the highest number of arguments
e.g. set of functions: Q,*,/,-,+
set of terminals: a,b
n=2; h=15 (choosen)
t =16 => length of gene=15+16=31
*
+
b
-
a
a
*b+a-aQab+//+b+babbabbbababbaaa
ET ends before the end of the gene!
Q
a
Liliana Teodorescu, Brunel University
10
GEP for event selection
 cuts/selection criteria finding
 classification problem (signal/background classification)
 statistical learning approach
Data samples:
 Monte-Carlo simulation from BaBar experiment
 Ks production in e+e- (~10 GeV)
 5000 training events (for classification rule extraction)
 5000 test events (others than training events)
 limitations imposed by APS 3.0
 S/N = 0.25, 1, 5
Software resources
 APS 3.0 (Automatic Problem Solver)
- commercial package (Windows based)
- www.gepsoft.com
- function finding, classification, time series analysis
Liliana Teodorescu, Brunel University
11
Functions and constants to be used in the classification rules (cut type rule)
10 functions
- AND1 (x<0 and y<0 => 1 else 0), AND2 (x0 and y0 => 1 else 0)
- OR1 (x<0 or y<0 => 1 else 0), OR2 (x0 or y0 => 1 else 0)
- <, >, <=, >=, =, !=
36 functions
-previous 10 functions + common mathematical functions
constants
- floating point constants (-10,10)
 
Data – variables usually used in a cut based analysis for K S    selection
- doca (distance of closest approach)
- |cos(hel)|
- Fsig (Flight Significance)
- Mass (KS reconstructed mass)
- RXY, |RZ| (region around interaction point)
- SFL (Signed Flight Length)
- Pchi (2 probability of the vertex)
GEP parameters
- fitness function: number of hits (events correctly classified)
- gene length (head = 1-20)
- no. of chromosomes per generation: 100
- no. of generations per run: 1000-20000
- genetic operators rates: mutation 0.044, inversion 0.1, transposition 0.3, recombination 0.1
Liliana Teodorescu, Brunel University
12
Data sample: S/N =0.25
10 functions
No. of genes = 1, Head length =10
Fsig  5.26,
Rxy < 0.19,
doca <1,
Pchi > 0
Classification Accuracy = 95.36%
Liliana Teodorescu, Brunel University
13
Data samples: S/N =0.25, 1, 5
10 functions
95%
S/N = 0.25
5%
92%
S/N = 1
92%
S/N = 5
8%
8%
Liliana Teodorescu, Brunel University
14
Data sample: S/N =0.25
10 functions
Head
Acc. (%)
Selection criteria
1
83.34
Fsig  9.93
2
90.76
Fsig 8.80, doca <1
3
94.88
Fsig > 3.67, Rxy  Pchi
4
94.88
Fsig > 3.67, Rxy  Pchi
5
95.04
Fsig  3.63, |Rz|  2.65, Rxy < Pchi
7
95.04
Fsig  3.64, Rxy < Pchi, Pchi > 0
10
95.360
Fsig  5.26, Rxy < 0.19, doca <1, Pchi > 0
20
95.50
Fsig > 4.1, Rxy  0.2, SFL > 0.2, Pchi > 0,
doca > 0, Rxy  Mass
Liliana Teodorescu, Brunel University
15
GEP
20
Cut-based analysis
95.50
Fsig > 4.1, Rxy  0.2, SFL > 0.2, Pchi > 0,
doca > 0, Rxy  Mass
No cut
Fsig  4.0
Rxy  0.2cm
SFL  0cm
Pchi > 0.001
doca  0.4cm
|Rz|  2.8cm
Previous cuts
+
doca > 0
Rxy  Mass
Reduction
S: 15%
B: 98%
Reduction
S: 16%
B: 98.3%
Fsig  4.1
Rxy  0.2cm
SFL  0.2cm
Pchi > 0
Previous cuts
+
doca  0.4
| Rz|  2.8cm
Liliana Teodorescu, Brunel University
16
GEP
5
Reduction
S: 7.6%
B: 87.8%
95.04
Fsig  3.63, |Rz|  2.65, Rxy < Pchi
Fsig  3.63
|Rz|  2.65cm
Reduction
S: 16%
B: 97.8%
Fsig  3.63
|Rz|  2.65cm
Rxy<Pchi
Liliana Teodorescu, Brunel University
17
Data samples: S/N =0.25, 1, 5
10 functions
S/N = 0.25
S/N = 5
S/N = 1
Liliana Teodorescu, Brunel University
18
Data sample: S/N =0.25
36 mathematical functions
No. of genes = 1, Head length =10
Classification Accuracy = 95.00%
Liliana Teodorescu, Brunel University
19
Data sample: S/N =0.25
36 functions
Classification Accuracy  95%
Liliana Teodorescu, Brunel University
20
GEP allows
 fast identification of powerful cuts
 signal/background separation of 92-95% accuracy
for samples with S/N = 0.25, 1, 5
 potential of discovering new correlations between
variables
 large number of selection functions does not improve
the classification accuracy
GEP
 is still in the R&D phase
 needs software development -> underway
Liliana Teodorescu, Brunel University