Survey							
                            
		                
		                * Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Identifying Interesting Association
Rules with Genetic Algorithms
Elnaz Delpisheh
York University
Department of Computer Science and Engineering
May 22, 2017
Data mining
Too much
data
•I = {i1,i2,...,in} is a set of items.
•D = {t1,t2,...,tn} is a transactional database.
•ti is a nonempty subset of I.
•An association rule is of the form AB, where
A and B are the itemsets, A⊂ I, B⊂ I, and
A∩B=∅ .
•Apriori algorithm is mostly used for association
rule mining.
•{milk, eggs}{bread}.
2
Data
Data Mining
Association
rules
Apriori Algorithm
TID
3
List of item IDs
T100
I1,I2,I3
T200
I2, I4
T300
I2, I3
T400
I1,I2,I4
T500
I1, I3
T600
I2, I3
T700
I1, I3
T800
I1, I2, I3, I5
T900
I1, I2, I3
Apriori Algorithm (Cont.)
4
Association rule mining
Too much
data
Data
Data Mining
Too many
association
rules
Association
rules
5
Interestingness criteria
 Comprehensibility.
 Conciseness.
 Diversity.
 Generality.
 Novelty.
 Utility.
 ...
6
Interestingness measures
 Subjective measures
 Data and the user’s prior knowledge are considered.
 Comprehensibility, novelty, surprisingness, utility.
 Objective measures
 The structure of an association rule is considered.
 Conciseness, diversity, generality, peculiarity.
 Example: Support
 It represents the generality of a rule.
 It counts the number of transactions containing both A and B.
7
Drawbacks of objective measures
 Detabase-dependence
 Lack of knowledge about the database
 Threshold dependence
 Solution
 Multiple database reanalysis
 Problem
o Large number of disk I/O
Detabase-independence
8
Genetic algorithm-based learning
(ARMGA )
Initialize population
2. Evaluate individuals in population
3. Repeat until a stopping criteria is met
1.
Select individuals from the current population
B. Recombine them to obtain more individuals
C. Evaluate new individuals
D. Replace some or all the individuals of the current population
by off-springs
A.
4.
9
Return the best individual seen so far
ARMGA Modeling
 Given an association rule XY
 Requirement
 Conf(XY) > Supp(Y)
 Aim is to maximise
10
ARMGA Encoding
 Michigan Strategy
 Given an association k-rule XY, where X,Y⊂I, I is
a set of items I=i1,i2,..., in, and X∩Y= .
 For example
 {A1,...,Aj}{Aj+1,...,Ak}
11
ARMGA Encoding (Cont.)
 The aforementioned encoding highly depends on the length
of the chromosome.
 We use another type of encoding:
 Given a set of items {A,B,C,D,E,F}
 Association rule ACFB is encoded as follows
 00A11B00C01D11E00F
 00: Item is antecedent
 11: Item is consequence
 01/10: Item is absent
12
ARMGA Operators
 Select
 Crossover
 Mutation
13
ARMGA Operators-Select
 Select(c,ps): Acts as a filter of the chromosome
 C: Chromosome
 Ps: pre-specified probability
14
ARMGA Operators-Crossover
 This operation uses a two-point strategy
15
ARMGA Operators-Mutate
16
ARMGA Initialization
17
ARMGA Algorithm
18
Empirical studies and Evaluation
 Implement the entire procedure using Visual C++
 Use WEKA to produce interesting association rules
 Compare the results
19
20