Download Exploratory Mining and Pruning Optimization of Constrained

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Exploratory Mining and Pruning
Optimization of Constrained
Associations Rules
1998년 8월 7일
Data Engineering Lab
성 유진
1
Abstract
• Standpoint of supporting human-centered
discovery of Knowledge
– lack of user exploration and control
– lack of focus
– rigid notion of relationship
• Constrained association queries
– pruning using monotonicity, succinctness
1998년 8월 7일
Data Engineering Lab
성 유진
2
Introduction
• Problem1 (Lack of User Exploration and
Control)
– Mining Process => Black Box
– (user can’t preempt and needs to wait for hours)
– establish clear breakpoints to allow user feedback
• Problem2 (Lack of Focus)
– on which to focus the mining
 to find association between sets of items whose
types do not overlap
1998년 8월 7일
Data Engineering Lab
성 유진
3
 associations from item sets whose total price is at
least $1,000
– provide a rich interface for the user to express
focus (CAQ)
• Problem3 (Rigid notion of Relationship)
– significance metrics :
– separate criteria for selecting candidates for the
antecedent and consequent:
 association from items to sets of types
pepsi => snacks
1998년 8월 7일
Data Engineering Lab
성 유진
4
1998년 8월 7일
Data Engineering Lab
성 유진
5
Architecture
• Phase 1
– user initially specifies CAQ
• includes a set of constraints C
• C is applicable to the antecedent and consequent
– output:
• pairs of candidates(Sa, Sc)
• Sa, Sc have support over thresholds
– user can add, delete, of modify the constraints
as many times as desired
1998년 8월 7일
Data Engineering Lab
성 유진
6
• Phase 2
– significance metric
– a threshold for the metric
– whatever further conditions to be imposed ont
the antecedent and consequent
 classical association mining
- confidence (as significance metric)
- confidence threshold
- require ( SaSc) be frequent
1998년 8월 7일
Data Engineering Lab
성 유진
7
1998년 8월 7일
Data Engineering Lab
성 유진
8
Constrained Association Queries
• CAQ
– S  Item : S is a set variable on the Item
domain
– {(S1, S2) |C}, C is a set of constraints on S1, S2
– frequent constraints freq(Si)
– trans(TID, Itemset), iteminfo(Item, Type, Price)
– S.price  100 : all items in S are of price less
than of equal to $100
– {snacks, sodas}  S.Type
1998년 8월 7일
Data Engineering Lab
성 유진
9
• CAQ Examples
– {(S1, S2) | S1 Item & S2  Item & count(S1) = 1 & count(S2)
= 1 & freq(S1) & freq(S2)}
• S1.Type  S2.Type   and max(S1.Price)  avg(S2.Price)
– {(S1, S2) | agg1(S1.Price)  100 & agg2(S2.Price  1000}
– {(S1, S2) | S1.Type  {Snacks} & S2.Type  {beers} &
max(S1.Price)  min(S2.Price)
• Sound/Complete
– algorithm is sound if it only finds frequent sets that satisfy the
given constraints
– algorithm is complete if all frequent sets satisfying the given
constraints are found
1998년 8월 7일
Data Engineering Lab
성 유진
10
• Goal
– to push the constraints as deeply as possible
inside the computation of frequent set
– classical algorithm + test them for constraint
satisfaction => too inefficient
– sound/complete : anti-monotone, succinctness
1998년 8월 7일
Data Engineering Lab
성 유진
11
Anti-Monotone Constraints
• Find constraints which satisfy anti-monotone
– prune away a significant num of candidates
• Definition
– A 1-var constraint C is anti-monotone iff for all sets
S, S’:
• S  S’ & S satisfies C  S’ satisfies C
• Identify which constraints are anti-monotone
– Fig3
– min(S)  v (anti-monotone) , min(S)  v (not )
1998년 8월 7일
Data Engineering Lab
성 유진
12
1998년 8월 7일
Data Engineering Lab
성 유진
13
Succinct Constraints
• once-and-for-all (before any iteration takes place)
– not generate and test paradigm
– how to
• succinctness
• member generating functions
– definition
• SATc(Item) : the set of item sets satisfying C , pruned space
– C1  S.Price 100 , pruned space for C1 contains only item sets
such that each item in the set has a price at least $100
• selection predicate, p
1998년 8월 7일
Data Engineering Lab
성 유진
14
1998년 8월 7일
Data Engineering Lab
성 유진
15
Example
 C1  S.Price 100 , let Item1 = price 100 (Item):
 C1 is succinct because its pruned space SATc1(Item)
is simply 2item1
 C2  {snacks, sodas}  S.Type : Let Item2, Item3 , Item4
be the sets type = ‘snacks’(Item), type = ‘sodas’(Item) , type  ‘snacks’ 
type  ‘sodas’
(Item)
 C2 is succint SATC2(Item) can be expressed as
2item - 2item2 - 2item3 - 2item4 - 2item2  item4 - 2item3  item4
1998년 8월 7일
Data Engineering Lab
성 유진
16
 Example
C1  S.Price 100, MGF = {X |X  Item1 & C   }
C2  {snacks, sodas}  S.Type, MGF = {X1 X2  X3|
X1 Item2 & X1   & X2 Item3 & X2   & X3 Item4}
1998년 8월 7일
Data Engineering Lab
성 유진
17
Algorithms
• Algorithm Apriori+
– computes the frequent set => among frequent set, those
which satisfy constraints become answer set
• Algorithm Hybrid(m)
– in case (C - Cfreq ) is more selective , apriori+ is
inefficient
– First check Cfreq for m iterations
– to reduce the remaining I/O cost, it switches to
checking (C- Cfreq)
1998년 8월 7일
Data Engineering Lab
성 유진
18
1998년 8월 7일
Data Engineering Lab
성 유진
19
CAP algorithm
• 4 Cases
 succinct and Anti-monotone
– Replace C1 in the Apriori Algorithm by C1c
 succinct but not anti-monotone
1998년 8월 7일
Data Engineering Lab
성 유진
20
 Anti-monotone but Non-succinct
– Define Ck as in apriori algorithm, drop the candidates S if S fails C
– constraint satisfaction is tested before counting is done
 neither
–
Induce any weaker constraint C’ from C, depending on whether
C’ is anti-monotone and /or sucinct, use the above strategies
– Once all frequent sets are generated, test them for satisfaction of C
1998년 8월 7일
Data Engineering Lab
성 유진
21
1998년 8월 7일
Data Engineering Lab
성 유진
22