Framework Unifying Association
Rule Mining, Clustering and
Classification
Anne Denton and William Perrizo
Dept of Computer Science North Dakota
State University
Introduction

The fundamental concept of a partition links almost all knowledge
discovery and data mining.
– Such fundamental and unifying concepts are very important since there is
such a wide variety of problem domains covered under the general
headings of knowledge discovery and data mining.

The concept of a relation is also at the center of our model.
– The concept of an equivalence relation is central to the understanding of
data patterns through similarity partitioning.
• glues objects together
• reflexive, symmetric and transitive
– The concept of a comparison relation (partial order relation or hierarchy)
is central to distinguishing similarity patterns.
• distinguishes objects
• irreflexive and transitive
Mathematical Foundations of Data Mining:
Relation







        .--> Equivalence Relation <--.
        |                            |
        v                            v
     Function              Closed Undirected Graph
        ^                            ^
        |                            |
        '--------> Partition <-------'

        .--> Partial Order Relation <--.
        |                              |
        '--> Directed Acyclic Graph <--'
The RELATION, and the restricted notions of equivalence relations
(partitions) and order relations (Partially Ordered Sets or POSets),
are key to Database and Data Mining.
The SELF-RELATION on a set S is a relation on {S,S'} where S' is an
alias for S.
Truly, the Relational Model is ubiquitous in DB and DM.
There is no other! Ted Codd was right on the money!
Relations

A relation, R(A1,…,An) with Dom(Ai) = Xi, is the f⁻¹(1)-component
of the pre-image partition generated by a function f: X1 × … × Xn → {0,1}
which assigns 1 if the tuple "exists in the relation" and 0 if it
"does not exist in the relation".
– Function pre-images, partitions and equivalence relations are pair-wise
dual concepts.
• We partition the full Cartesian product of the attribute domains into two
components (the pre-images under f, above) whenever we define a relation.
• Data mining and database querying are a matter of describing the
non-randomness of that partition boundary (if it is non-random).
• Clearly, if f is identically 1, the relation is the entire Cartesian product and
there is no boundary (one extreme).
• At the other extreme, f is the characteristic function of a singleton set and
there is a clear boundary (clear non-randomness). (Data mining in this case
is query processing.)
• Data mining can be viewed as finding and describing the non-randomness
of that boundary.
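
As a concrete illustration (a minimal Python sketch; the domains, tuples and names here are assumed, not from the slides), a relation is exactly the f⁻¹(1) component of the pre-image partition of the full Cartesian product:

    from itertools import product

    X1 = ["red", "green"]           # Dom(A1)
    X2 = [0, 1, 2]                  # Dom(A2)

    R = {("red", 0), ("green", 2)}  # tuples that "exist in the relation"

    def f(t):
        # characteristic function: 1 if the tuple is in R, else 0
        return 1 if t in R else 0

    # Partition the full Cartesian product into the two pre-images under f.
    pre_image = {0: [], 1: []}
    for t in product(X1, X2):
        pre_image[f(t)].append(t)

    assert set(pre_image[1]) == R   # the relation is exactly f^-1(1)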
Partitions
The fundamental concept of a partition links almost all knowledge
discovery and data mining.
Such fundamental and unifying concepts are very important since there is
such a wide variety of problem domains covered under the general
headings of knowledge discovery and data mining. For instance, a data
store that tries to analyze shopping behavior would not benefit much from
a machine learning algorithm that allows prediction of one quantity as a
function of some number of other variables. Yet, such an algorithm may
be precisely the right tool for an agricultural producer who wants to
predict yield from the nitrogen and moisture values in his field. We will
show that both problems, and their solutions, can be described in the
framework of partitions and generalized database operations.
A LABELED_PARTITION is a partition in which every component is
assigned a label (from some label space, L).
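
For instance (a minimal sketch with assumed data, not from the slides), a labeled partition is a family of disjoint, covering components, each carrying a label:

    # LABELED_PARTITION of S with labels from L = {"low", "high"}.
    S = {1, 2, 3, 4, 5, 6}
    labeled_partition = {
        "low":  {1, 2, 3},
        "high": {4, 5, 6},
    }

    # Sanity checks: components are pairwise disjoint and cover S.
    components = list(labeled_partition.values())
    assert all(a.isdisjoint(b) for i, a in enumerate(components)
               for b in components[i + 1:])
    assert set().union(*components) == S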
Graphs

UNDIRECTED GRAPH G=(N,E); N=node set; E=edge set (edge=pair of nodes)
(Directed Graph is just an undirected graph in which each edge is an ordered pair of nodes)

SUBGRAPH, G'=(N',E') of G is a graph : N' subset of N and E' subset of E.

GRAPH-PARTITION of G is a set of subgraphs, {G1..Gn}, : their edge sets,
{E1..En}, forms a partition of E.

PATH in G is a subgraph, P=(N',E') such that there exists an ordering of N',
(n1..nk), such that {(n1,n2)..(nk-1,nk)} is a subset of E'

PATH-CONNECTED GRAPH (or just CONNECTED) is a graph : for every
pair of nodes (ni,nj) there is a path in G connecting ni to nj.

CONNECTIVITY PARTITION is the partition into path-components

CLOSED GRAPH is a graph : {n1,n2}, {n2,n3} edges ==> {n1,n3} is an edge.
Equivalently, if there is a path from n1 to n2, there is an edge (n1,n2). (Note: this
provides the transitive property to the induced equivalence relation.)
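
A minimal sketch of the CONNECTIVITY PARTITION (assumed helper code, not from the slides), computing the path-components of an undirected graph by breadth-first search:

    from collections import deque

    def connectivity_partition(nodes, edges):
        # Return the partition of the node set into path-components.
        adj = {n: set() for n in nodes}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        seen, components = set(), []
        for start in nodes:
            if start in seen:
                continue
            comp, queue = set(), deque([start])
            while queue:
                n = queue.popleft()
                if n in comp:
                    continue
                comp.add(n)
                queue.extend(adj[n] - comp)
            seen |= comp
            components.append(comp)
        return components

    # Two path-components: {1, 2, 3} and {4, 5}.
    print(connectivity_partition([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))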
Directed Graph

DIRECTED GRAPH: (digraph) G=(N,E) is : N is a set (of nodes).
– E is a set of ordered pairs of nodes.
– A Digraph induces canonical graph by unordering each ordered pair in E.

DIRECTED SUBGRAPH, G'=(N',E') of a Directed Graph, G, is such that N' is a
subset of N and E' is a subset of E.

PATH in a Directed Graph, G, is a subgraph, P=(N',E') : there exists an ordering
of N', (n1..nk), such that {(n1,n2)..(nk-1,nk)} = E'
– TRIVIAL if ni=nj for all i, j=1,...,k
– SIMPLE if all ni are distinct, except possibly n1 and/or nk.
– CYCLE is a simple nontrivial path with n1=nk (a simple closed path).
– MINIMAL if, for any cycle nodes ni, nj, whenever (ni,nj) is an edge it is in the cycle.
DIRECTED ACYCLIC GRAPH (DAG) is a digraph containing no cycles.
– SOURCE = node with outgoing but no incoming edges.
– SINK = node with incoming but no outgoing edges.
– ROOTED DAG = dag with unique source.

TRANSITIVE CLOSURE: if G=(N,E) is a digraph and G+=(N,E') is such that (a,b) is in E'
iff there is a nontrivial path from a to b in G, then G+ is the transitive closure of G.
– For every G there is one and only one transitive closure.
• Transitive closure: put (ni,nj) in E' for every non-trivial path in G from ni to nj.
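
A minimal sketch of the TRANSITIVE CLOSURE (assumed implementation, not from the slides), in the style of Warshall's algorithm: add (a,b) whenever some nontrivial path leads from a to b.

    def transitive_closure(nodes, edges):
        closure = set(edges)
        for k in nodes:                # allow k as an intermediate node
            for a in nodes:
                for b in nodes:
                    if (a, k) in closure and (k, b) in closure:
                        closure.add((a, b))
        return closure

    # The path 1 -> 2 -> 3 induces the closure edge (1, 3).
    print(transitive_closure([1, 2, 3], {(1, 2), (2, 3)}))
    # {(1, 2), (2, 3), (1, 3)}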
PARTIAL ORDER RELATIONS

PARTIAL_ORDER is a binary self-relation, R, on S such that
– (s,s) is not in R for any s in S (irreflexive)
– if (s1,s2) is in R then (s2,s1) is not in R (anti-symmetric)
– if (s1,s2) and (s2,s3) are in R then (s1,s3) is in R (transitive)

LINEAR ORDER, R, on S is a partial order such that for every s1 not equal to s2
in S, either (s1,s2) or (s2,s1) is in R (of course never both).

LATTICE ORDER, R, on S is a partial order on S (and (S,R) is a LATTICE) if
for every s1 not equal to s2 in S there is an s3 in S such that (s1,s3) is in R and
(s2,s3) is in R (every pair has an upper bound).

PARTITION LATTICE of S is the lattice ordering of all partitions of S under the
ordering of sub-partitions, where a sub-partition, Q, of a partition, P, is such that
every component of Q is a subset of a component of P (and of course of only one).

CONCEPT HIERARCHY on an attribute is the Partition Lattice of that attribute
– the mother of all concept hierarchies for that attribute, since any user-defined
concept hierarchy based on some domain knowledge is a sub-lattice of this Mother Lattice.
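
A minimal sketch of the sub-partition ordering behind the PARTITION LATTICE (assumed helper code, not from the slides):

    def is_sub_partition(Q, P):
        # True iff every component of Q is a subset of some component of P.
        return all(any(q <= p for p in P) for q in Q)

    P = [{1, 2}, {3, 4}]           # coarser partition of {1, 2, 3, 4}
    Q = [{1}, {2}, {3, 4}]         # finer partition: refines P

    print(is_sub_partition(Q, P))  # True:  Q is below P in the lattice
    print(is_sub_partition(P, Q))  # False: P is not below Q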
DUALITIES on a set S:
EQUIVALENCE_RELATION <-> PARTITION <-> FUNCTION <-> CLOSED_UNDIRECTED_GRAPH

The LABELED_EQUIVALENCE_RELATION induces the canonical
LABELED_PARTITION of equivalence components (a, b belong to the same
component iff they are equivalent).

The LABELED_PARTITION, P={C1..Cn}, induces the canonical
FUNCTION, g: S → {C1, …, Cn}, by g(s)=Ci iff s is in Ci (letting Ci be both
the partition component and the label (name) assigned to that component).

The FUNCTION, g: S → L, induces the canonical CLOSED
UNDIRECTED GRAPH on S with edge set = {(s1, s2) | g(s1)=g(s2)}.

The CLOSED UNDIRECTED GRAPH induces the canonical LABELED
EQUIVALENCE RELATION by s1, s2 are equivalent iff there is a path
connecting them.
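
A minimal sketch of the duality cycle (assumed example data, not from the slides): a function induces a closed undirected graph, whose path-components are exactly the pre-image partition, i.e. the equivalence classes:

    from itertools import combinations

    S = [1, 2, 3, 4, 5]

    def g(s):                       # FUNCTION g: S -> {0, 1}
        return s % 2

    # Canonical closed undirected graph: edge (s1, s2) iff g(s1) = g(s2).
    edges = {(a, b) for a, b in combinations(S, 2) if g(a) == g(b)}

    # Its path-components are the pre-images of g = the equivalence classes.
    classes = {}
    for s in S:
        classes.setdefault(g(s), set()).add(s)

    print(sorted(classes.values(), key=min))   # [{1, 3, 5}, {2, 4}]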
Another DUALITY:
Partially Ordered Set <-> Directed Acyclic Graph

The POSet, L = (S, <), can be viewed as a Directed Acyclic Graph (DAG),
G = (N,E), where N=S and (a,b) in E iff a < b.

A DAG, G, can be viewed as a POSet, L, where S=N and a < b iff a ≠ b and
(a,b) is in Closure(E).
– Often only non-transitive edges are included, i.e., if a < b and b < c we don't
include the edge (a,c), since the additional edges clutter the picture.
– Given a POSet L, one can diagram it as a DAG using the duality just described.
– However, since a diagram is intended to help visualization, and therefore
should not be cluttered, we generally do not include all edges (we don't display
the closure; in fact we usually display the minimal DAG that corresponds to the
POSet).
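
A minimal sketch of diagramming a POSet as its minimal DAG (assumed helper code, not from the slides), dropping every transitively implied edge:

    def minimal_dag(nodes, lt):
        # Keep (a, b) only if no c lies strictly between a and b.
        edges = {(a, b) for a in nodes for b in nodes if lt(a, b)}
        return {(a, b) for (a, b) in edges
                if not any(lt(a, c) and lt(c, b) for c in nodes)}

    # Divisibility order on {1, 2, 4, 8}: only covering edges remain.
    print(sorted(minimal_dag([1, 2, 4, 8],
                             lambda a, b: a != b and b % a == 0)))
    # [(1, 2), (2, 4), (4, 8)]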
Cartesian Products, Star schemas, Histograms

In a RELATION, R(S1,..,Sk,A1,..,An), the S's are dimension (or structure) attributes
and the A's are feature attributes; R is a subset of the CARTESIAN PRODUCT of all
the attribute domains.

The STAR SCHEMA is the normalization of R into the dimension files
S1(A1,1..A1,n1), S2(A2,1..A2,n2), ..., Sk(Ak,1..Ak,nk)
all linked to the central fact file
C(S1..Sk, A(k+1),1..A(k+1),n(k+1))
– where {A1,1..A1,n1, A2,1..A2,n2, ..., Ak,1..Ak,nk, A(k+1),1..A(k+1),n(k+1)} is a
partition of {A1..An} into dimension features and cube measurements or features.

A HISTOGRAM is the rollup of the Cartesian product along the structure attributes
(using some aggregate, usually count) and, possibly, projected onto selected Ai's.
For R(S1, S2, A1) the Cartesian product is:
[Figure: the full Cartesian product Dom(S1) x Dom(S2) x Dom(A1), drawn as an
8 x 8 x 8 cube with axes S1,0..S1,7, S2,0..S2,7 and A1,0..A1,7.]
The Star Schema
(A1 as a fact measurement or feature attribute):
[Figure: the fact file drawn as an 8 x 8 grid over S1,0..S1,7 and S2,0..S2,7,
with each cell holding its A1-value.]
Star Schema (A1 is a Feature Attribute of S2): (from 1-NF to 3-NF)
[Figure: the fact file drawn as an 8 x 8 grid over S1,0..S1,7 and S2,0..S2,7,
with each cell holding a 0/1-value ("relationship existence").]
S2-dimension file:
 S2 | A1
----+----
  0 |  4
  1 |  2
  2 |  7
  3 |  1
  4 |  1
  5 |  2
  6 |  1
  7 |  6
A Histogram Cube counts the number of facts for each value of a cube of feature
attributes (of the fact cube and/or dimension files).
In this case we have just one such feature attribute, namely A1, so it is a 1-D cube:
[Figure: the 1-D histogram cube over A1,0..A1,7; each cell holds the count of
facts with that A1-value, e.g. the count of facts showing A1=0.]
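
A minimal sketch of the histogram rollup (using the S2-dimension file above; illustration only, not from the slides):

    from collections import Counter

    # (S2, A1) pairs from the S2-dimension file.
    facts = [(0, 4), (1, 2), (2, 7), (3, 1), (4, 1), (5, 2), (6, 1), (7, 6)]

    # Roll up along the structure attribute: count facts per A1-value.
    histogram = Counter(a1 for _, a1 in facts)
    print(sorted(histogram.items()))
    # [(1, 3), (2, 2), (4, 1), (6, 1), (7, 1)]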
Classification (supervised learning)

The set of all partitions, under the sub-partition ordering, forms a lattice.

CLASSIFICATION is choosing a good partition level in the Partition Lattice
of R.

EAGER CLASSIFIER (e.g., a decision tree) is selecting from the Partition
Lattice of T(A1..An,C) (the training set) a closed-form partition in which
Class-Count-Histograms (CCHs) are sufficiently discriminatory in terms of picking
a winner (i.e., the maximal class is sufficiently more populous than the next
highest).

LAZY CLASSIFIERS (e.g., K-Nearest-Neighbor) focus locally around the
unclassified sample, looking for a "neighborhood" (one partition component) in
which the CCH is sufficiently discriminatory (then discarding that information).
– If the locality is taken as the entire training set (e.g., methods where every training
point is a neighbor of some degree, depending on distance from the unclassified
sample), then the difference between lazy and eager goes away.
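
A minimal sketch of a lazy classifier (assumed implementation, not from the slides): build the Class-Count-Histogram of the neighborhood around the unclassified sample and pick the most populous class:

    from collections import Counter

    def knn_classify(training, x, k=3):
        # training: list of (point, class); x: unclassified sample.
        neighbors = sorted(training, key=lambda pc: abs(pc[0] - x))[:k]
        cch = Counter(c for _, c in neighbors)   # the local CCH
        return cch.most_common(1)[0][0]          # the "winner" class

    training = [(1.0, "a"), (1.5, "a"), (4.0, "b"), (4.2, "b"), (5.0, "b")]
    print(knn_classify(training, 1.2))   # "a"
    print(knn_classify(training, 4.5))   # "b"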
Clustering (unsupervised learning)

Clustering is choosing a partition from the lattice of all
partitions.
– Using a similarity matrix
– Using a distance function
• Actually a pseudo-metric is sufficient
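
A minimal sketch of clustering as choosing a partition (assumed single-link-style code, not from the slides): link points whose distance falls below a threshold and take the resulting path-components:

    def threshold_clusters(points, dist, eps):
        clusters = []
        for p in points:
            linked = [c for c in clusters if any(dist(p, q) < eps for q in c)]
            merged = {p}.union(*linked) if linked else {p}
            clusters = [c for c in clusters if c not in linked] + [merged]
        return clusters

    points = [0.0, 0.2, 0.3, 5.0, 5.1]
    print(threshold_clusters(points, lambda a, b: abs(a - b), eps=0.5))
    # [{0.0, 0.2, 0.3}, {5.0, 5.1}]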
Association Rule Mining

How does ARM relate to Machine Learning (Classification and Clustering)?

Given a lattice universe, (U, <), a TransactionSet, T, which is a subset of U, a "size" function,
s: (U,<) → Z, which is monotone increasing, a "support" function, supp: (U,<) → Z, which is monotone
decreasing, and a support threshold, minsupp in Z, a u in U is FREQUENT iff supp(u) > minsupp.
APRIORI-type algorithms solve two problems:

"Find all FREQUENT subsets of U" (exhaustive search is too complex, since the solution space is 2^U)
– APRIORI prunes the search of the solution space using the fact:
"If u1 is infrequent and u1 < u2, then u2 is infrequent." (infrequency pushes up the lattice)
– Another useful fact (when searching for infrequent sets?) is:
"If u2 is frequent and u1 < u2, then u1 is frequent." (frequency pushes down the lattice)

The second question answered by APRIORI is:
"Find all strong pairs in U2 = {(a,u) in UxU | a subset of u subset of U}" (antecedent and
consequent disjoint, in the case of set containment), where "strong" means conf(a => u-a)
> minconf, and conf: U2 → Z is monotone on antecedents, a:
a' < a => conf(a => u-a) < conf(a' => u-a').
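
A minimal sketch of an APRIORI-style search (assumed implementation, not from the slides), using the pruning fact that infrequency pushes up the lattice:

    from itertools import combinations

    def apriori(transactions, minsupp):
        items = sorted({i for t in transactions for i in t})
        supp = lambda c: sum(1 for t in transactions if c <= t)
        frequent, level = {}, [frozenset([i]) for i in items]
        while level:
            level = [c for c in level if supp(c) >= minsupp]
            frequent.update({c: supp(c) for c in level})
            # Join level-k sets, then prune any candidate that has an
            # infrequent (k-1)-subset: supersets of infrequent sets
            # can never be frequent.
            candidates = {a | b for a, b in combinations(level, 2)
                          if len(a | b) == len(a) + 1}
            level = [c for c in candidates
                     if all(frozenset(s) in frequent
                            for s in combinations(c, len(c) - 1))]
        return frequent

    T = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
    print(apriori(T, minsupp=2))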