Data Mining
Concept Animation Library
By
Nisarg Shah
Project Advisor
Dr. Meiliu Lu
Department of Computer Science
California State University, Sacramento
Spring 2008
Agenda
- Motivation
- Scope of the Project
- Background Knowledge
- Approach
- Implementation Details
- Demo
- Lessons Learned / Reinforced
- Future Work
- References
- Q&A / Feedback
3/10/2008
Data Mining Concept Animation
Library
Motivation
- CSc177 – Data Mining courseware
  - Another student's work in CSc212 courseware [1]
  - Something that helps students learn the material better
- Idea from another Master's project
  - Operating System Concept Animation Library [2]
- Own experience
  - Having gone through the pain of understanding Data Mining algorithms myself
  - Not much graphical and interactive material available elsewhere
Scope of the Project
- Data Mining Concept Animation Library
  - A collection of Data Mining algorithms with a graphical and interactive user interface
  - Basic idea: students can learn, understand, and compare different algorithms
  - There are many algorithms, and it is impossible to cover all of them
  - Just a start …
    - Apriori algorithm
    - Frequent Pattern (FP) Growth algorithm
Background Knowledge
Apriori Algorithm
- Apriori pruning principle:
  - If any itemset is infrequent, its supersets should not be generated or tested!
- Method:
  - Initially, scan the DB once to get the frequent 1-itemsets
  - Generate length-(k+1) candidate itemsets from the length-k frequent itemsets
    - Step 1: self-joining Lk
    - Step 2: pruning
  - Test the candidates against the DB
  - Terminate when no frequent or candidate set can be generated
Background Knowledge (2)
Apriori Example
Database (Supmin = 2)
  Tid   Items
  10    A, C, D
  20    B, C, E
  30    A, B, C, E
  40    B, E
1st scan: C1
  Itemset   sup
  {A}       2
  {B}       3
  {C}       3
  {D}       1
  {E}       3
L1
  Itemset   sup
  {A}       2
  {B}       3
  {C}       3
  {E}       3
C2 (generated from L1)
  Itemset
  {A, B}
  {A, C}
  {A, E}
  {B, C}
  {B, E}
  {C, E}
2nd scan: C2 with support counts
  Itemset   sup
  {A, B}    1
  {A, C}    2
  {A, E}    1
  {B, C}    2
  {B, E}    3
  {C, E}    2
L2
  Itemset   sup
  {A, C}    2
  {B, C}    2
  {B, E}    3
  {C, E}    2
C3 (generated from L2)
  Itemset
  {B, C, E}
3rd scan: L3
  Itemset     sup
  {B, C, E}   2
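The level-wise loop traced in the example above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's Java applet code: the function name `apriori`, the use of `frozenset` keys, and the union-based form of the self-join are assumptions made for brevity. After pruning, the union-based join yields the same candidates as the usual prefix-based join.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for every frequent itemset, level by level."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan: count the transactions that contain each candidate.
        return {c: sum(c <= t for t in transactions) for c in candidates}

    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}  # C1
    frequent = {}
    k = 1
    while candidates:
        # Lk: candidates whose support reaches the minimum.
        level = {c: s for c, s in count(candidates).items() if s >= min_sup}
        frequent.update(level)
        # Step 1 (self-join): union pairs of Lk that give a (k+1)-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Step 2 (prune): keep a candidate only if all its k-subsets are in Lk.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent


# The four-transaction database from the example, with Supmin = 2.
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(db, min_sup=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

Run on the example database, the sketch reports the same nine frequent itemsets as the tables L1, L2, and L3, ending with {B, C, E} at support 2.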
Background Knowledge (3)
Frequent Pattern Growth Algorithm
- Mining Frequent Patterns Without Candidate Generation
- Two-step method
  - Construct FP-tree from a Transaction Database
    - Scan DB once, find frequent 1-itemsets (single item patterns)
    - Sort frequent items in frequency descending order (the f-list)
    - Scan DB again, construct FP-tree
  - Find Patterns Having P From P-conditional Database
    - Start at the frequent item header table in the FP-tree
    - Traverse the FP-tree by following the link of each frequent item p
    - Accumulate all transformed prefix paths of item p to form p's conditional pattern base
Background Knowledge (4)
FP Growth Example – Step 1
Construct FP-tree from a Transaction Database
min_support = 3
  TID   Items bought                (ordered) frequent items
  100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
  200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
  300   {b, f, h, j, o, w}          {f, b}
  400   {b, c, k, s, p}             {c, b, p}
  500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}
Header Table (the head of each entry links to that item's nodes in the tree)
  Item   frequency
  f      4
  c      4
  a      3
  b      3
  m      3
  p      3
F-list = f-c-a-b-m-p
FP-tree (indentation shows parent/child)
  {}
    f:4
      c:3
        a:3
          m:2
            p:2
          b:1
            m:1
      b:1
    c:1
      b:1
        p:1
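The two database scans of Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the project's Java code: the names `Node`, `build_fp_tree`, and `show` are hypothetical, transactions are written as strings of single-character items, and the optional `f_list` argument exists only so the tie order used on the slide (f before c, then a-b-m-p) can be reproduced, since frequency alone does not fix it.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support, f_list=None):
    # Scan 1: count each item once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    if f_list is None:
        # Frequency-descending order; the alphabetical tie-break is arbitrary.
        frequent = [i for i in counts if counts[i] >= min_support]
        f_list = sorted(frequent, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    root = Node(None, None)
    header = {item: [] for item in f_list}  # header table: item -> its nodes
    # Scan 2: insert each transaction's frequent items in f-list order.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, f_list


def show(node, depth=0):
    """Print the tree with two spaces of indentation per level."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


# The five transactions from the example, min_support = 3.
db = ["facdgimp", "abcflmo", "bfhjow", "bcksp", "afcelpmn"]
root, header, f_list = build_fp_tree(db, min_support=3, f_list=list("fcabmp"))
show(root)
```

With the slide's f-list passed in, the printed tree has the same nodes and counts as the FP-tree in the example. Omitting `f_list` gives a valid but differently shaped tree, because ties are then broken alphabetically.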
Background Knowledge (5)
FP Growth Example – Step 2
Find Patterns Having P From P-conditional Database
Header Table (item:frequency): f:4, c:4, a:3, b:3, m:3, p:3
FP-tree (indentation shows parent/child)
  {}
    f:4
      c:3
        a:3
          m:2
            p:2
          b:1
            m:1
      b:1
    c:1
      b:1
        p:1
Conditional pattern bases
  item   cond. pattern base
  c      f:3
  a      fc:3
  b      fca:1, f:1, c:1
  m      fca:2, fcab:1
  p      fcam:2, cb:1
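Collecting the conditional pattern bases from the tree can be sketched as below. This is an illustrative Python sketch, not the project's implementation: `Node`, `build_tree`, and `conditional_pattern_base` are hypothetical names, and the input is assumed to be the already ordered frequent-item transactions from Step 1, written as strings of single-character items.

```python
class Node:
    """FP-tree node with a parent link so prefix paths can be walked upward."""

    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(ordered_transactions):
    """Build an FP-tree from transactions already sorted in f-list order."""
    root, header = Node(None, None), {}
    for t in ordered_transactions:
        node = root
        for item in t:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header


def conditional_pattern_base(header, item):
    """Map each prefix path of `item` to the count of the `item` node below it."""
    base = {}
    for node in header.get(item, []):  # follow the node-links of `item`
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:  # climb to root
            path.append(parent.item)
            parent = parent.parent
        if path:  # a node directly under the root has no prefix path
            prefix = "".join(reversed(path))
            base[prefix] = base.get(prefix, 0) + node.count
    return base


# Ordered frequent items of transactions 100-500 from Step 1.
ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
_, header = build_tree(ordered)
for item in "cabmp":
    print(item, conditional_pattern_base(header, item))
```

For the example data this reproduces the table above, for instance `fcam:2, cb:1` for item p.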
Approach
- Gathering requirements
  - In-depth understanding of the two algorithms
  - Input from students/professor (couldn't really do this in detail)
  - Choosing the appropriate tools for implementation
- Feasibility test
  - Starting point – Algorithm pseudo code
  - Console based application
  - Dynamic algorithms in nature
    - With configurable files for input parameters and transaction details
Approach (2)
- GUI Application
  - Choosing the right technology
  - Choosing the right components
  - Setting limitations (max items, max transactions)
  - Simple and consistent background
  - Multiple algorithms under different tabs
    - Users can compare the outputs of the algorithms against each other
Implementation Details
- Why Applets?
  - An applet is a program written in the Java programming language that can be included in an HTML page
  - Works well with Java technology-enabled web browsers
  - It can run at a speed comparable to other compiled languages such as C++, and many times faster than JavaScript
  - It can move work from the server to the client, making a web solution more scalable with the number of users/clients
Implementation Details (2)
- Problem - Counting support of Candidates
  - How many times does a particular candidate itemset (of any length) appear in the transaction table?
- Solution
  - All possible candidate itemsets are generated and counted only once, at the beginning
  - Candidate itemsets are stored in a hash table as (key, val) pairs
    - Key = candidate itemset; val = count
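One way to read the solution above is sketched here in Python. The slide does not say how the itemsets are enumerated, so this sketch assumes that every subset of every transaction is counted in a single pass; the name `count_all_itemsets` and the use of `collections.Counter` as the hash table are likewise assumptions, not the project's Java code.

```python
from collections import Counter
from itertools import combinations


def count_all_itemsets(transactions):
    """Count every itemset that occurs in at least one transaction, in one pass."""
    counts = Counter()  # hash table: key = itemset, val = support count
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return counts


# The Apriori example database.
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
support = count_all_itemsets(db)
print(support[frozenset({"B", "E"})])       # 3
print(support[frozenset({"A", "E"})])       # 1
print(support[frozenset({"A", "D", "E"})])  # 0: never occurs in any transaction
```

After this pass, looking up the support of any candidate is a single hash-table access. The number of subsets grows exponentially with transaction length, so the approach only suits small inputs such as a teaching tool with capped item and transaction counts.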
Implementation Details (3)
- Use of third party tool
- Combination generator [4]
  - Generates all possible combinations of a given size for a given itemset
  - Example: Given Itemset {A,B,C,D}
    - All possible 3-itemsets: {A,B,C}, {A,B,D}, {A,C,D}, {B,C,D}
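The behaviour described above can be illustrated with Python's standard `itertools.combinations`, used here as a stand-in for the Java combination generator the project relied on [4]. The helper name `k_itemsets` is hypothetical.

```python
from itertools import combinations


def k_itemsets(items, k):
    """Return all k-item combinations of `items`, in lexicographic order."""
    return list(combinations(sorted(items), k))


for itemset in k_itemsets({"A", "B", "C", "D"}, 3):
    print(itemset)
# ('A', 'B', 'C')
# ('A', 'B', 'D')
# ('A', 'C', 'D')
# ('B', 'C', 'D')
```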
Implementation Details (4)
- Choosing between components
  - Purpose: display and edit regular 2-D tables of cells
  - Options: JTable vs. JEditorPane
    - JTable: rendering problem (refreshing the screen on detecting table selections)
    - JEditorPane: used together with JCheckBox components
Implementation Details (5)
- Problem: How to display the FP-tree?
- Solution: JGraph
  - Takes the description of a graph as input, and produces a graph display on the standard output
Lessons Learned/Reinforced
- Don't procrastinate (especially if you're working full-time)
- Clear understanding of the Apriori & FP Growth algorithms
- Good programming experience with Java and Applets
- Demonstrations/graphical tools are useful for explaining concepts
- A simple project idea can be meaningful
Future Work
- Get feedback and recommendations from potential users (students & instructor)
- An open source library: Data Mining Concept Animation Library
- A comparison between multiple algorithms on the same set of data
- Potential idea for a course project or Bachelor's/Master's project
References
1. XML Data Representation and Transformations for Bioinformatics. http://athena.ecs.csus.edu/~woodsk/courseware/
2. Operating System Concept Animation Library. http://gaia.ecs.csus.edu/%7Ezhangd/oscal/oscal.htm
3. Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber.
4. Combination generator tool. http://www.merriampark.com/comb.htm
Q&A / Feedback
- Questions?
- Feedback
  - Useful or not?
  - Any recommendations / suggestions?
  - Which algorithms/concepts from the current material would you like to see implemented in such a tool?
Thank you!