An Excel-based Data Mining Tool
Chapter 4
4.1 The iData Analyzer
[Figure: iDA system architecture. Components: Interface, Data PreProcessor, Heuristic Agent (used for large datasets), mining technique (Neural Networks or ESX), optional Explanation and rule generation (RuleMaker), and Report Generator, which writes its output to Excel sheets.]
4.2 ESX: A Multipurpose Tool for Data Mining
ESX
• Supports supervised learning and unsupervised clustering
• Does not make statistical assumptions
• Deals with missing attribute values
• Can be applied to categorical and numerical data
• Points out inconsistencies and unusual values
• For supervised classification, ESX can determine the instances and attributes best able to classify new instances
• For unsupervised clustering, ESX incorporates a globally optimizing evaluation function that encourages the best possible clustering of instances
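[Figure: ESX three-level concept hierarchy. Root Level: a single Root node. Concept Level: concepts C1, C2, …, Cn. Instance Level: the instances of each concept, I11, I12, …, I1j under C1; I21, I22, …, I2k under C2; …; In1, In2, …, Inl under Cn.]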
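The three-level hierarchy in the figure can be modeled directly as a small tree. The sketch below only stores the structure; the class names `Root`, `Concept`, and `Instance` are hypothetical, and it does not reproduce how ESX actually forms concepts or evaluates a clustering.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Instance level: one row of data (attribute name -> value)."""
    values: dict


@dataclass
class Concept:
    """Concept level: a class or cluster holding its instances."""
    name: str
    instances: list = field(default_factory=list)


@dataclass
class Root:
    """Root level: the single node above all concepts (hypothetical model)."""
    concepts: dict = field(default_factory=dict)

    def add(self, concept_name: str, instance: Instance) -> None:
        # Create the concept on first use, then attach the instance beneath it.
        concept = self.concepts.setdefault(concept_name, Concept(concept_name))
        concept.instances.append(instance)

    def sizes(self) -> dict:
        # Number of instances stored under each concept.
        return {name: len(c.instances) for name, c in self.concepts.items()}


root = Root()
root.add("C1", Instance({"Sex": "Male", "Age": 45}))
root.add("C1", Instance({"Sex": "Male", "Age": 42}))
root.add("C2", Instance({"Sex": "Female", "Age": 40}))
print(root.sizes())  # {'C1': 2, 'C2': 1}
```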
4.3 iDAV Format for Data Mining
Table 4.1 • Credit Card Promotion Database: iDAV Format

Income   Magazine   Watch      Life Insurance  Credit Card  Sex     Age
Range    Promotion  Promotion  Promotion       Insurance
C        C          C          C               C            C       R
I        I          I          I               I            I       I
40–50K   Yes        No         No              No           Male    45
30–40K   Yes        Yes        Yes             No           Female  40
40–50K   No         No         No              No           Male    42
30–40K   Yes        Yes        Yes             Yes          Male    43
50–60K   Yes        No         Yes             No           Female  38
20–30K   No         No         No              No           Female  55
30–40K   Yes        No         Yes             Yes          Male    35
20–30K   No         Yes        No              No           Male    27
30–40K   Yes        No         No              No           Male    43
30–40K   Yes        Yes        Yes             No           Female  41
40–50K   No         Yes        Yes             No           Female  43
20–30K   No         Yes        Yes             No           Male    29
50–60K   Yes        Yes        Yes             No           Female  39
40–50K   No         Yes        No              No           Male    55
20–30K   No         No         Yes             Yes          Female  19

The line of C/R codes gives each attribute's type (C = categorical, R = real-valued), and the line of I codes gives its usage (I = input; see Table 4.2).
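A file in this layout can be read with a few lines of code. The sketch below assumes the data has been saved as CSV with attribute names in the first row, type codes in the second, usage codes in the third, and one instance per later row; the function name `read_idav` is hypothetical and this is not iDA's own loader.

```python
import csv
import io


def read_idav(text: str):
    """Parse iDAV-style CSV text (hypothetical helper, assumed layout).

    Row 1: attribute names; row 2: type codes (C or R);
    row 3: usage codes (I, U, D, O); remaining rows: instances.
    """
    rows = list(csv.reader(io.StringIO(text)))
    names, types, usages = rows[0], rows[1], rows[2]
    attributes = {
        name: {"type": t.strip(), "usage": u.strip()}
        for name, t, u in zip(names, types, usages)
    }
    instances = []
    for row in rows[3:]:
        if not row:
            continue  # skip blank lines
        record = {}
        for name, raw in zip(names, row):
            raw = raw.strip()
            if raw == "":
                record[name] = None  # treat an empty cell as a missing value
            elif attributes[name]["type"] == "R":
                record[name] = float(raw)  # real-valued attribute
            else:
                record[name] = raw  # categorical attribute kept as text
        instances.append(record)
    return attributes, instances


SAMPLE = """Income Range,Magazine Promotion,Sex,Age
C,C,C,R
I,I,I,I
40-50K,Yes,Male,45
30-40K,Yes,Female,40
"""

attributes, instances = read_idav(SAMPLE)
print(attributes["Age"])    # {'type': 'R', 'usage': 'I'}
print(instances[0]["Age"])  # 45.0
```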
Table 4.2 • Values for Attribute Usage

Character  Usage
I          The attribute is used as an input attribute.
U          The attribute is not used.
D          The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O          The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.
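The rule in Table 4.2 that supervised learning with ESX needs exactly one categorical output attribute is easy to check before a mining session. The sketch below is illustrative only: `check_usage` and its dictionary layout are hypothetical, and it performs no checks for unsupervised sessions beyond validating the codes.

```python
VALID_USAGE = {"I", "U", "D", "O"}


def check_usage(attributes: dict, supervised: bool) -> list:
    """Return a list of problems with the usage codes (empty list = OK).

    `attributes` maps attribute name -> {"type": "C" or "R", "usage": code}.
    The function name and dict layout are illustrative assumptions.
    """
    problems = []
    for name, spec in attributes.items():
        if spec["usage"] not in VALID_USAGE:
            problems.append(f"{name}: unknown usage code {spec['usage']!r}")
    outputs = [n for n, s in attributes.items() if s["usage"] == "O"]
    if supervised:
        # Supervised learning with ESX: exactly one categorical output attribute.
        if len(outputs) != 1:
            problems.append(f"expected exactly one output attribute, found {len(outputs)}")
        elif attributes[outputs[0]]["type"] != "C":
            problems.append(f"output attribute {outputs[0]} must be categorical")
    return problems


attrs = {
    "Income Range": {"type": "C", "usage": "I"},
    "Life Insurance Promotion": {"type": "C", "usage": "O"},
    "Age": {"type": "R", "usage": "I"},
}
print(check_usage(attrs, supervised=True))  # []
```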
4.4 A Five-Step Approach for Unsupervised Clustering
Step 1: Enter the Data to Be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules
Step 1: Enter the Data to Be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
• Class Resemblance Scores
• Domain Resemblance Score
  – Attributes, instances, no model
• Domain Predictability
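Resemblance scores summarize how alike a group of instances is. The exact similarity measure iDA uses is not given here, so the sketch below assumes a simple matching coefficient (the share of attributes with equal values) and averages it over all pairs of instances. One way to read the summary results is to compare each class score with the domain score computed over every instance; the cluster data is made up for illustration.

```python
from itertools import combinations


def similarity(a: dict, b: dict) -> float:
    """Simple matching (an assumption): share of attributes with equal values."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)


def resemblance(instances: list) -> float:
    """Average pairwise similarity of a group of instances."""
    pairs = list(combinations(instances, 2))
    if not pairs:
        return 1.0  # a single instance is treated as fully self-similar
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


clusters = {
    "C1": [{"Sex": "Male", "Watch": "No"}, {"Sex": "Male", "Watch": "No"}],
    "C2": [{"Sex": "Female", "Watch": "Yes"}, {"Sex": "Female", "Watch": "No"}],
}
domain = [x for members in clusters.values() for x in members]

for name, members in clusters.items():
    print(name, resemblance(members))  # C1 1.0, then C2 0.5
print("domain", resemblance(domain))
```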
Step 4: Read and Interpret Individual Class Results
• Class predictability is a within-class measure.
• Class predictiveness is a between-class measure.
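Both measures can be computed for one categorical attribute value. The sketch follows the usual definitions: predictability is the share of instances in a class that have the value, and predictiveness is the share of instances with the value that belong to the class. The function names are hypothetical, and the Life Insurance Promotion column of Table 4.1 stands in for the class label; cluster labels from an unsupervised session would work the same way.

```python
def predictability(instances, labels, cls, attr, value):
    """Within-class: share of instances in class `cls` whose `attr` equals `value`."""
    in_class = [x for x, y in zip(instances, labels) if y == cls]
    if not in_class:
        return 0.0
    return sum(x[attr] == value for x in in_class) / len(in_class)


def predictiveness(instances, labels, cls, attr, value):
    """Between-class: share of instances with `attr == value` that belong to `cls`."""
    with_value = [y for x, y in zip(instances, labels) if x[attr] == value]
    if not with_value:
        return 0.0
    return sum(y == cls for y in with_value) / len(with_value)


# Sex and Life Insurance Promotion columns from Table 4.1.
sex = ["Male", "Female", "Male", "Male", "Female", "Female", "Male", "Male",
       "Male", "Female", "Female", "Male", "Female", "Male", "Female"]
life = ["No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No",
        "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes"]
rows = [{"Sex": s} for s in sex]

print(predictability(rows, life, "Yes", "Sex", "Female"))  # 6/9 ≈ 0.667
print(predictiveness(rows, life, "Yes", "Sex", "Female"))  # 6/7 ≈ 0.857
```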
Step 5: Visualize Individual Class Rules
4.5 A Six-Step Approach for Supervised Learning
Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules
Read and Interpret Test Set Results
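Test set results are typically summarized as a confusion matrix and an overall accuracy. The sketch below assumes that form of output and shows how those figures follow from the actual and predicted classes of the test instances; the labels are made up, and the function names are hypothetical rather than part of iDA.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; rows = actual class, columns = predicted."""
    counts = Counter(zip(actual, predicted))
    classes = sorted(set(actual) | set(predicted))
    return classes, [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(actual, predicted):
    """Share of test instances whose predicted class matches the actual class."""
    if not actual:
        return 0.0
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical test-set labels, for illustration only.
actual = ["Yes", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "No", "Yes"]
classes, matrix = confusion_matrix(actual, predicted)
print(classes)                      # ['No', 'Yes']
print(matrix)                       # [[2, 0], [1, 2]]
print(accuracy(actual, predicted))  # 0.8
```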
4.6 Techniques for Generating Rules
1. Choose an attribute.
2. Use the attribute to subdivide instances into classes.
3. For each subclass:
   – If the instances in the subclass satisfy a predefined criterion, generate a defining rule.
   – If not, repeat step 1.
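The three steps above describe a recursive procedure. The sketch below is one possible instance of it, not iDA's implementation: the acceptance criterion (the majority class makes up at least `min_accuracy` of the subclass) and the attribute-choice heuristic are assumptions, `generate_rules` is a hypothetical name, and the rows are made up using Table 4.1's attribute names.

```python
from collections import Counter


def generate_rules(rows, target, attributes, min_accuracy=1.0, conditions=()):
    """Recursive sketch of the choose / subdivide / test loop (hypothetical).

    Returns a list of (conditions, predicted_class, accuracy) tuples.
    """
    if not rows or not attributes:
        return []

    def split(attr):
        # Group the current instances by their value for `attr`.
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r)
        return groups

    def majority(group):
        # Most common class in the group and its share of the group.
        label, count = Counter(r[target] for r in group).most_common(1)[0]
        return label, count / len(group)

    # Step 1: choose the attribute whose split keeps the most instances
    # in their subclass's majority class.
    best = max(attributes,
               key=lambda a: sum(majority(g)[1] * len(g) for g in split(a).values()))
    remaining = [a for a in attributes if a != best]

    rules = []
    # Step 2: use the chosen attribute to subdivide the instances.
    for value, group in split(best).items():
        conds = conditions + ((best, value),)
        label, acc = majority(group)
        # Step 3: emit a rule if the subclass meets the criterion;
        # otherwise repeat on the subclass with the remaining attributes.
        if acc >= min_accuracy:
            rules.append((conds, label, acc))
        elif remaining:
            rules.extend(generate_rules(group, target, remaining, min_accuracy, conds))
    return rules


rows = [
    {"Sex": "Male", "Credit Card Insurance": "No", "Life Insurance Promotion": "No"},
    {"Sex": "Male", "Credit Card Insurance": "Yes", "Life Insurance Promotion": "Yes"},
    {"Sex": "Female", "Credit Card Insurance": "No", "Life Insurance Promotion": "Yes"},
    {"Sex": "Female", "Credit Card Insurance": "Yes", "Life Insurance Promotion": "Yes"},
]
for rule in generate_rules(rows, "Life Insurance Promotion", ["Sex", "Credit Card Insurance"]):
    print(rule)
```

Lowering `min_accuracy` below 1.0 lets the procedure stop earlier and accept rules with some misclassified instances.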
To generate rules with iDA:
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.
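Minimum correctness and minimum coverage act as filters on candidate rules. The sketch assumes that correctness is the share of instances matching a rule's conditions that also belong to the rule's class, and coverage is the share of that class's instances the rule matches. `rule_stats` and `keep_rule` are hypothetical helpers, and attribute significance is not modeled. The data are the Sex and Life Insurance Promotion columns of Table 4.1.

```python
def rule_stats(rows, conditions, target, cls):
    """Return (correctness, coverage) for IF conditions THEN target = cls.

    Assumed definitions: correctness = share of matching instances in `cls`;
    coverage = share of `cls` instances that match the conditions.
    """
    matches = [r for r in rows if all(r[a] == v for a, v in conditions)]
    in_class = [r for r in rows if r[target] == cls]
    hits = sum(r[target] == cls for r in matches)
    correctness = hits / len(matches) if matches else 0.0
    coverage = hits / len(in_class) if in_class else 0.0
    return correctness, coverage


def keep_rule(rows, conditions, target, cls, min_correctness, min_coverage):
    """Apply minimum correctness and coverage thresholds (hypothetical helper)."""
    correctness, coverage = rule_stats(rows, conditions, target, cls)
    return correctness >= min_correctness and coverage >= min_coverage


sex = ["Male", "Female", "Male", "Male", "Female", "Female", "Male", "Male",
       "Male", "Female", "Female", "Male", "Female", "Male", "Female"]
life = ["No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No",
        "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes"]
rows = [{"Sex": s, "Life Insurance Promotion": l} for s, l in zip(sex, life)]

rule = (("Sex", "Female"),)
print(rule_stats(rows, rule, "Life Insurance Promotion", "Yes"))  # correctness 6/7, coverage 6/9
print(keep_rule(rows, rule, "Life Insurance Promotion", "Yes", 0.8, 0.5))  # True
```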
4.7 Instance Typicality
Typicality scores are used to:
• Identify prototypical and outlier instances.
• Select the best set of training instances.
• Compute individual instance classification confidence scores.
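A typicality score can be read as how well an instance fits its own class. The sketch assumes it is the average similarity of an instance to the other members of its class, using simple matching similarity; the highest score marks a prototypical instance and the lowest a likely outlier. The function names and data are hypothetical.

```python
def similarity(a: dict, b: dict) -> float:
    """Simple matching (an assumption): share of attributes with equal values."""
    keys = a.keys() & b.keys()
    return sum(a[k] == b[k] for k in keys) / len(keys) if keys else 0.0


def typicality(members: list) -> list:
    """Average similarity of each instance to the other members of its class."""
    if len(members) < 2:
        return [1.0] * len(members)
    return [
        sum(similarity(x, y) for j, y in enumerate(members) if j != i) / (len(members) - 1)
        for i, x in enumerate(members)
    ]


class_members = [
    {"Magazine": "Yes", "Watch": "No", "Sex": "Male"},
    {"Magazine": "Yes", "Watch": "No", "Sex": "Female"},
    {"Magazine": "Yes", "Watch": "Yes", "Sex": "Male"},
    {"Magazine": "No", "Watch": "No", "Sex": "Male"},
    {"Magazine": "No", "Watch": "Yes", "Sex": "Female"},
]
scores = typicality(class_members)
print("prototype:", scores.index(max(scores)), "outlier:", scores.index(min(scores)))
# prototype: 0 outlier: 4
```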
4.8 Special Considerations and Features
• Avoid Mining Delays
• The Quick Mine Feature
• Erroneous and Missing Data