An Excel-based Data Mining Tool
Chapter 4
4.1 The iData Analyzer
Figure 4.1 The iDA system architecture
(Flowchart components: Interface, Data, PreProcessor, Heuristic Agent, Mining Technique, Neural Networks, ESX, RuleMaker, Rules, Report Generator, Excel Sheets. Yes/No decision points: Large Dataset, Explanation, Generate Rules.)
Figure 4.2 A successful installation
4.2 ESX: A Multipurpose Tool for Data Mining
Figure 4.3 An ESX concept hierarchy
Root Level: Root
Concept Level: C1, C2, ..., Cn
Instance Level: I11, I12, ..., I1j (under C1); I21, I22, ..., I2k (under C2); In1, In2, ..., Inl (under Cn)
4.3 iDAV Format for Data Mining
Table 4.1 • Credit Card Promotion Database: iDAV Format

Income    Magazine    Watch       Life Insurance   Credit Card
Range     Promotion   Promotion   Promotion        Insurance     Sex      Age
C         C           C           C                C             C        R
I         I           I           I                I             I        I
40–50K    Yes         No          No               No            Male     45
30–40K    Yes         Yes         Yes              No            Female   40
40–50K    No          No          No               No            Male     42
30–40K    Yes         Yes         Yes              Yes           Male     43
50–60K    Yes         No          Yes              No            Female   38
20–30K    No          No          No               No            Female   55
30–40K    Yes         No          Yes              Yes           Male     35
20–30K    No          Yes         No               No            Male     27
30–40K    Yes         No          No               No            Male     43
30–40K    Yes         Yes         Yes              No            Female   41
40–50K    No          Yes         Yes              No            Female   43
20–30K    No          Yes         Yes              No            Male     29
50–60K    Yes         Yes         Yes              No            Female   39
40–50K    No          Yes         No               No            Male     55
20–30K    No          No          Yes              Yes           Female   19
Table 4.2 • Values for Attribute Usage

Character   Usage
I           The attribute is used as an input attribute.
U           The attribute is not used.
D           The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O           The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.
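The header rows of the iDAV layout can be sketched in code: one row of attribute names, one row of type codes, and one row of usage codes from Table 4.2. This is only an illustration of the format's rules, not part of iDA. The function name `check_idav_header` and its messages are hypothetical, and the reading of the type codes (C for categorical, R for real-valued) is an assumption based on Table 4.1.

```python
VALID_TYPES = {"C", "R"}             # assumed: C = categorical, R = real-valued
VALID_USAGES = {"I", "U", "D", "O"}  # usage codes listed in Table 4.2


def check_idav_header(names, types, usages, supervised=False):
    """Return a list of problems found in the three iDAV header rows."""
    if not (len(names) == len(types) == len(usages)):
        return ["header rows have different lengths"]
    problems = []
    for name, t, u in zip(names, types, usages):
        if t not in VALID_TYPES:
            problems.append(f"{name}: unknown type code {t!r}")
        if u not in VALID_USAGES:
            problems.append(f"{name}: unknown usage code {u!r}")
    if supervised:
        # Table 4.2: supervised learning with ESX uses exactly one
        # categorical attribute as the output attribute.
        outputs = [t for t, u in zip(types, usages) if u == "O"]
        if len(outputs) != 1 or outputs[0] != "C":
            problems.append(
                "supervised learning needs exactly one categorical output attribute"
            )
    return problems


names = ["Income Range", "Magazine Promotion", "Watch Promotion",
         "Life Insurance Promotion", "Credit Card Insurance", "Sex", "Age"]
types = ["C", "C", "C", "C", "C", "C", "R"]
usages = ["I"] * 7  # Table 4.1 marks every attribute as an input

print(check_idav_header(names, types, usages))                   # unsupervised use
print(check_idav_header(names, types, usages, supervised=True))  # no output chosen yet
```

With every attribute marked I, the header passes for unsupervised clustering; for a supervised session the check reports a problem until one categorical attribute is switched to O.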
4.4 A Five-step Approach for Unsupervised Clustering
Step 1: Enter the Data to be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules
Step 1: Enter The Data To Be Mined
Figure 4.4 The Credit Card Promotion Database
Step 2: Perform A Data Mining Session
Figure 4.5 Unsupervised settings for ESX
Figure 4.6 RuleMaker options
Step 3: Read and Interpret Summary Results
• Class Resemblance Scores
• Domain Resemblance Score
• Domain Predictability
Figure 4.8 Summary statistics for the Acme credit card promotion database
Figure 4.9 Statistics for numerical attributes and common categorical
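Step 4: Read and Interpret Individual Class Results
• Class Predictability is a within-class measure: the fraction of a class's instances that have a given attribute value. A score of 1 means the value is a necessary condition for class membership.
• Class Predictiveness is a between-class measure: the fraction of all instances with a given attribute value that fall in the class. A score of 1 means the value is a sufficient condition for class membership.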
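As a worked example, both measures can be computed directly from Table 4.1. This is a minimal sketch, not ESX's implementation: the function names are hypothetical, and the Class 1 membership (rows 1, 3 and 9 of Table 4.1) is an assumption inferred from the typical-instance listings in the Class 1 report.

```python
# Income range and magazine promotion for the 15 instances of Table 4.1, in row order.
INCOME = ["40-50K", "30-40K", "40-50K", "30-40K", "50-60K",
          "20-30K", "30-40K", "20-30K", "30-40K", "30-40K",
          "40-50K", "20-30K", "50-60K", "40-50K", "20-30K"]
MAGAZINE = ["Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No",
            "Yes", "Yes", "No", "No", "Yes", "No", "No"]

# Assumption: Class 1 is rows 1, 3 and 9 (0-based indices 0, 2, 8),
# inferred from the instances listed in the Class 1 report.
CLASS_1 = [0, 2, 8]


def predictability(column, members, value):
    """Within-class: share of the class's instances that have `value`."""
    hits = sum(1 for i in members if column[i] == value)
    return hits / len(members)


def predictiveness(column, members, value):
    """Between-class: share of all instances with `value` that are in the class."""
    hits = sum(1 for i in members if column[i] == value)
    total = sum(1 for v in column if v == value)
    return hits / total if total else 0.0


for name, column, value in [("Income Range", INCOME, "40-50K"),
                            ("Income Range", INCOME, "30-40K"),
                            ("Magazine Promo", MAGAZINE, "Yes"),
                            ("Magazine Promo", MAGAZINE, "No")]:
    print(f"{name} = {value}: "
          f"predictability {predictability(column, CLASS_1, value):.2f}, "
          f"predictiveness {predictiveness(column, CLASS_1, value):.2f}")
```

Under these assumptions the printed pairs (0.67/0.50, 0.33/0.20, 0.67/0.25, 0.33/0.14) agree with the categorical attribute summary reported for Class 1.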
Class: 1
Total Number of Instances: 3
Class Resemblance Score: 0.81

Most Typical Instances:
Income Range   Magazine Promo   Watch Promo   Life Ins Promo   Credit Card Ins.   Sex    Age   Typicality
"40-50,000"    Yes              No            No               No                 Male   45    0.86
"40-50,000"    No               No            No               No                 Male   42    0.79

Least Typical Instances:
Income Range   Magazine Promo   Watch Promo   Life Ins Promo   Credit Card Ins.   Sex    Age   Typicality
"40-50,000"    No               No            No               No                 Male   42    0.79
"30-40,000"    Yes              No            No               No                 Male   43    0.79

Categorical Attribute Summary:
Name             Value         Frequency   Predictability   Predictiveness
Income Range     "40-50,000"   2           0.67             0.50
                 "30-40,000"   1           0.33             0.20
Magazine Promo   Yes           2           0.67             0.25
                 No            1           0.33             0.14
Class: 2
Total Number of Instances: 5
Class Resemblance Score: 0.53

Most Typical Instances:
Income Range   Magazine Promo   Watch Promo   Life Ins Promo   Credit Card Ins.   Sex      Age   Typicality
"20-30,000"    No               Yes           No               No                 Male     27    0.64
"20-30,000"    No               Yes           Yes              No                 Male     29    0.57

Least Typical Instances:
Income Range   Magazine Promo   Watch Promo   Life Ins Promo   Credit Card Ins.   Sex      Age   Typicality
"40-50,000"    No               Yes           No               No                 Male     55    0.50
"20-30,000"    No               No            Yes              Yes                Female   19    0.39

Categorical Attribute Summary:
Name             Value         Frequency   Predictability   Predictiveness
Income Range     "20-30,000"   4           0.80             1.00
                 "40-50,000"   1           0.20             0.25
Magazine Promo   No            5           1.00             0.71
Figure 4.10 Class 3 summary results
Figure 4.11 Necessary and sufficient attribute values for Class 3
Step 5: Visualize Individual Class Rules
Figure 4.7 Rules for the credit card promotion database
4.5 A Six-Step Approach for Supervised Learning
Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules
Read and Interpret Test Set Results
Figure 4.12 Test set instance classification
4.6 Techniques for Generating Rules
All rules or covering set rules
Ref. Figure 4.6
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.
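The roles of the minimum correctness and minimum coverage settings can be sketched as a filter over candidate rules. This is an illustration, not RuleMaker's algorithm. It assumes that correctness is the fraction of instances matching a rule's condition that belong to the class, and that coverage is the fraction of the class's instances that match the condition. The helper names, the candidate rules and the thresholds (0.8 and 0.5) are hypothetical, and the Class 2 membership is inferred from the Class 2 report in Step 4.

```python
# Two attributes from Table 4.1, in row order.
INCOME = ["40-50K", "30-40K", "40-50K", "30-40K", "50-60K",
          "20-30K", "30-40K", "20-30K", "30-40K", "30-40K",
          "40-50K", "20-30K", "50-60K", "40-50K", "20-30K"]
MAGAZINE = ["Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No",
            "Yes", "Yes", "No", "No", "Yes", "No", "No"]
ROWS = [{"Income Range": inc, "Magazine Promotion": mag}
        for inc, mag in zip(INCOME, MAGAZINE)]

# Assumption: Class 2 is rows 6, 8, 12, 14 and 15 (0-based 5, 7, 11, 13, 14).
CLASS_2 = {5, 7, 11, 13, 14}


def matches(row, condition):
    """True if the row has every attribute value the condition asks for."""
    return all(row[attr] == val for attr, val in condition.items())


def score_rule(condition, rows, members):
    """Return (correctness, coverage) of 'IF condition THEN class'."""
    covered = {i for i, row in enumerate(rows) if matches(row, condition)}
    in_class = covered & members
    correctness = len(in_class) / len(covered) if covered else 0.0
    coverage = len(in_class) / len(members) if members else 0.0
    return correctness, coverage


def keep_rules(candidates, rows, members, min_correctness, min_coverage):
    """Keep only the candidates that meet both thresholds."""
    kept = []
    for condition in candidates:
        correctness, coverage = score_rule(condition, rows, members)
        if correctness >= min_correctness and coverage >= min_coverage:
            kept.append((condition, correctness, coverage))
    return kept


candidates = [
    {"Income Range": "20-30K"},
    {"Magazine Promotion": "No"},
    {"Income Range": "40-50K", "Magazine Promotion": "No"},
]
for condition, correctness, coverage in keep_rules(candidates, ROWS, CLASS_2, 0.8, 0.5):
    print(condition, f"correctness={correctness:.2f}", f"coverage={coverage:.2f}")
```

Only the first candidate survives these thresholds. Raising the minimum correctness yields fewer but more reliable rules, while raising the minimum coverage discards rules that describe only a small part of the class.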
4.7 Instance Typicality
Typicality Scores
• Identify prototypical and outlier instances.
• Select a best set of training instances.
• Used to compute individual instance classification confidence scores.
Figure 4.13 Instance typicality
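The slides do not give the typicality formula. The sketch below assumes that an instance's typicality is its average similarity to the other members of its class, and it uses the fraction of matching categorical attribute values as the similarity measure. Both choices are made for illustration only: the similarity measure ESX uses is not described here, and this sketch ignores the numeric Age attribute. The three instances are the ones assumed to form Class 1 (rows 1, 3 and 9 of Table 4.1).

```python
# Categorical attribute values of the assumed Class 1 members, in the column
# order of Table 4.1 (Income Range ... Sex); Age is left out of this sketch.
CLASS_1 = {
    "row 1": ("40-50K", "Yes", "No", "No", "No", "Male"),
    "row 3": ("40-50K", "No", "No", "No", "No", "Male"),
    "row 9": ("30-40K", "Yes", "No", "No", "No", "Male"),
}


def similarity(a, b):
    """Simple matching: share of attributes on which two instances agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def typicality(members):
    """Average similarity of each instance to the other members of its class."""
    scores = {}
    for name, instance in members.items():
        others = [o for n, o in members.items() if n != name]
        if others:
            scores[name] = sum(similarity(instance, o) for o in others) / len(others)
        else:
            scores[name] = 1.0  # single-member class: convention chosen here
    return scores


scores = typicality(CLASS_1)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 2))
```

Under these assumptions row 1 is the most typical instance and rows 3 and 9 tie, which matches the ordering in the Class 1 report, although the scores themselves differ from the 0.86 and 0.79 that ESX reports.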
4.8 Special Considerations and Features
• Avoid Mining Delays
• The Quick Mine Feature
• Erroneous and Missing Data