An Excel-based Data Mining Tool
Chapter 4

4.1 The iData Analyzer Interface

Figure 4.1 The iDA system architecture
(Flowchart labels: Data, PreProcessor, Large Dataset, Heuristic Agent, Mining Technique, Neural Networks, ESX, Explanation, Generate Rules, RuleMaker, Rules, Report Generator, Excel Sheets)

Figure 4.2 A successful installation

4.2 ESX: A Multipurpose Tool for Data Mining

Root Level:      Root
Concept Level:   C1, C2, ..., Cn
Instance Level:  I11, I12, ..., I1j  (under C1)
                 I21, I22, ..., I2k  (under C2)
                 ...
                 In1, In2, ..., Inl  (under Cn)

Figure 4.3 An ESX concept hierarchy

4.3 iDAV Format for Data Mining

Table 4.1 • Credit Card Promotion Database: iDAV Format

Income   Magazine   Watch      Life Insurance  Credit Card
Range    Promotion  Promotion  Promotion       Insurance    Sex     Age
C        C          C          C               C            C       R
I        I          I          I               I            I       I
40–50K   Yes        No         No              No           Male    45
30–40K   Yes        Yes        Yes             No           Female  40
40–50K   No         No         No              No           Male    42
30–40K   Yes        Yes        Yes             Yes          Male    43
50–60K   Yes        No         Yes             No           Female  38
20–30K   No         No         No              No           Female  55
30–40K   Yes        No         Yes             Yes          Male    35
20–30K   No         Yes        No              No           Male    27
30–40K   Yes        No         No              No           Male    43
30–40K   Yes        Yes        Yes             No           Female  41
40–50K   No         Yes        Yes             No           Female  43
20–30K   No         Yes        Yes             No           Male    29
50–60K   Yes        Yes        Yes             No           Female  39
40–50K   No         Yes        No              No           Male    55
20–30K   No         No         Yes             Yes          Female  19

Table 4.2 • Values for Attribute Usage

Character  Usage
I          The attribute is used as an input attribute.
U          The attribute is not used.
D          The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O          The attribute is used as an output attribute. For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.
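The three header rows of Table 4.1 (attribute name, type code, usage code) can be checked mechanically before a mining session. The following is a minimal sketch, not part of iDA itself: the names `Attribute`, `parse_idav_header`, and `has_valid_supervised_output` are hypothetical, and reading the type codes as C = categorical and R = real-valued is an assumption based on which columns carry them in Table 4.1. The usage codes follow Table 4.2.

```python
from dataclasses import dataclass

# Usage codes from Table 4.2.
USAGE_CODES = {"I": "input", "U": "unused", "D": "display-only", "O": "output"}
# Assumption: C marks a categorical attribute, R a real-valued one.
TYPE_CODES = {"C": "categorical", "R": "real"}


@dataclass
class Attribute:
    name: str
    kind: str   # "categorical" or "real"
    usage: str  # "input", "unused", "display-only", or "output"


def parse_idav_header(names, type_row, usage_row):
    """Combine the three iDAV header rows into a list of Attribute records."""
    if not (len(names) == len(type_row) == len(usage_row)):
        raise ValueError("header rows must have the same number of columns")
    attributes = []
    for name, type_code, usage_code in zip(names, type_row, usage_row):
        if type_code not in TYPE_CODES:
            raise ValueError(f"unknown type code {type_code!r} for {name!r}")
        if usage_code not in USAGE_CODES:
            raise ValueError(f"unknown usage code {usage_code!r} for {name!r}")
        attributes.append(
            Attribute(name, TYPE_CODES[type_code], USAGE_CODES[usage_code])
        )
    return attributes


def has_valid_supervised_output(attributes):
    """True when exactly one attribute is an output and it is categorical."""
    outputs = [a for a in attributes if a.usage == "output"]
    return len(outputs) == 1 and outputs[0].kind == "categorical"


NAMES = [
    "Income Range", "Magazine Promotion", "Watch Promotion",
    "Life Insurance Promotion", "Credit Card Insurance", "Sex", "Age",
]
TYPE_ROW = ["C", "C", "C", "C", "C", "C", "R"]
USAGE_ROW = ["I", "I", "I", "I", "I", "I", "I"]  # all inputs, as in Table 4.1

attrs = parse_idav_header(NAMES, TYPE_ROW, USAGE_ROW)

# Marking Life Insurance Promotion as the output attribute (usage code O).
supervised_usage = ["I", "I", "I", "O", "I", "I", "I"]
supervised_attrs = parse_idav_header(NAMES, TYPE_ROW, supervised_usage)

print(has_valid_supervised_output(attrs))             # no output attribute
print(has_valid_supervised_output(supervised_attrs))  # one categorical output
```

With every usage code set to I, as in Table 4.1, the sheet is set up for unsupervised clustering; changing one categorical column to O satisfies the supervised-learning condition stated in Table 4.2.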
4.4 A Five-step Approach for Unsupervised Clustering

Step 1: Enter the Data to be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules

Step 1: Enter the Data to be Mined

Figure 4.4 The Credit Card Promotion Database

Step 2: Perform a Data Mining Session

Figure 4.5 Unsupervised settings for ESX
Figure 4.6 RuleMaker options

Step 3: Read and Interpret Summary Results

• Class Resemblance Scores
• Domain Resemblance Score
• Domain Predictability

Figure 4.8 Summary statistics for the Acme credit card promotion database
Figure 4.9 Statistics for numerical attributes and common categorical

Step 4: Read and Interpret Individual Class Results

• Class Predictability is a within-class measure. A value of 1 marks the attribute value as a necessary condition for class membership.
• Class Predictiveness is a between-class measure. A value of 1 marks the attribute value as a sufficient condition for class membership.

Class: 1
Total Number of Instances: 3
Class Resemblance Score: 0.81

Most Typical Instances:
Income Range  Magazine Promo  Watch Promo  Life Ins Promo  Credit Card Ins.  Sex     Age  Typicality
"40-50,000"   Yes             No           No              No                Male    45   0.86
"40-50,000"   No              No           No              No                Male    42   0.79

Least Typical Instances:
Income Range  Magazine Promo  Watch Promo  Life Ins Promo  Credit Card Ins.  Sex     Age  Typicality
"40-50,000"   No              No           No              No                Male    42   0.79
"30-40,000"   Yes             No           No              No                Male    43   0.79

Categorical Attribute Summary:
Name            Value        Frequency  Predictability  Predictiveness
Income Range    "40-50,000"  2          0.67            0.50
                "30-40,000"  1          0.33            0.20
Magazine Promo  Yes          2          0.67            0.25
                No           1          0.33            0.14

Class: 2
Total Number of Instances: 5
Class Resemblance Score: 0.53

Most Typical Instances:
Income Range  Magazine Promo  Watch Promo  Life Ins Promo  Credit Card Ins.  Sex     Age  Typicality
"20-30,000"   No              Yes          No              No                Male    27   0.64
"20-30,000"   No              Yes          Yes             No                Male    29   0.57

Least Typical Instances:
Income Range  Magazine Promo  Watch Promo  Life Ins Promo  Credit Card Ins.  Sex     Age  Typicality
"40-50,000"   No              Yes          No              No                Male    55   0.50
"20-30,000"   No              No           Yes             Yes               Female  19   0.39

Categorical Attribute Summary:
Name            Value        Frequency  Predictability  Predictiveness
Income Range    "20-30,000"  4          0.80            1.00
                "40-50,000"  1          0.20            0.25
Magazine Promo  No           5          1.00            0.71

Figure 4.10 Class 3 summary results
Figure 4.11 Necessary and sufficient attribute values for Class 3

Step 5: Visualize Individual Class Rules

Figure 4.7 Rules for the credit card promotion database

4.5 A Six-Step Approach for Supervised Learning

Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules

Step 4: Read and Interpret Test Set Results

Figure 4.12 Test set instance classification

4.6 Techniques for Generating Rules

(Ref. Figure 4.6)

1. Define the scope of the rules (all rules or covering set rules).
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.

4.7 Instance Typicality

Typicality Scores
• Identify prototypical and outlier instances.
• Select a best set of training instances.
• Used to compute individual instance classification confidence scores.

Figure 4.13 Instance typicality

4.8 Special Considerations and Features

• Avoid Mining Delays
• The Quick Mine Feature
• Erroneous and Missing Data