Chapter 4: An Excel-based Data Mining Tool

4.1 The iData Analyzer

[Figure: iDA architecture. Components: Interface, PreProcessor, Heuristic Agent, mining technique (Neural Networks or ESX), RuleMaker, Report Generator, and Excel sheets as output. Decision points: Large dataset? Explanation? Generate rules?]

4.2 ESX: A Multipurpose Tool for Data Mining

ESX
• Supports supervised learning and unsupervised clustering
• Does not make statistical assumptions
• Deals with missing attribute values
• Can be applied to categorical and numerical data
• Points out inconsistencies and unusual values
• For supervised classification, ESX can determine the instances and attributes best able to classify new instances
• For unsupervised clustering, ESX incorporates a globally optimizing evaluation function that encourages a best clustering of the instances

ESX tree structure:
Root level:      Root
Concept level:   C1, C2, ..., Cn
Instance level:  I11, I12, ..., I1j (in C1); I21, I22, ..., I2k (in C2); ...; In1, In2, ..., Inl (in Cn)

4.3 iDAV Format for Data Mining

Table 4.1 • Credit Card Promotion Database: iDAV Format

Income Range | Magazine Promotion | Watch Promotion | Life Insurance Promotion | Credit Card Insurance | Sex | Age
C | C | C | C | C | C | R
I | I | I | I | I | I | I
40–50K | Yes | No | No | No | Male | 45
30–40K | Yes | Yes | Yes | No | Female | 40
40–50K | No | No | No | No | Male | 42
30–40K | Yes | Yes | Yes | Yes | Male | 43
50–60K | Yes | No | Yes | No | Female | 38
20–30K | No | No | No | No | Female | 55
30–40K | Yes | No | Yes | Yes | Male | 35
20–30K | No | Yes | No | No | Male | 27
30–40K | Yes | No | No | No | Male | 43
30–40K | Yes | Yes | Yes | No | Female | 41
40–50K | No | Yes | Yes | No | Female | 43
20–30K | No | Yes | Yes | No | Male | 29
50–60K | Yes | Yes | Yes | No | Female | 39
40–50K | No | Yes | No | No | Male | 55
20–30K | No | No | Yes | Yes | Female | 19

The two rows under the attribute names give each attribute's type (C = categorical, R = real-valued) and its usage (codes in Table 4.2).

Table 4.2 • Values for Attribute Usage

Character | Usage
I | The attribute is used as an input attribute.
U | The attribute is not used.
D | The attribute is not used for classification or clustering, but attribute value summary information is displayed in all output reports.
O | The attribute is used as an output attribute.
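The iDAV layout can be read mechanically: attribute names, then a type row, then a usage row, then one instance per row. The following is a minimal Python sketch, not part of iDA itself; the names `parse_idav`, `check_supervised`, and `Attribute` are hypothetical. It validates the codes from Table 4.2, converts R attributes to numbers, and enforces the supervised-learning requirement that exactly one categorical attribute is marked O. Treating an empty cell as a missing value is an assumption made for illustration.

```python
from dataclasses import dataclass

VALID_TYPES = {"C", "R"}            # categorical, real-valued
VALID_USAGE = {"I", "U", "D", "O"}  # usage codes from Table 4.2


@dataclass
class Attribute:
    name: str
    kind: str   # "C" or "R"
    usage: str  # "I", "U", "D" or "O"


def parse_idav(rows):
    """Split iDAV rows into attribute metadata and instances.

    rows[0] holds attribute names, rows[1] type codes, rows[2] usage codes;
    every later row is one instance. Empty cells become None (an assumed
    missing-value convention, not taken from iDA).
    """
    if len(rows) < 3:
        raise ValueError("iDAV data needs name, type and usage rows")
    names, kinds, usages = rows[0], rows[1], rows[2]
    if not len(names) == len(kinds) == len(usages):
        raise ValueError("header rows differ in length")
    attributes = []
    for name, kind, usage in zip(names, kinds, usages):
        if kind not in VALID_TYPES:
            raise ValueError(f"{name}: unknown type code {kind!r}")
        if usage not in VALID_USAGE:
            raise ValueError(f"{name}: unknown usage code {usage!r}")
        attributes.append(Attribute(name, kind, usage))
    instances = []
    for row in rows[3:]:
        if len(row) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(row)}")
        # Real-valued fields become floats; categorical fields stay strings.
        instances.append([
            None if value == "" else (float(value) if attr.kind == "R" else value)
            for attr, value in zip(attributes, row)
        ])
    return attributes, instances


def check_supervised(attributes):
    """Return the single output attribute, which must be categorical."""
    outputs = [a for a in attributes if a.usage == "O"]
    if len(outputs) != 1 or outputs[0].kind != "C":
        raise ValueError("supervised learning needs exactly one categorical output attribute")
    return outputs[0]


# Four columns and two instances of Table 4.1, with Life Insurance Promotion marked O.
rows = [
    ["Income Range", "Life Insurance Promotion", "Sex", "Age"],
    ["C", "C", "C", "R"],
    ["I", "O", "I", "I"],
    ["40–50K", "No", "Male", "45"],
    ["30–40K", "Yes", "Female", "40"],
]
attributes, instances = parse_idav(rows)
print(check_supervised(attributes).name)  # Life Insurance Promotion
print(instances[1][3])                    # 40.0
```

Marking a second attribute O, or marking the real-valued Age attribute O, makes `check_supervised` raise an error.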
For supervised learning with ESX, exactly one categorical attribute is selected as the output attribute.

4.4 A Five-Step Approach for Unsupervised Clustering
Step 1: Enter the Data to Be Mined
Step 2: Perform a Data Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Individual Class Results
Step 5: Visualize Individual Class Rules

Step 1: Enter the Data to Be Mined

Step 2: Perform a Data Mining Session

Step 3: Read and Interpret Summary Results
• Class Resemblance Scores
• Domain Resemblance Score (based on the attributes and instances alone, with no class model)
• Domain Predictability

Step 4: Read and Interpret Individual Class Results
• Class Predictability is a within-class measure.
• Class Predictiveness is a between-class measure.

Step 5: Visualize Individual Class Rules

4.5 A Six-Step Approach for Supervised Learning
Step 1: Choose an Output Attribute
Step 2: Perform the Mining Session
Step 3: Read and Interpret Summary Results
Step 4: Read and Interpret Test Set Results
Step 5: Read and Interpret Class Results
Step 6: Visualize and Interpret Class Rules

Step 4: Read and Interpret Test Set Results

4.6 Techniques for Generating Rules
1. Choose an attribute.
2. Use the attribute to subdivide the instances into classes.
3. For each subclass: if its instances satisfy a predefined criterion, generate a defining rule; if not, repeat from step 1 on that subclass.

4.6 Techniques for Generating Rules (continued)
1. Define the scope of the rules.
2. Choose the instances.
3. Set the minimum rule correctness.
4. Define the minimum rule coverage.
5. Choose an attribute significance value.

4.7 Instance Typicality

Typicality scores are used to:
• Identify prototypical and outlier instances.
• Select the best set of training instances.
• Compute individual instance classification confidence scores.

4.8 Special Considerations and Features
• Avoid Mining Delays
• The Quick Mine Feature
• Erroneous and Missing Data
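The predictability measures reported in Steps 3 and 4 of Section 4.4 can be computed directly from counts. The following is a minimal sketch that assumes the usual definitions. Domain predictability of a value v is the fraction of all instances having v. Class predictability (within-class) is the fraction of a class's members having v. Class predictiveness (between-class) is the probability that an instance belongs to the class given that it has v. As an illustrative assumption, the two values of Life Insurance Promotion from Table 4.1 stand in for classes; in an unsupervised session the classes would be the clusters ESX forms. The function names are hypothetical and are not part of iDA.

```python
from collections import Counter


def domain_predictability(instances, attr):
    """Fraction of all instances having each value of attr."""
    counts = Counter(inst[attr] for inst in instances)
    return {value: n / len(instances) for value, n in counts.items()}


def class_predictability(instances, class_attr, cls, attr):
    """Within-class: fraction of the members of cls having each value of attr."""
    members = [inst for inst in instances if inst[class_attr] == cls]
    counts = Counter(inst[attr] for inst in members)
    return {value: n / len(members) for value, n in counts.items()}


def class_predictiveness(instances, class_attr, cls, attr):
    """Between-class: P(instance is in cls | attr == value) for each value."""
    totals = Counter(inst[attr] for inst in instances)
    in_class = Counter(inst[attr] for inst in instances if inst[class_attr] == cls)
    return {value: in_class[value] / n for value, n in totals.items()}


# Two columns of Table 4.1, instances 1-15 in table order.
LIFE = "No Yes No Yes Yes No Yes No No Yes Yes Yes Yes No Yes".split()
SEX = ("Male Female Male Male Female Female Male Male Male "
       "Female Female Male Female Male Female").split()
data = [{"Life Insurance Promotion": life, "Sex": sex} for life, sex in zip(LIFE, SEX)]

print(round(domain_predictability(data, "Sex")["Female"], 2))                                    # 0.47
print(round(class_predictability(data, "Life Insurance Promotion", "Yes", "Sex")["Female"], 2))  # 0.67
print(round(class_predictiveness(data, "Life Insurance Promotion", "Yes", "Sex")["Female"], 2))  # 0.86
```

Seven of the 15 customers are female (0.47). Six of the nine Yes instances are female (0.67), and six of the seven females are in the Yes class (0.86). The two class scores answer different questions about the same counts.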
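The typicality scores of Section 4.7 can be sketched under two assumptions. The first is that an instance's typicality is its average similarity to the other instances in its class. The second is that plain attribute matching (the fraction of attribute values two instances share) serves as the similarity measure. This matching measure is a stand-in and is not the measure ESX itself uses. The highest-scoring instance is then the class prototype, and the lowest is the outlier candidate. The helper names are hypothetical.

```python
def similarity(a, b):
    """Simple matching: fraction of attributes on which two instances agree.
    Assumption: stands in for ESX's own similarity measure."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def typicality(instances):
    """Average similarity of each instance to the other members of its class."""
    n = len(instances)
    if n < 2:
        raise ValueError("typicality needs at least two instances in the class")
    scores = []
    for i, inst in enumerate(instances):
        total = sum(similarity(inst, other) for j, other in enumerate(instances) if j != i)
        scores.append(total / (n - 1))
    return scores


# Instances 1, 3, 6, 8, 9 and 14 of Table 4.1 (Life Insurance Promotion = No),
# described by Magazine Promotion, Watch Promotion, Credit Card Insurance and Sex.
no_class = {
    1: ("Yes", "No", "No", "Male"),
    3: ("No", "No", "No", "Male"),
    6: ("No", "No", "No", "Female"),
    8: ("No", "Yes", "No", "Male"),
    9: ("Yes", "No", "No", "Male"),
    14: ("No", "Yes", "No", "Male"),
}
scores = dict(zip(no_class, typicality(list(no_class.values()))))
prototype = max(scores, key=scores.get)  # most typical instance
outlier = min(scores, key=scores.get)    # least typical instance
print(prototype, outlier)  # 3 6
```

Instance 3 agrees with each of the other five on three of the four attributes, giving a score of 0.75. Instance 6, the only female in this class, scores lowest at 0.55.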