Part I Data Mining Fundamentals

Chapter 1 Data Mining: A First View

1.1 Data Mining: A Definition

Data Mining
The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Induction-based Learning
The process of forming general concept definitions by observing specific examples of concepts to be learned.

Knowledge Discovery in Databases (KDD)
The application of the scientific method to data mining. Data mining is one step of the KDD process.

Data Mining: A KDD Process
[Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation. Data mining is the core of the knowledge discovery process.]

1.2 What Can Computers Learn?

Four Levels of Learning
• Facts
• Concepts
• Procedures (to be worked out)
• Principles

Concepts
Computers are good at learning concepts. Concepts are the output of a data mining session.

Three Concept Views
• Classical View (crisp), the "old hand"
  – A concept is given by a definition.
• Probabilistic View (e.g., 85%), which needs some experience
  – Data mining rules with a confidence value
• Exemplar View (case-based reasoning, CBR), the newcomer
• An illustrated example:
  – Good credit?

Supervised Learning
• Build a learner model using data instances of known origin.
• Use the model to determine the outcome for new instances of unknown origin.

Supervised Learning: A Decision Tree Example

Decision Tree
A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.
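The decision-tree definition can be made concrete with a small Python sketch. It hard-codes the tree of Figure 1.1 (test Swollen Glands first, then Fever) as nested nodes and walks it for the three unlabeled patients of Table 1.2. The names Node, Leaf and classify are illustrative choices of ours, not part of the text, and the sketch only applies a given tree; it does not learn one from data.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: holds a decision outcome."""
    outcome: str


@dataclass
class Node:
    """Non-terminal node: tests one attribute and branches on its value."""
    attribute: str
    branches: Dict[str, "Tree"]


Tree = Union[Node, Leaf]

# Hypothetical encoding of the tree in Figure 1.1 (sketch, not a learner).
FIGURE_1_1 = Node("Swollen Glands", {
    "Yes": Leaf("Strep throat"),
    "No": Node("Fever", {
        "Yes": Leaf("Cold"),
        "No": Leaf("Allergy"),
    }),
})


def classify(tree: Tree, instance: Dict[str, str]) -> str:
    """Follow the branch matching each attribute test until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[instance[tree.attribute]]
    return tree.outcome


# The unknown instances of Table 1.2.
TABLE_1_2 = {
    11: {"Sore Throat": "No", "Fever": "No", "Swollen Glands": "Yes",
         "Congestion": "Yes", "Headache": "Yes"},
    12: {"Sore Throat": "Yes", "Fever": "Yes", "Swollen Glands": "No",
         "Congestion": "No", "Headache": "Yes"},
    13: {"Sore Throat": "No", "Fever": "No", "Swollen Glands": "No",
         "Congestion": "No", "Headache": "Yes"},
}

for patient_id, attributes in TABLE_1_2.items():
    print(patient_id, classify(FIGURE_1_1, attributes))
```

Each Node plays the role of a non-terminal test and each Leaf a decision outcome, so the three root-to-leaf paths correspond one-to-one with the production rules given after Table 1.2.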
Table 1.1 • Hypothetical Training Data for Disease Diagnosis

Patient ID# | Sore Throat | Fever | Swollen Glands | Congestion | Headache | Diagnosis
1  | Yes | Yes | Yes | Yes | Yes | Strep throat
2  | No  | No  | No  | Yes | Yes | Allergy
3  | Yes | Yes | No  | Yes | No  | Cold
4  | Yes | No  | Yes | No  | No  | Strep throat
5  | No  | Yes | No  | Yes | No  | Cold
6  | No  | No  | No  | Yes | No  | Allergy
7  | No  | No  | Yes | No  | No  | Strep throat
8  | Yes | No  | No  | Yes | Yes | Allergy
9  | No  | Yes | No  | Yes | Yes | Cold
10 | Yes | Yes | No  | Yes | Yes | Cold

Note: Swollen Glands = swollen tonsils / lymph glands; Strep throat = streptococcal pharyngitis.

Figure 1.1 A decision tree for the data in Table 1.1
Swollen Glands?
  Yes → Diagnosis = Strep Throat
  No  → Fever?
          Yes → Diagnosis = Cold
          No  → Diagnosis = Allergy

Table 1.2 • Data Instances with an Unknown Classification

Patient ID# | Sore Throat | Fever | Swollen Glands | Congestion | Headache | Diagnosis
11 | No  | No  | Yes | Yes | Yes | ? (Answer: Strep throat)
12 | Yes | Yes | No  | No  | Yes | ? (Answer: Cold)
13 | No  | No  | No  | No  | Yes | ? (Answer: Allergy)

Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy

Unsupervised Clustering
A data mining method that builds models from data without predefined classes.

Table 1.3 • Acme Investors Incorporated

Customer ID | Account Type | Margin Account | Transaction Method | Trades/Month | Sex | Age | Favorite Recreation | Annual Income
1005 | Joint      | No  | Online | 12.5 | F | 30–39 | Tennis  | 40–59K
1013 | Custodial  | No  | Broker | 0.5  | F | 50–59 | Skiing  | 80–99K
1245 | Joint      | No  | Online | 3.6  | M | 20–29 | Golf    | 20–39K
2110 | Individual | Yes | Broker | 22.3 | M | 30–39 | Fishing | 40–59K
1001 | Individual | Yes | Online | 5.0  | M | 40–49 | Golf    | 60–79K

Note: Margin Account = an account used for margin (borrowed-funds) trading.

Three groups were formed (Table 1.3 shows only part of the whole table):
G1. Margin Account = Yes and Age = 20–29 and Annual Income = 40–59K (accuracy = 80%, coverage = 0.5)
G2. Account Type = Custodial and Favorite Recreation = Skiing and Annual Income = 40–59K (accuracy = 95%, coverage = 0.35)
G3. Account Type = Joint and Trades/Month > 5 and Transaction Method = Online (accuracy = 82%, coverage = 0.65)

1.3 Is Data Mining Appropriate for My Problem?

Data Mining or Data Query?
• Shallow Knowledge (SQL)
• Multidimensional Knowledge (OLAP)
• Hidden Knowledge (DM)
• Deep Knowledge (human)

Data Mining vs. Data Query: An Example
• Use a data query if you already have a good idea of what you are looking for.
• Use data mining to find regularities in data that are not obvious.

1.4 Expert Systems or Data Mining?

[Figure 14-2 Detailed architecture of an expert system]

Expert System
A computer program that emulates the problem-solving skills of one or more human experts.

Knowledge Engineer
A person trained to interact with an expert in order to capture their knowledge.

[Figure 1.2 Data mining vs. expert systems. Data mining: Data → Data Mining Tool → "If Swollen Glands = Yes Then Diagnosis = Strep Throat". Expert system: Human Expert → Knowledge Engineer → Expert System Building Tool → "If Swollen Glands = Yes Then Diagnosis = Strep Throat".]

1.5 A Simple Data Mining Process Model

[Figure 1.3 A simple data mining process model: Operational Database → Data Warehouse → SQL Queries → Data Mining → Interpretation & Evaluation → Result Application]

Assembling the Data
• The Data Warehouse
• Relational Databases and Flat Files
Mining the Data
Interpreting the Results
Result Application

1.6 Why Not Simple Search?
• Nearest Neighbor Classifier (i.e., CBR: a new instance is added to a class based on similarity)
  – Time-consuming, and entropy-independent (it does not use entropy to judge which attributes matter)
• K-nearest Neighbor Classifier
  – Forms a class consisting of the K nearest neighbors

Assignment 4
Use the training data of Table 1.1.
A new instance: Patient ID = 14, Sore Throat = Yes, Fever = No, Swollen Glands = No, Congestion = No, Headache = No.

Comparison with Table 1.1:
• One matched attribute: ID = 1, 9
• Two matched attributes: ID = 2, 5, 10
• Three matched attributes: ID = 3, 6, 7, 8
• Four matched attributes: ID = 4 → Strep throat?
Using the decision tree, the correct diagnosis should be Allergy.
Q: Try the K-nearest Neighbor Classifier.

1.7 Data Mining Applications

Customer Intrinsic Value
[Figure 1.4 Intrinsic vs. actual customer value (axes: Intrinsic (Predicted) Value, Actual Value)]
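For the Assignment 4 question in Section 1.6, the nearest-neighbor comparison can be reproduced with a short Python sketch. It assumes that similarity is simply the number of attributes with identical Yes/No values (our assumption, consistent with the match counts listed for patient 14) and breaks ties between equally similar patients by lower patient ID. The names matches, k_nearest and knn_vote are hypothetical helpers, not from the text.

```python
from collections import Counter

# Table 1.1: patient ID -> (Sore Throat, Fever, Swollen Glands,
#                           Congestion, Headache), Diagnosis
TABLE_1_1 = {
    1: (("Yes", "Yes", "Yes", "Yes", "Yes"), "Strep throat"),
    2: (("No", "No", "No", "Yes", "Yes"), "Allergy"),
    3: (("Yes", "Yes", "No", "Yes", "No"), "Cold"),
    4: (("Yes", "No", "Yes", "No", "No"), "Strep throat"),
    5: (("No", "Yes", "No", "Yes", "No"), "Cold"),
    6: (("No", "No", "No", "Yes", "No"), "Allergy"),
    7: (("No", "No", "Yes", "No", "No"), "Strep throat"),
    8: (("Yes", "No", "No", "Yes", "Yes"), "Allergy"),
    9: (("No", "Yes", "No", "Yes", "Yes"), "Cold"),
    10: (("Yes", "Yes", "No", "Yes", "Yes"), "Cold"),
}


def matches(a, b):
    """Assumed similarity: count of attributes whose values are identical."""
    return sum(x == y for x, y in zip(a, b))


def k_nearest(query, k):
    """IDs of the k most similar patients; ties broken by lower patient ID."""
    ranked = sorted(TABLE_1_1,
                    key=lambda pid: (-matches(query, TABLE_1_1[pid][0]), pid))
    return ranked[:k]


def knn_vote(query, k):
    """Tally the diagnoses of the k nearest neighbours."""
    return Counter(TABLE_1_1[pid][1] for pid in k_nearest(query, k))


patient_14 = ("Yes", "No", "No", "No", "No")
for pid, (values, _) in TABLE_1_1.items():
    print(pid, matches(patient_14, values))
print(knn_vote(patient_14, 1))  # single nearest neighbour is patient 4
print(knn_vote(patient_14, 5))
```

With k = 1 the sketch reproduces the slide's result (patient 4, strep throat). With k = 5 the nearest patients are 4, 3, 6, 7 and 8, and the vote ties between strep throat and allergy (two each), so any K-nearest-neighbor answer also depends on how K and tie-breaking are chosen.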