Data Mining and Decision Tree
CS157B Spring 2006
Masumi Shimoda

Outline
• Brief introduction to data mining
  • Definition
  • Objective
  • Application
• Decision tree

What is Data Mining?
• The process of automatically finding relationships and patterns in an enormous amount of data, and extracting meaning from it
• Also called “knowledge discovery”

Objective
• Extracting hidden or not easily recognizable knowledge from large data … Know the past
• Predicting what is likely to happen if a particular type of event occurs … Predict the future

Application
• Marketing example
  • A company sends direct mail to randomly chosen people
  • A database of recipients’ attribute data (e.g. gender, marital status, number of children) is available
  • How can this company increase the response rate of its direct mail?

Application (Cont’d)
• Figure out the patterns and relationships among the attributes that those who responded have in common
• This helps the company decide what kind of group of people it should target
• Data mining helps analyze large amounts of data and make decisions… but how exactly does it work?
• One commonly used method is the decision tree

Decision Tree
• One of many methods to perform data mining - particularly classification
• Divides the dataset into multiple groups by evaluating attributes
• A decision tree can be expressed as a series of nested if-then-else statements

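To make the "nested if-then-else" view concrete, here is a minimal sketch based on the direct mail example. The function name, the attribute values ("married", "female") and the rules themselves are hypothetical, written by hand for illustration only; they are not derived from any real response data.

```python
def likely_to_respond(gender: str, marital_status: str, num_children: int) -> str:
    """Classify a direct mail recipient using hand-written, hypothetical rules."""
    # Root test: marital status (assumed values: "married" or anything else).
    if marital_status == "married":
        # Second-level test on this branch: number of children.
        if num_children > 0:
            return "respond"
        else:
            return "no response"
    else:
        # Second-level test on the other branch uses a different attribute.
        if gender == "female":
            return "respond"
        else:
            return "no response"
```

Each `if` corresponds to a test on one attribute, and each `return` corresponds to a final class. Calling `likely_to_respond("male", "married", 2)` follows the first branch twice and returns `"respond"`.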
Decision Tree (Cont’d)
• Each non-leaf node has an associated predicate that tests an attribute of the data
• Each leaf node represents a class, or category
• To classify a data item, start from the root node and traverse down the tree by testing predicates and taking the matching branches

Example of Decision Tree

Advantages of Decision Tree
• Easy to visualize the process of classification
  • Can easily tell why the data is classified in a particular category - just trace the path to the leaf, and the path explains the reason
• Simple, fast processing
  • Once the tree is built, just traverse down the tree to classify the data

Decision Tree is for…
• Classifying a dataset in which
  • The predicates return discrete values
  • No attribute has the same value for all of the data
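The traversal described above (start at the root, test the predicate, take the matching branch, stop at a leaf) can be sketched as follows. This is only an illustration under stated assumptions: the `Node`, `Leaf` and `classify` names, the record fields, and the small hand-built tree are all hypothetical, and the tree is not learned from data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # the class, or category, this leaf represents


@dataclass
class Node:
    # The predicate tests one attribute and returns a discrete value.
    predicate: Callable[[dict], str]
    # One branch per possible predicate outcome.
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree: Union[Node, Leaf], record: dict) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf; return the class and the path taken."""
    path: List[str] = []
    node = tree
    while isinstance(node, Node):
        outcome = node.predicate(record)  # test the predicate
        path.append(outcome)              # remember which branch was taken
        node = node.branches[outcome]     # move down to the child
    return node.label, path


# Hypothetical hand-built tree for the direct mail example.
tree = Node(
    predicate=lambda r: r["marital_status"],
    branches={
        "married": Node(
            predicate=lambda r: "has children" if r["num_children"] > 0 else "no children",
            branches={
                "has children": Leaf("respond"),
                "no children": Leaf("no response"),
            },
        ),
        "single": Leaf("no response"),
    },
)

label, path = classify(tree, {"marital_status": "married", "num_children": 2})
```

Here `label` is `"respond"` and `path` is `["married", "has children"]`. The returned path is the explanation of why the record landed in that class, and classifying a record costs only one predicate test per level of the tree.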