Data Mining
Tri Nguyen

Agenda
  - Data Mining As Part of KDD
  - Decision Tree
  - Association Rules
  - Clustering
  - Amazon Data Mining Examples

Data Mining and KDD
  - Putting the results in practical use

What is Data Mining?
  - “the automated extraction of hidden predictive information from large databases”
  - Algorithms produce patterns, rules
  - Predict future trends/behavior
  - Used to make business decisions

Classification
  - Items belong to classes
  - Given past items’ classification, predict the class of a new item
  - Example: Issuing credit cards
      - Use information: income, educational background, age, current debts
      - Credit worthiness: bad, good, excellent

Decision Tree Classifiers
  - Internal node has a predicate
  - Leaf node is a class
  - To classify an instance
      - Start at the root node
      - Traverse the tree until reaching a leaf node
      - At each internal node, make a decision

Credit Risk Decision Tree

Decision Tree Construction
  - Some Definitions
      - Purity: the more instances in each leaf that belong to only one class, the greater the purity
      - Best Split: the split giving the maximum information gain ratio (info gain / info content)
  - Choose the attribute and condition resulting in maximum purity

Association Rules
  - antecedent → consequent
      - if → then
      - beer → diaper (Walmart)
      - economy bad → higher unemployment
      - higher unemployment → higher unemployment benefits cost
  - Rules are associated with population, support, confidence
  - Population: instances such as grocery store purchases
  - Support: % of population satisfying antecedent and consequent
  - Confidence: % of cases where the consequent is true when the antecedent is true
  - Example
      - Population: MS, MSA, MSB, MA, MB, BA (M=Milk, S=Soda, A=Apple, B=Beer)
      - Support (MS) = 3/6: (MS, MSA, MSB) / (MS, MSA, MSB, MA, MB, BA)
      - Confidence (MS) = 3/5: (MS, MSA, MSB) / (MS, MSA, MSB, MA, MB)

Clustering
  - “The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.”
  - Birch Algorithm
      - Points are inserted into a multidimensional tree
      - Items are guided to leaf nodes "near" representative internal nodes
      - Nearby points are clustered into one leaf node
  - Example of Clustering: predict what new movies a person is interested in, based on
      1) the person’s past movie preferences
      2) others with similar preferences
      3) preferences of those in the pool for new movies
  - Approach
      1) Cluster people with similar movie preferences
      2) Given a new moviegoer, find a cluster of similar moviegoers
      3) Then predict the cluster's new movie preferences

Amazon Examples

References
  - http://www.thearling.com/text/dmwhite/dmwhite.htm
  - http://www.cse.ohio-state.edu/~srini/694Z/part1.ppt
  - http://www-aig.jpl.nasa.gov/public/kdd95/tutorials/IJCAI95tutorial.html
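
Worked sketch: support and confidence

The support and confidence figures in the Association Rules example (population MS, MSA, MSB, MA, MB, BA) can be checked with a few lines of Python. This is a minimal sketch rather than a production rule miner: the function names `support` and `confidence` and the representation of each purchase as a set of single-letter items are assumptions made for illustration, and "MS" is read here as the rule Milk → Soda.

```python
from fractions import Fraction

# Population from the slide: M=Milk, S=Soda, A=Apple, B=Beer.
# Each purchase is modeled as a set of item letters (an illustrative choice).
transactions = [set("MS"), set("MSA"), set("MSB"), set("MA"), set("MB"), set("BA")]


def support(antecedent, consequent, population):
    """Fraction of the population containing both antecedent and consequent."""
    items = set(antecedent) | set(consequent)
    matches = sum(1 for t in population if items <= t)
    return Fraction(matches, len(population))


def confidence(antecedent, consequent, population):
    """Fraction of purchases containing the antecedent that also contain the consequent."""
    ante = set(antecedent)
    both = ante | set(consequent)
    ante_count = sum(1 for t in population if ante <= t)
    if ante_count == 0:
        raise ValueError("antecedent never occurs in the population")
    both_count = sum(1 for t in population if both <= t)
    return Fraction(both_count, ante_count)


print(support("M", "S", transactions))     # 1/2, i.e. 3/6 in lowest terms
print(confidence("M", "S", transactions))  # 3/5
```

Using `Fraction` keeps the results exact, so they can be compared directly with the 3/6 and 3/5 on the slide; the only difference is that 3/6 is printed in lowest terms.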