SI 654 Database Application Design
Winter 2003
Dragomir R. Radev
© 2002 by Prentice Hall

Data Mining (continued)

arff files
    @relation weather
    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature real
    @attribute humidity real
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}
    @data
    sunny,85,85,FALSE,no
    sunny,80,90,TRUE,no
    overcast,83,86,FALSE,yes
    rainy,70,96,FALSE,yes
    rainy,68,80,FALSE,yes
    rainy,65,70,TRUE,no
    overcast,64,65,TRUE,yes
    sunny,72,95,FALSE,no
    sunny,69,70,FALSE,yes
    rainy,75,80,FALSE,yes
    sunny,75,70,TRUE,yes
    overcast,72,90,TRUE,yes
    overcast,81,75,FALSE,yes
    rainy,71,91,TRUE,no

Predictive models
• Inputs (e.g., medical history, age)
• Output (e.g., whether the patient will experience any side effects)
• Some models are better than others

Operating curves
[Figure: curves labeled optimal, practical, and random; axis labels: success/failure and most likely/least likely]

Principles of data mining
• Training/test sets
• Error analysis and overfitting
  [Figure: error plotted against input size, with curves labeled test and training]
• Cross-validation
• Supervised vs. unsupervised methods

Representing data
• Vector space
[Figure: axes labeled credit and salary; points labeled pay off and default]

Decision surfaces
[Figure: axes labeled credit and salary; points labeled pay off and default]

Decision trees
[Figure: axes labeled credit and salary; points labeled pay off and default]

Linear boundary
[Figure: axes labeled credit and salary; points labeled pay off and default]

kNN models
• Assign each element to the class of its k nearest labeled neighbors
• Demos:
  – http://www2.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html

Other methods
• Decision trees
• Neural networks
• Support vector machines
• Demos
  – http://www.cs.technion.ac.il/~rani/LocBoost/

Weka
http://www.cs.waikato.ac.nz/ml/weka
Methods:
• rules.ZeroR
• bayes.NaiveBayes
• trees.j48.J48
• lazy.IBk
• trees.DecisionStump

kMeans clustering
• http://www.cs.mcgill.ca/~bonnef/project.html
• http://www.cs.washington.edu/research/imagedatabase/demo/kmcluster/
• http://www2.cs.cmu.edu/~dellaert/software/
• java weka.clusterers.SimpleKMeans -t data/weather.arff

More useful pointers
• http://www.kdnuggets.com/
• http://www.twocrows.com/booklet.htm
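
Worked example (illustrative sketch, not part of the original slides)
The ARFF weather data, the kNN slide, and the cross-validation principle can be tied together in a few lines of Python. The sketch below parses the small ARFF subset shown on the slide, classifies each day by majority vote among its k nearest neighbors, and estimates accuracy with leave-one-out cross-validation. The function names (parse_arff, distance, knn_predict, leave_one_out_accuracy) are hypothetical, and the distance measure (range-normalized difference for real attributes, 0/1 mismatch for nominal ones) is an assumption chosen for simplicity. It is not Weka's code and is not claimed to match the output of lazy.IBk.

```python
from collections import Counter

# The weather relation from the "arff files" slide.
ARFF = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
"""


def parse_arff(text):
    """Parse only the ARFF subset used above: 'real' and nominal attributes."""
    attributes = []  # list of (name, kind), kind is "real" or "nominal"
    rows = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            rows.append([float(v) if kind == "real" else v
                         for v, (_, kind) in zip(values, attributes)])
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            kind = "real" if spec.lower() == "real" else "nominal"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


def distance(a, b, attributes, ranges):
    """Assumed mixed distance; the last attribute (the class) is ignored."""
    total = 0.0
    for i, (_, kind) in enumerate(attributes[:-1]):
        if kind == "real":
            lo, hi = ranges[i]
            total += abs(a[i] - b[i]) / (hi - lo) if hi > lo else 0.0
        else:
            total += 0.0 if a[i] == b[i] else 1.0
    return total


def knn_predict(train, query, attributes, k=3):
    """Majority vote among the k training rows closest to the query."""
    ranges = {i: (min(r[i] for r in train), max(r[i] for r in train))
              for i, (_, kind) in enumerate(attributes) if kind == "real"}
    nearest = sorted(
        train, key=lambda r: distance(r, query, attributes, ranges))[:k]
    votes = Counter(r[-1] for r in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(rows, attributes, k=3):
    """Hold out each row once, train on the rest, and count correct votes."""
    hits = 0
    for i, held_out in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += knn_predict(train, held_out, attributes, k) == held_out[-1]
    return hits / len(rows)


if __name__ == "__main__":
    attributes, rows = parse_arff(ARFF)
    print("leave-one-out accuracy:", leave_one_out_accuracy(rows, attributes))
```

With only 14 examples, the leave-one-out estimate is noisy. The point is the mechanics: no row is ever classified by a model that saw it during training, which is what separates test error from training error.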