Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
WEKA A Data Mining Tool By Susan L. Miertschin 1 Data Mining Task Types Numerous Algorithms Classification C4.5 Decision Tree Clustering K-Means K Means Clustering Discovering Association Rules Discovering Sequential Patterns – Sequence Analysis Regression R i Detecting Deviations from Normal 2 http://www cs waikato ac nz/ml/weka/ http://www.cs.waikato.ac.nz/ml/weka/ WEKA can be freelyy downloaded byy visitingg the Web site 3 WEKA – Data Mining Software Developed by the Machine Learning Group, University of Waikato , New Zealand Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to realreal world data-mining problems Developed p in Java J 4 WELA’s Collection of Machine Learning Alg ith Algorithms Algorithms for data mining tasks WEKA is open source software issued under the GNU General Public License Tools T l ffor: Data pre-processing Classification Regression Clustering Association rules Visualization 5 After Installing - Start WEKA 6 WEKA Main Interface 7 WEKA Sample Files C:\Program Files\weka\data WEKA formatted files (.arff) Open the contact-lenses file 8 Example – Contact Lens Data How manyy data instances are in the file? How many attributes? Numerical attributes? Categorical attributes? 9 Example – Contact Lens Data Can you think of problems that might be solved with ith thi this data? 10 Example – Contact Lens Data If supervised learning were to be done, which would be the output attribute, attribute do you think? 11 Example – Contact Lens Data 12 Example – Contact Lens Data 13 Example – Contact Lens Data 14 E Example l – Classify Cl if - Contact C t t Lens L Data D t 15 E Example l – Classify Cl if - Contact C t t Lens L Data D t 16 E Example l – Classify Cl if - Contact C t t Lens L Data D t Select the rule generator named PART from th list the li t that th t shows up after you select Choose 17 E Example l – Classify Cl if - Contact C t t Lens L Data D t 18 E Example l – Classify Cl if - Contact C t t Lens L Data D t 19 10 Fold Cross-Validation 10-Fold Cross Validation Data is partitioned into 10 equally (or nearly equally) sized segments or folds 10 iterations of training and validation are completed In I eachh iteration a different d ff ffold ld off the h data d is held h ld out ffor validation, with the remaining 9 folds used for learning 20 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf E Example l – Classify Cl if - Contact C t t Lens L Data D t 21 E Example l – Classify Cl if - Contact C t t Lens L Data D t IF tear tear-prodprod rate = reduced THEN contactllenses = none IF astigmatism tig ti = no THEN co tact e ses contact-lenses = soft 22 E Example l – Classify Cl if - Contact C t t Lens L Data D t Coverage = 12 23 E Example l – Classify Cl if - Contact C t t Lens L Data D t IF tear tear-prodprod rate = reduced THEN contactllenses = none IF astigmatism tig ti = no THEN co tact e ses contact-lenses = soft 24 E Example l – Classify Cl if - Contact C t t Lens L Data D t Coverage = 6 Misclassification = 1 Accuracy = 5/6 = 83.3% 25 E Example l – Classify Cl if - Contact C t t Lens L Data D t 26 WEKA A Data Mining Tool By Susan L. Miertschin 27