Weka: An open-source tool for data analysis and mining with machine learning
Quantitative Data Analysis Colloquium
Centenary College of Louisiana
Mark Goadrich
4/17/2008

Regression lines and correlation
• Find relationship between two attributes
• Correlation coefficient

Categorization
• Can we learn one category based on the others?
• This search for classification lines is called machine learning

Data Sets
• House of Representatives Votes
• Labor Relations
• Iris (plant)
• Discrimination
• Breast Cancer
Many more at http://archive.ics.uci.edu/ml/
• Table of Features
  – Example is a row
  – Features are discrete or continuous

Weka Time - Explore
• http://www.cs.waikato.ac.nz/ml/weka/
• Open Explorer
• Open Data File
  – ARFF or CSV
• Visualize All
• Visualize Crosstabs

Discrete: Decision Trees
• Reduce confusion (entropy) in the data by drawing recursive lines
• Result is comprehensible to humans

Continuous: ANN and SVM
• Artificial Neural Networks simulate activating and thresholding neurons
• Support Vector Machines use a kernel to transform data to higher dimensions

Weka Time - Classify
• Choose Algorithm
  – J48, Multilayer Perceptron, SMO
• Validate Learning
  – Training set
  – Cross validation
• Visualize output
  – ROC Curves
  – Precision-Recall Curves

Future Topics
• Clustering
  – Number and makeup of categories unknown
• Relational Data
  – Features are related within examples
  – Features are related across examples
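
The idea from the decision-tree slide, reducing confusion (entropy) by splitting the data, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Weka's code: the function names `entropy` and `information_gain` and the toy voting data are assumptions made up for this example. It measures how much the entropy of the class labels drops when the examples are split on one discrete feature.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature_index):
    """Drop in label entropy after splitting on one discrete feature."""
    total = len(labels)
    # Group the class labels by the value each example has for the feature.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    # Weighted average entropy of the groups produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy table: each row is an example, each column a discrete
# feature (two yes/no votes); the label is the category to be learned.
rows = [("y", "y"), ("y", "n"), ("n", "y"), ("n", "n")]
labels = ["dem", "dem", "rep", "rep"]

print(information_gain(rows, labels, 0))  # first vote separates the parties
print(information_gain(rows, labels, 1))  # second vote tells us nothing
```

A tree learner would pick the feature with the largest gain, split on it, and repeat on each resulting group. J48 in Weka implements C4.5, which uses a refinement of this criterion, so the sketch shows the principle rather than J48's exact behavior.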