Application of Data Mining Techniques on Survey Data using R and Weka
Supunmali Ahangama
29/11/2013

Outline
* Introduction to data mining in R
* Introduction to data mining in Weka
* Example

What is R?

Why Learn R?
* R offers more analytical methods, with over 1000 add-on packages now available
* R is far more flexible in the types of data it can analyze
* R's procedures (functions) are open for you to inspect and modify
* R is free

The Popularity of R is Growing Fast
* #1 most used data mining tool in both 2010 and 2011, up from #5 in 2007
* An increasing number of data miners consider R their primary tool: #2 in the primary-tool rankings in 2011, up from #7 in 2008
Reference: Rexer Analytics 2011 Data Miner Survey Summary Report

Data Mining Software
Reference: Rexer Analytics 2011 Data Miner Survey Summary Report

Graphical User Interfaces (GUIs)
* RStudio
* R Commander
* Rattle
* Deducer
* Revolution Analytics
Reference: Rexer Analytics 2011 Data Miner Survey Summary Report

Rattle 2.6.26: Installation
Start R (v3.0.2) and then run:
> install.packages("rattle")
> library("rattle")
> rattle()
Artificial Neural Network (ANN) package: neuralnet 1.32

Weka
* Waikato Environment for Knowledge Analysis
* A collection of machine learning algorithms and visualization tools
* Written in Java
* RWeka: an R interface to Weka

Data Set (Y. Hayashi & R. Setiono 2010)
Aim: To discover factors that could be used to distinguish consumers who eat out frequently from those who do not.
The survey was conducted in major cities in Taiwan in 2003 among consumers aged 15 to 64 years.
Target: Class 1 if the respondent's eating-out frequency is less than 25 times per month on average; Class 2 otherwise.
Predictor variables: the respondent's eating-out considerations and personal characteristics (socio-demographics, psychological information).

Methodology: ANN
* ANN: Artificial Neural Network
* A set of connected input and output units in which each connection has an associated weight
* The network learns by adjusting the weights so that it can predict the correct class label of the input tuples

ANN: A Black Box

Tools
* Rattle GUI
* Weka

Techniques for Ferreting Out Information from a Trained ANN
* Sensitivity analysis
  * Probe the ANN with test inputs and record the outputs
  * Determine the impact or effect of an input variable on the output: hold the other inputs at some fixed value (e.g. the mean or median), vary only that input, and monitor the change in the output
  * Measure the degree to which each input contributes to the output error: the input associated with the largest error has the largest impact
* Rule extraction

Sensitivity Analysis
Relative Importance
where w_ji = weight from the ith input node to the jth hidden node, and w_kj = weight from the jth hidden node to the kth output node.

In a Nutshell
* Appreciation of R as a leading statistical tool
* How the Rattle GUI and Weka can be used for data mining
* How an ANN can be applied to a consumer behaviour study
* Identification of the relationship between the predictors and the dependent variable through sensitivity analysis

Thank You.
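The perturbation-style sensitivity analysis described in the slides (hold the other inputs at a fixed value, vary one input, watch the output) can be sketched in base R. This is a minimal illustration under stated assumptions: the 3-input, 2-hidden-node network and its weights are hypothetical stand-ins, not a model trained on the Taiwan survey, and summarising each input's effect as the range of the output over the probe grid is one choice among several. In practice, `predict_net` would be replaced by the prediction function of a trained model, such as one fitted with neuralnet.

```r
# Logistic activation used by the hypothetical network below
sigmoid <- function(z) 1 / (1 + exp(-z))

# HYPOTHETICAL weights for illustration only (not from the study).
# Rows = inputs i, columns = hidden nodes j, so W_hidden[i, j] plays the role of w_ji.
W_hidden <- matrix(c( 1.5, -0.8,
                      0.2,  0.4,
                     -1.0,  2.0),
                   nrow = 3, byrow = TRUE)
b_hidden <- c(0.1, -0.3)
w_output <- c(1.2, -0.7)   # hidden node j -> single output (w_kj)
b_output <- 0.05

# Forward pass: inputs -> sigmoid hidden layer -> sigmoid output
predict_net <- function(X) {
  H <- sigmoid(sweep(X %*% W_hidden, 2, b_hidden, "+"))
  as.vector(sigmoid(H %*% w_output + b_output))
}

# For each input: fix all inputs at their median, sweep that one input
# across its observed range, and record how far the output moves.
sensitivity_profile <- function(predict_fn, X, steps = 11) {
  baseline <- apply(X, 2, median)
  sapply(seq_len(ncol(X)), function(i) {
    grid  <- seq(min(X[, i]), max(X[, i]), length.out = steps)
    probe <- matrix(baseline, nrow = steps, ncol = ncol(X), byrow = TRUE)
    probe[, i] <- grid                      # vary only input i
    out <- predict_fn(probe)
    max(out) - min(out)                     # size of the output response
  })
}

# Synthetic probe data (hypothetical), three inputs on [0, 1]
set.seed(1)
X <- matrix(runif(300), ncol = 3)
colnames(X) <- c("x1", "x2", "x3")

s <- sensitivity_profile(predict_net, X)
names(s) <- colnames(X)
print(round(s, 3))
```

Inputs with a larger output range are the ones the network responds to most strongly. Using the mean instead of the median as the fixed baseline is an equally valid variant of the same procedure.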
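The relative-importance equation on the sensitivity analysis slide did not survive extraction. One widely used formula built from exactly the weights that slide defines (w_ji and w_kj) is Garson's algorithm; whether the slide used this particular formula is an assumption. The sketch below assumes a single hidden layer and a single output node, and uses hypothetical example weights rather than weights from the study.

```r
# Garson-style relative importance for a single-hidden-layer, single-output network.
# W_in[i, j] = w_ji (input i -> hidden node j); w_out[j] = w_kj (hidden j -> output k).
garson_importance <- function(W_in, w_out) {
  A <- abs(W_in)
  # Share of hidden node j's incoming absolute weight that comes from input i
  share <- sweep(A, 2, colSums(A), "/")
  # Weight each share by the magnitude of that hidden node's link to the output
  contrib <- sweep(share, 2, abs(w_out), "*")
  ri <- rowSums(contrib)
  ri / sum(ri)                                # normalise so importances sum to 1
}

# HYPOTHETICAL weights for illustration only
W_in <- matrix(c( 1.5, -0.8,
                  0.2,  0.4,
                 -1.0,  2.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("x1", "x2", "x3"), c("h1", "h2")))
w_out <- c(1.2, -0.7)

ri <- garson_importance(W_in, w_out)
print(round(ri, 3))
```

Because Garson's algorithm uses absolute weights, it ranks inputs by magnitude of influence but says nothing about the direction of the effect; the perturbation approach in the slides is needed to see whether an input pushes the output up or down.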