Download R - Shengdong Zhao

Application of Data Mining Techniques on Survey Data using R and Weka
Supunmali Ahangama
29/11/2013
Outline
 Introduction to data mining in R
 Introduction to data mining in Weka
 Example
What is R?
Why Learn R?
 R offers more analytical methods, and over 1000 add-on packages are now available
 R is far more flexible in the type of data it can analyze
 R’s procedures (functions) are open for you to see and modify
 R is free
The Popularity of R is Growing Fast
 #1 most used data mining tool (in both 2010 and 2011), up from #5 in 2007
 An increasing number of data miners consider R their primary tool
 #2 in primary tool rankings (in 2011), up from #7 in 2008
Reference: Rexer Analytics 2011 Data Miner Survey summary report
Data Mining Software
Reference: Rexer Analytics 2011 Data Miner Survey summary report
Graphical User Interface (GUI)
 R Studio
 R Commander
 Rattle
 Deducer
 Revolution Analytics
Rattle 2.6.26 - Installation
 Start R (v3.0.2) and then run:
> install.packages("rattle")
> library("rattle")
> rattle()
Artificial Neural Network (ANN) package: neuralnet 1.32
Weka
 Waikato Environment for Knowledge Analysis
 A collection of machine learning algorithms and visualization tools
 Written in Java
 RWeka: an R interface for Weka
Data set (Y. Hayashi & R. Setiono 2010)
 Aim: To discover factors that could be used to distinguish consumers who eat out frequently from those who do not
 Survey conducted in major cities in Taiwan in 2003 among consumers aged 15 to 64 years
 Target:
   Class 1: the respondent eats out fewer than 25 times per month on average
   Class 2: otherwise
 Predictor variables: the respondent’s eating-out considerations and personal characteristics (socio-demographics, psychological information)
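The target definition above can be sketched in a few lines of base R. The data frame, its column names, and the response values are hypothetical and serve only to illustrate the threshold of 25 eating-out occasions per month; they are not taken from the survey.

```r
# Hypothetical survey responses: eating-out occasions per month (made-up values).
survey <- data.frame(
  respondent = 1:5,
  eat_out_per_month = c(3, 24, 25, 40, 12)
)

# Class 1: fewer than 25 times per month on average; Class 2: otherwise.
survey$class <- factor(
  ifelse(survey$eat_out_per_month < 25, 1, 2),
  levels = c(1, 2)
)

print(table(survey$class))
```

Storing the target as a factor is a design choice that lets most R classifiers treat it as a class label rather than a number.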
Methodology: ANN
 ANN: Artificial Neural Network
 A set of connected input and output units in which each connection has a weight associated with it
 The network learns by adjusting the weights so that it can predict the correct class label of the input tuples
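The idea of learning by adjusting weights can be illustrated with a minimal base R sketch. It is a simplification under stated assumptions: the inputs connect directly to a single logistic output unit (no hidden layer), the data are synthetic, and the learning rate and number of epochs are arbitrary choices. It is not the network used in the study.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Synthetic two-input data set with a 0/1 class label (assumption, for illustration).
set.seed(42)
n <- 200
X <- cbind(bias = 1, x1 = runif(n), x2 = runif(n))  # bias column plus two inputs
y <- as.numeric(X[, "x1"] + X[, "x2"] > 1)

w <- rep(0, ncol(X))   # one weight per connection to the output unit
rate <- 0.5            # learning rate (assumed value)

# Cross-entropy loss of the current weights on the training data.
loss <- function(w) {
  p <- as.vector(sigmoid(X %*% w))
  -mean(y * log(p) + (1 - y) * log(1 - p))
}
initial_loss <- loss(w)

for (epoch in 1:500) {
  p <- as.vector(sigmoid(X %*% w))   # forward pass: predicted probability of class 1
  grad <- t(X) %*% (p - y) / n       # gradient of the loss with respect to the weights
  w <- w - rate * as.vector(grad)    # adjust the weights against the gradient
}

final_loss <- loss(w)
accuracy <- mean(as.vector(sigmoid(X %*% w) > 0.5) == (y == 1))
cat("loss:", initial_loss, "->", final_loss, " accuracy:", accuracy, "\n")
```

A network with hidden layers follows the same loop, with the gradient propagated back through each layer.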
ANN
Black box
Tools
 Rattle GUI
 Weka
Techniques for Ferreting Out Information from Trained ANN
 Sensitivity analysis
   Probe the ANN with test inputs and record the outputs
   Determine the impact of each input variable on the output
   Hold the other inputs at a fixed value (e.g. mean or median) and vary only the input of interest while monitoring the change in the output
   This gives a measure of the degree to which each input contributes to the output error
   The largest error indicates the largest impact
 Rule extraction
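The probing procedure for sensitivity analysis described above can be sketched in base R. Everything here is illustrative: `predict_ann` is a hypothetical stand-in for a trained network with made-up weights, the data are synthetic, and the impact score (the range of the output as one input varies) is a simple proxy chosen for this sketch rather than the error-based measure named on the slide.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical stand-in for a trained ANN: one hidden layer with fixed, made-up weights.
W1 <- matrix(c( 2.0, -1.0, 0.0,
               -0.5,  1.5, 0.1), nrow = 2, byrow = TRUE)  # hidden nodes x inputs
b1 <- c(0.1, -0.2)
w2 <- c(1.5, -2.0)
b2 <- 0.3
predict_ann <- function(X) {
  H <- sigmoid(sweep(X %*% t(W1), 2, b1, "+"))  # hidden-layer activations
  as.vector(sigmoid(H %*% w2 + b2))             # network output
}

# Vary one input over its observed range while holding the others at their means.
sensitivity <- function(predict_fn, X, n_steps = 25) {
  base <- colMeans(X)
  sapply(seq_len(ncol(X)), function(i) {
    grid <- seq(min(X[, i]), max(X[, i]), length.out = n_steps)
    probe <- matrix(base, nrow = n_steps, ncol = ncol(X), byrow = TRUE)
    probe[, i] <- grid
    out <- predict_fn(probe)
    max(out) - min(out)  # output range as a simple impact score
  })
}

set.seed(1)
X <- matrix(runif(300), ncol = 3, dimnames = list(NULL, c("x1", "x2", "x3")))
impact <- sensitivity(predict_ann, X)
names(impact) <- colnames(X)
print(round(impact, 3))
```

Because `sensitivity` only needs a prediction function, the same probe could in principle be pointed at any fitted model that accepts a numeric matrix.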
Sensitivity Analysis
Relative Importance [formula not recovered], where
w_ji = weight from the i-th input node to the j-th hidden node
w_kj = weight from the j-th hidden node to the k-th output node
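One commonly used weight-based measure of relative importance is in the style of Garson's algorithm. The sketch below is an assumption about what such a calculation might look like: the formula on the slide is not available, so this may differ from the one the author used. The weight values are made up, and the function name `relative_importance` is hypothetical.

```r
# Weight-based relative importance in the style of Garson's algorithm (assumed method).
# W_ih: input-to-hidden weights, one row per hidden node j and one column per
#       input i, so W_ih[j, i] corresponds to w_ji.
# w_ho: hidden-to-output weights for a single output node k (w_kj).
relative_importance <- function(W_ih, w_ho) {
  stopifnot(nrow(W_ih) == length(w_ho))
  A <- abs(W_ih)
  share <- A / rowSums(A)                 # share of each input within each hidden node
  contrib <- colSums(share * abs(w_ho))   # weight by |w_kj|, then sum over hidden nodes
  contrib / sum(contrib)                  # normalise so the importances sum to 1
}

# Hypothetical weights: 2 hidden nodes, 3 inputs.
W_ih <- matrix(c(1.0, -1.0, 2.0,
                 0.5,  0.5, 1.0), nrow = 2, byrow = TRUE)
w_ho <- c(2.0, -1.0)

ri <- relative_importance(W_ih, w_ho)
print(round(ri, 3))
```

Taking absolute values discards the sign of each weight, so this measure indicates how strongly an input matters but not the direction of its effect.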
In a nutshell
 Appreciation of R as a leading statistical tool
 How the Rattle GUI and Weka can be used for data mining
 How ANN can be applied to the study of consumer behaviour
 Identification of the relationship between the predictors and the dependent variable through sensitivity analysis
Thank You.