2nd May
2011
LAB 9: Data Mining Technique: Classification
Statement of Purpose:
Today we continue last week's topic, Data Mining Techniques. As you
already know, data mining provides intelligence to a business by
extracting patterns and new rules from historical data.
The data mining technique we will look at today is Classification,
using the Decision Tree and Naïve Bayes algorithms.
Activity Outcomes:
The main purpose of this lab is to prepare students to use
classification techniques in RapidMiner, including building, testing,
and analyzing classification models with the decision tree and
Naïve Bayes algorithms.
Instructor Note:
Follow the instructions.
CPIS-342 - The Lab Note
Lab 9
• Today we use a sample data set from the RapidMiner repository.
• Open RapidMiner and go to Repositories.
• Go to Samples, then Data, and use the Golf data set.
• Here we will use the decision tree method.
• First, using Decision Tree
• In Operators, expand Modeling, then Classification and Regression,
then Tree Induction, and use the DecisionTree operator.
Observe the result.
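To make the result easier to interpret, here is a minimal Python sketch of how a tree learner can choose the attribute for the root node. It is only an illustration under stated assumptions: the rows are the nominal version of the classic play-golf weather table (not an export of RapidMiner's Golf sample, which stores temperature and humidity as numbers), the splitting criterion is plain information gain, and the names `ROWS`, `entropy` and `information_gain` are invented for this example. RapidMiner's operator may use a different criterion and also applies pruning.

```python
from collections import Counter
from math import log2

# Assumption: illustrative nominal rows in the style of the classic
# play-golf table; the last field of each row is the class label (play).
ATTRIBUTES = ["outlook", "temperature", "humidity", "wind"]
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Label entropy minus the weighted entropy after splitting on one column."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)
print(best_attribute, round(gains[best_attribute], 3))  # outlook 0.247
```

On these rows the outlook attribute gives the largest gain, so an information-gain learner would test it first and then repeat the same procedure on each subset.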
• Second, using Bayesian Modeling
• Import the Golf data set.
• In Operators, expand Modeling, then Classification and Regression,
then Bayesian Modeling, and use Naïve Bayes.
Observe the result.
Some theory about the Decision Tree and Naïve Bayes algorithms, for
better understanding:
Decision tree: decision tree learning, used in statistics, data mining, and machine learning,
uses a decision tree as a predictive model that maps observations about an item to
conclusions about the item's target value. More descriptive names for such tree models are
classification trees (when the target is a class label) or regression trees (when the target
is a numeric value). In these tree structures, leaves represent classifications and branches
represent conjunctions of features that lead to those classifications.
Figure: A tree showing survival of passengers on the Titanic ("sibsp" is the number of
spouses or siblings aboard). The figures under the leaves show the probability of survival
and the percentage of observations in the leaf.
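The statement that leaves are classifications and branches are conjunctions of feature tests can be sketched in a few lines of Python. The tree below is hand-written for golf-style data purely as an illustration; it is an assumption of this example, not the model RapidMiner produces, and the names `TREE`, `classify` and `rules` are hypothetical.

```python
# Assumption: a hand-written tree for illustration only.
# Internal nodes are (attribute, {value: subtree}); leaves are class labels.
TREE = ("outlook", {
    "overcast": "yes",
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "rainy": ("wind", {"true": "no", "false": "yes"}),
})


def classify(tree, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


def rules(tree, path=()):
    """Yield (conditions, label); each root-to-leaf path is a conjunction of tests."""
    if isinstance(tree, str):
        yield path, tree
        return
    attribute, branches = tree
    for value, subtree in branches.items():
        yield from rules(subtree, path + ((attribute, value),))


day = {"outlook": "sunny", "humidity": "normal", "wind": "true"}
print(classify(TREE, day))
for conditions, label in rules(TREE):
    print(" AND ".join(f"{a} = {v}" for a, v in conditions), "->", label)
```

Each printed line is one rule, for example "outlook = sunny AND humidity = high -> no", which is how a tree shown in a result view can be read.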
Naïve Bayes classifier: a simple probabilistic classifier based on applying Bayes'
theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more
descriptive term for the underlying probability model would be "independent feature
model".
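Written out, with a class C and features x_1, ..., x_n (this notation is chosen here for illustration and does not come from the lab note), Bayes' theorem and the independence assumption give:

```latex
% Bayes' theorem for a class C and features x_1, ..., x_n
P(C \mid x_1, \dots, x_n)
  = \frac{P(C)\, P(x_1, \dots, x_n \mid C)}{P(x_1, \dots, x_n)}

% Naive assumption: features are conditionally independent given the class
P(x_1, \dots, x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)

% Resulting decision rule (the denominator is the same for every class)
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```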
In simple terms, a naive Bayes classifier assumes that, given the class, the presence (or
absence) of a particular feature is unrelated to the presence (or absence) of any other
feature. For example, a fruit may be considered to be an apple if it is red, round, and about
4 inches in diameter. Even if these features depend on each other or upon the existence of
the other features, a naive Bayes classifier treats all of these properties as contributing
independently to the probability that this fruit is an apple.
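The same idea can be tried on golf-style data with a short Python sketch. It rests on several assumptions: the rows are the nominal version of the classic play-golf weather table rather than RapidMiner's Golf sample, probabilities are plain relative frequencies with no smoothing (a tool may apply a correction for unseen values), and the names `train` and `posterior` are made up for this example.

```python
from collections import Counter, defaultdict

# Assumption: illustrative nominal rows (outlook, temperature, humidity,
# wind, play); the last field is the class label.
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def train(rows):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (column, label) -> Counter of values
    for row in rows:
        for column, value in enumerate(row[:-1]):
            value_counts[(column, row[-1])][value] += 1
    return class_counts, value_counts


def posterior(model, example):
    """Return normalized P(label | example) under the independence assumption."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for column, value in enumerate(example):
            # Relative-frequency estimate of P(value | label); no smoothing.
            score *= value_counts[(column, label)][value] / count
        scores[label] = score
    evidence = sum(scores.values())
    return {label: score / evidence for label, score in scores.items()}


model = train(ROWS)
new_day = ("sunny", "cool", "high", "true")
probabilities = posterior(model, new_day)
print(max(probabilities, key=probabilities.get), probabilities)
```

For this day the sketch favours "no" with a probability of roughly 0.8. Each feature contributes its own factor to the product, regardless of how the features relate to one another, which is exactly the naive assumption described above.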