CES 514 Data Mining
Spring 2010
Homework #4
Due: April 14, 2010
1) Consider the training examples shown in the table below for a
binary classification problem (the last column being the class).
(a) What is the entropy of this collection of training
examples?
(b) What are the information gains of the attributes
Gender and Car Type?
(c) What are the gain ratios of Gender and Shirt Size?
(d) Apply the Naïve Bayes algorithm to determine the
probability that the data point shown below belongs to
class C1:

ID   Gender   Car Type   Shirt Size   Class
21   F        Family     Large        ?

(e) Apply the decision tree built using the algorithm
presented in class to determine the class to which the
above data point belongs.
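
The quantities asked for in parts (a) to (c) can be sketched in Python as below. This is a minimal illustration using the standard definitions (entropy in bits, information gain as entropy reduction, gain ratio as gain divided by split information). It may differ in detail from the algorithm presented in class. The function names and the four-row `toy` data set are assumptions made up for illustration and are not the homework table.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


def gain_ratio(rows, attr, target):
    """Information gain divided by the split information of attr."""
    split_info = entropy([r[attr] for r in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so the split is useless
    return information_gain(rows, attr, target) / split_info


# Hypothetical toy data, NOT the homework table.
toy = [
    {"Gender": "M", "Class": "C0"},
    {"Gender": "M", "Class": "C0"},
    {"Gender": "F", "Class": "C1"},
    {"Gender": "F", "Class": "C0"},
]

print(entropy([r["Class"] for r in toy]))
print(information_gain(toy, "Gender", "Class"))
print(gain_ratio(toy, "Gender", "Class"))
```

To apply the sketch to the homework, the rows of the training table would replace `toy`, with one dictionary per training example.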
2) You are given a data set that contains various attributes. The
last column contains the classification (0 or 1).
(a) Using Weka, create a decision tree based on 80% of
the data points and apply it to the remaining 20% of the
data points. Report the success % achieved. (The data set
can be found in the file hw4problem2dataset.txt in the
Home Work directory.)
(b) For the same data set as in Problem 2(a), apply the
Naïve Bayes algorithm. Use the same 80% of the data points
for training and determine the probability for each of the
remaining 20% to be classified as 0 or 1. Finally, calculate
the success % achieved.
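
The Naïve Bayes steps in Problems 1(d) and 2(b) can be sketched in Python as below. This is a rough illustration under several assumptions: all attributes are treated as categorical, no smoothing is applied, and the 80/20 split is a seeded random shuffle. The real split should match the one used for the decision tree in 2(a). All function names and the `toy` rows are hypothetical, and the format of hw4problem2dataset.txt is not assumed here.

```python
import random
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                value_counts[(attr, r[target])][value] += 1
    return class_counts, value_counts


def posterior(model, point):
    """Normalized P(class | point) under the naive independence assumption."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(class)
        for attr, value in point.items():
            score *= value_counts[(attr, c)][value] / n_c  # P(value | class)
        scores[c] = score
    norm = sum(scores.values())
    return {c: (s / norm if norm else 0.0) for c, s in scores.items()}


def split_80_20(rows, seed=0):
    """Shuffle with a fixed seed and cut at 80% (an assumed split method)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def success_percent(model, test_rows, target):
    """Percentage of test rows whose most probable class matches the label."""
    correct = 0
    for r in test_rows:
        point = {a: v for a, v in r.items() if a != target}
        probs = posterior(model, point)
        if max(probs, key=probs.get) == r[target]:
            correct += 1
    return 100.0 * correct / len(test_rows)


# Hypothetical toy data, NOT the homework table or data set.
toy = [
    {"Gender": "M", "Class": "C0"},
    {"Gender": "M", "Class": "C0"},
    {"Gender": "F", "Class": "C1"},
    {"Gender": "F", "Class": "C0"},
]
model = train_naive_bayes(toy, "Class")
print(posterior(model, {"Gender": "F"}))
```

Without smoothing, an attribute value never seen with a class drives that class's probability to zero. Whether to smooth is a design choice left open here.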