Assignment 3
Step 1: Determine the Root of the Tree.
Step 2: Calculate Entropy for The Classes.
Step 3: Calculate Entropy After Split for Each Attribute.
Step 4: Calculate Information Gain for each split.
Step 5: Perform the Split.
Step 6: Perform Further Splits.
Step 7: Complete the Decision Tree.
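Steps 2 to 4 can be illustrated with a small sketch. The toy rows and the attribute names (`outlook`, `windy`, `play`) below are hypothetical and are not the assignment's data; they only show how entropy and information gain select the root attribute.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the classes minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - after

def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (the split to perform)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
root = best_attribute(rows, ["outlook", "windy"], "play")
```

Here `outlook` separates the classes perfectly (gain of 1 bit) while `windy` gives no gain, so `outlook` would become the root. Steps 5 to 7 repeat the same selection on each resulting subset.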
The decision tree built may overfit the training data. There could be too many branches, some of
which may reflect anomalies in the training data due to noise or outliers. Tree pruning addresses
this issue of overfitting the data by removing the least reliable branches (using statistical
measures). This generally results in a more compact and reliable decision tree that is faster and
more accurate in its classification of data. The drawback of using a separate set of tuples to
evaluate pruning is that it may not be representative of the training tuples used to create the
original decision tree. If the separate set of tuples is skewed, then using it to evaluate the
pruned tree would not be a good indicator of the pruned tree’s classification accuracy.
Furthermore, using a separate set of tuples to evaluate pruning means there are fewer tuples to use
for creation and testing of the tree. While this is considered a drawback in machine learning, it
may not be so in data mining due to the availability of larger data sets.
With method (b), pruning a subtree removes the subtree completely. With method (a),
pruning a rule may remove any one of its preconditions, so method (a) is less
restrictive.
Naive Bayesian classification is called naive because it assumes class conditional
independence. That is, the effect of an attribute value on a given class is independent of the
values of the other attributes.
This assumption is made to reduce computational costs, and hence is considered “naïve”. The
major idea behind naive Bayesian classification is to try and classify data by maximizing
P(X|Ci) P(Ci) (where i is an index of the class) using Bayes’ theorem of posterior probability.
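A minimal sketch of this idea for categorical attributes follows. The training tuples and labels are hypothetical, and no Laplace smoothing is applied, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count classes and, per (class, attribute index), the attribute values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)
    for x, c in zip(rows, labels):
        for k, v in enumerate(x):
            value_counts[(c, k)][v] += 1
    return class_counts, value_counts

def classify(x, class_counts, value_counts):
    """Return the class maximizing P(X|Ci) P(Ci), plus all class scores."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(Ci)
        for k, v in enumerate(x):
            # Class conditional independence: multiply per-attribute likelihoods.
            score *= value_counts[(c, k)][v] / n_c
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: (outlook, temperature) -> play
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "hot"), ("sunny", "hot")]
labels = ["no", "no", "yes", "yes", "no"]
model = train(rows, labels)
predicted, scores = classify(("rainy", "mild"), *model)
```

For the tuple ("rainy", "mild"), the score for "yes" is 2/5 × 2/2 × 1/2 = 0.2 and the score for "no" is 0, so "yes" is predicted.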
Method and General Characteristics of Clustering
Partitioning methods:
– Find mutually exclusive clusters of spherical shape
– Distance-based
– May use mean or medoid (etc.) to represent cluster centre
– Effective for small- to medium-size data sets
Hierarchical methods:
– Clustering is a hierarchical decomposition (i.e., multiple levels)
– Cannot correct erroneous merges or splits
– May incorporate other techniques like micro clustering or consider object “linkages”
Density-based methods:
– Can find arbitrarily shaped clusters
– Clusters are dense regions of objects in space that are separated by low-density regions
– Cluster density: Each point must have a minimum number of points within its “neighborhood”
– May filter out outliers
a) The three cluster centers after the first round of execution.
The initial cluster centers are A1, B1, and C1, so we calculate the Euclidean distance from
each point to each of the three centers.
First Iteration:
The three clusters with cluster points are:
Cluster 1 = {A1(2,10)}
Cluster 2 = {A3(8,4), B1(5,8), B2(7,5), B3(6,4), C2(4,9)}
Cluster 3 = {A2(2,5), C1(1,2)}
Calculating the centre (centroid) after the first round:
Center1 = (2,10)
Center2 = ((5+8+7+6+4)/5, (8+4+5+4+9)/5) = (6,6)
Center3 = ((2+1)/2, (5+2)/2) = (1.5, 3.5)
b) The final three clusters.
Second iteration:
After the third iteration the final clusters are:
Cluster 1: {A1, C2, B1}
Cluster 2: {A3, B2, B3}
Cluster 3: {A2, C1}
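The iterations can be checked with a short k-means (Lloyd) sketch. The coordinates are those listed for the first iteration, except C1, which is assumed here to be (1,2): that is the value implied by Center3 = (1.5, 3.5) when Cluster 3 is {A2(2,5), C1}. The function names are illustrative, and the sketch assumes no cluster ever becomes empty.

```python
from math import dist

# C1 = (1, 2) is an assumption inferred from Center3 = (1.5, 3.5).
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}

def assign(points, centers):
    """Group point names by the index of their nearest center (Euclidean)."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)
    return clusters

def centroid(points, names):
    """Mean of the named points; assumes names is non-empty."""
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def kmeans(points, centers, max_rounds=100):
    """Run assignment/update rounds until the centers stop changing."""
    history = []
    for _ in range(max_rounds):
        clusters = assign(points, centers)
        new_centers = [centroid(points, c) for c in clusters]
        history.append((clusters, new_centers))
        if new_centers == centers:
            break
        centers = new_centers
    return history

history = kmeans(POINTS, [POINTS["A1"], POINTS["B1"], POINTS["C1"]])
first_round_centers = history[0][1]
final_clusters = history[-1][0]
```

Under these assumptions, `first_round_centers` matches the centers in part (a), and `final_clusters` matches the three clusters in part (b).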
The global minimum depends on the type of error function you define to minimize. For the k-means
algorithm, the way I found to reach the global minimum is brute force. Suppose you have k = 2 and
6 points: there are at most 2^6 ways to initialize them. Solving k-means for each of these gives
all the local minima of the error function. The global minimum is the local minimum with the
least error.
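As a rough illustration of the brute-force idea, the sketch below enumerates all k^n assignments of points to clusters directly, rather than running k-means from each initialization, and keeps the assignment with the smallest within-cluster sum of squared errors. The six 1-D points are hypothetical, and the error function is assumed to be the usual squared distance to the cluster mean.

```python
from itertools import product

def sse(cluster):
    """Sum of squared distances of 1-D values to their mean (0 for an empty cluster)."""
    if not cluster:
        return 0.0
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)

def brute_force_kmeans(points, k):
    """Try every assignment of points to k clusters; return the lowest total error."""
    best_error, best_labels = float("inf"), None
    for labels in product(range(k), repeat=len(points)):
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
        error = sum(sse(c) for c in clusters)
        if error < best_error:
            best_error, best_labels = error, labels
    return best_error, best_labels

# Hypothetical data: 6 points, k = 2, so 2**6 = 64 assignments are checked.
points = [1, 2, 3, 10, 11, 12]
best_error, best_labels = brute_force_kmeans(points, 2)
```

This is only feasible for tiny inputs, since the number of assignments grows as k^n.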
Within-cluster variance is a simple-to-understand measure of compactness (there are others,
too). So basically, the objective is to find the most compact partitioning of the data set into k
partitions. K-means, in the Lloyd version, actually originated from 1-D PCM data as far as I
know. So assuming you have a really bad telephone line, and someone is bleeping a number
of tones at you, how do you assign frequencies to a scale of, say, 10 tones? Well, you can tell
the other party to just send a bulk of data, store the frequencies in a list, and then try to split
it into 10 bins, such that these bins are somewhat compact and separated. Even when the frequencies
are distorted by the transmission, there is a good chance they will still be separable with this
Partitioning-based clustering and hierarchical clustering use Euclidean distance to measure
distance, so they can only form clusters of spherical shape and have difficulty forming clusters
of arbitrary size. Partitioning methods also have difficulty finding clusters of different
densities and diameters. Hierarchical methods have two major flaws: they are not scalable, and
they cannot undo what was done in a previous step. This makes density-based algorithms more
suitable than partitioning-based clustering. Density-based methods like DBSCAN, OPTICS, and
DENCLUE can form clusters of arbitrary shape and can handle noise in the data. Each cluster is a
collection of points with high density compared to the points outside the cluster.
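A compact DBSCAN-style sketch shows how dense regions become clusters and isolated points are labeled as noise. The points and the parameter values (`eps`, `min_pts`) are hypothetical, and the neighborhood search is a plain linear scan rather than an indexed one.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it is in no dense region."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_neighbors)
        cluster += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these values the two groups receive cluster ids 0 and 1, and the isolated point (5, 5) is marked as noise.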