Download ASS3 data

Assignment 3

Q1:
Step 1: Determine the Root of the Tree.
Step 2: Calculate Entropy for the Classes.
Step 3: Calculate Entropy After Split for Each Attribute.
Step 4: Calculate Information Gain for Each Split.
Step 5: Perform the Split.
Step 6: Perform Further Splits.
Step 7: Complete the Decision Tree.

Q2: The decision tree built may overfit the training data. There could be too many branches, some of which may reflect anomalies in the training data due to noise or outliers. Tree pruning addresses this overfitting by removing the least reliable branches (using statistical measures). This generally results in a more compact and reliable decision tree that is faster and more accurate in its classification of data.

The drawback of using a separate set of tuples to evaluate pruning is that it may not be representative of the training tuples used to create the original decision tree. If the separate set of tuples is skewed, then using it to evaluate the pruned tree would not be a good indicator of the pruned tree's classification accuracy. Furthermore, using a separate set of tuples to evaluate pruning means there are fewer tuples available for creating and testing the tree. While this is considered a drawback in machine learning, it may not be so in data mining, due to the availability of larger data sets.

Q3: With method (b), pruning a subtree removes the subtree completely. With method (a), pruning a rule may remove any single precondition of it. Method (a) is therefore less restrictive.

Q4: Naive Bayesian classification is called naive because it assumes class conditional independence. That is, the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is made to reduce computational costs, and hence is considered "naïve".
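Written out, the class conditional independence assumption says that the likelihood of a tuple factorizes over its attributes. The notation X = (x_1, ..., x_n) for the attribute values of a tuple is an assumption introduced here for illustration; it does not appear in the original answer.

```latex
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)
```

Only the per-attribute probabilities P(x_k | C_i) then need to be estimated from the training data, which is where the reduction in computational cost comes from.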
The major idea behind naive Bayesian classification is to classify data by maximizing P(X|Ci) P(Ci) (where i is an index of the class), using Bayes' theorem of posterior probability.

Q5:

Q6:

Q7: Method and General Characteristics of Clustering

Partitioning methods:
- Find mutually exclusive clusters of spherical shape
- Distance-based
- May use mean or medoid (etc.) to represent cluster centre
- Effective for small- to medium-size data sets

Hierarchical methods:
- Clustering is a hierarchical decomposition (i.e., multiple levels)
- Cannot correct erroneous merges or splits
- May incorporate other techniques like micro clustering or consider object "linkages"

Density-based methods:
- Can find arbitrarily shaped clusters
- Clusters are dense regions of objects in space that are separated by low-density regions
- Cluster density: each point must have a minimum number of points within its "neighbourhood"
- May filter out outliers

Q8:
a) The three cluster centers after the first round of execution.

The three initial cluster centers are A1, B1, and C1, so we calculate the Euclidean distance from each point to all three centers.

First iteration: The three clusters with their points are:
Cluster 1 = {A1(2,10)}
Cluster 2 = {A3(8,4), B1(5,8), B2(7,5), B3(6,4), C2(4,9)}
Cluster 3 = {A2(2,5), C1(1,2)}

Calculating the centre (centroid) of each cluster after the first round:
Center1 = (2, 10)
Center2 = ((5+8+7+6+4)/5, (8+4+5+4+9)/5) = (6, 6)
Center3 = ((2+1)/2, (5+2)/2) = (1.5, 3.5)

b) The final three clusters.

Second iteration:

After the third iteration the final clusters are:
Cluster 1: {A1, C2, B1}
Cluster 2: {A3, B2, B3}
Cluster 3: {A2, C1}

Q9: The global minimum depends on the type of error function you define to minimize. For the k-means algorithm, the way I found to get to the global minimum is brute force. Suppose you have k = 2 and 6 points: there are at most 2^6 ways to initialize the assignment of points to clusters. Solving k-means for each of them gives all the local minima of the error function.
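The brute-force idea above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the original answer: instead of running k-means from every initialization, it scores every one of the 2^6 assignments directly by its within-cluster sum of squared errors (SSE) and keeps the best one. The six points, the function names `sse` and `brute_force_kmeans`, and the choice of SSE as the error function are all assumptions made for this example.

```python
from itertools import product


def sse(points, labels, k):
    """Within-cluster sum of squared distances to each cluster mean.

    Returns None when a cluster is empty, so that labeling is skipped.
    """
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return None
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total


def brute_force_kmeans(points, k):
    """Try all k**n labelings and return the one with the lowest SSE."""
    best_labels, best_sse = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        err = sse(points, labels, k)
        if err is not None and err < best_sse:
            best_labels, best_sse = labels, err
    return best_labels, best_sse


# Hypothetical data: 6 points, k = 2, so 2**6 = 64 labelings are examined.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, err = brute_force_kmeans(points, k=2)
print(labels, round(err, 3))
```

The number of labelings grows as k^n, so this is only feasible for very small data sets; it is useful mainly as a reference against which a heuristic k-means result can be compared.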
The global minimum is the local minimum with the least error.

Within-cluster variance is a simple-to-understand measure of compactness (there are others, too). So basically, the objective is to find the most compact partitioning of the data set into k partitions.

K-Means, in the Lloyd version, actually originated from 1-D PCM data as far as I know. So assuming you have a really bad telephone line, and someone is bleeping a number of tones at you, how do you assign frequencies to a scale of, say, 10 tones? Well, you can tell the other side to just send a bulk of data, store the frequencies in a list, and then try to split it into 10 bins, such that these bins are somewhat compact and separated. Even when the frequencies are distorted by the transmission, there is a good chance they will still be separable with this approach.

Q10: Partitioning-based clustering and hierarchical clustering use Euclidean distance to measure distance, so they can only form clusters of spherical shape and have difficulty forming clusters of arbitrary size. Partitioning methods also have difficulty finding clusters of different densities and diameters. Hierarchical methods have two major flaws: first, they are not scalable; second, they cannot undo what was done in a previous step. This makes density-based algorithms more suitable than the others.

Density-based methods like DBSCAN, OPTICS, and DENCLUE can form clusters of arbitrary shape and can handle noise in the data. Each cluster is a collection of points with high density compared to the points outside the cluster.
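To make the density-based idea concrete, here is a minimal sketch of a DBSCAN-style procedure in plain Python. It is a simplified illustration under stated assumptions, not a reference implementation: the data set, the parameter values `eps=1.5` and `min_pts=3`, and the helper names `region_query` and `dbscan` are hypothetical. The elongated chain of points ends up in one cluster and the isolated point is marked as noise.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep growing
        cluster_id += 1
    return labels


# Hypothetical data: a non-spherical chain, a compact blob, and one outlier.
chain = [(x, 0) for x in range(6)]
blob = [(10, 10), (10, 11), (11, 10), (11, 11)]
points = chain + blob + [(20, 0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The chain is connected through overlapping neighbourhoods rather than through closeness to a single centre, which is why a density-based method can follow its shape.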
