UNIT – IV

Part – A

1. Define classification.
Data classification is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts. In the second step, the model is used for classification, after its predictive accuracy has been estimated.

2. Define prediction.
Prediction can be viewed as the construction and use of a model to assess the class of an unlabelled sample, or to assess the value that a given sample is likely to have for some attribute.

3. What is the difference between classification and regression?
Classification is used to predict discrete or nominal values, while regression is used to predict continuous or ordered values.

4. What is supervised and unsupervised learning?
If the class label of each training sample is provided, the learning is supervised. If the class label of each training sample is not known, and the number or set of classes to be learned is not known in advance, the learning is unsupervised.

5. What is the accuracy of a model?
The accuracy of a model on a given test set is the percentage of test-set samples that are correctly classified by the model. If the accuracy is considered acceptable, the model can be used to classify future data tuples or objects for which the class label is not known. (A small worked sketch is given after Part B.)

6. What are the preprocessing steps for classification and prediction?
The following preprocessing steps may be applied:
a) Data cleaning
b) Relevance analysis
c) Data transformation

7. How are various classification methods compared?
Classification methods can be compared with respect to:
a) Predictive accuracy
b) Speed
c) Robustness
d) Scalability
e) Interpretability

8. What is a decision tree?
A decision tree is a flow-chart-like tree structure, where each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class or class distribution. The topmost node in the tree is the root node.

9. What is tree pruning?
Tree pruning attempts to identify and remove branches that reflect noise or outliers in the training data, with the goal of improving classification accuracy on unseen data.

10. Define information gain.
The information gain measure is used to select the test attribute at each node in the tree. Such a measure is referred to as an attribute selection measure, or a measure of the goodness of split. The attribute with the highest information gain is chosen as the test attribute for the current node. (A worked sketch is given after Part B.)

11. How does tree pruning work?
There are two approaches to tree pruning:
a) In the prepruning approach, a tree is pruned by halting its construction early, e.g. by deciding not to further split the training samples at a given node. Upon halting, the node becomes a leaf.
b) In the postpruning approach, subtrees are removed from a fully grown tree. A node whose subtree is pruned becomes a leaf and is labeled with the most frequent class among its samples.

12. How are classification rules extracted from a decision tree?
The knowledge represented in a decision tree can be extracted and represented in the form of classification IF-THEN rules. One rule is created for each path from the root to a leaf node. E.g.
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"

13. What are the problems to which a decision tree is prone?
The following are the problems in using decision trees:
1) Fragmentation – the number of samples at a given branch becomes so small as to be statistically insignificant.
2) Repetition – occurs when an attribute is repeatedly tested along a given branch of the tree.
3) Replication – duplicate subtrees exist within the tree.

14. What is attribute construction?
Attribute construction is an approach for dealing with the problems of fragmentation, repetition and replication in a decision tree. In this method, the limited representation of the given attributes is improved by creating new attributes based on the existing ones.

15. What is the scalability issue in decision tree algorithms?
Most decision tree algorithms have the restriction that the training samples should reside in main memory. In data mining applications, very large training sets of millions of samples are common; hence, this restriction limits the scalability of such algorithms.

16. What is an exception threshold?
In decision tree induction, the partitioning of a subset is continued until the proportion of samples in the given subset falls below a threshold value, known as the exception threshold.

17. What are Bayesian classifiers?
Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. (A naïve Bayesian sketch is given after Part B.)

18. What is class conditional independence?
Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence.

19. What are the two components of a belief network?
The two components of a belief network are:
1) A directed acyclic graph, where each node represents a random variable and each arc represents a probabilistic dependence.
2) A conditional probability table (CPT) for each variable.

20. What are the three steps in the learning algorithm of a Bayesian network?
a) Compute the gradients.
b) Take a small step in the direction of the gradient.
c) Renormalize the weights.

Part – B

1. Explain classification by decision tree induction in detail. (16)
2. What is Bayesian classification? Explain in detail. (16)
3. a) Explain the different classification methods. (8)
   b) What is prediction? Give an account of the different types of regression. (8)
4. What is partitioning? Explain the different partitioning methods for cluster analysis.
5. Explain the various hierarchical methods for cluster analysis. (16)
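The following short Python sketches illustrate some of the Part A definitions. They are minimal illustrations only, not implementations taken from a textbook or library, and all sample data in them is made up.

For question 5 (accuracy of a model), accuracy is simply the percentage of test-set samples whose predicted class matches the known class:

    # Accuracy = (correctly classified test samples / total test samples) * 100
    def accuracy(true_labels, predicted_labels):
        correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
        return 100.0 * correct / len(true_labels)

    # Hypothetical test-set labels and model predictions (illustrative only).
    y_true = ["yes", "no", "yes", "yes", "no"]
    y_pred = ["yes", "no", "no", "yes", "no"]
    print(accuracy(y_true, y_pred))  # 80.0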
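For question 10 (information gain) and Part B question 1, the sketch below computes the expected information (entropy) of a set of class labels and the information gain of a candidate test attribute, then picks the attribute with the highest gain as the test attribute for the current node. The toy samples are in the spirit of the buys_computer example, but the values are assumptions made for illustration.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Expected information I(s1, ..., sm) = -sum_i p_i * log2(p_i)
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(samples, labels, attribute):
        # Gain(A) = I(S) - sum_j (|S_j| / |S|) * I(S_j), where S_j is the subset
        # of samples having the j-th value of attribute A.
        total = len(samples)
        remainder = 0.0
        for value in set(s[attribute] for s in samples):
            subset = [lab for s, lab in zip(samples, labels) if s[attribute] == value]
            remainder += (len(subset) / total) * entropy(subset)
        return entropy(labels) - remainder

    # Toy training samples (illustrative only).
    samples = [{"age": "<=30", "student": "no"},
               {"age": "<=30", "student": "yes"},
               {"age": "31..40", "student": "no"},
               {"age": ">40", "student": "no"},
               {"age": ">40", "student": "yes"}]
    labels = ["no", "yes", "yes", "yes", "yes"]

    # The attribute with the highest information gain becomes the test attribute.
    best = max(["age", "student"], key=lambda a: information_gain(samples, labels, a))
    print(best)  # "age" for this toy data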
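For questions 17 and 18 (Bayesian classifiers and class conditional independence) and Part B question 2, the sketch below trains a naïve Bayesian classifier over categorical attributes. Under class conditional independence, P(X|Ci) is taken as the product of the per-attribute probabilities P(xk|Ci), and the sample is assigned to the class maximizing P(X|Ci)P(Ci). The training data is again made up, and a real implementation would also apply a Laplacian correction to avoid zero probabilities.

    from collections import Counter, defaultdict

    def train_naive_bayes(samples, labels):
        # Count class frequencies (for the prior P(Ci)) and attribute-value
        # frequencies per class (for the likelihoods P(xk | Ci)).
        class_counts = Counter(labels)
        value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
        for sample, label in zip(samples, labels):
            for attribute, value in sample.items():
                value_counts[(label, attribute)][value] += 1
        return class_counts, value_counts

    def classify(sample, class_counts, value_counts):
        # Return the class Ci maximizing P(X | Ci) * P(Ci), where P(X | Ci) is the
        # product of the per-attribute probabilities (class conditional independence).
        total = sum(class_counts.values())
        best_class, best_score = None, -1.0
        for label, count in class_counts.items():
            score = count / total  # prior P(Ci)
            for attribute, value in sample.items():
                score *= value_counts[(label, attribute)][value] / count  # P(xk | Ci)
            if score > best_score:
                best_class, best_score = label, score
        return best_class

    # Toy training data (illustrative only).
    samples = [{"age": "<=30", "student": "no"},
               {"age": "<=30", "student": "yes"},
               {"age": "31..40", "student": "no"},
               {"age": ">40", "student": "yes"}]
    labels = ["no", "yes", "yes", "yes"]

    model = train_naive_bayes(samples, labels)
    print(classify({"age": "<=30", "student": "yes"}, *model))  # "yes"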