SATHYABAMA UNIVERSITY
(Established under section 3 of UGC Act, 1956)
Course & Branch: M.Tech - IT/W-IT
Title of the Paper: Data Mining and Data Warehousing
Max. Marks: 80
Sub. Code: 731301
Time: 3 Hours
Date: 10/05/2010
Session: FN
______________________________________________________________________

PART - A (6 x 5 = 30)
Answer ALL the Questions

1. Use a flow chart to summarize the following procedures for attribute subset selection:
(a) Stepwise forward selection
(b) Stepwise backward elimination
(c) A combination of forward selection and backward elimination

2. Outline a general procedure that describes how a class comparison is performed. Can class comparison mining be implemented efficiently using data cube techniques?

3. Illustrate with a good example the working of attribute-oriented induction.

4. Why is tree pruning useful in decision tree induction? What is the drawback of using a separate set of samples to evaluate pruning?

5. Compare the advantages and disadvantages of eager classification (e.g., Bayesian, neural network) vs. lazy classification (e.g., k-nearest neighbour, case-based reasoning).

6. Write an algorithm for k-nearest neighbour classification, given k and n, the number of attributes describing each sample.

PART - B (5 x 10 = 50)
Answer ALL the Questions

7. Suppose that the data for analysis include the attribute “age”. The age values for the data tuples are: 13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52 and 70.
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0]. (3)
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of age is 12.94 years. (3)
(c) Use normalization by decimal scaling to transform the value 35 for age.
(or)
8. Elaborate on the various discretization and concept hierarchy generation methods for numeric data. Comment on which method you would prefer to use in which situation, giving reasons.

9. Suppose that the data for analysis include the attribute “age”. The age values for the data tuples are: 13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52 and 70.
(a) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.). (2)
(b) What is the midrange of the data? (2)
(c) Can you find, roughly, the first quartile (Q1) and the third quartile (Q3) of the data? (4)
(d) Give the five-number summary of the data. (2)
(or)
10. For class characterization, what are the major differences between a data cube-based implementation and a relational (tuple-based) implementation such as attribute-oriented induction? Discuss which method is more efficient and under what conditions this is so.

11. (a) Briefly outline the major steps of decision tree classification.
(b) Given a decision tree, you have the option of (i) converting the decision tree to rules and then pruning the resulting rules, or (ii) pruning the decision tree and then converting the pruned tree to rules. What advantage does (i) have over (ii)?
(or)
12. It is difficult to assess classification accuracy when individual data objects may belong to more than one class at a time. In such cases, comment on what criteria you would use to compare different classifiers modeled after the same data. Illustrate with a simple example.

13. Briefly compare the following concepts. You may use an example to explain your concepts.
(a) Snowflake schema, fact constellation, starnet query model.
(b) Discovery-driven cube, multi-feature cube, virtual warehouse.
(or)
14. Suppose that a data warehouse contains 20 dimensions, each with about 5 levels of granularity.
(a) Users are mainly interested in 4 particular dimensions, each having 3 frequently accessed levels for rolling up and drilling down. How would you design a data cube structure to support this preference efficiently?
(b) At times a user may want to drill through the cube, down to the raw data for one or two particular dimensions. How would you support this feature?

15. Propose a few implementation methods for WWW mining, discussing the underlying issues.
(or)
16. Discuss in detail the social impacts of data mining.
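The arithmetic asked for in Questions 7 and 9, and the procedure asked for in Question 6, can be checked with a short Python sketch. The helper names (min_max, z_score, decimal_scaling, midrange, five_number_summary, knn_classify) are hypothetical and not part of the paper. The mean is computed from the listed ages (810 / 27 = 30), and the standard deviation 12.94 is the value given in Question 7(b). The quartiles follow the default ("exclusive") convention of Python's statistics.quantiles. Textbooks use several quartile conventions, so a hand-computed Q1 or Q3 may differ slightly. The k-NN part assumes Euclidean distance and uses invented toy samples.

```python
import math
import statistics
from collections import Counter

# Age values from Questions 7 and 9 (n = 27).
AGES = [13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def min_max(v, data, new_min=0.0, new_max=1.0):
    """Min-max normalization of v onto [new_min, new_max]."""
    lo, hi = min(data), max(data)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


def z_score(v, mean, std):
    """Z-score normalization: distance from the mean in standard deviations."""
    return (v - mean) / std


def decimal_scaling(v, data):
    """Divide by 10**j, where j is the smallest integer with max(|v'|) < 1."""
    max_abs = max(abs(x) for x in data)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return v / 10 ** j


def midrange(data):
    """Average of the smallest and largest values."""
    return (min(data) + max(data)) / 2


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum)."""
    # The default "exclusive" method places Q1, Q2, Q3 at positions
    # (n+1)/4, (n+1)/2 and 3(n+1)/4 of the sorted data.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return (min(data), q1, median, q3, max(data))


def knn_classify(train, query, k):
    """Label query by majority vote among its k nearest training samples.

    train is a list of (features, label) pairs. Each features tuple has the
    same n attributes as query. Distance is Euclidean (an assumption).
    A tied vote goes to the tied label that appears first in distance order.
    """
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


mean_age = sum(AGES) / len(AGES)                  # 810 / 27 = 30.0
print(round(min_max(35, AGES), 3))                # 0.386  (22 / 57)
print(round(z_score(35, mean_age, 12.94), 3))     # 0.386  (5 / 12.94)
print(decimal_scaling(35, AGES))                  # 0.35   (j = 2)
print(statistics.multimode(AGES))                 # [25, 35] -> bimodal
print(midrange(AGES))                             # 41.5
print(five_number_summary(AGES))                  # (13, 21.0, 25.0, 35.0, 70)

# Hypothetical toy training set (not from the paper): n = 2 attributes.
TRAIN = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.5, 4.5), "B"), ((4.8, 5.2), "B")]
print(knn_classify(TRAIN, (1.1, 1.0), k=3))       # A
```

For age 35, min-max and z-score normalization both give about 0.386 here. Decimal scaling divides by 10^2 because the largest absolute age is 70. The k-NN sketch sorts the whole training set for every query, which is simple but costs O(N log N). Using heapq.nsmallest(k, train, key=...) would avoid the full sort when the training set is large.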