CS-515 Data Warehousing and Data Mining

Roll No. …………………..

Lingaya’s University
M.Tech. 1st Year (Term – III) (FT) Examination – June 2011
Data Warehousing and Data Mining (CS-515)

[Time: 3 Hours] [Max. Marks: 100]

Before answering the question, candidate should ensure that they have been supplied the correct and complete question paper. No complaint in this regard will be entertained after examination.

Note: – Attempt five questions in all. All questions carry equal marks. Question no. 1 (Section A) is compulsory. Select two questions from Section B and two questions from Section C.

Section – A

Q-1. Answer the following questions:
(a) Explain how the evolution of database technology led to data mining.
(b) List and describe the five primitives for specifying a data mining task.
(c) Discuss various methods of dealing with the issue of missing values in an attribute.
(d) What is frequent pattern mining? What is it useful for?
(e) Discuss various OLAP operations.
[5×4=20]

Section – B

Q-2.
(a) Compare the snowflake schema, fact constellation and starnet query model with example. [10]
(b) Outline the major steps of decision tree classification. [10]

Q-3.
(a) What is boosting? State why it may improve the accuracy of decision tree induction. [5]
(b) Define each of the following data mining functionalities: characterization, discrimination, association rule, prediction and clustering analysis. Give examples of each data mining functionality using a real-life database. [15]

Q-4.
(a) Is it needed to have a separate data warehouse system in addition to the OLAP system to analyze data? Why? [5]
(b) Describe the Naïve Bayes and k-NN approaches to classification. [15]

Section – C

Q-5.
(a) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(i) Give the five-number summary of the data.
(ii) Show a boxplot of the data.
(iii) Use smoothing by bin means to smooth the data, using a bin depth of 3.
[10]
(b) What are the different storage design structures of a cube? What are the differences between them? [10]

Q-6.
(a) Use the two methods below to normalize the following group of data: 200, 300, 400, 600, 1000
(i) min-max normalization by setting min=0 and max=1
(ii) z-score normalization
[5]
(b) A database has five transactions. Let min_sup = 60% and min_conf = 80%.

TID    Items_bought
T1     {M,O,N,K,E,Y}
T2     {D,O,N,K,E,Y}
T3     {M,A,K,E}
T4     {M,U,C,K,Y}
T5     {C,O,O,K,I,E}

(i) Find all frequent itemsets using Apriori algorithm.
(ii) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers, and item_i denotes variables representing items (e.g. “A”, “B”, etc.):
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
[15]

Q-7.
(a) Following contingency table summarizes supermarket transaction data, where hotdogs refers to the transactions containing hot dogs, ¬hotdogs refers to the transactions that do not contain hot dogs, hamburgers refers to the transactions containing hamburgers, and ¬hamburgers refers to the transactions that do not contain hamburgers.

               hotdogs    ¬hotdogs    Σrow
hamburgers     2100       500         2500
¬hamburgers    1000       1500        2500
Σcol           3000       2000        5000

Suppose that the association rule “hotdogs ⇒ hamburgers” is mined. Given a min_support threshold of 25% and a min_conf threshold of 50%, is this association rule strong? [5]
(b) Briefly outline how to compute the dissimilarity between objects described by the following types of variables:
(i) Numerical (interval-scaled) variables
(ii) Asymmetric binary variables
(iii) Categorical variables
(iv) Ratio-scaled variables
(v) Nonmetric vector objects
[15]

Q-8. Write the algorithm for attribute-oriented induction. [20]
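As an illustration of the two normalization methods named in Q-6(a), the following is a minimal sketch, not a model answer. The function names `min_max_normalize` and `z_score_normalize` are hypothetical, and the z-score here assumes the population standard deviation (dividing by n); the paper does not say which convention is expected.

```python
import statistics


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score_normalize(values):
    """Centre values on their mean and scale by the population standard deviation (assumption)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population form; statistics.stdev gives the sample form
    return [(v - mean) / std for v in values]


data = [200, 300, 400, 600, 1000]  # the data group given in Q-6(a)
print(min_max_normalize(data))
print([round(z, 3) for z in z_score_normalize(data)])
```

With min=0 and max=1, the min-max result for this data is 0, 0.125, 0.25, 0.5 and 1. Swapping `statistics.pstdev` for `statistics.stdev` would give the sample-based z-scores instead.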
