GOVERNMENT ENGINEERING COLLEGE, MODASA
Question Bank

Faculty: Prof. C. H. Makwana
Subject Name: Data Warehousing and Data Mining
Subject Code: 171601
Semester: 7th
Department: Information Technology
Term: ODD 2015

Chapter 1: Introduction to Data Warehousing

1. Define the following terms: (1) Data Mart, (2) Enterprise Warehouse, (3) Virtual Warehouse. (Summer 2014)
2. Explain the data cube with all OLAP operations in brief. (Winter 2013)
   • Explain the different OLAP operations with examples. (Summer 2015)
   • What is a cuboid? Explain the various OLAP operations on a data cube with a suitable example. (Winter 2014)
3. Differentiate OLAP vs. OLTP. (Winter 2014, Nov/Dec 2011)
   • Differentiate between an operational database system and a data warehouse. (Summer 2014)
4. Explain the Star, Snowflake and Fact Constellation schemas for multidimensional databases. (Winter 2013)
   • Explain the Star and Fact Galaxy schemas used in a data warehouse for multidimensional databases. (Winter 2014)
   • Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit. (Summer 2015)
     1) Draw a star schema diagram for the data warehouse.
     2) Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?
5. Differentiate Fact table vs. Dimension table. (Winter 2014)
6. Explain the three-tier architecture of a data warehouse. (Winter 2013, Nov/Dec 2011)
7. Discuss the application of data warehousing and data mining in the government sector. (Winter 2014)

Chapter 2: Introduction to Data Mining

1. Define Data Mining. Explain the various application areas of data mining techniques and the knowledge discovery process in brief. (Winter 2013)
   • Define the term “Data Mining”. With the help of a suitable diagram, explain the process of knowledge discovery from databases.
2. Define KDD.
   How are data mining techniques applied to multimedia databases, temporal databases and spatial databases to extract useful knowledge? (Winter 2014)
   • Explain the KDD process in detail. (Nov/Dec 2011)
3. What is Data Mining? List the challenges to data mining regarding data mining methodology and user-interaction issues. (Summer 2014)
4. Write a short note on Spatial, Legacy and Multimedia Databases. (Summer 2014)
5. List and describe the major issues in data mining. (Nov/Dec 2011)
6. Explain the metadata repository. (Nov/Dec 2011)

Chapter 3: Data Preprocessing and Data Mining Primitives

1. What is data cleaning? Discuss various ways of handling missing values during data cleaning.
   • List and describe methods for handling missing values in data cleaning. (Summer 2014, Nov/Dec 2011)
2. What is noise? Explain the different techniques to remove noise from data. (Summer 2015, Summer 2014)
   • What is noise? Describe the possible reasons for noisy data. Explain the different techniques to remove noise from data. (Nov/Dec 2011)
   • What is noise? Explain data smoothing methods as a noise removal technique: divide the given data into bins of size 3 by bin partition (equal frequency), then smooth by bin means, by bin medians and by bin boundaries. Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22. (Winter 2014)
   • Explain the following: (1) Binning method, (2) How to handle missing values in a data set. (Winter 2013)
3. What are the major challenges of mining a huge amount of data in comparison with mining a small amount of data? (Summer 2015)
4. Define sampling. Explain the different types of sampling techniques with examples. (Summer 2015)
5. Discuss why analytical data characterization is needed and how it can be performed. (Summer 2014)
6. How can the dissimilarity between objects be computed when they are described by the following types of variables? (Summer 2015)
   1) Interval-scaled variables
   2) Asymmetric binary variables
   3) Categorical variables
7. Suppose that the data for analysis includes the attribute age.
   The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72. (Winter 2014)
   i) Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0].
   ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.
8. Explain data discretization and concept hierarchy generation in brief. (Winter 2013)
   • What is a concept hierarchy? List and explain the types of concept hierarchy in detail. (Winter 2014, Summer 2014)
9. Explain data transformation in data mining. (Nov/Dec 2011)

Chapter 4: Concept Description and Association Rule Mining

1. Explain attribute removal and attribute generalization with an example. (Summer 2014)
2. What are measures? List and explain the types of measures. (Summer 2014)
3. Explain Market Basket Analysis and its uses, and explain association rules in brief. (Winter 2013)
4. Write an algorithm for finding frequent itemsets using candidate generation.
   • Describe the techniques for improving the efficiency of Apriori-based mining.
   • State the Apriori property. Generate large itemsets and association rules using the Apriori algorithm on the following data set, with minimum support and minimum confidence set to 50% and 75% respectively. (Winter 2014)
     TID    Items Purchased
     T101   Cheese, Milk, Cookies
     T102   Butter, Milk, Bread
     T103   Cheese, Butter, Milk, Bread
     T104   Butter, Bread
   • Explain the Apriori algorithm for finding frequent itemsets. (Winter 2013)
5. List two shortcomings of the algorithms which helped in improving the efficiency of the Apriori algorithm. Discuss any TWO variations of the Apriori algorithm that improve its efficiency. (Summer 2014)
6. State how the partitioning method may improve the efficiency of association mining. (Summer 2014)
7. Why is a strong association rule not always interesting? Explain with an example. (Summer 2015)
8. How can multilevel association rules be mined efficiently using concept hierarchies?
   (Summer 2015)
9. Explain outlier analysis techniques. (Winter 2013)

Chapter 5: Classification and Clustering

1. Explain the following terms: (1) Information Gain, (2) Mean, Median, Mode. (Winter 2013)
   • Short note: Information gain, Gain ratio, Gini index.
2. Explain rule-based classification and case-based reasoning in detail. (Nov/Dec 2011)
3. What is decision tree induction? Write the basic algorithm for inducing a decision tree from training tuples. (Summer 2015)
   • Briefly outline the major steps of decision tree classification. (Summer 2014)
   • Explain classification with the decision tree induction method. (Winter 2013)
4. List the strengths and weaknesses of neural networks as classifiers. (Summer 2015)
5. Explain how the accuracy of a classifier can be measured. How does the bagging strategy help improve classifier accuracy? (Winter 2014)
6. What is supervised learning? Using the given table, show how the ROOT splitting attribute is selected using the InfoGain measure in the overall process of decision tree induction. (Winter 2014)

     No  Outlook   Temperature  Humidity  Windy  Class
     1   Sunny     Hot          High      False  N
     2   Sunny     Hot          High      True   N
     3   Overcast  Hot          High      False  P
     4   Rain      Mild         High      False  P
     5   Rain      Cool         Normal    False  P
     6   Rain      Cool         Normal    True   N
     7   Overcast  Cool         Normal    True   P
     8   Sunny     Mild         High      False  N
     9   Sunny     Cool         Normal    False  P
     10  Rain      Mild         Normal    False  P
     11  Sunny     Mild         Normal    True   P
     12  Overcast  Mild         High      True   P
     13  Overcast  Hot          Normal    False  P
     14  Rain      Mild         High      True   N

7. What are neural networks? Describe the various factors that make them useful for classification and prediction in data mining. Explain how the topology of a neural network is designed. (Winter 2014)
8. Why is naïve Bayesian classification called “naïve”? Briefly outline the major ideas of naïve Bayesian classification. (Summer 2014)
   • Explain Bayes' Theorem and Naïve Bayesian Classification. (Winter 2013)
10. Write the typical requirements of clustering in data mining.
11. What is Cluster Analysis? List and explain the requirements of clustering in data mining. (Summer 2014)
12. How can distance be computed for attributes that have missing values in a k-Nearest Neighbor classifier? (Summer 2015)
   • Explain the k-means and k-medoids algorithms for clustering. (Winter 2013, Nov/Dec 2011)
   • How does the k-means clustering method differ from the k-medoids clustering method? Discuss the process of k-means clustering. Also outline the major drawbacks of the k-means clustering technique. (Winter 2014)
   • Write the steps of the k-medoids clustering algorithm, with its limitations. (Summer 2014)
13. Suppose that the data mining task is to cluster the following eight points (with (x, y) representing location) into three clusters: A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1 as the center of each cluster, respectively. Use the k-means algorithm to show: (Summer 2015)
   1) The three cluster centers after the first round of execution
   2) The final three clusters
14. Explain the Linear Regression and Non-linear Regression techniques of prediction. (Winter 2014)
   • Explain linear regression. What are the reasons for not using the linear regression model to estimate the output data? (Summer 2015)
   • Explain linear regression with an example. (Winter 2013)
15. Differentiate Classification and Clustering. (Winter 2013)
16. Short note: Distributive and Holistic measures.

Chapter 6: Advanced Topics of Data Mining and its Applications

1. Explain the different types of web mining with suitable examples. (Summer 2014)
   • What is a web log? Explain web structure mining and web usage mining in detail. (Winter 2014)
   • Explain the use of data mining techniques in Web/Internet technology.
   • What are the challenges for effective resource and knowledge discovery in mining the World Wide Web?
2. Explain the information retrieval methods used in text mining. (Winter 2014, Summer 2014)
3. Explain mining in the following databases with examples: (Winter 2013)
   1) Temporal Databases
   2) Sequence Databases
   3) Spatial Databases and Spatiotemporal Databases
4. Explain the methodologies for stream data processing and stream data systems. (Nov/Dec 2011)

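Worked sketch for Chapter 3, Question 2 (the Winter 2014 binning exercise). The Python below is a minimal illustration, not an official model answer: it sorts the given data, partitions it into equal-frequency bins of 3 values, and smooths each bin by means, by medians and by boundaries. The helper names (`equal_frequency_bins`, `smooth_by_means`, and so on) are hypothetical, and breaking ties toward the lower boundary in boundary smoothing is an assumption; some solutions break ties toward the upper boundary instead.

```python
from statistics import mean, median

def equal_frequency_bins(values, size):
    """Sort the values and cut them into consecutive bins of `size` items each."""
    data = sorted(values)
    return [data[i:i + size] for i in range(0, len(data), size)]

def smooth_by_means(bins):
    # Every value in a bin is replaced by that bin's mean.
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    # Every value in a bin is replaced by that bin's median.
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    # Each value is replaced by the closer of the bin's min and max.
    # Assumption: a value equidistant from both goes to the lower boundary.
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

data = [10, 2, 19, 18, 20, 18, 25, 28, 22]
bins = equal_frequency_bins(data, 3)
print("Bins:      ", bins)
print("Means:     ", smooth_by_means(bins))
print("Medians:   ", smooth_by_medians(bins))
print("Boundaries:", smooth_by_boundaries(bins))
```

With this data the sorted bins are {2, 10, 18}, {18, 19, 20} and {22, 25, 28}. Smoothing by means and by medians both give 10, 19 and 25 for the three bins, and boundary smoothing gives {2, 2, 18}, {18, 18, 20} and {22, 22, 28} under the tie rule above.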

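Worked sketch for Chapter 3, Question 7 (the Winter 2014 age normalization exercise). Min-max normalization maps v to (v - min) / (max - min) * (new_max - new_min) + new_min, and z-score normalization maps v to (v - mean) / sigma. The sketch below is only an illustration; the function names are hypothetical, the mean is computed from the 15 listed ages, and sigma = 20.64 is taken from the question rather than recomputed.

```python
from statistics import mean

ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]

def min_max(v, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map v from [old_min, old_max] onto [new_min, new_max]."""
    return (v - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

def z_score(v, mu, sigma):
    """Express v as the number of standard deviations it lies from the mean."""
    return (v - mu) / sigma

mm = min_max(45, min(ages), max(ages))  # (45 - 13) / (72 - 13) = 32 / 59
z = z_score(45, mean(ages), 20.64)      # mean(ages) = 527 / 15, sigma given in the question
print(f"min-max: {mm:.3f}, z-score: {z:.3f}")  # prints min-max: 0.542, z-score: 0.478
```

So 45 maps to about 0.542 on [0.0, 1.0], and, with a mean of about 35.13, to a z-score of about 0.478.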

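Sketch for Chapter 3, Question 6 (dissimilarity by variable type). The functions below follow the standard textbook formulas: Minkowski distance for interval-scaled variables (p = 1 gives Manhattan, p = 2 gives Euclidean), d = (r + s) / (q + r + s) for asymmetric binary variables, where q counts 1/1 matches, r counts 1/0 and s counts 0/1 mismatches while 0/0 matches are ignored, and simple matching d = (p - m) / p for categorical variables with p variables and m matches. The function names and the example objects are hypothetical; interval-scaled values are usually standardized before computing distances, a step this sketch skips, and returning 0.0 when two binary objects share no 1s at all is a design assumption.

```python
def minkowski(x, y, p=2):
    """Minkowski distance between interval-scaled vectors (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def asymmetric_binary(x, y):
    """d = (r + s) / (q + r + s); negative (0/0) matches are not counted."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    # Assumption: objects with no positive values at all are treated as identical.
    return (r + s) / (q + r + s) if q + r + s else 0.0

def categorical(x, y):
    """Simple matching: d = (p - m) / p, p = number of variables, m = number of matches."""
    p = len(x)
    m = sum(1 for a, b in zip(x, y) if a == b)
    return (p - m) / p

# Hypothetical objects, for illustration only.
print(minkowski((1, 2), (4, 6)))       # Euclidean: 5.0
print(minkowski((1, 2), (4, 6), p=1))  # Manhattan: 7.0
print(asymmetric_binary((1, 0, 1, 0, 0, 0), (1, 0, 1, 0, 1, 0)))  # q=2, r=0, s=1 -> 0.333...
print(categorical(("red", "small", "round"), ("red", "large", "round")))  # 1 of 3 differs -> 0.333...
```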

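Worked sketch for Chapter 4, Question 4 (the Winter 2014 Apriori exercise with 50% minimum support and 75% minimum confidence). The Python below is a compact illustration of level-wise Apriori, not a reference implementation: each level is counted with one scan, candidates come from joining frequent (k-1)-itemsets, and the prune step applies the Apriori property (every (k-1)-subset of a frequent k-itemset must be frequent). As a simplification, the join unions any two frequent (k-1)-itemsets whose union has k items instead of using the textbook prefix join; after pruning, the candidate set is the same. All function names are hypothetical.

```python
from itertools import combinations

transactions = {
    "T101": {"Cheese", "Milk", "Cookies"},
    "T102": {"Butter", "Milk", "Bread"},
    "T103": {"Cheese", "Butter", "Milk", "Bread"},
    "T104": {"Butter", "Bread"},
}

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)

def apriori(db, min_sup):
    """Return {frozenset: support count} for all itemsets with support >= min_sup (a fraction)."""
    min_count = min_sup * len(db)
    level = {frozenset([i]) for t in db.values() for i in t}  # candidate 1-itemsets
    frequent, k = {}, 1
    while level:
        # One database scan per level; keep candidates that meet minimum support.
        counts = {c: support_count(c, db) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step (simplified): unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates having an infrequent (k-1)-subset.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) triples with confidence >= min_conf."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = count / frequent[lhs]  # lhs is frequent by the Apriori property
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out

freq = apriori(transactions, 0.5)
for lhs, rhs, conf in rules(freq, 0.75):
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```

Traced by hand on this data set, L1 = {Cheese, Milk, Butter, Bread}, L2 = {Cheese, Milk}, {Milk, Butter}, {Milk, Bread}, {Butter, Bread}, and L3 = {Milk, Butter, Bread}. The strong rules are Cheese -> Milk, Butter -> Bread, Bread -> Butter, {Milk, Butter} -> Bread and {Milk, Bread} -> Butter, each with 100% confidence; the rules are printed in no fixed order because sets are unordered.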

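Worked sketch for Chapter 5, Question 6 (choosing the root split by information gain, Winter 2014). Info(D) = -sum p_i log2 p_i over the class distribution, and Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j) over the partitions D_j produced by attribute A; the attribute with the highest gain becomes the root. The Python below encodes the 14 rows of the question's table and is only an illustration; the function names are hypothetical.

```python
from collections import Counter
from math import log2

# Rows of the question's table: (Outlook, Temperature, Humidity, Windy, Class)
rows = [
    ("Sunny", "Hot", "High", False, "N"),
    ("Sunny", "Hot", "High", True, "N"),
    ("Overcast", "Hot", "High", False, "P"),
    ("Rain", "Mild", "High", False, "P"),
    ("Rain", "Cool", "Normal", False, "P"),
    ("Rain", "Cool", "Normal", True, "N"),
    ("Overcast", "Cool", "Normal", True, "P"),
    ("Sunny", "Mild", "High", False, "N"),
    ("Sunny", "Cool", "Normal", False, "P"),
    ("Rain", "Mild", "Normal", False, "P"),
    ("Sunny", "Mild", "Normal", True, "P"),
    ("Overcast", "Mild", "High", True, "P"),
    ("Overcast", "Hot", "Normal", False, "P"),
    ("Rain", "Mild", "High", True, "N"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    """Info(D) = -sum p_i * log2(p_i) over the class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    """Gain(A) = Info(D) - weighted Info of the partitions induced by column `col`."""
    partitions = {}
    for r in rows:
        partitions.setdefault(r[col], []).append(r[-1])
    expected = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([r[-1] for r in rows]) - expected

gains = {a: info_gain(rows, i) for i, a in enumerate(attributes)}
for a, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"Gain({a}) = {g:.3f}")
print("Root split:", max(gains, key=gains.get))
```

With 9 P and 5 N tuples, Info(D) is about 0.940. Outlook has the highest gain, about 0.247 (0.246 if intermediate values are rounded to three decimals), ahead of Humidity (about 0.152), Windy (about 0.048) and Temperature (about 0.029), so Outlook is selected as the root.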

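Worked sketch for Chapter 5, Question 13 (the Summer 2015 k-means exercise). The Python below runs the basic k-means loop (assign every point to its nearest center by Euclidean distance, recompute each center as the mean of its points, repeat until the assignment stops changing) starting from A1, B1 and C1 as in the question. It is an illustration only: the function names are hypothetical, ties would go to the lowest-numbered center (none occur with this data), and the sketch does not handle a cluster becoming empty.

```python
from math import dist  # Euclidean distance, Python 3.8+

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}

def centroid(members):
    """Coordinate-wise mean of the named member points."""
    xs, ys = zip(*(points[m] for m in members))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def kmeans(centers, max_rounds=100):
    """Return the final clusters and the list of centers computed after each round."""
    history, clusters = [], None
    for _ in range(max_rounds):
        new = [[] for _ in centers]
        for name, p in points.items():
            # Nearest center; min() keeps the lowest index on ties.
            idx = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            new[idx].append(name)
        if new == clusters:  # assignment unchanged: converged
            break
        clusters = new
        centers = [centroid(c) for c in clusters]
        history.append(centers)
    return clusters, history

clusters, history = kmeans([points["A1"], points["B1"], points["C1"]])
print("Centers after round 1:", history[0])
print("Final clusters:", clusters)
```

Traced by hand, the first round assigns {A1}, {A3, B1, B2, B3, C2} and {A2, C1}, giving centers (2, 10), (6, 6) and (1.5, 3.5). The assignment stabilizes after the third round with final clusters {A1, B1, C2}, {A3, B2, B3} and {A2, C1}.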