P.D.E.A.’s Prof. Ramkrishna More Arts, Commerce and Science College
Akurdi, Pune-44.
M.Sc. [Computer Science] Sem-II

Assignment: 1 (Introduction to Data Mining & Introduction to Data Warehousing)
Date: 18/01/13
Submission Date: 28/01/13

2 Mark Questions:
1. What is data mining? What are its alternative names?
2. How does data mining differ from query processing?
3. How does classification differ from clustering?
4. What is a data warehouse?
5. What is OLAP? What are the different OLAP operations?
6. What is incomplete, noisy and inconsistent data? Give examples of each.
7. How are missing and noisy data handled?
8. What is a concept hierarchy?
9. What is pattern matching?
10. What is machine learning?

4 Mark Questions:
1. What is KDD? Explain all the steps of KDD.
2. Explain different visualization techniques.
3. Explain the basic data mining tasks.
4. Explain different data mining issues.
5. Explain the applications of data mining.
6. Explain the multi-tiered architecture of a data warehouse.
7. Differentiate between OLAP and OLTP.
8. Explain the star schema with an example. What are its advantages and disadvantages?
9. Explain the snowflake schema with an example. What are its advantages and disadvantages?
10. Consider the following data (in increasing order) for the attribute age:
    13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
    Smooth the above data using the following binning techniques (bin size = 3):
    i. By equal frequency
    ii. By bin means
    iii. By bin boundaries
11. Consider the following group of data:
    200, 300, 400, 600, 1000
    Use the following methods to normalize the given data:
    i. Min-max normalization by setting min = 0 and max = 1
    ii. Z-score normalization
    iii. Normalization by decimal scaling
12. What is data reduction? Explain different data reduction strategies.
13.
    Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination.
    (a) Draw a snowflake schema diagram for the data warehouse.
    (b) Starting with the base cuboid (student, course, semester, instructor), what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each Big University student?

Ms. Katkar R.J.          Mr. Lakhdive S.G.
Lecturer                 Head of Department

Assignment: 2 (Data Mining Techniques & Classification and Prediction)

2 Mark Questions:
1. Define a frequent set.
2. Define an association rule.
3. Define support and confidence.
4. What is classification? Explain its 2 steps.
5. List the different decision tree algorithms.
6. Define: discrete-valued attribute, continuous-valued attribute.
7. Why is naïve Bayesian classification called “naïve”?

4 Mark Questions:
1. Explain the frequent item-set algorithm (Apriori).
2. How can the efficiency of the Apriori algorithm be improved?
3. Define an FP-tree. Discuss the method of constructing an FP-tree.
4. Explain the basic algorithm for inducing a decision tree.
5. Explain the different attribute selection measures.
6. What is tree pruning? Why is it useful in decision tree induction? What are its 2 types?
7. What is regression? Explain the different types of it.
8. Construct an FP-tree for the following:

    TID   Items
    1     E, A, D, B
    2     D, A, C, E, B
    3     C, A, B, E
    4     B, A, D
    5     D
    6     D, B
    7     A, D, E
    8     B, C

9. The following table shows the terminal and annual exam marks obtained by students in the database.
   Use the method of least squares to find an equation for predicting a student’s annual exam marks from the student’s terminal exam marks in the course. Predict the annual exam marks of a student who received 78 marks in the terminal exam.

    Terminal Exam (X)   Annual Exam (Y)
    56                  34
    53                  45
    44                  55
    67                  89
    41                  51
    56                  63
    53                  90
    56                  75
    90                  76
    69                  74

10. The following table contains the training data from a weather database with the attributes outlook, temperature, humidity, windy and class. Let “Class” be the class label attribute. Given a data tuple with the values “rain”, “hot”, “high”, “false” for the attributes outlook, temperature, humidity and windy, compute its naïve Bayesian classification.

    Outlook    Temperature   Humidity   Windy   Class
    Sunny      Hot           High       False   N
    Sunny      Hot           High       True    N
    Overcast   Hot           High       False   P
    Rain       Mild          High       False   P
    Rain       Cool          Normal     False   P
    Rain       Cool          Normal     True    N
    Overcast   Cool          Normal     True    P
    Sunny      Mild          High       False   N
    Sunny      Cool          Normal     False   P
    Rain       Mild          Normal     False   P
    Sunny      Mild          Normal     True    P
    Overcast   Mild          High       True    P
    Overcast   Hot           Normal     False   P
    Rain       Mild          High       True    N

Assignment: 3 (Accuracy Measures & Clustering)

2 Mark Questions:
1. Define the following terms:
   a) Precision
   b) F-measure
   c) Confusion matrix
   d) Cross-validation
   e) Bootstrap
2. What is clustering? What are the different techniques of it?

4 Mark Questions:
1. Explain the k-means clustering algorithm.
2. What is hierarchical clustering? Explain its types.

Assignment: 4 (Brief overview of advanced techniques)

2 Mark Questions:
1. What are the different types of web mining?

4 Mark Questions:
1. What is a crawler? What are the different types of it?
2. Explain the PageRank algorithm.
3. Explain the data structures used to keep track of patterns identified during web mining.
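
Note on checking numeric answers: hand-computed results for the normalization exercise in Assignment 1 (4-mark question 11) can be verified with a short script. The sketch below is not part of the assignment. The function names are hypothetical, and the z-score uses the population standard deviation (dividing by n), which is an assumption because the question does not say which form is intended.

```python
import math

# Data from Assignment 1, 4-mark question 11.
data = [200, 300, 400, 600, 1000]


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization onto [new_min, new_max].

    Assumes the values are not all equal (otherwise hi - lo is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Z-score normalization using the population standard deviation (assumption)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v'|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


print("min-max:        ", min_max(data))
print("z-score:        ", [round(z, 4) for z in z_score(data)])
print("decimal scaling:", decimal_scaling(data))
```

Switching to the sample standard deviation (dividing by n - 1) changes only the z-score results; the other two methods are unaffected.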