P.D.E.A.’s
Prof. Ramkrishna More Arts, Commerce and Science College
Akurdi, Pune-44.
M.Sc. [Computer Science] Sem-II
Assignment: 1 (Introduction to Data Mining & Introduction to Data Warehousing)
Date: 18/01/13
Submission Date: 28/01/13
2 Mark Questions:
1. What is data mining? What are its alternative names?
2. How does data mining differ from query processing?
3. How does classification differ from clustering?
4. What is a data warehouse?
5. What is OLAP? What are the different OLAP operations?
6. What is incomplete, noisy, and inconsistent data? Give an example of each.
7. How are missing and noisy data handled?
8. What is a concept hierarchy?
9. What is pattern matching?
10. What is machine learning?
4 Mark Questions:
1. What is KDD? Explain all the steps of KDD.
2. Explain different visualization techniques.
3. Explain the basic data mining tasks.
4. Explain different data mining issues.
5. Explain the applications of data mining.
6. Explain the multi-tiered architecture of a data warehouse.
7. Differentiate between OLAP and OLTP.
8. Explain the star schema with an example. What are its advantages and disadvantages?
9. Explain the snowflake schema with an example. What are its advantages and disadvantages?
10. Consider the following data (in increasing order) for the attribute age:
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
Smooth the above data using the following binning techniques (bin size = 3):
i. By equal frequency
ii. By bin means
iii. By bin boundaries
11. Consider the following group of data:
200, 300, 400, 600, 1000
Use the following methods to normalize the given data:
i. Min-max normalization, setting min = 0 and max = 1
ii. Z-score normalization
iii. Normalization by decimal scaling
12. What is data reduction? Explain different data reduction strategies.
13. Suppose that a data warehouse for Big University consists of the following four
dimensions: student, course, semester, and instructor, and two measures: count and
avg_grade. At the lowest conceptual level (e.g., for a given student, course,
semester, and instructor combination), the avg_grade measure stores the actual course
grade of the student. At higher conceptual levels, avg_grade stores the average grade for
the given combination.
(a) Draw a snowflake schema diagram for the data warehouse.
(b) Starting with the base cuboid [student, course, semester, instructor], what
specific OLAP operations (e.g., roll-up from semester to year) should one perform
in order to list the average grade of CS courses for each Big University student?
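Questions 10 and 11 above are mechanical enough to cross-check with a few lines of Python. The following is a minimal sketch rather than a model answer. The function names are illustrative. Two choices in it are assumptions: a value equidistant from both bin boundaries goes to the lower one, and the z-score uses the population standard deviation (a course may expect the sample version instead).

```python
import statistics

# Question 10: age values, already in increasing order.
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
# Question 11: values to normalize.
data = [200, 300, 400, 600, 1000]


def equal_frequency_bins(values, depth):
    """Partition the sorted values into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of its bin's minimum and maximum.

    Assumption: a value equidistant from both goes to the lower boundary.
    """
    return [[b[0] if v - b[0] <= b[-1] - v else b[-1] for v in b] for b in bins]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Centre on the mean and divide by the standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by the smallest power of 10 that brings every |value| below 1."""
    j = 0
    while max(abs(v) for v in values) >= 10 ** j:
        j += 1
    return [v / 10 ** j for v in values]


bins = equal_frequency_bins(ages, 3)
print(bins)
print(smooth_by_means(bins))
print(smooth_by_boundaries(bins))
print(min_max(data), z_score(data), decimal_scaling(data))
```

With a bin size of 3, the 27 age values fall into nine bins, the first being 13, 15, 16. For the second data set, decimal scaling divides by 10^4, because 1000 / 10^3 = 1 is not strictly below 1.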
Ms. Katkar R.J. (Lecturer)
Mr. Lakhdive S.G. (Head of Department)
Assignment: 2 (Data Mining Techniques & Classification and Prediction)
2 Mark Questions:
1. Define a frequent set. Define an association rule.
2. Define support and confidence.
3. What is classification? Explain its two steps.
4. List the different decision tree algorithms.
5. Define: discrete-valued attribute, continuous-valued attribute.
6. Why is naïve Bayesian classification called “naïve”?
4 Mark Questions:
1. Explain the frequent itemset algorithm (Apriori).
2. How can the efficiency of the Apriori algorithm be improved?
3. Define an FP-tree. Discuss the method of constructing an FP-tree.
4. Explain the basic algorithm for inducing a decision tree.
5. Explain the different attribute selection measures.
6. What is tree pruning? Why is it useful in decision tree induction? What are its two types?
7. What is regression? Explain its different types.
8. Construct an FP-tree for the following transactions:
TID   Items
1     E, A, D, B
2     D, A, C, E, B
3     C, A, B, E
4     B, A, D
5     D
6     D, B
7     A, D, E
8     B, C
9. The following table shows the terminal and annual exam marks obtained by students in the
database. Use the method of least squares to find an equation for predicting a
student’s annual exam marks from the student’s terminal exam marks in the course. Predict
the annual exam marks of a student who received 78 marks in the terminal exam.
Terminal Exam (X) Annual Exam (Y)
56
34
53
45
44
55
67
89
41
51
56
63
53
90
56
75
90
76
69
74
10. The following table contains the training data from a weather database with the
attributes outlook, temperature, humidity, windy, and class. Let “Class” be the class label
attribute. Given a data tuple with the values “rain”, “hot”, “high”, “false” for the
attributes outlook, temperature, humidity, and windy, compute the naïve Bayesian
classification of the tuple.
Outlook    Temperature  Humidity  Windy  Class
Sunny      Hot          High      False  N
Sunny      Hot          High      True   N
Overcast   Hot          High      False  P
Rain       Mild         High      False  P
Rain       Cool         Normal    False  P
Rain       Cool         Normal    True   N
Overcast   Cool         Normal    True   P
Sunny      Mild         High      False  N
Sunny      Cool         Normal    False  P
Rain       Mild         Normal    False  P
Sunny      Mild         Normal    True   P
Overcast   Mild         High      True   P
Overcast   Hot          Normal    False  P
Rain       Mild         High      True   N
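Questions 8 and 10 above both reduce to counting, so a hand-worked answer can be cross-checked in code. The sketch below is illustrative only, and all class and function names are made up for this example. It rests on several assumptions. Question 8 states no minimum support, so `min_support` defaults to 1 and every item is kept. Ties in support are broken alphabetically. The FP-tree omits the header table and node links. The naïve Bayes scores use raw relative frequencies with no Laplace smoothing. Attribute values are lowercased to match the tuple given in question 10.

```python
from collections import Counter

# Question 8: the eight transactions, in TID order.
transactions = [
    ["E", "A", "D", "B"], ["D", "A", "C", "E", "B"], ["C", "A", "B", "E"],
    ["B", "A", "D"], ["D"], ["D", "B"], ["A", "D", "E"], ["B", "C"],
]

# Question 10: (outlook, temperature, humidity, windy, class) rows.
weather = [
    ("sunny", "hot", "high", "false", "N"),
    ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"),
    ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"),
    ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"),
    ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"),
    ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"),
    ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"),
    ("rain", "mild", "high", "true", "N"),
]


class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support=1):
    """Insert each transaction's frequent items in descending-support order."""
    support = Counter(item for t in transactions for item in set(t))
    root = Node(None)
    for t in transactions:
        items = [i for i in set(t) if support[i] >= min_support]
        node = root
        # Assumption: items with equal support are ordered alphabetically.
        for item in sorted(items, key=lambda i: (-support[i], i)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root, support


def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(value | class) per class, unsmoothed."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / n
        scores[label] = score
    return scores


root, support = build_fp_tree(transactions)
scores = naive_bayes_scores(weather, ("rain", "hot", "high", "false"))
prediction = max(scores, key=scores.get)
print(dict(support), scores, prediction)
```

Under these assumptions the item order is B, D, A, E, C, and the root has two children: B (count 6) and D (count 2). The tuple scores roughly 0.0106 for P and 0.0183 for N, so the sketch predicts class N.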
Assignment: 3 (Accuracy Measures & Clustering)
2 Mark Questions:
1. Define the following terms:
a) Precision
b) F-measure
c) Confusion matrix
d) Cross-validation
e) Bootstrap
2. What is clustering? What are its different techniques?
4 Mark Questions:
1. Explain the k-means clustering algorithm.
2. What is hierarchical clustering? Explain its types.
Assignment: 4 (Brief overview of advanced techniques)
2 Mark Questions:
1. What are the different types of web mining?
4 Mark Questions:
1. What is a crawler? What are its different types?
2. Explain the PageRank algorithm.
3. Explain the data structures used to keep track of patterns identified during web mining.