Roll No. …………………..
Lingaya’s University
M.Tech. 1st Year (Term – III) (FT)
Examination – June 2011
Data Warehousing and Data Mining (CS-515)
[Time: 3 Hours]
[Max. Marks: 100]
Before answering the questions, candidates should ensure that they
have been supplied the correct and complete question paper. No
complaint in this regard will be entertained after the examination.
Note: Attempt five questions in all. All questions carry equal
marks. Question no. 1 (Section A) is compulsory. Select two
questions from Section B and two questions from Section C.
Section – A
Q-1. Answer the following questions:
(a) Explain how the evolution of database technology led to data mining.
(b) List and describe the five primitives for specifying a data mining task.
(c) Discuss various methods of dealing with the issue of missing values in an attribute.
(d) What is frequent pattern mining? What is it useful for?
(e) Discuss various OLAP operations.
[5×4=20]
Section – B
Q-2. (a) Compare the snowflake schema, fact constellation and starnet query model with examples. [10]
(b) Outline the major steps of decision tree classification. [10]
Q-3. (a) What is boosting? State why it may improve the accuracy of decision tree induction. [5]
(b) Define each of the following data mining functionalities: characterization, discrimination, association rule mining, prediction and clustering analysis. Give examples of each data mining functionality using a real-life database. [15]
Q-4. (a) Is it necessary to have a separate data warehouse system in addition to the OLAP system to analyze data? Why? [5]
(b) Describe the Naïve Bayes and k-NN approaches to classification. [15]
Section – C
Q-5. (a) Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(i) Give the five-number summary of the data.
(ii) Show a boxplot of the data.
(iii) Use smoothing by bin means to smooth the data, using a bin depth of 3.
[10]
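Parts (i) and (iii) are mechanical enough to check with a short script. A minimal Python sketch, assuming the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data (other quartile conventions give slightly different values):

```python
from statistics import mean, median

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with Q1/Q3 taken as the medians of the
    lower and upper halves (excluding the middle value for odd n)."""
    d = sorted(data)
    n = len(d)
    lower, upper = d[:n // 2], d[(n + 1) // 2:]
    return d[0], median(lower), median(d), median(upper), d[-1]

def smooth_by_bin_means(data, depth):
    """Partition the sorted data into equal-depth bins and replace each
    value by its bin's mean (smoothing by bin means)."""
    d = sorted(data)
    out = []
    for i in range(0, len(d), depth):
        bin_ = d[i:i + depth]
        out.extend([mean(bin_)] * len(bin_))
    return out

print(five_number_summary(ages))      # (13, 20, 25, 35, 70)
print(smooth_by_bin_means(ages, 3))
```

The boxplot in (ii) is drawn from these five values; note that 70 lies beyond Q3 + 1.5·IQR = 35 + 1.5·15 = 57.5, so it would typically be plotted as an outlier.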
(b) What are the different storage design structures of a cube? What are the differences between them? [10]
Q-6. (a) Use the two methods below to normalize the following group of data:
200, 300, 400, 600, 1000
(i) min-max normalization by setting min = 0 and max = 1
(ii) z-score normalization
[5]
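A sketch of both normalizations in part (a). One assumption to flag: the z-score here divides by the population standard deviation (n in the denominator); using the sample deviation (n − 1) would scale the results slightly.

```python
data = [200, 300, 400, 600, 1000]

def min_max(values, new_min=0.0, new_max=1.0):
    """Map each value linearly so that min(values) -> new_min and
    max(values) -> new_max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]

def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = (sum((v - mu) ** 2 for v in values) / n) ** 0.5
    return [(v - mu) / sigma for v in values]

print(min_max(data))                       # [0.0, 0.125, 0.25, 0.5, 1.0]
print([round(z, 3) for z in z_score(data)])
# ≈ [-1.061, -0.707, -0.354, 0.354, 1.768]
```

Here the mean is 500 and the population standard deviation is √80000 ≈ 282.84, which is where the z-scores come from.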
(b) A database has five transactions. Let min_sup = 60% and min_conf = 80%.
TID   Items_bought
T1    {M, O, N, K, E, Y}
T2    {D, O, N, K, E, Y}
T3    {M, A, K, E}
T4    {M, U, C, K, Y}
T5    {C, O, O, K, I, E}
(i) Find all frequent itemsets using the Apriori algorithm.
(ii) List all of the strong association rules (with support s and confidence c) matching the following metarule, where X is a variable representing customers and item_i denotes a variable representing items (e.g. "A", "B"):
∀X ∈ transaction, buys(X, item1) ∧ buys(X, item2) ⇒ buys(X, item3) [s, c]
[15]
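A compact Apriori sketch for part (i), with the metarule instances of part (ii) read off the frequent 3-itemsets. Candidates are generated by joining frequent (k−1)-itemsets and pruned by the support count itself, so this is a brute-force sketch rather than a tuned implementation:

```python
from itertools import combinations

transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"),
                set("MUCKY"), set("COOKIE")]
MIN_SUP, MIN_CONF = 0.6, 0.8   # min_sup = 60%, min_conf = 80%

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori():
    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items
             if support(frozenset([i])) >= MIN_SUP]
    frequent = list(level)
    k = 2
    while level:
        # join step: unions of frequent (k-1)-itemsets that form k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates if support(c) >= MIN_SUP]
        frequent.extend(level)
        k += 1
    return frequent

freq = apriori()

# Metarule buys(X, item1) AND buys(X, item2) => buys(X, item3):
# a two-item antecedent and one-item consequent, so only 3-itemsets qualify.
rules = []
for f in (f for f in freq if len(f) == 3):
    for ante in combinations(sorted(f), 2):
        conf = support(f) / support(frozenset(ante))
        if conf >= MIN_CONF:
            rules.append((ante, tuple(f - set(ante)), support(f), conf))

print(sorted("".join(sorted(f)) for f in freq))
print(rules)
```

On this data the frequent itemsets come out to {M}, {O}, {K}, {E}, {Y}; {M,K}, {O,K}, {O,E}, {K,E}, {K,Y}; and {O,K,E}, and the only strong metarule instances are buys(X, O) ∧ buys(X, K) ⇒ buys(X, E) and buys(X, O) ∧ buys(X, E) ⇒ buys(X, K), each with s = 60%, c = 100%.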
Q-7. (a) The following contingency table summarizes supermarket transaction data, where "hotdogs" refers to the transactions containing hot dogs, "¬hotdogs" to the transactions that do not contain hot dogs, "hamburgers" to the transactions containing hamburgers, and "¬hamburgers" to the transactions that do not contain hamburgers.

                hotdogs   ¬hotdogs   Σrow
hamburgers        2000       500     2500
¬hamburgers       1000      1500     2500
Σcol              3000      2000     5000

Suppose that the association rule "hotdogs ⇒ hamburgers" is mined. Given a min_support threshold of 25% and a min_conf threshold of 50%, is this association rule strong?
[5]
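A quick check for part (a). The joint hot-dogs-and-hamburgers count is taken as 2000, the value consistent with both its row total (2500 − 500) and its column total (3000 − 1000):

```python
total = 5000          # all transactions
hotdogs = 3000        # transactions containing hot dogs (column total)
both = 2000           # transactions containing hot dogs and hamburgers

support = both / total          # support of {hotdogs, hamburgers}
confidence = both / hotdogs     # confidence of hotdogs => hamburgers

strong = support >= 0.25 and confidence >= 0.50
print(support, round(confidence, 3), strong)   # 0.4 0.667 True
```

Support = 40% and confidence ≈ 66.7% both clear their thresholds, so the rule is strong. As a follow-up, lift = 0.40 / (0.60 × 0.50) ≈ 1.33 > 1, so buying hot dogs and buying hamburgers are positively correlated rather than independent.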
(b) Briefly outline how to compute the dissimilarity between objects described by the following types of variables:
(i) Numerical (interval-scaled) variables
(ii) Asymmetric binary variables
(iii) Categorical variables
(iv) Ratio-scaled variables
(v) Nonmetric vector objects
[15]
Q-8. Write the algorithm for attribute-oriented induction. [20]