SATHYABAMA UNIVERSITY
(Established under Section 3 of UGC Act, 1956)
Course & Branch: M.Tech - IT/W-IT
Title of the Paper: Data Mining and Data Warehousing
Max. Marks: 80
Sub. Code: 731301
Time: 3 Hours
Date: 10/05/2010
Session: FN
______________________________________________________________________
PART - A
(6 x 5 = 30)
Answer ALL the Questions

1. Use a flow chart to summarize the following procedures for attribute subset selection:
(a) Stepwise forward selection
(b) Stepwise backward elimination
(c) A combination of forward selection and backward elimination.
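For (a), a minimal Python sketch of the greedy loop that a stepwise forward selection flow chart would depict; score is a hypothetical, caller-supplied subset-evaluation function (e.g. accuracy of a classifier trained on the subset), and backward elimination is the mirror image that starts from the full attribute set and drops attributes instead.

def stepwise_forward_selection(attributes, score):
    """Greedily grow an attribute subset: start from the empty set and,
    at each step, add the single remaining attribute whose inclusion
    gives the best score.  Stops when no attribute improves the score.
    `score(subset)` is a hypothetical goodness measure supplied by the caller."""
    selected, remaining = [], list(attributes)
    best = float("-inf")
    while remaining:
        candidate = max(remaining, key=lambda a: score(selected + [a]))
        cand_score = score(selected + [candidate])
        if cand_score <= best:          # no remaining attribute helps any more
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected

# Toy usage: the score favours a hypothetical pair of 'relevant' attributes
# and slightly penalises subset size.
relevant = {"age", "income"}
toy_score = lambda subset: len(relevant & set(subset)) - 0.1 * len(subset)
print(stepwise_forward_selection(["age", "income", "zip", "name"], toy_score))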
2. Outline a general procedure that describes how a class comparison is performed. Can class comparison mining be implemented efficiently using data cube techniques?
3. Illustrate, with a good example, the working of attribute-oriented induction.
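As a rough illustration of the topic of Q3, a small Python sketch of the core attribute-oriented induction step on hypothetical tuples: each attribute is generalized by climbing a concept hierarchy, and identical generalized tuples are merged while a count is accumulated.

from collections import Counter

# Hypothetical tuples: (age, city), with a concept hierarchy city -> country.
tuples = [(23, "Vancouver"), (25, "Toronto"), (30, "Seattle"), (28, "Boston")]
city_to_country = {"Vancouver": "Canada", "Toronto": "Canada",
                   "Seattle": "USA", "Boston": "USA"}

def generalize(t):
    age, city = t
    age_range = "20-29" if age < 30 else "30-39"   # climb the age hierarchy
    return (age_range, city_to_country[city])      # climb the city hierarchy

# Generalize every tuple, then merge identical generalized tuples with a count.
prime_relation = Counter(generalize(t) for t in tuples)
for generalized_tuple, count in prime_relation.items():
    print(generalized_tuple, "count =", count)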
4. Why is tree pruning useful in decision tree induction? What is the drawback of using a separate set of samples to evaluate pruning?
5. Compare the advantages and disadvantages of eager classification (e.g., Bayesian, neural network) versus lazy classification (e.g., k-nearest neighbour, case-based reasoning).
6. Write an algorithm for k-nearest-neighbour classification, given k and n, the number of attributes describing each sample.
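A minimal Python sketch of the k-nearest-neighbour classifier Q6 asks for, assuming n numeric attributes, Euclidean distance, and a simple majority vote among the k closest training samples.

import math
from collections import Counter

def knn_classify(query, training_samples, k):
    """training_samples: list of (attribute_vector, class_label) pairs,
    where each attribute_vector holds n numeric attribute values.
    Returns the majority class among the k nearest neighbours of `query`."""
    def distance(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    neighbours = sorted(training_samples,
                        key=lambda s: distance(query, s[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Example with two attributes (n = 2) and k = 3.
data = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((3.0, 3.1), "B"), ((2.9, 3.3), "B")]
print(knn_classify((1.1, 0.9), data, k=3))   # majority of the 3 nearest is "A"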
PART – B
(5 x 10 = 50)
Answer ALL the Questions
7. Suppose that the data for analysis include the attribute “age”. The age values for the data tuples are: 13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52 and 70.
(a) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0]. (3)
(b) Use z-score normalization to transform the value 35 for age, where the standard deviation of age is 12.94 years. (3)
(c) Use normalization by decimal scaling to transform the value 35 for age.
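For reference, a short Python sketch that applies the three transformations of Q7 to the listed age values; the 12.94-year standard deviation used in (b) is the one given in the question.

ages = [13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
v = 35

# (a) Min-max normalization onto [0.0, 1.0]: v' = (v - min) / (max - min)
min_max = (v - min(ages)) / (max(ages) - min(ages))        # (35 - 13) / (70 - 13)

# (b) Z-score normalization: v' = (v - mean) / std_dev, with std_dev given as 12.94
mean = sum(ages) / len(ages)                               # 810 / 27 = 30.0
z_score = (v - mean) / 12.94

# (c) Decimal scaling: v' = v / 10**j, with the smallest j such that max(|v'|) < 1
j = len(str(max(abs(a) for a in ages)))                    # 70 has 2 digits, so j = 2
decimal_scaled = v / (10 ** j)

print(round(min_max, 3), round(z_score, 3), decimal_scaled)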
(or)
8. Elaborate on the various discretization and concept hierarchy generation methods for numeric data. Comment on which method you would prefer to use in which situation, giving reasons why.
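As a concrete instance of one of the simpler methods Q8 covers, a minimal Python sketch of unsupervised equal-width binning; equal-frequency binning, histogram analysis, entropy-based discretization, cluster analysis and segmentation by natural partitioning (the 3-4-5 rule) are the other usual candidates.

def equal_width_bins(values, num_bins):
    """Discretize numeric values into num_bins intervals of equal width
    and return, for each value, the (low, high) interval it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    def bin_of(v):
        i = min(int((v - lo) / width), num_bins - 1)   # clamp the maximum value
        return (lo + i * width, lo + (i + 1) * width)
    return [bin_of(v) for v in values]

ages = [13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
print(sorted(set(equal_width_bins(ages, 3))))   # three intervals of width 19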
9. Suppose that the data for analysis include the attribute “age”. The age values for the data tuples are: 13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52 and 70.
(a) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.). (2)
(b) What is the midrange of the data? (2)
(c) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data? (4)
(d) Give the five-number summary of the data. (2)
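A quick sketch of Q9's summary statistics using Python's standard statistics module; different quartile conventions give slightly different Q1/Q3 values, which is why part (c) only asks for rough figures.

import statistics

ages = [13, 15, 16, 16, 19, 20, 21, 21, 22, 22, 25, 25, 25, 25,
        30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mode = statistics.multimode(ages)                     # (a) most frequent value(s)
midrange = (min(ages) + max(ages)) / 2                # (b) midpoint of min and max
q1, median, q3 = statistics.quantiles(ages, n=4)      # (c) quartiles (one convention)
five_number = (min(ages), q1, median, q3, max(ages))  # (d) five-number summary

print(mode, midrange, five_number)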
(or)
10. For class characterization, what are the major differences between a data cube-based implementation and a relational implementation such as attribute-oriented induction? Discuss which method is most efficient and under what conditions this is so.
11. (a) Briefly outline the major steps of a decision tree classifier.
(b) Given a decision tree, you have the option of:
(i) Converting the decision tree to rules and then pruning the resulting rules.
(ii) Pruning the decision tree and then converting the pruned tree to rules.
What advantage does (i) have over (ii)?
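To make (b) concrete, a small Python sketch of the tree-to-rules conversion in option (i) on a hypothetical tree: each root-to-leaf path becomes one IF-THEN rule, and because every rule can then be pruned independently of the others, rule pruning is not constrained by the shared tree structure the way subtree pruning in (ii) is.

# Hypothetical tree representation: internal node = (attribute, {value: subtree}),
# leaf = class label (a plain string).
tree = ("age", {
    "youth":       ("student", {"no": "does_not_buy", "yes": "buys"}),
    "middle_aged": "buys",
    "senior":      ("credit", {"fair": "buys", "excellent": "does_not_buy"}),
})

def tree_to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if isinstance(node, str):                       # leaf: emit one rule
        return [(list(conditions), node)]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

for antecedent, consequent in tree_to_rules(tree):
    tests = " AND ".join(f"{a} = {v}" for a, v in antecedent)
    print(f"IF {tests} THEN class = {consequent}")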
(or)
12. It is difficult to assess classification accuracy when individual data objects may belong to more than one class at a time. In such cases, comment on what criteria you would use to compare different classifiers built from the same data. Illustrate with a simple example.
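One possible criterion for Q12 is an accuracy measure that gives partial credit when an object's predicted label set only partly overlaps its true label set; a small sketch (with hypothetical labels) using the Jaccard overlap of the two sets.

def overlap_accuracy(true_labels, predicted_labels):
    """Average Jaccard overlap |T ∩ P| / |T ∪ P| over all objects, so an
    object scores 1.0 for a perfect match and partial credit otherwise."""
    scores = []
    for t, p in zip(true_labels, predicted_labels):
        t, p = set(t), set(p)
        scores.append(len(t & p) / len(t | p) if (t | p) else 1.0)
    return sum(scores) / len(scores)

# Hypothetical example: documents tagged with one or more topics.
truth = [{"sports"}, {"finance", "politics"}, {"sports", "health"}]
pred  = [{"sports"}, {"finance"},             {"health"}]
print(overlap_accuracy(truth, pred))   # (1 + 1/2 + 1/2) / 3 ≈ 0.67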
13. Briefly compare the following concepts. You may use an example to explain your answer.
(a) Snowflake schema, fact constellation, starnet query model.
(b) Discovery-driven cube, multi-feature cube, virtual warehouse.
(or)
14. Suppose that a data warehouse contains 20 dimensions, each with about 5 levels of granularity.
(a) Users are mainly interested in 4 particular dimensions, each having 3 frequently accessed levels for rolling up and drilling down. How would you design a data cube structure to support this preference efficiently?
(b) At times a user may want to drill through the cube, down to the raw data, for one or two particular dimensions. How would you support this feature?
15. Propose a few implementation methods for WWW mining, discussing the underlying issues.
(or)
16. Elaborately discuss the social impacts of data mining.