CUSTOMER_CODE
SMUDE
DIVISION_CODE
SMUDE
EVENT_CODE
Jan2017
ASSESSMENT_CODE MC0088_Jan2017
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5228
QUESTION_TEXT
Explain the concept of data transformation.
SCHEME OF
EVALUATION
In data transformation, the data are transformed or consolidated into
forms appropriate for mining. It involves the following: (2 marks)
Smoothing, which works to remove noise from the data. Such
techniques include binning, clustering and regression.
Aggregation, where summary or aggregation operations are applied to
the data. This step is typically used in constructing a data cube for
analysis of the data at multiple granularities. (2 marks)
Generalization of the data, where low-level or primitive data are
replaced by higher-level concepts through the use of concept
hierarchies. For example, street can be generalized to city or country. (2
marks)
Normalization, where attribute data are scaled so as to fall within a
small specified range such as -1.0 to 1.0 or 0.0 to 1.0. (2 marks)
Attribute construction, where new attributes are constructed and added
from the given set of attributes to help the mining process. (2 marks)
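Two of the operations above, normalization and smoothing by binning, can be sketched in a few lines of Python. This is a minimal illustration rather than part of the marking scheme: the function names `min_max_normalize` and `smooth_by_bin_means`, the choice of min-max scaling, and the use of equal-size bins replaced by their means are all assumptions made for the example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map each one to the lower bound.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


if __name__ == "__main__":
    print(min_max_normalize([10, 20, 30]))             # range 0.0 to 1.0
    print(min_max_normalize([10, 20, 30], -1.0, 1.0))  # range -1.0 to 1.0
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```

With the nine sample values and a bin size of 3, the bins are (4, 8, 15), (21, 21, 24) and (25, 28, 34), so each value is replaced by 9, 22 or 29 respectively.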
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5229
QUESTION_TEXT
What is Divisive clustering? Write algorithmic steps for the divisive
clustering.
SCHEME OF
EVALUATION
Divisive clustering, also called top-down clustering, is a variant of
hierarchical clustering. We start at the top with all documents in one cluster.
The cluster is split using a flat clustering algorithm. This procedure is
applied recursively until each document is in its own singleton cluster. (2
marks)
Top-down clustering is conceptually more complex than bottom-up
clustering since we need a second, flat clustering algorithm as a
subroutine. It has the advantage of being more efficient if we do not
generate a complete hierarchy all the way down to individual document
leaves. For a fixed number of top levels, using an efficient flat algorithm
like k-means, top-down algorithms are linear in the number of
documents and clusters. (3 marks)
Algorithm: Divisive clustering starts by placing all objects into a single
group. Before we start the procedure, we need to decide on a threshold
distance. Once this is done, the procedure is: (1 mark)
1. The distance between all the pairs of objects within the same group is
determined and the pair with the largest distance is selected. (1 mark)
2. This maximum distance is compared to the threshold distance. (2 marks)
a. If it is larger than the threshold, this group is divided in two. This is
done by placing the selected pair into different groups and using them as
seed points. All other objects in this group are examined, and are placed
into the new group with the closest seed point. The procedure then
returns to step 1. (1 mark)
b. If the distance between the selected objects is less than the threshold,
the divisive clustering stops. (Total 10 marks)
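The threshold-based procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: objects are assumed to be numeric tuples compared with Euclidean distance (`math.dist`, Python 3.8+), the names `farthest_pair` and `divisive_clustering` are hypothetical, and a maximum distance exactly equal to the threshold is treated as a stop condition, a case the steps above leave open.

```python
from itertools import combinations
from math import dist


def farthest_pair(group):
    """Return (distance, a, b) for the most distant pair in a group,
    or None when the group has fewer than two objects."""
    if len(group) < 2:
        return None
    return max(
        ((dist(a, b), a, b) for a, b in combinations(group, 2)),
        key=lambda item: item[0],
    )


def divisive_clustering(points, threshold):
    """Split groups until no within-group pair is farther apart than threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    groups = [list(points)]  # start with all objects in a single group
    while True:
        # Step 1: find the most distant pair inside any one group.
        best = None
        for index, group in enumerate(groups):
            pair = farthest_pair(group)
            if pair is not None and (best is None or pair[0] > best[0][0]):
                best = (pair, index)
        # Step 2b: stop when the largest distance does not exceed the threshold.
        if best is None or best[0][0] <= threshold:
            return groups
        # Step 2a: use the selected pair as seeds and split their group in two.
        (_, seed_a, seed_b), index = best
        group = groups.pop(index)
        near_a, near_b = [], []
        for p in group:
            target = near_a if dist(p, seed_a) <= dist(p, seed_b) else near_b
            target.append(p)
        groups.extend([near_a, near_b])


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for cluster in divisive_clustering(sample, threshold=3):
        print(cluster)
```

On the six sample points with a threshold of 3, one split separates the points near the origin from those near (10, 10), after which every within-group distance is below the threshold and the procedure stops.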
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72562
QUESTION_TEXT
What are the four different views regarding the design of a data
warehouse? Explain.
SCHEME OF
EVALUATION
The views are:
● Top-down view
● Data source view
● Data warehouse view
● Business query view
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72564
QUESTION_TEXT
Define these Data mining techniques:
a. Classification
b. Regression
c. Clustering
d. Neural networks
SCHEME OF
EVALUATION
a. Classification: Classification is a Data Mining (machine learning)
technique used to predict group membership for data instances.
b. Regression: Regression is the oldest and best-known
statistical technique that the Data Mining community utilizes. Basically,
Regression takes a numerical dataset and develops a mathematical
formula (e.g. y = a + bx, where y is the dependent variable and x is the
independent variable) that fits the data.
c. Clustering: Clustering is a method of grouping data into different
groups, so that the data in each group share similar trends and
patterns.
d. Neural networks: An Artificial Neural Network (ANN) is an
information-processing paradigm that is inspired by the way biological
nervous systems, such as the brain, process information.
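The regression formula y = a + bx in definition (b) can be illustrated with a short least-squares fit. This is only a sketch: the function name `fit_line` and the choice of ordinary least squares are assumptions for the example, since the definition above does not say how the formula is fitted.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = sxy / sxx
    # Intercept a: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(f"y = {a} + {b}x")
```

The sample points lie exactly on y = 1 + 2x, so the fit recovers an intercept of 1 and a slope of 2.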
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117791
QUESTION_TEXT
Discuss Data Mining technologies.
a. Decision trees
b. Rule induction
c. Genetic algorithms
d. Nearest neighbor
e. Artificial neural networks
SCHEME OF EVALUATION
(2 marks each with explanation)
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117792
QUESTION_TEXT
Explain constraint-based association mining.
SCHEME OF EVALUATION
● Knowledge type constraints
● Data constraints
● Dimensional/level constraints
● Interestingness constraints
● Rule constraints
5×2=10 marks