[ FALL, 2015 ] SCHEME OF EVALUATION
PROGRAM: BSc IT
SEMESTER: FOURTH
SUBJECT CODE & NAME: BT0050, DATA WAREHOUSING & MINING
CREDIT: 4
BK ID: B0038
MAX. MARKS: 60
Answer all questions
Question and scheme of evaluation
Q. No. 1
Question: Describe all data mining functionalities.
Unit/page no.: 1/24
Marks: 10
Total Marks: 10
A.
The data mining functionalities are as follows:
1. Association Analysis (2½ marks)
What is association analysis? Association analysis is the
discovery of association rules showing attribute-value
conditions that occur frequently together in a given set of data.
Association analysis is widely used for market basket or
transaction data analysis…..
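As a rough illustration of how an association rule is scored on market basket data, the sketch below computes the support and confidence of one rule. The transactions, the item names, and the helper names `support` and `confidence` are hypothetical and chosen only for this example.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Score the rule bread => milk.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A rule is usually reported only when both values clear user-chosen minimum thresholds.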
2. Classification and Prediction (2½ marks)
Classification is the process of finding a set of models (or
functions) that describe and distinguish data classes or concepts,
for the purpose of being able to use the model to predict the class
of objects whose class label is unknown. The derived model is
based on the analysis of a set of training data (i.e. data objects
whose class label is known)……
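To make the idea of deriving a model from labeled training data concrete, here is a minimal sketch using a 1-nearest-neighbour rule. This is only one of many possible classifiers and is not named in the scheme; the training points, labels, and the function name `predict` are assumptions for illustration.

```python
import math

# Hypothetical training data: (feature vector, known class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]

def predict(point, training):
    """Return the label of the training object closest to point."""
    nearest = min(training, key=lambda pair: math.dist(point, pair[0]))
    return nearest[1]

# Objects whose class label is unknown.
label_a = predict((1.2, 1.4), training)
label_b = predict((5.5, 5.0), training)
print(label_a, label_b)
```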
3. Cluster Analysis (2½ marks)
What is cluster analysis? Unlike classification and prediction,
which analyze class-labeled data objects, clustering analyzes
data objects without consulting a known class label. In general,
the class labels are not present in the training data simply
because they are not known to begin with. Clustering can be
used to generate such labels…..
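The sketch below shows one way clustering can generate labels for unlabeled data, using a simple one-dimensional k-means loop. The choice of k-means, the data values, the starting centers, and the name `kmeans_1d` are assumptions made for this example, not part of the scheme.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then move each center
    to the mean of the values assigned to it."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled values (hypothetical); the cluster index acts as a generated label.
values = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]
centers, clusters = kmeans_1d(values, centers=[0.0, 10.0])
print(centers, clusters)
```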
4. Outlier Analysis (2½ marks)
A database may contain data objects that do not comply with the
general behaviour or model of the data. These data objects are
outliers. Most data mining methods discard outliers as noise or
exceptions. However, in some applications such as fraud
detection, the rare events can be more interesting than the more
regularly occurring ones. The analysis of outlier data is referred to
as outlier mining……
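As a hedged illustration of outlier mining, the sketch below flags values that lie far from the median, measured in units of the median absolute deviation. The threshold of 3, the transaction amounts, and the name `find_outliers` are assumptions for this example; the scheme does not prescribe a particular method.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Flag values whose distance from the median exceeds
    threshold times the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if abs(v - med) / mad > threshold]

# Hypothetical transaction amounts; the large one is the rare event.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = find_outliers(amounts)
print(outliers)
```

In a fraud detection setting, the flagged value would be kept for inspection rather than discarded as noise.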
Q. No. 2
Question: Explain the three-tier Data Warehouse Architecture.
Unit/page no.: 2/66
Marks: 10
Total Marks: 10
A. Data warehouses often adopt a three-tier architecture, as
presented in the figure below:
The bottom tier is a warehouse database server.
The middle tier is an OLAP server.
The top tier is a client.
Q. No. 3
Question: What is noise? Explain data smoothing techniques.
Unit/page no.: 3/74
Marks: 2+8
Total Marks: 10
A. Noise is random error or variance in a measured variable. (2 marks)
The following are data smoothing techniques (4x2=8 marks):
Binning
Clustering
Combined computer and human inspection
Regression
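Of the techniques listed, binning is the simplest to show in code. The sketch below assumes equal-frequency bins and smoothing by bin means, which is one common variant; the price values and the name `smooth_by_bin_means` are illustrative assumptions.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size,
    and replace each value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed

# Hypothetical sorted price data, three values per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
print(smoothed)
```

Other variants replace each value with the bin median or the nearest bin boundary instead of the mean.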
Q. No. 4
Question: Explain in brief the Pincer-Search Algorithm.
Unit/page no.: 4/115
Marks: 10 (explanation: 6)
Total Marks: 10
A. One can see that the Apriori algorithm operates in a bottom-up,
breadth-first search method. The computation starts from the
smallest set of frequent itemsets and moves upward till it
reaches the largest frequent itemset. The number of database
passes is equal to the largest size of the frequent itemset. When
any one of the frequent itemsets becomes longer, the algorithm
has to go through many iterations and, as a result, the
performance decreases.
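The bottom-up, breadth-first behaviour described above can be sketched as a level-wise search that makes one database pass per itemset size. This is a simplified illustration, not the textbook listing; the transactions, the `min_count` threshold, and the function name `apriori` are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise search: returns the frequent itemsets with their
    counts, and the number of database passes made."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    passes = 0
    k = 1
    while candidates:
        passes += 1  # one database pass for the current itemset size
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune
        # any candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in level
                             for s in combinations(c, k - 1))]
    return frequent, passes

# Hypothetical transactions; the largest frequent itemset is {a, b, c}.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}]
frequent, passes = apriori(transactions, min_count=2)
print(passes, sorted(map(sorted, frequent)))
```

In this example the search makes three passes, matching the size of the largest frequent itemset; a longer frequent itemset would force proportionally more passes.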
A natural way to overcome this difficulty is to somehow
incorporate a bi-directional search, which takes advantage of
both the bottom-up as well as the top-down process. The
pincer-search algorithm is based on this principle. It attempts
to find the frequent itemsets in a bottom-up manner but, at the
same time, it maintains a list of maximal frequent itemsets. While
making a database pass, it also counts the support of these
candidate maximal frequent itemsets to see if any one of these is
actually frequent. In that event, it can conclude that all the
subsets of these frequent sets are going to be frequent and,
hence, they are not verified for the support count in the next
pass……..
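The saving described above, skipping the support count for subsets of a maximal itemset already found frequent, can be sketched in isolation as follows. This covers only that one pruning step, not the full pincer-search algorithm; the itemsets and the name `split_candidates` are hypothetical.

```python
def split_candidates(candidates, maximal_frequent):
    """Separate candidates already implied frequent (subsets of a known
    maximal frequent itemset) from those that still need counting."""
    implied, to_count = [], []
    for c in candidates:
        if any(c <= m for m in maximal_frequent):
            implied.append(c)   # frequent by the subset property
        else:
            to_count.append(c)  # must be verified in the next pass
    return implied, to_count

# Suppose {a, b, c} was found frequent during the current pass.
maximal_frequent = [frozenset("abc")]
candidates = [frozenset("ab"), frozenset("ac"),
              frozenset("bc"), frozenset("cd")]
implied, to_count = split_candidates(candidates, maximal_frequent)
print(implied, to_count)
```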
Write pincer search method. (4 marks)
Q. No. 5
Question: Explain the preprocessing steps to improve the efficiency and
accuracy of data classification or prediction process.
Unit/page no.: 5/164
Marks: 4 x 2½
Total Marks: 10
A. The preprocessing steps that may be applied to the data in order to
help improve the accuracy, efficiency, and scalability of the
classification or prediction process are:
Data cleaning
Relevance analysis
Data transformation
Normalization
Explain all.
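As an illustration of the normalization step, the sketch below rescales an attribute to the range 0 to 1 using min-max normalization. This is one common form of normalization and is assumed here; the attribute values and the name `min_max_normalize` are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        return [new_min for _ in values]  # constant attribute
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min)
            for v in values]

# Hypothetical attribute values before normalization.
ages = [10, 20, 30, 50]
normalized = min_max_normalize(ages)
print(normalized)
```

Rescaling in this way keeps attributes with large ranges from outweighing attributes with small ranges in distance-based methods.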
Q. No. 6
Question: What is clustering? Explain the requirements of clustering in
data mining.
Unit/page no.: 6/177, 179
Marks: 2+8
Total Marks: 10
A. The process of grouping a set of physical or abstract objects into
classes of similar objects is called clustering. (2 marks)
Requirements of clustering in data mining are (8 marks):
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input
parameters
Ability to deal with noisy data
Insensitivity to the order of input records
High dimensionality
Constraint-based clustering
Explain all.