CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE OCTOBER15
ASSESSMENT_CODE MC0088_OCTOBER15
QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72563
QUESTION_TEXT
In relation to Association Rule Mining define:
a. Association rule
b. Frequency set
c. Maximal frequency set
d. Border set
SCHEME OF EVALUATION
a. Association rule: An association rule is an implication of the form
X ⇒ Y, where X and Y are disjoint item sets. Association rules can be
classified in various ways, based on the following criteria:
● Based on the types of values handled in the rule
● Based on the dimensions of data involved in the rule
● Based on the levels of abstraction involved in the rule set
● Based on various extensions to association mining
b. Frequency set: Let T be the transaction database and σ be the
user-specified minimum support. An item set X ⊆ A, where A is the set
of all items, is said to be a frequent item set in T with respect to σ
if s(X)_T ≥ σ, where s(X)_T is the support of X in T.
c. Maximal frequency set: A frequent set is a maximal frequent set if
it is a frequent set and no superset of this is a frequent set.
d. Border set: An item set is a border set if it is not a frequent set,
but all its proper subsets are frequent sets.
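The three set definitions above can be checked by brute force on a small example. The sketch below is only an illustration: the transaction database, the item names and the threshold `min_count` (the minimum support expressed as an absolute count) are all hypothetical, and enumerating every item set is only practical for a handful of items.

```python
from itertools import combinations

# Hypothetical toy transaction database T; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread"},
]
min_count = 3  # assumed minimum support, expressed as an absolute count


def support_count(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def proper_nonempty_subsets(itemset):
    """All non-empty proper subsets of itemset, as frozensets."""
    return [frozenset(c) for k in range(1, len(itemset))
            for c in combinations(itemset, k)]


items = sorted(set().union(*transactions))
candidates = [frozenset(c) for k in range(1, len(items) + 1)
              for c in combinations(items, k)]

# Frequent sets: support reaches the threshold.
frequent = {s for s in candidates
            if support_count(s, transactions) >= min_count}
# Maximal frequent sets: frequent, with no frequent proper superset.
maximal = {s for s in frequent if not any(s < other for other in frequent)}
# Border sets: not frequent, but every proper subset is frequent.
border = {s for s in candidates
          if s not in frequent
          and all(sub in frequent for sub in proper_nonempty_subsets(s))}
```

On this toy data every single item and every pair is frequent, the three pairs are the maximal frequent sets, and the full three-item set is the only border set.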
QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72564
QUESTION_TEXT
Define these Data mining techniques:
a. Classification
b. Regression
c. Clustering
d. Neural networks
SCHEME OF EVALUATION
a. Classification: Classification is a Data Mining (machine learning)
technique used to predict group membership for data instances.
b. Regression: Regression is one of the oldest and best-known
statistical techniques used by the Data Mining community. Basically,
Regression takes a numerical dataset and develops a mathematical
formula (e.g. y = a + bx, where y is the dependent variable and x is the
independent variable) that fits the data.
c. Clustering: Clustering is a method of grouping data into different
groups, so that the data in each group share similar trends and
patterns.
d. Neural networks: An Artificial Neural Network (ANN) is an
information-processing paradigm that is inspired by the way biological
nervous systems, such as the brain, process information.
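As an illustration of the regression formula y = a + bx given in (b), the sketch below fits a and b by ordinary least squares. The function name `fit_line` and the sample data are hypothetical, and least squares is one common way of fitting the line, not the only one.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data generated from y = 1 + 2x, so the fit should recover it.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
predicted = a + b * 5.0  # prediction for a new x value
```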
QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117787
QUESTION_TEXT
Explain Dynamic Itemset Counting algorithm.
SCHEME OF EVALUATION
Initially:
  Solid box contains the empty itemset;
  Solid circle is empty;
  Dashed box is empty;
  Dashed circle contains all 1-itemsets, each with stop-number 0;
  Current stop-number := 0;
do until the dashed circle is empty
  read the database up to the next stop point, increasing the counters
  of the itemsets in the dashed box and in the dashed circle record by
  record;
  increase the current stop-number by 1;
  for each itemset in the dashed circle
    if the count of the itemset is greater than the minimum support σ
    then move the itemset to the dashed box;
         generate a new itemset to be put into the dashed circle
         with counter value = 0 and stop-number = current stop-number
    else if its stop-number is equal to the current stop-number
    then move this itemset to the solid circle
  for each itemset in the dashed box
    if its stop-number is equal to the current stop-number
    then move this itemset to the solid box
end
return the itemsets in the solid box
Algorithm 7 marks
Explanation 3 marks
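The algorithm above can be sketched in runnable form as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `dic`, the parameter `step` (number of records between stop points) and the toy data are hypothetical; a per-itemset `seen` counter replaces the stop-number bookkeeping; the threshold test is `>= min_count` with an absolute count; a new superset is generated only once all of its immediate subsets are in a box; and the loop runs until both dashed sets are empty so that every frequent itemset reaches the solid box.

```python
def dic(transactions, min_count, step):
    """Return all itemsets whose support count is >= min_count."""
    db = [frozenset(t) for t in transactions]
    n = len(db)
    items = sorted(set().union(*db))

    solid_box = {frozenset()}     # confirmed frequent
    solid_circle = set()          # confirmed infrequent
    dashed_box = set()            # still counting, threshold already reached
    dashed_circle = {frozenset([i]) for i in items}  # still counting
    count = {s: 0 for s in dashed_circle}  # support counted so far
    seen = {s: 0 for s in dashed_circle}   # records read since itemset was added

    pos = 0
    while dashed_circle or dashed_box:
        # Read up to the next stop point, wrapping around at the end.
        for _ in range(step):
            record = db[pos]
            pos = (pos + 1) % n
            for s in dashed_circle | dashed_box:
                if seen[s] < n:   # never count a record twice
                    seen[s] += 1
                    if s <= record:
                        count[s] += 1

        # Promote itemsets that reached the threshold; generate supersets.
        for s in list(dashed_circle):
            if count[s] >= min_count:
                dashed_circle.remove(s)
                dashed_box.add(s)
                boxes = dashed_box | solid_box
                for item in items:
                    cand = s | {item}
                    if item in s or cand in count:
                        continue
                    # Start counting cand only if all its immediate
                    # subsets are already suspected or confirmed frequent.
                    if all(cand - {x} in boxes for x in cand):
                        dashed_circle.add(cand)
                        count[cand] = 0
                        seen[cand] = 0

        # Itemsets that have seen the whole database become solid.
        for s in [s for s in dashed_circle if seen[s] >= n]:
            dashed_circle.remove(s)
            solid_circle.add(s)
        for s in [s for s in dashed_box if seen[s] >= n]:
            dashed_box.remove(s)
            solid_box.add(s)

    return solid_box - {frozenset()}


# Hypothetical toy database; stop points every 2 records.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread"},
]
result = dic(transactions, min_count=3, step=2)
```

Because each itemset is counted over exactly one full wrap-around pass from the point where it was added, its final count equals its true support, so the result does not depend on the choice of `step`.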
QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117789
QUESTION_TEXT
What is data cleaning? Explain missing values method for data
cleaning.
SCHEME OF EVALUATION
Data cleaning routines attempt to fill in missing values, smooth out
noise while identifying outliers, and correct inconsistencies in the
data. (2 marks)
Missing value methods are:
1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant to fill in the missing value (4 marks)
4. Use the attribute mean to fill in the missing value
5. Use the attribute mean for all samples belonging to the same
class as the given tuple
6. Use the most probable value to fill in the missing value (4 marks)
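Four of the methods listed above (ignoring the tuple, a global constant, the attribute mean, and the attribute mean within the tuple's class) can be sketched in a few lines. The rows, the attribute names `income` and `cls`, the sentinel `-1.0` and all function names below are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"income": 30.0, "cls": "low"},
    {"income": None, "cls": "low"},
    {"income": 50.0, "cls": "low"},
    {"income": 90.0, "cls": "high"},
    {"income": None, "cls": "high"},
    {"income": 110.0, "cls": "high"},
]


def ignore_tuples(rows, attr):
    """Drop every tuple whose attr is missing."""
    return [dict(r) for r in rows if r[attr] is not None]


def fill_with(rows, attr, value_for):
    """Copy rows, replacing a missing attr by value_for(row)."""
    return [dict(r, **{attr: value_for(r)}) if r[attr] is None else dict(r)
            for r in rows]


def fill_constant(rows, attr, constant):
    """Replace missing values by one global constant."""
    return fill_with(rows, attr, lambda r: constant)


def fill_mean(rows, attr):
    """Replace missing values by the mean of the known values."""
    overall = mean(r[attr] for r in rows if r[attr] is not None)
    return fill_with(rows, attr, lambda r: overall)


def fill_class_mean(rows, attr, class_attr):
    """Replace missing values by the mean within the tuple's own class."""
    class_means = {
        c: mean(r[attr] for r in rows
                if r[class_attr] == c and r[attr] is not None)
        for c in {r[class_attr] for r in rows}
    }
    return fill_with(rows, attr, lambda r: class_means[r[class_attr]])


dropped = ignore_tuples(rows, "income")
by_constant = fill_constant(rows, "income", -1.0)
by_mean = fill_mean(rows, "income")
by_class_mean = fill_class_mean(rows, "income", "cls")
```

Filling in manually and using the most probable value are left out: the first is not automated, and the second needs a predictive model (for example regression or a decision tree) fitted on the other attributes.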
QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117791
QUESTION_TEXT
Discuss Data Mining technologies.
a. Decision trees
b. Rule induction
c. Genetic algorithms
d. Nearest neighbor
e. Artificial neural networks
SCHEME OF EVALUATION
(2 marks each with explanation)
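Of the technologies listed, nearest neighbor is the easiest to show in a few lines: a new record receives the class of the most similar record already seen. The sketch below is one possible reading, assuming numeric features and Euclidean distance; the function name `nearest_neighbor_predict` and the training data are hypothetical.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training record closest to query.

    train is a list of (features, label) pairs; features and query are
    equal-length tuples of numbers. Distance is Euclidean (an assumption).
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Hypothetical training records with two numeric features each.
train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((5.5, 4.5), "B"),
]
label_near_a = nearest_neighbor_predict(train, (1.1, 0.9))
label_near_b = nearest_neighbor_predict(train, (5.2, 5.1))
```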
QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117793
QUESTION_TEXT
List and explain the various criteria used to compare the
classification methods.
SCHEME OF EVALUATION
● Predictive accuracy
● Speed
● Robustness
● Scalability
● Interpretability
5×2=10 marks
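Two of these criteria, predictive accuracy and speed, can be measured directly on held-out data. The sketch below is a minimal illustration: the helper `evaluate`, the stand-in `threshold_classifier` and the test records are all hypothetical, and accuracy is taken as the fraction of test records classified correctly.

```python
import time


def evaluate(classifier, test_set):
    """Return (accuracy, seconds) for classifier on labeled test_set.

    test_set is a list of (features, true_label) pairs; classifier is any
    function mapping features to a predicted label.
    """
    start = time.perf_counter()
    predictions = [classifier(x) for x, _ in test_set]
    elapsed = time.perf_counter() - start
    correct = sum(1 for p, (_, y) in zip(predictions, test_set) if p == y)
    return correct / len(test_set), elapsed


def threshold_classifier(x):
    """Hypothetical stand-in classifier: a single numeric cut-off."""
    return "high" if x >= 50 else "low"


# Hypothetical held-out records; the last one is deliberately misclassified.
test_set = [(10, "low"), (40, "low"), (55, "high"), (90, "high"), (60, "low")]
accuracy, seconds = evaluate(threshold_classifier, test_set)
```

Robustness and scalability could be probed with the same helper by rerunning it on noisy or much larger test sets; interpretability has no comparable single number.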