CUSTOMER_CODE
SMUDE
DIVISION_CODE
SMUDE
EVENT_CODE
APR2016
ASSESSMENT_CODE
MC0088_APR2016
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5228
QUESTION_TEXT
Explain the concept of data transformation.
SCHEME OF
EVALUATION
In data transformation, the data are transformed or consolidated into
forms appropriate for mining. It involves the following: (2 marks)
Smoothing, which works to remove noise from the data. Such
techniques include binning, clustering and regression.
Aggregation, where summary or aggregation operations are applied to
the data. This step is typically used in constructing a data cube for
analysis of the data at multiple granularities. (2 marks)
Generalization of the data, where low-level or primitive data are
replaced by higher-level concepts through the use of concept
hierarchies. For example, street can be generalized to city or country. (2
marks)
Normalization, where attribute data are scaled so as to fall within a
small specified range such as -1.0 to 1.0 or 0.0 to 1.0. (2 marks)
Attribute construction, where new attributes are constructed and added
from the given set of attributes to help the mining process. (2 marks)
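As an illustration of the normalization step, here is a minimal sketch of min-max scaling into a chosen range. The function name `min_max_normalize` and the sample income values are made up for this example and are not part of the evaluation scheme.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All values are identical: map every value to the lower bound.
        return [new_min for _ in values]
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min for v in values]


# Hypothetical attribute values, chosen only for illustration.
incomes = [12000, 35000, 58000, 104000]
unit_range = min_max_normalize(incomes)               # scaled into 0.0 to 1.0
signed_range = min_max_normalize(incomes, -1.0, 1.0)  # scaled into -1.0 to 1.0
print(unit_range)
print(signed_range)
```

The smallest value maps to the lower bound and the largest to the upper bound; every other value keeps its relative position between them.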
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5230
QUESTION_TEXT
Explain Web structure mining.
SCHEME OF
EVALUATION
Web structure mining focuses on analysis of the link structure of the web,
and one of its purposes is to identify more preferable documents. The
different objects are linked in some way. The intuition is that a hyperlink
from document A to document B implies that the author of document A
considers document B worthwhile. (2 marks)
Web structure mining helps in discovering similarities between web sites
or discovering important sites for a particular topic or discipline or in
discovering web communities. Simply applying the traditional process
and assuming that the events are independent can lead to wrong
conclusions. (2 marks)
The goal of web structure mining is to generate a structural summary
about the web site and web page. Technically, web content mining
mainly focuses on the structure within a document, while web structure
mining tries to discover the link structure of the hyperlinks at the
inter-document level. (2 marks)
Based on the topology of the hyperlinks, web structure mining will
categorize the web pages and generate information such as the
similarity and relationship between different web sites. Web structure
mining can also take another direction: discovering the structure of the
web document itself. (2 marks)
This type of structure mining can be used to reveal the structure (schema)
of web pages. This would be good for navigation purposes and make it
possible to compare/integrate web page schemas. This type of structure
mining will facilitate introducing database techniques for accessing
information on the web by providing a reference schema. (2 marks)
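The idea that hyperlinks point to more preferable documents can be made concrete with a link-analysis score. The sketch below uses a simplified PageRank iteration, which is one well-known technique and is not named in the scheme itself. The function name, the damping value of 0.85 and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from link structure; links maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without out-links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: A links to B and C, B to C, C to A, D to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_web)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy graph, page C collects the most in-links and ends up with the highest score, while page D, which nothing links to, ends up with the lowest.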
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72559
QUESTION_TEXT
Discuss data smoothing technologies.
SCHEME OF
EVALUATION
a. Binning
b. Clustering
c. Combined computer and human inspection
d. Regression
(2.5 marks each with explanation)
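Of the techniques listed, binning is the easiest to show in a few lines. The following is a minimal sketch of smoothing by bin means with equal-frequency bins. The function name `smooth_by_bin_means` and the sample price list are illustrative assumptions, not part of the scheme.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical sorted price data, three values per bin.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, bin_size=3)
print(smoothed_prices)
```

Each value is replaced by the mean of its bin, so local fluctuations inside a bin disappear while the overall trend across bins is kept.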
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117786
QUESTION_TEXT
Define Data Mining and DBMS. Differentiate between them.
SCHEME OF
EVALUATION
Data Mining or knowledge discovery in databases, as it is also known, is the
non-trivial extraction of implicit, previously unknown and potentially useful
information from the data.
Data mining is the search for the relationships and global patterns that exist
in large databases but are hidden among vast amounts of data, such as
relationship between patient data and their medical diagnosis.
Data Mining is the process of discovering meaningful, new correlation
patterns and trends by sifting through large amounts of data stored in
repositories, using pattern recognition techniques.
(Any 1 definition 1 mark)
A DBMS is a "Database Management System". This is the software that
manages data on physical storage devices.
(1 mark)
Differences: (8 marks)
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117788
QUESTION_TEXT
Explain the methods of classification by Decision Tree Induction.
SCHEME OF
EVALUATION
A decision tree is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch represents an
outcome of the test, and leaf nodes represent classes or class
distributions. The topmost node in a tree is the root node.
Algorithm: Generate_decision_tree. Generate a decision tree from
the given training data.
Input: The training samples, samples, represented by discrete-valued
attributes; the set of candidate attributes, attribute-list.
Output: A decision tree
Method:
1. create a node N;
2. if samples are all of the same class, C then
3.     return N as a leaf node labeled with the class C;
4. if attribute-list is empty then
5.     return N as a leaf node labeled with the most common class in samples; // majority voting
6. select test-attribute, the attribute among attribute-list with the highest information gain;
7. label node N with test-attribute;
8. for each known value ai of test-attribute // partition the samples
9.     grow a branch from node N for the condition test-attribute = ai;
10.    let si be the set of samples in samples for which test-attribute = ai; // a partition
11.    if si is empty then
12.        attach a leaf labeled with the most common class in samples;
13.    else attach the node returned by Generate_decision_tree(si, attribute-list - test-attribute);
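The method above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The dictionary-based tree, the `domains` argument listing the known values of each attribute, and the small weather-style training set are all assumptions introduced for the example.

```python
import math
from collections import Counter


def entropy(samples, target):
    """Entropy of the class distribution in samples."""
    counts = Counter(s[target] for s in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(samples, attribute, target):
    """Reduction in entropy obtained by partitioning samples on attribute."""
    total = len(samples)
    remainder = 0.0
    for value in {s[attribute] for s in samples}:
        subset = [s for s in samples if s[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(samples, target) - remainder


def majority_class(samples, target):
    return Counter(s[target] for s in samples).most_common(1)[0][0]


def generate_decision_tree(samples, attribute_list, target, domains):
    classes = {s[target] for s in samples}
    if len(classes) == 1:                       # steps 2-3: pure node becomes a leaf
        return classes.pop()
    if not attribute_list:                      # steps 4-5: majority voting
        return majority_class(samples, target)
    test_attribute = max(                       # step 6: highest information gain
        attribute_list, key=lambda a: information_gain(samples, a, target))
    node = {"attribute": test_attribute, "branches": {}}   # steps 1 and 7
    remaining = [a for a in attribute_list if a != test_attribute]
    for value in domains[test_attribute]:       # steps 8-10: partition the samples
        subset = [s for s in samples if s[test_attribute] == value]
        if not subset:                          # steps 11-12: empty partition
            node["branches"][value] = majority_class(samples, target)
        else:                                   # step 13: recurse on the partition
            node["branches"][value] = generate_decision_tree(
                subset, remaining, target, domains)
    return node


# Hypothetical training data, made up for illustration.
training = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
]
domains = {"outlook": ["sunny", "rainy", "overcast"], "windy": ["no", "yes"]}
tree = generate_decision_tree(training, ["outlook", "windy"], "play", domains)
print(tree)
```

On this data, `outlook` has the higher information gain and becomes the root. Only the `sunny` branch is still mixed, so it is split again on `windy`.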
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117790
QUESTION_TEXT
Define FP-Tree. Explain FP-tree construction Algorithm
SCHEME OF
EVALUATION
A frequent pattern tree (or FP-tree) is a tree structure consisting of an
item-prefix-tree and a frequent-item-header table. (3 marks)
* Item-prefix-tree:
   * It consists of a root node labelled null
   * Each non-root node consists of three fields:
      * Item name
      * Support count
      * Node link
* Frequent-item-header-table: it consists of two fields:
   * Item name
   * Head of node link, which points to the first node in the FP-tree
(7 marks)
FP-Tree construction algorithm is given in page no: 113
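The referenced algorithm is not reproduced in this scheme, so the sketch below follows the commonly described two-pass construction: count item supports, then insert each transaction with its frequent items ordered by descending support. The class `FPNode`, the function `build_fp_tree`, the alphabetical tie-break and the sample transactions are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item        # item name (None for the null root)
        self.count = 0          # support count
        self.parent = parent
        self.children = {}
        self.node_link = None   # next node in the tree carrying the same item


def build_fp_tree(transactions, min_support):
    # Pass 1: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None, None)
    header = {}   # item name -> head of node link (first node for that item)
    tails = {}    # helper: item name -> last node in that item's link chain

    # Pass 2: insert each transaction, frequent items in descending support order.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child   # extend the node-link chain
                else:
                    header[item] = child            # first node for this item
                tails[item] = child
            child.count += 1
            node = child
    return root, header


# Hypothetical transactions, made up for illustration.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
root, header = build_fp_tree(transactions, min_support=3)
b = root.children["b"]
print(b.count, b.children["a"].count, b.children["c"].count)
```

Here item `d` is dropped as infrequent, all five transactions share the prefix `b`, and the header entry for `c` links the two separate `c` nodes through `node_link`.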