CUSTOMER_CODE
SMUDE
DIVISION_CODE
SMUDE
EVENT_CODE
JAN2016
ASSESSMENT_CODE
MC0088_JAN2016
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
5230
QUESTION_TEXT
Explain Web structure mining.
SCHEME OF
EVALUATION
Web structure mining focuses on analysis of the link structure of the web,
and one of its purposes is to identify more preferable documents. The
different objects are linked in some way. The intuition is that a hyperlink
from document A to document B implies that the author of document A
considers document B to contain worthwhile information. (2 marks)
Web structure mining helps in discovering similarities between web sites
or discovering important sites for a particular topic or discipline or in
discovering web communities. Simply applying the traditional process
and assuming that the events are independent can lead to wrong
conclusions. (2 marks)
The goal of web structure mining is to generate a structural summary
about the web site and web page. Technically, web content mining
mainly focuses on the structure within a document, while web structure
mining tries to discover the link structure of the hyperlinks at the
inter-document level. (2 marks)
Based on the topology of the hyperlinks, web structure mining will
categorize the web pages and generate information such as the
similarity and relationship between different web sites. Web structure
mining can also take another direction: discovering the structure of the
web document itself. (2 marks)
This type of structure mining can be used to reveal the structure
(schema) of web pages. This would be good for navigation purposes and
make it possible to compare/integrate web page schemas. This type of
structure mining will facilitate introducing database techniques for
accessing information on the web by providing a reference schema. (2 marks)
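The idea that link structure can identify more preferable documents can be illustrated with a small link-analysis sketch. The marking scheme does not name a specific algorithm; the example below assumes a simplified PageRank-style iteration, and the graph `links`, the damping value 0.85 and the function name `pagerank` are illustrative assumptions rather than part of the scheme.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank-style scoring over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                # A page shares its score equally among the pages it links to.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its score over all pages.
                share = damping * rank[source] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked from A, B and D; nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects the most in-links and ends up with the highest score, while D, which no page links to, ends up with the lowest.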
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72561
QUESTION_TEXT
Discuss key features of a Data warehouse as per W. H. Inmon’s
statement.
SCHEME OF
EVALUATION
Key features are:
● Subject-oriented
● Integrated
● Time variant
● Non-volatile
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72563
QUESTION_TEXT
In relation to Association Rule Mining define:
a. Association rule
b. Frequency set
c. Maximal frequency set
d. Border set
SCHEME OF
EVALUATION
a. Association rule: Association rules can be classified in various
ways, based on the following criteria
● Based on the types of values handled in the rule
● Based on the dimensions of data involved in the rule
● Based on the levels of abstractions involved in the rule set
● Based on various extensions to association mining
b. Frequency set: Let T be the transaction database and σ be the
user-specified minimum support. An item set X ⊆ A is said to be a
frequent item set in T with respect to σ, if s(X)_T ≥ σ.
c. Maximal frequency set: A frequent set is a maximal frequent set if
it is a frequent set and no superset of this is a frequent set.
d. Border set: An item set is a border set if it is not a frequent set,
but all its proper subsets are frequent sets.
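The three set definitions above can be checked on a small example. The following is a brute-force sketch, not an efficient mining algorithm such as Apriori; the transaction database, the threshold 0.5 and the helper names are illustrative assumptions, and support is assumed to be the fraction of transactions that contain the item set.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def classify_itemsets(transactions, min_support):
    """Return (frequent sets, maximal frequent sets, border sets)."""
    all_items = set().union(*transactions)
    candidates = list(nonempty_subsets(all_items))
    frequent = {x for x in candidates if support(x, transactions) >= min_support}
    # Maximal: frequent, and no proper superset is frequent.
    maximal = {x for x in frequent if not any(x < y for y in frequent)}
    # Border: not frequent, but every non-empty proper subset is frequent.
    border = {
        x for x in candidates
        if x not in frequent
        and all(s in frequent for s in nonempty_subsets(x) if s != x)
    }
    return frequent, maximal, border


# Hypothetical transaction database over the items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
frequent, maximal, border = classify_itemsets(transactions, min_support=0.5)
print("frequent:", sorted("".join(sorted(x)) for x in frequent))
print("maximal: ", sorted("".join(sorted(x)) for x in maximal))
print("border:  ", sorted("".join(sorted(x)) for x in border))
```

With these four transactions, {b, c} is the only border set: it occurs in just one transaction, yet both {b} and {c} are frequent. The set {a, b, c} is not a border set because its subset {b, c} is not frequent.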
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
72564
QUESTION_TEXT
Define these Data mining techniques:
a. Classification
b. Regression
c. Clustering
d. Neural networks
SCHEME OF
EVALUATION
a. Classification: Classification is a Data Mining (machine learning)
technique used to predict group membership for data instances.
b. Regression: Regression is the oldest and best-known statistical
technique that the Data Mining community utilizes. Basically,
Regression takes a numerical dataset and develops a mathematical
formula (e.g. y = a + bx, where y is the dependent variable and x is the
independent variable) that fits the data.
c. Clustering: Clustering is a method of grouping data into different
groups, so that the data in each group share similar trends and
patterns.
d. Neural networks: An Artificial Neural Network (ANN) is an
information-processing paradigm that is inspired by the way biological
nervous systems, such as the brain, process information.
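As an illustration of the regression formula y = a + bx given in (b), the sketch below fits a and b by ordinary least squares. The marking scheme does not say how the formula is obtained, so least squares is an assumption here, as are the function name `fit_line` and the sample data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data that lies exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

Here x is the independent variable and y the dependent variable; once a and b are known, the formula can be used to predict y for an unseen x.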
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117788
QUESTION_TEXT
Explain the methods of classification by Decision Tree Induction.
SCHEME OF
EVALUATION
A decision tree is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch represents an
outcome of the test, and leaf nodes represent classes or class
distributions. The topmost node in a tree is the root node.
Algorithm: Generate_decision_tree. Generate a decision tree from
the given training data.
Input: The training samples, samples, represented by discrete-valued
attributes; the set of candidate attributes, attribute-list.
Output: A decision tree
Method:
1. create a node N;
2. if samples are all of the same class, C then
3.     return N as a leaf node labeled with the class C;
4. if attribute-list is empty then
5.     return N as a leaf node labeled with the most common class in samples; // majority voting
6. select test-attribute, the attribute among attribute-list with the highest information gain;
7. label node N with test-attribute;
8. for each known value ai of test-attribute // partition the samples
9.     grow a branch from node N for the condition test-attribute = ai;
10.    let si be the set of samples in samples for which test-attribute = ai; // a partition
11.    if si is empty then
12.        attach a leaf labeled with the most common class in samples;
13.    else attach the node returned by Generate_decision_tree(si, attribute-list − test-attribute);
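The method above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: samples are assumed to be dictionaries, a leaf is represented by its class label, an internal node by a dictionary, and the `domains` argument (the known values of each attribute), the `target` argument and the weather-style data are hypothetical additions that the scheme does not specify.

```python
import math
from collections import Counter


def entropy(samples, target):
    """Entropy of the class distribution in samples."""
    total = len(samples)
    counts = Counter(s[target] for s in samples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(samples, attribute, target):
    """Entropy reduction obtained by partitioning samples on attribute."""
    total = len(samples)
    remainder = 0.0
    for value in {s[attribute] for s in samples}:
        subset = [s for s in samples if s[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(samples, target) - remainder


def majority_class(samples, target):
    """Most common class label in samples (majority voting)."""
    return Counter(s[target] for s in samples).most_common(1)[0][0]


def generate_decision_tree(samples, attribute_list, target, domains):
    classes = {s[target] for s in samples}
    if len(classes) == 1:                       # steps 2-3: all one class
        return classes.pop()
    if not attribute_list:                      # steps 4-5: no attributes left
        return majority_class(samples, target)
    # Step 6: pick the attribute with the highest information gain.
    test_attribute = max(
        attribute_list, key=lambda a: information_gain(samples, a, target)
    )
    node = {"attribute": test_attribute, "branches": {}}   # step 7
    remaining = [a for a in attribute_list if a != test_attribute]
    for value in domains[test_attribute]:       # steps 8-10: partition
        subset = [s for s in samples if s[test_attribute] == value]
        if not subset:                          # steps 11-12: empty partition
            node["branches"][value] = majority_class(samples, target)
        else:                                   # step 13: recurse
            node["branches"][value] = generate_decision_tree(
                subset, remaining, target, domains
            )
    return node


def classify(tree, sample):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][sample[tree["attribute"]]]
    return tree


# Hypothetical training data; no sample has outlook = "overcast".
samples = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
]
domains = {"outlook": ["sunny", "rainy", "overcast"], "windy": ["yes", "no"]}
tree = generate_decision_tree(samples, ["outlook", "windy"], "play", domains)
print(tree)
print(classify(tree, {"outlook": "sunny", "windy": "yes"}))
```

In this data the attribute outlook separates the classes perfectly, so it is selected at the root. The overcast branch has no training samples and therefore receives the majority class of the parent samples, as in steps 11 and 12.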
QUESTION_TYPE
DESCRIPTIVE_QUESTION
QUESTION_ID
117794
QUESTION_TEXT
List and explain the web content mining challenges.
SCHEME OF
EVALUATION
● Data/Information extraction
● Web information integration and schema matching
● Opinion extraction from online sources
● Knowledge synthesis
● Segmenting web pages and detecting noise
5×2=10 marks