CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE JAN2016
ASSESSMENT_CODE MC0088_JAN2016

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5230
QUESTION_TEXT Explain Web structure mining.
SCHEME OF EVALUATION
Web structure mining focuses on analysis of the link structure of the web, and one of its purposes is to identify more preferable documents. The different objects are linked in some way. The intuition is that a hyperlink from document A to document B implies that the author of document A considers document B to contain worthwhile information. (2 marks)
Web structure mining helps in discovering similarities between web sites, in discovering important sites for a particular topic or discipline, or in discovering web communities. Simply applying the traditional process and assuming that the events are independent can lead to wrong conclusions. (2 marks)
The goal of web structure mining is to generate a structural summary of the web site and web page. Technically, web content mining mainly focuses on the structure within a document, while web structure mining tries to discover the link structure of the hyperlinks at the inter-document level. (2 marks)
Based on the topology of the hyperlinks, web structure mining will categorize the web pages and generate information such as the similarity and relationship between different web sites. Web structure mining can also take another direction: discovering the structure of the web document itself. (2 marks)
This type of structure mining can be used to reveal the structure of web pages. This is useful for navigation and makes it possible to compare and integrate web page schemas. This type of structure mining will also facilitate introducing database techniques for accessing information on the web by providing a reference schema. (2 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72561
QUESTION_TEXT Discuss key features of a Data warehouse as per W. H. Inmon’s statement.
SCHEME OF EVALUATION
Key features are:
● Subject-oriented
● Integrated
● Time-variant
● Non-volatile

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72563
QUESTION_TEXT In relation to Association Rule Mining define:
a. Association rule
b. Frequency set
c. Maximal frequency set
d. Border set
SCHEME OF EVALUATION
a. Association rule: Association rules can be classified in various ways, based on the following criteria:
● Based on the types of values handled in the rule
● Based on the dimensions of data involved in the rule
● Based on the levels of abstraction involved in the rule set
● Based on various extensions to association mining
b. Frequency set: Let $T$ be the transaction database and $\sigma$ be the user-specified minimum support. An item set $X \subseteq A$ (where $A$ is the set of all items) is said to be a frequent item set in $T$ with respect to $\sigma$ if $s(X)_T \ge \sigma$.
c. Maximal frequency set: A frequent set is a maximal frequent set if no superset of it is a frequent set.
d. Border set: An item set is a border set if it is not a frequent set, but all its proper subsets are frequent sets.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72564
QUESTION_TEXT Define these Data mining techniques:
a. Classification
b. Regression
c. Clustering
d. Neural networks
SCHEME OF EVALUATION
a. Classification: Classification is a data mining (machine learning) technique used to predict group membership for data instances.
b. Regression: Regression is the oldest and best-known statistical technique that the data mining community uses. Basically, regression takes a numerical dataset and develops a mathematical formula (e.g. y = a + bx, where y is the dependent variable and x is the independent variable) that fits the data.
c. Clustering: Clustering is a method of grouping data into different groups, so that the data in each group share similar trends and patterns.
d.
Neural networks: An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117788
QUESTION_TEXT Explain the methods of classification by Decision Tree Induction.
A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.
Algorithm: Generate_decision_tree. Generate a decision tree from the given training data.
Input: The training samples, samples, represented by discrete-valued attributes; the set of candidate attributes, attribute-list.
Output: A decision tree.
SCHEME OF EVALUATION
Method:
1. create a node N;
2. if samples are all of the same class, C, then
3. return N as a leaf node labeled with the class C;
4. if attribute-list is empty then
5. return N as a leaf node labeled with the most common class in samples; // majority voting
6. select test-attribute, the attribute among attribute-list with the highest information gain;
7. label node N with test-attribute;
8. for each known value a_i of test-attribute // partition the samples
9. grow a branch from node N for the condition test-attribute = a_i;
10. let s_i be the set of samples in samples for which test-attribute = a_i; // a partition
11. if s_i is empty then
12. attach a leaf labeled with the most common class in samples;
13. else attach the node returned by Generate_decision_tree(s_i, attribute-list minus test-attribute);

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117794
QUESTION_TEXT List and explain the web content mining challenges.
● Data/Information extraction
● Web information integration and schema matching
● Opinion extraction from online sources
● Knowledge synthesis
● Segmenting web pages and detecting noise
SCHEME OF EVALUATION
5 × 2 = 10 marks
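
Worked illustration for QUESTION_ID 72563 (not part of the marking scheme): the definitions of frequent set, maximal frequent set and border set can be checked on a toy database with the brute-force sketch below. The function names `support` and `classify_itemsets` are hypothetical, support is assumed to be measured as a fraction of transactions, and every item set is enumerated, which is only feasible for very small examples.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def classify_itemsets(transactions, min_support):
    """Return (frequent, maximal, border) item sets by brute-force enumeration."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    # Every non-empty item set over the items seen in the database.
    candidates = [frozenset(c)
                  for k in range(1, len(items) + 1)
                  for c in combinations(items, k)]
    # Frequent set: support is at least the minimum support.
    frequent = {c for c in candidates
                if support(c, transactions) >= min_support}
    # Maximal frequent set: frequent, and no proper superset is frequent.
    maximal = {f for f in frequent if not any(f < g for g in frequent)}
    # Border set: not frequent, but every proper non-empty subset is frequent
    # (the empty set is contained in every transaction, so it is always frequent).
    border = {c for c in candidates
              if c not in frequent
              and all(frozenset(s) in frequent
                      for k in range(1, len(c))
                      for s in combinations(c, k))}
    return frequent, maximal, border
```

With the transactions {a, b, c}, {a, b}, {a, c}, {b, d} and a minimum support of 0.5, the frequent sets are {a}, {b}, {c}, {a, b} and {a, c}; the maximal frequent sets are {a, b} and {a, c}; and the border sets are {d} and {b, c}.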
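
Worked illustration for QUESTION_ID 72564, part b (not part of the marking scheme): the formula y = a + bx can be fitted to data in several ways. The sketch below assumes ordinary least squares, which the scheme itself does not name, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the x-y cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b
```

For x = 1, 2, 3, 4 and y = 3, 5, 7, 9 the points lie exactly on a line, and the fit recovers a = 1 and b = 2.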
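
Worked illustration for QUESTION_ID 117788 (not part of the marking scheme): steps 1 to 13 of Generate_decision_tree can be sketched in Python roughly as follows. Several choices here are assumptions made for the example: samples are `(features, label)` pairs with `features` a dict, a leaf is represented by its class label, an internal node by a tuple `(attribute, branches)`, and the hypothetical `domains` argument supplies the "known values" of each attribute used in step 8.

```python
import math
from collections import Counter


def entropy(samples):
    """Shannon entropy of the class labels in a list of (features, label) pairs."""
    counts = Counter(label for _, label in samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def information_gain(samples, attribute):
    """Entropy reduction obtained by partitioning `samples` on `attribute`."""
    partitions = {}
    for features, label in samples:
        partitions.setdefault(features[attribute], []).append((features, label))
    remainder = sum(len(p) / len(samples) * entropy(p)
                    for p in partitions.values())
    return entropy(samples) - remainder


def majority_class(samples):
    """Most common class label in `samples` (majority voting)."""
    return Counter(label for _, label in samples).most_common(1)[0][0]


def generate_decision_tree(samples, attribute_list, domains):
    """Build a tree following steps 1-13; `domains` maps attribute -> known values."""
    classes = {label for _, label in samples}
    if len(classes) == 1:                       # steps 2-3: pure node becomes a leaf
        return classes.pop()
    if not attribute_list:                      # steps 4-5: majority voting
        return majority_class(samples)
    test_attribute = max(                       # step 6: highest information gain
        attribute_list, key=lambda a: information_gain(samples, a))
    remaining = [a for a in attribute_list if a != test_attribute]
    branches = {}
    for value in domains[test_attribute]:       # steps 8-9: one branch per value
        subset = [s for s in samples if s[0][test_attribute] == value]  # step 10
        if not subset:                          # steps 11-12: empty partition
            branches[value] = majority_class(samples)
        else:                                   # step 13: recurse on the partition
            branches[value] = generate_decision_tree(subset, remaining, domains)
    return (test_attribute, branches)           # step 7: node labeled with the attribute
```

Ties in information gain are broken by the order of `attribute_list`, which is a property of this sketch rather than of the method described above.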