CUSTOMER_CODE: SMUDE
DIVISION_CODE: SMUDE
EVENT_CODE: APR2016
ASSESSMENT_CODE: MC0088_APR2016

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 5228
QUESTION_TEXT: Explain the concept of data transformation.
SCHEME OF EVALUATION:

In data transformation, the data are transformed or consolidated into forms appropriate for mining. It involves the following: (2 marks)

Smoothing, which works to remove noise from the data. Such techniques include binning, clustering and regression.
Aggregation, where summary or aggregation operations are applied to the data. This step is typically used in constructing a data cube for analysis of the data at multiple granularities. (2 marks)

Generalization of the data, where low-level or primitive data are replaced by higher-level concepts through the use of concept hierarchies. For example, street can be generalized to city or country. (2 marks)

Normalization, where attribute data are scaled so as to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0. (2 marks)

Attribute construction, where new attributes are constructed from the given set of attributes and added to help the mining process. (2 marks)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 5230
QUESTION_TEXT: Explain Web structure mining.
SCHEME OF EVALUATION:

Web structure mining focuses on analysis of the link structure of the web, and one of its purposes is to identify more preferable documents. The different objects are linked in some way. The intuition is that a hyperlink from document A to document B implies that the author of document A considers document B worthwhile. (2 marks)

Web structure mining helps in discovering similarities between web sites, in discovering important sites for a particular topic or discipline, or in discovering web communities. Simply applying the traditional process and assuming that the events are independent can lead to wrong conclusions. (2 marks)

The goal of web structure mining is to generate a structural summary of the web site and web page.
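As a rough illustration of the link-structure idea, the sketch below counts how many distinct pages link to each page and treats that in-link count as a crude indicator of a "preferable" document. This is only one simple heuristic, not a method prescribed by the evaluation scheme; the function name `in_link_counts` and the sample link graph are hypothetical.

```python
from collections import Counter


def in_link_counts(links):
    """Count how many distinct other pages link to each page.

    `links` maps a page name to the list of pages it links to.
    Duplicate links from one page and self-links are ignored.
    """
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each source page once
            if target != source:         # a page cannot endorse itself
                counts[target] += 1
    return counts


# Hypothetical link graph: page -> pages it links to
sample_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C", "C", "D"],
}

if __name__ == "__main__":
    for page, count in in_link_counts(sample_links).most_common():
        print(page, count)
```

Practical link-analysis methods also weigh the importance of the linking page, which a plain count does not.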
Technically, web content mining focuses mainly on the structure within a document, while web structure mining tries to discover the link structure of the hyperlinks at the inter-document level. (2 marks)

Based on the topology of the hyperlinks, web structure mining categorizes the web pages and generates information such as the similarity and relationship between different web sites. Web structure mining can also take another direction: discovering the structure of the web document itself. (2 marks)

This type of structure mining can be used to reveal the structure of web pages. This is good for navigation purposes and makes it possible to compare and integrate web page schemes. It also facilitates introducing database techniques for accessing information on the web by providing a reference schema. (2 marks)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 72559
QUESTION_TEXT: Discuss data smoothing technologies.
SCHEME OF EVALUATION:

a. Binning
b. Clustering
c. Combined computer and human inspection
d. Regression
(2.5 marks each, with explanation)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 117786
QUESTION_TEXT: Define Data Mining and DBMS. Differentiate between them.
SCHEME OF EVALUATION:

Data mining, or knowledge discovery in databases as it is also known, is the non-trivial extraction of implicit, previously unknown and potentially useful information from data.

Data mining is the search for the relationships and global patterns that exist in large databases but are hidden among vast amounts of data, such as the relationship between patient data and their medical diagnosis.

Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition techniques.
(Any 1 definition: 1 mark)

A DBMS is a "Database Management System". This is the software that manages data on physical storage devices.
(1 mark)

Differences: (8 marks)

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 117788
QUESTION_TEXT: Explain the methods of classification by Decision Tree Induction.
SCHEME OF EVALUATION:

A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node.

Algorithm: Generate_decision_tree. Generate a decision tree from the given training data.
Input: The training samples, samples, represented by discrete-valued attributes; the set of candidate attributes, attribute-list.
Output: A decision tree.
Method:
1. create a node N;
2. if samples are all of the same class C then
3. return N as a leaf node labeled with the class C;
4. if attribute-list is empty then
5. return N as a leaf node labeled with the most common class in samples; // majority voting
6. select test-attribute, the attribute among attribute-list with the highest information gain;
7. label node N with test-attribute;
8. for each known value a_i of test-attribute // partition the samples
9. grow a branch from node N for the condition test-attribute = a_i;
10. let s_i be the set of samples in samples for which test-attribute = a_i; // a partition
11. if s_i is empty then
12. attach a leaf labeled with the most common class in samples;
13. else attach the node returned by Generate_decision_tree(s_i, attribute-list minus test-attribute);

QUESTION_TYPE: DESCRIPTIVE_QUESTION
QUESTION_ID: 117790
QUESTION_TEXT: Define FP-Tree. Explain the FP-tree construction algorithm.
SCHEME OF EVALUATION:

A frequent pattern tree (or FP-tree) is a tree structure consisting of an item-prefix tree and a frequent-item header table. (3 marks)

Item-prefix tree:
* It consists of a root node labelled null.
* Each non-root node consists of three fields:
  * Item name
  * Support count
  * Node link

Frequent-item header table: it consists of two fields:
* Item name
* Head of node link, which points to the first node in the FP-tree
(7 marks)

The FP-tree construction algorithm is given on page 113.
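The structure described above can be sketched in code. The referenced page is not reproduced here, so the following is a minimal sketch that assumes the commonly taught two-pass construction: count item supports, then insert each transaction with its frequent items in descending support order. It may differ in detail from the algorithm in the course text. The names `FPNode`, `build_fp_tree` and `min_support`, the alphabetical tie-breaking, and the sample transactions are all assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """A node of the item-prefix tree."""

    def __init__(self, item, parent=None):
        self.item = item        # item name (None for the null root)
        self.count = 0          # support count
        self.node_link = None   # next node in the tree carrying the same item
        self.parent = parent
        self.children = {}      # item name -> child FPNode


def build_fp_tree(transactions, min_support):
    """Return (root, header); header maps item name -> head of node link."""
    # Pass 1: count the support of every item and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    header = {}   # frequent-item header table
    tails = {}    # helper: last node of each item's node-link chain

    # Pass 2: insert each transaction, frequent items sorted by
    # descending support (ties broken alphabetically, an assumption).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child   # extend the chain
                else:
                    header[item] = child            # first node for item
                tails[item] = child
            child.count += 1
            node = child
    return root, header


# Hypothetical transactions for illustration
sample_transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]

if __name__ == "__main__":
    root, header = build_fp_tree(sample_transactions, min_support=2)
    for item, node in header.items():
        total = 0
        while node is not None:     # follow the node links
            total += node.count
            node = node.node_link
        print(item, total)
```

Following the node links from a header entry visits every node that carries that item, so the counts along the chain add up to the item's total support.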