CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE Jan2017
ASSESSMENT_CODE MC0088_Jan2017

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5228
QUESTION_TEXT Explain the concept of data transformation.
SCHEME OF EVALUATION
In data transformation, the data are transformed or consolidated into forms appropriate for mining. It involves the following: (2 marks)
Smoothing, which works to remove noise from the data. Such techniques include binning, clustering and regression.
Aggregation, where summary or aggregation operations are applied to the data. This step is typically used in constructing a data cube for analysis of the data at multiple granularities. (2 marks)
Generalization of the data, where low-level or primitive data are replaced by higher-level concepts through the use of concept hierarchies. For example, street can be generalized to city or country. (2 marks)
Normalization, where attribute data are scaled so as to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0. (2 marks)
Attribute construction, where new attributes are constructed and added from the given set of attributes to help the mining process. (2 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 5229
QUESTION_TEXT What is divisive clustering? Write the algorithmic steps for divisive clustering.
SCHEME OF EVALUATION
This variant of hierarchical clustering is called top-down clustering or divisive clustering. We start at the top with all documents in one cluster. The cluster is split using a flat clustering algorithm. This procedure is applied recursively until each document is in its own singleton cluster. (2 marks)
Top-down clustering is conceptually more complex than bottom-up clustering since we need a second, flat clustering algorithm as a subroutine.
It has the advantage of being more efficient if we do not generate a complete hierarchy all the way down to individual document leaves. For a fixed number of top levels, using an efficient flat algorithm like k-means, top-down algorithms are linear in the number of documents and clusters. (3 marks)
Algorithm: Divisive clustering starts by placing all objects into a single group. Before we start the procedure, we need to decide on a threshold distance. Once this is done, the procedure is: (1 mark)
1. The distance between all pairs of objects within the same group is determined, and the pair with the largest distance is selected. (1 mark)
2. This maximum distance is compared to the threshold distance. (2 marks)
   a. If it is larger than the threshold, the group is divided in two. This is done by placing the selected pair into different groups and using them as seed points. All other objects in this group are examined and placed into the new group with the closest seed point. The procedure then returns to step 1. (1 mark)
   b. If the distance between the selected objects is less than the threshold, the divisive clustering stops.
(Total 10 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72562
QUESTION_TEXT What are the four different views regarding the design of a data warehouse? Explain.
SCHEME OF EVALUATION
The views are:
● Top-down view
● Data source view
● Data warehouse view
● Business query view

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72564
QUESTION_TEXT Define these data mining techniques:
a. Classification
b. Regression
c. Clustering
d. Neural networks
SCHEME OF EVALUATION
a. Classification: Classification is a data mining (machine learning) technique used to predict group membership for data instances.
b. Regression: Regression is the oldest and most well-known statistical technique that the data mining community utilizes.
Basically, regression takes a numerical dataset and develops a mathematical formula (e.g. y = a + bx, where y is the dependent variable and x is the independent variable) that fits the data.
c. Clustering: Clustering is a method of grouping data into different groups, so that the data in each group share similar trends and patterns.
d. Neural networks: An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117791
QUESTION_TEXT Discuss data mining technologies.
a. Decision trees
b. Rule induction
c. Genetic algorithms
d. Nearest neighbor
e. Artificial neural networks
SCHEME OF EVALUATION
(2 marks each with explanation)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117792
QUESTION_TEXT Explain constraint-based association mining.
Knowledge type constraints
Data constraints
Dimensional/level constraints
SCHEME OF EVALUATION
Interestingness constraints
Rule constraints
5×2=10 marks
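
ILLUSTRATION (QUESTION_ID 5229)
The threshold-based divisive clustering steps given under QUESTION_ID 5229 can be sketched in Python as below. This is only a minimal illustration and not part of the marking scheme. It assumes objects are numeric tuples compared by Euclidean distance, that the pair selected in step 1 is the most distant pair across all current groups, and that a distance exactly equal to the threshold stops the procedure (the scheme does not specify the equal case). The names divisive_clustering and farthest_pair are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def farthest_pair(group):
    """Return (distance, p, q) for the most distant pair in a group, or None."""
    if len(group) < 2:
        return None
    pairs = ((dist(p, q), p, q) for p, q in combinations(group, 2))
    return max(pairs, key=lambda item: item[0])


def divisive_clustering(points, threshold):
    """Split groups until no within-group distance exceeds the threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    groups = [list(points)]  # start with all objects in a single group
    while True:
        # Step 1: find the largest pairwise distance inside any one group.
        best = None  # (distance, group index, seed_a, seed_b)
        for index, group in enumerate(groups):
            pair = farthest_pair(group)
            if pair is not None and (best is None or pair[0] > best[0]):
                best = (pair[0], index, pair[1], pair[2])
        # Step 2b: stop when the maximum distance does not exceed the threshold
        # (treating "equal" as stop is an assumption).
        if best is None or best[0] <= threshold:
            return groups
        # Step 2a: the selected pair become seed points; every object in the
        # group joins the new group of its closest seed.
        _, index, seed_a, seed_b = best
        near_a, near_b = [], []
        for p in groups[index]:
            if dist(p, seed_a) <= dist(p, seed_b):
                near_a.append(p)
            else:
                near_b.append(p)
        groups[index:index + 1] = [near_a, near_b]  # then return to step 1


# Example: two well-separated pairs of points with a threshold distance of 2.
print(divisive_clustering([(0, 0), (0, 1), (10, 0), (10, 1)], threshold=2.0))
# prints [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
```

Because a split happens only when the selected distance is strictly positive, each seed lands in a different group and every split produces two non-empty groups, so the loop terminates.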