[FALL, 2015] SCHEME OF EVALUATION

PROGRAM: BSc IT
SEMESTER: FOURTH
SUBJECT CODE & NAME: BT0050, DATA WAREHOUSING & MINING
CREDIT: 4
BK ID: B0038
MAX. MARKS: 60

Answer all questions.

Q. No. 1
Q: Describe all data mining functionalities.
Unit/page no.: 1/24 | Marks: 10 (2½ for each of the four functionalities) | Total marks: 10
A: The data mining functionalities are as follows:

1. Association analysis (2½ marks)
"What is association analysis?" Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. Association analysis is widely used for market basket or transaction data analysis…

2. Classification and prediction (2½ marks)
Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known)…

3. Cluster analysis (2½ marks)
"What is cluster analysis?" Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the training data simply because they are not known to begin with. Clustering can be used to generate such labels…

4. Outlier analysis (2½ marks)
A database may contain data objects that do not comply with the general behaviour or model of the data. These data objects are outliers. Most data mining methods discard outliers as noise or exceptions. However, in some applications, such as fraud detection, the rare events can be more interesting than the more regularly occurring ones. The analysis of outlier data is referred to as outlier mining…

Q. No. 2
Q: Explain the three-tier data warehouse architecture.
Unit/page no.: 2/66 | Marks: 10 | Total marks: 10
A: Data warehouses often adopt a three-tier architecture, as presented in the figure below:
[Figure: three-tier data warehouse architecture]
1. The bottom tier is a warehouse database server.
2. The middle tier is an OLAP server.
3. The top tier is a front-end client layer.

Q. No. 3
Q: What is noise? Explain data smoothing techniques.
Unit/page no.: 3/74 | Marks: 2 + 8 (definition 2; four techniques × 2 = 8) | Total marks: 10
A: Noise is a random error or variance in a measured variable. (2 marks)
The following are data smoothing techniques (2 marks each):
1. Binning
2. Clustering
3. Combined computer and human inspection
4. Regression

Q. No. 4
Q: Explain in brief the pincer-search algorithm.
Unit/page no.: 4/115 | Marks: 10 | Total marks: 10
A: The Apriori algorithm operates in a bottom-up, breadth-first manner. The computation starts from the smallest set of frequent itemsets and moves upward until it reaches the largest frequent itemset, so the number of database passes equals the size of the largest frequent itemset. When any one of the frequent itemsets becomes long, the algorithm has to go through many iterations and, as a result, its performance decreases. A natural way to overcome this difficulty is to incorporate a bidirectional search that takes advantage of both the bottom-up and the top-down process. The pincer-search algorithm is based on this principle. It finds the frequent itemsets in a bottom-up manner but, at the same time, maintains a list of candidate maximal frequent itemsets. While making a database pass, it also counts the support of these candidate maximal frequent itemsets to see whether any of them is actually frequent. If so, it can conclude that all subsets of these frequent sets are also frequent and, hence, they need not be checked for support count in the next pass…
Write the pincer-search method.

Q. No. 5
Q: Explain the preprocessing steps that improve the efficiency and accuracy of the data classification or prediction process.
Unit/page no.: 5/164 | Marks: 10 (2½ for each of the four steps) | Total marks: 10
A: The following preprocessing steps may be applied to the data to help improve the accuracy, efficiency, and scalability of the classification or prediction process (2½ marks each):
1. Data cleaning
2. Relevance analysis
3. Data transformation
4. Normalization
Explain all.

Q. No. 6
Q: What is clustering? Explain the requirements of clustering in data mining.
Unit/page no.: 6/177, 179 | Marks: 2 + 8 (definition 2; requirements 8) | Total marks: 10
A: The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. (2 marks)
The requirements of clustering in data mining are (8 marks):
1. Scalability
2. Ability to deal with different types of attributes
3. Discovery of clusters with arbitrary shape
4. Minimal requirements for domain knowledge to determine input parameters
5. Ability to deal with noisy data
6. Insensitivity to the order of input records
7. High dimensionality
8. Constraint-based clustering
Explain all.
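The definition of clustering in Q. No. 6 can be made concrete with a small sketch of k-means. This scheme does not name k-means; the sketch uses it only as one common partitioning method. It assumes numeric 2-D points and Euclidean distance. The function name kmeans and its parameters are hypothetical. The sketch illustrates the idea and is not the method the course text prescribes.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Hypothetical helper: partition 2-D points into k clusters of similar objects."""
    rng = random.Random(seed)
    # Start from k distinct input points as centroids (seeded so runs repeat).
    centroids: List[Point] = rng.sample(points, k)
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
    labels, centers = kmeans(data, k=2)
    print(labels, centers)
```

On this toy data, the first three points end up in one cluster and the last three in the other. Each centroid is a mean and the starting centroids are drawn from the input. As a result, k-means can be pulled by outliers and its results can depend on initialization. It also tends to find compact, roughly spherical clusters. These weaknesses relate directly to the requirements listed above: discovery of clusters with arbitrary shape, ability to deal with noisy data, and insensitivity to the order of input records.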
