SMU-DDE-Assignments-Scheme of Evaluation

PROGRAM: MSC BIOINFORMATICS
SEMESTER: 4
SUBJECT CODE & NAME: BI0041 DATA WAREHOUSING AND DATA MINING
BK ID: B0038
DRIVE: WINTER 2015
MARKS: 60

Q. No 1
Define a border set. Prove that every subset of any item set must contain either a frequent set or a border set. (Unit 4, Page No. 104) (Total Marks: 10)

A
Definition of border set (2 marks): An item set is a border set if it is not a frequent set, but all its proper subsets are frequent sets.
Explanation for why every subset of any item set must contain either a frequent set or a border set (8 marks).

Q. No 2
Discuss the following with suitable examples in the context of Data Cleaning. (Total Marks: 10)
a. Missing Values (Unit 3, Page No. 73)
b. Noisy Data

A
Five methods of filling the missing values (5 marks):
Ignore the tuple: This is usually done when the class label is missing (assuming the mining task involves classification or description). This method is not very effective, unless the tuple contains several attributes with missing values. It is especially poor when the percentage of missing values per attribute varies considerably.
Fill in the missing value manually: In general, this approach is time consuming and may not be feasible given a large data set with many missing values.
Use a global constant to fill in the missing value: Replace all missing attribute values by the same constant, such as a label like “Unknown” or −∞. If missing values are replaced by, say, “Unknown,” then the mining program may mistakenly think that they form an interesting concept, since they all have a value in common: that of “Unknown.” Hence, although this method is simple, it is not recommended.
Use the attribute mean to fill in the missing value: For example, suppose that the average income of AllElectronics customers is $28,000. Use this value to replace the missing value for income.
Use the attribute mean for all samples belonging to the same class as the given tuple.
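The automatic filling methods listed above can be illustrated with a short Python sketch. It is only one possible illustration, not part of the evaluation scheme: the customer records, the `risk` class attribute, and all function names are hypothetical, and the incomes are chosen so that the overall mean comes out to the $28,000 used in the example.

```python
from statistics import mean

# Hypothetical toy data: None marks a missing income value,
# and "risk" plays the role of the class label.
customers = [
    {"id": 1, "risk": "low", "income": 30000},
    {"id": 2, "risk": "low", "income": None},
    {"id": 3, "risk": "high", "income": 20000},
    {"id": 4, "risk": "high", "income": None},
    {"id": 5, "risk": "low", "income": 34000},
]


def ignore_tuples(rows, attr):
    """Ignore the tuple: drop every record whose attribute is missing."""
    return [dict(r) for r in rows if r[attr] is not None]


def fill_with_constant(rows, attr, constant="Unknown"):
    """Replace every missing value with the same global constant."""
    return [dict(r, **{attr: constant}) if r[attr] is None else dict(r)
            for r in rows]


def fill_with_mean(rows, attr):
    """Replace missing values with the mean of the known values."""
    overall = mean(r[attr] for r in rows if r[attr] is not None)
    return [dict(r, **{attr: overall}) if r[attr] is None else dict(r)
            for r in rows]


def fill_with_class_mean(rows, attr, class_attr):
    """Replace missing values with the mean of the record's own class."""
    known = {}
    for r in rows:
        if r[attr] is not None:
            known.setdefault(r[class_attr], []).append(r[attr])
    class_means = {label: mean(values) for label, values in known.items()}
    # A class with no known values keeps its missing value (None).
    return [dict(r, **{attr: class_means.get(r[class_attr])})
            if r[attr] is None else dict(r)
            for r in rows]


if __name__ == "__main__":
    print(fill_with_mean(customers, "income")[1])
    print(fill_with_class_mean(customers, "income", "risk")[1])
```

With this toy data, the overall mean fills both gaps with 28000, while the class mean fills customer 2 with the "low" average (32000) and customer 4 with the "high" average (20000), which shows why the class-based variant usually gives the closer estimate.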
Explanation for ‘Noisy Data in the context of data cleaning’ with example (5 marks): Noise is random error or variance in a measured variable. Data smoothing techniques to remove noise are:
Binning
Clustering
Combined computer and human inspection
Regression

Q. No 3
Explain any five major differences between operational database systems and data warehouses. (Unit 2, Page No. 43) (Total Marks: 10)

A
Differences between operational database systems and data warehouses in terms of the following parameters (5X2 marks):
Users and system orientation
Data contents
Database design
View
Access patterns

Q. No 4
Define data reduction. Explain the techniques of data reduction. (Unit 3, Page No. 81) (Total Marks: 10)

A
Definition of data reduction (1 mark): Data reduction is a technique applied to obtain a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data.
Explanation for the following strategies for data reduction (9 marks):
1. Data cube aggregation, where aggregation operations are applied to the data in the construction of a data cube.
2. Dimension reduction, where irrelevant, weakly relevant, or redundant attributes or dimensions may be detected and removed.
3. Data compression, where encoding mechanisms are used to reduce the data set size.
4. Numerosity reduction, where the data are replaced or estimated by alternative, smaller data representations such as parametric models (which need to store only the model parameters instead of the actual data), or nonparametric methods such as clustering, sampling, and the use of histograms.
5. Discretization and concept hierarchy generation, where raw data values for attributes are replaced by ranges or higher conceptual levels.

Q. No 5
Define data mining. Explain the major issues in data mining. (Unit 1, Page No. 11, 27) (Total Marks: 10)

A
Definition of data mining (3 marks): Data mining refers to extracting or “mining” knowledge from large amounts of data.
Explaining the major issues in data mining (7 marks):
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple levels of abstraction
Incorporation of background knowledge
Data mining query languages and ad hoc data mining
Presentation and visualization of data mining results
Handling noisy or incomplete data
Pattern evaluation

Q. No 6
Describe any five requirements for clustering in data mining. (Unit 6, Page No. 179) (Total Marks: 10)

A
Explanation for any five of the following requirements of clustering in data mining (5X2 marks):
Scalability
Ability to deal with different types of attributes
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input parameters
Ability to deal with noisy data
Insensitivity to the order of input records
High dimensionality
Constraint-based clustering

*A-Answer
Note: Please provide keywords, short answer, specific terms, specific examples (wherever necessary)