CUSTOMER_CODE SMUDE
DIVISION_CODE SMUDE
EVENT_CODE OCTOBER15
ASSESSMENT_CODE MC0088_OCTOBER15

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72563
QUESTION_TEXT
In relation to Association Rule Mining define:
a. Association rule
b. Frequency set
c. Maximal frequency set
d. Border set
SCHEME OF EVALUATION
a. Association rule: Association rules can be classified in various ways, based on the following criteria:
● Based on the types of values handled in the rule
● Based on the dimensions of data involved in the rule
● Based on the levels of abstraction involved in the rule set
● Based on various extensions to association mining
b. Frequency set: Let T be the transaction database and σ be the user-specified minimum support. An itemset X ⊆ A (A being the set of all items) is said to be a frequent itemset in T with respect to σ if s(X)_T ≥ σ, where s(X)_T is the support of X in T.
c. Maximal frequency set: An itemset is a maximal frequent set if it is a frequent set and no proper superset of it is a frequent set.
d. Border set: An itemset is a border set if it is not a frequent set, but all its proper subsets are frequent sets.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 72564
QUESTION_TEXT
Define these Data Mining techniques:
a. Classification
b. Regression
c. Clustering
d. Neural networks
SCHEME OF EVALUATION
a. Classification: Classification is a Data Mining (machine learning) technique used to predict group membership for data instances.
b. Regression: Regression is the oldest and most well-known statistical technique that the Data Mining community utilizes. Basically, regression takes a numerical dataset and develops a mathematical formula that fits the data (e.g. y = a + bx, where y is the dependent variable and x is the independent variable).
c. Clustering: Clustering is a method of grouping data into different groups, so that the data in each group share similar trends and patterns.
d. Neural networks: An Artificial Neural Network (ANN) is an information-processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117787
QUESTION_TEXT
Explain Dynamic Itemset Counting algorithm.
SCHEME OF EVALUATION
Initially,
    solid box contains the empty itemset;
    solid circle is empty;
    dashed box is empty;
    dashed circle contains all 1-itemsets with the respective stop-number as 0;
    current stop-number := 0;
do until the dashed circle is empty
    read the database till the next stop point and increase the counters of the itemsets in the dashed box and in the dashed circle as we go along, record by record, to reach the next stop;
    increase the current stop-number by 1;
    for each itemset in the dashed circle
        if count of the itemset is greater than σ then
            move the itemset to the dashed box;
            generate a new itemset to be put into the dashed circle with counter value = 0 and stop-number = current stop-number;
        else if its stop-number is equal to the current stop-number then
            move this itemset to the solid circle;
    for each itemset in the dashed box
        if its stop-number is equal to the current stop-number then
            move this itemset to the solid box;
end
return the itemsets in the solid box
Algorithm 7 marks
Explanation 3 marks

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117789
QUESTION_TEXT
What is data cleaning? Explain missing values method for data cleaning.
SCHEME OF EVALUATION
Data cleaning routines attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. (2 marks)
Missing value methods are:
1. Ignore the tuple
2. Fill in the missing value manually
3. Use a global constant to fill in the missing value
(4 marks)
4. Use the attribute mean to fill in the missing value
5. Use the attribute mean for all samples belonging to the same class as the given tuple
6. Use the most probable value to fill in the missing value
(4 marks)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117791
QUESTION_TEXT
Discuss Data Mining technologies.
a. Decision trees
b. Rule induction
c. Genetic algorithms
d. Nearest neighbor
e. Artificial neural networks
SCHEME OF EVALUATION
(2 marks each with explanation)

QUESTION_TYPE DESCRIPTIVE_QUESTION
QUESTION_ID 117793
QUESTION_TEXT
List and explain the various criteria used to compare the classification methods.
SCHEME OF EVALUATION
Predictive accuracy
Speed
Robustness
Scalability
Interpretability
5×2=10 marks
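
ILLUSTRATION FOR QUESTION_ID 72563
The frequent set, maximal frequent set, and border set definitions given for QUESTION_ID 72563 can be checked on a toy transaction database. The following is a minimal Python sketch under stated assumptions: the function names support and classify_itemsets, the grocery transactions, and the minimum support of 0.4 are all hypothetical and chosen only for illustration. Support is taken here as the fraction of transactions containing the itemset, and every itemset is enumerated by brute force, which is reasonable only for very small item sets.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def classify_itemsets(transactions, min_support):
    """Return (frequent, maximal, border) itemsets by brute-force enumeration."""
    items = sorted(set().union(*transactions))
    # Every non-empty itemset over the items (exponential; toy data only).
    candidates = [frozenset(c)
                  for k in range(1, len(items) + 1)
                  for c in combinations(items, k)]
    # Frequent: support is at least the minimum support.
    frequent = {x for x in candidates
                if support(x, transactions) >= min_support}
    # Maximal frequent: frequent, and no proper superset is frequent.
    maximal = {x for x in frequent if not any(x < y for y in frequent)}
    # Border: not frequent, but every non-empty proper subset is frequent.
    # (The empty itemset is contained in every transaction, so it is skipped.)
    border = {x for x in candidates
              if x not in frequent
              and all(frozenset(s) in frequent
                      for k in range(1, len(x))
                      for s in combinations(x, k))}
    return frequent, maximal, border


# Hypothetical toy database of five transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]

frequent, maximal, border = classify_itemsets(transactions, min_support=0.4)
print(sorted(sorted(x) for x in maximal))  # [['bread', 'butter'], ['bread', 'milk']]
print(sorted(sorted(x) for x in border))   # [['butter', 'milk'], ['jam']]
```

In this toy data, {jam} is a border set because it is infrequent and has no non-empty proper subsets, while {butter, milk} is a border set because both of its single-item subsets are frequent although the pair itself appears in only one of five transactions.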