Download Chapter 2

Data Mining Review Questions / XLMiner Labs
Chapter 2 – Overview of the Data Mining Process

1. Assuming that data mining techniques are to be used in the following cases, identify whether the task required is supervised or unsupervised learning (textbook reference - 2.1).
   a. Deciding whether or not to issue a loan to an applicant based on demographic and financial data (with reference to a database of similar data on prior customers).
   b. In an online bookstore, making recommendations to customers concerning additional items to buy based on the buying patterns of prior transactions.
   c. Identifying a network data packet as dangerous (e.g., virus, hacker attack) based on comparison to other packets whose threat status is known.
   d. Identifying segments of similar customers.
   e. Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and non-bankrupt firms.
   f. Estimating the repair time required for an aircraft based on a trouble ticket.
   g. Automated sorting of mail by zip code scanning.
   h. Printing of customer discount coupons at the conclusion of a grocery store checkout based on what you just bought and what others have bought previously.

2. Describe the difference in roles assumed by the validation partition and the test partition (textbook reference - 2.2).

3. Using the concept of overfitting, explain why zero error on the training data is not necessarily good when a model is fit to those data (textbook reference - 2.5).

4. Two models are applied to a dataset that has been partitioned. Model A is considerably more accurate than Model B on the training data but slightly less accurate than Model B on the validation data. Which model are you more likely to consider for final deployment? Explain your choice (textbook reference - 2.10).

5. The next two questions require the use of XLMiner data mining software and the UniversalBank.xls dataset . . .
   a. Use XLMiner’s Convert to Dummies utility to convert the categorical variable Education to binary dummy variables. After the conversion, how many resulting columns exist for the Education variable? Why is this conversion performed?
   b. Using the newly created dataset (with binary dummy variables), use XLMiner’s Partitioning function to perform Standard Partitioning (accept the default percentages for partitioning). How many records were assigned to the Training Partition? How many records were assigned to the Validation Partition? Why was a Test Partition not created?
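For readers without XLMiner at hand, the two operations in question 5 can be illustrated with a minimal Python sketch. It is not XLMiner's implementation and does not answer the questions for UniversalBank.xls. The helper names `to_dummies` and `partition`, the toy records, the 60% training fraction, and the seed are all assumptions made for illustration.

```python
import random


def to_dummies(records, column):
    """Replace one categorical column with one 0/1 column per distinct level."""
    levels = sorted({rec[column] for rec in records})
    encoded = []
    for rec in records:
        # Copy every field except the categorical one being converted.
        row = {key: value for key, value in rec.items() if key != column}
        for level in levels:
            row[f"{column}_{level}"] = 1 if rec[column] == level else 0
        encoded.append(row)
    return encoded


def partition(records, train_fraction, seed):
    """Randomly split records into a training and a validation partition."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records; the values are NOT taken from UniversalBank.xls.
toy = [{"ID": i, "Education": (i % 3) + 1} for i in range(10)]

encoded = to_dummies(toy, "Education")

# The 60% training fraction is an assumed value, not a confirmed XLMiner default.
train, valid = partition(encoded, train_fraction=0.6, seed=12345)

print(sorted(encoded[0].keys()))
print(len(train), len(valid))
```

Each encoded record has exactly one dummy column set to 1, and the two partitions together contain every record exactly once. No third (test) partition is produced because the sketch only splits the data two ways.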
