Data Mining Review Questions / XLMiner Labs
Chapter 2 – Overview of the Data Mining Process
1. Assuming that data mining techniques are to be used in the following cases, identify whether the
task required is supervised or unsupervised learning (textbook reference - 2.1).
a. Deciding whether or not to issue a loan to an applicant based on demographic and financial
data (with reference to a database of similar data on prior customers).
b. In an online bookstore, making recommendations to customers concerning additional items
to buy based on the buying patterns of prior transactions.
c. Identifying a network data packet as dangerous (e.g., virus, hacker attack) based on
comparison to other packets whose threat status is known.
d. Identifying segments of similar customers.
e. Predicting whether a company will go bankrupt based on comparing its financial data to
those of similar bankrupt and non-bankrupt firms.
f. Estimating the repair time required for an aircraft based on a trouble ticket.
g. Automated sorting of mail by zip code scanning.
h. Printing of customer discount coupons at the conclusion of a grocery store checkout based
on what you just bought and what others have bought previously.
2. Describe the difference in roles assumed by the validation partition and the test partition (textbook
reference - 2.2).
3. Using the concept of overfitting, explain why zero error on the training data is not necessarily
good when a model is fit to those data (textbook reference - 2.5).
4. Two models are applied to a dataset that has been partitioned. Model A is considerably more
accurate than Model B on the training data but slightly less accurate than Model B on the validation
data. Which model are you more likely to consider for final deployment? Explain your choice.
(textbook reference - 2.10)
5. The next two questions require the use of the XLMiner data mining software and the
UniversalBank.xls dataset . . .
a. Use XLMiner’s Convert to Dummies utility to convert the categorical variable Education to
binary dummy variables. After the conversion, how many resulting columns exist for the
Education variable? Why is this conversion performed?
b. Using the newly created dataset (with binary dummy variables), use XLMiner’s Partitioning
function to perform Standard Partitioning (accept the default percentages for partitioning).
How many records were assigned to the Training Partition? How many records were
assigned to the Validation Partition? Why was a Test Partition not created?
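For readers without XLMiner at hand, the two operations in question 5 can be sketched in plain Python. This is only an illustration of the general ideas (dummy coding and random partitioning), not XLMiner's implementation. The function names to_dummies and partition, the toy records, the Education labels, the 0.6 training fraction, and the seed are all assumptions made for this sketch; they are not taken from UniversalBank.xls or from XLMiner's defaults.

```python
import random


def to_dummies(records, column):
    """Replace a categorical column with one 0/1 column per distinct level."""
    levels = sorted({row[column] for row in records})
    converted = []
    for row in records:
        # Copy every field except the categorical one being converted.
        new_row = {key: value for key, value in row.items() if key != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        converted.append(new_row)
    return converted


def partition(records, train_fraction, seed):
    """Randomly split records into a training and a validation partition."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records; the labels are illustrative, not the real data.
labels = ["Undergrad", "Graduate", "Advanced"]
records = [{"ID": i, "Education": labels[i % 3]} for i in range(10)]

encoded = to_dummies(records, "Education")
# 0.6 is an arbitrary illustrative fraction, not a claim about XLMiner.
train, valid = partition(encoded, train_fraction=0.6, seed=1)

print(sorted(encoded[0].keys()))
print(len(train), len(valid))
```

Each record ends up with exactly one dummy column set to 1, and the two partitions together contain every record exactly once. The fixed seed only makes the random split repeatable.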