Data Mining Tutorials Lesson 1-4
Notes for Data Mining Tutorial Lesson #1 (Preparing the Analysis Services
Database)
Start at: http://msdn.microsoft.com/en-us/library/ms167167(SQL.90).aspx


This just involves setting up the Data Source and the Data Source View. Nothing
new here, just make sure you get the right tables.
Note: the table vTargetMail is the "training data", and contains an indicator
attribute BikeBuyer as the predictable attribute for the model. The table
ProspectiveBuyer is used by the data mining models after they have been
"trained" on the training data.
Notes for Data Mining Tutorial Lesson #2 (Building a Targeted Marketing
Scenario)
This lesson presents the scenario of an analyst applying various data mining
algorithms to Adventure Works customer and Internet sales data to determine the
demographic attributes of customers who are likely to purchase a bicycle. The
analyst can then apply the data mining model to a list of potential customers in
order to determine which customers are most likely to respond to a targeted
mailing that promotes Adventure Works bikes.

Task #1: Creating a Targeted Mailing Mining Model Structure
Use the wizard to create a mining model based on the Decision Trees algorithm.
The model is applied to the training data in vTargetMail.

Task #2: Modifying the Targeted Mailing Model
Add Clustering and Naïve Bayes models, then process the models against the
training data.

Task #3: Exploring the Targeted Mailing Models
View how each of the three models classified the training data.

Task #4: Testing the Accuracy of the Mining Models
Compare how well each of the three models fits the training data.

Task #5: Creating Predictions
Apply the models to the data in ProspectiveBuyer.
Deciding which algorithm to use for which task:
http://msdn.microsoft.com/en-us/library/ms175595(SQL.90).aspx
Lesson 3: Time sequence
Just follow the tutorial instructions for this lesson.
Lesson 4: Association Rules
Important measures for association rules:
1) support of a rule or itemset = frequency of the itemset in the total population
2) confidence of a rule (called "probability" in Microsoft) = support of the left
and right sides together divided by the support of the left side alone
3) lift of a rule (called "importance" in Microsoft): this formula depends on who
you ask. The book says one thing, Microsoft says another. Wikipedia agrees with
the book but has another measure called "conviction" that is more similar to
Microsoft's "lift" measure.
Wikipedia: http://en.wikipedia.org/wiki/Association_rules
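The measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy transactions and the function names are made up for this example, and lift and conviction follow the definitions given on Wikipedia, not Microsoft's "importance" formula.

```python
# Hypothetical toy market-basket data: each transaction is a set of items.
transactions = [
    {"bike", "helmet"},
    {"bike", "helmet", "gloves"},
    {"bike"},
    {"helmet"},
    {"gloves"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(left, right, transactions):
    """Support of left and right together, divided by support of left alone."""
    both = set(left) | set(right)
    return support(both, transactions) / support(left, transactions)


def lift(left, right, transactions):
    """Wikipedia definition: confidence divided by support of the right side."""
    return confidence(left, right, transactions) / support(right, transactions)


def conviction(left, right, transactions):
    """Wikipedia definition: (1 - support(right)) / (1 - confidence)."""
    conf = confidence(left, right, transactions)
    if conf == 1:
        return float("inf")  # the rule is never violated in this data
    return (1 - support(right, transactions)) / (1 - conf)


if __name__ == "__main__":
    # Evaluate the rule {bike} -> {helmet} on the toy data.
    rule = ({"bike"}, {"helmet"})
    print("support   ", support(rule[0] | rule[1], transactions))
    print("confidence", confidence(*rule, transactions))
    print("lift      ", lift(*rule, transactions))
    print("conviction", conviction(*rule, transactions))
```

A lift above 1 suggests the left and right sides occur together more often than they would if they were independent. The values reported by Analysis Services for "importance" will generally differ, since Microsoft uses its own formula.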
Note: in Lesson 4, the tutorial states that the keys are automatically identified by the
wizard when generating the mining structure (for association rules and for sequence
clustering). I did not find this to be the case, and I had to explicitly set the appropriate
keys and inputs. When you get to this step in the lessons, if you can't figure out which
keys to set, call on me and I'll help.
Useful links:

Introduction to SQL Server 2005 Data Mining: http://msdn.microsoft.com/en-us/library/ms345131(SQL.90).aspx

Data Mining Algorithms: http://msdn.microsoft.com/en-us/library/ms175595.aspx