Data Mining Tutorials Lesson 1-4

Notes for Data Mining Tutorial Lesson #1 (Preparing the Analysis Services Database)

Start at: http://msdn.microsoft.com/en-us/library/ms167167(SQL.90).aspx

This lesson just involves setting up the Data Source and the Data Source View. Nothing new here; just make sure you get the right tables.

Note: the table vTargetMail is the "training data" and contains the indicator attribute BikeBuyer, which is the predictable attribute for the model. The table ProspectiveBuyer is used by the data mining models after they have been "trained" on the training data.

Notes for Data Mining Tutorial Lesson #2 (Building a Targeted Marketing Scenario)

This lesson presents the scenario of an analyst applying various data mining algorithms to Adventure Works customer and Internet sales data to determine the demographic attributes of customers who are likely to purchase a bicycle. The analyst can then apply the data mining model to a list of potential customers to determine which of them are most likely to respond to a targeted mailing that promotes Adventure Works bikes.

Task #1: Creating a Targeted Mailing Mining Model Structure
Use the wizard to create a mining structure with a Decision Tree model. The model is trained on the data in vTargetMail.

Task #2: Modifying the Targeted Mailing Model
Add Cluster and Naïve Bayes models, then process the models against the training data.

Task #3: Exploring the Targeted Mailing Models
View how each of the three models classified the training data.

Task #4: Testing the Accuracy of the Mining Models
Compare how well each of the three models fits the training data.

Task #5: Creating Predictions
Apply the models to the data in ProspectiveBuyer.
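The train-then-apply workflow of Lesson #2 can be sketched outside Analysis Services. The following is a minimal illustration only, not the Microsoft Naïve Bayes implementation: it trains a tiny categorical Naïve Bayes classifier (with add-one smoothing) on a few made-up rows standing in for vTargetMail, checks the fit on the training rows as in Task #4, and then scores made-up rows standing in for ProspectiveBuyer as in Task #5. The attribute names "commute" and "cars", all row values, and the function names are hypothetical; only the BikeBuyer label (1 = bought a bike) comes from the notes.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for vTargetMail: (attributes, BikeBuyer label).
training = [
    ({"commute": "short", "cars": "0"}, 1),
    ({"commute": "short", "cars": "1"}, 1),
    ({"commute": "short", "cars": "2"}, 1),
    ({"commute": "long", "cars": "0"}, 0),
    ({"commute": "long", "cars": "1"}, 0),
    ({"commute": "long", "cars": "2"}, 0),
]

# Hypothetical stand-in for ProspectiveBuyer: attributes only, no label.
prospects = [
    {"commute": "short", "cars": "0"},
    {"commute": "long", "cars": "2"},
]

def train(rows):
    """Count labels and per-label attribute values in the training rows."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for attrs, label in rows:
        for attr, value in attrs.items():
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict_proba(model, attrs):
    """Return {label: probability} using add-one (Laplace) smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(label)
        for attr, value in attrs.items():
            k = len(values_seen[attr])
            score *= (value_counts[(attr, label)][value] + 1) / (n + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

def predict(model, attrs):
    """Return the most probable label for one row."""
    probs = predict_proba(model, attrs)
    return max(probs, key=probs.get)

model = train(training)

# Task #4 analogue: how well does the model fit the training data?
correct = sum(predict(model, attrs) == label for attrs, label in training)
accuracy = correct / len(training)
print("training accuracy:", accuracy)

# Task #5 analogue: score the prospective buyers.
for attrs in prospects:
    print(attrs, "P(BikeBuyer=1) =", round(predict_proba(model, attrs)[1], 3))
```

Because the fit is measured on the same rows used for training, it says how well the model reproduces the training data, not how well it will do on new customers.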
Deciding which algorithm to use for which task: http://msdn.microsoft.com/en-us/library/ms175595(SQL.90).aspx

Lesson 3: Time sequence
Just follow the instructions for this lesson.

Lesson 4: Association Rules
Important measures for association rules:
1) support of a rule or itemset = frequency of the itemset in the total population
2) confidence of a rule (called "probability" in Microsoft) = support of the left and right sides together divided by the support of the left side alone
3) lift of a rule (called "importance" in Microsoft): the formula depends on who you ask. The book says one thing, Microsoft says another. Wikipedia says what the book says but has another measure called "conviction" that is more similar to Microsoft's "lift" measure.
Wikipedia: http://en.wikipedia.org/wiki/Association_rules

Note: in Lesson 4, the tutorial states that the keys are automatically identified by the wizard when generating the mining structure (for association rules and for sequence clustering). I did not find this to be the case, and I had to explicitly set the appropriate keys and inputs. When you get to this step in the lessons, if you can't figure out which keys to set, call on me and I'll help.

Useful links:
Introduction to SQL Server 2005 Data Mining: http://msdn.microsoft.com/en-us/library/ms345131(SQL.90).aspx
Data Mining Algorithms: http://msdn.microsoft.com/en-us/library/ms175595.aspx
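To make the association rule measures from the Lesson 4 notes concrete, here is a small sketch that computes support, confidence, lift, and conviction for one rule. It uses the definitions given on the Wikipedia page; it does not attempt to reproduce Microsoft's "importance" formula. The baskets, item names, and function names are invented for illustration and do not come from the Adventure Works data.

```python
from fractions import Fraction

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bike", "helmet"},
    {"bike", "helmet", "bottle"},
    {"bike"},
    {"helmet"},
    {"bottle"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in the itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))

def confidence(left, right, baskets):
    """Support of left and right together divided by support of left."""
    return support(left | right, baskets) / support(left, baskets)

def lift(left, right, baskets):
    """Confidence of left -> right divided by support of right.

    A value of 1 means the two sides occur independently of each other.
    """
    return confidence(left, right, baskets) / support(right, baskets)

def conviction(left, right, baskets):
    """(1 - support(right)) / (1 - confidence(left -> right)).

    Returns None when the confidence is 1, where the ratio is undefined.
    """
    conf = confidence(left, right, baskets)
    if conf == 1:
        return None
    return (1 - support(right, baskets)) / (1 - conf)

# Rule under test: {bike} -> {helmet}
left, right = {"bike"}, {"helmet"}
print("support   :", support(left | right, transactions))    # 2/5
print("confidence:", confidence(left, right, transactions))  # 2/3
print("lift      :", lift(left, right, transactions))        # 10/9
print("conviction:", conviction(left, right, transactions))  # 6/5
```

Exact fractions are used instead of floats so the relationships between the measures stay visible: the confidence 2/3 is the support 2/5 of the full itemset divided by the support 3/5 of the left side.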