Hands-On Lab
Building Your First Data Mining Model with SQL Server 2008 R2 Analysis Services
Lab version: 1.0.0
Last updated: 4/29/2017
CONTENTS
OVERVIEW
EXERCISE 1: EMBEDDING DATA MINING RESULTS INTO A CUSTOM APPLICATION
Task 1 – Browsing the Adventure Works Online Shopping Application
Task 2 – Opening the AdventureWorksBI Solution
Task 3 – Creating the Basket Analysis Data Source View
Task 4 – Configuring the Basket Analysis Data Source View
Task 5 – Creating the Basket Analysis Mining Model
Task 6 – Configuring the Basket Analysis Mining Model Algorithm Parameters
Task 7 – Processing the Basket Analysis Mining Model
Task 8 – Viewing the Basket Analysis Mining Model Content
Task 9 – Querying the Basket Analysis Mining Model
Task 10 – Enhancing the Adventure Works Online Shopping Application
Task 11 – Browsing the Enhanced Adventure Works Online Shopping Application
Task 12 – Finishing Up
SUMMARY
Overview
This lab will create a data mining model that uses the Microsoft Association Rules algorithm to identify
patterns about models commonly purchased together. The data mining model will be used to provide
relevant purchasing suggestions to online customers.
Note: Before you start this exercise, ensure that your machine meets the requirements listed
under System Requirements, and complete the steps described under Setup.
Objectives
The objectives of this exercise are to:
• Create a data source view
• Create a Microsoft Association Rules data mining model
• View the mining model content
• Query the mining model
• Embed the mining model query results into a Web application
System Requirements
You must have installed the following items to complete this lab:
• Microsoft SQL Server 2008 R2:
  ◦ Database Engine
  ◦ Analysis Services
  ◦ SQL Server Business Intelligence Development Studio
• SQL Server AdventureWorks2008 R2 sample databases
  ◦ AdventureWorks2008R2
  ◦ AdventureWorksDW2008R2
• Microsoft Visual Studio 2010 SP1
  ◦ Visual Basic
  ◦ Visual Web Developer
Setup
The Configuration Wizard verifies all the prerequisites for this lab. To make sure that everything is
correctly configured, follow these steps.
Note: To perform the setup steps you need to run the scripts in a command window with
administrator privileges.
1. Launch the Configuration Wizard for this lab by double-clicking the Dependencies.dep file
located under the Source\Setup folder of this lab. Install any prerequisites that are missing
(rescanning if necessary) and complete the wizard.
Cleanup
There is no need to clean up if you intend to continue the sequence of labs in this training kit.
1. To restore the original state of the AdventureWorks2008R2 and AdventureWorksDW2008R2
SQL Server databases and remove the Sales Analysis Analysis Services database, execute the
Cleanup.cmd script located under the Setup folder in the Source folder of this lab.
Exercises
This Hands-On Lab comprises the following exercise:
1. Embedding Data Mining Results Into a Custom Application
Estimated time to complete this lab: 30 minutes.
Exercise 1: Embedding Data Mining Results Into a Custom Application
In this exercise, you will develop a data mining model that uses the Microsoft Association Rules
algorithm to identify rules about models commonly purchased together. This type of data mining is
called market basket analysis. The patterns discovered by the data mining model will be used by the
Adventure Works Online Shopping Web application to cross-promote models by suggesting relevant
models during the shopping cart checkout.
Task 1 – Browsing the Adventure Works Online Shopping Application
In this task, you will explore the Adventure Works Online Shopping Web application to understand how
it currently delivers suggestions during checkout.
1. Open Visual Studio 2010 from Start | All Programs | Microsoft Visual Studio 2010 | Microsoft
Visual Studio 2010.
2. If prompted to choose default environment settings (required the first time Visual Studio is
launched), select Visual Basic Development Settings, and then click Start Visual Studio.
3. To open the AWOnlineShopping solution, on the File menu, select Open | Project/Solution.
4. In the Open Project window, navigate to the
Ex1-EmbeddingDataMining\Begin\AWOnlineShopping folder located in the Source folder for
this lab, select the AWOnlineShopping.sln file, and then click Open.
5. On the Debug menu, select Start Without Debugging.
6. When the Internet Explorer window opens, if required, maximize the window.
7. On the menu (located on the left), select Catalog by Category.
Figure 1
Selecting the menu item
8. On the Catalog by Category page, in the Product list, click the Mountain-200 Black, 38 link.
9. On the Product Details page, click Add to Shopping Cart.
10. On the Shopping Cart page, notice the three suggestions at the bottom of the page.
11. Click the Display Database Command label, and then review the database command.
Figure 2
Reviewing the database command
Note: These suggestions were retrieved by a relational database stored procedure. They
represent a static collection of suggestions, and as such they do not take into consideration
items already added to the shopping cart. Clearly, the suggestion to purchase a Mountain-200
is no longer relevant.
12. Close the Internet Explorer window.
13. Leave Visual Studio open.
Task 2 – Opening the AdventureWorksBI Solution
In this task, you will open an existing solution that consists of the completed labs in this training course.
You will then configure the deployment properties for the Sales Analysis Analysis Services project. In
this exercise, you will be extending this project to include a new data source view and data mining
structure.
1. Open SQL Server Business Intelligence Development Studio from Start | All Programs |
Microsoft SQL Server 2008 R2 | SQL Server Business Intelligence Development Studio.
2. To open the AdventureWorksBI solution, on the File menu, select Open | Project/Solution.
3. In the Open Project window, navigate to the
Ex1-EmbeddingDataMining\Begin\AdventureWorksBI folder located in the Source folder for
this lab, select the AdventureWorksBI.sln file, and then click Open.
Note: This solution consists of all completed labs that precede this lab in the training course.
4. In Solution Explorer, if necessary, collapse the Populate DW and Sales Reports projects.
5. In Solution Explorer, right-click the Sales Analysis project, and then select Properties.
6. In the Sales Analysis Property Pages window, select the Deployment page, set the Server
property to <servername>, and then click OK.
Note: You will need to replace <servername> with the name of the machine that hosts
Analysis Services.
7. To save the solution, on the File menu, select Save All.
Task 3 – Creating the Basket Analysis Data Source View
In this task, you will create the Basket Analysis data source view. The data source view will be the
foundation upon which the data mining model in this exercise will be developed.
1. In Solution Explorer, expand the Sales Analysis project, right-click the Data Source Views
folder, and then select New Data Source View.
2. In the Data Source View Wizard, read the welcome message, and then click Next.
3. In the Select a Data Source step, notice that the Adventure Works DW2008R2 data source is
selected, and then click Next.
4. In the Select Tables and Views step, in the Available Objects list, scroll to the bottom of the list.
5. While pressing the Control key, select the v2008Order and v2008OrderLine views.
6. Click the arrow to add the selected views to the Included Objects list.
Figure 3
Adding the views to the Included Objects List
7. Click Next.
8. In the Completing the Wizard step, in the Name box, replace the text with Basket Analysis, and
then click Finish.
9. When the wizard completes, in Solution Explorer, notice the addition of the Basket Analysis
data source view, and that the data source view designer opens automatically.
10. To save the solution, on the File menu, select Save All.
Task 4 – Configuring the Basket Analysis Data Source View
In this task, you will refine the design of the data source view. This will involve providing friendly names
for each of the data source view tables, defining a logical primary key and establishing a relationship
between the tables.
1. To rename the tables, in the data source view designer, in the Tables pane (located in the
bottom left corner), select the v2008Order table, and then in the Properties window, modify
the FriendlyName property to Order.
Note: If the Properties window is not visible, on the View menu, select Properties Window.
2. Repeat the last step for the v2008OrderLine table, and modify the FriendlyName property to
Basket.
Note: The purpose of this step is to create a user-friendly data model. It is important to
configure friendly names at the data source view level so that they are consistently inherited
throughout the objects (cubes, dimensions, and mining models) created upon this view.
3. To define the primary key in the Order table, in the Order table, right-click the OrderNumber
column, and then select Set Logical Primary Key.
4. To establish a relationship between the Basket table and the Order table, in the Basket table,
drag the OrderNumber column on top of the OrderNumber column in the Order table.
Figure 4
Establishing the relationship between the tables
5. To arrange the tables, right-click in a blank area of the diagram, and then select Arrange Tables.
6. To explore the data in the Basket table, in the Tables pane (or the diagram), right-click the
Basket table, and then select Explore Data.
7. In the explorer window, notice that many orders include more than one model.
Note: The data mining model that you will develop in this exercise describes the
relationships between product models purchased together (in the same order).
8. To close the explorer window, on the File menu, select Close.
9. To close the data source view designer, on the File menu, select Close.
10. On the File menu, click Save All.
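The relationship defined in this task means that each Order row owns many Basket rows. As a minimal sketch of that one-to-many shape outside the designer, the following Python groups order lines into one basket per order. The column names (OrderNumber, Model) and all of the row values are assumptions made up for illustration; they are not read from the v2008OrderLine view.

```python
from collections import defaultdict

# Hypothetical order-line rows as (OrderNumber, Model) pairs.
# The values are invented for illustration only.
order_lines = [
    ("SO1001", "Mountain-200"),
    ("SO1001", "HL Mountain Tire"),
    ("SO1002", "Touring Tire Tube"),
    ("SO1002", "Touring Tire"),
    ("SO1003", "Mountain-200"),
]


def build_baskets(rows):
    """Group order lines by order number, giving one set of models per order."""
    baskets = defaultdict(set)
    for order_number, model in rows:
        baskets[order_number].add(model)
    return dict(baskets)


baskets = build_baskets(order_lines)
for order_number, models in sorted(baskets.items()):
    print(order_number, sorted(models))
```

Each dictionary key plays the role of the Order table's logical primary key, and each set plays the role of the related Basket rows.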
Task 5 – Creating the Basket Analysis Mining Model
In this task, you will use the Data Mining Wizard to create the BasketAnalysis_AR mining model.
1. In Solution Explorer, in the Sales Analysis project, right-click the Mining Structures folder, and
then select New Mining Structure.
2. In the Data Mining Wizard, read the welcome message, and then click Next.
3. In the Select the Definition Method step, notice the default selection, and then click Next.
4. In the Create the Data Mining Structure step, in the dropdown list, select the Microsoft
Association Rules data mining algorithm, and then click Next.
5. In the Select Data Source View step, in the Available Data Source Views list, select the Basket
Analysis data source view, and then click Next.
6. In the Specify Table Types step, specify the table types as shown, and then click Next.
Figure 5
Specifying the table types
7. In the Specify the Training Data step, specify the columns to use in the mining model as shown,
and then click Next.
Figure 6
Specifying the training data
8. In the Specify Columns' Content and Data Type step, click Next.
9. In the Create Testing Set step, reduce the Percentage of Data for Testing value to 0, and then
click Next.
Note: It is very important that you follow the lab instructions precisely, particularly when
naming objects. This lab includes code that expects objects to have been named correctly.
10. In the Completing the Wizard step, in the Mining Structure Name box, replace the text with
BasketAnalysis, and in the Mining Model Name box, replace the text with BasketAnalysis_AR.
11. Click Finish.
12. When the wizard completes, in Solution Explorer, notice the addition of the BasketAnalysis
mining structure, and that the mining structure designer opens automatically.
13. On the File menu, click Save All.
Task 6 – Configuring the Basket Analysis Mining Model Algorithm Parameters
In this task, you will configure the Basket Analysis mining model algorithm parameters.
1. In the mining structure designer, select the Mining Models tab.
2. Right-click the BasketAnalysis_AR model, and then select Set Algorithm Parameters.
Figure 7
Opening the Algorithm Parameters window
3. In the Algorithm Parameters window, configure the Value property for the
MINIMUM_PROBABILITY and MINIMUM_SUPPORT parameters as shown.
Figure 8
Configuring the algorithm parameters
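Note: The two parameters configured here set the thresholds that the algorithm applies when
the mining model is processed. MINIMUM_SUPPORT is the minimum number of cases that must
contain an itemset before it is considered, and MINIMUM_PROBABILITY is the minimum
probability that a rule must have to be kept.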
4. Click OK.
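To make the effect of the two thresholds concrete, here is a minimal Python sketch that counts itemsets and keeps only the pairwise rules that pass both limits. It is an illustration of the idea, not the Microsoft Association Rules implementation. The baskets are invented, the threshold values (2 and 0.6) are illustrative rather than the values shown in Figure 8, and the sketch assumes the minimum support is given as an absolute number of orders.

```python
from collections import Counter
from itertools import combinations

# Invented baskets (one set of models per order), for illustration only.
baskets = [
    {"Mountain-200", "HL Mountain Tire", "Mountain Bottle Cage"},
    {"Mountain-200", "HL Mountain Tire"},
    {"Mountain-200", "Mountain Bottle Cage"},
    {"Touring Tire", "Touring Tire Tube"},
    {"Touring Tire Tube"},
]


def count_itemsets(baskets, max_size=2):
    """Count how many baskets contain each itemset of up to max_size items."""
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(basket), size):
                counts[frozenset(itemset)] += 1
    return counts


def find_rules(baskets, min_support, min_probability):
    """Return (lhs, rhs, support, probability) for pairwise rules passing both thresholds."""
    counts = count_itemsets(baskets)
    rules = []
    for itemset, support in counts.items():
        if len(itemset) != 2 or support < min_support:
            continue  # too rare (or not a pair): dropped by the support threshold
        for lhs in itemset:
            (rhs,) = itemset - {lhs}
            # Probability of the rule lhs -> rhs: share of lhs baskets that also hold rhs.
            probability = support / counts[frozenset([lhs])]
            if probability >= min_probability:
                rules.append((lhs, rhs, support, probability))
    return sorted(rules)


rules = find_rules(baskets, min_support=2, min_probability=0.6)
for lhs, rhs, support, probability in rules:
    print(f"{lhs} -> {rhs}: support={support}, probability={probability:.2f}")
```

Raising either threshold produces fewer, stronger rules; lowering them produces more rules, including weaker ones.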
Task 7 – Processing the Basket Analysis Mining Model
In this task, you will process the Basket Analysis mining model. Once processed, the mining model will
contain the patterns and statistics that describe the relationships between frequently purchased
models.
1. In Solution Explorer, inside the Sales Analysis project, right-click the BasketAnalysis mining
structure, and then select Process.
2. If prompted to build and deploy the project, click Yes.
3. If prompted to overwrite the database, click Yes.
4. In the Process Mining Structure window, click Run.
Note: The deployment process creates and processes the mining structure. At this time, the
data is retrieved from the data source, and the Microsoft Association Rules algorithm
correlates and identifies frequent relationships across attribute values, which in this case are
product models.
5. When processing completes, in the Process Progress window, click Close.
6. In the Process Mining Structure window, click Close.
Task 8 – Viewing the Basket Analysis Mining Model Content
In this task, you will use three mining model viewers to explore and understand the model content.
1. In the mining structure designer, select the Mining Model Viewer tab.
2. If prompted to build and deploy the project, click No.
3. In the Show dropdown list, select Show Attribute Name Only.
Figure 9
Configuring the attribute properties to show
4. To sort the rules in descending order of importance, click the Importance header twice.
Figure 10
Sorting the Importance in descending order
5. Review the most important rules (located at the top of the list).
Note: The first rule, Touring Tire Tube -> Touring Tire, reads “there is a 54.3% probability that
the purchase of a Touring Tire Tube will result in the purchase of a Touring Tire.”
6. Scroll to the bottom of the list to find rules with negative importance. The purchase of these
combinations is highly unlikely; in fact, the purchase of one discourages the purchase of the
other.
7. In the Mining Model Viewer tab, select the Itemsets tab.
Figure 11
Selecting the Itemsets tab
8. In the Show dropdown list, select Show Attribute Name Only.
9. Increase the Minimum Itemset Size value to 3.
10. Review the frequent itemsets that contain three models.
Note: The Support column represents the number of orders that included these three models.
11. Notice that the most frequent itemset that contains three models includes the Mountain-200
model. You will explore this model visually in the remaining steps of this task.
12. To view the item dependencies, select the Dependency Network tab.
13. In the viewer, on its toolbar, in the Show dropdown list, select Show Attribute Name Only.
14. To locate the Mountain-200 model, in the viewer, click the Find Node toolbar button.
Figure 12
Locating the Find Node toolbar button
15. In the Find Node window, select the Basket(Mountain-200) = Existing node, and then click OK.
16. To zoom in, in the viewer, click the Zoom In toolbar button until you can adequately see the
selected node and its related nodes.
Figure 13
Locating the Zoom In toolbar button
Figure 14
Exploring the Mountain-200 and related nodes
Note: Each line (in technical terms it is named an edge) represents a pairwise association. The
slider (located on the left) is associated with the importance score.
17. Gradually drag the slider down to highlight the stronger edges by filtering out the weaker edges.
Stop when you see only the selected node, Mountain-200, and the one node that the selected
node predicts.
Figure 15
Exploring the Mountain-200 and strongest related node
Note: The legend at the bottom of the viewer describes the node colors. Initially, in this
scenario, all nodes predict both ways (for example, the sale of a Mountain-200 often results in
the sale of the Mountain Bottle Cage, and vice versa). As you filter out the weaker edges, notice
that ultimately it is the Mountain-200 that more likely results in a purchase of the HL Mountain
Tire.
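The importance score seen in the Rules tab and behind the Dependency Network slider can be positive or negative. The following Python sketch shows one way such a score can behave, assuming the commonly documented definition of rule importance as the logarithm of P(B | A) divided by P(B | not A). The base-10 logarithm and the sample baskets are assumptions for illustration; the numbers are not taken from the lab's model.

```python
import math

# Invented baskets, for illustration only.
baskets = [
    {"Touring Tire Tube", "Touring Tire"},
    {"Touring Tire Tube", "Touring Tire"},
    {"Touring Tire Tube"},
    {"Touring Tire", "Water Bottle"},
    {"Water Bottle"},
    {"Water Bottle"},
    {"Water Bottle", "Touring Tire Tube"},
    {"Water Bottle"},
]


def rule_importance(baskets, lhs, rhs):
    """Score the rule lhs -> rhs as log10(P(rhs | lhs) / P(rhs | not lhs))."""
    with_lhs = [b for b in baskets if lhs in b]
    without_lhs = [b for b in baskets if lhs not in b]
    if not with_lhs or not without_lhs:
        raise ValueError("need orders both with and without the left-hand item")
    p_with = sum(rhs in b for b in with_lhs) / len(with_lhs)
    p_without = sum(rhs in b for b in without_lhs) / len(without_lhs)
    if p_with == 0 or p_without == 0:
        raise ValueError("score is undefined when either probability is zero")
    return math.log10(p_with / p_without)


tube_to_tire = rule_importance(baskets, "Touring Tire Tube", "Touring Tire")
bottle_to_tire = rule_importance(baskets, "Water Bottle", "Touring Tire")
print(f"Touring Tire Tube -> Touring Tire: {tube_to_tire:+.3f}")
print(f"Water Bottle -> Touring Tire: {bottle_to_tire:+.3f}")
```

A positive score means the left-hand item makes the right-hand item more likely than it would otherwise be; a negative score means it makes the right-hand item less likely, which matches the rules found at the bottom of the sorted list.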
Task 9 – Querying the Basket Analysis Mining Model
In this task, you will create two singleton queries to test the model predictions.
1. In the mining structure designer, select the Mining Model Prediction tab.
2. Right-click inside the Select Input Table(s) window, and then select Singleton Query.
Figure 16
Configuring a singleton query
Note: A singleton query enables the input of data expressed in the query rather than sourced
from an external dataset.
3. Click inside the Value box to reveal an ellipsis, and then click the ellipsis.
Figure 17
Locating and clicking the ellipsis
4. In the Nested Table Input window, in the Key Column list, select the Mountain-200 model, and
then click Add.
5. Click OK.
6. In the query designer, in the query grid, in the Source column dropdown list, select Prediction
Function.
Figure 18
Selecting the Prediction Function source
7. In the corresponding Field column dropdown list, select PredictAssociation.
8. From the Mining Model window, drag Basket into the corresponding Criteria/Argument
column.
Figure 19
Dragging Basket to the Criteria/Argument column
9. To query the three most likely models associated with the Mountain-200 model, in the
Criteria/Argument column, append a comma and the number 3 at the end to create the
following argument.
DMX
[BasketAnalysis_AR].[Basket], 3
10. The query should look like the following.
Figure 20
Reviewing the query
11. On the mining model prediction toolbar, toggle to Query.
Figure 21
Toggling to the Query view
Note: The query designer displays the DMX statement. This statement is requesting the three
most likely models based on a basket consisting of only Mountain-200.
12. To execute the query, on the mining model prediction toolbar, toggle to Result.
13. Expand Expression to reveal the three predicted models.
Figure 22
Reviewing the query result
14. On the mining model prediction toolbar, toggle to Design.
15. To add another model to the basket, on the Singleton Query Input window, click inside the
Value box, and then click the ellipsis.
16. In the Nested Table Input window, in the Key Column list, select the HL Mountain Tire model,
and then click Add.
17. Click OK.
18. On the mining model prediction toolbar, toggle to Query.
19. Notice the addition of the HL Mountain Tire.
20. Modify the SELECT line to read SELECT FLATTENED.
Note: The FLATTENED keyword will produce a flattened result that can be easily consumed by
an application. Note, however, that once you modify the query created by the graphic designer,
you lose the graphic support functionality.
21. On the mining model prediction toolbar, toggle to Result.
22. Review the query result that now requests the three most likely models based on a basket
consisting of the Mountain-200 and the HL Mountain Tire models.
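The two queries in this task return different suggestions because the basket passed to the model changed. The following Python sketch mimics that behavior in a deliberately simplified way: it ranks candidate models by the strongest matching rule and skips models already in the basket. It is not how PredictAssociation ranks results internally, and the rules and probabilities are invented for illustration.

```python
# Invented rules as (lhs_model, rhs_model, probability), for illustration only.
rules = [
    ("Mountain-200", "HL Mountain Tire", 0.40),
    ("Mountain-200", "Mountain Bottle Cage", 0.35),
    ("Mountain-200", "Mountain Tire Tube", 0.30),
    ("Mountain-200", "Sport-100", 0.20),
    ("HL Mountain Tire", "Mountain Tire Tube", 0.60),
    ("HL Mountain Tire", "Mountain-200", 0.50),
]


def predict_association(rules, basket, n):
    """Return up to n models not yet in the basket, ranked by best rule probability."""
    best = {}
    for lhs, rhs, probability in rules:
        if lhs in basket and rhs not in basket:
            best[rhs] = max(best.get(rhs, 0.0), probability)
    # Highest probability first; ties are broken alphabetically for a stable order.
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return [model for model, _ in ranked[:n]]


first = predict_association(rules, {"Mountain-200"}, 3)
second = predict_association(rules, {"Mountain-200", "HL Mountain Tire"}, 3)
print(first)
print(second)
```

Adding the HL Mountain Tire to the basket removes it from the suggestions and lets its own rules influence the ranking, which is the same kind of change seen between steps 13 and 22.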
Task 10 – Enhancing the Adventure Works Online Shopping Application
In this task, you will modify the AWOnlineShopping Web application to deliver relevant model
suggestions using the BasketAnalysis_AR data mining model.
1. Switch to Visual Studio 2010.
2. In Solution Explorer, right-click the Web.config file, and then select Open.
3. On the Edit menu, select Find and Replace | Quick Replace.
4. In the Find and Replace window, in the Find What dropdown list, enter SERVERNAME.
5. In the Replace With dropdown list, enter the name of the machine that hosts Analysis Services.
6. Click Replace All.
7. Verify that one replacement occurred, and then click OK.
8. Close the Find and Replace window.
9. On the File menu, select Save All.
10. To close the Web.config file, on the File menu, select Close.
11. In Solution Explorer, right-click the ShoppingCart.aspx item, and then select View Code.
12. Press Control+G, then in the Go to Line window, enter 90, and then click OK.
13. In the code window, review the GetDataMiningSuggestions function and the associated
comments.
Note: It is not necessary to understand the details of this code. This code is responsible for
dynamically building a DMX statement similar to the one you created in the previous task.
14. Press Control+G, then in the Go to Line window, enter 67, and then click OK.
15. Replace this line with the following line.
Visual Basic
dr = GetDataMiningSuggestions()
Note: This modification will retrieve suggested models predicted by the BasketAnalysis_AR
data mining model.
16. On the File menu, click Save All.
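The lab's GetDataMiningSuggestions function is written in Visual Basic and is not reproduced here. As a rough sketch of what "dynamically building a DMX statement" can look like, the following Python assembles a singleton prediction query from the models in a shopping cart. The function name build_dmx_query and the nested key column name [Model] are assumptions for illustration; check the actual column name in the BasketAnalysis mining structure before relying on it.

```python
def build_dmx_query(cart_models, count=3, model_name="BasketAnalysis_AR",
                    nested_table="Basket", key_column="Model"):
    """Build a flattened singleton DMX query for the models in the cart.

    key_column is an assumed name for the nested table's key column.
    """
    if not cart_models:
        raise ValueError("cart_models must contain at least one model")
    for name in cart_models:
        # Reject quotes instead of guessing at escaping rules for DMX literals.
        if "'" in name:
            raise ValueError("model names containing quotes are not supported")
    rows = " UNION ".join(
        "SELECT '{}' AS [{}]".format(name, key_column) for name in cart_models
    )
    return (
        "SELECT FLATTENED\n"
        "  PredictAssociation([{m}].[{t}], {n})\n"
        "FROM [{m}]\n"
        "NATURAL PREDICTION JOIN\n"
        "(SELECT ({rows}) AS [{t}]) AS t"
    ).format(m=model_name, t=nested_table, n=int(count), rows=rows)


query = build_dmx_query(["Mountain-200", "HL Mountain Tire"])
print(query)
```

The generated text has the same shape as the statement shown in the Query view in Task 9, with one SELECT per cart item joined by UNION. Rejecting quote characters keeps the sketch from injecting unescaped text into the statement.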
Task 11 – Browsing the Enhanced Adventure Works Online Shopping Application
In this task, you will browse the enhanced AWOnlineShopping Web application.
1. On the Debug menu, select Start Without Debugging.
2. When the Internet Explorer window opens, if necessary, maximize the window.
3. On the menu, select Shopping Cart.
4. On the Shopping Cart page, notice that the three suggestions have changed according to
associations with the items added to the shopping cart.
5. Click the Display Database Command label, and then review the database command.
6. From the suggestions list, click the HL Mountain Tire link.
7. On the Catalog by Model page, click the HL Mountain Tire link.
8. On the Product Details page, click Add to Shopping Cart.
9. On the Shopping Cart page, notice that the three suggestions have been revised.
10. Click the Display Database Command label, and then review the database command.
Task 12 – Finishing Up
In this task, you will finish up by closing all applications.
1. Close the Internet Explorer window.
2. In Visual Studio, on the File menu, select Exit.
3. In SQL Server Business Intelligence Development Studio, on the File menu, select Exit.
Summary
In this lab, you have created a data mining model that uses the Microsoft Association Rules algorithm to
identify patterns about models commonly purchased together. The patterns discovered by the data
mining model were used to enhance the customer experience while shopping online.