Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
All Powder Board and Ski Oracle 9i Workbook Chapter 8: Data Warehouses and Data Mining Jerry Post Copyright © 2003 1 Oracle Relational Approach Meta-Data Relational Tables Customer Sale SaleItem Materialized Views Star Design Dimension Dimension Sale + Fact Measure Customer Item Dimension Dimension 2 Desired Sales Cube Dimensions Sales Dimensions State (ship) Month Category Style SkillLevel Size Color Manufacturer BindingStyle WeightMax? ItemMaterial? WaistWidth? 3 Early Data: Spreadsheets 4 External Tables: Attach to CSV create or replace directory csv_dir as ‘D:\students\BuildAllPowder\AllPowderSampleDataCSV'; create table OldSale_Ext ( SaleID INTEGER, SaleDate DATE, ShipState VARCHAR2(50), ShipZIP VARCHAR2(50), PaymentMethod VARCHAR2(50), SKU VARCHAR2(50), QuantitySold INTEGER, SalePrice NUMBER(10,2) ModelID VARCHAR2(250), ItemSize NUMBER, ManufacturerID INTEGER, Category Color ModelYear Graphics ItemMaterial ListPrice Style SkillLevel WeightMax WaistWidth BindingStyle ) VARCHAR2(50), VARCHAR2(50), INTEGER, VARCHAR2(50), VARCHAR2(50), NUMBER(10,2), VARCHAR2(50), INTEGER, NUMBER, NUMBER, VARCHAR2(50) Continued on next slide Warning: currency columns cannot have $ symbols or commas 5 External File Definition organization external ( type oracle_loader default directory csv_dir access parameters ( records delimited by newline fields terminated by ',' optionally enclosed by '"' lrtrim missing field values are null ( SaleID, SaleDate char date_format date mask "mm/dd/yyyy", ShipState, ShipZIP, PaymentMethod, SKU, QuantitySold, SalePrice, odelID, ItemSize, ManufacturerID, Category, Color, ModelYear, Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax, WaistWidth, BindingStyle ) ) location ('Lab 08-01 Early Sales.csv') ) reject limit unlimited; 6 Create Customer and Employee CustomerID and EmployeeID are missing from the old data. Instead of relying on blank cell values, create a new customer called “Walk-in” and a new employee called “Employee” Write down the ID numbers generated for these anonymous entries. If you use SQL, you can assign a value of zero to these entries. INSERT INTO Customer (CustomerID, LastName) Values (0,'Walk-in') INSERT INTO Employee (EmployeeID, LastName) Values (0,'Staff') 7 Extract Model Data SELECT DISTINCT OldSale_ext.ModelID, OldSale_ext.ManufacturerID, OldSale_ext.Category, OldSale_ext.Color, OldSale_ext.ModelYear, OldSale_ext.Graphics, OldSale_ext.ItemMaterial, OldSale_ext.ListPrice, OldSale_ext.Style, OldSale_ext.SkillLevel, OldSale_ext.WeightMax, OldSale_ext.WaistWidth, OldSale_ext.BindingStyle FROM OldSale_ext; 8 UNION Query for Models SELECT DISTINCT ModelID, ManufacturerID, Category, … FROM OldSales_ext UNION SELECT DISTINCT ModelID, ManufacturerID, Category, … FROM OldRentals_ext 9 Insert Model Data into ItemModel INSERT INTO ItemModel (ModelID, ManufacturerID, Category, Color, ModelYear, Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax, WaistWidth, BindingStyle) SELECT DISTINCT qryOldModels.ModelID, qryOldModels.ManufacturerID, qryOldModels.Category, qryOldModels.Color, qryOldModels.ModelYear, qryOldModels.Graphics, qryOldModels.ItemMaterial, qryOldModels.ListPrice, qryOldModels.Style, qryOldModels.SkillLevel, qryOldModels.WeightMax, qryOldModels.WaistWidth, qryOldModels.BindingStyle FROM qryOldModels; 10 Insert SKU Data into Inventory CREATE VIEW qryOldInventory AS SELECT DISTINCT ModelID, SKU, ItemSize FROM OldSale_ext UNION SELECT DISTINCT ModelID, SKU, ItemSize FROM OldRental_ext; INSERT INTO Inventory (ModelID, SKU, ItemSize, QuantityOnHand) SELECT DISTINCT qryOldInventory.ModelID, qryOldInventory.SKU, qryOldInventory.ItemSize, 0 As QuantityOnHand FROM qryOldInventory; Note the use of the column alias to force a zero value for QuantityOnHand for each row 11 Copy Sales Data INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod) SELECT DISTINCT OldSales_ext.SaleID, OldSales_ext.SaleDate, OldSales_ext.ShipState, OldSales_ext.ShipZIP, OldSales_ext.PaymentMethod FROM OldSales_ext; Note that if you have added data to your Sales table, your existing SaleID values might conflict with these You can solve the problem by adding a number to these values so they are all larger than your highest ID INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod) SELECT DISTINCT OldSales_ext.SaleID+5000, OldSales_ext.SaleDate, OldSales_ext.ShipState, OldSales_ext.ShipZIP, OldSales_ext.PaymentMethod FROM OldSales_ext; 12 Copy SaleItem Rows INSERT INTO SaleItem (SaleID, SKU, QuantitySold, SalePrice) SELECT DISTINCT OldSale_ext.SaleID+5000, OldSale_ext.SKU, OldSale_ext.QuantitySold, OldSale_ext.SalePrice FROM OldSale_ext; If you transformed the SaleID in the prior step for the Sale data, you must do the exact same calculation for SaleID in the SaleItem table 13 Copy Rental Data INSERT INTO Rental (RentID, RentDate, ExpectedReturn, PaymentMethod) SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.RentDate, OldRental_ext.ExpectedReturn, OldRental_ext.PaymentMethod FROM OldRental_ext; INSERT INTO RentItem (RentID, SKU, RentFee, ReturnDate) SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.SKU, OldRental_ext.RentFee, OldRental_ext.ReturnDate FROM OldRental_ext; 14 Discoverer Administrator: Load Business Area Schema Tables and views Select tables 15 Load Wizard Options: LOV Most options are selected by default Select the LOV option to have Discoverer build lookup lists 16 Discoverer: Business Area Tables shown as folders and named so managers understand them Columns shown as items Add a calculated item 17 Create a Data Hierarchy Select Category and Style from the SkiBoardStyle lookup table 18 Discoverer Desktop: New Workbook Select the dimensions and the fact item 19 Initial Crosstab Layout Page area Column area Row area 20 Discoverer Crosstab Browser Totals Select all items Format options 21 Time Series Analysis: Moving Average 22 Time Series Analysis: Discoverer 23 Sales by State for Regression Note that some states are missing from the list. 24 Regression Data Query CREATE VIEW StateSales2004 AS SELECT StateName, Income2001, Pop2002, Sum(SalePrice*QuantitySold) AS Sales2004 FROM Sale INNER JOIN StateDemographics ON Sale.ShipState = StateDemographics.StateCode INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID WHERE ShipState IS NOT NULL AND SaleDate Between '01-Jan-2004' And '31-Dec-2004' GROUP BY StateName, Income2001, Pop2002 ORDER BY StateName; 25 Regression Setup You should include the label row but be sure to check the box to show you included it 26 Regression Results Relatively high R-square Population is a significant predictor, Income is not 27 Association Rules/Market Basket Locate folders Item to find Possible location Data mining samples D:\Oracle\ora92\dm\demo\sample ORACLE_HOME D:\Oracle\ora92 JAVA_HOME C:\OracleData\Ora92DS\jdk 28 Copy Files to Protect Original compileSampleCode.bat executeSampleCode.bat Sample_AssociationRules.java Sample_AssociationRules_Transactional.property Sample_Global.property 29 Edit Sample_Global.property File miningServer.url=jdbc:oracle:thin:@YourServerName:1521:DBName miningServer.userName=odm miningServer.password=password inputDataSchemaName=powder outputSchemaName=powder timeout=120 If necessary, use enterprise manager to unlock and assign new passwords to accounts: odm and odm_mtr 30 Create New Table To Hold Transaction Basket Data CREATE TABLE MARKET_BASKET_TX_BINNED ( SEQUENCE_ID INTEGER, ATTRIBUTE_NAME VARCHAR2(35), VALUE NUMBER ); GRANT SELECT ON MARKET_BASKET_TX_BINNED TO odm; commit; If you use these names, you do not have to edit the Transactional.property file 31 Copy SaleItem Data INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE) SELECT SaleID, ItemModel.Category || '_' || ItemModel.Style AS AName, 1 As Value FROM SaleItem Inner Join Inventory ON SaleItem.SKU = Inventory.SKU Inner Join ItemModel ON Inventory.ModelID = ItemModel.ModelID GROUP BY SaleID, ItemModel.Category || '_' || ItemModel.Style; 32 Copy Sale Data INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE) SELECT SaleID, 'ID', SaleID FROM Sale; commit; 33 Remove Dashes from Attribute UPDATE MARKET_BASKET_TX_BINNED SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,instr(ATTRIBUTE_NAME,'-')-1) || '_' || substr(ATTRIBUTE_NAME,instr(ATTRIBUTE_NAME,'-')+1) WHERE instr(ATTRIBUTE_NAME,'-') > 0; commit; Run at least twice—until you get zero changes. Because a row might have more than one dash. 34 Limit Size of Attribute_Name UPDATE MARKET_BASKET_TX_BINNED SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,20); commit; This is critical—but is probably due to a bug in Oracle’s code. There is a slight chance it arises because of the 30 character name limitation in Oracle. 35 Compile and Run the Code SET ORACLE_HOME = D:\Oracle\ora92 SET JAVA_HOME = C:\OracleData\ora92DS\jdk compileSampleCode.bat Sample_AssociationRules.java executeSampleCode.bat Sample_AssociationRules Sample_AssociationRules_Transactional.property Type as all one line—do not hit <Enter> until the end To redirect the output to a file, at the end, add: >myfile.txt 36 Sample Results Getting top 5 rules for model: Sample_AR_Model_tx sorted by support. Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816] Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967] Rule 101: If Board_Half_Pipe=1 then Clothes_=1 [support: 0.11357143, confidence: 0.4622093] Rule 9: If Clothes_=1 then Board_Half_Pipe=1 [support: 0.11357143, confidence: 0.23177843] Rule 100: If Ski_Freestyle=1 then Clothes_=1 [support: 0.09785714, confidence: 0.48070174] Get rules by support: Sample_AR_Model_tx, with minimum support of 0.16. Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816] Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967] Get rules by confidence: Sample_AR_Model_tx, with confidence of 0.56 or more. Investigate and think about the results. Do you have too many clothes targeted to half-pipe boards and freestyle skiers, or not enough? 37 GIS: Microsoft MapPoint The Discoverer worksheet places the data into rows and columns A dynamic copy of this sheet is used to remove the top rows 38 MapPoint Data Wizard 39 GIS Analysis of Sales 40