Data Mining Tools For ZLE

Copying and Use Restrictions: Material under this presentation is the Intellectual Property of HP Corporation and Genus Software. Any use of this material, in part or whole, except in context of Genus Data Mining Integrator and Data Mart Builder, without written permission from HP and Genus is prohibited.

© 2002

agenda
• data mining in ZLE solutions
• ZLE data mining toolkit
• toolkit demonstration

Meta Group
• process of identifying and/or extracting previously unknown, non-trivial, unanticipated, important information from large sets of data

Gartner Group
• process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies, statistical and mathematical techniques

• role
  – determine most effective responses to business events
• ZLE facilitates mining by providing
  – a rich, integrated, current data source
  – an integrated operational environment into which models can be deployed
• data mining helps to realize the full business value of a ZLE system

ZLE data mining process
• understand the opportunity
  – identify and define business opportunity
• prepare data (typically about 75% of process)
  – profile and understand data
  – derive attributes
  – transform data
  – create case set
• build models
  – train models
  – assess model performance
• use models
  – deploy model
  – monitor model performance

the ZLE data mining toolkit
• goal:
  – provide tools that facilitate ZLE data mining
  – reduce process cycle times dramatically
• three tools being developed by Genus Software:
  – data preparation
  – data transfer
  – model deployment
• partners: Genus, MicroStrategy, SAS
• product names:
  – Genus Mining Integrator for NonStop SQL (all three tools)
  – Genus Mart Builder for NonStop SQL (first two tools only)

ZLE data mining analytical cycle
[diagram labels: Data Preparation (profiling/transforming data); Data Transfer (fast parallel streams); Modeling (SAS Enterprise Miner); Model Deployment (written to DB tables); Real-Time Scoring (using the Recommender); Interaction Manager; Rules Engine; Scoring Engine; Agg. Engine; Data Store (NonStop SQL); Mining Mart (Tru64/Windows)]
[legend: part of Genus toolkit; available from SAS; part of ZDK 3]

toolkit demonstration
• credit card fraud detection example
• opportunity: use ZLE data store data to predict, in real-time, which credit card purchases are likely to be fraudulent
• use tools to:
  – build a case set table with one row describing each purchase
  – transfer table to SAS server for modeling
  – deploy predictive model to ZLE data store
  – execute model in real-time to make fraud predictions
• steps described, including many tool screen shots

toolkit data preparation solution
• based on the MicroStrategy (MSI) Business Intelligence toolset, leverages GUI, logical data model support, SQL generation, etc.
• uses NonStop SQL/MX DBMS, leverages sampling, TRANSPOSE, statistical functions, …
• custom tool developed by Genus using MSI SDK for NonStop SQL operations and functionality not supported by MSI tools

two main ZLE data preparation tasks
1. profile tables
  – column names and types
  – partitioning information, attributes, key structure, …
  – column values
2. transform source tables
  – derive new attributes
  – aggregate to appropriate level
  – clean data
  – pivot
  – combine to form case set

the MicroStrategy desktop

MSI profile report: fraud vs. billing state

NonStop SQL/MX sampling
• source table sampling
    insert into CustSamp
      select * from Cust
        sample random 1 percent clusters of 10 blocks
      union
      select * from Cust
        where CardNo in (select CardNo from FrdFlg)
• enables interactive and exploratory data prep
• cleanly integrated into SQL
• performed efficiently in DP2
• easily accessible through Genus tool

creating a materialized sample table using the Genus Data Mart Builder

identifying source and sample method

specifying materialized sample table

transforming source data
[figure: sample rows with panels labeled Purchase, Account, Store, Purchase History, Item Summary, and Fraud; annotations: "Billions of Purchases", "Millions of Accounts", "Aggregate and Pivot"]
[columns shown: PurchDt, Amt, Store, Acct, Size, Age, CS, CR, CrLim, Ten, P1, S1, A1, P3, S3, A3, Min, Max, Elec, Vid, Jewl, Frd?]

result: a case set for modeling
[figure: sample case set rows; annotations: "Hundreds of Attributes", "One Row Per Purchase", "Mix of Fraud and No-Fraud Purchases"]
[columns shown: PurchDt, Amt, Store, Acct, Size, Age, CS, CR, CrLim, Ten, P1, S1, A1, P3, S3, A3, Min, Max, Elec, Vid, Jewl, Frd?]

MSI Datamart report summarizing items

data transfer tool
• task: transfer case set from data store to mining mart
• design
[diagram labels: Web browser client; HTTP; HTML; Web App.; Web server; JDBC; NonStop SQL/MX; coordinator (two); transfer (four); Data Store; receive (four); Mining Mart; SAS import (four); ASCII files; SAS data set]

data transfer specification screen

transfer monitoring

modeling in SAS enterprise miner

model export
• score converter node generates Java model code
• reporter node exports code and HTML report to project directory

model deployment tool
• task
  – copy model information to a ZLE Data Store
• design
[diagram labels: Web browser client; HTTP; HTML; Web App.; Web Server; JDBC access; File/registry access; NonStop SQL/MX; Data Store; SAS Open Metadata server; File/SAS server; SAS Enterprise Miner; Mining Mart; Model export/registration]

starting the model deployment tool

connecting to a Data Store

a list of models in the Data Store

viewing a deployed model

selecting a SAS report directory

viewing available reports

viewing an Enterprise Miner report

deploying a model

deployment confirmation

real-time scoring using the Recommender
[diagram labels: Interaction Manager; Offers / Advice; Rules Engine; Business Rules; Model Scores; Scoring Engine; Deployed Models; Model Aggregates; Aggregation Engine; Aggregate Definitions; Customer Data]

how to get the data mining tools
• Product Names
  – Genus Mining Integrator for NonStop SQL (Data Preparation, Data Transfer, and Model Deployment tools)
  – Genus Mart Builder for NonStop SQL (first two tools only)
• Can be ordered through HP, support provided by Genus
• Availability: calendar Q4 2002
• For more information, contact
  – [email protected] (Product Manager)
  – [email protected] (Program Manager)
  – [email protected] (Development)
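The "aggregate to appropriate level", "pivot", and "combine to form case set" steps listed under the data preparation tasks can be illustrated with a small sketch. This is not the Genus or MicroStrategy implementation, which generates NonStop SQL/MX statements; it is a plain Python illustration under stated assumptions. The field names purchase_id, category, amt, MinAmt, and MaxAmt are hypothetical, and reading Elec, Vid, and Jewl as per-purchase item-category flags is an assumption based only on the column names shown on the slides.

```python
from collections import defaultdict

# Flag columns named on the slides; their meaning as item categories is assumed.
CATEGORIES = ("Elec", "Vid", "Jewl")


def build_case_set(purchases, items, categories=CATEGORIES):
    """Return one row per purchase, combining purchase fields with
    aggregated and pivoted item information (hypothetical schema)."""
    # Aggregate: collapse many item rows into one summary per purchase.
    summary = defaultdict(lambda: {"amts": [], "cats": set()})
    for item in items:
        s = summary[item["purchase_id"]]
        s["amts"].append(item["amt"])
        s["cats"].add(item["category"])

    case_set = []
    for p in purchases:
        # Purchases without item rows get an empty summary.
        s = summary.get(p["purchase_id"], {"amts": [], "cats": set()})
        row = dict(p)  # combine: start from the purchase-level attributes
        row["MinAmt"] = min(s["amts"]) if s["amts"] else None
        row["MaxAmt"] = max(s["amts"]) if s["amts"] else None
        # Pivot: turn category values (rows) into 0/1 indicator columns.
        for cat in categories:
            row[cat] = int(cat in s["cats"])
        case_set.append(row)
    return case_set


# Made-up sample data, for illustration only.
purchases = [
    {"purchase_id": 1, "acct": "A", "fraud": 1},
    {"purchase_id": 2, "acct": "B", "fraud": 0},
    {"purchase_id": 3, "acct": "A", "fraud": 0},
]
items = [
    {"purchase_id": 1, "category": "Elec", "amt": 54.0},
    {"purchase_id": 1, "category": "Vid", "amt": 9.0},
    {"purchase_id": 2, "category": "Jewl", "amt": 42.0},
    {"purchase_id": 2, "category": "Jewl", "amt": 19.0},
]

case_set = build_case_set(purchases, items)
for row in case_set:
    print(row)
```

Each output row describes exactly one purchase, matching the "One Row Per Purchase" shape of the case set. At the data volumes the slides mention, the same logic would be pushed into the database rather than run in application memory.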