Data Mining in SQL Server 2000 and Yukon
Richard Lees
[email protected]
RichardLees.com.au

Agenda
• What isn’t Data Mining
• Demo
• What is Data Mining
• Demo
• What’s Coming in Yukon
• Create a data mine
• 4 ways to view data mine
• Demo
• Questions Throughout

Which Questions are Data Mining?
• Who are our biggest customers?
• What are customers buying with cigars?
• What are the customer retention levels of our branches?
• Which customers have bought olives and feta cheese but no ciabatta bread?
• Which regions have the highest male/female ratio of single 20-somethings?
• Which region has the lowest customer retention levels, and which customers has it lost?

Demonstration
• Ad hoc query
• Drill through to details
• Business Intelligence tool

History of OLAP and Data Mining
19xx: Custom Data Mining available to Fortune 100
1993: Codd defined 12 rules for OLAP
1998: Microsoft SQL 7
• OLAP v1
1999: OLAP on the Web
• ThinSlicer
• Many others
2000: Microsoft SQL 2000
• OLAP v2
• Data Mining
• English Query
SAS and SPSS offer Data Mining tools to those who can afford them
Future: Data Mining V2
• SQL 2005
• BI Tools

Sample Data I Will be Using
• Wellington Libraries Loan DB
• We wanted sample data for data mining
• They were just writing off a data warehouse project
• “The experts have spent 12 months trying to import data!”
• “How could Microsoft help us? The data are in IBM databases!”

What is Data Mining?
“Data mining is the use of powerful software tools to discover significant traits or relationships, from databases or data warehouses and often used to predict future events”
It exploits statistical algorithms such as decision trees, clustering, sequence clustering, association, Naïve Bayes, neural networks and time series.
Once the “knowledge” is extracted, it:
• Can be used to discover
• Can be used to predict values of other cases

OLAP versus Data Mining
OLAP
• Is about fast ad hoc querying
• Analysis by dimensions and measures
• Gives precise answers
• May use RDBMS or OLAP source
Data Mining
• Is about discovering and predicting
• Gives imprecise answers
OLAP is not a prerequisite for data mining, but it almost always comes first (like learning to ride a bike before driving a car).

Clusters
[Figure: cluster plot; axes: Annual Income, Age]

Library Clusters

Decision Trees
• Input data about cases
• Discovering relationships
• Predicting outcomes

Data Mining Demo with real data
• Build a data mine
• View data mine
1. Browse dependencies
2. Browse decision trees
3. Query using MDX
4. Query using ThinMiner
5. Batch update

Elite

Embedded Uses of Data Mining
• Risk assessment
• Claim likelihood
• Customer profitability predictions
• Fraud detection
• Treatment efficacy
• Product suggestions
• Web shopping
• Call centre tool

Successful Data Mining Projects
Two additional Critical Success Factors
1. Discover something interesting
2. Profit from discovery
For example ComputerFleet (Localhost)

What’s Coming in Yukon
• Decision Trees
• Sequence Clustering
• Clustering
• Association
• Time Series
• Naïve Bayes
• Confusion Matrix
• Neural Networks
• Lift Charts

Naïve Bayes
Actual (prior): NOK = .30, OK = .70

Actual   Judged   P(judged | actual)   Joint
NOK      J NOK    .90                  (.27)
NOK      J OK     .10                  (.03)
OK       J NOK    .20                  (.14)
OK       J OK     .80                  (.56)

Judged:
J NOK = (.3 x .9) + (.7 x .2) = .41
J OK = (.3 x .1) + (.7 x .8) = .59

Posterior (actual):
Given J NOK: NOK = .27 / .41 = .66, OK = .14 / .41 = .34
Given J OK: NOK = .03 / .59 = .05, OK = .56 / .59 = .95

Demonstration

Yukon Development
• New algorithms
• Lift chart
• Profit curve
• Query tool

Questions:

References
Microsoft Research http://Research.Microsoft.com/research/pubs
Richard Lees [email protected] http://RichardLees.com.au
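
Worked check of the Naïve Bayes slide: the sketch below reproduces the slide's arithmetic in Python. It takes the priors (.30 NOK, .70 OK) and the judged-given-actual rates (.90/.10 and .20/.80) from the slide and derives the joint, judged and posterior figures. It is a single application of Bayes' rule, the building block of the Naïve Bayes algorithm, and not the SQL Server implementation. The function name `bayes_posterior` and the dictionary keys `J_NOK` and `J_OK` are illustrative assumptions, not names from the presentation.

```python
def bayes_posterior(prior, likelihood):
    """Apply Bayes' rule to one judged outcome.

    prior:      {actual: P(actual)}
    likelihood: {actual: {judged: P(judged | actual)}}
    Returns (joint, marginal, posterior) as dictionaries.
    """
    judged_labels = list(next(iter(likelihood.values())).keys())

    # Joint probability: P(actual, judged) = P(actual) * P(judged | actual)
    joint = {
        (actual, judged): prior[actual] * likelihood[actual][judged]
        for actual in prior
        for judged in judged_labels
    }

    # Marginal probability of each judged outcome: sum the joints over actual
    marginal = {
        judged: sum(joint[(actual, judged)] for actual in prior)
        for judged in judged_labels
    }

    # Posterior: P(actual | judged) = joint / marginal
    posterior = {
        judged: {actual: joint[(actual, judged)] / marginal[judged] for actual in prior}
        for judged in judged_labels
    }
    return joint, marginal, posterior


# Figures taken from the slide; the key names are hypothetical labels.
prior = {"NOK": 0.30, "OK": 0.70}
likelihood = {
    "NOK": {"J_NOK": 0.90, "J_OK": 0.10},
    "OK": {"J_NOK": 0.20, "J_OK": 0.80},
}

joint, marginal, posterior = bayes_posterior(prior, likelihood)

for judged, by_actual in posterior.items():
    for actual, p in by_actual.items():
        print(f"P({actual} | {judged}) = {p:.3f}")
```

Dividing each joint figure by its judged total gives the posteriors: 0.27 / 0.41 is about 0.659 for NOK given J NOK, and 0.56 / 0.59 is about 0.949 for OK given J OK.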