Introduction to SQL Server Data Mining
Nick Ward
SQL Server & BI Product Specialist
Microsoft Australia

Agenda
What is Data Mining?
Why use Data Mining?
Data Mining Tasks
Data Mining Process
SQL Server 2005 Data Mining
Demonstration
SQL Server 2005 Data Mining Discussion

What is Data Mining?

What is not Data Mining?
• Ad-Hoc Query
• Event Notifications
• Multidimensional Analysis/Slice Dice
• Statistics
• OLAP
• Canned or ad-hoc reports

What is Data Mining?
“Data mining is the semiautomatic extraction of patterns, changes, associations, anomalies, and other statistically significant structures from large data sets.” R. Grossman
Also known as
• Machine Learning
• Predictive Analytics

Why Data Mining?
[Chart; surviving labels: Disk, Processor, Time]

Types of Analysis
Query-Reporting-Analysis: “What happened?”
• Simple Reports
• OLAP Cubes – Slice/Dice
Real-Time: “What is happening?”
• Key Performance Indicators
• Events/Triggers
Data Mining
• “What will happen?”
• “How/why did this happen?”

Data Mining Tasks
• Explores Your Data
• Finds Patterns
• Performs Predictions

Data Mining Tasks
[Diagram: Training Data is fed to the DM Engine to produce a Mining Model; the Mining Model and the Data To Predict are fed to the DM Engine to produce Predicted Data.]

Customer Examples
• ComputerFleet (Australia): Predict when hired equipment will be returned
• Sanford Securities (Australia): Data mining automation
• Clait Health Services: Identify patients likely to suffer deteriorating health for pro-active treatment
• AIM Healthcare: Identify billing errors, duplicate payments etc. to minimize costs

Data Mining Tasks
• Classification
• Estimation
• Segmentation
• Association
• Forecasting
• Text Analysis

Data Mining Tasks: Classification
• What type of membership card should I offer?
• Which customers will respond to my mailing?
• Is this transaction fraudulent?
• Will I lose this customer?
• Will this product be defective?
• Why is my system failing?
• Which patients' health will degrade?
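
To make the classification task concrete (for example, "Which customers will respond to my mailing?"), the following is a minimal Python sketch of the train-then-predict flow using a small Naïve Bayes classifier, one of the algorithm families this deck lists for classification. It is not SQL Server's implementation: the training rows, the column names (age, homeowner, responded), and the function names train and predict are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: past mailing results (illustrative only).
training = [
    {"age": "young",  "homeowner": "no",  "responded": "no"},
    {"age": "young",  "homeowner": "yes", "responded": "no"},
    {"age": "middle", "homeowner": "yes", "responded": "yes"},
    {"age": "senior", "homeowner": "yes", "responded": "yes"},
    {"age": "senior", "homeowner": "no",  "responded": "no"},
    {"age": "middle", "homeowner": "no",  "responded": "yes"},
]


def train(rows, target):
    """Build a model: class counts plus per-class attribute value counts."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, label) -> value -> count
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row in rows:
        label = row[target]
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return {"class_counts": class_counts,
            "value_counts": value_counts,
            "values_seen": values_seen}


def predict(model, case):
    """Return the most probable label for a new case (Laplace smoothing)."""
    total = sum(model["class_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n in model["class_counts"].items():
        score = math.log(n / total)  # log prior
        for attr, value in case.items():
            count = model["value_counts"][(attr, label)][value]
            k = len(model["values_seen"][attr])
            score += math.log((count + 1) / (n + k))  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(training, "responded")                        # training step
print(predict(model, {"age": "middle", "homeowner": "yes"}))  # yes
print(predict(model, {"age": "young", "homeowner": "no"}))    # no
```

The two functions mirror the two halves of the engine diagram: train turns training data into a model, and predict applies that model to a case whose outcome is not yet known.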

Data Mining Tasks: Estimation
• How much revenue will I get from this customer?
• How long will this asset be in service?
• What is the mean time to failure?
• What is the particle density of this fluid?

Data Mining Tasks: Segmentation
• Describe my customers
• How can I differentiate my customers?
• How can I organize my data in a manner that makes sense?
• Is this record an outlier?

Data Mining Tasks: Association
• What items are bought together?
• Which services are used together?
• What products should I recommend to my customers?

Data Mining Tasks: Forecasting
– What are projected revenues for all products?
– What are inventory levels next month?

Data Mining Tasks: Text Analysis
• Analysis of unstructured data
  – Finds key terms and phrases in text
  – Conversion to structured data
  – Feed into other algorithms
    • Classification
    • Segmentation
    • Association
• How do I handle call center data?
• How can I classify mail?
• What can I do with web feedback?
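
The text analysis steps above (find key terms, convert to structured data, feed into other algorithms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not the Integration Services Term Extraction transform, the stop-word list is a tiny hand-picked set, and the feedback strings and the names extract_terms and to_structured are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop-word list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "my", "to", "and", "of",
             "it", "was", "i", "on", "for", "not", "in"}


def extract_terms(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def to_structured(documents, top_n=2):
    """Pick the top_n most frequent terms and emit one count row per document."""
    per_doc = [Counter(extract_terms(d)) for d in documents]
    totals = Counter()
    for counts in per_doc:
        totals.update(counts)
    vocabulary = [term for term, _ in totals.most_common(top_n)]
    rows = [[counts[term] for term in vocabulary] for counts in per_doc]
    return vocabulary, rows


# Hypothetical call-center feedback.
feedback = [
    "The printer is jammed and the printer is offline",
    "My printer was jammed",
    "Billing error on my invoice",
]

vocabulary, rows = to_structured(feedback, top_n=2)
print(vocabulary)  # ['printer', 'jammed']
print(rows)        # [[2, 1], [1, 1], [0, 0]]
```

Each row is now ordinary structured data (one column per key term), so it could be passed to a classification, segmentation, or association algorithm like any other table.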

Data Mining Process: CRISP-DM
[Diagram: the CRISP-DM cycle around a central "Data" store, with phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment. Captions: “Doing Data Mining” and “Putting Data Mining to Work”.]
www.crisp-dm.org

Value of Data Mining
[Chart: Relative Business Value plotted against Usability (Easy to Difficult). Plotted items: Data Mining, OLAP, Reports (Adhoc), Reports (Static). Additional labels: Business Knowledge, SQL Server 2005.]

Data Mining User Interface
SQL Server BI Development Studio
• Creation and exploration environment
• Data Mining projects inside Visual Studio solutions with related projects
• Source Control Integration
SQL Server Management Studio
• Single place for management of all SQL Server technologies
• Manage, Browse, and Query Data Mining Models

Data Mining Algorithms
• Classification
• Estimation
• Segmentation
• Association
• Forecasting
• Text Analysis

Data Mining Algorithms: Classification
• Decision Trees
• Neural Nets
• Naïve Bayes
• Logistic Regression

Data Mining Algorithms: Estimation
• Decision Trees
• Neural Nets
• Logistic Regression
• Linear Regression

Data Mining Algorithms: Segmentation
• Clustering
• Sequence Clustering

Data Mining Algorithms: Association
• Association Rules
• Decision Trees

Data Mining Algorithms: Forecasting
• Time Series

Data Mining Algorithms: Text Analysis
• Integration Services
  – Term Extraction Transform
  – Term Lookup Transform

Data Mining Programmability
DMX Query Interface
• OLEDB, ADO, ADO.Net, ADOMD.Net, XMLA

    Dim cmd As ADOMD.Command
    Dim reader As ADOMD.DataReader
    cmd.Connection = conn
    Set reader = cmd.ExecuteReader("Select Predict(Gender)…")

Data Mining Object Model
• Analysis Management Objects (AMO)
• ADOMD.Net, Server ADOMD.Net
• Direct access to Mining content
• CLR User Defined Procedures execute on the server
Expandability
• Plug-In Algorithms
• Plug-In Viewers

Session Summary
• Data Mining is the automatic extraction of information from data for descriptive or predictive purposes
• Data Mining addresses a wide variety of problems
• SQL Server 2005 contains a full-featured set of data mining tools and APIs for the creation and deployment of data mining solutions.

Next Steps
1) SQL Server website: http://www.microsoft.com/sql
2) Virtual labs
3) Data Mining Tutorial
4) Find more info at: http://www.sqldatamining.com
5) Ask Questions: news:microsoft.public.sqlserver.datamining