Data Mining Techniques and Opportunities for Taxation Agencies
Louis Panebianco, Florida
Eric Wojcik, Consultant

In This Session ...
You will learn the data mining techniques below and their application for Tax Agencies
  – ABC Analysis
  – Association Analysis
  – Clustering
  – Decision Trees
  – Score Carding Techniques
You will be provided with descriptions of real-world usage and ideas for practical application in your agency
You will hear examples of their usage from the State of Florida

Copyright © 2008 Deloitte Development LLC. All rights reserved.

Agenda
  – What is Data Mining? (Eric Wojcik)
  – Data Mining Techniques (Eric Wojcik)
  – How can we use Data Mining in our Agency? (Eric Wojcik)
  – Usage in Florida (Louis Panebianco)
  – Recap/Closing (Louis Panebianco)

A Few Common Questions…
Data mining still holds a mythical reputation
  – Is it very complicated?
    Advanced techniques still are… on paper! Conceptually, techniques are still a challenge. Application is usually not.
  – What can data mining do?
    Turn tomes of data into small, digestible, ideally actionable amounts of information
  – What staff do I need to mine data?
    In the past it was the actuarial staff, select financial staff, and PhDs. Tools of today allow the process Power Users to data mine effectively.

What is Data Mining? – The Answer!
Definition:
“…the science of extracting useful information from large data sets or databases.”
  D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA.
“…the nontrivial extraction of implicit, previously unknown, and potentially useful information from data.”
  W. Frawley, G. Piatetsky-Shapiro, and C. Matheus (Fall 1992). “Knowledge Discovery in Databases: An Overview”
“…the statistical and logical analysis of large sets of … data, looking for patterns that can aid decision making.”
  Ellen Monk, Bret Wagner (2006).
  Concepts in Enterprise Resource Planning, Second Edition. Thomson Course Technology, Boston, MA
“Intentionally using statistical techniques on a large amount of data to either discover new insight or to convert it into actionable information.”
  - Eric Wojcik
Reporting and Analytics 2008 Conference

What is data mining? – Vital Concepts
  – Focused Ideas – Especially Goal
  – Data Quality/Integrity
  – Discovery vs. Predictive
  – Training, Benchmarking, and Prediction

ABC Analysis - Concept
ABC Analysis is similar to the Pareto Principle
  – 80/20 Rule
  – The Law of Vital Few
  – The Principle of Factor Sparsity
ABC Analysis is primarily used as a discovery technique.
Concept: 80% of the outcomes come from 20% of the root causes
  – Call Center: 80% of all incoming calls are in reference to 20% of the potential reasons for calling in
  – Economics: 80% of a country’s wealth is owned by 20% of the population.*

ABC Analysis – Concept (cont.)
ABC Analysis is a slight variation on the Pareto Principle
  – Began as an inventory classification technique and is commonly referenced in Supply Chain applications
ABC Codes break down as follows:
  – “A” Class contains 60% of total value
  – “B” Class contains 20% of total value
  – “C” Class accounts for the remaining 20% of total value
  – Classification codes are very process specific and likely have a different meaning across each process
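The ABC breakdown above can be sketched in a few lines of Python. This is only an illustration, not the presenters' implementation: the taxpayer IDs and amounts are hypothetical, the 60%/80% cumulative cut-offs are taken from the class definitions on the slide, and the rule for an item that straddles a cut-off (it stays in the higher class) is an assumption.

```python
from typing import Dict

# Cumulative share-of-value cut-offs from the slide:
# "A" = first 60% of total value, "B" = next 20%, "C" = remaining 20%.
A_CUTOFF = 0.60
B_CUTOFF = 0.80


def abc_classify(values: Dict[str, float]) -> Dict[str, str]:
    """Assign an ABC code to each item from its position in the value ranking."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total value must be positive")

    # Rank items from largest to smallest value.
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)

    codes: Dict[str, str] = {}
    running = 0.0
    for name, value in ranked:
        # Assumption: classify by the cumulative share *before* this item,
        # so an item that crosses a cut-off stays in the higher class.
        share_before = running / total
        if share_before < A_CUTOFF:
            codes[name] = "A"
        elif share_before < B_CUTOFF:
            codes[name] = "B"
        else:
            codes[name] = "C"
        running += value
    return codes


# Hypothetical outstanding balances per taxpayer account (illustrative only).
balances = {
    "T01": 420, "T02": 210, "T03": 140, "T04": 90,
    "T05": 60, "T06": 40, "T07": 25, "T08": 15,
}
codes = abc_classify(balances)
for account in sorted(codes):
    print(account, codes[account])
```

With this sample data, two accounts land in class A, two in class B, and four in class C, which mirrors the "vital few" idea: a small number of accounts carries most of the value.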
Association Analysis - Concept
Association Analysis unveils hidden interactions, correlations, and patterns in your data.
  – Affinity Analysis
  – Market Basket Analysis (MBA)
Concept:
  – Uncover patterns in the data and then define rules that describe the interactions.
  – “If a person purchases onions and potatoes, they are 50% more likely to also purchase beef.”
    {Onions, Potatoes} → {Beef}

Association Analysis – Concept (cont.)
Transactions determine the rules
  – The rules should change over time - moving timeframe
Association Analysis is typically augmented with other techniques if automated

Clustering - Concept
Clustering is statistically separating records into smaller unique groups based on common similarities
  – Also known as a Classification Problem
Clustering can be used for discovery or prediction.
The two primary forms of clusters are Hierarchical and Partitional
[Figure: Raw Data and Clustered Data]

Clustering – Concept (cont.)
[Figure: Hierarchical Clusters]

Clustering – Concept (cont.)
[Figure: Partitional Clusters]

Decision Trees
An easy-to-understand method that classifies possible outcomes from previous decisions based on characteristics in the data and develops rules to predict future outcomes.
Once a tree is “trained” and created, it can be used on new data to predict decisions
Trees can even be partially defined to meet regulatory requirements
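To make the "train, then predict on new data" idea concrete, here is a minimal sketch of a decision tree in plain Python. It is not the tool or method used by the presenters: the split criterion (Gini impurity), the feature names, and the audit records are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, float]


def gini(labels: List[str]) -> float:
    """Gini impurity of a list of labels (0.0 means a single class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows: List[Row], labels: List[str]) -> Optional[Tuple[str, float]]:
    """Return the (feature, threshold) with the lowest weighted impurity, if any."""
    best = None
    best_score = gini(labels)  # a split must improve on the unsplit impurity
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def train(rows: List[Row], labels: List[str], depth: int = 0, max_depth: int = 3) -> Any:
    """Recursively build a tree; a leaf is simply the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left_idx = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right_idx = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": train([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                      depth + 1, max_depth),
        "right": train([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                       depth + 1, max_depth),
    }


def predict(tree: Any, row: Row) -> str:
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical past audit outcomes (illustrative only, not Florida data).
history = [
    {"late_filings": 0, "reported_income": 50},
    {"late_filings": 1, "reported_income": 80},
    {"late_filings": 0, "reported_income": 120},
    {"late_filings": 3, "reported_income": 60},
    {"late_filings": 4, "reported_income": 90},
    {"late_filings": 2, "reported_income": 40},
    {"late_filings": 1, "reported_income": 30},
    {"late_filings": 5, "reported_income": 110},
]
outcomes = ["no_change", "no_change", "no_change", "adjustment",
            "adjustment", "adjustment", "no_change", "adjustment"]

tree = train(history, outcomes)
print(tree)
print(predict(tree, {"late_filings": 3, "reported_income": 70}))
```

On this toy data the trained tree is a single rule on `late_filings`, which shows why trees are easy to explain: the rule can be read directly from the structure. A production tool would add safeguards such as pruning and validation on held-out data.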
Decision Trees can also be used in two ways
  – Discovery – to find new insight into the data
  – Prediction – to statistically guess what the outcome will be (requires a “trained” decision tree)
A prediction tree can automate business processes but must be periodically retrained
Commonly used in
  – Insurance
  – Legal
  – Sales/Marketing

Score Carding Techniques
Score Carding is a technique in which a valuation function is applied to value (score) a data set
  – Approximation
The two most common techniques are weighted scoring and regression
  – Weighted Scoring is a manually defined score
  – Regression (either linear or non-linear) is an automated method for scoring

Score Carding Techniques (cont.)
[Figure panels: Weighted Average, Linear Regression, Non-Linear Regression]

How can we use Data Mining?
[Slide graphic not included]

Florida’s Decision Tree Distributions
[Slide graphic not included]

Florida’s Lead Development
[Slide graphic not included]

Agenda
  – What is Data Mining? (Eric Wojcik)
  – Data Mining Techniques (Eric Wojcik)
  – How can we use Data Mining in our Agency? (Eric Wojcik)
  – Usage in Florida (Louis Panebianco)
  – Recap/Closing (Louis Panebianco/Eric Wojcik)

Questions