Download Data Mining Techniques and Opportunities for Taxation Agencies

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Data Mining Techniques
and Opportunities for
Taxation Agencies
Louis Panebianco
Florida
Eric Wojcik
Consultant
In This Session ...
 You will learn the data mining techniques below
and their application for Tax Agencies





ABC Analysis
Association Analysis
Clustering
Decision Trees
Score Carding Techniques
 You will be provided with descriptions of realworld usage and ideas for practical application in
your agency
 You will hear examples of their usage from the
State of Florida
Copyright © 2008 Deloitte Development LLC. All rights reserved.
1
Agenda
What is Data Mining?
Eric Wojcik
Data Mining Techniques
Eric Wojcik
How can we use Data Mining in our Agency?
Eric Wojcik
Usage in Florida
Louis Panebianco
Recap/Closing
Louis Panebianco
Copyright © 2008 Deloitte Development LLC. All rights reserved.
2
A Few Common Questions…
Data mining still holds a mythical reputation
– Is it very complicated?
Advanced techniques still are… on paper!
 Conceptually, techniques are still a challenge.
Application is usually not.

– What can data mining do?
 Turn tomes of data into small, digestible, ideally
actionable amounts of information
– What staff do I need to mine data?
 In the past it was the actuarial staff, select financial
staff, and PhDs. Tools of today allow the process
Power Users to data mine effectively.
Copyright © 2008 Deloitte Development LLC. All rights reserved.
3
What is Data Mining? – The Answer!
Definition:
“…the science of extracting useful information from large data sets or
databases.”
D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA.
“…the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data.”
W. Frawley and G. Piatetsky-Shapiro and C. Matheus (Fall 1992). "Knowledge Discovery in
Databases: An Overview“
“…the statistical and logical analysis of large sets of … data, looking
for patterns that can aid decision making.”
Ellen Monk, Bret Wagner (2006). Concepts in Enterprise Resource Planning, Second Edition.
Thomson Course Technology, Boston, MA
“Intentionally using statistical techniques on a large
amount of data to either discover new insight or to
convert it into actionable information.”
- Eric Wojcik Reporting and Analytics 2008 Conference
Copyright © 2008 Deloitte Development LLC. All rights reserved.
4
4
What is data mining? – Vital Concepts
 Focused Ideas – Especially Goal
 Data Quality/Integrity
 Discovery vs. Predictive
 Training, Benchmarking, and Prediction
Copyright © 2008 Deloitte Development LLC. All rights reserved.
5
Agenda
What is Data Mining?
Eric Wojcik
Data Mining Techniques
Eric Wojcik
How can we use Data Mining in our Agency?
Eric Wojcik
Usage in Florida
Louis Panebianco
Recap/Closing
Louis Panebianco
6
Copyright © 2008 Deloitte Development LLC. All rights reserved.
ABC Analysis - Concept
 ABC Analysis is similar to the Pareto Principle
– 80/20 Rule
– The Law of Vital Few
– The Principle of Factor Sparsity
 ABC Analysis is primarily used as a discovery technique.
 Concept: 80% of the outcome come from 20% of the root causes
– Call Center: 80% of all incoming
calls are in reference to 20%
of the potential reasons for
calling in
– Economics: 80% of a country’s
wealth is owned by 20% of the
population.*
Copyright © 2008 Deloitte Development LLC. All rights reserved.
7
7
ABC Analysis – Concept (cont.)
 ABC Analysis is a slight deviation of the Pareto Principle
– Began as an inventory classification technique and commonly
referenced in Supply Chain applications
ABC Codes break down as following:
– “A” Class contains 60% of total value
– “B” Class contains 20% of total value
– “C” Class accounts for the remaining 20% of total value
– Classification codes are very process specific and likely have a
different meaning across each process
Copyright © 2008 Deloitte Development LLC. All rights reserved.
8
Association Analysis - Concept
 Association Analysis unveils hidden interactions,
correlations, and patterns in your data.
– Affinity Analysis
– Market Basket Analysis (MBA)
 Concept:
– Uncover patterns in the data and then define rules to the
interaction.
– “If a person purchases onions and
potatoes, they are 50% more likely to
also purchase beef.”
 {Onions, Potatoes}  {Beef}
Copyright © 2008 Deloitte Development LLC. All rights reserved.
9
Association Analysis – Concept (cont.)
 Transactions determine the rules
– The rules should change over time - moving timeframe
 Association Analysis is typically augmented with
other techniques if automated
10
Copyright © 2008 Deloitte Development LLC. All rights reserved.
Clustering - Concept
 Clustering is statistically separating records into smaller unique groups
based on common similarities
– Also known as a Classification Problem
 Clustering can be used for discovery or prediction.
 The two primary forms of clusters are Hierarchical and Partitional
Raw Data
Copyright © 2008 Deloitte Development LLC. All rights reserved.
Clustered Data
11
Clustering – Concept (cont.)
 Hierarchical Clusters
12
Copyright © 2008 Deloitte Development LLC. All rights reserved.
Clustering – Concept (cont.)
 Partitional Clusters
Copyright © 2008 Deloitte Development LLC. All rights reserved.
13
13
Decision Trees
 An easy to understand method that classifies possible outcomes from
previous decisions bases on characteristics in the data and develops
rules to predict the future outcomes.
 Once a tree is “trained” and created it can be used on new data to
predict decisions
 Trees can even be
partially defined to meet
regulatory requirements
Copyright © 2008 Deloitte Development LLC. All rights reserved.
14
Decision Trees – Concept (cont.)
 Decision Trees can also be used in two ways
– Discovery – to find new insight into the data
– Prediction – to statistically guess what the outcome will be (requires
a “trained” decision tree)
 A prediction tree can automate business
processes, but, must be periodically
retrained
 Commonly used in
– Insurance
– Legal
– Sales/Marketing
Copyright © 2008 Deloitte Development LLC. All rights reserved.
15
Score Carding Techniques
 Score Carding is a technique in which a valuation
function is applied to value (score) a data set
– Approximation
 The two most common techniques are weighted
scoring and regression
– Weighted Scoring is a manually defined score
– Regression (either linear or non-linear) is an automated
method for scoring
16
Copyright © 2008 Deloitte Development LLC. All rights reserved.
Score Carding Techniques (cont.)
Weighted Average
Linear Regression
Copyright © 2008 Deloitte Development LLC. All rights reserved.
Non-Linear Regression
17
Agenda
What is Data Mining?
Eric Wojcik
Data Mining Techniques
Eric Wojcik
How can we use Data Mining in our Agency?
Eric Wojcik
Usage in Florida
Louis Panebianco
Recap/Closing
Louis Panebianco
Copyright © 2008 Deloitte Development LLC. All rights reserved.
18
How can we use Data Mining?
Copyright © 2008 Deloitte Development LLC. All rights reserved.
19
How can we use Data Mining?
20
Copyright © 2008 Deloitte Development LLC. All rights reserved.
Agenda
What is Data Mining?
Eric Wojcik
Data Mining Techniques
Eric Wojcik
How can we use Data Mining in our Agency?
Eric Wojcik
Usage in Florida
Louis Panebianco
Recap/Closing
Louis Panebianco/Eric
Wojcik
Copyright © 2008 Deloitte Development LLC. All rights reserved.
21
Florida’s Decision Tree Distributions
Copyright © 2008 Deloitte Development LLC. All rights reserved.
22
Florida’s Lead Development
Copyright © 2008 Deloitte Development LLC. All rights reserved.
23
Agenda
What is Data Mining?
Eric Wojcik
Data Mining Techniques
Eric Wojcik
How can we use Data Mining in our Agency?
Eric Wojcik
Usage in Florida
Louis Panebianco
Recap/Closing
Louis Panebianco/Eric
Wojcik
Copyright © 2008 Deloitte Development LLC. All rights reserved.
24
Questions
Copyright © 2008 Deloitte Development LLC. All rights reserved.
25
Copyright © 2008 Deloitte Development LLC. All rights reserved.