Predictive Models for Enhanced Audit Selection: The Texas Audit Scoring System
FTA TECHNOLOGY CONFERENCE 2003
Bill Haffey, SPSS Inc.
Daniele Micci-Barreca, Elite Analytics LLC
Agenda
• Data Mining Overview
• Audit Selection Problem
• Data Mining for Audit Selection
• Texas Audit Select System
• Predictive Models
• Results
• Final Remarks
Data Mining Overview
• Data Mining
  • “. . . is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” (Gartner Group)
  • Mature, widely accepted in:
    • Commercial (e.g., CRM, credit-scoring, and ‘eTail’)
    • Public sector (e.g., payment error detection, law enforcement, logistics, and homeland security)
Data Mining Overview
Cross Industry Standard Process for Data Mining (CRISP-DM)
• launched 1996
• 200 members across vendor, integrator/consultant, and user communities
• application/tool/vendor neutral
• focus on business issues, guidance framework, and comfort factor
• www.crisp-dm.org
Data Mining Overview: Modeling
• Data driven
• Supervised techniques
  • modeling relationships between known targets and potentially relevant inputs using training data
  • for example, Neural Networks
• Unsupervised techniques
  • ‘discovering’ patterns (associations, sequences, segments)
  • for example, Clustering
    • clusters/groups/segments of similar returns
    • ‘small’ clusters can be interesting
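To make the supervised/unsupervised distinction on this slide concrete, here is a minimal sketch, not from the presentation: the synthetic data, column meanings, and use of Python with scikit-learn (rather than the SPSS tooling the presenters used) are all illustrative assumptions.

# Minimal sketch: one supervised and one unsupervised technique on a synthetic
# table of returns. Columns, data, and library choice are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.lognormal(mean=10, sigma=1, size=(500, 3))      # e.g., gross sales, deductions, wages
y = 0.01 * X[:, 0] + rng.normal(scale=1000, size=500)   # a known target, e.g., audit adjustment

# Supervised: model the relationship between the known target and the inputs
outcome_model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
outcome_model.fit(X, y)

# Unsupervised: 'discover' segments of similar returns; small clusters can be interesting
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(segments))   # cluster sizes; unusually small ones merit a closer look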
Audit Selection Challenge
Problem: maximizing tax collection and compliance with limited auditing resources.
Texas Statistics:
• Active Sales Tax Taxpayers: 761,434
• Available Auditors: 380
• Yearly Audits Completed: ~5,200
• % of active taxpayers audited yearly: 0.7%
Finding the “golden needle”
• A small percentage of audits account for a very large proportion of assessments
  • Total assessed tax adjustments from audits: approximately $90M (2002)
  • 40% from the top 0.5% of completed audits
• One fifth of resources spent on no-change audits
  • Percentage of “no-change” audits: 35-40%
  • No-change audits account for 15-20% of hours
Audit Selection Strategies
Traditional audit selection strategies:
• Top Contributors
  • Focus on reported tax dollars, large businesses
  • Texas “Priority 1” program
• Prior Audits
  • Focus on prior audit outcome
  • Texas “Prior Productive” program
Why Data Mining?
• Traditional audit selection criteria typically leverage a single metric:
  • Total tax reported
  • Prior audit results
  • Percent deductions, SIC, age of the business, etc.
• Difficult to identify and leverage patterns from audit outcome data:
  • Which “metrics” are more relevant?
  • Profiling the “golden needles” to find more of those…
Multi-Dimensional Approach
Data Mining can leverage dozens of taxpayer metrics and characteristics for enhanced audit selection.
[Taxpayer profile: Gross Sales, Deductions, SIC, Wages, Other Tax Types, Prior Audits, Years in Business, Other TP characteristics]
Leveraging Audit Results
[diagram: a closed-loop, data-driven process in which completed Audits feed Data Mining, Data Mining produces Analytical Models, and the models drive Audit Selection, which in turn generates new Audits]
Texas Audit Select Scoring System
• Designed for Sales Tax audit selection
• Based on a predictive model estimating “final tax adjustment”
• Score scaled between 1 and 1000
• All active sales tax accounts scored yearly
• Score presented in Audit Select application
• In use since mid-2000
What is a Predictive Model?
A representation of some target function that maps a set of input features to one or more target variables.
[diagram: Input 1, Input 2, Input 3, …, Input 10 feed into the Model, which produces the Target]
• Inputs: Gross Sales, Deductions, SIC, Prior Audits, …
• Target: Final Tax Adjustment, Change vs. No-Change
Predictive models are “trained” using a historical data set for which both the inputs and the target variable are known.
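A minimal sketch of this training step, not taken from the deck: it assumes a small pandas DataFrame of closed audits with hypothetical column names and values, and a scikit-learn regressor stands in for whatever modeling tool was actually used.

# Train a model on historical audits where inputs and target are both known.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

audits = pd.DataFrame({
    "gross_sales":  [1.2e6, 4.5e5, 8.9e6, 2.3e5, 6.7e6],
    "deductions":   [2.0e5, 1.1e5, 3.4e6, 5.0e4, 1.9e6],
    "prior_audits": [1, 0, 3, 0, 2],
    "years_in_biz": [12, 3, 25, 2, 18],
    "final_tax_adjustment": [15000.0, 0.0, 240000.0, 500.0, 88000.0],  # known target
})

X = audits.drop(columns="final_tax_adjustment")       # the inputs
y = audits["final_tax_adjustment"]                    # the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)            # "training": inputs and target both known
predicted = model.predict(X_test)      # the learned target function applied to new inputs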
Main Model Inputs
[diagram: historical audit outcomes supply the target (score) for the training data; the model inputs are drawn from Sales Tax Filings, Other Tax Filings, Employment Records, Audit History, and Business Information]
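The deck does not show how these sources are combined. The sketch below assumes each source is already available as a pandas DataFrame keyed by a hypothetical taxpayer_id column, and simply joins the inputs onto the historical audit outcomes.

# Assemble the training table from the sources named on this slide.
import pandas as pd

def build_training_data(historical_audits: pd.DataFrame, sales_tax_filings, other_tax_filings,
                        employment_records, audit_history, business_info):
    """Join the model inputs onto the historical audit outcomes (the target)."""
    data = historical_audits   # one row per closed audit, including its outcome
    for source in (sales_tax_filings, other_tax_filings, employment_records,
                   audit_history, business_info):
        data = data.merge(source, on="taxpayer_id", how="left")
    return data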
Modeling Audit Outcomes
[diagram: for each historical audit, the taxpayer profile at the time of the audit (Gross Sales, Deductions, SIC, Wages, Other Tax, Prior Audits, Years in Business, Other TP characteristics), drawn from tax filings, taxpayer information, and employment data, is fed to the modeling tools together with the audit outcome ($) to produce the model]
Scoring Taxpayers
[diagram: the current taxpayer profile (Gross Sales, Deductions, SIC, Wages, Other Tax, Prior Audits, Years in Business, Other TP characteristics), drawn from tax filings, taxpayer information, and employment data, is run through the model to produce a score on the 0 to 1000 scale]
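A sketch of this scoring step, assuming the trained model from the earlier sketch and a DataFrame of current taxpayer profiles with the same input columns. The rank-based mapping onto the 1 to 1000 range is an assumption; the deck states only the score range, not how predictions are scaled.

# Score current taxpayers and map predictions onto a 1..1000 scale.
import numpy as np

def score_taxpayers(model, current_profiles):
    predicted_adjustment = model.predict(current_profiles)
    denom = max(len(predicted_adjustment) - 1, 1)
    ranks = predicted_adjustment.argsort().argsort() / denom   # percentile rank, 0..1
    return np.round(1 + ranks * 999).astype(int)               # scaled score, 1..1000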
ADS Environment
[architecture diagram: the “Audit Select” application, a data warehouse, the mainframe, external data sources, a data mining server, and the taxpayer “scores” it produces]
Tax Adjustment vs. Score
• The high-score region (800 to 1000) produces higher-than-average tax adjustments
Finding New Leads
[charts: score distribution, with right-tail detail]
• Over one third of current taxpayers scoring above 750 had no prior audits
Score vs. Priority One
• Top 9% scoring audits vs. P1 audits (9% of total)
• Only 36% of top 9% scoring audits were P1 audits
Score-Enhanced P1 Selection Strategy
[chart: audits plotted by Score (0 to 1000, with a threshold at 600) against % of Total Reported Tax (marks at 40% and 65%), divided into three regions. Region 1: P1 audits with Score < 600, covering 25% of P1 audits, avg. hourly yield $255 vs. a P1 avg. hourly yield of $512. Region 2: keep Priority-1 status. Region 3: new Priority-1’s, i.e. Score of 600 or above with a state contribution ranking of 45% to 65%; an avg. hourly yield of $377 is noted alongside.]
• Use scores to refine the Priority 1 selection strategy
• Replace audits in Region-1 with audits in Region-3
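One possible reading of that rule as code. This is a hypothetical sketch: the 600 score threshold and the 45% to 65% contribution band come from the slide, but the column names, data layout, and exact comparisons are assumptions.

# Apply the score-enhanced Priority-1 rule to a taxpayer table.
import pandas as pd

def refine_priority_1(taxpayers: pd.DataFrame):
    # Region 1: current P1 taxpayers scoring below 600 (candidates to drop)
    region_1 = taxpayers[taxpayers["is_priority_1"] & (taxpayers["score"] < 600)]
    # Region 2: current P1 taxpayers scoring 600 or above (keep Priority-1 status)
    region_2 = taxpayers[taxpayers["is_priority_1"] & (taxpayers["score"] >= 600)]
    # Region 3: non-P1 taxpayers scoring 600 or above in the 45%-65% contribution band
    region_3 = taxpayers[~taxpayers["is_priority_1"]
                         & (taxpayers["score"] >= 600)
                         & taxpayers["contribution_pct"].between(45, 65)]
    # Replace Region-1 audits with Region-3 audits in the Priority-1 plan
    new_priority_1 = pd.concat([region_2, region_3])
    return new_priority_1, region_1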
Score vs. “Prior Productive”
• Average tax adjustment for PP audits: $17,700 (median $4,100)
• For the top-ranking 20% of those audits based on the Score, the average tax adjustment is $27,000 (median $6,500)
Challenges
• Much more difficult to predict the “no-change” outcome than the outcome scale
• Score relevance/accuracy limited by prior audit selection criteria
  • Selection Bias Problem
• End-user education: selectors need training to gain confidence in the score
• Score does not replace human judgment; it’s just a selection tool
Current Developments
• Score combines two predictive models:
  • “Audit Likelihood” model: models the selector judgment
  • “Audit Outcome” model: predicts outcome
• High score if:
  • 1) Taxpayer similar to other previously audited taxpayers, and
  • 2) Likely to result in large tax adjustment
[diagram: the Audit Likelihood Score (does the taxpayer fit the “selection profile”?) and the Audit Outcome Score (if selected, what will be the likely tax adjustment?) are combined into the Combined Score]
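The deck does not say how the two scores are combined. A simple product, which is high only when the taxpayer both fits the selection profile and promises a large adjustment, is one plausible sketch; the 0 to 1 scaling of each score is an assumption.

# Combine the two component scores into one; both are assumed scaled to 0..1.
def combined_score(audit_likelihood: float, audit_outcome: float) -> float:
    return audit_likelihood * audit_outcome

print(combined_score(0.9, 0.8))   # fits the profile and large likely adjustment -> 0.72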
Ongoing and Future Work
• Audit Select Model Enhancement
  • Additional Taxpayer Factors (IRS data, return amendments, etc.)
  • Better Modeling Algorithms
  • Taxpayer Segmentation – Models by Segment
• Franchise Tax Scoring Model
• Enforcement Models
• Tax Affinity – Non-Filers
Summary
• Data mining has great potential in tax compliance:
  • Translates mountains of data into actionable information, predictions, and categorization
  • Improves efficiency of resource-intensive tasks, such as auditing
  • Supports decision making and helps uncover more golden needles …
Acknowledgments
• Texas Comptroller’s Office, Audit Division
  • Lisa McCormack
  • Rose Orozco
  • Eddie Coats
• Elite Analytics
  • Dr. Satheesh Ramachandran

Thanks
• Bill Haffey, SPSS Inc.
  • [email protected]
• Daniele Micci-Barreca, Elite Analytics LLC
  • [email protected]