Predictive Models for Enhanced Audit Selection: The Texas Audit Scoring System
FTA Technology Conference 2003
Bill Haffey, SPSS Inc.
Daniele Micci-Barreca, Elite Analytics LLC

Agenda
• Data Mining Overview
• Audit Selection Problem
• Data Mining for Audit Selection
• Texas Audit Select System
• Predictive Models
• Results
• Final Remarks

Data Mining Overview
• Data mining “. . . is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” (Gartner Group)
• Mature and widely accepted in:
  • The commercial sector (e.g., CRM, credit scoring, and ‘eTail’)
  • The public sector (e.g., payment error detection, law enforcement, logistics, and homeland security)

Data Mining Overview
Cross Industry Standard Process for Data Mining (CRISP-DM)
• Launched in 1996
• 200 members across the vendor, integrator/consultant, and user communities
• Application-, tool-, and vendor-neutral
• Focus on business issues, a guidance framework, and a comfort factor
• www.crisp-dm.org

Data Mining Overview
Modeling
• Data driven
• Supervised techniques
  • Model relationships between known targets and potentially relevant inputs using training data
  • For example, neural networks
• Unsupervised techniques
  • ‘Discover’ patterns (associations, sequences, segments)
  • For example, clustering: clusters/groups/segments of similar returns, where ‘small’ clusters can be interesting

Audit Selection Challenge
Problem: maximizing tax collection and compliance with limited auditing resources.
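The clustering idea mentioned under Modeling can be illustrated with a minimal sketch. This is not the system's actual method; it is a plain 1-D k-means on a hypothetical "deduction ratio" feature, flagging the kind of 'small' cluster the deck calls interesting. All data below is made up.

```python
def kmeans_1d(values, k, iters=50):
    """Cluster scalar values into k groups with plain k-means."""
    # Initialize centroids spread across the observed range.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centroid.
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[idx].append(v)
        # Recompute each centroid as the mean of its group.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids, groups

# Hypothetical deduction ratios: most returns sit near 0.2-0.35,
# with a small group of outliers near 0.9.
ratios = [0.21, 0.25, 0.3, 0.33, 0.28, 0.35, 0.22, 0.31, 0.88, 0.92]
centroids, groups = kmeans_1d(ratios, k=2)

# 'Small' clusters can be interesting: flag any group holding
# fewer than 30% of the returns.
small = [g for g in groups if 0 < len(g) < 0.3 * len(ratios)]
```

In this toy run the two high-ratio returns separate into their own small cluster, which is exactly the kind of segment an analyst would inspect.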
Texas statistics:
• Active sales tax taxpayers: 761,434
• Available auditors: 380
• Yearly audits completed: ~5,200
• Percent of active taxpayers audited yearly: 0.7%

Finding the “Golden Needle”
• A small percentage of audits accounts for a very large proportion of assessments
• Total assessed tax adjustments from audits: approximately $90M (2002), with 40% coming from the top 0.5% of completed audits
• One fifth of resources is spent on no-change audits:
  • Percentage of “no-change” audits: 35-40%
  • No-change audits account for 15-20% of hours

Audit Selection Strategies
Traditional audit selection strategies:
• Top contributors: focus on reported tax dollars and large businesses (the Texas “Priority 1” program)
• Prior audits: focus on prior audit outcomes (the Texas “Prior Productive” program)

Why Data Mining?
• Traditional audit selection criteria typically leverage a single metric:
  • Total tax reported
  • Prior audit results
  • Percent deductions, SIC, age of the business, etc.
• It is difficult to identify and leverage patterns from audit outcome data:
  • Which “metrics” are more relevant?
  • Profiling the “golden needles” to find more of them

Multi-Dimensional Approach
Data mining can leverage dozens of taxpayer metrics and characteristics for enhanced audit selection. The taxpayer profile includes gross sales, deductions, SIC, wages, other tax types, prior audits, years in business, and other taxpayer characteristics.

Leveraging Audit Results
A closed-loop, data-driven process: completed audits feed the data mining step, data mining produces analytical models, and the models drive the next round of audit selection.

Texas Audit Select Scoring System
• Designed for sales tax audit selection
• Based on a predictive model estimating the “final tax adjustment”
• Score scaled between 1 and 1000
• All active sales tax accounts are scored yearly
• Scores are presented in the Audit Select application
• In use since mid-2000
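The deck says the score is scaled between 1 and 1000 but does not describe how. One plausible approach (an assumption for illustration, not the system's documented method) is min-max scaling of the raw model output:

```python
def scale_score(raw, raw_min, raw_max, lo=1, hi=1000):
    """Min-max scale a raw model output into the [lo, hi] score range.
    Hypothetical scaling; the actual Texas system's method is not
    described in the deck."""
    if raw_max == raw_min:
        return lo
    # Clamp the raw output into the observed range first.
    clamped = max(raw_min, min(raw, raw_max))
    frac = (clamped - raw_min) / (raw_max - raw_min)
    return round(lo + frac * (hi - lo))

# Example: raw predicted adjustments assumed to range $0-$250,000.
scores = [scale_score(r, 0, 250_000) for r in (0, 125_000, 250_000, 400_000)]
```

The clamp keeps an extreme prediction (here $400,000) from escaping the 1-1000 range, so downstream tools can rely on fixed score bounds.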
What Is a Predictive Model?
A representation of some target function that maps a set of input features (Input 1, Input 2, ..., Input 10) to one or more target variables.
• Inputs: gross sales, deductions, SIC, prior audits, ...
• Target: final tax adjustment, or change vs. no-change
Predictive models are “trained” using a historical data set for which both the inputs and the target variable are known.

Main Model Inputs
• Training data: historical audit outcomes (the target behind the score)
• Model inputs: sales tax filings, other tax filings, employment records, audit history, and business information

Modeling Audit Outcomes
To train the model, the taxpayer profile at the time of audit (gross sales, deductions, SIC, wages, other tax types, prior audits, years in business, and other taxpayer data drawn from tax filings, taxpayer information, and employment data) is paired with the historical audit outcome ($) and fed to the modeling tools.

Scoring Taxpayers
To score, the current taxpayer profile (the same inputs, drawn from current tax filings, taxpayer information, and employment data) is run through the model to produce the score.

ADS Environment
The “Audit Select” application reads taxpayer “scores” from a data warehouse, which is fed by the mainframe, external data sources, and a data mining server.

Tax Adjustment vs. Score
• Audits in the high-score region (800 to 1000) produce higher-than-average tax adjustments

Finding New Leads
• Score distribution, right-tail detail: over one third of current taxpayers scoring above 750 had no prior audits

Score vs. Priority One
• Comparison of the top-scoring 9% of audits with P1 audits (also 9% of the total)
• Only 36% of the top-scoring 9% of audits were P1 audits

Score-Enhanced P1 Selection Strategy
• Use scores to refine the Priority 1 selection strategy
• Region 1: P1 audits scoring below 600 (25% of P1 audits), with an average hourly yield of $255 versus the overall P1 average of $512
• Region 2: P1 audits scoring 600 or above keep their Priority-1 status
• Region 3: new Priority-1 candidates drawn from the 45% to 65% band of the state contribution ranking (by percent of total reported tax), with an average hourly yield of $377
• Replace audits in Region 1 with audits in Region 3
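The score-enhanced Priority-1 refinement above can be sketched as a small classification rule. The thresholds come from the slide's diagram; the classification logic itself is my reconstruction and the field names are hypothetical.

```python
def p1_region(is_p1, score, contribution_pct):
    """Classify a candidate audit into one of the three regions.

    is_p1:            currently selected by the Priority-1 program
    score:            Audit Select score, 1-1000
    contribution_pct: position in the state contribution ranking,
                      as a cumulative percent of total reported tax
    """
    if is_p1 and score < 600:
        return 1  # low-scoring P1 audits: candidates to replace
    if is_p1:
        return 2  # high-scoring P1 audits: keep Priority-1 status
    if score >= 600 and 45 <= contribution_pct <= 65:
        return 3  # high-scoring mid-tier taxpayers: new Priority-1s
    return 0      # outside the refined strategy

# Hypothetical candidates, one per region:
candidates = [
    {"is_p1": True,  "score": 450, "pct": 20},  # region 1
    {"is_p1": True,  "score": 910, "pct": 15},  # region 2
    {"is_p1": False, "score": 780, "pct": 55},  # region 3
    {"is_p1": False, "score": 300, "pct": 80},  # unselected
]
regions = [p1_region(c["is_p1"], c["score"], c["pct"]) for c in candidates]
```

Under this rule, swapping Region-1 selections for Region-3 selections trades audits yielding $255/hour for audits yielding $377/hour on average, per the slide's figures.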
Score vs. “Prior Productive”
• Average tax adjustment for PP audits: $17,700 (median $4,100)
• For the top-ranking 20% of those audits based on the score, the average tax adjustment is $27,000 (median $6,500)

Challenges
• It is much more difficult to predict a “no-change” outcome than the scale of the outcome
• Score relevance and accuracy are limited by prior audit selection criteria (the selection bias problem)
• End-user education: selectors need training to gain confidence in the score
• The score does not replace human judgment; it is just a selection tool

Current Developments
• The score now combines two predictive models:
  • The “Audit Likelihood” model models the selector’s judgment: does the taxpayer fit the “selection profile”?
  • The “Audit Outcome” model predicts the outcome: if selected, what will be the likely tax adjustment?
• The combined score is high if (1) the taxpayer is similar to other previously audited taxpayers, and (2) an audit is likely to result in a large tax adjustment

Ongoing and Future Work
• Audit Select model enhancement:
  • Additional taxpayer factors: IRS data, return amendments, etc.
  • Better modeling algorithms
  • Taxpayer segmentation: models by segment
• Franchise tax scoring model
• Enforcement models
• Tax affinity: non-filers

Summary
• Data mining has great potential in tax compliance:
  • It translates mountains of data into actionable information, predictions, and categorizations
  • It improves the efficiency of resource-intensive tasks, such as auditing
  • It supports decision making and helps uncover more golden needles

Acknowledgments
• Texas Comptroller’s Office, Audit Division: Lisa McCormack, Rose Orozco, Eddie Coats
• Elite Analytics: Dr. Satheesh Ramachandran

Thanks
• Bill Haffey, SPSS Inc. ([email protected])
• Daniele Micci-Barreca, Elite Analytics LLC ([email protected])
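As a closing illustration of the two-model design described under Current Developments: the deck does not say how the "Audit Likelihood" and "Audit Outcome" scores are merged, so the geometric-mean combination below is purely an assumption. Its useful property matches the slide's intent: the combined score is high only when both components are high.

```python
def combined_score(likelihood, outcome_score):
    """Combine the two model outputs into one 1-1000 score.

    likelihood:    probability the taxpayer fits the selection
                   profile, in [0, 1]
    outcome_score: predicted-adjustment score, in [1, 1000]

    Geometric-mean combination (hypothetical): high only when
    BOTH components are high.
    """
    # Rescale the likelihood onto the same 1-1000 range first.
    rescaled = 1 + likelihood * 999
    return round((rescaled * outcome_score) ** 0.5)

high_both   = combined_score(0.9, 900)  # fits profile AND big adjustment
low_like    = combined_score(0.1, 900)  # big adjustment, atypical taxpayer
low_outcome = combined_score(0.9, 50)   # typical taxpayer, small adjustment
```

A taxpayer who is dissimilar to past audit targets, or one unlikely to yield a large adjustment, is pulled down toward the middle of the range even when the other component is strong.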