Analytics: Data Mining for Risk and Compliance
Name of Presenter, Title of Presenter
Analytics, Office of the Chief Knowledge Officer
Version 1.0
31 January 2005
Segment: INTERNAL. Audience: ATO – Various. Date: 4 June 2009

Overview
- Analytics and the Data Mining Process
- Exploring Data
- Supervised Modelling
- Unsupervised Modelling
- Data Matching
- Analytics Project Achievements

Analytics and the Data Mining Process
The Shape and Form of a Data Mining Project

Analytics
- Sits under the Office of the Chief Knowledge Officer, and is part of the EST sub-plan.
- Established as a National capability in 2003.
- Team has been built up to 19 data mining specialists, representing the largest data mining team in Australia.
- Working with up to 60 analysts throughout the organisation to spread the new technology and provide an overarching framework for Risk Management for the ATO.
- The National team works closely with Business Lines both to deliver new risk models and to transfer skills and technology.
- Analytics Community of Practice meets weekly to share experiences and technology, and to peer review modelling across the ATO.

Analytics Functions
- Deploy data mining
  - Working with business lines to deliver new risk models
  - Improved strike rates and more efficient usage of limited resources
- Analytics Community of Practice
  - Weekly meetings and emailing lists to share experiences and to introduce new technologies
- AnalyticsNet Infrastructure
  - New 64-bit hardware to allow our large datasets to be analysed in memory (32GB memory)
  - Sharing of new tools and technology
- Analytics Training
  - Beginning a series of courses introducing data mining
  - A hands-on approach: kick start with own data

Analytics and Traditional Modelling
- Analytics brings a different, but complementary and advanced, approach to modelling and predicting client behaviour.
- Traditional modelling approaches explore client data and couple this with an understanding of financial processes to build mathematical models that simulate these processes, and then identify non-compliance with the models.
- Analytics, using data mining technology, supplements traditional modelling approaches by modelling from the data, using powerful tools to automatically search for interesting, unusual, unexpected patterns that indicate non-compliance: a data driven approach.

Data driven approach
Crucial to have the right data
- Clean
- Relevant
- Before the event
A data mining project is a joint process between the business experts and data miners
- business problem
- business processes
- data

CRISP-DM: The Data Mining Process
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modelling
5. Evaluation
6. Deployment
Source: http://www.crisp-dm.org/Process/index.htm

Applying results of data mining…
1. Apply New Risk Segmentation: Instead of using $ value or market segment as a proxy for risk, identify the actual group and its characteristics.
2. Tune Screening Rules: Adjust screening rules (thresholds, ratios, exceptions) to reflect a better understanding of risk.
3. Optimise a Treatment Strategy: Find the optimal point to maximise revenue collection, while minimising caseload and occurrence of fraud.
4. Optimise Treatment Portfolio: Find the optimal point to maximise revenue collection, while minimising caseload and occurrence of fraud, for the whole of the treatment portfolio.
- Look at adjusting, combining rules. Can be applied straight away.
- Apply risk scores to case selection to get best overall outcomes.
- Create new language and awareness of risk.
- Optimise the treatment mix
Steps 1 to 4 are arranged along a Degree of Sophistication axis.
Optimisation is more than picking the right clients: the right treatment and right work mix also need to be optimised…

Client Scoring for treatment selection…
So we can personalise our treatment strategies to the client
[Figure: a score scale from 0 to 1000 in steps of 50, with a Decision Tree of rules derived from data to assign scores and treatment labels (Letter X, Letter Y, Treatment – Audit, Call, Treatment – Review). Models shown: Decision Tree, Neural Net, Rule Induction, Regression, DM Neural.]
In fact scores are likely to be done via several models 'voting' together (ensembles).

Moving Forward with Analytics
- The low hanging fruit for Data Mining is the large collection of outcomes from audit activity; this has been a primary focus in the first instance.
- It is a more difficult data mining task to identify emerging risks, but technology for identifying emergent patterns is becoming available.
- Text mining and social network analysis will significantly enhance our Intelligence and Risk Modelling capabilities.
- Deployment of Analytics through Operational Analytics
  - How best to deploy Analytics Models: new territory
  - Translate models to SQL or leave in native language (R, SAS, Java)?
  - Computational requirements of SQL over the Data Warehouse

Supervised Modelling
Working From What We Know To Build Models To Automate "Case Selection"

Supervised modelling
Predict some value or outcome having seen a number of training examples
- training data will have a 'target' variable
- prediction can be a continuous variable, or a class
Model 'learns' from training data, and is tested on 'unseen' cases

Effect of Adding More Data: Data is Fundamental
[Figure: two panels, Base Data and Client History, each plotting Revenue, Recall and Precision as Performance (%) against Caseload (%); the panels carry the annotations 4% and 7%.]

New Technologies
- Regression
- Decision Trees
- Random Forests
- Boosted Trees
- Support Vector Machines
- Neural Networks

Unsupervised Modelling
A Data Driven Approach to Identifying, Exploring, Understanding Client Groups

Unsupervised modelling
- A class of problems in which one seeks to determine how the data are organised
- Distinct from supervised modelling in that the data have no 'target' variable
- Seek to summarise and explain key features of the data.
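
As a minimal sketch of unsupervised modelling as described above (no 'target' variable, the aim being to find how the data are organised), the following groups a handful of clients with k-means. K-means is an assumption chosen for illustration; the slides do not name a specific algorithm. The two features, the data values and the starting centres are all hypothetical.

```python
def assign(points, centres):
    """Put each point in the group of its nearest centre (squared Euclidean distance)."""
    groups = [[] for _ in centres]
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
        groups[dists.index(min(dists))].append(p)
    return groups


def kmeans(points, centres, max_iter=100):
    """Plain k-means (Lloyd's algorithm) from explicitly supplied starting centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        groups = assign(points, centres)
        # Move each centre to the mean of its group; an empty group keeps its centre.
        new_centres = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
        if new_centres == centres:  # no centre moved, so the grouping is stable
            break
        centres = new_centres
    return centres, assign(points, centres)


# Hypothetical clients described by two made-up, already scaled features.
# Note that there is no target column: the groups come from the data alone.
clients = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

centres, groups = kmeans(clients, centres=[clients[0], clients[3]])
for centre, members in zip(centres, groups):
    print(centre, members)
```

The algorithm only reports which clients sit together; why each group exists still has to be explained by the business experts.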

Cluster Analysis
- Seeks to identify homogeneous subgroups in a population
- Establish groups and then analyse group membership
- Discovers structures in data without explaining why they exist
- Mostly used when we have no a priori hypotheses and are still in the exploratory phase of our research
- Used to classify large amounts of information into manageable, meaningful piles

Omitted Income: outlier detection
[Figure: plot with points labelled "outlier".]

Self Organising Maps (SOM)
- A self-organising map is a special type of artificial neural network which performs unsupervised competitive learning (Kohonen, 1982)
- Useful for visualising low-dimensional views of high-dimensional data
- Plot the similarities of the data by grouping similar data items together

Debt Behaviour: Self Organising Maps
- Aim: understand the logic and structures that drive taxpayers' compliance behaviour (behavioural archetypes).
- Construct 'psychographic groups' (Wells 1975) by using data mining clusters: each cell in the "map" represents thousands of entities who are similar across many characteristics.
- Identify hot spots which indicate high levels of "activity" associated with different characteristics.
- 6.5 million entities in total population

Text Mining of Complex Documents
- Large collections of documents (unstructured data from multiple sources including source systems, client hard drives and scanned material) need to be reviewed
- Task: systematically sift the required information from the "noise"
- Aim: reduce the time taken to identify those documents that support compliance treatment

Associated Entities
- Identifying and understanding Associated Entities is important in many different Taxation contexts.
- Debt: Linking Associated Entities is important in understanding an entity's Propensity and Capacity to Pay and then in modelling their debt risk. Entities are associated through partnerships, directorships, and consolidated groups where we need to identify the ultimate holding company.
- Lodgment: Analyse lodgment behaviour and risk to revenue by knowing relationships between Associated Entities. Relationships derived from the linkages could be used for identifying "leverage" points for more effective treatment strategies.
- Tool is in the early stages of development.
[Figure: network diagrams at One Degree of Separation and Two Degrees of Separation. Colours: Companies, Government, Individuals, Partnerships, Superannuation, Trusts. Triangle = non lodged; Circle = lodged; Size = Ind … Large.]

Associated Entities
[Figure: network diagrams at One, Two, Three, Four and Five Degrees of Separation. Colours: Companies, Government, Individuals, Partnerships, Superannuation, Trusts. Triangle = non lodged; Circle = lodged; Size = Ind … Large.]

Data Matching
- AUSTRAC
- Internal data

Analytics Project Achievements
Application of Data Mining in the ATO

Data mining at work

Intangible Effect of Data Mining

Other projects
- Ceased Business
- Failure to Lodge
- IT Return Not Necessary
- Propensity to Lodge
- Risk to Revenue – FBT
- Strategy Evaluation and Improvement
- In House Prosecutions
- Risk to Information
- Risk Score
- Associated Entities
- Risk to Reputation

Some New Points of View
Fraud is found at the edge or boundary of pockets of activity rather than among the outliers
[Figure: labels Outliers and Boundary Cases.]

Scattergram (Taylor-Russell Table)
[Figure: scattergram of Aberrant Cases and Acceptable Cases, with a Baseline Separating Aberrant from Acceptable Cases and a Cutoff used by Classifier dividing the plot into True Positives, False Positives, False Negatives and True Negatives.]

Role of Expertise
Need to develop procedures/methods for capturing the knowledge, skills and strategies experts employ to identify non-compliance, or the "smell factor" with cases, and to incorporate these as routines and models in our discovery and detection systems.
Examples include the expertise used for
- Risk Identification
- Feature Selection
- Classifications of Cases
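
The scattergram slide above splits cases into true and false positives and negatives according to the cutoff used by the classifier. A minimal sketch of that bookkeeping follows, assuming risk scores on the 0 to 1000 scale shown on the client scoring slide. The function name `confusion_counts`, the scores and the aberrant/acceptable labels are all hypothetical.

```python
def confusion_counts(scores, aberrant, cutoff):
    """Count outcomes when every case scoring at or above `cutoff` is selected."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, is_aberrant in zip(scores, aberrant):
        selected = score >= cutoff
        if selected and is_aberrant:
            counts["TP"] += 1  # selected and truly aberrant
        elif selected:
            counts["FP"] += 1  # selected but acceptable
        elif is_aberrant:
            counts["FN"] += 1  # aberrant case that was missed
        else:
            counts["TN"] += 1  # acceptable case correctly left alone
    return counts


# Hypothetical risk scores and known outcomes (True = aberrant case).
scores = [950, 900, 700, 650, 400, 300, 250, 100]
aberrant = [True, True, False, True, False, True, False, False]

for cutoff in (800, 500, 200):
    c = confusion_counts(scores, aberrant, cutoff)
    selected = c["TP"] + c["FP"]
    precision = c["TP"] / selected if selected else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"])
    print(cutoff, c, round(precision, 2), round(recall, 2))
```

On this made-up data, lowering the cutoff from 800 to 200 raises recall from 0.5 to 1.0 while precision falls from 1.0 to 0.57, which is the trade-off that moving the cutoff line on the scattergram represents.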