Control Methods Workshop 2010, 13-15 April, Ispra, Italy

Risk based selection for On the Spot Control at Agricultural and Rural Development Agency
Miklós Lelkes
Central Physical Control Department
Agricultural and Rural Development Agency (ARDA), Budapest, Hungary

Selection for physical control
• Start point: 100% of claims
• Selection of control sample (cost effectiveness) (e.g. min. 5% control rate)
  – Random selection: 20-25% (representative overview)
  – Risk analysis: 75-80% (financial risk of EU)
  – Direct selection
• To have effective control methods

Defining the risk factors and weighting
• First year(s)
  – Based on expert appraisal
• Evaluation of the results of the control → update of the selection method
• Changes in the category limits
• Changes in the scoring
• Changes in the weighting

Definition of a risk factor (example)
[Figure: risk score plotted against average parcel size. Small and large cases carry a higher risk.]

Evaluation of the effectiveness of the risk analysis (2007)
• ~241,000 claims
• For > 80 measures
• 4th year of Hungary/ARDA in the EU
→ Huge amount of information in the IACS
→ Need for a special solution for deriving information from the DB
→ Data mining software/technique

The goal of data mining?
Not to find the perfect model for a certain problem → but to find the optimal model for a certain problem, that is:
• Robust
• Generalizes well
• Easy to understand
• Provides insight into drivers of the problem
• Easy to implement

Risk Analysis #1
Create an abstract mathematical model that behaves coherently with regard to risky farmers.
Generalize risk patterns for automatic detection:
  – Identify typical patterns
  – Score applications for probability of non-compliance
[Figure labels: Model 1, Model 2, training, validation]

Risk Analysis #2
• Modeling techniques:
  – Predictive modeling:
    • Decision trees
    • Neural networks
    • Regression
    • Scorecards
• Prerequisites:
  – Confirmed historical cases as input data (non-compliance flag)
• Result: risk score

Data mining / estimation models
[Figure: % of anomaly as a function of sampling rate (with population ordered by decreasing rate of anomaly); average rate of anomalies shown]
Target variable: errors (bad / good)

Data mining with SAS in Hungary
• Pilot in 2007
  – Selection of OTSC sample of SAPS by data mining
• Operational from 2008
  – All area and animal based subsidies
• 2nd year of operational work in 2009
  – Extend use to all measures

Interesting first results (2007)
• Fewer categories in the factors
High risk = low score

Area size           Score
>0.3 and <1.0       -5
>1.0 and <10.0      3
>10.0               -1

Age                 Score
>18 and <40         1
>40 and <60         8
>60

• Some criteria were not relevant!
(but in 2007 they were used because of the regulation)

Operational work from 2008
The Integrated Risk Analysis System includes:
• Interface to
  – IACS System (direct access)
  – Hungarian ovine and caprine I&R System
  – Hungarian bovine I&R System
  – Hungarian porcine I&R System
  – Farmers Registry
• Risk datamart
• Analytical models (approx. 15 analytical models)
  – Area based measures (e.g. SAPS)
  – Agricultural Environment Protection Program based measures
  – Rural Development (investment projects) based measures

Architecture of Integrated Risk Analysis System
[Diagram labels: Source (IACS, the place of operational work / system for transactions; External); Extraction, data quality, integration, transformation; Data warehouse; Datamart (risk management); Utilization (data mining, statistical analysis, web reporting, information for management); uniform handling of metadata]

Models
• Decision tree
  – Easy to understand (if not too complicated)
  – Big groups
• Regression
• Neural network
  – Normally the best prediction result
  – The result cannot be interpreted (“black box”)
• Scorecard

Scorecard
• Easy to work with, can be calculated “by hand”
• Easy to interpret
• Good for non-linear variables (e.g. age)
• Not a problem if the variable has a strange distribution
• Can sometimes result in big groups
• Not a group of factors, but a uniform model!
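To make the “can be calculated by hand” point concrete, here is a minimal Python sketch of scorecard scoring. It is not ARDA's model, which was built in SAS. The points for area size and age are the ones shown in the 2007 example. The names (`SCORECARD`, `risk_score`, `rank_by_risk`), the half-open bin matching, the neutral 0 points for values outside the listed bins (including age above 60, for which no points are listed), and the three sample applications are all assumptions made for illustration.

```python
import math

# Bins are (low, high, points), matched as low < value <= high (an assumption;
# the slide writes strict inequalities on both sides).
# Points for area size and age follow the 2007 example; the rest is illustrative.
SCORECARD = {
    "area_size": [(0.3, 1.0, -5), (1.0, 10.0, 3), (10.0, math.inf, -1)],
    "age": [(18, 40, 1), (40, 60, 8)],  # no points are listed for age > 60
}


def factor_points(bins, value):
    """Return the points of the bin containing value, or 0 if no bin matches."""
    for low, high, points in bins:
        if low < value <= high:
            return points
    return 0  # assumption: values not covered by any bin are treated as neutral


def risk_score(application, scorecard=SCORECARD):
    """Sum the points of every factor. A low total means high risk."""
    return sum(
        factor_points(bins, application[name]) for name, bins in scorecard.items()
    )


def rank_by_risk(applications):
    """Order applications from highest risk (lowest score) to lowest risk."""
    return sorted(applications, key=risk_score)


# Hypothetical applications, invented for this sketch.
applications = [
    {"id": "A", "area_size": 0.5, "age": 35},   # -5 + 1 = -4
    {"id": "B", "area_size": 5.0, "age": 50},   # 3 + 8 = 11
    {"id": "C", "area_size": 25.0, "age": 70},  # -1 + 0 = -1
]

for app in rank_by_risk(applications):
    print(app["id"], risk_score(app))
```

Because the score is a plain sum of per-category points, each application's total can be traced back to the categories it fell into, which is what makes a scorecard easy to interpret and to audit.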
Data Mining Model Training and Scoring
Target variable
• Converting a complex control result into a binary format (black or white)
• Should be defined by the PA
[Diagram labels: Analysis, Score Rules (Score Code), Scoring, Prediction]

Operational work from 2008
The Integrated Risk Analysis System includes (cont.):
  – Scoring lists
  – Evaluation statistics (random and risk based)
  – OLAP reports
  – Ad-hoc reporting and analysis interface
SAS® Enterprise Guide, SAS® Enterprise Miner 5

Results
• Analysis of results (e.g. interpretation of scorecards)
• Model documentation
  – Automatic report in Enterprise Miner
  – Auditable documentation of data mining process
• Documentation of all selection procedures
• Review of model quality
• Statistics for EU (OLAP cubes)

Risk factors found relevant by data mining (2007, 2008, 2009)
Factors found relevant in at least one of the three years:
• Total amount of application
• Total area
• Change in the area
• Number of physical blocks (in 2007, in 2006)
• Number of parcels
• Number of different land use types
• Average parcel size
• Underclaimed reference parcel
• Underclaimed reference parcel in the previous year
• Change in number of parcels from previous year
• Change in number of parcels from 2005 to 2006
• Change in number of parcels from 2004 to 2005
• Percentage of risky parcels in the application (close to minimum parcel size)
• Joint cultivation
• Grassland in the application
• Percentage of orchard and vineyard
• Compactness of the farm
• Change of application complexity
• Result of earlier controls
• TOP-UP crops in the application
• Gender of applicant
• Age of applicant
• Other area based applications

Results in terms of efficiency
• Cumulative lift: 1.5-4 times higher compared to random sample
  – Fine tuning of criteria and weighting (0.2-3 lift increase)
  – Selection of variables (0.1-1 lift increase)
  – Proposal of new variables to include (0.1-0.8 lift increase)
  – Global optimization of cross-validation effects (0.1-0.5 lift increase)
[Figure: Lift in SAPS (total), 2006 to 2010, vertical axis from 1 to 1.9. Lift = ratio of anomalies in risk sample over random sample.]

Some objectives found by OTSC
• The quality of applications could be better → level of irregularities in the random sample still high? (although it is better from year to year)
  – Higher control rate? → better risk assessment (in terms of % of anomaly)!
• Rate of irregularities in the classical field inspection sample is significantly higher than in the RS sample (research needed)
  – Difference in technique
  – Difference in selection / population

Sliding window
Blocks → farmers → risk calculation per window, other info

Various shapes
30 x 30, 30 x 42, 10 x 30, 30 x 10, 10 x 10…

Efficiency
Net over gross ratio constraint may be opposite to risk?

Conclusion for the effective risk analysis
• It is worth using a data mining solution
• Yes, we need RS, but the level of anomalies in the RS sample is a question
• Better not to use all farms in the RS zone
  – 20-25% use of VHR?
  – Site shape?
  – Quota for MS?
• Cost of a data mining solution?
  – Expensive
  – But cheaper than a flat rate correction!

Thank you for your attention!
Agricultural and Rural Development Agency
H-1095 Soroksári út 22-24.
www.mvh.gov.hu
Central Physical Control Department
Tel.: +36 1 301-2409
Fax: +36 1 301-2444
E-mail: [email protected]