Control Methods Workshop 2010
13-15 April
Ispra, Italy
Risk-based selection for On-the-Spot Control at the Agricultural and Rural Development Agency
Miklós Lelkes
Central Physical Control Department
Agricultural and Rural Development Agency (ARDA)
Budapest, Hungary
Selection for physical control
• Starting point: 100% of claims
• Selection of the control sample (cost effectiveness; e.g. minimum 5% control rate):
– Random selection: 20-25% (representative overview)
– Risk analysis: 75-80% (financial risk to the EU)
– Direct selection
• Goal: effective control methods
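The allocation above can be sketched in Python. This is a minimal illustration, not ARDA's procedure: it assumes a 5% overall control rate, a 25% random share (rounded down), and claims given as hypothetical (claim_id, risk_score) pairs where a higher score means higher risk; direct selection is left out.

```python
import math
import random


def select_control_sample(claims, control_rate=0.05, random_share=0.25, seed=2010):
    """Split an on-the-spot control sample into a random part and a risk-based part.

    claims: list of (claim_id, risk_score) tuples (hypothetical structure;
    higher score = higher risk). Returns (random_ids, risk_ids).
    """
    sample_size = math.ceil(len(claims) * control_rate)
    n_random = int(sample_size * random_share)  # rounded down
    n_risk = sample_size - n_random

    # Random part first: it gives the representative overview.
    rng = random.Random(seed)
    random_part = rng.sample(claims, n_random)
    drawn = {claim_id for claim_id, _ in random_part}

    # Risk-based part: highest scores among the claims not drawn at random.
    remaining = [c for c in claims if c[0] not in drawn]
    remaining.sort(key=lambda c: c[1], reverse=True)
    risk_part = remaining[:n_risk]

    return [cid for cid, _ in random_part], [cid for cid, _ in risk_part]
```

Drawing the random part first keeps it unbiased by the model; the risk-based part then never duplicates a randomly drawn claim.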
Defining the risk factors and weighting
• First year(s)
– Based on expert appraisal
• Evaluation of the control results → update of the selection method:
– Changes in the category limits
– Changes in the scoring
– Changes in the weighting
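A simple way to evaluate control results per category, before adjusting limits, scores or weights, is to compare each category's anomaly rate with the overall rate. The sketch assumes a hypothetical list of (category, anomaly_flag) pairs from controlled claims; the ratios it returns are an input to expert review, not an automatic re-weighting.

```python
from collections import defaultdict


def anomaly_rate_by_category(controlled):
    """controlled: iterable of (category, anomaly_flag) pairs, flag is 0 or 1.

    Returns {category: (claims_controlled, anomaly_rate, ratio_to_overall_rate)}.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [controlled, anomalies]
    for category, flag in controlled:
        counts[category][0] += 1
        counts[category][1] += flag

    total = sum(n for n, _ in counts.values())
    overall = sum(a for _, a in counts.values()) / total

    result = {}
    for category, (n, a) in counts.items():
        rate = a / n
        # A ratio above 1 means the category was riskier than average in the controls.
        ratio = rate / overall if overall else float("nan")
        result[category] = (n, rate, ratio)
    return result
```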
Definition of a risk factor (example)
[Figure: risk score by average parcel size; both very small and very large cases carry a higher risk]
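A banded factor like this one can be sketched as a lookup from category limits to scores. The limits and scores below are hypothetical, chosen only to reproduce the shape described (small and large cases riskier); here a higher score means higher risk.

```python
import bisect

# Hypothetical category limits for average parcel size (ha) and the score of
# each resulting band; higher score = higher risk in this sketch.
LIMITS = [0.5, 2.0, 10.0, 50.0]
SCORES = [10, 4, 1, 4, 10]  # one score per band: len(SCORES) == len(LIMITS) + 1


def parcel_size_risk(avg_parcel_ha: float) -> int:
    """Return the risk score of the band that contains avg_parcel_ha."""
    # bisect_right places a value equal to a limit in the upper band.
    band = bisect.bisect_right(LIMITS, avg_parcel_ha)
    return SCORES[band]
```

Changing the category limits or the scores then only means editing the two lists.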
Evaluation of the effectiveness of the risk analysis (2007)
• ~241,000 claims
• More than 80 measures
• 4th year of Hungary/ARDA in the EU
→ Huge amount of information in the IACS
→ Need for a special solution to derive information from the database
→ Data mining software/technique
The goal of data mining
→ Not to find the perfect model for a certain problem,
→ but to find the optimal model for a certain problem, one that is:
• Robust
• Generalizes well
• Easy to understand
• Provides insight into the drivers of the problem
• Easy to implement
Risk Analysis #1
• Create an abstract mathematical model that behaves coherently with regard to risky farmers.
• Generalize risk patterns for automatic detection:
- Identify typical patterns
- Score applications for probability of noncompliance
[Figure: Model 1, Model 2; training and validation data]
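The training/validation idea in the figure can be sketched as a holdout check: candidate models are compared on a training part, and the chosen one is re-measured on a validation part it was not selected on. The record fields, the two scoring rules and the top-20% hit rate used as yardstick are hypothetical.

```python
import random


def train_validation_split(records, validation_share=0.3, seed=7):
    """Shuffle a copy of the records and split off a validation part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * validation_share)
    return shuffled[n_valid:], shuffled[:n_valid]


def top_hit_rate(score, records, top_share=0.2):
    """Share of non-compliant cases among the highest-scored records."""
    ranked = sorted(records, key=score, reverse=True)
    n_top = max(1, round(len(ranked) * top_share))
    return sum(r["noncompliant"] for r in ranked[:n_top]) / n_top


def choose_model(models, records):
    """Pick the model with the best hit rate on training data, then re-measure it
    on validation data; a large drop suggests it does not generalize well."""
    train, valid = train_validation_split(records)
    best = max(models, key=lambda m: top_hit_rate(m, train))
    return best, top_hit_rate(best, train), top_hit_rate(best, valid)


# Two hypothetical candidate models: plain scoring rules on one feature each.
def model_1(record):
    return record["parcels"]


def model_2(record):
    return record["area_change"]
```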
Risk Analysis #2
• Modeling techniques:
- Predictive modeling:
• Decision trees
• Neural networks
• Regression
• Scorecards
• Prerequisites:
- Confirmed historical cases as input data (non-compliance flag)
• Result: risk score
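Of the listed techniques, regression is the easiest to sketch without a library: a logistic regression fitted by plain gradient descent on historical controlled cases (non-compliance flag 0/1) returns a probability that can serve as the risk score. The features, learning rate and epoch count are assumptions; the models in this presentation were built in SAS Enterprise Miner, not with code like this.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit weights and bias by batch gradient descent on the log-loss.

    X: list of feature vectors (historical controlled claims).
    y: list of non-compliance flags (1 = non-compliant, 0 = compliant).
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_score(w, b, x) -> float:
    """Estimated probability of non-compliance for one application."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```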
Data mining / estimation models
[Figure: % of anomalies found as a function of the sampling rate (population ordered by decreasing rate of anomaly), compared with the average rate of anomalies]
Target variable: errors (bad / good)
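A curve like the one in the figure can be computed by sorting claims by model score and accumulating the share of all anomalies found as the sampling rate grows; a random sample finds anomalies only at the average rate, so its curve is the diagonal. A minimal sketch, assuming each controlled claim is a (score, bad_flag) pair and at least one claim is bad:

```python
def cumulative_capture(scored, steps=10):
    """Return (sampling_rate, share_of_all_anomalies_found) pairs.

    scored: list of (score, bad_flag) with bad_flag 1 for an error, 0 otherwise.
    The population is ordered by decreasing score, i.e. by decreasing
    expected rate of anomaly. Assumes at least one bad case.
    """
    ordered = sorted(scored, key=lambda s: s[0], reverse=True)
    total_bad = sum(flag for _, flag in ordered)
    curve = []
    for step in range(1, steps + 1):
        n = round(len(ordered) * step / steps)
        found = sum(flag for _, flag in ordered[:n])
        curve.append((step / steps, found / total_bad))
    return curve
```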
Data mining with SAS in Hungary
• Pilot in 2007
– Selection of the OTSC sample of SAPS by data mining
• Operational from 2008
– All area- and animal-based subsidies
• 2nd year of operational work in 2009
– Extending use to all measures
Interesting first results (2007)
• Fewer categories in the factors
• High risk = low score
Area size           Score
>0.3 and <1.0        -5
>1.0 and <10.0        3
>10.0                -1
Age                 Score
>18 and <40           1
>40 and <60           8
>60                   3
• Some criteria were not relevant!
(but in 2007 they were used because of the regulation)
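Because the scorecard is additive, the 2007 rows can be applied by hand or in a few lines of code. The sketch covers only the area-size and age factors. The slide does not say how values exactly on a limit (e.g. 1.0 or age 40) are scored, so they return None here; the score of 3 for ages above 60 is read from the slide layout and is an assumption.

```python
def band_score(value, bands):
    """Return the score of the first band whose open interval (low, high) contains value."""
    for low, high, score in bands:
        if low < value < high:
            return score
    return None  # value on a limit or outside all bands: not covered by the slide


AREA_BANDS = [(0.3, 1.0, -5), (1.0, 10.0, 3), (10.0, float("inf"), -1)]
AGE_BANDS = [(18, 40, 1), (40, 60, 8), (60, float("inf"), 3)]  # >60 score assumed to be 3


def scorecard_total(area, age):
    """Sum of factor scores; in this scorecard a LOWER total means HIGHER risk."""
    parts = [band_score(area, AREA_BANDS), band_score(age, AGE_BANDS)]
    if None in parts:
        return None
    return sum(parts)
```

Claims can then be sorted by ascending total so that the riskiest applications come first.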
Operational work from 2008
• The Integrated Risk Analysis System includes:
• Interfaces to:
– IACS system (direct access)
– Hungarian ovine and caprine I&R system
– Hungarian bovine I&R system
– Hungarian porcine I&R system
– Farmers Registry
• Risk datamart
• Analytical models (approx. 15 analytical models):
– Area-based measures (e.g. SAPS)
– Agricultural Environment Protection Program based measures
– Rural Development (investment projects) based measures
Architecture of the Integrated Risk Analysis System
[Diagram components: Source (IACS, external); Extraction (data quality, integration, transformation); Datawarehouse; Risk management datamart; Utilization (data mining, statistical analysis, web reporting, information for management); Place of operational work (system for transactions); uniform handling of metadata]
Models
• Decision tree
– Easy to understand (if not too complicated)
– Big groups
• Regression
• Neural network
– Normally the best prediction result
– The result cannot be interpreted ("black box")
• Scorecard
Scorecard
• Easy to work with; can be calculated "by hand"
• Easy to interpret
• Good for non-linear variables (e.g. age)
• Not a problem if the variable has a strange distribution
• Can sometimes result in big groups
• Not a group of factors, but a uniform model!
Data mining: model training and scoring
[Diagram: target variable, analysis, score rules (score code), scoring, prediction]
Target variable
• Converting a complex control result into a binary format (black or white)
• Should be defined by the PA (paying agency)
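Turning a complex control result into a black-or-white target could look like the sketch below. Everything in it is a hypothetical placeholder: the fields (declared vs. determined area, count of other findings) are assumptions about what a control result holds, and the 3% tolerance is illustrative only, since the slide leaves the definition to the paying agency.

```python
from dataclasses import dataclass


@dataclass
class ControlResult:
    declared_ha: float       # area claimed by the farmer (hypothetical field)
    determined_ha: float     # area found at the on-the-spot check (hypothetical field)
    other_findings: int = 0  # count of non-area irregularities (hypothetical field)


# Hypothetical PA-defined tolerance for over-declaration, in percent.
AREA_TOLERANCE_PCT = 3.0


def target_flag(result: ControlResult) -> int:
    """Return 1 ("bad") if the control found a relevant irregularity, else 0 ("good")."""
    over_ha = result.declared_ha - result.determined_ha
    if result.determined_ha:
        over_pct = 100.0 * over_ha / result.determined_ha
    else:
        over_pct = float("inf")
    over_declared = over_ha > 0 and over_pct > AREA_TOLERANCE_PCT
    return int(over_declared or result.other_findings > 0)
```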
Operational work from 2008
• The Integrated Risk Analysis System includes (cont.):
– Scoring lists
– Evaluation statistics (random and risk-based)
– OLAP reports
– Ad-hoc reporting and analysis interface
Software: SAS® Enterprise Guide; SAS® Enterprise Miner 5
Results
• Analysis of results (e.g. interpretation of scorecards)
• Model documentation
– Automatic report in Enterprise Miner
– Auditable documentation of the data mining process
• Documentation of all selection procedures
• Review of model quality
• Statistics for the EU (OLAP cubes)
Risk factors found relevant by data mining (2007, 2008, 2009)
[Table: factors selected by data mining in 2007, 2008 and 2009. Factors named:]
• Total amount of application
• Total area
• Change in the area
• Number of physical blocks (also: in 2007; in 2006)
• Number of parcels
• Number of different land use types
• Average parcel size
• Underclaimed reference parcel (also: in the previous year)
• Change in number of parcels from the previous year (also: from 2005 to 2006; from 2004 to 2005)
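Several factors in the table are year-over-year changes, which need the applicant's previous application. A minimal sketch with hypothetical field names ("total_area_ha", "n_parcels"), assuming every application has at least one parcel:

```python
def change_features(current, previous):
    """Year-over-year change features for one applicant.

    current / previous: dicts with hypothetical keys "total_area_ha" and
    "n_parcels"; previous is None for a first-time applicant.
    """
    features = {
        "total_area_ha": current["total_area_ha"],
        "n_parcels": current["n_parcels"],
        "avg_parcel_ha": current["total_area_ha"] / current["n_parcels"],
    }
    if previous is None:
        # No earlier application: the change factors are undefined.
        features["area_change_ha"] = None
        features["parcel_count_change"] = None
    else:
        features["area_change_ha"] = current["total_area_ha"] - previous["total_area_ha"]
        features["parcel_count_change"] = current["n_parcels"] - previous["n_parcels"]
    return features
```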
Risk factors found relevant by data mining (cont.; 2007, 2008, 2009)
[Table continued. Factors named:]
• Percentage of risky parcels in the application (close to minimum parcel size)
• Joint cultivation
• Grassland in the application
• Percentage of orchard and vineyard
• Compactness of the farm
• Change of application complexity
• Result of earlier controls
• TOP-UP crops in the application
• Gender of applicant
• Age of applicant
• Other area-based applications
Results in terms of efficiency
• Cumulative lift: 1.5-4 times higher compared to a random sample
(Lift = ratio of anomalies in the risk sample over the random sample)
[Chart: lift in SAPS (total), 2006-2010; vertical axis from 1 to 1.9]
– Fine tuning of criteria and weighting (0.2-3 lift increase)
– Selection of variables (0.1-1 lift increase)
– Proposal of new variables to include (0.1-0.8 lift increase)
– Global optimization of cross-validation effects (0.1-0.5 lift increase)
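The lift definition on this slide translates directly into code. A minimal sketch, assuming the control outcomes of both samples are available as lists of 0/1 anomaly flags and the random sample contains at least one anomaly:

```python
def lift(risk_sample_flags, random_sample_flags):
    """Lift = anomaly rate in the risk-based sample / anomaly rate in the random sample.

    Each argument is a list of 0/1 anomaly flags from on-the-spot controls.
    """
    risk_rate = sum(risk_sample_flags) / len(risk_sample_flags)
    random_rate = sum(random_sample_flags) / len(random_sample_flags)
    return risk_rate / random_rate
```

A lift of 2 means the risk-based sample found anomalies at twice the rate of the random sample.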
Some findings from the OTSC
• The quality of applications could be better
→ the level of irregularities in the random sample is still high? (although it improves from year to year)
– Higher control rate? → better risk assessment (in terms of % of anomaly)!
• The rate of irregularities in the classical field inspection sample is significantly higher than in the RS sample (research needed):
– Difference in technique
– Difference in selection / population
Sliding window
Blocks → farmers → risk calculation per window, other info
Various shapes
30 x 30, 30 x 42, 10 x 30, 30 x 10, 10 x 10…
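One possible reading of the sliding-window idea: map farmers' risk onto a grid of block cells, slide a window of a given shape over the grid, and keep the position with the highest summed risk as a candidate RS site. The grid, the cell units and the summed-risk criterion are assumptions; a 2D prefix sum makes each window total a constant-time lookup.

```python
def best_window(grid, win_h, win_w):
    """Find the win_h x win_w window with the highest total risk.

    grid: list of rows of per-cell risk values (e.g. summed farmer risk per block cell).
    Returns (top_row, left_col, total_risk), or None if the window does not fit.
    """
    rows, cols = len(grid), len(grid[0])
    # prefix[r][c] = sum of all cells above row r and left of column c.
    prefix = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            prefix[r + 1][c + 1] = grid[r][c] + prefix[r][c + 1] + prefix[r + 1][c] - prefix[r][c]

    best = None
    for top in range(rows - win_h + 1):
        for left in range(cols - win_w + 1):
            bottom, right = top + win_h, left + win_w
            total = prefix[bottom][right] - prefix[top][right] - prefix[bottom][left] + prefix[top][left]
            if best is None or total > best[2]:
                best = (top, left, total)
    return best
```

Different site shapes can be compared by calling best_window once per (height, width) pair, provided the grid cells use the same units as the shapes.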
Efficiency
The net-over-gross ratio constraint may conflict with risk-based selection?
Conclusion for effective risk analysis
• It is worth using a data mining solution
• Yes, we need RS, but the level of anomalies in the RS sample is an open question
• Better not to use all farms in the RS zone
– 20-25% use of VHR?
– Site shape?
– Quota for MS?
• Cost of a data mining solution?
– Expensive
– But cheaper than a flat-rate correction!
Thank you for your attention!
Agricultural and Rural Development Agency
H-1095 Soroksári út 22-24.
www.mvh.gov.hu
Central Physical Control Department
Tel.: +36 1 301-2409, Fax: +36 1 301-2444