Bridging the Academic–Practitioner Divide in Credit Risk Modeling
Vadim Melnitchouk, Metropolitan State University, Saint Paul, MN, US
Agenda
1. Academic model selection by a practitioner and organizational issues
2. 'Optimal complexity model': stochastic parametric method with macroeconomic variables and unobserved consumer heterogeneity
3. Data access, collaboration & prototype development
Who is a practitioner?
1. Ph.D. in applied math, former academic, teaching 'Data Mining' part-time.
2. Ph.D. in physics, former academic.
3. M.S. in OR, former 'Fed' examiner.
4. M.S. in Econometrics.
A practitioner's search for the right academic paper/model
Paper/Methodology | Potential Business Impact | Organizational Issue
Andreeva, Ansell & Crook, 'Modeling Profitability using Survival Combination Scores' | Increase profitability | How to get the CRO and CMO to agree on the same KPIs?
Bellotti & Crook, 'Forecasting and Stress Testing Credit Card Default..' | More accurate estimation of unexpected losses; economic capital reduction | US banks are getting a stress test scenario from regulators
Fader & Hardie, 'Customer-Base Analysis with Discrete-Time …' | Increase sales, prevent customer attrition | Was implemented at GE Money in 2008-2009
Fader & Hardie, 'Customer-Base Analysis with Discrete-Time …' | Reduce losses | Cultural resistance
Leow & Crook, 'Intensity Models and Transition Probabilities' | Reduce losses | Feasible, but an optimal complexity model is required
Time to Default: Optimal complexity model
1. According to Bellotti & Crook (2007), survival (hazard) modeling is a competitive alternative to logistic regression for predicting default events.
2. The method has become the model of choice in recent publications, but its complexity makes the technique impractical for practitioners.
3. It also has some limitations. Bellotti (2010) notes that ‘any credit risk model with macroeconomic variables can’t be expected to capture the direct reason for default like a loss of job, negative equity or a sudden personal crisis such as sickness or divorce’.
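The link between the two approaches can be made concrete. A discrete-time hazard model is commonly estimated as a logistic regression on loan-month ("person-period") records. The sketch below assumes a hypothetical input format of `(loan_id, months_observed, defaulted)` and uses hypothetical helper names. It builds those records and the empirical monthly hazard they imply. This is an illustration only, not the estimation procedure of the cited papers.

```python
def person_period(loans):
    """Expand (loan_id, months_observed, defaulted) tuples into loan-month rows.

    Each row is (loan_id, month, event). `event` is 1 only in the month of default;
    censored (non-defaulted) loans contribute all-zero rows.
    """
    rows = []
    for loan_id, months, defaulted in loans:
        for m in range(1, months + 1):
            rows.append((loan_id, m, int(defaulted and m == months)))
    return rows


def empirical_hazard(rows):
    """Monthly hazard h(m) = defaults in month m / loans still at risk in month m."""
    at_risk, events = {}, {}
    for _, m, event in rows:
        at_risk[m] = at_risk.get(m, 0) + 1
        events[m] = events.get(m, 0) + event
    return {m: events[m] / at_risk[m] for m in sorted(at_risk)}
```

Fitting a logistic regression of `event` on month indicators (and, for example, macroeconomic covariates) to these rows gives a discrete-time hazard model. With one indicator per month and no other covariates, its fitted probabilities reproduce this empirical hazard for months where the hazard lies strictly between 0 and 1.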
Methodology
• The goal of this paper is to present a more practical method that can also take unobserved obligor heterogeneity into account.
• The stochastic parametric time-to-event method is well known in marketing (Hardie & Fader, 2001).
• Brusilovskiy (2005) also applied it to predict the time of the first home purchase by immigrants.
• As far as we know, the method has not been used in credit risk by academics or practitioners.
Assumptions & inputs
1. Time to default follows a Weibull distribution (Appendix).
2. The default rate varies across obligors according to a Gamma distribution (to capture unobserved consumer heterogeneity).
3. Modeling is done at the aggregate vintage level to avoid the so-called aggregation bias when unemployment is used.
Inputs:
1. Monthly number of defaults.
2. Time-varying covariates: unemployment and the Home Price Index (HPI). The macroeconomic factors are incorporated into the hazard rate function.
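For concreteness, a standard discrete-time Weibull-gamma formulation from the marketing literature is written out below. The exact covariate formula used here is given in the Appendix (not reproduced in this section), so it is an assumption that it has this form. U_j and H_j denote unemployment and HPI in month j, β₁ and β₂ are their coefficients, and N is the number of loans in the vintage. Together with r, α and c, this gives five parameters.

```latex
% Individual obligor: Weibull baseline with time-varying covariates
S(t \mid \lambda) = \exp\!\big(-\lambda\,A(t)\big), \qquad
A(t) = \sum_{j=1}^{t} \big[\, j^{c} - (j-1)^{c} \,\big]\, e^{\beta_1 U_j + \beta_2 H_j}

% Heterogeneity across obligors: \lambda \sim \mathrm{Gamma}(r, \alpha), shape r, rate \alpha
S(t) = \mathbb{E}_{\lambda}\big[S(t \mid \lambda)\big] = \left(\frac{\alpha}{\alpha + A(t)}\right)^{r}

% Expected defaults in month t and the aggregate log-likelihood over T months
\mathbb{E}[D_t] = N\,\big[S(t-1) - S(t)\big], \qquad
LL = \sum_{t=1}^{T} D_t \ln\!\big[S(t-1) - S(t)\big] + \Big(N - \sum_{t=1}^{T} D_t\Big) \ln S(T)
```

With β₁ = β₂ = 0 the sum telescopes to A(t) = t^c, and S(t) reduces to the plain Weibull-gamma survival function (α/(α + t^c))^r.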
Recent trends in mortgage default rates & data
1. Default rates spiked above historical trends in 2005, and more significantly in 2006 and 2007, beginning almost immediately after origination.
2. The average time to reach the maximum default rate decreased from 5-6 years (vintages 2001-2004) to 2-3 years (vintages 2005-2007).
3. LPS data on prime, first-lien, fixed-rate 30-year mortgages originated in 2006 were used to build the model (Schelkle, 2011).
Model training and out-of-time validation
1. The model training period for vintage 2006 was June 2006 – March 2009.
2. April 2009 to March 2010 was selected as the out-of-time validation period because unemployment increased from 8.5% to 10.1% during this period.
3. The model was implemented in MS Excel (using Solver) and in SAS/IML. The values of the five parameters were obtained by maximum likelihood estimation.
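A minimal Python sketch of the same workflow is shown below: fit five parameters by maximum likelihood on the training months, then forecast the out-of-time months. It rests on several assumptions. It uses the standard discrete-time Weibull-gamma form with time-varying covariates from the marketing literature, S(t) = (α/(α + A(t)))^r with A(t) = Σ_j [j^c − (j−1)^c]·exp(β₁U_j + β₂H_j). It treats the vintage as a single cohort whose month 1 is the first training month. It assumes the covariates are already standardized. All function names are hypothetical, and the small pattern search only stands in for Excel Solver or the SAS/IML optimizers.

```python
import math


def survival_curve(params, covariates):
    """Assumed Weibull-gamma survival [S(0), ..., S(T)] with time-varying covariates.

    params = (r, alpha, c, b_unemp, b_hpi); covariates[j-1] = (unemployment, hpi) in month j.
    """
    r, alpha, c, b_unemp, b_hpi = params
    curve, cum = [1.0], 0.0
    for j, (unemp, hpi) in enumerate(covariates, start=1):
        cum += (j ** c - (j - 1) ** c) * math.exp(b_unemp * unemp + b_hpi * hpi)
        curve.append((alpha / (alpha + cum)) ** r)
    return curve


def expected_defaults(params, covariates, n_loans):
    """Expected monthly defaults N * [S(t-1) - S(t)] for each month in `covariates`."""
    s = survival_curve(params, covariates)
    return [n_loans * (s[t - 1] - s[t]) for t in range(1, len(s))]


def neg_log_lik(params, defaults, covariates, n_loans):
    """Negative log-likelihood of monthly default counts; survivors are censored at month T."""
    s = survival_curve(params, covariates[:len(defaults)])
    ll = sum(d * math.log(s[t - 1] - s[t]) for t, d in enumerate(defaults, start=1))
    return -(ll + (n_loans - sum(defaults)) * math.log(s[-1]))


def pattern_search(f, x0, step=0.5, tol=1e-4, max_iter=5000):
    """Minimal coordinate pattern search (a stand-in for Solver / SAS/IML optimizers)."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = x[:i] + [x[i] + delta] + x[i + 1:]
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step /= 2          # no coordinate move helped: refine the step
            if step < tol:
                break
    return x, fx


def fit(defaults, covariates, n_loans, start=(1.0, 10.0, 1.0, 0.0, 0.0)):
    """MLE of (r, alpha, c, b_unemp, b_hpi); r, alpha, c are searched on the log scale."""
    def objective(z):
        params = (math.exp(z[0]), math.exp(z[1]), math.exp(z[2]), z[3], z[4])
        try:
            return neg_log_lik(params, defaults, covariates, n_loans)
        except (ValueError, OverflowError, ZeroDivisionError):
            return math.inf    # numerically invalid region, e.g. log(0) or overflow
    z0 = [math.log(start[0]), math.log(start[1]), math.log(start[2]), start[3], start[4]]
    z, nll = pattern_search(objective, z0)
    return (math.exp(z[0]), math.exp(z[1]), math.exp(z[2]), z[3], z[4]), nll
```

With 34 training months (June 2006 – March 2009) followed by 12 validation months, the call would be `params, _ = fit(train_defaults, covariates[:34], n_loans)`. The out-of-time forecast is then `expected_defaults(params, covariates, n_loans)[34:]`, computed with the observed unemployment and HPI values for April 2009 – March 2010. Working on the log scale keeps r, α and c positive without constrained optimization.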
[Figure: Forecasted vs. actual monthly number of defaults (series: Pred, Act), Weibull/Gamma model for the 2006 mortgage origination year (LPS data, vintage 2006)]
Results & Discussion
The forecast accuracy for the out-of-time period is acceptable (low forecast error and a conservative estimate for regulators).
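The slide does not name its error metric. As an assumed illustration, mean absolute percentage error (MAPE) and a simple conservatism check could be computed as below. The check passes when total predicted defaults over the out-of-time window are at least equal to total actual defaults. The function name and the choice of metrics are hypothetical.

```python
def forecast_summary(predicted, actual):
    """Return (MAPE in %, conservative flag) for an out-of-time window.

    `conservative` is True when total predicted defaults are not below total
    actual defaults, i.e. the forecast does not understate losses.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero or negative")
    mape = 100.0 * sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)
    conservative = sum(predicted) >= sum(actual)
    return mape, conservative
```

A monthly comparison would call this on the 12 predicted and actual counts for April 2009 – March 2010. A cumulative-loss view could instead be checked month by month on running totals.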
Issues with the one-segment model:
1. The time-varying covariates formula is taken from a marketing application and is not flexible enough for credit risk modeling (Appendix).
2. The impact of unemployment and HPI can be double counted.
Next steps in collaboration with academics
1. Bayesian parameter estimation was applied in collaboration with Prof. Shemyakin (St. Thomas University, St. Paul, MN) and his students to improve numerical stability.
• A two-segment latent class Weibull model (Appendix) was also used to estimate the parameters of a consumer segment whose default hazard increases over time.
• Unemployment and HPI were not included, to avoid double counting (the academic's preference).
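The slides do not say which Bayesian scheme was used. One common choice is a random-walk Metropolis sampler. The sketch below applies it to the Weibull-gamma model without covariates, S(t) = (α/(α + t^c))^r, in line with leaving out unemployment and HPI. The independent normal priors on the log-parameters, the step size, the iteration count and all function names are assumptions.

```python
import math
import random


def weibull_gamma_nll(r, alpha, c, defaults, n_loans):
    """Negative log-likelihood of monthly default counts with S(t) = (alpha / (alpha + t**c))**r."""
    T = len(defaults)
    s = [(alpha / (alpha + t ** c)) ** r for t in range(T + 1)]
    ll = sum(d * math.log(s[t - 1] - s[t]) for t, d in enumerate(defaults, start=1))
    return -(ll + (n_loans - sum(defaults)) * math.log(s[T]))


def log_posterior(theta, defaults, n_loans, prior_sd=10.0):
    """theta = (log r, log alpha, log c) with independent N(0, prior_sd^2) priors (assumed)."""
    try:
        r, alpha, c = (math.exp(v) for v in theta)
        nll = weibull_gamma_nll(r, alpha, c, defaults, n_loans)
    except (ValueError, OverflowError, ZeroDivisionError):
        return -math.inf   # numerically invalid region
    return -nll - sum(v * v for v in theta) / (2 * prior_sd ** 2)


def metropolis(defaults, n_loans, theta0, n_iter=20_000, step=0.02, seed=1):
    """Random-walk Metropolis on the log-parameters; returns (r, alpha, c) draws and acceptance rate."""
    rng = random.Random(seed)
    theta = list(theta0)
    lp = log_posterior(theta, defaults, n_loans)
    if not math.isfinite(lp):
        raise ValueError("starting point has zero posterior density")
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [v + rng.gauss(0.0, step) for v in theta]
        lp_new = log_posterior(proposal, defaults, n_loans)
        # Accept with probability min(1, exp(lp_new - lp)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
            accepted += 1
        draws.append(tuple(math.exp(v) for v in theta))
    return draws, accepted / n_iter
```

Posterior means over the draws, after discarding a burn-in period, can replace the point MLEs. The step size should be tuned so that the acceptance rate is neither near 0 nor near 1. Sampling on the log scale sidesteps the positivity constraints that can make direct likelihood maximization numerically fragile.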
Data access and three levels of collaboration
Collaboration Level | Execution by | Academic's Motivation | Practitioner's Motivation | Data Access | Academic Partner
'Looking over your shoulder' | Practitioner | Marketing and validation | Apply new method (professional growth) | N/A | Prof. Fader & Prof. Hardie
Joint supervision | Student | Real-life project for a student | Additional validation & enhancement | Vintage aggregated data only | Prof. Shemyakin, June 2012
Bridging the Academic–Practitioner Divide | Academic and practitioner | ? | Resolve real issues like wrong signs in multinomial regression coefficients | Aggregated by delinquency status | ?
Data access
1. It is very difficult to obtain loan-level data from financial firms for joint projects.
2. Aggregate-level delinquency and default data for mortgages, credit cards, installment loans and commercial lending can be extracted from public websites.
3. However, fully aggregated data, such as the Federal Reserve data (Appendix), must first be decomposed before vintage-based modeling can be applied.
From a prototype to production: possible collaboration
Model Description | Model Category | Scope
Non-stationary Markov Chain model with hazard functions and macroeconomic variables | Production | Consumer & Commercial
Non-stationary Markov Chain model with multinomial transition functions and macroeconomic variables | Production | Consumer & Commercial
Experiment with a second-order Markov Chain | Research | Commercial
Forecasting time to delinquency using a stochastic parametric model | Benchmarking | Consumer
Predicting delinquent loans' recovery using a stochastic 'Choice' model | Benchmarking | Consumer

Major Issue | Possible Solution
Zero values for some transition coefficients | Bayesian estimator / Gibbs sampling?
Wrong signs in some transition coefficients |
Too many parameters, small sample size for some transitions | MCMC
MLE estimation numerical stability | Bayesian estimation
Not included in SAS, R, etc.; no standard tests | Alternative to Markov model?
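The "zero values" and "small sample size" issues already appear in the simplest building block of these models: a transition matrix estimated from counts. The sketch below contrasts the MLE (row proportions) with the posterior mean under a symmetric Dirichlet prior, which keeps every non-absorbing transition probability positive. It uses hypothetical state labels and a single time-homogeneous matrix. The non-stationary models above would instead let transitions vary over time with macroeconomic variables.

```python
# Hypothetical delinquency buckets; this is the row/column order used by the matrices below.
STATES = ["current", "30dpd", "60dpd", "90+dpd", "default"]


def transition_matrix(counts, prior=0.0, absorbing=()):
    """Estimate P[i][j] from transition counts counts[i][j].

    prior = 0 gives the MLE (row proportions); prior > 0 gives the posterior mean
    under an independent symmetric Dirichlet(prior) prior on each row. Rows listed
    in `absorbing` are fixed to stay in their own state.
    """
    k = len(counts)
    matrix = []
    for i, row in enumerate(counts):
        if i in absorbing:
            matrix.append([1.0 if j == i else 0.0 for j in range(k)])
            continue
        total = sum(row) + prior * k
        if total == 0:
            raise ValueError(f"state {i} has no observations and no prior")
        matrix.append([(n + prior) / total for n in row])
    return matrix


def forecast(dist, matrix, months=1):
    """Propagate a distribution over states: dist_next[j] = sum_i dist[i] * P[i][j]."""
    for _ in range(months):
        dist = [sum(dist[i] * matrix[i][j] for i in range(len(dist))) for j in range(len(matrix))]
    return dist
```

With `prior=1.0`, a transition never observed in the data (for example current to 60dpd) receives a small positive probability instead of exactly zero. The prior's influence shrinks as the row's observation count grows, so it matters most for the sparsely populated transitions.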
Next search for an optimal complexity model: combined Markov Chain and Survival Analysis
Model description
Macroeconomic
variables
Objective
function
Major Issue
Possible solution
Next step
Partial MLE
?
N/A
Yes
Partial MLE
Correlated event times
Clustering
Least Sq.
Migration
underestimation
Bayesian MCI
(Christodoulakis)
MPLE
Zero values in some
transition coefficients
Gibbs sampling
MLE
Statistical significance
for some transitions
Bayesian
estimator
Leow & Crook, 'Intensity Models and Transition Probabilities'
Louis, Laere, Baesens, 'Predicting bank rating transitions..'
Jones, 'Estimating Markov Transition'
Kunovac, 'Estimating Credit Migration … – Bayesian Approach'
Grimshaw & Alexander, 'Markov Chain model for delinquency..'
Yes
No
No
Conclusions
• The stochastic parametric method with macroeconomic variables and unobserved consumer heterogeneity can be used by practitioners as an alternative to survival modeling.
• An optimal complexity model can provide an incentive to bridge the Academic–Practitioner Divide.
Appendix
Latent class Weibull model with two segments
Assumptions:
1. All obligors can be divided into two segments, each with its own fixed but unknown shape and scale parameters.
2. The large segment has a decreasing default hazard.
3. A relatively small consumer segment exists whose default hazard increases over time. The segment size (percentage) is a latent variable that must be estimated for each vintage.
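A minimal sketch of the likelihood implied by these assumptions, for the monthly default counts of one vintage, follows. The mixture survival is S(t) = π·S_small(t) + (1 − π)·S_large(t), where each segment has Weibull survival S(t) = exp(−(t/scale)^shape) and π is the latent share of the small segment. A shape above 1 gives an increasing hazard and a shape below 1 a decreasing one. The function names and parameter values are hypothetical, and the original parameterization may differ.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """h(t) = (shape / scale) * (t / scale) ** (shape - 1); increasing iff shape > 1 (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def mixture_survival(t, pi_small, small, large):
    """Two-segment latent class survival; small/large are (shape, scale) tuples."""
    return pi_small * weibull_survival(t, *small) + (1 - pi_small) * weibull_survival(t, *large)


def mixture_nll(params, defaults, n_loans):
    """Negative log-likelihood of monthly default counts; survivors are censored at month T."""
    pi_small, small, large = params
    T = len(defaults)
    s = [mixture_survival(t, pi_small, small, large) for t in range(T + 1)]
    ll = sum(d * math.log(s[t - 1] - s[t]) for t, d in enumerate(defaults, start=1))
    return -(ll + (n_loans - sum(defaults)) * math.log(s[T]))
```

Minimizing `mixture_nll` over π and the two (shape, scale) pairs, with π constrained to (0, 1), yields the segment size estimate for each vintage. The same MLE or Bayesian machinery used for the one-segment model applies.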