PD models in CSOB
Scoring Models for Individuals
Retail portfolio
Scoring Function
$Score_i = b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i3} + \dots + b_k x_{ik} = \mathbf{b}' \mathbf{x}_i$
Vector of borrower's characteristics: $\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{ik})'$
Vector of unknown parameters (weights): $\mathbf{b} = (b_1, b_2, \dots, b_k)'$
Internal model versus crystal ball
How to assess the creditworthiness of each borrower?
To whom can I lend?
Who should be refused a loan?
Who will not be able to repay their loan?
Scoring: classifying and ordering people
Score: 0, 12, 38, 44, 48, 67, 78, 93, 101, 112, 230, 330, 560
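As a minimal sketch of how a linear scoring function orders applicants, the snippet below computes the score as the weighted sum of characteristics and sorts borrowers from the lowest score to the highest. The weights, borrower names, and characteristic values are hypothetical and serve only as an illustration.

```python
def score(weights, characteristics):
    """Linear score: the sum of weight * characteristic over all k inputs."""
    if len(weights) != len(characteristics):
        raise ValueError("weights and characteristics must have the same length")
    return sum(b * x for b, x in zip(weights, characteristics))


# Hypothetical weights b and borrower characteristic vectors x_i.
weights = [2.0, 1.0, -3.0]
borrowers = {
    "A": [10.0, 5.0, 1.0],
    "B": [4.0, 2.0, 3.0],
    "C": [7.0, 0.0, 0.0],
}

scores = {name: score(weights, x) for name, x in borrowers.items()}

# Order borrowers from the lowest score to the highest.
ranking = sorted(scores, key=scores.get)
print(scores)
print(ranking)
```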
Risk Drivers for retail portfolio
• Borrower-specific characteristics
  • Age of borrower
  • Marital status
  • Occupation, income
• Loan-specific information
  • Type of loan
  • Monthly instalment
• Transaction information (if available)
  • Drawings
  • Payment balance
• External / macro
  • Unemployment rate
  • Czech Banking Credit Bureau data
Weight of evidence and information value definition
$WoE_c = \ln P(c \mid Good) - \ln P(c \mid Bad) = \ln\frac{P(Good \mid c)}{P(Bad \mid c)} - \ln\frac{P(Good)}{P(Bad)}$
WoE = a posteriori log odds - a priori log odds
Interpretation: the improvement of the forecast through the information of category c.
$IV = \sum_{c=1}^{C} WoE_c \,\bigl(P(c \mid Good) - P(c \mid Bad)\bigr)$
Interpretation: a high value of IV indicates a high discriminative power of a variable.
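The definitions above can be sketched in a few lines of Python. The category names and the counts of good and bad clients are hypothetical, and the sketch assumes every category contains at least one good and one bad client (otherwise the logarithm is undefined).

```python
import math


def woe_iv(good_counts, bad_counts):
    """Return (WoE per category, IV) from counts of good and bad clients."""
    total_good = sum(good_counts.values())
    total_bad = sum(bad_counts.values())
    woe = {}
    iv = 0.0
    for category in good_counts:
        p_c_good = good_counts[category] / total_good  # P(c | Good)
        p_c_bad = bad_counts[category] / total_bad     # P(c | Bad)
        woe[category] = math.log(p_c_good) - math.log(p_c_bad)
        iv += woe[category] * (p_c_good - p_c_bad)
    return woe, iv


# Hypothetical counts for the variable "marital status".
good = {"single": 300, "married": 600, "divorced": 100}
bad = {"single": 60, "married": 30, "divorced": 10}

woe, iv = woe_iv(good, bad)
for category, value in woe.items():
    print(category, round(value, 4))
print("IV", round(iv, 4))
```

A positive WoE means the category is more common among good clients than among bad ones; a WoE of zero means the category carries no information.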
You cannot predict everything
[Figure: default rate by score, Score 0 to Score 6; chart labels: 33%, 25%, 14%, 8%, 0%, 20%, 60%; "Model"]
• Unacceptable risk (DR ≈ 37%)
  • Decline on spot
  • No products allowed
• Medium risk
  • Manual review
  • Proceed with caution
• Low risk portfolio
  • Automatically accepted
  • No manual review
• Income (800 mln)
• Debt (0 mln)
• Region (Capital region)
• High score + Default ≈ Fraud!
Basel Accord
To set higher risk management and internal control standards for banks
To introduce a more risk-sensitive approach to the regulatory capital requirements
• 1988: 1st Capital Accord – Basel I (BCBS, BIS)
• 1999: Basel II 1st Consultative Paper
• 2004: The New Capital Accord – Basel II
• 2006: EU Capital Adequacy Directive
• 2008: Basel II effective for banks
• 2010: Basel III following the crisis (effective 2013-2019)
The main principles of Basel Accord (example)
Balance sheets before the loss ($ m):
                     Bank A    Bank B
Assets
  Loans                1000      1000
  Total                1000      1000
Liabilities
  Deposits              900       600
  Capital               100       400
  Total                1000      1000

After a credit loss of $125 m:
                     Bank A    Bank B
Assets
  Loans                 875       875
  Total                 875       875
Liabilities
  Deposits              900       600
  Capital               -25       275
  Total                 875       875

Bank A: capital negative => insolvency
  The bank's assets are not enough to repay its deposits
  It has to cease operations or be recapitalized/bailed out
Bank B: capital positive => solvent
  Continues its operations
Banks operate on thin margins (the difference between the rates earned on loans and the rates paid on deposits). Some amount of leverage (use of
borrowed capital) is a must, but high leverage is a risk.
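The arithmetic of the two-bank example can be reproduced with a short sketch. The function name and the returned fields are illustrative choices; the figures are the ones from the example, in $ m.

```python
def absorb_loss(loans, deposits, capital, loss):
    """Write a credit loss off the loan book and charge it against capital."""
    if loans != deposits + capital:
        raise ValueError("balance sheet does not balance")
    loans_after = loans - loss
    capital_after = loans_after - deposits  # deposits are owed in full
    return {
        "loans": loans_after,
        "deposits": deposits,
        "capital": capital_after,
        "leverage_before": loans / capital,  # assets per unit of capital
        "solvent": capital_after > 0,
    }


bank_a = absorb_loss(loans=1000, deposits=900, capital=100, loss=125)
bank_b = absorb_loss(loans=1000, deposits=600, capital=400, loss=125)
print("Bank A:", bank_a)
print("Bank B:", bank_b)
```

Both banks suffer the same loss, but Bank A runs at 10x leverage and ends with negative capital, while Bank B at 2.5x leverage stays solvent.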
The main principles of Basel Accord: losses over time
The expected part of the losses (EL), estimated from long-term history, should be covered from annual revenues.
EL is a cost of doing business; credit pricing has to absorb it.
The unexpected part (UL), if not covered by revenues, must be charged against capital.
The main principles of Basel Accord: Expected Loss (EL) and Unexpected Loss (UL)
In good years losses might be lower than expected, but in bad years the bank needs a sufficient capital buffer.
The goal of regulation: to set up a procedure for estimating the potential unexpected loss (UL) at
a regulatory confidence level: capital derived from the outcome of internal models should cover 99.9% of potential losses.
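To make the split between EL and UL concrete, here is a deliberately simplified sketch. It assumes a hypothetical portfolio of identical loans that default independently with the same probability and lose one unit each, so the number of defaults is binomial. Regulatory models are more elaborate (in particular, they allow for correlated defaults), so this is an illustration of the 99.9% idea, not the Basel formula.

```python
import math


def default_count_quantile(n_loans, prob_default, confidence):
    """Smallest k with P(number of defaults <= k) >= confidence (binomial)."""
    cumulative = 0.0
    for k in range(n_loans + 1):
        cumulative += (
            math.comb(n_loans, k)
            * prob_default ** k
            * (1 - prob_default) ** (n_loans - k)
        )
        if cumulative >= confidence:
            return k
    return n_loans


# Hypothetical portfolio: 1000 loans, PD of 2%, loss of 1 unit per default.
n_loans, prob_default, loss_per_default = 1000, 0.02, 1.0

expected_loss = n_loans * prob_default * loss_per_default
worst_case = default_count_quantile(n_loans, prob_default, 0.999) * loss_per_default
unexpected_loss = worst_case - expected_loss

print("EL:", expected_loss)
print("Loss at 99.9% confidence:", worst_case)
print("UL (to be covered by capital):", unexpected_loss)
```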
Rating System and its Performance
External versus Internal Rating
Category               Rating
Highest Quality        AAA
Very Good Quality      AA
Good Quality           A
Medium Quality         BBB
Low Medium Quality     BB
Poor Quality           B
Speculative Quality    C
Default                D
• External: rating agencies
  • Moody's (1906)
  • Standard & Poor's (S&P, 1916)
  • Fitch, etc.
• Internal rating systems differ from one financial institution (FI) to another
• Basel requirement: at least 7 rating grades
• Why internal ratings?
  • External ratings represent the quality of a particular debt issue; internal ratings evaluate the client's quality
  • External ratings are available only for large corporates and sovereigns; internal ratings also cover the medium and small business segments
  • Conflict of interest: external agencies earn their income from rating fees, which are paid by the corporate clients they rate
Probability of Default and Default rates
• PD: quantitative measure of the likelihood of default
• Number in a range: 0 < PD < 1, or 0% < PD < 100%
• DR = number of defaults (D) during the observation period / total number of non-defaulted clients (ND) at the beginning of the observation period
• DR is the empirical realization of PD
• PD is an individual characteristic of a client; DR is a characteristic of a sample or portfolio of clients
• Benefits of PD:
  • Objective information, which can be used to compare different clients in different portfolios and benchmarked against external default data
  • PD is useful for credit pricing: the higher the PD, the higher the risk, hence the credit premium ought to be higher
  • PD enables calculation of the economic capital: the higher the PD, the higher the capital requirements
• Cohorts: 2005, 2006, 2007, each observed over an outcome period
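A small sketch of the default-rate calculation per cohort follows. The cohort sizes and default counts are hypothetical; only the formula (defaults during the outcome period divided by non-defaulted clients at its start) comes from the text.

```python
def default_rate(non_defaulted_at_start, defaults_during_period):
    """DR = defaults in the outcome period / non-defaulted clients at its start."""
    if non_defaulted_at_start <= 0:
        raise ValueError("cohort must contain at least one non-defaulted client")
    return defaults_during_period / non_defaulted_at_start


# Hypothetical cohorts: year -> (non-defaulted at start, defaults within outcome period).
cohorts = {2005: (2000, 40), 2006: (2500, 45), 2007: (3000, 75)}

rates = {year: default_rate(nd, d) for year, (nd, d) in cohorts.items()}

# Pooled default rate across all cohorts.
pooled = default_rate(
    sum(nd for nd, _ in cohorts.values()),
    sum(d for _, d in cohorts.values()),
)

for year, rate in rates.items():
    print(year, f"{rate:.2%}")
print("pooled", f"{pooled:.2%}")
```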
Cumulative Accuracy Profile (CAP) and Accuracy Ratio (AR) – I
CAP curve: a line in the unit square [0,1] x [0,1] that plots
the cumulative distribution of defaulted debtors $C_D$
against the cumulative distribution of all debtors $C_T$,
both ordered from the worst rating to the best.
Intuition: bad clients have to be assigned to the worst
rating grades.
$C_T(R_i)$: for a given rating $R_i$, the percentage of all debtors
with rating $R_i$ or worse
$C_D(R_i)$: for a given rating $R_i$, the percentage of all defaulted debtors
with rating $R_i$ or worse
Random model: the diagonal of the square; the
rating system contains no information about the debtor's credit
quality and assigns x% of the defaulters to the x% of debtors with the
worst ratings (no discrimination).
Perfect model: all defaulters get a worse rating than the
non-defaulted debtors, so the CAP rises straight to 1 and
stays there.
Cumulative Accuracy Profile (CAP) and Accuracy Ratio (AR) - II
The CAP curve can be summarized in a single number:
the Accuracy Ratio (AR), also called the
Gini coefficient or power statistic.
$AR = \frac{a_R}{a_P}$
$a_R$: the area between the CAP of the rating model and the CAP of the random
model
$a_P$: the area between the CAP of the perfect model and the CAP of the random
model
$0 \le AR \le 1$ for a rating model that is no worse than random
Intuition: the closer AR is to 1, the closer the CAP curve lies to the
upper left corner and the higher the discriminative power of the rating
model
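The CAP construction and the Accuracy Ratio can be sketched as follows. The sketch assumes that a lower score means a worse rating, that scores have no ties (ties are broken by input order here), and that the sample contains both defaulters and non-defaulters; the sample data are hypothetical.

```python
def cap_curve(scores, defaults):
    """CAP points (share of all debtors, share of defaulters), worst score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_total, n_defaults = len(scores), sum(defaults)
    points, cumulative_defaults = [(0.0, 0.0)], 0
    for rank, i in enumerate(order, start=1):
        cumulative_defaults += defaults[i]
        points.append((rank / n_total, cumulative_defaults / n_defaults))
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def accuracy_ratio(scores, defaults):
    default_share = sum(defaults) / len(defaults)
    # Area between the model CAP and the random (diagonal) CAP.
    a_r = area_under(cap_curve(scores, defaults)) - 0.5
    # Area between the perfect CAP and the random CAP.
    a_p = (1 - default_share / 2) - 0.5
    return a_r / a_p


# Hypothetical sample: 1 marks a defaulted debtor.
scores = [10, 20, 30, 40, 50]
defaults = [1, 0, 1, 0, 0]
print(cap_curve(scores, defaults))
print(accuracy_ratio(scores, defaults))
```

A model that ranks defaulters worse than random would produce a negative AR under this definition.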
Discriminative Power of the Rating System
PD models for Legal Entities
Corporate/SME portfolio
Model Selection: Logit Regression
Advantages:
1. Output from logistic regression can be interpreted directly as the probability of default (PD)
2. Easy check of the dependence between explanatory variables and default risk
$y_i^*$: latent (unobservable) variable
$y_i$: observable output (default event)
Logistic distribution
Interpretation of logistic coefficients: the odds (the ratio of the probability of default
to the survival probability) change by a factor of $e^{\beta_k}$
when explanatory variable $x_k$ changes by 1 unit.
Note: other models are also in use, such as probit, survival models, decision trees, neural networks, panel data models, etc.
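The odds interpretation can be checked numerically. The sketch below assumes the common logit convention in which PD = 1 / (1 + exp(-(intercept + beta'x))) and a higher linear index means a higher PD; the coefficients and inputs are hypothetical.

```python
import math


def pd_logit(intercept, beta, x):
    """PD from a logit model, assuming y = 1 denotes default."""
    z = intercept + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Ratio of the probability of default to the survival probability."""
    return p / (1.0 - p)


# Hypothetical coefficients and one borrower.
intercept, beta = -3.0, [0.8, -0.5]
x = [1.0, 2.0]
x_plus_one = [2.0, 2.0]  # first explanatory variable increased by 1 unit

pd_before = pd_logit(intercept, beta, x)
pd_after = pd_logit(intercept, beta, x_plus_one)
odds_ratio = odds(pd_after) / odds(pd_before)

print("PD before:", pd_before)
print("PD after:", pd_after)
print("odds ratio:", odds_ratio, "exp(beta_1):", math.exp(beta[0]))
```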
Data Collection: Risk Drivers for Legal Entities
Firm Credit Risk
• Financial Risk
• Operating (Non-Financial) Risks
  • Internal (Company Level) Risks
  • Industry Risks
  • External Risks
Operating Risks
• External Risks
  • National development (macroeconomic factors: private consumption, government spending, inflation, etc.)
  • International development (exchange rates, fiscal policy, monetary policy, etc.)
  • Economic factors (business cycle, investment, import, export, etc.)
  • Political factors (terrorism, civil wars)
  • Social factors (demography, education)
  • Cultural factors, regulatory framework, technology, environment
• Industry Risks
  • Risks emanating from the external environment (tariff barriers, changes in consumer preferences)
  • Industry-specific risks (new entrants, price wars, bargaining power of suppliers and buyers)
  • Risks emanating from industry drivers (demand factors)
  • Industry cycle stages: pioneering, rapid growth, maturity, stabilization, decline
  • Permanence of industry
  • Government support
  • Industry profitability (competition among existing firms, threat of new entrants, threat of substitute products)
Data Processing
• Data Cleaning
  [Table: example data set with columns N, default, sales, and net sales for the years 1995, 1996, 1997; sample cells include default flags 0 and 1, a sales value of -1230, and several N/A entries]
  • Horizontal cleaning: the row should be deleted in case of
    • repeated rows
    • errors in specific cells
    • non-representative firms (too large, too small…)
    • non-annual financial statements collected
    • missing default information
  • Vertical cleaning: if a specific variable has a high number of missing values, the column should be deleted
• Treatment of Missing Values
  • Substitute a mean or median (neutral) value for some of the input variables
• Treatment of Outliers
  • Replace extreme values by an appropriate cut-off value or percentile
• Transformation of Explanatory Variables
  • Standardization and/or normalization
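The processing steps listed above can be sketched with the Python standard library. The thresholds (a variable is dropped when more than 50% of its values are missing, outliers are clipped at the 5th and 95th percentiles), the column names, and the sample rows are all hypothetical choices made for illustration.

```python
import statistics


def clean(rows, variables, max_missing_share=0.5):
    """Horizontal/vertical cleaning, median imputation, clipping, standardization."""
    columns = ["default"] + variables
    seen, kept = set(), []
    for row in rows:
        key = tuple(row.get(name) for name in columns)
        # Horizontal cleaning: drop rows without default information and repeats.
        if row.get("default") is None or key in seen:
            continue
        seen.add(key)
        kept.append({name: row.get(name) for name in columns})
    if not kept:
        return []
    # Vertical cleaning: drop variables with too many missing values.
    usable = [
        v for v in variables
        if sum(r[v] is None for r in kept) / len(kept) <= max_missing_share
    ]
    for v in usable:
        observed = [r[v] for r in kept if r[v] is not None]
        median = statistics.median(observed)
        cuts = statistics.quantiles(observed, n=20, method="inclusive")
        low, high = cuts[0], cuts[-1]  # 5th and 95th percentiles
        # Impute the median for missing values, then clip outliers.
        values = [median if r[v] is None else min(max(r[v], low), high) for r in kept]
        mean, sd = statistics.fmean(values), statistics.pstdev(values)
        for r, value in zip(kept, values):
            r[v] = (value - mean) / sd if sd > 0 else 0.0  # standardize
    return [{"default": r["default"], **{v: r[v] for v in usable}} for r in kept]


# Hypothetical raw data with a duplicate, a missing default flag, and missing values.
raw = [
    {"default": 0, "sales": 100.0, "margin": None},
    {"default": 0, "sales": 100.0, "margin": None},
    {"default": None, "sales": 50.0, "margin": 0.1},
    {"default": 1, "sales": None, "margin": None},
    {"default": 0, "sales": 300.0, "margin": None},
    {"default": 1, "sales": 200.0, "margin": 0.2},
]
cleaned = clean(raw, ["sales", "margin"])
for row in cleaned:
    print(row)
```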
Single Factor Analysis: the choice of input variables (Long List)
1. State the expected dependence between the financial
ratio and the probability of default
2. Test the linearity assumption:
2.1. Divide the indicator values into groups that contain the
same number of observations, for instance by percentile
2.2. Compute the default rate $DR_k$ in each group
2.3. Compute the empirical log odds within each group: $\ln\left(\frac{DR_k}{1 - DR_k}\right)$
2.4. Estimate a linear regression of the log odds on the mean
values $\bar{x}_k$ of the indicator in each group
3. Test whether the dependence postulated in step 1
is consistent with the linear relation estimated in step 2
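Steps 2.1 to 2.4 can be sketched as below. The indicator (a leverage-like ratio), its values, and the default flags are hypothetical, and the sketch assumes every group has a default rate strictly between 0 and 1 so that the log odds are defined.

```python
import math


def log_odds_by_group(values, defaults, n_groups):
    """Return (group mean of the indicator, empirical log odds) per group."""
    pairs = sorted(zip(values, defaults))
    size = len(pairs) // n_groups
    points = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        group = pairs[g * size:end]
        dr = sum(d for _, d in group) / len(group)      # default rate DR_k
        mean_x = sum(x for x, _ in group) / len(group)  # group mean of x
        points.append((mean_x, math.log(dr / (1 - dr))))
    return points


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical indicator values and default flags (1 = default).
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
defaults = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

points = log_odds_by_group(values, defaults, n_groups=3)
slope, intercept = fit_line(points)
print(points)
print("slope:", slope, "intercept:", intercept)
```

If the expected dependence from step 1 is "higher leverage, higher default risk", a positive slope with points lying close to the fitted line supports keeping the indicator.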
Model building: Training Sample versus Validation Sample
The whole portfolio, 5 cohorts
Split portfolio, 70 : 30
Source: Hayden (2002)
The training sample is used to develop the final model
The validation sample is used for out-of-sample evaluation of the final model
Model building: Pairwise correlation (Pearson's correlation)
Creation of a short list of variables potentially appropriate for the final model:
Variable 1 is highly correlated with Variable 2 (both measure leverage), but AR(Var 1) < AR(Var 2) => Var 1 is
dropped from the multivariate analysis
Variable 10 is highly correlated with Variable 11 (both measure profitability), but AR(Var 10) < AR(Var 11) =>
Var 10 is dropped from the multivariate analysis
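One way to sketch this short-listing rule is a greedy pass over the variables in descending order of their univariate AR, keeping a variable only if it is not highly correlated with one already kept. The 0.8 correlation threshold, the variable names, the data, and the AR values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def short_list(data, ar, threshold=0.8):
    """Keep the higher-AR variable of every highly correlated pair."""
    kept = []
    for name in sorted(data, key=lambda v: ar[v], reverse=True):
        if all(abs(pearson(data[name], data[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical variables: var1 and var2 both measure leverage.
data = {
    "var1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "var3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ar = {"var1": 0.30, "var2": 0.45, "var3": 0.25}

print(round(pearson(data["var1"], data["var2"]), 4))
print(short_list(data, ar))
```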
Model Building : Multivariate Analysis
Total number of different combinations of variables in the model: $2^{12}$ = 4,096 models => try backward/forward selection.
Step-by-step elimination of insignificant variables until all remaining variables are highly significant.
Coefficients are significantly different from zero at the 10% (*), 5% (**), or 1% (***) level (a low p-value rejects H0: coef = 0).
Model 1: Variable 9 has the opposite sign in the regression to the one expected hypothetically. Var 9 is correlated with a group of
other variables (Var 10, 11, 12) => remove Var 9 from the model
Model 2: Variable 6 became highly insignificant => removed from the final model.
Model Building : Performance
Model's robustness: test statistics, AR, and HL in the development sample and the validation sample are close enough
Goodness of fit: Brier score, Hosmer-Lemeshow test (high p-value)
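A minimal sketch of the two goodness-of-fit measures follows. It computes the Brier score and the Hosmer-Lemeshow statistic over groups of equal size sorted by predicted PD; the p-value, which would come from a chi-square distribution (commonly with the number of groups minus 2 degrees of freedom), is not computed here. The predicted PDs and default flags are hypothetical.

```python
def brier_score(pds, defaults):
    """Mean squared difference between predicted PD and the default flag."""
    return sum((p - y) ** 2 for p, y in zip(pds, defaults)) / len(pds)


def hosmer_lemeshow_statistic(pds, defaults, n_groups):
    """Sum over groups of (observed - expected)^2 / (n * mean_pd * (1 - mean_pd))."""
    pairs = sorted(zip(pds, defaults))
    size = len(pairs) // n_groups
    statistic = 0.0
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        group = pairs[g * size:end]
        n = len(group)
        expected = sum(p for p, _ in group)  # expected number of defaults
        observed = sum(y for _, y in group)  # observed number of defaults
        mean_pd = expected / n
        statistic += (observed - expected) ** 2 / (n * mean_pd * (1 - mean_pd))
    return statistic


# Hypothetical predictions: ten clients with PD 10% and ten with PD 50%.
pds = [0.1] * 10 + [0.5] * 10
well_calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5
too_many_defaults = [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5

print(brier_score(pds, well_calibrated))
print(hosmer_lemeshow_statistic(pds, well_calibrated, n_groups=2))
print(hosmer_lemeshow_statistic(pds, too_many_defaults, n_groups=2))
```

A statistic near zero (hence a high p-value) indicates that observed default counts match the predicted PDs in every group.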