Introduction
In finance, a loan is a debt evidenced by a note that specifies, among other things, the
principal amount, the interest rate, and the date of repayment. A loan entails the temporary
reallocation of the subject asset(s) between the lender and the borrower.
In a loan, the borrower initially receives an amount of money, called the principal, from the
lender and is obligated to repay an equal amount of money, together with interest, to the lender at a
later time. Typically, the money is repaid in regular installments, or partial repayments; in an
annuity, each installment is the same amount [1].
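For the annuity case, the fixed installment follows from the principal P, the periodic interest rate r and the number of installments n through the standard amortization formula M = P * r / (1 - (1 + r)^(-n)). Below is a minimal Python sketch. It assumes the nominal annual rate (in percent) is split evenly into 12 monthly periods with payments at the end of each month. `annuity_payment` is a hypothetical helper and not part of the original R analysis.

```python
def annuity_payment(principal: float, annual_rate_pct: float, months: int) -> float:
    """Fixed monthly installment of a fully amortizing (annuity) loan.

    Assumption: the nominal annual rate, given in percent, is divided evenly
    into 12 monthly periods, and payments are made at the end of each month.
    """
    if months <= 0:
        raise ValueError("months must be positive")
    r = annual_rate_pct / 100 / 12  # monthly rate as a fraction
    if r == 0:
        return principal / months  # interest-free: equal slices of the principal
    # Standard annuity formula: M = P * r / (1 - (1 + r) ** -n)
    return principal * r / (1 - (1 + r) ** -months)


if __name__ == "__main__":
    # Hypothetical loan: $20,000 repaid over 36 months at 12.6% per year.
    m = annuity_payment(20_000, 12.6, 36)
    print(f"monthly installment: {m:.2f}")
    print(f"total repaid:        {m * 36:.2f}")
```

Because every installment is the same, discounting the n payments at the monthly rate gives back exactly the principal. This offers a quick sanity check on the formula.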
Identifying the variables on which the interest rate depends may help predict the interest rate a
borrower will be offered, given their other characteristics. We base our analysis on a dataset
compiled by specialists at Lending Club.
Lending Club is a US peer-to-peer lending company headquartered in San Francisco,
California. It was the first peer-to-peer lender to register its offerings as securities with the Securities
and Exchange Commission (SEC) and to offer loan trading on a secondary market. Lending Club
operates an online lending platform that enables borrowers to obtain loans and investors to
purchase notes backed by payments made on those loans. As of June 2013, Lending Club had originated
over $2 billion in loans and averaged $2.7 million in daily loan originations [2].
Our analysis suggests that FICO score [3] is not the only parameter predicting interest rate. Using
multivariate linear regression, we show a statistically significant association between interest rate
and the following parameters: loan length, amount of money funded by investors, number of open
credit lines, and number of inquiries in the last 6 months.
Methods
Data collection
The data for our analysis were downloaded from spark-public.s3.amazonaws.com on
Monday, November 11, 2013, at 14:30 using the R programming language [4].
Exploratory Analysis
Exploratory analysis was conducted to detect missing values and to identify variables that could be
included in the multivariate regression model predicting interest rate [5].
Exploratory analysis was performed by examining tables, scatter plots and histograms of the
data.
Some variables were transformed in order to make statistical modeling easier.
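The report does not list which variables were transformed. Below is a minimal Python sketch built on hypothetical assumptions about the raw file: percentages are stored as text such as "8.90%", loan length as "36 months", and the FICO range as "735-739". The sketch maps the FICO range to its lower bound, which is only one possible choice; the midpoint would work equally well. All helper names are illustrative.

```python
from typing import Dict


def parse_percent(text: str) -> float:
    """'8.90%' -> 8.9 (the value stays in percent units)."""
    return float(text.strip().rstrip("%"))


def parse_months(text: str) -> int:
    """'36 months' -> 36."""
    return int(text.split()[0])


def parse_fico_range(text: str) -> int:
    """'735-739' -> 735, i.e. the lower bound of the reported range."""
    low, _high = text.split("-")
    return int(low)


def transform_row(row: Dict[str, str]) -> Dict[str, object]:
    """Copy a raw record, converting the text-coded fields to numbers."""
    out: Dict[str, object] = dict(row)
    out["Interest.Rate"] = parse_percent(row["Interest.Rate"])
    out["Debt.To.Income.Ratio"] = parse_percent(row["Debt.To.Income.Ratio"])
    out["Loan.Length"] = parse_months(row["Loan.Length"])
    out["FICO.Range"] = parse_fico_range(row["FICO.Range"])
    return out


if __name__ == "__main__":
    raw = {  # hypothetical record in the assumed raw format
        "Interest.Rate": "8.90%",
        "Debt.To.Income.Ratio": "14.90%",
        "Loan.Length": "36 months",
        "FICO.Range": "735-739",
    }
    print(transform_row(raw))
```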
Statistical Modeling
We used a standard multivariate linear regression model to relate interest rate to the other
parameters in the data. We chose this model assuming a linear relationship between interest rate
and FICO range, and we assumed the same kind of relationship between interest rate and the
other variables used in our analysis.
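The original fit was done in R. The sketch below illustrates the same technique in another language. Ordinary least squares can be computed by solving the normal equations (A^T A) b = A^T y, where A is the predictor matrix with a leading column of ones for the intercept b0. This version uses only the Python standard library and Gaussian elimination, which is adequate for a handful of predictors; production code would normally prefer a QR-based solver. The function names are illustrative, and the data in the usage example are synthetic.

```python
from typing import List, Sequence


def ols_fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimizing the squared error of y ~ b0 + sum(bj * xj).

    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading
    column of ones, by Gaussian elimination with partial pivoting.
    """
    A = [[1.0, *map(float, row)] for row in X]
    n, p = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
        + [sum(A[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the upper-triangular system.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (M[r][p] - sum(M[r][c] * b[c] for c in range(r + 1, p))) / M[r][r]
    return b


def r_squared(X: Sequence[Sequence[float]], y: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of the variance of y explained by the fitted coefficients b."""
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic data generated from y = 2 + 3*x1 - x2 (no noise).
    X = [[0, 1], [1, 0], [2, 3], [3, 5], [4, 2]]
    y = [2 + 3 * x1 - x2 for x1, x2 in X]
    b = ols_fit(X, y)
    print("coefficients:", [round(v, 6) for v in b])
    print("R^2:", round(r_squared(X, y, b), 6))
```

The R² computed by `r_squared` corresponds to the "percentage of variance explained" reported in the Results.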
Reproducibility
This analysis was performed in R and can be reproduced by downloading the same data
from spark-public.s3.amazonaws.com. The results may differ if the data available on the site
change after the download date.
Results:
The analyzed data contain information on the following variables:
1. Amount requested, in dollars (Amount.Requested)
2. Amount of money loaned to the applicant by investors, in dollars (Amount.Funded.By.Investors)
3. The lending interest rate (Interest.Rate)
4. Length of the loan in months (Loan.Length)
5. Purpose of the loan as stated by the applicant (Loan.Purpose)
6. Percentage of the applicant's gross income that goes toward paying debt (Debt.To.Income.Ratio)
7. Abbreviation of the applicant's U.S. state of residence (State)
8. Whether the applicant owns, rents, or has a mortgage on their home (Home.ownership)
9. Monthly income of the applicant in dollars (Monthly.income)
10. The applicant's FICO score range, a measure of creditworthiness (FICO.Range)
11. Number of open lines of credit the applicant had at the time of application (Open.CREDIT.Lines)
12. Total amount outstanding across all lines of credit (Revolving.CREDIT.Balance)
13. Number of authorized inquiries about the applicant's creditworthiness in the 6 months before
the credit was issued (Inquiries.in.the.Last.6.Months)
14. Length of employment at the current job (Employment.Length)
The dataset contained 2,500 observations. We identified 2 missing values and excluded the
affected observations from the analysis. None of the other measured variables had values outside
their expected ranges.
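Counting and dropping incomplete records can be sketched as follows. This assumes, hypothetically, that a missing value appears as None, as an empty string, or as the literal "NA" token that R writes for missing values. `complete_cases` is named after R's `complete.cases()` but is only an illustrative Python analogue.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    """Treat None, blank strings and the R-style 'NA' token as missing."""
    return value is None or value.strip() in {"", "NA"}


def count_missing(rows: List[Record]) -> Dict[str, int]:
    """Number of missing values per column."""
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                counts[column] = counts.get(column, 0) + 1
    return counts


def complete_cases(rows: List[Record]) -> List[Record]:
    """Keep only records with no missing field (listwise deletion)."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


if __name__ == "__main__":
    rows = [  # hypothetical records
        {"Monthly.income": "5000", "Open.CREDIT.Lines": "14"},
        {"Monthly.income": "NA", "Open.CREDIT.Lines": "12"},
        {"Monthly.income": "4500", "Open.CREDIT.Lines": ""},
    ]
    print(count_missing(rows))
    print(len(complete_cases(rows)), "of", len(rows), "records kept")
```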
The distribution of interest rates is bimodal and right-skewed; 70% of the values lie between
10% and 20%, and the median interest rate is 13.7%.
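These summary statistics can be reproduced with the Python standard library. Below is a minimal sketch. It assumes the rates are already numeric percentages and that "between 10% and 20%" includes both endpoints. The 2-percentage-point bin width of the text histogram, used to eyeball the bimodal shape, is an arbitrary illustrative choice.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Sequence


def share_in_range(rates: Sequence[float], low: float = 10.0, high: float = 20.0) -> float:
    """Fraction of rates with low <= rate <= high."""
    return sum(low <= r <= high for r in rates) / len(rates)


def bin_counts(rates: Sequence[float], width: float = 2.0) -> Dict[float, int]:
    """Histogram counts keyed by the left edge of each bin of the given width."""
    counts = Counter(math.floor(r / width) * width for r in rates)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    rates = [7.9, 8.9, 11.7, 12.1, 13.7, 15.3, 15.8, 17.3, 21.0]  # hypothetical values
    width = 2.0
    print("median:", statistics.median(rates))
    print("share in [10, 20]:", share_in_range(rates))
    for left, n in bin_counts(rates, width).items():
        print(f"{left:5.1f}-{left + width:5.1f} {'#' * n}")
```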
Correlation analysis showed a strong negative correlation between interest rate and FICO score
(about -0.71) and moderate positive correlations between interest rate and both loan length (0.42)
and the amount of money funded by investors (0.34).
There is also a weak positive correlation between interest rate and the number of inquiries in the
last six months (0.16) and the number of open credit lines (0.1).
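The report does not name the correlation measure, but these figures are presumably Pearson correlation coefficients (the default of R's `cor()`). Pearson's r is the covariance of two variables divided by the product of their standard deviations. Below is a minimal pure-Python sketch; on Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical toy data: higher FICO scores paired with lower rates.
    fico = [670, 700, 720, 750, 780]
    rate = [17.5, 15.0, 13.9, 10.2, 7.9]
    print(round(pearson_r(fico, rate), 3))  # negative, as for FICO vs. interest rate
```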
We first fit a regression model relating interest rate to FICO range. Because the residuals
showed non-random patterns, we tried to explain them by adding variables that could plausibly act
as confounders to the model. The final multivariate regression model is as follows:
IR_model = b0 + b1(FICO.Range) + b2(Loan.Length) + b3(Amount.Funded.By.Investors) +
b4(Open.CREDIT.Lines) + b5(Inquiries.in.the.Last.6.Months) + e
where b0 is the intercept, e is the error term, and b1, ..., b5 are the changes in interest rate associated
with a one-unit change in the corresponding variable, with the other variables held constant.
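A simple way to look for the residual pattern described above starts with fitting the single-predictor model. The next step checks whether its residuals correlate with a candidate variable that was left out. A clear correlation suggests the candidate carries information the model is missing. This is a simplified stand-in for residual or added-variable plots and not necessarily the exact procedure used in the report. The function names are illustrative and the toy data are synthetic.

```python
import math
from typing import List, Sequence, Tuple


def simple_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of the least-squares line y ~ b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def residuals(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Observed minus fitted values from the single-predictor fit."""
    b0, b1 = simple_fit(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def corr(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two equally long samples."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


if __name__ == "__main__":
    # Synthetic data: the rate depends on FICO *and* on loan length.
    fico = [700, 700, 750, 750]
    length = [36, 60, 36, 60]
    rate = [50 - 0.05 * f + 0.1 * m for f, m in zip(fico, length)]
    res = residuals(fico, rate)  # model with FICO only
    print("corr(residuals, loan length):", round(corr(res, length), 3))
```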
All of these variables are statistically significant predictors of interest rate (P < 2e-16).
The model explains about 76% of the variance in interest rate (R² ≈ 0.76).
A one-unit increase in FICO score corresponded to a change of b1 = -0.087 percentage points in
interest rate (95% confidence interval: -0.089 to -0.085) [6].
A one-month increase in loan length corresponded to a change of b2 = 0.13 percentage points
in interest rate (95% confidence interval: 0.12 to 0.14).
A one-dollar increase in the amount of money funded by investors corresponded to a change of
b3 = 0.00015 percentage points in interest rate (95% confidence interval: 0.00014 to 0.00016),
i.e. about 0.15 percentage points per $1,000.
A one-unit increase in the number of open credit lines corresponded to a change of b4 = 0.049
percentage points in interest rate (95% confidence interval: 0.03 to 0.067).
A one-unit increase in the number of inquiries in the last six months corresponded to a change
of b5 = 0.39 percentage points in interest rate (95% confidence interval: 0.32 to 0.46).
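Confidence intervals of this kind have the usual form estimate ± critical value × standard error [6]. Since the report does not list the standard errors, the sketch below also backs one out from a reported interval. It uses the normal critical value (about 1.96 for 95%). This is an assumption, though with roughly 2,500 observations it is very close to the exact t quantile. The function names are illustrative.

```python
from statistics import NormalDist
from typing import Tuple


def critical_value(level: float = 0.95) -> float:
    """Two-sided standard-normal critical value, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(0.5 + level / 2)


def confidence_interval(estimate: float, std_error: float, level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation interval: estimate +/- z * SE."""
    half_width = critical_value(level) * std_error
    return estimate - half_width, estimate + half_width


def implied_std_error(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error implied by a symmetric interval [lower, upper]."""
    return (upper - lower) / (2 * critical_value(level))


if __name__ == "__main__":
    # FICO coefficient b1 = -0.087 with reported 95% CI (-0.089, -0.085).
    se = implied_std_error(-0.089, -0.085)
    print(f"implied SE of b1: {se:.5f}")
    print("reconstructed CI:", confidence_interval(-0.087, se))
```

A normal-approximation interval is symmetric, so each estimate should sit at (or, after rounding, near) the midpoint of its interval. This holds for b1 through b5 as reported, which provides a quick consistency check on the rounded figures.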
For example, the predicted interest rate for a borrower with a FICO score of 735, a loan length of
36 months, 20,000 dollars funded by investors, 14 open credit lines, and 2 inquiries in the last six
months is 12.6%.
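The intercept b0 is not reported, so this prediction cannot be recomputed from scratch. Under the linear model, however, the prediction for a different borrower can start from the 12.6% example. Each variable then contributes b_j × (change in x_j). Below is a minimal sketch using the rounded coefficients reported above, so its results are approximate. The dictionary keys reuse the dataset's column names, and `predict_relative` is a hypothetical helper.

```python
from typing import Dict, Optional

# Rounded coefficients reported above (percentage points per unit).
COEFS: Dict[str, float] = {
    "FICO.Range": -0.087,
    "Loan.Length": 0.13,
    "Amount.Funded.By.Investors": 0.00015,
    "Open.CREDIT.Lines": 0.049,
    "Inquiries.in.the.Last.6.Months": 0.39,
}

# Reference borrower from the worked example and its predicted rate (percent).
BASELINE: Dict[str, float] = {
    "FICO.Range": 735,
    "Loan.Length": 36,
    "Amount.Funded.By.Investors": 20_000,
    "Open.CREDIT.Lines": 14,
    "Inquiries.in.the.Last.6.Months": 2,
}
BASELINE_RATE = 12.6


def predict_relative(changes: Dict[str, float],
                     baseline: Optional[Dict[str, float]] = None,
                     baseline_rate: float = BASELINE_RATE) -> float:
    """Predicted rate for a borrower who differs from the baseline in `changes`.

    Variables not listed in `changes` keep their baseline values; the unknown
    intercept b0 cancels because only differences from the baseline are used.
    """
    base = BASELINE if baseline is None else baseline
    delta = sum(COEFS[k] * (changes.get(k, base[k]) - base[k]) for k in COEFS)
    return baseline_rate + delta


if __name__ == "__main__":
    # Same borrower, but a 60-month loan instead of 36 months.
    print(round(predict_relative({"Loan.Length": 60}), 2))
    # Same borrower, but a FICO score 40 points higher.
    print(round(predict_relative({"FICO.Range": 775}), 2))
```

For instance, keeping the FICO score at 735 but switching to a 60-month loan adds 0.13 × 24 = 3.12 percentage points, for a predicted rate of about 15.7%. Two borrowers with the same FICO score can therefore receive noticeably different predicted rates.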
Conclusions:
We found that interest rate is associated not only with FICO score but also with loan length, the
amount of money funded by investors, the number of inquiries in the last six months, and the
number of open credit lines.
Including these variables in the multivariate linear regression model improved the model fit,
but the regression line still had a pronounced slope.
According to our results, the difference in interest rate between two borrowers with the same
FICO score may be explained by differences in these other variables.
Although the analysis indicates a clear association between interest rate and these variables, a
larger dataset (with more observations or more variables) would help clarify the relationship
between interest rate and other loan parameters.
References:
1. Wikipedia, “Loan” page, URL: http://en.wikipedia.org/wiki/Loan, accessed 12/11/2013
2. Wikipedia, “Lending Club” page, URL: http://en.wikipedia.org/wiki/Lending_Club, accessed 12/11/2013
3. myFICO, “How my FICO Score is calculated”, URL: http://www.myfico.com/crediteducation/whatsinyourscore.aspx, accessed 12/11/2013
4. R Core Team (2012), “R: A language and environment for statistical computing”, R Foundation for Statistical Computing, Vienna, Austria, URL: http://www.r-project.org/
5. Alan O. Sykes, “An Introduction to Regression Analysis”, University of Chicago, 2008
6. NIST Engineering Statistics Handbook, “What are confidence intervals?”, URL: http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm, accessed 12/11/2013