Introduction
In finance, a loan is a debt evidenced by a note which specifies, among other things, the principal amount, interest rate, and date of repayment. A loan entails the reallocation of the subject asset(s) for a period of time between the lender and the borrower. The borrower initially receives an amount of money, called the principal, from the lender, and is obligated to repay an equal amount of money to the lender at a later time. Typically, the money is paid back in regular installments, or partial repayments; in an annuity, each installment is the same amount [1]. Identifying the variables on which the interest rate depends may help us predict the interest rate a person is likely to receive, given the other parameters of the loan.

We conduct our analysis on a dataset compiled by specialists at Lending Club. Lending Club is a US peer-to-peer lending company headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC) and to offer loan trading on a secondary market. Lending Club operates an online lending platform that enables borrowers to obtain a loan and investors to purchase notes backed by payments made on loans. As of June 2013, Lending Club had originated over 2 billion USD in loans and averaged 2.7 million USD in daily loan originations [2].

Our analysis suggests that FICO score is not the only predictor of interest rate. Using multivariate linear regression, we show that there is a clear relationship between interest rate and the following parameters: loan length, amount of money funded by investors, number of open credit lines, and inquiries in the last 6 months [3].
Methods

Data collection
The data for our analysis were obtained from spark-public.s3.amazonaws.com on Monday, November 11, 2013 at 14:30 using the R programming language [4].

Exploratory Analysis
Exploratory analysis was conducted to detect missing values and to identify variables suitable for a multivariate regression model predicting interest rate [5]. It was performed by examining tables, scatter plots and histograms of the data. Some variables were transformed to make statistical modeling easier.

Statistical Modeling
We used a standard multivariate linear regression model to relate interest rate to the other parameters present in the data. We selected this model on the assumption that the relationship between interest rate and FICO range is linear. We assumed the same kind of relationship between interest rate and the other variables used in our analysis.

Reproducibility
This analysis was performed using the R language and may be reproduced by obtaining the same data from spark-public.s3.amazonaws.com. The results may differ if the data available on the website change over time.

Results:
The analyzed data contain information on the following parameters:
1. Amount of requested money in dollars (Amount.Requested)
2. Amount of money that was loaned to the individual (Amount.Funded.By.Investors)
3. The lending interest rate (Interest.Rate)
4. Length of time of the loan in months (Loan.Length)
5. Purpose of the loan as stated by the applicant (Loan.Purpose)
6. The percentage of the consumer's gross income that goes toward paying debt (Debt.To.Income.Ratio)
7. The abbreviation for the U.S. state of residence of the applicant (State)
8. A variable indicating whether the applicant owns, rents or has a mortgage on their home (Home.ownership)
9. The monthly income of the applicant in dollars (Monthly.income)
10. A measure of the creditworthiness of the applicant, FICO score (FICO.Range)
11. The number of open lines of credit the applicant had at the time of application (Open.CREDIT.Lines)
12. The total amount outstanding on all lines of credit (Revolving.CREDIT.Balance)
13. The number of authorized queries about the applicant's creditworthiness in the 6 months before the credit was issued (Inquiries.in.the.Last.6.Months)
14. Length of time employed at current job (Employment.Length)

The dataset contained 2500 samples. We identified two missing values and excluded them from the analysis. All other measured variables showed no values outside the standard ranges. The distribution of the interest rate is bimodal and right-skewed; 70% of interest rate values lie in the range 10% to 20%. The median interest rate is 13.7%.

Correlation analysis showed a high negative correlation between interest rate and FICO score (about -0.71) and moderate positive correlations between interest rate and loan length and between interest rate and amount of money funded by investors (0.42 and 0.34, respectively). There are also slight positive correlations between interest rate and inquiries in the last six months and between interest rate and number of open credit lines (0.16 and 0.1, respectively).

We first fit a regression model relating interest rate to FICO range. Because the residuals showed non-random variation patterns, we attempted to explain them by adding variables that could plausibly be confounders to the regression model. The final multivariate regression model is as follows:

IR_model = b0 + b1(FICO.Range) + b2(Loan.Length) + b3(Amount.Funded.By.Investors) + b4(Open.CREDIT.Lines) + b5(Inquiries.in.the.Last.6.Months) + e

where b0 is the intercept term, e is the error term, and b1...b5 represent the change in interest rate associated with a one-unit change in the corresponding variable. All of these variables are statistically significant (P < 2e-16) for predicting the interest rate. The percentage of variance explained is about 76%.
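As a rough illustration of the modeling step described above, the following Python sketch fits a linear model of the same form by ordinary least squares and reports the fraction of variance explained. It is not the analysis code (the report used R), and it runs on synthetic data rather than the Lending Club dataset: the predictor ranges, the generating coefficients, the noise level, and the helper name fit_ols are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # synthetic sample size, chosen arbitrarily for the sketch

# Synthetic predictors. The ranges are illustrative assumptions,
# not values taken from the real dataset.
fico = rng.integers(640, 831, size=n).astype(float)      # FICO.Range
loan_length = rng.choice([36.0, 60.0], size=n)            # Loan.Length (months)
amount_funded = rng.uniform(1000.0, 35000.0, size=n)      # Amount.Funded.By.Investors
open_lines = rng.integers(2, 30, size=n).astype(float)    # Open.CREDIT.Lines
inquiries = rng.integers(0, 6, size=n).astype(float)      # Inquiries.in.the.Last.6.Months

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack(
    [np.ones(n), fico, loan_length, amount_funded, open_lines, inquiries]
)

# Hypothetical coefficients (b0..b5), used only to generate a synthetic response.
true_b = np.array([70.0, -0.09, 0.1, 0.0001, 0.05, 0.4])
y = X @ true_b + rng.normal(0.0, 2.0, size=n)  # synthetic interest rate, in percent


def fit_ols(X, y):
    """Return least-squares coefficients and R^2 for y ~ X.

    X is expected to already contain the intercept column.
    """
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


coef, r_squared = fit_ols(X, y)
names = ["b0", "b1 (FICO)", "b2 (length)", "b3 (amount)", "b4 (lines)", "b5 (inquiries)"]
for name, value in zip(names, coef):
    print(f"{name}: {value:.5f}")
print(f"R^2: {r_squared:.3f}")
```

On the real data, the same fit would be run on the 14-variable table after dropping rows with missing values; the residuals from the returned coefficients can then be plotted against each candidate variable to look for the non-random patterns mentioned above.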
A one-unit change in FICO score range corresponded to a change of b1 = -0.087 percentage points in interest rate (95% confidence interval: -0.089, -0.085) [6].
A one-unit change in loan length (in months) corresponded to a change of b2 = 0.13 percentage points in interest rate (95% confidence interval: 0.12, 0.14).
A one-unit change in the amount of money loaned by investors (in dollars) corresponded to a change of b3 = 0.00015 percentage points in interest rate (95% confidence interval: 0.00014, 0.00016).
A one-unit change in the number of open credit lines corresponded to a change of b4 = 0.049 percentage points in interest rate (95% confidence interval: 0.067, 0.03).
A one-unit change in the number of inquiries in the last six months corresponded to a change of b5 = 0.39 percentage points in interest rate (95% confidence interval: 0.32, 0.46).

For example, the interest rate for a person with a FICO score of 735, a loan length of 36 months, 20000 dollars loaned by investors, 14 open credit lines and 2 inquiries is predicted to be 12.6%.

Conclusions:
We found that interest rate is associated not only with FICO score, but also with loan length, amount of money funded by investors, number of inquiries in the last six months and number of open credit lines. Including these variables in the multivariate linear regression model improved the model fit, but the regression line still had a pronounced slope. According to the results of the analysis, the difference in interest rate between two people with the same FICO score may be explained by these other variables. Although the analysis indicates a clear association between interest rate and the other variables, a larger dataset (with more observations or more variables) may be better suited to understanding the relationship between interest rate and other loan parameters.

References:
1. Wikipedia, "Loan" page, URL: http://en.wikipedia.org/wiki/Loan, accessed 12/11/2013
2. Wikipedia, "Lending Club" page, URL: http://en.wikipedia.org/wiki/Lending_Club, accessed 12/11/2013
3. "How my FICO Score is calculated", URL: http://www.myfico.com/crediteducation/whatsinyourscore.aspx, accessed 12/11/2013
4. R Core Team (2012), "R: A language and environment for statistical computing", URL: http://www.r-project.org/
5. Alan O. Sykes, "An Introduction to Regression Analysis", Chicago University, 2008
6. Engineering Statistics Handbook, "What are confidence intervals?", URL: http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm, accessed 12/11/2013