Prosper.com Lending Data Analysis: Understanding Borrower Behavior for Wise Investment, Exams of Statistics

This document analyzes Prosper.com's peer-to-peer lending data, focusing on borrower behavior and debt consolidation loans. The data include each borrower's loan details: amount borrowed, rate, credit grade, home ownership status, length of loan, and purpose. Logistic regression models examine the relationship between borrower credit grade, treated both as a nominal and as an ordinal predictor, and current payment status. The results indicate that credit grade is significantly associated with current payment status.

Typology: Exams

Pre 2010

Uploaded on 07/23/2009

koofers-user-km7
Stat 665 (Spring 2008)
Kaizar
Midterm Data and Analyses
This packet describes the data and several analyses conducted on this data.
You will need this to answer the questions presented in a separate packet.

The Data

Prosper.com is an online marketplace of peer-to-peer lending, where someone who would like to borrow money (the borrower) posts a proposal for the loan, and then lenders can bid to fund that loan. Once the loan is funded, the lender gives the borrower the money, and then the borrower makes regular payments to the lender. An example finalized loan is as follows:

Variable | Value for this loan | General Description
Title | Bye Bye Credit Cards! |
Amount Borrowed | $3, |
Rate | 6.86% | this is the rate that the borrower pays for the loan
Credit Grade | AA | this is a measure of the borrower’s credit. The grades are AA, A, B, C, D, E, and HR. AA is the best and HR is the worst.
Home Owner | True | this indicates whether or not the borrower owns a home
Length of Loan | 3 years |
Purpose | Debt Consolidation | For the purposes of the exam, I collapsed the category into two options: debt consolidation and other.

Suppose you are a potential lender. You would like to make a wise investment, and so you would like to understand what kind of borrowers keep their accounts current. Prosper.com allows you to download the entire set of currently active loans, including whether or not the borrower is late in their payments. You will use a subset of this history, which contains 17699 observations, to try to better understand what kind of borrowers make a good investment. Because of Prosper.com restrictions on loans, it is reasonable to assume that each borrower only has one loan in this dataset, and so the observations are independent.

The variable “Current” in the data set is coded 0 for late, and 1 for current (not late). That is, for loan i,

$$
\text{Current}_i =
\begin{cases}
1 & \text{if the borrower is current with his/her payments (good for the lender)} \\
0 & \text{if the borrower is late making a payment (bad for the lender)}
\end{cases}
$$

The variable “Credit Grade” is treated in two ways. The first is nominal, which is coded in the variable “Grade”. The second is ordinal, which is coded in the variable “GradeOrdinal” according to the scale:

Grade:         AA   A   B   C   D   E   HR
GradeOrdinal:   1   2   3   4   5   6    7

C. A model using Grade as an ordinal predictor:

Call:
glm(formula = Current ~ GradeOrdinal, family = binomial("logit"), data = llsubset)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5986   0.2637   0.4267   0.6762       1.

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.83715    0.07053   54.41    <2e-
GradeOrdinal  -0.49556    0.01302  -38.07    <2e-


(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 16960 on 17698 degrees of freedom
Residual deviance: 15193 on 17697 degrees of freedom
AIC: 15197

Number of Fisher Scoring iterations: 5

The variance-covariance matrix for the coefficient estimates:

               (Intercept)  GradeOrdinal
(Intercept)   0.0049741146           -0.
GradeOrdinal -0.0008787796            0.

D. A model using Grade as a nominal predictor:

Call:
glm(formula = Current ~ Grade, family = binomial("logit"), data = llsubset)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.5913   0.2663   0.4454   0.6103       1.

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.3219     0.1324  25.083   < 2e-
GradeA       -0.6521     0.1633  -3.993   6.53e-
GradeB       -1.0610     0.1487  -7.137   9.54e-
GradeC       -1.5222     0.1412 -10.778   < 2e-
GradeD       -1.7358     0.1404 -12.362   < 2e-
GradeE       -2.4262     0.1401 -17.321   < 2e-
GradeHR      -3.0578     0.1381 -22.135   < 2e-


(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 16960 on 17698 degrees of freedom
Residual deviance: 15154 on 17692 degrees of freedom
AIC: 15168

Number of Fisher Scoring iterations: 5

E. A model using Grade as an ordinal predictor interacted with whether or not the loan is for debt consolidation, with plot:

Call:
glm(formula = Current ~ Consolidation * GradeOrdinal, family = binomial("logit"), data = llsubset)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.2584   0.1272   0.4576   0.7121       1.

Coefficients:
                                            Estimate Std. Error z value Pr(>|z|)
(Intercept)                                  3.64514    0.07077  51.505   < 2e-
ConsolidationDebtConsolidation               2.14853    0.77334   2.778       0.
GradeOrdinal                                -0.48048    0.01308 -36.744   < 2e-
ConsolidationDebtConsolidation:GradeOrdinal  0.23539    0.10309   2.283       0.

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 16960 on 17698 degrees of freedom
Residual deviance: 14668 on 17695 degrees of freedom
AIC: 14676

Number of Fisher Scoring iterations: 7

[Plot: Probability of Payment being Current (y-axis) against Ordinal Credit Grade (x-axis); legend: Other, Debt Consolidation]

Question 1 (5 points total)

Based on the tables in results part A, answer the following.

(a) (4 points) Is the current status of the loan independent of whether or not the loan is for debt consolidation? Note which test you are using and why you chose it. Show your work.

We learned about three different tests of independence for 2x2 tables. Fisher’s exact test is not necessary, since all of the cells have greater than 5 observations. Because the sample size is quite large, we aren’t too concerned about the χ^2 approximation for either Pearson’s χ^2 test or the Likelihood Ratio test G^2. However, since one of the cell counts is only 15, Pearson’s χ^2 test might be better. In this case, all 3 would be acceptable. This is the table of differences between the observed and expected cell counts:

            Other      Debt Consolidation
Late        317.53     −317.
Current     −317.53    317.

Thus, the Pearson χ^2 statistic is:

$$
X^2 = \sum_{i,j} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}} =
$$

X^2 is asymptotically distributed according to a χ^2 distribution with 1 degree of freedom. Using the chart, I see that the p-value is < 0.0005. Thus, I reject the null hypothesis of independence and conclude that the current status and debt consolidation are not independent.
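As an illustration of the calculation in part (a), here is a minimal Python sketch of Pearson's test for a 2x2 table. The counts in `hypothetical_counts` are made up for the example (they are not the exam data), and the function name is likewise an assumption. The sketch shows that the observed-minus-expected differences in a 2x2 table share one magnitude with alternating signs, as in the table of differences above.

```python
import math

def pearson_chi2_2x2(table):
    """Return (differences, X^2, p-value) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected counts under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    diffs = [[table[i][j] - expected[i][j] for j in range(2)] for i in range(2)]
    x2 = sum(diffs[i][j] ** 2 / expected[i][j]
             for i in range(2) for j in range(2))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    return diffs, x2, p_value

# Hypothetical counts for illustration only (rows: Late, Current;
# columns: Other, Debt Consolidation). NOT the exam data.
hypothetical_counts = [[30, 10], [20, 40]]
diffs, x2, p_value = pearson_chi2_2x2(hypothetical_counts)
print(diffs)
print(round(x2, 3), p_value)
```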

(b) (1 point) Based on your answer to part (a), would you rather give a loan to someone who was consolidating their debt, or who was using the money for some other purpose? Briefly explain.

Because the odds ratio is 30.6, the “other” loans are more likely to have a late payment than the “debt consolidation” loans. Thus, since we already know this is a significant difference, I would rather give a loan to someone who wants to consolidate debt.

Question 2 (13 points total)

Answer the following questions based on the model in results part C:

(a) (3 points) Describe the generalized linear model fit here in terms of its random component, systematic component and link function.

Random Component: $\text{Current}_i \sim \text{Binomial}(1, \pi_i)$

Systematic Component:

$$
g(\pi_i) = \alpha + \beta \times \text{GradeOrdinal}_i
$$

Link Function:

$$
g(\pi_i) = \operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right)
$$

(b) (1 point) Report the estimated prediction equation.

$$
\operatorname{logit}(\pi_i) = 3.837 - 0.496 \times \text{GradeOrdinal}_i
$$

(c) (4 points) Is the ordinal credit grade significantly associated with the borrower’s current payment status? Report the name, null hypothesis, value of the test statistic, the distribution of your test statistic, and conclusions of your test.

Test: Wald Test
Null Hypothesis: H_0 : β_GradeOrdinal = 0
Test Statistic: -38.
Distribution: Under the null hypothesis, the test statistic is distributed according to a standard Normal distribution.
Conclusions: Because the p-value of the test is very small (< 2 × 10^−16), I reject the null hypothesis and conclude that the ordinal credit grade is significantly associated with the current payment status.
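The Wald statistic is the coefficient estimate divided by its standard error. A minimal Python sketch, using the GradeOrdinal estimate and standard error printed in the model C output; the variable names are assumptions made for the example. Because the printed inputs are rounded, the ratio can differ in the last digit from the z value in the R output.

```python
import math

# Values copied from the model C coefficient table
estimate = -0.49556    # GradeOrdinal coefficient
std_error = 0.01302    # its standard error

# Wald z statistic
z = estimate / std_error

# Two-sided p-value under a standard normal:
# 2 * P(Z > |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2))
print(p_value < 2e-16)
```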

(d) (3 points) Estimate the probability of the borrower being current on the loan if the borrower has credit grade “B”. Show your work.

$$
\operatorname{logit}(\pi_i) = 3.837 - 0.496(\text{GradeOrdinal}) = 3.837 - 0.496(3) = 2.35
$$

$$
\pi_i = \frac{e^{2.35}}{1 + e^{2.35}} = 0.9129
$$
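The same calculation can be sketched in Python: map the grade to its ordinal code, evaluate the linear predictor, and apply the inverse logit. The helper name `expit` and the dictionary `grade_ordinal` are assumptions made for the example; the coefficients are the rounded values from the prediction equation in part (b).

```python
import math

def expit(x):
    """Inverse of the logit link: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Ordinal coding of credit grade, as defined in the data description
grade_ordinal = {"AA": 1, "A": 2, "B": 3, "C": 4, "D": 5, "E": 6, "HR": 7}

# Rounded coefficients from the estimated prediction equation
intercept, slope = 3.837, -0.496

eta = intercept + slope * grade_ordinal["B"]   # log odds of being current
prob = expit(eta)                              # probability of being current

print(round(eta, 3), round(prob, 4))
```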

(e) (2 points) Construct a 95% confidence interval for the probability you found in part (d). Show your work.
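One common approach is a Wald interval on the logit scale, with the endpoints transformed back to the probability scale. The Python sketch below follows that approach and rests on two assumptions, because part of the variance-covariance matrix is truncated in this copy: the variance of the GradeOrdinal coefficient is approximated by squaring its reported standard error (0.01302), and the covariance is taken from the surviving lower-left entry, relying on symmetry. All names are hypothetical.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficient estimates from the model C output
b0, b1 = 3.83715, -0.49556

# Variance-covariance entries
var_b0 = 0.0049741146      # printed in the matrix
cov_b0_b1 = -0.0008787796  # printed lower-left entry (matrix is symmetric)
var_b1 = 0.01302 ** 2      # ASSUMPTION: squared standard error, since the
                           # printed matrix entry is truncated

x = 3  # GradeOrdinal for credit grade "B"

eta = b0 + b1 * x
# Var(b0 + b1*x) = Var(b0) + x^2 Var(b1) + 2 x Cov(b0, b1)
var_eta = var_b0 + x ** 2 * var_b1 + 2 * x * cov_b0_b1
se_eta = math.sqrt(var_eta)

z_crit = 1.96  # approximate 97.5th percentile of the standard normal
lower = expit(eta - z_crit * se_eta)
upper = expit(eta + z_crit * se_eta)

print(round(lower, 3), round(upper, 3))
```

Building the interval on the logit scale and then transforming keeps both endpoints inside (0, 1), which a symmetric interval built directly on the probability scale does not guarantee.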

Question 3 (6 points total)

Answer the following questions based on the models in results parts C, D, and E.

(a) (2 points) Consider models C and D. Which model do you prefer? Briefly explain.

Because these models are not nested, we cannot compare them using a likelihood ratio test. Thus, we must rely on AIC or BIC. Since AIC is provided (and the data is ungrouped, making the sample size specification for BIC difficult), I will use this to compare the two models. Since the AIC for model D is smaller than that for model C, this model fits better than model C, and I prefer model D.
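For ungrouped binary data the residual deviance equals −2 times the maximized log-likelihood, so each reported AIC can be checked as residual deviance plus twice the number of estimated coefficients. A short Python sketch of that check; the dictionary layout is an assumption made for the example.

```python
# (residual deviance, number of estimated coefficients) from the R output
models = {
    "C": (15193, 2),   # intercept + GradeOrdinal
    "D": (15154, 7),   # intercept + six Grade indicators
}

# AIC = residual deviance + 2 * (number of coefficients) for ungrouped binary data
aic = {name: deviance + 2 * k for name, (deviance, k) in models.items()}

# Smaller AIC is preferred
preferred = min(aic, key=aic.get)

print(aic, preferred)
```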

(b) (4 points) Conduct a Likelihood Ratio test of models C and E. Be sure to state your null hypothesis, and show your work. According to your test, which model is preferred?

Null hypothesis: The coefficient for “Debt Consolidation” and the interaction between “Debt Consolidation” and ordinal “Grade” are zero.

$$
\text{LRT} = D_C - D_E = 15193 - 14668 = 525
$$

The difference in the number of parameters in the two models is 2, so the LRT statistic has an asymptotic χ^2 distribution with 2 degrees of freedom. According to the table, the p-value is < 0.0005, and so I reject the null hypothesis and prefer the larger model E.
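The test can be reproduced from the two residual deviances. The Python sketch below uses the closed form of the χ^2 upper tail for 2 degrees of freedom, P(X > x) = exp(−x/2), so it needs no statistics library; the variable names are assumptions made for the example.

```python
import math

# Residual deviances and coefficient counts from the R output
dev_c, params_c = 15193, 2   # model C
dev_e, params_e = 14668, 4   # model E

lrt = dev_c - dev_e          # likelihood ratio statistic
df = params_e - params_c     # difference in number of parameters

# For a chi-square with exactly 2 df, the upper tail is exp(-x / 2).
# This shortcut is only valid because df == 2 here.
assert df == 2
p_value = math.exp(-lrt / 2)

print(lrt, df, p_value < 0.0005)
```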

Question 4 (6 points total)

Answer the following questions based on the model in results part E.

(a) (4 points) In the context of this problem, interpret the effect of credit grade on the probability of a borrower’s payments being current.

Because the effect of credit grade interacts with the effect of the purpose of the loan (debt consolidation or not), we cannot interpret the main effects without considering the interaction term. Thus, I consider the two cases separately: those who are consolidating debt, and those who are not.

For those who are not consolidating debt, a one-unit increase in the credit scale results in a 0.48 decrease in the log odds of current payment status. Looking at the plot, we see that when this is transferred to the probability scale, the effect of credit grade on the probability of a current payment is dramatic, with those having the best credit having about a 95% expected probability of being current and those with the worst credit having nearly a 50% expected probability of being current.

For those who are consolidating debt, a one-unit increase in the credit scale results in a 0.48 − 0.24 = 0.24 decrease in the log odds of a current payment status. Looking at the plot, we see that when this is transferred to the probability scale, the effect of credit grade on the probability of a current payment is not dramatic, since the probability of a current payment is already quite high for this group. Thus, the change in credit grade has little effect on the probability of a current credit status.
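To see the two cases on the probability scale without the plot, the fitted probabilities can be computed from the model E coefficients. A minimal Python sketch; the function and variable names are assumptions made for the example, and the coefficients are copied from the model E output.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

# Coefficients from the model E output
b_intercept = 3.64514
b_consolidation = 2.14853   # ConsolidationDebtConsolidation
b_grade = -0.48048          # GradeOrdinal
b_interaction = 0.23539     # ConsolidationDebtConsolidation:GradeOrdinal

def prob_current(grade_ordinal, consolidation):
    """Fitted probability of being current; consolidation is 1 or 0."""
    eta = (b_intercept
           + b_consolidation * consolidation
           + b_grade * grade_ordinal
           + b_interaction * consolidation * grade_ordinal)
    return expit(eta)

# Fitted probabilities for each ordinal grade, 1 (AA) through 7 (HR)
for grade in range(1, 8):
    other = prob_current(grade, 0)
    debt = prob_current(grade, 1)
    print(grade, round(other, 3), round(debt, 3))
```

The slope on the log odds scale is `b_grade` for other loans and `b_grade + b_interaction` for debt consolidation loans, which is the 0.48 versus 0.24 comparison made in the answer.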

(b) (2 points) Briefly suggest one way you might assess the fit of this model, either using the output provided or if you had access to the data and a computer.

The one thing you can’t do to assess the fit is to use the deviance of this ungrouped data. Other than that, you could use residuals, or influence diagnostics, or group the data to look at deviance. To compare the model to other models, you can use likelihood ratio tests or the area under the ROC curve. However, you are then choosing the best among a limited number of models, all of which may have a poor fit.

END OF THE EXAM