Predicting Returns for Ongoing Loans


In a previous paper, we showed how we could predict the future return of a loan at its inception, based on historical data from the marketplace and the loan’s characteristics.

Although such information is crucial to decide what to invest in on the primary market, it is also beneficial to be able to re-assess expected returns when loans are already generating payments, in particular to monitor the performance of a portfolio or to compare strategies. The fact that a loan hasn’t defaulted yet bears information that should be used to update its default likelihood and increase its expected return.

The goal of the present paper is to offer a mathematical model to factor in such payments.

Internal Rate of Return

As seen previously, our preferred method for calculating loans performance is the Internal Rate of Return. The monthly rate of return r is the discounting rate such that the Net Present Value, or the sum of the discounted future cash flows minus the original investment equals 0. Given a loan of amount A that makes m monthly payments of p, r is such that:

$$ NPV = 0 = \frac {-A} {(1+r)^0} + \frac {p_1} {(1+r)^1} + \frac {p_2} {(1+r)^2} + … + \frac {p_{m}} {(1+r)^{m}} $$


$$ A = \sum_{i=1}^{m} \frac {p_i} {(1+r)^i} $$

The amount paid each month p is constant over time and known in advance, therefore predicting the rate of return only requires to estimate the number of payments m.

A series of identical payments made at fixed time intervals is called an annuity, and using the sum of geometric series formula, the previous formula can be rewritten as:

$$ A = p \times \frac{1-(1+r)^{-m}}{r} $$

If the borrower decides to repay the loan early, he will make an extraneous payment, of a different amount p’. That payment is likely to happen out-of-schedule. We can express that time of payment as i a floating number of months since issuance date. For instance, the Lending Club loan #69001 was issued on September 25th, 2009, and was repaid early on July, 5th, 2012, or i=33.31 months later. Likewise, a loan could default, yet generate some recovery payment at a later date.

We can include an optional adjustment, such as an early re-payment or late recovery, by adding its present value to the right side of the previous equation. Therefore we have:

$$ A = p \times \frac{1-(1+r)^{-m}}{r} + \frac{p’}{(1+r)^i} $$ (eq1)

Nota bene: obtaining the annual rate R requires to annualize the monthly rate r with the following formula: $(R = (1 + r)^{12} – 1 ) $

Number of Payments

Let $ h(t)$ be the unconditional probability of a loan to default during the month preceding t, calculated as the number of loans defaulting during the month preceding t divided by the total number of loans considered. The probability that a loan makes the first payment is  $ 1-h(1)$. Likewise, the probability to make a second payment is $ \left(h-(1)\right)-h(2)$.

By extension, the probability P(k) to keep paying until the kth month and calculated at issuance equals 1 minus all the probabilities that it defaulted before reaching k:

$$ P(k) = 1 – \sum_{i=1}^{k} h(i) $$

Which means in turn that at issuance, the expected total number of payments m made by a loan of term n is the sum of all those probabilities until maturity:

$$ m = \sum_{i=1}^{n} \left( 1 – \sum_{j=1}^{i} h(j) \right)$$

When k payments have been made already, the risk of default before reaching k drops to 0. Therefore m becomes:

$$ m = \sum_{i=1}^{k} 1 + \sum_{j=k+1}^{n} \left( 1 – \sum_{t=k+1}^{j} h(t) \right) $$

Which can be simplified in:

$$ m = k + \sum_{j=k+1}^{n} \left( 1 – \sum_{t=k+1}^{j} h(t) \right) $$ (eq2)

A central question is the homogeneity of borrowers’ behavior access time and risk profile. A study of 1,300,000 European consumer loans (Francesca, 2012) shows that while the probability of default is only monotonously decreasing for low-risk borrowers, the profiles of hazards rates remain somewhat similar.

Analyzing the hazard rate for all Lending Club loans by grade A-B, C-D and E-F-G also shows similar curves:
(Only the shape similarity is important, not the values themselves as including young loans makes early defaults over-represented)

Hazard Rate by Grade Categories
Hazard Rate by Grade Categories

Predicting Default Occurrences

We can estimate $ h(t)$, the probability of a loan to default between t and t–1, also called the hazard function, by observing the occurrence of defaults in historical data.

Lending Club’s historical data shows that out of 23,156 Lending Club loans old enough to have reached maturity, 2,871 have defaulted. Since we know when those defaulting loans stopped paying, we can graph the Probability density function, or hazard rate for defaulting loans with a term of 36 months. The risk of default sharply increases during the first third of its life, then decreases each month until maturity:

Hazard Rate for defaulting, 36-months loans
Hazard Rate for defaulting, 36-months loans

As seen in a previous paper, we can analyze the effect of the default rate on the ratio of missed payments (term minus payments made divided by term) through a Monte-Carlo simulation. This relationship is very linear, indicating the hazard rate is purely proportional to the default rate:

Missing Payments by Default rate
Missing Payments by Default rate

In other words, whatever the default probability of a loan, its hazard cuve will always have the same shape. For instance, a grade A and a grade F loans are both 4 times less likely to default on the 34th month than on the 13th.

Therefore knowing d the overall probability of default of a loan and Pdf(n,t) the generic probability distribution function of default occurring between the months t–1 and t when the term is n, we have: $$ h(n,t,d) = d \times Pdf(n,t) $$ (eq3)

Which means, with (eq2), that the predicted number of payments m is:

$$ m = k + \sum_{j=k+1}^{n} \left( 1 – d \sum_{t=k+1}^{j} Pdf(n,t) \right) $$ (eq4)

Predicting Return

By combining (eq1) and (eq4), we can obtain the monthly rate of return r for a loan of amount A, term n, installment p and default probability d when it has made k payments already:

$$ A = p \times \frac{1-(1+r)^{-\left( k + \sum_{j=k+1}^{n} \left( 1 – d \sum_{t=k+1}^{j} Pdf(n,t) \right) \right)}}{r} + \frac{p’}{(1+r)^i} $$ (eq5)

Let us take for example the Lending Club loan #3696613 that was issued on 09–12–2013. The loan has a sub-grade B2 with an interest rate of 11.14%. The loan amount is \$10,000, its term is 36, and it has to pay back \$328.06 per month, minus 1% service fees. To estimate the probability of default, we take the average default rate amongst loans with the same sub-grade, which is 10.31% amongst 1,280 B2 loans past maturity.

Number of payments made Expected number of payments Expected Return
0 33.83 6.80%
1 33.88 6.92%
2 33.96 7.08%
3 34.06 7.28%
4 34.17 7.50%
5 34.29 7.73%
6 34.41 7.97%
7 34.53 8.20%
8 34.65 8.43%
9 34.76 8.66%
10 34.87 8.87%
11 34.98 9.07%
12 35.08 9.26%
13 35.17 9.44%
14 35.26 9.60%
15 35.34 9.75%
16 35.42 9.90%
17 35.49 10.03%
18 35.55 10.14%
19 35.61 10.25%
20 35.66 10.35%
21 35.71 10.44%
22 35.76 10.53%
23 35.80 10.60%
24 35.83 10.67%
25 35.87 10.72%
26 35.90 10.77%
27 35.92 10.82%
28 35.94 10.86%
29 35.96 10.89%
30 35.97 10.91%
31 35.98 10.93%
32 35.99 10.94%
33 35.99 10.95%
34 36.00 10.96%
35 36.00 10.96%
36 36.00 10.96%

Late Payments

According to Lending Club, the probabilities of defaulting once a loan is late in payment are as follows:

Status Probability of default
In Grace (1 to 15 days late) 23%
Late (16 to 30 days) 49%
Late (31 to 60 days) 62%
Late (61 to 90 days) 78%
Late (91 to 120 days) 84%
Probability of Default by Days late
Probability of Default by Days late

These data can be interpolated with a bi-exponential function, such that the probability h2(l) that a loan won’t make any further payments when it is l days late is:

$$ h_2(l) = a_1 + a_2 \cdot e^{-a_3 \cdot l} + a_4 \cdot e^{-a_5 \cdot l} $$ with a1 = 1.0191, a2=–0.3564, a3=0.0857, a4=–0.6653 and a5=0.01278

Historical data do not show what happens for loans that resume paying, and we have to consider that they return to their previous default risk levels. Since the missed payments accumulate, they are to be paid once the loan becomes current again, and late fees shall compensate for delayed payments. In other words, the loan is ‘back to normal’. Therefore, given (eq4), the expected number of payments m for a loan of term n, default probability d, that has made k payments already and which is l days late is:

$$ {m = k + 0 \text { with probability } h_2(l) k + \sum_{j=k+1}^{n} \left( 1 – d \sum_{t=k+1}^{j} Pdf(n,t) \right) \text {with probability } \left(1 – h_2(l) \right)} $$

Which can be simplified in:

$$ m = k + \left (1 – h_2(l) \right) \cdot \sum_{j=k+1}^{n} \left( 1 – d \sum_{t=k+1}^{j} Pdf(n,t) \right) $$ (eq6)

Combining (eq6) with (eq5) gives:

$$ A = p \times \frac{1-(1+r)^{-\left( k + \left(1 – h_2(l) \right) \cdot \sum_{j=k+1}^{n} \left( 1 – d \sum_{t=k+1}^{j} Pdf(n,t) \right) \right)}}{r} + \frac{p’}{(1+r)^i} $$ (eq7)

For instance, if the loan #3696613 paid 9 months already then begins to be late in payments, we have:

Number of days late Expected number of payments Expected Return
0 34.76 8.66%
1 33.72 6.58%
2 32.33 3.60%
3 30.71 –0.19%
5 27.92 –7.68%
10 22.91 –24.78%
20 17.48 –49.47%
30 14.75 –64.03%
40 13.14 –72.87%
50 12.07 –78.60%
60 11.30 –82.49%
70 10.74 –85.22%
80 10.31 –87.18%
90 9.99 –88.59%


Francesca, G., 2012. A Discrete-Time Hazard Model for Loans: Some Evidence from Italian Banking System. American Journal of Applied Sciences


  1. Nicolas says:

    Dear Mr. Marot,

    trying to replicate your approach to forecasting the default probability of current and late loans of the up-to-date LC loan book, I have come to point where I do not know how to proceed anymore.

    I have been able to calculate and plot the Kaplan-Meier Failure/Survival rate as well as the Cox Prob function.
    However, given my idea of eventually calculating the ROI and default rate of the current loans on LC, I simply do not know how to proceed, even given the wonderful explanation in this study.

    Would you or one of the dear readers be able to shortly summarize the next step necessary to determine the default rate of the current loans – given the my results (NOTE: I am unsure what determines the actual predicted default rate of loan of specific remaining runtime!)

    The calculations are for my master thesis

    Thank you very much in advance!

    1. Emmanuel Marot Emmanuel Marot says:


      Once you get the cumulative hazard rate and the overall risk of default, then the probability a payment is missed at time t equals risk of default * cumulative hazard, so you discount it by that. For instance, if a loan has a monthly installment of $120, an overal probability of default of 15% and a cumulative hazard of 0.8 at month 25, then the payment you take into account for that 25th month is $120 * ( 1 – 0.15 * 0.8) = $105.60.

      You sum all those discounted payments, compare the sum with the loan amount and that should give you the ROI.


      you may discount a payment at time t by multiplying it by (1-hazard)*(1-default risk). For instance, a $1

      1. Nicolas says:

        Mr. Marot,
        First of all, thank you so much for your reply – I can imagine your available time is scarce and valuable.

        Given your further explanation, I know understand how to eventually calculate ROI after summing the discounted payments for every loan in the data sample and dividing it by the original loan amount – thank you very much for your explanation.
        However, there is just one pieces unclear to me regarding the overall risk of default:
        Preparing the LC loan book in Excel, I calculate the runtime of every loan (Last payment date – Issue date in “months”), given the fact that most loans are still “Current”. This “runtime” variable is used as the “survival time”, given the loan has not defaulted or has not been paid back yet – depending on the individual status, naturally. Following, I use STATA to calculate both the Kaplan-Meier Failure/Survival rate as well as the cumulative hazard rate.

        My question is just now: What is the overall probability of default or how – better – do I calculate the overall probability of default? Is it simply the hazard rate for each month run?

        Again, thank you very much in advance!


Leave a Reply