LendingRobot Can Improve Returns in Peer Lending

Summary

Back-testing simulations show that relying on LendingRobot’s proprietary scoring mechanism can significantly increase investment performance on both the Lending Club and Prosper marketplaces.

Based on an analysis of 277,814 loans, both ongoing and mature, Lending Club returns on average 6.93% per year to investors (yearly internal rate of return, net of Lending Club fees). The top 25% of loans, as scored by the LendingRobot model, return 9.53% per year before LendingRobot fees, or 2.60% more than the marketplace average. Even after factoring in LendingRobot’s fees of 0.45% per year, that still represents a net performance increase of 2.15%.

Similar simulations with Prosper show an even bigger advantage: LendingRobot’s top quartile returns 8.06% per year after LendingRobot fees (8.51% before them), compared to 2.81% for the marketplace average.

| Marketplace | Total Funded Amount (USD) | Total Number of Loans | Average Return | Top LendingRobot Quartile Return |
|---|---|---|---|---|
| Lending Club | 3,865,227,475 | 277,814 | 6.93% | 9.53% |
| Prosper | 1,365,035,192 | 146,504 | 2.81% | 8.51% |
| Combined | 5,230,262,667 | 424,318 | 5.54% | 9.12% |

Nota Bene: Average returns may seem surprisingly low because of the calculation method used (see below). On the Prosper marketplace, underwriting principles have changed dramatically since inception, and taking old data into account heavily penalizes average returns. However, we considered it more reliable to avoid any cherry-picking in this study, which is why we factored in all the available data without excluding any of it.

Introduction

Peer lending returns differ significantly based on which loans are selected. The best half of Lending Club loans gives a return of 14.20%, the worst half, –0.30%.

Relying on loan grades alone for selection does not significantly improve returns. Logically enough, better grades have lower delinquency risks, but also offer lower interest rates.

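| Lending Club Grade | Number of Loans | Average Interest Rate | Average Return |
|---|---|---|---|
| A | 45,430 | 7.64% | 4.38% |
| B | 88,434 | 11.77% | 6.30% |
| C | 72,531 | 15.12% | 7.82% |
| D | 42,241 | 18.07% | 8.30% |
| E | 19,129 | 20.77% | 9.04% |
| F | 8,145 | 23.12% | 8.46% |
| G | 1,904 | 24.06% | 5.73% |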

Obtaining above-average performance requires identifying loans whose interest rates do not fully reflect their default risk, which in turn requires assessing the probability of default more accurately. Incidentally, constantly monitoring the most popular loans, as provided on LendingRobot’s Latest Lending Club loans page, shows that educated investors mostly seek middle to low grades and stay away from the safest loans, indicating that they also focus on over-penalized borrowers.

Because retail investors lack experience in loan underwriting and industry background, and because loans are heterogeneous, predicting the risk of default is a complex endeavor. Competition amongst investors also requires acting fast once a new loan becomes available. Hence the appeal of relying on machine-learning techniques to provide simple-to-use, timely and sophisticated scores.

Estimating Loan Returns

The most popular performance measurement is ‘Return on Investment’, which is simply $\frac{P-A}{A} $, where A is the funded amount and P the sum of payments. However, this measurement does not take the time value of money into account and is therefore too optimistic. Lending Club shows ‘Net Annualized Returns’ (NAR), which have the benefit of being more realistic. Unfortunately, because of the way it is calculated, the NAR systematically decreases over time: as of July 2014, the initial median return is 10.6%, but it drops to 6.9% 30 months later.

For measuring performance, LendingRobot uses the Internal Rate of Return (IRR), which is the discount rate r at which the funded amount equals the sum of the present values of the payments:

$$ A = \frac{p_1}{(1+r)^1} + \frac{p_2}{(1+r)^2} + \dots + \frac{p_n}{(1+r)^n} $$
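To make the difference between the two measures concrete, here is a minimal sketch that computes the ROI $\frac{P-A}{A}$ and solves the IRR equation above for a monthly rate r by bisection, then annualizes it. The function names, the bisection bracket and the 12% fully amortizing example loan are illustrative assumptions, not LendingRobot's implementation.

```python
def monthly_irr(amount, payments, lo=-0.99, hi=1.0, tol=1e-12):
    """Solve amount = sum(p_i / (1 + r)**i) for the monthly rate r by bisection.

    Returns None when no root lies in [lo, hi] (for instance, no payments at all).
    """
    def npv(r):
        return sum(p / (1 + r) ** i for i, p in enumerate(payments, start=1)) - amount

    f_lo = npv(lo)
    if f_lo * npv(hi) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid)
        if f_lo * f_mid <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


def roi(amount, payments):
    """Return on Investment: (P - A) / A, ignoring when payments arrive."""
    return (sum(payments) - amount) / amount


def annualize(r):
    """Convert a monthly rate into a yearly rate: (1 + r)**12 - 1."""
    return (1 + r) ** 12 - 1


# Hypothetical example: a $1,000, 36-month loan at 1% per month, fully repaid.
pmt = 1000 * 0.01 / (1 - 1.01 ** -36)
payments = [pmt] * 36
print(roi(1000, payments), annualize(monthly_irr(1000, payments)))
```

The ROI here (about 19.6%) is a cumulative figure over three years that treats a dollar received in month 36 like one received in month 1, whereas the IRR recovers the 1% monthly rate, or about 12.68% per year.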

The main difficulty when measuring investment performance in peer lending is taking into account loans that are not mature yet. Due to the rapid growth of the industry, the loans old enough to have reached maturity represent only a small fraction of all the loans that have been issued. For instance, from Lending Club’s publicly available data as of April 2014 (the data are updated quarterly), out of the 277,576 loans issued by Lending Club, only 19,970, or 7% of them, are old enough to have reached maturity.

Restricting the analysis to mature loans would introduce two issues. First, less data means a higher noise-to-signal ratio. Second, loans issued several years ago were likely issued under different economic conditions, to different borrower demographics and with different underwriting processes.

Similarly, a back-tested approach is required, since both LendingRobot and its model are too young to measure actual performance.

LendingRobot has developed a method to estimate the future returns of ongoing loans that factors in the number of payments they have already made: the more payments a loan has made, the less likely it is to default. This method makes it possible to include ongoing loans when back-testing performance.

This method takes into account a pre-defined risk of default and a distribution of that risk over time. The distribution of default probabilities over time, also called the hazard rate, is determined empirically by analyzing when loans stopped paying on the marketplace, and is considered constant for a given term. As for the pre-defined risk of default itself, it is crucial NOT to rely on LendingRobot’s proprietary prediction system but on something entirely different: using the same method both to select loans and to measure the selection’s performance would be utterly misleading. To get a default rate estimated outside of LendingRobot, the current analysis simply relies on historical averages by (sub)grade.

These default rates by grade are observed from the marketplaces’ historical data. They are smoothed and interpolated with a bi-exponential function of the form $ y = a + b e^{-cx} + d e^{-fx}$ to remove noise and make them monotonically increasing. For instance, for Lending Club:

| Lending Club Sub-grade | Number of Loans | Default Rate | Smoothed Default Rate |
|---|---|---|---|
| A1 | 312 | 2.56% | 1.51% |
| A2 | 699 | 3.72% | 3.32% |
| A3 | 1,081 | 4.44% | 4.91% |
| A4 | 1,452 | 5.10% | 6.32% |
| A5 | 1,698 | 7.01% | 7.58% |
| B1 | 955 | 9.42% | 8.72% |
| B2 | 1,088 | 10.11% | 9.76% |
| B3 | 1,219 | 10.58% | 10.72% |
| B4 | 1,197 | 12.70% | 11.62% |
| B5 | 1,326 | 11.61% | 12.48% |
| C1 | 1,028 | 14.88% | 13.31% |
| C2 | 992 | 14.01% | 14.12% |
| C3 | 909 | 16.61% | 14.91% |
| C4 | 840 | 13.45% | 15.70% |
| C5 | 750 | 16.53% | 16.50% |
| D1 | 709 | 19.04% | 17.31% |
| D2 | 652 | 19.17% | 18.13% |
| D3 | 579 | 20.73% | 18.97% |
| D4 | 477 | 20.55% | 19.84% |
| D5 | 386 | 18.91% | 20.74% |
| E1 | 315 | 24.44% | 21.68% |
| E2 | 265 | 23.40% | 22.65% |
| E3 | 211 | 18.48% | 23.67% |
| E4 | 163 | 25.77% | 24.74% |
| E5 | 152 | 20.39% | 25.85% |
| F1 | 100 | 28.00% | 27.02% |
| F2 | 98 | 29.59% | 28.25% |
| F3 | 69 | 33.33% | 29.54% |
| F4 | 64 | 34.38% | 30.90% |
| F5 | 51 | 39.22% | 32.33% |
| G1 | 43 | 25.58% | 33.83% |
| G2 | 33 | 39.39% | 35.42% |
| G3 | 29 | 34.48% | 37.09% |
| G4 | 47 | 40.43% | 38.84% |
| G5 | 61 | 37.70% | 40.69% |
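The article does not say how the bi-exponential curve is fitted. As one possible sketch, assuming an unweighted least-squares fit over sub-grades numbered x = 1 (A1) to x = 35 (G5), the exponents c and f can be grid-searched while a, b and d, which enter the model linearly, are solved exactly for each (c, f) pair. The grid, the helper names `fit_biexponential` and `biexp`, and the absence of weights by number of loans are all hypothetical choices; the grid includes negative exponents so that either term may grow with x.

```python
import math
from itertools import product


def _solve3(m, v):
    """Solve the 3x3 system m @ x = v (Gaussian elimination, partial pivoting).

    Returns None if the system is numerically singular.
    """
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            return None
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def biexp(params, x):
    """Evaluate y = a + b*exp(-c*x) + d*exp(-f*x)."""
    a, b, c, d, f = params
    return a + b * math.exp(-c * x) + d * math.exp(-f * x)


def fit_biexponential(xs, ys, grid):
    """Grid-search c > f; for each pair solve a, b, d by linear least squares."""
    best = None
    for c, f in product(grid, grid):
        if c <= f:  # the two terms are interchangeable, and c == f is singular
            continue
        cols = [[1.0] * len(xs),
                [math.exp(-c * x) for x in xs],
                [math.exp(-f * x) for x in xs]]
        # Normal equations (X^T X) coef = X^T y for the linear coefficients.
        m = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
        v = [sum(p * y for p, y in zip(ci, ys)) for ci in cols]
        coef = _solve3(m, v)
        if coef is None:
            continue
        params = (coef[0], coef[1], c, coef[2], f)
        sse = sum((biexp(params, x) - y) ** 2 for x, y in zip(xs, ys))
        if math.isfinite(sse) and (best is None or sse < best[0]):
            best = (sse, params)
    return None if best is None else best[1]
```

Usage would look like `fit_biexponential(list(range(1, 36)), raw_rates, [i / 100 for i in range(-20, 51)])`, with `raw_rates` being the Default Rate column as fractions. Because the article's fitting details are unknown, the result need not reproduce the Smoothed Default Rate column exactly, and a bi-exponential fit is not guaranteed to be monotonic, so the output should be checked for that property.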

Knowing its expected default rate and how many payments a loan has already made, we can calculate its expected monthly internal rate of return r with the following formula:

$$ A = p \times \left( \sum_{i=1}^{k} \frac{1}{(1+r)^i} + \left(1 - h_2(l)\right) \times \sum_{j=k+1}^{n} \frac{1 - d \sum_{t=k+1}^{j} Pdf(n,t)}{(1+r)^j} \right) + \frac{p'}{(1+r)^{t'}} \qquad \text{(eq1)} $$

Where:

  • $A$ is the loan funded amount
  • $p$ is the monthly payment amount
  • $k$ is the number of monthly payments already made
  • Therefore $ p \times \sum_{i=1}^{k} \frac {1} {(1+r)^i}$ is the present value of the $k$ payments already made
  • $ Pdf(n,t) $ is the probability of default occurring between months $t-1$ and $t$ for a loan with term $n$, as described in detail in a previous paper
  • Which gives $ \sum_{t=k+1}^{j} Pdf(n,t) $, the cumulative hazard rate: the probability that a loan stops paying before month $j$, given that it has already made $k$ payments and is certain to default
  • As shown in the paper mentioned above, the hazard rate is homomorphic to the default probability: the overall default risk scales the hazard rate up or down without modifying its curvature
  • Therefore $ 1 - d \sum_{t=k+1}^{j} Pdf(n,t) $ is the probability of making payment $j$ when the estimated default rate of the loan is $d$
  • $ h_2(l) $ is the probability that the loan stops making further payments when it is already $l$ days late, also described in detail in the same paper
  • $ \frac{p'}{(1+r)^{t'}}$ is the present value of an extra payment $p'$ occurring at time $t'$, such as an early repayment.

The monthly rate of return r is then annualized as $ R = (1 + r)^{12} - 1 $.
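A minimal sketch of solving (eq1) for the monthly rate r by bisection and then annualizing it. The hazard distribution $Pdf(n,t)$ comes from a previous paper that is not reproduced here, so `uniform_pdf` below is a purely hypothetical placeholder (default equally likely in every month), and $h_2(l)$ is passed in as a plain number. Function and parameter names are illustrative assumptions.

```python
def uniform_pdf(n, t):
    """HYPOTHETICAL placeholder for Pdf(n, t): default equally likely each month."""
    return 1.0 / n


def expected_monthly_irr(A, p, n, k, d, h2, pdf, extra=0.0, t_extra=1,
                         lo=-0.99, hi=1.0, tol=1e-12):
    """Solve eq1 for the monthly rate r by bisection; None if no root in [lo, hi].

    A: funded amount, p: monthly payment, n: term, k: payments already made,
    d: estimated default rate, h2: h_2(l) for the current lateness l,
    extra / t_extra: the extra payment p' and its month t'.
    """
    def pv_minus_a(r):
        v = 1.0 / (1.0 + r)
        made = sum(v ** i for i in range(1, k + 1))   # payments already received
        future, cum_pdf = 0.0, 0.0
        for j in range(k + 1, n + 1):
            cum_pdf += pdf(n, j)                      # sum_{t=k+1}^{j} Pdf(n, t)
            future += (1.0 - d * cum_pdf) * v ** j    # expected payment j, discounted
        return p * (made + (1.0 - h2) * future) + extra * v ** t_extra - A

    f_lo = pv_minus_a(lo)
    if f_lo * pv_minus_a(hi) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = pv_minus_a(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def annualize(r):
    """R = (1 + r)**12 - 1."""
    return (1 + r) ** 12 - 1
```

With d = 0 and h2 = 0 the equation reduces to a plain annuity, so a $1,000, 36-month loan paying 1% per month yields r = 0.01; a positive d lowers the expected rate, and the penalty shrinks as k grows, which matches the intuition that loans with more payments made are less likely to default.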

Scoring

A few months after starting to work on automating order submissions for retail investors, LendingRobot began researching a method for selecting the loans with the best return potential. Several methods were evaluated, and the company eventually decided to use a combination of statistical and machine-learning techniques. The underlying principle, the Cox proportional hazards model, is a well-known method used extensively in medical research. Predicting how long a loan will keep paying is eerily similar to forecasting the life expectancy of a patient. The LendingRobot algorithm is described in detail in this blog post.
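LendingRobot's actual model is proprietary and combines several techniques, so the following is only a textbook illustration of the Cox proportional hazards idea: a single hypothetical covariate (for example a debt-to-income figure), loans that stopped paying treated as events and still-paying loans as censored, and the coefficient fitted by Newton-Raphson on the partial log-likelihood. It assumes no two defaults occur in the same month; function names and data are illustrative.

```python
import math


def cox_stats(beta, durations, events, x):
    """Partial log-likelihood, score and information of a one-covariate Cox model.

    Assumes no tied event times. durations: months of payments observed,
    events: 1 if the loan stopped paying, 0 if censored, x: covariate values.
    """
    ll = score = info = 0.0
    for i, (t_i, e_i) in enumerate(zip(durations, events)):
        if not e_i:
            continue  # censored loans only appear in risk sets
        risk = [j for j, t_j in enumerate(durations) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        ll += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return ll, score, info


def fit_cox_1d(durations, events, x, max_iter=50, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson with step halving."""
    beta = 0.0
    ll, score, info = cox_stats(beta, durations, events, x)
    for _ in range(max_iter):
        step = score / info
        new = cox_stats(beta + step, durations, events, x)
        while new[0] < ll:  # halve the step if the likelihood decreased
            step /= 2
            new = cox_stats(beta + step, durations, events, x)
        beta += step
        ll, score, info = new
        if abs(step) < tol:
            break
    return beta
```

Once `beta` is fitted, each new loan's hazard ratio `math.exp(beta * x)` ranks it relative to others: a higher value means a higher estimated default hazard, independently of how that hazard is distributed over time.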

Back-Testing (Lending Club)

Historical data can be downloaded from the Lending Club website. Note: a Lending Club account is required to download the extended dataset.

For each of the 277,814 loans in the dataset, we calculate both the LendingRobot score, and the latest expected return.

  • The LendingRobot score only takes into account information that was available at loan issuance, such as loan amount, purpose or debt-to-income ratio. The number of payments the loan will actually make and its future status (e.g. ‘fully paid’) are NOT used for calculating that score.
  • Equation (eq1) is used to calculate the expected rate of return, after deducting Lending Club’s 1% service fee from all payments and applying the smoothed default rates by (sub)grade.
  • Any discrepancy between the total amount paid and the product of the installment and the number of payments is reconciled at the last payment date when the loan was repaid early, and at maturity otherwise.
  • When the internal rate of return cannot be calculated (for instance if there are no payments at all), the Return on Investment is used instead.
  • All investments are assumed to happen at the same time.
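The fee deduction, the reconciliation rule and the ROI fallback listed above can be sketched on realized cash flows as follows. This covers only the bookkeeping of observed payments; for ongoing loans the article feeds these numbers into (eq1) rather than a plain IRR. The helper names, the bisection bracket and the assumption that the payment count never exceeds the term are hypothetical.

```python
def monthly_irr(amount, flows, lo=-0.99, hi=1.0, tol=1e-12):
    """Monthly IRR by bisection; None when no root lies in [lo, hi]."""
    def npv(r):
        return sum(f / (1 + r) ** m for m, f in enumerate(flows, start=1)) - amount

    f_lo = npv(lo)
    if f_lo * npv(hi) > 0:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def realized_flows(installment, n_payments, total_paid, term, repaid_early, fee=0.01):
    """Monthly cash flows net of the service fee (assumes n_payments <= term).

    The gap between total_paid and installment * n_payments is booked at the
    last payment date if the loan was repaid early, and at maturity otherwise.
    """
    months = max(n_payments, 1) if repaid_early else term
    flows = [installment] * n_payments + [0.0] * (months - n_payments)
    flows[-1] += total_paid - installment * n_payments
    return [f * (1 - fee) for f in flows]


def loan_return(amount, flows):
    """Annualized IRR, falling back to ROI when the IRR cannot be computed."""
    r = monthly_irr(amount, flows)
    if r is None:
        return (sum(flows) - amount) / amount
    return (1 + r) ** 12 - 1
```

A loan that never paid anything has no IRR, so `loan_return` falls back to an ROI of -100%, matching the rule in the list above.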

Since an individual investor is more likely to invest an equal sum of money in each loan, regardless of the loan’s total funded amount, the average return is the arithmetic average of all the returns rather than a dollar-weighted one. The average Lending Club return is 6.93%.
Incidentally, this is very close to Lending Club’s median value of 6.9%.

To calculate the performance of LendingRobot, the loans are sorted by decreasing LendingRobot score. Then a given top percentage of the loans is selected, and the average return is calculated again for that selection.
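The selection step just described can be sketched as: sort by decreasing score, keep the top fraction, and take the equal-weighted (arithmetic) mean of the selected returns. The article does not say whether its standard deviation is the population or sample version; this sketch assumes the population version (`statistics.pstdev`), and the function name is hypothetical.

```python
import statistics


def top_selection_stats(scores, returns, fraction):
    """Equal-weighted mean and population std of returns for the top `fraction` by score."""
    ranked = sorted(zip(scores, returns), key=lambda pair: pair[0], reverse=True)
    n = max(1, round(len(ranked) * fraction))  # keep at least one loan
    picked = [ret for _, ret in ranked[:n]]
    return statistics.mean(picked), statistics.pstdev(picked)


# Hypothetical scores and returns for five loans; select the top 40%.
print(top_selection_stats([0.9, 0.1, 0.5, 0.7, 0.3],
                          [0.12, -0.05, 0.06, 0.09, 0.02], 0.4))
```

Because each loan counts once regardless of its funded amount, this mirrors the non-dollar-weighted average the article uses.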

| Selection | Average Return | Standard Deviation |
|---|---|---|
| Top 5% | 11.22% | 22.07% |
| Top 10% | 10.61% | 21.82% |
| Top 15% | 10.14% | 21.65% |
| Top 20% | 9.81% | 21.34% |
| Top 25% | 9.53% | 21.00% |
| Top 30% | 9.25% | 20.76% |
| Top 35% | 9.03% | 20.53% |
| Top 40% | 8.84% | 20.22% |
| Top 45% | 8.68% | 19.98% |
| Top 50% | 8.53% | 19.74% |
| Top 55% | 8.37% | 19.58% |
| Top 60% | 8.21% | 19.43% |
| Top 65% | 8.01% | 19.35% |
| Top 70% | 7.83% | 19.25% |
| Top 75% | 7.65% | 19.11% |
| Top 80% | 7.48% | 18.91% |
| Top 85% | 7.29% | 18.76% |
| Top 90% | 7.13% | 18.52% |
| Top 95% | 6.96% | 18.30% |
| All loans | 6.93% | 18.57% |

We consider the top quartile to be representative of LendingRobot’s picks when calculating alpha. Quartiles are commonly used in statistics, and the top quartile is broad enough for the advantage to remain sustainable as LendingRobot grows in popularity (one could imagine early adopters getting even better performance). Returns keep increasing as the selection narrows, so the LendingRobot advantage holds across the board. Backtesting also shows that the increase in returns more than compensates for the 0.45% LendingRobot fee as soon as 80% or less of the loans are selected.
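The 80% break-even point can be checked directly against the back-testing table: subtract the 0.45% fee from each selection's average return and keep the broadest selection that still beats the 6.93% all-loans average. The dictionary below simply copies the table; the function name is illustrative.

```python
# Average returns (%) of the top-N% selections, copied from the table above.
TOP_SELECTION_RETURNS = {
    5: 11.22, 10: 10.61, 15: 10.14, 20: 9.81, 25: 9.53, 30: 9.25, 35: 9.03,
    40: 8.84, 45: 8.68, 50: 8.53, 55: 8.37, 60: 8.21, 65: 8.01, 70: 7.83,
    75: 7.65, 80: 7.48, 85: 7.29, 90: 7.13, 95: 6.96,
}
ALL_LOANS_RETURN = 6.93
LENDINGROBOT_FEE = 0.45


def broadest_profitable_selection(top_returns, baseline, fee):
    """Largest top-N% selection whose return net of fees beats the baseline."""
    profitable = [pct for pct, ret in top_returns.items() if ret - fee > baseline]
    return max(profitable) if profitable else None


print(broadest_profitable_selection(TOP_SELECTION_RETURNS, ALL_LOANS_RETURN,
                                    LENDINGROBOT_FEE))  # 80
```

At 85%, the net return is 7.29% - 0.45% = 6.84%, which falls just below the marketplace average.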

Calculating Alpha

Alpha, or the risk-adjusted measure of over-performance, is calculated as $ \alpha = r_a - \beta r_b $, where $r_a$ is the return of the LendingRobot selection, $r_b$ the marketplace return, and $ \beta = \rho_{a,b} \times \frac{\sigma_a}{\sigma_b}$ is the comparative measure of risk, i.e. the correlation of the returns $r_a$ and $r_b$ multiplied by the ratio of their standard deviations. Since the LendingRobot selection is only a subset of the marketplace, $ \rho_{a,b} = 1 $, and therefore, after deducting the 0.45% LendingRobot fee from $r_a$:

$ \alpha = \left( 9.53 - 0.45 \right) - 6.93 \times \frac{21.00}{18.57} = 1.24 $
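The same arithmetic as a small function, reproducing the Lending Club figure from the top-quartile and all-loans rows of the back-testing table. The default `rho=1.0` follows the article's subset assumption; the function name and signature are illustrative.

```python
def alpha(r_a, r_b, sigma_a, sigma_b, rho=1.0, fee=0.0):
    """alpha = (r_a - fee) - beta * r_b, with beta = rho * sigma_a / sigma_b (inputs in %)."""
    beta = rho * sigma_a / sigma_b
    return (r_a - fee) - beta * r_b


lc_alpha = alpha(r_a=9.53, r_b=6.93, sigma_a=21.00, sigma_b=18.57, fee=0.45)
print(round(lc_alpha, 2))  # 1.24
```

Here beta is about 1.13, so the selection's higher volatility absorbs a large part of the raw 2.15% net advantage.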

Back-Testing (Prosper)

Similarly, we measure default rates by grade from Prosper’s historical data and smooth them through a bi-exponential function.

| Prosper Grade | Number of Loans | Default Rate | Smoothed Default Rate |
|---|---|---|---|
| AA | 4,828 | 12.22% | 11.77% |
| A | 5,295 | 18.19% | 19.74% |
| B | 5,719 | 26.86% | 25.61% |
| C | 7,217 | 31.11% | 30.14% |
| D | 8,188 | 32.47% | 34.29% |
| E | 5,262 | 40.59% | 39.79% |
| HR | 4,986 | 50.84% | 50.95% |

Applying the same back-testing methods, but with parameters specific to Prosper, shows the following results:

| Selection | Average Return | Standard Deviation |
|---|---|---|
| Top 5% | 10.67% | 36.66% |
| Top 25% | 8.51% | 32.37% |
| Top 50% | 7.28% | 27.43% |
| Top 75% | 5.87% | 25.10% |
| All loans | 2.81% | 28.13% |

Limitations

Hypothetical performance is never as reliable as actual performance. Although we tried to avoid introducing any hindsight bias, all of the following limitations must be kept in mind:

  • Some loans may not be available anymore when trying to invest money in them. Competition amongst investors may restrict access to some notes.
  • Investing may be prevented because the cash is temporarily allocated to another loan that eventually is not issued.
  • Cash is assumed to be fully invested. In reality, it may have to accumulate before it can be re-invested (the minimum Lending Club investment is $25).
  • The time between cash allocation and loan issuance, during which the cash bears no interest, is not taken into account.
  • LendingRobot fees on idle cash are NOT taken into account.
  • All investments are considered to happen at once, while in real-life the amount available for re-investment depends upon the performance of previous investments.
  • Resulting tax liability is not deducted from performance results.
  • Historical data obtained from Lending Club and Prosper are not warranted to be accurate, complete or timely.
  • Historical data do not show the exact dates and amounts of payments made, so exact payment sequences have to be extrapolated.
  • Back-tested performance does not involve financial risk, and no hypothetical trading record can completely account for the impact of financial risk in actual trading.
  • Past performance does NOT guarantee future results.

7 Comments

  1. Darrell says:

    Please ignore my previous post–I discovered my math error. By the way, good article. I hope to use Lending Robot once they allow it in my state.

  2. Lincoln Fiske Jr says:

    Have you done any studies to see how your 2.60% advantage has varied since you calculated this in mid-2014 and whether it has sustained? Also, I’m assuming the advantage varies quite a bit for the different grades–what are the separate grade advantages? Thank you.

    1. Emmanuel Marot says:

      Please check https://www.lendingrobot.com/#/resources/performance/LC/ for updated statistics.
      Alpha tends to be consistent across all grades, but is slightly higher for more aggressive loans.

      1. Lincolnf says:

        Thanks. Here’s what’s especially confusing. When you published your article in July, 2014, the net advantage of LendingRobot over LendingClub was 2.15%. If I now look at your performance chart, June, 2014 (I’m assuming you were using data through June 30) now has an advantage of 1.11%. Is this because your advantage degrades as loans mature, and is this why the chart advantage increases after 12/31/2012, because you have increasingly more immature loans as we move through time? As far as grades, are you saying that LendingRobot can increase grade A loan returns by the same raw amount as D or G, so that if the advantage is 2.60%, instead of an A raw return of 4% I could get something like 6.60%? Or do you mean the proportional advantage is about the same? In this study, since as you say “the loans are sorted by decreasing LendingRobot score,” doesn’t that have to result in the top quartile having few or no A-C grade loans, so that therefore the advantage is almost entirely due to the D-G loans? Or did you sort to take the top quartile of each grade, as you’re now doing for the performance chart (using top quintile)?
