On the Mark with Markowitz, Diversification Part 1


In investing, where we walk a fine line between risk and return, the age-old adage of “don’t put all your eggs in one basket” is strikingly relevant. As touted repeatedly by financial pundits, your investment portfolio should be sufficiently diversified so that the performance of any single investment does not have an unreasonably large impact on your entire portfolio’s performance. Diversification can reduce the volatility of a portfolio while still targeting an expected overall return. In this article, we attempt to apply diversification to Peer Lending through the analysis of a marketplace’s historical loan and payment history data (Lending Club). Our methodology follows Harry Markowitz’s Modern Portfolio Theory, which states that for a desired expected return, there is a corresponding portfolio with a minimum amount of risk.

In the following discussion, assume that all cashflows are net of Lending Club service fees unless otherwise specified. Also note that numbers may be slightly off due to rounding.


There are several assumptions that we make for our analysis, namely:

  1. With enough \(A\) grade loans, your portfolio of \(A\) grade loans will come to represent the performance of a single most typical \(A\) grade loan. The \(A\) asset class refers to this portfolio of \(A\) grade loans. The same applies to all other loan grades.
    1. The underlying thought is that once you have a certain number of \(A\) grade loans, you’re essentially getting the same return and variance as a portfolio of the entire universe of the marketplace’s \(A\) grade loans. An analogy can be made to the Exchange Traded Fund \(VTI\), which is a subset of US stocks selected to closely mimic the performance of the total US stock market. Thus, the returns and risk from all \(A\) grade loans can be used to determine the return and risk of \(A\) grade loans as an asset class, much like the returns of stocks or bonds. We must do this aggregation for determining the covariance between the various loan grades.
  2. You can buy or sell your loans on the secondary market at any time, and earn a return from it equal to its calculated expected return, \(E(r)\), based on our methodology. Note that every loan has a positive expected return at issuance based entirely on its term, interest rate, and loan grade’s historic default rate.
    1. We make this assumption so that the return of a loan is not singular and invariant. If you are unable to sell a loan, you have essentially locked in a single return, which you won’t know until the loan fully matures, prepays, or is charged off. If a loan’s return were singular and invariant, we would not be able to compute the variance and covariances that the Modern Portfolio Theory analysis requires. An illustrative example:
      Suppose we invest today in a note that makes no payments over the next 4 months and is subsequently charged off. If there were no secondary market and everyone had to hold the notes they bought on the primary market, our return on that note would be -100%, and we would be locked into that return as soon as we purchased the note. With a secondary market and the ability to sell, we might notice that the note has entered its grace period and try to sell it on the secondary market at a discount. If we succeed, we mitigate our potential loss: we could end up with, say, a -17% or a -26% return, in both cases significantly better than -100%. Again, for our purposes we assume that the return earned when selling that note equals the \(E(r)\) calculated using our methodology.

For this analysis, we look at Lending Club’s 36 month loans issued at least three years ago, so we know exactly how they performed and can accurately track their returns over time. This corresponds to 75,000 36 month loans. Since there are only about 3,000 completed 60 month loans, which results in some grades having too few loans available on which to draw statistically sound conclusions, we exclude 60 month loans from our analysis. The number of 36 month loans per grade is shown below:

| Grade | A | B | C | D | E | F | G | Total |
|---|---|---|---|---|---|---|---|---|
| Number of loans | 20,494 | 26,469 | 16,244 | 8,937 | 2,096 | 516 | 244 | 75,000 |

Modern Portfolio Theory can be summarized as follows: given \(n\) assets with known expected returns, variances, and covariances between the \(n\) assets, a portfolio consisting of those \(n\) assets will have an expected return (\(E(r)\)) and a standard deviation (\(\sigma\)) calculated with the following formulas:

\[E(r) = \sum_{i=1}^{n}w_iE(r_i)\]

\[\sigma = \sqrt{\sum_{i=1}^{n}w_i^2\sigma_i^2 + \sum_{i=1}^{n}\sum_{j=i+1}^{n}2w_iw_j\sigma_{ij}^2}\]

where \(w_i\) is the portfolio weight of asset \(i\) and \(\sigma_{ij}^2\) is the covariance between the \(i\)th and \(j\)th asset in the portfolio. With the above in hand, we can plot an Efficient Frontier which shows the portfolios that have the smallest standard deviation (risk) for a given expected return.
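As a minimal sketch of the two formulas above in Python (the function names are ours and purely illustrative), the snippet below evaluates a two-asset portfolio. The inputs are the \(A\) and \(D\) expected returns and covariance entries reported later in this article, combined with the weights of the lowest-risk portfolio in the Efficient Frontier table.

```python
import math

def portfolio_return(weights, exp_returns):
    """E(r) = sum_i w_i * E(r_i)."""
    return sum(w * r for w, r in zip(weights, exp_returns))

def portfolio_risk(weights, cov):
    """Standard deviation from the variance terms plus each pair's covariance counted once (times 2)."""
    n = len(weights)
    variance = sum(weights[i] ** 2 * cov[i][i] for i in range(n))
    variance += sum(2 * weights[i] * weights[j] * cov[i][j]
                    for i in range(n) for j in range(i + 1, n))
    return math.sqrt(variance)

# Inputs taken from this article: grades A and D, returns in %, covariances in %^2.
exp_returns = [4.34, 7.67]
cov = [[6.42, 4.76],
       [4.76, 14.55]]
weights = [0.8517, 0.1483]

ret = portfolio_return(weights, exp_returns)
risk = portfolio_risk(weights, cov)
print(f"E(r) = {ret:.2f}%, sigma = {risk:.2f}%")
```

With these inputs the result should land close to the 4.83% return and roughly 2.48% risk listed for that portfolio; small differences come from rounding in the published inputs.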

When applying this to a portfolio of Lending Club loans, we have \(n = 7\), for the seven loan grades \(A\)-\(G\). What we are still missing from the above equations are \(E(r_i)\) and \(\sigma_i^2\) for each loan grade (asset class), the covariances \(\sigma_{ij}^2\) between loan grades, and the weight \(w_i\) of each loan grade.

The next section describes how we aggregate loans within a grade to be representative of that grade as an asset class, which allows us to calculate our missing inputs.


As stated above, since we are limiting our analysis to old loans that have almost entirely reached maturity, we are confident in knowing how much each loan has paid and the dates of each payment. Therefore, we can accurately generate the series of cashflows and returns corresponding to a note for each month. For example, below we show the actual cashflows and status progression for a \(C\) grade loan with ID: 887606.

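| Month | Sep 2011 | Oct 2011 | Nov 2011 | Dec 2011 | Jan 2012 | Feb 2012 | Mar 2012 | Apr 2012 | May 2012 | Status |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -$15,000 | | | | | | | | | Issued |
| 1 | -$15,000 | $509.49 | | | | | | | | Current |
| 2 | -$15,000 | $509.49 | $509.49 | | | | | | | Current |
| 3 | -$15,000 | $509.49 | $509.49 | $509.49 | | | | | | Current |
| 4 | -$15,000 | $509.49 | $509.49 | $509.49 | $509.49 | | | | | Current |
| 5 | -$15,000 | $509.49 | $509.49 | $509.49 | $509.49 | $0 | | | | Grace |
| 6 | -$15,000 | $509.49 | $509.49 | $509.49 | $509.49 | $0 | $0 | | | Late |
| 7 | -$15,000 | $509.49 | $509.49 | $509.49 | $509.49 | $0 | $0 | $0 | | Late |
| 8 | -$15,000 | $509.49 | $509.49 | $509.49 | $509.49 | $0 | $0 | $0 | $0 | Default |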

And next the expected returns of each month based on our predicted cashflows:

| Date | Sep 2011 | Oct 2011 | Nov 2011 | Dec 2011 | Jan 2012 | Feb 2012 | Mar 2012 | Apr 2012 | May 2012 |
|---|---|---|---|---|---|---|---|---|---|
| E(r) | 7.60% | 7.85% | 8.15% | 8.48% | 8.83% | -7.00% | -42.3% | -56.4% | -99.9% |

There are two things to note. First, when payments are actually received, the expected return increases slightly, because the value of the received payment replaces the value of the predicted payment (the former is always greater than the latter). Second, these returns are annualized monthly IRRs rather than simple ROIs. Thus, even though we would have been paid \(\$2037.97\) and our simple ROI in month 8 would be \(\left(\frac{\$2037.97}{\$15,000} - 1\right) \times 100\% = -86.41\%\), the annualized IRR shows \(-99.9\%\). Previous posts already discuss the why and how of calculating returns for Peer Lending. In the present case, to put it succinctly: when faced with uncertainty, we try to err on the conservative side.

At this point, we can already calculate \(E(r)\) for an asset class. Since we know the cashflows for each grade’s loans we can align them, come up with one aggregated cashflow array for the asset (loan grade), and compute the \(E(r)\).

An example of computing \(C\)’s \(E(r)\) (numbers in millions):

| Funding | Payment 1 | Payment 2 | Payment 3 | Payment 4 | Payment 5 | Payment 6 | Payment 7 | … | Payment 67 |
|---|---|---|---|---|---|---|---|---|---|
| -$171.88 | $7.81 | $6.67 | $6.77 | $6.80 | $6.66 | $6.51 | $7.02 | … | $0.001068 |

Note that sometimes payments in later months can be higher than payments in previous months. This is due to some loans missing payments in that month and other loans making prepayments/fully prepaying in that month. Also note that the cashflows go to 67 payments even though we’re limiting our analysis to only 36 month loans. This is because of some 36 month loans becoming late, followed by a renegotiation of terms to 60 month payment plans.
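The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name and the three toy loans are hypothetical, and each loan is represented as a list of cashflows indexed by payment number (index 0 is the funding outflow).

```python
from itertools import zip_longest

def aggregate_cashflows(loan_cashflows):
    """Sum cashflows across loans by period index (0 = funding, 1 = payment 1, ...).

    Loans with fewer periods are padded with 0.0 so that longer-running
    loans still contribute to the later periods.
    """
    return [sum(period) for period in zip_longest(*loan_cashflows, fillvalue=0.0)]

# Hypothetical toy loans, not real Lending Club data.
loans = [
    [-1000.0, 340.0, 340.0, 340.0],      # pays as scheduled
    [-500.0, 170.0, 0.0, 0.0, 90.0],     # misses payments, then pays late
    [-2000.0, 2050.0],                   # fully prepays after one month
]

total = aggregate_cashflows(loans)
print(total)
```

Padding with zeros is what lets the aggregated array run past the nominal term, as in the 67-payment series above.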

Taking the monthly IRR of the above cashflow and annualizing:

\[E(r_C) = ((1 + mIRR_C)^{12} - 1) \times 100\% = ((1 + 0.005466)^{12} - 1) \times 100\% = 6.76\%\]
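For readers who want to reproduce this kind of calculation, here is a minimal Python sketch of a monthly IRR found by bisection, followed by the annualization above. It is not our production code: the function names and search bounds are assumptions, and it only handles the conventional case of one outflow followed by inflows, where the net present value falls as the rate rises.

```python
def npv(rate, cashflows):
    """Net present value of monthly cashflows; index 0 is the funding date."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))

def monthly_irr(cashflows, lo=-0.99, hi=1.0, tol=1e-10):
    """Monthly rate at which NPV is zero, found by bisection on [lo, hi]."""
    if npv(lo, cashflows) < 0 or npv(hi, cashflows) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if npv(mid, cashflows) > 0:
            lo = mid   # NPV still positive, so the root is at a higher rate
        else:
            hi = mid
    return (lo + hi) / 2.0

def annualize(monthly_rate):
    """Compound a monthly rate over 12 months."""
    return (1.0 + monthly_rate) ** 12 - 1.0

# The monthly IRR quoted above for the C asset class.
print(f"{annualize(0.005466) * 100:.2f}%")

# Loan 887606 as actually paid: four payments, then nothing.
charged_off = [-15000.0] + [509.49] * 4 + [0.0] * 4
print(f"{annualize(monthly_irr(charged_off)) * 100:.2f}%")

# The same loan had it paid all 36 scheduled payments (hypothetical).
fully_paid = [-15000.0] + [509.49] * 36
print(f"{annualize(monthly_irr(fully_paid)) * 100:.2f}%")
```

The charged-off case illustrates why the annualized IRR is so much harsher than the simple ROI: losing most of the principal within a few months compounds to nearly -100% per year.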

Doing the same for every grade, we get the following \(E(r)\) values:

| Grade | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| E(r) | 4.34% | 6.28% | 6.76% | 7.67% | 8.24% | 0.94% | 1.78% |

Next is finding \(\sigma_i^2\) and \(\sigma_{ij}^2\).

With a time series of returns for every loan, we focus on all loans within a grade (e.g. the 16,244 \(C\) grade loans) and align their expected returns by date. Below is a small snippet showing just 3 of these \(C\) loans. Loan 887606 was issued in September 2011, loan 653597 in August 2011, and loan 734373 in July 2011.

| Loan ID | Jul 2011 | Aug 2011 | Sep 2011 | Oct 2011 | Nov 2011 | Dec 2011 | Jan 2012 | Feb 2012 | Mar 2012 | Apr 2012 | May 2012 | Jun 2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 887606 | | | 7.60% | 7.85% | 8.15% | 8.48% | 8.83% | -7.00% | -42.3% | -56.4% | -99.9% | |
| 653597 | | 6.76% | 7.03% | 7.32% | 7.65% | 7.99% | 8.35% | 8.72% | 9.09% | 9.46% | 9.81% | 10.16% |
| 734373 | 6.95% | 7.21% | 7.52% | 7.86% | 8.23% | 8.61% | 9.00% | 9.39% | 9.78% | 10.16% | 10.53% | 10.89% |

Again, our main objective is to aggregate all same-grade loans into one asset class, so that we can determine how that asset class is doing every month (better or worse than expected) and compare the returns of two asset classes month after month. To do this, we need to adjust (de-trend) every loan’s return to see whether the loan is actually performing well or poorly. An example:

Loan 887606’s September 2011 return was 7.60%, which corresponds to its return at issuance. We want to find out if this return was good or not compared to all other \(C\) loans. To do this, we find \(E(r_{C0})\), the expected return of a \(C\) grade loan at issuance, by averaging every \(C\) loan’s return at issuance. So, loan 653597’s and loan 734373’s returns at issuance (6.76% and 6.95% respectively) are included in the \(E(r_{C0})\) calculation.

Doing this for every month’s return, we get the following table for \(C\) grade loans:

| Month (k) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | … | 67 |
|---|---|---|---|---|---|---|---|---|---|
| \(E(r_{Ck})\) | 7.60% | 7.68% | 7.99% | 8.11% | 8.35% | 8.27% | 8.07% | … | 14.15% |

So going back to our question of “is loan 887606’s return at issuance of 7.60% good?”, it can now be answered by comparing it to the expected value for the month in question (issuance is k = 0) of a \(C\) grade loan. Loan 887606’s return was 7.60%, and we see that the expected return of a \(C\) grade loan at issuance (\(E(r_{C0})\)) is 7.60%. The mathematical comparison becomes:

\[887606(r_{0}) - E(r_{C0}) = 7.60\% - 7.60\% = 0.00\%\]

We see that it performed exactly as expected so its return at issuance is neither better nor worse than the typical \(C\) grade loan. If the result were negative, then we’d interpret it to mean that the loan’s performance in the first month was below average, and if the result were positive then we’d understand the loan to have performed better than average. We do this comparison for every return of every loan, and reconstruct the date-aligned adjusted expected returns as follows. Finally, we get our single time series of returns for the \(C\) asset class by taking the average of returns of all \(C\) loans in a specific month (shown in the Average Return row):

| Loan ID | Jul 2011 | Aug 2011 | Sep 2011 | Oct 2011 | Nov 2011 | Dec 2011 | Jan 2012 | Feb 2012 | Mar 2012 | Apr 2012 | May 2012 | Jun 2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 887606 | | | 0.00% | 0.18% | 0.16% | 0.36% | 0.48% | -15.23% | -50.36% | -64.57% | -108.23% | |
| 653597 | | -0.88% | -0.65% | -0.67% | -0.47% | -0.36% | 0.08% | 0.65% | 0.92% | 1.20% | 1.38% | 1.42% |
| 734373 | -0.72% | -0.47% | -0.47% | -0.25% | -0.12% | 0.34% | 0.93% | 1.22% | 1.53% | 1.73% | 1.73% | 2.13% |
| Average Return | -0.82% | -1.09% | -1.20% | -0.91% | -0.75% | -0.67% | -0.57% | -0.41% | -0.44% | -0.25% | -0.11% | -0.16% |
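The de-trending and date alignment described above can be sketched as follows. This is an illustrative Python snippet rather than our actual pipeline: the helper names are hypothetical, only the first seven months of the three sample loans are used (the \(E(r_{Ck})\) values we published for \(k = 0\) to \(6\)), and the averages here cover just these three loans instead of all 16,244.

```python
from collections import defaultdict

# E(r_Ck) for k = 0..6, from the table of C grade expected returns by loan age.
benchmark = [7.60, 7.68, 7.99, 8.11, 8.35, 8.27, 8.07]

# loan id -> ((issue year, issue month), expected returns in % by loan age k)
loans = {
    887606: ((2011, 9), [7.60, 7.85, 8.15, 8.48, 8.83, -7.00, -42.3]),
    653597: ((2011, 8), [6.76, 7.03, 7.32, 7.65, 7.99, 8.35, 8.72]),
    734373: ((2011, 7), [6.95, 7.21, 7.52, 7.86, 8.23, 8.61, 9.00]),
}

def add_months(year_month, k):
    """Calendar month that falls k months after (year, month)."""
    year, month = year_month
    index = year * 12 + (month - 1) + k
    return (index // 12, index % 12 + 1)

def detrend(returns, benchmark):
    """Subtract the grade's expected return at each loan age k."""
    return [r - b for r, b in zip(returns, benchmark)]

def asset_class_series(loans, benchmark):
    """Average the de-trended returns of all loans, aligned by calendar month."""
    by_date = defaultdict(list)
    for issue, returns in loans.values():
        for k, adjusted in enumerate(detrend(returns, benchmark)):
            by_date[add_months(issue, k)].append(adjusted)
    return {date: sum(vals) / len(vals) for date, vals in sorted(by_date.items())}

series = asset_class_series(loans, benchmark)
for (year, month), value in series.items():
    print(f"{year}-{month:02d}: {value:+.2f}%")
```

Because the sketch uses rounded inputs and only three loans, its output will not match the Average Return row above; it is meant to show the mechanics only.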

A quick aside: the interpretation of the Average Return for the \(C\) asset class during the time period shown is that it performed poorly since the average returns are all negative.

Now that we have the time series of returns for every asset class (\(A\) – \(G\)) we can compute the covariance needed in the formula for the portfolio’s \(\sigma\). For those unfamiliar with covariance, it is a close cousin of correlation. Going back to our goal of diversification, you do not want your portfolio to consist of only highly correlated assets because if one asset performs poorly, then all of them would tend to perform poorly. Conceptually, we can think of covariance as answering the question of “do two asset classes tend to move together over time?” and might reach a qualitative “yes” or “no” by comparing the returns of each asset class month by month and seeing if they tend to increase/decrease together. Formally, the equation we use for covariance is:

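\[\sigma_{ij}^2 = \frac{\sum\limits_{k=1}^{m}(i_k - \bar{i})(j_k - \bar{j})}{m-1}\]

where \(i\) and \(j\) are the two assets being compared, \(i_k\) and \(j_k\) are their returns in observation \(k \in [1,m]\), and \(\bar{i}\) and \(\bar{j}\) are their mean returns over the \(m\) observations. We write \(m\) for the number of observations to avoid confusion with \(n\), the number of assets, and \(\sigma_{ij}^2\) for the covariance to match the portfolio formula above.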
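The covariance formula translates directly into code. Below is a small Python sketch; the function name and the two short return series are made up for illustration and are not taken from our data.

```python
def sample_covariance(x, y):
    """Sample covariance of two equally long series, dividing by (count - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(y) / count
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (count - 1)

# Hypothetical de-trended monthly returns (in %) for two asset classes.
grade_i = [-0.8, -1.1, -1.2, -0.9, -0.7, -0.6]
grade_j = [-0.5, -0.9, -1.0, -0.6, -0.6, -0.3]

print(sample_covariance(grade_i, grade_j))   # covariance between the two series
print(sample_covariance(grade_i, grade_i))   # covariance with itself = variance
```

A positive result means the two series tend to sit above or below their own averages in the same months, which is the "move together" behavior described above.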

With the time series returns of every asset class, we can now construct the covariance matrix. Units are in \(\%^2\):

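| Grade | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| A | 6.42 | 6.80 | 6.18 | 4.76 | 4.91 | 6.94 | 5.71 |
| B | | 9.50 | 10.05 | 9.84 | 10.91 | 16.18 | 13.34 |
| C | | | 11.80 | 12.24 | 13.73 | 20.84 | 16.81 |
| D | | | | 14.55 | 16.42 | 25.21 | 21.10 |
| E | | | | | 19.38 | 29.23 | 24.33 |
| F | | | | | | 46.44 | 38.16 |
| G | | | | | | | 35.37 |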

Luckily for us, the covariance matrix contains the variance \(\sigma_i^2\) along the diagonal since, by definition, the covariance of an asset with itself is its variance.

Finally, we are ready to simulate portfolios of randomly generated weights for each asset class via Monte Carlo simulations. For the hyperbolic Efficient Frontier, we use quadratic programming to solve for the asset class weights of portfolios that lie on it. \(E(r)\) and \(\sigma\) for each asset class are shown below, followed by a plot of the randomly generated portfolios and Efficient Frontier.

[Figures: “Return by Risk by Grade” and “Efficient Frontier Monte Carlo”]

And for the various weights of each asset class corresponding to portfolios that lie on the Efficient Frontier:

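| E(r) | σ (risk) | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|---|
| 8.24% | 4.40% | | | | | 100.00% | | |
| 8.20% | 4.36% | | | | 7.03% | 92.97% | | |
| 7.89% | 4.00% | | | | 61.14% | 38.86% | | |
| 7.57% | 3.73% | | 7.54% | | 90.95% | 1.51% | | |
| 7.28% | 3.50% | | 27.97% | | 72.03% | | | |
| 6.87% | 3.24% | 15.53% | 20.40% | | 64.07% | | | |
| 6.35% | 2.93% | 39.48% | | | 60.52% | | | |
| 6.04% | 2.78% | 48.96% | | | 51.04% | | | |
| 5.79% | 2.67% | 56.48% | | | 43.52% | | | |
| 5.59% | 2.60% | 62.46% | | | 37.54% | | | |
| 5.21% | 2.52% | 73.97% | | | 26.03% | | | |
| 4.92% | 2.49% | 82.63% | | | 17.37% | | | |
| 4.83% | 2.48% | 85.17% | | | 14.83% | | | |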
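The Monte Carlo step mentioned earlier can be sketched in plain Python using the expected returns and covariance matrix published above. Treat it as an illustration under our own assumptions: the names are hypothetical, weights are drawn uniformly over long-only portfolios by normalizing exponential draws, and the quadratic programming used for the Efficient Frontier itself is not reproduced here.

```python
import math
import random

GRADES = "ABCDEFG"
EXP_RETURN = [4.34, 6.28, 6.76, 7.67, 8.24, 0.94, 1.78]   # in %

# Upper triangle of the covariance matrix (in %^2), row by row.
UPPER = [
    [6.42, 6.80, 6.18, 4.76, 4.91, 6.94, 5.71],
    [9.50, 10.05, 9.84, 10.91, 16.18, 13.34],
    [11.80, 12.24, 13.73, 20.84, 16.81],
    [14.55, 16.42, 25.21, 21.10],
    [19.38, 29.23, 24.33],
    [46.44, 38.16],
    [35.37],
]
N = len(GRADES)
COV = [[0.0] * N for _ in range(N)]
for i, row in enumerate(UPPER):
    for offset, value in enumerate(row):
        j = i + offset
        COV[i][j] = COV[j][i] = value   # mirror into the lower triangle

def random_weights(rng, n):
    """Non-negative weights summing to 1 (normalized exponential draws)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def risk_and_return(weights):
    """Return (sigma, E(r)) of a portfolio, both in %."""
    exp_ret = sum(w * r for w, r in zip(weights, EXP_RETURN))
    variance = sum(weights[i] * weights[j] * COV[i][j]
                   for i in range(N) for j in range(N))
    return math.sqrt(variance), exp_ret

def simulate(num_portfolios, seed=0):
    """Generate (sigma, E(r)) pairs for randomly weighted portfolios."""
    rng = random.Random(seed)
    return [risk_and_return(random_weights(rng, N)) for _ in range(num_portfolios)]

portfolios = simulate(5000)
lowest = min(portfolios)   # tuples sort by sigma first
print(f"lowest-risk random portfolio: sigma = {lowest[0]:.2f}%, E(r) = {lowest[1]:.2f}%")
```

Plotting the simulated pairs with risk on the horizontal axis and return on the vertical axis gives a cloud of portfolios like the one in the figure; random sampling will generally not reach the frontier itself, which is why the frontier weights are solved for separately.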


Readers will quickly notice that according to this analysis, the \(C\), \(F\), and \(G\) asset classes are never included in the optimal portfolios. This is because historical data indicates that other loan grades offer better risk/return trade-offs and should substitute for them in a portfolio. Looking at \(E(r_F)\) and \(E(r_G)\) from the aggregated cashflows, the paltry expected returns they provide do not justify their huge variance in returns (see the diagonal of the covariance matrix). As for \(C\), it is largely overshadowed by \(D\), which has a superior expected return and even lower covariances with the \(A\) and \(B\) asset classes. It therefore makes sense not to allocate any of a portfolio to \(C\) when that share could instead be allocated to \(D\).

Stay tuned for the next installment in this series, which attempts to answer the question: “How many \(A\) notes does one actually need to obtain the risk/return profile of \(A\) as an asset class?”


This analysis was done at an entire marketplace level with no association at all to LendingRobot’s scoring algorithm or selected loans. While the results show no allocation for \(C\), \(F\), and \(G\) as asset classes, this does not necessarily mean they should be avoided at an individual loan level. It is also worth noting that the expected returns, covariance matrix, and portfolio weights may differ if looking only at LendingRobot-picked loans, and further analysis is warranted.