A short discourse on LendingRobot Series’ loan selection algorithm
With the launch of LendingRobot Series comes our new model, LRv2, aptly named to distinguish it from its predecessor, LRv1. LRv2 was designed specifically to work with LendingRobot Series and represents a significant improvement in loan selection over LRv1. In this blog post we'll discuss the inner workings of LRv2, its differences from LRv1, and how it fits into LendingRobot Series.
For those unfamiliar with our methodology behind LRv1 and/or how we calculate expected returns for a single loan, relevant links are here and here. Basic understanding of the time value of money, expected value, and logistic regression will also likely be helpful.
The examples throughout this post pertain to Lending Club, but the modeling and analysis processes are similar across all platforms. As for our notation, matrices are bold uppercase, vectors are bold lowercase, and elements of vectors are normal typeface and indexed. Vectors are assumed to be column vectors unless otherwise specified.
Scoring and Ranking Loans
Scoring loans can be broken down into two parts: 1) calculate the probability of default for a loan and 2) generate the expected future cashflows for that loan and compute the internal rate of return (IRR) as the score. Net present value (NPV) is also computed for analysis purposes, but using a slightly modified version discussed below. Let’s examine these two points in the order of increasing mathematical complexity.
Generating the Series of Cashflows
The main goal of the scoring system is to "pick a good loan." At a high level this equates to a good loan having the highest IRR and/or NPV, which in turn frequently equates to choosing loans that do not default and carry a higher interest rate. For those familiar with the aforementioned financial concepts, it should be immediately recognizable that IRR and NPV calculations both depend on having a series of underlying cashflows. The specifics of how we generate the cashflows are described in detail in this blog post, but the abridged version is that for a loan known to default, the probability that the loan defaults in a specific month after issuance is known (e.g. a defaulting loan might have a 2% chance of defaulting 2 months after issuance but a 14% chance of defaulting 9 months after issuance). With these empirically known probabilities, we can generate the expected value of future cashflows by multiplying each cashflow's value by the probability of actually receiving it. For those more visually inclined, the series of cashflows is depicted below:
Note the grey-shaded discounted regions growing larger in payments further away due to the increasing cumulative probability of not receiving payments at later dates.
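As a concrete illustration, here is a minimal Python sketch of that weighting. The loan size, payment, overall default probability, and the uniform default-timing distribution below are all made-up placeholders standing in for the empirically estimated values, not our actual data:

```python
import numpy as np

# Hypothetical loan: $1,000 funded, 36 monthly payments (illustrative values only).
term = 36
monthly_payment = 31.80

# Model's overall probability that the loan defaults, and a stand-in for the
# empirically known distribution of *when* a defaulting loan defaults.
p_default = 0.10
default_timing = np.full(term, 1.0 / term)      # placeholder: uniform over the term

# Probability the loan has defaulted by each month, given that it defaults at all.
cum_default_given_default = np.cumsum(default_timing)

# Probability each scheduled payment is actually received.
p_receive = 1.0 - p_default * cum_default_given_default

# Expected cashflows: each scheduled payment weighted by its probability of arrival.
expected_cashflows = monthly_payment * p_receive
```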
With the cashflow array in hand, we can calculate IRRs and NPVs (choosing a larger or smaller discount rate depending on time preference for money) and thus have our criterion for ranking loans.
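Given such a cashflow array, the IRR and NPV could be computed along these lines with the numpy-financial package; the funded amount, payments, and 5% discount rate below are illustrative placeholders:

```python
import numpy as np
import numpy_financial as npf

# Cashflow array: funded amount out at month 0, then the expected monthly payments
# (a flat placeholder here; in practice these come from the weighting step above).
funded_amount = 1000.0
expected_cashflows = np.full(36, 31.0)
cashflows = np.concatenate(([-funded_amount], expected_cashflows))

# IRR of the monthly cashflows, annualized by compounding twelve monthly periods.
monthly_irr = npf.irr(cashflows)
annual_irr = (1.0 + monthly_irr) ** 12 - 1.0

# NPV at an annual discount rate converted to a monthly rate; a larger or smaller
# rate reflects a stronger or weaker time preference for money.
annual_discount = 0.05
monthly_discount = (1.0 + annual_discount) ** (1.0 / 12.0) - 1.0
npv = npf.npv(monthly_discount, cashflows)
```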
Predicting the Probability of Default
As stated before, we know the probability of a defaulting loan becoming delinquent in a particular month, so the missing piece of our scorer is determining a particular loan's probability of default. In LRv2, we generate the default probabilities via logistic regression with an elastic net penalty added to the loss function for regularization. That's a mouthful, so let's examine each component below.
Logistic regression is a well-established method for predicting probabilities, making it particularly suited to our binary classification of defaulting and non-defaulting loans. Suppose we have \(m\) historical loans with \(n\) features per loan (the features being continuous, discrete, and categorical) organized into the \(m\ {\times}\ n\) matrix \(\mathbf{X}\), where \(\mathbf{x}_{i}\), \(i = 1, 2, \ldots, m\), is a row vector of \(\mathbf{X}\) containing the \(n\) feature elements \(x_{k}\), \(k = 1, 2, \ldots, n\). We observe which loans have defaulted and mark them by setting the target \(y_{i}=1\) for defaulting loans and \(y_{i}=0\) for paid-off loans. Therefore, we have an \(m\)-element target vector, \(\mathbf{y}\). Our predictions will be in the \(m\)-element vector \(\hat{\mathbf{y}}\).
Logistic regression makes predictions for the probability of default for a given loan \(\mathbf{x}_{i}\) using the formula:
\[ P(y_{i}=1|\mathbf{x}_{i}) \approx \hat{y_{i}} = \hat{f}(\mathbf{x}_{i}) = {1 \over 1+e^{-(b_0+\sum\limits_{k=1}^{n}{(b_kx_k)})}} = {1 \over 1+e^{-\boldsymbol\beta \mathbf{x}_{i}}}\]
where \(b_{0}\) is an intercept, \(b_{k}\) is the coefficient to feature \(x_{k}\), and \(\boldsymbol\beta\) is our vector form of the \(b_{k}\) coefficients. Graphing the above equation for various values of \(\boldsymbol\beta \mathbf{x}\) gives us the following plot:
We see that as \(\boldsymbol\beta \mathbf{x} \rightarrow +\infty\), \(\hat{y} \rightarrow 1\) and as \(\boldsymbol\beta \mathbf{x} \rightarrow -\infty\), \(\hat{y} \rightarrow 0\). Also note that at \(\boldsymbol\beta \mathbf{x} = 0\), \(\hat{y} = 0.5\). Thus, the logistic regression can be thought of as a transformation of real valued linear regressor outputs to a probability bound between 0 and 1.
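For concreteness, a minimal sketch of that prediction step in plain NumPy, with two hypothetical features and made-up coefficients:

```python
import numpy as np

def predict_default_probability(x, beta, b0):
    """Logistic regression: squash the linear score into a probability in (0, 1)."""
    linear_score = b0 + np.dot(beta, x)
    return 1.0 / (1.0 + np.exp(-linear_score))

# Two hypothetical features (say, interest rate and debt-to-income) with made-up coefficients.
x = np.array([0.18, 0.25])
beta = np.array([2.0, 1.5])
b0 = -3.0
p_default = predict_default_probability(x, beta, b0)   # roughly 0.09 for these values
```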
So how do we choose the coefficients in \(\boldsymbol\beta\)? We fit the coefficients to our training data via stochastic gradient descent subject to minimizing our cross-entropy loss function:
\[L^{*}(\mathbf{y},\mathbf{\hat{y}}) = -\frac{1}{m} \sum\limits_{i=1}^{m}\left[y_{i} \ln \hat{y_{i}}+(1-y_{i})\ln(1-\hat{y_{i}})\right]\]
The intuition behind this equation is simple: if \(y_{i}\) is \(1\) (a defaulted loan), the second term in the sum zeros out. If our prediction \(\hat{y_{i}}\) is close to 1 (a good prediction because we think the loan will default and the loan did default), \(y_{i} \ln \hat{y_{i}}\) gives a small negative number. If our \(\hat{y_{i}}\) is instead closer to 0 (a bad prediction of non-default when the loan did default), then \(y_{i} \ln \hat{y_{i}}\) returns a larger negative number. In the case where \(y_{i}\) is 0, the sum’s first term zeros out and similar results can be reasoned out for the good and bad prediction cases. The closer our loss function is to 0, the better our predictions are. Since the terms in the sum produce negative numbers, we multiply the whole sum by a negative, thus making the problem one of minimizing the sum of errors of our predictions to the true outcomes.
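A small sketch of the loss computed over a handful of predictions (the clamp to avoid taking log(0) is an implementation detail, not part of the formula above):

```python
import numpy as np

def cross_entropy_loss(y, y_hat, eps=1e-15):
    """Mean cross-entropy between targets y (1 = default) and predicted probabilities y_hat."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)   # keep log() finite
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

# A confident correct prediction contributes little loss; a confident wrong one, a lot.
y = np.array([1.0, 1.0, 0.0])
y_hat = np.array([0.9, 0.1, 0.2])
loss = cross_entropy_loss(y, y_hat)
```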
Recall at the start of this section on predicting default probability, we included an elastic net penalty to our loss function. Taking that penalty into account, the full form of our loss function is:
\[L(\mathbf{y},\mathbf{\hat{y}}) = L^{*}(\mathbf{y},\mathbf{\hat{y}}) + (\alpha*l_{1}\operatorname-ratio)||\boldsymbol\beta||_{1} + (.5*\alpha*(1-l_{1}\operatorname-ratio))||\boldsymbol\beta||_{2}^{2}\]
where the additional terms constitute the elastic net penalty. Put simply, the elastic net penalty blends the \(L_{1}\) and \(L_{2}\)-norms to constrain the size of the coefficients. The \(L_{1}\)-norm is the sum of the absolute values of the elements in \(\boldsymbol\beta\) (i.e. \(\sum\limits_{k=1}^{n} |b_{k}|\)) and the \(L_{2}\)-norm is the Euclidean length of \(\boldsymbol\beta\) (i.e. \(\sqrt{\sum\limits_{k=1}^{n} b_{k}^{2}}\)). This is a form of regularization, which smooths the regressor and allows it to generalize better. \(\alpha\) controls how strongly we penalize large coefficients, and \(l_{1}\operatorname-ratio\) controls the mix of the two norms in the elastic net penalty. The elastic net penalty has the effect of shrinking, and zeroing out, some of the coefficients in \(\boldsymbol\beta\), making our model less susceptible to picking up tiny nuances in the training data and better at predicting on unseen loans.
Specific values for \(\alpha\) and \(l_{1}\operatorname-ratio\) were determined using grid search, which iterates through the space of hyperparameters and finds the combination giving the lowest loss function value via k-fold cross validation.
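One way such a fit could be assembled is sketched below with scikit-learn's SGDClassifier (logistic loss with an elastic net penalty) and GridSearchCV; the library choice, the random training data, and the hyperparameter grid are illustrative assumptions, not our exact implementation. Note that `loss="log_loss"` is the name used in recent scikit-learn versions (older versions use `loss="log"`):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# X: m-by-n feature matrix of historical loans; y: 1 for defaulted, 0 for paid off.
# Random data stands in for the real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

# Logistic regression fit by stochastic gradient descent with an elastic net penalty.
model = SGDClassifier(loss="log_loss", penalty="elasticnet", max_iter=1000, tol=1e-4)

# Grid search over the penalty strength (alpha) and L1/L2 mix (l1_ratio),
# scored via k-fold cross-validation.
param_grid = {"alpha": [1e-5, 1e-4, 1e-3, 1e-2],
              "l1_ratio": [0.1, 0.3, 0.5, 0.7, 0.9]}
search = GridSearchCV(model, param_grid, scoring="neg_log_loss", cv=5)
search.fit(X, y)

# Predicted default probabilities for unseen loans come from the fitted model.
p_default = search.predict_proba(X[:5])[:, 1]
```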
Analysis and Comparisons of LRv1 and LRv2
Ultimately, what we care most about with LRv2 is that it gives us greater returns than random selection and LRv1. To evaluate this we calculate a Net Present Value Return On Investment (NPVROI) for every loan. To do this, we generate the loan's cashflow array (as described in the previous section), but instead of using our model's probability of default we use the platform's historical probabilities of default by grade or subgrade. Additionally, whenever possible we use actual known cashflows to more accurately depict what occurred. From the series of cashflows, we calculate the NPV using an annual discount rate of 5%, which is adjustable depending on the duration preference of the LendingRobot Series in question. Finally, we calculate the NPVROI by dividing the NPV by the loan's original funded amount.
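A minimal sketch of the NPVROI calculation for a single loan, assuming monthly cashflows with the funded amount at month 0; the cashflow values are hypothetical and, unlike the real analysis, do not mix in actual observed payments:

```python
import numpy as np
import numpy_financial as npf

def npvroi(cashflows, funded_amount, annual_discount=0.05):
    """NPV of monthly cashflows (month 0 first) divided by the original funded amount."""
    monthly_discount = (1.0 + annual_discount) ** (1.0 / 12.0) - 1.0
    return npf.npv(monthly_discount, cashflows) / funded_amount

# Hypothetical loan: $1,000 funded, then 36 monthly payments of $32.
funded = 1000.0
cashflows = np.concatenate(([-funded], np.full(36, 32.0)))
roi = npvroi(cashflows, funded)
```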
To account for the uncertainty in calculating NPVROI for loans that are still ongoing, we reduce their importance in our analysis by using two penalizing weights. The first is a maturity weight, which penalizes a loan's NPVROI by the fraction \(\frac{months\ on\ books}{term}\) (a loan that was just issued would have no weight in the analysis, while a 36-month loan issued 4 years ago would take full weight). The second is a "doneness" weight, which penalizes the impact of loans that are far from being done. At first glance this seems similar to the maturity weight, but the difference is that any loan not expecting further payments is done and takes full weight. Thus, defaulted loans and prepaid loans have a full "doneness" weight, and loans that have partially prepaid proportionally increase their "doneness" weight. A sketch of how such weights might look appears below.
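In this sketch, the "doneness" proxy (fraction of the funded amount whose outcome is already resolved) and the multiplicative combination of the two weights are assumptions for illustration, not a statement of our exact formulas:

```python
def maturity_weight(months_on_books, term):
    """Fraction of the term the loan has been outstanding, capped at full weight."""
    return min(months_on_books / term, 1.0)

def doneness_weight(resolved_amount, funded_amount):
    """Assumed proxy: fraction of the funded amount whose outcome is already known.

    Defaulted and fully prepaid loans expect no further payments and take full
    weight; partially prepaid loans take proportionally more weight.
    """
    return min(resolved_amount / funded_amount, 1.0)

# Example: a 36-month loan issued 18 months ago with 60% of principal already resolved.
# Combining the two weights by multiplication is an illustrative assumption.
w = maturity_weight(18, 36) * doneness_weight(600.0, 1000.0)
```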
With the NPVROI and weights in hand, we examine the average NPVROI for all percentile groups, each group spanning ten percentiles, for each scorer as well as for random loan selection. We obtain the following results:
The graphs show the average NPVROI for each rolling percentile group. The top graph shows the analysis done with our penalizing maturity and "doneness" weights, while the bottom graph is the same analysis using only loans that are done and old enough to be done, and therefore carry no weight penalty. In both cases, we can see that LRv2 performs as desired: loans with a higher score have a higher return. Additionally, we are only truly interested in the very top percentiles of scored loans (as the various platforms have many more loans available than we can fund). When sampling over trials (of 10,000 loans each) in the top 8 percent of scores for LRv1 and LRv2, the average NPVROIs are 2.2% and 6.1%, with standard deviations of 1% and 0.7% respectively. Nominal returns (without time discounting) are 7.2% and 11.1% for LRv1 and LRv2, respectively. So, we believe that LRv2 increases returns compared to LRv1, and does so with slightly less variability in returns.
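For readers who want to reproduce a comparison of this shape on their own data, here is a pandas sketch using non-overlapping deciles as a simplification of the rolling ten-percentile groups; the column names and random data are placeholders:

```python
import numpy as np
import pandas as pd

# One row per historical loan: model score, realized NPVROI, and analysis weight.
# Random data stands in for real loans; column names are placeholders.
rng = np.random.default_rng(1)
loans = pd.DataFrame({
    "score": rng.uniform(size=5000),
    "npvroi": rng.normal(0.03, 0.05, size=5000),
    "weight": rng.uniform(size=5000),
})

# Bucket loans into ten-percentile groups by score, then take the
# weight-penalized average NPVROI within each group.
loans["group"] = pd.qcut(loans["score"], 10, labels=False)
avg_npvroi_by_group = loans.groupby("group").apply(
    lambda g: np.average(g["npvroi"], weights=g["weight"])
)
```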
So what’s changed with LRv2 to cause the improvement? The two most notable factors are more data with which to train our algorithm, and the change to the algorithm itself. LRv1 used a forward selection stepwise regression, and several of the shortcomings of stepwise regression (such as being a greedy algorithm that incrementally adds features into the model) have been addressed in the elastic net.
LRv2’s Role in LendingRobot Series
So we have our shiny new and improved model, but it is only one piece of LendingRobot Series. Again, we believe that one of the largest value-adds of LendingRobot Series is easy diversification across multiple platforms to target various risk profiles and time horizons. LRv2 does the loan selection at the platform level, and from those platform-level picks we do additional allocation to match each Series' specific investment criteria.
Conclusion
In brief, LendingRobot Series uses a new algorithm to select loans, and its improved performance can be seen in comparison to LRv1 and random selection. And as important as LRv2 is to LendingRobot Series, laying the foundation for picks on each platform, one should not forget that LendingRobot Series is about more than picking the loans with the highest returns; it is about picking the right loans for the investor.
- Justin Hsi