Full-Metal Algorithm

“A blindfolded monkey throwing darts at the newspaper’s financial pages could select a portfolio that would do just as well as one carefully selected by the experts.”

– Burton Malkiel, A Random Walk Down Wall Street, 1973

Wall Street brokers are on the chopping block, and no one will shed a tear over their demise. The public perception is that the financial industry is rife with crooks in thousand-dollar suits who swindle the American people, and the data suggest that even well-meaning fund managers are outmoded: randomized ‘monkey’ portfolios outperform a weighted index by 1.7% per year. It’s not just computerized ‘monkey’ portfolios, either; in the 1990s Raven the chimp managed to “beat the best-of-the-best of all mutual funds run by America’s top managers” with her Monkeydex. (In the UK they use a cat.)

What chance do top-school MBAs have against automation if they already lose to primates and felines?

We are at the beginning of a Robot Revolution. Robots are besting humans in many arenas: in the 1990s Deep Blue defeated the reigning world chess champion, and in 2011 Watson was crowned king of Jeopardy. Even industries you would expect to remain purely human are starting to be taken over by automation: virtual therapists are being used by the US military to help soldiers cope with PTSD. And depending on who you are, the future looks bleak: approximately 50% of jobs are at risk of being affected by advances in technology.

Enter the robo-advisor. It is a fund manager’s nemesis: a dedicated, personal machine-learning algorithm that can make more intelligent trading decisions than a human, and make them faster. It works 24 hours a day without complaining, won’t steal from you, and you don’t even have to change its litter box.

Efficiency is King

“Why is it every time I ask for a pair of hands, they come with a brain attached?” – Henry Ford

Every major revolution starts with the desire to be more efficient. More than a century ago Henry Ford revolutionized the automobile business by perfecting the assembly line; Ford turned workers into human machines, each dedicated to a single, repetitive step in a very complex process. This process shortened the time to build a car from over a day to just 93 minutes.

Ford’s method of breaking complex processes into manageable pieces, specializing production, and eliminating inefficiencies has naturally evolved into mechanization. Any process that can be quantified is at risk of being automated. Once introduced into a process, automation does not go away.

What is machine learning?

Efficiency extends past the physical world as well. The Efficient-Market Hypothesis states that it is difficult to “beat the market” because stock prices already incorporate and reflect all relevant information.

There is an adage from the 1950s banking world that still makes sense today. It used to be said that bankers would pay depositors 3%, lend the money to borrowers at 6%, and be on the golf course at 3 o’clock. A few ambitious investors still fight to beat the market, but for most wealth managers who believe in the Efficient-Market Hypothesis it is reasonable to simply place clients’ money in an index fund, collect their fee, and go play golf.

But worry not! There is an arms race happening that is altering the entire financial landscape. Robo-advisors are currently duking it out, each searching for the best prediction algorithms. But the war between robots and humans will already be over before most financial advisors and fund managers start to perceive robo-advisors as a serious threat.

This isn’t to say that robots are smarter than humans; quite the opposite. Robots are very stupid when left to themselves, because without proper training a robot can rely on nonsensical, spurious correlations. For instance, a statistical model is 95% confident that per capita consumption of mozzarella cheese is strongly correlated (95.23%) with the number of civil engineering doctorates awarded.

[Chart: per capita mozzarella cheese consumption vs. civil engineering doctorates awarded]

If for whatever reason an abnormally large number of civil engineers receive their doctorates this year, the robot would conclude that it should invest heavily in mozzarella futures. The United States would end up with a repeat of the US government surplus cheese program.
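To see how easily a naive robot can be fooled, here is a minimal sketch that measures the correlation between two series that have nothing to do with each other. The numbers are invented for illustration (they are not the real cheese or doctorate statistics); both simply drift upward over ten years.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up yearly figures: two unrelated quantities that both happen to grow.
cheese_index = [1.0, 1.2, 1.1, 1.5, 1.6, 1.8, 1.7, 2.1, 2.2, 2.4]
doctorate_index = [100, 104, 111, 113, 121, 124, 133, 134, 141, 148]

r = pearson(cheese_index, doctorate_index)
print(f"correlation = {r:.2f}")  # very high, even though nothing links the two
```

A shared upward trend is enough to produce a correlation close to 1, which is why a high correlation by itself is a poor reason to buy mozzarella futures.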

Obviously this is nonsense, but how can we train a robot to ignore irrelevancies? This is where machine learning comes into play. Machine learning is a type of artificial intelligence that allows computers to learn something without being explicitly programmed to do so.

Tech startups all over are focusing on this branch of research. In the finance world one of those companies is LendingRobot, a robo-advisor in the Peer Lending scene.

What is Peer Lending?

Peer Lending (also known as Marketplace Lending, or Peer-to-Peer Lending) is one of the newest, largest, and fastest-growing sectors within FinTech – or financial technology. Lending Club and Prosper are two Peer Lending platforms that crowd-fund loans. Each platform is built in the cloud, connects borrowers with lenders, then handles or outsources origination, custody, underwriting, and servicing.

Each individual loan that is placed on the marketplace is inherently risky. Anyone can lose their job, have a medical emergency, or otherwise be unable to repay the lender. For this reason, most loans are sliced into pieces and parceled out to a multitude of lenders. Each lender typically invests in hundreds of loans, which mitigates risk.
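A rough way to see why spreading money across hundreds of loans helps: if each loan defaulted independently with the same probability, the share of your loans that go bad would cluster more and more tightly around that probability as the number of loans grows. The sketch below uses a hypothetical 10% default rate; the independence assumption is a simplification, since real defaults tend to rise together in a recession.

```python
import math

def default_fraction_std(default_prob, n_loans):
    """Standard deviation of the fraction of loans that default, assuming
    each loan defaults independently with the same probability."""
    return math.sqrt(default_prob * (1 - default_prob) / n_loans)

DEFAULT_PROB = 0.10  # hypothetical default rate, chosen only for illustration

for n in (1, 10, 100, 400):
    spread = default_fraction_std(DEFAULT_PROB, n)
    print(f"{n:>4} loans: std dev of default share = {spread:.1%}")
```

The expected share of defaults stays the same however many loans you hold; what shrinks is the uncertainty around it, roughly with the square root of the number of loans.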

This market is perfect for machine-learning analysis. There is enough data to be statistically meaningful, and each loan has a set life, a known and definite starting point, and an initial APR. Additionally, there is no ‘outside opinion’ to consider: the Wall Street Journal can run an opinion piece on AT&T’s latest bond issuance that alters the price, but there are no press releases concerning Lending Club client #4483382. In order to calculate the best returns, the robot simply needs to calculate the probability of default or early repayment.

But even in this scenario there are a lot of intricacies that can alter the results. Who is more likely to default: an applicant with a high income and a high debt-to-income ratio who owns their own home but has a low FICO score, or an applicant with a low income and a low debt-to-income ratio who rents an apartment and has a high FICO score?

[Figure: survival probability]

How to teach a machine

Imagine you have a not-so-bright French cousin named Bernie. He has limited intuition but he really wants to be useful. Bernie’s main attributes are that he is cheap, doesn’t get bored, and follows directions very well.

You decide that a good job for Bernie would be “Chief Maze Solver,” so you put Bernie in the middle of a maze and tell him to get out, then press “Start” on your stopwatch and observe what happens.

Bernie walks straight ahead and hits the wall. He takes two steps backwards, then moves forward again and bounces off the wall. You press “Stop” on your stopwatch. Time for some rules.

“Hey Bernie,” you say, “next time you hit the wall, turn to your right by 45 degrees.” Bernie follows your instructions to the letter. Every time he hits the wall he turns and goes much further than he did before, but eventually he winds up stuck in a corner. You decide you need another rule: Bernie will always turn right when he hits a wall, but will always turn left if given the choice.

This works pretty well, but part of the way through Bernie gets turned around and ends up wandering in circles again. Oh Bernie, what shall we do?

You have an idea – Bernie really likes cheeseburgers (or as he calls them, a Royale with Cheese); why not place some cheeseburgers at ten-foot intervals throughout the maze? Bernie eats every cheeseburger he finds, so if he doesn’t see one, he will know that he has been to that particular part of the maze before and won’t retrace his steps. You tell Bernie, “If there is no cheeseburger, then it is OK to turn right at the intersection instead of left.”

This time Bernie makes it out of the maze. You give Bernie a nice glass of red Burgundy and a two-hour lunch break, then stick him in another maze to see if he can figure it out by himself.

After many trial-and-error runs, Bernie is really tired of eating all the cheeseburgers (maybe next time we should bribe Bernie with celery sticks), but he can now consistently exit mazes without help. Well done; Bernie is now Chief Maze Solver.

Training a machine is very similar to training Bernie, except that the machine is like a Bernie who wears jet-powered roller-skates and can do 1,000 mazes per minute.
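For the curious, Bernie's final rule set can be sketched in a few lines. The cheeseburger trick amounts to remembering which spots have already been visited and backing up at dead ends, which programmers call a depth-first search. This is a hand-written illustration of the rules rather than a system that learns them, and the maze layout and function name are made up for the example.

```python
def solve_maze(maze, start):
    """Depth-first search from start to any cell marked 'E'.

    maze is a list of equal-length strings: '#' is a wall, ' ' is open floor.
    Returns the list of (row, col) cells on the route, or None if there is no exit.
    """
    eaten = {start}   # cells whose cheeseburger is gone, i.e. already visited
    path = [start]    # the route Bernie is currently trying
    while path:
        row, col = path[-1]
        if maze[row][col] == "E":
            return path
        for d_row, d_col in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            r, c = row + d_row, col + d_col
            in_bounds = 0 <= r < len(maze) and 0 <= c < len(maze[0])
            if in_bounds and maze[r][c] != "#" and (r, c) not in eaten:
                eaten.add((r, c))
                path.append((r, c))
                break
        else:
            path.pop()  # dead end: back up one step and try another turn
    return None

MAZE = [
    "#######",
    "#   # E",
    "# # # #",
    "# #   #",
    "#######",
]
route = solve_maze(MAZE, start=(1, 1))
print(len(route) - 1, "steps to the exit")
```

A computer can run this kind of search on thousands of mazes in the time it takes Bernie to finish one cheeseburger.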

Real World Example

The financial world’s algorithms are a bit more complex and contain fewer calories. LendingRobot utilizes six main tools to get its robot out of the decision “maze”:

1) Observations: LendingRobot first simply observes the data and looks for statistical significance.

2) Monte Carlo Method: A fancy way of saying ‘repeat random trials a lot of times.’ LendingRobot uses this method to estimate the standard deviation around an expected value.

3) Stepwise Regression models: This is a way to weed out irrelevancies like engineering degrees correlating with mozzarella consumption.

4) Cox Proportional Hazards Model: Designed for the healthcare industry, this is a statistical survival model that attempts to predict the amount of time that passes before some event occurs. In the healthcare world, this would mean death; in the finance world, this means default.

5) Greedy Algorithms: Greedy algorithms don’t actually care about money; a greedy algorithm always takes the step that maximizes its short-term gain.

6) Expected Return: The ultimate goal of investing is good returns. Expected return is the predicted outcome.
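As an illustration of the Monte Carlo idea in item 2, the sketch below simulates a portfolio of loans thousands of times and measures how much the overall return varies from one random trial to the next. Every number in it (default rate, interest earned, loss on default, portfolio size) is a hypothetical input, not LendingRobot's data or method.

```python
import random
import statistics

def simulate_portfolio_return(rng, n_loans, default_prob, interest, loss_on_default):
    """One random trial: average return across n_loans equally sized loans."""
    total = 0.0
    for _ in range(n_loans):
        if rng.random() < default_prob:
            total -= loss_on_default  # loan defaulted: lose part of the principal
        else:
            total += interest         # loan repaid: earn the interest
    return total / n_loans

rng = random.Random(42)  # fixed seed so the run is repeatable
trials = [
    simulate_portfolio_return(rng, n_loans=200, default_prob=0.08,
                              interest=0.12, loss_on_default=0.60)
    for _ in range(5_000)
]
mean_return = statistics.mean(trials)
std_return = statistics.stdev(trials)
print(f"mean return {mean_return:.2%}, standard deviation {std_return:.2%}")
```

The mean of the trials approximates the expected return, and their standard deviation shows how far a single real-world outcome might plausibly land from it.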

LendingRobot starts by making statistical observations on the data provided to it by the marketplace. For example, it observes that for a borrower with a stellar credit score and a high-paying job, the amount borrowed and the loan purpose are nearly irrelevant. But for a borrower with a lower credit score and a medium income, the amount borrowed and the loan purpose are very important to the probability that they will pay back the loan.

After observing for a while, LendingRobot utilizes Stepwise Regression models to select a set of predictors that are important for each group (like home ownership, loan purpose, or credit inquiries). The idea is that loans may behave differently depending on their type. The Cox Proportional Hazards Model is then used to calculate the probability of the ‘death’ of the loan. Greedy algorithms then fine-tune the system by testing very small changes in each parameter to find the optimum decision, which is then ranked based on its Expected Return.
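The greedy fine-tuning step can be pictured with a small sketch: nudge one setting at a time by a tiny amount, keep whichever single nudge improves the score most, and stop when no nudge helps. The scoring function and parameter names below are hypothetical stand-ins; a real system would score each setting against historical loan outcomes.

```python
def greedy_tune(score, params, step=0.01, max_rounds=1_000):
    """Greedy hill climbing: try +/- step on each parameter and keep the single
    change that improves the score most; stop when no small change helps."""
    params = dict(params)
    best = score(params)
    for _ in range(max_rounds):
        best_move = None
        for name in params:
            for delta in (step, -step):
                candidate = dict(params)
                candidate[name] += delta
                value = score(candidate)
                if value > best:
                    best, best_move = value, candidate
        if best_move is None:
            break  # no small change improves the score: a local optimum
        params = best_move
    return params, best

# Hypothetical score that peaks at fico_weight = 0.30 and dti_weight = 0.70.
def toy_score(p):
    return -((p["fico_weight"] - 0.30) ** 2) - (p["dti_weight"] - 0.70) ** 2

tuned, best_score = greedy_tune(toy_score, {"fico_weight": 0.50, "dti_weight": 0.50})
print({name: round(value, 2) for name, value in tuned.items()})
```

Because a greedy search only ever looks one small step ahead, it can settle on a local optimum; on a bumpier scoring function than this toy one, the starting point would matter.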

Once the robot has been trained, it can calculate the expected return of any new loan it sees, even under unique circumstances, and it can rank each loan less than a second after it appears on the marketplace.
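To make that last step concrete, here is a minimal sketch of scoring and ranking new loans. It assumes a model has already been fitted and uses the proportional-hazards form, in which a loan's survival probability is a baseline survival raised to the power exp(risk score). The baseline, coefficients, feature names, loans, and the simplified return formula are all hypothetical; a real calculation would also account for payment timing and early repayment.

```python
import math

# Hypothetical fitted values; a real model would estimate these from loan history.
BASELINE_SURVIVAL = 0.92  # chance a baseline loan reaches maturity without default
COEFFICIENTS = {"debt_to_income": 1.5, "recent_inquiries": 0.20}

def survival_probability(features):
    """Proportional-hazards form: S(x) = S0 ** exp(sum of coefficient * feature)."""
    risk_score = sum(COEFFICIENTS[name] * value for name, value in features.items())
    return BASELINE_SURVIVAL ** math.exp(risk_score)

def expected_return(loan):
    """Chance of repayment times interest earned, minus chance of default times loss."""
    p_survive = survival_probability(loan["features"])
    return p_survive * loan["interest"] - (1 - p_survive) * loan["loss_on_default"]

loans = [
    {"id": "A", "interest": 0.08, "loss_on_default": 0.60,
     "features": {"debt_to_income": 0.10, "recent_inquiries": 0}},
    {"id": "B", "interest": 0.22, "loss_on_default": 0.60,
     "features": {"debt_to_income": 0.35, "recent_inquiries": 3}},
    {"id": "C", "interest": 0.25, "loss_on_default": 0.60,
     "features": {"debt_to_income": 0.40, "recent_inquiries": 5}},
]

ranked = sorted(loans, key=expected_return, reverse=True)
for loan in ranked:
    print(loan["id"], f"{expected_return(loan):.2%}")
```

In this made-up example the riskiest loan pays the highest interest rate yet ranks last, because its chance of default more than cancels out the extra interest.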

Conclusion

There is no way that humans can compete against this type of system. A robot will have analyzed all the loans and intelligently scored each one on its expected outcome by the time a human could even move a mouse to refresh the webpage.

The robots are in a heavyweight league of their own, while humans are left to compete with the monkeys (or cats).

2 Comments

  1. Michael says:

    Thanks for the article — especially the discussion of the six main tools that you use to build your model. I’ve been wondering what kind of approach you take — as I’ve been playing around with LendingClub data for several months now; trying to predict loan performance.

    Do you apply any NLP on the user descriptions? I’ve noticed some predictive signal in the loans that have a few dozen words or more.

    Love your articles. Keep up the good work!

    1. Thanks Michael!

      We don’t use NLP on user descriptions yet, since the data isn’t included in Lending Club’s API (although this may change soon). I agree with you, and descriptions may be particularly helpful in secondary market trading, as we may be able to discover if a borrower is avoiding collection calls. The data will tell!
