Black Crow's LTV predictions are critical infrastructure for our partners, who use them to focus paid ad spend on high-value customers, assess how successfully they're acquiring those customers, and understand those customers better. Because these predictions are so powerful, partners understandably want to know how they're generated and how reliable they are. In this article we hope to answer those questions, but please talk to your CSM if you'd like to discuss our LTV predictions further, or open a ticket here. Let's get into it.
We start with the data
There are two key sources of data that go into our predictions:
- Shopify orders from partner shops
- Browsing data from partner sites
From there we can generate a lot of signals for our model. Some examples:
- Total amount spent
- Exact combination of products ordered
- Any discount codes used
- Customer's location
- Number of sessions logged prior to ordering
- Products browsed prior to ordering
And so on. There are hundreds of signals like these that we can generate for every order. Some matter more than others, but all of them contribute to the accuracy of the model.
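To make that a bit more concrete, here's a minimal sketch of what turning an order (plus the browsing sessions that came before it) into model signals could look like. The field names and the handful of signals shown are illustrative stand-ins, not our actual pipeline:

```python
# Illustrative sketch of turning one order (plus prior browsing sessions) into
# model signals. Field names and the signal list are simplified stand-ins, not
# the real Black Crow pipeline.
def build_signals(order, sessions):
    """order: a dict-like order record; sessions: browsing sessions for the
    same customer, each with a start time and a list of products viewed."""
    pre_order_sessions = [
        s for s in sessions if s["started_at"] < order["created_at"]
    ]
    browsed_products = {
        p for s in pre_order_sessions for p in s["products_viewed"]
    }

    return {
        "order_total": float(order["total_price"]),
        "num_items": sum(item["quantity"] for item in order["line_items"]),
        "product_ids": sorted(item["product_id"] for item in order["line_items"]),
        "used_discount": len(order.get("discount_codes", [])) > 0,
        "customer_region": order["shipping_address"]["province_code"],
        "sessions_before_order": len(pre_order_sessions),
        "products_browsed_before_order": len(browsed_products),
    }
```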
One important call-out: when we build a model for a partner, we use that partner's data and only that partner's data. We don't share data across shops. Every model we build is bespoke to the partner it serves.
Then we train a model
Once we have access to the data we need, we can train a predictive LTV model for a partner. We look over the past couple of years of orders: not so far back that the data isn't relevant, but far enough back that we have a good, representative sample size. We predict LTV for the six months following a customer's order, so our predicted LTV is the total amount we expect a customer to spend in six months, including their initial order. Customers may continue to be valuable after that six-month window, of course, but we find that six months is enough time to identify our partners' highest- and lowest-value customers.
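To put that six-month window in code terms, here's a minimal sketch of how the training label could be computed, assuming each order is a simple record with a timestamp and a total (and using 182 days purely as a stand-in for six months):

```python
from datetime import timedelta

# Minimal sketch of the training label: everything a customer spends in the six
# months starting with their initial order, including that order itself.
# Orders are assumed to be dicts with "created_at" (a datetime) and "total";
# 182 days stands in for "six months" for illustration.
def six_month_ltv(initial_order, all_customer_orders):
    window_end = initial_order["created_at"] + timedelta(days=182)
    return sum(
        o["total"]
        for o in all_customer_orders
        if initial_order["created_at"] <= o["created_at"] <= window_end
    )
```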
We use a combination of classification and regression models to generate our predictions; let us know if you'd like to get deeper into the details than that, because our data scientists would be happy to nerd out on it (there's also a rough sketch of the general idea just after the examples below). The output is a single predicted value for each customer, for instance:
- Customer 1: Spends $50 on their initial order, we predict $67 in future revenue, total predicted LTV is $117
- Customer 2: Spends $55 on their initial order, we predict $121 in future revenue, total predicted LTV is $176
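Here's that rough sketch of how a classifier and a regressor can be combined for this kind of prediction: the classifier estimates the probability that a customer orders again, the regressor estimates how much a repeat customer spends, and the product of the two gives expected future revenue. Treat it as an illustration of the general pattern, not a description of our production models:

```python
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Illustrative two-stage model: a classifier for "will this customer order
# again?" and a regressor for "how much will a repeat customer spend?".
# A common pattern for LTV problems, not Black Crow's production setup.
class LtvModel:
    def __init__(self):
        self.will_reorder = GradientBoostingClassifier()
        self.repeat_spend = GradientBoostingRegressor()

    def fit(self, X, future_revenue):
        reordered = future_revenue > 0
        self.will_reorder.fit(X, reordered)
        # Train the spend regressor only on customers who actually came back.
        self.repeat_spend.fit(X[reordered], future_revenue[reordered])
        return self

    def predict_ltv(self, X, initial_order_value):
        p_reorder = self.will_reorder.predict_proba(X)[:, 1]
        expected_future_revenue = p_reorder * self.repeat_spend.predict(X)
        # Predicted LTV = initial order + expected spend over the next six months.
        return initial_order_value + expected_future_revenue
```

Training the spend regressor only on customers who actually came back keeps each model's job narrow in this sketch: the classifier worries about whether a customer returns at all, and the regressor only about how much returning customers spend.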
It sure would be great to know who was customer 1 and who was customer 2 up front, right? That's what our model tells you. Of course, this assumes that the model works, which brings us to our next step ...
We make sure the model is on point
So here's the crucial part. None of the above matters unless our model makes predictions that match what customers actually do in the future. We won't bury the lede - it does - but let's get into how we know.
The first thing we do, while we're training the model, is test it against historical orders from those two years of past data. Not against the same orders we trained on - that would be cheating. Instead, we hold out 25% of orders (or less, depending on the size of the shop) from model training just for testing, so we're grading the model on orders it's never seen before.
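In code terms, that holdout step is as simple as it sounds. The toy data below stands in for the real signal matrix and six-month revenue labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real signal matrix and six-month revenue labels.
rng = np.random.default_rng(0)
signals = rng.normal(size=(1000, 20))
future_revenue = rng.gamma(shape=2.0, scale=30.0, size=1000)

# Hold out 25% of orders that the model never sees during training, so we can
# grade it on genuinely unseen data (smaller shops may hold out less).
X_train, X_test, y_train, y_test = train_test_split(
    signals, future_revenue, test_size=0.25, random_state=42
)
```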
When we run the model on those held-out test orders, we know both what the model predicted each customer would spend over the next six months and what the customer actually spent over that window. Let's see how this looks for a model we trained recently for a new partner, using the lovely charts that our fancy modeling software generates:
Let's break down what we're looking at: we've applied our model to past customers that the model has never seen. We've sorted those customers into ten groups by their predicted LTV, low to high, so the customers with the lowest predicted LTV are counted in decile 1, and the customers with the highest predicted LTV are counted in decile 10. The green bars show you the average LTV we predicted for each group, so of course that increases as we go left to right. And the differences are fairly dramatic: we predict 8x more LTV from the highest-LTV customers than from the lowest.
So that's what the model predicted. How much LTV the customers actually generated is represented in the fuchsia bars. If the model were really bad, those bars would be all over the place, and bear little relation to what we predicted. But instead, the actual LTV matches the predicted LTV very well: for every customer group, the bars are about the same height. We systematically quantify this difference between predicted and actual LTV, and ensure that every model we generate for a partner meets a high standard of accuracy.
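If you wanted to reproduce that comparison yourself, the decile grouping looks roughly like the sketch below, assuming you have the predicted and actual six-month LTV for each held-out customer:

```python
import pandas as pd

# Sketch of the decile comparison: group held-out customers by predicted LTV,
# then compare average predicted vs. actual LTV within each group.
# `predicted` and `actual` are equal-length arrays for the held-out customers.
def decile_report(predicted, actual):
    df = pd.DataFrame({"predicted": predicted, "actual": actual})
    # Decile 1 = lowest predicted LTV, decile 10 = highest.
    df["decile"] = pd.qcut(df["predicted"], 10, labels=False) + 1
    report = df.groupby("decile").agg(
        avg_predicted=("predicted", "mean"),
        avg_actual=("actual", "mean"),
        customers=("predicted", "size"),
    )
    # One simple way to quantify the gap between the two bars in each decile.
    report["abs_pct_error"] = (
        (report["avg_predicted"] - report["avg_actual"]).abs()
        / report["avg_actual"]
    )
    return report
```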
We also apply a similar process to the predictions the model makes in production. Six months after a prediction is made, we can check it against what actually happened, just as we can with those historical orders. We run the same analysis and see that the model is similarly accurate, and we also check in after one month and three months to make sure the model is on track.
Then we deploy
Once we have a good model, we get to the fun stuff: we push its predictions to Meta and to our portal, so our partners can acquire more of the high-LTV customers that really drive revenue. We send events once a day and retrain our models every 30 days to stay up to date, and we're always testing new signals and new ways of processing the data. If you think that sounds cool - and honestly? it is kind of cool - then have a chat with your Black Crow CSM about our predictive LTV products.
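For a very rough picture of that cadence, here's a placeholder sketch of a daily job. The callables and model hooks (`load_new_customers`, `send_prediction_event`, `retrain`, `predict_ltv_for`) are hypothetical stand-ins for the real integrations with our data pipeline, Meta, and the portal, which aren't shown here:

```python
from datetime import timedelta

RETRAIN_EVERY = timedelta(days=30)  # matches the retraining cadence above

def daily_job(today, last_trained, model, load_new_customers, send_prediction_event):
    """Placeholder daily loop: retrain on schedule, then push fresh predictions.

    The callables passed in stand in for real integrations (order and browsing
    data on one side, Meta and the Black Crow portal on the other).
    """
    if today - last_trained >= RETRAIN_EVERY:
        model = model.retrain()  # hypothetical retraining hook
        last_trained = today

    for customer in load_new_customers(since=today - timedelta(days=1)):
        predicted_ltv = model.predict_ltv_for(customer)  # hypothetical scoring hook
        send_prediction_event(customer_id=customer.id, predicted_ltv=predicted_ltv)

    return model, last_trained
```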