How simple bloom filters helped Prime Video solve a duplicate problem
An unlikely customer anecdote and some simple bloom filters helped a Principal Engineer solve a duplicate problem on the Prime Video homepage.
Like many success stories at Amazon, this one begins with a customer anecdote.
One evening a few months ago, my wife and I, like millions of other people, were casually flipping between different streaming services and trying to decide what to watch. Neither one of us was particularly invested in the outcome. We idly scrolled, clicked a bit, and we’d sporadically say “meh” at each other’s half-hearted suggestions.
On the Prime Video application, my wife suddenly became annoyed and asked: “Why are there so many duplicates? Look! I scroll down and there are the same titles, again and again. Let me count! This one shows up three times! This one shows up four times! Why can’t you fix this, Mr. Fancy Principal Engineer?”
This was a decidedly different tone to the other “mehs” because my wife was genuinely frustrated about this duplication of titles. Fortunately, I work at Prime Video. I could figure this out. I could find out why this happens and then maybe do something about it.
Finding the problem
The Prime Video storefront works by loading carousels in “page” size chunks. A page is a list of carousels (for example, the page with the “home” ID is what customers see when they start up the application). On the first page, customers see ten carousels and then, as they scroll down, we load the next ten. The first chunk of ten is page one and the second chunk is page two. Every carousel’s content is personalized. Not just personalized to a cohort but personalized for every customer. Even for a given customer, they’ll see different carousels depending on what device they’re on and where they are in the world. Of course, this will make any self-respecting cache burst into tears of rage and indignation. The key space is way too large for any kind of caching or precalculating solution.
The next important factor is that at Prime Video’s scale, we work pretty hard to avoid holding session state. For example, we don’t store anything server-side for what the customer saw on page one. Instead, any information we need to build page two from page one must be in the paging request. Clients use an opaque page state token they receive in every page response in subsequent paging requests.
The unfortunate side effect of this stateless approach is that each page showed customers its “best” content. Great titles on page one that are in carousels on page two are still the best and will show up again. This means that customers end up with too much of a good thing.
A naive approach would be to ask the clients (devices) to remove duplicates. They have all the content, so why can’t they remove them? Well, Prime Video has nine different clients and each client organization has multiple teams. Coordinating this business logic across all these teams would mean that customers will experience divergent content across their devices. The logic is also fiddly (for example, in some factual cases like a top ten list, we might not want to remove duplicates), so replicating it across clients will be met with a lot of angry, disappointed looks and “How could you?” from our future selves when we looked back at this decision. Even if we didn’t mind this, we’d also have to wait for all customers to upgrade their devices.
Building the solution
To remove these pesky duplicates without client work, we needed a way to tell page two about the awesome things that happened on page one. We could persist them server-side all in a store or a cache, but the cost becomes mind-bogglingly high at our scale. It also means that by the time a customer is on page ten, we’re checking hundreds of titles against thousands of titles for duplicates.
I needed a data structure that was size-bounded (so that things don’t get worse as customers keep scrolling) and small enough to pass back-and-forth with every request. I could use the existing opaque paging state token to stuff this data in to avoid any new device-side logic. Most importantly, I didn’t need to find a perfect solution, I just needed to make the problem less awful. Enter bloom filters.
A bloom filter is a space-efficient probabilistic data structure. You can put a lot of “Things” in it, and then ask if it’s seen a “Thing” before. The space efficiency comes from the filter storing a small hash of the Thing, as long as you don’t mind the odd false positive. If you’re curious about the implementation details, there’s a wealth of information about this available on the internet.
For my purposes I didn’t need anything particularly clever. I chose to use Guava’s built-in Bloom Filter class. Over a few days I took this up as a hobby project, adding the filter, and passing it through to an encoded token to devices, which then dutifully returned it back in their paging requests so I could reconstruct the filter and check for Things, and throw in new Things (up to a configured limit, beyond which the filter becomes “full” and the false-positive rate shoots up). Selecting the “right” size for the filter is, as everything, a tradeoff. The more accurate the filter is (more items, lower false positive rate), the larger the payload increase. The larger the payload, the bigger the impact to end to end latency.
Testing it out
After I wrote the code, the next step was to send it into production and see what happens. Of course, I had to verify that the new experience was better for customers. I needed an A/B test to verify whether my changes were better in some measurable way.
To run my test, I used our internal version of Amazon CloudWatch Evidently. I knew that the customer experience must become nicer. Heeding the stern warnings of my Fourth Grade English teacher that you must never use “nice” to describe anything, I now needed to somehow quantify content “nice.” But was that 10% nicer? Six absolute “nices?”
At Prime Video, we have many output metrics. These are metrics that we can measure for how customers interact with our service, things we can’t directly control. Input metrics are the things under our control, so if we select the right input metrics to optimize then hopefully we would see a bump in output metrics.
The challenge here was choosing the correct output metric. Reducing duplicates isn’t going to increase Amazon Prime signups, so clearly that’d be a poor success metric to pick. We finally landed on converted coverage as the right metric. Converted coverage is the average number of unique TV show seasons or movie titles watched by a thousand customers across a locale. The theory is that if customers see more unique titles and less duplicates, the number would go up and customers would watch a wider variety of titles.
After I chose my success metric and put in the A/B allocation code, I launched my bloom filter and waited anxiously for the results to start rolling in. I had two treatments for different filter sizes. One treatment would deduplicate across four pages and the other treatment across six pages. I chose the two treatments based to minimize the increase in payload size, to get some benefit from the deduplication without impacting customer perceived latency.
The experiment ran for a month; a shorter duration would risk not getting statistically significant data on certain customer groupings (such as occasional customers, or customers that haven’t streamed anything in three months), and I was in no particular rush (this was a hobby project after all). The results were surprising. It turns out that customers don’t mind some duplication if its sufficiently spaced out. The treatment that won out was the four page one. What was more surprising was how much of a win this seemingly trivial change delivered. It drove millions of new title discovery streams and significantly impacted the converted coverage metric that I’d chosen as the launch criteria.
So, what’s the lesson of this story? For me, it’s that you should take all customer anecdotes seriously. A late-evening quibble with my wife led to an awful lot of fun puttering about in codebases that I was not intimately familiar with. Mostly though, I was reminded that no matter how big, complex, or daunting your space is, curiosity and bias for action can still be massively impactful!