Skip to main content

Shaping live sports publishing traffic through a distributed scheduling system

Prime Video applies a distributed workflow scheduling system for publishing live sports content.

The Live Events Publishing team at Prime Video Sports owns a digital supply chain for sports video selection. Typically, a manufacturing supply chain procures raw materials, assembles a product, performs quality checks, and releases the product to customers. At Prime Video, the process is similar and our live selection supply chain gathers the necessary metadata for a live event, assembles the event, performs data validations, and publishes the event on Prime Video.

After beginning to carry live sports events on Prime Video in 2017, our team was tasked in 2018 with building support for entire regular league seasons in different sports. Supporting an entire season’s worth of sports requires the ability to publish events in bulk with minimal human intervention. Up until then, we had supported only a few live events per year; we quickly learned that our downstream systems needed to scale.

The bursty traffic problem

The problem we faced was not new and can be generalized to a class of problem that occurs with bursty traffic. Bursty traffic is when the inbound request rate is unpredictable and has a high standard deviation. It’s challenging to design systems handling unpredictable request rates because the design and infrastructure choices are often a function of expected inbound traffic. At Amazon, we design our services to be horizontally scalable; however, scaling comes at a cost and solving a problem by using money and complexity is not always the best path.

We knew we had to do something drastically different than per-host rate limiting.

Typically, a service can protect itself from a higher-than-usual request rate by using some form of rate limiting. Rate limiting is a class of strategies used to prevent the rate of an operation from exceeding system constraints. In our case, a server-side rate limiting solution (such as request throttling) would prevent downstream impact by rejecting requests that exceed an acceptable request rate. But it would also limit our ability to publish live sports events. Our first approach was a temporary mitigation through a quick implementation using Guava’s rate limiter on a per-host basis. We knew this wouldn’t work out for the long term. Because we were operating a distributed system, we knew we’d need to rate limit the total throughput regardless of the number of hosts.

To make this problem even more complex, we also knew that we not only needed to publish these games but also had to keep them updated over time. For example, a game might be postponed due to inclement weather and we wanted to inform our customers about it as soon as possible. By adding more and more games, this would become more and more of a problem. How could we balance near real-time updates with the need to onboard seasons of sporting events?

We knew we had to do something drastically different than per-host rate limiting.

Leaky buckets, queuing theory, and traffic shaping to the rescue

We started researching two common traffic shaping algorithms (a form of rate limiting) from the networking world: leaky buckets and token buckets. Both of them control the rate of traffic sent through a network.

The leaky bucket algorithm is best visualized as a bucket with a hole at the bottom. Water can be poured into the bucket at any rate, but it flows through the hole at a constant rate. In the token bucket algorithm, individual discrete tokens are collected in a bucket instead of water. Each request requires a certain number of tokens to proceed. If enough tokens are available, traffic continues flowing through the system. If not enough tokens are available, the request is held up until more tokens are available.

The Guava rate limiter gave us a token bucket solution on a per-host level, but for our purposes a leaky bucket approach would work just fine. We just needed to centrally control flow for all hosts in the fleet instead of per-host control. After we decided to go with a leaky bucket algorithm, we had to figure out how to implement it.

We decided to use Amazon Simple Queue Service (Amazon SQS) First-In-First-Out (FIFO) queues and AWS Lambda as the core constructs of our implementation. The queues act as the bucket where requests (known as “work items”) are accumulated and Lambda is used as an SQS poller and houses the leaky bucket logic. Lambda is periodically invoked and releases a preconfigured number of work items in the order that they are received. The following diagram shows a single FIFO queue being processed by our Lambda, which acts as a gatekeeper in how many active work items we’d work on at once.

A work pool with evenly sized items get enqueued in a first-in, first-out (FIFO) manner, and released as active work items on a periodic basis.

A single queue by itself doesn’t ensure that urgent updates get through quickly. To ensure we’d account for any urgency, we used two FIFO queues: high priority and low priority. This allowed our work to be classified into two priority levels and shape the flow to account for both urgency and precedence (in that order). So, we could now make sure that today’s game is updated regardless of how much other work is pending. The following diagram shows our gatekeeper Lambda processing more than one FIFO queue and treating all high-priority work items before looking at any low priority items.

The work pool items now get enqueued two distinct FIFO queues, high and low priority, providing a “fast lane” to preempt lower priority work.

We now had a central, urgency-aware, and seamless work scheduling system. So, we’re done here...right? Not quite!

We had one last problem to deal with: work items weren’t uniform in their processing cost. Some were more expensive to process than others. To account for this, we added the concept of an abstract “cost” and assigned a cost to each work item to declare how expensive it is. In every cycle, the scheduling Lambda was given a fixed amount of abstract “money” that it could spend on work items. Money would be spent on the work items according to their declared cost. If enough money is available, the scheduler releases the item, pays its “cost,” and then moves on to the next one.

This introduces a new problem. Depending on their cost, a work item can take one or more cycles of the Lambda execution before it is released. We introduced a separate “head” SQS queue (standard, non-FIFO) to store the currently processing work item. We could have used something else but chose Amazon SQS to reduce our dependencies and keep things simple. The following diagram shows the head queue where we would park work items waiting for enough resources between Lambda invocations.

The work pool items are now variable in size, and larger items might take more than one period before it should be released. The head queue is a “parking” spot for the work item that’s next to be released into the active pool.

Now we had all the elements in place. We had a centralized, urgency-aware, and seamless way to control publishing flow rate for variable-size work items. The following graph shows our solution in action, with the large spike being flattened over a much longer period of time but the most important work gets done first.

Large traffic bursts get flattened over time, our downstream systems our protected from traffic spikes, and we do the most important work first.

There are limits to what we have described here, such as the possibility of running into resource starvation in which a constant influx into the high priority queue “starves” the work items in the low priority queue. Starvation is a well-known problem with scheduling algorithms and there are potential solutions like tracking the age of a work item and changing priority based on increasing age. We monitor the scheduler system closely to see any signs of starvation or other limits and plan to make improvements if necessary.


Our scheduling system (referred to as “the scheduler”) has been quietly doing its job in the background for the past five years. We’re able to tune the rate of publishing, prioritize urgent publishes over others, perform backfills and, most importantly, not lose sleep over any of it!

When the start time of an upcoming game changes, the request goes through the high priority queue of the scheduler and reaches onsite within a few minutes, while seamlessly handling a concurrent publish request for the full season.

As engineers at Prime Video and Amazon, we take great pride and joy in deploying our skills to delight our customers. Through the scheduler, we deliver the latest game updates to our customers while giving our business team the confidence to sign more deals and increase our selection, which in turn delights even more customers.

Stay tuned for more from us!

Principal Engineer – Prime Video
Senior SDE – Prime Video