Building a home for live sports at Prime Video
Prime Video teams had ambitions to add support for live sport broadcasts and scale from 20 live events a year to thousands. Here’s how we did it.
In 2017, live sports were added to Prime Video and today customers can watch different sports such as NFL Thursday Night Football (TNF) and the English Premier League, all from inside the familiar Prime Video experience.
In this article we’ll discuss some architectural challenges that we faced while building a solid foundation for live sports on Prime Video. In subsequent articles for this series, we’ll do deep dives into the cool features and technology that power this customer experience.
Access to sports is an important reason why folks don’t cut the cord and cancel their cable subscriptions. Adding sports to Prime Video meant that we could bring sports content to millions of Amazon Prime customers in over 200 countries and territories on thousands of device types.
To deliver for our customers, we had to build a new set of capabilities and operational support that we didn’t yet have. Building this within an established architecture while setting ourselves up for future extension was an especially fun challenge. We first had to meet our very demanding launch goals; after that, we used techniques such as the Strangler Fig pattern to refactor the broader architecture and build a sustainable home for live sports.
Pizza and football
Here at Amazon, we prefer deep ownership by small and independent teams aligned around the concept of web services. The rule of thumb is that two pizzas should be enough to feed all members of any given dev team.
These two-pizza teams own all decisions regarding their software, including which languages and frameworks to use, deployments, testing, and operations. Amazon pioneered the use of service-oriented architectures in the late nineties, and today it’s still the dominant model for our backend architecture. The combination of two-pizza teams and small services (sometimes called microservices) is baked into Amazon’s DNA.
This combination has proved to be a good fit for Amazon over the years. It allows a large number of dev teams to build in parallel and promotes the reuse of existing functionality. Services also allow for fine-grained scaling of individual components and have a small blast radius in the case of failure, which generally means faster recovery during outages. This type of architecture isn’t for everybody, but it works for us because of Amazon’s size and scale. It’s truly astounding to see how quickly Amazon can mobilize dev teams to launch new things, and live sports on Prime Video was no exception.
One of the hardest things to do at Amazon is make a significant change to a large, well-established system. We’d like to have an intentional, well-designed architecture; however, because system complexity compounds as more services are added and our decentralized model makes it difficult to coordinate changes, the risk is that we end up with a haphazard architecture that gets more difficult to extend as time goes on.
In 2017, we launched NFL Thursday Night Football (TNF) on Prime Video not by building native support for live content but by baking thousands of edge cases into the existing services and creating a small number of new services only when absolutely necessary. Coordinating all of the teams involved was a monumental effort, and the approach came at the cost of taking on a mountain of tech debt.
We had launched and customers were delighted that they could watch NFL games from within Prime Video, but the system was brittle and had a high operational cost: we could only support about 20 broadcasts a year. We had a laundry list of features and capabilities that we needed to add, and thousands more events to support. Now that we had some headroom post-launch, it was time to roll up our sleeves and refactor our architecture.
Immovable deadlines and a tight squeeze
Although there were several anchor systems at Prime Video, we wanted to start with the Prime Video catalog services. The Prime Video catalog comprises tens of millions of show (episodic content) and movie records. It stores and vends rich and accurate metadata localized to dozens of languages at scale. For launch, we overloaded unused and deprecated catalog attributes originally meant for episodic content with live sports metadata. We were running out of attributes to overload, and the structure of episodic content was hurting usability for sports like tennis: modeling matches as episodic content forced users to scroll an unreasonable amount to find the match they were looking for. What we needed was native support for live sports in the Prime Video catalog.
The catalog team wanted not just to build a native representation of live sports into their systems, but to support a more robust model that would make it easy for partner teams to add future extensions. However, this next-generation video catalog was going to take time to do right. It was going to take close to a year to build the new system up and to migrate their existing clients.
We targeted a launch with the new catalog architecture, along with a bevy of new features, for the 2019 US Open and a very high-profile launch of the English Premier League just a couple of months later. The challenge now was that we needed to keep our legacy functionality operational while simultaneously building new features and a new catalog system, all against an immovable deadline.
It was going to be a tight squeeze to get it all done in time, and we would have missed our goals if we had not used the Strangler Fig architectural pattern. The Strangler Fig pattern – named after a type of tree that competes for light in dark forests by climbing onto an established tree and eventually replacing it – is used in service architectures when there is a desire to incrementally move from a legacy system to a modern system.
Using this pattern avoids the issues of a “big bang” switchover by gradually replacing old functionality with the new system via a smart router. Eventually, the legacy system is removed and fully replaced by the modern system.
The following diagrams show the evolution of catalog systems over time using the Strangler Fig pattern. In the first phase, manual data entry is replaced with a new upstream data source (Live Sports Catalog) and a publishing component, while the next-generation catalog is built in isolation.
After this is complete, the publishing system is modified to act as a Smart Router as clients are gradually moved to the new system. Over time, the proportion of clients of the legacy system decreases until all clients are using the new system. At this point the legacy catalog can be safely spun down.
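To make the routing phase concrete, here is a minimal sketch of a smart router that moves clients from a legacy backend to a new one, one client at a time. It is an illustration under stated assumptions, not Prime Video’s actual implementation: the class `SmartRouter`, its methods, the stub backends, and the client names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# A catalog backend is modeled as a function from a title ID to a record.
CatalogLookup = Callable[[str], Dict[str, str]]


def legacy_catalog(title_id: str) -> Dict[str, str]:
    """Stub standing in for the legacy catalog (hypothetical)."""
    return {"id": title_id, "source": "legacy"}


def modern_catalog(title_id: str) -> Dict[str, str]:
    """Stub standing in for the next-generation catalog (hypothetical)."""
    return {"id": title_id, "source": "modern"}


@dataclass
class SmartRouter:
    """Routes each client to the legacy or modern backend."""

    legacy: CatalogLookup
    modern: CatalogLookup
    migrated_clients: Set[str] = field(default_factory=set)

    def migrate(self, client_id: str) -> None:
        # Moving a client is a routing change only; no big-bang switchover.
        self.migrated_clients.add(client_id)

    def lookup(self, client_id: str, title_id: str) -> Dict[str, str]:
        # Migrated clients read from the modern system, everyone else from legacy.
        if client_id in self.migrated_clients:
            return self.modern(title_id)
        return self.legacy(title_id)

    def legacy_can_be_retired(self, all_clients: Set[str]) -> bool:
        # The legacy system can be spun down once no client depends on it.
        return all_clients <= self.migrated_clients


clients = {"search", "recommendations"}  # hypothetical client names
router = SmartRouter(legacy=legacy_catalog, modern=modern_catalog)

before = router.lookup("search", "title-1")
router.migrate("search")
after = router.lookup("search", "title-1")
untouched = router.lookup("recommendations", "title-1")
```

In this sketch, migrating `search` changes only where its requests go; `recommendations` keeps reading from the legacy backend until it is migrated too, at which point `legacy_can_be_retired(clients)` becomes true.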
First, we lifted the canonical representation of live broadcasts out of the overloaded episodic content representation and into a new suite of services that published back into the legacy catalog. While this phase didn’t add any net-new functionality, it allowed us to create a natural and purpose-built model for live sports that encapsulated the essential attributes of a sporting event: the number and type of streams it had, which teams were playing, and the time it was going to start. We created new publishing services that served as a smart router, which allowed us to move intelligently from the legacy system to the new one, with some amount of time running in dual mode.
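As a rough illustration of this step, the sketch below models a live event with the attributes named above (streams, teams, start time) and publishes it either to a legacy-shaped record, to a new store, or to both in dual mode. Everything here is an assumption made for illustration: the class names, the `spare_attr_*` field names standing in for overloaded attributes, and the in-memory dictionaries standing in for the two catalogs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Tuple


@dataclass(frozen=True)
class Stream:
    stream_id: str
    kind: str  # hypothetical values, e.g. "main" or "alternate"


@dataclass(frozen=True)
class LiveEvent:
    """Purpose-built model: who is playing, when, and on which streams."""

    event_id: str
    teams: Tuple[str, ...]
    start_time: datetime
    streams: Tuple[Stream, ...]


def to_legacy_record(event: LiveEvent) -> Dict[str, str]:
    """Flatten an event into an episodic-style record (hypothetical field names)."""
    return {
        "title_id": event.event_id,
        "episode_title": " vs ".join(event.teams),
        "spare_attr_1": event.start_time.isoformat(),
        "spare_attr_2": ",".join(f"{s.stream_id}:{s.kind}" for s in event.streams),
    }


class Publisher:
    """Publishes the canonical event to the legacy catalog, the new one, or both."""

    def __init__(self, to_legacy: bool = True, to_modern: bool = False) -> None:
        self.to_legacy = to_legacy
        self.to_modern = to_modern
        self.legacy_catalog: Dict[str, Dict[str, str]] = {}
        self.modern_catalog: Dict[str, LiveEvent] = {}

    def publish(self, event: LiveEvent) -> None:
        if self.to_legacy:
            self.legacy_catalog[event.event_id] = to_legacy_record(event)
        if self.to_modern:
            self.modern_catalog[event.event_id] = event


# Placeholder data, not a real fixture.
event = LiveEvent(
    event_id="evt-1",
    teams=("Team A", "Team B"),
    start_time=datetime(2030, 1, 1, 18, 0, tzinfo=timezone.utc),
    streams=(Stream("s1", "main"), Stream("s2", "alternate")),
)

publisher = Publisher(to_legacy=True, to_modern=True)  # dual mode
publisher.publish(event)
```

The design point is that the overloading of legacy attributes is confined to one adapter function. Once every consumer reads from the new store, setting `to_legacy=False` stops feeding the legacy catalog without touching the event model.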
After we decommissioned the legacy catalog, we could remove a ton of tech debt that we had racked up. We were able to light up several features and capabilities that were enabled directly and indirectly by the next-generation catalog, including multiple streams per game, full-game replays, and real-time stats and highlights for X-Ray. We built tools to scale up our operations and have since supported tens of thousands of live broadcasts.
But the biggest benefit of the native support enabled by our flexible catalog is that catalog consumers that aren’t involved with live sports are no longer disrupted when we make a change.
Looking to the future
The catalog is just one of several extension points that our team drove within Prime Video’s architecture.
We’re on similar journeys with other large anchor systems, such as automated merchandising, recommendations, personalization, and search. But our team doesn’t exclusively work on big architectural challenges. We also have teams working on cutting-edge machine learning/computer vision (ML/CV), building new immersive and innovative customer experiences, and further increasing our selection.
Stay tuned for more from us!