
“We’re just beginning to build the future of live sports streaming”

At the European Women in Tech conference 2022, Filippa Hasselstrom, head of low-latency streaming at Prime Video, explained how her team builds the future of live sports streaming using UDP.

Filippa presenting on stage at the "European Women in Tech" event.

I’m Filippa Hasselstrom and I head up the live low-latency streaming experience at Prime Video. In this article – which is based on my presentation at the European Women in Tech conference 2022 – I will dive deep into streaming and how it has evolved over time, explain streaming in a nutshell, and examine some of the challenges that we see with live streaming.

Finally, we’ll look at something that’s really close to my heart – low-latency streaming and the experience that we can deliver to customers by using low latency. But first, let’s start from the beginning.

TCP: Great for VOD, not great for live streaming

When TV was only broadcast using terrestrial or analog technologies, everyone was watching the same thing at the same time with very low latency. However, in the past 15 years, streaming has drastically changed the TV landscape.

So, what is “streaming”? Well, the first application to appear using streaming was video-on-demand (VOD). When you’re on Prime Video and watching Jack Ryan or Fleabag (my personal favorite show), you’re watching VOD. It’s an individual experience because you can hit play or pause and stop the stream whenever you want. You have full control.

To stream VOD content, transmission control protocol (TCP)-based technologies were developed (for example, DASH or HLS). These streaming solutions were designed to provide you with the best possible customer experience. They solved for poor internet connections by doing two things: adapting the bit rate based on your network connection and buffering on the client side. Yet, this buffering is great for VOD but not great for live streaming because it introduces latency.
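To make those two mechanisms concrete, here’s a minimal sketch in Python of client-side adaptive bit rate selection with a buffer safeguard. The rendition ladder, headroom factor, and buffer threshold are illustrative assumptions, not values from any real player.

```python
# A toy illustration of the two client-side mechanisms above: adapting the
# bit rate to the network and protecting playback with a buffer. All numbers
# are hypothetical.

RENDITIONS_KBPS = [400, 1200, 2500, 5000, 8000]  # assumed rendition ladder

def pick_bitrate(throughput_kbps: float, buffer_seconds: float) -> int:
    """Choose the highest rendition the measured network can sustain."""
    if buffer_seconds < 2.0:
        # Buffer nearly empty: drop to the safest rendition to avoid a stall.
        return RENDITIONS_KBPS[0]
    # Keep ~25% headroom so a throughput dip doesn't immediately cause a stall.
    budget = throughput_kbps * 0.75
    affordable = [r for r in RENDITIONS_KBPS if r <= budget]
    return max(affordable) if affordable else RENDITIONS_KBPS[0]

print(pick_bitrate(throughput_kbps=4000, buffer_seconds=12.0))  # -> 2500
```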

When live and linear applications were introduced, many companies reused TCP technology because it was already available. The predominant thought was “Let’s use what we have.” But no one considered the latency issue, even though it disconnected the streaming audience’s experience from that of the broadcast audience.

So back in the day, we were working on a solution and trying to communicate this emerging latency issue. But not many people were interested in it. Then in 2016, Twitter created a joint streaming experience with the National Football League (NFL). They made a website, put the game feed on one side and the Twitter feed on the other, and created a global TV couch. But what happened was that one user would tweet “Did you see that goal? What a goal!” and another user would reply “What goal?” Customers weren’t seeing the same thing at the same time. This was a great example and helped us to convince people that latency was a problem for the live use case.

Live streaming is a different ball game, and one of its biggest technical challenges is when tens of thousands, hundreds of thousands, or even millions of customers join simultaneously to watch the game. Additionally, things happen during live events – this is what makes them exciting, right? The game can be boring. People leave. Or something unexpected happens and more customers come to view the event. So, live streaming is not only thrilling for the viewers, it is also thrilling for us as tech providers because unexpected things can happen.

The drawback of using TCP to deliver events from the venue directly to your phone

Imagine you want to watch a game on your phone. Before you even receive a stream, a lot of stuff happens. The following diagram shows you an overview of the streaming process using TCP.


The diagram shows the live event venue (such as a stadium) with cameras and microphones. The video and audio feeds come from this sports venue and go into the live production facility where the producer chooses what feeds to send to customers. This is called the “produced feed.” It’s very high quality and is sent to the Amazon Web Services (AWS) Cloud.

The produced feed is the gold standard for the event, so it’s really important that we don’t lose it, and we have very secure, redundant connections to ensure that it will never be lost. But is the feed now ready to send to customers? Not yet, because it’s a very high-resolution feed. If you’re going to watch the stream on your phone, it needs to be adapted. This is where we re-encode the feed so that there’s a flavor for every device, perfectly catered to the device that you view the live stream on. We also make sure to create different flavors depending on your network connection.
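As an illustration of these “flavors,” a rendition ladder might look something like the sketch below. Every name, resolution, and bit rate here is hypothetical; the real ladder depends on the content and the devices being targeted.

```python
# A hypothetical encoding ladder: the same produced feed re-encoded at several
# resolutions and bit rates, so each device and network condition has a match.

ENCODING_LADDER = [
    {"name": "mobile-low",  "resolution": "640x360",   "bitrate_kbps": 400},
    {"name": "mobile-high", "resolution": "1280x720",  "bitrate_kbps": 2500},
    {"name": "tv-hd",       "resolution": "1920x1080", "bitrate_kbps": 5000},
    {"name": "tv-uhd",      "resolution": "3840x2160", "bitrate_kbps": 15000},
]

for r in ENCODING_LADDER:
    print(f'{r["name"]}: {r["resolution"]} at {r["bitrate_kbps"]} kbps')
```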

The feeds are then stored in real time in a live origin. So, are we ready to start live streaming now? No, not yet. This is where it’s important to understand the difference between streaming and downloading. If you’re downloading a file, you have to wait for the entire file to be downloaded. If you’re streaming, the file is actually chopped up into smaller segments. As soon as a segment is downloaded, you can start viewing it.

When you hit the play button, you get what’s called a “manifest file.” This manifest file contains links that tell the player where to fetch the video segments, which are encoded at different bit rates. With TCP, each segment has a header and a payload; in this case, the payload is video. TCP sends these segments to your phone. If you have a good internet connection, more segments are stored on your phone, and your phone will ask for segments at a higher resolution.

The buffering and the control of segments are all client-side. This gives you an amazing VOD experience, but for live streaming it introduces latency and prevents synchronization across devices.
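The toy sketch below ties the manifest and the client-side decisions together: parse a much-simplified, HLS-like manifest, then pick which rendition’s next segment to fetch. The manifest format, URLs, and numbers are all made up for illustration; a real player also handles codecs, DRM, and precise timing.

```python
# A simplified, HLS-like manifest: each line advertises a bit rate (kbps) and
# a segment URL template. The format and URLs are hypothetical.
MANIFEST = """\
#RENDITION 400 https://cdn.example.com/low/seg_{n}.ts
#RENDITION 2500 https://cdn.example.com/mid/seg_{n}.ts
#RENDITION 5000 https://cdn.example.com/high/seg_{n}.ts"""

def parse_manifest(text: str) -> dict:
    """Map each advertised bit rate to its segment URL template."""
    renditions = {}
    for line in text.splitlines():
        _tag, kbps, url = line.split(maxsplit=2)
        renditions[int(kbps)] = url
    return renditions

def next_segment_url(renditions: dict, throughput_kbps: float, n: int) -> str:
    # The client decides: highest rendition at or below measured throughput.
    affordable = [r for r in renditions if r <= throughput_kbps] or [min(renditions)]
    return renditions[max(affordable)].format(n=n)

renditions = parse_manifest(MANIFEST)
print(next_segment_url(renditions, throughput_kbps=3000, n=42))
# -> https://cdn.example.com/mid/seg_42.ts
```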

Using UDP to build low-latency live streaming

When we started building our low-latency live streaming experience, we had one vision: bridging the gap between the broadcast audience and the streaming audience to ensure that they saw the same thing and shared the same experience.

We wanted to have a broadcast-like experience, which means latency in live streaming that’s comparable to that of the live broadcast. It means perfect synchronization, not only across all devices but also across feeds, metadata, and audio. And it needs to look good both on your phone and on the large smart TV in your living room. We also wanted to ensure seamless ad insertion, instant channel changes, and instant replays.

Now that we had our goals, we designed a solution based on user datagram protocol (UDP), which is a different protocol compared with TCP. As an illustrative example, TCP moves through a room steadily, albeit slowly, in the same direction. UDP, in contrast, is like an Olympic sprinter, running through the room at full speed, not stopping anywhere to take a break.

UDP was developed so that applications could communicate with very little latency over the internet, but those applications also have to be somewhat forgiving about packet loss. We have taken the best of UDP and added our magic on top of it. Our magic includes making sure that you’re not losing any packets, and it also includes the adaptive bit rate, making sure that we can deliver the video at the best possible resolution for a customer’s bandwidth. UDP streaming also means small packets rather than segments, which means there’s no caching or buffering. It’s more like a constant firehose with no latency.
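To illustrate what being “forgiving about packet loss” means in practice, here is a minimal receiver sketch. This is not Prime Video’s protocol; it assumes a toy packet layout in which each datagram starts with a sequence number, which is what lets an application notice the gaps that UDP itself silently ignores.

```python
# A toy UDP receiver that detects packet loss via sequence numbers. On a gap,
# a real system would request a retransmission or use forward error correction.

import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5004))  # hypothetical media port

expected = None
while True:
    datagram, _addr = sock.recvfrom(2048)
    # Assumed layout: 4-byte big-endian sequence number, then video payload.
    (seq,) = struct.unpack_from("!I", datagram)
    payload = datagram[4:]
    if expected is not None and seq != expected:
        print(f"gap detected: expected packet {expected}, got {seq}")
    expected = seq + 1
    # ... hand `payload` straight to the decoder; no segment buffering ...
```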

The following diagram shows our low-latency architecture. It’s similar to the one for TCP; we pick up after the encoder that creates all those different flavors for different devices.


In this architecture, we bring the content into the AWS Cloud origin and there, in the cloud fan out step, we do point-to-multi-point replication to make sure that we can connect with all the content delivery networks (CDNs) out there. We’re also at the CDN site, and we do point-to-multi-point replication all the way to the CDN edge. Now you, as a customer, can start streaming.
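Conceptually, the fan-out step is one packet in, many packets out. The sketch below illustrates it with hypothetical downstream addresses; a production fan-out would also handle health checks, encryption, and failover.

```python
# A minimal point-to-multi-point replication sketch: every packet received
# from upstream is copied to each downstream destination. Addresses are made up.

import socket

DOWNSTREAM = [
    ("edge-1.example.com", 5004),
    ("edge-2.example.com", 5004),
    ("edge-3.example.com", 5004),
]

upstream = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
upstream.bind(("0.0.0.0", 5002))  # receives the single upstream feed

out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
while True:
    packet, _src = upstream.recvfrom(2048)
    for dest in DOWNSTREAM:
        out.sendto(packet, dest)  # one in, many out
```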

The following diagram shows this process. What you can see is that we’re different because we have a fixed low latency.

The diagram shows the same point-to-multi-point fan-out, from the AWS Cloud origin through the CDNs to the CDN edge, but this time with a fixed low latency maintained all the way to the customer.

Our solution is designed to make decisions server-side, controlling the latency. This is in contrast to TCP, where the clients make decisions based on their network. Our approach makes it possible to guarantee that everybody sees the same thing at the same time.
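One way to picture server-side latency control: the server, not the client, decides when each frame should appear on screen, so every device renders it at the same moment. The sketch below assumes synchronized clocks and a made-up fixed delay; it illustrates the idea rather than our implementation.

```python
# The server stamps each packet with a target presentation time (capture time
# plus a fixed delay); clients simply wait until that time to render. With
# synchronized clocks, all screens show the same frame at the same moment.

import time

FIXED_LATENCY_S = 2.0  # hypothetical fixed delay, identical for every client

def stamp(packet: bytes, capture_time: float) -> tuple:
    # Server side: choose one presentation time for everyone.
    return (capture_time + FIXED_LATENCY_S, packet)

def render(stamped: tuple) -> None:
    # Client side: no local decisions, just honor the server-chosen time.
    present_at, packet = stamped
    delay = present_at - time.time()
    if delay > 0:
        time.sleep(delay)
    # ... display the frame contained in `packet` ...

render(stamp(b"frame-data", capture_time=time.time()))
```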

The future of live streaming

We’re building the future of live streaming for our future customers. We believe that for many of them mobile is going to be their first screen, not their second screen. They might use a big screen to get a better resolution but they’re definitely used to watching video on interactive platforms such as their phone. They’re very engaged with their content and communicate what they’re experiencing. They’re not just passively sitting and watching something.

Furthermore, the trends we see mean that the experience must be personalized. As customers, we want to be able to choose what we’re watching. We also want it to be shareable.

This probably means having not just one produced feed from a live event venue but multiple ones. It means that the content is personalized. And finally, it means that the content is interactive and engaging for our audiences.

And we’re already there. At Prime Video, we have a feature called X-Ray that helps you dive deeper into content and find out more. It provides you with live in-game statistics and information. If you joined the game a little late, this feature will catch you up on the events that have already occurred so you can get back on track. The following image shows X-Ray’s interactivity in action.

The image shows X-Ray in action, featuring a still from a soccer match with key statistics overlaid on it, including the number of passes, pass accuracy, tackles, clearances, corners, fouls, red cards, yellow cards, and important season statistics for both teams.

We’re doing this and so much more at Prime Video, but we’re just at the beginning of building the future of live streaming and entertainment. I think we’re going to see more and more experiences that continue to build on customer engagement, energy, and excitement around participating in these live sports experiences from the living room.

Stay tuned for more from us!

Head of low-latency streaming – Prime Video