Innovating live video streaming for a VOD-only world
Here’s how Prime Video delivers live video streaming on customer devices that only support video-on-demand (VOD) playback.
Prime Video customers can access thousands of live video streams and sporting events from the NBA, MLB, English Premier League, and NFL Thursday Night Football (TNF). These streams are available on the Prime Video application across a wide variety of customer devices, including mobile, desktop computers, and living room devices such as smart TVs. This article describes one of our innovations that helped us achieve comprehensive device coverage for our live streaming feature.
The Prime Video application on certain streaming devices from the early-to-mid 2010s was shipped with a media player that was only capable of video-on-demand (VOD) playback. After we realized that a subset of these devices could not be updated to incorporate native live-streaming support, we took on the challenge of engineering a system that delivered live content by leveraging the VOD playback capabilities.
This effort resulted in a system that allowed Prime Video’s origin servers to map VOD media requests from these client devices to segments on the live stream in real time. We built a suite of services to maintain this mapping and adapt artifacts from the live stream for VOD playback. The system required no client updates and was orchestrated on the server side.
High-level design challenges
A live video stream is typically delivered as a segmented stream: the incoming video is broken into small segments, and the sequence is published in an index file called a manifest. Video players download the manifest to discover the available media before downloading the media itself for playback. By its very nature, a live manifest is constantly updated to add newly generated segments, so the player is expected to refresh it continually to find the latest segments in the stream.
Our first challenge was to adapt the live stream manifest for VOD playback. A live manifest is constantly refreshed, but a VOD manifest never changes, so there is no need to refresh it. Our VOD-only players therefore never refreshed the manifest, and we had to find a mechanism to reliably predict a stream’s future segment sequence and add that information to a VOD manifest.
The second challenge was digital rights management (DRM): our VOD-only players didn’t support the mid-stream key rotation that live-capable players handle. Because a mid-stream key rotation would cause playback to fail, we had to ensure that no customer sessions were active on the stream during key rotation.
Another challenge was ensuring audio-video (A/V) synchronization on the devices. The affected devices did not support all VOD features, including explicit fragment timestamps, and assumed that both audio and video fragments start at time zero. It was crucial to ensure that our VOD manifest started at a point where audio and video in the underlying live stream were exactly aligned.
We also had to find ways to introduce live-specific features such as automated failovers and heuristics used for optimizing the live playback experience.
Live to VOD translation
The first part of our solution involved deriving a VOD manifest from the live manifest. We tried extrapolating the sequence in the live manifest, but found that our VOD players rebase the start time of the audio and video streams to ‘0’ regardless of the start time given in the manifest. We therefore gave clients a manifest whose sequence began at ‘0’ and padded it with enough future fragments to add up to six hours. We chose six hours to balance uninterrupted playback duration against the size of manifest file that these devices could reasonably download and process.
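To make the padding concrete, here is a minimal Python sketch of the segment list such a manifest might contain. It assumes a fixed 2-second segment duration (the article does not state one), and the names `VodSegment` and `synthesize_vod_segments` are hypothetical; it models only the list of entries, not any real manifest format.

```python
from dataclasses import dataclass
from typing import List

# Assumed values for illustration: the article gives the six-hour target
# but not the segment duration used in production.
SEGMENT_DURATION_S = 2          # hypothetical fixed segment length, in seconds
MANIFEST_DURATION_S = 6 * 3600  # six hours


@dataclass(frozen=True)
class VodSegment:
    number: int      # VOD segment number; the sequence always starts at 0
    start_s: int     # presentation time, rebased to 0 as the devices expect
    duration_s: int


def synthesize_vod_segments(
    segment_duration_s: int = SEGMENT_DURATION_S,
    manifest_duration_s: int = MANIFEST_DURATION_S,
) -> List[VodSegment]:
    """List the segments of a synthesized VOD manifest.

    Only segment 0 exists on the live stream when the manifest is created;
    the rest stand in for future live segments that the origin resolves
    when they are requested.
    """
    count = manifest_duration_s // segment_duration_s
    return [
        VodSegment(number=i, start_s=i * segment_duration_s, duration_s=segment_duration_s)
        for i in range(count)
    ]
```

Under the assumed 2-second segments, a six-hour manifest lists 10,800 entries; doubling the duration would double the list the devices must download and parse.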
Next, we had to configure our origin servers to map segment requests from the extrapolated VOD manifest onto the live stream. To do so, we included in the VOD manifest’s URL path the live segment number that corresponds to segment ‘0’ of the VOD manifest. We call this number the ‘t’ value: it is the time code number of the latest segment on the live stream at the time the manifest is created.
As shown in the following diagram, if we create a VOD manifest when the latest fragment in the live stream is numbered as 100000, the URL for the VOD manifest would include a URL path element denoting this number as
Our VOD manifests use relative segment addressing: players construct segment URLs by replacing the manifest file name in the original manifest URL with the segment name. As a result, every fragment request carries the required offset value. The origin server arithmetically derives the live segment for a given VOD segment request from this value and the segment number in the request. This mapping is shown in the arrow labeled
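The sketch below illustrates both halves of the mapping under stated assumptions: a URL layout modeled on the `../(t=990)/10` paths in the CDN cache example that follows, the placeholder host `origin.example.com`, and hypothetical helper names. The player side is ordinary relative-URL resolution with `urllib.parse.urljoin`, and the origin side is a single addition; Prime Video's real URL scheme is not public.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical path layout: .../(t=<live segment number>)/<VOD segment number>
_SEGMENT_PATH = re.compile(r"/\(t=(\d+)\)/(\d+)$")


def segment_url(manifest_url: str, vod_segment_number: int) -> str:
    """Player side: relative addressing replaces the manifest file name with
    the segment name, so the (t=...) path element is carried along."""
    return urljoin(manifest_url, str(vod_segment_number))


def live_segment_for(url: str) -> int:
    """Origin side: live segment number = t value + VOD segment number."""
    match = _SEGMENT_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a synthesized VOD segment URL: {url}")
    t_value, vod_number = (int(group) for group in match.groups())
    return t_value + vod_number


manifest = "https://origin.example.com/ch1/(t=100000)/manifest"
url = segment_url(manifest, 5)
print(url)                    # https://origin.example.com/ch1/(t=100000)/5
print(live_segment_for(url))  # 100005
```

Because the offset travels in the URL, the origin needs no per-session state to resolve a request.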
Content delivery network (CDN) cache efficiency
We had to choose the ‘t’ value carefully. Dynamically generating a ‘t’ value for every playback request to match the live edge at the time of the request gives clients the maximum playback time. However, it also hurts our CDN cache efficiency, because we could end up with hundreds or thousands of cache keys indexing the same fragment.
For example, if we generate a new ‘t’ value every second, clients joining the stream as the live head moves from 990 to 1000 would address fragment number 1000 with URL paths such as ‘../(t=990)/10’, ‘../(t=991)/9’, and so on. Because CDN cache keys are derived from the URL, we would end up with ten different cache keys that index the same fragment. Ideally, a single fragment has only one or a few cache keys, which keeps the cache footprint small and reduces storage costs and delivery latencies.
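A small Python sketch makes the key count explicit. It assumes, as a simplification, that the CDN cache key is exactly the URL path (real CDNs may also fold in the host, query string, or headers); `cache_keys_for_fragment` is a hypothetical helper.

```python
from typing import Iterable, Set


def cache_keys_for_fragment(live_fragment: int, t_values: Iterable[int]) -> Set[str]:
    """Distinct URL paths under which clients anchored at the given t values
    would request one live fragment (assumed to equal distinct cache keys)."""
    return {f"../(t={t})/{live_fragment - t}" for t in t_values}


# A fresh t every second while the live head moves from 990 to 999:
per_second = cache_keys_for_fragment(1000, range(990, 1000))
# The same ten clients sharing one anchored t value:
anchored = cache_keys_for_fragment(1000, [990] * 10)

print(len(per_second), len(anchored))  # 10 1
```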
To solve this, we anchored the ‘t’ value for the duration of the manifest and used the players’ seek functionality to move them to the live edge. In addition to the manifest URL, players were given a seek value at playback initialization. The server constantly updated the seek value to reflect the live edge, so it could range from ‘0’ to the last segment number in the manifest. We built a service that both updated the seek point and shifted the anchor point forward at the end of the manifest duration.
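A minimal sketch of this single-anchor seek logic follows, assuming everything is expressed in segment numbers. The `AnchoredStream` class is hypothetical; the article does not describe the service’s actual interfaces.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AnchoredStream:
    """Hypothetical seek-service state for one live stream with one anchored t."""
    t_value: int            # live segment number mapped to VOD segment 0
    manifest_segments: int  # number of segments in each synthesized VOD manifest

    def start_playback(self, live_head: int) -> Tuple[int, int]:
        """Return the (t value, seek value) handed to a joining player."""
        if live_head - self.t_value >= self.manifest_segments:
            # The live edge is past the manifest's last segment: shift the anchor.
            self.t_value = live_head
        seek = live_head - self.t_value  # ranges from 0 to manifest_segments - 1
        return self.t_value, seek


stream = AnchoredStream(t_value=100000, manifest_segments=10800)
print(stream.start_playback(105000))  # (100000, 5000)
```

Note what happens near the end of a manifest: a player joining when the live head is 110799 receives a seek value of 10799 and has a single segment left to play.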
Ensuring playback length optimization
Our cache optimization strategy created a problem with playback duration. Because the seek point could range from 0 to the end of the manifest, a customer joining the stream when the live head was in the latter half of the manifest would have their playback interrupted after a short time, when the VOD manifest ran out of segments.
To avoid this, we advanced ‘t’ every ‘d/2’, where ‘d’ is the predefined duration of our synthesized VOD manifest. While this strategy halved our CDN cache efficiency, it provided a more viable customer experience: any customer joining the stream had at least ‘d/2’ hours of playback. The typical value of ‘d’ is six hours, based on the VOD manifest size that the devices could support and the typical maximum duration of a live event. In the worst case, therefore, a customer has three hours of uninterrupted playback before they must restart playback.
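The staggered scheme can be sketched the same way. This sketch assumes that anchors are placed every half manifest, counting from the segment where the stream began, and that joining players always receive the newest anchor; the function names are hypothetical.

```python
from typing import List, Tuple


def active_anchors(live_head: int, stream_start: int, manifest_segments: int) -> List[int]:
    """t values in use when a new anchor is created every half manifest.

    Anchors sit at stream_start + k * half. The newest one serves joining
    players; the previous one still serves existing sessions until its
    manifest runs out.
    """
    half = manifest_segments // 2
    newest = stream_start + ((live_head - stream_start) // half) * half
    anchors = [newest]
    if newest - half >= stream_start:
        anchors.insert(0, newest - half)  # older anchor, still playable
    return anchors


def start_playback(live_head: int, stream_start: int, manifest_segments: int) -> Tuple[int, int]:
    """Hand a joining player the newest anchor and its seek value."""
    t_value = active_anchors(live_head, stream_start, manifest_segments)[-1]
    return t_value, live_head - t_value
```

Because the seek value now stays below half a manifest, every joining player has more than ‘d/2’ worth of segments ahead of it, which is the three-hour guarantee when ‘d’ is six hours.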
While we strive for the highest cache efficiency, we accepted a doubled cache footprint to ensure that customers have at least three hours of playback. The following diagram shows the progression of the ‘t’ value over time.
Ensuring AV synchronization
The media inside audio and video segments in Prime Video live streams is not perfectly aligned for every audio-video segment pair, because audio and video segment durations differ. Live streaming clients handle the offset between audio and video using the timing information associated with each segment in the manifest. However, in our synthesized VOD manifests, both audio and video sequences start at time ‘0’, so we lose the ability to convey this synchronization information.
Through our choice of encoding parameters and of the starting audio and video segments, we ensure that ‘t’ values are computed from a point of perfect A/V sync and that subsequent playback requests are served seek values in increments of the A/V sync cycle duration. This implies, however, that clients could lag behind live by up to one sync cycle.
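One way to reason about the sync cycle is as the least common multiple of the audio and video segment durations, since that is how often their boundaries coincide. The sketch below uses that interpretation with assumed example parameters (a 48 kHz timescale, 2-second video segments, and 1.92-second audio segments of 90 AAC frames); the article does not publish Prime Video’s actual encoding settings, and the function names are hypothetical.

```python
from math import gcd

# Assumed example parameters, not taken from the article.
TIMESCALE = 48_000                   # ticks per second
VIDEO_SEGMENT_TICKS = 2 * TIMESCALE  # 2-second video segments: 96,000 ticks
AUDIO_SEGMENT_TICKS = 90 * 1024      # 90 AAC frames of 1,024 samples: 92,160 ticks


def sync_cycle_ticks(video_ticks: int, audio_ticks: int) -> int:
    """Shortest interval after which audio and video segment boundaries
    coincide again: the least common multiple of the two durations."""
    return video_ticks * audio_ticks // gcd(video_ticks, audio_ticks)


def quantized_seek_ticks(offset_from_t: int, cycle: int) -> int:
    """Round the distance from t to the live edge down to whole sync cycles,
    so playback starts on an A/V-aligned boundary. The player then trails
    the live edge by less than one cycle."""
    return (offset_from_t // cycle) * cycle


CYCLE = sync_cycle_ticks(VIDEO_SEGMENT_TICKS, AUDIO_SEGMENT_TICKS)
print(CYCLE / TIMESCALE)  # 48.0
```

With these numbers the cycle is 48 seconds (24 video segments and 25 audio segments). A player joining 100 seconds after ‘t’ would be sent to the 96-second mark (video segment 48), 4 seconds behind live.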
Seamlessly rotating keys during playback
To rotate keys without interrupting playback, we created two different outputs on our packager and moved customer traffic to one output while rotating keys on the other. As described earlier, two ‘t’ values are in use at any point in time. We bound each output to one of these ‘t’ values and rotated keys on that output after its synthesized VOD manifest had finished and before its ‘t’ value was moved forward.
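The resulting rotation cadence can be modeled with a short sketch. It assumes two outputs named "A" and "B" whose anchors are staggered by ‘d/2’, manifests lasting exactly ‘d’, and players that stay at the live edge; `rotation_schedule` and the output names are hypothetical.

```python
from typing import List, Tuple


def rotation_schedule(d_minutes: int, horizon_minutes: int) -> List[Tuple[int, str]]:
    """(minute, output) pairs at which each packager output rotates keys.

    Hypothetical model: output "A" is anchored at minutes 0, d, 2d, ...
    and output "B" at d/2, 3d/2, ...; each anchor's manifest lasts d
    minutes. Assuming players stay at the live edge, no session is still
    using an output once its manifest ends, so keys rotate at that moment,
    before the output's t value moves forward.
    """
    half = d_minutes // 2
    events = []
    for output, first_anchor in (("A", 0), ("B", half)):
        anchor = first_anchor
        while anchor + d_minutes <= horizon_minutes:
            manifest_end = anchor + d_minutes
            events.append((manifest_end, output))  # rotate keys here...
            anchor = manifest_end                  # ...then advance this output's t
    return sorted(events)


print(rotation_schedule(360, 1440)[:3])  # [(360, 'A'), (540, 'B'), (720, 'A')]
```

For ‘d’ of six hours, each output rotates every six hours and the two outputs alternate every three hours, so while one output rotates, the other is still serving every active session.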
The following diagram shows how VOD manifests vended from each endpoint are staggered and how we rotate keys at the end of each manifest.
The following diagram shows the daemon processes that generate artifacts and the data required to orchestrate playback. They run independently of any customer traffic.
The following diagram shows the flow of data as a client requests a playback asset.
Despite its complexity, our system is stable, has been continually used in production since 2016, and has brought a live streaming experience to tens of millions of customers, regardless of the device they use for streaming. Without this capability, it wouldn’t have been possible for us to deliver the wide suite of Prime Video live events and linear channels to older devices.
The development and continuous support of this system at Prime Video is one of the many ways that we exemplify the Amazon Leadership Principles of Invent and Simplify and Customer Obsession. It also provides an insight into usability and system design trade-offs made to build economically and technically viable systems that provide the best possible experience for our customers.
Part of what makes working at Prime Video such a gratifying experience is being able to collaborate with some of the brightest engineers and video specialists in the industry to solve challenging technical problems like this one.