Our Innovation

At Prime Video, we solve technical challenges and build solutions at scale. Here’s how we do it.

Prime Video invented a new way to set the producer reference time as the global time reference for live-event playback on customer devices.
Prime Video used closed-form equations to represent and compress audio media timelines as a pattern template, achieving second-order, lossless compression of audio-stream media timelines.
Simple matrix-factorization techniques can be used to build an accurate, provable clustering algorithm whose performance doesn’t necessarily degrade even when the clusters are close together in space.
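As a minimal illustration of the general idea (not the algorithm from the work above), one can use a matrix factorization — here a truncated SVD — to project data onto a low-rank subspace and then cluster in that subspace; all names and data below are illustrative:

```python
import numpy as np

def factorize_and_cluster(X, k, n_iter=50):
    """Illustrative only: project points onto the top-k singular subspace
    of the data matrix (a simple matrix factorization), then run plain
    Lloyd's k-means in that low-dimensional space."""
    # Truncated SVD: X ~ U_k S_k V_k^T; the rows of V_k span the best
    # rank-k subspace of the data.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:k].T  # low-rank coordinates, shape (n, k)

    # Deterministic farthest-point initialization: one seed per region.
    centers = [Z[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)

    # Lloyd's iterations on the projected points.
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

# Two well-separated Gaussian blobs in 20 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 20)),
    rng.normal(1.0, 0.1, size=(50, 20)),
])
labels = factorize_and_cluster(X, k=2)
```

The low-rank projection is what lets a very simple clustering step succeed: it concentrates the between-cluster separation into a few coordinates before any distance comparisons are made.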
Prime Video beat previous state-of-the-art work on the MovieNet dataset by 13% with a new model that is 90% smaller and 84% faster.
During the Winter Conference on Applications of Computer Vision (WACV), Prime Video’s Yongjun Wu and Sriram Sethuraman discussed Video/Audio Quality in Computer Vision, and Hai Wei presented the HDR VQM Grand Challenge awards.
Targeted handling of three distinct types of “special events” dramatically reduces the false-alarm rate.
At the European Women in Tech conference 2022, Filippa Hasselstrom, head of low-latency streaming at Prime Video, explained how her team builds the future of live sports streaming using UDP.
Amazon Studios Technology held a workshop to enhance interoperability of the JPEG XS codec, helping to ensure a healthy ecosystem for this low-latency, high-quality transport, which can carry on-set camera feeds and graphics-workstation outputs between on-premises locations and the cloud.
Two Prime Video papers at the Winter Conference on Applications of Computer Vision (WACV) 2021 proposed neural models for enhancing video-streaming experiences.
The switch to WebAssembly increases stability and speed.
Science teams presented two state-of-the-art works at the Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
Detectors for block corruption, audio artifacts, and errors in audio-video synchronization are just three of Prime Video’s quality assurance tools.
A deep dive into the adoption of Kotlin for developing microservices at Prime Video.
The paper, presented at the Conference on Programming Language Design and Implementation (PLDI) 2022, introduced a technique to efficiently compute the difference in cost between two versions of a program.
In this paper, we consider using a multiscale approach to reduce complexity while maintaining coding efficiency.
In this work, we describe the various factors that affect the suitability of a face image for recognition by humans. We propose efficient solutions that do not require ground-truth data: we train a regression model using weak supervision provided by heuristics based on features that affect face quality. Finally, we use professional photography techniques to create standardized and aesthetically pleasing profile images.
In this work, we present a Multi-Lingual (MLi) and Multi-Task Learning (MTL) audio-only SER system based on the multilingual pre-trained wav2vec 2.0 model.
In this work, we pose intro and recap detection as a supervised sequence-labeling problem and propose a novel end-to-end deep-learning framework for the task.
This work presents a No-Reference model to detect audio artifacts in video. The model, based on a Pretrained Audio Neural Network, classifies a 1-second audio segment as either No Defect, Audio Hum, Audio Hiss, Audio Distortion, or Audio Clicks. The model achieves a balanced accuracy of 0.986 on our proprietary simulated dataset.
In this work, we develop a data collection pipeline to address long sequences of text and integrate this pipeline with a multi-head self-attention model.
We show that (a) the audio-based approach outperforms the other baselines, (b) the benefit of the audio model is more pronounced on global multilingual data than on English data, and (c) the multimodal model achieves 63% rating accuracy and can backfill the top 90% of Stream Weighted Coverage titles in the PV catalog with 88% coverage at 91% accuracy.
In this work, we propose LipNeRF, a lip-syncing NeRF that bridges the gap between the accurate lip synchronization of GAN-based methods and the accurate 3D face modeling of NeRFs.
We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data.
This paper discusses the problem of missing transcription, where the subtitle blocks corresponding to some speech segments in the DEC are missing. We present a solution that augments the human correction process by automatically identifying, in a language-agnostic manner, the timings associated with the non-transcribed dialogues.
In this work, we propose to adopt a two-tower model, in which one tower learns the user representation from watch history, while the other learns effective title representations from metadata.
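A hedged sketch of the two-tower pattern described above, with hand-picked toy embeddings standing in for the learned towers (all names and vectors here are illustrative, not Prime Video’s model):

```python
import numpy as np

# Toy metadata embeddings standing in for a learned title tower.
TITLE_EMB = {
    "watched_a": np.array([1.0, 0.0]),
    "watched_b": np.array([0.8, 0.2]),
    "similar":   np.array([0.9, 0.1]),
    "different": np.array([-1.0, 0.0]),
}

def user_tower(watch_history):
    """User tower sketch: pool the embeddings of watched titles into one
    user vector (a learned encoder over watch history in practice)."""
    return np.stack([TITLE_EMB[t] for t in watch_history]).mean(axis=0)

def title_tower(title):
    """Title tower sketch: a network over title metadata in practice."""
    return TITLE_EMB[title]

def rank(watch_history, candidates):
    """Score candidates by the dot product of the two towers' outputs."""
    u = user_tower(watch_history)
    return sorted(candidates, key=lambda t: -float(u @ title_tower(t)))

ranking = rank(["watched_a", "watched_b"], ["different", "similar"])
# "similar" points the same way as the user's history, so it ranks first.
```

The appeal of the two-tower design is that the towers can be evaluated independently: title vectors can be precomputed and indexed, and serving reduces to a nearest-neighbor lookup against the user vector.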
We propose a simple yet effective approach that uses single-frame depth-prior obtained from a pretrained network to significantly improve geometry-based SfM for our small-parallax setting.
The goal of this work is to assess the importance of spatial and temporal learning for production-related VQA. In particular, it assesses state-of-the-art UGC video quality assessment perspectives on LIVE-APV dataset, demonstrating the importance of learning contextual characteristics from each video frame, as well as capturing temporal correlations between them.
In this work, we present a thorough survey of DNN-based VADs on DEC data in terms of their accuracy, Area Under Curve (AUC), noise sensitivity, and language-agnostic behavior.
We propose a new prototype model for no-reference video quality assessment (VQA) based on the natural statistics of space-time chips of videos. Space-time chips (ST-chips) are a new, quality-aware feature space which we define as space-time localized cuts of video data in directions that are determined by the local motion flow.
In this paper, we present a novel, accurate and efficient method for temporal sync detection between dubbed audio tracks and corresponding non-dubbed original-language audio tracks.
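A classical baseline for this kind of temporal-sync estimation (not necessarily the paper’s method) is to cross-correlate energy envelopes of the two tracks, since speech and music activity tend to line up across languages even when the waveforms themselves differ:

```python
import numpy as np

def energy_envelope(x, frame=100):
    """Mean energy per non-overlapping frame: speech/music activity
    tends to survive dubbing even when the waveforms differ."""
    n = len(x) // frame
    return (x[: n * frame] ** 2).reshape(n, frame).mean(axis=1)

def estimate_delay(a, b, frame=100):
    """Frames by which track b lags track a, found at the peak of the
    cross-correlation between mean-removed energy envelopes."""
    ea = energy_envelope(a, frame)
    eb = energy_envelope(b, frame)
    ea, eb = ea - ea.mean(), eb - eb.mean()
    corr = np.correlate(ea, eb, mode="full")
    # In 'full' mode, output index k corresponds to lag k - (len(eb) - 1),
    # and b delayed by d peaks at lag -d.
    return (len(eb) - 1) - int(corr.argmax())

# Synthetic check: a bursty "soundtrack" and a copy delayed by 7 frames.
rng = np.random.default_rng(2)
original = rng.normal(size=5000) * np.repeat(rng.random(50), 100)
delayed = np.concatenate([np.zeros(700), original])[:5000]
delay_frames = estimate_delay(original, delayed)
```

In a real dubbed-vs-original setting the two waveforms are different recordings, which is why the comparison is done on coarse energy envelopes rather than on raw samples.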

Stories about the builders and innovators at Prime Video

BA Winston, VP of Technology at Prime Video, reflects on eight years, many launches, and reducing latency for live streaming at Prime Video.
Prime Video’s Michelle Dauphiny Becker, Director of Video Search, explains how Amazon empowers leaders with the trust needed to embrace unique perspectives, set bold goals, and effect positive change.
Three students share their experiences about being SDE apprentices at Prime Video in the UK.
It’s hard to tell where some stories begin, but for Girish Bajaj, VP WW Prime Video & Studios Technology, the story of innovation at Prime Video and Amazon Studios started in 2006 and hasn’t stopped since.
Embracing what she calls a “blue-sky opportunity,” the seasoned product manager opens up about her first months at Amazon Studios, helping to create the studio of the future.
The senior software development engineer talks about how he found his career calling and a community that feels like home at Amazon Studios.