
Computer Vision

Content about computer vision at Prime Video.

Science teams presented two state-of-the-art works at the Conference on Computer Vision and Pattern Recognition (CVPR) 2022.
Prime Video beat previous state-of-the-art work on the MovieNet dataset by 13% with a new model that is 90% smaller and 84% faster.
Two Prime Video papers at the Winter Conference on Applications of Computer Vision (WACV) 2021 proposed neural models for enhancing video-streaming experiences.
During the Winter Conference on Applications of Computer Vision (WACV), Prime Video’s Yongjun Wu and Sriram Sethuraman discussed Video/Audio Quality in Computer Vision, and Hai Wei presented the HDR VQM Grand Challenge awards.
Prime Video invents new state-of-the-art weakly and self-supervised contrastive learning algorithms to reduce its dependence on large amounts of labeled training data.
Prime Video used computer vision technology to reinvent sports-field tracking for monocular broadcasting videos.
Actor identification and localization in movies and TV series seasons can enable deeper engagement with the content. Manually identifying and tagging actors at every time instance in a video is error-prone, as it is a highly repetitive, decision-intensive, and time-consuming task. The goal of this paper is to accurately label as many faces as possible in the video with actor names.
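The paper's framework is not reproduced here, but the general idea of automatic face labeling can be sketched as follows: embed each detected face with any face-recognition model and assign it the name of the most similar face in a small labeled gallery. The function below is a minimal, hypothetical illustration; the embedding arrays and the similarity threshold `min_sim` are assumed inputs, not details from the paper.

```python
# Minimal sketch (not the paper's method): label detected faces by matching
# face embeddings against a small gallery of known actor embeddings.
import numpy as np

def label_faces(face_embeddings, gallery_embeddings, gallery_names, min_sim=0.6):
    """Assign an actor name (or None) to each detected face embedding.

    face_embeddings:    (N, D) array, one row per detected face
    gallery_embeddings: (M, D) array of reference embeddings for known actors
    gallery_names:      list of M actor names aligned with gallery_embeddings
    """
    # Cosine similarity between every detected face and every gallery face.
    f = face_embeddings / np.linalg.norm(face_embeddings, axis=1, keepdims=True)
    g = gallery_embeddings / np.linalg.norm(gallery_embeddings, axis=1, keepdims=True)
    sims = f @ g.T                                   # shape (N, M)

    labels = []
    for row in sims:
        best = int(np.argmax(row))
        # Leave the face unlabeled when no gallery actor is similar enough.
        labels.append(gallery_names[best] if row[best] >= min_sim else None)
    return labels
```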
We propose a novel framework to register sports fields as they appear in broadcast sports videos. Unlike previous approaches, we specifically address the challenge of field registration when (a) there are not enough distinguishable features on the field, and (b) no prior knowledge about the camera is available.
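For context, the conventional baseline for field registration is keypoint-based homography estimation: match points between the broadcast frame and a top-down field template, then fit a homography. The sketch below illustrates that generic baseline with OpenCV; it is not the framework proposed in the paper, and it assumes the matched point pairs are already available from some detector.

```python
# Generic field-registration sketch (not the paper's method): estimate a
# homography from broadcast-frame keypoints to a top-down field template,
# then warp the frame onto the template.
import cv2
import numpy as np

def register_frame(frame, frame_pts, template_pts, template_size):
    """Warp a broadcast frame into template (top-down field) coordinates.

    frame_pts, template_pts: (N, 2) arrays of matched points (N >= 4)
    template_size:           (width, height) of the field template image
    """
    H, inliers = cv2.findHomography(
        np.asarray(frame_pts, dtype=np.float32),
        np.asarray(template_pts, dtype=np.float32),
        cv2.RANSAC, 5.0)                        # RANSAC rejects bad matches
    if H is None:
        return None, None
    warped = cv2.warpPerspective(frame, H, template_size)
    return H, warped
```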
To overcome the drawbacks of prior motion-compensated temporal filtering (MCTF) designs, we propose an encoder-aware MCTF (EA-MCTF) that resides within the encoder.
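As background, conventional MCTF smooths a frame along motion trajectories before encoding. The sketch below is a minimal, generic illustration of that idea (warp the previous frame toward the current one along dense optical flow and blend the two); it is not the encoder-aware EA-MCTF proposed in the paper, and the blending weight `strength` is an assumed parameter.

```python
# Minimal sketch of conventional motion-compensated temporal filtering (MCTF),
# not the paper's EA-MCTF: align the previous frame to the current frame via
# dense optical flow, then blend the two frames.
import cv2
import numpy as np

def mctf_filter(prev_gray, cur_gray, strength=0.5):
    """Temporally filter `cur_gray` using a motion-compensated `prev_gray`."""
    # Flow from the current frame to the previous frame (Farneback, typical
    # params: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags).
    flow = cv2.calcOpticalFlowFarneback(cur_gray, prev_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

    h, w = cur_gray.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # cur(y, x) roughly equals prev(y + flow_y, x + flow_x), so sampling the
    # previous frame at the flow-displaced grid aligns it with the current frame.
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    warped_prev = cv2.remap(prev_gray, map_x, map_y, cv2.INTER_LINEAR)

    # Blend: a higher `strength` means stronger temporal smoothing.
    return cv2.addWeighted(cur_gray, 1.0 - strength, warped_prev, strength, 0)
```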
In this paper, we present a large-scale HDR video-quality dataset for sports content that captures the important issues that arise in live streaming, along with a method for merging multiple datasets using anchor videos.
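One common way to merge subjective-quality datasets via anchor videos is to fit a mapping between the scores the shared anchors received in each dataset and then remap all remaining scores onto a single scale. The snippet below shows that generic idea with a simple least-squares linear fit; it is an assumption for illustration, not necessarily the calibration procedure used in the paper.

```python
# Illustrative sketch (not necessarily the paper's procedure): merge two
# subjective-quality datasets onto a common scale using anchor videos that
# appear in both, by fitting a linear mapping on the anchors' scores.
import numpy as np

def merge_scores(anchor_scores_a, anchor_scores_b, scores_b):
    """Map dataset-B scores onto dataset A's scale via shared anchor videos.

    anchor_scores_a / anchor_scores_b: scores of the same anchor videos as
                                       rated in dataset A and dataset B
    scores_b:                          all scores from dataset B to remap
    """
    # Least-squares fit of slope * x + intercept from B's anchors to A's.
    slope, intercept = np.polyfit(anchor_scores_b, anchor_scores_a, deg=1)
    return slope * np.asarray(scores_b) + intercept

# Example: anchors rated 30/50/70 in B were rated 35/52/71 in A.
remapped = merge_scores([35, 52, 71], [30, 50, 70], [40, 60, 80])
```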
We built a video quality database specifically designed for live-streaming VQA research: the Laboratory for Image and Video Engineering (LIVE) Livestream Database. The LIVE Livestream Database includes 315 videos of 45 contents impaired by six types of distortions.
We propose a new prototype model for no-reference video quality assessment (VQA) based on the natural statistics of space-time chips of videos. Space-time chips (ST-chips) are a new, quality-aware feature space that we define as space-time-localized cuts of video data in directions determined by the local motion flow.
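As a loose illustration of the ST-chip idea (not the authors' implementation), the sketch below follows the mean local motion of a small patch across a few frames and stacks the motion-aligned patches into a space-time chip, returning toy mean/std statistics in place of the paper's natural-scene-statistics features. The patch size, chip length, and Farneback flow parameters are assumptions.

```python
# Rough illustration of a motion-guided space-time chip (not the authors'
# implementation): track a small patch along the mean local optical flow and
# stack the patches into a (length, size, size) chip.
import cv2
import numpy as np

def extract_st_chip(frames, y, x, size=8, length=5):
    """Cut one space-time chip starting at (y, x) in the first frame.

    frames: list of consecutive grayscale frames (2-D uint8 arrays)
    size:   spatial side of the chip; length: number of frames used
    """
    cy, cx = float(y), float(x)
    patches = []
    for t in range(length):
        h, w = frames[t].shape
        # Clamp the patch corner so it stays inside the frame.
        iy = int(np.clip(round(cy), 0, h - size))
        ix = int(np.clip(round(cx), 0, w - size))
        patches.append(frames[t][iy:iy + size, ix:ix + size].astype(np.float32))
        if t + 1 < length:
            # Dense optical flow from frame t to frame t+1 (Farneback defaults).
            flow = cv2.calcOpticalFlowFarneback(frames[t], frames[t + 1], None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            local = flow[iy:iy + size, ix:ix + size]
            # Shift the patch centre along the mean local motion.
            cx += float(local[..., 0].mean())
            cy += float(local[..., 1].mean())
    chip = np.stack(patches)                    # shape: (length, size, size)
    return chip, (chip.mean(), chip.std())      # chip plus toy statistics
```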