Raffay Hamid

Graphic with the text: "Label-efficient video-content understanding."

How Prime Video uses contrastive learning to accelerate automatic video-understanding at scale

Prime Video invents new state-of-the-art weakly supervised and self-supervised contrastive learning algorithms to reduce its dependence on large amounts of labeled training data.

Raffay Hamid

Feb 16, 2023

Graphic with the text "Using CV to reinvent sports-field tracking."

Computer Vision

Prime Video uses automatic field registration to create immersive viewing experiences for live sports

Prime Video used computer vision technology to reinvent sports-field tracking for monocular broadcast videos.

Raffay Hamid, Xiaohan Nie

Feb 13, 2023

Graphic with the text "Creating a faster and more precise CV model."

Computer Vision

Automatically identifying scene boundaries in movies and TV shows

Prime Video beat previous state-of-the-art work on the MovieNet dataset by 13% with a new model that is 90% smaller and 84% faster.

Shixing Chen, Xiaohan Nie, David Fan, Dongqing Zhang, Vimal Bhat, Raffay Hamid

Feb 09, 2023

Graphic with the text "Prime Video presents at WACV 2021."

Computer Vision

Prime Video’s work on sports field registration, recap/intro detection

Two Prime Video papers at the Winter Conference on Applications of Computer Vision (WACV) 2021 proposed neural models for enhancing video-streaming experiences.

Raffay Hamid

Feb 01, 2023

Graphic with the text "Prime Video presents at CVPR 2022."

Computer Vision

Prime Video presents two papers at CVPR 2022

Science teams presented two state-of-the-art works at the Conference on Computer Vision and Pattern Recognition (CVPR) 2022.

Raffay Hamid, Xiaohan Nie, Shixing Chen

Feb 01, 2023

Computer Vision

Intro and recap detection for movies and TV series

In this work, we pose intro and recap detection as a supervised sequence labeling problem and propose a novel end-to-end deep learning framework to solve it.

Xiang Hao, Ben Cheung, Raffay Hamid

Jan 02, 2023

Computer Vision

CNN-based audio event recognition for automated violence classification and rating for Prime Video content

We show that (a) the audio-based approach outperforms the other baselines, (b) the benefit of the audio model is more pronounced on global multilingual data than on English data, and (c) the multimodal model achieves 63% rating accuracy and makes it possible to backfill the top 90% Stream Weighted Coverage titles in the PV catalog with 88% coverage at 91% accuracy.

Mayank Sharma, Xiang Hao, Raffay Hamid

Jan 02, 2023

Computer Vision

Robust cross-modal representation learning with progressive self-distillation

We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data.

Shixing Chen, Raffay Hamid

Jan 02, 2023

Computer Vision

Depth-guided sparse structure-from-motion for movies and TV shows

We propose a simple yet effective approach that uses a single-frame depth prior obtained from a pretrained network to significantly improve geometry-based SfM in our small-parallax setting.

Xiaohan Nie, Raffay Hamid

Jan 02, 2023

Computer Vision

A comprehensive empirical review of modern voice activity detection approaches for movies and TV shows

In this work, we present a thorough survey of DNN-based VADs on DEC data in terms of their accuracy, Area Under Curve (AUC), noise sensitivity, and language-agnostic behavior.

Mayank Sharma, Raffay Hamid

Jan 02, 2023