Senior Principal Scientist – Prime Video
Prime Video invents new state-of-the-art weakly and self-supervised contrastive learning algorithms to reduce its dependence on large amounts of labeled training data.
Feb 16, 2023
Prime Video uses automatic field registration to create immersive viewing experiences for live sports
Prime Video used computer vision technology to reinvent sports-field tracking for monocular broadcasting videos.
Prime Video beat previous state-of-the-art work on the MovieNet dataset by 13% with a new model that is 90% smaller and 84% faster.
Two Prime Video papers at the Winter Conference on Applications of Computer Vision (WACV) 2021 proposed neural models for enhancing video-streaming experiences.
Feb 01, 2023
In this work, we pose intro and recap detection as a supervised sequence labeling problem and propose a novel end-to-end deep learning framework to solve it.
CNN-based audio event recognition for automated violence classification and rating for Prime Video content
We show that (a) the audio-based approach outperforms the other baselines, (b) the benefit of the audio model is more pronounced on global multilingual data than on English data, and (c) the multimodal model achieves 63% rating accuracy and can backfill the top 90% Stream Weighted Coverage titles in the PV catalog with 88% coverage at 91% accuracy.
We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data.
We propose a simple yet effective approach that uses a single-frame depth prior obtained from a pretrained network to significantly improve geometry-based SfM for our small-parallax setting.
A comprehensive empirical review of modern voice activity detection approaches for movies and TV shows
In this work, we present a thorough survey of DNN-based VADs on DEC data in terms of their accuracy, Area Under Curve (AUC), noise sensitivity, and language-agnostic behavior.