Machine Learning

Content about machine learning at Prime Video.

In this work, we propose a two-tower model in which one tower learns the user representation from the user's watch history and the other tower learns effective title representations from metadata.
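As a rough illustration of the two-tower idea, the sketch below scores candidate titles against a user by taking the dot product of two separately computed embeddings. It is not the paper's implementation: the random weights stand in for trained parameters, the user tower simply averages the metadata features of watched titles, and every name here (`user_tower`, `title_tower`, `score`, the feature vectors) is hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weight matrix standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def project(matrix, vector):
    """Linear layer: multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def title_tower(metadata_features, weights):
    """Title tower: metadata feature vector -> unit-length embedding."""
    return l2_normalize(project(weights, metadata_features))


def user_tower(watch_history_features, weights):
    """User tower: mean of watched titles' features -> unit-length embedding.

    Assumes a non-empty watch history; averaging is an illustrative choice.
    """
    count = len(watch_history_features)
    dim = len(watch_history_features[0])
    mean = [sum(f[i] for f in watch_history_features) / count for i in range(dim)]
    return l2_normalize(project(weights, mean))


def score(user_embedding, title_embedding):
    """Dot product of the two tower outputs (cosine similarity for unit vectors)."""
    return sum(u * t for u, t in zip(user_embedding, title_embedding))


rng = random.Random(0)
FEATURE_DIM, EMBED_DIM = 4, 3
user_weights = make_matrix(EMBED_DIM, FEATURE_DIM, rng)
title_weights = make_matrix(EMBED_DIM, FEATURE_DIM, rng)

# Hypothetical metadata features for watched titles and for candidates.
history = [[1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
candidates = {"title_a": [1.0, 0.0, 0.0, 1.0], "title_b": [0.0, 1.0, 1.0, 0.0]}

user_embedding = user_tower(history, user_weights)
scores = {
    name: score(user_embedding, title_tower(features, title_weights))
    for name, features in candidates.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because the towers share no inputs, title embeddings can be computed once and reused for every user, which is the usual reason to pick this architecture for retrieval.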
We show that (a) the audio-based approach outperforms the other baselines, (b) the benefit of the audio model is more pronounced on global multilingual data than on English data, and (c) the multimodal model achieves 63% rating accuracy and can backfill the top 90% Stream Weighted Coverage titles in the PV catalog with 88% coverage at 91% accuracy.
This paper discusses the problem of missing transcription, where the subtitle blocks corresponding to some speech segments in the DEC are missing. We present a solution that augments the human correction process by automatically identifying the timings of the non-transcribed dialogue in a language-agnostic manner.
In this work, we develop a data collection pipeline that handles long text sequences and integrate this pipeline with a multi-head self-attention model.
In this work, we describe the various factors which affect the suitability of a face image for recognition by humans. We propose efficient solutions which can solve the problem without the use of ground truth data. We train a regression model using weak supervision provided by heuristics based on features which affect face quality. Finally, we use professional photography techniques to create standardized and aesthetically pleasing profile images.
In this work, we present a Multi-Lingual (MLi) and Multi-Task Learning (MTL) audio-only speech emotion recognition (SER) system based on the multilingual pre-trained wav2vec 2.0 model.
In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification.
A Prime Video paper discusses one of the fundamental clustering techniques in machine learning, examines its key performance factors, and reveals the underlying mechanism it uses to find clusters.