Intro and recap detection for movies and TV series

By Xiang Hao, Kripa Chettiar, Ben Cheung, Vernon Germano, and Raffay Hamid

Modern video streaming services offer millions of video titles to their customers. Many of these titles begin with repetitive introductory and recap segments that customers must skip manually to get an uninterrupted viewing experience. To remove this friction, some services have recently added “skip-intro” and “skip-recap” buttons that appear in their video players before the intro and recap segments start. To scale this experience efficiently to an entire catalog, it is important to automate the process of finding the intro and recap portions of titles. In this work, we pose intro and recap detection as a supervised sequence labeling problem and propose a novel end-to-end deep learning framework to solve it. Specifically, we use convolutional neural networks (CNNs) to extract both visual and audio features from videos, and fuse these features using a bidirectional long short-term memory network (B-LSTM) to capture the long- and short-term dependencies among frame features over time. Finally, we use a conditional random field (CRF) to jointly optimize the sequence labeling for the intro and recap parts of the titles. We present a thorough empirical analysis of our model compared to several other deep-learning-based architectures and demonstrate the superior performance of our approach.
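To make the final step concrete, the sketch below shows how a linear-chain CRF can decode a whole label sequence jointly instead of labeling each time step on its own. It is a minimal illustration, not the model from the paper. The label set, the per-step scores (standing in for the output of the CNN and B-LSTM stages), the transition penalties, and the helper names `viterbi` and `to_segments` are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical label set; index order matches the score columns below.
LABELS = ["content", "intro", "recap"]


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][j]   : score of label j at time step t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Best previous label for ending in label j at this step.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for ptrs in reversed(backptrs):
        last = ptrs[last]
        path.append(last)
    path.reverse()
    return path


def to_segments(path: List[int]) -> List[Tuple[str, int, int]]:
    """Collapse a label path into (label, start, end) spans, end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            if LABELS[path[start]] != "content":
                segments.append((LABELS[path[start]], start, t))
            start = t
    return segments


# Made-up scores for 8 time steps: columns are (content, intro, recap).
# Step 4 is deliberately noisy: "content" scores slightly above "intro".
EMISSIONS = [
    [0.0, 0.0, 2.0], [0.0, 0.0, 2.0], [0.0, 0.0, 2.0],
    [0.0, 2.0, 0.0], [1.0, 0.5, 0.0], [0.0, 2.0, 0.0],
    [2.0, 0.0, 0.0], [2.0, 0.0, 0.0],
]
# Assumed transition scores: staying in a label is free, switching costs 2.
TRANSITIONS = [[0.0 if i == j else -2.0 for j in range(3)] for i in range(3)]

greedy = [max(range(3), key=lambda j: e[j]) for e in EMISSIONS]
joint = viterbi(EMISSIONS, TRANSITIONS)
print(to_segments(greedy))  # [('recap', 0, 3), ('intro', 3, 4), ('intro', 5, 6)]
print(to_segments(joint))   # [('recap', 0, 3), ('intro', 3, 6)]
```

In this toy input, labeling each step independently splits the intro in two at the noisy step, while joint decoding with a switching penalty recovers one contiguous recap span and one contiguous intro span. In a trained model the emission and transition scores would be learned rather than set by hand.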

For the full paper, see Intro and recap detection for movies and TV series on the Amazon Science website.

Principal Applied Scientist – Prime Video
Software Development Engineer – Amazon
Senior Principal Scientist – Prime Video