A Brief Survey of Video Keyframe Extraction

Video is currently one of the most important formats for information and entertainment on the Internet. Consumers spend more and more time on online video content. Among video platforms, TikTok has become famous for its video recommendation algorithm. The analysis of video content is therefore a fundamental technology in today's tech world.

Video is a complex format for representing information. A video is composed of a video channel and an audio channel. The audio channel carries speech and/or music, sometimes with background sound, and the speech may come from one or several people. The video channel is composed of a sequence of frames, each containing foreground objects and a background image. Very often the frames also contain embedded text and captions.

Since each shot of a video is composed of similar, near-duplicate frames, video is a format with a great deal of redundant information. Among video analysis techniques, keyframe extraction is one of the most important and fundamental: it removes duplicated visual information while keeping the most distinctive frames. The extracted keyframes also form the basis for higher-level video processing, such as video recommendation and video indexing systems.
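To make the idea of redundancy concrete, here is a minimal sketch that keeps a frame only when it differs noticeably from the last frame kept. The function name `drop_near_duplicates`, the mean-absolute-difference measure, and the threshold of 10 gray levels are all assumptions chosen for illustration, and the "video" is synthetic NumPy data rather than a decoded file.

```python
import numpy as np

def drop_near_duplicates(frames, threshold=10.0):
    """Return indices of frames that differ noticeably from the last kept frame.

    The difference measure (mean absolute pixel difference) and the
    threshold are illustrative assumptions, not a standard.
    """
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        last = frames[kept[-1]].astype(np.float32)
        current = frames[i].astype(np.float32)
        if np.mean(np.abs(current - last)) > threshold:
            kept.append(i)
    return kept

# Synthetic video: two shots, each made of five nearly identical 8x8 gray frames.
rng = np.random.default_rng(0)

def make_shot(level, n_frames=5):
    # A constant gray level plus tiny noise stands in for a static shot.
    return [np.full((8, 8), level, dtype=np.uint8)
            + rng.integers(0, 3, (8, 8), dtype=np.uint8)
            for _ in range(n_frames)]

frames = make_shot(50) + make_shot(200)
keyframes = drop_near_duplicates(frames)
print(keyframes)  # one index per shot: [0, 5]
```

Ten frames collapse to two here because every frame inside a shot is almost a copy of its neighbor. Real footage would need a more robust measure, such as color histograms or learned features.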

Traditionally, keyframe extraction techniques are classified as sampling-based, shot-based, or clustering-based. In recent years, many machine-learning-based approaches have been proposed to improve the accuracy and efficiency of keyframe extraction. Skip RNN [2] is one successful machine learning approach that can be applied to this problem. It treats the frames of a video as the input sequence of an RNN and learns to skip state updates for frames that carry redundant information. Its code is available in [1].
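The three classical families can be sketched in a few lines each. This is a simplified illustration rather than a reference implementation: the function names, the 2-D per-frame feature vectors, the jump threshold, and the choice of the middle frame of each shot are all assumptions, and the clustering variant uses a plain k-means written in NumPy. Skip RNN is not shown here.

```python
import numpy as np

def sample_keyframes(n_frames, step):
    """Sampling-based: take every `step`-th frame, ignoring content."""
    return list(range(0, n_frames, step))

def shot_keyframes(features, threshold):
    """Shot-based: cut where consecutive features jump, keep each shot's middle frame."""
    jumps = np.linalg.norm(np.diff(features, axis=0), axis=1)
    cuts = [i + 1 for i, d in enumerate(jumps) if d > threshold]
    bounds = [0] + cuts + [len(features)]
    return [(start + end - 1) // 2 for start, end in zip(bounds[:-1], bounds[1:])]

def cluster_keyframes(features, k, n_iter=20):
    """Clustering-based: k-means, then keep the frame nearest each centroid."""
    # Initialize centroids from evenly spaced frames (an illustrative choice).
    init = np.linspace(0, len(features) - 1, k).astype(int)
    centroids = features[init].astype(np.float64)
    for _ in range(n_iter):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = features[labels == c].mean(axis=0)
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return sorted({int(dists[:, c].argmin()) for c in range(k)})

# Synthetic per-frame features: three shots of five frames, drifting slightly within a shot.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.array([c + [0.1 * j, 0.0] for c in centers for j in range(5)])

sampled = sample_keyframes(len(features), step=5)
shots = shot_keyframes(features, threshold=5.0)
clusters = cluster_keyframes(features, k=3)
print(sampled)   # [0, 5, 10]
print(shots)     # [2, 7, 12]
print(clusters)  # [2, 7, 12]
```

On this toy input the shot-based and clustering-based variants agree, because each shot forms exactly one tight cluster. Sampling returns one frame per shot only because the step happens to match the shot length.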

Ready-to-use libraries that include keyframe extraction are now available, such as Katna [3]. It can extract keyframes using parallel computing for faster processing. Katna can also be used to resize videos, crop images, and resize images.

Next time we will use code examples to explain the practical skills of keyframe extraction.

[1] https://github.com/imatge-upc/skiprnn-2017-telecombcn

[2] Victor Campos, Brendan Jou, Xavier Giro-i-Nieto, Jordi Torres, and Shih-Fu Chang. “Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks.” In International Conference on Learning Representations, 2018.

[3] https://katna.readthedocs.io/en/latest/