HISTORIAN: a large-scale HISTORIcal film dataset with cinematographic ANnotation


Developing automated tools for the sustainable preservation of extensive historical film collections requires an understanding of fundamental cinematographic settings. To support the investigation of new approaches for detecting and classifying cinematographic settings, this paper proposes a novel large-scale historical film dataset with cinematographic annotations (HISTORIAN), namely shot boundaries, shot types, and camera movements. The dataset consists of 98 digitized original analog film reels related to the Second World War and 10593 film shots manually annotated by film experts. Annotations for overscan areas, such as sprocket holes, are also included. A baseline film analysis pipeline is introduced and evaluated. To the best of our knowledge, HISTORIAN is the first dataset that covers the challenges and characteristics of historical film documentaries, and it provides novel possibilities for exploring automatic film analysis tools.

This repository contains only a small subset of the dataset, with a few examples for demonstration purposes.

A link to the GitHub repository (including helper scripts and a README) can be found here.

Download and Use


This database may be used for non-commercial research purposes only. If you publish material based on this database, we request that you include a reference to our paper [1].

[1] Helm D., Jogl F., and Kampel M., “HISTORIAN: A Large-Scale HISTORIcal Film Dataset with Cinematographic ANnotation,” 2022 IEEE International Conference on Image Processing (ICIP), 2022, pp. 2087-2091, doi: 10.1109/ICIP46576.2022.9897300.