MSBin – MultiSpectral Document Binarization

MSBin stands for MultiSpectral Document Binarization. The dataset is dedicated to the binarization of multispectral document images. A ReadMe is included in the dataset and is also available separately. The dataset can be downloaded from Zenodo. It is introduced in: Hollaus, S. Brenner and R. Sablatnig: “CNN based …

Ruling Database

The CVL ruling dataset was synthetically generated to allow comparison of different ruling removal methods. It is based on the ICDAR 2013 Handwriting Segmentation database [1]. It was created by synthetically adding four different ruling images, which results in a total of 600 test images. The pixel values are: 255 background, 155 ruling, 100 text, 0 …
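As a rough illustration of how the label values listed in the excerpt might be used, the following sketch splits a label image into one boolean mask per class. It assumes the values are stored in a single-channel 8-bit array, which the excerpt does not confirm. The names `LABELS` and `split_classes` are hypothetical, and the small array is hand-made stand-in data, not taken from the dataset. The meaning of the value 0 is cut off in the excerpt, so it is deliberately left unmapped.

```python
import numpy as np

# Label values quoted in the excerpt. The meaning of 0 is truncated in the
# source text, so it is intentionally not mapped to any class here.
LABELS = {"background": 255, "ruling": 155, "text": 100}


def split_classes(label_image):
    """Return one boolean mask per documented label value (hypothetical helper)."""
    label_image = np.asarray(label_image)
    return {name: label_image == value for name, value in LABELS.items()}


# Tiny hand-made array standing in for a label image (not real data).
demo = np.array([[255, 155, 155],
                 [100, 100, 0]], dtype=np.uint8)

masks = split_classes(demo)
counts = {name: int(mask.sum()) for name, mask in masks.items()}

# Pixels that match none of the documented values (here, the single 0 pixel).
unmapped = ~np.any(list(masks.values()), axis=0)
```

A ruling removal method could then be scored against the `text` mask alone, but how the dataset authors evaluate results is not stated in the excerpt.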


CVL-Database

An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting. The CVL Database is a public database for writer retrieval, writer identification and word spotting. The database consists of 7 different handwritten texts (1 German and 6 English texts). In total, 310 writers participated in the dataset, 27 of whom wrote 7 texts and …

ICDAR2013 – Handwritten Digit and Digit String Recognition Competition

Introduction: Handwriting recognition is an open research topic in the document analysis community. We provide two new, freely available real-world datasets for an established problem. The competition consists of two independent tasks, namely segmented single Arabic digits and Arabic digit strings. Contributions will be accepted for either task. The dataset of segmented …