Learning Aggregation Functions for Writer Retrieval

Status: available
Supervisors: Marco Peer, Florian Kleber

Deep-learning-based methods for writer retrieval sample local characteristics of the handwriting, for example patches extracted at SIFT keypoint locations (see Figure 1), to learn discriminative features. To aggregate those local embeddings into a global page descriptor, state-of-the-art methods rely on fixed aggregation functions, e.g. sum/average pooling [1,2] or more advanced strategies such as generalized max-pooling [3,4]. These aggregation functions are predefined and not tunable (sum pooling, for example, weights every embedding equally), which can harm performance, in particular when only a limited amount of handwriting is available.
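To make the fixed aggregation functions concrete, the following is a minimal NumPy sketch of sum pooling, average pooling and generalized max-pooling. It is an illustration only and not the implementation used in [1-4]: the function names, the toy matrix `X` and the regularization value `lam` are assumptions, and the generalized max-pooling variant shown is the closed-form ridge-regression solution in the primal.

```python
import numpy as np


def sum_pooling(X):
    """Sum the local embeddings; every embedding gets weight 1."""
    return X.sum(axis=0)


def mean_pooling(X):
    """Average the local embeddings; every embedding gets weight 1/N."""
    return X.mean(axis=0)


def generalized_max_pooling(X, lam=1.0):
    """Primal generalized max-pooling (assumed formulation).

    Finds xi minimizing ||X @ xi - 1||^2 + lam * ||xi||^2, so that each
    local embedding has roughly the same similarity to the pooled vector.
    """
    n, d = X.shape
    gram = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(gram, X.T @ np.ones(n))


def l2_normalize(v, eps=1e-12):
    """Scale a descriptor to unit length before comparing pages."""
    return v / (np.linalg.norm(v) + eps)


# Toy page: three 2-D embeddings, the first pattern occurs twice.
X = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

page_sum = sum_pooling(X)    # dominated by the frequent pattern
page_mean = mean_pooling(X)  # same direction as the sum
page_gmp = generalized_max_pooling(X, lam=1e-6)  # frequent and rare patterns balanced
```

In this toy case the sum and mean descriptors lean towards the repeated embedding, while the generalized max-pooling descriptor gives both patterns nearly equal weight. None of the three has parameters that could be learned from writer labels.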

Figure 1: Interest points are detected by SIFT. Small patches, e.g. 32×32, are extracted and forwarded to a neural network to extract embeddings. (Image Source: Thesis by M. Koepf)

Therefore, the goal of this thesis is to investigate learnable aggregation approaches that improve retrieval performance. Possible solutions include DeepSets, set transformers, or Learnable Aggregation Functions (LAF), but the selection can be extended according to the student’s interest. Ideally, the method also provides interpretable results, e.g. by highlighting the most important features.
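As a rough idea of what a learnable and interpretable aggregation could look like, here is a small PyTorch sketch of attention-weighted pooling. It is a simpler stand-in chosen for illustration, not DeepSets, a set transformer or LAF, and not a method from the cited papers. The class name `AttentionPooling`, the hidden size and the toy tensors are hypothetical.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Permutation-invariant pooling with one learned weight per embedding."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        # Small scoring network: one scalar relevance score per embedding.
        self.scorer = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, mask=None):
        # x: (batch, n_patches, dim); mask: (batch, n_patches), True = valid
        scores = self.scorer(x).squeeze(-1)
        if mask is not None:
            # Padded patches receive zero weight after the softmax.
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)
        pooled = (weights.unsqueeze(-1) * x).sum(dim=1)
        return pooled, weights


torch.manual_seed(0)
pool = AttentionPooling(dim=8)

# Two toy pages with up to 5 patch embeddings; the second page has only 3.
x = torch.randn(2, 5, 8)
mask = torch.tensor([[True, True, True, True, True],
                     [True, True, True, False, False]])
pooled, weights = pool(x, mask)
```

The returned `weights` sum to one per page and can be mapped back to the patch locations, which is one possible route to the interpretability mentioned above. The layer would be trained jointly with, or on top of, the embedding network.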

The developed approach should be evaluated on several public writer retrieval datasets (contemporary and/or historical).
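Writer retrieval is commonly scored with mean average precision (mAP) and top-1 accuracy in a leave-one-out setting, where each page queries all other pages. The sketch below assumes cosine similarity on the page descriptors. The function name `retrieval_metrics` and the toy data are hypothetical, and the exact protocol of a given dataset may differ.

```python
import numpy as np


def retrieval_metrics(descriptors, writer_ids):
    """Leave-one-out mAP and top-1 accuracy using cosine similarity."""
    D = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    sim = D @ D.T
    np.fill_diagonal(sim, -np.inf)  # a page must not retrieve itself
    labels = np.asarray(writer_ids)

    average_precisions, top1_hits = [], []
    for q in range(len(labels)):
        # Sort by descending similarity; the query itself ends up last.
        order = np.argsort(-sim[q])[:-1]
        relevant = (labels[order] == labels[q]).astype(float)
        if relevant.sum() == 0:
            continue  # writer has no other page, so the query is skipped
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        average_precisions.append((precision_at_k * relevant).sum() / relevant.sum())
        top1_hits.append(relevant[0])
    return float(np.mean(average_precisions)), float(np.mean(top1_hits))


# Toy example: two writers with two well-separated pages each.
descriptors = np.array([[1.0, 0.0],
                        [0.9, 0.1],
                        [0.0, 1.0],
                        [0.1, 0.9]])
writer_ids = [0, 0, 1, 1]
mean_ap, top1 = retrieval_metrics(descriptors, writer_ids)
```

Because the metric only needs one descriptor per page, the same function can compare fixed and learned aggregation functions on identical local embeddings.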

[1] M. Peer, F. Kleber and R. Sablatnig, “Towards Writer Retrieval for Historical Datasets,” accepted for oral presentation at the 2023 International Conference on Document Analysis and Recognition, ICDAR 2023, San José, California, USA, August 21-26, 2023.

[2] M. Peer, F. Kleber and R. Sablatnig, “Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval,” in Frontiers in Handwriting Recognition, Hyderabad, India, 2022, pp. 122–136.

[3] N. Murray and F. Perronnin, “Generalized Max Pooling,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2014.

[4] S. Rasoulzadeh and B. Babaali, “Writer Identification and Writer Retrieval Based on NetVLAD with Re-ranking,” in IET Biometrics (Vol. 11, Issue 1, pp. 10–22), 2021.

Code repositories:

LAF: https://github.com/alessandro-t/laf
DeepSets: https://github.com/yassersouri/pytorch-deep-sets
Set transformer: https://github.com/juho-lee/set_transformer

The thesis consists of

  • Literature Review – getting to know the methods
  • Implementation & Evaluation
    • Extract local features of the documents, e.g. using the approach proposed in [1,4]
    • Implement a framework to evaluate different (non-)learnable aggregation functions
    • Compare your methods with the approaches currently used in writer retrieval
  • Written Report/Thesis and final presentation

Helpful experience

  • Python
  • Basic/Good understanding of deep learning
  • Machine Learning frameworks (preferably PyTorch)