Deep Learning for Papyri

The CVL offers bachelor/master theses or student projects in the domain of deep-learning-based document analysis for papyri.

Supervisor: Marco Peer
Status: open

First Picture: Vesuvius Challenge
Second Picture: Peer and Sablatnig, HIP’23
Third Picture: modified from FAU Erlangen

Motivation

Greek papyri, ancient documents written on a paper-like material made from the papyrus plant, offer valuable insights into the past. They help us recover writings that were thought to be lost forever. The script is often written very carefully, with unconnected letters, similar to calligraphy. Although many papyri survive, they usually lack clear dates or information about who wrote them, unlike newer manuscripts, and they suffer from degradation due to their age and preservation conditions. Studying them is nevertheless important because they teach us how ancient societies functioned, including their culture, organization, and daily life. With the advances in deep learning for computer vision, papyri are an interesting field for applying and further improving document analysis approaches.

You might have heard of the Vesuvius Challenge, which also deals (or dealt) with papyri: it is essentially about making the handwriting visible for further study. Typical tasks in this area are image enhancement/binarization, writer identification/retrieval, and character localization and detection, all of which rely on deep learning. Fortunately, there is a variety of ongoing research on papyri, with plenty of data available.

Topics

The topic is defined after discussion, and we are open to your ideas. Possible topics, as mentioned above, are:

  • Writer retrieval/identification (see here for recent papers), with datasets such as GRK-50/120 or PapyRow
  • Image binarization [1]: investigating network architectures, data augmentation, or advanced techniques such as self-supervised pretraining
  • Letter detection/classification [2]
  • Or a combination of the tasks!

[1] DIBCO2019 Challenge and dataset https://ieeexplore.ieee.org/document/8978205
[2] ICDAR2023 Competition on Detection of Greek Letters https://lme.tf.fau.de/competitions/2023-competition-on-detection-and-recognition-of-greek-letters-on-papyri/
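
To give an idea of what evaluation can look like for the writer retrieval topic: results in this area are commonly reported as mean average precision (mAP) in a leave-one-out setting, where each document is used once as a query against all others. The following is only a minimal sketch in plain Python, not the evaluation protocol of any particular dataset. The function names, the toy descriptors, and the use of cosine similarity are assumptions made for illustration; in a real project the descriptors would come from a trained network.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero descriptor vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_average_precision(descriptors, writers):
    """Leave-one-out mAP: every document queries all remaining documents."""
    average_precisions = []
    for q, (query, query_writer) in enumerate(zip(descriptors, writers)):
        # Rank every other document by its similarity to the query.
        candidates = [
            (cosine_similarity(query, d), w)
            for i, (d, w) in enumerate(zip(descriptors, writers))
            if i != q
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)

        # Precision at each rank where a document of the same writer appears.
        hits, precisions = 0, []
        for rank, (_, w) in enumerate(candidates, start=1):
            if w == query_writer:
                hits += 1
                precisions.append(hits / rank)

        # Queries whose writer has no other document are skipped.
        if precisions:
            average_precisions.append(sum(precisions) / len(precisions))

    if not average_precisions:
        return 0.0
    return sum(average_precisions) / len(average_precisions)


# Hypothetical toy data: two writers with two documents each.
toy_descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_writers = ["A", "A", "B", "B"]
toy_map = mean_average_precision(toy_descriptors, toy_writers)
```

In the toy data, each document's nearest neighbour is written by the same writer, so the score is perfect. With real papyri the ranking is far noisier, which is where the learned descriptors matter.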

Prerequisites

The programming should be done in Python with PyTorch, so basic knowledge of Python is appreciated. We also provide access to the Vienna Scientific Cluster (VSC) for GPU resources.

Parts of the thesis

  • Literature review + writing a specification
  • Development of an approach for the chosen topic (PyTorch)
  • Evaluation
  • Writing and documentation
  • Optionally: Summarizing your work for publication

Participation in our seminar (either Bachelor or Master) is recommended/mandatory and includes short presentations of your progress. The duration depends on your availability and the scope of the work (usually up to 6 months for a bachelor thesis or student project, 6-12 months for a master thesis).

Contact: mpeer@cvl.tuwien.ac.at