Automated Analysis of NHM Herbarium Collection

Supervisors: Michael Reiter, Florian Kleber

Start: as soon as possible

Problem Statement

A herbarium is a collection of preserved plant specimens, each accompanied by information from the collector and additional data. The Herbarium of the Natural History Museum in Vienna (W) was established in 1807 and now ranks among the top five botanical collections in the world. The NHMW currently holds approximately 5.5 million plant specimens. The herbarium is especially rich in types (a type is the specimen to which the scientific name of a species is attached), with around 200,000 type specimens. An example can be seen in the image.

Herbarium – example

The digitized images can easily be provided to researchers. However, efficient search tools are needed to explore a collection of 5.5 million specimens. Several sources of information can be used for this purpose: the handwriting on the label, the provided metadata, the layout of the label, and the plants themselves.

 

Goal

The overall goal of the diploma thesis is to develop deep learning-based methods that model the similarity of plants based on the specimen images. In addition, the layout of the label should be classified. Further options are to classify the handwriting or to link the image information with the metadata. The exact scope of the thesis will be defined together with the supervisors.
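
To illustrate what modelling similarity could mean in practice, here is a minimal sketch of similarity search. It assumes that each specimen image has already been mapped to a feature vector (embedding) by some deep network, for example one trained in PyTorch. That step is not shown. The specimen IDs, the toy vectors, and the helper names `cosine_similarity` and `most_similar` are hypothetical and serve only as an illustration. The actual method is to be defined in the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=3):
    """Rank all other specimens by similarity to the query specimen."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (hypothetical). Real ones would come from a trained
# network and have hundreds of dimensions.
embeddings = {
    "specimen_a": [0.9, 0.1, 0.0],
    "specimen_b": [0.8, 0.2, 0.1],
    "specimen_c": [0.0, 0.1, 0.9],
}

ranking = most_similar("specimen_a", embeddings, top_k=2)
print(ranking)  # list of (specimen_id, similarity), most similar first
```

For a collection of several million specimens, the exhaustive comparison used here would likely have to be replaced by an approximate nearest-neighbour index.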

Workflow

Literature research
Diploma thesis specification
Implementation / Evaluation
Written thesis

Requirements

Python
Knowledge of Computer Vision (associated lectures)
Knowledge of Deep Learning (PyTorch) and Machine Learning (associated lectures)