READ | Computer Vision Lab

READ (Recognition and Enrichment of Archival Documents)

The overall objective of READ is to implement a Virtual Research Environment in which archivists, humanities scholars, computer scientists and volunteers collaborate, with the ultimate goal of boosting research, innovation, development and use of cutting-edge technology for the automated recognition, transcription, indexing and enrichment of handwritten archival documents. This Virtual Research Environment will not be built from the ground up; it will draw on research, tools, data and resources generated in multiple national and EU-funded research and development projects, and it will provide a basis for sustaining the network and the technology in the future. This ICT-based e-infrastructure will address the Societal Challenge set out in “Europe in a Changing World”, namely the “transmission of European cultural heritage” and the “uses of the past” as one of the core requirements of a reflective society. Based on the research and innovation enabled by the READ Virtual Research Environment, we will be able to explore and access hundreds of kilometres of archival documents via full-text search, and thereby open up one of the last hidden treasures of Europe’s rich cultural heritage. One of the main ambitions of READ is to revolutionise access to handwritten document collections.

Figure: Transkribus – Expert Interface

Handwriting is often hard for humans to read, and harder still for machines. Handwritten scripts are as individual as their writers. Not only do they vary from one country to another, they also depend on the language, the character sets, the abbreviations and the writing styles in use during a given time period. The same is true for the layout of archival documents. In READ we will find administrative records from ministries, councils or communes; court decisions and treaties; lists of the names of immigrants and emigrants; personal letters, notebooks, diaries and postcards; cadastre books and maps; shipping records; and many, many other types. READ will provide automated segmentation and transcription (Handwritten Text Recognition, HTR), and it will also be possible to train the recognition on specific scripts.

The READ Platform will also include a website, directly connected with the platform’s central database, which will serve both as the main source of information for all users (registration, screencasts, guidelines, support) and as a digital library application for the collections maintained within READ. Typical interactions, such as full-text search, keyword spotting or image retrieval, can be performed via the READ website.
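To make the idea of full-text search over transcribed documents more concrete, the following is a minimal sketch in Python. It is not the READ or Transkribus implementation: the function names, the page identifiers and the sample transcriptions are all hypothetical, and a simple in-memory inverted index stands in for whatever search technology the platform actually uses.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Map each token to the set of page ids whose transcription contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # index.get avoids creating empty entries for unknown tokens.
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical transcriptions, invented for illustration only.
pages = {
    "letter_001": "Dear brother, the ship leaves Hamburg in March.",
    "register_014": "List of emigrants on the ship to New York.",
    "court_203": "Decision of the court concerning the land register.",
}
index = build_index(pages)
```

Calling search(index, "ship") on this sample data returns the two pages that mention a ship. A production system would also have to cope with recognition errors and historical spelling variants, which this sketch ignores.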

For details see http://read.transkribus.eu.

Project Partners:

University of Innsbruck, Austria (Co-ordinator)
Technical University Valencia, Spain
University College London, UK
National Centre for Scientific Research “Demokritos”, Greece
National Archives, Finland
University of London Computer Centre, UK
Technical University Lausanne, Switzerland
University of Rostock (CITlab), Germany
Xerox Research Centre Europe, France
Zurich State Archives, Switzerland
Leipzig University, Germany
Passau Diocesan Archives, Germany