Multilingual Speech Recognition for Low-Resource Languages

Diploma thesis/Master's thesis
Supervisor: Martin Kampel
Status: open

Motivation

Speech recognition has achieved impressive performance for major world languages such as English, Mandarin, and Spanish. However, many languages with smaller speaker communities lack the resources needed to train accurate models. Multilingual speech interfaces can play an important role in many scenarios, e.g. automated translation, command recognition, or hands-free operation. It is therefore essential to develop robust multilingual speech recognition systems that perform reliably even under noisy, real-world conditions.

Goal of the Work

This project aims to explore, evaluate, and improve speech recognition systems that can handle multiple languages, with a particular focus on local processing and low-resource languages. The project includes selecting or preparing suitable datasets, training multilingual acoustic models, and evaluating their performance on standard benchmarks. The final goal is to demonstrate a prototype system that can recognize spoken input in at least three different languages.

Tasks

Literature review on current multilingual ASR (Automatic Speech Recognition) models (e.g. Whisper, MMS by Meta, XLS-R)
Selection and preprocessing of open-source multilingual speech corpora (e.g. Common Voice, FLEURS)
Fine-tuning or evaluating pre-trained models using suitable toolkits (e.g. Hugging Face, ESPnet, Whisper)
Implementation of evaluation metrics (WER, CER) across different languages
(Optional) Noise augmentation or domain adaptation to simulate realistic deployment environments
(Optional) Real-time demo of recognition system
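
To illustrate the evaluation-metric task, the following is a minimal sketch of word error rate (WER) and character error rate (CER), both computed as the Levenshtein edit distance between reference and hypothesis divided by the reference length. The function names (edit_distance, wer, cer) are illustrative and not prescribed by the project. The sketch assumes whitespace-separated words and counts spaces as characters in CER, and it performs no text normalization (casing, punctuation), which a real evaluation would need to define per language.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes words are separated by whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; assumption: spaces count as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("the cat sat", "the cat sit"))
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions. For languages written without whitespace between words, CER is usually the more meaningful of the two. For the thesis itself, an established package such as jiwer may be preferable to a hand-written implementation.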

Required Skills

Programming experience in Python
Familiarity with deep learning frameworks (e.g. PyTorch or TensorFlow)
Interest in speech processing and multilingual systems
Basic understanding of machine learning and neural networks

Contact: martin.kampel@tuwien.ac.at