Multilingual Speech Recognition for Low-Resource Languages

Diploma thesis/Master's thesis
Supervisor: Martin Kampel
Status: open

Motivation

Speech recognition has achieved impressive performance for major world languages such as English, Mandarin, or Spanish. However, many languages with smaller speaker communities lack the resources needed to train accurate models. Multilingual speech interfaces can play an important role in applications such as automated translation, command recognition, and hands-free operation. It is therefore essential to develop multilingual speech recognition systems that perform reliably under noisy, real-world conditions.

Goal of the Work

This project aims to explore, evaluate, and improve speech recognition systems that can handle multiple languages, with a particular focus on local processing and low-resource languages. The project includes selecting or preparing suitable datasets, training multilingual acoustic models, and evaluating their performance on standard benchmarks. The final goal is to demonstrate a prototype system that can recognize spoken input in at least three different languages.

Tasks

  • Literature review on current multilingual ASR (Automatic Speech Recognition) models (e.g. Whisper, MMS by Meta, XLS-R)
  • Selection and preprocessing of open-source multilingual speech corpora (e.g. Common Voice, FLEURS)
  • Fine-tuning or evaluating pre-trained models using suitable toolkits (e.g. Hugging Face Transformers, ESPnet, Whisper)
  • Implementation of evaluation metrics (word error rate, WER, and character error rate, CER) across different languages
  • (Optional) Noise augmentation or domain adaptation to simulate realistic deployment environments
  • (Optional) Real-time demo of the recognition system
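
To illustrate the evaluation task, the following is a minimal Python sketch of the two metrics named above. WER and CER are both the Levenshtein (edit) distance between reference and hypothesis, divided by the reference length, computed over words and characters respectively. The function names (edit_distance, wer, cer) are hypothetical and chosen for this example only. The sketch assumes whitespace tokenization for words and applies no text normalization (casing, punctuation), both of which an actual evaluation would need to decide per language; for languages written without spaces between words, such as Mandarin, CER is usually the more meaningful of the two.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, assuming whitespace-separated words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over raw characters, spaces included."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat"
    hyp = "the cat sit"
    print(f"WER: {wer(ref, hyp):.3f}")  # one of three words differs
    print(f"CER: {cer(ref, hyp):.3f}")  # one of eleven characters differs
```

Both rates can exceed 1.0 when the hypothesis contains many insertions. To compare languages, the edit distances and reference lengths are typically summed over a whole test set before dividing, rather than averaging per-utterance rates.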

Required Skills

  • Programming experience in Python
  • Familiarity with deep learning frameworks (e.g. PyTorch or TensorFlow)
  • Interest in speech processing and multilingual systems
  • Basic understanding of machine learning and neural networks

Contact: martin.kampel@tuwien.ac.at