Diploma thesis / Master's thesis
Supervisor: Martin Kampel
Status: open
Motivation
Speech recognition has achieved impressive performance for major world languages such as English, Mandarin, or Spanish. However, many languages with smaller speaker communities lack the resources needed to train accurate models. Multilingual speech interfaces can play an important role in many scenarios, e.g. automated translation, command recognition, or hands-free operation. It is therefore essential to develop robust multilingual speech recognition systems that perform reliably even under noisy, real-world conditions.
Goal of the Work
This project aims to explore, evaluate, and improve speech recognition systems that can handle multiple languages, with a particular focus on local processing and low-resource languages. The project includes selecting or preparing suitable datasets, training multilingual acoustic models, and evaluating their performance on standard benchmarks. The final goal is to demonstrate a prototype system that can recognize spoken input in at least three different languages.
Tasks
- Literature review on current multilingual ASR (Automatic Speech Recognition) models (e.g. Whisper, MMS by Meta, XLS-R)
- Selection and preprocessing of open-source multilingual speech corpora (e.g. Common Voice, FLEURS)
- Fine-tuning or evaluating pre-trained models using suitable toolkits (e.g. Hugging Face, ESPnet, Whisper)
- Implementation of evaluation metrics (WER, CER) across different languages
- (Optional) Noise augmentation or domain adaptation to simulate realistic deployment environments
- (Optional) Real-time demo of the recognition system
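As an illustration of the evaluation-metrics task above, the following is a minimal sketch of WER (word error rate) and CER (character error rate), both computed as Levenshtein edit distance divided by reference length. The function names are illustrative and not prescribed by the project. The sketch assumes whitespace tokenization for words and applies no text normalization (casing, punctuation), so it would need adapting for languages without whitespace word boundaries, where CER is usually the more meaningful metric.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference token missing in hyp)
                curr[j - 1] + 1,     # insertion (extra token in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate, assuming whitespace-separated words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters (spaces included)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(cer("hallo", "hello"))
```

Both metrics can exceed 1.0 when the hypothesis contains many insertions, which is worth keeping in mind when comparing results across languages.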
Required Skills
- Programming experience in Python
- Familiarity with deep learning frameworks (e.g. PyTorch or TensorFlow)
- Interest in speech processing and multilingual systems
- Basic understanding of machine learning and neural networks
Contact: martin.kampel@tuwien.ac.at