Distributed Deep Learning

Supervisors: Dominik Schörkhuber, Margrit Gelautz

In this practicum, we aim to evaluate the feasibility of distributed deep learning for Computer Vision applications on the Vienna Scientific Cluster (VSC) (https://www.it.tuwien.ac.at/services/forschung/high-performance-computing/vsc-vienna-scientific-cluster).

For an ongoing research project in the automotive field, we are developing deep learning algorithms with steep computational requirements. Because large models are increasingly difficult to develop on our workstations, we are looking for ways to speed up our development process. The VSC is a collaboration of several Austrian universities that provides supercomputing resources, including special-purpose nodes with GPUs. However, although the cluster offers large computational resources, software projects need to be adapted and evaluated before they can run on it. The challenges include adapting code for distributed training on multiple GPUs and evaluating disk storage requirements, as Computer Vision datasets often span hundreds of gigabytes.
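To illustrate the kind of code adaptation meant here, the following is a minimal, framework-agnostic sketch of how a dataset can be split across several GPU workers in data-parallel training. The function name shard_indices and the terms world_size (number of workers) and rank (index of one worker) are assumptions chosen for illustration; they do not describe the project's actual code or the VSC setup.

```python
def shard_indices(num_samples, world_size, rank):
    """Return the sample indices that worker `rank` processes in one epoch.

    Hypothetical helper for illustration only: indices are dealt out
    round-robin, and the index list wraps around so that every worker
    receives the same number of samples.
    """
    if num_samples <= 0 or world_size <= 0:
        raise ValueError("num_samples and world_size must be positive")
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")

    # Ceiling division: samples per worker after padding.
    per_worker = -(-num_samples // world_size)
    total = per_worker * world_size

    # Worker `rank` takes every world_size-th index, starting at `rank`.
    # The modulo wraps padded positions back to the start of the dataset.
    return [i % num_samples for i in range(rank, total, world_size)]


if __name__ == "__main__":
    # Example: 10 samples distributed over 4 workers.
    for rank in range(4):
        print(rank, shard_indices(10, 4, rank))
```

Each worker then trains on its own shard and the gradients are averaged across workers. In practice the DL frameworks ship their own utilities for this step, so a sketch like this only serves to show what needs to change compared to single-GPU training.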

Helpful experience:

  • Familiarity with Linux, Python, and DL frameworks such as Keras, TensorFlow, and PyTorch
  • Knowledge of Computer Vision applications and frameworks