Visual tracking has been tackled conceptually in terms of the underlying subproblems of visual description and correspondence of objects, largely independently of the camera. Technically, however, visual tracking has rarely been considered in terms of the available sensors and computational resources. Modern sensors offer remarkable capabilities, e.g. CMOS windowing, slow-motion functionality and asynchronous events. These rather new camera techniques allow rethinking visual tracking as an attentive process similar to human vision, where the available visual data is selected judiciously to achieve a trade-off among robustness, accuracy and computational tractability of object tracking.
The current approach is to model the localisation and classification of persons with a convolutional neural network (CNN). Such networks are trained end-to-end on real images with given labels, i.e. rectangles enclosing the visible persons. One way to improve the current state of the art is to use more diverse data capturing persons in scenes typical of the considered application. Imagine future autonomous driving, where scenes are enormously diverse: from city traffic to highway driving to rural roads, on all continents around the globe. The question here is whether this diversity is constrained by common features that all scenes in natural and man-made environments share, and whether it can be captured by large but finite datasets.
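The full CNN pipeline is beyond a short snippet, but the underlying idea of localisation as classification of image regions can be sketched with a toy sliding-window detector. This is a simplified illustration, not the actual approach; `toy_person_score` is a hypothetical stand-in for a trained classifier, and the box format `(y, x, h, w)` is an assumption:

```python
import numpy as np

def sliding_windows(image, win, stride):
    """Yield (y, x, patch) for every window position in a grayscale image."""
    H, W = image.shape
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            yield y, x, image[y:y + win, x:x + win]

def toy_person_score(patch):
    # Hypothetical stand-in for a trained classifier's confidence score;
    # here simply the mean intensity of the window.
    return patch.mean()

def detect(image, win=4, stride=2, thresh=0.5):
    """Return rectangles (y, x, h, w) whose classifier score exceeds thresh."""
    return [(y, x, win, win)
            for y, x, patch in sliding_windows(image, win, stride)
            if toy_person_score(patch) > thresh]

# Synthetic image: a bright square stands in for a "person".
img = np.zeros((12, 12))
img[4:8, 4:8] = 1.0
print(detect(img))  # the window aligned with the bright square is detected
```

An end-to-end CNN detector replaces the explicit window scan and hand-written score with learned convolutional features, but the output, i.e. scored rectangles enclosing persons, is the same kind of object.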
Compose image datasets, implement and train an existing deep neural network for person detection, and compare the results against existing benchmarks.
The thesis can be combined with a preceding Informatik Praktikum.
- Review literature
- Create training and validation dataset based on previous work
- Implement training and test algorithms
- Test results on CityPersons dataset
- Optional: Improve algorithms for better results
- Written report/thesis and final presentation
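Testing on a detection benchmark such as CityPersons typically matches predicted rectangles to ground-truth rectangles via intersection over union (IoU). A minimal sketch of that metric, assuming boxes in `(x, y, w, h)` format:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes -> 1.0
print(iou((0, 0, 2, 2), (5, 5, 2, 2)))  # disjoint boxes -> 0.0
```

A prediction is then usually counted as a true positive when its IoU with some ground-truth rectangle exceeds a threshold (commonly 0.5).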
- Basic knowledge in computer vision
- Basic experience in Matlab, C++, Python
- Interest in Machine Learning, maths, statistics
- Interest in GPU programming
- H. K. Galoogahi, A. Fagg, C. Huang, D. Ramanan, S. Lucey. Need for Speed: A Benchmark for Higher Frame Rate Object Tracking. CoRR, 2017.
- A. Handa, R. A. Newcombe, A. Angeli, A. J. Davison. Real-Time Camera Tracking: When is High Frame-Rate Best? European Conference on Computer Vision (ECCV), LNCS vol. 7578, p. 222-235, 2012.
- G. Sperl, R. Pflugfelder. Person Classification with Convolutional Neural Networks. Master’s Thesis, TU Vienna, 2016.