Project Details



Grant Number



2011/01/01 – 2013/12/31


Martin Kampel


Martin Kampel
Michael Hödlmoser

Humans perform 3D object detection and recognition constantly in everyday life. In computer vision applications, 3D information is often gathered using multiple cameras. When the cameras have overlapping views but a wide baseline, an observed object cannot be reconstructed from corresponding points, because such features cannot be matched between, for example, an image showing the left side of an object and an image showing its right side. Additionally, when video streams are used as input and each frame is treated independently, the estimated trajectory of the observed object may be inconsistent.

The goal of the CAPRI (Classification and Pose Recovery in Image Sequences) project is to develop a real-time framework that overcomes these problems and offers a new way to classify both rigid and non-rigid objects and determine their 3D pose, using predefined 3D models in combination with video input.
In a first step, a rough classification and pose estimation is performed for each frame of a video by detecting the most likely poses of the observed object using synthetic 3D models. In a second step, the pose is refined over multiple frames using predefined metrics. Additionally, the pose is refined across multiple distributed cameras, which should lead to robust and smooth results.
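
The two-step idea can be illustrated with a minimal sketch. It is not the project's implementation: it assumes each frame yields a few candidate poses, reduced here to a single yaw angle with a matching cost, and it uses a simple dynamic-programming (Viterbi-style) pass as a stand-in for the multi-frame refinement. The names `angle_diff`, `smooth_poses` and `smooth_weight`, and the cost values, are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def smooth_poses(candidates, smooth_weight=0.01):
    """Pick one candidate per frame so that the summed matching cost plus
    the weighted pose change between consecutive frames is minimal.

    candidates: list over frames; each frame is a list of (yaw_deg, cost).
    Returns the list of selected yaw angles, one per frame.
    """
    # best[i]: minimal total cost of any path ending in candidate i
    best = [cost for _, cost in candidates[0]]
    back = []  # back[t][i]: best predecessor of candidate i in frame t + 1
    for prev, cur in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for yaw, cost in cur:
            totals = [best[j] + smooth_weight * angle_diff(yaw, prev[j][0])
                      for j in range(len(prev))]
            j_min = min(range(len(prev)), key=totals.__getitem__)
            new_best.append(totals[j_min] + cost)
            pointers.append(j_min)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final candidate.
    i = min(range(len(best)), key=best.__getitem__)
    path = [i]
    for pointers in reversed(back):
        i = pointers[i]
        path.append(i)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)]


# Hypothetical input: in the middle frame, the cheapest single-frame
# candidate (192 degrees) is the front/back-flipped pose.
frames = [
    [(10, 0.1), (190, 0.5)],
    [(192, 0.2), (12, 0.25)],
    [(14, 0.1), (194, 0.6)],
]
print(smooth_poses(frames))
```

Choosing the cheapest candidate in each frame independently would give a trajectory that jumps by roughly 180 degrees in the middle frame. The temporal term instead favours the slightly more expensive candidate that keeps the trajectory consistent.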

Applications lie in the area of 3D surveillance networks, mainly:

  • Urban traffic analysis: As the framework should be able to precisely classify different cars in terms of size, shape and colour, the algorithms can be used in parking lots, garages etc. for counting, classifying and identifying specific vehicles and pedestrians.
  • Security scenarios: Abnormal human behaviour can be detected and alarms raised for the security staff. Given the large number of cameras, human operators can hardly track multiple objects across multiple cameras simultaneously. Automatic approaches would therefore support their work considerably.


This project is performed in cooperation with:



M. Hödlmoser, B. Micusik, M. Kampel – “Sparse Point Cloud Densification by Using Redundant Semantic Information” – In Proc. of the 3rd Int. Conf. on 3D Vision (3DV), Seattle, USA, June 2013, to appear.

M. Hödlmoser, B. Micusik, M. Pollefeys, M.-Y. Liu, M. Kampel – “Model-Based Vehicle Pose Estimation and Tracking in Videos Using Random Forests” – In Proc. of the 3rd Int. Conf. on 3D Vision (3DV), Seattle, USA, June 2013, to appear.

M. Hödlmoser, B. Micusik – “Surface Layout Estimation Using Multiple Segmentation Methods and 3D Reasoning” – In Proc. of the 6th Iberian Conf. on Pattern Recognition and Image Analysis (IbPRIA), Madeira, Portugal, June 2013, to appear.

M. Hödlmoser, B. Micusik, M.-Y. Liu, M. Pollefeys, M. Kampel – “Classification and Pose Estimation of Vehicles in Videos by 3D Modeling within Discrete-Continuous Optimization” – In Proc. of the 2nd Int. Conf. on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT’12), pp. 198–205, Zurich, Switzerland, October 2012.

M. Hödlmoser, B. Micusik, M. Kampel – “Camera Auto-Calibration Using Pedestrians and Zebra-Crossings” – In Proc. of the 11th IEEE Int. Conf. on Computer Vision Workshop on Visual Surveillance (ICCV-VS’11), pp. 1697–1704, Barcelona, Spain, November 2011.

M. Hödlmoser, B. Micusik, M. Kampel – “Exploiting Spatial Consistency for Object Classification and Pose Estimation” – In Proc. of the 18th IEEE Int. Conf. on Image Processing (ICIP’11), pp. 1009–1012, Brussels, Belgium, September 2011.