Document layout analysis deals with the layout structure of document images, i.e., it segments a page into homogeneous regions. Within the READ project, a framework for layout analysis is currently being developed. This layout analysis can detect text regions such as text lines and text blocks.
The main goal of this master's thesis is to adapt the layout analysis so that it detects regions containing graphs, charts and images, mainly in newspapers. The document images are created from PDFs. An example image is shown on this page.
In this master's thesis, a layout analysis methodology will be implemented with a special focus on the detection and classification of image and graph regions.
If the thesis is successful, funding by APA-IT is possible.
Requirements:
- Knowledge of Matlab or C++
- Knowledge of Machine Learning/Computer Vision
- Ideally, completion of the VU Document Analysis