Committee:
Prof. Dr. Prof. h.c. Andreas Dengel, German Research Center for Artificial Intelligence. Univ. of Kaiserslautern, Germany.
Prof. Dr. Jean-Marc Ogier, Laboratoire Informatique, Image et Interaction. Univ. La Rochelle, France.
Dr. Ramon Baldrich. Dept. Ciències de la Computació & Centre de Visió per Computador. Univ. Autònoma de Barcelona, Spain.
Graphical documents express complex concepts using a visual language. This language consists of a vocabulary (symbols) and a syntax (structural relations between symbols) that together articulate a semantic meaning in a certain context. The automatic interpretation of this sort of document by computers therefore entails three main steps: the detection of the symbols, the extraction of the structural relations between these symbols, and the modeling of the knowledge that permits the extraction of the semantics. Graphical documents arise in different domains: architectural and engineering drawings, maps, flowcharts, etc.
Graphics Recognition in particular and Document Image Analysis in general were born from the industrial need to interpret the massive amount of digitized documents that followed the emergence of the scanner. Although many years have passed, the graphical document understanding problem still seems far from solved. The main reason is that the vast majority of systems in the literature focus on very specific problems, where the domain of the document dictates the implementation of the interpretation. As a result, it is difficult to reuse these strategies on different data and in different contexts, thus hindering the natural progress of the field.
In this thesis, we address the graphical document understanding problem by proposing several relational models at different levels, designed from a generic perspective. Firstly, we introduce three different strategies for the detection of symbols. The first method tackles the problem structurally, wherein general knowledge of the domain guides the detection. The second is a statistical method that learns the graphical appearance of the symbols and easily adapts to the large variability of the problem. The third method combines the previous two and inherits their respective strengths, i.e. it copes with the large variability and does not require annotated data.

Secondly, we present two relational strategies that tackle the problem of visual context extraction. The first is a fully bottom-up method that heuristically searches a graph representation for the contextual relations between symbols. In contrast, the second is a syntactic method that probabilistically models the structure of the documents. It learns the model automatically, and the model guides the inference algorithm to find the best structural representation for a given input.

Finally, we construct a knowledge-based model consisting of an ontological definition of the domain and real data. This model permits contextual reasoning and the detection of semantic inconsistencies within the data.

We evaluate the suitability of the proposed contributions in the framework of floor plan interpretation. Since there is no standard for modeling these documents, notation varies enormously from plan to plan in terms of vocabulary and syntax. Floor plan understanding is therefore a relevant task within the graphical document understanding problem.