Doctoral thesis in Computer Science
Defended on 15 May 2014 at Écully, École centrale de Lyon, within the doctoral school École doctorale en Informatique et Mathématiques de Lyon, in partnership with the Laboratoire d'InfoRmatique en Images et Systèmes d'information (Écully, Rhône) (laboratory) and Extraction de Caractéristiques et Identification (research team).
The jury was chaired by Nicole Vincent.
No abstract available in French.
Contribution à la détection de concepts sur des images utilisant des descripteurs visuels et textuels (Contribution to concept detection in images using visual and textual descriptors)
This thesis addresses strategies for training and integrating several modalities (visual and textual) in order to perform Visual Concept Detection and Annotation (VCDA) efficiently. VCDA has become a popular and important research topic in recent years because of its wide range of applications, such as image/video indexing and retrieval, security access control, and video monitoring. Despite the considerable effort and progress of the past years, it remains an open problem and is still considered one of the most challenging problems in the computer vision community, mainly because of inter-class similarities and intra-class variations such as occlusion, background clutter, and changes in viewpoint, pose, scale, and illumination. As a consequence, image content can hardly be described by low-level visual features alone. To address this, the text associated with images is used to capture valuable semantic information about image content, and to benefit from both visual and textual models we propose a multimodal approach.

For visual models, designing good visual descriptors and modeling these descriptors play an important role; for textual models, how the text associated with images is organized matters just as much. In this context, the objective of this thesis is to make several contributions to the VCDA task. On the visual side, we propose a novel visual descriptor that represents the visual content of images and videos effectively and efficiently, together with a novel method for encoding local binary descriptors. On the textual side, we propose two novel textual descriptors: a semantic Bag-of-Words (sBoW) descriptor built with a dictionary, and an Image Distance Feature (IDF) descriptor based on the tags associated with images. Finally, to benefit from both visual and textual models, the two modalities are fused efficiently using Multiple Kernel Learning (MKL).
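The core idea behind MKL fusion of modalities can be illustrated with a minimal sketch: each modality contributes a base kernel (Gram matrix) over the same set of images, and the fused kernel is a weighted combination of the base kernels. The toy feature vectors, kernel choice, and fixed weights below are illustrative assumptions, not the thesis's actual descriptors or learned weights (MKL would learn the weights jointly with a classifier such as an SVM).

```python
import numpy as np

# Hypothetical toy data: 4 images, each with a visual and a textual
# feature vector (dimensions and values are illustrative only).
visual = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
textual = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def linear_kernel(X):
    """Gram matrix of the linear kernel k(x, y) = x . y."""
    return X @ X.T

# One base kernel per modality.
K_visual = linear_kernel(visual)
K_textual = linear_kernel(textual)

# MKL learns these weights; here they are fixed for illustration.
beta = [0.6, 0.4]
K_fused = beta[0] * K_visual + beta[1] * K_textual

# A convex combination of valid kernels is itself a valid kernel:
# the fused Gram matrix stays symmetric and positive semi-definite.
assert np.allclose(K_fused, K_fused.T)
assert np.all(np.linalg.eigvalsh(K_fused) >= -1e-9)
```

The fused matrix `K_fused` could then be passed to any kernel classifier that accepts a precomputed kernel; the benefit of MKL over fixed weights is that the per-modality weights are optimized rather than hand-tuned.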
[...]