Contribution to concept detection on images using visual and textual descriptors

by Yu Zhang

Doctoral thesis in Computer Science

Supervised by Liming Chen and Stéphane Bres.

Defended on 15 May 2014

At École centrale de Lyon (Écully), within the École doctorale en Informatique et Mathématiques de Lyon, in partnership with the Laboratoire d'InfoRmatique en Images et Systèmes d'information (Écully, Rhône) (laboratory) and the Extraction de Caractéristiques et Identification research team.

The jury president was Nicole Vincent.

The jury was composed of Liming Chen and Walid Mahdi.

The reviewers (rapporteurs) were Jean-Yves Ramel and Georges Quénot.


  • Abstract (French)

    No abstract

  • Translated title

    Contribution à la détection de concepts sur des images utilisant des descripteurs visuels et textuels


  • Abstract (English)

    This thesis addresses training and integration strategies for several modalities (visual and textual) in order to perform efficient Visual Concept Detection and Annotation (VCDA), a task that has become a very popular and important research topic in recent years because of its wide range of applications, such as image/video indexing and retrieval, security access control, and video monitoring. Despite the efforts and progress made over the past years, it remains an open problem and is still considered one of the most challenging problems in the computer vision community, mainly due to inter-class similarities and intra-class variations such as occlusion, background clutter, and changes in viewpoint, pose, scale, and illumination. This means that image content can hardly be described by low-level visual features alone. To address these problems, the text associated with images is used to capture valuable semantic information about image content, and a multimodal approach is proposed to benefit from both visual and textual models. For visual models, designing good visual descriptors and modeling them well play an important role; for textual models, how the text associated with images is organized is equally important. In this context, the objective of this thesis is to make innovative contributions to the VCDA task. On the visual side, novel visual features/descriptors are proposed that represent the visual content of images and videos effectively and efficiently, together with a novel method for encoding local binary descriptors. On the textual side, two novel textual descriptors are proposed: a semantic Bag-of-Words (sBoW) descriptor built using a dictionary, and an Image Distance Feature (IDF) based on the tags associated with images.
Finally, to benefit from both visual and textual models, fusion is carried out efficiently through Multiple Kernel Learning (MKL). [...]
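
A bag-of-words descriptor over image tags, as mentioned in the abstract, can be sketched as follows. This is a minimal illustration only: the thesis's sBoW additionally maps tags to entries of a semantic dictionary, which is not reproduced here, and the vocabulary below is invented for the example.

```python
from collections import Counter

def bow_vector(tags, dictionary):
    """Histogram of image tags over a fixed, ordered vocabulary.

    Plain bag-of-words sketch; the semantic mapping step of the
    thesis's sBoW descriptor is omitted.
    """
    counts = Counter(t.lower() for t in tags)
    return [counts.get(word, 0) for word in dictionary]

# Toy vocabulary and tag list (hypothetical example data)
vocab = ["beach", "sunset", "dog", "car"]
print(bow_vector(["Beach", "sunset", "beach"], vocab))  # [2, 1, 0, 0]
```

Tags outside the vocabulary are simply dropped, so the descriptor has a fixed dimension equal to the dictionary size.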
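
The fusion step can be illustrated by combining one kernel per modality. In genuine MKL the combination weight is learned jointly with the classifier; in the sketch below it is fixed, and the RBF kernel, the weight `beta`, and the toy features are all assumptions made for the example rather than details taken from the thesis.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix of the RBF kernel between the rows of X and Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fused_kernel(Kv, Kt, beta=0.5):
    # Convex combination of a visual and a textual kernel. In MKL the
    # weight beta is learned with the classifier; fixed here for illustration.
    return beta * Kv + (1 - beta) * Kt

Xv = np.array([[0.0, 1.0], [1.0, 0.0]])  # toy visual features
Xt = np.array([[1.0, 1.0], [0.0, 0.0]])  # toy textual features
K = fused_kernel(rbf_kernel(Xv, Xv), rbf_kernel(Xt, Xt))
print(K.shape)  # (2, 2)
```

A convex combination of positive semi-definite kernels is itself a valid kernel, so the fused matrix can be fed directly to any kernel classifier such as an SVM.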


It is available in the library of the institution where the defense took place.

Consult in a library

The defense version exists

Information

  • Details: 1 vol. (vii-149 p.)
  • Appendices: Bibliography pp. [137]-148

Where can this thesis be found?

  • Library: Ecole centrale de Lyon. Bibliothèque Michel Serres.
  • Available for interlibrary loan (PEB)
  • Shelf mark: T2391
  • Library: Ecole centrale de Lyon. Bibliothèque Michel Serres.
  • Not available for interlibrary loan (PEB)
  • Shelf mark: T2391 mag
See in the Sudoc, the union catalogue of French higher-education and research libraries.