Analyse de scènes dynamiques complexes par segmentation de mouvement - Application aux Véhicules Intelligents

by Hernan Gonzalez

Thesis project in Robotics

Supervised by Sergio Rodriguez.

Thesis in preparation at Paris Saclay, within the framework of Sciences et Technologies de l'Information et de la Communication, in partnership with SATIE - Systèmes et Applications des Technologies de l'Information et de l'Energie (laboratory) and Université Paris-Sud (thesis preparation institution), since 01-09-2016.


  • Abstract

    In this thesis, we will study the impact of using error models to increase the consistency and robustness of a perception system. The methods resulting from this research will be applied and implemented in an M-SfM (Multi-body Structure from Motion) system for intelligent transportation. Our results will be compared with state-of-the-art methods using reference datasets. More precisely, the perception system under consideration must exploit the information provided by a projective camera and estimate the geometry of the observed scene, its own position, and the positions of the surrounding dynamic objects. The performance gain obtained by exploiting geographical information (e.g. maps) will also be studied in order to reduce the complexity of the experiments.

  • Translated title

    Complex dynamic scene analysis through multi-body motion segmentation


  • Abstract

    Context
    ----------------------------------------------------------------------------
    Several years of robotics research have been dedicated to the analysis of dynamic scenes. Such analysis is intended to give an automatic system the capability to estimate simultaneously the scene's geometry, its own position, and the positions of the surrounding dynamic objects with respect to a global reference frame. In this context, one can highlight conventional advanced techniques such as visual odometry, Structure from Motion (SfM), and Simultaneous Localization and Mapping (SLAM). These techniques have been extensively investigated and evaluated in user-controlled environments, particularly indoor ones. Such contributions now need to be transposed to more challenging and highly complex situations in order to answer new societal needs such as Intelligent Transportation Systems (ITS).

    In order to analyze dynamic scenes, a complex perception process must be carried out. To this end, it is highly desirable to use sensors that provide complete and redundant information about the observed scene. Vision is a passive sensing means with low energy consumption and low manufacturing costs, well suited to retrieving appearance and structure information in a dense fashion. Learning-based approaches using vision have been widely studied; they usually exploit not only object appearance but also prior object models in order to cope with complex detection situations (e.g. partial occlusion) [1]. Alternatively, scene structure can be analyzed using multi-view projective constraints. Such an approach was proposed in [2], and a recent investigation [3] has shown promising results when transposing it to ITS applications.

    State-of-the-art
    ----------------------------------------------------------------------------
    In [4], a first complete stereo vision-based scene analysis was proposed.
    This approach integrates motion estimation, 2D object detection and tracking, 3D localization, and trajectory estimation in full-scale experiments including car passages and crowded scenarios. A further investigation [5] was conducted on a monocular real-time system. That solution introduces the concept of multibody Structure from Motion (SfM) as a generalization of classical SfM to dynamic scenes; it was accelerated and made more robust by means of geometric constraints and probabilistic models.

    The previous approaches require an explicit moving-object detection function, which greatly increases the complexity of the perception process. To address this issue, dense motion segmentation methods were studied. In [2] and [6], a two-class motion segmentation (i.e. camera motion versus independently moving objects) is carried out by a multi-frame monocular analysis, under the assumption that a 3D plane is present in the scene. Later, a multi-frame monocular fundamental matrix (MMFM) was proposed for non-planar 3D scenarios [7]. It avoids any dominant-plane assumption, which widens the range of situations the method can handle. More recently, a multiple rigid motion segmentation methodology was proposed, based on a unified representation of rigid-body and planar motions by means of a hybrid perspective constraint [8]. On a public dataset, this approach demonstrated higher accuracy and faster convergence than existing methods. Finally, in [9], an alternative approach performs multiple rigid-body motion segmentation through an iterative estimation scheme in which motion hypotheses are evaluated simultaneously with the 3D scene structure. This approach was evaluated on an on-road dataset, with promising results.
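
    To make the two-class idea above concrete (camera motion versus independently moving objects), here is a minimal illustrative sketch. It is not the method of [2], [6], or [7]: it simply assumes that a fundamental matrix F describing the camera's own motion between two frames is already known, and labels each point correspondence by thresholding its Sampson distance to the epipolar constraint x2^T F x1 = 0. The function names, the example F (pure sideways translation with identity intrinsics), the coordinates, and the threshold are all hypothetical values chosen for illustration.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def sampson_distance(F, x1, x2):
    """First-order geometric error of the match (x1, x2) w.r.t. x2^T F x1 = 0.

    x1 and x2 are (x, y) image points in the first and second frame.
    """
    p1 = [x1[0], x1[1], 1.0]
    p2 = [x2[0], x2[1], 1.0]
    Fx1 = mat_vec(F, p1)                                   # epipolar line in frame 2
    Ft = [[F[j][i] for j in range(3)] for i in range(3)]   # transpose of F
    Ftx2 = mat_vec(Ft, p2)                                 # epipolar line in frame 1
    algebraic = sum(p2[i] * Fx1[i] for i in range(3))
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    if denom == 0.0:
        return 0.0  # degenerate case: no usable epipolar line, treated as uninformative
    return algebraic ** 2 / denom


def segment_two_class(F, matches, threshold):
    """Label each match 'static' (consistent with camera motion) or 'moving'."""
    return [
        "moving" if sampson_distance(F, a, b) > threshold else "static"
        for a, b in matches
    ]


# Assumed camera motion: pure translation along x with identity intrinsics,
# so F is the skew-symmetric matrix of t = (1, 0, 0) and epipolar lines are horizontal.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

matches = [
    ((0.10, 0.20), (0.15, 0.20)),   # slides along its epipolar line
    ((0.30, -0.10), (0.32, -0.10)), # slides along its epipolar line
    ((0.00, 0.00), (0.05, 0.08)),   # leaves its epipolar line
]

labels = segment_two_class(F, matches, threshold=1e-4)
print(labels)
```

    A known limitation of this kind of test is that an object moving along its own epipolar line satisfies the constraint and is labelled static, which is one reason multi-frame analysis and additional constraints such as those discussed above are of interest.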

    Objective
    ----------------------------------------------------------------------------
    This study will investigate the contribution of error models and quantify their impact, with the aim of increasing the consistency and robustness of a perception system. The research will be applied to a multi-body SfM system for autonomous and intelligent vehicles (IV). The results will be used to compare the performance of the proposed system with state-of-the-art methods on public road datasets. Specifically, the monocular vision system under consideration will retrieve relevant information such as the scene's geometry, its own position, and the positions of the surrounding dynamic objects. The use and contribution of geographical information databases will also be investigated in order to reduce the complexity of experiments in full-scale ITS scenarios.