Parallel and Distributed Markov Chain Monte Carlo Methods for Bayesian Inference in Tensor Factorization Models

by Thanh Huy Nguyen

Thesis project in Signal and Image Processing

Supervised by Gaël Richard and Umut Simsekli.

Thesis in preparation at Paris Saclay, within the doctoral school Sciences et technologies de l'information et de la communication (Orsay, Essonne), in partnership with Laboratoire de Traitement et Communication de l'Information (laboratory), S2A - Statistique et Apprentissage (research team) and Télécom ParisTech (institution where the thesis is prepared), since 15-09-2017.


  • Abstract

    Matrix and tensor factorization methods provide a unifying view of a broad spectrum of techniques in machine learning and signal processing, offering both sensible statistical models for datasets and efficient computational procedures framed as decomposition algorithms [1, 2]. So far, algebraic or optimization-based approaches have prevailed for computing such factorizations [1]. In contrast, this PhD thesis aims to develop state-of-the-art Markov Chain Monte Carlo (MCMC) methods for full Bayesian inference in matrix and tensor factorization models. However, MCMC methods are generally perceived as computationally very demanding and impractical for big data problems [3]. The central goal of this PhD thesis will therefore be to exploit parallel and distributed computation to push the state of the art in scalability, statistical efficiency and computational complexity. More precisely, the three main goals of the project are:

      • The development of novel parallel and distributed Markov Chain Monte Carlo methods for Bayesian inference in factorization-based data models.
      • The development of new parallel Bayesian model selection and model averaging methods for large-scale matrix and tensor factorization problems.
      • The demonstration of the practical utility of the developed parallel and distributed MCMC methods on two challenging applications from two domains: audio source separation and link prediction.

  • Translated title

    Parallel and Distributed Markov Chain Monte Carlo for Bayesian Inference in Matrix and Tensor Factorization Models

