Détection de communautés sur des graphes aléatoires, algorithmes de messages

by Léo Miolane

Thesis project in Mathematics

Supervised by Marc Lelarge.

Thesis in preparation at Paris Sciences et Lettres, within the École doctorale de Sciences mathématiques de Paris Centre (Paris), in partnership with DIENS - Département d'informatique de l'École normale supérieure (laboratory) and the École normale supérieure (institution where the thesis is prepared), since 1 September 2016.

  • Summary

    Many problems in statistics, computer science or physics involve a large number of variables that interact with one another. These interactions can be described by a factor graph. We will focus in particular on the problem of community detection on random graphs, and we will study message-passing algorithms, which are very effective for this kind of problem.
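
For concreteness, here is one standard way to write the probability distribution encoded by a factor graph (the notation is ours, not taken from the thesis): variables x_1, ..., x_n, a set F of factors, and each factor a depending only on the variables x_{∂a} adjacent to it. The normalising constant Z is the partition function whose logarithm is discussed in the abstract. The second line instantiates this for community detection in a sparse stochastic block model, assuming labels σ_i with prior p, an affinity matrix c, and a graph G with edge set E: every pair of nodes is a factor, with weight c/n for an edge and 1 - c/n for a non-edge.

```latex
\mu(x_1,\dots,x_n) = \frac{1}{Z}\prod_{a \in F}\psi_a\big(x_{\partial a}\big),
\qquad
Z = \sum_{x_1,\dots,x_n}\prod_{a \in F}\psi_a\big(x_{\partial a}\big),

\mu(\sigma \mid G) = \frac{1}{Z(G)}\prod_{i=1}^{n} p_{\sigma_i}
  \prod_{(i,j) \in E}\frac{c_{\sigma_i\sigma_j}}{n}
  \prod_{(i,j) \notin E,\; i<j}\Big(1 - \frac{c_{\sigma_i\sigma_j}}{n}\Big).
```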

  • Translated title

    Community detection on random graphs, message-passing algorithms

  • Abstract

    Many problems in combinatorics, computer science, statistical inference and physics can be cast along the following lines: there is a large number of variables interacting through constraints, each of which binds only a few variables. These interactions are naturally described by a factor graph. We will be concerned with models where the factor graph is random, such as the random k-SAT model, the random graph colouring problem or community detection. Over the past decade, the second moment method applied to the partition function has emerged as the principal tool for analysing such models. In many cases, however, large deviation phenomena preclude the use of the second moment method. The obvious remedy is to condition on an event that avoids them, but this conditioning can make the second moment computation infeasible. Indeed, the recent history of the random k-SAT problem illustrates how conditioning turns a second moment computation into a formidable task [1, 2].

    A completely different, but non-rigorous, method has been proposed on the basis of ideas from statistical physics: the replica symmetric cavity method. According to the cavity method, under certain assumptions, the asymptotic value of the log-partition function can be computed by maximising a functional called the Bethe free energy. Furthermore, the physics recipe for solving this maximisation problem is to iterate a message-passing algorithm called Belief Propagation on the factor graph until it converges. Unfortunately, there are in general several fixed points, and non-trivial insights are needed to steer Belief Propagation toward the correct one. Even worse, in the planted case (which corresponds, for example, to the community detection problem), the maximal value of the Bethe free energy does not approximate the log-partition function. This phenomenon can be interpreted in terms of a statistical physics concept, namely a phase transition [3].
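
As an illustration only, the following is a minimal NumPy sketch of Belief Propagation for community detection in a sparse two-group stochastic block model. It is not taken from the thesis and rests on several assumptions: two groups with equal prior probability, known parameters a and b (edge probability a/n inside a group and b/n across groups), synchronous updates without damping, and the usual mean-field term h that summarises the non-edges. The helper names (sample_sbm, belief_propagation, normalize) are hypothetical.

```python
import numpy as np


def normalize(log_p):
    """Row-wise softmax: turn unnormalised log-weights into probability vectors."""
    log_p = log_p - log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)


def sample_sbm(n, a, b, rng):
    """Hypothetical helper: sparse two-group SBM, edge prob a/n inside, b/n across."""
    labels = rng.integers(0, 2, size=n)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # each pair i < j drawn once
    return labels, np.argwhere(upper)                # edges as an (m, 2) array


def belief_propagation(n, edges, a, b, rng, max_iter=100, tol=1e-6):
    """Synchronous BP sketch for the two-group SBM; returns (n, 2) marginals.

    Equal priors (1/2, 1/2) are constant, so they cancel in normalize()
    and are omitted from the update.
    """
    C = np.array([[a, b], [b, a]], dtype=float)      # affinity matrix c_{ts}
    m = len(edges)
    # Directed edge e carries the message src[e] -> dst[e];
    # rev[e] is the index of the opposite message dst[e] -> src[e].
    src = np.concatenate([edges[:, 0], edges[:, 1]])
    dst = np.concatenate([edges[:, 1], edges[:, 0]])
    rev = np.concatenate([np.arange(m, 2 * m), np.arange(m)])
    # Near-uniform messages plus a tiny random perturbation to break the symmetry.
    p0 = 0.5 + 1e-3 * (rng.random(2 * m) - 0.5)
    psi = np.stack([p0, 1.0 - p0], axis=1)
    marg = np.full((n, 2), 0.5)
    for _ in range(max_iter):
        # Mean-field term from the non-edges; without it, BP could drift
        # toward the trivial solution where every node is in the same group.
        h = marg.sum(axis=0) @ C / n
        log_m = np.log(psi @ C)          # log sum_t c_{ts} psi^{k->i}_t, per message
        L = np.zeros((n, 2))
        np.add.at(L, dst, log_m)         # add up every incoming message of each node
        # Message i -> j uses all incoming messages of i except the one from j.
        new_psi = normalize(-h + L[src] - log_m[rev])
        marg = normalize(-h + L)
        delta = np.max(np.abs(new_psi - psi), initial=0.0)
        psi = new_psi
        if delta < tol:
            break
    return marg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels, edges = sample_sbm(1000, 20.0, 2.0, rng)
    marg = belief_propagation(1000, edges, 20.0, 2.0, rng)
    pred = marg.argmax(axis=1)
    # Labels are only identifiable up to swapping the two groups.
    overlap = max(np.mean(pred == labels), np.mean(pred != labels))
    print(f"fraction of correctly classified nodes: {overlap:.3f}")
```

With a = 20 and b = 2, the condition (a - b)^2 > 2(a + b), the Kesten-Stigum threshold for two symmetric groups, holds comfortably, so BP started from near-uniform messages should move away from the uninformative fixed point. Below that threshold, the same code would typically stay close to uniform marginals.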

    Our first main goal will be to analyse this phase transition rigorously and to understand its algorithmic implications.

    [1] A. Coja-Oghlan and K. Panagiotou. The asymptotic k-SAT threshold. Advances in Mathematics 288 (2016): 985-1068.
    [2] J. Ding, A. Sly, and N. Sun. Proof of the satisfiability conjecture for large k. Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing. ACM, 2015.
    [3] L. Zdeborová and F. Krzakala. Statistical physics of inference: Thresholds and algorithms. arXiv preprint arXiv:1511.02476 (2015).