Apprentissage profond Bayésien pour la sélection de modèles et l'inférence approximative

by Ekaterina Iakovleva

Thesis project in Mathematics and Computer Science

Supervised by Jakob Verbeek.

Thesis in preparation at Grenoble Alpes, within the École doctorale mathématiques, sciences et technologies de l'information, informatique (Grenoble), in partnership with Laboratoire Jean Kuntzmann (Grenoble) (laboratory) and LEAR: Learning and Recognition in Vision (research team), since 01-10-2018.

  • Summary

    The field of machine learning has recently been considerably disrupted by deep learning. Deep neural networks now underpin the state of the art in computer vision, speech recognition, natural language processing, and many other areas. Although very effective, these models are computationally costly and require large quantities of data to accurately estimate their many parameters. Bayesian statistics offers a theoretically well-founded framework for reasoning about uncertainty and is one of the cornerstones of modern machine learning. Although the two fields seem quite remote from each other, there is great potential for cross-fertilization between deep learning and Bayesian statistics. Yet the interaction between these two learning paradigms has been relatively little explored so far. The goal of this thesis is to contribute theory and practical techniques at the interface of these two fields. See the English abstract below for the thesis topic in detail.

  • Translated title

    Bayesian deep learning for model selection and approximate inference

  • Abstract

    The field of machine learning has recently been drastically impacted by deep learning. Deep neural networks now underpin the state of the art in computer vision, speech recognition, natural language processing, and many other areas. While very effective, these models are computationally costly and require large quantities of data to accurately estimate their many parameters. Bayesian statistics offers a theoretically well-founded framework for reasoning about uncertainty, and it is one of the cornerstones of modern machine learning. Although it seems quite remote from deep neural networks, which are theoretically poorly understood, there is great potential for cross-fertilization between deep learning and Bayesian statistics. Yet the interaction between these two learning paradigms remains relatively underexplored. The goal of this thesis is to contribute new theory and practical techniques that lie at the interface of these two fields.
This thesis will explore the Bayesian learning framework to address two important problems in learning deep neural networks. The first is the design of energy- and memory-efficient neural network models that are suitable for deployment on devices such as mobile phones or drones. Current solutions only use sparsity-inducing priors to suppress redundant network parameters in a given architecture, or consider posterior concentration for sparse deep networks. Another line of work, on neural architecture search, is based on reinforcement learning and requires training thousands of deep neural networks across hundreds of GPUs. To make progress and also learn the overall architecture of convolutional neural networks, we want to explore Bayesian learning of the convolutional neural fabric meta-architecture, which allows implicit learning across thousands of architectures in parallel, with hierarchical sparsity-inducing prior distributions over the parameter space.
The second problem we want to address is the reliance of deep learning on large quantities of training data, which are typically time-consuming and expensive to acquire. In the context of visual object recognition, our goal is to learn to recognize a new object class from just a few examples, something that humans do effortlessly but that machines are currently not capable of. Within the Bayesian learning framework, we can capture the experience gathered from learning to recognize N previous object classes in a posterior distribution over the network parameters, and use this posterior as a prior distribution over the parameters for the new object category.
Conversely, deep learning provides a rich set of techniques that can be used to address problems in probabilistic graphical models, such as Bayesian networks and Markov random fields. These probabilistic frameworks represent the conditional independence structure of complex multivariate distributions and derive generic computational tools from the graphical properties of the dependency structure. One of the most important problems in such models is to characterize the posterior distribution over the latent random variables given observed data, a problem known as inference. For many models of interest, however, the posterior is intractable to compute exactly. Existing approximate inference techniques, such as loopy belief propagation or mean-field inference, are limited in the sense that they are (i) generic and not adapted to a specific graphical model of interest, and (ii) computationally costly due to their iterative nature. The second topic of this thesis is the exploration of deep neural networks as trainable approximate inference engines: given observed data, the neural network outputs a distribution over the latent variables. The inference networks can be trained using a variational maximum likelihood criterion or from approximate posteriors computed using conventional iterative techniques.
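As a rough illustration of the two ideas above, they can be written in standard textbook form. This is only a sketch, not the formulation adopted in the thesis: the symbols w (network parameters), D_{1:N} and D_new (data for the N previous classes and for the new class), x and z (observed and latent variables), and q_φ (the inference network) are notational assumptions introduced here, and the first relation assumes the data sets are conditionally independent given w.

```latex
% Posterior over network parameters from N previous classes,
% reused as the prior when observing data for a new class
% (assumes D_new and D_{1:N} are conditionally independent given w).
p(w \mid \mathcal{D}_{\text{new}}, \mathcal{D}_{1:N})
  \;\propto\; p(\mathcal{D}_{\text{new}} \mid w)\, p(w \mid \mathcal{D}_{1:N})

% One common variational criterion for an inference network q_\phi(z \mid x):
% the evidence lower bound on the log marginal likelihood.
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \right]
```

The bound in the second relation is tight when q_φ(z | x) equals the exact posterior p_θ(z | x), which is why maximizing it with respect to φ trains the network to approximate inference.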
To validate the practical effectiveness of the developed Bayesian learning techniques, we will consider deep convolutional networks for visual recognition tasks. To assess the effectiveness of the proposed approximate inference techniques, we will evaluate them on Markov random field models for image processing and on latent aspect models, such as Latent Dirichlet Allocation, for textual data.