Stratégies optimistes en apprentissage par renforcement (Optimistic strategies in reinforcement learning)
Author: | Sarah Filippi |
Supervisors: | Olivier Cappé, Aurélien Garivier |
Type: | Doctoral thesis |
Discipline(s): | Signal and image processing |
Date: | Defended in 2010 |
Institution(s): | Paris, Télécom ParisTech |
Abstract
This thesis concerns model-based methods for solving reinforcement learning problems: these methods define a set of models that could explain the interaction between an agent and an environment. We consider several models of interaction: (partially observed) Markov decision processes and bandit models, and we show that our novel algorithms perform well both in practice and in theory.

The first algorithm follows an exploration policy, during which the model is estimated, and then an exploitation policy; the duration of the exploration phase is controlled adaptively. This yields a logarithmic regret for a parametric Markov decision problem even when the state is only partially observed. The model is motivated by an application of interest in cognitive radio: the opportunistic access to a communication network by a secondary user.

We are also interested in optimistic algorithms, in which the agent chooses the actions that are optimal for the best possible model among the plausible ones. We construct such an algorithm for a parametric bandit problem with a generalized linear model, and consider an online-advertising application. Finally, we use the Kullback-Leibler divergence to construct the set of likely models in optimistic algorithms for finite Markov decision processes. This change of metric is studied in detail and leads to significant improvements in practice.
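To make the optimism principle concrete, here is a minimal sketch of an optimistic index policy for Bernoulli bandits in the spirit of the abstract: for each arm, the set of plausible models is a Kullback-Leibler confidence region around the empirical mean, and the agent acts according to the most favorable model in that region. This is only an illustration of the general idea, not the thesis's actual algorithms; the confidence level `log(t)` and the bisection tolerance are assumptions made for the sketch.

```python
import math
import random

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(mean, count, t, precision=1e-6):
    """Largest q in [mean, 1] with count * KL(mean, q) <= log(t):
    the most optimistic mean among the plausible models for this arm."""
    level = math.log(max(t, 2)) / count
    lo, hi = mean, 1.0
    while hi - lo > precision:          # bisection: KL is increasing in q >= mean
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def optimistic_bandit(arm_means, horizon, seed=0):
    """Run the optimistic policy on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:                 # initialization: pull each arm once
            arm = t - 1
        else:                           # optimism: play the arm with the best
            arm = max(range(n_arms),    # plausible model
                      key=lambda a: kl_upper_bound(sums[a] / counts[a],
                                                   counts[a], t))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Run on two arms with means 0.2 and 0.8, the policy quickly concentrates its pulls on the better arm while still sampling the worse one often enough to keep its confidence region tight, which is the mechanism behind logarithmic regret bounds.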