PhD thesis in Computer Science
Defended on 17-12-2012
at Clermont-Ferrand 2, within the École doctorale des sciences pour l'ingénieur (Clermont-Ferrand), in partnership with the Laboratoire de Physique Corpusculaire (Aubière, Puy-de-Dôme) (research team).
The president of the jury was Patrick Bellot.
Molecular epidemiology and high-throughput metagenomics on the grid
The first objective of this thesis is the study and development of bioinformatics platforms and tools on the grid. The second objective is to develop applications in molecular epidemiology and metagenomics based on these tools and platforms. Building on a study of existing bioinformatics platforms and tools, we propose our solution: a platform and a portal for molecular epidemiology and high-throughput metagenomics on the grid. The main idea of our platform is to simplify job submission to the grid through pilot jobs (generic jobs that can control and launch many real tasks) and the PULL model (tasks are retrieved and executed automatically). Other platforms take similar approaches, but ours focuses on simplicity and on reducing the time needed to submit jobs. The bioinformatics tools deployed on the platform are widely used tools that serve many kinds of bioinformatics analyses. We integrate a workflow engine into the platform so that users can carry out their analyses more easily. Our platform can be seen as a generalized system that applies to both epidemiological surveillance and metagenomics; two use cases, one for each domain, have been deployed and tested on the grid. The first use case is the surveillance of avian influenza (bird flu): it federates influenza virus sequence data and provides a portal with grid-enabled tools to analyze these data. The second use case applies the power of the grid to the analysis of high-throughput sequencing of amplicons. In this case, we demonstrate the efficiency of the grid by using our platform to gridify an existing application, whose original version performs much worse than the gridified one.
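The pilot-job and PULL mechanism mentioned in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the platform's implementation. Local threads stand in for pilot jobs running on grid worker nodes, and an in-memory queue stands in for the central task server. The function name `run_pilots`, the parameter `n_pilots` and the toy sequence-length tasks are all hypothetical. Each pilot is a generic worker that keeps pulling real tasks until none remain, so new tasks never require a new grid submission.

```python
import queue
import threading


def run_pilots(tasks, n_pilots=3):
    """Hypothetical sketch: pilot workers pull tasks from a shared queue (PULL model)."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = {}
    lock = threading.Lock()

    def pilot(pilot_id):
        # A pilot job holds a worker slot and repeatedly pulls real tasks
        # until the queue is empty, instead of being bound to a single task.
        while True:
            try:
                name, func, arg = task_queue.get_nowait()
            except queue.Empty:
                return
            value = func(arg)
            with lock:
                results[name] = (pilot_id, value)
            task_queue.task_done()

    threads = [threading.Thread(target=pilot, args=(i,)) for i in range(n_pilots)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Toy tasks (hypothetical): compute the length of synthetic DNA sequences.
tasks = [(f"seq{i}", len, "ACGT" * i) for i in range(1, 6)]
results = run_pilots(tasks)
```

Only the pilots are "submitted" here. Which pilot runs which task is decided at run time by whoever pulls first, which is what lets a pilot-based platform avoid per-task submission overhead.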