Spécification profonde en Coq de langages NoSQL

by Mohammed Houssem Eddine Hachmaoui

Thesis project in Computer Science

Supervised by Evelyne Contejean and Véronique Benzaken.

Thesis in preparation at Paris Saclay, within the framework of Sciences et Technologies de l'Information et de la Communication, in partnership with LRI - Laboratoire de Recherche en Informatique (laboratory), VALS - Vérification d'Algorithmes, Langages et Systèmes (research team) and Université Paris-Sud (institution where the thesis is prepared), since 1 October 2016.


  • Abstract

    Objective. A first step could be to (deeply) specify a toy NoSQL language in Gallina. This language should enjoy two properties: being simple and modular enough to be deeply specified and formally proven, and being powerful and realistic enough to be a credible alternative to existing ones. Then, among the variety of NoSQL languages, the candidate will have to choose one (MongoDB, Jaql, Pig, etc.) and provide a mechanised algebraic semantics for it, possibly by extending the algebra described in the context below.

    Context. The explosion of the Internet, the ever-growing importance of data in applications and the recent emergence of Cloud computing have given birth to a whirlwind of new data models (XML, JSON, RDF) and languages (XPath, XQuery, Pig, Jaql, SPARQL...). Whether they are developed under the banner of NoSQL (which stands for Not Only SQL), for Big Data analytics, for Cloud computing or as domain-specific languages (DSLs) embedded in a host language, most of them share a common subset of SQL and/or the ability to handle semistructured data. Such languages have no clear semantics and can greatly benefit from uniform formal foundations. We argue that such foundations should account for novel features critical to various application domains. A very promising approach consists in using proof assistants such as Coq [3, 4]. Based on the work in [1, 2], a first mechanisation of the SQL language has been proposed. In order to account for full SQL, we proposed and formalised an algebra that extends the relational one and makes it possible to assign SQL a semantics in a named setting. Unlike what is found in the literature, our algebra is very concise and parametric with respect to the data model; in particular, when data are flat records, it captures SQL. Our framework is versatile enough to define function symbols with a predefined semantics (e.g., the SQL aggregates avg, count, sum) as well as user-defined ones. Lastly, we provide a Coq-mechanised parser from SQL to our name-based extended algebra, together with its Coq adequacy proof and its OCaml extraction.

    References
    [1] V. Benzaken, É. Contejean, and S. Dumbrava. A Coq Formalization of the Relational Data Model. In 23rd European Symposium on Programming (ESOP), 2014.
    [2] V. Benzaken and É. Contejean. The datacert library (http://datacert.lri.fr/), 2012.
    [3] X. Leroy. A formally verified compiler back-end. J. Autom. Reasoning, 43(4):363–446, 2009.
    [4] G. Malecha, G. Morrisett, A. Shinnar, and R. Wisnesky. Toward a verified relational database management system. In ACM Int. Conf. POPL, 2010.
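
    The abstract's idea of an algebra that is parametric with respect to the data model, with both predefined and user-defined function symbols, can be illustrated informally. The sketch below is only an illustration in Python, not the Coq development: every name in it (flat_get, nested_get, group_aggregate, AGGREGATES, the sample data) is a hypothetical choice made for this example and does not come from the datacert library. It assumes bag semantics represented as Python lists.

```python
from typing import Any, Callable, Dict, List

# Hypothetical sketch: none of these names come from the datacert library.

# Function symbols with a predefined semantics (SQL-style aggregates on a bag of values).
AGGREGATES: Dict[str, Callable[[List[Any]], Any]] = {
    "count": len,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
}


def flat_get(record: Dict[str, Any], attr: str) -> Any:
    """Data model 1: flat records, attributes are looked up by name."""
    return record[attr]


def nested_get(doc: Dict[str, Any], path: str) -> Any:
    """Data model 2: nested documents, attributes are dotted paths."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def select(rel: List[Any], pred: Callable[[Any], bool]) -> List[Any]:
    """Selection: keep the elements of the bag that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def group_aggregate(rel, key, symbol, attr, get=flat_get, symbols=AGGREGATES):
    """Group by `key`, then apply the function symbol `symbol` to `attr`.

    The operator only touches the data through `get`, so the same definition
    works for any data model that supplies an attribute accessor.
    """
    groups: Dict[Any, List[Any]] = {}
    for t in rel:
        groups.setdefault(get(t, key), []).append(get(t, attr))
    apply_symbol = symbols[symbol]
    return [{"key": k, "value": apply_symbol(vs)} for k, vs in groups.items()]


# Flat records: the SQL-like case.
employees = [
    {"dept": "db", "salary": 10},
    {"dept": "db", "salary": 20},
    {"dept": "pl", "salary": 30},
]

# Nested documents: the same operator, with a different accessor.
documents = [
    {"dept": {"name": "db"}, "salary": 10},
    {"dept": {"name": "db"}, "salary": 20},
    {"dept": {"name": "pl"}, "salary": 30},
]

# A user-defined function symbol added next to the predefined ones.
extended_symbols = {**AGGREGATES, "max": max}

avg_by_dept = group_aggregate(employees, "dept", "avg", "salary")
count_by_dept = group_aggregate(documents, "dept.name", "count", "salary", get=nested_get)
max_by_dept = group_aggregate(employees, "dept", "max", "salary", symbols=extended_symbols)
```

    Passing the accessor and the symbol table as parameters is one simple way to mimic parametricity; a Coq formalisation would instead abstract over the data model and carry proofs about the operators, which this sketch does not attempt.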

  • Translated title

    Coq deep specification of NoSQL languages

