Techniques de compilation intelligentes pour la programmation Big Data

by Sarah Chlyah

Thesis project in Computer Science

Supervised by Pierre Geneves.

Thesis in preparation at Grenoble Alpes, within the École doctorale mathématiques, sciences et technologies de l'information, informatique (Grenoble), in partnership with the Laboratoire d'Informatique de Grenoble (laboratory), since 20-03-2018.


  • Abstract

    Building AI applications with amounts of data that exceed single computer capabilities remains a very time-consuming and expensive task. As pointed out in a recent Stanford report, “This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring”. In the CLEAR project of the Tyrex team at LIG/Inria Grenoble, we seek to provide high-level programming abstractions and techniques to facilitate the construction of AI/big data applications. Current big data toolboxes (such as Apache Spark) provide low-level primitives for distributed data-centric computations. These toolboxes are progressively extended with higher-level programming abstractions such as dataframes, ML pipelines, and queries. The next challenge is to make such programming abstractions and techniques more efficient and more accessible in each stage of the construction of a big data application. This is what we explore in the CLEAR project, where we investigate the synthesis of code optimized for big data toolboxes. In particular, we explore how code can be automatically generated and optimized from higher-level descriptions. The overall goal is to provide high-level programming abstractions and techniques to facilitate the construction of more robust and more efficient AI/big data applications, such as the ones found in linked data, finance, retail, healthcare, etc.

  • Translated title

    Smart compilation techniques for Big Data programming

