Thesis in progress

Dynamic scheduling for inference and training in deep neural networks

Author: Jean-François David
Supervisor: Olivier Beaumont
Type: Thesis project
Discipline(s): Computer science
Date: Enrolled in the doctoral programme on 29/10/2021
Institution(s): Bordeaux
Doctoral school(s): École doctorale de mathématiques et informatique (Talence, Gironde; 1991-....)
Research partner(s): Laboratory: Laboratoire bordelais de recherche en informatique
Research team: Supports et Algorithmes pour les applications numériques hautes performances (SATANAS)

Keywords


Abstract


The training and inference phases of deep neural networks place increasing demands on computing resources. Today, training is classically performed on GPU clusters and relies essentially on data parallelism [1], which requires costly collective communications that limit scalability [2]. Inference is usually performed on a single GPU or on the processor of an embedded device such as a phone. However, the use of deeper and more complex networks, together with the need for real-time results, may also call for more parallel resources and more sophisticated algorithms.

From a scheduling point of view, the inference problem reduces to a steady-state scheduling problem [3,4], with additional constraints related to the in-order execution of tasks. The training problem is also a steady-state scheduling problem, but the particular structure of the dependencies between the forward and backward phases induces complex memory management problems [5,6,7]. A first algorithmic contribution of the thesis is therefore to adapt steady-state scheduling algorithms to the specific context of training and inference in deep neural networks.

In the field of parallelism, mainly for linear algebra and numerical simulation applications, an approach based on dynamic (runtime) scheduling has been proposed [8,9]. Because it is difficult to reliably predict the execution times of tasks, point-to-point communications and collective communications, it is also difficult to rely on a static assignment of computational tasks to parallel resources and on a pre-computed schedule. In contrast, dynamic strategies make placement and scheduling decisions at runtime, using simpler (and cheaper) policies and algorithms but with precise knowledge of the current state of the resources and of the application. A second contribution of the thesis is therefore to adapt dynamic schedulers such as StarPU to the context of deep learning and to interface them with frameworks such as PyTorch and TensorFlow, building on the RoToR [10] and StarPU [11] software. Minimal code sketches of the four ingredients mentioned above (data parallelism, steady-state throughput, activation checkpointing, and dynamic list scheduling) are given below, after the requirements.

Work Plan

This thesis is funded by the European H2020 project TextaRossa. First, we will carry out a state-of-the-art study of both steady-state scheduling techniques and dynamic runtimes. In a second step, we will focus on the inference problem: identifying a suitable application framework, designing static and dynamic scheduling strategies, and building a prototype dynamic scheduler for heterogeneous resources. In a third step, we will consider the training phase and extend this prototype accordingly.

Requirements

The candidate should have a very good mathematical and algorithmic background, a good understanding of deep learning and parallelism, and programming skills in Python (and C and/or C++).
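The per-step gradient all-reduce behind data-parallel training [1,2] can be made concrete with a short sketch. The example below is a minimal, illustrative use of PyTorch's DistributedDataParallel; the linear model, the sizes, and the assumption that the script is launched with one process per GPU (e.g. via torchrun, which sets LOCAL_RANK) are placeholders, not part of the thesis proposal.

```python
# Minimal data-parallel training sketch. Assumption: one process per GPU,
# launched by a tool such as torchrun that sets the LOCAL_RANK environment
# variable and the process-group rendezvous information.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")               # collective-communication backend
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Linear(1024, 10).to(device)          # placeholder model
    model = DDP(model, device_ids=[local_rank])           # adds gradient all-reduce
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(10):                                    # placeholder training loop
        x = torch.randn(64, 1024, device=device)
        y = torch.randint(0, 10, (64,), device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()    # DDP overlaps the gradient all-reduce with the backward pass
        optimizer.step()   # every replica applies the same averaged gradients

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The volume of this collective grows with the model size and, on large clusters, its cost becomes the scalability limit discussed in [2].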
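The steady-state viewpoint of [3,4] can be illustrated in its simplest form: for a stream of independent, identical inference requests on heterogeneous workers, each worker contributes its own processing rate and the platform throughput is the sum of those rates. The per-request times below are invented for the example; the actual thesis problem adds task dependencies and in-order constraints.

```python
# Toy steady-state computation for a stream of independent, identical
# inference requests on heterogeneous workers: a worker with per-request
# time t sustains a rate 1 / t, and the platform throughput is their sum.
per_request_time = {"gpu0": 0.010, "gpu1": 0.012, "cpu": 0.080}   # seconds (invented)

rates = {w: 1.0 / t for w, t in per_request_time.items()}
throughput = sum(rates.values())
# Fraction of the incoming request stream to route to each worker so that
# all workers stay busy in steady state.
shares = {w: r / throughput for w, r in rates.items()}

print(f"steady-state throughput: {throughput:.1f} requests/s")
for worker, share in shares.items():
    print(f"  route {100 * share:.1f}% of the stream to {worker}")
```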
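The memory/recomputation trade-off studied in [5,6] and implemented in RoToR [10] can be illustrated with PyTorch's built-in sequential checkpointing. The chain of layers and the number of segments below are placeholders, and this sketch does not reproduce the optimal strategies of those papers.

```python
# Illustrative activation checkpointing (rematerialization) on a sequential chain:
# only the activations at segment boundaries are kept during the forward pass;
# the others are recomputed during the backward pass, trading time for memory.
import torch
from torch.utils.checkpoint import checkpoint_sequential

depth, width, batch = 32, 1024, 64            # placeholder sizes
chain = torch.nn.Sequential(
    *[torch.nn.Sequential(torch.nn.Linear(width, width), torch.nn.ReLU())
      for _ in range(depth)]
)

x = torch.randn(batch, width, requires_grad=True)
out = checkpoint_sequential(chain, 4, x)      # split the chain into 4 segments
out.sum().backward()                          # backward recomputes inner activations
```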
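Finally, the idea of deciding placement at runtime can be shown with a toy dynamic list scheduler, which greedily maps each ready task of a small DAG onto the heterogeneous resource that would finish it earliest. The task graph, durations and resource speeds are invented, communication costs are ignored, and this is in no way StarPU's actual API or policy.

```python
# Toy dynamic list scheduler: whenever a task becomes ready, place it on the
# resource that would finish it earliest, given current resource availability.
# Purely illustrative; real runtimes such as StarPU also model data transfers.
from collections import defaultdict

# Invented task graph: task -> (amount of work, predecessors)
tasks = {
    "conv1": (4.0, []),
    "conv2": (4.0, ["conv1"]),
    "fc":    (2.0, ["conv2"]),
    "loss":  (1.0, ["fc"]),
}
# Invented heterogeneous resources: name -> speed (work units per second)
speeds = {"gpu0": 4.0, "gpu1": 4.0, "cpu": 1.0}

free_at = {r: 0.0 for r in speeds}            # when each resource becomes free
finish = {}                                   # task -> completion time
indegree = {t: len(preds) for t, (_, preds) in tasks.items()}
successors = defaultdict(list)
for t, (_, preds) in tasks.items():
    for p in preds:
        successors[p].append(t)

ready = [t for t, d in indegree.items() if d == 0]
while ready:
    task = ready.pop(0)
    work, preds = tasks[task]
    data_ready = max((finish[p] for p in preds), default=0.0)
    # Dynamic decision: earliest-finish-time resource, chosen at runtime.
    resource = min(
        speeds,
        key=lambda r: max(free_at[r], data_ready) + work / speeds[r],
    )
    start = max(free_at[resource], data_ready)
    finish[task] = start + work / speeds[resource]
    free_at[resource] = finish[task]
    print(f"{task:5s} -> {resource}, finishes at t = {finish[task]:.2f}")
    for s in successors[task]:
        indegree[s] -= 1
        if indegree[s] == 0:
            ready.append(s)
```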
References

[1] Shallue, C. J., Lee, J., Antognini, J., Sohl-Dickstein, J., Frostig, R., & Dahl, G. E. (2018). Measuring the effects of data parallelism on neural network training. arXiv preprint arXiv:1811.03600.
[2] Awan, A. A., Hamidouche, K., Hashmi, J. M., & Panda, D. K. (2017). S-Caffe: Co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters. In Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 193-205).
[3] Banino, C., Beaumont, O., Carter, L., Ferrante, J., Legrand, A., & Robert, Y. (2004). Scheduling strategies for master-slave tasking on heterogeneous processor platforms. IEEE Transactions on Parallel and Distributed Systems, 15(4), 319-330.
[4] Benoit, A., Marchal, L., Pineau, J. F., Robert, Y., & Vivien, F. (2009). Scheduling concurrent bag-of-tasks applications on heterogeneous platforms. IEEE Transactions on Computers, 59(2), 202-217.
[5] Beaumont, O., Eyraud-Dubois, L., Hermann, J., Joly, A., & Shilova, A. (2019). Optimal checkpointing for heterogeneous chains: How to train deep neural networks with limited memory. arXiv preprint arXiv:1911.13214.
[6] Jain, P., Jain, A., Nrusimha, A., Gholami, A., Abbeel, P., Keutzer, K., ... & Gonzalez, J. E. (2019). Checkmate: Breaking the memory wall with optimal tensor rematerialization. arXiv preprint arXiv:1910.02653.
[7] Beaumont, O., Eyraud-Dubois, L., & Shilova, A. (2020). Optimal GPU-CPU offloading strategies for deep neural network training. 26th International Conference on Parallel and Distributed Computing, 151-166, Aug 2020, Warsaw, Poland.
[8] Augonnet, C., Thibault, S., Namyst, R., & Wacrenier, P. A. (2011). StarPU: A unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 23(2), 187-198.
[9] Thibault, S. (2018). On runtime systems for task-based programming on heterogeneous platforms. Habilitation dissertation, University of Bordeaux.
[10] RoToR, https://gitlab.inria.fr/hiepacs/rotor
[11] StarPU, https://starpu.gitlabpages.inria.fr