Towards efficient cloud infrastructures for the FaaS paradigm: tools, software architectures and optimization heuristics

by Christopher Ferreira

Thesis project in Computer Science

Supervised by Vivien Quema and Renaud Lachaize.

Thesis in preparation at Université Grenoble Alpes, within the doctoral school "École doctorale mathématiques, sciences et technologies de l'information, informatique", in partnership with the Laboratoire d'Informatique de Grenoble (laboratory) and the Efficient and RObust Distributed Systems team (research group), since 01-10-2018.

  • Abstract

    (see the English version for more details) The goal of this research project is to enable Cloud providers and operators to achieve better performance, greater energy efficiency and better resource consolidation by improving the efficiency of the low-level software layers (operating systems and middleware) on each machine of a data center. The proposed work is centered on two main aspects. Profiling tools: existing tooling for detecting and understanding performance problems is inadequate for the characteristics of modern Cloud computing environments. The work will consist in filling gaps in the state of the art by proposing new profiling approaches, notably by bridging existing but disjoint techniques (causal profiling, idleness diagnosis) and by covering new needs (support for heterogeneous execution models, detection of sporadic inefficiencies). Performance optimizations for Serverless/FaaS (Functions as a Service) infrastructures: these emerging Cloud computing infrastructures exhibit atypical characteristics that introduce specific challenges and opportunities for performance optimization. This second part of the proposed work will consist in developing dedicated optimization techniques, guided by the insights obtained via the tools developed in the first part of the work.

  • Translated title

    Towards efficient infrastructures for the 'Functions-as-a-Service' paradigm: profiling tools, architectural design and optimization heuristics

  • Abstract

    Scientific context: Cloud computing is gaining more and more traction, in particular due to (i) the cost-effectiveness of its resource pricing models and (ii) the powerful programming abstractions and resource management facilities that simplify the tasks of developers and operators of hosted applications. However, to keep up with this momentum, (public and private) Cloud providers are facing complex challenges for the design and optimization of their software infrastructures. On the one hand, in order to achieve high performance per dollar, they must constantly ensure high utilization and energy efficiency of the hardware resources, through multi-tenancy and resource over-commitment (also known as "server consolidation"). On the other hand, they must provide high performance for the hosted applications, which have increasingly demanding requirements in terms of service-level agreements (low latency, high throughput) as well as complex, diverse and time-varying load patterns. In this general context, this project is more specifically focused on "serverless computing" (also known as "functions as a service" or FaaS), an emerging service model for the design and execution of hosted applications. This model is becoming popular and is now included in the service portfolio of many Cloud providers (e.g., AWS Lambda and Google Cloud Functions), and is also supported by open source projects like Apache OpenWhisk. In this model, application developers provide and register handler functions, whose execution is automatically triggered by specific events. The execution of a given function is performed under a capped amount of resources (CPU time and memory footprint) and is billed according to the actual execution time (as multiples of a fixed time slice – e.g., 100 ms).
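The execution and billing model described above can be sketched as follows. This is an illustrative sketch, not any provider's actual API: the `handler` signature, the 100 ms slice and all names are assumptions chosen to mirror the description in the text.

```python
import math

SLICE_MS = 100  # assumed billing granularity (the text gives 100 ms as an example)

def billed_ms(actual_ms: float) -> int:
    """Round actual execution time up to the next whole billing slice."""
    return math.ceil(actual_ms / SLICE_MS) * SLICE_MS

def handler(event: dict) -> dict:
    """A minimal event-triggered function, as registered by a developer."""
    return {"greeting": f"hello {event.get('name', 'world')}"}
```

Under this model, an invocation that runs for 230 ms is billed as 300 ms, which is why very short, frequent invocations make startup/teardown overheads (discussed later) so significant.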
This model has already proven to be extremely useful, convenient and cost-effective for a large number of application domains, including Web applications, IoT (Internet of Things), data analytics, multimedia processing and storage. From the point of view of Cloud providers and operators, this new paradigm brings both new challenges and new opportunities, especially regarding performance tuning. Project description: The goal of this research project is to help Cloud providers and operators achieve better performance, improved energy efficiency and more aggressive server consolidation by improving the efficiency of the software stack (operating system and middleware) that manages each individual (physical) machine in a data center. In such an environment, the typical low-level stack consists of the Linux operating system and additional middleware layers (for example, a container management runtime like Docker Engine, and runtime environments for managed languages like Java, JavaScript and Python).
The proposed work is structured around the following aspects and challenges: 1) Profiling tools. Existing tooling (both production-grade tools and research prototypes) aimed at pinpointing performance problems in the software stack and understanding the underlying root causes is inadequate for today's Cloud environments, due to the combination of several factors: the complexity of modern hardware platforms (massive parallelism, deep hierarchies of memory layers with non-uniform access times, a large set of power management states, ultra-low-latency I/O devices), the evolution of the software stack (a large number of interconnected services, based on heterogeneous languages and programming/execution models, and hosted in diverse/hierarchical execution environments such as language runtimes, operating system containers and system virtual machines), and stringent application-level requirements (e.g., strong sensitivity to tail latencies, tasks with diverse and potentially very fine granularities). In particular, this work will aim at revisiting, enhancing and bridging existing techniques: (i) causal profiling, which allows the detection of complex/hidden sources of inefficiencies but provides little context to developers and only works for steady-state systems and on-CPU bottlenecks; (ii) idleness diagnosis, which allows understanding the root causes of under-used resources but lacks generality. Besides, this work will also require the development of new techniques providing support for (i) profiling applications with heterogeneous execution models (e.g., mixing thread-based and event-driven models) and (ii) detecting and understanding transient and sporadic inefficiencies (e.g., to address the root causes of high tail latencies). 2) Performance optimization of serverless infrastructures. Serverless infrastructures have a set of unique features that bring new opportunities for performance optimizations.
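To make the causal-profiling idea mentioned above concrete, here is a toy "what-if" analysis for a serial request path: given per-stage timings, it predicts the end-to-end effect of speeding up one stage by a given fraction. Real causal profilers (e.g., Coz) answer this question for concurrent code by inserting virtual delays at runtime; this sketch only illustrates the underlying question they answer, and all names are illustrative.

```python
import time

def profile_stages(stages):
    """Time each stage of a serial request path; stages is [(name, fn), ...]."""
    timings = {}
    for name, fn in stages:
        t0 = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - t0
    return timings

def predicted_total(timings, region, fraction):
    """Predicted end-to-end time if `region` were `fraction` faster.

    E.g., fraction=0.5 models a hypothetical 50% speedup of that region,
    without having to actually implement the optimization.
    """
    total = sum(timings.values())
    return total - fraction * timings[region]
```

Comparing `predicted_total` across regions ranks which optimization would pay off most; the limitations noted in the text (steady-state only, on-CPU only, little developer context) are precisely what the proposed work aims to address.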
For example:

* Very short task lifespans, yet with potentially very diverse timescales (from microseconds to seconds). The former aspect makes the case for streamlined code paths in the system stack, in order to efficiently deal with the extremely frequent execution of task startup and termination routines. The latter aspect introduces the need to identify tasks that have fundamentally different requirements, and to use distinct resource allocation/accounting strategies accordingly (for example, different CPU time-slicing and memory allocation algorithms).

* A run-to-completion model, often with single-threaded execution and relatively predictable I/O patterns (for example, a single I/O request at the beginning and at the end of the function). This aspect makes the case for limiting the impact of preemptions (triggered by timer and I/O interrupts), at least on a subset of the CPUs.

* New opportunities for scheduling heuristics. Compared to traditional, long-running tasks on a server machine, the invocations of lambda functions are very short and have a limited number of inputs (often of relatively limited size). As a consequence, for frequently executed lambda functions, it becomes more likely that the OS can predict the execution time as well as the memory and I/O requirements based only on a small set of metadata (function name, user name, input arguments). This kind of insight can be used to drive the resource management policies of the OS (CPU scheduling, memory allocation, cache prefetching/partitioning, ...).

This part of the project will consist in investigating the impact and trade-offs of specific optimization strategies like the ones listed above. This body of work will be driven by the analyses obtained thanks to the tools and profiling methodologies developed in the above-described context ('profiling tools'), in order to identify/confirm the most significant performance and energy-efficiency bottlenecks.
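The metadata-driven prediction idea above can be sketched as a small history-based predictor: past invocations keyed by (function, user) yield an expected runtime and memory footprint once enough samples accumulate. The class name, the key choice and the sample threshold are illustrative assumptions, not part of any existing system.

```python
from collections import defaultdict
from statistics import mean

class InvocationPredictor:
    """Predict per-invocation resource needs from a small set of metadata."""

    def __init__(self, min_samples=3):
        # (function name, user name) -> list of (exec_ms, mem_mb) observations
        self.history = defaultdict(list)
        self.min_samples = min_samples

    def record(self, function, user, exec_ms, mem_mb):
        """Record the observed cost of one completed invocation."""
        self.history[(function, user)].append((exec_ms, mem_mb))

    def predict(self, function, user):
        """Return (expected_ms, expected_mb), or None if too few samples."""
        samples = self.history[(function, user)]
        if len(samples) < self.min_samples:
            return None  # caller falls back to a default resource allocation
        return (mean(s[0] for s in samples), mean(s[1] for s in samples))
```

A scheduler could, for instance, route predicted-microsecond-scale invocations to CPUs running in run-to-completion mode, and predicted-long invocations to preemptible cores, which is one way the time-slicing and preemption points above could be exploited.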
The optimizations will be introduced in two steps. First, the least intrusive modifications will be introduced through existing hooks in the different software layers (e.g., scheduling policies of the OS kernel and the language runtimes, memory allocation and garbage collection algorithms, tuning of the networking and I/O stacks ...), with a particular concern for automatic or semi-automatic parameter tuning. Second, the more demanding optimizations will be implemented through a redesign of the key software components and interfaces (for example, a new container management subsystem with more efficient communication channels and lifecycle management primitives).
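The first, least intrusive step above relies on (semi-)automatic parameter tuning through existing hooks. A minimal sketch of such a tuner is a sweep over candidate values of one knob (say, a scheduler time-slice length), keeping the value that minimizes a measured objective; the `measure` callback is an assumption standing in for a real benchmark run against the instrumented stack.

```python
def tune_knob(candidates, measure):
    """Return the (value, latency) pair minimizing the measured objective.

    candidates: iterable of knob settings to try.
    measure: callable mapping a setting to an observed cost
             (e.g., p99 latency of a benchmark run under that setting).
    """
    best_value, best_latency = None, float("inf")
    for value in candidates:
        latency = measure(value)
        if latency < best_latency:
            best_value, best_latency = value, latency
    return best_value, best_latency
```

More demanding optimizations (the second step) change the components themselves rather than their parameters, so they fall outside what a tuner of this kind can reach.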