PhD thesis project in Computer Science, Data, and AI
Supervised by Mauro Sozio.
Thesis in preparation at Institut polytechnique de Paris, within the École doctorale de l'Institut polytechnique de Paris, in partnership with LTCI - Laboratoire de Traitement et Communication de l'Information (laboratory) and RMS : Réseaux, Mobilité et Services (research team), since 01-06-2018.
See the English version. This PhD thesis will be carried out in an international context; the candidate must therefore be fluent in English.
Detailed presentation in English
Artificial Intelligence and Machine Learning for Network Inference and Control
Networking has recently undergone a major transformation enabled by new models resulting from virtualization and cloud computing. This has led to a number of novel architectures supported by emerging technologies such as Software-Defined Networking (SDN), Network Function Virtualization (NFV) and, more recently, edge cloud and fog computing. These enhanced design opportunities, together with the increased complexity of networks and networked applications, have fueled the need for improved network automation in agile infrastructures. Similarly, the push toward Model-Driven Telemetry (MDT) makes available a deluge of very fine-grained per-protocol data that AI/ML techniques can exploit. However, it is unclear what the benefit is, how much data should be streamed, or whether it is possible to export the parameters of the AI/ML models instead. This new networking environment calls for even more automation, as exemplified by recent initiatives to set up network automation platforms.
Although Machine Learning techniques have long been used intensively in networking [1,2], the use of ML in networking applications is far from being systematically exploited. For instance, while a plethora of techniques for anomaly detection exist, the majority of studies still do not leverage them. Moreover, ML techniques have generally been used to understand, as opposed to control, the network. Additionally, whereas machine learning and big data analysis in 'batches' is relatively well exploited [5,6,7,8,9,10] (including, for instance, classical flow-level statistics with MapReduce [6, 7], IP, TCP, HTTP, and NetFlow analysis performed in a scalable manner, or work focusing on specific aspects ranging from smartphone traffic characterization to anomaly and botnet detection), fewer works exploit 'stream processing' systems that operate over data in real time: these include both general-purpose systems (e.g. Storm, Spark, Flink) and systems specialized for the networking domain (e.g. Blockmon, DBStream).
Addressing the above evolution, the aim of this thesis is to bring stream-based models into networking applications. As a first step, data stream models will be developed locally at each node. Then, we will study the tradeoff in the amount of data to be exchanged among nodes (e.g., network telemetry vs. model parameters only) in order to centralize learning from the distributed individual learners. Finally, we will study how to refine the local distributed models using the centralized learner, and how to use these models not only to understand, but also to control, the way in which the network operates.
The candidate will carry out original research work pertaining to the design and performance evaluation of the above schemes. We advocate the use of complementary experimental methodologies including (i) prototyping, to engineer the tools necessary to perform network measurements from which to extract useful "features"; and (ii) machine learning techniques, with special attention to stream learning, to automatically build data-driven models out of the measured features. As for the envisioned tools, we list here some of the relevant preselected ones: C/Python for prototyping, SciPy/NumPy for numerical analysis, and Julia/Spark/SAMOA/pandas for stream learning.
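To make the three steps outlined above more concrete (local stream models at each node, exchange of model parameters instead of raw telemetry, refinement of the local models from a central learner), the following is a minimal illustrative sketch and not the method of the thesis. It assumes a simple online linear model trained by stochastic gradient descent on synthetic data, and a central step that averages node parameters weighted by the number of samples seen, in the spirit of federated averaging. All names (`StreamLearner`, `aggregate`, `simulate`) and all numeric settings are hypothetical.

```python
import random


class StreamLearner:
    """Hypothetical per-node model: online linear regression trained by SGD."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.seen = 0  # number of samples processed so far

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, x, y):
        """Consume one (features, target) sample, then discard it."""
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
        self.seen += 1

    def set_params(self, w, b):
        self.w, self.b = list(w), b


def aggregate(learners):
    """Central step (assumption): average parameters weighted by samples seen."""
    total = sum(l.seen for l in learners)
    n = len(learners[0].w)
    w = [sum(l.w[i] * l.seen for l in learners) / total for i in range(n)]
    b = sum(l.b * l.seen for l in learners) / total
    return w, b


def simulate(n_nodes=3, samples_per_node=2000, seed=0):
    """Run the three steps on synthetic data; return the global model and
    the number of scalar values each exchange strategy would transmit."""
    rng = random.Random(seed)
    true_w, true_b = [2.0, -1.0], 0.5  # synthetic ground truth
    nodes = [StreamLearner(n_features=2) for _ in range(n_nodes)]

    # Step 1: each node learns locally from its own stream.
    for node in nodes:
        for _ in range(samples_per_node):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            y = sum(a * c for a, c in zip(true_w, x)) + true_b + rng.gauss(0, 0.01)
            node.update(x, y)

    # Step 2: nodes export parameters only; the central learner combines them.
    w, b = aggregate(nodes)

    # Step 3: the central model is pushed back to refine the local models.
    for node in nodes:
        node.set_params(w, b)

    raw_values = n_nodes * samples_per_node * 3  # 2 features + 1 target per sample
    param_values = n_nodes * 3                   # 2 weights + 1 bias per node
    return w, b, raw_values, param_values


if __name__ == "__main__":
    w, b, raw_values, param_values = simulate()
    print("global weights:", w, "bias:", b)
    print("values sent (raw telemetry):", raw_values)
    print("values sent (parameters only):", param_values)
```

In this toy setting, exporting parameters transmits a few values per node instead of every measured sample. Whether such a reduction preserves accuracy on real, non-identically distributed network telemetry is precisely the tradeoff the thesis sets out to study.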