KAUST Research Workshop on Optimization and Big Data
Marco Canini is an assistant professor in Computer Science at KAUST. Marco obtained his Ph.D. in computer science and engineering from the University of Genoa in 2009 after spending the last year as a visiting student at the University of Cambridge, Computer Laboratory. He was a postdoctoral researcher at EPFL from 2009 to 2012 and after that a senior research scientist for one year at Deutsche Telekom Innovation Labs & TU Berlin. Before joining KAUST, he was an assistant professor at the Université catholique de Louvain. He also held positions at Intel Research and Google.
Programmable networking hardware creates new opportunities for infusing intelligence into the network. This raises a fundamental question: what kinds of computation should be delegated to the network? To answer it, we turn our attention to modern machine learning workloads. Efficiently training complex machine learning models at scale requires high performance at the infrastructure level. With large models, communication among multiple workers becomes a scalability concern due to limited bandwidth.
We propose to address this problem by redesigning communication in distributed machine learning to take advantage of programmable network data planes. Our key insight is to reduce the volume of exchanged data by performing in-network computation to aggregate the model’s parameter updates as they are being transferred. However, in-network computation tasks must be judiciously crafted to match the architectural limitations of programmable network devices. Our experiments on machine learning workloads show that aggregation functions create opportunities to exploit the limited computation power of networking hardware to lessen network congestion and improve overall application performance. As a proof of concept, we propose DAIET, a system that performs in-network data aggregation. Experimental results with an initial prototype show a large data reduction ratio (86.9%-89.3%) and a similar decrease in the workers' computation time.
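To make the aggregation idea concrete, the following is a minimal host-side sketch in Python, not the DAIET implementation. It assumes each worker sends its update as a list of (parameter index, value) pairs and that the aggregation function is a per-parameter sum; the function names and the toy data are hypothetical and chosen only for illustration. The ratio it computes counts pairs entering and leaving the aggregation point and does not reproduce the figures reported above.

```python
from collections import defaultdict


def aggregate_in_network(worker_updates):
    """Sum per-parameter updates from all workers (hypothetical helper).

    A programmable switch would do this incrementally as packets pass
    through; here the same sum is computed over in-memory lists.
    """
    totals = defaultdict(float)
    for update in worker_updates:
        for index, value in update:
            totals[index] += value
    return dict(totals)


def reduction_ratio(worker_updates, aggregated):
    """Fraction of (index, value) pairs removed by aggregation."""
    pairs_in = sum(len(update) for update in worker_updates)
    return 1.0 - len(aggregated) / pairs_in


# Toy data (made up): four workers, each updating two parameters.
updates = [
    [(0, 0.5), (3, -1.0)],
    [(0, 0.25), (3, 2.0)],
    [(0, 0.25), (7, 1.0)],
    [(3, -0.5), (7, 1.0)],
]

aggregated = aggregate_in_network(updates)
ratio = reduction_ratio(updates, aggregated)
print(aggregated)
print(f"data reduction ratio: {ratio:.1%}")
```

In this sketch the reduction grows with the overlap between the workers' updates: the more workers touch the same parameters, the fewer pairs leave the aggregation point relative to those that entered it.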