KAUST Research Workshop on Optimization and Big Data
Srikanth Kandula is a principal researcher at Microsoft Research. His research interests span many aspects of networked systems, including approximate analytics, datacenters, network management, diagnosis, applied statistical inference and security. He has published over 20 papers in top-tier venues such as SIGCOMM, NSDI, OSDI, SIGMOD and MobiSys. He received the MSR Goldstar award (2011) and an NSDI best student paper award (2005). He obtained his PhD from the Massachusetts Institute of Technology (2008).
Despite decades of research, approximations are not widely used in data analytics platforms. To see why, note that an ideal approximate analytics system has to meet at least four goals: cover a large class of queries, offer much better latency and/or throughput, have small overhead, and offer accuracy guarantees. Whether such a system exists remains an open question. In this talk, I will describe alternate approaches that (a) introduce samplers as native SQL operators, including samplers that can sample before joins and group-bys, and (b) extend a cost-based query optimizer to improve the performance of plans with samplers without changing their accuracy. These techniques are used within Microsoft and are publicly available in the Azure Data Lake Analytics platform.
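The abstract does not say which samplers are used, so the following is only a hedged illustration of why sampling before a join needs care. It is a minimal Python sketch of one well-known technique, hash-based ("universe") sampling on the join key; the function names are hypothetical and this is not presented as the system described in the talk. Sampling each join input independently at rate p keeps a joined pair only with probability p squared. If instead both inputs keep a row exactly when a shared hash of its join key falls below p, every key is kept on both sides or dropped on both sides, so (treating the hash as a random function) each joined pair survives with probability p and the sampled join count divided by p is an unbiased estimate of the true count.

```python
import hashlib
from collections import Counter


def key_in_universe(key, p):
    """Hypothetical helper: deterministically map a join key to [0, 1) and keep it if below p.

    Both join inputs use the same hash, so a given key is kept on both
    sides or dropped on both sides.
    """
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    # Use the top 53 bits so the division is exact and the result stays < 1.0.
    u = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
    return u < p


def universe_sample(rows, p):
    """Keep every (key, value) row whose key falls in the sampled key universe."""
    return [row for row in rows if key_in_universe(row[0], p)]


def join_count(left, right):
    """Exact COUNT(*) of an equi-join of (key, value) rows on the key column."""
    right_counts = Counter(key for key, _ in right)
    return sum(right_counts[key] for key, _ in left)


def estimated_join_count(left, right, p):
    """Estimate the join's COUNT(*) from universe samples of both inputs.

    Each key, and hence every joined pair with that key, survives with
    probability p, so the sampled count is scaled up by 1/p.
    """
    return join_count(universe_sample(left, p), universe_sample(right, p)) / p


# Usage: 50 distinct keys, 20 rows per key on the left and 4 on the right.
left = [(k % 50, k) for k in range(1000)]
right = [(k % 50, -k) for k in range(200)]
print(join_count(left, right))                 # 4000
print(estimated_join_count(left, right, 0.2))  # an approximation of 4000
```

The design choice to hash the key rather than flip an independent coin per row is what lets the sampler sit below the join: the join sees fewer keys, but for each surviving key it sees all matching rows from both sides.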