KAUST Research Workshop on Optimization and Big Data
Panos Kalnis is Professor and Chair of the Computer Science program at the King Abdullah Univ. of Science and Technology (KAUST). In 2009 he was a visiting assistant professor in the CS Dept., Stanford University. Before that, he was an assistant professor in the CS Dept., National University of Singapore (NUS). Earlier in his career he was involved in the design and testing of VLSI chips and worked in several companies on database design, e-commerce projects and web applications. He served as associate editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) from 2013 to 2015, and on the editorial board of the VLDB Journal from 2013 to 2017. Currently he is on the editorial board of the Data Science and Engineering Journal. He received his Diploma from the Computer Engineering and Informatics Dept., Univ. of Patras, Greece in 1998 and his PhD from the Computer Science Dept., Hong Kong Univ. of Science and Technology (HKUST) in 2002. His research interests include Big Data, Cloud Computing, Parallel and Distributed Systems, Large Graphs and Long Sequences.
Identifying frequent subgraphs is a core operation for many analytics and machine learning algorithms on graphs. Frequent subgraph mining is computationally expensive, which makes many graph analytics algorithms prohibitively slow on large graphs. Our group developed SmartPSI, a frequent subgraph mining algorithm based on the notion of pivoted subgraph isomorphism. Within SmartPSI we propose two radically different algorithms, called optimistic and pessimistic, each suited to different inputs. We also include a machine learning classifier that is trained on the fly to decide dynamically which algorithm to execute for each part of the input graph. Finally, we implement an optimizer that generates a low-cost execution plan for each algorithm. Our experimental evaluation with large real graphs shows that SmartPSI is up to 6 times faster than the state-of-the-art distributed frequent subgraph mining system.
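
For readers unfamiliar with the term, the following is a minimal Python sketch of a pivoted subgraph isomorphism test. It is an illustration only, not SmartPSI's implementation. It assumes that the pivoted problem asks whether a given node of the data graph can play the role of a designated pivot node in at least one embedding of the pattern. The function names `pivot_matches` and `pivot_support`, the adjacency-set representation, and the toy graph are all hypothetical, and the sketch uses plain backtracking with non-induced matching on a connected pattern.

```python
# Illustrative sketch only: naive backtracking, not the SmartPSI algorithms.
# Graphs are undirected and stored as {node: set_of_neighbours}.

def pivot_matches(data, data_labels, query, query_labels, pivot, candidate):
    """True if some embedding of `query` in `data` maps `pivot` to `candidate`."""
    if data_labels[candidate] != query_labels[pivot]:
        return False

    # Visit query nodes breadth-first from the pivot, so every later node
    # has at least one earlier neighbour (assumes a connected query).
    order = [pivot]
    i = 0
    while i < len(order):
        for v in query[order[i]]:
            if v not in order:
                order.append(v)
        i += 1

    mapping = {pivot: candidate}   # query node -> data node
    used = {candidate}             # data nodes already taken

    def extend(depth):
        if depth == len(order):
            return True            # every query node is mapped
        u = order[depth]
        mapped_nbrs = [w for w in query[u] if w in mapping]
        # Only neighbours of an already mapped node can host u.
        for x in data[mapping[mapped_nbrs[0]]]:
            if x in used or data_labels[x] != query_labels[u]:
                continue
            # Every query edge to a mapped node must exist in the data graph.
            if all(mapping[w] in data[x] for w in mapped_nbrs):
                mapping[u] = x
                used.add(x)
                if extend(depth + 1):
                    return True
                del mapping[u]     # backtrack
                used.remove(x)
        return False

    return extend(1)


def pivot_support(data, data_labels, query, query_labels, pivot):
    """Number of data nodes that can take the pivot's place."""
    return sum(
        pivot_matches(data, data_labels, query, query_labels, pivot, c)
        for c in data
    )


# Toy data graph: triangle a-b-c plus node d attached to b.
data = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
data_labels = {"a": "A", "b": "B", "c": "C", "d": "A"}

# Two patterns over labels A, B, C, both pivoted on the A-labelled node q0.
query_labels = {"q0": "A", "q1": "B", "q2": "C"}
path = {"q0": {"q1"}, "q1": {"q0", "q2"}, "q2": {"q1"}}
triangle = {"q0": {"q1", "q2"}, "q1": {"q0", "q2"}, "q2": {"q0", "q1"}}

path_support = pivot_support(data, data_labels, path, query_labels, "q0")
triangle_support = pivot_support(data, data_labels, triangle, query_labels, "q0")
```

In this toy graph both A-labelled nodes can start the path pattern, but only `a` closes the triangle. The point of the pivoted formulation is that each candidate needs a single yes/no answer, so the search can stop at the first embedding instead of enumerating all of them.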