Interactive Data Analysis (IDA) is a core knowledge-discovery process in which data scientists explore datasets by issuing a sequence of data analysis actions (e.g., filter, aggregation, visualization), referred to as a session. Since IDA is a challenging task, previous work has devised specialized recommendation systems that assist users in choosing the next analysis action to perform at each point in the session. Such systems often record previous IDA sessions and use them to generate next-action recommendations. To do so, they employ a compound, dedicated session-similarity measure to find the top-k sessions most similar to the current user's session. The efficiency of this top-k similarity search is therefore critical for retaining interactive response times. However, optimizing the search is challenging because the session-similarity measure is non-metric. To address this problem, we exploit a key property of IDA: the user session progresses incrementally, and the recommender system performs the top-k similarity search at each step. We devise efficient top-k algorithms that harness this incremental nature to speed up the similarity search, employing a novel, effective filter-and-refine method. Our experiments demonstrate the efficiency of our solution, obtaining a running-time speedup of over 180X compared to a sequential similarity search.
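The general filter-and-refine idea behind top-k similarity search can be illustrated with a small generic sketch. It is not the authors' algorithm. It ignores the incremental reuse of work across session steps, and the names `filter_and_refine_topk`, `jaccard`, and `size_bound` are hypothetical. Jaccard similarity over sets of action types is only a toy stand-in for the paper's compound session-similarity measure. The sketch relies on one assumption: the cheap filter score is an upper bound that never underestimates the exact similarity. Under that assumption, early termination is safe without the triangle inequality, so the pruning does not require a metric.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Tuple


def filter_and_refine_topk(
    candidates: Dict[Hashable, object],
    upper_bound: Callable[[object], float],
    exact_sim: Callable[[object], float],
    k: int,
) -> List[Tuple[float, Hashable]]:
    """Return the k candidates with the highest exact similarity, best first.

    Assumes upper_bound(c) >= exact_sim(c) for every candidate c.
    """
    if k <= 0:
        return []
    # Filter: score every candidate with the cheap bound, most promising first.
    bounds = {cid: upper_bound(c) for cid, c in candidates.items()}
    order = sorted(bounds, key=bounds.get, reverse=True)
    heap: List[Tuple[float, int, Hashable]] = []  # min-heap of the current top-k
    for i, cid in enumerate(order):
        # Prune: bounds are non-increasing from here on, so no remaining
        # candidate can beat the current k-th best exact score.
        if len(heap) == k and heap[0][0] >= bounds[cid]:
            break
        # Refine: pay for the expensive exact similarity only when needed.
        score = exact_sim(candidates[cid])
        if len(heap) < k:
            heapq.heappush(heap, (score, i, cid))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, i, cid))
    return [(score, cid) for score, _, cid in sorted(heap, reverse=True)]


# Toy stand-in (assumption) for a session-similarity measure: Jaccard
# similarity over sets of action types, with a size-ratio bound as the filter.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def size_bound(a: set, b: set) -> float:
    # |A & B| <= min(|A|, |B|) and |A | B| >= max(|A|, |B|).
    return min(len(a), len(b)) / max(len(a), len(b))


current = {"filter", "group", "plot"}
past = {
    "s1": {"filter", "group", "plot"},
    "s2": {"filter", "sort"},
    "s3": {"filter", "group", "plot", "sort", "join", "project"},
    "s4": {"group", "plot"},
}
top2 = filter_and_refine_topk(
    past,
    upper_bound=lambda s: size_bound(current, s),
    exact_sim=lambda s: jaccard(current, s),
    k=2,
)
print(top2)  # s1 and s4; s3 is pruned without an exact computation
```

In this toy run, only three of the four stored sessions need an exact comparison. Once s1 and s4 fill the top-2, the bound for s3 (0.5) cannot exceed the second-best exact score (about 0.67), so the loop stops.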
|Title of host publication||Advances in Database Technology - EDBT 2020|
|Subtitle of host publication||23rd International Conference on Extending Database Technology, Proceedings|
|Editors||Angela Bonifati, Yongluan Zhou, Marcos Antonio Vaz Salles, Alexander Böhm, Dan Olteanu, George Fletcher, Arijit Khan, Bin Yang|
|Number of pages||12|
|State||Published - 2020|
|Event||23rd International Conference on Extending Database Technology, EDBT 2020 - Copenhagen, Denmark|
|Duration||30 Mar 2020 → 2 Apr 2020|
|Name||Advances in Database Technology - EDBT|
Bibliographical note
Funding Information:
Acknowledgments. This work has been partially funded by the Israel Innovation Authority (MDM), the Israel Science Foundation, the Binational US-Israel Science Foundation, Len Blavatnik and the Blavatnik Family Foundation, and Intel® AI DevCloud.
© 2020 Copyright held by the owner/author(s).