Abstract
One of the most common, helpful practices of data scientists, when starting the exploration of a given dataset, is to examine existing data exploration notebooks prepared by other data analysts or scientists. These notebooks contain curated sessions of contextually-related query operations that together demonstrate interesting hypotheses and conjectures on the data. Unfortunately, relevant such notebooks, that had been prepared on the same dataset, and in light of the same analysis task, are often nonexistent or unavailable. In this work, we describe ATENA-PRO, a framework for auto-generating such relevant, personalized exploratory sessions. Using a novel specification language, users first describe their desired output notebook. Our language contains dedicated constructs for contextually connecting future output queries. These specifications are then used as input for a Deep Reinforcement Learning (DRL) engine, which auto-generates the personalized notebook. Our DRL engine relies on an existing, general-purpose, DRL framework for data exploration. However, augmenting the generic framework with user specifications requires overcoming a difficult sparsity challenge, as only a small portion of the possible sessions may be compliant with the specifications. Inspired by solutions for constrained reinforcement learning, we devise a compound, flexible reward scheme as well as a specification-aware neural network architecture. Our experimental evaluation shows that the combination of these components allows ATENA-PRO to consistently generate interesting, personalized exploration sessions for various analysis tasks and datasets.
Original language | English |
---|---|
Title of host publication | SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data |
Publisher | Association for Computing Machinery |
Pages | 167-170 |
Number of pages | 4 |
ISBN (Electronic) | 9781450395076 |
DOIs | |
State | Published - Jun 2023 |
Event | 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States. Duration: 18 Jun 2023 → 23 Jun 2023 |
Publication series
Name | Proceedings of the ACM SIGMOD International Conference on Management of Data |
---|---|
ISSN (Print) | 0730-8078 |
Conference
Conference | 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 |
---|---|
Country/Territory | United States |
City | Seattle |
Period | 18/06/23 → 23/06/23 |
Bibliographical note
Publisher Copyright: © 2023 Owner/Author.
Funding
We thank Oz Zafar for his contribution to the implementation and presentation of our system. This work has been partially funded by the Israel Science Foundation grant number 2707/22 and the Binational US-Israel Science Foundation grant number 2018194.
Funders | Funder number |
---|---|
Israel Science Foundation | 2707/22 |
Binational US-Israel Science Foundation | 2018194 |
Keywords
- AI for data analytics
- automated data exploration