ATENA-PRO: Generating Personalized Exploration Notebooks with Constrained Reinforcement Learning

Tavor Lipman, Tova Milo, Amit Somech

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

One of the most common, helpful practices of data scientists, when starting the exploration of a given dataset, is to examine existing data exploration notebooks prepared by other data analysts or scientists. These notebooks contain curated sessions of contextually related query operations that together demonstrate interesting hypotheses and conjectures on the data. Unfortunately, relevant notebooks that were prepared on the same dataset, and in light of the same analysis task, are often nonexistent or unavailable. In this work, we describe ATENA-PRO, a framework for auto-generating such relevant, personalized exploratory sessions. Using a novel specification language, users first describe their desired output notebook. Our language contains dedicated constructs for contextually connecting future output queries. These specifications are then used as input for a Deep Reinforcement Learning (DRL) engine, which auto-generates the personalized notebook. Our DRL engine relies on an existing, general-purpose DRL framework for data exploration. However, augmenting the generic framework with user specifications requires overcoming a difficult sparsity challenge, as only a small portion of the possible sessions may be compliant with the specifications. Inspired by solutions for constrained reinforcement learning, we devise a compound, flexible reward scheme as well as a specification-aware neural network architecture. Our experimental evaluation shows that the combination of these components allows ATENA-PRO to consistently generate interesting, personalized exploration sessions for various analysis tasks and datasets.
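
The sparsity challenge mentioned in the abstract can be illustrated with a small, hypothetical sketch. It is not the ATENA-PRO reward scheme or specification language, which the abstract does not detail. It assumes a session is a list of query strings, a specification is a list of boolean predicates over the session, and the generic exploration reward is a given number; the names `compliance_fraction`, `sparse_reward`, and `compound_reward` are invented for illustration. An all-or-nothing reward gives a learning agent no signal for partially compliant sessions, whereas a graded compliance term gives partial credit.

```python
from typing import Callable, List, Sequence

# Assumption: a session is a sequence of query strings, and each
# specification constraint is a predicate over the whole session.
Session = Sequence[str]
Constraint = Callable[[Session], bool]


def compliance_fraction(session: Session, constraints: List[Constraint]) -> float:
    """Fraction of constraints the session satisfies (1.0 if there are none)."""
    if not constraints:
        return 1.0
    satisfied = sum(1 for c in constraints if c(session))
    return satisfied / len(constraints)


def sparse_reward(session: Session, constraints: List[Constraint],
                  generic_reward: float) -> float:
    """All-or-nothing: the generic reward counts only if every constraint holds."""
    fully_compliant = all(c(session) for c in constraints)
    return generic_reward if fully_compliant else 0.0


def compound_reward(session: Session, constraints: List[Constraint],
                    generic_reward: float, weight: float = 1.0) -> float:
    """Generic reward plus a graded bonus for partial compliance (hypothetical)."""
    return generic_reward + weight * compliance_fraction(session, constraints)


# Hypothetical specification: the session must contain a filter and a group-by.
constraints = [
    lambda s: any(q.startswith("FILTER") for q in s),
    lambda s: any(q.startswith("GROUP") for q in s),
]
session = ["FILTER country == 'US'", "SORT delay"]

half = compliance_fraction(session, constraints)
sparse = sparse_reward(session, constraints, generic_reward=0.8)
compound = compound_reward(session, constraints, generic_reward=0.8, weight=2.0)
```

In this toy setting the session meets one of the two constraints, so the all-or-nothing variant returns nothing while the graded variant still rewards the partial progress. The weighting between the two terms is an arbitrary choice here.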

Original language: English
Title of host publication: SIGMOD 2023 - Companion of the 2023 ACM/SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 167-170
Number of pages: 4
ISBN (Electronic): 9781450395076
State: Published - 4 Jun 2023
Event: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023 - Seattle, United States
Duration: 18 Jun 2023 - 23 Jun 2023

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2023 ACM/SIGMOD International Conference on Management of Data, SIGMOD 2023
Country/Territory: United States
City: Seattle
Period: 18/06/23 - 23/06/23

Bibliographical note

Publisher Copyright:
© 2023 Owner/Author.

Funding

We thank Oz Zafar for his contribution to the implementation and presentation of our system. This work has been partially funded by the Israel Science Foundation grant number 2707/22 and the Binational US-Israel Science Foundation grant number 2018194.

Funders                      Funder number
Israel Science Foundation    2707/22, 2018194

    Keywords

    • AI for data analytics
    • automated data exploration
