
Automating Exploratory Data Analysis via Machine Learning: An Overview

  • Tel Aviv University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

86 Scopus citations

Abstract

Exploratory Data Analysis (EDA) is an important initial step in any knowledge discovery process, in which data scientists interactively explore unfamiliar datasets by issuing a sequence of analysis operations (e.g., filter, aggregation, and visualization). Since EDA has long been known as a difficult task, requiring profound analytical skills, experience, and domain knowledge, a plethora of systems have been devised over the last decade to facilitate it. In particular, advancements in machine learning research have created exciting opportunities, not only for better facilitating EDA, but also for fully automating the process. In this tutorial, we review recent lines of work for automating EDA, starting from recommender systems for suggesting a single exploratory action, going through kNN-based classifiers and active-learning methods for predicting users' interestingness preferences, and finally reaching full automation of EDA using state-of-the-art methods such as deep reinforcement learning and sequence-to-sequence models. We conclude the tutorial with a discussion of the main challenges and open questions that must be addressed in order to ultimately reduce the manual effort required for EDA.

Original language: English
Title of host publication: SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Publisher: Association for Computing Machinery
Pages: 2617-2622
Number of pages: 6
ISBN (Electronic): 9781450367356
DOIs
State: Published - 14 Jun 2020
Externally published: Yes
Event: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020 - Portland, United States
Duration: 14 Jun 2020 → 19 Jun 2020

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Country/Territory: United States
City: Portland
Period: 14/06/20 → 19/06/20

Bibliographical note

Publisher Copyright:
© 2020 Association for Computing Machinery.

Funding

Tova Milo is a full professor in the School of Computer Science at Tel Aviv University, and holds the Chair of Information Management. Her research focuses on large-scale data management applications such as data integration, semi-structured information, data-centered business processes, and crowdsourcing, studying both theoretical and practical aspects. Tova served as the Program Chair of multiple international conferences, including PODS, VLDB, ICDT, XSym, and WebDB, and as the chair of the PODS Executive Committee. She served as a member of the VLDB Endowment and the PODS and ICDT executive boards and as an editor of TODS, IEEE Data Eng. Bull., and the Logical Methods in Computer Science Journal. Tova has received grants from the Israel Science Foundation, the US-Israel Binational Science Foundation, the Israeli and French Ministry of Science, and the European Union. She is an ACM Fellow, a member of Academia Europaea, and a recipient of the 2010 ACM PODS Alberto O. Mendelzon Test-of-Time Award, the 2017 VLDB Women in Database Research Award, the 2017 Weizmann Award for Exact Sciences Research, and the prestigious EU ERC Advanced Investigators grant.

Amit Somech is a fifth-year Ph.D. student at Tel Aviv University under the supervision of Professor Tova Milo. His research focuses primarily on the facilitation of interactive data exploration, with the ultimate goal of making it more accessible to non-expert users. Prior to his studies, Amit practiced data science and big data analytics in industry, and he still engages in consulting work in these areas. He has published in multiple international conferences and journals, including SIGMOD, VLDB, KDD, ICDE, CIKM, and EDBT.

Funders
EU ERC
Israeli and French Ministry of Science
US-Israel Binational Science Foundation
European Commission
Israel Science Foundation

    Keywords

    • EDA
    • data exploration
    • exploratory data analysis
