DSWorkFlow: A Framework for Capturing Data Scientists' Workflows

Moshe Mash, Stephanie Rosenthal, Reid Simmons

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

3 Scopus citations

Abstract

While machine learning algorithms continue to improve, their success often relies upon the data scientists' ability to detect patterns, determine useful features and visualizations, select good models, and evaluate and iterate upon results. Data scientists often spend a long time making very little progress as they struggle to determine how to proceed. In this respect, the understanding of data scientists' workflows and challenges has recently attracted a great deal of scholarly interest. However, the literature is mostly based on interviews and qualitative research methodologies. With this in mind, we developed DSWorkFlow, a data collection framework that provides researchers with the ability to observe and analyze data scientists' cognitive workflows as they develop predictive models. Using DSWorkFlow, researchers can collect data from a Jupyter Notebook, to reconstruct the code execution order and extract relevant information about data scientist workflow alongside the concomitant collection of qualitative data. We tested the framework experimentally with seven data scientists as they each created three machine learning models to inform our extraction algorithms.
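The idea of reconstructing code execution order from a Jupyter Notebook can be illustrated with a minimal sketch. This is not the DSWorkFlow extraction algorithm, which the abstract does not specify. It assumes only the standard notebook JSON layout, where each code cell carries an `execution_count` field, and the function name `execution_order` and the sample notebook are hypothetical. A saved notebook keeps only the most recent execution count per cell, so repeated runs of the same cell are not recoverable this way.

```python
import json


def execution_order(notebook_json):
    """Return (execution_count, source) pairs for executed code cells, in run order.

    Illustrative sketch only: it relies on the execution_count stored in the
    notebook file, which reflects just the last run of each cell.
    """
    notebook = json.loads(notebook_json)
    executed = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        count = cell.get("execution_count")
        if count is None:
            continue  # the cell was never run
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)  # source may be stored as a list of lines
        executed.append((count, source))
    return sorted(executed, key=lambda pair: pair[0])


# Hypothetical notebook whose cells were run out of document order.
EXAMPLE_NOTEBOOK = json.dumps({
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": ["import pandas as pd"]},
        {"cell_type": "code", "execution_count": 3, "source": ["model.fit(X, y)"]},
        {"cell_type": "markdown", "source": ["## Load data"]},
        {"cell_type": "code", "execution_count": 2, "source": ["X, y = load()"]},
        {"cell_type": "code", "execution_count": None, "source": ["unused()"]},
    ],
    "nbformat": 4,
    "nbformat_minor": 5,
})

if __name__ == "__main__":
    for count, source in execution_order(EXAMPLE_NOTEBOOK):
        print(count, source)
```

Sorting by `execution_count` recovers the order in which cells were last run, which can differ from their position in the notebook.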

Original language: English
Title of host publication: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450380959
State: Published - 8 May 2021
Externally published: Yes
Event: 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI EA 2021 - Virtual, Online, Japan
Duration: 8 May 2021 to 13 May 2021

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI EA 2021
Country/Territory: Japan
City: Virtual, Online
Period: 8/05/21 to 13/05/21

Bibliographical note

Publisher Copyright:
© 2021 Owner/Author.

Funding

This paper is based in part upon work funded and supported by the Department of Defense under contract FA8702-15-D-0002. This research was also funded in part by JPMorgan Chase & Co. Any views or opinions expressed herein are solely those of the authors listed, and may differ from the views and opinions expressed by JPMorgan Chase & Co. or its affiliates. This material is not a product of the Research Department of J.P. Morgan Securities LLC. This material should not be construed as an individual recommendation for any particular client and is not intended as a recommendation of particular securities, financial instruments or strategies for a particular client. This material does not constitute a solicitation or offer in any jurisdiction.

Funders (funder number)
JPMorgan Chase & Co
U.S. Department of Defense (FA8702-15-D-0002)

    Keywords

    • data science process
    • workflow analysis
    • workflow extraction
