While machine learning algorithms continue to improve, their success often relies upon data scientists' ability to detect patterns, determine useful features and visualizations, select good models, and evaluate and iterate upon results. Data scientists often spend a long time making very little progress as they struggle to determine how to proceed. For this reason, data scientists' workflows and challenges have recently attracted a great deal of scholarly interest. However, the literature is mostly based on interviews and qualitative research methodologies. With this in mind, we developed DSWorkFlow, a data collection framework that lets researchers observe and analyze data scientists' cognitive workflows as they develop predictive models. Using DSWorkFlow, researchers can collect data from a Jupyter Notebook to reconstruct the code execution order and extract relevant information about a data scientist's workflow, while concurrently collecting qualitative data. To inform our extraction algorithms, we tested the framework experimentally with seven data scientists, each of whom created three machine learning models.
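As a rough illustration of what reconstructing execution order from a notebook can involve, the sketch below sorts the code cells of an `.ipynb` file by the `execution_count` field that Jupyter stores for each cell. This is not DSWorkFlow's extraction algorithm, which the abstract does not describe; the function name `cells_in_execution_order` and the sample notebook are hypothetical. Note that a saved notebook records only the most recent execution of each cell, so a full history of re-runs would presumably need additional logging.

```python
import json


def cells_in_execution_order(notebook_json: str) -> list[dict]:
    """Return executed code cells sorted by their recorded execution_count.

    Hypothetical helper for illustration only; it is not part of DSWorkFlow.
    """
    notebook = json.loads(notebook_json)
    executed = []
    for position, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        count = cell.get("execution_count")
        if count is None:
            continue  # the cell was never run in the saved session
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)  # nbformat may store source as a list of lines
        executed.append(
            {"position": position, "execution_count": count, "source": source}
        )
    return sorted(executed, key=lambda c: c["execution_count"])


# Hypothetical notebook content, shaped like the nbformat 4 JSON layout.
example = json.dumps(
    {
        "cells": [
            {"cell_type": "code", "execution_count": 3, "source": ["model.fit(X, y)"]},
            {"cell_type": "markdown", "source": ["# Notes"]},
            {"cell_type": "code", "execution_count": 1, "source": ["import pandas as pd"]},
            {"cell_type": "code", "execution_count": None, "source": ["unused = 1"]},
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

ordered = cells_in_execution_order(example)
for cell in ordered:
    print(cell["execution_count"], cell["position"], cell["source"])
```

Comparing each cell's position in the file with its rank in execution order shows where the author ran cells out of their top-to-bottom sequence.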
|Title of host publication||Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021|
|Publisher||Association for Computing Machinery|
|State||Published - 8 May 2021|
|Event||2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI EA 2021 - Virtual, Online, Japan. Duration: 8 May 2021 → 13 May 2021|
|Name||Conference on Human Factors in Computing Systems - Proceedings|
|Conference||2021 CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths, CHI EA 2021|
|Period||8/05/21 → 13/05/21|
Bibliographical note
Funding Information:
This paper is based in part upon work funded and supported by the Department of Defense under contract FA8702-15-D-0002. This research was also funded in part by JPMorgan Chase & Co. Any views or opinions expressed herein are solely those of the authors listed, and may differ from the views and opinions expressed by JPMorgan Chase & Co. or its affiliates. This material is not a product of the Research Department of J.P. Morgan Securities LLC. This material should not be construed as an individual recommendation for any particular client and is not intended as a recommendation of particular securities, financial instruments or strategies for a particular client. This material does not constitute a solicitation or offer in any jurisdiction.
© 2021 Owner/Author.
- data science process
- workflow analysis
- workflow extraction