MPI-RICAL: Data-Driven MPI Distributed Parallelism Assistance with Transformers

Nadav Schneider, Tal Kadosh, Niranjan Hasabnis, Timothy Mattson, Yuval Pinter, Gal Oren

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

Computational science has made rapid progress in recent years, leading to ever-increasing demand for supercomputing resources. For scientific applications that leverage such resources, the Message Passing Interface (MPI) plays a crucial role in enabling distributed-memory parallelization across multiple nodes. However, parallelizing code with MPI manually, and specifically performing domain decomposition, is a challenging and error-prone task. In this paper, we address this problem by developing MPI-rical, a novel data-driven programming-assistance tool that helps programmers write domain-decomposition-based distributed-memory parallelization code using MPI. Specifically, we leverage the Transformer architecture - the invention that led to advancements in the field of natural language processing (NLP) - with a supervised language model to suggest MPI functions and their proper locations in the code on the fly. In addition to the novel model for MPI-based parallel programming, we also introduce MPICodeCorpus, the first publicly available corpus of MPI-based parallel programs, created by mining more than 15,000 open-source repositories on GitHub. Experimental results demonstrate the effectiveness of MPI-rical both on a dataset from MPICodeCorpus and, more importantly, on a compiled benchmark of MPI-based parallel programs for numerical computations that represent real-world scientific applications. Specifically, MPI-rical achieves F1 scores between 0.87 and 0.91 on these programs, demonstrating its accuracy in suggesting correct MPI functions at appropriate code locations. The source code used in this work, as well as other relevant sources, is available at: https://github.com/Scientific-Computing-Lab-NRCN/MPI-rical.
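To make the notion of domain decomposition concrete, the following is a minimal sketch of a 1D block decomposition, written in plain Python so that it runs without an MPI installation. It is not taken from MPI-rical or MPICodeCorpus; the helper names `block_range` and `local_sum` are hypothetical, and the loop over ranks only simulates what each MPI process would compute independently. In a real MPI program, `rank` and `n_ranks` would typically come from `MPI_Comm_rank` and `MPI_Comm_size`, and the final sum would be a reduction such as `MPI_Reduce`.

```python
from typing import List, Tuple


def block_range(n_items: int, n_ranks: int, rank: int) -> Tuple[int, int]:
    """Return the half-open [start, stop) slice of a 1D domain owned by `rank`.

    Hypothetical helper: items are split as evenly as possible, and the
    first (n_items % n_ranks) ranks each receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_sum(data: List[float], n_ranks: int, rank: int) -> float:
    """Sum only the part of `data` that belongs to `rank`."""
    start, stop = block_range(len(data), n_ranks, rank)
    return sum(data[start:stop])


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    n_ranks = 4
    # Simulate every rank computing its partial result on its own subdomain.
    partial = [local_sum(data, n_ranks, r) for r in range(n_ranks)]
    # Adding the partial results stands in for a collective reduction.
    print(partial, sum(partial))
```

Deciding where the decomposition happens and which MPI calls exchange or combine the per-rank results is the kind of step the abstract describes as challenging and error-prone when done by hand.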

Original language: English
Title of host publication: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Publisher: Association for Computing Machinery
Pages: 2-10
Number of pages: 9
ISBN (Electronic): 9798400707858
State: Published - 12 Nov 2023
Externally published: Yes
Event: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023 - Denver, United States
Duration: 12 Nov 2023 to 17 Nov 2023

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Country/Territory: United States
City: Denver
Period: 12/11/23 to 17/11/23

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Funding

This research was supported by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev, Israel; Intel Corporation (oneAPI CoE program); and the Lynn and William Frankel Center for Computer Science. Computational support was provided by the NegevHPC project [3] and Intel Developer Cloud [21]. The authors thank Re’em Harel, Israel Hen, and Gabi Dadush for their help and support.

Funders
Data Science Research Center
Intel Developer Cloud
Lynn and William Frankel Center for Computer Science
Intel Corporation
Ben-Gurion University of the Negev
Council for Higher Education

    Keywords

    • Domain Decomposition
    • LLM
    • MPI
    • MPI-rical
    • MPICodeCorpus
    • SPT-Code
    • Transformer
