TY - JOUR
T1 - PragFormer
T2 - Data-Driven Parallel Source Code Classification with Transformers
AU - Harel, Re’em
AU - Kadosh, Tal
AU - Hasabnis, Niranjan
AU - Mattson, Timothy
AU - Pinter, Yuval
AU - Oren, Gal
N1 - Publisher Copyright:
© The Author(s) 2024.
PY - 2025/2
Y1 - 2025/2
N2 - Multi-core shared memory architectures have become ubiquitous in computing hardware nowadays. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, to applications. However, most developers find introducing OpenMP directives to their code hard due to pervasive pitfalls in managing parallel shared memory. To assist developers in this process, many compilers, as well as source-to-source (S2S) translation tools, have been developed over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven AI-based code completion (CC) tools, such as GitHub CoPilot, have been developed to ease and improve programming productivity. Leveraging the insights from existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant. Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to parallel worksharing-loop construct (OpenMP directive) and even predict the need for specific data-sharing attributes clauses on the fly. We created a unique database, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives, while the other half do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically-trained baseline as well as a state-of-the-art S2S compiler. In fact, it even outperforms the popular generative AI model of ChatGPT. In the spirit of advancing research on this topic, we have already released source code for PragFormer as well as Open-OMP dataset to public. Moreover, an interactive demo of our tool, as well as a Hugging Face webpage to experiment with our tool, are already available.
AB - Multi-core shared memory architectures have become ubiquitous in computing hardware nowadays. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, to applications. However, most developers find introducing OpenMP directives to their code hard due to pervasive pitfalls in managing parallel shared memory. To assist developers in this process, many compilers, as well as source-to-source (S2S) translation tools, have been developed over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven AI-based code completion (CC) tools, such as GitHub CoPilot, have been developed to ease and improve programming productivity. Leveraging the insights from existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant. Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to parallel worksharing-loop construct (OpenMP directive) and even predict the need for specific data-sharing attributes clauses on the fly. We created a unique database, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives, while the other half do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically-trained baseline as well as a state-of-the-art S2S compiler. In fact, it even outperforms the popular generative AI model of ChatGPT. In the spirit of advancing research on this topic, we have already released source code for PragFormer as well as Open-OMP dataset to public. Moreover, an interactive demo of our tool, as well as a Hugging Face webpage to experiment with our tool, are already available.
KW - Artificial intelligence
KW - Parallel programming
KW - Programming assistance
KW - Software development
UR - http://www.scopus.com/inward/record.url?scp=85208114638&partnerID=8YFLogxK
U2 - 10.1007/s10766-024-00778-9
DO - 10.1007/s10766-024-00778-9
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:85208114638
SN - 0885-7458
VL - 53
JO - International Journal of Parallel Programming
JF - International Journal of Parallel Programming
IS - 1
M1 - 2
ER -