Abstract
There is an ever-present need for shared memory parallelization schemes to exploit the full potential of multi-core architectures. The most common parallelization API addressing this need today is OpenMP. Nevertheless, writing parallel code manually is complex and effort-intensive. Thus, many deterministic source-to-source (S2S) compilers have emerged, intending to automate the process of translating serial to parallel code. However, recent studies have shown that these compilers are impractical in many scenarios. In this work, we combine the latest advancements in the field of AI and natural language processing (NLP) with the vast amount of open-source code to address the problem of automatic parallelization. Specifically, we propose a novel approach, called OMPify, to detect and predict the OpenMP pragmas and shared-memory attributes in parallel code, given its serial version. OMPify is based on a Transformer model that leverages a graph-based representation of source code to exploit the code's inherent structure. We evaluated our tool by predicting the parallelization pragmas and attributes of a large corpus (Open-OMP-Plus) of over 54,000 snippets of serial code written in C and C++. Our results demonstrate that OMPify outperforms existing approaches (the popular general-purpose ChatGPT and the targeted PragFormer models) in terms of F1 score and accuracy. Specifically, OMPify achieves up to 90% accuracy on commonly used OpenMP benchmark tests such as NAS, SPEC, and PolyBench. Additionally, we performed an ablation study to assess the impact of different model components and present insights derived from the study. Lastly, we explored the potential of using data augmentation and curriculum learning techniques to improve the model’s robustness and generalization capabilities. The dataset and source code necessary for reproducing our results are available at https://github.com/Scientific-Computing-Lab-NRCN/OMPify.
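To make the prediction task concrete, the following is a minimal illustrative sketch in C of a serial loop and a hand-parallelized counterpart. It is not taken from the paper or from the Open-OMP-Plus corpus; the function names `dot_serial` and `dot_parallel` are hypothetical. The `private` and `reduction` clauses stand in for the kind of shared-memory attributes the abstract refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Serial version: the kind of snippet given as input.
 * (Hypothetical example, not from the paper's dataset.) */
double dot_serial(const double *a, const double *b, int n) {
    assert(a != NULL && b != NULL && n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double prod = a[i] * b[i];
        sum += prod;
    }
    return sum;
}

/* Parallel version: the same loop annotated with an OpenMP pragma.
 * - private(prod): each thread gets its own copy of the temporary.
 * - reduction(+ : sum): per-thread partial sums are combined at the end.
 * Without OpenMP enabled (e.g. no -fopenmp), the pragma is ignored and
 * the loop simply runs serially. */
double dot_parallel(const double *a, const double *b, int n) {
    assert(a != NULL && b != NULL && n >= 0);
    double sum = 0.0;
    double prod;
    #pragma omp parallel for private(prod) reduction(+ : sum)
    for (int i = 0; i < n; i++) {
        prod = a[i] * b[i];
        sum += prod;
    }
    return sum;
}
```

In this sketch, deciding that the loop benefits from a pragma at all, that `prod` must be private, and that `sum` needs a reduction are three separate judgments; a tool of the kind described would have to infer each of them from the serial code alone.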
Original language | English |
---|---|
Title of host publication | OpenMP |
Subtitle of host publication | Advanced Task-Based, Device and Compiler Programming - 19th International Workshop on OpenMP, IWOMP 2023, Proceedings |
Editors | Simon McIntosh-Smith, Tom Deakin, Michael Klemm, Bronis R. de Supinski, Jannis Klinkenberg |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 3-17 |
Number of pages | 15 |
ISBN (Print) | 9783031407437 |
DOIs | |
State | Published - 2023 |
Externally published | Yes |
Event | Proceedings of the 19th International Workshop on OpenMP, IWOMP 2023 - Bristol, United Kingdom; Duration: 13 Sep 2023 → 15 Sep 2023 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14114 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | Proceedings of the 19th International Workshop on OpenMP, IWOMP 2023 |
---|---|
Country/Territory | United Kingdom |
City | Bristol |
Period | 13/09/23 → 15/09/23 |
Bibliographical note
Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Funding
Acknowledgments. This research was supported by the Israeli Council for Higher Education (CHE) via the Data Science Research Center, Ben-Gurion University of the Negev, Israel; Intel Corporation (oneAPI CoE program); and the Lynn and William Frankel Center for Computer Science. Computational support was provided by the NegevHPC project [5] and Intel Developer Cloud [26]. The authors thank Re’em Harel, Israel Hen, and Gabi Dadush for their help and support.
Funders | Funder number |
---|---|
Data Science Research Center | |
Intel Developer Cloud | |
Lynn and William Frankel Center for Computer Science | |
Intel Corporation | |
Ben-Gurion University of the Negev | |
Council for Higher Education | |
Keywords
- Code Completion
- Code Representations
- NLP
- OpenMP
- S2S Compilers
- Shared Memory Parallelism
- Transformers