Assessing the Use Cases of Persistent Memory in High-Performance Scientific Computing

Yehonatan Fridman, Yaniv Snir, Matan Rusanovsky, Kfir Zvi, Harel Levin, Danny Hendler, Hagit Attiya, Gal Oren

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

As the High Performance Computing (HPC) world moves towards the Exa-Scale era, huge amounts of data must be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, either the computation must be distributed between several compute nodes, or some portion of these data objects must be stored in a non-volatile block device such as a hard disk drive (HDD) or a solid-state drive (SSD). These standard block devices offer large and relatively cheap non-volatile storage, but their access times are orders of magnitude slower than those of DRAM. Optane™ DataCenter Persistent Memory Module (DCPMM) [1], a new technology introduced by Intel, provides non-volatile memory that can be plugged into standard memory bus slots (DDR DIMMs) and can therefore be accessed much faster than standard storage devices. In this work, we present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can 1) replace standard storage devices, and 2) replace or augment DRAM to improve the performance of HPC scientific computations. To achieve this goal, we have configured an HPC system such that DCPMM can service I/O operations of scientific applications, replace standard storage devices and file systems (specifically for diagnostics and checkpoint-restarting), and serve to expand applications' main memory. We focus on changing the scientific codes as little as possible, while allowing them to access the NVM transparently, as if they were accessing persistent storage. Our results show that DCPMM allows scientific applications to fully utilize nodes' locality by providing them with sufficiently large main memory. Moreover, it can also be used as a high-performance replacement for persistent storage. Thus, DCPMM has the potential to replace standard HDD and SSD storage devices in HPC architectures and to enable a more efficient platform for modern supercomputing applications. The source code used by this work, as well as the benchmarks and other relevant sources, are available at: https://github.com/Scientific-Computing-Lab-NRCN/StoringStorage.

Original language: English
Title of host publication: Proceedings of FTXS 2021
Subtitle of host publication: Workshop on Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 11-20
Number of pages: 10
ISBN (Electronic): 9781665420594
State: Published - 2021
Externally published: Yes
Event: 2021 Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2021 - St. Louis, United States
Duration: 14 Nov 2021 → …

Publication series

Name: Proceedings of FTXS 2021: Workshop on Fault Tolerance for HPC at eXtreme Scale, Held in conjunction with SC 2021: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2021 Workshop on Fault Tolerance for HPC at eXtreme Scale, FTXS 2021
Country/Territory: United States
City: St. Louis
Period: 14/11/21 → …

Bibliographical note

Publisher Copyright:
© 2021 IEEE.

Keywords

  • DAOS
  • DMTCP
  • FIO
  • NAS Parallel Benchmark
  • NOVA
  • Non-Volatile RAM
  • Optane™ DCPMM
  • PMFS
  • PolyBench
  • SCR
  • SplitFS
  • ext4-dax
  • xfs
