Malleability Techniques for HPC Systems

Montella R.
Conceptualization
2023-01-01

Abstract

The current static usage model of HPC systems is becoming increasingly inefficient due to the continuously growing complexity of system architectures, combined with the increased usage of coupled applications, the need for strong scaling with extreme-scale parallelism, and the increasing reliance on complex and dynamic workflows. Malleability techniques adjust resource usage dynamically for HPC systems and applications to extract maximum efficiency. In this paper, we present FlexMPI, a tool being developed in the ADMIRE project that provides intelligent global coordination of resource usage at the application level. FlexMPI considers runtime scheduling of computation, network usage, and I/O across all system architecture components. It can optimize the exploitation of HPC and I/O resources while minimizing the makespan of applications in many cases. Furthermore, FlexMPI provides facilities such as application world recomposition to generate a new consistent state when processes are added to or removed from applications, data redistribution to the new application world, and I/O interference detection to migrate congesting processes. We also present an environmental use case co-designed using FlexMPI. The evaluation shows its adaptability and scalability.
Year: 2023
ISBN: 9783031304446
ISBN: 9783031304453

Use this identifier to cite or link to this document: https://hdl.handle.net/11367/134379