GPU Acceleration of Large-Scale Full-Frequency GW Calculations / Yu, Victor Wen-Zhe; Govoni, Marco. - In: JOURNAL OF CHEMICAL THEORY AND COMPUTATION. - ISSN 1549-9626. - 18:8(2022), pp. 4690-4707. [10.1021/acs.jctc.2c00241]

GPU Acceleration of Large-Scale Full-Frequency GW Calculations

Yu, Victor Wen-Zhe; Govoni, Marco
2022

Abstract

Many-body perturbation theory is a powerful method to simulate electronic excitations in molecules and materials starting from the output of density functional theory calculations. By implementing the theory efficiently so as to run at scale on the latest leadership high-performance computing systems, it is possible to extend the scope of GW calculations. We present a GPU acceleration study of the full-frequency GW method as implemented in the WEST code. Excellent performance is achieved through the use of (i) optimized GPU libraries, e.g., cuFFT and cuBLAS, (ii) a hierarchical parallelization strategy that minimizes CPU-CPU, CPU-GPU, and GPU-GPU data transfer operations, (iii) nonblocking MPI communications that overlap with GPU computations, and (iv) mixed precision in selected portions of the code. A series of performance benchmarks has been carried out on leadership high-performance computing systems, showing a substantial speedup of the GPU-accelerated version of WEST with respect to its CPU version. Good strong and weak scaling is demonstrated using up to 25 920 GPUs. Finally, we showcase the capability of the GPU version of WEST for large-scale, full-frequency GW calculations of realistic systems, e.g., a nanostructure, an interface, and a defect, comprising up to 10 368 valence electrons.
Volume 18, issue 8, pp. 4690-4707 (2022)

Use this identifier to cite or link to this document: https://hdl.handle.net/11380/1295287