GPU Acceleration of Large-Scale Full-Frequency GW Calculations / Yu, Victor Wen-Zhe; Govoni, Marco. - In: JOURNAL OF CHEMICAL THEORY AND COMPUTATION. - ISSN 1549-9626. - 18:8(2022), pp. 4690-4707. [10.1021/acs.jctc.2c00241]
GPU Acceleration of Large-Scale Full-Frequency GW Calculations
Govoni, Marco
2022
Abstract
Many-body perturbation theory is a powerful method to simulate electronic excitations in molecules and materials starting from the output of density functional theory calculations. By implementing the theory efficiently so as to run at scale on the latest leadership high-performance computing systems, it is possible to extend the scope of GW calculations. We present a GPU acceleration study of the full-frequency GW method as implemented in the WEST code. Excellent performance is achieved through the use of (i) optimized GPU libraries, e.g., cuFFT and cuBLAS, (ii) a hierarchical parallelization strategy that minimizes CPU-CPU, CPU-GPU, and GPU-GPU data transfer operations, (iii) nonblocking MPI communications that overlap with GPU computations, and (iv) mixed precision in selected portions of the code. A series of performance benchmarks has been carried out on leadership high-performance computing systems, showing a substantial speedup of the GPU-accelerated version of WEST with respect to its CPU version. Good strong and weak scaling is demonstrated using up to 25 920 GPUs. Finally, we showcase the capability of the GPU version of WEST for large-scale, full-frequency GW calculations of realistic systems, e.g., a nanostructure, an interface, and a defect, comprising up to 10 368 valence electrons.

File | Size | Format |
---|---|---|
2203.05623v3.pdf (Open access; type: AAM, author's version revised and accepted for publication) | 4.64 MB | Adobe PDF |
The metadata in IRIS UNIMORE are released under the Creative Commons CC0 1.0 Universal license, while the publication files are released under the Attribution 4.0 International (CC BY 4.0) license, unless otherwise indicated.