howto:compile_with_cuda
Differences
This shows you the differences between two versions of the page.
howto:compile_with_cuda [2019/04/09 09:14] – alazzaro | howto:compile_with_cuda [2020/08/21 10:15] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 4: | Line 4: | ||
* Anything that uses '' | * Anything that uses '' | ||
* FFTs, when compiled with '' | * FFTs, when compiled with '' | ||
- | * If linked against an accelerated scalapack/ | + | * If linked against an accelerated scalapack/ |
To enable all CUDA acceleration options the following lines have to be added to the ARCH-file: | To enable all CUDA acceleration options the following lines have to be added to the ARCH-file: | ||
Line 10: | Line 10: | ||
NVCC = / | NVCC = / | ||
DFLAGS += -D__ACC -D__DBCSR_ACC -D__PW_CUDA | DFLAGS += -D__ACC -D__DBCSR_ACC -D__PW_CUDA | ||
- | LIBS += -lcudart -lcublas -lcufft -lrt | + | LIBS += -lcudart -lcublas -lcufft -lnvrtc |
</ | </ | ||
- | See [[https:// | + | See [[https:// |
As a prerequisite the [[https:// | As a prerequisite the [[https:// | ||
===== Libcusmm ===== | ===== Libcusmm ===== | ||
- | The acceleration of DBCSR is performed by libcusmm. This library provides a number of kernels. Each of these kernels can multiply blocks of specific blocksizes. The blocksizes of a simulation are determined by the employed basis-set. As of DBCSR 1.0, by default libcusmm is compiled with about 200 common kernels. However, if an exotic basis set is used the particular blocksizes might be missing. This can be seen from the //DBCSR Statistics// | + | The acceleration of DBCSR is performed by libcusmm. This library provides a number of kernels. Each of these kernels can multiply blocks of specific blocksizes. The blocksizes of a simulation are determined by the employed basis-set. As of DBCSR 1.1, by default libcusmm is able to generate any kernel for {m,n,k}≤80, see [[ https:// |
- | In the following example the kernel for 13x13x15 was missing: | ||
< | < | ||
| | ||
Line 32: | Line 31: | ||
| | ||
| | ||
- | | ||
| | ||
... | ... | ||
Line 40: | Line 38: | ||
</ | </ | ||
- | + | More supported GPUs can be added; please refer to [[https:// |
- | There are over 2300 readily optimized kernel-parameters available in [[src>src/dbcsr/libsmm_acc/libcusmm/]]. | + | |
- | If the desired kernel is already listed in one of the '' | + | |
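To make the {m,n,k}≤80 statement from the newer revision concrete, here is a minimal sketch that tests whether a block-size triple falls inside that range. The function name `within_default_range` is hypothetical and not part of DBCSR. It also assumes that the bound applies to each of the three dimensions independently and that dimensions start at 1.

```python
DEFAULT_MAX_DIM = 80  # bound quoted on this page for DBCSR 1.1


def within_default_range(m, n, k, limit=DEFAULT_MAX_DIM):
    """True if every block dimension lies in 1..limit (assumed reading of {m,n,k}<=80)."""
    return all(1 <= dim <= limit for dim in (m, n, k))
```

Under this reading, the 13x13x15 blocksize mentioned in the older revision lies within the range, whereas a triple with any dimension above 80 does not.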
===== Profiling ===== | ===== Profiling ===== | ||
- | If you are interested in profiling CP2K with nvprof have a look at [[dev: | + | If you are interested in profiling CP2K with nvprof have a look at [[dev: |
howto/compile_with_cuda.1554801287.txt.gz · Last modified: 2020/08/21 10:15 (external edit)