howto:libcusmm
Differences
This shows you the differences between two versions of the page.
howto:libcusmm [2014/03/28 14:28] – oschuett | howto:libcusmm [2019/04/09 12:45] (current) – removed alazzaro | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Howto Optimize CUDA Kernels for Libcusmm ====== | ||
- | === Step 1: Go to the libcusmm directory === | ||
- | < | ||
- | $ cd $CP2K_ROOT/ | ||
- | </ | ||
- | === Step 2: Run the script tune.py === | ||
- | The script takes as arguments the blocksizes you want to add to libcusmm. For example, if your system contains blocks of size 5 and 8, type: | ||
- | < | ||
- | $ ./tune.py 5 8 | ||
- | Found 23 parameter sets for 5x5x5 | ||
- | Found 31 parameter sets for 5x5x8 | ||
- | Found 107 parameter sets for 5x8x5 | ||
- | Found 171 parameter sets for 5x8x8 | ||
- | Found 75 parameter sets for 8x5x5 | ||
- | Found 107 parameter sets for 8x5x8 | ||
- | Found 248 parameter sets for 8x8x5 | ||
- | Found 424 parameter sets for 8x8x8 | ||
- | </ | ||
- | |||
- | The script will create a directory for each combination of the blocksizes: | ||
- | < | ||
- | $ ls -d tune_* | ||
- | tune_5x5x5 | ||
- | </ | ||
- | |||
- | Each directory contains a number of files: | ||
- | < | ||
- | $ ls -1 tune_8x8x8/ | ||
- | Makefile | ||
- | tune_8x8x8_exe0_main.cu | ||
- | tune_8x8x8_exe0_part0.cu | ||
- | tune_8x8x8_exe0_part1.cu | ||
- | tune_8x8x8_exe0_part2.cu | ||
- | tune_8x8x8_exe0_part3.cu | ||
- | tune_8x8x8_exe0_part4.cu | ||
- | tune_8x8x8.job | ||
- | </ | ||
- | For each possible parameter set a // | ||
- | |||
- | In order to parallelize the compilation and the benchmarking, the launchers are distributed over several files. | ||
- | Currently, up to 10000 launchers are compiled into one // | ||
- | |||
- | === Step 3: Submit Jobs === | ||
- | Each tune-directory contains a job file. | ||
- | Since there might be many tune-directories, the convenience script '' | ||
- | |||
- | When '' | ||
- | < | ||
- | $ ./ | ||
- | tune_5x5x5: Would submit, run with " | ||
- | tune_5x5x8: Would submit, run with " | ||
- | tune_5x8x5: Would submit, run with " | ||
- | tune_5x8x8: Would submit, run with " | ||
- | tune_8x5x5: Would submit, run with " | ||
- | tune_8x5x8: Would submit, run with " | ||
- | tune_8x8x5: Would submit, run with " | ||
- | tune_8x8x8: Would submit, run with " | ||
- | Number of jobs submitted: 8 | ||
- | </ | ||
- | |||
- | Only when '' | ||
- | < | ||
- | $ ./submit.py doit! | ||
- | tune_5x5x5: Submitting | ||
- | Submitted batch job 277987 | ||
- | tune_5x5x8: Submitting | ||
- | Submitted batch job 277988 | ||
- | tune_5x8x5: Submitting | ||
- | Submitted batch job 277989 | ||
- | tune_5x8x8: Submitting | ||
- | Submitted batch job 277990 | ||
- | tune_8x5x5: Submitting | ||
- | Submitted batch job 277991 | ||
- | tune_8x5x8: Submitting | ||
- | Submitted batch job 277992 | ||
- | tune_8x8x5: Submitting | ||
- | Submitted batch job 277993 | ||
- | tune_8x8x8: Submitting | ||
- | Submitted batch job 277994 | ||
- | Number of jobs submitted: 8 | ||
- | </ | ||
- | |||
- | === Step 4: Collect Results === | ||
- | Run '' | ||
- | < | ||
- | $ ./ | ||
- | Reading: tune_5x5x5/ | ||
- | Reading: tune_5x5x8/ | ||
- | Reading: tune_5x8x5/ | ||
- | Reading: tune_5x8x8/ | ||
- | Reading: tune_8x5x5/ | ||
- | Reading: tune_8x5x8/ | ||
- | Reading: tune_8x8x5/ | ||
- | Reading: tune_8x8x8/ | ||
- | Kernel_dnt_tiny(m=5, | ||
- | Kernel_dnt_tiny(m=5, | ||
- | Kernel_dnt_medium(m=5, | ||
- | Kernel_dnt_tiny(m=5, | ||
- | Kernel_dnt_medium(m=8, | ||
- | Kernel_dnt_medium(m=8, | ||
- | Kernel_dnt_tiny(m=8, | ||
- | Kernel_dnt_tiny(m=8, | ||
- | </ |
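The removed page shows tune.py producing one tune directory per (m, n, k) combination of the requested blocksizes, eight in total for the sizes 5 and 8. The following is a minimal sketch of that enumeration only; it is not taken from tune.py itself, and the function name `tune_dir_names` is a hypothetical helper introduced here for illustration. The `tune_MxNxK` naming pattern is copied from the listings above.

```python
from itertools import product


def tune_dir_names(blocksizes):
    """Return one directory name per (m, n, k) combination of the block sizes.

    Hypothetical helper: it only mirrors the tune_MxNxK naming pattern seen
    in the listings above and makes no claim about how tune.py works inside.
    """
    # Deduplicate and sort so the order matches the listing (5x5x5 ... 8x8x8).
    sizes = sorted(set(blocksizes))
    return ["tune_%dx%dx%d" % (m, n, k) for m, n, k in product(sizes, repeat=3)]


if __name__ == "__main__":
    # Block sizes 5 and 8, as in the example invocation "./tune.py 5 8".
    for name in tune_dir_names([5, 8]):
        print(name)
```

The number of directories grows with the cube of the number of distinct blocksizes, so three sizes already yield 27 tune directories, each with its own job file.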
howto/libcusmm.1396016895.txt.gz · Last modified: 2020/08/21 10:15 (external edit)