dev:profiling
Differences
This shows you the differences between two versions of the page.
profiling [2013/10/15 15:48] – add cuda profiling info vondele | dev:profiling [2020/08/21 10:15] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 70: | Line 70: | ||
- TOTAL TIME: How much time is spent in this subroutine, including time spent in the timed subroutines it calls. AVERAGE and MAXIMUM are as defined above | - TOTAL TIME: How much time is spent in this subroutine, including time spent in the timed subroutines it calls. AVERAGE and MAXIMUM are as defined above | ||
- | Note that, for the threaded code, only the master thread is instrumented. | + | By default, only routines contributing at least 2% of the total runtime are included in the timing report. |
+ | Note that, for the threaded code, only the master thread is instrumented. | ||
==== Modifying the timing report ==== | ==== Modifying the timing report ==== | ||
Line 227: | Line 228: | ||
export PMI_NO_FORK=1 | export PMI_NO_FORK=1 | ||
# no cuda proxy | # no cuda proxy | ||
- | # export | + | # export |
# use all cores with OMP | # use all cores with OMP | ||
export OMP_NUM_THREADS=8 | export OMP_NUM_THREADS=8 | ||
# use aprun in MPMD mode to have only the output from the master rank (here 169 nodes are used) | # use aprun in MPMD mode to have only the output from the master rank (here 169 nodes are used) | ||
- | aprun -N 1 -n 1 -d ${OMP_NUM_THREADS} | + | COMMAND=" |
+ | PART1=" | ||
+ | PART2=" | ||
+ | aprun ${PART1} : ${PART2} | ||
</ | </ | ||
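The MPMD launch line in the script above is truncated in this diff view, so its exact contents are not recoverable here. As a rough sketch of how such a line could be assembled, the snippet below builds an aprun command with two colon-separated parts: one rank on one node that is wrapped in a profiler, and the remaining ranks running the plain executable. The function name `build_mpmd_cmd`, the wrapper name `profiler_wrapper`, and the executable and input names are hypothetical placeholders, and the assumption that only the first rank is wrapped is ours, not taken from the page. The command is printed rather than executed.

```shell
#!/bin/bash
# Sketch only: all names below are hypothetical placeholders, not the
# values used in the original (truncated) job script.

# Build an aprun MPMD command line: rank 0 runs under a wrapper, all other
# ranks run the bare command. One rank per node (-N 1), with -d threads
# reserved per rank.
build_mpmd_cmd() {
    local nodes=$1      # total number of nodes (one rank per node)
    local threads=$2    # OpenMP threads per rank, passed to -d
    local wrapper=$3    # hypothetical profiler wrapper for the first rank
    local command=$4    # hypothetical executable plus its arguments

    # First MPMD part: a single rank, wrapped in the profiler.
    local part1="-N 1 -n 1 -d ${threads} ${wrapper} ${command}"
    # Second MPMD part: the remaining ranks, not wrapped.
    local part2="-N 1 -n $((nodes - 1)) -d ${threads} ${command}"

    # aprun separates the MPMD parts with a colon.
    echo "aprun ${part1} : ${part2}"
}

# Print (do not run) the command for 169 nodes with 8 threads each.
build_mpmd_cmd 169 8 "profiler_wrapper" "./cp2k.psmp input.inp"
```

Printing the command first makes it easy to check the rank counts: the two `-n` values should add up to the number of nodes requested from the batch system.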
dev/profiling.1381852121.txt.gz · Last modified: 2020/08/21 10:14 (external edit)