dev:profiling
Differences
This shows you the differences between two versions of the page.
profiling [2013/10/16 09:27] – [nvprof] oschuett | dev:profiling [2020/08/21 10:15] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 70: | Line 70: | ||
- TOTAL TIME: How much time is spent in this subroutine, including time spent in the timed subroutines it calls. AVERAGE and MAXIMUM are as defined above. | - TOTAL TIME: How much time is spent in this subroutine, including time spent in the timed subroutines it calls. AVERAGE and MAXIMUM are as defined above. | ||
- | Note that, for the threaded code, only the master thread is instrumented. | + | By default, only routines contributing at least 2% of the total runtime are included in the timing report. | ||
+ | Note that, for the threaded code, only the master thread is instrumented. | ||
==== Modifying the timing report ==== | ==== Modifying the timing report ==== | ||
Line 227: | Line 228: | ||
export PMI_NO_FORK=1 | export PMI_NO_FORK=1 | ||
# no cuda proxy | # no cuda proxy | ||
- | # export | + | # export |
# use all cores with OMP | # use all cores with OMP | ||
export OMP_NUM_THREADS=8 | export OMP_NUM_THREADS=8 | ||
# use aprun in MPMD mode to have only the output from the master rank (here 169 nodes are used) | # use aprun in MPMD mode to have only the output from the master rank (here 169 nodes are used) | ||
- | COMMAND="/ | + | COMMAND=" |
PART1=" | PART1=" | ||
PART2=" | PART2=" |
dev/profiling.1381915673.txt.gz · Last modified: 2020/08/21 10:14 (external edit)