Debugging CP2K can be challenging. The suggestions and techniques collected here should make it easier.
To debug the code, you need an input that reliably triggers the problem. If the bug is really hard (i.e. it requires running the test case repeatedly), finding a sufficiently small test case is very valuable. Check whether the bug reproduces with fewer atoms, a lower cutoff, a smaller basis, an energy calculation instead of MD, … Understanding what is needed to trigger the bug usually helps a lot in fixing it.
Check whether the bug only happens on machine X or with library Y. Buggy libraries are unfortunately very common. As a reference, try binaries compiled with gfortran (-O0) and linked against netlib ScaLAPACK/BLAS on Linux. If you find bugs in libraries or tools, report them to the vendors; this is the only way things improve in the long run.
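When comparing a suspect build against a reference build, it can help to compare the final energies of the two runs mechanically. The following is a minimal sketch, not part of CP2K: the helper names last_energy and same_energy are hypothetical, and it assumes that the total energy is the last field on output lines starting with "ENERGY|", which may differ between CP2K versions.

```shell
# Hypothetical helper: print the last total energy found in a CP2K output file.
# Assumption: the energy is the last field of lines that start with "ENERGY|".
last_energy() {
    awk '/^ *ENERGY\|/ { e = $NF } END { if (e != "") print e }' "$1"
}

# Hypothetical helper: succeed if two energies agree within a tolerance.
# The default tolerance of 1e-10 is an arbitrary choice for illustration.
same_energy() {
    awk -v a="$1" -v b="$2" -v tol="${3:-1e-10}" \
        'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d <= tol + 0) }'
}

# Example usage (file names are placeholders):
#   same_energy "$(last_energy suspect.out)" "$(last_energy reference.out)" \
#       || echo "energies differ"
```

A loose tolerance is usually more appropriate when the two builds use different libraries or optimization levels, since small numerical differences are expected.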
Run CP2K with the 'TRACE' keyword enabled in the '&GLOBAL' section. The additional output gives a good idea of where things might go wrong.
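As an illustration, tracing can be switched on in a copy of an existing input without editing the original. This is only a sketch: the helper name enable_trace is hypothetical, and it assumes the input contains a single '&GLOBAL' line and does not already set the keyword.

```shell
# Hypothetical helper: print a copy of a CP2K input with TRACE switched on.
# It adds a "TRACE" line directly after the line that opens the &GLOBAL section.
enable_trace() {
    awk '{ print }
         toupper($1) == "&GLOBAL" { print "  TRACE" }' "$1"
}

# Example usage (file names are placeholders):
#   enable_trace test.inp > test_trace.inp
```

Writing to a separate file keeps the original test case untouched, which is convenient when the same input is also used for timing or regression runs.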
A large number of issues can be caught using a debug build with bounds checking. With gfortran, the following flags are useful in the arch file:
FCFLAGS = -O1 -fstrict-aliasing -g -fno-omit-frame-pointer -fno-realloc-lhs \
          -fcheck=bounds,do,recursion,pointer -ffree-form $(DFLAGS)
Also link against netlib BLAS/LAPACK/ScaLAPACK (compiled with the same options).
valgrind is very useful to find additional bugs, most commonly related to undefined variables. Unfortunately, the slowdown caused by valgrind makes this practical only for test cases that run within seconds/minutes. The full regtester is run from time to time under valgrind.
The following arch file works well with valgrind. In particular, it uses no libraries that produce false positives. It needs gfortran > 4.8.X to avoid spurious warnings:
CC = cc
CPP =
FC = gfortran
LD = gfortran
AR = ar -r
CPPFLAGS =
DFLAGS = -D__GFORTRAN -D__FFTSG -D__LIBINT -D__FFTW3 -D__LIBINT_MAX_AM=6 -D__LIBDERIV_MAX_AM1=5 -D__LIBXC2
FCFLAGS = -O0 -g -ffree-form $(DFLAGS)
LDFLAGS = $(FCFLAGS) -L/data/vjoost/libint_ham/install/lib/ -L/data/vjoost/scalapack/scalapack_installer_1.0.2/install/lib/ -L/data/vjoost/libxc-2.0.1/install/lib
LIBS = -lderiv -lint -lstdc++ -lfftw3 -lreflapack -lrefblas -lxc
OBJECTS_ARCHITECTURE = machine_gfortran.o
Run valgrind with
valgrind --max-stackframe=2100192 --leak-check=full --track-origins=yes
to get the origin of undefined variables in addition to a leak check report.
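The options above can be wrapped in a small shell function so that every run uses the same settings and keeps its report. This is a sketch under assumptions: the function name run_under_valgrind is hypothetical, and the added --log-file option (a standard valgrind option, where %p expands to the process ID) is a suggestion that the text above does not require.

```shell
# Hypothetical wrapper: run a command under valgrind with the options above.
# --log-file writes the report to valgrind.<pid>.log instead of stderr.
run_under_valgrind() {
    valgrind --max-stackframe=2100192 --leak-check=full --track-origins=yes \
             --log-file=valgrind.%p.log "$@"
}

# Example usage (paths are placeholders):
#   run_under_valgrind ../../../exe/local_valgrind/cp2k.sopt test.inp
```

Keeping the report in a file separates it from the CP2K output, which makes it easier to compare reports between runs.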
valgrind also comes with the 'massif' tool, which provides detailed information about memory usage: what is the peak allocated memory, and where do most of these allocations come from?
Both questions are easy to answer with:
valgrind --tool=massif ../../../exe/local_valgrind/cp2k.sopt test.inp
ms_print massif.out.XYZ
The valgrind homepage has a detailed description of massif.
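For a quick look at the peak heap usage without reading the full ms_print report, the raw massif output file can be scanned directly. This is a rough sketch: the function name massif_peak_heap is hypothetical, and it assumes that each snapshot in the file records its heap size on a line of the form mem_heap_B=<bytes>. ms_print remains the supported way to inspect the data.

```shell
# Hypothetical helper: print the largest heap size (in bytes) over all snapshots.
# Assumption: each snapshot has a line "mem_heap_B=<bytes>" in the massif file.
massif_peak_heap() {
    awk -F= '$1 == "mem_heap_B" && $2 + 0 > peak { peak = $2 + 0 }
             END { printf "%.0f\n", peak }' "$1"
}

# Example usage (XYZ stands for the process ID chosen by massif):
#   massif_peak_heap massif.out.XYZ
```

The number covers heap payload only; it ignores allocator overhead and stack usage, which massif reports separately.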
valgrind can also be used for debugging parallel code, e.g.:
mpirun -np 2 -x OMP_NUM_THREADS=2 valgrind --max-stackframe=2100192 --leak-check=full --track-origins=yes cp2k.psmp cp2k.inp
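With several processes, the reports are easier to handle when each process writes its own log, for example by adding valgrind's --log-file=valgrind.%p.log option to the command above (an assumption, since the command as given prints to stderr). The sketch below then adds up the error counts from the "ERROR SUMMARY" line that valgrind prints at the end of each report. The function name total_valgrind_errors is hypothetical.

```shell
# Hypothetical helper: sum the error counts over a set of valgrind logs.
# It reads the number that follows "ERROR SUMMARY:" in each given file.
total_valgrind_errors() {
    awk '/ERROR SUMMARY:/ {
             for (i = 1; i < NF; i++)
                 if ($i == "SUMMARY:") n += $(i + 1)
         }
         END { print n + 0 }' "$@"
}

# Example usage (assumes per-process logs named valgrind.<pid>.log exist):
#   total_valgrind_errors valgrind.*.log
```

A non-zero total only says that some process reported errors; the individual logs still have to be read to find out which one and why.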
Starting from gcc 4.9.0, good memory leak checking is integrated with gfortran. Compile CP2K with
-fno-omit-frame-pointer -O1 -g -fsanitize=leak
to get detailed memory leak reports. CP2K itself should be fully clean; however, this also finds leaks in libraries such as MPI and ScaLAPACK. These can be silenced with a suppression file, which is activated with an export like
export LSAN_OPTIONS=suppressions=suppr.txt
For the format of the file, see the LeakSanitizer documentation.
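As a small illustration, a suppression file can be created and activated as follows. Each rule has the form leak:<pattern>, where the pattern is matched against the function, source file, or module names in the leak's stack trace. The two library patterns used here are assumptions for illustration; the right patterns depend on the MPI and ScaLAPACK installation in use.

```shell
# Write an example suppression file. The patterns are placeholders and
# should be adapted to the libraries that show up in the leak reports.
cat > suppr.txt <<'EOF'
# Leaks reported from inside the MPI library
leak:libmpi
# Leaks reported from inside ScaLAPACK
leak:libscalapack
EOF

# Tell LeakSanitizer to use the file for all subsequent runs in this shell.
export LSAN_OPTIONS=suppressions=suppr.txt
```

It is worth keeping the patterns as narrow as possible, since a broad pattern can also hide leaks in CP2K code that happens to be called from the suppressed library.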
Unfortunately, the GNU Fortran compiler is not on the same level as its C/C++ counterparts when it comes to warnings. In particular, -Wuninitialized, which is part of -Wall, may give spurious warnings of the following kind when building with -O1 (or greater):
warning: ‘arr.offset’ may be used uninitialized in this function [-Wuninitialized]
warning: ‘arr.dim[1].stride’ may be used uninitialized in this function [-Wuninitialized]
warning: ‘arr.dim[0].ubound’ may be used uninitialized in this function [-Wuninitialized]
This is tracked upstream by the GCC/gfortran developers as Bug 66459 and Bug 60500.
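Until this is fixed upstream, one possible workaround is to filter these messages out of the build log so that genuine warnings stay visible. The sketch below is an assumption-laden example, not an established CP2K procedure: the function name filter_descriptor_warnings is hypothetical, and the pattern only matches warnings about array descriptor fields (offset, stride, lbound, ubound) like the ones shown above.

```shell
# Hypothetical filter: drop "may be used uninitialized" warnings that refer
# to array descriptor fields, and pass every other line through unchanged.
filter_descriptor_warnings() {
    grep -v -E "\.(offset|dim\[[0-9]+\]\.(stride|lbound|ubound))[^ ]* may be used uninitialized"
}

# Example usage (the make invocation is a placeholder):
#   make ARCH=local VERSION=sopt 2>&1 | filter_descriptor_warnings
```

The filter works line by line, so any source excerpt that the compiler prints alongside a dropped warning is left in the log.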